Let's Make a Programming Language! 1. The Lexer
How do programming languages work?
A programming language works in two or more steps. The first one is lexing, which is just a fancy word for tokenizing: the lexer takes the contents of a file and breaks it into tokens. Here is an example:
while True:
    print("Hello world!")
is turned into tokens like these:
WHILELOOP
BOOLEAN: True
COLON
PRINT FUNCTION
STRING: "Hello world!"
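Here is a tiny hand-written tokenizer for the example above. It uses only Python's standard `re` module and is a sketch for illustration, not how SLY is implemented. The token names are made up, and unlike the list above it also emits LPAREN and RPAREN for the parentheses.

```python
import re

# Hypothetical token names and patterns for this toy example only.
TOKEN_SPEC = [
    ("WHILELOOP", r"while\b"),
    ("PRINT", r"print\b"),
    ("BOOLEAN", r"True\b|False\b"),
    ("STRING", r'"[^"]*"'),
    ("COLON", r":"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),  # spaces and newlines: matched but not emitted
]
# One big pattern with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Yield (type, value) pairs; raise SyntaxError on a character no rule matches."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"Illegal character {source[pos]!r} at index {pos}")
        if match.lastgroup != "SKIP":
            yield (match.lastgroup, match.group())
        pos = match.end()

for tok in tokenize('while True:\n    print("Hello world!")'):
    print(tok)
```

The loop prints the WHILELOOP, BOOLEAN, COLON, PRINT, LPAREN, STRING and RPAREN tokens in that order.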
The exact tokens differ from language to language, so treat that list as a rough idea. The next step is parsing. Parsing checks whether the tokens follow a valid pattern. If the pattern is unknown, like:
True while ("hello world!")print
then it throws an error. If the pattern is valid, the language goes on to evaluate the tokens.
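The check-then-evaluate idea can be sketched in plain Python. This is a toy written for illustration, not SLY's parser. It assumes tokens arrive as (type, value) pairs using the calculator token names NUMBER, PLUS and NEGATIVE that this tutorial defines. It accepts only the pattern NUMBER ((PLUS | NEGATIVE) NUMBER)*, and the function name `parse_and_evaluate` is hypothetical.

```python
def parse_and_evaluate(tokens):
    """Check the token pattern, raising SyntaxError if it is wrong, then evaluate it."""
    tokens = list(tokens)
    # Numbers sit at even positions and operators at odd ones, and the
    # expression must end on a number, so a valid length is always odd.
    if len(tokens) % 2 == 0:
        raise SyntaxError("unexpected end of input")
    for i, (kind, _value) in enumerate(tokens):
        if i % 2 == 0 and kind != "NUMBER":
            raise SyntaxError(f"expected NUMBER, got {kind} at position {i}")
        if i % 2 == 1 and kind not in ("PLUS", "NEGATIVE"):
            raise SyntaxError(f"expected PLUS or NEGATIVE, got {kind} at position {i}")
    # The pattern is valid, so evaluate from left to right.
    result = int(tokens[0][1])
    for (op, _), (_, number) in zip(tokens[1::2], tokens[2::2]):
        result = result + int(number) if op == "PLUS" else result - int(number)
    return result

print(parse_and_evaluate([("NUMBER", "1"), ("PLUS", "+"), ("NUMBER", "1")]))  # 2
```

Tokens in the wrong order, like the `True while ...` example, fail the pattern check and never reach evaluation.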
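What we will be building
We will be building a lexer for a calculator. In the next post we will make a parser for it.
Let's begin!
I will be using SLY, a Python library for writing lexers and parsers (install it with pip install sly). SLY stands for "Sly Lex Yacc".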
Create a new file and import SLY's Lexer class:
from sly import Lexer
Then create a class called lexer (or any name you want) that inherits from SLY's Lexer class. Your code should now look like this:
from sly import Lexer
class lexer(Lexer):
    pass
Inside the class, create a set called tokens. This is important: it MUST be called tokens, because that is where SLY looks for every kind of token the lexer can produce, like ints or floats.
Add the tokens NUMBER, PLUS and NEGATIVE to the set, written without quotes. Your IDE, text editor or repl will probably flag these names as undefined, but you can ignore that: SLY's Lexer class turns the bare names into token names when the class is defined.
tokens = {
NUMBER,
PLUS,
NEGATIVE
}
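By the way, the tokens don't have to be called NUMBER, PLUS and NEGATIVE. You could call them NUM, ADD and SUB, or something else. Keep them in all caps, though: SLY only treats bare uppercase names in tokens as token names, so a lowercase name like num would raise a NameError.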
Now let's tell the lexer what these tokens actually look like. Luckily, SLY makes this easy. First, make a variable for each token. The variable must have exactly the same name as its token, since that is how SLY pairs each pattern with its token. Second, assign each variable a raw string containing a regular expression for that token:
NUMBER = r'\d+' # regex for numbers
PLUS = r'\+' # + is special in regex, so escape it with a backslash
NEGATIVE = r'\-'
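You can check these patterns with Python's built-in `re` module, without SLY. This assumes SLY patterns follow ordinary Python regex syntax, which they do because SLY compiles them with `re`. The sketch shows each pattern matching its token, and why `+` needs the backslash.

```python
import re

# The same patterns as in the lexer class.
NUMBER = r'\d+'
PLUS = r'\+'
NEGATIVE = r'\-'

# fullmatch succeeds only if the whole string fits the pattern.
print(bool(re.fullmatch(NUMBER, '123')))   # True
print(bool(re.fullmatch(NUMBER, '12a')))   # False
print(bool(re.fullmatch(PLUS, '+')))       # True
print(bool(re.fullmatch(NEGATIVE, '-')))   # True

# Unescaped, '+' means "repeat the previous item one or more times";
# with nothing before it, re rejects the pattern.
try:
    re.compile('+')
except re.error as err:
    print('unescaped + is not a valid pattern:', err)
```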
Now let's try out the lexer by adding these lines below the class (outside of it):
for token in lexer().tokenize('1+1'):
    print(token)
Success! We have created a lexer! But you may notice a problem when you add spaces:
for token in lexer().tokenize('1 + 1'):
    print(token)
Token(type='NUMBER', value='1', lineno=1, index=0)
Illegal character ' ' at index 1
Since we haven't told the lexer what to do with whitespace, it raises an error. To fix this, add an ignore variable to the lexer class that lists the characters the lexer should skip:
ignore = ' \t' # characters to skip: space and tab (a plain string, not a regex)
and the error should go away!
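A caution about ignore. SLY checks each input character against it one at a time, so it is a string of individual characters, not a regular expression. It should therefore be written without the r prefix. A raw string keeps the backslash and the t as two separate characters, so tabs would not be ignored while backslashes and the letter t would be. This standard-library snippet shows the difference.

```python
# Two ways of writing the ignore string.
raw_version = r" \t"    # three characters: space, backslash, letter t
plain_version = " \t"   # two characters: space, tab

print(list(raw_version))    # [' ', '\\', 't']
print(list(plain_version))  # [' ', '\t']
```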
Now we have a fully functioning lexer for addition and subtraction! I recommend adding your own tokens, like multiplication and division.
Here is the full code:
from sly import Lexer
class lexer(Lexer):
    tokens = {
        NUMBER,
        PLUS,
        NEGATIVE
    }
    NUMBER = r'\d+'
    PLUS = r'\+'
    NEGATIVE = r'\-'
    ignore = ' \t'  # skip spaces and tabs
# Read a line, then print every token the lexer finds in it.
while True:
    data = input("> ")
    for token in lexer().tokenize(data):
        print(token)
I added an input loop so you can type expressions and see their tokens. The rest of the code is the same as before.
@Th3OneAndOnly Sorry about that, this is my first tutorial on repl.it.