As the title says, today I will show you how to make a simple parser in Python using Parsimonious, a Python parsing library.
Disclaimer
If you are reading this tutorial, you may want to build a programming language, but this one won't do the amazing things you see in other tutorials. It only shows you how to make a real parser, which means it will parse your code and nothing else.
OK, if you are ready to start, let's go!
The design of our language
In this tutorial, we will make a parser for our own simple language called Cotton. Its design looks like this:
[ x = 120 ]
[ y = "Hello world!" ]
[ print x ]
[ print y ]
Now you know what our language will look like. Let's get started!
Installation
First of all, you need to install Parsimonious. Type the following in your terminal to install it:
pip install parsimonious
Now, in your directory, create a Python file called parser.py; it will contain all of our code. Then open it in your favourite editor/IDE. Mine is Neovim.
First, in parser.py, import the Grammar class from parsimonious.grammar. This lets us define the grammar of our language.
from parsimonious.grammar import Grammar
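Now, we will declare a variable called grammar that will contain, well, our grammar. The r prefix makes it a raw string, so backslashes in regexes such as \s and \d are passed to Parsimonious unchanged.
grammar = Grammar(r"""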
# The grammar here
""")
Replace the # The grammar here part with our grammar:
expr = (statement / emptyline)* # Main part
emptyline = ws+ # Matches blank lines
ws = ~"\s*" # Matches whitespace (possibly none)
# Square brackets
lpar = "[" # Matches the left one
rpar = "]" # Matches the right one
statement = lpar ws? things ws? rpar ws? # Statement
things = (print / declare)* # Commands
print = "print" ws types # The print command
declare = varname ws? equal ws? types ws? # The declare command
varname = ~"[A-Za-z_][A-Za-z0-9_]*" # Matches a variable name
equal = ws? "=" ws? # Matches equal sign
types = (float / int / string / varname)* # Data types; float comes before int so 1.5 is not cut short at 1
# Int, float and string
int = ~"\d+"
float = ~"\d+\.\d+"
string = ~'"[^\"]+"'
Now, let's try it!
# Our test code
code = '''
[ x = "Hello" ]
[ y = 120 ]
[ print x ]
[ print y ]
'''
# Print the parse result
print(grammar.parse(code))
Now, when you run the program, if nothing is wrong, you should see a node tree like this:
<Node called "expr" matching "
[ x = "Hello" ]
[ y = 120 ]
[ print x ]
[ print y ]
">
<Node matching "
">
<Node called "emptyline" matching "
">
<RegexNode called "ws" matching "
">
<RegexNode called "ws" matching "">
<Node matching "[ x = "Hello" ]
">
<Node called "statement" matching "[ x = "Hello" ]
">
<Node called "lpar" matching "[">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "things" matching "x = "Hello" ">
<Node matching "x = "Hello" ">
<Node called "declare" matching "x = "Hello" ">
<RegexNode called "varname" matching "x">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "equal" matching "= ">
<Node matching "">
<RegexNode called "ws" matching "">
<Node matching "=">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node matching "">
<RegexNode called "ws" matching "">
<Node called "types" matching ""Hello"">
<Node matching ""Hello"">
<RegexNode called "string" matching ""Hello"">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node matching "">
<RegexNode called "ws" matching "">
<Node called "rpar" matching "]">
<Node matching "
">
<RegexNode called "ws" matching "
">
<Node matching "[ y = 120 ]
">
<Node called "statement" matching "[ y = 120 ]
">
<Node called "lpar" matching "[">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "things" matching "y = 120 ">
<Node matching "y = 120 ">
<Node called "declare" matching "y = 120 ">
<RegexNode called "varname" matching "y">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "equal" matching "= ">
<Node matching "">
<RegexNode called "ws" matching "">
<Node matching "=">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node matching "">
<RegexNode called "ws" matching "">
<Node called "types" matching "120">
<Node matching "120">
<RegexNode called "int" matching "120">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node matching "">
<RegexNode called "ws" matching "">
<Node called "rpar" matching "]">
<Node matching "
">
<RegexNode called "ws" matching "
">
<Node matching "[ print x ]
">
<Node called "statement" matching "[ print x ]
">
<Node called "lpar" matching "[">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "things" matching "print x">
<Node matching "print x">
<Node called "print" matching "print x">
<Node matching "print">
<RegexNode called "ws" matching " ">
<Node called "types" matching "x">
<Node matching "x">
<RegexNode called "varname" matching "x">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "rpar" matching "]">
<Node matching "
">
<RegexNode called "ws" matching "
">
<Node matching "[ print y ]
">
<Node called "statement" matching "[ print y ]
">
<Node called "lpar" matching "[">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "things" matching "print y">
<Node matching "print y">
<Node called "print" matching "print y">
<Node matching "print">
<RegexNode called "ws" matching " ">
<Node called "types" matching "y">
<Node matching "y">
<RegexNode called "varname" matching "y">
<Node matching " ">
<RegexNode called "ws" matching " ">
<Node called "rpar" matching "]">
<Node matching "
">
<RegexNode called "ws" matching "
">
That's a pretty large node tree, right?
Conclusion
From this tutorial, you now know how to make your own parser in Python. That's the end of my tutorial. Have a nice day, coders! :D
How to make your own parser in Python!
Hello guys!
Pretty cool! Im gonna have to try this!