How to make your own parser in Python!
Wumi4

Hello guys!

As the title says, today I will show you how to make a simple parser in Python using Parsimonious, a PEG parsing library for Python.

Disclaimer

If you are reading this tutorial, you may want to build a programming language, but this one won't do the amazing things you see in other tutorials. This tutorial only shows you how to make a real parser: it will parse your code and nothing else.

Ok, if you are ready to start, let's go!

The design of our language

In this tutorial, we will make a parser for our own simple language called Cotton. Its syntax looks like this:

[ x = 120 ]
[ y = "Hello world!" ]
[ print x ]
[ print y ]

Now that you know what our language will look like, let's get started!
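Before writing any grammar, it can help to pin the syntax down precisely. The sketch below is not part of the tutorial's parser: it is a rough approximation using Python's built-in re module, and the names NAME, VALUE, STATEMENT and describe are made up for illustration. It recognizes one statement at a time, either [ name = value ] or [ print value ], where a value is a float, an int, a double-quoted string or a variable name.

```python
import re

# Hypothetical regex approximation of a single Cotton statement.
# It only sketches the syntax; the real parser in this tutorial uses Parsimonious.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"
VALUE = r'(?:\d+\.\d+|\d+|"[^"]+"|' + NAME + r")"  # float, int, string or variable
STATEMENT = re.compile(
    r"\[\s*(?:"
    + r"(?P<name>" + NAME + r")\s*=\s*(?P<value>" + VALUE + r")"  # [ x = 120 ]
    + r"|print\s+(?P<printed>" + VALUE + r")"  # [ print x ]
    + r")\s*\]"
)


def describe(line):
    """Return ("declare", name, value) or ("print", value), or None if the line is not a statement."""
    match = STATEMENT.fullmatch(line.strip())
    if match is None:
        return None
    if match.group("name") is not None:
        return ("declare", match.group("name"), match.group("value"))
    return ("print", match.group("printed"))


for line in ['[ x = 120 ]', '[ y = "Hello world!" ]', "[ print x ]", "[ print y ]"]:
    print(describe(line))
```

A regex like this gets unwieldy quickly once statements can nest or span lines, which is one reason to describe the language with a grammar instead.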

Installation

First of all, you need to install Parsimonious. Type the following in your terminal to install it:

pip install parsimonious

Now, in your project directory, create a Python file called parser.py; it will contain all of our code. Then open it in your favourite editor/IDE. Mine is Neovim.

First, in parser.py, import the Grammar class from parsimonious.grammar. It lets us define the grammar of our language.

from parsimonious.grammar import Grammar

Now, we will declare a variable called grammar that will contain, well, our grammar.

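# A raw string (the r prefix) stops Python from treating the backslashes in our regexes, such as \s and \d, as escape sequences
grammar = Grammar(r"""
    # The grammar here
""")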

Replace the # The grammar here part with our grammar:

    expr = (statement / emptyline)* # Main part
    emptyline = ws+ # Matches empty lines

    ws = ~"\s*" # Matches whitespace
    # Square brackets
    lpar = "[" # Matches the left one
    rpar = "]" # Matches the right one
    statement = lpar ws? things ws? rpar ws? # Statement

    things = (print / declare)* # Commands
    print = "print" ws types # The print command
    declare = varname ws? equal ws? types ws? # The declare command
    varname = ~"[A-Za-z_][A-Za-z0-9_]*" # Matches a variable name
    equal = ws? "=" ws? # Matches the equal sign

    # float must come before int: choices are tried in order,
    # so int would otherwise match only the "1" of "1.5"
    types = (float / int / string / varname)* # Data types

    # Int, float and string
    int = ~"\d+"
    float = ~"\d+\.\d+"
    string = ~'"[^\"]+"'
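One thing to watch in the types rule: in a PEG grammar like Parsimonious's, the alternatives separated by / are tried from left to right, the first one that succeeds wins, and once one has succeeded the others are never tried. The sketch below imitates that behaviour with Python's re module (ordered_choice is a hypothetical helper, not part of Parsimonious) to show why a longer alternative such as float has to be listed before int, which matches a prefix of it.

```python
import re


def ordered_choice(alternatives, text):
    """Mimic PEG ordered choice: return (name, matched text) for the first
    alternative that matches at the start of text, or None if none does."""
    for name, pattern in alternatives:
        match = re.match(pattern, text)
        if match:
            return name, match.group()
    return None


int_first = [("int", r"\d+"), ("float", r"\d+\.\d+")]
float_first = [("float", r"\d+\.\d+"), ("int", r"\d+")]

print(ordered_choice(int_first, "1.5"))    # ('int', '1'): ".5" is left unparsed
print(ordered_choice(float_first, "1.5"))  # ('float', '1.5')
print(ordered_choice(float_first, "120"))  # ('int', '120')
```

With int listed first, a statement such as [ x = 1.5 ] leaves .5 unmatched, the closing ] is never reached, and the whole parse fails.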

Now, let's try it!

# Our test code
code = '''
[ x = "Hello" ]
[ y = 120 ]
[ print x ]
[ print y ]
'''

# Print the parse result
print(grammar.parse(code))

Now run the program. If everything went well, you should see a node tree like this:

<Node called "expr" matching "
[ x = "Hello" ]
[ y = 120 ]
[ print x ]
[ print y ]
">
    <Node matching "
    ">
        <Node called "emptyline" matching "
        ">
            <RegexNode called "ws" matching "
            ">
            <RegexNode called "ws" matching "">
    <Node matching "[ x = "Hello" ]
    ">
        <Node called "statement" matching "[ x = "Hello" ]
        ">
            <Node called "lpar" matching "[">
            <Node matching " ">
                <RegexNode called "ws" matching " ">
            <Node called "things" matching "x = "Hello" ">
                <Node matching "x = "Hello" ">
                    <Node called "declare" matching "x = "Hello" ">
                        <RegexNode called "varname" matching "x">
                        <Node matching " ">
                            <RegexNode called "ws" matching " ">
                        <Node called "equal" matching "= ">
                            <Node matching "">
                                <RegexNode called "ws" matching "">
                            <Node matching "=">
                            <Node matching " ">
                                <RegexNode called "ws" matching " ">
                        <Node matching "">
                            <RegexNode called "ws" matching "">
                        <Node called "types" matching ""Hello"">
                            <Node matching ""Hello"">
                                <RegexNode called "string" matching ""Hello"">
                        <Node matching " ">
                            <RegexNode called "ws" matching " ">
            <Node matching "">
                <RegexNode called "ws" matching "">
            <Node called "rpar" matching "]">
            <Node matching "
            ">
                <RegexNode called "ws" matching "
                ">
    <Node matching "[ y = 120 ]
    ">
        <Node called "statement" matching "[ y = 120 ]
        ">
            <Node called "lpar" matching "[">
            <Node matching " ">
                <RegexNode called "ws" matching " ">
            <Node called "things" matching "y = 120 ">
                <Node matching "y = 120 ">
                    <Node called "declare" matching "y = 120 ">
                        <RegexNode called "varname" matching "y">
                        <Node matching " ">
                            <RegexNode called "ws" matching " ">
                        <Node called "equal" matching "= ">
                            <Node matching "">
                                <RegexNode called "ws" matching "">
                            <Node matching "=">
                            <Node matching " ">
                                <RegexNode called "ws" matching " ">
                        <Node matching "">
                            <RegexNode called "ws" matching "">
                        <Node called "types" matching "120">
                            <Node matching "120">
                                <RegexNode called "int" matching "120">
                        <Node matching " ">
                            <RegexNode called "ws" matching " ">
            <Node matching "">
                <RegexNode called "ws" matching "">
            <Node called "rpar" matching "]">
            <Node matching "
            ">
                <RegexNode called "ws" matching "
                ">
    <Node matching "[ print x ]
    ">
        <Node called "statement" matching "[ print x ]
        ">
            <Node called "lpar" matching "[">
            <Node matching " ">
                <RegexNode called "ws" matching " ">
            <Node called "things" matching "print x">
                <Node matching "print x">
                    <Node called "print" matching "print x">
                        <Node matching "print">
                        <RegexNode called "ws" matching " ">
                        <Node called "types" matching "x">
                            <Node matching "x">
                                <RegexNode called "varname" matching "x">
            <Node matching " ">
                <RegexNode called "ws" matching " ">
            <Node called "rpar" matching "]">
            <Node matching "
            ">
                <RegexNode called "ws" matching "
                ">
    <Node matching "[ print y ]
    ">
        <Node called "statement" matching "[ print y ]
        ">
            <Node called "lpar" matching "[">
            <Node matching " ">
                <RegexNode called "ws" matching " ">
            <Node called "things" matching "print y">
                <Node matching "print y">
                    <Node called "print" matching "print y">
                        <Node matching "print">
                        <RegexNode called "ws" matching " ">
                        <Node called "types" matching "y">
                            <Node matching "y">
                                <RegexNode called "varname" matching "y">
            <Node matching " ">
                <RegexNode called "ws" matching " ">
            <Node called "rpar" matching "]">
            <Node matching "
            ">
                <RegexNode called "ws" matching "
                ">

That's a pretty large node tree, right?

Conclusion

With this tutorial, you now know how to make your own parser in Python. That's the end of my tutorial. Have a nice day, coders! :D
