Basil
elucent (41)

The Basil Programming Language

Hello! Basil is our (@basilTeam: @elucent, @minchingtonak) entry to the Repl.it language jam.

Unique Compilation

The goal of Basil is to try to have it all - high-level language features like macros, dynamic code evaluation, minimal type annotations - without any of the runtime costs normally associated with high-level languages - large runtime systems, slow performance, high memory use.

To accomplish this, Basil makes use of a novel system to compile its code. The main innovation is that the Basil compiler is actually an interpreter, able to interpret any Basil expression. When it gets to an expression it wants to save for runtime, it wraps it up and saves it to the end of the program. These runtime values propagate and result in an expression tree of all code reachable at runtime - notably without unnecessary operations between constants or dead code. Basil also statically type-checks this tree, so none of the code it ultimately compiles incurs a dynamic typing or boxing overhead.

Expressive Freedom

The primary goal of Basil as a language, however, isn't just speed. Basil exists to help programmers express complex problems in natural ways. While keeping Basil totally compiled and statically-typed, we've managed to implement the following features to enable uniquely customizable syntax:

  1. Simple, homoiconic code! Basil is a Lisp dialect at its core, and has extremely simple base syntax with no reserved words and few special symbols.

  2. Extremely simple macros! Only one primitive for code interpolation/evaluation, as opposed to the three or so usually present in Lisp dialects.

  3. User-defined infix functions and macros! Can have any arity (unary, binary, or more!) and custom precedence.

  4. Mixfix function syntax! Functions can be defined with keywords in their argument lists, to help break up high-arity function calls into natural-language phrases.

Implementation Details

The Basil compiler was implemented entirely within the language jam, almost entirely from scratch in C++. Only two libraries were used (both created before the jam):

  1. Jasmine, a runtime x86_64 assembler and object format.

  2. Collections, an alternative C++ standard library.

Everything else is totally original, created since August 10th.

The Basil compiler has no runtime dependencies other than libc, and the entire compiler and runtime fit comfortably in under 400 kB.

The code is first tree-walk interpreted by the compiler, using dynamically-typed Basil values to represent code. This is then lowered for runtime values to typed nodes, which are then lowered to a simple IR and then to x86_64 machine code. This code is then linked, loaded and executed just-in-time by the Basil environment.

Try It Out!

The best way to give Basil a try is to work through the interactive tour! It walks you through every core Basil feature, from the basic syntax to infix macros.

You can also try running the Basil REPL. You can reach it by exiting the tour on the project page, and then running ./basil in command line. The REPL mode features a number of commands to view compiler intermediates or generated assembly, so if you're curious how that all fits together give it a shot!

If you're curious how some part of the project works, or are looking for a list of all the language's features, take a look at our README for a somewhat informal language specification.

Support the Project

If you liked the project, consider giving us a star on GitHub, or following the devs (@elucentdev, @minchyAK) on Twitter!

If you'd like to get in touch (with a question, suggestion, etc), the best way to reach us is on Twitter or Discord (our handles are elucent#1839 and minchy#2474).

Thanks so much for taking a look at our project! It was a lot of fun making it, and we're going to be continuing work on it past the jam - so stay tuned! :)

You are viewing a single comment. View All
fuzzyastrocat (1296)

Does Basil have a garbage collector? If not, how does it manage memory?

elucent (41)

@fuzzyastrocat We didn't implement one for the jam, so the current version leaks a bunch of memory. We'll be implementing one for a future release though.

fuzzyastrocat (1296)

@elucent Actually, the reason I asked is because I'm not sure if one is needed. I'm experimenting with a functional language of mine called Eros that has no manual memory management but doesn't require a garbage collector.

It compiles to a stack-based VM, and the stack-semantics allow it to have a very simple memory collection scheme: whenever an item is popped from the stack, free its memory and all other objects it contains. This requires some shifty use of memory so that things don't get accidentally freed, but so far it's actually working. I'm wondering if a similar concept could be used for Basil.

elucent (41)

@fuzzyastrocat I've considered using stack-based allocations in the past, but I'm not sure they're a great idea for performance. Using the stack over a separate heap allocation can yield some solid performance gains (I wrote a blog post about it in June), but I think these gains might fall off with more complex object graphs.

It basically comes down to, with stack semantics, you need to perform a collection whenever you pop, be it an object or stack frame. And anything you want to survive outside of the stack frame needs to be copied into the next stack frame down. With a separate heap, you can do fewer GCs overall - likely better for the cache because you don't need to jump into GC code very frequently.

So I guess the main thing is, it really depends. I think I could reuse the same GC infrastructure for both systems, and possibly even allow the user to choose between them. Stack allocations tend to be quite fast on a basic level, but I only profiled them with simple, single-allocated objects like strings. If you had a larger graph of objects that need to persist between frames, I think the performance gains would be cut by all the extra copying you need to do. Still, it's a very interesting concept, and I do plan on investigating it when we implement our GC system.

fuzzyastrocat (1296)

@elucent That's a nice blog post! Cool to hear that you're open to trying new ideas, I like that about the Basil team.

I do agree that a stack is a kind of "shallow" memory model that might make it difficult to implement more complex structures, but I've actually had success by treating more complicated structures as a single entity on the stack. Then, when it's time to deallocate, the structure and all entities inside it are freed.

In Eros, there is a "dummy" stack object right at the bottom (top? it's the first item in the stack frame) of each stack frame. This object contains all the arguments of the current function (and other important call data). The important part is that these objects have not been copied there, they have been moved there. This means that they are only there, so the garbage collector will only collect them once.

Obviously, sometimes you will have arguments used in multiple places. This is solved by the dummy object sometimes containing references to the argument values being in other places. This way, the arguments will only exist in one place at any given time, and they will only be collected once there are no more references pointing to them (which is computed at compile-time — the last time an argument appears will be where the argument is physically moved to).

I'm not sure how applicable this mechanism would be for Basil, but it might be interesting. I'm actually not entirely sure if it feasible but so far I have had success with it.

fuzzyastrocat (1296)

@elucent Oh also, one thing I think I didn't really make clear in my comment:

Even if Eros has to collect memory more often, it's still doing less work because it knows exactly what memory it needs to collect. (Contrast this to a traditional GC which must search through all objects to determine if they are "accessible".) The stack-y system bypasses this search operation, which means it takes constant time (theoretically) to collect garbage rather than time which is proportional (in some degree, perhaps O(n) or O(n^2) or the like, I'd have to do more detailed analysis) to the amount of data currently "in" the program.