Share your Programming Language Jam submissions here!

Basil
elucent (41)

The Basil Programming Language

Hello! Basil is our (@basilTeam: @elucent, @minchingtonak) entry to the Repl.it language jam.

Unique Compilation

The goal of Basil is to have it all: high-level language features - macros, dynamic code evaluation, minimal type annotations - without the runtime costs normally associated with high-level languages: large runtime systems, slow performance, and high memory use.

To accomplish this, Basil makes use of a novel system to compile its code. The main innovation is that the Basil compiler is actually an interpreter, able to interpret any Basil expression. When it gets to an expression it wants to save for runtime, it wraps it up and saves it to the end of the program. These runtime values propagate and result in an expression tree of all code reachable at runtime - notably without unnecessary operations between constants or dead code. Basil also statically type-checks this tree, so none of the code it ultimately compiles incurs a dynamic typing or boxing overhead.
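You can picture this staged evaluation as constant folding generalized to whole expression trees. Here is a minimal sketch in C++ (hypothetical - this is not the actual Basil compiler, which operates on Basil values and emits machine code; the names `Expr`, `lit`, `fold`, etc. are made up for illustration):

```cpp
#include <memory>
#include <optional>

// Hypothetical sketch: an expression node either has a value known at
// compile time, or depends on a runtime-only input. "Compiling" means
// interpreting everything we can and keeping only the residual tree.
struct Expr {
    std::optional<int> value;          // set when known at compile time
    char op = 0;                       // '+' or '*' for interior nodes
    std::shared_ptr<Expr> lhs, rhs;
};

std::shared_ptr<Expr> lit(int v) {
    auto e = std::make_shared<Expr>();
    e->value = v;
    return e;
}

std::shared_ptr<Expr> runtime_input() { return std::make_shared<Expr>(); }

std::shared_ptr<Expr> bin(char op, std::shared_ptr<Expr> l,
                          std::shared_ptr<Expr> r) {
    auto e = std::make_shared<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Interpret what we can now; whatever survives is the "program"
// that would be emitted for runtime.
std::shared_ptr<Expr> fold(const std::shared_ptr<Expr>& e) {
    if (!e->lhs) return e;             // leaf: literal or runtime input
    auto l = fold(e->lhs), r = fold(e->rhs);
    if (l->value && r->value)          // both operands known: evaluate now
        return lit(e->op == '+' ? *l->value + *r->value
                                : *l->value * *r->value);
    return bin(e->op, l, r);           // otherwise keep a residual node
}
```

In this toy model, a tree with no runtime inputs folds down to a single constant, while subtrees that touch a runtime input survive as the residual program - which is the shape of the behavior described above.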

Expressive Freedom

The primary goal of Basil as a language, however, isn't just speed. Basil exists to help programmers express complex problems in natural ways. While keeping Basil totally compiled and statically-typed, we've managed to implement the following features to enable uniquely customizable syntax:

  1. Simple, homoiconic code! Basil is a Lisp dialect at its core, and has extremely simple base syntax with no reserved words and few special symbols.

  2. Extremely simple macros! Only one primitive for code interpolation/evaluation, as opposed to the three or so usually present in Lisp dialects.

  3. User-defined infix functions and macros! Can have any arity (unary, binary, or more!) and custom precedence.

  4. Mixfix function syntax! Functions can be defined with keywords in their argument lists, to help break up high-arity function calls into natural-language phrases.

Implementation Details

The Basil compiler was implemented during the language jam, almost entirely from scratch in C++. Only two libraries were used (both created before the jam):

  1. Jasmine, a runtime x86_64 assembler and object format.

  2. Collections, an alternative C++ standard library.

Everything else is totally original, created since August 10th.

The Basil compiler has no runtime dependencies other than libc, and the entire compiler and runtime fit comfortably in under 400 kB.

The code is first tree-walk interpreted by the compiler, using dynamically-typed Basil values to represent code. Runtime values are then lowered to typed nodes, then to a simple IR, and finally to x86_64 machine code. This code is linked, loaded, and executed just-in-time by the Basil environment.

Try It Out!

The best way to give Basil a try is to work through the interactive tour! It walks you through every core Basil feature, from the basic syntax to infix macros.

You can also try the Basil REPL: exit the tour on the project page, then run ./basil on the command line. The REPL mode features a number of commands to view compiler intermediates or generated assembly, so if you're curious how that all fits together, give it a shot!

If you're curious how some part of the project works, or are looking for a list of all the language's features, take a look at our README for a somewhat informal language specification.

Support the Project

If you liked the project, consider giving us a star on GitHub, or following the devs (@elucentdev, @minchyAK) on Twitter!

If you'd like to get in touch (with a question, suggestion, etc), the best way to reach us is on Twitter or Discord (our handles are elucent#1839 and minchy#2474).

Thanks so much for taking a look at our project! It was a lot of fun making it, and we're going to be continuing work on it past the jam - so stay tuned! :)

Comments
sourcerose (15)

Ayyy! Congrats on the Win!

fuzzyastrocat (1289)

Neat! I'm having trouble understanding the compile system though — is it like a JIT compiler? What do you mean by "an expression it wants to save for runtime", does it compile some expressions and interpret others?

elucent (41)

@fuzzyastrocat It's an ahead-of-time compiler that interprets some expressions and compiles others. Specifically, it compiles stateful code, and interprets pure expressions that are guaranteed to terminate. You can see a bit more detail on this at the bottom of the README.

fuzzyastrocat (1289)

@elucent Okay, I think I get it now. But I'm not exactly sure how this achieves the goal listed above... wouldn't interpreting (evaluating) some expressions make it slower?

(Correct me if I'm wrong, not trying to downplay your cool lang!)

elucent (41)

@fuzzyastrocat So the trick is, interpretation happens totally separately from compilation! The compiler interprets some expressions in advance, and only the other expressions it didn't interpret make it into the final binary. So, the compile times might be a bit longer - the interpreter is indeed not very fast - but actually running the compiled program involves no interpreter!

You can think of it kind of like a really generalized kind of constant folding, if you're familiar with that.

fuzzyastrocat (1289)

@elucent Okay yeah, the constant folding idea helped me understand completely. Interesting idea, I could see how this could really be optimized for a [nearly] pure functional language like Haskell.

xxpertHacker (535)

To accomplish this, Basil makes use of a novel system to compile its code. The main innovation is that the Basil compiler is actually an interpreter, able to interpret any Basil expression. When it gets to an expression it wants to save for runtime, it wraps it up and saves it to the end of the program. These runtime values propagate and result in an expression tree of all code reachable at runtime - notably without unnecessary operations between constants or dead code. Basil also statically type-checks this tree, so none of the code it ultimately compiles incurs a dynamic typing or boxing overhead.

That is the weirdest, most indirect way that I've ever seen someone describe the concept "optimization" (regarding constant expressions).

Seems perfectly equivalent to C++'s concept of constexpr/constinit, can you confirm?

fuzzyastrocat (1289)

@xxpertHacker It's basically constant folding, except it works for literally anything that can be evaluated at compile-time. So if you write a fib function that has no side-effects (print statements, etc) and you write fib(10) somewhere in your code, then at compile-time that fib(10) will be replaced with 55.

While this definitely does help the goal of being a high-level language with low-level speed, I feel like there's a lot of runtime things which are what slow down high-level languages (automatic memory management being the first thing that comes to mind, along with a few things like dynamic dispatch). I'd be interested to know what the Basil team is planning for things like that.

xxpertHacker (535)

@fuzzyastrocat Literally C++'s constexpr.

template <typename T>
constexpr T fibonacci(
    T const n,
    T const a = 0,
    T const b = 1
) {
    return n
        ? fibonacci(n - 1, b, a + b)
        : a;
}
...
std::cout << fibonacci<unsigned long long>( 10 );
// emitted binary only contains a call to `operator<<` with a constant 55

Yet C++ allows the user to manually do it to make sure that it happens at compile-time. (debatable which is better, but a good proportion of non-constexpr code is still folded at compile time)

Really, any language could be optimized like this; I'm not sure whether or not it's rare, though.

I know that Google's V8 ECMAScript engine will optimize constants into static values, but it won't call functions.
Any language that uses LLVM to compile their code should have optimizations like this applied, to a certain degree.
I wonder if languages like Java do this, being a high-level, memory-managed language, which is considered to be less than ideal in performance, according to Basil's view.

This is really just an optimization technique done by the compiler - it's not new. Now, if Basil can successfully use your stack-based memory management idea and optimize the hell out of it, that might be interesting.

fuzzyastrocat (1289)

@xxpertHacker I agree with you on pretty much all your points there. While it does help, I don't think this optimization alone will entirely fulfill the goal of being a high-level language with low-level speed.

My gc-less stack memory management model requires one big thing: strict FP (and by virtue of that, strict immutability). Take that away, and it becomes really inefficient and bad. (I tried it on a custom implementation of Python I'm making... it didn't go too well.) Eros has this — however, because Basil is a LISP descendant, I have a feeling it's not a strictly FP language. In that case, the scheme may be partially applied (and it probably won't be as bad as it was with my Python implementation), but it won't be fully optimal performance-wise.

Now, there may be optimizations that could make it work on a non-strictly FP language. I have yet to find them however.

xxpertHacker (535)

@fuzzyastrocat Immutable FP code is the most optimizable stuff on this Earth - they just don't compare. But yeah, I was literally thinking about how immutability plays into this as I wrote the comment. I believe that, ideally, a language could use whichever it feels appropriate, and determine which to use at compile-time.

For example, if function f generates a short, immutable array that is passed around, it might be eligible for allocation in stack memory; whereas if another function, g, uses a mutable struct, the compiler should use normal memory to store it.

(Can functions be called with these different memory management methods and still work?)

You could say that it's the best of both worlds, at the cost of the compiler developer's time. It could get complicated trying to make different memory management systems interoperate in the same program, all without the programmer ever having to care.

Maybe one could just statically analyze all possible code paths and choose the ideal place to insert memory deallocation instructions?

Also, any language attempting to gain low-level performance would need lossless abstractions to be a core part of the language, and the way that it's written. I believe Rust encourages this.

fuzzyastrocat (1289)

@xxpertHacker Agreed, there's a lot of opportunities for optimization of immutability.

Hmm, yeah that would be interesting. It'd definitely be a puzzle to figure out for the developer, but I think using multiple memory management schemes would be possible.

The difficulty I see is overlap — what happens when f's result is passed to g? You'd have to have two different "kinds" of g: one that accepts something from the stack and one that accepts something from the heap. (Actually, as I'm writing this I'm thinking that encoding that information into the value itself might be more efficient, though more difficult).

The static analysis of code paths is another thing I've been thinking about. It gets difficult since memory can be used in different ways based on runtime situations:

int* reference;

void fun(int* foo) {
  reference = foo;
}

int main() {
  int* value = malloc(sizeof(int));
  if (<blah>) {
    fun(value);
  }
  // *value never used again — deallocate?

  // ... later, do something with *reference
}

Here, we can't deallocate value since reference might point to value and it's used later. So we can do borrow checking like Rust, but that pushes some of it onto the programmer again.

Or does it? I can't prove this, but I have a feeling that an "automatic borrower" could be made, i.e. it would automatically borrow and/or copy memory where needed. If such a system exists, it would probably be complicated, but it might be possible.

You're correct about Rust and lossless abstractions, and yeah that would need to be an integral part of the language itself.

xxpertHacker (535)

@fuzzyastrocat

Obviously, this would incur some memory overhead, as well as a slight performance overhead since every time a reference to the value was made it would have to be added to the value's reference list and it would have to be removed when that reference goes out of scope.

Sounds like reference counting, can you confirm? Or are you suggesting something that is moderately different?
If so, deferred reference counting, at a language-level, could help out quite a lot here, falling back to normal reference counting when it can't be used.

The difficulty I see is overlap — what happens when f's result is passed to g? You'd have to have two different "kinds" of g: one that accepts something from the stack and one that accepts something from the heap.

Yes! This is precisely what I had referred to.

(Can functions be called with these different memory management methods and still work?)

If functions f and g both accept an object - a value that needs to live in memory, as opposed to a single integer/float - the function could be given an object that was "moved" and is stack-allocated, or it could be called with an object stored in "normal" memory. In reality, the instructions that operate on the data are inherently different, meaning a branch must exist somewhere.
Maybe it's like C++'s overloading or generic code, where a separate function would be statically linked and called in place of what the programmer thought was only one function.
Or we could have a runtime branch, but that would introduce runtime overhead.

How would one deal with functions then?

T f (reference) {
    ...
}

T g (reference) {
    ...
}

T h (reference) {
    let functionArray = [ f, g ];
    functionArray[ index ]( reference );
}

Stuff like that (not uncommon in my own code) would not be fun to statically analyze. One could insert extra code to intercept which function was called - a sort of "boxed" function, almost allowing dynamic types to be passed in - and then call and return from the proper function (ideally while reusing the stack frame for extra optimization).

There are infinite ways one might solve this problem, I wonder how Basil will handle it.

The static analysis of code paths is another thing I've been thinking about. It gets difficult since memory can be used in different ways based on runtime situations:

Oh, of course, static analysis can always be complicated, but the key idea is that when it can help, the information gained should be put to use, improving the programs it can while leaving the rest unchanged. If it's too complicated, the compiler can bail out of the optimization and fall back to whatever it would currently do, as if the optimization had never been applied at all.

fuzzyastrocat (1289)

@xxpertHacker I deleted that part when I realized how similar to reference counting it was.

Right, there'd have to be some kind of branch. Except for situations where it could be deduced at compile-time what type of value was getting passed to the function.

What do you mean? In the example you give there, zero copies would need to be made. The function selected there would be passed a reference (the reference passed to h), and whichever function that was would get ownership of it (it doesn't matter which one, it just matters that a function borrowed it).

Agreed, I'm interested to see how Basil handles it.

Right. Also, you can always implement static analysis in stages, i.e. start with a naive algorithm and slowly build up to a complex one.

xxpertHacker (535)

@fuzzyastrocat

What do you mean? In the example you give there, zero copies would need to be made. The function selected there would be passed a reference (the reference passed to h), and whichever function that was would get ownership of it (it doesn't matter which one, it just matters that a function borrowed it).

What matters is that the function being called cannot easily be determined, and it may later be called with a different object.

The function, when called, would have to be set up with instructions prepared to operate on the data that was passed to it, regardless of whether the value was stack-allocated or allocated in linear memory.

f(stack_object); // can be statically determined which to call
if ( condition ) {
    f(stack_object); // same thing, just in a branch :/
}
arr = [ f ];

arr[ index ](stack_object); // nothing weird yet, all functions are assumed to be their stack-allocated-accepting variants

arr[ index ](linear_object); // wait, now the **same** function needs to statically take a different shaped object??

The last example is where the problem arises from using different memory storage methods. It has nothing to do with copying at all - although the function itself might make a copy, which is why it matters: the function needs to know how to traverse the object, which differs based on how it is stored (stack vs linear memory).

fuzzyastrocat (1289)

@xxpertHacker Ohhhh, you're talking about the stack-allocated/heap-allocated thing (I thought you were talking about my auto-borrow idea). Yeah, that's where having the "what type of allocation is it" data encoded into the value would be nice, since that way the function could easily behave differently based on what type of value it gets.

xxpertHacker (535)

@fuzzyastrocat But not without runtime performance overhead and/or binary bloat.

elucent (41)

@xxpertHacker I hadn't checked Repl in a while, so I'm sorry it took a while, but I figured it'd be good to conclusively respond to your point.

In short, yes, from an evaluation perspective you can think of Basil as constexpr-by-default. The difference between Basil and C++ (or just about any other language) is that Basil specifically leverages this for lots of other compile-time features. New types, generic functions, metaprogramming - in a language like C++ these each require separate support in the language, generally involving dedicated grammar. In a language like Scheme or Python, you can use existing procedural abstractions to manage these things to an extent, but only with a substantial runtime cost. In Basil, these kind of fall out of the woodwork - types and quoted code are just values, and you can manipulate them with expressions and functions like anything else. But because they're evaluated constexpr-by-default, like everything else, there's no runtime cost incurred. Overall, the fundamental pieces of Basil's evaluation system definitely aren't that new, but besides Basil I'm not aware of any other language that combines them in quite the same way.

For the record I actually think C++ is one of the languages that comes closest to what Basil offers, what with constexpr and templates and such. I think the two main drawbacks (which also apply to languages with similar offerings like Haskell) are:

  • Inflexible grammar. Templates are pretty clunky and unreadable, especially if you go "all-out" with them. Basil of course uses plain old functions in places you'd otherwise need a template, and additionally supports infix functions and keywords, so the user has almost total control over how their function can be used.

  • Inability to manipulate code. Pretty straightforward, the C preprocessor is obviously a pretty weak and unsafe form of code manipulation, and this limits the programmer's ability to simplify their APIs and avoid repeated work. I work primarily in C++ and I find myself running up against this wall quite frequently. So Basil providing code manipulation and evaluation features in the same way it supports all other computations is something I'm quite happy with.

xxpertHacker (535)

@elucent

Inability to manipulate code. Pretty straightforward, the C preprocessor is obviously a pretty weak and unsafe form of code manipulation, and this limits the programmer's ability to simplify their APIs and avoid repeated work. I work primarily in C++ and I find myself running up against this wall quite frequently. So Basil providing code manipulation and evaluation features in the same way it supports all other computations is something I'm quite happy with.

Can you elaborate on that point in particular?

How does it look in Basil, what does it offer?

elucent (41)

@xxpertHacker Nothing complicated, it's mostly the same as in any Lisp. Code can be quoted (using the prefix : operator) to skip evaluating it and treat it as data, and data can be spliced (using the | ... | delimiters) into other code for evaluation. I think it might be bugged in the current REPL version unfortunately, but you could for example write the following function:

def (do stmt n :times)
    while n > 0
        |stmt| 
        n = n - 1
do :(display "hello world") 2 times

Basil also supports macros, for convenience, which automatically quote their arguments and splice their bodies into the containing code. They can be infix as well, so we could actually write the following:

infix-macro (stmt repeat n)
    def i |n|
    while i > 0
        |stmt|
        i = i - 1
(display "hello world") repeat 2

It lets us skip the explicit quoting when we use the procedure.

It's important to note that all of this is still just operating on simple Basil values - :(display "hello world") is just a list of a symbol and string, and could be equivalently written [:display "hello world"]. Even splice and quote, beneath the syntactic sugar, are just built-in functions - not keywords.

This means that all of this can be manipulated and composed with the same mechanisms we'd use for any other value, whereas in most other languages a special, non-composable macro grammar would be necessary (like #define or Rust's macros). The innovation in Basil over languages that have historically supported homoiconic metaprogramming is that Basil's compiler resolves all these operations ahead-of-time (erroring if it's impossible to do so), so there's no runtime cost, even though it's still possible to use arbitrary expressions.

DaRubyMiner360 (0)

Congrats on winning!

LucasAllori (27)

Great job at making this!

JaydenLiu1 (9)

Is this coding language Open-Source? It's not like I want the source code anyways...

elucent (41)

@JaydenLiu1 the source is on the Repl.it page! But we'll probably be doing most of our post-jam development on GitHub.

JaydenLiu1 (9)

@elucent Congrats winning the PLJAM!

JaydenLiu1 (9)

Wow! Nice coding language! Congrats on winning, and keep doing this! Never stop making awesome stuff!

LucasAllori (27)

If I wanted to make a programming language, where would I start? (Congratulations for winning btw)

fuzzyastrocat (1289)

Does Basil have a garbage collector? If not, how does it manage memory?

elucent (41)

@fuzzyastrocat We didn't implement one for the jam, so the current version leaks a bunch of memory. We'll be implementing one for a future release though.

fuzzyastrocat (1289)

@elucent Actually, the reason I asked is because I'm not sure if one is needed. I'm experimenting with a functional language of mine called Eros that has no manual memory management but doesn't require a garbage collector.

It compiles to a stack-based VM, and the stack-semantics allow it to have a very simple memory collection scheme: whenever an item is popped from the stack, free its memory and all other objects it contains. This requires some shifty use of memory so that things don't get accidentally freed, but so far it's actually working. I'm wondering if a similar concept could be used for Basil.
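A minimal sketch of that free-on-pop scheme in C++ (hypothetical - not Eros's actual implementation; `live_objects` is just instrumentation for the example):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical free-on-pop sketch: each stack slot owns its heap
// object, so popping a slot is the only collection point needed -
// no tracing garbage collector ever runs.
static std::size_t live_objects = 0;   // instrumentation for the sketch

struct Object {
    int data;
    explicit Object(int d) : data(d) { ++live_objects; }
    ~Object() { --live_objects; }
};

struct VmStack {
    std::vector<Object*> slots;
    void push(int v) { slots.push_back(new Object(v)); }
    int pop() {                        // free the owned object on pop
        int v = slots.back()->data;
        delete slots.back();
        slots.pop_back();
        return v;
    }
};
```

The "shifty use of memory" mentioned above is what this sketch omits: anything that must outlive its slot has to be moved or referenced elsewhere before the pop, or it gets freed out from under you.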

elucent (41)

@fuzzyastrocat I've considered using stack-based allocations in the past, but I'm not sure they're a great idea for performance. Using the stack over a separate heap allocation can yield some solid performance gains (I wrote a blog post about it in June), but I think these gains might fall off with more complex object graphs.

It basically comes down to, with stack semantics, you need to perform a collection whenever you pop, be it an object or stack frame. And anything you want to survive outside of the stack frame needs to be copied into the next stack frame down. With a separate heap, you can do fewer GCs overall - likely better for the cache because you don't need to jump into GC code very frequently.

So I guess the main thing is, it really depends. I think I could reuse the same GC infrastructure for both systems, and possibly even allow the user to choose between them. Stack allocations tend to be quite fast on a basic level, but I only profiled them with simple, single-allocated objects like strings. If you had a larger graph of objects that need to persist between frames, I think the performance gains would be cut by all the extra copying you need to do. Still, it's a very interesting concept, and I do plan on investigating it when we implement our GC system.

fuzzyastrocat (1289)

@elucent That's a nice blog post! Cool to hear that you're open to trying new ideas, I like that about the Basil team.

I do agree that a stack is a kind of "shallow" memory model that might make it difficult to implement more complex structures, but I've actually had success by treating more complicated structures as a single entity on the stack. Then, when it's time to deallocate, the structure and all entities inside it are freed.

In Eros, there is a "dummy" stack object right at the bottom (top? it's the first item in the stack frame) of each stack frame. This object contains all the arguments of the current function (and other important call data). The important part is that these objects have not been copied there, they have been moved there. This means that they are only there, so the garbage collector will only collect them once.

Obviously, sometimes you will have arguments used in multiple places. This is solved by the dummy object sometimes containing references to the argument values being in other places. This way, the arguments will only exist in one place at any given time, and they will only be collected once there are no more references pointing to them (which is computed at compile-time — the last time an argument appears will be where the argument is physically moved to).

I'm not sure how applicable this mechanism would be for Basil, but it might be interesting. I'm actually not entirely sure if it's feasible, but so far I have had success with it.

fuzzyastrocat (1289)

@elucent Oh also, one thing I think I didn't really make clear in my comment:

Even if Eros has to collect memory more often, it's still doing less work because it knows exactly what memory it needs to collect. (Contrast this to a traditional GC which must search through all objects to determine if they are "accessible".) The stack-y system bypasses this search operation, which means it takes constant time (theoretically) to collect garbage rather than time which is proportional (in some degree, perhaps O(n) or O(n^2) or the like, I'd have to do more detailed analysis) to the amount of data currently "in" the program.

ironblockhd (388)

dang you really deserved to win with this much effort put into it!

TheForArkLD (734)

Hello Lua, and congrats! 👏

fuzzyastrocat (1289)

@TheForArkLD I don't really think this is like Lua - I'd say it's more like Crystal, since it's a high-level language with performance comparable to C (well, I dunno about Basil, I haven't tried it yet, but I assume that's the goal)

ridark (102)

This is soooooo cool, love it keep it up @basilTeam @elucent @minchingtonak

realTronsi (762)

Congrats and good job!

fuzzyastrocat (1289)

Good job on the win! Not really a fan of LISP-y syntax, but I will definitely learn this lang since you won!