Compiler

You can view the code for this project here.

Writing a compiler for the "Unnamed Language" was the subject of the four homework assignments in my Compiler Construction class at UVic. This project spanned the entire semester, and seeing everything come together into a real, working compiler made it one of my favourite projects to date.

Lexer/parser

The first order of business was the implementation of a lexer and parser for the language. For this I elected to use the ANTLR3 parser generator. The main difficulty of this step was getting used to ANTLR's syntax and error messages, but otherwise it was fairly straightforward. At this point, the program would send an error message if there was a syntax error in the input, and do nothing otherwise.

Abstract syntax tree

Once I had a working parser for the language, I added code to return an abstract syntax tree upon successful parsing of the input program. Storing the code in a tree structure allows for much easier analysis of the program. Specifically, type checking of the program lends itself very nicely to this tree structure. An expression such as a + b will be stored as a binary PlusExpression vertex with two Variable children with known types, and we can simply ensure that there is a valid way to add the types of a and b together. Once this is checked, we can determine the type of the PlusExpression vertex, and use it in further analysis if necessary.

Intermediate code generation

With type checking implemented and tested, I could be reasonably sure that any input programs continuing past this point are at least syntactically correct. Being able to assume correct syntax makes any semantic analysis much easier. At this point, the compiler generates intermediate representation code and ensures that all variables exist in any scope in which they're used, among other minor analyses.

JVM bytecode generation

The syntax of our intermediate representation code was already very similar to the JVM's bytecode, so while this last compilation step isn't a direct one-to-one translation, it is very straightforward. Now the compiler outputs valid JVM bytecode that can be assembled into an executable class file.

My main takeaways from this project were learning how to design and maintain the code from a higher level than I was previously used to. Since I would be using the same code base over the entire semester, I had to design, document, and track my implementation in such a way that it would be easily modifiable as I added features.