SMPL Compiler

tags: Python Compilers

About

SMPL is a small programming language used in academia and can be parsed using an LL(1) parser. It consists of integers, arrays (single and multidimensional) and functions (user- and pre-defined).

Aim

The aim of this project is to implement an optimizing compiler of SMPL language. The entire project is built from scratch, including the lexer and the parser.
The compiler should generate an SSA-based IR, and it should perform optimizations such as copy propagation, common subexpression elimination, and dead code elimination.

Note: this project is a WIP

Implementation

- Generating SSA-based IR

This step involved generating SSA-based IR of the given input program while leaving out arrays and functions. Optimizations such as copy propagation and common subexpression elimination are performed on the fly while generating the IR.

  1. Building a recursive-descent parser

    Based on the given grammar, I first built a simple recursive-descent parser for a subset of the language.

  2. Performing optimizations

    While parsing, the copy propagation and common subexpression elimination optimizations are applied on the fly. To achieve this, I build the dominator tree as I parse the input and before converting any statement into its equivalent instruction, I first check if either of the optimizations can be applied.

    The output of this step is best visualized once the IR Visualization tool is built.

- Building an IR visualization tool

The in-memory data structure of the previous step stores all the information required to build a visualization of the control flow graph (CFG) of the program.

In this step, I simply walk the IR generated and transpile it into DOT language and use graphviz to visualize the CFG.

IR Visualization of a sample program
Optimized IR visualization of a program (Instr for 10 - z is reused)

- Adding array operations

In this step, I extend the compiler to include array operations. These operations also undergo common subexpression elimination to eliminate any redundant loads.
One important point here is to include kill instructions every time a store is performed. This is important as it ensures that the correct value is used every time an array element is accessed.

IR Visualization of program with arrays
IR Visualization of array operations

Note: the kill instructions included here are just for debugging purpose

- Adding support for user-defined functions

After adding array operations, I added support for user-defined functions.

There are a few assumptions that I make here -

IR Visualization of program with UDF
IR Visualization of a UDF

What I learned

Despite facing several challenges, building a compiler from scratch was a lot of fun! Since this project mainly dealt with the middle- and the back-end of a compiler, everything I learned was new to me.

The two main takeaways from this project for me are -

  1. The importance of using the right data structures

    As I performed a lot of operations on the fly, I had to store a lot of information using many data structures. Choosing the right data structure here was crucial, as the number of data structures being used and the amount of duplicate info being stored could otherwise easily blow up.

  2. The importance of OOP and architecting the project well

    I realized the importance of planning of the overall project design when I was dealing with user-defined functions. Since I had already created a separate class for a CFG keeping the overall program structure in mind, I was able to leverage this design advantage, making my work of dealing with user-defined functions as simple as creating another object of the CFG class.

Home  |  /projects/ Github Linkedin Email Resume Website Hugo theme by Yukuro