About
SMPL is a small programming language used in academia and can be parsed using an LL(1) parser. It consists of integers, arrays (single and multidimensional) and functions (user- and pre-defined).
Aim
The aim of this project is to implement an optimizing compiler of SMPL language. The entire project is built from
scratch, including the lexer and the parser.
The compiler should generate an SSA-based IR, and it should perform optimizations such as copy propagation,
common subexpression elimination, and dead code elimination.
Note: this project is a WIP
Implementation
- Generating SSA-based IR
This step involved generating SSA-based IR of the given input program while leaving out arrays and functions. Optimizations such as copy propagation and common subexpression elimination are performed on the fly while generating the IR.
-
Building a recursive-descent parser
Based on the given grammar, I first built a simple recursive-descent parser for a subset of the language. -
Performing optimizations
While parsing, the copy propagation and common subexpression elimination optimizations are applied on the fly. To achieve this, I build the dominator tree as I parse the input and before converting any statement into its equivalent instruction, I first check if either of the optimizations can be applied.
The output of this step is best visualized once the IR Visualization tool is built.
- Building an IR visualization tool
The in-memory data structure of the previous step stores all the information required to build a visualization of the
control flow graph (CFG) of the program.
In this step, I simply walk the IR generated and transpile it into DOT language and use graphviz
to visualize the
CFG.
- Adding array operations
In this step, I extend the compiler to include array operations. These operations also undergo common subexpression
elimination to eliminate any redundant loads.
One important point here is to include kill instructions every time a store is performed. This is important as it
ensures that the correct value is used every time an array element is accessed.
Note: the kill instructions included here are just for debugging purpose
- Adding support for user-defined functions
After adding array operations, I added support for user-defined functions.
There are a few assumptions that I make here -
- Arrays cannot be passed to the functions
- Functions can only return a scalar
- Global variables are not accessible from within the functions
- Variable names are unique, i.e. there are no conflicting names between global and local variables
What I learned
Despite facing several challenges, building a compiler from scratch was a lot of fun!
Since this project mainly dealt with the middle- and the back-end of a compiler, everything I learned was new to me.
The two main takeaways from this project for me are -
-
The importance of using the right data structures
As I performed a lot of operations on the fly, I had to store a lot of information using many data structures. Choosing the right data structure here was crucial, as the number of data structures being used and the amount of duplicate info being stored could otherwise easily blow up. -
The importance of OOP and architecting the project well
I realized the importance of planning of the overall project design when I was dealing with user-defined functions. Since I had already created a separate class for a CFG keeping the overall program structure in mind, I was able to leverage this design advantage, making my work of dealing with user-defined functions as simple as creating another object of the CFG class.