IIT Bombay

tags: Compilers Data Flow Analysis

Position: Research Intern

Oct 2019 - Jun 2020

Tech Stack: Java, ANTLR

About

As a Research Intern, I worked alongside Anshuman Dhuliya, a PhD student at IIT Bombay, on his project Synergistic Program ANalyzer (SPAN). My role was to contribute to the front end of SPAN: a source-to-source compiler that takes a high-level specification of a data flow analysis as input and transpiles it into its Python equivalent.

Problem

SPAN consumes data flow analyses, written in Python, and applies the resulting optimizations to an input program. While there are several ways to specify a data flow analysis, SPAN needed the specifications in a specific format, and end users needed an intuitive way to write them.

Goal

To create a domain-specific language that enables users to specify data flow analyses in an intuitive way, and to transpile those specifications into their Python equivalents.

Process

- Devising the domain specific language - specDFA

  1. Understanding lexers and parsers

    To gain a better understanding of lexers and parsers, I familiarized myself with Flex and Bison by creating a JSON parser.

  2. Understanding data flow analysis

    This was one of the most important steps, as it would determine the syntax of the language to be developed. I studied the different types of analyses, such as Available Expressions, Constant Propagation, and Live Range Analysis (a minimal sketch of one such analysis follows the sample spec below).

  3. Deciding upon the language syntax

    I acquainted myself with Haskell’s syntax in order to adapt it for specDFA. After several iterations, we settled on specDFA’s syntax.

Sample spec for Available Expression analysis
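
To make the analysis itself concrete, here is a minimal, self-contained sketch of the classic iterative algorithm for Available Expressions. The control-flow graph, GEN/KILL sets, and block names below are invented for illustration; none of this is SPAN's or specDFA's actual code.

# Hypothetical three-block CFG: entry -> b1 -> b2, with a back edge b2 -> b1.
preds = {"b1": ["entry", "b2"], "b2": ["b1"]}
universe = {"a+b", "a*b", "b+c"}         # every expression in the program
gen  = {"b1": {"a+b"}, "b2": {"a*b"}}    # expressions computed in the block
kill = {"b1": {"b+c"}, "b2": set()}      # expressions whose operands are redefined

# Available Expressions is a forward "must" analysis:
#   IN[B]  = intersection of OUT[P] over all predecessors P
#   OUT[B] = GEN[B] | (IN[B] - KILL[B])
# Initialize OUT to the full universe (top) everywhere except the entry block.
out = {"entry": set(), "b1": set(universe), "b2": set(universe)}

changed = True
while changed:
    changed = False
    for b in ("b1", "b2"):
        in_b = set.intersection(*(out[p] for p in preds[b]))
        new_out = gen[b] | (in_b - kill[b])
        if new_out != out[b]:
            out[b], changed = new_out, True

print(out)  # fixed point: expressions guaranteed available at each block's exit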

- Converting specDFA to Python

  1. Building the Abstract Syntax Tree (AST)

    After exploring a few options, I chose ANTLR to build the AST, as it provides a convenient way to build and walk the parse tree. While walking the parse tree, I stored all the required context at every step and built the AST needed for the code-generation step.

  2. Walking the AST and emitting Python code

    I used the builder pattern to emit the equivalent Python code. Since the AST stores all the required information, this step simply involved walking the AST and string-formatting the stored context into the final Python code (a minimal sketch of this emission step follows the generated example below).

from typing import Optional

# `types` and `obj` are SPAN-internal modules that provide the Lattice
# base class, expression types, and function objects.
class AvL(types.Lattice):
    def __init__(self, func: obj.Func, val: Optional[set[types.Expr]] = None, top: bool = False, bot: bool = False):
        super().__init__(func, val, top, bot)

    def meet(self, other):
        # Available Expressions is a "must" analysis, so meet is set intersection.
        return self.val.intersection(other.val)

    def __lt__(self, other):
        # Partial order on the lattice: subset inclusion.
        return self.val.issubset(other.val)
        
Python equivalent of the above specDFA input
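
To make the emission step concrete, the following is a minimal, self-contained sketch of how an AST node could be string-formatted into a lattice class like the one above. The node shape, field names, and emit function are hypothetical illustrations, not SPAN's actual front end.

from dataclasses import dataclass

# Hypothetical AST node for an analysis spec; the real front end's
# node types and fields are not shown here.
@dataclass
class AnalysisSpec:
    name: str      # lattice class name, e.g. "AvL"
    meet_op: str   # set method implementing meet
    order_op: str  # set method implementing the partial order

CLASS_TEMPLATE = """\
class {name}(types.Lattice):
    def meet(self, other):
        return self.val.{meet_op}(other.val)

    def __lt__(self, other):
        return self.val.{order_op}(other.val)
"""

def emit(spec: AnalysisSpec) -> str:
    # A full walk would visit every AST node and accumulate fragments;
    # with a single node, the walk collapses to one template instantiation.
    return CLASS_TEMPLATE.format(
        name=spec.name, meet_op=spec.meet_op, order_op=spec.order_op
    )

print(emit(AnalysisSpec("AvL", "intersection", "issubset")))

In the real front end, the builder accumulates such fragments node by node while walking the ANTLR-produced AST.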

What I learned
