Project partners:
The goal of this assignment was to develop a compiler, named jmm, that translates Java-- programs into Java bytecode. The compiler follows a well-defined compilation flow: lexical analysis, syntactic analysis (using an LL(1) parser), semantic analysis and code generation. Across these stages, it provides:
- Error treatment and recovery mechanisms
- Generation of an Abstract Syntax Tree (AST)
- Generation of a DAG (Directed Acyclic Graph)
- Generation of Java bytecode
To execute the compiler, use the following command:
java -jar jmm.jar [-r=<num>] [-o] <input_file.jmm>
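The option handling for this command line can be sketched in Java as follows. This is a minimal, hypothetical sketch: the class JmmOptions and its fields are not taken from the project, -r=<num> is assumed to carry a non-negative register limit, and any unrecognised argument is simply rejected.

```java
/*
 * Hypothetical option parser for:
 *   java -jar jmm.jar [-r=<num>] [-o] <input_file.jmm>
 * Names are illustrative; they are not taken from the jmm project.
 */
final class JmmOptions {
    final int registers;      // value of -r=<num>, or -1 if -r was not given
    final boolean optimize;   // true if -o was given
    final String inputFile;   // the .jmm source file

    private JmmOptions(int registers, boolean optimize, String inputFile) {
        this.registers = registers;
        this.optimize = optimize;
        this.inputFile = inputFile;
    }

    static JmmOptions parse(String[] args) {
        int registers = -1;
        boolean optimize = false;
        String inputFile = null;
        for (String arg : args) {
            if (arg.startsWith("-r=")) {
                // NumberFormatException (a subclass of IllegalArgumentException) for non-numbers
                registers = Integer.parseInt(arg.substring(3));
                if (registers < 0) {
                    throw new IllegalArgumentException("-r expects a non-negative number: " + arg);
                }
            } else if (arg.equals("-o")) {
                optimize = true;
            } else if (arg.endsWith(".jmm") && inputFile == null) {
                inputFile = arg;
            } else {
                throw new IllegalArgumentException("Unexpected argument: " + arg);
            }
        }
        if (inputFile == null) {
            throw new IllegalArgumentException(
                    "Usage: java -jar jmm.jar [-r=<num>] [-o] <input_file.jmm>");
        }
        return new JmmOptions(registers, optimize, inputFile);
    }

    public static void main(String[] args) {
        // Falls back to a sample command line when run without arguments.
        String[] argv = args.length > 0 ? args : new String[] {"-r=2", "-o", "Example.jmm"};
        JmmOptions opts = parse(argv);
        System.out.println("input=" + opts.inputFile
                + " registers=" + opts.registers + " optimize=" + opts.optimize);
    }
}
```

Both options are optional, so parse(new String[] {"Fac.jmm"}) yields registers = -1 and optimize = false.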
The compiler can recover from a predefined number of errors, following the approach suggested in the project proposal. The numberRecoveries variable (in the file jmm.jjt) sets how many errors the compiler reports, so that the programmer can then correct them. Whenever a new error is found, the parser skips the offending block of code and increments an error counter.
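This strategy resembles classic panic-mode recovery. The sketch below illustrates it on a toy grammar of assignments (IDENT = NUMBER ;) rather than the real Java-- grammar; RecoveringParser, NUMBER_RECOVERIES and the use of ; as the synchronising token are assumptions for illustration, not the contents of jmm.jjt.

```java
import java.util.ArrayList;
import java.util.List;

/*
 * Hypothetical panic-mode recovery sketch on a toy grammar:
 *   program    := { assignment }
 *   assignment := IDENT "=" NUMBER ";"
 * Not the project's jmm.jjt; ";" is assumed to be the synchronising token.
 */
final class RecoveringParser {
    static final int NUMBER_RECOVERIES = 3; // plays the role of numberRecoveries

    private final List<String> tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>(); // its size is the error counter

    RecoveringParser(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Parses every statement and returns how many were parsed without errors. */
    int parseProgram() {
        int parsed = 0;
        while (pos < tokens.size()) {
            try {
                parseAssignment();
                parsed++;
            } catch (IllegalStateException e) {
                errors.add(e.getMessage()); // report the error and count it
                if (errors.size() >= NUMBER_RECOVERIES) {
                    break; // limit reached: give up instead of recovering again
                }
                skipPast(";"); // panic mode: drop tokens up to and including the next ";"
            }
        }
        return parsed;
    }

    private void parseAssignment() {
        expect(peek().matches("[A-Za-z_]\\w*"), "identifier");
        expect(peek().equals("="), "'='");
        expect(peek().matches("\\d+"), "number");
        expect(peek().equals(";"), "';'");
    }

    /** Consumes the current token if ok is true, otherwise raises a syntax error. */
    private void expect(boolean ok, String what) {
        if (!ok) {
            throw new IllegalStateException(
                    "expected " + what + " at token " + pos + " but found " + peek());
        }
        pos++;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void skipPast(String sync) {
        while (pos < tokens.size() && !tokens.get(pos).equals(sync)) {
            pos++;
        }
        if (pos < tokens.size()) {
            pos++; // consume the synchronising token itself
        }
    }

    public static void main(String[] args) {
        RecoveringParser p = new RecoveringParser(
                List.of("x", "=", "1", ";", "y", "=", ";", "z", "=", "3", ";"));
        System.out.println("parsed=" + p.parseProgram() + " errors=" + p.errors);
    }
}
```

Every recovery consumes at least one token, so the loop always makes progress; once the counter reaches the limit, the remaining input is left unparsed.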
The compiler implements the following semantic rules:
- Check that the return value of a function is initialized.
- Check that variables are only assigned values of compatible types.
- Check that each function call matches a declared function with the same signature, that is, the same number of arguments and the same type for each argument.
- Check that the return value of a function can be assigned to the target variable.
- Check that a variable is declared in the scope where it is used.
- Check that the return value of a function can be used in the enclosing expression.
- Check that a variable is not defined more than once.
- When a called function is unknown, assume its return type is the type of the variable its result is assigned to, or void if the result is not assigned to anything.
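Several of these rules revolve around a scoped symbol table and function signatures. The following sketch is hypothetical (SemanticChecks and its methods are illustrative names): types are plain strings, and type compatibility is reduced to equality. It also assumes that a call to a name that was never declared is the "unknown function" case, while a call to a declared name with mismatching arguments is an error.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/*
 * Hypothetical sketch of some Java-- semantic rules. Types are plain strings
 * and "compatible" simply means "equal"; both are simplifications.
 */
final class SemanticChecks {
    // Innermost scope is at the head of the deque (push/pop work on the head).
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    // "name(type1,type2)" -> return type, for every declared function.
    private final Map<String, String> signatures = new HashMap<>();
    private final Set<String> functionNames = new HashSet<>();

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    /** Rule: no double definition. Returns false if name already exists in the current scope. */
    boolean declare(String name, String type) {
        Map<String, String> current = scopes.peek(); // requires a prior enterScope()
        if (current.containsKey(name)) {
            return false;
        }
        current.put(name, type);
        return true;
    }

    /** Rule: variable valid in scope. Returns its type, or null if it is not visible. */
    String lookup(String name) {
        for (Map<String, String> scope : scopes) { // iterates from innermost to outermost
            String type = scope.get(name);
            if (type != null) {
                return type;
            }
        }
        return null;
    }

    void declareFunction(String name, List<String> paramTypes, String returnType) {
        signatures.put(signature(name, paramTypes), returnType);
        functionNames.add(name);
    }

    /**
     * Known name + matching signature: the declared return type.
     * Known name + no matching signature: null (incompatible call, a semantic error).
     * Unknown name: assume the assignment target's type, or "void" if not assigned.
     */
    String resolveCall(String name, List<String> argTypes, String assignedToType) {
        String declared = signatures.get(signature(name, argTypes));
        if (declared != null) {
            return declared;
        }
        if (functionNames.contains(name)) {
            return null;
        }
        return assignedToType == null ? "void" : assignedToType;
    }

    /** Rule: assignment compatibility (simplified to identical type names). */
    static boolean compatible(String targetType, String valueType) {
        return targetType.equals(valueType);
    }

    private static String signature(String name, List<String> types) {
        return name + "(" + String.join(",", types) + ")";
    }

    public static void main(String[] args) {
        SemanticChecks s = new SemanticChecks();
        s.enterScope();
        s.declare("a", "int");
        System.out.println(s.declare("a", "boolean")); // false: "a" is already defined
    }
}
```

Keying functions by their full signature string makes overload resolution a single map lookup, at the cost of treating types as exact matches only.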
The intermediate representation (IR) consists of both the Abstract Syntax Tree (AST) and the DAG (Directed Acyclic Graph). It is built once lexical and syntactic analysis are complete. The IR restructures the Java-- code into a simpler, more manageable form, and it also supports the optimizations performed during code generation.
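The text does not say how the DAG is derived from the AST. A common technique is hash-consing (value numbering), where structurally identical subexpressions map to a single shared node; the sketch below assumes that technique, and Ast, DagNode and DagBuilder are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/*
 * Hypothetical AST -> DAG conversion by hash-consing: structurally identical
 * subtrees are mapped to one shared DAG node. Names are illustrative only.
 */
record Ast(String label, List<Ast> children) {
    static Ast leaf(String label) { return new Ast(label, List.of()); }

    static Ast op(String label, Ast left, Ast right) { return new Ast(label, List.of(left, right)); }
}

record DagNode(int id, String label, List<DagNode> children) {}

final class DagBuilder {
    // Key = [label, childId1, childId2, ...]; equal keys mean identical subtrees.
    private final Map<List<Object>, DagNode> unique = new HashMap<>();

    DagNode build(Ast ast) {
        List<DagNode> kids = new ArrayList<>();
        for (Ast child : ast.children()) {
            kids.add(build(child)); // children first, so their ids already exist
        }
        List<Object> key = new ArrayList<>();
        key.add(ast.label());
        for (DagNode kid : kids) {
            key.add(kid.id());
        }
        DagNode existing = unique.get(key);
        if (existing != null) {
            return existing; // reuse the shared node
        }
        DagNode node = new DagNode(unique.size(), ast.label(), List.copyOf(kids));
        unique.put(key, node);
        return node;
    }

    int size() { return unique.size(); }

    public static void main(String[] args) {
        Ast ab = Ast.op("*", Ast.leaf("a"), Ast.leaf("b"));
        DagBuilder builder = new DagBuilder();
        builder.build(Ast.op("+", ab, Ast.op("*", Ast.leaf("a"), Ast.leaf("b"))));
        System.out.println(builder.size()); // 4 nodes: a, b, a*b, +
    }
}
```

Sharing is only safe while the operands keep their values: a real compiler must stop reusing a node such as a*b once a or b is reassigned.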
Code generation takes as input the DAG (Directed Acyclic Graph), which is generated from the AST (Abstract Syntax Tree). The DAG is traversed starting from its root, and each node is matched with a JVM instruction. These instructions are predefined in the compiler as incomplete templates, in which a ? placeholder marks each value the instruction expects. The values are then supplied in order to replace the ? placeholders, and the completed instructions are written to the class file.
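A minimal sketch of the template idea, assuming templates are strings in which each ? stands for one operand and that nodes are emitted in post-order (operands before the operator, as the JVM operand stack requires). TemplateEmitter, its template table and the node kinds are hypothetical; the mnemonics (iload, bipush, iadd, imul, istore) are standard JVM instructions in Jasmin syntax. For simplicity the sketch walks the graph as a tree, so a shared DAG node would be emitted once per use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/*
 * Hypothetical template-based emitter: each node kind maps to an instruction
 * template in which every '?' marks one operand supplied later, in order.
 */
final class TemplateEmitter {
    static final Map<String, String> TEMPLATES = Map.of(
            "load", "iload ?",    // push the int in local variable ?
            "const", "bipush ?",  // push a byte-sized constant as an int
            "add", "iadd",
            "mul", "imul",
            "store", "istore ?"); // pop an int into local variable ?

    /** A node: its kind selects a template, values fill the template's '?'s. */
    record Node(String kind, List<Node> children, List<Object> values) {}

    /** Replaces each '?' in the template, left to right, with the next value. */
    static String fill(String template, List<Object> values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : template.toCharArray()) {
            if (c != '?') {
                out.append(c);
            } else if (next < values.size()) {
                out.append(values.get(next++));
            } else {
                throw new IllegalArgumentException("missing operand for: " + template);
            }
        }
        if (next != values.size()) {
            throw new IllegalArgumentException("too many operands for: " + template);
        }
        return out.toString();
    }

    /** Emits children before their parent, since JVM operands live on a stack. */
    static void emit(Node node, List<String> code) {
        for (Node child : node.children()) {
            emit(child, code);
        }
        code.add(fill(TEMPLATES.get(node.kind()), node.values()));
    }

    public static void main(String[] args) {
        // local2 = local1 + 5
        Node sum = new Node("add", List.of(
                new Node("load", List.of(), List.of(1)),
                new Node("const", List.of(), List.of(5))), List.of());
        List<String> code = new ArrayList<>();
        emit(new Node("store", List.of(sum), List.of(2)), code);
        code.forEach(System.out::println); // iload 1, bipush 5, iadd, istore 2 (one per line)
    }
}
```

Checking that the number of supplied values matches the number of placeholders catches mismatches at generation time instead of producing malformed Jasmin code.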
The group implemented the full set of features expected for the compiler:
- Developed an LL(1) parser for Java-- using JavaCC, starting from the provided Java-- grammar;
- Included error treatment and recovery mechanisms;
- Specified the AST;
- Included the necessary symbol tables;
- Implemented semantic analysis;
- Generated JVM code accepted by Jasmin for function invocations in Java--;
- Generated JVM code accepted by Jasmin for arithmetic expressions;
- Generated JVM code accepted by Jasmin for conditional instructions (if and if-else);
- Generated JVM code accepted by Jasmin for loops;
- Generated JVM code accepted by Jasmin for handling arrays;
- Completed the compiler and tested it using a set of Java-- classes;
- Implemented the code-generation optimizations: register allocation (“-r” option) and the optimizations enabled by the “-o” option.
These were the suggested stages for the compiler, and all of them were implemented in this project.
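The text does not describe the register allocator behind the -r option. One common approach, shown here as an assumption, is to build an interference graph from variable live ranges and colour it greedily with at most <num> colours. RegisterAllocator and LiveRange are hypothetical names, the live ranges are assumed to come from a prior liveness analysis, and a greedy colouring may need more registers than an optimal one.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/*
 * Hypothetical register allocator: greedy colouring of an interference graph
 * built from precomputed live ranges, limited to maxRegisters colours.
 */
final class RegisterAllocator {
    /** Live range of a variable as inclusive instruction indices [start, end]. */
    record LiveRange(String var, int start, int end) {
        boolean overlaps(LiveRange other) {
            return start <= other.end && other.start <= end;
        }
    }

    /** Returns variable -> register, or null if more than maxRegisters are needed. */
    static Map<String, Integer> allocate(List<LiveRange> ranges, int maxRegisters) {
        // 1. Interference graph: an edge joins two variables whose live ranges overlap.
        Map<String, Set<String>> graph = new HashMap<>();
        for (LiveRange r : ranges) {
            graph.putIfAbsent(r.var(), new HashSet<>());
        }
        for (int i = 0; i < ranges.size(); i++) {
            for (int j = i + 1; j < ranges.size(); j++) {
                LiveRange a = ranges.get(i);
                LiveRange b = ranges.get(j);
                if (a.overlaps(b)) {
                    graph.get(a.var()).add(b.var());
                    graph.get(b.var()).add(a.var());
                }
            }
        }
        // 2. Visit variables by decreasing degree (ties broken by name).
        List<String> order = new ArrayList<>(graph.keySet());
        order.sort(Comparator.comparingInt((String v) -> graph.get(v).size()).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        // 3. Give each variable the lowest register not used by a neighbour.
        Map<String, Integer> assigned = new HashMap<>();
        for (String v : order) {
            Set<Integer> taken = new HashSet<>();
            for (String n : graph.get(v)) {
                Integer used = assigned.get(n);
                if (used != null) {
                    taken.add(used);
                }
            }
            int reg = 0;
            while (taken.contains(reg)) {
                reg++;
            }
            if (reg >= maxRegisters) {
                return null; // the -r limit cannot be met by this greedy colouring
            }
            assigned.put(v, reg);
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<LiveRange> ranges = List.of(
                new LiveRange("a", 0, 4), new LiveRange("b", 1, 2), new LiveRange("c", 3, 5));
        System.out.println(allocate(ranges, 2)); // b and c do not overlap, so they share a register
        System.out.println(allocate(ranges, 1)); // null: a interferes with both b and c
    }
}
```

On the JVM, registers correspond to local variable slots, so a real allocator would also reserve slot 0 for this in instance methods and the slots that hold the method parameters.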
The tasks were evenly distributed among all group members, and everyone had the chance to work on every topic. The work was rotated to keep everyone engaged and able to help one another. Every member contributed equally and helped maintain a stable, healthy group environment.
The project was well rounded and evenly balanced across its parts, each with its own challenging aspects. It gave us better insight into how a compiler works and processes information. It is also worth noting how much new material we had to learn over the semester to build this compiler.
Java-- has a very limited syntax. Extending it to be closer to a language like C would leave more room for implementation choices, but it would also make the project considerably harder. A good balance between a feasible project and an overly demanding one should be taken into consideration.