Compiler Construction-II Project "C2ASM" (Cross Compiler)
COURSE TITLE:      COMPILER CONSTRUCTION-II (BSCS-603)
COURSE SUPERVISOR: Sir Tafseer Ahmed Khan
                   Madam Sadaf Alvi
Thanks,
M Owais Khan Afridi,
C2ASM Programmer.
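We need to make transition diagrams for identifiers and keywords, and for the other token classes.
[Transition diagrams: the drawings are not recoverable; only the edge labels below survive. "Other *" marks the edge taken on any other character.]
FA For Identifiers And Keywords :- Start, l/- ; Other *
FA For ‘==’ :- Start, =, = ; Other *
FA For ‘<’, ‘<=’, ‘<>’ :- Start, <, then = or > ; Other *
FA For ‘>’, ‘>=’ :- Start, >, = ; Other *
FA For Arithmetic Operators :- Start, then one of + - * / %
FA For Punctuations :- For ( , ) , { , } , , , ; : Start, then one of ( ) { } , ;
FA For Numbers :- Start, d, L/l ; Other *
FA For Errors :- (drawing not recoverable)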
Keywords:      void, main, int, long, if, else, for, while, do, return
Punctuations:  (  )  {  }  ,  ;
Numbers:       123, 231, 0, etc.
Identifiers:   variable names, function names
OPERATORS
Relational:    ==  <>  >  >=  <  <=
Arithmetic:    +  -  /  *  %
Assignment:    =
Format:
(class , value)
Keywords:
(void, ---)
(main, ---)
(dt, int/long)
(if, ---)
(else, ---)
(for, ---)
(while, ---)
(do, ---)
(return, ---)
Identifiers:
(Id, _1/_fld/…….etc )
Punctuations:
braces_open={
braces_close=}
paranthesis_open=(
paranthesis_close=)
comma= ,
semicolon= ;
square_open= [
square_close= ]
Operators:
relop= ‘==’ , ‘<>’ , ‘>=’ , ‘<=’ , ‘>’ , ‘<’
assignop= ‘=’
add_sub= ‘+’ , ‘-’
mul_div_mod= ‘*’ , ‘/’ , ‘%’
Numbers:
int_const = 0,1,2,232,2323,…….etc
long_const = 1L,2L,232l,2312123L,…….etc
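The (class, value) format above can be illustrated with a small sketch of how one scanned lexeme might be mapped to a token pair. The names `Token` and `classify` are hypothetical and do not come from the C2ASM source. The identifier rule (a letter or underscore followed by letters, digits or underscores) is an assumption based on the examples `_1` and `_fld`. Only numbers, keywords and identifiers are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token pair mirroring the (class, value) format.
struct Token {
    std::string cls;    // token class, e.g. "int_const", "dt", "Id", "while"
    std::string value;  // lexeme, or "---" when the class needs no value
};

static bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Assumed identifier alphabet: letters and underscore.
static bool is_ident_start(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Classify one complete lexeme that the scanner has already split off.
Token classify(const std::string& lexeme) {
    if (lexeme.empty()) return {"error", lexeme};

    // Numbers: digits only (int_const), or digits plus one L/l suffix (long_const).
    if (is_digit(lexeme[0])) {
        std::size_t i = 0;
        while (i < lexeme.size() && is_digit(lexeme[i])) ++i;
        if (i == lexeme.size()) return {"int_const", lexeme};
        if (i + 1 == lexeme.size() && (lexeme[i] == 'L' || lexeme[i] == 'l'))
            return {"long_const", lexeme};
        return {"error", lexeme};
    }

    // Data types share the class "dt"; other keywords are their own class.
    if (lexeme == "int" || lexeme == "long") return {"dt", lexeme};
    static const char* const keywords[] = {"void", "main", "if", "else",
                                           "for", "while", "do", "return"};
    for (const char* kw : keywords)
        if (lexeme == kw) return {kw, "---"};

    // Everything else must look like an identifier.
    if (!is_ident_start(lexeme[0])) return {"error", lexeme};
    for (char c : lexeme)
        if (!is_ident_start(c) && !is_digit(c)) return {"error", lexeme};
    return {"Id", lexeme};
}
```

For example, `classify("232l")` yields the pair (long_const, 232l) and `classify("long")` yields (dt, long).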
GRAMMAR:
Features
The grammar for the C-Language subset has the following notable features.
   1. Multiple global variable declarations
   2. Multiple global function declarations
   3. Main program
   4. Variable declarations at the start of the program, as in the C language
   5. for, while and do-while loops
   6. Nested loops
   7. Function calls, which differ from original C-language function calls: a call is enclosed in square brackets, [ ]
   8. The ‘return’ keyword
   9. The arguments to functions and the right-hand side of an assignment operator can be function calls
   10. Recursion is allowed.
Productions
The grammar for the language has been split into three sections for easy understanding: the declaration productions, the Statement Productions and the Expression Productions.
3. <data-or-function> → id <data-or-function>
             Selection-Set = ,
4. <data-or-function> → ;
             Selection-Set = ;
5. <data-or-function> → ( <argument-list> ) <function-body>
             Selection-Set = (
10. <variable-declarations> → Є
             Selection-Set = id , { , while , for , do , return , if , [ , }
12. <variable-list> → Є
             Selection-Set = ;
13. <argument-list> → void
             Selection-Set = void
16. <argument> → Є
             Selection-Set = )
Statement Productions :
19. <statements> → Є
             Selection-Set = }
29. <optional-else> → Є
             Selection-Set = id , while , { , for , do , return , if , } , [
31. <right-hand-side> → [ <function-call> ]
             Selection-Set = [
32. <function-call> → id ( <optional-expression-list> )
             Selection-Set = id
33. <optional-expression-list> → Є
             Selection-Set = )
34. <optional-expression-list> → <expression-list-element> <expression-listf>
             Selection-Set = ( , id , int_const , long_const , [
36. <expression-listf> → Є
             Selection-Set = )
37. <expression-list-element> → <expression>
             Selection-Set = ( , id , int_const , long_const
Expression Productions :
40. <relational> → Є
             Selection-Set = ) , ; , ,
43. <Subtract> → Є
             Selection-Set = relop , ) , ; , ,
45. <Add> → + <U> <Add>
             Selection-Set = +
46. <Add> → Є
             Selection-Set = - , relop , ) , ; , ,
48. <Multiply> → * <V> <Multiply>
             Selection-Set = *
49. <Multiply> → Є
             Selection-Set = + , - , relop , ) , ; , ,
51. <divide> → / <W> <divide>
             Selection-Set = /
52. <divide> → Є
             Selection-Set = * , + , - , relop , ) , ; , ,
55. <mod> → Є
             Selection-Set = / , * , + , - , relop , ) , ; , ,
56. <X> → ( <expression> )
             Selection-Set = (
57. <X> → id
             Selection-Set = id
58. <X> → int-const
             Selection-Set = int_const
59. <X> → long-const
             Selection-Set = long_const
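The selection sets above are what a predictive (recursive-descent) parser consults to pick a production from a single lookahead token. The following is a minimal recognizer sketch, not the C2ASM parser itself. The chain `<expression> → <arithmetic> <relational>`, `<arithmetic> → <T> <Subtract>`, `<T> → <U> <Add>`, `<U> → <V> <Multiply>`, `<V> → <W> <divide>`, `<W> → <X> <mod>` and the non-Є productions for `<relational>`, `<Subtract>` and `<mod>` are not listed above and are assumed here by analogy with productions 45, 48 and 51. Tokens are represented by their class names as strings, and `$` is an assumed end marker.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Recognizer for the expression productions; one method per nonterminal.
struct Parser {
    std::vector<std::string> toks;  // token classes, terminated by "$"
    std::size_t pos;
    bool ok;

    explicit Parser(std::vector<std::string> t)
        : toks(std::move(t)), pos(0), ok(true) {}

    const std::string& look() const { return toks[pos]; }

    // Consume the expected terminal or record a syntax error.
    void match(const std::string& t) {
        if (look() == t) ++pos; else ok = false;
    }

    // Productions 56-59: choose by selection set ( , id , int_const , long_const.
    void X() {
        if (look() == "(") { match("("); expression(); match(")"); }
        else if (look() == "id" || look() == "int_const" || look() == "long_const") ++pos;
        else ok = false;
    }

    // Each tail nonterminal: take the operator production when the lookahead
    // is the operator, otherwise take the empty production. A lookahead
    // outside the empty production's selection set is rejected by the caller.
    void mod()        { if (look() == "%") { match("%"); X(); mod(); } }            // assumed
    void W()          { X(); mod(); }                                               // assumed
    void divide()     { if (look() == "/") { match("/"); W(); divide(); } }         // 51, 52
    void V()          { W(); divide(); }                                            // assumed
    void multiply()   { if (look() == "*") { match("*"); V(); multiply(); } }       // 48, 49
    void U()          { V(); multiply(); }                                          // assumed
    void add()        { if (look() == "+") { match("+"); U(); add(); } }            // 45, 46
    void T()          { U(); add(); }                                               // assumed
    void subtract()   { if (look() == "-") { match("-"); T(); subtract(); } }       // assumed, 43
    void arithmetic() { T(); subtract(); }                                          // assumed
    void relational() { if (look() == "relop") { match("relop"); arithmetic(); } }  // assumed, 40
    void expression() { arithmetic(); relational(); }                               // assumed
};

// True when the whole token sequence is one well-formed expression.
bool accepts(std::vector<std::string> toks) {
    toks.push_back("$");
    Parser p(std::move(toks));
    p.expression();
    return p.ok && p.look() == "$";
}
```

With this layering `%` binds tightest, followed by `/`, `*`, `+` and `-`, which is what the nesting of the selection sets above implies.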
Convention:
  • ACTION SYMBOLS written in small letters are used for type checking; no INTERMEDIATE CODE is generated for them.
  • ACTION SYMBOLS written in CAPITAL LETTERS show ATOMS, meaning INTERMEDIATE CODE is generated for them.
  • Those in Bold Italic belong to the TOKEN SET.
4. <data-or-function>k → ;
10. <variable-declarations> → Є
12. <variable-list>t → Є
13. <argument-list> → void {Set parameter info of this particular function to void}
                     {Do function binding} {Register the function’s start, set func_index}
16. <argument> → Є
19. <statements> → Є
29. <optional-else> → Є
31. <right-hand-side>r → [ <function-call>f ]
                            {Find the function’s return value and assign its reference to
                            the right-hand side, i.e. r}
                  r ← f
33. <optional-expression-list> → Є
36. <expression-listf> → Є
CONVENTIONS:
* <Subtract/Add/Multiply/Divide/Mod>p,q
                   p = Inherited Attribute, q = Synthesized Attribute
* <arithmetic/T/U/V/W/X>p
                   p = Synthesized Attribute
* <relational>k,t1
                   k, t1 = Synthesized Attributes
56. <X>p → ( <expression>p )
58. <X>p → int-consti
                p ← i
59. <X>p → long-consti
                p ← i
• Name
        It’s a String class object.
• Datatype
        0=Int
        1=Long
        22=Void
• Scope
        0=Global scope
        1,2,3,…,n=ScopeStack( ).Top
        where “ScopeStack( ).Top” is a method that gives the CURRENT SCOPE.
• Binding
        -2=Identifier bound globally
        -1=Identifier bound to Main()
        Func_indx=Index of the variable [must be a function variable] to which a local/temp
                  variable is bound
• Function_or_not
        0=Identifier is not a function
        1=Identifier is a function
• Type
        0=Global Identifier
        1=Main Identifier
        2=Local Identifier
        3=Parameter Identifier
        4=Temporary Identifier
• Offset
        -10=Undefined
        2n, where n=1,2,3,4,…
        2n must be calculated by a programmer-defined function for the locals and parameters of
        functions.
• Xtra
        -10=Undefined
        2n, where n=1,2,3,4,…
        2n must be calculated by a programmer-defined function for the TOTAL SUM of all
        PARAMETERS.
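The fields above could be held in a record like the following sketch. The struct name `Identifier`, the member names and the lookup rule in `find_identifier` (an entry bound to the current function wins over a global one) are assumptions for illustration and are not taken from the C2ASM source.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical identifier record; one member per field described above.
struct Identifier {
    std::string name;
    int datatype;         // 0=int, 1=long, 22=void
    int scope;            // 0=global, otherwise the current scope number
    int binding;          // -2=global, -1=main(), otherwise the function's index
    int function_or_not;  // 1 if the identifier is a function
    int type;             // 0=global, 1=main, 2=local, 3=parameter, 4=temporary
    int offset;           // -10=undefined, otherwise 2n
    int xtra;             // -10=undefined, otherwise 2n (total of all parameters)

    Identifier()
        : datatype(22), scope(0), binding(-2), function_or_not(0),
          type(0), offset(-10), xtra(-10) {}
};

// Assumed lookup rule: prefer an entry bound to func_index, fall back to a
// globally bound entry, and return -1 when the name is unknown.
long find_identifier(const std::vector<Identifier>& table,
                     const std::string& name, int func_index) {
    long global_hit = -1;
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].name != name) continue;
        if (table[i].binding == func_index) return static_cast<long>(i);
        if (table[i].binding == -2) global_hit = static_cast<long>(i);
    }
    return global_hit;
}
```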
     Int/Long NUMBER
3) Symbol table for Labels.
   The specific instance in our code is labels, which is an STL (C++) vector of STRING.
• op
          op = the OP-code, as in ASSEMBLY; these are the ATOMS generated by the Syntax
               Box. In our case it can have the values LABEL, ASSIGN, CONDJUMP, JUMP,
               RETURN, PARAM, CALL, JUMPF, CMP, ADD, SUB, MUL, DIV, MOD,
               PROC_MARKER.
• Type
          It’s the internal representation of each ATOM as an INTEGER. For example, LABEL has
          type 25, CALL has type 31, etc.
• Expr
          It’s a structure designed to keep the ATOM sets small, with no REDUNDANT
          information that is already present in other tables. It has the following structure.
                       Field      Index   Datatype   Whichtable
                       Datatype   Int     Int        Int
          -   Index
                 It points to the ORIGINAL position of the identifier in another table, which may be
                 syn_identifier’s or args_identifier’s INDEX.
          -   Datatype
                 It has one of the values
                         0=int
                         1=long
                         22=void
                         23=int_const
                         24=long_const
          -   Whichtable
                 As discussed under the TABLE formats of our compiler, there are 5 tables with
                 which we run the whole SYNTAX Box and Code Generator. To facilitate
                 programming, these tables are ASSUMED to have INTEGER NUMBERS attached
                 to them, as follows.
                          0=syn_identifier
                          1=args_identifier
                          2=number_long
                          3=number_int
                          4=labels
Note: Since Arg1, Arg2 and Result are all Expr-type structures, they are all defined in terms
of Expr’s fields. Fields that have an X in their place are NOT USED. We’ve used “-10” for
everything that is UNDEFINED or has no relevant meaning in that context.
LABEL:
    It outputs a label in the code.
    Result:
        Result.Index = Pointer to the particular entry of the “labels” symbol table
        Result.Datatype = -10
        Result.Whichtable = Table number, here it’s 4
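As a concrete illustration of the Expr layout and of the LABEL atom, here is a minimal sketch. The type and function names (`Expr`, `Atom`, `WhichTable`, `unused_expr`, `make_label_atom`) are hypothetical. The field values follow the description above, with -10 in every field that is not used.

```cpp
#include <string>

const int UNDEFINED = -10;  // marker for fields with no meaning in context

// Table numbers as listed under Whichtable.
enum WhichTable {
    SYN_IDENTIFIER = 0,
    ARGS_IDENTIFIER = 1,
    NUMBER_LONG = 2,
    NUMBER_INT = 3,
    LABELS = 4
};

// Reference to an entry in one of the five tables.
struct Expr {
    int index;       // position of the entry in its table
    int datatype;    // 0, 1, 22, 23, 24 or UNDEFINED
    int whichtable;  // one of WhichTable or UNDEFINED
};

// One atom of intermediate code.
struct Atom {
    std::string op;  // e.g. "LABEL"
    int type;        // integer code of the atom, e.g. 25 for LABEL
    Expr arg1;
    Expr arg2;
    Expr result;
};

// An Expr whose fields are all unused.
Expr unused_expr() {
    return Expr{UNDEFINED, UNDEFINED, UNDEFINED};
}

// Build the LABEL atom for entry label_index of the labels table.
Atom make_label_atom(int label_index) {
    Atom a;
    a.op = "LABEL";
    a.type = 25;
    a.arg1 = unused_expr();
    a.arg2 = unused_expr();
    a.result = Expr{label_index, UNDEFINED, LABELS};
    return a;
}
```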
ASSIGN:
    It performs an assignment operation such as a=b or a=v+f in the program.
    Arg1 = R.H.S:
        Arg1.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                     “number_long” or “number_int” symbol table
        Arg1.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
    Result = L.H.S:
        Result.Index = Pointer to the particular entry of the “syn_identifier” or “args_identifier”
                       symbol table
        Result.Datatype = 0 for int, 1 for long
        Result.Whichtable = 0 (syn_identifier), 1 (args_identifier)
CONDJUMP:
    Arg1:
        Arg1.Index = Pointer to the particular entry of the “syn_identifier” symbol table
        Arg1.Datatype = 0 for int, 1 for long
        Arg1.Whichtable = 0 (syn_identifier)
    Arg2:
        Arg2.Index = -10
        Arg2.Datatype = 0 or 1
                        We’ll use this value for comparison with Arg1
        Arg2.Whichtable = -10
    Result:
        Result.Index = Pointer to the particular entry of the “labels” symbol table
        Result.Datatype = -10
        Result.Whichtable = Table number, here it’s 4
JUMP:
    Result:
        Result.Index = Pointer to the particular entry of the “labels” symbol table
        Result.Datatype = -10
        Result.Whichtable = Table number, here it’s 4
RETURN:
    Arg1:
        Arg1.Index = Pointer to the particular entry of the “syn_identifier” table
        Arg1.Datatype = 0 or 1 or 22
        Arg1.Whichtable = Table number, here it’s 0
    Result:
        Result.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                       “number_long” or “number_int” symbol table
        Result.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Result.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
PARAM:
    Result:
        Result.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                       “number_long” or “number_int” symbol table
        Result.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Result.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
CALL:
    Arg1:
        Arg1.Index = Number of arguments expected
        Arg1.Datatype = -10
        Arg1.Whichtable = -10
    Result:
        Result.Index = Pointer to the particular entry of the “syn_identifier” symbol table
        Result.Datatype = 0 for int, 1 for long, 22 for void
                          It shows the function’s return type
        Result.Whichtable = 0 (syn_identifier)
JUMPF:
    Arg1:
        Arg1.Index = Pointer to the particular entry of the “syn_identifier” symbol table
        Arg1.Datatype = 0 for int, 1 for long
        Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier)
    Arg2:
        Arg2.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                     “number_int” or “number_long” symbol table
        Arg2.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Arg2.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
    Result:
        Result.Index = Pointer to the particular entry of the “labels” symbol table
        Result.Datatype = -10
        Result.Whichtable = 4 (labels)
CMP:
    Arg1:
        Arg1.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                     “number_int” or “number_long” symbol table
        Arg1.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
    Arg2:
        Arg2.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                     “number_int” or “number_long” symbol table
        Arg2.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Arg2.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
    Result:
        Result.Index = Pointer to the particular entry of the “syn_identifier” symbol table
        Result.Datatype = 0 for int, 1 for long
        Result.Whichtable = 0 (syn_identifier)
ADD/SUB/MUL/DIV/MOD:
    Arg1:
        Arg1.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                     “number_int” or “number_long” symbol table
        Arg1.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
    Arg2:
        Arg2.Index = Pointer to the particular entry of the “syn_identifier”, “args_identifier”,
                     “number_int” or “number_long” symbol table
        Arg2.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
        Arg2.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
    Result:
        Result.Index = Pointer to the particular entry of the “syn_identifier” symbol table
        Result.Datatype = 0 for int, 1 for long
        Result.Whichtable = 0 (syn_identifier)
PROC_MARKER:
    Arg1:
        Arg1.Index = -1 for main(), or the function’s index for OTHER FUNCTIONS
        Arg1.Datatype = 1 for Start or 0 for End
        Arg1.Whichtable = Holds the TOTAL LENGTH of the OFFSETS of the local
                          variables/temporaries of the particular function
Action Symbols:
There are many action symbols used in this compiler. Many of them are used for type checking
and for preparing other information that is used by the code generator. We’ve implemented them
as helper functions with the following declarations. Their names indicate their respective
functionality.
//Helper Functions
void settype(int index,int type);
void settype_args(int index,int type);
expr newtemp(int type);
string newtempname(void);
long chk_ident(long index);
bool chk_types(expr v1,expr v2);
void setatom(int op,expr arg1,expr arg2,expr result);
int newlabel(void);
long chk_func(long index);
void args_info(long index,long &init_arg,long &fin_arg);
int calc_param_offset();
int calc_local_offset();
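To give a feel for two of the helpers declared above, here is a minimal sketch of how `newlabel` and `newtempname` might be written. Only the signatures come from the declarations. The bodies, the label names `L0`, `L1`, ... and the temporary names `_t0`, `_t1`, ... are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// The "labels" symbol table (table number 4): an STL vector of strings.
std::vector<std::string> labels;

// Create a fresh label, store it in the labels table and return its index.
// The naming scheme "L<index>" is assumed, not taken from the source.
int newlabel(void) {
    labels.push_back("L" + std::to_string(labels.size()));
    return static_cast<int>(labels.size()) - 1;
}

// Return a name for a new temporary; each call yields a different name.
// The naming scheme "_t<counter>" is assumed, not taken from the source.
std::string newtempname(void) {
    static int counter = 0;
    return "_t" + std::to_string(counter++);
}
```

The index returned by `newlabel` is the value that would go into Result.Index of a LABEL, JUMP or JUMPF atom.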
CODE GENERATOR:
The code generator takes the stream of ATOMs as input and produces ASSEMBLY CODE. We’ve
coded a function for every ATOM, so when a particular atom is seen its corresponding function is
called, generating the ASSEMBLY code for it. The output can be TESTED on an assembler.
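The one-function-per-atom design can be sketched as a dispatch loop over the atom stream. This is an illustration only: the function names are hypothetical, just the LABEL (25) and JUMP (28) atoms are handled, and the x86-style `JMP` mnemonic is an assumption because the target assembly dialect is not stated here.

```cpp
#include <string>
#include <vector>

struct Expr { int index; int datatype; int whichtable; };
struct Atom { int type; Expr arg1; Expr arg2; Expr result; };

// LABEL atom: emit the label name followed by a colon.
std::string gen_label(const Atom& a, const std::vector<std::string>& labels) {
    return labels[a.result.index] + ":\n";
}

// JUMP atom: emit an unconditional jump (mnemonic assumed, x86 style).
std::string gen_jump(const Atom& a, const std::vector<std::string>& labels) {
    return "    JMP " + labels[a.result.index] + "\n";
}

// Walk the atom stream and call the function that belongs to each atom type.
std::string generate(const std::vector<Atom>& atoms,
                     const std::vector<std::string>& labels) {
    std::string out;
    for (const Atom& a : atoms) {
        switch (a.type) {
            case 25: out += gen_label(a, labels); break;
            case 28: out += gen_jump(a, labels); break;
            default:
                out += "    ; atom type " + std::to_string(a.type) +
                       " is not handled in this sketch\n";
        }
    }
    return out;
}
```

Switching on the integer type rather than on the op string is in keeping with the convention of mapping every atom to a number.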
MY CODING CONVENTION:
I’ve mapped all keywords, punctuations, operators, tokens and atoms (intermediate code) to
INTEGER NUMBERS to ease programming. The list below shows the mapping. I’ve used these
numeric counterparts throughout my implementation, since numbers are easier to handle than
strings and more efficient.
"int" = 0
"long" = 1
"{" = 2
"}" = 3
"(" = 4
")" = 5
"," = 6
";" = 7
"==" = 8
"<>" = 9
">=" = 10
"<=" = 11
">" = 12
"<" = 13
"=" = 14
"+" = 15
"-" = 16
"*" = 17
"/" = 18
"%" = 19
"[" = 20
"]" = 21
"void" = 22
"int_const" = 23
"long_const" = 24
"label" = 25
"assign" = 26
"condjump" = 27
"jump" = 28
"return" = 29
"param" = 30
"call" = 31
"jumpf" = 32
"cmp" = 100
"proc_marker" = 34