Intermediate Code Generation
dr. Hristo Nikolov
LIACS
Leiden University
Leiden, Fall 2011
Why Intermediate Code?
Front-end: generates intermediate representation
Back-end: generates target code
Why Intermediate code?
Retargeting is facilitated
A machine-independent code optimizer can be applied
Parser
Static
Checker
Intermediate
Code
Generator
Intermediate
Code
Code
Generator
Intermediate Languages
Syntax trees, Postfix notation,
Three-address code (x := y op z)
Example:
a := b * -c + b * -c
assign
Syntax
tree:
assign
*
b
uniminus
DAG:
uniminus
Postfix notation: a b c uniminus * b c uniminus * + assign
uniminus
Three-Address Code
Sequence of statements:
x := y op z
x+y*z
=>
Example:
t1 := y * z
t2 := x + t1
a := b * -c + b * -c
t1 := - c
t2 := b *t1
t3 := - c
t4 := b * t3
t5 := t2 + t4
a := t5
Linearized representation
of a syntax tree
t1
t2
t5
a
:=
:=
:=
:=
- c
b *t1
t2 + t2
t5
Linearized representation
of a syntax DAG
Types of Three-Address Statements
Assignment statements:
Indexed assignments:
Pointer assignments:
Copy statements:
Unconditional jumps:
Conditional jumps:
Function calls:
p(x1,..., xn) =>
x := y op z and x := op y
x := y[ i ] and x[ i ] := y
x := & y, x := * y, * x := y
x := y
goto L
if x relop y goto L
param x, call p, n, and return y
param x1
. . .
param xn
call p, n
Syntax-directed Translation into
Three-Address Code
Temporary names are created
E E1 + E2
e.g.
=> t := E1 + E2,
t5 := t2 + t4
a := t5
When expression is an identifier no new temporary
Nonterminal E has two attributes:
E.place the name which holds the value of E
E.code three-address code sequence
Syntax-directed Translation
To produce three-address code for assignments
Production
S id := E
Semantic Rule
S.code := E.code || gen( id.place := E.place )
E E 1 + E2 E.place := newtemp();
E.code := E1.code || E2.code || gen( E.place := E1.place + E2.place )
E E1 * E2
E.place := newtemp();
E.code := E1.code || E2.code || gen( E.place := E1.place * E2 .place )
E - E1
E.place := newtemp();
E.code := E1.code || gen( E.place := uminus E1.place )
E ( E1 )
E.place := E1.place
E.code := E.code
E id
E.place := id.place
E .code :=
Function newtemp() returns distinct names (t1, t2)
gen(x := y op z) => x := y op z
Flow-of-control Statements
While statements
Production
S while E do S1
S.begin :
Semantic Rule
S.begin := newlabel()
S.after := newlabel()
S.code := gen( S.begin : ) ||
E.code ||
gen ( if E.place = 0 goto S.after ) ||
S1.code ||
gen ( goto S.begin) ||
gen( S.after : )
E.code
...
if E.place = 0 goto S.after
S1.code
...
goto S.begin
S.after :
...
Flow-of-control Statements
Example
Source program
fragment
Three-address
code sequence
i := 2 * n + k
while i do
i := i - k
t1 := 2
t2 := t1 * n
t3 := t2 + k
i := t3
L1: if i = 0 goto L2
t4 := i - k
i := t4
goto L1
L2:
Implementation (1)
Quadruples, triples, and idirect triples
Quadruple is record: op, arg 1, arg 2, result
a := b * -c + b * -c
t1
t2
t3
t4
t5
a
:=
:=
:=
:=
:=
:=
- c
b *t1
- c
b * t3
t2 + t4
t5
Three-address code
op
arg 1
arg 2
result
(0)
uniminus
(1)
(2)
uniminus
(3)
t3
t4
(4)
t2
t4
t5
(5)
:=
t5
t1
t1
t2
t3
Quads (quadruples)
Pro: easy to rearrange code for global optimization
Cons: lots of temporaries
Implementation (2)
Quadruples, triples, and idirect triples
Triple is record: op, arg 1, arg 2
a := b * -c + b * -c
t1
t2
t3
t4
t5
a
:=
:=
:=
:=
:=
:=
- c
b *t1
- c
b * t3
t2 + t4
t5
Three-address code
op
arg 1
arg 2
(0)
uniminus
(1)
(2)
uniminus
(3)
(2)
(4)
(1)
(3)
(5)
:=
(4)
(0)
Triples
Pro: temporaries are implicit
Cons: difficult to rearrange code
Implementation (3)
Quadruples, triples, and idirect triples
Implicit Triples: pointers to triples
a := b * -c + b * -c
t1
t2
t3
t4
t5
a
:=
:=
:=
:=
:=
:=
- c
b *t1
- c
b * t3
t2 + t4
t5
Three-address code
statement
op
arg 1
arg 2
(0)
(14)
(14) uniminus
(1)
(15)
(15)
(2)
(16)
(16) uniminus
(3)
(17)
(17)
(16)
(4)
(18)
(18)
(15)
(17)
(5)
(19)
(19)
:=
(18)
Implicit Triples
c
b
c
Triple Container
Pro: temporaries are implicit and it is
easier to rearrange code
(14)
Names and Scope
The three-address code is simplistic
it assumes that the names of variables can be easily
resolved by the back end in global or local variables
We need local symbol tables to
record global and local declarations in procedures,
blocks, and structs to resolve names
Declarations in a Procedure
PMD
M
{ offset := 0 }
DD;D
D id : T
{ enter( id.name, T.type, offset);
offset := offset + T.width }
T integer
{ T.type := integer;
T.width := 4 }
T real
{ T.type := real;
T.width := 8 }
T array [ num ] of T1
{ T.type := array( num.val, T1.type );
T.width := num.val x T1.width }
T ^ T1
{ T.type := pointer( T1.type );
T.width := 4 }
Keeping Track of Scope Information
Nested procedures
PD
D D ; D | id : T | proc id ; D ; S
D proc id ; D1 ; S: then a new symbol table is created
The new table points to the enclosing procedures table
Procedure enter() knows in which table to make an entry
Symbol Table Operations
mktable( previous ): returns a pointer to a new table
which is linked to a previous table in the outer scope
enter( table, name, type, offset ): creates a new entry
in table
addwidth( table, width ): accumulates the total width
of all entries in table
enterproc( table, name, newtable ): creates a new
entry in table for procedure with local scope newtable
Processing Declarations
Nested Procedures
PMD
{ addwidth( top( tblptr ), top( offset ) );
pop( tblptr ); pop( offset ) }
{ t := mktable( nil );
push( t, tblptr ); push( 0, offset ) }
D D1 ; D2
D proc id ; N D1 ; S
{ t := top( tblptr );
addwidth( t, top( offset ) );
pop( tblptr ); pop( offset );
enterproc( top( tblptr ), id.name, t ) }
D id : T
{ enter( top ( tblptr ), id.name, T.type, top( offset ) );
top( offset ) := top( offset ) + T.width }
{ t := mktable( top( tblptr ) );
push( t, tblptr ); push( 0, offset ) }
Assignment Statements
A Translation Scheme
S id := E
{ p := lookup( id.name );
if p <> nil then
emit( p := E.place )
else error }
E E1 + E2
{ E.place := newtemp;
emit( E.place := E1.place + E2.place ) }
E E1 * E2
{ E.place := newtemp;
emit( E.place := E1.place * E2.place ) }
E - E1
{ E.place := newtemp;
emit( E.place := uniminus E1.place )}
E ( E1 )
{ E.place := E1.place }
E id
{ p := lookup( id.name );
if p <> nil then
E.place := p
else error }
Reusing Temporary Names
Each time newtemp() generates a new temporary name
Can we reuse temporaries?
E E1 + E2 results in:
evaluate E1 into t1
evaluate E2 into t2
t3 := t1 + t2
If t1 is no longer used, can reuse t1
instead of using new temp t3
Modify newtemp() to use a stack :
Keep a counter c, initialized to 0
newtemp() increments c and returns temporary $c
Decrement counter on each use of a $i in a three-address
statement
Reusing Temporary Names
Example: x := a * b + c * d e * f
Statement
Value of c
0
$0 := a * b
$1 := c * d
$0 := $0 + $1
$1 := e * f
$0 := $0 - $1
:= $0
Addressing Array Elements
One-Dimensional Arrays
A : array [low .. high] of type;
ith element of A begins in location
base + (i low) x W
base = A[low], W = sizeof( type )
t1
t2
t3
:=
:=
:=
:=
c
i * W
t1[t2]
t3
Compile time evaluation if rewritten as
i x W + ( base low x W )
c = base low x W is stored in symbol-table entry for A
A[i] = i x W + c
Addressing Array Elements
Two-Dimensional Arrays... (1)
A : array [1..2, 1..3] of integer;
First
Row
A[1,1]
A[1,1]
A[1,2]
A[2,1]
A[1,3]
A[1,2]
A[2,1]
A[2,2]
A[2,2]
A[1,3]
A[2,3]
A[2,3]
Row-major
Column-major
First
Column
In Row-major form A[i,j] is equivalent to A[i][j]
Addressing Array Elements
Two-Dimensional Arrays... (2)
A : array [1..2, 1..3] of integer; // Row-major form
A[i,j]= base+((ilowi)x Nj+j-lowj)x W
Nj = highj-lowj+1
((i x Nj) + j) x W + c, where
c = base - ((lowi x Nj) + lowj) x W
t1
t1
t2
t3
t4
:=
:=
:=
:=
:=
:=
i * 3
t1 + j
c
t1 * 4
t2[t3]
t4
Addressing Array Elements
Grammar
L := E
E+E
|(E)
|L
L
Elist ]
| id
Elist Elist , E
| id [ E
S
E
Synthesized Attributes:
E.place
Elist.array
Elist.place
Elist.ndim
L.place
L.offset
null
name of temp holding value of E
array name
name of temp holding index value
number of array dimensions
lvalue (=name of temp)
index into array (=name of temp)
indicates non-array simple id
Addressing Array Elements
A Translation Scheme ... (1)
S L := E
{ if L.offset = null then /* L is a simple id */
emit( L.place := E.place )
else
emit( L.place [ L.offset ] := E.place ) }
E E1 + E2
{ E.place := newtemp;
emit( E.place := E1.place + E2.place ) }
E ( E1 )
{ E.place := E1.place }
EL
{ if L.offset = null then /* L is a simple id */
E.place := L.place
else begin
E.place := newtemp;
emit( E.place :=L.place [ L.offset ] )
end }
Addressing Array Elements
A Translation Scheme... (2)
L Elist ]
{ L.place := newtemp; L.offset := newtemp
emit( L.place := c( Elist.array ) )
emit( L.offset := Elist.place * width( Elist.array )) }
L id
{ L.place := id.place;
L.offset := null }
Elist Elist 1, E
{ t := newtemp; m := Elist 1.ndim +1;
emit( t := Elist 1.place * limit( Elist 1.array, m ));
emit( t := t + E.place );
Elist.array := Elist 1.array;
Elist.place := t;
Elist.ndim := m }
Elist id [ E
{ Elist.array := id.place;
Elist.place := E.place;
Elist.ndim := 1 }
Translating Boolean Expressions
Composed by boolean operators and, or, not
Applied to boolean variables or relational expressions
Relational expression: E1 relop E2, where
E1 and E2 are arithmetic expressions
The grammar:
E E or E | E and E | not E | ( E ) | id relop id | true | false
Attribute op: <, , =, , >, represented by relop
Translating Boolean Expressions
Numerical representation
True = 1, false = 0
Example:
a or b and not c
t1 := not c
t2 := b and t1
t3 := a or t2
a < b: if a<b then 1 else 0
100:
101:
102:
103:
104:
if a
t :=
goto
t :=
< b goto 103
0
104
1
Translating Boolean Expressions
A Translation Scheme
E E1 or E2
{ E.place := newtemp;
emit( E.place := E1.place or E2.place ) }
E E1 and E2
{ E.place := newtemp;
emit( E.place := E1.place and E2.place ) }
E not E1
{ E.place := newtemp;
emit( E.place := not E1.place ) }
E ( E1 )
{ E.place := E1.place }
E id1 relop id2
{ E.place := newtemp;
emit( if id1.place relop.op id2.place goto nextstat + 3 );
emit( E.place := 0 );
emit( goto nextstat + 2 );
emit( E.place := 1 ) }
E true
{ E.place := newtemp;
emit( E.place := 1 ) }
E false
{ E.place := newtemp;
emit( E.place := 0 ) }
Backpatching
Code generation problem:
Labels (addresses) that control must go to may not be
known at the time the jump statements are generated
A solution - backpatching:
Generate branching statements with empty jumps
Put the statements in a goto list
Fill the labels of these statements when the proper label
is determined
Backpatching
Quadruples in a quadruple array
Labels are indexes into the array
Manipulate the list of lables:
makelist( i ) creates a new list containing three-address
location i, returns a pointer to the list
merge( p1 , p2 ) concatenates lists pointed to by p1 and
p2, returns a pointer to the concatenates list
backpatch( p, i ) inserts i as the target label for each of
the statements in the list pointed to by p
Backpatching
Grammar for boolean expressions
E E1 or M E2
| E1 and M E2
| not E1
| ( E1 )
| id1 relop id2
| true
| false
M
Synthesized Attributes:
E.code
E.truelist
E.falselist
M.quad
three-address code
backpatch list for jumps on true
backpatch list for jumps on false
location of current three-address
quad
Backpatching
A Translation Scheme... (1)
E E1 or M E2
{ backpatch( E1.falselist, M.quad );
E.truelist := merge( E1.truelist, E2.truelist );
E.falselist := E2.falselist }
E E1 and M E2
{ backpatch( E1.truelist, M.quad );
E.truelist := E2.truelist;
E.falselist := merge( E1.falselist, E2.falselist ) }
E not E1
{ E.truelist := E1.falselist;
E.falselist := E1.truelist }
E ( E1 )
{ E.truelist := E1.truelist;
E.falselist := E1.falselist }
Backpatching
A Translation Scheme... (2)
E id1 relop id2
{ E.truelist := makelist( nextquad );
E.falselist := makelist( nextquad + 1 );
emit( if id1.place relop.op id2.place goto_ );
emit( goto_ ) }
E true
{ E.truelist := makelist( nextquad );
emit( goto_ ) }
E false
{ E.falselist := makelist( nextquad );
emit( goto_ ) }
{ M.quad := nextquad }
Backpatching
Example: a<b or c<d and e<f
E id1 relop id2
E E1 or M E2
(M.quad := 102)
E E1 and M E2
(M.quad := 104)
backpatch( {102}, 104 )
backpatch( {101}, 102 )
100:
101:
102:
103:
104:
105:
if a < b goto_
goto_
if c < d goto_
goto_
if e < f goto_
goto_
backpatch( {102}, 104 )
backpatch( {101}, 102 )
100:
101:
102:
103:
104:
105:
if a
goto
if c
goto
if e
goto
< b goto TRUE
102
< d goto 104
FALSE
< f goto TRUE
FALSE
Flow-of-control Statements
Grammar
Synthesized Attributes:
S if E then S
| if E then S else S
| while E do S
| begin L end
| A
LL;S
| S
E.code
E.truelist
E.falselist
M.quad
S.nextlist
L.nextlist
three-address code
backpatch list for jumps on true
backpatch list for jumps on false
location of current three-address
quad
backpatch list for jumps to the
next statement after S (or nil)
backpatch list for jumps to the
next statement after L (or nil)
Flow-of-Control Statements
A Translation Scheme using backpatching... (1)
S if E then M1 S1 N
else M2 S2
{ backpatch( E.truelist, M1.quad );
backpatch( E.falselist, M2.quad );
S.nextlist := merge(S1.nextlist,
merge(N.nextlist,S2.nextlist) }
{ N.nextlist := makelist( nextquad );
emit( goto_) }
{ M.quad := nextquad }
S if E then M S1
{ backpatch( E.truelist, M.quad );
S.nextlist := merge( E.falselist, S1.nextlist) }
S wihle M1 E do
M2 S1
{ backpatch( S1.nextlist, M1.quad );
backpatch( E.truelist, M2.quad );
S.nextlist := E.falselist;
emit( goto_ M1.quad ) }
Flow-of-Control Statements
A Translation Scheme using backpatching... (2)
S begin L end
{ S.nextlist := L.nextlist }
SA
{ S.nextlist := nil }
L L1 ; M S
{ backpatch( L1.nextlist, M.quad );
L.nextlist := S.nextlist }
LS
{ L.nextlist := S.nextlist }
Intermediate Code Generation
Chapters for reading:
8.1 to 8.6