MODULE 4
Intermediate Code Generation
Intermediate code generation
• In the analysis-synthesis model of a compiler, the
front end of a compiler translates a source program
into an independent intermediate code, then the
back end of the compiler uses this intermediate
code to generate the target code (which can be
understood by the machine).
• Intermediate code can translate the source
program into the machine program.
• Static checking includes type checking, which
ensures that operators are applied to
compatible operands.
• It also includes any syntactic checks that
remain after parsing.
• For example, static checking assures that a
break-statement in C is enclosed within a
while-, for-, or switch-statement; an error is
reported if such an enclosing statement does
not exist.
Significance of Intermediate code
• If a compiler translates the source language to its target machine language
without having the option for generating intermediate code, then for each new
machine, a full native compiler is required.
• Intermediate code eliminates the need of a new full compiler for every unique
machine by keeping the analysis portion same for all the compilers.
• The second part of compiler, synthesis, is changed according to the target
machine.
• It becomes easier to apply the source code modifications to improve code
performance by applying code optimization techniques on the intermediate code.
• With a suitably defined intermediate
representation, a compiler for language i and
machine j can then be built by combining the
front end for language i with the backend for
machine j.
• This approach to creating suite of compilers can
save a considerable amount of effort: m X n
compilers can be built by writing just m front
ends and n back ends.
Intermediate Representation
Intermediate codes can be represented in a variety
of ways and they have their own benefits.
• High Level IR - High-level intermediate code
representation is very close to the source
language itself. They can be easily generated from
the source code and we can easily apply code
modifications to enhance performance. But for
target machine optimization, it is less preferred.
• Low Level IR - This one is close to the target
machine, which makes it suitable for register and
memory allocation, instruction set selection, etc.
It is good for machine-dependent optimizations.
• Intermediate code can be either language
specific (e.g., Byte Code for Java) or language
independent (three-address code).
• Syntax trees are high level; they depict the
natural hierarchical structure of the source
program and are well suited to tasks like static
type checking.
• Three-address code can range from high- to
low-level, depending on the choice of
operators.
Advantages of Intermediate Code
Generation
• It is Machine Independent. It can be executed
on different platforms.
• It creates the function of code optimization
easy. A machine-independent code optimizer
can be used to intermediate code to optimize
code generation.
• It can perform efficient code generation.
• From the existing front end, a new compiler
for a given back end can be generated.
Representations on Intermediate code
• The intermediate code can be represented in
the form of postfix notation, syntax tree,
directed acyclic graph, three address codes,
Quadruples, and triples.
Variants of syntax tree
• Nodes in a syntax tree represent constructs in
the source program; the children of a node
represent the meaningful components of a
construct.
• A directed acyclic graph (hereafter called a
DAG) for an expression identifies the common
sub-expressions (sub-expressions that occur
more than once) of the expression.
Directed Acyclic graphs for Expressions
• Like the syntax tree for an expression, a DAG has
leaves corresponding to atomic operands and
interior nodes corresponding to operators.
• The difference is that a node N in a DAG has more
than one parent if N represents a common sub-
expression; in a syntax tree, the tree for the
common sub-expression would be replicated as
many times as the sub-expression appears in the
original expression.
• Thus, a DAG not only represents expressions more
succinctly, it gives the compiler important clues
regarding the generation of efficient code to
evaluate the expressions.
• DAG for the expression
a + a * (b - c) + (b - c) * d
• The leaf for a has two parents, because a appears
twice in the expression.
• More interestingly, the two occurrences of the
common subexpression b- c are represented by
one node, the node labeled —.
• That node has two parents, representing its two
uses in the subexpressions a* (b-c ) and (b-c)*d .
• Even though b and c appear twice in the
complete expression, their nodes each have one
parent, since both uses are in the common
subexpression b-c .
• The sequence of steps shown in constructs
the D, provided Node and Leaf return an
existing node, if possible.
• We assume that entry-a points to the symbol-
table entry for a, and similarly for the other
identifiers.
• When the call to Leaf (id; entry-a) is repeated
at step 2, the node creed by the previous call
is returned, so p2 = p1.
• The SDD of Fig. 6.4 can construct either syntax trees or
DAG’s.
• It was used to construct syntax trees where functions
Leaf and Node created a fresh node each time they were
called.
• It will construct a DAG if, before creating a new node,
these functions first check whether an identical node
already exists.
• If a previously created identical node exists, the existing
node is returned.
• For instance, before constructing a new node, Node(op,
left, right) we check whether there is already a node with
label op, and children left and right, in that order.
• If so, Node returns the existing node; otherwise, it
creates a new node.
The Value number method for
constructing DAG’s
• Often, the nodes of a syntax tree or DAG are stored
in an array of records, each row of the array
represents one record, and therefore one node.
• In each record, the first field is an operation code,
indicating the label of the node.
• The leaves have one additional field, which holds
the lexical value (either a symbol-table pointer or a
constant, in this case), interior nodes have two
additional fields indicating the left and right
children.
• In this array, we refer to nodes by giving the
integer index of the record for that node within
the array.
• This integer historically has been called the value
number for the node or for the expression
represented by the node.
• Let the signature of an interior node be the triple
<op; l; r>, where op is the label, l its left child's
value number, and r its right child's value number.
• A unary operator may be assumed to have r = 0.
Three Address Code
• Three-address code is a linearized
representation of a syntax tree or a DAG in
which explicit names correspond to the
interior nodes of the graph.
• In three-address code, there is at most one
operator on the right side of a instruction;
that is, no built-up arithmetic expressions are
permitted.
• Thus source-language expression like
x+y*z
might be translated into the sequence of three-
address instruction
t1 = y * z
t2 = x + t
where t1 and t2 are compiler-generated
temporary names.
Example:
Addresses and Instructions
• Three-address code is built from two
concepts: addresses and instructions.
• Three-address code can be implemented using
records with fields for the addresses; records
called quadruples and triples.
An address can be one of the following:
1. A name. For convenience, we allow source-
program names to appear as addresses in three-
address code. In an implementation, a source
name is replaced by a pointer to its symbol-table
entry, where all information about the name is
kept.
2. A constant. In practice, a compiler must deal with
many different types of constants and variables.
Type conversions within expressions are
considered.
3. A compiler-generated temporary. It is useful,
especially in optimizing compilers, to create a
distinct name each time a temporary is needed.
These temporaries can be combined, if possible,
when registers are allocated to variables.
Symbolic Labels
• Symbolic labels will be used by instructions
that alter the flow of control.
• A symbolic label represents the index of a
three-address instruction in the sequence of
instructions.
• Actual indexes can be substituted for the
labels, either by making a separate pass or by
back-patching.
Common three-address instruction
forms:
1. Assignment instructions of the form x = y op z, where op is a
binary arithmetic or logical operation, and x, y, and z are
addresses.
2. Assignments of the form x = op y, where op is a unary
operation. Essential unary operations include unary minus,
logical negation, and conversion operators that, for example,
convert an integer to a floating-point number.
3. Copy instructions of the form x = y, where x is assigned the
value of y.
4. An unconditional jump goto L. The three-address instruction
with label L is the next to be executed.
5. Conditional jumps of the form if x goto L and if False x goto L.
These instructions execute the instruction with label L next if
x is true and false, respectively. Otherwise, the following
three-address instruction in sequence is executed next, as
usual.
• 6. Conditional jumps such as if x relop y goto L,
which apply a relational operator (=, etc.) to x and
y, and execute the instruction with label L next if x
stands in relation relop to y. If not, the three-
address instruction following if x relop y goto L is
executed next, in sequence.
• 7. Procedure calls and returns are implemented
using the following instructions: param x for
parameters; call p, n and y = call p, n for
procedure and function calls, respectively; and
return y, where y, representing a returned value,
is optional.
• Their typical use is as the sequence of three
address instructions
param x1
param x2
param xn
call p, n
generated as part of a call of the procedure p(x1,
x2,….,xn).
• The integer n, indicating the number of actual
parameters in “call (p, n)”, is not redundant
because calls can be nested.
• That is, some of the first param statements could
be parameters of a call that comes after p returns
its value; that value becomes another parameter
of the later call.
• 8. Indexed copy instructions of the form x = y[i] and x[i]
= y. The instruction x = y[i] sets x to the value in the
location i memory units beyond location y. The
instruction x[i] = y sets the contents of the location i
units beyond x to the value of y.
• 9. Address and pointer assignments of the form x = &
y, x = * y, and * x = y. The instruction x = & y sets the r-
value of x to be the location (l-value) of y.2 Presumably
y is a name, perhaps a temporary, that denotes an
expression with an l-value such as A[i][j], and x is a
pointer name or temporary. In the instruction x = * y,
presumably y is a pointer or a temporary whose r-value
is a location. The r-value of x is made equal to the
contents of that location. Finally, * x = y sets the r-value
of the object pointed to by x to the r-value of y.
Three address code Representations
• The description of three-address instructions
specifies the components of each type of
instruction, but it does not specify the
representation of these instructions in a data
structure.
• In a compiler, these instructions can be
implemented as objects or as records with
fields for the operator and the operands.
• Three such representations are called
“quadruples”," triples," and “indirect triples."
Quadruples
• A quadruple (or quad) has four fields, which we
call op, arg 1, arg 2, and result.
• Some rules:
1. Instructions with unary operators like x = minus
y or x = y do not use arg2. Note that for a copy
statement like x = y, op is =, while for most other
operations, the assignment operator is implied.
2. Operators like param use neither arg 2 nor
result.
3. Conditional and unconditional jumps put the
target label in result.
Three-address code for the assignment a = b*-c+b*-c;
Triples
• A triple has only three fields, which we call op,
arg 1, and arg 2.
• Using triples, we refer to the result of an
operation x op y by its position, rather than by
an explicit temporary name.
• Thus, instead of the temporary t1 , a triple
representation would refer to position (0).
Indirect Triples
Indirect triples consist of a listing of pointers to triples,
rather than a listing of triples themselves.
• For example, let us use an array instruction to
list pointers to triples in the desired order.
• For example, let us use an array instruction to
list pointers to triples in the desired order.
• With indirect triples, an optimizing compiler
can move an instruction by reordering the
instruction list, without affecting the triples
themselves.