0% found this document useful (0 votes)
9 views26 pages

Lesson 5

The document discusses a control flow graph (CFG) of a simple program, focusing on live variables and reaching definitions. It provides detailed steps for identifying live variables at each node and systematically computes reaching definitions using data-flow equations. The document includes tables and trees to illustrate the findings and concludes with a translation grammar for a specified source language.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views26 pages

Lesson 5

The document discusses a control flow graph (CFG) of a simple program, focusing on live variables and reaching definitions. It provides detailed steps for identifying live variables at each node and systematically computes reaching definitions using data-flow equations. The document includes tables and trees to illustrate the findings and concludes with a translation grammar for a specified source language.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 26

2.

Consider the following control flow graph (CFG) of a simple program:


1 a=1

2 5 7
yes yes
a == b b == c return (c) →

no no

read (b) 3 read (a) 6

c=a+b
4

Variables b and c are input parameters, and variable c is also output parameter, while
variable a is local.
Answer the following questions:

(a) Find the live variables of the given CFG, by directly applying the definition of
live variable. Write the live variables on the CFG prepared on the next pages.
(b) Find the reaching definitions of the given CFG, by means of the systematic
method of the flow equations for reaching definitions: first write the equations,
then iteratively solve them (use the tables prepared on the next pages). Write
the reaching definitions on the CFG prepared on the next pages.

21
please here write the LIVE VARIABLES (at the node inputs) obtained directly


1 a=1

2 5 7
yes yes
a == b b == c return (c) →

no no

read (b) 3 read (a) 6

c=a+b
4

22
table of definitions and suppressions at the nodes
for REACHING DEFINITIONS

node defined node suppressed

1 1
2 2
3 3
4 4
5 5
6 6
7 7

system of data-flow equations for REACHING DEFINITIONS

node in equations out equations

23
iterative solution table of the system of data-flow equations (REACHING DEF.S)
(the number of tables and columns is not significant)

initialization 1 2 3 4 5 6
# in out in out in out in out in out in out in out
1
2
3
4
5
6
7

7 8 9 10 11 12 13
# in out in out in out in out in out in out in out
1
2
3
4
5
6
7

24
please here write the REACHING DEFINITIONS (at the node outputs)
obtained systematically


1 a=1

2 5 7
yes yes
a == b b == c return (c) →

no no

read (b) 3 read (a) 6

c=a+b
4

25
Solution
(a) Here are the live variables at the node inputs, obtained directly from the defini-
tion of liveness (annotated next to the arrow tips to mean they are input):

↓ { b, c }
1 a=1

2 { a, b, c } 5 7
yes yes
{ a, b, c } a == b b == c return (c) → ∅
{ b, c } {c}

no no
{a} {b}
read (b) 3 read (a) 6

{ a, b }
c=a+b
{ a, b }
4

Of course all the inputs to the same node bear the same live variables. Variables
b and c are live on the input node of the CFG, as they are input and then used in
the CFG. Variable c is defined in the CFG and then is output, yet it is not live
at the output node of the CFG, as whether it is used subsequently is unknown.
To examine a few cases: variable a is not live at the input of node 1 as 1 defines
a; b and c are live at 1 since they are input parameters assigned outside of the
program and are used at 2 successor of 1; a, b and c are live at 2 since all of them
have been previously assigned, a and b are used at 2, and c is used at 3 successor
of 2; only a is live at 3 since it has been previously assigned and is used at 4
successor of 3, while b and c are assigned at 3 and 4 successor of 3 without being
used; and so on. The reader may wish to systematically verify the solution by
himself, by means of the data-flow equation method for live variables.

26
(b) Here is the systematic computation of the reaching definitions. The definitions
in the CFG are a1 , b3 , c4 and a6 , plus the input parameters b? and c? , which are
defined outside of the CFG. First here are the definition and suppression tables:
table of definitions and suppressions at the nodes
for REACHING DEFINITIONS

node defined node suppressed

1 a1 1 a6
2 2
3 b3 3 b?
4 c4 4 c?
5 5
6 a6 6 a1
7 7
As said, it is necessary to consider the definitions b? and c? for the input pa-
rameters of the program. The rest of the definitions and suppressions is clear.
Second here are the data-flow equations, divided into input and output:
system of data-flow equations for REACHING DEFINITIONS

node in equations out equations



1 in (1) = { b? , c? } out (1) = { a1 } ∪ in (1) − { a6 }

2 in (2) = out (1) ∪ out (4) out (2) = in (2)



3 in (3) = out (2) out (3) = { b3 } ∪ in (3) − { b? }

4 in (4) = out (3) ∪ out (6) out (4) = { c4 } ∪ in (4) − { c? }

5 in (5) = out (2) out (5) = in (5)



6 in (6) = out (5) out (6) = { a6 } ∪ in (6) − { a1 }

7 in (7) = out (5) out (7) = in (7)

Here it happens that every definition suppresses another definition of the same
variable or input parameter (see the definition and suppression tables computed
above). The rest of the system of data-flow equations is clear.

27
iterative solution table of the system of data-flow equations
(REACHING DEF.S)
initialization 1 2 3 4 5
# in out in out in out in out in out in out
1 b? c? a 1 b? b? c? a 1 b? b? c ? a1 b? b? c ? a 1 b? b? c ? a 1 b? b? c?
c? c? c? c? c?
2 ∅ ∅ a1 b? a 1 b? a1 a6 a1 a6 a1 a6 a1 a6 a1 a6 a1 a6 a1 a6
c? c4 c? c4 b? b3 b? b3 b3 b? b3 b? b3 b? b3 b? b3 b?
c? c4 c? c4 c? c4 c? c4 c? c4 c? c4 c? c4
3 ∅ b3 ∅ b3 a1 b? a1 b3 a1 a6 a1 a6 a1 a6 a1 a6 a1 a6
c? c4 c? c4 b? b3 b3 c ? b3 b? b3 c? b3 b?
c? c4 c4 c? c4 c4 c? c4
4 ∅ c4 a6 b3 a 6 b3 a6 b3 a6 b3 a1 a6 a1 a6 a1 a6 a1 a6 a1 a6
c4 c4 b3 c ? b3 c 4 b3 b? b3 b? b3 b?
c4 c? c4 c4 c? c4
5 ∅ ∅ ∅ ∅ a1 b? a1 b? a1 a6 a1 a6 a1 a6 a1 a6 a1 a6
c? c4 c? c4 b? b3 b? b3 b3 b? b3 b? b3 b?
c? c4 c? c4 c? c4 c? c4 c? c4
6 ∅ a6 ∅ a6 ∅ a6 a 1 b? a 6 b? a1 a6 a 6 b3 a1 a6
c? c4 c? c4 b3 b? b? c? b3 b?
c? c4 c4 c? c4
7 ∅ ∅ ∅ ∅ ∅ ∅ a 1 b? a 1 b? a1 a6 a1 a6 a1 a6
c? c4 c? c4 b3 b? b3 b? b3 b?
c? c4 c? c4 c? c4

The in columns of steps 4 and 5 are identical, thus the solution procedure ter-
minates in four steps. The contents of the out column of step 4 represent the
definitions reaching the CFG nodes, plus the contents of the in cell of node 1.
Notice that the solution procedure for reaching definitions propagates forward.
For this reason, the solution procedure starts from the input definitions.
The definitions that are the slowest to reach the output of their farthest node
are the following: b? to reach node 4, b3 to reach node 6, both b3 and a6 to reach
node 7. All of them take exactly four steps to propagate as far as possible. In
fact a quick look shows that they have to traverse a path of four nodes. For
instance: b? has to traverse the four nodes 2, 5, 6 and 4; instead on path 2, 3
and 4, which has length only three, b? would be suppressed by b3 at node 3; and
there are not any other acyclic paths that connect the input node 1 to node 4.
There is not any definition that can traverse a path of five nodes before looping
on a node already reached, or before being suppressed, or before exiting the
CFG. This is why the solution procedure converges in exactly four steps.

28
Here are the reaching definitions at the node outputs, obtained systematically
(annotated next to the arrow exits to mean they are output):

{ b? , c ? , }

1 a=1

{ a1 , b? , c? }

 
a 1 , a 6 , b? ,
2 5 7
yes b3 , c ? , c 4
a == b   b == c yes return (c)
  a 1 , a 6 , b? ,   ↓
a1 , a6 , b? , b3 , c ? , c 4 a 1 , a 6 , b ? , n o
a1 , a6 , b? ,
b3 , c ? , c 4 b3 , c ? , c 4 b3 , c ? , c 4
no no

read (b) 3 read (a) 6

{ a1 , a6 , b3 , c? , c4 } { a6 , b? , b3 , c? , c4 }

{ a1 , a6 , b? , b3 , c4 } c=a+b
4

Of course all the outputs to the same node bear the same reaching definitions.
Definitions b? and c? are reaching on the input node of the CFG as they are
input parameters to the program. The reader may wish to verify the systematic
solution directly on the CFG by applying the definition of reaching definition.

29
4 Language Translation and Semantic Analysis 20%
1. Consider the source language Ls below, over the two-letter alphabet Σ = { a, b }:

a h bk |
 
Ls = h, k ≥ 0 ∧ k ≤ h ≤ 2k ∨ k > h

A source grammar Gs that generates language Ls is (axiom S):



 S → X | Y


Gs X → aaXb | aXb | ε

 Y → aY b | Y b | b

Answer the following questions:

(a) Write a syntax scheme Gτ (or a translation grammar) for the translation τ over
the source language Ls defined as follows:
τ (ah bk ) = as bs s = min (h, k)
without changing the source grammar Gs .
Furthermore, draw the syntax translation trees of the two valid source strings:
aab abb
(b) Is scheme (or grammar) Gτ a deterministic translation ? Explain your answer.

18
Solution
(a) The source strings can be informally characterized as those that consist of letters
a followed by letters b, i.e., ah bk , where the number h of letters a is not larger
than twice the number k of letters b.
A possible translation grammar Gτ is (axiom S), which computes the destination
string as bs with s = min ( h, k ):

 S → X | Y


Gτ X → aaa X bb | aa X b
b | ε
ε (1) if h ≥ k, i.e., s = k

Y → aY b | Y b |
 b
(2) if h < k, i.e., s = h
a b ε ε
In the case 1 the number s of letters a output is equal to the number k of input
letters b, while in the case 2 the number s of letters b output is equal to the
number h of input letters a. This is equivalent to computing s = min ( h, k ).
Here are the two translation trees:

S S

X Y

aa b a b
a
X b a
Y b

ε b
ε ε

τ (a a b) = a b τ (a b b) = a b

(b) Translation τ is a function from a string to one string, since the exponent s is
the unique minimum value between the exponents h and k.
However, the source grammar Gs is clearly ambiguous, as both nonterminals X

and Y generate strings ambiguously. For instance, each string S ⇒ X = ⇒ aaabb

and S ⇒ Y = ⇒ a b b b has two syntax trees, thus both are ambiguous. Therefore
the syntax scheme (or translation grammar) Gτ is certainly nondeterministic.

19
2. The grammar below (axiom S), over a one-letter terminal alphabet { f }, models the
abstract syntax of the binary trees where all the internal nodes have two child nodes
(simplified for the sake of brevity). A sample tree is reported on the right (a prefix
subscript index is associated to every leaf node only for future reference):

S
sample tree
 D

 S → D

D 7f

D → DD


 D → Df D D

D → ff

D 3f D 6f

1f 2f 4f 5f

This grammar just models an abstract syntax, which is unsuitable for syntax analysis.
We call distance between two tree nodes n1 and n2 , denoted as dist (n1 , n2 ), the length
of the shortest path between them; for instance dist (2 f, 4 f ) = 6 and dist (2 f, 7 f ) = 5.
The depth of any tree node is its distance from the root, hence in the sample tree the
node depth ranges from 0 (for the root S) to 5 (for the leaf nodes 1 f , 2 f , 4 f and 5 f ).
We intend to design an attribute grammar that supports the computation of the
maximal distance between any two tree leaf nodes. In the above sample tree the
maximal distance is 6, because for instance dist (2 f, 4 f ) = 6.
The grammar must also allow for the computation of the depth of the deepest internal
node that is a common ancestor of any two leaf nodes that have maximal distance.
In the above example the depth is 2, because the deepest common ancestor of the
leaf nodes 2 f and 4 f has depth 2 (it is the second node D from the root).
Consider the table below, which lists a set of attributes for the grammar:

name type domain symbol meaning


d right integer D d epth of the current node

maximal d istance, from the current


mdl left integer D node, of a l eaf node descendant of
the current node

maximal distance between any pair


mpl left integer D, S of l eaf nodes both descendant of the
current node

d epth of the deepest internal node


that is a common ancestor of a pair
dpl left integer D, S of l eaf nodes that are both descen-
dant of the current node and have a
maximal distance

20
The picture below shows the previous sample tree decorated with the attribute values:

mpl = 6
S
dpl = 2

mdl = 4
mpl = 6
dpl = 2 D d=1

mdl = 3
mpl = 6
dpl = 2 D d=2 f

mdl = 2
mpl = 3 mdl = 2
D d=3 D d=3
dpl = 3 mpl = 3
dpl = 3

mdl = 1 mdl = 1
mpl = 2 D d=4 f mpl = 2 D d=4 f
dpl = 4 dpl = 4

f f f f

Answer the following questions (use the tables / drawings / spaces on the next pages):

(a) Write the semantic rules to compute attribute d.


(b) Write the semantic rules to compute attribute mdl. Decorate the next tree with
the attribute value, according to the provided semantic rules.
(c) Write the semantic rules to compute attribute mpl. Decorate the next tree with
the attribute value, according to the provided semantic rules.
(d) (optional) Write the semantic rules to compute attribute dpl. Decorate the next
tree with the attribute value, according to the provided semantic rules.

When writing the semantic rules you can use functions like max (v1 , . . . , vn ), for any
n ≥ 2, to denote the maximum of a set of values v1 , . . . , vn .

21
tree to be decorated for questions (b-c-d)

D D

f f D f

f f

22
semantic rules for attribute d – question (a)

# syntax semantics

1: S0 → D1

2: D0 → D1 D2

3: D0 → D1 f

4: D0 → f f

semantic rules for attribute mdl – question (b)

# syntax semantics

1: S0 → D1

2: D0 → D1 D2

3: D0 → D1 f

4: D0 → f f

23
semantic rules for attribute mpl – question (c)

# syntax semantics

1: S0 → D1

2: D0 → D1 D2

3: D0 → D1 f

4: D0 → f f

semantic rules for attribute dpl – question (d)

# syntax semantics

1: S0 → D1

2: D0 → D1 D2

3: D0 → D1 f

4: D0 → f f

24
Solution
(a) Here are the semantic rules for attribute d:

# syntax semantics
1: S0 → D1 d1 := 1

2: D0 → D1 D2 d1 , d2 := d0 + 1

3: D0 → D1 f d1 := d0 + 1

4: D0 → f f

This is quite standard a computation: the node depth is incremented from top
to bottom. Rule 1 initializes.
(b) Here are the semantic rules for attribute mdl :

# syntax semantics
1: S0 → D1

2: D0 → D1 D2 mdl0 := max (mdl1 , mdl2 ) + 1

3: D0 → D1 f mdl0 := mdl1 + 1

4: D0 → f f mdl0 := 1

In the rule 2, the maximum root-to-leaf distance in the two subtrees is taken,
plus one to count the parent node. In the rule 3, there is only one subtree, thus
taking the maximum is unnecessary. Rule 4 initializes.
(c) Here are the semantic rules for attribute mpl :

# syntax semantics
1: S0 → D1 mpl0 := mpl1

2: D0 → D1 D2 mpl0 := max (mpl1 , mpl2 , mdl1 + mdl2 + 2)

3: D0 → D1 f mpl0 := max (mpl1 , mdl1 + 2)

4: D0 → f f mpl0 := 2

In the rule 2, it is taken the maximum of the leaf-to-leaf distances in the two
subtrees and of the distance sum plus two for the two leaves to belong to different
subtrees. In the rule 3, there is only one subtree, thus the maximum is simplified.
Rule 4 initializes.

25
(d) Here are the semantic rules for attribute dpl :

# syntax semantics
1: S0 → D1 dpl0 := dpl1

2: D0 → D1 D2 if (mpl0 = mpl1 ) then


dpl0 := dpl1
else if (mpl0 = mpl2 ) then
dpl0 := dpl2
else
dpl0 := d0
end if

3: D0 → D1 f if (mpl0 = mpl1 ) then – equiv. to mpl1 ≥ mdl1 +2


dpl0 := dpl1
else
dpl0 := d0
end if

4: D0 → f f dpl0 := d0

In the rules 2 and 3 (with some simplification) the max distances of letter pairs
are compared and the parent node depth is set accordingly. Rule 4 initializes.

tree decorated – all questions

mpl = 5
S
dpl = 1

mdl = 3
mpl = 5
dpl = 1 D

mdl = 1
mpl = 2
mdl = 2
dpl = 2 D D
mpl = 3
dpl = 2

mdl = 1
f f mpl = 2 D f
dpl = 3

f f

26
2. Consider the following (simplified) portion of a grammar of a programming language,
with nested conditional statements if ... then ... else ... end, possibly without else
branch (axiom ST):


 1: hSTi → hIFi

 2: hSTi → asg


 3: hIFi → if cond then hSTi else hSTi end
4: hIFi → if cond then hSTi end

We intend to compute a numerical parameter c that indicates the completeness degree


of the nested instructions. Parameter c tells whether all the branches of the nested
conditional instructions are complete, or whether some branch else is missing.
To do so, decreasing weights are assigned to the nested branches. Therefore, each
instruction asg is assigned a value equal to 2−n , where n ≥ 0 is the number of
enclosing conditional instructions.
For instance, in the sample program and tree below, the first asg (3-rd line) has a
value 2−2 = 14 = 0.25, while the second asg (6-th line) has a value 2−1 = 12 = 0.5:

c = 0.25 + 0.5 = 0.75


ST

if cond then IF
if cond then
asg if cond then ST else ST end
end
else IF asg
asg 2−1 = 1
= 0.5
2
end if cond then ST end

asg
1
2−2 = 4 = 0.25

Answer the following questions (use the tables / trees / spaces on the next pages):

(a) Compute the correct value of parameter c for the entire conditional statement
by using only one attribute, also named c, of a suitable type. The required
result must be computed as the value of c in the root node of the abstract
syntax tree. In the above example, the root node ST will have an attribute
c = 2−2 + 2−1 = 0.75.
(b) Decorate the example tree on the next page and write the values of attribute c
in all the relevant nodes.
(c) Say if the defined attribute grammar is of type L. Provide an explanation for
your answer and avoid generic or tautological sentences.

19
attribute specification to be completed and semantic functions – question (a)

name type domain symbol


c real

# syntax semantics

1: ST0 → IF1

2: ST0 → asg

3: IF0 → if cond then ST1 else ST2 end

4: IF0 → if cond then ST1 end

20
syntax tree to be decorated – question (b)

ST

IF

if cond then ST else ST end

IF IF

if cond then ST else ST end if cond then ST end

IF end IF end asg

if cond then ST else ST end if cond then ST

asg asg asg

21
Solution
(a) Rules of the attribute grammar:

name type domain symbol


c left real ST, IF

# syntax semantics
1: ST0 → IF1 c0 := c1
2: ST0 → asg c0 := 1
3: IF0 → if...then ST1 else ST2 end c0 := 12 c1 + 21 c2
4: IF0 → if...then ST1 end c0 := 12 c1

(b) Decorated tree: TBD


(c) The grammar is of type L because it is purely synthesized, as its unique attribute
c is of type left.

22
2. Consider the program Control Flow Graph (CFG) shown below, with nine nodes:


x := 0 1

read c
2
x++
yes

c == ‘ a ’ 3

no

y := x − 1 4

5 7
yes yes
y == 0 c == ‘ ⊣ ’ return ‘ ⊣ ’ 8

no no
read c
6 return c 9
y−− ↓

The program has three variables: x and y (both integer), and c (character). By means
of variable c, the program reads a one-way input tape, which contains a (possibly
empty) string w ∈ Σ∗ , with Σ = { a, b }. String w is terminated by ⊣, i.e., the input
tape contains w ⊣. The first read operation executed sets the variable c to the initial
character of w if w 6= ε, or to the terminator ⊣ if w = ε.
Answer the following questions (use the tables / drawings / spaces on the next pages):

(a) Write the regular expression E that represents the execution traces of the CFG,
disregarding the program semantics, over the node name alphabet { 1, . . . , 9 }.
(b) Informally find the live variables at each node and write them by each node (the
live variables of a node are those live at the node input). Can the found liveness
be exploited to optimize the memory allocation for the program ?
(c) Find the variables used and defined at each node. Then write the flow equations
of the live variables, but only for the nodes 2, 4 and 5 (it is not required to solve
such equations).
(d) (optional) Verify that the solution found at point (b) satisfies the flow equations
of nodes 2, 4 and 5. Remember that each node has two equations: in and out.

19
Solution
(a) Regular expression E, disregarding semantics:
E = 1 ( 2 3 )+ 4 5 ( 6 5 )∗ 7 ( 8 | 9 )
or, equivalently:
E = 1 ( 2 3 )+ 4 ( 5 6 )∗ 5 7 ( 8 | 9 )
(b) Live variables:

x := 0 1 ∅

read c
2 {x}
x++

yes

c == ‘ a ’ 3 { x, c }

no

y := x − 1 4 { x, c }

7 {c}
yes yes
y == 0 c == ‘ ⊣ ’ return ‘ ⊣ ’ 8 ∅
5 { y, c } ↓
no no

read c
6 {y} return c 9 { c }
y−− ↓

The two variables x and y are never simultaneously live. Since both are of integer
type, they can be directly unified into one variable, say z (this is to say they
can share the same memory cell or processor register). Thus the program can
be optimized as follows:

20

z := 0 1 ∅

read c
2 {z}
z++

yes

c == ‘ a ’ 3 { z, c }

no

z := z − 1
4 { z, c }
(i.e., z − −)

7 {c}
yes yes
z == 0 c == ‘ ⊣ ’ return ‘ ⊣ ’ 8 ∅
5 { z, c } ↓
no no

read c
6 {z} return c 9 { c }
z−− ↓
Writing z − − at node 4 would just be a short notation for the assignment
z := z − 1, the operation being the same. The optimized program uses only two
variables, i.e., z and c, instead of three.
(c) Definitions and usages at the nodes:

node defined node used


1 {x} 1 ∅
2 { c, x } 2 {x}
3 ∅ 3 {c}
4 {y} 4 {x}
5 ∅ 5 {y}
6 { c, y } 6 {y}
7 ∅ 7 {c}
8 ∅ 8 ∅
9 ∅ 9 {c}

21
system of data-flow equations

node in equations out equations



2 in (2) = { x } ∪ out (2) − { x, c } out (2) = in (3)

4 in (4) = { x } ∪ out (4) − { y } out (4) = in (5)

5 in (5) = { y } ∪ out (5) − ∅ out (5) = in (6) ∪ in(7)

(d) At each node, the live variables directly provide the corresponding in set. The
verification just consists of substituting the informal solution into the out equa-
tions, thus obtaining the out sets, and then of checking that the in equations
become identities. Here is the verification (node 2 is analyzed stepwise, the
others more speedily):
• For node 2: since from the solution it is in (3) = { x, c }, from the out
equation we have out (2) = { x, c }; and since from the solution it is in(2) =
{ x }, the in equation becomes { x } = { x } ∪ { x, c } − { x, c } , i.e.,
{ x } = { x } ∪ ∅, which is an identity.
• For node 4: the out equation yields out (4) = in (5) = { y, c }, from 
which the in equation becomes { x, c } = { x } ∪ { y, c } − { y } ,
i.e., { x, c } = { x } ∪ { c }, an identity.
• For node 5: the out equation yields out (5) = in (6) ∪ in (7) = { y } ∪ { c } =
{ y, c }, from which the in equation becomes { y, c } = { y } ∪ { y, c }, an
identity.
Therefore, all the listed equations pass the verification test and thus confirm the
correctness of the informal solution (or of the part considered).

22

You might also like