
UNIT-4

Type Checking

Type Checking in Compiler Design


Type checking is the process of verifying and enforcing constraints of types on values.
A compiler must check that the source program follows both the syntactic and the semantic
conventions of the source language, and it must also check the type rules of the language. Type
checking allows the programmer to limit which types may be used in certain circumstances and
assigns types to values. The type checker determines whether these values are used appropriately
or not.
It checks the types of objects and reports a type error in the case of a violation; where the
language permits, some mismatches can be repaired automatically (for example, by inserting
conversions). Whatever compiler we use, while it is compiling the program it has to follow the
type rules of the language, and every language has its own set of type rules. Information about
data types such as INTEGER, FLOAT, and CHARACTER is maintained and computed by the
compiler. The compiler is made up of modules, and the type checker is the module whose task is
type checking.
4.1 Specification of a Simple Type Checker
• Specification of a simple type checker for a simple language in which the type of
each identifier must be declared before the identifier is used.
• The type checker is a translation scheme that synthesizes the type of each expression from the
types of its sub expressions.
• The type checker can handle arrays, pointers, statements, and functions.
• Specification of a simple type checker includes the following:
1. A Simple Language
2. Type Checking of Expressions
3. Type Checking of Statements
4. Type Checking of Functions
1. Simple Language
• The following grammar generates programs, represented by the non terminal P, consisting of a
sequence of declarations D followed by a single expression E.

P -> D ; E

D -> D ; D | id : T

T -> char | integer | array[num] of T | ↑T

Translation scheme for the above rules:


P -> D ; E

D -> D ; D

D -> id : T { addtype(id.entry, T.type) }

T -> char { T.type := char }

T -> integer { T.type := integer }

T -> ↑T1 { T.type := pointer(T1.type) }

T -> array[num] of T1 { T.type := array(1..num.val, T1.type) }
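As a concrete illustration, the following is a minimal C sketch of one possible representation for these type expressions; the names Type, mktype, mkpointer, mkarray, and addtype are hypothetical, not taken from any particular compiler:

#include <stdlib.h>

typedef enum { T_CHAR, T_INTEGER, T_POINTER, T_ARRAY, T_ERROR } TypeKind;

typedef struct Type {
    TypeKind kind;
    struct Type *elem;  /* pointee for T_POINTER, element type for T_ARRAY */
    int size;           /* number of elements for T_ARRAY */
} Type;

Type *mktype(TypeKind k, Type *elem, int size) {
    Type *t = malloc(sizeof *t);
    t->kind = k;
    t->elem = elem;
    t->size = size;
    return t;
}

/* T -> ^T1 :              T.type := pointer(T1.type) */
Type *mkpointer(Type *t1) { return mktype(T_POINTER, t1, 0); }

/* T -> array[num] of T1 : T.type := array(1..num.val, T1.type) */
Type *mkarray(int num, Type *t1) { return mktype(T_ARRAY, t1, num); }

/* D -> id : T would then call something like addtype(entry, t)
   to record t in the symbol-table entry for id. */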

2. Type Checking of Expressions


• The synthesized attribute type for E gives the type expression assigned by the type system to
the expression generated by E.
• The following semantic rules say that constants represented by the
tokens literal and num have type char and integer, respectively:

Production    Semantic Rule

E -> literal { E.type := char }

E -> num { E.type := integer}

• A function lookup(e) is used to fetch the type saved in the symbol-table entry pointed to by e.
• When an identifier appears in an expression, its declared type is fetched and assigned to the
attribute type;

E -> id { E.type := lookup(id.entry) }

• The expression formed by applying the mod operator to two subexpressions of type integer has
type integer; otherwise, its type is type_error. The rule is
E -> E1 mod E2 { E.type := if E1.type = integer and E2.type = integer
                           then integer
                           else type_error }

• In an array reference E1 [E2], the index expression E2 must have type integer, in which case
the result is the element type t obtained from the type array(s, t) of E1; we make no use of the
index set s of the array.
E -> E1 [ E2 ] { E.type := if E2.type = integer and E1.type = array(s,t)
                           then t
                           else type_error }

• Within expressions, the postfix operator ↑ yields the object pointed to by its operand. The type
of E1↑ is the type t of the object pointed to by the pointer E1:

E -> E1 ↑ { E.type := if E1.type = pointer(t) then t
                      else type_error }
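Pulling these rules together, here is a hedged C sketch of a recursive checker for expressions, reusing the Type representation from the earlier sketch; the Expr node type, its tags, and lookup are assumed names for illustration, not a real API:

typedef enum { E_LITERAL, E_NUM, E_ID, E_MOD, E_INDEX, E_DEREF } ExprOp;

typedef struct Expr {
    ExprOp op;
    struct Expr *left, *right;  /* subexpressions, where present */
    void *entry;                /* symbol-table entry for E_ID */
} Expr;

extern Type *lookup(void *entry);  /* assumed: fetch declared type */

Type *check_expr(Expr *e) {
    switch (e->op) {
    case E_LITERAL: return mktype(T_CHAR, NULL, 0);     /* E -> literal */
    case E_NUM:     return mktype(T_INTEGER, NULL, 0);  /* E -> num */
    case E_ID:      return lookup(e->entry);            /* E -> id */
    case E_MOD: {                                       /* E -> E1 mod E2 */
        Type *t1 = check_expr(e->left), *t2 = check_expr(e->right);
        return (t1->kind == T_INTEGER && t2->kind == T_INTEGER)
                   ? t1 : mktype(T_ERROR, NULL, 0);
    }
    case E_INDEX: {                                     /* E -> E1[E2] */
        Type *t1 = check_expr(e->left), *t2 = check_expr(e->right);
        return (t2->kind == T_INTEGER && t1->kind == T_ARRAY)
                   ? t1->elem : mktype(T_ERROR, NULL, 0);
    }
    case E_DEREF: {                                     /* E -> E1^ */
        Type *t1 = check_expr(e->left);
        return (t1->kind == T_POINTER) ? t1->elem : mktype(T_ERROR, NULL, 0);
    }
    }
    return mktype(T_ERROR, NULL, 0);
}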

3. Type Checking of Statements


• Since language constructs like statements typically do not have values, the special basic type
void can be assigned to them.
• If an error is detected within a statement, then the type type_error is assigned.
• The assignment statement, conditional statement, and while statement are considered for
type checking.
• Sequences of statements are separated by semicolons.

Statement      Production            Semantic Rule

Assignment     S -> id := E          { S.type := if id.type = E.type then void
                                                 else type_error }

Conditional    S -> if E then S1     { S.type := if E.type = boolean then S1.type
                                                 else type_error }

While          S -> while E do S1    { S.type := if E.type = boolean then S1.type
                                                 else type_error }

Sequence       S -> S1 ; S2          { S.type := if S1.type = void and S2.type = void
                                                 then void
                                                 else type_error }
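Continuing the same hypothetical AST, the statement rules above might be sketched as follows; this assumes the TypeKind enum is extended with T_VOID and T_BOOLEAN, and type equality is simplified to a comparison of kinds (a real checker would use the structural test sketched in section 4.2):

typedef enum { S_ASSIGN, S_IF, S_WHILE, S_SEQ } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    void *entry;                  /* target id for S_ASSIGN */
    struct Expr *rhs, *cond;      /* expressions used by the statement */
    struct Stmt *body, *first, *second;
} Stmt;

Type *check_stmt(Stmt *s) {
    switch (s->kind) {
    case S_ASSIGN:  /* S -> id := E */
        return (lookup(s->entry)->kind == check_expr(s->rhs)->kind)
                   ? mktype(T_VOID, NULL, 0) : mktype(T_ERROR, NULL, 0);
    case S_IF:      /* S -> if E then S1 */
    case S_WHILE:   /* S -> while E do S1 */
        return (check_expr(s->cond)->kind == T_BOOLEAN)
                   ? check_stmt(s->body) : mktype(T_ERROR, NULL, 0);
    case S_SEQ:     /* S -> S1 ; S2 */
        return (check_stmt(s->first)->kind == T_VOID &&
                check_stmt(s->second)->kind == T_VOID)
                   ? mktype(T_VOID, NULL, 0) : mktype(T_ERROR, NULL, 0);
    }
    return mktype(T_ERROR, NULL, 0);
}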

4. Type Checking of Functions


• The application of a function to an argument can be captured by the production
E -> E ( E )

• An expression is the application of one expression to another. The rules for associating type
expressions with the nonterminal T can be augmented by the following production and action to
permit function types in declarations.

T -> T1 '->' T2 { T.type := T1.type -> T2.type }

• Quotes around the arrow used as a function constructor distinguish it from the arrow used as
the meta symbol in a production.
• The rule for checking the type of a function application is

E -> E1 ( E2 ) { E.type := if E2.type = s and E1.type = s -> t
                           then t
                           else type_error }

4.2 Equivalence of Type Expressions


• If two type expressions are equivalent, the type checker returns that type; otherwise it returns type_error.
Key Ideas:
o The main difficulty arises from the fact that most modern languages allow the naming
of user-defined types.
o For instance, in C and C++ this is achieved by the typedef statement.
o When checking equivalence of named types, we have two possibilities.
1. Structural Equivalence
2. Name Equivalence
1. Structural Equivalence of Type Expressions
• Type expressions are built from basic types and constructors, so a natural concept of equivalence
between two type expressions is structural equivalence: two expressions are either the
same basic type or are formed by applying the same constructor to structurally equivalent types.
That is, two type expressions are structurally equivalent if and only if they are identical.
• For example, the type expression integer is equivalent only to integer because they are the
same basic type.
• Similarly, pointer (integer) is equivalent only to pointer (integer) because the two are formed
by applying the same constructor pointer to equivalent types.
• The algorithm recursively compares the structure of type expressions without checking for
cycles so it can be applied to a tree representation. It assumes that the only type constructors
are for arrays , products, pointers, and functions.
• The constructed types array(n1, t1) and array(n2, t2) are equivalent if n1 = n2 and t1 = t2.
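The recursive comparison just described can be sketched in C over the same Type representation used earlier (function types omitted for brevity; this is an illustration, not a production algorithm):

#include <stdbool.h>

bool sequiv(Type *s, Type *t) {
    if (s->kind != t->kind) return false;
    switch (s->kind) {
    case T_CHAR:
    case T_INTEGER:                         /* same basic type */
        return true;
    case T_POINTER:                         /* pointer(s1) vs pointer(t1) */
        return sequiv(s->elem, t->elem);
    case T_ARRAY:                           /* array(n1,t1) vs array(n2,t2) */
        return s->size == t->size && sequiv(s->elem, t->elem);
    default:
        return false;
    }
}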

2. Name Equivalence of Type Expressions


• In some languages, types can be given names (data type names). For example, in
the Pascal program fragment below, the identifier link is declared to be a name for the type
↑cell. Whether the variables next, last, p, q, and r all have identical types depends on the
implementation.
• Type graph is constructed to check the name equivalence.
o Every time a type constructor or basic type is seen, a new node is created.
o Every time a new type name is seen, a leaf is created.
o Two type expressions are equivalent if they are represented by the same node in the
type graph.
Example: Consider a Pascal program fragment of the form

type link = ↑cell;
var next : link;
    last : link;
    p : ↑cell;
    q, r : ↑cell;

Name Equivalence
• The identifier link is declared to be a name for the type ↑cell. Under name equivalence, the
declarations of p and of q, r introduce new implicit type names np and nqr.
• Since next and last are declared with the same type name, they are treated as having equivalent
types. Similarly, q and r are treated as having equivalent types, because the same implicit type
name is associated with them.
• However, p, q, and next do not have equivalent types, since they have types with different
names.

Pascal Program-Fragment
Note that the type name cell has three parents in the type graph, all labeled pointer. An equal
sign appears between the type name link and the node in the type graph to which it refers.
4.3 Type Conversion
Conversion from one type to another type is known as implicit if it is done
automatically by the compiler. Implicit type conversions are also called coercions, and
coercion is limited in many languages.
Example: An integer may be converted to a real, but a real may not be implicitly converted
to an integer.
Conversion is said to be explicit if the programmer must write something to perform the
conversion.
Tasks of the checker related to conversion:
1. it has to ensure that indexing is applied only to an array;
2. it has to check the ranges of the data types used, for example:
   • INTEGER (int) has a range of -32,768 to +32,767
   • FLOAT has a range of about 1.2E-38 to 3.4E+38
Type conversion is an operation that takes a data object of one type and creates an
equivalent data object of another type. The signature of a type conversion operation is
given as

conversion_op : type1 → type2
There are two kinds of type conversion, as follows −
• Implicit type conversion (coercion) − Programming languages that
allow mixed-mode expressions must define conventions for implicit
operand type conversions.
Coercion is defined as an automatic conversion between types. For example, in Pascal, if one
operand of an addition is of integer type and the other of real type, the integer data object is
implicitly converted to type real before the addition is performed.
• Explicit type conversion − Some languages provide facilities for doing
explicit conversions, both widening and narrowing. In some cases, warning
messages are produced when an explicit narrowing conversion results in a
meaningful change to the value of the object being converted.
For example, Pascal provides a built-in function round that converts a real-number data
object to an integer data object whose value is the rounded value of the real. In C-
based languages, explicit type conversions are known as casts. The desired type is placed in
parentheses just before the expression to be converted; thus (int) X for float X
converts the value of X to type integer. One of the reasons for the parentheses in C
conversions is that C has several two-word type names, such as long int.
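For instance, the following small C program (standard C, nothing assumed) shows an implicit coercion, an explicit narrowing cast, and a cast to a two-word type name:

#include <stdio.h>

int main(void) {
    int i = 7;
    float f = i + 0.5f;        /* i is implicitly coerced (widened) to float */
    int j = (int)f;            /* explicit narrowing cast: fraction discarded */
    long int l = (long int)i;  /* two-word type name requires the parentheses */
    printf("%f %d %ld\n", f, j, l);
    return 0;
}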
Advantages of Type Conversion
The following are advantages of type conversion −
• If, during type checking, a mismatch appears between the actual type of an
argument and the type expected by an operation, type conversion can
convert the data object implicitly and prevent an error.
• In some languages such as C, type conversion is built in: the compiler
implicitly casts an expression to convert it to the correct type.
• Type conversion is invoked automatically in certain cases of mismatch. For
example, in Pascal, if the arguments of an arithmetic operation such as ‘+’
are of mixed real and integer type, the integer data object is implicitly
converted to type real before the addition is performed.
• No data is hidden by a widening conversion: because every short integer can
be represented as a long integer, nothing is lost by implicitly invoking a
short int → long int conversion.
• With dynamic type checking, conversions or coercions are performed at the point where
the type mismatch is recognized during execution, so for such languages
narrowing conversions can also be enabled. With static type checking, the
conversion code is instead added to the compiled program.
• Type conversion is a subtle issue when there is a large number of data types in
a programming language.
Types of Type Checking:
There are two kinds of type checking:
1. Static Type Checking.
2. Dynamic Type Checking.
1. Static Type Checking:
Static type checking is defined as type checking performed at compile time. It checks the
types of variables at compile time, which means the type of each variable is known at compile
time. It generally examines the program text during the translation of the program. Using the
type rules of a system, a compiler can infer from the source text that a function (fun) will be
applied to an operand (a) of the right type each time the expression fun(a) is evaluated.
Examples of Static checks include:
• Type-checks: A compiler should report an error if an operator is applied to an
incompatible operand. For example, if an array variable and function variable are added
together.
• Flow-of-control checks: Statements that cause the flow of control to leave a
construct must have some place to which to transfer the flow of control. For example, a
break statement in C causes control to leave the smallest enclosing while, for, or switch
statement; an error occurs if such an enclosing statement does not exist.
• Uniqueness checks: There are situations in which an object must be defined exactly once.
For example, in Pascal an identifier must be declared uniquely, labels in a case statement
must be distinct, and elements of a scalar type may not be repeated.
• Name-related checks: Sometimes the same name may appear two or more times. For
example in Ada, a loop may have a name that appears at the beginning and end of the
construct. The compiler must check that the same name is used at both places.
The Benefits of Static Type Checking:
1. Runtime Error Protection.
2. It catches syntactic errors like spurious words or extra punctuation.
3. It catches wrong names, such as misspelled identifiers and misused predefined names.
4. Detects incorrect argument types.
5. It catches the wrong number of arguments.
6. It catches wrong return types, like return “70”, from a function that’s declared to return
an int.
2. Dynamic Type Checking:
Dynamic type checking is defined as type checking done at run time. In dynamic
type checking, types are associated with values, not variables. In implementations of dynamically
type-checked languages, runtime objects are generally associated with a type tag,
a reference to a type containing its type information. Dynamic typing is more flexible: a
static type system always restricts what can be conveniently expressed. Dynamic typing results in
more compact programs, since it is more flexible and does not require types to be spelled out.
Programming with a static type system often requires more design and implementation effort.
Languages like Pascal and C have static type checking. Type checking is used to check the
correctness of the program before its execution. The main purpose of type checking is to verify
the correctness of data-type assignments and type casts of the data types before execution.
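To make the idea of a type tag concrete, here is a hedged C sketch of how a dynamically checked implementation might tag runtime values and inspect the tag before each use; all names here are illustrative:

typedef enum { TAG_INT, TAG_REAL } Tag;

typedef struct {
    Tag tag;                       /* runtime type information */
    union { int i; double r; } v;
} Value;

static double as_real(Value x) {   /* tag checked at run time */
    return (x.tag == TAG_INT) ? (double)x.v.i : x.v.r;
}

Value add(Value a, Value b) {      /* mixed-mode '+' with coercion */
    Value c;
    c.tag = TAG_REAL;
    c.v.r = as_real(a) + as_real(b);
    return c;
}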
Static type checking is also used to determine the amount of memory needed to store a variable.
The design of the type checker depends on:
1. The syntactic structure of language constructs.
2. The expressions of the language.
3. The rules for assigning types to constructs (semantic rules).
The Position of the Type checker in the Compiler:

Type checking in Compiler

The token stream from the lexical analyzer is passed to the PARSER, which
generates a syntax tree. When the program (source code) has been converted into a syntax tree,
the type checker plays a crucial role: by examining the syntax tree, it can tell whether each
operation is applied to variables of the correct data type. The type checker checks the tree and,
where the language permits, repairs mismatches (for example, by inserting conversions). It
produces an annotated syntax tree, and after that, INTERMEDIATE CODE generation is done.
4.4 Run Time Environment
• Run Time Environment establishes relationships between names and data objects.
• The allocation and de-allocation of data objects are managed by the Run Time Environment
• Each execution of a procedure is referred to as an activation of the procedure.
• If the procedure is recursive, several of its activations may be alive at the same time. Each call
of a procedure leads to an activation that may manipulate data objects allocated for its use.
• The representation of a data object at run time is determined by its type.
• Elementary data types, such as characters, integers, and reals can be represented by equivalent
data objects in the target machine.
• However, aggregates, such as arrays, strings , and structures, are usually represented by
collections of primitive objects.
Source Language Issues
1. Procedure
2. Activation Trees
3. Control Stack
4. The Scope of a Declaration
5. Bindings of Names

1. Procedure

• A procedure definition is a declaration that associates an identifier with a statement. The


identifier is the procedure name and the statement is the procedure body.
• When a procedure returns a value to its caller, it is usually called a function.
• A complete program will also be treated as a procedure.
• When a procedure name appears within an executable statement, we say that the procedure is
called at that point.
• The basic idea is that a procedure call executes the procedure body.
• Some of the identifiers appearing in a procedure definition are special, and are called formal
parameters of the procedure.
• Actual parameters may be passed to a called procedure.
• Procedures can contain local and global variables.
2. Activation Trees
• Each execution of procedure is referred to as an activation of the procedure.
• Lifetime of an activation is the sequence of steps present in the execution of the procedure.
• If a and b are two procedures, then their activations are either non-overlapping (when one is
called after the other has ended) or nested (when one is called while the other is still active).
• A procedure is recursive if a new activation begins before an earlier activation of the same
procedure has ended.
• An activation tree shows the way control enters and leaves activations.
Rules to Construct an Activation Tree
• Each node represents an activation of a procedure.
• The root node represents the activation of the main program.
• The node for a is the parent of the node for b if and only if control flows from activation a to b.
• The node for a is to the left of the node for b if and only if the lifetime of a occurs before the
lifetime of b.
Sample code for quicksort:

main()
{
    int n;
    readarray();
    quicksort(1, n);
}

quicksort(int m, int n)
{
    int i = partition(m, n);
    quicksort(m, i-1);
    quicksort(i+1, n);
}

The activation tree for this program

Activation Tree
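Since the figure is not reproduced here, the following is one possible activation tree for a nine-element array, assuming the calls to partition(1,9), partition(1,3), and partition(5,9) happen to return 4, 1, and 6; the exact shape depends on the data:

main
|-- readarray
|-- quicksort(1,9)
    |-- partition(1,9)
    |-- quicksort(1,3)
    |   |-- partition(1,3)
    |   |-- quicksort(1,0)
    |   |-- quicksort(2,3)
    |-- quicksort(5,9)
        |-- partition(5,9)
        |-- quicksort(5,5)
        |-- quicksort(7,9)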
3. Control Stack

• We can use a stack, called a control stack, to keep track of live procedure activations.
• The idea is to push the node for activation onto the control stack as the activation begins and to
pop the node when the activation ends.
• Then the contents of the control stack are related to paths to the root of the activation tree.
• When node n is at the top of the control stack, the stack contains the nodes along the path from
n to the root.
Example

• Consider the activations that have been reached when control enters the activation represented
by q(2,3). Activations with labels r, p(1,9), p(1,3), and q(1,3) have executed to completion, so
the figure contains dashed lines to their nodes. The solid lines mark the path from q(2,3) to the
root.

Control stack contains nodes along a path to the root


4. Scope of Declaration

• A declaration in a language is a syntactic construct that associates information with a name.


Declarations may be explicit, as in the Pascal fragment var i : integer; or they may be implicit.
• For example, in a Fortran program any variable whose name begins with one of the letters
I through N is assumed to denote an integer, unless otherwise declared.
• The scope rules of a language determine which declaration of a name applies when the name
appears in the text of a program.
• The portion of the program to which a declaration applies is called the scope of that
declaration. An occurrence of a name in a procedure is said to be local to the procedure if it is
in the scope of a declaration within the procedure; otherwise, the occurrence is said to be
nonlocal.
• At compile time, the symbol table can be used to find the declaration that applies to an
occurrence of a name.
• Qualifiers such as static, global, volatile, and final may also appear in declarations of variables.
5. Binding of Names

• Even if each name is declared once in a program, the same name may denote different data
objects at run time. The informal term "data object" corresponds to a storage location that can
hold values.
• In programming language semantics, the term environment refers to a function that maps a
name to a storage location, and the term state refers to a function that maps a storage location
to the value held there.
• Environments and states are different; an assignment changes the state, but not the
environment. For example, suppose that storage address 100, associated with variable pi, holds
0. After the assignment pi := 3. 14, the same storage address is associated with pi, but the value
held there is 3.14.

Binding of Names
4.5 Storage Organization

• The executing target program runs in its own logical address space in which each program
value has a location. The management and organization of this logical address space is shared
between the compiler, operating system, and target machine. The operating system maps the
logical addresses into physical addresses, which are usually spread throughout memory.
• The run-time representation of an object program in the logical address space consists of data
and program areas.
• The run time storage is subdivided to hold code and data as follows:
o The generated target code
o Data objects
o Control stack (which keeps track of information about procedure activations)
• The size of the generated target code is fixed at compile time, so the compiler can place the
executable target code in a statically determined area called Code, usually in the low end of memory.
• The size of some program data objects, such as global constants, and data generated by the
compiler, such as information to support garbage collection, may be known at compile time,
and these data objects can be placed in another statically determined area called Static.
• One reason for statically allocating as many data objects as possible is that the addresses of
these objects can be compiled into the target code.
• In early versions of Fortran, all data objects could be allocated statically.
Fig. Typical subdivision of run-time memory into code and data areas

• Run-time storage comes in blocks of contiguous bytes, where a byte is the smallest unit of
addressable memory and four bytes typically form a machine word. Multibyte objects are stored
in consecutive bytes and given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of the
target machine.
• A character array of length 10 needs only enough bytes to hold 10 characters, but a compiler
may allocate 12 bytes to get alignment, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.
• The size of some program objects may be known only at run time; storage for such objects
cannot be allocated statically.
• The dynamic areas used to maximize the utilization of space at run time are the stack and the heap.

Activation Records

Activation Records

• Procedure calls and returns are usually managed by a run-time stack called the control stack.
Each live activation has an activation record (sometimes called a frame) on the control stack.
The contents of activation records vary with the language being implemented.
• The following are the contents in an activation record
1. Temporary values, such as those arising from the evaluation of expressions, in cases where
those temporaries cannot be held in registers.
2. Local data belonging to the procedure whose activation record this is.
3. A saved machine status, with information about the state of the machine just before the call
to the procedure. This information typically includes the return address and the contents of
registers that were used by the calling procedure and that must be restored when the return
occurs.
4. An "access link" may be needed to locate data needed by the called procedure but found
elsewhere, e.g., in another activation record.
5. A control link, pointing to the activation record of the caller.
6. Space for the return value of the called function, if any. Again, not all called procedures
return a value, and if one does, we may prefer to place that value in a register for efficiency.
7. The actual parameters used by the calling procedure. Commonly, these values are not placed
in the activation record but rather in registers.
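Schematically, an activation record can be pictured as a C struct. Real frames are laid out by the compiler rather than declared like this, and the field names and sizes below are purely illustrative:

typedef struct ActivationRecord {
    long   actual_params[4];    /* 7. often passed in registers instead */
    long   return_value;        /* 6. often returned in a register instead */
    struct ActivationRecord *control_link;  /* 5. caller's record */
    struct ActivationRecord *access_link;   /* 4. record of enclosing scope */
    void  *return_address;      /* 3. part of the saved machine status */
    long   saved_registers[8];  /* 3. rest of the saved machine status */
    long   local_data[8];       /* 2. locals of this procedure */
    long   temporaries[8];      /* 1. spilled expression temporaries */
} ActivationRecord;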
4.6 Storage Allocation
There are basically three storage-allocation strategies, used in the three data areas of the
organization.
1. Static allocation lays out storage for all data objects at compile time.
2. Stack allocation manages the run-time storage as a stack,
3. Heap allocation allocates and de-allocates storage as needed at run time from a data
area known as a heap.
1. Static Allocation
• In static allocation, names are bound to storage as the program is compiled, so there is no need
for a run-time support package.
• Since the bindings do not change at run time, every time a procedure is activated, its names
are bound to the same storage locations.
The above property allows the values of local names to be retained across activations of a
procedure. That is, when control returns to a procedure, the values of the locals are the same as
they were when control left the last time
• From the type of a name, the compiler determines the amount of storage to set aside for that
name.
• The address of this storage consists of an offset from an end of the activation record for the
procedure.
• The compiler must eventually decide where the activation records go, relative to the target
code and to one another.
Limitation of Static Allocation
• The size of a data object and constraints on its position in memory must be known at compile
time.
• Recursive procedures are restricted, because all activations of a procedure use the same
bindings for local names.
• Dynamic allocation is not possible, so data structures cannot be created dynamically.

Static Allocation
2. Stack Allocation
• Stack allocation is based on the idea of a control stack.
• A stack is a Last In First Out (LIFO) storage device where new storage is allocated and
deallocated at only one "end", called the top of the stack.
• Storage is organized as a stack, and activation records are pushed and popped as activations
begin and end, respectively.
• Storage for the locals in each call of a procedure is contained in the activation record for that
call. Thus locals are bound to fresh storage in each activation, because a new activation record
is pushed onto the stack when a call is made.
• Furthermore, the values of locals are deleted when the activation ends; that is, the values are
lost because the storage for locals disappears when the activation record is popped.
• At run time, an activation record can be allocated and de-allocated by incrementing and
decrementing the top of the stack, respectively.
Stack Allocation

3. Heap Allocation
• The deallocation of activation records need not occur in a last-in first-out fashion, so storage
cannot be organized as a stack.
• Heap allocation parcels out pieces of contiguous storage, as needed for activation records or
other objects. Pieces may be deallocated in any order. So over time the heap will consist of
alternate areas that are free and in use.
• The heap is an alternative to the stack.
4.7 Access to Non-local Names:
• When a procedure refers to variables that are not local to it, such variables are called
nonlocal variables.
• There are two types of scope rules, for the non-local names. They are
Static scope
Dynamic scope

Static Scope or Lexical Scope


• Lexical scope is also called static scope. In this type of scope, the scope of a name is determined
by examining the text of the program.
• Examples: Pascal, C, and Ada are languages that use the static scope rule.
• These languages are also called block-structured languages.
Block
• A block is a sequence of statements containing its own local data declarations; it defines a new
scope and is enclosed within delimiters.
Example:
{
Declaration statements
……….
}
• The beginning and end of a block are specified by delimiters. Blocks can be nested, meaning
that a block B2 can lie completely inside a block B1.
• In a block-structured language, the scope of a declaration is given by the most closely
nested rule.
• At a program point, the visible declarations are:
  - the declarations made inside the procedure,
  - the names of all enclosing procedures,
  - the declarations of names made immediately within such procedures.
• Storage for the names of a particular block is allocated when the block is entered; thus,
block-structured storage allocation can be done with a stack.
Lexical Scope for Nested Procedure
• If a procedure is declared inside another procedure then that procedure is known as nested procedure
• A procedure pi can call any procedure that is its direct ancestor or an older sibling of one of its direct ancestors.
Procedure main
Procedure P1
Procedure P2
Procedure P3
Procedure P4

Nesting Depth:
• Lexical scope can be implemented by using the nesting depth of a procedure. The procedure for
calculating nesting depth is as follows:
  - the main program's nesting depth is 1;
  - each time a new procedure begins, add 1 to the nesting depth;
  - each time you exit from a nested procedure, subtract 1 from the depth;
  - a variable declared in a specific procedure is associated with that procedure's nesting depth.
Static Scope or Lexical Scope
• Lexical scope can be implemented using access links and displays.

Access Link:
• Access links are pointers used in the implementation of lexical scope, obtained by adding a
pointer field to each activation record.
• If procedure p is nested immediately within procedure q, then the access link of p points to the
most recent activation record of procedure q.
Example: Consider the following piece of code and the runtime stack during execution of the program
program test;
var a : int;

procedure A;
var d : int;
{
    a := 1;
}

procedure B(i : int);
var b : int;

    procedure C;
    var k : int;
    {
        A;
    }

{
    if (i <> 0) then B(i-1)
    else C;
}

{
    B(1);
}
Displays:
• If access links are used in the search, the search can be slow.
• So an optimization is used: access an activation record directly, from a known location,
without any search.
• A display is a global array d of pointers to activation records, indexed by lexical nesting depth.
The number of display elements can be known at compile time.
• The array element d[i] points to the most recent activation of the block at nesting depth (or
lexical level) i.
• A nonlocal X is found in the following manner:
  - Use one array access to find the activation record containing X: if the most closely nested
declaration of X is at nesting depth i, then d[i] points to the activation record containing the
location for X.
  - Then use the relative address of X within that activation record.
Example:
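A minimal C sketch of display-based access, reusing the ActivationRecord struct pictured earlier; MAX_DEPTH, display, and the offset convention are assumptions of this sketch:

#define MAX_DEPTH 8

static struct ActivationRecord *display[MAX_DEPTH];  /* indexed by nesting depth */

/* Address of a nonlocal X declared at nesting depth 'depth',
   stored at 'offset' within that frame's local data. */
long *nonlocal_addr(int depth, int offset) {
    struct ActivationRecord *ar = display[depth];  /* one array access, no search */
    return &ar->local_data[offset];                /* relative address in frame */
}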

Dynamic Scope
The dynamic scope rules are used for non-block-structured languages. In this type of
scoping, a non-local variable access refers to the non-local data declared in the most recently
called and still active procedure. There are two methods to implement non-local access under
dynamic scoping −
Deep Access − The basic concept is to keep a stack of active variables: use control links instead of
access links and, to find a variable, search the stack from top to bottom looking for the most recent
activation record that contains space for the desired variable. This method of accessing nonlocal
variables is called deep access because the search goes "deep" into the stack. In this method, a
symbol table must be used at runtime.
Shallow Access − The idea is to keep central storage and allot one slot for every variable
name. If the names are not created at runtime, the storage layout can be fixed at compile time.
Otherwise, when a new activation of a procedure occurs, the procedure saves and restores the
storage entries for its locals at entry and exit (i.e., while entering the procedure and while
leaving it).

4.8 Parameter Passing


• The communication medium among procedures is known as parameter passing. The values of
the variables from a calling procedure are transferred to the called procedure by some
mechanism.
R- value
• The value of an expression is called its r-value. The value contained in a single variable also
becomes an r-value if it appears on the right side of the assignment operator.
• An r-value can always be assigned to some other variable.
L-value
• The location in memory (address) where the value of an expression is stored is known as the
l-value of that expression.
• It always appears on the left side of the assignment operator.

a = 1;      a has both an l-value and an r-value
b = 2 * a;  a is used for its r-value; b for its l-value
b = 7 + b;  b appears on both sides: as an l-value on the left and an r-value on the right
7 = x + y;  l-value error, as the constant 7 does not represent any memory location

Types of Parameter Passing


i. Formal Parameter: Variables that take the information passed by the caller procedure
are called formal parameters. These variables are declared in the definition of the called
function.
ii. Actual Parameter: Variables or expressions whose values are passed to the called
function are called actual parameters. These variables are specified in the function call as
arguments.
Different ways of passing the parameters to the procedure
1. Call by Value
2. Call by reference
3. Copy restore
4. Call by name
1. Call by Value
• In call by value, the calling procedure passes the r-values of the actual parameters, and the
compiler puts them into the called procedure's activation record.
• Formal parameters hold the values passed by the calling procedure; thus any changes made to
the formal parameters do not affect the actual parameters.
2. Call by Reference

• In call by reference, the formal and actual parameters refer to the same memory location.
• The l-value of the actual parameter is copied into the activation record of the called function;
thus the called function has the address of the actual parameter.
• If the actual parameter does not have an l-value (e.g., i+3), then it is evaluated in a
new temporary location and the address of that location is passed.
• Any changes made to the formal parameter are reflected in the actual parameter (because
changes are made at the address).
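The difference is easy to see in C, where call by reference is simulated by passing a pointer (C itself passes everything by value):

#include <stdio.h>

void by_value(int x)      { x = 99; }   /* changes only the local copy */
void by_reference(int *x) { *x = 99; }  /* writes through the passed l-value */

int main(void) {
    int a = 10;
    by_value(a);
    printf("%d\n", a);   /* prints 10: actual parameter unaffected */
    by_reference(&a);
    printf("%d\n", a);   /* prints 99: change visible at the address */
    return 0;
}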

3. Call by Copy Restore

• In call by copy-restore, the compiler copies the values into the formal parameters when the
procedure is called, and copies them back into the actual parameters when control returns to
the calling procedure.
• The r-values are passed and, on return, the r-values of the formals are copied into the l-values
of the actuals.
Example:
int y;
calling_procedure()
{
y = 10;
copy_restore(y); //l-value of y is passed
printf y; //prints 99
}
copy_restore(int x)
{
x = 99; // y still has value 10 (unaffected)
y = 0; // y is now 0
}
4. Call by Name
• In call by name, the actual parameters are substituted for the formals in all the places the
formals occur in the procedure.
• It is also referred to as lazy evaluation because a parameter is evaluated only when it is
needed.
4.9 Symbol Table

• Symbol tables are data structures that are used by compilers to hold information about source-
program constructs. The information is collected incrementally by the analysis phases of a
compiler and used by the synthesis phases to generate the target code.
• Entries in the symbol table contain information about an identifier such as its character string
(or lexeme) , its type, its position in storage, and any other relevant information.

Symbol Table

• The symbol table, which stores information about the entire source program, is used by all
phases of the compiler.
• An essential function of a compiler is to record the variable names used in the source program
and collect information about various attributes of each name.
• These attributes may provide information about the storage allocated for a name, its type, its
scope.
• A symbol table may serve the following purposes depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify if a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).
o Function name, parameter and variable.
The symbol table is an important data structure created and maintained by the compiler in order
to keep track of the semantics of variables: it stores information about the scope and binding
of names and about instances of various entities such as variable and
function names, classes, and objects.
• It is built during the lexical and syntax analysis phases.
• The information is collected by the analysis phases of the compiler and is used by the
synthesis phases of the compiler to generate code.
• It is used by the compiler to achieve compile-time efficiency.
• It is used by various phases of the compiler as follows:-
1. Lexical Analysis: Creates new table entries in the table, for example like entries
about tokens.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of
reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to check for semantics i.e.
to verify that expressions and assignments are semantically correct(type checking)
and update it accordingly.
4. Intermediate Code generation: Refers to the symbol table to know how much and
what type of run-time storage is allocated; the table also helps in adding temporary
variable information.
5. Code Optimization: Uses information present in the symbol table for machine-
dependent optimization.
6. Target Code generation: Generates code by using address information of identifier
present in the table.
Symbol-Table Entries
• A compiler uses a symbol table to keep track of scope and binding information about names.
The symbol table is searched every time a name is encountered in the source text.
• Changes to the table occur if a new name or new information about an existing name is
discovered. A linear list is the simplest to implement, but its performance is poor. Hashing
schemes provide better performance.
• The symbol table grows dynamically, since its final size cannot be fixed at compile time.
• Each entry in the symbol table is for the declaration of a name.
• The format of entries is not uniform.
• The following information about identifiers are stored in symbol table.
o The name.
o The data type.
o The block level.
o Its scope (local, global).
o Pointer / address
o Its offset from base pointer
Information used by the compiler from Symbol table:
• Data type and name
• Declaring procedures
• Offset in storage
• If structure or record then, a pointer to structure table.
• For parameters, whether parameter passing by value or by reference
• Number and type of arguments passed to function
• Base Address
Operations of Symbol table – The basic operations defined on a symbol table are insert, which
enters a new name and its attributes into the table, and lookup, which searches the table for the
entry, if any, of a given name.

Implementation of Symbol table(Data Structures)


Following are commonly used data structures for implementing symbol table:-
1. List –
• In this method, an array is used to store names and associated information.
• A pointer “available” is maintained at end of all stored records and new names are
added in the order as they arrive
• To search for a name we scan from the beginning of the list up to the "available" pointer;
if the name is not found, we report the error "use of an undeclared name".
• While inserting a new name we must ensure that it is not already present; otherwise an
error occurs, i.e., "multiply defined name".
• Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
• The advantage is that it takes a minimum amount of space.
2. Linked List –
• This implementation is using a linked list. A link field is added to each record.
• Searching of names is done in order pointed by the link of the link field.
• A pointer “First” is maintained to point to the first record of the symbol table.
• Insertion is fast O(1), but lookup is slow for large tables – O(n) on average
3. Hash Table –
• In the hashing scheme, two tables are maintained – a hash table and a symbol table; this is
the most commonly used method to implement symbol tables.
• A hash table is an array with an index range: 0 to table size – 1. These entries are
pointers pointing to the names of the symbol table.
• To search for a name we use a hash function that will result in an integer between 0 to
table size – 1.
• Insertion and lookup can be made very fast – O(1).
• The advantage is that fast search is possible; the disadvantage is that hashing is
more complicated to implement (a minimal sketch appears after this list).
4. Binary Search Tree –
• Another approach to implementing a symbol table is to use a binary search tree i.e. we
add two link fields i.e. left and right child.
• All names are inserted as nodes so that the tree always maintains the
binary search tree property.
• Insertion and lookup are O(log2 n) on average.
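Below is the promised minimal sketch of a hash-table symbol table in C, using open hashing with chained entries; the table size, entry layout, and function names are illustrative assumptions (strdup is POSIX):

#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 211

typedef struct Entry {
    char *name;           /* the lexeme */
    char *type;           /* attribute; a real compiler stores richer data */
    struct Entry *next;   /* chain of entries whose names collide */
} Entry;

static Entry *table[TABLE_SIZE];

static unsigned hash(const char *s) {    /* maps a name to 0..TABLE_SIZE-1 */
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

Entry *lookup_sym(const char *name) {
    for (Entry *e = table[hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;                         /* use of an undeclared name */
}

Entry *insert_sym(const char *name, const char *type) {
    if (lookup_sym(name)) return NULL;   /* multiply defined name */
    unsigned h = hash(name);
    Entry *e = malloc(sizeof *e);
    e->name = strdup(name);
    e->type = strdup(type);
    e->next = table[h];                  /* push onto the chain: O(1) */
    table[h] = e;
    return e;
}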
4.10 Dynamic Storage Allocation
• The techniques needed to implement dynamic storage allocation depend mainly on how
storage is deallocated. If deallocation is implicit, then the run-time support package is responsible
for determining when a storage block is no longer needed. There is less for the compiler to do if
deallocation is done explicitly by the programmer.
Explicit Allocation of Fixed-Sized Blocks

• The simplest form of dynamic allocation involves blocks of a fixed size.


• Allocation and deallocation can be done quickly with little or no storage overhead.
• Suppose that blocks are to be drawn from a contiguous area of storage. Initialization of the
area is done by using a portion of each block for a link to the next block.
• A pointer available points to the first block. Allocation consists of taking a block off the list
and deallocation consists of putting the block back on the list.
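A sketch of this scheme in C: while a block is free, its first word links to the next free block, so allocation and deallocation are constant-time list operations (the sizes and names here are assumptions):

typedef union Block {
    union Block *next;            /* link used while the block is free */
    char payload[32];             /* user data while the block is in use */
} Block;

#define NBLOCKS 1024

static Block  area[NBLOCKS];      /* contiguous area of storage */
static Block *available;          /* head of the free list */

void init_blocks(void) {          /* link every block into the list */
    for (int i = 0; i < NBLOCKS - 1; i++)
        area[i].next = &area[i + 1];
    area[NBLOCKS - 1].next = NULL;
    available = area;
}

void *alloc_block(void) {         /* take a block off the list */
    Block *b = available;
    if (b) available = b->next;
    return b;
}

void free_block(void *p) {        /* put the block back on the list */
    Block *b = p;
    b->next = available;
    available = b;
}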
Explicit Allocation of Variable-Sized Blocks

• When blocks are allocated and deallocated, storage can become fragmented; that is, the heap
may consist of alternating blocks that are free and in use.
• This situation can occur if a program allocates five blocks and then deallocates the second and
fourth.
• Fragmentation is of no consequence if blocks are of fixed size, but it matters if they are of
variable size, because we could not allocate a block larger than any one of the free blocks, even
though the total free space is sufficient.
• First fit, worst fit and best fit are some methods for allocating variable-sized blocks.

Implicit Deallocation
• Implicit deallocation requires cooperation between the user program and the run-
time package, because the latter needs to know when a storage block is no longer in use.
• This cooperation is implemented by fixing the format of storage blocks.

• Reference counts:
o We keep track of the number of blocks that point directly to the present block. If this
count ever drops to 0, then the block can be deallocated because it cannot be referred
to.
o In other words, the block has become garbage that can be collected. Maintaining
reference counts can be costly in time.
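In outline, reference counting might look like this in C; rc_alloc, rc_retain, and rc_release are hypothetical names, and a real collector must also deal with cycles, which plain reference counting cannot reclaim:

#include <stdlib.h>

typedef struct RcBlock {
    int refcount;       /* number of pointers to this block */
    /* ... user data would follow ... */
} RcBlock;

RcBlock *rc_alloc(void) {
    RcBlock *b = calloc(1, sizeof *b);
    b->refcount = 1;    /* one reference: the creator */
    return b;
}

void rc_retain(RcBlock *b) { b->refcount++; }

void rc_release(RcBlock *b) {
    if (--b->refcount == 0)
        free(b);        /* count hit zero: the block is garbage */
}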

• Marking techniques:
o An alternative approach is to temporarily suspend execution of the user program and
use the frozen pointers to determine which blocks are in use.
