Unit-4 F&CD
Type Checking
P -> D ; E
D -> D ; D | id : T
• A function lookup(e) is used to fetch the type saved in the symbol-table entry pointed to by e.
• When an identifier appears in an expression, its declared type is fetched and assigned to the
attribute type.
• The expression formed by applying the mod operator to two subexpressions of type integer has
type integer; otherwise, its type is type_error. The rule is
E -> E1 mod E2   { E.type := if E1.type = integer and E2.type = integer
                             then integer
                             else type_error }
• In an array reference E1 [E2], the index expression E2 must have type integer, in which case
the result is the element type t obtained from the type array(s, t) of E1; we make no use of the
index set s of the array.
E -> E1 [ E2 ]   { E.type := if E2.type = integer and E1.type = array(s, t)
                             then t
                             else type_error }
• Within expressions, the postfix operator ↑ yields the object pointed to by its operand. The type
of E↑ is the type t of the object pointed to by the pointer E:
E -> E1 ↑   { E.type := if E1.type = pointer(t) then t
                        else type_error }
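These rules translate almost directly into code. The following is a minimal C sketch (not part of the original text) of a checker for the mod, array-indexing, and dereference rules above; the Type representation and names such as check_mod are invented for the illustration.

#include <stdio.h>

typedef enum { T_INTEGER, T_ARRAY, T_POINTER, T_ERROR } Kind;

typedef struct Type {
    Kind kind;
    struct Type *elem;   /* element type for ARRAY, target type for POINTER */
} Type;

Type t_int   = { T_INTEGER, NULL };
Type t_error = { T_ERROR,   NULL };

/* E -> E1 mod E2 */
Type *check_mod(Type *e1, Type *e2) {
    if (e1->kind == T_INTEGER && e2->kind == T_INTEGER) return &t_int;
    return &t_error;
}

/* E -> E1 [ E2 ] */
Type *check_index(Type *e1, Type *e2) {
    if (e2->kind == T_INTEGER && e1->kind == T_ARRAY) return e1->elem;
    return &t_error;
}

/* E -> E1 ↑ (pointer dereference) */
Type *check_deref(Type *e1) {
    if (e1->kind == T_POINTER) return e1->elem;
    return &t_error;
}

int main(void) {
    Type arr = { T_ARRAY, &t_int };
    printf("%d\n", check_index(&arr, &t_int)->kind);  /* prints 0 (T_INTEGER) */
    printf("%d\n", check_mod(&t_int, &arr)->kind);    /* prints 3 (T_ERROR)   */
    return 0;
}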
While      S -> while E do S1   { S.type := if E.type = boolean then S1.type
                                            else type_error }
Sequence   S -> S1 ; S2         { S.type := if S1.type = void and S2.type = void then void
                                            else type_error }
• An expression may be formed by applying one expression to another (a function application). The
rules for associating type expressions with the nonterminal T can be augmented by the following
production and action to permit function types in declarations.
T -> T1 '->' T2   { T.type := T1.type -> T2.type }
• Quotes around the arrow used as a function constructor distinguish it from the arrow used as
the meta symbol in a production.
• The rule for checking the type of a function application is
E -> E1 ( E2 )   { E.type := if E2.type = s and E1.type = s -> t
                             then t
                             else type_error }
• In some languages, types can be given names (data type names), as in the Pascal program fragment
below.
• The identifier link is declared to be a name for the type ↑cell. Whether the variables next, last,
p, q, and r all have identical types depends on the implementation, i.e., on whether structural or
name equivalence is used.
• Type graph is constructed to check the name equivalence.
o Every time a type constructor or basic type is seen, a new node is created.
o Every time a new type name is seen, a leaf is created.
o Two type expressions are equivalent if they are represented by the same node in the
type graph.
Example: Consider the following Pascal program fragment.
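The fragment itself does not appear in these notes; a version consistent with the discussion below (the classic textbook example, writing the pointer constructor as ↑ to match the notation above, where standard Pascal source uses ^) is:

type link = ↑cell;
var  next : link;
     last : link;
     p    : ↑cell;
     q, r : ↑cell;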
Name Equivalence
• The identifier link is declared to be a name for the type ↑cell. New implicit type names np and nqr
have been introduced for the type of p and for the type of q and r, respectively.
• Since next and last are declared with the same type name, they are treated as having equivalent
types. Similarly, q and r are treated as having equivalent types because the same implicit type
name is associated with them.
• However, p, q, and next do not have equivalent types, since they all have types with different
names.
Pascal Program-Fragment
Note that the type name cell has three parents, all labeled pointer. An equal sign appears between the
type name link and the node in the type graph to which it refers.
4.3 Type Conversion
Conversion from one type to another is said to be implicit if it is done
automatically by the compiler. Implicit type conversions are also called coercions, and
coercion is limited in many languages.
Example: An integer may be converted to a real, but a real is not converted to an integer.
Conversion is said to be explicit if the programmer writes something to do the conversion.
Tasks:
1. The type checker has to ensure that indexing is applied only to an array.
2. It has to check the ranges of the data types used; for example:
o INTEGER (int) has a range of -32,768 to +32,767.
o FLOAT has a range of 1.2E-38 to 3.4E+38.
Type conversion is an operation that takes a data object of one type and creates an
equivalent data object of another type. The signature of a type conversion operation is
given as
conversion_op : type1 → type2
There are two types of type conversions which are as follows −
• Implicit type conversion (Coercions) − Programming languages that
allow mixed-mode expressions must define conventions for implicit
operand type conversions.
Coercion is defined as an automatic conversion between types. For example, in Pascal, if one
operand of an addition is of type integer and the other of type real, the
integer data object is implicitly converted to type real before the addition is performed.
• Explicit type conversion − Some languages provide facilities for doing
explicit conversions, both widening and narrowing. In some cases, warning
messages are produced when an explicit narrowing conversion results in a
meaningful change to the value of the object being converted.
For example, Pascal provides a built-in function round that converts a real-number data
object to an integer data object whose value is the rounded value of the real. In C-
based languages, explicit type conversions are known as casts. The desired type is placed in
parentheses just before the expression to be converted, as in (int) X, which for float X
converts the value of X to type integer. One of the reasons for the parentheses in C
conversions is that C has several two-word type names, such as long int.
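A short C illustration of both kinds of conversion (the variable names are invented):

#include <stdio.h>

int main(void) {
    float x = 3.9f;
    int   i = 2;

    float sum = i + x;    /* implicit coercion: i is widened to float */
    int   n   = (int) x;  /* explicit cast: fraction discarded, n == 3 */

    printf("%f %d\n", sum, n);   /* prints 5.900000 3 */
    return 0;
}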
Advantages of Type Conversion
There are the following advantages of type conversion which are as follows −
• If during type checking, a mismatch appears between the actual type of an
argument and the expected type for that operation, then type conversion easily
converts the data object implicitly and prevents the error.
• In some languages such as C, type conversion is a built-in function, which
implicitly casts an expression to convert it to the correct type.
• Type conversion is automatically invoked in certain cases of mismatch. For
example, in Pascal, if the arguments for an arithmetic operation such as ‘+’
are of mixed real and integer type, the integer data object is implicitly
converted to type real before the addition is performed.
• No data is hidden. Because each short integer can be represented as a long
integer, no information is lost by a widening conversion such as short int → long int.
• With dynamic type checking, conversions (coercions) are performed at the point where
the type mismatch is recognized during execution. For such languages,
narrowing conversions can be enabled. With static type checking, the conversion
code is added to the compiled program.
• Type conversion is a subtle necessity when there is a large number of data types in
a programming language.
Types of Type Checking:
There are two kinds of type checking:
1. Static Type Checking.
2. Dynamic Type Checking.
1. Static Type Checking:
Static type checking is type checking performed at compile time. It checks the
types of variables at compile time, which means the type of each variable is known at compile
time. It generally examines the program text during the translation of the program. Using the
type rules of a system, a compiler can infer from the source text that a function (fun) will be
applied to an operand (a) of the right type each time the expression fun(a) is evaluated.
Examples of Static checks include:
• Type checks: A compiler should report an error if an operator is applied to an
incompatible operand; for example, if an array variable and a function variable are added
together.
• Flow-of-control checks: Statements that cause the flow of control to leave a
construct must have somewhere to transfer the flow of control. For example, a
break statement in C causes control to leave the smallest enclosing while, for, or switch
statement; an error occurs if such an enclosing statement does not exist.
• Uniqueness checks: There are situations in which an object must be defined exactly once.
For example, in Pascal an identifier must be declared uniquely, labels in a case statement
must be distinct, and elements of a scalar type may not be repeated.
• Name-related checks: Sometimes the same name may appear two or more times. For
example in Ada, a loop may have a name that appears at the beginning and end of the
construct. The compiler must check that the same name is used at both places.
The Benefits of Static Type Checking:
1. Runtime Error Protection.
2. It catches syntactic errors like spurious words or extra punctuation.
3. It catches wrong names, such as misspelled references to predefined names.
4. Detects incorrect argument types.
5. It catches the wrong number of arguments.
6. It catches wrong return types, like return “70”, from a function that’s declared to return
an int.
2. Dynamic Type Checking:
Dynamic type checking is type checking performed at run time. In dynamic
type checking, types are associated with values, not variables. In implementations of dynamically
type-checked languages, each runtime object is generally tagged with a type tag, i.e., a reference
to a type containing its type information. Dynamic typing is more flexible: a
static type system always restricts what can be conveniently expressed. Dynamic typing results in
more compact programs, since it does not require types to be spelled out.
Programming with a static type system often requires more design and implementation effort.
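A minimal C sketch of the type-tag idea, assuming an invented Value representation; implementations of dynamically typed languages do something similar internally:

#include <stdio.h>

typedef enum { TAG_INT, TAG_REAL } Tag;

typedef struct {
    Tag tag;                       /* run-time type information */
    union { int i; double r; } u;
} Value;

/* add inspects the tags at run time and coerces mixed operands to real */
Value add(Value a, Value b) {
    Value out;
    if (a.tag == TAG_INT && b.tag == TAG_INT) {
        out.tag = TAG_INT;
        out.u.i = a.u.i + b.u.i;
    } else {
        double x = (a.tag == TAG_INT) ? a.u.i : a.u.r;
        double y = (b.tag == TAG_INT) ? b.u.i : b.u.r;
        out.tag = TAG_REAL;
        out.u.r = x + y;
    }
    return out;
}

int main(void) {
    Value a = { TAG_INT,  { .i = 2 } };
    Value b = { TAG_REAL, { .r = 0.5 } };
    printf("%f\n", add(a, b).u.r);   /* prints 2.500000 */
    return 0;
}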
Languages like Pascal and C have static type checking. Type checking is used to check the
correctness of the program before its execution. The main purpose of type checking is to verify
the correctness of data-type assignments and type casts before the program's execution.
Static type checking is also used to determine the amount of memory needed to store a variable.
The design of the type checker depends on:
1. The syntactic structure of language constructs.
2. The expressions of the language.
3. The rules for assigning types to constructs (semantic rules).
The Position of the Type Checker in the Compiler:
The token stream from the lexical analyzer is passed to the PARSER, which generates a syntax
tree. Once the program (source code) is converted into a syntax tree, the type checker plays a
crucial role: by examining the syntax tree, it can tell whether each operation is applied to
operands of the correct type. The type checker checks the tree and, if any corrections (such as
inserted conversions) are needed, it modifies the tree. It produces a (type-annotated) syntax
tree, and after that INTERMEDIATE CODE generation is done.
4.4 Run Time Environment
• Run Time Environment establishes relationships between names and data objects.
• The allocation and de-allocation of data objects are managed by the Run Time Environment
• Each execution of a procedure is referred to as an activation of the procedure.
• If the procedure is recursive, several of its activations may be alive at the same time. Each call
of a procedure leads to an activation that may manipulate data objects allocated for its use.
• The representation of a data object at run time is determined by its type.
• Elementary data types, such as characters, integers, and reals can be represented by equivalent
data objects in the target machine.
• However, aggregates, such as arrays, strings, and structures, are usually represented by
collections of primitive objects.
Source Language Issues
1. Procedure
2. Activation Trees
3. Control Stack
4. The Scope of a Declaration
5. Bindings of Names
1. Procedure
2. Activation Trees
3. Control Stack
• We can use a stack, called a control stack, to keep track of live procedure activations.
• The idea is to push the node for activation onto the control stack as the activation begins and to
pop the node when the activation ends.
• Then the contents of the control stack are related to paths to the root of the activation tree.
• When node n is at the top of the control stack, the stack contains the nodes along the path from
n to the root.
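As a rough C illustration (not the figure's example), the sketch below uses invented enter/leave helpers to print pushes and pops; at any moment the pushed-but-not-popped names are exactly the path from the current activation to the root.

#include <stdio.h>

static int depth = 0;   /* current height of the control stack */

void enter(const char *name) {
    printf("%*spush %s\n", depth * 2, "", name);
    depth++;                       /* activation begins: push its node */
}
void leave(const char *name) {
    depth--;                       /* activation ends: pop its node */
    printf("%*spop  %s\n", depth * 2, "", name);
}

void r(void) { enter("r"); leave("r"); }

void q(int m, int n) {             /* a recursive procedure */
    enter("q");
    if (n - m > 1) { q(m, (m + n) / 2); q((m + n) / 2, n); }
    leave("q");
}

int main(void) {
    enter("main");
    r();
    q(1, 4);
    leave("main");
    return 0;
}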
Example
• Consider the activation tree that has been reached when control enters the activation represented by
q(2, 3). Activations with labels r, p(1, 9), p(1, 3), and q(1, 3) have executed to completion, so
the figure contains dashed lines to their nodes. The solid lines mark the path from q(2, 3) to the
root.
Bindings of Names
• Even if each name is declared once in a program, the same name may denote different data
objects at run time. The informal term "data object" corresponds to a storage location that can
hold values.
• In programming language semantics, the term environment refers to a function that maps a
name to a storage location, and the term state refers to a function that maps a storage location
to the value held there.
• Environments and states are different; an assignment changes the state, but not the
environment. For example, suppose that storage address 100, associated with variable pi, holds
0. After the assignment pi := 3.14, the same storage address is associated with pi, but the value
held there is 3.14.
4.5 Storage Organization
• The executing target program runs in its own logical address space in which each program
value has a location. The management and organization of this logical address space is shared
between the compiler, operating system, and target machine. The operating system maps the
logical addresses into physical addresses, which are usually spread throughout memory.
• The run-time representation of an object program in the logical address space consists of data
and program areas.
• The run-time storage is subdivided to hold code and data as follows:
o The generated target code
o Data objects
o A control stack (which keeps track of information about procedure activations)
• The size of the generated target code is fixed at compile time, so the compiler can place the
executable target code in a statically determined area Code, usually in the low end of memory.
• The size of some program data objects, such as global constants, and data generated by the
compiler, such as information to support garbage collection, may be known at compile time,
and these data objects can be placed in another statically determined area called Static.
• One reason for statically allocating as many data objects as possible is that the addresses of
these objects can be compiled into the target code.
• In early versions of Fortran, all data objects could be allocated statically.
Fig. Typical subdivision of run-time memory into code and data areas
• Run-time storage comes in blocks of contiguous bytes, where a byte is the smallest unit of
addressable memory; four bytes typically form a machine word. Multibyte objects are stored in
consecutive bytes and given the address of the first byte.
• The storage layout for data objects is strongly influenced by the addressing constraints of the
target machine.
• Although a character array of length 10 needs only enough bytes to hold 10 characters, a compiler
may allocate 12 bytes to satisfy alignment, leaving 2 bytes unused.
• This unused space due to alignment considerations is referred to as padding.
• The size of some other program objects may be known only at run time; such objects cannot be
placed in a statically determined area.
• The dynamic areas used to maximize the utilization of space at run time are stack and heap.
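Padding can be observed directly in C; exact sizes depend on the target ABI, so the figure of 8 bytes below is typical rather than guaranteed:

#include <stdio.h>

struct rec {
    char c;   /* 1 byte */
    int  n;   /* 4 bytes, usually aligned to a 4-byte boundary, so */
};            /* 3 bytes of padding typically follow c             */

int main(void) {
    printf("%zu\n", sizeof(struct rec));   /* commonly prints 8, not 5 */
    return 0;
}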
Activation Records
• Procedure calls and returns are usually managed by a run-time stack called the control stack.
Each live activation has an activation record (sometimes called a frame) on the control stack.
The contents of activation records vary with the language being implemented.
• The following are the contents of an activation record (a schematic layout is sketched after this list):
1. Temporary values, such as those arising from the evaluation of expressions, in cases where
those temporaries cannot be held in registers.
2. Local data belonging to the procedure whose activation record this is.
3. A saved machine status, with information about the state of the machine just before the call
to the procedure. This information typically includes the return address and the contents of
registers that were used by the calling procedure and that must be restored when the return
occurs.
4. An "access link" may be needed to locate data needed by the called procedure but found
elsewhere, e.g., in another activation record.
5. A control link, pointing to the activation record of the caller.
6. Space for the return value of the called function, if any. Again, not all called procedures
return a value, and if one does, we may prefer to place that value in a register for efficiency.
7. The actual parameters used by the calling procedure. Commonly, these values are not placed
in the activation record but rather in registers.
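A schematic C rendering of the field list above; real frames are laid out by the compiler rather than declared as a struct, and the field types here are placeholders:

typedef struct ActivationRecord {
    int   *actual_params;               /* 7. actual parameters             */
    int    return_value;                /* 6. returned value, if any        */
    struct ActivationRecord *control;   /* 5. control link (caller's frame) */
    struct ActivationRecord *access;    /* 4. access link (enclosing scope) */
    void  *saved_machine_status;        /* 3. return address, saved regs    */
    int    locals[8];                   /* 2. local data                    */
    int    temporaries[4];              /* 1. temporary values              */
} ActivationRecord;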
4.6 Storage Allocation
Three storage-allocation strategies are used, one in each of the three data areas of the
organization.
1. Static allocation lays out storage for all data objects at compile time.
2. Stack allocation manages the run-time storage as a stack.
3. Heap allocation allocates and de-allocates storage as needed at run time from a data
area known as a heap.
1. Static Allocation
• In static allocation, names are bound to storage as the program is compiled, so there is no need
for a run-time support package.
• Since the bindings do not change at run time, every time a procedure is activated, its names
are bound to the same storage locations.
• The above property allows the values of local names to be retained across activations of a
procedure. That is, when control returns to a procedure, the values of the locals are the same as
they were when control left the last time (see the C sketch after these bullets).
• From the type of a name, the compiler determines the amount of storage to set aside for that
name.
• The address of this storage consists of an offset from an end of the activation record for the
procedure.
• The compiler must eventually decide where the activation records go, relative to the target
code and to one another.
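In C, a local declared static behaves this way, which makes the retention property easy to see (a minimal sketch):

#include <stdio.h>

int counter(void) {
    static int count = 0;   /* bound to one fixed location for the whole run */
    return ++count;         /* the value survives across activations */
}

int main(void) {
    printf("%d\n", counter());   /* 1 */
    printf("%d\n", counter());   /* 2 */
    printf("%d\n", counter());   /* 3 */
    return 0;
}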
Limitation of Static Allocation
• The size of a data object and constraints on its position in memory must be known at compile
time.
• Recursive procedures are restricted, because all activations of a procedure use the same
bindings for local names.
• Dynamic allocation is not allowed, so data structures cannot be created dynamically.
Static Allocation
2. Stack Allocation
• Stack allocation is based on the idea of a control stack.
• A stack is a Last In First Out (LIFO) storage device where new storage is allocated and
deallocated at only one end, called the top of the stack.
• Storage is organized as a stack, and activation records are pushed and popped as activations
begin and end, respectively.
• Storage for the locals in each call of a procedure is contained in the activation record for that
call. Thus locals are bound to fresh storage in each activation, because a new activation record
is pushed onto the stack when a call is made.
• Furthermore, the values of locals are deleted when the activation ends; that is, the values are
lost because the storage for locals disappears when the activation record is popped.
• At run time, an activation record can be allocated and de-allocated by incrementing and
decrementing top of the stack respectively.
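By contrast with static allocation, each activation below gets a fresh n in its own record on the run-time stack (a minimal C sketch):

#include <stdio.h>

int fact(int n) {                /* each call pushes a new activation record */
    if (n <= 1) return 1;
    return n * fact(n - 1);      /* the recursive call gets its own n */
}

int main(void) {
    printf("%d\n", fact(5));     /* prints 120 */
    return 0;
}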
Stack Allocation
3. Heap Allocation
• The deallocation of activation records need not occur in a last-in first-out fashion, so storage
cannot be organized as a stack.
• Heap allocation parcels out pieces of contiguous storage, as needed for activation records or
other objects. Pieces may be deallocated in any order. So over time the heap will consist of
alternate areas that are free and in use.
• The heap is an alternative to the stack.
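The five-block scenario above can be written directly with the C heap allocator; after the two frees, free and in-use areas alternate:

#include <stdlib.h>

int main(void) {
    int *b[5];
    for (int i = 0; i < 5; i++)
        b[i] = malloc(100 * sizeof(int));   /* five heap blocks */

    free(b[1]);                             /* deallocation is not LIFO: */
    free(b[3]);                             /* the 2nd and 4th go first  */

    free(b[0]); free(b[2]); free(b[4]);     /* release the rest */
    return 0;
}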
4.7 Access to Non-local Names:
• When a procedure refers to variables that are not local to it, such variables are called
nonlocal variables.
• There are two types of scope rules, for the non-local names. They are
Static scope
Dynamic scope
Nesting Depth:
• Lexical scope can be implemented by using the nesting depth of a procedure. The procedure for
calculating nesting depth is as follows:
o The main program's nesting depth is 1.
o When a new procedure begins, add 1 to the nesting depth.
o When you exit from a nested procedure, subtract 1 from the nesting depth.
o A variable declared in a specific procedure is associated with that nesting depth.
Static Scope or Lexical Scope
• The lexical scope can be implemented using access link and displays.
Access Link:
• Access links are pointers used in the implementation of lexical scope, obtained by adding a
pointer to each activation record.
• If procedure p is nested within a procedure q, then the access link of p points to the most recent
activation record of procedure q.
Example: Consider the following piece of code and the runtime stack during execution of the program
program test;
var a: integer;

procedure A;
var d: integer;
begin
  a := 1;
end;

procedure B(i: integer);
var b: integer;

  procedure C;
  var k: integer;
  begin
    A;
  end;

begin
  if i <> 0 then B(i - 1)
  else C;
end;

begin
  B(1);
end.
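A C sketch of the lookup itself, with an invented Frame type: to reach a name declared np levels out, follow np access links and then use the name's offset.

#include <stdio.h>

typedef struct Frame {
    struct Frame *access_link;   /* frame of the enclosing procedure */
    int locals[4];               /* storage for this activation's names */
} Frame;

int *nonlocal(Frame *current, int hops, int offset) {
    Frame *f = current;
    while (hops-- > 0)
        f = f->access_link;      /* one link per difference in nesting depth */
    return &f->locals[offset];
}

int main(void) {
    Frame test = { NULL,  { 0 } };   /* program test, depth 1 */
    Frame b    = { &test, { 0 } };   /* procedure B,  depth 2 */
    Frame c    = { &b,    { 0 } };   /* procedure C,  depth 3 */

    *nonlocal(&c, 2, 0) = 1;         /* C assigns to a, declared in test */
    printf("%d\n", test.locals[0]);  /* prints 1 */
    return 0;
}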
Displays:
• If access links are used, the search for a nonlocal can be slow.
• So an optimization, the display, is used to reach an activation record directly, without any
search.
• A display is a global array d of pointers to activation records, indexed by lexical nesting depth.
The number of display elements is known at compile time.
• d[i] is the array element that points to the most recent activation of a block at nesting depth
(lexical level) i.
• A nonlocal X is found in the following manner:
o Use one array access to find the activation record containing X: if the most closely nested
declaration of X is at nesting depth i, then d[i] points to the activation record containing the
location for X.
o Then use the relative address (offset) of X within that activation record, as sketched below.
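A corresponding C sketch of a display, with invented names; the point is that one array access replaces the walk along access links:

#include <stdio.h>

#define MAX_DEPTH 8

typedef struct Frame { int locals[4]; } Frame;

Frame *d[MAX_DEPTH];   /* the display, indexed by nesting depth */

int *lookup(int depth, int offset) {
    return &d[depth]->locals[offset];   /* one access, no search */
}

int main(void) {
    Frame outer = { { 0 } };
    Frame inner = { { 0 } };
    d[1] = &outer;          /* main program at depth 1 */
    d[2] = &inner;          /* nested procedure at depth 2 */

    *lookup(1, 0) = 42;     /* the nested procedure sets a nonlocal */
    printf("%d\n", outer.locals[0]);   /* prints 42 */
    return 0;
}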
Example:
Dynamic Scope
The dynamic scope allocation rules are used for non-block-structured languages. In this type of
scoping, a non-local variable access refers to the non-local data declared in the most recently
called and still active procedure. There are two methods to implement non-local access under
dynamic scoping:
Deep Access − The basic concept is to keep a stack of active variables: use control links instead of
access links and, to find a variable, search the stack from top to bottom looking for the most recent
activation record that contains space for the desired variable. Since the search goes "deep" into the
stack, the method is called deep access. In this method, a symbol table must be available at
runtime.
Shallow Access − The idea is to keep central storage and allot one slot for every variable
name. If names are not created at runtime, the storage layout can be fixed at compile time.
Otherwise, when a new activation of a procedure occurs, the procedure saves and restores the
storage entries for its locals at entry and exit (i.e., while entering the procedure and while leaving it).
4.8 Parameter Passing
2. Call by Reference
• In call by reference, the formal and actual parameters refer to the same memory location.
• The l-value of the actual parameter is copied into the activation record of the called function. Thus
the called function has the address of the actual parameter.
• If the actual parameter does not have an l-value (e.g., i+3), then it is evaluated in a
new temporary location and the address of that location is passed.
• Any changes made through the formal parameter are reflected in the actual parameter (because
changes are made at the address).
3. Call by Copy-Restore
• In call by copy-restore, the compiler copies the values of the actual parameters into the formal
parameters when the procedure is called, and copies them back into the actual parameters when
control returns to the caller.
• The r-values are passed, and on return the r-values of the formals are copied into the l-values of
the actuals.
Example:
int y;
calling_procedure()
{
    y = 10;
    copy_restore(y); // the l-value of y is passed
    printf y;        // prints 99: on return, the final value of x is copied back into y
}
copy_restore(int x)
{
    x = 99;  // y still has value 10 (x holds a separate copy)
    y = 0;   // the global y is now 0
}
4. Call by Name
• In call by name the actual parameters are substituted for formals in all the places formals
occur in the procedure.
• It is also referred to as lazy evaluation, because a parameter is evaluated only when it is
needed.
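C has no call by name, but a macro's textual substitution is a rough analogy, since the argument expression is re-evaluated at each use rather than once:

#include <stdio.h>

#define TWICE(e) ((e) + (e))   /* the text of e is substituted twice */

int calls = 0;
int next(void) { return ++calls; }

int main(void) {
    printf("%d\n", TWICE(next()));   /* next() runs twice: 1 + 2 = 3 */
    return 0;
}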
4.9 Symbol Table
• Symbol tables are data structures that are used by compilers to hold information about source-
program constructs. The information is collected incrementally by the analysis phases of a
compiler and used by the synthesis phases to generate the target code.
• Entries in the symbol table contain information about an identifier such as its character string
(or lexeme), its type, its position in storage, and any other relevant information.
Symbol Table
• The symbol table, which stores information about the entire source program, is used by all
phases of the compiler.
• An essential function of a compiler is to record the variable names used in the source program
and collect information about various attributes of each name.
• These attributes may provide information about the storage allocated for a name, its type, its
scope.
• A symbol table may serve the following purposes depending upon the language in hand:
o To store the names of all entities in a structured form at one place.
o To verify if a variable has been declared.
o To implement type checking, by verifying assignments and expressions.
o To determine the scope of a name (scope resolution).
o To store function names, parameters, and variables.
Symbol Table is an important data structure created and maintained by the compiler in order
to keep track of the semantics of variables, i.e., it stores scope and binding information about
names, and information about instances of various entities such as variable and function
names, classes, objects, etc.
• It is built in the lexical and syntax analysis phases.
• The information is collected by the analysis phases of the compiler and is used by the
synthesis phases of the compiler to generate code.
• It is used by the compiler to achieve compile-time efficiency.
• It is used by various phases of the compiler as follows:-
1. Lexical Analysis: Creates new entries in the table, for example, entries for
identifiers encountered in the token stream.
2. Syntax Analysis: Adds information regarding attribute type, scope, dimension, line of
reference, use, etc in the table.
3. Semantic Analysis: Uses available information in the table to check for semantics i.e.
to verify that expressions and assignments are semantically correct(type checking)
and update it accordingly.
4. Intermediate Code generation: Refers to the symbol table to know how much and
what type of run-time storage is allocated; the table also helps in adding temporary-
variable information.
5. Code Optimization: Uses information present in the symbol table for machine-
dependent optimization.
6. Target Code generation: Generates code by using address information of identifier
present in the table.
Symbol-Table Entries
• A compiler uses a symbol table to keep track of scope and binding information about names.
The symbol table is searched every time a name is encountered in the source text.
• Changes to the table occur if a new name or new information about an existing name is
discovered. A linear list is the simplest to implement, but its performance is poor. Hashing
schemes provide better performance.
• The symbol table grows dynamically, so its size cannot be fixed at compiler-construction time.
• Each entry in the symbol table is for the declaration of a name.
• The format of entries need not be uniform.
• The following information about identifiers are stored in symbol table.
o The name.
o The data type.
o The block level.
o Its scope (local, global).
o Pointer / address
o Its offset from base pointer
Information used by the compiler from Symbol table:
• Data type and name
• Declaring procedures
• Offset in storage
• If structure or record then, a pointer to structure table.
• For parameters, whether parameter passing by value or by reference
• Number and type of arguments passed to function
• Base Address
Operations of Symbol table – The basic operations defined on a symbol table include insert
(add a new name together with its attributes), lookup (search for a name and return its
entry), and operations to set and get the attributes of an existing entry, as sketched below.
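A minimal hashed symbol table in C supporting the insert and lookup operations above; the entry fields and names are illustrative, not a prescribed layout:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64

typedef struct Entry {
    char *name;            /* the lexeme */
    char *type;            /* e.g. "int" */
    int   offset;          /* position in storage */
    struct Entry *next;    /* collision chain */
} Entry;

static Entry *buckets[NBUCKETS];

static unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s) h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static char *copy(const char *s) {
    char *p = malloc(strlen(s) + 1);
    strcpy(p, s);
    return p;
}

Entry *lookup(const char *name) {
    for (Entry *e = buckets[hash(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;           /* name not declared */
}

Entry *insert(const char *name, const char *type, int offset) {
    unsigned h = hash(name);
    Entry *e = malloc(sizeof *e);
    e->name = copy(name);
    e->type = copy(type);
    e->offset = offset;
    e->next = buckets[h];  /* push onto the bucket's chain */
    buckets[h] = e;
    return e;
}

int main(void) {
    insert("pi", "real", 100);
    Entry *e = lookup("pi");
    printf("%s : %s @ %d\n", e->name, e->type, e->offset);
    return 0;
}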
Explicit Deallocation
• When blocks are allocated and deallocated, storage can become fragmented; that is, the heap
may come to consist of alternating blocks that are free and in use.
• This situation can occur if a program allocates five blocks and then deallocates the second and
fourth.
• Fragmentation is of no consequence if blocks are of fixed size, but it is a problem if they are of
variable size, because we could not allocate a block larger than any one of the free blocks, even
though the total free space is available.
• First fit, worst fit, and best fit are some methods for allocating variable-sized blocks.
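A first-fit sketch in C over an invented table of free-block sizes; best fit would instead choose the smallest adequate block:

#include <stdio.h>

#define NFREE 4
int free_size[NFREE] = { 30, 100, 50, 200 };   /* sizes of the free blocks */

int first_fit(int request) {
    for (int i = 0; i < NFREE; i++) {
        if (free_size[i] >= request) {
            free_size[i] -= request;   /* carve the request out of the block */
            return i;                  /* index of the chosen block */
        }
    }
    return -1;                         /* no free block is large enough */
}

int main(void) {
    printf("%d\n", first_fit(60));   /* 1: the 100-byte block is the first fit */
    printf("%d\n", first_fit(60));   /* 3: that block now holds only 40 bytes  */
    return 0;
}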
Implicit Deallocation
• Implicit deallocation requires cooperation between the user program and the run-
time package, because the latter needs to know when a storage block is no longer in use.
• This cooperation is implemented by fixing the format of storage blocks.
• Reference counts:
o We keep track of the number of blocks that point directly to the present block. If this
count ever drops to 0, then the block can be deallocated because it cannot be referred
to.
o In other words, the block has become garbage that can be collected. Maintaining
reference counts can be costly in time.
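A reference-counting sketch in C; the retain/release names are invented:

#include <stdio.h>
#include <stdlib.h>

typedef struct Block {
    int refcount;          /* number of pointers to this block */
    int payload;
} Block;

Block *retain(Block *b) { b->refcount++; return b; }

void release(Block *b) {
    if (--b->refcount == 0) {   /* no remaining references: garbage */
        free(b);
        printf("block freed\n");
    }
}

int main(void) {
    Block *b = malloc(sizeof *b);
    b->refcount = 1;             /* one owner */
    Block *alias = retain(b);    /* a second reference */
    release(b);                  /* still referenced through alias */
    release(alias);              /* count reaches 0: deallocated */
    return 0;
}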
• Marking techniques:
o An alternative approach is to temporarily suspend execution of the user program and
use the frozen pointers to determine which blocks are in use.
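A sketch of the mark phase in C, on an invented object graph where each block has at most one outgoing pointer; blocks left unmarked are garbage:

#include <stdio.h>

#define NBLOCKS 4

typedef struct Block {
    int marked;
    struct Block *points_to;   /* at most one outgoing pointer, for brevity */
} Block;

void mark(Block *b) {
    while (b && !b->marked) {  /* follow pointers until marked or NULL */
        b->marked = 1;
        b = b->points_to;
    }
}

int main(void) {
    Block blocks[NBLOCKS] = {
        { 0, &blocks[1] },   /* block 0 -> block 1 */
        { 0, NULL },         /* block 1 */
        { 0, &blocks[3] },   /* block 2 -> block 3, unreachable from the root */
        { 0, NULL },         /* block 3 */
    };
    mark(&blocks[0]);        /* the frozen root pointer refers to block 0 */
    for (int i = 0; i < NBLOCKS; i++)
        printf("block %d: %s\n", i, blocks[i].marked ? "in use" : "garbage");
    return 0;
}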