Unit-1
1. Differences between compiler and interpreter
Feature           | Compiler                                     | Interpreter
Execution         | Translates the entire program before running | Translates and runs the program line by line
Speed             | Faster execution after compilation           | Slower, due to real-time translation
Output            | Generates an executable file                 | No separate output file
Error Handling    | Reports all errors after compilation         | Stops at the first error
Example Languages | C, C++                                       | Python, JavaScript
2. What are the parts of a compiler?
● Lexical Analyzer
● Syntax Analyzer (Parser)
● Semantic Analyzer
● Intermediate Code Generator
● Code Optimizer
● Code Generator
● Symbol Table Manager
● Error Handler
3. Differences between passes and phases
Feature    | Phases                                | Passes
Definition | Logical divisions of compilation work | Number of times the source code is scanned
Count      | Fixed (typically 6-7)                 | Can be single or multiple
Example    | Lexical analysis, parsing, etc.       | Single-pass or multi-pass compilers
4. Is the compiler or the interpreter more efficient? Give reasons.
● Compiler is more efficient in terms of execution time because it translates the entire
program into machine code beforehand.
● Interpreter is more flexible, useful for debugging and quick scripting.
5. Do you prefer to have more passes in the compiler? Justify.
Yes, more passes allow:
● Better error checking
● More sophisticated optimizations
● Easier modular design
However, it may increase compilation time.
6. Front end of the compiler / language dependent phases
These phases depend on the source language's syntax and semantics:
● Lexical Analysis
● Syntax Analysis
● Semantic Analysis
● Intermediate Code Generation
7. Give examples for compiled and interpreted languages
● Compiled Languages: C, C++, Rust, Go
● Interpreted Languages: Python, JavaScript, Ruby
8. Back end of the compiler / language independent phases
These phases depend on the target machine and are mostly independent of the source language:
● Code Optimization (machine-independent, on the intermediate code)
● Code Generation
● Target Code Optimization (machine-dependent)
9. Cousins of the compiler
● Assembler – Converts assembly to machine code
● Linker – Links multiple object files
● Loader – Loads programs into memory
● Preprocessor – Processes directives (e.g., macros in C)
10. Examples of single-pass and multi-pass compilers; which one is more efficient?
● Single Pass: Pascal, early versions of C
● Multi Pass: Modern C, Java, FORTRAN
● Efficiency:
○ Single-pass: faster, less memory
○ Multi-pass: better error checking, optimization
11. When do you prefer a single pass compiler?
● When speed is crucial (e.g., embedded systems)
● When code size is small
● When limited resources are available
12. When do you prefer a multi-pass compiler?
● For large, complex programs
● When high optimization is needed
● When precise error handling is required
Unit-2
1. Differences between DFA and NFA
Aspect           | DFA (Deterministic Finite Automaton)    | NFA (Nondeterministic Finite Automaton)
Transitions      | Exactly one transition per input symbol | Multiple transitions allowed per symbol
Epsilon Moves    | Not allowed                             | Allowed
Determinism      | Deterministic                           | Non-deterministic
Implementation   | Easier to implement                     | Harder to implement directly
Speed            | Faster due to determinism               | Slower in general
Expressive Power | Equal to NFA                            | Equal to DFA
2. What is meant by a regular set?
A regular set is a set of strings that can be described by a regular expression and recognized by
a finite automaton. Regular sets are used to define the lexical syntax of programming
languages.
3. What are tokens? Give examples.
Tokens are the smallest units of a program that have meaning. They are generated during
lexical analysis.
Examples:
● Keywords: int, while
● Identifiers: sum, count
● Operators: +, -, *
● Literals: 100, "hello"
● Symbols: ;, (, )
4. Write regular expressions for the given strings.
Since the specific strings are not given, assume these common token classes:
● Integer: [0-9]+
● Identifier: [a-zA-Z_][a-zA-Z0-9_]*
● Floating-point: [0-9]+\.[0-9]+
● Whitespace: [ \t\n]+
● Operators: (\+|\-|\*|\/|==|!=|<=|>=)
5. What are the different buffering techniques? Which is best?
Buffering techniques:
● Unbuffered: one character is read at a time (inefficient: one system call per character).
● Single Buffer: one buffer is refilled as needed; a lexeme may straddle the refill boundary.
● Double Buffer: two halves of a buffer are used alternately (best: efficient, and it avoids the straddling problem).
Best: Double buffering – it minimizes disk access delays (see the sketch below).
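A minimal C sketch of the buffer-pair scheme: the half size N, the use of '\0' as the sentinel, and text-only input (no embedded '\0' bytes) are simplifying assumptions made here for illustration.

#include <stdio.h>

#define N 16                          /* size of each buffer half (small for demo) */

static char buf[2 * N + 2];           /* two halves, each followed by a sentinel slot */
static char *forward;                 /* the scanner's forward pointer */
static FILE *src;

/* Fill one half with input and place a '\0' sentinel after the bytes read. */
static void load_half(char *half) {
    size_t n = fread(half, 1, N, src);
    half[n] = '\0';
}

static int next_char(void) {
    char c = *forward++;
    if (c != '\0') return (unsigned char)c;
    if (forward - 1 == buf + N) {                /* sentinel ends the first half  */
        load_half(buf + N + 1);                  /* refill the second half        */
        forward = buf + N + 1;
    } else if (forward - 1 == buf + 2 * N + 1) { /* sentinel ends the second half */
        load_half(buf);                          /* refill the first half         */
        forward = buf;
    } else {
        return EOF;                              /* sentinel before a boundary: real end of input */
    }
    c = *forward++;
    return c != '\0' ? (unsigned char)c : EOF;
}

int main(void) {
    src = stdin;
    load_half(buf);
    forward = buf;
    long count = 0;
    while (next_char() != EOF) count++;          /* only one test per character, thanks to the sentinel */
    printf("read %ld characters\n", count);
    return 0;
}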
6. Transition Diagram
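No diagram accompanies this question; as a typical example, the transition diagram for identifiers ([a-zA-Z_][a-zA-Z0-9_]*) can be drawn as:

start → (0) --[a-zA-Z_]--> ((1))
((1)) --[a-zA-Z0-9_]--> ((1))    (loop)
((1)) --other--> accept the identifier and retract one input character

Here (0) is the start state and ((1)) is the accepting state; the same machine appears as a transition table in question 13 below.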
7. Regular definitions for the following:
● Whitespace: ws → [ \t\n]+
● Digit: d → [0-9]
● Integer: int → d+
● Float: float → d+\.d+
● Identifier: id → [a-zA-Z_][a-zA-Z0-9_]*
● Operators: op → (\+|\-|\*|\/|==|=)
8. Lex for simple applications.
Example: Lex program to recognize identifiers and numbers.
%{
#include <stdio.h>
%}
%option noyywrap
%%
[0-9]+                 { printf("Integer: %s\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]* { printf("Identifier: %s\n", yytext); }
[ \t\n]+               { /* ignore whitespace */ }
.                      { printf("Unknown: %s\n", yytext); }
%%
int main(void) { return yylex(); }
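With flex, this can be built and run as, e.g., flex scanner.l && cc lex.yy.c -o scanner (the file name scanner.l is illustrative).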
9. For the given code, write the tokens.
For code: int sum = a + b;
Tokens:
● int → Keyword
● sum → Identifier
● = → Operator
● a → Identifier
● + → Operator
● b → Identifier
● ; → Delimiter
10. Various error recovery strategies in lexical analysis
● Panic Mode: Skip characters until a known delimiter is found.
● Phrase-level Recovery: Replace or insert symbols to continue parsing.
● Error Productions: Use grammar rules that match errors.
● Global Correction: Tries minimal changes to make input valid (costly).
11. Why is lexical analysis separated from syntax analysis?
● Simplicity: Each phase focuses on a single task.
● Efficiency: Lexical analysis can be optimized separately.
● Modularity: Easier to debug and maintain.
● Reusability: Same lexer can be used with different parsers.
12. Patterns, lexemes, and tokens for the given input.
Example input: while (a < 10) a = a + 1;
Lexeme | Token      | Pattern
while  | KEYWORD    | while
(      | SYMBOL     | \(
a      | IDENTIFIER | [a-zA-Z_][a-zA-Z0-9_]*
<      | OPERATOR   | <
10     | NUMBER     | [0-9]+
)      | SYMBOL     | \)
a      | IDENTIFIER | same as above
=      | OPERATOR   | =
a      | IDENTIFIER | same as above
+      | OPERATOR   | +
1      | NUMBER     | [0-9]+
;      | SYMBOL     | ;
13. Transition table
Example for recognizing a simple identifier (id → [a-zA-Z_][a-zA-Z0-9_]*):
State | Input        | Next State
0     | [a-zA-Z_]    | 1
1     | [a-zA-Z0-9_] | 1
1     | other        | - (accept)
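A minimal table-driven C sketch of this automaton (the function name is_id is illustrative):

#include <ctype.h>
#include <stdio.h>

/* Simulates the table above: state 0 is the start state, state 1 accepts. */
static int is_id(const char *s) {
    int state = 0;
    for (; *s != '\0'; s++) {
        if (state == 0 && (isalpha((unsigned char)*s) || *s == '_'))
            state = 1;                       /* 0 --[a-zA-Z_]--> 1     */
        else if (state == 1 && (isalnum((unsigned char)*s) || *s == '_'))
            state = 1;                       /* 1 --[a-zA-Z0-9_]--> 1  */
        else
            return 0;                        /* no transition: reject  */
    }
    return state == 1;                       /* accept only in state 1 */
}

int main(void) {
    printf("%d %d %d\n", is_id("sum_1"), is_id("1sum"), is_id("_x"));  /* prints 1 0 1 */
    return 0;
}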
Unit-3
1. When is a grammar said to be ambiguous? Give an example.
A grammar is ambiguous if there exists at least one string with more than one parse
tree (or leftmost/rightmost derivation).
Example:
E → E + E | E * E | id
The string id + id * id can be parsed in multiple ways due to unclear precedence, making the
grammar ambiguous.
2. Left recursion elimination
Grammar with left recursion:
A → Aα | β
Eliminated form:
A → βA'
A' → αA' | ε
Example:
E → E + T | T
becomes:
E → T E'
E' → + T E' | ε
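The transformed grammar is now suitable for recursive descent. A minimal C sketch, assuming T → id for concreteness (the token handling is deliberately simplified):

#include <stdio.h>
#include <stdlib.h>

static const char *input;                 /* remaining input, e.g. "id+id" */

static void error(void) { printf("syntax error\n"); exit(1); }

static void match_id(void) {              /* consume the token "id" */
    if (input[0] == 'i' && input[1] == 'd') input += 2;
    else error();
}

static void T(void) { match_id(); }       /* T  -> id */

static void Eprime(void) {                /* E' -> + T E' | epsilon */
    if (*input == '+') { input++; T(); Eprime(); }
    /* epsilon: consume nothing */
}

static void E(void) { T(); Eprime(); }    /* E  -> T E' */

int main(void) {
    input = "id+id+id";
    E();
    if (*input == '\0') printf("accepted\n");
    else error();
    return 0;
}

Note how the left-recursive rule E → E + T would have made E() call itself forever, while the transformed E' simply loops on +.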
3. Left factoring examples
Used to make grammar suitable for predictive parsing.
Before:
S → if E then S else S | if E then S
After left factoring:
S → if E then S S'
S' → else S | ε
4. Comparison between CFG and regular expressions
Feature         | Regular Expressions | Context-Free Grammar (CFG)
Expressiveness  | Less expressive     | More expressive
Used in         | Lexical analysis    | Syntax analysis
Recognizes      | Regular languages   | Context-free languages
Handles nesting | No                  | Yes
5. Give example for LL(1) grammar
E → T E'
E' → + T E' | ε
T → id
This grammar is LL(1) as it has no left recursion and no common prefixes.
6. LL(1) grammar properties
● No left recursion
● Should be left factored
● Can be parsed using predictive parser
● Parsing table has at most one entry per cell
7. Give example for not an LL(1) grammar
S → a A | a B
A → x
B → y
Conflict: both alternatives a A and a B begin with the same token a, so the parser cannot choose a production with one token of lookahead; the grammar is not LL(1) (left factoring fixes it).
8. Operator grammar
● A grammar in which no right-hand side contains ε or two adjacent non-terminals.
● Used for operator precedence parsing.
● Precedence relations (<·, =·, >·) are defined between terminals.
Example:
E → E + E | E * E | id
9. Drawbacks of recursive descent parser
● Cannot handle left-recursive grammars
● Backtracking may be required if grammar is not LL(1)
● May be inefficient for complex grammars
10. Error heuristics in predictive parser
● Use of panic mode: Skip to a synchronizing token
● Error productions: Add rules to handle common mistakes
● Panic recovery sets: Defined for each non-terminal
11. Different actions of shift reduce parser
● Shift: Move next input symbol to the stack
● Reduce: Replace symbols on the stack with a non-terminal
● Accept: Input is successfully parsed
● Error: Parsing fails due to invalid input
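A worked trace on the input id + id, using the grammar E → E + T | T, T → id (chosen here for illustration):

Stack    | Input     | Action
$        | id + id $ | shift
$ id     | + id $    | reduce by T → id
$ T      | + id $    | reduce by E → T
$ E      | + id $    | shift
$ E +    | id $      | shift
$ E + id | $         | reduce by T → id
$ E + T  | $         | reduce by E → E + T
$ E      | $         | accept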
12. FIRST rules or algorithm
FIRST(X) = set of terminals that begin strings derived from X.
Rules:
● If X is terminal: FIRST(X) = {X}
● If X → ε is a production: ε ∈ FIRST(X)
● If X → Y1 Y2 ... Yn: add FIRST(Y1) − {ε} to FIRST(X); if ε ∈ FIRST(Y1), continue with Y2, and so
on. If every Yi can derive ε, add ε to FIRST(X).
13. FOLLOW rules or algorithm
FOLLOW(A) = set of terminals that can appear immediately to the right of A in some sentential
form.
Rules:
● If A is the start symbol: add $ to FOLLOW(A)
● If there’s a rule B → αAβ, then add FIRST(β) - {ε} to FOLLOW(A)
● If FIRST(β) contains ε or β is ε, add FOLLOW(B) to FOLLOW(A)
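A worked example, using the LL(1) grammar from question 5 (E → T E', E' → + T E' | ε, T → id):
● FIRST(T) = { id }
● FIRST(E') = { +, ε }
● FIRST(E) = FIRST(T) = { id }
● FOLLOW(E) = { $ } (E is the start symbol)
● FOLLOW(E') = FOLLOW(E) = { $ } (E' is at the right end of E → T E')
● FOLLOW(T) = (FIRST(E') − {ε}) ∪ FOLLOW(E') = { +, $ } (T is followed by E', which can derive ε)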
14. Kernel items and nonkernel items
● Kernel items:
○ The initial item [S' → ·S]
○ All items [A → α·β] whose dot is not at the left end (α ≠ ε)
● Non-kernel items:
○ Items [A → ·α] with the dot at the left end, other than [S' → ·S]
○ These are added to a state by the closure operation
15. Differences between top-down and bottom-up parsers
Feature      | Top-Down Parsing             | Bottom-Up Parsing
Start Symbol | Starts from the start symbol | Starts from the input string
Derivation   | Leftmost derivation          | Rightmost derivation (in reverse)
Examples     | LL(1), recursive descent     | LR(0), SLR, LALR, CLR
Complexity   | Simpler                      | More powerful, more complex
16. Which is the powerful parser?
Bottom-up parsers (LR parsers) are more powerful than top-down parsers because:
● They can parse a larger class of grammars.
● They can handle left recursion and ambiguous constructs better.
17. How will you implement precedence relations?
Using operator precedence parsing, define:
● Precedence table with relations <·, =·, >· between operators.
● Modify parser to use these to decide when to shift or reduce.
18. When will a predictive parser encounter an error?
● If the current token does not match the entry in the parsing table
● Or the entry is empty, meaning no rule applies
19. Can we construct a parser using ambiguous grammar? Justify.
Technically yes, but:
● It’s not advisable because it leads to multiple parse trees, making syntax analysis and
semantic interpretation unclear.
● Parser generators like YACC can resolve ambiguity with precedence and associativity
rules.
20. What is YACC?
YACC (Yet Another Compiler Compiler) is a tool used to:
● Generate LALR(1) parsers
● Accept a grammar specification and produce C code for the parser
● Often used with LEX for lexical analysis
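A minimal sketch of a YACC specification (a hypothetical calc.y; the single-digit lexer exists only for the demo). Note how the %left declarations resolve the ambiguity of expr → expr + expr | expr * expr, exactly the situation described in question 19:

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token NUM
%left '+'                  /* lower precedence  */
%left '*'                  /* higher precedence */
%%
line : expr                { printf("= %d\n", $1); }
     ;
expr : expr '+' expr       { $$ = $1 + $3; }
     | expr '*' expr       { $$ = $1 * $3; }
     | NUM
     ;
%%
int yylex(void) {          /* tiny single-digit lexer for the demo */
    int c = getchar();
    if (c >= '0' && c <= '9') { yylval = c - '0'; return NUM; }
    if (c == '\n' || c == EOF) return 0;
    return c;
}
int main(void) { return yyparse(); }

Built with yacc calc.y && cc y.tab.c -o calc; typing 2+3*4 prints = 14, showing that * was given higher precedence.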
Unit-4
1. Types of Attributes
Attributes are used in syntax-directed translation.
● Synthesized Attributes: Computed from the children nodes and passed upwards in the
parse tree.
● Inherited Attributes: Passed downwards or across siblings in the parse tree.
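For example (standard syntax-directed definition rules; the attribute names val, in, and type are the usual textbook choices):
● E → E1 + T with rule E.val = E1.val + T.val (val is synthesized, computed from the children)
● D → T L with rule L.in = T.type (in is inherited, passed from the sibling T down into L)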
2. Types of Intermediate Languages
1. Three-Address Code (TAC)
Example: t1 = a + b
2. Quadruples
Format: (op, arg1, arg2, result)
Example: (+ , a , b , t1)
3. Triples
Format: (op, arg1, arg2) where result is implied by position
Example: (+ , a , b)
4. Indirect Triples
Uses pointers to triples.
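A worked comparison for the statement a = b + c * d (t1 and t2 are compiler-generated temporaries):
Three-address code:
t1 = c * d
t2 = b + t1
a = t2
Quadruples:
(0) (*, c, d, t1)
(1) (+, b, t1, t2)
(2) (=, t2, -, a)
Triples:
(0) (*, c, d)
(1) (+, b, (0))
(2) (=, a, (1))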
3. What are the types of Intermediate Representations? Which is the best one?
Types:
● Syntax Trees
● Postfix Notation
● Three-Address Code
● Quadruples
● Triples
Best: Three-Address Code (TAC)
Reason: Easy to translate into machine code and analyze for optimization.
4. What is meant by Backpatching?
Backpatching is the process of filling in jump addresses (like in if or while statements) later
when they become known.
Used in code generation for boolean expressions and control flow.
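A small sketch (the instruction numbers are illustrative). While translating if a < b then x = 1, the jump targets are not yet known, so they are left blank:
100: if a < b goto ___
101: goto ___
102: x = 1
103: (next statement)
Once the translator reaches the end of the statement, it backpatches the blanks: instruction 100 gets target 102 (the then-branch) and instruction 101 gets target 103 (the statement after the if).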
5. Different Storage Allocation Strategies
● Static Allocation: Memory assigned at compile time.
● Stack Allocation: Uses a runtime stack for local variables.
● Heap Allocation: For dynamically allocated memory (e.g., malloc).
Stack is used for function calls; heap is for objects/data with unknown lifetimes.
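A minimal C sketch of the three strategies (the names counter, local, and p are illustrative):

#include <stdio.h>
#include <stdlib.h>

static int counter;                 /* static allocation: one fixed location for the whole run */

void f(void) {
    int local = 42;                 /* stack allocation: exists only during this call */
    int *p = malloc(sizeof *p);     /* heap allocation: lifetime controlled by the program */
    if (p == NULL) return;
    *p = local + counter++;
    printf("%d\n", *p);
    free(p);                        /* heap memory must be released explicitly */
}

int main(void) { f(); f(); return 0; }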
Unit-5
1. DAG Construction for the Given Expression
DAG (Directed Acyclic Graph) helps detect:
● Common subexpressions
● Dead code
● Reordering of computations
For a = (b + c) + (b + c), the DAG contains a single node for b + c that both additions share, so b + c is not recomputed (as sketched below).
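In three-address code the effect looks like this (t1 and t2 are temporaries):
Without the DAG:
t1 = b + c
t2 = b + c
a = t1 + t2
With the DAG (shared node for b + c):
t1 = b + c
a = t1 + t1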
2. Next Use Information
It tells where a variable will be used next. Helps in:
● Register allocation
● Dead code elimination
Each instruction keeps track of:
● Live/dead status
● Next use position
3. Structure Preserving Optimizations
These are transformations on basic blocks that preserve the block's structure (the computations it performs and their dependences).
Examples:
● Common subexpression elimination (within a block)
● Dead code elimination
● Renaming of temporary variables
● Interchange of two independent adjacent statements
4. Function Preserving Optimizations
These improve the program without changing the function it computes.
Examples:
● Constant folding
● Strength reduction
● Dead code elimination
● Copy propagation
5. How to Calculate the Cost of an Instruction?
Based on:
● Number of machine cycles
● Instruction type (memory vs register)
● Target architecture
● Register usage, memory access
Example: a = b + c
Cost depends on whether b and c are in registers or in memory; a worked example follows.
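A worked sketch under the classic textbook cost model (each instruction costs 1, plus 1 for every memory operand; this model is an assumption made here for illustration):
MOV b, R0   ; cost 2 (memory source b)
ADD c, R0   ; cost 2 (memory source c)
MOV R0, a   ; cost 2 (memory destination a)
Total cost: 6. If b and c are already held in registers R1 and R2 and the result may stay in a register, the same statement can cost as little as 2 (MOV R1, R0 then ADD R2, R0).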
6. Equation for Data Flow Analysis
General equations for reaching definitions:
OUT[B] = GEN[B] ∪ (IN[B] – KILL[B])
IN[B] = ⋃ OUT[P] for all predecessors P of B
Used in:
● Live variable analysis
● Available expressions
● Reaching definitions
7. What is Copy Propagation?
After a copy statement such as x = y, later uses of x are replaced by y (provided neither variable is reassigned in between).
Example:
x = y
z = x + 1
Becomes:
z = y + 1
8. Common Subexpression Elimination
Avoids recomputation of expressions that are computed earlier and still valid.
Before:
t1 = a + b
t2 = a + b
After:
t1 = a + b
t2 = t1
9. Strength Reduction
Replaces expensive operations with cheaper ones.
Example:
x = i * 2 → x = i + i
x = a^2 → x = a * a
10. Code Optimization Techniques
● Constant folding
● Dead code elimination
● Common subexpression elimination
● Loop unrolling
● Strength reduction
● Copy propagation
● Peephole optimization
11. Loop Optimization Techniques
● Loop Invariant Code Motion: move calculations that do not change between iterations outside the loop (see the example after this list).
● Loop Unrolling: Reduce number of iterations.
● Induction Variable Elimination
● Strength Reduction in Loops
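A minimal C illustration of loop-invariant code motion (the function and variable names are illustrative):

Before:
void fill(int *a, int n, int x, int y) {
    for (int i = 0; i < n; i++)
        a[i] = x * y + i;          /* x * y is recomputed on every iteration */
}

After code motion:
void fill(int *a, int n, int x, int y) {
    int t = x * y;                 /* hoisted: its value never changes inside the loop */
    for (int i = 0; i < n; i++)
        a[i] = t + i;
}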