Compiler Construction Notes
1. Definitions and Basics
• Compiler: A program that translates code written in a high-level programming
language (source code) into machine code or an intermediate code that can be
executed by a computer.
• Translator: A general term for tools that convert code from one form to another,
including:
o Assembler: Converts assembly language to machine code.
o Interpreter: Executes code directly without compiling it into machine code.
o Compiler: Converts high-level code to machine code or intermediate code.
2. Purpose of a Compiler
• Translation: Convert source code into machine code or intermediate code.
• Optimization: Improve the efficiency of the code by reducing redundancy and
improving performance.
• Error Detection: Identify and report errors in the source code.
• Portability: Enable code to run on different platforms with minimal changes.
• Abstraction: Hide low-level details from the programmer.
• Security: Ensure that the code is safe from vulnerabilities.
3. Language Processing System
Source Program
• Preprocessor: Handles tasks before compilation, such as:
o Macro Expansion: Replace macros with their definitions.
o File Inclusion: Insert the contents of other files into the source code.
o Conditional Compilation: Include or exclude code based on conditions.
o Line Control: Manage line numbers for debugging.
o Token Comprehension & String Fracture: Break code into tokens for
parsing.
o Symbol Definition and Substitution: Define and replace symbols.
o Error Handling: Detect and report errors in the preprocessor phase.
Target Assembly Program
• Assembler: Converts assembly code into machine code.
• Relocatable Machine Code: Code that can be loaded at any memory location.
4. Phases of Compilation
Phase 1: Lexical Analysis
• Purpose: Break the source code into tokens (keywords, identifiers, operators, etc.).
• Output: Stream of tokens for syntax analysis.
Phase 2: Syntax Analysis (Parsing)
• Purpose: Check the grammatical structure of the code.
• Output: Syntax tree or parse tree.
Phase 3: Semantic Analysis
• Purpose: Ensure that the code makes logical sense.
• Checks:
o Type Checking: Ensure variables and operations are of the correct type.
o Division by Zero: Prevent division by zero errors.
o Function Calls: Ensure functions receive valid arguments.
• Output: Checked syntax tree or error messages.
Phase 4: Intermediate Code Generation
• Purpose: Convert the syntax tree into an intermediate representation (IR) for further
processing.
• Example IR:LOAD a ADD to SQPT_88
LOAD b NSA b
LOAD c ADD b SQPT_88
NULL b DIV AB, 2A
NULL a, c STORE root 1
NULL C, AC NSA b
SUB RB, AC SUB b SQPT_88
CALL SQPT, BR DIV AB, 2A
Phase 5: Code Optimization
• Purpose: Improve the efficiency of the code by removing redundancy and
optimizing operations.
• Example:
o Original Code:temp1 = int to real (60)
temp2 = id3 * temp1
temp3 = id2 + temp1
id1 = temp3
o Optimized Code:temp1 = id3 * 60.0
id1 = id2 + temp1
Phase 6: Code Generation
• Purpose: Convert the optimized intermediate code into machine-specific
instructions.
• Example Assembly Output (x86-64):MOV rate, a
MOV rb_2, b
MOV rate, c
MOV rate, max, 4
MOV rate, max, 1
Calculate Sqr^2
MOV rate, max, 1
Calculate Sqr^2
Phase 7: Linking and Execution
• Purpose: Link the compiled code with external libraries and create the final
executable file.
• Example:
o Input Coefficients: a=1, b=-3, c=2
o Output: Roots are real and different: 2.0000 | 1.0000
5. Assembly Language Features & Functions
1. Translation: Convert assembly code to machine code.
2. Symbol Resolution: Resolve symbolic references in the code.
3. Creative Processing: Handle complex instructions and data.
4. Zero Handling: Manage operations involving zero.
5. Generation of Object Code: Produce machine code from assembly code.
6. Platform Specificity: Tailor code for specific hardware platforms.
7. Low-Level Programming: Directly control hardware resources.
6. Advantages and Disadvantages
Advantages
• Efficiency: Optimized for performance.
• Low-Level Control: Direct access to hardware.
• Ease of Use: High-level languages simplify development.
• Portability: Code can run on different platforms with minimal changes.
Disadvantages
• Completion Overhead: Compilation can be time-consuming.
• Platform Dependence: Some code may not be portable across platforms.
• Debugging Complexity: Low-level code can be harder to debug.
• Learning Curve: Requires understanding of both high-level and low-level concepts.
7. Quadratic Equation Solver
Quadratic Equation: ( ax^2 + bx + c = 0 )
Phases of Compilation for Quadratic Solver
1. Lexical Analysis: Break the equation into tokens.
2. Syntax Analysis: Check the structure of the equation.
3. Semantic Analysis: Ensure the equation makes logical sense.
4. Intermediate Code Generation: Convert the equation into an intermediate
representation.
5. Optimization: Optimize the intermediate code.
6. Code Generation: Generate machine code for solving the equation.
7. Linking and Execution: Link with libraries and execute the solver.
Example Output:
• Input Coefficients: a=1, b=-3, c=2
• Output: Roots are real and different: 2.0000 | 1.0000
8. Code Optimization and Generation
Code Optimization
• Improve Efficiency: Remove redundant operations and optimize calculations.
• Example:
o Original Code:temp1 = int to real (60)
temp2 = id3 * temp1
temp3 = id2 + temp1
id1 = temp3
o Optimized Code:temp1 = id3 * 60.0
id1 = id2 + temp1
Code Generation
• Convert Intermediate Code to machine-specific instructions.
• Example Assembly Output (x86-64):MOV rate, a
MOV rb_2, b
MOV rate, c
MOV rate, max, 4
MOV rate, max, 1
Calculate Sqr^2
MOV rate, max, 1
Calculate Sqr^2
9. Semantic Analysis and Intermediate Code Generation
Semantic Analysis
• Check Logic and Operations: Ensure the code makes logical sense.
• Type Checking: Ensure variables and operations are of the correct type.
• Division by Zero: Prevent division by zero errors.
• Function Calls: Ensure functions receive valid arguments.
Intermediate Code Generation
• Convert Code into an intermediate representation (IR) for further processing.
• Example IR:LOAD a ADD to SQPT_88
LOAD b NSA b
LOAD c ADD b SQPT_88
NULL b DIV AB, 2A
NULL a, c STORE root 1
NULL C, AC NSA b
SUB RB, AC SUB b SQPT_88
CALL SQPT, BR DIV AB, 2A