• Language Processors: Overview of a language processing system – preprocessors – compilers – assemblers – linkers & loaders; difference between compiler and interpreter; structure of a compiler – phases of a compiler.
• Lexical Analysis: Role of Lexical Analysis – Lexical Analysis versus Parsing – Tokens, Patterns, and Lexemes – Attributes for Tokens – Lexical Errors – Input Buffering: Buffer Pairs – Sentinels
• Specification of Tokens: Strings and Languages – Operations
on Languages – Regular Expressions – Regular Definitions
• Recognition of Tokens: Transition Diagrams – Recognition of
Reserved Words and Identifiers - Completion of the Running
Example – Architecture of a Transition–Diagram-Based Lexical
Analyzer
• The Lexical Analyzer Generator (LEX): Use of Lex – Structure
of Lex Programs
Code optimization
• The code optimization phase attempts to improve the intermediate code, so that faster-running machine code will result.
• The goal is faster, shorter, or less power-consuming target code.
• A compiler may spend a significant amount of time on this phase.
• The optimized three-address code produced by the code optimization phase for the example statement is shown below.
Code Generation
• It takes intermediate representation of the source
program as input and maps it into the target
language.
• If the target language is machine code, registers or
memory locations are selected for each of the
variables used by the program.
• Intermediate instructions are translated into sequences of machine instructions.
• A crucial part is the assignment of registers to hold the values of variables (see the annotated code after this list).
• In the instructions below, the first operand of each instruction specifies a source and the second operand specifies the destination.
• The suffix F on each instruction indicates that it deals with floating-point numbers.
• The # signifies that 60.0 is to be treated as an immediate constant.
• MOVF id3, R2
• MULF #60.0, R2
• MOVF id2, R1
• ADDF R2, R1
• MOVF R1, id1
Error handler
• Each phase can encounter errors.
• After detecting an error, a phase must somehow deal
with that error, so that compilation can proceed,
allowing further errors in the source program to be
detected.
• The lexical analysis phase can detect errors in which the characters in the input do not form any token of the language.
• The syntax analysis phase can detect errors in which the token stream violates the structure (syntax) rules of the language.
• The semantic analysis phase can detect constructs that are syntactically well formed but have no meaning for the operation involved (illustrative fragments follow).
Phase vs. Pass
Phase:
• The process of compilation is carried out in various steps; each step is called a phase.
• The different phases include: lexical analysis (LA), syntax analysis (SA), semantic analysis (SeA), intermediate code generation (ICG), code optimization (CO), and code generation (CG).
Pass:
• One complete scan of the source language is called a pass.
• A pass includes reading an input file and writing to an output file.
• Many phases can be grouped into one pass.
• The task of compilation may be carried out in a single pass or in multiple passes.
Role of Lexical Analysis
• The lexical analyzer is the first phase of a compiler.
• Its main task is to read the input characters and
produce as output a sequence of tokens that the
parser uses for syntax analysis.
• Another task of the lexical analyzer is stripping out comments and whitespace (blank, tab, and newline characters) from the source program.
• It also helps in correlating error messages from the compiler with the source program.
• The lexical analyzer may keep track of the number of newline characters seen, so that a line number can be associated with an error message.
• In some compilers, the lexical analyzer is in charge of making
a copy of the source program with the error messages
marked in it.
• If the lexical analyzer finds a token invalid, it generates an
error.
• The lexical analyzer works closely with the syntax analyzer.
• It reads character streams from the source code, checks for legal tokens, and passes the data to the syntax analyzer when the syntax analyzer demands it.
• The lexical analyzer collects information about tokens into
their associated attributes.
• After identifying the tokens, the lexemes are entered into a database called the symbol table.
• It works in two phases (a minimal C sketch follows):
1. Scanning – simple processing such as removing whitespace and comments
2. Separation of the input into tokens
Lexical analysis Vs Parsing
• All compilers separate the task of analyzing
syntax into two different parts.
• Lexical and syntax
• Lexical-> small scale language constructs
– Names and literals
• Syntax-> large scale language constructs
– expressions, statements and program units
Why is lexical analysis separated from syntax analysis?
1. Simplicity
– Lexical analysis deals with the simpler constructs, so its design is straightforward.
– The syntax analyser can be smaller and cleaner because the low-level details of lexical analysis are removed from it.
2. Efficiency
– Lexical analysis accounts for a significant portion of total compile time, so it is worth optimizing.
– It is far less critical to optimize the syntax analyser.
3. Portability
– The lexical analyser may not be fully portable, because input-device-specific peculiarities can be confined to the scanner.
– The syntax analyser can then remain portable.
Token
• A token is a sequence of characters that can be treated as a single logical entity – a sequence of characters having a collective meaning in the source program.
• Typical tokens are identifiers, keywords, operators, special symbols, and constants.
• Pattern: the set of rules (the description) that the lexemes of a token must match.
• Lexeme: a sequence of characters in the source program that matches the pattern for a token.
• Examples:
– Token: if (keyword); Pattern: the characters i, f; Lexeme: if
– Token: relational operator; Pattern: < or <= or >; Lexemes: <, >, <=
Lexical errors
• It is hard for the lexical analyzer to tell, without the aid of other components, that there is a source-code error.
• Some errors are beyond the power of the lexical analyzer to recognize:
– fi (a == f(x)) …
• The lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier, since fi is a valid lexeme (it matches the pattern for an identifier).
• A lexical error proper is recognized only when no pattern for any token matches a prefix of the remaining input.
• Errors like the fi example are left to some other phase of the compiler, probably the parser, to handle.
• If the lexical analyser is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input,
• the simplest recovery strategy is panic-mode recovery.
Error recovery
• Panic mode: successive characters are ignored
until we reach to a well formed token
– Delete one character from the remaining input
– Insert a missing character into the remaining input
– Replace a character by another character
– Transpose two adjacent characters
Input Buffering:
Buffer Pairs – Sentinels
Buffer pairs
• Because of the amount of time taken to process characters, and the large number of characters that must be processed during the compilation of a large source program, specialized buffering techniques have been developed.
• We introduce a two-buffer scheme to handle large lookaheads safely.
• Each buffer is of the same size N.
• N is usually the size of a disk block.
• We can read N characters into a buffer with one read operation.
• If fewer than N characters remain in the input file, then a special character, represented by eof, marks the end of the source file.
• Two pointers to the input are maintained (see the layout sketch below):
• 1. Pointer lexemeBegin: marks the beginning of the current lexeme.
• 2. Pointer forward: scans ahead until a pattern match is found.
• Once the next lexeme is determined, forward is set to the character at its right end.
• Then lexemeBegin is set to the character immediately after the lexeme just found.
Sentinels
• For each character read we make two tests:
– one for the end of the buffer
– One to determine what character is read
We can combine the buffer-end test with the test for the current character if we extend each buffer to hold a sentinel character at the end.
The sentinel is a special character that cannot be part of the source program; a natural choice is the character eof.
• eof also retains its use as a marker for the end of the entire input.
• Any eof that appears other than at the end of a buffer means that the input is at an end (a C-style sketch follows).
Specification of Tokens
• In the theory of compilation, regular expressions are used to formalize the specification of tokens.
• Regular expressions are a means for specifying regular languages (a small example follows this list).
– Strings and Languages
– Operations on Languages
– Regular Expressions
– Regular Definitions
Strings and Languages
Recognition of Tokens
• Transition Diagrams
• Recognition of Reserved Words and Identifiers
• Completion of the Running Example
• Architecture of a Transition-Diagram-Based Lexical Analyzer (a C sketch is given below)
The Lexical Analyzer Generator (Lex)
• An input file, which we call lex.l, is written in the Lex language and describes the lexical analyzer to be generated.
• The Lex compiler transforms lex.l into a C program, in a file that is always named lex.yy.c.
• That file is compiled by the C compiler into a file called a.out, as always.
• The C compiler's output is a working lexical analyzer that can take a stream of input characters and produce a stream of tokens (typical commands are shown below).
• The attribute value, whether it be another numeric
code, a pointer to the symbol table, or nothing, is
placed in a global variable yylval, which is shared
between the lexical analyzer and parser, thereby
making it simple to return both the name and an
attribute value of a token.
• The declarations section includes declarations of
variables, manifest constants (identifiers declared to
stand for a constant, e.g., the name of a token), and
regular definitions.
• The translation rules each have the form:
Pattern { Action }
• Each pattern is a regular expression, which may use the
regular definitions of the declaration section.
• The actions are fragments of code, typically written in C,
although many variants of Lex using other languages
have been created.
• The third section holds whatever additional functions are
used in the actions.
• Alternatively, these functions can be compiled separately and loaded with the lexical analyzer.
• A skeleton Lex program illustrating the three sections is sketched below.
• When called by the parser, the lexical analyzer
begins reading its remaining input, one character
at a time, until it finds the longest prefix of the
input that matches one of the patterns P.
• It then executes the associated action A.
• Typically, A will return control to the parser, but if it does not (e.g., because P describes whitespace or comments), then the lexical analyzer proceeds to find additional lexemes, until one of the corresponding actions causes a return to the parser.
• The lexical analyzer returns a single value, the
token name, to the parser, but uses the shared,
integer variable yylval to pass additional
information about the lexeme found, if needed.