Lexical Analysis: Programming Languages Translators

The document discusses lexical analysis in programming language translators. Lexical analysis involves breaking the source code text into tokens through processes like recognizing keywords, identifiers, numbers, operators, and punctuation. It creates a stream of tokens from the character input by grouping characters into meaningful units like identifiers, numbers, strings, and punctuation symbols. The lexical analyzer represents each unique token with a numeric code to simplify later parsing.

Uploaded by

Anwar Mohamed


Programming Languages Translators

Lexical Analysis
Tasks of a Scanner

recognizes the keywords of the language
    these are the reserved words that have a special meaning in the language, such as the word class in Java
recognizes special characters, such as ( and ), or groups of special characters, such as := and ==
recognizes identifiers, integers, reals, decimals, strings, etc.
ignores whitespace (tabs, blanks, etc.) and comments
recognizes and processes special directives (such as the #include "file" directive in C) and macros
Lexical Analysis
A scanner groups input characters into tokens
input: x = x * (acc+123)
token         value
identifier    x
equal         =
identifier    x
star          *
left-paren    (
identifier    acc
plus          +
integer       123
right-paren   )
Tokens are typically represented by numbers
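The grouping shown above can be sketched in C. This is a minimal hand-written scanner under stated assumptions, not the course's implementation: the names TokenKind, Token, and next_token are hypothetical, and only the token kinds that occur in the example input are handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for the example input only (names are hypothetical). */
typedef enum {
    T_IDENT, T_INT, T_EQUAL, T_STAR, T_PLUS,
    T_LPAREN, T_RPAREN, T_EOF, T_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    const char *start;  /* first character of the lexeme inside the input */
    size_t len;         /* lexeme length in characters */
} Token;

/* Scan one token starting at *p and advance *p past it. */
Token next_token(const char **p) {
    const char *s = *p;
    while (isspace((unsigned char)*s)) s++;      /* whitespace is discarded */

    Token t = { T_ERROR, s, 1 };
    if (*s == '\0') {
        t.kind = T_EOF;
        t.len = 0;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        const char *e = s + 1;
        while (isalnum((unsigned char)*e) || *e == '_') e++;
        t.kind = T_IDENT;
        t.len = (size_t)(e - s);
    } else if (isdigit((unsigned char)*s)) {
        const char *e = s + 1;
        while (isdigit((unsigned char)*e)) e++;
        t.kind = T_INT;
        t.len = (size_t)(e - s);
    } else {
        switch (*s) {
        case '=': t.kind = T_EQUAL;  break;
        case '*': t.kind = T_STAR;   break;
        case '+': t.kind = T_PLUS;   break;
        case '(': t.kind = T_LPAREN; break;
        case ')': t.kind = T_RPAREN; break;
        default:  break;             /* unknown character: T_ERROR, length 1 */
        }
    }
    *p = s + t.len;
    return t;
}
```

Calling next_token repeatedly on a pointer to "x = x * (acc+123)" yields the token kinds in the table, followed by T_EOF. Each enum value is an integer, which is one way tokens end up "represented by numbers".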
Lexical Analysis

The lexical analyzer splits the source text into tokens
Token = sequence of characters (with a symbolic name) representing a single terminal symbol
Identifiers: myVariable
Literals: 123 5.67 true
Keywords: char sizeof
Operators: + - * /
Punctuation: ; , } {
Discards whitespace and comments
Examples of Tokens in C
Tokens              Lexemes
identifier          Age, grade, Temp, zone, q1
number              3.1416, -498127, 987.76412097
string              "A cat sat on a mat.", "90183654"
open parenthesis    (
close parenthesis   )
semicolon           ;
reserved word if    IF, if, If, iF (only in a case-insensitive language; C is case-sensitive and accepts just if)
Examples of Tokens in C

The lexical analyzer usually represents each token by a unique integer code
"+" { return(PLUS); } // PLUS = 401
"-" { return(MINUS); } // MINUS = 402
"*" { return(MULT); } // MULT = 403
"/" { return(DIV); } // DIV = 404
Some tokens require regular expressions
[a-zA-Z_][a-zA-Z0-9_]* { return(ID); } // identifier
[1-9][0-9]* { return(DECIMALINT); }
0[0-7]* { return(OCTALINT); }
(0x|0X)[0-9a-fA-F]+ { return(HEXINT); }
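The three integer patterns above can also be checked by hand, without a scanner generator. The following is a sketch only: classify_int and the numeric values of the codes are assumptions made for illustration, and the function classifies a complete lexeme rather than scanning a stream.

```c
#include <ctype.h>

/* Hypothetical codes; the slides do not give values for these tokens. */
enum { NOT_INT = 0, DECIMALINT = 1, OCTALINT = 2, HEXINT = 3 };

/* Classify a whole lexeme against the three integer patterns. */
int classify_int(const char *s) {
    const unsigned char *p = (const unsigned char *)s;

    /* (0x|0X)[0-9a-fA-F]+ : prefix plus at least one hex digit */
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) {
        p += 2;
        if (!isxdigit(*p)) return NOT_INT;
        while (isxdigit(*p)) p++;
        return *p == '\0' ? HEXINT : NOT_INT;
    }
    /* 0[0-7]* : a leading zero followed by octal digits only */
    if (p[0] == '0') {
        p++;
        while (*p >= '0' && *p <= '7') p++;
        return *p == '\0' ? OCTALINT : NOT_INT;
    }
    /* [1-9][0-9]* : a nonzero digit followed by any digits */
    if (p[0] >= '1' && p[0] <= '9') {
        p++;
        while (isdigit(*p)) p++;
        return *p == '\0' ? DECIMALINT : NOT_INT;
    }
    return NOT_INT;
}
```

Note that under these patterns a lone 0 is an octal constant, and 08 matches none of them as a single lexeme.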
Example

Token        Informal description                     Sample lexemes
if           Characters i, f                          if
else         Characters e, l, s, e                    else
comparison   < or > or <= or >= or == or !=           <=, !=
id           Letter followed by letters and digits    pi, score, D2
number       Any numeric constant                     3.14159, 0, 6.02e23
literal      Anything but ", surrounded by "s         "core dumped"

printf("total = %d\n", score);

Redefining identifiers can be dangerous

program confusing;
const true = false;
begin
if (a<b) = true then
f(a)
else
Whitespace

Whitespace is any space, tab, end-of-line character (or characters), or character sequence inside a comment
No token may contain embedded whitespace (unless it is a character or string literal)
Example:
>= one token
> = two tokens
Reserved Keywords in C

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while
(wchar_t is a typedef in C, declared in <stddef.h>; it is a keyword only in C++)
C++ adds more: bool, catch, class, dynamic_cast, inline, private, protected, public, static_cast, template, this, virtual, wchar_t, and others
Each keyword is mapped to its own token
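One common way to map each keyword to its own token is to scan the word as an identifier first and then look it up in a keyword table. The sketch below assumes this approach; the token codes, the table contents (a small subset of the C keywords), and the name keyword_or_id are hypothetical.

```c
#include <string.h>

/* Hypothetical token codes; only a few keywords are listed to keep the sketch short. */
enum { TOK_ID = 300, TOK_ELSE, TOK_IF, TOK_INT, TOK_RETURN, TOK_WHILE };

static const struct {
    const char *text;
    int code;
} keywords[] = {
    { "else",   TOK_ELSE   },
    { "if",     TOK_IF     },
    { "int",    TOK_INT    },
    { "return", TOK_RETURN },
    { "while",  TOK_WHILE  },
};

/* Return the keyword's own token code, or TOK_ID for any other word. */
int keyword_or_id(const char *lexeme) {
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strcmp(lexeme, keywords[i].text) == 0) {  /* exact, case-sensitive match */
            return keywords[i].code;
        }
    }
    return TOK_ID;
}
```

Because the comparison is case-sensitive, keyword_or_id("If") returns TOK_ID, which matches C's treatment of keywords. A real scanner would likely use a hash table or a sorted table with binary search instead of a linear scan.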
Lexical Analysis

The process of converting a character stream into a corresponding sequence of meaningful symbols (called tokens or lexemes) is called tokenizing, lexing or lexical analysis. A program that performs this process is called a tokenizer, lexer, or scanner.
In Scheme, we tokenize (set! x (+ x 1)) as
( set! x ( + x 1 ) )
Similarly, in Java, we tokenize
System.out.println("Hello World!"); as
System . out . println ( "Hello World!" ) ;
Parsing Process
Call the scanner to get tokens
Build a parse tree from the stream of tokens
    A parse tree shows the syntactic structure of the source program.
Add information about identifiers to the symbol table
Report errors when they are found, and recover from them
Parsing
Parsing is a process that constructs a syntactic structure (i.e. a parse tree) from the stream of tokens.
We have already learned how to describe the syntactic structure of a language using a (context-free) grammar.
So, is this all a parser needs to do?

Stream of tokens + context-free grammar -> Parser -> Parse tree
Sentinels

[Figure: buffer pair with sentinels: E = M * eof | C * * 2 eof ... eof]

switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters;
}
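The pseudocode above can be turned into a small runnable sketch. Several things here are assumptions made for illustration: the input comes from an in-memory string rather than a file, the NUL byte stands in for the eof sentinel (so the input must not contain NUL bytes), each half holds only 4 characters so that reloads happen often, and the names Scanner, reload, scanner_init, and next_char are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define HALF 4         /* tiny on purpose, so both halves get reloaded */
#define SENTINEL '\0'  /* stands in for the slide's eof marker */

typedef struct {
    const char *src;            /* input not yet copied into the buffer */
    char buf[2 * (HALF + 1)];   /* two halves, each followed by a sentinel slot */
    char *forward;              /* next character to read */
} Scanner;

/* Copy up to HALF characters into one half and terminate it with the sentinel. */
static void reload(Scanner *s, char *half) {
    size_t n = strlen(s->src);
    if (n > HALF) n = HALF;
    memcpy(half, s->src, n);
    half[n] = SENTINEL;         /* at the end of the half, or earlier at end of input */
    s->src += n;
}

void scanner_init(Scanner *s, const char *text) {
    s->src = text;
    reload(s, s->buf);
    s->forward = s->buf;
}

/* Return the next input character, or -1 at end of input. */
int next_char(Scanner *s) {
    char *end1 = s->buf + HALF;          /* sentinel slot of the first half */
    char *end2 = s->buf + 2 * HALF + 1;  /* sentinel slot of the second half */
    for (;;) {
        char *at = s->forward++;
        if (*at != SENTINEL) return (unsigned char)*at;  /* common case: one test */
        if (at == end1) {                /* end of first half: reload second */
            reload(s, end1 + 1);
            s->forward = end1 + 1;
        } else if (at == end2) {         /* end of second half: reload first */
            reload(s, s->buf);
            s->forward = s->buf;
        } else {                         /* sentinel inside a half: end of input */
            s->forward = at;             /* stay put so later calls also return -1 */
            return -1;
        }
    }
}
```

The point of the sentinel is visible in next_char: an ordinary character costs a single comparison, and the buffer-boundary checks run only when a sentinel is actually read.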
Transition diagrams
[Figure: transition diagram for relop]
[Figure: transition diagram for reserved words and identifiers]
[Figure: transition diagram for unsigned numbers]
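Since the diagrams themselves are not reproduced here, the following sketch shows how a transition diagram for relational operators might be coded, with one case per state. It assumes the operator set from the earlier "comparison" token (<, >, <=, >=, ==, !=); the state numbering and the names Relop and match_relop are hypothetical and need not match the original figure.

```c
#include <stddef.h>

typedef enum { R_NONE, R_LT, R_LE, R_GT, R_GE, R_EQ, R_NE } Relop;

/* Try to recognise a comparison operator at the start of s.
   On success, store the lexeme length in *len and return its code;
   otherwise store 0 and return R_NONE. */
Relop match_relop(const char *s, size_t *len) {
    int state = 0;
    size_t i = 0;
    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:                                   /* start state */
            if (c == '<') state = 1;
            else if (c == '>') state = 2;
            else if (c == '=') state = 3;
            else if (c == '!') state = 4;
            else { *len = 0; return R_NONE; }
            i++;
            break;
        case 1:                                   /* seen '<' */
            if (c == '=') { *len = 2; return R_LE; }
            *len = 1; return R_LT;                /* lookahead is not consumed */
        case 2:                                   /* seen '>' */
            if (c == '=') { *len = 2; return R_GE; }
            *len = 1; return R_GT;
        case 3:                                   /* seen '=' */
            if (c == '=') { *len = 2; return R_EQ; }
            *len = 0; return R_NONE;              /* a lone '=' is not a comparison */
        case 4:                                   /* seen '!' */
            if (c == '=') { *len = 2; return R_NE; }
            *len = 0; return R_NONE;
        default:
            *len = 0; return R_NONE;
        }
    }
}
```

States 1 and 2 correspond to the accepting states that "retract" in a transition diagram: the character after < or > is examined but only consumed when it is =.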

Recognition

state = 0;
while ((c = next_char()) != EOF) {
    switch (state) {
    case 0: if (c == 'a') state = 1;   /* expect 'a' */
            break;
    case 1: if (c == 'b') state = 2;   /* expect 'b' */
            break;
    case 2: if (c == 'c') state = 3;   /* expect 'c'; state 3 is accepting */
            break;
    case 3: if (c == 'a') state = 1;   /* another "abc" may follow */
            else { ungetchar(); return (TRUE); }   /* accept and push back the lookahead */
            break;
    default:
            error();
    }
}
if (state == 3) return (TRUE); else return (FALSE);
Finite Automata for the Lexical Tokens
[Figure: finite automata for the tokens IF, ID, NUM, REAL, white space (and comment starting with --), and error (Appel, p. 21)]
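As an illustration of how automata for IF, ID, NUM, and REAL can be merged into one deterministic machine, here is a sketch that classifies a complete lexeme. It is an assumption-laden simplification: only lowercase letters are treated as letters (as the a-z labels in the figure suggest), white space, comments, and longest-match scanning are left out, and all names (State, Tok, step, classify) are hypothetical.

```c
typedef enum { S_START, S_I, S_IF, S_ID, S_NUM, S_DOT, S_REAL, S_DEAD } State;
typedef enum { T_IF, T_ID, T_NUM, T_REAL, T_ERROR } Tok;

static int is_letter(char c) { return c >= 'a' && c <= 'z'; }
static int is_digit(char c)  { return c >= '0' && c <= '9'; }

/* One transition of the combined automaton. */
static State step(State st, char c) {
    switch (st) {
    case S_START:
        if (c == 'i') return S_I;
        if (is_letter(c)) return S_ID;
        if (is_digit(c)) return S_NUM;
        if (c == '.') return S_DOT;
        return S_DEAD;
    case S_I:                       /* "i" so far: still an identifier */
        if (c == 'f') return S_IF;
        return (is_letter(c) || is_digit(c)) ? S_ID : S_DEAD;
    case S_IF:                      /* "if" so far: any further char makes it an ID */
    case S_ID:
        return (is_letter(c) || is_digit(c)) ? S_ID : S_DEAD;
    case S_NUM:
        if (is_digit(c)) return S_NUM;
        return c == '.' ? S_REAL : S_DEAD;
    case S_DOT:                     /* "." alone is not a token; needs a digit */
    case S_REAL:
        return is_digit(c) ? S_REAL : S_DEAD;
    default:
        return S_DEAD;
    }
}

/* Run the automaton over the whole lexeme and map the final state to a token. */
Tok classify(const char *lexeme) {
    State st = S_START;
    for (const char *p = lexeme; *p != '\0'; p++) st = step(st, *p);
    switch (st) {
    case S_IF:   return T_IF;
    case S_I:
    case S_ID:   return T_ID;
    case S_NUM:  return T_NUM;
    case S_REAL: return T_REAL;
    default:     return T_ERROR;
    }
}
```

The state S_I shows why the automata must be combined: after reading i, the machine cannot yet tell whether it is in the middle of the keyword if or of an ordinary identifier.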

LEXICAL ANALYSIS

Lexical Errors
Deleting an extraneous character
Inserting a missing character
Replacing an incorrect character by a correct
character
Transposing two adjacent characters (such as fi => if)
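To make the transposition case concrete, the sketch below tests whether one word can be turned into another by swapping a single pair of adjacent characters, as in fi => if. The function name is hypothetical, and this only detects the situation; how a scanner would use it to repair the input is not covered by the slides.

```c
#include <string.h>

/* Return 1 if b equals a with exactly one pair of adjacent characters swapped. */
int is_adjacent_transposition(const char *a, const char *b) {
    size_t n = strlen(a);
    if (n != strlen(b)) return 0;

    size_t i = 0;
    while (i < n && a[i] == b[i]) i++;     /* find the first mismatch */
    if (i + 1 >= n) return 0;              /* identical, or only the last char differs */

    if (a[i] != b[i + 1] || a[i + 1] != b[i]) return 0;  /* must be a swap */
    return strcmp(a + i + 2, b + i + 2) == 0;            /* the rest must agree */
}
```

For example, is_adjacent_transposition("fi", "if") and is_adjacent_transposition("whiel", "while") both return 1, while two identical words return 0.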
Pre-scanning
Tokens / Patterns / Regular Expressions
Lexical analysis searches for matches of lexeme to pattern
The lexical analyzer returns: <actual lexeme, symbolic identifier of token>

For example:
Token            Symbolic ID
if               1
then             2
else             3
>, >=, <, ...    4
:=               5
id               6
int              7
real             8

The set of all regular expressions plus symbolic ids plus analyzer define the required functionality.

REs --(algs)--> NFA --(algs)--> DFA (program for simulation)
