Lec 4 CH 2

The document discusses scanning and lexical analysis, focusing on finite automata, including the conversion from regular expressions to deterministic finite automata (DFAs) and non-deterministic finite automata (NFAs). It outlines the algorithms for making transitions, recognizing tokens, and handling errors, as well as the process of constructing NFAs from regular expressions using Thompson's construction. Additionally, it describes the subset construction algorithm for converting NFAs to DFAs and methods for simulating NFAs.

Uploaded by

Hesham MosaAd


2. Scanning (Lexical Analysis)

Downloaded by Mamdouh Farghaly (mamfarouk3@gmail.com)

Contents
Finite Automata

From Regular Expressions to DFAs

From an NFA to a DFA

Lookahead, Backtracking, and
Nondeterministic Automata


Typical Actions of a DFA Algorithm
• Making a transition: move the character from the input
string to a string that accumulates the characters
belonging to a single token (the token string value, or
lexeme, of the token).
• Reaching an accepting state: return the token just
recognized, along with any associated attributes.
• Reaching an error state: either back up in the input
(backtracking) or generate an error token.
[Figure: Finite automaton for an identifier with delimiter and return value. start -letter-> in_id; in_id loops on letter and on digit; in_id -[other]-> finish (return ID).]
• The error state represents the fact that either
an identifier is not to be recognized (if we came
from the start state) or a delimiter has been seen
and we should now accept and generate an
identifier token.
• [other]: indicates that the delimiting character
should be considered lookahead; it should be
returned to the input string and not consumed.
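The identifier automaton and its [other] transition can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `scan_identifier` and the use of `str.isalpha` and `str.isdigit` as the letter and digit classes are choices made here, not part of the slides.

```python
def scan_identifier(text, pos=0):
    """Run the identifier DFA on text starting at index pos.

    Returns (lexeme, next_pos) when an identifier is recognized, or
    (None, pos) when the first character is not a letter (error case).
    """
    state = "start"
    lexeme = []                      # accumulates the token string value
    i = pos
    while True:
        ch = text[i] if i < len(text) else ""   # "" stands for end of input
        if state == "start":
            if ch.isalpha():
                lexeme.append(ch)    # making a transition: consume the letter
                i += 1
                state = "in_id"
            else:
                return None, pos     # error state reached from start
        else:                        # state == "in_id"
            if ch.isalpha() or ch.isdigit():
                lexeme.append(ch)    # keep matching: longest substring
                i += 1
            else:
                # [other]: the delimiter is lookahead only, so i is not
                # advanced and the delimiter stays in the input.
                return "".join(lexeme), i
```

For the input `x1 := 5` the call returns the lexeme `x1` and position 2, which is the index of the delimiter that was looked at but not consumed.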
• This diagram also expresses the principle of
longest sub-string described in Section 2.2.4:
the DFA continues to match letters and digits (in
state in_id) until a delimiter is found.
• By contrast the old diagram allowed the DFA to accept at any point
while reading an identifier string.
[Figure: the earlier identifier DFA (start -letter-> in_id, with in_id looping on letter and digit) shown beside the DFA with the [other] transition from in_id to finish (return ID).]
How to arrive at the start state in
the first place
(combine all the tokens into one DFA)
Each of these tokens begins with a
different character
• Consider the tokens given by the strings :=, <=, and =.
• Each of these is a fixed string, and DFAs for them can be written as shown at right.
[Figure: three separate DFAs: ":" then "=" (return ASSIGN); "<" then "=" (return LE); "=" (return EQ)]
• Uniting all of their start states into a single start state gives the DFA.
[Figure: one DFA with a single start state: ":" then "=" (return ASSIGN); "<" then "=" (return LE); "=" (return EQ)]
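A minimal table-driven sketch of the merged DFA for :=, <=, and = is given here. The state names (`colon`, `less`, and so on) and the function `recognize` are hypothetical labels chosen for this illustration; only the token names ASSIGN, LE, and EQ come from the slides.

```python
# Transition table of the merged DFA: state -> {character: next state}.
# Because each token begins with a different character, the single start
# state has at most one transition per character, so this is a DFA.
TRANSITIONS = {
    "start": {":": "colon", "<": "less", "=": "eq"},
    "colon": {"=": "assign"},
    "less": {"=": "le"},
}

# Accepting states and the token each one returns.
ACCEPTING = {"assign": "ASSIGN", "le": "LE", "eq": "EQ"}


def recognize(s):
    """Return the token name for the whole string s, or None if rejected."""
    state = "start"
    for ch in s:
        state = TRANSITIONS.get(state, {}).get(ch)
        if state is None:            # no transition: implicit error state
            return None
    return ACCEPTING.get(state)
```

Calling `recognize(":=")` follows start, colon, assign and yields the ASSIGN token, while a lone `:` ends in a non-accepting state and is rejected.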

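Several tokens beginning with the same character
• The tokens <=, <>, and < cannot simply be written as the diagram at right, since it is not a DFA.
[Figure: a start state with three separate transitions on "<": "<" then "=" (return LE); "<" then ">" (return NE); "<" (return LT)]
• The diagram can be rearranged into a DFA.
[Figure: a single transition on "<" to a state with "=" (return LE), ">" (return NE), and [other] (return LT)]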
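The rearranged DFA can be sketched as a small function that reads "<" once and then decides on the next character. The name `scan_relop` and the (token, position) return convention are assumptions made for this sketch.

```python
def scan_relop(text, pos=0):
    """Return (token, next_pos) for <=, <>, or < at text[pos].

    Returns (None, pos) if text[pos] is not "<".
    """
    if pos >= len(text) or text[pos] != "<":
        return None, pos
    nxt = text[pos + 1] if pos + 1 < len(text) else ""   # "" = end of input
    if nxt == "=":
        return "LE", pos + 2
    if nxt == ">":
        return "NE", pos + 2
    # [other]: the character after "<" is lookahead and is not consumed.
    return "LT", pos + 1
```

The shared prefix "<" is consumed by exactly one transition, which is what makes the rearranged diagram deterministic.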

Expand the Definition of a Finite
Automaton
• One solution to this problem is to expand
the definition of a finite automaton.
• More than one transition from a state
may exist for a particular character
(NFA: non-deterministic finite automaton).
• Then develop an algorithm for systematically
turning these NFAs into DFAs.
ε-transition
• A transition that may occur without consulting the
input string (and without consuming any characters)


It may be viewed as a "match" of the empty string.
(This should not be confused with a match of the
character ε in the input.)
ε-Transitions Used in Two Ways
• First: to express a choice of alternatives in a way that does not involve combining states
– Advantage: keeping the original automata intact and only adding a new start state to connect them
[Figure: a new start state with ε-transitions to the start states of the DFAs for :=, <=, and =]
• Second: to explicitly describe a match of the empty string.
Definition of NFA
• An NFA (non-deterministic finite automaton) M consists of
– an alphabet Σ, a set of states S,
– a transition function T: S × (Σ ∪ {ε}) → ℘(S),
– a start state s0 from S, and a set of accepting states A from S.
• The language accepted by M, written L(M), is defined to be the set of strings of characters c1c2...cn, with each ci from Σ ∪ {ε}, such that there exist states s1 in T(s0, c1), s2 in T(s1, c2), ..., sn in T(sn-1, cn) with sn an element of A.
• Any of the ci in c1c2...cn may be ε, and the string that is actually accepted is the string c1c2...cn with the ε's removed (since the concatenation of a string s with ε is s itself). Thus, the string c1c2...cn may actually have fewer than n characters in it.
• The sequence of states s1, ..., sn is chosen from the sets of states T(s0, c1), ..., T(sn-1, cn), and this choice will not always be uniquely determined. The sequence of transitions that accepts a particular string is therefore not determined at each step by the state and the next input character. Indeed, arbitrary numbers of ε's can be introduced into the string at any point, corresponding to any number of ε-transitions in the NFA.
• An NFA does not represent an algorithm.
However, it can be simulated by an algorithm
that backtracks through every non-deterministic
choice.
Example 2.10
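[Figure: NFA with states 1 (start), 2, 3, and 4, with transitions on a, b, and ε]
• The string abb can be accepted by either of the following sequences of transitions:
→ 1 -a-> 2 -b-> 4 -ε-> 2 -b-> 4
→ 1 -a-> 3 -ε-> 4 -ε-> 2 -b-> 4 -ε-> 2 -b-> 4
• This NFA accepts the language generated by the regular expression ab+|ab*|b*, which can be written more simply as (a|ε)b*.
• The DFA at left accepts the same language.
[Figure: equivalent DFA with transitions on a and b]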
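Simulating an NFA by backtracking through every non-deterministic choice can be sketched as below for the NFA of Example 2.10. The transitions are read off the two accepting paths for abb quoted in the example. The ε-edge from state 1 to state 4 is an assumption inferred from the b* alternative of the stated language, and the choice of 4 as the only accepting state is likewise inferred from those paths.

```python
EPS = ""  # label used in this sketch for ε-transitions

# (state, label) -> set of next states.
# Assumption: the edge 1 -ε-> 4 is inferred from the language (a|ε)b*.
NFA = {
    (1, "a"): {2, 3},
    (1, EPS): {4},
    (2, "b"): {4},
    (3, EPS): {4},
    (4, EPS): {2},
}
START, ACCEPT = 1, {4}


def accepts(state, rest, seen=frozenset()):
    """Depth-first search over every non-deterministic choice.

    `seen` holds states already visited by ε-moves since the last input
    character was consumed, so ε-cycles cannot loop forever.
    """
    if not rest and state in ACCEPT:
        return True
    # Choice 1: follow an ε-transition without consuming input.
    for nxt in NFA.get((state, EPS), ()):
        if nxt not in seen and accepts(nxt, rest, seen | {state}):
            return True
    # Choice 2: consume one character; on failure, fall through (backtrack).
    if rest:
        for nxt in NFA.get((state, rest[0]), ()):
            if accepts(nxt, rest[1:]):
                return True
    return False
```

With these transitions, `accepts(START, "abb")` succeeds, and strings outside (a|ε)b* such as `ba` are rejected after every alternative has been tried.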
Example 2.11
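• The NFA below accepts the string acab by making the following transitions:
– 1 -ε-> 2 -ε-> 3 -a-> 4 -ε-> 7 -ε-> 2 -ε-> 5 -c-> 6 -ε-> 7 -ε-> 2 -ε-> 3 -a-> 4 -ε-> 7 -ε-> 8 -ε-> 9 -b-> 10
• It accepts the same language as that generated by the regular expression (a|c)*b.
[Figure: NFA with states 1 to 10; 3 -a-> 4, 5 -c-> 6, 9 -b-> 10, and all other transitions are ε-transitions]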


2.4 From Regular Expressions to DFAs
Main Purpose
• Study an algorithm:
– Translating a regular expression into a DFA via an NFA.

Regular Expression → NFA → DFA → Program
2.4.1 From a Regular Expression to an NFA
The Idea of Thompson’s Construction
• Use ε-transitions
– to “glue together” the machines of each piece of a regular expression
– to form a machine that corresponds to the whole expression
• Basic regular expressions
– The NFAs for basic regular expressions of the form a, ε, or φ
[Figure: NFAs for the basic regular expressions a and ε]
The Idea of Thompson’s Construction
• Concatenation: to construct an NFA equivalent to rs
– Connect the accepting state of the machine of r to the start state of the machine of s by an ε-transition.
– The new machine has the start state of the machine of r as its start state and the accepting state of the machine of s as its accepting state.
– This machine accepts L(rs) = L(r)L(s) and so corresponds to the regular expression rs.
[Figure: the machine of r connected by an ε-transition to the machine of s]
The Idea of Thompson’s Construction
• Choice among alternatives: to construct an NFA equivalent to r | s
– Add a new start state and a new accepting state, and connect them to the machines of r and s using ε-transitions.
– Clearly, this machine accepts the language L(r|s) = L(r) ∪ L(s), and so corresponds to the regular expression r|s.
[Figure: a new start state with ε-transitions to the machines of r and s, whose accepting states have ε-transitions to a new accepting state]
The Idea of Thompson’s Construction
• Repetition: given a machine that corresponds to r, construct a machine that corresponds to r*
– Add two new states, a start state and an accepting state.
– The repetition is afforded by the new ε-transition from the accepting state of the machine of r to its start state.
– Draw an ε-transition from the new start state to the new accepting state, so that the empty string is accepted.
– This construction is not unique; simplifications are possible in many cases.
[Figure: the machine of r between a new start state and a new accepting state, connected by ε-transitions]
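The basic, concatenation, choice, and repetition constructions can be sketched together in Python. This is one possible encoding, not the slides' own: the class `Frag`, the edge-list representation, the use of `None` as the ε label, and the helper `matches` (which checks a string by tracking sets of states) are all assumptions made for illustration.

```python
import itertools

_ids = itertools.count(1)        # supplies fresh state numbers


class Frag:
    """NFA fragment with one start state and one accepting state.

    edges is a list of (source, label, target); label None means ε.
    """
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def basic(ch):
    s, a = next(_ids), next(_ids)
    return Frag(s, a, [(s, ch, a)])


def concat(r, s):
    # ε-transition from the accepting state of r to the start state of s
    return Frag(r.start, s.accept,
                r.edges + s.edges + [(r.accept, None, s.start)])


def choice(r, s):
    start, accept = next(_ids), next(_ids)
    glue = [(start, None, r.start), (start, None, s.start),
            (r.accept, None, accept), (s.accept, None, accept)]
    return Frag(start, accept, r.edges + s.edges + glue)


def star(r):
    start, accept = next(_ids), next(_ids)
    glue = [(start, None, r.start), (r.accept, None, accept),
            (r.accept, None, r.start),    # repetition
            (start, None, accept)]        # zero occurrences
    return Frag(start, accept, r.edges + glue)


def closure(states, edges):
    """All states reachable from `states` by zero or more ε-edges."""
    result, stack = set(states), list(states)
    while stack:
        cur = stack.pop()
        for src, label, dst in edges:
            if src == cur and label is None and dst not in result:
                result.add(dst)
                stack.append(dst)
    return result


def matches(frag, text):
    current = closure({frag.start}, frag.edges)
    for ch in text:
        moved = {d for (s, l, d) in frag.edges if s in current and l == ch}
        current = closure(moved, frag.edges)
    return frag.accept in current


# ab|a, built bottom-up from its pieces
ab_or_a = choice(concat(basic("a"), basic("b")), basic("a"))
```

Each fragment should be built from fresh `basic` calls, since reusing one fragment object in two places would share its states.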


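Examples of NFA Construction
Example 2.12: Translate the regular expression ab|a into an NFA
[Figure: the machines for a and b; their concatenation ab joined by an ε-transition; and the full NFA for ab|a, formed from ab and a second machine for a by the choice construction]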
Examples of NFA Construction
Example 2.13: Translate the regular expression letter(letter|digit)* into an NFA
[Figure: the machines for letter and digit; the machine for letter|digit; the machine for (letter|digit)*; and the full NFA for letter(letter|digit)*]


2.4.2 From an NFA to a DFA
Goal and Methods
• Goal
– Given an arbitrary NFA, construct an equivalent DFA (i.e., one
that accepts precisely the same strings).
• Some methods
– (1) Eliminating ε-transitions
• ε-closure: the set of all states reachable by ε-transitions from a state
or states
– (2) Eliminating multiple transitions from a state on a single input
character
• Keeping track of the set of states that are reachable by matching a
single character
– Both these processes lead us to consider sets of states instead of
single states. Thus, it is not surprising that the DFA we construct
has sets of states of the original NFA as its states.
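The Algorithm Called Subset Construction
• The ε-closure of a set of states:
– The ε-closure of a single state s is the set of states
reachable by a series of zero or more ε-transitions,
and we write this set as $\bar{s}$.
• Example 2.14: the regular expression a*
[Figure: NFA for a* with states 1, 2, 3, and 4; the transition from 2 to 3 is on a, and all other transitions are ε-transitions]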


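The Algorithm Called Subset Construction
For the NFA of Example 2.14:

$\bar{1} = \{1,2,4\}$, $\bar{2} = \{2\}$, $\bar{3} = \{2,3,4\}$, and $\bar{4} = \{4\}$.

The ε-closure of a set of states is the union of the ε-closures of each individual state:
$\bar{S} = \bigcup_{s \in S} \bar{s}$

$\overline{\{1,3\}} = \bar{1} \cup \bar{3} = \{1,2,4\} \cup \{2,3,4\} = \{1,2,3,4\}$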
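Computing an ε-closure amounts to a reachability search over ε-transitions only. The sketch below assumes a dictionary `eps` from a state to its direct ε-successors. The particular ε-edges given for the a* NFA (1 to 2, 1 to 4, 3 to 2, 3 to 4) are an assumption chosen to be consistent with the closures listed above.

```python
def epsilon_closure(states, eps):
    """States reachable from `states` by zero or more ε-transitions.

    eps maps a state to the set of states reachable by one ε-transition.
    """
    closure = set(states)        # zero ε-moves: every state is in its own closure
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


# Assumed ε-edges of the a* NFA (states 2 and 4 have none).
EPS_A_STAR = {1: {2, 4}, 3: {2, 4}}
```

Passing a set with several states gives the union of the individual closures, so `epsilon_closure({1, 3}, EPS_A_STAR)` contains all four states.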
The Subset Construction Algorithm

(1) Compute the ε-closure of the start state of M; this becomes the start state of the new machine $\bar{M}$.
(2) For this set, and for each subsequent set, compute transitions on
characters a as follows.
Given a set S of states and a character a in the alphabet,
compute the set
$S_a = \{\, t \mid \text{for some } s \in S \text{ there is a transition from } s \text{ to } t \text{ on } a \,\}$.
Then compute $\overline{S_a}$, the ε-closure of $S_a$.
This defines a new state in the subset construction, together with
a new transition $S \xrightarrow{a} \overline{S_a}$.
(3) Continue with this process until no new states or transitions are created.
(4) Mark as accepting those states constructed in this manner that contain
an accepting state of M.
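Steps (1) to (4) can be sketched with a worklist, as below. The function and parameter names are hypothetical, and the sample NFA for ab|a is an assumed edge set chosen to agree with the table in the examples that follow (state 8 accepting).

```python
from collections import deque


def epsilon_closure(states, eps):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def subset_construction(start, accepting, delta, eps, alphabet):
    """delta: {(state, char): set of states}; eps: {state: set of states}."""
    # Step (1): the ε-closure of the NFA start state is the DFA start state.
    start_set = frozenset(epsilon_closure({start}, eps))
    dfa_trans = {}
    seen = {start_set}
    work = deque([start_set])
    # Step (3): repeat until no new states or transitions appear.
    while work:
        S = work.popleft()
        for a in alphabet:
            # Step (2): states reachable from S on a, then their ε-closure.
            S_a = set()
            for s in S:
                S_a |= delta.get((s, a), set())
            if not S_a:
                continue         # no transition on a; error state left implicit
            target = frozenset(epsilon_closure(S_a, eps))
            dfa_trans[(S, a)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    # Step (4): a DFA state is accepting if it contains an NFA accepting state.
    dfa_accepting = {S for S in seen if S & accepting}
    return start_set, dfa_trans, dfa_accepting


# Assumed NFA for ab|a: states 1 to 8, accepting state 8.
DELTA = {(2, "a"): {3}, (4, "b"): {5}, (6, "a"): {7}}
EPS = {1: {2, 6}, 3: {4}, 5: {8}, 7: {8}}
dfa_start, dfa_trans, dfa_accepting = subset_construction(1, {8}, DELTA, EPS, "ab")
```

DFA states are stored as frozensets so that they can serve as dictionary keys and set members.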
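Examples of Subset Construction
[Figure: NFA for a* with states 1, 2, 3, and 4]

M      ε-closure of M ($\bar{S}$)    $S_a$
{1}    {1,2,4}                       {3}
{3}    {2,3,4}                       {3}

[Figure: resulting DFA: {1,2,4} -a-> {2,3,4}, and {2,3,4} -a-> {2,3,4}]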
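Examples of Subset Construction
[Figure: NFA for ab|a with states 1 to 8; transitions on a from 2 to 3 and from 6 to 7, on b from 4 to 5, and all other transitions are ε-transitions]

M        ε-closure of M ($\bar{S}$)    $S_a$    $S_b$
{1}      {1,2,6}                       {3,7}
{3,7}    {3,4,7,8}                              {5}
{5}      {5,8}

[Figure: resulting DFA: {1,2,6} -a-> {3,4,7,8} -b-> {5,8}]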
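Examples of Subset Construction
[Figure: NFA for letter(letter|digit)* with states 1 to 10; transitions on letter from 1 to 2 and from 5 to 6, on digit from 7 to 8, and all other transitions are ε-transitions]

M      ε-closure of M ($\bar{S}$)    $S_{letter}$    $S_{digit}$
{1}    {1}                           {2}
{2}    {2,3,4,5,7,10}                {6}             {8}
{6}    {4,5,6,7,9,10}                {6}             {8}
{8}    {4,5,7,8,9,10}                {6}             {8}

[Figure: resulting DFA: {1} -letter-> {2,3,4,5,7,10}; from each of {2,3,4,5,7,10}, {4,5,6,7,9,10}, and {4,5,7,8,9,10}, letter leads to {4,5,6,7,9,10} and digit leads to {4,5,7,8,9,10}]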
2.4.3 Simulating an NFA using the Subset Construction
One Way of Simulating an NFA
• NFAs can be implemented in similar ways to
DFAs, except that, because NFAs are nondeterministic,
– there are many different sequences of transitions that
must be tried;
– transitions that have not yet been tried must be stored,
and the simulation backtracks to them on failure.
Another Way of Simulating an NFA
• Use the subset construction
– Instead of constructing all the states of the associated
DFA,
– construct only the state at each point that is indicated
by the next input character.
• The advantage: there is no need to construct the entire DFA.
– Example: for an input consisting of the single character a, construct the start
state {1,2,6} and then the second state {3,4,7,8}, to which we
move to match the a.
– Since no b follows, accept without ever generating the
state {5,8}.
[Figure: DFA {1,2,6} -a-> {3,4,7,8} -b-> {5,8}]
Another Way of Simulating an NFA
• The disadvantage: a state may be constructed many times if the path
contains loops.
– Example: given the input string r2d3, the sequence of states is
{1} -r-> {2,3,4,5,7,10} -2-> {4,5,7,8,9,10} -d-> {4,5,6,7,9,10} -3-> {4,5,7,8,9,10}
[Figure: the DFA for letter(letter|digit)* with states {1}, {2,3,4,5,7,10}, {4,5,6,7,9,10}, and {4,5,7,8,9,10}]
• If these states are constructed as the transitions occur, then all the states
of the DFA have been constructed, and the state {4,5,7,8,9,10} has even
been constructed twice.
– This is less efficient than constructing the entire DFA in the first place.
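Simulating an NFA by building only the state sets that the input demands can be sketched as below. The edge set for the letter(letter|digit)* NFA is an assumption chosen to reproduce the ε-closures in the earlier table, and `classify` and `simulate` are hypothetical helper names; the list `built` records every set that gets constructed so the repeated work is visible.

```python
def epsilon_closure(states, eps):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def classify(ch):
    """Map an input character to the NFA's alphabet (letter or digit)."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return None


# Assumed NFA for letter(letter|digit)*: start state 1, accepting state 10.
DELTA = {(1, "letter"): {2}, (5, "letter"): {6}, (7, "digit"): {8}}
EPS = {2: {3}, 3: {4, 10}, 4: {5, 7}, 6: {9}, 8: {9}, 9: {4, 10}}
ACCEPT = {10}


def simulate(text):
    """Return (accepted, list of state sets built along the way)."""
    current = epsilon_closure({1}, EPS)
    built = [frozenset(current)]
    for ch in text:
        label = classify(ch)
        moved = set()
        for s in current:
            moved |= DELTA.get((s, label), set())
        # Only the one set required by this character is constructed.
        current = epsilon_closure(moved, EPS)
        built.append(frozenset(current))
    return bool(current & ACCEPT), built


accepted, built = simulate("r2d3")
```

For r2d3 the list `built` holds five sets, and {4,5,7,8,9,10} appears in it twice, which is the repeated construction described above.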
End of Chapter Two

THANKS
