Chapter_2 (2-3)
Lexical Analysis
Language And Automata Theory
Lexical Analysis (scanner)
Source
Program Tokens
Lexical
(Character analyzer
Stream)
▪ Lexical Analysis is
also known as
lexical scanner.
average = (sum/count)
lexical analyzer
◼ The lexical analyzer needs to scan
and identify only a finite set of valid
string/token/lexeme that belong to
average = (sum/count)
(accepted) the language in hand.
average identifier
◼ و السؤال األن = Assignment operator
مقبولaverage هل المتغير ( open parenthesis
في اللغة المكتوب بهاaccepted sum identifier
/ Division operator
البرنامج – الجافا مثال count Identifier
) Close parenthesis
The Problem Description
The only problem left with the lexical analyzer is
how to verify the validity of a regular expression
used in specifying the patterns of keywords of a
language.
A well-known solution is to use finite automata
(AUTOMATA THEORY) for verification.
What is Automata Theory ?
Study of abstract computing devices, or
“machines”
Automaton = an abstract computing device
• Note: A “device” need not even be a physical hardware!
A fundamental question in computer science:
• Find out what different models of machines can do and
cannot do
• The theory of computation
Computability vs. Complexity
THE CONCEPTS OF AUTOMATA THEORY
ALPHABET, STRING, LANGUAGE
Alphabet (∑)
An alphabet is a finite, non-empty set of symbols
◼ We use the symbol ∑ (sigma) to denote an
alphabet
◼ Examples:
◼ Binary: ∑ = {0,1}
◼ All lower case letters: ∑ = {a,b,c,..z}
◼ Alphanumeric: ∑ = {a-z, A-Z, 0-9}
◼ DNA molecule letters: ∑ = {a,c,g,t}
◼ …
String
A string or word is a finite sequence of symbols
chosen from Alphabet (∑)
◼ Empty string is (or “epsilon”)
◼ Length of a string w, denoted by “|w|”, is equal
to the number of (non- ) characters in the string
◼ E.g., x = 010100 |x| = 6
◼ x = 01 0 1 00 |x| = ?
Languages
A language is a set of strings
L is a said to be a language over alphabet ∑, only if L ∑*
➔ this is because ∑* is the set of all strings (of all possible length
including 0) over the given alphabet ∑
Examples:
1. Let L be the language of all strings consisting of n 0’s followed by n 1’s:
L = {,01,0011,000111,…}
1. Let L be the language of all strings of with equal number of 0’s and 1’s:
L = {,01,10,0011,1100,0101,1010,1001,…}
The Membership Problem
Given a string w∑* and a language L over
alphabet ∑ , decide whether or not w L.
ينتم اىل لغة الجافا
ي cin البمجة
هل امر ر ◼
C++ ينتم اىل لغة
ي cin البمجة
هل امر ر ◼
الثنائ
ي تنتم اىل لغة النظام
ي 10110 هل الكلمة ◼
The Membership Problem
Given a string w∑* and a language L over
alphabet ∑ , decide whether or not w L.
Example:
Let w = 100011
Is w the language of strings with equal
number of 0s and 1s?
Finite Automata - Finite state machine
◼ A finite automaton is a finite-state transition diagram
that can be used to model the recognition of a token
type specified by a regular expression
12
Finite Automata - Finite state machine
Finite state machines can be represented in many ways,
one of which is a State Diagram.
An example of a finite state machine (FSM)
Finite Automata - Table
Another representation of the finite state machine is the Table.
FINITE STATE MACHINE (FSM)
Modeling recognition of the word/string “then”
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
• Each state of the machine is represented by a circle.
• Transition function is represented by arcs labeled by
input symbols leading from one state to another.
FINITE STATE MACHINE (FSM)
Modeling recognition of the word/string “then”
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
• The accepting states are double circles, and the starting
state is indicated by an arc with no state at its source (tail)
end.
FINITE STATE MACHINE (FSM)
Modeling recognition of the word/string “then”
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
❑ As each symbol is read from the input string, the machine
proceeds to a new state as indicated by the transition
function, which is a function of the input symbol and the
current state of the machine.
FINITE STATE MACHINE (FSM)
Modeling recognition of the word/string “then”
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
When the entire input string has been read, the machine
is either in an accepting state or in a non-accepting state.
If it is in an accepting state, then we say the input string
has been accepted. Otherwise the input string has not been
accepted, i.e. (rejected)
FINITE STATE MACHINE (FSM)
A finite state machine consists of:
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
1. A finite set of states (Q) , one of which is designated the
starting state, and zero or more of which are designated
accepting states.
FINITE STATE MACHINE (FSM)
A finite state machine consists of:
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
2. A state transition function (δ) which has two arguments,
– a state and an input symbol (from a given input alphabet)
– and returns as result a state.
FINITE STATE MACHINE (FSM)
A finite state machine consists of:
q0 q1 q2 q3 q4
Start state Transition Intermediate Final state
state
3. The input is a string of symbols (∑) from the input
alphabet.
4. The machine is initially in the starting state (q0).
(The starting state may also be an accepting state.)
5. Final state (F)
Finite Automata (FA)
◼ A finite automaton can be a Nondeterministic finite
automaton or a Deterministic finite automaton
Deterministic Finite Non-deterministic Finite
Automata (DFA) Automata (NFA)
The machine can exist in only one The machine can exist in multiple
state at any given time states at the same time
22
Deterministic Finite Automata (DFA)
◼ The DFA is defined by the 5-tuple:
◼ {Q, ∑, q0, F, δ }
◼ A Deterministic Finite Automaton (DFA) consists of:
◼ Q ==> a finite set of states
◼ ∑ ==> a finite set of input symbols (alphabet)
◼ q0 ==> a start state
◼ F ==> set of final states
◼ δ ==> a transition function, which is a mapping between
Q x ∑ ==> Q
Deterministic Finite Automata (DFA)
start
➢ There are no contradictions in the state
transitions. 1
a
This means that for each state there is
2
exactly one arc leaving that state labeled
b
by each possible input symbol.
3
b
So called deterministic. 4
What does a DFA do on reading an input
string?
◼ Input: a word w in ∑*
◼ Question: Is w acceptable by the DFA?
◼ Steps:
◼ Start at the “start state” q0
◼ For every input symbol in the sequence w do
◼ Compute the next state from the current state, given the
current input symbol in w and the transition function
◼ If after all symbols in w are consumed, the current state
is one of the final states (F) then accept w;
◼ Otherwise, reject w.
Example #1
◼ Build a DFA for the following language:
◼ L = {w | w is a binary string that contains 01 as a substring}
◼ Steps for building a DFA to recognize L:
◼ ∑ = {0,1}
◼ Decide on the states: Q
◼ Designate start state and final state(s)
◼ δ: Decide on the transitions:
◼ Final states == same as “accepting states”
◼ Other states == same as “non-accepting states”
Regular expression: (0+1)*01(0+1)*
DFA for strings containing 01
• What makes this DFA deterministic? • Q = {q0,q1,q2}
1 0 0,1 • ∑ = {0,1}
• start state = q0
start 0 1
q0 q1 q2 • F = {q2}
Final • Transition table
state symbols
0 1
q0 q1 q0
states
• What if the language allows q1 q1 q2
empty strings? *q2 q2 q2
Example #2
Clamping Logic:
◼ A clamping circuit waits for a ”1” input, and turns on forever.
However, to avoid clamping on spurious noise, we’ll design a DFA
that waits for two consecutive 1s in a row before clamping on.
◼ Build a DFA for the following language:
L = { w | w is a bit string which contains the substring
11}
◼ State Design:
◼ q0 : start state (initially off), also means the most recent input was
not a 1
◼ q1: has never seen 11 but the most recent input was a 1
◼ q2: has seen 11 at least once
Example #3
◼ Build a DFA for the following language:
L = { w | w is a binary string that has even
number of 1s and even number of 0s}
◼ ?
Non-deterministic Finite
Automata (NFA)
start ◼ A Non-deterministic Finite Automaton
(NFA)
1 a,b ◼ is of course “non-deterministic”
a ◼ Implying that the machine can exist in more
2 than one state at the same time
b ◼ Transitions could be non-deterministic
3 1 qj
qi … • Each transition function therefore
b maps to a set of states
1 qk
4
Non-deterministic Finite
Automata (NFA)
◼ A Non-deterministic Finite Automaton (NFA)
consists of:
◼ Q ==> a finite set of states
◼ ∑ ==> a finite set of input symbols (alphabet)
◼ q0 ==> a start state
◼ F ==> set of final states
◼ δ ==> a transition function, which is a mapping
between Q x ∑ ==> subset of Q
◼ An NFA is also defined by the 5-tuple:
◼ {Q, ∑ , q0,F, δ }
How to use an NFA?
◼ Input: a word w in ∑*
◼ Question: Is w acceptable by the NFA?
◼ Steps:
◼ Start at the “start state” q0
◼ For every input symbol in the sequence w do
◼ Determine all possible next states from all current states, given the
current input symbol in w and the transition function
◼ If after all symbols in w are consumed and if at least one of the
current states is a final state then accept w;
◼ Otherwise, reject w.
Regular expression: (0+1)*01(0+1)*
NFA for strings containing 01
Why is this non-deterministic?
• Q = {q0,q1,q2}
0,1 0,1 • = {0,1}
• start state = q0
start 0 1
q0 q1 q2 • F = {q2}
Final • Transition table
state symbols
0 1
What will happen if at state q1 q0 {q0,q1} {q0}
states
an input of 0 is received? q1 Φ {q2}
*q2 {q2} {q2}
ADVANTAGES & CAVEATS FOR NFA
Great for modeling regular expressions
• String processing - e.g., grep, lexical analyzer
Could a non-deterministic state machine be
implemented in practice?
• A parallel computer could exist in multiple “states” at the
same time
• Probabilistic models could be viewed as extensions of
non-deterministic state machines
(e.g., toss of a coin, a roll of dice)
DIFFERENCES: DFA VS. NFA
DFA NFA
1. All transitions are deterministic 1. Some transitions could be non-
• Each transition leads to exactly one
deterministic
state • A transition could lead to a subset of
2. For each state, transition on all states
possible symbols (alphabet) 2. Not all symbol transitions need to
should be defined be defined explicitly (if undefined
will go to a dead state – this is just
3. Accepts input if the last state is in a design convenience, not to be
F confused with “non-determinism”)
4. Sometimes harder to construct 3. Accepts input if one of the last
because of the number of states states is in F
5. Practical implementation is 4. Generally easier than a DFA to
feasible construct
5. Practical implementation has to be
deterministic (convert to DFA) or
in the form of parallelism
But, DFAs and NFAs are equivalent in their power to capture langauges !!
Examples For FSM
Show a finite state machine in either state graph or table
form for Strings containing an odd number of zeros.
Examples For FSM
Strings containing three consecutive ones
Examples For FSM
Show a finite state machine in either state graph or table form for
Strings containing exactly three zeros.
• Show a finite state machine in either state graph or table form for Strings
containing an odd number of zeros and an even number of ones.
Examples For FSM
RE: (a | b)*abb start
States: {1, 2, 3, 4}
Input symbols: {a, b} 1 a,b
Transition function:
a
(1,a) = {1,2}, (1,b) = {1} 2
(2,b) = {3}, (3,b) = {4}
b
Start state: 1
3
Final state: {4}
b
4
39
DFA Reg Ex
Theorem 1
DFA to RE construction
Informally, trace all distinct paths (traversing cycles only once)
from the start state to each of the final states
and enumerate all the expressions along the way
Example: 1 0 0,1
q0 0 q1 1 q2
(1*) 0 (0*) 1 (0 + 1)*
1* 00* 1 (0+1)*
Q) What is the language?
1*00*1(0+1)*
REGULAR EXPRESSIONS (REG EX) TO DFA
Let the regular expiration is p(p|q)*p
All strings of p’s and q’s starting and ending with p.
Draw a DFA that accepts the same set of strings generated by that
regular expression.
FINITE AUTOMATA - SOME APPLICATIONS
• Software for designing and checking the behavior of
digital circuits
• Lexical analyzer of a typical compiler
• Software for scanning large bodies of text (e.g., web
pages) for pattern finding
• Software for verifying systems of all types that have a
finite number of states (e.g., stock market transaction,
communication/network protocol)