0% found this document useful (0 votes)
5 views33 pages

Finite Automata

The document discusses finite automata conversions and lexing, highlighting the equivalence of regular expressions, NFAs, and DFAs in expressing regular languages. It details algorithms for converting between these forms, including Thompson's construction for RE to NFA, subset construction for NFA to DFA, and Hopcroft's algorithm for DFA minimization. Additionally, it covers lexer generation techniques and handling keywords and whitespace in the context of lexical analysis.

Uploaded by

Bijay Nag
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views33 pages

Finite Automata

The document discusses finite automata conversions and lexing, highlighting the equivalence of regular expressions, NFAs, and DFAs in expressing regular languages. It details algorithms for converting between these forms, including Thompson's construction for RE to NFA, subset construction for NFA to DFA, and Hopcroft's algorithm for DFA minimization. Additionally, it covers lexer generation techniques and handling keywords and whitespace in the context of lexical analysis.

Uploaded by

Bijay Nag
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

CS 432

Fall 2018
Mike Lam, Professor

Finite Automata Conversions


and Lexing
Finite Automata

Key result: all of the following have the same expressive
power (i.e., they all describe regular languages):
– Regular expressions (REs)
– Non-deterministic finite automata (NFAs)
– Deterministic finite automata (DFAs)

Proof by construction
– An algorithm exists to convert any RE to an NFA
– An algorithm exists to convert any NFA to a DFA
– An algorithm exists to convert any DFA to an RE
– For every regular language, there exists a minimal DFA

Has the fewest number of states of all DFAs equivalent to RE
Finite Automata

Finite automata transitions:

Kleene's construction

Hopcroft's
algorithm
(minimize)

Regex NFA DFA Lexer

Thompson's Subset Lexer


construction construction generators

Brzozowski's algorithm
(direct to minimal DFA)

(dashed lines indicate transitions to a minimized DFA)


Finite Automata Conversions

RE to NFA: Thompson's construction
– Core insight: inductively build up NFA using “templates”
– Core concept: use null transitions to build NFA quickly

NFA to DFA: Subset construction
– Core insight: DFA nodes represent subsets of NFA nodes
– Core concept: use null closure to calculate subsets

DFA minimization: Hopcroft’s algorithm
– Core insight: create partitions, then keep splitting

DFA to RE: Kleene's construction
– Core insight: repeatedly eliminate states by combining regexes
Thompson's Construction

Basic idea: create NFA inductively, bottom-up
– Base case:

Start with individual alphabet symbols (see below)
– Inductive case:

Combine by adding new states and null/epsilon transitions

Templates for the three basic operations
– Invariant:

The NFA always has exactly one start state and one accepting state

a
Thompson's: Concatenation
A B
Thompson's: Concatenation
AB
Thompson's: Union

B
Thompson's: Union

A|B
Thompson's: Closure

A
Thompson's: Closure
A*
ε
Thompson's Construction
Base case

Concatenation

Union

Closure
ε
Subset construction

Basic idea: create DFA incrementally
– Each DFA state represents a subset of NFA states
– Use null closure operation to “collapse” null/epsilon transitions
– Null closure: all states reachable via epsilon transitions

Essentially: where can we go “for free?”

Formally: ε-closure(s) = {s} ∪ { t ∈ S | (s,ε→t) ∈ δ }
– Simulates running all possible paths through the NFA

Null closure of A = { A }
Null closure of B = { B, D }
Null closure of C =
Null closure of D =
Subset construction

Basic idea: create DFA incrementally
– Each DFA state represents a subset of NFA states
– Use null closure operation to “collapse” null/epsilon transitions
– Null closure: all states reachable via epsilon transitions

Essentially: where can we go “for free?”

Formally: ε-closure(s) = {s} ∪ { t ∈ S | (s,ε→t) ∈ δ }
– Simulates running all possible paths through the NFA

Null closure of A = { A }
Null closure of B = { B, D }
Null closure of C = { C, D }
Null closure of D = { D }
Subset construction

Basic idea: create DFA incrementally
– Each DFA state represents a subset of NFA states
– Use null closure operation to “collapse” null/epsilon transitions
– Null closure: all states reachable via epsilon transitions

Essentially: where can we go “for free?”

Formally: ε-closure(s) = {s} ∪ { t ∈ S | (s,ε→t) ∈ δ }
– Simulates running all possible paths through the NFA

Null closure of A = { A }
Null closure of B = { B, D }
Null closure of C = { C, D }
Null closure of D = { D }
Formal Algorithm
SubsetConstruction(S, Σ, s0, SA, δ):
t0 := ε-closure(s0)
S' := { t0 } S'A := ∅ W := { t0 }
while W ≠ ∅:
choose u in W and remove it from W
for each c in Σ:
t := ε-closure(δ(u,c))
δ'(u,c) = t
if t is not in S' then
add t to S’ and W
add t to S'A if any state in t is also in SA
return (S', Σ, t0, S'A, δ')
Subset Example
Subset Example
Subset Example

{B,D}
a

{A}

b
{C,D}
Subset Example
SubsetConstruction(S, Σ, s0, SA, δ):
t0 := ε-closure(s0)
S' := { t0 } S'A := ∅ W := { t0 }
while W ≠ ∅:
choose u in W and remove it from W
for each c in Σ:
t := ε-closure(δ(u,c))
δ'(u,c) = t
if t is not in S' then
add t to S’ and W
add t to S'A if there exists a state v in t that is also in SA
return (S', Σ, t0, S'A, δ')
Subset Example

{B,D,E} a
a a
{A,E}
b {E}
b
b
{C,D}
Algorithms

Subset construction is a fixed-point algorithm
– Textbook: “Iterated application of a monotone function”
– Basically: A loop that is mathematically guaranteed to
terminate at some point
– When it terminates, some desirable property holds

In the case of subset construction: the NFA has been
converted to a DFA!
Hopcroft’s DFA Minimization

Split into two partitions (final & non-final)

Keep splitting a partition while there are states with differing behaviors
– Two states transition to differing partitions on the same symbol
– Or one state transitions on a symbol and another doesn’t

When done, each partition becomes a single state

{B,D}
a

{A} Same behavior; collapse!

b
{C,D}
a,b {B,C,D}

{A}
Hopcroft’s DFA Minimization

Split into two partitions (final & non-final)

Keep splitting a partition while there are states with differing behaviors
– Two states transition to differing partitions on the same symbol
– Or one state transitions on a symbol and another doesn’t

When done, each partition becomes a single state

{B,D}
a
a
Differing behavior on
{A} ‘a’; split partition! {B,D}
a
b
{C,D} {A}

b
{C,D}
Kleene's Construction

Replace edge labels with REs
– "a" → "a" and "a,b" → "a|b"

Eliminate states by combining REs
– See pattern below; apply pairwise around each state to be eliminated
– Repeat until only one or two states remain

Build final RE
– One state with "A" self-loop → "A*"
– Two states: see pattern below

B A C
Eliminating A C Combining final B
states: two states:

D D

AB*C|D A*B(C|DA*B)*
Brzozowski’s Algorithm

Direct NFA → minimal DFA conversion

Sub-procedures:
– Reverse(n): invert all transitions in NFA n, adding a new start
state connected to all old final states
– Subset(n): apply subset construction to NFA n
– Reach(n): remove any part of NFA n unreachable from start state

Apply them all in order three times to get minimal DFA
– First time eliminates duplicate suffixes
– Second time eliminates duplicate prefixes
– MinDFA(n) = Reach(Subset(Reverse(Reach(Subset(Reverse(n))))))
– Potentially easier to code than Hopcroft’s algorithm
Brzozowski’s Algorithm

MinDFA(n) = Reach(Subset(Reverse(Reach(Subset(Reverse(n))))))

Example from
EAC (p.76)
NFA/DFA complexity

What are the time and space requirements to...
– Build an NFA?
– Run an NFA?
– Build a DFA?
– Run a DFA?

aa*|b {B,D}
a
{A}
ε
b
{C,D}
NFA/DFA complexity

Thompson's construction
– At most two new states and four transitions per regex character
– Thus, a linear space increase with respect to the # of regex characters
– Constant # of operations per increase means linear time as well

NFA execution
– Proportional to both NFA size and input string size
– Must track multiple simultaneous “current” states

Subset construction
– Potential exponential state space explosion
– A n-state NFA could require up to 2n DFA states
– However, this rarely happens in practice

DFAs execution
– Proportional to input string size only (only track a single “current” state)
NFA/DFA complexity

NFAs build quicker (linear) but run slower
– Better if you will only run the FA a few times
– Or if you need features that are difficult to implement with DFAs

DFAs build slower but run faster (linear)
– Better if you will run the FA many times

NFA DFA
Build time O(m) O(2m)
Run time O(m×n) O(n)

m = length of regular expression


n = length of input string
Lexers

Auto-generated
– Table-driven: generic scanner, auto-generated tables
– Direct-coded: hard-code transitions using jumps
– Common tools: lex/flex and similar

Hand-coded
– Better I/O performance (i.e., buffering)
– More efficient interfacing w/ other phases
– This is what we’ll do for P2
Handling Keywords

Issue: keywords are valid identifiers

Option 1: Embed into NFA/DFA
– Separate regex for keywords
– Easier/faster for generated scanners

Option 2: Use lookup table
– Scan as identifier then check for a keyword
– Easier for hand-coded scanners
– (Thus, this is probably easier for P2)
Handling Whitespace

Issue: whitespace is usually ignored
– Write a regex and remove it before each new token

Side effect: some results are counterintuitive
– Is this a valid token? “3abc”
– For now, it’s actually two!
– We’ll reject them later, in the parsing phase

You might also like