Regular Expressions and
Languages
1
4/11/2020
Regular Expressions vs. Finite Automata
2
⚫ Offers a declarative way to express the pattern of any string
we want to accept
⚪ E.g., 01*+ 10*
⚫ Automata => more machine-like
< input: string , output: [accept/reject] >
⚫ Regular expressions => more program syntax-like
⚫ Unix environments heavily use regular expressions
⚪ E.g., bash shell, grep, vi & other editors
⚫ Lexical analyzers such as Lex or Flex
4/11/2020
Regular Expressions
3
⚫ Operators of Regular Expressions
⚪ Review of three operations on languages L and M:
Union --- L∪M = {x | x∈L or x∈M}
Example: If L={001, 10, 111} and M={ε, 001} then
L∪M={ε, 10, 001, 111 }
Concatenation --- LM = {xy | x∈L, y∈M}
Example: If L={001, 10, 111} and M={ε, 001} then
LM={001, 10, 111, 001001, 10001, 111001 }
4/11/2020
Kleene Closure (the * operator)
4
⚫ Kleene Closure of a given language L:
⚪ L 0 = { ε}
⚪ L1= {w | for some w ∈ L}
⚪ L2= { w1w2 | w1 ∈ L, w2 ∈ L (duplicates allowed)}
⚪ Li= { w1w2…wi | all w’s chosen are ∈ L (duplicates allowed)}
⚪ (Note: the choice of each wi is independent)
⚪ L* = Ui≥0 Li (arbitrary number of concatenations)
Example:
⚫ Let L = { 1, 00}
0
⚪ L = { ε}
⚪ L1= {1,00}
⚪ L2= {11,100,001,0000}
⚪ L3= {111,1100,1001,10000,000000,00001,00100,0011}
⚪ L* = L0 U L1 U L2 U …
4/11/2020
Regular Expressions
5
Building Regular Expressions
⚪ Recursive definition of a regular expression
(RE) E and the language , L(E):
Basis:
Constants ε and φ are RE’s, defining languages {ε} and φ,
respectively ⇒ L(ε) = {ε}, L(φ) = φ.
If a is a symbol, then a is an RE, defining the language {a}
⇒ L(a) = {a}. (note: a is of bold face)
4/11/2020
Regular Expressions
6
Building Regular Expressions
⚪ Induction: given two RE’s E and F, then
E + F is an RE such that L(E + F) = L(E) ∪ L(F)
(union)
EF is an RE such that L(EF) = L(E)L(F)
(concatenation)
E * is an RE such that L(E*) = (L(E))* (closure)
(E) is an RE such that L((E)) = L(E)
(parenthesization).
4/11/2020
Regular Expressions
7
⚫ Examples
⚪ RE F = 1 “expresses” the language L(1) = {1}.
*
⚪ RE E = 1
Language expressed by E ---
L = L(E) = L(1*) = (L(1))* = ({1})*
(closure of language)
= {ε, 1, 11, 111, 1111, …}
= {1n | n ≥ 0}
4/11/2020
Regular Expressions
8
⚫ Examples
*
⚪ RE G = 01
Language expressed by G ---
L = L(G) = L(01*) = L(0)L(1*) (concatenation)
= {0}{ε, 1, 11, 111, 1111, …}
= {0, 01, 011, 0111, …}
= {01n | n ≥ 0}
4/11/2020
Regular Expressions
9
⚫ Examples
*
⚪ RE H = 1 + 01
Language expressed by H ---
L = L(H) = L(1 + 01*) = L(1) U L(01*)
= {1} U {0, 01, 011, 0111, …}
= {1, 0, 01, 011, 0111, …}
= {1}U{01n | n ≥ 0}
4/11/2020
Regular Expressions
10
⚫ Examples
⚪ RE K = ε + a*
Language expressed by K ---
L = L(K) = L(ε + a*) = L(ε ) U L(a*)
= {ε} U {ε, a, aa, aaa, …}
= {ε, a, aa, aaa, …}
= L(a*)
That is, we have the following RE equalities:
ε + a* = a* = a* + ε
4/11/2020
Regular Expressions
11
⚫ Example
⚪ A RE defining a language of strings of alternating 0’s and 1’s
(including none) is one of the two below:
(01)* + (10)* + 0(10)* + 1(01)*
(0…1 1…0 0…0 1…1)
(ε + 1)(01)*(ε + 0)
4/11/2020
Regular Expressions
12
Precedence of RE operators
⚪ Precedence
*
Highest --- (closure)
Next--- . (concatenation) (left to right)
Last--- + (union) (left to right)
Use parentheses anywhere to resolve ambiguity
4/11/2020
Regular Expressions
13
Precedence of RE operators
⚪ Example
Three ways to interpret 01* + 1:
(0(1*)) + 1 by precedence above (= 01* + 1)
(01)* + 1 (another meaning)
0(1* + 1) (a third meaning)
4/11/2020
FA’s & RE’s
14
⚫ Important Theorems
⚪ Every language defined by a DFA is also defined by an RE.
⚪ Every language defined by an RE is also defined by an ε-NFA.
4/11/2020
FA’s & RE’s
15
ε-NFA NFA
RE DFA
4/11/2020
FA’s & RE’s
16
From DFA’s to RE’s
If L = L(A) for some DFA A, then there is an RE
R such that L = L(R).
Prove by constructing progressively string sets
defined by a certain RE form Rij(k) until the entire set
of acceptable strings (i.e., language L(A)) is
obtained.
Assume the states are {1, 2, ..., n} (1 is the start
state).
4/11/2020
FA’s & RE’s
17
⚫ Meaning of Rij(k) ---
⚪ Rij(k) is a regular expression
Language is set of strings w
w is the label of a path from state i to state j of DFA
A
The path has no intermediate state greater than k
4/11/2020
FA’s & RE’s
18
⚫ Meaning of Rij(k) ---
⚪ T construct Rij(k), we use induction, starting at k = 0
and stop at k = n (the largest state number).
Then, when k = n, i =1, and j specifies an accepting
state, then Rij(k) defines a set of strings accepted by
DFA A, with each string forming a path starting
from the start state to the accepting state.
4/11/2020
FA’s & RE’s
19
(k)
⚫ Meaning of R ---
ij
Basis:
⚪ when k = 0, since all state numbers ≥ 1, and so there is no
intermediate state in path i to j;
2 cases to consider:
(1) an arc (a transition) from i to j;
(2) a path from i to i itself.
4/11/2020
FA’s & RE’s
20
(k)
⚫ Meaning of R ---
ij
Basis (cont’d):
⚪ If i ≠ j, only (case 1) is possible:
no symbol for such a transition ⇒ Rij(0) = φ
one symbol a for the transition ⇒ Rij(0) = a
multiple symbls a1, a2, ..., am for the transition,
• ⇒ Rij(0) = a1 + a2 + ... + am
4/11/2020
FA’s & RE’s
21
(k)
⚫ Meaning of R ---
ij
Basis (cont’d) i ≠ j: qi qj
Rij(0) = φ
a
qi qj
Rij(0) = a
a1+…+am
(0)
qi qj
Rij = a1 + a2 + ... + am
4/11/2020
FA’s & RE’s
22
(k)
⚫ Meaning of R ---
ij
⚪ If i = j, only (case 2) is possible, which means there
exists at least a path ε from i to i itself, in addition to
the 3 cases:
no symbol for such a transition ⇒ Rij(0)= ε
one symbol a for the transition ⇒ Rij(0)= ε + a
multiple symbls a1, a2, ..., am for the transition,
• ⇒ Rij(0) = ε + a1 + a2 + ... + am
4/11/2020
FA’s & RE’s
(k) 23
⚫ Meaning of R ---
ij ε
Basis (cont’d) i = j: qi
Rij(0) = ε ε+a
Rij(0) = ε + a qi
Rij(0) = ε + a1 + a2 + ... + am ε+a1+…+am
qi
4/11/2020
FA’s & RE’s
24
R
Induction (to compute ij(k) ):
⚪ Suppose there is a path from i to j that goes through no state
numbered higher than k. Then, two cases should be
considered:
(1) the path does not go through k ⇒ Rij(k-1)
(2) the path goes through k at least once, then the path may be
broken into 3 pieces:
through i to k without passing k ⇒ Rik(k-1)
from k to k itself ⇒ (Rkk(k-1))* (recusive);
from k to j without passing k ⇒ Rkj(k-1).
4/11/2020
FA’s & RE’s
25
⚫ Illustration of paths represented by Rij(k) :
(Rkk(k-1))*
…… circulating zero
or more times
i … k … j
Rik (k-1)
Rkj(k-1)
4/11/2020
FA’s & RE’s
26
Induction (cont’d):
⚪ The three pieces are concatenated to be
Rik(k-1)(Rkk(k-1))*Rkj(k-1).
⚪ Combining (1) & (2), we get the RE defining “all the paths from
i to j that go through no state higher than k” as
Rij(k) = Rij(k-1) + Rik(k-1)(Rkk(k-1))*Rkj(k-1).
4/11/2020
FA’s & RE’s
27
⚫ Example
⚪ Convert the following DFA into an RE.
1 0, 1
0
start 1 2
⚪ Rij(0) may be constructed to be (details in the next page):
R11(0) ε+1
R12(0) 0
R21(0) φ
R22(0) (ε + 0 + 1)
4/11/2020
FA’s & RE’s
28
⚫ Example
1 0, 1
0
start 1 2
⚪ R11(0) = ε + 1 because δ(1, 1) = 1 & going back to itself
⚪ R12(0) = 0 because δ(1, 0) = 2 (going out to state 2)
⚪ R21(0) = φ because there is no path from state 2 to 1
⚪ R22(0) = (ε + 0 + 1) because δ(2, 0) = 2 & δ(2, 1) = 2 &
going back to itself
4/11/2020
FA’s & RE’s
29
⚫ Example (cont’d) 0, 1
1
0
start 1 2
⚪ We can then compute all Rij(k) for k=1 & k=2.
⚪ However, we may alternatively compute only necessary
terms of Rij(k) backward from the final states, to save time.
4/11/2020
FA’s & RE’s
30
⚫ Example (cont’d)
⚪ There is only one final state 2, so only have to compute
R12(2) = R12(1) + R12(1)(R22(1))*R22(1).
⚪ Only have to compute R12(1) and R22(1), without computing
R21(1) and R11(1).
⚪ To compute each of these terms, we need some RE equalities
to simplify intermediate results.
4/11/2020
FA’s & RE’s
31
⚫ Some equalities (R is an RE):
1. φR=Rφ=φ (φ=annihilator for concatenation)
2. φ + R = R + φ = R (φ=identity for union)
3. εR = Rε = R (ε = identity for concatenation)
4. (ε + a)* = a* = (a + ε)*
5. (ε + a)a* = (εa* + aa*) = a* + a+ = a*
6. a*(ε + a) = (a*ε + a*a) = a* + a+ = a*
(all provable by easy deduction)
4/11/2020
FA’s & RE’s
32
⚫ To compute
R12(2) = R12(1) + R12(1)(R22(1))*R22(1)
⚪ R12(1) = R12(0) + R11(0)(R11(0))*R12(0)
= 0 + (ε + 1)(ε + 1)*0 (by substitutions)
= 0 + (ε + 1)1*0 (by 4. in last slide )
= 0 + 1* 0 (by 5.)
= (ε + 1*)0 (by distributive law)
= 1*0 (by 4.)
4/11/2020
FA’s & RE’s
33
⚫ To compute
R12(2) = R12(1) + R12(1)(R22(1))*R22(1)
⚪ R22(1) = R22(0) + R21(0)(R11(0))*R12(0)
= (ε + 0 + 1) + φ(ε + 1)*0 (by substitutions)
= (ε + 0 + 1) + φ (by 1.)
=ε+0+1 (by 2.)
4/11/2020
FA’s & RE’s
34
⚫ To compute
R12(2) = R12(1) + R12(1)(R22(1))*R22(1)
⚪ Finally, R12(2)
= 1*0 +1*0(ε + 0 + 1)*(ε + 0 + 1) (by subst.)
= 1*0 +1*0(0 + 1)*(ε + 0 + 1) (by 4.)
= 1*0 +1*0(0 + 1)* (by 6.)
=1*0(ε + (0 + 1)*) (by distributive law)
= 1*0(0 + 1)* (by 4.)
4/11/2020
FA’s & RE’s
35
⚫ Check the correctness of the final result
R12(2) = 1*0(0 + 1)*
1 0, 1
0
start 1 2
It is a language that begins with zero or more 1’s
than have a zero and then any string of zero’s
and 1’s
correct (by looking at the diagram directly)!
4/11/2020
Acknowledgement
36
⚫ Tania Akter Setu
⚫ Lecturer, Dept. of CSE , UITS
4/11/2020