Flat - Unit 2
UNIT - II
Regular Expressions:
Just as finite automata are used to recognize patterns of strings, regular expressions are
used to generate patterns of strings. A regular expression is an algebraic formula whose value
is a pattern consisting of a set of strings, called the language of the expression. A regular expression may contain:
• characters from the alphabet over which the regular expression is defined.
• variables whose values are any pattern defined by a regular expression.
• epsilon which denotes the empty string containing no characters.
• null which denotes the empty set of strings.
New regular expressions are built from existing ones using the following operations:
• Union: If R1 and R2 are regular expressions, then R1 + R2 is also a regular expression.
• Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a regular expression.
• Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a regular expression.
Let r and s be regular expressions that represent the sets R and S, respectively.
– r + s represents the set R ∪ S (precedence 3, lowest)
– rs represents the set RS (precedence 2)
– r* represents the set R* (precedence 1, highest)
– (r) represents the set R (not an operator; parentheses only override precedence)
Identities (u, v and w denote arbitrary regular expressions):
1. Øu = uØ = Ø (multiply by 0)
2. εu = uε = u (multiply by 1)
3. Ø* = ε
4. ε* = ε
5. u + v = v + u
6. u + Ø = u
7. u + u = u
8. u* = (u*)*
9. u(v + w) = uv + uw
10. (u + v)* = (u + vu*)* = (u*v*)* = u*(vu*)* = (u*v)*u* (see the check below)
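Identity 10 can be spot-checked by brute force. The sketch below (Python's re module, where the formal union '+' is written '|') compares (a|b)* with (a*b*)* on every short string over {a, b}; it is an illustrative check under these assumptions, not a proof.

```python
import re
from itertools import product

# Identity 10 instantiated with u = a, v = b: (u+v)* versus (u*v*)*.
lhs = re.compile(r"(a|b)*")
rhs = re.compile(r"(a*b*)*")

for n in range(7):                      # all strings over {a,b} up to length 6
    for chars in product("ab", repeat=n):
        s = "".join(chars)
        assert bool(lhs.fullmatch(s)) == bool(rhs.fullmatch(s)), s
print("(a|b)* and (a*b*)* agree on all strings up to length 6")
```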
If L=L(A) for some DFA, then there is a regular expression R such that L=L(R)
We are going to construct regular expressions from a DFA by eliminating states.
When we eliminate a state s, all the paths that went through s no longer exist in
the automaton.
If the language of the automaton is not to change, we must include, on an arc that goes directly from q to p, the labels of all paths that went from state q to state p through s.
The label of this arc can now involve strings, rather than single symbols (may be
an infinite number of strings).
We use a regular expression to represent all such strings.
Thus, we consider automata that have regular expressions as labels.
• For every regular expression R, there is a corresponding FA that accepts the set of strings
generated by R.
• For every FA A there is a corresponding regular expression that generates the set of
strings accepted by A.
Our construction of FA from regular expressions will allow "epsilon transitions" (a transition
from one state to another with epsilon as the label). Such a transition is always possible, since
epsilon (or the empty string) can be said to exist between any two input symbols. We can show
that such epsilon transitions are a notational convenience; for every FA with epsilon transitions
there is a corresponding FA without them.
• Even number of a’s: The regular expression for strings with an even number of a’s is (b|ab*ab*)*. We can construct a finite automaton as shown in Figure 1.
The above automaton accepts all strings that have an even number of a’s. For zero a’s, it stays in q0, which is the final state. For one ‘a’, it moves from q0 to q1 and the string is not accepted. For two a’s at any positions, it moves from q0 to q1 on the first ‘a’ and from q1 back to q0 on the second ‘a’. So it accepts all strings with an even number of a’s.
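A direct simulation of the two-state machine of Figure 1 can serve as a quick check (a sketch in Python; the transition table is written out from the description above, with q0 as the final state):

```python
def accepts_even_as(s: str) -> bool:
    """Simulate the 2-state DFA of Figure 1: q0 = even number of a's seen (final)."""
    delta = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
             ('q1', 'a'): 'q0', ('q1', 'b'): 'q1'}
    state = 'q0'
    for ch in s:
        state = delta[(state, ch)]
    return state == 'q0'

assert accepts_even_as("") and accepts_even_as("baba")
assert not accepts_even_as("ab")
```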
• String with ‘ab’ as substring: The regular expression for strings with ‘ab’ as a substring is (a|b)*ab(a|b)*. We can construct a finite automaton as shown in Figure 2.
• The above automaton accepts all strings that have ‘ab’ as a substring. The automaton remains in the initial state q0 on b’s. It moves to q1 after reading ‘a’ and remains in that state for any further a’s. Then it moves to q2 when ‘b’ is read. That means the string contains ‘ab’ as a substring once it reaches q2.
• String with count of ‘a’ divisible by 3: The language is {a^(3n) | n ≥ 0}, described by the regular expression (aaa)*. We can construct the automaton shown in Figure 3.
The above automaton accepts all strings of the form a^(3n). The automaton remains in the initial state q0 for ε, which is accepted. For the string ‘aaa’, it moves from q0 to q1, then q1 to q2, and then q2 to q0. For every group of three a’s it returns to q0, hence the string is accepted. Otherwise, it ends in q1 or q2 and the string is rejected.
Note: If we want to design a finite automaton accepting strings whose number of a’s is 3n + 1, the same automaton can be used with q1 as the final state instead of q0.
If we want to design a finite automaton with the language {a^(kn) | n ≥ 0}, k states are required. We have used k = 3 in our example.
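The note above generalises directly: counting a's modulo k takes exactly k states. A parameterised sketch (the function name and the handling of the one-letter alphabet {a} are my own choices):

```python
def accepts_a_pow(s: str, k: int = 3, final_state: int = 0) -> bool:
    """k-state DFA over the alphabet {a}, tracking (number of a's) mod k.
    final_state = 0 accepts {a^(kn)}; final_state = 1 accepts {a^(kn+1)}, etc."""
    state = 0
    for ch in s:
        if ch != 'a':               # alphabet here is just {a}
            return False
        state = (state + 1) % k
    return state == final_state

assert accepts_a_pow("")                      # zero a's: accepted
assert accepts_a_pow("aaaaaa")                # six a's
assert not accepts_a_pow("aaaa")
assert accepts_a_pow("aaaa", final_state=1)   # the 3n+1 variant from the note
```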
• Binary numbers divisible by 3: The regular expression for binary numbers which are divisible by three is (0|1(01*0)*1)*. Examples of binary numbers divisible by 3 are 0, 011, 110, 1001, 1100, 1111, 10010, etc. The DFA corresponding to binary numbers divisible by 3 is shown in Figure 4.
The above automaton accepts all binary numbers divisible by 3. For 1001, the automaton goes from q0 to q1, then q1 to q2, then q2 to q1 and finally q1 to q0, hence it is accepted. For 0111, the automaton goes from q0 to q0, then q0 to q1, then q1 to q0 and finally q0 to q1, hence it is rejected.
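The same machine can be cross-checked against ordinary arithmetic: reading one more bit doubles the value read so far and adds the new bit, so the DFA only has to track the remainder mod 3. A sketch:

```python
def divisible_by_3(bits: str) -> bool:
    """3-state DFA: state r is the value of the bits read so far, modulo 3."""
    r = 0
    for b in bits:
        r = (2 * r + int(b)) % 3   # appending a bit doubles the value and adds the bit
    return r == 0

# Cross-check against integer arithmetic for all 8-bit values.
for n in range(256):
    assert divisible_by_3(format(n, 'b')) == (n % 3 == 0)
```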
• String with regular expression (111 + 11111)*: The strings accepted by this regular expression have 0, 3, 5, 6 (111 twice), 8 (11111 once and 111 once), 9 (111 thrice), 10 (11111 twice) 1's, and every larger count of 1's. The DFA corresponding to the given regular expression is given in Figure 5.
• Note: Throughout the following, keep in mind that a string w is accepted by an NFA-ε if there exists a path labeled w (possibly interspersed with ε-transitions) from the start state to a final state.
• Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) =
L(r). Furthermore, M has exactly one final state with no transitions out of it.
• Proof: (by induction on the number of operators, denoted by OP(r), in r).
• Basis: OP(r) = 0. Then r is ε, Ø, or a for some symbol a ∈ Σ, and in each case an NFA-ε with exactly one final state (and no transitions out of it) is immediate.
• Inductive Hypothesis: Suppose there exists a k ≥ 0 such that for any regular expression r with 0 ≤ OP(r) ≤ k, there exists an NFA-ε M such that L(M) = L(r). Furthermore, suppose that M has exactly one final state.
• Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where
k + 1 >= 1.
Case 1) r = r1 + r2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state. A new start state with ε-transitions to the start states of M1 and M2, together with ε-transitions from their final states to a new (single) final state, gives an NFA-ε accepting L(r1) ∪ L(r2).
Case 2) r = r1r2
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1), OP(r2) ≤ k. By the inductive hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state. Adding an ε-transition from the final state of M1 to the start state of M2 gives an NFA-ε accepting L(r1)L(r2), whose only final state is that of M2.
Case 3) r = r1*
Since OP(r) = k + 1, it follows that 0 ≤ OP(r1) ≤ k. By the inductive hypothesis there exists an NFA-ε machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state. A new start state and a new final state, with ε-transitions that allow either bypassing M1 or returning from its final state to its start state, give an NFA-ε accepting L(r1)*.
Examples:
Problem: construct an NFA-ε for r = 0(0+1)*.
Solution: r = r1r2, where r1 = 0 and r2 = (0+1)*
          r2 = r3*, where r3 = 0+1
          r3 = r4 + r5, where r4 = 0 and r5 = 1
Transition graph:
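The resulting transition graph can be encoded and simulated directly. The sketch below hand-builds an NFA-ε for 0(0+1)* in the spirit of the Lemma 1 construction (the state names s0–s7 and the slightly compressed gluing are my own) and runs it with an ε-closure-based simulation:

```python
# Hand-built NFA-ε for r = 0(0+1)*: one start state, one final state,
# ε-transitions glue the sub-machines together.  'eps' marks an ε-transition.
delta = {
    ('s0', '0'): {'s1'},            # machine for r1 = 0
    ('s1', 'eps'): {'s2'},          # concatenation r1 . r2
    ('s2', 'eps'): {'s3', 's7'},    # star entry: enter the loop or skip to final
    ('s3', 'eps'): {'s4a', 's4b'},  # union r4 + r5 = 0 + 1
    ('s4a', '0'): {'s5'},
    ('s4b', '1'): {'s5'},
    ('s5', 'eps'): {'s6'},
    ('s6', 'eps'): {'s3', 's7'},    # loop back or leave the star
}
start, final = 's0', 's7'

def eps_closure(states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, 'eps'), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(w):
    current = eps_closure({start})
    for ch in w:
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(moved)
    return final in current

assert accepts('0') and accepts('001') and accepts('0110')
assert not accepts('') and not accepts('10')
```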
1. Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn}, and define
   R_{i,j} = { x | x ∈ Σ* and δ(qi, x) = qj }.
R_{i,j} is the set of all strings that define a path in M from qi to qj. More generally, let R^k_{i,j} be the set of strings that take M from qi to qj without passing through any intermediate state numbered higher than qk.
Observations:
Basis: k = 0
R^0_{i,j} contains single symbols, one for each transition from qi to qj, and possibly ε if i = j.
r^0_{i,j} = Ø                         if R^0_{i,j} = Ø
r^0_{i,j} = a1 + a2 + … + am          if R^0_{i,j} = {a1, a2, …, am} and i ≠ j
r^0_{i,j} = a1 + a2 + … + am + ε      if R^0_{i,j} = {a1, a2, …, am} and i = j
(each ap, 1 ≤ p ≤ m, being a single symbol)
Inductive Hypothesis:
Suppose that R^{k-1}_{i,j} can be represented by the regular expression r^{k-1}_{i,j} for all 1 ≤ i, j ≤ n and some k ≥ 1.
Inductive Step: a path from qi to qj that uses no intermediate state numbered higher than qk either avoids qk altogether or can be split at its visits to qk. Hence
R^k_{i,j} = R^{k-1}_{i,j} ∪ R^{k-1}_{i,k} (R^{k-1}_{k,k})* R^{k-1}_{k,j}, and so r^k_{i,j} = r^{k-1}_{i,j} + r^{k-1}_{i,k} (r^{k-1}_{k,k})* r^{k-1}_{k,j}.
Finally, L(M) is represented by the sum (union) of r^n_{1,j} over all qj ∈ F.
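A sketch of this R^k_{i,j} construction (Kleene's algorithm) in Python, applied to the two-state DFA for "even number of a's" used earlier in this unit. Expressions are built as plain strings and no simplification is attempted, so the output is correct but verbose:

```python
def dfa_to_regex(n, delta, start, finals):
    """n states numbered 1..n, delta[(i, symbol)] = j, a start state, a set of finals."""
    EMPTY, EPS = None, 'ε'            # None stands for the empty set, 'ε' for epsilon

    def union(a, b):
        if a is None: return b
        if b is None: return a
        return a if a == b else "(" + a + "+" + b + ")"

    def concat(a, b):
        if a is None or b is None: return None
        if a == EPS: return b
        if b == EPS: return a
        return a + b

    def star(a):
        return EPS if a is None or a == EPS else "(" + a + ")*"

    # Basis k = 0: single symbols, plus epsilon on the diagonal.
    r = [[EMPTY] * (n + 1) for _ in range(n + 1)]
    for (i, sym), j in delta.items():
        r[i][j] = union(r[i][j], sym)
    for i in range(1, n + 1):
        r[i][i] = union(r[i][i], EPS)

    # Inductive step: allow q_k as an intermediate state, for k = 1..n.
    for k in range(1, n + 1):
        new = [[EMPTY] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = concat(concat(r[i][k], star(r[k][k])), r[k][j])
                new[i][j] = union(r[i][j], through_k)
        r = new

    # L(M) is the union of r^n_{start,f} over the final states f.
    result = None
    for f in finals:
        result = union(result, r[start][f])
    return result

# Two-state DFA over {a,b} accepting strings with an even number of a's.
delta = {(1, 'a'): 2, (1, 'b'): 1, (2, 'a'): 1, (2, 'b'): 2}
print(dfa_to_regex(2, delta, start=1, finals={1}))
```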
Applications:
• Regular expressions are useful in a wide variety of text processing tasks, and more
generally string processing, where the data need not be textual. Common applications
include data validation, data scraping (especially web scraping), data wrangling, simple
parsing, the production of syntax highlighting systems, and many other tasks.
• While regexps would be useful on Internet search engines, processing them across the entire database could consume excessive computer resources depending on the complexity and design of the regular expression.
In the theory of formal languages, the pumping lemma for regular languages is a
lemma that describes an essential property of all regular languages. Informally, it says that all
sufficiently long words in a regular language may be pumped—that is, have a middle section of
the word repeated an arbitrary number of times—to produce a new word that also lies within the
same language.
Specifically, the pumping lemma says that for any regular language L there exists a constant p
such that any word w in L with length at least p can be split into three substrings, w = xyz, where
the middle portion y must not be empty, such that the words xz, xyz, xyyz, xyyyz, … constructed
by repeating y zero or more times are still in L. This process of repetition is known as
"pumping". Moreover, the pumping lemma guarantees that the length of xy will be at most p,
imposing a limit on the ways in which w may be split. Finite languages vacuously satisfy the
pumping lemma by having p equal to the maximum string length in L plus one.
The pumping lemma is useful for disproving the regularity of a specific language in question. It
was first proven by Michael Rabin and Dana Scott in 1959,[1] and rediscovered shortly after by
Yehoshua Bar-Hillel, Micha A. Perles, and Eli Shamir in 1961, as a simplification of their
pumping lemma for context-free languages.
Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that
every string w in L of length at least p (p is called the "pumping length"[4]) can be written as w =
xyz (i.e., w can be divided into three substrings), satisfying the following conditions:
1. |y| ≥ 1,
2. |xy| ≤ p, and
3. xy^n z ∈ L for all n ≥ 0.
y is the substring that can be pumped (removed or repeated any number of times, and the
resulting string is always in L). (1) means the loop y to be pumped must be of length at least one;
(2) means the loop must occur within the first p characters. |x| must be smaller than p (conclusion
of (1) and (2)), but apart from that, there is no restriction on x and z.
In simple words, for any regular language L, any sufficiently long word w in L can be split into three parts, w = xyz, such that all the strings xy^n z for n ≥ 0 are also in L.
The pumping lemma is often used to prove that a particular language is non-regular: a proof by
contradiction (of the language's regularity) may consist of exhibiting a word (of the required
length) in the language that lacks the property outlined in the pumping lemma.
For example, the language L = {a^n b^n : n ≥ 0} over the alphabet Σ = {a, b} can be shown to be non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping lemma above. Let w in L be given by w = a^p b^p. By the pumping lemma, there must be some decomposition w = xyz with |xy| ≤ p and |y| ≥ 1 such that xy^i z is in L for every i ≥ 0. Using |xy| ≤ p, we know y only consists of instances of a. Moreover, because |y| ≥ 1, it contains at least one instance of the letter a. We now pump y up: xy^2 z has more instances of the letter a than the letter b, since we have added some instances of a without adding instances of b. Therefore, xy^2 z is not in L. We have reached a contradiction. Therefore, the assumption that L is regular must be incorrect. Hence L is not regular.
The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows
the same idea. Given p, there is a string of balanced parentheses that begins with more than p left
parentheses, so that y will consist entirely of left parentheses. By repeating y, we can produce a
string that does not contain the same number of left and right parentheses, and so they cannot be
balanced.
Proof idea: Whenever a sufficiently long string xyz is recognized by a finite automaton, it must have visited some state twice. Hence, after repeating ("pumping") the middle part y arbitrarily often (xyyz, xyyyz, ...) the word will still be recognized.
For every regular language there is a finite state automaton (FSA) that accepts the language. The
number of states in such an FSA are counted and that count is used as the pumping length p. For
a string of length at least p, let q0 be the start state and let q1, ..., qp be the sequence of the next p states visited as the string is read. Because the FSA has only p states, within this sequence of
p + 1 visited states there must be at least one state that is repeated. Write qs for such a state. The
transitions that take the machine from the first encounter of state qs to the second encounter of
state qs match some string. This string is called y in the lemma, and since the machine will match
a string without the y portion, or with the string y repeated any number of times, the conditions of
the lemma are satisfied.
The FSA accepts the string: abcd. Since this string has a length at least as large as the number of
states, which is four, the pigeonhole principle indicates that there must be at least one repeated
state among the start state and the next four visited states. In this example, only q1 is a repeated
state. Since the substring bc takes the machine through transitions that start at state q1 and end at
state q1, that portion could be repeated and the FSA would still accept, giving the string abcbcd.
Alternatively, the bc portion could be removed and the FSA would still accept giving the string
ad. In terms of the pumping lemma, the string abcd is broken into an x portion a, a y portion bc
and a z portion d.
If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every string uwv in L with |w| ≥ p can be written in the form
uwv = uxyzv
with |y| ≥ 1 and |xy| ≤ p, such that uxy^i zv is in L for every i ≥ 0.
From this, the above standard version follows as a special case, with both u and v being the empty string.
Since the general version imposes stricter requirements on the language, it can be used to prove the non-regularity of many more languages, such as { a^m b^n c^n : m ≥ 1 and n ≥ 1 }.[6]
While the pumping lemma states that all regular languages satisfy the conditions described
above, the converse of this statement is not true: a language that satisfies these conditions may
still be non-regular. In other words, both the original and the general version of the pumping
lemma give a necessary but not sufficient condition for a language to be regular.
For example, consider the following language L over the alphabet {0, 1, 2, 3}: L contains all strings with a substring of length 3 that includes a duplicate character, together with all strings in which precisely 1/7 of the characters are 3's. This language is not regular but can still be "pumped" with p = 5.
Suppose some string s has length at least 5. Then, since the alphabet has only four characters, at
least two of the first five characters in the string must be duplicates. They are separated by at
most three characters.
• If the duplicate characters are separated by 0 characters, or 1, pump one of the other two
characters in the string, which will not affect the substring containing the duplicates.
• If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters separating
them. Pumping either down or up results in the creation of a substring of size 3 that contains 2
duplicate characters.
• The second condition of L ensures that L is not regular: Consider the string . This string is in L
The Myhill-Nerode theorem provides a test that exactly characterizes regular languages. The
typical method for proving that a language is regular is to construct either a finite state machine
or a regular expression for the language.
Not all languages are regular. For example, the language L = {a^n b^n : n ≥ 0} is not regular. Similarly, the language {a^p : p is a prime number} is not regular. A pertinent question therefore
is how do we know if a language is not regular.
Question: Can we conclude that a language is not regular if no one could come up with a DFA,
NFA, ε-NFA, regular expression or regular grammar so far?
- No. Since, someone may very well come up with any of these in future.
We need a property that just holds for regular languages and so we can prove that any language
without that property is not regular. Let's recall some of the properties.
• We have seen that a regular language can be expressed by a finite state automaton. Be it
deterministic or non-deterministic, the automaton consists of a finite set of states.
• Since the states are finite, if the automaton has no loop, the language would be finite.
- Any finite language is indeed a regular language, since we can express the language using the regular expression S1 + S2 + ... + SN, where N is the total number of strings accepted by the automaton.
• However, if the automaton has a loop, it is capable of accepting infinite number of strings.
- Because, we can loop around any number of times and keep producing more and more
strings.
- This property is called the pumping property (elaborated below).
Any finite automaton with a loop can be divided into three parts.
For example consider the following DFA. It accepts all strings that start with aba followed by
any number of baa's and finally ending with ba.
Investigating this further, we can say that any string w accepted by this DFA can be written as w = x y^i z,
where y represents the part that can be pumped again and again to generate more and more valid
strings. This is shown below for the given example.
2. What if the loop was at the beginning? Say a self-loop at q0 instead of at q2.
Then x = ε or |x| = 0. In such a special case, w = yz.
3. What if the loop was at the end? Say a self-loop at q6 instead of at q2.
Then z = ε or |z| = 0. In such a special case, w = xy.
6. What is the shortest string accepted if there are more final states? Say q2 is final.
ab of length 2.
7. What is the longest string accepted by the DFA without going through the loop even once?
ababa (= xz). So, any string of length > 5 accepted by DFA must go through the loop at least
once.
8. What is the longest string accepted by the DFA by going through the loop exactly once?
ababaaba (= xyz) of length 8. We call this pumping length.
More precisely, pumping length is an integer p denoting the length of the string w such that w is
obtained by going through the loop exactly once. In other words, |w| = |xyz| = p.
Pumping Lemma: If L is a regular language, then there exists a constant p such that every string w ∈ L of length p or more can be written as w = xyz, where
1. |y| > 0
2. |xy| ≤ p
3. xy^i z ∈ L for all i ≥ 0
The decomposition x = aba, y = baa, z = ba of the example DFA above satisfies all three conditions (see the sketch below).
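A sketch checking the three conditions concretely on the example DFA, whose language is aba(baa)*ba, using Python's re module:

```python
import re

# Language of the example DFA above, with the decomposition from the discussion.
language = re.compile(r"aba(baa)*ba")
x, y, z = "aba", "baa", "ba"

# Condition 3: x y^i z stays in the language for every i >= 0,
# including i = 0 (pumping the loop out); here |xyz| = p = 8.
for i in range(6):
    assert language.fullmatch(x + y * i + z), i
```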
Before proving L is not regular using pumping property, let's see why we can't come up with a
DFA or regular expression for L.
It may be tempting to use the regular expression a*b* to describe L. No doubt, a*b* generates
these strings. However, it is not appropriate since it generates other strings not in L such as a, b,
aa, ab, aaa, aab, abb, ...
Let's try to come up with a DFA. Since it has to accept ε, the start state has to be final. The following DFA can accept a^n b^n for n ≤ 3, i.e. {ε, ab, aabb, aaabbb}.
The basic problem is that a DFA has no memory beyond its current state. A transition depends only on the current state, so the automaton cannot keep count of how many a's it has seen, and hence has no way to match the number of a's and b's. The only way to accept all the strings of L would be to keep adding newer and newer states, which would require infinitely many states since n is unbounded.
Now, let's prove that L does not have the pumping property.
Assume L is regular and let p be the pumping length. Choose w = a^(p/2) b^(p/2), so that |w| = p.
We know that w can be broken into three parts xyz such that y ≠ ε and xy^i z ∈ L for all i ≥ 0. There are three cases for where y can lie:
1. y consists only of a's. Then xy^2 z has more a's than b's and does not belong to L.
2. y consists only of b's. Then xy^2 z has more b's than a's and does not belong to L.
3. y contains both a's and b's. Then xy^2 z has a's and b's out of order and does not belong to L.
Since none of the three cases keeps xy^2 z in L, the pumping property does not hold for L, and therefore L is not regular.
(The same argument works with w = a^(p+1) b^(p+1), for which |w| = 2p + 2 ≥ p.) Therefore, the pumping property does not hold for L. Hence, L is not regular.
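A concrete numeric illustration of Case 1 (y lying entirely within the a's), with the membership test and the split chosen by hand:

```python
# Pumping y = "aa" out of w = a^5 b^5 produces a string with unequal letter counts.
def in_L(s):                      # membership test for L = { a^n b^n : n >= 0 }
    n = len(s) // 2
    return s == "a" * n + "b" * n

m = 5
w = "a" * m + "b" * m
x, y, z = "aa", "aa", "a" + "b" * m     # one possible split with |y| >= 1

assert in_L(w) and w == x + y + z
assert not in_L(x + y * 2 + z)    # xy^2z has 7 a's but only 5 b's -- not in L
```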
Let's assume L = { a^q : q is prime } is regular. Let p be the pumping length and let q ≥ p be a prime number (since we cannot assume that the pumping length p will itself be prime). Choose w = a^q, so |w| = q ≥ p.
We know that w can be broken into three parts xyz such that y ≠ ε and xy^i z ∈ L for all i ≥ 0. Now pump y exactly q + 1 times:
|xy^(q+1) z| = |xyz y^q| = |xyz| + |y^q| = q + q·|y| = q(1 + |y|), which is a composite number (both factors are at least 2).
So xy^(q+1) z ∉ L, a contradiction. Therefore, the pumping property does not hold for L. Hence, L is not regular.
Exercises
Show that the following languages are not regular.
4. L = { a^n b^m : n ≠ m }
5. L = { a^n b^m : n > m }
6. L = { w : na(w) = nb(w) }
7. L = { ww : w ∈ {a,b}* }
8. L = { a^(n²) : n > 0 }
IMPORTANT NOTE
Never use pumping lemma to prove a language regular. Pumping property is necessary but
not sufficient for regularity.
"Kleene's theorem" redirects here. For his theorems for recursive functions, see Kleene's recursion
theorem.
In theoretical computer science and formal language theory, a regular language (also called a
rational language[1][2]) is a formal language that can be expressed using a regular expression, in
the strict sense of the latter notion used in theoretical computer science (as opposed to many
regular expressions engines provided by modern programming languages, which are augmented
with features that allow recognition of languages that cannot be expressed by a classic regular
expression).
Regular languages are very useful in input parsing and programming language design. The collection of regular languages over an alphabet Σ is defined recursively as follows:
• The empty language Ø, and the empty string language {ε} are regular languages.
• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
• If A and B are regular languages, then A ∪ B (union), A • B (concatenation), and A* (Kleene star)
are regular languages.
• No other languages over Σ are regular.
See regular expression for its syntax and semantics. Note that the above cases are in effect the
defining rules of regular expression.
Examples
All finite languages are regular; in particular the empty string language {ε} = Ø* is regular.
Other typical examples include the language consisting of all strings over the alphabet {a, b} which contain an even number of a's, or the language consisting of all strings of the form: several a's followed by several b's.
A simple example of a language that is not regular is the set of strings { a^n b^n | n ≥ 0 }.[4]
Intuitively, it cannot be recognized with a finite automaton, since a finite automaton has finite
memory and it cannot remember the exact number of a's. Techniques to prove this fact
rigorously are given below.
Equivalent formalisms
A language is regular if and only if it can be described by (1.) a regular expression, (2.) a nondeterministic finite automaton (NFA), (3.) a deterministic finite automaton (DFA), or (4.) a regular grammar, among several other equivalent formalisms. Two further characterizations are purely algebraic approaches to defining regular languages; a similar set of statements can be formulated for a monoid M ⊆ Σ*. In this case, equivalence over M leads to the concept of a recognizable language.
Some authors use one of the above properties different from "1." as an alternative definition of regular languages.
Some of the equivalences above, particularly those among the first four formalisms, are called
Kleene's theorem in textbooks. Precisely which one (or which subset) is called such varies
between authors. One textbook calls the equivalence of regular expressions and NFAs ("1." and
"2." above) "Kleene's theorem".[6] Another textbook calls the equivalence of regular expressions
and DFAs ("1." and "3." above) "Kleene's theorem".[7] Two other textbooks first prove the
expressive equivalence of NFAs and DFAs ("2." and "3.") and then state "Kleene's theorem" as
the equivalence between regular expressions and finite automata (the latter said to describe
"recognizable languages").[2][8] A linguistically oriented text first equates regular grammars ("4."
above) with DFAs and NFAs, calls the languages generated by (any of) these "regular", after
which it introduces regular expressions which it terms to describe "rational languages", and
finally states "Kleene's theorem" as the coincidence of regular and rational languages.[9] Other
authors simply define "rational expression" and "regular expressions" as synonymous and do the
same with "rational languages" and "regular languages".[1][2]
Closure properties
The regular languages are closed under the various operations, that is, if the languages K and L
are regular, so is the result of the following operations:
• the set-theoretic Boolean operations: union K ∪ L, intersection K ∩ L, and the complement of L, hence also the relative complement K − L (a product-construction sketch for intersection follows this list).
• the regular operations: K ∪ L, concatenation K ∘ L, and Kleene star L*.
• the trio operations: string homomorphism, inverse string homomorphism, and intersection with
regular languages. As a consequence they are closed under arbitrary finite state transductions, like
quotient K / L with a regular language. Even more, regular languages are closed under quotients with arbitrary languages: if L is regular then L/K is regular for any K.
• the reverse (or mirror image) L^R.
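Closure under intersection, for instance, is witnessed by the product construction: run the two DFAs in parallel on pairs of states and accept when both accept. A sketch (the example DFAs for "even number of a's" and "ends in b" are my own):

```python
from itertools import product

def intersect_dfa(d1, d2):
    """Each DFA is (delta, start, finals) with delta[(state, symbol)] = state."""
    delta1, s1, f1 = d1
    delta2, s2, f2 = d2
    states1 = {q for (q, _) in delta1} | set(delta1.values())
    states2 = {q for (q, _) in delta2} | set(delta2.values())
    symbols = {a for (_, a) in delta1} & {a for (_, a) in delta2}
    delta = {}
    for q1, q2 in product(states1, states2):          # pairs of states
        for a in symbols:
            delta[((q1, q2), a)] = (delta1[(q1, a)], delta2[(q2, a)])
    finals = {(q1, q2) for q1 in f1 for q2 in f2}     # both components accept
    return delta, (s1, s2), finals

def run(dfa, w):
    delta, state, finals = dfa
    for ch in w:
        state = delta[(state, ch)]
    return state in finals

# K = even number of a's, L = strings ending in b; K ∩ L via the product DFA.
K = ({('e','a'):'o', ('e','b'):'e', ('o','a'):'e', ('o','b'):'o'}, 'e', {'e'})
L = ({('0','a'):'0', ('0','b'):'1', ('1','a'):'0', ('1','b'):'1'}, '0', {'1'})
both = intersect_dfa(K, L)
assert run(both, "aab") and not run(both, "ab") and not run(both, "aa")
```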
Decidability properties
Given two deterministic finite automata A and B, it is decidable whether they accept the same
language.[12] As a consequence, using the above closure properties, related problems such as containment (is LA ⊆ LB?) and disjointness (is LA ∩ LB empty?) are also decidable for arbitrarily given deterministic finite automata A and B, with accepted languages LA and LB, respectively.
For regular expressions, the universality problem is NP-complete already for a singleton
alphabet.[13] For larger alphabets, that problem is PSPACE-complete. If regular expressions are
extended to allow also a squaring operator, with "A2" denoting the same as "AA", still just
regular languages can be described, but the universality problem has an exponential space lower
bound,[16][17][18] and is in fact complete for exponential space with respect to polynomial-time
reduction.[19]
Complexity results
In computational complexity theory, the complexity class of all regular languages is sometimes
referred to as REGULAR or REG and equals DSPACE(O(1)), the decision problems that can
be solved in constant space (the space used is independent of the input size). REGULAR ≠ AC0,
since it (trivially) contains the parity problem of determining whether the number of 1 bits in the
input is even or odd, and this problem is not in AC0. On the other hand, REGULAR does not contain AC0, because the nonregular language of palindromes, or the nonregular language {0^n 1^n : n ≥ 0}, can both be recognized in AC0.
If a language is not regular, it requires a machine with at least Ω(log log n) space to recognize
(where n is the input size).[22] In other words, DSPACE(o(log log n)) equals the class of regular
languages. In practice, most nonregular problems are solved by machines taking at least
logarithmic space.
To locate the regular languages in the Chomsky hierarchy, one notices that every regular
language is context-free. The converse is not true: for example the language consisting of all
strings having the same number of a's as b's is context-free but not regular. To prove that a
language such as this is not regular, one often uses the Myhill–Nerode theorem or the pumping
lemma among other methods.[23]
• Finite languages - those containing only a finite number of words.[24] These are regular languages,
as one can create a regular expression that is the union of every word in the language.
• Star-free languages, those that can be described by a regular expression constructed from the
empty symbol, letters, concatenation and all boolean operators including complementation but
not the Kleene star: this class includes all finite languages.[25]
• Cyclic languages, satisfying the conditions uv ∈ L ⇔ vu ∈ L and w ∈ L ⇔ w^n ∈ L (for n ≥ 1).
The number of words in a regular language
Let s_L(n) denote the number of words of length n in L. The ordinary generating function for L is the formal power series
S_L(z) = Σ_{n ≥ 0} s_L(n) z^n.
The generating function of a language L is a rational function if L is regular. Hence for any regular language L there exist an integer constant n0, complex constants λ1, …, λk and complex polynomials p1(x), …, pk(x) such that for every n ≥ n0 the number of words of length n in L is
s_L(n) = p1(n) λ1^n + … + pk(n) λk^n.
Thus, non-regularity of certain languages can be proved by counting the words of a given length in them. Consider, for example, the Dyck language of strings of balanced parentheses. The number of words of length 2n in the Dyck language is equal to the Catalan number C_n ≈ 4^n / (n^(3/2) √π), which is not of the form p(n) λ^n, witnessing the non-regularity of the Dyck language.
Care must be taken since some of the eigenvalues λi could have the same magnitude. For example, the number of words of length n in the language of all binary words of even length is not of the form p(n) λ^n, but the numbers of words of even and of odd length separately are of this form; the corresponding eigenvalues are 2 and −2. In general, for every regular language there exists a constant k such that, within a fixed residue class of lengths modulo k, the number of words of length n is either eventually zero or asymptotically of the form C n^m λ^n.
The zeta function of a language L is
ζ_L(z) = exp( Σ_{n ≥ 1} s_L(n) z^n / n ).
The zeta function of a regular language is not in general rational, but that of a cyclic language is.
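The counts s_L(n) are easy to compute from a DFA by dynamic programming over the transition table (for a deterministic automaton, counting accepting paths of length n is the same as counting accepted words). The sketch below reproduces the even-length example above, where the counts alternate between 2^n and 0:

```python
def count_words(delta, start, finals, n):
    """Number of length-n words accepted by the DFA (delta, start, finals)."""
    states = {s for (s, _) in delta} | set(delta.values())
    counts = {q: 0 for q in states}
    counts[start] = 1                       # one empty path ending at the start state
    for _ in range(n):                      # extend every path by one symbol
        nxt = {q: 0 for q in states}
        for (q, a), r in delta.items():
            nxt[r] += counts[q]
        counts = nxt
    return sum(counts[q] for q in finals)

# 2-state DFA tracking the parity of the length of a binary word.
delta = {}
for b in '01':
    delta[('even', b)] = 'odd'
    delta[('odd', b)] = 'even'

for n in range(6):
    assert count_words(delta, 'even', {'even'}, n) == (2 ** n if n % 2 == 0 else 0)
```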
Generalizations
The notion of a regular language has been generalized to infinite words (see ω-automata) and to
trees (see tree automaton).
Rational set generalizes the notion (of regular/rational language) to monoids that are not
necessarily free. Likewise, the notion of a recognizable language (by a finite automaton) has as its namesake the recognizable sets over a monoid that is not necessarily free. Howard Straubing notes
in relation to these facts that “The term "regular language" is a bit unfortunate. Papers influenced
by Eilenberg's monograph[35] often use either the term "recognizable language", which refers to
the behavior of automata, or "rational language", which refers to important analogies between
regular expressions and rational power series. (In fact, Eilenberg defines rational and
recognizable subsets of arbitrary monoids; the two notions do not, in general, coincide.) This
terminology, while better motivated, never really caught on, and "regular language" is used
almost universally.”[36]
Rational series is another generalization, this time in the context of a formal power series over a
semiring. This approach gives rise to weighted rational expressions and weighted automata. In
this algebraic context, the regular languages (corresponding to Boolean-weighted rational
expressions) are usually called rational languages. Also in this context, Kleene's theorem finds a
generalization called the Kleene-Schützenberger theorem.
Input − DFA
Output − Minimized DFA
Step 1 − Draw a table for all pairs of states (Qi, Qj), not necessarily connected directly. [All are unmarked initially]
Step 2 − Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F or vice versa and mark them. [Here F is the set of final states]
Step 3 − If there is an unmarked pair (Qi, Qj), mark it if the pair {δ(Qi, A), δ(Qj, A)} is marked for some input symbol A. Repeat this step until no more pairs can be marked.
Step 4 − Combine all the unmarked pairs (Qi, Qj) and make them a single state in the reduced DFA.
Example
Consider the DFA over the states {a, b, c, d, e, f} (the same DFA whose transition table is given in the next example), with final states {c, d, e}. After Steps 1 and 2, the marked pairs (✔) are:

    a   b   c   d   e   f
 b
 c  ✔   ✔
 d  ✔   ✔
 e  ✔   ✔
 f          ✔   ✔   ✔
Step 3 − We now mark further state pairs transitively. If we apply input 1 to states ‘a’ and ‘f’, they go to states ‘c’ and ‘f’ respectively. (c, f) is already marked, hence we mark the pair (a, f). Now, applying input 1 to states ‘b’ and ‘f’, they go to states ‘d’ and ‘f’ respectively. (d, f) is already marked, hence we mark the pair (b, f).
    a   b   c   d   e   f
 b
 c  ✔   ✔
 d  ✔   ✔
 e  ✔   ✔
 f  ✔   ✔   ✔   ✔   ✔
After Step 3, the unmarked pairs are {a, b}, {c, d}, {c, e} and {d, e}. Combining them ({c, d}, {c, e} and {d, e} merge into a single state), the final minimized DFA contains the three states {f}, {a, b} and {c, d, e}.
If X and Y are two states in a DFA, we can combine these two states into {X, Y} if they are not
distinguishable. Two states are distinguishable, if there is at least one string S, such that one of δ
(X, S) and δ (Y, S) is accepting and another is not accepting. Hence, a DFA is minimal if and
only if all the states are distinguishable.
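A sketch of Steps 1–4 in Python, assuming the DFA is the one whose transition table appears in the next example, with {c, d, e} as the final states (as in the partition P0 below):

```python
from itertools import combinations

delta = {('a','0'):'b', ('a','1'):'c', ('b','0'):'a', ('b','1'):'d',
         ('c','0'):'e', ('c','1'):'f', ('d','0'):'e', ('d','1'):'f',
         ('e','0'):'e', ('e','1'):'f', ('f','0'):'f', ('f','1'):'f'}
finals = {'c', 'd', 'e'}                       # assumed, matching P0 below
states = sorted({q for (q, _) in delta})

# Step 2: mark every pair that has exactly one final state.
marked = {frozenset(p) for p in combinations(states, 2)
          if len(set(p) & finals) == 1}

# Step 3: keep marking pairs whose successors (on some input) form a marked pair.
changed = True
while changed:
    changed = False
    for p, q in combinations(states, 2):
        if frozenset((p, q)) in marked:
            continue
        for a in '01':
            if frozenset((delta[(p, a)], delta[(q, a)])) in marked:
                marked.add(frozenset((p, q)))
                changed = True
                break

# Step 4: the unmarked pairs glue states together.
unmarked = [set(p) for p in combinations(states, 2) if frozenset(p) not in marked]
print(unmarked)    # the pairs {a,b}, {c,d}, {c,e}, {d,e}, as in the example above
```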
Algorithm 3
Step 1 − All the states Q are divided in two partitions − final states and non-final states and are
denoted by P0. All the states in a partition are 0th equivalent. Take a counter k and initialize it
with 0.
Step 2 − Increment k by 1. For each set in Pk−1, divide its states into further partitions if they are k-distinguishable. Two states X and Y within a set are k-distinguishable if there is an input S such that δ(X, S) and δ(Y, S) are (k−1)-distinguishable.
Step 3 − If Pk ≠ Pk−1, repeat Step 2; otherwise stop.
Step 4 − Combine the kth-equivalent sets and make them the new states of the reduced DFA.
Example
q δ(q,0) δ(q,1)
a b c
b a d
c e f
d e f
e e f
f f f
• P0 = {(c,d,e), (a,b,f)}
• P1 = {(c,d,e), (a,b),(f)}
• P2 = {(c,d,e), (a,b),(f)}
Hence, P1 = P2.
There are three states in the reduced DFA. The reduced DFA is as follows −
Q           δ(Q, 0)      δ(Q, 1)
(a, b)      (a, b)       (c, d, e)
(c, d, e)   (c, d, e)    (f)
(f)         (f)          (f)
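The same partition can be reproduced by a direct sketch of Algorithm 3 (partition refinement), again assuming final states {c, d, e}:

```python
delta = {('a','0'):'b', ('a','1'):'c', ('b','0'):'a', ('b','1'):'d',
         ('c','0'):'e', ('c','1'):'f', ('d','0'):'e', ('d','1'):'f',
         ('e','0'):'e', ('e','1'):'f', ('f','0'):'f', ('f','1'):'f'}
states, finals = set('abcdef'), set('cde')     # finals assumed, as in P0

def block_of(q, partition):
    """Index of the block of the current partition that contains state q."""
    return next(i for i, b in enumerate(partition) if q in b)

partition = [finals, states - finals]          # P0: final vs non-final states
while True:
    refined = []
    for block in partition:
        groups = {}                            # split a block by successor behaviour
        for q in block:
            key = tuple(block_of(delta[(q, a)], partition) for a in '01')
            groups.setdefault(key, set()).add(q)
        refined.extend(groups.values())
    if len(refined) == len(partition):         # Pk == Pk-1: stop refining
        break
    partition = refined

print(partition)    # the blocks {c,d,e}, {a,b}, {f}, i.e. P1 = P2 above
```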