0% found this document useful (0 votes)
43 views34 pages

Flat - Unit 2

The document covers the concepts of regular expressions and finite automata, detailing their definitions, operations, and applications in automata theory. It explains the conversion between finite automata and regular expressions, including the use of the pumping lemma and closure properties of regular languages. Additionally, it provides examples of constructing finite automata from specific regular expressions and discusses the equivalence of regular expressions and NFA-ε.

Uploaded by

loki36929
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views34 pages

Flat - Unit 2

The document covers the concepts of regular expressions and finite automata, detailing their definitions, operations, and applications in automata theory. It explains the conversion between finite automata and regular expressions, including the use of the pumping lemma and closure properties of regular languages. Additionally, it provides examples of constructing finite automata from specific regular expressions and discusses the equivalence of regular expressions and NFA-ε.

Uploaded by

loki36929
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 34

www.android.universityupdates.in | www.universityupdates.in | https://telegram.

me/jntuh

Formal Language Automata and Automata Theory MRITS

UNIT - II
CONTENTS

Regular Expressions:

Finite Automata and Regular Expressions

Applications of Regular Expressions

Algebraic Laws for Regular Expressions,\

Conversion of Finite Automata to Regular Expressions.

Pumping Lemma for Regular Languages,

Statement of the pumping lemma,

Applications of the Pumping Lemma.

Closure Properties of Regular Languages:

Closure properties of Regular languages,

Decision Properties of Regular Languages,

Equivalence and Minimization of Automata.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Regular Expressions:

Just as finite automata are used to recognize patterns of strings, regular expressions are
used to generate patterns of strings. A regular expression is an algebraic formula whose value
is a pattern consisting of a set of strings, called the language of the expression.

1.A Regular Language is used to specify a Language and it does so preciously


2. Regular expressions are very intuitive.
3.Regular expressions are very useful in a variety of contexts.
4.Given a regular expression, an NFA-ε can be constructed from it automatically.
5.Thus, so can an NFA, a DFA, and a corresponding program, all automatically!

Operands in a regular expression can be:

• characters from the alphabet over which the regular expression is defined.
• variables whose values are any pattern defined by a regular expression.
• epsilon which denotes the empty string containing no characters.
• null which denotes the empty set of strings.

• Let Σ be an alphabet. The regular expressions over Σ are:

– Ø Represents the empty set { }


– ε Represents the set {ε}
– a Represents the set {a}, for any symbol a in ΣOperators used in
regular expressions include:

• Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2)


is also a regular expression.

L(R1|R2) = L(R1) U L(R2).

• Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a
regular expression.

L(R1R2) = L(R1) concatenated with L(R2).

• Kleene closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a
regular expression.

L(R1*) = epsilon U L(R1) U L(R1R1) U L(R1R1R1) U ...

Closure has the highest precedence, followed by concatenation, followed by union.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Let r and s be regular expressions that represent the sets R and S, respectively.
– r+s Represents the set R U S (precedence 3)
– rs Represents the set RS (precedence 2)
– r * Represents the set R* (highest precedence)
– (r) Represents the set R (not an op, provides
precedence)

If r is a regular expression, then L(r) is used to denote the corresponding language.

Examples: Let Σ = {0, 1}


(0 + 1)* All strings of 0’s and 1’s
0(0 + 1)* All strings of 0’s and 1’s, beginning with a 0
(0 + 1)*1 All strings of 0’s and 1’s, ending with a 1
(0 + 1)*0(0 + 1)* All strings of 0’s and 1’s containing at least one 0
(0 + 1)*0(0 + 1)*0(0 + 1)* All strings of 0’s and 1’s containing at least two 0’s
(0 + 1)*01*01* All strings of 0’s and 1’s containing at least two 0’s
(1 + 01*0)* All strings of 0’s and 1’s containing an even number of 0’s
1*(01*01*)* All strings of 0’s and 1’s containing an even number of 0’s
(1*01*0)*1* All strings of 0’s and 1’s containing an even number of 0’s

Identities:

1. Øu = uØ = Ø Multiply by 0

2. εu = uε = u Multiply by 1

3. Ø* = ε

4. ε* = ε

5. u+v = v+u

6. u+Ø=u

6. u + u = u 8.
7. u* = (u*)*
9. u(v+w) = uv+uw

10. (u+v)w = uw+vw

11. (uv)*u = u(vu)


12. (u+v)* = (u*+v)*
= u*(u+v)*

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

= (u+vu*)*

= (u*v*)*

= u*(vu*)*

= (u*v)*u*

Finite Automata and Regular Expressions

If L=L(A) for some DFA, then there is a regular expression R such that L=L(R)
We are going to construct regular expressions from a DFA by eliminating states.
When we eliminate a state s, all the paths that went through s no longer exist in
the automaton.
If the language of the automaton is not to change, we must include, on an arc
that goes directly from q to p ,
the labels of paths that went from some state q to state p , through s.
The label of this arc can now involve strings, rather than single symbols (may be
an infinite number of strings).
We use a regular expression to represent all such strings.
Thus, we consider automata that have regular expressions as labels.

Conversion of Finite Automata to Regular Expressions.

Regular expressions and finite automata have equivalent expressive power:

• For every regular expression R, there is a corresponding FA that accepts the set of strings
generated by R.
• For every FA A there is a corresponding regular expression that generates the set of
strings accepted by A.

The proof is in two parts:

1. an algorithm that, given a regular expression R, produces an FA A such that L(A) ==


L(R).
2. an algorithm that, given an FA A, produces a regular expression R such that L(R) ==
L(A).

Our construction of FA from regular expressions will allow "epsilon transitions" (a transition
from one state to another with epsilon as the label). Such a transition is always possible, since
epsilon (or the empty string) can be said to exist between any two input symbols. We can show
that such epsilon transitions are a notational convenience; for every FA with epsilon transitions
there is a corresponding FA without them.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Designing Finite Automata from Regular Expression


In this article, we will see some popular regular expressions and how we can convert them to
finite automata.

• Even number of a’s : The regular expression for even number of a’s is (b|ab*ab*)*. We can
construct a finite automata as shown in Figure 1.

The above automata will accept all strings which have even number of a’s. For zero a’s, it
will be in q0 which is final state. For one ‘a’, it will go from q0 to q1 and the string will
not be accepted. For two a’s at any positions, it will go from q0 to q1 for 1st ‘a’ and q1 to
q0 for second ‘a’. So, it will accept all strings with even number of a’s.

• String with ‘ab’ as substring : The regular expression for strings with ‘ab’ as substring is
(a|b)*ab(a|b)*. We can construct finite automata as shown in Figure 2.

• The above automata will accept all string which have ‘ab’ as substring. The automata will
remain in initial state q0 for b’s. It will move to q1 after reading ‘a’ and remain in same
state for all ‘a’ afterwards. Then it will move to q2 if ‘b’ is read. That means, the string
has read ‘ab’ as substring if it reaches q2.
• String with count of ‘a’ divisible by 3 : The regular expression for strings with count of
a divisible by 3 is {a3n | n >= 0}. We can construct automata as shown in Figure 3.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

The above automata will accept all string of form a3n. The automata will remain in initial
state q0 for ɛ and it will be accepted. For string ‘aaa’, it will move from q0 to q1 then q1
to q2 and then q2 to q0. For every set of three a’s, it will come to q0, hence accepted.
Otherwise, it will be in q1 or q2, hence rejected.

Note : If we want to design a finite automata with number of a’s as 3n+1, same automata
can be used with final state as q1 instead of q0.
If we want to design a finite automata with language {akn | n >= 0}, k states are required.
We have used k = 3 in our example.

• Binary numbers divisible by 3 : The regular expression for binary numbers which are
divisible by three is (0|1(01*0)*1)*. The examples of binary number divisible by 3 are 0,
011, 110, 1001, 1100, 1111, 10010 etc. The DFA corresponding to binary number
divisible by 3 can be shown in Figure 4.

The above automata will accept all binary numbers divisible by 3. For 1001, the automata
will go from q0 to q1, then q1 to q2, then q2 to q1 and finally q2 to q0, hence accepted.
For 0111, the automata will go from q0 to q0, then q0 to q1, then q1 to q0 and finally q0
to q1, hence rejected.

• String with regular expression (111 + 11111)* : The string accepted using this regular
expression will have 3, 5, 6(111 twice), 8 (11111 once and 111 once), 9 (111 thrice), 10
(11111 twice) and all other counts of 1 afterwards. The DFA corresponding to given
regular expression is given in Figure 5.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Equivalence of Regular Expressions and NFA-ε

• Note: Throughout the following, keep in mind that a string is accepted by an NFA-ε if
there exists a path from the start state to a final state.
• Lemma 1: Let r be a regular expression. Then there exists an NFA-ε M such that L(M) =
L(r). Furthermore, M has exactly one final state with no transitions out of it.
• Proof: (by induction on the number of operators, denoted by OP(r), in r).

• Basis: OP(r) = 0

Then r is either Ø, ε, or a, for some symbol a in Σ

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

• Inductive Hypothesis: Suppose there exists a k 0 such that for any regular expression r
where 0 OP(r) k, there exists an NFA-ε such that L(M) = L(r). Furthermore, suppose that M
has exactly one final state.

• Inductive Step: Let r be a regular expression with k + 1 operators (OP(r) = k + 1), where
k + 1 >= 1.

Case 1) r = r1 + r2

Since OP(r) = k +1, it follows that 0<= OP(r1), OP(r2) <= k. By the inductive hypothesis there
exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and L(M2) = L(r2). Furthermore,
both M1 and M2 have exactly one final state.

Case 2) r = r1r2

Since OP(r) = k+1, it follows that 0<= OP(r1), OP(r2) <= k. By the inductive
hypothesis there exist NFA-ε machines M1 and M2 such that L(M1) = L(r1) and
L(M2) = L(r2). Furthermore, both M1 and M2 have exactly one final state.

Case 3) r = r1*

Since OP(r) = k+1, it follows that 0<= OP(r1) <= k. By the inductive hypothesis there exists an NFA-ε
machine M1 such that L(M1) = L(r1). Furthermore, M1 has exactly one final state.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Examples:

Problem: Construct FA equivalent to RE, r = 0(0+1)*

Solution: r = r1r2 r1 = 0

r2 = (0+1)*

r2 = r3* r3 = 0+1

r3 = r4 + r5 r4 = 0

r5 = 1

Transition graph:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Definitions Required to Convert a DFA to a Regular Expression

1.Let M = (Q, Σ, δ, q1, F) be a DFA with state set Q = {q1, q2, …, qn},
and define: Ri,j = { x | x is in Σ* and δ(qi,x) = qj
Ri,j is the set of all strings that define a path in M from qi to qj.

2. Note that states have been numbered starting at 1!

Observations;

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Lemma 2: Let M = (Q, Σ, δ, q1, F) be a DFA. Then there exists a regular


expression r such that L(M) = L(r).
Proof:
First we will show (by induction on k) that for all i,j, and k, where 1 i,j n And 0 k n,
that there exists a regular expression r such that L(r) = Rki,j .

Basis: k=0

R0i,j contains single symbols, one for each transition from qi to qj, and possibly ε if i=j.

Case 1) No transitions from qi to qj and i != j r0i,j = Ø

r0i,j = Ø

Case 2) At least one (m  1) transition from qi to qj and i != jCase 2) At least one (m  1)


transition from qi to qj and i != j

r0i,j = a1 + a2 + a3 + … + am where δ(qi, ap) = qj,

for all 1  p  m

Case 3) No transitions from qi to qj and i = j


r0i,j = ε

Case 4) At least one (m  1) transition from qi to qj and i = j


r0i,j = a1 + a2 + a3 + … + am + ε where δ(qi, ap) = qj

for all 1  p  m

Inductive Hypothesis:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Suppose that Rk-1i,j can be represented by the regular expression rk-1i,j for all
1  i,j  n, and some k1.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Applications of Regular Expressions

Applications:

• Regular expressions are useful in a wide variety of text processing tasks, and more

generally string processing, where the data need not be textual. Common applications

include data validation, data scraping (especially web scraping), data wrangling, simple

parsing, the production of syntax highlighting systems, and many other tasks.

• While regexps would be useful on Internet search engines, processing them across the

entire database could consume excessive computer resources depending on the complexity

and design of the regex.

Algebraic Laws for Regular Expressions


Definition :Two regular expressions with variables are equivalent if whatever
languageswe substitute for the variables, the results of the two expressions are the
same language.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Some laws for union and concatenation


L(M+N)=LM+LN (distributive)
(L+M)N=LN+MN (distributive)
Conversion of Finite Automata to Regular Expressions.

Pumping Lemma for Regular Languages:

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

In the theory of formal languages, the pumping lemma for regular languages is a
lemma that describes an essential property of all regular languages. Informally, it says that all
sufficiently long words in a regular language may be pumped—that is, have a middle section of
the word repeated an arbitrary number of times—to produce a new word that also lies within the
same language.

Specifically, the pumping lemma says that for any regular language L there exists a constant p
such that any word w in L with length at least p can be split into three substrings, w = xyz, where
the middle portion y must not be empty, such that the words xz, xyz, xyyz, xyyyz, … constructed
by repeating y zero or more times are still in L. This process of repetition is known as
"pumping". Moreover, the pumping lemma guarantees that the length of xy will be at most p,
imposing a limit on the ways in which w may be split. Finite languages vacuously satisfy the
pumping lemma by having p equal to the maximum string length in L plus one.

The pumping lemma is useful for disproving the regularity of a specific language in question. It
was first proven by Michael Rabin and Dana Scott in 1959,[1] and rediscovered shortly after by
Yehoshua Bar-Hillel, Micha A. Perles, and Eli Shamir in 1961, as a simplification of their
pumping lemma for context-free languages.

Statement of the pumping lemma

Let L be a regular language. Then there exists an integer p ≥ 1 depending only on L such that
every string w in L of length at least p (p is called the "pumping length"[4]) can be written as w =
xyz (i.e., w can be divided into three substrings), satisfying the following conditions:

1. |y| ≥ 1,
2. |xy| ≤ p, and
3. xynz ∈ L for all n ≥ 0.

y is the substring that can be pumped (removed or repeated any number of times, and the
resulting string is always in L). (1) means the loop y to be pumped must be of length at least one;
(2) means the loop must occur within the first p characters. |x| must be smaller than p (conclusion
of (1) and (2)), but apart from that, there is no restriction on x and z.

In simple words, for any regular language L, any sufficiently long word w (in L) can be split into
3 parts. i.e. w = xyz , such that all the strings xynz for n ≥ 0 are also in L.

Below is a formal expression of the Pumping Lemma

Use of the lemma

The pumping lemma is often used to prove that a particular language is non-regular: a proof by
contradiction (of the language's regularity) may consist of exhibiting a word (of the required
length) in the language that lacks the property outlined in the pumping lemma.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

For example, the language L = {anbn : n ≥ 0} over the alphabet Σ = {a, b} can be shown to be
non-regular as follows. Let w, x, y, z, p, and i be as used in the formal statement for the pumping
lemma above. Let w in L be given by w = apbp. By the pumping lemma, there must be some
decomposition w = xyz with |xy| ≤ p and |y| ≥ 1 such that xyiz in L for every i ≥ 0. Using |xy| ≤ p,
we know y only consists of instances of a. Moreover, because |y| ≥ 1, it contains at least one
instance of the letter a. We now pump y up: xy2z has more instances of the letter a than the letter
b, since we have added some instances of a without adding instances of b. Therefore, xy2z is not
in L. We have reached a contradiction. Therefore, the assumption that L is regular must be
incorrect. Hence L is not regular.

The proof that the language of balanced (i.e., properly nested) parentheses is not regular follows
the same idea. Given p, there is a string of balanced parentheses that begins with more than p left
parentheses, so that y will consist entirely of left parentheses. By repeating y, we can produce a
string that does not contain the same number of left and right parentheses, and so they cannot be
balanced.

Proof of the pumping lemma

Proof idea: Whenever a sufficiently long string xyz is recognized by a finite automaton, it must have
reached some state (qs=qt) twice. Hence, after repeating ("pumping") the middle part y arbitrarily often
(xyyz, xyyyz, ...) the word will still be recognized.

For every regular language there is a finite state automaton (FSA) that accepts the language. The
number of states in such an FSA are counted and that count is used as the pumping length p. For
a string of length at least p, let q0 be the start state and let q1, ..., qp be the sequence of the next p
states visited as the string is emitted. Because the FSA has only p states, within this sequence of
p + 1 visited states there must be at least one state that is repeated. Write qs for such a state. The
transitions that take the machine from the first encounter of state qs to the second encounter of
state qs match some string. This string is called y in the lemma, and since the machine will match
a string without the y portion, or with the string y repeated any number of times, the conditions of
the lemma are satisfied.

For example, the following image shows an FSA.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

The FSA accepts the string: abcd. Since this string has a length at least as large as the number of
states, which is four, the pigeonhole principle indicates that there must be at least one repeated
state among the start state and the next four visited states. In this example, only q1 is a repeated
state. Since the substring bc takes the machine through transitions that start at state q1 and end at
state q1, that portion could be repeated and the FSA would still accept, giving the string abcbcd.
Alternatively, the bc portion could be removed and the FSA would still accept giving the string
ad. In terms of the pumping lemma, the string abcd is broken into an x portion a, a y portion bc
and a z portion d.

General version of pumping lemma for regular languages

If a language L is regular, then there exists a number p ≥ 1 (the pumping length) such that every
string uwv in L with |w| ≥ p can be written in the form

uwv = uxyzv

with strings x, y and z such that |xy| ≤ p, |y| ≥ 1 and

uxyizv is in L for every integer i ≥ 0.[5]

From this, the above standard version follows a special case, with both u and v being the empty
string.

Since the general version imposes stricter requirements on the language, it can be used to prove
the non-regularity of many more languages, such as { ambncn : m≥1 and n≥1 }.[6]

Converse of lemma not true

While the pumping lemma states that all regular languages satisfy the conditions described
above, the converse of this statement is not true: a language that satisfies these conditions may
still be non-regular. In other words, both the original and the general version of the pumping
lemma give a necessary but not sufficient condition for a language to be regular.

For example, consider the following language L:

In other words, L contains all strings over the alphabet {0,1,2,3} with a substring of length 3
including a duplicate character, as well as all strings over this alphabet where precisely 1/7 of the
string's characters are 3's. This language is not regular but can still be "pumped" with p = 5.
Suppose some string s has length at least 5. Then, since the alphabet has only four characters, at
least two of the first five characters in the string must be duplicates. They are separated by at
most three characters.

• If the duplicate characters are separated by 0 characters, or 1, pump one of the other two
characters in the string, which will not affect the substring containing the duplicates.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

• If the duplicate characters are separated by 2 or 3 characters, pump 2 of the characters separating
them. Pumping either down or up results in the creation of a substring of size 3 that contains 2
duplicate characters.
• The second condition of L ensures that L is not regular: Consider the string . This string is in L

exactly when and thus L is not regular by the Myhill-Nerode theorem.

The Myhill-Nerode theorem provides a test that exactly characterizes regular languages. The
typical method for proving that a language is regular is to construct either a finite state machine
or a regular expression for the language.

Applications of the Pumping Lemma

Not all languages are regular. For example, the language L = {anbn : n ≥ 0} is not regular.
Similarly, the language {ap : p is a prime number} is not regular. A pertinent question therefore
is how do we know if a language is not regular.

Question: Can we conclude that a language is not regular if no one could come up with a DFA,
NFA, ε-NFA, regular expression or regular grammar so far?
- No. Since, someone may very well come up with any of these in future.

We need a property that just holds for regular languages and so we can prove that any language
without that property is not regular. Let's recall some of the properties.

• We have seen that a regular language can be expressed by a finite state automaton. Be it
deterministic or non-deterministic, the automaton consists of a finite set of states.

• Since the states are finite, if the automaton has no loop, the language would be finite.

- Any finite language is indeed a regular language since we can express the language
using the regular
expression: S1 + S2 + ... + SN, where N is the total number of strings accepted by the
automaton.

• However, if the automaton has a loop, it is capable of accepting infinite number of strings.

- Because, we can loop around any number of times and keep producing more and more
strings.
- This property is called the pumping property (elaborated below).

The pumping property of regular languages

Any finite automaton with a loop can be divided into parts three.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

• Part 1: The transitions it takes before the loop.


• Part 2: The transitions it takes during the loop.
• Part 3: The transitions it takes after the loop.

For example consider the following DFA. It accepts all strings that start with aba followed by
any number of baa's and finally ending with ba.

1. What strings are accepted by this DFA?


ababaaba, ababaabaaba, ababaabaabaaba, .... so on and so forth. Thus the strings accepted
by the above DFA can be divided into three parts: aba, (baa)i and ba. Here, i > 0.

Investigating this further, we can say that any string w accepted by this DFA can be written as w
= x yi z
where y represents the part that can be pumped again and again to generate more and more valid
strings. This is shown below for the given example.

Before we generalize further, let's investigate this example a little more.

2. What if the loop was at the beginning? Say a self-loop at q0 instead of at q2.
Then x = ε or |x| = 0. In such a special case, w = yz.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

3. What is the loop was at the end. Say a self loop at q6 instead of at q2.
Then z = ε or |z| = 0. In such a special case, w = xy.

4. Can y be equal to ε ever?


No. It is impossible. If y = ε, it implies there is no loop which implies the language is finite.
We have already seen that a finite language is always regular. So, we are now concerned only
with infinite regular language. Hence, y can never be ε. Or |y| > 0.

5. What is the shortest string that is accepted by the DFA?


ababa. Obviously, a string obtained without going through the loop.
There is a catch however. See the next question.

6. What is the shortest string accepted if there are more final states? Say q2 is final.
ab of length 2.

7. What is the longest string accepted by the DFA without going through the loop even once?
ababa (= xz). So, any string of length > 5 accepted by DFA must go through the loop at least
once.

8. What is the longest string accepted by the DFA by going through the loop exactly once?
ababaaba (= xyz) of length 8. We call this pumping length.

More precisely, pumping length is an integer p denoting the length of the string w such that w is
obtained by going through the loop exactly once. In other words, |w| = |xyz| = p.

9. Of what use is this pumping length p?


We can be sure that |xy| ≤ p. This can be used to prove a language non-regular.

Now, let's define a regular language based on the pumping property.

Pumping Lemma: If L is a regular language, then there exists a constant p such that every string
w ∈ L, of length p or more can be written as w = xyz, where

1. |y| > 0
2. |xy| ≤ p
3. xyiz ∈ L for all i

Proving languages non-regular


1. The language L = { anbn : n ≠ 0 } is not regular.

Before proving L is not regular using pumping property, let's see why we can't come up with a
DFA or regular expression for L.

L = { ε, ab, aabb, aaabbb, .... }

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

It may be tempting to use the regular expression a*b* to describe L. No doubt, a*b* generates
these strings. However, it is not appropriate since it generates other strings not in L such as a, b,
aa, ab, aaa, aab, abb, ...

Let's try to come up with a DFA. Since it has to accept ε, start state has to be final. The following
DFA can accept anbn for n ≤ 3. i.e. {ε, a, b, ab, aabb, aaabbb}

The basic problem is DFA does not have any memory. A transition just depends on the current
state. So it cannot keep count of how many a's it has seen. So, it has no way to match the number
of a's and b's. So, only way to accept all the strings of L is to keep adding newer and newer states
which makes automaton to infinite states since n is unbounded.

Now, let's prove that L does not have the pumping property.

Lets assume L is regular. Let p be the pumping length.

Consider a string w = aa....abb....b such that |w| = p.

⇒ w = ap/2bp/2

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

We know that w can be broken into three terms xyz such that y ≠ ε and xyiz ∈ L.

There are three cases to consider.

• Case 1: y is made up of only a's

Then xy2z has more a's than b's and does not belong to L.

• Case 2: y is made up of only b's

Then xy2z has more b's than a's and does not belong to L.

• Case 3: y is made up of a's and b's

Then xy2z has a's and b's out of order and does not belong to L.

Since none of the 3 cases hold, the pumping property does not hold for L. And therefore L is not
regular.

2. The language L = { uuR : u ∈ {a,b}*} is not regular.

Lets assume L is regular. Let p be the pumping length.

Consider a string w = apbbap.

|w| = 2p + 2 ≥ p

Since, xy ≤ p, xy will consist of only a's.

⇒ y is made of only a's

⇒ y2 is made of more number of a's than y since |y| > 0


(Let's say y 2 has m a's more than y where m > 1)

⇒ xy2z = ap+mbbap where m ≥ 1

⇒ xy2z = ap+mbbap cannot belong to L.

Therefore, pumping property does not hold for L. Hence, L is not regular.

3. The language L = { an : n is prime } is not regular.

Lets assume L is regular. Let p be the pumping length. Let q ≥ p be a prime number (since we
cannot assume that pumping length p will be prime).

Consider the string w = aa....a such that |w| = q ≥ p.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

We know that w can be broken into three terms xyz such that y ≠ ε and xyiz ∈ L

⇒ xyq+1z must belong to L


⇒ |xyq+1z| must be prime

|xyq+1z| = |xyzyq|
q
= |xyz| + |y |
= q + q.|y|
= q(1 + |y|) which is a composite number.

Therefore, xyq+1z cannot belong to L. Hence, L is not regular.

Exercises
Show that the following languages are not regular.

4. L = { anbm : n ≠ m }
5. L = { anbm : n > m }
6. L = { w : na(w) = nb(w) }
7. L = { ww : w ∈ {a,b}* }
8. L = { an2 : n > 0 }

IMPORTANT NOTE

Never use pumping lemma to prove a language regular. Pumping property is necessary but
not sufficient for regularity.

Closure Properties of Regular Languages:


For natural language that is regulated, see List of language regulators.

"Kleene's theorem" redirects here. For his theorems for recursive functions, see Kleene's recursion
theorem.

In theoretical computer science and formal language theory, a regular language (also called a
rational language[1][2]) is a formal language that can be expressed using a regular expression, in
the strict sense of the latter notion used in theoretical computer science (as opposed to many
regular expressions engines provided by modern programming languages, which are augmented
with features that allow recognition of languages that cannot be expressed by a classic regular
expression).

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Alternatively, a regular language can be defined as a language recognized by a finite automaton.


The equivalence of regular expressions and finite automata is known as Kleene's theorem[3]
(after American mathematician Stephen Cole Kleene). In the Chomsky hierarchy, regular
languages are defined to be the languages that are generated by Type-3 grammars (regular
grammars).

Regular languages are very useful in input parsing and programming language design

Closure properties of Regular languages


Formal definition

The collection of regular languages over an alphabet Σ is defined recursively as follows:

• The empty language Ø, and the empty string language {ε} are regular languages.
• For each a ∈ Σ (a belongs to Σ), the singleton language {a} is a regular language.
• If A and B are regular languages, then A ∪ B (union), A • B (concatenation), and A* (Kleene star)
are regular languages.
• No other languages over Σ are regular.

See regular expression for its syntax and semantics. Note that the above cases are in effect the
defining rules of regular expression.

Examples

All finite languages are regular; in particular the empty string language {ε} = Ø* is regular.
Other typical examples include the language consisting of all strings over the alphabet {a, b}
which contain an even number of as, or the language consisting of all strings of the form: several
as followed by several bs.

A simple example of a language that is not regular is the set of strings { anbn | n ≥ 0 }.[4]
Intuitively, it cannot be recognized with a finite automaton, since a finite automaton has finite
memory and it cannot remember the exact number of a's. Techniques to prove this fact
rigorously are given below.

Equivalent formalisms

A regular language satisfies the following equivalent properties:

1. it is the language of a regular expression (by the above definition)


2. it is the language accepted by a nondeterministic finite automaton (NFA)[note 1][note 2]
3. it is the language accepted by a deterministic finite automaton (DFA)[note 3][note 4]
4. it can be generated by a regular grammar[note 5][note 6]
5. it is the language accepted by an alternating finite automaton

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

6. it can be generated by a prefix grammar


7. it can be accepted by a read-only Turing machine
8. it can be defined in monadic second-order logic (Büchi-Elgot-Trakhtenbrot theorem[5])
9. it is recognized by some finite monoid M, meaning it is the preimage { w∈Σ* | f(w)∈S } of a
subset S of a finite monoid M under a monoid homomorphism f: Σ* → M from the free monoid on
its alphabet[note 7]
10. the number of equivalence classes of its "syntactic relation" ~ is finite[note 8][note 9] (this number
equals the number of states of the minimal deterministic finite automaton accepting L.)

Properties 9. and 10. are purely algebraic approaches to define regular languages; a similar set of
statements can be formulated for a monoid M⊂Σ*. In this case, equivalence over M leads to the
concept of a recognizable language.

Some authors use one of the above properties different from "1." as alternative definition of
regular languages.

Some of the equivalences above, particularly those among the first four formalisms, are called
Kleene's theorem in textbooks. Precisely which one (or which subset) is called such varies
between authors. One textbook calls the equivalence of regular expressions and NFAs ("1." and
"2." above) "Kleene's theorem".[6] Another textbook calls the equivalence of regular expressions
and DFAs ("1." and "3." above) "Kleene's theorem".[7] Two other textbooks first prove the
expressive equivalence of NFAs and DFAs ("2." and "3.") and then state "Kleene's theorem" as
the equivalence between regular expressions and finite automata (the latter said to describe
"recognizable languages").[2][8] A linguistically oriented text first equates regular grammars ("4."
above) with DFAs and NFAs, calls the languages generated by (any of) these "regular", after
which it introduces regular expressions which it terms to describe "rational languages", and
finally states "Kleene's theorem" as the coincidence of regular and rational languages.[9] Other
authors simply define "rational expression" and "regular expressions" as synonymous and do the
same with "rational languages" and "regular languages".[1][2]

Closure properties

The regular languages are closed under the various operations, that is, if the languages K and L
are regular, so is the result of the following operations:

• the set theoretic Boolean operations: union K ∪ L, intersection K ∩ L, and complement L, hence
also relative complement K-L.
• the regular operations: K ∪ L, concatenation K ∘ L, and Kleene star L*.
• the trio operations: string homomorphism, inverse string homomorphism, and intersection with
regular languages. As a consequence they are closed under arbitrary finite state transductions, like
quotient K / L with a regular language. Even more, regular languages are closed under quotients
with arbitrary languages: If L is regular then L/K is regular for any K.[citation needed]
• the reverse (or mirror image) LR.[citation needed]

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Decidability properties

Given two deterministic finite automata A and B, it is decidable whether they accept the same
language.[12] As a consequence, using the above closure properties, the following problems are
also decidable for arbitrarily given deterministic finite automata A and B, with accepted
languages LA and LB, respectively:

• Containment: is LA ⊆ LB ?[note 10]


• Disjointness: is LA ∩ LB = {} ?
• Emptiness: is LA = {} ?
• Universality: is LA = Σ* ?
• Membership: given a ∈ Σ*, is a ∈ LB ?

For regular expressions, the universality problem is NP-complete already for a singleton
alphabet.[13] For larger alphabets, that problem is PSPACE-complete. If regular expressions are
extended to allow also a squaring operator, with "A2" denoting the same as "AA", still just
regular languages can be described, but the universality problem has an exponential space lower
bound,[16][17][18] and is in fact complete for exponential space with respect to polynomial-time
reduction.[19]

Complexity results

In computational complexity theory, the complexity class of all regular languages is sometimes
referred to as REGULAR or REG and equals DSPACE(O(1)), the decision problems that can
be solved in constant space (the space used is independent of the input size). REGULAR ≠ AC0,
since it (trivially) contains the parity problem of determining whether the number of 1 bits in the
input is even or odd and this problem is not in AC0. On the other hand, REGULAR does not

contain AC0, because the nonregular language of palindromes, or the nonregular language
can both be recognized in AC0.

If a language is not regular, it requires a machine with at least Ω(log log n) space to recognize
(where n is the input size).[22] In other words, DSPACE(o(log log n)) equals the class of regular
languages. In practice, most nonregular problems are solved by machines taking at least
logarithmic space.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Location in the Chomsky hierarchy

Regular language in classes of Chomsky hierarchy.

To locate the regular languages in the Chomsky hierarchy, one notices that every regular
language is context-free. The converse is not true: for example the language consisting of all
strings having the same number of a's as b's is context-free but not regular. To prove that a
language such as this is not regular, one often uses the Myhill–Nerode theorem or the pumping
lemma among other methods.[23]

Important subclasses of regular languages include

• Finite languages - those containing only a finite number of words.[24] These are regular languages,
as one can create a regular expression that is the union of every word in the language.
• Star-free languages, those that can be described by a regular expression constructed from the
empty symbol, letters, concatenation and all boolean operators including complementation but
not the Kleene star: this class includes all finite languages.[25]
• Cyclic languages, satisfying the conditions uv ∈ L ⇔ vu ∈ L and w ∈ L ⇔ w n ∈ L. The number
of words in a regular language

Let denote the number of words of length in . The ordinary generating function for L is the
formal power series

The generating function of a language L is a rational function if L is regular. Hence for any
regular language there exist an integer constant , complex constants and complex polynomials
such that for every the number of words of length in is .Thus, non-regularity of certain languages
can be proved by counting the words of a given length in . Consider, for example, the Dyck
language of strings of balanced parentheses. The number of words of length in the Dyck language
is equal to the Catalan number , which is not of the form , witnessing the non-regularity of the
Dyck language. Care must be taken since some of the eigenvalues could have the same
magnitude. For example, the number of words of length in the language of all even binary words
is not of the form , but the number of words of even or odd length are of this form; the
corresponding eigenvalues are . In general, for every regular language there exists a constant such
that for all , the number of words of length is asymptotically .The zeta function of a language L is
The zeta function of a regular language is not in general rational, but that of a cyclic language is.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Generalizations

The notion of a regular language has been generalized to infinite words (see ω-automata) and to
trees (see tree automaton).

Rational set generalizes the notion (of regular/rational language) to monoids that are not
necessarily free. Likewise, the notion of a recognizable language (by a finite automaton) has
namesake as recognizable set over a monoid that is not necessarily free. Howard Straubing notes
in relation to these facts that “The term "regular language" is a bit unfortunate. Papers influenced
by Eilenberg's monograph[35] often use either the term "recognizable language", which refers to
the behavior of automata, or "rational language", which refers to important analogies between
regular expressions and rational power series. (In fact, Eilenberg defines rational and
recognizable subsets of arbitrary monoids; the two notions do not, in general, coincide.) This
terminology, while better motivated, never really caught on, and "regular language" is used
almost universally.”[36]

Rational series is another generalization, this time in the context of a formal power series over a
semiring. This approach gives rise to weighted rational expressions and weighted automata. In
this algebraic context, the regular languages (corresponding to Boolean-weighted rational
expressions) are usually called rational languages. Also in this context, Kleene's theorem finds a
generalization called the Kleene-Schützenberger theorem.

Decision Properties of Regular Languages


• Some of these questions we can answer already with what we know
– Is a given string in the language?
– Given 2 languages are there strings that are in both?
• Others require additional tools:
– Is the language the same as another regular language?
– Is there a language L that is not
regular?

Equivalence and Minimization of Automata


DFA Minimization using Myphill-Nerode Theorem
Algorithm

Input − DFA

Output − Minimized DFA

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Step 1 − Draw a table for all pairs of states (Qi, Qj) not necessarily connected directly [All are
unmarked initially]

Step 2 − Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F or vice versa and
mark them. [Here F is the set of final states]

Step 3 − Repeat this step until we cannot mark anymore states −

If there is an unmarked pair (Qi, Qj), mark it if the pair {δ (Qi, A), δ (Qi, A)} is marked for some
input alphabet.

Step 4 − Combine all the unmarked pair (Qi, Qj) and make them a single state in the reduced
DFA.

Example

Let us use Algorithm 2 to minimize the DFA shown below.

Step 1 − We draw a table for all pair of states.

abcdef

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Step 2 − We mark the state pairs.

a b c d e f

c✔✔

d✔✔

e✔✔

f ✔✔✔

Step 3 − We will try to mark the state pairs, with green colored check mark, transitively. If we
input 1 to state ‘a’ and ‘f’, it will go to state ‘c’ and ‘f’ respectively. (c, f) is already marked,
hence we will mark pair (a, f). Now, we input 1 to state ‘b’ and ‘f’; it will go to state ‘d’ and ‘f’
respectively. (d, f) is already marked, hence we will mark pair (b, f).

a b c d e f

c✔✔

d✔✔

e✔✔

f ✔✔✔✔✔

After step 3, we have got state combinations {a, b} {c, d} {c, e} {d, e} that are unmarked.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

We can recombine {c, d} {c, e} {d, e} into {c, d, e}

Hence we got two combined states as − {a, b} and {c, d, e}

So the final minimized DFA will contain three states {f}, {a, b} and {c, d, e}

DFA Minimization using Equivalence Theorem

If X and Y are two states in a DFA, we can combine these two states into {X, Y} if they are not
distinguishable. Two states are distinguishable, if there is at least one string S, such that one of δ
(X, S) and δ (Y, S) is accepting and another is not accepting. Hence, a DFA is minimal if and
only if all the states are distinguishable.

Algorithm 3

Step 1 − All the states Q are divided in two partitions − final states and non-final states and are
denoted by P0. All the states in a partition are 0th equivalent. Take a counter k and initialize it
with 0.

Step 2 − Increment k by 1. For each partition in Pk, divide the states in Pk into two partitions if
they are k-distinguishable. Two states within this partition X and Y are k-distinguishable if there
is an input S such that δ(X, S) and δ(Y, S) are (k-1)-distinguishable.

Step 3 − If Pk ≠ Pk-1, repeat Step 2, otherwise go to Step 4.

Step 4 − Combine kth equivalent sets and make them the new states of the reduced DFA.

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Example

Let us consider the following DFA −

q δ(q,0) δ(q,1)

a b c

b a d

c e f

d e f

e e f

f f f

Let us apply the above algorithm to the above DFA −

• P0 = {(c,d,e), (a,b,f)}
• P1 = {(c,d,e), (a,b),(f)}
• P2 = {(c,d,e), (a,b),(f)}

Hence, P1 = P2.

There are three states in the reduced DFA. The reduced DFA is as follows −

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh


www.android.universityupdates.in | www.universityupdates.in | https://telegram.me/jntuh

Formal Language Automata and Automata Theory MRITS

Q δ(q,0) δ(q,1)

(a, b) (a, b) (c,d,e)

(c,d,e) (c,d,e) (f)

(f) (f) (f)

www.android.previousquestionpapers.com | www.previousquestionpapers.com | https://telegram.me/jntuh

You might also like