Theory of Computation
Unit-3 Context Free Grammar
PPT
Prepared By
Archana Kadam
Assistant Professor, 1
Unit-3 Context-Free Grammar
Introduction, Regular Grammar, Context Free Grammar- Definition,
Derivation. Sentential form, parse tree, Ambiguous Grammar,
Simplification of CFG: Eliminating unit productions, useless production,
useless symbols, Greibach normal form, Chomsky normal form. Types of
Grammar: Chomsky Hierarchy, Context Free Language (CFL): Closure
properties of CFL.
Case Study- CFG for Parenthesis Match- XML and Document Type
Definitions, Natural Language Processing- Text Parsing
Definition
■ A context-free grammar (CFG) G is a
quadruple (V, T, P, S) where
■ V: a set of non-terminal symbols
■ T: a set of terminals (V ∩ T = Ǿ)
■ P: a set of rules (P: V → (V U T)*)
■ S: a start symbol.
Example
■ V = {q, f,}
■ T = {0, 1}
■ P = {q → 11q, q → 00f,
f → 11f, f → ε }
■ S=q
■ (R= {q → 11q | 00f, f → 11f | ε })
How do we use rules?
■ If A → B, then xAy xBy and we say that
xAy derivates xBy.
■ If s ··· t, then we write s * t.
■ A string x in Σ* is generated by G=(V,Σ,R,S)
if S * x.
■ L(G) = { x in Σ* | S * x}.
Example
■ G = ({S}, {0,1}. {S → 0S1 | ε }, S)
■ ε in L(G) because S -> ε .
■ 01 in L(G) because S -> 0S1-> 01.
■ 0011 in L(G) because
S -> 0S1 -> 00S11 -> 0011.
■ 0 1 in L(G) because S -> * 0 1 .
■ L(G) = {0 1 | n > 0}
Context-Free Language (CFL)
■ A language L is context-free if there
exists a CFG G such that L = L(G).
■ Example are palindrome
■ L(G) is
■ S-> aSa | bSb | a | b
Corollary
■ Every regular language is a CFL.
■ The class of regular languages is a
proper subclass of CFLs.
CFL
Regula
r
Context Free Grammar
■ Construct the CFG for L ={ an bn | n>=1}
■ L = {ab, aabb, aaabbb,…..}
P : S-> aSb | ab
Construct the CFG for L of palindrome
strings. S - > aSa | bSb | Ɛ
9
Derivations
■ There are three ways to derive a string
from grammar
1. Leftmost Derivation : always deriving leftmost
non-terminal first
2. Rightmost Derivation : always deriving
rightmost non-terminal first
3. Derivation Tree / Parse Tree
10
Derivations Examples
■ Consider following CFG , G= {(S,A), (a,b), P, S}
■ Where P: S-> aAS | a
■ A-> SbA | SS | ba
■ Derive string aabbaa using leftmost and rightmost derivation
■ First number the production rules
■ 1. S -> aAS Left most derivation is
■ 2. S -> a S-> aAS…………Rule 1
-> aSbAS……...Rule 3
■ 3. A -> SbA ->aabAS……….Rule 2
■ 4. A -> SS ->a a b b a S….Rule 5
-> a a b b a a…...Rule 2
■ 5. A -> ba
11
Derivations Examples
■ Consider following CFG , G= {(S,A), (a,b), P, S}
■ Where P: S-> aAS | a
■ A-> SbA | SS | ba
■ Derive string aabbaa using leftmost and rightmost derivation
■ First number the production rules
■ 1. S -> aAS Right most derivation is
■ 2. S -> a S-> aAS…………Rule 1
-> aAa….……...Rule 2
■ 3. A -> SbA ->aSbAa……….Rule 3
■ 4. A -> SS ->a S b b a a….Rule 5
-> a a b b a a…...Rule 2
■ 5. A -> ba
12
Derivations Examples
■ Derive string aabbaa using derivation tree
■ First number the production rules
■ 1. S -> aAS
■ 2. S -> a Derivation Tree
S
■ 3. A -> SbA
■ 4. A -> SS a
A
S
■ 5. A -> ba
A a
S
b
b b a
13
Ambiguous Context Free Grammar
■ A CFG for a language is said to be
ambiguous, if there exists at least one
string which can be generated more
than one ways.
■ There exists more than one leftmost
derivation, rightmost derivation and
more than one derivation tree.
14
Ambiguous Context Free Grammar
■ Example :
■ E -> E + E | E * E | id
Now to derive a string id + id * id there are two leftmost
derivation given below
E => E + E E => E * E
=> id + E => E + E * E
=> id + E * E => id + E * E
=> id + id * E => id + id * E
=> id + id * id => id + id * id
So this grammar is ambiguous
15
Ambiguous Context Free Grammar
■ Example :
Check the ambiguity in following grammar
S -> iCtS | iCtSeS | a
C -> b
Consider string ‘ibtibtaea’
16
Ambiguous Context Free Grammar
S -> iCtS | iCtSeS | a
C -> b
Consider string ‘ibtibtaea’
S -> iCtS S -> iCtSeS
-> ibtS -> ibtSeS
-> ibtiCtSeS -> ibtiCtSeS
-> ibtibtaea ->ibtibtaea
So this grammar is ambiguous
17
Ambiguous Context Free Grammar
Check whether the grammar is ambiguous or not.
A->AA
A->(A)
A->a
For the string "a(a)(a)a"
18
As two derivation tree Grammar is Ambiguous
19
Removal of Ambiguity
Example :
E -> E + E | E * E | id
Postpone higher priority operators from start symbol
Use other Non terminals to take care of it
E-> E + T | T
T->T*F|F
F -> id
20
CFG Simplification
21
Three ways to simplify/clean a CFG
(clean)
1. Eliminate useless symbols
(simplify)
2. Eliminate ε-productions A => ε
3. Eliminate unit productions A => B
22
Eliminating useless symbols
A symbol X is reachable if there exists:
■ S 🡺* α X β
A symbol X is generating if there exists:
■ X 🡺* w,
■ for some w ∈ T*
For a symbol X to be “useful”, it has to be both
reachable and generating
■ S 🡺* α X β 🡺* w’, for some w’ ∈ T*
reachable generating
23
Algorithm to detect useless
symbols
1. First, eliminate all symbols that are not
generating
2. Next, eliminate all symbols that are not
reachable
24
Example: Useless symbols
■ S🡺AB | a
■ A🡺 b
1. A, S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminating B… (i.e., remove all productions that involve B)
1. S🡺 a
2. A🡺b
4. Now, A is not reachable and therefore is useless
5. Simplified G: S 🡺 a
25
What’s the point of removing ε-productions?
A🡺ε
Eliminating ε-productions
Theorem: If G=(V,T,P,S) is a CFG for a language L, then
L\ {ε} has a CFG without ε-productions
Definition: A is “nullable” if A🡺* ε
■ If A is nullable, then any production of the form
“B🡺 CAD” can be simulated by:
■ B 🡺 CD | CAD
■ This can allow us to remove ε transitions for A
26
Algorithm to detect all nullable
variables
■ Basis:
■ If A🡺 ε is a production in G, then A is
nullable
(note: A can still have other productions)
■ Induction:
■ If there is a production B🡺 C1C2…Ck,
where every Ci is nullable, then B is also
nullable
27
Eliminating ε-productions
Given: G=(V,T,P,S)
Algorithm:
1. Detect all nullable variables in G
2. Then construct G1=(V,T,P1,S) as follows:
i. For each production of the form: A🡺X1X2…Xk, where
k≥1, suppose m out of the k Xi’s are nullable symbols
ii. Then G1 will have 2m versions for this production
i. i.e, all combinations where each Xi is either present or absent
iii. Alternatively, if a production is of the form: A🡺ε, then
remove it
28
Example: Eliminating
ε-productions
■ Let L be the language represented by the following CFG G:
i. S🡺AB
ii. A🡺aAA | ε
iii. B🡺bBB | ε Simplified
grammar
Goal: To construct G1, which is the grammar for L-{ε}
■ Nullable symbols: {A, B}
■ G1 can be constructed from G as follows: G 1:
■ B 🡺 b | bB | bB | bBB • S 🡺 A | B | AB
■ ==> B 🡺 b | bB | bBB • A 🡺 a | aA | aAA
■ Similarly, A 🡺 a | aA | aAA • B 🡺 b | bB | bBB
■ Similarly, S 🡺 A | B | AB
+
■ Note: L(G) = L(G1) U {ε} • S🡺ε
29
Eliminating unit productions
A => B B has to be a variable
What’s the point of removing unit transitions ?
Will save #substitutions
E.g., A=>B | … A=>xxx | yyy | zzz | …
B=>C | … B=> xxx | yyy | zzz | …
C=>D | … C=> xxx | yyy | zzz | …
D=>xxx | yyy | zzz D=>xxx | yyy | zzz
after 30
before
Putting all this together…
■ Theorem: If G is a CFG for a language that
contains at least one string other than ε, then there
is another CFG G1, such that L(G1)=L(G) - ε, and
G1 has:
■ no ε -productions
■ no unit productions
■ no useless symbols
■ Algorithm:
Step 1) eliminate ε -productions Again,
Step 2) eliminate unit productions the order is
Step 3) eliminate useless symbols important!
Why?
31
Normal Forms
32
Why normal forms?
■ If all productions of the grammar could be
expressed in the same form(s), then:
a. It becomes easy to design algorithms that use
the grammar
b. It becomes easy to show proofs and properties
33
Chomsky Normal Form (CNF)
Let G be a CFG for some L-{ε}
Definition:
G is said to be in Chomsky Normal Form if all
its productions are in one of the following
two forms:
i. A 🡺 BC where A,B,C are variables, or
ii. A🡺a where a is a terminal
■ G has no useless symbols
■ G has no unit productions
■ G has no ε-productions
34
CNF checklist
Is this grammar in CNF?
G1:
1. E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1
2. T 🡺 T*F | (E) | Ia | Ib | I0 | I1
3. F 🡺 (E) | Ia | Ib | I0 | I1
4. I 🡺 a | b | Ia | Ib | I0 | I1
Checklist:
• G has no ε-productions
• G has no unit productions
• G has no useless symbols
• But…
• the normal form for productions is violated
So, the grammar is not in CNF
35
How to convert a G into CNF?
■ Assumption: G has no ε-productions, unit productions or useless
symbols
1) For every terminal a that appears in the body of a production:
i. create a unique variable, say Xa, with a production Xa 🡺 a, and
ii. replace all other instances of a in G by Xa
2) Now, all productions will be in one of the following
two forms:
■ A 🡺 B1B2… Bk (k≥3) or A🡺a
4) Replace each production of the form A 🡺 B1B2B3… Bk by:and so on…
B2 C2
B1 C1
■ A🡺B1C1 C1🡺B2C2 … Ck-3🡺Bk-2Ck-2 Ck-2🡺Bk-1Bk
36
Example #1
G in CNF:
G:
X0 => 0
S => AS | BABC
X1 => 1
A => A1 | 0A1 | 01
S => AS | BY1
B => 0B | 0
Y1 =>
C => 1C | 1 AY2
Y =>AX
A2=> BC1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1
All productions are of the form: A=>BC or A=>a
37
Example #2
G: 1. E 🡺 EX+T | TX*F | X(EX) | IXa | IXb | IX0 | IX1
1. E 🡺 E+T | T*F | (E) | Ia | Ib | I0 | I1 2. T 🡺 TX*F | X(EX) | IXa | IXb | IX0 | IX1
2. T 🡺 T*F | (E) | Ia | Ib | I0 | I1 3. F 🡺 X(EX) | IXa | IXb | IX0 | IX1
3. F 🡺 (E) | Ia | Ib | I0 | I1 4. I 🡺 Xa | Xb | IXa | IXb | IX0 | IX1
4. I 🡺 a | b | Ia | Ib | I0 | I1 Step (1) 5. X+ 🡺 +
6. X* 🡺 *
7. X+ 🡺 +
8. X( 🡺 (
)
(2 9. …….
ep
St
1. E 🡺 EC1 | TC2 | X(C3 | IXa | IXb | IX0 | IX1
2. C 1 🡺 X +T
3. C 2 🡺 X *F
4. C3 🡺 EX)
5. T 🡺 ..…….
6. ….
38
Other Normal Forms
■ Griebach Normal Form (GNF)
■All productions of the form
A==>a α
Where a is terminal and α is string of zero or
more non-terminals
39
Greibach Normal Form (GNF)
■ Two ways to handle to convert grmmar in GNF
■ Theorem 1: By substitution Method
■ Example:
A->ABA | AB
A-> aA | a
Using theorem 1 of substitution for B in production of first
grammsr
A-> aABA | aBA | aAB| aB
Now all grammar production are in required format.
(Kindly note do not replace for grammar production already in format)
40
Greibach Normal Form (GNF)
■ Two ways to handle to convert grammar in GNF
■ Theorem 2:is used to handle left recursive
grammar
■ If CFG consist of production of the form
A -> Aα1 | Aα2 |….| A αr | β1 | β2 |….| βn
Then equivalent grammar can be formed as
A ->
β1 | β2 |….| βn
A -> β1 Z | β2 Z|….| βn Z
Z -> α1 | α2 |….| αr
Z -> α1 Z | α2 Z |….| αr Z
41
Greibach Normal Form (GNF)
Now A’s production are in
Example : GNF.
A -> AB| aB | bB| c Need to handle Z, will apply
theorem one of substitution
B -> b So,
To bring this grammar in GNF will apply Z-> b | bZ
theorem 2 to handle left recursive
grammar of A -> AB
Final Answer is
Here, α1 = B,
β1 = aB, β2 = bB, , β3 = c A -> aB | bB| c
A -> | aBZ | bBZ| cZ
Then,
Z -> b | bZ
A -> aB | bB| c
A -> | aBZ | bBZ| cZ
Z -> B | BZ
42
Chomsky Hierarchy
26-09-2023 43
Introduction
■ Different classes of phrase structure
grammar can be obtained with few
constraints
■ It is a containment hierarchy
■ Described by Chomsky-Schitzenberger
■ Suggested four different classes
Type-0 (Unrestricted grammar)
Type-1 (Context sensitive grammar)
Type-2 (Context free grammar)
Type-3 (Regular grammar)
26-09-2023 44
Type-0 (Unrestricted grammar)
■ No restriction on production rules
■ Production of form α −> β , where α ≠ ε
■ Semi-thue grammar or unrestricted grammar
■ Generate languages are recognized by
Turing machines (recursively enumerable
languages)
■ For example: A −>AB, AB −>BC, B −>AcD,
Ac −>abc, D −> ε
26-09-2023 45
Type-1 (Context sensitive grammar)
Restriction on production rules are as follows:
1. for α −> β, the length of β is at least as much as
length of α, except for S −> ε
2. The rule S −> ε is allowed only if S doesn’t appear on
RHS
3. Context sensitive because of productions of form
α1Aiα2 −> α1βα2 , β ≠ ε and α1 , α2 may or may not be
empty
Context sensitive language (CSL) recognized by turing
machine
Example: A −>aBC, aB −>ab, bC−>bc
Derivation:
A=>aBC=>abC ;
abC=>abc ; finally A=>abc
26-09-2023 46
Type-2 (Context free grammar)
■ Production of form A−> α , where α
is sentential form
■ Context free languages (CFL)
generated by context free grammar
are recognized by Push Down
Automata (PDA)
■ For example:
■ S −>aSb, S −>bSb, S −>a, S −>b
26-09-2023 47
Type-3 (Regular grammar)
■ The restriction on production rules are as follows:
1. Left-hand side (LHS) should contain only one
non-terminal (NT) symbol
2. Right-hand side (RHS) can contain at most one NT
symbol, which should be rightmost or leftmost
symbol
■ Regular language by regular grammar , which are
recognized by finite state machines (FSM), and
denoted by regular expressions
■ Based on position of RHS non-terminal symbol, the
regular grammar is classified as left-linear or
right-linear grammar
26-09-2023 48
Type-3 (Regular grammar)
1. Left-linear grammar:
2. allowed productions are: A −>Bw , A −>w,
■ The rule S −> ε is allowed only if S doesn’t
appear on RHS
■ For example:
S −>Ca|Bb ,
C −>Bb ,
B −>Ba|b
26-09-2023 49
Type-3 (Regular grammar)
2. Right-linear grammar:
■ allowed productions are:
■ A −>wB , A −>w,
■ The rule S −> ε is allowed only if S
doesn’t appear on RHS
■ For example: S −>0A, A −>0A | 1
26-09-2023 50
CFL Closure Properties
51
Closure Property Results
■ CFLs are closed under:
■ Union
■ Concatenation
■ Kleene closure operator
■ Substitution
■ Homomorphism, inverse homomorphism
■ reversal
■ CFLs are not closed under:
■ Intersection
■ Difference
■ Complementation
52
Substitution of a CFL:
example
■ Let L = language of binary palindromes s.t., substitutions for 0
and 1 are defined as follows:
■ s(0) = {anbn | n ≥1}, s(1) = {xx,yy}
■ Prove that s(L) is also a CFL.
CFG for L: CFG for s(0): CFG for s(1):
S=> 0S0|1S1|ε S0=> aS0b | ab S1=> xx | yy
Therefore, CFG for s(L):
S=> S0SS0 | S1 S S1 |ε
S0=> aS0b | ab
S1=> xx | yy 53
CFLs are closed under union
Let L1 and L2 be CFLs
To show: L2 U L2 is also a CFL
Let us show by using the result of Substitution
■ Make a new language:
■ Lnew = {a,b} s.t., s(a) = L1 and s(b) = L2
==> s(Lnew) == same as == L1 U L2
■ A more direct, alternative proof
■ Let S1 and S2 be the starting variables of the
grammars for L1 and L2
■ Then, Snew => S1 | S2
54
CFLs are closed under concatenation
Let us show Let L and L be CFLs
■ by using the result of Substitution
1 2
■ Make Lnew= {ab} s.t.,
s(a) = L1 and s(b)= L2
==> L1 L2 = s(Lnew)
■ L1 grammar is S-> aSa|bSb|a|b|Ɛ
■ L2 grammar is S-> aSb |Ɛ
S->S1.S2
S1-> aSa|bSb|a|b|Ɛ
S2-> aSb |Ɛ 55
CFLs are closed under
Kleene Closure
■ Let L be a CFL
■ Let Lnew = {a}* and s(a) = L1
■ Then, L* = s(L )
new
■ Let L ={a}, S->a
■ S->S1S | Ɛ
■ S1->a
56
We won’t use substitution to prove this result
CFLs are closed under
Reversal
■ Let L be a CFL, with grammar
G=(V,T,P,S)
■ For LR, construct GR=(V,T,PR,S) s.t.,
■ If A==> α is in P, then:
■ A==> αR is in PR
■ (that is, reverse every production)
■ S-> aSb
■ SR -> bSa
57
Some negative closure results
CFLs are not closed under
Intersection
■ Existential proof:
■ L1 = {0n1n2i | n≥1,i≥1}
■ L2 = {0i1n2n | n≥1,i≥1}
■ Both L1 and L2 are CFLs
■ S-> AB
■ A->0A1 | 01
■ B->2B|2
■ Therefore, L1 ∩ L2 will be 0n1n2n
■ Not possible to construct grammar so CFL’s
are not closed under intersection
58
Some negative closure results
CFLs are not closed under
complementation
■ Follows from the fact that CFLs are not
closed under intersection
■ L1 ∩ L 2 = L 1 U L 2
Logic: if CFLs were to be closed under complementation
🡺 the whole right hand side becomes a CFL (because
CFL is closed for union)
🡺 the left hand side (intersection) is also a CFL
🡺 but we just showed CFLs are
NOT closed under intersection!
🡺 CFLs cannot be closed under complementation.
59
Some negative closure results
CFLs are not closed under
difference
■ Follows from the fact that CFLs are not
closed under complementation
■ Because, if CFLs are closed under
difference, then:
■ L = ∑* - L
■ So L has to be a CFL too
■ Contradiction
60
Decision Properties
■ Emptiness test
■ Generating test
■ Reachability test
■ Membership test
■ PDA acceptance
61
Application of CFL
■ Compiler –Parsing
■ Eg a=b;
■ <STMT> -> <Assign STMT> | <If STMT>
|<while loop>|<for loop>
■ <while loop>-> while <condition><block of stmt>
■ <block of stmt> -> <List of STMT>|<STMT>
■ <Assign STMT> -> <ID>=<ID>
62
Application of CFL
■ Markup Languages
■ DOC -> Element DOC
■ Element -> Text | <EM> DOC </EM> | <P>
DOC </P> | <OL> List </OL>
■ XML
■ DTD
63
memo :
addressee: sender date title body;
addressee : TEXT;
sender : TEXT;
date : DATE;
title : TEXT;
body : TEXT;
For example, the following would be a valid memo:
<memo> <addressee>John</addressee>
<sender>Carla</addressee> <date>1998-09-01</date>
<title>New coffee maker</title>
<body> The new coffee maker has been installed! Operation is
simple: put a cup in the opening and press the red button.
</body>
</memo>
64
“Undecidable” problems for
CFL
■ Is a given CFG G ambiguous?
■ Is a given CFL inherently ambiguous?
■ Is the intersection of two CFLs empty?
■ Are two CFLs the same?
■ Is a given L(G) equal to ∑*?
65