Context-Free Grammars
Formalism
Derivations
Backus-Naur Form
Left- and Rightmost Derivations
rikip ginanjar
Informal Comments
• A context-free grammaris a notation for describing
languages.
• It is more powerful than finite automata or RE’s, but
still cannot define all possible languages.
• Useful for nested structures, e.g., parentheses in
programming languages.
2
rikip ginanjar
Informal Comments – (2)
• Basic idea is to use “variables” to stand for sets of
strings (i.e., languages).
• These variables are defined recursively, in terms
of one another.
• Recursive rules (“productions”) involve only
concatenation.
• Alternative rules for a variable allow union.
3
rikip ginanjar
Example: CFG for { 0n1n |n > 1}
• Productions:
S -> 01
S -> 0S1
• Basis: 01 is in the language.
• Induction: if w is in the language, then so is 0w1.
4
rikip ginanjar
CFG Formalism
• Terminals = symbols of the alphabet of the language
being defined.
• Variables= nonterminals = a finite set of other
symbols, each of which represents a language.
• Start symbol = the variable whose language is the one
being defined.
5
rikip ginanjar
Productions
• A production has the form variable -> string of
variables and terminals.
• Convention:
– A, B, C,… are variables.
– a, b, c,… are terminals.
– …, X, Y, Z are either terminals or variables.
– …, w, x, y, z are strings of terminals only.
– , , ,… are strings of terminals and/or variables.
6
rikip ginanjar
Example: Formal CFG
• Here is a formal CFG for { 0n1n |n > 1}.
• Terminals = {0, 1}.
• Variables = {S}.
• Start symbol = S.
• Productions =
S -> 01
S -> 0S1
7
rikip ginanjar
Derivations – Intuition
• We derive strings in the language of a CFG by starting
with the start symbol, and repeatedly replacing some
variable A by the right side of one of its productions.
– That is, the “productions for A” are those that have A on
the left side of the ->.
8
rikip ginanjar
Derivations – Formalism
• We say A => if A -> is a production.
• Example: S -> 01; S -> 0S1.
• S => 0S1 => 00S11 => 000111.
9
rikip ginanjar
Iterated Derivation
• =>* means “zero or more derivation steps.”
• Basis: =>* for any string .
• Induction: if =>* and => , then =>* .
10
rikip ginanjar
Example: Iterated Derivation
• S -> 01; S -> 0S1.
• S => 0S1 => 00S11 => 000111.
• So S =>* S; S =>* 0S1; S =>* 00S11; S =>* 000111.
11
rikip ginanjar
Sentential Forms
• Any string of variables and/or terminals derived from
the start symbol is called a sentential form.
• Formally, is a sentential form iff S =>* .
12
rikip ginanjar
Language of a Grammar
• If G is a CFG, then L(G), the language of G, is {w |S
=>* w}.
– Note: w must be a terminal string, S is the start symbol.
• Example: G has productions S -> ε and S -> 0S1.
• L(G) = {0n1n |n > 0}.
Note: ε is a legitimate
right side.
13
rikip ginanjar
Context-Free Languages
• A language that is defined by some CFG is called a
context-free language.
• There are CFL’s that are not regular languages,
such as the example just given.
• But not all languages are CFL’s.
• Intuitively: CFL’s can count two things, not three.
14
rikip ginanjar