1     Closure Properties
1.1   Regular Operations
Union of CFLs
Proposition 1. If L1 and L2 are context-free languages then L1 ∪ L2 is also context-free.
Proof. Let L1 be language recognized by G1 = (V1 , Σ, R1 , S1 ) and L2 the language recognized by
G2 = (V2 , Σ, R2 , S2 ). Assume that V1 ∩ V2 = ∅; if this assumption is not true, rename the variables
of one of the grammars to make this condition true.
    We will construct a grammar G = (V, Σ, R, S) such that L(G) = L(G1 ) ∪ L(G2 ) as follows.
    • V = V1 ∪ V2 ∪ {S}, where S 6∈ V1 ∪ V2 (and V1 ∩ V2 = ∅)
    • R = R1 ∪ R2 ∪ {S → S1 |S2 }
    We need to show that L(G) = L(G1 ) ∪ L(G2 ). Consider w ∈ L(G). That means there is a
               ∗
derivation S ⇒G w. Since the only rules involving S are S → S1 and S → S2 , this derivation
                                   ∗                   ∗
is either of the form S ⇒G S1 ⇒G w or S ⇒G S2 ⇒G w. Consider the first case. Since the
                                                                      ∗                   ∗
only rules for variables in V1 are those belonging to R1 and since S1 ⇒G w, we have S1 ⇒G1 w,
                                                 ∗                             ∗
and so w ∈ L1 = L(G1 ). If the derivation S ⇒G w is of the form S ⇒G S2 ⇒G w, then by a
similar reasoning we can conclude that w ∈ L(G2 ). Hence if w ∈ L(G) then w ∈ L(G1 ) ∪ L(G2 ).
Conversely, consider w ∈ L(G1 ) ∪ L(G2 ). Suppose w ∈ L(G1 ); the case that w ∈ L(G2 ) is similar
                                      ∗                                   ∗
and skipped. That means that S1 ⇒G1 w. Since R1 ⊆ R, we have S1 ⇒G w. Thus, we have
           ∗
S ⇒G S1 ⇒G w which means that w ∈ L(G). This completes the proof.
Concatenation, Kleene Closure
Proposition 2. CFLs are closed under concatenation and Kleene closure
Proof. Let L1 be language generated by G1 = (V1 , Σ, R1 , S1 ) and L2 the language generated by
G2 = (V2 , Σ, R2 , S2 ). As before we will assume that V1 ∩ V2 = ∅.
Concatenation Let G = (V, Σ, R, S) be such that V = V1 ∪ V2 ∪ {S} (with S 6∈ V1 ∪ V2 ), and
    R = R1 ∪ R2 ∪ {S → S1 S2 }. We will show that L(G) = L(G1 )L(G2 ). Suppose w ∈ L(G).
                                              ∗ G                                                   ∗ G
      Then there is a leftmost derivation S ⇒lm w. The form such a derivation is S ⇒G S1 S2 ⇒lm
            ∗ G                         ∗ G              ∗ G
      w1 S2 ⇒lm w1 w2 = w. Thus, S1 ⇒lm w1 and S2 ⇒lm w2 . Since the rules in R restricted to
                                                                           ∗ G1              ∗ G2
      V1 are R1 and restricted to V2 are R2 , we can conclude that S1 ⇒lm w1 and S2 ⇒lm w2 .
      Thus, w1 ∈ L(G1 ) and w2 ∈ L(G2 ) and therefore, w = w1 w2 ∈ L(G1 )L(G2 ). On the other
                                                                ∗              ∗
      hand, if w1 ∈ L(G1 ) and w2 ∈ L(G2 ) then we have S1 ⇒G1 w1 and S2 ⇒G2 w2 . Take
                                                                       ∗              ∗
      w = w1 w2 ∈ L(G1 )L(G2 ). Now since R1 ∪ R2 ⊆ R, we have S1 ⇒G w1 and S2 ⇒G w2 .
                                      ∗         ∗
      Therefore, we have, S ⇒G S1 S2 ⇒G w1 S2 ⇒G w1 w2 = w, and so w ∈ L(G).
                                                    1
Kleene Closure Let G = (V = V1 ∪ {S}, Σ, R = R1 ∪ {S → SS1 | }, S), where S 6∈ V1 . We will
    show that L(G) = (L(G1 ))∗ . We will show if w ∈ L(G) then w ∈ (L(G1 ))∗ by induction
    on the length of the leftmost derivation of w. For the base case, consider w such that
    S ⇒G w. Since S →  is the only rule for S whose right-hand side has terminals, this
    means that w = . Further,  ∈ (L(G1 ))∗ which establishes the base case. The induction
                                                              ∗ G
      hypothesis assumes that for all strings w, if S ⇒lm w in < n steps then w ∈ (L(G1 ))∗ .
                                  ∗ G
      Consider w such that S ⇒lm w in n steps. Any leftmost derivation has the following form:
                     ∗ G         ∗ G                                   ∗ G
      S ⇒G SS1 ⇒lm w1 S1 ⇒lm w1 w2 = w. Now we have S ⇒lm w1 is < n steps (because
           ∗ G                                          ∗ G
      S1 ⇒lm w2 takes at least one step), and S1 ⇒lm w2 . This means that w1 ∈ (L(G1 ))∗ (by
      induction hypothesis) and w2 ∈ L(G1 ) (since the only rules in R for variables in V1 are those
      belonging to R1 ). Thus, w = w1 w2 ∈ (L(G1 ))∗ . For the converse, suppose w ∈ (L(G1 ))∗ . By
      definition, this means that there are w1 , w2 , . . . wn (for n ≥ 0) such that wi ∈ L(G1 ) for all
      i. Now if n = 0 (i.e., w = ) then we have S ⇒G w because S →  is a rule. Otherise, since
                                ∗                                            ∗
      wi ∈ L(G1 ), we have S1 ⇒G1 wi , for each i. Since R1 ⊆ R, S1 ⇒G wi . Hence we have the
      following derivation
                                                                ∗            ∗       ∗
      S ⇒G SS1 ⇒G SSS1 ⇒G · · · ⇒G S(S1 )n ⇒G (S1 )n ⇒G w1 (S1 )n−1 ⇒G · · · ⇒G w1 w2 · · · wn = w
1.2      Intersection and Complementation
Intersection
Proposition 3. CFLs are not closed under intersection
Proof.      • L1 = {ai bi cj | i, j ≥ 0} is a CFL
          – Generated by a grammar with rules S → XY ; X → aXb|; Y → cY |.
   • L2 = {ai bj cj | i, j ≥ 0} is a CFL.
          – Generated by a grammar with rules S → XY ; X → aX|; Y → bY c|.
   • But L1 ∩ L2 = {an bn cn | n ≥ 0}, which we will see soon, is not a CFL.
Intersection with Regular Languages
Proposition 4. If L is a CFL and R is a regular language then L ∩ R is a CFL.
Proof. Let P be the PDA that accepts L, and let M be the DFA that accepts R. A new PDA
P 0 will simulate P and M simultaneously on the same input and accept if both accept. Then P 0
accepts L ∩ R.
                                                    2
   • The stack of P 0 is the stack of P
   • The state of P 0 at any time is the pair (state of P , state of M )
   • These determine the transition function of P 0
   • The final states of P 0 are those in which both the state of P and state of M are accepting.
   More formally, let M = (Q1 , Σ, δ1 , q1 , F1 ) be a DFA such that L(M ) = R, and P = (Q2 , Σ, Γ, δ2 , q2 , F2 )
be a PDA such that L(P ) = L. Then consider P 0 = (Q, Σ, Γ, δ, q0 , F ) such that
   • Q = Q1 × Q2
   • q0 = (q1 , q2 )
   • F = F1 × F2
                                    {((p, q 0 ), b) | (q 0 , b) ∈ δ2 (q, x, a)}
                                                                                                                                     when x = 
            δ((p, q), x, a) =
                                    {((p0 , q 0 ), b) | p0 = δ1 (p, x) and (q 0 , b) ∈ δ2 (q, x, a)} when x 6= 
   One can show by induction on the number of computation steps, that for any w ∈ Σ∗
                                w                           w                      w
                       hq0 , i −→P 0 h(p, q), σi iff q1 −→M p and hq2 , i −→P hq, σi
The proof of this statement is left as an exercise. Now as a consequence, we have w ∈ L(P 0 )
              w                                                                                     w
iff hq0 , i −→P 0 h(p, q), σi such that (p, q) ∈ F (by definition of PDA acceptance) iff hq0 , i −→P 0
                                                                      w                  w
h(p, q), σi such that p ∈ F1 and q ∈ F2 (by definition of F ) iff q1 −→M p and hq2 , i −→P hq, σi and
p ∈ F1 and q ∈ F2 (by the statement to be proved as exercise) iff w ∈ L(M ) and w ∈ L(P ) (by
definition of DFA acceptance and PDA acceptance).
   Why does this construction not work for intersection of two CFLs?
Complementation
Proposition 5. Context-free languages are not closed under complementation.
Proof. [Proof 1] Suppose CFLs were closed under complementation. Then for any two CFLs L1 ,
L2 , we have
   • L1 and L2 are CFL. Then, since CFLs closed under union, L1 ∪ L2 is CFL. Then, again by
     hypothesis, L1 ∪ L2 is CFL.
   • i.e., L1 ∩ L2 is a CFL
i.e., CFLs are closed under intersection. Contradiction!
     [Proof 2] L = {x | x not of the form ww} is a CFL.
   • L generated by a grammar with rules X → a|b, A → a|XAX, B → b|XBX, S → A|B|AB|BA
But L = {ww | w ∈ {a, b}∗ } we will see is not a CFL!
                                                           3
Set Difference
Proposition 6. If L1 is a CFL and L2 is a CFL then L1 \ L2 is not necessarily a CFL
Proof. Because CFLs not closed under complementation, and complementation is a special case of
set difference. (How?)
Proposition 7. If L is a CFL and R is a regular language then L \ R is a CFL
Proof. L \ R = L ∩ R
1.3   Homomorphisms
Homomorphism
Proposition 8. Context free languages are closed under homomorphisms.
Proof. Let G = (V, Σ, R, S) be the grammar generating L, and let h : Σ∗ → Γ∗ be a homomorphism.
A grammar G0 = (V 0 , Γ, R0 , S 0 ) for generating h(L):
   • Include all variables from G (i.e., V 0 ⊇ V ), and let S 0 = S
   • Treat terminals in G as variables. i.e., for every a ∈ Σ
        – Add a new variable Xa to V 0
        – In each rule of G, if a appears in the RHS, replace it by Xa
   • For each Xa , add the rule Xa → h(a)
   G0 generates h(L). (Exercise!)
Example 9. Let G have the rules S → 0S0|1S1|.
   Consider the homorphism h : {0, 1}∗ → {a, b}∗ given by h(0) = aba and h(1) = bb.
   Rules of G0 s.t. L(G0 ) = L(L(G)):
                                       S → X0 SX0 |X1 SX1 |
                                     X0 → aba
                                     X1 → bb
                                                  4
1.4   Inverse Homomorphisms
Inverse Homomorphisms
   Recall: For a homomorphism h, h−1 (L) = {w | h(w) ∈ L}
Proposition 10. If L is a CFL then h−1 (L) is a CFL
Proof Idea
For regular language L: the DFA for h−1 (L) on reading a symbol a, simulated the DFA for L on
h(a). Can we do the same with PDAs?
   • Key idea: store h(a) in a “buffer” and process symbols from h(a) one at a time (according
     to the transition function of the original PDA), and the next input symbol is processed only
     after the “buffer” has been emptied.
   • Where to store this “buffer”? In the state of the new PDA!
Proof. Let P = (Q, ∆, Γ, δ, q0 , F ) be a PDA such that L(P ) = L. Let h : Σ∗ → ∆∗ be a homomor-
phism such that n = maxa∈Σ |h(a)|, i.e., every symbol of Σ is mapped to a string under h of length
at most n. Consider the PDA P 0 = (Q0 , Σ, Γ, δ 0 , q00 , F 0 ) where
   • Q0 = Q × ∆≤n , where ∆≤n is the collection of all strings of length at most n over ∆.
   • q00 = (q0 , )
   • F 0 = F × {}
   • δ 0 is given by
                               (
                                {((q, h(x)), )}                    if v = a = 
          δ 0 ((q, v), x, a) =
                                {((p, u), b) | (p, b) ∈ δ(q, y, a)} if v = yu, x = , and y ∈ (∆ ∪ {})
      and δ 0 (·) = ∅ in all other cases.
   We can show by induction that for every w ∈ Σ∗
                                       w                            w0
                              hq00 , i −→P 0 h(q, v), σi iff hq0 , i −→P hq, σi
                                                                                                      w
where h(w) = w0 v. Again this induction proof is left as an exercise. Now, w ∈ L(P 0 ) iff hq00 , i −→P 0
                                                                                    h(w)
h(q, ), σi where q ∈ F (by definition of PDA acceptance and F 0 ) iff hq0 , i −→P hq, σi (by exercise)
iff h(w) ∈ L(P ) (by definition of PDA acceptance). Thus, L(P 0 ) = h−1 (L(P )) = h−1 (L).