0% found this document useful (0 votes)

194 views109 pages

The Theory of Languages and Computation: Preliminary Notes - Please Do Not Distribute

The document is a preliminary outline for a textbook on the theory of languages and computation. It provides an overview of the topics that will be covered in each chapter, including notation, proofs, set theory, formal languages, automata, computability, and current research areas. The chapters cover fundamental concepts such as regular languages, context-free grammars, pushdown automata, Turing machines, and the halting problem.

Uploaded by

Nemanja Milosavljevic

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

194 views109 pages

The Theory of Languages and Computation: Preliminary Notes - Please Do Not Distribute

Uploaded by

Nemanja Milosavljevic

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 109

The Theory of Languages and Computation

Jean Gallier jean@saul.cis.upenn.edu Andrew Hicks rah@grip.cis.upenn.edu Department of Computer and Information Science University of Pennsylvania

Preliminary notes - Please do not distribute.

a 1 b b 3 a 2

a,b

Contents
1 Automata 1.1 Notation . . . . . . . . . . . . . . . . . 1.2 Proofs . . . . . . . . . . . . . . . . . . 1.3 Set Theory . . . . . . . . . . . . . . . 1.4 The Natural numbers and Induction . 1.5 Foundations of Language Theory . . . 1.6 Operations on Languages . . . . . . . 1.7 Deterministic Finite Automata . . . . 1.8 The Cross Product Construction . . . 1.9 Non-Deterministic Finite Automata . 1.10 Directed Graphs and Paths . . . . . . 1.11 Labeled Graphs and Automata . . . . 1.12 The Theorem of Myhill and Nerode . 1.13 Minimal DFAs . . . . . . . . . . . . . 1.14 State Equivalence and Minimal DFAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 3 3 15 20 21 23 27 29 32 34 42 46 47 54 54 56 57 61 67 68 69 69 71 75 79 81 83 87 91 93

2 Formal Languages 2.1 A Grammar for Parsing English . . . . . . . . . . . . . . . . . 2.2 Context-Free Grammars . . . . . . . . . . . . . . . . . . . . . 2.3 Derivations and Context-Free Languages . . . . . . . . . . . . 2.4 Normal Forms for Context-Free Grammars, Chomsky Normal 2.5 Regular Languages are Context-Free . . . . . . . . . . . . . . 2.6 Useless Productions in Context-Free Grammars . . . . . . . . 2.7 The Greibach Normal Form . . . . . . . . . . . . . . . . . . . 2.8 Least Fixed-Points . . . . . . . . . . . . . . . . . . . . . . . . 2.9 Context-Free Languages as Least Fixed-Points . . . . . . . . 2.10 Least Fixed-Points and the Greibach Normal Form . . . . . . 2.11 Tree Domains and Gorn Trees . . . . . . . . . . . . . . . . . . 2.12 Derivations Trees . . . . . . . . . . . . . . . . . . . . . . . . . 2.13 Ogdens Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . 2.14 Pushdown Automata . . . . . . . . . . . . . . . . . . . . . . . 2.15 From Context-Free Grammars To PDAs . . . . . . . . . . . . 2.16 From PDAs To Context-Free Grammars . . . . . . . . . . . . 3 Computability 3.1 Computations of Turing Machines 3.2 The Primitive Recursive Functions 3.3 The Partial Recursive Functions . 3.4 Recursively Enumerable Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . and Recursive Languages

. . . . . . . . . . . . Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

95 . 97 . 99 . 102 . 103

3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12

Phrase-Structure Grammars . . . . . . . . . . . . . . Derivations and Type-0 Languages . . . . . . . . . . Type-0 Grammars and Context-Sensitive Grammars The Halting Problem . . . . . . . . . . . . . . . . . . A Univeral Machine . . . . . . . . . . . . . . . . . . The Parameter Theorem . . . . . . . . . . . . . . . . Recursively Enumerable Languages . . . . . . . . . . Hilberts Tenth Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . .

104 105 106 107 107 107 107 107 108 108 108 108 108

4 Current Topics 4.1 DNA Computing . . . . . . . . . . . . . . 4.2 Analog Computing . . . . . . . . . . . . . 4.3 Scientic Computing/Dynamical Systems 4.4 Quantum Computing . . . . . . . . . . . .

Chapter 1

Automata
1.1 Notation

The following conventions are useful and standard. stands for not or the negation of. stands for for all. stands for there exists. stands for such that. s.t. stands for such that. stands for implies as in A B (A implies B). stands for is equivalent to as in A B (A is equivalent to B). i is the same as .

1.2

Proofs

The best way to learn what proofs are and how to do them is to see examples. If you try to nd a denition of a proof or you going around asking people what they think a proof is, then you will quickly nd that you are asking a hard question. Our approach will be to avoid dening proofs (something we couldnt do anyway), and instead do a bunch so you can see what we mean. Often students say I dont know how to do proofs. But they do. Almost everyone could do the following: Theorem x = 5 is a solution of 2x = 10. Proof 2 5 = 10. So in some sense, EVERYONE can do a proof. Things get stickier though if it is not clear what you allowed to use. For example, the following theorem is often proved in elementary real analysis courses: Theorem 1 > 0. Just given that theorem out of context, it really isnt clear that there is anything to prove. But, in a analysis course a denition of 1 and 0 is given so that it is sensible to give a proof. In other words the basic assumptions are made clear. One of our goals in the next few sections is to clarify what is to be considered a basic assumption.

1.3

Set Theory

Most people are introduced to computer science by using a real computer of course, and for the most part this requires a knowledge of only some basic algebra. But as one starts to learn more about about the theory 4

of computer science, it becomes apparent that a kind of mathematics dierent from algebra must be used. What is needed is a way of stating problems precisely and a way to deal with things that are more abstract than basic algebra. Fortunately, there is a lot of mathematics available to do this. In fact, there are even dierent choices that one can use for the foundations, although the framework that we will use is used by almost everyone. That framework is classical set theory as was invented by Cantor in the 19th century. We should emphasize that one reason people start with set theory as their foundations is that the idea of a set seems pretty natural to most people, and so we can communicate with each other fairly well since we seem to all have the same thing in mind. But what we take as an axiom, as opposed to what we construct, is a matter of taste. For example, some objects that we dene below, such as the ordered pair or the function, could be taken as part of the axioms. There is no universal place where one can say the foundations should begin. It seems to be the case though, that when most people read the denition of a set, they understand it, in the sense that they talk to other people about sets and seem to be talking about the same thing. Denition 1.3.1 A set is a collection of objects. The objects of a set are called the elements of that set. Notation If we are given a set A such that x is in A i x has property P , then we write A = {x | x has property P }, or some obvious variation on the above. If x is an element of A then we may write x A. Example 1.3.2 Thus, is the set of prime numbers. Similarly, if we want to only look at prime numbers of the form n2 + 1 we could write {x P | there exists a natural number n, such that x = n2 + 1.} A more compact way of denoting this set is {x P | (n N ) (x = n2 + 1)} or even simply {x P | (n N )(x = n2 + 1)}. Although these last two ways of dening the above set is more compact, if many things are written in this manner it soon becomes hard to read them. For nite sets we can just list the elements, so the set with the elements 1, 2 and 3 is {1, 2, 3}. On the other hand to list the numbers from 1 to 100 we could write {1, 2, 3, ..., 99, 100}. Notation Some sets of numbers are used so often that they are given their own names. We will use N for the set of natural numbers {0, 1, 2, 3...}, Z for the set of integers {0, 1, 1, 2, 2, ...}, Z + for the set of positive p integers, Q for the set of rational numbers { |p, q Z, q = 0}, and R for the set of real numbers. Another q common notation is that for any set A of numbers, if n is an integer then nA = {na|a A}. Thus, for example, 2Z is the set of even numbers. Incidentally, we have not indicated how to construct these sets, and we have no intention of doing so. One can start from scratch, and dene the natural numbers, and then the integers and the rationals etc. This is a very interesting process, and one can continue it in many dierent directions, dening the real numbers, the p-adic numbers, the complex numbers and the quaternions. Some of these objects have applications in computer science, but the closest we will get to the foundational aspects of numbers systems is when we study the natural numbers below. Denition 1.3.3 If A and B are sets and for every x A we have that x B then we say that A is a subset of B and we write A B. 5 P = {x | x is a prime number}

Denition 1.3.4 We consider two sets to be equal if they have the same elements, i.e. if A B and B A. If A and B are equal then we write A = B. It might seem strange to dene what it means for two things to be equal. A familiar example of a situation where it is not so clear as to what is meant by equality is how to dene equality between two objects that have the same type in a programming language (i.e. the notion of equality for a given data structure). For example, given two pointers, are they equal if they point at the same memory location, or are they equal if they point at memory locations that have the same contents ? Thus in programming one needs to be careful about this, but from now on since everything will be dened in terms of sets, we can use the above denition. Denition 1.3.5 The union of the sets A and B is the set A B = {x | x A or x B}. More generally, for any set C we dene C = {x|(A C) (x A)}. For example, if A = {1, 2, 6, {10, 100}, {0}, {{3.1415}}} then A = {10, 100, 0, {3.1415}}. There are a number of variants of this notation. For example, suppose we have a set of 10 sets C = {A1 , ..., A10 }. Then the union, C, can also be written as
10

Ai .
i=1

Lemma 1.3.6 A B = B A. Proof What needs to be shown here ? The assertion is that two sets are equal. Thus we need to show that x A B x B A. This means that we need to do two little proofs: one that x A B x B A and one that x A B x B A. : If x A B then we know that x is in either A or B. We want to show that x is in either B or A. But from logic we know that these are equivalent statements, i.e. p or q is logically the same as q or p. So we are done with this part of the proof. : This proof is very much like the one above. Denition 1.3.7 The intersection of the sets A and B is the set A B = {x | x A and x B}. Denition 1.3.8 The dierence of the sets A and B is the set A B = {x | x A and x B}. Denition 1.3.9 By an ordered pair (a, b) we mean the set {{a}, {a, b}}. This denition of ordered pair is due to Kuratowski. At this point, a good problem for the reader is to prove theorem 1.3.10. Just as the cons operator is the foundation of Lisp like programming languages, the pair is the foundation of most mathematics, and for essentially the same reasons. The following is the essential reason for the importance of the ordered pair. Theorem 1.3.10 If (a, b) = (c, d) then a = c and b = d.

(2, 3)

(1, 1)

(0, 0)

Figure 1.1: Here we see a few of the points from the lattice of integers in the plane Proof See exercise 4 for a hint. Notation It is possible to give many dierent denitions of an n-tuple (a1 , a2 , ..., an ). For example, we could dene a triple (a, b, c) as (a, (b, c)) or ((a, b), c). Which denition we choose doesnt really matter - the only thing that is important is that the denition implies that if two n-tuples are equal, then their entries are equal. From this point on we will assume that we are using one of these denitions of an n-tuple. Denition 1.3.11 We dene the Cartesian product of A and B as A B = {(a, b) | a A and b B}. A convenient notation is to write A2 for A A. In general, if we take the product of a set A with itself n times, then we will sometimes write it as An . Example 1.3.12 Probably the most familiar example of a Cartesian product is the plane, R2 = R R. This is partly since it was the rst example, due to Decartes, who invented what we now call Cartesian coordinates. The idea of this important breakthrough is to think of the plane as a product. Then the position of a point can be encoded as a pair of numbers. A similar example is the lattice Z Z, which is a subset of R R. This lattice is simply the set of points in the plane with integer coordinates (see gure 1.1). Can you picture Z 3 R3 ? Example 1.3.13 For nite sets, one can list the elements in a Cartesian product. Thus {1, 2, 3}{x, y} = {(1, x), (1, y), (2, x), (2, y), (3, x), (3, y)}. Do you see how to list the elements of a Cartesian product using two nested loops ? Example 1.3.14 For any set A, A = A = . Denition 1.3.15 A relation is a set of ordered pairs. By a relation on A we mean a subset of A A. Notation If R is an relation on A and (a, b) R then we sometimes write aRb. 7

Figure 1.2: Here we consider two points in the plane equivalent if they have the same distance to the origin. Thus every equivalence class is either a circle or the set containing the origin. Denition 1.3.16 An equivalence relation on A is a relation on A satisfying (1) a a for all a A, (2) if a b then b a, (3) a b and b c imply that a c. The equivalence relation is an abstraction of equality. You have probably encountered equivalence relations when you studied plane geometry: congruence and similarity of triangles are the two most common examples. The point of the concept of congruence of two triangles is that even if two triangles are not equal as subsets of the plane, for some purposes they can be considered as being the same. Later in this chapter we will consider an equivalence relation on the states of a nite state machine. In that situation two states will be considered equivalent if they react in the same way to the same input. Example 1.3.17 A classic example of an equivalence relation is congruence modulo an integer. (Sometimes this is taught to children in the case of n = 12 and called clock arithmetic, since it is like adding hours on a clock, i.e. 4 hours after 11 oclock is 3 oclock.) Here, we x an integer, n, and we consider two other integers to be equivalent if when we divide n into each of them as much as possible, then they have the same remainders. Thus, if n = 7 then 15 and 22 are consider to be equivalent. Also 7 is equivalent to 0, 14, 21,... , i.e. 7 is equivalent to all multiple of 7. 1 is equivalent to all multiples of 7, plus 1. Hence, 1 is equivalent to -6. We will elaborate on this notion later, since it is very relevant to the study of nite state machines. Example 1.3.18 Suppose one declares two points in the plane to be equivalent if they have the same distance to the origin. Then this is clearly an equivalence relation. If one xes a point, the set of other points in the plane that are equivalent to that point is the set of points lying on the circle containing the point that is centered about the origin, unless of course the point is the origin, which is equivalent only to itself. (See gure 1.2). Denition 1.3.19 If is an equivalence relation on A and a A, then we dene the equivalence class of a, [a], as {b A|b a}. Notice that a b i [a] = [b]. If it is not true that a b, then [a] [b] = . Denition 1.3.20 A partition of a set A is a collection of non-empty subsets of A, P , with the properties (1) For all x A, there exists U P such that x U . (P covers A.) (2) If U, V P then either U = V or U V = . The elements of P are sometimes referred to as blocks. Very often though the set has some other structure and rather than block or equivalence class another term is used, such as residue class, coset or ber. The typical diagram that goes along with the denition of a partition can be seen in gure 1.3 . We draw

tiger lion cheetah mouse rat beaver

gorilla chimpanzee naked mole rats

antelope caribou deer

blue whale sperm whale humpback whale moose

capybara

Figure 1.3: A simple classication of animals provides an example of an equivalence relation. Thus, one may view each of the usual biological classications such as kingdom, phylum, genus, species, etc. as equivalence relations or partitions on the set of living things. the set A as a blob, and break it up into compartments, each compartment corresponding to an element of P. There is a canonical correspondence between partitions of a set and equivalence relations on that set. Namely, given an equivalence relation, one can dene a partition as the set of equivalence classes. On the other hand, given a partition of a set, one can dene two elements of the set to be equivalent if they lie in the same block. Denition 1.3.21 A function is a set of ordered pairs such that any two pairs with the same rst member also have the same second member. The domain of a function, f , dom(f ), is the set {a | b s.t. (a, b) f }. The range of f , ran(f ), is the set {b | a s.t. (a, b) f }. If the domain of f is A and the range of f is contained in b then we may write f : A B. Several comments need to be made about this denition. First, a function is a special kind of relation. Therefore we can use the relation notation af b to say that (a, b) f . This in not standard notation for functions! It is more common in this case to write b = f (a) or f (a) = b. This brings us to our second comment. The reader is probably used to specifying a function with a formula, like y = x2 , or f (x) = ecos(x) , or with the more modern notation x x2 . Even though this notation doesnt indicate what the domain and range of a function is, it is usually clear from the context. You may remember that a common exercise in calculus courses is to determine the domain and range of a function given by such a formula, assuming that the 1 ? domain and range lie in the real numbers. For example, what is the domain and range of y = 2 x 1 In this book we usually will need to be more explicit about such things, but that does not mean that the readers past experience with functions is not useful. Finally, consider the dierence between what is meant by a function in a programming language as opposed to our denition. Our denition doesnt make any reference to a method for computing the range values from the domain values. In fact this may be what people nd the most confusing about this sort of abstract denition the rst time they encounter it. The point is that the function exists independently of any method or formula that might be used to compute it. Notice that is very much in the philosophy of modern programming: functions should be given just with specs on what they will do, and the user need not know anything about the specic implementation. Another way to think of this is that for a given function there may be many programs that compute, and one should take care to distinguish between a function and a 9

y x z b

c d a

Figure 1.4: A schematic depiction of a function. The blob on the left represents the domain, with elements x, y and z, and the blob on the right contains the range. Thus, for example, f (x) = b. This function is not injective or surjective.
y x z b a c d y c b a d

v
x

w
z

Figure 1.5: The function on the left is injective, but not surjective. The function on the right is surjective but not injective. program that computes this function. In fact, below in example 1.3.40 you will see that there are functions which cant be computed at all! Example 1.3.22 The set {(1, 5), (2, 4), (3, 7), (6, 7)} is a function. Example 1.3.23 The set {(1, 5), (2, 4), (3, 7), (3, 7)} is not a function. It is a relation however. Example 1.3.24 Take f to be the set {(n, pn )|n Z + , pn is the nth prime number}. Many algorithms are know for computing this function, although this is an active area of research. No formula is known for this function. Example 1.3.25 Take f to be the set {(x, x)|x R, x 0}. This is the usual square root function. Example 1.3.26 The empty set may be considered a function. This may look silly, but it actually allows for certain things to work out very nicely, such as theorem 1.3.42. Denition 1.3.27 A function (A, f, B) is 1-1 or injective if any two pairs of f with the same second member have the same rst member. f is onto B, or surjective , if the range of f is B. Denition 1.3.28 A function f : A B that is both injective and surjective is called a bijection and we say that there is a bijection between A and B. The idea of a bijection between two sets is that the elements of the sets can be matched up in a unique way. This provides a more general way of comparing the sizes of things than counting them. For example, if you are given two groups of people and you want to see if they were the same size, then you could count them. But another way to check if they are the same size is to start pairing them o and see if you run out of people in one of the groups before the other. If you run out of people in both groups at the same time, then the groups are of the same size. This means that you have constructed a bijection between the two groups (sets, really) of people. The advantage of this technique is that it avoids counting. For nite sets this is a useful technique often used in the subject of combinatorics (see exercise ). But it has the nice feature that it also provides a way to check if two innite sets are the same size! A remarkable result of all this is that one nds that some innite objects are bigger than others. See example 1.3.40.

This point gets mapped to this point

Figure 1.6: A bijection from the circle minus a point to the real line. Example 1.3.29 It turns out that most of the innite sets that we meet are either the size of the integers or the size of the real numbers. Consider the circle x2 + (y 1)2 = 1 with the point (0, 2) removed. We claim that this set has the same size as the real numbers. One can geometrically see a bijection between the two sets in gure 1.6. We leave it as an exercise to the reader to write down the equations for this function and prove that it is a bijection. Example 1.3.30 Another surprising consequence of our notion of size is that the plane and the real line have the same number of points! One way to see this is to construct a bijection from (0, 1) (0, 1) (0, 1), i.e. from the unit square to the unit interval. To do this we use the decimal representation of the real numbers: each real number in (0, 1) can be written in the form .d1 d2 d3 ... where d is an integer from 0 to 9. We then dene the bijection as (.d1 d2 d3 ..., .c1 c2 c3 ...) .d1 c1 d2 c2 d3 c3 .... There is something to worry about here - namely that some numbers have two decimal representations. For example .1 = .2. We leave it to the reader as an exercise to provide the appropriate adjustments so that 9 the above map is a well-dened bijection. Denition 1.3.31 A sequence is a function whose domain is N . If f : N A then we say that f is a sequence of elements of A. Intuitively, one thinks of a sequence as an innite list, because given a sequence f we can list the elements of the sequence: f (0), f (1), f (2), .... We will be loose with the denition of sequence. For example, if the domain of f is the positive integers, will will also consider it a sequence, or if the domain is something like {1, 2, ...k} we will refer to f as a nite sequence. Denition 1.3.32 If there is a bijection between A and B then we say that A and B have the same cardinality. If there is a bijection between A and the set of natural numbers, then we say that A is denumerable. If a set is nite or denumerable, it may also be referred to as countable. Notice that we have not dened what is means for a set to be nite. There are at least two ways to do this. One way is to declare a set to be innite if there is a bijection from the set to a proper subset of that set. This makes it easy to show, for example that the integers are innite: just use x 2x. Then a set is said to be nite if it is not innite. This works out OK, but seems a bit indirect. Another way to dene nite is to say that there is a bijection between it and a subset of the natural numbers of the form {x|x n}, where n is a xed natural number. In any case, we will not go into these issues in any more detail. Note that, in light of our denition of a sequence, a set is countable is its elements can all be put on an innite list. This tells us immediately, for example, that the set of programs from a xed language is countable, since they can all be listed (list then alphabetically).

Denition 1.3.33 If f : A B is a bijection then we write A B. If f : A B is an injection and there is no bijection between A and B then we write A < B. Denition 1.3.34 The power set of a set A is the set (A) = {B | B A}. Example 1.3.35 If A = {1, 2, 3} then (A) = {{1}, {2}, {3}, {1, 2}, {1, 3}{2, 3}, {1, 2, 3}, }. Example 1.3.36 If A = then (A) = {}. The above example indicates that there is a natural injection from a set to its power set, i.e. x {x}. The fact that no such surjection exists is the next theorem. Theorem 1.3.37 For any set A, A < (A). Proof It would be a crime to deprive the reader of the opportunity to work on this problem. It is a tough problem, but even if you dont solve it, you will benet from working on it. See the exercises if you want a hint. Denition 1.3.38 For any two sets A and B we dene AB = {f | f : B A}. Example 1.3.39 {0, 1}{1,2,3} has eight elements. We can see this by counting. If we are given a function f : {1, 2, 3} {1, 2}, then how many choices do we have for f (1) ? Two, since it can be 0 or 1. Likewise for f (2) and f (3). Thus there are 2 2 2 = 8 dierent functions. Example 1.3.40 Here we show that N < N N , illustrating a technique called Cantor diagnolization. This method can also be used to show that N < R. We do the proof by contradiction. Suppose that N N N . Thus there is a bijection F : N N N . This means that we can list the elements of N N : F (0), F (1), F (2)... (remember - F (k) is an function, not a number!) We now show that there is a function not on the list. Dene g : N N by the equation g(n) = F (n)(n) + 1. Then g must be on the list, so it is F (k) for some k. But then F (k)(k) = g(k) = F (k)(k) + 1, a contradiction. Therefore no such bijection can exist, i.e. N N is not denumerable. Food for thought It would seem that if one picked a programming language, then the set of programs that one could write in that language would be denumerable. On the other hand, N N is not denumerable, so it would appear to be the case that there are more functions around then there are programs. This is the rst evidence we have that there are things around which just cant be computed. Denition 1.3.41 If A B then we dene the characteristic function, A of A (with respect to B) for each x B by letting A (x) = 1 if x B and A (x) = 0 otherwise. Notice that A is a function on B, not A, i.e. A : B {0, 1}. . In our above notation, The set of characteristics functions on a set A is just {0, 1} . Theorem 1.3.42 For any set A, {0, 1} (A).
A A

Proof We need to nd a function between these two sets and then prove that it is a 1-1 and onto function. So, give a characteristic function, f , what is a natural way to associate to it a subset of A ? Since f is a characteristic function, there exists a set A so that f = A . The idea is then to map f to A. In other words A dene F : {0, 1} (A) as F (f ) = F (A ) = A. Several things have to be done now. First of all how do we know that this function is well dened ? Maybe for a given characteristic function there are two unequal sets A and B and A = B . But it is easy to see that this cant happen (why ?). So now we know we have a well dened mapping. Next, we show that F is onto. Thus, for any U (A) we need to produce an f {0, 1}A such that F (f ) = U. Here we can just let U = {x A|f (x) = 1}. To show that F is 1-1 suppose that F (f ) = F (g). Say f = A and g = B . Thus A = B. But then certainly A = B , i.e. f = g. Denition 1.3.43 A partial order on A is a relation on A satisfying the following: for every a, b, c A (1) a a, (2) a b and b c implies that a c. (3) a b and b a implies that a = b. Example 1.3.44 The usual order on the set of real numbers is an example of a partial order that everyone is familiar with. To prove that (1), (2) and (3) hold requires that one knows how the real numbers are dened. We will just take them as given, so in this case we cant prove that the usual order is a total order. Example 1.3.45 A natural partial order that we have already encountered is the subset partial order, known also as inclusion. The reader should check (1), (2) and (3) do in fact hold. Denition 1.3.46 Given a set A, a, b and a partial order on A, we say that a and b are compatible if either a b or b a. A total order on a set is a partial order on that set in which any two elements of the set are compatible. Example 1.3.47 In example (1.3.44) the order is total, but in example (1.3.45) is is not. It is possible to draw a picture of this - see gure 1.7. Notice that there is one containment that we have not included in this diagram, the fact that {1} {1, 3}, since it was not typographically feasible using the containment sign. It is clear from this example that partially ordered sets are things that somehow cant be arranged in a linear fashion. For this reason total orders are also sometime called linear orderings. Example 1.3.48 The Sharkovsy ordering of N is: 3 < 5 < 7 < 9 < < 3 2 < 5 2 < 7 2 < 9 2 < < 3 22 < 5 22 < 7 22 < 9 22 < < 23 < 22 < 2 < 1. Notice that in this ordering of the natural numbers, there is a largest and a smallest element. This ordering has remarkable applications to theory of dynamical systems. Example 1.3.49 The dictionary ordering on Z Z, a total order, is dened as follows. (a, b) < (c, d) if a < c or a = c and b < d. This type of order can actually be dened on any number of products of any totally ordered sets. We nish this section with a discussion of the pigeon hole principle. The pigeon hole principle is probably the single most important elementary tool needed for the study of nite state automata. The idea is that if there are n + 1 pigeons, and n pigeon holes for them to sit in, then if all the pigeon go into the holes it then it must be the case that two pigeons occupy the same hole. It seems obvious, and it is. Nevertheless it turns out to be the crucial fact in proving many non-trivial things about nite state automata. The formal mathematical statement of the pigeon hole principle is that if f : A B and A and B are nite sets A has more elements then B, then f is not injective. How does one prove this ? We leave this to the curious reader, who wants to investigate the foundations some more. 13

{1, 2, 3}
U

U {2, 3}
U
U

{1, 2}

{1, 3}

U
{1}
U

U {2} U

U
{3}

Figure 1.7: Containment is a partial ordering on the power set of a set.

Exercises
1 Show that A B = B A. 2 Show that (A B) C = A (B C). 3 Show that (A B) C = A (B C). 4 Show that (a, b) = (c, d) implies that a = c and b = d. (Hint - consider separately the cases of a = b and a = b.) 5 Show that for any sets A, B and C that (i) A A, (ii) A B implies B A, and (iii) A B and B C implies that A C. (Given that (i), (ii) and (iii) are true it is tempting to say that is an equivalence relation on the set of sets. But exercise 21 shows why this can cause technical diculties.) 6 Show the following: (i) Z N (ii) nZ Z for any integer n. 7 Show that N N N. 8 Show that Q N. 9 Show that A C and B D implies that A B C D. 10 Show that A (B C) = (A B) (A C) and that A (B C) = (A B) (A C) 11 Show that A(B
C

ABC . Do you know any computer languages where this bijection occurs ? 14

12 How many elements are in ? 13 Show that {N n |n Z + } N . (This is the essence of what is called Godel numbering.)

14 Show that a countable union of countable sets is countable. 15 Show that if A has n elements and B has m elements, n, m Z + , then AB has nm elements. What does this have to do with exercise 12 ? 16 Use exercise 15 to show that if A has n N elements, then P(A) has 2n elements. 17 Recall that we dene 0! = 1 and for n > 0 we dene n! = n (n 1) (n 2) 2 1. We dene P erm(A) = {f AA |f is a bijection}. Show that if A has n elements then P erm(A) has n! elements. 18 Dene the relation div = {(a, b)|a, b N, a is a divisor of b}. Show that div is a partial order on N . 19 We dene intervals of real numbers in the usual way: [a, b] = {x R|a x b}, (a, b) = {x R|a < x < b}, etc. Show that if a < b and c < d that (a, b) (c, d) and that [a, b] [c, d]. 20 Show that [0, 1) [0, 1]. 21 Russels Paradox Consider the set U = {A|A A}. Show that U U i U U . Does this paradox mean that set theory is awed ? The answer is yes, in the sense that if you are not careful you will nd yourself in trouble. For example, you may be able to generate contradictions if you dont follow the rules of your set theory. Notice that we didnt state what the rules were and we ran into a paradox! This is what happened after Cantor introduced set theory: he didnt have any restrictions on what could be done with sets and it led to the above famous paradox, discover by Bertrand Russell. As a result of this paradox, people started to look very closely at the foundations of mathematics, and thereafter logic and set theory grew at a much more rapid pace than it had previously. Mathematicians came up with ways to avoid the paradoxes in a way that preserved the essence of Cantors original set theory. In the mean time, most of the mathematicians who were not working on the foundations, but on subjects like Topology and Dierential Equations ignored the paradoxes, and nothing bad happened. We will be able to do the same thing, i.e. we may ignore the technical diculties that can arise in set theory. The reason for this can be well expressed by the story of the patient who complains to his doctor of some pain: Patient : My hand hurts when I play the piano. What should I do ? Doctor: Dont play the piano. In other words, it will be OK for us to just use set theory if we just avoid doing certain things, like forming {A|A A} ! or forming sets like the set that contains all sets. This will denitely cause some trouble. Luckily, nothing we are interested in doing requires anything peculiar enough to cause this kind of trouble. 22 Show that for any set A, A < (A). (Hint-consider the subset of A {a A|a f (a)}. Keep Russels paradox in mind.) 23 Let A be a set of open intervals in the real line with the property that no two of them intersect. Show that A is countable. Generalize to higher dimensions. 24 One way to dene when a set is innite is if there is a bijection between the set and a proper subset of the set. Using this denition show that N, Q and R are innite. 25 Using the denition of innite from exercise 24, show that if a set contains an innite subset, then it is innite. 26 Again, using the denition of innite from exercise 24, show that if a < b then (a, b) and [a, b] are innite. 15

27 Show that that the Sharkovsky ordering of N is a total ordering. 28 Find an explicit way to well-order Q. Can you generalize this result ? 29 A partition of a positive integer is a decomposition of that integer into a sum of positive integers. For example, 5 + 5, and 2 + 6 + 1 + 1 are both partitions of 10. Here order doesnt count, so we consider 1 + 2 and 2 + 1 as the same partition of 3. The numbers that appear in a partitions are called the parts of the partition. Thus the partition of 12, 3 + 2 + 2 + 5, has four parts: 3, 2, 2 and 5. Show that the number of ways to partition n into partitions with m parts is equal to the number of partitions of n where the largest part in any of the partitions is m. (Hint - Find a geometric way to represent a partition.) Can you re-phrase this exercise for sets ? 30 Cantor-Bernstein Theorem Show that if there is an injection from A into B, and an injection from B into A, that there is a bijection between A and B. 31 If P is a partition of A then there is a natural map : A P dened by a [a]. Show that is surjective. 32 The First Isomorphism Theorem for Sets Given a function between two sets f : A B, there is a natural partition Pf with corresponding equivalence relation induced on A by dening [a] = f 1 (f (a)). Prove that (i) Pf is a partition of A, (ii) For all a, b A, a b i f (a) = f (b), (ii) there is a bijection : Pf ran(f ) with the property that f = . This last property is sometimes phrased as saying that the following diagram commutes:
f A Pf B

This theorem occur in several dierent forms, most notably in group theory. 33 Generalize example 1.3.29 to 2-dimensions by considering a sphere with its north pole deleted sitting on a plane. Write down the equations for the map and prove that it is a bijection. This famous map is called the stereographic projection. It was once used by map makers, but is very useful in mathematics (particularly complex analysis) due to its numerous special properties. For example, it maps circles on the sphere to circles or lines in the plane.

1.4

The Natural numbers and Induction

Now that the reader has had a brief (and very intense) introduction to set theory, we now look closely and the set of natural numbers and the method of proof called induction. What is funny about induction is that from studying the natural numbers one discovers a new way to do proofs. You may then wonder, what really comes rst - the set theory or the logic. Again, we wont address these issues, since they would bring us too far astray. We introduce the reader to induction with two examples, and then give a formal statement. The rst example seems simple, and on rst meeting induction through it, the reader may think how can such a simple idea be useful ?. We hope that the second example, which is a more subtle application of induction, 16

Figure 1.8: Induction via light bulbs

Figure 1.9: If bulb 3 is on, then so it 4, 5, etc. answers this question. Induction is not merely useful - it is one of the most powerful techniques available to mathematicians and computers scientists, not to mention one of the most commonly used. Not only can induction be used to prove things, but it usually gives a constructive method for doing so. As a result, induction is closely related to recursion, a crucial concept (to say the least!) in computer science. Our rst example will illustrate the idea of induction, and the second and third are applications. Suppose that there is an innite line of light bulbs, which we can think of as going from left to right, and labeled with 0,1,2,.... The light bulbs are all in the same circuit, and are wired to obey the following rule: If a given light bulb is lit, then the light bulb to the right of it will also be lit. Given this, what can be concluded if one is told that a given light bulb is lit ? Clearly, all of the bulbs to the right of that bulb will also be lit. In particular, what will happen if the rst light bulb in the line is turned on ? It seems to be an obvious conclusion that they must all be on. If the you understands this, then you understand one form induction. The next step is to learn how to apply it. Our second example is a game sometimes called the towers of Hanoi. The game consists of three spindles, A, B and C, and a stack of n disks, which uniformly diminish in size from the rst to the last disk. The disks have holes in them and in the start of the game are stacked on spindle A, with the largest on the bottom to the smallest on the top, as in gure. You are allowed to move a disk and put it on a second spindle provided that there is no smaller disk already there. The goal of the game is to move all of the disks to spindle B. If one plays this game for a little while, it becomes clear that it is pretty tricky. But if it is approached systematically, there is an elegant solution that is nice example of induction.

Figure 1.10: If bulb 0 is lit, then they all are!

Figure 1.11: The Towers of Hanoi First, try solving the game with just two disks. This is easily accomplished through the following moves: disk 1 rst goes to C, then disk 2 goes to B, then disk 1 goes to B. Notice that there is nothing special about spindle B here - we could have moved all of the disks to spindle C in the same manner. Now consider the game with three disks. We come to this problem with the solution of the rst, and we use it as follows. We know that we are free to move the top two disks around however we like, keeping the smaller one above larger one. So use this to move them to spindle C. Then move the remaining third disk to spindle B. Finally, apply our method of moving the two disks again to the disks on C, moving them to B. This wins the game for us. The game with three disks then allows us to solve the game with four disks. (Do you see how ?) At this point, if you see the general pattern you probably will say ahh! you can then always solve the game for any number of disks!. This means that the induction is so clear to you, that you dont even realize that it is there! Lets look closely at the logic of the game. Suppose that Pn is the statement the game with n disks can be solved. We saw that P2 was a true statement. And after seeing that Pn Pn+1 we immediately concluded that Pn was always true. Well, this certainly is intuitive. And you may even see how this is really the same as the light bulb example. So we now ask, what is it that allows us to conclude that Pn is true for all n ? Or, what is it that allows us to conclude that all of the lightbulbs are on if the rst one is on ? Well, if one wants a formal proof, the right way to start is to say Its so obvious, how could it be otherwise ?. This is key to illuminating what is going on here. Assume that for some positive integer, m, Pm was false. Then it must be that Pm1 is also false, or else we would have a contradiction. You may be tempted to try the same reasoning on Pm1 , concluding that Pm2 is false. Where does it end ? If there was a smallest value k for which Pk was false, then we would know that Pk1 was true. But we know that if Pk1 is true that then Pk must be true, a contradiction. To sum up, if one of the Pn s was false, then we can make a contradiction by looking at the smallest one that is false, so they must all be true. This is all perfectly correct, and the fact that we are using that is so crucial is the following: Every non-empty subset of N has a smallest member. The above fact is known as the well-ordering property of N, and we will take this as an axiom. 1 To sum up, the principle of induction is as follows: if one know that P1 is true and that Pn Pn+1 for every n, then Pn is true for every n. Incidentally, you might nd it useful to ask yourself if there are any sets besides N for which the above is true. For example, it is certainly not true for the integers. It turns out that any set can be ordered in such a way that the above property holds, if you have the right axioms of set theory. An ordering with this property is called a well-ordering. If you try to imagine a well-ordering of R, you will see that this is a very strange business!
1 There are many dierent ways to approach this - one can even start with a set of axioms for the real numbers and dene N and then prove the well-ordering property.

Example 1.4.1 Show that 1 + 2 + + n =

n(n 1) . 2 Proof Let the above statement be Pn . Clearly it is true if n = 1. If we know Pn is true then adding n + 1 to both sides of the equality gives 1+2++n+n+1= n(n 1) + n + 1, 2

but some algebra shows that this is just Pn+1 . Therefore, Pn Pn+1 , so by induction the formula is always true for all n = 1, 2, .... This is a classic example, and probably the single most used example in teaching induction. There is one troubling thing about this though, namely where does the formula come from ? This is actually the hard part! In this case there is a geometric way to guess the above formula (see exercise 4). On one hand, mathematical induction provides a way of proving statements that you can guess, so it doesnt provide anything new except in the verication of a statement. But on the other hand it is closely related to recursion, and can be used in a constructive fashion to solve a problem. If we stopped at this point, you might have gotten the impression from the above example that induction is a trick for proving that certain algebraic formulas or identities are true. Induction is a very useful tool for proving such things, but it can handle much more general things, such as the following example. Example 1.4.2 Recall that an integer > 1 is said to be prime if its only divisors are 1 and itself. Show that every integer > 1 is a product of prime numbers. Proof The proof is by induction. Let Pn be the statement every integer from 2 and n is a product of primes. The base case, P2 , 2 is a product of primes, is certainly true. We want to show that if we assume that Pn is true that we can prove Pn+1 , so we assume that every integer from 2 and n is a product of primes. Pn+1 is the statement every integer from 2 to n + 1 is a product of primes. This means that we need to show that n + 1 is a product of primes. If n + 1 is a prime number then certainly this is true. Otherwise, n + 1 = ab where a and b are integers with 2 a, b n. But now we can use Pn to conclude that a and b are products of primes. But then clearly n + 1 is a product of primes since n + 1 = ab.

Exercises
1 Prove that 1 1 1 1 1 + + + =1 . 12 23 34 n(n + 1) n+1

Can you give a proof without induction ?

2 Write a computer program that wins the towers of Hanoi game. Do you see the relationship between induction and recursion ? 3 How many moves does it take to win the towers of Hanoi game if it has n disks ? 4 Find a formula for the sum 1 + 3 + 5 + + 2n 1, and prove it by induction. There are at least two ways to do this, one using a geometric representation of the sum as is indicated below. 5 Find a geometric representation, like that given in exercise 4, for the formula in example 1.4.1. 1 xn+1 . 1x It is also possible to prove this directly, using algebra. For what x is this formula true (e.g. does it work when x is a real number, a complex number, a matrix, an integer mod k, etc.) ? 6 Use induction to prove the formula for the sum of a geometric series: 1+x+x2 +x3 ++xn = 19

7 Use induction to show that 12 + 22 + + n2 = ? 8 Notice that

n(n + 1)(2n + 1) . What about for sums of higher powers 6

13 = (1)2 , 13 + 23 = (1 + 2)2 , 13 + 23 + 33 = (1 + 2 + 3)2 . Can you formulate a general statement and prove it with induction ? 9 Recall that we dene 0! = 1 and for n > 0 we dene n! = n (n 1) (n 2) 2 1. Notice that 1! = 1!, 1! 3! = 3!, 1! 3! 5! = 6!, 1! 3! 5! 7! = 10!. Can you formulate a general statement and prove it with induction ? 10 We dene Cn,k = n! . Use induction to show that Cn,0 + Cn,1 + + Cn,n = 2n . k!(n k)!

11 If one had n objects, then the number of way of picking k objects from the n objects is in fact Cn,k . Use this fact to give another solution to exercise 10. 12 In section 1.3, exercises 15, 16 and 17, did you use induction in your proofs ? If not, redo them using induction. 13 For a xed n = 1, 2, ... we dene an n-vector x = (x1 , ..., xn ) as an ordered n-tuple of real numbers. The norm of ||x|| is the number x2 + + x2 . We may add and subtract vectors componentwise. The triangle n 1 inequality states that for any three n-vectors x, y and z ||x z|| ||x y|| + ||y z||. Using this and induction prove the generalized triangle inequality: If x1 , ..., xm are n-vectors, then ||x1 xm || ||x1 x2 || + ||x2 x3 || + + ||xm1 xm ||. 14 (This exercise requires some knowledge of linear algebra) Show that the determinant of an upper triangular matrix is the product of its diagnol entries.

1.5

Foundations of Language Theory

We now begin to lay the mathematical foundations of languages that we will use throughout the rest of this book. Our viewpoint a language is a set of strings. In turn, a string is a nite sequence of letters from some alphabet. These concepts are dened rigorously as follows. Denition 1.5.1 An alphabet is any nite set. We will usually use the symbol to represent an alphabet and write = {a1 , . . . , ak }. The ai are called the symbols of the alphabet. Denition 1.5.2 A string (over ) is a function u : {1, ..., n} or the function : . The latter is called the empty string or null string and is sometimes denoted by , , e or 1. If a string is non-empty then we may write it by listing the elements of its range in order. Example 1.5.3 = {a, b}, u : {1, 2, 3} by u(1) = a, u(2) = b and u(3) = a. We write this string as aba. The set of all strings over is denoted as . Thus {a} = {an |n = 0, 1, ...}, where we have introduced the convention that a0 = . Observe that is a countable set. It is a useful convention to use letters from the beginning of the alphabet to represent single letters and letters from the end of the alphabet to represent strings. Warning Although letters like a and b are used to represent specic elements of an alphabet, they may also be used to represent variable elements of an alphabet, i.e. one may encounter a statement like Suppose that = {0, 1} and let a .

A language (over ) is a subset of . Concatenation is a binary operation on the strings over a given alphabet . If u : {1, ..., m} and v : {1, ..., n} then we dene u v : {1, ..., m + n} as u(1)...u(m)v(1)...v(n) or (u v)(i) = u(i) for 1 i m v(i m) for m + 1 i m + n.

If u is the empty string then we dene u v = v and similarly if v is the empty string. Generally the dot will not be written between letters. Remarks Concatenation is not commutative, e.g. (ab)(bb) = (bb)(ab). But it is true that for any string u, un um = um un . Concatenation is associative, i.e. u(vw) = (uv)w. u is a prex of v if there exists y such that v = uy. u is a sux of v if there exists x such that v = xu. u is a substring of v if there exists x and y such that v = xuy. We say that u is a proper prex (sux, substring) of v i u is a prex (sux, substring) of v and u = v. Each of the above relations are partial orders on . Given that is a totally ordered set, e.g. = {a1 , ..., an }, then there is a natural extension to a total order on , called the lexicographic ordering. We dene u v if u is a prex of v or there exists x, y, z and ai , aj such that in the order of we have that ai < aj and u = xai y and v = xaj z.

Exercises
1 Given a string w, its reversal wR is dened inductively as follows: R = , (ua)R = auR , where a . Also, recall that u0 = , and un+1 = un u. Prove that (wn )R = (wR )n . 2 Suppose that a and b are two dierent members of an alphabet. Prove that ab = ba. 3 Suppose that u and v are non-empty strings over an alphabet. Prove that if uv = vu then there is a string w and natural numbers m, n such that u = wm , v = wn . 21

4 Prove that for any alphabet , is a countable set. 5 Lurking behind the notions of alphabet and language is the idea of a semi-group, i.e. a set equipped with an associative law of composition that has an identity element. is the free semi-group over . Is a given language over necessarily a semi-group ?

1.6

Operations on Languages

A way of building more complex languages from simpler ones is to combine them using various operations. The union and intersection operations we have already seen. Given some alphabet , for any two languages S, T over , the dierence S T of S and T is the language S T = {w | w S and w T }. /

The dierence is also called the relative complement . A special case of the dierence is obtained when S = , in which case we dene the complement L of a language L as L = {w | w L}. / The above operations do not make use the structure of strings. T he following operations make use of concatenation. Denition 1.6.1 Given an alphabet , for any two languages S, T over , the concatenation ST of S and T is the language ST = {w | u S, v T, w = uv}. L0 = {}, Ln+1 = Ln L.

For any language L, we dene Ln as follows:

Example 1.6.2 For example, if S = {a, b, ab}, T = {ba, b, ab} and U = {a, a2 , a3 } then S 2 = {aa, ab, aab, ba, bb, bab, aba, abb, abab}, T 2 = {baba, bab, baab, bba, bb, abba, abb, abab}, U 2 = {a2 , a3 , a4 , a5 , a6 }, ST = {aba, ab, aab, bba, bab, bb, abba, abb}. Notice that even though S, T and U have the same number of elements, their squares all have dierent numbers of elements. See the exercises for more on this funny phenomenon. Multiplication of languages has lots of nice properties, such as L = , and L{} = L. In general, ST = T S. So far, all of the operations that we have introduced preserve the niteness of languages. This is not the case for the next two operations.

Denition 1.6.3 Given an alphabet , for any language L over , the Kleene -closure L of L is the innite union L = L0 L1 L2 . . . Ln . . . . The Kleene +-closure L+ of L is the innite union L+ = L1 L2 . . . Ln . . . . Since L1 = L, both L and L+ contain L. Also, notice that since L0 = {}, the language L always contains , and we have L = L+ {}.

Remark has already been dened when is an alphabet. Modulo some set theory, the Kleene *-closure of coincides with this previous denition if we view as a language over itself. Therefore the Kleene *-closure is an extension of our original * operation.

However, if L, then L+ . / /

Exercises
1 Prove the following identities: (i) L = , (ii) L = , (iii) L{} = L, (iv) {}L = L, (v) (S {})T = ST T, (vi) S(T {}) = ST S, (vii) Ln L = LLn . (viii) = {}, (x) L = L , (xi) L L = L . 2 Given a language L over , we dene the reverse of L as LR = {wR | w L}. For each of the following, either prove equality or provide a counter example. Which of the false equalities can be made true by replacing = with a containment sign ? (i)(S T )R = S R T R ; (ii)(ST )R = T R S R ; (iii)(L )R = (LR ) . (iv)(S T ) = S T . (v)(ST ) = T S 3 Prove that if LL = L then L contains the empty string or L = . 4 Suppose that L1 and T are languages over a two letter alphabet. If ST = T S is S = T ? 5 Does A = B imply that A = B ? Find a counter example or provide a proof. 23 (ix) L+ = L L,

a 1 a b b 3 2 a b b a 4

Figure 1.12: A dfa 6 Let Lk = {a, a2 , ..., ak }. How many elements are in L2 ? k 7 If L is a nite language with k elements, show that L2 has at most k 2 elements. For each positive integer k nd a language with k elements whose square has k 2 elements. Can this be done with a language over a one letter alphabet ? 8 If L is a nite language with k elements, show that L2 has at least k elements. How close can you come to this lower bound with an example ?

1.7

Deterministic Finite Automata

We are now ready to dene the basic type of machine, the Deterministic Finite Automaton, or DFA. These objects will take a string and either accept or reject it, and thus dene a language. Our task is to rigorously dene these objects and then explain what it means for one of them to accept a language. Example 1.7.1 Let = {a, b}, S = {ab, abab, ababab, ...}. A machine to accept S is A few comments are necessary. First of all, the small arrow on the left of the diagram is pointing to the start state, 1, of the machine. This is where we input strings. The circle on the far right with the smaller circle and the 4 in it is a nal state, which is where we need to nish if a string is to be accepted. Our convention is that we always point a little arrow to the start state and put little circles in the nal states (there may be more than one). How does this machine process strings ? Take abb for example. We start at state 1 and examine the leftmost letter of the string rst. This is an a so we move from state 1 to 2. Then we consider the second leftmost letter, b, which according to the machine, moves us from state 2 to state 4. Finally, we read a b, which moves us from state 4 to state 3. State 3 is not a nal state, so the string is rejected. If our string had been ab, we would have nished in state 4, and so the string would be accepted. What roles is played by state 3 ? If a string has been partially read into the machine and a sequence of abs has been encountered then we dont know yet if we want to keep the string, until we get to the end or we get an aa or bb. So we bounce back and forth between states 2 and 4. But if, for example, we encounter the letters bb in our string, then we know that we dont want to accept it. Then we go to a state that is not nal, 3, and stay there. State 3 is an example of a dead state, i.e. a non-nal state where all of the outgoing arrows point back at the state. Example 1.7.2 Let T = S {}. A machine that accepts T is The point here is that if we allow the empty string we can simplify the machine. The interpretation of processing the empty string is simply that we start at state 1 and move to state 1. Thus, if the start state is also a nal state, then empty string is accepted by the machine. The formal denition of a DFA should now more accessible to the reader. 24

a 1 b b 3 a 2

a,b
Figure 1.13: Denition 1.7.3 A deterministic nite automata is a 5-tuple (Q, , , q0 , F ) where Q and are nite sets, called the states and the alphabet, : Q Q is the transition function, q0 Q is a distinguished state called the start state and F is a subset of the set of states, known as the set of nal states. Notice that our denition doesnt say anything about how to compute with a DFA. To do that we have to make more denitions. The function obviously corresponds to the labeled arrows in the examples we have seen: given that we are in a state p, if we receive a letter a then we move to (p, a). But this doesnt tell us what to do with an element of . We need to extend to a function where : Q Q. This is done inductively by letting (q, ) = q and (q, ua) = ( (q, u), a) where a and u . We then have Denition 1.7.4 If M = (Q, , , q0 , F ) is a DFA then we dene the language accepted by M to be L(M ) = {w | (q0 , w) F }. A language L is said to be regular if there is a DFA M such that L = L(M ). Sometimes we say that M recognizes L. The fact that not every language over a given alphabet is regular can be proved by a cardinality argument. The number of possible dfas that can be constructed over a given alphabet is a countable set, but the total number of languages over any alphabet is uncountable. So in fact most languages are not regular!. The class of regular languages is the smallest class of languages in a hierarchy of classes that we will consider. To explicitly give an example of a language that is not regular though, we will need something called the pumping lemma. But rst we will give more examples of DFAs and their languages. Example 1.7.5 If L = {w {a, b} | w contains an odd number of a s} then a DFA specifying L is Example 1.7.6 If L = {w {a, b} | w ends in the string abb} then a DFA specifying L is A useful concept is the length of a string w, denoted |w|, which is dened to be the total number of letters in a string if the string is non-empty, and 0 is the string is empty. 25

a a
Figure 1.14:
b

a a

a
b b

a
b

Figure 1.15: Example 1.7.7 If L = {w {a, b} | |w| = 3 } then a DFA specifying L is

Example 1.7.8 If L = {w {a, b} | w = an bm , n, m > 0 } then a DFA specifying L is Example 1.7.9 If L = {w {a} | |w| = a4k+1 , k 0 } then a DFA specifying L is

Exercises
1 Write a computer program taking as input a DFA D = (Q, , , q0 , F ) and a string w, and returning the sequence of states traversed along the path specied by w (from the start state). The program should also indicate whether or not the string is accepted. 2 Show that if L is regular then LR is regular. 3 Construct DFAs for the following languages: (a) {w | w {a, b} , w has neither aa nor bb as a substring}. (b) {w | w {a, b} , w has an odd number of bs and an even number of as}. 4 Let L be a regular language over some alphabet . (a) Is the language L1 consisting of all strings in L of length 200 a regular language? (b) Is the language L2 consisting of all strings in L of length > 200 a regular language? Justify your answer in both cases.
a; b a; b a; b a; b a; b

Figure 1.16:

a a
b b

a; b

Figure 1.17:

a a a
Figure 1.18: 5 Classify all regular languages on a one letter alphabet. 6 Suppose that L is a language over and one letter alphabet and L = L . Show that L is regular. 7 How many distinct DFAs are there on a given set of n states over an alphabet with k letters ? 8 Show that every nite language is regular. 9 Suppose that a language L is nite. What is the minimum number of states that a machine accepting L need have ? 10 Let be an alphabet, and let L1 , L2 , L be languages over . Prove or disprove the following statements (if false, then provide a counter example). (i) If L1 L2 is a regular language, then either L1 or L2 is regular. (ii) If L1 L2 is a regular language, then either L1 or L2 is regular. (iii) If L is a regular language, then L is regular. 11 Dene a language to be one-state if there is a DFA accepting it that has only one nal state. Show that every regular language is a nite union of one-state languages. Give an example of a language that is not one-state and prove that it is not. 12 A DFA is said to be connected if given q Q there is a string w such that (q0 , w) = q. Show that if a language is regular, then there is a connected DFA accepting that language. 13 What eect does the changing of the start state of a given machine have on the language accepted by that machine ? 27

14 What eect does the changing of the nal states of a given machine have on the language accepted by that machine ? 15 In our denition of a DFA we had that : Q Q. We could have curried our map, and instead dened : QQ , i.e. every letter a gives rise to a map fa : Q Q where fa (q) = (a, q) (see the exercises of 1.2). We may then dene : QQ . is an example of a monoid action. For a given machine it may be the case that ( ) P erm(Q), where P erm(Q) is the set of bijections from Q to itself. Show that if this is the case and the machine is connected that for each letter a of and n N there is a string accepted by the machine which contains a power of a greater than n. 16 For any language L over , dene the prex closure of L as P re(L) = {u |v such that uv L}. Is it true that L being regular implies P re(L) is regular ? What about the converse ? 17 Show that {an bm |n, m N are relatively prime} is not regular.

1.8

The Cross Product Construction

Now that we have dened what it means for a language to be regular over an alphabet, it is natural to ask what sort of closure properties this collection has under some of the dened properties, i.e. is it closed under unions, intersection, reversal, etc. To answer these questions we are led to some new constructions. The most natural question to ask is whether the union of regular languages is regular. In this case we must restrict ourselves to nite unions, since every language is the union of nite languages. It is in fact true that the union of two regular languages over a common alphabet is regular. To show this we introduce the notion of the cross product of DFAs. Let = {a1 , . . . , am } be an alphabet and suppose that we are given two DFAs D1 = (Q1 , , 1 , q0,1 , F1 ) and D2 = (Q2 , , 2 , q0,2 , F2 ), accepting L1 and L2 respectively. We will show that the union, the intersection, and the relative complement of regular languages is a regular language. First we will explain how to construct a DFA accepting the intersection L1 L2 . The idea is to construct a DFA simulating D1 and D2 in parallel. This can be done by using states which are pairs (p1 , p2 ) Q1 Q2 . Dene a DFA, D, as follows: D = (Q1 Q2 , , , (q0,1 , q0,2 ), F1 F2 ), ((p1 , p2 ), a) = (1 (p1 , a), 2 (p2 , a)), for all p1 Q1 , p2 Q2 , and a . Clearly D is a DFA, since D1 and D2 are. Also, by the denition of , we have
((p1 , p2 ), w) = ((1 (p1 , w), 2 (p2 , w)),

where the transition function : (Q1 Q2 ) Q1 Q2 is dened by

for all p1 Q1 , p2 Q2 , and w . Example 1.8.1 A product of two DFAs with two states each is given below.
We have that w L(D1 ) L(D2 ) i w L(D1 ) and w L(D2 ),i 1 (q0,1 , w) F1 and 2 (q0,2 , w) F2 , i ((q0,1 , q0,2 ), , w) F1 F2 , i w L(D). Thus L(D) = L(D1 ) L(D2 ).

We can now modify D very easily to accept L(D1 )L(D2 ). We change the set of nal to (F1 Q2 )(Q1 F2 ). Then 28

cross product figure

Figure 1.19: w L(D1 ) L(D2 )i w L(D1 ) or w L(D2 )i 1 (q0,1 , w) F1 or 2 (q0,2 , w) F2 , i ((q0,1 , q0,2 ), w) (F1 Q2 ) (Q1 F2 ), i w L(D). Thus, L(D) = L(D1 ) L(D2 ). We can also modify D to accept L(D1 ) L(D2 ). We change the set of nal states to F1 (Q2 F2 ). Then w L(D1 ) L(D2 )i w L(D1 ) and w L(D2 ), i / 1 (q0,1 , w) F1 and2 (q0,2 , w) F2 , i / ((q0,1 , q0,2 ), w) F1 (Q2 F2 ), i w L(D). Thus, L(D) = L(D1 ) L(D2 ). In all cases if D1 has n1 states and D2 has n2 states then the DFA D has n1 n2 states.

Exercises
1 A morphism between two DFAs D1 = (Q1 , , 1 , q0,1 , F1 ) and D2 = (Q2 , , 2 , q0,2 , F2 ) is a function f : Q1 Q2 such that f (1 (q, a)) = 2 (f (q), a) for all q Q1 , a , f (q0,1 ) = q0,2 and f (F1 ) F2 (note that we require D1 and D2 to have the same alphabets). If the morphism is surjective then we say that D2 is the homomorphic image of D1 . i) Show that if u then f (1 (q, u)) = 2 (f (q), u) for all q Q1 , a . ii) Show that if f : D1 D2 is a morphism, then L(D1 ) L(D2 ). When is L(D1 ) = L(D2 ) ? iii) By the product of D1 and D2 , denoted D1 D2 , we mean the product machine described above with F1 F2 as nal states. Show D1 and D2 are homomorphic images of D1 D2 . 2 A morphism f between D1 and D2 is called an isomorphism if it is bijective and f (F1 ) = F2 . If a isomorphism between D1 and D2 exists then D1 and D2 are said to be isomorphic and we write D1 D2 . i) Show that the inverse of an isomorphism is a morphism, hence is an isomorphism. ii) Show that D1 D2 implies that L(D1 ) = L(D2 ). iii) Show that for a given alphabet there is a machine I over that alphabet such that for any other DFA D over the same alphabet we have that D I I D D. 3 (For readers who like Category theory) Show that the collection of DFAs over a xed alphabet forms a category. Are there products ? Coproducts ? Is there a terminal object ?

1.9

Non-Deterministic Finite Automata

There would appear to be a number of obvious variations on the denition of a DFA. One might allow for example, that the transition function not necessarily be dened for every state and letter, i.e. not every state has || arrows coming out of it. Or perhaps we could allow many arrows out of a state labeled with the same letter. Whatever we dene though, we have the same issue to confront that we did for DFAs, namely, what does mean for a machine to accept a language? After this is done, we will see that all the little variations dont give us any new languages, i.e. the new machines are not computing anything dierent then the old. Then why bother with them ? Because they make the constructions of some things much easier and often make proofs clear where they were not at all clear earlier. The object that we dene below has the features that we just described, plus we will now allow arrows to be labeled with . Roughly, the idea here is that you can move along an arrow labeled if that is what you want to do. Denition 1.9.1 A non-deterministic nite automata (or N F A) is a ve-tuple (Q, , , q0 , F ) where Q and are nite sets, called the states and the alphabet, : Q ( {}) 2Q is the transition function, q0 Q is a distinguished states, called the start state and F is a subset of the set of states, known as the set of nal states. There are three funny things about this denition. First of all is the non-determinism, i.e. given a string there are many paths to choose from. This is probably the hardest to swallow, as it seems too powerful. The next thing is that we have moves. One way to think of this is that we take our string and stick in epsilons wherever we want, and then feed it to the machine. Thirdly, since the range of is the power set of Q, a pair (q, a) may be mapped to the null set, which in terms of arrows and a diagram means that no arrow out the state p has the label a. We would like to dene the language accepted by N , and for this, we need to extend the transition function : Q ( {})2Q to a function : Q 2 Q . The presence of -transitions (i.e., when q (p, )) causes technical problems. To overcome these problems we introduce the notion of -closure. Denition 1.9.2 For any state p of an NFA we dene the -closure of p to be set -closure(p) consisting of all states q such that there is a path from p to q whose spelling is . This means that either q = p, or that all the edges on the path from p to q have the label . We can compute -closure(p) using a sequence of approximations as follows. Dene the sequence of sets of states (-cloi (p))i0 as follows: -clo0 (p) = {p},

-cloi+1 (p) = -cloi (p) {q Q | s -cloi (p), q (s, )}. Since -cloi (p) -cloi+1 (p), -cloi (p) Q, for all i 0, and Q is nite, there is a smallest i, say i0 , such that -cloi0 (p) = -cloi0 +1 (p), and it is immediately veried that -closure(p) = -cloi0 (p).

(It should be noted that there are more ecient ways of computing -closure(p), for example, using a stack (basically, a kind of depth-rst search.)) When N has no -transitions, i.e., when (p, ) = for all p Q (which means that can be viewed as a function : Q 2Q ) we have -closure(p) = {p}. Given a subset S of Q, we dene -closure(S) as -closure(S) =
pS

-closure(p).

When N has no -transitions we have -closure(S) = S. We are now ready to dene the extension : Q 2Q of the transition function : Q ( {})2Q . The intuition behind the extended transition function is that (p, w) is the set of all states reachable from p by a path whose spelling is w. Denition 1.9.3 Given an NFA N = (Q, , , q0 , F ) (with -transitions), the extended transition function : Q 2Q is dened as follows: for every p Q, every u and every a , (p, ) = -closure({p}), (p, ua) = -closure
s (p,u)

(s, a) .

The language L(N ) accepted by an NFA N is the set L(N ) = {w | (q0 , w) F = }. We now extend : Q 2Q to a function : 2Q 2Q dened as follows: for every subset S of Q, for every w , (S, w) = (p, w).
pS

Let Q be the subset of 2Q consisting of those subsets S of Q that are -closed, i.e., such that S = -closure(S). If we consider the restriction : Q Q of : 2Q 2Q to Q and , we observe that is the transition function of a DFA. Indeed, this is the transition function of a DFA accepting L(N ). It is easy to show that is dened directly as follows (on subsets S in Q): (S, a) = -closure
sS

(s, a) .

Then, the DFA D is dened as follows: D = (Q, , , -closure({ }), F), It is not dicult to show that L(D) = L(N ), that is, D is a DFA accepting L(N ). Thus we have converted the NFA N into a DFA D (and gotten rid of -transitions). The states of the DFA D equivalent to N are -closed subsets of Q. For this reason the above construction is often called the subset construction. It is due to Rabin and Scott. Although theoretically ne, the method may construct useless sets S that are not reachable from the start state -closure({q0 }). A more economical construction is given below. 31 where F = {S Q | S F = }.

An Algorithm to convert an NFA into a DFA: The subset construction Given an input NFA N = (Q, , , q0, F ), a DFA D = (K, , , S0, F) is constructed. It is assumed that is a list of triples (S, a, T ), with S, T K, and a . S0 := -closure({q0}); K := {S0}; total := 1; marked := 0; := nil; while marked < total do; marked := marked + 1; S := K[marked]; for each a do U := sS (s, a); T := -closure(U ); if T K then / total := total + 1; K[total] := T endif; := append[, (S, a, T )] endfor endwhile; F := {S K | S F = }
Exercises
1 Implement the subset algorithm for converting an NFA with -transitions to a DFA. Pay particular attention to the input and output format. Explain your data structures. How do you deal with set comparisons? 2 Let = {a1 , . . . , an } be an alphabet of n symbols. (i) Construct an NFA with 2n + 1 states accepting the set Ln of strings over such that, every string in Ln has an odd number of ai , for some ai . Equivalently, if Li is the set of all strings over with an odd n number of ai , then Ln = L1 Ln . n n (ii) Prove that there is a DFA with 2n states accepting the language Ln .

3 Prove that every DFA accepting Ln (from problem 2) has at least 2n states. Hint : If a DFA D with k < 2n states accepts Ln , show that there are two strings u, v with the property that, for some ai , u contains an odd number of ai s, v contains an even number of ai s, and D ends in the same state after processing u and v. From this, conclude that D accepts incorrect strings. 4 (i) Given two alphabets and with , then we may dene a map e : by simply taking a string in and removing all occurrences of elements of , i.e. just erase the letters that arent in . Show that if L is regular over then e(L) is regular over . (ii) On the other hand, suppose that L . Then for any a , dene the a-blow-up of L to be La = {ak1 u1 ...akn un |u1 ...un L and k1 , .., kn 0}. Show that La is regular if L is regular. (iii) Again, suppose that L . Dene the blow-up of L relative to to be L = {w1 u1 ...wn un |u1 ...un L and w1 , ...wn }. Is L regular over ?

1.10

Directed Graphs and Paths

It is often useful to view DFAs and NFAs as labeled directed graphs. The purpose of this section is to review some of these concepts. We begin with directed graphs. Our denition is very exible, since it allows parallel edges and self loops. Denition 1.10.1 A directed graph is a quadruple G = (V, E, s, t), where V is a set of vertices, or nodes, E is a set of edges, or arcs, and s, t: E V are two functions, s being called the source function, and t the target function. Given an edge e E, we also call s(e) the origin (or source) of e, and t(e) the endpoint (or target ) of e. Remark The functions s, t need not be injective or surjective. Thus, we allow isolated vertices. Example 1.10.2 Let G be the directed graph dened such that E = {e1 , e2 , e3 , e4 , e5 , e6 , e7 , e8 }, V = {v1 , v2 , v3 , v4 , v5 , v6 }, and s(e1 ) = v1 , s(e2 ) = v2 , s(e3 ) = v3 , s(e4 ) = v4 , s(e5 ) = v2 , s(e6 ) = v5 , s(e7 ) = v5 , s(e8 ) = v5 , t(e1 ) = v2 , t(e2 ) = v3 , t(e3 ) = v4 , t(e4 ) = v2 , t(e5 ) = v5 , t(e6 ) = v5 , t(e7 ) = v6 , t(e8 ) = v6 . Such a graph can be represented by the following diagram: In drawing directed graphs, we will usually omit edge names (the ei ), and sometimes even the node names (the vj ). We now dene paths in a directed graph. Denition 1.10.3 Given a directed graph G = (V, E, s, t) for any two nodes u, v V , a path from u to v is a triple = (u, e1 . . . en , v), where e1 . . . en is a string (sequence) of edges in E such that, s(e1 ) = u, t(en ) = v, and t(ei ) = s(ei+1 ), for all i such that 1 i n 1. When n = 0, we must have u = v, and the path (u, , u) is called the null path from u to u. The number n is the length of the path. We also call u the source (or origin) of the path, and v the target (or endpoint ) of the path. When there is a nonnull path from u to v, we say that u and v are connected . Remark In a path = (u, e1 . . . en , v), the expression e1 . . . en is a sequence, and thus, the ei are not necessarily distinct. For example, the following are paths: 1 = (v1 , e1 e5 e7 , v6 ), 2 = (v2 , e2 e3 e4 e2 e3 e4 e2 e3 e4 , v2 ), and 2 = (v1 , e1 e2 e3 e4 e2 e3 e4 e5 e6 e6 e8 , v6 ). Clearly, 2 and 3 are of a dierent nature from 1 . Indeed, they contain cycles. This is formalized as follows. 33

e7 e8

v2
e2

v3
Figure 1.20:

Denition 1.10.4 Given a directed graph G = (V, E, s, t), for any node u E a cycle (or loop) through u is a nonnull path of the form = (u, e1 . . . en , u) (equivalently, t(en ) = s(e1 )). More generally, a nonnull path = (u, e1 . . . en , v) contains a cycle i for some i, j, with 1 i j n, t(ej ) = s(ei ). In this case, letting w = t(ej ) = s(ei ), the path (w, ei . . . ej , w) is a cycle through w. A path is acyclic i it does not contain any cycle. Note that each null path (u, , u) is acyclic. Obviously, a cycle = (u, e1 . . . en , u) through u is also a cycle through every node t(ei ). Also, a path may contain several dierent cycles. Paths can be concatenated as follows. Denition 1.10.5 Given a directed graph G = (V, E, s, t), two paths 1 = (u, e1 . . . em , v) and 2 = (u , e . . . e , v ) can be concatenated provided that v = u , in which case their concatenation is the path 1 n 1 2 = (u, e1 . . . em e . . . e , v ). n 1 It is immediately veried that the concatenation of paths is associative, and that the composition of the path = (u, e1 . . . em , v) with the null path (u, , u) or with the null path (v, , v) is the path itself. The following fact, although almost trivial, is used all the time, and is worth stating precisely. Lemma 1.10.6 Given a directed graph G = (V, E, s, t), if the set of nodes V has m 1 elements (in particular, it is nite), then every path of length at least m contains some cycle. Proof . Let = (u, e1 . . . en , v). By the hypothesis, n m. Consider the sequence of nodes (u, t(e1 ), . . . , t(en1 ), t(en ) = v). This sequence contains n + 1 elements. Since n m, we have n + 1 > m, and by the so-called pigeonhole principle, since V only contains m distinct nodes, some node in the sequence must appear twice. This shows that either t(ej ) = u = s(e1 ) for some j with 1 j n, or t(ej ) = t(ei ), for some i, j, with 1 i < j n, and thus t(ej ) = s(ei+1 ), with 1 i < j n. Combining both cases, we have t(ej ) = s(ei ) for some i, j, with 1 i j n, which yields a cycle. A consequence of lemma 1.10.6 is that in a nite graph with m nodes, given any two nodes u, v V , in order to nd out whether there is a path from u to v, it is enough to consider paths of length m 1. Indeed, if there is path between u and v, then there is some path of minimal length (not necessarily unique, but this doesnt matter). If this minimal path has length at least m, then by the lemma, it contains a cycle. However, by deleting this cycle from the path , we get an even shorter path from u to v, contradicting the minimality of .

Exercises
1 Let D = (Q, , , q0 , F ) be a DFA. Suppose that every path in the graph of D from the start state to some nal state is acyclic. Does it follow that L(D) is a nite language?

1.11

Labeled Graphs and Automata

Denition 1.11.1 A labeled directed graph is a tuple G = (V, E, L, s, t, ), where V is a set of vertices, or nodes, E is a set of edges, or arcs, L is a set of labels, s, t: E V are two functions, s being called the source function, and t the target function, and : E L is the labeling function. Given an edge e E, we also call s(e) the origin (or source) of e, t(e) the endpoint (or target ) of e, and (e) the label of e. Note that the function need not be injective or surjective. Thus, distinct edges may have the same label.

a
e6

a
v6

e8 b

e5 b

a
e1

a
v2
e2

a e3
b

v3
Figure 1.21: Example 1.11.2 Let G be the directed graph dened such that E = {e1 , e2 , e3 , e4 , e5 , e6 , e7 , e8 }, V = {v1 , v2 , v3 , v4 , v5 , v6 }, L = {a, b}, and s(e1 ) = v1 , s(e2 ) = v2 , s(e3 ) = v3 , s(e4 ) = v4 , s(e5 ) = v2 , s(e6 ) = v5 , s(e7 ) = v5 , s(e8 ) = v5 , t(e1 ) = v2 , t(e2 ) = v3 , t(e3 ) = v4 , t(e4 ) = v2 , t(e5 ) = v5 , t(e6 ) = v5 , t(e7 ) = v6 , t(e8 ) = v6 , (e1 ) = a, (e2 ) = b, (e3 ) = a, (e4 ) = a, (e5 ) = b, (e6 ) = a, (e7 ) = a, (e8 ) = b. Such a labeled graph can be represented by the following diagram: In drawing labeled graphs, we will usually omit edge names (the ei ), and sometimes even the node names (the vj ). Paths, cycles, and concatenation of paths are dened just as before (that is, we ignore the labels). However, we can now dene the spelling of a path. Denition 1.11.3 Given a labeled directed graph G = (V, E, L, s, t, ) for any two nodes u, v V , for any path = (u, e1 . . . en , v), the spelling of the path is the string of labels (e1 ) . . . (en ). When n = 0, the spelling of the null path (u, , u) is the null string . 36

For example, the spelling of the path 2 = (v1 , e1 e2 e3 e4 e2 e3 e4 e5 e6 e6 e8 , v6 ) is abaabaabaab. Every DFA and every NFA can be viewed as a labeled graph, in such a way that the set of spellings of paths from the start state to some nal state is the language accepted by the automaton in question. Given a DFA D = (Q, , , q0 , F ), where : Q Q, we associate the labeled directed graph GD = (V, E, L, s, t, ) dened as follows: V = Q, E = {(p, a, q) | q = (p, a), p, q Q, a },

L = , s((p, a, q)) = p, t((p, a, q)) = q, and ((p, a, q)) = a. Such labeled graphs have a special structure that can easily be characterized. It is easily shown that a string w is in the language L(D) = {w | (q0 , w) F } i w is the spelling of some path in GD from q0 to some nal state. Similarly, given an NFA N = (Q, , , q0 , F ), where : Q ( {}) 2Q , we associate the labeled directed graph GN = (V, E, L, s, t, ) dened as follows: V = Q, E = {(p, a, q) | q (p, a), p, q Q, a {}},

L = {}, s((p, a, q)) = p, t((p, a, q)) = q, ((p, a, q)) = a. Remark When N has no -transitions, we can let L = . Such labeled graphs have also a special structure that can easily be characterized. A string w is in the language L(N ) = {w | (q0 , w) F = } i w is the spelling of some path in GN from q0 to some nal state. Conversely, if we are given a labeled directed graph it may be viewed as an NFA if we pick a start state and a set of nal states. The relationship between NFAs and labeled directed graphs could be made more formal then this, say, using category theory, but it is suciently simple that that is probably unnecessary. Let = {a1 , . . . , am } be an alphabet. We dene a family Rn , of sets of languages as follows: R0 = {{a1 }, . . . , {am }, , {}}, Rn+1 = Rn {L1 L2 , L1 L2 , L |L1 , L2 , L Rn }. We dene R as R=
n0

Rn = R0 R1 R2 .

R is the family of regular languages over . The reason for this name is because R is precisely the set of regular languages over . The elements of R0 are called the atomic languages. In this section we show that the regular languages are those languages that are nitely generated using the operations union, concatenation and Kleene-*. A regular expression is a natural way of denoting how these operations are used to generate a language. One thing to be careful about is that R depends on the alphabet , although our notation doesnt reect this fact. If for any reason it is unclear from the context which R we are referring to, we can use the notation R() to denote R.

Example 1.11.4 Suppose we take = {a, b}. Then R1 = {{a}, {b}, , {}, {a, b}, {ab}, {ba}, {, a, a2, ...}, {, b, b2, ...}}. Observe that in general, Rn is a nite set. In this case it contains 9 languages, 7 of which are nite and two innite ones. Lemma 1.11.5 The family R is the smallest family of languages that contains the (atomic) languages {a1 }, . . . , {am }, , {}, and is closed under union, concatenation, and Kleene . Proof Use induction on n. Note that a given language may be built up in dierent ways. For example, {a, b} = ({a} {b}) . Given an alphabet = {a1 , . . . , am }, consider the new alphabet D = {+, , , (, ), , }. Next, we dene a family Rn of languages over D as follows: R0 = {a1 , . . . , am , , }, Finally, we dene R as Rn+1 = Rn {(S + T ), (S T ), (U ) |S, T, U Rn }. R = R0 R1 R2

R is the set of regular expressions (over ).

Lemma 1.11.6 The language R is the smallest language which contains the symbols a1 , . . . , am , , , from D, such that (S + T ), (S T ), and U belong to R, when S, T, U R. Proof Exercise. For simplicity of notation we write (R1 R2 ) instead of (R1 R2 ). Example 1.11.7 R = (a + b) , S = (a b ) . Every regular expression S R can be viewed as the name, or denotation, of some regular language L R. Similarly, every regular language L R is the interpretation (or meaning) of some regular expression S R. This is made rigorous by dening a function L: R P( ), where P( ) is the set of subsets of . We may think of L as standing for the language denoted by. This function can be dened recursively by the equations L[ai ] = {ai }, L[] = , L[] = {},

L[(S + T )] = L[S] L[T ], L[(ST )] = L[S]L[T ], L[U ] = L[U ] . 38

ai
Figure 1.22: Here we see the three types of machines that accept the atomic languages. The top machine accepts the empty set because it has no nal states. The middle machine accepts only since it has no arrows leaving it or going into it. The last machine accepts only a xed letter ai (i.e. there is one machine for each letter). Remark The function L is not injective. For example, if S = (a + b) , T = (a b ) , then L[S] = L[T ] = {a, b}. For simplicity we often denote L[S] as LS . Theorem 1.11.8 For every regular expression S R, the language L[S] is regular (L[S] R). Conversely, for every regular language L R, there is some regular expression S R such that L = L[S]. In other words, the range of L is exactly R. We break the theorem up into two lemmas, which actually say a bit more then the theorem. Lemma 1.11.9 There is an algorithm, which, given any regular expression S R, constructs an NFA NS accepting LS , i.e. such that LS = L(NS ). Proof The proof is by induction on the strata of R. We show that for each n if R Rn then there is an NFA that accepts the language of R, and that this NFA has the properties that (i) There are no edges entering the start state, and (ii) there is one nal state, which has no outgoing edges. Without loss of generality, assume that = {a1 , ..., ak }. NFAs for R0 are given in gure (1.22). Next, suppose that our hypothesis is true for Rm where m < n. Let R Rn be a regular expression. Then R is either the Kleene-* of a regular expression in Rn1 or the sum or product of two regular expressions in Rn1 . Suppose that R = S , S Rn1 . Then by induction there is an NFA accepting the language denoted by S that has a single start state with no incoming edges and a single nal state with no outgoing edges (see gure (1.23). This can always be achieved by adding a single start state and nal state and using the appropriate epsilon transitions. In gure (1.23), and the gures to follow, we draw this type a machine with a blob in the middle and imagine that inside the blob is a collection of states and transitions. To construct a machine that will recognize the language denoted by R we alter the above machine to create the machine that appears in gure (1.24). Next, suppose that it is the case that R = (ST ), where S and T are in Rn1 . Then by induction there are two NFAs accepting the languages denoted by S and T respectively, and as above me may assume that they have the form depicted in gure (1.23). From these two machines we construct an NFA that will accept the appropriate language - see gure (1.25). 39

Figure 1.23: For any regular language there is an NFA accepting of the form depicted above, namely a single start and nal state, each of which has no incoming arrows.

Figure 1.24: If the machine in gure (1.23) is altered to appear as above, then the language accepted by the new machine will be the Kleene-* of the language accepted by the machine in gure (1.23).

Figure 1.25: How to construct an NFA to accept the product of two languages, given an NFA for each of the languages.

Figure 1.26:

Finally, suppose that it is the case that R = (S + T ), S, T Rn1 . Again, by induction there are two NFAs accepting the languages denoted by S and T respectively. which look like the above two. A machine corresponding to their sum is given in gure (1.26). Of course, it is crucial that in each of the above three constructions the induction hypothesis is satised, i.e. the resulting machine has a start state with no incoming edges and a single nal state with no outgoing edges. Therefore the proof of the rst lemma is complete. Lemma 1.11.10 There is an algorithm, which, given any NFA N , constructs a regular expression S R, denoting L(N ), i.e., such that LS = L(N ). This is the node elimination algorithm. The idea is to allow more general labels on the edges of an NFA, namely, regular expressions. We will call these machines generalized NFAs. Example 1.11.11 Consider a generalized NFA with two state p and q where q is nal, and the edge between them is labeled by a regular expression S. The language accepted by this generalized NFA is exactly L(S) ! The proof will proceed as follows. We start with an NFA and view it as a generalized NFA. We will then contract this generalized NFA, step by step, preserving the recognized language at each step, until we have a generalized NFA which has only one edge, as in the above example. The expression that labels the edge will correspond to the language of the original NFA that we started with. But before we start this procedure we must put the NFA the normal form that we used earlier. Preprocessing, phase 1: Consider a given NFA. If there are incoming edges to the start state then add a new start state with an -transition to the original start state. If necessary, we need to add a new (unique) nal state, with -transitions from each of the old nal states to the new nal state, if there are outgoing edges from any of the old nal states. At the end of this phase, the start state s is a source (no incoming edges), and the nal state t is a sink (no outgoing edges). Preprocessing, phase 2: We need to atten parallel edges. For any pair of states (p, q) (p = q is possible), if there are k edges from p to q labeled u1 , . . ., uk , then create a single edge labeled with the regular expression u1 + + uk . For any pair of states (p, q) (p = q is possible) such that there is no edge from p to q, we put an edge labeled . At the end of this phase, the resulting generalized NFA has the property that for any pair of states (p, q) (where p = q is possible), there is a unique edge labeled with some regular expression denoted as Rp,q . When Rp,q = , this really means that there is no edge from p to q in the original NFA N . By interpreting each Rp,q as a function call (really, a macro) to the NFA Np,q accepting L[Rp,q ] (constructed using the previous algorithm), we can verify that the original language L(N ) is accepted by this new generalized NFA. Node elimination This algorithm only applies if the generalized NFA has at least one node distinct from s and t. Pick any node r distinct from s and t. For every pair (p, q) where p = r and q = r, consider the triangle formed by these states. If p = q then we may draw a diagram like the one in gure (1.27). Then, replace the label of the edge from p to q with the expression Rp,q + Rp,r Rr,r Rr,q (see gure (1.28)). We emphasize that this operation must be performed simultaneously for every pair (p, q) . Also, keep in mind that if in the original machine there was no edge from p to q then we now have an edge between them 41

Rr;r

Rp;r

Rr;q

Rp;q

Figure 1.27:
Rp;q

Rp;r Rr;r Rr;q

Figure 1.28: labeled with the empty set. If we dont pay attention to other edges coming and going from r then our algorithm probably wont work properly. At the end of this step delete the node r and all edges adjacent to r. Observe that it is possible that p = q, in which case the triangle is at. It is also possible that p = s or q = t. Also, this step is performed for all pairs (p, q), which means that both (p, q) and (q, p) are considered (when p = q). Note that this step only has an eect if there are edges from p to r and from r to q in the original NFA N . Otherwise r can simply be deleted, as well as the edges adjacent to r. Other simplications can be made. For example, when Rr,r = we can simplify Rp,r Rr,r Rr,q to Rp,r Rr,q . When Rp,q = we have Rp,r Rr,r Rr,q . The order in which the nodes are eliminated is irrelevant, although it eects the size of the nal expression. The algorithm stops when the only remaining nodes are s and t. Then the label R of the edge from s to t is a regular expression denoting L(N ). This completes the proof of the second lemma. In gure (1.29) we see an example of node elimination applied to a machine that accepts all binary strings whose second to last letter is the digit 1. Another example, involving simultaneous edges during the elimination process can be seen in gure (??). Denition 1.11.12 Two regular expressions S, T R are equivalent , denoted as S T , i L[S] = L[T ]. = It is easy to prove that is an equivalence relation. The relation satises some (nice) identities. For = = example: ((S + T ) + U ) (S + (T + U )), = ((ST )U ) (S(T U )), = (S + T ) (T + S), = (S S ) S , = (S ) = S .

Exercises
1 (a) Find a regular expression denoting the set of all strings over = {a, b} with no more than three as. (b) Find a regular expression denoting the set of all strings over = {a, b} such that the number of as is a multiple of three (including 0). 2 Let R be any regular language. Prove that the language L = {w | www R} is regular (this is a bit tricky!). 3 Recall that for two regular expressions R and S, R S means that L(R) = L(S), i.e. R and S denote = the same language. (i) Show that if R S .T , then R S.R + T . = = (ii) Assume that (the empty string) is not in the language denoted by the regular expression S. Hint : Prove that x R if and only if x S .T , by observing that R S.R + T implies that for every k 0, = R S k+1 .R + (S k + S k1 + . . . + S 2 + S + ).T, = and that since S, every string in S k+1 .R has length at least k + 1. / 4 Recall that two regular expressions R and S are equivalent, denoted as R S, i they denote the same = regular language L(R) = L(S). Show that the following identities hold for regular expressions: (R + S).T R.T + S.T = T.R + T.S T.(R + S) = R .R R = (R ) = R (R + S) = (R .S) .R 5 Show that the recursion equations for L can be used to dene L. Give a rigorous proof !

1.12

The Theorem of Myhill and Nerode

The purpose of this section is to give one more characterization of the regular languages in terms of certain kinds of equivalence relations on strings. Pushing this characterization a bit further, we will be able to prove the existence of minimal DFAs. Let D = (Q, , , q0 , F ) be a DFA. The DFA D may be redundant. For example, there may be states that are not accessible from the start state. This motivates the following denition. The set Qacc of accessible states is the subset of Q dened as Qacc = {p Q | w , (q0 , w) = p}. The set Qacc can be easily computed by stages. If Q = Qacc , we can clean up D by deleting the states in Q Qacc and restricting the transition function to Qacc . We then get an equivalent DFA D , i.e. L(D) = L(D ), where all the states of D are accessible. Therefore, without loss of generality, we assume that we are dealing with DFAs such that Q = Qacc .

Recall that an equivalence relation on a set A is a relation which is reexive, symmetric, and transitive. Given any a A, the set {b A | a b} is called the equivalence class of a, and it is denoted as [a] , or even as [a]. Recall that for any two elements a, b A, [a] [b] = i a b, and [a] = [b] i a b. The set of equivalence classes associated with the equivalence relation is a partition of A (also denoted as A/ ). This means that it is a family of nonempty pairwise disjoint sets whose union is equal to A itself. The equivalence classes are also called the blocks of the partition . The number of blocks in the partition is called the index of (and ). Given any two equivalence relations 1 and 2 with associated partitions 1 and 2 , 1 2 i every block of the partition 1 is contained in some block of the partition 2 . Then every block of the partition 2 is the union of blocks of the partition 1 and we say that 1 is a renement of 2 (and similarly, 1 is a renement of 2 ). Note that 2 has at most as many blocks as 1 does. We now dene an equivalence relation on strings induced by a DFA. We say that two strings u, v are equivalent i when feeding rst u and then v to the DFA, u and v drive the DFA to the same state. From the point of view of the observer u and v have the same eect (reaching the same state). Denition 1.12.1 Given a DFA D = (Q, , , q0 , F ) we dene the relation D on as follows: for any two strings u, v , u D v i (q0 , u) = (q0 , v). The relation D turns out to have some interesting properties. In particular it is right-invariant , which means that for all u, v, w if u v then uw vw. Lemma 1.12.2 Given any (accessible) DFA D = (Q, , , q0 , F ), the relation D is an equivalence relation which is right-invariant and has nite index. Furthermore, if Q has n states, then the index of D is n and every equivalence class of D is a regular language. Finally, L(D) is the union of some of the equivalence classes of D . Proof . The fact that D is an equivalence relation is trivial. To prove that D is right-invariant we need the fact that that for all u, v and for all p Q, (p, uv) = ( (p, u), v). Then if u D v, which means that (q0 , u) = (q0 , v), we have that (q0 , uw) = ( (q0 , u), w) = ( (q0 , v), w) = (q0 , vw), which means that uw D vw. Thus D is right-invariant. We still have to prove that D has index n. Dene the function f : Q such that f (u) = (q0 , u). Note that if u D v, which means that (q0 , u) = (q0 , v), then f (u) = f (v). Thus the function f : Q induces a function f : Q dened such that f ([u]) = f (u), for every equivalence class [u] , where = / is the partition associated with D . However, the function f : Q is injective (one-to-one) since f ([u]) = f ([v]) means that (q0 , u) = (q0 , v), which 44

means precisely that u D v, i.e., [u] = [v]. Since Q has n states has at most n blocks. But since every state is accessible, has exactly n blocks. Finally, every equivalence class of is a set of strings of the form {w | (q0 , w) = p}, for some p Q, which is accepted by the DFA obtained from D by changing F to {p}. Thus every equivalence class is a regular language and L(D) is the union of the equivalence classes corresponding to the nal states in F . The remarkable fact due to Myhill and Nerode is that lemma 1.12.2 has a converse. Lemma 1.12.3 Given any equivalence relation on , if is right-invariant and has nite index n then every equivalence class (block) in the partition associated with is a regular language. Proof . Let C1 , . . . , Cn be the blocks of , and assume that C1 = [] is the equivalence class of the empty string. First, we claim that for every block Ci and every w there is a unique block Cj such that Ci w Cj , where Ci w = {uw | u Ci }.

We also claim that for every w and every block Ci , C1 w Ci

For every u Ci the string uw belongs to one and only one of the blocks of , say Cj . For any other string v Ci , by denition we have that u v and so by right invariance it follows that uw vw. But since uw Cj and Cj is an equivalence class we also have vw Cj . This proves the rst claim. i w Ci .

For every class Ci , let

If C1 w Ci then it follows that w = w Ci since C1 = []. Conversely, if w Ci then for any v C1 = [] since v. By right invariance we have w vw and therefore vw Ci , which shows that C1 w Ci . Di = ({1, . . . , n}, , , 1, {i}),

Using claims 1 and 2 it is immediately veried that L(Di ) = Ci , proving that every block Ci is a regular language. We can combine lemma 1.12.2 and lemma 1.12.3 to get the following characterization of a regular language due to Myhill and Nerode. Theorem 1.12.4 A language L (over an alphabet ) is a regular language i it is the union of some of the equivalence classes of an equivalence relation on which is right-invariant and has nite index. Theorem 1.12.4 can also be used to prove that certain languages are not regular. As an example we prove that L = {an bn | n 1} is not regular. In general it is false that Ci a = Cj for some block Cj . We can only claim that Ci a Cj .

where (i, a) = j i Ci a Cj .

Assume that L is regular. Then there is some equivalence relation which is right-invariant and of nite index which has the property that L is the union of some of the classes of . Since the set {a, aa, aaa, . . . , ai , . . .}

Another useful tool for proving that languages are not regular is the so-called pumping lemma. 45

is innite and has a nite number of classes, two of these strings must belong to the same class. This means that ai aj for some i = j. But is right invariant so by concatenating with bi on the right we have that ai bi aj bi for some i = j. However ai bi L and since L is the union of classes of we also have aj bi L for i = j, which is absurd, given the denition of L. Therefore L is not regular.

Lemma 1.12.5 Given any DFA D = (Q, , , q0 , F ) where Q has m states, for every w , if w L(D) and |w| m then there exists a decomposition of w as w = uxv, where x = , uxi v L(D), for all i 0, and |ux| m. Proof . Let w = w1 . . . wn . Since |w| m, we have n m. Since w L(D), let (q0 , q1 , . . . , qn ), be the sequence of states in the accepting computation of w (where qn F ). Consider the subsequence (q0 , q1 , . . . , qm ). This sequence contains m + 1 states, but there are only m states in Q and thus we have qi = qj , for some i, j such that 0 i < j m. Letting u = w1 . . . wi , x = wi+1 . . . wj and v = wj+1 . . . wn it is clear that the conditions of the lemma hold. The reader is urged to use the pumping lemma to prove that L = {an bn | n 1} is not regular. The usefulness of the condition |ux| m lies in the fact that it reduces the number of legal decompositions uxv of w. We now consider an equivalence relation associated with a language L.

Exercises
1 Let D = (Q, , , q0 , F ) be a DFA. Recall that a state p Q is accessible i there is some string w , such that (q0 , w) = p, i.e., there is some path from q0 to p in D. Consider the following method for computing the set Qacc of accessible states (of D): dene the sequence of sets Qi Q, where acc Qi+1 = {q Q | p Qi , a , q = (p, a)}. acc acc Q0 = {q0 }, acc

acc Let Qi denote the set of states reachable from q0 by paths of length i. (a) Find a regular expression denoting the set of all strings over = {a, b} with no more than three as. (b) Find a regular expression denoting the set of all strings over = {a, b} such that the number of as is a multiple of three (including 0). 2 Let D = (Q, , , q0 , F ) be a deterministic nite automaton. Dene the relations and on as follows: x y if and only if, for all p Q, (p, x) F i (p, y) F, and xy if and only if, for all p Q, (p, x) = (p, y).

(a) Show that is a left-invariant equivalence relation and that is an equivalence relation that is both left and right invariant. (A relation R on is left invariant i uRv implies that wuRwv for all w , and R is right invariant i uRv implies that uwRvw for all w .) 46

(c) Given any language L , dene the relations L and L on as follows: u L v and u L v i, for all x, y , xuy L i, for all z , zu L i zv L,

(b) Let n be the number of states in Q (the set of states of D). Show that has at most 2n equivalence classes and that has at most nn equivalence classes.

i xvy L.

Prove that L is left-invariant, and that L is left and right-invariant. Prove that if L is regular, then both L and L have a nite number of equivalence classes. Hint : Show that the number of classes of L is at most the number of classes of , and that the number of classes of L is at most the number of classes of . 3 Let = {a, b}. Classify all languages over whose strings are of the form ai bj .

1.13

Minimal DFAs

Given any language L (not necessarily regular), we can dene an equivalence relation L which is rightinvariant, but not necessarily of nite index. However, when L is regular the relation L has nite index. In fact, this index is the size of a smallest DFA accepting L. This will lead us to a method for constructing minimal DFAs. Denition 1.13.1 Given any language L (over ), we dene the relation L on as follows: for any two strings u, v , uL v i w (uw L i vw L). We leave as an easy exercise to prove that L is an equivalence relation which is right-invariant. It is clear that L is the union of the equivalence classes of strings in L. When L is regular we have the following remarkable result. Lemma 1.13.2 If L is a regular language and D = (Q, , , q0 , F ) an (accessible) DFA such that L = L(D) then L is a right-invariant equivalence relation and we have D L . Furthermore, if L has m classes and Q has n states then m n. Proof . By denition, u D v i (q0 , u) = (q0 , v). Since w L(D) i (q0 , w) F , the fact that uL v can be expressed as w (uw L i vw L), i, w ( (q0 , uw) F i (q0 , vw) F ),

w ( ( (q0 , u), w) F

i ( (q0 , v), w) F ),

and if (q0 , u) = (q0 , v), this shows that uL v. Since the number of classes of D is n and D L it follows that the equivalence relation L has fewer classes than D and m n. Lemma 1.13.2 shows that when L is regular the index m of L is nite and it is a lower bound on the size of all DFAs accepting L. It remains to show that a DFA with m states accepting L exists. However, going back to the proof of lemma 1.12.3 starting with the right-invariant equivalence relation L of nite index m, if L is the union of the classes Ci1 , . . . , Cik , then the DFA DL = ({1, . . . , m}, , , 1, {i1, . . . , ik }), 47

where (i, a) = j i Ci a Cj , is such that L = L(DL ). Thus DL is a minimal DFA accepting L. In the next section we give an algorithm which allows us to nd DL given any DFA D accepting L. This algorithms nds which states of D are equivalent.

Exercises
1 The existence of a minimal DFA follows immediately from the well-ordering of the natural numbers. The point of the previous section is the construction given for a minimal DFA. In fact, this DFA is unique up to isomorphism. (i) Show that if D is minimal and L = L(D) then L =D . (ii) Use (i) to show that a minimal DFA is unique up to isomorphism. (iii) Show that if D is minimal and L(D) = L(D) then D is the homomorphic image of D.

1.14

State Equivalence and Minimal DFAs

The proof of lemma 1.13.2 suggests the following denition of equivalence between states. Denition 1.14.1 Given a DFA D = (Q, , , q0 , F ), the relation on Q, called state equivalence, is dened as follows: for all p, q Q, pq i w ( (p, w) F i (q, w) F ).

When p q we say that p and q are indistinguishable. It is trivial to verify that is an equivalence relation, and that it satises the following property: if p q then (p, a) (q, a), for all a . The following lemma shows the relationship between L and . Lemma 1.14.2 For any (accessible) DFA D = (Q, , , q0 , F ) accepting the regular language L = L(D), the function : Q dened such that (u) = (q0 , u) induces a bijection : /L Q/ , dened such that ([u]L ) = [(q0 , u)] . Furthermore, we have [u]L a [v]L i ((u), a) (v).

Proof . Since (u) = (q0 , u) and (v) = (q0 , v), the fact that (u) (v) can be expressed as w ( ( (q0 , u), w) F

w ( (q0 , uw) F

i ( (q0 , v), w) F ), i (q0 , vw) F ),

which is exactly uL v. Thus, since every state in Q is accessible, we have a bijection : /L Q/ . Since (u) = (q0 , u), we have ((u), a) = ( (q0 , u), a) = (q0 , ua) = (ua), 48

and thus, ((u), a) (v) can be expressed as (ua) (v). By the previous part, this is equivalent to uaL v, which is equivalent to [u]L a [v]L . Lemma 1.14.2 shows that the DFA DL is isomorphic to the DFA D/ obtained as the quotient of the DFA D modulo the equivalence relation on Q. More precisely, if D/ = (Q/ , , / , [q0 ] , F/ ), where / ([p] , a) = [(p, a)] , then D/ is isomorphic to the minimal DFA DL accepting L, and thus, it is a minimal DFA accepting L. The minimal DFA D/ is obtained by merging the states in each block of the partition associated with , forming states corresponding to the blocks of , and drawing a transition on input a from a block Ci to a block Cj of i there is a transition q = (p, a) from any state p Ci to any state q Cj on input a. The start state is the block containing q0 , and the nal states are the blocks consisting of nal states. Note that if F = , then has a single block (Q), and if F = Q, then has a single block (F ). In the rst case, the minimal DFA is the one state DFA rejecting all strings. In the second case, the minimal DFA is the one state DFA accepting all strings. When F = and F = Q, there are at least two states in Q, and also has at least two blocks, as we shall see shortly. It remains to compute explicitly. This is done using a sequence of approximations. In view of the previous discussion, we are assuming that F = and F = Q, which means that n 2, where n is the number of states in Q. Denition 1.14.3 Given any DFA D = (Q, , , q0 , F ), for every i 0, the relation i on Q, called i-state equivalence, is dened as follows: for all p, q Q, p i q i w , |w| i ( (p, w) F i (q, w) F ).

When p i q, we say that p and q are i-indistinguishable. Since we assumed that F = and F = Q, it is immediately veried that 0 has exactly two equivalence classes F and Q F , and that . . . i+1 i . . . 1 0 . If this sequence was strictly decreasing, the partition associated with i+1 would contain at least one more block than the partition associated with i , and since only has nitely many blocks (at most the number of states n in Q), there is a smallest integer, say i0 , such that i0 +1 = i0 . Furthermore, we have i0 n 2, where n is the number of states in Q. Thus, it remains to compute i+1 from i , which can be done using the following lemma. The lemma will also show that = i0 . Lemma 1.14.4 For any (accessible) DFA D = (Q, , , q0 , F ), for all p, q Q, p i+1 q i p i q and (p, a) i (q, a), for every a . Furthermore, if F = and F = Q, there is a smallest integer i0 n 2, such that i0 +1 = i0 = . 49

We leave the easy proof as an exercise. Using lemma 1.14.4, we can compute inductively, starting from 0 = (F, Q F ), and computing i+1 from i , until the sequence of partitions associated with the i stabilizes. There are a number of algorithms for computing , or to determine whether p q for some given p, q Q. A simple method to compute is described in Hopcroft and Ullman. It consists in forming a triangular array corresponding to all unordered pairs (p, q), with p = q (the rows and the columns of this triangular array are indexed by the states in Q, where the entries are below the descending diagonal). Initially, the entry (p, q) is marked i p and q are not 0-equivalent, which means that p and q are not both in F . Then, we process every unmarked entry on every row as follows: for any unmarked pair (p, q), we consider of pairs ((p, a), (q, a)), for all a . If any pair ((p, a), (q, a)) is already marked, this means that (p, a) and (q, a) are inequivalent, and thus p and q are inequivalent, and we mark the pair (p, q). We continue in this fashion, until at the end of a round during which all the rows are processed, nothing has changed. When the algorithm stops, all marked pairs are inequivalent, and all unmarked pairs correspond to equivalent states. There are ways of improving the eciency of this algorithm, see Hopcroft and Ullman for such improvements. Fast algorithms for testing whether p q for some given p, q Q also exist. One of these algorithms based on forward closures is discussed in the exercises. Such an algorithm is related to a fast unication algorithm.

Exercises
1 The purpose of this problem is to get a fast algorithm for testing state equivalence in a DFA. Let D = (Q, , , q0 , F ) be a deterministic nite automaton. Recall that state equivalence is the equivalence relation on Q, dened such that, pq i z ( (p, z) F i (q, z) F ). (q, z) F ).

and that i-equivalence is the equivalence relation i on Q, dened such that, p i q i z , |z| i ( (p, z) F i

(i) Prove that p i+1 q i p i q and (p, a) i (q, a), for every a . Prove that that =i , for some i n 2, where n is the number of states in Q. A relation S Q Q is a forward closure i it is an equivalence relation and (1) Whenever (p, q) S, then ((p, a), (q, a)) S, for all a . We say that a forward closure S is good i whenever (p, q) S, then good(p, q), where good(p, q) holds i either both p, q F , or both p, q F . / Given any relation R Q Q, recall that the smallest equivalence relation R containing R is the relation (R R1 ) (where R1 = {(q, p) | (p, q) R}, and (R R1 ) is the reexive and transitive closure of (R R1 )). We dene the sequence of relations Si Q Q as follows: S0 = R Si+1 = (Si {((p, a), (q, a)) | (p, q) Si , a }) . (ii) Prove that Si0 +1 = Si0 for some least i0 . Prove that Si0 is the smallest forward closure containing R. (iii) Prove that p q i the forward closure of the relation R = {(p, q)} is good. Hint : First, show that p q i good(p, q) and (p, a) (q, a) for all a . 50

2 A fast algorithm for computing the forward closure of the relation R = {(p, q)}, or detecting a bad pair of states, can be obtained as follows. An equivalence relation on Q is represented by a partition . Each equivalence class C in the partition is represented by a tree structure consisting of nodes and (parent) pointers, with the pointers from the sons of a node to the node itself. The root has a null pointer. Each node also maintains a counter keeping track of the number of nodes in the subtree rooted at that node. Two functions union and f ind are dened as follows. Given a state p, f ind(p, ) nds the root of the tree containing p as a node (not necessarily a leaf). Given two root nodes p, q, union(p, q, ) forms a new partition by merging the two trees with roots p and q as follows: if the counter of p is smaller than that of q, then let the root of p point to q, else let the root of q point to p. In order to speed up the algorithm, we can modify f ind as follows: during a call f ind(p, ), as we follow the path from p to the root r of the tree containing p, we redirect the parent pointer of every node q on the path from p (including p itself) to r. Say that pair p, q is bad i either both p F and q F , or both p F and q F . The function bad is / / such that bad( p, q ) = true if p, q is bad, and bad( p, q ) = f alse otherwise. For details of this implementation of partitions, see Fundamentals of data structures, by Horowitz and Sahni, Computer Science press, pp. 248-256. Then, the algorithm is as follows: function unif [p, q, , dd]: f lag; begin trans := lef t(dd); f f := right(dd); pq; = (p, q); st = (pq); f lag = 1; k := Length(f irst(trans)); while st = () f lag = 0 do uv := top(st); uu := lef t(uv); vv := right(uv); pop(st); if bad(f f, uv) = 1 then f lag := 0 else u := f ind(uu, ); v := f ind(vv, ); if u = v then union(u, v, ); for i = 1 to k do u1 := delta(trans, uu, k i + 1); v1 := delta(trans, vv, k i + 1); uv = (u1, v1); push(st, uv) endfor endif endif endwhile end The algorithm uses a stack st. We are assuming that the DFA dd is specied as a list of two sublists, the st list being a representation of the transition function, and the second one of the set of nal states. The transition function itself is a list of lists, where the i-th list represents the i-th row of the transition table for dd. The function delta is such that delta(trans, i, j) returns the j-th state in the i-th row of the transition table of dd. For example, we have a DFA

dd = (((2, 3), (2, 4), (2, 3), (2, 5), (2, 3), (7, 6), (7, 8), (7, 9), (7, 6)), (5, 9)). Implement the above algorithm, and test it at least for the above DFA dd and the pairs of states (1, 6) and (1, 7).

0,1 1 0,1

0+1 1 0+1

(0+1)*1

0+1

(0+1)*1

0+1

(0+1)*1(0+1)

Figure 1.29: An example of node elimination. Keep in mind that if the original NFA lacks an edge between two states, then during the node elimination process we assume that there is an edge there, but it is labeled with the empty set. In most cases we dont want to draw this edge in, since the picture will look like a mess. In many cases these edges can be ignored, so we only draw it if it plays a non-trivial role in the elimination process.

1 q 0 p
1) The given machine.

r 0 0

1 0

1 q 0 p f r 0 1

2) The machine preprocessed.

1 q 0 p 0 r 0 f 1 0 p q 0

01*1 01* f

3) Prepare to eliminate the node r. We must take into account all non-empty edges going in and out of r, in other words, elimanatie with. respect to not only the pair (q,f) but the pair (q,q).

4) r has been elimated. We next eliminate q. This must be done with respect to the pairs (p,p) and (p,f).

0(011)0 + 0(011)01* f ( 0(011)0)* ( + 0(011)01* ) f

5) q has now been eliminated and we prepare the eliminate p.

6) The final machine with the desired regular expression.

Figure 1.30: Another example of node elimination. This example is trickier than the prior one, since some of the nodes have many edges coming and going into them, so several triangles must be simultaneously collapsed.

Chapter 2

Formal Languages
2.1 A Grammar for Parsing English

The study of grammar in computer science is, of course, related to the study of the grammar of languages, and so in some sense its development overlaps with the development of linguistics, a truly vast subject. In the primordial soup of grammar were the parts of speech. Once one realizes that that parts of sentences could be labeled to indicate their function, it seems natural that one might notice that there appear to be rules dictating how these parts of speech are used. Also, if they are used incorrectly in a sentence, a listening native speaker will say that the sentence sounds wrong. As a result of having the notion of the parts of speech, it becomes possible to make a statement grammar. For example, In English, every sentence contains a verb. or A proper sentence of Martian consists of 100 nouns followed by 200 verbs and a participle. After discovering the parts of speech the next step is to start diagramming, i.e. parsing sentences. The formalization of this process led to the formal notion of a grammar in the 1950s, a process which culminated in Noam Chomskys classic book Syntactic structures. Chomsky is responsible for the denition of a context free grammar, which is a mathematical way of saying how to generate the sentences of a language. A collection of rules are given, often starting in the following way: S N P + V P N P stands for noun phrase and V P stands for verb phase. The idea is that simple sentences consists of two pieces, a noun phrase followed by a verb phrase. But this appears to not be very useful, since we dont know what noun and verb phrases are, and so how does this help to understand English grammar ? We answer this by showing what a noun phrase may be decomposed into: N P proper noun or N P indef inite pronoun or N P determiner + common noun This looks worse! It seems that more and more strange, undened words are being introduced. Fortunately, these rules eventually terminate in recognizable words. But before we give the full set of rules that demonstrate this, some discussion about the rst rule might be helpful. 55

A noun phrase is what is sounds like, namely, the generalization of what a noun is, and everyone knows that a noun is a person, place or thing. Now admittedly, this last denition is a little shaky, since it is hard to imagine an example of something that is not a thing ! Nevertheless, we ask the reader to bear with us. We would like to consider strings of words (phrases) like The child or Fifteen hundred poison jelly beans or Those rst three elements or Five out of every six planets or Somebody or The top of the icosahedron to be considered as nouns, but they have all those funny modiers in front of what we would normally call the nouns, namely child and jelly beans. So we make up the more general category, the noun phrase. Below we give a collection of rules for generating noun phrases, which are one-half of the rules for simplex sentences, sometimes called minimal clauses. A minimal clause is a noun phrase followed by a verb phrase. But we must emphasize here that these rules will sometimes generate phrases that are not standard American! In fact, this is in some sense the entire problem of syntax in linguistics, i.e. nding a system that describes exactly what the structure of a language is. The authors know of no way of doing this with a system like that described below, which is an example of a context free grammar. So we have to settle for something that will in fact generate many of the sentences that we consider to be standard American, but it will also generate a few more. Nevertheless, such a grammar is still useful in parsing sentences, i.e. given a sentence it can be decomposed into pieces. But it is not even clear that all sentences can be parsed. So we settle for an approximation here, since our point is not to solve the problems of linguistics, but just to motivate our later study of formal languages.

s np + vp np proper noun | indenite pronoun | determiner + common noun proper-noun indenite pronoun {some-, no-, every-, any-} {-body, -one, -thing} determiner (pre-article) article (locative) pre-article quantier (out) of (every) quantier {number } {(a) few, some, much, many, more, most, all} article {denite, non-denite } denite {the, NEAR, FAR } NEAR {this, these } FAR {that, those } non-denite {some (or other), {a, } (certain) } locative [status number dimension] status {past, present, future } number {cardinal, ordinal } cardinal {one, two, three,... } ordinal {rst, second, third,..., second from the end, next to the last, last } dimension {top, bottom, right, left, front, back, middle } common-noun {count, non-count } count (plural) non-count vp auxiliary {equational, nonequational (manner) } (adver(ial(s))) auxiliary tense (modal) (participle) tense {pres, past} modal {can, may, shall, will, must, (have to), be to, come to, ought to, do (to)? } participle [perfect/ progressive ] perfect have -en/d progressive be -ing equational {compula complement, attributive np(q) } copula {be, become, remain } complement {({preposition, like}) np, adj } preposition {in, out, under,over, through, across, from, to(?), towards, away(from),...} adj (intensier) intensier {somewhat, rather, quite, very, awfully, terribly, damned } attributive {have, cost, weigh, measure} npq {two dollars, three square feet, seventeen pounds, an awful lot } non-equational {VI, VT np } VI [participle/complement] VT [participle/complement] manner adj -ly adverb(ial(s)) [place time duration frequency] place preposition np time {yesterday, today, tomorrow, then, now, in the future, three minutes ago, four years ago,... } duration { (for) two hours, (for) three years,...} frequency {twice, three times, three times a minute, often, seldom, never,...}

2.2

Context-Free Grammars

A context-free grammar basically consists of a nite set of grammar rules. In order to dene grammar rules, we assume that we have two kinds of symbols: the terminals, which are the symbols of the alphabet underlying the languages under consideration, and the nonterminals, which behave like variables ranging over strings of terminals. A rule is of the form A , where A is a single nonterminal, and the right-hand side is a string of terminal and/or

nonterminal symbols. As usual, rst we need to dene what the object is (a context-free grammar), and then we need to explain how it is used. Unlike automata, grammars are used to generate strings, rather than recognize strings.

Denition 2.2.1 A context-free grammar (for short, CFG) is a quadruple G = (V, , P, S), where V is a nite set of symbols called the vocabulary (or set of grammar symbols); V is the set of terminal symbols (for short, terminals); P (V ) V is a nite set of productions (or rewrite rules, or rules). S (V ) is a designated symbol called the start symbol ;

The set N = V is called the set of nonterminal symbols (for short, nonterminals). Thus, P N V , and every production A, is also denoted as A . A production of the form A is called an epsilon rule, or null rule. Remark : Context-free grammars are sometimes dened as G = (VN , VT , P, S). The correspondence with our denition is that = VT and N = VN , so that V = VN VT . Thus, in this other denition, it is necessary to assume that VT VN = . Example 1. G1 = ({E, a, b}, {a, b}, E, P ), where P is the set of rules E aEb, E ab. As we will see shortly, this grammar generates the language L1 = {an bn | n 1}, which is not regular. Example 2. G2 = ({E, +, , (, ), a}, {+, , (, ), a}, E, P ), where P is the set of rules E E + E, E E E, E (E), E a.

This grammar generates a set of arithmetic expressions.

2.3

Derivations and Context-Free Languages

The productions of a grammar are used to derive strings. In this process, the productions are used as rewrite rules. Formally, we dene the derivation relation associated with a context-free grammar. Denition 2.3.1 Given a context-free grammar G = (V, , P, S), the (one-step) derivation relation =G associated with G is the binary relation =G V V dened as follows: for all , V , we have =G i there exist , V , and some production (A ) P , such that = A
+

and = .

The transitive closure of =G is denoted as =G and the reexive and transitive closure of =G is denoted as =G .

When the grammar G is clear from the context, we usually omit the subscript G in =G , =G , and =G . A string V such that S = is called a sentential form, and a string w such that S = w is n called a sentence. A derivation = involving n steps is denoted as = . Note that a derivation step =G is rather nondeterministic. Indeed, one can chose among various occurrences of nonterminals A in , and also among various productions A with left-hand side A. For example, using the grammar G1 = ({E, a, b}, {a, b}, E, P ), where P is the set of rules E aEb, E ab, every derivation from E is of the form E = an Ebn = an abbn = an+1 bn+1 , or where n 0. E = an Ebn = an aEbbn = an+1 Ebn+1 , Grammar G1 is very simple: every string an bn has a unique derivation. This is usually not the case. For example, using the grammar G2 = ({E, +, , (, ), a}, {+, , (, ), a}, E, P ), where P is the set of rules E E + E, E E E, E (E), E a,

the string a + a a has the following distinct derivations, where the boldface indicates which occurrence of E is rewritten: = a + E E = a + a E = a + a a, and E = E + E = a + E = a + E E = a + a E = a + a a. In the above derivations, the leftmost occurrence of a nonterminal is chosen at each step. Such derivations are called leftmost derivations. We could systematically rewrite the rightmost occurrence of a nonterminal, getting rightmost derivations. The string a + a a also has the following two rightmost derivations, where the boldface indicates which occurrence of E is rewritten: = E + E a = E + a a = a + a a, and E = E E = E a = E + E a = E + a a = a + a a. The language generated by a context-free grammar is dened as follows. 59 E = E + E = E + E E E = E E = E + E E

Denition 2.3.2 Given a context-free grammar G = (V, , P, S), the language generated by G is the set L(G) = {w | S = w}. A language L is a context-free language (for short, CFL) i L = L(G) for some context-free grammar G. It is technically very useful to consider derivations in which the leftmost nonterminal is always selected for rewriting, and dually, derivations in which the rightmost nonterminal is always selected for rewriting. Denition 2.3.3 Given a context-free grammar G = (V, , P, S), the (one-step) leftmost derivation relation = associated with G is the binary relation = V V dened as follows: for all , V , we have
lm lm +

=
lm

i there exist u , V , and some production (A ) P , such that = uA

+ lm lm

and = u.
lm

The transitive closure of = is denoted as = and the reexive and transitive closure of = is denoted as dened as follows: for all , V , we have

=. The (one-step) rightmost derivation relation = associated with G is the binary relation = V V
lm rm rm

=
rm

i there exist V , v , and some production (A ) P , such that = Av

+ rm rm

and = v.
rm

The transitive closure of = is denoted as = and the reexive and transitive closure of = is denoted as =.
rm

Remarks: It is customary to use the symbols a, b, c, d, e for terminal symbols, and the symbols A, B, C, D, E for nonterminal symbols. The symbols u, v, w, x, y, z denote terminal strings, and the symbols , , , , , denote strings in V . The symbols X, Y, Z usually denote symbols in V . Given a context-free grammar G = (V, , P, S), parsing a string w consists in nding out whether w L(G), and if so, in producing a derivation for w. The following lemma is technically very important. It shows that leftmost and rightmost derivations are universal. This has some important practical implications for the complexity of parsing algorithms. Lemma 2.3.4 Let G = (V, , P, S) be a context-free grammar. For every w , for every derivation + + + S = w, there is a leftmost derivation S = w, and there is a rightmost derivation S = w.
lm rm

Proof . Of course, we have to somehow use induction on derivations, but this is a little tricky, and it is necessary to prove a stronger fact. We treat leftmost derivations, rightmost derivations being handled in a similar way. Claim: For every w , for every V + , if = w, then there is a leftmost derivation = w.
lm n n

The claim is proved by induction on n. 60

For n = 1, there exist some , V and some production A , such that = A and w = . Since w is a terminal string, , , and , are terminal strings. Thus, A is the only nonterminal in , and the 1 derivation step = w is a leftmost step (and a rightmost step!). If n > 1, then the derivation = w is of the form = 1 = w. There are two sub-cases. Case 1. If the derivation step = 1 is a leftmost step = 1 , by the induction hypothesis, there is a
lm n1 lm n1 n

leftmost derivation 1 = w, and we get the leftmost derivation = 1 = w.

lm lm n1

Case 2. The derivation step = 1 is a not a leftmost step. In this case, there must be some u , , V , some nonterminals A and B, and some production B , such that = uAB and 1 = uA,
n1

where A is the leftmost nonterminal in . Since we have a derivation 1 = w of length n 1, by the induction hypothesis, there is a leftmost derivation 1 = w.
lm n1

Since 1 = uA where A is the leftmost terminal in 1 , the rst step in the leftmost derivation 1 = w is of the form uA = u,
lm lm

for some production A . Thus, we have a derivation of the form = uAB = uA = u = w.

lm lm n2

We can commute the rst two steps involving the productions B and A , and we get the derivation = uAB = uB = u = w.
lm lm n2

This may no longer be a leftmost derivation, but the rst step is leftmost, and we are back in case 1. Thus, n1 we conclude by applying the induction hypothesis to the derivation uB = w, as in case 1. Lemma 2.3.4 implies that L(G) = {w | S = w} = {w | S = w}.
lm rm + +

We observed that if we consider the grammar G2 = ({E, +, , (, ), a}, {+, , (, ), a}, E, P ), where P is the set of rules E E + E, E E E, E (E), E a, 61

the string a + a a has the following two distinct leftmost derivations, where the boldface indicates which occurrence of E is rewritten: E = E E = E + E E = a + E E = a + a E = a + a a, and E = E + E = a + E = a + E E = a + a E = a + a a. When this happens, we say that we have an ambiguous grammars. In some cases, it is possible to modify a grammar to make it unambiguous. For example, the grammar G2 can be modied as follows. Let G3 = ({E, T, F, +, , (, ), a}, {+, , (, ), a}, E, P ), where P is the set of rules E E + T, E T, T T F, T F,

F (E), F a.

We leave as an exercise to show that L(G3 ) = L(G2 ), and that every string in L(G3 ) has a unique derivation. Unfortunately, it is not always possible to modify a context-free grammar to make it unambiguous. There exist context-free languages that have no unambiguous context-free grammars. For example, it can be shown that L3 = {am bm cn | m, n 1} {am bn cn | m, n 1} is context-free, but has no unambiguous grammars. All this motivates the following denition. Denition 2.3.5 A context-free grammar G = (V, , P, S) is ambiguous if there is some string w L(G) that has two distinct leftmost derivations (or two distinct rightmost derivations). Thus, a grammar G is unambiguous if every string w L(G) has a unique leftmost derivation (or a unique rightmost derivation). A context-free language L is inherently ambiguous if every CFG G for L is ambiguous. Whether or not a grammar is ambiguous aects the complexity of parsing. Parsing algorithms for unambiguous grammars are more ecient than parsing algorithms for ambiguous grammars. We now consider various normal forms for context-free grammars.

2.4

Normal Forms for Context-Free Grammars, Chomsky Normal Form

One of the main goals of this section is to show that every CFG G can be converted to an equivalent grammar in Chomsky Normal Form (for short, CNF). A context-free grammar G = (V, , P, S) is in Chomsky Normal Form i its productions are of the form A BC, A a, or S ,

where A, B, C N , a , S is in G i L(G), and S does not occur on the right-hand side of any production. Note that a grammar in Chomsky Normal Form does not have -rules, i.e., rules of the form A , except when L(G), in which case S is the only -rule. It also does not have chain rules, i.e., rules of the form A B, where A, B N . Thus, in order to convert a grammar to Chomsky Normal Form, we need to show how to eliminate -rules and chain rules. This is not the end of the story, since we may still have rules of the form A where either || 3 or || 2 and contains terminals. However, dealing with such rules is a simple recoding matter, and we rst focus on the elimination of -rules and chain rules. It turns out that -rules must be eliminated rst. The rst step to eliminate -rules is to compute the set E(G) of erasable (or nullable) nonterminals E(G) = {A N | A = }. The set E(G) is computed using a sequence of approximations Ei dened as follows: E0 = {A N | (A ) P }, Ei+1 = Ei {A | (A B1 . . . Bj . . . Bk ) P, Bj Ei , 1 j k}. Clearly, the Ei form an ascending chain E0 E1 Ei Ei+1 N, and since N is nite, there is a least i, say i0 , such that Ei0 = Ei0 +1 . We claim that E(G) = Ei0 . Actually, we prove the following lemma. Lemma 2.4.1 Given any context-free grammar G = (V, , P, S), one can construct a context-free grammar G = (V , , P , S ) such that: (1) L(G ) = L(G); (3) S does not occur on the right-hand side of any production in P . (2) P contains no -rules other than S , and S P i L(G);
+

Proof . We begin by proving that E(G) = Ei0 . For this, we prove that E(G) Ei0 and Ei0 E(G). To prove that Ei0 E(G), we proceed by induction on i. Since E0 = {A N | (A ) P }, we have 1 A = , and thus A E(G). By the induction hypothesis, Ei E(G). If A Ei+1 , either A Ei and then A E(G), or there is some production (A B1 . . . Bj . . . Bk ) P , such that Bj Ei for all j, 1 j k. + By the induction hypothesis, Bj = for each j, 1 j k, and thus A = B1 . . . Bj . . . Bk = B2 . . . Bj . . . Bk = Bj . . . Bk = , which shows that A E(G).
+ + +

To prove that E(G) Ei0 , we also proceed by induction, but on the length of a derivation A = . If 1 n+1 A = , then A P , and thus A E0 since E0 = {A N | (A ) P }. If A = , then A = = , for some production A P . If contains terminals of nonterminals not in E(G), it is impossible to derive from , and thus, we must have = B1 . . . Bj . . . Bk , with Bj E(G), for all j, 1 j k. However, 63
n

Bj = where nj n, and by the induction hypothesis, Bj Ei0 . But then, we get A Ei0 +1 = Ei0 , as desired. Having shown that E(G) = Ei0 , we construct the grammar G . Its set of production P is dened as follows. First, we create the production S S where S V , to make sure that S does not occur on the right-hand / side of any rule in P . Let P1 = {A | V + } {S S}, and let P2 be the set of productions P2 = {A 1 2 . . . k k+1 | 1 V , . . . , k+1 V , B1 E(G), . . . , Bk E(G) A 1 B1 2 . . . k Bk k+1 P, k 1, 1 . . . k+1 = }. Note that L(G) i S E(G). If S E(G), then let P = P1 P2 , and if S E(G), then let / P = P1 P2 {S }. We claim that L(G ) = L(G), which is proved by showing that every derivation using G can be simulated by a derivation using G , and vice-versa. All the conditions of the lemma are now met. From a practical point of view, the construction or lemma 2.4.1 is very costly. For example, given a grammar containing the productions S ABCDEF,

A , B , D , E , C ,

F , ... ..., eliminating -rules will create 26 1 = 63 new rules corresponding to the 63 nonempty subsets of the set {A, B, C, D, E, F }. We now turn to the elimination of chain rules. It turns out that matters are greatly simplied if we rst apply lemma 2.4.1 to the input grammar G, and we explain the construction assuming that G = (V, , P, S) satises the conditions of lemma 2.4.1. For every nonterminal A N , we dene the set IA = {B N | A = B}. The sets IA are computed using approximations IA,i dened as follows: IA,0 = {B N | (A B) P },
+

IA,i+1 = IA,i {C N | (B C) P, and B IA,i }. Clearly, for every A N , the IA,i form an ascending chain IA,0 IA,1 IA,i IA,i+1 N, and since N is nite, there is a least i, say i0 , such that IA,i0 = IA,i0 +1 . We claim that IA = IA,i0 . Actually, we prove the following lemma. Lemma 2.4.2 Given any context-free grammar G = (V, , P, S), one can construct a context-free grammar G = (V , , P , S ) such that: 64

(1) L(G ) = L(G); (3) S does not occur on the right-hand side of any production in P . (2) Every rule in P is of the form A where || 2, or A a where a , or S i L(G);

Proof . First, we apply lemma 2.4.1 to the grammar G, obtaining a grammar G1 = (V1 , , S1 , P1 ). The proof that IA = IA,i0 is similar to the proof that E(G) = Ei0 . First, we prove that IA,i IA by induction on i. This is straightforward. Next, we prove that IA IA,i0 by induction on derivations of the form + A = B. In this part of the proof, we use the fact that G1 has no -rules except perhaps S1 , and that n+1 S1 does not occur on the right-hand side of any rule. This implies that a derivation A = C is necessarily n of the form A = B = C for some B N . Then, in the induction step, we have B IA,i0 , and thus C IA,i0 +1 = IA,i0 . We now dene the following sets of rules. Let P2 = P1 {A B | A B P1 }, and let We claim that G = (V1 , , P2 P3 , S1 ) satises the conditions of the lemma. For example, S1 does not appear on the right-hand side of any production, since the productions in P3 have right-hand sides from P1 , and S1 does not appear on the right-hand side in P1 . It is also easily shown that L(G ) = L(G1 ) = L(G). Let us apply the method of lemma 2.4.2 to the grammar G3 = ({E, T, F, +, , (, ), a}, {+, , (, ), a}, E, P ), where P is the set of rules E E + T, E T, T T F, T F,

P3 = {A | B P1 , N1 , B IA }. /

F (E), F a.

We get IE = {T, F }, IT = {F }, and IF = . The new grammar G has the set of rules 3 E E + T, E T F, E (E), E a,

T T F, T (E), T a,

F (E), F a. At this stage, the grammar obtained in lemma 2.4.2 no longer has -rules (except perhaps S i L(G)) or chain rules. However, it may contain rules A with || 3, or with || 2 and where contains terminals(s). To obtain the Chomsky Normal Form. we need to eliminate such rules. This is not dicult, but notationally a bit messy. 65

Lemma 2.4.3 Given any context-free grammar G = (V, , P, S), one can construct a context-free grammar G = (V , , P , S ) such that L(G ) = L(G) and G is in Chomsky Normal Form, that is, a grammar whose productions are of the form A BC,

A a, S ,

where A, B, C N , a , S is in G i L(G), and S does not occur on the right-hand side of any production in P . Proof . First, we apply lemma 2.4.2, obtaining G1 . Let r be the set of terminals occurring on the right-hand side of rules A P1 , with || 2. For every a r , let Xa be a new nonterminal not in V1 . Let P2 = {Xa a | a r }. Let P1,r be the set of productions
where a1 , . . . , ak r and i N1 . For every production

A 1 a1 2 k ak k+1 , A 1 a1 2 k ak k+1

in P1,r , let A 1 Xa1 2 k Xak k+1 be a new production, and let P3 be the set of all such productions. Let P4 = (P1 P1,r ) P2 P3 . Now, productions A in P4 with || 2 do not contain terminals. However, we may still have productions A P4 with || 3. We can perform some recoding using some new nonterminals. For every production of the form A B1 Bk , where k 3, create the new nonterminals [B1 Bk1 ], [B1 Bk2 ], , [B1 B2 B3 ], [B1 B2 ], and the new productions A [B1 Bk1 ]Bk ,

[B1 Bk1 ] [B1 Bk2 ]Bk1 , , [B1 B2 B3 ] [B1 B2 ]B3 , [B1 B2 ] B1 B2 .

All the productions are now in Chomsky Normal Form, and it is clear that the same language is generated. Applying the rst phase of the method of lemma 2.4.3 to the grammar G , we get the rules 3 E T X F, E X( EX) , E EX+ T,

E a, 66

X+ +, X , X( (, X) ).

F X( EX) , F a,

T X( EX) , T a,

T T X F,

After applying the second phase of the method, we get the following grammar in Chomsky Normal Form: E [EX+ ]T, [EX+ ] EX+ , E [T X ]F, [T X ] T X ,

E [X( E]X) , [X( E] X( E, E a, T [T X ]F,

X+ +, X , X) ). X( (,

F [X( E]X) , F a,

T [X( E]X) , T a,

For large grammars, it is often convenient to use the abbreviation which consists in grouping productions having a common left-hand side, and listing the right-hand sides separated by the symbol |. Thus, a group of productions A 1 , A 2 ,

, A k , may be abbreviated as A 1 | 2 | | k . An interesting corollary of the CNF is the following decidability result. There is an algorithm which, given any context-free grammar G, given any string w , decides whether w L(G). Indeed, we rst convert G to a grammar G in Chomsky Normal Form. If w = , we can test whether L(G), since this is the case i S P . If w = , letting n = |w|, note that since the rules are of the form A BC or A a, where 67

a , any derivation for w has n 1 + n = 2n 1 steps. Thus, we enumerate all (leftmost) derivations of length 2n 1.

There are much better parsing algorithms than this naive algorithm. We now show that every regular language is context-free.

2.5

Regular Languages are Context-Free

The regular languages can be characterized in terms of very special kinds of context-free grammars, rightlinear (and left-linear) context-free grammars. Denition 2.5.1 A context-free grammar G = (V, , P, S) is left-linear i its productions are of the form A Ba, A a,

where A, B N , and a . A context-free grammar G = (V, , P, S) is right-linear i its productions are of the form A aB, A a,

A .

where A, B N , and a .

A .

The following lemma shows the equivalence between NFAs and right-linear grammars. Lemma 2.5.2 A language L is regular if and only if is it generated by some right-linear grammar. Proof . Let L = L(D) for some DFA D = (Q, , , q0 , F ). We construct a right-linear grammar G as follows. Let V = Q , S = q0 , and let P be dened as follows: P = {p aq | q = (p, a), p, q Q, a } {p | p F }. p = wq and thus, L(D) = L(G). Conversely, let G = (V, , P, S) be a right-linear grammar. First, let G = (V , , P , S) be the right-linear grammar obtained from G by adding the new terminal E to N , replacing every rule in P of the form A a where a by the rule A aE, and adding the rule E . It is immediately veried that L(G ) = L(G). Next, we construct the NFA M = (Q, , , q0 , F ) as follows: Q = N , q0 = S, F = {A N | A }, and for all A N and all a . It is easily shown by induction on the length of w that A = wB and thus, L(M ) = L(G ) = L(G).

It is easily shown by induction on the length of w that i q = (p, w),

(A, a) = {B N | A aB P }, i B (A, w),

A similar lemma holds for left-linear grammars. It is also easily shown that the regular languages are exactly the languages generated by context-free grammars whose rules are of the form A u, A Bu,

where A, B N , and u .

2.6

Useless Productions in Context-Free Grammars

Given a context-free grammar G = (V, , P, S), it may contain rules that are useless for a number of reasons. For example, consider the grammar G3 = ({E, A, a, b}, {a, b}, E, P ), where P is the set of rules E aEb, E ab, E A,

A bAa.

The problem is that the nonterminal A does not derive any terminal strings, and thus, it is useless, as well as the last two productions. Let us now consider the grammar G4 = ({E, A, a, b, c, d}, {a, b, c, d}, E, P ), where P is the set of rules E aEb, E ab, A cAd, A cd.

This time, the nonterminal A generates strings of the form cn dn , but there is no derivation E = from E where A occurs in . The nonterminal A is not connected to E, and the last two rules are useless. Fortunately, it is possible to nd such useless rules, and to eliminate them. Let T (G) be the set of nonterminals that actually derive some terminal string, i.e. T (G) = {A (V ) | w , A =+ w}. The set T (G) can be dened by stages. We dene the sets Tn (n 1) as follows: T1 = {A (V ) | (A w) P, with w }, and

It is easy to prove that there is some least n such that Tn+1 = Tn , and that for this n, T (G) = Tn . If S T (G), then L(G) = , and G is equivalent to the trivial grammar / G = ({S}, , , S). If S T (G), then let U (G) be the set of nonterminals that are actually useful, i.e., U (G) = {A T (G) | , (T (G) ) , S = A}. The set U (G) can also be computed by stages. We dene the sets Un (n 1) as follows: U1 = {A T (G) | (S A) P, with , (T (G) ) }, and

Tn+1 = Tn {A (V ) | (A ) P, with (Tn ) }.

It is easy to prove that there is some least n such that Un+1 = Un , and that for this n, U (G) = Un {S}. Then, we can use U (G) to transform G into an equivalent CFG in which every nonterminal is useful (i.e., for which V = U (G)). Indeed, simply delete all rules containing symbol not in U (G). The details are left as an exercise. It should be noted than although dull, the above considerations are important in practice. Certain algorithms for constructing parsers, for example, LR-parsers, may loop if useless rules are not eliminated! We now consider another normal form for context-free grammars, the Greibach Normal Form. 69

Un+1 = Un {B T (G) | (A B) P, with A Un , , (T (G) ) }.

2.7

The Greibach Normal Form

Every CFG G can also be converted to an equivalent grammar in Greibach Normal Form (for short, GNF). A context-free grammar G = (V, , P, S) is in Greibach Normal Form i its productions are of the form A aBC,

A aB, A a, or S , where A, B, C N , a , S is in G i L(G), and S does not occur on the right-hand side of any production. Note that a grammar in Greibach Normal Form does not have -rules other than possibly S . More importantly, except for the special rule S , every rule produces some terminal symbol. An important consequence of the Greibach Normal Form is that every nonterminal is not left recursive. A + nonterminal A is left recursive i A = A for some V . Left recursive nonterminals cause top-down deterministic parsers to loop. The Greibach Normal Form provides a way of avoiding this problem. There are no easy proofs that every CFG can be converted to a Greibach Normal Form. A particularly elegant method due to Rosenkrantz using least xed-points and matrices will be given in section 2.10. Lemma 2.7.1 Given any context-free grammar G = (V, , P, S), one can construct a context-free grammar G = (V , , P , S ) such that L(G ) = L(G) and G is in Greibach Normal Form, that is, a grammar whose productions are of the form A aBC, A aB,

A a, S ,

where A, B, C N , a , S is in G i L(G), and S does not occur on the right-hand side of any production in P .

2.8

Least Fixed-Points

Context-free languages can also be characterized as least xed-points of certain functions induced by grammars. This characterization yields a rather quick proof that every context-free grammar can be converted to Greibach Normal Form. This characterization also reveals very clearly the recursive nature of the context-free languages. We begin by reviewing what we need from the theory of partially ordered sets. Denition 2.8.1 Given a partially ordered set A, , an -chain (an )n0 is a sequence such that an an+1 for all n 0. The least-upper bound of an -chain (an ) is an element a A such that: (1) an a, for all n 0; (2) For any b A, if an b, for all n 0, then a b. A partially ordered set A, is an -chain complete poset i it has a least element , and i every -chain has a least upper bound denoted as an .

Remark : The in -chain means that we are considering countable chains ( is the ordinal associated with the order-type of the set of natural numbers). This notation may seem arcane, but is standard in denotational semantics. For example, given any set X, the power set 2X ordered by inclusion is an -chain complete poset with least element . The Cartesian product 2X 2X ordered such that
n

(A1 , . . . , An ) (B1 , . . . , Bn ) i Ai Bi (where Ai , Bi 2X ) is an -chain complete poset with least element (, . . . , ). We are interested in functions between partially ordered sets. Denition 2.8.2 Given any two partially ordered sets A1 , 1 and A2 , 2 , a function f : A1 A2 is monotonic i for all x, y A1 , x 1 y implies that f (x) 2 f (y).

If A1 , 1 and A2 , 2 are -chain complete posets, a function f : A1 A2 is -continuous i it is monotonic, and for every -chain (an ), f ( an ) = f (an ). Remark : Note that we are not requiring that an -continuous function f : A1 A2 preserve least elements, i.e., it is possible that f (1 ) =2 . We now dene the crucial concept of a least xed-point. Denition 2.8.3 Let A, be a partially ordered set, and let f : A A be a function. A xed-point of f is an element a A such that f (a) = a. The least xed-point of f is an element a A such that f (a) = a, and for every b A such that f (b) = b, then a b. The following lemma gives sucient conditions for the existence of least xed-points. It is one of the key lemmas in denotational semantics. Lemma 2.8.4 Let A, be an -chain complete poset with least element . Every -continuous function f : A A has a unique least xed-point x0 given by x0 = f n ().

Furthermore, for any b A such that f (b) b, then x0 b. Proof . First, we prove that the sequence , f () , f 2 (), . . . , f n (), . . . is an -chain. This is shown by induction on n. Since is the least element of A, we have f (). Assuming by induction that f n () f n+1 (), since f is -continuous, it is monotonic, and thus we get f n+1 () f n+2 (), as desired. Since A is an -chain complete poset, the -chain (f n ()) has a least upper bound x0 = Since f is -continuous, we have f (x0 ) = f ( f n ()) = f (f n ()) = 71 f n+1 () = x0 , f n ().

and x0 is indeed a xed-point of f . Clearly, if f (b) b implies that x0 b, then f (b) = b implies that x0 b. Thus, assume that f (b) b for some b A. We prove by induction of n that f n () b. Indeed, b, since is the least element of A. Assuming by induction that f n () b, by monotonicity of f , we get f (f n ()) f (b), and since f (b) b, this yields Since f n () b for all n 0, we have f n+1 ()) b. x0 = f n () b.

The second part of lemma 2.8.4 is very useful to prove that functions have the same least xed-point. For example, under the conditions of lemma 2.8.4, if g: A A is another -chain continuous function, letting x0 be the least xed-point of f and y0 be the least xed-point of g, if f (y0 ) y0 and g(x0 ) x0 , we can deduce that x0 = y0 . Indeed, since f (y0 ) y0 and x0 is the least xed-point of f , we get x0 y0 , and since g(x0 ) x0 and y0 is the least xed-point of g, we get y0 x0 , and therefore x0 = y0 .

Lemma 2.8.4 also shows that the least xed-point x0 of f can be approximated as much as desired, using the sequence (f n ()). We will now apply this fact to context-free grammars. For this, we need to show how a context-free grammar G = (V, , P, S) with m nonterminals induces an -continuous map G : 2 2 2 2 .
m m

2.9

Context-Free Languages as Least Fixed-Points

Given any context-free grammar G = (V, , P, S) with m nonterminals A1 , . . . Am , grouping all the productions having the same left-hand side, the grammar G can be concisely written as A1 1,1 + + 1,n1 , Ai i,1 + + i,ni ,

Am m,1 + + m,nn . Given any set A, let Pf in (A) be the set of nite subsets of A. Denition 2.9.1 Let G = (V, , P, S) be a context-free grammar with m nonterminals A1 , . . ., Am . For any m-tuple = (L1 , . . . , Lm ) of languages Li , we dene the function []: Pf in (V ) 2 inductively as follows: []({}) = {}, []({a}) = {a}, []() = ,

[]({Ai }) = Li , if Ai N , []({X}) = []({})[]({X}), [](S {}) = [](S) []({}),

if a ,

if S Pf in (V ), S = , V , S. / 72

if V + , X V,

Then, writing the grammar G as A1 1,1 + + 1,n1 , Ai i,1 + + i,ni ,

Am m,1 + + m,nn , we dene the map G : 2 2 2 2

m m

such that G (L1 , . . . Lm ) = ([]({1,1 , . . . , 1,n1 }), . . . , []({m,1 , . . . , m,nm })) for all = (L1 , . . . , Lm ) 2 2 .
m

One should verify that the map [] is well dened, but this is easy. The following lemma is easily shown. Lemma 2.9.2 Given a context-free grammar G = (V, , P, S) with m nonterminals A1 , . . ., Am , the map G : 2 2 2 2
m m

is -continuous. Now, 2 2 is an -chain complete poset, and the map G is -continuous. Thus, by lemma 2.8.4, the map G has a least-xed point. It turns out that the components of this least xed-point are precisely the languages generated by the grammars (V, , P, Ai ). Before proving this fact, let us give an example illustrating it. Example. Consider the grammar G = ({A, B, a, b}, {a, b}, P, A) dened by the rules B aBb + ab. The least xed-point of G is the least upper bound of the chain (n (, )) = ((n (, ), n (, )), G,A G,B G where 0 (, ) = 0 (, ) = , G,A G,B and n+1 (, ) = n (, )n (, ) {ab}, G,B G,B G,A n+1 (, ) = an (, )b {ab}. G,B G,B A BB + ab,
m

It is easy to verify that 1 (, ) = {ab}, G,A

1 (, ) = {ab}, G,B

2 (, ) = {ab, aabb}, G,B

2 (, ) = {ab, abab}, G,A 3 (, ) = {ab, aabbab, aabbaabb, abaabb}, G,A

3 (, ) = {ab, aabb, aaabbb}. G,B

By induction, we can easily prove that the two components of the least xed-point are the languages LA = {am bm an bn | m, n 1} {ab} and LB = {an bn | n 1}. Letting GA = ({A, B, a, b}, {a, b}, P, A) and GB = ({A, B, a, b}, {a, b}, P, B), it is indeed true that LA = L(GA ) and LB = L(GB ) . We have the following theorem due to Ginsburg and Rose. Theorem 2.9.3 Given any context-free grammar G = (V, , P, S) with m nonterminals A1 , . . ., Am , the least xed-point of the map G is the m-tuple of languages (L(GA1 ), . . . , L(GAm )), where GAi = (V, , P, Ai ). Proof . Writing G as A1 1,1 + + 1,n1 , Am m,1 + + m,nn , let M = max{|i,j |} be the maximum length of right-hand sides of rules in P . Let n (, . . . , ) = (n (, . . . , ), . . . , n (, . . . , )). G G,1 G,m Then, for any w , observe that w 1 (, . . . , ) G,i w n (, . . . , ) G,i for some n 2 i there is some rule Ai i,j with i,j of the form i,j = u1 Aj1 u2 uk Ajk uk+1 , where u1 , . . . , uk+1 , k 1, and some w1 , . . . , wk such that
n1 wh G,jh (, . . . , ),

Ai i,1 + + i,ni ,

i there is some rule Ai i,j with w = i,j , and that

and w = u1 w1 u2 uk wk uk+1 . We prove the following two claims. Claim 1: For every w , if Ai = w, then w p (, . . . , ), for some p 1. G,i Claim 2: For every w , if w n (, . . . , ), with n 1, then Ai = w for some p (M + 1)n1 . G,i Proof of Claim 1. We proceed by induction on n. If Ai = w, then w = i,j for some rule A i,j , and by the remark just before the claim, w 1 (, . . . , ). G,i If Ai = w with n 1, then for some rule Ai i,j . If
n+1 1 p n

Ai = i,j = w i,j = u1 Aj1 u2 uk Ajk uk+1 ,

h where u1 , . . . , uk+1 , k 1, then Ajh = wh , where nh n, and

w = u1 w1 u2 uk wk uk+1 for some w1 , . . . , wk . By the induction hypothesis, wh ph h (, . . . , ), G,j for some ph 1, for every h, 1 h k. Letting p = max{p1 , . . . , pk }, since each sequence (q (, . . . , )) G,i is an -chain, we have wh p h (, . . . , ) for every h, 1 h k, and by the remark just before the claim, G,j w p+1 (, . . . , ). G,i Proof of Claim 2. We proceed by induction on n. If w 1 (, . . . , ), by the remark just before the claim, G,i If w n (, . . . , ) for some n 2, then there is some rule Ai i,j with i,j of the form G,i i,j = u1 Aj1 u2 uk Ajk uk+1 , where u1 , . . . , uk+1 , k 1, and some w1 , . . . , wk such that
n1 wh G,jh (, . . . , ),

then w = i,j for some rule A i,j , and Ai = w.

and By the induction hypothesis, Ajh = wh with ph (M + 1)n2 , and thus Ai = u1 Aj1 u2 uk Ajk uk+1 = = w, so that Ai = w with p p1 + + pk + 1 M (M + 1)n2 + 1 (M + 1)n1 , since k M . Combining Claim 1 and Claim 2, we have L(GAi ) =
n p p1 pk ph

w = u1 w1 u2 uk wk uk+1 .

n (, . . . , ), G,i 75

which proves that the least xed-point of the map G is the m-tuple of languages (L(GA1 ), . . . , L(GAm )).

We now show how theorem 2.9.3 can be used to give a short proof that every context-free grammar can be converted to Greibach Normal Form.

2.10

Least Fixed-Points and the Greibach Normal Form

The hard part in converting a grammar G = (V, , P, S) to Greibach Normal Form is to convert it to a grammar in so-called weak Greibach Normal Form, where the productions are of the form A a, S , or

where a , V , and if S is a rule, then S does not occur on the right-hand side of any rule. Indeed, if we rst convert G to Chomsky Normal Form, it turns out that we will get rules of the form A aBC, A aBC or A a. Using the algorithm for eliminating -rules and chain rules, we can rst convert the original grammar to a grammar with no chain rules and no -rules except possibly S , in which case, S does not appear on the right-hand side of rules. Thus, for the purpose of converting to weak Greibach Normal Form, we can assume that we are dealing with grammars without chain rules and without -rules. Let us also assume that we computed the set T (G) of nonterminals that actually derive some terminal string, and that useless productions involving symbols not in T (G) have been deleted. Let us explain the idea of the conversion using the following grammar: A AaB + BB + b.

B Bd + BAa + aA + c. The rst step is to group the right-hand sides into two categories: those whose leftmost symbol is a terminal ( V ) and those whose leftmost symbol is a nonterminal ( N V ). It is also convenient to adopt a matrix notation, and we can write the above grammar as aB B {d, Aa}

(A, B) = (A, B)

+ (b, {aA, c})

Thus, we are dealing with matrices (and row vectors) whose entries are nite subsets of V . For notational simplicity, braces around singleton sets are omitted. The nite subsets of V form a semiring, where addition is union, and multiplication is concatenation. Addition and multiplication of matrices are as usual, except that the semiring operations are used. We will also consider matrices whose entries are languages over . Again, the languages over form a semiring, where addition is union, and multiplication is concatenation. The identity element for addition is , and the identity element for multiplication is {}. As above, addition and multiplication of matrices are as usual, except that the semiring operations are used. For example, given any languages Ai,j and Bi,j over , where i, j {1, 2}, we have A1,1 A2,1 A1,2 A2,2 B1,1 B2,1 B1,2 B2,2 = A1,1 B1,1 A1,2 B2,1 A2,1 B1,1 A2,2 B2,1 76 A1,1 B1,2 A1,2 B2,2 A2,1 B1,2 A2,2 B2,2

Letting X = (A, B), K = (b, {aA, c}), and H= the above grammar can be concisely written as X = XH + K. More generally, given any context-free grammar G = (V, , P, S) with m nonterminals A1 , . . ., Am , assuming that there are no chain rules, no -rules, and that every nonterminal belongs to T (G), letting X = (A1 , . . . , Am ), we can write G as X = XH + K, for some appropriate m m matrix H in in which every entry contains a set (possibly empty) of strings in V + , and some row vector K in which every entry contains a set (possibly empty) of strings each beginning with a terminal ( V ). aB B {d, Aa}

Given an m m square matrix A = (Ai,j ) of languages over , we can dene the matrix A whose entry A is given by i,j A = i,j
n0 0 n

An , i,j

where A = Idm , the identity matrix, and A is the n-th power of A. Similarly, we dene A+ = (
n1

An ). i,j

Given a matrix A where the entries are nite subset of V , where A = {A1 , . . . , Am }, for any m-tuple = (L1 , . . . , Lm ) of languages over , we let [](A) = ([](Ai,j )). Given a system X = XH + K where H is an m m matrix and X, K are row matrices, if H and K do not contain any nonterminals, we claim that the least xed-point of the grammar G associated with X = XH +K is KH . This is easily seen by computing the approximations X n = n (, . . . , ). Indeed, X 0 = K, and G X n = KH n + KH n1 + + KH + K = K(H n + H n1 + + H + Im ). Similarly, if Y is an m m matrix of nonterminals, the least xed-point of the grammar associated with Y = HY + H is H + (provided that H does not contain any nonterminals). Given any context-free grammar G = (V, , P, S) with m nonterminals A1 , . . ., Am , writing G as X = XH + K as explained earlier, we can form another grammar GH by creating m2 new nonterminals Yi,j , where the rules of this new grammar are dened by the system of two matrix equations X = KY + K, Y = HY + H, where Y = (Yi,j ). The following lemma is the key to the Greibach Normal Form. 77

Lemma 2.10.1 Given any context-free grammar G = (V, , P, S) with m nonterminals A1 , . . ., Am , writing G as X = XH + K as explained earlier, if GH is the grammar dened by the system of two matrix equations X = KY + K, Y = HY + H, as explained above, then the components in X of the least-xed points of the maps G and GH are equal. Proof . Let U be the least-xed point of G , and let (V, W ) be the least xed-point of GH . We shall prove that U = V . For notational simplicity, let us denote [U ](H) as H[U ] and [U ](K) as K[U ]. Since U is the least xed-point of X = XH + K, we have U = U H[U ] + K[U ]. Since H[U ] and K[U ] do not contain any nonterminals, by a previous remark, K[U ]H [U ] is the least-xed point of X = XH[U ] + K[U ], and thus, K[U ]H [U ] U. On the other hand, by monotonicity, K[U ]H [U ]H K[U ]H [U ] + K K[U ]H [U ] K[U ]H [U ]H[U ] + K[U ] = K[U ]H [U ], and since U is the least xed-point of X = XH + K, U K[U ]H [U ]. Therefore, U = K[U ]H [U ]. We can prove in a similar manner that W = H[V ]+ . Let Z = H[U ]+ . We have K[U ]Z + K[U ] = K[U ]H[U ]+ + K[U ] = K[U ]H[U ] = U, and H[U ]Z + H[U ] = H[U ]H[U ]+ + H[U ] = H[U ]+ = Z, and since (V, W ) is the least xed-point of X = KY + K and Y = HY + H, we get V U and W H[U ]+ . We also have and V = K[V ]W + K[V ] = K[V ]H[V ]+ + K[V ] = K[V ]H[V ] , V H[V ] + K[V ] = K[V ]H[V ] H[V ] + K[V ] = K[V ]H[V ] = V,

and since U is the least xed-point of X = XH + K, we get U V . Therefore, U = V , as claimed. Note that the above lemma actually applies to any grammar. Applying lemma 2.10.1 to our example grammar, we get the following new grammar: Y1 Y3 Y2 Y4 Y1 Y3 78

(A, B) = (b, {aA, c}) Y1 Y3 Y2 Y4 = aB B {d, Aa}

+ (b, {aA, c}), Y2 Y4 + aB B {d, Aa}

There are still some nonterminals appearing as leftmost symbols, but using the equations dening A and B, we can replace A with {bY1 , aAY3 , cY3 , b} and B with {bY2 , aAY4 , cY4 , aA, c}, obtaining a system in weak Greibach Normal Form. This amounts to converting the matrix aB B {d, Aa}

H= to the matrix aB

{bY2 , aAY4 , cY4 , aA, c} {d, bY1 a, aAY3 a, cY3 a, ba}

The weak Greibach Normal Form corresponds to the new system X = KY + K, Y = LY + L. This method works in general for any input grammar with no -rules, no chain rules, and such that every nonterminal belongs to T (G). Under these conditions, the row vector K contains some nonempty entry, all strings in K are in V , and all strings in H are in V + . After obtaining the grammar GH dened by the system X = KY + K, Y = HY + H, we use the system X = KY + K to express every nonterminal Ai in terms of expressions containing strings i,j involving a terminal as the leftmost symbol (i,j V ), and we replace all leftmost occurrences of nonterminals in H (occurrences Ai in strings of the form Ai , where V ) using the above expressions. In this fashion, we obtain a matrix L, and it is immediately shown that the system X = KY + K, Y = LY + L, generates the same tuple of languages. Furthermore, this last system corresponds to a weak Greibach Normal Form. It we start with a grammar in Chomsky Normal Form (with no production S ) such that every nonterminal belongs to T (G), we actually get a Greibach Normal Form (the entries in K are terminals, and the entries in H are nonterminals). Thus, we have justied lemma 2.7.1. The method is also quite economical, since it introduces only m2 new nonterminals. However, the resulting grammar may contain some useless nonterminals.

2.11

Tree Domains and Gorn Trees

Derivation trees play a very important role in parsing theory and in the proof of a strong version of the pumping lemma for the context-free languages known as Ogdens lemma. Thus, it is important to dene derivation trees rigorously. We do so using Gorn trees. Let N+ = {1, 2, 3, . . .}. Denition 2.11.1 A tree domain D is a nonempty subset of strings in N satisfying the conditions: + (1) For all u, v D, if uv D, then u D. (2) For all u D, for every i N+ , if ui D then uj D for every j, 1 j i. The tree domain D = {, 1, 2, 11, 21, 22, 221, 222, 2211} is represented as follows: 1 11 21 221 2211 2 22 222

A tree labeled with symbols from a set is dened as follows. Denition 2.11.2 Given a set of labels, a -tree (for short, a tree) is a total function t : D , where D is a tree domain. The domain of a tree t is denoted as dom(t). Every string u dom(t) is called a tree address or a node. Let = {f, g, h, a, b}. The tree t : D , where D is the tree domain of the previous example and t is the function whose graph is {(, f ), (1, h), (2, g), (11, a), (21, a), (22, f ), (221, h), (222, b), (2211, a)} is represented as follows: f h a a h a g f b

The outdegree (sometimes called ramication) r(u) of a node u is the cardinality of the set {i | ui dom(t)}. Note that the outdegree of a node can be innite. Most of the trees that we shall consider will be nitebranching, that is, for every node u, r(u) will be an integer, and hence nite. If the oudegree of all nodes in a tree is bounded by n, then we can view the domain of the tree as being dened over {1, 2, . . . , n} . A node of outdegree 0 is called a leaf . The node whose address is is called the root of the tree. A tree is nite if its domain dom(t) is nite. Given a node u in dom(t), every node of the form ui in dom(t) with i N+ is called a son (or immediate successor ) of u. Tree addresses are totally ordered lexicographically: u v if either u is a prex of v or, there exist strings x, y, z N and i, j N+ , with i < j, such that u = xiy and v = xjz. + In the rst case, we say that u is an ancestor (or predecessor ) of v (or u dominates v) and in the second case, that u is to the left of v. If y = and z = , we say that xi is a left brother (or left sibling) of xj, (i < j). Two tree addresses u and v are independent if u is not a prex of v and v is not a prex of u. Given a nite tree t, the yield of t is the string t(u1 )t(u2 ) t(uk ), where u1 , u2 , . . . , uk is the sequence of leaves of t in lexicographic order. For example, the yield of the tree below is aaab: f h a a h a Given a nite tree t, the depth of t is the integer d(t) = max{|u| | u dom(t)}. Given a tree t and a node u in dom(t), the subtree rooted at u is the tree t/u, whose domain is the set {v | uv dom(t)} and such that t/u(v) = t(uv) for all v in dom(t/u). Another important operation is the operation of tree replacement (or tree substitution). Denition 2.11.3 Given two trees t1 and t2 and a tree address u in t1 , the result of replacing t2 at u in t1 , denoted by t1 [u t2 ], is the function whose graph is the set of pairs {(v, t1 (v)) | u is not a prex of v} {(uv, t2 (v)}. g f b

Let t1 and t2 be the trees dened by the following diagrams: Tree t1 f h a a h a Tree t2 g a b g f b

The tree t1 [22 t2 ] is dened by the following diagram: f h a a a g g b

We can now dene derivation trees and relate derivations to derivation trees.

2.12

Derivations Trees

Denition 2.12.1 Given a context-free grammar G = (V, , P, S), for any A N , an A-derivation tree for G is a (V {})-tree t such that: (1) t() = A; (2) For every nonleaf node u dom(t), if u1, . . . , uk are the successors of u, then either there is a production A X1 Xk in P such that t(u) = A and t(ui) = Xi for all i, 1 i k, or A P , t(u) = A and t(u1) = . A complete derivation (or parse tree) is an S-tree whose yield belongs to . A derivation tree for the grammar G3 = ({E, T, F, +, , (, ), a}, {+, , (, ), a}, E, P ),

where P is the set of rules E E + T, E T, T T F, T F,

F (E), F a, (* Picture of tree goes here *)

is shown below. The yield of the derivation tree is a + a a. Derivations trees are associated to derivations inductively as follows. Denition 2.12.2 Given a context-free grammar G = (V, , P, S), for any A N , if : A = is a derivation in G, we construct an A-derivation tree t with yield as follows. (1) If n = 0, then t is the one-node tree such that dom(t ) = {} and t () = A. (2) If A = B = = , then if t1 is the A-derivation tree with yield B associated with the n1 derivation A = B, and if t2 is the tree associated with the production A (that is, if = X1 Xk , then dom(t2 ) = {, 1, . . . , k}, t2 () = A, and t2 (i) = Xi for all i, 1 i k, or if = , then dom(t2 ) = {, 1}, t2 () = A, and t2 (1) = ), then t = t1 [u t2 ], where u is the address of the leaf labeled B in t1 . The tree t is the A-derivation tree associated with the n derivation A = . Given the grammar G2 = ({E, +, , (, ), a}, {+, , (, ), a}, E, P ), where P is the set of rules E E + E, E E E, E (E), E a,
n1 n

the parse trees associated with two derivations of the string a + a a are shown below. (* insert picture of trees here *) The following lemma is easily shown. Lemma 2.12.3 Let G = (V, , P, S) be a context-free grammar. For any derivation A = , there is a unique A-derivation tree associated with this derivation, with yield . Conversely, for any A-derivation tree t with yield , there is a unique leftmost derivation A = in G having t as its associated derivation tree.
lm n

We will now prove a strong version of the pumping lemma for context-free languages due to Bill Ogden (1968). 83

2.13

Ogdens Lemma

Ogdens lemma states some combinatorial properties of parse trees that are deep enough. The yield w of such a parse tree can be split into 5 substrings u, v, x, y, z such that w = uvxyz, where u, v, x, y, z satisfy certain conditions. It turns out that we get a more powerful version of the lemma is we allow ouselves to mark certain occurrences of symbols in w before invoking the lemma. We can imagine that marked occurrences in a nonempty string w are occurrences of symbols in w in boldface, or red, or any given color. For example, given w = aaababbbaa, we can mark the symbols of even index as follows: aaababbbaa. In view of lemma 2.4.1, we can assume without loss of generality that we are considering grammars without -rules (except perhaps S ). Ogdens lemma only yields useful information for grammars G generating an innite language. We could make this hypothesis, but it seems more elegant to use the precondition that the lemma only applies to strings w L(D) such that w contains at least K marked occurrences, for a constant K large enough. If K is large enough, L(G) will indeed be innite. Lemma 2.13.1 For every context-free grammar (without -rules), there is some integer K > 1 such that, for every marked string w + , if w L(G) and w contains at least K marked occurrences, then there exists some decomposition of w as w = uvxyz, and some A N , such that the following properties hold: (1) There are derivations S = uAz, A = vAy, and A = x, so that uv n xy n z L(G) for all n 0 (the pumping property); (2) x contains some marked occurrence; (3) Either (both u and v contain some marked occurrence), or (both y and z contain some marked occurrence); (4) vxy contains less than K marked occurrences. Proof . Let t be any parse tree for w. We call a leaf of t a marked leaf if its label is a marked occurrence in the marked string w. The general idea is to make sure that K is large enough so that parse trees with yield w contain enough repeated nonterminals along some path from the root to some marked leaf. Let r = |N |, and let p = max{2, max{|| | (A ) P }}. We claim that K = p2r+3 does the job. Is is easily shown by induction on the depth of trees that if the outdegree of a tree t is bounded by p 2, then if t has at least pd leaves, the depth of t is at least d (and the maximum outdegree of nodes in t is at least 2. This is false for p = 1). Thus, if K = p2r+3 , since p 2 and we are assuming that the marked input w has at least K marked occurrences, the depth of any parse tree for w is a least 2r + 3. The key concept in the proof is the notion of a B-node. Given a parse tree t, a B-node is a node with at least two immediate successors u1 , u2 , such that for i = 1, 2, either ui is a marked leaf, or ui has some marked leaf as a descendant. We consider all paths from the root to some marked leaf maximizing the number of B-nodes. Among such paths, dene a path (s0 , . . . , sn ) from the root to some marked leaf, such that: (i) s0 is the root of t;
+ + +

(ii) If sj is in the path, sj is not a leaf, and sj has a single descendant which is a marked leaf v, let sj+1 be the immediate successor of sj on the unique path from sj to the marked leaf v; (iii) If sj is a B-node in the path, then let sj+1 be the leftmost immediate successors of sj with the maximum number of marked leaves as descendants (assuming that if sj+1 is a marked leaf, then it is its own descendant). (iv) If sj is a leaf, then it is a marked leaf and n = j. We will show that the path (s0 , . . . , sn ) contains at least 2r + 3 B-nodes. Claim: For every i, 0 i n, if the path (si , . . . , sn ) contains at most b B-nodes, then si has at most pb marked leaves as descendants. Proof . We proceed by backward induction, i.e., by induction on n i. For i = n, there are no B-nodes, so that b = 0, and there is indeed p0 = 1 marked leaf sn . Assume that the claim holds for the path (si+1 , . . . , sn ). If si is not a B-node, then the number b of B-nodes in the path (si+1 , . . . , sn ) is the same as the number of B-nodes in the path (si , . . . , sn ), and si+1 is the only immediate successor of si having a marked leaf as descendant. By the induction hypothesis, si+1 has at most pb marked leaves as descendants, and this is also an upper bound on the number of marked leaves which are descendants of si . If si is a B-node, then if there are b B-nodes in the path (si+1 , . . . , sn ), there are b + 1 B-nodes in the path (si , . . . , sn ). By the induction hypothesis, si+1 has at most pb marked leaves as descendants. Since si is a B-node, si+1 was chosen to be the leftmost immediate successor of si having the maximum number of marked leaves as descendants. Thus, since the outdegree of si is at most p, and each of its immediate successors has at most pb marked leaves as descendants, the node si has at most ppd = pd+1 marked leaves as descendants, as desired. Applying the claim to s0 , since w has at least K = p2r+3 marked occurrences, the path (s0 , . . . , sn ) contains at least 2r + 3 B-nodes. Let us now select the lowest 2r + 3 B-nodes in the path, (s0 , . . . , sn ), and denote them (b1 , . . . , b2r+3 ). Every B-node bi has at least two immediate successors ui < vi such that ui or vi is on the path (s0 , . . . , sn ). If the path goes through ui , we say that bi is a right B-node and if the path goes through vi , we say that bi is a left B-node. Since 2r + 3 = r + 2 + r + 1, either there are r + 2 left B-nodes or there are r + 2 right B nodes in the path (b1 , . . . , b2r+3 ). Let us assume that there are r + 2 left B nodes, the other case being similar. Let (d1 , . . . , dr+2 ) be the lowest r + 2 left B-nodes in the path. Since there are r + 1 B-nodes in the sequence (d2 , . . . , dr+2 ), and there are only r distinct nonterminals, there are two nodes di and dj , with 2 i < j r + 2, such that t(di ) = t(dj ) = A, for some A N . We can assume that di is an ancestor of dj , and thus, dj = di v, for some v = . If we prune out the subtree t/di rooted at di from t, we get an S-derivation tree having a yield of the form + uAz, and we have a derivation of the form S = uAz, since there are at least r + 2 left B-nodes on the path, and we are looking at the lowest r + 1 left B-nodes. Considering the subtree t/di , pruning out the subtree t/dj rooted at v in t/di , we get an A-derivation tree having a yield of the form vAy, and we have + a derivation of the form A = vAy. Finally, the subtree t/dj is an A-derivation tree with yield x, and we + have a derivation A = x. This proves (1) of the lemma. Since sn is a marked leaf and a descendant of dj , x contains some marked occurrence, proving (2). Since d1 is a left B-node, one of its immediate successors to the left of di leads to some marked leaf, and thus u contains some marked occurrence. Similarly, since di is a left B-node, one of its immediate successors to the left of di+1 leads to some marked leaf, and thus v contains some marked occurrence. This proves (3). Finally, since (b1 , . . . , b2r+3 ) are the lowest 2r + 3 B-nodes in the path, and (d1 , . . . , dr+2 ) is a subsequence of (b1 , . . . , b2r+3 ), the sequence (dj , . . . , b2r+3 ) has at most 2r + 2 B-nodes, and by the claim shown earlier, dj has at most p2r+2 marked leaves as descendants. Since p2r+2 < p2r+3 = K, this proves (4). 85

Observe that condition (2) implies that x = , and condition (3) implies that either u = and v = , or y = and z = . Thus, the pumping condition (1) implies that the set {uv n xy n z | n} is an innite subset of L(G), and L(G) is indeed innite, as we mentioned earlier. The standard pumping lemma due to Bar-Hillel, Perles, and Shamir, is obtained by letting all occurrences be marked in w L(G). Lemma 2.13.2 For every context-free grammar (without -rules), there is some integer K > 1 such that, for every string w + , if w L(G) and |w| K, then there exists some decomposition of w as w = uvxyz, and some A N , such that the following properties hold: (1) There are derivations S = uAz, A = vAy, and A = x, so that uv n xy n z L(G) for all n 0 (the pumping property); (2) x = ; (3) Either v = or y = ; (4) |vxy| K. A stronger version could be stated, and we are just following tradition in stating this standard version of the pumping lemma. The pumping lemma or Ogdens lemma can be used to show that certain languages are not context-free. As an illustration, we show that the language L = {an bn cn | n 1} is not context-free. Since L is innite, we will be able to use the pumping lemma. The proof proceeds by contradiction. If L was context-free, there would be some context-free grammar G such that L = L(G), and some constant K > 1 as in Ogdens lemma. Let w = aK bK cK , and choose the b s as marked occurrences. Then by Ogdens lemma, x contains some marked occurrence, and either both u, v or both y, z contain some marked occurrence. Assume that both u and v contain some b. We have the following situation: a ab b b b b bc c .
u v xyz + + +

If we consider the string uvvxyyz, the number of as is still K, but the number of bs is strictly greater than K since v contains at least one b, and thus uvvxyyz L, a contradiction. / If both y and z contain some b we will also reach a contradiction because in the string uvvxyyz, the number of cs is still K, but the number of bs is strictly greater than K. Having reached a contradiction in all cases, we conclude that L is not context-free. Let us now show that the language L = {am bn cm dn | m, n 1} is not context-free. Again, we proceed by contradiction. This time, let w = aK bK cK dK , where the bs and cs are marked occurrences.

By Ogdens lemma, either both u, v contain some marked occurrence, or both y, z contain some marked occurrence, and x contains some marked occurrence. Let us rst consider the case where both u, v contain some marked occurrence. If v contains some b, since uvvxyyz L, v must contain only bs, since otherwise we would have a bad string in L, and we have the following situation: a ab b b b b bc cd d .
u v xyz

Since uvvxyyz L, the only way to preserve an equal number of bs and ds is to have y d+ . But then, vxy contains cK , which contradicts (4) of Ogdens lemma. If v contains some c, since uvvxyyz L, v must contain only cs, since otherwise we would have a bad string in L, and we have the following situation: a ab bc c c c c cd d .
u v xyz

Since uvvxyyz L and the number of as is still K whereas the number of cs is strictly more than K, this case is impossible. Let us now consider the case where both y, z contain some marked occurrence. Reasoning as before, the only possibility is that v a+ and y c+ : a a a a a ab bc c c c c cd d .
u v x y z

But then, vxy contains bK , which contradicts (4) of Ogdens Lemma. Since a contradiction was obtained in all cases, L is not context-free. Ogdens lemma can also be used to show that the language {am bn cn | m, n 1} {am bm cn | m, n 1} is inherently ambiguous. The proof is quite involved. Another corollary of the pumping lemma is that it is decidable whether a context-free grammar generates an innite language. Lemma 2.13.3 Given any context-free grammar G (without -rules), there is some K > 1 such that L(G) is innite i there is some w L(G) such that K |w| < 2K. Proof . If there is some w L(G) such that |w| K, we already observed that the pumping lemma implies that L(G) contains an innite subset of the form {uv n xy n z | n 0}. Conversely, assume that L(G) is innite. If |w| < K for all w L(G), then L(G) is nite. Thus, there is some w L(G) such that |w| K. Let w L(G) be a minimal string such that |w| K. By the pumping lemma, we can write w as w = uvxyxz, where x = , vy = , and |vxy| K. By the pumping property, uxz L(G). If |w| 2K, then |uxz| = |uvxyz| |vy| > |uvxyz| |vxy| 2K K = K, and |uxz| < |uvxyz|, contradicting the minimality of w. Thus, we must have |w| < 2K. In particular, if G is in Chomsky Normal Form, it can be shown that we just have to consider derivations of length at most 4K 3. 87

2.14

Pushdown Automata

We have seen that the regular languages are exactly the languages accepted by DFAs or NFAs. The contextfree languages are exactly the languages accepted by pushdown automata, for short, PDAs. However, although there are two versions of PDAs, deterministic and nondeterministic, contrary to the fact that every NFA can be converted to a DFA, nondeterministic PDAs are strictly more powerful than deterministic PDAs (DPDAs). Indeed, there are context-free languages that cannot be accepted by DPDAs. Thus, the natural machine model for the context-free languages is nondeterministic, and for this reason, we just use the abbreviation PDA, as opposed to NPDA. We adopt a denition of a PDA in which the pushdown store, or stack, must not be empty for a move to take place. Other authors allow PDAs to make move when the stack is empty. Novices seem to be confused by such moves, and this is why we do not allow moves with an empty stack. Intuitively, a PDA consists of an input tape, a nondeterministic nite-state control, and a stack. Given any set X possibly innite, let Pf in (X) be the set of all nite subsets of X. Denition 2.14.1 A pushdown automaton is a 7-tuple M = (Q, , , , q0 , Z0 , F ), where Q is a nite set of states; is a nite input alphabet ; is a nite pushdown store (or stack) alphabet ; q0 Q is the start state (or initial state); Z0 is the initial stack symbol (or bottom marker ); : Q ( {}) Pf in (Q ) is the transition function. A transition is of the form (q, ) (p, a, Z), where p, q Q, Z , and a {}. A transition of the form (q, ) (p, , Z) is called an -transition (or -move). The way a PDA operates is explained in terms of Instantaneous Descriptions, for short IDs. Intuitively, an Instantaneous Description is a snapshot of the PDA. An ID is a triple of the form (p, u, ) Q . The idea is that p is the current state, u is the remaining input, and represents the stack. It is important to note that we use the convention that the leftmost symbol in represents the topmost stack symbol. Given a PDA M , we dene a relation M between pairs of IDs. This is very similar to the derivation relation =G associated with a context-free grammar. Intuitively, a PDA scans the input tape symbol by symbol from left to right, making moves that cause a change of state, an update to the stack (but only at the top), and either advancing the reading head to the next symbol, or not moving the reading head during an -move. F Q is the set of nal (or accepting) states;

Denition 2.14.2 Given a PDA M = (Q, , , , q0 , Z0 , F ), the relation M is dened as follows: (1) For any move (q, ) (p, a, Z), where p, q Q, Z , a , for every ID of the form (p, au, Z), we have (p, au, Z) M (q, u, ). (2) For any move (q, ) (p, , Z), where p, q Q, Z , for every ID of the form (p, u, Z), we have (p, u, Z) M (q, u, ). As usual, + is the transitive closure of M , and is the reexive and transitive closure of M . M M A move of the form (p, au, Z) M (q, u, ) where a {}, is called a pop move. A move on a real input symbol a causes this input symbol to be consumed, and the reading head advances to the next input symbol. On the other hand, during an -move, the reading head stays put. When we say that we have a computation. There are several equivalent ways of dening acceptance by a PDA. (p, u, ) (q, v, ) M

Denition 2.14.3 Given a PDA M = (Q, , , , q0 , Z0 , F ), the following languages are dened: (1) T (M ) = {w | (q0 , w, Z0 ) (f, , ), where f F , and }. M We say that T (M ) is the language accepted by M by nal state. (2) N (M ) = {w | (q0 , w, Z0 ) (q, , ), where q Q}. M We say that N (M ) is the language accepted by M by empty stack . (3) L(M ) = {w | (q0 , w, Z0 ) (f, , ), where f F }. M We say that L(M ) is the language accepted by M by nal state and empty stack . In all cases, note that the input w must be consumed entirely. The following lemma shows that the acceptance mode does not matter for PDAs. As we will see shortly, it does matter for DPDAs. Lemma 2.14.4 For any language L, the following facts hold. (1) If L = T (M ) for some PDA M , then L = L(M ) for some PDA M . (2) If L = N (M ) for some PDA M , then L = L(M ) for some PDA M . (3) If L = L(M ) for some PDA M , then L = T (M ) for some PDA M . (4) If L = L(M ) for some PDA M , then L = N (M ) for some PDA M . In view of lemma 2.14.4, the three acceptance modes T, N, L are equivalent. The following PDA accepts the language L = {an bn | n 1} by empty stack. Q = {1, 2}, = {Z0 , a}; (1, a) (1, a, Z0 ), (1, aa) (1, a, a), (2, ) (1, b, a), (2, ) (2, b, a). The following PDA accepts the language L = {an bn | n 1} by nal state and empty stack. Q = {1, 2, 3}, = {Z0 , A, a}, F = {3}; (1, A) (1, a, Z0 ), (1, aA) (1, a, A), 90

(1, aa) (1, a, a), (2, ) (1, b, a), (2, ) (2, b, a), (3, ) (1, b, A), (3, ) (2, b, A). DPDAs are dened as follows. Denition 2.14.5 A PDA M = (Q, , , , q0 , Z0 , F ) is a deterministic PDA (for short, DPDA), i the following conditions hold for all (p, Z) Q : either (1) |(p, a, Z)| = 1 for all a , and (p, , Z) = , or (2) (p, a, Z) = for all a , and |(p, , Z)| = 1. A DPDA operates in realtime i it has no -transitions. It turns out that for DPDAs the most general acceptance mode is by nal state. Indeed, there are language that can only be accepted deterministically as T (M ). The language L = {am bn | m n 1} is such an example. The problem is that am b is a prex of all strings am bn , with m n 2. A language L is a deterministic context-free language i L = T (M ) for some DPDA M . A PDA is unambiguous i for every w , there is at most one computation (q0 , w, Z0 ) IDn , where IDn is an accepting ID. We now show that every context-free language is accepted by a PDA.

2.15

From Context-Free Grammars To PDAs

We show how a PDA can be easily constructed from a context-free grammar. Although simple, the construction is not practical for parsing purposes, since the resulting PDA is horribly nondeterministic. Given a context-free grammar G = (V, , P, S), we dene a one-state PDA M as follows: Q = {q0 }; = V ; q0 = S; Z0 = S; F = ; For every rule (A ) P , there is a transition (q0 , ) (q0 , , A). For every a , there is a transition

(q0 , ) (q0 , a, a).

The intuition is that a computation of M mimics a leftmost derivation in G. One might say that we have a pop/expand PDA.

Lemma 2.15.1 Given any context-free grammar G = (V, , P, S), the PDA M just described accepts L(G) by empty stack, i.e., L(G) = N (M ). Proof . The following two claims are proved by induction. Claim 1: For all u, v and all N V {}, if S = u, then
lm

(q0 , uv, S) (q0 , v, ). Claim 2: For all u, v and all V , if then S = u.

(q0 , uv, S) (q0 , v, )

We now show how a PDA can be converted to a context-free grammar

2.16

From PDAs To Context-Free Grammars

The construction of a context-free grammar from a PDA is not really dicult, but it is quite messy. The construction is simplied if we rst convert a PDA to an equivalent PDA such that for every move (q, ) (p, a, Z) (where a {}), we have || 2. In some sense, we form a kind of PDA in Chomsky Normal Form. Lemma 2.16.1 Given any PDA M = (Q, , , , q0 , Z0 , F ), another PDA
M = (Q , , , , q0 , Z0 , F )

can be constructed, such that L(M ) = L(M ) and the following conditions hold: (1) There is a one-to-one correspondence between accepting computations of M and M ; (2) If M has no -moves, then M has no -moves; If M is unambiguous, then M is unambiguous;
(3) For all p Q , all a {}, and all Z , if (q, ) (p, a, Z), then q = q0 and || 2.

The crucial point of the construction is that accepting computations of a PDA accepting by empty stack and nal state can be decomposed into subcomputations of the form (p, uv, Z) (q, v, ), where for every intermediate ID (s, w, ), we have = for some = . The nonterminals of the grammar constructed from the PDA M are triples of the form [p, Z, q] such that (p, u, Z) + (q, , ) for some u . Given a PDA M = (Q, , , , q0 , Z0 , F ) satisfying the conditions of lemma 2.16.1, we construct a context-free grammar G = (V, , P, S) as follows: V = {[p, Z, q] | p, q Q, Z } {S}, where S is a new symbol, and the productions are dened as follows: for all p, q Q, all a {}, all X, Y, Z , we have: (1) S P , if q0 F ; (2) S a P , if (f, ) (q0 , a, Z0 ), and f F ; (3) S a[p, X, f ] P , for every f F , if (p, X) (q0 , a, Z0 ); (4) S a[p, X, s][s, Y, f ] P , for every f F , for every s Q, if (p, XY ) (q0 , a, Z0 ); (5) [p, Z, q] a P , if (q, ) (p, a, Z) and p = q0 ; (6) [p, Z, s] a[q, X, s] P , for every s Q, if (q, X) (p, a, Z) and p = q0 ; (7) [p, Z, t] a[q, X, s][s, Y, t] P , for every s, t Q, if (q, XY ) (p, a, Z) and p = q0 .

Lemma 2.16.2 Given any PDA M = (Q, , , , q0 , Z0 , F ) satisfying the conditions of lemma 2.16.1, the context-free grammar G = (V, , P, S) constructed as above generates L(M ), i.e., L(G) = L(M ). Furthermore, G is unambiguous i M is unambiguous. Proof . We have to prove that L(G) = {w + | (q0 , w, Z0 ) + (f, , ), f F }

{ | q0 F }.

For this, the following claim is proved by induction. Claim: For all p, q Q, all Z , all k 1, and all w , [p, Z, q] = w
lm k

i (p, w, Z) + (q, , ).

Using the claim, it is possible to prove that L(G) = L(M ), In view of lemmas 2.16.1 and 2.16.2, the family of context-free languages is exactly the family of languages accepted by PDAs. It is harder to give a grammatical characterization of the deterministic context-free languages. One method is to use Knuth LR(k)-grammars. Another characterization can be given in terms of strict deterministic grammars due to Harrison and Havel.

Chapter 3

Computability
The subject of computability is about dening mathematically what computation might be and then using those denitions for exploration of the subject. For example, before we talk about complexity of an algorithm it is probably a good idea to dene rst what our model of computation we are using. When did people rst start studying computability ? This is a fuzzy thing, since the subject of computability is intermixed with many dierent things. But what is clear is that the subject as we present it here came in to being in the 1930s due to the work of Church, Godel, Kleene, Post and Turing. At least one inuence was that real computers were being developed. But the subject was also of interest to other types of mathematician of the time. In particular, a great deal of work was initiated by Hilbert in his 1900 address at the international congress of mathematicians. There Hilbert gave a list of 21 problems that he thought were important. The so-called 10th, was to nd an algorithm that would check if any given set of polynomial equations (i.e. a Diophantine system) had solutions. This problem was not settled until 1971, when Matiyasevich, building on the work of Davis, Putnam and ??? showed that no such algorithm could exist. But the point is, that in 1900 there was no good denition of algorithm. Hilberts 10th problem may be considered just one of the pressures that were building to have such a denition. As for the fuzzy aspect mentioned above, there are a number of things that can be said. As was mentioned above, in 1900 no one was sure what was really meant precisely by an algorithm or procedure. Despite this, in the 19th century when non-constructive methods were rst introduced (noticed?), they were not accepted by all mathematicians. Intuitively, by a non-constructive proof we mean a proof that doesnt give an algorithm that produces the required object, but just abstractly shows the existence of the object. This led to something of an argument among mathematicians, notably between Hilbert and Brouwer as to whether the non-constructive proofs were right. This eventually led to a branch of mathematics known as constructive analysis. Thus, in constructive analysis, for example, if a real-valued function is to be shown to have a root, then an algorithm for generating the root must be given. Looking further back, a notion that is analogous to computation is the greek idea of a constructible number. Here we are given a few tools, usually a ruler and compass, (i.e. presumably a means of drawing) and use them to draw gures. We then can ask what lengths may be constructed with these tools. Thus a number is be constructible if there was a nite sequence of operations with these tools resulting in a segment whose length was that number. With the ruler and compass it turns out that only real algebraic numbers that are expressible with nested square roots of rational numbers may be constructed. We mean this analogy of a construction of a number with straight edge and compass with the computation of a function as only a rough analogy to help guide the reader with familiar examples. But it brings up an interesting point. For two thousand years it was an open problem as to whether an angle could be tri-sected with a ruler and compass. But it was known to Archimedes that if other tools (quadric surfaces, essentially) were allowed, then the angle could be tri-sected. This sort of phenomenon was troublesome when in the 1920s people started making attempts at dening what computable should mean. Namely, what operations should be allowed? A reasonable seeming choice leads to the denition of the primitive recursive functions. Unfortunately, this

class of functions turns out to be a little too small for the study of the subject. Interestingly though most of the programs that are written today are essentially primitive recursive. One really has to look hard to nd a computable function that is not primitive recursive. Clearly, early mechanical devices for doing arithmetic, such as the one designed and built by Pascal in the 17th century, and the sophisticated machines of Babbage in the early 19th century may be considered as ancestors of the modern computer. But one should be aware that these machines, like modern computers, are nite state machines. The models that we will present for computation below are not nite state, i.e. they have an unbounded amount of memory. For studying the theoretical aspects of computation this is appropriate, but often leads to results that are impractical or irrelevant in real life computing. Nevertheless, it is a suitable foundation and a rst step towards making a mathematical model of computing. If we want to worry about space and time limitations then we can impose them on our model and engage in what is know as complexity theory. In order to dene what we mean for something to be computable, it is good to rst settle on what the something should be. A reasonable seeming place would be functions from the natural numbers to the natural numbers. But clearly we would like to be able to compute things with several inputs, so we rst settle on functions from n-tuples of natural numbers to the natural numbers. We now dene the Turing machine model, which computes functions f : ,
m

where = {a1 , . . . , aN } is some input alphabet. We only consider deterministic Turing machines. A Turing machine also uses a tape alphabet such that . The tape alphabet contains a distinguished symbol B ,called the blank . In the model that we will be considering, a Turing machine uses a single / tape to store information. This tape can be viewed as a string over . The tape is used both as an input tape and a storage mechanism. Symbols on the tape can be overwritten and the tape can grow in either the left or right direction. There is a read/write head that is, any any given instant during the computation, pointing to a symbol on the tape. Unlike Pushdown automata or NFAs, the read/write head can move to the left or the right. We now give the formal denition: Denition 3.0.3 A (deterministic) Turing machine (or TM ) M is a sextuple M = (Q, , , {L, R}, , q0), where Q is a nite set of states; is a nite input alphabet ; is a nite tape alphabet , s.t. , Q = , and with blank B ; / q0 Q is the start state (or initial state); is the transition function, a (nite) set of quintuples Q {L, R} K, such that for all (p, a) Q, there is at most one triple (b, m, q) {L, R}Q such that (p, a, b, m, q) . A quintuple (p, a, b, m, q) is called an instruction. It is also denoted as p, a b, m, q. The eect of an instruction is to switch from state p to state q, overwrite the symbol currently scanned a with b, and move the read/write head either left or right, according to m. There is a convenient way to represent an instruction with a diagram. If the instruction is p, a b, m, q. as above then we draw the diagram A Turing machine can be represented by drawing a circle for each state and then connecting the states as described above. 97

Example 3.0.4 Consider the Turing machine

3.1

Computations of Turing Machines

To explain how a Turing machine works, we describe its action on Instantaneous descriptions. We take advantage of the fact that K = to dene instantaneous descriptions. Denition 3.1.1 Given a Turing machine M = (K, , , {L, R}, , q0), an instantaneous description (for short an ID) is a (nonempty) string in K+ , that is, a string of the form upav, where u, v , p K, and a . The intuition is that an ID upav describes a snapshot of a TM in the current state p, whose tape contains the string uav, and with the read/write head pointing to the symbol a. Thus, in upav, the state p is just to the left of the symbol presently scanned by the read/write head. We explain how a TM works by showing how it acts on IDs. Denition 3.1.2 Given a Turing machine M = (K, , , {L, R}, , q0), the yield relation (or compute relation) is a binary relation dened on the set of IDs as follows. For any two IDs ID1 and ID2 , we have ID1 ID2 i either 1. (p, a, b, R, q) , and either (a) ID1 = upacv, c , and ID2 = ubqcv, or (b) ID1 = upa and ID2 = ubqB; or 2. (p, a, b, L, q) , and either (a) ID1 = ucpav, c , and ID2 = uqcbv, or (b) ID1 = pav and ID2 = qBbv. Note how the tape is extended by one blank after the rightmost symbol in case (1)(b), and by one blank before the leftmost symbol in case (2)(b). As usual, we let + denote the transitive closure of , and we let denote the reexive and transitive closure of . We can now explain how a Turing function computes a partial function f : .
m

Since we allow functions taking m 1 input strings, we assume that contains the special delimiter , not in , used to separate the various input strings. It is convenient to assume that a Turing machine cleans up its tape when it halts, before returning its output. For this, we will dene proper IDs. We also dene starting and blocking IDs. 98

Denition 3.1.3 Given a Turing machine M = (K, , , {L, R}, , q0), where contains some delimiter , not in in addition to the blank B, a starting ID is of the form q0 w1 ,w2 , . . . ,wm where w1 , . . . , wm and m 2, or q0 w with w + , or q0 B. A blocking (or halting) ID is an ID upav such that there are no instructions (p, a, b, m, q) for any (b, m, q) {L, R} K. A proper ID is a halting ID of the form B k pwB l , where w , and k, l 0 (with l 1 when w = ). Computation sequences are dened as follows. Denition 3.1.4 Given a Turing machine M = (K, , , {L, R}, , q0), a computation sequence (or computation) is a nite or innite sequence of IDs ID0 , ID1 , . . . , IDi , IDi+1 , . . . , such that IDi IDi+1 for all i 0. A computation sequence halts i it is a nite sequence of IDs, so that ID0 IDn , and IDn is a halting ID. A computation sequence diverges if it is an innite sequence of IDs. We now explain how a Turing machine computes a partial function. Denition 3.1.5 A Turing machine M = (K, , , {L, R}, , q0) computes the partial function f :
m

i the following conditions hold: 1. For every w1 , . . . , wm , given the starting ID ID0 = q0 w1 ,w2 , . . . ,wm or q0 w with w + , or q0 B, the computation sequence of M from ID0 halts in a proper ID i f (w1 , . . . , wm ) is dened. 2. If f (w1 , . . . , wm ) is dened, then M halts in a proper ID of the form IDn = B k pf (w1 , . . . , wm )B h , which means that it computes the right value. A function f (over ) is Turing computable i it is computed by some Turing machine M . 99

Note that by (1), the TM M may halt in an improper ID, in which case f (w1 , . . . , wm ) must be undened. This corresponds to the fact that we only accept to retrieve the output of a computation if the TM has cleaned up its tape, i.e., produced a proper ID. In particular, intermediate calculations have to be erased before halting. Example. K = {q0 , q1 , q2 , q3 }; = {a, b}; = {a, b, B}; The instructions in are: q0 , B B, R, q3 , q0 , a b, R, q1 , q0 , b a, R, q1 , q1 , a b, R, q1 ,

q1 , b a, R, q1 , q1 , B B, L, q2 , q2 , b b, L, q2 , q2 , B B, R, q3 . The reader can easily verify that this machine exchanges the as and bs in a string. For example, on input w = aaababb, the output is bbbabaa. It turns out that the class of Turing computable functions coincides with the class of partial recursive functions. First, we dene the primitive recursive functions. q2 , a a, L, q2 ,

3.2

The Primitive Recursive Functions

The primitive recursive functions were rst formally dened by Goedel in 1929, although they were known to Hilbert. Like nite state automata, they are a simplied model of computation. But unlike regular languages, most functions (languages) that we encounter are primitive recursive. One has to look around a bit to nd a function that is computable but not primitive recursive. And once such a function is found, it is usually not an easy task to show that it is not primitive recursive and it will be a very hard function to compute. It seems reasonable that if two functions are computable then their composition should also be. Likewise for recursion. Also there are some primitive functions that we denitely want to consider computable : projection functions, the zero function, and the successor function. These are called the initial or base functions. Given these, we may use the above two operations to generate a class of functions closed under those operations. This class is known as the class of primitive recursive functions. To continue with the ruler and compass analogy, the rational lengths are like the initial functions and the operations of composition and recursion are like the ruler and compass. We now begin the formal denitions: Denition 3.2.1 Let = {a1 , . . . , aN }. The base functions over are the following functions:

1. The erase function E, dened such that E(w) = , for all w ; 2. For every j, 1 j N , the j-successor function Sj , dened such that Sj (w) = waj , for all w ; 100

3. The projection functions Pin , dened such that Pin (w1 , . . . , wn ) = wi , for every n 1, every i, 1 i n, and for all w1 , . . . , wn .
1 Note that P1 is the identity function on . Projection functions can be used to permute the arguments of another function.

A crucial closure operation is (extended) composition. Denition 3.2.2 Let = {a1 , . . . , aN }. For any function g: ,
m

and any m functions

h i : ,
n

the composition of g and the hi is the function f : ,

denoted as g (h1 , . . . , hm ), such that f (w1 , . . . , wn ) = g(h1 (w1 , . . . , wn ), . . . , hm (w1 , . . . , wn )), for all w1 , . . . , wn .
2 2 As an example, f = g (P2 , P1 ) is such that

f (w1 , w2 ) = g(w2 , w1 ). Another crucial closure operation is primitive recursion. Denition 3.2.3 Let = {a1 , . . . , aN }. For any function g: ,
m1

where m 2, and any N functions the function

h i : ,
m+1

f : ,
m

is dened by primitive recursion from g and h1 , . . . , hN , if f (, w2 , . . . , wm ) = g(w2 , . . . , wm ), f (ua1 , w2 , . . . , wm ) = h1 (u, f (u, w2 , . . . , wm ), w2 , . . . , wm ), ... = ... f (uaN , w2 , . . . , wm ) = hN (u, f (u, w2 , . . . , wm ), w2 , . . . , wm ), 101

for all u, w2 , . . . , wm . When m = 1, for some xed w , we have f () = w, f (ua1 ) = h1 (u, f (u)), ... = ... f (uaN ) = hN (u, f (u)), for all u . As an example, the following function g: , is dened by primitive recursion:
1 g(, v) = P1 (v), 3 g(uai , v) = Si (P2 (u, g(u, v), v)),

where 1 i N . It is easily veried that g(u, v) = vu. Then,

2 2 f = g (P2 , P1 )

computes the concatenation function, i.e. f (u, v) = uv. Denition 3.2.4 Let = {a1 , . . . , aN }. The class of primitive recursive functions is the smallest class of functions (over ) which contains the base functions and is closed under composition and primitive recursion. We leave as an exercise to show that every primitive function is a total function. The class of primitive recursive functions may not seem very big, but it contains just about any total function that anyone would ever want to compute. Although it is rather tedious to prove, the following theorem can be shown. Theorem 3.2.5 For an alphabet = {a1 , . . . , aN }, every primitive recursive function is Turing computable. The best way to prove the above theorem is probably to introduce yet another model of computation, RAM programs. Then, it can be shown that every Turing machine can simulate a RAM program. It is also rather easy to show that the RAM-computable functions contain the primitive recursive functions. It is very instructive to consider some examples of the primitive recursive functions over the natural numbers, the way they were originally dened. To do this within the denitions given above we simply let = {a}. Then we identify the sting an with the natural number n. Thus the erase function becomes the zero function E(x) = 0 the successor becomes addition by 1 S(x) = x + 1, and the projections become the usual projections on the natural numbers Pin (x1 , ..., xn ) = xi . This doesnt look like much to work with. But the amazing thing is that very complex functions can be made out of them.

102

Example 3.2.6 As a rst example lets work backwards. Consider the factorial function, f (n) = n!, which is given by the equations f (0) = 0, f (n + 1) = nf (n). How do we show that this function is primitive recursive ? We must demonstrate that we can construct it within the above formalism. Beware - it is not automatic that this function is primitive recursive. For example, the dening equations contain a multiplication. But we did show that concatenation was primitive recursive. What does this translate into over the natural numbers ? Addition, i.e. we now know that the function +(x, y) = x + y is primitive recursive. We want to show that the multiplication function, m(x, y) =xy, is primitive recursive. We may dene m by the equations m(x, 0) = x, m(x + 1, y) = +(m(x, y), y). This is still technically not enough, since we need to exhibit two the functions g and h used in the denition 2 3 3 of primitive recursive. But it is not hard to see that we may take g = P1 and h = + (P2 , P3 ). So far, the primitive recursive functions do not yield all the Turing-computable functions. In order to get a larger class of functions, we need the closure operation known as minimization.

3.3

The Partial Recursive Functions

The operation of minimization (sometimes called minimalization) is dened as follows. Denition 3.3.1 Let = {a1 , . . . , aN }. For any function g: ,
m+1

where m 0, for every j, 1 j N , the function f : ,

is dened by minimization over {aj } from g, if the following conditions hold for all w1 , . . . , wm : 1. f (w1 , . . . , wm ) is dened i there is some n 0 such that g(ap , w1 , . . . , wm ) is dened for all p, 0 p n, and j g(an , w1 , . . . , wm ) = . j 2. When f (w1 , . . . , wm ) is dened, where n is such that g(an , w1 , . . . , wm ) = j and g(ap , w1 , . . . , wm ) = j f (w1 , . . . , wm ) = an , j

for every p, 0 p n 1. The value f (w1 , . . . , wm ) is also denoted as minj u[g(u, w1 , . . . , wm ) = ]. 103

Note: When f (w1 , . . . , wm ) is dened,

f (w1 , . . . , wm ) = an , j

where n is the smallest integer such that condition (1) holds. It is very important to require that all the values g(ap , w1 , . . . , wm ) be dened for all p, 0 p n, when dening f (w1 , . . . , wm ). Failure to do so allows j non-computable functions. The class of partial computable functions is dened as follows. Denition 3.3.2 Let = {a1 , . . . , aN }. The class of partial recursive functions is the smallest class of functions (over ) which contains the base functions and is closed under composition, primitive recursion, and minimization. The class of recursive functions is the subset of the class of partial recursive functions consisting of functions dened for every input. One of the major results of computability theory is the following theorem. Theorem 3.3.3 For an alphabet = {a1 , . . . , aN }, every partial recursive function is Turing-computable. Conversely, every Turing-computable function is a partial recursive function. Similarly, the class of recursive functions is equal to the class of Turing-computable functions that halt in a proper ID for every input. To prove that every partial recursive function is indeed Turing-computable, the simplest thing to do is to show that every partial recursive function is RAM-computable. For the converse, one can show that given a Turing machine, there is a primitive recursive function describing how to go from one ID to the next. Then, minimization is used to guess whether a computation halts. The proof shows that every partial recursive function needs minimization at most once. The characterization of the recursive functions in terms of TMs follows easily. We can also deal with languages.

3.4

Recursively Enumerable Languages and Recursive Languages

The idea behind a set being recursively enumerable can be expressed by the following example. Suppose that we are seeking a integer solution to the equation x2 + y 2 = z 2 + z 4 . We may try to nd a solution to this equation by enumerating all integer triples and then plugging them into the equation one by one, each time checking to see if the triple is a solution. As we go along, if we get a triple that is a solution then we are done, otherwise we forget about it and move on to the next triple. This is ne except that, if there is no solution we will go on forever plugging in triples to the equation, and hence be caught in an innite loop. But we may get lucky and nd that there is a solution. Now lets generalize a little. Suppose that we wanted to write a program that would take as input a system of polynomial equations in several variables as input and determine if the system had a solution in the integers. Applying the above scheme to each equation seems like a good idea. But there is a problem: if the system of equations given to the program has no solution we get caught in an innite loop. So if our program returns an answer, it will be yes, the system has a solution. But otherwise we have to sit and wait, and there is no way of telling if the wait is going to be forever! This is one way of looking at recursively enumerable sets. The other way, which is equivalent, is that a recursively enumerable set is the simply the range of a recursive function. That is where the name comes from. In the denition that follows we assume that the TMs under consideration have a tape alphabet. containing the special symbols 0 and 1. 104

Denition 3.4.1 Let = {a1 , . . . , aN }. A language L is recursively enumerable (for short, an r.e. set) i there is some TM M such that for every w L, M halts in a proper ID with the output 1, and for every w L, either M halts in a proper ID with the output 0, or it runs forever. A language L is / recursive i there is some TM M such that for every w L, M halts in a proper ID with the output 1, and for every w L, M halts in a proper ID with the output 0. / Thus, given a recursively enumerable language L, for some w L, it is possible that a TM accepting L runs / forever on input w. On the other hand, for a recursive language L, a TM accepting L always halts in a proper ID. When dealing with languages, it is often useful to consider nondeterministic Turing machines. Such machines are dened just like deterministic Turing machines, except that their transition function is just a (nite) set of quintuples K {L, R} K, with no particular extra condition. It can be shown that every nondeterministic Turing machine can be simulated by a deterministic Turing machine, and thus, nondeterministic Turing machines also accept the class of r.e. sets. It can be shown that a recursively enumerable language is the range of some recursive function. It can also be shown that a language L is recursive i both L and its complement are recursively enumerable. There are recursively enumerable languages that are not recursive. Turing machines were invented by Turing around 1935. The primitive recursive functions were known to Hilbert circa 1890. Gdel formalized their denition in 1929. The partial recursive functions were dened by o Kleene around 1934. Church also introduced the -calculus as a model of computation around 1934. Other models: Post systems, Markov systems. The equivalence of the various models of computation was shown around 1935/36. RAM programs were only dened around 1963. A further study of the partial recursive functions requires the notions of pairing functions and of universal functions (or universal Turing machines).

3.5

Phrase-Structure Grammars

Context-free grammars can be generalized in various ways. The most general grammars generate exactly the recursively enumerable languages. Between the context-free languages and the recursively enumerable languages, there is a natural class of languages, the context-sensitive languages. The context-sensitive languages also have a Turing-machine characterization. We begin with phrase-structure grammars. Denition 3.5.1 A phrase-structure grammar is a quadruple G = (V, , P, S), where V is a nite set of symbols called the vocabulary (or set of grammar symbols); V is the set of terminal symbols (for short, terminals); S (V ) is a designated symbol called the start symbol ; P V N V V is a nite set of productions (or rewrite rules, or rules). Every production , is also denoted as . A production of the form is called an epsilon rule, or null rule. The set N = V is called the set of nonterminal symbols (for short, nonterminals).

105

Example 1. G1 = ({S, A, B, C, D, E, a, b}, {a, b}, S, P ), where P is the set of rules S ABC, AB aAD,

AB bAE, DC BaC, EC BbC, Da aD, Db bD, Ea aE,

Eb bE, AB ,

C , aB Ba, bB Bb.

It can be shown that this grammar generates the language L = {ww | w {a, b}}, which is not context-free.

3.6

Derivations and Type-0 Languages

The productions of a grammar are used to derive strings. In this process, the productions are used as rewrite rules. Denition 3.6.1 Given a phrase-structure grammar G = (V, , P, S), the (one-step) derivation relation =G associated with G is the binary relation =G V V dened as follows: for all , V , we have =G i there exist , V , and some production ( ) P , such that =
+

and = .

The transitive closure of =G is denoted as =G and the reexive and transitive closure of =G is denoted as =G . When the grammar G is clear from the context, we usually omit the subscript G in =G , =G , and =G . The language generated by a phrase-structure grammar is dened as follows. Denition 3.6.2 Given a phrase-structure grammar G = (V, , P, S), the language generated by G is the set + L(G) = {w | S = w}. A language L is a type-0 language i L = L(G) for some phrase-structure grammar G. 106
+

The following lemma can be shown. Lemma 3.6.3 A language L is recursively enumerable i it generated by some phrase-structure grammar G. In one direction, we can construct a nondeterministic Turing machine simulating the derivations of the grammar G. In the other direction, we construct a grammar simulating the computations of a Turing machine. We now consider some variants of the phrase-structure grammars.

3.7

Type-0 Grammars and Context-Sensitive Grammars

We begin with type-0 grammars. At rst glance, it may appear that they are more restrictive than phrasestructure grammars, but this is not so. Denition 3.7.1 A type-0 grammar is a phrase-structure grammar G = (V, , P, S), such that the productions are of the form , where N + . A production of the form is called an epsilon rule, or null rule. Lemma 3.7.2 A language L is generated by a phrase-structure grammar i it is generated by some type-0 grammar. We now place additional restrictions on productions, obtaining context-sensitive grammars. Denition 3.7.3 A context-sensitive grammar (for short, csg) is a phrase-structure grammar G = (V, , P, S), such that the productions are of the form A , with A N , V + , , V , or S ,

and if S P , then S does not appear on the right-hand side of any production. The notion of derivation is dened as before. A language L is context-sensitive i it is generated by some context-sensitive grammar. We can also dene monotonic grammars. Denition 3.7.4 A monotonic grammar is a phrasestructure grammar G = (V, , P, S), such that the productions are of the form with , V + and || ||, or S ,

and if S P , then S does not appear on the right-hand side of any production. 107

Example 2. where P is the set of rules G2 = ({S, A, B, C, a, b, c}, {a, b, c}, S, P ), S ABC,

AC CA, BC CB, BA AB, CA AC,

S ABCS, AB BA,

CB BC, A a, B b, C c.

It can be shown that this grammar generates the language L = {w {a, b, c}+ | #(a) = #(b) = #(c)}, which is not context-free. Lemma 3.7.5 A language L is generated by a context-sensitive grammar i it is generated by some monotonic grammar. The context-sensitive languages are accepted by space-bounded Turing machines, dened as follows. Denition 3.7.6 A linear-bounded automaton (for short, lba) is a nondeterministic Turing machine such that for every input w , there is some accepting computation in which at most |w| + 1 tape symbols are scanned. Lemma 3.7.7 A language L is generated by a context-sensitive grammar i it is accepted by a linear-bounded automaton. The class of context-sensitive languages is very large. The main problem is that no practical methods for constructing parsers from csgs are known.

3.8 3.9 3.10 3.11 3.12

The Halting Problem A Univeral Machine The Parameter Theorem Recursively Enumerable Languages Hilberts Tenth Problem
108

Chapter 4

Current Topics
4.1 4.2 4.3 4.4 DNA Computing Analog Computing Scientic Computing/Dynamical Systems Quantum Computing

109

Introduction To The Theory of Computation
No ratings yet
Introduction To The Theory of Computation
398 pages
Gallier Theory of Computation
No ratings yet
Gallier Theory of Computation
398 pages
Station Island
No ratings yet
Station Island
246 pages
Computation Theory for CS Students
No ratings yet
Computation Theory for CS Students
231 pages
Notes On Formal Languages Automata Computability and Complexity Gallier J Instant Download
100% (2)
Notes On Formal Languages Automata Computability and Complexity Gallier J Instant Download
56 pages
Theory of Computation
100% (2)
Theory of Computation
220 pages
Notes On Formal Languages Automata Computability and Complexity Gallier J - The Full Ebook Version Is Available, Download Now To Explore
100% (7)
Notes On Formal Languages Automata Computability and Complexity Gallier J - The Full Ebook Version Is Available, Download Now To Explore
45 pages
Book
No ratings yet
Book
401 pages
Final3 PDF
No ratings yet
Final3 PDF
91 pages
Notes
No ratings yet
Notes
193 pages
Computer Science Press - Introduction To Logic and Automata
No ratings yet
Computer Science Press - Introduction To Logic and Automata
302 pages
Automata Theory v0.8
No ratings yet
Automata Theory v0.8
50 pages
Full Notes
No ratings yet
Full Notes
152 pages
Introduction To The Theory of Computing
No ratings yet
Introduction To The Theory of Computing
227 pages
Computation Theory Essentials
100% (3)
Computation Theory Essentials
343 pages
MMI2023
No ratings yet
MMI2023
169 pages
Notes On Formal Languages Automata Computability and Complexity Gallier J Available Full Chapters
100% (1)
Notes On Formal Languages Automata Computability and Complexity Gallier J Available Full Chapters
132 pages
Lectures 1 To 31
No ratings yet
Lectures 1 To 31
118 pages
Theory of Computation
No ratings yet
Theory of Computation
120 pages
Theory of Automata
No ratings yet
Theory of Automata
2 pages
K.L.P. Mishra (FLAT)
43% (7)
K.L.P. Mishra (FLAT)
434 pages
Comput Ability Decid Ability Complexity
No ratings yet
Comput Ability Decid Ability Complexity
172 pages
Buehler - Classical Metalogic PDF
No ratings yet
Buehler - Classical Metalogic PDF
158 pages
Mathematical Logic Essentials
100% (1)
Mathematical Logic Essentials
222 pages
Mathlogicp1 PDF
No ratings yet
Mathlogicp1 PDF
122 pages
Automata Notes
No ratings yet
Automata Notes
75 pages
An Introduction To Formal Languages and Automata Book
No ratings yet
An Introduction To Formal Languages and Automata Book
14 pages
G52lac Notes
No ratings yet
G52lac Notes
128 pages
Godel
No ratings yet
Godel
296 pages
Butterfly Trigonometry
No ratings yet
Butterfly Trigonometry
4 pages
Warm Up Lesson Presentation Lesson Quiz
No ratings yet
Warm Up Lesson Presentation Lesson Quiz
23 pages
Logos LitChart
No ratings yet
Logos LitChart
4 pages
Mathematical Logic For Applications: Mikl Os Ferenczi - Mikl Ossz OTS
No ratings yet
Mathematical Logic For Applications: Mikl Os Ferenczi - Mikl Ossz OTS
120 pages
MacKenzie TheAutomationOfProof
No ratings yet
MacKenzie TheAutomationOfProof
23 pages
Fyalgbook
No ratings yet
Fyalgbook
47 pages
The Characteristics of Mathematical Creativity PDF
No ratings yet
The Characteristics of Mathematical Creativity PDF
16 pages
Unit-2 (39) METHODS OF PROOF PDF
No ratings yet
Unit-2 (39) METHODS OF PROOF PDF
20 pages
Mathematical Logic 1st Edition George Tourlakis Online Reading
100% (1)
Mathematical Logic 1st Edition George Tourlakis Online Reading
94 pages
The Principle of Mathematical Induction
No ratings yet
The Principle of Mathematical Induction
2 pages
Trivium Study Plan: Grammar, Logic, Rhetoric
No ratings yet
Trivium Study Plan: Grammar, Logic, Rhetoric
27 pages
Number Theory BSC Notes PDF
50% (2)
Number Theory BSC Notes PDF
24 pages
Computer Science Syllabus
No ratings yet
Computer Science Syllabus
115 pages
Module 13 Students
No ratings yet
Module 13 Students
13 pages
Role of Axiomatization in Math & Math Ed
100% (3)
Role of Axiomatization in Math & Math Ed
24 pages
John Locke's Knowledge Theory
No ratings yet
John Locke's Knowledge Theory
8 pages
PGT Maths 29 12 18 S1-DSC
0% (1)
PGT Maths 29 12 18 S1-DSC
58 pages
Pure Questions
No ratings yet
Pure Questions
15 pages
Math 8 3rd Grading Exam (Repaired)
No ratings yet
Math 8 3rd Grading Exam (Repaired)
4 pages
IGCSE Maths: Brackets & Proof
No ratings yet
IGCSE Maths: Brackets & Proof
11 pages
A Syntactic Approach To Eta Equality in Type Theory: Healfdene Goguen
100% (2)
A Syntactic Approach To Eta Equality in Type Theory: Healfdene Goguen
10 pages
Frap Book
No ratings yet
Frap Book
150 pages
Algorithmics PDF
No ratings yet
Algorithmics PDF
119 pages
Discrete Math Proofs Guide
No ratings yet
Discrete Math Proofs Guide
20 pages
Curs 1 DEPI - Prob - Slides - en
No ratings yet
Curs 1 DEPI - Prob - Slides - en
116 pages
Epsilondelta
No ratings yet
Epsilondelta
3 pages
AMC 8 2008-2009 - American Mathematics Competition - AMC 8 Math Club 2008-2009 - 132p
93% (14)
AMC 8 2008-2009 - American Mathematics Competition - AMC 8 Math Club 2008-2009 - 132p
132 pages
Hellenophilia Vs History of Science
100% (1)
Hellenophilia Vs History of Science
11 pages
CS Syllabus
No ratings yet
CS Syllabus
71 pages
Walton 2018
No ratings yet
Walton 2018
26 pages

The Theory of Languages and Computation: Preliminary Notes - Please Do Not Distribute

Uploaded by

The Theory of Languages and Computation: Preliminary Notes - Please Do Not Distribute

Uploaded by

The Theory of Languages and Computation

Preliminary notes - Please do not distribute.

3.5 3.6 3.7 3.8 3.9 3.10 3.11 3.12

tiger lion cheetah mouse rat beaver

gorilla chimpanzee naked mole rats

antelope caribou deer

blue whale sperm whale humpback whale moose

This point gets mapped to this point

Figure 1.7: Containment is a partial ordering on the power set of a set.

The Natural numbers and Induction

Figure 1.8: Induction via light bulbs

Figure 1.10: If bulb 0 is lit, then they all are!

Example 1.4.1 Show that 1 + 2 + + n =

Can you give a proof without induction ?

7 Use induction to show that 12 + 22 + + n2 = ? 8 Notice that

n(n + 1)(2n + 1) . What about for sums of higher powers 6

Foundations of Language Theory

For any language L, we dene Ln as follows:

Deterministic Finite Automata

Figure 1.15: Example 1.7.7 If L = {w {a, b} | |w| = 3 } then a DFA specifying L is

The Cross Product Construction

where the transition function : (Q1 Q2 ) Q1 Q2 is dened by

cross product figure

Non-Deterministic Finite Automata

Directed Graphs and Paths

Labeled Graphs and Automata

R is the set of regular expressions (over ).

L[(S + T )] = L[S] L[T ], L[(ST )] = L[S]L[T ], L[U ] = L[U ] . 38

Rp;r Rr;r Rr;q

The Theorem of Myhill and Nerode

We also claim that for every w and every block Ci , C1 w Ci

For every class Ci , let

State Equivalence and Minimal DFAs

i ( (q0 , v), w) F ), i (q0 , vw) F ),

2) The machine preprocessed.

0(01*1)*0 + 0(01*1)*01* f ( 0(01*1)*0)* ( + 0(01*1)*01* ) f

5) q has now been eliminated and we prepare the eliminate p.

6) The final machine with the desired regular expression.

This grammar generates a set of arithmetic expressions.

Derivations and Context-Free Languages

i there exist u , V , and some production (A ) P , such that = uA

i there exist V , v , and some production (A ) P , such that = Av

The claim is proved by induction on n. 60

leftmost derivation 1 = w, and we get the leftmost derivation = 1 = w.

for some production A . Thus, we have a derivation of the form = uAB = uA = u = w.

Normal Forms for Context-Free Grammars, Chomsky Normal Form

[B1 Bk1 ] [B1 Bk2 ]Bk1 , , [B1 B2 B3 ] [B1 B2 ]B3 , [B1 B2 ] B1 B2 .

E [X( E]X) , [X( E] X( E, E a, T [T X ]F,

Regular Languages are Context-Free

It is easily shown by induction on the length of w that i q = (p, w),

(A, a) = {B N | A aB P }, i B (A, w),

Useless Productions in Context-Free Grammars

Tn+1 = Tn {A (V ) | (A ) P, with (Tn ) }.

Un+1 = Un {B T (G) | (A B) P, with A Un , , (T (G) ) }.

The Greibach Normal Form

Context-Free Languages as Least Fixed-Points

[]({Ai }) = Li , if Ai N , []({X}) = []({})[]({X}), [](S {}) = [](S) []({}),

Then, writing the grammar G as A1 1,1 + + 1,n1 , Ai i,1 + + i,ni ,

Am m,1 + + m,nn , we dene the map G : 2 2 2 2

It is easy to verify that 1 (, ) = {ab}, G,A

2 (, ) = {ab, aabb}, G,B

2 (, ) = {ab, abab}, G,A 3 (, ) = {ab, aabbab, aabbaabb, abaabb}, G,A

3 (, ) = {ab, aabb, aaabbb}. G,B

i there is some rule Ai i,j with w = i,j , and that

Ai = i,j = w i,j = u1 Aj1 u2 uk Ajk uk+1 ,

h where u1 , . . . , uk+1 , k 1, then Ajh = wh , where nh n, and

then w = i,j for some rule A i,j , and Ai = w.

Least Fixed-Points and the Greibach Normal Form

+ (b, {aA, c})

(A, B) = (b, {aA, c}) Y1 Y3 Y2 Y4 = aB B {d, Aa}

+ (b, {aA, c}), Y2 Y4 + aB B {d, Aa}

{bY2 , aAY4 , cY4 , aA, c} {d, bY1 a, aAY3 a, cY3 a, ba}

Tree Domains and Gorn Trees

The tree t1 [22 t2 ] is dened by the following diagram: f h a a a g g b

where P is the set of rules E E + T, E T, T T F, T F,

F (E), F a, (* Picture of tree goes here *)

0(011)0 + 0(011)01* f ( 0(011)0)* ( + 0(011)01* ) f