COSC261  Formal Languages and Compilers
Walter Guttmann
2016
Contents
1 Finite Automata and Regular Languages
1.1 Languages . . . . . . . . . . . . . . . . . . . . .
1.2 Deterministic Finite Automata . . . . . . . . .
1.3 Non-Deterministic Finite Automata . . . . . . .
1.4 Regular Expressions . . . . . . . . . . . . . . .
1.5 Minimisation of Deterministic Finite Automata
1.6 Decision Problems for Regular Languages . . . .
1.7 Non-Regular Languages . . . . . . . . . . . . .
1.8 Modelling Independent Processes . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4
4
7
13
18
22
25
27
29
2 Context-Free Languages
2.1 Context-Free Grammars . . . . . . . . . . . .
2.2 The Cocke-Younger-Kasami Algorithm . . . .
2.3 Pushdown Automata . . . . . . . . . . . . . .
2.4 Non-Context-Free Languages . . . . . . . . . .
2.5 Decision Problems for Context-Free Languages
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
30
30
37
40
44
46
.
.
.
.
.
.
47
50
54
61
64
66
73
.
.
.
.
74
74
76
80
81
3 Compilers
3.1 Lexical Analysis . . . . . . . . . . .
3.2 Syntax Analysis . . . . . . . . . . .
3.3 Semantic Analysis . . . . . . . . . .
3.4 Machine-Independent Optimisation
3.5 Code Generation . . . . . . . . . .
3.6 Machine-Dependent Optimisation .
4 Computability and Complexity
4.1 Decision Problems . . . . . .
4.2 Turing Machines . . . . . . .
4.3 Undecidability . . . . . . . . .
4.4 Complexity Classes . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Motivation
specifying behaviour of software systems
 part of software engineering
 build a model of software to validate design before implementation
 generate code from model
UML state diagrams describe behaviour
 example: http://flylib.com/books/2/292/1/html/2/images/0321160762/graphics/08fig20.gif
 they are a fancy kind of finite state automaton
 non-deterministic automata leave choices open for the implementer
 orthogonal regions describe concurrent execution and interaction
 Are non-deterministic automata more expressive than deterministic ones?
 What about automata with orthogonal regions?
pattern matching
 regular expressions describe patterns
 search using regular expressions supported by many programs
 How can a pattern be matched to a text fast?
 Can all patterns be described by regular expressions?
compilers
 programs can be run by an interpreter or by compiling them first
 interpreting may be slow
 compiling to machine code avoids much of the overhead
 compiler performs analysis, code generation and optimisation
 How can these tasks be automated for different programming languages?
syntax analysis
 code which is not syntactically correct should not be executed
 context-free grammars describe the syntax of programs
 Python syntax at http://docs.python.org/3/reference/grammar.html
 CSS syntax at http://www.w3.org/TR/CSS21/grammar.html
 How does a parser check whether code conforms to a given syntax?
 Can a parser be generated automatically?
describing structure
 regular expressions for patterns
 context-free grammars for program syntax
 Can everything about a programs structure be described by context-free grammars?
expressivity versus efficiency
 context-sensitive and type-0 grammars are more expressive
 but they have less efficient algorithms
 How fast is checking whether an input matches a description?
 How fast is checking whether two descriptions are equivalent?
secure communication
 HTTPS, as in https://www.google.co.nz/
 authenticated and encrypted
 public-key encryption using RSA
 symmetric-key encryption using RC4
RSA
 needs two prime numbers p and q
 public key is n = pq
 computing n from p and q is easy
 breaking RSA by factorising n into p and q is considered hard
 What do easy and hard mean?
 Which problems are easy and which are hard?
complexity of problems
problem domain
numbers
graphs
graphs
graphs
logic
optimisation
easy
considered hard
primality testing
factorisation
shortest path
longest path
visit every edge exactly once visit every node exactly once
2-colouring
3-colouring
2-satisfiability
3-satisfiability
linear programming
integer linear programming
computability of problems
 computers are faster, smaller, cheaper than decades ago
 yet they solve the same kind of problems
 Are there different computation models?
 Which problems can computers solve?
 Are there problems they will never solve?
1
1.1
Finite Automata and Regular Languages
Languages
Alphabet, string and language are defined as follows:
 An alphabet  is a non-empty finite set of symbols.
 A string over  is a finite sequence of symbols from .
 The length |w| of a string w is the number of symbols in w.
 The empty string  is the unique string of length 0.
  is the set of all strings over .
 A language L over  is a set of strings L   .
Example:
Let a, b   be symbols and let x, y, z   be strings.
 Symbols and strings can be concatenated by writing one after the other.
 xy is the concatenation of strings x and y, and similarly for ax, xa or ab.
 Concatenation is associative: x(yz) = (xy)z, so parentheses are omitted as in xyz.
  is an identity for concatenation: x = x = x.
 |xy| = |x| + |y|.
Example:
Let A, B   be languages.
 Concatenate languages A and B by concatenating each string from A with each from B.
 AB = {xy | x  A and y  B}.
 Language concatenation is associative.
 {} is the identity of language concatenation.
Example:
Concatenation can be iterated.
 an is the string comprising n copies of the symbol a  .
 xn is the string that concatenates n copies of the string x   .
 These operations are defined inductively:
 The base case is x0 = .
 The inductive case is xn+1 = xn x.
Example:
An is defined similarly for a language A   .
 A0 = {}.
 An+1 = AAn .
 A1 = A, A2 = AA, A3 = AAA, . . .
S
 A = nN An = A0  A1  A2  A3  . . .
 A = {x1 x2 . . . xn1 xn | xi  A for each 1  i  n for some n  N}.
S
 A+ = n1 An = AA = A1  A2  A3  . . .
Example:
Some properties of language operations:
 A A = A = A .
 A =  = A.
 A(B  C) = AB  AC.
 (A  B)C = AC  BC.
 But AB 6= BA in general.
Do not confuse languages and strings:
 {a, b} = {b, a} but ab 6= ba.
 {a, b} = {a, b, b} but ab 6= abb.
 {a, a} = {a} but aa 6= a.
  and {} and  are all different.
6
Let x be a string and n  N. Then xn x = xxn .
Proof by induction on n:
1.2
Deterministic Finite Automata
Example of a transition diagram:
A deterministic finite automaton (DFA) is a structure M = (Q, , , q0 , F ) where
 Q is a non-empty finite set, the states,
  is a non-empty finite set, the input alphabet,
  : Q    Q is the transition function,
 q0  Q is the start state,
 F  Q is the set of final states or accepting states.
7
The above example is
The function  may be given by a transition table:
M operates as follows:
 M reads an input string w   symbol-by-symbol.
 M starts in state q0 and moves from state to state.
 If M is in state q and the next symbol is a then M moves to state (q, a).
 M ends up in state p after reading w, and accepts w if p  F .
Example:
The following DFA accepts strings over {a, b} that do not contain b-symbols:
DFA accepting strings over {a, b} with a number of a-symbols that is not a multiple of 4:
The extended transition function  : Q    Q is
 ) = q,
 (q,
 ax) = ((q,
 (q,
a), x) where a   and x   .
  extends  to strings.
Example:
Conditions of acceptance are as follows:
 0 , w)  F .
 w is accepted by M if (q
 0 , w) 
 w is rejected by M if (q
/ F.
 0 , w)  F } is the language accepted by M .
 L(M ) = {w   | (q
 A   is regular if A = L(M ) for some DFA M .
The first example of a DFA has
 0 , ab) = q0 
 ab 
/ L(M ) since (q
/ F,
 0 , bbaab) = q2  F ,
 bbaab  L(M ) since (q
 L(M ) = {x   | x contains the substring aa}.
The following DFA accepts strings that are binary representations of multiples of 3:
0
1
1
q0
0
q1
q2
0
Consider for which strings the automaton ends up in state qi .
1.2.1
Closure Properties of Deterministic Finite Automata
Regular languages are closed under:
 complement,
 intersection,
 union,
 concatenation,
 star.
The following DFA accepts binary representations of numbers that are not multiples of 3:
10
Let A   be regular. Then A is regular.
Proof:
The following automata accept strings whose length is not a multiple of 3 and strings ending
with the symbol a, respectively.
p0
a, b
a, b
p1
a, b
a
a
q0
q1
b
p2
Their product automaton is:
11
Let A, B   be regular. Then A  B is regular.
Proof:
Let A, B   be regular. Then A  B is regular.
Proof:
12
1.3
Non-Deterministic Finite Automata
The following automaton accepts strings with a symbol 1 in the third position from the end.
It is not a DFA because there are two 1-transitions in state q0 and no transitions in state q3 .
0, 1
q0
q1
0, 1
q2
0, 1
q3
A non-deterministic finite automaton (NFA) is a structure M = (Q, , , q0 , F ) where
 Q, , q0 and F are as in a DFA,
  : Q    P(Q) is the transition relation, which yields a set of successor states.
 The power set P(Q) of Q is the set of subsets of Q, that is, P(Q) = {S | S  Q}.
The above example has the following transition relation:
M operates as a DFA except:
 If M is in state q and the next symbol is a then M may move to any of the states in (q, a).
 If (q, a) is empty then M gets stuck and the input is not accepted.
Possible transitions for input 111 are:
13
The extended transition relation  : Q    P(Q) is
 ) = {q},
 (q,
 ax) = S
 (q,
p(q,a) (p, x) where a   and x   .
Example:
Conditions of acceptance are as follows:
 0 , w)  F 6= , otherwise w is rejected.
 w is accepted by M if (q
 0 , w)  F 6= } is the language accepted by M .
 L(M ) = {w   | (q
1.3.1
From Non-Deterministic Finite Automata to Deterministic Finite Automata
The above NFA is converted to a DFA using the subset construction:
14
Every language accepted by an NFA is accepted by a DFA.
Proof:
Remarks:
 Every DFA is easily converted to an NFA, so NFAs accept exactly the regular languages.
 The number of states may grow exponentially in the subset construction.
 Let Ln = {w  {0, 1} | w has length  n and symbol 1 in the nth position from the end}.
For each n  1 there is an NFA with n + 1 states that accepts Ln , but no DFA with less
than 2n states that accepts Ln .
15
1.3.2
Non-Deterministic Finite Automata with -Transitions
The following automaton accepts the union of two regular languages. It is not a DFA because
of the -transitions in state r0 .
b
a
a
q0
q1
b
r0
a, b
p0
p1
a, b
p2
a, b
An NFA with -transitions is a structure M = (Q, , , q0 , F ) where
 Q, , q0 and F are as in a DFA,
  is a special symbol with  
/ ,
  : Q  (  {})  P(Q) is the transition relation.
  may have -transitions and yields a set of successor states.
The above example has the following transition relation:
r0
{p0 , q0 }
q0 {q1 } {q0 }
q1 {q1 } {q0 }
p0 {p1 } {p1 }
p1 {p2 } {p2 }
p2 {p0 } {p0 }
M operates as an NFA except:
 M may move from state q to any state in (q, ) without consuming a symbol from the input.
 E(q) = {p  Q | p is reachable from q with a sequence of -transitions} is the -closure of q.
Example:
16
The extended transition relation  : Q    P(Q) is
 ) = E(q),
 (q,
S
 ax) = S
 (q,
pE(q)
r(p,a) (r, x) where a   and x   .
Every language accepted by an NFA with -transitions is accepted by a DFA. Proof:
 Idea: replace each state by its -closure.
 Modify subset construction to use start state E(q0 ) and  0 (S, a) =
qS
p(q,a)
E(p).
The following NFA with -transitions accepts Python 3 integers such as 7, 0o177, 0X100000000,
000 and 79228162514264337593543950336:
0
q2
0
q1
19
q3
09
q0
q4
q5
q6
o,O
q7
q9
q10
x,X
q11
q12
0,1
q13
q14
b,B
q15
q16
The -closure of several states in this example is:
17
q8
09,af,AF
07
q17