Formal Languages and Automata Theory
Siu On CHAN
Fall 2018
Chinese University of Hong Kong
                                       1/27
Welcome to 3130
       https://www.cse.cuhk.edu.hk/~siuon/csci3130
                   Tentative syllabus and schedule
                               Textbook
       Introduction to the Theory of Computation, Michael Sipser
          Please sign up on piazza.com and ask questions
                     Or come to our office hours
                                                                   2/27
Computers can beat experts at Go
                 Source: Wikipedia on AlphaGo versus Lee Sedol
                                                                 3/27
Computers can fake videos
               https://youtu.be/cQ54GDm1eL0
                  (warning: expletives at the end)
            Is there anything that a computer cannot do?
                                                           4/27
Impossibilites
                     Why care about the impossible?
   Example from Physics:
   Since the Middle Ages, people tried to
   design machines that use no energy
   Later physical discoveries forbid creating
   energy out of nothing
   Perpetual motion is impossible
                                                 “water screw” perpetual
                                                 motion machine
     Understanding the impossible helps us to focus on the possible
                                                                           5/27
Laws of computation
     Just like laws of physics tell us what are (im)possible in nature…
                                       δQ
        ∆U = Q + W              dS =              S − S0 = kB ln Ω
                                       T
      Laws of computation tell us what are (im)possible to do with
                               computers
                         Part of computer science
   To some extent, laws of computation are studied in automata theory
                                                                          6/27
Exploiting impossibilities
     Certain tasks are believed impossible to solve quickly on current
                                 computers
   Given n = pq that is the product of two unknown primes, find p and q
                     Building block of cryptosystems
                                                          $
                            011001110110110
                                                                          7/27
Candy machine
                Machine takes $5 and $10 coins
                A gumball costs $15
                Actions: +5, +10, Release
                              +10        +10
                         +5         +5         +5, +10
                    $0        $5         $10
                     R         R          R         +5, +10
                                                              8/27
Slot machine
               Why?
                      =
                          9/27
Different kinds of machines
                                +10        +10
                           +5         +5        +5, +10
                      $0        $5         $10
                      R          R          R
                                                   +5, +10
                                      R
                    Only one example of a machine
          We will look at different kinds of machines and ask
     • what kind of problems can this kind of machines solve?
     • What are impossible for this kind of machines?
     • Is machine A more powerful than machine B?
                                                                10/27
Machines with different resources in this course
      finite automata   Devices with a small amount of memory
                        These are very simple machines
      push-down         Devices with unbounded memory that
      automata          can be accessed in a restricted way
                        Used to parse grammars
      Turing machines   Devices with unbounded memory
                        These are actual computers
      time-bounded      Devices with unbounded memory but
      Turing Machines   bounded running time
                        These are computers that run fast
                                                                11/27
Course highlights
     • Finite automata
       Closely related to pattern searching in text
                         Find (ab)∗ (ab) in abracadabra
     • Grammars
         • Grammars describe the meaning of sentences in English, and the
           meaning of programs in Java
         • Useful for natural language processing and compilers
                                                                            12/27
Course highlights
                             Turing machines
     • General model of computers, capturing anything we could ever
       hope to compute
     • But there are many things that computers cannot do
     Given the code of a program, tell if the program prints the string
                                  “3130”
                             #include <stdio.h>
                             main(t,_,a)char *a;{return!0<t?t<3?main(-79,-13,a+main(-87,1-_,
                             main(-86,0,a+1)+a)):1,t<_?main(t+1,_,a):3,main(-94,-27+t,a)&&t==2?_<13?
                             main(2,_+1,"%s %d %d\n"):9:16:t<0?t<-72?main(_,t,
                             "@n'+,#'/*{}w+/w#cdnr/+,{}r/*de}+,/*{*+,/w{%+,/w#q#n+,/#{l,+,/n{n+,/+#n+,/#\
                             ;#q#n+,/+k#;*+,/'r :'d*'3,}{w+K w'K:'+}e#';dq#'l \
                             q#'+d'K#!/+k#;q#'r}eKK#}w'r}eKK{nl]'/#;#q#n'){)#}w'){){nl]'/+#n';d}rw' i;# \
                             ){nl]!/n{n#'; r{#w'r nc{nl]'/#{l,+'K {rw' iK{;[{nl]'/w#q#n'wk nw' \
                             iwk{KK{nl]!/w{%'l##w#' i; :{nl]'/*{q#'ld;r'}{nlwb!/*de}'c \
                             ;;{nl'-{}rw]'/+,}##'*}#nc,',#nw]'/+kd'+e}+;#'rdq#w! nr'/ ') }+}{rl#'{n' ')# \
                             }'+}##(!!/")
                             :t<-50?_==*a?putchar(31[a]):main(-65,_,a+1):main((*a=='/')+t,_,a+1)
        Does the program       :0<t?main(2,2,"%s"):*a=='/'||main(0,main(-61,*a,
                             "!ek;dc i@bK'(q)-[w]*%n+r3#l,{}:\nuwloca-O;m .vpbks,fxntdCeghiry"),a+1);}
                                  print “3130”?
         Formal verification of software must fail on corner cases
                                                                                                             13/27
Course highlights
                     Time-bounded Turing machines
     • Many problems can be solved on a computer in principle, but
       takes too much time in practice
     • Traveling salesperson: Given a list of cities, find the shortest way
       to visit them all and return home
                                           Seoul
                                                       Tokyo
                                   Shanghai
                               Hong Kong      Taipei
                                              Manila
                         Bangkok
     • For 100 cities, takes 100+ years to solve even on the fastest
       computer!
                                                                              14/27
Problems we will look at
                    Can machine A solve problem B?
     • Examples of problems we will consider
         • Given a word s, does it contain “to” as a subword?
         • Given a number n, is it divisible by 7?
         • Given two words s and t, are they the same?
     • All of these have “yes/no” answers (decision problems)
     • There are other types of problems, like “Find this” or “How many
       of that” but we won’t look at them
                                                                          15/27
Alphabets and Strings
     • Strings are a common way to talk about words, numbers, pairs
       of numbers
       Which symbols can appear in a string? As specified by an
       alphabet
                    An alphabet is a finite set of symbols
     • Examples
       Σ1 = {a, b, c, d, . . . , z}: the set of English letters
       Σ2 = {0, 1, 2, . . . , 9}: the set of digits (base 10)
       Σ3 = {a, b, c, . . . , z, #}: the set of letters plus special symbol #
                                                                                16/27
Strings
          An input to a problem can be represented as a string
      A string over alphabet Σ is a finite sequence of symbols in Σ
                axyzzy is a string over Σ1 = {a, b, c, . . . , z}
                  3130 is a string over Σ2 = {0, 1, . . . , 9}
                ab#bc is a string over Σ3 = {a, b, . . . , z, #}
     • The empty string will be denoted by ε
       (What you get using "" in C, Java, Python)
     • Σ∗ denotes the set of all strings over Σ
       All possible inputs using symbols from Σ only
                                                                      17/27
Languages
          A language is a set of strings (over the same alphabet)
          Languages describe problems with “yes/no” answers:
   L1 = All strings containing the substring “to”        Σ1 = {a, . . . , z}
                           stop, to, toe are in L1
                           ε, oyster are not in L1
               L1 = {x ∈ Σ∗1 | x contains the substring “to”}
                                                                               18/27
Examples of languages
      L2 = {x ∈ Σ∗2 | x is divisible by 7}             Σ2 = {0, 1, . . . , 9}
                         L2 contains 0, 7, 14, 21, …
                                                                                19/27
Examples of languages
      L2 = {x ∈ Σ∗2 | x is divisible by 7}               Σ2 = {0, 1, . . . , 9}
                           L2 contains 0, 7, 14, 21, …
        L3 = {s#s | s ∈ {a, . . . , z}∗ }        Σ3 = {a, b, . . . , z, #}
                     Which of the following are in L3 ?
        ab#ab                         ab#ba                         a##a#
                                                                                  19/27
Examples of languages
      L2 = {x ∈ Σ∗2 | x is divisible by 7}               Σ2 = {0, 1, . . . , 9}
                           L2 contains 0, 7, 14, 21, …
        L3 = {s#s | s ∈ {a, . . . , z}∗ }        Σ3 = {a, b, . . . , z, #}
                     Which of the following are in L3 ?
        ab#ab                         ab#ba                         a##a#
         Yes                           No                            No
                                                                                  19/27
Finite Automata
Example of a finite automaton
                                    +10        +10
                               +5         +5         +5, +10
                       $0           $5         $10             go
                       R             R          R         +5, +10
     • There are states $0, $5, $10, go
     • The start state is $0
     • Takes inputs from {+5, +10, R}
     • The state go is an accepting state
     • There are transitions specifying where to go to for every state
       and every input symbol
                                                                         20/27
Deterministic finite automaton
        A finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0 , F ) where
     • Q is a finite set of states
     • Σ is an alphabet
     • δ : Q × Σ → Q is a transition function
     • q0 ∈ Q is the initial state
     • F ⊆ Q is the set of accepting states (or final states)
    In diagrams, the accepting states will be denoted by double circles
                                                                          21/27
Example
                          0            1                 0,1
                          q0     1     q1   0            q2
                                                table of transition
                                                function δ
     alphabet Σ = {0, 1}                                       inputs
     states Q = {q0 , q1 , q2 }                                0    1
     initial state q0                                     q0   q0   q1
     accepting states F = {q0 , q1 }
                                                states
                                                          q1   q2   q1
                                                          q2   q2   q2
                                                                         22/27
Language of a DFA
   A DFA accepts a string x if starting from the initial state and following
      the transition as x is read from left to right, the DFA ends at an
                                accepting state
                            0            1            0,1
                           q0     1      q1     0     q2
          The DFA accepts 0 and 011           but not 10 and 0101
   The language of a DFA is the set of all strings x accepted by the DFA
         0 and 011 are in the language          10 and 0101 are not
                                                                               23/27
The languages of these DFAs?
                                                             Σ = {a, b}
           Σ = {a, b}                                                q0
            b           a                                                 b
                                                                 a
                 a                             a       q1                     q3       b
            q0          q1
                 b                                 a         b            b        a
                                               b       q2                     q4       a
                             0          1              0,1
                             q0    1    q1     0       q2
                                  Σ = {0, 1}
                                                                                           24/27
Examples
  Construct a DFA over {0, 1} that accepts all strings with at most three
                                    1s
                                                                            25/27
Examples
  Construct a DFA over {0, 1} that accepts all strings with at most three
                                    1s
             0            0            0            0            0,1
            q0     1     q1      1     q2     1     q3     1    q4+
                                                                            25/27
Examples
     Construct a DFA over {0, 1} that accepts all strings ending in 01
                                                                         26/27
Examples
     Construct a DFA over {0, 1} that accepts all strings ending in 01
   Hint: The DFA should “remember” the last 2 bits of the input string
                                                                         26/27
Examples
     Construct a DFA over {0, 1} that accepts all strings ending in 01
   Hint: The DFA should “remember” the last 2 bits of the input string
                                                   q00       0
                                           0
                                      q0           1
                              0            1       q01       0
                         qε                    1         0
                                                   q10       1
                                  1        0
                                      q1               0
                                           1
                                                   q11       1
           We will see a much simpler DFA in the next lecture
                                                                         26/27
Examples
    Construct a DFA over {0, 1} that accepts all strings ending in 101
                                                                         27/27