NFA to DFA conversion and regular expressions
CSCI 3130 Formal Languages and Automata Theory
Siu On CHAN
Fall 2018
Chinese University of Hong Kong
                                                 1/22
DFAs and NFAs are equally powerful
                  NFA can do everything a DFA can do
                      How about the other way?
       Every NFA is equivalent to some DFA for the same language
                                                                   2/22
NFA → DFA algorithm
                          Given an NFA, figure out
    1. the initial active states
    2. how the set of active states changes upon reading an input
       symbol
                                                                    3/22
NFA → DFA example
                               ε,1
            NFA:         q0           q1     ε      q2
                                0
          Initial active states (before reading any input)?
                                                              4/22
NFA → DFA example
                                           ε,1
                  NFA:          q0               q1   ε   q2
                                           0
            Initial active states (before reading any input)?
        partial
        DFA:             {q0 , q1 , q2 }
              How does the set of active states change?
NFA → DFA example
                                           ε,1
                  NFA:          q0               q1   ε   q2
                                           0
            Initial active states (before reading any input)?
        partial
        DFA:             {q0 , q1 , q2 }
              How does the set of active states change?
NFA → DFA example
                                           ε,1
                  NFA:          q0                q1          ε   q2
                                           0
            Initial active states (before reading any input)?
        partial                            1
        DFA:             {q0 , q1 , q2 }         {q1 , q2 }
              How does the set of active states change?
NFA → DFA example
                                           ε,1
                  NFA:          q0                q1          ε       q2
                                           0
            Initial active states (before reading any input)?
                               0                                           0,1
        partial                            1
                                                                  1
        DFA:             {q0 , q1 , q2 }         {q1 , q2 }                ∅
                                           0
              How does the set of active states change?
                                                                                 4/22
NFA → DFA summary
                          0                                0,1
                                      1
         DFA:                                          1
                    {q0 , q1 , q2 }       {q1 , q2 }       ∅
                                      0
        Every DFA state corresponds to a subset of NFA states
     A DFA state is accepting if it contains an accepting NFA state
                                                                      5/22
Regular expressions
Regular expressions
     Powerful string matching feature in advanced editors (e.g. Vim,
     Emacs) and modern programming languages (e.g. PERL, Python)
                          PERL regex examples:
                   colou?r matches “color”/“colour”
           [A-Za-z]*ing matches any word ending in “ing”
           We will learn to parse complicated regex recursively
                     by building up from simpler ones
    Also construct the language matched by the expression recursively
       Will focus on regular expressions in formal language theory
             (notations differ from PERL/Python/POSIX regex)
                                                                        6/22
String concatenation
                                           st = abbbab
     s = abb                               ts = bababb
     t = bab                               ss = abbabb
                                           sst = abbabbbab
                   s = x1 . . . xn ,   t = y1 . . . ym
                                    ⇓
                       st = x1 . . . xn y1 . . . ym
                                                             7/22
Operations on languages
     • Concantenation of languages L1 and L2
                       L1 L2 = {st : s ∈ L1 , t ∈ L2 }
     • n-th power of language L
                  Ln = {s1 s2 . . . sn | s1 , s2 , . . . , sn ∈ L}
     • Union of L1 and L2
                    L1 ∪ L2 = {s | s ∈ L1 or s ∈ L2 }
                                                                     8/22
Example
                  L1 = {0, 01}           L2 = {ε, 1, 11, 111, . . . }
          L1 L2 = {0, 01, 011, 0111, . . . } ∪ {01, 011, 0111, 01111, . . . }
                = {0, 01, 011, 0111, . . . }
                     0 followed by any number of 1s
    L12 = {00, 001, 010, 0101}                      L22 = L2
                                                   L2n = L2       for any n > 1
                       L1 ∪ L2 = {0, 01, ε, 1, 11, 111, . . . }
                                                                                  9/22
Operations on languages
    The star of L are contains strings made up of zero or more chunks
                                   from L
                          L∗ = L0 ∪ L1 ∪ L2 ∪ . . .
             Example: L1 = {0, 01} and L2 = {ε, 1, 11, 111, . . . }
                             What is L1∗ ?    L2∗ ?
                                                                        10/22
Example
                                 L1 = {0, 01}
          L10 = {ε}
          L11 = {0, 01}
          L12 = {00, 001, 010, 0101}
          L13 = {000, 0001, 0010, 00101, 0100, 01001, 01010, 010101}
                      Which of the following are in L1∗ ?
     00100001                    00110001                    10010001
                                                                        11/22
Example
                                 L1 = {0, 01}
          L10 = {ε}
          L11 = {0, 01}
          L12 = {00, 001, 010, 0101}
          L13 = {000, 0001, 0010, 00101, 0100, 01001, 01010, 010101}
                      Which of the following are in L1∗ ?
     00100001                    00110001                    10010001
        Yes                          No                         No
                                                                        11/22
Example
                                  L1 = {0, 01}
           L10 = {ε}
           L11 = {0, 01}
           L12 = {00, 001, 010, 0101}
           L13 = {000, 0001, 0010, 00101, 0100, 01001, 01010, 010101}
                       Which of the following are in L1∗ ?
     00100001                     00110001                    10010001
        Yes                           No                         No
          L1∗ contains all strings such that any 1 is preceded by a 0
                                                                         11/22
Example
                           L2 = {ε, 1, 11, 111, . . . }
                                any number of 1s
          L20 = {ε}
          L21 = L2
          L22 = L2
          L2n = L2    (n > 1)
                                                          12/22
Example
                           L2 = {ε, 1, 11, 111, . . . }
                                any number of 1s
                                                 L2∗ = L20 ∪ L21 ∪ L22 ∪ . . .
          L20 = {ε}
                                                     = {ε} ∪ L2 ∪ L2 ∪ . . .
          L21 = L2
                                                     = L2
          L22 = L2
          L2n = L2    (n > 1)
                                                            L2∗ = L2
                                                                                 12/22
Combining languages
    We can construct languages by starting with simple ones, like {0}
                     and {1}, and combining them
    {0}({0} ∪ {1})∗          ⇒   0(0 + 1)∗
                                 all strings that start with 0
                                                                        13/22
Combining languages
    We can construct languages by starting with simple ones, like {0}
                     and {1}, and combining them
    {0}({0} ∪ {1})∗           ⇒   0(0 + 1)∗
                                  all strings that start with 0
    ({0}{1}∗ ) ∪ ({1}{0}∗ )   ⇒   01∗ + 10∗
                                  0 followed by any number of 1s, or
                                  1 followed by any number of 0s
                                                                        13/22
Combining languages
    We can construct languages by starting with simple ones, like {0}
                     and {1}, and combining them
    {0}({0} ∪ {1})∗           ⇒   0(0 + 1)∗
                                  all strings that start with 0
    ({0}{1}∗ ) ∪ ({1}{0}∗ )   ⇒   01∗ + 10∗
                                  0 followed by any number of 1s, or
                                  1 followed by any number of 0s
             0(0 + 1)∗ and 01∗ + 10∗ are regular expressions
     Blueprints for combining simpler languages into complex ones
                                                                        13/22
Syntax of regular expressions
   A regular expression over Σ is an expression formed by the following
                                   rules
     • The symbols ∅ and ε are regular expressions
     • Every symbol a in Σ is a regular expression
     • If R asd S are regular expressions, so are R + S, RS and R∗
                                Examples:
                   ∅                                   ε
                                                     ∗
               0(0 + 1)∗                          1 (ε + 0)
               01∗ + 10∗                      (0 + 1)∗ 01(0 + 1)∗
     A language is regular if it is represented by a regular expression
                                                                          14/22
Understanding regular expressions
                               Σ = {0, 1}
             01∗ = 0(1)∗ represents {0, 01, 011, 0111, . . . }
                   0 followed by any number of 1s
                            01∗ is not (01)∗
                                                                 15/22
Understanding regular expressions
   0 + 1 yields {0, 1}                                         strings of length 1
   (0 + 1)∗ yields {ε, 0, 1, 00, 01, 10, 11, . . . }                   any string
   (0 + 1)∗ 010                                        any string that ends in 010
   (0 + 1)∗ 01(0 + 1)∗                                   any string containing 01
                                                                                     16/22
Understanding regular expressions
            What language does the following represent?
             ((0 + 1)(0 + 1))∗ + ((0 + 1)(0 + 1)(0 + 1))∗
                                                            17/22
Understanding regular expressions
            What language does the following represent?
              ((0 + 1)(0 + 1))∗ + ((0 + 1)(0 + 1)(0 + 1))∗
         ((0 + 1)(0 + 1))∗                 ((0 + 1)(0 + 1)(0 + 1))∗
                                                                      17/22
Understanding regular expressions
            What language does the following represent?
              ((0 + 1)(0 + 1))∗ + ((0 + 1)(0 + 1)(0 + 1))∗
         ((0 + 1)(0 + 1))∗                 ((0 + 1)(0 + 1)(0 + 1))∗
          (0 + 1)(0 + 1)                    (0 + 1)(0 + 1)(0 + 1)
                                                                      17/22
Understanding regular expressions
            What language does the following represent?
              ((0 + 1)(0 + 1))∗ + ((0 + 1)(0 + 1)(0 + 1))∗
         ((0 + 1)(0 + 1))∗                 ((0 + 1)(0 + 1)(0 + 1))∗
           (0 + 1)(0 + 1)                   (0 + 1)(0 + 1)(0 + 1)
        strings of length 2                  strings of length 3
                                                                      17/22
Understanding regular expressions
             What language does the following represent?
               ((0 + 1)(0 + 1))∗ + ((0 + 1)(0 + 1)(0 + 1))∗
          ((0 + 1)(0 + 1))∗                 ((0 + 1)(0 + 1)(0 + 1))∗
       strings of even length              strings whose length is a
                                                 multiple of 3
           (0 + 1)(0 + 1)                    (0 + 1)(0 + 1)(0 + 1)
        strings of length 2                   strings of length 3
                                                                       17/22
Understanding regular expressions
             What language does the following represent?
               ((0 + 1)(0 + 1))∗ + ((0 + 1)(0 + 1)(0 + 1))∗
            strings whose length is even or a multiple of 3
              = strings of length 0, 2, 3, 4, 6, 8, 9, 10, 12, . . .
          ((0 + 1)(0 + 1))∗                      ((0 + 1)(0 + 1)(0 + 1))∗
       strings of even length                   strings whose length is a
                                                      multiple of 3
           (0 + 1)(0 + 1)                          (0 + 1)(0 + 1)(0 + 1)
        strings of length 2                         strings of length 3
                                                                            17/22
Understanding regular expressions
            What language does the following represent?
              ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
                                                          18/22
Understanding regular expressions
            What language does the following represent?
              ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
                (0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1)
                                                          18/22
Understanding regular expressions
            What language does the following represent?
               ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
                 (0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1)
          (0 + 1)(0 + 1)                    (0 + 1)(0 + 1)(0 + 1)
                                                                    18/22
Understanding regular expressions
            What language does the following represent?
                ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
                 (0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1)
           (0 + 1)(0 + 1)                    (0 + 1)(0 + 1)(0 + 1)
        strings of length 2                   strings of length 3
                                                                     18/22
Understanding regular expressions
            What language does the following represent?
                ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
                 (0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1)
                        strings of length 2 or 3
           (0 + 1)(0 + 1)                    (0 + 1)(0 + 1)(0 + 1)
        strings of length 2                   strings of length 3
                                                                     18/22
Understanding regular expressions
              What language does the following represent?
                  ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
   strings that can be broken into blocks, where each block has length 2
                                    or 3
                   (0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1)
                          strings of length 2 or 3
             (0 + 1)(0 + 1)                    (0 + 1)(0 + 1)(0 + 1)
          strings of length 2                   strings of length 3
                                                                           18/22
Understanding regular expressions
                 What language does the following represent?
                   ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
   strings that can be broken into blocks, where each block has length 2
                                    or 3
                         Which are in the language?
   ε         1            01           011          00110      011010110
                                                                           19/22
Understanding regular expressions
                What language does the following represent?
                  ((0 + 1)(0 + 1) + (0 + 1)(0 + 1)(0 + 1))∗
   strings that can be broken into blocks, where each block has length 2
                                    or 3
                        Which are in the language?
  ε         1            01           011          00110       011010110
  3         7            3             3             3             3
       The regular expression represents all strings except 0 and 1
                                                                           19/22
Understanding regular expressions
            What language does the following represent?
                                         ends in at most two 0s
                                       z    }|    {
                       (1 + 01 + 001)∗ (ε + 0 + 00)
                       |      {z     }
             at most two 0s between two consecutive 1s
                                                                  20/22
Understanding regular expressions
            What language does the following represent?
                                         ends in at most two 0s
                                      z   }|    {
                                         ∗
                      (1 + 01 + 001) (ε + 0 + 00)
                      |      {z     }
            at most two 0s between two consecutive 1s
                                                                  20/22
Understanding regular expressions
             What language does the following represent?
                                           ends in at most two 0s
                                        z   }|    {
                                           ∗
                        (1 + 01 + 001) (ε + 0 + 00)
                        |      {z     }
              at most two 0s between two consecutive 1s
                         Never three consecutive 0s
      The regular expression represents strings not containing 000
                                     Examples:
      ε               00                  0110010110                0010010
                                                                              20/22
Writing regular expressions
     Write a regular expression for all strings with two consecutive 0s
                                                                          21/22
Writing regular expressions
     Write a regular expression for all strings with two consecutive 0s
                          (anything)00(anything)
                            (0 + 1)∗ 00(0 + 1)∗
                                                                          21/22