Regular Expressions to Finite Automata
• High-level sketch
                      NFA
       Regular
                                   DFA
     expressions
       Lexical                  Table-driven
     Specification          Implementation of DFA
                                              59
Regular Expressions to NFA (0)
• The steps are:
  – Break down the RE into constituent sub-
    expressions
  – Create simple NFA’s for each basic symbol
  – Combine the NFA’s using rules for the
    operators (concatenation, alternation,
    exponentiation
     • Note: details here differ slightly from book, but
       are equivalent
                                                   60
Regular Expressions to NFA (1)
• For each kind of rexp, define an NFA
  – Notation: NFA for rexp M
For ϵ
                    ϵ
For input a
                    a
                                         61
Regular Expressions to NFA (2)
For AB
                A       B
                            (slightly different in book)
For A | B
            ϵ       B         ϵ
                              ϵ
            ϵ       A
                                                    62
Regular Expressions to NFA (3)
For A*
                    ϵ
            ϵ       A        ϵ
                                 63
Example of RegExp -> NFA conversion
• Consider the regular expression
                      (1|0)*1
• The NFA is
              ϵ   C
                      1       ϵ
                          E                   1
  A   ϵ   B                           ϵ
              ϵ       0       ϵ
                                  G       H       I
                  D       F
                  ϵ
                                                      64
NFA to DFA. The Trick
• Simulate the NFA
• Each state of DFA
   = a non-empty subset of states of the NFA
• Start state
   = the set of NFA states reachable through ϵ-moves from NFA
      start state
       See ϵ -closure() algorithm, p.69 in textbook
• Add a transition S →a S’ to DFA iff
   – S’ is the set of NFA states reachable from any state in S
     after seeing the input a
      • considering ϵ -moves as well
                                                            65
NFA -> DFA Example: Machine for (1|0)*1
                  C
                          1       ϵ
              ϵ               E
      ϵ                                         1
  A       B               0           G ϵ   H       I
              ϵ   D           F   ϵ
 •NFA Start state is marked
                                                        66
  ϵ -closure(A) = DFA Start State
ϵ -closure(A)               ϵ
                                1       ϵ
                    ϵ   C           E
    A    ϵ      B                           G ϵ
                                                      1
                                0                 H       I
                    ϵ   D           F   ϵ
                        ϵ
 (States reachable from A on ϵ−transitions)
     S
                                        Call this “S”.
        ABCDH                           Where can we get from
                                        S on input “0”?
                                        (ϵ -closure(F))
                                                              67
 Second DFA state, Reachable on 0 from
 Start State
ϵ -closure(F)                     ϵ
                                      1       ϵ
                      ϵ       C           E
    A    ϵ      B                                 G ϵ
                                                             1
                                      0                 H        I
                      ϵ       D           F   ϵ
                              ϵ
                              T                          0
     S                    0           BCDFGH
        ABCDH
                    Call this “T”.                 Next: back to “S” to look
                    Where can we get               for another possible state
                    from T on input “0”?                              68
                    Answer: only back to T
DFA Start State Again
                      ϵ
                          1       ϵ
              ϵ   C           E
  A   ϵ   B                           G ϵ
                                                1
                          0                 H       I
              ϵ   D           F   ϵ
                                  From “S” again:
                                  Where can we get on
                                  input “1”?
                                  ϵ-closure(E), + I
                                                        69
 Third DFA state, Reachable on “1” From
 Start State
ϵ-closure(E), + I               ϵ
                                    1       ϵ
                    ϵ       C           E
    A    ϵ   B                                  G ϵ
                                                          1
                                    0                 H       I
                    ϵ       D           F   ϵ
                            ϵ
                                T                     0       Call this “U”.
                                                              Because this
     S                  0           BCDFGH
                                                              includes
                            0                   1             state I, it is
        ABCDH                                         1       an accepting
                        1           BCDEGHI                   state.
                            U                                       70
The Resultant DFA: Three States
 (1|0)*1
                                        0
               0          T
           S       0            1
                                         1
               1          U
    No more distinct states were found, so
    this is the complete DFA.
                                             71
NFA to DFA. Remarks
• An NFA may be in many states at any time
• How many different states ?
• If there are N states, the NFA must be in
  some subset of those N states
• How many non-empty subsets are there?
  – 2N - 1 = finitely many
                                              72
Implementation
• A DFA can be implemented by a 2D table T[]
  – One dimension is “states”
  – Other dimension is “input symbols”
  – For every transition Si →a Sk define T[i,a] = k
• DFA “execution”
  – If in state Si and input a, read T[i,a] = k and skip to
    state Sk
  – Very efficient
                                                      73
Table Implementation of a DFA
                                    0
               0            T
       S           0            1
                                    1
               1        U
     (1|0)*1
                        0       1       “U” is the
                   S    T       U       accepting
                                        state, so it
                   T    T       U       is marked
                                        with a “*”.
                   U*   T       U
                                               74
Algorithm to simulate DFA (Fig 3.22, p.116)
array move[ ]       input       int final(int s) {/* “1” iff s is final */
                                    return (s == 2);
   state        0           1
                                    }
    0           1           2   int scan() {
     1          1           2       int s = 0; -- current state
                                    int c, move[3,2];
    2*          1           2       c = getchar();
                                    while (c != EOF) {
                                      s = move[s,c];
 S,T, & U are now states              c = getchar();
 0, 1, and 2
                                      }
                                    if (final(s)) return (YES);
 This algorithm reads
 an input string,                   else return (NO);
 attempting to verify           }
 it using a DFA encoded
 as a table.
                                                                      75
Implementation (Wrap-up)
• NFA -> DFA conversion is at the heart of tools
  such as lex and flex
• But, DFAs can be huge
• There are algorithms for reducing DFA’s but
  we will not look at this issue
                                           76
Lex and Flex: Scanner Generators
  Lex source                  lex.yy.c
   program          Lex      contains
     foo.l        compiler
                             function
                              yylex( )
                                a.out
    lex.yy.c         C         (binary
                  compiler
                             executable)
     candidate                token
                  a.out
      text file              stream
                                           77
Lex “Program” Structure
      %{
      .....C header stuff, copied into lex.yy.c.....
      %}
      .....Regular Definitions....
      %%
      .....Translation Rules for all tokens.....
      %%
      .....Auxiliary C Procedures, also copied
          into lex.yy.c
                                                       78
Regular Definitions and Translation Rules
         D1   R1
         D2   R2
         D3   R3
         %%
         P1   { action1 }
         P2   { action2 }
         P3   { action3 }
         P4   { action4 }
                                            79
Earlier Example Token Specs
(Added ASSIGN token)
            Token        Code        lexval
              “if”        IF            -
            “then”      THEN            -
            “else”       ELSE           -
               id         ID    (symtab entry)
             num         NUM        (value)
               :=      ASSIGN           -
                <       RELOP          LT
               <=       RELOP          LE
                =       RELOP          EQ
               <>       RELOP          NE
                >       RELOP          GT
               >=       RELOP          GE
                                                 80
A Lex Implementation
 %{
 #define IF           3
 #define THEN         4
 #define ELSE         5       /* Regular definitions */
 #define ID           6       delim    [ \t\n]
 #define NUM          7       ws       {delim}+
 #define RELOP        8       letter   [A-Za-z]
 #define ASSIGN 10            digit    [0-9]
 /* Subtypes of RELOP */      id       {letter}({letter}|{digit})*
 #define LT 1                 num      {digit}+
 #define LE 2                 %%
 #define EQ 3
 #define NE 4
 #define GT 5
 #define GE 6
 int yylval; /* tokenval */
 #include "sym.h"
 Sym *insert(),*lookup();
 %}
                                                              81
A Few Notes about Lex
• When a pattern is matched, the corresponding action
  is executed
• Static char array “yytext” contains matched string
• If the action does not return, Lex tries to consume
  more input, looking for another token
   – Therefore actions normally execute a return statement with
     a token code
   – To “throw away” something, just use an “empty” action
• If more than one pattern matches a prefix of the
  remaining input, chooses the longest matching string
• If more than one pattern matches the same input,
  chooses the first one defined
• Unrecognized input is copied to stdout
                                                          82
The Translation Rules
     {ws}      { /* throw it away */ }
     if        { return(IF); }
     then      { return(THEN); }
     else      { return(ELSE); }
     {id}      { if ((yylval=(int)lookup(yytext))==NULL)
                    yylval=(int)insert(yytext,ID); return(ID); }
     {num}     { yylval=atoi(yytext); return(NUM); }
     "<"       { yylval=LT; return(RELOP); }
     "<="      { yylval=LE; return(RELOP); }
     "<>"      { yylval=NE; return(RELOP); }
     "="       { yylval=EQ; return(RELOP); }
     ">"       { yylval=GT; return(RELOP); }
     ">=”      { yylval=GE; return(RELOP); }
     ":="      { return (ASSIGN); }
     %%
     /* Code for insert( ) and lookup( ) can go here... */
                                                                   83
Demonstration of Lex Example
     /* A Short mainline to test the lexical analyser*/
     extern int yylval;
     main() {
       int tok;
       init_strtab(20000); /* part of symbol table package */
       do {
         tok = yylex( );
         printf("Token %d Value %d\n", tok, yylval);
         }
       while (tok != 0);
     }
                                                                84
 Compile, Link and Run..
                                      torch% a.out                      typed input
torch% lex scanner.l                  if a<=26 then b:=sum else b:=0    string
                                      Token 3 Value 0
torch% cc main.c sym.c lex.yy.c -ll   Token 6 Value 53416
main.c:                               Token 8 Value 2
sym.c:                                Token 7 Value 26                 “left-overs”
lex.yy.c:                             Token 4 Value 26
Linking:                              Token 6 Value 53440
                                      Token 10 Value 53440
torch%
                                      Token 6 Value 53464
                                      Token 5 Value 53464
                                      Token 6 Value 53440
                                      Token 10 Value 53440
            IF         3              Token 7 Value 0
            THEN       4              Token 0 Value 0
            ELSE       5
            ID         6              torch%
            NUM        7
            RELOP      8
            ASSIGN     10                                                 85
86
First Glimpse of the Term Project...
• A small object-oriented language, compiled to MIPS
• “Cool”: Classroom Objected-Oriented Language”
• “UnCool” (because I simplified it, so it isn’t Cool
  anymore!)
• Multiple classes, all in single source file
• Int, Bool, String, and Int array primitive types
• Public Methods and private Attribute fields
   – With Attribute initializers
• Assignment, expressions, if and while
• Simple input-output of numbers, and output of Strings
   – (Uses built-in facilities in MIPS simulator)
                                                    87
Example Program: Factorial.cl
class Fac {
   computeFac(num : Int) : Int {
    if num < 1 then 1
    else num * (computeFac(num-1)) fi
    };
};
class Main {
  main() : Object {
    { out_string("Enter a number ");
      out_int((new Fac).computeFac(in_int()));
       out_string("\n");
    }
  };
};                                               88
Project (continued)
• Language spec is on the web
    – (see "UnCool Manual" in Downloads section)
• Bison-ready grammar file will be provided
    – You will need to provide a lexical analyser (Can use flex)
•   On-line bison and flex manuals available from course web page
•   Symbol table manager and several test programs will be provided
•   “Reference compiler” binary on torch (see manual)
•   Three passes:
    – Parsing and construction of an AST
    – Semantic Analysis Type checking
    – Code Generation
• The reference compiler is about 1900 lines of code +bison & flex
  files
                                                                   89
Programming Assignment 1
• Familiarize yourself with the UnCool language
  – and the MIPS simulator
• by writing and debugging a small program
  – Quick Sort, to sort a list of numbers
  – starting point provided (as a Java program)
• Complete assignment posted on web
• Due one week from today (January 23)
  – worth 5 points, of the 35% for the term Project
                                                  90