Theory of Computation
Presented By
           Dr Saravanan S ME,Ph.D
School of Computing Science and Engineering
        VIT Bhopal University, Bhopal
           Theory of Automata
• Theory of automata is a theoretical branch of
  computer science and mathematical. It is the
  study of abstract machines and the
  computation problems that can be solved using
  these machines. The abstract machine is called
  the automata. The main motivation behind
  developing the automata theory was to develop
  methods to describe and analyse the dynamic
  behaviour of discrete systems.
• The State is represented by circles, and
  the Transitions is represented by arrows.
Symbols:
• Symbols are an entity or individual objects,
  which can be any letter, alphabet or any
  picture.
Example:
1, a, b, #
Alphabets:
• Alphabets are a finite set of symbols. It is denoted by ∑.
• Examples:
• ∑ = {a, b}
• ∑ = {A, B, C, D}
• ∑ = {0, 1, 2}
• ∑ = {0, 1, ....., 5]
• ∑ = {#, β, Δ}
String:
• It is a finite collection of symbols from the alphabet. The string is
   denoted by w.
Example 1:
• If ∑ = {a, b}, various string that can be generated from ∑ are {ab,
   aa, aaa, bb, bbb, ba, aba.....}.
• A string with zero occurrences of symbols is known as an empty
   string. It is represented by ε.
• The number of symbols in a string w is called the length of a string.
   It is denoted by |w|.
• Example 2:
          w = 010
          Number of Sting |w| = 3
Language:
• A language is a collection of appropriate
  string. A language which is formed over Σ can
  be Finite or Infinite.
• Example: 1
L1 = {Set of string of length 2} = {aa, bb, ba,
bb}
L2 = {Set of all strings starts with 'a'} = {a, aa,
aaa, abb, abbb, ababb}
               Finite Automata
• Finite automata are used to recognize patterns.
• It takes the string of symbol as input and changes
  its state accordingly. When the desired symbol is
  found, then the transition occurs.
• At the time of transition, the automata can either
  move to the next state or stay in the same state.
• Finite automata have two states, Accept
  state or Reject state. When the input string is
  processed successfully, and the automata reached
  its final state, then it will accept.
         Formal Definition of FA
• A finite automaton is a collection of 5-tuple
  (Q, ∑, δ, q0, F), where:
• Q: finite set of states
• ∑: finite set of the input symbol
• q0: initial state
• F: final state
• δ: Transition function
           Finite Automata Model
• Finite automata can be represented by input tape
  and finite control.
• Input tape: It is a linear tape having some
  number of cells. Each input symbol is placed in
  each cell.
• Finite control: The finite control decides the next
  state on receiving particular input from input tape.
  The tape reader reads the cells one by one from
  left to right, and at a time only one input symbol
  is read.
Transition Diagram
                     Transition Table
Present State   Next state for   Next State of Input 1
                Input 0
→q0             q1               q2
q1              q0               q2
*q2             q2               q2
Types of Automata
Deterministic Finite Automata(DFA)
• DFA refers to deterministic finite automata.
  Deterministic refers to the uniqueness of the
  computation. The finite automata are called
  deterministic finite automata if the machine is
  read an input string one symbol at a time.
• In DFA, there is only one path for specific input
  from the current state to the next state.
• DFA does not accept the null move, i.e., the DFA
  cannot change state without any input character.
• DFA can contain multiple final states. It is used in
  Lexical Analysis in Compiler
         Formal Definition of DFA
A DFA is a collection of 5-tuples same as we
described in the definition of FA
• Q: finite set of states
• ∑: finite set of the input symbol
• q0: initial state
• F: final state
• δ: Transition function Q x ∑→Q
                  Example 1
• Q = {{q0, q1, q2}             ,∑   =   {0,   1},
  δ ,q0 = {q0} ,F = {q2} }= ?
     State         0            1
     →q0           q0           q1
     q1            q2           q1
     *q2           q2           q2
                   Example 2
DFA with ∑ = {0, 1} accepts all starting with 0.
                  Example 3
DFA with ∑ = {0, 1} accepts all ending with 0.
         Non-Deterministic Finite
         Automata(NFA or NDFA)
• NFA stands for non-deterministic finite automata.
  It is easy to construct an NFA than DFA for a
  given regular language.
• The finite automata are called NFA when there
  exist many paths for specific input from the
  current state to the next state.
• Every NFA is not DFA, but each NFA can be
  translated into DFA.
• NFA is defined in the same way as DFA but with
  the following two exceptions, it contains multiple
  next states, and it contains ε transition.
Example
         Formal definition of NFA
A DFA is a collection of 5-tuples same as we
described in the definition of FA
• Q: finite set of states
• ∑: finite set of the input symbol
• q0: initial state
• F: final state
• δ: Transition function δ: Q x ∑ →2Q
                Example
Present State    Next state for   Next State of
                 Input 0          Input 1
→q0              q0, q1           q1
q1               q2               q0
*q2              q2               q1, q2
                    Example 2
Present State   0               1
→q0             q0, q1          q0, q2
q1              q3              ε
q2              q2, q3          q3
*q3             q3              q3
Example 3
                 Problem 1
• Let M =({ q0,q1,q2,q3},{0,1},δ,q0,q0) and the
   transition functions are
  δ (q0, 0) =q2          δ (q2, 0) =q0
  δ (q0, 1) =q1         δ (q2, 1) =q3
 δ (q1, 0) =q3        δ (q3, 0) =q1
  δ (q1, 1) =q0         δ (q3, 1) =q2
a) Represent M by its State table
b) Represents M by its State diagram
c) Which of the following strings are accepted by
1)0101 2) 110101                 3) 01001
                Problem 2
• String are{ 001, 1001,01101}
       Conversion from NFA to DFA
Let, M = (Q, ∑, δ, q0, F) is an NFA which accepts
the language L(M). There should be equivalent
DFA denoted by M' = (Q', ∑', q0', δ', F') such that
L(M) = L(M').
   Steps for converting NFA to DFA
• Step 1: Initially Q' = ϕ
• Step 2: Add q0 of NFA to Q'. Then find the
  transitions from this start state.
• Step 3: In Q', find the possible set of states for
  each input symbol. If this set of states is not in
  Q', then add it to Q'.
• Step 4: In DFA, the final state will be all the
  states which contain F(final states of NFA)
Example 1: Convert the given NFA to DFA.
     State          0              1
     →q0            {q0, q1}       {q1}
     *q1            ϕ              {q0, q1}
                        Solution
State       0          1
→[q0]       [q0, q1]   [q1]
*[q1]       ϕ          [q0, q1]
*[q0, q1]   [q0, q1]   [q0, q1]
   Conversion from NFA with ε to DFA
Steps for converting NFA with ε to DFA:
Step 1: We will take the ε-closure for the starting state of NFA
as a starting state of DFA.
Step 2: Find the states for each input symbol that can be
traversed from the present. That means the union of transition
value and their closures for each state of NFA present in the
current state of DFA.
Step 3: If we found a new state, take it as current state and
repeat step 2.
Step 4: Repeat Step 2 and Step 3 until there is no new state
present in the transition table of DFA.
Step 5: Mark the states of DFA as a final state which contains
the final state of NFA.
                  Example
Convert the given NFA into its equivalent DFA
                      Solution
•   ε-closure(q0) = {q0, q1, q2}
•   ε-closure(q1) = {q1, q2}
•   ε-closure(q2) = {q2}
•   δ'(A, 0) = A
•   δ'(A, 1) = B
•   δ'(A, 2) = C
•   δ'(B, 0) = ϕ
•   δ'(B, 1) = B
•   δ'(B, 2) = C
            Minimization of DFA
• Step 1: Remove all the states that are
  unreachable from the initial state via any set
  of the transition of DFA.
• Step 2: Draw the transition table for all pair of
  states.
• Step 3: Now split the transition table into two
  tables T1 and T2. T1 contains all final states,
  and T2 contains non-final states.
• Step 4: Find similar rows from T1 such that:
                        COMPILER
DEFINITION
 Compiler is a program that reads a program written in
  one language – source program – and translate it into
  an equivalent program in another language – target
  language
 Compiler reports to its user the presence of error in
  source program
       source program                Target program
                          COMPILER
                          Error Message
         Types of Compiler
There are 4 types
     – Single – Pass
     – Multi- Pass
     – Load and Go
     – Debugging
Analysis – Synthesis Model
  There are 2 pars to compilation
1.Analysis Model
   breaks up the source program into
    constituent pieces and creates an
    intermediate representation of the source
    program
   Tree Syntax tree
      Analysis – Synthesis model
2.Synthesis Model
  Constructs the desired target program from the
  intermediate representation
Software Tools :
          Structural Editors
         Pretty Printers
         Static Checkers
         Interpreters
         Text formatters
         Silicon compilers
         Query interpreters
 Context of compilers
Figure 1. A Language-Processing System
  Analysis of the source program
Analysis consists of 3 phases
    1.Linear analysis
           source program is read from left-to-right
           Grouped into tokens  sequences of
           characters having a collective meaning.
    2.Hierarchical analysis
          tokens are grouped hierarchically into
           nested collections with collective meaning.
    3.semantic analysis
         to ensure that the components of a
          program fit together meaningfully.
              Phases of a Compiler
                           Source Program
                       1
                                Lexical Analyzer
                       2
                            Syntax Analyzer
                       3
                            Semantic Analyzer
Symbol-table Manager                               Error Handler
                       4       Intermediate Code
                                   Generator
                       5
                            Code Optimizer
                       6
                            Code Generator
                           Target Program
                Lexical Analysis
• Also called scanning
Scans Input & Identify tokens
Removes White spaces and comments
An Example
      Position := initial + rate * 60 ;
             All are tokens
                      Token
•   Three types of tokens:
•   Terminal Symbols (TRM)-
•   Keywords and Operators, Literals (LIT),
•   Identifiers (IDN).
                 Example
• int a = 10;
 Tokens
int (keyword),
a(identifier),
=(operator),
10(constant)
;(punctuation-semicolon)
Total token =5
                 Example 2
int main()
{
// printf() sends the string inside quotation to
 // the standard output (the display)
printf("Welcome to GeeksforGeeks!"); return 0;
}
• Tokens 'int', 'main', '(', ')', '{', 'printf', '(', '
  "Welcome to GeeksforGeeks!" ', ')', ';', 'return',
  '0', ';', '}‘
Total number of tokens = 14
                   Lexeme
• It is a sequence of characters in the source
   code that are matched by given predefined
   language rules for every lexeme to be specified
   as a valid token. int, goto,main
Example
main is lexeme of type identifier(token)
(,),{,} are lexemes of type punctuation(token)
                     Pattern
• It specifies a set of rules that a scanner follows
   to create a token.
• It must start with the alphabet, followed by
   the alphabet or a digit.
• The sequence of characters that make the
   keyword.
Example
(, ), {, }
                     LEX
• Lex is a program that generates lexical
  analyzer. It is used with YACC parser
  generator.
• The lexical analyzer is a program that
  transforms an input stream into a sequence of
  tokens.
    The function of Lex is as follows:
• Lex file format
 {definitions } %% { rules } %% { user subroutines }
                    Syntax Analysis
 The Syntactic Analysis is also called Parsing.
 Tokens are grouped into grammatical phrases represented by a Parse Tree
 The hierarchical structure is expressed by recursive rules, called
   Productions.
Parse Tree: An Example
                  Semantic Analysis
   The Semantic Analysis phase checks the program for
    semantic errors (Type Checking) and gathers type information
    for the successive phases.
 Type Checking: Check types of operands .No real number as
    index for array; etc.
   Intermediate Code Generation
 An intermediate code is generated as a program for an abstract machine.
 The intermediate code should be easy to translate into the target program.
 As intermediate code we consider the three-address code, similar to
   assembly:
• sequence of instructions with at most three operands such that:
     There is at most one operator, in addition to the assignment.
• make explicit the operators precedence.
     Temporary names must be generated to compute intermediate
      operations.
                     Code Optimization
 This phase attempts to improve the intermediate code so that
    faster-running
• machine code can be obtained.
 Different compilers adopt different optimization techniques.
                       Code Generation
This phase generates the target code consisting of assembly code.
1. Memory locations are selected for each variable;
2. Instructions are translated into a sequence of assembly instructions;
3. Variables and intermediate results are assigned to memory registers.
  The Structure of a Compiler
                                                      Code Generator
                                               [Intermediate Code Generator]
                                                          Non-optimized Intermediate Code
                               Scanner
                          [Lexical Analyzer]
              Tokens
                                                      Code Optimizer
                               Parser
                          [Syntax Analyzer]
                                                               Optimized Intermediate Code
        Parse tree
                                                      Code Optimizer
                          Semantic Process
                         [Semantic analyzer]                           Target machine code
Abstract Syntax Tree w/ Attributes
         Symbol Table Management
• A symbol table is a data structure containing a record for
  each identifier, with fields for the attributes of the
  identifier.
•    The data structure allows us to find the record for each
    identifier quickly and to store or retrieve data from that
    record quickly.
      Error Detection and Reporting
• Each phase encounter errors
 The lexical phase can detect errors where the characters
   remaining in the input do not form any token of the language.
 Errors where the token stream violates the structure rules
  (syntax)of the language are determined by the syntax analysis
  phase.
 During semantic analysis the compiler tries to detect constructs
  that have the right syntactic structure but no meaning to the
  operation involved
 e.g., if we try to add two identifiers, one of which is the name
  of an array, and the other the name of a procedure
         Compiler-Construction Tools
Other names are
         Compiler – compiler
         Compiler – generators
         Translator – writing system
For compiler construction there are special tools like :
• 1- Scanner generators Eg.Lex and Flex
• 2- Parser generators      Eg. Yacc and Bison
• 3- Syntax-directed translation engines
• 4- Code-generator generators
• 5- Data-flow analysis engines