Automata & Compiler Design Course
CLO 1. Introduce the fundamental concepts of Automata Theory, Formal Languages, and compiler
design
CLO 2. Demonstrate the application of the principles of Automata Theory and Formal Languages in the field of
compiler design
CLO 3. Develop an understanding of computation through Push Down Automata and Turing Machines
CLO 4. Introduce the activities carried out in the different phases of a compiler
CLO 5. Identify undecidable problems.
These are sample Strategies, which teachers can use to accelerate the attainment of the various course
outcomes.
1. The lecture method (L) need not be only the traditional lecture method; alternative effective
teaching methods may be adopted to attain the outcomes.
2. Use of Video/Animation to explain functioning of various concepts.
3. Encourage collaborative (Group Learning) Learning in the class.
4. Ask at least three HOT (Higher Order Thinking) questions in the class, which promotes critical
thinking.
5. Adopt Problem Based Learning (PBL), which fosters students' analytical skills and develops design
thinking skills such as the ability to design, evaluate, generalize, and analyze information
rather than simply recall it.
6. Introduce Topics in manifold representations.
7. Show the different ways to solve the same problem with different approaches and encourage
the students to come up with their own creative ways to solve them.
8. Discuss how every concept can be applied to the real world; where that is possible, it helps
improve the students' understanding.
Module-1
Introduction to Automata Theory: Central Concepts of Automata Theory, Deterministic Finite
Automata (DFA), Non-Deterministic Finite Automata (NFA), Epsilon-NFA, NFA to DFA Conversion,
Minimization of DFA
Regular Expressions and Languages: Regular Expressions, Finite Automata and Regular
Expressions, Proving Languages Not to Be Regular
Lexical Analysis Phase of Compiler Design: Role of Lexical Analyzer, Input Buffering, Specification of
Tokens, Recognition of Tokens.
Syntax Analysis Phase of Compilers: Part-2: Bottom-up Parsing, Introduction to LR Parsing: SLR,
More Powerful LR parsers
Undecidability: A Language That Is Not Recursively Enumerable, An Undecidable Problem That Is RE.
Course Outcomes
At the end of the course the student will be able to:
CO 1. Acquire a fundamental understanding of the core concepts in automata theory and the Theory of
Computation
CO 2. Design and develop lexical analyzers, parsers, and code generators
CO 3. Design grammars and automata (recognizers) for different language classes and become
knowledgeable about restricted models of computation (Regular, Context-Free) and their
relative powers.
CO 4. Acquire a fundamental understanding of the structure of a compiler and apply concepts of
automata theory and the Theory of Computation to design compilers
CO 5. Design computational models for problems in automata theory and adapt such models
in the field of compilers
The sum of three tests, two assignments, and quiz/seminar/group discussion will be out of 100
marks and will be scaled down to 50 marks
(To have a less stressful CIE, the portion of the syllabus should not be common/repeated for any
of the methods of the CIE. Each method of CIE should cover a different portion of the syllabus of the
course.)
CIE methods /question paper has to be designed to attain the different levels of Bloom’s
taxonomy as per the outcome defined for the course.
Semester End Examination:
Theory SEE will be conducted by University as per the scheduled timetable, with common
question papers for the subject (duration 03 hours)
1. The question paper will have ten questions. Each question is set for 20 marks, and
the marks scored will be proportionally reduced to 50 marks.
2. There will be 2 questions from each module. Each of the two questions under a module
(with a maximum of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each
module
Module 1
Textbooks
1. John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman, "Introduction to Automata
Theory, Languages and Computation", Third Edition, Pearson.
2. Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, "Compilers: Principles,
Techniques and Tools", Second Edition, Pearson.
Reference:
1. Elaine Rich, "Automata, Computability and Complexity", 1st Edition, Pearson
Education, 2018.
2. K. L. P. Mishra, N. Chandrasekaran, "Theory of Computer Science", 3rd Edition, PHI, 2012.
3. Peter Linz, "An Introduction to Formal Languages and Automata", 3rd Edition,
Narosa Publishers, 1998.
4. K. Muneeswaran, "Compiler Design", Oxford University Press, 2013.
Weblinks and Video Lectures (e-Resources):
1. https://nptel.ac.in/courses/106/106/106106049/#
2. https://nptel.ac.in/courses/106/104/106104123/
3. https://www.jflap.org/
Activity Based Learning (Suggested Activities in Class)/ Practical Based learning
The term "Automata" is derived from the Greek word "αὐτόματα", which means "self-acting".
An automaton (plural: automata) is an abstract self-propelled computing device which
follows a predetermined sequence of operations automatically.
Automata theory provides methods to describe and analyse the dynamic
behaviour of discrete systems.
An automaton consists of states and transitions. States are represented by circles,
and transitions by arrows.
An automaton is a kind of machine which takes some string as input; this input drives the
machine through a finite number of states, and the machine may end in a final state.
The following basic terminologies are important and frequently used in
automata theory:
Alphabet
String
Length of a String
o If S = 'cabcad', |S| = 6
o If |S| = 0, it is called an empty string (denoted by λ or ε)
Kleene Star and Positive Closure
• Definition − The Kleene star ∑* is the set of all possible strings of all possible lengths
over ∑, including λ. The set ∑+ (the positive closure) is the infinite set of all possible
strings of all possible lengths over ∑ excluding λ.
• Representation − ∑+ = ∑1 ∪ ∑2 ∪ ∑3 ∪ …
∑+ = ∑* − { ε }
• Example − If ∑ = { a, b }, then ∑+ = { a, b, aa, ab, ba, bb, … }
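A finite slice of ∑+ can be enumerated mechanically. A minimal Python sketch (the function name strings_up_to is ours, for illustration only):

```python
from itertools import product

def strings_up_to(sigma, n):
    """All strings over alphabet sigma of lengths 1..n: a finite slice of sigma-plus."""
    result = []
    for length in range(1, n + 1):                   # length 0 (the empty string) is excluded
        for combo in product(sigma, repeat=length):  # every arrangement of symbols of this length
            result.append("".join(combo))
    return result

print(strings_up_to(["a", "b"], 2))  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Starting the range at 0 instead would enumerate a finite slice of ∑*, since it adds the empty string.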
Language
An automaton with a finite number of states is called a Finite Automaton (FA) or Finite
State Machine (FSM).
In a DFA, for each input symbol, one can determine the state to which the machine will move.
Hence it is called a deterministic automaton. As it has a finite number of states, the machine
is called a Deterministic Finite Machine or Deterministic Finite Automaton.
o On a transition, the automaton can either move to the next state or
stay in the same state.
o A finite automaton has two outcomes, accept or reject. When the input
string is processed successfully and the automaton ends in a final state,
the string is accepted.
Transition Diagram:
A transition diagram or state transition diagram is a directed graph which can be constructed as
follows:
➢ There is a node for each state in Q, represented by a circle.
➢ There is a directed edge from node q to node p labeled a if δ(q, a) = p.
➢ The start state is marked by an incoming arrow with no source.
➢ Accepting states (final states) are indicated by a double circle.
[Diagram: node q with an edge labeled a to node p]
Transition Table
The transition table is basically a tabular representation of the transition function. It
takes two arguments (a state and a symbol) and returns a state (the "next state").
Example
• Q = {a, b, c},
• ∑ = {0, 1},
• q0 = a,
• F = {c}

Present State | Next State for Input 0 | Next State for Input 1
a | a | b
b | c | a
c | b | c
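The transition table above can be run directly. A small simulation sketch in Python (the dict encoding and the helper name accepts are ours, not part of the notes):

```python
# Transition table from the example above, written as a Python dict:
# (state, input symbol) -> next state.
delta = {
    ("a", "0"): "a", ("a", "1"): "b",
    ("b", "0"): "c", ("b", "1"): "a",
    ("c", "0"): "b", ("c", "1"): "c",
}
start, finals = "a", {"c"}

def accepts(w):
    """Run the DFA on input string w and report acceptance."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in finals

print(accepts("10"))  # True: a -1-> b -0-> c, and c is final
print(accepts("0"))   # False: a -0-> a, and a is not final
```

Because the machine is deterministic, each (state, symbol) pair maps to exactly one next state, so a plain dict lookup suffices.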
Example 1:
Solution:
State | Input 0 | Input 1
→q0 | q1 | q2
q1 | q0 | q2
*q2 | q2 | q2
Explanation:
o In the above table, the first column indicates all the current states. Under
column 0 and 1, the next states are shown.
o The first row of the transition table can be read as, when the current state is
q0, on input 0 the next state will be q1 and on input 1 the next state will be
q2.
o In the second row, when the current state is q1, on input 0, the next state will
be q0, and on 1 input the next state will be q2.
o In the third row, when the current state is q2 on input 0, the next state will be
q2, and on 1 input the next state will be q2.
o The arrow marked to q0 indicates that it is a start state and circle marked to
q2 indicates that it is a final state.
Example 2:
Solution:
State | Input 0 | Input 1
→q0 | q0 | q1
q1 | q1, q2 | q2
q2 | q1 | q3
*q3 | q2 | q2
Explanation:
o The first row of the transition table can be read as, when the current state is
q0, on input 0 the next state will be q0 and on input 1 the next state will be
q1.
o In the second row, when the current state is q1, on input 0 the next state will
be either q1 or q2, and on 1 input the next state will be q2.
o In the third row, when the current state is q2 on input 0, the next state will be
q1, and on 1 input the next state will be q3.
o In the fourth row, when the current state is q3 on input 0, the next state will be
q2, and on 1 input the next state will be q2.
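Note that the table of Example 2 allows two next states (q1 and q2) from q1 on input 0, so it describes a nondeterministic automaton. Such a machine can be simulated by tracking the *set* of states it could be in. A sketch (the dict encoding and the name nfa_accepts are ours):

```python
# NFA transition table from Example 2: (state, symbol) -> set of next states.
delta = {
    ("q0", "0"): {"q0"},        ("q0", "1"): {"q1"},
    ("q1", "0"): {"q1", "q2"},  ("q1", "1"): {"q2"},
    ("q2", "0"): {"q1"},        ("q2", "1"): {"q3"},
    ("q3", "0"): {"q2"},        ("q3", "1"): {"q2"},
}
start, finals = "q0", {"q3"}

def nfa_accepts(w):
    """Accept w if some sequence of choices leads to a final state."""
    current = {start}
    for symbol in w:
        # Union of all states reachable from any current state on this symbol.
        current = set().union(*(delta.get((s, symbol), set()) for s in current))
    return bool(current & finals)

print(nfa_accepts("1011"))  # True: q0 -1-> {q1} -0-> {q1,q2} -1-> {q2,q3} -1-> {q2,q3}
```

This set-of-states idea is exactly what the subset construction uses when converting an NFA to a DFA.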
Note: The term deterministic refers to the fact that on each input there is one
and only one state to which the automaton can transition from its
current state.
Example: Draw a DFA to accept strings over ∑ = {a} having at least one a.
Solution:
i) Minimum string: a
ii) Alphabet: ∑ = {a}
iii) Transition for the minimum string: δ(q0, a) = q1
[Diagram: start → q0 −a→ q1 (accept)]
iv) Identify the other transitions: δ(q1, a) = q1
v) Construct the DFA using the transitions in steps iii and iv.

State | a
→q0 | q1
*q1 | q1

Note: "at least" corresponds to a minimum; "at most" corresponds to a maximum.
Therefore the language accepted by the DFA can be written as
L = {a, aa, aaa, aaaa, …}
or
L = {a^n : n ≥ 1}
or
L = {w : n_a(w) ≥ 1, w ∈ {a}*}
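The two-state DFA above can be cross-checked against the set definition by brute force. A sketch (the dict encoding and helper name are ours):

```python
# The two-state DFA above: q0 -a-> q1, q1 -a-> q1, with q1 final.
delta = {("q0", "a"): "q1", ("q1", "a"): "q1"}

def accepts(w):
    state = "q0"
    for c in w:
        state = delta[(state, c)]
    return state == "q1"

# Cross-check against the set definition L = {a^n : n >= 1} for small n.
for n in range(6):
    assert accepts("a" * n) == (n >= 1)
print("DFA matches L = {a^n : n >= 1} for all n < 6")
```

The empty string (n = 0) leaves the machine in the non-final start state q0, which is why λ is excluded from L.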
The same language over the larger alphabet ∑ = {a, b}:
[Diagram: start → q0 −a→ q1 (accept); q0 loops on b, q1 loops on a and b]

State | a | b
→q0 | q1 | q0
*q1 | q1 | q1

L = {w : n_a(w) ≥ 1, w ∈ {a, b}*}
3. Draw a DFA to accept strings over ∑ = {a, b} having exactly one a.
Sol:
i) Transition for the minimum string: δ(q0, a) = q1
ii) Remaining transitions: δ(q0, b) = q0, δ(q1, b) = q1
iii) Note: In the DFA so far, there is no transition from state q1 for the input symbol a.
We can include a transition from state q1 on input a to a non-final reject state called a
trap state; therefore q2 is the trap state, and it loops to itself on both a and b.
[Diagram: start → q0 −a→ q1 (accept) −a→ q2 (trap); q0 loops on b, q1 loops on b, q2 loops on a and b]

State | a | b
→q0 | q1 | q0
*q1 | q2 | q1
q2 | q2 | q2

Sample strings: abbbb and bbbba are accepted (exactly one a); baba is rejected (more than one a).
L = {w : n_a(w) = 1, w ∈ {a, b}*}
4. Draw a DFA to accept strings of 0's and 1's having 3 consecutive 0's.
Sol:
i) Minimum string: 3 consecutive 0's (000)
ii) Alphabet: ∑ = {0, 1}
iii) Transition table:

State | 0 | 1
→q0 | q1 | q0
q1 | q2 | q0
q2 | q3 | q0
*q3 | q3 | q3

[Diagram: q0 −0→ q1 −0→ q2 −0→ q3; q0 loops on 1; q1 and q2 return to q0 on 1; q3 loops on 0 and 1]
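This DFA should accept exactly the strings containing 000 as a substring, which can be verified exhaustively for short strings. A sketch (the dict encoding and helper name are ours):

```python
from itertools import product

# DFA for strings over {0, 1} containing three consecutive 0's
# (reconstructed transition table from the example above).
delta = {
    ("q0", "0"): "q1", ("q0", "1"): "q0",
    ("q1", "0"): "q2", ("q1", "1"): "q0",
    ("q2", "0"): "q3", ("q2", "1"): "q0",
    ("q3", "0"): "q3", ("q3", "1"): "q3",
}

def accepts(w):
    state = "q0"
    for c in w:
        state = delta[(state, c)]
    return state == "q3"

# Exhaustively compare with the substring test for all strings up to length 6.
for n in range(7):
    for t in product("01", repeat=n):
        w = "".join(t)
        assert accepts(w) == ("000" in w)
print("DFA agrees with the substring test on all strings of length <= 6")
```

Each state records how many consecutive 0's have just been seen (q0 = none, q1 = one, q2 = two), and q3 is a permanent accepting state once 000 has occurred.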
Further conversions should be studied from the class notes.
Introduction to Compiling:
1.1 INTRODUCTION TO THE LANGUAGE PROCESSING SYSTEM
A preprocessor produces input to compilers. It may perform the following functions:
1. Macro processing: A preprocessor may allow a user to define macros that are shorthands
for longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessing: These preprocessors augment older languages with more modern
flow-of-control and data-structuring facilities.
4. Language extensions: These preprocessors attempt to add capabilities to the language by
means of built-in macros.
COMPILER
A compiler is a translator program that takes a program written in a high-level language (HLL),
the source program, and translates it into an equivalent program in machine-level language (MLL),
the target program. An important part of a compiler's job is reporting errors to the programmer.
ShreedeviPramod,Dept of CSE,BrCE
Executing a program written in an HLL basically involves two steps: the
source program must first be compiled, i.e. translated into an object program; then the resulting
object program is loaded into memory and executed.
ASSEMBLER
Programmers found it difficult to write or read programs in machine language. They began to use
a mnemonic (symbol) for each machine instruction, which they would subsequently translate
into machine language. Such a mnemonic machine language is now called an assembly
language. Programs known as assemblers were written to automate the translation of assembly
language into machine language. The input to an assembler is called the source program;
the output is a machine-language translation (object program).
INTERPRETER
An interpreter is a program that appears to execute a source program as if it were machine language.
Languages such as BASIC, SNOBOL, and LISP can be translated using interpreters. Java also uses an
interpreter. The process of interpretation can be carried out in the following phases:
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
Advantages:
Modification of the user program can easily be made and implemented as execution
proceeds. The type of object that a variable denotes may change dynamically.
Debugging a program and finding errors is a simplified task for a program used with
interpretation. The interpreter for the language makes it machine independent.
Disadvantages:
The execution of the program is slower.
Memory consumption is higher.
Once the assembler produces an object program, that program must be placed into memory and
executed. The assembler could place the object program directly in memory and transfer control
to it, thereby causing the machine-language program to be executed. However, this would waste core
by leaving the assembler in memory while the user's program was being executed. Also, the programmer
would have to retranslate the program with each execution, thus wasting translation time. To
overcome these problems of wasted translation time and memory, system programmers
developed another component called the loader.
"A loader is a program that places programs into memory and prepares them for execution." It
would be more efficient if subroutines could be translated into object form which the loader
could "relocate" directly behind the user's program. The task of adjusting programs so that they may
be placed in arbitrary core locations is called relocation. Relocating loaders perform four
functions.
1.2 TRANSLATOR
A translator is a program that takes as input a program written in one language and produces as
output a program in another language. Besides program translation, the translator performs
another very important role: error detection. Any violation of the HLL specification would be
detected and reported to the programmer. The important roles of a translator are:
1 Translating the HLL program input into an equivalent machine-language (ML) program.
2 Providing diagnostic messages wherever the programmer violates the specification of the HLL.
Syntax Analysis:-
The second stage of translation is called syntax analysis or parsing. In this phase, expressions,
statements, declarations, etc. are identified by using the results of lexical analysis. Syntax
analysis is aided by using techniques based on the formal grammar of the programming language.
Code Optimization :-
This is an optional phase intended to improve the intermediate code so that the output runs faster
and takes less space.
Code Generation:-
The last phase of translation is code generation. A number of optimizations to reduce the length
of machine language program are carried out during this phase. The output of the code
generator is the machine language program of the specified computer.
Code Generator produces the object code by deciding on the memory locations for data,
selecting code to access each datum and selecting the registers in which each computation is to
be done. Many computers have only a few high speed registers in which computations can be
performed quickly. A good code generator would attempt to utilize registers as efficiently as
possible.
Table Management (or) Book-keeping:-
A compiler needs to collect information about all the data objects that appear in the source
program. The information about data objects is collected by the early phases of the compiler
(the lexical and syntactic analyzers). The data structure used to record this information is called
the Symbol Table.
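The idea of a symbol table can be sketched as a single mapping from names to the attributes collected for them. This is an illustration only (class and method names are ours); real compilers use scoped, hashed tables:

```python
# A minimal symbol-table sketch: one dict mapping names to collected attributes.
class SymbolTable:
    def __init__(self):
        self.entries = {}

    def insert(self, name, **attrs):
        """Record a data object, merging in any newly discovered attributes."""
        self.entries.setdefault(name, {}).update(attrs)

    def lookup(self, name):
        """Return the attributes recorded for name, or None if absent."""
        return self.entries.get(name)

table = SymbolTable()
table.insert("count", kind="identifier")      # added by the lexical analyzer
table.insert("count", type="int", offset=0)   # enriched by later phases
print(table.lookup("count"))  # {'kind': 'identifier', 'type': 'int', 'offset': 0}
```

Note how different phases can keep adding attributes to the same entry, which mirrors how the early and later phases of the compiler share the table.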
Error Handling :-
One of the most important functions of a compiler is the detection and reporting of errors in the
source program. The error messages should allow the programmer to determine exactly where the
errors have occurred. Errors may occur in any of the phases of a compiler.
Whenever a phase of the compiler discovers an error, it must report the error to the error handler,
which issues an appropriate diagnostic message. Both the table-management and error-handling
routines interact with all phases of the compiler.
Example:
Fig 1.6: Compilation Process of a source code through phases
Lexical Analyzer
THE ROLE OF THE LEXICAL ANALYZER
[Figure: interactions between the lexical analyzer, the parser, and the symbol table]
Lexical analysis is the first phase of the compiler, where the lexical analyzer operates as an
interface between the source code and the rest of the phases of the compiler. It reads the input
characters of the source program, groups them into lexemes, and produces a token for
each lexeme. The tokens are sent to the parser for syntax analysis.
If the lexical analyzer is implemented as a separate pass in the compiler, it may need an intermediate
file to hold its output, from which the parser would then take its input. To eliminate the
need for the intermediate file, the lexical analyzer and the syntactic analyzer (parser) are often
grouped into the same pass, where the lexical analyzer operates either under the control of the
parser or as a subroutine called by the parser.
The lexical analyzer also interacts with the symbol table while passing tokens to the parser.
Whenever a token is discovered, the lexical analyzer returns a representation of that token to
the parser. If the token is a simple construct such as a parenthesis, comma, or colon, it
returns an integer code. If the token is a more complex item such as an identifier or
another token with a value, the value is also passed to the parser.
The lexical analyzer separates the characters of the source language into groups that logically
belong together, called tokens. A token includes a token name, which is an abstract symbol that
defines a type of lexical unit, and an optional attribute value. Tokens can be
identifiers, keywords, constants, operators, and punctuation symbols such as commas and
parentheses. A rule that describes the set of input strings for which the same token is produced
as output is called a pattern.
The lexical analyzer also handles tasks such as stripping out comments and whitespace
(tab, newline, blank, and other characters that are used to separate tokens in the input), and
correlating error messages that are generated by the compiler with the
source program.
For example, it can keep track of all newline characters so that it can associate a
line number with each error message. If a macro preprocessor is used, the lexical analyzer
may also implement the expansion of macros.
If the lexical analyzer had to access secondary memory each time to identify tokens, it would be
time-consuming and costly. So the input string is stored in a buffer and then scanned by the
lexical analyzer.
The lexical analyzer scans the input string from left to right one character at a time to identify
tokens. It uses two pointers to scan tokens −
• Both pointers start at the beginning of the string, which is stored in the buffer.
• The character ("blank space") beyond the token ("int") has to be examined before the
token ("int") can be determined.
• After processing the token ("int"), both pointers are set to the start of the next token ('a'), and this
process is repeated for the whole program.
A buffer can be divided into two halves. If the lookahead pointer moves past the halfway point of the
first half, the second half is filled with new characters to be read. If the lookahead pointer
moves towards the right end of the second half, the first half is filled with
new characters, and so on.
Sentinels − Sentinels are used to reduce the checking: each time the forward pointer is
advanced, a check must be made to ensure that one half of the buffer has not been moved off; if
it has, the other half must be reloaded. Placing a sentinel character at the end of each half lets
this end-of-buffer check be combined with the test for the current character.
Buffer Pairs − A specialized buffering technique can decrease the amount of overhead
needed to process an input character. It uses two
buffers, each of N-character size, which are reloaded alternately.
Two pointers, lexemeBegin and forward, are maintained. lexemeBegin
points to the start of the current lexeme, which is yet to be discovered. forward scans ahead until a
match for a pattern is discovered. Once the next lexeme is determined, forward is set to the
character at its right end; after the lexeme is recorded, lexemeBegin is set to the
character immediately after the lexeme just found.
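The two-pointer scan above can be sketched in a few lines. This is an illustration only (names and the single flat buffer are ours; a real scanner uses two N-character halves reloaded alternately):

```python
# Sketch of the two-pointer scan: lexeme_begin marks the start of the current
# lexeme, forward advances while the pattern can still be extended.
# A sentinel character marks the end of the buffered input, so the loop
# terminates without a separate end-of-buffer test.
SENTINEL = "\0"

def next_word(buffer, lexeme_begin):
    """Return the next run of letters starting at lexeme_begin, and the new position."""
    forward = lexeme_begin
    while buffer[forward].isalpha():   # extend while the pattern still matches
        forward += 1                   # the sentinel is not a letter, so this always stops
    lexeme = buffer[lexeme_begin:forward]
    return lexeme, forward

buffer = "int a" + SENTINEL
word, pos = next_word(buffer, 0)
print(word, pos)  # int 3
```

After a lexeme is returned, the caller would set lexeme_begin to the character just past it and scan again, exactly as the two pointers are described above.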
Preliminary Scanning − Certain processes are best performed as characters are moved from
the source file to the buffer. For example, comments can be deleted. Languages
like FORTRAN, which ignore blanks, can have them deleted from the character stream, and
strings of several blanks can be collapsed into one blank. Pre-processing the character stream
before it is subjected to lexical analysis saves the trouble of moving the lookahead pointer back and
forth over a string of blanks.
Token:
A token is a pair consisting of a token name and an optional attribute value. The token name
is an abstract symbol representing a kind of lexical unit, for example, a particular keyword or a
sequence of input characters denoting an identifier. The token names are the input symbols that
the parser processes.
Typical tokens are:
1) Identifiers
2) Keywords
3) Operators
4) Special symbols
5) Constants
Pattern
A pattern is a description of the form that the lexemes of a token may take. In the case
of a keyword as a token, the pattern is just the sequence of characters that form
the keyword. For identifiers and some other tokens, the pattern is a more complex
structure that is matched by many strings.
Lexeme
A lexeme is a sequence of characters in the source program that matches the pattern for a token
and is identified by the lexical analyzer as an instance of that token.
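The three notions fit together naturally in a small tokenizer: each token name is paired with the regular-expression pattern its lexemes match. A sketch using Python's re module (the token set and patterns are illustrative, not a full language specification):

```python
import re

# Each token name paired with the *pattern* its lexemes must match.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("SKIP",       r"\s+"),               # whitespace is stripped out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (token name, lexeme) pairs for the source string."""
    for m in MASTER.finditer(source):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())

print(list(tokenize("int count = 10")))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'count'), ('OPERATOR', '='), ('NUMBER', '10')]
```

Here "int" is a lexeme, KEYWORD is its token name, and `\b(?:int|if|else|while)\b` is the pattern; KEYWORD is listed before IDENTIFIER so that keywords are not mistaken for identifiers.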