0 ratings0% found this document useful (0 votes) 81 views41 pagesCD Unit 1
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, 
claim it here.
Available Formats
Download as PDF or read online on Scribd
CONTENTS
16
Ww
18
CHAPTER 2.
21
22
23
2a.
25
26
27
28
29
‘CHAPTER 3. CONTEXT FREE GRAMMAR ...
By
32
33
  
I-27
INTRODUCTION .
1
aes Sseueros
‘Frres OF TRANSLATORS
134 Compilers
132 Assembler
13.3 Interpreters
13a Macros
sors
Tinker and toad
136 _ Unkers
EOF A COM EQNSTRUCTION OF COMPILER
 
‘Syntax Analysis (Parse!)
ie analysts
Seman cade Generation
ose Stn
6 Code Generation
8 7 Table Management (Bookkeeping)
136 Erortanding
ASSES IN COMPILATION
1.6.1 Classification ‘of compiler w.r.t Number ‘of Passes
1.6.2 __One Pass vs Multi Pass’ ‘Compiler
BOOTSTRAPPING =
COMPILER CONSTRUCTION TOOLS
. LEXICAL ANALYSIS ..
INTRODUCTION
2.1.1 Role of Lexical Analyzer
312 fnpu Bufesng
BeStGN OF LEXICAL ANALYZER
Sei Gransition Diagram (Finite Automata) for Tokens :
‘SPECIFICATION OF TOKENS (REGULAR EXPRESSIONS)
RECOGNITION OF TOKEN (FINITE AUTOMATA)
241 Types of Finite Automata
‘CONVERSION OF REGULAR EXPRESSION TO FINITE AUTOMATA (NFA)
CONVERSION FROM NFA TO DFA (Subset Constructic
MINIMIZING NUMBER OF STATES OF DFA mien
LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS
IMPLEMENTATION OF A LEXICAL ANALYZER,
 
vee 28-66
  
 
  
‘wrrooucion
Seavanons
TREES (DERIVA
yee TION TREES)
122 Conversion of Ambiguous to Unambiguous GrammarINTRODUCTION
1.1 DEFINITION OF COMPILER
 
The compiler translates a human-readable Program into machine-executable code.
. OR
A compiler is a program that translates a high level language (e.g. C, C++ :
Java) into a low level language.
 
Source Language Target Program
COMPILER }-—————""">
(A programme written
(Apr rogram written
in High-level language) in Machine code)
e.g. C, C++, Java
 
 
 
Fig. 1.1: Compiler
1.2. COMPILERS AND TRANSLATORS
Over the years, Programming languages have progressed from machine
languages written in binary 0s & 1s to the languages written in English terms which
can be easily understandable by user.
Computer languages can be classified into following categories :—
(@) Machine language (Binary form 0's & 1's)
(6) Assembly language
(© High level language (e.g. C, C++, Java)
(c) Machine language : There is only one language understood by computer
ie, Machine language. It is normally written as strings of binary 1's and 0's.
Obviously, it is difficult to read & understand the instructions of binary form. A
Program written in machine language has following disadvantages :— _
1. Difficult to read & understand
2, Machine dependent :
8. Error Prone.ISHAN'S Compe,
 
 
some new langua
va to these limitations, over the Years £8 han
cue 2 shew Melly understandable by @ user and which arg yy.
this, we also require a software Program which
a
boa,
dependent. But t0 do h
emt Gepesdent program into Machine language becauso o
OP Ute,
 
by
SY introduction
 
hy q translator which converte high
uk Janguage. I
  
4
        
e instructions. Therefore, we require
‘As, Computor can directly execute mac!
‘Assembly language into machine;
 
general way, we can det ras. a
s one language into another
or : A program whic!
 
       
 
 
‘convert this machine in
Gin caly understand Machine language. That Software Program ig cally Tr
‘Translator (Compiler or Assembler). dt tanguay
(@) Assembly language : Assembly language was introduced in 1959 ‘A program that translates between high level languages is called "language
‘overcome some of the sbove limitations of Machine language. Using A, Whicy_ translator", "Source to Source translator” or "language rewriter".
language 2 programmer need not keep track of storage locations of aa |
Program in ee Program ia
4 one Language ieee ‘Ancther Language
instructions. Instron can be easily written in the form of letters & eymbon
ols but
notin Os & 1's
For Bg. Toadd two numbers we write ADD A, B in Assembly language,
Following are the advantages of Assembly language over by
: aching
language:
(0) Reading is easier. ‘
(®) Addresses.
© Aidreses are eynbole& programmer need not worry about ade
mnemonic eg we use ST instead of 01010000 for store instr.a
Assembly language cece ea
(Easy w locate and carret errors
(©) High level Iti
wey e enguage :It is machine independent language whi
This cakes ee nT: to know any thing about internal structures erm ee ©
So enn sg easier. e.g. C, C+, Java, ete, cae
'¥o numbers we can simply write
¢=2+ bin high evel language,
    
 
 
 
 
High level language.
low Leet ee ‘Assembly &]
or
Machine langu 24139119 9001
[SseS8h lenguegs —— 219000 | | Complexity Decreases
poe |
SUB.
High ievel BAB
language gO | User
D | intabe Understandability
ti Increases,
 
 
 
 
a
 
 
Fig. 1.2: Translator
TYPES OF TRANSLATORS
Compilers
Assemblers
Interpreters
Macros
Preprocessors
Linkers & Loaders.
beaded
1.3.1 Compilers
‘A compiler is a translator which converts high level language (soure program)
into low level language (object program or Machine program).
 
 
Machine
 
 
 
tore ome
vores ae
eal
El
2 Fig. 1.8: Compiler
Compiler also find out the various errors encountered during compilation of
"Yow-level language using various
‘through various phases of
Program.
Compiler converts high level language into
phases. A character stream inputted by user goes
compilation which at last will give target language.1.3.2 Assembler
° translator which translates Assembly language progran,
‘Assembler is & program of he comPuteE
valent machine language
ta eee
sets Tite | ein
Program T
Fig. 14: Assembler
e more time to run assembly language progr,
: than
its
Computer will always tak
machine language program.
» Computer has to convert first assembly language prog,
fam inty
Because
rachine program by running assembler program. But Assembler reduce,
effort done by programmer. As user has to write instructions in form of Joy Xtra
symbols but not inform of O's & 1's ters oy
133 Interpreters
4n Interpreter a program which execute the programmis
/ grammin, i
instead of just trafblating it into another format. Tt translates: ann gee
programming langulge statements one by one. Sxecutes
|i tev
tithe interpreter Jo Rona of
Program Execution
irect Execution)
 
 
 
 
Fig. 16: Interpreter
Interpreters translates (e compile) & exccutes etatement by statement of whole
ile) &
rogram in very slower manner,
guages which were mainly used during 1970s use
 
 
    
Introduction
Difference between Compiler & Interpreter :-
 
 
Interpreter
Tt translates & executes the
program
‘They are slower as they have to
translate whole program line by
line.
Requires less memory for execution
‘Compiler
T | Itonly translates the program
 
 
2. | They are faster as they translate
the whole program at once
8, | Require more memory for
execution.
4, | Difficult to write Compiler | Easy to write
 
 
programs
6. | C, CH uses compiler Java uses Interpreter
 
 
 
(1.3.4 Macros
| “Macros are like functions having actual & formal arguments as we have in C,
(C++. But, Rather Macros are used in Assembly language, not in High level language
(c,CH)
For e.g. A Macro named SUM having P and Q as its formal arguments is called
by Macro calling statement i.e. SUM A,B where A and B are its actual arguments. A
and B will be copied into formal arguments P & Q respectively.
MACRO MACRO
SUMP.Q Macro Translation for SUM AB
LOAD Q — LOAD B
ADDP SUM A,B ADDA
STORE P STORE B
MEND MEND
1.3.5 Preprocessors
fare programs that, transform the source code before
Preprocessors
compilation. As, Compilers are called language processors, therefore these are
called Preprocessors. Preprocessing occurs before compilation i.e preprocessor tells or
directs the compiler, to include some header files into the program.
 
Source Pree preprocessor} —»] Compiler —e Machine Languase
ogra
 
 
 
 
 
 
Fig. 1.7: Preprocessorsring «
1.36 Linkers and Loaders
4.3.6.1 Linkers:
‘A Linker ia a program that combines object modules to foy,
executable program, A cftware oft conaat of multiple source progr
retin which auld bo contained before execution. Bach modulo of « softy,
to produce object programe (object modules), Linker
to loader to load it into smomory ta
may
8 op
  
 
 
1.3.6.2 Loaders
 
 
 
 
Executable Program
Fig. 1.8: Linking and Loading
Wty
tituto value 100 at every occur
Nee
a
aro de
loads them into main memory for
 
over
 
Linker perform task of" Symbol Resolution".
‘Symbol Resolution : Software consist of various programs (or modules). The
2 (or functions) in one program can be referenced by subprogram (or
another program through symbols.
 
       
Program 2.
# include 
extorn void dsplay () 5
Program 1.
# include 
‘extern void show ()
‘void main () void show ()
{ (
show () display ();
  
9 : Symbol Resolution
‘The variablee & functions which uso Yoxtern" keyword i, for global declaration
leclared in ono program and ean be roferonced in another program.
 
 
accepts the input as linked modules &
execution by the computer.
from secondary storage to main memory for
\ddroseos with physical addresses.
& londor. Bas there tasks
Londer is a program which
 
Loaders load or copies program
jaders can also replace virtual a‘
no clear difforence b/w tasks of
   
 
‘There
 
 
 
 
 
slap with each other. As, the loader ean porform both linking & 1o
Source Reault of
Progra] Taneiator Enoeution
 
 
 
 
 
 
Fig. 1.9: Loader and LinkerFunctions of Loar” to space in momory forthe Programs
Allocation abel Resolution between object modules
Linking rm adres dspendent loetions of 2427656 cong,
om eto different part of@ Program a
nd addrespine instructions & data into memory.
  
 
  
Loading Schemes
1. Simple comp
 
ion without loader (Compile & go Loading sep,
me),
SuurceProeram[ Comrie | onjet
tee | Freer | Program
Loaded.
 
 
 
 
Fig. (2)
2. Compilation with loader (General Loading scheme)
Source Progra 1
     
  
 
      
 
 
 
 
 
‘ones
' Pron Sm
P
Loader
Source Program 2 [Com sou
SEREAE] Comper | —aPoO" Performa both
Loading & Linking
Free
Memory
Pi 7
‘1.11: Londing Schemes
 
ISHAN'S Compa
De
ty
Introduction, 2
‘As shown in Figure 1.11 (a).
If compilation is done & object programs are directly loaded without a loader,
than a translator (compiler or Assembler) program will also take lot of space in
memory which will result in wastage of memory. In figure 1.11 (6).
(On using Loaders with compilation, Loaders will replace the space taken by
compiler or Assembler. But loaders will take less memory than compiler or
‘Assembler causing more free memory to be available to user.
‘Types of Loaders :
(Absolute Loaders : It is a primitive type of loader which does only the
Joading. It does not perform linking & program relocation.
In Absolute loading, the programmer has to give the address in memory
explicitly where he/she wants to store the program. This is the main
limitation of Absolute loading.
@
Relocatable Loader : A loader which also performs relocation with
loading. It is the responsibility of Relocatable loader to load each function
‘or subprogram at non-overlapping addresses & to give each function a
original load address.
1.4. THE STRUCTURE OF A COMPILER.
A compile:
 
system software or a program which takes as input the source
language (e.g C, C++, Java) & produces its corresponding low level language
(Machine language program).
In this topic, we are basically going to discuss the internal structure of a
compiler ie. how actually compiler converts a program writte:
language which is understandable by machine or a compute
‘This operation or process is so complex & difficult such that it
do this whole process in one go.
 
auser toa
‘stern.
impossible to
 
So, the process of compilation is generally broken down into subtaske calle
phases. Each phase is interdependent on other phase. Output of one phase will b
input to another phase. First phase of compiler takes as input the source languag
written by a programmer & the last phase produces the required object language (
target program).
Figure 1.12 shows the structure of a compiler or various phases
compiler.Inoducton : "
3. Semantic Analysis
 
‘Performe Type checking ie for each operator, type compatibilities
of operators are checked
 
4, Intermediate code Generation
 
 
= Converts output of ite previous phase into intermediate
representation. E.g (a) Postfix Notation
(0) Three Address code
© Quadruples
(@ Triples & Indirect Triples
Error Handling (© Syntax Trees
5. Code Optimization
 
 
 
 
 
 
Tati frtermediato Code Generation
[rt code
 
 
 
 
 
 
     
 
 
= Tt converts the intermediate representation of source program into
fan efficient code. ie faster running code. Code optimization can be
done in 3 ways
(©) Local optimization : Elimination of common sub expression
(&) Loop optimization : Removing loop invariants,
(© Global Data Flow Analysis : Perform optimization using
 
 
 
 
 
 
 
  
     
 
 
    
| Target Programa information flow.
Ready for Execution) @.__Code Generation
‘The whol Fig. 1.12: Phases of Compiler = Te converts optimized intermediate code into Machine TAssembly
ole
Process of compilation - is carried out in diffe code.
ferent phases “= Ttallocates memory locations for variables in the program.
Tt reads the = Geanner) 7. Table Management
Source program one character at a time & group th Keep Record of each token & ite attributes (je identifier name
A token is a sec a i
quence of . data type, location etc)
t of chars ;
= puma zments ldeatitey acne collective meaning Eg of + Symbol Table is a data structures which eontaine Tokens ___
ens in the e rs, labels, : ;
2. Syntax Analysis to table, eament %. Error Handling
+ Receives tok Detect & Report errors occured at each phase of the compiler.
 
 
 
  
     
(lexical Analyei) <2, iaput
Analysis) & eenerated -
are PPOduet whicrarchal gee erence
called Syntaxane compilation P
 
 
rouse for an input A= BYCap @
 
 
 
 
 
 
           
 
 
 
 
Saree
ON aera ore eae ge re
 
8
Examples of Toke:
 
 
 
 
 
 
 
 
 
 
1: Shot
Example 1 Soe oF compiler et
aon —oatpet__| Q Sporaiare ae
bse es © 5.6, 60.5)
[SNo|___—" "Source Program _ @ Identifiers sum
7, [Procrammer_ ecaana © “Labels
[2 [besical Ansley ee Tee
= [Syntax Analysis
Ch B ‘c
= is ‘Tree eck data
| [Semantic Analysis [Parse whether Nuceet ARS : i
Bg character to2! or n°) Find out Tokens & their types.
cannot beT data yt Token Type Values.
JArithmetic ene, Weed IDENTIFIER a
etic operation IDENTIFIER, b
—lintermediate code|Thres Address code|T1 = B+C RSSIGNOPERATOR = &
[Generation J(ntermediate code) |T2=T14p ‘ADD-OPERATOR +
JA=T2 CONSTANT 5
& [Code optimization optimized T1=B+C 2. Put information about Tokens into Symbol Table.
lintermediatecode JA = T24D
\ddress_ ‘Token Type, val
7, |GodeGeneration [Assembly code MOV BRT a id, integer valu en
Mov cc Re 382 id, integer, value= b
ADD R21 :
MOV DR :
[ADD R38 R} ‘360 ‘constant, integer, value =5
IMOV_R1 a A
 
 
 
 
 
 
15 DIFFERENT PHASESISTATES IN CONSTRUCTION OF COMPILER
15.1 Lexical Analysis (Scanner)
Lexical Analyzer read:
a is the source prograi
tokens of source program A token i erry
collective meaning then ora
+ it insert tokens into symbol
of characters whic!
table,
 
character by character and returns
+h have some
 
 
 
 
 
 
Remember, Information regarding keyword, operators, delimiters need not |
inserted into symbol table, as they are repeated number of times ina program
have very common occurrence in a program,
3. After Sing out tokens & storing them into symbol table, a token stre
is generated as follows
 
    
332] + (const, 360]
is of form (token - type , index}
token-type tells whether it is constant, identifier, label ete.
index tells about the address of the token in the symbol table.A lee
4 jon performed by Lexical Analygi, S
Brample 8 What will be the cpt perfo 498I8 Phas, ~
statement. WA= Jo) then’ co %
‘Ans.
‘Tokens will be
© ken Tye hig!
xeyworDs ——_‘Iethen, GOTO
DENTIFIERS A
ASSIGN-OPERATOR = = =
LABEL 200
DELIMITERS —(,)
(i) Symbol Table
address | Token Type, value
236 id, integer, value = A
238 constant, integer, value = 10
288
 
label, value = 200
 
 
 
 
(ii) Token Stream will be,
id, 236) = [constant, 298) then GOTO flabel, 288)
155.2 Syntax Analysis (Parser)
Program,
 
oe |
 
aN
aa | const (10)
 
1.5.3 Semantic Analysis}
‘Semantic refers to the "meaning of a program" as opposed to its syntax or
structure. Functions performed by a semantic Analyzer
(a) Type Checking : Checks or verifies that each operator has operands that
are permitted by source language jon or there should be type
compatibility between operator & operands.
Implicit type conversion (coercion) : Coercion refers to changing one
data type to another automatically when data type rands of an
operator are different or mismatches. Bg in Pascal when "+" operator has
operands of type real & integer, the integer type will be automatically
converted to real type before the addition is done.
ie int+ real
] Type conversion
real + real = real
Egg. of Semantic Anal
If a & bare real no's & 10
toreal
Semantic
a+b* 10 Sait + a+b * int to real (10)
 
   
©
   
an integer then semantic Analysis will convert interation
rate code CF" j
“54 Intermediate Code vox & semantic analyse on Program, comp
; sing sy! i termediate between r
er which i basically Source ja Re
inte
 
 
Fig.
atypesof Intermediate Code =
1. Postfix Notation
2, Three address code
(@) Quadruples
@ Triples
3. Syntax Trees
1. Postfix Notation : In postfix Notation,
‘ce operator fllows an operan:
ee.
 
 
 
() Postfix Notation for expres
Eg Three Address code for the statement
 
   
Intermediate Code
Lid: Intermediate Code Generation
operator comes after an
Derang
Postfix Notation for the expression (a+b) * (c+d) is abt edge
(arb) - (¢#d) is ab* cas,
2. Three Address code : These are Statements of for:
which there will be atmost three addresses, ie, two for operan
Each instruction has atmost one operator on right hand side
 
d=a*bie| Three Address code
—
 
 
 
 
oT koh
ie
ds & one for Res, nat
tl=a*b
t2=tlte
d=t2
 
 
8 Syntax Trees : Ij
ies 8 et condensed
ill be ope
Exgeyntax Tre fog Pe verators,
id+iata oe
form of parse Tre
e in which leaves are
ep ;
Neus oy It is basically an oy
  
5:5 Code optimization
   
improve the intermediate code, so that
faster running object code (machine code)will be produced
It performs following task ~
(@) Improve target code.
(Eliminate redundant information (common sub expressions)
(© Remove unnecessary operation.
Ww instructions with faster ones.
 
    
) ‘Types of optimization:
@ Local optimization
(© Global Data flow Analysis,
(@) Local optimization: Local optimization includes removing common sub
ions or redundant information,
Loop opti
 
zation
 
 
 
 
PS=P4+QtR
Z=P+Q+M
 
 
 
 
 
 
Unoptimized code
In unoptimized code , we have to perform four additions. But in optimized code,
only three additions\are required which take less time of execution.
(©) Loop optimization } Program spend most of their time in loops. So, it is very
important to concentrate on loops for increasing performance of the whole program.
A statement which computes the same value every time when loop is executed
is called "Loo . Those statements can be taken outside the
loop’ which leads to decrease in execution time of loop & obviously program also.
 
   
 
  
 
 
 
int a=5; . inta=6;
inte; inte;
for (int i= 1; i< =5; i++) loop c=atd;
{  cout<: >} for (int i= 151 <=5 it)
cana: optimization {
} cout < | Woap
Geceration ADD C
Source Program STOREA
 
Assembly Code :
15.7 Table Management (Bookkeeping)
, Esch phase of the compiler which we
{able management phase in fig 1
Previous phases in the table,
we have discussed earlier, is attached tp
~ ach phase uses information stored by ite
 
 
fetected by following phases of compiler :
 
 
 
 
 
 
 
 
 
 
 
Phase ‘Type of Error Description
[Lexical phase | lexical error [when remaining unread characters
do not form any token
[Syntax [syntax or syntactic phase | when token stream do not follow
analyzer errors structural rules of language E.g
[Missing Parenthesis
[Semantic [semantic errore Type checking errors
analyzer
Intermediate | Intermediate code Eg Incompatible types of operands.
[Code generator |generation errors
[Code optimizer |code optimization errors |some statements never reach while
execution of program.
[Code generator |code generation error [ome data values are very large to
fit into register for allocation.
 
 
 
_. Example 6. Apply lexical analysis, Syntax Analysis (parser) & Intermediate code generation
 
 
 
inthe statement
While (A>=B && A<=3+B-I0)do
A=A+B :
‘Ans. 1. Lexical Analyzer will find otit tokens & will store tokens into symbol
Table. *
address | Token -Type, value
540 id, value= A
542 id, value= Brate following parse tree for the aboy.
stream. ‘ 2
will gene :
2, Syntax analyzer 2
Example 6. Show the whole compilation process for the statement A= B*C+20, where A,
a - : ‘are of real lypes.
enti ae ocd
—
as xN
Z 200
WAN < of PK | “
ee en eA 7
 ap
th ed TNS
 = 
aA ae
nnioe
 
   
const)
4% Intermediate codo ie thne Address code for the Statoment is given
below :
IfAZB goto (3)
goto (9)
       
         
TI=3*B
Three address code T2=T1-10
=—_—_ IfA S T2 goto (7)
Boto (9) ‘Ti=id,tidy
A=A4B id,-T1420.0
oto (1)
   
e F id, RL
 
Three Address code,
MULTF id, R1
ADDF # 24
‘MOVP Ri, id,
Assembly]
code200
208
22
216
ple 7 Shove
Pane wrious phases of
   
row object code
 
 
 
 
 
 
 
 
 
TiHayidy
TasideT1
Tasid,T2
igor,
TisidJid,
Tesidyrt
idsider2
 
  
PASSES IN COMPILATION
‘Whe whole source program can be processed several times before generating
sembly / Machine code.
Pass+ One complete scan or processing of source program. Several
tnases can be grouped into one pass. Lexical, syntax & semantic Analysis are often
‘ouped in single pass. Each Pass reads source Program & write output into
ntermediate file, which then can be read by subsequent passes. ic. output of one
Sage will be input to next Pass.
 
1 Classification of compiler w.r-t Number of Passes
1, One - Pass Compiler : One pass compiler reads the program only once &
then translate it.
2. Multi -Pass compiler : It reads the program several times, each time
transforming it to another form.
 
Source code
 
‘Assembly Code! Machine Code
Fig. 1.14: PassesISHAWS og
    
piler
  
pass vs Multi Pass is
 
[One Pass compiler
Tt reads program
translates it at same
7.6.2 One
su ram several
Tega Oe Rein
ifforent form.
E | They are “slower”
jar passes means
 
 
only
time, 8)
  
 
 
Farmore number| They are faster.
‘more execution)
Less effici
 
ie optimization & code
 
tor cod
[gencration..
‘called "wide compiler"
can sean cach & every pot
of program.
[Memory occupied by one paas can be}
id by subsequent pass therefore}
is required by
 
 
It is also called * N; rrow
As It has limited scope. “a
i Reqanay
     
 
 
 
Large memory
compiler.
 
 
 
 
E.g Pascal & C lan;
|- pass compilation: #8 Weng
 
 
 
 
tion.
 
      
 
mpilers from existing compiler.
Sometimes, we think “How the first compiler was compiled”?
‘This question is :
‘This question is answered by Boot strapping
= Using Bootstr
translate a more con
rogram,
= For Seottp ing a compiler is characterized
§ ~ Source language it compiles
Te Teveet language it generates
~ Implement
Se ibait tation language that it is written in
guages can be represented
using a T- diagram as
s 7 :
a which can be abbreviated as , SiT
1
Fig. 1.15 -T-diagram
 
hich is used
in turn can handle more complisin
 
by three languages :
 
 
for another macl
  
28
tetroduetion
‘Cross Compiler : A compiler may run on ene machine and produce target code
e. Such a compiler is called as a cross — compiler.“
1. Suppose we write a cross - compiler for a new languages ‘Lin
implementation languaga’S' to generate code for machine 'N’
uN 1
ns on machine M and genorates code for M,
 
ie,
2. Hfanexisting compiler for'S
it is characterized by SuM.
If LsN runs through SuM, wo get a compilor LwN, i.e. compiler from L to
N that runs on M.
  
 
? i.e, LsN + SM = LN
divean also be represented as
 
 
 
= xl
8
 
 
 
 
 
when SsM runs on SsA, SaM will be generated
sm fs
are Al
Ll
La]
 
 
 
 
 
 
 
 
 
 
io SM. SMSSMAN'S Com,
       
 
ie Blisainate common sub
  
‘compiler construction
enerate lexical ay
‘Automata which 2
 
  
   
       
Toole
 
 
 
 
 
a eoanner generatcnt? in| | prosram.
ee varus yrodves ernie: ana Babi aanagement «Keer seerd of each token found by lexical
‘yntax of Programming Language based or Whig | «ART landling : Bagh phase detects & report error.
oleh
oe mast) we Passes
tion Engines ease coftware si  8
¢ > ; 4
3 ; 3
é Fig. 2.4: Transition Diagram for Keywords
 
 
 
¥ Table 2.1:
es Integer codes for different Tokens [Identifier|
eo Tdentifier
Integer. = fixed.
oe des and values wh desig Different Programmers can chose different
uppose, ifidentifier ig the Lexical Analyzer.
is stor i
ved at location 236 in symbol Table, then,
letter
digit not
dette
or digit
   
 
  
return(G,install)
  
 
Aateger Cate, Insta)
ly |
6.236) Parser}. |
Fig 2.5: Transition Diagram for Identifier
ifeonstant
is stored at locati
es S08 than [Constant]
cera
 
 
 
Similarly,ical Anaya at
thon Lt=ULIULZU...... Here * canbe 0,
‘ Lt ={e} UJUCLYU.
& Lt = (e111,
Operations on Languages
If Li = (00, 10} & La = {01, 11}
 
   
 
 
 
 
 
 
i 7 Operation Deseri ____ Example
Union LAU La = (eet of string in Ls] LxUL2 =(00, 10, 01, 11)
& strings in Le
Concatenation _|LiLa= (Set of string in Li[Lile = (0001, 0011, 1001,
followed by strings| ~ 1011}
inl) 5
Kleen closure of la [13 =19 ur, uLzU = &, 00, 10, 1010, 0010,
u 1000, 0000, 600000,
4 :
 
 
  
 
 
 
 
 
 
  
   
 
or Rit (Ret Re) = (Rit Re) + Rs
v= Oy 001000,
Ea
=u... Lj= _ {00, 10, 1010, 0010,
: 1000, 0000,000000,
ora 4-00 01000...)
ig. 2.7 : Transition Di mf
. “nsition Diagram for Relational operator f
SPECI : s
Soon OF TOKENS (RE Rules of Regular Expre:
* To specify tokens, Re 1. is a Regular expression.
. Bula
It provides convenient a a 2. Union of two Regular Expressions Ri and Re.
"Regular Expressions def, Sef notation for ie. Ri+Reor Ril Reis also a Regular Expression.
~ (ansition Diagram, 3. Concatenation of two Regular Expressions Riand Re *
Satna Expressions are def ie. RiReis also a Regular Expression.
UR ina Regular Epreceg vet alphi 4. Closure of Regular Expression R i.e , R* is also a Regular Expression.
Teg expression, ™ then L(R) rep, 5. If Ris Regular Expression then (R) is also a Regular Expression.
e
can be denied by a %84 collection of atrin Algebric Laws
iE iL danguage = 88 Over some fixed alphabet 1. Ril Re=Rel Ri or Ri+Ru=Re+ Rt
then, Lnigg ott etrings of g 2 1a] Ra) = (Ril Ra) Rai Ri ReR)= BRIE |
4. Rie | RY=RiRe | ist i
or Rr (Re+ R= Rife + Ris Ore a :
& eR=Re=R (x \xample 6. Write Regular Expression for 2 = {a, b}
' : neater,» (a) Le=set of strings having atleast
2: Find Regular Expression for following Langua, ten. * (0) ngs ig atleast one occurrence of double letter
Example - sd ‘ns. (a+ 6)* (aa + bb) (a+ b)*
of string.
 
Pee0=1, Rey (b) L=set of strings having double letter at Beginning and
bL=1)) gs. (aa bb) (a+ 8)* (aa+ bb)
8 (©) L=set of string having double letter at beginning or on ending of string.
ns. (aa + bb) (a+ B)* + (a +b)* (aa + Bb) + (aa + bb) (a + b)* (aa + bb
:mple 7. Write regular expression for set of words having a, ¢, i
he order although not necessarity consecutively.
}) ans. Ifa = ((b—d] + [f—h] + Gn) + [pa] + (v —z] + Blankspace)*
 
(b= (4,11, 111,
(Hint: *=0, 1,2
Ans. It
() L=(e, 1, 1M, 1D
Ans.(11)¢
@_ L=Seofoll strings of sand 1'¢= 6,0, 1,01, 11,00, 000, 19)
 
 
 
 
 
‘Ans. (0# 1)" or (011)*
(@) L=Setof all strings of 0's and 1's ending with 11, ++ Required Regular Expression will be
Ans. (0+ 1)*11 aaaeaiaoaua
 
(© L=Set ofall strings of O's and 1's be i .
‘Ans.0(0+1)*1 ei 'th Oand Ending win, ) 14. RECOGNITION OF TOKEN (FINITE AUTOMATA)
Regular Expression are used to represent various type of Tokens. From these
ng language over x= j, , teevlar Expression, Finite Automata is generated. Finite Automata is basical
=H) ther name of Lexical Analyzer which checks or recognize whether a string is
 
 
Example 3 : Write Regular Expressions for follo
(0) Strings of length zero or one.
 
 
 
  
To Token or not.
Strings of length two, Finite Automata:
Ans.aa |0b | ba bb or (aa++ab+ ba + bby + It is a Machine or Recognizer for a language that is used to check
© Strings of Been length, ” vhether string is accepted by a language or not.
Ans (calablbal68)* oF (00 +0b-+ + Itgives answer "Yes" if string is accepted else “no”.
© Ssofallsringsofa'sand bsharne + InFinite Automata, Finite means finite number of states & Automata
. means, the Automatic Machine which works without any interference of
 
havi
ing atleast two occurrences of aa,
human being.
Ans. (0 +.8)* aa (
2 (0+ 8) aa (a+ bye
Representation of Finite Automata :There are 2 ways to represent finite
Automata.
1. Transition Diagram : It is a directed graph or flow chart having states
& edges.
Final State
Initial State
0 + Initial state
0, 1, 2+ States
a,b Input symbol
jon Table :
2+ Final StateFinite Automata can be Represented by 5 tuple (Q, »
Qisfinite non-empty set of states.
@
+ 90,8)
 
(6) Sis finite set of input symbols -
(0 dis transition function
@ qe Qisinitial state.
(© FE Qisthe set offinal states.
Example 8: Design Finite Aulomata which accepts string “abby
 
Ans.
States Q= (4s, a1, 42, G3)
Input symbols Z={a, b)
‘Transition Function 3 8 (go, a) =
+) = 9 8 (41, 8) = G2, 5(qs, b) =,
Initial Stave % Gee
Final State (F) = tab
24.4 Types of Finite Automata
(@) Deterministic Finite Automata (DFA)
(6) Non-Deterministic Fi
nite Automata (NI
(@) Deterministic Fin; ee
ae oat ite Automata (DFA) : “Deterministic”
hs ane and onl; i tna
enn 'Y one state to which the automata can bit
a an a ‘S-tuple, 30, 0, F)
(2) Qis finite ‘Ron-empty set of states
 
cat Anaysis 38
Neat Anatly 8
(6) Non-Deterministic Finite Automata (NFA) : “Non-Deterministic’
rns there can be seyeral possible transitions. So, output is non-deterministic for a
 
iven input.
NFA is 5-tuple (@, 2, 6, go, F) where
 
 
 
 
 
(@) Qis finite non-empty set of states.
(®) Dis finite set of input symbols.
(©) dis transition function to move from current state to next state.
5: QxE+2e
(@ q€ Qis initial state
(© FC Qisset of final states,
+ Difference Between DFA and NFA
DFA NFA
7. Every transition from one state | There can be multiple transitions for
to other is unique & | an input ie. non-deterministic.
deterministic in nature.
2, Null transitions (¢) are not | Null Transitions (¢) are allowed means
allowed. transition form current state to next
state without any input.
3. ‘Transition function ¢ | Transition functic ye
6:QxE+Q 7 8:Qx E> 20,
4. Requires less memory as | Requires more memory.
transitions & states are less
5. EDFA EgNFA
Infig. In fig.
Each State has unique input symbol | As, there are multiple transitions of
E= (a, b} outgoing from it. | input symbol ‘a’ on state 0. & no
transition from state 2, ie non-
 
 
 
 
 
2.5 CONVERSION OF REGULAR EXPRESSION TO FINITE AUTOMATA
(NFA)‘A Regular i
Recognize @ token, We ae ae
Bac (NFA). So We BB CONT Regular Expression into NP4. ae
igerthm fr eopversion OF 1? NFA: eam
oular Expression R ‘Ans.
language denoted by R
L
ISHaNs
Expression i8 basically @ representation of &
‘oken Recognizer which is nog
 
 
    
"A accepting
         
 
Fore, NFAis.
Fora, NFAis
Fora +b, or alb NFA is
Forab, NFAis
Porat, NFA is
  
Toke, s
ite, NY ca Anais
hing by yaa
yple 10: Draw NFA for Regular Expression a (a+b)* ab
 
Example 11: Draw NFA for atb+ ab
Ans.
Ans.transitions
1 ish
a sts Eo theme
pete
be repeated, ut
Alort:etosare(D)
‘re otanes hse nuit be ound
1 Pus Allstate Pou stack.
areas. it
psure (54
enclosure) 66)
all states are covered.
  
 
 
forcachsttet,withedges—t
a
‘
6
6 {Uf tisnot present in e-closure (1)
1 i-
(e-closue (1)=e-losure (T) U(t)
   
tele) =
Example thle
7 (O,eclaare( ecoare orf
 
e-ctosure @)= 6)
pFAare
Algorit
 
2,6 CONVERSION F
‘To construct DPA €9!
collection of states of NFA.
  
Example 16: Draw NFA for
DPA.
‘Ans, NPA for (atb)* abb is
 
  
ROM NFA TO DFA (Subset Consruction Algo)
aivalent to NPA, it should be remembered that states of
ithm NFA -to-DFA:
sA with set of states N=
  
p= {dd
set do unmarked
while there is an
{ sotd marked
{For achinput symbol‘
unmarked state din D’
re isa transition 0%
 
in NBA to which the
d
    
¢ pep
‘Add transition > labeled
setd' unmarked
y
n, convert NFA lo
regular Bxpresion (atbj"abb Then7 wert it into DFA
Tow we can apply algorithm fo con —
-losure 0) lot
210,247) Tey
p={a} 3
‘Apply steps (4-12) of Algo on state A 4 by,
‘The transitions of symbols a, b from state A ‘
A= MAORI 4 TF
vba by ay
Penh + 3 6 B }
ls
1 For Input Symbol a, For Input Symbol»,
1 =63) Tei |
2 Beeelosure (1) Osea
“clos
=eosure (38)) seclosurcgy, 8
= eclosure (8)Ue-closure (8) = (5,67, 3
=18.6,7,1,2, 4)U(8) 4
B=(1,2,3,4, i, “
Bete Is | Cniaseey
1 Poange het
“Ad transormation fom A toBand Atoc’ f
(step 1
 
    
For input symbol
+ Te = 88)
e-closure(Ts) = e-closure (3,8)"
= (1,2,3,4,
 
 
 
 
 
 
 
   
 
For input symbol b
Te = (5,9)
e-closure (Ts)
= e-closure ((5,9))
= e-closure (5)Us-closure (9)
= (6, 6,7, 1,2, 4} U (9}
 
= {1,2,4,5,6,7,9} = D
= D! = (A,B,C)UD}.={8,B,CD)
«Add Transitions from B to B and from B to D
e 7
@--®
[For state O|
Since, Ga ( te ee 6 7}
Fabby t 4 ag
Te ee ie )
For input symbol a For input symbol b
T= (8,8) = 5)
e-closure {3,8}=B | -. e-closure (5) = C
Add Transition from C to B and C toC
»
Q
©+-®
[For state D] :
cs eee cee 4, GB & 7f 9)
ab by le ele bh
To ke
For input symbol a For input symbol b
T. = (3,8) ‘Ty = {5,10}
e-closure (T:) = B e-closure ({5,10})
: = e-closure (6)Ue-closure (10)
= {5, 6, 7, 1, 2, 4,}U{10}
={1, 2, 4,5, 6,7, 10}=E
& »-D’ = (A.B,C,D) U (E} = A.B,.CD.E}Be (ER bet Synh % 1?
day bb $ 4 ab i
tm (ae BY
For input symbol a For input symbol b
T= (88) ‘Te = {5}
e-closure (3,8)) = B e-closure (5) = C
 
‘Add Transition from E to B and E to C
®
 
   
‘Therefore states
Dagan om fa of DEA wl be Dr
=A, B, ©, D, E}. Joining alll transitos
 
   
* It can also be represented in form of Transition Table as
 
state)
Fi
 
stato)
 
a
B
c
D
®
E contains state 10, which is final state in NFA
be final state in DFA.
 
ING NUMBER OF STATES OF DFA
_-Minimizing means Reducing the number of states in DFA. We have to detect
those states of DFA whose presence or absence in DFA does not affect language
accepted by DFA. These States can be
‘Algorithm :
Input: DFA D1 with sot of states Q with set of final states F.
Output: DFA D2 which accepts same language as D1 and having minimum
 
 
 
 
no. of states as possible.
 
Method
(2) Make a partition ‘1' of set of states with two subsets :
(@ Final state
(Non Final states ‘@-F” ;
2=(F,Q-F) :
) Apply following procedure to make tow from .
For each set S of 2.
Partition S into Subsets such that two states p & q of S are in same
subset of S iff for each input symbol 'a' states p & q have transitions
to states in same set of x. Replace S in zrmw by set of subsets formed,
If stoew = 2, Let musi = 2 & continue with step 4. Else repeat step 2 for
 
 
@
= Tom,
Choose one state from each set of east as representative o
‘These statesjwill be states of Minimized DFA D2.
@ f that set.Ans.
Make a Transition Table
 
inal state) G) |B
2 Makea patton of set of states i, x = (F, QF)
(E}, {A, B, C, Dj}
mn
 
3(¢) For input symbol a, on (A, B, C, D} of z0
c—t4p| A! Bele in same set (4, B,C, D} of xo
D4)
 
 
  
"4, (6) For input a, on (A, B, C} of
o
@
—  g-8 ap]
®
 
 
A.B)
B—2+B} All Bs lic in same set of x1
C+ +B
For input bon A,B, C)ofm \
ate c
 
io in same set (A,B, C} of,
Lio in other set (D) of 7,
1 will be split into {A, C} and {B}
  
For input a, on {A, 0} of a2
lie in same set of
CB.
For input 6, on {A, C} of xz
acl
| Lie in same set of 12
   
‘There will be 4 states of Minimized DFA corresponding to 6 states of
‘given-DFA.
trai
 
  
+. (AO) can be renamed as Al
. B
: D ;
. E
4 States of Minimized DFA ie. Al, B, D,E will be joined by seeing the
 
ns from the given DFA Table .
:: Minimized or Reduced Automata will beAns. 1. Make a Transition Table,
(iti state)
   
 
       
FW For inputb, on (1,2,8.4) of 0,
»
—
Piacente mnt
a—_h—s Lie in same set {5} of
abe
 
, 2, 3, 4} will-be split into {1, 3} and 2,4)
1, 3}, 2 4}
som
+4, (@)For input symbol a on {1,3} of
1—2+2 | Liein same set (2, 4} of
ate 4!
Similarly for input symbol a on (2,4) of a
2+ 3 Lie in same set (1, 3} of m1
4243
(&) For input symbol b on {1,3} of m1
it 3 Lie in same set{1, 3} of m1
ats
Similarly for input symbol b on {2,4} of 1
at * Lie in same set {5} of 1
‘ S
4} will not be splitted.aid eee a
\ Gaitislotate) 39 24
24
inalstate) = ©
      
(@) Ausiliary Definitions : It denotes the Regular Expressions of the form
 
 
 
 
  
 
 
2 LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS 2 . (Dy = By
i D, = R, :
stave we ever tought which language OF CO4Te® Program te be Bs
produce lexical Analyzer. LEX source program represents the Jac," i Distinet Names}? {Regular Expressions
pecfcation of lexical Analyzer Nguage 3 7 i
LEK: j a = Re
*  iuie-a tool or sf which automatically generate lexical :
‘utomata) ~ Analg ct Names (Di) + Shortcut name of Regular Expression
‘It ibe ai input a LEX source program and produces lo «Regular Expression (Rj + Notation to represent eallection of MPM symbole
asits output ical
+ And then , Lexical Analyzer will convert the input tri eo horns iti i
Gis wien: ine entered ‘Auxiliary Definition for Identifiers +
Lx Di Lotter =A R
Seae—>| LEX \{—> Lesical De digit = 0111) Re
isa Analyzer Ds |. identifier =letter Getter | digi 1 Re
. ‘Auxiliary Definition for
Input __.| Lexical 1 igit
String —"] Analyzer [7 Tokens
 
 
i Fig, 2.8: LEX T signedinteger= sif integer
LEX Source ot ‘ool ignedinteger= sign intege)
ee . iliars finition for Dee ymber
Lexical Analyonr im : It is a language used for specifying or represen an Lena meoravenine _ .
eximel = signedinteger .inteee
+ = Auxiliary Definition for Exponential ‘Numbers
‘Exponential - No= (decimal {signedinteges) E pignedintewer,
+ Auxiliary Definition for Real Number: r
Real-No= decimal | ‘Exponential —
‘Lexical Analyzer
 
Leger
 
LE)
X source program consist of 2 parts :
 
 
 
 
 
 
 
 
 
 
 
“Foss (Translation Rules : It is et ofruls actions which tells ns
what it has to do or what it has to return: to parser on ‘encountering the token.
It consist of ‘statements of form: !
Fae Pi {Actiomi}
as Translation vi Pa (Actions)
 
 
 
 
 
Rules ‘
 
~
Fig. 2.9 : Compan
ompotients of LEX Source P; |
‘rogram
Pa {Actionro — Er ee
   
oe
ion consistis Ream
wit som or Regular Expression HOE of opus
; n names. ae
  
 
 
 
Foray srry res of ode which 218 ex0ted When,
Q
ies specs ext of statements to be executed ype :
etl expresin or pattern Pi matches ‘en, 7
‘Then, NFA‘s for corresponding patterns will be.
Bsample: 7
‘Translation Rules for "Keywords'
begin {return 1)
Patterss | end. {return 2}
pep if {return 3} > Actions
Expressions then {return 4}
clse {return 6}
  
From Table 2.1, we can see, that if Lexical Analyzer is given the
it will recognize the token "begin" & Lexical Analyzer will return 1 as int
the parse.
‘Translation Rules for "Identifiers"
 
+ + A start state ie taken and using
letter (etter +igit)*
connected together to make combined NFA =
  
Ile i
ical Anelyzer is given the token which is ari "identifier", then the Ad
taken by the Lexical analyzer is toi
val 6 an inteper code tothe parse tare the name in symbol tablet
29
IMPLEMENTATION OF A LEXICAL ANALYZER
LEX generates Lexi
itsinput ‘cal Analyzer a ite output by taking LEX pret!
LEX program
is coll
aerating Aton, fPalers (Rogular Expression & #
atterns
teaeriing te tokens to be recog a
recognized by lexical analyze!
Fig. 2.11: Combined NFA
For each
Pattern,
7 SSoreeponding NFA will be designed. «The Final state ofeach NFA show that it has found ta own token P.
UD .‘Shey
Toabined NFA to DPA. As itis alivayg g
Convert tht ‘ha program. Ry tg)
‘A with & PI
ai ea which token we have found, 1f nop, w
Tae Mies any final states of NFA then contro} me a y
+5 Ny
condition includes more than one final y
ate of DFA inclu 2 state opyp
: ee stipe pattern coming fret in the Translation ruje, wa ty
bee,
AUXILARY DEFINITIONS a
 
Example 19: Concert the following LEX program into Lexical 4,
‘Auomata)-
TRANSLATION RULES
th
abb {}
art}
Hint;
Ans.1. Convert the patterns into NFA's
mea
 
  
a
Gonvedt NFA to DRA:
A eslosite = (01,3, 7
'he transition on eyrabot,
bole a,b from state A
 
 
th
hRet- 2,
< at oat
Az 9G L
bh bb
gant
e-closure (Ta)
= e-closure (2, 4,7)
=2,4,0=B
ha
at
Bet 2
by
Net
e-closure ()=(7} =D
T=
at
= {8}
by
T= (8)
sceclosure (@)=¢ | e-closure (8) ={8}=C
           
4 7) 2247
at at
a 74
bh bE
= 8) =8
e-closure (Ts) ® :
=eclosuret) GF
=@=C
6 ©
7 eth
at at e
aay ay
by bt
5, 8 } =
e-closure (5, 8)
=16,3)=E
 
1-728
§-9‘zaclosure (1)=(7}=D | e-closure (8) = {8} = C
[HersuseE ] /
Po
 
at at ae
E= ( 5 8 }
a by by
es =f & 8 } =a}
ae -
ure )=$ | e-closure (6,8)) = (6,8} = F
+
ema) 6.6
+ ecclosure (6) =,
; O=6 | econure 6) =(9)= 0
: : ®
®--O
+ Combini
6,840 fal etter NPA Diagram
8, We get complete DFA. Since ss*
States in
NPA havi
tats in NEA having thre tte
states ie, 247, 8, 5
8, 68, 68 are final states
\
‘Tokens Recognized +
\ + maT
 
 
Tokens Recognize |
 
 
State | a [>
ois? | 2aT[ 8 ‘one
oat | 7 | 58 a \
a |e| 8 we
1 \7 | 8 one
ss | o | 8 ab
cs | #| 8 abb
¢ lel? none
 
 
 
 
 
+ 0137+ Nostate in {0,1,3,7) is Final state. ‘Therefore , no token will be
| Recognized by this state,
|, state 2 in these’ states is final eat s
combined NFA. Therefore, 247 will accept &
8 ig Final State in combined NFA. It accepts bY io
state 2 accepts @ in
arf
combined NFA.
ad a_i not final state & there it
. 58+ "yj Final eats but Bis sons
acoopts nothing:
  
jn combined NFA. Therefore
GC) Both states 6 & 8 are final But 6 eccepts ebb and 8
ccopts a” b* in combined NF ‘bb comes before a bin
‘Translation rules given ‘in Question. Therefore, state 68 will
accopt token ab
—_—<_ISHANe
ign Lexical Analyzer for following Lex Protos
 
   
   
‘pxample 20: Desi
‘The combined NFA: for various patterns will be :—
\” ,yxILIARY DEFINITIONS
 
© Ans. 1.
   
\ etter = AIBICI IZ
digit = 0/112 9
\
‘TRANSLATION RULES
begin {return 1}
end {return 2}
Nee {return 3}
x then {return 4}
\ else {return 5}
letter (etter + digit)’ [value = Install Q3
return 6 |
digitt (eee (3
\ return 7 }
5 value = 1 ;]
return 8
= {rae =2;
return 8 }
: SL fraluo =a ;
rss
<>
{ee =43
a
> <
value = 5
E ne return 8 } Fig. 2.12: Combined NFA +
{ value = ¢ i
A
|e
for following language and convert it into DFA using
Le (aa + (bb) c*)
‘Ans. NFA for about language will be:
@—_—-® 7
      
     
Conversion from NFA to DFA
e-closure (0) = (0,1, 4)=4
For input symbol @
. T= 2)
: 2 starting state of DFA.
In combined NF, transions from alate 1 to 2 and fom
same. Because, Input 'b' is a letter.
onto ne)4 For input symbol b | Foy NT ty
Th={} ‘vat,
‘: eclosure (Ts) . ed
zy apy Mu, 12)
Tae ‘ yo
~ +
=o T= ‘Te = {10}
a “.  e-closure (¢) = a e-closure (10)= G
Abs bee ge 4 e-closure or s os ; @=9
t{-, 4 + 1 ~} =
Forinput symbola | For input symbol b | For input [For state D]
ay == (19, teN=neg
a SUAnnne *  eelosure 1 e-closure (Ts) = é-closure (I\) =e-closure (T.)=$=D *
= Be ee = 110, 9, ‘Transition Table and Diagram for DFA will be :—
=F
ees a b c Here
: C={3, 13 (itial State) ATB D A=0,14
uy E D D B={}
Tal-. od De G C= 6, 6,8,9, 11, 12)
D D=9
Tag h=¢ he=¢ é D D
: D D E={3, 12)
closure or $ e-closure Ty =p e-closure (I)=¢ ao a pees
. - f = ), 9, 11, 12}
Resa] as D D G G={10,
CHT & & 9 11, ay 7
Hobbes yy
TH. a, to, Wad
Aste T= 1
felosure (1s) e-closure
= e-dlosure (7) ‘KEY POINTS TO REMEMBER
asia Analyeor:Ttreads the sour Program and eonyg
Input Buffering :To recognize complete token, buffering iy,
red illbbulfered uti its prefix tobe read combined to tay
1 » Regular Expression isa standard notation used to rep ate ais
oftokens. TeSent te i
+ Fite Automate (Transition Diagrams) are used to
"CORN;
  
‘Transition Diagram are flowcharts which consist of
‘Types of Finite Automata
DFA Determinstic Finite Automata)
NFA (Non-Determinstic Finite Automata)
LEX: LEX isa software tool which automat
eee tit is a language used for
‘LEX Program consist of two Parts
Auxiliary Definitions
‘Translation Rules
: tically genera
te Lexi
SPEFing cr Repl
'
  
REVIEW QUESTIONS
ize the number of states of a DFA andy
 
a
8. Defi Ry example, an NFA into an equivalent DFA. Explain)
fine Regular Express
Q4. Explain Pressions and vari
the Various rules t
Sanat ne ‘te of constructing NFA ted construct Regular Eipe
Qs, a mM a regular Expressie!
 
CONTEXT FREE GRAMMAR
 
3.1 INTRODUCTION
Grammar :A Grammar isa set-of rules which checks whether a string
belongs to a particular language a not.
A program consists of a various string of characters . But, every string is not a_
proper or meaningful string. So, to identify valid strings in a language, some rules
should be specified to check whether string is valid or not. These rules are nothing
but actually make a Grammar.
E.g In English language, Grammar checks whether string of characters is
acceptable or not i.e, checks whether nouns, verbs, adverbs ete are in proper
sequence.
 
 
 
 
stor [ Boxtsh ares) -
Geamnier ‘Not Acceptable
 
 
 
Fig. 3.1: Grammar
Context Free Grammar : It is a notation used to specify the syntax of
language. Context free Grammar are used to design parsers.
As lexical analyzer generate a string of tokens which are given to parser to
construct parse tree . But , before constructing parse tree, these tokens will be
grouped so that result of grouping’ will be valid construct of a language.
So, to specify constructs of language, a suitable notation is used which will be precise
& easy to understand. ‘This notation is Context free Grammar.
Formally, Context free Grammar (@) can defined as:
It isa 4-tuple (V,Z, P, S)of Productions oF Set of rules
3. Pisaset sie
sym ;
1 Pea ln etm ng
Gis con! :
e(VUE)
rer: tritedown Grammar [or language
Exar parce
Ans, Let G=W,2P,8)
\ v=)
where vee
P=(S-aS
Soa
}
‘These productions generate the language a*
ie So a
$8 2aS aa or at
S saaaora®
 
S ea8 >
8 2 aS anS saaaS
In Context Free Grammar , variabl
ane » variable or Non-Ter
ide of » (Arrow). These symbols will be expanded titi all tan
generated.
‘The terminal eymbols are basically tokens w:
Example 2: Find out la
G=((S), (0,b1S-+asb,
HS >aSb, $+
Ans. S = ash 2
> anSbb
aaaSdbp
aaaaSbyny
 
Language L= frrfnay}
 
 
018 occur
ial symbols xy
sed in a language,
mnguage generated by a Grammar
 
te 2
ted Free Grammar
{ample 3 : Write down CFG G generating all integers.
 
as.Let G  =(,3,P,S)
where ‘V =(S, ,,}
E=(0,1,2,3, e008 +.)
P ={ S + 
 > + |-
 -+  |
igit> + 0111218..roe 19
For e.g, the derivation for ~ 12 is given by :
S>  
= ~
=~ 
= -l > -12.
 
\ ample 4: Write a Contest free Grammar for palindrome Language
L= {wew"|w € (a,b)"} having middle symbol c. Here R means reverse of string.
ns. Language represents a palindrome with a middle symbol 'c!
Grammar _G =(V, 3, P, S) will have productions,
S+aSa
S+bSb
S+c
For e.g. To generate Palindrome abc a, derivation is
S= aSa
= abSba
= abeba
ixample 5 : Write CFG for language which generates palindrome of binary numbers.
ms, #0801181
S+o0llle
For e.g. To generate string 0110, derivation is
S=0S0
= 01810
= 01610
= 0110pian w
acing a given string’s non-terminal
2
2 DERIVATIONS r
Y Tight
 
     
\fre0 Grammar
 
n
 
 
 
 
ee rep ful
erivation means ef application, of rules that =
: aie rae, The et aartne symbol is called derivatioy Produces RS GPARSE ‘TREES (DERIVATION TREES)
sagt emia! ering, EREOnIng TD tart Sol by \\ parse Tree for a CFG G = (V,,P, 8) is tree eatisying following conditions
replacing a variable Y SO guage is called context Free lang, Set of: Root has label S
‘pale can deriv ne Z Buage, “leg, Every vertex has label which can be a variable (V) , terminal (E) or e.
Darian a ee \. A+ CC production, then C1, C2 ....:4Ca are children of
‘Types o! : A dusivation’A Sw i node labeled A.
vation : A derivation A =} w is called j :
@ papal eae w only to the leftmost variable a etter Leaf Node’ are terminals (2) & Interior nodes are variables (V).
we spy nr tuner of derivations, VEIY ston, Wigld ied of Derivation Tree is the concatenation ofthe labels ofthe leaves
(b). Rightmost Adorivation A > w is a rightmog, a to right ordering. .
aple 7: If CFG has productions.
  
yroduction ightmost variable at eve;
esoply Pied Canonical Derivation TY step,
Example 6: Let G be a OFG with productions.
S+AA :
AvaB
Bob
Bes Find (1) Leftmost (2) Rightmost derivatioy
Ans.: (1) Leftmost Derivation:
 
for string ab
s Saa
3 aBA
3 ava
3 abap
3 abab
(2) Rightmost Derivation;
8 3S aa
3 aap
3 Aad
3 apap
3 abab
S>aAS\d
A +SbA|SS| ba
Show that $ > aa bb aa & construct parse tree whose yield is aa bb aa.
tn
s3 aas
= agbas
= aabAS
= aabbaS
= aabbaa
» Se aabbas
Derivation Tree :
 
Yield = Left to Right ordering of leaves. = aa bb aa
mple 8 : Consider the CFG
S > dBlaA
aad
 
Ab
B +a|aS|bBB E
Find (a) Leftmost (b) Rightmost derivation for string b aa baba. Also find
 
vation Trees.ation
 
NN
ass ati Deri oN
ge0B
2 bOBB
bbe
abbas
abbaabB
abhaabaS
boas va \
co bban bab ;
co bbaababe “N
 
Derivation Tre for Rightinostp,
ti
/\
my
 
 
B
MAN
S,
uta WA \,
/
JN
Frame Cie gonnar \
Solna
a Is
nt nd i
8 (eleanor:
‘,
 
 
Derivation Tree for hing et Fe Canna
eo (as(aa)
efimost Derivation
        
 
Rightmost Derivation
83
2 OS)
 
Rightmost Derivation
83ee rn .
DB der Grammar given below : sy
 
 
 
 
Bxample10 Cons! "po BHE |BeE lid
sehtmast Derivation for string. / ext Five Grammer 5
1p Leto (2) RighimostF
dente idtid+id 7. j. For string id + id « id, there exits two parse trees.
‘Ans. (1) Leftmost Derivation: ‘ EE E+E
ia 4 | eid+E
= EtE+E AA if Ogee
eit E+E ¥ IN | wid +id«E
eidtid+E fo 7 Pow eidtideid
eid+id tid fa |
a
ES EE
sE+EsE
AN 2 id+ E+E sf
sid+id*E
7 é
Hee: oid + id id |
2 E+id+id | TA | \ ., same string is gonerated using two different leftmost id ia
eidtid+ia id & { \ srivations. Bach having a different parse treo.
| | ‘Two different parse trees exist for string id + id + id.
33.1 Ambiguity ia 4 Grammar is Ambiguous. 7
3.2 Conversion of Ambiguous to Unambiguous Grammar
 
A context free Grammar G is
non-terminal
one derivation tree for a given string, ‘The productions containing more than one occurrence of a given
Said to be ambiguous if there exists more th
| iés right hand side are ambiguous productions.
 
AGrammar that cae Ee. E+E*E
pas rd 7 E
Dustin rth ame eae att one Lettnost Derivation (or Right It contain two E's on RHS
m i ightmon in convert ‘the of Ambiguous productions to Unambiguous
4. Follouing sentence in Eng ambiguous Grammar. 2 aa, a ee or Non serminal t reece sect of theo none
“tn Books selected informa arminal farther down the Parse Tree.
U gives wo diferent men Se” txample 12 : Consider the Grammar
~ In Books, “elected nie E+ E+E |BE| lid
In Boole ett information is given Convert this ambiguous Grammar into unambiguous Grammar.
So thegpalt tele, “information is giver Ans, Stop 1 Introduction a non terminal T term) which eannot be further addition
ee ve Sentence can said as an a. Mf two terms.
ple iguous one ir . -EtT
1. Ve us one in language of English. ARe, a jon E + T. Substitute E
yas, we have production with 'T
step 2:48 port
nonterminal F (factor) which cannot bo g,
yumbers. “ther
_
other
tan of 02
with F to get,
T>TFIF
F@lid
‘Unambiguous Grammar will be
E+E+TIT
T>TeFIF
Fo@ lid
Brample 18, Prove that following grammar is Ambiguous for the sy
2 then s] else s2
 ~ if  then < statement >
cannot
‘Substitute T
 
lif  then  else 
| other statement
Convert it into Unambiguous grammar.
Ans. Given grammar is Ambig in
iguous. Since, there exists t
ee b Wo parse Trees
i Because else condition can belong to any if statement
a
it
ond> then 
|
a '
2s 3
" then = else 
I=
TO  then  else 
! ASN
| |
2 a
In above parse Tree, else belongs to first if.
Conversion to Unambiguous grammar
‘The Unambiguous grammar will be :
 > |
ifcond >thenelse
lother-statement
-rif then
|ifthen
else 
ha
 
Grammar: A set of rules which checks whether string belongs to a particular
language or not.
Context Tree Grammar: It is 4-tuple (V, 3, P, S) where production (P) is of
form Ava where AEV anda € (VU).
KEY POINTS TO REMEMBER
 
 
 
Derivation : Replacing Non-terminal of string by R.H.S of production.
‘Types of Derivation :
(@ _Leftmost Derivation (&) Rightmost Derivation
Parse Trees: A tree for Grammar G = (V, 5, P, S) has following attributes =
+ Root has label S
Every vertex can have label as a variable (V), terminal (©) or e.
IfA+ CC: Ca, then C1, Cz ‘n, are children of node A
‘Ambiguous Grammar :A Grammar having two or more Parse Trees fore
given string.oN
 
78
REVIEW QUESTIONS —~
Q.1. Find (a) Leftmost (6) Rightmost derivation for... .
> 0B/1A “ne O04,
A> 010S|1AA 4
B1/1S|0BB
Q.2, IfG is the grammar S + SbS | a, show that G is a
Q.3. What are Leftmost and Rightmost derivations,
Q.4. Consider the grammar
S+aB|bA
A-a&S | bAAla
B->bS | aBBIb
For the string aaabbabbba, find
(a) The Leftmost derivation
(6) Rightmost definition
(© Parse Tree
Q.5. Consider one grammar
S-@Qla
L-LS/S _,
Find parse tree for following 1
@- @a)
@. @@,a))
Q.6. For the given grammar Ze
S+aS/SBld
B~ Bbic
Show grammar is ambiguous for string “aadccb”.
Mbiguoys