The document discusses the process of lexing in compiler design, detailing the role of a lexical analyzer in converting a character stream into a token stream. It covers concepts such as input buffering, regular expressions, error handling, and the importance of context in lexing. Additionally, it introduces the Knuth-Morris-Pratt algorithm for efficient string matching within the lexing process.

Lexing

Rupesh Nasre.

CS3300 Compiler Design

IIT Madras
August 2020
Phases of a compiler (figure):

Frontend
Character stream → Lexical Analyzer → Token stream → Syntax Analyzer → Syntax tree → Semantic Analyzer → Syntax tree → Intermediate Code Generator → Intermediate representation

Backend
Intermediate representation → Machine-Independent Code Optimizer → Intermediate representation → Code Generator → Target machine code → Machine-Dependent Code Optimizer → Target machine code

Symbol Table (shared by the phases)

Lexing Summary
●
Basic lex
●
Input Buffering
●
KMP String Matching
●
Regex → NFA → DFA
●
Regex → DFA

Role
●
Read input characters
●
Group into words (lexemes)
●
Return sequence of tokens
●
Sometimes
– Eat-up whitespace
– Remove comments
– Maintain line number information

Token, Pattern, Lexeme

Token       Pattern                            Sample lexemes
if          Characters i, f                    if
comparison  <= or >= or < or > or == or !=     <=, !=
identifier  letter (letter + digit)*           pi, score, D2
number      Any numeric constant               3.14159, 0, 6.02e23
literal     Anything but ", surrounded by ""   "core dumped"

The following classes cover most or all of the tokens

●
One token for each keyword
●
Tokens for the operators, individually or in classes
●
Token for identifiers
●
One or more tokens for constants
●
One token each for punctuation symbols
Representing Patterns
●
Keywords can be directly represented (break, int).
●
So can punctuation symbols ({, +).
●
Others are finite, but too many!
– Numbers
– Identifiers
– They are better represented using a regular expression.
– [a-z][a-z0-9]*, [0-9]+
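
The two patterns above can be tried out directly. The following is a minimal sketch, assuming a POSIX system that provides <regex.h>; the helper names matches, is_identifier and is_number are made up for illustration and are not part of lex. The patterns are anchored with ^ and $ so that the whole lexeme must match.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if the whole lexeme matches the POSIX extended regex, else 0. */
static int matches(const char *pattern, const char *lexeme)
{
    regex_t re;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
    assert(rc == 0);            /* the patterns used below are fixed and valid */
    (void)rc;
    int ok = (regexec(&re, lexeme, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* The slide's patterns, anchored so that the entire lexeme must match. */
int is_identifier(const char *lexeme)
{
    return matches("^[a-z][a-z0-9]*$", lexeme);
}

int is_number(const char *lexeme)
{
    return matches("^[0-9]+$", lexeme);
}
```

For instance, is_identifier("d2") holds while is_identifier("2d") does not, because the first character must be a letter.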

Classwork: Regex Recap
●
If L is the set of letters (A-Z, a-z) and D is the set
of digits (0-9),
– Find the size of the language LD.
– Find the size of the language L ∪ D.
– Find the size of the language L^4.
●
Write regex for real numbers
– Without eE, without +- in mantissa (1.89)
– Without eE, with +- in mantissa (-1.89)
– With eE, with -+ in exponent (-1.89E-4)
Classwork
●
Write regex for strings over alphabet {a, b} that
start and end with a.
●
Strings with third last letter as a.
●
Strings with exactly three bs.
●
Strings with even length.
●
Homework
– Exercises 3.3.6 from ALSU.

Example Lex

    /* variables */
    [a-z]        {
                     yylval = *yytext - 'a';
                     return VARIABLE;
                 }

    /* integers */
    [0-9]+       {
                     yylval = atoi(yytext);
                     return INTEGER;
                 }

    /* operators */
    [-+()=/*\n]  { return *yytext; }

    /* skip whitespace */
    [ \t]        ;

    /* anything else is an error */
    .            yyerror("invalid character");

(Callouts in the slide: the regular expressions are the Patterns, VARIABLE and INTEGER are the Tokens, and yytext holds the Lexemes.)
a1.l → lex → lex.yy.c
a1.y → yacc → y.tab.c, y.tab.h
lex.yy.c, y.tab.c, y.tab.h → gcc → a.out

a.out is your compiler. Lexer and parser are not separate binaries;
they are part of the same executable.
Lex Regex

Expression  Matches                                 Example
c           Character c                             a
\c          Character c literally                   \*
"s"         String s literally                      "**"
.           Any character but newline               a.*b
^           Beginning of a line                     ^abc
$           End of a line                           abc$
[s]         Any one of the characters in string s   [abc]
[^s]        Any one character not in string s       [^abc]
r*          Zero or more strings matching r         a*
r+          One or more strings matching r          a+
r?          Zero or one r                           a?
r{m,n}      Between m and n occurrences of r        a{1,5}
r1r2        An r1 followed by an r2                 ab
r1 | r2     An r1 or an r2                          a|b
(r)         Same as r                               (a | b)
r1/r2       r1 when followed by r2                  abc/123
Homework
●
Write a lexer to identify special words in a text.
– Words like stewardesses: only one hand
– Words like typewriter: only one keyboard row
– Words like skepticisms: alternate hands
●
Implement grep using lex with search pattern
as alphabetical text (no operators *, ?, ., etc.).

Lexing and Context
●
Language design should ensure that lexing
can be done without context.
●
Your assignments and most languages need
context-insensitive lexing.

DO 5 I = 1.25
DO 5 I = 1,25

●
“DO 5 I” is an identifier in Fortran, as spaces are allowed in identifiers.
●
Thus, the first statement is an assignment, while the second is a loop.
●
The lexer does not know whether to treat the input “DO 5 I” as an identifier
or as part of a loop until the parser informs it, based on the dot or the comma.
●
Alternatively, the lexer may employ a lookahead.
Lexical Errors
●
It is often difficult to report errors for a lexer.
– fi (a == f(x)) ...
– A lexer doesn't know the context of fi. Hence it
cannot “see” the structure of the sentence –
structure is known only to the parser.
– fi = 2; OR fi(a == f(x));
●
But some errors a lexer can catch.
– 23 = @a;
– if $x friendof anil ...

What should a lexer do on catching an error?

Error Handling
●
Multiple options
– exit(1);
– Panic mode recovery: delete enough input to recognize a
token
– Delete one character from the input
– Insert a missing character into the remaining input
– Replace a character by another character
– Transpose two adjacent characters
●
In practice, most lexical errors involve a single character.
●
Theoretical problem: Find the smallest number of
transformations (add, replace, delete) needed to convert the source
program into one that consists only of valid lexemes.
– Too expensive in practice to be worth the effort.
Homework
●
Try exercise 3.1.2 from ALSU.

Input Buffering
●
“We cannot know we were executing a finite
loop until we come out of the loop.”
●
In C, without reading the next character we
cannot determine a binary minus symbol (a-b).
– ->, -=, --, -e, ...
– Sometimes we may have to look several
characters into the future, called lookahead.
– In the Fortran example (DO 5 I), the lookahead
could be up to the dot or comma.
●
Reading character-by-character from disk is
inefficient. Hence buffering is required.
Input Buffering
●
A block of characters is read from disk into a buffer.
●
Lexer maintains two pointers:
– lexemeBegin
– forward

[Figure: a buffer holding E = M * C * * 2 \f, with the lexemeBegin and forward pointers into it]

What is the problem with such a scheme?
Input Buffering
●
The issue arises when the lookahead is
beyond the buffer.
●
When you load the buffer, the previous content
is overwritten!

[Figure: the buffer E = M * C * * 2 \f divided into "Input read" and "Input to be read", with the lexemeBegin and forward pointers]

How do we solve this problem?
Double Buffering
●
Uses two (half) buffers.
●
Assumes that the lookahead would not be
more than one buffer size.

[Figure: the input E = M * C * * 2 \f spread across two half-buffers, Buf1 and Buf2, with the lexemeBegin and forward pointers]

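
The two-half scheme can be sketched as follows. This is only an illustration under stated assumptions: an in-memory string stands in for the file on disk, HALF is a deliberately tiny half-buffer size, and the names Scanner, scanner_init and scanner_next are hypothetical. Only the forward pointer is modelled; lexemeBegin bookkeeping is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HALF 8                    /* size of each half-buffer (tiny, for illustration) */
#define END_OF_INPUT (-1)

typedef struct {
    const char *src;              /* stand-in for the file on disk (assumption) */
    size_t src_len, src_off;      /* total length and how much has been "read" */
    char   buf[2][HALF];          /* Buf1 and Buf2 */
    size_t len[2];                /* valid bytes in each half */
    int    cur;                   /* half that forward currently points into */
    size_t pos;                   /* offset of forward within that half */
} Scanner;

/* Reads the next block of input into one half only; the other half is untouched. */
static void load(Scanner *sc, int half)
{
    size_t left = sc->src_len - sc->src_off;
    size_t n = left < HALF ? left : HALF;
    memcpy(sc->buf[half], sc->src + sc->src_off, n);
    sc->src_off += n;
    sc->len[half] = n;
}

void scanner_init(Scanner *sc, const char *input)
{
    assert(sc != NULL && input != NULL);
    sc->src = input;
    sc->src_len = strlen(input);
    sc->src_off = 0;
    sc->cur = 0;
    sc->pos = 0;
    sc->len[1] = 0;
    load(sc, 0);
}

/* Advances forward by one character; returns END_OF_INPUT when input is exhausted. */
int scanner_next(Scanner *sc)
{
    if (sc->pos == sc->len[sc->cur]) {
        if (sc->len[sc->cur] < HALF)
            return END_OF_INPUT;  /* a short block means there is nothing left */
        sc->cur = 1 - sc->cur;    /* switch halves; the previous half stays intact */
        load(sc, sc->cur);
        sc->pos = 0;
        if (sc->len[sc->cur] == 0)
            return END_OF_INPUT;
    }
    return (unsigned char)sc->buf[sc->cur][sc->pos++];
}
```

Because reloading overwrites only one half, a lexeme that began in the previous half is still available, as long as the lookahead stays within one buffer size.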
Transition Diagrams
●
Step to be taken on each character can be
specified as a state transition diagram.
– Sometimes, action may be associated with a state.

State  Input  Next state  Action in the next state
0      <      1
0      =      4
0      >      7
1      =      2           return(comp, LE);
1      other  3           yyless(1); return(comp, LT);
4      =      5           return(comp, EQ);
4      other  6           yyless(1); return(assign, ASSIGN);
7      =      8           return(comp, GE);
7      other  9           yyless(1); return(comp, GT);
...
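
The diagram translates almost mechanically into code. Below is a hand-written sketch, not the code lex generates; the enum Op and the function scan_relop are hypothetical names. Instead of calling yyless(1), the function reports the lexeme length, so the "other" character is simply never consumed.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LT, LE, EQ, GT, GE, ASSIGN, NONE } Op;

/* Scans one operator at the start of s and stores the lexeme length in *len.
   On an "other" edge the extra character is not counted, mirroring yyless(1). */
Op scan_relop(const char *s, size_t *len)
{
    assert(s != NULL && len != NULL);
    switch (s[0]) {                                    /* state 0 */
    case '<':                                          /* state 1 */
        if (s[1] == '=') { *len = 2; return LE; }      /* state 2 */
        *len = 1; return LT;                           /* state 3 */
    case '=':                                          /* state 4 */
        if (s[1] == '=') { *len = 2; return EQ; }      /* state 5 */
        *len = 1; return ASSIGN;                       /* state 6 */
    case '>':                                          /* state 7 */
        if (s[1] == '=') { *len = 2; return GE; }      /* state 8 */
        *len = 1; return GT;                           /* state 9 */
    default:
        *len = 0; return NONE;                         /* no operator here */
    }
}
```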
Keywords vs. Identifiers
●
Keywords may match identifier pattern
– Keywords: int, const, break, ...
– Identifiers: (alpha | _) (alpha | num | _)*
●
If unaddressed, may lead to strange errors.
– Install keywords a priori in the symbol table.
– Prioritize keywords
●
In lex, the rule for a keyword must precede
that of the identifier.

[Figure: two orderings of the keyword and identifier rules, labeled "Incorrect (lex may give warning)" and "Correct"]

Special vs. General
●
In general, a specialized pattern must precede the
general pattern (associativity).
●
Lex also follows maximum substring matching rule
(precedence).
– Reordering the rules for < and <= would not affect the
functionality.
●
Compare with rule specialization in Prolog.
●
Classwork: Count number of he and she in a text.
●
Classwork: Write lex rules to recognize quoted
strings in C.
– Try to recognize \" inside it.
he and she

Without REJECT          With REJECT
she ++s;                she {++s; REJECT;}
he ++h;                 he {++h;}

What if I want to count all possible substrings he? REJECT retries another rule.

In general, the action associated with a rule may
not be easy / modular to duplicate.

Input: he ahe he she she fsfds fsf fs sfhe he she she she

Without REJECT: he=5, she=5
With REJECT: he=10, she=5

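
The counts for the REJECT variant can be cross-checked outside lex by counting every occurrence of each word, overlapping or not. This is a plain C sketch using strstr; count_occurrences is a hypothetical helper and not something lex provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts every (possibly overlapping) occurrence of word in text. */
int count_occurrences(const char *text, const char *word)
{
    assert(text != NULL && word != NULL && word[0] != '\0');
    int count = 0;
    /* Restart the search one character after each hit so overlaps are seen. */
    for (const char *p = strstr(text, word); p != NULL; p = strstr(p + 1, word))
        ++count;
    return count;
}
```

On the input line above this gives 10 for he and 5 for she, since every she also contains a he.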
By the way...
●
Sometimes, you need not have a parser at all...
– You could define main in your lex file.
– Simply call yylex() from main.
– Compile using lex, then compile lex.yy.c using gcc
and execute a.out.

Lookahead

The world belongs to those who look ahead. (Duniya usi ki hai jo aage dekhe)

Lookahead
●
Lexer needs to look into the future to know
where it is presently.

DO / .* COMMA { return DO; }

Input: DO 5 I = 1,25

●
/ signifies the lookahead symbol. The input is
read and matched, but is left unconsumed in
the current rule.

Corollary: DO loop index and increment must be on the same line;
no arbitrary whitespace allowed.
String Matching
●
Lexical analyzer relies heavily on string
matching.
●
Given a program text T (length n) and a
pattern string s (length m), we want to check if
s occurs in T.
●
A naive algorithm would try all positions of T to
check for s (complexity O(m*n)).
[Figure: the text T of length n above the pattern s of length m]

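
The naive algorithm can be written down in a few lines. This is a baseline sketch for comparison; the name naive_search is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the first index i such that s occurs in T at position i, or -1. */
long naive_search(const char *T, const char *s)
{
    assert(T != NULL && s != NULL);
    size_t n = strlen(T), m = strlen(s);
    for (size_t i = 0; i + m <= n; ++i) {          /* at most n - m + 1 start positions */
        size_t j = 0;
        while (j < m && T[i + j] == s[j])          /* at most m comparisons per position */
            ++j;
        if (j == m)
            return (long)i;
    }
    return -1;
}
```

Each start position may cost up to m comparisons before failing, which is where the O(m*n) bound comes from.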
Where can we do better?
●
T = abababaababbbabbababb
●
s = ababaa

i=0
abababaababbbabbababb
ababaa                   (mismatch at the last character of s)

i=1
abababaababbbabbababb
 ababaa                  (mismatch at the first character of s)

i=2
abababaababbbabbababb
  ababaa                 Match found

Key observation: T's current suffix which is a proper prefix of s
has the treasure for us. At i=0 the matched part of T is ababa, and its
suffix aba is also a proper prefix of s.
Whenever there is a mismatch, we should utilize this overlap,
rather than restarting.
Knuth-Morris-Pratt Algorithm
●
In 1970, Morris conceived the idea.
●
After a few weeks, Knuth independently discovered
the idea.
●
In 1970, Morris and Pratt published a technical report.
●
KMP published the algorithm jointly in 1977.
●
In 1969, Matiyasevich discovered a similar algorithm.

Source: Wikipedia
KMP String Matching
●
First linear time algorithm for string matching.
●
Whenever there is a mismatch, do not restart;
rather fail intelligently.
●
We define a failure function for each position,
taking into account the suffix and the prefix.
●
Note that the matched part of the large string T is
essentially a prefix of the pattern string s. Thus, the failure
function can be computed using the pattern s alone.

abababaababbbabbababb
ababaa
Failure is not final.

Failure function for ababaa

i       1    2    3    4     5      6
f(i)    0    0    1    2     3      1
seen    a    ab   aba  abab  ababa  ababaa
prefix  ϵ    ϵ    a    ab    aba    a

Algorithm given as Figure 3.19 in ALSU.
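
The table can be reproduced with a short routine. The sketch below follows the usual textbook formulation with a 0-indexed C string and a 1-indexed array f; the function name failure_function and the array layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills f[1..n] for pattern b, where n = strlen(b); f[0] is unused.
   f[i] is the length of the longest proper prefix of b[0..i-1]
   that is also a suffix of b[0..i-1]. */
void failure_function(const char *b, int *f)
{
    assert(b != NULL && f != NULL);
    size_t n = strlen(b);
    int t = 0;                          /* length of the current border */
    if (n == 0)
        return;
    f[1] = 0;
    for (size_t s = 1; s < n; ++s) {
        while (t > 0 && b[s] != b[t])   /* shrink the border until it extends */
            t = f[t];
        if (b[s] == b[t])
            ++t;
        f[s + 1] = t;                   /* t is 0 here if nothing matched */
    }
}
```

For ababaa this yields f(1..6) = 0 0 1 2 3 1, matching the table.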

String matching with failure function
Text = a_1 a_2 ... a_m; pattern = b_1 b_2 ... b_n (both indexed from 1)

s = 0
for (i = 1; i <= m; ++i) {                        // go over the text
    while (s > 0 && a_i != b_{s+1}) s = f(s)      // handle failure
    if (a_i == b_{s+1}) ++s                       // character match
    if (s == n) return "yes"                      // full match
}
return "no"

The failure step must be a while loop rather than a single if: after falling
back to f(s), the next pattern character may mismatch again, so the fallback
repeats until a character matches or s reaches 0.

Example: text abababaababbbabbababb, pattern ababaa.

i       1    2    3    4    5    6
f(i)    0    0    1    2    3    1
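
A runnable version of the matcher, as a sketch: it computes the failure function first and then scans the text once. The name kmp_contains is an assumption; indices are 0-based in the C strings, so the pseudocode's b_{s+1} becomes pattern[s].

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if pattern occurs in text, 0 otherwise. */
int kmp_contains(const char *text, const char *pattern)
{
    assert(text != NULL && pattern != NULL);
    size_t m = strlen(text), n = strlen(pattern);
    if (n == 0)
        return 1;                                   /* the empty pattern always occurs */

    /* f[i], 1 <= i <= n: failure function of the pattern. */
    size_t *f = malloc((n + 1) * sizeof *f);
    if (f == NULL)
        abort();
    f[1] = 0;
    for (size_t k = 1, t = 0; k < n; ++k) {
        while (t > 0 && pattern[k] != pattern[t])
            t = f[t];
        if (pattern[k] == pattern[t])
            ++t;
        f[k + 1] = t;
    }

    size_t s = 0;                                   /* pattern characters matched so far */
    int found = 0;
    for (size_t i = 0; i < m && !found; ++i) {      /* go over the text */
        while (s > 0 && text[i] != pattern[s])      /* handle failure */
            s = f[s];
        if (text[i] == pattern[s])                  /* character match */
            ++s;
        if (s == n)                                 /* full match */
            found = 1;
    }
    free(f);
    return found;
}
```

The text index i never moves backwards, which is what gives the linear running time.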
Classwork
●
Find failure function for pattern ababba.
●
Test it on string abababbaa.

●
Fibonacci strings are defined as
– s_1 = b, s_2 = a, s_k = s_{k-1} s_{k-2} for k > 2
– e.g., s_3 = ab, s_4 = aba, s_5 = abaab
●
Find the failure function for s_6.

Fibonacci Strings
– s_1 = b, s_2 = a, s_k = s_{k-1} s_{k-2} for k > 2
– e.g., s_3 = ab, s_4 = aba, s_5 = abaab

●
Do not contain bb or aaa.
●
The words end in ba and ab alternately.
●
Suppressing the last two letters creates a palindrome.
●
...

Source: Wikipedia
KMP Generalization
●
KMP can be used for keyword matching.
●
Aho and Corasick generalized KMP to
recognize any of a set of keywords in a text.
0 -h-> 1 -e-> 2 -r-> 8 -s-> 9
       1 -i-> 6 -s-> 7
0 -s-> 3 -h-> 4 -e-> 5

Transition diagram for keywords he, she, his and hers.

i     1  2  3  4  5  6  7  8  9
f(i)  0  0  0  1  2  0  3  0  3
KMP Generalization
●
When in state i, the failure function f(i) notes
the state corresponding to the longest proper
suffix that is also a prefix of some keyword.
0 -h-> 1 -e-> 2 -r-> 8 -s-> 9
       1 -i-> 6 -s-> 7
0 -s-> 3 -h-> 4 -e-> 5

Transition diagram for keywords he, she, his and hers.

In state 7, character s matches a prefix of the keyword she, so f(7) = 3.

i     1  2  3  4  5  6  7  8  9
f(i)  0  0  0  1  2  0  3  0  3
Regex to DFA
●
Approach 1: Regex → NFA → DFA
●
Approach 2: Regex → DFA
– The ideas would be helpful in parsing too.

Regex → NFA → DFA
Draw an NFA for *cpp

NFA: 0 -Ʃ-> 0, 0 -c-> 1, 1 -p-> 2, 2 -p-> 3
[Figure: the equivalent DFA on states 0 to 3, with extra c and p edges for the fallback transitions]

How does a machine draw an NFA for an arbitrary
regular expression such as ((aa)*b(bb)*(aa)*)* ?
Regex → NFA → DFA
●
For the sake of convenience, let's convert *cpp
into *abb and restrict to alphabet {a, b}.
●
Thus, the regex is (a|b)*abb.
●
How do we create an NFA for (a|b)*abb?

NFA (start state 0, accepting state 10):
0 -ϵ-> 1    0 -ϵ-> 7
1 -ϵ-> 2    1 -ϵ-> 4
2 -a-> 3    4 -b-> 5
3 -ϵ-> 6    5 -ϵ-> 6
6 -ϵ-> 1    6 -ϵ-> 7
7 -a-> 8    8 -b-> 9    9 -b-> 10

Regex → NFA → DFA

State Transition Table
NFA states               DFA state  a  b
{0, 1, 2, 4, 7}          A          B  C
{1, 2, 3, 4, 6, 7, 8}    B          B  D
{1, 2, 4, 5, 6, 7}       C          B  C
{1, 2, 4, 5, 6, 7, 9}    D          B  E
{1, 2, 4, 5, 6, 7, 10}   E          B  C

A is the start state; E, which contains NFA state 10, is accepting.

[Figure: the DFA drawn from this table, with states A to E]

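
The rows of the table come from two operations on sets of NFA states: the ϵ-closure and the move on an input symbol. The sketch below hard-codes the NFA for (a|b)*abb with states 0 to 10 and represents a set of states as a bitmask; the names Set, eps_closure and dfa_move are assumptions made for illustration.

```c
#include <assert.h>

enum { NSTATES = 11 };
typedef unsigned Set;                  /* bit i set <=> NFA state i is in the set */
#define BIT(i) (1u << (i))

/* NFA for (a|b)*abb, states 0..10, as numbered on the slides. */
static const Set eps[NSTATES] = {
    [0] = BIT(1) | BIT(7),
    [1] = BIT(2) | BIT(4),
    [3] = BIT(6),
    [5] = BIT(6),
    [6] = BIT(1) | BIT(7),
};
static const Set on_a[NSTATES] = { [2] = BIT(3), [7] = BIT(8) };
static const Set on_b[NSTATES] = { [4] = BIT(5), [8] = BIT(9), [9] = BIT(10) };

/* All states reachable from S using only epsilon edges (including S itself). */
Set eps_closure(Set S)
{
    Set prev;
    do {
        prev = S;
        for (int i = 0; i < NSTATES; ++i)
            if (S & BIT(i))
                S |= eps[i];
    } while (S != prev);               /* repeat until no new state is added */
    return S;
}

/* One DFA transition: the closure of the states reachable from S on c. */
Set dfa_move(Set S, char c)
{
    assert(c == 'a' || c == 'b');
    const Set *delta = (c == 'a') ? on_a : on_b;
    Set T = 0;
    for (int i = 0; i < NSTATES; ++i)
        if (S & BIT(i))
            T |= delta[i];
    return eps_closure(T);
}
```

Starting from eps_closure of {0} and applying dfa_move repeatedly until no new set appears yields the five states A to E.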
Regex → NFA → DFA

[Figure: for the regex (a|b)*abb, the hand-drawn NFA and DFA on states 0 to 3, the NFA with states 0 to 10 built from the regex, and the DFA with states A to E obtained from it, which is non-minimal]

Regex → DFA
1. Construct a syntax tree for regex#.
2. Compute nullable, firstpos, lastpos, followpos.
3. Construct DFA using transition function.
4. Mark firstpos(root) as start state.
5. Mark states that contain position of # as
accepting states.

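Regex → DFA
●
Regex is (a|b)*abb#.
●
Construct a syntax tree for the regex.
●
Leaves correspond to operands.
●
Interior nodes correspond to operators.
●
Operands constitute strings.

Syntax tree (cat-nodes written as ".", leaf positions in parentheses):

.
├── .
│   ├── .
│   │   ├── .
│   │   │   ├── *
│   │   │   │   └── |
│   │   │   │       ├── a (1)
│   │   │   │       └── b (2)
│   │   │   └── a (3)
│   │   └── b (4)
│   └── b (5)
└── # (6)
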
Functions from Syntax Tree
●
For a syntax tree node n
– nullable(n): true if the subexpression rooted at n can generate ϵ.
– firstpos(n): set of positions that correspond to the
first symbol of strings in n's subtree.
– lastpos(n): set of positions that correspond to the
last symbol of strings in n's subtree.
– followpos(n): set of next possible positions from n
for valid strings.

nullable
●
nullable(n): true if the subexpression rooted at n can generate ϵ.
●
Regex is (a|b)*abb#.

In the syntax tree, only the star-node is nullable (T). The or-node, every cat-node and every leaf are not (F).

Node n                   nullable(n)
leaf labeled ϵ           true
leaf with position i     false
or-node n = c1 | c2      nullable(c1) or nullable(c2)
cat-node n = c1c2        nullable(c1) and nullable(c2)
star-node n = c*         true

Classwork: Write down the rules for firstpos(n).

firstpos
●
firstpos(n): set of positions that correspond
to the first symbol of strings in n's subtree.

Node n                   firstpos(n)
leaf labeled ϵ           {}
leaf with position i     {i}
or-node n = c1 | c2      firstpos(c1) U firstpos(c2)
cat-node n = c1c2        if (nullable(c1)) firstpos(c1) U firstpos(c2)
                         else firstpos(c1)
star-node n = c*         firstpos(c)

Classwork: Write down the rules for lastpos(n).

lastpos
●
lastpos(n): set of positions that correspond
to the last symbol of strings in n's subtree.

Node n                   lastpos(n)
leaf labeled ϵ           {}
leaf with position i     {i}
or-node n = c1 | c2      lastpos(c1) U lastpos(c2)
cat-node n = c1c2        if (nullable(c2)) lastpos(c1) U lastpos(c2)
                         else lastpos(c2)
star-node n = c*         lastpos(c)

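firstpos and lastpos for the syntax tree of (a|b)*abb#

Node                         firstpos   lastpos
leaf a (1)                   {1}        {1}
leaf b (2)                   {2}        {2}
or-node a | b                {1,2}      {1,2}
star-node (a|b)*             {1,2}      {1,2}
leaf a (3)                   {3}        {3}
cat-node (a|b)*a             {1,2,3}    {3}
leaf b (4)                   {4}        {4}
cat-node (a|b)*ab            {1,2,3}    {4}
leaf b (5)                   {5}        {5}
cat-node (a|b)*abb           {1,2,3}    {5}
leaf # (6)                   {6}        {6}
cat-node (a|b)*abb# (root)   {1,2,3}    {6}
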
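
The three rule tables map directly onto recursive functions over the syntax tree. The following is a sketch under stated assumptions: the Node type, the Kind enum and the bitmask representation of position sets are all made up for illustration, and an ϵ leaf is encoded as a leaf with position 0.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { LEAF, OR, CAT, STAR } Kind;
typedef unsigned Set;                  /* bit p set <=> position p is in the set */

typedef struct Node {
    Kind kind;
    int pos;                           /* position of a LEAF; 0 marks an epsilon leaf */
    const struct Node *c1, *c2;        /* children (c2 is unused for STAR) */
} Node;

int nullable(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case LEAF: return n->pos == 0;
    case OR:   return nullable(n->c1) || nullable(n->c2);
    case CAT:  return nullable(n->c1) && nullable(n->c2);
    default:   return 1;               /* STAR */
    }
}

Set firstpos(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case LEAF: return n->pos ? 1u << n->pos : 0;
    case OR:   return firstpos(n->c1) | firstpos(n->c2);
    case CAT:  return nullable(n->c1) ? firstpos(n->c1) | firstpos(n->c2)
                                      : firstpos(n->c1);
    default:   return firstpos(n->c1); /* STAR */
    }
}

Set lastpos(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case LEAF: return n->pos ? 1u << n->pos : 0;
    case OR:   return lastpos(n->c1) | lastpos(n->c2);
    case CAT:  return nullable(n->c2) ? lastpos(n->c1) | lastpos(n->c2)
                                      : lastpos(n->c2);
    default:   return lastpos(n->c1);  /* STAR */
    }
}
```

Building the tree for (a|b)*abb# from six leaves, one or-node, one star-node and four cat-nodes gives firstpos(root) = {1,2,3} and lastpos(root) = {6}, as in the annotated tree.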
followpos
●
followpos(n): set of next possible positions
from n for valid strings.
– If n is a cat-node with child nodes c1 and c2, then
for each position in lastpos(c1), all positions in
firstpos(c2) follow.
– If n is a star-node, then for each position in
lastpos(n), all positions in firstpos(n) follow.

followpos
If n is a cat-node with child nodes c1 and c2, then for each position in
lastpos(c1), all positions in firstpos(c2) follow.

n    followpos(n) after applying the cat-node rule
1    {3}
2    {3}
3    {4}
4    {5}
5    {6}
6    {}

If n is a star-node, then for each position in lastpos(n), all positions in
firstpos(n) follow. The star-node has lastpos = firstpos = {1,2}, so positions
1 and 2 each gain {1, 2}.

n    followpos(n)
1    {3, 1, 2}
2    {3, 1, 2}
3    {4}
4    {5}
5    {6}
6    {}

Regex → DFA
1. Construct a syntax tree for regex#.
2. Compute nullable, firstpos, lastpos, followpos.
3. Construct DFA using transition function (next slide).
4. Mark firstpos(root) as start state.
5. Mark states that contain position of # as
accepting states.

DFA Transitions

create unmarked state firstpos(root).
while there exists unmarked state s {
    mark s
    for each input symbol x {
        uf = U followpos(p) where p is in s labeled x
        transition[s, x] = uf
        if uf is newly created
            unmark uf
    }
}

Example for (a|b)*abb#: firstpos(root) = {1,2,3}, where positions 1, 2, 3 are labeled a, b, a.
From state 123: on a go to 1234, on b stay in 123.

n    followpos(n)
1    {3, 1, 2}
2    {3, 1, 2}
3    {4}
4    {5}
5    {6}
6    {}

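Final DFA

State   a      b
123     1234   123
1234    1234   1235
1235    1234   1236
1236    1234   123

123 is the start state; 1236, which contains the position of #, is accepting.

[Figure: this DFA next to the hand-drawn NFA and DFA on states 0 to 3]
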
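
The construction loop can be run on the followpos table for (a|b)*abb#. In this sketch the position labels and followpos sets are copied from the slides, DFA states are bitmasks of positions, and the names dfa_next and build_dfa are hypothetical; a state is "unmarked" while its index is at or beyond the marked counter.

```c
#include <assert.h>

typedef unsigned Set;                  /* bit p set <=> position p is in the state */
#define P(p) (1u << (p))

/* Data for (a|b)*abb# taken from the slides; index 0 is unused. */
static const char symbol[7] = { 0, 'a', 'b', 'a', 'b', 'b', '#' };
static const Set followpos[7] = {
    0, P(1) | P(2) | P(3), P(1) | P(2) | P(3), P(4), P(5), P(6), 0
};

/* transition[s, x]: union of followpos(p) over positions p in s labeled x. */
Set dfa_next(Set s, char x)
{
    Set uf = 0;
    for (int p = 1; p <= 6; ++p)
        if ((s & P(p)) && symbol[p] == x)
            uf |= followpos[p];
    return uf;
}

/* Builds all states reachable from start over the alphabet {a, b}.
   Returns the number of states stored in states[]. */
int build_dfa(Set start, Set states[], int max_states)
{
    assert(states != 0 && max_states > 0);
    int count = 0, marked = 0;
    states[count++] = start;
    while (marked < count) {           /* states[marked..count-1] are unmarked */
        Set s = states[marked++];      /* mark s */
        for (const char *x = "ab"; *x; ++x) {
            Set uf = dfa_next(s, *x);
            int seen = 0;
            for (int i = 0; i < count; ++i)
                if (states[i] == uf)
                    seen = 1;
            if (!seen) {               /* newly created state stays unmarked */
                assert(count < max_states);
                states[count++] = uf;
            }
        }
    }
    return count;
}
```

Starting from firstpos(root) = {1,2,3}, the loop discovers the four states 123, 1234, 1235 and 1236 of the final DFA.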
Regex → DFA
1. Construct a syntax tree for regex#.
2. Compute nullable, firstpos, lastpos, followpos.
3. Construct DFA using transition function.
4. Mark firstpos(root) as start state.
5. Mark states that contain position of # as
accepting states.

Do this for (b|ab)*(aa|b)*.

In case you are wondering...
●
What to do with this DFA?
– Recognize strings during lexical analysis.
– Could be used in utilities such as grep.
– Could be used in regex libraries as supported in
php, python, perl, ....

Lexing Summary
●
Basic lex
●
Input Buffering
●
KMP String Matching
●
Regex → NFA → DFA
●
Regex → DFA