0% found this document useful (0 votes)
128 views33 pages

Lex and Yacc

Lex and Yacc are tools for generating lexical analyzers and parsers. Lex takes regular expressions defining tokens as input and generates code for a lexical analyzer. It returns successive tokens to the parser. Yacc takes a context-free grammar and actions as input and generates a parser. It works with the lexer to build a parse tree. Together, Lex and Yacc can be used to build the "front-end" of compilers by breaking input text into tokens then parsing the tokens according to a grammar. They allow defining languages concisely for many applications.

Uploaded by

Boris Strugatsky
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
128 views33 pages

Lex and Yacc

Lex and Yacc are tools for generating lexical analyzers and parsers. Lex takes regular expressions defining tokens as input and generates code for a lexical analyzer. It returns successive tokens to the parser. Yacc takes a context-free grammar and actions as input and generates a parser. It works with the lexer to build a parse tree. Together, Lex and Yacc can be used to build the "front-end" of compilers by breaking input text into tokens then parsing the tokens according to a grammar. They allow defining languages concisely for many applications.

Uploaded by

Boris Strugatsky
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

Lex and Yacc

A Quick Tour
HW8Use Lex/Yacc to
Turn this: Into this:
<P> Here's a list:
Here's a list: * This is item one of a list
<UL> * This is item two. Lists should be
<LI> This is item one of a list indented four spaces, with each item
<LI>This is item two. Lists should be marked by a "*" two spaces left of
indented four spaces, with each item four-space margin. Lists may contain
marked by a "*" two spaces left of four- nested lists, like this:
space margin. Lists may contain * Hi, I'm item one of an inner list.
nested lists, like this:<UL><LI> Hi, I'm * Me two.
item one of an inner list. <LI>Me two. * Item 3, inner.
<LI> Item 3, inner. </UL><LI> Item 3, * Item 3, outer list.
outer list.</UL> This is outside both lists; should be back
This is outside both lists; should be to no indent.
back to no indent.
<P><P> Final suggestions:
Final suggestions

2
if myVar == 6.02e23**2 then f( .. !
char stream
LEX token stream

if myVar == 6.02e23**2 then f( !


tokenstream
YACC
parse tree

if-stmt

== fun call

var ** Arg 1 Arg 2

float-lit int-lit ... !


3
Lex / Yacc History
Origin early 1970s at Bell Labs
Many versions & many similar tools
Lex, flex, jflex, posix,
Yacc, bison, byacc, CUP, posix,
Targets C, C++, C#, Python, Ruby, ML,
Well use jflex & byacc/j, targeting java
(but for simplicity, I usually just say lex/yacc)

4
Uses
Front end of many real compilers
E.g., gcc
Little languages:
Many special purpose utilities evolve some
clumsy, ad hoc, syntax
Often easier, simpler, cleaner and more
flexible to use lex/yacc or similar tools from
the start

5
Lex:
A Lexical Analyzer Generator
Input:
Regular exprs defining "tokens"
my.flex
Fragments of declarations & code
Output:
A java program yylex.java
jflex
Use: yylex.java
Compile & link with your main()
Calls to yylex() read chars & return
successive tokens.

7
yacc:
A Parser Generator
Input:
my.y
A context-free grammar
Fragments of declarations & code
Output: byaccj
A java program & some header files
Use: ParserVal.java

Compile & link it with your main()


Call yyparse() to parse the entire input Parser.java
yyparse() calls yylex() to get successive tokens

9
Lex Input: "mylexer.flex"
// java stuff
%: Lex %%
section %byaccj Declarations & code: most
delims
%{ copied verbatim to java pgm
public foo()
%}
%% Token code
Rules/ [a-zA-Z]+ {foo(); return(42); }
regexps [ \t\n] {; /* skip whitespace */}
+

{Actions}
No action
11
Lex Regular Expressions

Letters & numbers match themselves

Ditto \n, \t, \r

Punctuation often has special meaning

But can be escaped: \* matches *

Union, Concatenation and Star

r|s, rs, r*; also r+, r?; parens for grouping

Character groups

[ab*c] == [*cab], [a-z2648AEIOU], [^abc]

^ for not only in char groups, not complementation

12
SE
E E+n | E-n | n

Yacc Input: expr.y


%{
Java decls import java.io.*; Parser.java
%}
Yacc decls %token NUM VAR Parser.java
%%
stmt: exp { printf(%d\n,$1);}
;
Rules exp : exp + NUM { $$ = $1 + $3; }
and | exp - NUM { $$ = $1 - $3; }
{Actions}
| NUM { $$ = $1; }
; C code; java ex later
%%
Java code public static void main( Parser.java
14
Expression lexer: expr.l
%{ y.tab.h:
#include "y.tab.h" #define NUM 258
#define VAR 259
#define YYSTYPE int
%} extern YYSTYPE yylval;
%%
[0-9]+ { yylval = atoi(yytext); return NUM;}
[ \t] { /* ignore whitespace */ }
\n { return 0; /* logical EOF */ }
. { return yytext[0]; /* +-*, etc. */ }
%%
yyerror(char *msg){printf("%s,%s\n",msg,yytext);}
int yywrap(){return 1;}

15
Lex/Yacc Interface:
Compile Time
my.y my.flex more.java

byaccj jflex

Parser.java ParserVal.java Yylex.java

javac

Parser.class

17
Lex/Yacc Interface:
Run Time
main()

yyparse()
Token code

yylex() yylval
Myaction:
... Token value
yylval = ...
...
return(code)
18
Parser Value class
public class ParserVal //then do !
{
yylval = new ParserVal(3.14);
public int ival;
public double dval; yylval = new ParserVal(42);
public String sval; // ...or something like...
public Object obj; yylval = new ParserVal(new
public ParserVal(int val) myTypeOfObject());
{ ival=val; }
public ParserVal(double val)
{ dval=val; }
public ParserVal(String val) // in yacc actions, e.g.:!
{ sval=val;} $$.ival = $1.ival + $2.ival;
public ParserVal(Object val) $$.dval = $1.dval - $2.dval;!
{ obj=val; }
}//end class!

20
More Yacc Declarations

Token %token BHTML BHEAD BTITLE BBODY P BR LI


names & %token EHTML EHEAD ETITLE EBODY
types %token <sval> TEXT

Type of yylval (if any)


Nonterm
%type <obj> page head title
names &
types %type <obj> words list item items

Start sym %start page

22
Calculator example On this &
next 3 slides,
From http://byaccj.sourceforge.net/ some details
may be
%{! missing or
import java.lang.Math;! wrong, but
import java.io.*;! the big
import java.util.StringTokenizer;! picture is OK
%}!
/* YACC Declarations; mainly op prec & assoc */!
%token NUM!
%left '-' '+!
%left '*' '/!
%left NEG /* negation--unary minus */!
%right '^' /* exponentiation */!
/* Grammar follows */!
%%!
... !

25
... !
/* Grammar follows */!
%%!
input: /* empty string */! input is one expression per line;
| input line! output is its value
;!

line: \n!
| exp \n { System.out.println(" + $1.dval + " "); }!
;!

exp: NUM !{ $$ = $1; } !


| exp '+' exp !{ $$ = new ParserVal($1.dval + $3.dval); }!
| exp '-' exp !{ $$ = new ParserVal($1.dval - $3.dval); }!
| exp '*' exp !{ $$ = new ParserVal($1.dval * $3.dval); }!
| exp '/' exp !{ $$ = new ParserVal($1.dval / $3.dval); }!
| '-' exp %prec NEG!{ $$ = new ParserVal(-$2.dval); }!
| exp '^' exp !{ $$=new ParserVal(Math.pow($1.dval, $3.dval));}!
| '(' exp ')' !{ $$ = $2; }!
;!

%%! Ambiguous grammar; prec/assoc decls are a (smart) hack to fix that.
...!
26
%%!
String ins;!
StringTokenizer st;!
void yyerror(String s){!
System.out.println("par:"+s);!
}!
boolean newline;! NOT using lex; barehanded
int yylex(){! lexer with same interface
String s; int tok; Double d; !
if (!st.hasMoreTokens()) !
if (!newline) { !
token code newline=true; !
via return return \n'; //As in classic YACC example !
} else return 0; !
s = st.nextToken(); ! value via yylval
try { !
d = Double.valueOf(s); /*this may fail*/ !
yylval = new ParserVal(d.doubleValue()); !
tok = NUM; } !
catch (Exception e) { ! See slide 20
tok = s.charAt(0);/*if not float, return char*/ !
}
return tok;!
}!
27
void dotest(){!
BufferedReader in = new BufferedReader(new InputStreamReader(System.in)); !
System.out.println("BYACC/J Calculator Demo");!
System.out.println("Note: Since this example uses the StringTokenizer"); !
System.out.println("for simplicity, you will need to separate the items"); !
System.out.println("with spaces, i.e.: '( 3 + 5 ) * 2'"); !
while (true) {
System.out.print("expression:"); !
try { !
ins = in.readLine(); !
} !
catch (Exception e) { } !
st = new StringTokenizer(ins); !
newline=false; !
yyparse(); !
}!
}!

public static void main(String args[]){ !


Parser par = new Parser(false); !
par.dotest();!
}!

28
Lex and Yacc

More Details
# set following 3 lines to the relevant paths on your system
JFLEX = ~ruzzo/src/jflex-1.4.3/jflex-1.4.3/bin/jflex
BYACCJ = ~ruzzo/src/byaccj/yacc.macosx
JAVAC = javac

LEXDEBUG = 0 # set to 1 for token dump Makefile:


# targets: Not required, but convenient

run: Parser.class
java Parser $(LEXDEBUG) test.ratml

Parser.class: Yylex.java Parser.java Makefile test.ratml


$(JAVAC) Parser.java
General form
Yylex.java: jratml.flex
$(JFLEX) jratml.flex A: B C!
(tab) D!
Parser.java: jratml.y
$(BYACCJ) -J jratml.y Means A depends on B &
C and is built by running D
clean:
rm -f *~ *.class *.java

31
Parser states
Not exactly elements of PDAs Q, but similar
A yacc "state" is a set of "dotted rules" rules in G with
a "dot (or _) somewhere in the right hand side.
In a state, "A _" means this rule, up to and
including is consistent with input seen so far; next
terminal in the input must derive from the left end of
some such . E.g., before reading any input, "S _ "
is consistent, for every rule S " (S = start symbol)
Yacc deduces legal shift/goto actions from terminals/
nonterminals following dot; reduce actions from rules
with dot at rightmost end. See examples below

32
state 0!
State Diagram state 2!
$acc : . S $end! $acc : S . $end!
S : . 'a' 'b' C 'd'! S
(partial) S : . 'a' 'e' F 'g'! $end accept!
'a' shift 1!
S goto 2!
0 $accept : S $end!
1 S : 'a' 'b' C 'd'! a $end
2 | 'a' 'e' F 'g'! state 1!
3 C : 'h C! S : 'a' . 'b' C 'd (1)!
4 | 'h'! S : 'a' . 'e' F 'g (2)!
5 F : 'h' F! accept!
6 | 'h'! 'b' shift 3!
'e' shift 4!
h b e
state 5! state 3! state 4!
C : 'h' . C! S : 'a' 'b' . C 'd' (1)! S : 'a' 'e' . F 'g' (2)!
C : 'h' . ! h
'h' shift 5! 'h' shift 5! 'h' shift 7!
'd' reduce 4! C goto 6! F goto 8!
C goto 9!

C C

state 9! state 6! state 10!


C : 'h' C .! S : 'a' 'b' C . 'd' (1)! d S : 'a' 'b' C 'd' . (1)!

. reduce 3! 'd' shift 10! . reduce 1!

33
Yacc Output: state 3! state 7!
S : 'a' 'b' . C 'd' (1)! F : 'h' . F (5)!
Same Example# F : 'h' . (6)!
0 $accept : S $end! 'h' shift 5!
1 S : 'a' 'b' C 'd'! . error! 'h' shift 7!
2 | 'a' 'e' F 'g'! 'g' reduce 6!
3 C : 'h' C! C goto 6!
4 | 'h'! F goto 11!
5 F : 'h' F! state 4! state 8!
6 | 'h'! S : 'a' 'e' . F 'g' (2)! S : 'a' 'e' F . 'g' (2)!

state 0! 'h' shift 7! 'g' shift 12!


$accept : . S $end (0)! . error! . error!
state 9!
'a' shift 1! F goto 8! C : 'h' C . (3)!
. error!
state 5! . reduce 3!
S goto 2!
C : 'h' . C (3)! state 10!
C : 'h' . (4)! S : 'a' 'b' C 'd' . (1)!
state 1!
S : 'a' . 'b' C 'd' (1)! 'h' shift 5! . reduce 1!
S : 'a' . 'e' F 'g' (2)! 'd' reduce 4! state 11!
F : 'h' F . (5)!
'b' shift 3! C goto 9!
'e' shift 4! . reduce 5!
. error! state 6! state 12!
state 2! S : 'a' 'b' C . 'd' (1)! S : 'a' 'e' F 'g' . (2)!
$accept : S . $end (0)!
'd' shift 10! . reduce 2!
$end accept! . error!
34
Yacc In Action PDA stack: alternates
between "states" and
initially, push state 0 symbols from (V ).
while not done {
let S be the state on top of the stack;
let i in be the next input symbol;
look at the action defined in S for i:
if "accept", halt and accept;
if "error", halt and signal a syntax error;
if "shift to state T", push i then T onto the stack;
if "reduce via rule r (A )", then:
pop exactly 2*|| symbols
(the 1st, 3rd, ... will be states, and
the 2nd, 4th, ... will be the letters of );
let T = the state now exposed on top of the stack;
T's action for A is "goto state U" for some U;
push A, then U onto the stack.
}
Implementation note: given the tables, its deterministic,
and fast -- just table lookups, push/pop. 35
expr: expr '+' term | term ;
Yacc "Parser Table" term: term '*' fact | fact ;
fact: '(' expr ')' | 'A' ;

State Dotted Rules Shift Actions Goto Actions


A + * ( ) $end expr term fact (default)
0 $accept : _expr $end 5 4 1 2 3 error
$accept : expr_$end
1 6 accept error
expr : expr_+ term
expr : term_ (2)
2 7 reduce 2
term : term_* fact
3 term : fact_ (4) reduce 4
4 fact : (_expr ) 5 4 8 2 3 error
5 fact : A_ (6) reduce 6
6 expr : expr +_term 5 4 9 3 error
7 term : term *_fact 5 4 10 error
expr : expr_+ term
8 6 11 error
fact : ( expr_)
expr : expr + term_ (1)
9 7 reduce 1
term : term_* fact
10 term : term * fact_ (3) reduce 3
11 fact : ( expr )_ (5) reduce 5
36
shift/goto # # is a state #
reduce # # is a rule #
A : _ (#) # is this rule #
Yacc Output . default action

state 0 state 1
$accept : _expr $end $accept : expr_$end
expr : expr_+ term
( shift 4
$end accept
A shift 5
+ shift 6
. error
. error
state 2
expr goto 1 expr : term_ (2)
term goto 2 term : term_* fact
fact goto 3
* shift 7
. reduce 2
... 37
Implicit Dotted Rules
state 0 $accept: _ expr $end
$accept : _expr $end expr: _ expr '+ term
expr: _ term
term: _ term '*' fact
( shift 4
term: _ fact
A shift 5
. error fact: _ '(' expr ')'
fact: _ 'A'
expr goto 1
term goto 2
fact goto 3

38
Goto & Lookahead
state 0 $accept: _ expr $end
$accept : _expr $end expr: _ expr '+ term
expr: _ term
term: _ term '*' fact
( shift 4
term: _ fact
A shift 5
. error fact: _ '(' expr ')'
fact: _ 'A'
expr goto 1
term goto 2
fact goto 3

39
using the unambiguous expr: expr '+' term | term ;
expression grammar here term: term '*' fact | fact ;
& parse table on slide 36 fact: '(' expr ')' | 'A' ;

Example: input "A + A $end"


Action: Stack: Input:
0 A + A $end
shift 5
0 A 5 + A $end
reduce fact A, go 3
state 5 says reduce rule 6 on +; state 0
(exposed on pop) says goto 3 on fact 0 fact 3 + A $end
reduce fact term, go 2
0 term 2 + A $end
reduce expr term, go 1
0 expr 1 + A $end
shift 6
40
Action: Stack: Input:

shift 6
0 expr 1 + 6 A $end
shift 5
0 expr 1 + 6 A 5 $end
reduce fact A, go 3
0 expr 1 + 6 fact 3 $end
reduce term fact, go 9
0 expr 1 + 6 term 9 $end
reduce expr expr + term, go 1
0 expr 1 $end
accept

41
An Error Case: "A ) $end":
Action: Stack: Input:
0 A ) $end
shift 5
0 A 5 ) $end
reduce fact A, go 3
0 fact 3 ) $end
reduce fact term, go 2
0 term 2 ) $end
reduce expr term, go 1
0 expr 1 ) $end
error
42
Q: Do you have any advice for
up-and-coming programmers?
A: ... One more piece of advice
take a theoretician to lunch...

From the end of a 2008 interview with


Steve Johnson, creator of yacc
http://www.techworld.com.au/article/252319/a-z_programming_languages_yacc

44

You might also like