0% found this document useful (0 votes)
171 views54 pages

Bison Parser Generator Guide

Bison is a general-purpose parser generator that takes a grammar description for an LALR(1) context-free grammar and converts it into a C program to parse that grammar. It is upward compatible with Yacc. Using Bison requires fluency in C programming. The Bison process involves specifying a grammar, writing actions in C, generating a parser from the grammar file with Bison, compiling the generated C code, and linking to produce an executable.

Uploaded by

tpoj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
171 views54 pages

Bison Parser Generator Guide

Bison is a general-purpose parser generator that takes a grammar description for an LALR(1) context-free grammar and converts it into a C program to parse that grammar. It is upward compatible with Yacc. Using Bison requires fluency in C programming. The Bison process involves specifying a grammar, writing actions in C, generating a parser from the grammar file with Bison, compiling the generated C code, and linking to produce an executable.

Uploaded by

tpoj
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 54

YACC/Bison

1
Bison
• Bison is a general-purpose parser generator that
converts a grammar description for an LALR(1)
context-free grammar into a C program to parse
that grammar.

• Bison is upward compatible with Yacc: all


properly-written Yacc grammars ought to work
with Bison with no change.

• You need to be fluent in C programming in order


to use Bison.
2
Bison
• A nonterminal symbol in the formal
grammar is represented in Bison input as
an identifier, like an identifier in C.

• By convention, it should be in lower case,


such as expr, stmt or declaration.

3
Bison
• The Bison representation for a terminal
symbol is also called a token type.

• Token types as well can be represented as


C-like identifiers.

• By convention, these identifiers should be


upper case to distinguish them from
nonterminals: for example, INTEGER,
IDENTIFIER, IF or RETURN.
4
Bison
• The terminal symbol error is reserved for error
recovery.

• The grammar rules also have an expression in


Bison syntax.

• For example, here is the Bison rule for a C return


statement.

stmt: RETURN expr ’;’


;
5
Semantic Values
• A formal grammar selects tokens only by their
classifications.

• For example: if a rule mentions the terminal symbol


‘integer constant’, it means that any integer
constant is grammatically valid in that position.

• The precise value of the constant is irrelevant to


how to parse the input:
If ‘x+4’ is grammatical valid then ‘x+1’ or ‘x+3989’ is
6
equally grammatical.
Semantic Values
• But the precise value is very important for
what the input means once it is parsed.

• A compiler is useless if it fails to distinguish


between 4, 1 and 3989 as constants in the
program!

• Therefore, each token in a Bison grammar


has both a token type and a semantic
value.
7
Semantic Values
• The token type is a terminal symbol defined
in the grammar, such as INTEGER,
IDENTIFIER or ’,’.

• It tells everything you need to know to


decide where the token may validly appear
and how to group it with other tokens.

• The grammar rules know nothing about


8
tokens except their types.
Semantic Values
• The semantic value has all the rest of the
information about the meaning of the token.

• The value of an integer, or the name of an


identifier are examples of semantic values

• A token such as ’,’ which is just punctuation


doesn’t need to have any semantic value.
9
Semantic Values
• For example, an input token might be classified as
token type INTEGER and have the semantic value 4.

• Another input token might have the same token type


INTEGER but value 3989.

• When a grammar rule says that INTEGER is allowed,


either of these tokens is acceptable because each is
an INTEGER.

• When the parser accepts the token, it keeps track of


the token’s semantic value.
10
Semantic Values
• Each grouping can also have a semantic value
as well as its nonterminal symbol.

• For example, in a calculator, an expression


typically has a semantic value that is a number.

• In a compiler for a programming language, an


expression typically has a semantic value that is
a tree structure describing the meaning of the
expression.

11
Semantic Actions
• In order to be useful, a program must do more
than parse input;

• It must also produce some output based on the


input.

• In a Bison grammar, a grammar rule can have an


action made up of C statements.

• Each time the parser recognizes a match for that


12
rule, the action is executed.
Semantic Actions
• Most of the time, the purpose of an action is to compute
the semantic value of the whole construct from the
semantic values of its parts.

• For example, suppose we have a rule which says an


expression can be the sum of two expressions.

• When the parser recognizes such a sum, each of the


subexpressions has a semantic value which describes how
it was built up.

• The action for this rule should create a similar sort of value
for the newly recognized larger expression.
13
Semantic Actions
• For example, here is a rule that says an
expression can be the sum of two
subexpressions:
expr: expr ’+’ expr { $$ = $1 + $3; }
;
• The action says how to produce the
semantic value of the sum expression from
the values of the two subexpressions.
14
Stages in Using Bison
1)
• Formally specify the grammar in a form
recognized by Bison.

• For each grammatical rule in the language,


describe the action that is to be taken when an
instance of that rule is recognized.

• The action is described by a sequence of C


statements.
15
Stages in Using Bison
2)
• Write a lexical analyzer to process input
and pass tokens to the parser.

• The lexical analyzer may be written by


hand in C.

• It could also be produced using Lex.


16
Stages in Using Bison
3) Write a controlling function that calls the
Bison-produced parser.

4) Write error-reporting routines.

17
Stages in Using Bison
• To turn this source code as written into a
runnable program, you must follow these
steps:
1. Run Bison on the grammar to produce
the parser.
2. Compile the code output by Bison, as
well as any other source files.
3. Link the object files to produce the
finished product.
18
The Overall Layout of a Bison
Grammar
The input file for the Bison utility is a Bison grammar file. The
general form of a Bison grammar file is as follows:
%{
Prologue
%}
Bison declarations
%%
Grammar rules
%%
Epilogue (auxiliary procedures)

The ‘%%’, ‘%{’ and ‘%}’ are punctuation that appears in


every Bison grammar file to separate the sections.
19
The Overall Layout of a Bison
Grammar
• The prologue may define types and variables
used in the actions.

• You can also use preprocessor commands to


define macros (#define) used there, and use
#include to include header files that do any of
these things.

• You need to declare the lexical analyzer yylex


and the error printer yyerror here, along with any
other global identifiers used by the actions in the
grammar rules.
20
The Overall Layout of a Bison
Grammar
• The Bison declarations declare:
-The names of the terminal and nonterminal
symbols.

• You can declare a token type name


(terminal symbol) as follows:

%token name

21
The Overall Layout of a Bison
Grammar
• May also describe operator precedence
and the data types of semantic values of
various symbols:-

• you can use %left, %right to specify


associativity and precedence
%left ‘+’ ‘-’ (Lowest precedence)
%left ‘*’ ‘/’
%left UMINUS (Highest precedence)
22
The Overall Layout of a Bison
Grammar
• The grammar rules define how to construct
each nonterminal symbol from its parts.

• The epilogue can contain any code you


want to use.

• Often the definitions of functions declared


in the prologue go here.
23
Example
• Write a complete YACC/Bison description
for the integer arithmetic expression:
1) expr -> expr + term | term
2) term -> term * factor | factor
3) factor -> (expr) | number
4) number -> number digit | digit
5) digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

24
Example
• The grammar is given in a BNF-like form
and associated actions written in C.

• Note that it uses the original left recursion


in the BNF grammar.

• Bottom-up parsing is not bothered by left


recursion.
25
Example
• The YACC/Bison convention for
constructing values from a parse is that
the value of the left-handed side of a
production is indicated by the symbol $$ .

• The value of the nth symbol of the right


hand side is given by $n.

26
Example
• For example:
expr : expr ‘+’ term
$1 + $3 represents the sum of the first and
third symbol.

27
Example
%{
#include <stdio.h>
token DIGIT
%}

%%
command: expr ‘\n’ {printf(“The result is %d\n”, $1);}
;
expr: expr ‘+’ term {$$ = $1 + $3;}
| term {$$ = $1;}
28
;
Example
term: term ‘*’ factor {$$ = $1 * $3;}
| factor {$$ = $1;}
;

factor: number {$$ = $1;}


| ‘(‘expr’)’ {$$ = $2;}
;

number : number digit {$$= 10 * $1 + $2;}


| digit {$$ = $1;}
29
;
Example
DIGIT : ‘0’ {$$=0;}
| ‘1’ {$$=1;}
| ‘2’ {$$=2;}
| ‘3’ {$$=3;}
| ‘4’ {$$=4;}
| ‘5’ {$$=5;}
| ‘6’ {$$=6;}
| ‘7’ {$$=7;}
| ‘8’ {$$=8;}
| ‘9’ {$$=9}
;
%%
30
Example
main ( )
{ yyparse( );
return 0;
}
int yylex(void)
{
static int done = 0; /*flag to end parse */
int c ;
if (done) return 0; /* stop parse */
c = getchar ();
if (c==‘\n’) done = 1; /* next call will end parse */
return c;
31
}
Example
int yyerror(char *s)
{ /*allows for error printing message*/
printf(“%s\n”,s);
}

32
Example
• YACC/Bison generates a procedure
yyparse( ) from the grammar.

• YACC/Bison also assumes that tokens are


recognized by a scanner procedure called
yylex( ) similar to the getToken procedure.

• yylex( ) reads and returns a character.


33
Example
• One difference between yylex() and
getToken is that YACC/Bison will only end a
parse when yylex() returns 0.

• yylex() includes a code to remember when a


new line is reached and to return 0 on the
next call after a new line.

• It is also possible to generate yylex()


automatically from scanner generator such
34as LEX/Flex.
Running Bison to Make the Parser
• Before running Bison to produce a parser,
we need to decide how to arrange all the
source code in one or more source files.

• For such a simple example, the easiest


thing is to put everything in one file.

• The definitions of yylex( ), yyerror( ) and


main( ) go at the end, in the epilogue of the
file.
35
Running Bison to Make the Parser
• For a large project, you would probably
have several source files, and use make to
arrange to recompile them.

• With all the source in a single file, you use


the following command to convert it into a
parser file: bison file.y

36
Running Bison to Make the Parser
• In this example the file was called
‘siaecalc.y’ (for “Simple Integer Arithemtic
calculator”).

• Bison produces a file named ‘file.tab.c’,


removing the ‘.y’ from the original file name.

• The file output by Bison contains the source


code for yyparse( ).
37
Running Bison to Make the Parser
• The additional functions in the input file
(yylex, yyerror and main) are copied
verbatim to the output.

• To generate parser.c run the following


command:
//bison parser.y --defines –output
../src/parser.c --verbose
38
Compiling the Parser File
• Here is how to compile and run the parser file:
# List files in current directory.
$ ls
rpcalc.tab.c rpcalc.y

# Compile the Bison parser.


‘-lm’ tells compiler to search math library for pow.
‘-ll’ tells compiler that Lex library is included
‘-ly’ tells compiler that YACC/Bison library is included
$ cc -lm -o siaecalc siaecalc.tab.c -ll -ly
39
Compiling the Parser File
• # List files again.
$ ls
rpcalc rpcalc.tab.c rpcalc.y

• The file ‘siaecalc’ now contains the


executable code.

40
Using the Parser
• Here using rpcalc:
$ siaecalc
Input the following expression:
4+9
13
3+7 + 5*4
30

41
Parsing Table
• To find out how YACC/Bison forms the
parse tables.

• Run the YACC/Bison in ‘verbose mode’ in


order to provide details in:
y.output

• To obtain this file in a Unix environment:


yacc –v nums.y
42
Parsing Table
• The file y.output will contain details of the
parser states.

• Example
The input to YACC/Bison nums.y contains:

43
Parsing Table
%token NUM
%left ‘+’
%left ‘*’
%%
expr:expr ‘+’ expr
|expr ‘*’ expr
|’(‘expr’)’
| NUM
;

44
Parsing Table
The y.output will contain:
• state 0
$accept: _expr $end
NUM shift 3 (shift for terminals)
( shift 2
 error (any other input)
expr goto 1 (for nonterminals)

45
Parsing Table
• The underline symbol indicates the configurations
in the grammar to which a state belongs.

• state 1
$accept: expr_$end
expr: expr_+expr
expr: expr_* expr

$end accept
+ shift 4
* shift 5
 error (any other symbols)
46
Parsing Table
• state 2
expr:(_expr)
NUM shift 3
( shift 2
 error
expr goto 6

47
Parsing Table
• state 3
expr : NUM_(4) (Reduce by rule 4)
 reduce 4

• state 4
expr:expr + _expr
NUM shift 3
( shift 2
 error
expr goto 7

48
Parsing Table
• state 5
expr : expr * _expr
NUM shift 3
( shift 2
 error
expr goto 8

49
Parsing Table
• state 6
expr : expr_+ expr
expr :expr_* expr
expr : (expr_)

+ shift 4
* shift 5
) shift 9
 error

50
Parsing Table
• state 7
expr : expr_ + expr
expr : expr + expr_
expr : expr_* expr

+ shift 4
* shift 5
 reduce 1

51
Parsing Table
• state 8
expr : expr_+expr
expr : expr_* expr
expr : expr * expr_ (2)

 reduce 2

52
Parsing Table
• state 9
expr : (expr)_ (3)

 reduce 3

53
Parsing Table
• The annotated grammar will be:
accept : 0expr1
expr : 0,2,4,5expr1 ,6,7,8 ‘+’ 4expr7
| 0,2,4,5expr1 ,6,7,8 ‘+’ 5expr8
| 0,2,4,5’(‘2 expr6’)‘9
| 0,2,4,5NUM3

54

You might also like