Smart Mini Compilar
Smart Mini Compilar
Compiler Construction
PROJECT REPORT
Mam Maryam
Assistant professor
Smart mini Compiler
Dedication
We dedicate this project to God Almighty our creator, our strong pillar, our source of
inspiration, wisdom, knowledge and understanding. He has been the source of our
strength throughout this program and on His wings only have we soared. We also
dedicate this work to my teacher; Mam Maryam who has encouraged us all the way and
whose encouragement has made sure that we give it all it takes to finish that which we
have started. To my friends who have been affected in every way possible by this quest.
Thank you. God bless you.
Table of Contents
Dedication........................................................................................................................................ii
Table of Contents...........................................................................................................................iv
Chapter 1..........................................................................................................................................1
Introduction......................................................................................................................................2
1.1. Compilar Design Phase.....................................................................................................3
1.2. Analysis Phase……………………………………….……………………….….………3
1.3 Synthesis Phase………………………………………..…………………………………3
1.4 Graphical Representation………………………….….………………………………….4
Chapter 2..........................................................................................................................................5
LEXCIAL ANALYSIS...................................................................................................................5
2.1 Introduction.......................................................................................................................6
2.2 The Lexical Analysis.....................................................................................................6
2.3 THE LEX CODE.......................................................................................................... 7
Chapter 3........................................................................................................................................18
SYNTAX ANALYSIS..............................................................................................................18
3.1 Introduction.....................................................................................................................19
3.2 Structure of LEX Program………………….…………………………………………..19
3.3 Structure of Yacc Program……………………………………………………………...20
3.4 Paras Tree……………………………………………………………………………….32
Chapter 4........................................................................................................................................35
Conclusion ....................................................................................................................................36
Chapter 1
Introduction
Chapter 1: Introduction
Lexical analysis reflects a compiler's first step.It takes the modified source code from
preprocessor language and is written in the form of sentences. The lexical analyzer splits these
syntaxes into a set of tokens by eliminating some space in the source code or comments.
Symbol table is an important data structure that compilers build and retain to store information
on the existence of various entities such as variable names, function names etc. All the pieces of
the research and the creation of a compiler using the symbol table. Use Lex, we developed a
lexical analyzer for the C language.
It takes a C code as input, and outputs a tokens line. The output tokens shown contain keywords,
identifiers, signed / unsigned integer / floating point constants, operators, special characters,
headers, data form specifications, list, single-line statement, multi-line statement, pre-processor
instruction, pre-defined functions (print and scan), user-defined functions and the main feature.
The token, token sort, and token line number are seen in the C code.
The line number is shown so that the code can be debugged more quickly for errors. Errors in
single-line comments are shown along with multi-line comments and line numbers. The output
also includes the tokens symbol list, and their sort. Using hash organization, the symbol table is
created.
process produces an intermediate representation of the source file and symbol table. This step
is generated by:
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
Chapter 2
Lexical Analysis
2.
2.1. Introduction
Lexical analysis reflects a compiler's first step. It takes the modified source code from
preprocessor language and is written in the form of sentences. The lexical analyzer splits these
syntaxes into a set of tokens by eliminating some space in the source code or comments.
If a token is considered invalid by the lexical analyzer, it will produce an error. The lexical
analyzer interacts similarly with the analyzer of syntaxes. It reads character streams from
source code, checks for valid tokens, and when it requests, sends the data to the syntax
analyzer.
Diagram
"meaning") is lexical analysis. A software that performs lexical analysis can be called a lexer,
tokenizer, or scanner (although the first step of a lexer is often referred to as "scanner").
In general, such a lexer is paired with a parser, which analyzes the grammar of programming
languages, web sites, and so on.
The script we write is a computer program named "lex," which produces lexical analysers
("scanners" or "lexers"). Lex reads an input stream that defines the source code for the
lexical analyser and outputs that execute the lexer in the programming language of C.
%{
int lineno = 1;
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#define AUTO 1
#define BREAK 2
#define CASE 3
#define CHAR 4
#define CONST 5
#define CONTINUE 6
#define DEFAULT 7
#define DO 8
#define DOUBLE 9
#define ELSE 10
#define ENUM 11
#define EXTERN 12
#define FLOAT 13
#define FOR 14
#define GOTO 15
#define IF 16
#define INT 17
#define LONG 18
#define REGISTER 19
#define RETURN 20
#define SHORT 21
#define SIGNED 22
#define SIZEOF 23
#define STATIC 24
#define STRUCT 25
#define SWITCH 26
#define TYPEDEF 27
#define UNION 28
#define UNSIGNED 29
#define VOID 30
#define VOLATILE 31
#define WHILE 32
#define IDENTIFIER 33
#define SLC 34
#define MLCS 35
#define MLCE 36
#define LEQ 37
#define GEQ 38
#define EQEQ 39
#define NEQ 40
#define LOR 41
#define LAND 42
#define ASSIGN 43
#define PLUS 44
#define SUB 45
#define MULT 46
#define DIV 47
#define MOD 48
#define LESSER 49
#define GREATER 50
#define INCR 51
#define DECR 52
#define COMMA 53
#define SEMI 54
#define HEADER 55
#define MAIN 56
#define PRINTF 57
#define SCANF 58
#define DEFINE 59
#define INT_CONST 60
#define FLOAT_CONST 61
#define TYPE_SPEC 62
#define DQ 63
#define OBO 64
#define OBC 65
#define CBO 66
#define CBC 67
#define HASH 68
#define ARR 69
#define FUNC 70
#define NUM_ERR 71
#define UNKNOWN 72
#define CHAR_CONST 73
#define SIGNED_CONST 74
#define STRING_CONST 75
%}
alpha [A-Za-z]
digit [0-9]
und [_]
space [ ]
tab [ ]
line [\n]
char \'.\'
at [@]
string \"(.^([%d]|[%f]|[%s]|[%c]))\"
%%
{space}* {}
{tab}* {}
{string} return STRING_CONST;
{char} return CHAR_CONST;
{line} {lineno++;}
auto return AUTO;
break return BREAK;
case return CASE;
char return CHAR;
const return CONST;
continue return CONTINUE;
default return DEFAULT;
do return DO;
}
void display()
{
struct node *myNode;
int i,j, k=1;
printf("-------------------------------------------------------------------");
printf("\nSNo \t|\tToken \t\t|\tToken Type \t\n");
printf("-------------------------------------------------------------------\n");
for (i = 0; i < eleCount; i++)
{
if (hashTable[i].count == 0)
continue;
myNode = hashTable[i].head;
if (!myNode)
continue;
while (myNode != NULL)
{
printf("%d\t\t", k++);
printf("%s\t\t\t", myNode->token);
printf("%s\t\n", myNode->attr);
myNode = myNode->next;
}
}
return;
}
int main()
{
int scan, slcline=0, mlc=0, mlcline=0, dq=0, dqline=0;
yyin = fopen("isPrime.c","r");
printf("\n\n");
scan = yylex();
while(scan)
{
if(lineno == slcline)
{
scan = yylex();
continue;
}
if(lineno!=dqline && dqline!=0)
{
if(dq%2!=0)
printf("\n******** ERROR!! INCOMPLETE STRING at Line %d
********\n\n", dqline);
dq=0;
}
if((scan>=1 && scan<=32) && mlc==0)
{
printf("%s\t\t\tKEYWORD\t\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "KEYWORD");
}
if(scan==33 && mlc==0)
{
printf("%s\t\t\tIDENTIFIER\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "IDENTIFIER");
}
if(scan==34)
{
printf("%s\t\t\tSingleline Comment\t\tLine %d\n", yytext, lineno);
slcline = lineno;
}
if(scan==35 && mlc==0)
{
printf("%s\t\t\tMultiline Comment Start\t\tLine %d\n", yytext, lineno);
mlcline = lineno;
mlc = 1;
}
if(scan==36 && mlc==0)
{
printf("\n******** ERROR!! UNMATCHED MULTILINE COMMENT END %s at
Line %d ********\n\n", yytext, lineno);
}
if(scan==36 && mlc==1)
{
mlc = 0;
printf("%s\t\t\tMultiline Comment End\t\tLine %d\n", yytext, lineno);
}
if((scan>=37 && scan<=52) && mlc==0)
{
printf("%s\t\t\tOPERATOR\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "OPERATOR");
}
if((scan==53||scan==54||scan==63||(scan>=64 && scan<=68)) && mlc==0)
{
printf("%s\t\t\tSPECIAL SYMBOL\t\t\tLine %d\n", yytext, lineno);
if(scan==63)
{
dq++;
dqline = lineno;
}
insertToHash(yytext, "SPECIAL SYMBOL");
}
if(scan==55 && mlc==0)
{
printf("%s\tHEADER\t\t\t\tLine %d\n",yytext, lineno);
}
if(scan==56 && mlc==0)
{
printf("%s\t\t\tMAIN FUNCTION\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "IDENTIFIER");
}
if((scan==57 || scan==58) && mlc==0)
{
printf("%s\t\t\tPRE DEFINED FUNCTION\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "PRE DEFINED FUNCTION");
}
if(scan==59 && mlc==0)
{
printf("%s\t\t\tPRE PROCESSOR DIRECTIVE\t\tLine %d\n", yytext, lineno);
}
if(scan==60 && mlc==0)
{
printf("%s\t\t\tINTEGER CONSTANT\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "INTEGER CONSTANT");
}
if(scan==61 && mlc==0)
{
printf("%s\t\t\tFLOATING POINT CONSTANT\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "FLOATING POINT CONSTANT");
}
if(scan==62 && mlc==0)
{
printf("%s\t\t\tTYPE SPECIFIER\t\t\tLine %d\n", yytext, lineno);
}
if(scan==69 && mlc==0)
{
printf("%s\t\t\tARRAY\t\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "ARRAY");
}
if(scan==70 && mlc==0)
{
printf("%s\t\t\tUSER DEFINED FUNCTION\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "USER DEFINED FUNCTION");
}
if(scan==71 && mlc==0)
{
printf("\n******** ERROR!! CONSTANT ERROR %s at Line %d
********\n\n", yytext, lineno);
}
if(scan==72 && mlc==0)
{
printf("\n******** ERROR!! UNKNOWN TOKEN %s at Line %d
********\n\n", yytext, lineno);
}
if(scan==73 && mlc==0)
{
printf("%s\t\t\tCHARACTER CONSTANT\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "CHARACTER CONSTANT");
}
if(scan==74 && mlc==0)
{
printf("%s\t\t\tSIGNED CONSTANT\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "SIGNED CONSTANT");
}
if(scan==75 && mlc==0)
{
printf("%s\t\t\tSTRING CONSTANT\t\t\tLine %d\n", yytext, lineno);
insertToHash(yytext, "STRING CONSTANT");
}
scan = yylex();
}
if(mlc==1)
printf("\n******** ERROR!! UNMATCHED COMMENT STARTING at Line %d
********\n\n",mlcline);
printf("\n");
printf("\n\t******** SYMBOL TABLE ********\t\t\n");
display();
printf("-------------------------------------------------------------------\n\n");
}
int yywrap()
{
return 1;
}
Input file (isPrime.c)
#include<stdio.h>
int main()
{
int a,i,flag=0;
printf("Input no");
scanf("%d",&a);
i=2;
while(i <= a/2)
{
if(a%i == 0)
{
flag=1;
break;
}
i++;
}
if(flag==0)
printf("%d Prime", a);
else
printf("Not Prime");
return 0;
}
Chapter 3
SYNTAX ANALSIS
3.1 Introduction
Syntax analysis in computer science is the method of searching for the syntactic consistency of
the code. Syntax analysis or parsing is meant to verify that we have a correct sequence of
tokens. Tokens are true symbol sets, keywords, signatures etc. The parser must be capable of
processing the infinite number of potential legitimate programs that can be given to him. The
standard way to describe the language is by defining a grammar.
A grammar is a collection of rules (or productions) which determine the language syntax (i.e.,
what is a valid language sentence). For a particular language there can be more than one
grammar. To find any errors in the code the parser analyzes the source code (token stream)
against the output rules. This phase's performance is a parse-tree.
Through writing two scripts, the syntax analyzer for the C language functions as a lexical
analyzer (lexer) and generates a stream of tokens, while the other one functions as a parser.
The lexer's called the lex software. Lex reads an input stream that defines the source code for
the lexical analyser and outputs that execute the lexer in the programming language of C.
The section on description describes macros, and imports header files in C. Any C code that
must be copied verbatim into the created source file can also be written here.
The portion of rules connects standard patterns of speech to claims of C. If the lexer sees text
matching a given pattern in the data, then the corresponding C code will be executed.
The section on C code includes C statements and functions that are copied verbatim to source
file created. Such claims probably contain code in the Laws section named by the laws.
The rules section consists of the context-free grammar used for parse tree creation. A general
rule is organized according to the following:
nonterminal
: sentential form
| sentential form
.................
| sentential form
;
Actions can be aligned with rules and are performed while following the corresponding
sentential type.
The section on routines can include the C program defining the input file, the operation
routines and other user-defined functions
Lex Program :
alpha [A-Za-z_]
fl (f|F|l|L)
ul (u|U|l|L)*
digit [0-9]
space []
hex [a-fA-F0-9]
exp [Ee][+-]?{digit}+
%{
char datatype[100] = "dummy";
int tl;
char next;
#include <stdio.h>
#include <string.h>
%}
%%
\n { yylineno++; }
"/*" { multicomment(); }
"//" { singlecomment(); }
"#include<"({alpha})*".h>" {}
"#define"({space})""({alpha})""({alpha}|{digit})*""({space})""({digit})+""
{ return DEFINE;}
"#define"({space})""({alpha}({alpha}|{digit})*)""({space})""(({digit}+)\.({digit}+))""
{ return DEFINE;}
"#define"({space})""({alpha}({alpha}|{digit})*)""({space})""({alpha}({alpha}|{digit})*)""
{ return DEFINE;}
")" { return(')'); }
("["|"<:") { return('['); }
("]"|":>") { return(']'); }
"." { return('.'); }
"&" { return('&'); }
"!" { return('!'); }
"~" { return('~'); }
"-" { return('-'); }
"+" { return('+'); }
"*" { return('*'); }
"/" { return('/'); }
"%" { return('%'); }
"<" { return('<'); }
">" { return('>'); }
"^" { return('^'); }
"|" { return('|'); }
"?" { return('?'); }
"printf"|"scanf" { insertToHash(yytext,"PROCEDURE",yylineno); return IDENTIFIER; }
"main" { insertToHash(yytext,"PROCEDURE",yylineno); return
IDENTIFIER; }
{alpha}({alpha}|{digit})* {
if(strcmp(datatype, "dummy")==0)
return IDENTIFIER;
else
{
insertToHash(yytext,datatype,yylineno);
return IDENTIFIER;
}
}
[ \t\v\n\f] { }
. { /* ignore bad characters */ }
%%
struct cnode
{
char num[50];
//int lno;
char type[20];
};
struct cnode ctable[100];
int ccount = 0;
void disp()
{
int i;
printf("\n\n------------------------------CONSTANT TABLE------------------------------\n");
printf("--------------------------------------------------------------------------\n");
printf("Value \t\t|\tData Type\t\t\n");
printf("--------------------------------------------------------------------------\n");
for(i=0;i<ccount;i++)
{
printf("%s\t\t\t", ctable[i].num);
printf("%s\t\t\t\n", ctable[i].type);
//printf("\t%d\t\n", ctable[i].lno);
}
printf("\n\n");
}
struct node
{
char token[100];
char attr[100];
//int line[100];
int line_count;
struct node *next;
};
struct hash
{
struct node *head;
int hash_count;
};
return newnode;
}
{
if (strcmp(myNode->token, token)==0)
{
flag = 1;
//myNode->line[(myNode->line_count)++] = l;
if(strcmp(myNode->attr, attr)!=0)
{
strcpy(myNode->attr, attr);
}
break;
}
myNode = myNode->next;
}
if(!flag)
{
//adding new node to the list
newnode->next = (hashTable[hi].head);
//update the head of the list and no of nodes in the current bucket
hashTable[hi].head = newnode;
hashTable[hi].hash_count++;
}
return;
}
void display()
{
struct node *myNode;
int i,j, k=1;
printf("\n-----------------------------------------Symbol
Table---------------------------------------------\n");
printf("--------------------------------------------------------------------------------------------------");
printf("\nToken \t\t|\tToken Type \t\t\t\t\t \n");
printf("--------------------------------------------------------------------------------------------------\n"
);
for (i = 0; i < eleCount; i++)
{
if (hashTable[i].hash_count == 0)
continue;
myNode = hashTable[i].head;
if (!myNode)
continue;
while (myNode != NULL)
{
//printf("%d\t\t", k++);
printf("%s\t\t\t", myNode->token);
printf("%s\t\t\t", myNode->attr);
/*for(j=0;j<(myNode->line_count);j++)
printf("%d ",myNode->line[j]);*/
printf("\n");
myNode = myNode->next;
}
}
printf("--------------------------------------------------------------------------------------------------\n");
return;
}
yywrap()
{
return(1);
}
multicomment()
{
char c, c1;
while ((c = input()) != '*' && c != 0);
c1=input();
if(c=='*' && c1=='/')
{
c=0;
}
if (c != 0)
putchar(c1);
}
singlecomment()
{
char c;
while(c=input()!='\n');
if(c=='\n')
c=0;
if(c!=0)
putchar(c);
}
Chapter 4
Conclusion
Chapter 5: Conclusion
For a subset of C language, the lexical analyzer and the syntax analyzer is created which
includes selection statements, compound statements, iteration statements, jumping
statements, user-defined functions and primary expressions. It is important to
remember that if due caution is not taken when defining the context-free grammar for
the language, conflicts (shift-reduce and reduce-reduce) will arise in the case of a syntax
analyser. We will also define unambiguous grammar for proper operation of the parser.