Name- Shrinivas K. Patil
Roll No. A4-67
Lab- Compiler Design
Practical_no - 1
LEX:
Lex is a program generator designed for lexical processing of character input streams. It accepts
a high level, problem-oriented specification for character string matching, and produces a
program in a general-purpose language which recognizes regular expressions. The regular
expressions are specified by the user in the source specifications given to Lex. The code written
by Lex recognizes these expressions in an input stream and partitions the input stream into strings
matching the expressions. At the boundaries between strings, program sections provided by the
user are executed. The Lex source file associates the regular expressions and the program
fragments. As each expression appears in the input to the program written by Lex, the
corresponding fragment is executed.
Lex is not a complete language, but rather a generator representing a new language
feature which can be added to different programming languages, called ``host languages.'' Just as
general purpose languages can produce code to run on different computer hardware, Lex can
write code in different host languages.
Lex turns the user's expressions and actions (called source in the diagram) into the host
general-purpose language; the generated program is named yylex. The yylex program will
recognize expressions in a stream (called input in the diagram) and perform the specified actions for
each expression as it is detected.
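To make the partitioning concrete, the C sketch below hand-codes what a generated scanner automates for two example patterns, [0-9]+ and [a-zA-Z]+. It is only an illustration: the names next_token, struct token and the TOK_ constants are assumptions of this example, and real Lex output is table-driven rather than written this way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token kinds for this illustration only (these are not Lex names). */
enum kind { TOK_END, TOK_NUMBER, TOK_WORD, TOK_OTHER };

struct token {
    enum kind kind;
    const char *start;  /* first character of the matched string */
    size_t len;         /* length of the matched string */
};

/* Scan one token starting at *p and advance *p past it.
   Whitespace is skipped; digit runs and letter runs are taken as long
   as possible, which mirrors the longest-match behaviour of Lex. */
struct token next_token(const char **p) {
    const char *s;
    struct token t;

    assert(p != NULL && *p != NULL);
    s = *p;
    while (*s == ' ' || *s == '\t' || *s == '\n') s++;

    t.start = s;
    if (*s == '\0') {
        t.kind = TOK_END;
    } else if (isdigit((unsigned char)*s)) {
        while (isdigit((unsigned char)*s)) s++;
        t.kind = TOK_NUMBER;
    } else if (isalpha((unsigned char)*s)) {
        while (isalpha((unsigned char)*s)) s++;
        t.kind = TOK_WORD;
    } else {
        s++;                /* any other single character */
        t.kind = TOK_OTHER;
    }
    t.len = (size_t)(s - t.start);
    *p = s;
    return t;
}
```

Calling next_token repeatedly on the string "abc 123 +" yields a word, a number, one other character, and then the end marker. In a Lex program the action attached to each pattern would run at each of those points.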
Diagram of LEX
Format for Lex file
The general format of Lex source is:
{definitions}
%%
{rules}
%%
{user subroutines}
where the definitions and the user subroutines are often omitted. The second %% is optional, but
the first is required to mark the beginning of the rules. The absolute minimum Lex program is
thus %% (no definitions, no rules) which translates into a program which copies the input to the
output unchanged.
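The behaviour of that minimum program can be pictured in plain C. The sketch below is an assumed equivalent, not the code Lex generates: copy_stream is a hypothetical helper that copies every character from one stream to another, which is what happens when no rule matches any input.

```c
#include <assert.h>
#include <stdio.h>

/* Copy every character from in to out, as the default Lex action does
   for input that no rule matches. Returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out) {
    long copied = 0;
    int c;

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
    }
    return copied;
}
```

For example, copy_stream(stdin, stdout) reproduces standard input on standard output unchanged.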
Regular Expression
A regular expression (or RE) specifies a set of strings that match it. In Lex, each rule pairs a
regular expression with an action, and the generated scanner checks which expression matches
the text at the current position of the input.
Regular expressions can be concatenated to form new regular expressions; if A and B are both
regular expressions, then AB is also a regular expression. In general, if a string p matches A and
another string q matches B, the string pq will match AB. This holds unless A or B contains low
precedence operations, there are boundary conditions between A and B, or there are numbered
group references. Thus, complex expressions can easily be constructed from simpler primitive
expressions. Regular expressions can contain both special and ordinary characters. Most ordinary
characters, like "A", "a", or "0", are the simplest regular expressions; they simply match
themselves. You can concatenate ordinary characters, so last matches the string 'last'. (In the rest
of this section, we'll write RE's in this special style, usually without quotes, and strings to be
matched 'in single quotes'.)
Some characters, like "|" or "(", are special. Special characters either stand for classes of ordinary
characters or affect how the regular expressions around them are interpreted.
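The matching rules described above can be tried out with the POSIX regex functions regcomp and regexec from <regex.h>. This is a sketch under assumptions: full_match is a hypothetical helper written for this example, and POSIX extended syntax is close to, but not identical to, the pattern syntax Lex accepts.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole string s matches the POSIX extended regular
   expression re, 0 if it does not, and -1 if re cannot be compiled. */
int full_match(const char *re, const char *s) {
    char anchored[256];
    regex_t compiled;
    int n, rc;

    assert(re != NULL && s != NULL);

    /* Wrap the pattern as ^(re)$ so that only a match of the entire
       string counts. */
    n = snprintf(anchored, sizeof anchored, "^(%s)$", re);
    if (n < 0 || (size_t)n >= sizeof anchored) return -1;

    if (regcomp(&compiled, anchored, REG_EXTENDED | REG_NOSUB) != 0) return -1;
    rc = regexec(&compiled, s, 0, NULL, 0);
    regfree(&compiled);
    return rc == 0 ? 1 : 0;
}
```

With this helper, full_match("last", "last") succeeds because ordinary characters match themselves, and full_match("[0-9]+[a-z]+", "42abc") succeeds because '42' matches the first expression and 'abc' matches the second, so their concatenation matches the concatenated expression.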
Lex Library Routines
Lex library routines are functions supplied with Lex that a Lex program can call to carry out
common scanning tasks.
The following table gives a list of some of the Lex routines.

Lex Routine      Description
main()           Invokes the lexical analyzer by calling the yylex subroutine.
yywrap()         Called when the end of input occurs; the library version
                 returns the value 1, which ends scanning.
yymore()         Appends the next matched string to the current value of the
                 yytext array rather than replacing the contents of the
                 yytext array.
yyless(int n)    Retains n initial characters in the yytext array and returns
                 the remaining characters to the input stream.
yyreject         Allows the lexical analyzer to match multiple rules for the
                 same input string. (The yyreject subroutine is called when
                 the special action REJECT is used.)
yylex()          The lexical analyzer itself: reads the input, matches the
                 rules, and runs their actions. The default main() calls it.
Answer the Questions:
1. Use of yywrap:
- yywrap() is a function that Lex/Flex calls when the input file ends.
- Its return value tells the scanner whether to continue scanning from another input file
(return 0) or to terminate the scanning process (return 1).
2. Use of yylex function:
- yylex() is the main function generated by Lex/Flex.
- It acts as the scanner or lexical analyzer.
- It reads the input stream and matches it against the patterns defined in the
rules section.
3. What does lex.yy.c do?
- lex.yy.c contains the C code for the lexical analyzer generated by
Lex/Flex. It defines the yylex() function, which reads input, matches
patterns, and performs the actions specified in the .l file.
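The yywrap answer can be illustrated with a small simulation in C. Nothing here is Lex code: struct inputs, wrap and scan_all are hypothetical names, and the sketch only mimics the convention that a return value of 0 means "more input is ready" and 1 means "stop".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scanner's input state. */
struct inputs {
    const char **texts;  /* queued input strings */
    size_t count;        /* number of queued inputs */
    size_t current;      /* index of the input being scanned */
};

/* Mimics the yywrap() convention: switch to the next queued input and
   return 0, or return 1 when no input remains. */
int wrap(struct inputs *in) {
    assert(in != NULL);
    if (in->current + 1 < in->count) {
        in->current++;
        return 0;
    }
    return 1;
}

/* Count the characters of all inputs, asking wrap() what to do each
   time the end of one input is reached. */
size_t scan_all(struct inputs *in) {
    size_t total = 0;

    assert(in != NULL);
    if (in->count == 0) return 0;
    for (;;) {
        const char *p = in->texts[in->current];
        while (*p != '\0') {
            total++;
            p++;
        }
        if (wrap(in)) break;  /* 1 means: no more input, stop scanning */
    }
    return total;
}
```

The programs in this report define yywrap() to return 1 unconditionally, so scanning always ends after the single input file.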
Practical No. P1
Aim : Write a Lex program that reads a C program from a file and identifies and counts its
tokens:
1. Keywords and identifiers.
2. Integer, float, character and string constants.
3. Arithmetic, relational and logical operators.
4. Special symbols and unknown tokens.
Program:
%{
#include <stdio.h>
#include <string.h>

/* One counter per token class. */
int keyword_count = 0;
int identifier_count = 0;
int float_count = 0;
int int_count = 0;
int char_count = 0;
int string_count = 0;
int operator_count = 0;
int relational_count = 0;
int logical_count = 0;
int symbol_count = 0;
int unknown_count = 0;
%}
%%
"int"|"float"|"double"|"char"|"void"|"if"|"else"|"while"|"for"|"return"|"break"|"continue" {
    keyword_count++;
    printf("Keyword: %s\n", yytext);
}
[0-9]+\.[0-9]+ {
    float_count++;
    printf("Float constant: %s\n", yytext);
}
[0-9]+ {
    int_count++;
    printf("Integer constant: %s\n", yytext);
}
'([^'\\\n]|\\.)' {
    /* A single character or an escape sequence such as '\n'. */
    char_count++;
    printf("Character constant: %s\n", yytext);
}
\"([^\"\\\n]|\\.)*\" {
    /* Escaped quotes inside the string do not end it. */
    string_count++;
    printf("String constant: %s\n", yytext);
}
[a-zA-Z_][a-zA-Z0-9_]* {
    identifier_count++;
    printf("Identifier: %s\n", yytext);
}
"+"|"-"|"*"|"/"|"%"|"=" {
    /* Arithmetic operators and assignment. */
    operator_count++;
    printf("Operator: %s\n", yytext);
}
"=="|"!="|"<="|">="|"<"|">" {
    relational_count++;
    printf("Relational operator: %s\n", yytext);
}
"&&"|"||"|"!" {
    logical_count++;
    printf("Logical operator: %s\n", yytext);
}
[\(\)\{\}\[\],;.:] {
    symbol_count++;
    printf("Special symbol: %s\n", yytext);
}
\/\/[^\n]* {}
[ \t\r\n]+ {}
. {
    unknown_count++;
    printf("Unknown token: %s\n", yytext);
}
%%
int yywrap(void) {
    return 1;
}

int main(void) {
    FILE *input_file = fopen("pract1.c", "r");
    if (input_file == NULL) {
        perror("Error opening file");
        return 1;
    }
    yyin = input_file;
    yylex();
    fclose(input_file);

    printf("---------------------------------\n");
    printf("Keywords: %d\n", keyword_count);
    printf("Identifiers: %d\n", identifier_count);
    printf("Float constants: %d\n", float_count);
    printf("Integer constants: %d\n", int_count);
    printf("Character constants: %d\n", char_count);
    printf("String constants: %d\n", string_count);
    printf("Operators: %d\n", operator_count);
    printf("Relational operators: %d\n", relational_count);
    printf("Logical operators: %d\n", logical_count);
    printf("Special symbols: %d\n", symbol_count);
    printf("Unknown tokens: %d\n", unknown_count);
    printf("---------------------------\n");
    return 0;
}
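Two rules in this program can match the same text: both the keyword rule and the identifier rule match 'int'. Lex prefers the longest match and, when the lengths are equal, the rule listed first, which is why the keyword rule is placed before the identifier rule. The C sketch below mimics that outcome with a plain table lookup; the name is_keyword is an assumption of this example and is not part of Lex.

```c
#include <assert.h>
#include <string.h>

/* Keywords recognised by the keyword rule of the Lex program. */
static const char *const KEYWORDS[] = {
    "int", "float", "double", "char", "void", "if",
    "else", "while", "for", "return", "break", "continue"
};

/* Return 1 if word is exactly one of the keywords, 0 otherwise.
   A longer word such as "integer" is not a keyword; in Lex the
   identifier rule wins there because its match is longer. */
int is_keyword(const char *word) {
    size_t i;

    assert(word != NULL);
    for (i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++) {
        if (strcmp(word, KEYWORDS[i]) == 0) return 1;
    }
    return 0;
}
```

Here is_keyword("int") returns 1 while is_keyword("integer") and is_keyword("main") return 0, matching how the scanner classifies those words.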
Input:
int main() {
int a = 10;
float b = 3.14;
char ch = 'x';
if (a > b) {
printf("a is greater");
} else {
printf("b is greater");
}
return 0;
}
Output:
Practical No. P2
Aim: Question Paper Analyzer
Write a Lex program to find the parameters given below. Consider as input a question
paper of an examination.
1. Count the number of questions.
2. Number of questions that have sub-parts and how many do not.
3. Count the total marks.
4. Date of examination
5. Semester
6. Count different types of questions- Eg: What, Discuss, etc.
7. Numbers of words, lines, small letters, capital letters, digits, and special characters.
Program:
%{
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int question_count = 0;
int subpart_questions = 0;
int no_subpart_questions = 0;
int total_marks = 0;
int lines = 0, words = 0;
int small = 0, capital = 0, digits = 0, special = 0;
int what_count = 0, discuss_count = 0, explain_count = 0;
char date[20] = "";
char semester[20] = "";
int has_subpart = 0;   /* set once the current question shows a sub-part */

/* Update the word and character counters for one matched token.
   Every rule calls this, because text consumed by an earlier rule
   (for example "What") never reaches the generic word rule. */
static void tally(const char *s) {
    int in_word = 0;
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (islower(c)) small++;
        else if (isupper(c)) capital++;
        else if (isdigit(c)) digits++;
        else if (!isspace(c)) special++;
        if (isalnum(c)) {
            if (!in_word) { words++; in_word = 1; }
        } else {
            in_word = 0;
        }
    }
}
%}
%%
Q[0-9]+[.:]                 { question_count++; has_subpart = 0; tally(yytext); }
[(][0-9]+[)]                { total_marks += atoi(yytext + 1); tally(yytext); }
[(][a-z][)]                 {
    if (!has_subpart) { subpart_questions++; has_subpart = 1; }
    tally(yytext);
}
Semester[ ]+[0-9]+          { strncpy(semester, yytext, sizeof(semester) - 1); tally(yytext); }
[0-9]{2}-[0-9]{2}-[0-9]{4}  { strncpy(date, yytext, sizeof(date) - 1); tally(yytext); }
[Ww]hat                     { what_count++; tally(yytext); }
[Dd]iscuss                  { discuss_count++; tally(yytext); }
[Ee]xplain                  { explain_count++; tally(yytext); }
[a-zA-Z0-9]+                { tally(yytext); }
\n                          { lines++; }
[ \t]+                      { }
[^a-zA-Z0-9 \t\n]           { special++; }
%%
int main(void) {
    yyin = fopen("queppr.txt", "r");
    if (yyin == NULL) {
        perror("Error opening file");
        return 1;
    }
    yylex();
    fclose(yyin);
    no_subpart_questions = question_count - subpart_questions;

    printf("\n-----------------------------\n");
    printf("1. Total Questions: %d\n", question_count);
    printf("2. Questions with Subparts: %d\n", subpart_questions);
    printf(" Questions without Subparts: %d\n", no_subpart_questions);
    printf("3. Total Marks: %d\n", total_marks);
    printf("4. Date of Examination: %s\n", date);
    printf("5. Semester: %s\n", semester);
    printf("6. Question Types:\n");
    printf(" What: %d\n", what_count);
    printf(" Discuss: %d\n", discuss_count);
    printf(" Explain: %d\n", explain_count);
    printf("7. Text Statistics:\n");
    printf(" Lines: %d\n", lines);
    printf(" Words: %d\n", words);
    printf(" Small letters: %d\n", small);
    printf(" Capital letters: %d\n", capital);
    printf(" Digits: %d\n", digits);
    printf(" Special characters: %d\n", special);
    return 0;
}

int yywrap(void) {
    return 1;
}
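The marks rule relies on the expression atoi(yytext + 1). The matched text has the form '(N)', so adding 1 skips the opening parenthesis, and atoi stops converting at the first non-digit, which is the closing parenthesis. The C sketch below isolates that step; marks_from_token and sum_marks are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Extract the number from a marks token of the form "(N)".
   token + 1 skips '(' and atoi() ignores the trailing ')'. */
int marks_from_token(const char *token) {
    assert(token != NULL && token[0] == '(');
    return atoi(token + 1);
}

/* Add up the marks of count tokens. */
int sum_marks(const char *const tokens[], int count) {
    int total = 0;
    int i;

    for (i = 0; i < count; i++) {
        total += marks_from_token(tokens[i]);
    }
    return total;
}
```

For instance, marks_from_token("(5)") returns 5, and the six marks tokens (5), (3), (4), (5), (4) and (6) add up to 27.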
Input:
Semester 5
Date : 01-06-2025
Branch : CSE
Q1. What is a compiler? (5)
(a) Explain the phases of a compiler. (3)
(b) Discuss lexical analysis. (4)
Q2. Discuss intermediate code generation. (5)
Q3: What is syntax analysis? (4)
Q4: Explain code optimization. (6)
Output:
Practical No. P3
Aim: Program Cleaner
Write a Lex program which takes a C program from a file and writes the same C program to
another file after removing the comments.
Program:
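%{
#include <stdio.h>
%}
%%
\"([^\"\\\n]|\\.)*\"          { ECHO; /* keep string literals intact */ }
"//".*                        ;
"/*"([^*]|\*+[^*/])*\*+"/"    ;
.|\n                          { ECHO; /* copy everything else to yyout */ }
%%
int main(void) {
    yyin = fopen("sample.c", "r");
    if (yyin == NULL) {
        perror("Error opening input file");
        return 1;
    }
    /* The output file name is an assumption; the aim only says "another file". */
    yyout = fopen("sample_clean.c", "w");
    if (yyout == NULL) {
        perror("Error opening output file");
        fclose(yyin);
        return 1;
    }
    yylex();
    fclose(yyin);
    fclose(yyout);
    return 0;
}

int yywrap(void) {
    return 1;
}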
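The effect of the two comment patterns can also be written out by hand in C, which may help when reading the regular expressions. This is a sketch, not the code Lex generates: strip_comments is a hypothetical function, it works on an in-memory string rather than a file, and it does not treat string literals specially.

```c
#include <assert.h>
#include <string.h>

/* Copy src to dst while dropping line comments and block comments.
   dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, not counting the final '\0'. */
size_t strip_comments(const char *src, char *dst) {
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '/') {
            /* Line comment: skip up to, but not including, the newline. */
            src += strcspn(src, "\n");
        } else if (src[0] == '/' && src[1] == '*') {
            /* Block comment: skip through the first closing delimiter. */
            const char *end = strstr(src + 2, "*/");
            if (end != NULL) {
                src = end + 2;
            } else {
                dst[out++] = *src++;  /* unterminated: keep the text as is */
            }
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

The newline that ends a line comment is kept, so the line structure of the program is preserved, and a block comment spanning several lines disappears entirely.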
Input:
#include<stdio.h> //library
//main function
/*
multi
line
comment
*/
void main(){
printf("Hello world");
}
Output:
Practical No. P4
Aim: E4: Do as directed
Write a LEX specification that takes the contents of a file and does the following:
1. Add 3 to each number divisible by 7
2. Add 4 to each number divisible by 2
3. Convert the alphabetical list to a numbered list
Program:
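%{
#include <stdio.h>
#include <stdlib.h>
int list_num = 1;   /* next number for the converted list */
%}
%%
^[ \t]*[a-zA-Z]\)[ \t]* {
    /* Replace a list label such as "a)" with the running number. */
    printf("%d) ", list_num++);
}
[0-9]+ {
    /* A multiple of 7 gets 3 added; otherwise an even number gets 4. */
    int n = atoi(yytext);
    if (n % 7 == 0)
        n += 3;
    else if (n % 2 == 0)
        n += 4;
    printf("%d", n);
}
.|\n {
    printf("%s", yytext);
}
%%
int main(void) {
    yyin = fopen("e4.txt", "r");
    if (yyin == NULL) {
        perror("Error opening file");
        return 1;
    }
    yylex();
    fclose(yyin);
    return 0;
}

int yywrap(void) {
    return 1;
}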
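The number rule checks divisibility by 7 first and only then divisibility by 2, so a number divisible by both, such as 14, has 3 added and not 4. The C sketch below isolates that arithmetic; adjust is a hypothetical name chosen for this illustration.

```c
/* Apply the number rule of the Lex action: a multiple of 7 gains 3,
   otherwise an even number gains 4, and any other number is unchanged. */
int adjust(int n) {
    if (n % 7 == 0)
        return n + 3;
    if (n % 2 == 0)
        return n + 4;
    return n;
}
```

For the numbers in the sample input, adjust maps 14 to 17, 16 to 20, 7 to 10, 2 to 6, 21 to 24 and 10 to 14.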
Input:
14
16
This is a test line.
a) Bread
b) Butter
c) Magnets
d) Paint Brush
numbers 7 2 21 10
Output: