
THE LEXICAL ANALYSIS PHASE OF A COMPILER
(Approved by: AICTE & Affiliated to Maulana Abul Kalam Azad University of Technology)
Campus: Bishnupur, Dist.: Bankura (W.B.)

SUBJECT:- Compiler Design


SUBJECT CODE:- PCC-CS501
STUDENT NAME:- AKASH MANNA
DEPARTMENT:- COMPUTER SCIENCE ENGINEERING
UNIVERSITY ROLL NO:- 15800124097
UNIVERSITY REGISTRATION NO :- 241580120172
YEAR:- 3RD SEMESTER:- 5th
ACADEMIC YEAR :- 2025-2026
EXAM:- CA1
INDEX
 1. WHAT IS LEXICAL ANALYSIS?
 PURPOSE
 EXAMPLE
 IMPORTANCE
 ADVANTAGES
 DISADVANTAGES

 2. TOKENS, LEXEMES, ATTRIBUTES, LOCATION INFORMATION

 3. HOW IT WORKS
 4. CONCLUSION
What is Lexical Analysis?
Lexical analysis is the first phase of a compiler. It reads the raw source code and breaks it down into individual units called tokens, which the later phases of compilation can process more easily. This section explores key terms related to lexical analysis, the steps it performs, how it works in practice, and its advantages and limitations.
PURPOSE
 Reads Code Character by Character :- The first step is to read the source code one character at a time from beginning to end.
 Groups Characters into "Lexemes" :- It identifies sequences of characters that belong together based on the language's rules (e.g., "if" forms the keyword if, "123" forms the number 123). These sequences are called lexemes.
 Classifies Lexemes into "Tokens" :- Each lexeme is then categorized into a specific type of meaningful unit called a token. A token has a type (e.g., "KEYWORD", "IDENTIFIER", "OPERATOR") and often a value (e.g., the keyword if, the identifier myVariable).
 Discards Irrelevant Information :- Whitespace (spaces, tabs, newlines) and comments are removed because they are not essential for the program's execution.
 Detects Basic Errors :- It can spot invalid characters or character sequences that don't form a valid token in the language.
EXAMPLE
 INPUT :-
int main()
{
// 2 variables
int a, b;
a = 10;
return 0;
}

 OUTPUT :- 'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'
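
A scanner following the steps listed under PURPOSE can reproduce this output. The sketch below is a minimal, illustrative tokenizer in Python; the token categories, the regular expressions, and the tokenize helper are assumptions made for this example, not the specification of any real compiler.

import re

# Illustrative token specification: (category, pattern).
# Order matters: keywords must be tried before general identifiers.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*"),              # discarded, like whitespace
    ("KEYWORD",    r"\b(?:int|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"[0-9]+"),
    ("OPERATOR",   r"="),
    ("PUNCT",      r"[(){},;]"),
    ("SKIP",       r"[ \t\n]+"),              # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Yield (category, lexeme) pairs, skipping whitespace and comments."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:  # no pattern matches: a basic lexical error
            raise SyntaxError(f"invalid character {source[pos]!r} at position {pos}")
        if m.lastgroup not in ("SKIP", "COMMENT"):
            yield (m.lastgroup, m.group())
        pos = m.end()

code = "int main()\n{\n// 2 variables\nint a, b;\na = 10;\nreturn 0;\n}"
print(" ".join(f"'{lexeme}'" for _, lexeme in tokenize(code)))
# 'int' 'main' '(' ')' '{' 'int' 'a' ',' 'b' ';' 'a' '=' '10' ';' 'return' '0' ';' '}'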
IMPORTANCE OF Lexical Analysis
➢ First Line of Defense: It acts as the initial filter for source code, catching invalid
characters or malformed tokens early, thus preventing errors from affecting later
stages.
➢ Tokenization Simplifies Parsing: By converting a stream of characters into
easily identifiable tokens, lexical analysis simplifies the complex process of syntax
analysis (parsing) that follows. Syntax analyzers can then work with tokens
instead of raw characters, making their job more straightforward and efficient.
➢ Removes Unnecessary Elements: Whitespace and comments, which do not
contribute to execution, are removed here. This makes the input to subsequent
compiler stages more concise, streamlining the compilation process.
➢ Speeds Up Compilation: Preprocessing and cleaning up the code at this stage
helps speed up later compiler phases and reduces the risk of ambiguous
interpretations.
➢ Enables Accurate Symbol Table Construction: The lexical analyzer often
records identifier names, keywords, and literals. This information is essential for
populating and managing the symbol table, a key data structure used throughout
compilation.
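
As a rough sketch of that last point, the lexer can install each identifier it recognizes into the symbol table. The flat dictionary and the install_identifier helper below are assumptions for illustration; real compilers use richer structures that track scopes, types, and more.

# Assumed minimal symbol table: a dict mapping each identifier
# to a small record. Real tables also store scope and type info.
symbol_table = {}

def install_identifier(name, line):
    """Record an identifier the first time the lexer sees it."""
    if name not in symbol_table:
        symbol_table[name] = {"first_seen_line": line}
    return symbol_table[name]

install_identifier("a", line=4)
install_identifier("b", line=4)
install_identifier("a", line=5)   # already present; not added twice
print(symbol_table)               # {'a': {'first_seen_line': 4}, 'b': {'first_seen_line': 4}}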
Advantages
 Simplifies Parsing :- Breaking down the source code into tokens makes it
easier for computers to understand and work with the code. This helps
programs like compilers or interpreters figure out what the code is supposed
to do. It's like breaking down a big puzzle into smaller pieces, which makes it
easier to put together and solve.
 Error Detection :- Lexical analysis detects lexical errors, such as invalid
characters or malformed tokens, early in the compilation process. This
improves the overall efficiency of the compiler or interpreter by identifying
errors sooner rather than later.
 Efficiency :- Once the source code is converted into tokens, subsequent
phases of compilation or interpretation can operate more efficiently. Parsing
and semantic analysis become faster and more streamlined when working with
tokenized input.
DISADVANTAGES
 Limited Context :- Lexical analysis operates on individual tokens
and does not consider the overall context of the code. This can sometimes lead
to ambiguity or misinterpretation of the code's intended meaning, especially in
languages with complex syntax or semantics.
 Overhead :- Although lexical analysis is necessary for the compilation or
interpretation process, it adds an extra layer of overhead. Tokenizing the source
code requires additional computational resources, which can impact the overall
performance of the compiler or interpreter.
 Debugging Challenges :- Lexical errors detected during the analysis
phase may not always provide clear indications of their origins in the original
source code. Debugging such errors can be challenging, especially if they result
from subtle mistakes in the lexical analysis process.
Tokens, Lexemes, Attributes, Location Information
The output of lexical analysis (also known as scanning or tokenization) is a structured
sequence of tokens representing the source program’s text. Each token captures the
essential details required for parsing and further compilation steps, abstracting away the
raw character stream.
Tokens :-
Each recognized sequence is categorized into a specific token type (e.g., IDENTIFIER,
NUMBER, KEYWORD, OPERATOR, etc.).
Lexemes :-
For each token, the lexeme—the actual substring from the source code matching a
pattern—is recorded.
Attributes :-
Some tokens carry extra information (attributes), such as the value for a numeric literal or
an identifier's name.
Location Information (optional) :-
Each token may also record its line number, character position, or source file, which helps with error tracing.
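
One way to bundle these four pieces of information is a small record per token. The layout below is an assumed one, written in Python for illustration; the field names are not prescribed by any standard.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Token:
    type: str                    # token category, e.g. "NUMBER" or "IDENTIFIER"
    lexeme: str                  # the exact substring matched in the source
    attribute: Optional[object]  # extra information, e.g. a numeric value
    line: int                    # location information for error tracing
    column: int

# The lexeme "10" from the earlier example might become:
tok = Token(type="NUMBER", lexeme="10", attribute=10, line=5, column=5)
print(tok)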
How it Works
❑ 1. Specifying Patterns with Regular Expressions :-
Regular expressions are formal notations for describing patterns within text.
For each type of token (like identifiers, numbers, keywords) in a programming language, a
regular expression describes what form those sequences take (e.g., [a-zA-Z][a-zA-Z0-9]* for
identifiers, [0-9]+ for integers).
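
These two patterns can be tried out directly. The short sketch below uses Python's re module; the candidate strings are made up for illustration.

import re

# The example patterns from the text, anchored so they must match
# the entire candidate string.
IDENT = re.compile(r"[a-zA-Z][a-zA-Z0-9]*\Z")
INTEGER = re.compile(r"[0-9]+\Z")

for candidate in ["myVariable", "x9", "123", "9lives"]:
    if IDENT.match(candidate):
        kind = "identifier"
    elif INTEGER.match(candidate):
        kind = "integer"
    else:
        kind = "no match"
    print(candidate, "->", kind)
# myVariable -> identifier, x9 -> identifier, 123 -> integer, 9lives -> no match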
❑ 2. Translating Regular Expressions into Finite Automata :-
The system systematically converts each regular expression into a finite automaton:
Typically, the regex is first turned into a nondeterministic finite automaton (NFA). This
process can be done using algorithms like Thompson’s construction.
The resulting NFA can be further transformed into an equivalent deterministic finite
automaton (DFA), which is easier and faster for computers to process.
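
The NFA-to-DFA step (subset construction) fits in a few lines. The sketch below runs it on a tiny hand-written NFA that recognizes "ab" or "ac"; the NFA itself is an assumption for illustration, and the epsilon transitions that Thompson's construction normally produces are omitted for brevity.

from collections import deque

# Hand-written NFA (assumed for illustration): from state 0, reading
# 'a' can go to state 1 or state 2, which is the nondeterminism.
NFA = {(0, "a"): {1, 2}, (1, "b"): {3}, (2, "c"): {3}}
NFA_ACCEPT = {3}
ALPHABET = "abc"

def subset_construction(start):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa, accepting = {}, set()
    seen, queue = {start_set}, deque([start_set])
    while queue:
        S = queue.popleft()
        if S & NFA_ACCEPT:                  # any NFA accept state inside?
            accepting.add(S)
        for ch in ALPHABET:
            T = frozenset(t for s in S for t in NFA.get((s, ch), ()))
            if T:
                dfa[(S, ch)] = T
                if T not in seen:
                    seen.add(T)
                    queue.append(T)
    return start_set, dfa, accepting

start, dfa, accepting = subset_construction(0)
for (S, ch), T in dfa.items():
    print(sorted(S), "--" + ch + "-->", sorted(T))
# [0] --a--> [1, 2]; [1, 2] --b--> [3]; [1, 2] --c--> [3]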
❑ 3. Lexical Analysis Using Automata :-
The automaton (usually a DFA for efficiency) reads the source code one character at a
time.
As it reads, it transitions between states according to its rules, which encode the
structure of the pattern from the original regex.
If the automaton ends in a matching (accepting) state after reading a sequence, that
sequence is recognized as a valid token.
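
A table-driven simulation of this loop might look like the sketch below. It assumes a two-state DFA for integer literals ([0-9]+) and applies the longest-match rule (maximal munch) by remembering the last accepting position.

# Assumed transition table for a DFA recognizing [0-9]+ .
# States: 0 = start, 1 = accepting ("one or more digits seen").
def char_class(ch):
    return "digit" if ch.isdigit() else "other"

TRANSITIONS = {(0, "digit"): 1, (1, "digit"): 1}
ACCEPTING = {1}

def scan_integer(source, pos):
    """Simulate the DFA from pos, keeping the longest accepted prefix."""
    state, last_accept, i = 0, None, pos
    while i < len(source):
        state = TRANSITIONS.get((state, char_class(source[i])))
        if state is None:
            break                  # no transition: the DFA is stuck
        i += 1
        if state in ACCEPTING:
            last_accept = i        # longest valid match so far
    return source[pos:last_accept] if last_accept else None

print(scan_integer("123+4", 0))    # prints '123'; the DFA stops at '+'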

❑ 4. Efficiency and Simplicity :-


The use of a DFA makes the recognition process extremely fast and deterministic: each
character causes exactly one state transition.
This process enables the lexical analyzer to efficiently break down raw code into a
stream of tokens for further syntactic analysis.
Conclusion
Lexical analysis is a fundamental first phase in the compiler design process, transforming
raw source code into a structured sequence of tokens. These tokens are units of meaning
defined by:
Lexemes: actual substrings in the source code,
Patterns: formal rules (often regular expressions) that describe valid lexemes,
Tokens: abstract categories assigned to lexemes based on the patterns.
Regular expressions provide a powerful and concise way to specify token patterns, while
finite automata (NFAs and DFAs) serve as efficient computational models to recognize
these patterns in input text. This synergy allows lexical analyzers to quickly and accurately
scan source code and output token streams for parsing.
Tools like Lex and Flex automate the generation of lexical analyzers from such pattern
definitions, saving developers from writing complex scanning code manually. Error
handling mechanisms during lexical analysis detect invalid sequences early, report
meaningful diagnostics, and help maintain resilient compilation by recovering from errors.
THANK YOU
