In Phase 2 (Lexical Analysis) of compiler construction, the focus is on breaking the source code into a
sequence of tokens that are easier to analyze and process in later phases. Here is an explanation of the
key concepts:
1. Difference between Token, Pattern, and Lexeme:
Token:
A token is a category or class of lexemes that share a common structure. It represents a
meaningful unit in the source code; for example, keywords like int and while, operators like +, and
numeric literals like 10 each belong to a token class. The purpose of tokens is to group strings of
characters from the source code into meaningful classes.
Example:
In the code int x = 10;, the tokens would be:
int (keyword)
x (identifier)
= (operator)
10 (literal)
; (punctuation)
Pattern:
A pattern is a rule or regular expression that describes the syntactic structure of a token. It
defines the set of strings that can be considered as instances of a token. In other words, patterns
describe how lexemes (actual sequences of characters) match specific types of tokens.
Example:
A pattern for an identifier token could be [a-zA-Z_][a-zA-Z0-9_]*, which describes
the structure of variable names (starting with a letter or underscore, followed by any
combination of letters, digits, or underscores).
Lexeme:
A lexeme is an actual sequence of characters in the source code that matches a pattern and is
classified as a token. Lexemes are the concrete instances that the lexical analyzer identifies in the
input source code. They are the actual "words" of the program.
Example:
In the statement int x = 10;, x is a lexeme that corresponds to the identifier token.
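To make the three terms concrete, here is a minimal hand-written scanner sketch in C (the token names and the classify helper are illustrative assumptions, not a standard interface). It reads int x = 10;, matches each lexeme against a pattern, and prints (token, lexeme) pairs:

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Illustrative token classes; a real compiler defines many more. */
    static const char *classify(const char *lexeme) {
        if (strcmp(lexeme, "int") == 0) return "KEYWORD";
        if (isdigit((unsigned char)lexeme[0])) return "LITERAL";
        if (isalpha((unsigned char)lexeme[0]) || lexeme[0] == '_') return "IDENTIFIER";
        if (strcmp(lexeme, "=") == 0) return "OPERATOR";
        if (strcmp(lexeme, ";") == 0) return "PUNCTUATION";
        return "UNKNOWN";
    }

    int main(void) {
        const char *p = "int x = 10;";
        char lexeme[64];
        while (*p) {
            while (isspace((unsigned char)*p)) p++;   /* skip whitespace */
            if (!*p) break;
            int n = 0;
            if (isalpha((unsigned char)*p) || *p == '_') {
                /* pattern: [a-zA-Z_][a-zA-Z0-9_]* */
                while (isalnum((unsigned char)*p) || *p == '_') lexeme[n++] = *p++;
            } else if (isdigit((unsigned char)*p)) {
                /* pattern: [0-9]+ */
                while (isdigit((unsigned char)*p)) lexeme[n++] = *p++;
            } else {
                lexeme[n++] = *p++;   /* single-character operator or punctuation */
            }
            lexeme[n] = '\0';
            printf("%-12s %s\n", classify(lexeme), lexeme);
        }
        return 0;
    }

Running it prints each token class next to its lexeme: KEYWORD int, IDENTIFIER x, OPERATOR =, LITERAL 10, PUNCTUATION ;.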
2. Compiler Construction Tools (e.g., Lex):
Lex (or Flex, its faster open-source counterpart) is a tool used for generating lexical analyzers. A
lexical analyzer (lexer) is responsible for scanning the input source code and breaking it down into tokens.
How Lex Works:
Lex uses regular expressions to define patterns for different types of tokens and then
generates code (usually in C or C++) to perform the actual tokenization process.
Input: A specification file containing regular expressions and corresponding token
names.
Output: A C program that performs lexical analysis.
Example:
If you define a regular expression in Lex for integer literals ([0-9]+), the Lex tool will
generate a lexer that recognizes integer literals and classifies them as the token INTEGER.
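As a sketch, a minimal Lex specification along these lines might look as follows (the token names and the printing actions are illustrative; a real specification would typically return token codes to a parser):

    %{
    #include <stdio.h>
    %}

    %%
    [0-9]+                  { printf("INTEGER: %s\n", yytext); }
    [a-zA-Z_][a-zA-Z0-9_]*  { printf("IDENTIFIER: %s\n", yytext); }
    [ \t\n]+                ;  /* skip whitespace */
    .                       { printf("OTHER: %s\n", yytext); }
    %%

    int main(void) { yylex(); return 0; }
    int yywrap(void) { return 1; }

Running lex (or flex) on this file produces lex.yy.c, which is then compiled with a C compiler. Note that in an actual .l file the patterns must begin in column 1; they are indented here only for display.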
3. Directed Acyclic Graph (DAG) – for optimization and value numbering:
A Directed Acyclic Graph (DAG) is a graph used for representing expressions in an optimized form in
compiler optimization. It minimizes redundant calculations by sharing identical subexpressions, making
expression evaluation more efficient.
DAG in Compiler Optimization:
After generating intermediate code for a program, a DAG can be constructed for each expression
in the code. In this graph, interior nodes represent operations (like addition or multiplication), leaf
nodes represent operands (variables and constants), and edges connect each operation to its operands.
The key property of a DAG is that there are no cycles, meaning no operand depends on itself (directly
or indirectly). This enables optimizations such as common subexpression elimination (reusing the results
of previously computed expressions) and value numbering (assigning the same identifier to expressions
that compute the same value).
Value Numbering:
In value numbering, expressions that yield the same result are assigned the same "number" or
identifier. This helps eliminate redundant computations by reusing the previously computed
results. For instance, if you have an expression a + b and later b + a, the compiler can
recognize that they are equivalent (since + is commutative) and compute the result only once, reusing the same value.
DAG Example:
Consider the expression (a + b) + (a + b). In a naive tree evaluation, the subexpression (a + b)
would be computed twice, but in a DAG the two instances are recognized as the same node and
computed only once.
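Below is a minimal sketch of DAG construction with value numbering in C, assuming a small fixed-size node table (the Node layout and function names are illustrative). Commutative operands are canonicalized so that a + b and b + a receive the same value number, and the repeated (a + b) in (a + b) + (a + b) is shared rather than rebuilt:

    #include <stdio.h>
    #include <string.h>

    /* One DAG node: either a leaf (a variable) or an operation on two
       previously numbered nodes. The array index is the value number. */
    typedef struct { char op; int left, right; char name[8]; } Node;

    static Node nodes[64];
    static int nnodes = 0;

    /* Value-number a leaf, reusing the node if the variable was seen before. */
    static int leaf(const char *name) {
        for (int i = 0; i < nnodes; i++)
            if (nodes[i].op == 0 && strcmp(nodes[i].name, name) == 0)
                return i;
        nodes[nnodes].op = 0;
        strcpy(nodes[nnodes].name, name);
        return nnodes++;
    }

    /* Value-number an operation: canonicalize commutative operands
       (smaller value number first), then reuse any identical node. */
    static int operation(char op, int l, int r) {
        if ((op == '+' || op == '*') && l > r) { int t = l; l = r; r = t; }
        for (int i = 0; i < nnodes; i++)
            if (nodes[i].op == op && nodes[i].left == l && nodes[i].right == r)
                return i;   /* common subexpression: share the existing node */
        nodes[nnodes].op = op;
        nodes[nnodes].left = l;
        nodes[nnodes].right = r;
        return nnodes++;
    }

    int main(void) {
        int a = leaf("a"), b = leaf("b");
        int t1 = operation('+', a, b);    /* a + b             */
        int t2 = operation('+', b, a);    /* b + a -> same VN  */
        int t3 = operation('+', t1, t2);  /* (a + b) + (a + b) */
        printf("a+b -> n%d, b+a -> n%d, whole expression -> n%d\n", t1, t2, t3);
        printf("total DAG nodes: %d (a full tree would need 7)\n", nnodes);
        return 0;
    }

Real compilers apply the same idea to three-address code within a basic block, typically hashing (operator, value number, value number) triples instead of scanning a table linearly.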
Summary:
Token: Abstract classification of characters from the source code (e.g., int, +, identifier).
Pattern: Regular expression or rule defining the structure of a token.
Lexeme: The actual string in the source code that corresponds to a token.
Lex: A tool for generating lexical analyzers that identify tokens in source code.
DAG: A graph used for optimizations, such as common subexpression elimination and value
numbering, that eliminate redundant calculations.