Lab 2 NLP

The assignment for the B. Tech course in Natural Language Processing focuses on data cleaning techniques to preprocess an uncleaned paragraph. Students are required to apply various steps such as lowercasing, removing extra whitespace, handling contractions, and fixing punctuation. Additionally, tasks include visualizing word frequency after tokenization and comparing root words from stemming versus lemmatization.

Uploaded by

vikrammadhad


School of Computer Science Engineering and Technology

Assignment-2
Course: B. Tech.    Type: Specialization Elective
Course Code: CSET246    Course Name: Natural Language Processing
Year: 2025    Semester: Even
Date:    Batch: All

Objective: The objective of this assignment is to familiarize students with the essential
data cleaning steps in Natural Language Processing (NLP). Students will work with a
challenging, uncleaned paragraph and apply various data cleaning techniques to
preprocess the text and make it suitable for further NLP tasks.

Task: You are provided with an uncleaned paragraph. Your task is to perform a
series of data cleaning steps to preprocess the text and make it ready for NLP tasks.

Data Cleaning Steps:

1. Lowercasing: Convert the entire paragraph to lowercase to ensure consistent
capitalization.
2. Removing Extra Whitespace: Remove extra spaces and ensure that only a single
space separates words.
3. Handling Contractions: Expand contractions to their full forms (e.g., "can't" to
"cannot") and replace slang or emoticons with words (e.g., "I <3 nlp" to "I love NLP").
4. Removing Special Characters: Remove special characters such as @, #, $, %, &,
*, etc.
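5. Reducing Duplicate Letters: Normalize letters repeated for emphasis by capping each
run at two letters (e.g., "soooo" to "soo", "loooong" to "loong"), so that legitimate
double letters such as "good" survive.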
6. Fixing Punctuation: Reduce runs of repeated punctuation marks (e.g., "!!!" to "!")
and normalize their spacing.
7. Removing URL Artifacts: Clean up any remaining artifacts from URLs (e.g.,
"www.example.com////").
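The seven steps above can be sketched with Python's standard `re` module. This is a minimal sketch under stated assumptions, not a reference solution: the `SLANG_AND_CONTRACTIONS` table, `clean_text`, and the other helper names are hypothetical, and the table is deliberately tiny. The steps run in a different order than listed, because URL artifacts and contractions such as "<3" must be handled before special characters are stripped, and whitespace is collapsed last. Spelling errors such as "leanring" are not fixed by any listed step and are left as-is.

```python
import re

# Hypothetical, deliberately tiny lookup table; a real solution needs a fuller one.
SLANG_AND_CONTRACTIONS = {
    "can't": "cannot",
    "gonna": "going to",
    "omg": "oh my god",
    "asap": "as soon as possible",
    "<3": "love",
}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def strip_url_artifacts(text):
    """Drop trailing slashes left at the end of URLs ("www.example.com////")."""
    return re.sub(r"((?:https?://|www\.)\S+?)/+(?=\s|$)", r"\1", text)


def expand_contractions(text):
    """Replace each known contraction, slang word or emoticon with its full form."""
    for short, full in SLANG_AND_CONTRACTIONS.items():
        text = re.sub(r"(?<!\w)" + re.escape(short) + r"(?!\w)", full, text)
    return text


def reduce_repeated_letters(text):
    """Cap runs of 3+ identical letters at two ("soooo" -> "soo"), skipping URLs."""
    return " ".join(
        tok if URL_RE.match(tok) else re.sub(r"([a-z])\1{2,}", r"\1\1", tok)
        for tok in text.split()
    )


def clean_text(text):
    """Apply the cleaning steps; numbers refer to the assignment's list."""
    text = text.lower()                                 # 1. lowercasing
    text = strip_url_artifacts(text)                    # 7. URL artifacts
    text = expand_contractions(text)                    # 3. contractions and slang
    text = re.sub(r"[^a-z0-9\s.,!?'-]", " ", text)      # 4. special characters
    text = reduce_repeated_letters(text)                # 5. duplicate letters
    text = re.sub(r"([!?.,])\1+", r"\1", text)          # 6. repeated punctuation
    text = re.sub(r"\s+([!?.,])", r"\1", text)          #    no space before punctuation
    return re.sub(r"\s+", " ", text).strip()            # 2. extra whitespace
```

For example, `clean_text("OMG!! I can't believe it")` returns `"oh my god! i cannot believe it"`. Capping runs at two rather than one is a design choice: it keeps "good" intact at the cost of leaving forms like "soo" for a later spell-check step.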

Paragraph:

1. OMG!! I can't believe I found this aWesoMe article about AI &
machine leanring!! It was soooo gooood lol. I <3 nlp butttt i hate spelng
errors in textttt. This is gonna be a looong paragraph with looooots of
spacesss and weirddddd symbols @#$%. The website's link is
www.example.com//// Check it out ASAP!!! #excited
2. Oh my gosh!!! Like, I can't even believe what I just stumbled upon on the
world wide web. This article, "The Marvels of Artificial General Intelligence
& the Future of Humanity," totally blew my mindddd!! It was, like,
sooo mind-blowingly awesome, lolz. I mean, I <3 NLP butttt those annoying
typos in texts drive me nuts. Brace yourselves, this is gonna be one seriously
long paragraph with tons and tons of extraterrestrial spaces and some
seriously weird symbols like @#$%. And guess what? The link to the
website is www.incrediblenews.com//// So, um, you better check it out like
ASAP!!! #excitedmuch

Q2. Visualize Word Frequency After Tokenization

Task: Read a paragraph from a file, perform tokenization using NLTK, and then
visualize the frequency of words using a bar graph.
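A minimal sketch for Q2, assuming NLTK and matplotlib are installed. It uses NLTK's `RegexpTokenizer`, which needs no downloaded models (`nltk.word_tokenize` would also work but requires the Punkt tokenizer data). The helper names and the command-line usage are hypothetical, and stopwords are not removed, so function words such as "the" will likely dominate the chart.

```python
import sys
from collections import Counter

import matplotlib.pyplot as plt
from nltk.tokenize import RegexpTokenizer

# Lowercase word tokens; an apostrophe is kept only inside a word ("can't").
TOKENIZER = RegexpTokenizer(r"[a-z]+(?:'[a-z]+)?")


def read_paragraph(path):
    """Read the whole file as one string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def word_frequencies(text, top_n=15):
    """Tokenize the text and return the top_n (word, count) pairs."""
    tokens = TOKENIZER.tokenize(text.lower())
    return Counter(tokens).most_common(top_n)


def plot_frequencies(freqs, title="Word frequency after tokenization"):
    """Draw a bar graph of (word, count) pairs and return the figure."""
    if not freqs:
        raise ValueError("no tokens to plot")
    words, counts = zip(*freqs)
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.bar(words, counts)
    ax.set_xlabel("Word")
    ax.set_ylabel("Frequency")
    ax.set_title(title)
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python word_freq.py paragraph.txt
    freqs = word_frequencies(read_paragraph(sys.argv[1]))
    for word, count in freqs:
        print(f"{word:15} {count}")
    plot_frequencies(freqs)
    plt.show()
```

Keeping counting and plotting in separate functions makes it easy to apply the Q1 cleaning first and compare the chart before and after.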

Q3. Compare Root Words from Stemming vs Lemmatization Visually

Task: Take a sentence, apply both Porter Stemming and WordNet Lemmatization,
then plot a comparison showing how the two methods reduce words differently.
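A minimal sketch for Q3, again assuming NLTK and matplotlib. `PorterStemmer` needs no extra data, while `WordNetLemmatizer` needs the WordNet corpus, fetched here with `nltk.download`. The chart type is an assumption: it draws grouped bars of string length for each original word, its stem and its lemma, and writes each form above its bar. The helper names are hypothetical, and the `lemmatize` parameter exists only so the lemmatizer can be swapped out (for example, for testing).

```python
import sys

import matplotlib.pyplot as plt
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer


def compare_roots(sentence, lemmatize=None):
    """Return (word, porter_stem, wordnet_lemma) for every word in the sentence."""
    stemmer = PorterStemmer()
    if lemmatize is None:
        # Needs the WordNet corpus; every word is treated as a noun
        # unless a pos argument such as pos="v" is passed.
        lemmatize = WordNetLemmatizer().lemmatize
    tokens = RegexpTokenizer(r"[a-z]+").tokenize(sentence.lower())
    return [(w, stemmer.stem(w), lemmatize(w)) for w in tokens]


def plot_comparison(rows):
    """Grouped bars of string length, labelled with the original, stem and lemma."""
    if not rows:
        raise ValueError("no words to plot")
    width = 0.27
    series = [(-width, 0, "Original"), (0.0, 1, "Porter stem"), (width, 2, "WordNet lemma")]
    fig, ax = plt.subplots(figsize=(max(6, 1.2 * len(rows)), 4.5))
    for offset, idx, label in series:
        bars = ax.bar([i + offset for i in range(len(rows))],
                      [len(row[idx]) for row in rows], width, label=label)
        # Print each reduced form above its bar so the difference is readable.
        for bar, row in zip(bars, rows):
            ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.2,
                    row[idx], ha="center", va="bottom", rotation=90, fontsize=8)
    longest = max(len(s) for row in rows for s in row)
    ax.set_ylim(0, longest * 2)  # headroom for the rotated labels
    ax.set_xticks(list(range(len(rows))))
    ax.set_xticklabels([row[0] for row in rows], rotation=45, ha="right")
    ax.set_ylabel("Length (characters)")
    ax.set_title("Porter stemming vs WordNet lemmatization")
    ax.legend()
    fig.tight_layout()
    return fig


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python compare_roots.py "The striped bats were hanging on their feet"
    nltk.download("wordnet", quiet=True)
    nltk.download("omw-1.4", quiet=True)  # some NLTK versions also ask for this
    rows = compare_roots(" ".join(sys.argv[1:]))
    for word, stem, lemma in rows:
        print(f"{word:12} {stem:12} {lemma}")
    plot_comparison(rows)
    plt.show()
```

The chart should make the typical contrast visible: the stemmer chops suffixes mechanically (e.g., "studies" to "studi"), while the lemmatizer returns dictionary words ("study") but, with its default noun part of speech, usually leaves verb forms such as "running" unchanged.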
