
Lab NLP 4

This assignment for the B. Tech course in Natural Language Processing focuses on text normalization and preprocessing techniques. Students will perform tasks such as lowercasing, punctuation removal, tokenization, and stopword elimination on raw English text. The assignment includes practical coding exercises using NLTK to prepare text for NLP tasks like classification or sentiment analysis.

Uploaded by vikrammadhad

School of Computer Science Engineering and Technology

Assignment-4
Course: B. Tech.    Type: Specialization Elective
Course Code: CSET246    Course Name: Natural Language Processing
Year: 2025    Semester: Even
Date:    Batch: All

Text Normalization and Preprocessing

Objective:
This assignment focuses on performing essential text preprocessing steps including lowercasing,
punctuation removal, tokenization, stopword elimination, and basic text analysis. Through
hands-on tasks, students will learn how to clean and prepare raw English text for downstream
NLP tasks like classification or sentiment analysis.

Q1. Perform basic text normalization on English text.


• Input sentence: "The Examination's RESULT was Declared!!"
• Normalize:
  o Lowercase
  o Remove punctuation
  o Remove stopwords

import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download NLTK data (only needed once).
# Recent NLTK releases load the word tokenizer from 'punkt_tab' instead of 'punkt'.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

# Input sentence
sentence = "The Examination's RESULT was Declared!!"

# Step 1: Lowercase (NLTK's stopword list is lowercase, so this must come first)
sentence = sentence.lower()

# Step 2: Remove punctuation (string.punctuation covers ASCII punctuation only)
sentence = sentence.translate(str.maketrans('', '', string.punctuation))

# Step 3: Tokenize the sentence
tokens = word_tokenize(sentence)

# Step 4: Remove stopwords
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word not in stop_words]

# Final output: Normalized Tokens: ['examinations', 'result', 'declared']
print("Normalized Tokens:", filtered_tokens)

Q2. Tokenize sentences and words using NLTK.


• Input: 2–3 sentences
• Use nltk.sent_tokenize() and nltk.word_tokenize()
• Show the sentence tokens and the word tokens

Q3. Identify if characters in the text are alphabets, digits, or special characters.
• Input: "Student123 scored 95%! Great job!!"
• Output:
  o Alphabets: Student, scored, Great, job
  o Digits: 123, 95
  o Special Characters: %, !, !!
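One way to produce the expected output for Q3 is a minimal regex sketch using only the standard library. It assumes letters and digits are ASCII, and that a run of the *same* special character counts as one item, which is the grouping that yields "%, !, !!" (the "%!" after 95 splits into "%" and "!", while "!!" stays together). The function names `classify_runs` and `classify_chars` are hypothetical; `classify_chars` shows the literal per-character check with `str.isalpha()` and `str.isdigit()`, which are Unicode-aware.

```python
import re


def classify_runs(text):
    """Group text into runs of letters, runs of digits, and runs of one repeated special character."""
    return {
        "Alphabets": re.findall(r"[A-Za-z]+", text),
        "Digits": re.findall(r"[0-9]+", text),
        # ([^A-Za-z0-9\s]) captures one special character; \1* extends the
        # match only over repeats of that same character.
        "Special Characters": [
            m.group(0) for m in re.finditer(r"([^A-Za-z0-9\s])\1*", text)
        ],
    }


def classify_chars(text):
    """Label each non-whitespace character individually."""
    labels = []
    for ch in text:
        if ch.isspace():
            continue
        if ch.isalpha():
            labels.append((ch, "alphabet"))
        elif ch.isdigit():
            labels.append((ch, "digit"))
        else:
            labels.append((ch, "special"))
    return labels


text = "Student123 scored 95%! Great job!!"
for label, items in classify_runs(text).items():
    print(f"{label}: {', '.join(items)}")
```

If every special character should be listed on its own instead, `classify_chars()` gives that finer-grained view.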
