NLP Exp 5 B707

The document outlines an experiment to study and implement N-Gram models using NLTK and TextBlob in natural language processing. It explains the concept of N-grams, their importance, applications, advantages, and limitations, along with a practical coding example. Additionally, it highlights the features and functionalities of TextBlob for various NLP tasks, emphasizing the synergy between N-grams and TextBlob for effective text processing.


DEPARTMENT OF COMPUTER ENGINEERING

EXPERIMENT NO. 05

DATE:

ROLL NO: B-707

AIM: To study and implement N-Gram models using NLTK and TextBlob.

THEORY:

An N-gram is a contiguous sequence of N items from a given text or speech sample. In natural
language processing (NLP), these items are usually words, but they can also be characters or
phonemes.

• Unigram: A sequence of one item (single words).
• Bigram: A sequence of two adjacent items (pairs of words).
• Trigram: A sequence of three adjacent items.
• And so forth...

Why are N-grams important?

• N-grams are fundamental in modeling language statistically.
• They help in understanding and predicting word sequences.
• Widely used in applications like speech recognition, machine translation, and text prediction.

How N-grams work


Consider the sentence:
"Natural language processing is fun."

• Unigrams: ["Natural", "language", "processing", "is", "fun"]
• Bigrams: [("Natural", "language"), ("language", "processing"), ("processing", "is"), ("is", "fun")]
• Trigrams: [("Natural", "language", "processing"), ("language", "processing", "is"), ("processing", "is", "fun")]
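The splits above can be reproduced with a short helper. nltk.util.ngrams behaves the same way on a token list; this is a minimal pure-Python sketch of the sliding window:

```python
def ngrams(tokens, n):
    # Slide a window of width n across the token list; zip stops at the
    # shortest offset list, yielding len(tokens) - n + 1 tuples.
    return list(zip(*[tokens[i:] for i in range(n)]))

tokens = "Natural language processing is fun".split()
print(ngrams(tokens, 2))
# [('Natural', 'language'), ('language', 'processing'), ('processing', 'is'), ('is', 'fun')]
```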

Applications of N-grams
• Language Modeling: Estimating the probability of word sequences.
• Spell Checking: Detecting unusual word sequences.
• Text Classification: Features for categorizing documents.
• Information Retrieval: Improving search accuracy.
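As an illustration of the language-modeling use case, bigram probabilities can be estimated directly from counts. This is a maximum-likelihood sketch on a made-up toy corpus; real models train on far more data and require smoothing for unseen pairs:

```python
from collections import Counter

# Toy corpus (real language models use millions of tokens)
corpus = "the cat sat on the mat the cat ran".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("the", "cat"))  # 2 of the 3 occurrences of "the" precede "cat"
```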
• Machine Translation: Aligning phrases across languages.

Advantages and Limitations

Advantages:

• Simple and effective representation of local word context.
• Easy to compute and implement.
• Can be used in various machine learning algorithms.

Limitations:

• Fixed window size means limited context (doesn't capture long-range dependencies).
• Data sparsity for higher N values (e.g., 4-grams or 5-grams).
• Large storage and computation requirements for higher N.

TextBlob is a Python library for processing textual data. It provides a simple API for common NLP tasks, built on top of the NLTK and Pattern libraries.

Key Features of TextBlob


• Tokenization: Splitting text into words or sentences.
• Part-of-Speech Tagging: Identifying nouns, verbs, adjectives, etc.
• Noun Phrase Extraction: Pulling out phrases that act as nouns.
• Sentiment Analysis: Evaluating the polarity (positive/negative) and subjectivity of text.
• Translation and Language Detection: Translating between languages.
• Word Inflection and Lemmatization: Normalizing words to their base forms.

How TextBlob Works


TextBlob leverages trained models and corpora to analyze text. For example:

POS Tagging: Uses pre-trained taggers to assign syntactic categories to words.

Sentiment Analysis: Uses a lexicon-based approach to assign polarity scores.
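The lexicon-based idea can be sketched in a few lines. The lexicon below is a hypothetical toy, not TextBlob's actual lexicon (which comes from Pattern and is far larger, with weighted, part-of-speech-aware entries):

```python
# Hypothetical toy polarity lexicon; scores range from -1.0 to +1.0
LEXICON = {"fascinating": 0.8, "fun": 0.6, "boring": -0.6, "terrible": -1.0}

def polarity(text):
    # Average the polarity scores of the words found in the lexicon;
    # return 0.0 (neutral) when no word matches.
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("NLP is fascinating but grammar is boring"))
```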

Example Workflow
1. Create a TextBlob object:
   blob = TextBlob("Natural language processing is fascinating.")
2. Extract nouns:
   nouns = [word for word, pos in blob.tags if pos.startswith('NN')]
3. Perform sentiment analysis:
   sentiment = blob.sentiment
   print(sentiment.polarity, sentiment.subjectivity)

Applications of TextBlob
• Rapid prototyping of NLP applications.
• Sentiment analysis for social media or customer reviews.
• Text classification and filtering.
• Educational tools for teaching NLP concepts.
• Automated chatbots and virtual assistants.

Advantages and Limitations

Advantages:

• Easy to use with minimal code.
• Provides many NLP features in a unified API.
• Built on robust NLP libraries (NLTK, Pattern).

Limitations:

• Less control for advanced NLP tasks.
• Performance depends on underlying corpora/models.
• Limited support for deep learning or contextual embeddings.
Aspect        N-grams                                      TextBlob
Core Concept  Sequences of N contiguous items in text      Python library for NLP tasks
Focus         Statistical language modeling                Text analysis, sentiment, tagging
Strengths     Simple, effective local context capture      Easy to use, multiple NLP tools integrated
Use Cases     Language modeling, speech recognition        Sentiment analysis, noun phrase extraction
Limitations   Limited context, data sparsity for large N   Limited control, depends on corpora
CODE:

import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from textblob import TextBlob

# Download the NLTK data needed for tokenization and TextBlob
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('brown')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Also download the TextBlob corpora (this is sometimes necessary)
import textblob.download_corpora
textblob.download_corpora.download_all()

# Sample input text
text = "Artificial intelligence and machine learning are transforming the world rapidly."

# --- TEXTBLOB ANALYSIS ---
blob = TextBlob(text)

print("----- TextBlob Analysis -----")
print("Sentiment:", blob.sentiment)
print("Noun Phrases:", blob.noun_phrases)
print("POS Tags:", blob.tags)

# --- NLTK N-GRAMS (1, 2, 3) ---
# Tokenize the text
tokens = word_tokenize(text)

# Generate unigrams, bigrams, and trigrams
unigrams = list(ngrams(tokens, 1))
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print("\n----- NLTK N-grams -----")
print("Unigrams:", unigrams)
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)

Conclusion:
N-grams provide a foundational technique for modeling and analyzing text by capturing local
word sequences, while TextBlob offers an easy-to-use toolkit for performing a wide range of
NLP tasks such as sentiment analysis and part-of-speech tagging. Together, they enable
efficient and effective text processing in many real-world applications.
