NLP Exp 5 B707

The document outlines an experiment to study and implement N-Gram models using NLTK and TextBlob in natural language processing. It explains the concept of N-grams, their importance, applications, advantages, and limitations, along with a practical coding example. Additionally, it highlights the features and functionalities of TextBlob for various NLP tasks, emphasizing the synergy between N-grams and TextBlob for effective text processing.


DEPARTMENT OF COMPUTER ENGINEERING

EXPERIMENT NO. 05

DATE:

ROLL NO: B-707

AIM: To study and implement N-Gram models using NLTK and TextBlob.

THEORY:

An N-gram is a contiguous sequence of N items from a given text or speech sample. In natural
language processing (NLP), these items are usually words, but they can also be characters or
phonemes.

• Unigram: A sequence of one item (single words).
• Bigram: A sequence of two adjacent items (pairs of words).
• Trigram: A sequence of three adjacent items.
• And so forth...

Why are N-grams important?

• N-grams are fundamental in modeling language statistically.
• They help in understanding and predicting word sequences.
• Widely used in applications like speech recognition, machine translation, and text prediction.

How N-grams work


Consider the sentence:
"Natural language processing is fun."

• Unigrams: ["Natural", "language", "processing", "is", "fun"]
• Bigrams: [("Natural", "language"), ("language", "processing"), ("processing", "is"), ("is", "fun")]
• Trigrams: [("Natural", "language", "processing"), ("language", "processing", "is"), ("processing", "is", "fun")]
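The splits above can be reproduced with a short helper. nltk.util.ngrams behaves the same way on a token list; this is a minimal pure-Python sketch of the sliding window:

```python
def ngrams(tokens, n):
    # Slide a window of width n across the token list; zip stops at the
    # shortest offset list, yielding len(tokens) - n + 1 tuples.
    return list(zip(*[tokens[i:] for i in range(n)]))

tokens = "Natural language processing is fun".split()
print(ngrams(tokens, 2))
# [('Natural', 'language'), ('language', 'processing'), ('processing', 'is'), ('is', 'fun')]
```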

Applications of N-grams
• Language Modeling: Estimating the probability of word sequences.
• Spell Checking: Detecting unusual word sequences.
• Text Classification: Features for categorizing documents.
• Information Retrieval: Improving search accuracy.
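As an illustration of the language-modeling use case, bigram probabilities can be estimated directly from counts. This is a maximum-likelihood sketch on a made-up toy corpus; real models train on far more data and require smoothing for unseen pairs:

```python
from collections import Counter

# Toy corpus (real language models use millions of tokens)
corpus = "the cat sat on the mat the cat ran".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    # Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("the", "cat"))  # 2 of the 3 occurrences of "the" precede "cat"
```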
• Machine Translation: Aligning phrases across languages.

Advantages and Limitations

Advantages:

• Simple and effective representation of local word context.
• Easy to compute and implement.
• Can be used in various machine learning algorithms.

Limitations:

• Fixed window size means limited context (doesn't capture long-range dependencies).
• Data sparsity for higher N values (e.g., 4-grams or 5-grams).
• Large storage and computation requirements for higher N.

TextBlob is a Python library for processing textual data. It provides a simple API for common NLP tasks, built on top of the NLTK and Pattern libraries.

Key Features of TextBlob


• Tokenization: Splitting text into words or sentences.
• Part-of-Speech Tagging: Identifying nouns, verbs, adjectives, etc.
• Noun Phrase Extraction: Pulling out phrases that act as nouns.
• Sentiment Analysis: Evaluating the polarity (positive/negative) and subjectivity of text.
• Translation and Language Detection: Translating between languages.
• Word Inflection and Lemmatization: Normalizing words to their base forms.

How TextBlob Works


TextBlob leverages trained models and corpora to analyze text. For example:

POS Tagging: Uses pre-trained taggers to assign syntactic categories to words.

Sentiment Analysis: Uses a lexicon-based approach to assign polarity scores.
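The lexicon-based idea can be sketched in a few lines. The lexicon below is a hypothetical toy, not TextBlob's actual lexicon (which comes from Pattern and is far larger, with weighted, part-of-speech-aware entries):

```python
# Hypothetical toy polarity lexicon; scores range from -1.0 to +1.0
LEXICON = {"fascinating": 0.8, "fun": 0.6, "boring": -0.6, "terrible": -1.0}

def polarity(text):
    # Average the polarity scores of the words found in the lexicon;
    # return 0.0 (neutral) when no word matches.
    words = text.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("NLP is fascinating but grammar is boring"))
```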

Example Workflow
1. Create a TextBlob object:
   blob = TextBlob("Natural language processing is fascinating.")
2. Extract nouns:
   nouns = [word for word, pos in blob.tags if pos.startswith('NN')]
3. Perform sentiment analysis:
   sentiment = blob.sentiment
   print(sentiment.polarity, sentiment.subjectivity)

Applications of TextBlob
• Rapid prototyping of NLP applications.
• Sentiment analysis for social media or customer reviews.
• Text classification and filtering.
• Educational tools for teaching NLP concepts.
• Automated chatbots and virtual assistants.

Advantages and Limitations

Advantages:

• Easy to use with minimal code.
• Provides many NLP features in a unified API.
• Built on robust NLP libraries (NLTK, Pattern).

Limitations:

• Less control for advanced NLP tasks.
• Performance depends on underlying corpora/models.
• Limited support for deep learning or contextual embeddings.
Aspect        N-grams                                      TextBlob
Core Concept  Sequences of N contiguous items in text      Python library for NLP tasks
Focus         Statistical language modeling                Text analysis, sentiment, tagging
Strengths     Simple, effective local context capture      Easy to use, multiple NLP tools integrated
Use Cases     Language modeling, speech recognition        Sentiment analysis, noun phrase extraction
Limitations   Limited context, data sparsity for large N   Limited control, depends on corpora
CODE:

import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
from textblob import TextBlob

# Download the NLTK data needed for tokenization and TextBlob
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('brown')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Also download the TextBlob corpora (this is sometimes necessary)
import textblob.download_corpora
textblob.download_corpora.download_all()

# Sample input text
text = "Artificial intelligence and machine learning are transforming the world rapidly."

# --- TEXTBLOB ANALYSIS ---
blob = TextBlob(text)

print("----- TextBlob Analysis -----")
print("Sentiment:", blob.sentiment)
print("Noun Phrases:", blob.noun_phrases)
print("POS Tags:", blob.tags)

# --- NLTK N-GRAMS (1, 2, 3) ---
# Tokenize the text
tokens = word_tokenize(text)

# Generate unigrams, bigrams, and trigrams
unigrams = list(ngrams(tokens, 1))
bigrams = list(ngrams(tokens, 2))
trigrams = list(ngrams(tokens, 3))

print("\n----- NLTK N-grams -----")
print("Unigrams:", unigrams)
print("Bigrams:", bigrams)
print("Trigrams:", trigrams)

Conclusion:
N-grams provide a foundational technique for modeling and analyzing text by capturing local
word sequences, while TextBlob offers an easy-to-use toolkit for performing a wide range of
NLP tasks such as sentiment analysis and part-of-speech tagging. Together, they enable
efficient and effective text processing in many real-world applications.
