
NLP Introduction

Natural Language Processing (NLP) is a field of AI that focuses on the interaction between computers and human languages, enabling tasks like translation, summarization, and sentiment analysis. It involves components such as Natural Language Understanding (NLU) and Natural Language Generation (NLG), and faces challenges like ambiguity and the need for extensive data. NLP applications include smart assistants, spam detection, and chatbots, utilizing techniques from both classical and deep learning approaches.

Uploaded by

nitinpandey.dev

Prerequisites for Natural language processing
● Python
● Basic concepts of Machine Learning and Deep Learning
Natural language processing
Natural language processing (NLP) is a subfield of computer science,
information engineering, and artificial intelligence concerned with
the interactions between computers and human (natural) languages,
in particular how to program computers to process and analyze large
amounts of natural language data.

Challenges in natural language processing frequently involve speech recognition, natural
language understanding, and natural language generation.
NLP helps developers organize knowledge for performing tasks such as translation, automatic
summarization, Named Entity Recognition (NER), speech recognition, relationship
extraction, and topic segmentation.
•Translation: Automatically converting text or speech from one language to another, like translating
English sentences into French.

•Automatic Summarization: Generating a condensed version of a longer text while retaining its key
points. It can be extractive (picking important sentences directly) or abstractive (creating new
sentences that summarize the text).

•Named Entity Recognition (NER): Identifying and classifying entities (like names of people,
organizations, dates, etc.) in text. For example, in the sentence "John works at Microsoft," NER would
label "John" as a person and "Microsoft" as an organization.

•Speech Recognition: Converting spoken language into text. It's what happens when you use voice
assistants like Siri or Google Assistant.

•Relationship Extraction: Identifying relationships between entities in a text. For example, in "John
works at Microsoft," this technique would extract the relationship "works_at" between "John" and
"Microsoft."

•Topic Segmentation: Dividing a text into segments, each dealing with a specific topic. This is useful
in longer documents, like news articles or research papers, to identify different themes or subjects.
Applications of NLP
NLP has the following applications:
1. Smart Assistants
Smart assistants are systems that automatically answer questions asked by
humans in natural language.

2. Spam Detection
Spam detection identifies unwanted e-mails before they reach a
user's inbox.
3. Sentiment Analysis
Sentiment Analysis is also known as opinion mining.
It is used on the web to analyse the attitude,
behaviour, and emotional state of the sender. This
application is implemented through a combination of
NLP (Natural Language Processing) and statistics: it
assigns a value to the text (positive, negative, or
neutral) and identifies the mood of the context (happy, sad,
angry, etc.).
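
The idea of assigning a value to a text can be sketched with a word-list approach. This is only a minimal illustration: the `POSITIVE` and `NEGATIVE` word lists and the `sentiment` function are made up for this example, and real systems rely on much larger lexicons or trained models.

```python
# Tiny hand-made word lists (assumption for illustration only).
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "angry", "hate", "terrible"}

def sentiment(text):
    """Label a text as positive, negative, or neutral by counting lexicon hits."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great phone!"))
print(sentiment("The service was terrible."))
print(sentiment("The parcel arrived on Monday."))
```

A counting approach like this ignores negation ("not good") and sarcasm, which is one reason statistical and neural models are normally preferred.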

4. Language Translation
Language translation is used to translate text or
speech from one natural language to another
natural language.
Example: Google Translate
5. Spelling correction
Word processing and presentation software such as
Microsoft Word and PowerPoint
provides spelling correction.

6. Speech Recognition
Speech recognition is used for
converting spoken words into text. It is
used in applications such as mobile devices,
home automation, video recovery,
dictating to Microsoft Word, voice
biometrics, voice user interfaces, and so
on.
7. Chatbot
Chatbots are one of the
important applications of NLP. They are used by
many companies to provide customer
chat services.

8. Information extraction
Information extraction is one of the most
important applications of NLP. It is used for
extracting structured information from
unstructured or semi-structured machine-
readable documents.
Advantages of NLP
•NLP helps users to ask questions about any subject and get a direct response within seconds.
•NLP aims to give focused answers to a question, without unnecessary or unwanted
information.
•NLP helps computers to communicate with humans in their languages.
•It is very time efficient.
•Many companies use NLP to improve the efficiency and accuracy of documentation processes
and to identify information in large databases.

Disadvantages of NLP
•Training an NLP model requires a lot of data and computation.
•Many issues arise for NLP when dealing with informal expressions, idioms, and cultural jargon.
•NLP results are not always accurate, and their accuracy depends heavily on the quality of the
data.
•An NLP system is often designed for a single, narrow job, so it may not adapt well to new domains
and has a limited function.
Classical NLP
•Overview: Classical NLP uses rule-based, statistical, and machine learning methods to process
language. These approaches were dominant before the rise of deep learning techniques.
•Key Features:
• Focus on linguistic rules: Uses grammar, syntax rules, and domain-specific dictionaries.
• Heavy reliance on handcrafted features and manual feature engineering.
• Often employs shallow machine learning models such as:
• Naive Bayes
• Support Vector Machines (SVM)
• Hidden Markov Models (HMM)
• Conditional Random Fields (CRF)
• Models like n-grams, Bag of Words (BoW), and TF-IDF (Term Frequency-Inverse Document
Frequency) are commonly used for text representation.
•Examples:
• Spam classification using TF-IDF with an SVM classifier.
• Named Entity Recognition (NER) using rule-based methods or CRFs.
• POS tagging using HMMs or CRFs.
•Limitations:
• Requires extensive feature engineering.
• Not very good at capturing long-term dependencies or the contextual meaning of words.
• Struggles with large datasets and complex tasks like machine translation or sentiment analysis
at scale.
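
To make the TF-IDF representation mentioned above concrete, here is a minimal sketch using only the Python standard library. It assumes one common variant of the formula (term frequency divided by document length, multiplied by log(N / document frequency)); libraries such as scikit-learn use slightly different smoothing, and the three sample documents are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical mini-corpus.
docs = ["win a free prize now", "meeting agenda for monday", "free meeting room now"]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "win" gets a higher weight than "free" because "free" also occurs in another document. Vectors like these are the handcrafted features that a classical classifier such as an SVM would then be trained on.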
Deep NLP (Deep Learning-based NLP)
•Overview: Deep NLP leverages neural networks, particularly deep learning architectures, to learn
patterns in language data without requiring handcrafted features. It has gained prominence due to its
superior performance in complex NLP tasks.
•Key Features:
• Uses deep learning models like:
• Recurrent Neural Networks (RNN) and variants like LSTMs and GRUs.
• Convolutional Neural Networks (CNN) (applied to text).
• Transformers (e.g., BERT, GPT).
• No need for manual feature engineering—models automatically learn features from raw text
data.
• Able to capture context and long-term dependencies in language.
• Often relies on word embeddings such as Word2Vec, GloVe, or BERT embeddings for text
representation.
How NLP, Deep NLP (DNLP), and Deep Learning (DL) relate
How does NLP work?
Components of NLP
NLP has the following two components:
1. Natural Language Understanding (NLU)
Natural Language Understanding (NLU) helps the machine to understand and analyse human
language by extracting metadata from content, such as concepts, entities, keywords, emotion,
relations, and semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both spoken and
written language.
NLU involves the following tasks:
•Mapping the given input into a useful representation.
•Analyzing different aspects of the language.
2. Natural Language Generation (NLG)
Natural Language Generation (NLG) acts as a translator that converts the computerized data into
natural language representation. It mainly involves Text planning, Sentence planning, and Text
Realization.
NLP Working
Natural Language Understanding
Phonology – This science deals with the patterns present in speech sounds, treating sound as a
physical entity.

Pragmatics – This science studies how language is used in different contexts.

Morphology – This science deals with the structure of words and the systematic relations
between them.

Syntax – This science deals with the structure of sentences.

Semantics – This science deals with the literal meaning of words, phrases, and
sentences.
Phases of NLP
NLP has the following five phases:
1. Lexical and Morphological Analysis
The first phase of NLP is lexical analysis. This phase scans the input text as a stream of
characters and converts it into meaningful lexemes. It divides the whole text into paragraphs,
sentences, and words.
2. Syntactic Analysis (Parsing)
Syntactic analysis checks grammar and word arrangement, and shows the relationships
among the words.
Example: Agra goes to the Poonam
In the real world, "Agra goes to the Poonam" does not make any sense, so this sentence is rejected
by the syntactic analyzer.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal
meaning of words, phrases, and sentences.
4. Discourse Integration
In discourse integration, the meaning of a sentence depends on the sentences that precede it and
also influences the meaning of the sentences that follow it.
5. Pragmatic Analysis
Pragmatic analysis is the fifth and last phase of NLP. It helps you to discover the intended effect
by applying a set of rules that characterize cooperative dialogues.
For Example: "Open the door" is interpreted as a request instead of an order.
Natural Language Generation
Based on the output of natural language understanding, NLG decides:
● What should be said to the user.
● How to sound intelligent and conversational, like a human.
● How to use structured data.
● How to plan the text and the sentences.
Why is NLP difficult?
NLP is difficult because ambiguity and uncertainty exist in language.
Ambiguity
The main types of ambiguity are:
Lexical Ambiguity: The tank is full of water.
Syntactic Ambiguity: Ill men and women get to hospital.
Semantic Ambiguity: The bike hit the pole while it was running.
Pragmatic Ambiguity: The army is coming.
•Lexical Ambiguity
Lexical Ambiguity exists when a single word has two or more possible meanings.
Example:
Manya is looking for a match.
In the above example, the word match could mean that Manya is looking for a partner or that Manya
is looking for a game (a cricket match or another match).
•Syntactic Ambiguity
Syntactic Ambiguity exists when the structure of a sentence allows two or more possible meanings.
Example:
I saw the moon with the binoculars.
In the above example, did I have the binoculars? Or did the moon have the binoculars?
•Referential Ambiguity
Referential Ambiguity exists when it is unclear what a pronoun refers to.
Example: Kiran went to Sunita. She said, "I am hungry."
In the above sentence, you do not know who is hungry, Kiran or Sunita.
How to build an NLP pipeline
An NLP pipeline is built in the following steps:
Step 1: Sentence Segmentation
Sentence segmentation is the first step in building the NLP pipeline. It breaks the paragraph into
separate sentences.
Example: Consider the following paragraph -
Independence Day is one of the important festivals for every Indian citizen. It is celebrated on
the 15th of August each year ever since India got independence from the British rule. This
day celebrates independence in the true sense.
Sentence segmentation produces the following result:
1."Independence Day is one of the important festivals for every Indian citizen."
2."It is celebrated on the 15th of August each year ever since India got independence from the
British rule."
3."This day celebrates independence in the true sense."
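
As a rough sketch of sentence segmentation, the paragraph can be split wherever sentence-ending punctuation is followed by whitespace. The `split_sentences` helper is a made-up name for this example, and the rule is deliberately naive: abbreviations such as "Dr." or "e.g." would be split incorrectly, which is why libraries use trained or rule-rich segmenters.

```python
import re

def split_sentences(paragraph):
    """Split text after '.', '!' or '?' when followed by whitespace (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]

paragraph = (
    "Independence Day is one of the important festivals for every Indian citizen. "
    "It is celebrated on the 15th of August each year ever since India got "
    "independence from the British rule. "
    "This day celebrates independence in the true sense."
)

sentences = split_sentences(paragraph)
for number, sentence in enumerate(sentences, start=1):
    print(number, sentence)
```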
Step 2: Word Tokenization
A word tokenizer breaks the sentence into separate words or tokens.
Example:
They offer Corporate Training, Summer Training, Online Training, and Winter Training.
The word tokenizer generates the following result:
"They", "offer", "Corporate", "Training", "Summer", "Training", "Online", "Training", "and", "Winter",
"Training", "."
Step 3: Stemming
Stemming normalizes words into their base or root form. For example, celebrates,
celebrated, and celebrating all originate from the single root word "celebrate." The
big problem with stemming is that it sometimes produces a root word that does not have any
meaning.
For example, intelligence, intelligent, and intelligently all reduce to the single
root "intelligen." In English, the word "intelligen" does not have any meaning.
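
The behaviour described above can be illustrated with a crude suffix-stripping stemmer. This is a simplified sketch, not the Porter algorithm or any library implementation; the suffix list and the minimum stem length of three characters are assumptions chosen for the example.

```python
# Suffixes are checked in this order (assumption for illustration).
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

stems = [stem(w) for w in ["celebrates", "celebrated", "celebrating"]]
print(stems)
```

All three forms collapse to the same stem, but that stem is "celebrat", which is not an English word. This is the kind of meaningless root that stemming can produce.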
Step 4: Lemmatization
Lemmatization is quite similar to stemming. It groups the different inflected forms of a
word under a common base form, called the lemma. The main difference between stemming and
lemmatization is that lemmatization produces a root word that has a meaning.
For example: In lemmatization, the words intelligence, intelligent, and intelligently have the root
word intelligent, which has a meaning.
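
In its simplest form, lemmatization can be sketched as a dictionary lookup from inflected forms to lemmas. The `LEMMAS` table below is a tiny hand-written example, not data from any library; real lemmatizers combine large vocabularies with part-of-speech information.

```python
# Hypothetical lookup table mapping inflected forms to their lemma.
LEMMAS = {
    "celebrates": "celebrate",
    "celebrated": "celebrate",
    "celebrating": "celebrate",
    "went": "go",
    "mice": "mouse",
}

def lemmatize(word):
    """Return the lemma if the word is known, otherwise the lowercased word."""
    word = word.lower()
    return LEMMAS.get(word, word)

print([lemmatize(w) for w in ["Celebrated", "went", "mice", "boy"]])
```

Unlike a suffix stripper, the lookup always returns a real word and can handle irregular forms such as "went".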
Step 5: Identifying Stop Words
In English, there are a lot of words that appear very frequently like "is", "and", "the", and "a". NLP
pipelines will flag these words as stop words. Stop words might be filtered out before doing any
statistical analysis.
Example: He is a good boy.
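
A short sketch of stop-word filtering on the example sentence. The `STOP_WORDS` set contains only the four words named above; libraries ship much longer lists, and the `remove_stop_words` helper is a name invented for this example.

```python
import re

# Only the stop words mentioned in the text; real lists are far longer.
STOP_WORDS = {"is", "and", "the", "a"}

def remove_stop_words(sentence):
    """Tokenize on word characters and drop tokens found in STOP_WORDS."""
    tokens = re.findall(r"\w+", sentence)
    return [t for t in tokens if t.lower() not in STOP_WORDS]

filtered = remove_stop_words("He is a good boy.")
print(filtered)
```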
Step 6: Dependency Parsing
Dependency parsing finds how all the words in the sentence are related to each other.
Step 7: Count vectorizer
Count Vectorizer is a way to convert a given set of strings into a frequency representation. A
very simple use case in NLP is text classification: you get a data set of tagged texts, and you
use their word counts to predict the tag of another text.
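
The frequency representation can be sketched with the standard library alone. The `count_vectorize` function and the two sample texts are illustrative assumptions; scikit-learn offers a full-featured `CountVectorizer` class for real use.

```python
from collections import Counter

def count_vectorize(texts):
    """Return a sorted vocabulary and one count vector per text."""
    tokenized = [text.lower().split() for text in texts]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One column per vocabulary term; absent terms count as 0.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

vocab, vectors = count_vectorize(["free prize free entry", "project meeting entry"])
print(vocab)
print(vectors)
```

Each text becomes a fixed-length vector of counts, which is the form a classifier needs in order to learn from tagged examples.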
Step 8: Part-of-speech tagging (POS tags)
POS stands for parts of speech, which include noun, verb, adverb, and adjective. A POS tag indicates
how a word functions, both in meaning and grammatically, within the sentence. A word can have one
or more parts of speech depending on the context in which it is used.
Example: "Google" something on the Internet.
In the above example, Google is used as a verb, although it is a proper noun.
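
To show how context can decide a tag, here is a toy tagger. Everything in it is an assumption made for illustration: the small `LEXICON`, the `AMBIGUOUS` word set, and the single rule that an ambiguous word is a noun after a determiner and a verb otherwise. Practical taggers learn such decisions from annotated data, for example with HMMs or CRFs.

```python
# Hypothetical mini-lexicon of unambiguous words.
LEXICON = {"the": "DET", "a": "DET", "i": "PRON", "to": "PART",
           "want": "VERB", "read": "VERB", "flight": "NOUN"}
# Words whose tag depends on context in this toy example.
AMBIGUOUS = {"book", "google"}

def tag(tokens):
    """Tag tokens; ambiguous words are NOUN after a determiner, else VERB."""
    tags = []
    for token in tokens:
        word = token.lower()
        if word in AMBIGUOUS:
            previous = tags[-1] if tags else None
            tags.append("NOUN" if previous == "DET" else "VERB")
        else:
            # Unknown words default to NOUN (a common fallback heuristic).
            tags.append(LEXICON.get(word, "NOUN"))
    return list(zip(tokens, tags))

print(tag(["I", "want", "to", "book", "a", "flight"]))
print(tag(["I", "read", "the", "book"]))
```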
Step 9: Named Entity Recognition (NER)
Named Entity Recognition (NER) is the process of detecting named entities such as person names,
movie names, organization names, or locations.
Example: Steve Jobs introduced iPhone at the Macworld Conference in San Francisco, California.
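
One of the simplest ways to approximate NER is a gazetteer, that is, a lookup list of known entity names. The sketch below applies that idea to the example sentence. The `GAZETTEER` entries and their labels (PERSON, PRODUCT, EVENT, LOCATION) are assumptions chosen for this example; statistical or neural NER models can also recognize names they have never seen.

```python
# Hypothetical gazetteer: known entity names and the labels assumed for them.
GAZETTEER = {
    "Steve Jobs": "PERSON",
    "iPhone": "PRODUCT",
    "Macworld Conference": "EVENT",
    "San Francisco": "LOCATION",
    "California": "LOCATION",
}

def find_entities(text):
    """Return (name, label) pairs for gazetteer entries found, in text order."""
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            found.append((start, name, label))
    return [(name, label) for _, name, label in sorted(found)]

sentence = ("Steve Jobs introduced iPhone at the Macworld Conference "
            "in San Francisco, California.")
entities = find_entities(sentence)
print(entities)
```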
Step 10: Chunking
Chunking collects individual pieces of information and groups them into bigger pieces, such as
phrases within a sentence.
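
Chunking usually works on tokens that already carry POS tags. The sketch below groups consecutive determiners, adjectives, and nouns into noun-phrase chunks. It is a deliberately simple rule with a made-up helper name and hand-tagged input; chunkers in libraries use grammar patterns or trained models.

```python
def noun_phrase_chunks(tagged_tokens):
    """Group runs of DET/ADJ/NOUN tokens into noun-phrase strings."""
    chunks, current = [], []
    for word, pos in tagged_tokens:
        if pos in {"DET", "ADJ", "NOUN"}:
            current.append(word)
        elif current:
            # Any other tag ends the current chunk.
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    return chunks

# Hand-tagged example sentence (tags are assumed, not produced by a tagger).
tagged = [("The", "DET"), ("little", "ADJ"), ("dog", "NOUN"), ("chased", "VERB"),
          ("a", "DET"), ("red", "ADJ"), ("ball", "NOUN")]
chunks = noun_phrase_chunks(tagged)
print(chunks)
```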
NLP Libraries
Scikit-learn: It provides a wide range of algorithms for building machine learning models in Python.
Natural Language Toolkit (NLTK): NLTK is a comprehensive toolkit that covers many NLP techniques.
Pattern: It is a web mining module for NLP and machine learning.
TextBlob: It provides an easy interface for basic NLP tasks like sentiment analysis, noun
phrase extraction, or POS tagging.
Quepy: Quepy is used to transform natural language questions into queries in a database query
language.
spaCy: spaCy is an open-source NLP library that can be used for data extraction, data analysis,
sentiment analysis, and text summarization.
Gensim: Gensim works with large datasets and processes data streams.
