
Natural Language Processing (Assignment 1)

Q.1. What is NLP? What are the applications of NLP?


Answer:
NLP:
➢ NLP stands for Natural Language Processing.
➢ It is a field at the intersection of computer science, linguistics, and artificial intelligence.
➢ The field of study that focuses on the interactions between human language and computers is called
natural language processing.
➢ It is the technology used by machines to understand, analyse, manipulate, and interpret human language.
➢ Natural language understanding and natural language generation are the two main types of NLP tasks.
➢ It helps machines to process and understand the human language so that they can automatically
perform repetitive tasks.
➢ Example: Facebook uses NLP to track trending topics and popular hashtags.

NLP Applications:

Question Answering: NLP systems can understand and respond to questions posed in natural language,
which is the basis for chatbots and virtual assistants.

Text Classification: NLP can be used to automatically categorize and tag text data into predefined
categories. This is often used for sentiment analysis, spam detection, and topic classification.

Machine Translation: NLP systems like Google Translate use algorithms to automatically translate text or
speech from one language to another.

Speech Recognition: NLP is involved in converting spoken language into text, enabling voice assistants like
Siri and Alexa to understand and respond to verbal commands.

Information Retrieval: NLP helps search engines understand user queries and retrieve relevant documents
or web pages from vast databases.

Language Generation: NLP models can generate human-like language, from poetry and storytelling to program code.

Legal: NLP is used for legal document analysis, contract review, and e-discovery.

Education: NLP can assist in automated grading of essays, personalized learning, and language learning
applications.
Q.2. Explain the stages of NLP.
Answer: There are five stages in Natural Language Processing:

Lexical Analysis:
➢ Lexical Analysis is the first stage in NLP.
➢ It is also known as morphological analysis.
➢ It involves identifying and analyzing the structure of words.
➢ It divides the whole text into paragraphs, sentences, and words.
➢ When you apply lemmatization to a word, it reduces that word to its most basic or dictionary form,
which is the lemma. For example, the lemma of "running," "ran," and "runs" is "run."
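As a small illustration, the sketch below shows lemmatization with NLTK's WordNet lemmatizer (a minimal sketch, assuming nltk is installed and the wordnet corpus has been downloaded):

```python
# A minimal lemmatization sketch using NLTK's WordNet lemmatizer.
# Assumes: pip install nltk, then nltk.download('wordnet') once.
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
# pos="v" tells the lemmatizer to treat each word as a verb.
for word in ["running", "ran", "runs"]:
    print(word, "->", lemmatizer.lemmatize(word, pos="v"))
# running -> run, ran -> run, runs -> run
```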

Syntactic Analysis:
➢ It is also known as parsing.
➢ Syntactic analysis checks the grammar and arrangement of words, and shows the relationships among the words.
➢ Example: "Agra goes to the Rutuja."
➢ In the real world, "Agra goes to the Rutuja" does not make any sense, so this sentence is rejected by the syntactic analyzer.
➢ Dependency grammar and part-of-speech (POS) tags are important attributes of syntactic analysis.
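As an illustration of POS tags, here is a minimal sketch using NLTK's off-the-shelf tagger (assuming the punkt and averaged_perceptron_tagger resources have been downloaded):

```python
# A minimal POS-tagging sketch with NLTK.
# Assumes: nltk.download('punkt') and nltk.download('averaged_perceptron_tagger').
import nltk

tokens = nltk.word_tokenize("The man saw the girl with the telescope")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('man', 'NN'), ('saw', 'VBD'), ('the', 'DT'),
#       ('girl', 'NN'), ('with', 'IN'), ('the', 'DT'), ('telescope', 'NN')]
```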

Semantic Analysis:
➢ Semantic analysis is concerned with the meaning representation.
➢ It mainly focuses on the literal meaning of words.
➢ Consider the sentence: "The apple ate a banana".
➢ Although the sentence is syntactically correct, it doesn't make sense because apples can't eat.
➢ Semantic analysis looks for meaning in the given sentence.
➢ It also deals with combining words into phrases.

Discourse Integration:
➢ The meaning of any sentence depends upon the meaning of the sentence just before it.
➢ In addition, it also influences the meaning of the sentence that immediately follows it.
➢ In the text, "Yash Patil is a bright student. He spends most of the time in the library." Here, discourse
assigns "he" to refer to "Yash Patil".

Pragmatic Analysis:
➢ Pragmatic analysis is the fifth and last phase of NLP.
➢ During this phase, what was said is re-interpreted in terms of what it actually meant.
➢ It involves deriving those aspects of language which require real world knowledge.
➢ For example: "Open the book" is interpreted as a request instead of an order.

Q.3. Define ambiguity. Explain different types of ambiguity with examples.
Answer:
Ambiguity in NLP:
➢ Ambiguity is the capability of being understood in more than one way.
➢ Natural language is very ambiguous.
➢ Almost any sentence in a language with a sufficiently large grammar can have more than one interpretation.

Types of Ambiguity:
1) Lexical Ambiguity:
➢ The ambiguity of a single word is called lexical ambiguity.
➢ For example, treating the word silver as a noun, an adjective, or a verb.
- She got a silver in the long jump.
- Her dress is trimmed with silver ribbon.
- The surface of the lake was silvered by moonlight.
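The lexical ambiguity of "silver" can also be seen programmatically; here is a minimal sketch using NLTK's WordNet interface (assuming the wordnet corpus has been downloaded):

```python
# List the senses of "silver" with their part-of-speech tags.
# Assumes: nltk.download('wordnet').
from nltk.corpus import wordnet

# pos() returns 'n' (noun), 'v' (verb), 'a'/'s' (adjective) for each sense.
for synset in wordnet.synsets("silver"):
    print(synset.name(), synset.pos(), "-", synset.definition())
```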

2) Syntactic Ambiguity:
➢ This kind of ambiguity occurs when a sentence is parsed in different ways.
➢ For example, the sentence “The man saw the girl with the telescope”.
➢ It is ambiguous whether the man saw the girl carrying a telescope or he saw her through his
telescope.

3) Semantic Ambiguity:
➢ This kind of ambiguity occurs when the meaning of the words themselves can be misinterpreted.
➢ In other words, semantic ambiguity happens when a sentence contains an ambiguous word or
phrase.
➢ For example, the sentence “The car hit the pole while it was moving” has semantic ambiguity, because the interpretations can be “The car, while moving, hit the pole” and “The car hit the pole while the pole was moving”.

4) Anaphoric Ambiguity:
➢ This kind of ambiguity arises due to the use of anaphoric entities in discourse.
➢ For example: “The horse ran up the hill. It was very steep. It soon got tired.” Here, the anaphoric reference of “it” in the two sentences causes ambiguity.

5) Pragmatic Ambiguity:
➢ Pragmatic ambiguity arises when the statement is not specific. It is the most difficult type of ambiguity to resolve.
➢ For example, the sentence "I like you too" can have multiple interpretations, such as "I like you (just like you like me)" and "I like you (just like someone else does)".

Q.4. Explain the challenges in NLP
Answer:
Contextual words, phrases, and homonyms:
➢ The same words and phrases can have different meanings according to the context of a sentence, and many words, especially in English, have exactly the same pronunciation but totally different meanings.

Synonyms:
➢ Synonyms can lead to issues similar to those of contextual understanding, because we use many different words to express the same idea.
➢ Furthermore, some of these words may convey exactly the same meaning, while others may differ in degree (small, little, tiny, minute), and different people use synonyms to denote slightly different meanings within their personal vocabulary.

Irony and sarcasm:


➢ Irony and sarcasm present problems for machine learning models because they generally use words and phrases that, strictly by definition, may be positive or negative but actually connote the opposite.

Lack of research and development:


➢ Machine learning requires a great deal of data to perform at its best, often billions of pieces of training data.
➢ The more data NLP models are trained on, the smarter they become.
➢ All of the problems above will require more research and new techniques in order to improve on them.

Errors in text or speech:


➢ Misspelled or misused words can create problems for text analysis.
➢ Autocorrect and grammar correction applications can handle common mistakes, but don't always
understand the writer's intention.

Ambiguity:
➢ Ambiguity in NLP refers to sentences and phrases that potentially have two or more possible
interpretations.
➢ Ambiguity can be lexical, semantic, or syntactic.
➢ Even for humans, a sentence alone is often difficult to interpret without the context of the surrounding text.
➢ Part-of-speech (POS) tagging is one NLP technique that can partially address the problem.

Q.5. Explain the levels of knowledge required for natural language understanding.
Answer:
The different Levels (forms) of knowledge required for natural language understanding are given below:
Phonetic and phonological Knowledge:
➢ Phonetics is the study of language at the level of sounds while phonology is the study of combination
of sounds into organized units of speech.
➢ Phonetic and phonological knowledge is essential for speech-based systems, as it deals with how words are related to the sounds that realize them.
Morphological knowledge:
➢ Morphology concerns word formation.
➢ It is a study of the patterns of formation of words by the combination of sounds into minimal distinctive
units of meaning called morphemes.
➢ Morphological knowledge concerns how words are constructed from morphemes.
Syntactic knowledge:
➢ Syntax is the level at which we study how words combine to form phrases, phrases combine to form clauses, and clauses join to make sentences. Syntactic analysis concerns sentence formation.
➢ It deals with how words can be put together to form correct sentences.
Semantic knowledge:
➢ It concerns meanings of the words and sentences.
➢ Defining the meaning of a sentence is very difficult due to the ambiguities involved.
Pragmatic knowledge:
➢ Pragmatics is the extension of meaning or semantics.
➢ Pragmatics deals with the contextual aspects of meaning in particular situations.
➢ It concerns how sentences are used in different situations.
Discourse Knowledge:
➢ Discourse concerns connected sentences.
➢ It is a study of chunks of language which are bigger than a single sentence.
➢ Discourse knowledge concerns inter-sentential links, that is, how the immediately preceding sentences affect the interpretation of the next sentence.
➢ Discourse knowledge is important for interpreting pronouns and temporal aspects of the information
conveyed.
World Knowledge:
➢ World knowledge is nothing but everyday knowledge that all speakers share about the world.
➢ It includes the general knowledge about the structure of the world and what each language user
must know about the other user's beliefs and goals.
➢ This knowledge is essential for deeper language understanding.

Q.6. Explain the generic NLP system.
Answer:
Well-known generic NLP systems include ELIZA, SYSTRAN, TAUM-METEO, SHRDLU, and LUNAR.
ELIZA (Weizenbaum, 1966):
➢ It is the earliest natural language understanding system.
➢ It uses simple pattern matching and substitution rules to mimic human conversation with the user.
➢ This system demonstrates communication between humans and machines.

SYSTRAN (System Translation, 1969):
➢ It is one of the earliest machine translation systems, used for Russian-English translation.
➢ It also provided the first online machine translation service, called Babel Fish.
➢ Babel Fish was used by the AltaVista search engine for machine translation.

TAUM-METEO:
➢ It is a natural language system developed in Canada to produce weather reports, translating English weather forecasts into French.
SHRDLU (1972):
➢ It is a natural language understanding system that simulates the actions of a robot in a blocks world domain.
➢ It uses syntactic parsing and semantic reasoning to understand instructions.

LUNAR (1971):
➢ It is a question answering system that answered questions about moon rocks.

Natural Language Processing (Assignment 2)

Q.1. What is morphology processing? Explain different types of morphologies.


Answer:
Morphology Processing:
➢ Words are potentially complex units, composed of even more basic units, called morphemes.
➢ A morpheme is the smallest part of a word that has grammatical function or meaning.
➢ We will designate them in braces - { }.
➢ For example, sawed, sawn, sawing, and saws can all be analyzed into the morphemes {saw} + {‑ed},
{‑n}, {‑ing}, and {‑s}, respectively.
➢ None of these last four can be further divided into meaningful units and each occurs in many other
words, such as looked, mown, coughing, bakes.
➢ Morphemes that must be attached as word parts are said to be bound.

Different types of morphemes:


➢ Morphemes can also be classified as root, derivational, or inflectional.
➢ Root morpheme
- It is the basic form to which other morphemes are attached.
- It provides the basic meaning of the word.
- The morpheme {saw} is the root of sawers.
➢ Derivational morphemes
- Derivational morphemes are added to forms to create separate words:
- {‑er} is a derivational suffix whose addition turns a verb into a noun, usually meaning the
person or thing that performs the action denoted by the verb.
- For example, {paint}+{-er} creates painter, one of whose meanings is “someone who paints.”
➢ Inflectional morphemes
- Inflectional morphemes do not create separate words.
- They merely modify the word in which they occur in order to indicate grammatical properties such as plurality, as the {-s} of magazines does, or past tense, as the {-ed} of barbecued does.
- For example, in immovability, {im-}, {-abil}, and {-ity} are all derivational morphemes, and
when we remove them, we are left with {move}, which cannot be further divided into
meaningful pieces, and so must be the word’s root.
- We must distinguish between a word’s root and the forms to which affixes are attached. In
moveable, {-able} is attached to {move}, which we’ve determined is the word’s root. However,
{im-} is attached to moveable, not to {move} (there is no word immove), but moveable is not
a root.
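To make the morpheme notation concrete, here is a toy affix-stripping sketch (the affix lists are hypothetical and far from complete; real morphological analysis needs a lexicon and spelling rules):

```python
# A toy morpheme segmenter; illustration only. Real analyzers use a lexicon
# and rules (e.g., a finite-state transducer).
PREFIXES = ["im", "un"]
SUFFIXES = ["ity", "abil", "able", "er", "ing", "ed", "s"]

def segment(word):
    """Greedily peel known prefixes and suffixes off a word."""
    prefixes, suffixes = [], []
    stripped = True
    while stripped:
        stripped = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) > len(p):
                prefixes.append("{" + p + "-}")
                word, stripped = word[len(p):], True
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s):
                suffixes.insert(0, "{-" + s + "}")
                word, stripped = word[:-len(s)], True
    return prefixes + ["{" + word + "}"] + suffixes

# Note the toy root comes out as {mov}, not the true root {move}:
print(segment("immovability"))  # ['{im-}', '{mov}', '{-abil}', '{-ity}']
```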

Q.2. Explain the N-gram model. Explain how the N-gram model is used for spelling correction.
Answer:
N-gram model:
➢ An N-gram language model predicts the probability of a given N-gram within any sequence of words
in the language.
➢ A good N-gram model can predict the next word in a sentence, i.e., the value of P(w|h), where w is the word and h is the history containing the previous n-1 words.
➢ It is a probabilistic language model trained on a collection of text.
➢ This model is useful in applications such as speech recognition and machine translation.

Consider an example: “This is a sentence”.


✓ In a unigram (1-gram) model, the units are single words: for the sentence above, “This”, “is”, “a”, “sentence”.
✓ In a bigram (2-gram) model, the units are two-word sequences: “This is”, “is a”, “a sentence”.
✓ In a trigram (3-gram) model, the units are three-word sequences: “This is a”, “is a sentence”.
(Figure: illustration of N-gram modelling for N = 1, 2, 3; a code sketch is given below.)
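A minimal sketch in plain Python of extracting the N-grams above:

```python
# Extract unigrams, bigrams, and trigrams from "This is a sentence".
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "This is a sentence".split()
print(ngrams(tokens, 1))  # [('This',), ('is',), ('a',), ('sentence',)]
print(ngrams(tokens, 2))  # [('This', 'is'), ('is', 'a'), ('a', 'sentence')]
print(ngrams(tokens, 3))  # [('This', 'is', 'a'), ('is', 'a', 'sentence')]
```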

N-gram model used for spelling correction:


➢ The N-gram language model is about finding probability distributions over sequences of words.
Consider the sentences: "There was heavy rain" and "There was heavy flood".
✓ From experience, it can be said that the first sentence sounds more natural.
✓ The N-gram language model tells us that "heavy rain" occurs more frequently than "heavy flood".
✓ So, the first sentence is more likely to occur, and it will be the one selected by the model.
✓ In a bigram model, only the previous word is considered for predicting the current word.
✓ In a trigram model, the two previous words are considered.

✓ In the N-gram language model, the following probabilities are calculated:

P(“There was heavy rain”) = P(“There”) × P(“was” | “There”) × P(“heavy” | “There was”) × P(“rain” | “There was heavy”)

✓ As it is not practical to calculate such long conditional probabilities, this is approximated by the bigram model as:

P(“There was heavy rain”) ≈ P(“There”) × P(“was” | “There”) × P(“heavy” | “was”) × P(“rain” | “heavy”)
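A minimal sketch of estimating bigram probabilities by maximum likelihood (the mini-corpus and its counts below are made up for illustration):

```python
# A toy bigram model; the corpus is hypothetical.
from collections import Counter

corpus = [
    "there was heavy rain",
    "there was heavy rain yesterday",
    "there was heavy flood",
]
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(w_prev, w):
    """Maximum-likelihood estimate: P(w | w_prev) = C(w_prev, w) / C(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

print(bigram_prob("heavy", "rain"))   # 2/3: "heavy rain" is more frequent
print(bigram_prob("heavy", "flood"))  # 1/3
```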

Q.3. Explain Lexicon-free FST Porter stemmer. Explain evaluation of N-grams using perplexity.
Answer:
Lexicon-free FST Porter stemmer:
➢ The Porter stemmer (1980) is a simple and efficient rule-based stemming algorithm that strips suffixes to reduce words to their stems.
➢ It is lexicon-free: it applies a cascaded series of rewrite rules directly to the surface form of a word, without consulting a dictionary.
➢ The rules are applied in phases; for example, ATIONAL → ATE (relational → relate) and ING → ε if the stem contains a vowel (motoring → motor).
➢ Since each set of rewrite rules defines a regular relation, each phase can be implemented as a finite-state transducer (FST), and the whole stemmer as a cascade of FSTs.
➢ Being lexicon-free makes the stemmer fast and compact, but it can make errors of commission (conflating unrelated words) and errors of omission (failing to conflate related words).
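A minimal demonstration using NLTK's implementation of the Porter stemmer (assuming nltk is installed):

```python
# Stemming a few words with NLTK's Porter stemmer.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["running", "motoring", "caresses", "ponies"]:
    print(word, "->", stemmer.stem(word))
# running -> run, motoring -> motor, caresses -> caress, ponies -> poni
```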
Evaluation of N-grams using perplexity:
✓ A common metric is to use perplexity, often written as PP.
✓ Perplexity is the multiplicative inverse of the probability assigned to the test set by the language
model, normalized by the number of words in the test set.
✓ If a language model assigns a higher probability to the unseen sentences of the test set, it is a more accurate language model.
✓ For a test set W = w_1 w_2 ... w_N:

PP(W) = P(w_1 w_2 \ldots w_N)^{-1/N} = \sqrt[N]{\frac{1}{P(w_1 w_2 \ldots w_N)}}

✓ We can use the chain rule to expand the probability of W:

PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_1 \ldots w_{i-1})}}

✓ Thus, if we are computing the perplexity of W with a bigram language model, we get:

PP(W) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i \mid w_{i-1})}}
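A minimal sketch of the bigram perplexity computation (the probabilities below are made up for illustration; summing logs avoids numerical underflow on long texts):

```python
# Bigram perplexity over a 4-word test sentence; probabilities are hypothetical.
import math

# P(w_i | w_{i-1}) for each word of a test sentence, e.g. "there was heavy rain".
probs = [0.4, 0.5, 0.6, 0.25]

N = len(probs)
log_prob = sum(math.log(p) for p in probs)   # log P(w_1 ... w_N)
perplexity = math.exp(-log_prob / N)         # PP(W) = P(W)^(-1/N)
print(perplexity)  # ≈ 2.403; lower perplexity means a better model
```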
Extra points:
➢ Because of the inverse relationship with probability, minimizing perplexity implies maximizing the
test set probability.
➢ Perplexity can also be related to the concept of entropy in information theory.
➢ It is important in any N-gram model to include markers at the start and end of sentences.
➢ This ensures that the total probability of the whole language sums to one.
➢ All calculations should include the end markers, but not the start markers, in the count of word tokens.
➢ In natural language processing, perplexity is a way of evaluating language models.
➢ A language model is a probability distribution over entire sentences or texts.

Q.4. Explain Morphological parsing with finite state transducer.
Answer:
➢ Morphological parsing maps a surface word form to its lexical form, i.e., a stem plus morphological features. For example, the surface form "cats" is parsed as "cat +N +PL".
➢ A finite-state transducer (FST) is a finite automaton that reads symbols from an input tape while writing symbols to an output tape, so it defines a regular relation between two languages.
➢ In the two-level morphology model, an FST relates the lexical level (stem and features) to the surface level (the actual spelling of the word).
➢ A morphological parser is typically built by composing a lexicon FST (listing stems and the affix classes they take) with orthographic rule FSTs, such as the e-insertion rule that maps "fox +N +PL" to "foxes".
➢ Because FSTs are invertible, the same machine can be run in both directions: parsing (surface → lexical) and generation (lexical → surface).
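A toy sketch in the spirit of a morphological-parsing FST (the lexicon and rules are hypothetical and hard-coded; a real system composes lexicon and rule transducers):

```python
# A toy morphological parser relating surface forms to lexical forms.
LEXICON = {"cat", "dog", "fox"}
IRREGULAR = {"geese": "goose +N +PL"}  # stored as a lexicon arc in a real FST

def parse(surface):
    """Map a surface noun to 'stem +N +SG/+PL'; None if parsing fails."""
    if surface in IRREGULAR:
        return IRREGULAR[surface]
    if surface in LEXICON:
        return surface + " +N +SG"
    if surface.endswith("es") and surface[:-2] in LEXICON:
        return surface[:-2] + " +N +PL"   # e-insertion: fox +N +PL <-> foxes
    if surface.endswith("s") and surface[:-1] in LEXICON:
        return surface[:-1] + " +N +PL"
    return None

for w in ["cats", "foxes", "geese", "dog"]:
    print(w, "->", parse(w))
# cats -> cat +N +PL, foxes -> fox +N +PL, geese -> goose +N +PL, dog -> dog +N +SG
```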
Q.5. Explain and exercise Good-Turing discounting.
Answer:
➢ Good-Turing discounting re-estimates the probability mass assigned to N-grams with low or zero counts, using the count of things seen once to estimate the count of things never seen.
➢ Let N_c be the number of N-gram types that occur exactly c times (the "frequency of frequency c").

Following is the formula for the revised (discounted) count:

c* = (c + 1) × N_(c+1) / N_c

➢ The total probability mass reserved for unseen N-grams is P(unseen) = N_1 / N, where N is the total number of observed N-gram tokens.
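As a small exercise (the frequency-of-frequency counts below are made up):

```python
# A toy Good-Turing exercise with hypothetical frequency-of-frequency counts.
Nc = {1: 10, 2: 4, 3: 2}   # Nc[c] = number of bigram types seen exactly c times
N = sum(c * n for c, n in Nc.items())   # total tokens = 1*10 + 2*4 + 3*2 = 24

# Probability mass reserved for unseen bigrams: P(unseen) = N1 / N
print(Nc[1] / N)                 # 10/24 ≈ 0.417

# Discounted count for bigrams seen once: c* = (c+1) * N_{c+1} / N_c
c_star = (1 + 1) * Nc[2] / Nc[1]
print(c_star)                    # 0.8 (discounted from 1)
print(c_star / N)                # revised probability of a once-seen bigram
```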
Q.6. What is meant by Laplace or add-one smoothing?
Answer:
➢ Laplace (add-one) smoothing is the simplest smoothing technique: add one to every N-gram count before normalizing the counts into probabilities.
➢ This ensures that no N-gram has zero probability, even if it never occurs in the training data.
➢ For unigrams: P_Laplace(w_i) = (c_i + 1) / (N + V), where c_i is the count of w_i, N is the total number of word tokens, and V is the vocabulary size.
➢ For bigrams: P_Laplace(w_i | w_(i-1)) = (C(w_(i-1) w_i) + 1) / (C(w_(i-1)) + V).
➢ Because it moves a large amount of probability mass to unseen events, add-one smoothing is a crude baseline; a common variant adds a smaller constant k instead (add-k smoothing).
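A minimal sketch of add-one smoothed bigram estimates on a two-sentence toy corpus (made up for illustration):

```python
# Add-one (Laplace) smoothed bigram probabilities on a toy corpus.
from collections import Counter

sentences = [["<s>", "i", "like", "college", "</s>"],
             ["<s>", "i", "like", "henry", "</s>"]]
unigrams, bigrams = Counter(), Counter()
for tokens in sentences:
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))
V = len(unigrams)  # vocabulary size (here 6, counting <s> and </s>)

def laplace(w_prev, w):
    """P(w | w_prev) = (C(w_prev, w) + 1) / (C(w_prev) + V)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

print(laplace("like", "college"))  # seen once:  (1+1)/(2+6) = 0.25
print(laplace("like", "water"))    # unseen:     (0+1)/(2+6) = 0.125, not zero
```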
Q.7. Define Regular relation. Give an example.
Answer:

✓ While (regular) languages are sufficient for some natural language applications, it is often useful to have a mechanism for relating two (formal) languages.
✓ For example, a part-of-speech tagger can be viewed as an application that relates a set of natural
language strings (the source language) to a set of part-of-speech tags (the target language).
✓ Example (relations over languages):
Consider a simple part-of-speech tagger:
Assume that the natural language is defined over Σ1 = {a, b, . . . , z} and that the set of tags is Σ2 = {PRON, V, DET, ADJ, N, P}. Then the part-of-speech relation might contain the following pairs (here, a string over Σ1 is mapped to a single element of Σ2):

⟨i, PRON⟩, ⟨saw, V⟩, ⟨the, DET⟩, ⟨small, ADJ⟩, ⟨cat, N⟩, ⟨in, P⟩
✓ A regular relation is defined over two alphabets, Σ1 and Σ2. Of course, the two alphabets can be
identical, but for many natural language applications they differ.
✓ If a relation in Σ∗ × Σ∗ is regular, its projections on both coordinates are regular languages (not all
relations that satisfy this condition are regular; additional constraints must hold on the underlying
mapping which we ignore here).
✓ Informally, a regular relation is a set of pairs, each of which consists of one string over Σ1 and one
string over Σ2, such that both the set of strings over Σ1 and that over Σ2 constitute regular
languages.
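A small sketch representing a finite (hence regular) part-of-speech relation as a set of pairs, together with its two projections (the word-tag pairs are illustrative):

```python
# A finite relation between strings over Σ1 = {a..z} and tags over Σ2,
# with its projections on each coordinate (both are regular languages).
relation = {("i", "PRON"), ("saw", "V"), ("the", "DET"),
            ("small", "ADJ"), ("cat", "N"), ("in", "P")}

source_projection = {w for w, t in relation}  # strings over Σ1
target_projection = {t for w, t in relation}  # strings over Σ2
print(sorted(source_projection))
print(sorted(target_projection))
```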

Q.8. What is the most probable next word predicted by the bigram model for the following word sequences?

Answer:
(1) <s> do ?
P( do | <s>) = 2/5 = 0.4
(2) <s> I like Henry ?
P( Henry | like ) = 0/3 = 0
(3) <s> Do I like College ?
P( College | like ) = 3/3 = 1
(4) <s> Do I like ?
P( like | I ) = 2/4 = 0.5
(5) a) <s> I like College </s>
- P( I | <s> ) = 2/5 = 0.4
- P( like | I ) = 2/4 = 0.5
- P( College | like ) = 3/3 = 1
- P( </s> | College ) = 3/3 = 1

Probability = (2/5) * (2/4) * (3/3) * (3/3) = 0.2

b) <s> Do I like Henry </s>


- P( Do | <s> ) = 2/5 = 0.4
- P( I | Do ) = 1/2 = 0.5
- P( like | I ) = 2/4 = 0.5
- P( Henry | like ) = 0/3 = 0
- P( </s> | Henry ) = 1/3 ≈ 0.33

Probability = 0.4 * 0.5 * 0.5 * 0 * 0.33 = 0

Most probable sequence is “ <s> I like College </s> ”
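A minimal sketch scoring the two candidate sentences with the bigram probabilities worked out above (the probabilities are copied from the answer; the underlying training corpus is not reproduced here):

```python
# Score candidate sentences by multiplying bigram probabilities.
bigram_p = {
    ("<s>", "I"): 2/5,  ("I", "like"): 2/4,   ("like", "College"): 3/3,
    ("College", "</s>"): 3/3,
    ("<s>", "Do"): 2/5, ("Do", "I"): 1/2,     ("like", "Henry"): 0/3,
    ("Henry", "</s>"): 1/3,
}

def sentence_prob(words):
    """Product of P(w_i | w_{i-1}) over adjacent word pairs."""
    p = 1.0
    for pair in zip(words, words[1:]):
        p *= bigram_p[pair]
    return p

print(sentence_prob(["<s>", "I", "like", "College", "</s>"]))     # 0.2
print(sentence_prob(["<s>", "Do", "I", "like", "Henry", "</s>"])) # 0.0
```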


