NLP Toppers Solution

The document outlines the curriculum for a Natural Language Processing (NLP) course, detailing various modules such as Introduction to NLP, Word Level Analysis, Syntax Analysis, Semantic Analysis, and Pragmatics. It discusses the advantages and disadvantages of NLP, along with its applications including machine translation, sentiment analysis, and chatbots. The document also emphasizes the importance of understanding ambiguity in natural language and the different levels of NLP processing.

Mar 2022 Edition

Natural Language Processing (NLP) | SEM 8 | BE Computer

Exam scheme:
Exam  | TT-1 | TT-2 | Avg | Term Work | Oral/Practical | End Sem Exam
Marks | 20   | 20   | 20  | 25        |                | 80

Syllabus:

1. Introduction (Page 05): History of NLP, Generic NLP system, levels of NLP, knowledge in language processing, ambiguity in natural language, stages in NLP, challenges of NLP, applications of NLP.
2. Word Level Analysis (Page 16): Morphology analysis - survey of English morphology, inflectional morphology & derivational morphology, lemmatization, regular expressions, finite automata, finite state transducers (FST), morphological parsing with FST, lexicon-free FST Porter stemmer. N-Grams - N-gram language model, N-gram for spelling correction.
3. Syntax Analysis (Page 26): Part-of-Speech tagging (POS) - tag set for English (Penn Treebank), rule-based POS tagging, stochastic POS tagging, issues - multiple tags & words, unknown words. Introduction to CFG, sequence labeling: Hidden Markov Model (HMM), Maximum Entropy, and Conditional Random Field (CRF).
4. Semantic Analysis (Page 37): Lexical semantics, attachment for fragment of English - sentences, noun phrases, verb phrases, prepositional phrases; relations among lexemes & their senses - homonymy, polysemy, synonymy, hyponymy; WordNet, robust Word Sense Disambiguation (WSD), dictionary-based approach.
5. Pragmatics (Page 53): Discourse - reference resolution, reference phenomena, syntactic & semantic constraints on coreference.
6. Applications, preferably for Indian regional languages (Page 57): Machine translation, information retrieval, question answering systems, categorization, summarization, sentiment analysis, Named Entity Recognition.

Note: We have tried to cover almost every important question listed in the syllabus. If you feel any other question is important and it is not covered in this solution, then do mail the question to Support@BackkBenchers.com or WhatsApp us on +91-9930038388 / +91-7507531198.

Chap-1 | Introduction

Q1. What is NLP? Describe Applications of NLP.

Ans:

1. NLP stands for Natural Language Processing.
2. It is part of Computer Science, Human Language, and Artificial Intelligence.
3. The field of study that focuses on the interactions between human language and computers is called natural language processing.
4. It is the technology that is used by machines to understand, analyse, manipulate, and interpret human languages.
5. Natural language recognition and natural language generation are types of NLP.
6. It helps machines to process and understand the human language so that they can automatically
perform repetitive tasks.
7. It helps developers to organize knowledge for performing tasks such as translation, automatic
summarization, Named Entity Recognition (NER), speech recognition, relationship
extraction, and topic segmentation.
8. Example: Facebook uses NLP to track trending topics and popular hashtags.

ADVANTAGES OF NLP:

1. NLP helps users to ask questions about any subject and get a direct response within seconds.
2. NLP offers exact answers to the question, i.e., it does not offer unnecessary and unwanted information.
3. NLP helps computers to communicate with humans in their languages.
4. It is very time efficient.
5. Most companies use NLP to improve the efficiency of documentation processes, the accuracy of documentation, and to identify information from large databases.

DISADVANTAGES OF NLP:

1. NLP may not show context.
2. NLP is unpredictable.
3. NLP may require more keystrokes.
4. NLP is unable to adapt to a new domain, and it has limited functionality; that is why NLP is built for a single and specific task only.

APPLICATIONS:

I) Machine Translation:
1. Machine translation is used to translate text or speech from one natural language to another natural
language.
2. For performing the translation, it is important to have knowledge of the words and phrases, the grammar of the two languages involved in the translation, the semantics of the languages, and world knowledge.
3. Example: Google Translator

II) Question Answering:


l. Question Answering focuses on building systems that automatically answer the questions asked by
humans in a natural language.
2. Question-Answering (QA) is becoming more and more popular thanks to applications such as Siri,
OK Google, chat boxes and virtual assistants.
3. A QA application is a system capable of coherently answering a human request.
4· It may be used as a text-only interface or as a spoken dialog system.

III) Sentiment Analysis:

1. Sentiment Analysis is also known as opinion mining.
2. It is used on the web to analyse the attitude, behaviour, and emotional state of the sender.
3. The goal of sentiment analysis is to identify sentiment among several posts or even in the same
post where emotion is not always explicitly expressed.
4. Companies use natural language processing applications, such as sentiment analysis, to identify
opinions and sentiment on line to help them understand what customers think about their products
and services (i.e., "I love the new iPhone" and, a few lines later, "But sometimes it doesn't work well", where the person is still talking about the iPhone) and overall indicators of their reputation.

IV) Speech Synthesis:

1. Automatic production of speech is known as speech synthesis.
2. It means speaking a sentence in natural language.
3. A speech synthesis system can read mails on your telephone or read storybooks for you.
4. For generating the utterances, text processing is required; so NLP is an important component in a speech synthesis system.

V) Speech Recognition:
l. Speech recognition is used for converting spoken words into text.
2. Speech Recognition is a technology that enables the computer to convert voice input data to
machine readable format.
3. There are a lot of fields where speech recognition is used, like virtual assistants, adding speech-to-text, translating speech, sending emails etc.
4. It is used in search engines, where the user can voice out the name of their search requirements and get the desired result, making our work easier than typing out the entire command.
5. Example: Hey Siri, Ok Google.

VI) Information Retrieval:

1. In information retrieval, the relevant documents related to the user's queries are identified.
2. In information retrieval, indexing, query modification, word sense disambiguation, and knowledge bases are used for enhancing the performance.
3. For example, WordNet and the Longman Dictionary of Contemporary English (LDOCE) are some useful lexical resources for information retrieval research.

VII) Information Extraction:

1. Information extraction is one of the most important applications of NLP.
2. It is used for extracting structured information from unstructured or semi-structured machine-readable documents.
3. An information extraction system captures and outputs factual information contained within a document.
4. Like an information retrieval system, an information extraction system also responds to a user's information need.

Vlll)Text Summarisation:
l. There is a huge amount of data available on the internet and it is very hard to go through all
the data to extract a single piece of information.
2. With the help of NLP, text summarization has been made available to the users.
3. This helps in the simplification of huge amounts of data in articles, news, research papers etc.
4. This application is used in Investigative Discovery to identify patterns in written reports, in Social Media Analytics to track awareness and identify influencers, and in Subject-Matter Expertise to classify content into meaningful topics.

IX) Recruitment:

1. In this competitive world, big and small companies are on the receiving end of thousands of resumes from different candidates.
2. It has become a tough job for the HR team to go through all the resumes and select the best candidate for one single position.
3. NLP has made the job easier by filtering through all the resumes and shortlisting the candidates by different techniques like information extraction and named entity recognition.
4. It goes through different attributes like location, skills, education etc. and selects candidates who meet the requirements of the company closely.

X) Chatbots:

1. Chatbots are programs that are designed to assist a user 24/7, respond appropriately and answer any query that the user might have.
2. Implementing a chatbot is one of the important applications of NLP.
3. It is used by many companies to provide customer chat services.
4. Example: Facebook Messenger

Q2. What is NLP? Explain the different levels of NLP.

Ans:

Refer Q1 for the definition, advantages and applications of NLP.

LEVELS OF NLP:

Figure 1.1: Levels of NLP (Phonology Level, Morphological Level, Lexical Level, Syntactic Level, Semantic Level, Discourse Level, Pragmatic Level)
I) Phonology Level:
1. The phonology level basically deals with pronunciation.
2. Phonology identifies and interprets the sounds that make up words when the machine has to understand the spoken language.
3. It deals with the physical building blocks of the language's sound system.
4. Example: Bank (finance) vs. Bank (river).
5. In Hindi, aa-jayenge (will come) vs. aaj-ayenge (will come today).

II) Morphological Level:
1. This level deals with morphemes, the smallest units of meaning that make up words.
2. For example, the word 'dog' has a single morpheme, and the morpheme 's' in 'dogs' denotes the singular and plural concept.

III) Lexical Level:
1. The lexical level deals with the lexical meaning of a word and its part of speech.
2. It uses a lexicon, which is a collection of individual lexemes.
3. A lexeme is a basic unit of lexical meaning, which is an abstract unit of morphological analysis that represents the set of forms taken by a single morpheme.
4. For example, "Bank" can take the form of a noun or a verb, but its part of speech and lexical meaning can only be derived in context with the other words used in the sentence or phrase.

IV) 5mtactic: Level:


1. The part-of-speech tagging output of the lexical analysis can be used at the syntactic level of linguistic processing to group words into phrase and clause brackets.
2. Syntactic analysis, also referred to as "parsing", allows the extraction of phrases which convey more meaning than just the individual words by themselves, such as in a noun phrase.
3. One example is differentiating between the subject and the object of the sentence, i.e., identifying who is performing the action and who is the person affected by it.
4. For example, "Jethalal thanked Babita Ji" and "Babita Ji thanked Jethalal" are sentences with different meanings from each other because in the first instance, the action of 'thanking' is done by Jethalal and affects Babita Ji, whereas, in the other one, it is done by Babita Ji and affects Jethalal.

V) Semantic Level:
l. The semantic level of linguistic processing deals with the determination of what a sentence really
means by relating syntactic features and disambiguating words with multiple definitions to the
given context.
2. This level deals with the meaning of words and sentences.
3. There are two approaches of semantic level:
a. Syntax-Driven Semantic Analysis.
b. Semantic Grammar.
4. It is a study of the meaning of words that are associated with grammatical structure.
5. For example, from the statement "Tony Kakkar inputs the data", we can understand that Tony Kakkar is an agent.

VI) Discourse Level:

1. The discourse level of linguistic processing deals with the analysis of the structure and meaning of text beyond a single sentence, making connections between words and sentences.
2. It deals with the structure of different kinds of text.


VII) Pragmatic Level:

1. Pragmatic means practical or logical.
2. The pragmatic level of linguistic processing deals with the use of real-world knowledge and understanding of how this impacts the meaning of what is being communicated.
3. By analysing the contextual dimension of the documents and queries, a more detailed representation is derived.
4. Example of pragmatics: "I heart you!"
5. Semantically, "heart" refers to an organ in our body that pumps blood and keeps us alive.
6. However, pragmatically, "heart" in this sentence means "love": hearts are commonly used as a symbol for love, and to "heart" someone has come to mean that you love someone.

Q3. Describe Ambiguity in NLP.

Ans:

Refer Ql.

AMBIGUITY IN NLP:

1. Natural language has a very rich form and structure.
2. It is very ambiguous.
3. Ambiguity means not having a well-defined solution.
4. Any sentence in a language with a large enough grammar can have another interpretation.
5. Figure 1.2 shows the different types of ambiguity.

Figure 1.2: Ambiguity in NLP (Lexical Ambiguity, Syntactic Ambiguity, Semantic Ambiguity, Anaphoric Ambiguity)

I) Lexical Ambiguity:
1. Lexical ambiguity is the ambiguity of a single word.


2. A word can be ambiguous with respect to its syntactic class.
3. Example: book, study.
4. The word silver can be used as a noun, an adjective, or a verb:
   a. She bagged two silver medals.
   b. She made a silver speech.
   c. His worries had silvered his hair.
5. Lexical ambiguity can be resolved by lexical category disambiguation, i.e., parts-of-speech tagging.
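As a small illustration, the sketch below (a minimal example assuming the NLTK library and its pre-trained tagger model are installed) shows how a POS tagger assigns a different tag to "silver"/"silvered" depending on the context of the sentences above; the exact tags depend on the tagger model used.

    # Minimal sketch: POS tagging as lexical category disambiguation (assumes NLTK
    # plus the averaged_perceptron_tagger resource has been downloaded).
    import nltk

    sentences = [
        "She bagged two silver medals",       # 'silver' used as a modifier (adjective/noun)
        "His worries had silvered his hair",  # 'silvered' used as a verb form
    ]

    for s in sentences:
        # pos_tag takes a list of tokens; a simple whitespace split is enough here.
        print(nltk.pos_tag(s.split()))
    # Expected behaviour: 'silver' receives an adjective/noun tag (e.g. JJ or NN),
    # while 'silvered' receives a verb tag (e.g. VBD or VBN).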

II) Syntactic Ambiguity:
1. This type of ambiguity represents sentences that can be parsed in multiple syntactical forms.
2. Take the following sentence: "I heard his cell phone ring in my office".
3. The prepositional phrase "in my office" can be parsed in a way that modifies the noun or in another way that modifies the verb.

III) Semantic Ambiguity:

1. This type of ambiguity is typically related to the interpretation of the sentence.
2. This occurs when the meaning of the words themselves can be misinterpreted.
3. Even after the syntax and the meanings of the individual words have been resolved, there are two ways of reading the sentence.
4. Consider the example: Seema loves her mother and Sriya does too.
5. The interpretations can be: Sriya loves Seema's mother, or Sriya loves her own mother.

IV) Anaphoric: Ambiguity


1. This kind of ambiguity arises due to the use of anaphora entities in discourse.
2. For example: the horse ran up the hill. It was very steep. It soon got tired.
3. Here, the anaphoric reference of "it" in the two situations causes ambiguity.

V) Pragmatic Ambiguity;
1. Such kind of ambiguity refers to the situation where the context of a phrase gives it multiple interpretations.
2. In simple words, we can say that pragmatic ambiguity arises when the statement is not specific.
3. For example, the sentence "I like you too" can have multiple interpretations, like "I like you (just like you like me)" or "I like you (just like someone else does)".
4. It is the most difficult ambiguity.

Q4. Explain the stages of NLP.

Ans:

1. There are five stages in Natural Language Processing.
2. Figure 1.3 shows the stages of NLP.

Figure 1.3: Stages of NLP (Lexical Analysis → Syntactic Analysis → Semantic Analysis → Discourse Integration → Pragmatic Analysis)


I) Lexical Analysis:
1. Lexical analysis is the first stage in NLP.
2. It is also known as morphological analysis.
3. This phase scans the source text as a stream of characters and converts it into meaningful lexemes.
4. It divides the whole text into paragraphs, sentences, and words.
5. The most common lexicon normalization techniques are Stemming and Lemmatization (a small sketch contrasting the two follows this list):
   a. Stemming: Stemming is the process of reducing derived words to their word stem or base form, generally by stripping written word endings like "ing", "ly", "es", "s", etc.
   b. Lemmatization: Lemmatization is the process of reducing a word to its root or dictionary form. It takes into account things like POS (Parts of Speech), the meaning of the word in the sentence, the meaning of the word in nearby sentences, etc. before reducing the word to its lemma.
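The following minimal sketch (assuming NLTK with its WordNet data downloaded) contrasts the two normalization techniques on a few words; the words chosen here are illustrative only.

    # Minimal sketch: stemming vs. lemmatization with NLTK.
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for w in ["studies", "studying", "flying"]:
        print(w,
              "| stem:", stemmer.stem(w),               # crude suffix stripping, e.g. "studi"
              "| lemma (verb):", lemmatizer.lemmatize(w, pos="v"))  # dictionary form, e.g. "study"

Note how the stemmer may produce non-words ("studi"), while the lemmatizer returns a valid dictionary form because it consults the lexicon and the supplied part of speech.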

11) Syntactic Analysis:


1. It is also known as parsing.
2. Syntactic analysis is used to check grammar and word arrangements, and shows the relationship among the words.
3. Example: "Agra goes to the Rutuja."
4. In the real world, "Agra goes to the Rutuja" does not make any sense, so this sentence is rejected by the syntactic analyser.


5. Dependency grammar and Part of Speech (POS) tags are the important attributes of text syntax.

III) Semantic Analysis:

1. Semantic analysis is concerned with the meaning representation.
2. It mainly focuses on the literal meaning of words.
3. Consider the sentence: "The apple ate a banana".
4. Although the sentence is syntactically correct, it doesn't make sense because apples can't eat.
5. Semantic analysis looks for meaning In the given sentence.
6. It also deals with combining words Into phrases.
7. For example, "red apple" provides Information regarding one object; hence we treat it as a
single phrase.
8. Similarly, we can group names referring to the same category, person, object or organisation.
9. "Robert Hill" refers to the same person and not two separate names - "Robert" and "Hill".

IV) Discourse Integration:

1. The meaning of any sentence depends upon the meaning of the sentence just before it.
2. Furthermore, it also brings about the meaning of the immediately following sentence.
3. In the text, "Emiway Bantai is a bright student. He spends most of the time in the library.", discourse assigns "he" to refer to "Emiway Bantai".

V) Pragmatic Analysis:
1. Pragmatic is the fifth and last phase of NLP.
2. It helps you to discover the intended effect by applying a set of rules that characterize cooperative dialogues.
3. During this, what was said is re-interpreted as what it truly meant.
4. It involves deriving those aspects of language which necessitate real-world knowledge.
5. For example, "Open the book" is interpreted as a request instead of an order.

Q5. Write short notes on challenges in NLP.

Ans:

Refer Q1.

CHALLENGES IN NLP:

NLP is a powerful tool with huge benefits, but there are still a number of Natural Language Processing
limitations and problems:

I) Contextual words and phrases and homonyms:

1. The same words and phrases can have different meanings according to the context of a sentence, and many words, especially in English, have the exact same pronunciation but totally different meanings.
2. For example:
   a. I ran to the store because we ran out of milk.
   b. Can I run something past you really quick?
   c. The house is looking really run down.
3. These are easy for humans to understand because we read the context of the sentence and understand all of the different definitions.
4. And, while NLP language models may have learned all of the definitions, differentiating between them in context can present problems.
5. Homonyms, two or more words that are pronounced the same but have different definitions, can be problematic for question answering and speech-to-text applications because they aren't written in text form.
6. Usage of their and there, for example, is even a common problem for humans.

II) Synonyms:

1. Synonyms can lead to issues similar to contextual understanding because we use many different words to express the same idea.
2. Furthermore, some of these words may convey exactly the same meaning, while some may be levels of complexity (small, little, tiny, minute), and different people use synonyms to denote slightly different meanings within their personal vocabulary.
3. So, for building NLP systems, it's important to include all of a word's possible meanings and all possible synonyms.
4. Text analysis models may still occasionally make mistakes, but the more relevant training data they receive, the better they will be able to understand synonyms.

III) Irony and sarcasm:

1. Irony and sarcasm present problems for machine learning models because they generally use words and phrases that, strictly by definition, may be positive or negative, but actually connote the opposite.
2. Models can be trained with certain cues that frequently accompany ironic or sarcastic phrases, like "yeah right," "whatever," etc., and with word embeddings (where words that have the same meaning have a similar representation), but it's still a tricky process.

IV) Ambiguity:
1. Ambiguity in NLP refers to sentences and phrases that potentially have two or more interpretations.
2. There are lexical, semantic and syntactic ambiguities.
3. Even for humans, the sentence alone is difficult to interpret without the context of the surrounding text.
4. POS (part of speech) tagging is one NLP solution that can help solve the problem, somewhat.

V) Errors in text and speech:


l. Misspelled or misused words can create problems for text analysis.
2. Autocorrect and grammar correction applications can handle common mistakes, but don't always understand the writer's intention.


3. With spoken language, mispronunciations, different accents, stutters, etc., can be difficult for
a machine to understand.

4. However, as language databases grow and smart assistants are trained by their individual users,
these issues can be minimized.

VI) Colloquialisms and slang:

1. Informal phrases, expressions, idioms, and culture-specific lingo present a number of problems for NLP.
2. Because, unlike formal language, colloquialisms may have no "dictionary definition" at all, and these expressions may even have different meanings in different geographic areas.
3. Furthermore, cultural slang is constantly morphing and expanding, so new words pop up every day.
4. For example: Bantai.

VII) Domain-specific language:


l. Different businesses and industries often use very different language.
2. An NLP processing model needed for healthcare, for example, would be very different than
one used to process legal documents.
3. These days, however, there are a number of analysis tools trained for specific fields, but
extremely niche industries may need to build or train their own models.

VIII) Low-resource languages:


1. AI and machine learning NLP applications have been largely built for the most common, widely used languages.
2. And it's downright amazing at how accurate translation systems have become.
3. However, many languages, especially those spoken by people with less access to technology, often go overlooked and under-processed.
4. For example, by some estimations, (depending on language vs. dialect) there are over 3,000
languages in Africa, alone.
5. There simply isn't very much data on many of these languages.

IX) Lack of research and development


l. Machine learning requires A LOT of data to function to its outer limits - billions of pieces of
training data.
2. The more data NLP models are trained on, the smarter they become.
3. That said, data (and human language!) is only growing by the day, as are new machine
learning techniques and custom algorithms.
4. All of the problems above will require more research and new techniques in order to improve on
them.

Chap-2 | Word Level Analysis

Q1. Describe types of word formation.

Ans:

TYPES OF WORD FORMATION:

1. Word formation is the process of creating new words.
2. Words are the fundamental building blocks of language.
3. Every human language, spoken, signed or written, is composed of words.
4. There are three types of word formation, i.e., Inflection, Derivation and Compounding.

I) Inflection:
1. In morphology, inflection is a process of word formation in which a word is modified to express different grammatical categories such as tense, case, voice, aspect, person, number, gender, mood and definiteness.
2. Nouns have simple inflectional morphology.
3. Examples of the inflection of nouns in English are given below, with an affix marking plural:
b. Butterfly (-lies)
c. Mouse (mice)
d. Box (-es)
4. A possessive affix is a suffix or prefix attached to a noun to indicate its possessor.
5. Verbs have slightly more complex, but still relatively simple, inflectional morphology.

6. There are three types of verbs in English.


a. Main Verbs - Eat, Sleep and Impeach
b. Modal Verbs - Can, will, should
c, Primary Verbs - Be, Have, Do
7. In Regular Verbs, all the verbs havethe same endings marking the same functions.

8. Regular verbs have four morphological forms.


9. Just by knowing the stem we can predict the other forms.
10. Example:
Morphological Forms of Regular Verbs:
Stem      | Talk    | Urge    | Cry
-s form   | Talks   | Urges   | Cries
-ing form | Talking | Urging  | Crying
-ed form  | Talked  | Urged   | Cried

Morphological Forms of Irregular Verbs:
Stem      | Eat     | Think    | Put
-s form   | Eats    | Thinks   | Puts
-ing form | Eating  | Thinking | Putting
Past form | Ate     | Thought  | Put

• Handcrafted by BackkBenchers Community


Page.16of 70
Chap-21Word Level Analysis
SEM - 81 BE • COMPlTTER

II) Derivation;
1. Morphological derivation is the process of forming a new word from an existing word, often by adding a prefix or suffix, such as un- or -ness.


2. For example, unhappy and happiness derive from the root word happy.
3. It is differentiated from inflection, which is the modification of a word to form different grammatical categories without changing its core meaning: determines, determining, and determined are from

the root determine.


4. Derivational morphology often involves the addition of a derivational suffix or other affix.
5. Examples of English derivational patterns and their suffixes:
   a. adjective-to-noun: -ness (slow → slowness)
   b. adjective-to-verb: -en (weak → weaken)
   c. adjective-to-adjective: -ish (red → reddish)
   d. adjective-to-adverb: -ly (personal → personally)
   e. noun-to-adjective: -al (recreation → recreational)
   f. noun-to-verb: -fy (glory → glorify)
   g. verb-to-adjective: -able (drink → drinkable)
   h. verb-to-noun (abstract): -ance (deliver → deliverance)
   i. verb-to-noun (agent): -er (write → writer)

III) Compounding:
1. Compound words are formed when two or more lexemes combine into a single new word.
2. Compound words may be written as one word or as two words joined with a hyphen.
3. For example:
   a. noun-noun compound: note + book → notebook
   b. adjective-noun compound: blue + berry → blueberry
   c. verb-noun compound: work + room → workroom
   d. noun-verb compound: breast + feed → breastfeed
   e. verb-verb compound: stir + fry → stir-fry
   f. adjective-verb compound: high + light → highlight
   g. verb-preposition compound: break + up → breakup
   h. preposition-verb compound: out + run → outrun
   i. adjective-adjective compound: bitter + sweet → bittersweet
   j. preposition-preposition compound: in + to → into

Q2. Write short notes on Finite Automata.

Ans;

FINITE AUTOMATA:

l. An automaton having a finite number of states is called a Finite Automaton (FA) or Finite State
Automata (FSA).

2. An FA can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
   a. Q is a finite set of states.
   b. Σ is a finite set of symbols, called the alphabet of the automaton.
   c. δ is the transition function.
   d. q0 is the initial state from where any input is processed (q0 ∈ Q).
   e. F is a set of final state/states of Q (F ⊆ Q).
d QC" i~the initi~I state from
0 fQ (F c Q).
F rs a set of final state/states

TYPES OF FINITE STATE AUTOMATA (FSA):

I) Deterministic Finite Automation (DFA):
1. It may be defined as the type of finite automation wherein, for every input symbol, we can determine the state to which the machine will move.
2. It has a finite number of states; that is why the machine is called Deterministic Finite Automaton (DFA).
3. Deterministic refers to the uniqueness of the computation.
4. In a DFA, there is only one path for a specific input from the current state to the next state.
5. A DFA does not accept the null move, i.e., the DFA cannot change state without any input character.
6. A DFA can contain multiple final states.
7. It is used in Lexical Analysis.
8. Mathematically, a DFA can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
   a. Q is a finite set of states.
   b. Σ is a finite set of symbols, called the alphabet of the automaton.
   c. δ is the transition function, where δ: Q × Σ → Q.
   d. q0 is the initial state from where any input is processed (q0 ∈ Q).
   e. F is a set of final state/states of Q (F ⊆ Q).
9. Graphically, a DFA can be represented by digraphs called state diagrams, where:
   a. The states are represented by vertices.
   b. The transitions are shown by labelled arcs.
   c. The initial state is represented by an empty incoming arc.
   d. The final state is represented by a double circle.
10. Example of DFA:
Design a FA with Σ = {0, 1} which accepts those strings which start with 1 and end with 0.

Solution: The FA will have a start state q0 from which only the edge with input 1 will go to the next state q1. In state q1, if we read 1, we will stay in state q1, but if we read 0 at state q1, we will reach state q2, which is the final state. In state q2, if we read either 0 or 1, we will go to state q2 or state q1 respectively. Note that if the input ends with 0, the machine will be in the final state.

(State diagram: q0 --1--> q1; q1 --1--> q1; q1 --0--> q2; q2 --0--> q2; q2 --1--> q1; q2 is the final state.)
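A minimal plain-Python sketch of this DFA is given below; the state names (q0, q1, q2, plus an extra "dead" state for strings that start with 0) are chosen here for illustration.

    # Minimal sketch: simulating the DFA for Sigma = {0, 1},
    # accepting strings that start with 1 and end with 0.
    TRANSITIONS = {
        ("q0", "1"): "q1", ("q0", "0"): "dead",
        ("q1", "1"): "q1", ("q1", "0"): "q2",
        ("q2", "1"): "q1", ("q2", "0"): "q2",
        ("dead", "0"): "dead", ("dead", "1"): "dead",
    }
    FINAL_STATES = {"q2"}

    def accepts(string: str) -> bool:
        state = "q0"
        for symbol in string:
            state = TRANSITIONS[(state, symbol)]
        return state in FINAL_STATES

    print(accepts("10"))    # True  (starts with 1, ends with 0)
    print(accepts("1010"))  # True
    print(accepts("0110"))  # False (starts with 0)
    print(accepts("101"))   # False (ends with 1)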
II) Non-deterministic Finite Automation (NFA):

1. It may be defined as the type of finite automation where, for every input symbol, we cannot determine the state to which the machine will move, i.e., the machine can move to any combination of the states.
2. It has a finite number of states; that is why the machine is called Non-Deterministic Finite Automation (NFA).
3. Every NFA is not a DFA, but each NFA can be translated into a DFA.
4. Mathematically, an NFA can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
   a. Q is a finite set of states.
   b. Σ is a finite set of symbols, called the alphabet of the automaton.
   c. δ is the transition function, where δ: Q × Σ → 2^Q.
   d. q0 is the initial state from where any input is processed (q0 ∈ Q).
   e. F is a set of final state/states of Q (F ⊆ Q).
5. Whereas graphically (same as DFA), a NFA can be represented by diagraphs called state
diagrams where -

a. The states are represented by vertices.


b. The transitions are shown by labelled arcs.
c. The initial state is represented by an empty incoming arc.
d. The final state is represented by double circle.
6. Example:

Design an NFA with Σ = {0, 1} which accepts all strings ending with 01.

Solution: In the start state the NFA can read anything (either 0 or 1), and it may guess that the final "01" has begun. Hence, the NFA would be:

(State diagram: q0 loops on 0, 1; q0 --0--> q1; q1 --1--> q2; q2 is the final state.)
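A minimal plain-Python sketch simulating this NFA by tracking the set of currently active states (the state names are illustrative) could look like this:

    # Minimal sketch: NFA accepting all strings over {0, 1} ending in "01".
    NFA = {
        ("q0", "0"): {"q0", "q1"},  # stay in q0, or guess that the final "0" has started
        ("q0", "1"): {"q0"},
        ("q1", "1"): {"q2"},        # the final "1" leads to the accepting state
    }
    START, FINAL = "q0", {"q2"}

    def nfa_accepts(string: str) -> bool:
        current = {START}
        for symbol in string:
            current = set().union(*(NFA.get((s, symbol), set()) for s in current))
        return bool(current & FINAL)

    print(nfa_accepts("1101"))  # True
    print(nfa_accepts("0110"))  # False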

Q3. Write short notes on Finite State Transducers (FST).

Ans:

FINITE STATE TRANSDUCERS:

1. A Finite-State Transducer (FST) is a finite-state machine with two memory tapes: an input tape and an output tape.
2. It is a finite state machine where transitions are conditioned on a pair of symbols.
3. An FST will read a set of strings on the input tape and generate a set of relations on the output tape.
4. An FST can be thought of as a translator or relater between strings in a set.
5. In morphological parsing, an example would be inputting a string of letters into the FST; the FST would then output a string of morphemes.
6. FSTs are useful in NLP and speech recognition.
7. A Finite State Transducer (FST) is a 6-tuple T = (Q, Σ, Γ, δ, s, Y), where:
   a. Q is a finite set of states.
   b. Σ is a finite set of input symbols.
   c. Γ is a finite set of output symbols.
   d. δ: Q × Σ → Q is the transition function.
   e. s ∈ Q is the start state.
   f. Y: Q → Γ* is the output function.
8. The FST is a multi-function device, and can be viewed in the following ways:
a. Translator: It reads one string on one tape and outputs another string.
b. Recognizer: It takes a pair of strings as two tapes and accepts/rejects based on whether they match.
c. Generator: It outputs a pair of strings on two tapes along with a yes/no result based on whether they are matching or not.
d. Relater: It computes the relation between two sets of strings available on two tapes.
9. Example:
(FST transition diagram with arc labels such as a:a and b:b)

babba → babbb
aba → bbb
aba → abb

CLOSURE PROPERTIES OF FINITE STATE TRANSDUCERS:

I) Union:
1. The union of two regular relations is also a regular relation.
2. If T1 and T2 are two FSTs, there exists an FST T1 ∪ T2 such that |T1 ∪ T2| = |T1| ∪ |T2|.
II) Inversion:
1. The inversion of an FST simply switches the input and output labels.
2. This means that the same FST can be used for both directions of a morphological process.
3. If T = (Σ1, Σ2, Q, i, F, E) is an FST, there exists an FST T⁻¹ such that |T⁻¹|(u) = { v ∈ Σ1* | u ∈ |T|(v) }.


Ill) Composition:
1. If T1 is an FST from I1 to O1, and T2 is an FST from O1 to O2, then the composition of T1 and T2 (T1 ∘ T2) maps from I1 to O2.
2. In other words, if T1 is a transducer from I1 to O1 and T2 is a transducer from O1 to O2, then T1 ∘ T2 maps from I1 to O2.
3. So the transducer function is: (T1 ∘ T2)(x) = T1(T2(x)).

Q4. Explain N-Gram Model.

Ans:

N-GRAM MODEL:

l. N-gram can be defined as the contiguous sequence of 'n' items from a given sample of text or
speech.
2. The items can be letters, words, or base pairs according to the application.
3. The N-grams typically are collected from a text or speech corpus.
4. Consider the following example: "I love reading books about Machine Learning on BackkBenchers
Community"
5. A 1-gram/unigram is a one-word sequence. For the given sentence, the unigrams would be: "I",
"love", "reading", "books", "about", "Machine", "Learning", "on", "BackkBenchers", "Community"
6. A 2-gram/bigram is a two-word sequence of words, such as "I love", "love reading" or
"BackkBenchers Community".
7. A 3-gram/trigram is a three-word sequence of words like "I love reading", "about Machine
Learning" or "on BackkBenchers Community"
8. An N-gram language model predicts the probability of a given N-gram within any sequence of words in the language.
9. A good N-gram model can predict the next word in the sentence, i.e., the value of P(w|h): the probability of seeing the word w given a history of previous words h, where the history contains n-1 words.
10. Let's consider the example: "This article is on Sofia". We want to calculate the probability of the last word being "Sofia" given the previous words:
P (Sofia | This article is on)
11. Generalizing, the above probability can be written as:
P (w5 | w1, w2, w3, w4), or in general P (wn | w1, w2, ..., wn-1)
12. But how do we calculate it? The answer lies in the chain rule of probability:
P (A | B) = P (A, B) / P (B)
P (A, B) = P (A | B) P (B)
13. Now generalize the above equation:
P (X1, X2, ..., Xn) = P (X1) P (X2 | X1) P (X3 | X1, X2) .... P (Xn | X1, X2, ..., Xn-1)

P (w1, w2, ..., wn) = Πi P (wi | w1, w2, ..., wi-1)
14. Simplifying the above formula using the Markov assumption:
P (wi | w1, w2, ..., wi-1) ≈ P (wi | wi-k, ..., wi-1)
15. For unigram:
P (w1, w2, ..., wn) ≈ Πi P (wi)
16. For bigram:
P (w1, w2, ..., wn) ≈ Πi P (wi | wi-1)
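As an illustration, here is a minimal plain-Python sketch of estimating bigram probabilities by maximum likelihood, P(wi | wi-1) = count(wi-1, wi) / count(wi-1), using the same three training sentences that appear in the spelling-correction example below.

    # Minimal sketch: maximum-likelihood bigram probabilities from a tiny corpus.
    from collections import Counter

    corpus = [
        "the arabian nights",
        "these are the fairy tales of the east",
        "the stories of the arabian nights are translated in many languages",
    ]

    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()      # <s> marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # consecutive word pairs

    def bigram_prob(prev, word):
        return bigrams[(prev, word)] / unigrams[prev]

    print(round(bigram_prob("<s>", "the"), 2))      # 0.67
    print(round(bigram_prob("the", "arabian"), 2))  # 0.4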

Q5. Explain N-gram model for spelling correction.

Ans:

N-GRAM MODEL FOR SPELLING CORRECTION:


1. Spelling correction consists of detecting and correcting errors.
2. Error detection is the process of finding the misspelled word.
3. Error correction is the process of suggesting correct words for a misspelled word.
4. Spelling errors are mainly phonetic, where the misspelled word is pronounced in the same way as the correct word.
5. Spelling errors belong to two categories, named non-word errors and real-word errors.
6. When an error results in a word that does not appear in a given lexicon or is not a valid orthographic word form, it is known as a non-word error.
7. A real-word error results in an actual word of the language; it occurs because of typographical mistakes or due to spelling errors.
8. The n-gram can be used for both non-word and real-word error detection because, in the English alphabet, certain bigrams or trigrams of letters never occur or rarely do so.
9. For example, the trigram 'qst' and the bigram 'qd'; this information can be used to handle non-word errors.
10. The n-gram technique generally requires a large corpus or dictionary as training data so that an n-gram table of possible combinations of letters can be compiled.
11. The n-gram model uses the chain rule as follows:
P(s) = P(w1 w2 w3 ... wn)
     = P(w1) P(w2 | w1) P(w3 | w1 w2) .... P(wn | w1 w2 w3 ... wn-1)
     = Πi=1..n P(wi | hi)
Example:

Training set:
l. The Arabian Nights
2. These are the fairy tales of the east
3. The stories of the Arabian Nights are translated in many languages.

Bi-gram Model (maximum-likelihood estimates from the training set):
P(The|<s>) = 0.67
P(Nights|Arabian) = 1.0
P(are|Nights) = 1.0
P(the|are) = 0.5
P(translated|are) = 0.5
P(Arabian|the) = 0.4
P(fairy|the) = 0.2
P(stories|the) = 0.2
P(east|the) = 0.2
P(tales|fairy) = 1.0
P(of|tales) = 1.0
P(the|of) = 1.0
P(of|stories) = 1.0
P(in|translated) = 1.0
P(many|in) = 1.0
P(languages|many) = 1.0

Test Sentence:
The Arabian Nights are the fairy tales of the east

P(The|<s>) × P(Arabian|the) × P(Nights|Arabian) × P(are|Nights) × P(the|are) × P(fairy|the) × P(tales|fairy) × P(of|tales) × P(the|of) × P(east|the)
= 0.67 × 0.4 × 1.0 × 1.0 × 0.5 × 0.2 × 1.0 × 1.0 × 1.0 × 0.2
≈ 0.0054
12. The n-gram model suffers from data sparseness problems.
13. An n-gram that does not occur in the training data is assigned zero probability.
14. Even a large corpus can have many zero entries in its bigram matrix.
15. Smoothing techniques are used to handle the data sparseness problem.
16. Smoothing generally refers to the task of re-evaluating zero-probability or low-probability n-grams and assigning them non-zero values.
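As an illustration, a minimal sketch of add-one (Laplace) smoothing for bigram probabilities follows; it reuses the unigrams and bigrams counters defined in the earlier bigram sketch (those names are from that sketch, not from the text).

    # Minimal sketch: add-one (Laplace) smoothed bigram probability,
    # P_laplace(w_i | w_{i-1}) = (count(w_{i-1}, w_i) + 1) / (count(w_{i-1}) + V).
    def smoothed_bigram_prob(prev, word, unigrams, bigrams):
        vocab_size = len(unigrams)  # V: number of distinct word types seen
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    # An unseen bigram such as ("the", "knights") now receives a small
    # non-zero probability instead of zero.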

Q6. Explain Lexicon Free FST Porter Stemmer.

Ans:

LEXICON FREE FST: PORTER STEMMER:

1. The most famous stemming algorithm is the Porter Stemmer. Like morpho-analyzers, stemmers can be seen as cascaded transducers, but a stemmer has no lexicon.
2. They are used in Information Retrieval applications and search engines.
3. Stemming algorithms are efficient, but they may introduce errors because they do not use a lexicon.
4. It is based on a series of simple cascaded rules:
   a. ATIONAL → ATE (relational → relate)
   b. ING → E (motoring → motor)
   c. SSES → SS (grasses → grass)
grass)
5. Some errors of commission are:
a. Organization - Organ
b. Doing - Doe
c. Generalization - Generic
6. Some errors of omission are:
a. European - Europe
b. Analysis - Analyzes
c. Noise - Noisy

PORTER ALGORITHM EXAMPLE:

For words like: falling, attaching, sing, hopping etc.

Step 1:
If the word has more than one syllab and end with 'ing'
I Remove 'ing' and apply the second step

Step 2:
If word finishes by a double consonant (except l S Z)
Transform it into a single letter

Example: falling -+
fall attaching -+
attach sinq » sing
hoppinq » hop


Q7. Explain Morphological Parsing with FST.

Ans:

MORPHOLOGICAL PARSING WITH FST:

1. Morphological parsing means breaking down words into components and building a structured representation.
2. The objective of morphological parsing is to produce output lexicons for a single input lexicon.
3. Example:
   a. Cats → cat +N +PL
   b. Caught → catch +V +Past
4. The above example contains the stem of the corresponding word (lexicon) in first column, along
with its morphological features like +N means word is noun, +SG means it is singular, +PL means
it is plural, +V for verb.
5. There can be more than one lexical level representation for a given word.
6. Two level morphology represents a word as a correspondence between a lexical level, which
represents a simple concatenation of morphemes makinqup a word, and the surface level, which
represents the actual spelling of the final word.
7. Morphological parsing is implemented by building mapping rules that map letter sequences like
cats on the surface level into morpheme and features sequences like
I
cat +N +PL on the lexical level.
8. Figure 2.1 shows these two levels for the word cats.

Lexical

Surface
Figure 2.1
9. The automaton that we use for performing the mapping between these two levels is the finite-
state transducer or FST.
10. A transducer maps between one set of symbols and another; a finite-state transducer does this via
a finite automaton.
11. Thus we usually visualize an FST as a two-tape automaton which recognizes or generates pairs of
strings.
12. The FST thus has a more general function than an FSA; where an FSA defines a formal language by
defining a set of strings, an FST defines a relation between sets of strings.
13. This relates to another view of an FST; as a machine that reads one string and generates another,
Here's a summary of this four-fold way of thinking about transducers:
a. FST as recognizer: A transducer that takes a pair of strings as input and outputs accept if the
string-pair is in the string-pair language, and a reject if it is not.
b. FST as generator: A machine that outputs pairs of strings of the language. Thus the output is a
yes or no, and a pair of output strings.
c. FST as translator: A machine that reads a string and outputs another string,
d. FSTas set relater: A machine that computes relations between sets.
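As an illustration of the lexical/surface mapping described above, the following minimal sketch uses a small hypothetical lexicon and a couple of suffix rules as a stand-in for a real composed FST, mapping surface forms such as "cats" to "cat +N +PL"; the lexicon entries and helper names are invented for this example.

    # Minimal sketch: dictionary-and-rule stand-in for a morphological-parsing FST.
    LEXICON = {"cat": "+N", "fox": "+N", "catch": "+V"}
    IRREGULAR = {"caught": "catch +V +Past", "mice": "mouse +N +PL"}

    def morph_parse(surface):
        if surface in IRREGULAR:                                   # listed irregular forms
            return IRREGULAR[surface]
        if surface in LEXICON:                                     # bare stem
            return surface + " " + LEXICON[surface] + " +SG"
        if surface.endswith("es") and surface[:-2] in LEXICON:     # e-insertion: foxes -> fox
            return surface[:-2] + " " + LEXICON[surface[:-2]] + " +PL"
        if surface.endswith("s") and surface[:-1] in LEXICON:      # regular plural: cats -> cat
            return surface[:-1] + " " + LEXICON[surface[:-1]] + " +PL"
        return surface + " +?"                                     # unknown form

    for w in ["cats", "foxes", "caught", "cat"]:
        print(w, "->", morph_parse(w))
    # cats -> cat +N +PL, foxes -> fox +N +PL, caught -> catch +V +Past, cat -> cat +N +SG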

Chap-3 | Syntax Analysis

Q1. Explain Part of Speech (POS) Tagging.

Ans:

PART-OF-SPEECH (POS) TAGGING:

1. Part-of-speech tagging is the process of assigning a part-of-speech or other lexical class marker to each word in a corpus.
2. Tags are also usually applied to punctuation markers; thus tagging for natural language is the same process as tokenization for computer languages, although tags for natural languages are much more ambiguous.
3. The input to a tagging algorithm is a string of words and a specified tagset.
4. The output is a single best tag for each word.
S. For example, here are some sample sentences from the Airline Travel Information Systems (ATIS)
corpus of dialogues about air-travel reservations.
6. For each, we have shown a potential tagged output using the Penn Treebank tagset.

VB DT NN
Book that flight .
VBZ DT NN VB NN ?
Does that flight serve dinner ?

7. Even in these simple examples, automatically assigning a tag to each word is not trivial.
8. For example, book is ambiguous.
9. That is, it has more than one possible usage and part of speech.
10. It can be a verb (as in book that flight or to book the suspect) or a noun (as in hand me that book,
or a book of matches).
11. Similarly, that can be a determiner (as in Doesthat flight serve dinner), or a complementizer (as in I
thought that your flight was earlier).
12. The problem of POS tagging is to resolve these ambiguities, choosing the proper tag for the context.
13. Most of the POS tagging falls under Rule Base POS tagging, Stochastic POS tagging and
Transformation based tagging.

METHODS:

I) Rule-based POS Tagging:

1. One of the oldest techniques of tagging is rule-based POS tagging.


2. Rule-based taggers use dictionary or lexicon for getting possible tags for tagging each word.
3. If the word has more than one possible tag, then rule-based taggers use hand-written rules to
identify the correct tag.
4. Disambiguation can also be performed in rule-based tagging by analysing the linguistic features of a word along with its preceding as well as following words.
5. For example, suppose if the preceding word of a word is article then word must be a noun.


6. As the name suggests, all such kinds of information in rule-based POS tagging are coded in the form of rules.
7. These rules may be either:
   a. Context-pattern rules,
   b. Or regular expressions compiled into finite-state automata, intersected with a lexically ambiguous sentence representation.

Two-stage architecture of Rule-based POS Tagging:
1. In the first stage, it uses a dictionary to assign each word a list of potential parts of speech.
2. In the second stage, it uses large lists of hand-written disambiguation rules to winnow down this list to a single part of speech for each word.

Properties of Rule-based POS Tagging:
We have some limited number of rules, approximately around 1000.
Smoothing and language modeling are defined explicitly in rule-based taggers.

II) Stochastic POS Tagging:

1. Another technique of tagging is stochastic POS tagging.
2. A model that includes frequency or probability (statistics) can be called stochastic.
3. The simplest stochastic tagger uses the word-frequency approach: a word is assigned the tag it occurs with most frequently in the training data.

Properties of Stochastic POS Tagging:
This POS tagging is based on the probability of a tag occurring.
It requires a training corpus.
There would be no probability for the words that do not exist in the training corpus.
It uses a different testing corpus (other than the training corpus).
It is the simplest POS tagging because it chooses the most frequent tag associated with a word in the training corpus.

III) Transformation-based Tagging:

1. Transformation-based tagging is also called Brill tagging.
2. It is an instance of transformation-based learning (TBL), which is a rule-based algorithm for automatic tagging of POS to the given text.
3. TBL allows us to have linguistic knowledge in a readable form; it transforms one state to another state by using transformation rules.
4. It draws inspiration from both rule-based and stochastic tagging.
5. If we look at the similarity between rule-based and transformation taggers, then, like rule-based, it is based on rules that specify what tags need to be assigned to what words.
6. On the other hand, if we look at the similarity between stochastic and transformation taggers, then, like stochastic, it is a machine learning technique in which rules are automatically induced from data.

Advantages of Transformation-based Learning (TBL):
• We learn a small set of simple rules, and these rules are enough for tagging.
• Development as well as debugging is very easy in TBL because the learned rules are easy to understand.

Disadvantages of Transformation-based Learning (TBL):
• Transformation-based learning (TBL) does not provide tag probabilities.
• In TBL, the training time is very long, especially on large corpora.
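As a small illustration of the word-frequency approach used by stochastic taggers, the following plain-Python sketch assigns each word the tag it occurs with most often in a tiny, hypothetical hand-tagged corpus (the sentences and counts are invented for this example; tag names follow the Penn Treebank set).

    # Minimal sketch: most-frequent-tag (unigram) baseline tagger.
    from collections import Counter, defaultdict

    tagged_corpus = [
        [("Book", "VB"), ("that", "DT"), ("flight", "NN")],
        [("Does", "VBZ"), ("that", "DT"), ("flight", "NN"), ("serve", "VB"), ("dinner", "NN")],
        [("I", "PRP"), ("read", "VBD"), ("that", "DT"), ("book", "NN")],
        [("Book", "VB"), ("a", "DT"), ("meal", "NN")],
    ]

    tag_counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            tag_counts[word.lower()][tag] += 1   # count how often each tag follows each word

    def most_frequent_tag(word, default="NN"):
        counts = tag_counts[word.lower()]
        return counts.most_common(1)[0][0] if counts else default

    print(most_frequent_tag("book"))  # VB (seen twice as a verb, once as a noun)
    print(most_frequent_tag("that"))  # DT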

Q2. Explain CFG.

Ans:

1. CFG stands for Context Free Grammars.


2. CFGs are also called phrase-structure grammars.
3. CFG is equivalent to Backus-Naur Form (BNF).
4. CFG's are powerful enough to describe most of the structure in natural languages.
5. CFG's are restricted enough so that efficient parsers can be built.

6. CFG is a notation for describing languages and a superset of Regular grammar.


7. A context free grammar is a formal grammar which is used to generate all possible strings in a given formal language.
8. A context free grammar G can be defined by four tuples as: G = (V, T, P, S)
9. Where,
   G describes the grammar,
   T describes a finite set of terminal symbols,
   V describes a finite set of non-terminal symbols,
   P describes a set of production rules, and
   S is the start symbol.


10. A context-free grammar consists of a set of rules or productions, each expressing the ways the symbols of the language can be grouped together, and a lexicon of words.
11. Here are some rules for our noun phrases:
    a. NP → Det Nominal
    b. NP → ProperNoun
    c. Nominal → Noun | Nominal Noun
12. Together, these describe two kinds of NPs:
    a. One that consists of a determiner followed by a nominal,
    b. And another that says that proper names are NPs.
    c. The third rule illustrates two things: an explicit disjunction and a recursive definition.
13. The symbols that are used in a CFG are divided into two classes.
14. The symbols that correspond to words in the language ('The', 'BackkBenchers') are called terminal symbols.
15. The symbols that express clusters or generalizations of these are called nonterminal symbols.
16. In each context-free rule, the item to the right of the arrow (→) is an ordered list of one or more terminals and nonterminals.
17. The item to the left of the arrow is a single nonterminal symbol expressing some cluster or generalization.
1.7 While to the left: of the arrow is a single nonterminal symbol expressing some cluster or
generalization.
18. A CFG is usually thought of in two ways: as a device for generating sentences, or as a device
for assigning a structure to a given sentence.

Q3. Wr~te short notes on tagsets for English

Ans:

TACSETS FOR ENGLISH:

1. There are a small number of popular tagsets for English, many of which evolved from the 87-tag
tagset used for the Brown corpus.
2. Three of the most commonly used are the small 45-tag Penn Treebank tagset, the medium-sized 61
tag CS tagset, and the larger 146-tag Cl tagset;
3. The Penn Treebank tagset has been applied to the Brown· corpus and a number of other corpora.
4. The Penn Treebanktagset was culled from the original 87-tag tagset for the Brown corpus.
5. This reduced set leaves out information that can be recovered from the identity of the lexical item.
6. For example, the original Brown tagset and other large tagsets like CS include a separate tag
for each of the different forms of the verbs do, be, and have.
7. These were omitted from the Penn set.
8. Certain syntactic distinctions were not marked in the Penn Treebank tagset because Treebank
sentences were parsed, not merely tagged, and so some syntactic information is represented in
the Phrase structure.

9. For example, prepositions and subordinating conjunctions were combined into the single tag IN, since the tree structure of the sentence disambiguated them.
Tag   Description                Example          |  Tag   Description                Example
CC    Coordinating conjunction   and, but, or     |  SYM   Symbol                     +, %, &
CD    Cardinal number            one, two, three  |  TO    "to"                       to
DT    Determiner                 a, the           |  UH    Interjection               ah, oops
EX    Existential 'there'        there            |  VB    Verb, base form            eat
FW    Foreign word               mea culpa        |  VBD   Verb, past tense           ate
IN    Preposition/sub-conj       of, in, by       |  VBG   Verb, gerund               eating
JJ    Adjective                  yellow           |  VBN   Verb, past participle      eaten
JJR   Adjective, comparative     bigger           |  VBP   Verb, non-3sg present      eat
JJS   Adjective, superlative     wildest          |  VBZ   Verb, 3sg present          eats
LS    List item marker           1, 2, One        |  WDT   Wh-determiner              which, that
MD    Modal                      can, should      |  WP    Wh-pronoun                 what, who
NN    Noun, sing. or mass        llama            |  WP$   Possessive wh-             whose
NNS   Noun, plural               llamas           |  WRB   Wh-adverb                  how, where
NNP   Proper noun, singular      IBM              |  $     Dollar sign                $
NNPS  Proper noun, plural        Carolinas        |  #     Pound sign                 #
PDT   Predeterminer              all, both        |  "     Left quote                 (' or ")
POS   Possessive ending          's               |  "     Right quote                (' or ")
PP    Personal pronoun           I, you, he       |  (     Left parenthesis           ( [, (, {, < )
PP$   Possessive pronoun         your, one's      |  )     Right parenthesis          ( ], ), }, > )
RB    Adverb                     quickly, never   |  ,     Comma                      ,
RBR   Adverb, comparative        faster           |  .     Sentence-final punc        (. ! ?)
RBS   Adverb, superlative        fastest          |  :     Mid-sentence punc          (: ; ... - -)
RP    Particle                   up, off          |

Penn Treebank Part-of-Speech Tags (Including Punctuation)

Q4. Explain Parsing.

Ans:

Note: We have explained the below answer in detail for clear understanding. While writing in the exam, cut it short as per your understanding.

PARSING:

1. Parsing in NLP is the process of determining the syntactic structure of a text by analysing its
constituent words based on an underlying grammar (of the language).
2. In syntactic parsing, the parser can be viewed as searching through the space of all possible
parse trees to find the correct parse tree for the sentence.
3. Consider the example "Book that flight"
4. Grammar:
S → NP VP                  Det → that | this | a
S → Aux NP VP              Noun → book | flight | meal | money
S → VP                     Verb → book | include | prefer
NP → Det Nominal           Aux → does
NP → Proper-Noun           Prep → from | to | on
Nominal → Noun             Proper-Noun → Houston | TWA
Nominal → Noun Nominal     Nominal → Nominal PP
VP → Verb
VP → Verb NP

Figure 3.1: A small grammar for the sentence "Book that flight"
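As an illustration, the following minimal sketch (assuming the NLTK library is installed) encodes a subset of the grammar in Figure 3.1 and uses a chart parser to search the space of parse trees for "Book that flight"; the PP-related rules are omitted here to keep the grammar self-contained.

    # Minimal sketch: parsing "book that flight" with a chart parser over
    # a subset of the Figure 3.1 grammar.
    import nltk

    grammar = nltk.CFG.fromstring("""
        S -> NP VP | Aux NP VP | VP
        NP -> Det Nominal | ProperNoun
        Nominal -> Noun | Noun Nominal
        VP -> Verb | Verb NP
        Det -> 'that' | 'this' | 'a'
        Noun -> 'book' | 'flight' | 'meal' | 'money'
        Verb -> 'book' | 'include' | 'prefer'
        Aux -> 'does'
        ProperNoun -> 'Houston' | 'TWA'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("book that flight".split()):
        print(tree)
    # (S (VP (Verb book) (NP (Det that) (Nominal (Noun flight)))))

Note how the parser resolves the noun/verb ambiguity of "book": only the reading in which "book" is a verb yields a complete tree rooted in S.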


5. Parse Tree:

Figure 3.2: Parse tree for "Book that flight":
(S (VP (Verb Book) (NP (Det that) (Nominal (Noun flight)))))
Bf!.EVANCE OF PARSING IN NLP_;

1 parser is used to report any syntax error.


2. It helps to recover from commonly occurring error so that the processing of the remainder of
program can be continued.
3. Parse tree is. created with the help of a parser.
4. Parser is used to create symbol table, which plays an important role in NLP.
s. Parser is also used to produce intermediate representations (IR).

TYPES OF PARSING:

I) Top-Down Parsing:

Figure 3.3: Top-down parsing example (the successive plies of the search space for "Book that flight", expanding from S downward via S → NP VP, S → Aux NP VP and S → VP)


1. Top-down parsing is goal oriented.
2. A top-down parser searches for a parse tree by trying to build from the root node S down to the
leaves.
3. Let's consider the search space that a top-down parser explores, assuming for the moment that it

builds all possible trees in parallel.


4. The algorithm starts by assuming the input can be derived by the designated start symbol S.
5. The next step is to find the tops of all trees which can start with S, by looking for all the grammar rules with S on the left-hand side.

6. In the grammar in Figure 3.1, there are three rules that expand S, so the second ply, or level, of the search space in Figure 3.3 has three partial trees.
7. We next expand the constituents in these three new trees, just as we originally expanded S.
8. The first tree tells us to expect an NP followed by a VP, the second expects an Aux followed by an NP and a VP, and the third a VP by itself.
9. To fit the search space on the page, we have shown in the third ply of Figure 3.3 only the trees resulting from the expansion of the left-most leaves of each tree.
10. At each ply of the search space we use the right-hand sides of the rules to provide new sets of expectations for the parser, which are then used to recursively generate the rest of the trees.
11. Trees are grown downward until they eventually reach the part-of-speech categories at the bottom of the tree.
12. At this point, trees whose leaves fail to match all the words in the input can be rejected, leaving behind those trees that represent successful parses.
13. In Figure 3.3, only the 5th parse tree will eventually match the input sentence Book that flight.
Problems with the Top-Down Parser:
Only judges grammaticality.
Stops when it finds a single derivation.
No semantic knowledge employed.
No way to rank the derivations.
Problems with left-recursive rules.
Problems with ungrammatical sentences.

11) Bottom Up Parsing:

Figure 3.4: Bottom-up parsing example (the search space built upward from the words Book, that and flight)


1. Bottom-up parsing is data directed.
2. Bottom-up parsing is the earliest known parsing algorithm, and is used in the shift-reduce parsers common for computer languages.
3. In bottom-up parsing, the parser starts with the words of the input, and tries to build trees from the words up, again by applying rules from the grammar one at a time.
4. The parse is successful if the parser succeeds in building a tree rooted in the start symbol S that covers all of the input.
5. Figure 3.4 shows the bottom-up search space, beginning with the sentence Book that flight.
6. The parser begins by looking up each word (book, that, and flight) in the lexicon and building three partial trees with the part of speech for each word.
7. But the word book is ambiguous; it can be a noun or a verb.
8. Thus the parser must consider two possible sets of trees.
9. The first two plies in Figure 3.4 show this initial bifurcation of the search space.
10. Each of the trees in the second ply are then expanded.

11. In the parse on the left (the one in which book is incorrectly considered a noun), the Nominal → Noun rule is applied to both of the Nouns (book and flight).

12. This same rule is also applied to the sole Noun (flight) on the right, producing the trees on the
third ply.

13. In general, the parser extends one ply to the next by looking for places in the parse-in-progress where the right-hand side of some rule might fit.

14. This contrasts with the earlier top-down parser, which expanded trees by applying rules when their
left-hand side matched an unexpanded nonterminal.
15. Thus in the fourth ply, in the first and third parse, the sequence Det Nominal is recognized as the right-hand side of the NP → Det Nominal rule.
16. In the fifth ply, the interpretation of book as a noun has been pruned from the search space.
17. This is because this parse cannot be continued: there is no rule in the grammar with the right-hand
side Nominal NP.
18. The final ply of the search space (not shown in Figure 3.4) is the correct parse tree (see Figure 3.2).

Problems with bottom-up parsing:


Unable to deal with empty categories: termination problem, unless rewriting empties as constituents is somehow restricted.
Inefficient when there is great lexical ambiguity.
Conversely, it is data-directed: it attempts to parse the words that are there.
Repeated work: anywhere there is common substructure.
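The two search strategies can also be tried out concretely. Below is a minimal sketch, assuming the third-party NLTK library is installed; it encodes the toy grammar of Figure 3.1 and parses "Book that flight", printing the tree of Figure 3.2. The choice of chart parser is only illustrative (NLTK also ships explicit top-down and bottom-up chart strategies).

import nltk

# Toy grammar from Figure 3.1 in NLTK's CFG notation (terminals in quotes).
grammar = nltk.CFG.fromstring("""
S -> NP VP | Aux NP VP | VP
NP -> Det Nominal | ProperNoun
Nominal -> Noun | Noun Nominal | Nominal PP
VP -> Verb | Verb NP
PP -> Prep NP
Det -> 'that' | 'this' | 'a'
Noun -> 'book' | 'flight' | 'meal' | 'money'
Verb -> 'book' | 'include' | 'prefer'
Aux -> 'does'
Prep -> 'from' | 'to' | 'on'
ProperNoun -> 'Houston' | 'TWA'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse(['book', 'that', 'flight']):
    print(tree)   # (S (VP (Verb book) (NP (Det that) (Nominal (Noun flight)))))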

Q5. Describe Sequence Labeling.

Ans:

SEQUENCE LABELINC:

1. Sequence labeling is a type of pattern recognition task.


2. It is a typical NLP task which assigns a class or label to each token in a given input sequence.
3. In this context, a single word will be referred to as a "token".
4. These tags or labels can be used in further downstream models as features of the token, or to enhance search quality by naming spans of tokens.
5. In question answering and search tasks, we can use these spans as entities to specify our search query (e.g., for "Play a movie by Tom Hanks") we would like to label words such as: [Play, movie, Tom Hanks].
6. With these parts labeled, we can use the verb "play" to specify the wanted action, the word "movie" to specify the intent of the action, and "Tom Hanks" as the single subject for our search.
7. To do this, we need a way of labeling these words to later retrieve them for our query.
8. A common example of a sequence labeling task is part of speech tagging, which seeks to assign a part of speech to each word in an input sentence or document.
9. There are two forms of sequence labeling are:
a. Token Labeling: Each token gets an individual Part of Speech (POS) label, and
b. Span Labeling: Labeling segments or groups of words that contain one tag (Named Entity Recognition, Syntactic Chunks).
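As a small illustration, the sketch below performs token labeling (POS tagging) on the query above. It assumes NLTK is installed together with its 'punkt' and 'averaged_perceptron_tagger' resources; the printed tags are indicative only.

import nltk

tokens = nltk.word_tokenize("Play a movie by Tom Hanks")
print(nltk.pos_tag(tokens))
# e.g. [('Play', 'VB'), ('a', 'DT'), ('movie', 'NN'), ('by', 'IN'), ('Tom', 'NNP'), ('Hanks', 'NNP')]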

Q6. Write short notes on Hidden Markov Model.

Ans:

HIDDEN MARKOV MODEL (HMM):

1. Hidden Markov models (HM Ms) are sequence models.


2. HMMs are "a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobservable (i.e. hidden) states".
3. They are designed to model the joint distribution P(H, O), where H is the hidden state and O is the observed state.
4. For example, in the context of POS tagging, the objective would be to build an HMM to model P(word | tag) and compute the label probabilities given observations using Bayes' Rule: P(tag | word) ∝ P(word | tag) · P(tag).

5. HMM graphs consist of a Hidden Space and Observed Space, where the hidden space consists of
the labels and the observed space is the input.


6. These spaces are connected via transition matrices (T, A) to represent the probability of transitioning from one state to another following their connections.
7. Each connection represents a distribution over possible options; given our tags, this results in a large search space of the probability of all words given the tag.
(The hidden space holds the tags; the observed space holds the input tokens.)

8. The main idea behind HMMs is that of making observations and traveling along connections based on a probability distribution.
9. In the context of sequence tagging, there exists a changing observed state (the tag) which changes as our hidden state (tokens in the source text) also changes.
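A minimal Viterbi-style decoding sketch of this idea is shown below. The two tags, transition and emission probabilities are made-up toy numbers, used only to show how the matrices combine to score a tag sequence; a real tagger would estimate them from a corpus.

# Toy Viterbi decoding for an HMM tagger; all probabilities are illustrative.
tags = ["N", "V"]
start = {"N": 0.6, "V": 0.4}                                        # P(tag at position 0)
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}      # P(tag_i | tag_{i-1})
emit = {"N": {"book": 0.4, "flight": 0.6}, "V": {"book": 0.9, "flight": 0.1}}  # P(word | tag)

def viterbi(words):
    best = [{t: (start[t] * emit[t].get(words[0], 1e-6), [t]) for t in tags}]
    for w in words[1:]:
        layer = {}
        for t in tags:
            prob, path = max(
                (best[-1][p][0] * trans[p][t] * emit[t].get(w, 1e-6), best[-1][p][1] + [t])
                for p in tags
            )
            layer[t] = (prob, path)
        best.append(layer)
    return max(best[-1].values())[1]

print(viterbi(["book", "flight"]))   # ['V', 'N']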

Q7. Write short notes on Conditional Random Fields (CRF).

Ans:

CONDITIONAL RANDOM FIELDS (CRF):

1. Maximum Entropy Markov Models (MEMMs) also have a well-known issue known as label bias.
2. The label bias problem was introduced due to MEMMs applying local normalization.
3. This often leads to the model getting stuck in local minima during decoding.
4. The local minima trap occurs because the overall model favors nodes with the least amount of
transitions.
5. To solve this, Conditional Random Fields (CRFs) normalize globally and introduce an undirected
graphical structure.
6. The conditional random field (CRF) is a conditional probabilistic model for sequence labeling; just
as structured perceptron is built on the perceptron classifier, conditional random fields are built on
the logistic regression classifier.
7. Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting
sequential data, based on the conditional approach described in hidden markov model.
8. A CRF is a form of undirected graphical model that defines a single log-linear distribution over label
sequences given a particular observation sequence.
9. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in
the relaxation of the independence assumptions required by HMMs in order to ensure tractable
inference.
10. CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks.
11. Figure 3.5 shows the graphical structure of CRF.


Figure 3.5: Graphical structures of simple HMMs (left), MEMMs (center), and the chain-structured case of CRFs (right) for sequence labeling; an open circle indicates that the variable is not generated by the model.
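In practice, a linear-chain CRF tagger can be trained with off-the-shelf tools. The sketch below assumes the third-party sklearn-crfsuite package; the two-sentence "corpus" and the hand-made features are toy illustrations only, not a real training setup.

import sklearn_crfsuite

def word_features(sent, i):
    w = sent[i]
    return {"word.lower": w.lower(), "is_title": w.istitle(), "suffix2": w[-2:]}

train_sents = [["Book", "that", "flight"], ["Show", "me", "flights"]]
train_tags = [["VB", "DT", "NN"], ["VB", "PRP", "NNS"]]

X = [[word_features(s, i) for i in range(len(s))] for s in train_sents]
y = train_tags

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)
print(crf.predict([[word_features(["Book", "that", "meal"], i) for i in range(3)]]))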



CHAP - 4: SEMANTIC ANALYSIS

Q1. Write short notes on lexical semantics.

Ans:

~
. al semantics is the study of word m&an
IC
.. ng. _
L.eX 1
Lexical semantics plays a crucial role in semantic analysis, allowing computers to know relationships
Z· petween words, phrasal verbs, etc.

semantic analysis is the process of extracting meaning from text.


3. its computers to know and i t ts
4· It perm n erpret sentences, paragraphs, or whole documen ·
in Lexical Semantics words, sub-words, etc. are called lexical items.
s. In simple terms, lexical semantics is the relationship between lexical items, meaning of sentences
6.
and syntax of sentence.
The study of lexical semantics looks at:
7,
a. The classification and decomposition of lexical items.
b. The differences and similarities in lexical semantic structure cross-linguistically.
c The relationship of lexical meaning to sentence meaning and syntax.

ELEMENTS OF LEXICAL SEMANTIC ANALYSIS:

Followings are some important elements of lexical semantic analysis:


1. Hyponymy and Hypernymy:
Hyponymy and hypernymy refer to a relationship between a general term and the more specific terms that fall under the category of the general term.
For example, the colors red, green, blue and yellow are hyponyms. They fall under the general term of color, which is the hypernym.
2. Synonymy:

• Synonymy refers to words that are pronounced and spelled differently but contain the same
meaning.
Example: Happy,joyful, glad
3. Antonymy:
• Antonymy refers to words that are related by having the opposite meanings to each other.
• There are three types of antonyms: graded antonyms, complementary antonyms, and relational
antonyms.
Example:
dead, alive
long, short
4,~

' Homonymy refers to the relationship between words that are spelled or pronounced the same

Way but hold different meanings.


Example:
.
• bank (of river)
• bank (financial institution)
5. Polysemy:
Polysemy refers to a word having two or more related meanings.
Example:
bright (shining)
• bright (intelligent)
6. Meronomy:
It is a logical arrangement of text and words that represent a partof or member sorneth'
lt'lg.
Example: A segment of an apple.

Q2. Explain the concept of attachments for a fragment of English.

Ans:

Note: For better understanding we have explained the below answer in detail. Kindly cut it short as per your understanding while attempting in the exam.

SEMANTIC ATTACHMENT:

1. Semantic Attachment is the process of making semantics of a sentence by attaching pieces of semantics to the syntax tree.

semantics to the syntax tree.
2 It helps in creating semantic representation of a sentence.
3. There are three ways that can help to get from the syntax tree to the semantic representation. They are:
are:
a. Semantic Specialists.
b. Lambda Calculus.
c. Feature Unification.

ATTACHMENTS FOR A FRAGMENT OF ENGLISH:

I) Sentences:

l Considering the following examples.


a. Flight 487 serves lunch.
b. Serve lunch.
c. Does Flight 207 serve lunch?
d. Which flights serve lunch?

2. The meaning representations of these examples all contain propositions concerning the serving of lunch on flights.

3. However, they differ with respect to the role that these propositions are intended to serve in the

settings in which they are uttered.

4. More specifically, the first example is intended to convey factual information to a hearer, the second

is a request for an action, and the last two are requests for information.

" Handcrafted by BackkBenchers Community Page38of70



5. To capture these differences, we will introduce a set of operators that can be applied to FOPC sentences.
6. Specifically, the operators DCL, IMP, YNQ, and WHQ will be applied to the FOPC representations of declaratives, imperatives, yes-no questions, and wh-questions, respectively.
Flight 487 serves lunch:
S → NP VP { DCL(VP.sem(NP.sem)) }

Serve lunch:
S → VP { IMP(VP.sem(DummyYou)) }
Applying this rule to the example results in the following representation:
IMP(∃e Serving(e) ∧ Server(e, DummyYou) ∧ Served(e, Lunch))

Does Flight 207 serve lunch?:
S → Aux NP VP { YNQ(VP.sem(NP.sem)) }
The use of this rule with the example produces the following representation:
YNQ(∃e Serving(e) ∧ Server(e, Flt207) ∧ Served(e, Lunch))

Which flights serve lunch?:
S → WhWord NP VP { WHQ(NP.sem.var, VP.sem(NP.sem)) }
The following representation is the result of applying this rule to the example:
WHQ(x, ∃e Isa(e, Serving) ∧ Server(e, x) ∧ Served(e, Lunch) ∧ Isa(x, Flight))
II) Compound Nominals:

Compound nominals, also known as noun-noun sequences, consist of simple sequences of nouns,
as in the following examples.
a. Flight schedule
b. Summer flight schedule
2. The syntactic structure of this construction can be captured by the regular expression Noun, or by
the following context-free grammar rules.

Nominal → Noun
Nominal → Noun Nominal { λx Nominal.sem(x) ∧ NN(Noun.sem, x) }
3. The relation NN is used to specify that a relation holds between the modifying elements of a compound nominal and the head Noun.
4. In the examples given above, this leads to the following meaning representations:
λx Isa(x, Schedule) ∧ NN(x, Flight)
λx Isa(x, Schedule) ∧ NN(x, Flight) ∧ NN(x, Summer)

III) Adjective Phrases:
1. English adjectives split into two major categories: prenominal and predicate.
2. The following examples illustrate the two categories.
a. I don't mind a cheap restaurant.
b. This restaurant is cheap.
3. An obvious but often incorrect proposal for the semantic attachment is the following:
Nominal → Adj Nominal { λx Nominal.sem(x) ∧ Isa(x, Adj.sem) }
Adj → cheap { Cheap }
∃x Isa(x, Restaurant) ∧ Isa(x, Cheap)
4. This is an example of what is known as intersective semantics.
5. The best approach is to simply note the status of a specific kind of modification relation and assume that some further procedure with access to additional relevant knowledge can replace this vague relation with an appropriate representation:
Nominal → Adj Nominal { λx Nominal.sem(x) ∧ AM(x, Adj.sem) }
6. Applying this rule to a cheap restaurant results in the following formula:
∃x Isa(x, Restaurant) ∧ AM(x, Cheap)


IV) Infinitive Verb Phrases:

l. A fair umber of English verbs take some form of verb phrase as one of their arguments.
2 This complicates the normal verb phrase semantic schema since these argument verb phrases
interact with the other arguments of the head verb in ways that are not completely obvious
3. Consider the following example: "I told Harry to go to Maharani".
4. The meaning representation for this example should be something like the following

∃e, f, x Isa(e, Telling) ∧ Isa(f, Going) ∧ Teller(e, Speaker) ∧ Tellee(e, Harry) ∧ ToldThing(e, f) ∧ Goer(f, Harry) ∧ Destination(f, x)
5. There are two interesting things to note about this meaning representation: the first is that it
consists of two events, and the second is that one of the participants, Harry, plays a role in tJoth of
the two events.
6. The difficulty in creating this complex representation falls to the verb phrase dominating the verb tell, which will have something like the following as its semantic attachment.

λx, y λz ∃e Isa(e, Telling) ∧ Teller(e, z) ∧ Tellee(e, x) ∧ ToldThing(e, y)

7. Semantically, we can interpret this subcategorization frame for Tell as providing three semantic roles: a person doing the telling, a recipient of the telling, and the proposition being conveyed.




Problem: Harry is needed as the Goer of the Going event, but is not yet part of the infinitive's meaning representation.
Solution:
In this approach, the λ-expression for the semantics of the infinitive is applied to the semantics of Harry.
The expression represents a λ-reduction that inserts Harry into the Going event as its Goer.
The notation .variable is analogous to the notation used for complex-term variables; it gives us access to the event variable representing the Going event within the infinitive's meaning representation.

V) Noun Phrases:
1. A noun phrase is a group of two or more words accompanied by a noun that includes modifiers. Example: the, a, of them, with him.
2. A noun phrase plays the role of a noun.
3. In a noun phrase, the modifiers can come before or after the noun.

4. Genitive noun phrases make use of complex determiners that consist of noun phrases with
possessive markers, as in Atlanta's airport and Maharani's menu.
5. A little introspection, however, reveals that the relation between a city and its airport has little in common with a restaurant and its menu.
6. Therefore, as with compound nominals, it turns out to be best to simply state an abstract semantic
relation between the various constituents.

NP → ComplexDet Nominal { <∃x Nominal.sem(x) ∧ GN(x, ComplexDet.sem)> }
ComplexDet → NP 's { NP.sem }
7. Applying these rules to Atlanta's airport results in the following complex term:
<∃x Isa(x, Airport) ∧ GN(x, Atlanta)>


VI) Prepositional Phrases:
1. At a fairly abstract level, prepositional phrases serve two distinct functions: they assert binary relations between their heads and the constituents to which they are attached, and they signal arguments to constituents that have an argument structure.



2. These two functions argue for two distinct types of prepositional phrases that differ based on their semantic attachments.
3. We will consider three places in the grammar where prepositional phrases serve these roles: modifiers of noun phrases, modifiers of verb phrases, and arguments to verb phrases.
4. Nominal Modifier Prepositional Phrases:
Example: restaurant on Pearl
λx Isa(x, Restaurant) ∧ On(x, Pearl)
Nominal → Nominal PP { λx Nominal.sem(x) ∧ PP.sem(x) }
P → on { λy λx On(x, y) }
PP → P NP { P.sem(NP.sem) }

5. Verb Phrase Modifier Prepositional Phrases:
Example: ate dinner in a hurry
VP semantics: λx ∃e Isa(e, Eating) ∧ Eater(e, x) ∧ Eaten(e, Dinner)
PP semantics: λx In(x, <∃h Hurry(h)>)
VP → VP PP { λy VP.sem(y) ∧ PP.sem(VP.sem.variable) }
Result: λy ∃e Isa(e, Eating) ∧ Eater(e, y) ∧ Eaten(e, Dinner) ∧ In(e, <∃h Hurry(h)>)

6. Verb Argument Prepositional Phrases:
Example: I need to go from Boston to Dallas.
In examples like this, the arguments to go are expressed as a prepositional phrase. However, the meaning representations of these phrases should consist solely of the unaltered representation of their head nouns. To handle this, argument prepositional phrases are treated in the same way that nonbranching grammatical rules are; the semantic attachment of the noun phrase is copied unchanged to the semantics of the larger phrase.
PP → P NP { NP.sem }
Q3. Explain Homonymy and Polysemy.

Ans:

HOMONYMY:
1. Homonymy refers to two unrelated words that look or sound the same.
2. Two or more words become homonyms if they either sound the same (homophones), have the same spelling (homographs), or both, but do not have related meanings.

3. Given below are some examples of homonyms:
.
a. Stalk:
The main stem of a herbaceous plant
Pursue or approach stealthily
b. Bank:
Financial Institution
Riverside

POLYSEMY:
1. Polysemy refers to words or phrases with different, but related meanings.
2. A word becomes polysemous if it can be used to express different meanings.
3. The difference between these meanings can be obvious or subtle.
4. It is sometimes difficult to determine whether a word is polysemous or not because the relations between words can be vague and unclear.
5. But, examining the origins of the words can help to decide whether a word is polysemic or homonymous.
6. The following sentences contain some examples of polysemy.
a. He drank a glass of milk.
b. He forgot to milk the cow.
c. The enraged actor sued the newspaper.
d. He read the newspaper.
e. His cottage is near a small wood.
f. The statue was made out of a block of wood.
g. He fixed his hair.
h. They fixed a date for the wedding

DIFFERENCE BETWEEN POLYSEMY AND HOMONYMY:

Polysemy | Homonymy
Polysemy is the coexistence of many possible meanings for a word or phrase. | Homonymy refers to the existence of unrelated words that look or sound the same.
Polysemy has different yet related meanings. | Homonymy has completely different meanings.
Polysemy has related word origins. | Homonymy has different origins.
Polysemous words are listed under one entry in dictionaries. | Homonymous words are listed separately in dictionaries.
The meaning of polysemous words can be guessed if you know the meaning of one word. | The meaning of homonymous words cannot be guessed since the words have unrelated meanings.

Q4. Explain Meronymy.

Ans:

MERONYMY:
1. A meronym is a word that represents a constituent part or a member of something.
2. For example, Guava is a meronym of Guava-tree (sometimes written as guava < guava tree).
3. This part-to-whole relationship is named as meronymy.
4. Meronymy is not only a single relation but a bunch of different part-to-whole relationships.
5. It is also expressed in terms of first-order logic.
6. It can also be considered as a partial order.
7. In knowledge representation languages, meronymy is often represented as "part-of".
8. Figure 4.1 shows an example of meronymy.

aaws Thighs

Figure 4.1: Example of Meronymy

Q5. Explain Synonymy & Antonymy.

Ans:

SYNONYMY:

1. Synonymy in semantics refers to a word with the same or nearly the same meaning as another word.
2. The term synonymy originates from the Greek words sun and onoma, which mean 'with' and 'name'.
3. A synonym is a word or phrase that means exactly the same as another word or phrase, in the same language.

4. In other words, synonyms are words with similar meanings.
5. For instance, words like delicious, yummy, succulent are synonyms of the adjective tasty; similarly, verbs like commence, initiate, and begin are synonyms of the verb start.
6. However, some synonyms do not have exactly the same meaning; there may be minute differences.
7. Sometimes a word can be synonymous with another in one context or usage but not in another.
Examples:
a. Beautiful - Gorgeous
b. Purchase - Buy
c. Use - Employ
d. Rich - Wealthy
e. Mistake - Error
f. Big - Large
g. Small - Little

ANTONYMY:
1. Antonyms are words that have opposite or contrasting meanings.
2. For example, the antonym of hot is cold; similarly, the antonym of day is night.
3. Antonyms are actually the opposite of synonyms.
4. Furthermore, there are three types of antonyms: gradable, complementary, and relational antonyms.
5. Gradable antonyms are pairs of words with opposite meanings that lie on a continuous spectrum.
6. For example, if we take age as a continuous spectrum, young and old are two ends of the spectrum.
7. Complementary antonyms are pairs of words with opposite meanings that do not lie on a continuous spectrum.
8. For example, interior: exterior, true: false, and inhale: exhale.

9. Relational antonyms are pairs of words that refer to a relationship from opposite points of view.
10. For example, doctor: patient, husband: wife, teacher: student, sister: brother.

QS. Explain Hypernymy and Hyponymy.

Ans:

HYPERNYMY & HYPONYMY:


1· Hypernymy is the sense which is a superclass.

l. Example:

a. Animal is a hypernym of dog


b. Fruit is a hypernym of mango
t. Vehicle is a hypernym of car
l. Hy f ther sense
Ponyrny is the sense which is a subclass o ano
4. Example:
a. Dog is a hyponym of animal.
b. Car is a hyponym of vehicle.

c. Mango is a hyponym of fruit.
5. In simpler terms, a hyponym is in a type-of relationship with its hypernym.
6. Hypernyms and hyponyms are asymmetric.
7. Hyponymy can be tested by substituting X and Y in the sentence "X is a kind of Y" and determining if it makes sense.
8. For example, "A screwdriver is a kind of tool" makes sense, but not "A tool is a kind of screwdriver".
9. Hyponymy is a transitive relation: if X is a hyponym of Y, and Y is a hyponym of Z, then X is a hyponym of Z.
10. For example, violet is a hyponym of purple and purple is a hyponym of color; therefore, violet is a hyponym of color.
11. A word can be both a hypernym and a hyponym: for example, purple is a hyponym of color but is itself a hypernym of the broad spectrum of shades of purple between the range of crimson and violet.
12. The hierarchical structure of semantic fields can be mostly seen in hyponymy.
13. They could be observed from top to bottom, where the higher level is more general and the lower level is more specific.
14. Figure 4.2 shows an example of hypernymy and hyponymy.


Figure 4.2: Example of Hypernymy and Hyponymy

Q6. Explain WordNet.

Ans:

WORDNET:

1. WordNet is a big collection of words from the English language that are related to each other and are grouped in some way.
2. It is also called as a lexical database.
3. In other words, WordNet is a database of English words that are connected together by their
semantic relationships.
4. It is like a superset dictionary with a graph structure.

s. WordNet groups nouns, verbs, adjectives, etc. which are similar and the groups are called synsets
or synonyms.

6. In a wordnet a group of synsets may belong to some other synset.


7. For example, the synsets stones and cement belong to the synset "Building Materials", and the synset "Stones" also belongs to another synset called "Stonework".
8. In the given example, stones and cement are called hyponyms of the synsets building materials and stonework.
9. Every member of a synset denotes the same concept, but not all synset members are interchangeable in context.
10. The membership of words in multiple synsets or concepts mirrors polysemy or multiplicity of meaning.

11. rhere are three principles the synset construction process must adhere to:
Minimalitv:
a. -
• This principle determine which
s on capturing those minimal set of the words in the synset
especially identifies the concept.
• For example, (family house) · . h h use of
uniquely identifies a concept example: "she . from t
' r e
the Classical Singers of Hyderabad".
b. Coverage:
The main aim of coverage is completion of the synset, that is, capturing all those words that represent the concept expressed by the synset.
In the synset, the words should be ordered according to their frequency in the collection.
c. Replaceability:

• Replaceability dictates that the most common words in the synset, that is words towards
the beginning of the synset should be able to replace one another.
12. Figure 4.3 shows an example of WordNet.
12. Figure 43 shows the example of Word net.

Figure 4.3: Example of Wordnet


13. In the above figure, we can see that motorcar is a motor vehicle, and it also contains subjects like compact and gas guzzler.
14. Trying to capture relationships of each word, and all the senses of each word, is extremely difficult.
15. Agreeing on the senses and boundaries of a word is also not simple.
16. These are just some of the limitations of using WordNet.
Q7. Explain WSD in detail.

Ans:
WSD:
1. WSD stands for Word Sense Disambiguation.
2. Words have different meanings based on the context of their usage in the sentence.
3. In human languages, words can be ambiguous too because many words can be interpreted in multiple ways depending upon the context of their occurrence.
4. Word sense disambiguation, in natural language processing (NLP), may be defined as the ability to determine which meaning of a word is activated by the use of the word in a particular context.
5. Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP system faces.
6. Part-of-speech (POS) taggers with a high level of accuracy can solve a word's syntactic ambiguity.
7. On the other hand, the problem of resolving semantic ambiguity is called word sense disambiguation.
8. Resolving semantic ambiguity is harder than resolving syntactic ambiguity.
9. For example, consider the two examples of the distinct senses that exist for the word "bass":

a. I can hear bass sound.


b. He likes to eat grilled bass.
10. The occurrence of the word bass clearly denotes the distinct meaning.
11. In the first sentence it means frequency, and in the second, it means fish.
12. Hence, if it were disambiguated by WSD, then the correct meaning of the above sentences can be assigned as follows:
a. I can hear bass/frequency sound.
b. He likes to eat grilled bass/fish.

Approaches and Methods to Word Sense Disambiguation (WSD):

I) Dictionary-based or Knowledge-base.~ Methods:


1. As the name suggests, for disambiguation, these methods primarily rely on dictionaries, thesauri and lexical knowledge bases.
2. They do not use corpora evidence for disambiguation.
3. The Lesk method is the seminal dictionary-based method introduced by Michael Lesk in 1986.
4. The Lesk definition, on which the Lesk algorithm is based, is "measure overlap between sense definitions for all words in context".
5. However, in 2000, Kilgarriff and Rosensweig gave the simplified Lesk definition as "measure overlap between sense definitions of word and current context", which further means identify the correct sense for one word at a time.
6. Here the current context is the set of words in the surrounding sentence or paragraph.
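NLTK ships a simplified Lesk implementation, so the overlap idea can be tried directly. The sketch below assumes NLTK with the 'wordnet' corpus installed; which sense is returned depends on the glosses, so the comment only indicates the expected tendency.

from nltk.wsd import lesk

context = "He likes to eat grilled bass with lemon".split()
sense = lesk(context, "bass")           # picks the synset whose gloss overlaps the context most
print(sense, "-", sense.definition())   # expected to lean toward the fish sense of "bass"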
II) Supervised Methods:
1. For disambiguation, machine learning methods make use of sense-annotated corpora to train.


2. These methods assume that the context can provide enough evidence on its own to disambiguate the sense.
3. In these methods, the words "knowledge" and "reasoning" are deemed unnecessary.
4. The context is represented as a set of "features" of the words.
5. It includes the information about the surrounding words also.
6. Support vector machines and memory-based learning are the most successful supervised learning approaches to WSD.
7. These methods rely on a substantial amount of manually sense-tagged corpora, which is very expensive to create.

III) Semi-supervised Methods:
1. Due to the lack of training corpus, most of the word sense disambiguation algorithms use semi-supervised learning methods.
2. It is because semi-supervised methods use both labelled as well as unlabeled data.
3. These methods require a very small amount of annotated text and a large amount of plain unannotated text.
4. The technique that is used by semi-supervised methods is bootstrapping from seed data.

These methods assume that similar senses . . .


_ ccur
0 context.
' in slrnilar
2. That is why the senses can be induced fro .
m text by clustering word occurrences by using some
measure of similarity of the context.

3. This task is called word sense induction or discrimination.


4 . unsupervised methods have great pate n t'1a It o overcome the knowledge acquisition bottleneck
due to non-dependency on manual efforts.

Difficulties in Word Sense Disambiguation (WSD):

I) Differences between dictionaries:

• The major problem of WSD is to decide the sense of the word because different senses can be very
closely related.

' Even different dictionaries and thesauruses can provide different divisions of words into senses.

II) Different algorithms for different applications:


' Another problem of WSD is that completely different algorithm might be needed for different
applications.
' For example, in machine translation, it takes the form of target word selection; and in information

retrieval, a sense inventory is not required.

III) Inter-judge variance:
Another problem of WSD is that WSD systems are generally tested by having their results on a task compared against the task of human beings.
' This is called the problem of interjudge variance.

IV) Word-sense discreteness:
Another difficulty in WSD is that words cannot be easily divided into discrete submeanings.

Applications of Word Sense Disambiguation (WSD):

I) Machine Translation:


Machine translation or MT is the most obviousapplication of WSD.
In MT, lexical choice for the words that have distinct translations for different senses is done by WSD.
The senses in MT are represented as words in the target language.
Most of the machine translation systems do not use an explicit WSD module.

11) Information Retrieval IR :


Information retrieval (IR) may be defined as a software program that deals with the organization, storage, retrieval and evaluation of information from document repositories, particularly textual information.
The system basically assists users in finding the information they require, but it does not explicitly return the answers to the questions.
WSD is used to resolve the ambiguities of the queries provided to the IR system.
As with MT, current IR systems do not explicitly use a WSD module; they rely on the concept that the user would type enough context in the query to only retrieve relevant documents.

111) Text Mining and Information Extraction (IE):


In most of the applications, WSD is necessary to do accurate analysis of text.
For example, WSD helps an intelligent gathering system to do flagging of the correct words.
For example, a medical intelligent system might need flagging of "illegal drugs" rather than "medical drugs".
IV) Lexicoaraphy:
WSD and lexicography can work together in a loop because modern lexicography is corpus-based.
With lexicography, WSD provides rough empirical sense groupings as well as statistically significant contextual indicators of sense.
Q8. Describe Dictionary Based Algorithm.

Ans:
DICTIONARY BASED ALGORITHM:
1. A simple approach to segment text is to scan each character one at a time from left to right and look up those characters in a dictionary.
2. If the series of characters is found in the dictionary, then we have a matched word and segment that sequence as a word.
3. But this will match a shorter length word, as Khmer has many of them.
4. There are several ways to better implement this approach.

I) Longest Matching:
1. One way to avoid matching the shortest word is to find the longest sequence of characters in the dictionary instead.
2. This approach is called the longest matching algorithm or maximal matching.
3. This is a greedy algorithm that matches the longest word.
4. For example, in English, we have this series of characters: "themendinehere".
5. For the first word, we would find: the, them, theme, and no longer word would match after that.
6. Now we just choose the longest, which is "theme", then start again from 'n'.
7. But now we don't have any word in this series "ndineh...".
8. When we can't match a word, we just mark the first character as unknown.
9. So in "ndineh...", we just take 'n' out as an unknown word and match the next word starting with 'd'.
10. Assume the words "din" or "re" are not in our dictionary; we would get the series of words as "theme n dine here".
11. We only get one unknown word here.


12. But as you can see, the longest word can make incorrect segmentation.
13. This results in overextending the first word "theme" into a part of the second word "men", making the next word unknown: "n".
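The longest-matching idea can be written in a few lines. Below is a minimal Python sketch with a made-up mini dictionary that reproduces the "theme n dine here" behaviour described above; a real system would use a full lexicon.

# Minimal sketch of greedy longest (maximal) matching segmentation; toy dictionary.
DICT = {"the", "them", "theme", "me", "men", "mend", "dine", "here", "in"}

def longest_match(text):
    words, i = [], 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):      # try the longest candidate first
            if text[i:j] in DICT:
                match = text[i:j]
                break
        if match:
            words.append(match)
            i += len(match)
        else:                                   # unknown character: emit it and move on
            words.append(text[i])
            i += 1
    return words

print(longest_match("themendinehere"))   # ['theme', 'n', 'dine', 'here']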

II) Bi-Directional Maximal Matching:
1. One way to solve this issue is to also match backward.
2. This approach is called bi-directional maximal matching, as proposed by Narin Bi et al.
3. It goes from left to right (forward matching) first, then from the end of the sentence it goes from right to left (backward matching).
4. Then it chooses the best result.
5. As we have seen earlier, the forward pass gave us incorrect segmentation.
6. But the backward pass would give us the correct result. Narin Bi et al. show an accuracy of 98% for the Bi-Directional Maximal Matching algorithm.


III) Maximum Matching:


1. Another approach to solving the greedy nature of longest matching is an algorithm called 'maximum matching'.
2. This approach would segment multiple possibilities and choose the one with fewer words in the sentence.
3. It would also prioritize fewer unknown words.
4. Using this approach, we would get the correct segmentation from the example text, as shown below in Figure 4.4.

Figure 4.4: Maximum Matching

5. The first word can be "the", "them", and "theme".


6. From each of these nodes,there are multiple choices.
7. Like in "the", the next word can be "me", "men", or "mend".
8. The word "mend" would result in an incorrect word after "in".
9. Only "men" would give us "dine", then "here" as the correct segmentation.
10. This approach goes through all the different combinations based on our dictionary.
11. So unknown words are still a problem that we have not addressed yet.
12. In addition, it can still result in errors with the known words when the correct segmentabletext
prefers more words instead of a fewer number of words.

CHAP - 5: PRAGMATICS

Q1. Write short notes on Discourse, Reference Resolution & Coreference Resolution.

Ans:

DISCOURSE:
,ou~
1. Discourse in the context of NLP refers to a sequence of sentences occurring one after the other.
2. There will obviously be entities that are being talked about and possible references to those entities in the discourse.
3. An example of a discourse:
Ana is a Graduate Student at UT Dallas.
She loves working on Natural Language Processing at the institute.
Her hobbies include blogging, dancing and singing.

4. Here, "Ana", "Natural Language Processing" and "UT Dallas" are possible entities.

5. "She"and "Her" are references to the entity "Ana" and "the institute" is a reference to the entity
"UT Dallas".

REFERENCE:

1. Reference,in NLP, is a linguistic process where one word in a sentence or discourse may refer to
another word or entity.
2 The task of resolving such references is known as Reference Resolution.
3. In the above example, "She" and "Her" referring to the entity "Ana" and "the institute" referring to
the entity "UT Dallas"are two examples of Reference Resolution.

DISCOURSE - REFERENCE RESOLUTION:

l. Discourse in the context of NLP refers to a sequence of sentences occurring one after the other.
2 Reference is a linguistic processwhere one word in a sentence or discourse refers to another word
or entity
3. The task of resolvingsuch references is known as Discourse - Reference Resolution.

COREFERENCE RESOLUTION;

l. Coreference Resolutionin particular, is the process of resolving pronouns to identify which


entities are they referring to.
2. It is also a kind of Reference Resolution.
3. The entities resolved may be a person, place, organization, or event.
4. Referent is the object that is being.referred to.


5. For example, "Ana" is the referent in the above example of a discourse.
6. Referring expressions are the mentions or linguistic expressions that refer to a discourse entity.
7. Two or more referring expressions that refer to the same discourse entity are said to corefer.
8. Now, let us look at another example to understand this better.

Elon Musk was born on June 28, 1971. He is the founder, CEO, chief engineer and designer of SpaceX. The 49-year-old is widely known as the mind behind Neuralink.
9. Referring Expressions: Elon Musk, He, The 49-year-old.
10. Referent: Elon Musk.
11. Corefering Expressions: {Elon Musk, He}, {Elon Musk, The 49-year-old}.
12. References are usually of two kinds: Exaphor and Endophor.
13. Endophor refers to an entity that appears in the discourse.
14. While Exaphor refers to an entity that does not appear in the discourse.
15. Example of Endophor:

All~• recently read a wonderful story.


• ferrent that is mentioned explicitly in the
Here "She" refers to "Ana" which appears as possib 1
a
discourse.

16. Example of Exaphor.

r--~
Pick that up. (pointing to an object not mentioned in discourse)

Here "that" refers to a object which appears as a possible referrent for a object that it not mentioned
explicitly in the discourse
17. There are primarily two kinds of Endophors: Anaphor and Cataphor.
18. Anaphor refers to a situation wherein the referential entity or referent appears before its referencin
pronoun in the discourse.
19. Example of Anaphor:

Ana bought a dress. She loves it.


20. The Cataphor refers to a situation wherein the referencing pronoun occurs before its referential entity in the discourse.
21. Example of Cataphor:
When she bought the dress, Ana didn't know it was torn.
Here "she" occurs before its referential entity or referent "Ana" in the discourse. Thus, this is an example of cataphor.
22. The set of corefering expressions is also called a coreference chain or a cluster.
:> a coreference chain or a cluster.
Q2. Write short notes on Types of Referring Expressions.

Ans:

TYPES OF REFERRING EXPRESSIONS:

The five types of referring expressions are described below:


JD_definite Noun Phrases:
such kind of reference represents th ··
• e entities that are new to the hearer into the discourse
, For example - in the sentence Ram h d
cont

indefinite reference.
a gone around one day to bring him some food - some is

2. Definite Noun Phrases:

• Opposite to above, such kind of reference represents the entities that are not new or identifiable
t
the hearer into the discourse context.
For example, in the sentence - I used to read The Times of India - The Times of India is a definite
reference.

3. Pronouns:
It is a form of definite reference.
For example, Ram laughed as loud as he could. The word he represents pronoun referring
expression.

4. Demonstratives:
These demonstrate and behave differently than simple definite pronouns.
For example, this and that are demonstrative pronouns.

s, Names:
It is the simplest type of referring expression.
It can be the name of a person, organization and location also.
For examp I e, m. the above examples' Ram is the name-refereeing expression.
- . -'·"Rl!!nchers Community Page55of70
_, S ),
SEM - 8 I BE -
Chap-5 f Pragmatics en . .
ic constraints
Q3. Write short not" on syntaedC a on co reference.
semant
Ans: E~E;
SYNTACTIC & SEMANTIC CONSTRAINTS ON COREFERENCE:
1. Reference relations may also be constrained by the syntactic relationships between a referring expression and a possible antecedent noun phrase when both occur in the same sentence.
2. For instance, the pronouns in all of the following sentences are subject to the constraints indicated in brackets.
a. John bought himself a new Acura. [himself=John]
b. John bought him a new Acura. [him≠John]
c. John said that Bill bought him a new Acura. [him≠Bill]
d. John said that Bill bought himself a new Acura. [himself=Bill]
e. He said that he bought John a new Acura. [He≠John; he≠John]
f. John wanted a new car. Bill bought him a new Acura. [him=John]
g. John wanted a new car. He bought him a new Acura. [He=John; him≠John]
. ew Acura. e- o '
g. John wanted a new car. He bought hrm a n lled reflexives
3. English pronouns such as himself, herself, and themselves are ca . h th . bj
ft xiVe corefers wit e su ect of the
4. Oversimplifying the situation considerably, a re e . rr,
......eas a nonrefleXJve cannot corefer .,..n...
I immediate clause that contains it (example: aj , ) wh "
""'
subject (example: b). .
·1 mmediate clause rs shown by exam i~-
5. That this rule applies only for the subject of them ost i:~
·: manr
· 'fest between the pronoun and the ~;
and (d), in which the opposite reference pattern rs
of the higher sentence.
6. On the other hand, a full noun phrase like John cannot corefer with the subject of the mes
immediate dause nor with a higher-level subject (example: e).
7. Whereas these syntactic constraints apply to a referring expr!:SSion and a particular ~~
I antecedent noun phrase, these constraints actually prohibit coreference between the t'No

I regardless of any other available antecedents that denote the same entity.

I 8. For instance, normally a nonreflexive pronoun like him can corefer with the subject of the previous
sentence as it does in example (f), but it cannot in example (g) because of the existence of lh¬
coreferential pronoun he in the second clause.
9. The rules given above oversimplify the situation in a number of ways, and there are many cases that
they do not cover.
10. Indeed, upon further inspection the facts actually get quite complicated.
11. In fact, it is unlikely that all of the data can be explained using only syntactic relations.
12 For instance, the reflexive himself and the nonreflexive him in sentences (example: h) and (exam
e: i) respectively can both refer to the subject John, even though they occur in identical
syntactic configurations.
h. John set the pamphlets about Acuras next to him If . [h'rmself=John]
s
i. John set the pamphlets about Acuras next to h irn.
· [hirm=John]

" Handcrafted by BackkBenchers


Community PageSGof70

CHAP - 6: APPLICATIONS

Q1. Write short notes on machine translation in NLP and explain the different types of machine translations.

Ans:
1. Machine Translation is also known as robotized interpretation or automated translation.
2. Machine Translation or MT is simply a procedure when a computer software translates text from one language to another without human contribution.
3. At its fundamental level, machine translation performs a straightforward replacement of atomic words in a single characteristic language for words in another.

4.
using corpus methods, more complicated translations can be conducted, taking into account
better treatment of contrasts in phonetic typology, express acknowledgement, and translations of
idioms, just as the seclusion of oddities.
S. 1n simple language, we can say that machine translation works by using computer software to
translate the text from one source language to another target language.
6. Thus. Machine Translation (Mn is the task of automatically converting one natural language into
another, preserving the meaning of the input text, and producing fluent text in the output language.

k!lallenges of Machine Translation:


1. The large variety of languages, alphabets and grammars.
2. The task of translating a sequence to a sequence is harder for a computer than working with numbers only.
3. There is no one correct answer (example: when translating from a language without gender-dependent pronouns, he and she can be the same).

TYPES OF MACHINE IRANSLATIONS:

There are four types of machine translation:

Machine Translation is of four types: Statistical Machine Translation, Rule-based Machine Translation, Hybrid Machine Translation, and Neural Machine Translation.

I) Statistical Machine Translation (SMTI:


It works by alluding to statistical models that depend on the investigation of huge volumes
of bilingual content.
It aims to decide the correspondence between a word from the source language and a word
from the,objective language.

• Handcrafted by BackkBenchers Community Page57of70


sEM - 8 I BE • CO~pU,.
- - ~~

'
Chap-5 I Appllcatlons
However, its most noteworthy disadvantage is that it doesn't translate in context, which implies translations can regularly be wrong. In other words, don't expect quality translation. Presently, MT is extraordinary for basic translations. The statistical-based machine translation models are:
Word-based translation.
Phrase-based translation.
Syntax-based translation.
Hierarchical phrase-based translation.
II) R.u - 1ed Machin• rr1n1latlon fBJiMil.i
mmatical rules.
RBMT ba I lty translot s the basics of gra e and the target language to creat
. . f the source languag e th~
It directs o grammatical examination o ·
translated s ntence. . dependence on lexicons rnean
dl g and its heavy s that
But, RBMT requires extensive proof rea in

efficiency is achieved after a long period of time.

Ill) HY,.brld Machioe Tr s!il~io~JJ,!.:.LU!


HMT, as the term demonstrates, is a mix of RBMT and SMT. . .
. . bl more successful regar,drng quality.
It uses a translation memory, making tt unquest1ona Y .
. t of which is the requirement for enorrn
However, even HMT has a lot of downsides, the b1gges ou
editing, and human translators will be required.
• · • statistical rule generation, multi-pass
There are several approaches to HMT like t1-eng1ne, •and
m
confidence-based.
IV)
NMT is a type of machine translation that relies upon neural network models (based on the huma
brain) to build statistical models with the end goal of translation.
The essential advantage of NMT is that it gives a solitary system that can be pre pa red to unravel
t source and target text.
• Subsequently, it doesn't rely upon specific systems that are regular to other machine translatio
systems, particularly SMT.

Q2. Difference between Rule Based MT vs Statistical MT

Ans:

Table 6.1: Difference between Rule Based MT VS . .


Rule Based MT | Statistical MT
Consistency between versions | Inconsistency between versions
Knows grammatical rules | Does not know grammar
High performance and robustness | High CPU and disk space requirements
Consistent and predictable quality | Unpredictable translation quality
Good out-of-domain translation quality | Poor out-of-domain quality
Lack of fluency | Good fluency
Hard to handle exceptions to rules | Good for catching exceptions to rules
High development and customization costs | Rapid and cost-effective development costs, provided the required corpus exists
_pu~i
I

explain information retrieval and its types

,.ns:
~
l. information retrieval (IR) may be defined as a software program that deals with the or~nization,
age retrieval and evaluat'1on f ·
stor • o in f rom document repos·itori·es parti·cularly tex- tual
f ormati·on information.

2 Information retrieval (IR)is finding material (usuallydocuments) of an unstructured nature (usually


text) that satisfies an information need from within large collections (usually stored on computers-}.
3. Google search is one of the famous example of Information Retrieval.
4. With the help of figure 6.1, we can understand the processof IR
Document Query

l
Indexing
) Indexing
(Query Analvsis]

Representation ~------- Representation


(Keywords) Query Evaluation (Keywords)

Figure 6.1: Information Retrieval


5. An information retrieval comprisesof the following four key elements:
a. D - Document Representation.
b. Q - Query Representation.
c. F - A framework to match and establish a relationship between D and Q.
d. R (q, di) - A ranking function that determines the similarity between the query and the
document to display relevant information.

TYPES OF INFORMATION RETRIEVAL (IR) MODELS:

Information retrieval models predict and explain what a user will find in relevance to the given query.
The following are three models that are classified for the Information model (IR) model:

I) Classical IR Models:
It is designed upon basic mathematical concepts and is the most widely-used of IR models.
• Handcrafted by BackkBenchers Community Page.59 of 70
• Classic Information Retrl val models can be Implemented with ease.
blllstlc IR models.
• Its examp195 Include V ctor-space, Bool an ond Proba ts containing the defineq
• In this system, th retrlcv I of Inform tlon depends on documen set 01
queries.. Ther Is no ranking or grading of ny kind.
Query represer tation
The different claS!.i'-al IR models take Document R epresentation. , al'\~

RetrievaVMatching function Into account In their modelling.

II) Non-Cltsslctl lR Mod•ls:


• These are completely opposite to the classlcal IR models. .
bllity Boolean operations.
• These are based on principles other than similarity, pro ba ' .
. models, Situation
f matlon logic th
• Following are the examples of Non-classical IR models: I n or E!Qry
models, Interaction models.

Ill) Alternative IR Models: .


k e of some specific technique
• is·
I the enhancement of the classical IR model that ma es us s frorri
some other fields.
· st models Fuzzy models, Latent Sern
Following are the examples of Alternative IR models: Cu1 er ' antic::
Indexing (LSI) models.

Q4. Explain design features of information retrieval systems

Ans:

DESIGN FEATURES OF IR SYSTEMS:

I) Inverted Index:
l. The primary data structure of most of the IR systems is in the form of inverted index.
2 We can define an inverted index !JS a data structure that list, for every word, all documents that
contain it and frequency of the occurrences in document.
3. It makes it easy to search for 'hits' of a query word.
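A minimal sketch of how such an index is built is shown below; the two documents are toy examples, and the term → {doc_id: frequency} layout is just one common representation.

from collections import defaultdict

docs = {1: "information retrieval finds information", 2: "retrieval of documents"}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term][doc_id] = index[term].get(doc_id, 0) + 1

print(index["retrieval"])    # {1: 1, 2: 1} -> documents containing the term, with frequencies
print(index["information"])  # {1: 2}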

II) Stop Word Elimination:


l. Stop words are those high frequency words that are deemed unlikely to be useful for searching.
2 They have less semantic weights.
3. All such kind of words are in a list called stop list.
4. For example, articles "a", "an", "the" and prepositions like "in", "of', "for", "at" etc. are the example
of stop words.
5. The size of the inverted index can be significantly reduced by stop list.
6. As per Zipfs law, a stop list covering a few dozen words reduces the size of invertedindex by almo
half.
7. On the other hand, sometimes the elimination of stop word may cause elimination of the term th
is useful for searching.

S. For example, if we eliminate the alphabet "A" from "Vitamin A" then it would have no significance

• Handcrafted by BackkBenchers Community


P - --
age 60 of7
.APPiications
~I -------------~S~EM-8f8E
·COMPUTER

·I) ~the simplified form of morphological tl.


'' 5terrif1'l' , b h 1 s, ls th heuri· tic pro,'11'~ of m~r;tlnqtl'l4
< nays
i. torrn of words Y c opplng off the ends of words.
, pase
r.:or e,carnple, the words laughing, laughs, laughed would be sternme· d to t"'
r b 11 root Wr:JrrJ /tJIJ(J/'I.

EJ<plaln the Boolean Model


aS·
fJ'fl.
~
1. It is the oldest information retrieval (IR) model.
2. The model is based on set theory and Boolean algebra, where documents are sets of terms and queries are Boolean expressions on terms.
3. The Boolean model can be defined as -
   D: A set of words, i.e., the indexing terms present in a document. Here, each term is either present (1) or absent (0).
   Q: A Boolean expression, where terms are the index terms and operators are logical products - AND, logical sums - OR and logical differences - NOT.
   F: Boolean algebra over sets of terms as well as over sets of documents.
   If we talk about relevance feedback, then in the Boolean IR model the relevance prediction can be defined as follows -
   R: A document is predicted as relevant to the query expression if and only if it satisfies the query expression, as in -
   ((text ∨ information) ∧ retrieval ∧ ¬theory)
4. We can explain this model by a query term as an unambiguous definition of a set of documents.
5. For example, the query term "economic" defines the set of documents that are indexed with the term "economic".
6. Now, what would be the result after combining terms with the Boolean AND operator?
7. It will define a document set that is smaller than or equal to the document sets of any of the single terms.
8. For example, the query with terms "social" and "economic" will produce the set of documents that are indexed with both the terms.
9. In other words, a document set with the intersection of both sets.
10. Now, what would be the result after combining terms with the Boolean OR operator?
11. It will define a document set that is bigger than or equal to the document sets of any of the single terms.
12. For example, the query with terms "social" or "economic" will produce the set of documents that are indexed with either the term "social" or the term "economic".
13. In other words, a document set with the union of both sets.
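A small Python sketch of Boolean retrieval over the "social"/"economic" example above (the document contents are made up); AND maps to set intersection and OR to set union:

# Each document is represented only by the set of index terms it contains.
docs = {
    "d1": {"social", "economic", "policy"},
    "d2": {"economic", "theory"},
    "d3": {"social", "culture"},
}

def boolean_and(term1, term2):
    """AND: documents indexed with both terms (set intersection)."""
    return {d for d, terms in docs.items() if term1 in terms and term2 in terms}

def boolean_or(term1, term2):
    """OR: documents indexed with either term (set union)."""
    return {d for d, terms in docs.items() if term1 in terms or term2 in terms}

print(boolean_and("social", "economic"))   # {'d1'}
print(boolean_or("social", "economic"))    # {'d1', 'd2', 'd3'}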

Advantages of the Boolean Model:

• It is the simplest model, which is based on sets.


• It is easy to understand and implement.
• It only retrieves exact matches.

Disadvantages of the Boolean Model:

• The usage of a Boolean operator has much more influence than a critical word.
• It is difficult to translate an information need into a Boolean expression, and it is hard to control the number of documents retrieved.
• The retrieved documents are not ranked.
Q. Explain the difference between Data Retrieval and Information Retrieval
Ans:

Aspect | Data Retrieval (DR) | Information Retrieval (IR)
Definition | Data retrieval means obtaining data from a Database Management System (DBMS) such as an ODBMS. | Information retrieval means retrieving the documents relevant to the searched queries.
Data | DR deals with structured data with well-defined semantics. | IR deals with unstructured / semi-structured data.
Results | Querying a DR system produces exact/precise results, or no results if no exact match is found. | Querying an IR system produces multiple results with ranking. Partial match is allowed.
Queries | The input queries are of the form of SQL or relational algebra. | The input queries are of the form of keywords or natural language.
Ordering of results | Mostly, the results are unordered. | The results are always ordered by relevance.
Accessibility | DR systems can be accessed only by knowledgeable users or by processes run by automation. | IR systems can be accessed by any non-expert human, unlike DR.
Inference | The inference used in data retrieval is of the simple deductive kind. | In information retrieval it is far more common to use inductive inference.
Model | It follows a deterministic modelling approach. | It follows a probabilistic modelling approach.
Classification | In DR, we are most likely to be interested in a monothetic classification, that is, one with classes defined by objects possessing attributes that are both necessary and sufficient to belong to a class. | In IR such a classification is on the whole not very useful; in fact more often a polythetic classification is wanted, in which each individual in a class possesses only a proportion of the attributes shared by the class.

individual in a class will possess only a


Q. What is Question Answering System in NLP?

Ans:

QUESTION ANSWERING SYSTEM:
1. Question Answering System is a branch of learning of Information Retrieval and NLP.
2. Question answering focuses on building systems that automatically answer questions posed by humans in a natural language.
3. A computer understanding of natural language consists of the capability of a program system to translate sentences into an internal representation so that this system generates valid answers to questions asked by a user.
4. Valid answers mean answers relevant to the questions posed by the user.
5. To form an answer, it is necessary to execute the syntax and semantic analysis of a question.
6. The process of the system is as follows:
   a. Query Processing.
   b. Document Retrieval.
   c. Passage Retrieval.
   d. Answer Extraction.

TYPES OF QUESTION ANSWERING:

I) IR-based Factoid Question Answering:

1. The goal is to answer a user's question by finding short text segments on the Web or some other collection of documents.
2. In the question-processing phase a number of pieces of information from the question are
extracted.
3. The answer type specifies the kind of entity the answer consists of (person, location, time, etc.).
4. The query specifies the keywords that should be used for the IR system to use in searching for
documents.
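A toy sketch of the question-processing step described above (answer-type detection plus query keyword extraction); the rules and stop list are illustrative assumptions, not a standard algorithm:

def process_question(question):
    """Toy question-processing step: guess the answer type and pick query keywords."""
    q = question.lower().rstrip("?")
    if q.startswith("who"):
        answer_type = "PERSON"
    elif q.startswith("where"):
        answer_type = "LOCATION"
    elif q.startswith("when"):
        answer_type = "TIME"
    else:
        answer_type = "OTHER"
    stop = {"who", "where", "when", "what", "is", "the", "of", "did", "was", "a"}
    keywords = [w for w in q.split() if w not in stop]
    return answer_type, keywords

print(process_question("Who invented the telephone?"))
# ('PERSON', ['invented', 'telephone'])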

II) Knowledge-based question answering:

1. It is the idea of answering a natural language question by mapping it to a query over a structured database.
2. The logical form of the question is thus either in the form of a query or can easily be converted into one.


3. The database can be a full relational database, or a simpler structured store such as a set of RDF triples.
4. Systems for mapping from a text string to any logical form are called semantic parsers.
5. Semantic parsers for question answering usually map either to some version of predicate calculus or to a query language like SQL or SPARQL.

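A minimal pattern-based sketch of mapping one question shape to a SPARQL query string; the dbr:/dbo: prefixes follow DBpedia conventions, and the choice of predicate (dbo:birthPlace) is only an illustrative assumption:

import re

def to_sparql(question):
    """Tiny pattern-based 'semantic parser' that handles one question shape only."""
    match = re.match(r"where was (.+) born\??$", question.lower().strip())
    if match is None:
        return None
    entity = match.group(1).title().replace(" ", "_")
    # dbo:birthPlace is an assumed, illustrative predicate choice.
    return "SELECT ?place WHERE { dbr:%s dbo:birthPlace ?place }" % entity

print(to_sparql("Where was Marie Curie born?"))
# SELECT ?place WHERE { dbr:Marie_Curie dbo:birthPlace ?place }

A real knowledge-based QA system would of course learn such mappings rather than hard-code one pattern per question shape.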
CHALLENGES OF QUESTION ANSWERING:

I) Lexical Gap:

1. In a natural language, the same meaning can be expressed in different ways.
2. Because a question can usually only be answered if every referred concept is identified, bridging this gap significantly increases the proportion of questions that can be answered by a system.

II) Ambiguity:

1. It is the phenomenon of the same phrase having different meanings; this can be structural and syntactic (like "flying planes") or lexical and semantic (like "bank").
2. It covers both homonymy, where the same string accidentally refers to different concepts (as in money bank vs. river bank), and polysemy, where the same string refers to different but related concepts (as in bank as a company vs. bank as a building).

III) Multilingualism:

1. Knowledge on the Web is expressed in various languages.
2. While RDF resources can be described in multiple languages at once using language tags, there is not a single language that is always used in Web documents.
3. Additionally, users have different native languages. A QA system is expected to recognize a
language and get the results on the go!

Q7. Write short notes on Categorization

Ans:

CATEGORIZATION:

1. Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups.
2. By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of pre-defined tags or categories based on its content.
3. Text classification is the process of labeling or organizing text data into groups.
4. It forms a fundamental part of Natural Language Processing.
5. In the digital age that we live in, we are surrounded by text on our social media accounts, in commercials, on websites, e-books, etc.
6. The majority of this text data is unstructured, so classifying this data can be extremely useful.
7. Sentiment analysis is an important application of text classification.



Approaches:

Text classification can be achieved through the following approaches:
I) Rule-based approaches:

1. These approaches make use of handcrafted linguistic rules.
2. One way to group text is to create a list of words related to a certain category and classify the text based on the occurrences of these words.
3. For example, words like "fur", "feathers", "claws", and "scales" could help classify text as talking about animals online.
4. These approaches require a lot of domain knowledge to be extensive, take a long time to build, and are difficult to scale.
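A small sketch of such a keyword-rule classifier; the category labels and keyword lists are made up for illustration:

RULES = {
    "animals": {"fur", "feathers", "claws", "scales"},      # hand-crafted keyword lists
    "finance": {"bank", "loan", "interest", "stock"},       # (labels and words are made up)
}

def rule_based_classify(text):
    """Assign the category whose keyword list overlaps the text the most."""
    tokens = set(text.lower().split())
    scores = {label: len(tokens & keywords) for label, keywords in RULES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(rule_based_classify("The bird ruffled its feathers and bared its claws"))   # animals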

II) Machine learning approaches:

1. We can use machine learning to train models on large sets of text data to predict categories of new text.
2. To train models, we need to transform text data into numerical data - this is known as feature extraction.
3. Important feature extraction techniques include bag of words and n-grams.
4. There are several useful machine learning algorithms we can use for text classification.
5. The most popular ones are:
a. Naive Bayes classifiers
b. Support vector machines
c. Deep learning algorithms
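A minimal scikit-learn sketch combining bag-of-words/n-gram feature extraction with a Naive Bayes classifier; it assumes scikit-learn is installed and uses a tiny made-up training set:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled data; a real classifier needs far more training text.
texts = ["great product, works well", "terrible, broke in a day",
         "love it, highly recommend", "awful quality, do not buy"]
labels = ["positive", "negative", "positive", "negative"]

# Bag-of-words + n-gram feature extraction feeding a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["works great, highly recommend"]))   # expected: ['positive']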

III) Hybrid approaches:

1. These approaches are a combination of the two approaches above.
2. They make use of both rule-based and machine learning techniques to model a classifier that can be fine-tuned in certain scenarios.

Applications:

Text Classification has a wide array of applications. Some popular uses are:

1. Spam detection in emails.


2. Sentiment analysis of online reviews.
3. Topic labeling documents like research papers.
4. Language detection like in Google Translate.
5. Age/gender identification of anonymous users.
6. Tagging online content.

7. Speech recognition used in virtual assistants like Siri and Alexa.

Q8. Write short notes on Summarization

Ans:

SUMMARIZATION:

1. A summary is a reductive transformation of a source text into a summary text through extraction or generation.
2. The goal of summarization is to produce a shorter version of a source text by preserving the meaning and the key contents of the original document.
3. A well-written summary can significantly reduce the amount of work needed to digest large amounts of text.

Types of Text Summarization:


I) Extraction-based Summarization:

1. The extractive text summarising approach entails extracting essential words from a source material and combining them to create a summary.
2. Without making any modifications to the texts, the extraction is done according to the given measure.
3. This approach works by detecting key chunks of the text, cutting them out, then stitching them back together to create a shortened form.
4. As a result, such summarizers rely only on phrase extraction from the original text.
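A tiny frequency-based sketch of extractive summarization; the scoring heuristic (word-frequency sums) and the example text are illustrative assumptions, not the only way to rank sentences:

import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    """Score each sentence by the frequency of its words and keep the top ones verbatim."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower())),
                    reverse=True)
    return " ".join(ranked[:n_sentences])

text = ("Text summarization shortens a document. Summarization keeps the key ideas. "
        "The weather was pleasant yesterday.")
print(extractive_summary(text))   # returns the most 'central' sentence unchanged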

II) Abstractive Summarization:

1. Another way of text summarization is abstractive summarization.


2. We create new sentencesfrom the original content in this step.
3. This is in contrast to our previous extractive technique, in which we only utilized the phrases
that were present.
4. It's possible that the phrases formed by abstractive summarization aren't present in the
original text.

5. When abstraction is used for text summarization in deep learning issues, it can overcome the
extractive method's grammatical errors.
6. Abstraction is more efficient than extraction.

7. The text summarising algorithms necessaryfor abstraction, on the other hand, are more complex
to build, which is why extraction is still widely used.

Ill) Domain-Specific:

l. In domain-specific summarization, domain knowledge is applied.


2. Specific context, knowledge, and language can be merged using domain-specific summarizers.
3. For example, models can be combined with the terminology used in medical science so that they can better grasp and summarise scientific texts.


IV) Query-based:

1. Query-based summaries are primarily concerned with natural language questions.
2. This is similar to the search results on Google.
3. When we type questions into Google's search field, it occasionally returns websites or articles that answer our questions.
4. It displays a snippet or summary of an article that is relevant to the query we entered.
V) Generic:

1. Generic summarizers, unlike domain-specific or query-based summarizers, are not programmed to make any assumptions.
2. The content from the source document is simply condensed or summarised.
Q. Write short notes on Sentiment Analysis

Ans:

SENTIMENT ANALYSIS:

1. It is also known as Opinion Mining.

2. Sentiment analysis is the process of detecting positive or negative sentiment in text.


3. Sentiment analysis is an approach to natural language processing (NLP) that identifies the emotional tone behind a body of text.
4. This is a popular way for organizations to determine and categorize opinions about a product,
service, or idea.
5. It involves the use of data mining, machine learning (ML) and artificial intelligence (AI) to mine text for sentiment and subjective information.
6. Since customers express their thoughts and feelings more openly than ever before, sentiment
analysis is becoming an essential tool to monitor and understand that sentiment.
7. Automatically analysing customer feedback, such as opinions in survey responses and social media
conversations, allows brands to learn what makes customers happy or frustrated, so that they can
tailor products and services to meet their customers' needs.
8. For example, using sentiment analysis to automatically analyse 4,000+ reviews about your product
could help you discover if customers are happy about your pricing plans and customer service.
9. Sentiment analysis algorithms fall into one of three buckets:
a. Rule based: These systems automatically perform sentiment analysis based on the set of
manually crafted rules.
b. Automatic: These systems rely on machine learning techniques to learn from data.
c. Hybrid: These systems combine both rule based and automatic approaches.
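A minimal sketch of the rule-based (lexicon) bucket above; the positive/negative word lists are tiny and purely illustrative:

POSITIVE = {"good", "great", "happy", "love", "excellent"}   # tiny illustrative lexicon
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}

def sentiment(text):
    """Count positive and negative lexicon hits and compare the totals."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this product and the battery is great"))   # positive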

Types of Sentiment Analysis:

1. Fine-grained sentiment analysis provides a more precise level of polarity by breaking it down into further categories, usually very positive to very negative. This can be considered the opinion equivalent of ratings on a 5-star scale.
2. Emotion detection identifies specific emotions rather than positivity and negativity. Examples could include happiness, frustration, shock, anger and sadness.
3. Intent-based analysis recognizes actions behind a text in addition to opinion. For example, an online comment expressing frustration about changing a battery could prompt customer service to reach out to resolve that specific issue.
4. Aspect-based analysis gathers the specific component being positively or negatively mentioned. For example, a customer might leave a review on a product saying the battery life was too short. Then, the system will return that the negative sentiment is not about the product as a whole, but about the battery life.

Q9. Write short notes on Named Entity Recognition

Ans:

NAMED ENTITYRECOGNITION:
1. Named entity recognition (NER) is also called entity identification or entity extraction.
2. It is a natural language processing (NLP) technique that automatically identifies named entities
in a text and classifies them into predefined categories.
3. Entities can be names of people, organizations, locations, times, quantities, monetary values,
percentages, and more.
4. Example:

Ousted founder llsts his - penthouse for


[organization] [person] [location] [monetary value)

5. With named entity recognition, you can extract key information to understand what a text is
about, or merely use it to collect important information to store in a database.
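A short sketch using spaCy's pre-trained English pipeline, which includes an NER component; it assumes spaCy and the en_core_web_sm model are installed, and the sentence is made up:

import spacy   # assumes spaCy and its small English model are installed

nlp = spacy.load("en_core_web_sm")           # pre-trained pipeline with an NER component
doc = nlp("Sundar Pichai opened a new Google office in London worth $1 billion.")
for ent in doc.ents:
    print(ent.text, "->", ent.label_)        # e.g. PERSON, ORG, GPE, MONEY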

Types of NER:

I) Entity Name Types (ENAMEX): PERSON, LOCATION, ORGANIZATION, FACILITIES, ARTIFACTS.

II) Numerical Expressions (NUMEX): DISTANCE, QUANTITY, MONEY, etc.

III) Time Expressions (TIMEX): TIME, MONTH, PERIOD, YEAR, DAY, DATE.

Challenges of NER:

I) Ambiguity:
1. Ambiguity between comrhon and proper nouns.
2 Example: common words such as "Roja" meaning Rose flower is a name of a person.

II) Spell variations:


One of the major challenges in web data is that different people spell the same entity differently.

Ill) Less Resources:


1. Most of the Indian languages are less-resourced languages.
2. Either there are no automated tools available to perform the pre-processing tasks required for NER, such as part-of-speech tagging and chunking.
3. Or, for languages where such tools are available, they have lower performance.

IV) Lack of easy availability of annotated data:


1. There are isolated efforts in the development of NER systems for Indian languages.
2. There is no easy availability of, and access to, NE annotated corpora in the community.

V) Morphologically rich:
1. Most of the Indian languages are morphologically rich and agglutinative.
2. There will be a lot of variations in word forms, which makes machine learning difficult.

VI) No Capitalization feature:


1. In English, capitalization is one of the main features, whereas it is not present in Indian languages.
2. Machine learning algorithms have to identify different features.

