NLP Toppers Solution
Module 1: Introduction (Pg. 05)
History of NLP, Generic NLP system, levels of NLP, Knowledge in language processing, Ambiguity in Natural language, stages in NLP, challenges of NLP, Applications of NLP.

Module 2: Word Level Analysis (Pg. 16)
Morphology analysis - survey of English Morphology, inflectional morphology & derivational morphology, Lemmatization, Regular expressions, finite automata, finite state transducers (FST), Morphological parsing with FST, Lexicon FST, Porter stemmer. N-Grams - N-gram language model, N-gram for spelling correction.

Module 3: Syntax Analysis (Pg. 26)
Part-Of-Speech tagging (POS) - Tag set for English (Penn Treebank), Rule based POS tagging, Stochastic POS tagging, Issues - Multiple tags & words, Unknown words. Introduction to CFG, Sequence labeling: Hidden Markov Model (HMM), Maximum Entropy, and Conditional Random Field (CRF).

Module 4: Semantic Analysis (Pg. 37)
Lexical Semantics, Attachment for fragment of English - sentences, noun phrases, verb phrases, prepositional phrases, Relations among lexemes & their senses - Homonymy, Polysemy, Synonymy, Hyponymy, WordNet, Robust Word Sense Disambiguation (WSD), Dictionary based approach.

Module 5: Pragmatics (Pg. 53)
Discourse - reference resolution, reference phenomenon, syntactic & semantic constraints on coreference.

Module 6: Applications (preferably for Indian regional languages) (Pg. 57)
Machine translation, Information retrieval, Question answers system, categorization, summarization, sentiment analysis, Named Entity Recognition.
Note: We have tried to cover almost every important question listed in the syllabus. If you feel any other question is important and it is not covered in this solution, then do mail the question to Support@BackkBenchers.com or WhatsApp us on +91-9930038388 / +91-7507531198.
CHAP - 1: INTRODUCTION
Q1. What is NLP & Describe Applications of NLP.
Ans:
ADVANTAGES OF NLP:
1.  NLP helps users to ask questions about any subject and get a direct response within seconds.
2.  NLP offers exact answers to the question, i.e. it does not offer unnecessary and unwanted information.
3.  NLP helps computers to communicate with humans in their languages.
4.  It is very time efficient.
5.  Most of the companies use NLP to improve the efficiency of documentation processes, the accuracy of documentation, and to identify information from large databases.
DISADVANTAGES OF NLP:
APPLICATIONS:
I)         Machine Translation:
1.  Machine translation is used to translate text or speech from one natural language to another natural language.
2.  For performing the translation, it is important to have knowledge of the words and phrases, the grammar of the two languages that are involved in the translation, the semantics of the languages and knowledge of the world.
3.       Example: Google Translator
V)       Speech Recognition:
1.  Speech recognition is used for converting spoken words into text.
2.  Speech recognition is a technology that enables the computer to convert voice input data to a machine readable format.
3.  There are a lot of fields where speech recognition is used, like virtual assistants, adding speech-to-text, translating speech, sending emails etc.
4.  It is used in search engines where the user can voice out the name of their search requirements and get the desired result, making our work easier than typing out the entire command.
    5.    Example: Hey Siri, Ok Google.
2.  In information retrieval, indexing, query modification, word sense disambiguation, and knowledge bases are used for enhancing the performance.
3.  For example, WordNet and the Longman Dictionary of Contemporary English (LDOCE) are some useful lexical resources for information retrieval research.
VIII) Text Summarisation:
1.  There is a huge amount of data available on the internet and it is very hard to go through all the data to extract a single piece of information.
2.  With the help of NLP, text summarization has been made available to the users.
3.  This helps in the simplification of huge amounts of data in articles, news, research papers etc.
4.  This application is used in Investigative Discovery to identify patterns in written reports, Social Media Analytics to track awareness and identify influencers, and Subject-matter expertise to classify content into meaningful topics.
IX) Recruitment:
1.  In this competitive world, big and small companies are on the receiving end of thousands of resumes from different candidates.
2.  It has become a tough job for the HR team to go through all the resumes and select the best candidate for one single position.
3.  NLP has made the job easier by filtering through all the resumes and shortlisting the candidates by different techniques like information extraction and named entity recognition.
4.  It goes through different attributes like location, skills, education etc. and selects candidates who closely meet the requirements of the company.
X)         Chatbots:
1.  Chatbots are programs that are designed to assist a user 24/7, respond appropriately and answer any query that the user might have.
2.  Implementing a chatbot is one of the important applications of NLP.
3.  It is used by many companies to provide customer chat services.
4.         Example: Facebook Messenger
NLP:
NLP deals with the interaction between computers and human (natural) languages.
LEVELS OF NLP:
Figure 1.1: Levels of NLP - Phonology Level, Morphological Level, Lexical Level, Syntactic Level, Semantic Level, Discourse Level, Pragmatic Level.
I) Phonology Level:
1.  The phonology level basically deals with pronunciation.
2.  Phonology identifies and interprets the sounds that make up words when the machine has to understand the spoken language.
3.  It deals with the physical building blocks of the language's sound system.
4.  Example: Bank (finance) v/s Bank (river).
5.  In Hindi, aa-jayenge (will come) v/s aaj-ayenge (will come today).
III) Lexical Level:
1.  The lexical level deals with the lexical meaning of a word and its part of speech.
2.  It uses a lexicon, which is a collection of individual lexemes.
3.  A lexeme is a basic unit of lexical meaning; an abstract unit of morphological analysis that represents the set of forms taken by a single morpheme.
4.  For example, "Bank" can take the form of a noun or a verb, but its part of speech and lexical meaning can only be derived in context with other words used in the sentence or phrase.
     V) Semantic Level:
     l.         The semantic level of linguistic processing deals with the determination of what a sentence really
                means by relating syntactic features and disambiguating words with multiple definitions to the
                given context.
     2.         This level deals with the meaning of words and sentences.
 3.             There are two approaches of semantic level:
                a.         Syntax-Driven Semantic Analysis.
                b.         Semantic Grammar.
 4.             It is a study of the meaning of words that are associated with grammatical structure.
5.  For example, from the statement "Tony Kakkar inputs the data", we can understand that Tony Kakkar is an Agent.
            representation is derived.
      4.     Examples of Pragmatics: I heart you!
5.  Semantically, "heart" refers to an organ in our body that pumps blood and keeps us alive.
6.  However, pragmatically, "heart" in this sentence means "love" - hearts are commonly used as a symbol for love, and to "heart" someone has come to mean that you love someone.
Ans:
Refer Q1.
AMBIGUITY IN NLP:
I) Lexical Ambiguity:
1.  Lexical ambiguity is the ambiguity of a single word. For example, the word "bank" can refer to a financial institution or to the side of a river.
II) Syntactic Ambiguity:
1.  This type of ambiguity represents sentences that can be parsed in multiple syntactical forms.
2.  Take the following sentence: "I heard his cell phone ring in my office".
3.  The prepositional phrase "in my office" can be parsed in a way that modifies the noun, or in another way that modifies the verb.
V) Pragmatic Ambiguity:
1.  Such kind of ambiguity refers to the situation where the context of a phrase gives it multiple interpretations.
2.  In simple words, we can say that pragmatic ambiguity arises when the statement is not specific.
3.  For example, the sentence "I like you too" can have multiple interpretations like I like you (just like you like me), I like you (just like someone else does).
4.     It is the most difficult ambiguity.
STAGES OF NLP:
Figure 1.3 shows the stages of Natural Language Processing: Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Integration and Pragmatic Analysis.
5.  Dependency Grammar and Part of Speech (POS) tags are the important attributes of text syntax.
 V) Pragmatic Analysis:
 1.        Pragmatic is the fifth and last phase of NLP.
2.  It helps you to discover the intended effect by applying a set of rules that characterize cooperative dialogues.
3.  During this, what was said is re-interpreted as what it truly meant.
4.  It involves deriving those aspects of language which necessitate real world knowledge.
 5.        For example: "Open the book" is interpreted as a request instead of an order.
Ans:
Refer Ql.
CHALLENGES IN NLP:
 NLP is a powerful tool with huge benefits, but there are still a number of Natural Language Processing
limitations and problems:
IV) Ambiguity:
1.  Ambiguity in NLP refers to sentences and phrases that potentially have two or more interpretations.
2.  There are Lexical, Semantic & Syntactic ambiguity.
3.  Even for humans, the sentence alone is difficult to interpret without the context of the surrounding text.
4.  POS (part of speech) tagging is one NLP solution that can help solve the problem somewhat.
3.         With spoken language, mispronunciations, different accents, stutters, etc., can be difficult for
           a machine to understand.
4.  However, as language databases grow and smart assistants are trained by their individual users, these issues can be minimized.
CHAP - 2: WORD LEVEL ANALYSIS

Q1. Describe types of word formation.
Ans:
II) Derivation:
1.  Morphological derivation is the process of forming a new word from an existing word, often by adding a prefix or suffix such as un- or -ness (for example, unhappy and happiness derive from the root word happy).
III) Compounding:
1.  Compound words are formed when two or more lexemes combine into a single new word.
2.  Compound words may be written as one word or as two words joined with a hyphen.
3.  For example:
    a.  noun-noun compound: note + book → notebook
    b.  adjective-noun compound: blue + berry → blueberry
    c.  verb-noun compound: work + room → workroom
    d.  noun-verb compound: breast + feed → breastfeed
    e.  verb-verb compound: stir + fry → stir-fry
    f.  adjective-verb compound: high + light → highlight
    g.  verb-preposition compound: break + up → breakup
    h.  preposition-verb compound: out + run → outrun
    i.  adjective-adjective compound: bitter + sweet → bittersweet
    j.  preposition-preposition compound: in + to → into
Ans:
FINITE AUTOMATA:
1.  An automaton having a finite number of states is called a Finite Automaton (FA) or Finite State Automaton (FSA).
2.  Mathematically, an FA can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
    a.  Q is a finite set of states.
    b.  Σ is a finite set of symbols, called the alphabet of the automaton.
    c.  δ is the transition function.
    d.  q0 is the initial state from where any input is processed (q0 ∈ Q).
    e.  F is a set of final state/states of Q (F ⊆ Q).
TYPES OF FINITE STATE AUTOMATA (FSA):
I) Deterministic Finite Automaton (DFA):
1.  It may be defined as the type of finite automaton where for every input symbol we can determine the state to which the machine will move.
2.  It has a finite number of states, that is why the machine is called a Deterministic Finite Automaton (DFA).
3.  Deterministic refers to the uniqueness of the computation.
4.  In a DFA, there is only one path for a specific input from the current state to the next state.
5.  A DFA does not accept the null move, i.e., the DFA cannot change state without any input character.
6.  A DFA can contain multiple final states.
7.  It is used in Lexical Analysis.
8.  Mathematically, a DFA can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
    a.  Q is a finite set of states.
    b.  Σ is a finite set of symbols, called the alphabet of the automaton.
    c.  δ is the transition function, where δ: Q × Σ → Q.
    d.  q0 is the initial state from where any input is processed (q0 ∈ Q).
    e.  F is a set of final state/states of Q (F ⊆ Q).
Example: Design a DFA with Σ = {0, 1} that accepts those strings which start with 1 and end with 0.
The FA will have a start state q0 from which only the edge with input 1 will go to the next state q1. In state q1, if we read 1, we will remain in state q1, but if we read 0 at state q1, we will reach state q2, which is the final state. In state q2, if we read 0, we will remain in state q2, but if we read 1, we will go back to state q1. Note that if the input ends with 0, the machine will be in the final state.
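The DFA above can be simulated with a few lines of code. The following is an illustrative sketch (not part of the original solution); the transition table encodes the q0/q1/q2 machine described above.

TRANSITIONS = {
    ("q0", "1"): "q1",   # strings must start with 1
    ("q1", "1"): "q1",
    ("q1", "0"): "q2",   # reading 0 reaches the final state
    ("q2", "0"): "q2",   # further 0s keep us in the final state
    ("q2", "1"): "q1",
}
FINAL_STATES = {"q2"}

def dfa_accepts(string: str) -> bool:
    """Return True if the DFA accepts the string (starts with 1 and ends with 0)."""
    state = "q0"
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False              # no transition defined: reject
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL_STATES

print(dfa_accepts("10"))     # True
print(dfa_accepts("1010"))   # True
print(dfa_accepts("0110"))   # False (does not start with 1)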
II) Non-deterministic Finite Automaton (NFA):
1.  It may be defined as the type of finite automaton where for every input symbol we cannot determine the state to which the machine will move, i.e. the machine can move to any combination of the states.
2.  It has a finite number of states, that is why the machine is called a Non-deterministic Finite Automaton (NFA).
3.  Every NFA is not a DFA, but each NFA can be translated into an equivalent DFA.
4.  Mathematically, an NFA can be represented by a 5-tuple (Q, Σ, δ, q0, F), where:
    a.  Q is a finite set of states.
    b.  Σ is a finite set of symbols, called the alphabet of the automaton.
    c.  δ is the transition function, where δ: Q × Σ → 2^Q.
    d.  q0 is the initial state from where any input is processed (q0 ∈ Q).
    e.  F is a set of final state/states of Q (F ⊆ Q).
5.  Graphically (same as a DFA), an NFA can be represented by digraphs called state diagrams.
Example: Design an NFA with Σ = {0, 1} that accepts all strings ending with 01.
(State diagram: q0 loops on 0 and 1; q0 on 0 also goes to q1; q1 on 1 goes to the final state q2.)
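As a rough illustration (again not from the original text), the same NFA can be simulated by tracking the set of states it could be in; the transition table below assumes the standard three-state construction for strings ending with 01.

NFA_TRANSITIONS = {
    ("q0", "0"): {"q0", "q1"},   # non-deterministic choice on 0
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START, FINAL = "q0", {"q2"}

def nfa_accepts(string: str) -> bool:
    """Track the set of states the NFA could be in after each input symbol."""
    current = {START}
    for symbol in string:
        current = set().union(*(NFA_TRANSITIONS.get((s, symbol), set()) for s in current))
        if not current:
            return False
    return bool(current & FINAL)

print(nfa_accepts("1101"))   # True  (ends with 01)
print(nfa_accepts("0110"))   # False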
I) Union:
1.  The union of two regular relations is also a regular relation.
2.  If T1 and T2 are two FSTs, there exists an FST T1 ∪ T2 such that [T1 ∪ T2] = [T1] ∪ [T2].
II) Inversion:
1.  The inversion of an FST simply switches the input and output labels.
2.  This means that the same FST can be used for both directions of a morphological processor (analysis and generation).
3.  If T = (Σ1, Σ2, Q, i, F, E) is an FST, there exists an FST T⁻¹ such that [T⁻¹](u) = { v ∈ Σ1* | u ∈ [T](v) }.
III) Composition:
1.  If T1 is an FST from I1 to O1 and T2 is an FST from O1 to O2, then the composition of T1 and T2 (T1 ∘ T2) maps from I1 to O2.
2.  So the transducer function is: (T1 ∘ T2)(x) = T1(T2(x)).
Ans:
N-GRAM MODEL:
l.     N-gram can be defined as the contiguous sequence of 'n' items from a given sample of text or
       speech.
2.     The items can be letters, words, or base pairs according to the application.
3.     The N-grams typically are collected from a text or speech corpus.
4.     Consider the following example: "I love reading books about Machine Learning on BackkBenchers
       Community"
5.     A 1-gram/unigram is a one-word sequence. For the given sentence, the unigrams would be: "I",
       "love", "reading", "books", "about", "Machine", "Learning", "on", "BackkBenchers", "Community"
 6.    A 2-gram/bigram         is a two-word sequence of words, such as "I love", "love reading" or
       "BackkBenchers Community".
7.          A 3-gram/trigram     is a three-word sequence of words like "I love reading",                           "about Machine
        Learning" or "on BackkBenchers Community"
8.  An N-gram language model predicts the probability of a given N-gram within any sequence of words in the language.
9.  A good N-gram model can predict the next word in the sentence, i.e. the value of P(w|h) - what is the probability of seeing the word w given a history of previous words h, where the history contains n−1 words.
10. Let's consider the example: "This article is on Sofia". We want to calculate the probability of the last word being "Sofia" given the previous words:
        P(Sofia | This article is on)
11. After generalizing, the above equation can be written as:
        P(w5 | w1, w2, w3, w4) or P(W) = P(wn | w1, w2, ..., wn−1)
12. But how do we calculate it? The answer lies in the chain rule of probability:
        P(A | B) = P(A, B) / P(B)
        P(A, B) = P(A | B) P(B)
13. Now generalize the above equation:
        P(X1, X2, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) .... P(Xn | X1, X2, ..., Xn−1)
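As an illustrative sketch (not part of the original answer), the bigram (2-gram) case of this model can be estimated directly from counts, with P(w | h) = count(h, w) / count(h); the toy corpus below is made up for the example.

from collections import Counter

def bigram_model(sentences):
    """Estimate P(w | h) = count(h, w) / count(h) from a toy corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens[:-1])                  # histories
        bigrams.update(zip(tokens, tokens[1:]))
    return lambda w, h: bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

p = bigram_model(["I love reading books", "I love Machine Learning"])

# Probability of a sentence under the bigram (Markov) approximation of the chain rule.
tokens = ["<s>", "i", "love", "reading", "books", "</s>"]
probability = 1.0
for h, w in zip(tokens, tokens[1:]):
    probability *= p(w, h)
print(probability)   # 0.5 here: P(reading | love) = 1/2, all other factors = 1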
Ans:
Training set:
1.  The Arabian Nights
 2.      These are the fairy tales of the east
 3.      The stories of the Arabian Nights are translated in many languages.
Bi-gram Model:
P(the | <s>) = 0.67
Test Sentence(s):
The Arabian Nights are the fairy tales of the east
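The value P(the | <s>) = 0.67 can be verified with a short count-based sketch (our own illustration, not part of the printed solution): two of the three training sentences begin with "the", so count(<s>, the) / count(<s>) = 2/3 ≈ 0.67.

from collections import Counter

training = [
    "the arabian nights",
    "these are the fairy tales of the east",
    "the stories of the arabian nights are translated in many languages",
]

unigrams, bigrams = Counter(), Counter()
for sentence in training:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens[:-1])                 # count histories only
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, history):
    return bigrams[(history, word)] / unigrams[history]

print(round(p("the", "<s>"), 2))        # 0.67 -> two of the three sentences start with "the"
print(round(p("arabian", "the"), 2))    # 0.4  -> "the arabian" occurs 2 times out of 5 "the"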
Ans:
Step 1:
If the word has more than one syllable and ends with 'ing', remove 'ing' and apply the second step.
Step 2:
If the word finishes with a double consonant (except l, s, z), transform it into a single letter.
Example:
falling → fall
attaching → attach
sing → sing
hopping → hop
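The two rules above can be turned into a small function. This is only an illustrative sketch (the real Porter stemmer has many more steps), and the syllable counter is a rough assumption based on counting vowel groups.

import re

def syllable_count(word: str) -> int:
    # Rough assumption: count groups of vowels as syllables.
    return len(re.findall(r"[aeiou]+", word))

def simple_stem(word: str) -> str:
    """Toy version of the two rules above (the full Porter stemmer has many more)."""
    # Step 1: more than one syllable and ends with 'ing' -> remove 'ing'.
    if word.endswith("ing") and syllable_count(word) > 1:
        word = word[:-3]
        # Step 2: collapse a trailing double consonant (except l, s, z).
        if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeioulsz":
            word = word[:-1]
    return word

for w in ["falling", "attaching", "sing", "hopping"]:
    print(w, "->", simple_stem(w))   # fall, attach, sing, hop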
Ans:
1.  Morphological parsing means breaking down words into components and building a structured representation.
2.     The objective of the morphological parsing is to produce output lexicons for a single input lexicon.
3.  Example:
    a.  cats → cat +N +PL
    b.  caught → catch +V +Past
4.  The above example contains the stem of the corresponding word (lexicon) in the first column, along with its morphological features like +N means the word is a noun, +SG means it is singular, +PL means it is plural, +V for verb.
5.    There can be more than one lexical level representation for a given word.
6.  Two-level morphology represents a word as a correspondence between a lexical level, which represents a simple concatenation of morphemes making up a word, and the surface level, which represents the actual spelling of the final word.
7.  Morphological parsing is implemented by building mapping rules that map letter sequences like cats on the surface level into morpheme and feature sequences like cat +N +PL on the lexical level.
8.     Figure 2.1 shows these two levels for the word cats.
Figure 2.1: The lexical and surface levels for the word cats.
9. The automaton that we use for performing the mapping between these two levels is the finite-
      state transducer or FST.
10. A transducer maps between one set of symbols and another; a finite-state transducer does this via
      a finite automaton.
11.   Thus we usually visualize an FST as a two-tape automaton which recognizes or generates pairs of
      strings.
12. The FST thus has a more general function than an FSA; where an FSA defines a formal language by
      defining a set of strings, an FST defines a relation between sets of strings.
13. This relates to another view of an FST: as a machine that reads one string and generates another. Here's a summary of this four-fold way of thinking about transducers:
    a.  FST as recognizer: A transducer that takes a pair of strings as input and outputs accept if the string-pair is in the string-pair language, and reject if it is not.
    b.  FST as generator: A machine that outputs pairs of strings of the language. Thus the output is a yes or no, and a pair of output strings.
    c.  FST as translator: A machine that reads a string and outputs another string.
    d.  FST as set relater: A machine that computes relations between sets.
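As a simplified illustration of the surface-to-lexical mapping (our own sketch, using a hand-written lexicon rather than a compiled FST):

LEXICON = {
    "cats":   "cat +N +PL",
    "cat":    "cat +N +SG",
    "caught": "catch +V +Past",
}

def morph_parse(surface: str) -> str:
    """Return the lexical-level analysis of a surface form, or guess a regular plural."""
    if surface in LEXICON:
        return LEXICON[surface]
    if surface.endswith("s"):            # crude regular-plural rule
        return surface[:-1] + " +N +PL"
    return surface + " +N +SG"

print(morph_parse("cats"))    # cat +N +PL
print(morph_parse("dogs"))    # dog +N +PL (via the regular rule)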
CHAP - 3: SYNTAX ANALYSIS
Ans:
                                                VB    DT NN
                                                Book that flight .
                                                VBZ DT NN        VB   NN     ?
                                                Does that flight serve dinner ?
7.           Even in these simple examples, automatically assigning a tag to each word is not trivial.
8.           For example, book is ambiguous.
9.           That is, it has more than one possible usage and part of speech.
10. It can be a verb (as in book that flight or to book the suspect) or a noun (as in hand me that book,
             or a book of matches).
11. Similarly, that can be a determiner (as in Does that flight serve dinner), or a complementizer (as in I thought that your flight was earlier).
12. The problem of POS-tagging is to resolve these ambiguities, choosing the proper tag for the context.
13. Most of the POS tagging falls under Rule Based POS tagging, Stochastic POS tagging and Transformation based tagging.
METHODS:
I) Stochastic POS Tagging:
1.  This POS tagging is based on the probability of a tag occurring.
2.  There would be no probability for the words that do not exist in the corpus.
3.  It uses a different testing corpus (other than the training corpus).
4.  It is the simplest POS tagging because it chooses the most frequent tag associated with a word in the training corpus.
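A minimal sketch of the "most frequent tag" idea described above (the tiny training corpus here is made up for illustration):

from collections import Counter, defaultdict

# Made-up (word, tag) training data for illustration.
training = [("book", "VB"), ("book", "VB"), ("book", "NN"),
            ("that", "DT"), ("the", "DT"), ("flight", "NN"), ("flight", "NN")]

tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1

def most_frequent_tag(word: str) -> str:
    """Pick the tag seen most often with this word; unseen words get a default tag."""
    if word in tag_counts:
        return tag_counts[word].most_common(1)[0][0]
    return "NN"    # no probability exists for words absent from the corpus

print([(w, most_frequent_tag(w)) for w in ["book", "that", "flight"]])
# [('book', 'VB'), ('that', 'DT'), ('flight', 'NN')]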
Ans:
13. The symbols that are used in a CFG are divided into two classes.
14. The symbols that correspond to words in the language ('the', 'BackkBenchers') are called terminal symbols.
15. The symbols that express clusters or generalizations of these are called nonterminal symbols.
16. In each context-free rule, the item to the right of the arrow (→) is an ordered list of one or more terminals and nonterminals.
17. The item to the left of the arrow is a single nonterminal symbol expressing some cluster or generalization.
18.    A CFG is usually thought of in two ways: as a device for generating sentences, or as a device
       for assigning a structure to a given sentence.
Ans:
1.     There are a small number of popular tagsets for English, many of which evolved from the 87-tag
       tagset used for the Brown corpus.
2.  Three of the most commonly used are the small 45-tag Penn Treebank tagset, the medium-sized 61-tag C5 tagset, and the larger 146-tag C7 tagset.
3.  The Penn Treebank tagset has been applied to the Brown corpus and a number of other corpora.
4.  The Penn Treebank tagset was culled from the original 87-tag tagset for the Brown corpus.
5.     This reduced set leaves out information that can be recovered from the identity of the lexical item.
6.  For example, the original Brown tagset and other large tagsets like C5 include a separate tag for each of the different forms of the verbs do, be, and have.
7.     These were omitted from the Penn set.
8.     Certain   syntactic distinctions   were not marked in the Penn Treebank tagset because Treebank
       sentences were parsed, not merely tagged, and so some syntactic information is represented in
       the Phrase structure.
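For reference, NLTK's default tagger produces Penn Treebank tags. A short usage sketch is shown below; the exact resource names and output may vary with the NLTK version installed.

import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Does that flight serve dinner?")
print(nltk.pos_tag(tokens))
# e.g. [('Does', 'VBZ'), ('that', 'DT'), ('flight', 'NN'), ('serve', 'VB'), ('dinner', 'NN'), ('?', '.')]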
Q4. Explain Parsing.
Ans:
Note: We have explained the below answer in detail for clear understanding. While writing in the exam, cut it short as per your understanding.
PARSING:
1.    Parsing in NLP is the process of determining                                   the syntactic structure of a text by analysing its
      constituent   words based on an underlying grammar (of the language).
2.    In syntactic parsing, the parser can be viewed as searching through the space of all possible
      parse trees to find the correct parse tree for the sentence.
3.    Consider the example "Book that flight"
4.  Grammar:
        S → NP VP                 Det → that | this | a
        S → Aux NP VP             Noun → book | flight | meal | money
        S → VP                    Verb → book | include | prefer
Figure 3.1
5.  Parse Tree:
Figure 3.2: Parse tree for "Book that flight" - S → VP, VP → Verb NP, Verb → Book, NP → Det Noun, Det → that, Noun → flight.
TYPES OF PARSING:
I) Top-Down Parsing:
Figure 3.3: The expanding top-down search space - the first ply is S, the second ply expands S with each of the three S rules (S → NP VP, S → Aux NP VP, S → VP), and the third ply expands the left-most constituents of each resulting tree.
6.  In the grammar in Figure 3.1, there are three rules that expand S, so the second ply, or level, of the search space in Figure 3.3 has three partial trees.
7.  We next expand the constituents in these three new trees, just as we originally expanded S.
8.  The first tree tells us to expect an NP followed by a VP, the second expects an Aux followed by an NP and a VP, and the third a VP by itself.
9.  To fit the search space on the page, we have shown in the third ply of Figure 3.3 only the trees resulting from the expansion of the left-most leaves of each tree.
10. At each ply of the search space we use the right-hand sides of the rules to provide new sets of expectations for the parser, which are then used to recursively generate the rest of the trees.
11. Trees are grown downward until they eventually reach the part-of-speech categories at the bottom of the tree.
12. At this point, trees whose leaves fail to match all the words in the input can be rejected, leaving behind those trees that represent successful parses.
13. In Figure 3.3, only the 5th parse tree will eventually match the input sentence Book that flight.
 Problems with the Top-Down Parser:
Only judges grammaticality.
       Stops when it finds a single derivation.
       No semantic    knowledge employed.
       No way to rank the derivations.
       Problems with left-recursive rules.
       Problems with ungrammatical sentences.
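To make the top-down strategy concrete, the following is a compact recursive-descent sketch (our own simplification, not the book's algorithm) using the grammar of Figure 3.1; it yields the parse tree of Figure 3.2 for "book that flight".

GRAMMAR = {                                      # Figure 3.1, slightly simplified
    "S":    [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":   [["Det", "Noun"]],
    "VP":   [["Verb", "NP"], ["Verb"]],
    "Aux":  [["does"], ["do"]],
    "Det":  [["that"], ["this"], ["a"]],
    "Noun": [["book"], ["flight"], ["meal"], ["money"]],
    "Verb": [["book"], ["include"], ["prefer"]],
}

def parse(symbol, tokens, pos):
    """Top-down: try to derive `symbol` from tokens[pos:]; yield (tree, next_pos)."""
    if symbol not in GRAMMAR:                    # terminal symbol
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:                  # expand each rule, left to right
        def expand(i, p, children, rhs=rhs):
            if i == len(rhs):
                yield (symbol, children), p
            else:
                for child, p2 in parse(rhs[i], tokens, p):
                    yield from expand(i + 1, p2, children + [child], rhs)
        yield from expand(0, pos, [])

tokens = "book that flight".split()
for tree, end in parse("S", tokens, 0):
    if end == len(tokens):                       # the parse must cover all input words
        print(tree)
        break
# ('S', [('VP', [('Verb', ['book']), ('NP', [('Det', ['that']), ('Noun', ['flight'])])])])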
Figure 3.4: The expanding bottom-up search space for "Book that flight" - the words are assigned their possible parts of speech (book as Noun or Verb, that as Det, flight as Noun), and larger constituents (Nominal, NP, VP) are built upward from them.
II) Bottom-Up Parsing:
1.  Bottom-up parsing is data directed.
2.  Bottom-up parsing is the earliest known parsing algorithm, and it is used in the shift-reduce parsers common for computer languages.
3.  In bottom-up parsing, the parser starts with the words of the input, and tries to build trees from the words up, again by applying rules from the grammar one at a time.
4.  The parse is successful if the parser succeeds in building a tree rooted in the start symbol S that covers all of the input.
11. In the parse on the left (the one in which book is incorrectly considered a noun), the Nominal → Noun rule is applied to both of the Nouns (book and flight).
12. This same rule is also applied to the sole Noun (flight) on the right, producing the trees on the
    third ply.
13. In general, the parser extends one ply to the next by looking for places in the parse-in-progress where the right-hand side of some rule might fit.
14. This contrasts with the earlier top-down parser, which expanded trees by applying rules when their
           left-hand side matched an unexpanded nonterminal.
15. Thus in the fourth ply, in the first and third parse, the sequence Det Nominal is recognized as the right-hand side of the NP → Det Nominal rule.
16. In the fifth ply, the interpretation of book as a noun has been pruned from the search space.
17. This is because this parse cannot be continued: there is no rule in the grammar with the right-hand
           side Nominal NP.
18.        The final ply of the search space (not shown in Figure 3.4) is the correct parse tree (see Figure 3.2).
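For comparison, NLTK provides ready-made chart parsers; the sketch below (an illustration, with the grammar string mirroring Figure 3.1) finds the same parse bottom-up.

import nltk
from nltk.parse.chart import BottomUpChartParser

grammar = nltk.CFG.fromstring("""
S -> NP VP | Aux NP VP | VP
NP -> Det Noun
VP -> Verb NP | Verb
Det -> 'that' | 'this' | 'a'
Noun -> 'book' | 'flight' | 'meal' | 'money'
Verb -> 'book' | 'include' | 'prefer'
""")

for tree in BottomUpChartParser(grammar).parse("book that flight".split()):
    print(tree)
# (S (VP (Verb book) (NP (Det that) (Noun flight))))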
Q5. Describe Sequence Labeling.
Ans:
SEQUENCE LABELING:
5.  In question answering and search tasks, we can use these spans as entities to specify our query (e.g., "Play a movie by Tom Hanks"); we would like to label words such as: [Play, movie, Hanks].
6.  With these parts removed, we can use the verb "play" to specify the wanted action, the word "movie" to specify the intent of the action, and Tom Hanks as the single subject for our search.
7.  To do this, we need a way of labeling these words to later retrieve them for our query.
8.  A common example of a sequence labeling task is part of speech tagging, which seeks to assign a part of speech to each word in an input sentence or document.
9.  There are two forms of sequence labeling:
    a.  Token Labeling: Each token gets an individual Part of Speech (POS) label, and
    b.  Span Labeling: Labeling segments or groups of words that contain one tag (Named Entity Recognition, Syntactic Chunks).
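A tiny illustration (our own example data) of the two forms, with one POS tag per token and BIO-style span labels marking a PERSON entity:

sentence   = ["Play", "a", "movie", "by", "Tom", "Hanks"]

# Token labeling: one Part of Speech tag per token.
pos_labels = ["VB", "DT", "NN", "IN", "NNP", "NNP"]

# Span labeling: BIO tags mark multi-word spans (here a PERSON named entity).
bio_labels = ["O", "O", "O", "O", "B-PER", "I-PER"]

for token, pos, bio in zip(sentence, pos_labels, bio_labels):
    print(f"{token:6s} {pos:4s} {bio}")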
Ans:
4.  For example, in the context of POS tagging, the objective would be to build an HMM to model P(word | tag) and compute the label probabilities given the observations using Bayes' Rule.
5.        HMM graphs consist of a Hidden Space and Observed Space, where the hidden space consists of
          the labels and the observed space is the input.
6.  These spaces are connected via transition matrices (T, A) to represent the probability of transitioning from one state to another following their connections.
7.  Each connection represents a distribution over possible options; given our tags, this results in a large search space of the probability of all words given the tag.
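A condensed Viterbi sketch (with made-up toy probabilities, not from the book) showing how an HMM chooses the best tag sequence for a short sentence:

# Toy HMM for tags {DT, NN, VB}; all probabilities are made up for illustration.
states  = ["DT", "NN", "VB"]
start_p = {"DT": 0.5, "NN": 0.3, "VB": 0.2}
trans_p = {"DT": {"DT": 0.1, "NN": 0.8, "VB": 0.1},
           "NN": {"DT": 0.2, "NN": 0.3, "VB": 0.5},
           "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1}}
emit_p  = {"DT": {"that": 0.9},
           "NN": {"book": 0.4, "flight": 0.6},
           "VB": {"book": 0.7, "flight": 0.3}}

def viterbi(words):
    """Return the most probable hidden tag sequence for the observed words."""
    layer = {s: (start_p[s] * emit_p[s].get(words[0], 0.0), [s]) for s in states}
    for word in words[1:]:
        new_layer = {}
        for s in states:
            new_layer[s] = max(((layer[prev][0] * trans_p[prev][s] * emit_p[s].get(word, 0.0),
                                 layer[prev][1] + [s]) for prev in states),
                               key=lambda item: item[0])
        layer = new_layer
    return max(layer.values(), key=lambda item: item[0])[1]

print(viterbi(["book", "that", "flight"]))   # ['VB', 'DT', 'NN']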
Ans:
 1.         Maximum Entropy Markov Models (MEMMs) also have a well-known issue known as label bias.
 2.        The label bias problem was introduced due to MEMMs applying local normalization.
 3.        This often leads to the model getting stuck in local minima during decoding.
 4.        The local minima trap occurs because the overall model favors nodes with the least amount of
           transitions.
5.         To solve this, Conditional Random Fields (CRFs) normalize globally and introduce an undirected
           graphical structure.
6.         The conditional random field (CRF) is a conditional probabilistic model for sequence labeling; just
           as structured perceptron is built on the perceptron classifier, conditional random fields are built on
           the logistic regression classifier.
7.  Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting sequential data, based on the conditional approach described for hidden Markov models.
8.         A CRF is a form of undirected graphical model that defines a single log-linear distribution over label
           sequences given a particular observation sequence.
9.         The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in
           the relaxation of the independence assumptions required by HMMs in order to ensure tractable
           inference.
10. CRFs outperform both MEMMs and HMMs on a number of real-world sequence labeling tasks.
11. Figure 3.5 shows the graphical structure of CRFs.
Figure 3.5: Graphical structures of simple HMMs (left), MEMMs (center), and the chain-structured case of CRFs (right) for sequence labeling. An open circle indicates that the variable is not generated by the model.
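In practice, linear-chain CRFs for tagging are commonly trained with the third-party sklearn-crfsuite package; the following is a minimal sketch - the feature template and toy data are our own, and the package must be installed separately.

import sklearn_crfsuite

def token_features(sentence, i):
    """A deliberately tiny feature template; real taggers use far richer features."""
    word = sentence[i]
    return {"word.lower": word.lower(),
            "is_title": word.istitle(),
            "prev_word": sentence[i - 1].lower() if i > 0 else "<s>"}

# Toy training data: a single tagged sentence.
sentences = [["Book", "that", "flight"]]
labels    = [["VB", "DT", "NN"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
y = labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))    # [['VB', 'DT', 'NN']] on the training sentence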
CHAP - 4: SEMANTIC ANALYSIS

Q1. Write short notes on lexical semantics.
Ans:
1.  Lexical semantics is the study of word meaning.
2.  Lexical semantics plays a crucial role in semantic analysis, allowing computers to understand relationships between words, phrasal verbs, etc.
2.  Synonymy:
    •  Synonymy refers to words that are pronounced and spelled differently but contain the same meaning.
    •  Example: Happy, joyful, glad
3.  Antonymy:
    •  Antonymy refers to words that are related by having opposite meanings to each other.
    •  There are three types of antonyms: graded antonyms, complementary antonyms, and relational antonyms.
    •  Example: dead/alive, long/short
4.  Homonymy:
    •  Homonymy refers to the relationship between words that are spelled or pronounced the same way but hold different meanings.
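These lexical relations can be explored programmatically through NLTK's WordNet interface; a quick sketch (our own, requiring the WordNet corpus to be downloaded):

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

# Synonymy: lemmas that share the first synset of "happy".
print([lemma.name() for lemma in wn.synsets("happy")[0].lemmas()])

# Antonymy: the first recorded antonym pair for "good".
antonyms = [(l.name(), a.name())
            for s in wn.synsets("good") for l in s.lemmas() for a in l.antonyms()]
print(antonyms[:1])

# Hyponymy: a few more specific concepts under the first synset of "dog".
print([h.name() for h in wn.synsets("dog")[0].hyponyms()[:3]])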
Ans:
Note: For better understanding we have explained the below answer in detail. Kindly cut it short as per your understanding while attempting in the exam.
SEMANTIC ATTACHMENT:
I) Sentences:
2.  The meaning representations of these examples all contain propositions concerning the serving of lunch on flights.
3.  However, they differ with respect to the role that these propositions are intended to serve in the settings in which they occur.
4.  More specifically, the first example is intended to convey factual information to a hearer, the second is a request for an action, and the last two are requests for information.
        S → VP    {IMP(VP.sem(DummyYou))}
Applying this rule to the example produces the following representation:
1.  Compound nominals, also known as noun-noun sequences, consist of simple sequences of nouns, as in the following examples:
    a.  Flight schedule
    b.  Summer flight schedule
2.  The syntactic structure of this construction can be captured by the regular expression Noun*, or by the following context-free grammar rules:
        Nominal → Noun
        Nominal → Noun Nominal    {λx Nominal.sem(x) ∧ NN(Noun.sem, x)}
3.  The relation NN is used to specify that a relation holds between the modifying elements of a compound nominal, as in:
        λx Isa(x, Schedule) ∧ NN(x, Flight)
Adjective Phrases:
1.  English adjective phrases split into two major categories: pre-nominal and predicate.
2.  These categories are illustrated in the following examples:
    a.  I don't mind a cheap restaurant.
    b.  This restaurant is cheap.
3.  The following rules illustrate an obvious, and often incorrect, proposal for the semantic attachment of pre-nominal adjective phrases:
        Nominal → Adj Nominal    {λx Nominal.sem(x) ∧ Isa(x, Adj.sem)}
        Adj → cheap    {Cheap}
        λx Isa(x, Restaurant) ∧ Isa(x, Cheap)
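To make the rule-to-rule idea concrete, here is a tiny sketch (our own illustration; the predicate names mirror the attachments above) of composing Nominal → Adj Nominal with Python lambdas:

def noun_sem(noun):
    # Nominal -> Noun : lambda x. Isa(x, Noun)
    return lambda x: f"Isa({x},{noun.capitalize()})"

def adj_nominal_sem(adj, nominal_sem):
    # Nominal -> Adj Nominal : lambda x. Nominal.sem(x) ^ Isa(x, Adj.sem)
    return lambda x: f"{nominal_sem(x)} ^ Isa({x},{adj.capitalize()})"

cheap_restaurant = adj_nominal_sem("cheap", noun_sem("restaurant"))
print(cheap_restaurant("x"))   # Isa(x,Restaurant) ^ Isa(x,Cheap)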
Verb Phrases:
1.  A fair number of English verbs take some form of verb phrase as one of their arguments.
2.  This complicates the normal verb phrase semantic schema, since these argument verb phrases interact with the other arguments of the head verb in ways that are not completely obvious.
3.  Consider the following example: "I told Harry to go to Maharani".
4.  The meaning representation for this example should be something like the following.
7.  Semantically, we can interpret this subcategorization frame for Tell as providing three semantic roles: a person doing the telling, a recipient of the telling, and the proposition being conveyed.
NOUN PHRASES:
1. A noun phrase is a group of two or more words headed by a noun that includes modifiers. Example: the, a, of them, with him.
2. A noun phrase plays the role of a noun.
3. In a noun phrase, the modifiers can come before or after the noun.
4. Genitive noun phrases make use of complex determiners that consist of noun phrases with possessive markers, as in Atlanta's airport and Maharani's menu.
5. A little introspection, however, reveals that the relation between a city and its airport has little in common with a restaurant and its menu.
6. Therefore, as with compound nominals, it turns out to be best to simply state an abstract semantic relation between the various constituents:
   NP → ComplexDet Nominal   {<∃x Nominal.sem(x) ∧ GN(x, ComplexDet.sem)>}
   ComplexDet → NP 's   {NP.sem}
7. Applying these rules to Atlanta's airport results in the following complex term:
   <∃x Isa(x, Airport) ∧ GN(x, Atlanta)>
PREPOSITIONAL PHRASES:
1. Prepositional phrases express relations between their heads and the constituents to which they are attached, and they signal arguments to constituents that have an argument structure.
2. Example rules and attachments:
   VP → VP PP
   λe Isa(e, Eating) ∧ Eater(e, x) ∧ Eaten(e, Dinner)
   PP → P NP   {NP.sem}
Q. Explain Homonymy and Polysemy.

Ans:
HOMONYMY:
1. Homonymy refers to two unrelated words that look or sound the same.
2. Two or more words become homonyms if they either sound the same (homophones), have the same spelling (homographs), or are both homophones and homographs, but do not have related meanings.
3. Given below are some examples of homonyms:
                    a. Stalk:
The main stem of a herbaceous plant
                           Pursue or approach stealthily
                b.     Bank:
                           Financial Institution
                           Riverside
POLYSEMY vs HOMONYMY:
1. Polysemy is the coexistence of many possible meanings for a word or phrase, whereas homonymy refers to the existence of unrelated words that look or sound the same.
2. Polysemy involves different yet related meanings, whereas homonymy involves completely different meanings.
3. Polysemous senses share related word origins, whereas homonyms have different origins.
Ans:
MERONYMY:
1. A meronym is a word that represents a constituent part or a member of something.
2. For example, Guava is a meronym of Guava-tree (sometimes written as guava < guava-tree).
3. This part-to-whole relationship is called meronymy.
4. Meronymy is not a single relation but a bunch of different part-to-whole relationships.
5. It is also expressed in terms of first-order logic.
6. It can also be considered as a partial order.
(Diagram: meronymy of "Bird" — parts include head, tail, torso, wings, belly, claws and thighs.)
Ans:
SYNONYMY:
1. Synonymy in semantics refers to a word with the same or nearly the same meaning as another word.
2. The term synonymy originates from the Greek words 'sun' and 'onoma', which mean 'with' and 'name'.
3. A synonym is a word or phrase that means exactly the same as another word or phrase in the same language.
4. In other words, synonyms are words with similar meanings.
5. For instance, words like delicious, yummy and succulent are synonyms of the adjective tasty; similarly, verbs like commence, initiate and begin are synonyms of the verb start.
6. However, some synonyms do not have exactly the same meaning; there may be minute differences.
7. Sometimes a word can be synonymous with another in one context or usage but not in another.
8. Examples:
   a. Beautiful - Gorgeous
   b. Purchase - Buy
   c. Use - Employ
   d. Rich - Wealthy
   e. Mistake - Error
   f. Big - Large
   g. Small - Little
ANTONYMY:
1. Antonyms are words that have opposite or contrasting meanings.
2. For example, the antonym of hot is cold; similarly, the antonym of day is night.
3. Antonyms are actually the opposite of synonyms.
4. Furthermore, there are three types of antonyms: gradable, complementary, and relational antonyms.
5. Gradable antonyms are pairs of words with opposite meanings that lie on a continuous spectrum.
6. For example, if we take age as a continuous spectrum, young and old are two ends of the spectrum.
7. Complementary antonyms are pairs of words with opposite meanings that do not lie on a continuous spectrum.
8. For example, interior: exterior, true: false, and inhale: exhale.
9. Relational antonyms are pairs of words that refer to a relationship from opposite points of view.
10. For example, doctor: patient, husband: wife, teacher: student, sister: brother.
Ans:
HYPERNYMY:
1. A hypernym is a general word whose meaning covers a group of more specific words (its hyponyms).
2. Example:
(Diagram: "Color" shown as a hypernym of specific colors.)
Ans:
WORDNET:
1. WordNet is a big collection of words from the English language that are related to each other and are grouped in some way.
      2.         It is also called as a lexical database.
     3.          In other words, WordNet is a database of English words that are connected together by their
                semantic relationships.
     4.         It is like a superset dictionary with a graph structure.
5. WordNet groups nouns, verbs, adjectives, etc. which are similar, and the groups are called synsets (synonym sets).
6. There are three principles the synset construction process must adhere to:
   a. Minimality:
      • This principle focuses on capturing the minimal set of words in the synset that uniquely identifies the concept.
      • For example, {family, house} uniquely identifies a concept, e.g. "she is from the house of the Classical Singers of Hyderabad".
   b. Coverage:
      • The main aim of coverage is completion of the synset, that is, capturing all the words that represent the concept expressed by the synset.
      • In the synset, the words should be ordered according to their frequency in the collection.
   c. Replaceability:
      • Replaceability dictates that the most common words in the synset, that is, the words towards the beginning of the synset, should be able to replace one another.
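A small sketch of how these lexical relations can be inspected with NLTK's WordNet interface (assumes nltk is installed and the 'wordnet' corpus has been downloaded; the chosen synset names are only examples):

from nltk.corpus import wordnet as wn

# Synsets (sense groupings) of an ambiguous word.
for syn in wn.synsets('bank')[:3]:
    print(syn.name(), '-', syn.definition())

# Synonymy: lemmas that share one synset.
print([lemma.name() for lemma in wn.synset('car.n.01').lemmas()])

# Hypernymy: the more general concept above a synset.
print(wn.synset('car.n.01').hypernyms())

# Meronymy: parts of a whole.
print(wn.synset('bird.n.01').part_meronyms())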
Ans:
WSD (WORD SENSE DISAMBIGUATION):
1. WSD stands for Word Sense Disambiguation.
2. Words have different meanings based on the context of their usage in the sentence.
3. In human languages, words can be ambiguous because many words can be interpreted in multiple ways depending upon the context of their occurrence.
4. Word sense disambiguation, in natural language processing (NLP), may be defined as the ability to determine which meaning of a word is activated by its use in a particular context.
5. Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP system faces.
6. Part-of-speech (POS) taggers with a high level of accuracy can solve a word's syntactic ambiguity.
7. On the other hand, the problem of resolving semantic ambiguity is called word sense disambiguation.
8. Resolving semantic ambiguity is harder than resolving syntactic ambiguity.
9. For example, consider the two distinct senses that exist for the word "bass": low-frequency tones (as in music) versus a type of fish.
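As a quick illustration, dictionary/gloss-overlap (Lesk-style) disambiguation picks the sense whose dictionary definition overlaps most with the context words. A minimal sketch using NLTK's built-in simplified Lesk (assumes nltk and the WordNet corpus are available; output quality depends on the glosses):

from nltk.wsd import lesk

sent1 = "I caught a huge bass while fishing in the lake".split()
sent2 = "She plays bass guitar in a jazz band".split()

# lesk() returns the WordNet synset whose definition overlaps most with the context.
print(lesk(sent1, 'bass'))   # expected: a fish-related sense
print(lesk(sent2, 'bass'))   # expected: a music-related sense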
Machine Learning Based (Supervised) Methods:
1. These methods assume that the context itself provides enough evidence on its own to disambiguate the sense.
2. In these methods, world knowledge and reasoning are deemed unnecessary.
3. The word's context is represented as a set of "features" of the words; this includes information about the surrounding words also.
4. Support vector machines and memory-based learning are the most successful supervised learning approaches to WSD.
5. These methods rely on a substantial amount of manually sense-tagged corpora, which is very expensive to create.
• The major problem of WSD is to decide the sense of a word, because different senses can be very closely related.
• Even different dictionaries and thesauruses can provide different divisions of words into senses.
• Inter-judge variance:
  - Another problem of WSD is that WSD systems are generally tested by having their results on a task compared against those of human beings.
  - This is called the problem of inter-judge variance.
IV) Lexicography:
• WSD and lexicography can work together in a loop, because modern lexicography is corpus-based.
• With lexicography, WSD provides rough empirical sense groupings as well as statistically significant contextual indicators of sense.
I) Longest Matching (Maximal Matching):
1. One way to avoid matching the shortest word is to find the longest sequence of characters in the dictionary instead.
2. This approach is called the longest matching algorithm or maximal matching.
3. It is a greedy algorithm that matches the longest word.
4. For example, in English, suppose we have this series of characters: "themendinehere". For the first word, we would find: the, them, theme, and no longer word would match after that.
5. Now we just choose the longest, which is "theme", then start again from 'n'.
6. But now we don't have any word in this series "ndine...".
7. When we can't match a word, we just mark the first character as unknown.
8. So in "ndineh...", we take 'n' out as an unknown word and start matching the next word from 'd'.
9. Assuming the words "din" or "re" are not in our dictionary, we would get the series of words as "theme n dine here".
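A sketch of greedy longest (maximal) matching in Python, following the walk-through above (the dictionary contents are illustrative):

def maximal_matching(text, dictionary, max_word_len=10):
    words, i = [], 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink the window.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]          # unknown character, emit it on its own
        words.append(match)
        i += len(match)
    return words

dictionary = {"the", "them", "theme", "men", "dine", "here", "he"}
print(maximal_matching("themendinehere", dictionary))
# -> ['theme', 'n', 'dine', 'here']  (the greedy choice of "theme" loses "the men")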
II) Bi-Directional Maximal Matching:
1. One way to solve this issue is to also match backward.
2. This approach is called bi-directional maximal matching, as proposed by Narin Bi et al.
3. It goes from left to right (forward matching) first, then from the end of the sentence it goes from right to left (backward matching).
CHAP - 5: PRAGMATICS
1. Example discourse: "Ana is a Graduate Student at UT Dallas. …"
4. Here, "Ana", "Natural Language Processing" and "UT Dallas" are possible entities.
5. "She"and "Her" are references to the entity "Ana" and "the institute" is a reference to the entity
   "UT Dallas".
REFERENCE:
1. Reference, in NLP, is a linguistic process where one word in a sentence or discourse may refer to another word or entity.
2                 The task of resolving such references is known as Reference Resolution.
3.                In the above example, "She" and "Her" referring to the entity "Ana" and "the institute" referring to
                  the entity "UT Dallas"are two examples of Reference Resolution.
DISCOURSE - REFERENCE RESOLUTION:
1. Discourse, in the context of NLP, refers to a sequence of sentences occurring one after the other.
2. Reference is a linguistic process where one word in a sentence or discourse refers to another word or entity.
 3.                 The task of resolvingsuch references is known as Discourse - Reference Resolution.
COREFERENCE RESOLUTION:
Example discourse: "Elon Musk is … designer of SpaceX. … The 49-year-old is widely known as the mind behind Neuralink."
9. Referring Expressions: Elon Musk, He, The 49-year-old.
10. Referent: Elon Musk.
11. Coreferring Expressions: {Elon Musk, He, The 49-year-old}.
12. References are usually of two kinds: Exophor and Endophor.
13. Endophor refers to an entity that appears in the discourse.
14. While Exophor refers to an entity that does not appear in the discourse.
15. Example of Exophor:
    "Pick that up." (pointing to an object not mentioned in discourse)
16. Here "that" refers to an object that is not mentioned explicitly in the discourse.
17. There are primarily two kinds of Endophors: Anaphor and Cataphor.
18. Anaphor refers to a situation wherein the referential entity or referent appears before its referencing pronoun in the discourse.
19. Example of Anaphor: …
20. A set of expressions that corefer is called a coreference chain or a cluster.

Q. Write short notes on Types of Referring Expressions.

Ans:
TYPES OF REFERRING EXPRESSIONS:
1. Indefinite Reference:
   This kind of reference represents entities that are new to the hearer in the discourse context.
   For example, in the sentence "Ram had gone around one day to bring him some food", the word some is an indefinite reference.
2. Definite Reference:
   Opposite to the above, this kind of reference represents entities that are not new but are identifiable to the hearer in the discourse context.
   For example, in the sentence "I used to read The Times of India", The Times of India is a definite reference.
 3.        Pronouns:
           It is a form of definite reference.
          For example, Ram laughed as loud as he could. The word he represents pronoun referring
          expression.
4. Demonstratives:
          These demonstrate and behave differently than simple definite pronouns.
          For example, this and that are demonstrative pronouns.
5. Names:
   It is the simplest type of referring expression.
   It can be the name of a person, organization or location.
   For example, in the above examples, Ram is the name referring expression.
Q3. Write short notes on syntactic & semantic constraints on coreference.

Ans:
SYNTACTIC & SEMANTIC CONSTRAINTS ON COREFERENCE:
1. Reference relations may also be constrained by the syntactic relationships between a referring expression and a possible antecedent noun phrase when both occur in the same sentence.
2. For instance, the pronouns in all of the following sentences are subject to the constraints indicated in brackets.
   a. John bought himself a new Acura. [himself=John]
   b. John bought him a new Acura. [him≠John]
   c. John said that Bill bought him a new Acura. [him≠Bill]
   d. John said that Bill bought himself a new Acura. [himself=Bill]
   e. He said that he bought John a new Acura. [He≠John; he≠John]
   f. John wanted a new car. Bill bought him a new Acura. [him=John]
   g. John wanted a new car. He bought him a new Acura. [He=John; him≠John]
3. English pronouns such as himself, herself, and themselves are called reflexives.
4. Oversimplifying the situation considerably, a reflexive corefers with the subject of the most immediate clause that contains it (example: a), whereas a nonreflexive cannot corefer with this subject (example: b).
5. That this rule applies only to the subject of the most immediate clause is shown by examples (c) and (d), in which the opposite reference pattern is manifest between the pronoun and the subject of the higher sentence.
6. On the other hand, a full noun phrase like John cannot corefer with the subject of the most immediate clause nor with a higher-level subject (example: e).
7. Whereas these syntactic constraints apply to a referring expression and a particular antecedent noun phrase, these constraints actually prohibit coreference between the two regardless of any other available antecedents that denote the same entity.
8. For instance, normally a nonreflexive pronoun like him can corefer with the subject of the previous sentence, as it does in example (f), but it cannot in example (g) because of the existence of the coreferential pronoun he in the second clause.
                 9.         The rules given above oversimplify the situation in a number of ways, and there are many cases that
                            they do not cover.
                10. Indeed, upon further inspection the facts actually get quite complicated.
                11.        In fact, it is unlikely that all of the data can be explained using only syntactic relations.
12. For instance, the reflexive himself and the nonreflexive him in sentences (example: h) and (example: i) respectively can both refer to the subject John, even though they occur in identical syntactic configurations.
    h. John set the pamphlets about Acuras next to himself. [himself=John]
    i. John set the pamphlets about Acuras next to him. [him=John]
CHAP - 6: APPLICATIONS
Q1. Write short notes on machine translation in NLP and explain the different types of machine translation.

Ans:
MACHINE TRANSLATION:
1. Machine Translation is also known as robotized interpretation or automated translation.
2. Machine Translation or MT is simply a procedure in which computer software translates text from one language to another without human contribution.
3. At its fundamental level, machine translation performs a straightforward replacement of atomic words in a single characteristic language for words in another.
4. Using corpus methods, more complicated translations can be conducted, allowing better handling of differences in linguistic typology, phrase recognition and translation of idioms, as well as the isolation of anomalies.
 S. 1n simple language, we can say that machine translation works by using computer software to
            translate the text from one source language to another target language.
6. Thus, Machine Translation (MT) is the task of automatically converting one natural language into another, preserving the meaning of the input text, and producing fluent text in the output language.
Types of Machine Translation:
1. Statistical Machine Translation (SMT)
2. Rule-based Machine Translation (RBMT)
3. Hybrid Machine Translation
4. Neural Machine Translation
I) Statistical Machine Translation (SMT):
1. Presently, MT is extraordinary for basic translation; however, its most noteworthy disadvantage is that it can regularly be wrong.
2. In other words, it does not translate in context, which has implications: without context we don't get quality translation.
3. There are four types of statistical-based machine translation models, which are:
   a. Hierarchical phrase-based translation.
   b. Syntax-based translation.
   c. Phrase-based translation.
   d. Word-based translation.
II) Rule-Based Machine Translation (RBMT):
1. RBMT basically translates on the basis of grammatical rules.
2. It directs a grammatical examination of the source language and the target language to create the translated sentence.
3. But RBMT requires extensive proofreading, and its heavy dependence on lexicons means that efficiency is achieved only after a long period of time.
Ans:
RBMT vs SMT:

RBMT                                         | SMT
Consistent and predictable quality           | Unpredictable translation quality
Good out-of-domain translation quality       | Poor out-of-domain quality
High performance and robustness              | High CPU and disk space requirements
Lack of fluency                              | Good fluency
Hard to handle exceptions to rules           | Good for catching exceptions to rules
High development and customization costs     | Rapid and cost-effective development, provided the required corpus exists
Ans:
INFORMATION RETRIEVAL:
1. Information Retrieval (IR) may be defined as a software program that deals with the organization, storage, retrieval and evaluation of information from document repositories, particularly textual information.
(Diagram: block diagram of an IR system showing document indexing and query analysis.)
2. Information retrieval models predict and explain what a user will find relevant to a given query.
The following three models are classified under the Information Retrieval (IR) model:
I) Classical IR Models:
• It is designed upon basic mathematical concepts and is the most widely used of IR models.
• Classic Information Retrieval models can be implemented with ease.
• Its examples include Vector-space, Boolean and Probabilistic IR models.
• In this system, the retrieval of information depends on documents containing the defined set of queries. There is no ranking or grading of any kind.
• The different classical IR models take Document Representation, Query Representation and the retrieval/matching function into consideration.
Ans:
I) Inverted Index:
1. The primary data structure of most IR systems is the inverted index.
2. We can define an inverted index as a data structure that lists, for every word, all documents that contain it and the frequency of its occurrences in each document.
3. It makes it easy to search for 'hits' of a query word.
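A small sketch of building such an index in Python (the three-document corpus is toy data for illustration):

from collections import defaultdict

docs = {
    1: "information retrieval is the retrieval of information",
    2: "boolean retrieval uses set operations",
    3: "economic and social information",
}

inverted = defaultdict(dict)          # term -> {doc_id: frequency}
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term][doc_id] = inverted[term].get(doc_id, 0) + 1

print(inverted["retrieval"])          # {1: 2, 2: 1}
print(inverted["information"])        # {1: 2, 3: 1}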
5. For example, if we eliminate the letter "A" from "Vitamin A", then it would have no significance.
Q: A Boolean expression, where terms are the index terms and the operators are logical product (AND), logical sum (OR) and logical difference (NOT).
F: Boolean algebra over sets of terms as well as over sets of documents.
If we talk about relevance feedback, then in the Boolean IR model the relevance prediction can be defined as follows:
R: A document is predicted as relevant to the query expression if and only if it satisfies the query expression, as in:
   ((text ∨ information) ∧ retrieval ∧ ¬theory)
4. We can explain this model by a query term as an unambiguous definition of a set of documents.
5. For example, the query term "economic" defines the set of documents that are indexed with the term "economic".
6. Now, what would be the result after combining terms with the Boolean AND operator?
7. It will define a document set that is smaller than or equal to the document sets of any of the single terms.
8. For example, the query with terms "social" and "economic" will produce the set of documents that are indexed with both terms.
9. In other words, a document set equal to the intersection of both sets.
10. Now, what would be the result after combining terms with the Boolean OR operator?
11. It will define a document set that is bigger than or equal to the document sets of any of the single terms.
12. For example, the query with terms "social" or "economic" will produce the set of documents that are indexed with either the term "social" or "economic".
13. In other words, a document set equal to the union of both sets.
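A self-contained sketch of these Boolean combinations using Python sets (the postings lists and document ids are made up for illustration):

# Boolean model: AND = set intersection, OR = set union,
# NOT = complement against the whole collection.
postings = {
    "social":   {2, 4, 7},
    "economic": {2, 3, 7, 9},
    "theory":   {3},
}
all_docs = set(range(1, 11))

print(postings["social"] & postings["economic"])                          # AND -> {2, 7}
print(postings["social"] | postings["economic"])                          # OR  -> {2, 3, 4, 7, 9}
print((postings["social"] | postings["economic"]) - postings["theory"])   # ... AND NOT theory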
DATA RETRIEVAL (DR) vs INFORMATION RETRIEVAL (IR):
1. Data: DR deals with structured data with well-defined semantics, whereas IR deals with mostly unstructured natural-language text matched against searched queries.
2. Results: Querying a DR system produces exact/precise results, or no results if no exact match is found, whereas querying an IR system produces multiple results with ranking; partial match is allowed.
3. Queries: In DR the input queries are of the form of SQL or relational algebra, whereas in IR the input queries are of the form of keywords or natural language.
4. Ordering of results: In DR the results are mostly unordered, whereas in IR the results are always ordered by relevance.
5. Accessibility: DR systems can be accessed only by knowledgeable users or by processes run by automation, whereas IR systems can be accessed by any non-expert human, unlike DR.
6. Inference: The inference used in data retrieval is of the simple deductive kind, whereas in information retrieval it is far more common to use inductive inference.
7. Model: DR follows a deterministic modelling approach, whereas IR follows a probabilistic modelling approach.
8. Classification: In DR we are most likely to be interested in a monothetic classification, that is, one with classes defined by shared properties, whereas in IR such a classification is on the whole not very useful; in fact, more often a polythetic classification is what is wanted.
Ans:
QUESTION ANSWERING SYSTEM:
1. Question Answering is a branch of learning of Information Retrieval and NLP.
2. Question answering focuses on building systems that automatically answer questions posed by humans in a natural language.
3. A computer understanding of natural language consists of the capability of a program system to translate sentences into an internal representation so that this system generates valid answers to questions asked by a user.
4. Valid answers mean answers relevant to the questions posed by the user.
5. To form an answer, it is necessary to execute the syntax and semantic analysis of a question.
6. The process of the system is as follows:
                 a.      Query Processing.
                 b.     Document Retrieval.
                 c.     Passage Retrieval.
                 d. Answer        Extraction.
1. Knowledge-based question answering maps a natural-language question onto a query over structured data, often stored as relation triples.
2. The algorithm used for mapping from a text string to a logical form is called a semantic parser.
3. Semantic parsers for question answering usually map either to some version of predicate calculus or to a query language like SQL or SPARQL.
I) Lexical Gap:
1. In a natural language, the same meaning can be expressed in different ways.
2. Because a question can usually only be answered if every referred concept is identified, bridging this gap significantly increases the proportion of questions that can be answered by a system.
II) Ambiguity:
1. It is the phenomenon of the same phrase having different meanings; this can be structural and syntactic (like "flying planes") or lexical and semantic (like "bank").
2. The same string may accidentally refer to different concepts (as in money bank vs. river bank); with polysemy, the same string refers to different but related concepts (as in bank as a company vs. bank as a building).
III) Multilingualism:
1. Knowledge on the Web is expressed in various languages.
2. While RDF resources can be described in multiple languages at once using language tags, there is not a single language that is always used in Web documents.
3. Additionally, users have different native languages. A QA system is expected to recognize a language and get the results on the go!
Ans:
CATEGORIZATION:
1. Text classification, also known as text tagging or text categorization, is the process of categorizing text into organized groups.
2. By using Natural Language Processing (NLP), text classifiers can automatically analyze text and then assign a set of pre-defined tags or categories based on its content.
3. Text classification is the process of labeling or organizing text data into groups.
4. It forms a fundamental part of Natural Language Processing.
5. In the digital age that we live in, we are surrounded by text on our social media accounts, in commercials, on websites, e-books, etc.
6. The majority of this text data is unstructured, so classifying this data can be extremely useful.
7. Sentiment Analysis is an important application of Text Classification.
Working:
1. Machine learning models cannot learn directly from raw text.
2. To train models, we need to transform text data into numerical data - this is known as feature extraction.
          3.         Important feature extraction techniques include bag of words and n-grams.
4. There are several useful machine learning algorithms we can use for text classification.
          5.        The most popular ones are:
                 a.      Naive Bayes classifiers
                 b. Support vector machines
                c.      Deep learning algorithms
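A minimal sketch of such a pipeline with scikit-learn, combining bag-of-words features with a Naive Bayes classifier (the tiny training set below is made up for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts  = ["great movie, loved it", "terrible plot, waste of time",
                "wonderful acting", "boring and predictable"]
train_labels = ["pos", "neg", "pos", "neg"]

vectorizer = CountVectorizer(ngram_range=(1, 2))   # bag of words plus bigrams
X_train = vectorizer.fit_transform(train_texts)

clf = MultinomialNB().fit(X_train, train_labels)

X_test = vectorizer.transform(["loved the acting", "what a waste"])
print(clf.predict(X_test))                         # expected: ['pos' 'neg']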
AJ>plications;
Text Classification has a wide array of applications. Some popular uses are:
1. Sentiment Analysis
Q8. Write short notes on Summarization.

Ans:
SUMMARIZATION:
I) Extraction-based summarization:
1. Extraction-based summarization selects key chunks of the original text, scored by some importance measure.
2. This approach works by detecting key chunks of the text, cutting them out, then sticking them back together to create a shortened form.
3. As a result, it relies only on phrase extraction from the original text.
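A rough sketch of frequency-based extractive summarization along these lines (the sentence scoring and the crude stop-word filter are deliberately simplistic):

import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'\w+', text.lower())
    freq = Counter(w for w in words if len(w) > 3)        # crude stop-word filter

    def score(sentence):
        return sum(freq[w] for w in re.findall(r'\w+', sentence.lower()))

    best = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # keep the selected sentences in their original order
    return " ".join(s for s in sentences if s in best)

text = ("Text summarization shortens a document. Extractive summarization selects "
        "important sentences from the document. It scores each sentence by word "
        "frequency and keeps the highest scoring sentences.")
print(extractive_summary(text))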
II) Abstraction-based summarization:
4. When abstraction is used for text summarization with deep learning, it can overcome the extractive method's grammatical errors.
5. Abstraction is more efficient than extraction.
6. The text summarization algorithms necessary for abstraction, on the other hand, are more complex to build, which is why extraction is still widely used.
Ill) Domain-Specific:
SENTIMENT ANALYSIS:
Types of Sentiment Analysis:
1. Fine-grained sentiment analysis provides a more precise level of polarity by breaking it down into further categories, usually very positive to very negative. This can be considered the opinion equivalent of ratings on a 5-star scale.
2. Emotion detection identifies specific emotions rather than positivity and negativity. Examples could include happiness, frustration, shock, anger and sadness.
3. Intent-based analysis recognizes actions behind a text in addition to opinion. For example, an online comment expressing frustration about changing a battery could prompt customer service to reach out to resolve that specific issue.
4. Aspect-based analysis gathers the specific component being positively or negatively mentioned. For example, a customer might leave a review on a product saying the battery life was too short. Then, the system will return that the negative sentiment is not about the product as a whole, but about the battery life.
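A minimal sketch of lexicon-based sentiment scoring with NLTK's VADER analyzer (assumes nltk is installed and the 'vader_lexicon' resource has been downloaded; the sentences are illustrative):

from nltk.sentiment import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("The battery life is too short, very disappointing."))
print(sia.polarity_scores("Absolutely loved the camera quality!"))
# Each result contains neg/neu/pos proportions and a 'compound' score in [-1, 1],
# which can be bucketed into fine-grained classes (very negative ... very positive).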
Ans:
NAMED ENTITY RECOGNITION:
1. Named entity recognition (NER) is also called entity identification or entity extraction.
 2.       It is a natural language processing (NLP) technique that automatically identifies named entities
          in a text and classifies them into predefined categories.
3.        Entities   can be names of people, organizations, locations, times, quantities, monetary values,
          percentages, and more.
4. Example: see the sketch below.
5.        With named entity recognition, you can extract key information to understand what a text is
          about, or merely use it to collect important information to store in a database.
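A small sketch of NER with spaCy (assumes spaCy is installed and the en_core_web_sm model has been downloaded with: python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple acquired a startup in London for $50 million in 2020.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple ORG, London GPE, $50 million MONEY, 2020 DATE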
Types of NER:
I) Entity Name Expressions (ENAMEX):
(Diagram: ENAMEX covers name-type entities such as persons, organizations and locations.)
II) Numerical Expressions (NUMEX):
(Diagram: NUMEX covers DISTANCE, QUANTITY and MONEY expressions.)
III) Time Expressions (TIMEX):
Challenges of NER:
I) Ambiguity:
1. Ambiguity between common and proper nouns.
2. Example: a common word such as "Roja", meaning rose flower, is also the name of a person.
V) Morphologically rich languages:
1. Most of the Indian languages are morphologically rich and agglutinative.
2. There will be a lot of variation in word forms, which makes machine learning difficult.