Total No. of Questions : 8]                                        SEAT No.
P6566                                                                   [Total No. of Pages : 2
                                      [6181]-116
                              B.E. (Computer Engineering)
              NATURAL LANGUAGE PROCESSING
       (2019 Pattern) (Semester - VIII) (410252A) (Elective - V)
Time : 2½ Hours]                                                              [Max. Marks : 70
Instructions to the candidates:
     1) All questions are compulsory.
     2) Figures to the right indicate full marks.
Q1) a)     What are generative models of language? Explain any one model in detail. [4]
     b)    Consider the following small corpus:                                            [8]
           Training corpus:
           <s> I am from Pune </s>
           <s> I am a teacher </s>
           <s> students are good and are from various cities </s>
           <s> students from Pune do engineering </s>
           Test data:
           <s> students are from Pune </s>
           Find the Bigram probability of the given test sentence.
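A minimal sketch for checking the arithmetic in (b), assuming unsmoothed maximum-likelihood estimates, P(w | prev) = count(prev w) / count(prev), with `<s>` and `</s>` treated as ordinary tokens. The question does not state a smoothing scheme, so none is applied here. The function name `bigram_probability` is illustrative only.

```python
from collections import Counter
from fractions import Fraction

corpus = [
    "<s> I am from Pune </s>",
    "<s> I am a teacher </s>",
    "<s> students are good and are from various cities </s>",
    "<s> students from Pune do engineering </s>",
]
test_sentence = "<s> students are from Pune </s>"

# Count unigrams and adjacent word pairs over the training corpus.
unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))


def bigram_probability(sentence):
    """Product of unsmoothed MLE bigram probabilities over the sentence."""
    tokens = sentence.split()
    prob = Fraction(1)
    for prev, word in zip(tokens, tokens[1:]):
        if unigram_counts[prev] == 0:
            return Fraction(0)  # unseen history: probability is zero without smoothing
        prob *= Fraction(bigram_counts[(prev, word)], unigram_counts[prev])
    return prob


result = bigram_probability(test_sentence)
print(result)  # 1/24
```

Under these assumptions the factors are 2/4, 1/2, 1/2, 2/3 and 1/2, giving 1/24 (about 0.0417).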
     c)    Explain in detail Latent Semantic Analysis (LSA) for topic modelling. [6]
                                              OR
Q2) a)     Write a short note on BERT.                                                     [4]
     b)    Given a document-term matrix with the following counts:                         [6]
                           Document 1      Document 2      Document 3
           Term 1              10               5               0
           Term 2               2               0               8
           Term 3               1               3               6
           Calculate the TF-IDF score of “Term 1” in “Document 1”.
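A minimal sketch for checking the arithmetic in (b). The question does not fix a TF-IDF variant, so this assumes TF = term count divided by the total term count of the document, and IDF = log10(N / df), where N is the number of documents and df is the number of documents containing the term. Other conventions (raw counts, natural logarithm, smoothing) give different values. The helper name `tf_idf` is illustrative only.

```python
import math

# Rows are terms, columns are Document 1, Document 2, Document 3.
counts = {
    "Term 1": [10, 5, 0],
    "Term 2": [2, 0, 8],
    "Term 3": [1, 3, 6],
}


def tf_idf(term, doc_index, log=math.log10):
    """TF-IDF with length-normalised TF and unsmoothed IDF (assumed variant)."""
    doc_length = sum(row[doc_index] for row in counts.values())
    tf = counts[term][doc_index] / doc_length
    num_docs = len(counts[term])
    doc_freq = sum(1 for c in counts[term] if c > 0)
    idf = log(num_docs / doc_freq)
    return tf * idf


score = tf_idf("Term 1", 0)
print(round(score, 4))  # 0.1355
```

Under these assumptions TF = 10/13 and IDF = log10(3/2), so the score is roughly 0.1355. Passing `log=math.log` switches to the natural logarithm.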
     c)    Describe the Latent Dirichlet Allocation (LDA) algorithm and explain how it is
           used for topic modeling.                                          [8]
Q3) a)    Describe the concept of Information Retrieval. Explain the significance
          of Natural Language Processing in Information Retrieval.                     [4]
    b)    Explain reference resolution and coreference resolution with examples. [8]
    c)    What is Cross-Lingual Information Retrieval, and how is it used in Natural
          Language Processing? Provide an example.                                     [6]
                                          OR
Q4) a)    Explain the concept of the Vector Space Model, and describe how it is
          used in Information Retrieval.                                               [6]
    b)    Describe entity extraction and relation extraction with the help of examples. [8]
    c)    What is Named Entity Recognition (NER)? Describe the various metrics
          used for evaluation.                                                         [4]
Q5) a)    List the tools available for the development of NLP applications. Write
          the features of any 3 tools.                                         [7]
    b)    Describe in detail the Lesk algorithm and Walker’s algorithm for word
          sense disambiguation.                                               [10]
                                         OR
Q6) a)    Explain the following lexical knowledge networks:                   [10]
          i) WordNet
          ii) IndoWordNet
          iii) VerbNet
          iv) PropBank
          v) Treebanks
    b)    Write Python code using the NLTK library to split text into tokens using
          whitespace, punctuation-based and default tokenization methods. [7]
Q7) a)    Explain the three stages of a Question Answering system with a neat diagram. [7]
    b)    Explain Rule-based Machine Translation and Statistical Machine Translation
          (SMT) with suitable diagrams and examples.                           [10]
                                        OR
Q8) a)    Describe the following NLP applications:                             [10]
          i) Textual Entailment
          ii) Dialog and Conversational Agents
    b)    Explain Natural Language Generation with reference architecture. [7]