Language model
1. Consider the training set:
The Arabian knights
These are the fairy tales of the east
The stories of the Arabian knights are translated in many languages
Using the bigram model, compute the probability of the following sentence. Include
start and end symbols in your calculations.
The Arabian knights are the fairy tales of the east
Soln:
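The numerical answer is not filled in above; below is a short Python sketch that derives the MLE bigram probabilities from the three training sentences and multiplies them for the test sentence (lowercasing and explicit <s>/</s> markers are assumptions of this sketch):

from collections import Counter

# Training corpus with explicit start/end symbols. Lowercasing is an
# assumption here; keeping the original case would change some counts.
corpus = [
    "<s> the arabian knights </s>",
    "<s> these are the fairy tales of the east </s>",
    "<s> the stories of the arabian knights are translated in many languages </s>",
]

unigram_counts, bigram_counts = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

def p(word, prev):
    # MLE bigram estimate: P(word | prev) = C(prev, word) / C(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

test = "<s> the arabian knights are the fairy tales of the east </s>".split()
probability = 1.0
for prev, word in zip(test, test[1:]):
    probability *= p(word, prev)

print(probability)  # product of the 11 bigram probabilities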
2. You are an English class teacher and to make your course interesting you
hold a 'write like Shakespeare' competition where you ask each student to write
a play in the style of Shakespeare. You try to score the plays automatically by
using a trigram model where the probability distribution for the trigrams is
calculated using all of Shakespeare’s plays as the corpus. While scoring the
student plays, however, you find that the data was not enough and most
sentences in the students' plays receive a score of 0. What options do you have
to come out of your predicament if you still want to score the plays
automatically?
Ans:
The options are the same as whenever a language model is trained on inadequate data. We
back off to simpler models, e.g. bigram and unigram models, and use smoothing
to assign non-zero probabilities where the corpus has no instances of the
relevant n-gram. Another option is to use a weighted linear combination
(interpolation) of multiple n-gram models for different values of n (e.g. n = 1, 2, 3),
as sketched below.
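A minimal sketch of the interpolation option (the lambda weights are illustrative only; in practice they are tuned on held-out data):

def interpolated_prob(trigram_p, bigram_p, unigram_p, lambdas=(0.6, 0.3, 0.1)):
    # P(w | w1, w2) = l3*P(w | w1, w2) + l2*P(w | w2) + l1*P(w)
    l3, l2, l1 = lambdas
    return l3 * trigram_p + l2 * bigram_p + l1 * unigram_p

# Even when the Shakespeare corpus contains no instance of the trigram
# (trigram_p = 0), the interpolated score stays non-zero.
print(interpolated_prob(0.0, 0.02, 0.001))  # 0.0061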
_________________________________________________________
Part of speech tagging and HMM
1. Using the Penn Treebank tagset, find the POS tag sequence for the following sentences: [6 marks]
1. The actor was happy he got a part in a movie even though the part was small. [2 marks]
2. I am full of ambition and hope and charm of life. But I can renounce everything at the time of need. [3 marks]
3. When the going gets tough, the tough get going. [1 mark]
Soln:
The/DT actor/NN was/VBD happy/JJ he/PRP got/VBD a/DT part/NN in/IN a/DT movie/NN
even/RB though/IN the/DT part/NN was/VBD small/JJ. [2 marks]
I/PRP am/VBP full/JJ of/IN ambition/NN and/CC hope/NN and/CC charm/NN of/IN life/NN.
But/CC I/PRP can/MD renounce/VB everything/NN at/IN the/DT time/NN of/IN need/NN.
[3 marks]
When/WRB the/DT going/NN gets/VBZ tough/JJ, the/DT tough/NN get/VBP going/VBG. [1 mark]
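As a cross-check, an off-the-shelf Penn Treebank tagger such as NLTK's default tagger can be run on the same sentences; its output may differ slightly from the hand-assigned tags above:

import nltk
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')  # first run only

sentence = "When the going gets tough, the tough get going."
tokens = nltk.word_tokenize(sentence)
print(nltk.pos_tag(tokens))   # Penn Treebank tags, e.g. ('When', 'WRB'), ('the', 'DT'), ...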
2.
Parsing
1. Build a parse tree for the sentence “She loves to visit Goa” using Probabilistic Parsing
[5 marks]
S → NP VP 1.0
VP → V PP 0.4
VP → V NP 0.6
PP → P NP 1.0
NP → NP PP 0.3
NP → N 0.3
N → visit 0.3
V → visit 0.6
N → Goa 0.3
N → She 0.5
V → loves 1.0
P → to 1.0
DT → a 1.0
2. Give the correct sequence of arc-eager parsing operations for the given sentence.
[2 marks]
3. Given the grammar and lexicon below, derive the parse tree using the top-down parsing
method for the sentence: [3 marks]
S: The guy ate the pizza
S -> NP VP
VP -> V NP
NP -> Det N
N -> pizza
N -> guy
Det -> the
V -> ate
Soln:
1 The 2 guy 3 ate 4 the 5 pizza 6

State               Backup State    Action
1. ((S) 1)
2. ((NP VP) 1)
3. ((Det N VP) 1)                   matches "the"
4. ((N VP) 2)                       matches "guy"
5. ((VP) 3)
6. ((V NP) 3)                       matches "ate"
7. ((Det N) 4)                      matches "the"
8. ((N) 5)                          matches "pizza"
9. (() 6)                           parse succeeds
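The same derivation can be reproduced with NLTK's recursive-descent parser, which implements this top-down, left-to-right strategy (terminals are lowercased in the sketch):

import nltk

# The question's grammar and lexicon as an NLTK CFG.
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> Det N
Det -> 'the'
N -> 'guy' | 'pizza'
V -> 'ate'
""")

# RecursiveDescentParser uses the classic top-down, depth-first strategy.
parser = nltk.RecursiveDescentParser(grammar)
for tree in parser.parse("the guy ate the pizza".split()):
    tree.pretty_print()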
4. Given the grammar and lexicon below, show the final chart for the following sentence
after applying the bottom-up chart parser. [5 marks]
S: Book the flight on airasia
S -> VP
VP -> Verb NP
NP -> NP PP
NP -> Det Noun
PP -> Prep Noun
Det -> the
Verb -> Book
Prep -> on
Noun -> flight | airasia
Soln:
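The final chart was presumably given as a figure; as a cross-check, a short NLTK sketch (bottom-up chart parsing over the same grammar, terminal spellings as in the lexicon) can enumerate the chart edges and the resulting parse:

import nltk

grammar = nltk.CFG.fromstring("""
S -> VP
VP -> Verb NP
NP -> NP PP
NP -> Det Noun
PP -> Prep Noun
Det -> 'the'
Verb -> 'Book'
Prep -> 'on'
Noun -> 'flight' | 'airasia'
""")

parser = nltk.BottomUpChartParser(grammar)
chart = parser.chart_parse("Book the flight on airasia".split())

# Every edge in the final chart (complete and incomplete constituents).
for edge in chart.edges():
    print(edge)

# Complete parse(s) spanning the whole input.
for tree in chart.parses(grammar.start()):
    print(tree)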
6. In the weighted graph below, the edge weights between girl-saw and a-girl in the
maximum spanning tree are: [5]
Soln:
7. Given a treebank how would you determine probabilities for the grammar rules given in
Question 1 (for use with a basic PCFG parser)?
Let’s take the VP rule. There are three VP rules. I would count the total number of VP rules
in the
Treebank. Then, for each rule, I would count the number of times that rule occurs and divide
by the total number of VP rules. That would yield the probability for each rule. I would
follow a similar procedure for each rule where the same non-terminal appeared on the left-
hand side.
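A small sketch of the same relative-frequency counting, using the Penn Treebank sample that ships with NLTK and its induce_pcfg helper (the corpus sample and helper are NLTK-specific conveniences, not part of the question):

import nltk
from nltk.corpus import treebank
# nltk.download('treebank')  # first run only

# Collect every production used in the treebank parse trees ...
productions = []
for tree in treebank.parsed_sents():
    productions.extend(tree.productions())

# ... and estimate rule probabilities by relative frequency:
# P(A -> beta) = count(A -> beta) / count of all rules with A on the left-hand side.
pcfg = nltk.induce_pcfg(nltk.Nonterminal('S'), productions)

# A few of the estimated VP rules with their probabilities.
for production in pcfg.productions(lhs=nltk.Nonterminal('VP'))[:5]:
    print(production)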
8.
9.
________________**********____________________
Sentiment analysis
1. In this modern age where the internet is growing rapidly, the internet makes it easier for
tourists to find information about hotels. Tourists usually describe their experience of a hotel
stay by writing reviews on the internet, so many hotel reviews can be found online. With such
a large number of reviews available, tourists cannot read them all to work out whether they
contain positive or negative opinions. Sentiment analysis is needed to quickly detect whether
a review is positive or negative. Using the Multinomial Naïve Bayes classifier, determine
whether the given hotel reviews are positive or negative.
D1  The hotel is clean and great                       Positive
D2  The hotel owner is very helpful                    Positive
D3  Overall Aston Hotel’s experience was great         Positive
D4  The condition of the hotel was very bad            Negative
D5  A HORRIBLE EXPERIENCE FOR ONE WEEK                 Negative
D6  The hotel view was great                           ?
D7  My holiday experience stay in usa so horrible      ?
D8  Overall the hotel in aston very clean and great    ? (Positive)
Soln:
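The hand computation is not reproduced in the key; a compact sketch with scikit-learn's CountVectorizer and MultinomialNB (alpha=1.0, i.e. add-1 smoothing, chosen here as an assumption) predicts labels for D6-D8:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = [
    "The hotel is clean and great",
    "The hotel owner is very helpful",
    "Overall Aston Hotel's experience was great",
    "The condition of the hotel was very bad",
    "A HORRIBLE EXPERIENCE FOR ONE WEEK",
]
train_labels = ["Positive", "Positive", "Positive", "Negative", "Negative"]

test_docs = [
    "The hotel view was great",
    "My holiday experience stay in usa so horrible",
    "Overall the hotel in aston very clean and great",
]

vectorizer = CountVectorizer(lowercase=True)   # word-count (bag of words) features
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

clf = MultinomialNB(alpha=1.0)                 # add-1 (Laplace) smoothing
clf.fit(X_train, train_labels)
print(dict(zip(test_docs, clf.predict(X_test))))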
2. Given the following documents and their sentiment polarities
Document Sentiment words Polarity
D1 Good, Enjoy, Good Positive
D2 Poor, Unpleasant Negative
D3 Enjoy, Wonderful Positive
D4 Good, Lovely Positive
D5 Good, Poor, Rude Negative
D6 Good, Wonderful ?
Determine the sentiment polarity of document D6 using the multinomial naïve
Bayes classification (with add-1 smoothing) approach. Show your steps in detail.
Solution:
P(Positive) = 3/5
P(Negative) = 2/5
P(Good | Positive) = (3+1)/(7+7) = 4/14        P(Good | Negative) = (1+1)/(5+7) = 2/12
P(Wonderful | Positive) = (1+1)/(7+7) = 2/14   P(Wonderful | Negative) = (0+1)/(5+7) = 1/12
For document D6:
P(Positive | Good, Wonderful) ∝ 3/5 × 4/14 × 2/14 ≈ 0.6 × 0.29 × 0.14 ≈ 0.024
P(Negative | Good, Wonderful) ∝ 2/5 × 2/12 × 1/12 ≈ 0.4 × 0.16 × 0.083 ≈ 0.005
Since 0.024 > 0.005, the sentiment polarity of document D6 is Positive.
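The same numbers can be checked with a few lines of Python (the small gap in the negative score, 0.005 vs 0.006, is only rounding of the intermediate factors above):

from collections import Counter

positive_words = ["Good", "Enjoy", "Good", "Enjoy", "Wonderful", "Good", "Lovely"]  # D1, D3, D4
negative_words = ["Poor", "Unpleasant", "Good", "Poor", "Rude"]                     # D2, D5
vocabulary_size = 7   # Good, Enjoy, Poor, Unpleasant, Wonderful, Lovely, Rude

pos_counts, neg_counts = Counter(positive_words), Counter(negative_words)

def likelihood(word, counts, class_total):
    # Add-1 smoothing: (count + 1) / (class word total + |V|)
    return (counts[word] + 1) / (class_total + vocabulary_size)

document = ["Good", "Wonderful"]
p_pos, p_neg = 3 / 5, 2 / 5          # class priors
for word in document:
    p_pos *= likelihood(word, pos_counts, len(positive_words))
    p_neg *= likelihood(word, neg_counts, len(negative_words))

print(round(p_pos, 3), round(p_neg, 3))        # approx. 0.024 vs 0.006
print("Positive" if p_pos > p_neg else "Negative")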
3. For sentiment analysis of Twitter data, which classifier did you choose between
SVM and Naïve Bayes, and why? Justify your answer. [3 marks]
Ans:
Option - 1 mark
Justification (2 marks): since you are given only the tweet data and no other
information, there is no target variable present. One cannot train a
supervised learning model; both SVM and Naïve Bayes are supervised learning
techniques.
b) I bought an iPhone a few days ago. It was such a nice phone. The touch
screen was really cool. The voice quality was clear too. Although the battery
life was not long, that is ok for me. However, my mother was mad with me as
I did not tell her before I bought it. She also thought the phone was too
expensive, and wanted me to return it to the shop. What are the problems
associated with this type of sentiment analysis? [3 marks]
Ans:
Identification of implicit aspects
Multiple sentiments for the same opinion phrase across different aspects
Association of the corresponding sentiment with each aspect in a multi-aspect review
4. How is sentiment calculated or scored? [2M]
5. Compare the rule-based approach and the machine learning approach in sentiment
analysis. [5]
6.
Machine translation
1. Compute the BLEU score for the translations below (Candidate 1, Candidate 2).
Consider 1-gram, 2-gram, 3-gram and 4-gram precisions and the brevity penalty
when calculating the BLEU score.
Reference: The teacher arrived late because of the traffic
Candidate 1: The teacher was late due to the traffic
Candidate 2: A teacher arrived late because of transportation
Soln:
2. Compute the BLEU score for the translations below (Candidate 1, Candidate 2).
Consider 1-gram, 2-gram, 3-gram and 4-gram precisions and the brevity penalty
when calculating the BLEU score.
Reference: The NASA Opportunity rover is battling a massive dust storm on
Mars.
Candidate 1: The Opportunity rover is combating a big sandstorm on Mars.
Candidate 2: A NASA rover is fighting a massive storm on Mars.
Soln:
Metric                 Candidate 1    Candidate 2
precision1 (1-gram)    8/11           9/11
precision2 (2-gram)    4/10           5/10
precision3 (3-gram)    2/9            2/9
precision4 (4-gram)    0/8            1/8
Brevity penalty        0.83           0.83
BLEU score             0.0            0.27
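For reference, NLTK's sentence_bleu reproduces these scores when the final period is kept as a separate token (this tokenisation choice is an assumption made to match the counts in the table):

from nltk.translate.bleu_score import sentence_bleu

# The final period is treated as a separate token so the n-gram counts
# match the table above (11 candidate tokens, 13 reference tokens).
reference  = "The NASA Opportunity rover is battling a massive dust storm on Mars .".split()
candidate1 = "The Opportunity rover is combating a big sandstorm on Mars .".split()
candidate2 = "A NASA rover is fighting a massive storm on Mars .".split()

weights = (0.25, 0.25, 0.25, 0.25)   # equal weights for 1- to 4-gram precision
print(sentence_bleu([reference], candidate1, weights=weights))  # 0.0 (no matching 4-gram)
print(sentence_bleu([reference], candidate2, weights=weights))  # approx. 0.27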
3. Compute the BLEU score for the machine translation output.
(Assume 1-gram and 2-gram precisions and the brevity penalty.)
Unigram precision: P1 = 4/6 = 0.67
Bigram precision: P2 = 0.71
Brevity penalty = 1
Final BLEU score = 1 × 0.71 × 0.67 ≈ 0.48
4. Compute the BLEU score for the MT output.
(Assume 1-gram, 2-gram, 3-gram and 4-gram precisions and the brevity penalty.)
Question answering system and chatbot
1. Identify the type of the dialogue agent from the following conversation. Also
explain the working of that type of dialogue agent.
(5 marks)
System: How may I help you?
User: I want to book dinner at a restaurant that serves good
Chinese food
System: Berkos is a restaurant that serves good Chinese food.
User: ok
System: Should I book a dinner?
User: yes, please
System: ok.
Soln: The above is a task-oriented dialogue system.
The diagram shows the working of the task-oriented dialogue system.
The student should briefly explain each step, as mentioned in the slides.
2. Find the intent, domain and slots for the following:
(2 marks)
Book an appointment on 12th Feb 2021 at 10 am for an ECG Test.
SOLN:
DOMAIN: Medical
INTENT: Book an Appointment
Slots
➢ Services: ECG TEST
➢ Date: 12th Feb 2021
➢ Time: 10 AM
3. In a collection of 10000 documents, the following words occur in the
following number of documents:
(3 marks)
Oasis occurs in 400 documents, Place occurs in 3500 documents, Desert
occurs in 800 documents, Water occurs in 800 documents, Comes occurs
in 800 documents, Beneath occurs in 200 documents, Ground occurs in
900 documents.
Calculate the TF-IDF term vector for the following document:
Oasis Place Desert Water Comes Beneath Ground Place
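A minimal sketch of the computation, assuming raw term frequency and idf = log10(N/df); other tf-idf weighting variants would change the numbers:

import math

N = 10000                                   # documents in the collection
df = {"oasis": 400, "place": 3500, "desert": 800, "water": 800,
      "comes": 800, "beneath": 200, "ground": 900}

document = "Oasis Place Desert Water Comes Beneath Ground Place".lower().split()

tf = {term: document.count(term) for term in set(document)}         # raw term frequency
tfidf = {term: tf[term] * math.log10(N / df[term]) for term in tf}  # tf x log10(N/df)

for term, weight in tfidf.items():
    print(f"{term}: {weight:.3f}")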
Word sense disambiguation and ontology
1) What are the lexical sample task and the all-words task in word sense
disambiguation? How can sources like Wikipedia be used for word sense
disambiguation? [2 marks]
Solution
What are the lexical sample task and the all-words task in word sense
disambiguation?
The lexical sample task and the all-words task are two variants of word sense
disambiguation.
Lexical sample task - a small, pre-selected set of target words is disambiguated.
All-words task - the system is given entire texts and a lexicon with an inventory
of senses for each entry. Every word in the text (or sometimes just every content
word) has to be disambiguated.
2. How can sources like Wikipedia be used for word sense disambiguation?
Ans:
Wikipedia can be used as training data for word sense disambiguation with
supervised learning techniques:
• When a concept is mentioned in a Wikipedia article, the article text may contain
an explicit link to the concept’s Wikipedia page, which is named by a unique
identifier (this link can be used as a sense annotation).
• These sentences can then be added to the training data for a supervised
system.
3. How can WordNet relations be used for word sense disambiguation in the
following sentences?
[3 marks]
1. A bat is not a bird, but a mammal.
2. Jaguar reveals its quickest car ever
3. Raghuram Rajan was the 23rd Governor of the Reserve Bank of India
Solution
Nouns and verbs can be extracted from the sentences. The WordNet senses of these
words can then be looked up, and the senses that are most closely related to one
another can be selected as the correct senses.
1. Bat can be a sports bat or a mammal. Looking at the nouns bat, bird and
mammal, the correct sense of bat as MAMMAL can be found using WordNet
relations.
2. Jaguar can be a car or an animal. Looking at the nouns Jaguar and car, the
correct sense of Jaguar as CAR can be found using WordNet relations.
3. Bank can be a river bank or a financial bank. Searching the senses of the nouns
Bank, “Raghuram Rajan” and Governor, the correct sense of BANK as the
FINANCIAL sense can be found using WordNet relations.
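For comparison, a dictionary-based baseline such as the Lesk algorithm in NLTK scores WordNet senses against the sentence context; it uses gloss overlap rather than the relation-based reasoning described above:

from nltk.corpus import wordnet as wn
from nltk.wsd import lesk
# import nltk; nltk.download('wordnet')  # first run only

sentence = "A bat is not a bird , but a mammal".split()
print(lesk(sentence, "bat", pos=wn.NOUN))       # synset selected for 'bat' in this context
print(wn.synsets("bat", pos=wn.NOUN)[:3])       # some of the candidate noun senses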
4. Consider the three movie reviews below and convert them into a bag of words model:
• Review 1: This movie is very scary and long
• Review 2: This movie is not scary and is slow
• Review 3: This movie is spooky and good
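No solution is given in the key; a short scikit-learn sketch builds the bag-of-words matrix for the three reviews:

from sklearn.feature_extraction.text import CountVectorizer

reviews = [
    "This movie is very scary and long",
    "This movie is not scary and is slow",
    "This movie is spooky and good",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(reviews)

print(vectorizer.get_feature_names_out())   # the vocabulary (columns)
print(bow.toarray())                        # one count vector per review

Each row is one review and each column one vocabulary word; a cell holds that word's count in the review.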
2.
Semantic web
1. How is the Syntactic Web different from the Semantic Web? What is a URI in
a semantic web ontology?
The Syntactic Web consists of huge amounts of data on the net, connected by
hyperlinks, which machines can render but cannot process because they are
unable to understand the meaning of the content.
The Semantic Web identifies a set of technologies, tools, and standards which
form the basic building blocks of an infrastructure to support the vision of a
Web associated with meaning.
A Uniform Resource Identifier (URI) is a formatted string that serves as a
means of identifying an abstract or physical resource. A URI can be further
classified as a locator, a name, or both. Every resource in an ontology is
identified by a unique URI.
2. Develop an OWL ontology for the animal kingdom, with classes like
carnivorous, herbivorous and omnivorous. Use the following property
characteristics, restrictions and class expressions: [3 marks]
inverseOf
domain
range
Cardinality
disjointWith
subClassOf
1. Design a sample ontology for the ‘academic university’ domain. Clearly mention the
a. Classes
b. Properties
c. Relations
d. Axioms / constraints
e.g. Professor, course with ‘teaches’ relation.
The ontology should contain about 10 classes with associated properties, relations and
axioms and presented in RDF triple format.
Answer key:
Some samples are given below. Please make sure that the student has mentioned at least 10 classes with
associated properties, relations and axioms.
Classes:
Professor rdf:type owl:Class
Course rdf:type owl:Class
Student rdf:type owl:Class
Research rdf:type owl:Class
Publication rdf:type owl:Class
Thesis rdf:type owl:Class
Properties:
Name rdf:type rdf:Property
Name rdfs:domain Professor
Name rdfs:domain Student
CourseID rdf:type rdf:Property
CourseID rdfs:domain Course
Relations:
Professor teaches Course
Publication hasAuthor Professor
Thesis hasAuthor Student
Professor hasInterestArea Research
Axioms:
Student owl:disjointWith Professor
UndergradStudent rdfs:subClassOf Student
GradStudent rdfs:subClassOf Student
AssistantProfessor rdfs:subClassOf Professor
AssociateProfessor rdfs:subClassOf Professor
2. Describe any classification model (say, Naïve Bayes) to do word sense disambiguation (WSD).
Clearly indicate what is the model, the features, how is it trained, and how it is used for
prediction. Assume that disambiguation is for occurrence instances of a single word – for
example, the occurrence of ‘bank’ has different senses in the sentences below:
The bank will not be accepting cash on Saturdays.
The river overflowed the bank.
Answer Key:
For WSD, the context in which an ambiguous word occurs is represented by a vector of
features F=(f_1, f_2, …, f_n) and the sense of the ambiguous word is represented by the
class label (s_1, …, s_k). Choosing the right sense of the ambiguous word w can be modeled
as finding the sense s_i that maximizes the conditional probability P(w=s_i|F).
The NB classifier assumes conditional independence of features, so that
P(F|s_i) = Π_j P(f_j|s_i), and the chosen sense is the s_i that maximizes P(s_i) Π_j P(f_j|s_i).
The features for WSD using a NB algorithm are terms such as words, collocations, and words
assigned with their positions which are extracted from the context of the ambiguous word.
The probability of sense si, P(si), and the conditional probability of feature fj given
sense si, P(fj|si), are computed via Maximum-Likelihood Estimation:
P(si) = C(si)/N and P(fj|si) = C(fj,si)/C(si).
Where C(fj,si) is the number of occurrences of fj in a context of sense si in the training
corpus, C(si) is the number of occurrences of si in the training corpus, and N is the total
number of occurrences of the ambiguous word w or the size of the training dataset. To avoid
the effects of zero counts when estimating the conditional probabilities of the model, when
meeting a new feature fj in a context of the test dataset, for each sense si we set P(fj|w=si)
equal 1/N.
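A compact sketch of such a classifier with scikit-learn, using bag-of-words context features and add-1 smoothing for unseen features (a common alternative to the 1/N rule described above); the training sentences and sense labels are invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy training data: each context of 'bank' paired with a sense label.
contexts = [
    "the bank will not be accepting cash on Saturdays",
    "she deposited the cheque at the bank branch",
    "the river overflowed the bank after the rain",
    "we sat on the bank and watched the river",
]
senses = ["FINANCIAL", "FINANCIAL", "RIVER", "RIVER"]

# Bag-of-words context features + multinomial NB, mirroring the description above
# (class priors and feature likelihoods estimated from counts, with smoothing).
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(contexts, senses)

print(model.predict(["the bank was accepting cash deposits"]))   # ['FINANCIAL'] on this toy data
print(model.predict(["the river flooded the bank"]))             # ['RIVER'] on this toy data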
3. If you are asked to build a statistical phrase-based machine translation model from a word-order-
based language (like English) to a morphologically rich language (like Hindi or Sanskrit),
a. what problems do you foresee?
b. how will you address them?
Answer Key:
One of the key problems arising during translation from English to a morphologically rich language
(like Hindi) is data sparsity.
There could be different strategies to address this:
1. Collect large amounts of parallel corpora, assuming that the data captures the morphological
variations.
2. Break the problem into ‘translation’ followed by ‘morphology selection’: the source sentence /
phrase is first translated into the target language with the root forms of words, and as a follow-up
step the appropriate inflection of each word is selected by a separate model.
4. You are designing a frame-based dialog system for movie booking.
a. What are the different slots in your design? Mention along with their corresponding entity types
and questions that the system would ask a user.
b. Show a finite-state dialog manager for the system
c. What changes would you make to the design to change it from a single initiative system to multi-
initiative system?
Finite-state dialog manager – the system asks a sequence of questions to fill up each slot. Each
question is independent of the others.
Accept an answer that presents a sequence of questions. Refer to slide 8:
http://courses.washington.edu/ling575/SPR2017/slides/ling575_class4_DM_flat.pdf
The above system is single-initiative, as control is always with the system. In multi-
initiative systems, control can shift arbitrarily between system and user. For this, we use
the concept of frames, which are collections of slots and their values. We then use models for
entity extraction and slot filling on user responses. The dialog manager keeps track of which
slots are filled and which are still open, and then uses rule-based techniques to decide the
next action (question), as sketched below.
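A minimal sketch of the frame and the single-initiative control loop (slot names and questions are illustrative, not part of the answer key):

# A frame for movie booking: one question per slot, values start empty.
frame = {
    "movie":   {"question": "Which movie would you like to watch?", "value": None},
    "theatre": {"question": "Which theatre do you prefer?",         "value": None},
    "date":    {"question": "On which date?",                       "value": None},
    "time":    {"question": "At what time?",                        "value": None},
    "tickets": {"question": "How many tickets?",                    "value": None},
}

def next_action(frame):
    # Single-initiative control: ask about the first empty slot, else confirm.
    for slot, info in frame.items():
        if info["value"] is None:
            return info["question"]
    return "Shall I confirm the booking?"

# Simulated turns: each user answer fills exactly the slot that was asked about.
print(next_action(frame))            # Which movie would you like to watch?
frame["movie"]["value"] = "Dune"
print(next_action(frame))            # Which theatre do you prefer?

In a multi-initiative version, an entity-extraction step would fill every slot mentioned in a free-form user utterance before next_action is called again.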
5. “These earphones are a good pick at this price. Connected with laptop for office calls and these
are working well although there is no noise cancellation. Quality of wires are a bit thin and look
delicate, though neckband is ok. Bass will seem ok if you have not used good quality earphones
earlier.”
You have been given product review data like the one shown above. You are asked to design a
sentiment analysis model for this data. What would be your approach?
Describe the different components of your solution. State any assumptions that you are making and
pros/cons (if any) of your approach.
Answer Key:
This product review data contains reviews of products and their different features.
Hence, an aspect-based sentiment analysis model would be most useful.
The model would have two key components –
1. An aspect extraction model to extract the product features. If the list of product attributes is
already available, we could just build a dictionary-based extractor that does direct lookup in this
dictionary. This would, however, be limited by the entries in the attributes dictionary and does not
support semantic variations. In order to account for semantic variations, one could leverage word
embedding- based models or use knowledge bases like WordNet. Although these models allow for
semantic variations, they are still limited by the vocabulary of the attributes seen during training of
these models. In order to support unseen attributes, one could train NER models.
2. A sentiment classification model for a given phrase / sentence. We could train a text classification
model using n-gram features. This would require labeled data.
The model could be applied either at the sentence or phrasal level.
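A toy sketch of the two components (the aspect lexicon and the keyword-based sentiment stand-in are invented for illustration; in practice the second component would be the trained n-gram classifier described above):

import re

# Illustrative aspect lexicon; a learned extractor or NER model could replace it.
ASPECT_LEXICON = {"battery", "bass", "neckband", "noise cancellation", "wires"}

def extract_aspects(sentence):
    # Dictionary-based lookup: limited to attributes already in the lexicon.
    return [aspect for aspect in ASPECT_LEXICON if aspect in sentence.lower()]

def classify_sentiment(sentence):
    # Placeholder rule; swap in a classifier trained on labelled review sentences.
    negative_cues = re.search(r"\b(no|not|thin|delicate)\b", sentence.lower())
    return "negative" if negative_cues else "positive"

review = ("These earphones are a good pick at this price. "
          "Connected with laptop for office calls and these are working well "
          "although there is no noise cancellation. "
          "Quality of wires are a bit thin and look delicate, though neckband is ok.")

for sentence in filter(str.strip, review.split(".")):
    print(extract_aspects(sentence), "->", classify_sentiment(sentence))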