Here’s your revised question set with Bloom’s Taxonomy Levels marked as L3 (Applying),
L4 (Analyzing), and L5 (Evaluating), where applicable. Questions are kept at easy to
moderate difficulty.
UNIT-3
🔹 FILL IN THE BLANKS (WITH LEVELS & ANSWERS)
No. | Question | Level | Answer
1 | In feature selection, we aim to reduce the number of ________ used for classification. | L3 | features
2 | The Vector Space Model represents documents as vectors in a ________ space. | L3 | high-dimensional
3 | The Rocchio classification algorithm is based on the ________ of documents in each class. | L3 | centroid
4 | In k-nearest neighbor (k-NN), the class is decided by the majority among its ________. | L3 | neighbors
5 | The bias-variance tradeoff balances between generalization and ________ to training data. | L4 | overfitting
6 | Linear classifiers use a ________ decision boundary. | L3 | straight-line
7 | A high bias model typically underfits, while a high variance model tends to ________. | L4 | overfit
8 | Document similarity in vector space models is often measured using ________ similarity. | L3 | cosine
9 | Classification with more than two classes is called ________ classification. | L3 | multi-class
10 | One method of document representation is the ________ weighting scheme. | L3 | TF-IDF
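For revision, questions 3 and 8 above can be tied together in a minimal sketch of Rocchio-style classification: each class is summarized by the centroid (average vector) of its documents, and a new document goes to the class whose centroid is most similar. Cosine similarity is used here as one common choice; the vocabulary, the term weights, and the function names are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def rocchio_classify(doc, training):
    """Assign doc to the class whose centroid is most similar to it."""
    centroids = {label: centroid(vecs) for label, vecs in training.items()}
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))

# Hypothetical term weights over the vocabulary [goal, match, vote, election]
training = {
    "sports": [[3, 2, 0, 0], [2, 3, 0, 1]],
    "politics": [[0, 0, 3, 2], [1, 0, 2, 3]],
}
label = rocchio_classify([1, 2, 0, 0], training)
print(label)
```

In practice the vectors would usually be TF-IDF weights (question 10) rather than the raw numbers assumed here.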
🔹 MULTIPLE CHOICE QUESTIONS (WITH LEVELS & ANSWERS)
No. | Question | Options | Level | Answer
1 | Which of the following is used to represent a document in vector space models? | a) Hash maps b) Decision trees c) Vectors d) Graphs | L3 | c) Vectors
2 | In text classification, feature selection is important because: | a) Adds features b) Removes noise c) Increases size d) Adds complexity | L4 | b) Removes noise
3 | Which classifier uses the average vector of documents in each class? | a) Naive Bayes b) Rocchio c) k-NN d) Decision Tree | L3 | b) Rocchio
4 | Which similarity measure is commonly used in vector space classification? | a) Hamming b) Cosine c) Manhattan d) Euclidean | L3 | b) Cosine
5 | The bias-variance tradeoff aims to achieve: | a) Low bias & variance b) High bias & variance | L4 | a) Low bias & variance
6 | In k-NN classification, ‘k’ refers to: | a) Features b) Classes c) Neighbors d) Documents | L3 | c) Neighbors
7 | Which classifier uses a straight-line decision boundary? | a) Naive Bayes b) Linear c) Decision Tree d) Neural Network | L3 | b) Linear
8 | Multi-class classification involves: | a) Only binary b) One-vs-all c) Clustering d) None | L4 | b) One-vs-all
9 | Which is NOT a document representation method? | a) TF-IDF b) Bag of Words c) Cosine Similarity d) Word Embeddings | L4 | c) Cosine Similarity
10 | A high variance model often performs well on training data but poorly on new data because: | a) It has low complexity b) It generalizes well c) It overfits d) It underfits | L4 | c) It overfits
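As a recap of question 6, the role of ‘k’ in k-NN can be illustrated with a small sketch: the query takes the majority label among its k nearest labeled points. Euclidean distance, the sample points, and the name `knn_classify` are assumptions made for this example, not part of the question set.

```python
import math
from collections import Counter

def knn_classify(query, labeled_points, k=3):
    """Return the majority label among the k points closest to query."""
    by_distance = sorted(labeled_points, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with class labels
points = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((5, 5), "B"), ((6, 5), "B"),
]
near_a = knn_classify((1.5, 1.5), points, k=3)
near_b = knn_classify((5, 6), points, k=3)
print(near_a, near_b)
```

A small k follows the training data closely (higher variance), while a large k smooths the decision (higher bias), which links this sketch to the bias-variance questions above.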
UNIT-IV
✅ Answers
🎯 Bloom’s Taxonomy Levels (L3: Applying, L4: Analyzing, L5: Evaluating)
🔹 FILL IN THE BLANKS (WITH ANSWERS & LEVELS)
No. | Question | Level | Answer
1 | In the linearly separable case, Support Vector Machines try to maximize the ________ between classes. | L3 | margin
2 | SVMs are effective for text classification because they handle high ________ spaces well. | L4 | dimensional
3 | In machine learning for information retrieval, ranking functions are learned from ________ data. | L3 | training
4 | Flat clustering generates ________ sets of documents. | L3 | non-overlapping
5 | K-means clustering aims to minimize the total within-cluster ________. | L4 | variance
6 | In hierarchical clustering, clusters are organized into a ________ structure. | L3 | tree
7 | Single-link clustering uses the ________ distance between members of two clusters. | L4 | minimum
8 | Complete-link clustering considers the ________ distance between members of two clusters. | L4 | maximum
9 | Model-based clustering uses statistical models like ________ mixtures. | L4 | Gaussian
10 | In clustering evaluation, metrics like purity and ________ score are used. | L5 | F1
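Questions 7 and 8 above differ only in which pair of members sets the distance between two clusters. The sketch below computes both on a toy example, assuming Euclidean distance and hypothetical points on a line; the function names are illustrative.

```python
import math
from itertools import product

def single_link(cluster_a, cluster_b):
    """Distance between the closest pair of members (minimum)."""
    return min(math.dist(a, b) for a, b in product(cluster_a, cluster_b))

def complete_link(cluster_a, cluster_b):
    """Distance between the farthest pair of members (maximum)."""
    return max(math.dist(a, b) for a, b in product(cluster_a, cluster_b))

# Hypothetical clusters laid out along the x-axis
left = [(0, 0), (1, 0)]
right = [(3, 0), (5, 0)]
print(single_link(left, right))    # closest pair: (1, 0) and (3, 0)
print(complete_link(left, right))  # farthest pair: (0, 0) and (5, 0)
```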
🔹 MULTIPLE CHOICE QUESTIONS (WITH ANSWERS & LEVELS)
No. | Question | Options | Level | Answer
1 | What is the main goal of Support Vector Machines in linearly separable cases? | a) Maximize accuracy b) Minimize loss c) Maximize margin d) Reduce cost | L3 | c) Maximize margin
2 | Which property makes SVM suitable for text classification? | a) Low runtime b) Simplicity c) Handles high dimensions well d) Needs little data | L4 | c) Handles high dimensions well
3 | Flat clustering is best described as: | a) Nested b) Non-hierarchical c) Probabilistic d) Incremental | L3 | b) Non-hierarchical
4 | In K-means, what determines the quality of clustering? | a) Number of features b) Inter-cluster distance c) Intra-cluster similarity d) Random initialization | L4 | c) Intra-cluster similarity
5 | What is a limitation of K-means clustering? | a) Handles non-linear data well b) Sensitive to initial centroids c) Requires no parameters d) Good for hierarchical data | L4 | b) Sensitive to initial centroids
6 | Which clustering algorithm produces a dendrogram? | a) K-means b) Flat clustering c) Hierarchical clustering d) SVM | L3 | c) Hierarchical clustering
7 | Which distance metric does single-link clustering use? | a) Average distance b) Maximum distance c) Minimum distance d) Centroid distance | L4 | c) Minimum distance
8 | In complete-link clustering, the distance between clusters is based on: | a) Closest pair b) Farthest pair c) Average pair d) Random pair | L4 | b) Farthest pair
9 | Model-based clustering methods assume: | a) Uniform distributions b) No structure c) Underlying data models d) Manual labeling | L4 | c) Underlying data models
10 | In text retrieval, machine learning is applied in ad hoc retrieval to: | a) Retrieve static lists b) Learn ranking functions c) Remove stopwords d) Create ontologies | L3 | b) Learn ranking functions
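The answer to question 5 (sensitivity to initial centroids) is easier to remember with a concrete case. The sketch below is a minimal K-means loop, assuming Euclidean distance and four hypothetical points at the corners of a wide rectangle; it is run twice with different starting centroids and reports the within-cluster sum of squares each time. The function name and data are illustrative only.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update; return centroids and WCSS."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    wcss = sum(math.dist(p, centroids[i]) ** 2
               for i, cluster in enumerate(clusters) for p in cluster)
    return centroids, wcss

# Hypothetical data: corners of a 4 x 1 rectangle
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, good_wcss = kmeans(points, [(0, 0), (4, 0)])  # splits left / right
_, poor_wcss = kmeans(points, [(2, 0), (2, 1)])  # stuck on bottom / top
print(good_wcss, poor_wcss)
```

Both runs converge, but the second settles into a worse local optimum, which is why K-means is often restarted from several initializations.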
UNIT-V
🎯 Bloom’s Taxonomy Levels marked as L3 (Applying), L4 (Analyzing), and L5 (Evaluating)
🧠 Questions at easy to moderate difficulty level
🔹 FILL IN THE BLANKS (WITH LEVELS & ANSWERS)
No. | Question | Level | Answer
1 | The web search industry primarily uses ________ as its economic model. | L3 | advertising
2 | The process of identifying and filtering out near-duplicate documents is called ________. | L4 | shingling
3 | The search engine component that retrieves web pages for indexing is called a ________. | L3 | crawler
4 | PageRank is a link analysis algorithm that assigns importance based on the number and quality of ________. | L4 | backlinks
5 | The web can be represented as a ________, with pages as nodes and hyperlinks as edges. | L3 | graph
6 | ________ servers are used to analyze the structure and connectivity of the web. | L4 | connectivity
7 | The algorithm that considers both hub and authority scores is known as ________. | L4 | HITS (or Hubs and Authorities)
8 | Web indexes are often ________ to improve search speed and availability. | L4 | distributed
9 | Estimating index size helps determine the amount of ________ needed for storage and processing. | L3 | memory (or space)
10 | A user’s interaction with a search engine, from query to result selection, defines the ________. | L3 | search user experience
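To make question 2 concrete, here is a minimal sketch of shingling for near-duplicate detection: each document becomes a set of overlapping word k-grams (shingles), and two documents are compared by the Jaccard overlap of those sets. Word-level shingles with k = 3, the sample sentences, and the function names are assumptions for illustration.

```python
def shingles(text, k=3):
    """Set of overlapping k-word shingles from the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard coefficient: shared shingles over all distinct shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical documents differing in a single word
doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"
similarity = jaccard(shingles(doc_a), shingles(doc_b))
print(similarity)
```

A pair would be flagged as near-duplicate when the similarity exceeds a chosen threshold; the threshold itself is a tuning decision not specified in the question set.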
🔹 MULTIPLE CHOICE QUESTIONS (WITH LEVELS & ANSWERS)
No. | Question | Options | Level | Answer
1 | Which economic model supports most web search engines? | a) Subscription b) Licensing c) Advertising d) Donations | L3 | c) Advertising
2 | What technique is used to detect near-duplicate web documents? | a) Hashing b) Tokenizing c) Shingling d) Chunking | L4 | c) Shingling
3 | What is the function of a web crawler? | a) Rank pages b) Fetch pages c) Store queries d) Optimize ads | L3 | b) Fetch pages
4 | PageRank assigns high scores to pages that are: | a) Frequently updated b) Linked by important pages | L4 | b) Linked by important pages
5 | What is the HITS algorithm used for? | a) Indexing b) Duplicate detection c) Link analysis d) Caching | L4 | c) Link analysis
6 | In a web graph, the nodes represent: | a) Keywords b) Queries c) Pages d) Servers | L3 | c) Pages
7 | Distributed indexes in web search help to: | a) Avoid duplicates b) Improve ranking c) Speed up search | L3 | c) Speed up search
8 | What role does a connectivity server play in web search architecture? | a) Stores metadata b) Analyzes link structure | L4 | b) Analyzes link structure
9 | The process of web crawling involves: | a) Filtering results b) Storing ads c) Visiting and downloading pages | L3 | c) Visiting and downloading pages
10 | Which of the following enhances the search user experience? | a) Irrelevant ads b) Slow loading c) Accurate and fast results | L5 | c) Accurate and fast results
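Question 4 (pages linked by important pages score highly) can be checked on a tiny web graph. The sketch below runs a basic PageRank power iteration; the damping factor of 0.85, the iteration count, and the four-page graph are assumptions for illustration, and every linked page is assumed to appear as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively redistribute rank along outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a base share from random jumps
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web graph: A, B and D all link to C; C links back to A
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
top_page = max(rank, key=rank.get)
print(top_page)
```

Page A has only one backlink, yet it outranks B and D because that link comes from the highly ranked page C.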