Q1
Which of the following refers to the problem of finding abstracted patterns (or structures) in
unlabeled data?
● Answer: Unsupervised learning
Q2
Which of the following is an essential process in which intelligent methods are applied to extract
data patterns?
● Answer: Data Mining
Q3
Suppose one wants to predict the number of newborns from the size of the stork population by
performing supervised learning. Which type of task is this?
● Answer: Regression
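A minimal sketch of why this is regression: the target (number of newborns) is a continuous quantity predicted from a numeric input. The function name `fit_line` and the data points are made-up illustrations, not figures from the quiz.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical toy data: stork population size vs. number of newborns.
storks = [1, 2, 3, 4]
newborns = [2, 4, 6, 8]
intercept, slope = fit_line(storks, newborns)
print(intercept, slope)  # 0.0 2.0
```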
Q4
Euclidean distance measure can also be defined as
● Answer: The distance between two points as calculated using the Pythagoras theorem
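The Pythagorean calculation named in the answer can be sketched as follows. The function name `euclidean_distance` is an illustrative choice: it squares the coordinate differences, sums them, and takes the square root.

```python
import math

def euclidean_distance(p, q):
    """Return the straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# A 3-4-5 right triangle: the legs are 3 and 4, so the hypotenuse is 5.
print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```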
Q5
Which of the following is the first step in the knowledge discovery process?
● Answer: Data Cleaning
Q6
Identify the correct option which defines Data Mart.
● Answer: A subgroup of data warehouse
Q7
Choose the incorrect property of the data warehouse.
● Answer: Volatile
Q8
The time horizon in a Data Warehouse is usually
● Answer: 5-10 years
Q9
The proportion of transactions supporting X in T is called
● Answer: Support
Q10
The _______ step eliminates the extensions of (k-1)-itemsets that are not found to be frequent from
being considered for support counting.
● Answer: Pruning
Q11
The goal of _______ is to discover both the dense and sparse regions of a data set.
● Answer: Clustering
Q12
Which of the following technologies is not well suited for data mining?
● Answer: Expert System technology
Q13
The star schema is composed of _______ fact table.
● Answer: 1
Q14
_______ is data about data.
● Answer: Metadata
Q15
The second stage of the Apriori algorithm is
● Answer: Candidate generation
Q16
Each neuron is made up of a number of nerve fibers called
● Answer: Dendrites
Q17
Which of the following processes is not involved in the data mining process?
● Answer: Data archaeology
Q18
Which statement given below most closely defines the term data selection?
● Answer: The selection of correct data for the process of Knowledge Discovery in Databases
Q19
Classification rules are extracted from
● Answer: Decision tree
Q20
Extreme values that occur infrequently are called
● Answer: Outliers
Q21
Incorrect or invalid data is known as
● Answer: Noisy data
Q22
If T consists of 500,000 transactions, 20,000 transactions contain bread, 30,000 transactions contain
jam, and 10,000 transactions contain both bread and jam. Then the support of bread and jam is
● Answer: 2%
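The arithmetic behind this answer is a single division: the number of transactions containing both items over the total number of transactions. A minimal sketch, with `support` as an illustrative function name:

```python
def support(count_itemset, total_transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_itemset / total_transactions

total = 500_000          # transactions in T
bread_and_jam = 10_000   # transactions containing both bread and jam
print(f"{support(bread_and_jam, total):.0%}")  # 2%
```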
Q23
The proportion of transactions supporting X in T is called
● Answer: Support
Q24
In a feed-forward network, the connections between layers are _______ from input to output.
● Answer: Unidirectional
Q25
In web mining, _______ is used to know which URLs tend to be requested together.
● Answer: Associations
Q1
What is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of
management decisions?
● Answer: Data Warehousing
Reason: This is the classic definition of a data warehouse: its data is organized by subject,
integrated from multiple sources, kept with a time dimension, and not overwritten once loaded.
Q2
The data is stored, retrieved, and updated in
● Answer: OLTP
Reason: OLTP (Online Transaction Processing) systems manage day-to-day transactions efficiently,
covering storage, retrieval, and updates.
Q3
Removing duplicate records is a process called
● Answer: Data cleansing
Reason: "Data cleaning" and "data cleansing" are used interchangeably; both refer to detecting and
removing duplicate, inconsistent, or incorrect records.
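As a rough sketch of duplicate removal, the function below keeps the first occurrence of each record and drops exact repeats. The name `drop_duplicates` and the sample records are hypothetical, and real cleansing usually also has to handle near-duplicates such as spelling variants.

```python
def drop_duplicates(records):
    """Return records in their original order with exact duplicates removed."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique

# Hypothetical customer records as (id, name) tuples.
customers = [(1, "Ann"), (2, "Bob"), (1, "Ann"), (3, "Cy")]
print(drop_duplicates(customers))  # [(1, 'Ann'), (2, 'Bob'), (3, 'Cy')]
```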
Q4
Which of the following refers to the problem of finding abstracted patterns (or structures) in
unlabeled data?
● Answer: Unsupervised learning
Reason: Unsupervised learning involves working with unlabeled data to identify hidden patterns or
groupings.
Q5
What is KDD in data mining?
● Answer: Knowledge Discovery in Databases
Reason: KDD stands for Knowledge Discovery in Databases, a process for extracting valuable
knowledge from large datasets.
Q6
Classification rules are extracted from
● Answer: Decision tree
Reason: Decision trees are commonly used to derive classification rules by splitting data based on feature
values.
Q7
In ______, the groups are not predefined.
● Answer: Clustering
Reason: Clustering is an unsupervised technique where groups or clusters are formed without predefined
labels.
Q8
Multiple numbers of data sources get combined in which step of the Knowledge Discovery process?
● Answer: Data Integration
Reason: Data integration is the step where data from different sources is combined into a coherent data
repository.
Q9
If T consists of 500,000 transactions, 20,000 transactions contain bread, 30,000 transactions contain
jam, and 10,000 transactions contain both bread and jam, what is the confidence of buying bread
with jam?
● Answer: 50%
Reason: For the rule bread -> jam, confidence = (transactions containing both bread and jam) /
(transactions containing bread) = 10,000 / 20,000 = 50%.
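The calculation can be sketched as below, with `confidence` as an illustrative function name. Note that confidence depends on the rule's direction: with jam as the antecedent, the same data gives 10,000 / 30,000, about 33%.

```python
def confidence(count_both, count_antecedent):
    """Confidence of a rule: P(consequent | antecedent), estimated from counts."""
    return count_both / count_antecedent

bread = 20_000   # transactions containing bread
jam = 30_000     # transactions containing jam
both = 10_000    # transactions containing bread and jam

print(f"bread -> jam: {confidence(both, bread):.0%}")  # bread -> jam: 50%
print(f"jam -> bread: {confidence(both, jam):.0%}")    # jam -> bread: 33%
```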
Q10
The step that eliminates the extensions of (k-1)-itemsets, which are not frequent, from being
considered for counting support is called
● Answer: Pruning
Reason: Pruning in the Apriori algorithm discards candidate k-itemsets that have an infrequent
(k-1)-subset, so their support never needs to be counted.
Q11
________ describes the data contained in the data warehouse.
● Answer: Metadata
Reason: Metadata provides detailed information about the structure, contents, and organization of data in
a data warehouse.
Q12
The star schema is composed of ______ fact table.
● Answer: One
Reason: In a star schema, there is a central fact table surrounded by dimension tables.
Q13
________ is a good alternative to the star schema.
● Answer: Snowflake schema
Reason: A snowflake schema normalizes dimension tables, unlike the denormalized star schema.
Q14
Rule-based classification algorithms generate ______ rules to perform the classification.
● Answer: If-then
Reason: Rule-based classifiers create "if-then" rules for assigning classes based on attribute values.
Q15
Records cannot be updated in
● Answer: Data warehouse
Reason: Data warehouses are designed for analysis and reporting, not for transactional updates.
Q16
Data warehouse contains ______ data that is never found in the operational environment.
● Answer: Summary
Reason: A data warehouse stores summarized (aggregated) data derived from detailed records; operational
systems hold only the current detailed data.
Q17
In customer relationship management, we can detect outlier customers using
● Answer: Contextual outlier detection
Reason: Contextual outlier detection is specifically used to identify anomalies within specific contexts,
such as customer behavior.
Q18
Which two-step process does the Apriori algorithm follow?
● Answer: Join and Prune
Reason: The Apriori algorithm joins itemsets to create candidate itemsets and prunes non-frequent ones.
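The join and prune steps can be sketched as follows. This is a simplified illustration, not the textbook implementation: the join here takes unions of pairs rather than the usual prefix-based join, and the name `apriori_gen` and the sample itemsets are assumptions made for the example.

```python
from itertools import combinations

def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    frequent_prev = set(frequent_prev)
    # Join step: combine pairs of (k-1)-itemsets whose union has exactly k items.
    joined = {a | b for a in frequent_prev for b in frequent_prev if len(a | b) == k}
    # Prune step: keep a candidate only if every (k-1)-subset is frequent.
    return {
        c for c in joined
        if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))
    }

# Hypothetical frequent 2-itemsets.
frequent_2 = {
    frozenset(p)
    for p in [("bread", "jam"), ("bread", "milk"), ("jam", "milk"), ("jam", "tea")]
}
candidates_3 = apriori_gen(frequent_2, 3)
print([sorted(c) for c in candidates_3])  # [['bread', 'jam', 'milk']]
```

In this example the join produces three candidates, but {bread, jam, tea} and {jam, milk, tea} are pruned because {bread, tea} and {milk, tea} are not frequent.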
Q19
Assume the minimum support is 60% and the database contains 5 transactions. Find the minimum
support count.
● Answer: 3
Reason: Minimum support count = (minimum support) x (total transactions) = 0.6 x 5 = 3.
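A small sketch of this conversion from a percentage threshold to a transaction count. The function name `min_support_count` is illustrative, and rounding up when the product is not a whole number is an assumption (an itemset must appear in at least that fraction of transactions).

```python
def min_support_count(min_support_percent, n_transactions):
    """Smallest number of transactions an itemset needs to meet the threshold."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-min_support_percent * n_transactions // 100)

print(min_support_count(60, 5))  # 3
```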
Q20
It was shown that the Naive Bayesian method
● Answer: Can be almost optimal only when the attributes are independent
Reason: Naive Bayes assumes the attributes are conditionally independent given the class, so it is most effective when that assumption holds.
Q21
The terms equality and roll-up are associated with
● Answer: OLAP
Reason: Roll-up is an OLAP operation used to aggregate data along a hierarchy.
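To illustrate roll-up, the sketch below aggregates a measure from the city level up to the country level of a location hierarchy. The function name `roll_up` and the sales rows are hypothetical.

```python
from collections import defaultdict

def roll_up(rows, key_index, measure_index):
    """Sum the measure column, grouped by the column at the coarser level."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[measure_index]
    return dict(totals)

# Hypothetical rows: (country, city, units_sold).
sales = [("Canada", "Toronto", 10), ("Canada", "Vancouver", 5), ("USA", "Chicago", 7)]
print(roll_up(sales, 0, 2))  # {'Canada': 15, 'USA': 7}
```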
Q22
________ are highly simplified models of biological neurons.
● Answer: Artificial Neurons
Reason: Artificial neurons in artificial neural networks mimic the structure and function of biological
neurons.
Q23
OLAP stands for
● Answer: Online Analytical Processing
Reason: OLAP is designed for analyzing multidimensional data interactively.
Q24
The human brain consists of a network of
● Answer: Neurons
Reason: The brain’s network consists of interconnected neurons that transmit signals.
Q25
The time horizon in a Data Warehouse is usually
● Answer: 5-10 years
Reason: Data warehouses typically maintain data over a longer time horizon to support historical
analysis.