Chapter: Introduction to Data Science
1. Which of the following can be used to create sub-samples
using a maximum dissimilarity approach?
A. minDissim
B. maxDissim
C. inmaxDissim
D. All of the Mentioned
Answer» B. maxDissim
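A maximum dissimilarity approach can be sketched as a greedy loop: starting from a seed set, repeatedly add the candidate whose nearest already-selected point is farthest away. The Python function below is a hypothetical illustration of that idea only; its name, signature, and the use of Euclidean distance are assumptions and do not reproduce the maxDissim interface.

```python
import math

def max_dissim(pool, start, n):
    """Greedily pick n indices from pool (hypothetical helper).

    Each pick maximizes the minimum Euclidean distance to the
    points that have already been selected.
    """
    selected = list(start)
    remaining = [i for i in range(len(pool)) if i not in selected]
    picked = []
    for _ in range(n):
        best = max(
            remaining,
            key=lambda i: min(math.dist(pool[i], pool[j]) for j in selected),
        )
        picked.append(best)
        selected.append(best)
        remaining.remove(best)
    return picked

# Made-up 2-D points; index 0 is the seed.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 4.0), (2.0, 2.0)]
chosen = max_dissim(points, start=[0], n=2)
```

The near-duplicate point at index 1 is never chosen because it adds almost no new information relative to the seed.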
2. Which of the following can be used to impute data
sets based only on information in the training set?
A. postprocess
B. preProcess
C. process
D. All of the Mentioned
Answer» B. preProcess
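The key point of the question is that imputation statistics come from the training set alone and are then reused on new data. A minimal Python sketch of that idea follows, using per-column medians as one simple choice of imputation; the helper names and the data are hypothetical and are not the preProcess API.

```python
from statistics import median

def fit_median_imputer(train_rows):
    """Learn one median per column from the training rows, ignoring None."""
    n_cols = len(train_rows[0])
    return [
        median(r[c] for r in train_rows if r[c] is not None)
        for c in range(n_cols)
    ]

def apply_imputer(rows, medians):
    """Fill None entries with the medians learned from the training set."""
    return [[medians[c] if v is None else v for c, v in enumerate(r)] for r in rows]

train = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 40.0]]
test = [[None, None], [2.0, 30.0]]

medians = fit_median_imputer(train)        # learned from train only
test_filled = apply_imputer(test, medians)  # test values never influence the medians
```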
3. Which of the following models includes a
backwards elimination feature selection routine?
A. MCV
B. MARS
C. MCRS
D. All of the Mentioned
Answer» B. MARS
4. Which of the following is a performance metric for a categorical outcome?
A. RMSE
B. RSquared
C. Accuracy
D. All of the Mentioned
Answer» C. Accuracy
5. Which of the following functions provides
unsupervised prediction?
A. cl_forecast
B. cl_nowcast
C. cl_precast
D. None of the Mentioned
Answer» D. None of the Mentioned
6. What is true about Machine Learning?
A. Machine Learning (ML) is a field of computer science
B. ML is a type of artificial intelligence that extracts patterns out of raw data by using an algorithm or method.
C. The main focus of ML is to allow computer systems to learn from experience without being explicitly programmed or human intervention.
D. All of the above
Answer» D. All of the above
7. ML is a field of AI consisting of learning algorithms
that?
A. Improve their performance
B. At executing some task
C. Over time with experience
D. All of the above
Answer» D. All of the above
8. p → ¬q is not a?
A. hack clause
B. horn clause
C. structural clause
D. system clause
Answer» B. horn clause
9. The action _______ of a robot arm specifies placing
block A on block B.
A. STACK(A,B)
B. LIST(A,B)
C. QUEUE(A,B)
D. ARRAY(A,B)
Answer» A. STACK(A,B)
10. A __________ begins by hypothesizing a sentence (the
symbol S) and successively predicting lower level
constituents until individual preterminal symbols are
written.
A. bottom-up parser
B. top parser
C. top-down parser
D. bottom parser
Answer» C. top-down parser
11. A model of language consists of categories that
do not include ________.
A. System Unit
B. structural units.
C. data units
D. empirical units
Answer» B. structural units.
12. Different learning methods do not include?
A. Introduction
B. Analogy
C. Deduction
D. Memorization
Answer» A. Introduction
13. Training a model with all of the data in one single
batch is known as?
A. Batch learning
B. Offline learning
C. Both A and B
D. None of the above
Answer» C. Both A and B
14. Which of the following are ML methods?
A. based on human supervision
B. supervised Learning
C. semi-reinforcement Learning
D. All of the above
Answer» A. based on human supervision
15. In model-based learning methods, an iterative process
takes place on the ML models that are built based on
various model parameters, called?
A. mini-batches
B. optimized parameters
C. hyperparameters
D. superparameters
Answer» C. hyperparameters
16. Which of the following is a widely used and effective
machine learning algorithm based on the idea of
bagging?
A. Decision Tree
B. Regression
C. Classification
D. Random Forest
Answer» D. Random Forest
17. To find the minimum or the maximum of a function,
we set the gradient to zero because:
A. The value of the gradient at extrema of a function is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
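For a differentiable function, the gradient vanishes at an interior minimum or maximum, which is why setting it to zero yields candidate extrema. The Python sketch below checks this numerically for a made-up quadratic; the function names and the step size are assumptions chosen for illustration. It also shows that a zero gradient alone does not guarantee an extremum, so candidates still need to be checked.

```python
def grad(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    # Made-up example: minimum at x = 3, since f'(x) = 2(x - 3).
    return (x - 3.0) ** 2 + 1.0

def g(x):
    # Zero derivative at x = 0, yet no minimum or maximum there.
    return x ** 3

x_star = 3.0                 # solves f'(x) = 0
slope_at_min = grad(f, x_star)
slope_at_flat = grad(g, 0.0)
```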
18. Which of the following is a disadvantage of decision
trees?
A. Factor analysis
B. Decision trees are robust to outliers
C. Decision trees are prone to overfitting
D. None of the above
Answer» C. Decision trees are prone to overfitting
19. How do you handle missing or corrupted data in a
dataset?
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
Answer» D. All of the above
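The three options can be sketched side by side in plain Python. The records, field names, and the "Missing" label below are made up for illustration; which strategy is appropriate depends on the dataset.

```python
from statistics import mean

rows = [
    {"age": 30, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 50, "city": None},
]

# Option A: drop every row that has a missing value.
dropped = [r for r in rows if all(v is not None for v in r.values())]

# Option B: replace a missing numeric value with the column mean.
age_mean = mean(r["age"] for r in rows if r["age"] is not None)
mean_filled = [{**r, "age": age_mean if r["age"] is None else r["age"]} for r in rows]

# Option C: give missing categorical values their own category.
cat_filled = [{**r, "city": "Missing" if r["city"] is None else r["city"]} for r in rows]
```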
20. When performing regression or classification, which
of the following is the correct way to preprocess the
data?
A. Normalize the data -> PCA -> training
B. PCA -> normalize PCA output -> training
C. Normalize the data -> PCA -> normalize PCA output -> training
D. None of the above
Answer» A. Normalize the data -> PCA -> training
21. Which of the following statements about
regularization is correct?
A. Using too large a value of lambda can cause your hypothesis to underfit the data.
B. Using too large a value of lambda can cause your hypothesis to overfit the data.
C. Using a very large value of lambda cannot hurt the performance of your hypothesis.
D. None of the above
Answer» A. Using too large a value of lambda can cause your hypothesis to underfit the data.
22. Which of the following techniques cannot be used for
normalization in text mining?
A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Answer» C. Stop Word Removal
23. In which of the following cases will K-means
clustering fail to give good results? 1) Data points
with outliers 2) Data points with different densities 3)
Data points with nonconvex shapes
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Answer» D. All of the above
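The sensitivity to outliers follows from the fact that each K-means centroid is the mean of its assigned points. A small Python sketch with made-up coordinates shows how a single distant point drags a centroid away from the bulk of its cluster; the helper name is hypothetical.

```python
def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
with_outlier = cluster + [(50.0, 50.0)]

c_clean = centroid(cluster)         # sits in the middle of the four points
c_pulled = centroid(with_outlier)   # pulled far toward the single outlier
```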
24. Which of the following is a reasonable way to select
the number of principal components "k"?
A. Choose k to be the smallest value so that at least 99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C. Choose k to be the largest value so that 99% of the variance is retained.
D. Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is
retained.
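The rule in option A amounts to accumulating the per-component variances in descending order and stopping at the first k whose cumulative share reaches the threshold. The Python sketch below assumes the component variances are already available; the function name and the numbers are made up for illustration.

```python
def smallest_k(variances, threshold=0.99):
    """Smallest k whose leading components retain >= threshold of total variance."""
    total = sum(variances)
    running = 0.0
    for k, v in enumerate(sorted(variances, reverse=True), start=1):
        running += v
        if running / total >= threshold:
            return k
    return len(variances)

# Hypothetical variances of five principal components.
component_variances = [6.0, 3.0, 0.95, 0.04, 0.01]
k = smallest_k(component_variances)
```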
25. What is a sentence parser typically used for?
A. It is used to parse sentences to check if they are utf-8 compliant.
B. It is used to parse sentences to derive their most likely syntax tree structures.
C. It is used to parse sentences to assign POS tags to all tokens.
D. It is used to check if sentences can be parsed into meaningful tokens.
Answer» B. It is used to parse sentences to derive their most likely syntax tree structures.
26. Data Analysis is a process of?
A. inspecting data
B. cleaning data
C. transforming data
D. All of the above
Answer» D. All of the above
27. Which of the following is not a major data analysis
approach?
A. Data Mining
B. Predictive Intelligence
C. Business Intelligence
D. Text Analytics
Answer» B. Predictive Intelligence
28. How many main statistical methodologies are used in
data analysis?
A. 2
B. 3
C. 4
D. 5
Answer» A. 2
29. In descriptive statistics, data from the entire
population or a sample is summarized with ?
A. integer descriptors
B. floating descriptors
C. numerical descriptors
D. decimal descriptors
Answer» C. numerical descriptors
30. Data Analysis was defined by which statistician?
A. William S.
B. Hans Peter Luhn
C. Gregory Piatetsky-Shapiro
D. John Tukey
Answer» D. John Tukey
31. Which of the following is true about hypothesis
testing?
A. answering yes/no questions about the data
B. estimating numerical characteristics of the data
C. describing associations within the data
D. modeling relationships within the data
Answer» A. answering yes/no questions about the data
32. The goal of business intelligence is to allow easy
interpretation of large volumes of data to identify new
opportunities.
A. TRUE
B. FALSE
C. Can be true or false
D. Cannot say
Answer» A. TRUE
33. The branch of statistics which deals with development
of particular statistical methods is classified as
A. industry statistics
B. economic statistics
C. applied statistics
D. applied statistics
Answer» D. applied statistics
34. Which of the following is true about regression
analysis?
A. answering yes/no questions about the data
B. estimating numerical characteristics of the data
C. modeling relationships within the data
D. describing associations within the data
Answer» C. modeling relationships within the data
35. Text Analytics is also referred to as Text Mining.
A. TRUE
B. FALSE
C. Can be true or false
D. Cannot say
Answer» A. TRUE
36. What is true about Data Visualization?
A. Data Visualization is used to communicate information clearly and efficiently to users by the usage of information graphics such as tables and charts.
B. Data Visualization helps users in analyzing a large amount of data in a simpler way.
C. Data Visualization makes complex data more accessible, understandable, and usable.
D. All of the above
Answer» D. All of the above
37. Data can be visualized using?
A. graphs
B. charts
C. maps
D. All of the above
Answer» D. All of the above
38. Data visualization is also an element of the broader
_____________.
A. deliver presentation architecture
B. data presentation architecture
C. dataset presentation architecture
D. data process architecture
Answer» B. data presentation architecture
39. Which method shows hierarchical data in a nested
format?
A. Treemaps
B. Scatter plots
C. Population pyramids
D. Area charts
Answer» A. Treemaps
40. Which function is used for inference on one proportion using
the normal approximation?
A. fisher.test()
B. chisq.test()
C. Lm.test()
D. prop.test()
Answer» D. prop.test()
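The normal approximation behind a one-proportion test can be sketched directly: compare the observed proportion with the hypothesized one, scaled by the standard error under the null. The Python function below is an illustrative z-test with made-up counts, not a reimplementation of prop.test(), which by default applies a continuity correction and reports a chi-squared statistic.

```python
import math

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test for one proportion (normal approximation, no correction)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z = (p_hat - p0) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z, p = one_proportion_z_test(60, 100, 0.5)
```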
41. Which is used to find the factor congruence
coefficients?
A. factor.mosaicplot
B. factor.xyplot
C. factor.congruence
D. factor.cumsum
Answer» C. factor.congruence
42. Which of the following is a tool for checking normality?
A. qqline()
B. qline()
C. anova()
D. lm()
Answer» A. qqline()
43. Which of the following is false?
A. Data visualization includes the ability to absorb information quickly
B. Data visualization is another form of visual art
C. Data visualization decreases insights and leads to slower decisions
D. None of the above
Answer» C. Data visualization decreases insights and leads to slower decisions
44. Common use cases for data visualization include?
A. Politics
B. Sales and marketing
C. Healthcare
D. All of the above
Answer» D. All of the above
45. Which of the following plots are often used for
checking randomness in time series?
A. Autocausation
B. Autorank
C. Autocorrelation
D. None of the above
Answer» C. Autocorrelation
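An autocorrelation plot shows the sample autocorrelation at each lag; for a random series the values at nonzero lags should be close to zero, while a patterned series produces large values. The Python sketch below computes the lag-k coefficient for a made-up, strongly alternating series; the function name is an assumption for illustration.

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    m = sum(series) / n
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom

# A perfectly alternating (clearly non-random) series of length 20.
alternating = [1.0, -1.0] * 10
r1 = autocorr(alternating, 1)   # strongly negative
r2 = autocorr(alternating, 2)   # strongly positive
```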
Chapter: Introduction to Machine Learning
1. To find the minimum or the maximum of a function,
we set the gradient to zero because:
A. The value of the gradient at extrema of a function is always zero
B. Depends on the type of problem
C. Both A and B
D. None of the above
Answer» A. The value of the gradient at extrema of a function is always zero
2. Which of the following techniques cannot be used for
normalization in text mining?
A. Stemming
B. Lemmatization
C. Stop Word Removal
D. None of the above
Answer» C. Stop Word Removal
3. In which of the following cases will K-means
clustering fail to give good results? 1) Data points
with outliers 2) Data points with different densities 3)
Data points with nonconvex shapes
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. All of the above
Answer» D. All of the above
4. Which of the following is a reasonable way to select
the number of principal components "k"?
A. Choose k to be the smallest value so that at least 99% of the variance is retained.
B. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
C. Choose k to be the largest value so that 99% of the variance is retained.
D. Use the elbow method.
Answer» A. Choose k to be the smallest value so that at least 99% of the variance is
retained.
5. Which of the following is false?
A. Subsetting can be used to select and exclude variables and observations
B. Raw data should be processed only one time.
C. Merging concerns combining datasets on the same observations to produce a result with more variables
D. None Of the above
Answer» B. Raw data should be processed only one time.
1. What is the primary goal of data science?
A. Data Visualization
B. Data Cleaning
C. Predictive Analytics
D. Extracting Data from APIs
Answer: Option C
2.
Which programming language is commonly used for
Data Science tasks?
A. Java
B. Python
C. C++
D. JavaScript
Answer: Option B
3.
Which step in the Data Science process involves
understanding and preparing the data for analysis?
A. Data Collection
B. Data Visualization
C. Data Cleaning
D. Model Building
Answer: Option C
4.
What is the term for a data point that falls far from the
rest of the data in a dataset?
A. Outlier
B. Median
C. Mean
D. Variance
Answer: Option A
5.
Which of the following is NOT a type of machine
learning algorithm commonly used in Data Science?
A. Linear Regression
B. K-Means Clustering
C. Decision Trees
D. Object-Oriented Programming
Answer: Option D
6.
Which technology is often used to process and analyze
large-scale data sets in Data Science?
A. Hadoop
B. SQL
C. Python
D. HTML
Answer: Option A
7.
What does the acronym "EDA" stand for in Data
Science?
A. Exploratory Data Analysis
B. Effective Data Algorithms
C. Extracted Data Aggregation
D. Efficient Data Assessment
Answer: Option A
8.
Which of the following is NOT a key skill required for a
Data Scientist?
A. Data Visualization
B. Storytelling
C. Database Administration
D. Machine Learning
Answer: Option C
9.
Which step in the Data Science process involves
selecting the appropriate model and algorithm for
analysis?
A. Data Cleaning
B. Data Visualization
C. Data Collection
D. Model Building
Answer: Option D
10.
In Data Science, what is the term for a dataset that
contains both input features and output labels?
A. Test Data
B. Training Data
C. Validation Data
D. Unlabeled Data
Answer: Option B
A. Data Cleaning
B. Data Visualization
C. Data Analysis
D. Data Storage
12.
Which of the following is NOT a common data format
used in Data Science projects?
A. JSON
B. XML
C. CSV
D. HTML
13.
Which of the following is a technique used to handle
missing data in a dataset?
A. Data Augmentation
B. Data Imputation
C. Data Transformation
D. Data Normalization
14.
What is the primary purpose of data visualization in
Data Science?
A. To make data more complicated
B. To simplify complex data
C. To increase data complexity
D. To store data
15.
Which step in the Data Science process involves
building and training predictive models?
A. Data Collection
B. Data Visualization
C. Data Cleaning
D. Model Building
Answer: Option D
16.
What is the process of extracting patterns and
information from data called?
A. Data Wrangling
B. Data Visualization
C. Data Engineering
D. Data Mining
Answer: Option D
17.
Which statistical measure represents the middle value
of a dataset when it is sorted in ascending order?
A. Median
B. Mean
C. Standard Deviation
D. Mode
Answer: Option A
18.
In Data Science, what is the purpose of feature
engineering?
A. To extract features from data
B. To visualize data features
C. To clean data features
D. To model data features
Answer: Option A
19.
What is the term for a machine learning algorithm that
learns from historical data to make predictions about
the future?
A. Regression
B. Clustering
C. Classification
D. Supervised Learning
Answer: Option D
Solution:
Supervised Learning is the correct term for a machine learning algorithm that
learns from historical data to make predictions about the future. In supervised
learning, the algorithm is trained on a labeled dataset, where the input data is
paired with corresponding output labels. During the training process, the
algorithm learns the relationship between input features and output labels,
enabling it to make predictions on new, unseen data.
Regression is a specific type of supervised learning used for predicting
continuous values, such as predicting house prices or stock prices.
Clustering is an unsupervised learning technique used for grouping similar data
points together based on their features.
Classification is another type of supervised learning used for predicting discrete
categories or classes, such as spam detection or image recognition.
Therefore, Option D: Supervised Learning is the correct term for a machine
learning algorithm that learns from historical data to make predictions about the
future.
20.
Which of the following is an example of an
unsupervised learning algorithm used in clustering
data?
A. Linear Regression
B. K-Means Clustering
C. Decision Trees
D. Logistic Regression
Answer: Option B
Section 1 Section 2
1.
Point out the wrong statement.
A. Randomized studies are not used to identify causation
B. Complicated approaches exist for inferring causation
C. Causal relationships may not apply to every individual
D. All of the mentioned
Answer: Option A
2.
Which of the following techniques comes under practical
machine learning?
A. Bagging
B. Boosting
C. Forecasting
D. None of the mentioned
Answer: Option B
3.
Which of the following is a good way of performing
experiments in data science?
A. Measure variability
B. Generalize to the problem
C. Have Replication
D. All of the mentioned
Answer: Option D
4.
Point out the correct statement.
A. Raw data is the original source of data
B. Preprocessed data is the original source of data
C. Raw data is the data obtained after processing steps
D. None of the mentioned
Answer: Option A
5.
Which of the following is the most important thing in
data science?
A. answer
B. question
C. data
D. none of the mentioned
Answer: Option B
6.
Which of the following characteristics of big data is
of relatively greater concern to data science?
A. Velocity
B. Variety
C. Volume
D. None of the mentioned
Answer: Option B
7.
Which of the following is the common goal of statistical
modelling?
A. Inference
B. Summarizing
C. Subsetting
D. None of the mentioned
Answer: Option A
8.
Which of the following analytical capabilities are
provided by an information management company?
A. Stream Computing
B. Content Management
C. Information Integration
D. All of the mentioned
Answer: Option D
9.
Which of the following uses a relatively small amount of
data to make estimates about a bigger population?
A. Inferential
B. Exploratory
C. Causal
D. None of the mentioned
Answer: Option A
10.
Point out the correct statement.
A. If the equations are known but the parameters are not, they may be inferred with data analysis
B. If the equations are not known but the parameters are, they may be inferred with data analysis
C. If neither the equations nor the parameters are known, they may be inferred with data analysis
D. None of the mentioned
Answer: Option A
Section 2
31.
What is the term for the process of removing or
reducing noise and inconsistencies from data?
A. Data Integration
B. Data Transformation
C. Data Aggregation
D. Data Cleansing
Answer: Option D
32.
Which of the following best describes the purpose of
data sampling in Data Science?
A. To analyze the entire dataset
B. To select a representative subset
C. To visualize data
D. To calculate data statistics
Answer: Option B
33.
Which statistical measure represents the spread or
dispersion of data values in a dataset?
A. Median
B. Mean
C. Standard Deviation
D. Mode
Answer: Option C
34.
In Data Science, what is the term for a data point that is
missing a value for one or more features?
A. Outlier
B. Anomaly
C. Null Value
D. Feature
Answer: Option C
35.
What is the primary objective of data exploration in Data
Science?
A. To build predictive models
B. To find hidden patterns
C. To summarize data
D. To collect data
Answer: Option B
36.
Which type of data is represented by categories or
labels and cannot be measured numerically?
A. Numerical Data
B. Categorical Data
C. Continuous Data
D. Ordinal Data
Answer: Option B
37.
In Data Science, what is the purpose of data wrangling?
A. To create complex models
B. To transform raw data
C. To visualize data
D. To remove outliers
Answer: Option B
38.
What is the process of splitting a dataset into a training
set and a test set used for machine learning called?
A. Data Partitioning
B. Data Sampling
C. Data Splitting
D. Data Shuffling
Answer: Option A
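Partitioning a dataset can be sketched as shuffling the row indices once and holding out a fixed fraction for testing. The Python helper below is a hypothetical illustration; its name, the 25% test fraction, and the fixed seed are assumptions chosen so the split is reproducible.

```python
import random

def partition(rows, test_fraction=0.25, seed=0):
    """Split rows into disjoint training and test lists."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(round(len(rows) * test_fraction))
    test_idx = set(idx[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test

data = list(range(20))                 # stand-in for 20 observations
train, test = partition(data)
```

Every observation lands in exactly one of the two lists, so the model is never evaluated on rows it was trained on.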
39.
Which of the following is a common algorithm used for
classification in supervised learning?
A. K-Means Clustering
B. Decision Tree
C. Principal Component Analysis (PCA)
D. Naive Bayes
Answer: Option B
40.
What does the acronym "SQL" stand for in the context
of Data Science?
A. Structured Query Language
B. Statistical Query Language
C. Simplified Query Language
D. Secure Query Language
Answer: Option A