SRM TRP Engineering College
Department of Computer Science and Engineering
Question Bank for AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS
Part A – Short Answer Questions
1. List down any five skills required for a data analyst.
2. Outline the significance of Exploratory Data Analysis (EDA).
3. Tabulate the differences between univariate, bivariate, and multivariate analysis. Give an
example.
4. Give an example of a dataset with a non-Gaussian distribution.
5. Explain the term 'Normal Distribution'.
6. Briefly describe Type I and Type II errors in statistics. Identify the relationship between
standard error and margin of error.
7. Assuming the null hypothesis is correct, what does it mean when p-values are high
and low?
8. Define the term one-factor ANOVA.
9. Outline a few approaches to detect outliers. Explain different ways to deal with them.
10. Give an approach to handle missing values in a dataset.
Part B – Long Answer Questions
11. a) Briefly explain the role of exploratory data analysis in dataset analysis and the knowledge discovery process.
Or
11. b) i) Outline the purpose of data cleansing. How are missing and nullified data attributes
handled and modified during the preprocessing stage? (6)
ii) Explain the Data Analytics life cycle. Briefly describe Regression Analysis. (7)
12. a) i) Indicate whether each of the following distributions is positively or negatively skewed.
(1) Incomes of taxpayers have a mean of $48,000 and a median of $43,000.
(2) GPAs for all students at some college have a mean of 3.01 and a median of 3.20. (6)
ii) Consider the following number of online examination attempts by 15 students: 2, 17, 5, 3,
28, 7, 5, 8, 5, 6, 2, 12, 10, 4, 3.
(1) Find the mode, median, and mean for these data.
(2) Draw the distribution and state whether it is balanced, positively skewed, or negatively skewed. (7)
Or
12. b) i) Assume that SAT math scores approximate a normal curve with a mean of 500 and a
standard deviation of 100. Sketch a normal curve and shade in the target areas described by each
of the following statements:
• More than 570
• Less than 515
• Between 520 and 540. (5)
ii) Assume that the burning times of electric light bulbs approximate a normal curve with a
mean of 1200 hours and a standard deviation of 120 hours. If a large number of new lights are
installed at the same time (possibly along a newly opened freeway), at what time will:
• 1 percent fail? (2)
• 50 percent fail? (2)
• 95 percent fail? (4)
13. a) Among 100 couples who had undergone marital counseling, 60 couples described their
relationships as improved, and among this latter group, 45 couples had children. The remaining
couples described their relationships as unimproved, and among this group, 5 couples had
children.
(1) What is the probability of randomly selecting a couple who described their relationship as
improved? (4)
(2) What is the probability of randomly selecting a couple with children? (4)
(3) What is the conditional probability of randomly selecting a couple with children, given that
their relationship was described as improved? (5)
Or
13. b) State any two reasons why the research hypothesis is not tested directly. Explain them in
brief.
14. a) Explain ANOVA in detail with an example.
Or
14. b) The F test describes the ratio of two sources of variability: that for subjects treated
differently and that for subjects treated similarly. Is there any sense in which the t test for two
independent groups can be viewed likewise?
15. a) Define autocorrelation. How is it calculated? What does a negative correlation
convey?
Or
15. b) What is the philosophy of logistic regression? What kind of model is it? What does logistic
regression predict? Tabulate the cardinal differences between linear and logistic regression.
Part C – Essay Questions (1 x 15 = 15 marks)
16. a) Explain populations and samples and the difference between them.
Or
16. b) Imagine a simple population consisting of only 5 observations: 2, 4, 6, 8, 10. List all
possible samples of size two. Construct a relative frequency table showing the sampling
distribution of the mean.
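
Note on 16. b): the enumeration can be checked with a short Python sketch. It assumes sampling with replacement and treats the order of draws as significant (giving 25 samples); the question does not state this, so it is an assumption. The function name and its parameters are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8, 10]

def sampling_distribution_of_mean(values, size=2):
    """Relative frequency of each sample mean over all ordered samples
    drawn WITH replacement (an assumption, not stated in the question)."""
    samples = list(product(values, repeat=size))  # every ordered sample
    counts = Counter(Fraction(sum(s), size) for s in samples)
    total = len(samples)
    # Map each distinct sample mean to its relative frequency, in ascending order.
    return {mean: Fraction(n, total) for mean, n in sorted(counts.items())}

dist = sampling_distribution_of_mean(population)
for mean, rel_freq in dist.items():
    print(f"mean = {float(mean):4.1f}   relative frequency = {rel_freq}")
```

If the intended reading is sampling without replacement, replacing `product(values, repeat=size)` with `itertools.combinations(values, size)` would give the 10 unordered samples instead.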