Question Bank

This document is a question bank for the course 'Fundamentals of Data Science and Analytics' at Anna University, covering various topics in data science, analytics, and statistics. It includes both 2-mark and 10-mark questions across multiple units, focusing on definitions, applications, processes, and techniques related to data science, descriptive analytics, inferential statistics, analysis of variance, and predictive analysis. The questions are designed to assess knowledge on key concepts, methodologies, and practical applications in the field.

Uploaded by

pvarshinibca
Copyright © All Rights Reserved

Question bank

Artificial intelligence and data science (Anna University)



Downloaded by Varshini Pattabiraman (pvarshinibca@gmail.com)

AD3491 – Fundamentals of Data Science and Analytics

Semester: IV, Regulation 2021

Unit I

2 Mark Questions:

1. Define Data Science


2. What is big data?
3. What is machine learning?
4. Define data mining.
5. List the characteristics of big data.
6. Mention the categories of data.
7. List some of the application domains of data science.
8. What is structured data? Give some examples.
9. Define unstructured data. Give examples.
10. What is machine-generated data?
11. State the importance of setting the research goal.
12. List the phases involved in the data science process.
13. What is meant by data cleaning?
14. What is a project charter?
15. Identify the important contents of a project charter.
16. List some of the visualization techniques.
17. Name some problems associated with real world data.
18. Define data warehouse, data mart and data lake.
19. List some of the factors involved in selecting the modeling technique.
20. What is a dummy variable?
21. What do you mean by exploratory data analysis?
22. List the methods for combining data from different tables.
23. Why do we need to build a model?
24. On what factors is the modelling technique selected?
25. Why does data need to be cleaned?

10 Mark Questions:

1. Discuss the applications of data science and big data with suitable examples.
2. Illustrate the overview of the data science process.
3. Elaborate on any five application domains of data science.
4. Describe the categories of data for data mining.
5. Discuss the significance of setting the research goal for the data science project.
6. Discuss the categories involved in retrieving relevant data from different sources of
data.
7. Explain the different stages of data preparation phase.
8. Elucidate the techniques involved in data cleansing.
9. Illustrate the steps involved in combining data from different data sources.
10. Explain the impact of variable reduction on a data science project, highlighting its pros
and cons.
11. Elaborate on the steps involved in model building with suitable diagrams.

Unit II – Descriptive Analytics

2 Mark Questions:
1. What is meant by frequency distribution?
2. What is meant by qualitative data? Give examples.
3. What is meant by quantitative data? Give examples.
4. Differentiate qualitative and quantitative data.


5. Compare discrete and continuous variables.


6. State the difference between nominal and ordinal data.
7. Mention the types of frequency distribution.
8. Define an outlier.
9. What is percentile rank?
10. Provide the equation for percentile rank.
11. State the differences between a histogram and bar graph.
12. Give the measures of central tendency.
13. Define mode.
14. Define median.
15. Define positively skewed distribution.
16. What is negatively skewed distribution?
17. Define variance.
18. Define standard deviation.
19. What is a normal curve?
20. Define z score.
21. Give the equation for z-score.
22. How do you convert a z score back to the original score?
23. Define correlation.
24. Mention the types of correlation.
25. Define scatterplot.
26. What is a curvilinear relationship?
27. List the key properties of correlation coefficient r.
28. Define regression.
29. Give the types of regression models.
30. Define restricted range.
31. What is a regression line?
32. What is the interpretation of r²?
33. What is the standard error of estimate?
34. Give the least squares regression equation.
35. State the desirable property of least squares regression.
36. State the multiple regression equation.
37. When does the regression fallacy occur?
38. How is the standard error of estimate calculated?
39. Give the general form of the linear regression model.
40. Provide the difference between correlation and regression.
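Several of the questions above (20–22, 31, 33–34 and 38) ask for standard formulas. The forms below use common textbook notation, assumed here rather than taken from the question bank: X is a raw score, μ and σ the population mean and standard deviation, SS_x and SS_y the sums of squares, r the correlation coefficient and n the number of pairs. Other textbooks write the same relationships with different symbols.

```latex
% z score and its inverse (converting a z score back to the original score)
z = \frac{X - \mu}{\sigma} \qquad\Longleftrightarrow\qquad X = \mu + z\sigma

% Least squares regression line
Y' = bX + a, \qquad b = r\sqrt{\frac{SS_y}{SS_x}}, \qquad a = \bar{Y} - b\bar{X}

% Standard error of estimate
s_{y|x} = \sqrt{\frac{SS_y\,(1 - r^2)}{n - 2}}
```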

10 Mark Questions:
1. Explain the different types of frequency distribution with suitable examples
and diagrams.
2. Elaborate the different ways to describe or represent data using tables with
suitable examples.
3. Explain the various ways in which data can be represented or described using
graphs, with suitable examples.
4. Elaborate the different measures of central tendency and describe the
suitable measures for the different types of data distribution.
5. Construct the frequency table and draw a bar graph and a stem-and-leaf display for
the following data:


6. The following data are the shoe sizes of 50 male students. The sizes are
discrete data since shoe size is measured in whole and half units only.
Construct a histogram and calculate the width of each bar or class interval.
Suppose you choose six bars.

9; 9; 9.5; 9.5; 10; 10; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5; 10.5
11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5; 11.5
12; 12; 12; 12; 12; 12; 12; 12.5; 12.5; 12.5; 12.5; 14
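Question 6 can be checked with a short sketch. It assumes the common textbook rule width = (largest − smallest) / number of bars, rounded up to a convenient value (here a whole shoe size), and starts the first bar at 8.95 so that no data value falls on a bar boundary; both choices are assumptions, not part of the question.

```python
import math

# Shoe sizes of the 50 male students listed in Question 6.
shoe_sizes = ([9] * 2 + [9.5] * 2 + [10] * 6 + [10.5] * 8
              + [11] * 13 + [11.5] * 7
              + [12] * 7 + [12.5] * 4 + [14])

BARS = 6
raw_width = (max(shoe_sizes) - min(shoe_sizes)) / BARS  # (14 - 9) / 6 ≈ 0.83
width = math.ceil(raw_width)                            # rounded up to 1 (assumed convention)
start = min(shoe_sizes) - 0.05                          # 8.95, so no value sits on a boundary

# Count how many sizes fall into each half-open interval [lower, lower + width).
counts = [0] * BARS
for size in shoe_sizes:
    counts[int((size - start) // width)] += 1

for i, count in enumerate(counts):
    lower = start + i * width
    print(f"{lower:.2f} - {lower + width:.2f}: {count}")
```

With this data the six bars hold 4, 14, 20, 11, 0 and 1 students, which sum to 50.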

7. The following data are the heights (in inches to the nearest half inch) of 100
male semiprofessional soccer players. The heights are continuous data, since
height is measured.

60; 60.5; 61; 61; 61.5
63.5; 63.5; 63.5
64; 64; 64; 64; 64; 64; 64; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5; 64.5
66; 66; 66; 66; 66; 66; 66; 66; 66; 66; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5; 66.5
67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5; 67.5
68; 68; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69; 69.5; 69.5; 69.5; 69.5; 69.5
70; 70; 70; 70; 70; 70; 70.5; 70.5; 70.5; 71; 71; 71
72; 72; 72; 72.5; 72.5; 73; 73.5
74

8. Compute the mean, median and mode for the following data sets.
I) 9, 10, 12, 13, 13, 13, 15, 15, 16, 16, 18, 22, 23, 24, 24, 25
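The three measures asked for in Question 8 can be checked with Python's standard `statistics` module. This is a sketch for verifying hand calculations; `multimode` is used so that a data set with several equally common values would report all of them.

```python
import statistics

# Data set I from Question 8.
data = [9, 10, 12, 13, 13, 13, 15, 15, 16, 16, 18, 22, 23, 24, 24, 25]

mean = statistics.mean(data)        # sum of values / number of values
median = statistics.median(data)    # n is even, so the mean of the two middle values
modes = statistics.multimode(data)  # every value that occurs most often

print(f"mean = {mean}, median = {median}, mode(s) = {modes}")
```

For this data set the sketch gives a mean of 16.75, a median of 15.5 (the average of 15 and 16) and a single mode of 13.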

9. Explain the various measures of variability with suitable examples.


10. Using the computation formula for the sum of squares, calculate the
population standard deviation and sample standard deviation for the scores:
I) 1, 3, 7, 2, 0, 4, 3, 7
II) 10, 8, 5, 0, 1, 7, 9, 2, 1
11. Elaborate in detail on the significance of correlation and the various types of
correlation.
12. What are scatterplots? Illustrate the various types with suitable examples.
13. Elaborate on the correlation coefficient r. Compare the various correlation
coefficients.
14. Calculate and analyze the correlation coefficient for the following table:

Subject   Age (x)   Glucose level (y)
1         43        99
2         21        65
3         25        79
4         42        75
5         57        87
6         59        81
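The table in Question 14 can be worked with the computational formula for Pearson's r, r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. The sketch below is one way to check a hand calculation; `pearson_r` is an illustrative name.

```python
import math

# Age (x) and glucose level (y) for the six subjects in Question 14.
age = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

def pearson_r(x, y):
    """Pearson correlation coefficient via the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(age, glucose)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```

Here r ≈ 0.53, a moderate positive linear relationship, and r² ≈ 0.28, so roughly 28% of the variation in glucose level in this small sample is associated with age.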

15. What is the significance of r²? Give a detailed interpretation of r².


16. Discuss the importance of regression. Elaborate on the types of regression.
Calculate the regression coefficient and obtain the lines of regression for the
following data.
17. Explain the significance of the regression line and the least squares regression
equation.
18. Find the standard error for the sample data: 10, 20, 30, 40, 45.
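Question 18 does not say which standard error is meant; the sketch below assumes the usual standard error of the mean, SE = s / √n, with s the sample standard deviation (divisor n − 1).

```python
import math

# Sample data from Question 18.
sample = [10, 20, 30, 40, 45]

n = len(sample)
mean = sum(sample) / n                      # 145 / 5 = 29.0
ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations from the mean
s = math.sqrt(ss / (n - 1))                 # sample standard deviation
standard_error = s / math.sqrt(n)           # standard error of the mean = s / √n

print(f"mean = {mean}, s = {s:.4f}, SE = {standard_error:.4f}")
```

Under that assumption SS = 820, s = √205 ≈ 14.32 and SE = √41 ≈ 6.40.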
19. Elaborate on multiple regression equations.
20. Elucidate regression towards the mean. Explain the regression fallacy and state
how it can be avoided.

Unit III – Inferential Statistics

2 Mark Questions:

1. Define population. Give an example.


2. What is a real population?
3. List the different types of population.
4. What is a hypothetical population?
5. Define sample.
6. List the categories of sample.
7. What is random sampling?
8. Mention the types of random sampling.
9. Differentiate population and sample.
10. List the types of non-probability sampling.
11. Define snowball sampling.
12. Differentiate non-probability and probability sampling.
13. Give the optimal sample size.
14. What is systematic sampling?
15. Define cluster sampling.
16. Mention the advantages of random sampling.
17. Define consecutive sampling.
18. Provide the standard error of the mean.
19. Give the level of confidence.
20. Compare two-tailed and one-tailed tests.

10 Mark Questions:

1. Discuss population and samples with suitable examples.


2. Discuss the different types of random sampling techniques.
3. Elaborate on the different types of non-probability based sampling techniques.
4. Illustrate hypothesis testing with an example.
5. Explain the procedure of the z-test with an example.
6. A teacher claims that the mean score of students in the class is greater than
80, with a standard deviation of 20. A sample of 75 students has a mean score
of 90. Check whether there is enough evidence to support this claim at a 0.05
significance level.
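Question 6 is a right-tailed one-sample z-test: H0: μ ≤ 80 against H1: μ > 80. The sketch treats the stated standard deviation of 20 as the known population σ (an assumption the question implies but does not state) and uses `statistics.NormalDist` for the critical value.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for a one-sample z-test with known population σ."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(sample_mean=90, mu0=80, sigma=20, n=75)
z_critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value ≈ 1.645
p_value = 1 - NormalDist().cdf(z)              # area to the right of z

reject_h0 = z > z_critical
print(f"z = {z:.3f}, critical z = {z_critical:.3f}, p = {p_value:.6f}, reject H0: {reject_h0}")
```

With these figures z ≈ 4.33, well above 1.645, so H0 is rejected and the sample supports the teacher's claim at the 0.05 level.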


7. An online food delivery company claims that the mean delivery time is less
than 30 minutes with a standard deviation of 10 minutes. Is there enough
evidence to support this claim at a 0.05 significance level if 49 orders were
examined with a mean of 20 minutes?
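Question 7 is the left-tailed counterpart: H0: μ ≥ 30 against H1: μ < 30, again treating the stated standard deviation as the known population σ. A minimal sketch:

```python
import math
from statistics import NormalDist

alpha = 0.05
mu0, sigma, n, sample_mean = 30, 10, 49, 20   # figures from Question 7

standard_error = sigma / math.sqrt(n)         # 10 / 7
z = (sample_mean - mu0) / standard_error      # test statistic
z_critical = NormalDist().inv_cdf(alpha)      # left-tailed critical value ≈ -1.645
p_value = NormalDist().cdf(z)                 # area to the left of z

reject_h0 = z < z_critical
print(f"z = {z:.2f}, critical z = {z_critical:.3f}, p = {p_value:.2e}, reject H0: {reject_h0}")
```

Here z = −7, far below −1.645, so there is enough evidence at the 0.05 level to support the company's claim.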
8. A company wants to improve the quality of its products by reducing defects and
monitoring the efficiency of its assembly lines. In assembly line A, 9 defects
were reported out of 100 samples; in line B, 25 defects were identified out of 600
samples. Check whether there is a difference between the procedures at the
0.05 alpha level.
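Question 8 compares two defect proportions. One standard approach, assumed here, is the pooled two-proportion z-test with H0: p_A = p_B against the two-sided H1: p_A ≠ p_B; `two_proportion_z` is an illustrative name.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                    # overall defect rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

alpha = 0.05
z = two_proportion_z(9, 100, 25, 600)                 # line A vs line B
z_critical = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value ≈ 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_h0 = abs(z) > z_critical
print(f"z = {z:.3f}, critical z = {z_critical:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

The defect rates are 0.09 and about 0.042, giving z ≈ 2.08 > 1.96, so under this test the difference between the lines is significant at the 0.05 level.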
9. Explain estimation in detail and the significance of point estimates.
10. Elaborate on confidence intervals and the level of confidence.
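To show how the level of confidence affects an interval estimate, the sketch below builds the known-σ interval x̄ ± z*·σ/√n at three levels. Reusing the delivery-time figures from Question 7 (x̄ = 20, σ = 10, n = 49) is purely an illustrative choice, and `z_confidence_interval` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for μ when σ is known: x̄ ± z* · σ / √n."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # ≈ 1.96 for 95%
    margin = z_star * sigma / math.sqrt(n)                    # margin of error
    return sample_mean - margin, sample_mean + margin

# Illustration only: delivery-time figures from Question 7.
for level in (0.90, 0.95, 0.99):
    lo, hi = z_confidence_interval(20, 10, 49, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
```

The 95% interval is about (17.20, 22.80); raising the level of confidence widens the interval, and lowering it narrows the interval.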

Unit IV – Analysis of Variance

2 Mark Questions:

1. Define a categorical variable. Give an example.


2. Mention the types of categorical variable.
3. Give the difference between one-way and two-way ANOVA.
4. What is a t-test?
5. Give the measures of the t-test.
6. When should the t-test be used?
7. Provide the difference between a one-sample t-test and a paired t-test.
8. Can the t-test be used to measure the difference among several groups?
9. Define the chi-square test and write its formula.
10. Specify the purpose of the chi-square test.
11. How is the chi-square test interpreted?
12. What is an acceptable value in the chi-square method?
13. Define F-test.
14. Write the decision criteria for a right-tailed F-test.
15. Give the critical value for the F-test.
16. Why does ANOVA use the F-test?
17. Is a negative F statistic possible in an F-test?
18. How is the F-test different from the t-test?
19. Differentiate one-way ANOVA from two-way ANOVA.
20. How is the statistical significance of ANOVA determined?
21. What is factorial ANOVA?
22. Where is the chi-square test used?
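Questions 9 and 13–18 refer to the chi-square and F statistics. Their standard forms are given below in common notation (assumed here, since textbooks vary): O_i and E_i are observed and expected frequencies, k is the number of groups and N the total number of observations in a one-way ANOVA.

```latex
% Chi-square statistic
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}

% One-way ANOVA F ratio
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```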

10 Mark Questions:

1. Alex timed 21 people in the sprint race, to the nearest second:


59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58, 58, 62, 62, 68, 65, 56, 59, 68, 61, 67
Find the mean, median and mode.
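A short sketch for checking Question 1 by hand: sort the times, take the middle value as the median (n = 21 is odd), and find the most frequent value for the mode.

```python
import statistics

# Sprint times (seconds) for the 21 runners in Question 1.
times = [59, 65, 61, 62, 53, 55, 60, 70, 64, 56, 58,
         58, 62, 62, 68, 65, 56, 59, 68, 61, 67]

ordered = sorted(times)
n = len(ordered)
mean = sum(times) / n           # 1289 / 21
median = ordered[n // 2]        # n is odd, so the single middle value
mode = statistics.mode(times)   # most frequent value

print(f"sorted: {ordered}")
print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

The sketch gives a mean of about 61.38 s, a median of 61 s and a mode of 62 s (62 occurs three times).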
2. The table shows the star rating for 20 hotels.
What is the mode star rating?


3. Find the variance of the frequency table.

Unit V – Predictive Analysis

2 Mark Questions:
1. How do you calculate least squares?
2. List the methods available to calculate least squares.
3. Define the principle of least squares.
4. Define least squares.
5. What is least squares curve fitting?
6. Why do we need time series analysis?
7. Give some examples of time series analysis.
8. Mention the types of time series analysis.
9. Mention the applications of time series analysis.
10. Give the limitations of time series analysis.
11. List the data types of time series.
12. What does goodness of fit mean?
13. Why is goodness of fit important?
14. Provide the most common goodness-of-fit tests.
15. Why do we test goodness of fit?
16. Define multiple linear regression.
17. How is the error calculated in a linear regression model?
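Questions 1–5 and 17 above concern the least squares method and the error of a linear regression model. The sketch below reuses, purely for illustration, the age and glucose data from Unit II, Question 14. It fits y = a + bx by minimising the sum of squared residuals, then reports the residuals, their sum of squares (SSE) and the standard error of estimate √(SSE / (n − 2)). The function name `least_squares_line` is hypothetical.

```python
# Age (x) and glucose level (y) from Unit II, Question 14, reused for illustration.
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]

def least_squares_line(x, y):
    """Fit y = a + b·x by minimising the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # sum of products
    ss_x = sum((xi - mean_x) ** 2 for xi in x)                       # sum of squares of x
    b = sp / ss_x                 # slope
    a = mean_y - b * mean_x       # intercept: the line passes through (x̄, ȳ)
    return a, b

a, b = least_squares_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed − predicted
sse = sum(e ** 2 for e in residuals)                     # sum of squared errors
std_error_estimate = (sse / (len(x) - 2)) ** 0.5         # standard error of estimate

print(f"y' = {a:.2f} + {b:.4f}x, SSE = {sse:.2f}, "
      f"standard error of estimate = {std_error_estimate:.2f}")
```

For this data the fitted line is roughly y' = 65.14 + 0.385x, the residuals sum to zero (a property of any least squares line with an intercept), and the standard error of estimate is about 10.86.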
