1. Outline the common errors in retrieving data and the cleansing solutions to be employed for each.
2. Define Data Science with its applications.
3. Determine the values of the range and the IQR for the following sets of data. (a) Retirement
ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63 (b) Residence changes: 1, 3, 4, 1, 0, 2, 5, 8, 0, 2,
3, 4, 7, 11, 0, 2, 3, 4
4. Compare and contrast correlation and regression.
5. List the symbols used for the mean and standard deviation in the three types of distributions.
6. Specify the decision rule for each of the following situations (a) a two-tailed test with α = .05
(b) a one-tailed test, upper tail critical, with α = .01 (c) a one-tailed test, lower tail critical,
with α = .05 (d) a two-tailed test with α = .01
7. Explain the estimated standard error of the mean.
8. State the formula for the standardized effect estimate, Cohen’s d.
9. What is Predictive Analytics?
10. Compare and contrast serial correlation and autocorrelation.
11. List out at least five applications of data science.
12. What is brushing and linking in exploratory data analysis?
13. Find the sample mean value for the best actress Oscar winner data set: 34 34 26 37 42 41 35
31 41 33 30 74 33 49 38 61 21 41 26 80 43 29 33 35 45 49 39 34 26 25 35 33.
14. Estimate the standard deviation of the sample data: 3, 5, 7 with a sample mean of 5.
15. Illustrate population and sample with a suitable example.
16. What are four possible outcomes for any hypothesis test?
17. Analyze the computational formulas for two-factor ANOVA.
18. Compare the t-test and ANOVA.
19. Interpret the importance of the exponentially weighted moving average.
20. Summarize logistic regression.
21. Explain Data Exploration in detail with an example.
22. Analyze the need for Data Science and explain in detail the applications
of Data Science.
23. Outline the Project Charter.
24. Find the median for the following retirement ages: 60, 63, 45, 63, 65, 70,
55, 63, 60, 65, 63.
25. The number of friends reported by Facebook users is summarized in the following
frequency distribution:
(i) What is the shape of this distribution?
(ii) Find the relative frequencies.
(iii) Find the approximate percentile rank of the interval 300–349.
(iv) Convert to a histogram.
26. Explain in detail the guidelines involved in frequency distributions for quantitative
data with an example.
27. Explain briefly the steps involved in hypothesis testing, with an example.
28. Before taking the GRE, a random sample of college seniors received special training
on how to take the test. After analyzing their scores on the GRE, the investigator
reported a dramatic gain, relative to the national average of 500, as indicated by a 95
percent confidence interval of 507 to 527. Are the following interpretations true or
false? (a) About 95 percent of all subjects scored between 507 and 527. (b) The
interval from 507 to 527 refers to possible values of the population mean for all
students who undergo special training. (c) The true population mean definitely is
between 507 and 527. (d) This particular interval describes the population mean
about 95 percent of the time. (e) In practice, we never really know whether the
interval from 507 to 527 is true or false. (f) We can be reasonably confident that the
population mean is between 507 and 527.
29. For the population at large, the Wechsler Adult Intelligence Scale is designed to yield
a normal distribution of test scores with a mean of 100 and a standard deviation of
15. School district officials wonder whether, on the average, an IQ score different
from 100 describes the intellectual aptitudes of all students in their district. Wechsler
IQ scores are obtained for a random sample of 25 of their students, and the mean IQ
is found to equal 105. Using the step-by-step procedure described in this chapter,
test the null hypothesis at the .05 level of significance.
30. Briefly explain point estimation and confidence intervals, with an example.
31. A library system lends books for periods of 21 days. This policy is being reevaluated
in view of a possible new loan period that could be either longer or shorter than 21
days. To aid in making this decision, book-lending records were consulted to
determine the loan periods actually used by the patrons. A random sample of eight
records revealed the following loan periods in days: 21, 15, 12, 24, 20, 21, 13, and
16. Test the null hypothesis with t, using the .05 level of significance.
32. Discuss one-factor ANOVA in detail with an example.
33. Compare and contrast one-tailed and two-tailed tests in hypothesis testing.
34. Discuss time series analysis in predictive analytics in detail.
35. Briefly examine linear least squares in predictive analytics.
36. Explain chi-square testing in detail with a suitable example.
37. Demonstrate the multiple regression model in detail with an example.
38. Summarize survival analysis.
39. Explain in detail the different facets of data with suitable examples.
40. Outline the step-by-step activities in the data science process.
41. Explain the Data Science Process in detail with a neat diagram.
42. Couples who attend a clinic for first pregnancies are asked to estimate
(independently of each other) the ideal number of children. Given that X and Y
represent the estimates of females and males, respectively, the results are as
follows:
Calculate a value for r, using the computation formula.
43. Assume that an r of .30 describes the relationship between educational level (highest
grade completed) and estimated number of hours spent reading each week. More
specifically:
(i) Determine the least squares equation for predicting weekly reading time from
educational level.
(ii) Faith’s education level is 15. What is her predicted reading time?
(iii) Keegan’s educational level is 11. What is his predicted reading time?
44. According to the American Psychological Association, members with a doctorate and
a full-time teaching appointment earn, on the average, $82,500 per year, with a
standard deviation of $6,000. An investigator wishes to determine whether $82,500 is
also the mean salary for all female members with a doctorate and a full-time teaching
appointment. Salaries are obtained for a random sample of 100 women from this
population, and the mean salary equals $80,100.
(i) Someone claims that the observed difference between $80,100 and $82,500 is large
enough by itself to support the conclusion that female members earn less than male
members. Explain why it is important to conduct a hypothesis test.
(ii) The investigator wishes to conduct a hypothesis test for what population?
(iii) What is the null hypothesis, H0?
(iv) What is the alternative hypothesis, H1?
(v) Specify the decision rule, using the .05 level of significance.
(vi) Calculate the value of z.
(vii) What is your decision about H0?
(viii) Using words, interpret this decision in terms of the original problem.
45. According to a 2009 survey based on the United States census
(http://www.census.gov/prod/2011pubs/acs-15.pdf), the daily one-way commute time
of U.S. workers averages 25 minutes with, we’ll assume, a standard deviation of 13
minutes. An investigator wishes to determine whether the national average describes
the mean commute time for all workers in the Chicago area. Commute times are
obtained for a random sample of 169 workers from this area, and the mean time is
found to be 22.5 minutes. Test the null hypothesis at the .05 level of significance.
46. A psychologist tests whether a series of workshops on assertive training increases
eye contacts initiated by shy college students in controlled interactions with
strangers. A total of 32 subjects are randomly assigned, 8 to a group, to attend either
zero, one, two, or three workshop sessions. The results, expressed as the number of
eye contacts during a standard observation period, are shown in the following chart.
(Also shown for your computational convenience are the values for the sum of
squares, group totals, and the grand total.)
(i) Test the null hypothesis at the .05 level of significance. (Use computation formulas for
various sums of squares).
(ii) Summarize the results with an ANOVA table.
47. Explain time series analysis in predictive analytics in detail.
48. Examine the multiple regression model in detail with an example.
49. An educational psychologist wants to check the claim that regular physical exercise
improves academic achievement. To control for academic aptitude, pairs of college
students with similar GPAs are randomly assigned to either a treatment group that
attends daily exercise classes or a control group. At the end of the experiment, the
following GPAs are reported for the seven pairs of participants:
(i) Using t, test the null hypothesis at the .01 level of significance.
(ii) Specify the p-value for this test result.
(iii) If appropriate (because the test result is statistically significant), use Cohen’s d to
estimate the effect size.
(iv) How might this test result be reported in the literature?
50. A random sample of 90 college students indicates whether they most desire love,
wealth, power, health, fame, or family happiness.
(i) Using the .05 level of significance and the following results, test the null hypothesis
that, in the underlying population, the various desires are equally popular.
(ii) Specify the approximate p-value for this test result.
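
Note on checking hand calculations: several questions above supply complete data (for example Questions 3(a), 13, 14, 24, 29, 44 and 45), so the arithmetic can be verified with a short script. The following is a minimal sketch in Python using only the standard library. The function names (value_range, iqr, one_sample_z, two_tailed_decision) are illustrative and not part of the question bank. The quartile rule is an assumption: the sketch uses the default "exclusive" method of statistics.quantiles, and a textbook that locates quartiles differently may report a slightly different IQR. The critical value of 1.96 assumes a two-tailed z test at the .05 level.

```python
import math
import statistics


def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def iqr(data):
    """Interquartile range, Q3 - Q1.

    Assumption: uses the default 'exclusive' method of
    statistics.quantiles; other quartile conventions can differ.
    """
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1


def one_sample_z(sample_mean, mu0, sigma, n):
    """z = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


def two_tailed_decision(z, critical=1.96):
    """Reject H0 when z falls at or beyond either critical value."""
    return "reject H0" if abs(z) >= critical else "retain H0"


# Data copied from Questions 3(a)/24 and 13.
retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
oscar_ages = [34, 34, 26, 37, 42, 41, 35, 31, 41, 33, 30, 74, 33, 49, 38, 61,
              21, 41, 26, 80, 43, 29, 33, 35, 45, 49, 39, 34, 26, 25, 35, 33]

print("Q3(a) range:", value_range(retirement_ages))
print("Q3(a) IQR:", iqr(retirement_ages))
print("Q24 median:", statistics.median(retirement_ages))
print("Q13 sample mean:", statistics.mean(oscar_ages))
# statistics.stdev divides by n - 1 (sample standard deviation).
print("Q14 sample standard deviation:", statistics.stdev([3, 5, 7]))

# One-sample z tests with a known population standard deviation.
z_iq = one_sample_z(105, 100, 15, 25)          # Question 29
z_commute = one_sample_z(22.5, 25, 13, 169)    # Question 45
print("Q29 z:", round(z_iq, 2), two_tailed_decision(z_iq))
print("Q45 z:", round(z_commute, 2), two_tailed_decision(z_commute))
```

The script only checks arithmetic. The written answer still needs the stated hypotheses, the decision rule and an interpretation in words, as the questions require.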