Health Services Academy
Department of Public Health
Final Term Examination-2025
ANSWER KEY
Section 1 - Mark the correct option. Overwriting or cutting is not allowed.
1. Which of the following is easy to handle and control while collecting the data?
a) Single variation
b) Multiple variation
c) Skewness
d) Cronbach’s alpha
2. What is the purpose of a pilot study?
a) To finalize the research design
b) To test the study procedures on a similar population
c) To collect the main data for the research
d) To analyze the final results
3. For continuous variables, what method is suggested to address missing values?
a) Regression
b) Mean imputation
c) Mode imputation
d) Median
4. What is the primary benefit of using random numbers in research?
a) To increase bias
b) To ensure representation of the data
c) To complicate data analysis
d) To reduce sample size
5. In a normal distribution, the difference between the mean and median should ideally be:
a) < 5%
b) > 10%
c) > 20%
d) Equal to the standard deviation
6. In a normal distribution, approximately what percentage of values should fall within ±2
standard deviations of the mean?
a) 50%
b) 68%
c) 95%
d) 99.7%
7. If skewness and kurtosis values are between ±0.05, the data is considered:
a) Highly skewed
b) Non-normal
c) Approximately normal
d) Leptokurtic
8. In a Normal Q-Q plot, if most of the values fall on the center line, the data is likely:
a) Skewed
b) Normally distributed
c) Kurtotic
d) Non-normally distributed
9. Which plot might show a "W" shape if the data is normally distributed?
a) Histogram
b) Box-Whisker Plot
c) Detrended Normal Q-Q Plot
d) Scatter Plot
10. Which of the following is NOT a method for categorizing continuous data?
a) Mean
b) Median
c) Regression
d) Quartiles
11. When should the "split file" function be used in SPSS?
a) For all data analysis
b) When comparing groups (e.g., male/female)
c) When testing for normality
d) When calculating Cronbach's alpha
12. Data validity is assessed through:
a) Cronbach's alpha
b) Questionnaire design
c) Statistical analysis
d) Sample size
13. Data reliability is assessed through:
a) Questionnaire design
b) Cronbach's alpha
c) Statistical analysis
d) Sample size
14. What percentage of the total sample size is recommended for a pilot test of a new
questionnaire?
a) 1-2%
b) 5-10%
c) 20-30%
d) 50%
15. What does the EM model help analyze?
a) Normality
b) Missing values
c) Correlation
d) Regression
16. Which statistical method is mentioned as most suitable for analyzing missing values
relative to independent and dependent variables?
a) Mean imputation
b) Regression
c) Visual binning
d) Frequency analysis
17. What is the purpose of visual binning?
a) To test for normality
b) To categorize continuous data
c) To impute missing values
d) To calculate Cronbach's alpha
18. What is the difference between Mathematics and Statistics?
a) Mathematics deals with approximations, Statistics with exact data.
b) Mathematics is inductive, Statistics is deductive.
c) Mathematics deals with exact data, Statistics with approximations.
d) Both deal with exact data.
19. What is the purpose of normality testing before performing parametric tests?
a) To calculate the median
b) To determine if the data meets the assumptions of the test
c) To calculate the mode
d) To categorize the data
20. When is a t-test used instead of a z-test?
a) When the population variance is known.
b) When the population variance is unknown.
c) Only with large sample sizes.
d) Only with categorical data.
21. When is logistic regression used?
a) When the dependent variable is continuous.
b) When the dependent variable is categorical with two outcomes.
c) When the independent variable is categorical.
d) When the data is normally distributed.
22. What does R² represent in regression?
a) P-value
b) Standard deviation
c) Coefficient of determination
d) Mean
23. Which of the following is NOT a test for normality?
a) Shapiro-Wilk
b) Kolmogorov-Smirnov
c) Chi-square
d) Q-Q plot
24. When should post hoc tests be applied?
a) Before ANOVA
b) Only when ANOVA shows a significant difference between groups
c) When data is not normally distributed
d) When using a t-test
25. When the sample size is 30 or above but population variance is unknown, which test is
used?
a) t-test
b) z-test
c) ANOVA
d) Chi-square
26. Which type of graph is most suitable for displaying qualitative data with fewer
categories, such as gender or disease status (yes/no)?
A) Histogram
B) Pie Chart
C) Box Plot
D) Scatter Diagram
27. For continuous data, such as age or weight, which graphical representation is most
appropriate to visualize the data distribution?
A) Scatter Diagram
B) Multiple Bar Charts
C) Histogram
D) Pie Chart
28. When analyzing the relationship between two quantitative variables, such as age vs.
height, which type of graph is used?
A) Box Plot
B) Scatter Diagram
C) Bar Chart
D) Histogram
29. Which descriptive statistical tool is used for nominal data, such as gender or disease
presence?
A) Mean and Standard Deviation
B) Frequency Tables and Mode
C) Harmonic Mean
D) Scatter Plots
30. What is the primary use of the Pearson Chi-Square test in data analysis?
A) To compare means of two independent groups
B) To measure association in qualitative data
C) To calculate the standard deviation
D) To compare more than two groups
31. The ANOVA test is used when:
A) Comparing the averages of two independent groups
B) Testing the association in qualitative data
C) Comparing the averages of more than two groups
D) Analyzing the variance of a single group
32. Which descriptive tool is appropriate when analyzing interval data such as temperature or
Likert scale responses?
A) Mode only
B) Mean, Median, and Mode
C) Scatter Diagram
D) Pie Chart
33. The One Sample T-Test is applied in which scenario?
A) To compare averages of more than two groups
B) To test the association in qualitative data
C) To compare a group with a standard value
D) To analyze relationships between two quantitative variables
34. In SPSS, which test would you use to assess the association between two categorical
variables?
A) Pearson Chi-Square Test
B) Independent Sample T-Test
C) ANOVA
D) Correlation Analysis
35. Which SPSS menu option is most commonly used to input or modify the dataset?
A) Analyze
B) Data
C) Transform
D) File
36. When analyzing continuous variables in SPSS, which descriptive statistics are most
relevant?
A) Mean, Median, and Standard Deviation
B) Frequencies and Crosstabs
C) Mode and Percentages
D) Chi-Square and Regression Coefficients
37. What is the purpose of the "Transform" function in SPSS?
A) Importing external datasets
B) Cleaning and sorting data
C) Creating new variables or modifying existing ones
D) Visualizing data in graphs
38. Which statistical test in SPSS would be suitable to compare the means of more than two
independent groups?
A) Independent T-Test
B) Paired T-Test
C) ANOVA
D) Crosstabs
39. What is the first step in conducting a statistical analysis in SPSS?
A) Visualizing data with charts
B) Performing hypothesis tests
C) Importing and defining the dataset
D) Running the descriptive statistics
40. When entering data into SPSS, which of the following practices is correct?
A) Each sample should occupy its own column.
B) Each variable should occupy its own row.
C) Each sample should occupy its own row.
D) Data should only be entered in numerical format.
41. Which of the following is most affected by extreme outliers in a dataset?
A) Median
B) Mode
C) Mean
D) Interquartile Range
42. Which measure of spread considers all data values but is sensitive to extreme values?
A) Range
B) Interquartile Range
C) Standard Deviation
D) Variance
43. Which of the following is NOT one of the four basic rules for exploring data?
A) Open the data and inspect it.
B) Write a research question.
C) Use qualitative methods for analysis.
D) Graph and run descriptive statistics.
44. What type of graph is best used to compare the means of different groups?
A) Histogram
B) Boxplot
C) Bar Chart
D) Scatterplot
45. Which of the following is a correct null hypothesis for a one-sample T-test?
A) The variable has a significant difference from a specified value.
B) The variable is not different from a specified value.
C) The variable is not normally distributed.
D) The variable is significantly correlated with another variable.
46. The one-sample T-test is most appropriate when:
A) Comparing means between two groups.
B) Testing the difference of one numerical variable against a known value.
C) Testing the relationship between two numerical variables.
D) Comparing frequencies of categorical data.
47. Which of the following determines the rejection of the null hypothesis in a T-test?
A) A p-value > 0.05
B) A p-value < 0.05
C) A t-statistic greater than 1
D) The degree of freedom
48. When data does not meet the normality assumption for an independent samples T-test,
which test is used?
A) Mann-Whitney Test
B) Chi-Square Test
C) Paired Samples T-Test
D) ANOVA
49. Which scenario is appropriate for a paired T-test?
A) Comparing height differences between men and women.
B) Testing the effectiveness of a drug before and after treatment on the same patients.
C) Analyzing contributions between mature and young students in a classroom.
D) Examining differences between two unrelated datasets.
50. Which of the following measures considers all data points but is sensitive to outliers?
A) Mean
B) Range
C) Standard Deviation
D) Interquartile Range
______________________________________________________________________________
Section 2 – Solve the following Questions.
1. When conducting a Chi-Square test, why is it essential that the expected frequency
for each category is at least 1, and no more than 20% of the categories have expected
frequencies less than 5?
ANSWER: These conditions ensure the validity of the Chi-Square test. If the expected
frequencies are too low, the approximation to the Chi-Square distribution may not hold,
leading to inaccurate results. Meeting these assumptions helps maintain the reliability of
the statistical test.
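The two conditions can be checked directly from a table of observed counts. The following is a minimal Python sketch, not part of the original key; the function names and both example tables are hypothetical and chosen only for illustration.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def meets_chi_square_conditions(observed):
    """Check both conditions: every expected count is at least 1,
    and no more than 20% of cells have an expected count below 5."""
    cells = [e for row in expected_frequencies(observed) for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_below_5 <= 0.20


# Hypothetical counts, invented for illustration only.
large_table = [[30, 20], [25, 25]]
small_table = [[3, 2], [1, 9]]

print(expected_frequencies(large_table))
print(meets_chi_square_conditions(large_table))
print(meets_chi_square_conditions(small_table))
```

In the small table, three of the four expected counts fall below 5, so the Chi-Square approximation would not be trusted there.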
2. How do you analyze the normality of data? Give two examples of tests that can be
applied when the data is normally distributed.
ANSWER: Normality of data can be analyzed through a combination of visual inspection and
statistical tests.
Visual methods: Histograms can be used to check if the data approximates a bell curve.
Q-Q (quantile-quantile) plots compare the data's distribution to a theoretical normal
distribution; if the data is normal, the points will fall closely along a straight diagonal line.
Box plots can also give an indication of symmetry and identify potential outliers.
Statistical tests: The Kolmogorov-Smirnov test and the Shapiro-Wilk test are commonly
used. For larger samples, the Kolmogorov-Smirnov test is often used, while Shapiro-Wilk
is generally recommended for smaller sample sizes (less than 50). A p-value greater than
0.05 from these tests indicates that the data does not significantly deviate from a normal
distribution.
Two examples of tests that can be applied when data is normally distributed are:
T-test: Used to compare the means of two groups.
ANOVA (Analysis of Variance): Used to compare the means of three or more groups.
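As a rough numerical complement to the plots and tests named above, skewness and excess kurtosis can be computed by hand; both are 0 for a perfectly normal curve. The sketch below is an assumption-laden illustration in Python: it is a simple moment-based check, not the Shapiro-Wilk or Kolmogorov-Smirnov test, and the two samples are hypothetical.

```python
import statistics


def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a perfect normal curve)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Hypothetical samples, invented for illustration only.
symmetric = [2, 4, 4, 5, 5, 5, 6, 6, 8]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20]

for name, sample in [("symmetric", symmetric), ("right_skewed", right_skewed)]:
    skew, kurt = skewness_and_excess_kurtosis(sample)
    print(name,
          "mean:", round(statistics.fmean(sample), 2),
          "median:", statistics.median(sample),
          "skewness:", round(skew, 2),
          "excess kurtosis:", round(kurt, 2))
```

For the right-skewed sample the mean sits well above the median and the skewness is clearly positive, which is the pattern a histogram or Q-Q plot would also reveal.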
3. Identify and name the graph given in the figure. Why would we use this graph in
statistics? Which correlation can be checked with this graph?
The graph is a scatter plot.
We use this graph in statistics to visualize the relationship between two continuous (scale)
variables. It helps us see if there is a trend or pattern in the data, such as a positive or negative
correlation.
A linear correlation can be checked with this graph. A positive correlation is indicated by an
upward trend, a negative correlation by a downward trend, and no clear trend suggests little or no
linear correlation.
4. Identify the parametric and non-parametric tests from the following:
I) T-test II) Chi-square III) One-way ANOVA
IV) Regression V) Correlation
Parametric tests: These tests assume that the data follows a specific distribution (usually
normal).
a. I) T-test
b. III) One-way ANOVA
c. IV) Regression
d. V) Correlation (specifically Pearson correlation)
Non-parametric tests: These tests do not assume that the data follows a specific distribution.
a. II) Chi-square
5. Write at least two differences between T-test and Z-test. If you have unknown
variance of the population and (N = 54), then which test would you apply when the
data is normally distributed?
Two key differences between t-tests and z-tests are:
Population variance: A z-test is used when the population variance (or standard deviation)
is known. A t-test is used when the population variance is unknown and estimated from the
sample data.
Sample size: While technically a z-test can be used with large samples even if the
population variance is estimated, in practice, a t-test is more commonly used when the
population variance is unknown, especially with smaller sample sizes. With larger sample
sizes (generally n > 30), the t-distribution closely approximates the z-distribution, so the
results are often very similar.
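If you have an unknown population variance and a sample size of N = 54, and the data is normally
distributed, you would apply a z-test because the sample size is greater than 30, with the sample
standard deviation standing in for the unknown population value. Strictly speaking, the t-test is
still the exact test when the variance is estimated, but with 53 degrees of freedom the two give
nearly identical results.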
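The test statistic itself has the same form in both tests; only the reference distribution differs. A minimal Python sketch under stated assumptions: the 54 values and the hypothesized mean of 50 are invented for illustration, and the p-value uses the standard normal curve, i.e. the large-sample z approximation.

```python
import math
import statistics


def one_sample_statistic(sample, hypothesized_mean):
    """(sample mean - hypothesized mean) / (s / sqrt(n)), with s estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (statistics.fmean(sample) - hypothesized_mean) / (s / math.sqrt(n))


def two_sided_p_normal(statistic):
    """Two-sided p-value from the standard normal curve (large-sample z approximation)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(statistic)))


# Hypothetical data: 54 measurements built from a repeating pattern, for illustration only.
sample = [48, 50, 52, 49, 51, 53] * 9
stat = one_sample_statistic(sample, hypothesized_mean=50)
p_value = two_sided_p_normal(stat)
print("n =", len(sample))
print("statistic =", round(stat, 3))
print("p (normal approximation) =", round(p_value, 4))
```

Referring the same statistic to a t-distribution with n - 1 = 53 degrees of freedom would give a slightly larger p-value; at this sample size the difference is small.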
6. Name the graph given in the figure. Also interpret what the curve on the graph
indicates in terms of normality.
The graph is a histogram.
The curve superimposed on the histogram represents a normal distribution curve. The histogram's
bars show the frequency of data within specific intervals. If the data is normally distributed, the
histogram's bars should roughly follow the shape of the normal distribution curve, forming a
symmetrical bell shape centered around the mean. In the given figure, the histogram appears to
roughly follow the curve, suggesting that the data is approximately normally distributed. However,
a formal normality test (like Shapiro-Wilk or Kolmogorov-Smirnov) would be needed for a
definitive conclusion.
7. A) When should we use ANOVA? Write one assumption/condition for it.
B) Differentiate between a Paired T-Test and an Independent Sample T-Test.
A.
ANOVA compares means of three or more groups to determine if statistically significant differences
exist. The independent variable must be categorical (grouping variable), and the dependent variable must
be continuous.
A key assumption is homogeneity of variance (equal variances across groups), checked with Levene's
test. Other assumptions include normality within groups and independent observations.
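The F statistic behind ANOVA is the ratio of between-group to within-group variability. The following Python sketch is illustrative only: the three groups are hypothetical, and printing the group variances is a quick visual look at homogeneity, not Levene's test.

```python
import statistics


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical scores for three groups, invented for illustration only.
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_value = one_way_anova_f(groups)
print("group variances:", [statistics.variance(g) for g in groups])
print("F =", round(f_value, 2))
```

A large F means the group means differ by more than the spread inside the groups would explain; the p-value then comes from the F distribution with (k - 1, n - k) degrees of freedom.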
B.
Paired T-Test
Compares two related groups (e.g., before and after treatment for the same participants).
Used for dependent samples, where each data point in one group is paired with a data point in
the other group.
Example: Comparing a person's weight before and after a diet program.
Independent Sample T-Test
Compares two separate groups (e.g., males vs. females).
Used for independent samples, where the two groups have no relationship.
Example: Comparing test scores between two different classes.
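The difference between the two tests shows up in how the statistic is built. A minimal Python sketch, assuming hypothetical before/after weights; the independent version uses a pooled variance, which assumes equal variances in the two groups.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t: a one-sample t-test on the within-pair differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def independent_t(group_a, group_b):
    """Independent-samples t with a pooled variance (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / standard_error


# Hypothetical weights (kg) before and after a diet program, for illustration only.
before = [80, 75, 90, 68, 85]
after = [78, 74, 87, 67, 82]

t_paired = paired_t(before, after)
t_independent = independent_t(after, before)
print("paired t:", round(t_paired, 2))
print("independent t on the same numbers:", round(t_independent, 2))
```

On these numbers the paired statistic is far larger in magnitude, because pairing removes the person-to-person variation that the independent test treats as noise. Using the independent test on paired data would therefore waste information.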
8. What is the main purpose of performing a correlation analysis, and why is it important to
remember that correlation does not imply causation?
The purpose of a correlation analysis is to determine whether there is a relationship between two numerical
variables, and whether the relationship is positive or negative. It is important to remember that correlation
does not imply causation because a correlation only shows an association, not whether one variable causes
the other to change. Other factors might be responsible for the observed relationship.
9. In an experiment, a researcher is exploring the relationship between the number of hours
studied and exam scores. The scatterplot reveals a linear pattern. How should the researcher
proceed to confirm the strength and direction of this relationship?
The researcher should calculate the Pearson correlation coefficient (if the data is normally
distributed). The sign of the coefficient gives the direction of the relationship: a positive
value indicates a positive relationship, while a negative value indicates a negative relationship.
Its magnitude gives the strength: values close to +1 or -1 indicate a strong linear relationship,
and values close to 0 a weak one. If the data is not normally distributed, the researcher should
use the Spearman rank correlation.
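Both coefficients can be sketched in a few lines of Python. The hours and scores below are hypothetical, and the helper names are chosen only for this illustration; Spearman's coefficient is computed here as the Pearson correlation of the ranks.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = math.sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / spread


def ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical hours studied and exam scores, invented for illustration only.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 90]

r = pearson_r(hours, scores)
rho = spearman_rho(hours, scores)
print("Pearson r:", round(r, 3))
print("Spearman rho:", round(rho, 3))
```

Because the scores rise with every additional hour, the rank-based coefficient reaches its maximum, while Pearson's r stays slightly lower since the last point departs from a straight line.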
10. A researcher investigates whether the preference for coffee (caffeinated vs. decaf) is
independent of mood (happy vs. sad). The Chi-Square test results in a p-value of 0.03. What
can the researcher conclude?
Since the p-value (0.03) is less than the significance level (e.g., 0.05), the researcher rejects the
null hypothesis. This indicates that coffee preference is not independent of mood, and there is a
statistically significant association between the two variables.
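The decision rule can be traced on a small 2x2 table. The Python sketch below uses hypothetical counts (they are not the data behind the p-value of 0.03 in the question) and no continuity correction; the p-value formula relies on the fact that a Chi-Square variable with 1 degree of freedom is a squared standard normal variable.

```python
import math


def chi_square_2x2(table):
    """Pearson Chi-Square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts (rows: caffeinated, decaf; columns: happy, sad), for illustration only.
observed = [[30, 20], [18, 32]]
statistic, p_value = chi_square_2x2(observed)
alpha = 0.05
print("chi-square =", round(statistic, 2), "p =", round(p_value, 3))
print("reject independence" if p_value < alpha else "fail to reject independence")
```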