MSc Business Analytics
STATISTICS AND ECONOMETRICS
Past Exam - Solutions
Instructions
Answer all FOUR questions.
You are supplied with a formula sheet.
College approved calculators can be used.
Question 1 (20 Marks)
Answer the following questions. Be concise and to the point.
(a) (5 Marks) Discuss the effect of multicollinearity on the unbiasedness and the
variance of OLS estimators, \( \hat{\beta}_j \).
Answer: Multicollinearity means that some independent variables are highly
correlated with each other. Effect on unbiasedness: multicollinearity does not affect
the unbiasedness of the OLS estimators. Effect on the variance: if the independent
variable \( x_j \) is highly correlated with the other independent variables, then \( R_j^2 \)
(the R-squared from regressing \( x_j \) on the other independent variables) is close
to 1. Thus, the variance of the estimator \( \hat{\beta}_j \) is large, which decreases the
precision of \( \hat{\beta}_j \).
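A short numerical sketch can make the variance effect concrete. It assumes the usual homoskedastic variance formula, Var = sigma^2 / (SST_j (1 - R_j^2)); the function name `var_beta_j` and the input values are hypothetical, chosen only to show the pattern.

```python
def var_beta_j(sigma2: float, sst_j: float, r2_j: float) -> float:
    """Sampling variance of the OLS slope on x_j under homoskedasticity.

    sigma2: error variance
    sst_j:  total sample variation in x_j
    r2_j:   R-squared from regressing x_j on the other regressors
    """
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1); r2_j = 1 is perfect collinearity")
    return sigma2 / (sst_j * (1.0 - r2_j))


# Hypothetical inputs: only r2_j changes across the cases.
sigma2, sst_j = 1.0, 100.0
variances = {r2: var_beta_j(sigma2, sst_j, r2) for r2 in (0.0, 0.5, 0.9, 0.99)}
for r2, v in variances.items():
    print(f"R_j^2 = {r2:4.2f} -> Var = {v:.4f}")
```

With everything else held fixed, moving R_j^2 from 0 to 0.99 multiplies the variance by 100 in this sketch.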
(b) (5 Marks) Discuss the effect of increasing the sample size on the unbiasedness
and the variance of OLS estimators, \( \hat{\beta}_j \).
Answer: The bias in \( \hat{\beta}_j \), if any, remains as the sample size increases. Increasing
the sample size increases the total variation in \( x_j \), and thus the variance of \( \hat{\beta}_j \)
decreases.
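The sample-size effect can be seen by writing the total variation in \( x_j \) in terms of \( n \). In this sketch \( \sigma^2 \) is the error variance and \( s_j^2 \) is the sample variance of \( x_j \); both symbols are introduced here and do not appear in the original answer.

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j\,(1 - R_j^2)},
\qquad
SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 = (n - 1)\, s_j^2 .
```

Because \( s_j^2 \) settles near the population variance of \( x_j \) as \( n \) grows, \( SST_j \) grows roughly in proportion to \( n \), so the variance shrinks at roughly the rate \( 1/n \).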
(c) (5 Marks) Suppose you estimate the gender difference in returns to education
using the following model:
\( \log(wage) = (\beta_0 + \delta_0\, female) + (\beta_1 + \delta_1\, female)\, educ + u \)
where wage is the hourly wage, female is a gender dummy equal to 1 if the
individual is female, and educ is the number of years of education. Someone
asserts that expected wages are the same for men and women who have the
same level of education. Explain how you would test this hypothesis.
Answer: The null hypothesis is \( H_0: \delta_0 = 0 \) and \( \delta_1 = 0 \). One would form an F
statistic to test these two exclusion restrictions jointly, comparing the model above
(unrestricted) with the model that drops \( female \) and \( female \cdot educ \) (restricted).
(d) (5 Marks) Consider the regression model \( y_i = \beta_0 + \beta_1 x_i + u_i \). If \( R^2 = 0 \), what are
the values of \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \)?
Answer: If \( R^2 = 0 \), then the sample regression line is horizontal at \( \bar{y} \). So \( \hat{\beta}_0 = \bar{y} \)
and \( \hat{\beta}_1 = 0 \).
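The horizontal-line result can be checked on a tiny example. The sketch below uses made-up data with zero sample covariance between x and y and a hand-written simple-regression helper (`ols_simple` is a hypothetical name, not something from the exam).

```python
def ols_simple(x, y):
    """Return (intercept, slope, r_squared) for a regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ssr / sst


# Made-up data: x and y have zero sample covariance.
x = [-1.0, 0.0, 1.0]
y = [1.0, 0.0, 1.0]
intercept, slope, r2 = ols_simple(x, y)
print(intercept, slope, r2)
```

Here the fitted slope is zero, the intercept equals the sample mean of y (2/3), and the R-squared is zero, matching the answer.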
Question 2 (30 Marks)
Suppose you have estimated the following linear probability model explaining 401(k)
eligibility in terms of income, age, and gender. A 401(k) is a retirement savings plan
sponsored by employers in the US.
\( \widehat{e401k} = -.506 + .0124\, inc - .000062\, inc^2 + .0265\, age - .00031\, age^2 - .0035\, male \),
with standard errors, in the same order, (.081), (.0006), (.000005), (.0039), (.00005), (.0011);
\( n = 9{,}275 \), \( R^2 = .094 \),
where 𝑒401𝑘 is a binary variable for eligibility in a 401(k) plan (𝑒401𝑘 = 1 if eligible for
401(k), and = 0 otherwise), 𝑖𝑛𝑐 denotes family annual income (in $1,000), 𝑎𝑔𝑒 denotes the
individual’s age (in years), and 𝑚𝑎𝑙𝑒 is a binary variable for gender (𝑚𝑎𝑙𝑒 = 1 if male
individual, and = 0 otherwise).
(a) (10 Marks) Holding other factors fixed, for an individual with family income of
$200,000, if annual income increases by $1,000, what happens to the probability of
401(k) eligibility?
Answer: \( \partial \widehat{e401k} / \partial inc = .0124 - 2(.000062)(200) = -0.0124 \). So, for an individual with
family income of $200,000, if annual income increases by $1,000 (inc increases by
1), the estimated probability of 401(k) eligibility decreases by about 0.0124.
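The arithmetic can be reproduced with a few lines of Python. This is only a sketch built on the two income coefficients from the estimated equation; the constant and function names are made up for illustration.

```python
# Coefficients on inc and inc^2 from the estimated equation in Question 2.
B_INC = 0.0124
B_INC_SQ = -0.000062


def marginal_effect_of_income(inc: float) -> float:
    """Derivative of predicted eligibility with respect to inc (inc in $1,000)."""
    return B_INC + 2.0 * B_INC_SQ * inc


effect_at_200 = marginal_effect_of_income(200.0)
# Income level (in $1,000) at which the marginal effect changes sign.
turning_point = -B_INC / (2.0 * B_INC_SQ)
print(round(effect_at_200, 4), round(turning_point, 1))
```

Under these estimates the marginal effect is zero at inc = 100 (family income of $100,000) and negative above it, which is why the effect at $200,000 is negative.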
(b) (5 Marks) Is there evidence of gender difference at the 5% significance level
(clearly show test statistic and rejection rule in your answer)? (The 5% critical value
for a two-tailed t test is 1.96)
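Answer: \( t_{\hat{\beta}_{male}} = -.0035/.0011 = -3.18 \), and the 5% critical value is 1.96. Reject \( H_0: \beta_{male} = 0 \) if \( |t| > 1.96 \).
Since \( |-3.18| > 1.96 \), we reject \( H_0 \); that is, \( \beta_{male} \) is statistically different from 0. Therefore, there
is sufficient evidence of a gender difference at the 5% significance level.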
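The test statistic and rejection rule can be sketched in code as follows, using the coefficient and standard error on male from the estimated equation. The helper name `t_statistic` is hypothetical.

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """t statistic for H0: coefficient = null_value."""
    return (estimate - null_value) / std_error


t_male = t_statistic(-0.0035, 0.0011)
CRITICAL_5PCT = 1.96  # two-tailed 5% critical value given in the question
reject_null = abs(t_male) > CRITICAL_5PCT
print(f"t = {t_male:.2f}, reject H0: {reject_null}")
```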
(c) (8 Marks) Now, we specify the following version of the above model:
\( e401k = \beta_0 + \beta_1\, inc + \beta_2\, inc^2 + \beta_3\, age + \beta_4\, age^2 \)
\( \qquad + \beta_5\, male + \beta_6\, married \cdot male + \beta_7\, married + u \),
where married is a binary variable indicating whether the individual is married or
not (married = 1 if married, and = 0 otherwise). What is the interpretation of
\( \beta_7 - \beta_5 \)?
Answer: It is the predicted difference in the probability of 401(k) eligibility between a married female
and a single male, holding income and age fixed.
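Writing out the dummy-variable terms for each group makes this interpretation easy to verify. In the sketch below, \( g(inc, age) \) is shorthand introduced here for the common part \( \beta_0 + \beta_1 inc + \beta_2 inc^2 + \beta_3 age + \beta_4 age^2 \), and the error term is assumed to have zero conditional mean.

```latex
\begin{aligned}
E(e401k \mid \text{single female})  &= g(inc, age) \\
E(e401k \mid \text{single male})    &= g(inc, age) + \beta_5 \\
E(e401k \mid \text{married female}) &= g(inc, age) + \beta_7 \\
E(e401k \mid \text{married male})   &= g(inc, age) + \beta_5 + \beta_6 + \beta_7 \\[4pt]
E(e401k \mid \text{married female}) - E(e401k \mid \text{single male}) &= \beta_7 - \beta_5
\end{aligned}
```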
(d) (7 Marks) Referring to the model in part (c), discuss how you could test whether
marriage has the same effect on 401(k) eligibility for males and females.
Answer: The marriage effect for females is simply \( \beta_7 \), and the marriage effect for
males is \( \beta_6 + \beta_7 \). So to test whether marriage has the same effect, we only need to
test \( H_0: \beta_6 = 0 \) against \( H_1: \beta_6 \neq 0 \), using the t statistic on \( married \cdot male \).
Question 3 (30 Marks)
A researcher wants to understand how class attendance would affect students’ final exam
performance. The dependent variable for the analysis is the standardized outcome on a
final exam at a university, stndfnl. Independent variables include atndrte, which indicates
the percentage of classes attended and priGPA, i.e., prior college GPA (on a 4.0 scale).
(a) (5 Marks) Consider first a linear regression of stndfnl on atndrte, given by the
summary below. What is the estimated impact of attendance on final exam
performance? Is it statistically significant?
Answer: Because atndrte is measured as a percentage, a 1
percentage point increase in atndrte is predicted to increase stndfnl by 0.008128
standard deviations of the final exam score. It is significant at the 0.1% level.
(b) (8 Marks) Now we add priGPA to the model in part (a). The summary is given
below. What is the estimated effect of atndrte on stndfnl? Discuss why the estimate
is different from that obtained in part (a).
Answer: After we add priGPA, a 1 percentage point increase in atndrte is predicted
to decrease stndfnl by 0.001156 standard deviations of the final exam
score, and the estimate becomes statistically insignificant even at the 10% level. We can explain
the difference by omitted variable bias. priGPA is likely to be positively
correlated with a student’s final exam score, and it is also likely to be positively
correlated with a student’s class attendance. So when we omit priGPA from the
model, the coefficient on atndrte has a positive bias. More importantly, this regression shows that,
after controlling for priGPA, attendance rate has no statistically significant effect on final
exam performance.
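The direction of the bias can be read off the standard omitted-variable formula. In this sketch, \( \tilde{\beta}_1 \) is the atndrte slope from the regression in part (a), \( \hat{\beta}_1 \) and \( \hat{\beta}_2 \) are the slopes on atndrte and priGPA from part (b), and \( \tilde{\delta}_1 \) is the slope from regressing priGPA on atndrte. This notation is introduced here and is not part of the original answer.

```latex
\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\,\tilde{\delta}_1,
\qquad
\operatorname{Bias}(\tilde{\beta}_1) = E(\tilde{\beta}_1) - \beta_1 = \beta_2\,\tilde{\delta}_1 .
```

With \( \beta_2 > 0 \) (priGPA raises exam scores) and \( \tilde{\delta}_1 > 0 \) (priGPA and attendance move together), the bias is positive, which is consistent with the estimate falling from 0.008128 to -0.001156 once priGPA is included.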
(c) (10 Marks) Now, we add \( priGPA^2 \) to the model in part (b). What is the effect of
priGPA on the final exam score? Does it make sense to you? Explain.
Answer: The coefficient on priGPA is negative, and the coefficient on the quadratic
term is positive. This implies that stndfnl first decreases in priGPA, but the slope
increases as priGPA increases, so the relationship eventually turns around and becomes
increasing in priGPA. The turning point is 1.888/(2 × 0.484) = 1.95. It is hard
to believe that final exam performance decreases in prior college GPA. This is the
cost of using a quadratic to capture nonlinear effects. If the turning point of 1.95 is
below all but a small percentage of students in the sample, then this is not of much
concern. Another possibility is that the estimated effect of priGPA on stndfnl is
biased because we have controlled for no factors other than atndrte.
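The turning point and the sign change in the slope can be reproduced as a quick check. The sketch uses the two magnitudes quoted in the answer, with the signs taken from its description (negative on priGPA, positive on the squared term); the names are made up for illustration.

```python
# Magnitudes quoted in the answer; signs follow its description.
B_PRIGPA = -1.888
B_PRIGPA_SQ = 0.484


def marginal_effect_of_prigpa(prigpa: float) -> float:
    """Derivative of predicted stndfnl with respect to priGPA."""
    return B_PRIGPA + 2.0 * B_PRIGPA_SQ * prigpa


turning_point = -B_PRIGPA / (2.0 * B_PRIGPA_SQ)
print(f"turning point: {turning_point:.2f}")
for gpa in (1.0, 2.0, 3.0, 4.0):
    print(gpa, round(marginal_effect_of_prigpa(gpa), 3))
```

The estimated slope is negative only below a priGPA of about 1.95 and positive above it, so the implausible region matters only if many students in the sample have a prior GPA that low.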
(d) (7 Marks) Someone asserts that priGPA has no impact on the final exam
performance. Explain whether you would agree with this assertion based on the
regression results in part (a) and part (c). If you need more information to draw a
conclusion, give a brief explanation on what else is required. (The 5% critical value
for an F test with dfs 2 and 676 is 3.00)
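Answer: The null hypothesis is that the coefficients on priGPA and \( priGPA^2 \) in the
model in part (c) are both zero. We can calculate the F statistic using the \( R^2 \) values from parts
(a) and (c), where part (a) is the restricted model (r) and part (c) is the unrestricted model (ur). The F statistic is
\( F = \dfrac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n - k - 1)} = \dfrac{(.1699 - .01961)/2}{(1 - .1699)/676} = 61.20 \)
The 5% critical value is 3.00, so one would reject the null hypothesis at that level and would not agree with the assertion.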
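For readers who want to verify the number, here is a minimal sketch of the R-squared form of the F statistic, using the values quoted in the answer. The function name `f_stat_from_r2` is hypothetical.

```python
def f_stat_from_r2(r2_ur: float, r2_r: float, q: int, df_resid: int) -> float:
    """R-squared form of the F statistic for q exclusion restrictions.

    df_resid is n - k - 1 for the unrestricted model.
    """
    numerator = (r2_ur - r2_r) / q
    denominator = (1.0 - r2_ur) / df_resid
    return numerator / denominator


f_value = f_stat_from_r2(r2_ur=0.1699, r2_r=0.01961, q=2, df_resid=676)
CRITICAL_5PCT = 3.00  # 5% critical value for F(2, 676) given in the question
print(f"F = {f_value:.2f}, reject H0: {f_value > CRITICAL_5PCT}")
```

The R-squared form applies here because both models have the same dependent variable, stndfnl.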
Question 4 (20 Marks)
Consider two regression models, model (a): \( y = \beta_0 + u \) and model (b): \( y = \beta_0 + \beta_1 x + u \). We
estimate both models with a sample of 102 observations. Is it possible that model (b) has
a higher AIC than model (a), but \( x \) is statistically significant at the 1% level in model (b)?
Explain. (The 1% critical value for an F test with dfs 1 and 100 is 6.895)
Answer: