Lahore School of Economics
BSc-III
Econometrics I (ECO 204)
Final Term Examination
Spring 2021
Azam Chaudhry Aimal Tanvir
Name: ______________ Section: _______
Total Marks: 100
Instructions: Do not change the order of the questions.
Use the values in the tables below where required.
Critical values for the t-distribution
Significance level t-critical
10% 1.64
5% 1.96
1% 2.58
Critical values for the F-distribution with (q, ∞) degrees of freedom
q Significance level (5%)
1 3.84
2 3.00
3 2.60
4 2.37
5 2.21
6 2.10
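The tabulated values can be cross-checked with a short calculation. The sketch below assumes that the t critical values are the two-sided large-sample (standard normal) values, and uses the fact that an F(q, ∞) variable is a chi-squared(q) variable divided by q. Only q = 1 and q = 2 are computed, because they have simple closed forms; the function names are illustrative and not part of the exam.

```python
import math
from statistics import NormalDist


def two_sided_z_critical(alpha: float) -> float:
    """Two-sided critical value of the standard normal at level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def f_critical_q1(alpha: float) -> float:
    """F(1, inf) critical value: the square of the two-sided z value."""
    return two_sided_z_critical(alpha) ** 2


def f_critical_q2(alpha: float) -> float:
    """F(2, inf) critical value.

    A chi-squared(2) variable is exponential with mean 2, so its upper
    alpha quantile is -2*ln(alpha); dividing by q = 2 gives -ln(alpha).
    """
    return -math.log(alpha)


for alpha in (0.10, 0.05, 0.01):
    print(f"{alpha:.0%} two-sided critical value: {two_sided_z_critical(alpha):.3f}")
print(f"F(1, inf) at 5%: {f_critical_q1(0.05):.3f}")
print(f"F(2, inf) at 5%: {f_critical_q2(0.05):.3f}")
```

The printed values agree with the tables above to within rounding.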
Question 1 (30 Marks):
Suppose a researcher is interested in how a child’s birth weight is affected by the average number of
cigarettes the mother smoked per day during pregnancy and by the gender of the baby. The researcher
estimates the following equation (standard errors are in brackets):
bwght_i = 3375.9 - 11.65 cigs_i + 88.6 male_i - 28.23 (cigs_i × male_i)
          (20.29)  (4.57)        (28.23)        (6.47)
R² = 0.133
where
bwght_i = child’s weight at birth, in grams
cigs_i = average number of cigarettes smoked per day by the mother during pregnancy
male_i = a dummy variable equal to one if the baby is male and equal to zero otherwise
a. Give economic interpretations for each of the estimated slope coefficients. Using one graph with
bwght_i on the vertical axis and cigs_i on the horizontal axis, show the relationship between bwght_i and
cigs_i for a male child and for a female child. (10 marks)
b. Test whether the coefficient of the cigs_i × male_i variable is significantly different from zero at
the 1% significance level. Does your result mean that the impact of increasing the average
number of cigarettes smoked is different for a male and a female child, at the 1% significance level? Explain
why or why not. (5 marks)
c. Suppose another researcher used the same dataset to estimate the same equation as above,
but set male_i = 0 for a male child and 1 otherwise. Write down the estimated equation that
the second researcher would obtain. If this equation is different from the estimated equation
above, does it mean that both researchers obtained different results from the same model
and dataset? Explain. (5 marks)
d. Suppose that instead of estimating the equation above, in which cigarettes smoked (cigs_i) is a continuous
variable, the researcher creates a second dummy variable cigssmoked_i which is equal to one if
the average number of cigarettes smoked is greater than 0 and zero otherwise. The researcher
estimates the following equation using cigssmoked_i and male_i (standard errors are in
brackets):
bwght_i = 3384.3 - 172.7 cigssmoked_i + 83.26 male_i - 25.19 (cigssmoked_i × male_i)
          (20.74)  (54.1)               (28.83)        (76.4)
R² = 0.017
Give an economic interpretation of the estimated intercept, the estimated coefficient of male_i
and the estimated coefficient of cigssmoked_i. (5 marks)
e. Carefully give an economic interpretation for the coefficient of cigssmoked_i × male_i. (5
marks)
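The arithmetic behind parts a and b of Question 1 can be sketched as follows. This is a minimal illustration, assuming a two-sided test against the 1% critical value of 2.58 from the table of critical values; the helper names are hypothetical, and the numbers are copied from the first estimated equation of Question 1.

```python
# Coefficients and standard error copied from the first equation in Question 1.
b_const, b_cigs, b_male, b_inter = 3375.9, -11.65, 88.6, -28.23
se_inter = 6.47
T_CRIT_1PCT = 2.58  # two-sided 1% critical value from the table


def t_statistic(coef: float, se: float, null_value: float = 0.0) -> float:
    """t = (estimate - hypothesised value) / standard error."""
    return (coef - null_value) / se


def fitted_line(male: int) -> tuple[float, float]:
    """Return (intercept, slope) of bwght on cigs for male = 0 or 1."""
    return b_const + b_male * male, b_cigs + b_inter * male


t_inter = t_statistic(b_inter, se_inter)
reject_at_1pct = abs(t_inter) > T_CRIT_1PCT

print(f"t-statistic on the interaction term: {t_inter:.3f}")
print(f"reject H0 at the 1% level: {reject_at_1pct}")
print("female (intercept, slope):", fitted_line(0))
print("male   (intercept, slope):", fitted_line(1))
```

The two (intercept, slope) pairs are the two lines to draw on the graph in part a; the economic interpretation is left to the answer.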
Question 2 (15 Marks):
The following graph depicts the number of deaths in Korea due to the coronavirus from
22nd January, 2020 to 10th May, 2020. The vertical axis shows the number of deaths in
Korea due to the coronavirus and the horizontal axis shows the date.
[Figure: Corona Deaths in Korea (vertical axis, ticks from 2000 to 10000) plotted against date (horizontal axis, 01feb2020 to 01may2020)]
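a) We estimated the following regression using the above data.

. reg coronadeathsinkorea date aprildummy date_aprildummy

      Source |       SS           df       MS      Number of obs   =       110
-------------+----------------------------------   F(3, 106)       =    282.36
       Model |  2.0502e+09         3   683401514   Prob > F        =    0.0000
    Residual |   256551535       106   2420297.5   R-squared       =    0.8888
-------------+----------------------------------   Adj R-squared   =    0.8856
       Total |  2.3068e+09       109  21162899.8   Root MSE        =    1555.7

---------------------------------------------------------------------------------
coronadeathsi~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           date |    133.598   5.816679    22.97   0.000     122.0659    145.1301
     aprildummy |    2362817   733845.1     3.22   0.002     907898.2     3817737
date_aprildummy |   -107.281   33.32744    -3.22   0.002    -173.3558   -41.20608
          _cons |   -2931817   127846.4   -22.93   0.000     -3185285    -2678349
---------------------------------------------------------------------------------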
Variable          Description
date              Time span from January 22nd to May 10th, 2020
aprildummy        =1 for days in the month of April and 0 otherwise
date_aprildummy   date*aprildummy (interaction term)
t                 Time variable
tsq               t*t = time squared
Looking at the regression results above, what can you conclude about the coronavirus deaths
in Korea? Explain carefully. (5 marks)
b) It has been theorized that higher temperatures may reduce the incidence of coronavirus
(and thus deaths). It could therefore be useful to take the daily data on deaths in Korea
due to coronavirus (coronadeathsinkorea) and regress it on daily high temperatures
(temp_korea) and a time trend (t). This relationship is tested in the regression below.
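
. reg coronadeathsinkorea temp_korea t

      Source |       SS           df       MS      Number of obs   =       110
-------------+----------------------------------   F(2, 107)       =    379.16
       Model |  2.0215e+09         2  1.0108e+09   Prob > F        =    0.0000
    Residual |   285241600       107  2665809.34   R-squared       =    0.8763
-------------+----------------------------------   Adj R-squared   =    0.8740
       Total |  2.3068e+09       109  21162899.8   Root MSE        =    1632.7

------------------------------------------------------------------------------
coronadeat~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  temp_korea |   24.20041   37.11135     0.65   0.516    -49.36852    97.76934
           t |   131.2576   7.512697    17.47   0.000     116.3645    146.1506
       _cons |  -1414.707   357.8535    -3.95   0.000     -2124.11    -705.304
------------------------------------------------------------------------------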
What can you say about the relationship between the coronavirus deaths in Korea and daily
high temperatures? (5 marks)
c) Using this data, one can estimate when the death toll due to this virus will stop rising by
running the following regression. Using the results below, estimate the number of days it
takes for the daily increase in deaths to fall to zero. (5 marks)
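
. reg coronadeathsinkorea t tsq

      Source |       SS           df       MS      Number of obs   =       110
-------------+----------------------------------   F(2, 107)       =    524.95
       Model |  2.0934e+09         2  1.0467e+09   Prob > F        =    0.0000
    Residual |   213349041       107  1993916.27   R-squared       =    0.9075
-------------+----------------------------------   Adj R-squared   =    0.9058
       Total |  2.3068e+09       109  21162899.8   Root MSE        =    1412.1

------------------------------------------------------------------------------
coronadeat~a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           t |   235.2709   17.10752    13.75   0.000     201.3573    269.1846
         tsq |  -.9036152    .149313    -6.05   0.000    -1.199611   -.6076195
       _cons |  -3174.452   411.3608    -7.72   0.000    -3989.927   -2358.978
------------------------------------------------------------------------------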
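The calculation asked for in part c of Question 2 amounts to finding where the fitted quadratic in t stops rising, that is, where its derivative with respect to t equals zero. The sketch below is one way to do that arithmetic. It assumes that t counts days from the start of the sample, which the variable description does not state explicitly, and the function name is illustrative.

```python
def turning_point(b_linear: float, b_quadratic: float) -> float:
    """Value of x where d/dx (b_linear*x + b_quadratic*x**2) = 0."""
    if b_quadratic == 0:
        raise ValueError("no turning point: the quadratic coefficient is zero")
    return -b_linear / (2 * b_quadratic)


# Coefficients on t and tsq copied from the regression output in part c.
b_t, b_tsq = 235.2709, -0.9036152

t_star = turning_point(b_t, b_tsq)
print(f"fitted deaths stop rising at about t = {t_star:.1f}")
```

Because the coefficient on tsq is negative, the turning point is a maximum of the fitted curve. The implied value lies beyond the 110 observations in the sample, so it is an extrapolation.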
Question 3 (10 Marks):
Suppose a graduate of the Lahore School of Economics is interested in determining the factors that
affect election results. Create a simple political economy model which estimates the number of
votes a particular candidate may get in the upcoming Pakistani elections. Make sure to explain what
type of data you would gather, where you would gather this data from, what your dependent variable
would be (including its units of measurement) and what your independent variables would be
(including their units of measurement). Make sure to include at least five independent variables.
Also explain what you would expect to get for the signs of the estimated slope coefficients of your
model, giving brief economic justifications for each of your hypothesized signs.
Question 4 (15 Marks):
An econometrician was interested in investigating whether firms in the chemical industry experience economies of
scale. To do this, the econometrician took data from firms in the chemical industry and estimated the
following equation (with standard errors in brackets):
rd_i = 7.83 - 0.53 sales_i + 0.00029 sales_i^2 + 2.04 profits_i
(se)          (0.253)        (0.000024)          (0.023)
where
rd_i = research and development spending in firm i, in millions
sales_i = sales of firm i, U.S. Dollars
profits_i = Average worker’s salary in firm i, U.S. Dollars
a) Explain why the econometrician may have chosen this particular functional form to measure
whether the chemical industry experiences economies of scale. (5 marks)
b) Explain what the estimated coefficients of the sales and the sales squared variables imply. (5
marks)
c) Suppose your econometrics instructor told you that there are increasing returns to scale for
firms with up to a certain level of sales and decreasing returns to scale for firms above that
level of sales. Explain what this means and calculate the level of sales at which
the returns to scale change, using the equation above. (5 marks)
Question 5 (15 Marks):
The regressions for this problem are based on ZIP code-level data on prices for various items at
fast-food restaurants, along with characteristics of the ZIP code population, in New Jersey and
Pennsylvania. The idea is to see whether fast-food restaurants charge higher prices in areas with a
larger concentration of Black residents. The two regressions below regress the price of soda on a set of
independent variables based on this model.
Dependent variable: soda

                    (1)            (2)
prpblck           0.0987         0.0989
                 (0.0362)       (0.0276)
NJ                0.0938           -
                 (0.1293)
income           -2.384         -1.839
                 (0.984)        (0.930)
emp               0.01985        0.0673
                 (0.000422)     (0.0004)
crmrte            0.141          0.473
                 (0.103)        (0.106)
hseval            7.0173         0.00293
                 (0.00372)      (0.008238)
Constant          0.383          1.039
                 (0.0217)       (0.0218)
Observations      394            394
R-squared         0.398          0.364
Standard errors in parentheses
where
soda = price of soda, in US dollars
prpblck = proportion of the population that is Black
NJ = 1 if New Jersey and 0 if Pennsylvania
income = median family income
emp = number of employees
crmrte = crime rate
hseval = median housing value
Note: In the first regression, the dummy variable NJ is included; in the second, it is omitted. It is
assumed that all the classical assumptions of the regression model are satisfied. Standard errors are
in parentheses.
i) How would you test the hypothesis that all coefficients in the first model except the constant
term are equal to zero? Carry out the test and explain the implication of your result. (5 marks)
ii) You want to test the underlying null hypothesis that the price of soda is the same in NJ and
Pennsylvania, using a 5% significance level. In doing so, specify: (5 marks)
a) Null hypothesis:
b) Alternative hypothesis:
c) Test statistic:
d) What can you conclude from this test? Interpret your results in language familiar to
policymakers.
iii) Your friend sees that the R² for both models is almost the same, and concludes that the NJ
dummy adds no value to your model. Based on the evidence, do you agree with this assessment?
Explain. (5 marks)
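For parts i and ii of Question 5, the test statistics can be computed directly from the reported table. The sketch below assumes the usual R²-based form of the overall F-statistic, with six slope coefficients in model (1), and a two-sided t-test on the NJ coefficient; the function names are illustrative, and the critical values are taken from the tables of critical values at the start of the paper.

```python
def overall_f(r_squared: float, n_obs: int, n_slopes: int) -> float:
    """F-statistic for H0: all slope coefficients are zero.

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k slope coefficients.
    """
    return (r_squared / n_slopes) / ((1 - r_squared) / (n_obs - n_slopes - 1))


def t_statistic(coef: float, se: float) -> float:
    """t-statistic for H0: coefficient equals zero."""
    return coef / se


# Model (1): prpblck, NJ, income, emp, crmrte, hseval -> 6 slopes, n = 394.
f_model1 = overall_f(0.398, 394, 6)
t_nj = t_statistic(0.0938, 0.1293)

F_CRIT_Q6_5PCT = 2.1   # F(6, inf) critical value at 5%, from the table
T_CRIT_5PCT = 1.96     # two-sided 5% critical value, from the table

print(f"overall F, model (1): {f_model1:.2f}, reject H0: {f_model1 > F_CRIT_Q6_5PCT}")
print(f"t on NJ: {t_nj:.3f}, reject H0: {abs(t_nj) > T_CRIT_5PCT}")
```

Interpreting the two outcomes in words, as parts i and ii require, is left to the answer.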
Question 6 (15 Marks):
Hamermesh and Biddle (1994) wanted to estimate the impact of physical attractiveness on an individual’s wage.
They collected a sample in which each person was ranked by an interviewer
for physical attractiveness, using five categories (homely, quite plain, average, good looking and
strikingly beautiful). Because there are so few people at the two extremes, the authors put people
into one of two groups for the regression analysis: below average and above average. The
researchers then computed the following summary statistics for the variables lwage, exper and
educ.
Table 1
Table 2
Where;
lwage: log(hourly wage); belavg: =1 if looks <= 3; abvavg: =1 if looks >=4;
exper: years of workforce experience; educ: years of schooling
i. What do the two different output tables for the summary statistics represent?
Explain why there are two different values of the mean for the variable lwage. (5
marks)
ii. The impact of physical attractiveness on hourly wage is determined using the
following equation:
lwage_i = β_0 + β_1 abvavg_i + μ_i
a. Find the values of β_0 and β_1 using Tables 1 and 2 and rewrite the above
equation. (5 marks)
b. Using your answer to part a, interpret β_0 and β_1. (5 marks)