Problem Set 2 - 2025

R exercises - statistical inference

Uploaded by

thais

Econometrics II

Federal University of ABC

Q2-2025

Problem set #1

Lecturer: Daniel Roland

1) Which of the following are consequences of heteroscedasticity?

(i) The OLS estimators, $\hat{\beta}_j$, are inconsistent.
(ii) The usual F statistic no longer has an F distribution.
(iii) The OLS estimators are no longer BLUE.

2) Researchers at an American university were interested in finding the determinants of
students’ term GPA. They formulated and estimated the following regression with the usual
standard errors in parentheses and the heteroskedasticity-robust standard errors in
brackets.

where trmgpa is term GPA, crsgpa is a weighted average of overall GPA in courses taken,
cumgpa is GPA prior to the current semester, tothrs is total credit hours prior to the
semester, sat is SAT score, hsperc is graduating percentile in high school class, female is a
gender dummy, and season is a dummy variable equal to unity if the student’s sport is in
season during the fall.
Based on the information presented above, it is possible to say that (mark the correct
sentences):

( ) The variables crsgpa and cumgpa have the expected estimated effects, but the effect
of tothrs seems to be the opposite of what is expected.

( ) The variables crsgpa, cumgpa and tothrs have the expected estimated effects, but only
tothrs seems to be statistically significant.

( ) The differences between the usual and the heteroskedasticity-robust standard errors
do not change the statistical significance of any of the regressors.

( ) The F-ratio for this regression is 32.40 and the associated p-value is less than 0.001,
which, in the absence of heteroskedasticity, shows that the regressors are helpful in
explaining term GPA.

( ) Multicollinearity is possibly present as crsgpa, cumgpa and tothrs are possibly highly
correlated. A high correlation between these variables would be evidence of
multicollinearity.

3) Answer True (T) or False (F) in the following statements:

( ) A high correlation between two or more explanatory variables in a linear regression
model does not affect any of the OLS estimator properties.

( ) Under heteroskedasticity, the OLS estimators for the parameters of the multiple
linear regression models continue to be the most efficient (i.e. with the least variance), as
long as the assumptions MLR.1 through MLR.4 are valid.

( ) If $\hat{\sigma}^2$ (the error variance estimator) is biased and inconsistent, it is still possible to
create valid confidence intervals if there is a sufficiently large sample.

( ) If the homoskedasticity assumption is rejected, the OLS estimators will be biased.

( ) Under multicollinearity, it is not possible to estimate the slope coefficients using an
OLS regression.

4) Suppose that a researcher, when conducting a study on the determinants of automobile
accidents, decided to investigate whether the homoscedasticity assumption was met in his
regression model. Instead of looking at the scatterplot between the dependent variable and
each independent variable, the researcher decided to look at the relationship between the
residuals and the fitted values (of the response variable). He came up with the following
graph:

Graph 1 – Residual analysis of a regression model on the determinants of car accidents.

Based on the graph above, the researcher would be correct in saying (mark the true
statement(s)):

( ) There seems to be multicollinearity present in the regression model.

( ) The variance of the residuals seems to change across the range of the fitted values.

( ) Given there is heteroskedasticity, the estimator of the error variance is inconsistent and
biased.

( ) The standard error of each coefficient is no longer valid and this affects the t-ratio,
p-value and confidence intervals.

( ) As multicollinearity is present, a solution to improve the model would be to increase
the sample size.

( ) A solution to heteroskedasticity would be to transform the variables by taking their
natural logs and running the regression again.
( ) A solution to heteroskedasticity would be to calculate the heteroskedasticity-robust
standard errors proposed by White.
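
As a rough illustration of the last option, the sketch below compares the usual standard error of a slope with the White (HC0) standard error in a simple regression. The data are simulated under the assumption that the error spread grows with x; the function name `simple_ols_with_se` and every number are illustrative only and are not taken from the car accident study.

```python
import math
import random

def simple_ols_with_se(x, y):
    """Fit y = b0 + b1*x by OLS; return the slope, its usual SE and its White (HC0) SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Usual SE: assumes a single error variance, estimated by SSR / (n - 2).
    sigma2_hat = sum(e * e for e in resid) / (n - 2)
    usual_se = math.sqrt(sigma2_hat / sst_x)
    # White SE: each squared residual is weighted by its own (x_i - x_bar)^2.
    robust_se = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid))) / sst_x
    return b1, usual_se, robust_se

# Simulated data (assumption): the error standard deviation is proportional to x.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(500)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

slope, usual_se, robust_se = simple_ols_with_se(x, y)
print(f"slope={slope:.3f}  usual_se={usual_se:.4f}  white_se={robust_se:.4f}")
```

The point estimate is the same in both cases; only the standard error, and therefore the t-ratio, p-value and confidence interval, changes.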

5) Using a dataset of crime statistics, a researcher was interested in testing whether the
average sentence length served for past convictions affects the number of arrests in the
current year (1986). The estimated model with usual standard errors in parentheses and
robust standard errors in brackets is:

where narr86 is the number of arrests for the man in 1986, pcnv is the proportion (not
percentage) of arrests prior to 1986 that led to conviction, avgsen is the average sentence
length served for prior convictions, ptime86 is months spent in prison in 1986, qemp86 is
the number of quarters during which the man was employed in 1986, inc86 is legal income
in 1986 in 100’s of dollars and black and hispan are ethnicity dummy variables.

Given the regression output above, calculate the following:

a) The t-ratios for each estimated coefficient using the usual and the robust standard
errors.
b) Whether or not the estimates are statistically significant at the 5% level using the usual
standard errors. What about using the robust standard errors?
c) The F-ratio for the regression shown and whether the regression is overall
statistically significant at the 5% level.
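
For part (c), one common route is the R-squared form of the overall F statistic, F = (R²/k) / ((1 − R²)/(n − k − 1)), which relies on homoskedasticity. The sketch below is a minimal helper for that formula. The regression output is not reproduced in this text, so the inputs are hypothetical placeholders, and the function name `f_from_r2` is an assumption made for illustration.

```python
def f_from_r2(r2, n, k):
    """Overall-significance F statistic from R-squared.

    r2: R-squared of the regression
    n:  number of observations
    k:  number of slope coefficients (excluding the intercept)
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Hypothetical inputs, NOT the values from the crime regression.
f_stat = f_from_r2(r2=0.10, n=100, k=4)
print(f"F = {f_stat:.3f} with (4, 95) degrees of freedom")
```

The resulting value would be compared with the 5% critical value of the F distribution with (k, n − k − 1) degrees of freedom.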

6) An American think-tank is trying to determine whether banks practice discrimination
against minorities in the mortgage loan market. After building a dataset, the researchers
developed an initial linear probability model. The binary variable to be explained is approve,
which is equal to one if a mortgage loan to an individual was approved. The key explanatory
variable is white, a dummy variable equal to one if the applicant was white. The other
applicants in the dataset are Black and Hispanic.

$$\mathit{approve} = \beta_0 + \beta_1 \mathit{white} + u$$

The estimated regression, with 1,989 observations, yields:

$$\widehat{\mathit{approve}} = \underset{(.018)}{.708} + \underset{(.020)}{.201}\,\mathit{white}$$

The researchers decide to control for other factors and find the following result:

$$\widehat{\mathit{approve}} = \underset{(.017)}{.645} + \underset{(.020)}{.129}\,\mathit{white} + \text{other factors}$$

Finally, the researchers are interested in looking at the interaction between being white and
having other financial obligations as a percentage of income (obrat). The coefficient for
white*obrat is .0081 and the t statistic is 3.53. Based on all of the information above, mark
True (T) or False (F) on the following statements:

( ) There is no discrimination against non-white individuals as the estimated coefficient for
white is positive.

( ) There is discrimination against non-white people, but the effect is reduced when
controls are added.

( ) When other factors are controlled for, being white is associated with approximately
0.13% higher chance of getting a mortgage loan approved by the bank in comparison with
a non-white person.

( ) The estimated coefficient for the interaction variable between being white and having
financial obligations, white*obrat, is not statistically significant.

( ) The fact that the estimated coefficient for white*obrat is positive indicates that a white
applicant is less penalized than a non-white applicant for having other obligations as a larger
percentage of income, indicating possible discrimination.
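
One way to read the interaction term is sketched below. The full specification is not printed in the text, so the form of the model (obrat entering both on its own and interacted with white) and the symbol $\Delta$ are assumptions made for illustration.

```latex
\begin{align*}
\mathit{approve} &= \beta_0 + \beta_1\,\mathit{white} + \beta_2\,\mathit{obrat}
  + \beta_3\,(\mathit{white}\cdot\mathit{obrat}) + \text{other factors} + u \\
\Delta(\mathit{obrat}) &= E(\mathit{approve}\mid \mathit{white}=1,\ \mathit{obrat})
  - E(\mathit{approve}\mid \mathit{white}=0,\ \mathit{obrat})
  = \beta_1 + \beta_3\,\mathit{obrat}
\end{align*}
```

Under this form, $\hat{\beta}_3 = .0081$ means the estimated gap in approval probability between white and non-white applicants changes by .0081 for each additional percentage point of obrat. The coefficient $\beta_1$ from the interaction model is not reported here, so the level of the gap at a given obrat cannot be computed from the text alone.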

7) All the sentences below are false. Rewrite them in a way that makes them correct:

(i) The linear probability model is biased by its own design, but is homoskedastic.
(ii) The probit and logit model can yield probabilities of success greater than one or lower
than zero, unlike the linear probability model.

(iii) The linear probability model, unlike the logit and probit models, allows variation of the
marginal effects according to different values of the independent variables.

(iv) The estimated coefficients for LPM, probit and logit are comparable and no adjustments
are necessary.
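
The contrast these sentences turn on can be seen in a small numerical sketch. The coefficients below are hypothetical, chosen only so that the linear index leaves the unit interval; the function names are likewise illustrative.

```python
import math

def lpm_prob(b0, b1, x):
    """Linear probability model: the fitted 'probability' is the linear index itself."""
    return b0 + b1 * x

def logit_prob(b0, b1, x):
    """Logit: the linear index is passed through the logistic function."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit_marginal_effect(b0, b1, x):
    """Derivative of the logit probability with respect to x: b1 * p * (1 - p)."""
    p = logit_prob(b0, b1, x)
    return b1 * p * (1.0 - p)

# Hypothetical coefficients, used for both models purely for illustration.
b0, b1 = 0.2, 0.1
for x in (0.0, 5.0, 10.0):
    print(
        f"x={x:4.1f}  LPM p={lpm_prob(b0, b1, x):.3f}  "
        f"logit p={logit_prob(b0, b1, x):.3f}  "
        f"LPM effect={b1:.3f}  logit effect={logit_marginal_effect(b0, b1, x):.4f}"
    )
```

The LPM marginal effect is the constant `b1`, while the logit effect depends on where `x` puts the fitted probability. Because a binary outcome has conditional variance p(x)(1 − p(x)), the LPM error variance also changes with `x`.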

8) To find out what factors determine the preference of the use of debit cards over credit
cards, researchers obtained data on 60 customers and considered the following model:

$$y = \beta_0 + \beta_1 \mathit{balance} + \beta_2 \mathit{ATM} + \beta_3 \mathit{interest} + u$$

where y = 1 for debit card users and zero otherwise; balance is the individual’s bank account
balance measured in dollars, ATM is the number of ATM transactions and interest is a
dummy variable indicating whether interest is received on the account.

The regression results are summarised in the table below:

VARIABLE      COEFFICIENT    COEFFICIENT*

Balance       .00028         .00028
              (.00015)       (.00014)
ATM           -.0269         -.0269
              (.208)         (.202)
Interest      -.3019         -.3019
              (.1448)        (.1353)
Constant      .3631          .3631
              (.1796)        (.1604)
R-squared     .1056          (.1056)

* heteroskedasticity-robust standard errors

(i) Which coefficient estimates are statistically significant at the 5% level?
(ii) Are the signs of all the coefficients what you expected? Why?
(iii) What is the interpretation of the coefficient for interest?
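
As a starting point for part (i), the t-ratios can be computed directly from the table as coefficient divided by standard error. The sketch below uses the values exactly as printed above; the dictionary layout and variable names are illustrative.

```python
# (coefficient, usual SE, robust SE), copied from the table above.
rows = {
    "Balance": (0.00028, 0.00015, 0.00014),
    "ATM": (-0.0269, 0.208, 0.202),
    "Interest": (-0.3019, 0.1448, 0.1353),
    "Constant": (0.3631, 0.1796, 0.1604),
}

# t-ratio = coefficient / standard error, once per type of standard error.
t_ratios = {
    name: (coef / se, coef / robust_se)
    for name, (coef, se, robust_se) in rows.items()
}

for name, (t_usual, t_robust) in t_ratios.items():
    print(f"{name:<9} t(usual)={t_usual:7.3f}  t(robust)={t_robust:7.3f}")
```

Each ratio is then compared in absolute value with the two-sided 5% critical value of the t distribution with n − k − 1 = 60 − 3 − 1 = 56 degrees of freedom.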
