
Exercise 2 Exam1practice Sa

This document provides practice questions for an exam in econometrics. It includes 4 questions about probability, hypothesis testing, and confidence intervals related to topics like normal distributions, difference of means, sample proportions, and earnings data. The questions are multiple choice or require calculations and short explanations. Sample size, confidence levels, null and alternative hypotheses, and interpretations of statistical results are discussed.

Uploaded by

Robert Joe

Exercise 2 Exam1Practice SA

Econometrics I (National University of Singapore)


Practice Questions for Exam 1

1) The height of male students at your college/university is normally distributed with a
mean of 70 inches and a standard deviation of 3.5 inches. If you had a list of telephone
numbers for male students for the purpose of conducting a survey, what would be the
probability of randomly calling one of these students whose height is:
(a) taller than 6'0"?
(b) between 5'3" and 6'5"?
(c) shorter than 5'7", the mean height of female students?
(d) shorter than 5'0"?
(e) taller than Shaquille O'Neal, the center of the Boston Celtics, who is 7'1" tall?
Compare this to the probability of a woman being pregnant for 10 months (300 days),
where days of pregnancy is normally distributed with a mean of 266 days and a standard
deviation of 16 days.

Answer:
(a) Pr(Z > 0.5714) = 0.2839;
(b) Pr( –2 < Z < 2) = 0.9545 or approximately 0.95;
(c) Pr(Z < -0.8571) = 0.1957;
(d) Pr(Z < -2.8571) = 0.0021;
(e) Pr(Z > 4.2857) = 0.000009 (the text does not show values above 2.99 standard
deviations; Pr(Z > 2.99) = 0.0014). For the pregnancy comparison, Pr(Z > 2.1250) = 0.0168.
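
The tail probabilities above can be checked without a printed table. The following is a minimal sketch that uses only the Python standard library; the helper names (`upper_tail`, `lower_tail`, `z_score`) are illustrative and not part of the original exercise.

```python
from math import erfc, sqrt

def upper_tail(z):
    """Pr(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

def lower_tail(z):
    """Pr(Z < z), using the symmetry of the normal distribution."""
    return upper_tail(-z)

MEAN, SD = 70.0, 3.5  # male height in inches, as given in the question

def z_score(height_inches):
    return (height_inches - MEAN) / SD

answers = {
    "a": upper_tail(z_score(72)),                          # taller than 6'0"
    "b": lower_tail(z_score(77)) - lower_tail(z_score(63)),  # between 5'3" and 6'5"
    "c": lower_tail(z_score(67)),                          # shorter than 5'7"
    "d": lower_tail(z_score(60)),                          # shorter than 5'0"
    "e": upper_tail(z_score(85)),                          # taller than 7'1"
    "pregnancy": upper_tail((300 - 266) / 16),             # longer than 300 days
}

for part, prob in answers.items():
    print(f"({part}) {prob:.6f}")
```

Heights are converted to inches by hand (for example, 6'0" is 72 inches) before standardizing.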

2) Adult males are taller, on average, than adult females. Visiting two recent American
Youth Soccer Organization (AYSO) under 12 year old (U12) soccer matches on a Saturday,
you do not observe an obvious difference in the height of boys and girls of that age. You
suggest to your little sister that she collect data on height and gender of children in 4th to
6th grade as part of her science project. The accompanying table shows her findings.

Height of Young Boys and Girls, Grades 4-6, in inches

         Mean    Standard deviation    n
Boys     57.8    3.9                   55
Girls    58.4    4.2                   57

(a) Let your null hypothesis be that there is no difference in the height of females and
males at this age level. Specify the alternative hypothesis.
(b) Find the difference in height and the standard error of the difference.
(c) Generate a 95% confidence interval for the difference in height.
(d) Calculate the t-statistic for comparing the two means. Is the difference statistically
significant at the 1% level? Which critical value did you use? Why would this number be
smaller if you had assumed a one-sided alternative hypothesis? What is the intuition
behind this?
Answer:
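(a) \(H_0: \mu_{Boys} - \mu_{Girls} = 0\) vs. \(H_1: \mu_{Boys} - \mu_{Girls} \neq 0\)

(b) \(\bar{Y}_{Boys} - \bar{Y}_{Girls} = -0.6\), \(SE(\bar{Y}_{Boys} - \bar{Y}_{Girls}) = \sqrt{\frac{3.9^2}{55} + \frac{4.2^2}{57}} = 0.77\).

(c) -0.6 ± 1.96 × 0.77 = (-2.11, 0.91).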

This study source was downloaded by 100000791990696 from CourseHero.com on 04-27-2022 22:26:53 GMT -

https://www.coursehero.com/file/13236881/exercise-2-ExaGme1dPorawcntilcoeaSdAd/ oor kmt mdy



(d) t = -0.78, so |t| < 2.58, which is the critical value at the 1% level. Hence you cannot
reject the null hypothesis. The critical value for the one-sided hypothesis would have been
2.33. Assuming a one-sided hypothesis implies that you have some information about
the problem at hand, and, as a result, can be more easily convinced than if you had no
prior expectation.
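
The calculations in (b) to (d) can be reproduced from the summary statistics alone. This is a small sketch under the assumption that the table columns are mean, standard deviation, and sample size; the function name `diff_and_se` is hypothetical.

```python
from math import sqrt

# Summary statistics as (mean, standard deviation, sample size).
boys = (57.8, 3.9, 55)
girls = (58.4, 4.2, 57)

def diff_and_se(group_a, group_b):
    """Difference in means and its standard error, allowing unequal variances."""
    mean_a, sd_a, n_a = group_a
    mean_b, sd_b, n_b = group_b
    diff = mean_a - mean_b
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return diff, se

diff, se = diff_and_se(boys, girls)
t_stat = diff / se
ci_95 = (diff - 1.96 * se, diff + 1.96 * se)
reject_at_1pct = abs(t_stat) > 2.58  # two-sided critical value at the 1% level

print(f"difference = {diff:.2f}, SE = {se:.2f}, t = {t_stat:.2f}")
print(f"95% CI = ({ci_95[0]:.2f}, {ci_95[1]:.2f}), reject at 1%: {reject_at_1pct}")
```

Because the code keeps the unrounded standard error, its interval endpoints differ from the answer key in the second decimal; the key rounds the standard error to 0.77 before multiplying by 1.96.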

3) Assume that two presidential candidates, call them Bush and Gore, receive 50% of the
votes in the population. You can model this situation as a Bernoulli trial, where Y is a random
variable with success probability Pr(Y = 1) = p, and where Y = 1 if a person votes for Bush
and Y = 0 otherwise. Furthermore, let \(\hat{p}\) be the fraction of successes (1s) in a sample, which
is distributed \(N\left(p, \frac{p(1-p)}{n}\right)\) in reasonably large samples, say for n ≥ 40.

(a) Given your knowledge about the population, find the probability that in a random sample
of 40, Bush would receive a share of 40% or less.
(b) How would this situation change with a random sample of 100?
(c) Given your answers in (a) and (b), would you be comfortable predicting the
voting intentions for the entire population if you did not know p but had polled
10,000 individuals at random and calculated \(\hat{p}\)? Explain.
(d) This result seems to hold whether you poll 10,000 people at random in the
Netherlands or the United States, where the former has a population of less than 20
million people, while the United States is 15 times as populous. Why does the population
size not come into play?

Answer:
(a) \(\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/40}}\right) = \Pr(Z < -1.26) \approx 0.104\). In roughly every 10th sample
of this size, Bush would receive a vote of less than 40%, although in truth, his share is 50%.
(b) \(\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/100}}\right) = \Pr(Z < -2.00) \approx 0.023\). With this sample size, you
would expect this to happen only every 50th sample.


(c) The answers in (a) and (b) suggest that for even moderate increases in the sample size,
the estimator does not vary too much from the population mean. Polling 10,000 individuals,
the probability of finding a \(\hat{p}\) of 0.48 or less, for example, would be 0.00003. Unless the election
was extremely close, which the 2000 election was, polls are quite accurate even for sample
sizes of 2,500.
(d) The spread of the sampling distribution of the sample mean shrinks very quickly as the
sample size grows; it depends on the sample size, not the population size. Although at first
this does not seem intuitive, the standard error of an estimator is a value which indicates by
how much the estimator varies around the population value. For large sample sizes, the
sample mean typically is very close to the

population mean.
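
The probabilities in (a) to (c) follow directly from the normal approximation for the sample share. The sketch below is one way to compute them; the function names are illustrative. Note that the population size never appears as an argument, which is the point of part (d).

```python
from math import erfc, sqrt

def normal_cdf(z):
    """Pr(Z < z) for a standard normal Z."""
    return 0.5 * erfc(-z / sqrt(2))

def prob_share_below(threshold, p, n):
    """Approximate Pr(p_hat < threshold) when p_hat ~ N(p, p(1-p)/n)."""
    se = sqrt(p * (1 - p) / n)
    return normal_cdf((threshold - p) / se)

p = 0.5  # true population share, as assumed in the question

prob_n40 = prob_share_below(0.40, p, 40)
prob_n100 = prob_share_below(0.40, p, 100)
prob_n10000 = prob_share_below(0.48, p, 10_000)

print(f"n = 40:     Pr(p_hat < 0.40) = {prob_n40:.3f}")
print(f"n = 100:    Pr(p_hat < 0.40) = {prob_n100:.3f}")
print(f"n = 10,000: Pr(p_hat < 0.48) = {prob_n10000:.5f}")
```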

4) You have collected weekly earnings and age data from a sub-sample of 1,744 individuals
using the Current Population Survey in a given year.
(a) Given the overall mean of $434.49 and a standard deviation of $294.67, construct a
99% confidence interval for average earnings in the entire population. State the meaning of
this interval in words, rather than just in numbers. If you constructed a 90% confidence
interval instead, would it be smaller or larger? What is the intuition?
(b) When dividing your sample into people 45 years and older, and younger than 45,
the information shown in the table is found.

Age Category    Average Earnings (\(\bar{Y}\))    Standard Deviation (\(s_Y\))    N

Age ≥ 45        $488.87                       $328.64                      507
Age < 45        $412.20                       $276.63                      1237

Test whether or not the difference in average earnings is statistically significant. Given your
knowledge of age-earning profiles, does this result make sense?

Answer:
(a) The confidence interval for mean weekly earnings is
\(434.49 \pm 2.58 \times \frac{294.67}{\sqrt{1744}} = 434.49 \pm 18.20 = (416.29, 452.69)\).
Based on the sample at hand, the best guess for the population mean is
$434.49. However, because of random sampling error, this guess is likely to be wrong.
Instead, the interval estimate for the average earnings lies between $416.29 and $452.69.
Committing to such an interval repeatedly implies that the resulting statement is incorrect
1 out of 100 times. For a 90% confidence interval, the only change in the calculation of the
confidence interval is to replace 2.58 by 1.64. Hence the confidence interval is smaller. A
smaller interval implies, given the same average earnings and the standard deviation, that
the statement will be false more often. The larger the confidence interval, the more likely
it is to contain the population value.
(b) Assuming unequal population variances,
\(t = \frac{488.87 - 412.20}{\sqrt{\frac{328.64^2}{507} + \frac{276.63^2}{1237}}} = 4.62\), which is
statistically significant at conventional levels whether you use a two-sided or one-sided
alternative. Hence the null hypothesis of equal average earnings in the two groups is
rejected. Age-earning profiles typically take on an inverted U-shape. Maximum earnings
occur in the 40s, depending on some other factors such as years of education, which are
not considered here. Hence it is not clear if the alternative hypothesis should be one-sided
or two-sided. In such a situation, it is best to assume a two-sided alternative hypothesis.
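
Both parts of this answer reduce to a few lines of arithmetic. The following sketch recomputes the 99% and 90% intervals and the t-statistic from the figures given in the question; the function names are hypothetical, and the critical values 2.58 and 1.64 are those used in the answer.

```python
from math import sqrt

def confidence_interval(mean, sd, n, critical_value):
    """Large-sample confidence interval for a population mean."""
    half_width = critical_value * sd / sqrt(n)
    return mean - half_width, mean + half_width

ci_99 = confidence_interval(434.49, 294.67, 1744, 2.58)
ci_90 = confidence_interval(434.49, 294.67, 1744, 1.64)

def t_unequal_variances(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic for a difference in means without pooling the variances."""
    return (mean_a - mean_b) / sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

t_age = t_unequal_variances(488.87, 328.64, 507, 412.20, 276.63, 1237)

print(f"99% CI: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
print(f"90% CI: ({ci_90[0]:.2f}, {ci_90[1]:.2f})")
print(f"t = {t_age:.2f}")
```

The 90% interval is narrower only because the critical value is smaller; the mean and the standard error are unchanged.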

5) Sir Francis Galton, a cousin of Charles Darwin, examined the relationship between the
height of children and their parents towards the end of the 19th century. It is from this
study that the name "regression" originated. You decide to update his findings by
collecting data from 110 college students, and estimate the following relationship:


\(\widehat{Studenth} = 19.6 + 0.73 \times Midparh\), \(R^2 = 0.45\), SER = 2.0

where Studenth is the height of students in inches, and Midparh is the average of the
parental heights. (Following Galton's methodology, both variables were adjusted so that
the average female height was equal to the average male height.)
(a) Interpret the estimated coefficients.
(b) What is the meaning of the regression R2?
(c) What is the prediction for the height of a child whose parents have an average height of
70.06 inches?
(d) What is the interpretation of the SER here?
(e) Given the positive intercept and the fact that the slope lies between zero and one, what
can you say about the height of students who have quite tall parents? Those who have
quite short parents?
Answer:
(a) For every one inch increase in the average height of their parents, the student's
height increases by 0.73 of an inch. There is no reasonable interpretation for the
intercept.
(b) The model explains 45 percent of the variation in the height of students.
(c) 19.6 + 0.73 × 70.06 = 70.74.
(d) The SER is a measure of the spread of the observations around the regression line.
The magnitude of the typical deviation from the regression line or the typical regression
error here is two inches.
(e) Tall parents will have, on average, tall students, but they will not be as tall as their
parents. Short parents will have short students, although on average, they will be
somewhat taller than their parents.
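
The regression-to-the-mean pattern in (e) can be made concrete by evaluating the fitted line at a few parental heights. This is an illustrative sketch: the heights 64 and 76 inches are arbitrary examples, and `crossover` is a derived quantity (the height at which the prediction equals the parents' average), not a number from the original answer.

```python
INTERCEPT, SLOPE = 19.6, 0.73  # estimated coefficients from the question

def predicted_height(midparent_height):
    """Fitted student height (inches) for a given average parental height."""
    return INTERCEPT + SLOPE * midparent_height

# Height at which predicted student height equals midparent height:
# h = 19.6 + 0.73 h  =>  h = 19.6 / (1 - 0.73)
crossover = INTERCEPT / (1 - SLOPE)

for parents in (64.0, 70.06, 76.0):
    child = predicted_height(parents)
    print(f"parents {parents:.2f} in -> predicted student {child:.2f} in "
          f"(gap {child - parents:+.2f})")

print(f"crossover height: {crossover:.2f} in")
```

Below the crossover the predicted student is taller than the parents, and above it shorter, which is what a positive intercept combined with a slope between zero and one implies.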

6) The baseball team nearest to your home town is, once again, not doing well. Given that
your knowledge of what it takes to win in baseball is vastly superior to that of
management, you want to find out what it takes to win in Major League Baseball (MLB).
You therefore collect the winning percentage of all 30 baseball teams in MLB for 1999 and
regress the winning percentage on what you consider the primary determinant for wins,
which is quality pitching (team earned run average). You find the following information on
team performance:

Summary of the Distribution of Winning Percentage and
Team Earned Run Average for MLB in 1999

                                Standard   Percentile
                      Average   deviation  10%    25%    40%    50% (median)  60%    75%    90%
Team ERA              4.71      0.53       3.84   4.35   4.72   4.78          4.91   5.06   5.25
Winning Percentage    0.50      0.08       0.40   0.43   0.46   0.48          0.49   0.59   0.60

(a) What is your expected sign for the regression slope? Will it make sense to interpret
the intercept? If not, should you omit it from your regression and force the regression
line through the origin?
(b) OLS estimation of the relationship between the winning percentage and the team
ERA yields the following:


\(\widehat{Winpct} = 0.94 - 0.10 \times teamera\), \(R^2 = 0.49\), SER = 0.06,

where winpct is measured as wins divided by games played, so for example a team that won
half of its games would have Winpct = 0.50. Interpret your regression results.
(c) It is typically sufficient to win 90 games to be in the playoffs and/or to win a
division. Winning over 100 games a season is exceptional: the Atlanta Braves had the
most wins in 1999 with 103. Teams play a total of 162 games a year. Given this
information, do you consider the slope coefficient to be large or small?
(d) What would be the effect on the slope, the intercept, and the regression R2 if
you measured Winpct in percentage points, i.e., as (Wins/Games) × 100?
(e) Are you impressed with the size of the regression R2? Given that there is 51% of
unexplained variation in the winning percentage, what might some of these factors
be?


Answer:
(a) You expect a negative relationship, since a higher team ERA implies a lower quality of
the input. No team comes close to a zero team ERA, and therefore it does not make sense
to interpret the intercept. It does not follow, however, that the regression should be forced
through the origin. Instead the intercept fixes the level of the regression.
(b) For every one point increase in Team ERA, the winning percentage decreases by 10
percentage points, or 0.10. Roughly half of the variation in winning percentage is
explained by the quality of team pitching.
(c) The coefficient is large, since increasing the winning percentage by 0.10 is the
equivalent of winning 16 more games per year. Since it is typically sufficient to win 56
percent of the games to qualify for the playoffs, this difference of 0.10 in winning
percentage can easily turn a losing team into a winning team.
(d) Clearly the regression R2 will not be affected by a change in scale, since a descriptive
measure of the quality of the regression would depend on whim otherwise. The slope of
the regression will compensate in such a way that the interpretation of the result is
unaffected, i.e., it will become -10 in the above example. The intercept will also change to
reflect the fact that if X were 0, then the dependent variable would now be measured in
percentage, i.e., it will become 94.0 in the above example.
(e) It is impressive that a single variable can explain roughly half of the variation in
winning percentage. Answers to the second question will vary by student, but will
typically include the quality of hitting, fielding, and management. Salaries could be
included, but should be reflected in the inputs.
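
The scaling argument in (d) can be checked numerically. The sketch below fits a one-regressor OLS line by hand on a few (ERA, winning share) pairs that are invented purely for illustration; they are not the 1999 MLB data. It then refits after multiplying the dependent variable by 100.

```python
def ols(x, y):
    """Return (intercept, slope, R^2) for a one-regressor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ssr / tss

# Hypothetical data, made up for this illustration only.
era = [3.8, 4.3, 4.7, 4.8, 5.0, 5.3]
winpct = [0.60, 0.52, 0.50, 0.45, 0.49, 0.41]

b0, b1, r2 = ols(era, winpct)
b0_pct, b1_pct, r2_pct = ols(era, [100 * w for w in winpct])

print(f"share scale:   intercept {b0:.3f}, slope {b1:.3f}, R2 {r2:.3f}")
print(f"percent scale: intercept {b0_pct:.1f}, slope {b1_pct:.1f}, R2 {r2_pct:.3f}")

# Part (c): a 0.10 change in winning share over a 162-game season.
games_per_tenth = 0.10 * 162
print(f"0.10 of a season = {games_per_tenth:.1f} games")
```

Whatever data are used, the intercept and slope scale by 100 and the R2 stays the same, because rescaling the dependent variable rescales the fitted values and the residuals by the same factor.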

7) In 2001, the Arizona Diamondbacks defeated the New York Yankees in the Baseball World
Series in 7 games. Some players, such as Bautista and Finley for the Diamondbacks, had a
substantially higher batting average during the World Series than during the regular
season. Others, such as Brosius and Jeter for the Yankees, did substantially worse. You set
out to investigate whether or not the regular season batting average is a good indicator for
the World Series batting average. The results for 11 players who had the most at bats for
the two teams are:

\(\widehat{AZWsavg} = -0.347 + 2.290 \times AZSeasavg\), \(R^2 = 0.11\), SER = 0.145,

\(\widehat{NYWsavg} = 0.134 + 0.136 \times NYSeasavg\), \(R^2 = 0.001\), SER = 0.092,

where Wsavg and Seasavg indicate the batting average during the World Series and the
regular season respectively.
(a) Focusing on the coefficients first, what is your interpretation?
(b) What can you say about the explanatory power of your equation? What do you
conclude from this?
Answer:
(a) The two regressions are quite different. For the Diamondbacks, players who had a 10
point higher batting average during the regular season had roughly a 23 point higher
batting average during the World Series. Hence top performers did relatively better. The
opposite holds for the Yankees.
(b) Both regressions have little explanatory power as seen from the regression R2.
Hence performance during the season is a poor forecast of World Series performance.

8) You have obtained a sample of 14,925 individuals from the Current Population Survey
(CPS) and are interested in the relationship between average hourly earnings and years
of education. The regression yields the following result:


\(\widehat{ahe} = -4.58 + 1.71 \times educ\), \(R^2 = 0.182\), SER = 9.30

where ahe and educ are measured in dollars and years respectively.

a. Interpret the coefficients and the regression R2.

b. Is the effect of education on earnings large?

c. Why should education matter in the determination of earnings? Do the results suggest
that there is a guarantee for average hourly earnings to rise for everyone as they receive
an additional year of education? Do you think that the relationship between education and
average hourly earnings is linear?

d. The average years of education in this sample is 13.5 years. What is the mean of
average hourly earnings in the sample?

e. Interpret the measure SER. What is its unit of measurement?


Answer:
a. On average, a person with one more year of education earns $1.71 more per hour. There is
no meaning attached to the intercept; it just determines the height of the regression line.
The model explains 18.2 percent of the variation in average hourly earnings.

b. The difference between a high school graduate and a college graduate is four years of
education. Hence a college graduate will earn almost $7 more per hour, on average ($6.84
to be precise). If you assume that there are 2,000 working hours per year, then the average
salary difference would be close to $14,000 (actually $13,680). Depending on how much
you have spent for an additional year of education and how much income you have
forgone, this does not seem particularly large.

c. In general, you would expect to find a positive relationship between years of education
and average hourly earnings. Education is considered investment in human capital. If this
were not the case, then it would be a puzzle as to why there are students in the
econometrics course — surely they are not there to just "find themselves" (which would
be quite expensive in most cases). However, if you consider education as an investment
and you wanted to see a return on it, then the relationship will most likely not be linear.
For example, a constant percent return would imply an exponential relationship whereby
the additional year of education would bring a larger increase in average hourly earnings
at higher levels of education. The results do not suggest that there is a guarantee for
earnings to rise for everyone as they become more educated since the regression R2 does
not equal 1. Instead the result holds "on average."

d. Since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\), it follows that \(\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}\). Substituting the estimates for the slope and the
intercept then results in a mean of average hourly earnings of roughly $18.50.

e. The typical prediction error is $9.30. Since the measure is related to the deviation of
the actual and fitted values, the unit of measurement must be the same as that of the
dependent variable, which is in dollars here.
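
The numbers quoted in (b) and (d) come straight from the fitted line. A brief sketch, with illustrative variable names, that reproduces them:

```python
INTERCEPT, SLOPE = -4.58, 1.71  # estimates from the regression above

def predicted_ahe(years_of_education):
    """Fitted average hourly earnings in dollars."""
    return INTERCEPT + SLOPE * years_of_education

# (d) The OLS line passes through the sample means, so evaluating it at the
# mean of educ (13.5 years) recovers the mean of ahe.
mean_ahe = predicted_ahe(13.5)

# (b) Four extra years of education, then 2,000 working hours per year.
hourly_gap = SLOPE * 4
annual_gap = hourly_gap * 2000

print(f"mean ahe: ${mean_ahe:.2f}")
print(f"college vs. high school: ${hourly_gap:.2f} per hour, ${annual_gap:,.0f} per year")
```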

9) You have obtained measurements of height in inches of 29 female and 81 male students
(Studenth) at your university. A regression of the height on a constant and a binary variable
(BFemme), which takes a value of one for females and is zero otherwise, yields the following
result:

\(\widehat{Studenth} = 71.0 - 4.84 \times BFemme\), \(R^2 = 0.40\), SER = 2.0
(standard errors: 0.3 on the intercept, 0.57 on the slope)
(a) What is the interpretation of the intercept? What is the interpretation of the slope?
How tall are females, on average?
(b) Test the hypothesis that females, on average, are shorter than males, at the 1% level.
(c) Is it likely that the error term is homoskedastic here?

Answer:
(a) The intercept gives you the average height of males, which is 71 inches in this
sample. The slope tells you by how much shorter females are, on average (almost 5
inches). The average height of females is therefore approximately 66 inches.
(b) The t-statistic for the difference in means is -8.49. For a one-sided test, the critical value
is –2.33. Hence the difference is statistically significant.
(c) It is safer to assume that the variances for males and females are different. In
the underlying sample the standard deviation for females was smaller.
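
Parts (a) and (b) can be verified from the reported coefficients and standard errors. The sketch below also computes a one-sided p-value from the normal approximation, which the original answer does not report; treat that line as an added illustration.

```python
from math import erfc, sqrt

intercept, slope, se_slope = 71.0, -4.84, 0.57  # from the regression output

# (a) With a binary regressor, intercept + slope is the mean of the group
# coded as 1, here females.
female_mean = intercept + slope

# (b) One-sided test of H0: slope = 0 against H1: slope < 0.
t_stat = slope / se_slope
one_sided_p = 0.5 * erfc(-t_stat / sqrt(2))  # Pr(Z < t) under the normal approximation
reject_at_1pct = t_stat < -2.33

print(f"female mean height: {female_mean:.2f} in")
print(f"t = {t_stat:.2f}, reject at 1%: {reject_at_1pct}")
```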

10) You have collected 14,925 observations from the Current Population Survey. There are
6,285 females in the sample, and 8,640 males. The females report a mean of average
hourly earnings of $16.50 with a standard deviation of $9.06. The males have an average
of $20.09 and a standard deviation of $10.85. The overall mean average hourly earnings is
$18.58.

a. Using the t-statistic for testing differences between two means (section 3.4 of your
textbook), decide whether or not there is sufficient evidence to reject the null
hypothesis that females and males have identical average hourly earnings.

b. You decide to run two regressions: first, you simply regress average hourly earnings on an
intercept only. Next, you repeat this regression, but only for the 6,285 females in the
sample. What will the regression coefficients be in each of the two regressions?

c. Finally you run a regression over the entire sample of average hourly earnings on an
intercept and a binary variable DFemme, where this variable takes on a value of 1 if the
individual is a female, and is 0 otherwise. What will be the value of the intercept? What
will be the value of the coefficient of the binary variable?

d. What is the standard error on the slope coefficient? What is the t-statistic?

Answer:
a. \(H_0: \mu_F = \mu_M\); \(H_1: \mu_F \neq \mu_M\).
\(t = \frac{20.09 - 16.50}{\sqrt{\frac{10.85^2}{8640} + \frac{9.06^2}{6285}}} \approx 21.98\). As a result, you can comfortably reject the null hypothesis at any
reasonable confidence level.

b. \(\hat{\beta}_0 = \bar{Y} = 18.58\); \(\hat{\beta}_0 = \bar{Y}_F = 16.50\)

Hence in each regression the intercept equals the sample mean of the dependent variable:
the overall mean of average hourly earnings in the first, and the mean of average hourly
earnings for females in the second.

c. \(\widehat{ahe} = \hat{\beta}_0 + \hat{\beta}_1 \times DFemme = 20.09 - 3.59 \times DFemme\)
The intercept is the mean of average hourly earnings for males, and the slope is the
difference between the mean of average hourly earnings of females and males.

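d. The standard error on the slope coefficient is 0.16, which is identical to the standard error
of the difference in means used for the t-statistic in (a) above. Hence the t-statistic is -21.98;
the sign is negative because the slope measures female minus male earnings.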
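
The link between the two-sample test and the dummy-variable regression can be checked from the group summaries. This sketch assumes the slope's standard error is computed without pooling the variances (that is, heteroskedasticity-robust), which is what makes it match the standard error in (a); variable names are illustrative.

```python
from math import sqrt

# Group summaries from the question: sample size, mean, standard deviation.
n_f, mean_f, sd_f = 6285, 16.50, 9.06
n_m, mean_m, sd_m = 8640, 20.09, 10.85

# (b) An intercept-only regression returns the sample mean; the overall mean
# is the size-weighted average of the two group means.
overall_mean = (n_f * mean_f + n_m * mean_m) / (n_f + n_m)

# (c) Regression on a constant and DFemme (1 = female).
intercept = mean_m            # mean of the group coded 0
slope = mean_f - mean_m       # female minus male difference

# (d) Standard error of the difference in means (assumed unpooled) and t-statistic.
se_slope = sqrt(sd_f**2 / n_f + sd_m**2 / n_m)
t_stat = slope / se_slope

print(f"overall mean: {overall_mean:.2f}")
print(f"ahe_hat = {intercept:.2f} + ({slope:.2f}) x DFemme")
print(f"SE(slope) = {se_slope:.2f}, t = {t_stat:.2f}")
```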
