ECONOMETRICS I
EESP - Graduate Program in Economics | First Quarter 2024
Problem Set 04
Due Mar 10, 23:59
1. In the last problem set, you were asked to minimize $E[(Y - x'b)^2]$. Since many of you had
difficulties with that, this question guides you through the derivation in detail.
We recommend Greene’s Appendix A as a linear algebra reference whenever you need to
consult something.
(a) First, write down the dimensions of all the terms in this expression to make sure all
these operations are valid. Note that we can only take the square of $(Y - x'b)$ because
it is $1 \times 1$.
(b) Now note that $(Y - x'b)^2 = (Y - x'b)'(Y - x'b)$. Use that to show that
$E[(Y - x'b)^2] = E[Y^2] + b'E[xx']b - 2b'E[xY]$. Be very careful to justify every step of this derivation.
Also be sure to always check the dimensions of the matrices in these expressions to ensure
the operations you are doing are valid.
(c) In the next item you will differentiate this last expression with respect to b and set the
derivative equal to zero. Before we do that, recall that we saw in Statistics that we cannot
always interchange the derivative and expectation operators. Do we need to do that in this case?
As before, we don’t expect you to know the conditions under which you can make these
interchanges; we just expect you to know that this is not always valid.
(d) Now look up the formulas for matrix derivatives and differentiate the expression from
item (b) with respect to b to obtain the first-order condition $E[xx']b - E[xY] = 0$. What is
the dimension of these objects? Why is this so? Again, always make sure to check the
dimensions of all variables (at least until you are very comfortable with these matrix operations).
We recommend Greene’s Appendix Section A8 for matrix derivatives, but you are free
to check other references.
(e) Now show that, assuming $E[xx']$ is invertible, we have $\beta = (E[xx'])^{-1} E[xY]$ (we write
β instead of b because this is now the solution to the minimization problem). Again check the
dimensions of all matrices.
(f) Be sure you understand that you can never write $\beta = \frac{E[xY]}{E[xx']}$ if $E[xx']$ is a matrix! Also, check
that expressions like $bE[xx']$ or $E[xx']b^2$ don’t make sense. Why?
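As a numerical sanity check on items (e) and (f), the sketch below computes $(E[xx'])^{-1}E[xY]$ exactly for a small discrete distribution. Everything in it is an illustrative assumption and not part of the problem set: the four-point distribution of (z, Y), the choice x = (1, z)', and the helper names `matmul`, `x_vec`, and `objective`.

```python
from fractions import Fraction as F

# Hypothetical joint distribution (assumption): entries are (probability, z, Y).
# The regressor vector is x = (1, z)', a 2 x 1 column.
support = [
    (F(1, 4), 0, 1),
    (F(1, 4), 1, 2),
    (F(1, 4), 1, 4),
    (F(1, 4), 2, 3),
]

def matmul(A, B):
    """Multiply matrices stored as lists of rows; fail if non-conformable."""
    if len(A[0]) != len(B):
        raise ValueError(
            f"cannot multiply {len(A)}x{len(A[0])} by {len(B)}x{len(B[0])}"
        )
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def x_vec(z):
    """Return x = (1, z)' as a 2 x 1 column."""
    return [[F(1)], [F(z)]]

# E[xx'] is 2 x 2 and E[xY] is 2 x 1.
Exx = [[sum(p * x_vec(z)[i][0] * x_vec(z)[j][0] for p, z, _ in support)
        for j in range(2)] for i in range(2)]
ExY = [[sum(p * x_vec(z)[i][0] * y for p, z, y in support)] for i in range(2)]

# Inverse of a 2 x 2 matrix; it exists only if the determinant is nonzero.
det = Exx[0][0] * Exx[1][1] - Exx[0][1] * Exx[1][0]
Exx_inv = [[Exx[1][1] / det, -Exx[0][1] / det],
           [-Exx[1][0] / det, Exx[0][0] / det]]

# (2 x 2)(2 x 1) = 2 x 1, so this product is conformable.
beta = matmul(Exx_inv, ExY)

def objective(b):
    """E[(Y - x'b)^2] for a 2 x 1 column b."""
    return sum(p * (y - b[0][0] - b[1][0] * z) ** 2 for p, z, y in support)

print("beta =", beta[0][0], beta[1][0])
print("objective at beta =", objective(beta))
```

Calling `matmul(beta, Exx)` raises a `ValueError`, because a 2 x 1 matrix cannot be post-multiplied by a 2 x 2 matrix. This is the dimension problem that item (f) asks about.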
2. Consider a potential outcomes model:
\[
\begin{cases}
Y(0) = \beta_0 + \beta_2 a + u \\
Y(1) = \beta_1 + Y(0)
\end{cases}
\]
where $\beta_0$ and $\beta_2$ are the population OLS parameters of Y(0) on a.
(a) Can we interpret β1 as a causal parameter? And β2 ?
(b) Suppose now that cov(u, T ) = 0. Show that, in this case, we have that population OLS of
Y on T , a, and a constant would identify β1 .
Note: in this setting in which u is the error of an OLS population model with intercept,
we usually say interchangeably cov(u, T ) = 0 and E[uT ] = 0. Why is this not a problem?
(c) Show that, if we have the CIA, that is, $Y(0) \perp\!\!\!\perp T \mid a$, and the CEF $E[Y(0) \mid a]$ is linear, then
we have that cov(u, T ) = 0.
(d) If we have Y (0) ⊥⊥ T |a, but E[Y (0)|a] is not linear, then we cannot guarantee that
cov(u, T ) = 0. Explain the intuition of this result, and discuss the implications for the
population OLS of Y on T , a, and a constant.
(e) Suppose now that a is binary. In this case, does the CIA guarantee that β1 is identified
by the OLS of Y on T and a, with an intercept?
(f) Suppose now we are under the assumptions from item (c). However, we do not observe a.
What is the population OLS of Y on T and a constant? Under which conditions can we
recover β1 in this case? Provide an intuition for your results.
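The contrast between items (c) and (d) can be checked numerically. The sketch below is a hedged illustration under assumptions that are not in the problem set: a takes the values 0, 1, 2 with equal probability, Pr(T = 1 | a) is the hypothetical `PROPENSITY` table, Y(0) is a deterministic function m(a) (so the CIA holds trivially), and β1 = 2. The helpers `solve` and `pop_ols` are illustrative names; they compute population OLS coefficients by solving E[ww']g = E[wY] exactly.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A g = b by Gauss-Jordan elimination (A must be invertible)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def pop_ols(dist):
    """Population OLS over a finite distribution.

    dist is a list of (probability, regressor list w, outcome y).
    """
    k = len(dist[0][1])
    Eww = [[sum(p * w[i] * w[j] for p, w, _ in dist) for j in range(k)]
           for i in range(k)]
    EwY = [sum(p * w[i] * y for p, w, y in dist) for i in range(k)]
    return solve(Eww, EwY)

BETA1 = 2                                            # assumed treatment effect
PROPENSITY = {0: F(1, 4), 1: F(1, 4), 2: F(3, 4)}    # assumed Pr(T = 1 | a)

def design(m):
    """Joint distribution of (1, T, a) and Y when Y(0) = m(a)."""
    dist = []
    for a, e in PROPENSITY.items():
        for t, pt in ((1, e), (0, 1 - e)):
            y = m(a) + BETA1 * t                     # Y = Y(0) + beta1 * T
            dist.append((F(1, 3) * pt, [1, t, a], y))
    return dist

# Linear CEF, E[Y(0) | a] = 1 + 3a: the coefficient on T equals beta1.
coef_linear = pop_ols(design(lambda a: 1 + 3 * a))

# Nonlinear CEF, E[Y(0) | a] = a^2: the coefficient on T differs from beta1.
coef_nonlinear = pop_ols(design(lambda a: a * a))

print("linear CEF:   ", coef_linear)
print("nonlinear CEF:", coef_nonlinear)
```

In this example the propensity score is deliberately not linear in a. With a propensity score that is linear in a, the coefficient on T would equal β1 even under the nonlinear CEF, so the failure in item (d) is a possibility and not a certainty.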
3. Let Y be an outcome variable of interest, D a binary variable, and X a discrete variable with
finite support $\mathcal{X}$. Let $p_x := \Pr(X = x)$ and define $p_D(x) := \Pr(D = 1 \mid X = x)$. We consider the
following population OLS model:
\[
Y = \beta D + \sum_{x \in \mathcal{X}} 1\{X = x\} \delta_x + u.
\]
It is possible to show that, in this setting,
\[
\beta = \frac{\sum_{x \in \mathcal{X}} p_D(x)(1 - p_D(x)) p_x \beta_x}{\sum_{x \in \mathcal{X}} p_D(x)(1 - p_D(x)) p_x},
\]
where $\beta_x = E[Y \mid D = 1, X = x] - E[Y \mid D = 0, X = x]$. You are not asked to show this
(see MHE for more details).
Answer the following:
(a) Show that β is a convex combination of the βx ’s. If all βx ’s are positive, can we guarantee
that β will also be positive?
(b) Show that, under the CIA, $(Y(1), Y(0)) \perp\!\!\!\perp D \mid X$, we have that $\beta_x = E[Y(1) - Y(0) \mid X = x]$.
Note that, together with what you showed in item (a), this implies that, under the CIA,
the population OLS β recovers a weighted average of causal effects. Be careful in each
step, highlighting the assumptions and results you are using.
(c) What happens to the weights given to each βx when px changes? And when pD (x) changes?
In particular, what happens when pD (x) equals one or zero? Give an intuition for these
results.
(d) Suppose now you are interested in the ATE, in the ATT, and in the ATU (average
treatment effect on the untreated). How can you identify these parameters? Highlight the
importance of the CIA to achieve these identifications. Do we need to impose additional
assumptions on pD (x)?
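The weighted-average formula stated in this question can be checked numerically on a small example. The sketch below is illustrative only: the tables `P_X`, `P_D`, `BASE` (standing for E[Y | D = 0, X = x]), and `BETA_X` are made-up numbers, and `solve` and `pop_ols` are hypothetical helper names. Since population OLS depends on Y only through E[Y | D, X], the example sets Y equal to its cell mean.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A g = b by Gauss-Jordan elimination (A must be invertible)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def pop_ols(dist):
    """Population OLS over a list of (probability, regressors w, outcome y)."""
    k = len(dist[0][1])
    Eww = [[sum(p * w[i] * w[j] for p, w, _ in dist) for j in range(k)]
           for i in range(k)]
    EwY = [sum(p * w[i] * y for p, w, y in dist) for i in range(k)]
    return solve(Eww, EwY)

# All numbers below are assumptions chosen for illustration.
P_X = {0: F(1, 2), 1: F(1, 2)}       # p_x = Pr(X = x)
P_D = {0: F(1, 2), 1: F(1, 5)}       # p_D(x) = Pr(D = 1 | X = x)
BASE = {0: F(0), 1: F(1)}            # E[Y | D = 0, X = x]
BETA_X = {0: F(1), 1: F(3)}          # beta_x

# Regressors: D followed by one dummy per support point of X (no constant).
dist = []
for x in P_X:
    for d, pd in ((1, P_D[x]), (0, 1 - P_D[x])):
        w = [d] + [1 if x == v else 0 for v in P_X]
        dist.append((P_X[x] * pd, w, BASE[x] + BETA_X[x] * d))

beta_ols = pop_ols(dist)[0]

# Right-hand side of the formula: weights p_D(x)(1 - p_D(x)) p_x.
weights = {x: P_D[x] * (1 - P_D[x]) * P_X[x] for x in P_X}
beta_formula = sum(weights[x] * BETA_X[x] for x in P_X) / sum(weights.values())

print("OLS coefficient on D:", beta_ols)
print("weighted average:    ", beta_formula)
```

Setting some `P_D[x]` to zero or one in this sketch gives that cell zero weight, which is one way to explore item (c).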
4. Consider a potential outcomes model
\[
\begin{cases}
Y(0) = \beta_0 + \beta_2 W + u \\
Y(1) = \beta_{1L} 1\{W = 0\} + \beta_{1H} 1\{W = 1\} + Y(0)
\end{cases}
\]
where T = 1 if treated and T = 0 if untreated, and W is a dummy variable. β0 , β2 and u are
defined by the OLS of Y (0) on W and a constant.
(a) What is the ATE? What is the ATT? What is the average treatment effect for those with
W = 1? What is the average treatment effect on the treated for those with W = 1?
(b) Suppose β1H = β1L = β1 . What is the identification assumption so that the population
OLS of Y on T , W and a constant recovers β1 ?
(c) Suppose now that $\beta_{1H}$ may be different from $\beta_{1L}$. Assuming the CIA, $(Y(1), Y(0)) \perp\!\!\!\perp T \mid W$,
how could you identify each of those parameters under the assumption considered in the
previous item?
(d) Suppose now you consider the population OLS of $Y = \gamma_0 + \gamma_1 T + \gamma_2 W + \epsilon$. What does
the parameter $\gamma_1$ recover in this case?
(e) Under which conditions do we have that E [ϵ | T, W ] = 0?
(f) Suppose now that the assumptions you considered in item (b) hold. Consider now the two
different OLS population regressions below:
\[ Y = \gamma_0 + \gamma_1 T + \gamma_2 T \times W + \gamma_3 W + u \tag{1} \]
\[ Y = \pi_0 + \pi_1 T \times (1 - W) + \pi_2 T \times W + \pi_3 W + u \tag{2} \]
1. What do the parameters γ1 , γ2 , π1 , π2 identify? Express in words what these
parameters mean.
2. Why can we write these OLS population models with u (the error term of the causal
model) as the error term?
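Once you have worked out your own answers to items (d) and (f), a small exact computation can serve as a sanity check. The sketch below rests on assumptions that are not in the problem set: Pr(W = 1) = 1/2, the hypothetical `PROPENSITY` table for Pr(T = 1 | W), the parameter values β0 = 1, β2 = 2, β1L = 1, β1H = 3, and u set to zero so that the CIA holds trivially. The helpers `solve`, `pop_ols`, and `regress` are illustrative names.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A g = b by Gauss-Jordan elimination (A must be invertible)."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def pop_ols(dist):
    """Population OLS over a list of (probability, regressors w, outcome y)."""
    k = len(dist[0][1])
    Eww = [[sum(p * w[i] * w[j] for p, w, _ in dist) for j in range(k)]
           for i in range(k)]
    EwY = [sum(p * w[i] * y for p, w, y in dist) for i in range(k)]
    return solve(Eww, EwY)

# Assumed parameter values and probabilities (illustration only).
B0, B2, B1L, B1H = 1, 2, 1, 3
P_W = {0: F(1, 2), 1: F(1, 2)}            # Pr(W = w)
PROPENSITY = {0: F(1, 2), 1: F(1, 5)}     # Pr(T = 1 | W = w)

def regress(regressors):
    """Population OLS of Y on the regressors built by regressors(t, w)."""
    dist = []
    for w, pw in P_W.items():
        for t, pt in ((1, PROPENSITY[w]), (0, 1 - PROPENSITY[w])):
            effect = B1L * (1 - w) + B1H * w
            y = B0 + B2 * w + effect * t   # observed Y, with u set to zero
            dist.append((pw * pt, regressors(t, w), y))
    return pop_ols(dist)

short = regress(lambda t, w: [1, t, w])                     # item (d)
gamma = regress(lambda t, w: [1, t, t * w, w])              # equation (1)
pi = regress(lambda t, w: [1, t * (1 - w), t * w, w])       # equation (2)

print("short regression:", short)
print("equation (1):    ", gamma)
print("equation (2):    ", pi)
```

Comparing `gamma[1]`, `gamma[1] + gamma[2]`, `pi[1]`, and `pi[2]` with the assumed values of β1L and β1H shows how the two parametrizations relate in this particular example; it does not replace the argument the item asks for.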