Lecture notes: Demand estimation introduction
1 Why demand analysis/estimation?
• Estimation of demand functions is an important empirical endeavor. Why?
• Fundamental empirical question: how much market power do firms have?
– Market power: ability to raise prices profitably. (What market power do
price-taking firms have?)
– Market power measured by the markup: $(p - mc)/p$.
– Problem: mc not observed!
– Motivates empirical methodology in IO.
– For example, you observe high prices in an industry. Is this due to market
power, or due to high costs? Cannot answer this question directly, because
we don’t observe costs.
• Indirect approach: obtain estimate of firms’ markups by estimating firms’ de-
mand functions.
• Intuition is most easily seen in the monopoly example:
– $\max_p\; p\, q(p) - C(q(p))$, where $q(p)$ is the demand curve.
– FOC: $q(p) + p\, q'(p) = C'(q(p))\, q'(p)$.
– At the optimal price $p^*$, rearranging the FOC gives the Inverse Elasticity Property:
$$p^* - MC(q(p^*)) = -\frac{q(p^*)}{q'(p^*)},$$
or
$$\frac{p^* - mc(q(p^*))}{p^*} = -\frac{1}{\epsilon(p^*)},$$
where $\epsilon(p^*) \equiv q'(p^*)\,\frac{p^*}{q(p^*)}$ is the price elasticity of demand.
– Hence, if we can estimate $\epsilon(p^*)$, we can infer the markup $\frac{p^* - mc(q(p^*))}{p^*}$, even when we don't observe the marginal cost $mc(q(p^*))$. (For example, an estimated elasticity of $\epsilon(p^*) = -2$ implies a markup of $1/2$.)
– Caveat: the validity of this exercise depends crucially on using the right supply-side model (in this case: monopoly without the possibility of entry). If costs were observed, the markup could be estimated directly, and we could test the validity of the monopoly pricing model (i.e., test whether markup $= -1/\epsilon$).
• Start by reviewing some econometrics. (No attempt to be exhaustive.)
2 Primer: Least-squares estimation
• Observe data points $\{y_i, x_i\}$ for $i = 1, \ldots, n$. What is the linear relationship between $y$ and $x$?
• Graph. What linear function of $x$ – that is, $\alpha + \beta x$ – fits $y$ the best?
• Ordinary least squares (OLS) regression:
$$\min_{\alpha, \beta} \sum_i [y_i - \alpha - \beta x_i]^2.$$
• In the multivariate case, $X_i$ and $\vec{\beta}$ are both $K$-dimensional vectors. Then
$$\min_{\alpha, \vec{\beta}} \sum_i [y_i - \alpha - X_i'\vec{\beta}\,]^2.$$
To analyze the properties of OLS regression, consider a closely related statistical problem, Best Linear Prediction.
Consider two random variables $X$ and $Y$. What is the “best” predictor of $Y$, among all the possible linear functions of $X$?
The “best” linear predictor minimizes the mean squared error of prediction:
$$\min_{\alpha, \beta}\; E(Y - \alpha - \beta X)^2. \tag{1}$$
(Recall: the expectation is a linear operator, so that $E(A + B) = EA + EB$.)
The first-order conditions are:
For $\alpha$: $2\alpha - 2EY + 2\beta EX = 0$
For $\beta$: $2\beta EX^2 - 2EXY + 2\alpha EX = 0$.
Solving:
$$\beta^* = \frac{Cov(X, Y)}{VX}, \qquad \alpha^* = EY - \beta^* EX, \tag{2}$$
where
Cov(X, Y ) = E[(X − EX)(Y − EY )] = E(XY ) − EX · EY
and
V X = E[(X − EX)2 ] = E(X 2 ) − (EX)2 .
Additional implications of the b.l.p.: Let $\hat{Y} \equiv \alpha^* + \beta^* X$ denote a “fitted value” of $Y$, and $U \equiv Y - \hat{Y}$ denote the “residual” or prediction error:
• $EU = 0$
• $V\hat{Y} = (\beta^*)^2 VX = (Cov(X, Y))^2 / VX = \rho_{XY}^2\, VY$
• $VU = VY + (\beta^*)^2 VX - 2\beta^*\, Cov(X, Y) = VY - (Cov(X, Y))^2 / VX = (1 - \rho_{XY}^2)\, VY$
Hence, the b.l.p. accounts for a $\rho_{XY}^2$ proportion of the variance in $Y$; in this sense, the correlation measures the linear relationship between $Y$ and $X$.
Also note that
$$\begin{aligned}
Cov(\hat{Y}, U) &= Cov(\hat{Y},\, Y - \hat{Y}) \\
&= E[(\hat{Y} - E\hat{Y})(Y - \hat{Y} - EY + E\hat{Y})] \\
&= E[(\hat{Y} - E\hat{Y})(Y - EY) - (\hat{Y} - E\hat{Y})(\hat{Y} - E\hat{Y})] \\
&= Cov(\hat{Y}, Y) - V\hat{Y} \\
&= E[(\alpha^* + \beta^* X - \alpha^* - \beta^* EX)(Y - EY)] - V\hat{Y} \\
&= \beta^* E[(X - EX)(Y - EY)] - V\hat{Y} \\
&= \beta^*\, Cov(X, Y) - V\hat{Y} \\
&= Cov^2(X, Y)/VX - Cov^2(X, Y)/VX \\
&= 0.
\end{aligned} \tag{3}$$
Hence, for any random variable X, the random variable Y can be written as the sum
of a part which is a linear function of X, and a part which is uncorrelated with X.
Also,
$$Cov(X, U) = 0. \tag{4}$$
Note: with a finite sample of $(Y, X)$, the minimization problem (1) is infeasible. In practice, we minimize the sample counterpart
$$\min_{\alpha, \beta} \sum_i (Y_i - \alpha - \beta X_i)^2, \tag{5}$$
which is the objective function in ordinary least squares regression. The OLS values
for α and β are the finite-sample versions of Eq. (2).
(In the “sample” version, expectations are replaced by sample averages; e.g., the mean $EX$ is replaced by the sample average over $n$ observations, $\bar{X}_n \equiv \frac{1}{n}\sum_i X_i$. The law of large numbers says this approximation should not be bad, especially for large $n$.)
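To make the sample counterpart concrete, here is a minimal sketch in Python (using numpy; the simulated data and all variable names are illustrative assumptions, not from the notes). It computes the sample analogues of Eq. (2) and checks the residual properties $EU = 0$ and $Cov(X, U) = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: simulate a linear relationship between x and y.
n = 1000
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(size=n)

# Sample analogues of Eq. (2): beta* = Cov(X, Y)/VX, alpha* = EY - beta* EX.
beta_hat = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()

# Fitted values and residuals: Yhat = alpha* + beta* X, U = Y - Yhat.
y_fit = alpha_hat + beta_hat * x
u = y - y_fit

print(alpha_hat, beta_hat)         # close to the simulated (2.0, 0.5)
print(u.mean())                    # ~0: sample analogue of EU = 0
print(np.cov(x, u, ddof=0)[0, 1])  # ~0: sample analogue of Cov(X, U) = 0, Eq. (4)
```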
Next we develop some intuition for least-squares regression. Assume that the “true” model generating the $Y$ process is:
$$Y = \alpha + \beta X + \epsilon, \qquad E\epsilon = 0. \tag{6}$$
What we mean by “true model” is that this is a causal model, in the sense that a one-unit increase in $X$ would raise $Y$ by $\beta$ units. (In the previous section, we just assumed that $Y$ and $X$ move jointly, so there was no sense in which changes in $X$ “cause” changes in $Y$.)
Question: under what assumptions does doing least-squares on $(Y, X)$ (as in Eqs. (1) or (5) above) recover the true model, i.e., $\alpha^* = \alpha$ and $\beta^* = \beta$?
• For $\alpha^*$:
$$\alpha^* = EY - \beta^* EX = \alpha + \beta EX + E\epsilon - \beta^* EX,$$
which is equal to $\alpha$ if $\beta^* = \beta$.
• For $\beta^*$:
$$\begin{aligned}
\beta^* &= \frac{Cov(\alpha + \beta X + \epsilon,\, X)}{Var X} \\
&= \frac{1}{Var X}\,\{E[X(\alpha + \beta X + \epsilon)] - EX \cdot E[\alpha + \beta X + \epsilon]\} \\
&= \frac{1}{Var X}\,\{\alpha EX + \beta EX^2 + E[\epsilon X] - \alpha EX - \beta (EX)^2 - EX\, E\epsilon\} \\
&= \frac{1}{Var X}\,\{\beta [EX^2 - (EX)^2] + E[\epsilon X]\},
\end{aligned}$$
which is equal to $\beta$ if
$$E[\epsilon X] = 0. \tag{7}$$
This is an “exogeneity” assumption: (roughly) $X$ and the disturbance term $\epsilon$ are uncorrelated. Under this assumption, the best linear predictors from the infeasible problem (1) coincide with the true values of $\alpha, \beta$. Correspondingly, it turns out that the feasible finite-sample least-squares estimates from (5) are “good” (in some sense) estimators of $\alpha, \beta$.
Note that the orthogonality condition (7) differs from the zero-covariance property (4): property (4) holds by construction for any joint distribution of $(X, Y)$, whereas (7) is a substantive assumption about the true model (6).
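To see condition (7) at work, here is a small sketch with a deliberately endogenous disturbance (simulated, illustrative numbers): when $E[\epsilon X] \neq 0$, the b.l.p. slope $\beta^*$ equals the causal $\beta$ plus $E[\epsilon X]/Var X$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative violation of (7): the disturbance is built to be
# correlated with X, so E[eps * X] = 0.8 * VarX != 0.
n = 100_000
x = rng.normal(size=n)
eps = 0.8 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + eps            # causal beta = 2.0

# The b.l.p. slope picks up the correlation: beta* = beta + E[eps X]/VarX.
beta_star = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
print(beta_star)                   # ~2.8, not the causal 2.0
```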
When there is more than one $X$ variable, we use multivariate regression. In matrix notation, the true model is:
$$Y_{n \times 1} = X_{n \times k}\, \beta_{k \times 1} + \epsilon_{n \times 1}.$$
The least-squares estimator for $\beta$ is
$$\hat{\beta}^{OLS} = (X'X)^{-1} X'Y.$$
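As a sketch, the matrix formula can be evaluated directly (numpy; the design and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative design matrix: n observations, k regressors
# (a first column of ones plays the role of the intercept).
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, -0.5, 2.0])
Y = X @ beta_true + rng.normal(size=n)

# beta_OLS = (X'X)^{-1} X'Y; solve() applies the inverse without forming it.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_ols)                    # close to beta_true
```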
Next we consider estimating demand functions, where exogeneity is usually violated.
3 Demand estimation
Linear demand-supply model:
Demand: $q_t^d = \gamma_1 p_t + x_{t1}'\beta_1 + u_{t1}$
Supply: $p_t = \gamma_2 q_t^s + x_{t2}'\beta_2 + u_{t2}$
Equilibrium: $q_t^d = q_t^s$
The demand function summarizes consumer preferences; the supply function summarizes firms' cost structure.
First, focus on estimating the demand function:
Demand: $q_t = \gamma_1 p_t + x_{t1}'\beta_1 + u_{t1}$
If $u_{t1}$ is correlated with $u_{t2}$, then $p_t$ is endogenous in the demand function: we cannot estimate it using OLS. Graph. Several estimation approaches.
1. Instrumental variable (IV) methods:
• Assume there are instruments $Z$ which satisfy two properties:
(a) Uncorrelated with the error term in the demand equation: $E(u_1 Z) = 0$. Exclusion restriction. (“order condition”)
(b) Correlated with the endogenous variable: $Cov(Z, p) \neq 0$. (“rank condition”)
• The $x$'s are exogenous variables which can serve as instruments:
(a) $x_{t2}$ are cost shifters; they affect production costs. Correlated with $p_t$ but not with $u_{t1}$: use as instruments in the demand function.
(b) $x_{t1}$ are demand shifters; they affect willingness-to-pay, but not firms' production costs. Correlated with $q_t$ but not with $u_{t2}$: use as instruments in the supply function.
• Two-stage least squares:
$$\hat{\beta}^{2SLS} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y,$$
where $\hat{X} \equiv Z(Z'Z)^{-1}Z'X$ are the predicted values of $X$ from a least-squares regression of $X$ on $Z$ (see the sketch below).
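To illustrate approach 1, here is a minimal numpy sketch of the 2SLS formula; the simulated instrument z plays the role of a cost shifter, and all names and numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated demand q = a + g1*p + u1, with p endogenous: price depends
# on the demand shock u1 as well as on a cost shifter z (the instrument).
n = 5000
z = rng.normal(size=n)                              # cost shifter: E(u1 z) = 0
u1 = rng.normal(size=n)                             # demand shock
p = 1.0 + 0.8 * z + 0.5 * u1 + rng.normal(size=n)   # Cov(p, u1) > 0
q = 2.0 - 1.5 * p + u1                              # true gamma1 = -1.5

X = np.column_stack([np.ones(n), p])                # regressors in demand eq.
Z = np.column_stack([np.ones(n), z])                # instruments

# First stage: Xhat = Z (Z'Z)^{-1} Z'X, the fitted values of X given Z.
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Second stage: beta_2sls = (Xhat' Xhat)^{-1} Xhat' q.
beta_2sls = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ q)
beta_ols = np.linalg.solve(X.T @ X, X.T @ q)
print(beta_2sls)   # slope near the true -1.5
print(beta_ols)    # slope biased toward zero by Cov(p, u1) > 0
```

In the just-identified case shown here, 2SLS reduces to the classical IV estimator $(Z'X)^{-1}Z'Y$.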
2. Maximum likelihood (more technical):
• The likelihood function of the data is the joint density of the endogenous variables $(q_t, p_t)$ conditional on the exogenous variables $(x_{t1}, x_{t2})$.
• First, we need to express the endogenous variables in terms of the exogenous variables:
Demand: $q_t = \gamma_1 p_t + x_{t1}'\beta_1 + u_{t1}$
Supply: $p_t = \gamma_2 q_t + x_{t2}'\beta_2 + u_{t2}$
$$\Rightarrow\;
\begin{pmatrix} 1 & -\gamma_1 \\ -\gamma_2 & 1 \end{pmatrix}
\begin{pmatrix} q_t \\ p_t \end{pmatrix}
=
\begin{pmatrix} \beta_1' & 0 \\ 0 & \beta_2' \end{pmatrix}
\begin{pmatrix} x_{t1} \\ x_{t2} \end{pmatrix}
+
\begin{pmatrix} u_{t1} \\ u_{t2} \end{pmatrix}
\;\Leftrightarrow\; \Gamma Y = BX + U \;\Leftrightarrow\; Y = \Gamma^{-1} B X + \Gamma^{-1} U.$$
This is called the “reduced form” representation of the demand-supply system.
• Assume that the unobservables $\{(u_{t1}, u_{t2})\}_{t=1}^{T}$ are distributed according to a density function $g(\cdot)$.
Example: $(u_{t1}, u_{t2}) \sim$ i.i.d. $N(0, \Sigma)$. Then the joint density of $\vec{u}$ is:
$$g(\vec{u}) = (2\pi)^{-1}\, |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}\,\vec{u}\,'\,\Sigma^{-1}\,\vec{u}\right\}.$$
• Recall the change-of-variables formula: if $Y = X/a$ and $X$ has density function $g(X)$, then $Y$ has density function $f(Y) = g(aY)\cdot |a|$. Applying the multivariate version of this, we get
$$f(Y) = g(\Gamma Y - BX)\cdot |\Gamma|, \tag{8}$$
where $|\Gamma|$ denotes the absolute value of the determinant of $\Gamma$.
Assuming you have $T$ observations of $(Y_t, X_t)$, the likelihood function is
$$L(Y|X) = \prod_{t=1}^{T} f(Y_t) = |\Gamma|^{T} \prod_{t=1}^{T} g(\Gamma Y_t - B X_t).$$
The log-likelihood function is (ignoring the constant):
$$\log L(Y \mid X) \sim T \log|\Gamma| - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\sum_t (\Gamma Y_t - B X_t)'\, \Sigma^{-1}\, (\Gamma Y_t - B X_t).$$
• Maximize this with respect to $\Gamma, B, \Sigma$ to obtain the maximum likelihood estimator; a numerical sketch follows below.
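To tie the pieces together, here is a sketch of this maximum likelihood procedure in Python (numpy/scipy); the data-generating parameters, starting values, and the use of a generic Nelder-Mead optimizer are illustrative assumptions rather than part of the notes:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

# Simulate the demand-supply system (illustrative parameter values):
# demand: q = g1*p + b1*x1 + u1,  supply: p = g2*q + b2*x2 + u2.
T = 2000
g1, g2, b1, b2 = -1.0, 0.5, 1.0, 1.0
x1, x2 = rng.normal(size=T), rng.normal(size=T)
U_true = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=T)

Gamma = np.array([[1.0, -g1], [-g2, 1.0]])
B = np.array([[b1, 0.0], [0.0, b2]])
X = np.column_stack([x1, x2])
# Reduced form: Y_t = Gamma^{-1} (B x_t + u_t); rows of Y are (q_t, p_t).
Y = np.linalg.solve(Gamma, B @ X.T + U_true.T).T

def neg_loglik(theta):
    """Negative of: T log|Gamma| - (T/2) log|Sigma| - (1/2) sum_t u_t' Sigma^-1 u_t."""
    g1_, g2_, b1_, b2_, s11, s22, s12 = theta
    G = np.array([[1.0, -g1_], [-g2_, 1.0]])
    Bm = np.array([[b1_, 0.0], [0.0, b2_]])
    S = np.array([[s11, s12], [s12, s22]])
    sign, logdetS = np.linalg.slogdet(S)
    if sign <= 0:                      # Sigma must be positive definite
        return np.inf
    U = Y @ G.T - X @ Bm.T             # rows are u_t' = (Gamma Y_t - B x_t)'
    quad = np.einsum('ti,ij,tj->', U, np.linalg.inv(S), U)
    ll = T * np.log(abs(np.linalg.det(G))) - 0.5 * T * logdetS - 0.5 * quad
    return -ll

theta0 = np.array([-0.5, 0.2, 0.5, 0.5, 1.0, 1.0, 0.0])  # illustrative start
res = minimize(neg_loglik, theta0, method='Nelder-Mead',
               options={'maxiter': 50000, 'maxfev': 50000})
print(res.x[:4])   # estimates of (gamma1, gamma2, beta1, beta2)
```

In practice one would use analytic gradients or concentrate out $\Sigma$, but a generic optimizer suffices to illustrate the estimator.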