Statistical Foundations of Business Analytics
Chapter 5: Generalized Linear Models
Tim Ederer
Mini 2, 2024
Tepper Business School
Introduction
Chapters 1-4 gave you a complete toolkit for making inference about β
• Mostly focus on cases where y is continuous
What happens when y is not continuous?
• Binary outcomes: binary choice, credit default,...
• Categorical outcomes: duration, multiple choice,...
Binary Outcomes
Linear Probability Model
What happens when yi is binary?
• Example: yi = 1 if consumer i chooses product A and yi = 0 otherwise
Linear regression model is not appropriate!
• Linear probability model: E[yi |xi ] = P(yi = 1|xi ) = xi′ β
• Can lead to predictions where P(yi = 1|xi ) is below 0 or above 1
Need to think about an alternative model
• Build model where P(yi = 1|xi ) ∈ [0, 1]
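To see the problem concretely, here is a minimal sketch (hypothetical data, in Python rather than the course's R) of fitting a linear probability model by OLS and obtaining a predicted "probability" well above 1:

```python
# hypothetical data: binary outcome y, single regressor x
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0, 0, 1, 1, 1]

# OLS slope and intercept for the linear probability model E[y|x] = a + b*x
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
a = ybar - b * xbar

# predicted "probability" at x = 10 escapes [0, 1]
p_hat = a + b * 10.0
print(p_hat)  # 3.0, far above 1
```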
Latent Outcomes
Assume that there is a continuous latent outcome yi∗ such that
yi = 1 if yi∗ ≥ 0,  and yi = 0 otherwise
Examples
• Choice: yi∗ could be the utility/valuation of a specific product
• Credit default: yi∗ could be the solvency of a company
• yi∗ is normalized such that you buy the product or you default when yi∗ ≥ 0
How does that help us?
Adapting the Linear Regression Model
We can use the linear regression model to relate yi∗ to xi
yi∗ = xi′ β + εi
Under this structure, all you need to know is β!
• Causal analysis: how a change in xi would change yi∗ and eventually yi
• Forecasting: what yi would be under a counterfactual realization of xi
How do we use data on (yi , xi ) to learn about β?
• How can we overcome the challenge that yi∗ is not observed?
Assumptions
We still need EXO and RANK
• E[εi |xi ] = 0 and no perfect collinearity between elements of x
But given that we do not observe yi∗ we need more structure
P(yi = 1|xi ) = P(xi′ β + εi ≥ 0|xi )
Can we recover β from P(yi = 1|xi )?
Probit
Answer is YES if you specify the distribution of ε
• Reminder: this is not needed in the standard linear regression model
Probit model: εi |xi ∼ N (0, 1)
• P(yi = 1|xi ) = Φ(xi′ β)
• Φ(.) is the c.d.f. of the standard normal distribution
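A small sketch of the probit link, evaluating Φ via the error function from the standard library (the values of xi′β below are made up for illustration):

```python
import math

def norm_cdf(z):
    # c.d.f. of the standard normal, via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# probit probabilities P(y=1|x) = Phi(x'beta) for some made-up values of x'beta
for xb in (-2.0, 0.0, 2.0):
    print(xb, norm_cdf(xb))  # always strictly between 0 and 1
```

Because Φ is a c.d.f., the predicted probability can never leave [0, 1], which is exactly what the linear probability model failed to guarantee.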
β is identified under this assumption!
• If we observed the whole population, we could derive β directly
Logit
Alternative to probit: logit model
• Different assumption on distribution of εi
More convenient than probit because of tractable analytical expressions
P(yi = 1|xi ) = exp{xi′β} / (1 + exp{xi′β})   and   P(yi = 0|xi ) = 1 / (1 + exp{xi′β})
β is also identified under this assumption
• Use the fact that log P(yi = 1|xi ) − log P(yi = 0|xi ) = xi′ β
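A quick numerical check of this log-odds identity, using a hypothetical value of xi′β:

```python
import math

def logit_p1(xb):
    # P(y=1|x) under the logit model
    return math.exp(xb) / (1.0 + math.exp(xb))

xb = 0.7  # hypothetical value of x'beta
p1 = logit_p1(xb)
p0 = 1.0 - p1  # equals 1 / (1 + exp{x'beta})
print(math.log(p1) - math.log(p0))  # recovers x'beta = 0.7
```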
What About Inference?
β is identified, now what?
• This does not yet tell us how to make inference about β with our sample
We cannot use OLS for these models
• OLS would only work if we observed yi∗
How can we find an alternative estimator for β?
• Use Maximum Likelihood Estimation (MLE)
• Intuition: find value of β such that the predictions of the model are closest to data
Maximum Likelihood Estimation
Likelihood
We want an estimator that “fits” the data best
• Need to measure how likely it is that our model will predict the observed outcome
Likelihood of individual i: l(β; yi , xi )
• How likely is it, given a value of β, that I observe (yi , xi )?
Likelihood of the sample: ∏i=1,...,n l(β; yi , xi )
• How likely is it, given a value of β, that I observe (yi , xi ) for all i = 1, ..., n?
• If this value is small, the model fits poorly =⇒ we should change β!
Maximum Likelihood Estimator
MLE is the value of β that maximizes the log of the likelihood of the sample
• We take the log for mathematical and computational tractability
Log-likelihood of the sample: L(β; y , X ) = ∑i=1,...,n log l(β; yi , xi )
• Allows us to turn the product into a sum =⇒ easier to compute in R
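A small illustration of why the log matters numerically (the individual likelihood values are made up): the raw product underflows to exactly zero in floating point, while the sum of logs stays finite:

```python
import math

# hypothetical individual likelihood values: each is small but positive
lik = [1e-4] * 200

# raw product of likelihoods underflows to exactly 0.0 in floating point
prod = 1.0
for l in lik:
    prod *= l

# sum of log-likelihoods stays finite and easy to compare across beta values
ll = sum(math.log(l) for l in lik)
print(prod, ll)
```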
The maximum likelihood estimator for β is defined as
β̂ML = arg maxβ L(β; y , X )
Examples: Logit and Probit
Likelihood of individual i in the logit model
l(β; yi , xi ) = ( exp{xi′β} / (1 + exp{xi′β}) )^yi · ( 1 / (1 + exp{xi′β}) )^(1−yi)
Likelihood of individual i in the probit model
l(β; yi , xi ) = Φ(xi′β)^yi · (1 − Φ(xi′β))^(1−yi)
Illustration in R: MLE with Probit Model
Assume that yi∗ = βxi + εi with β = 1
• Probit: εi |xi ∼ N (0, 1)
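The original slide's R illustration is not reproduced here; the following is a comparable sketch in Python that simulates the model above and recovers β by maximizing the probit log-likelihood (a crude grid search stands in for the optimizer an R routine would use):

```python
import math
import random

random.seed(0)

def norm_cdf(z):
    # standard normal c.d.f. via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# simulate the model on the slide: y* = beta*x + eps, beta = 1, eps ~ N(0, 1)
beta_true, n = 1.0, 2000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1 if beta_true * xi + random.gauss(0.0, 1.0) >= 0.0 else 0 for xi in x]

def loglik(b):
    # probit log-likelihood: sum of yi*log(Phi(b*xi)) + (1-yi)*log(1-Phi(b*xi))
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(norm_cdf(b * xi), 1e-12), 1.0 - 1e-12)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

# crude grid search over candidate values of beta (stand-in for a real optimizer)
grid = [i / 100.0 for i in range(201)]  # beta in [0, 2]
beta_hat = max(grid, key=loglik)
print(beta_hat)  # should land close to the true beta = 1
```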
Properties of MLE
β̂ ML is unbiased, consistent and efficient
• Only under EXO and RANK
Unbiased and consistent estimator for the variance of the MLE
V̂ar(β̂ML|X ) = −[ ∂²L(β̂; y , X ) / ∂β∂β′ ]⁻¹ = Î⁻¹
β̂ ML is normally distributed for large n
β̂ML|X ∼ N(β, Î⁻¹)
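A sketch of how Î is used in practice: approximate the second derivative of the log-likelihood at the maximum numerically and invert it to get a standard error. The quadratic "log-likelihood" below is a toy stand-in whose curvature is known exactly:

```python
import math

def L(b):
    # toy log-likelihood (hypothetical quadratic, maximized at b = 1);
    # its exact second derivative is -100, so Var = 1/100 and s.e. = 0.1
    return -50.0 * (b - 1.0) ** 2

b_hat, h = 1.0, 1e-4
# central finite-difference approximation of the second derivative at the MLE
hess = (L(b_hat + h) - 2.0 * L(b_hat) + L(b_hat - h)) / h ** 2
var_hat = -1.0 / hess   # inverse of the observed information
se = math.sqrt(var_hat)
print(se)  # ≈ 0.1
```

With the standard error in hand, confidence intervals and tests work exactly as in Chapter 2, thanks to the large-n normality of β̂ML.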
Recap
Linear regression model is not appropriate for binary outcomes
• Can produce predicted probabilities below 0 or above 1
Alternative model: latent variable model
• Impose linear regression model on continuous latent outcome yi∗
We can make inference about β even if we do not observe yi∗
• Need to impose distributional assumption on ε (probit, logit,...)
• β is identified and β̂ ML is unbiased, consistent and efficient
• =⇒ we can make inference about β using the tools from Chapter 2!
Categorical Outcomes
What should we do when yi is a categorical variable?
• Multiple choice: yi is the product chosen by consumer i
• Survival analysis: yi is the duration before an event occurs (credit default, insurance claim)
As in the binary case, the linear regression model is not appropriate
• Rely instead on latent variable model
• Use linear regression model to link continuous latent outcome to x
• Estimate parameters of interest via Maximum Likelihood
Focus of this chapter: multiple choice analysis
• Useful for analysis of consumer behavior, optimal pricing, advertisement strategy
Discrete Choice Model
Consider yi as being consumer i’s choice over J products
yi = j if i chooses product j,  for j = 1, ..., J
Goal: study the relationship between consumer choice yi and (z1 , z2 , ..., zJ )
• zj : product j’s characteristics (e.g. price, quality)
Latent Utility Model
Define yi as a function of uij the utility consumer i gets from buying product j
yi = j if uij ≥ uik for all k = 1, ..., J
Use linear regression model to link uij to zj
uij = zj′ β + εij
Conditional Logit
Assumptions needed
• As always we need EXO and RANK
• Fix distribution of ε: εij i.i.d. ∼ Gumbel(0, 1)
This is called the conditional logit model
P(yi = j|z1 , ..., zJ ) = exp{zj′β} / ∑k=1,...,J exp{zk′β}
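A minimal numerical sketch of these choice probabilities, with a made-up coefficient and product characteristics:

```python
import math

# hypothetical conditional logit: one characteristic per product (say, price)
beta = -0.5          # assumed coefficient (e.g. disutility of price)
z = [1.0, 2.0, 4.0]  # made-up characteristics of products 1..3

num = [math.exp(beta * zj) for zj in z]
probs = [v / sum(num) for v in num]
print(probs)  # valid probabilities: each in (0, 1), summing to 1
```

With a negative β, products with larger z (here, higher prices) receive lower choice probabilities, as the model intends.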
Estimation of Conditional Logit Model
Use Maximum Likelihood to estimate β
• β̂ ML is unbiased, consistent and efficient under EXO and RANK
Likelihood of individual i
l(β; yi , z1 , ..., zJ ) = ∏j=1,...,J ( exp{zj′β} / ∑k=1,...,J exp{zk′β} )^1{yi =j}
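Putting the pieces together, a sketch (hypothetical parameters, grid search in place of a proper optimizer, Python in place of the course's R) that simulates choices from a conditional logit and recovers β by maximum likelihood:

```python
import math
import random

random.seed(1)

# hypothetical setup: J = 3 products, one characteristic z_j, true beta = -0.5
beta_true = -0.5
z = [1.0, 2.0, 4.0]

def choice_probs(b):
    # conditional logit probabilities P(y = j | z1..zJ)
    e = [math.exp(b * zj) for zj in z]
    s = sum(e)
    return [v / s for v in e]

# simulate n consumers' choices from the true choice probabilities
n = 4000
choices = random.choices(range(len(z)), weights=choice_probs(beta_true), k=n)

def loglik(b):
    # log-likelihood of the sample: sum over i of log P(yi | z1..zJ)
    p = choice_probs(b)
    return sum(math.log(p[c]) for c in choices)

# grid search for the MLE (a stand-in for a real optimizer)
grid = [i / 100.0 - 2.0 for i in range(401)]  # beta in [-2, 2]
beta_hat = max(grid, key=loglik)
print(beta_hat)  # should land close to the true beta = -0.5
```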
Summary
This chapter: what happens when yi is not continuous?
• Linear regression model is not appropriate anymore
Alternative: latent variable model
• Need additional assumptions (fix distribution of errors)
• Need to change the estimator (maximum likelihood estimator)
Very useful applications
• Consumer choice analysis, duration analysis, pricing strategy,...
• Essential in economics, finance, strategy, marketing
Thank you and good luck!