
Lecture 15 Amir Bashir

The document discusses logistic regression for modeling binary dependent variables, specifically focusing on the probability of a firm paying dividends or a worker having a permanent job. It outlines the logistic regression model, its estimation through maximum likelihood, and the interpretation of odds ratios and marginal effects. Additionally, it provides examples and exercises related to gender discrimination in hiring practices, illustrating the application of logistic regression in real-world scenarios.

Lecture 15:

Anderson pdf p-801


Regression with Qualitative (Binary) Dependent Variable: The Logistic Regression
Example 1: Modeling the amount of dividend paid by a corporate firm in a particular period
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$$
where $y$ = amount of dividend paid (or dividend per share),
$x_1$ = firm size (market cap), $x_2$ = retained earnings, $x_3$ = liquidity, $x_4$ = debt ratio,
$x_5$ = profitability, etc.
This is the usual multiple regression model. Its parameters are easily estimated by least squares (LS),
and inference, i.e. tests of hypotheses and confidence intervals, can be carried out within the standard
framework. Goodness of fit can be measured by $R^2$.
However, many firms do not pay a dividend. Now suppose you want to model whether or not
the firm pays a dividend in the period under study, i.e.
$$y = \begin{cases} 1 & \text{pays dividend} \\ 0 & \text{does not pay dividend} \end{cases}$$
We want to model this binary dependent variable using the same explanatory variables (covariates)
as above.
Consider modeling Y as a Bernoulli random variable with probability mass function
$$f(y) = p^{y} (1 - p)^{1 - y}, \quad y = 0, 1$$
where $p = P(Y = 1)$, i.e. the probability that a firm pays a dividend, so that
$$E(Y) = p, \quad V(Y) = p(1 - p)$$
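The mean and variance of the Bernoulli variable follow directly from the definition of expectation for a variable that takes only the values 0 and 1. A short derivation, using only the symbols defined above:

```latex
\begin{aligned}
E(Y)   &= 0\cdot(1-p) + 1\cdot p = p,\\
E(Y^2) &= 0^2\cdot(1-p) + 1^2\cdot p = p,\\
V(Y)   &= E(Y^2) - [E(Y)]^2 = p - p^2 = p\,(1-p).
\end{aligned}
```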
One can consider the usual regression framework
$$y = E(Y) + e$$
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e$$
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
Here $\hat{y}$ is the estimated probability that the firm pays a dividend. This model is called the linear
probability model (LPM). However, it has very undesirable features: the estimated value
of y [the estimate of P(Y = 1)] may come out negative or greater than 1, and a straight line
usually fits binary data poorly. For example, with only one x (e.g. market cap), the scatter plot with
the fitted LPM may look like:
[Figure: scatter plot of y (0 or 1) against x with the fitted LPM line]
Thus we need a model that keeps the estimated probability in the (0, 1) range and also captures
the non-linearity of the relationship. Such a model should be an S-shaped function:
[Figure: S-shaped curve of P(Y = 1) against x]

The CDF of any continuous distribution can be used to model this functional form. One of the
most popular choices is the logistic distribution CDF, given by:
$$E(Y) = p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)}}$$

This is a non-linear function of the independent variables x and of the parameters.
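To make the contrast between the LPM and the logistic model concrete, here is a minimal R sketch. The data are simulated, and the covariate range and the "true" coefficients are assumptions chosen only for illustration; they are not the lecture's data.

```r
# Simulated data (an assumption for illustration; not the lecture's data)
set.seed(1)
n <- 200
x <- runif(n, 0, 10)                      # hypothetical covariate, e.g. firm size
p_true <- plogis(-5 + 1 * x)              # assumed true P(Y = 1 | x), logistic in x
y <- rbinom(n, size = 1, prob = p_true)   # binary outcome

lpm   <- lm(y ~ x)                        # linear probability model (least squares)
logit <- glm(y ~ x, family = binomial())  # logistic regression (maximum likelihood)

# Compare fitted probabilities on a small grid, including values outside the data range
grid    <- data.frame(x = c(-2, 0, 5, 10, 12))
p_lpm   <- predict(lpm, newdata = grid)
p_logit <- predict(logit, newdata = grid, type = "response")
print(cbind(grid, p_lpm, p_logit))
```

The LPM column can leave the [0, 1] interval at the extremes of x, while the logistic column always stays strictly between 0 and 1.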


This model is called logistic regression and is widely used in business, economics,
finance and medicine.
The marginal effect of (a one-unit increase in) a continuous variable $x_k$ is:
$$\frac{dp}{dx_k} = \beta_k \, p \, (1 - p)$$
The marginal effect of a binary dummy variable D is obtained as the difference between the two CDF values:
ME = P(Y = 1 | D=1, X) − P(Y = 1 | D=0, X)
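The derivative formula for a continuous regressor follows from the chain rule applied to the logistic CDF. In the sketch below, z is shorthand introduced here for the linear predictor; it is not notation from the lecture.

```latex
\begin{aligned}
z &= \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \qquad p = \frac{1}{1 + e^{-z}},\\
\frac{dp}{dz} &= \frac{e^{-z}}{(1 + e^{-z})^2}
              = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = p\,(1 - p),\\
\frac{dp}{dx_k} &= \frac{dp}{dz} \cdot \frac{\partial z}{\partial x_k} = \beta_k \, p\,(1 - p).
\end{aligned}
```

Since p(1 − p) is at most 1/4, the marginal effect is largest in magnitude at p = 0.5, where it equals β_k / 4, and shrinks towards zero as p approaches 0 or 1.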
Odds Ratio Interpretation:
The marginal effects computed in terms of probability vary with the values of the explanatory
variables. Logistic regression coefficients are relatively more straightforward to interpret
in terms of odds. The odds in favor of an event are
defined as p/q, i.e. the probability of success divided by the probability of failure (q = 1 − p).
It can be shown that, for a variable x, the exponential of its coefficient gives the odds ratio for a
one-unit increase in x, i.e. odds ratio = $e^{\beta}$. If the odds
ratio is 1, the corresponding x variable has no effect on the odds
of Y = 1 (because $e^{\beta} = 1$ if $\beta = 0$). An odds ratio greater than 1 means the variable has a
positive effect on the odds in favor of Y = 1, and an odds ratio less than 1 means it has a negative
effect. Interpretation is easier when the odds ratio is expressed as a percentage change in the odds.
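The claim that the exponentiated coefficient is an odds ratio can be verified in two lines. The index j and the notation odds(·) are introduced here for illustration; all other regressors are held fixed.

```latex
\begin{aligned}
\frac{p}{1 - p} &= e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k},\\
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)}
  &= \frac{e^{\beta_0 + \cdots + \beta_j (x_j + 1) + \cdots + \beta_k x_k}}
          {e^{\beta_0 + \cdots + \beta_j x_j + \cdots + \beta_k x_k}} = e^{\beta_j}.
\end{aligned}
```

Unlike the marginal effect on the probability, this ratio does not depend on the values of the explanatory variables.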
Parameter Estimation:
Model parameters are estimated by the method of maximum likelihood. Software such as R or SPSS,
or online tools such as the following, can be used.
https://stats.blue/Stats_Suite/logistic_regression_calculator.html
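As a rough illustration of what maximum likelihood estimation does, the sketch below maximizes the Bernoulli log-likelihood numerically with `optim()` and compares the result with `glm()`. The data are simulated and the function name `neg_loglik` is hypothetical. R's `glm()` itself uses iteratively reweighted least squares rather than this generic optimizer, so this is a check of the idea, not a description of its internals.

```r
# Simulated data (assumption for illustration only)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * x))
X <- cbind(1, x)                          # design matrix with an intercept column

# Negative Bernoulli log-likelihood of the logistic model
neg_loglik <- function(beta) {
  eta <- drop(X %*% beta)                 # linear predictor
  -sum(y * eta - log1p(exp(eta)))         # log L = sum[ y*eta - log(1 + e^eta) ]
}

fit_optim <- optim(c(0, 0), neg_loglik, method = "BFGS")  # numerical maximization
fit_glm   <- glm(y ~ x, family = binomial())              # built-in estimator

print(rbind(optim = fit_optim$par, glm = unname(coef(fit_glm))))
```

The two rows should agree up to the optimizer's numerical tolerance.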
Example: The following logit model was estimated (based on n = 261 observations) to investigate
whether a worker in a South Indian town has a permanent job (Y = 1) or a temporary job (Y = 0).
The estimates from R appear below (file PermanentJob.xls):
$$P(Y_i = 1 \mid X) = \frac{1}{1 + e^{-\hat{L}_i}}$$
$$\hat{L}_i = -2.608 + 0.0549\,Age + 0.1425\,W + 0.208\,eduPRI + 1.124\,eduSEC + 3.563\,eduPOST$$
SE (0.0122) (0.348) (0.382) (0.354) (1.055)
Where Age = Worker’s age in years,
W = 1 if worker is a woman, 0 if man
eduPRI = 1 if the worker’s maximum education is primary, (0 otherwise)
eduSEC = 1 if the worker’s maximum education is secondary, (0 otherwise)
eduPOST = 1 if the worker’s maximum education is post-secondary, (0 otherwise)
Note: No education is the reference category

a) Estimate the probability that a worker has a permanent job, given that the worker is a
25-year-old woman with secondary education.
b) Estimate the probability that a worker has a permanent job, given that the worker is a
25-year-old man with secondary education.
c) Find the marginal effect of the worker's gender on the probability of having a permanent job,
given that the worker is a 25-year-old woman with secondary education. Also interpret this
number.
d) Find the marginal effect of another year of the worker's age on the probability of having a
permanent job for a 25-year-old woman who has completed secondary education.
e) Predict whether a 30-year-old male worker with no education will have a
permanent or a temporary job.
f) Estimate the probability that a 40-year-old male worker with post-secondary education will
have a temporary job.
g) Find and interpret the odds ratios of the age and gender (W) variables.

Sol: a)
$$\hat{L}_i = -2.608 + 0.0549(25) + 0.1425(1) + 0.208(0) + 1.124(1) + 3.563(0) = 0.031$$
$$P(Y_i = 1 \mid X) = \frac{1}{1 + e^{-0.031}} = 0.5077$$
b)
$$\hat{L}_i = -2.608 + 0.0549(25) + 0.1425(0) + 0.208(0) + 1.124(1) + 3.563(0) = -0.1115$$
$$P(Y_i = 1 \mid X) = \frac{1}{1 + e^{0.1115}} = 0.47215$$
c) $$ME_{Gender} = P(Y_i = 1 \mid W = 1) - P(Y_i = 1 \mid W = 0) = 0.5077 - 0.47215 = 0.0355$$
The probability that a 25-year-old woman with secondary education has a permanent job is
0.0355 (about 3.6 percentage points) higher than that of a man with the same characteristics.
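Parts a) to c) can be checked numerically. The sketch below hard-codes the coefficients reported for the permanent-job model; the helper name `p_perm` is hypothetical, and `plogis()` is base R's logistic CDF.

```r
# Estimated coefficients copied from the fitted model in the example
b <- c(const = -2.608, Age = 0.0549, W = 0.1425,
       eduPRI = 0.208, eduSEC = 1.124, eduPOST = 3.563)

# Helper (illustrative name): P(Y = 1 | x) for one worker
p_perm <- function(Age, W, eduPRI = 0, eduSEC = 0, eduPOST = 0) {
  L <- sum(b * c(1, Age, W, eduPRI, eduSEC, eduPOST))  # linear predictor L-hat
  plogis(L)                                            # 1 / (1 + exp(-L))
}

p_f <- p_perm(Age = 25, W = 1, eduSEC = 1)   # part a: woman, 25, secondary education
p_m <- p_perm(Age = 25, W = 0, eduSEC = 1)   # part b: man, 25, secondary education
me_gender <- p_f - p_m                       # part c: marginal effect of the dummy W
print(round(c(female = p_f, male = p_m, ME = me_gender), 4))
```

Because the code subtracts unrounded probabilities, the last digit of the marginal effect may differ slightly from the hand calculation.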
d)
$$\frac{dP}{dAge} = \beta_{Age} \, p \, (1 - p)$$
From part a), $p = P(Y_i = 1 \mid X) = \frac{1}{1 + e^{-0.031}} = 0.5077$.

Hence
$$\frac{dP}{dAge} = \beta_{Age} \, p \, (1 - p) = 0.0549\,(0.5077)(1 - 0.5077) = 0.0137$$
So for a woman with these characteristics, one more year of age raises her probability of having a
permanent job by about 0.0137 (roughly 1.4 percentage points).
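The derivative in part d) is an approximation to the effect of a full one-year change. A small sketch, using the Age coefficient and the linear predictor from part a) as inputs, compares it with the exact change in probability from age 25 to age 26:

```r
b_age <- 0.0549                         # Age coefficient from the fitted model
L25   <- 0.031                          # linear predictor from part a) (woman, 25, secondary)
p     <- plogis(L25)                    # P(Y = 1) at age 25

me_derivative <- b_age * p * (1 - p)            # dP/dAge evaluated at age 25
me_discrete   <- plogis(L25 + b_age) - p        # exact change in P from age 25 to 26
print(round(c(derivative = me_derivative, discrete = me_discrete), 4))
```

The two numbers are close here because the logistic curve is nearly linear around p = 0.5 for such a small step.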
e)
$$\hat{L}_i = -2.608 + 0.0549(30) + 0.1425(0) + 0.208(0) + 1.124(0) + 3.563(0) = -0.961$$
$$P(Y_i = 1 \mid X) = \frac{1}{1 + e^{0.961}} = 0.2766$$
As this estimated probability is less than 0.5, we predict that such a worker will have a
temporary job.
f)
$$\hat{L}_i = -2.608 + 0.0549(40) + 0.1425(0) + 0.208(0) + 1.124(0) + 3.563(1) = 3.151$$
$$P(Y_i = 1 \mid X) = \frac{1}{1 + e^{-3.151}} = 0.9589$$
Hence the probability of such a person having a temporary job is $1 - p = 1 - 0.9589 = 0.041$.
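Parts e) and f) apply the same formula to two different workers, so they can be computed together in matrix form. This is a sketch; the row labels and object names are illustrative, and the 0.5 cutoff is the classification rule used in part e).

```r
# Coefficients in the order: const, Age, W, eduPRI, eduSEC, eduPOST
b <- c(-2.608, 0.0549, 0.1425, 0.208, 1.124, 3.563)

# One row per worker: part e (man, 30, no education); part f (man, 40, post-secondary)
X <- rbind(e = c(1, 30, 0, 0, 0, 0),
           f = c(1, 40, 0, 0, 0, 1))

p_perm <- plogis(drop(X %*% b))          # P(permanent job) for each row
p_temp <- 1 - p_perm                     # P(temporary job) is the complement
pred   <- ifelse(p_perm >= 0.5, "permanent", "temporary")  # classify with a 0.5 cutoff
print(data.frame(p_perm = round(p_perm, 4), p_temp = round(p_temp, 4), pred))
```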
g) Age: odds ratio = $e^{0.0549} = 1.0564$.
Interpretation: a one-year increase in age is associated with a 5.64% increase in the odds in favor of
a permanent job (vs. a temporary job), keeping other variables fixed
[calculated as (1.0564 − 1) × 100]. Note that for an odds ratio < 1, the percentage decrease in odds is
calculated as (1 − odds ratio) × 100.
Gender (W): odds ratio = $e^{0.1425} = 1.153$.
Interpretation: a woman has 15.3% greater odds of having a permanent job than a man (keeping other
variables fixed).
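The odds-ratio arithmetic in part g) is a one-liner in R. The sketch below also shows, as an added illustration, that odds ratios compound multiplicatively over larger changes in a regressor; the 10-year gap is a hypothetical example, not part of the exercise.

```r
b <- c(Age = 0.0549, W = 0.1425)        # coefficients from the fitted model

odds_ratio <- exp(b)                    # odds ratio per one-unit increase
pct_change <- (odds_ratio - 1) * 100    # % change in odds (negative would mean a decrease)
print(round(rbind(odds_ratio, pct_change), 3))

# A 10-year age difference multiplies the odds by exp(10 * b_Age), not by 10 * exp(b_Age)
or_10yrs <- exp(10 * b[["Age"]])
print(or_10yrs)
```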
Anderson pdf p-809 Ex 44, 46, 47
Further Exercises:
Ex: Gender discrimination in hiring. Suppose you are investigating allegations of gender
discrimination in the hiring practices of a particular firm. An equal-rights group claims that females
are less likely to be hired than males with the same background, experience, and other
qualifications. Data collected on 28 former applicants (shown below; file 'hire.xls') will
be used to fit a logistic regression model of hire on educ, exp and genderM.
a) Estimate the probability that a female candidate with 5 years of higher education and 6
years of experience is hired.
b) Estimate the probability that a male candidate with 5 years of higher education and 6
years of experience is hired.
c) Find and interpret the marginal effect of the candidate's gender on the hiring probability for a
candidate with 5 years of higher education and 6 years of experience.
d) Find and interpret the marginal effect of another year of experience on the hiring probability
for a candidate with 5 years of higher education and 6 years of experience.
e) Predict whether a female candidate with 5 years of higher education and 6
years of experience is hired.
f) Find and interpret the odds ratios of the experience and gender variables.

hire  educ  exp  genderM
0     6     2    0
0     4     0    1
1     6     6    1
1     6     3    1
0     4     1    0
1     8     3    0
0     4     2    1
0     4     4    0
0     6     1    0
1     8     10   0
0     4     2    1
0     8     5    0
0     4     2    0
0     6     7    0
1     4     5    1
0     6     4    0
0     8     0    1
1     6     1    1
0     4     7    0
0     4     1    1
0     4     5    0
0     6     0    1
1     8     5    1
0     4     9    0
0     8     1    0
0     6     1    1
1     4     10   1
1     6     12   0

Using R (use the data in the csv file hiring.csv):

hiring <- read.csv(file.choose())   # load the data and name the data frame hiring
head(hiring)                        # display the first few rows of the data frame
# estimate the logistic regression model; genderM is the gender column in the table above
# (change the name if your csv file labels it differently, e.g. male)
model1 <- glm(hire ~ educ + exp + genderM, data = hiring, family = binomial())
summary(model1)                     # show the results summary
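The exercise questions can be approached with `predict()` once the model is fitted. The sketch below types the 28 rows of the table directly into a data frame, so it runs without the csv file. It assumes genderM = 1 denotes a male applicant (inferred from the column name, not stated in the notes), and the object names `new`, `p_hat`, `me_gender`, `me_exp` and `odds_ratios` are illustrative.

```r
# Data typed in from the table above (28 applicants)
hiring <- data.frame(
  hire    = c(0,0,1,1,0,1,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,1,1),
  educ    = c(6,4,6,6,4,8,4,4,6,8,4,8,4,6,4,6,8,6,4,4,4,6,8,4,8,6,4,6),
  exp     = c(2,0,6,3,1,3,2,4,1,10,2,5,2,7,5,4,0,1,7,1,5,0,5,9,1,1,10,12),
  genderM = c(0,1,1,1,0,0,1,0,0,0,1,0,0,0,1,0,1,1,0,1,0,1,1,0,0,1,1,0)
)

model1 <- glm(hire ~ educ + exp + genderM, data = hiring, family = binomial())

# Candidates with 5 years of higher education and 6 years of experience
# (assumption: genderM = 1 means male, 0 means female)
new <- data.frame(educ = 5, exp = 6, genderM = c(0, 1),
                  row.names = c("female", "male"))
p_hat <- predict(model1, newdata = new, type = "response")   # parts a) and b)

me_gender   <- p_hat[["male"]] - p_hat[["female"]]           # part c): dummy-variable effect
me_exp      <- coef(model1)[["exp"]] * p_hat * (1 - p_hat)   # part d): beta * p * (1 - p)
pred_female <- as.integer(p_hat[["female"]] >= 0.5)          # part e): 1 = hired, 0.5 cutoff
odds_ratios <- exp(coef(model1))                             # part f)

print(p_hat); print(me_gender); print(me_exp); print(pred_female); print(odds_ratios)
```

With only 28 observations the estimates are imprecise, so the standard errors in `summary(model1)` are worth checking before interpreting these numbers.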
