Republic of Benin
University of Abomey-Calavi
                                  (UAC)
                       Faculty of Agronomic Sciences
                                   (FAS)
MASTER IN STATISTICS, BIOSTATISTICS ORIENTATION
BINARY MODELS: LOGIT AND PROBIT
                                Group 6
Members:
Boris BEHINGAN
Auric DJENONTIN
Elisé TOHO
Lecturer:
Dr. Ir. Epiphane SODJINOU, Agricultural Economist, Biostatistician
                                                          July 2016
Outline
Introduction
1- Logit model
     1-1-    Principles
     1-2-    Estimation of the Logit Model
     1-3-    Steps in estimating Logit Regression
2- Probit model
     2-1-    Assumption of the model
     2-2-    Steps involved in estimation of Probit Model
3- Logit versus probit
4- Application in R
Conclusion
References
Introduction
There are certain types of regression models in which the dependent or response variable is
dichotomous in nature, taking the value 1 or 0. Special estimation methods are associated with
such models. The most commonly used approaches to estimating them are the linear
probability model, the logit model and the probit model. Here we develop only the logit
and probit models. In the first part we explain the theoretical aspects of logit and
probit regression; we then show their application in R.
1- Logit model
1-1-    Principles
Logit regression (logit) analysis is a uni- or multivariate technique that estimates the
probability that an event occurs by predicting a binary dependent outcome from a set
of independent variables. Consider an example of home ownership, where the dependent variable is
whether or not a family owns a house, in relation to income. The linear probability model can be written as:
$$P_i = E(Y = 1 \mid X_i) = \beta_1 + \beta_2 X_i$$
where $X_i$ is the income of family $i$ and $Y = 1$ means that the family owns a house.
Let us consider the following representation of home ownership:
$$P_i = E(Y = 1 \mid X_i) = \frac{1}{1 + \exp[-(\beta_1 + \beta_2 X_i)]} = \frac{1}{1 + \exp(-Z_i)} \quad (1)$$
where $Z_i = \beta_1 + \beta_2 X_i$.
Equation (1) is known as the (cumulative) logistic distribution function. Here $Z_i$ ranges from
$-\infty$ to $+\infty$, while $P_i$ ranges between 0 and 1.
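As a minimal sketch of equation (1), the logistic CDF can be evaluated in R. The values of beta1 and beta2 and the income levels below are made up for illustration; they are not estimates from this report.

```r
# Logistic CDF of equation (1), written out by hand
logistic_cdf <- function(z) 1 / (1 + exp(-z))

beta1 <- -4      # hypothetical intercept
beta2 <- 0.2     # hypothetical slope on income
income <- c(0, 10, 20, 30, 40)   # hypothetical income levels

z <- beta1 + beta2 * income      # Z_i can be any real number
p <- logistic_cdf(z)             # P_i always lies strictly between 0 and 1
print(data.frame(income, z, p))

# plogis() is base R's logistic CDF and should agree with the hand-written version
print(all.equal(p, plogis(z)))
```

Whatever values are chosen for the coefficients, the probabilities stay inside the unit interval, which is the property the linear probability model lacks.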
$P_i$ is the probability of owning a house and is given by $P_i = \frac{1}{1 + \exp(-Z_i)}$. The probability of
not owning a house is then $1 - P_i = \frac{1}{1 + \exp(Z_i)}$.
We can therefore define the odds ratio in favour of owning a house as
$$\frac{P_i}{1 - P_i} = \frac{1 + \exp(Z_i)}{1 + \exp(-Z_i)} = \exp(Z_i) \quad (2)$$
Taking the natural log of (2), we obtain the logit $L_i$:
$$L_i = \ln\left[\frac{P_i}{1 - P_i}\right] = Z_i = \beta_1 + \beta_2 X_i \quad (3)$$
    -     As $P$ goes from 0 to 1, the logit $L$ goes from $-\infty$ to $+\infty$. That is, although the
          probabilities lie between 0 and 1, the logits are not bounded.
    -     Although $L$ is linear in $X$, the probabilities themselves are not.
    -     The interpretation of the logit model is as follows: the slope $\beta_2$ measures the change
          in $L$ for a unit change in $X$. It tells how the log odds in favour of owning a house
          change as income changes by one unit. The intercept $\beta_1$ is the value of the log odds in
          favour of owning a house if income is zero.
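The relations between probability, odds and logit described above can be checked numerically. This is only a sketch: the probabilities and the coefficients beta1 and beta2 are hypothetical values chosen for illustration.

```r
# From probabilities to odds to logits
p <- c(0.10, 0.25, 0.50, 0.75, 0.90)
odds <- p / (1 - p)
logit <- log(odds)                  # unbounded, although p lies in (0, 1)
print(data.frame(p, odds, logit))

# qlogis() is base R's logit function (the inverse of plogis())
print(all.equal(logit, qlogis(p)))

# A one-unit increase in X adds beta2 to the log odds,
# which multiplies the odds by exp(beta2)
beta1 <- -4     # hypothetical intercept
beta2 <- 0.2    # hypothetical slope
x <- 20
odds_x  <- exp(beta1 + beta2 * x)
odds_x1 <- exp(beta1 + beta2 * (x + 1))
print(c(ratio = odds_x1 / odds_x, exp_beta2 = exp(beta2)))
```

The last line shows why the slope is often reported as exp(beta2): it is the factor by which the odds change per unit of X.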
1-2-      Estimation of the Logit Model
To estimate the logit model we need, apart from $X_i$, the values of the logit $L_i$. With grouped data,
where $N_i$ families have income $X_i$ and $n_i$ of them own a house, we
compute the estimated relative frequency $\hat{P}_i = n_i / N_i$. This relative frequency is an estimate of the
true $P_i$ corresponding to each $X_i$. Using the estimated $\hat{P}_i$, we can obtain the estimated logit as:
$$\hat{L}_i = \ln\left[\frac{\hat{P}_i}{1 - \hat{P}_i}\right] = \hat{Z}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$$
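The computation of the relative frequencies and the estimated logits can be sketched in R as follows. The grouped data (income levels, group sizes N and number of owners n) are hypothetical numbers used only to illustrate the arithmetic.

```r
# Hypothetical grouped data: N families at each income level, n of them own a house
grouped <- data.frame(
  income = c(6, 8, 10, 13, 15, 20, 25, 30, 35, 40),
  N      = c(40, 50, 60, 80, 100, 70, 65, 50, 40, 25),
  n      = c(8, 12, 18, 28, 45, 36, 39, 33, 30, 20)
)

# Estimated relative frequency P_hat = n / N
grouped$p_hat <- grouped$n / grouped$N

# Estimated logit L_hat = ln(P_hat / (1 - P_hat))
grouped$logit_hat <- log(grouped$p_hat / (1 - grouped$p_hat))

print(grouped)
```

Note that the estimated logit is undefined when a group has no owners or only owners (P_hat equal to 0 or 1), so this approach assumes every group contains both outcomes.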
1-3-      Steps in estimating Logit Regression
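Step 1
Compute the estimated probability of owning a house for each income level $X_i$ as $\hat{P}_i = n_i / N_i$.
Step 2
For each $X_i$, obtain the logit as $\hat{L}_i = \ln[\hat{P}_i / (1 - \hat{P}_i)]$.
Step 3
Transform the logit regression as follows:
$$\sqrt{W_i}\, L_i = \beta_1 \sqrt{W_i} + \beta_2 \sqrt{W_i}\, X_i + \sqrt{W_i}\, U_i \quad (4)$$
where $W_i = N_i \hat{P}_i (1 - \hat{P}_i)$ and $U_i$ is the disturbance term, which is heteroscedastic in the
untransformed model; weighting by $\sqrt{W_i}$ removes this heteroscedasticity.
Step 4
Estimate (4) by OLS.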
Step 5
Establish confidence intervals and/or test hypotheses in the usual OLS framework.
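The five steps above can be sketched in R. The grouped data are hypothetical, and the weight is taken to be W = N * P_hat * (1 - P_hat), the usual weighted least squares weight for grouped logit data; treat both as assumptions of this illustration.

```r
# Hypothetical grouped data: N families per income level, n of them own a house
grouped <- data.frame(
  income = c(6, 8, 10, 13, 15, 20, 25, 30, 35, 40),
  N      = c(40, 50, 60, 80, 100, 70, 65, 50, 40, 25),
  n      = c(8, 12, 18, 28, 45, 36, 39, 33, 30, 20)
)

# Steps 1 and 2: relative frequencies and estimated logits
grouped$p_hat <- grouped$n / grouped$N
grouped$L_hat <- log(grouped$p_hat / (1 - grouped$p_hat))

# Step 3: weights and their square roots (assumed form of W, see lead-in)
grouped$w  <- grouped$N * grouped$p_hat * (1 - grouped$p_hat)
grouped$sw <- sqrt(grouped$w)

# Step 4: OLS on the transformed variables; "0 +" drops the usual intercept
# because beta1 is now the coefficient on sqrt(W)
fit_transformed <- lm(I(sw * L_hat) ~ 0 + sw + I(sw * income), data = grouped)

# The same estimates come from weighted least squares on the original variables
fit_wls <- lm(L_hat ~ income, data = grouped, weights = w)
print(coef(fit_transformed))
print(coef(fit_wls))

# Step 5: confidence intervals in the usual OLS framework
print(confint(fit_wls))
```

The weighted call is usually more convenient in practice because the coefficient names stay readable, while the transformed regression makes the link to equation (4) explicit.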
2- Probit model
   To explain the behaviour of a dichotomous dependent variable, we have to use a
   suitably chosen cumulative distribution function (CDF). The logit model uses the
   cumulative logistic function, but this is not the only CDF that one can use. In some
   applications the normal CDF has been found useful. The estimating model that emerges
   from the normal CDF is known as the probit model.
   Let us assume that, in the home ownership example, the decision of the ith family to own a
   house or not depends on an unobservable utility index $I_i$ that is determined by the
   explanatory variables in such a way that the larger the value of the index $I_i$, the greater the
   probability of the family owning a house. The index $I_i$ can be expressed as
   $$I_i = \beta_1 + \beta_2 X_i \quad (5)$$
   where $X_i$ is the income of the ith family.
2-1-     Assumption of the model
   For each family there is a critical or threshold level of the index, $I_i^*$, such that if $I_i$
   exceeds $I_i^*$, the family will own a house, otherwise not. Like $I_i$, the threshold level $I_i^*$ is not
   observable. If it is assumed to be normally distributed with the same mean and
   variance, it is possible to estimate the parameters of equation (5) and thus get some
   information about the unobservable index itself.
   In probit analysis, the unobservable utility index $I_i$ is known as the normal equivalent deviate
   (n.e.d.) or simply the normit. Since the n.e.d. $I_i$ is negative whenever $P_i < 0.5$, in practice
   the number 5 is added to the n.e.d., and the result is called the probit:
   Probit = n.e.d. + 5 = $I_i + 5$
   In order to estimate $\beta_1$ and $\beta_2$, (5) can be written as
   $$I_i = \beta_1 + \beta_2 X_i + U_i \quad (6)$$
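A short sketch of how the n.e.d. and the probit relate to a probability, using base R's standard normal quantile function qnorm(). The probabilities below are illustrative values, not data from this report.

```r
# Illustrative probabilities of owning a house
p_hat <- c(0.20, 0.35, 0.50, 0.65, 0.80)

# The n.e.d. is the standard normal quantile of the probability
ned <- qnorm(p_hat)      # negative whenever p_hat < 0.5

# Adding 5 gives the probit, which is positive for these probabilities
probit <- ned + 5

print(data.frame(p_hat, ned, probit))

# pnorm() maps the n.e.d. back to the probability
print(all.equal(pnorm(ned), p_hat))
```

Adding 5 is purely a historical convenience to avoid negative values; it shifts the intercept of the fitted line and leaves the slope unchanged.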
2-2-     Steps involved in estimation of Probit Model
Step 1
Compute the estimated probability of owning a house for each income level $X_i$, as in the case of
the logit model: $\hat{P}_i = n_i / N_i$.
Step 2
Given $\hat{P}_i$, obtain the n.e.d. $I_i$ from the standard normal CDF; this is the index modelled as $I_i = \beta_1 + \beta_2 X_i + U_i$.
Step 3
Add 5 to the estimated $I_i$ to convert them into probits, and use the probits thus obtained as the
dependent variable in (6).
Step 4
The disturbance term is heteroscedastic, as in the logit model. In order to get efficient
estimates, one has to transform the model.
Step 5
Estimate the transformed model (6) by OLS.
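The steps above can be sketched in R on hypothetical grouped data. The text does not give the form of the transformation in Step 4; the weight used below, N * dnorm(I)^2 / (P_hat * (1 - P_hat)), is a delta-method choice introduced here as an assumption, not taken from the source.

```r
# Hypothetical grouped data: N families per income level, n of them own a house
grouped <- data.frame(
  income = c(6, 8, 10, 13, 15, 20, 25, 30, 35, 40),
  N      = c(40, 50, 60, 80, 100, 70, 65, 50, 40, 25),
  n      = c(8, 12, 18, 28, 45, 36, 39, 33, 30, 20)
)

# Step 1: relative frequencies
grouped$p_hat <- grouped$n / grouped$N

# Step 2: n.e.d. from the standard normal CDF
grouped$ned <- qnorm(grouped$p_hat)

# Step 3: probits
grouped$probit <- grouped$ned + 5

# Unweighted fits: adding 5 changes only the intercept, not the slope
fit_ned    <- lm(ned ~ income, data = grouped)
fit_probit <- lm(probit ~ income, data = grouped)
print(coef(fit_ned))
print(coef(fit_probit))

# Steps 4 and 5: weighted least squares with assumed delta-method weights
grouped$w <- grouped$N * dnorm(grouped$ned)^2 /
  (grouped$p_hat * (1 - grouped$p_hat))
fit_weighted <- lm(probit ~ income, data = grouped, weights = w)
print(summary(fit_weighted))
```

With individual-level data one would normally fit the probit by maximum likelihood with glm() instead of this grouped-data procedure.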
3- Logit versus probit
    -    The difference between logit and probit models lies in the assumption on the
         distribution of the error term in the model. For the logit model, the errors are assumed to
         follow the standard logistic distribution, while for the probit model the errors are assumed to
         follow a normal distribution.
    -    The logistic distribution is similar to the normal distribution but has slightly fatter tails.
Figure 1 : Logit and probit trend
Source : Harari-Kermadec, 2009
    -    Is logit better than probit, or vice versa? Both methods yield similar results. Preference
         for probit or logit tends to vary by discipline. Logit is more popular in health sciences
         such as epidemiology. The probit model is popular in econometrics and is used by economists and
         political scientists.
    -    Although logit and probit models give qualitatively similar results, the parameter estimates
         of the two models are not directly comparable. To make $\beta$ comparable between the
         logit and probit models there is an approximate relationship: multiply the probit $\beta$ by
         1.81 (approximately $\pi / \sqrt{3}$) and it will be approximately the same as the logit $\beta$.
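The comparison between the two models can be illustrated with a small simulation. Everything below is hypothetical: the data are simulated from a logit model with made-up coefficients, and the observed ratio of slopes is only expected to be roughly in line with the approximate conversion factor.

```r
set.seed(1)                         # for reproducibility of the simulation
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + x)) # simulated binary outcome, made-up coefficients

fit_logit  <- glm(y ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y ~ x, family = binomial(link = "probit"))

# Ratio of the logit slope to the probit slope
ratio <- unname(coef(fit_logit)["x"] / coef(fit_probit)["x"])
print(c(ratio = ratio, pi_over_sqrt3 = pi / sqrt(3)))

# Tail comparison: a logistic variable rescaled to unit variance
# puts more probability below -3 than a standard normal does
tail_logistic <- plogis(-3, scale = sqrt(3) / pi)
tail_normal   <- pnorm(-3)
print(c(logistic = tail_logistic, normal = tail_normal))

# Fitted probabilities from the two models are usually very close
print(max(abs(fitted(fit_logit) - fitted(fit_probit))))
```

The conversion factor is a rule of thumb, so the simulated ratio should be read as an order of magnitude check rather than an exact identity.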
4- Application in R
The command used to perform logit or probit analysis is the function glm, available in base R. The
following syntax shows how to run it.
# Import the data
mydata <- read.table("Poids.txt", header = TRUE)
mydata is the name of the data set containing y, x1, x2 and x3, where y is the dependent variable,
which takes the values 0 and 1 and is therefore dichotomous, and x1, x2 and x3 are the explanatory variables.
# Model
 or probit <- glm (y~ x1 + x2 + x3, family=binomial (link="logit or probit"),
data=mydata)
summary (logit or probit)
# Use summary to get the result
Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = mydata)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0277   0.2347   0.5542   0.7016   1.0839
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.4262     0.6390   0.667   0.5048
x1            0.8618     0.7840   1.099   0.2717
x2            0.3665     0.3082   1.189   0.2343
x3            0.7512     0.4548   1.652   0.0986 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 70.056 on 69 degrees of freedom
Residual deviance: 65.512 on 66 degrees of freedom
AIC: 73.512
Number of Fisher Scoring iterations: 5
      -   The Pr(>|z|) column shows the two-tailed p-values testing the null hypothesis that the
          coefficient is equal to zero (no effect). The usual significance level is 0.05; by this
          measure none of the coefficients has a significant effect on the log odds of the
          dependent variable. The coefficient for x3 is significant at the 10% level (p < 0.10).
      -   The z value is the test statistic for the same null hypothesis: the estimate divided by its standard error.
      -   The Estimate column shows the coefficients. When x3 increases by one unit, the
          expected change in the log odds is 0.7512, holding the other predictors constant. What you get from this column is whether
          the effect of each predictor is positive or negative.
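The z values, p-values and AIC in the output above can be reproduced by hand from the reported estimates and standard errors. The sketch below uses only the rounded numbers printed in the summary, so small rounding differences are to be expected.

```r
# Estimates and standard errors copied from the summary output above
est <- c("(Intercept)" = 0.4262, x1 = 0.8618, x2 = 0.3665, x3 = 0.7512)
se  <- c("(Intercept)" = 0.6390, x1 = 0.7840, x2 = 0.3082, x3 = 0.4548)

# z value = estimate / standard error; two-tailed p-value from the standard normal
z <- est / se
p <- 2 * pnorm(-abs(z))
print(round(cbind(z, p), 4))

# Which coefficients are significant at the 5% and 10% levels?
print(names(p)[p < 0.05])   # none
print(names(p)[p < 0.10])   # x3 only

# AIC = residual deviance + 2 * number of estimated coefficients (for this binary model)
aic <- 65.512 + 2 * length(est)
print(aic)
```

With a fitted model object, the same table is available directly as coef(summary(fit)).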
# Here it is the sign of the coefficients that matters: it shows whether y and x move in the
same direction. We also need to look at the significance of each coefficient. In this example,
only x3 is significant, at the 10% level.
# With the package mfx, we can get the odds ratios using the following command (the dependent variable is named y_bin here)
library(mfx)
logitor(y_bin ~ x1 + x2 + x3, data = mydata)
And we get
Call:
logitor(formula = y_bin ~ x1 + x2 + x3, data = mydata)
Odds Ratio:
   OddsRatio Std. Err.      z   P>|z|
x1   2.36735   1.85600 1.0992 0.27168
x2   1.44273   0.44459 1.1894 0.23427
x3   2.11957   0.96405 1.6516 0.09861 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# We have seen that only x3 is significant at the 10% level, so we focus the interpretation of the
odds ratio on x3. When x3 increases by one unit, the odds of y = 1 increase by 112%, that is,
(2.12 - 1) * 100. Equivalently, the odds of y = 1 are 2.12 times higher when x3 increases by one unit (keeping all
other predictors constant).
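The odds ratios are simply the exponentials of the logit coefficients, which can be checked with the rounded estimates reported earlier. The standard errors computed below use the delta method (odds ratio times coefficient standard error); this appears to match the Std. Err. column, but it is stated here as an observation, not as documented behaviour of the package.

```r
# Logit coefficients and standard errors copied from the glm summary output
est <- c(x1 = 0.8618, x2 = 0.3665, x3 = 0.7512)
se  <- c(x1 = 0.7840, x2 = 0.3082, x3 = 0.4548)

# Odds ratio = exp(coefficient)
odds_ratio <- exp(est)

# Delta-method standard error of the odds ratio (assumption, see lead-in)
or_se <- odds_ratio * se

# Percentage change in the odds for a one-unit increase in each predictor
pct_change <- (odds_ratio - 1) * 100

print(round(cbind(odds_ratio, or_se, pct_change), 3))
```

For a fitted glm object named fit, exp(coef(fit)) gives the same odds ratios without any extra package.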
Conclusion
Binary models are used when the dependent (response) variable is dichotomous.
Logit and probit are the models most commonly used in this case. They give similar results, and the choice between them depends
largely on the discipline.
References
Torres-Reyna O., 2004. Logit/Probit models in R. Princeton University, 12 p.
Harari-Kermadec H., 2009. Econométrie 2 : données qualitatives, probit et logit. 7 p.
Wooldridge J. M., 2002. Econometric Analysis of Cross Section and Panel Data. p. 453-460.