
Chapter 3

The document discusses logistic regression, a key model for categorical response data, highlighting its applications in fields such as biomedical studies, social science, and marketing. It explains the logistic regression model's structure, its interpretation using odds ratios, and provides an example using horseshoe crab data to illustrate the model's application and significance testing. Additionally, it includes a step-by-step guide on how to perform binary logistic regression using SPSS.


Logistic Regression

In introducing generalized linear models for binary data in Chapter 2, we highlighted logistic regression. It is the most important model for categorical response data and is used in an increasingly wide variety of applications. Early uses were in biomedical studies, but the past 20 years have also seen heavy use in social science research and marketing. Recently, logistic regression has become a popular tool in business applications.

Some credit-scoring applications use logistic regression to model the probability that a subject is creditworthy. For instance, a model for the probability that a subject pays a bill on time may use predictors such as the size of the bill, annual income, occupation, mortgage and debt obligations, percentage of bills paid on time in the past, and other aspects of the applicant's credit history. A company that relies on catalog sales may decide whether to send a catalog to a potential customer by modeling the probability of a sale as a function of indices of past buying behavior. In this chapter we study logistic regression more closely.

3.1 THE LOGISTIC REGRESSION MODEL

To begin, suppose there is a single explanatory variable X, which is quantitative. For a binary response
variable Y, recall that π(x) denotes the “success” probability at value x. This probability is the parameter
for the binomial distribution. The logistic regression model has linear form for the logit of this
probability,

logit[π(x)] = α + βx … … … … (1)

The formula implies that π(x) increases or decreases as an S-shaped function of x. The logistic regression
formula implies the following formula for the probability π(x),

π(x) = exp(α + βx) / [1 + exp(α + βx)] … … … … (2)
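To make formula (2) concrete, here is a minimal Python sketch of the logistic curve; the parameter values plugged in are the horseshoe crab estimates reported later in this chapter, used here only for illustration:

```python
import math

def logistic_prob(x, alpha, beta):
    """pi(x) = exp(alpha + beta*x) / (1 + exp(alpha + beta*x)), formula (2)."""
    e = math.exp(alpha + beta * x)
    return e / (1 + e)

# Fitted values for the horseshoe crab data (estimated later in the chapter)
alpha, beta = -12.3508, 0.4972

# With beta > 0 the curve is an increasing S-shape bounded between 0 and 1
probs = [logistic_prob(w, alpha, beta) for w in range(18, 36)]
```

Evaluating the function over a grid of widths confirms the S-shape: every probability lies strictly between 0 and 1, the curve is monotone increasing for β > 0, and it passes through 0.50 exactly at x = −α/β.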

3.2 Horseshoe Crabs: Viewing and Smoothing a Binary Outcome

To illustrate these interpretations, we analyze the horseshoe crab data introduced in Table 3.1. The table
comes from a study of nesting horseshoe crabs (J. Brockmann, Ethology, 102: 1–21, 1996). Each female
horseshoe crab in the study had a male crab in her nest. The study investigated factors that affect whether
the female crab had any other males, called satellites, residing near her. The response outcome for each
female crab is her number of satellites. Explanatory variables are the female crab’s color, spine condition,
weight and female crab’s shell or carapace width (in centimeters). For now, we use female crab’s shell
width alone as a predictor/explanatory variable. In the sample, this shell width had a mean of 26.3 cm and
a standard deviation of 2.1 cm.

Table 3.1 Number of Crab Satellites by Female's Color (C), Spine Condition (S), Width (W), Weight
(Wt) and Number of Satellites (Sa)

Here, Y indicates whether a female crab has any satellites. That is, Y = 1 if a female crab has at least one
satellite, and Y = 0 if she has none.

From SPSS, we get the following output after performing binary logistic regression, where the explanatory
variable (X) is Width (W) and the dependent variable (Y) is having a satellite.

Let π(x) denote the probability that a female horseshoe crab of width x has a satellite. The estimated
probability of a satellite according to the data is

π̂(x) = exp(−12.3508 + 0.4972x) / [1 + exp(−12.3508 + 0.4972x)]

In this sample the minimum width is 21.0 cm, and the estimated probability at this minimum width is

π̂(21.0) = exp(−12.3508 + 0.4972(21.0)) / [1 + exp(−12.3508 + 0.4972(21.0))] = 0.129

At the maximum width of 33.5 cm, the estimated probability equals

π̂(33.5) = exp(−12.3508 + 0.4972(33.5)) / [1 + exp(−12.3508 + 0.4972(33.5))] = 0.987

The median effective level is the width at which π̂(x) = 0.50. This is

EL50 = −α̂/β̂ = −(−12.3508)/0.4972 = 24.8

Figure 1 plots the estimated probabilities as a function of width.
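These fitted values are easy to verify with a few lines of Python; the intercept and slope below are the SPSS estimates quoted above:

```python
import math

a_hat, b_hat = -12.3508, 0.4972  # SPSS estimates for the crab data

def pi_hat(x):
    """Estimated probability of a satellite at width x (cm)."""
    e = math.exp(a_hat + b_hat * x)
    return e / (1 + e)

print(round(pi_hat(21.0), 3))    # 0.129 at the minimum width
print(round(pi_hat(33.5), 3))    # 0.987 at the maximum width
print(round(-a_hat / b_hat, 1))  # EL50 = 24.8, where pi_hat = 0.50
```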

The logistic regression formula (1) indicates that the logit (logarithm of the odds) increases by β for every
1 cm increase in x. The parameter β in equations (1) and (2) determines the rate of increase or decrease of
the S-shaped curve for π(x). The sign of β indicates whether the curve ascends (β > 0) or descends (β <
0), and the rate of change increases as |β| increases.

When β = 0, the right-hand side of equation (2) simplifies to a constant. Then, π(x) is identical at all x, so
the curve becomes a horizontal straight line. The binary response Y is then independent of X.

Figure 1 shows the S-shaped appearance of the model for π(x), for horseshoe crab data. Since it is curved
rather than a straight line, the rate of change in π(x) per 1 unit increase in x depends on the value of x. A
straight line drawn tangent to the curve at a particular x value, such as shown in Figure 1, describes the
rate of change at that point. For logistic regression parameter β, that line has slope equal to βπ(x)[1 −
π(x)]. For instance, the line tangent to the curve at x for which π(x) = 0.50 has slope β(0.50)(0.50) =
0.25β; by contrast, when π(x) = 0.90 or 0.10, it has slope 0.09β. The slope approaches 0 as the probability
approaches 1.0 or 0.

The steepest slope occurs at x for which π(x) = 0.50. That x value relates to the logistic regression
parameters by x = −α/β. This x value is sometimes called the median effective level and is denoted EL50.
It represents the level at which each outcome has a 50% chance.
At the sample mean width of 26.3 cm, π̂(x) = 0.674. By the slope formula above, the incremental rate of change
in the fitted probability at that point is β̂π̂(x)[1 − π̂(x)] = 0.4972(0.674)(0.326) = 0.11. For female
crabs near the mean width, the estimated probability of a satellite increases at the rate of 0.11 per 1 cm
increase in width. The estimated rate of change is greatest at the x value (24.8) at which ^π ( x ) = 0.50;
there, the estimated probability increases at the rate of (0.4972)(0.50)(0.50) = 0.12 per 1 cm increase in

width. Unlike the linear probability model, the logistic regression model permits the rate of change to
vary as x varies.
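The tangent-slope formula βπ(x)[1 − π(x)] is easy to check numerically with the fitted crab estimates:

```python
import math

a_hat, b_hat = -12.3508, 0.4972  # fitted intercept and slope

def pi_hat(x):
    e = math.exp(a_hat + b_hat * x)
    return e / (1 + e)

def slope(x):
    """Instantaneous rate of change of pi_hat at x: beta * pi * (1 - pi)."""
    p = pi_hat(x)
    return b_hat * p * (1 - p)

print(round(slope(26.3), 2))            # 0.11 at the mean width
print(round(slope(-a_hat / b_hat), 2))  # 0.12 at EL50, the steepest point
```

The slope at EL50 is the maximum possible, β̂/4 = 0.4972(0.25), matching the "steepest slope occurs where π(x) = 0.50" claim above.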

3.3 Odds Ratio Interpretation


An important interpretation of the logistic regression model uses the odds and the odds ratio. For model
(1), the odds of response 1 (i.e., the odds of a success) are

π(x) / [1 − π(x)] = exp(α + βx)

so increasing x by 1 multiplies the odds by exp(β).

For the horseshoe crabs, the logistic regression model is


logit[π̂(x)] = −12.3508 + 0.4972x
When
x = 26.3: π̂(x) = 0.674, odds = 0.674/0.326 = 2.07, and log(odds) = ln(2.07)
x = 27.3: π̂(x) = 0.773, odds = 0.773/0.227 = 3.40 = 2.07 × 1.64, and log(odds) = ln(3.40) = ln(2.07) + 0.4972
x = 28.3: π̂(x) = 0.848, odds = 0.848/0.152 = 5.57 = 2.07 × 1.64 × 1.64, and log(odds) = ln(5.57) = ln(3.40) + 0.4972
……..

That means that for each one-unit (1 cm) increase in x, the odds are multiplied by e^β̂ = e^0.4972 = 1.64. So the
estimated odds of a satellite multiply by 1.64 for each centimeter increase in width. The interpretations are as
follows:

β̂ = 0.4972 indicates that, for a one-unit change in width, the expected change in the log odds of having at
least one satellite is 0.4972.

Keep in mind that the estimated logistic regression coefficient also provides an estimate of
the odds ratio, i.e., OR = e^β̂.
So OR = e^β̂ = exp(0.4972) = 1.64 indicates that there is a 64% increase in the odds of having at least
one satellite for each 1 cm increase in the width of the female crab's shell.
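The multiplicative odds interpretation can be checked directly: under the fitted model the odds at width x equal exp(α̂ + β̂x), so each extra centimeter multiplies the odds by exp(β̂):

```python
import math

a_hat, b_hat = -12.3508, 0.4972

def odds(x):
    # odds = pi/(1 - pi) = exp(a + b*x) under the logistic model
    return math.exp(a_hat + b_hat * x)

print(round(odds(26.3), 2))       # about 2.07
print(round(odds(27.3), 2))       # about 3.40
print(round(math.exp(b_hat), 2))  # odds ratio per 1 cm: 1.64
```

The ratio odds(27.3)/odds(26.3) equals exp(β̂) exactly, regardless of the starting width, which is what makes the odds ratio such a convenient summary.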

3.4 INFERENCE FOR LOGISTIC REGRESSION

Confidence Intervals for Effects


A large-sample Wald confidence interval for the parameter β in the logistic regression model (1) is

β̂ ± z_(α/2) · SE
For the logistic regression analysis of the horseshoe crab data, the estimated effect of width on the
probability of a satellite is β̂ = 0.497, with SE = 0.102. A 95% Wald confidence interval for β is 0.497 ±
1.96(0.102), or (0.298, 0.697).

The confidence interval for the odds ratio is (e^0.298, e^0.697) = (1.347, 2.007). We infer that a 1 cm
increase in width yields at least a 34 percent increase, and at most a doubling, of the odds that a female
crab has a satellite.
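A quick check of the interval arithmetic, using β̂ and SE as reported (1.96 is the 97.5th percentile of the standard normal):

```python
import math

b_hat, se, z = 0.4972, 0.102, 1.96
lo, hi = b_hat - z * se, b_hat + z * se
print(round(lo, 2), round(hi, 2))                      # CI for beta, about (0.30, 0.70)
print(round(math.exp(lo), 2), round(math.exp(hi), 2))  # CI for the odds ratio, about (1.35, 2.01)
```

Exponentiating the endpoints works because exp is monotone: the interval for β maps directly to an interval for the odds ratio e^β.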

Significance Testing
For the logistic regression model, H₀: β = 0 states that the probability of success is independent of X. The
large-sample Wald test statistic,

z = β̂ / SE,

has a standard normal distribution when β = 0. Refer z to the standard normal table to get a one-sided or
two-sided P-value. Equivalently, for the two-sided Hₐ: β ≠ 0, z² has a large-sample chi-squared null
distribution with df = 1. For the horseshoe crab data, the Wald statistic is

z = β̂ / SE = 0.4972 / 0.102 = 4.9
This shows strong evidence of a positive effect of width on the presence of satellites (P < 0.0001). The
equivalent chi-squared statistic, z² = 23.9, has df = 1.
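The Wald statistic and its P-value can be reproduced in a few lines; the two-sided normal P-value is computed here via the complementary error function:

```python
import math

b_hat, se = 0.4972, 0.102
z = b_hat / se               # Wald statistic
chi_sq = z ** 2              # equivalent chi-squared statistic, df = 1
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal P-value
print(round(z, 1), round(chi_sq, 1), p_value)
```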

How to run binary logistic regression in SPSS:


The whole process shown here is done in SPSS 22; it can be performed in the same way in earlier versions of
SPSS as well.
1. You have to input the variable info in “variable view”.

2. In “data view” you will input the data.

3. Our dependent variable (Y) is binary, with categories having at least
one satellite (Y = 1) or no satellite (Y = 0), so we create a new binary
variable from "Sa".

4. Select the variable you want to change and tap the button.

5. Write the name and label of the new variable and tap “change”.

6. Tap the button

7. If the number of satellites is zero, then our new variable Y (having at least one
satellite) is zero. Tap the "Add" button.

8. If the number of satellites is one or more, then our new variable Y (having at least one
satellite) is one (that means yes). Tap the "Add" button.

9. As this new variable is categorical (string), you have to tap the button and then
“continue”.

10. Tap "OK".

11. Our new binary variable Y is ready.

12. Follow

13. Select the new variable as the dependent variable.

14.

15. Select the independent variable Width as a covariate.

16. Go to "Options".

17. Select the confidence interval for the odds ratio [OR = exp(B)] and tap "Continue".

18.

19. In the output window, scroll down and you will find the results of the binary logistic
regression.
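For readers without SPSS, the same maximum-likelihood fit can be sketched in pure Python. The Newton–Raphson iteration below is the standard algorithm such packages use internally for logistic regression; the tiny (x, y) data set is hypothetical, invented only to make the sketch runnable, and is not the crab data:

```python
import math

# Hypothetical illustrative data (NOT the crab data): x mimics shell width,
# y = 1 if the female has at least one satellite, 0 otherwise.
x = [22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

def fit_logistic(x, y, iters=25):
    """Newton-Raphson maximum likelihood for logit[pi(x)] = a + b*x."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Score vector X'(y - p)
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX with W = diag(p(1 - p))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += (X'WX)^{-1} X'(y - p)
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    # Score at the final estimates (should be ~0 at the MLE)
    p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))
    g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
    return a, b, (g0, g1)

a_fit, b_fit, score = fit_logistic(x, y)
print(round(a_fit, 2), round(b_fit, 2))
```

Running the same routine on the full crab data (width as x, the 0/1 satellite indicator as y) should reproduce estimates close to the SPSS output above, since both maximize the same binomial likelihood.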

