Business Analytics and
Operations Research
Roger R. Gung, Ph.D.
Sr. Director, Business Analytics & Operations Research
roger.gung@phoenix.edu
6/5/2017 Copyright 2016 University of Phoenix. All Rights Reserved. 1
Generalized Linear Model (GLM)
Generalized Linear Model is a flexible generalization of ordinary linear
regression that allows for dependent/target variables that have error
distribution models other than a normal distribution
GLM generalizes linear regression by allowing the linear model to be related to
the dependent/target variable via a link function
Two major types of GLM:
Logistic Regression
Count Regression
Logistic Regression
Logistic Regression is a regression model where the dependent/target
variable is categorical
Binary dependent variable
Multiple category dependent variable
Cases where the dependent variable has more than two outcome categories may be
analyzed in multinomial logistic regression, or,
if the multiple categories are ordered, in ordinal logistic regression
This course covers the case of a binary dependent variable, that is, where it
can take only two values, "0" and "1", which represent outcomes such as
pass/fail, win/lose, alive/dead or healthy/sick
Link function for binary dependent variable:
Logit function
Probit function
Logistic Regression
Logit function:
$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right)$
[Figure: p (vertical axis, 0 to 1) plotted against logit(p) (horizontal axis, -10 to 10)]
Logistic Regression
Probit function: Inverse CDF of standard normal distribution
[Figure: p (vertical axis, 0 to 1) plotted against probit(p) (horizontal axis, -10 to 10)]
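As a minimal sketch, the logit and probit link functions can be evaluated with Python's standard library. The function names are illustrative and do not come from the slides.

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit link: the log odds of a probability p, for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def probit(p):
    """Probit link: inverse CDF of the standard normal distribution."""
    return NormalDist().inv_cdf(p)

def inv_probit(eta):
    """Inverse probit: the standard normal CDF."""
    return NormalDist().cdf(eta)

# Both links map probabilities in (0, 1) onto the whole real line.
for p in (0.1, 0.5, 0.9):
    print(p, round(logit(p), 3), round(probit(p), 3))
```

Both links are zero at p = 0.5 and symmetric around it; the logit has heavier tails than the probit.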
Count Regression
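Count Model Mean-Variance Relationship
Model                       Mean    Variance
Poisson                     $\mu$   $\mu$
Negative Binomial (NB1)     $\mu$   $\mu(1 + \alpha)$
Negative Binomial (NB2)     $\mu$   $\mu(1 + \alpha\mu)$
Poisson Inverse Gaussian    $\mu$   $\mu(1 + \alpha\mu^2)$
Negative Binomial P         $\mu$   $\mu(1 + \alpha\mu^{\rho})$
Generalized Poisson         $\mu$   $\mu(1 + \alpha\mu)^2$
Poisson Regression
$\log(\mu) = \beta_0 + \sum_i \beta_i X_i$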
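The sketch below illustrates Poisson regression with a log link for a single regressor, fitted by Newton-Raphson on the Poisson log-likelihood. The data and the function name `fit_poisson` are hypothetical; in practice a statistical package would be used.

```python
import math

def fit_poisson(xs, ys, iterations=25):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum((y - m) * x for x, y, m in zip(xs, ys, mus))
        # Information matrix [[a, b], [b, c]] (negative Hessian).
        a = sum(mus)
        b = sum(m * x for x, m in zip(xs, mus))
        c = sum(m * x * x for x, m in zip(xs, mus))
        det = a * c - b * b
        # Newton step: solve the 2x2 system in closed form.
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

# Hypothetical count data, for illustration only.
xs = [0, 1, 2, 3, 4]
ys = [2, 2, 3, 4, 5]
b0, b1 = fit_poisson(xs, ys)
print("intercept:", round(b0, 3), "slope:", round(b1, 3))
```

Because the Poisson model forces the variance to equal the mean, the other models in the table are candidates when the observed variance exceeds the fitted mean.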
SAS Proc Logistic
Variable Significance Level
P-Value: Significance level of each variable's impact on model goodness of fit
Odds Ratio: Significance of each variable's impact on the magnitude of the target
variable, measured by comparing the odds of the best level to the odds of
the worst level of the driver. It is defined as follows:
$\text{Odds Ratio} = \frac{p_{\text{Best Level}} / (1 - p_{\text{Best Level}})}{p_{\text{Worst Level}} / (1 - p_{\text{Worst Level}})}$
Information Value (IV) and Weight of Evidence (WOE): Significance of each
variable's impact on the magnitude of the target variable
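A small sketch of the odds ratio calculation follows. The probabilities passed in are hypothetical and the function names are illustrative.

```python
def odds(p):
    """Odds corresponding to a probability p, for 0 <= p < 1."""
    return p / (1.0 - p)

def odds_ratio(p_best, p_worst):
    """Odds at the best level of a driver divided by the odds at its worst level."""
    return odds(p_best) / odds(p_worst)

# Hypothetical event probabilities at the best and worst levels of one driver.
print(round(odds_ratio(0.5, 0.2), 3))
```

An odds ratio near 1 means the driver barely moves the target between its best and worst levels.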
Nonlinear Regression
Nonlinear Regression is a form of regression analysis in which
response/target variable is modeled by a function which is a nonlinear
combination of independent variables
Examples of nonlinear functions include exponential functions, logarithmic
functions, trigonometric functions, power functions, Gaussian function
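[Example model equation lost in extraction]
Convert [expression lost in extraction] to [expression lost in extraction]
Do not convert to the log form, because [symbol lost in extraction] is not the random
noise for [expression lost in extraction]
Similarly for the following forms:
[Equations lost in extraction; one involves a log term]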
Nonlinear Regression for Marketing
Each independent variable has a lift factor
to describe the diminishing return
(financial benefit) to the dependent variable
It is a multiplicative/interactive model that can be converted to an additive model
Marketing demand model:
$y = \prod_i x_i^{\beta_i}$
$\log(y) = \sum_i \beta_i \log(x_i)$
where $\beta_i$ is the lift factor of independent variable $x_i$
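To illustrate how a multiplicative model becomes additive after taking logs, the sketch below fits a one-variable power model by least squares on the log scale. The form y = a * x**b, the constant a, the data and the function name are assumptions made for this example, and the log transform presumes multiplicative noise.

```python
import math

def fit_power_model(xs, ys):
    """Estimate a and b in y = a * x**b via least squares on log(y) = log(a) + b*log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    # Ordinary least squares slope and intercept on the log scale.
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = math.exp(mean_y - b * mean_x)
    return a, b

# Hypothetical spend (x) and demand (y); an exponent below 1 means diminishing return.
spend = [1, 2, 4, 8, 16]
demand = [2 * x ** 0.5 for x in spend]
a, b = fit_power_model(spend, demand)
print("scale:", round(a, 3), "lift factor:", round(b, 3))
```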
Binning Nonlinear Variable
It is not easy to find the best nonlinear function for each numerical variable
Convert numerical variables into categorical variables
Apply decision tree to bin numerical variables
Each bin/level will have a linear coefficient
[Figure: chart of the binned variable; one region is annotated "less population"]
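As a rough sketch of tree-based binning, the code below finds the single cut point that a depth-one regression tree would choose, that is, the split minimizing the squared error within the two bins. A full decision tree would apply the same search recursively inside each bin. The data and function names are hypothetical.

```python
def best_split(xs, ys):
    """Cut point on x that minimizes the within-bin squared error of y."""
    pairs = sorted(zip(xs, ys))

    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_cut, best_err = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        err = sse(left) + sse(right)
        if err < best_err:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_err = err
    return best_cut

def bin_means(xs, ys, cut):
    """Mean response in the bin at or below the cut and in the bin above it."""
    low = [y for x, y in zip(xs, ys) if x <= cut]
    high = [y for x, y in zip(xs, ys) if x > cut]
    return sum(low) / len(low), sum(high) / len(high)

# Hypothetical numerical variable with a step-like effect on the response.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
cut = best_split(xs, ys)
print("cut point:", cut, "bin means:", bin_means(xs, ys, cut))
```

Each resulting bin then enters the regression as a level of a categorical variable with its own coefficient.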
Binning Nonlinear Variable
Binning nonlinear variables can also be applied to GLM
The dependent variable needs to be judgmentally binned to observe the
nonlinearity, because it is a probability measurement
[Figure: chart of the binned variable; one region is annotated "less population"]
Receiver Operating Characteristic
Receiver operating characteristic curve, i.e. ROC curve, is a graphical plot that
illustrates the diagnostic ability of a binary classifier system as its discrimination
threshold is varied
ROC curve is created by plotting the true positive rate (TPR) against the false
positive rate (FPR) at various threshold settings
The true-positive rate is also known as sensitivity, recall or probability of
detection in machine learning
The false-positive rate is also known as the fall-out or probability of false alarm and
can be calculated as (1 - specificity)
ROC Basic Concept
ROC Space
Constructing ROC Curve
In binary classification, the class prediction for each instance is made based on
a continuous random variable X, which is a score computed for the instance
(e.g. estimated probability in Logistic Regression)
Given a threshold parameter T, the instance is classified as positive if X >T,
and negative otherwise
X follows a probability density $f_1(x)$ if the instance actually belongs to the
positive class, and $f_0(x)$ otherwise.
TPR is given by
$\mathrm{TPR}(T) = \int_T^{\infty} f_1(x)\,dx$
FPR is given by
$\mathrm{FPR}(T) = \int_T^{\infty} f_0(x)\,dx$
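The construction can be sketched empirically: for each threshold T, count the positives and negatives with score X > T. The scores, labels and function name below are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold T, classifying X > T as positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Every distinct score is a threshold; -infinity adds the point (1, 1).
    thresholds = [float("-inf")] + sorted(set(scores))
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s > t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return sorted(points)

# Hypothetical scores (e.g. estimated probabilities) and true classes (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(round(fpr, 3), round(tpr, 3))
```

Raising the threshold moves along the curve toward (0, 0); lowering it moves toward (1, 1).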
Area under Curve, C-statistics
ROC curve is created by plotting the true positive rate (TPR) against the false
positive rate (FPR) at various threshold settings
The area under the curve (AUC), also known as C-statistics, is equal to the
probability that a classifier will rank a randomly chosen positive instance higher
than a randomly chosen negative one
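AUC is given by
$\mathrm{AUC} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} I(T' > T)\, f_1(T')\, f_0(T)\, dT'\, dT = P(X_1 > X_0)$
where $X_1$ is the score of a positive instance, $X_0$ is the score of a negative instance, and $f_1$ and $f_0$ are their probability densities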
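The ranking interpretation can be checked numerically. The sketch below computes AUC once as the share of positive-negative pairs ranked correctly and once as the trapezoidal area under the ROC points; the data and function names are hypothetical.

```python
def auc_by_ranking(scores, labels):
    """Share of (positive, negative) pairs where the positive scores higher; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_trapezoid(points):
    """Area under a ROC curve given as (FPR, TPR) points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores and classes, with the ROC points they produce.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
roc = [(0, 0), (0, 1 / 3), (0, 2 / 3), (1 / 3, 2 / 3), (1 / 3, 1), (2 / 3, 1), (1, 1)]

print(round(auc_by_ranking(scores, labels), 4))
print(round(auc_by_trapezoid(roc), 4))
```

Both calculations give 8/9 here, since 8 of the 9 positive-negative pairs are ranked correctly.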
Weight of Evidence
In logistic regression, we usually have categorical variables. One can choose to
use Weight of Evidence (WOE) to transform categorical variables into numerical
variables
WOE of each level of a categorical variable measures the strength of the level
for separating yes and no in response variable
Benefits of WOE transformation
Linear assumption in logistic regression naturally holds (explained on the next slide)
Reduces the degrees of freedom and helps with the overfitting problem
Check of collinearity between categorical variables becomes easy
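WOE and Naive Bayes
For independent variable $X_i$, $i = 1, 2, \ldots, n$, the WOE of the $j$th level is
calculated as follows:
$\mathrm{WOE}_{ij} = \ln \frac{\text{Count of Yes in level } j \,/\, \text{Total count of Yes}}{\text{Count of No in level } j \,/\, \text{Total count of No}} = \ln \frac{\%\text{ of Yes}_j}{\%\text{ of No}_j}$
Naive Bayes classifier is given by:
$\ln \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \ln \frac{P(Y=1)\, P(X \mid Y=1) / P(X)}{P(Y=0)\, P(X \mid Y=0) / P(X)}$
$= \ln \frac{P(Y=1)\, P(X_1 \mid Y=1)\, P(X_2 \mid Y=1) \cdots P(X_n \mid Y=1)}{P(Y=0)\, P(X_1 \mid Y=0)\, P(X_2 \mid Y=0) \cdots P(X_n \mid Y=0)}$
$= \ln \frac{P(Y=1)}{P(Y=0)} + \sum_{i=1}^{n} \ln \frac{P(X_i \mid Y=1)}{P(X_i \mid Y=0)}$
The terms in the sum form the WOE vector
Linear assumption after WOE transformation holds!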
WOE and Information Value (IV)
Weight of evidence indicates the predictive power of a particular level of the
variable
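Information value (IV) assesses the differentiating power of the variable, which
indicates how much the levels differ from one another with regard to the
outcome, such as pass or fail
IV is given by:
$\mathrm{IV} = \sum_{\text{all } j} \left(\%\text{ of Yes}_j - \%\text{ of No}_j\right) \times \mathrm{WOE}_j$
where $\%\text{ of Yes}_j$ and $\%\text{ of No}_j$ are the shares of all Yes and all No outcomes that fall in level $j$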
One rule of thumb regarding IV is (Siddiqi, Naeem 2006):
IV < 0.02: unpredictive
0.02 <= IV < 0.1: weak
0.1 <= IV < 0.3: medium
0.3 <= IV < 0.5: strong
IV >= 0.5: suspicious and should be checked for over-predicting
Calculation of WOE & IV
Here is an example of calculating WOE and IV
Degree Level   #PSI      #REG     #NonREG   Distribution_REG   Distribution_NonREG   WOE      IV
Associate      29,553    4,224    25,329    16.4%              8.1%                   0.701   0.058
Bachelor       119,216   12,820   106,396   49.8%              34.2%                  0.376   0.059
Master         21,911    1,774    20,137    6.9%               6.5%                   0.063   0.000
Doctorate      1,268     46       1,222     0.2%               0.4%                  -0.787   0.002
Missing        164,978   6,871    158,107   26.7%              50.8%                 -0.643   0.155
ALL            336,926   25,735   311,191   100%               100%                           0.274
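The table can be reproduced from its #REG and #NonREG counts. The sketch below treats REG as the "Yes" outcome and NonREG as the "No" outcome; the function and variable names are illustrative.

```python
import math

# (#REG, #NonREG) per degree level, copied from the example table.
counts = {
    "Associate": (4224, 25329),
    "Bachelor": (12820, 106396),
    "Master": (1774, 20137),
    "Doctorate": (46, 1222),
    "Missing": (6871, 158107),
}

def woe_iv(counts):
    """Return {level: (WOE, IV contribution)} from per-level (yes, no) counts."""
    total_yes = sum(yes for yes, _ in counts.values())
    total_no = sum(no for _, no in counts.values())
    result = {}
    for level, (yes, no) in counts.items():
        dist_yes = yes / total_yes  # Distribution_REG
        dist_no = no / total_no     # Distribution_NonREG
        woe = math.log(dist_yes / dist_no)
        result[level] = (woe, (dist_yes - dist_no) * woe)
    return result

table = woe_iv(counts)
for level, (woe, iv) in table.items():
    print(f"{level:10s} WOE={woe:7.3f} IV={iv:6.3f}")
total_iv = sum(iv for _, iv in table.values())
print("Total IV =", round(total_iv, 3))
```

A total IV of about 0.27 places Degree Level in the "medium" band of the rule of thumb.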
Conversion of WOE Coefficients
When one uses WOE-transformed variables to build logistic regression, the
corresponding coefficient estimates represent the change of log odds in relation
to the increase of one unit in WOE. However, this coefficient cannot be easily
interpreted.
In order to obtain the coefficients for original variables, conversion of WOE
coefficients can be performed as follows:
Multiply the coefficient with WOE for each level
Choose one level as reference (for example Undergraduate, the Bachelor level), and subtract the
multiplied result for the reference from each level
Then we get the coefficient for the original variable Degree Level, and interpretation is
simple. For example, the log odds of REG for an Associate (AA) PSI is 0.318 more than that of
an undergraduate PSI
Degree Level   WOE      Coefficient Estimate   Multiplied   Compared with Reference (Undergraduate)
Associate       0.701   0.980                   0.687        0.318
Bachelor        0.376   0.980                   0.369       --
Master          0.063   0.980                   0.062       -0.307
Doctorate      -0.787   0.980                  -0.771       -1.140
Missing        -0.643   0.980                  -0.631       -0.999
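The two conversion steps can be sketched as follows, using the rounded WOE values and the coefficient estimate from the table. Variable names are illustrative, and small differences in the third decimal come from rounding of the WOE inputs.

```python
# WOE per level and the single coefficient of the WOE-transformed variable.
woe = {"Associate": 0.701, "Bachelor": 0.376, "Master": 0.063,
       "Doctorate": -0.787, "Missing": -0.643}
coefficient = 0.980
reference = "Bachelor"  # the Undergraduate reference level

# Step 1: multiply the coefficient with the WOE of each level.
multiplied = {level: coefficient * w for level, w in woe.items()}

# Step 2: subtract the reference level's result from each level.
relative = {level: m - multiplied[reference] for level, m in multiplied.items()}

for level in woe:
    print(f"{level:10s} multiplied={multiplied[level]:7.3f} "
          f"vs reference={relative[level]:7.3f}")
```

The values in `relative` are differences in log odds against the reference level, which is how dummy-variable coefficients are usually read.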
Survival Analysis
Survival analysis is for analyzing the expected duration of time until one or
more events happen, such as death in biological organisms and failure in
mechanical systems
Kaplan-Meier survival analysis is a non-parametric statistical analysis used to
estimate the survival function from lifetime data
Kaplan-Meier survival analysis can take into account some types of censored
data:
Right-censoring
Left-censoring
[Figure: timelines of Individuals 1 to 4 on a calendar-time axis between the experiment start time and the experiment end time; Individual 2 is right-censored and Individual 4 is left-censored]
Kaplan-Meier Survival Analysis
[Figure: timelines of Individuals 1 to 4 aligned on a survival-time axis (Individual 2 right-censored, Individual 4 left-censored), above a step curve of survival probability against survival time with marked levels 1.0, 2/3, 2/4 and 1/4]
Survival probability is unknown from this time point.
Kaplan-Meier survival probability is often viewed as actual probability.
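A minimal sketch of the Kaplan-Meier product-limit estimator follows. It handles right-censoring only, and the durations, censoring flags and function name are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, S(time))] at each distinct event time.

    durations: time on study for each individual
    observed:  True if the event occurred, False if right-censored
    """
    curve = []
    survival = 1.0
    event_times = sorted(set(d for d, e in zip(durations, observed) if e))
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        # Multiply by the conditional probability of surviving past t.
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Four hypothetical individuals; the second is right-censored at time 3.
durations = [2, 3, 5, 7]
observed = [True, False, True, True]
curve = kaplan_meier(durations, observed)
print(curve)
```

The censored individual still counts in the risk set at time 2 but drops out before time 5, so the second step divides by 2 rather than 3.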
Survival Modeling
Survival function is the basic model employed to describe
time-to-event phenomena.
Survival function, S(t), is the probability of an individual
surviving beyond time t.
$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t)$
$f(t) = -\frac{dS(t)}{dt}$
[Figure: survival curve S(t), whose slope at time t is -f(t)]
Survival Modeling
- Hazard Rate/Function
Hazard rate is the death rate at time t conditional on
survival until time t or later, that is T >= t.
Suppose that an individual has survived for a time t; the hazard rate is the limiting
probability per unit time that it will not survive for an additional time $\Delta t$.
$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
$= \frac{f(t)}{S(t)}$
$= -\frac{d \log[S(t)]}{dt}$
$H(t) = \int_0^t h(u)\,du = -\log[S(t)]$
$S(t) = e^{-H(t)}$
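The relation between the hazard and the survival function can be checked numerically. The sketch below integrates a hazard function with the trapezoid rule and exponentiates; the hazard functions and the function name are hypothetical.

```python
import math

def survival_from_hazard(hazard, t, steps=10000):
    """S(t) = exp(-H(t)), with H(t) the integral of the hazard from 0 to t."""
    dt = t / steps
    cumulative = sum((hazard(i * dt) + hazard((i + 1) * dt)) / 2.0 * dt
                     for i in range(steps))
    return math.exp(-cumulative)

# Constant hazard 0.1: S(t) should equal exp(-0.1 * t).
s_constant = survival_from_hazard(lambda u: 0.1, 5.0)

# Linearly increasing hazard h(u) = 2u: H(t) = t**2, so S(t) = exp(-t**2).
s_increasing = survival_from_hazard(lambda u: 2.0 * u, 1.5)

print(round(s_constant, 4), round(s_increasing, 4))
```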
Survival Modeling
- Competing Risks
Assume there are k competing risks (death factors).
$h_i(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t,\ d = i \mid T \ge t)}{\Delta t}$
$h_T(t) = \sum_{i=1}^{k} h_i(t)$
$H_T(t) = \int_0^t h_T(u)\,du = -\log[S(t)]$
$S(t) = e^{-H_T(t)}$
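Because the cause-specific hazards add, the combined survival function is the product of the survival functions implied by each risk alone. The sketch below checks this for two risks with constant hazards; the rates and names are hypothetical.

```python
import math

def combined_survival(cumulative_hazards):
    """S(t) = exp(-H_T(t)), where H_T(t) sums the cause-specific cumulative hazards."""
    return math.exp(-sum(cumulative_hazards))

# Two hypothetical competing risks with constant weekly hazards.
hazard_a, hazard_b = 0.01, 0.03
t = 20  # weeks

# With a constant hazard h, the cumulative hazard at time t is h * t.
s_combined = combined_survival([hazard_a * t, hazard_b * t])
s_product = math.exp(-hazard_a * t) * math.exp(-hazard_b * t)
print(round(s_combined, 4), round(s_product, 4))
```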
Survival Modeling
- Parametric Regression
Define the hazard function as a parametric regression function.
$h(t) = h_0(t)\, e^{\sum_i \beta_i(t) X_i}$
Parametric Regression:
$\log[h(t)] = \log[h_0(t)] + \sum_i \beta_i(t) X_i$
$= \beta_0(t) + \sum_i \beta_i(t) X_i$
Survival Modeling
- Semi-parametric Regression
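Cox (1972) defined the proportional hazard rate:
$\frac{h(t \mid X)}{h(t \mid X^*)} = \frac{h_0(t)\, e^{\sum_i \beta_i X_i}}{h_0(t)\, e^{\sum_i \beta_i X_i^*}} = e^{\sum_i \beta_i (X_i - X_i^*)}$
where $X^*$ is the reference attribute vector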
From which, we fit parameters/coefficients for regressors
using maximum likelihood estimator, without worrying
about baseline hazard rate h0(t).
After getting coefficients, fit baseline hazard rate h0(t).
Partial Likelihood Estimator for Cox
Proportional Hazard Survival Model
In a sample of size n consisting of (Tj, Xj), j = 1, 2, ..., n, we assume the
censoring time is non-informative in that, given Xj, the event and
censoring time for the jth individual are independent.
Let t1 < t2 < ... < tD denote the ordered event times and X(k)i be the ith
covariate associated with the individual whose death time is tk.
Define the risk set at time tk, R(tk), as the set of all individuals who are
still under study at the time just prior to tk.
The partial likelihood based on the hazard rate is expressed by
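$L(\beta) = \prod_{k=1}^{D} \frac{\exp\left(\sum_{i=1}^{p} \beta_i X_{(k)i}\right)}{\sum_{j \in R(t_k)} \exp\left(\sum_{i=1}^{p} \beta_i X_{ji}\right)}$
$\log L(\beta) = \sum_{k=1}^{D} \sum_{i=1}^{p} \beta_i X_{(k)i} - \sum_{k=1}^{D} \log\left[\sum_{j \in R(t_k)} \exp\left(\sum_{i=1}^{p} \beta_i X_{ji}\right)\right]$
where p is the number of covariates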
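The log partial likelihood can be evaluated directly from its definition. The sketch below assumes no tied event times; the data and the function name are hypothetical, and a real fit would maximize this quantity over the coefficients.

```python
import math

def cox_partial_log_likelihood(beta, times, events, covariates):
    """Log partial likelihood of the Cox model, assuming no tied event times.

    beta:       list of p coefficients
    times:      observed time for each individual
    events:     True if the event was observed, False if censored
    covariates: list of p covariate values for each individual
    """
    def linear(x):
        return sum(b * v for b, v in zip(beta, x))

    total = 0.0
    for k, (t_k, event) in enumerate(zip(times, events)):
        if not event:
            continue  # censored individuals contribute only through risk sets
        # Risk set: everyone still under study just prior to t_k.
        risk_set = [x for t, x in zip(times, covariates) if t >= t_k]
        total += linear(covariates[k]) - math.log(
            sum(math.exp(linear(x)) for x in risk_set))
    return total

# Three hypothetical individuals with one covariate; the third is censored.
times = [1, 2, 3]
events = [True, True, False]
covariates = [[1.0], [0.0], [1.0]]
for b in (0.0, 0.5, 1.0):
    print(b, round(cox_partial_log_likelihood([b], times, events, covariates), 4))
```

The baseline hazard h0(t) never appears in the calculation, which is why the coefficients can be fitted before it.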
Life-Span Survival Modeling
- Fitted Functions with 2 Competing Risks
Student's Attributes:
Associates with High School Diploma (transfer credits >= 24, FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = 35
AGI = 30k
[Figure: Survival Probability (0 to 1) against Week (1 to 161) for three fitted curves: Graduation, Withdraw and Combined]
Life-Span Survival Modeling
- Fitted Function vs Actual
Student's Attributes:
Associates with High School Diploma (transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = any
AGI = any
[Figure: Survival Probability (0.0 to 1.0) against Week (1 to 161) for two curves: Fitted and Actual]
Life-Span Survival Modeling
- Expected LS vs Actual LS
Student's Attributes:
Associates with High School Diploma (transfer credits >= 24 and FPA from Aug 2009 to Aug 2010)
College = School of Business
Employer Billing = No
Gender = Male
Marital Status = Single
Managed Own Funds = No
Region = Northeast
Age = 35
AGI = 30k
[Figure: Actual LS (weeks, 0 to 90) against Expected LS (weeks) in 5-week bins from 20-25 to 75-80]