Cumulative Link Models with the R package ordinal
Rune Haubo B Christensen
Abstract
This paper introduces the R-package ordinal for the analysis of ordinal data using cumulative link models. The model framework implemented in ordinal includes partial proportional odds, structured thresholds, scale effects and flexible link functions. The package also supports cumulative link models with random effects, which will be covered in a future paper. A speedy and reliable regularized Newton estimation scheme using analytical derivatives provides maximum likelihood estimation of the model class. The paper describes the implementation in the package as well as how to use the functionality in the package for analysis of ordinal data, including topics on model identifiability and customized modelling. The package implements methods for profile likelihood confidence intervals, predictions of various kinds, as well as methods for checking the convergence of the fitted models.
This is a copy of an article submitted for publication in the Journal of Statistical Software (https://www.jstatsoft.org/).
1. Introduction
Ordered categorical data, or simply ordinal data, are common in a multitude of empirical
sciences and in particular in scientific disciplines where humans are used as measurement
instruments. Examples include school grades, ratings of preference in consumer studies, degree
of tumor involvement in MR images and animal fitness in ecology. Cumulative link models
(CLMs) are a powerful model class for such data since observations are treated correctly as
categorical, the ordered nature is exploited and the flexible regression framework allows for
in-depth analyses.
This paper introduces the ordinal package (Christensen 2018) for R (R Core Team 2018) for the analysis of ordinal data with cumulative link models. The paper describes how ordinal supports the fitting of CLMs with various model structures, model assessment and inferential options, including tests of partial proportional odds and scale effects. The implementation and an effective fitting algorithm are also described. The ordinal package also supports cumulative link mixed models (CLMMs): CLMs with normally distributed random effects. The support of this model class will not be given further treatment here but remains a topic for a future paper.
The name cumulative link models is adopted from Agresti (2002), but the model class has been referred to by several other names in the literature, such as ordinal regression models as well as ordered logit models and ordered probit models (Greene and Hensher 2010) for the logit
and probit link functions. The cumulative link model with a logit link is widely known as the
proportional odds model due to McCullagh (1980) and with a complementary log-log link, the
model is sometimes referred to as the proportional hazards model for grouped survival times.
Cumulative link models can be fitted by all the major software packages, and while some software packages support scale effects, partial proportional odds (also referred to as unequal slopes, partial effects, and nominal effects), different link functions and structured thresholds, all model structures are usually not available in any one package or implementation. The following brief software review is based on the publicly available documentation on the software packages' websites retrieved in June 2018.
IBM SPSS (IBM Corp. 2017) implements McCullagh's PLUM procedure (McCullagh 1980), allows for the five standard link functions (cf. Table 3) and scale effects. Estimation is via Fisher scoring and a test for equal slopes is available for the location-only model, while it is not possible to estimate a partial proportional odds model.
Stata (StataCorp 2017) includes the ologit and oprobit procedures for CLMs with logistic and probit links but without support for scale effects, partial effects or structured thresholds. The add-on package oglm allows for all five standard link functions and scale effects. The GLLAMM package (Rabe-Hesketh, Skrondal, and Pickles 2004) also has some support for CLMs in addition to some support for random effects.
SAS (SAS Institute Inc. 2010) implements CLMs with logit links in proc logistic and CLMs with the five standard links in proc genmod.
In R, several packages on CRAN implement CLMs. polr from MASS (Venables and Ripley 2002) implements standard CLMs allowing for the five standard link functions but no further extensions; the VGAM package (Yee 2010) includes CLMs via the vglm function using the cumulative link. vglm allows for several link functions as well as partial effects. The lrm and orm functions from the rms package (Harrell Jr 2018) also implement CLMs with the five standard link functions but without scale effects, partial proportional odds or structured thresholds. A Bayesian alternative is implemented in the brms package (Bürkner 2017), which includes structured thresholds in addition to random effects.
The ordinal package implements CLMs and CLMMs along with functions and methods to support these model classes. The two key functions in ordinal are clm and clmm, which fit CLMs and CLMMs respectively. Additional functions in ordinal cover distributional support for Gumbel and log-gamma distributions as well as gradients (with respect to x, the quantile; not the parameters of the distributions) of normal, logistic and Cauchy probability density functions, which are used in the iterative methods implemented in clm and clmm. An overview of key functions in ordinal is provided in Table 1.
A number of standard methods are implemented for fitted CLMs, i.e., objects of class clm
fitted with ordinal::clm which mostly correspond to methods also available for glm objects.
Most extractor methods will not be explicitly discussed in this paper as they behave unsurprisingly; most other methods will be discussed in the following sections. As CLMMs are not covered by this paper, methods for clmm objects will not be discussed.
The remainder of the paper is organized as follows. The next section establishes notation
by defining CLMs and associated log-likelihood functions, then describes the extended class
of CLMs that is implemented in ordinal including details about scale effects, structured
thresholds, partial proportional odds and flexible link functions. The third section describes
how maximum likelihood (ML) estimation of CLMs is implemented in ordinal. The fourth
section describes how CLMs are fitted and ordinal data are analysed with ordinal including
sections on nominal effects, scale effects, structured thresholds, profile likelihoods, assessment
of model convergence, fitted values and predictions, issues around model identifiability and
finally how ordinal is prepared for customizable fitting of models not otherwise covered by
the API. We end in section 5 with a brief conclusion.
$$-\infty \equiv \theta_0 \leq \theta_1 \leq \ldots \leq \theta_{J-1} \leq \theta_J \equiv \infty.$$
or equivalently
$$\ell(\theta, \beta; y) = \sum_{i} \sum_{j} I(y_i = j) \log \pi_{ij} = \sum_{i} \log \tilde{\pi}_i$$
where π̃i is the j'th entry in πi for the observed category j = yi, and I(·) is the indicator function.
Allowing for observation-level weights (case weights), wi, leads finally to
$$\ell(\theta, \beta; y) = \sum_{i} w_i \log \tilde{\pi}_i. \qquad (3)$$
where z1−α/2 is the (1 − α/2) quantile of the standard normal cumulative distribution function. Taking s(·) to be the Wald statistic, $s(\beta_a) = w(\beta_a) = (\hat{\beta}_a - \beta_a)/\widehat{\mathrm{se}}(\hat{\beta}_a)$, leads to the classical symmetric intervals. Better confidence intervals can be obtained by choosing instead the likelihood root statistic (see e.g., Pawitan 2001; Brazzale, Davison, and Reid 2007):
$$s(\beta_a) = r(\beta_a) = \mathrm{sign}(\hat{\beta}_a - \beta_a)\sqrt{2\left[\ell(\hat{\theta}, \hat{\beta}; y) - \ell_p(\beta_a; y)\right]}$$
³ We have suppressed the conditioning on the covariate vector xi, i.e., γij = γj(xi) and P(Yi ≤ j) = P(Y ≤ j | xi).
where
$$\ell_p(\beta_a; y) = \max_{\theta, \beta_{-a}} \ell(\theta, \beta; y),$$
is the profile likelihood for the scalar parameter βa and β−a is the vector of regression parameters without the a'th one.
While the profile likelihood has to be optimized over all parameters except βa, we define a log-likelihood slice as
$$\ell_{\mathrm{slice}}(\beta_a; y) = \ell(\beta_a; \hat{\theta}, \hat{\beta}_{-a}, y), \qquad (4)$$
which is the log-likelihood function evaluated at βa while keeping the remaining parameters fixed at their ML estimates. A quadratic approximation to the (relative) log-likelihood slice is $-(\hat{\beta}_a - \beta_a)^2 / (2\tau_a^2)$, where the curvature unit τa is the inverse square root of the a'th diagonal element of the Hessian of −ℓ(θ̂, β̂; y).
$$\mathrm{logit}(\gamma_{ij}) = \log \frac{P(Y_i \leq j)}{1 - P(Y_i \leq j)} \qquad (5)$$
The odds ratio (OR) of the event Yi ≤ j at x1 relative to the same event at x2 is then
$$\mathrm{OR} = \frac{\gamma_j(x_1)/[1 - \gamma_j(x_1)]}{\gamma_j(x_2)/[1 - \gamma_j(x_2)]} = \frac{\exp(\theta_j - x_1^\top \beta)}{\exp(\theta_j - x_2^\top \beta)} = \exp[(x_2 - x_1)^\top \beta],$$
which is independent of j. Thus the log cumulative odds ratio is proportional to the distance between x1 and x2, which motivated McCullagh (1980) to denote the cumulative logit model a proportional odds model. If x represents a treatment variable with two levels (e.g., placebo and treatment), the odds ratio of the event Y ≤ j for treatment relative to placebo is exp(−βtreatment). Similarly, the odds ratio of the event Y ≥ j is exp(βtreatment).
The probit link has its own interpretation through a normal linear model for a latent variable
which is considered in section 2.4.
The complementary log-log (clog-log) link is also sometimes used because of its interpretation as a proportional hazards model for grouped survival times:
$$\log[-\log\{1 - \gamma_j(x_i)\}] = \theta_j - x_i^\top \beta.$$
Here 1 − γj(xi) is the probability of survival beyond category j given xi. The proportional hazards model has the property that
$$\log\{1 - \gamma_j(x_1)\} = \exp[(x_2 - x_1)^\top \beta] \log\{1 - \gamma_j(x_2)\},$$
thus the hazards at x1 are proportional to the hazards at x2. If the log-log link is used on the response categories in the reverse order, this is equivalent to using the clog-log link on the response in the original order. This reverses the sign of β as well as the sign and order of {θj} while the likelihood and standard errors remain unchanged.
Details of the most common link functions are described in Table 3.
Table 3: Summary of the five standard link functions. (a) the complementary log-log link; (b) the Gumbel distribution is also known as the extreme value (type I) distribution for extreme minima or maxima; it is also sometimes referred to as the Weibull (or log-Weibull) distribution (http://en.wikipedia.org/wiki/Gumbel_distribution); (c) the Cauchy distribution is a t-distribution with one degree of freedom.
The ordinal package allows for the estimation of an extended class of cumulative link models
in which the basic model (1) is extended in a number of ways including structured thresholds,
partial proportional odds, scale effects and flexible link functions. The following sections will
describe these extensions of the basic CLM.
$$\gamma_{ij} = F_\lambda(\eta_{ij}), \qquad \eta_{ij} = \frac{g_\alpha(\theta_j) - x_i^\top \beta - w_i^\top \tilde{\beta}_j}{\exp(z_i^\top \zeta)} \qquad (7)$$
where

Fλ is the inverse link function. It may be parameterized by the scalar parameter λ, in which case we refer to Fλ−1 as a flexible link function.

gα(θj) parameterises the thresholds {θj} by the vector α such that g restricts {θj} to be, for example, symmetric or equidistant. We denote these structured thresholds.

xi⊤β are the ordinary regression effects.

wi⊤β̃j are regression effects which are allowed to depend on the response category j; they are denoted partial or non-proportional odds (Peterson and Harrell Jr. 1990) when the logit link is applied. To include other link functions in the terminology we denote these effects nominal effects (in text and code) because these effects are not integral to the ordinal nature of the data.
exp(zi⊤ζ) are scale effects since, in a latent variable view, these effects model the scale of the underlying location-scale distribution.
With the exception of the structured thresholds, these extensions of the basic CLM have
been considered individually in a number of sources but to the author’s best knowledge not
previously in a unified framework. For example partial proportional odds have been considered
by Peterson and Harrell Jr. (1990) and scale effect have been considered by McCullagh (1980)
and Cox (1995). Agresti (2002) is a good introduction to cumulative link models in the context
of categorical data analysis and includes discussions of scale effects.
$$S_i = \alpha^* + x_i^\top \beta^* + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^{*2}) \qquad (8)$$
If Si falls between two thresholds, θ∗j−1 < Si ≤ θ∗j, where
$$-\infty \equiv \theta_0^* < \theta_1^* < \ldots < \theta_{J-1}^* < \theta_J^* \equiv \infty, \qquad (9)$$
then Yi = j is observed, and the cumulative probabilities become
$$\gamma_{ij} = P(Y_i \leq j) = P(S_i \leq \theta_j^*) = P\left(Z \leq \frac{\theta_j^* - \alpha^* - x_i^\top \beta^*}{\sigma^*}\right) = \Phi(\theta_j - x_i^\top \beta)$$
where Z follows a standard normal distribution, Φ denotes the standard normal cumulative distribution function, parameters with an "∗" exist on the latent scale, θj = (θ∗j − α∗)/σ∗ and β = β∗/σ∗. Note that α∗, β∗ and σ∗ would have been identifiable if the latent variable S were directly observed, but they are not identifiable with ordinal observations.
If we allow a log-linear model for the scale such that
$$\log \sigma_i^* = \mu + z_i^\top \zeta,$$
where zi is the i'th row of a design matrix Z without a leading column for an intercept and σ∗ = exp(µ), then
$$\gamma_{ij} = P\left(Z \leq \frac{\theta_j^* - \alpha^* - x_i^\top \beta^*}{\sigma_i^*}\right) = \Phi\left(\frac{\theta_j - x_i^\top \beta}{\sigma_i}\right)$$
where σi = σ∗i/σ∗ = exp(zi⊤ζ) is the relative scale.
$$g(\alpha) = J^\top \alpha = \theta$$
where the Jacobian J defines the mapping from the parameters α to the thresholds θ. The traditional ordered but otherwise unrestricted thresholds are denoted flexible thresholds and are obtained by taking J to be an identity matrix.
Assuming J = 6 ordered categories, the Jacobians for equidistant and symmetric thresholds (denoted equidistant and symmetric in the clm argument threshold) are
$$J_{\mathrm{equidistant}} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 \end{bmatrix}, \qquad J_{\mathrm{symmetric}} = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
The nature of J for a particular model can always be inspected by printing the tJac com-
ponent of the clm fit.
If, for example, a categorical variable has 3 levels, then β̃j is a 2-vector. In the default treatment contrast coding ("contr.treatment") θj is the j'th threshold for the first (base) level of the factor, β̃1j is the difference between the thresholds for the first and second levels, and β̃2j is the difference between the thresholds for the first and third levels.
In general we define Θ as a matrix with J − 1 columns and with 1 row for each combination of the levels of factors in W. This matrix can be extracted from the model fit as the Theta component; cf., section 4.2 for an example.
Note that variables in X cannot also be part of W if the model is to remain identifiable.
ordinal detects this and automatically removes the offending variables from X.
The Aranda-Ordaz link (Aranda-Ordaz 1983) reads
$$F_\lambda^{-1}(\gamma_{ij}) = \log\left\{\frac{(1 - \gamma_{ij})^{-\lambda} - 1}{\lambda}\right\},$$
which depends on the auxiliary parameter λ ∈ ]0, ∞[. When λ = 1, the logistic link function arises, and when λ → 0, the complementary log-log link is obtained. The density implied by the inverse link function is left-skewed if 0 < λ < 1, symmetric if λ = 1 and right-skewed if λ > 1, so the link function can be used to assess the evidence about possible skewness of the latent distribution.
The log-gamma link function proposed by Genter and Farewell (1985) is based on the log-gamma density by Farewell and Prentice (1977). The cumulative distribution function, and hence inverse link function, reads
$$F_\lambda(\eta) = \begin{cases} 1 - G(q; \lambda^{-2}) & \lambda < 0 \\ \Phi(\eta) & \lambda = 0 \\ G(q; \lambda^{-2}) & \lambda > 0 \end{cases}$$
where q = λ−2 exp(λη), G(·; α) denotes the cumulative distribution function of the Gamma distribution with shape parameter α and unit rate parameter, and Φ denotes the standard normal cumulative distribution function.
The corresponding density function reads
$$f_\lambda(\eta) = \begin{cases} |\lambda|\, k^k\, \Gamma(k)^{-1} \exp\{k(\lambda\eta - \exp(\lambda\eta))\} & \lambda \neq 0 \\ \phi(\eta) & \lambda = 0 \end{cases}$$
where k = λ−2, Γ(·) is the gamma function and φ is the standard normal density function.
Note that choice and parameterization of the predictor, ηij , e.g., the use of scale effects, can
affect the evidence about the shape of the latent distribution. There are usually several link
functions which provide essentially the same fit to the data and choosing among the good
candidates is probably best done by appealing to arguments such as ease of interpretation
rather than arguments related to fit.
where the parameters are updated by regularized Newton-Raphson steps,
$$\psi^{(i+1)} = \psi^{(i)} + c_1 h^{(i)}, \qquad h^{(i)} = -\tilde{H}(\psi^{(i)}; y)^{-1} g(\psi^{(i)}; y),$$
with
$$\tilde{H}(\psi^{(i)}; y) = H(\psi^{(i)}; y) + c_2 \left[c_3 - \min(e^{(i)})\right] I,$$
where H(ψ(i); y) and g(ψ(i); y) are the Hessian and gradient of the negative log-likelihood function with respect to the parameters evaluated at the current estimates; e(i) is the vector of eigenvalues of H(ψ(i); y); h(i) is the i'th step; c1 is a scalar parameter which controls the step halving; and c2, c3 are scalar parameters which control the regularization of the Hessian.
Regularization is only enforced when the Hessian is not positive definite, so c2 = 1 when min(e(i)) < τ and zero otherwise, where τ is an appropriate tolerance. The choice of c3 is to some extent arbitrary (though positive) and the algorithm in ordinal sets c3 = 1.
Step-halving is enforced when the full step h(i) causes a decrease in the likelihood function, in which case c1 is consecutively halved, c1 = 1/2, 1/4, 1/8, . . . until the step c1 h(i) is small enough to cause an increase in the likelihood or until the maximum allowed number of consecutive step-halvings has been reached.
The algorithm in ordinal also deals with a couple of numerical issues that may occur. For
example, the likelihood function may be sufficiently flat that the change in log-likelihood is
smaller than what can be represented in double precision, and so, while the new parameters
may be closer to the true ML estimates and be associated with a smaller gradient, it is not
possible to measure progress by the change in log-likelihood.
The NR algorithm in ordinal has two convergence criteria: (1) an absolute criterion requesting
that max |g(ψ (i) ; y)| < τ1 and (2) a relative criterion requesting that max |h(i) | < τ2 where
the default thresholds are τ1 = τ2 = 10−6 .
Here the first criterion attempts to establish closeness of ψ(i) to the true ML estimates in absolute terms, while the second criterion is an estimate of the relative closeness to the true ML estimates. Both convergence criteria are needed if both small (e.g., ≈ 0.0001) and large (e.g., ≈ 1000) parameter estimates are to be determined accurately with an appropriate number of correct decimals as well as significant digits.
The NR algorithm in ordinal attempts to satisfy the absolute criterion first and will then only
attempt to satisfy the relative criterion if it can take the full un-regularized NR step and then
only for a maximum of 5 steps.
At convergence, the error in the parameter estimates can be assessed from the full NR step since, approximately,
$$\hat{\alpha}^{(i)} - \alpha^* \approx h^{(i)},$$
where α∗ is the exact (but unknown) value of the ML estimate, α̂(i) is the ML estimator of α at the i'th iteration and h(i) is the full unregularized NR step at the i'th iteration. Since the gradient and Hessian of the negative log-likelihood function with respect to the parameters are already evaluated and part of the model fit at convergence, it is essentially computationally cost-free to approximate the error in the parameter estimates. Based on the error estimate, the number of correctly determined decimals and significant digits is determined.
Starting values
For the basic CLMs (1) the threshold parameters are initialized to an increasing sequence such that the cumulative density of the logistic distribution between consecutive thresholds (and below the lowest or above the highest threshold) is the same. The regression parameters β, the scale parameters ζ as well as the nominal effects β̃ are initialized to 0.
If the model specifies a cauchit link or includes scale parameters, estimation starts at the parameter estimates of a model using the probit link and/or without the scale part of the model.
Estimation problems
With many nominal effects it may be difficult to find a model in which the threshold parameters are strictly increasing for all combinations of the parameters. Upon convergence of the NR algorithm the model evaluates the Θ-matrix and checks that each row of threshold estimates is increasing.
When a continuous variable is included among the nominal effects it is often helpful if the
continuous variable is centered at an appropriate value (at least within the observed range of
the data). This is because {θj } represent the thresholds when the continuous variable is zero
and {θj } are enforced to be a non-decreasing sequence. Since the nominal effects represent
different slopes for the continuous variable the thresholds will necessarily be ordered differently
at some other value of the continuous variable.
Convergence codes
Irrespective of the fitting algorithm, ordinal reports the following convergence codes for CLMs
in which negative values indicate convergence failure:
-3 Not all thresholds are increasing. This is only possible with nominal effects and constitutes
an invalid fit.
-2 The Hessian has at least one negative eigenvalue. This means that the point at which the
algorithm terminated does not represent an optimum.
-1 Absolute convergence criterion (maximum absolute gradient) was not satisfied. This means
that the algorithm couldn’t get close enough to a stationary point of the log-likelihood
function.
0 Successful convergence.
1 The Hessian is singular (i.e., at least one eigenvalue is zero). This means that some param-
eters are not uniquely determined.
Note that with convergence code 1 the optimum of the log-likelihood function has been found, although it is not a single point but a line (or, in general, a (hyper)plane), so while some parameters are not uniquely determined the value of the likelihood is valid and can be compared to that of other models.
In addition to these convergence codes, the NR algorithm in ordinal reports the following
messages:
1 Absolute convergence criterion was met, but relative criterion was not met
Note that convergence is assessed irrespective of messages from the fitting algorithm and irrespective of whether the NR algorithm described above or a general-purpose quasi-Newton optimizer is used.
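The main fitting function has, approximately, the following usage (as given by args(clm); the exact defaults may vary between versions of ordinal):

R> args(clm)
function (formula, scale, nominal, data, weights, start, subset,
    doFit = TRUE, na.action, contrasts, model = TRUE, control = list(),
    link = c("logit", "probit", "cloglog", "loglog", "cauchit",
    "Aranda-Ordaz", "log-gamma"), threshold = c("flexible",
    "symmetric", "symmetric2", "equidistant"), ...)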
Several arguments are standard and well-known from lm and glm and will not be described in
detail; formula, data, weights, subset and na.action are all parts of the standard model
specification in R.
scale and nominal are interpreted as R-formulae with no left hand sides and specifies the
scale and nominal effects of the model respectively, see sections 4.3 and 4.2 for details; start
is an optional vector of starting values; doFit can be set to FALSE to prompt clm to return
a model environment, for details see section 4.9; model controls whether the model.frame
should be included in the returned model fit; link specifies the link function and threshold
specifies an optional threshold structure, for details see section 4.4.
Note the absence of a separate offset argument. Since clm allows for different offsets in formula and scale, offsets have to be specified within each formula, e.g., scale = ~ x1 + offset(x2).
Control parameters can either be specified as a named list, among the optional ... arguments,
or directly as a call to clm.control — in the first two cases the arguments are passed on to
clm.control. clm.control takes the following arguments:
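Approximately (the defaults shown are indicative and may vary between versions of ordinal):

R> args(clm.control)
function (method = c("Newton", "model.frame", "design", "ucminf",
    "nlminb", "optim"), sign.location = c("negative", "positive"),
    sign.nominal = c("positive", "negative"), ..., trace = 0L,
    maxIter = 100L, gradTol = 1e-06, maxLineIter = 15L, relTol = 1e-06,
    tol = sqrt(.Machine$double.eps), convergence = c("warn", "silent",
    "stop", "message"))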
The method argument specifies the optimization and/or return method. The default estimation method (Newton) is the regularized Newton-Raphson estimation scheme described in section 3.1; the options model.frame and design prompt clm to return, respectively, the model.frame and a list of objects that represent the internal representation instead of fitting the model; the options ucminf, nlminb and optim represent different general-purpose optimizers which may be used to fit the model (the former from package ucminf (Nielsen and Mortensen 2016), the latter two from package stats). The sign.location and sign.nominal options allow the user to flip the signs on the location and nominal model terms. The convergence argument instructs clm how to alert the user of potential convergence problems; ... are optional arguments passed on to the general-purpose optimizers; trace applies across all optimizers and positive values lead to printing of progress during iterations; the remaining arguments (maxIter, gradTol, maxLineIter, relTol, tol) control the behavior of the regularized NR algorithm described in section 3.1.
                          Least — Most bitter
Temperature  Contact     1    2    3    4    5
cold         no          4    9    5    0    0
cold         yes         1    7    8    2    0
warm         no          0    5    8    3    2
warm         yes         0    1    5    7    5

Table 4: The number of ratings from nine judges in bitterness categories 1 — 5. Wine data from Randall (1989) aggregated over bottles and judges.
$$\mathrm{logit}(P(Y_i \leq j)) = \theta_j - \beta_1(\mathit{temp}_i) - \beta_2(\mathit{contact}_i), \qquad i = 1, \ldots, n, \quad j = 1, \ldots, J - 1, \qquad (10)$$
where β1(tempi) attains the values β1(cold) and β1(warm), and β2(contacti) attains the values β2(no) and β2(yes). The effect of temperature in this model is illustrated in Figure 1. This is a model for the cumulative probability of the ith rating falling in the jth category or below, where i indexes all observations (n = 72), j = 1, . . . , J indexes the response categories (J = 5) and θj is the intercept or threshold for the jth cumulative logit: logit(P(Yi ≤ j)).
Fitting the model with clm we obtain:
R> library("ordinal")
R> fm1 <- clm(rating ~ temp + contact, data=wine)
R> summary(fm1)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 2.5031 0.5287 4.735 2.19e-06 ***
contactyes 1.5278 0.4766 3.205 0.00135 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.3444 0.5171 -2.600
2|3 1.2508 0.4379 2.857
3|4 3.4669 0.5978 5.800
4|5 5.0064 0.7309 6.850
The summary method prints basic information about the fitted model. The primary result is the coefficient table with parameter estimates, standard errors and Wald based p values for tests of the parameters being zero. If one of the flexible link functions (link = "log-gamma" or link = "Aranda-Ordaz") is used, a coefficient table for the link parameter, λ, is also included. The maximum likelihood estimates of the model coefficients are
$$\hat{\beta}_1(\mathit{warm} - \mathit{cold}) = 2.50, \quad \hat{\beta}_2(\mathit{yes} - \mathit{no}) = 1.53, \quad \{\hat{\theta}_j\} = \{-1.34,\ 1.25,\ 3.47,\ 5.01\}.$$
The coefficients for temp and contact are positive, indicating that higher temperature and contact increase the bitterness of wine, i.e., rating in higher categories is more likely. Because the treatment contrast coding, which is the default in R, was used, {θ̂j} refers to the thresholds at the setting with tempi = cold and contacti = no. Three natural and complementary interpretations of this model are
1. The thresholds {θ̂j } at contacti = yes conditions have been shifted a constant amount
1.53 relative to the thresholds {θ̂j } at contacti = no conditions.
2. The location of the latent distribution has been shifted +1.53σ ∗ (scale units) at contacti =
yes relative to contacti = no.
3. The odds ratio of bitterness being rated in category j or above (OR(Y ≥ j)) is
exp(β̂2 (yes − no)) = 4.61.
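For instance, the odds ratio in interpretation 3 can be computed directly from the fitted model:

R> exp(coef(fm1)["contactyes"])  # approximately 4.61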
Note that there are no p values displayed for the threshold coefficients because it usually does
not make sense to test if they equal zero.
The number of Newton-Raphson iterations is given by niter with the number of step-halvings in parentheses. max.grad is the maximum absolute gradient of the log-likelihood function with respect to the parameters. The condition number of the Hessian (cond.H) is well below 10^4 and so does not indicate a problem with the model.
The anova method produces analysis of deviance (ANODE) tables, also based on Wald χ²-tests, and provides tables with type I, II and III hypothesis tests using the SAS definitions. A type I table, the R default for linear models fitted with lm, sequentially tests terms from first to last; type II tests attempt to respect the principle of marginality and test each term after all others while ignoring higher-order interactions; and type III tables are based on orthogonalized contrasts, and tests of main effects or lower-order terms can often be interpreted as averaged over higher-order terms. Note that in this implementation any type of contrasts (e.g., contr.treatment or contr.SAS as well as contr.sum) can be used to produce type III
tests. For further details on the interpretation and definition of type I, II and III tests, please see Kuznetsova, Brockhoff, and Christensen (2017) and SAS Institute Inc. (2008).

Figure 1: Illustration of the effect of temperature in the standard cumulative link model in Equation 10 for the wine data in Table 4 through a latent variable interpretation.
Here we illustrate with a type III ANODE table, which in this case is equivalent to type I
and II tables since the variables are balanced:
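A call of the following form produces such a table (assuming the type argument of anova described above):

R> anova(fm1, type = "III")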
Df Chisq Pr(>Chisq)
temp 1 22.417 2.195e-06 ***
contact 1 10.275 0.001348 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Likelihood ratio tests, though asymptotically equivalent to the Wald tests, usually better reflect the evidence in the data. These tests can be obtained by comparing nested models with the anova method; for example, the likelihood ratio test of contact is:
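With fm2 denoting the reduced model (an illustrative name):

R> fm2 <- clm(rating ~ temp, data = wine)
R> anova(fm2, fm1)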
which in this case produces a slightly lower p value. Equivalently we can use drop1 to obtain
likelihood ratio tests of the explanatory variables while controlling for the remaining variables:
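A call of this form produces the table below:

R> drop1(fm1, test = "Chi")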
Model:
rating ~ temp + contact
Df AIC LRT Pr(>Chi)
<none> 184.98
temp 1 209.91 26.928 2.112e-07 ***
contact 1 194.03 11.043 0.0008902 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Likelihood ratio tests of the explanatory variables while ignoring the remaining variables are
provided by the add1 method (outputs not shown):
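For example (fm0, an intercept-only model, is an illustrative name):

R> fm0 <- clm(rating ~ 1, data = wine)
R> add1(fm0, scope = ~ temp + contact, test = "Chi")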
Confidence intervals of the parameter estimates are provided by the confint method, which by default computes the so-called profile likelihood confidence intervals:
R> confint(fm1)
2.5 % 97.5 %
tempwarm 1.5097627 3.595225
contactyes 0.6157925 2.492404
The cumulative link model in Equation 10 assumes that the thresholds, {θj}, are constant for all values of the remaining explanatory variables, here temp and contact. This is generally referred to as the proportional odds assumption or equal slopes assumption. We can relax this assumption in two general ways: with nominal effects and scale effects, examples of which we will now present in turn.
In the following model we relax the assumption that the thresholds are constant across regression variables and allow the threshold parameters to depend on contact, which leads to so-called partial proportional odds for contact:
$$\mathrm{logit}(P(Y_i \leq j)) = \theta_j + \tilde{\beta}_j(\mathit{contact}_i) - \beta(\mathit{temp}_i), \qquad i = 1, \ldots, n, \quad j = 1, \ldots, J - 1. \qquad (12)$$
One way to view this model is to think of two sets of thresholds being applied at conditions
with and without contact as illustrated in Figure 2. The model is specified as follows with
clm:
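(The call below is reconstructed from the output that follows; contact is moved from formula to nominal.)

R> fm.nom <- clm(rating ~ temp, nominal = ~ contact, data = wine)
R> summary(fm.nom)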
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 2.519 0.535 4.708 2.5e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2.(Intercept) -1.3230 0.5623 -2.353
2|3.(Intercept) 1.2464 0.4748 2.625
3|4.(Intercept) 3.5500 0.6560 5.411
4|5.(Intercept) 4.6602 0.8604 5.416
1|2.contactyes -1.6151 1.1618 -1.390
2|3.contactyes -1.5116 0.5906 -2.559
3|4.contactyes -1.6748 0.6488 -2.581
4|5.contactyes -1.0506 0.8965 -1.172
As can be seen from the output of summary, there is no regression coefficient estimated for contact, but there are additional threshold coefficients estimated instead. The naming and meaning of the threshold coefficients depend on the contrast coding applied to contact. Here the R default treatment contrasts ("contr.treatment") are used.
Figure 2: Illustration of nominal effects leading to different sets of thresholds being applied for each level of contact in a latent variable interpretation, cf., Equation 12.

Here the coefficients translate to the following parameter functions:
Again {θ̂j} refer to the thresholds at the tempi = cold and contacti = no settings, while the thresholds at tempi = cold and contacti = yes are $\{\hat{\theta}_j + \hat{\tilde{\beta}}_j(\mathit{yes} - \mathit{no})\}$. The odds ratio of bitterness being rated in category j or above (OR(Y ≥ j)) now depends on j: $\{\exp(-\hat{\tilde{\beta}}_j(\mathit{yes} - \mathit{no}))\} = \{5.03, 4.53, 5.34, 2.86\}$.
The resulting thresholds for each level of contact, i.e., the estimated Θ-matrix can be ex-
tracted with:
R> fm.nom$Theta
As part of the convergence checks, clm checks the validity of Θ, i.e., that each row of the
threshold matrix is non-decreasing.
We can perform a likelihood ratio test of the proportional odds assumption for contact by
comparing the likelihoods of models (10) and (12) as follows:
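That is:

R> anova(fm1, fm.nom)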
There is only a small difference in the log-likelihoods of the two models and the test is insignificant. Thus there is no evidence that the proportional odds assumption is violated for contact.
It is not possible to estimate both β2 (contacti ) and β̃j (contacti ) in the same model. Con-
sequently variables that appear in nominal cannot enter in formula as well. For instance,
not all parameters are identifiable in the following model:
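(Reconstructed call; contact enters both formula and nominal:)

R> fm.nom2 <- clm(rating ~ temp + contact, nominal = ~ contact, data = wine)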
We are made aware of this when summarizing or printing the model in which the coefficient
for contactyes is NA:
R> fm.nom2
Threshold coefficients:
1|2 2|3 3|4 4|5
(Intercept) -1.323 1.246 3.550 4.660
contactyes -1.615 -1.512 -1.675 -1.051
To test the proportional odds assumption for all variables, we can use
R> nominal_test(fm1)
This function moves all terms in formula and copies all terms in scale to nominal one by
one and produces an add1-like table with likelihood ratio tests of each term.
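The analysis now turns to scale effects (cf., Figure 3 and Equation 14). The output below stems from a model extending Equation 10 with a scale effect of temp; the fitting code corresponds to a call of this form:

R> fm.sca <- clm(rating ~ temp + contact, scale = ~ temp, data = wine)
R> summary(fm.sca)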
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 2.6294 0.6860 3.833 0.000127 ***
contactyes 1.5878 0.5301 2.995 0.002743 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
log-scale coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 0.09536 0.29414 0.324 0.746
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.3520 0.5223 -2.588
2|3 1.2730 0.4533 2.808
3|4 3.6170 0.7774 4.653
4|5 5.2982 1.2027 4.405
In a latent variable interpretation the location of the latent distribution is shifted 2.63σ∗ (scale units) from cold to warm conditions and 1.59σ∗ from absence to presence of contact. The scale of the latent distribution is σ∗ at cold conditions but σ∗ exp(ζ(warm − cold)) = σ∗ exp(0.095) = 1.10σ∗, i.e., 10% higher, at warm conditions. However, observe that the p value for the scale effect in the summary output shows that the ratio of scales is not significantly different from 1 (or equivalently that the difference on the log scale is not different from 0).

Figure 3: Illustration of scale effects leading to different scales of the latent variable, cf., Equation 14.
Scale effects offer an alternative to nominal effects (partial proportional odds) when non-proportional odds structures are encountered in the data. Using scale effects is often a better approach because the model is well-defined for all values of the explanatory variables irrespective of translocation and scaling of covariates. Scale effects also use fewer parameters, which often leads to more sensitive tests than nominal effects. Potential scale effects of variables already included in formula can be discovered using scale_test. This function adds each model term in formula to scale in turn and reports the likelihood ratio statistic in an add1 fashion:
R> scale_test(fm1)
confint and anova methods apply with no change to models with scale and nominal parts,
but drop1, add1 and step methods will only drop or add terms to the (location) formula.
We can also impose additional restrictions on the thresholds. In the following model we require that the thresholds, {θj}, are equidistant or equally spaced. This allows us to assess an assumption that judges are using the response scale in such a way that there is the same distance between adjacent response categories, i.e., that θj − θj−1 = constant for j = 2, ..., J − 1. The effect of equidistant thresholds is illustrated in Figure 4, and the model can be fitted with:
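(Reconstructed call matching the output below:)

R> fm.equi <- clm(rating ~ temp + contact, data = wine, threshold = "equidistant")
R> summary(fm.equi)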
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 2.4632 0.5164 4.77 1.84e-06 ***
contactyes 1.5080 0.4712 3.20 0.00137 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
threshold.1 -1.0010 0.3978 -2.517
spacing 2.1229 0.2455 8.646
The parameters determining the thresholds are now the first threshold (threshold.1) and
the spacing among consecutive thresholds (spacing). The mapping to this parameterization
is stored in the transpose of the Jacobian matrix (tJac) component of the model fit. This
makes it possible to extract the thresholds imposed by the equidistance structure with
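For example (a sketch using the stored components):

R> drop(fm.equi$tJac %*% coef(fm.equi)[c("threshold.1", "spacing")])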
These thresholds are in fact already stored in the Theta component of the model fit. The
following shows that the average distance between consecutive thresholds in fm1 which did
not restrict the thresholds is very close to the spacing parameter from fm.equi:
R> mean(diff(coef(fm1)[1:4]))
[1] 2.116929
One advantage of imposing additional restrictions on the thresholds is the use of fewer pa-
rameters. Whether the restrictions are warranted by the data can be assessed in a likelihood
ratio test:
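That is:

R> anova(fm1, fm.equi)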
Figure 4: Illustration of flexible (left) and equidistant (right) thresholds being applied in a cumulative link model in a latent variable interpretation.
In this case the test is non-significant, so there is no considerable loss of fit at the gain of saving two parameters; hence we may retain the model with equally spaced thresholds.
Note that the shape of the latent distribution (determined by the choice of link function) also affects the distances between the thresholds. If thresholds are equidistant under one latent distribution, e.g., the logistic (i.e., with the logit link), they will in general not be equidistant under a differently shaped latent distribution, such as a skew latent distribution (e.g., with the log-log or clog-log link).
Figure 5: Relative profile likelihoods for the regression parameters in fm1 for the wine data. Horizontal lines indicate 95% and 99% confidence bounds.
As an example, the profile likelihood of model coefficients for temp and contact in fm1 can
be obtained with
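For example (the value of alpha is illustrative):

R> pr1 <- profile(fm1, alpha = 1e-4)
R> plot(pr1)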
The resulting plots are provided in Figure 5. The alpha argument controls how far from the maximum likelihood estimate the likelihood function should be profiled: the profile strays no further from the MLE than what corresponds to a (1 − alpha)-level profile likelihood confidence interval.
From the relative profile likelihood in Figure 5 for tempwarm we see that parameter values between 1 and 4 are reasonably well supported by the data, and values outside this range have little likelihood. Values between 2 and 3 are very well supported by the data and have high likelihood.
Profiling is implemented for regression (β) and scale (ζ) parameters but not available for
threshold, nominal and flexible link parameters.
Likelihood slices
The maximum likelihood estimates of the parameters in cumulative link models do not have closed-form expressions, so iterative methods have to be applied to fit the models. Further, CLMs are non-linear models and in general the likelihood function is not guaranteed to be well-behaved or even unimodal. In addition, the special role of the threshold parameters and the restriction that they be ordered can affect the appearance of the likelihood function.
To confirm that an unequivocal optimum has been reached and that the likelihood function is
reasonably well-behaved around the reported optimum we can inspect the likelihood function
in a neighborhood around the reported optimum. For these purposes we can display slices of the likelihood function.

Figure 6: Slices of the (negative) log-likelihood function (solid) for parameters in fm1 for the wine data. Dashed lines indicate quadratic approximations to the log-likelihood function and vertical bars indicate maximum likelihood estimates.
The following code produces the slices shown in Figure 6, which displays the shape of the log-likelihood function in a fairly wide neighborhood around the reported MLE (here we use λ = 5 curvature units) as well as its quadratic approximation.
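(Reconstructed code; the panel layout is illustrative.)

R> slice.fm1 <- slice(fm1, lambda = 5)
R> par(mfrow = c(2, 3))
R> plot(slice.fm1)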
Figure 6 shows that the log-likelihood function is fairly well-behaved and rather close to quadratic for most parameters.
Looking at the log-likelihood function much closer to the reported optimum (using λ = 10−5) we can probe how accurately the parameter estimates are determined. The likelihood slices in Figure 7, which are produced with the following code, show that the parameters are determined accurately with at least 5 correct decimals. Slices are shown for two parameters; the slices for the remaining 4 parameters are very similar.
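(Reconstructed code; parm = 4:5 selects the two parameters shown in Figure 7.)

R> slice2.fm1 <- slice(fm1, parm = 4:5, lambda = 1e-5)
R> par(mfrow = c(1, 2))
R> plot(slice2.fm1)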
Figure 7: Slices of the (negative) log-likelihood function (solid) for parameters in fm1 for the wine data very close to the MLEs. Dashed lines (indistinguishable from the solid lines) indicate quadratic approximations to the log-likelihood function and vertical bars indicate the maximum likelihood estimates.
Parameter accuracy
As discussed in section 3.1 the method independent error estimate provides an assessment of the accuracy with which the ML estimates of the parameters have been determined by the fitting algorithm. This error estimate is implemented in the convergence method which we now illustrate on a model fit:
R> convergence(fm1)
The most important information is the number of correct decimals (Cor.Dec) and the number
of significant digits (Sig.Dig) with which the parameters are determined. In this case all
parameters are very accurately determined, so there is no reason to lower the convergence
tolerance. The logLik.error shows that the error in the reported value of the log-likelihood
is below 10−10 , which is by far small enough that likelihood ratio tests based on this model
are accurate.
Note that the assessment of the number of correctly determined decimals and significant digits is only reliable sufficiently close to the optimum, so in practice we caution against this assessment if the algorithm did not converge successfully.
The fitted values of a CLM are
$$\hat{\tilde{\pi}}_i = \tilde{\pi}_i(\hat{\psi}),$$
that is, the value of π̃i, cf., Equation 3, evaluated at the ML estimates ψ̂. These are the values returned by the fitted and fitted.values extractor methods and stored in the fitted.values component of the model fit.
The values of πij (cf., Equation 2) evaluated at the ML estimates of the parameters (i.e., π̂ij )
can also be thought of as fitted values for the multinomially distributed variable Yi∗ . These
values can be obtained from the model fit by use of the predict method:
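(Reconstructed call; the first six rows of the resulting probability matrix are shown below.)

R> head(predict(fm1, newdata = subset(wine, select = -rating))$fit)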
1 2 3 4 5
1 0.20679013 0.5706497 0.1922909 0.02361882 0.00665041
2 0.20679013 0.5706497 0.1922909 0.02361882 0.00665041
3 0.05354601 0.3776461 0.4430599 0.09582084 0.02992711
4 0.05354601 0.3776461 0.4430599 0.09582084 0.02992711
5 0.02088771 0.2014157 0.5015755 0.20049402 0.07562701
6 0.02088771 0.2014157 0.5015755 0.20049402 0.07562701
Note that the original data set should be supplied in the newdata argument without the
response variable (here rating). If the response variable is present in newdata predictions
are produced for only those rating categories which were observed and we get back the fitted
values:
Class predictions are also available and defined here as the response class with the highest
probability, that is, for the i’th observation the class prediction is the mode of πi . To obtain
class predictions use type = "class" as illustrated in the following small table:
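For example:

R> predict(fm1, type = "class")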
Other definitions of class predictions can be applied, e.g., nearest mean predictions:
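One sketch of such a rule rounds the expected (mean) rating to the nearest response category:

R> p1 <- predict(fm1, newdata = subset(wine, select = -rating))$fit
R> head(round(drop(p1 %*% 1:5)))  # expected rating rounded to nearest category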
1 2 3 4 5 6
2 2 3 3 3 3
Confidence intervals for the predictions are also available through the interval argument to predict, where the default 95% confidence level can be changed with the level argument.
Here the standard errors of fitted values or predictions, π̃̂ = π̃(ψ̂), are obtained by application of the delta method:
$$\mathrm{Var}(\hat{\tilde{\pi}}) = C\, \mathrm{Var}(\hat{\psi})\, C^\top, \qquad C = \left.\frac{\partial \tilde{\pi}(\psi)}{\partial \psi}\right|_{\psi = \hat{\psi}}$$
Since standard errors and symmetric confidence intervals are ill-behaved for probabilities close to 0 or 1, it is better to form confidence intervals on a scale where the parameter space is unbounded rather than directly on the probability scale. predict.clm takes this approach and computes the standard error of κ̂i = logit(π̃̂i) by yet another application of the delta method:
$$\mathrm{se}(\hat{\kappa}_i) = \frac{\partial g(\hat{\tilde{\pi}}_i)}{\partial \hat{\tilde{\pi}}_i}\, \mathrm{se}(\hat{\tilde{\pi}}_i) = \frac{\mathrm{se}(\hat{\tilde{\pi}}_i)}{\hat{\tilde{\pi}}_i (1 - \hat{\tilde{\pi}}_i)}, \qquad g(\hat{\tilde{\pi}}_i) = \log \frac{\hat{\tilde{\pi}}_i}{1 - \hat{\tilde{\pi}}_i}.$$
Complete separation
In binary logistic regression the issue of complete separation is well known. This may happen,
for example if only “success” or only “failure” is observed for a level of a treatment factor.
In CLMs the issue may appear even when outcomes are observed in more than one response
category. This can be illustrated using the wine dataset if we combine the three central
categories:
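(Reconstructed code; duplicated labels merge factor levels in recent versions of R.)

R> wine <- within(wine, {
+    rating_comb3 <- factor(rating, labels = c("1", "2-4", "2-4", "2-4", "5"))
+  })
R> with(wine, table(temp, rating_comb3))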
rating_comb3 1 2-4 5
temp
cold 5 31 0
warm 0 29 7
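The model fit summarized below corresponds to:

R> fm.comb3 <- clm(rating_comb3 ~ temp, data = wine)
R> summary(fm.comb3)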
Coefficients:
Estimate Std. Error z value Pr(>|z|)
tempwarm 21.89 NA NA NA
Threshold coefficients:
Estimate Std. Error z value
1|2-4 -1.825 NA NA
2-4|5 23.310 NA NA
Here the true ML estimates of the coefficients for temp and the second threshold are at
infinity but the algorithm in clm terminates when the likelihood function is sufficiently flat.
This means that the reported values of the coefficients for temp and the second threshold are
arbitrary and will change if the convergence criteria are changed or a different optimization
method is used. The standard errors of the coefficients are not available because the Hessian
is effectively singular and so cannot be inverted to produce the variance-covariance matrix of
the parameters. The ill-determined nature of the Hessian is seen from the very large condition
number of the Hessian, cond.H.
Note, however, that while the model parameters cannot be uniquely determined, the likelihood
of the model is well defined and as such it can be compared to the likelihood of other models.
For example, we could compare it to a model that excludes temp
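(fm.comb3_b is an illustrative name:)

R> fm.comb3_b <- clm(rating_comb3 ~ 1, data = wine)
R> anova(fm.comb3, fm.comb3_b)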
The difference in log-likelihood is substantial; however, the criteria for the validity of the likelihood ratio test are not fulfilled, so the p value should not be taken at face value.
The complete-separation issue may also appear in less obvious situations. If, for example we
consider the following model allowing for nominal effects of temp the issue shows up:
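(Reconstructed call; fm.nom3 is an illustrative name:)

R> fm.nom3 <- clm(rating ~ contact, nominal = ~ temp, data = wine)
R> summary(fm.nom3)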
Coefficients:
Estimate Std. Error z value Pr(>|z|)
contactyes 1.465 NA NA NA
Threshold coefficients:
Estimate Std. Error z value
1|2.(Intercept) -1.266 NA NA
2|3.(Intercept) 1.104 NA NA
3|4.(Intercept) 3.766 NA NA
4|5.(Intercept) 24.896 NA NA
1|2.tempwarm -21.095 NA NA
2|3.tempwarm -2.153 NA NA
3|4.tempwarm -2.873 NA NA
4|5.tempwarm -22.550 NA NA
Analytical detection of which coefficients suffer from unidentifiability due to complete sepa-
ration is a topic for future research and therefore unavailable in current versions of ordinal.
Aliased coefficients
Aliased coefficients can occur in all kinds of models that build on a design matrix, including linear models as well as generalized linear models. lm and glm determine the rank deficiency of the design matrix using the rank-revealing implementation of the QR decomposition in LINPACK and display the aliased coefficients as NAs. Though the QR decomposition is not used during iterations in clm, it is used initially to determine aliased coefficients. An example is provided using the soup data available in the ordinal package:
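(Reconstructed call, consistent with the output and cross-tabulation below:)

R> fm.soup <- clm(SURENESS ~ PRODID * DAY, data = soup)
R> summary(fm.soup)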
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.63086 0.10740 -15.184
2|3 -0.64177 0.09682 -6.629
3|4 -0.31295 0.09546 -3.278
4|5 -0.05644 0.09508 -0.594
5|6 0.61692 0.09640 6.399
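The design imbalance behind the aliasing shows up in a cross-tabulation:

R> with(soup, table(DAY, PRODID))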
PRODID
DAY 1 2 3 4 5 6
1 369 94 184 93 88 93
2 370 275 0 92 97 92
which shows that the third PRODID was not presented on the second day.
The issue of aliased coefficients extends in CLMs to nominal effects since the joint design
matrix for location and nominal effects will be singular if the same variables are included
in both location and nominal formulae. clm handles this by not estimating the offending
coefficients in the location formula as illustrated with the fm.nom2 model fit in section 4.2.
Over-parameterization
The scope of model structures allowed in clm makes it possible to specify models which are over-parameterized in ways that do not lead to rank-deficient design matrices and as such are not easily detected before fitting the model. An example is given here which includes both additive (location) and multiplicative (scale) effects of contact for a binomial response variable, but the issue can also occur with more than two response categories:
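(Reconstructed code; rating_comb2 and fm.comb2 are illustrative names:)

R> wine <- within(wine, {
+    rating_comb2 <- factor(rating, labels = c("1-2", "1-2", "3-5", "3-5", "3-5"))
+  })
R> fm.comb2 <- clm(rating_comb2 ~ contact, scale = ~ contact, data = wine)
R> summary(fm.comb2)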
Coefficients:
Estimate Std. Error z value Pr(>|z|)
contactyes 1.099 NA NA NA
log-scale coefficients:
Estimate Std. Error z value Pr(>|z|)
contactyes -1.174e-06 NA NA NA
Threshold coefficients:
Estimate Std. Error z value
1-2|3-5 -7.171e-17 NA NA
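A model environment is obtained by calling clm with doFit = FALSE, for example:

R> rho <- clm(rating ~ temp, data = wine, doFit = FALSE)  # illustrative model;
R>   # the numbers below correspond to the original (unspecified) model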
This environment holds a complete specification of the cumulative link model, including design matrices B1, B2, S and other components. The environment also contains the cumulative distribution function that defines the inverse link function (pfun) and its first and second derivatives, i.e., the corresponding density function (dfun) and its gradient (gfun). Of direct interest here is the parameter vector par and functions that readily evaluate the negative log-likelihood (clm.nll), its gradient with respect to the parameters (clm.grad) and the Hessian (clm.hess). The negative log-likelihood and the gradient at the starting values are therefore
R> rho$clm.nll(rho)
[1] 115.8795
R> c(rho$clm.grad(rho))
[1] 86.49192
Note that the gradient function clm.grad assumes that clm.nll has been evaluated at the
current parameter values; similarly, clm.hess assumes that clm.grad has been evaluated at
the current parameter values. The NR algorithm in ordinal takes advantage of this so as to
minimize the computational load.
If interest is in fitting a custom CLM with, say, restrictions on the parameter space, this can be achieved by a combination of a general-purpose optimizer and the functions clm.nll and optionally clm.grad. Assume for instance we know that the regression parameters can be no larger than 2; then the model can be fitted with the following code:
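A sketch of such a fit (assuming the regression parameters are the last entries of rho$par; n.beta must be set to match the model in rho):

R> nll <- function(par, envir) {
+    envir$par <- par      # update parameters in the model environment
+    envir$clm.nll(envir)  # evaluate the negative log-likelihood
+  }
R> n.beta <- 1  # number of regression parameters in rho (assumption)
R> upper <- c(rep(Inf, length(rho$par) - n.beta), rep(2, n.beta))
R> fit <- nlminb(rho$par, nll, upper = upper, envir = rho)
R> fit$par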
disease
smoker 0 1 2 3 4 Sum
no 334 99 117 159 30 739
yes 350 307 345 481 67 1550
smokeryes
2.090131
showing that overall the odds of a worse disease rating are about twice as high for smokers compared to non-smokers.
Allowing for nominal effects we see that the log odds-ratio for smoking clearly changes with disease status, and that it does so in an almost linearly decreasing manner:
Peterson and Harrell Jr. (1990) suggested a model which restricts the log odds-ratios to be
linearly decreasing with disease status modelling only the intercept (first threshold) and slope
of the log odds-ratios:
(Intercept) I(0:3)
1.0225640 -0.3110969
We can implement the log-likelihood of this model as follows. As starting values we combine
parameter estimates from fm.nom and the linear model fm.lm, and finally optimize the log-
likelihood utilising the fm.nom model environment:
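A sketch under the following assumptions: rho.nom is the model environment of fm.nom (created with doFit = FALSE), its parameter vector holds four thresholds followed by the four nominal effects for smoker, and the nominal effects are restricted to the line par[5] + par[6] * (0:3):

R> nll.PH <- function(par, envir) {
+    ## map 6 free parameters to the 8 parameters of the nominal-effects model
+    envir$par <- c(par[1:4], par[5] + par[6] * (0:3))
+    envir$clm.nll(envir)
+  }
R> start <- unname(c(coef(fm.nom)[1:4], coef(fm.lm)))
R> fit <- nlminb(start, nll.PH, envir = rho.nom)
R> round(fit$par[5:6], 2)  # intercept and slope of the log odds-ratios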
Thus the log odds-ratio decreases linearly from 1.02 between the first two disease categories by 0.31 per disease category.
5. Conclusions
Cumulative link models form a very rich model class positioned, in a sense, between perhaps the two most important model classes in statistics: linear models and logistic regression models. The greater flexibility of CLMs relative to binary logistic regression models also facilitates the ability to check assumptions such as the partial proportional odds assumption. Non-linear structures such as scale effects arise naturally in a latent variable interpretation. In addition to nominal effects and the non-linear scale effects, the ordered nature of the thresholds gives rise to computational challenges which we have described and addressed in the ordinal package.
In addition to computational challenges, practical data analysis with CLMs can also be challenging. In our experience a top-down approach in which a "full" model is fitted and gradually
simplified is often problematic, not only because this easily leads to unidentifiable models but
also because there are many different ways in which models can be reduced or expanded. A
more pragmatic approach is often preferred; understanding the data through plots, tables,
and even linear models can aid in finding a suitable intermediate ordinal starting model.
Attempts to identify a "correct" model will also often lead to frustrations; the greater the model framework, the greater the risk that there are multiple models which fit the data (almost) equally well. It is well-known statistical wisdom that with enough data many goodness of fit tests become sensitive to even minor deviations of little practical relevance. This is particularly true for tests of partial proportional odds; in the author's experience almost all CLMs on real data show some evidence of non-proportional odds for one or more variables, while the most useful representation of the data is often a model that simply assumes proportional odds.
Computational details
The results in this paper were obtained using R 4.0.0 with ordinal, version 2019.12.10. R itself
and all packages used are available from the Comprehensive R Archive Network (CRAN) at
https://CRAN.R-project.org/.
References
Agresti A (2002). Categorical Data Analysis. 2nd edition. John Wiley & Sons, New York.
Aranda-Ordaz FJ (1983). “An Extension of the Proportional-Hazards Model for Grouped
Data.” Biometrics, 39, 109–117. doi:10.2307/2530811.
Brazzale AR, Davison AC, Reid N (2007). Applied Asymptotics—Case Studies in Small-
Sample Statistics. Cambridge University Press. ISBN 9780521847032.
Burridge J (1981). “A Note on Maximum Likelihood Estimation for Regression Models Using
Grouped Data.” Journal of the Royal Statistical Society. Series B (Methodological), 43(1),
41–45. ISSN 00359246.
Bürkner PC (2017). “brms: An R Package for Bayesian Multilevel Models Using Stan.”
Journal of Statistical Software, 80(1), 1–28. doi:10.18637/jss.v080.i01.
Christensen RHB (2012). Sensometrics: Thurstonian and Statistical Models. Ph.D. thesis, Technical University of Denmark (DTU). URL http://orbit.dtu.dk/files/12270008/phd271_Rune_Haubo_net.pdf.
Christensen RHB (2018). “ordinal—Regression Models for Ordinal Data.” R package version
2018.8-25, URL http://www.cran.r-project.org/package=ordinal/.
Cox C (1995). “Location-Scale Cumulative Odds Models for Ordinal Data: A Generalized Non-Linear Model Approach.” Statistics in Medicine, 14, 1191–1203. doi:10.1002/sim.4780141105.
Farewell VT, Prentice RL (1977). “A Study of Distributional Shape in Life Testing.” Tech-
nometrics, 19, 69–77. doi:10.2307/1268257.
Greene WH, Hensher DA (2010). Modeling Ordered Choices: A Primer. Cambridge University
Press.
Harrell Jr FE (2018). rms: Regression Modeling Strategies. R package version 5.1-2, URL
https://CRAN.R-project.org/package=rms.
IBM Corp (2017). IBM SPSS Statistics for Windows, Version 25.0. IBM Corp., Armonk,
NY.
McCullagh P (1980). “Regression Models for Ordinal Data.” Journal of the Royal Statistical
Society, Series B, 42, 109–142.
Peterson B, Harrell Jr FE (1990). “Partial Proportional Odds Models for Ordinal Response
Variables.” Applied Statistics, 39, 205–217. doi:10.2307/2347760.
Pratt JW (1981). “Concavity of the Log Likelihood.” Journal of the American Statistical
Association, 76(373), 103–106. ISSN 01621459. doi:10.2307/2287052.
R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Randall J (1989). “The Analysis of Sensory Data by Generalised Linear Model.” Biometrical Journal, 31(7), 781–793. doi:10.1002/bimj.4710310703.
SAS Institute Inc (2008). The Four Types of Estimable Functions – SAS/STAT ® 9.22 User's Guide. SAS Institute Inc., Cary, NC. URL https://support.sas.com/documentation/cdl/en/statugestimable/61763/PDF/default/statugestimable.pdf.
SAS Institute Inc (2010). SAS/STAT ® 9.22 User's Guide. SAS Institute Inc., Cary, NC. URL https://support.sas.com/documentation/.
StataCorp (2017). Stata 15 Base Reference Manual. College Station, TX. URL https://www.stata.com/.
Venables WN, Ripley BD (2002). Modern Applied Statistics with S. 4th edition. Springer-
Verlag, New York. doi:10.1007/978-0-387-21706-2.
Yee TW (2010). “The VGAM Package for Categorical Data Analysis.” Journal of Statistical
Software, 32(10), 1–34. doi:10.18637/jss.v032.i10.
Affiliation:
Rune Haubo Bojesen Christensen
Section for Statistics and Data Analysis
Department of Applied Mathematics and Computer Science
DTU Compute
Technical University of Denmark
Richard Petersens Plads
Building 324
DK-2800 Kgs. Lyngby, Denmark
and
Christensen Statistics
Bringetoften 7
DK-3500 Værløse, Denmark
E-mail: Rune.Haubo@gmail.com; Rune@ChristensenStatistics.dk
URL: http://christensenstatistics.dk/