Longitudinal Data Analysis
Aboma Temesgen (Assistant Professor of Biostatistics)
Department of Statistics
College of Computing and Informatics
Haramaya University
aboma.temesgen@gmail.com
©June 2021
Contents
1 Longitudinal Study Design
2 Features of Longitudinal Data
3 Cross-sectional versus Longitudinal Data
4 Exploratory Data Analysis
5 A Model for Longitudinal Data
Model Estimation Techniques
Longitudinal Study Design
• Statistical techniques such as analysis of variance and regression
rest on the basic assumption that the residual or error terms are
independently and identically distributed (iid)
• Many types of studies, however, have designs that involve gathering
repeated measurement data on dependent groups or clusters
• Familiar examples of clusters are families, towns, litters, branches of
a company, or schools.
• In each of these examples, a cluster is a collection of sub-units on
which observations are made
Introduction Cont...
• Another common form of clustering arises when data are measured
repeatedly on the same unit
• When these repeated measurements are taken over time, the study is
called a longitudinal study or, in some applications, a panel study.
Longitudinal data are a special form of repeated measurements
• The methods of analysis used in LDA relax the independence
assumption and take the more complicated data structure into account
• LDA methods account for the interdependence of observations
that is due to repeated measurement or clustering
Introduction Cont...
• Failure to account for the effect of correlation can result in erroneous
estimation of the variability of parameter estimates, and hence in
misleading inference.
• This interdependence of observations can be modeled using mixed
models
• Mixed models allow us to specify a pattern for the correlation
between these measurements. By doing so, we may obtain more
precise estimates, and broader inferences can be made.
• A mixed model combines fixed and random effects
Features of Longitudinal Data
• Defining feature of longitudinal studies is that measurements of the
same individuals are taken repeatedly through time.
• Longitudinal studies allow direct study of change over time.
• Objective: primary goal is to characterize the change in response over
time and the factors that influence change.
• With repeated measures on individuals, we can capture
within-individual change.
• Note: measurements in a longitudinal study are commensurate, i.e.,
the same variable is measured repeatedly.
Features Cont...
• By comparing each individual’s responses at two or more occasions, a
longitudinal analysis can remove extraneous, but unavoidable,
sources of variability among individuals.
• This eliminates major sources of variability or "noise" from the
estimation of within-individual change.
• Complications: Repeated measures on individuals are correlated and
variability is often heterogeneous across measurement occasions
Features Cont...
• Longitudinal data require somewhat more sophisticated statistical
techniques because the repeated observations are usually (positively)
correlated.
• Correlation arises due to repeated measures on the same individuals.
• Sequential nature of the measures implies that certain types of
correlation structures are likely to arise.
• Correlation must be accounted for in order to obtain valid inferences.
• Heterogeneous variability must also be accounted for in order to
obtain valid inferences.
Features Cont...
• Correlated data commonly arise in many applications.
• Longitudinal Studies: designs in which the outcome variable is
measured repeatedly over time.
• Repeated Measures Studies: somewhat older terminology applied to
special set of longitudinal designs characterized by measurement at a
common set of occasions (usually in an experimental setting under
different conditions or treatments).
Terminology in LDA
• Individuals or Subjects: Participants in a longitudinal study are
referred to as individuals or subjects.
• Occasions: In a longitudinal study individuals are measured
repeatedly at different occasions or times.
• The number of repeated observations, and their timing, can vary
widely from one longitudinal study to another.
• When number and timing of the repeated measurements are the
same for all individuals, study design is said to be "balanced" over
time.
• Note: Designs can be balanced, although studies may have
incompleteness in data collection
Terminology Cont...
• Correlation: An aspect of longitudinal data that complicates their
statistical analysis is that repeated measures on the same individual
are usually positively correlated.
• This violates the fundamental assumption of independence that is
the cornerstone of many statistical techniques.
• Why are longitudinal data correlated?
• What are the potential consequences of not accounting for correlation
among longitudinal data in the analysis?
Terminology Cont...
• Variability: An additional, although often overlooked, aspect of
longitudinal data that complicates their statistical analysis is
heterogeneous variability.
• That is, the variability of the outcome at the end of the study is often
discernibly different than the variability at the start of the study.
• This violates the assumption of homoscedasticity that is the basis for
standard linear regression techniques.
• Thus, there are two aspects of longitudinal data that complicate their
statistical analysis: (i) repeated measures on the same individual are
usually positively correlated, and (ii) variability is often
heterogeneous across measurement occasions.
Notation
• Let Yij denote the response variable for the ith individual (i = 1, ..., N)
at the jth occasion (j = 1, ..., n).
• If the repeated measures are assumed to be equally-separated in
time, this notation will be sufficient.
• We can represent the n observations on the N individuals in a two
dimensional array, with rows corresponding to individuals and
columns corresponding to the responses at each occasion.
Data layout
Occasion
ID   1    2    3    ...  n
1    y11  y12  y13  ...  y1n
2    y21  y22  y23  ...  y2n
3    y31  y32  y33  ...  y3n
...  ...  ...  ...  ...  ...
N    yN1  yN2  yN3  ...  yNn
Vector Notation
• We can group the n repeated measures on the same individual into an
n × 1 response vector with elements Yi1, Yi2, ..., Yin
• Alternatively, we can denote the response vector Yi as
Yi = (Yi1, Yi2, ..., Yin)′, where ′ denotes transpose
Vector Notation
• We denote the expectation or mean of Yij by
µj = E(Yij), where E(·) can be thought of as a long-run average (over
individuals).
• The mean, µj, provides a measure of the location of the center of the
distribution of Yij
• Variance: σj² = E[Yij − E(Yij)]² = E[Yij − µj]²
• Covariance: σjk = E[(Yij − µj)(Yik − µk)] is a measure of the linear
dependence between Yij and Yik
• Correlation: ρjk = E[(Yij − µj)(Yik − µk)] / (σj σk) = σjk / (σj σk)
Vector Notation
• For the vector of repeated measures, Yi = (Yi1, Yi2, ..., Yin)′, we define
the variance-covariance matrix, Cov(Yi):

Cov(Yi) = | Var(Yi1)       Cov(Yi1, Yi2)  ···  Cov(Yi1, Yin) |
          | Cov(Yi2, Yi1)  Var(Yi2)       ···  Cov(Yi2, Yin) |
          | ...            ...            ...  ...           |
          | Cov(Yin, Yi1)  Cov(Yin, Yi2)  ···  Var(Yin)      |

        = | σ1²  σ12  ···  σ1n |
          | σ21  σ2²  ···  σ2n |
          | ...  ...  ...  ... |
          | σn1  σn2  ···  σn² |
Vector Notation
• We can also define the correlation matrix, Corr(Yi):

Corr(Yi) = | 1    ρ12  ···  ρ1n |
           | ρ21  1    ···  ρ2n |
           | ...  ...  ...  ... |
           | ρn1  ρn2  ···  1   |
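Given balanced data stored as an N × n array (one row per individual, one column per occasion), these covariance and correlation matrices can be estimated directly; a minimal numpy sketch, with values invented purely for illustration (not trial data):

```python
import numpy as np

# Toy balanced data: N = 4 individuals (rows), n = 3 occasions (columns).
Y = np.array([[26.5, 24.7, 23.1],
              [24.8, 23.9, 22.2],
              [28.0, 26.1, 24.9],
              [25.3, 24.4, 23.0]])

# rowvar=False tells numpy that each column (occasion) is a variable,
# so the results are the n x n matrices defined above.
S = np.cov(Y, rowvar=False)        # sample covariance matrix, Cov(Yi)
R = np.corrcoef(Y, rowvar=False)   # sample correlation matrix, Corr(Yi)

# R is S rescaled by the occasion standard deviations: R_jk = S_jk/(s_j s_k).
```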
• Example: Exposure to lead during infancy is associated with
substantial deficits in tests of cognitive ability. Chelation treatment
usually requires injections and hospitalization. In the Treatment of
Lead-Exposed Children (TLC) trial, 100 children were randomized to
placebo or succimer to examine changes in blood lead level during the
course of treatment.
Covariance Matrix
W0 W1 W4 W6
W0 25.24 22.75 24.26 21.42
W1 22.75 29.82 27.04 23.38
W4 24.26 27.04 33.10 28.22
W6 21.42 23.38 28.22 31.81
• Estimated covariance matrix for the blood lead levels for children in
the placebo group.
W0 W1 W4 W6
W0 1.00 0.83 0.84 0.76
W1 0.83 1.00 0.86 0.76
W4 0.84 0.86 1.00 0.87
W6 0.76 0.76 0.87 1.00
• Estimated correlation matrix for the blood lead levels for children in
the placebo group.
Observations about Correlation in Longitudinal Data
• Empirical observations about the nature of the correlation among
repeated measures in longitudinal studies:
• (i) correlations are positive,
• (ii) correlations decrease with increasing time separation,
• (iii) correlations between repeated measures rarely ever approach
zero, and
• (iv) correlation between a pair of repeated measures taken very
closely together in time rarely approaches one.
Consequences of Ignoring Correlation
• Potential impact of ignoring correlation can be illustrated using data
from the Treatment of Lead-Exposed Children Trial.
• For simplicity, consider only the first two repeated measures, taken
at week 0 and week 1.
• It is of interest to determine the change in the mean response over
time.
• An estimate of change is given by δ̂ = µ̂2 − µ̂1
• where µ̂j = (1/N) Σ_{i=1}^{N} Yij
Consequences of Ignoring Correlation...
• In the TLC trial, the estimate of change in the succimer group is
−13.0 (i.e., 13.5 − 26.5).
• For inferences, we also need a standard error.
• Variance: Var(δ̂) = Var((1/N) Σ_{i=1}^{N} (Yi2 − Yi1)) = (1/N)(σ1² + σ2² − 2σ12)
• Note: The term −2σ12 accounts for the correlation among the
repeated measures.
• Substituting estimates of the variances and covariance into this
expression:
Var(δ̂) = (1/50)(25.2 + 58.9 − 2 × 15.5) = 1.06
Consequences of Ignoring Correlation...
• What if we had ignored that the data are correlated and proceeded
with an analysis assuming all observations are independent?
• Independence ⇒ zero covariance.
• Leading to the (incorrect) estimate of the variance of δ̂:
(1/50)(25.2 + 58.9) = 1.68
• which is approximately 1.6 times larger.
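The two variance calculations above can be reproduced numerically; a short sketch using the variance and covariance estimates quoted in the notes (σ1² = 25.2, σ2² = 58.9, σ12 = 15.5, N = 50):

```python
N = 50
var1, var2, cov12 = 25.2, 58.9, 15.5   # estimates quoted in the notes

# Correct variance of the estimated change, accounting for correlation:
var_correct = (var1 + var2 - 2 * cov12) / N    # = 1.062

# Naive variance, wrongly assuming independence (zero covariance):
var_naive = (var1 + var2) / N                  # = 1.682

ratio = var_naive / var_correct   # about 1.6: the naive variance is too large
```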
Consequences of Ignoring Correlation...
• In this illustration, ignoring the correlation results in:
• standard errors that are too large
• confidence intervals that are too wide
• p-values for the test of H0: δ = 0 that are too large
• In general, failure to take account of the correlation (covariance)
among the repeated measures will result in incorrect estimates of the
sampling variability and can lead to quite misleading scientific
inferences.
• Correlation and heterogeneous variability must be accounted for in
order to obtain valid inferences about change in response over time.
Cross-sectional versus Longitudinal Data
• Cross-sectional data refers to data collected at a single point in time
• Longitudinal data refers to measurements made repeatedly over time
to study how the subjects evolve over time.
• The concern of longitudinal data analysis is change over time.
• Observations from cross-sectional data are uncorrelated, whereas in a
longitudinal study the measurements made on a subject over a period
of time (a cluster) are correlated.
• Repeated measures are obtained when a response is measured
repeatedly on a set of units
Consider the relation between resp. Y and Age
Figure: Y versus Age Cross sectional Case
• The graph suggests a negative relation between Y and age
• In fact, the measurements were made on 7 subjects at 2 time occasions each
Consider the relation between resp. Y and Age...
Figure: Y versus Age when longitudinal
• Are we still inclined to conclude that there is a negative relation between
Y and Age?
• This shows the importance of longitudinal data in revealing how each subject
evolves over time (here the between-subject trend is negative, while the
within-subject trend is positive)
Consider the relation between resp. Y and Age...
Figure: Y versus Age when longitudinal
• The graph now suggests that the cross-sectional as well as the longitudinal
trend is negative
Consider the relation between resp. Y and Age...
• In both plots, exactly the same observations could have been obtained
in a longitudinal study with 2 measurements per subject.
• Conclusion: Longitudinal data allow us to distinguish differences
between subjects from changes within subjects
LDA data
• Units of measurements are:
• Subjects, patients, participants, ...
• Animals, plants, ...
• Clusters: families, towns, branches of a company,...
LDA data View
• Example 1: Height of infants measured after birth at Jimma
University Specialized Hospital
• Example 2: CD4 measurement measured for HIV infected patients at
Jimma University for the progression of HIV in the patients
• Example 3: Measures of blood lead level at baseline, 1, 4 and 6 weeks
of the treatment of lead-exposed children trial
Simple Methods for Analyzing Longitudinal
Data
Simple Methods LDA
• The reason why classical techniques fail in the context of
longitudinal data is that observations within subjects are correlated
• In many cases the correlation between two repeated measurements
decreases as the time span between them increases. E.g., the
correlation matrix for 5 repeated growth-curve measurements:

1.00 0.95 0.96 0.93 0.87
0.95 1.00 0.97 0.96 0.89
0.96 0.97 1.00 0.98 0.94
0.93 0.96 0.98 1.00 0.98
0.87 0.89 0.94 0.98 1.00
Simple Methods LDA...
• A correct analysis should account for this
• The simplest example of longitudinal data is paired observations
• The paired t-test accounts for the correlation by considering
subject-specific differences: ∆i = Yi1 − Yi2
• This reduces the number of measurements to just one per subject,
which implies that classical techniques can be applied.
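The reduction to subject-specific differences can be sketched as follows; the paired data are invented, and the one-sample t statistic is computed by hand rather than with a stats library:

```python
import numpy as np

# Invented paired measurements on 6 subjects (occasion 1, occasion 2).
y1 = np.array([26.5, 24.8, 28.0, 25.3, 27.1, 23.9])
y2 = np.array([13.5, 15.2, 14.8, 16.0, 12.9, 14.4])

# Subject-specific differences: one number per subject, so the
# within-subject correlation no longer violates independence.
d = y1 - y2
n = len(d)

# One-sample t statistic for H0: mean difference = 0.
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
```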
Simple Methods...
• In the case of more than 2 measurements per subject, similar simple
techniques are often applied to reduce the number of measurements
for the ith subject, from ni to 1
• Some examples:
1 Analysis at each time point separately
2 Analysis of Area Under the Curve (AUC)
3 Analysis of endpoints
4 Analysis of increments
5 Analysis of covariance
Analysis at Each Time Point
• The data are analyzed at each occasion separately (e.g., comparisons
at each time point with independent t-tests)
• t1 , t2 , ..., tj
• Advantages: Simple to interpret and uses all available data
• Disadvantages: Does not consider 'overall' differences, does not allow
one to study the evolution of differences, and suffers from the problem
of multiple testing
Analysis of Area Under the Curve
• For each subject, the area under its curve is calculated:
AUCi = (ti2 − ti1 )(yi1 + yi2 )/2 + (ti3 − ti2 )(yi2 + yi3 )/2 + ...
• Advantages:
1 No problems of multiple testing
2 Does not explicitly assume balanced data
3 Compare ’Overall’ differences
• Disadvantage: Uses only partial information: AUCi
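The AUC formula above amounts to the trapezoidal rule applied to each subject's profile; a minimal sketch (the times and responses are invented):

```python
import numpy as np

def auc(times, y):
    """Trapezoidal area under one subject's response profile:
    AUC_i = sum_j (t_{j+1} - t_j) * (y_j + y_{j+1}) / 2."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(t) * (y[:-1] + y[1:]) / 2.0))

# One subject measured at weeks 0, 1, 4 and 6:
a = auc([0, 1, 4, 6], [26.5, 13.5, 15.5, 20.8])   # 20.0 + 43.5 + 36.3 = 99.8
```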
Analysis of Endpoints
• In randomized studies (treatments assigned randomly), there are no
systematic differences at baseline. Hence, 'treatment' effects can be
assessed by comparing only the measurements at the last occasion
• Advantages:
1 No problem of multiple testing
2 Does not explicitly assume balanced data
• Disadvantage:
1 Uses only partial information: yini
2 Only valid for large data sets
Analysis of Increments
• A simple method to compare evolutions between subjects, correcting
for differences at baseline, is to analyze the subject-specific changes
yini − yi1 .
• With only two measurement occasions, this is the paired t-test
• Advantages:
1 No problem of multiple testing
2 Does not explicitly assume balanced data
• Disadvantages: Uses only partial information: yini − yi1
Analysis of Covariance
• Another way of correcting for differences at baseline is to use analysis
of covariance techniques, where the first measurement is included as a
covariate in the model.
• Advantages:
1 No problem of multiple testing
2 Does not explicitly assume balanced data
• Disadvantages: Uses only partial information and does not take into
account the variability of yi1
General Disadvantage of Simple methods
• In this way, the analysis of longitudinal data is reduced to the analysis
of independent observations, for which classical statistical
procedures are available
• However, all these methods have the disadvantage that a lot of
information is lost
• Further, they often do not allow one to draw conclusions about the way
the endpoints have been reached.
Exploratory Data Analysis
Exploratory Data Analysis
• Exploratory analysis comprises techniques to visualize patterns in
the data.
• Data analysis must begin by making displays that expose patterns
relevant to the scientific question.
• The best methods are capable of uncovering patterns which are
unexpected
• In this regard, graphical displays are very important; they
1 Highlight aggregate patterns of scientific interest
2 Identify both cross-sectional and longitudinal patterns
3 Allow identification of unusual individuals and unusual observations
Exploratory Data Analysis...
• Most longitudinal studies address the relationship of a response with
explanatory variables, often including time
• A LMM also makes assumptions about:
1 Individual(subject specific) profiles: linear, quadratic, ...
2 The average evolution:(non-)linear, covariates,...
3 The variance function: constant, quadratic, ...
4 The correlation structure: constant, serial, ...
• Data exploration is a very helpful tool in the selection of appropriate models.
Exploring Individual Profile
Figure: Blood lead level Profile for children treated with succimer
• Helps to identify how each individual subject (cluster) evolves over time
• Suggests different intercepts and slopes for time
• Suggests whether profiles are linear, nonlinear, or quadratic
Exploring the Mean Structure
• For balanced data, the average values are computed at each time point
separately and connected.
• If the data are not balanced, loess smoothing can be used instead.
• This gives an idea of how the mean profile of the population evolves
over time.
• The results of this exploration are useful for choosing a
fixed-effects structure for the LMM
Exploring Mean Structure..
Figure: Blood lead level mean structure
• Shows how the average blood lead level evolves over time
• Suggests linear, and possibly quadratic, fixed time effects
Modelling the Mean: Analysis of Response Profiles
• Basic idea: Compare groups of subjects in terms of mean response
profiles over time.
• Useful for balanced longitudinal designs and when there is a single
categorical covariate (perhaps denoting different treatment or
exposure groups).
• Analysis of response profiles can be extended to handle more than a
single group factor.
• Analysis of response profiles can also handle missing data.
• Example: Consider Children randomized to placebo or Succimer and
measures of blood lead level at baseline, 1, 4 and 6 weeks with mean
response profile.
Mean Profile
Figure: Mean blood lead levels at baseline, week 1, week 4, and week 6 in the
succimer and placebo groups
Hypotheses concerning response profiles
Given a sequence of n repeated measures on a number of distinct groups
of individuals, three main questions:
1 Are the mean response profiles similar across the groups, in the sense
that the mean response profiles are parallel? A question of the
group × time interaction effect.
2 Assuming the mean response profiles are parallel, are the means
constant over time, in the sense that the mean response profiles are
flat? A question of the time effect.
3 Assuming that the population mean response profiles are parallel,
are they also at the same level, in the sense that the mean response
profiles for the groups coincide? A question of the group effect.
Note: For many longitudinal studies, especially longitudinal clinical
trials, the main interest is the group × time interaction effect.
Exploring the mean Structure
• In many studies true underlying mean response process changes over
time in a relatively smooth, monotonically increasing or decreasing
pattern.
• Fitting parsimonious models for mean response results in statistical
tests of covariate effects (e.g., treatment with time interactions) with
greater power than in analysis of response profiles.
• Simplest possible curve for describing changes in the mean response
over time is a straight line and the slope has direct interpretation in
terms of a constant rate of change in mean response for a single unit
change in time
Exploring the mean Structure...
• When changes in the mean response over time are not linear,
higher-order polynomial trends can be considered.
• For example, if the means are monotonically increasing or decreasing
over the course of the study, but in a curvilinear way, a model with
quadratic trends can be considered.
• In a quadratic trend model the rate of change in the mean response is
not constant but depends on time. Rate of change must be expressed
in terms of two parameters.
Exploring the mean Structure...
• If collinearity among the time terms is suspected, it is advisable to
consider centering time
• Linear splines: if the simplest curve is a straight line, then one way to
extend it is a sequence of joined line segments, producing a piecewise
linear pattern.
• A constant (flat) mean response over time can also be considered
Exploring the variance Structure
• The variance function is given by: σ²(t) = E[Y(t) − µ(t)]²
• Hence, an estimate of σ²(t) can be obtained by applying any of the
techniques described for exploring the mean structure to the squared
residuals rij²
• When we estimate the covariance matrix without making any
particular assumption about the covariance structure, we say that we
are using an unrestricted or unstructured covariance matrix.
• As we shall see later, it is sometimes advantageous to model the
covariance structure more parsimoniously
Exploring variance Structure..
Figure: The variance Structure of blood lead level
• Shows how the variance of blood lead level evolves over time
• Suggests a linear, possibly quadratic, variance function
Modelling the Covariance
• Longitudinal data present two aspects of the data that require
modelling: mean response over time and covariance.
• Although these two aspects of the data can be modelled separately,
they are interrelated.
• Choice of models for mean response and covariance are
interdependent.
• A model for the covariance must be chosen on the basis of some
assumed model for the mean response.
• Recall: Covariance between any pair of residuals, say [Yij − µij (β)]
and [Yik − µik (β)], depends on the model for the mean, i.e., depends on
β.
Modelling the Covariance
Three broad approaches can be distinguished:
1 "Unstructured" or arbitrary pattern of covariance
2 Covariance pattern models
3 Random effects covariance structure
Unstructured Covariance
• Appropriate when design is "balanced" and number of measurement
occasions is relatively small.
• No explicit structure is assumed other than homogeneity of
covariance across different individuals, Cov(Yi ) = Σi = Σ.
• Chief advantage: no assumptions made about the patterns of
variances and covariances.
• Potential drawback: when the number of measurement occasions is
large, the number of covariance parameters is large relative to the
sample size, and estimation is likely to be very unstable.
Covariance Pattern Models
• When attempting to impose some structure on the covariance, a
delicate balance needs to be struck.
• With too little structure, there may be too many parameters to be
estimated with a limited amount of data.
• With too much structure, there is a potential risk of model
misspecification and misleading inferences concerning β.
• This is the classic tradeoff between bias and variance.
• Covariance pattern models have their basis in models for serial
correlation originally developed for time series data
Compound Symmetry
• Assumes variance is constant across occasions, say σ², and
Corr(Yij , Yik ) = ρ for all j ≠ k.
• Cov(Yi ) = σ² × (correlation matrix with 1 on the diagonal and ρ elsewhere)
• Parsimonious: two parameters, regardless of the number of measurement
occasions.
• However, these strong assumptions about variance and correlation are
usually not valid with longitudinal data
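The compound symmetry covariance can be written down explicitly; a minimal sketch with invented parameter values:

```python
import numpy as np

def compound_symmetry(n, sigma2, rho):
    """Cov(Yi) under compound symmetry: constant variance sigma2 on the
    diagonal and constant covariance sigma2*rho everywhere else."""
    R = np.full((n, n), rho)   # correlation matrix: rho off the diagonal...
    np.fill_diagonal(R, 1.0)   # ...and 1 on the diagonal
    return sigma2 * R

# Only two parameters, however many occasions there are:
Sigma = compound_symmetry(4, sigma2=25.0, rho=0.8)
```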
Toeplitz
• Assumes variance is constant across occasions, say σ², and
Corr(Yij , Yi,j+k ) = ρk for all j and k, with a separate correlation
parameter ρk for each lag k.
• Cov(Yi ) = σ² × (correlation matrix whose lag-k entries equal ρk )
• In particular, the correlation among responses at adjacent measurement
occasions is constant, ρ1 .
• Toeplitz only appropriate when measurements are made at equal (or
approximately equal) intervals of time.
• A special case of the Toeplitz covariance is the (first-order)
autoregressive covariance.
Autoregressive
• The (first-order) autoregressive model assumes variance is constant
across occasions, say σ², and Corr(Yij , Yi,j+k ) = ρ^k for all j and k.
• Only appropriate when the measurements are made at equal (or
approximately equal) intervals of time.
• Compound symmetry, Toeplitz and autoregressive covariances
assume variances are constant across time.
• This assumption can be relaxed by considering versions of these
models with heterogeneous variances, Var(Yij ) = σj2 .
Exponential
• When measurement occasions are not equally spaced over time, the
autoregressive model can be generalized as follows.
• Let {ti1 , . . . , tin } denote the observation times for the ith individual
and assume that the variance is constant across all measurement
occasions, say σ², and Corr(Yij , Yik ) = ρ^|tij − tik| , for ρ ≥ 0.
• Correlation between any pair of repeated measures decreases
exponentially with the time separations between them.
Exponential
• Referred to as "exponential" covariance because it can be
re-expressed as
Cov(Yij , Yik ) = σ² ρ^|tij − tik| = σ² exp(−θ|tij − tik |), where
θ = −log(ρ), or ρ = exp(−θ) for θ ≥ 0.
• The exponential covariance model is invariant under linear
transformations of the time scale.
• If we replace tij by (a + b·tij ) (e.g., if we replace time measured in
"weeks" by time measured in "days"), the same form for the
covariance matrix holds
• Note: Choice of models for covariance and mean are interdependent
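The exponential covariance can be constructed for arbitrary, possibly unequally spaced, observation times; a minimal sketch using the week 0, 1, 4, 6 schedule of the TLC trial but invented values of σ² and ρ:

```python
import numpy as np

def exponential_cov(times, sigma2, rho):
    """Cov(Yij, Yik) = sigma2 * rho**|t_ij - t_ik|, for 0 <= rho <= 1.
    Works for unequally spaced measurement occasions."""
    t = np.asarray(times, dtype=float)
    lags = np.abs(t[:, None] - t[None, :])   # matrix of |t_ij - t_ik|
    return sigma2 * rho ** lags

Sigma = exponential_cov([0, 1, 4, 6], sigma2=25.0, rho=0.7)

# Equivalent parameterisation: theta = -log(rho), so that
# Cov(Yij, Yik) = sigma2 * exp(-theta * |t_ij - t_ik|).
theta = -np.log(0.7)
```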
Exploring the Correlation Structure
• For balanced longitudinal data, the correlation structure can be
studied through the correlation matrix, or a scatterplot matrix
Exploring Correlation Structure..
Figure: The correlation Structure
• Shows how the correlation among the repeated measures evolves over time
• Suggests a roughly constant correlation structure
Exploring the Random Effects...
• The first step in the model building process for a linear mixed-effects
model, after the functional form of the model has been decided, is
choosing which parameters in the model, if any, should have a
random-effect component included to account for between-group
variation.
Model for Longitudinal Data
Notation of General Linear Model
• Previously, we assumed a sample of N subjects are measured
repeatedly at n occasions.
• Either by design or happenstance, subjects may not have same
number of repeated measures or be measured at same set of
occasions.
• We assume there are ni repeated measurements on the ith subject
and each Yij is observed at time tij .
Notation of General Linear Model
• We can group the response variables for the ith subject into an ni × 1
vector:

Yi = (Yi1 , Yi2 , ..., Yini )′

• Associated with each Yij there is a p × 1 vector of covariates:

Xij = (Xij1 , Xij2 , ..., Xijp )′ , for j = 1, 2, ..., ni and i = 1, 2, ..., N

• Note: Information about the time of observation, treatment or
exposure group, and other predictor and confounding variables can
be expressed through this vector of covariates.
Notation of General Linear Model
• We can group the vectors of covariates into an ni × p matrix:

Xi = | Xi11   Xi12   ···  Xi1p  |
     | Xi21   Xi22   ···  Xi2p  |
     | ...    ...    ...  ...   |
     | Xini1  Xini2  ···  Xinip |

• Xi is simply an ordered collection of the values of the p covariates for
the ith subject at the ni occasions.
Linear Models for Longitudinal Data
• Throughout this course we consider linear regression models for
changes in the mean response over time:

Yij = β1 Xij1 + β2 Xij2 + · · · + βp Xijp + eij

• where β1 , β2 , ..., βp are unknown regression coefficients.
• The eij are random errors, with mean zero, and represent deviations
of the Yij from their means,

E(Yij |Xij ) = β1 Xij1 + β2 Xij2 + · · · + βp Xijp

• In vector notation, E(Yi |Xi ) = Xi β
Model Assumptions
1 The individuals represent a random sample from the population of
interest.
2 The elements of the vector of repeated measures, Yi1 , ..., Yin , have a
Multivariate Normal (MVN) distribution,
3 Observations from different individuals are independent, while
repeated measurements of the same individual are not assumed to be
independent.
• The covariance matrix of the vector of observations, Yi1 , ..., Yin , is
denoted Σ and its elements are σjj′
Probability Distribution
• A probability distribution describes the likelihood or relative
frequency of occurrence of particular values of the response (or
dependent) variable.
• Recall that a single response variable Yi , having a normal probability
density in the standard linear regression model, has density:

f (yi |Xi1 , Xi2 , . . . , Xip ) = (1/√(2πσ²)) exp{−(yi − Xi1 β1 − Xi2 β2 − · · · − Xip βp )² / (2σ²)}
• With repeated measures we have a vector of response variables and
must consider joint probability models for the entire vector of
responses.
• The Multivariate Normal Distribution is an extension of the Normal
distribution for a single response to a vector of responses.
Multivariate Normal Distribution
• The multivariate normal probability density function for
Yi = (Yi1 , Yi2 , . . . , Yin )′ has the following representation:

f (Yi |Xi ) = (2π)^(−n/2) |Σ|^(−1/2) exp{−(Yi − Xi β)′ Σ⁻¹ (Yi − Xi β)/2}
• Note that f (Yi |Xi ) describes the probability or relative frequency of
occurrence of a particular set of values of (Yi1 ; Yi2 ; . . . ; Yin ).
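The density can be evaluated directly with numpy (using the conventional factor of 1/2 in the exponent); a minimal sketch:

```python
import numpy as np

def mvn_density(y, mu, Sigma):
    """Multivariate normal density:
    (2*pi)**(-n/2) * |Sigma|**(-1/2) * exp(-(y-mu)' Sigma^{-1} (y-mu) / 2)."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = y.size
    resid = y - mu
    quad = resid @ np.linalg.solve(Sigma, resid)  # (y-mu)' Sigma^{-1} (y-mu)
    return ((2.0 * np.pi) ** (-n / 2.0)
            * np.linalg.det(Sigma) ** -0.5
            * np.exp(-0.5 * quad))

# At the mean, the exponent vanishes and the density is
# (2*pi)^(-n/2) |Sigma|^(-1/2).
```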
Multivariate Normal Distribution
• f (Yi |Xi ) depends to a very large extent on the quadratic form
(Yi − Xi β)′ Σ⁻¹ (Yi − Xi β)
Model For Longitudinal Data
• In practice we often have unbalanced data, i.e., unequal numbers of
measurements per subject, taken at different time points
• The model we use should account for these differences
• Often, subject-specific longitudinal profiles can be well approximated
by linear regression functions
• The model which accounts for these subject-specific differences and for
unbalanced data is known as the linear mixed model (LMM), for
continuous response cases
• Let us see some concepts before we proceed to the modeling approaches
Linear Mixed Models
• Linear mixed models (LMM) are models that handle data where
observations are not independent.
• LMM correctly models correlated errors, whereas procedures in the
general linear model (GLM) family usually do not.
• The GLM family includes such procedures as t-tests, analysis of variance,
correlation, regression, and factor analysis, to name a few
• LMM can be considered a further generalization of the GLM to better
support analysis of a continuous response. Mixed models contain
both fixed and random effects.
• Fixed Effects: factors for which the only levels under consideration
are contained in the coding of those effects
Linear Mixed Models...
• Random Effects: factors for which the levels contained in the coding
of those factors are a random sample of the total number of levels in
the population for that factor
• Hierarchical effects: Hierarchical designs have nested effects. Nested
effects are those with subjects within groups
• For instance, patients are nested within doctors and doctors are
nested within hospitals.
• Repeated measures: where observations are correlated rather than
independent (ex., before-after studies, time series data,
matched-pairs designs)
Linear Mixed Models...
• Fixed effect parameters are commonly used in linear and generalized
linear models.
• In contrast, a random effect is a random variable. It does not make
sense to estimate a random effect; instead, we try to estimate the
parameters that describe the distribution of this random effect.
• In some analyses, random effects are used simply to induce a certain
correlation structure in the data, and there is a sense in which the
chosen levels represent a sample from a population
Formulation of Linear Mixed Models
• The 2-stage model formulation
• In practice: often unbalanced data:
1 Unequal number of measurements per subject
2 Measurements not taken at fixed time points
• Therefore, multivariate regression techniques are often not
applicable
• Often, subject-specific longitudinal profiles can be well approximated
by linear regression functions
• This leads to a 2-stage model formulation:
The 2-Stage Model Formulation
• Stage 1: Linear regression model for each subject separately
(different parameters with different subjects)
• Stage 2: Explain variability in the subject-specific regression
coefficients using known covariates (overall evolution with known
covariates)
Stage 1
• Response Yij for the ith subject, measured at time tij, i = 1, . . . , N, j = 1, . . . , ni
• Response vector Yi for the ith subject: Yi = (Yi1, Yi2, . . . , Yini)′
• Stage 1 model: Yi = Zi βi + εi
• Where Zi is an ni × q matrix of known covariates and βi is a
q-dimensional vector of subject-specific regression coefficients
• εi ∼ N(0, Σi), where Σi = σ² Ini
• Note that the above model describes the observed variability within
subjects
Stage 2
• Between-subject variability can now be studied from relating the βi
to known covariates
• Stage 2 model
βi = Ki β + bi
• Where Ki is a q × p matrix of known covariates and β is a
p-dimensional vector of unknown regression parameters
• bi ∼ N(0, D)
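The two stages can be sketched numerically: fit a separate OLS line per subject (stage 1), then summarize the subject-specific coefficients (stage 2, here with Ki = I, i.e., no stage-2 covariates). All parameter values below are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_i = 30, 6                      # 30 subjects, 6 measurements each
t = np.arange(n_i, dtype=float)     # common measurement times (illustrative)

beta_pop = np.array([10.0, 2.0])    # population intercept and slope (assumed)
D = np.diag([4.0, 0.25])            # covariance of the random effects b_i
sigma2 = 1.0                        # residual variance

Z = np.column_stack([np.ones(n_i), t])   # stage-1 design Z_i (n_i x q)
betas_hat = np.empty((N, 2))
for i in range(N):
    b_i = rng.multivariate_normal(np.zeros(2), D)
    y_i = Z @ (beta_pop + b_i) + rng.normal(0, np.sqrt(sigma2), n_i)
    # Stage 1: OLS fit per subject -> subject-specific coefficients beta_i
    betas_hat[i] = np.linalg.lstsq(Z, y_i, rcond=None)[0]

# Stage 2: relate the fitted beta_i to covariates; with no covariates,
# K_i = I and the stage-2 estimate of beta is just the average
beta_stage2 = betas_hat.mean(axis=0)
print(beta_stage2)   # close to the population values (10, 2)
```

This is the "summary statistics" route mentioned on the next slide; the drawback is that the stage-2 analysis ignores the uncertainty in the stage-1 fits.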
The General Linear Mixed-effects Model
• A 2-stage approach can be performed explicitly in the analysis
• However, this is just another example of the use of summary
statistics:
• Yi is summarized by βi
• Summary statistics βi analyzed in second stage
• The associated drawbacks can be avoided by combining the two
stages into one model:
The General Linear Mixed-effects Model...
• Yi = Zi βi + εi and βi = Ki β + bi ⇒ Yi = Zi Ki β + Zi bi + εi, with Xi = Zi Ki
• General linear mixed-effects model:
Yi = Xi β + Zi bi + εi
• bi ∼ N(0, D), εi ∼ N(0, Σi)
b1, b2, . . . , bN, ε1, ε2, . . . , εN independent
• Fixed effects: β
• Random effects: bi
• Variance components: elements in D and Σi
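As an illustration, a random-intercept LMM of this form can be fitted with statsmodels' MixedLM; the data below are simulated and all parameter values are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
N, n_i = 40, 5
subj = np.repeat(np.arange(N), n_i)
time = np.tile(np.arange(n_i, dtype=float), N)
b0 = rng.normal(0, 2, N)            # random intercepts b_i, so D = 4
y = 10 + 2 * time + b0[subj] + rng.normal(0, 1, N * n_i)
df = pd.DataFrame({"y": y, "time": time, "subject": subj})

# Random-intercept LMM: y_ij = beta0 + beta1*time_ij + b_i + eps_ij
model = smf.mixedlm("y ~ time", df, groups=df["subject"])
fit = model.fit()                    # REML by default
print(fit.params[["Intercept", "time"]])
```

The fitted fixed effects should be close to the simulated values (10 and 2), and the summary also reports the estimated variance components.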
Hierarchical versus Marginal Model
• The LMM is given by:
Yi = Xi β + Zi bi + εi
bi ∼ N(0, D), εi ∼ N(0, Σi )
b1 , b2 , ..., bN , ε1 , ε2 , ..., εN independent
• It can be rewritten as: Yi |bi ∼ N(Xi β + Zi bi , Σi ), bi ∼ N(0, D)
• It is therefore also called a hierarchical model:
1 A model for Yi given bi
2 A model for bi
• Marginally, Yi is distributed as: Yi ∼ N(Xi β, Zi D Zi′ + Σi)
Hierarchical versus Marginal Model...
• Hence, very specific assumptions are made about the dependence of
mean and covariance on the covariates Xi and Zi :
1 Implied mean: Xi β
2 Implied covariance: Vi = Zi D Zi′ + Σi
• Note that the hierarchical model implies the marginal one, NOT vice
versa.
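A small numeric sketch of the implied marginal covariance Vi = Zi D Zi′ + Σi, using hypothetical values of D and σ²:

```python
import numpy as np

# Random-intercept-and-slope model observed at 4 time points (illustrative)
t = np.array([0.0, 1.0, 2.0, 3.0])
Z = np.column_stack([np.ones_like(t), t])     # random intercept + slope
D = np.array([[4.0, 0.5],
              [0.5, 0.25]])                   # Cov(b_i), assumed values
sigma2 = 1.0                                  # Sigma_i = sigma2 * I
V = Z @ D @ Z.T + sigma2 * np.eye(len(t))     # implied marginal Cov(Y_i)
print(V)
```

Note that the random effects induce nonzero covariances between repeated measurements and variances that change over time, even though the residual errors are iid.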
Components of the Linear Mixed Effects Model
• The Mean Structure: The Xi β part stands for the fixed effects, where
Xi is a set of covariates used for modeling the response.
• The Random Effects: bi
• Variance components: elements in D and Σi
Estimation of Marginal Model
• Recall that the general linear mixed model equals
• Yi = Xi β + Zi bi + εi
• Where bi ∼ N(0, D) and εi ∼ N(0, Σi) are independent
• The marginal model is given by: Yi ∼ N(Xi β, Zi D Zi′ + Σi)
• Note that inferences based on the marginal model do not explicitly
assume the presence of random effects representing the natural
heterogeneity between subjects
Estimation of Marginal Model....
• Notations:
• β : vector of fixed effects
• α: vector of all variance components in D and Σi
• θ = (β′, α′)′
• Marginal likelihood function:
LML(θ) = ∏i=1..N (2π)^(−ni/2) |Vi(α)|^(−1/2) exp(−(1/2) (Yi − Xi β)′ Vi(α)^(−1) (Yi − Xi β))
• If α were known, the MLE of β equals:
β̂(α) = (Σi=1..N Xi′ Wi Xi)^(−1) Σi=1..N Xi′ Wi Yi
• where Wi equals Vi^(−1)
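Assuming α (and hence Vi) known, the GLS estimator β̂(α) is direct linear algebra; the data and parameter values below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_i, p = 25, 4, 2
t = np.arange(n_i, dtype=float)
X = np.column_stack([np.ones(n_i), t])        # X_i, same for every subject here
Z = np.ones((n_i, 1))                          # random intercept only
D = np.array([[4.0]])                          # assumed variance of b_i
sigma2 = 1.0
V = Z @ D @ Z.T + sigma2 * np.eye(n_i)        # V_i(alpha), alpha taken as known
W = np.linalg.inv(V)                           # W_i = V_i^{-1}

beta_true = np.array([5.0, 1.5])
lhs = np.zeros((p, p))
rhs = np.zeros(p)
for _ in range(N):
    b = rng.normal(0, 2.0)                     # random intercept, Var = 4
    y = X @ beta_true + b + rng.normal(0, 1.0, n_i)
    lhs += X.T @ W @ X                         # accumulate sum_i X_i' W_i X_i
    rhs += X.T @ W @ y                         # accumulate sum_i X_i' W_i Y_i
beta_hat = np.linalg.solve(lhs, rhs)          # GLS estimate beta_hat(alpha)
print(beta_hat)
```

In practice α is unknown and Wi must be built from its ML or REML estimate, which is the subject of the next slides.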
Estimation of Marginal Model....
• In most cases, α is not known, and needs to be replaced by an
estimate α̂
• Two frequently used estimation methods for α:
1 Maximum likelihood
2 Restricted maximum likelihood
Maximum Likelihood Estimation
• α̂ML obtained from maximizing LML (α, β̂(α)) with respect to α
• The resulting estimate β̂(α̂ML ) for β will be denoted by β̂ML
• α̂ML and β̂ML can also be obtained from maximizing LML (θ) with
respect to θ, i.e., with respect to α and β simultaneously.
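In statsmodels, the same mixed model can be fitted by either method through the reml argument of MixedLM.fit; the data below are simulated and the parameter values hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
N, n_i = 30, 4
subj = np.repeat(np.arange(N), n_i)
time = np.tile(np.arange(n_i, dtype=float), N)
y = 4 + 1.0 * time + rng.normal(0, 2, N)[subj] + rng.normal(0, 1, N * n_i)
df = pd.DataFrame({"y": y, "time": time, "subject": subj})

model = smf.mixedlm("y ~ time", df, groups=df["subject"])
fit_reml = model.fit(reml=True)    # maximizes the REML likelihood (default)
fit_ml = model.fit(reml=False)     # maximizes the ordinary (ML) likelihood

v_reml = fit_reml.cov_re.iloc[0, 0]   # estimated random-intercept variance
v_ml = fit_ml.cov_re.iloc[0, 0]
print(v_reml, v_ml)
```

The two variance-component estimates are usually close in large samples; the REML estimate corrects for the loss of degrees of freedom from estimating the fixed effects.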
Restricted Maximum Likelihood Estimation(REML)
• Recall Variance Estimation in Normal Populations
• Consider a sample of N observations Y1 , ..., YN from N(µ, σ 2 )
• For known µ, the MLE of σ² equals: σ̂² = Σi (Yi − µ)²/N
• This σ̂² is unbiased for σ²
• When µ is not known, the MLE of σ² equals: σ̂² = Σi (Yi − Ȳ)²/N
• Note that σ̂² is now biased for σ²: E(σ̂²) = ((N − 1)/N) σ²
Restricted Maximum Likelihood Estimation(REML)..
• The bias expression tells us how to derive an unbiased estimate:
S² = Σi (Yi − Ȳ)²/(N − 1)
• Apparently, having to estimate µ introduces bias in the MLE of σ²
• How can we estimate σ² without estimating µ first?
• The model for all data simultaneously: Y ∼ N(µ1N, σ² IN)
Restricted Maximum Likelihood Estimation(REML)..
• We transform Y such that µ vanishes from the likelihood:
U = A′Y ∼ N(0, σ² A′A)
• where A defines a set of N − 1 linearly independent error contrasts
• The MLE of σ², based on U, equals: S² = Σi (Yi − Ȳ)²/(N − 1)
• S² is called the REML estimate of σ², and S² is independent of the
choice of A
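The bias result E(σ̂²) = ((N − 1)/N) σ² and its REML correction can be checked by simulation; the sample size and variance below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma2, reps = 10, 4.0, 20000
samples = rng.normal(0, np.sqrt(sigma2), size=(reps, N))

mle = samples.var(axis=1, ddof=0)    # divide by N   (ML, mu estimated)
reml = samples.var(axis=1, ddof=1)   # divide by N-1 (REML estimate S^2)

print(mle.mean())    # close to (N-1)/N * sigma2 = 3.6
print(reml.mean())   # close to sigma2 = 4.0
```

With N = 10 the downward bias of the ML estimate (a factor 9/10) is clearly visible, while the REML estimate is unbiased.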
Restricted Maximum Likelihood Estimation(REML)..
• Consider Estimation of Residual Variance in Linear Regression
Model
• Consider a sample of N observations Y1 , ..., YN from a linear
regression model:Y ∼ N(Xβ, σ 2 I)
• MLE of σ²: σ̂² = (Y − Xβ̂)′(Y − Xβ̂)/N
• Note that σ̂² is biased for σ²: E(σ̂²) = ((N − p)/N) σ²
Restricted Maximum Likelihood Estimation(REML)..
• The bias expression tells us how to derive an unbiased estimate:
MSE = (Y − Xβ̂)′(Y − Xβ̂)/(N − p)
• The MSE can also be obtained by transforming the data orthogonally
to X:
U = A′Y ∼ N(0, σ² A′A)
• The MLE of σ 2 , based on U, now equals the mean squared error, MSE
• The MSE is again called the REML estimate of σ 2
REML for the Linear Mixed Model
• We first combine all subject-specific models Yi ∼ N(Xi β, Vi) into one
model Y ∼ N(Xβ, V), where Y is the stacked N × 1 response vector, X is
the N × p matrix of covariates, and V is the N × N block-diagonal
covariance matrix
• Again, the data Y are transformed orthogonally to X:
U = A′Y ∼ N(0, A′V(α)A)
REML for the Linear Mixed Model...
• The MLE of α, based on U is called the REML estimate, and is
denoted by α̂REML
• The resulting estimated β̂α̂REML for β will be denoted by β̂REML
• α̂REML and β̂REML can also be obtained by maximizing
LREML(θ) = |Σi Xi′ Wi(α) Xi|^(−1/2) LML(θ)
with respect to θ, i.e., with respect to α and β simultaneously.
• LREML (α, β̂(α)) is the likelihood of the error contrasts U, and is often
called the REML likelihood function.
• Note that LREML (θ) is NOT the likelihood for our original data Y
Inference for the Marginal Model
• Inference for fixed effects:
1 Wald test
2 t-test and F-test
3 Robust inference
4 LR test
• Inference for variance components:
1 Wald test
2 LR test
• Information criteria
Inference for the Fixed Effects
• Estimate for β: β̂(α) = (Σi Xi′ Wi(α) Xi)^(−1) Σi Xi′ Wi(α) Yi, with α
replaced by its ML or REML estimate
• Conditional on α, β̂(α) is multivariate normal with mean β and covariance
Var(β̂) = (Σi Xi′ Wi Xi)^(−1) (Σi Xi′ Wi Var(Yi) Wi Xi) (Σi Xi′ Wi Xi)^(−1) = (Σi Xi′ Wi Xi)^(−1)
since Var(Yi) = Wi^(−1)
• In practice one again replaces α by its ML or REML estimate
Approximate Wald Test
• For any known matrix L, consider testing H0: Lβ = 0 vs HA: Lβ ≠ 0
• Wald test statistic:
G = β̂′ L′ [L (Σi Xi′ Wi Xi)^(−1) L′]^(−1) L β̂
• The asymptotic null distribution of G is χ² with rank(L) degrees of
freedom
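Given the fitted fixed effects and their estimated covariance, the Wald statistic is a few lines of linear algebra; the numbers below are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical fitted quantities: beta_hat and its estimated covariance
beta_hat = np.array([5.1, 1.4, -0.3])
cov_beta = np.diag([0.04, 0.02, 0.05])      # plays the role of (sum_i X_i' W_i X_i)^{-1}

# Test H0: L beta = 0 for the last two coefficients jointly
L = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
Lb = L @ beta_hat
G = Lb @ np.linalg.inv(L @ cov_beta @ L.T) @ Lb    # Wald statistic
p_value = stats.chi2.sf(G, df=np.linalg.matrix_rank(L))   # chi^2 with rank(L) df
print(G, p_value)
```

Here G ≈ 99.8 on 2 degrees of freedom, so H0 is firmly rejected.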
Approximate t-test and F-test
• The Wald test is based on Var(β̂) = (Σi Xi′ Wi Xi)^(−1)
• The variability introduced by replacing α with an estimate is not taken
into account in Wald tests
• Therefore, Wald tests will only provide valid inferences in sufficiently
large samples
• In practice, this is often resolved by replacing the χ² distribution with
an appropriate F-distribution (and the normal with a t)
• For any known matrix L, consider testing H0: Lβ = 0 vs HA: Lβ ≠ 0
Approximate t-test and F-test...
• F test statistic:
F = β̂′ L′ [L (Σi Xi′ Wi Xi)^(−1) L′]^(−1) L β̂ / rank(L)
• The approximate null distribution of F is an F-distribution with
numerator degrees of freedom equal to rank(L)
• The denominator degrees of freedom are estimated from the data:
containment method, Satterthwaite approximation, Kenward-Roger
approximation, . . .
Approximate t-test and F-test...
• In the context of longitudinal data, all methods typically lead to large
numbers of degrees of freedom, and therefore also to very similar
p-values.
• For univariate(single parameter) hypotheses (rank(L) = 1) the F-test
reduces to a t-test
Likelihood Ratio Test
• Comparison of nested models with different mean structures, but
equal covariance structure
• Null hypothesis of interest equals H0: β ∈ Θβ,0 , for some subspace
Θβ,0 of the parameter space Θβ of the fixed effects β
• Notation:
LML : maximum likelihood function
θ̂ML,0 : MLE under H0
θ̂ML : MLE under general model
Likelihood Ratio Test
• Comparison of nested models with different mean structures, but
equal covariance structure
• Test statistic:
−2 ln λN = −2 ln[LML(θ̂ML,0) / LML(θ̂ML)]
• Asymptotic null distribution: χ² with degrees of freedom equal to the
difference in dimension between Θβ and Θβ,0
• LR tests for the mean structure are not valid under REML, and may
result in a negative LR test statistic!
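A sketch of the LR test for nested mean structures, fitting both models by ML (reml=False), as REML likelihoods are not comparable across mean structures; the data below are simulated.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
N, n_i = 30, 4
subj = np.repeat(np.arange(N), n_i)
time = np.tile(np.arange(n_i, dtype=float), N)
y = 3 + 1.5 * time + rng.normal(0, 1.5, N)[subj] + rng.normal(0, 1, N * n_i)
df = pd.DataFrame({"y": y, "time": time, "subject": subj})

# Nested mean structures, identical covariance structure, both fitted by ML
fit0 = smf.mixedlm("y ~ 1", df, groups=df["subject"]).fit(reml=False)
fit1 = smf.mixedlm("y ~ time", df, groups=df["subject"]).fit(reml=False)

lr = -2 * (fit0.llf - fit1.llf)            # -2 ln(lambda_N)
p_value = stats.chi2.sf(lr, df=1)          # 1 extra mean parameter
print(lr, p_value)
```

Since the simulated time effect is strong, the LR statistic is large and the reduced model is rejected.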
Inference for the Variance Components
• Inference for the mean structure is usually of primary interest.
• However, inference for the covariance structure is of interest as well:
• interpretation of the random variation in the data
• overparameterized covariance structures lead to inefficient inference
for the mean
• too restrictive models invalidate inference for the mean structure
Approximate Wald Test
• Asymptotically, ML and REML estimates of α are normally
distributed with correct mean and inverse Fisher information matrix
as covariance
Information Criteria
• LR tests can only be used to compare nested models
• How can we compare non-nested models? By comparing (penalized)
likelihood values
• The general idea behind the LR test for comparing model A to a more
extensive model B is to select model A if the increase in likelihood
under model B is small compared to the increase in complexity
• A similar argument can be used to compare non-nested models A and
B
Information Criteria...
• The model with the highest penalized log-likelihood ℓ − z(#θ) is
selected, for some function z(·) of the number of parameters #θ in the
model.
• Different functions z(·) lead to different criteria
• Where n* = N = Σi ni under ML and n* = N − p under REML

Criterion                  Definition of z(·)
Akaike (AIC)               z(#θ) = #θ
Schwarz (SBC/BIC)          z(#θ) = #θ ln(n*)/2
Hannan and Quinn (HQIC)    z(#θ) = #θ ln(ln(n*))
Bozdogan (CAIC)            z(#θ) = #θ (ln(n*) + 1)/2
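The penalized log-likelihoods ℓ − z(#θ) can be tabulated directly; the log-likelihood and parameter count below are hypothetical, and the CAIC penalty follows Bozdogan's #θ(ln(n*) + 1)/2.

```python
import math

ll = -250.0          # maximized log-likelihood l (hypothetical)
n_params = 5         # #theta: fixed effects + variance components
n_star = 120         # n* = N = sum_i n_i under ML

# Penalty z(#theta) for each criterion in the table above
criteria = {
    "AIC":  n_params,
    "SBC":  n_params * math.log(n_star) / 2,
    "HQIC": n_params * math.log(math.log(n_star)),
    "CAIC": n_params * (math.log(n_star) + 1) / 2,
}
penalized = {name: ll - z for name, z in criteria.items()}
for name, value in penalized.items():
    print(f"{name}: {value:.2f}")
```

Criteria with heavier penalties (SBC, CAIC) favor more parsimonious covariance structures than AIC.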
Model Building Strategy