
Longitudinal Data Analysis

Aboma Temesgen (Assistant Professor of Biostatistics)

Department of Statistics
College of Computing and Informatics
Haramaya University

aboma.temesgen@gmail.com
© June 2021
Contents

1 Longitudinal Study Design

2 Features of Longitudinal Data

3 Cross-sectional versus Longitudinal Data

4 Exploratory Data Analysis

5 A Model for Longitudinal Data

6 Model Estimation Techniques

Longitudinal Study Design

• Statistical techniques such as analysis of variance and regression rest on the basic assumption that the residual or error terms are independently and identically distributed (iid).
• Many types of studies, however, have designs that involve gathering repeated measurement data on dependent groups or clusters.
• Familiar examples of clusters are families, towns, litters, branches of a company, or schools.
• In each of these examples, a cluster is a collection of sub-units on which observations are made.


Introduction Cont...

• Another common form of clustering arises when data are measured repeatedly on the same unit.
• When these repeated measurements are taken over time, the study is called a longitudinal study or, in some applications, a panel study.
• Longitudinal data are a special form of repeated measurements.
• The methods of analysis used in LDA relax the independence assumption and take the more complicated data structure into account.
• LDA methods account for the interdependence of observations that is due to repeated measurement or clustering.


Introduction Cont...

• Failure to account for the effect of correlation can result in erroneous estimates of the variability of parameter estimates, and hence in misleading inference.
• This interdependence of observations can be modeled using mixed models.
• Mixed models allow us to specify a pattern for the correlation between these measurements. By doing so, we may obtain more precise estimates and draw broader inferences.
• A mixed model is a combination of fixed and random effects.


Features of Longitudinal Data

• The defining feature of longitudinal studies is that measurements of the same individuals are taken repeatedly through time.
• Longitudinal studies allow direct study of change over time.
• Objective: the primary goal is to characterize the change in response over time and the factors that influence change.
• With repeated measures on individuals, we can capture within-individual change.
• Note: measurements in a longitudinal study are commensurate, i.e., the same variable is measured repeatedly.


Features Cont...

• By comparing each individual's responses at two or more occasions, a longitudinal analysis can remove extraneous, but unavoidable, sources of variability among individuals.
• This eliminates major sources of variability, or "noise", from the estimation of within-individual change.
• Complications: repeated measures on individuals are correlated, and variability is often heterogeneous across measurement occasions.


Features Cont...

• Longitudinal data require somewhat more sophisticated statistical techniques because the repeated observations are usually (positively) correlated.
• Correlation arises due to repeated measures on the same individuals.
• The sequential nature of the measures implies that certain types of correlation structures are likely to arise.
• Correlation must be accounted for in order to obtain valid inferences.
• Heterogeneous variability must also be accounted for in order to obtain valid inferences.


Features Cont...

• Correlated data commonly arise in many applications.
• Longitudinal Studies: designs in which the outcome variable is measured repeatedly over time.
• Repeated Measures Studies: somewhat older terminology applied to a special set of longitudinal designs characterized by measurement at a common set of occasions (usually in an experimental setting under different conditions or treatments).
Terminology in LDA

• Individuals or Subjects: participants in a longitudinal study are referred to as individuals or subjects.
• Occasions: in a longitudinal study individuals are measured repeatedly at different occasions or times.
• The number of repeated observations, and their timing, can vary widely from one longitudinal study to another.
• When the number and timing of the repeated measurements are the same for all individuals, the study design is said to be "balanced" over time.
• Note: designs can be balanced even though studies may have incompleteness in data collection.


Terminology Cont...

• Correlation: an aspect of longitudinal data that complicates their statistical analysis is that repeated measures on the same individual are usually positively correlated.
• This violates the fundamental assumption of independence that is the cornerstone of many statistical techniques.
• Why are longitudinal data correlated?
• What are the potential consequences of not accounting for correlation among longitudinal data in the analysis?


Terminology Cont...

• Variability: an additional, although often overlooked, aspect of longitudinal data that complicates their statistical analysis is heterogeneous variability.
• That is, the variability of the outcome at the end of the study is often discernibly different from the variability at the start of the study.
• This violates the assumption of homoscedasticity that is the basis for standard linear regression techniques.
• Thus, there are two aspects of longitudinal data that complicate their statistical analysis: (i) repeated measures on the same individual are usually positively correlated, and (ii) variability is often heterogeneous across measurement occasions.
Notation

• Let $Y_{ij}$ denote the response variable for the $i$th individual ($i = 1, \ldots, N$) at the $j$th occasion ($j = 1, \ldots, n$).
• If the repeated measures are assumed to be equally separated in time, this notation is sufficient.
• We can represent the $n$ observations on the $N$ individuals in a two-dimensional array, with rows corresponding to individuals and columns corresponding to the responses at each occasion.
Data layout

            Occasion
ID    1      2      3     ...    n
1     y11    y12    y13   ...    y1n
2     y21    y22    y23   ...    y2n
3     y31    y32    y33   ...    y3n
...
N     yN1    yN2    yN3   ...    yNn
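As a minimal sketch (with hypothetical column names), longitudinal data stored in "long" format, one row per measurement, can be reshaped into the wide layout above with pandas:

```python
import pandas as pd

# Hypothetical long-format data: one row per (subject, occasion) pair
long = pd.DataFrame({
    "id":       [1, 1, 1, 2, 2, 2],
    "occasion": [1, 2, 3, 1, 2, 3],
    "y":        [26.5, 24.7, 22.1, 25.8, 23.0, 21.4],
})

# Pivot to the wide layout: rows = individuals, columns = occasions
wide = long.pivot(index="id", columns="occasion", values="y")
print(wide)
```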
Vector Notation

• We can group the $n$ repeated measures on the same individual into an $n \times 1$ response vector:

$$Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in} \end{pmatrix}$$

• Alternatively, we can denote the response vector $Y_i$ as $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in})'$.
Vector Notation

• We denote the expectation or mean of $Y_{ij}$ by $\mu_j = E(Y_{ij})$, where $E(\cdot)$ can be thought of as a long-run average (over individuals).
• The mean, $\mu_j$, provides a measure of the location of the center of the distribution of $Y_{ij}$.
• Variance: $\sigma_j^2 = E[Y_{ij} - E(Y_{ij})]^2 = E[Y_{ij} - \mu_j]^2$
• Covariance: $\sigma_{jk} = E[(Y_{ij} - \mu_j)(Y_{ik} - \mu_k)]$ is a measure of the linear dependence between $Y_{ij}$ and $Y_{ik}$.
• Correlation: $\rho_{jk} = \dfrac{E[(Y_{ij} - \mu_j)(Y_{ik} - \mu_k)]}{\sigma_j \sigma_k}$
Vector Notation

• For the vector of repeated measures, $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in})'$, we define the variance-covariance matrix, $\text{Cov}(Y_i)$:

$$\text{Cov}(Y_i) = \begin{pmatrix}
\text{Var}(Y_{i1}) & \text{Cov}(Y_{i1}, Y_{i2}) & \cdots & \text{Cov}(Y_{i1}, Y_{in}) \\
\text{Cov}(Y_{i2}, Y_{i1}) & \text{Var}(Y_{i2}) & \cdots & \text{Cov}(Y_{i2}, Y_{in}) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(Y_{in}, Y_{i1}) & \text{Cov}(Y_{in}, Y_{i2}) & \cdots & \text{Var}(Y_{in})
\end{pmatrix}
= \begin{pmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{pmatrix}$$
Vector Notation

• We can also define the correlation matrix, $\text{Corr}(Y_i)$:

$$\text{Corr}(Y_i) = \begin{pmatrix}
1 & \rho_{12} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \cdots & \rho_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \cdots & 1
\end{pmatrix}$$

• Example: exposure to lead during infancy is associated with substantial deficits in tests of cognitive ability. It is treated with chelation, which usually requires injections and hospitalization. In the Treatment of Lead-Exposed Children (TLC) trial, 100 children were randomized to placebo or succimer to examine changes in blood lead level during the course of treatment.
Covariance Matrix

        W0      W1      W4      W6
W0    25.24   22.75   24.26   21.42
W1    22.75   29.82   27.04   23.38
W4    24.26   27.04   33.10   28.22
W6    21.42   23.38   28.22   31.81

• Estimated covariance matrix of the blood lead levels for children in the placebo group.

        W0      W1      W4      W6
W0     1.00    0.83    0.84    0.76
W1     0.83    1.00    0.86    0.76
W4     0.84    0.86    1.00    0.87
W6     0.76    0.76    0.87    1.00

• Estimated correlation matrix of the blood lead levels for children in the placebo group.
Observations about Correlation in Longitudinal Data

• Empirical observations about the nature of the correlation among repeated measures in longitudinal studies:
• (i) correlations are positive,
• (ii) correlations decrease with increasing time separation,
• (iii) correlations between repeated measures rarely ever approach zero, and
• (iv) the correlation between a pair of repeated measures taken very closely together in time rarely approaches one.


Consequences of Ignoring Correlation

• The potential impact of ignoring correlation can be illustrated using data from the Treatment of Lead-Exposed Children (TLC) trial.
• For simplicity, consider only the first two repeated measures, taken at week 0 and week 1.
• It is of interest to determine the change in the mean response over time.
• An estimate of change is given by $\hat{\delta} = \hat{\mu}_2 - \hat{\mu}_1$,
• where $\hat{\mu}_j = \frac{1}{N}\sum_{i=1}^{N} Y_{ij}$.
Consequences of Ignoring Correlation...

• In the TLC trial, the estimate of change in the succimer group is $-13.0$ (i.e., $13.5 - 26.5$).
• For inferences, we also need a standard error.
• Variance: $\text{Var}(\hat{\delta}) = \text{Var}\left(\frac{1}{N}\sum_{i=1}^{N}(Y_{i2} - Y_{i1})\right) = \frac{1}{N}(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$
• Note: the term $-2\sigma_{12}$ accounts for the correlation among the repeated measures.
• Substituting estimates of the variances and covariance into this expression:
$$\widehat{\text{Var}}(\hat{\delta}) = \tfrac{1}{50}(25.2 + 58.9 - 2 \times 15.5) = 1.06$$
Consequences of Ignoring Correlation...

• What if we had ignored that the data are correlated and proceeded with an analysis assuming all observations are independent?
• Independence ⇒ zero covariance.
• This leads to the (incorrect) variance estimate $\widehat{\text{Var}}(\hat{\delta}) = \tfrac{1}{50}(25.2 + 58.9) = 1.68$,
• which is approximately 1.6 times larger.
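The calculation can be reproduced numerically; the following sketch uses the variance and covariance estimates quoted above for the succimer group:

```python
# Estimates quoted in the text: Var(Y1), Var(Y2), Cov(Y1, Y2), N subjects
N, s11, s22, s12 = 50, 25.2, 58.9, 15.5

var_correct = (s11 + s22 - 2 * s12) / N  # accounts for correlation: 1.06
var_naive   = (s11 + s22) / N            # assumes independence: 1.68
print(var_naive / var_correct)           # approximately 1.6
```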


Consequences of Ignoring Correlation...

• In this illustration, ignoring the correlation results in:
• standard errors that are too large,
• confidence intervals that are too wide, and
• p-values for the test of $H_0: \delta = 0$ that are too large.
• In general, failure to take account of the correlation (covariance) among the repeated measures will result in incorrect estimates of the sampling variability and can lead to quite misleading scientific inferences.
• Correlation and heterogeneous variability must be accounted for in order to obtain valid inferences about change in response over time.


Cross-sectional versus Longitudinal Data

• Cross-sectional data refers to data collected at a specific point in time.
• Longitudinal data refers to measurements made repeatedly over time to study how the subjects evolve over time.
• The concern of longitudinal data analysis is change over time.
• Observations in cross-sectional data are uncorrelated, whereas in a longitudinal study the measurements made on a subject over a period of time (a cluster) are correlated.
• Repeated measures are obtained when a response is measured repeatedly on a set of units.


Consider the relation between response Y and Age

Figure: Y versus Age, cross-sectional case

• The graph suggests a negative relation between Y and Age.
• Actually, the measurements were made on 7 subjects at 2 time occasions.


Consider the relation between response Y and Age...

Figure: Y versus Age, longitudinal case

• Are we now still inclined to conclude that there is a negative relation between Y and Age?
• This shows how longitudinal data reveal the subject-specific evolution over time (negative between subjects, positive within subjects).


Consider the relation between response Y and Age...

Figure: Y versus Age, longitudinal case

• The graph now suggests that both the cross-sectional and the longitudinal trends are negative.


Consider the relation between response Y and Age...

• In both plots, exactly the same observations could have been obtained in a longitudinal study with 2 measurements per subject.
• Conclusion: longitudinal data allow us to distinguish differences between subjects from changes within subjects.


LDA data

• Units of measurement are:
• Subjects, patients, participants, ...
• Animals, plants, ...
• Clusters: families, towns, branches of a company, ...


LDA data View

• Example 1: height of infants measured after birth at Jimma University Specialized Hospital.
• Example 2: CD4 counts measured for HIV-infected patients at Jimma University to track the progression of HIV in the patients.
• Example 3: measures of blood lead level at baseline, 1, 4, and 6 weeks in the Treatment of Lead-Exposed Children trial.


Simple Methods for Analyzing Longitudinal Data

Simple Methods LDA

• The reason why classical techniques fail in the context of longitudinal data is that observations within subjects are correlated.
• In many cases the correlation between two repeated measurements decreases as the time span between them increases. E.g., for a growth curve with 5 repeated measurements:

$$\begin{pmatrix}
1.00 & 0.95 & 0.96 & 0.93 & 0.87 \\
0.95 & 1.00 & 0.97 & 0.96 & 0.89 \\
0.96 & 0.97 & 1.00 & 0.98 & 0.94 \\
0.93 & 0.96 & 0.98 & 1.00 & 0.98 \\
0.87 & 0.89 & 0.94 & 0.98 & 1.00
\end{pmatrix}$$
Simple Methods LDA...

• A correct analysis should account for this.
• The simplest example of longitudinal data is paired observations.
• The paired t-test accounts for this by considering subject-specific differences: $\Delta_i = Y_{i1} - Y_{i2}$
• This reduces the number of measurements to just one per subject, which implies that classical techniques can be applied.


Simple Methods...

• In the case of more than 2 measurements per subject, similar simple techniques are often applied to reduce the number of measurements for the ith subject from $n_i$ to 1.
• Some examples:
1 Analysis at each time point separately
2 Analysis of Area Under the Curve (AUC)
3 Analysis of endpoints
4 Analysis of increments
5 Analysis of covariance
Analysis at Each Time Point

• The data are analyzed at each occasion $t_1, t_2, \ldots, t_j$ separately (comparisons at each time point, e.g., with independent t-tests).
• Advantages: simple to interpret and uses all available data.
• Disadvantages: does not consider 'overall' differences, does not allow studying the evolution of differences, and suffers from the problem of multiple testing.


Analysis of Area Under the Curve

• For each subject, the area under its curve is calculated (see the sketch after this list):
$$\text{AUC}_i = (t_{i2} - t_{i1})(y_{i1} + y_{i2})/2 + (t_{i3} - t_{i2})(y_{i2} + y_{i3})/2 + \cdots$$
• Advantages:
1 No problems of multiple testing
2 Does not explicitly assume balanced data
3 Compares 'overall' differences
• Disadvantage: uses only partial information: $\text{AUC}_i$
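A minimal sketch of the AUC computation for one subject, directly implementing the trapezoidal formula above (the times and values shown are illustrative only):

```python
import numpy as np

def auc(times, values):
    # Trapezoidal area under one subject's profile:
    # sum over j of (t_{i,j+1} - t_{ij}) * (y_{ij} + y_{i,j+1}) / 2
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    return np.sum(np.diff(t) * (y[:-1] + y[1:]) / 2)

# Example: a profile measured at weeks 0, 1, 4, 6 (hypothetical values)
print(auc([0, 1, 4, 6], [26.5, 13.5, 15.5, 20.8]))
```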


Analysis of Endpoints

• In randomized studies (treatments assigned randomly), there are no systematic differences at baseline. Hence, 'treatment' effects can be assessed by comparing only the measurements at the last occasion.
• Advantages:
1 No problem of multiple testing
2 Does not explicitly assume balanced data
• Disadvantages:
1 Uses only partial information: $y_{in_i}$
2 Only valid for large data sets


Analysis of Increments

• A simple method to compare evolutions between subjects, correcting for differences at baseline, is to analyze the subject-specific changes $y_{in_i} - y_{i1}$, e.g., with a paired t-test (see the sketch after this list).
• Advantages:
1 No problem of multiple testing
2 Does not explicitly assume balanced data
• Disadvantages: uses only partial information: $y_{in_i} - y_{i1}$
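A minimal sketch of the analysis of increments via a paired t-test using scipy (the baseline and final values are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical first and last measurements for the same five subjects
y_first = np.array([26.5, 24.2, 28.1, 23.9, 25.4])
y_last  = np.array([20.8, 22.5, 24.0, 21.7, 23.1])

# Paired t-test on the subject-specific increments y_{i,n_i} - y_{i1}
t_stat, p_value = stats.ttest_rel(y_last, y_first)
print(t_stat, p_value)
```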


Analysis of Covariance

• Another way of correcting for differences at baseline is to use analysis of covariance techniques, where the first measurement is included as a covariate in the model.
• Advantages:
1 No problem of multiple testing
2 Does not explicitly assume balanced data
• Disadvantages: uses only partial information and does not take into account the variability of $y_{i1}$


General Disadvantage of Simple methods

• In this way, the analysis of longitudinal data is reduced to the analysis of independent observations, for which classical statistical procedures are available.
• However, all these methods have the disadvantage that a lot of information is lost.
• Further, they often do not allow conclusions to be drawn about the way the endpoints have been reached.


Exploratory Data Analysis
Exploratory Data Analysis

• Exploratory analysis comprises techniques to visualize patterns in the data.
• Data analysis must begin by making displays that expose patterns relevant to the scientific question.
• The best methods are capable of uncovering unexpected patterns.
• In this regard graphical displays are important; they:
1 Highlight aggregate patterns of scientific interest
2 Identify both cross-sectional and longitudinal patterns
3 Make it possible to identify unusual individuals and unusual observations

Exploratory Data Analysis...

• Most longitudinal studies address the relationship of a response with explanatory variables, often including time.
• A LMM also makes assumptions about:
1 Individual (subject-specific) profiles: linear, quadratic, ...
2 The average evolution: (non-)linear, covariates, ...
3 The variance function: constant, quadratic, ...
4 The correlation structure: constant, serial, ...
• Data exploration is a very helpful tool in the selection of appropriate models.


Exploring Individual Profile

Figure: Blood lead level profiles for children treated with succimer

• Helps to identify how an individual subject (cluster) evolves over time
• Suggests different intercepts and slopes for time
• Suggests whether individual profiles are linear or nonlinear (e.g., quadratic)


Exploring the Mean Structure

• For balanced data, the average values are computed at each time point separately and connected.
• If the data are not balanced, loess smoothing is used instead.
• This gives an idea of how the mean profile of the population evolves over time.
• The results of this exploration are useful for choosing a fixed-effects structure for the LMM (see the sketch below).
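A minimal sketch of both approaches, with simulated stand-in data; the lowess smoother from statsmodels handles the unbalanced case:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(2)

# Balanced case: average the N x n response array at each occasion
Y = rng.normal(25, 5, size=(50, 4))
occasion_means = Y.mean(axis=0)

# Unbalanced case: loess smooth of all pooled (time, response) pairs
t_all = rng.uniform(0, 6, size=200)
y_all = 25 - 2.5 * t_all + 0.3 * t_all**2 + rng.normal(0, 2, size=200)
smoothed = lowess(y_all, t_all, frac=0.5)  # columns: sorted time, fitted mean
```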


Exploring Mean Structure..

Figure: Blood lead level mean structure

• Shows how the average blood lead level evolves over time
• Suggests linear fixed effects with a possible quadratic term


Modelling the Mean: Analysis of Response Profiles

• Basic idea: compare groups of subjects in terms of mean response profiles over time.
• Useful for balanced longitudinal designs and when there is a single categorical covariate (perhaps denoting different treatment or exposure groups).
• Analysis of response profiles can be extended to handle more than a single group factor.
• Analysis of response profiles can also handle missing data.
• Example: consider children randomized to placebo or succimer, with measures of blood lead level at baseline, 1, 4, and 6 weeks and their mean response profiles.
Mean Profile

Figure: Mean blood lead levels at baseline, week 1, week 4, and week 6 in the
succimer and placebo groups
Hypotheses concerning response profiles

Given a sequence of n repeated measures on a number of distinct groups of individuals, there are three main questions:
1 Are the mean response profiles similar in the groups, in the sense that the mean response profiles are parallel? This is a question of a group-by-time interaction effect.
2 Assuming the mean response profiles are parallel, are the means constant over time, in the sense that the mean response profiles are flat? This is a question of a time effect.
3 Assuming that the population mean response profiles are parallel, are they also at the same level, in the sense that the mean response profiles for the groups coincide? This is a question of a group effect.
Note: for many longitudinal studies, especially longitudinal clinical trials, the main interest is the group-by-time interaction effect.
Exploring the mean Structure

• In many studies the true underlying mean response process changes over time in a relatively smooth, monotonically increasing or decreasing pattern.
• Fitting parsimonious models for the mean response results in statistical tests of covariate effects (e.g., treatment-by-time interactions) with greater power than in the analysis of response profiles.
• The simplest possible curve for describing changes in the mean response over time is a straight line; the slope has a direct interpretation in terms of a constant rate of change in mean response for a single unit change in time.
change in time
Exploring the mean Structure...

• When changes in the mean response over time are not linear, higher-order polynomial trends can be considered.
• For example, if the means are monotonically increasing or decreasing over the course of the study, but in a curvilinear way, a model with quadratic trends can be considered.
• In a quadratic trend model the rate of change in the mean response is not constant but depends on time; the rate of change must be expressed in terms of two parameters.

Exploring the mean Structure...

• If collinearity among the time terms is suspected, it is advisable to consider centering time.
• Linear splines: if the simplest curve is a straight line, then one way to extend the curve is to have a sequence of joined line segments that produces a piecewise linear pattern.
• A constant effect (no change over time) can also be considered.


Exploring the variance Structure

• The variance function is given by: $\sigma^2(t) = E[Y(t) - \mu(t)]^2$
• Hence, an estimate of $\sigma^2(t)$ can be obtained by applying any of the techniques described for exploring the mean structure to the squared residuals $r_{ij}^2$.
• When we estimate the covariance matrix without making any particular assumption about the covariance structure, we say that we are using an unrestricted or unstructured covariance matrix.
• As we shall see later, it is sometimes advantageous to model the covariance structure more parsimoniously.


Exploring variance Structure..

Figure: The variance structure of blood lead level

• Shows how the variance of blood lead level evolves over time
• Suggests a linear variance function with a possible quadratic component


Modelling the Covariance

• Longitudinal data present two aspects that require modelling: the mean response over time and the covariance.
• Although these two aspects can be modelled separately, they are interrelated.
• The choices of models for the mean response and for the covariance are interdependent.
• A model for the covariance must be chosen on the basis of some assumed model for the mean response.
• Recall: the covariance between any pair of residuals, say $[Y_{ij} - \mu_{ij}(\beta)]$ and $[Y_{ik} - \mu_{ik}(\beta)]$, depends on the model for the mean, i.e., depends on $\beta$.
Modelling the Covariance

Three broad approaches can be distinguished:
1 "Unstructured" or arbitrary pattern of covariance
2 Covariance pattern models
3 Random effects covariance structure


Unstructured Covariance

• Appropriate when the design is "balanced" and the number of measurement occasions is relatively small.
• No explicit structure is assumed other than homogeneity of covariance across different individuals, $\text{Cov}(Y_i) = \Sigma_i = \Sigma$.
• Chief advantage: no assumptions are made about the patterns of variances and covariances.
• Potential drawback: when the number of covariance parameters is large relative to the sample size, estimation is likely to be very unstable.
Covariance Pattern Models

• When attempting to impose some structure on the covariance, a delicate balance needs to be reached.
• With too little structure there may be too many parameters to be estimated with a limited amount of data.
• With too much structure there is a potential risk of model misspecification and misleading inferences concerning $\beta$.
• This is the classic tradeoff between bias and variance.
• Covariance pattern models have their basis in models for serial correlation originally developed for time series data.


Compound Symmetry

• Assumes the variance is constant across occasions, say $\sigma^2$, and $\text{Corr}(Y_{ij}, Y_{ik}) = \rho$ for all $j \ne k$.
• $\text{Cov}(Y_i) = \sigma^2 R(\rho)$, where $R(\rho)$ has 1 on the diagonal and $\rho$ everywhere off the diagonal (see the sketch below).
• Parsimonious: two parameters regardless of the number of measurement occasions.
• Makes strong assumptions about variance and correlation that are usually not valid with longitudinal data.
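A minimal sketch constructing a compound-symmetry covariance matrix:

```python
import numpy as np

def compound_symmetry(n, sigma2, rho):
    # Cov(Y_i) = sigma^2 * R(rho), with R_jj = 1 and R_jk = rho for j != k
    R = np.full((n, n), rho)
    np.fill_diagonal(R, 1.0)
    return sigma2 * R

print(compound_symmetry(4, sigma2=30.0, rho=0.8))
```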


Toeplitz

• Assumes the variance is constant across occasions, say $\sigma^2$, and $\text{Corr}(Y_{ij}, Y_{i,j+k}) = \rho_k$ for all $j$ and $k$.
• $\text{Cov}(Y_i) = \sigma^2 R$, where $R$ is the Toeplitz correlation matrix with entry $\rho_k$ on the $k$th off-diagonal (see the sketch below).
• Assumes the correlation among responses at adjacent measurement occasions is constant, $\rho_1$.
• Toeplitz is only appropriate when measurements are made at equal (or approximately equal) intervals of time.
• A special case of the Toeplitz covariance is the (first-order) autoregressive covariance.
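A minimal sketch of a Toeplitz covariance matrix, using scipy's toeplitz constructor (the rho values are illustrative only):

```python
import numpy as np
from scipy.linalg import toeplitz

# First column (1, rho_1, rho_2, rho_3): one correlation per off-diagonal band
sigma2 = 30.0
first_column = np.array([1.0, 0.8, 0.7, 0.6])
cov = sigma2 * toeplitz(first_column)
```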
Toeplitz

• Assumes the variance is constant across occasions, say $\sigma^2$, and $\text{Corr}(Y_{ij}, Y_{i,j+k}) = \rho_k$ for all $j$ and $k$.
• Only appropriate when the measurements are made at equal (or approximately equal) intervals of time.
• Compound symmetry, Toeplitz, and autoregressive covariances all assume that the variances are constant across time.
• This assumption can be relaxed by considering versions of these models with heterogeneous variances, $\text{Var}(Y_{ij}) = \sigma_j^2$.


Exponential

• When measurement occasions are not equally spaced over time, the autoregressive model can be generalized as follows.
• Let $\{t_{i1}, \ldots, t_{in}\}$ denote the observation times for the ith individual and assume that the variance is constant across all measurement occasions, say $\sigma^2$, with $\text{Corr}(Y_{ij}, Y_{ik}) = \rho^{|t_{ij} - t_{ik}|}$ for $\rho \ge 0$.
• The correlation between any pair of repeated measures decreases exponentially with the time separation between them.


Exponential

• Referred to as "exponential" covariance because it can be re-expressed as
$$\text{Cov}(Y_{ij}, Y_{ik}) = \sigma^2 \rho^{|t_{ij} - t_{ik}|} = \sigma^2 \exp(-\theta|t_{ij} - t_{ik}|),$$
where $\theta = -\log(\rho)$, or $\rho = \exp(-\theta)$ for $\theta \ge 0$.
• The exponential covariance model is invariant under linear transformations of the time scale.
• If we replace $t_{ij}$ by $(a + bt_{ij})$ (e.g., if we replace time measured in "weeks" by time measured in "days"), the same form for the covariance matrix holds (see the sketch below).
• Note: the choices of models for the covariance and the mean are interdependent.

Exploring the Correlation Structure

• For balanced longitudinal data, the correlation structure can be studied through the correlation matrix or a scatterplot matrix.


Exploring Correlation Structure..

Figure: The correlation structure

• Shows how the correlation among repeated measures evolves over time
• Suggests a constant correlation structure


Exploring the Random Effects...

• The first step in the model-building process for a linear mixed-effects model, after the functional form of the model has been decided, is choosing which parameters in the model, if any, should have a random-effect component included to account for between-group variation.
Model for Longitudinal Data
Notation of General Linear Model

• Previously, we assumed a sample of N subjects measured repeatedly at n occasions.
• Either by design or happenstance, subjects may not have the same number of repeated measures or be measured at the same set of occasions.
• We assume there are $n_i$ repeated measurements on the ith subject, and each $Y_{ij}$ is observed at time $t_{ij}$.


Notation of General Linear Model

• We can group the response variables for the ith subject into an $n_i \times 1$ vector:

$$Y_i = \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ \vdots \\ Y_{in_i} \end{pmatrix}$$

• Associated with $Y_{ij}$ there is a $p \times 1$ vector of covariates:

$$X_{ij} = \begin{pmatrix} X_{ij1} \\ X_{ij2} \\ \vdots \\ X_{ijp} \end{pmatrix}, \quad j = 1, 2, \ldots, n_i, \; i = 1, 2, \ldots, N$$

• Note: information about the time of observation, treatment or exposure group, and other predictor and confounding variables can be expressed through this vector of covariates.
Notation of General Linear Model

• We can group the vectors of covariates into an $n_i \times p$ matrix:

$$X_i = \begin{pmatrix}
X_{i11} & X_{i12} & \cdots & X_{i1p} \\
X_{i21} & X_{i22} & \cdots & X_{i2p} \\
\vdots & \vdots & \ddots & \vdots \\
X_{in_i1} & X_{in_i2} & \cdots & X_{in_ip}
\end{pmatrix}$$

• $X_i$ is simply an ordered collection of the values of the p covariates for the ith subject at the $n_i$ occasions.

Linear Models for Longitudinal Data

• Throughout this course we consider linear regression models for changes in the mean response over time:
$$Y_{ij} = \beta_1 X_{ij1} + \beta_2 X_{ij2} + \cdots + \beta_p X_{ijp} + e_{ij},$$
• where $\beta_1, \beta_2, \ldots, \beta_p$ are unknown regression coefficients.
• The $e_{ij}$ are random errors, with mean zero, and represent deviations of the $Y_{ij}$'s from their means:
$$E(Y_{ij}|X_{ij}) = \beta_1 X_{ij1} + \beta_2 X_{ij2} + \cdots + \beta_p X_{ijp} = X_{ij}'\beta$$
• In vector form, $E(Y_i|X_i) = X_i\beta$.
Model Assumptions

1 The individuals represent a random sample from the population of interest.
2 The elements of the vector of repeated measures $Y_{i1}, \ldots, Y_{in}$ have a multivariate normal (MVN) distribution.
3 Observations from different individuals are independent, while repeated measurements on the same individual are not assumed to be independent.
• The covariance matrix of the vector of observations, $Y_{i1}, \ldots, Y_{in}$, is denoted $\Sigma$ and its elements are $\sigma_{jj'}$.


Probability Distribution

• A probability distribution describes the likelihood or relative frequency of occurrence of particular values of the response (or dependent) variable.
• Recall that a single response variable $Y_i$ with a normal probability density in the standard linear regression model has:
$$f(y_i|X_{i1}, X_{i2}, \ldots, X_{ip}) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(y_i - \beta_1 X_{i1} - \beta_2 X_{i2} - \cdots - \beta_p X_{ip})^2}{2\sigma^2}\right\}$$
• With repeated measures we have a vector of response variables and must consider joint probability models for the entire vector of responses.
• The multivariate normal distribution is an extension of the normal distribution for a single response to a vector of responses.


Multivariate Normal Distribution

• The multivariate normal probability density function for $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in})'$ has the following representation:
$$f(Y_i|X_i) = (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(Y_i - X_i\beta)'\Sigma^{-1}(Y_i - X_i\beta)\right]$$
• Note that $f(Y_i|X_i)$ describes the probability or relative frequency of occurrence of a particular set of values of $(Y_{i1}, Y_{i2}, \ldots, Y_{in})$.


Multivariate Normal Distribution

• $f(Y_i|X_i)$ depends to a very large extent on $(Y_i - X_i\beta)'\Sigma^{-1}(Y_i - X_i\beta)$.
• Note that $f(Y_i|X_i)$ describes the probability or relative frequency of occurrence of a particular set of values of $(Y_{i1}, Y_{i2}, \ldots, Y_{in})$.


Model For Longitudinal Data

• In practice we often have unbalanced data, i.e., unequal numbers of measurements per subject at different time points.
• The model we use should account for these differences.
• Often, subject-specific longitudinal profiles can be well approximated by linear regression functions.
• The model that accounts for subject-specific differences and unbalanced data is known as the linear mixed model (LMM) for continuous response cases.
• Let us see some concepts before we proceed to the modeling approaches.

Linear Mixed Models

• Linear mixed models (LMM) are models that handle data where observations are not independent.
• LMM correctly models correlated errors, whereas procedures in the general linear model family (GLM) usually do not.
• GLM includes such procedures as t-tests, analysis of variance, correlation, regression, and factor analysis, to name a few.
• LMM can be considered a further generalization of GLM to better support analysis of a continuous response. Mixed models contain both fixed and random effects.
• Fixed effects: factors for which the only levels under consideration are contained in the coding of those effects.
Linear Mixed Models...

• Random effects: factors for which the levels contained in the coding of those factors are a random sample of the total number of levels in the population for that factor.
• Hierarchical effects: hierarchical designs have nested effects, i.e., subjects nested within groups.
• For instance, patients are nested within doctors and doctors are nested within hospitals.
• Repeated measures: observations are correlated rather than independent (e.g., before-after studies, time series data, matched-pairs designs).
Linear Mixed Models...

• Fixed-effect parameters are commonly used in linear and generalized linear models.
• In contrast, a random effect is a random variable. It does not make sense to estimate a random effect; instead, we try to estimate the parameters that describe the distribution of this random effect.
• In some analyses, random effects are used simply to induce a certain correlation structure in the data, and there is a sense in which the chosen levels represent a sample from a population.
Formulation of Linear Mixed Models

• The 2-stage model formulation
• In practice we often have unbalanced data:
1 Unequal numbers of measurements per subject
2 Measurements not taken at fixed time points
• Therefore, multivariate regression techniques are often not applicable.
• Often, subject-specific longitudinal profiles can be well approximated by linear regression functions.
• This leads to a 2-stage model formulation:

The 2-Stage Model Formulation

• Stage 1: a linear regression model for each subject separately (different parameters for different subjects)
• Stage 2: explain the variability in the subject-specific regression coefficients using known covariates (overall evolution with known covariates)
Stage 1

• Response $Y_{ij}$ for the ith subject, measured at time $t_{ij}$, $i = 1, \ldots, N$, $j = 1, \ldots, n_i$
• Response vector $Y_i$ for the ith subject: $Y_i = (Y_{i1}, Y_{i2}, \ldots, Y_{in_i})'$
• Stage 1 model: $Y_i = Z_i\beta_i + \varepsilon_i$
• where $Z_i$ is an $n_i \times q$ matrix of known covariates and $\beta_i$ is a q-dimensional vector of subject-specific regression coefficients
• $\varepsilon_i \sim N(0, \Sigma_i)$, often with $\Sigma_i = \sigma^2 I_{n_i}$
• Note that the above model describes the observed variability within subjects.
Stage 2

• Between-subject variability can now be studied by relating the $\beta_i$ to known covariates
• Stage 2 model: $\beta_i = K_i\beta + b_i$
• where $K_i$ is a $q \times p$ matrix of known covariates and $\beta$ is a p-dimensional vector of unknown regression parameters
• $b_i \sim N(0, D)$
The General Linear Mixed-effects Model

• The 2-stage approach can be performed explicitly in the analysis.
• However, this is just another example of the use of summary statistics:
• $Y_i$ is summarized by $\beta_i$
• the summary statistics $\beta_i$ are analyzed in the second stage
• The associated drawbacks can be avoided by combining the two stages into one model:


The General Linear Mixed-effects Model...

• Substituting the stage 2 model into the stage 1 model gives:
$$Y_i = Z_i\beta_i + \varepsilon_i, \quad \beta_i = K_i\beta + b_i \;\Rightarrow\; Y_i = \underbrace{Z_iK_i}_{X_i}\beta + Z_ib_i + \varepsilon_i$$
• General linear mixed-effects model:
$$Y_i = X_i\beta + Z_ib_i + \varepsilon_i, \qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i),$$
with $b_1, b_2, \ldots, b_N, \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$ independent.
• Fixed effects: $\beta$
• Random effects: $b_i$
• Variance components: the elements in $D$ and $\Sigma_i$


Hierarchical versus Marginal Model

• The LMM is given by:
$$Y_i = X_i\beta + Z_ib_i + \varepsilon_i, \qquad b_i \sim N(0, D), \quad \varepsilon_i \sim N(0, \Sigma_i),$$
with $b_1, b_2, \ldots, b_N, \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_N$ independent.
• It can be rewritten as: $Y_i|b_i \sim N(X_i\beta + Z_ib_i, \Sigma_i)$, $b_i \sim N(0, D)$
• It is therefore also called a hierarchical model:
1 A model for $Y_i$ given $b_i$
2 A model for $b_i$
• Marginally, $Y_i$ is distributed as: $Y_i \sim N(X_i\beta, Z_iDZ_i' + \Sigma_i)$


Hierarchical versus Marginal Model...

• Hence, very specific assumptions are made about the dependence of the mean and covariance on the covariates $X_i$ and $Z_i$:
1 Implied mean: $X_i\beta$
2 Implied covariance: $V_i = Z_iDZ_i' + \Sigma_i$
• Note that the hierarchical model implies the marginal one, NOT vice versa.
Components of the Linear Mixed Effects Model

• The mean structure: the $X_i\beta$ part stands for the fixed effects, where $X_i$ is a set of covariates used for modeling the response.
• The random effects: $b_i$
• Variance components: the elements in $D$ and $\Sigma_i$ (see the sketch below)

Estimation of Marginal Model

• Recall that the general linear mixed model equals $Y_i = X_i\beta + Z_ib_i + \varepsilon_i$,
• where $b_i \sim N(0, D)$ and $\varepsilon_i \sim N(0, \Sigma_i)$ are independent.
• The marginal model is given by: $Y_i \sim N(X_i\beta, Z_iDZ_i' + \Sigma_i)$
• Note that inferences based on the marginal model do not explicitly assume the presence of random effects representing the natural heterogeneity between subjects.
Estimation of Marginal Model....

• Notation:
• $\beta$: vector of fixed effects
• $\alpha$: vector of all variance components in $D$ and $\Sigma_i$
• $\theta = (\beta', \alpha')'$
• Marginal likelihood function:
$$L_{ML}(\theta) = \prod_{i=1}^{N}\left\{(2\pi)^{-n_i/2}\,|V_i(\alpha)|^{-1/2}\exp\left(-\tfrac{1}{2}(Y_i - X_i\beta)'\,V_i(\alpha)^{-1}(Y_i - X_i\beta)\right)\right\}$$
• If $\alpha$ were known, the MLE of $\beta$ would equal
$$\hat{\beta}(\alpha) = \left(\sum_{i=1}^{N} X_i'W_iX_i\right)^{-1}\sum_{i=1}^{N} X_i'W_iY_i,$$
• where $W_i = V_i^{-1}$ (see the sketch below).
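A minimal sketch of the generalized least squares estimator $\hat{\beta}(\alpha)$ for known $V_i$:

```python
import numpy as np

def gls_beta(X_list, Y_list, V_list):
    # beta_hat = (sum_i X_i' W_i X_i)^{-1} sum_i X_i' W_i Y_i, W_i = V_i^{-1}
    A = sum(X.T @ np.linalg.solve(V, X) for X, V in zip(X_list, V_list))
    b = sum(X.T @ np.linalg.solve(V, Y)
            for X, V, Y in zip(X_list, V_list, Y_list))
    return np.linalg.solve(A, b)
```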
Estimation of Marginal Model....

• In most cases, $\alpha$ is not known and needs to be replaced by an estimate $\hat{\alpha}$.
• Two frequently used estimation methods for $\alpha$:
1 Maximum likelihood
2 Restricted maximum likelihood

Maximum Likelihood Estimation

• $\hat{\alpha}_{ML}$ is obtained by maximizing $L_{ML}(\alpha, \hat{\beta}(\alpha))$ with respect to $\alpha$.
• The resulting estimate $\hat{\beta}(\hat{\alpha}_{ML})$ for $\beta$ will be denoted by $\hat{\beta}_{ML}$.
• $\hat{\alpha}_{ML}$ and $\hat{\beta}_{ML}$ can also be obtained by maximizing $L_{ML}(\theta)$ with respect to $\theta$, i.e., with respect to $\alpha$ and $\beta$ simultaneously.
Restricted Maximum Likelihood Estimation (REML)

• Recall variance estimation in normal populations.
• Consider a sample of N observations $Y_1, \ldots, Y_N$ from $N(\mu, \sigma^2)$.
• For known $\mu$, the MLE of $\sigma^2$ equals: $\hat{\sigma}^2 = \sum_i (Y_i - \mu)^2/N$
• This $\hat{\sigma}^2$ is unbiased for $\sigma^2$.
• When $\mu$ is not known, the MLE of $\sigma^2$ equals: $\hat{\sigma}^2 = \sum_i (Y_i - \bar{Y})^2/N$
• Note that $\hat{\sigma}^2$ is now biased for $\sigma^2$: $E(\hat{\sigma}^2) = \frac{N-1}{N}\sigma^2$ (see the sketch below)
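A quick numerical check of the two estimators on simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.normal(10.0, 2.0, size=25)  # N = 25 draws from N(mu, sigma^2)

sigma2_ml = np.mean((Y - Y.mean()) ** 2)                   # divisor N: biased
s2_unbiased = np.sum((Y - Y.mean()) ** 2) / (len(Y) - 1)   # divisor N-1
```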
Restricted Maximum Likelihood Estimation(REML)..

• The bias expression tells us how to derive an unbiased estimate: $S^2 = \sum_i (Y_i - \bar{Y})^2/(N - 1)$
• Apparently, having to estimate $\mu$ introduces bias in the MLE of $\sigma^2$.
• How can we estimate $\sigma^2$ without estimating $\mu$ first?
• The model for all data simultaneously: $Y \sim N(\mu\mathbf{1}_N, \sigma^2 I_N)$

Restricted Maximum Likelihood Estimation (REML)..

• We transform $Y$ such that $\mu$ vanishes from the likelihood: $U = A'Y \sim N(0, \sigma^2 A'A)$
• The MLE of $\sigma^2$, based on $U$, equals: $S^2 = \frac{1}{N-1}\sum_i (Y_i - \bar{Y})^2$
• $A$ defines a set of $N - 1$ linearly independent error contrasts
• $S^2$ is called the REML estimate of $\sigma^2$, and $S^2$ is independent of $A$

Restricted Maximum Likelihood Estimation (REML)..

• Consider estimation of the residual variance in the linear regression model.
• Consider a sample of N observations $Y_1, \ldots, Y_N$ from a linear regression model: $Y \sim N(X\beta, \sigma^2 I)$
• MLE of $\sigma^2$: $\hat{\sigma}^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta})/N$
• Note that $\hat{\sigma}^2$ is biased for $\sigma^2$: $E(\hat{\sigma}^2) = \frac{N-p}{N}\sigma^2$
Restricted Maximum Likelihood Estimation (REML)..

• The bias expression tells us how to derive an unbiased estimate: $MSE = (Y - X\hat{\beta})'(Y - X\hat{\beta})/(N - p)$
• The MSE can also be obtained by transforming the data orthogonally to $X$: $U = A'Y \sim N(0, \sigma^2 A'A)$
• The MLE of $\sigma^2$, based on $U$, now equals the mean squared error, MSE
• The MSE is again called the REML estimate of $\sigma^2$

REML for the Linear Mixed Model

• We first combine all models $Y_i \sim N(X_i\beta, V_i)$ into one model, $Y \sim N(X\beta, V)$, where $Y$ is the stacked vector of all responses, $X$ the correspondingly stacked matrix of covariates, and $V$ the block-diagonal covariance matrix with blocks $V_i$.
• Again, the data $Y$ are transformed orthogonally to $X$: $U = A'Y \sim N(0, A'V(\alpha)A)$
REML for the Linear Mixed Model...

• The MLE of $\alpha$ based on $U$ is called the REML estimate, and is denoted by $\hat{\alpha}_{REML}$.
• The resulting estimate $\hat{\beta}(\hat{\alpha}_{REML})$ for $\beta$ will be denoted by $\hat{\beta}_{REML}$.
• $\hat{\alpha}_{REML}$ and $\hat{\beta}_{REML}$ can also be obtained by maximizing
$$L_{REML}(\theta) = \left|\sum_i X_i'W_i(\alpha)X_i\right|^{-1/2} L_{ML}(\theta)$$
with respect to $\theta$, i.e., with respect to $\alpha$ and $\beta$ simultaneously.
• $L_{REML}(\alpha, \hat{\beta}(\alpha))$ is the likelihood of the error contrasts $U$, and is often called the REML likelihood function.
• Note that $L_{REML}(\theta)$ is NOT the likelihood for our original data $Y$.
Inference for the Marginal Model

• Inference for fixed effects:

1 Wald test

2 t-test and F-test

3 Robust inference

4 LR test

• Inference for variance components:

1 Wald test

2 LR test

• Information criteria
Inference for the Fixed Effects

• Estimate for $\beta$: $\hat{\beta}(\alpha) = \left(\sum_i X_i'W_i(\alpha)X_i\right)^{-1}\sum_i X_i'W_i(\alpha)Y_i$, with $\alpha$ replaced by its ML or REML estimate.
• Conditional on $\alpha$, $\hat{\beta}(\alpha)$ is multivariate normal with mean $\beta$ and covariance
$$\text{Var}(\hat{\beta}) = \left(\sum_i X_i'W_iX_i\right)^{-1}\left(\sum_i X_i'W_i\,\text{Var}(Y_i)\,W_iX_i\right)\left(\sum_i X_i'W_iX_i\right)^{-1} = \left(\sum_i X_i'W_iX_i\right)^{-1},$$
where $\text{Var}(Y_i) = W_i^{-1}$.
• In practice one again replaces $\alpha$ by its ML or REML estimate.

Approximate Wald Test

• For any known matrix $L$, consider testing $H_0: L\beta = 0$ versus $H_A: L\beta \ne 0$
• Wald test statistic:
$$G = \hat{\beta}'L'\left[L\left(\sum_i X_i'W_iX_i\right)^{-1}L'\right]^{-1}L\hat{\beta}$$
• The asymptotic null distribution of $G$ is $\chi^2$ with rank($L$) degrees of freedom (see the sketch below)
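A minimal sketch of the Wald statistic and its chi-square p-value, for given $\hat{\beta}$, $\widehat{\text{Var}}(\hat{\beta})$, and contrast matrix $L$:

```python
import numpy as np
from scipy import stats

def wald_test(beta_hat, cov_beta, L):
    # G = (L b)' [L Cov(b) L']^{-1} (L b), compared with chi^2_{rank(L)}
    L = np.atleast_2d(L)
    d = L @ beta_hat
    G = d @ np.linalg.solve(L @ cov_beta @ L.T, d)
    p_value = stats.chi2.sf(G, df=np.linalg.matrix_rank(L))
    return G, p_value
```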
Approximate t-test and F-test

• The Wald test is based on $\text{Var}(\hat{\beta}) = \left(\sum_i X_i'W_iX_i\right)^{-1}$
• The variability introduced by replacing $\alpha$ by some estimate is not taken into account in Wald tests.
• Therefore, Wald tests will only provide valid inferences in sufficiently large samples.
• In practice, this is often resolved by replacing the $\chi^2$ distribution by an appropriate F-distribution (and the normal by a t).
• For any known matrix $L$, consider testing $H_0: L\beta = 0$ versus $H_A: L\beta \ne 0$
Approximate t-test and F-test...

• F test statistic:
$$F = \frac{\hat{\beta}'L'\left[L\left(\sum_i X_i'W_iX_i\right)^{-1}L'\right]^{-1}L\hat{\beta}}{\text{rank}(L)}$$
• The approximate null distribution of $F$ is an F-distribution with numerator degrees of freedom equal to rank($L$).
• The denominator degrees of freedom must be estimated from the data: containment method, Satterthwaite approximation, Kenward-Roger approximation, ...
Approximate t-test and F-test...

• In the context of longitudinal data, all methods typically lead to large numbers of degrees of freedom, and therefore also to very similar p-values.
• For univariate (single-parameter) hypotheses (rank($L$) = 1), the F-test reduces to a t-test.
Likelihood Ratio Test

• Comparison of nested models with different mean structures, but equal covariance structure.
• The null hypothesis of interest equals $H_0: \beta \in \Theta_{\beta,0}$, for some subspace $\Theta_{\beta,0}$ of the parameter space $\Theta_\beta$ of the fixed effects $\beta$.
• Notation:
$L_{ML}$: maximum likelihood function
$\hat{\theta}_{ML,0}$: MLE under $H_0$
$\hat{\theta}_{ML}$: MLE under the general model
Likelihood Ratio Test

• Comparison of nested models with different mean structures, but equal covariance structure.
• Test statistic:
$$-2\ln\lambda_N = -2\ln\left[\frac{L_{ML}(\hat{\theta}_{ML,0})}{L_{ML}(\hat{\theta}_{ML})}\right]$$
• Asymptotic null distribution: $\chi^2$ with degrees of freedom equal to the difference in dimension between $\Theta_\beta$ and $\Theta_{\beta,0}$.
• LR tests for the mean structure are not valid under REML, which may result in a negative LR test statistic!
Inference for the Variance Components

• Inference for the mean structure is usually of primary interest.
• However, inference for the covariance structure is of interest as well:
• for interpretation of the random variation in the data;
• overparameterized covariance structures lead to inefficient inferences for the mean;
• too restrictive models invalidate inferences for the mean structure.

Approximate Wald Test

• Asymptotically, ML and REML estimates of $\alpha$ are normally distributed with the correct mean and the inverse Fisher information matrix as covariance.
Information Criteria

• LR tests can only be used to compare nested models.
• How do we compare non-nested models? Via the likelihood value.
• The general idea behind the LR test for comparing model A to a more extensive model B is to select model A if the increase in likelihood under model B is small compared to the increase in complexity.
• A similar argument can be used to compare non-nested models A and B.
Information Criteria...

• The model with the highest penalized log-likelihood $\ell - z(\#\theta)$ is selected, for some function $z(\cdot)$ of the number of parameters $\#\theta$ in the model.
• Different functions $z(\cdot)$ lead to different criteria,
• where $n^* = \sum_i n_i$ under ML and $n^* = \sum_i n_i - p$ under REML:

Criterion                     Definition of z(#θ)
Akaike (AIC)                  z(#θ) = #θ
Schwarz (SBC, BIC)            z(#θ) = #θ ln(n*)/2
Hannan and Quinn (HQIC)       z(#θ) = #θ ln(ln(n*))
Bozdogan (CAIC)               z(#θ) = #θ (ln(n*) + 1)/2
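A minimal sketch computing the penalized log-likelihood $\ell - z(\#\theta)$ for each criterion in the table above; the model with the largest value is preferred:

```python
import numpy as np

def information_criteria(loglik, n_params, n_star):
    # n_star = sum_i n_i under ML, and sum_i n_i - p under REML
    z = {
        "AIC":  n_params,
        "SBC":  n_params * np.log(n_star) / 2,
        "HQIC": n_params * np.log(np.log(n_star)),
        "CAIC": n_params * (np.log(n_star) + 1) / 2,
    }
    return {name: loglik - pen for name, pen in z.items()}
```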
Model Building Strategy
