Survival Analysis

Sources:
• Slides: Kristin Sainani, Stanford, http://www.stanford.edu/~kcobb
• Johnson and Shih, "An Introduction to Survival Analysis," in Principles and Practice of Clinical Research, 2nd ed. (2007)
• Rich et al., "A practical guide to understanding Kaplan-Meier curves," Otolaryngology – Head and Neck Surgery (2010)

ABDBM © Ron Shamir 1


Overview

• Intro, terminology
• Survival and hazard functions
• Kaplan-Meier curves
• The log-rank test
• The Cox proportional hazards (PH) model


Early example of survival analysis, 1669
[Figure: Christiaan Huygens' 1669 curve showing how many out of 100 people survive until 86 years.]
From: Howard Wainer, "Statistical Graphics: Mapping the Pathways of Science," Annual Review of Psychology, Vol. 52: 305-335.


What is survival analysis?
• Statistical methods for analyzing longitudinal data on the occurrence of an event.
• Possible events:
– death, injury, onset of disease, recovery from illness, recurrence-free survival for 5 years (binary variables)
– transition above or below a clinical threshold of a continuous variable (e.g. blood glucose level)
• Accommodates data from randomized clinical trials or cohort studies.


Randomized Clinical Trial (RCT)
[Figure: target population → disease-free, at-risk cohort → random assignment to intervention or control; both arms are followed over time and classified as disease or disease-free.]
Randomized Clinical Trial (RCT)
[Figure: target population → patient population → random assignment to treatment or control; both arms are followed over time and classified as cured or not cured.]
Randomized Clinical Trial (RCT)
[Figure: target population → patient population → random assignment to treatment or control; both arms are followed over time and classified as dead or alive.]
Cohort study (prospective/retrospective)
[Figure: target population → disease-free cohort, split into exposed and unexposed; both groups are followed over time and classified as disease or disease-free.]
Examples of survival analysis in medicine


RCT: Women’s Health Initiative (JAMA, 2002)
[Figure: cumulative incidence over time, on hormones vs. on placebo.]
Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.


Breast cancer and low-fat diet
[Figure: control vs. low-fat diet.]
Prentice et al. JAMA, February 8, 2006; 295:629-642.


Aspirin, ibuprofen, and mortality after myocardial infarction: retrospective cohort study
Curtis et al. BMJ 2003;327:1322-1323.
Why survival analysis?
1. Why not compare mean time-to-event between groups using a t-test or linear regression?
– For some patients we may not know whether or when an event occurred: the study ended, or we lost touch with them (censoring).
2. Why not compare the proportion of events in each group using risk/odds ratios or logistic regression?
– This ignores time: it uses only whether the event occurred, not when.
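To see why a t-test on the observed times is biased, here is a small simulation sketch. Its assumptions are hypothetical: exponential event times with rate 0.1 (true mean 10) and a study that stops at time 10, so anyone still event-free is censored at 10. Averaging the observed times treats those censored subjects as if their event had happened at 10.

```python
import random


def censoring_demo(n=10_000, rate=0.1, follow_up=10.0, seed=1):
    """Simulate exponential event times and a study that stops at follow_up.

    Returns (mean of true times, naive mean of observed times,
    fraction censored). All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    true_times = [rng.expovariate(rate) for _ in range(n)]
    # Administrative censoring: subjects still event-free when the study
    # ends are recorded at follow_up; their true event time is unknown.
    observed = [min(t, follow_up) for t in true_times]
    n_censored = sum(1 for t in true_times if t > follow_up)
    return sum(true_times) / n, sum(observed) / n, n_censored / n


true_mean, naive_mean, frac_censored = censoring_demo()
print(f"true mean {true_mean:.2f}, naive mean {naive_mean:.2f}, "
      f"censored {frac_censored:.0%}")
```

In expectation the true mean is $1/0.1 = 10$, while the mean of the observed times is $E[\min(T, 10)] = (1 - e^{-1})/0.1 \approx 6.3$, with about 37% of subjects censored.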


Terminology
• The event of interest: the outcome sought
• Time-to-event: the time from entry into a study until a subject has the outcome
• Censoring: subjects are said to be censored if they are lost to follow-up, drop out of the study, or the study ends before they have the outcome. They are counted as alive / disease-free for the time they were enrolled in the study.
– Censoring must be assumed independent of the outcome; otherwise it will bias the estimates.


An example
[Figure: follow-up of each subject in the study. Solid circles: uncensored; open circles: censored.]
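Moving all start times to 0
[Figure: the same subjects with all start times shifted to 0.]
This is a better view only if time homogeneity holds, i.e. the risk of the event does not depend on when a subject entered the study.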
Data of a hypothetical study
[Table: hypothetical study data (source: Johnson and Shih).]


Data
Two-variable outcome:
• $t_i$ = time at last disease-free observation, or time at event
• $c_i = 1$ if the subject had the event; $c_i = 0$ if there was no event by time $t_i$ (censored)


Survival function
• S(t): the probability that an individual survives at least until time t
• Usually unknown; estimated from a sample
• Survival experience: the empirical (sample-based) survival function


Cumulative survival
[Figure: cumulative survival curve.]


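Probability density function: f(t)
$T$: the event time for an individual (a random variable).
$f(t)$ is the probability density of $T$: for a small $\Delta t$, $f(t)\,\Delta t$ is approximately the probability that the event occurs in $[t, t + \Delta t)$.
$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$$
$F(t) = \int_0^t f(u)\,du$ is the CDF of $T$, and $S(t) = 1 - F(t)$.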


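The hazard function
The instantaneous rate of the event at time $t$ for someone who has survived to $t$: $h(t)\,\Delta t$ is approximately the probability of succumbing to the event in the next small interval $\Delta t$, given survival to $t$.
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
By the definition of conditional probability (Bayes' rule), and because $t \le T < t + dt$ already implies $T \ge t$:
$$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \,\cap\, T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$$
Hazard from density and survival: $h(t) = \dfrac{f(t)}{S(t)}$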
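The relations $S(t) = 1 - F(t)$ and $h(t) = f(t)/S(t)$ can be checked numerically. The sketch below assumes a Weibull distribution with shape 1.5 and scale 10 (illustrative values only, not from the slides) and compares $f(t)/S(t)$, the known closed-form Weibull hazard, and a small-$\Delta t$ version of the limit definition.

```python
import math

# Hypothetical Weibull event-time distribution (shape k, scale lam),
# chosen only to illustrate the relations between f, S and h.
k, lam = 1.5, 10.0


def f(t):
    """Density of T."""
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def S(t):
    """Survival function S(t) = P(T >= t) = 1 - F(t)."""
    return math.exp(-((t / lam) ** k))


def h_closed_form(t):
    """Known closed-form Weibull hazard."""
    return (k / lam) * (t / lam) ** (k - 1)


def h_limit(t, dt=1e-6):
    """P(t <= T < t + dt | T >= t) / dt for a small dt."""
    return (S(t) - S(t + dt)) / S(t) / dt


for t in (1.0, 5.0, 20.0):
    print(f"t={t:5.1f}  f/S={f(t) / S(t):.6f}  "
          f"h={h_closed_form(t):.6f}  limit={h_limit(t):.6f}")
```

With shape greater than 1 the hazard increases with $t$; shape 1 would give a constant hazard (the exponential distribution).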
[Figure: plotted against AGE.]
A possible set of probability density, failure, survival, and hazard functions.
[Figure, four panels: f(t) = density function; F(t) = cumulative failure; S(t) = cumulative survival; h(t) = hazard function.]


The Kaplan-Meier curve
Sorted event times $t_1 < t_2 < \dots < t_n$, no censoring:
$\Pr(\text{surviving to } t_i) = (n - i + 1)/n$
What to do when some subjects are censored?
Sorted event times $t_1 < t_2 < \dots < t_n$; $d_i$ = number of events in $(t_{i-1}, t_i]$; $n_i$ = number of individuals at risk (still in the study and event-free) just before $t_i$.
$$\Pr(\text{surviving past } t_i) = \Pr(\text{surviving past } t_{i-1}) \times \Pr(\text{surviving } (t_{i-1}, t_i]) = \Pr(\text{surviving past } t_{i-1}) \times \frac{n_i - d_i}{n_i}$$
Unrolling this recursion gives the K-M or product-limit estimator:
$$\hat{S}(t) = \prod_{t_i \le t} \frac{n_i - d_i}{n_i}$$
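A minimal sketch of the product-limit estimator in Python, working directly on the $(t_i, c_i)$ pairs. It assumes the usual convention that a subject censored at an event time is still at risk at that time; the function name and the data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate from (t_i, c_i) data.

    times:  time at event, or at last event-free observation
    events: 1 if the event occurred at that time, 0 if censored
    Returns a list of (event_time, S_hat) pairs, where S_hat is the
    estimated probability of surviving past event_time.
    """
    deaths = Counter(t for t, c in zip(times, events) if c == 1)
    s_hat = 1.0
    curve = []
    for t in sorted(deaths):
        # n_i: everyone whose recorded time is >= t is still at risk
        # just before t (subjects censored at t count as at risk).
        n_at_risk = sum(1 for u in times if u >= t)
        d = deaths[t]
        s_hat *= (n_at_risk - d) / n_at_risk
        curve.append((t, s_hat))
    return curve


# Hypothetical data: 6 subjects, two of them censored (c_i = 0)
times = [2, 3, 3, 5, 6, 8]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

For this data the estimate steps down to 0.8333, 0.6667, 0.4444 and 0.0 at times 2, 3, 5 and 8. The censored subjects shrink the risk set for later times without causing a step. With no censoring, the estimate after the $i$-th event is $(n - i)/n$, the proportion still event-free just after $t_i$.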
K-M estimate and curve
• Non-parametric estimate of the survival function
• Empirical probability of surviving past certain times in the sample (taking censoring into account)
• Describes the survivorship of the study population(s)
• Commonly used to compare two study populations
• Intuitive graphical presentation


[Photos: Edward L. Kaplan; Paul Meier, 1924-2011]


Comparing two survival curves
• Two methods:
– Compare the curves at a pre-specified time point t (the result depends on t, and there is a tendency to pick the "best" t)
– Compare the overall curves over the entire time range
[Figure: hormones vs. placebo.]
Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.


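Comparing two curves: Log rank test
• H0: $S_1(t) = S_2(t)$ for all $t$
• Log rank test: uses the ranks (ordering) of the events, not the actual times.
Sorted event times $t_1 < t_2 < \dots < t_K$. For each time $t_j$, among the subjects at risk:

Group | Events | Surviving | Total
Group 1 | $a_j$ | $b_j$ | $a_j + b_j$
Group 2 | $c_j$ | $d_j$ | $c_j + d_j$
Total | $a_j + c_j$ | $b_j + d_j$ | $n_j$

Under H0: $E(a_j) = \text{total events} \times \dfrac{\#\text{ at risk in group 1}}{\#\text{ at risk}} = \dfrac{(a_j + c_j)(a_j + b_j)}{n_j}$
Summing over all event times:
$$Z = \frac{\sum_j \left(a_j - E(a_j)\right)}{\sqrt{\sum_j \operatorname{Var}(a_j)}}, \qquad \operatorname{Var}(a_j) = \frac{(a_j + b_j)(c_j + d_j)(a_j + c_j)(b_j + d_j)}{n_j^2\,(n_j - 1)}$$
Under H0, $Z$ is approximately standard normal; use it to evaluate the p-value.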
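The log-rank computation can be sketched in Python as follows. It accumulates $a_j - E(a_j)$ and the hypergeometric variance over the distinct event times and returns $Z$ with a two-sided normal p-value. This is a plain illustration of the standard statistic, not a substitute for a tested package; the function name and data are made up.

```python
import math


def logrank_test(times1, events1, times2, events2):
    """Two-sample log-rank test; returns (Z, two-sided p-value).

    At each distinct event time t_j the 2x2 table is built from the
    subjects still at risk (recorded time >= t_j).
    """
    data = [(t, c, 1) for t, c in zip(times1, events1)]
    data += [(t, c, 2) for t, c in zip(times2, events2)]
    event_times = sorted({t for t, c, _ in data if c == 1})
    sum_o_minus_e = 0.0
    sum_var = 0.0
    for tj in event_times:
        at_risk = [(t, c, g) for t, c, g in data if t >= tj]
        n = len(at_risk)                                          # n_j
        n1 = sum(1 for _, _, g in at_risk if g == 1)              # a_j + b_j
        d = sum(1 for t, c, _ in at_risk if t == tj and c == 1)   # a_j + c_j
        a = sum(1 for t, c, g in at_risk
                if t == tj and c == 1 and g == 1)                 # a_j
        sum_o_minus_e += a - d * n1 / n                           # a_j - E(a_j)
        if n > 1:
            # hypergeometric variance of a_j
            sum_var += d * (n1 / n) * ((n - n1) / n) * (n - d) / (n - 1)
    if sum_var == 0:
        raise ValueError("no information to compare the groups")
    z = sum_o_minus_e / math.sqrt(sum_var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return z, p


# Hypothetical data: group 1 has its events earlier than group 2
z, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [5, 6, 7, 8], [1, 1, 1, 1])
print(f"Z = {z:.2f}, p = {p:.4f}")
```

A positive $Z$ means group 1 had more events than expected under H0.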
Example: breast cancer survival signature (van de Vijver et al., NEJM 2002)
[Figure: K-M curves for the signature groups, annotated "Small numbers left".]
• Caveats:
– No mention of mean survival
– Visual inspection can be misleading
– The groups must be predefined in advance
• Certain characteristics (age, sex, ...) can be related to survival: confounding / prognostic factors can change the relation of treatment to outcome.
– The test then needs to be stratified, comparing survival differences within each level of these factors.
WHI and breast cancer
[Figure: breast cancer in the WHI trial.]
Women’s Health Initiative Writing Group. JAMA. 2002;288:321-333.
Cox Proportional Hazards Model
• K-M curves and the log-rank test are univariate analyses: they describe survival using one categorical factor
• Cox PH: allows many prognostic factors, categorical or real-valued
• Semi-parametric
• Models the effect of predictors and covariates on the hazard rate but leaves the baseline hazard rate unspecified
• Estimates relative rather than absolute hazard
The model
Components:
• A baseline hazard function $\lambda_0(t)$ that is left unspecified (it can take on any form!) but must be positive; it is the hazard when all covariates are 0
• An exponentiated linear function of a set of $k$ fixed covariates
$$h_i(t) = \lambda_0(t)\, e^{\beta_1 x_{i1} + \dots + \beta_k x_{ik}}$$
Equivalently, on the log scale:
$$\log h_i(t) = \log \lambda_0(t) + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
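The model
Proportional hazards: compare the hazard for person $i$ (e.g. a smoker) with the hazard for person $j$ (e.g. a non-smoker):
$$HR_{i,j} = \frac{h_i(t)}{h_j(t)} = \frac{\lambda_0(t)\, e^{\beta_1 x_{i1} + \dots + \beta_k x_{ik}}}{\lambda_0(t)\, e^{\beta_1 x_{j1} + \dots + \beta_k x_{jk}}} = e^{\beta_1 (x_{i1} - x_{j1}) + \dots + \beta_k (x_{ik} - x_{jk})}$$
The baseline hazard cancels, so the ratio does not depend on $t$: the hazard functions should be strictly parallel on the log scale (proportional on the original scale).
Produces covariate-adjusted hazard ratios.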
The model
The point is to compare the hazard rates of individuals who have different covariates; hence the name proportional hazards:
$$HR = \frac{h_1(t)}{h_2(t)} = \frac{\lambda_0(t)\, e^{\beta x_1}}{\lambda_0(t)\, e^{\beta x_2}} = e^{\beta (x_1 - x_2)}$$
• For binary $x$: $e^{\beta}$ is the multiplicative increase in hazard between the two categories ($\beta$ is its log)
• For numerical $x$: $e^{\beta}$ is the multiplicative increase in hazard per unit of $x$ (e.g. per year)
Hazard functions should be strictly parallel (on the log scale).


Cox PH - computation
• The coefficients $\beta_1, \dots, \beta_k$ can be estimated by numerically maximizing the partial likelihood (details not shown)
• For a large enough sample, the estimate of each $\beta_i$ is approximately normally distributed, so its p-value and confidence interval can be computed.
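The slides leave the optimization details out; what follows is a hedged sketch. Cox's partial likelihood multiplies, over the event times, the probability that the subject who had the event is the one among those at risk, $e^{\beta x_i} / \sum_{j \in R(t_i)} e^{\beta x_j}$, where $R(t_i)$ is the risk set at $t_i$. The code maximizes its logarithm for a single covariate by Newton-Raphson and takes the standard error from the observed information. It assumes no tied event times; packaged implementations such as R's `survival::coxph` handle ties (by default with Efron's approximation) and many covariates. The data are hypothetical.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox log partial likelihood for one covariate (no tied event times)."""
    ll = 0.0
    for ti, ci, xi in zip(times, events, x):
        if ci == 1:
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            ll += beta * xi - math.log(sum(math.exp(beta * xj) for xj in risk))
    return ll


def cox_fit_1d(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximization of the log partial likelihood.

    Returns (beta_hat, se), with se from the observed information
    evaluated at the last iterate.
    """
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for ti, ci, xi in zip(times, events, x):
            if ci != 1:
                continue
            risk = [xj for tj, xj in zip(times, x) if tj >= ti]
            w = [math.exp(beta * xj) for xj in risk]
            s0 = sum(w)
            mean_x = sum(wj * xj for wj, xj in zip(w, risk)) / s0
            mean_x2 = sum(wj * xj * xj for wj, xj in zip(w, risk)) / s0
            score += xi - mean_x            # first derivative
            info += mean_x2 - mean_x ** 2   # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: 8 subjects, binary covariate x (e.g. treated = 1)
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 0, 1, 1, 0, 1, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]
beta_hat, se = cox_fit_1d(times, events, x)
lo, hi = beta_hat - 1.96 * se, beta_hat + 1.96 * se
print(f"beta = {beta_hat:.3f}, HR = {math.exp(beta_hat):.2f}, "
      f"95% CI for HR = ({math.exp(lo):.2f}, {math.exp(hi):.2f})")
```

With only 8 subjects the interval is very wide; the normal approximation in the bullet above is meant for large samples.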


Example: Framingham heart study
• Cohort of 5,180 people aged 45-82, followed until death or for up to 10 years; 46% male, 402 deaths.

Characteristic | Die (n=402) | Do not die (n=4778)
Mean (SD) age, years | 65.6 (8.7) | 56.1 (7.5)
N (%) male | 221 (55%) | 2145 (45%)

• Cox PH model with age and sex as factors:

Risk factor | Parameter estimate | P-value
Age, years | 0.11149 | 0.0001
Male sex | 0.67958 | 0.0001

• Both factors increase risk:
– Age: exp(0.11149) = 1.118, so an 11.8% higher hazard per year of age, holding sex constant
– Male sex: exp(0.67958) = 1.973, so men have 1.973 times the hazard of women, holding age constant
Source: http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_Survival/
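The hazard-ratio arithmetic for the age and sex model, as a short Python check. The per-10-year line is an added illustration of the rule that a $c$-unit difference in a continuous covariate multiplies the hazard by $e^{c\beta}$.

```python
import math

# Cox PH estimates (log hazard ratios) from the age + sex model
estimates = {"Age (per year)": 0.11149, "Male sex": 0.67958}

for name, beta in estimates.items():
    hr = math.exp(beta)
    print(f"{name}: HR = {hr:.3f}, i.e. {100 * (hr - 1):.1f}% higher hazard")

# A c-unit difference in a continuous covariate multiplies the hazard by exp(c * beta)
print(f"Age (per 10 years): HR = {math.exp(10 * estimates['Age (per year)']):.2f}")
```

The first two lines reproduce 1.118 and 1.973; the age effect over 10 years is about 3.05, which is not the same as 10 × 11.8%.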
Model with more covariates

Risk factor | Parameter estimate | P-value | Hazard ratio (HR) (95% CI for HR)
Age, years | 0.11691 | 0.0001 | 1.124 (1.111-1.138)
Male sex | 0.40359 | 0.0002 | 1.497 (1.215-1.845)
Systolic blood pressure | 0.01645 | 0.0001 | 1.017 (1.012-1.021)
Current smoker | 0.76798 | 0.0001 | 2.155 (1.758-2.643)
Total serum cholesterol | -0.00209 | 0.0963 | 0.998 (0.995-1.000)
Diabetes | -0.20366 | 0.1585 | 0.816 (0.615-1.083)

• Significant factors have CIs that do not include 1 (the null value).
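Because the table reports each estimate together with its p-value, the hazard ratios and CIs can be cross-checked. The sketch below assumes the p-values are two-sided Wald p-values ($z = \hat\beta / SE$), back-calculates the SE, and forms $e^{\hat\beta \pm 1.96\,SE}$; this is an assumption about how the table was produced. It uses `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist


def hr_ci_from_p(beta, p_two_sided, level=0.95):
    """Hazard ratio and Wald CI, back-calculating SE from a two-sided p-value.

    Assumes the reported p-value comes from a Wald test, z = beta / SE.
    """
    std_normal = NormalDist()
    z_obs = std_normal.inv_cdf(1 - p_two_sided / 2)
    se = abs(beta) / z_obs
    z_crit = std_normal.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return (math.exp(beta),
            math.exp(beta - z_crit * se),
            math.exp(beta + z_crit * se))


# Total serum cholesterol row: estimate -0.00209, p = 0.0963
print("cholesterol: HR %.3f (%.3f-%.3f)" % hr_ci_from_p(-0.00209, 0.0963))
# Diabetes row, starting from its reported HR of 0.816
print("diabetes:    HR %.3f (%.3f-%.3f)" % hr_ci_from_p(math.log(0.816), 0.1585))
```

Under that assumption, the cholesterol row gives roughly 0.998 (0.995-1.000), and starting from the diabetes HR of 0.816 gives an interval close to (0.615-1.083).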


Example 1: Study of publication bias
[Figure: time to publication, by Kaplan-Meier methods.]
From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-645 (13 September)
Univariate Cox regression
Table 4. Risk factors for time to publication using univariate Cox regression analysis

Characteristic | # not published | # published | Hazard ratio (95% CI)
Null | 29 | 23 | 1.00 (reference)
Non-significant trend | 16 | 4 | 0.39 (0.13 to 1.12)
Significant | 47 | 99 | 2.32 (1.47 to 3.66)

From: Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640-645 (13 September)

Interpretation: compared with null results, significant results have a hazard (rate) of publication about 2.3 times higher (HR 2.32), i.e. they are published sooner.
Example 2: Study of mortality in Academy Award winners for screenwriting
[Figure: survival curves by Kaplan-Meier methods.]
From: Longevity of screenwriters who win an Academy Award: longitudinal study. BMJ 2001;323:1491-1496 (22-29 December)
Table 2. Death rates for screenwriters who have won an Academy Award.* Values are percentages (95% confidence intervals) and are adjusted for the factor indicated.

Analysis | Relative increase in death rate for winners
Basic analysis | 37 (10 to 70)
Adjusted analysis:
Demographic:
Year of birth | 32 (6 to 64)
Sex | 36 (10 to 69)
Documented education | 39 (12 to 73)
All three factors | 33 (7 to 65)
Professional:
Film genre | 37 (10 to 70)
Total films | 39 (12 to 73)
Total four star films | 40 (13 to 75)
Total nominations | 43 (14 to 79)
Age at first film | 36 (9 to 68)
Age at first nomination | 32 (6 to 64)
All six factors | 40 (11 to 76)
All nine factors | 35 (7 to 70)

• Basic analysis: HR = 1.37; interpretation: 37% higher incidence of death for winners compared with nominees.
• All nine factors: HR = 1.35; interpretation: 35% higher incidence of death for winners compared with nominees, even after adjusting for potential confounders.
Sir David Cox
• Born 1924
• Cambridge, Imperial College London, Oxford
• Books:
– Planning of experiments (1958)
– Queues (Methuen, 1961). With Walter L. Smith
– Renewal Theory (Methuen, 1962).
– The theory of stochastic processes (1965). With Hilton David Miller
– Analysis of binary data (1969). With Joyce E. Snell
– Theoretical statistics (1974). With D. V. Hinkley
– Point processes (Chapman & Hall/CRC, 1980). With Valerie Isham
– Applied statistics, principles and examples (Chapman & Hall/CRC, 1981). With Joyce E. Snell
– Analysis of survival data (Chapman & Hall/CRC, 1984). With David Oakes
– Asymptotic techniques for use in statistics. (1989) With Ole E. Barndorff-Nielsen
– Inference and asymptotics (Chapman & Hall/CRC, 1994). With Ole E. Barndorff-Nielsen
– Multivariate dependencies, models, analysis and interpretation (Chapman & Hall, 1995). With Nanny Wermuth
– The theory of design of experiments. (Chapman & Hall/CRC, 2000). With Nancy M. Reid.
– Complex stochastic systems (Chapman & Hall/CRC, 2000). With Ole E. Barndorff-Nielsen and Claudia Klüppelberg
– Components of variance (Chapman & Hall/CRC, 2003). With P. J. Solomon
– Principles of Statistical Inference (Cambridge University Press, 2006). ISBN 978-0-521-68567-2
– Selected Statistical Papers of Sir David Cox 2 Volume Set
– Principles of Applied Statistics (CUP) With Christl A. Donnelly