Log-Rank Test

Sayan Ghosh

Abstract
This document provides a comprehensive treatment of survival analysis with
emphasis on the derivation and interpretation of the log-rank test. We cover:
• Fundamentals of survival analysis and censoring
• Kaplan–Meier estimator and its properties
• Derivation of the log-rank test statistic, expected values under the null hy-
pothesis, variance, and asymptotic chi-square distribution
• Construction and role of contingency tables at each event time
• A real-life example: comparing court case disclosure times between two peri-
ods (2015–2020 vs. 2020–2025), including data structure, handling of censor-
ing, Kaplan–Meier curves, log-rank computation, and interpretation
• Guidelines for implementing analyses and plotting in software
All mathematical expressions, tables, and explanatory text are provided in LaTeX
format for direct compilation or inclusion in reports.

Contents
1 Introduction
    1.1 Types of Censoring
2 Basics of Survival Analysis
    2.1 Notation and Definitions
    2.2 Kaplan–Meier Estimator
3 Log-Rank Test: Theory and Derivation
    3.1 Objective
    3.2 Data Setup
    3.3 Expected Number of Events under H0
    3.4 Variance under H0
    3.5 Aggregating over All Event Times
    3.6 Test Statistic
    3.7 Role of Contingency Tables at Each Event Time
4 Kaplan–Meier Curves and Plotting
5 Real-Life Example: Court Case Disclosure Times
    5.1 Context and Data Structure
    5.2 Handling Censoring
    5.3 Kaplan–Meier Estimation
    5.4 Applying the Log-Rank Test
        5.4.1 Step-by-Step Computation
        5.4.2 Illustrative Contingency Table at a Given Time
    5.5 Interpretation of Results
    5.6 Software Implementation Notes
6 Full Mathematical Derivations
    6.1 Kaplan–Meier Product-Limit Formula
    6.2 Log-Rank Test: Hypergeometric Model at Each Time
    6.3 Asymptotic Distribution
    6.4 Connection to Chi-Square Goodness-of-Fit
7 Contingency Tables and Aggregation
    7.1 At Each Time t(j)
    7.2 Aggregating Across Times
8 Example Walk-Through (Hypothetical Data)
    8.1 Hypothetical Dataset
    8.2 Event Times and Risk Sets
    8.3 Interpreting Hypothetical Outcome
9 Practical Considerations
10 References

1 Introduction
Survival analysis concerns the study of time until the occurrence of an event of inter-
est (often called “failure time”, “event time”, or “time to event”) in the presence of
potentially censored observations. It arises in many fields: medicine (time to death or
relapse), engineering (time to failure of components), social sciences (duration until an
event such as job change), and in our motivating example, legal studies (time until court
case disclosure).
Key challenges:

• Right-censoring: For some subjects, the event has not occurred by the end of
observation or they are lost to follow-up. We know only that their survival time
exceeds their last observed time.

• Nonparametric estimation: We may not wish to assume a specific parametric
form for the survival distribution.

• Comparisons: We often wish to compare survival experiences between two or
more groups.

The Kaplan–Meier estimator provides a nonparametric estimate of the survival function
in the presence of right-censoring. To compare two groups, the log-rank test is the most
common nonparametric test, based on comparing observed vs. expected event counts at
each event time under the null hypothesis of equal survival curves.

1.1 Types of Censoring

• Right-censoring: The most common form. It occurs when a subject leaves the
study before the event occurs (loss to follow-up), or when the study ends before the
event has occurred. For example, consider a clinical trial studying the effect of
treatments on stroke occurrence that ends after 5 years. Patients who have not
experienced a stroke by the end of the study are right-censored. If a patient leaves
the study at time t_e and the event occurs sometime after that (i.e., in the interval
(t_e, ∞)), the exact event time is unknown. Legal example: A legal case is still
awaiting disclosure when the study ends.

• Left-censoring: The event occurred before the observation period began (for
example, before enrolment), so we know only that it happened before a certain
time. This is very rarely encountered. It should not be confused with truncation,
which is deliberate and due to study design. Legal example: A case was
disclosed before official records began, so the exact date of disclosure is unknown.

• Interval-censoring: The event is known to have occurred within a specific time
interval. Legal example: A case’s disclosure is not observed at one hearing but is
confirmed to be public by the next.

• Non-informative censoring: An assumption that censoring is independent of
the event process. For example, if a case is still pending at the end of the study
for reasons not related to its disclosure time, the censoring can be treated as
non-informative.

• Progressive censoring: In progressive censoring, subjects are removed from the
study at various times according to a pre-specified rule, not necessarily because the
event of interest occurred. This allows for more flexibility in study design and often
reduces costs.
Legal example: In a long-running case tracking system, certain cases are periodi-
cally removed from observation due to lack of resources or re-prioritization. The
outcome of these cases remains unknown after removal, even though they were
under observation earlier.

2 Basics of Survival Analysis

2.1 Notation and Definitions
• Let τ denote the event time random variable.

• The survival function is

S(t) = Pr(τ > t),

the probability that the event has not yet occurred by time t.
Properties of the survival function include:

– S(0) = 1, since the event has not occurred at time zero.

– S(t) is a non-increasing function of t, approaching zero as t → ∞.

In the context of survival analysis, the survival function is often estimated using
methods like the Kaplan-Meier estimator, which accounts for censored data.

Density Function
The density function f(t) describes the instantaneous (unconditional) rate at which
events occur at time t. It is the derivative of the cumulative distribution function
F(t) = Pr(τ ≤ t):

\[
f(t) = \frac{d}{dt} F(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le \tau < t + \Delta t)}{\Delta t}.
\]

In survival analysis, f(t) gives the probability density of the event happening
at time t.
The relationship between the density function and the survival function S(t) is:

\[
f(t) = -\frac{d}{dt} S(t),
\]

because S(t) = 1 − F(t).

From Density Function to Survival Function
Let τ be the random variable denoting time to event. Its probability density
function (pdf) is denoted by f(t), and the cumulative distribution function (cdf) is

\[
F(t) = \Pr(\tau \le t) = \int_0^t f(u)\,du.
\]

The survival function S(t) is the probability that the event has not occurred by
time t, i.e.,

\[
S(t) = \Pr(\tau > t) = 1 - F(t) = 1 - \int_0^t f(u)\,du.
\]

In other words, the survival function is the complement of the cumulative
distribution function, representing the probability of surviving beyond time t.
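• The hazard function is

\[
h(t) = \lim_{\Delta t \to 0^+} \frac{\Pr(t \le \tau < t + \Delta t \mid \tau \ge t)}{\Delta t} = \frac{f(t)}{S(t)},
\]

where f(t) is the density of τ, if it exists.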

The hazard function can be interpreted as the conditional failure rate at time t.
A high hazard at a particular time indicates that the event is very likely to occur
immediately after that time, assuming it has not occurred yet.

Additional properties and interpretations:

– The hazard function is not a probability but a rate, and thus can take values
greater than 1.
– The shape of the hazard function over time provides insight into the risk
dynamics. For example:
∗ A decreasing hazard suggests that the risk of event occurrence reduces the
longer the subject survives (e.g., “early failures”).
∗ An increasing hazard implies that risk grows with time, such as with aging
or cumulative damage.
∗ A constant hazard corresponds to memoryless processes like the exponen-
tial distribution.
– In legal contexts, the hazard function can describe how likely a case is to be
disclosed at each point in time, conditional on it not having been disclosed
before.
– The hazard function links closely to the cumulative hazard function H(t),
defined as

\[
H(t) = \int_0^t h(u)\,du,
\]

which represents the accumulated risk of event occurrence up to time t.
Understanding the hazard function helps analysts model and interpret the timing
and risk patterns of events in survival analysis, enabling better predictions and
decisions.

Cumulative Hazard Function
The cumulative hazard function H(t) represents the total accumulated risk of
experiencing the event by time t. It is defined as the integral of the hazard function
over time:
\[
H(t) = \int_0^t h(u)\,du,
\]

where h(u) is the hazard function at time u.

The cumulative hazard function connects the survival function S(t) and the hazard
function via the relationship:

S(t) = exp (−H(t)) .

This shows that the probability of surviving beyond time t decays exponentially
in the accumulated hazard: the larger H(t), the smaller S(t).
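The identities S(t) = exp(−H(t)) and f(t) = h(t) S(t) can be checked numerically. The sketch below assumes a Weibull hazard with shape 1.5 and scale 10 purely for illustration (these parameter values and the function names are hypothetical, not taken from the text) and integrates the hazard with a simple midpoint rule.

```python
import math

def weibull_hazard(t, shape=1.5, scale=10.0):
    """Hazard h(t) of a Weibull distribution (an illustrative choice)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def cumulative_hazard(t, hazard, steps=20_000):
    """H(t) = integral of h(u) from 0 to t, approximated by the midpoint rule."""
    width = t / steps
    return sum(hazard((k + 0.5) * width) for k in range(steps)) * width

t = 8.0
H_numeric = cumulative_hazard(t, weibull_hazard)
S_from_H = math.exp(-H_numeric)             # S(t) = exp(-H(t))
S_closed = math.exp(-((t / 10.0) ** 1.5))   # closed-form Weibull survival function
f_t = weibull_hazard(t) * S_closed          # f(t) = h(t) * S(t)

print(f"S from H: {S_from_H:.6f}, closed form: {S_closed:.6f}, f(t): {f_t:.6f}")
```

The two survival values should agree up to the numerical integration error, which illustrates that the hazard determines the survival function.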
• Right-censoring: For each subject i, we observe $(T_i, \delta_i)$:

\[
T_i = \min(\tau_i, C_i), \qquad
\delta_i =
\begin{cases}
1, & \text{if event occurred } (\tau_i \le C_i),\\
0, & \text{if censored } (\tau_i > C_i),
\end{cases}
\]

where $C_i$ is the censoring time.
• We assume non-informative (independent) censoring: the event time distribution
and censoring mechanism are independent.
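As a minimal sketch of how the observed pair (T_i, δ_i) arises from a latent event time τ_i and a censoring time C_i, consider the following. The exponential event times and uniform censoring times are assumptions made only to generate example values, and the helper name `observe` is hypothetical.

```python
import random

def observe(event_time, censor_time):
    """Return (T_i, delta_i) = (min(tau_i, C_i), 1 if tau_i <= C_i else 0)."""
    if event_time <= censor_time:
        return event_time, 1   # event observed
    return censor_time, 0      # right-censored at C_i

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Latent (tau_i, C_i) pairs, drawn independently (non-informative censoring)
latent = [(rng.expovariate(1 / 12.0), rng.uniform(0.0, 24.0)) for _ in range(5)]
observed = [observe(tau, c) for tau, c in latent]

for (tau, c), (t_obs, delta) in zip(latent, observed):
    print(f"tau={tau:6.2f}  C={c:6.2f}  ->  T={t_obs:6.2f}  delta={delta}")
```

In real data only the last two columns are available; the latent τ_i of a censored subject is never seen.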

2.2 Kaplan–Meier Estimator

Suppose we have n subjects, and let $t_{(1)} < t_{(2)} < \cdots < t_{(J)}$ be the distinct ordered event
times (observed times at which at least one event occurs). For each event time $t_{(j)}$, define:

\[
n_j = \text{number of subjects ``at risk'' just prior to } t_{(j)}, \qquad
d_j = \text{number of events (failures) at } t_{(j)}.
\]

Right-censored observations at time t imply that the subject is counted as at-risk up to
time t, but does not contribute an event at t.
The Kaplan–Meier (product-limit) estimator of the survival function is:

\[
\hat{S}(t) = \prod_{t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right).
\]

Properties:
• It is a step function with downward jumps at each observed event time.
• Censored times contribute to risk set counts up to the time of censoring but do not
produce jumps.
• One can compute pointwise confidence intervals, e.g. using Greenwood’s formula
for the variance of $\hat{S}(t)$.

• Graphical display: Kaplan–Meier curve; tick marks indicate censored observations.
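The estimator and its Greenwood standard error can be sketched in a few lines of plain Python. This is an illustrative implementation under the definitions above, not the routine used by any particular package; the function name `kaplan_meier` is hypothetical, and the data are the Group 2 rows of the hypothetical dataset in Section 8.1.

```python
import math

def kaplan_meier(times, events):
    """Return rows (t_j, n_j, d_j, S_hat, se) at each distinct event time."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    rows, surv, greenwood_sum = [], 1.0, 0.0
    for tj in event_times:
        n_j = sum(1 for t in times if t >= tj)   # at risk just before t_j
        d_j = sum(1 for t, e in zip(times, events) if t == tj and e == 1)
        surv *= 1.0 - d_j / n_j                  # product-limit update
        if n_j > d_j:
            greenwood_sum += d_j / (n_j * (n_j - d_j))
            se = surv * math.sqrt(greenwood_sum)
        else:
            se = float("nan")  # Greenwood's formula is undefined once S_hat reaches 0
        rows.append((tj, n_j, d_j, surv, se))
    return rows

# Group 2 of Table 2: times in months, status 1 = disclosed, 0 = censored
times = [4.0, 9.0, 11.0, 14.0]
events = [1, 1, 0, 1]
rows = kaplan_meier(times, events)
for tj, n_j, d_j, surv, se in rows:
    print(f"t={tj:5.1f}  n={n_j}  d={d_j}  S_hat={surv:.3f}  se={se:.3f}")
```

Note that the censored case at 11 months produces no row and no jump; it only lowers the risk set from 3 to 1 before the event at 14 months.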

3 Log-Rank Test: Theory and Derivation
3.1 Objective
Compare survival experiences of two groups (Group 1 vs. Group 2) under the null hy-
pothesis:

H0 : The two groups share the same survival (hazard) function.

Under H0 , any observed differences in survival are due to random variation.

3.2 Data Setup

Label the two groups by g = 1, 2. For each subject i, we have (Ti , δi , gi ). Pool all subjects
and let t(1) , t(2) , . . . , t(J) be the distinct ordered event times across both groups. At each
event time t(j) , define:

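\[
N_{1,j} = \#\{\text{subjects in Group 1 still at risk just before } t_{(j)}\}, \qquad
N_{2,j} = \#\{\text{subjects in Group 2 still at risk just before } t_{(j)}\},
\]

and

\[
O_{1,j} = \#\{\text{events in Group 1 at } t_{(j)}\}, \qquad
O_{2,j} = \#\{\text{events in Group 2 at } t_{(j)}\}.
\]

Also let

\[
N_j = N_{1,j} + N_{2,j}, \qquad O_j = O_{1,j} + O_{2,j}.
\]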
Right-censoring affects the “at risk” counts but does not directly produce events; censored
subjects are removed from the risk set after their censoring time.

3.3 Expected Number of Events under H0

Under H0 (equal hazards), at each event time t(j) , the probability that a given event
among the Oj total events at that time comes from Group 1 is N1,j /Nj . Equivalently,
conditional on Oj events among Nj at-risk subjects, the number O1,j in Group 1 follows
a hypergeometric-like distribution with expectation:
\[
E_{1,j} = O_j \, \frac{N_{1,j}}{N_j}.
\]

Similarly,

\[
E_{2,j} = O_j \, \frac{N_{2,j}}{N_j}.
\]

3.4 Variance under H0

Under the same conditional model, the variance of O1,j (hypergeometric variance) is:

\[
\operatorname{Var}(O_{1,j}) = V_{1,j}
= \frac{N_{1,j} N_{2,j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}
= E_{1,j} \, \frac{N_j - O_j}{N_j} \, \frac{N_j - N_{1,j}}{N_j - 1}.
\]

(When $N_j$ is large, the finite-population correction $(N_j - 1)$ matters little; approximate
forms often drop that term, but the exact form ensures correctness.)
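To see that the two forms of the variance agree, substitute the definition of the expected count and use $N_j - N_{1,j} = N_{2,j}$. The special case of a single event at $t_{(j)}$, which is the usual situation without ties, is also worth recording:

```latex
\[
E_{1,j}\,\frac{N_j - O_j}{N_j}\,\frac{N_j - N_{1,j}}{N_j - 1}
= \frac{O_j N_{1,j}}{N_j}\cdot\frac{N_j - O_j}{N_j}\cdot\frac{N_{2,j}}{N_j - 1}
= \frac{N_{1,j} N_{2,j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)} .
\]
\[
O_j = 1 \;\Longrightarrow\;
V_{1,j} = \frac{N_{1,j} N_{2,j} (N_j - 1)}{N_j^2 (N_j - 1)}
= \frac{N_{1,j}}{N_j}\cdot\frac{N_{2,j}}{N_j},
\]
```

which is the Bernoulli variance $p(1-p)$ with $p = N_{1,j}/N_j$.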

3.5 Aggregating over All Event Times
Define the total observed-minus-expected for Group 1:
\[
U_1 = \sum_{j=1}^{J} \left( O_{1,j} - E_{1,j} \right).
\]

Under $H_0$, $E[O_{1,j} - E_{1,j}] = 0$ for each j, so $E[U_1] = 0$. The variance of $U_1$ is:

\[
\operatorname{Var}(U_1) = \sum_{j=1}^{J} V_{1,j},
\]

neglecting covariance terms since event times are distinct and counts at different times
are (approximately) uncorrelated under the standard counting process framework.

3.6 Test Statistic

Define

\[
Z = \frac{U_1}{\sqrt{\operatorname{Var}(U_1)}}
= \frac{\sum_{j=1}^{J} (O_{1,j} - E_{1,j})}{\sqrt{\sum_{j=1}^{J} V_{1,j}}}.
\]

By central limit arguments, under $H_0$ and with a moderate-to-large number of events,

\[
Z \overset{\cdot}{\sim} N(0, 1).
\]

Equivalently, one often uses

\[
\chi^2 = Z^2 \sim \chi^2_1
\]

approximately, with 1 degree of freedom (since comparing two groups). The two-sided
p-value is

\[
p = 2\bigl(1 - \Phi(|Z|)\bigr) \quad \text{or} \quad p = 1 - F_{\chi^2_1}(Z^2),
\]

where Φ is the standard normal CDF and $F_{\chi^2_1}$ is the chi-square CDF with 1 df. If p is
below a chosen significance level (e.g. 0.05), reject $H_0$ and conclude the survival curves differ.
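The two p-value formulas give the same number, because the square of a standard normal variable is chi-square with 1 df. A short check using only the standard library follows; both tail probabilities are written with the complementary error function, and the value of Z is hypothetical.

```python
import math

def two_sided_p_from_z(z):
    """p = 2 * (1 - Phi(|z|)), written with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def p_from_chi2_1df(x):
    """p = 1 - F(x) for a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

z = -1.7  # hypothetical value of the log-rank Z statistic
p_normal = two_sided_p_from_z(z)
p_chi2 = p_from_chi2_1df(z * z)
print(f"two-sided normal p = {p_normal:.6f}, chi-square p = {p_chi2:.6f}")
```

The sign of Z is lost when squaring, so the chi-square form reports only that the groups differ; the direction has to be read from the sign of $U_1$ or from the Kaplan–Meier curves.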

3.7 Role of Contingency Tables at Each Event Time

At each event time $t_{(j)}$, we can tabulate:

Table 1: Contingency table at event time $t_{(j)}$.

           Event at t(j)    No Event (still at risk)    Total at risk
Group 1    $O_{1,j}$        $N_{1,j} - O_{1,j}$         $N_{1,j}$
Group 2    $O_{2,j}$        $N_{2,j} - O_{2,j}$         $N_{2,j}$
Total      $O_j$            $N_j - O_j$                 $N_j$

From this table:

\[
E_{1,j} = O_j \, \frac{N_{1,j}}{N_j}, \qquad
V_{1,j} = \frac{N_{1,j} N_{2,j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}.
\]

Then accumulate $\sum_j (O_{1,j} - E_{1,j})$ and $\sum_j V_{1,j}$ over all event times.

4 Kaplan–Meier Curves and Plotting
To visualize survival (or disclosure-time) differences between two groups, plot the Ka-
plan–Meier curves:
\[
\hat{S}_g(t) = \prod_{t_{(j)} \le t} \left(1 - \frac{d_{g,j}}{n_{g,j}}\right), \qquad g = 1, 2,
\]

where ng,j and dg,j are the risk set size and number of events in group g at time t(j) . On
a single plot:
• The horizontal axis is time t (e.g., time until disclosure, in days/months).

• The vertical axis is estimated survival probability S(t).

• Each group’s curve is drawn as a step function.

• Tick marks indicate censored observations.

• Add legend identifying Group 1 (2015–2020) vs. Group 2 (2020–2025).

• Optionally include confidence bands (e.g., Greenwood’s formula).

Figure 1: Example of Kaplan–Meier survival curves.

Interpretation: If one curve lies consistently below another (i.e., drops faster), it
indicates shorter times to event (e.g., faster disclosures). However, formal inference uses
the log-rank test.

5 Real-Life Example: Court Case Disclosure Times

5.1 Context and Data Structure
We compare two cohorts of court cases:

• Group 1: Cases initiated during 2015–2020.

• Group 2: Cases initiated during 2020–2025.

Event of interest: disclosure of the case. For each case:

• caseID: unique identifier

• group: 1 if initiated in 2015–2020, 2 if in 2020–2025

• time: duration from initiation until disclosure or censoring (in appropriate units,
e.g. days or months)

• status: 1 if disclosure observed, 0 if censored (e.g. case still pending by end of
study, withdrawn, lost to follow-up)

Example of a small data table:

caseID group time status

C001 1 12.5 1
C002 1 30.0 0
C003 2 8.0 1
C004 2 15.2 1
C005 2 24.0 0
...

Here, time might be in months. A row with status = 0 indicates a right-censored
observation at that time.

5.2 Handling Censoring

Censoring may occur if:

• The case remains undisclosed by the study end date (e.g., last follow-up at December
2025).

• The case is withdrawn or dismissed without a disclosure event.

• Data are lost or the subject drops out, although in legal data censoring typically
means the case is still pending at the cutoff.

Censored cases contribute to risk sets up to their censoring time and then are removed
thereafter.

5.3 Kaplan–Meier Estimation

Compute Kaplan–Meier curves separately for each group:

\[
\hat{S}_1(t), \qquad \hat{S}_2(t).
\]

Software (e.g., R’s survival::survfit(), Python lifelines, or other) can produce esti-
mates and plots. The plot helps visualize any separation between curves.

5.4 Applying the Log-Rank Test
5.4.1 Step-by-Step Computation
1. Pool all observed event times across both groups; order the distinct times
$t_{(1)} < \cdots < t_{(J)}$.
2. For each event time $t_{(j)}$:
• Determine $N_{1,j}$: number of Group 1 cases with $T_i \ge t_{(j)}$.
• Determine $N_{2,j}$: number of Group 2 cases with $T_i \ge t_{(j)}$.
• Determine $O_{1,j}$: number of Group 1 cases with event exactly at $t_{(j)}$ ($\delta_i = 1$).
• Determine $O_{2,j}$: number of Group 2 cases with event at $t_{(j)}$.
• Compute totals: $N_j = N_{1,j} + N_{2,j}$, $O_j = O_{1,j} + O_{2,j}$.
• Compute expected event count in Group 1:
\[
E_{1,j} = O_j \, \frac{N_{1,j}}{N_j}.
\]
• Compute variance:
\[
V_{1,j} = \frac{N_{1,j} N_{2,j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}.
\]
• Record the contribution $O_{1,j} - E_{1,j}$ and $V_{1,j}$.
3. Sum contributions:
\[
U_1 = \sum_{j=1}^{J} (O_{1,j} - E_{1,j}), \qquad \sigma^2 = \sum_{j=1}^{J} V_{1,j}.
\]
4. Compute test statistic:
\[
Z = \frac{U_1}{\sqrt{\sigma^2}}, \qquad \chi^2 = Z^2.
\]
5. Obtain p-value:
\[
p = 1 - F_{\chi^2_1}(\chi^2).
\]
6. Decision: if p < α (e.g. 0.05), reject $H_0$ and conclude the disclosure-time distributions
differ.
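The six steps above can be sketched directly in Python. This is a minimal illustration that favours readability over speed; the function name `logrank_two_group` is hypothetical, groups are assumed to be coded 1 and 2, and the demonstration uses only the five rows visible in the truncated table of Section 5.1, so the numbers carry no substantive meaning.

```python
import math

def logrank_two_group(times, events, groups):
    """Two-group log-rank test following steps 1-6; groups are coded 1 and 2."""
    data = list(zip(times, events, groups))
    u1 = 0.0   # running sum of O_1j - E_1j
    var = 0.0  # running sum of V_1j
    for tj in sorted({t for t, e, _ in data if e == 1}):          # step 1
        n1 = sum(1 for t, _, g in data if g == 1 and t >= tj)     # step 2
        n2 = sum(1 for t, _, g in data if g == 2 and t >= tj)
        o1 = sum(1 for t, e, g in data if g == 1 and e == 1 and t == tj)
        o2 = sum(1 for t, e, g in data if g == 2 and e == 1 and t == tj)
        n, o = n1 + n2, o1 + o2
        u1 += o1 - o * n1 / n
        if n > 1:  # with a single subject at risk the variance term is zero
            var += n1 * n2 * o * (n - o) / (n * n * (n - 1))
    z = u1 / math.sqrt(var)                  # steps 3-4 (requires var > 0)
    chi2 = z * z
    p = math.erfc(math.sqrt(chi2 / 2.0))     # step 5: chi-square tail, 1 df
    return u1, var, z, chi2, p

# Rows C001-C005 from the example table (time in months)
times = [12.5, 30.0, 8.0, 15.2, 24.0]
events = [1, 0, 1, 1, 0]
groups = [1, 1, 2, 2, 2]
u1, var, z, chi2, p = logrank_two_group(times, events, groups)
print(f"U1={u1:.4f}  Var={var:.4f}  Z={z:.4f}  chi2={chi2:.4f}  p={p:.4f}")
```

With so few events the chi-square approximation is not trustworthy; the call is shown only to make the bookkeeping concrete.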

5.4.2 Illustrative Contingency Table at a Given Time

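Suppose at time $t_{(j)} = 10$ months:

\[
N_{1,j} = 50, \quad N_{2,j} = 60, \quad O_{1,j} = 5, \quad O_{2,j} = 8.
\]

Then

\[
N_j = 110, \quad O_j = 13, \quad
E_{1,j} = 13 \times \frac{50}{110} = \frac{650}{110} \approx 5.909,
\]

\[
V_{1,j} = \frac{50 \times 60 \times 13 \times (110 - 13)}{110^2 \times 109}
= \frac{50 \times 60 \times 13 \times 97}{110^2 \times 109} \approx 2.868.
\]

The contribution to $U_1$ is then 5 − 5.909 = −0.909, which is added to the running sum;
$V_{1,j} \approx 2.868$ is added to the running variance in the same way.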
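The arithmetic for this single table can be reproduced with exact fractions, which avoids any rounding questions. The variable names below are chosen for this sketch only.

```python
from fractions import Fraction

n1, n2, o1, o2 = 50, 60, 5, 8        # counts at t_(j) = 10 months
n, o = n1 + n2, o1 + o2              # N_j = 110, O_j = 13

e1 = Fraction(o * n1, n)                                   # expected Group 1 events
v1 = Fraction(n1 * n2 * o * (n - o), n * n * (n - 1))      # hypergeometric variance
contribution = o1 - e1                                     # O_1j - E_1j

print(f"E1 = {e1} = {float(e1):.3f}")
print(f"V1 = {v1} = {float(v1):.3f}")
print(f"O1 - E1 = {contribution} = {float(contribution):.3f}")
```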

5.5 Interpretation of Results
After computing the overall statistic:
\[
\chi^2 = \frac{\Bigl( \sum_j (O_{1,j} - E_{1,j}) \Bigr)^2}{\sum_j V_{1,j}},
\]

suppose we obtain $\chi^2 = 5.12$. Then with 1 df:

\[
p = 1 - F_{\chi^2_1}(5.12) \approx 0.0236.
\]

Since p < 0.05, we reject H0 and conclude that disclosure times differ between 2015–2020
and 2020–2025. One might then inspect which group has shorter times: if Group 2’s
Kaplan–Meier curve drops earlier, it indicates faster disclosures in 2020–2025 compared
to 2015–2020. In reporting:

“The log-rank test comparing disclosure times yields χ2 (1) = 5.12, p = 0.024,
indicating a statistically significant difference in disclosure-time distributions
between the two time periods. The Kaplan–Meier curves (Figure 1) show
that cases initiated in 2020–2025 tend to be disclosed earlier than those in
2015–2020.”

If instead p ≥ 0.05, we fail to reject H0: the data do not show a statistically significant
difference between the two periods.

5.6 Software Implementation Notes

• R:

    library(survival)
    # Assume df has columns: time, status (1 = event, 0 = censored), group (factor with two levels)
    fit <- survfit(Surv(time, status) ~ group, data = df)
    plot(fit, col = c("blue", "red"), lty = 1:2, xlab = "Time", ylab = "Survival Probability")
    legend("topright", legend = levels(df$group), col = c("blue", "red"), lty = 1:2)
    # Log-rank test:
    lr <- survdiff(Surv(time, status) ~ group, data = df)
    print(lr)
    # survdiff reports the chi-square statistic and its p-value

• Python (lifelines):

    import matplotlib.pyplot as plt
    from lifelines import KaplanMeierFitter
    from lifelines.statistics import logrank_test

    # Assume df is a pandas DataFrame with columns:
    # time, status (1 = event, 0 = censored), group (1 or 2)
    mask1 = df['group'] == 1
    mask2 = df['group'] == 2

    # Fit one Kaplan-Meier curve per group
    kmf1 = KaplanMeierFitter()
    kmf2 = KaplanMeierFitter()
    kmf1.fit(df.loc[mask1, 'time'], df.loc[mask1, 'status'], label='Group1')
    kmf2.fit(df.loc[mask2, 'time'], df.loc[mask2, 'status'], label='Group2')

    # Draw both curves on the same axes
    ax = kmf1.plot_survival_function()
    kmf2.plot_survival_function(ax=ax)
    plt.xlabel('Time')
    plt.ylabel('Survival Probability')
    plt.title('Kaplan-Meier Curves: Disclosure Times')
    plt.show()

    # Log-rank:
    results = logrank_test(
        df.loc[mask1, 'time'], df.loc[mask2, 'time'],
        event_observed_A=df.loc[mask1, 'status'],
        event_observed_B=df.loc[mask2, 'status']
    )
    print(results.test_statistic, results.p_value)

• Ensure correct coding of group indicator and status. Check assumptions: non-
informative censoring, independence, etc.
• For small sample sizes or few events, large-sample approximations may be poor;
consider exact methods or permutation if feasible.

6 Full Mathematical Derivations

6.1 Kaplan–Meier Product-Limit Formula
Let $t_{(1)} < t_{(2)} < \cdots < t_{(J)}$ be ordered event times. At $t_{(j)}$:

\[
n_j = \#\{i : T_i \ge t_{(j)}\}, \qquad d_j = \#\{i : T_i = t_{(j)},\ \delta_i = 1\}.
\]

Then

\[
\hat{S}(t) = \prod_{t_{(j)} \le t} \left(1 - \frac{d_j}{n_j}\right).
\]

Derivation: The conditional probability of surviving beyond t(j) given survival just prior
is 1 − dj /nj . Multiply across event times (product-limit).
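The product-limit step can be written out as a chain of conditional probabilities. The sketch below assumes no events occur strictly between consecutive event times, so that $\Pr(\tau \ge t_{(j)}) = S(t_{(j-1)})$, and then applies the recursion to the Group 1 rows of the hypothetical dataset in Section 8.1 (events at 5, 8, and 15 months, one case censored at 12 months):

```latex
\[
S(t_{(j)}) = \Pr(\tau > t_{(j)} \mid \tau \ge t_{(j)}) \, \Pr(\tau \ge t_{(j)})
           = \Pr(\tau > t_{(j)} \mid \tau \ge t_{(j)}) \, S(t_{(j-1)}),
\qquad
\widehat{\Pr}(\tau > t_{(j)} \mid \tau \ge t_{(j)}) = 1 - \frac{d_j}{n_j}.
\]
\[
\hat{S}(5) = 1 - \tfrac{1}{4} = \tfrac{3}{4}, \qquad
\hat{S}(8) = \tfrac{3}{4}\left(1 - \tfrac{1}{3}\right) = \tfrac{1}{2}, \qquad
\hat{S}(15) = \tfrac{1}{2}\left(1 - \tfrac{1}{1}\right) = 0 .
\]
```

The case censored at 12 months is in the risk set at 5 and 8 months but has left it by 15 months, which is why $n_j$ falls from 3 to 1 without a jump in the curve at 12.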

6.2 Log-Rank Test: Hypergeometric Model at Each Time

At a fixed event time t(j) :
• Nj = N1,j + N2,j subjects at risk.
• Oj = O1,j + O2,j events occur.
• Under H0 of equal hazard, each at-risk subject has equal instantaneous event prob-
ability. Hence, conditional on Oj total events among Nj , the number O1,j in Group
1 follows a hypergeometric distribution:

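\[
O_{1,j} \mid O_j \sim \text{Hypergeometric}\bigl(N_j, N_{1,j}, O_j\bigr),
\]

with mean and variance:

\[
E_{1,j} = O_j \, \frac{N_{1,j}}{N_j}, \qquad
\operatorname{Var}(O_{1,j}) = \frac{N_{1,j} N_{2,j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}.
\]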

Use of Hypergeometric Distribution in Log-Rank Test
At a fixed event time tj , suppose:

– Nj : total individuals at risk,

– N1j : individuals at risk in Group 1,
– N2j = Nj − N1j : individuals at risk in Group 2,
– Oj : total number of events observed,
– O1j : number of events observed in Group 1.

Under the null hypothesis H0 (equal hazard rates between groups), all at-risk indi-
viduals have the same probability of experiencing the event. Hence, the assignment
of the Oj events to the two groups is random and without replacement.
This follows the hypergeometric distribution:

\[
O_{1j} \mid O_j \sim \text{Hypergeometric}(N_j, N_{1j}, O_j)
\]

With:

\[
E(O_{1j}) = O_j \cdot \frac{N_{1j}}{N_j}, \qquad
\operatorname{Var}(O_{1j}) = \frac{N_{1j} N_{2j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}.
\]
This distribution is used because:

– It models the probability of observing a specific number of events in a subgroup
(Group 1),
– From a fixed population (at-risk individuals) with a fixed number of events,
– Under random allocation (equal hazard assumption),
– Without replacement (each event occurs only once).

Summing over times yields aggregate mean zero, variance sum of per-time variances
(approximate independence across times).
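The stated mean and variance can be confirmed by summing over the hypergeometric probability mass function directly. The sketch below uses the counts from the illustrative table in Section 5.4.2 (110 at risk, 50 of them in Group 1, 13 events); the helper name `hypergeom_pmf` is chosen for this example.

```python
from math import comb

def hypergeom_pmf(k, total, group1, draws):
    """P(O_1j = k): k of the `draws` events fall among `group1` of `total` at risk."""
    return comb(group1, k) * comb(total - group1, draws - k) / comb(total, draws)

N, N1, O = 110, 50, 13
support = range(max(0, O - (N - N1)), min(O, N1) + 1)

mean = sum(k * hypergeom_pmf(k, N, N1, O) for k in support)
var = sum((k - mean) ** 2 * hypergeom_pmf(k, N, N1, O) for k in support)

closed_mean = O * N1 / N
closed_var = N1 * (N - N1) * O * (N - O) / (N ** 2 * (N - 1))
print(f"mean: {mean:.6f} vs {closed_mean:.6f}")
print(f"var:  {var:.6f} vs {closed_var:.6f}")
```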

6.3 Asymptotic Distribution

Under regularity conditions and moderate-to-large total number of events, by central
limit theorem for sums of independent (or weakly dependent) increments,
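\[
U_1 = \sum_{j=1}^{J} (O_{1,j} - E_{1,j}) \approx N\Bigl(0, \sum_{j=1}^{J} V_{1,j}\Bigr).
\]

Thus

\[
Z = \frac{U_1}{\sqrt{\sum_j V_{1,j}}} \approx N(0, 1), \qquad Z^2 \approx \chi^2_1.
\]

Hence use the $\chi^2_1$ distribution for hypothesis testing.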

6.4 Connection to Chi-Square Goodness-of-Fit
The log-rank test can be viewed as a generalized chi-square test comparing observed vs.
expected counts across the risk sets over time. At each event time, the contingency table
yields a 2×2 comparison; aggregating over times accumulates evidence of departure from
the null pattern, in which events fall in each group in proportion to its share of the risk
set. The final statistic is the square of the summed (observed minus expected) deviations
divided by the summed variance, analogous to a chi-square statistic, yielding an overall
chi-square with 1 df for two groups.

7 Contingency Tables and Aggregation

7.1 At Each Time t(j)
As in Table 1, for each event time form the 2×2:

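           Event            No Event (still at risk)    Total at risk
Group 1    $O_{1,j}$        $N_{1,j} - O_{1,j}$         $N_{1,j}$
Group 2    $O_{2,j}$        $N_{2,j} - O_{2,j}$         $N_{2,j}$
Total      $O_j$            $N_j - O_j$                 $N_j$

Compute:

\[
E_{1,j} = O_j \, \frac{N_{1,j}}{N_j}, \qquad
V_{1,j} = \frac{N_{1,j} N_{2,j} O_j (N_j - O_j)}{N_j^2 (N_j - 1)}.
\]

Define

\[
d_{1,j} = O_{1,j}, \qquad d_{2,j} = O_{2,j}.
\]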

7.2 Aggregating Across Times

We accumulate:
\[
U_1 = \sum_{j=1}^{J} (d_{1,j} - E_{1,j}), \qquad \operatorname{Var}(U_1) = \sum_{j=1}^{J} V_{1,j}.
\]

Test statistic $Z = U_1 / \sqrt{\operatorname{Var}(U_1)}$, or $\chi^2 = Z^2$.

8 Example Walk-Through (Hypothetical Data)

Below is a stylized example with a small number of cases for illustration. In practice, one
would use software.

8.1 Hypothetical Dataset

Table 2: Example data: court case disclosure times

caseID group time (months) status

C001 1 5.0 1
C002 1 8.0 1
C003 1 12.0 0
C004 1 15.0 1
C005 2 4.0 1
C006 2 9.0 1
C007 2 11.0 0
C008 2 14.0 1

8.2 Event Times and Risk Sets

Distinct event times: 4.0, 5.0, 8.0, 9.0, 14.0, 15.0 (months). For each:

• At t = 4.0: $N_1 = 4$ (cases C001–C004 at risk), $N_2 = 4$ (C005–C008 at risk);
one event in Group 2, so $O_2 = 1$, $O_1 = 0$. Compute $E_1 = 1 \times \frac{4}{8} = 0.5$, etc.

• At t = 5.0: adjust risk sets removing those with event at 4.0; and so on.

Compute each time’s table, accumulate $U_1$ and $\sum_j V_{1,j}$. Finally compute Z.
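The per-time tables for Table 2 can be listed with exact fractions, which makes each row easy to check by hand. This is an illustrative sketch; the tuple layout and variable names are assumptions of the example.

```python
from fractions import Fraction

cases = [  # (time in months, status, group), copied from Table 2
    (5, 1, 1), (8, 1, 1), (12, 0, 1), (15, 1, 1),
    (4, 1, 2), (9, 1, 2), (11, 0, 2), (14, 1, 2),
]

rows = []
for tj in sorted({t for t, s, _ in cases if s == 1}):
    n1 = sum(1 for t, _, g in cases if g == 1 and t >= tj)   # Group 1 at risk
    n2 = sum(1 for t, _, g in cases if g == 2 and t >= tj)   # Group 2 at risk
    o1 = sum(1 for t, s, g in cases if g == 1 and s == 1 and t == tj)
    o = sum(1 for t, s, _ in cases if s == 1 and t == tj)
    n = n1 + n2
    e1 = Fraction(o * n1, n)
    # With one subject left at risk the variance contribution is zero
    v1 = Fraction(n1 * n2 * o * (n - o), n * n * (n - 1)) if n > 1 else Fraction(0)
    rows.append((tj, n1, n2, o1, e1, v1))
    print(f"t={tj:>2}  N1={n1}  N2={n2}  O1={o1}  E1={e1}  V1={v1}")

u1 = sum(o1 - e1 for _, _, _, o1, e1, _ in rows)
v_total = sum(v1 for *_, v1 in rows)
z = float(u1) / float(v_total) ** 0.5
print(f"U1 = {u1}, sum V = {v_total}, Z = {z:.4f}")
```

The last event time (15 months) has a single case at risk, so its observed and expected counts coincide and it adds nothing to either sum.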

8.3 Interpreting Hypothetical Outcome

Carrying the computation through for Table 2 gives U1 ≈ −0.471 and a summed variance
of about 1.235, so Z ≈ −0.424 (|Z| = 0.4242), χ2 ≈ 0.18, and p ≈ 0.67. We therefore find no
significant difference in disclosure times between groups in this hypothetical dataset.

9 Practical Considerations
• Ties: If multiple events occur at exactly same time, different methods exist (exact,
Breslow, Efron) in Cox model context. For the log-rank test, ties are handled
by treating Oj as total events at that time; hypergeometric model handles ties
naturally.

• Proportional hazards assumption: The log-rank test is most powerful when
hazards are proportional (the ratio of hazard functions between groups is constant
over time). If hazards cross, log-rank may have low power or misleading results;
consider alternative tests (e.g., weighted log-rank tests, Fleming–Harrington).

• Censoring patterns: If censoring differs markedly between groups, interpretation
must consider potential bias; the test assumes non-informative censoring.

• Sample size and number of events: The approximation to chi-square is bet-
ter with larger number of observed events. With few events, consider exact or
permutation tests if feasible.

• Multiple groups: For more than two groups, one can extend log-rank to a multi-
sample test, leading to χ2k−1 distribution for k groups. Here we focus on two-group
case (1 df).
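For very small samples, such as the eight cases of Table 2, an exact permutation version of the test is feasible: enumerate every way of assigning the Group 1 label to four of the eight cases and count how often |Z| is at least as large as observed. The sketch below is one way to do this and is not taken from any package; it assumes the group labels are exchangeable under H0, which also requires similar censoring patterns in both groups.

```python
import math
from itertools import combinations

def logrank_z(times, events, in_group1):
    """Log-rank Z statistic; in_group1 is a list of booleans."""
    u1 = var = 0.0
    for tj in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [g for t, g in zip(times, in_group1) if t >= tj]
        n, n1 = len(at_risk), sum(at_risk)
        o = sum(1 for t, e in zip(times, events) if e and t == tj)
        o1 = sum(1 for t, e, g in zip(times, events, in_group1) if e and g and t == tj)
        u1 += o1 - o * n1 / n
        if n > 1:
            var += n1 * (n - n1) * o * (n - o) / (n * n * (n - 1))
    return u1 / math.sqrt(var) if var > 0 else 0.0

# Table 2: cases C001-C008 (time in months, status, Group 1 membership)
times = [5, 8, 12, 15, 4, 9, 11, 14]
events = [1, 1, 0, 1, 1, 1, 0, 1]
labels = [True, True, True, True, False, False, False, False]

observed = abs(logrank_z(times, events, labels))
extreme = total = 0
for chosen in combinations(range(len(times)), sum(labels)):
    relabelled = [i in chosen for i in range(len(times))]
    total += 1
    # small tolerance so the observed labelling always counts as a tie with itself
    if abs(logrank_z(times, events, relabelled)) >= observed - 1e-12:
        extreme += 1

p_exact = extreme / total
print(f"{extreme} of {total} labellings are at least as extreme; p = {p_exact:.3f}")
```

With larger samples full enumeration becomes infeasible, and a random subset of relabellings is typically used instead.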

10 References
[1] Klein, J. P., & Moeschberger, M. L. (2003). Survival Analysis: Techniques for
Censored and Truncated Data. Springer.

[2] Collett, D. (2003). Modelling Survival Data in Medical Research. CRC Press.

[3] Hosmer, D. W., Lemeshow, S., & May, S. (2008). Applied Survival Analysis:
Regression Modeling of Time-to-Event Data. Wiley.

[4] Therneau, T. M., & Grambsch, P. M. (2000). Modeling Survival Data: Extending the
Cox Model. Springer.

Chapter Three
No ratings yet
Chapter Three
10 pages
Survival Analysis-Debby Raden
No ratings yet
Survival Analysis-Debby Raden
98 pages
Survival Analysis Theory 2024-4
No ratings yet
Survival Analysis Theory 2024-4
49 pages
MAS361 SurvAnalysis 2024-25
No ratings yet
MAS361 SurvAnalysis 2024-25
43 pages
Survival Analysis for Biostatisticians
No ratings yet
Survival Analysis for Biostatisticians
86 pages
Time To Event Data
No ratings yet
Time To Event Data
48 pages
Introduction To Survival Analysis in R
No ratings yet
Introduction To Survival Analysis in R
46 pages
Chap1 PDF
No ratings yet
Chap1 PDF
12 pages
Survival Analysis for Academics
No ratings yet
Survival Analysis for Academics
133 pages
Survival Analysis
No ratings yet
Survival Analysis
6 pages
Introduction To Survival Analysis: Kristin Sainani Ph.D. Stanford University Department of Health Research and Policy
No ratings yet
Introduction To Survival Analysis: Kristin Sainani Ph.D. Stanford University Department of Health Research and Policy
70 pages
H (T) /frac (F (T) ) (S (T) ) H (T) : Survival Analysis
No ratings yet
H (T) /frac (F (T) ) (S (T) ) H (T) : Survival Analysis
2 pages
Longitudinal EBM-CAS Introduction To Survival Analysis and Log-Rank Test-Dr. Vicka Oktaria, MPH, PH.D (2023)
No ratings yet
Longitudinal EBM-CAS Introduction To Survival Analysis and Log-Rank Test-Dr. Vicka Oktaria, MPH, PH.D (2023)
24 pages
FALLSEM2021-22 CSI3004 ETH VL2021220104123 Reference Material I 29-11-2021 Patient Survival Analysis
No ratings yet
FALLSEM2021-22 CSI3004 ETH VL2021220104123 Reference Material I 29-11-2021 Patient Survival Analysis
70 pages
Week 1 - Introduction
No ratings yet
Week 1 - Introduction
48 pages
Kaplan-Meier Survival Curves and The Log-Rank Test
No ratings yet
Kaplan-Meier Survival Curves and The Log-Rank Test
42 pages
M1 Survival
No ratings yet
M1 Survival
46 pages
Survival/Event History Analysis: BSC, MSC Biostatistics and Health Informatics
No ratings yet
Survival/Event History Analysis: BSC, MSC Biostatistics and Health Informatics
27 pages
Advanced Survival Analysis Guide
No ratings yet
Advanced Survival Analysis Guide
71 pages
Survival Analysis