Q-Q Plots, Likelihood, and MLE
• Q-Q Plots (understanding, interpreting, assessing): evaluate model fit with Q-Q plots.
• Likelihood (defining, interpreting, log-likelihood): understand likelihood as a measure of model fit.
• MLE (properties, MLE in survival analysis, parameter estimation): apply MLE to estimate survival model parameters.
Imagine a clinical trial testing a new drug for a
specific type of cancer. Researchers are
interested in understanding how long patients
survive after starting the treatment.
In a cancer drug trial, how can researchers analyze patient survival time?
Approaches to Survival Analysis
Non-Parametric Models (Kaplan-Meier Estimator):
• Assumptions: Minimal assumptions, making it suitable for exploratory
analysis.
• Advantages: Provides a non-parametric estimate of the survival function.
• Disadvantages: Less powerful for comparing groups or identifying risk
factors.
Semi-Parametric Models (Cox Proportional Hazards Model)
• Assumptions: Does not assume a specific distribution for survival
times but assumes proportional hazards.
• Advantages: Flexible and robust to distributional assumptions.
• Disadvantages: Limited to proportional hazards assumption.
The topic of today's lecture is
Parametric Survival Models
These models assume a specific distribution for survival times, such
as exponential, Weibull, or log-normal. This assumption allows for
precise estimation of survival probabilities and confidence intervals.
However, it's crucial to correctly specify the distribution, as incorrect
assumptions can lead to biased results.
Q-Q Plots in Survival Analysis
What is a Q-Q Plot?
A Q-Q plot (Quantile-Quantile plot) is a graphical tool used to
assess if a sample comes from a particular theoretical
distribution. It compares the quantiles of the observed data to
the quantiles of the theoretical distribution.
Used to assess if a dataset follows a specified distribution
(e.g., normal, Weibull).
Quantile
A quantile is a value that divides a dataset or probability distribution into equal-sized intervals. It helps to understand the distribution of data by marking specific points on the scale.
For example, in a dataset:
• Median (50th percentile) is a quantile that splits data into two equal halves.
• Quartiles (like the 25th and 75th percentiles) divide data into four equal parts.
• Percentiles divide data into 100 equal parts, where the 90th percentile, for
instance, indicates that 90% of the data lies below this value.
Quantiles are widely used in statistics to assess data distribution, compare
distributions, and identify outliers.
Q-Q Plot
The x-axis represents the theoretical quantiles, and the y-axis
represents the sample quantiles.
Purpose:
• Assessing Distribution Fit:
⚬ Used to assess if a dataset follows
a specified theoretical distribution
(e.g., normal, Weibull, log-normal).
⚬ Helps to visually check the
goodness-of-fit for the chosen
distribution.
How to Interpret a Q-Q Plot:
• If the points on the Q-Q plot roughly follow a straight line, it suggests
that the data may follow the assumed distribution.
• Deviations from the straight line indicate potential departures from
the assumed distribution.
• From the straight-line equation we determine the best-fit line for the data:
y = bx + a
How to draw a Q-Q plot to assess if the data follows the assumed distribution
1. *Sort the data*: Begin by sorting your data in ascending order. This step is essential for calculating the quantiles.
2. *Calculate the quantiles of the distribution*: Determine the theoretical quantiles of the assumed distribution you wish to
compare your data against. Common choices include the quantiles of a normal or uniform distribution, depending on the context.
3. *Calculate the quantiles of the data*: Compute the quantiles of your sorted data. These are essentially the data points ordered
in such a way that they can be compared to the theoretical quantiles.
4. *Plot the quantiles*: On a scatter plot, plot the calculated quantiles of your data against the theoretical quantiles of the
assumed distribution. The x-axis would show the quantiles of the assumed distribution, while the y-axis would display the
quantiles of your data.
5. *Interpret the plot*: If the points on the Q-Q plot fall along a straight line, it indicates that your data closely follows the assumed
distribution. Deviations from a straight line suggest departures from the assumed distribution.
6. *Assess the goodness of fit*: Analyze the Q-Q plot. If the points follow a straight line closely, it suggests a good fit to the assumed distribution. Any curvature, outliers, or deviations indicate that the data may not conform to the expected distribution.
1-Straight Line (Good Fit)
• If the points follow the 45-degree line closely,
your data likely follows the assumed
distribution.
• Small random deviations around the line are
normal but should not form any specific
pattern.
3-S-Shaped Curve (Light or Heavy Tails)
• Upward curve (concave): Indicates heavy
tails. Your data has more extreme values
than expected for the theoretical distribution.
• Downward curve (convex): Indicates light
tails. Your data has fewer extreme values
than expected.
Example:
Does the following sample come from a normally distributed population?
3.89 4.75 6.33 4.75 7.21 5.78 5.80 5.20 7.90
• First, order the data from smallest to largest:
3.89 4.75 4.75 5.20 5.78 5.80 6.33 7.21 7.90
• Then plot these values against the appropriate quantiles of the standard normal distribution.
• Divide the distribution into n + 1 = 10 equal areas (here n = 9).
• Find the theoretical quantiles: from the standard normal table, find the values of a standard normal random variable that cut off these areas, and mark them on the x-axis.
Identify the sample quantiles corresponding to the 25th and 75th percentiles:
• 25th percentile (Q1, sample): 4.75
• 75th percentile (Q3, sample): 6.33
Identify the theoretical quantiles corresponding to the 25th and 75th percentiles:
• 25th percentile (Q1, theoretical): −0.674
• 75th percentile (Q3, theoretical): 0.674
Calculate the slope (m):
m = (6.33 − 4.75) / (0.674 − (−0.674)) = 1.58 / 1.348 ≈ 1.17
Calculate the intercept (b):
b = 4.75 − 1.17 · (−0.674) = 4.75 + 0.79 ≈ 5.54
The equation of the line is:
y = m·x + b
Substituting m = 1.17 and b = 5.54:
y = 1.17·x + 5.54
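The slope and intercept above can be reproduced in a few lines of R (qnorm and quantile give the theoretical and sample quartiles):

```r
x <- sort(c(3.89, 4.75, 6.33, 4.75, 7.21, 5.78, 5.80, 5.20, 7.90))

q_sample <- quantile(x, c(0.25, 0.75))   # sample quartiles: 4.75 and 6.33
q_theo   <- qnorm(c(0.25, 0.75))         # theoretical quartiles: -0.674 and 0.674

m <- diff(q_sample) / diff(q_theo)       # slope, approx. 1.17
b <- q_sample[1] - m * q_theo[1]         # intercept, approx. 5.54

qqnorm(x)          # Q-Q plot against the standard normal
abline(b, m)       # add the quartile-based reference line
```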
1.Normal Distribution: The plot compares your data quantiles with the theoretical quantiles
of a normal distribution. The closer the points are to the red line, the more normally
distributed your data is.
2.Log-Normal Distribution: This plot uses a log-normal scale. It shows how well your data
matches a log-normal distribution.
3.Log-Logistic Distribution: The plot is based on the log-logistic distribution, using its
theoretical quantiles for comparison.
Likelihood Function
The likelihood function L(θ∣X) is a function of the parameters θ given the
observed data X. It represents the probability (or probability density) of
observing the data X under a specified model with parameters θ.
Role in Survival Analysis:
In survival analysis, the likelihood function is used to estimate the parameters
of the survival distribution.
• By maximizing the likelihood function, we find the parameter values that
are most likely to have generated the observed data.
Likelihood Function
2. Constructing the Likelihood Function
Suppose we have:
• A dataset X = {x1, x2, …, xn}, where each xi represents an observed data point.
• A probability distribution with parameter(s) θ that we want to estimate.
The likelihood function is the product of the probability densities (or probabilities, for
discrete data) of each observed data point, given the parameter θ:
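In symbols, assuming independent observations:

```latex
L(\theta \mid X) \;=\; \prod_{i=1}^{n} f(x_i;\, \theta)
```

Taking the logarithm turns this product into a sum, which is why the log-likelihood is used in practice.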
Likelihood Function
•Hazard Function h(x; θ):
- Represents the instantaneous risk of an event at time
x.
- For exponential distribution: h(x; θ) = λ.
•Survival Function S(x; θ):
- Represents the probability of survival up to time x.
- For exponential distribution: S(x; θ) = e^(-λx).
We will use these functions shortly. For uncensored data, note that the density factors as f(x; θ) = h(x; θ)·S(x; θ).
Censored Data: each type of censoring makes a specific contribution to the likelihood function.
• In survival analysis, data may be censored due to study limitations or event timing outside observation periods.
1. Right-Censored Data: the event happened after a certain time.
2. Left-Censored Data: the event happened before a certain time.
3. Interval-Censored Data: the event happened between two known times.
Notation:
• xi: uncensored event times (exact times of the event),
• yj: right-censored times (the event occurs after yj),
• zk: left-censored times (the event occurs before zk),
• (ai, bi): interval-censored times.
The likelihood function for a dataset with all types of censored data
is:
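With the notation above, the standard combined likelihood reads:

```latex
L(\theta) \;=\; \prod_{i=1}^{n_0} f(x_i;\theta)\;
\prod_{j=1}^{n_r} S(y_j;\theta)\;
\prod_{k=1}^{n_l} F(z_k;\theta)\;
\prod_{i=1}^{n_i} \bigl[F(b_i;\theta) - F(a_i;\theta)\bigr]
```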
where:
n0: Number of uncensored observations.
nr: Number of right-censored observations.
nl: Number of left-censored observations.
ni: Number of interval-censored
observations.
If any type of censoring is not present, we simply omit its factor from the likelihood.
Maximum likelihood estimation
•Definition:
•Maximum Likelihood Estimation (MLE) is a method to estimate model
parameters.
•Finds parameter values that maximize the probability of observing the data
under the model.
•Purpose of MLE:
•Identifies parameters that best fit the data within a specified model.
•How MLE Works:
•Step 1: Construct the likelihood function.
Maximum likelihood estimation
Step 2: Log-Likelihood Function:
To simplify the calculations, we take the logarithm of the likelihood function to obtain the log-
likelihood function:
Step 3: Taking the Derivative:
To find the MLE for θ, we differentiate the log-likelihood function with respect to θ and set it equal to zero:
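For reference, the two steps in symbols:

```latex
\ell(\theta) \;=\; \log L(\theta) \;=\; \sum_{i=1}^{n} \log f(x_i;\theta),
\qquad
\frac{d\,\ell(\theta)}{d\theta} \;=\; 0
```

A solution is a maximum when the second derivative of the log-likelihood is negative there.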
General example
Consider five subjects with different types of event-time or censoring information. We assume a probability density function f(t) and hazard function h(t) for the outcome.
Write down the contribution to the likelihood for each subject.
Then construct the full likelihood L(θ) by combining each individual contribution, using the hazard function h(t), survival function S(t), and cumulative distribution function F(t) as necessary.
Answer:-
• Alice experienced the event at time t = 3
• Bob was right-censored at time t = 7
• Chris experienced the event at time t = 4
• Dana was left-censored at time t = 2
• Erin was interval-censored between t = 5 and t = 9
(For each contribution we can use either f(t) or the equivalent form h(t)·S(t), depending on what the question provides.)
Answer:-
The full likelihood L(θ) is obtained by taking the product of each subject’s
independent contribution:
Expanding each term using the hazard and survival
functions:
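Writing out the product (a reconstruction of the elided slide equations, using f(t) = h(t)S(t) and F(t) = 1 − S(t)):

```latex
L(\theta) \;=\; f(3)\, S(7)\, f(4)\, F(2)\,\bigl[F(9) - F(5)\bigr]
\;=\; h(3)S(3)\cdot S(7)\cdot h(4)S(4)\cdot\bigl[1 - S(2)\bigr]\cdot\bigl[S(5) - S(9)\bigr]
```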
The Maximum Likelihood Estimation (MLE) for θ can then be obtained by maximizing this likelihood function or,
more commonly, the log-likelihood function based on these contributions.
This case is more complex than most because it includes all types of censoring; exam questions usually involve only one or two types.
Now we will talk about MLE under different survival distributions.
EXPONENTIAL
DISTRIBUTION
One-Parameter Exponential Distribution:
The one-parameter exponential distribution is a continuous probability
distribution that is often used to model the time between events. It has a
single parameter, typically denoted by λ (lambda), which represents the rate
parameter.
Mean = 1/λ
Density function: f(t) = λe^(-λt), t ≥ 0
Survival function: S(t) = e^(-λt)
Hazard function: h(t) = λ
Cumulative hazard function: H(t) = λt
Estimation of λ for Data without Censored Observations:
Suppose that there are n persons in the study and everyone is followed to death or failure
Let t1 , t2 ,..., tn be the exact survival times of the n people.
The likelihood
function:
the log-likelihood function:
the MLE of λ :
the MLE of mean :
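The elided formulas on this slide take the standard form:

```latex
L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda t_i} = \lambda^{n} e^{-\lambda \sum t_i},
\qquad
\ell(\lambda) = n\log\lambda - \lambda \sum_{i=1}^{n} t_i
```

Setting dℓ/dλ = n/λ − Σti = 0 gives

```latex
\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} t_i},
\qquad
\widehat{\text{mean}} = \frac{1}{\hat{\lambda}} = \bar{t}
```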
Confidence intervals
A large-sample 95% confidence interval for λ is λ̂ ± z(0.975)·λ̂/√n, and for the mean it is (1/λ̂) ± z(0.975)·(1/λ̂)/√n. (Exact intervals based on the chi-square distribution also exist; the normal approximation is used here.)
Example
Consider the following remission times in weeks for 21 patients with acute leukemia: 1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10, 12, 14, 16, 20, 24, and 34. Assume that remission duration follows the exponential distribution. Obtain:
(a) The MLE of λ (b) The MLE of the mean (c) The 95% confidence intervals for λ and the mean
(a) n = 21 and Σti = 198, so λ̂ = 21/198 ≈ 0.106.
(b) The MLE of the mean is 1/λ̂ = 198/21 ≈ 9.43 weeks.
(c) At significance level 0.05 (z = 1.96), the large-sample normal approximation gives 0.106 ± 1.96(0.106)/√21 ≈ (0.061, 0.151) for λ and 9.43 ± 1.96(9.43)/√21 ≈ (5.40, 13.46) weeks for the mean.
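These results can be checked in R (the large-sample normal interval below is one common choice; an exact chi-square interval would give slightly different limits):

```r
times <- c(1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 8, 8, 9, 10, 10, 12, 14, 16, 20, 24, 34)
n <- length(times)                   # 21 patients
lambda_hat <- n / sum(times)         # MLE of lambda: 21/198, approx. 0.106
mean_hat <- 1 / lambda_hat           # MLE of the mean: approx. 9.43 weeks

z <- qnorm(0.975)                    # 1.96
ci_lambda <- lambda_hat + c(-1, 1) * z * lambda_hat / sqrt(n)
ci_mean   <- mean_hat   + c(-1, 1) * z * mean_hat   / sqrt(n)
```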
Estimation of λ for Data with Censored Observations:
The likelihood function with right-censored observations:
the log-likelihood function:
MLE of the parameter :
the MLE of mean :
Confidence interval: use the same formulas as for uncensored data, replacing n (the sample size) with r (the number of uncensored observations).
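In standard form, with r uncensored times and the remaining times right-censored:

```latex
L(\lambda) = \prod_{i=1}^{r} \lambda e^{-\lambda t_i}\;
             \prod_{j=r+1}^{n} e^{-\lambda t_j^{+}}
           = \lambda^{r} e^{-\lambda \sum_{\text{all}} t},
\qquad
\hat{\lambda} = \frac{r}{\sum_{\text{all}} t},
\qquad
\widehat{\text{mean}} = \frac{\sum_{\text{all}} t}{r}
```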
Example
Suppose that in a laboratory experiment 10 mice are exposed to carcinogens. The experimenter decides to terminate the study after half of the mice are dead and to sacrifice the other half at that time. The survival times of the five dead mice are 4, 5, 8, 9, and 10 weeks. The survival data of the 10 mice are therefore 4, 5, 8, 9, 10, 10+, 10+, 10+, 10+, and 10+. Assume that the failure of these mice follows an exponential distribution.
Here λ̂ = r/Σti = 5/86 ≈ 0.0581 per week, and the probability of surviving a given time can be estimated from Ŝ(t) = e^(-λ̂t). For example, the probability that a mouse exposed to the same carcinogen will survive longer than 8 weeks is Ŝ(8) = e^(-0.0581 × 8) ≈ 0.629.
The probability of dying within 8 weeks is then 1 − 0.629 = 0.371.
Weibull Distribution
Weibull Distribution in Survival analysis
The Weibull distribution is widely used in survival analysis to model time-to-
event data, particularly when the event rate varies over time. This distribution
is flexible because it can model increasing, constant, or decreasing hazard
rates, depending on its shape parameter
Mean = Γ(1 + 1/𝛾)/λ
Density function: f(t) = λ𝛾(λt)^(𝛾−1) e^(−(λt)^𝛾), t ≥ 0
Survivorship function: S(t) = e^(−(λt)^𝛾)
Hazard function: h(t) = λ𝛾(λt)^(𝛾−1)
Cumulative hazard function: H(t) = (λt)^𝛾
Estimation of λ and γ for Data without Censored Observations:
Suppose that there are n persons in the study and everyone is followed to death or failure
Let t1 , t2 ,..., tn be the exact survival times of the n people.
The likelihood
function:
the log-likelihood function:
Maximum likelihood Estimation of λ ,𝛾
for Data without Censored Observations
The MLE of λ :
The MLE of 𝛾 :
The MLE of mean:
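With the parameterization f(t) = λγ(λt)^(γ−1) e^(−(λt)^γ), the elided equations are (a reconstruction):

```latex
\ell(\lambda,\gamma) = n\log(\lambda\gamma)
  + (\gamma-1)\sum_{i=1}^{n}\log(\lambda t_i)
  - \sum_{i=1}^{n} (\lambda t_i)^{\gamma}
```

Setting the partial derivatives to zero gives

```latex
\hat{\lambda} = \left(\frac{n}{\sum t_i^{\hat\gamma}}\right)^{1/\hat\gamma},
\qquad
\frac{n}{\hat\gamma} + \sum \log(\hat\lambda t_i)
  - \sum (\hat\lambda t_i)^{\hat\gamma}\log(\hat\lambda t_i) = 0,
\qquad
\widehat{\text{mean}} = \frac{\Gamma(1+1/\hat\gamma)}{\hat\lambda}
```

The equation for γ̂ has no closed form and must be solved numerically (e.g., Newton-Raphson).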
Example on the MLE Weibull Uncensored Data
Consider the following remission times in weeks for 21 patients with acute
leukemia:1,1,2,2,3,4,4,5,5,6,8,8,9,10,10,12,14,16, 20, 24, and 34. Assume that remission duration
follows the Weibull Distribution. obtain:
(a) The MLE of λ (b) The MLE of 𝛾 (c) The MLE of mean
The second derivative in Newton-Raphson:
• Provides information on the curvature of the log-likelihood function.
• Helps adjust step size and direction to improve convergence.
• Reduces the number of iterations needed for convergence,
especially in cases with well-behaved functions.
Since 𝛾̂ is slightly greater than 1, the estimated hazard increases slightly over time (for 𝛾 = 1 the Weibull reduces to the exponential with constant hazard).
Now let's apply the estimated parameters to S(t).
After calculating S(8)=0.6472, it means that there is a 64.72% probability that a patient will remain in
remission for at least 8 weeks. In other words, 64.72% of the patients are expected to survive beyond the
8-week mark.
Estimation of λ and γ for Data with Censored Observations:
Suppose that there are n persons in the study; some are followed to death or failure and the others are censored.
Let t1, t2, ..., tn be the observed survival or censoring times of the n people.
The likelihood
function:
Let δi be the censoring indicator, where δi = 1 if the event was observed (uncensored) and δi = 0 if the observation is censored.
the log-likelihood function:
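With the censoring indicator δi, the elided formulas take the standard form:

```latex
L(\lambda,\gamma) = \prod_{i=1}^{n} f(t_i)^{\delta_i}\, S(t_i)^{1-\delta_i},
\qquad
\ell(\lambda,\gamma) = \sum_{i=1}^{n} \delta_i
  \bigl[\log(\lambda\gamma) + (\gamma-1)\log(\lambda t_i)\bigr]
  - \sum_{i=1}^{n} (\lambda t_i)^{\gamma}
```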
Maximum likelihood Estimation of λ ,𝛾
for Data with Censored Observations
The MLE of λ :
The MLE of 𝛾 :
The MLE of mean: Calculating the exact mean can be more complex and generally requires
numerical integration. The mean will depend on the censoring level and the
distribution of the censored observations, as well as the parameter estimates λ, 𝛾
Example on the MLE Weibull Censored Data
Suppose that in a laboratory experiment 10 mice are exposed to carcinogens. The experimenter decides to terminate the study after half of the mice are dead and to sacrifice the other half at that time. The survival times of the five dead mice are 4, 5, 8, 9, and 10 weeks. The survival data of the 10 mice are therefore 4, 5, 8, 9, 10, 10+, 10+, 10+, 10+, and 10+. Assuming that the failure of these mice follows a Weibull distribution, obtain:
(a) The MLE of λ (b) The MLE of 𝛾 (c) The MLE of mean
where r is the number of uncensored data
points.
Step 3 Partial Derivative with Respect to λ:
Setting the derivative to 0 we get
Step 4 Partial Derivative with Respect to 𝛾:
Setting the derivative to 0 we get
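With the parameterization above, the elided estimating equations are (a reconstruction; the (λt)^γ sums run over all n times, the others over the r uncensored times):

```latex
\hat{\lambda} = \left(\frac{r}{\sum_{\text{all}} t_i^{\hat\gamma}}\right)^{1/\hat\gamma},
\qquad
\frac{r}{\hat\gamma} + \sum_{\text{unc}} \log(\hat\lambda t_i)
 - \sum_{\text{all}} (\hat\lambda t_i)^{\hat\gamma} \log(\hat\lambda t_i) = 0
```

These reproduce the numerical estimates quoted below.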
The two estimating equations generally do not have a closed-form solution, so they must be solved numerically; in practice we use R or other statistical software to solve them.
After using R to estimate the parameters, we obtain 𝛾̂ ≈ 1.938 and λ̂ ≈ 0.079.
Now let's apply the estimated parameters to S(t).
For instance let’s calculate S(8) by substituting t=8, λ=0.07900991, 𝛾= 1.93827
After getting S(8)=0.6630, This result means that the probability a
mouse survives beyond 8 weeks is approximately 66.3%.
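As a cross-check, the same data can be fitted with survreg from the survival package. Note that survreg uses S(t) = exp(−(t/b)^a) with b = exp(intercept) and a = 1/scale, so the slide's λ corresponds to 1/b; this conversion matches the parameterization assumed above:

```r
library(survival)

time  <- c(4, 5, 8, 9, 10, 10, 10, 10, 10, 10)
event <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0)   # 1 = death observed, 0 = sacrificed (censored)

fit <- survreg(Surv(time, event) ~ 1, dist = "weibull")
gamma_hat  <- 1 / fit$scale                     # shape, should be close to 1.94
lambda_hat <- exp(-as.numeric(coef(fit)))       # rate, should be close to 0.079
S8 <- exp(-(lambda_hat * 8)^gamma_hat)          # estimated S(8), close to 0.663
```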
LOG-LOGISTIC
DISTRIBUTION
The Log-Logistic Distribution in Survival Analysis
The log-logistic distribution is a continuous probability distribution often used in survival analysis and reliability engineering. It has two parameters: the shape parameter (α), which affects the tail behavior and hazard function peak, and the scale parameter, here denoted (γ) and often written (β), which stretches or compresses the distribution along the time axis. This distribution is versatile for modeling data with various hazard rates and is particularly useful for lifetimes and failure times, accommodating the heavy-tailed data commonly found in practice.
Density function: f(t) = (α/γ)(t/γ)^(α−1) / [1 + (t/γ)^α]², t ≥ 0, α > 0, γ > 0
Survivor function: S(t) = 1 / [1 + (t/γ)^α]
Cumulative hazard function: H(t) = ln[1 + (t/γ)^α]
Estimation of γ,α for Data without Censored Observations:
Suppose that there are n persons in the study and everyone is followed to death or failure
Let t1 , t2 ,..., tn be the exact survival times of the n people.
The likelihood
function:
the log-likelihood function:
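Substituting the log-logistic density (shape α, scale γ, as defined above) gives, in standard form:

```latex
\ell(\alpha,\gamma) = n\log\alpha - n\log\gamma
  + (\alpha-1)\sum_{i=1}^{n}\log\!\left(\frac{t_i}{\gamma}\right)
  - 2\sum_{i=1}^{n}\log\!\left[1 + \left(\frac{t_i}{\gamma}\right)^{\alpha}\right]
```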
Given the following remission times (in weeks) for 5 patients
with acute leukemia: t1=1, t2=2 , t3=4 , t4=6 , t5=8.
Step 1: Write Down the Likelihood Function
Step 2: get the log-likelihood function
Step 3: Substitute the Density Function into the
Log-Likelihood Function
Step 4: Differentiate the Log-Likelihood Function
The previous equations are nonlinear and need to be solved
numerically using methods such as the Newton-Raphson method
Step 1: Organize the data: 2, 3.5, 5, 7, 9, 10, 15, 20, 30, 40
Step 2: Calculate the Kaplan-Meier estimator of S(t).
Step 3: Substitute the density function into the log-likelihood function.
Step 4: Choose trial values for α and γ, compare the resulting fit, update α and γ, and repeat with new trial values until convergence.
library(survival)
library(car)

# Simulate exponential survival times with roughly 30% censoring
set.seed(42)
survival_time <- rexp(100, rate = 0.2)
censoring <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))
surv_data <- Surv(time = survival_time, event = censoring)

# Q-Q plots are drawn on the uncensored times only
uncensored_data <- survival_time[censoring == 1]

qqPlot(uncensored_data, distribution = "exp", rate = 0.2,
       main = "QQ Plot - Exponential Distribution (Uncensored Data Only)",
       ylab = "Ordered Survival Times (Uncensored)",
       xlab = "Theoretical Quantiles")

qqPlot(uncensored_data, distribution = "weibull", shape = 1.5, scale = 5,
       main = "QQ Plot - Weibull Distribution (Uncensored Data Only)",
       ylab = "Ordered Survival Times (Uncensored)",
       xlab = "Theoretical Quantiles")
set.seed(42)
lambda <- 0.2
survival_time <- rexp(100, rate = lambda)
censoring <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))

# Likelihood: density terms for events, survival terms for censored times
likelihood_function <- function(lambda, survival_time, censoring) {
  uncensored_likelihood <- lambda * exp(-lambda * survival_time[censoring == 1])
  censored_likelihood <- exp(-lambda * survival_time[censoring == 0])
  total_likelihood <- prod(uncensored_likelihood) * prod(censored_likelihood)
  return(total_likelihood)
}

# Evaluate the likelihood over a grid of lambda values
lambda_values <- seq(0.01, 1, by = 0.01)
likelihood_values <- sapply(lambda_values, function(lambda)
  likelihood_function(lambda, survival_time, censoring))

plot(lambda_values, likelihood_values, type = "l", col = "blue", lwd = 2,
     main = "Likelihood vs Lambda for Exponential Distribution (Censored Data)",
     xlab = "Lambda", ylab = "Likelihood")
# Mark the maximizing lambda (the MLE on this grid)
abline(v = lambda_values[which.max(likelihood_values)], col = "red", lwd = 2)
set.seed(42)
n <- 100
shape_param <- 1.5
scale_param <- 2
survival_time <- rweibull(n, shape = shape_param, scale = scale_param)
censoring <- sample(c(0, 1), n, replace = TRUE, prob = c(0.3, 0.7))

# Weibull likelihood: density terms for events, survival terms for censored times
likelihood_function_weibull <- function(shape, scale, survival_time, censoring) {
  uncensored_likelihood <- (shape / scale) *
    (survival_time[censoring == 1] / scale)^(shape - 1) *
    exp(-(survival_time[censoring == 1] / scale)^shape)
  censored_likelihood <- exp(-(survival_time[censoring == 0] / scale)^shape)
  total_likelihood <- prod(uncensored_likelihood) * prod(censored_likelihood)
  return(total_likelihood)
}

# Evaluate the likelihood on a (shape, scale) grid
shape_values <- seq(0.1, 3, by = 0.1)
scale_values <- seq(0.5, 5, by = 0.1)
likelihood_values <- outer(shape_values, scale_values, Vectorize(function(shape, scale) {
  likelihood_function_weibull(shape, scale, survival_time, censoring)
}))

# Plot the likelihood surface
persp(shape_values, scale_values, likelihood_values, theta = 30, phi = 30,
      col = "lightblue", ltheta = 120, shade = 0.5,
      xlab = "Shape Parameter (k)", ylab = "Scale Parameter (λ)", zlab = "Likelihood",
      main = "Likelihood Surface for Weibull Distribution (Censored Data)")
set.seed(42)
n <- 100
shape_param <- 1.5
scale_param <- 2

# Simulate log-logistic survival times via the inverse-CDF method
# (use a single uniform draw per observation; calling runif() twice is a bug)
u <- runif(n)
survival_time <- scale_param * (u / (1 - u))^(1 / shape_param)
censoring <- sample(c(0, 1), n, replace = TRUE, prob = c(0.3, 0.7))

# Log-logistic likelihood: density f(t) for events, survival S(t) for censored times
loglogistic_likelihood <- function(alpha, beta, survival_time, censoring) {
  t_event <- survival_time[censoring == 1]
  uncensored_likelihood <- (alpha / beta) * (t_event / beta)^(alpha - 1) /
    (1 + (t_event / beta)^alpha)^2
  censored_likelihood <- 1 / (1 + (survival_time[censoring == 0] / beta)^alpha)
  total_likelihood <- prod(uncensored_likelihood) * prod(censored_likelihood)
  return(total_likelihood)
}

# Evaluate the likelihood on an (alpha, beta) grid
alpha_values <- seq(0.5, 3, by = 0.1)
beta_values <- seq(0.5, 5, by = 0.1)
likelihood_values <- outer(alpha_values, beta_values, Vectorize(function(alpha, beta) {
  loglogistic_likelihood(alpha, beta, survival_time, censoring)
}))
likelihood_values[1:5, 1:5]

persp(alpha_values, beta_values, likelihood_values, theta = 30, phi = 30,
      col = "lightblue", ltheta = 120, shade = 0.5,
      xlab = "Shape Parameter (α)", ylab = "Scale Parameter (β)", zlab = "Likelihood",
      main = "Likelihood Surface for Log-Logistic Distribution (Censored Data)")
Any Questions?