Incomplete Data methods
Supplementary Material 1:
On the probability distributions
Valentin Patilea
Ensai 2A, Oct-Dec 2024
This version: October 31, 2024
Agenda
The distribution of a duration
Characterizing the distribution
Common distributions in duration models
▶ The notions presented below are mostly stated for random variables with non-negative values
▶ This is because these notions are mainly used in survival analysis
▶ However, they extend to random variables taking values on the whole real line
Ways to characterize a probability distribution (1/2)
▶ Assume that we study a random variable $Y \in \mathbb{R}_+ = [0, \infty)$ that represents a time-to-event.
▶ Distribution function: $F_Y : \mathbb{R}_+ \to [0, 1]$ defined by
$$F_Y(y) = P(Y \le y), \quad y \ge 0$$
▶ Survival function: $S_Y : \mathbb{R}_+ \to [0, 1]$ defined by
$$S_Y(y) = P(Y > y) = 1 - F_Y(y), \quad y \ge 0$$
▶ Some authors denote the survival function by $\bar F_Y$ and call it the survivor or reliability function.
▶ Notation: $S_Y(y-) = P(Y \ge y)$, $y \ge 0$ (the left limit of $S_Y$ at $y$)
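The distinction between $S(y) = P(Y > y)$ and $S(y-) = P(Y \ge y)$ only matters when $Y$ puts positive mass at $y$. The minimal Python sketch below, assuming a small hypothetical sample of durations with a tie, evaluates the empirical versions of $F$, $S$ and $S(\cdot-)$; the function names are illustrative and not part of any library.

```python
from typing import Sequence

def empirical_F(sample: Sequence[float], y: float) -> float:
    """Empirical distribution function: share of observations <= y."""
    return sum(v <= y for v in sample) / len(sample)

def empirical_S(sample: Sequence[float], y: float) -> float:
    """Empirical survival function: share of observations > y."""
    return sum(v > y for v in sample) / len(sample)

def empirical_S_left(sample: Sequence[float], y: float) -> float:
    """Empirical version of S(y-): share of observations >= y."""
    return sum(v >= y for v in sample) / len(sample)

# Hypothetical durations; the tie at 2.0 makes S(2) and S(2-) differ.
durations = [1.0, 2.0, 2.0, 3.5, 5.0]
for y in (0.0, 2.0, 4.0):
    print(y, empirical_F(durations, y), empirical_S(durations, y),
          empirical_S_left(durations, y))
```

At $y = 2$ the sketch gives $F = 0.6$, $S = 0.4$ and $S(2-) = 0.8$: the gap $S(2-) - S(2) = 0.4$ is exactly the mass of the tie at 2.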
Ways to characterize a probability distribution (2/2)
▶ We say that $Y$ admits a density (with respect to the Lebesgue measure on the real line) if there exists a measurable function $f_Y : \mathbb{R}_+ \to \mathbb{R}_+$ such that
$$F_Y(y) = \int_0^y f_Y(t)\,dt, \quad y \ge 0, \qquad \text{and hence} \qquad S_Y(y) = \int_y^\infty f_Y(t)\,dt, \quad y \ge 0.$$
In this case, we also say that $Y$ is absolutely continuous, or simply continuous
▶ If $F_Y(\cdot)$ is differentiable at $y$, then one can take
$$f_Y(y) = F_Y'(y) = -S_Y'(y)$$
▶ In the following we also use the simplified notation $f$, $F$ and $S$ (or $\bar F$) instead of $f_Y$, $F_Y$ and $S_Y$ (or $\bar F_Y$)
▶ Other ways to characterize the distribution of Y : characteristic
function, moment generating function, etc.
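As a numerical sanity check of these relations, the sketch below takes the hypothetical density $f(y) = 2e^{-2y}$ (chosen only because $F$ and $S$ are known in closed form), approximates the integrals with a midpoint rule and $-S'$ with a central difference. The helper `midpoint`, the constant `RATE` and the truncation of the upper integral are assumptions made for illustration.

```python
import math

RATE = 2.0  # hypothetical parameter of the example density

def f(y: float) -> float:
    """Example density on [0, inf): f(y) = RATE * exp(-RATE * y)."""
    return RATE * math.exp(-RATE * y)

def F(y: float) -> float:
    return 1.0 - math.exp(-RATE * y)

def S(y: float) -> float:
    return math.exp(-RATE * y)

def midpoint(g, a: float, b: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

y = 0.7
F_num = midpoint(f, 0.0, y)           # F(y) = int_0^y f(t) dt
S_num = midpoint(f, y, y + 40.0)      # S(y) = int_y^inf f(t) dt; the tail beyond y + 40 is negligible here
eps = 1e-6
f_from_S = -(S(y + eps) - S(y - eps)) / (2 * eps)   # f(y) = -S'(y) by central difference
print(F_num, F(y))
print(S_num, S(y))
print(f_from_S, f(y))
```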
Exercise: Moments Using the Survival Function
▶ Let $Y$ be a nonnegative random variable with survival function $S$.
▶ Exercise: Show that, for any $\alpha > 0$,
$$E(Y^\alpha) = \alpha \int_0^\infty y^{\alpha - 1} S(y)\,dy,$$
in the sense that if one side converges, so does the other.¹
▶ Deduce
$$E(Y) = \int_0^\infty S(y)\,dy.$$
▶ Exercise: Propose an alternative, direct proof of the relationship
$$E(Y) = \int_0^\infty S(y)\,dy$$
using Fubini's theorem.
¹ See Feller (1966), An Introduction to Probability Theory and Its Applications, vol. 2, Lemma 1, p. 150.
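The identity $E(Y^\alpha) = \alpha \int_0^\infty y^{\alpha-1} S(y)\,dy$ can be checked numerically before proving it. The sketch below assumes an exponential law with a hypothetical rate 1.5, whose moments $E(Y^\alpha) = \Gamma(1+\alpha)/\lambda^\alpha$ are known, and truncates the integral at a point where $S$ is negligible; it illustrates the formula and is not a proof.

```python
import math

RATE = 1.5  # hypothetical exponential rate used for the check

def S(y: float) -> float:
    """Survival function of an exponential law with rate RATE."""
    return math.exp(-RATE * y)

def moment_via_survival(surv, alpha: float, upper: float = 60.0, n: int = 100_000) -> float:
    """alpha * int_0^upper y**(alpha - 1) * surv(y) dy by the midpoint rule.

    The integral is truncated at `upper`, which is only accurate when surv(upper) is negligible.
    """
    h = upper / n
    total = sum(((i + 0.5) * h) ** (alpha - 1) * surv((i + 0.5) * h) for i in range(n))
    return alpha * h * total

for alpha in (1.0, 2.0, 3.0):
    exact = math.gamma(1 + alpha) / RATE ** alpha   # E(Y^alpha) for the exponential law
    print(alpha, moment_via_survival(S, alpha), exact)
```

With $\alpha = 1$ the same routine approximates $\int_0^\infty S(y)\,dy$, which should match $E(Y) = 1/1.5$.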
Hazard functions: discrete random variables (1/2)
▶ Let $Y \in \{y_1, y_2, \ldots\}$ with $0 \le y_1 < y_2 < \cdots$
▶ Let $p_k = P(Y = y_k) > 0$, $k \ge 1$
▶ The hazard function (also called hazard rate, or failure rate) is defined as
$$\lambda(y_k) = \frac{P(Y = y_k)}{P(Y \ge y_k)} = \frac{p_k}{\sum_{j \ge k} p_j} = \frac{p_k}{S(y_{k-1})} = \frac{p_k}{S(y_k-)}, \quad k \ge 1.$$
▶ We can also write the hazard function as a conditional probability
$$\lambda(y_k) = P(Y = y_k \mid Y \ge y_k)$$
▶ The hazard rate is thus the probability that the event occurs at time $y_k$ given that it has not occurred before $y_k$
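The definition $\lambda(y_k) = p_k / \sum_{j \ge k} p_j$ translates into a single pass over the probabilities, keeping track of the mass still "at risk". A minimal sketch, assuming the probabilities are listed in increasing order of the support points; the function name is hypothetical, and exact rational arithmetic via `fractions.Fraction` keeps the output readable.

```python
from fractions import Fraction

def discrete_hazard(pmf):
    """Hazard lambda(y_k) = p_k / sum_{j >= k} p_j for each support point y_k.

    `pmf` lists p_1, p_2, ... in increasing order of the support points.
    """
    hazards = []
    at_risk = sum(pmf)          # P(Y >= y_1), equal to 1 for a proper pmf
    for p in pmf:
        hazards.append(p / at_risk)
        at_risk -= p            # now P(Y >= y_{k+1}) = S(y_k)
    return hazards

# Hypothetical example: Y uniform on four support points y_1 < ... < y_4.
pmf = [Fraction(1, 4)] * 4
print([str(h) for h in discrete_hazard(pmf)])   # ['1/4', '1/3', '1/2', '1']
```

For a law with finitely many support points, the last hazard equals 1: once the event has not occurred before the last point, it must occur there.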
Hazard functions: discrete random variables (2/2)
▶ Exercise: show that for each $k \ge 1$,
$$\lambda(y_k) = 1 - \frac{S(y_k)}{S(y_{k-1})} = 1 - \frac{S(y_k)}{S(y_k-)}, \qquad S(y_k) = \prod_{j=1}^{k} \{1 - \lambda(y_j)\}$$
(by convention $S(y_0) = 1$; note that $S(y_1-) = P(Y \ge y_1) = 1$).
▶ The cumulative hazard function is defined as
$$\Lambda(y_k) = \sum_{1 \le j \le k} \lambda(y_j), \quad k \ge 1$$
Proposition
The hazard function characterizes the distribution of Y . The
same is true for the cumulative hazard function.
▶ Exercise: Prove the Proposition.
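One way to see why the Proposition holds is to reconstruct the distribution explicitly: $\lambda(y_k) = \Lambda(y_k) - \Lambda(y_{k-1})$ (with $\Lambda(y_0) = 0$) recovers the hazard from the cumulative hazard, and $p_k = \lambda(y_k)\, S(y_{k-1})$ with $S(y_k) = \prod_{j \le k}\{1 - \lambda(y_j)\}$ recovers the probabilities. The sketch below, with hypothetical helper names and the hazards $1/4, 1/3, 1/2, 1$ of a uniform law on four points, performs this round trip; it illustrates the idea and does not replace the proof asked in the exercise.

```python
from fractions import Fraction
from itertools import accumulate

def hazard_from_cumulative(cum):
    """lambda(y_k) = Lambda(y_k) - Lambda(y_{k-1}), with Lambda(y_0) = 0."""
    return [c - prev for prev, c in zip([Fraction(0)] + cum[:-1], cum)]

def survival_from_hazard(hazards):
    """S(y_k) = prod_{j <= k} (1 - lambda(y_j))."""
    return list(accumulate((1 - h for h in hazards), lambda a, b: a * b))

def pmf_from_hazard(hazards):
    """p_k = lambda(y_k) * S(y_{k-1}), with S(y_0) = 1."""
    surv_prev = [Fraction(1)] + survival_from_hazard(hazards)[:-1]
    return [h * s for h, s in zip(hazards, surv_prev)]

hazards = [Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(1)]
cum = list(accumulate(hazards))               # Lambda(y_1), ..., Lambda(y_4)
recovered = pmf_from_hazard(hazard_from_cumulative(cum))
print([str(p) for p in recovered])             # ['1/4', '1/4', '1/4', '1/4']
```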
Hazard functions: continuous random variables (1/2)
▶ Assume the random variable $Y \ge 0$ admits the density $f$
▶ The hazard function (also called hazard rate, or failure rate) is defined as
$$\lambda(y) = \frac{f(y)}{S(y)} = \frac{f(y)}{P(Y \ge y)}, \quad y \ge 0$$
(the two denominators coincide because $P(Y = y) = 0$ when $Y$ has a density)
▶ Herein we always use the convention $0/0 = 0$!
▶ The cumulative hazard function is defined as
$$\Lambda(y) = \int_0^y \lambda(t)\,dt, \quad y \ge 0.$$
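The convention $0/0 = 0$ matters beyond the upper end of the support. The sketch below assumes, for illustration only, a uniform law on $[0, 2)$, for which $\lambda(y) = 1/(2 - y)$ on $[0, 2)$ and $\Lambda(y) = -\log(1 - y/2)$; it computes $\lambda$ with the convention and approximates $\Lambda$ with a midpoint rule. The names `B`, `hazard` and `cumulative_hazard` are hypothetical.

```python
import math

B = 2.0  # hypothetical upper endpoint: Y uniform on [0, B)

def f(y: float) -> float:
    """A version of the uniform density on [0, B)."""
    return 1.0 / B if 0.0 <= y < B else 0.0

def S(y: float) -> float:
    """Survival function P(Y > y) of the uniform law on [0, B)."""
    return max(1.0 - y / B, 0.0)

def hazard(y: float) -> float:
    """lambda(y) = f(y) / S(y), using the convention 0/0 = 0."""
    num, den = f(y), S(y)
    if num == 0.0 and den == 0.0:
        return 0.0
    # For this f, S(y) = 0 only when f(y) = 0, so no division by zero occurs here.
    return num / den

def cumulative_hazard(y: float, n: int = 100_000) -> float:
    """Lambda(y) = int_0^y lambda(t) dt by the midpoint rule."""
    h = y / n
    return h * sum(hazard((i + 0.5) * h) for i in range(n))

print(hazard(1.0), hazard(3.0))                    # 1/(B - 1), then 0 by the 0/0 convention
print(cumulative_hazard(1.0), -math.log(1 - 1.0 / B))
```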
Hazard functions: continuous random variables (2/2)
▶ In the case of a random variable with a density, which we assume continuous, we also have, at points $y$ with $S(y) > 0$,
$$\lambda(y) = \lim_{h \downarrow 0} \frac{1}{h} P\left(Y \in [y, y + h) \mid Y \ge y\right) \tag{1}$$
Proposition
In the case where $Y \ge 0$ admits the density $f$, we have the following relationships:
$$S(y) = \exp(-\Lambda(y)) \qquad \text{and} \qquad f(y) = \lambda(y) \exp\left(-\int_0^y \lambda(t)\,dt\right).$$
In particular, either $\lambda(\cdot)$ or $\Lambda(\cdot)$ can be used to characterize the distribution of $Y$
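The Proposition says that a hazard function alone determines $S$ and $f$. The sketch below starts from a hypothetical linear hazard $\lambda(t) = 0.5 + t$, builds $S = \exp(-\Lambda)$ and $f = \lambda \exp(-\Lambda)$ numerically, and checks them against the closed form $\Lambda(y) = y/2 + y^2/2$ and against $f = -S'$. The hazard and the helper names are assumptions for illustration.

```python
import math

def hazard(t: float) -> float:
    """Hypothetical increasing hazard lambda(t) = 0.5 + t."""
    return 0.5 + t

def cumulative_hazard(y: float, n: int = 10_000) -> float:
    """Lambda(y) = int_0^y lambda(t) dt by the midpoint rule (exact for a linear hazard, up to rounding)."""
    h = y / n
    return h * sum(hazard((i + 0.5) * h) for i in range(n))

def survival(y: float) -> float:
    """S(y) = exp(-Lambda(y))."""
    return math.exp(-cumulative_hazard(y))

def density(y: float) -> float:
    """f(y) = lambda(y) * exp(-Lambda(y))."""
    return hazard(y) * survival(y)

y, eps = 1.0, 1e-5
closed_form = math.exp(-(0.5 * y + y ** 2 / 2))      # Lambda(y) = y/2 + y^2/2 for this hazard
minus_S_prime = -(survival(y + eps) - survival(y - eps)) / (2 * eps)
print(survival(y), closed_form)       # both approximately exp(-1)
print(density(y), minus_S_prime)      # f = -S', as expected from the Proposition
```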
Exercises:
▶ Prove the relationship (1) in the case where the density $f(\cdot)$ is continuous
▶ Prove the Proposition on the previous slide
Exponential distribution
▶ A law for a nonnegative random variable $Y$
▶ The law is indexed by one positive parameter $\lambda$
▶ The survivor function: $S(y) = \exp(-\lambda y)$, $y \ge 0$
▶ Density: $f(y) = \lambda \exp(-\lambda y)$, $y \ge 0$
▶ $E(Y) = 1/\lambda$; $\mathrm{Var}(Y) = 1/\lambda^2$
▶ Some authors use a different parametrization: $\gamma = 1/\lambda$!
▶ Mode: one mode at $y = 0$
▶ Quantile function: $q(p) = -\lambda^{-1} \log(1 - p)$
▶ Hazard function: $\lambda(y) \equiv \lambda$ (constant hazard rate)
▶ Cumulative hazard function: $\Lambda(y) = \lambda y$
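The quantile function gives a direct way to simulate the exponential law by inversion: if $U$ is uniform on $[0, 1)$, then $q(U)$ has survival function $S$. The sketch below assumes a hypothetical rate $\lambda = 2$ and a fixed seed, compares the sample mean and variance with $1/\lambda$ and $1/\lambda^2$, and checks that $f/S$ is constant.

```python
import math
import random

LAM = 2.0  # hypothetical rate parameter

def quantile(p: float) -> float:
    """q(p) = -log(1 - p) / LAM for p in [0, 1)."""
    return -math.log1p(-p) / LAM

def survival(y: float) -> float:
    return math.exp(-LAM * y)

def density(y: float) -> float:
    return LAM * math.exp(-LAM * y)

# Inverse-transform sampling: rng.random() is uniform on [0, 1), so q(U) is exponential with rate LAM.
rng = random.Random(12345)
sample = [quantile(rng.random()) for _ in range(200_000)]
mean = sum(sample) / len(sample)
var = sum((v - mean) ** 2 for v in sample) / (len(sample) - 1)
print(mean, 1 / LAM)        # sample mean vs E(Y) = 1/LAM
print(var, 1 / LAM ** 2)    # sample variance vs Var(Y) = 1/LAM^2
print([density(y) / survival(y) for y in (0.1, 1.0, 5.0)])   # constant hazard LAM
```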
Weibull distribution
▶ A law for a nonnegative random variable $Y$
▶ The law is indexed by two parameters: $\lambda > 0$ (scale parameter) and $k > 0$ (shape parameter)
▶ The survivor function: $S(y) = \exp(-(\lambda y)^k)$, $y \ge 0$
▶ Density: $f(y) = k\lambda(\lambda y)^{k-1} \exp(-(\lambda y)^k)$, $y \ge 0$
▶ $E(Y) = \Gamma(1 + 1/k)/\lambda$; $\mathrm{Var}(Y)$: exercise
▶ Some authors use a different parametrization: $\gamma = 1/\lambda$, or $\gamma = \lambda^k$, or yet another!
▶ The random variable $W = (\lambda Y)^k$ has an exponential law with parameter equal to 1
▶ If $U \sim \mathcal{U}[0, 1]$, then $Y = \lambda^{-1}(-\log U)^{1/k}$ is a Weibull random variable with parameters $\lambda$ and $k$
▶ One mode at $y = 0$ if $k \le 1$, and at $((k - 1)/k)^{1/k}/\lambda$ if $k > 1$
▶ Quantile function: $q(p) = \lambda^{-1}(-\log(1 - p))^{1/k}$
▶ Hazard function: $\lambda(y) = k\lambda(\lambda y)^{k-1}$
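The simulation recipe and the closed-form quantities above can be checked together. The sketch below assumes hypothetical values $\lambda = 0.5$ and $k = 2$, simulates $Y = \lambda^{-1}(-\log U)^{1/k}$ (with $U$ taken in $(0, 1]$ to avoid $\log 0$), and compares the sample mean with $\Gamma(1 + 1/k)/\lambda$, the mean of $W = (\lambda Y)^k$ with 1, and a grid search for the mode with $((k-1)/k)^{1/k}/\lambda$.

```python
import math
import random

LAM, K = 0.5, 2.0  # hypothetical parameters: lambda (scale parameter in the slides' sense) and shape k

def survival(y: float) -> float:
    return math.exp(-((LAM * y) ** K))

def density(y: float) -> float:
    return K * LAM * (LAM * y) ** (K - 1) * math.exp(-((LAM * y) ** K))

def quantile(p: float) -> float:
    """q(p) = (-log(1 - p))**(1/K) / LAM."""
    return (-math.log1p(-p)) ** (1 / K) / LAM

# Simulation from the slide: Y = (-log U)**(1/K) / LAM, with U = 1 - rng.random() in (0, 1].
rng = random.Random(2024)
sample = [(-math.log(1.0 - rng.random())) ** (1 / K) / LAM for _ in range(200_000)]
mean = sum(sample) / len(sample)
w_mean = sum((LAM * v) ** K for v in sample) / len(sample)   # W = (LAM*Y)^K should be Exp(1)
print(mean, math.gamma(1 + 1 / K) / LAM)   # sample mean vs E(Y)
print(w_mean)                              # should be close to E(W) = 1

# Mode for K > 1: compare a grid search with ((K - 1)/K)**(1/K) / LAM.
grid = [i * 1e-4 for i in range(1, 100_000)]
print(max(grid, key=density), ((K - 1) / K) ** (1 / K) / LAM)
```

Python's standard library also offers `random.weibullvariate(alpha, beta)`, with `alpha` a scale and `beta` a shape parameter; under the parametrization of these slides this corresponds to `alpha = 1/λ` and `beta = k`.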