Poisson and Geometric Random Variables: Jeff Chak Fu WONG
Department of Mathematics
The Chinese University of Hong Kong
MATH3280
Introductory Probability
The Poisson Random Variable
Definition 1
A random variable X that takes on one of the values 0, 1, 2, . . . is said to be a
Poisson random variable with parameter λ if, for some λ > 0,
p(i) = P\{X = i\} = e^{-\lambda} \frac{\lambda^i}{i!}, \qquad i = 0, 1, 2, \ldots    (1)
Equation (1) defines a probability mass function, since
\sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1
The Poisson random variable has a tremendous range of applications in diverse areas because it may be used as an approximation for a binomial random variable with parameters (n, p) when n is large and p is small enough so that np is of moderate size.
To see this, suppose that X is a binomial random variable with parameters (n, p),
and let λ = np. Then
P\{X = i\} = \frac{n!}{(n-i)!\, i!}\, p^i (1-p)^{n-i}
           = \frac{n!}{(n-i)!\, i!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i}
           = \frac{n(n-1)\cdots(n-i+1)}{n^i} \cdot \frac{\lambda^i}{i!} \cdot \frac{(1-\lambda/n)^n}{(1-\lambda/n)^i}
Now, for n large and \lambda moderate,
\left(1 - \frac{\lambda}{n}\right)^n \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^i} \approx 1, \qquad \left(1 - \frac{\lambda}{n}\right)^i \approx 1
Hence,
P\{X = i\} \approx e^{-\lambda} \frac{\lambda^i}{i!}
In other words, if n independent trials, each of which results in a success with
probability p, are performed, then, when n is large and p is small enough to
make np moderate, the number of successes occurring is approximately a Poisson
random variable with parameter λ = np.
This value λ (which will later be shown to equal the expected number of successes)
will usually be determined empirically.
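As a numerical illustration (not part of the original notes), the following Python sketch compares the exact binomial probabilities with the Poisson approximation for an illustrative choice n = 100, p = 0.03, so that λ = np = 3; the helper names are mine:

from math import comb, exp, factorial

# Illustrative parameters: n large, p small, lambda = n*p moderate
n, p = 100, 0.03
lam = n * p

def binom_pmf(i):
    # exact binomial probability P{X = i}
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson_pmf(i):
    # Poisson approximation e^{-lambda} * lambda^i / i!
    return exp(-lam) * lam**i / factorial(i)

for i in range(6):
    print(i, round(binom_pmf(i), 4), round(poisson_pmf(i), 4))

The two columns agree to two or three decimal places, which is the sense in which the approximation is used above.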
Some examples of random variables that generally obey the Poisson probability
law [that is, they obey Equation (1)] are as follows:
1. The number of misprints on a page (or a group of pages) of a book
2. The number of people in a community who survive to age 100
3. The number of wrong telephone numbers that are dialed in a day
4. The number of packages of dog biscuits sold in a particular store each day
5. The number of customers entering a post office on a given day
6. The number of vacancies occurring during a year in the federal judicial
system
7. The number of α-particles discharged in a fixed period of time from some
radioactive material
Each of the preceding random variables, and numerous others, is approximately Poisson for the same reason: the Poisson approximation to the binomial.
Example 1
Suppose that the number of typographical errors on a single page of a book has a Poisson distribution with parameter λ = 1/2. Calculate the probability that there is at least one error on this page.
Solution
Letting X denote the number of errors on this page, we have
P\{X \ge 1\} = 1 - P\{X = 0\} = 1 - e^{-1/2} \approx .393
■
Example 2
Suppose that the probability that an item produced by a certain machine is defective is 0.1. Find the probability that a sample of 10 items contains at most one defective item.
Solution
Let
X = {the number of defectives}
Exact Solution:
X is a binomial random variable with parameters n = 10 and p = 0.1. So
X ∼ Binomial(10, 0.1)
P\{X \le 1\} = P\{X = 0\} + P\{X = 1\}
             = \binom{10}{0}(.1)^0(.9)^{10} + \binom{10}{1}(.1)^1(.9)^{9}
             \approx 0.7361.
Approximate Solution:
We can approximate X by a Poisson random variable, say Y , with parameter
λ = n · p = 10 · (0.1) = 1. So
Y ∼ Poisson(1)
P\{X \le 1\} \approx P\{Y \le 1\}
             = P\{Y = 0\} + P\{Y = 1\}
             = e^{-1} + e^{-1}
             \approx 0.7358.
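The two answers can be checked numerically; a minimal sketch, assuming SciPy is available (SciPy is not mentioned in the notes):

from scipy.stats import binom, poisson

exact = binom.cdf(1, 10, 0.1)    # P{X <= 1} for X ~ Binomial(10, 0.1)
approx = poisson.cdf(1, 1.0)     # P{Y <= 1} for Y ~ Poisson(1)
print(exact, approx)             # approximately 0.7361 vs 0.7358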
Before computing the expected value and variance of the Poisson random variable with parameter λ, recall that this random variable approximates a binomial random variable with parameters n and p when n is large, p is small, and λ = np.
Since such a binomial random variable has
• expected value np = λ and
• variance np(1 − p) = λ(1 − p) ≈ λ (since p is small),
it would seem that both the expected value and the variance of a Poisson random
variable would equal its parameter λ.
We now verify this result:
E[X] = \sum_{i=0}^{\infty} \frac{i e^{-\lambda} \lambda^i}{i!}
     = e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^i}{(i-1)!}
     = e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda \cdot \lambda^{i-1}}{(i-1)!}    (using \lambda^i = \lambda \cdot \lambda^{i-1})
     = \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}                  (letting j = i - 1)
     = \lambda e^{-\lambda} e^{\lambda}                                               (since \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = e^{\lambda})
     = \lambda
Thus, the expected value of a Poisson random variable X is indeed equal to its parameter λ. To determine its variance, we first compute E[X^2]:
E[X^2] = \sum_{i=0}^{\infty} \frac{i^2 e^{-\lambda} \lambda^i}{i!}
       = \lambda \sum_{i=1}^{\infty} \frac{i e^{-\lambda} \lambda^{i-1}}{(i-1)!}
       = \lambda \sum_{j=0}^{\infty} \frac{(j+1) e^{-\lambda} \lambda^j}{j!}          (letting j = i - 1)
       = \lambda \left( \sum_{j=0}^{\infty} \frac{j e^{-\lambda} \lambda^j}{j!} + \sum_{j=0}^{\infty} \frac{e^{-\lambda} \lambda^j}{j!} \right)
       = \lambda (E[X] + 1)
       = \lambda(\lambda + 1)
where the final equality follows because the first sum is the expected value of
a Poisson random variable with parameter λ and the second is the sum of the
probabilities of this random variable.
Therefore, since we have shown that E[X] = λ, we obtain
Var(X) = E[X^2] - (E[X])^2 = \lambda(\lambda + 1) - \lambda^2 = \lambda
so the expected value and the variance of a Poisson random variable are both equal to its parameter λ.
Example 3
Take n = 15 and p = 1/2, and approximate the binomial number of successes by a Poisson random variable X. Then
λ = n · p = 15 · (1/2) = 15/2
So
P {X ≥ 4} = 1 − P {X = 0} − P {X = 1} − P {X = 2} − P {X = 3}
= 1 − p(0) − p(1) − p(2) − p(3)
= 1 - e^{-15/2} - e^{-15/2}(15/2) - e^{-15/2}\frac{(15/2)^2}{2!} - e^{-15/2}\frac{(15/2)^3}{3!}
The expectation and the variance of the approximating Poisson random variable are both equal to λ = 15/2.
■
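As a sanity check (a sketch, not part of the notes), the tail probability and both moments can be evaluated by truncating the Poisson series; the truncation point 100 is arbitrary but far in the tail:

from math import exp, factorial

lam = 15 / 2

def pois_pmf(i):
    # Poisson pmf: e^{-lambda} * lambda^i / i!
    return exp(-lam) * lam**i / factorial(i)

p_at_least_4 = 1 - sum(pois_pmf(i) for i in range(4))
mean = sum(i * pois_pmf(i) for i in range(100))            # truncated series
var = sum(i**2 * pois_pmf(i) for i in range(100)) - mean**2
print(p_at_least_4, mean, var)                             # mean and variance are both about 7.5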
The Geometric Random Variable
Suppose that independent trials, each having probability p, 0 < p < 1, of being a success (i.e., Bernoulli(p) trials), are performed until a success occurs.
If we let X equal the number of trials required, then
P\{X = n\} = (1-p)^{n-1} p, \qquad n = 1, 2, \ldots    (2)
X ∼ Geometric(p)
Example 4
An urn contains N white and M black balls. Balls are randomly selected, one
at a time, until a black one is obtained. If we assume that each ball selected
is replaced before the next one is drawn, what is the probability that
(a) exactly n draws are needed?
(b) at least k draws are needed?
Solution. If we let X denote the number of draws needed to select a black ball,
then X satisfies Equation (2) with
p = \frac{M}{M+N}.
Hence,
(a)
P\{X = n\} = \left(\frac{N}{M+N}\right)^{n-1} \frac{M}{M+N} = \frac{M N^{n-1}}{(M+N)^n}
where
1 - p = 1 - \frac{M}{M+N} = \frac{N}{M+N}
(b)
P\{X \ge k\} = \frac{M}{M+N} \sum_{n=k}^{\infty} \left(\frac{N}{M+N}\right)^{n-1}
            = \frac{M}{M+N} \sum_{n=k}^{\infty} \left(\frac{N}{M+N}\right)^{n-1} \left(\frac{N}{M+N}\right)^{k-1} \left(\frac{N}{M+N}\right)^{-(k-1)}
            = \frac{M}{M+N} \left(\frac{N}{M+N}\right)^{k-1} \sum_{n=k}^{\infty} \left(\frac{N}{M+N}\right)^{(n-(k-1))-1}
            = \frac{M}{M+N} \left(\frac{N}{M+N}\right)^{k-1} \sum_{j=1}^{\infty} \left(\frac{N}{M+N}\right)^{j-1}      (letting j = n - (k-1))
            = \frac{\dfrac{M}{M+N} \left(\dfrac{N}{M+N}\right)^{k-1}}{1 - \dfrac{N}{M+N}}
            = \left(\frac{N}{M+N}\right)^{k-1}
Of course, part (b) could have been obtained directly, since the probability that
at least k trials are necessary to obtain a success is equal to the probability that
the first k − 1 trials are all failures. That is, for a geometric random variable,
P\{X \ge k\} = (1-p)^{k-1}
■
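The tail identity P{X ≥ k} = (1 − p)^{k−1} is easy to check numerically; a minimal sketch with illustrative values p = 0.3 and k = 5 (not from the notes):

# truncated series vs. closed form for a Geometric(p) tail probability
p, k = 0.3, 5
tail_sum = sum((1 - p)**(n - 1) * p for n in range(k, 2000))   # truncated sum of the pmf
closed_form = (1 - p)**(k - 1)
print(tail_sum, closed_form)                                   # both about 0.2401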
Example 5
An urn contains 10 white and 5 black balls. Balls are randomly selected, one
at a time, until a black one is obtained. Assume each selected ball is replaced before the next one is drawn. Let
X = the number of draws needed to obtain the first black ball.
What is the probability that at least 13 draws are needed up to the first black ball?
P\{X \ge 13\} = \sum_{n=13}^{\infty} \left(\frac{10}{15}\right)^{n-1} \frac{5}{15}
             = \frac{5}{15} \sum_{n=13}^{\infty} \left(\frac{10}{15}\right)^{n-1}
             = \frac{5}{15} \left(\frac{10}{15}\right)^{12} \sum_{n=1}^{\infty} \left(\frac{10}{15}\right)^{n-1}
             = \frac{5}{15} \left(\frac{10}{15}\right)^{12} \frac{1}{1 - \frac{10}{15}}
             = \left(\frac{10}{15}\right)^{12}.
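Equivalently, by the tail formula above, P{X ≥ 13} = (1 − p)^{12} = (10/15)^{12} ≈ 0.0077. A minimal numerical sketch (not from the notes):

# Example 5: P{X >= 13} for a Geometric(1/3) random variable, two ways
p = 5 / 15
closed_form = (1 - p)**12                                        # (1-p)^(k-1) with k = 13
series = sum((1 - p)**(n - 1) * p for n in range(13, 3000))      # truncated sum of the pmf
print(closed_form, series)                                       # both about 0.0077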
If X ∼ Geometric(p) then what are the expectation and the variance of X?
E[X] = \sum_{i=1}^{\infty} i\, p(i)
     = \sum_{i=1}^{\infty} (i - 1 + 1)\, p(i)
     = \sum_{i=1}^{\infty} (i-1)\, p(i) + \sum_{i=1}^{\infty} p(i)
     = \sum_{i=1}^{\infty} (i-1) q^{i-1} p + 1,        where q = 1 - p
     = \sum_{m=0}^{\infty} m q^m p + 1                 (letting m = i - 1)
     = q \sum_{m=1}^{\infty} m q^{m-1} p + 1
     = q \sum_{m=1}^{\infty} m\, p(m) + 1
     = q E[X] + 1
Hence we obtain
E[X] = qE[X] + 1
or
E[X] − qE[X] = 1
or
(1 − q)E[X] = 1
By solving this equation,
E[X] = \frac{1}{1-q} = \frac{1}{p}
To determine Var(X), let us first compute E[X 2 ]. With q = 1 − p, we have
E[X^2] = \sum_{i=1}^{\infty} i^2 q^{i-1} p
       = \sum_{i=1}^{\infty} (i - 1 + 1)^2 q^{i-1} p
       = \sum_{i=1}^{\infty} (i-1)^2 q^{i-1} p + \sum_{i=1}^{\infty} 2(i-1) q^{i-1} p + \sum_{i=1}^{\infty} q^{i-1} p
       = \sum_{j=0}^{\infty} j^2 q^j p + 2 \sum_{j=0}^{\infty} j q^j p + 1            (letting j = i - 1)
       = q \sum_{j=1}^{\infty} j^2 q^{j-1} p + 2q \sum_{j=1}^{\infty} j q^{j-1} p + 1
       = q E[X^2] + 2q E[X] + 1
Using E[X] = 1/p, the equation for E[X^2] yields
E[X^2] = q E[X^2] + \frac{2q}{p} + 1
or
(1 - q) E[X^2] = \frac{2q}{p} + 1
or
p E[X^2] = \frac{2q}{p} + 1
Hence,
E[X^2] = \frac{2q + p}{p^2} = \frac{q + 1}{p^2}
giving the result
Var(X) = E[X^2] - (E[X])^2 = \frac{q+1}{p^2} - \frac{1}{p^2} = \frac{q}{p^2} = \frac{1-p}{p^2}
■
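A short numerical sketch (with the illustrative value p = 0.25, not from the notes) confirms E[X] = 1/p and Var(X) = (1 − p)/p²:

# mean and variance of Geometric(p) via a truncated series
p = 0.25
pmf = [(1 - p)**(i - 1) * p for i in range(1, 5000)]
mean = sum(i * pmf[i - 1] for i in range(1, 5000))
var = sum(i**2 * pmf[i - 1] for i in range(1, 5000)) - mean**2
print(mean, 1 / p)              # about 4.0 vs 4.0
print(var, (1 - p) / p**2)      # about 12.0 vs 12.0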
Example 6
Roll a die until a 5 shows up. The number of rolls required is then a geometric random variable with p = 1/6.
Example 7
The sequence T T T H T T H H T has X = 4, since the first head appears on the 4th flip.
Flip:    1st 2nd 3rd 4th 5th 6th 7th 8th 9th
Result:   T   T   T   H   T   T   H   H   T
The pmf is
p(k) = \begin{cases} (1-p)^{k-1} p & \text{if } k = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}

Mean: E[X] = \mu = \frac{1}{p}
Variance: Var(X) = \sigma^2 = \frac{1-p}{p^2}
SD: SD(X) = \sigma = \frac{\sqrt{1-p}}{p}
Negative Binomial Distribution
• Consider a biased coin with probability p of heads.
• Flip it repeatedly (potentially ∞ times).
• Let X be the number of flips until the rth head (r = 1, 2, 3, · · · is a
fixed parameter).
Example 8
For r = 3, the sequence T T T H T H H T T H has X = 7, since the 3rd head appears on the 7th flip.
Flip:    1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th
Result:   T   T   T   H   T   H   H   T   T   H
X = k when
• first k − 1 flips: r − 1 heads and k − r tails in any order;
• kth flip: heads
Note that
p(k) = \binom{k-1}{r-1} p^{r-1} (1-p)^{k-r} \cdot p = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \qquad k = r, r+1, r+2, \ldots
The pmf is
p(k) = \begin{cases} \binom{k-1}{r-1} p^r (1-p)^{k-r} & \text{if } k = r, r+1, r+2, \ldots \\ 0 & \text{otherwise} \end{cases}
If k flips are required to get r heads, it means in k −1 flips you already have r −1
heads and the kth flip is a head. Assuming successive flips are independent, we
have:
P(X = k) = P(r-1 \text{ heads in the first } k-1 \text{ flips}) \cdot p
         = \binom{k-1}{r-1} p^{r-1} (1-p)^{k-r} \cdot p
         = \binom{k-1}{r-1} p^r (1-p)^{k-r}, \qquad k = r, r+1, r+2, \ldots
Writing X = X_1 + \cdots + X_r, where X_i is the number of flips after the (i-1)st head up to and including the ith head (so the X_i are independent Geometric(p) random variables), we get
• Mean: E[X] = E[X_1] + \cdots + E[X_r] = \frac{1}{p} + \cdots + \frac{1}{p} = \frac{r}{p}
• Variance: Var(X) = \frac{1-p}{p^2} + \cdots + \frac{1-p}{p^2} = \frac{r(1-p)}{p^2}
• Standard deviation: SD(X) = \frac{\sqrt{r(1-p)}}{p}
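A simulation sketch of this decomposition (illustrative values r = 3, p = 0.4; not part of the notes):

import random

# X = number of flips until the r-th head, built as a sum of r independent
# Geometric(p) waiting times
r, p, trials = 3, 0.4, 200_000

def geometric(p):
    # number of flips up to and including the first head
    n = 1
    while random.random() >= p:
        n += 1
    return n

samples = [sum(geometric(p) for _ in range(r)) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean)**2 for x in samples) / trials
print(mean, r / p)                  # about 7.5
print(var, r * (1 - p) / p**2)      # about 11.25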
Geometric Distribution
Example 9
About 10% of the population is left-handed.
Look at the handedness of babies in birth order in a hospital.
Number of births until first left-handed baby:
Geometric distribution with p = .1:
p(x) = \begin{cases} (.9)^{x-1} \cdot (.1) & \text{if } x = 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
Mean: E[X] = \mu = \frac{1}{p} = \frac{1}{.1} = 10
Standard deviation: SD(X) = \sigma = \frac{\sqrt{1-p}}{p} = \frac{\sqrt{.9}}{.1} \approx 9.487
which is huge!
Negative Binomial Distribution
Example 10
About 10% of the population is left-handed.
Look at the handedness of babies in birth order in a hospital.
Number of births until 8th left-handed baby:
Negative binomial distribution with p = .1 and r = 8:
p(x) = \begin{cases} \binom{x-1}{8-1} p^8 (1-p)^{x-8} & \text{if } x = 8, 9, 10, \ldots \\ 0 & \text{otherwise} \end{cases}
Mean: E[X] = \mu = \frac{r}{p} = \frac{8}{.1} = 80
Standard deviation: SD(X) = \sigma = \frac{\sqrt{r(1-p)}}{p} = \frac{\sqrt{8(.9)}}{.1} \approx 26.833
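A quick check of these two numbers (a sketch, not from the notes; the truncation point 2000 is arbitrary but far in the tail):

from math import comb

# mean and SD of the number of births until the 8th left-handed baby
r, p = 8, 0.1
pmf = {x: comb(x - 1, r - 1) * p**r * (1 - p)**(x - r) for x in range(r, 2000)}
mean = sum(x * q for x, q in pmf.items())
var = sum(x**2 * q for x, q in pmf.items()) - mean**2
print(mean, var**0.5)    # about 80 and 26.83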
For real a, x with |x| < 1,
\frac{a}{1-x} = \sum_{i=0}^{\infty} a x^i = a + ax + ax^2 + \cdots

For integral r > 0 and real x with |x| < 1,
\frac{1}{(1-x)^r} = \sum_{k=r}^{\infty} \binom{k-1}{r-1} x^{k-r}

Total probability for the geometric distribution:
\sum_{k=1}^{\infty} (1-p)^{k-1} p = \frac{p}{1-(1-p)} = \frac{p}{p} = 1

Total probability for the negative binomial distribution:
\sum_{k=r}^{\infty} \binom{k-1}{r-1} p^r (1-p)^{k-r} = p^r \sum_{k=r}^{\infty} \binom{k-1}{r-1} (1-p)^{k-r} = p^r \cdot \frac{1}{(1-(1-p))^r} = 1
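These normalizations are easy to verify numerically (a sketch with arbitrary illustrative values p = 0.3 and r = 4, not from the notes):

from math import comb

# both pmfs sum (approximately) to 1
p, r = 0.3, 4
geom_total = sum((1 - p)**(k - 1) * p for k in range(1, 500))
negbin_total = sum(comb(k - 1, r - 1) * p**r * (1 - p)**(k - r) for k in range(r, 500))
print(geom_total, negbin_total)    # both about 1.0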
Geometric and negative binomial – versions
Unfortunately, there are 4 versions of the definitions of these distributions, so
you may see them defined differently elsewhere:
• Version 1: the definitions we already did (call the variable X).
• Version 2 (geometric): Let Y be the number of tails before the first head, so T T T H T T H H T has Y = 3.
The pmf is
p_Y(k) = \begin{cases} (1-p)^k p & \text{if } k = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
Since Y = X - 1, we have
Mean: E[Y] = \frac{1}{p} - 1, \qquad Variance: Var(Y) = \frac{1-p}{p^2}
• Version 2 (negative binomial): Let Y be the number of tails before the rth head, so Y = X - r. The pmf is
p_Y(k) = \begin{cases} \binom{k+r-1}{r-1} p^r (1-p)^k & \text{if } k = 0, 1, 2, \ldots \\ 0 & \text{otherwise} \end{cases}
• Versions 3 and 4: switch the roles of heads and tails in the first two
versions (so p and 1 − p are switched).
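As an aside (not from the notes), common software also follows different versions, so it is worth checking the convention before using a library. For instance, SciPy's geom appears to follow Version 1 (number of trials) while its nbinom counts failures as in Version 2; a minimal sketch, assuming SciPy is available:

from math import comb
from scipy.stats import geom, nbinom

p, r = 0.3, 4
# geom: trials up to and including the first success (Version 1 above)
print(geom.pmf(3, p), (1 - p)**2 * p)
# nbinom: failures before the r-th success (Version 2 above)
print(nbinom.pmf(2, r, p), comb(2 + r - 1, r - 1) * p**r * (1 - p)**2)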
It is evident that P (X = x) ≥ 0 for x ≥ r. So let us prove that
\sum_{x \ge r} P(X = x) = 1:

\sum_{x \ge r} P(X = x) = \sum_{x \ge r} \binom{x-1}{r-1} p^r (1-p)^{x-r}
   = \sum_{x \ge r} \binom{x-1}{x-r} p^r (1-p)^{x-r}         (symmetry)
   = \sum_{j \ge 0} \binom{r+j-1}{j} p^r (1-p)^j             (substituting j = x - r)
   = p^r \sum_{j \ge 0} (-1)^j \binom{-r}{j} (1-p)^j         (identity \binom{j+r-1}{j} = (-1)^j \binom{-r}{j})
   = p^r \sum_{j \ge 0} \binom{-r}{j} (p-1)^j                ((-1)^j (1-p)^j = (p-1)^j)
   = p^r (1 + (p-1))^{-r}                                    (binomial theorem)
   = 1
using the identities
\binom{x-1}{r-1} = \frac{(x-1)!}{((x-1)-(r-1))!\,(r-1)!}
                = \frac{(x-1)!}{(x-r)!\,((x-1)-(x-r))!}
                = \binom{x-1}{x-r}
and
\binom{j+r-1}{j} = \frac{(j+r-1)!}{(j+r-1-j)!\,j!}
                = \frac{(j+r-1)(j+r-2)\cdots r\,(r-1)!}{(r-1)!\,j!}
                = \frac{(j+r-1)(j+r-2)\cdots r}{j!}
                = (-1)^j \frac{(-r-(j-1))(-r-(j-2))\cdots(-r)}{j!}
                = (-1)^j \frac{(-r)(-r-1)\cdots(-r-(j-1))}{j!}
                = (-1)^j \binom{-r}{j}
Hypergeometric Distribution
• An urn has 1000 balls: 700 green, 300 blue.
• Pick a ball at random. The probability it’s green is
p = 700/1000 = .7
• An urn has 1000 balls: 700 green, 300 blue.
• The urn needs to be well-mixed. Here, if you pick from the top, the
chance of blue is much higher than in the total population.
Sampling with and without replacement
If n balls are drawn without replacement from an urn containing N balls, K of which are green, then the number X of green balls drawn has the hypergeometric distribution
P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}
• E[X] = np and Var(X) = \frac{np(1-p)(N-n)}{N-1}, where p = K/N
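A small sketch (using the urn above, N = 1000 and K = 700, with an illustrative draw size n = 10 that is not stated in the notes) evaluates this pmf and checks the mean and variance formulas by direct summation:

from math import comb

# hypergeometric pmf, mean, and variance for illustrative values
N, K, n = 1000, 700, 10
p = K / N

def hyper_pmf(k):
    # P(X = k): k green balls among the n drawn without replacement
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

mean = sum(k * hyper_pmf(k) for k in range(n + 1))
var = sum(k**2 * hyper_pmf(k) for k in range(n + 1)) - mean**2
print(mean, n * p)                                 # both 7.0
print(var, n * p * (1 - p) * (N - n) / (N - 1))    # both about 2.081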
Sampling without replacement (2nd method)
An urn has 1000 balls: 700 green (G), 300 blue (B).
Exercise: Prove the formulas for the expected value and the variance of the hypergeometric distribution:
E[X] = np \quad \text{and} \quad Var(X) = \frac{np(1-p)(N-n)}{N-1}.