
Poisson and Geometric Random Variables

Jeff Chak Fu WONG

Department of Mathematics
The Chinese University of Hong Kong

MATH3280
Introductory Probability
The Poisson Random Variable
Definition 1
A random variable X that takes on one of the values 0, 1, 2, . . . is said to be a
Poisson random variable with parameter λ if, for some λ > 0,

    p(i) = P {X = i} = e^{−λ} λ^i / i!,    i = 0, 1, 2, . . .        (1)

Equation (1) defines a probability mass function, since

    Σ_{i=0}^{∞} p(i) = e^{−λ} Σ_{i=0}^{∞} λ^i / i! = e^{−λ} e^{λ} = 1

The Poisson random variable has a tremendous range of applications in diverse
areas because it may be used as an approximation for a binomial random variable
with parameters (n, p) when n is large and p is small enough so that np is of
moderate size.
To see this, suppose that X is a binomial random variable with parameters (n, p),
and let λ = np. Then

    P {X = i} = n! / ((n − i)! i!) · p^i (1 − p)^{n−i}
              = n! / ((n − i)! i!) · (λ/n)^i (1 − λ/n)^{n−i}
              = [n(n − 1) · · · (n − i + 1) / n^i] · [λ^i / i!] · [(1 − λ/n)^n / (1 − λ/n)^i]
Now, for n large and λ moderate,

    (1 − λ/n)^n ≈ e^{−λ},    n(n − 1) · · · (n − i + 1) / n^i ≈ 1,    (1 − λ/n)^i ≈ 1

Hence, for n large and λ moderate,

    P {X = i} ≈ e^{−λ} λ^i / i!
In other words, if n independent trials, each of which results in a success with
probability p, are performed, then, when n is large and p is small enough to
make np moderate, the number of successes occurring is approximately a Poisson
random variable with parameter λ = np.
This value λ (which will later be shown to equal the expected number of successes)
will usually be determined empirically.
Some examples of random variables that generally obey the Poisson probability
law [that is, they obey Equation (1)] are as follows:
1. The number of misprints on a page (or a group of pages) of a book
2. The number of people in a community who survive to age 100
3. The number of wrong telephone numbers that are dialed in a day
4. The number of packages of dog biscuits sold in a particular store each day
5. The number of customers entering a post office on a given day
6. The number of vacancies occurring during a year in the federal judicial
system
7. The number of α-particles discharged in a fixed period of time from some
radioactive material
Each of the preceding random variables, and numerous others, is approximately
Poisson for the same reason: because of the Poisson approximation to the
binomial.
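
To make the approximation concrete, here is a minimal numeric sketch (not part of the original notes): with np = λ held fixed, the Binomial(n, p) pmf gets closer to the Poisson(λ) pmf as n grows. The helper names and the choice λ = 3 are illustrative only.

    from math import comb, exp, factorial

    def binom_pmf(i, n, p):
        return comb(n, i) * p**i * (1 - p)**(n - i)

    def poisson_pmf(i, lam):
        return exp(-lam) * lam**i / factorial(i)

    lam = 3.0
    for n in (10, 100, 1000):
        p = lam / n
        # compare the two pmfs on i = 0, ..., 30 (both are negligible beyond that for lam = 3)
        gap = max(abs(binom_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(31))
        print(f"n = {n:4d}: largest pointwise pmf difference = {gap:.5f}")
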
Example 1
Suppose that the number of typographical errors on a single page of this book
has a Poisson distribution with parameter λ = 1/2.
Calculate the probability that there is at least one error on this page.

Solution
Letting X denote the number of errors on this page, we have

    P {X ≥ 1} = 1 − P {X = 0} = 1 − e^{−1/2} ≈ .393


Example 2
Suppose that the probability that an item produced by a certain machine will be
defective is .1.
Find the probability that a sample of 10 items will contain at most 1 defective
item.

Solution
Let
X = {the number of defectives}
Exact Solution:
X is a binomial random variable with parameters n = 10 and p = 0.1. So

X ∼ Binomial(10, 0.1)

Then, the desired probability is

    P {X ≤ 1} = P {X = 0} + P {X = 1}
              = C(10, 0) (.1)^0 (.9)^{10} + C(10, 1) (.1)^1 (.9)^9
              ≈ 0.7361,

where C(n, k) = n! / (k! (n − k)!) denotes the binomial coefficient.
Approximate Solution:
We can approximate X by a Poisson random variable, say Y , with parameter
λ = n · p = 10 · (0.1) = 1. So

Y ∼ Poisson(1)

Then, the desired probability is

    P {X ≤ 1} ≈ P {Y ≤ 1}
              = P {Y = 0} + P {Y = 1}
              = e^{−1} + e^{−1}
              ≈ 0.7358.
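
As a quick cross-check of Example 2 (a sketch, not part of the notes), both probabilities can be evaluated directly:

    from math import comb, exp, factorial

    n, p, lam = 10, 0.1, 1.0
    exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))   # binomial P{X <= 1}
    approx = sum(exp(-lam) * lam**k / factorial(k) for k in range(2))     # Poisson(1) P{Y <= 1}
    print(f"exact binomial value     = {exact:.4f}")    # about 0.7361
    print(f"Poisson(1) approximation = {approx:.4f}")   # about 0.7358
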
Before computing the expected value and variance of the Poisson random vari-
able with parameter λ, recall that this random variable approximates a binomial
random variable with parameters n and p when n is large, p is small, and λ = np.
Since such a binomial random variable has
• expected value np = λ and
• variance np(1 − p) = λ(1 − p) ≈ λ (since p is small),
it would seem that both the expected value and the variance of a Poisson random
variable would equal its parameter λ.
We now verify this result:

    E[X] = Σ_{i=0}^{∞} i e^{−λ} λ^i / i!
         = Σ_{i=1}^{∞} e^{−λ} λ^i / (i − 1)!        (since i / i! = 1 / (i − 1)!)
         = e^{−λ} Σ_{i=1}^{∞} λ · λ^{i−1} / (i − 1)!        (by using λ^i = λ · λ^{i−1})
         = λ e^{−λ} Σ_{j=0}^{∞} λ^j / j!        (by letting j = i − 1)
         = λ e^{−λ} e^{λ}        (since Σ_{j=0}^{∞} λ^j / j! = e^{λ})
         = λ
Thus, the expected value of a Poisson random variable X is indeed equal to its
parameter λ. To determine its variance, we first compute E[X^2]:

    E[X^2] = Σ_{i=0}^{∞} i^2 e^{−λ} λ^i / i!
           = λ Σ_{i=1}^{∞} i e^{−λ} λ^{i−1} / (i − 1)!
           = λ Σ_{j=0}^{∞} (j + 1) e^{−λ} λ^j / j!        (by letting j = i − 1)
           = λ ( Σ_{j=0}^{∞} j e^{−λ} λ^j / j! + Σ_{j=0}^{∞} e^{−λ} λ^j / j! )
           = λ (E[X] + 1)

where the final equality follows because the first sum is the expected value of
a Poisson random variable with parameter λ and the second is the sum of the
probabilities of this random variable.
Therefore, since we have shown that E[X] = λ, we obtain E[X^2] = λ(λ + 1), and hence

    Var(X) = E[X^2] − (E[X])^2 = λ(λ + 1) − λ^2 = λ
Theorem 1
If X ∼ Poisson(λ) then

E[X] = λ and Var(X) = λ.
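
Theorem 1 can also be sanity-checked numerically; the following sketch (with an illustrative λ = 2.5 and a truncated sum, not part of the notes) simply evaluates the defining series.

    from math import exp, factorial

    lam = 2.5
    # truncate the infinite sums; terms beyond i = 100 are negligible for lam = 2.5
    pmf = [exp(-lam) * lam**i / factorial(i) for i in range(100)]
    mean = sum(i * pi for i, pi in enumerate(pmf))
    var = sum(i**2 * pi for i, pi in enumerate(pmf)) - mean**2
    print(f"E[X]   = {mean:.6f}  (lambda = {lam})")
    print(f"Var(X) = {var:.6f}  (lambda = {lam})")
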


Example 3
Suppose that, on average, there are 3 potential earthquakes per week in a certain
region, and each potential earthquake actually occurs with probability 1/2. Find
the probability that at least 4 earthquakes occur in the next 5 weeks. Find also
the expectation and the variance.
To determine the right parameter, we need to adapt the time interval to the
question. Since the desired time interval is 5 weeks, there are 15 potential
earthquakes (independent trials) in 5 weeks. So we may take

    n = 15 and p = 1/2

Hence,
    λ = n · p = 15 · (1/2) = 15/2
So

    P {X ≥ 4} = 1 − P {X = 0} − P {X = 1} − P {X = 2} − P {X = 3}
              = 1 − p(0) − p(1) − p(2) − p(3)
              = 1 − e^{−15/2} − e^{−15/2} (15/2) − e^{−15/2} (15/2)^2 / 2! − e^{−15/2} (15/2)^3 / 3!
The expectation and the variance are

E[X] = λ = 15/2 and Var(X) = λ = 15/2.
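
For a numeric value of the Example 3 probability, a short evaluation (a sketch, not in the original notes) gives roughly 0.94:

    from math import exp, factorial

    lam = 15 / 2
    p_at_most_3 = sum(exp(-lam) * lam**i / factorial(i) for i in range(4))
    print(f"P(X >= 4) = {1 - p_at_most_3:.4f}")   # approximately 0.94
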


The Geometric Random Variable
Suppose that independent trials, each having probability p, 0 < p < 1, of being a
success (i.e., Bernoulli(p) trials), are performed until a success occurs.
If we let X equal the number of trials required, then

    P {X = n} = (1 − p)^{n−1} p,    n = 1, 2, · · ·        (2)

Equation (2) follows because, in order for X to equal n, it is necessary and
sufficient that the first n − 1 trials are failures and the nth trial is a success.
Equation (2) then follows, since the outcomes of the successive trials are assumed
to be independent. Since
    Σ_{n=1}^{∞} P {X = n} = p Σ_{n=1}^{∞} (1 − p)^{n−1} = p / (1 − (1 − p)) = 1

it follows that, with probability 1, a success will eventually occur.


Any random variable X whose probability mass function is given by Equation
(2) is said to be a geometric random variable with parameter p. We write

    X ∼ Geometric(p)

where p is the probability of success. It takes the values 1, 2, 3, · · · , and its PMF is

    p(n) = (1 − p)^{n−1} p   if n = 1, 2, · · ·
    p(n) = 0                 otherwise

Example 4
An urn contains N white and M black balls. Balls are randomly selected, one
at a time, until a black one is obtained. If we assume that each ball selected
is replaced before the next one is drawn, what is the probability that
(a) exactly n draws are needed?
(b) at least k draws are needed?
Solution. If we let X denote the number of draws needed to select a black ball,
then X satisfies Equation (2) with
    p = M / (M + N).

Hence, (a)

    P {X = n} = (N / (M + N))^{n−1} (M / (M + N)) = M N^{n−1} / (M + N)^n

where

    1 − p = 1 − M / (M + N) = N / (M + N)
(b)

    P {X ≥ k} = (M / (M + N)) Σ_{n=k}^{∞} (N / (M + N))^{n−1}
              = (M / (M + N)) (N / (M + N))^{k−1} Σ_{n=k}^{∞} (N / (M + N))^{n−k}
              = (M / (M + N)) (N / (M + N))^{k−1} Σ_{j=1}^{∞} (N / (M + N))^{j−1}        (by letting j = n − (k − 1))
              = (M / (M + N)) (N / (M + N))^{k−1} / (1 − N / (M + N))
              = (N / (M + N))^{k−1}

Of course, part (b) could have been obtained directly, since the probability that
at least k trials are necessary to obtain a success is equal to the probability that
the first k − 1 trials are all failures. That is, for a geometric random variable,

    P {X ≥ k} = (1 − p)^{k−1}
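
A small check of this tail formula (a sketch with illustrative values p = 0.3, k = 5, not part of the notes):

    p, k = 0.3, 5
    tail_sum = sum((1 - p)**(n - 1) * p for n in range(k, 2000))   # truncated direct sum of the pmf
    closed_form = (1 - p)**(k - 1)
    print(f"direct sum  = {tail_sum:.10f}")
    print(f"closed form = {closed_form:.10f}")
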


Example 5
An urn contains 10 white and 5 black balls. Balls are randomly selected, one
at a time, until a black one is obtained. Assume each selected ball is replaced
before the next one is drawn. Let

X = {the number of balls drawn up to the first black ball}

So we assume that selecting a black ball is a success. In this case

    p = probability of success = 5/15

and

    1 − p = probability of failure = 10/15

Hence the PMF is

    p(n) = (10/15)^{n−1} (5/15)   if n = 1, 2, · · ·
    p(n) = 0                      otherwise
Then we can answer the following questions.
1. What is the probability that 7 draws are needed up to the first black ball?

    p(7) = (10/15)^6 (5/15)

2. What is the probability that at least 13 draws are needed up to the first
black ball?

    P {X ≥ 13} = Σ_{n=13}^{∞} (10/15)^{n−1} (5/15)
               = (5/15) Σ_{n=13}^{∞} (10/15)^{n−1}
               = (5/15) (10/15)^{12} Σ_{n=1}^{∞} (10/15)^{n−1}
               = (5/15) (10/15)^{12} · 1 / (1 − 10/15)
               = (10/15)^{12}
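
A Monte Carlo sketch of Example 5 (not part of the notes; the helper draws_until_black and the trial count are illustrative) can be used to confirm both answers empirically:

    import random

    def draws_until_black(p_black=5/15):
        n = 1
        while random.random() >= p_black:   # failure: a white ball was drawn
            n += 1
        return n

    trials = 200_000
    samples = [draws_until_black() for _ in range(trials)]
    print("P(X = 7)   estimate:", sum(x == 7 for x in samples) / trials,
          "  exact:", (10/15)**6 * (5/15))
    print("P(X >= 13) estimate:", sum(x >= 13 for x in samples) / trials,
          "  exact:", (10/15)**12)
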
If X ∼ Geometric(p), then what are the expectation and the variance of X?

    E[X] = Σ_{i=1}^{∞} i p(i)
         = Σ_{i=1}^{∞} (i − 1 + 1) p(i)
         = Σ_{i=1}^{∞} (i − 1) p(i) + Σ_{i=1}^{∞} p(i)
         = Σ_{i=1}^{∞} (i − 1) q^{i−1} p + 1,        where q = 1 − p
         = Σ_{m=0}^{∞} m q^m p + 1        (by letting m = i − 1)
         = q Σ_{m=1}^{∞} m q^{m−1} p + 1
         = q Σ_{m=1}^{∞} m p(m) + 1
         = q E[X] + 1
Hence we obtain
E[X] = qE[X] + 1
or
E[X] − qE[X] = 1
or
(1 − q)E[X] = 1
By solving this equation,

    E[X] = 1 / (1 − q) = 1/p
To determine Var(X), let us first compute E[X^2]. With q = 1 − p, we have

    E[X^2] = Σ_{i=1}^{∞} i^2 q^{i−1} p
           = Σ_{i=1}^{∞} (i − 1 + 1)^2 q^{i−1} p
           = Σ_{i=1}^{∞} (i − 1)^2 q^{i−1} p + Σ_{i=1}^{∞} 2(i − 1) q^{i−1} p + Σ_{i=1}^{∞} q^{i−1} p
           = Σ_{j=0}^{∞} j^2 q^j p + 2 Σ_{j=0}^{∞} j q^j p + 1        (by letting j = i − 1)
           = q Σ_{j=1}^{∞} j^2 q^{j−1} p + 2q Σ_{j=1}^{∞} j q^{j−1} p + 1
           = q E[X^2] + 2q E[X] + 1
Using E[X] = 1/p, the equation for E[X^2] yields

    E[X^2] = q E[X^2] + 2q E[X] + 1
or
    (1 − q) E[X^2] = 2q(1/p) + 1
or
    p E[X^2] = 2q/p + 1

Hence,

    E[X^2] = (2q + p) / p^2 = (q + 1) / p^2

giving the result

    Var(X) = (q + 1)/p^2 − 1/p^2 = q/p^2 = (1 − p)/p^2
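
These two formulas can be checked numerically with a truncated summation of the pmf (a sketch with an illustrative p = 0.25, not part of the notes):

    p = 0.25
    q = 1 - p
    pmf = {n: q**(n - 1) * p for n in range(1, 5000)}   # the tail beyond n = 5000 is negligible
    mean = sum(n * pn for n, pn in pmf.items())
    var = sum(n**2 * pn for n, pn in pmf.items()) - mean**2
    print(f"E[X]   = {mean:.6f}   (1/p       = {1/p})")
    print(f"Var(X) = {var:.6f}   ((1-p)/p^2 = {q/p**2})")
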

Example 6
Roll a die until a 5 shows up.

    X = {the number of trials up to the first 5}

Then
    p = probability of success = 1/6
The PMF is

    p(n) = (5/6)^{n−1} (1/6)   if n = 1, 2, · · ·
    p(n) = 0                   otherwise
What is the probability that the die needs to be rolled at most 3 times until it
shows a 5?

    P {X ≤ 3} = p(1) + p(2) + p(3)
              = 1/6 + (5/6)(1/6) + (5/6)^2 (1/6)
              = (1/6) (1 + 5/6 + 25/36)
              = (1/6) · (36 + 30 + 25)/36
              = 91/216
What is the expectation?

    E[X] = 1/p = 6

What is the variance?

    Var(X) = 1/p^2 − 1/p = 36 − 6 = 30

What is the standard deviation?

    σ_X = √Var(X) = √30    ■
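
A Monte Carlo sketch of Example 6 (not in the original notes; the helper name and the number of trials are illustrative):

    import random

    def rolls_until_five():
        n = 1
        while random.randint(1, 6) != 5:
            n += 1
        return n

    trials = 200_000
    samples = [rolls_until_five() for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum(x**2 for x in samples) / trials - mean**2
    print("P(X <= 3) estimate:", sum(x <= 3 for x in samples) / trials, "  exact 91/216 = 0.4213...")
    print("mean estimate:", round(mean, 3), "  variance estimate:", round(var, 2))
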
Geometric and negative binomial distributions
Geometric Distribution
• Consider a biased coin with probability p of heads.
• Flip it repeatedly (potentially ∞ times).
• Let X be the number of flips until the first head.

Example 7
T T T HT T HHT has X = 4.

T T T H T T H H T
1st 2nd 3rd 4th 5th 6th 7th 8th 9th
The pmf is

    p(k) = (1 − p)^{k−1} p   if k = 1, 2, · · ·
    p(k) = 0                 otherwise

    Mean:               E[X] = µ = 1/p
    Variance:           Var(X) = σ^2 = (1 − p)/p^2
    Standard deviation: SD(X) = σ = √(1 − p) / p
Negative Binomial Distribution
• Consider a biased coin with probability p of heads.
• Flip it repeatedly (potentially ∞ times).
• Let X be the number of flips until the rth head (r = 1, 2, 3, · · · is a
fixed parameter).

Example 8
For r = 3, T T T HT HHT T H has X = 7.

T T T H T H H T T H
1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th
X = k when
• first k − 1 flips: r − 1 heads and k − r tails in any order;
• kth flip: heads
Note that

    p(k) = C(k − 1, r − 1) p^{r−1} (1 − p)^{k−r} · p = C(k − 1, r − 1) p^r (1 − p)^{k−r},   if k = r, r + 1, r + 2, · · ·

The pmf is

    p(k) = C(k − 1, r − 1) p^r (1 − p)^{k−r}   if k = r, r + 1, r + 2, · · ·
    p(k) = 0                                   otherwise
If k flips are required to get r heads, it means in k −1 flips you already have r −1
heads and the kth flip is a head. Assuming successive flips are independent, we
have:

    P (X = k) = P (r − 1 heads in k − 1 flips) · p
              = p(k)
              = C(k − 1, r − 1) p^{r−1} (1 − p)^{k−r} · p
              = C(k − 1, r − 1) p^r (1 − p)^{k−r},   if k = r, r + 1, r + 2, · · ·

• Basically, the C(k − 1, r − 1) represents the number of ways you can select
  r − 1 successes out of k − 1 tries.
• The next part, p^{r−1} (1 − p)^{k−r}, assuming the events of success are
  independent, represents r − 1 successes and k − r failures. Note that
  (k − r) + (r − 1) = k − 1, which is the total number of tries so far.
• Multiplying that with p, which is the probability of the last success, gives
  you the answer.
Negative Binomial Distribution – mean and variance
Consider the sequence of flips T T T HT HHT T H.
Break it up at each heads:

    T T T H | T H | H | T T H
    X1 = 4    X2 = 2  X3 = 1  X4 = 3

• X1 is the number of flips until the 1st heads;


• X2 is the number of additional flips until the 2nd heads;
• X3 is the number of additional flips until the 3rd heads; · · ·
The Xi ’s are i.i.d. (independent and identically distributed) geometric random
variables with parameter p, and
X = X1 + · · · + Xr .


    Mean:               E[X] = E[X1] + · · · + E[Xr] = 1/p + · · · + 1/p = r/p
    Variance:           Var(X) = (1 − p)/p^2 + · · · + (1 − p)/p^2 = r(1 − p)/p^2
    Standard deviation: SD(X) = √(r(1 − p)) / p
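
This decomposition also gives a direct way to simulate a negative binomial variable; the following sketch (illustrative r = 3, p = 0.4, not part of the notes) compares the empirical moments with r/p and r(1 − p)/p^2:

    import random

    def geometric(p):
        n = 1
        while random.random() >= p:
            n += 1
        return n

    r, p = 3, 0.4
    trials = 200_000
    samples = [sum(geometric(p) for _ in range(r)) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum(x**2 for x in samples) / trials - mean**2
    print(f"mean estimate     = {mean:.3f}   (r/p        = {r/p})")
    print(f"variance estimate = {var:.3f}   (r(1-p)/p^2 = {r*(1-p)/p**2})")
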
Geometric Distribution

Example 9
About 10% of the population is left-handed.
Look at the handedness of babies in birth order in a hospital.
Number of births until first left-handed baby:
Geometric distribution with p = .1:

    p(x) = (.9)^{x−1} (.1)   if x = 1, 2, · · ·
    p(x) = 0                 otherwise

    Mean:               E[X] = µ = 1/p = 1/.1 = 10
    Standard deviation: SD(X) = σ = √(1 − p) / p = √.9 / .1 ≈ 9.487
which is huge!
Negative Binomial Distribution
Example 10
About 10% of the population is left-handed.
Look at the handedness of babies in birth order in a hospital.
Number of births until 8th left-handed baby:
Negative binomial distribution with r = 8 and p = .1:

    p(x) = C(x − 1, 7) (.1)^8 (.9)^{x−8}   if x = 8, 9, 10, · · ·
    p(x) = 0                               otherwise

    Mean:               E[X] = µ = r/p = 8/.1 = 80
    Standard deviation: SD(X) = σ = √(r(1 − p)) / p = √(8(.9)) / .1 ≈ 26.833

Probability the 50th baby is the 8th left-handed one:

    p(50) = C(49, 7) (.1)^8 (.9)^{42} ≈ 0.0103
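
Evaluating this probability directly (a one-line sketch, not part of the notes):

    from math import comb

    r, p = 8, 0.1
    print(comb(49, r - 1) * p**r * (1 - p)**(50 - r))   # approximately 0.0103
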
Where do the distribution names come from?
The pmfs correspond to the terms in certain Taylor series.

Geometric series: for real a, x with |x| < 1,

    a / (1 − x) = Σ_{i=0}^{∞} a x^i = a + ax + ax^2 + · · ·

Total probability for the geometric distribution:

    Σ_{k=1}^{∞} (1 − p)^{k−1} p = p / (1 − (1 − p)) = p/p = 1

Negative binomial series: for integral r > 0 and real x with |x| < 1,

    1 / (1 − x)^r = Σ_{k=r}^{∞} C(k − 1, r − 1) x^{k−r}

Total probability for the negative binomial distribution:

    Σ_{k=r}^{∞} C(k − 1, r − 1) p^r (1 − p)^{k−r} = p^r Σ_{k=r}^{∞} C(k − 1, r − 1) (1 − p)^{k−r}
                                                  = p^r · 1 / (1 − (1 − p))^r = 1
Geometric and negative binomial – versions
Unfortunately, there are 4 versions of the definitions of these distributions, so
you may see them defined differently elsewhere:
• Version 1: the definitions we already did (call the variable X).
• Version 2 (geometric): Let Y be the number of tails before the first
heads, so T T T HT T HHT has Y = 3.
The pmf is

    pY(k) = (1 − p)^k p   if k = 0, 1, 2, · · ·
    pY(k) = 0             otherwise

Since Y = X − 1, we have

    Mean: E[Y] = 1/p − 1,    Variance: Var(Y) = (1 − p)/p^2

• Version 2 (negative binomial): Let Y be the number of tails before the
  rth heads, so Y = X − r. The pmf is

    pY(k) = C(k + r − 1, r − 1) p^r (1 − p)^k   if k = 0, 1, 2, · · ·
    pY(k) = 0                                   otherwise

• Versions 3 and 4: switch the roles of heads and tails in the first two
versions (so p and 1 − p are switched).
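
For reference, a short sketch of how these versions show up in software. To the best of my knowledge, scipy.stats follows Version 1 for the geometric distribution (support 1, 2, · · ·) and Version 2 for the negative binomial (number of failures before the rth success); this assumes scipy is installed and should be verified against its documentation.

    from math import comb
    from scipy.stats import geom, nbinom

    p, r = 0.3, 4
    # Version 1 geometric: P{X = 5} = (1 - p)^4 p
    print(geom.pmf(5, p), (1 - p)**4 * p)
    # Version 2 negative binomial: P{Y = 6}, where Y counts tails before the rth head
    print(nbinom.pmf(6, r, p), comb(6 + r - 1, r - 1) * p**r * (1 - p)**6)
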
It is evident that P (X = x) ≥ 0 for x ≥ r. So let us prove that Σ_{x≥r} P (X = x) = 1:

    Σ_{x≥r} P (X = x) = Σ_{x≥r} C(x − 1, r − 1) p^r (1 − p)^{x−r}
                      = Σ_{x≥r} C(x − 1, x − r) p^r (1 − p)^{x−r}        (symmetry)
                      = Σ_{j≥0} C(r + j − 1, j) p^r (1 − p)^j        (substituting j = x − r)
                      = p^r Σ_{j≥0} (−1)^j C(−r, j) (1 − p)^j        (identity C(j + r − 1, j) = (−1)^j C(−r, j))
                      = p^r Σ_{j≥0} C(−r, j) (p − 1)^j        ((−1)^j (1 − p)^j = (p − 1)^j)
                      = p^r (1 + (p − 1))^{−r}        (binomial theorem)
                      = 1
using the identities

    C(x − 1, r − 1) = (x − 1)! / (((x − 1) − (r − 1))! (r − 1)!)
                    = (x − 1)! / ((x − r)! ((x − 1) − (x − r))!)
                    = C(x − 1, x − r)

and

    C(j + r − 1, j) = (j + r − 1)! / ((j + r − 1 − j)! j!)
                    = (j + r − 1)(j + r − 2) · · · r (r − 1)! / ((r − 1)! j!)
                    = (j + r − 1)(j + r − 2) · · · r / j!
                    = (−1)^j (−r − (j − 1))(−r − (j − 2)) · · · (−r) / j!
                    = (−1)^j (−r)(−r − 1) · · · (−r − (j − 1)) / j!
                    = (−1)^j C(−r, j)
Hypergeometric Distribution
• An urn has 1000 balls: 700 green, 300 blue.
• Pick a ball at random. The probability it’s green is

p = 700/1000 = .7

• An urn has 1000 balls: 700 green, 300 blue.
• The urn needs to be well-mixed: if you pick from the top of a poorly mixed
urn, the chance of blue can be much higher than in the total population.

Sampling with and without replacement

An urn has 1000 balls: 700 green, 300 blue.

Sampling with replacement
• Pick one of the 1000 balls. Record its color (green or blue).
• Put it back in the urn and shake it up.
• Again pick one of the 1000 balls and record its color.
• Repeat n times.
• On each draw, the probability of green is 700/1000.
• The # green balls drawn has a binomial distribution, p = 700/1000 = .7.

Sampling without replacement
• Pick one of the 1000 balls, record its color, and set it aside.
• Pick one of the remaining 999 balls, record its color, set it aside.
• Pick one of the remaining 998 balls, record its color, set it aside.
• Repeat n times, never re-using the same ball.
• Equivalently, take n balls all at once and count them by color.
• The # green balls drawn has a hypergeometric distribution.
An urn has 1000 balls: 700 green, 300 blue.
A sample of 7 balls is drawn.
What is the probability that it has 3 green balls and 4 blue balls?

Sampling with replacement
Each draw has the same probability to be green: p = 700/1000 = .7

    P (3 green & 4 blue) = C(7, 3) p^3 (1 − p)^4 = C(7, 3) (.7)^3 (.3)^4 = 0.0972405

Sampling without replacement
# samples with 3 green balls and 4 blue balls: C(700, 3) C(300, 4)
# samples of size 7: C(1000, 7)

    P (3 green & 4 blue) = (# samples with 3 green and 4 blue) / (# samples of size 7)
                         = C(700, 3) C(300, 4) / C(1000, 7) ≈ 0.0969179
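
Both numbers are easy to reproduce (a sketch, not part of the notes):

    from math import comb

    p_with = comb(7, 3) * 0.7**3 * 0.3**4                      # binomial, with replacement
    p_without = comb(700, 3) * comb(300, 4) / comb(1000, 7)    # hypergeometric, without replacement
    print(f"with replacement    = {p_with:.7f}")     # 0.0972405
    print(f"without replacement = {p_without:.7f}")  # about 0.0969179
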
Hypergeometric distribution
Exact distribution for sampling without replacement

    Population (full urn)        Sample
    N balls                      n balls
    K green                      k green
    N − K blue                   n − k blue
    p = K/N                      p̂ = k/n

Hypergeometric distribution (for sampling without replacement)
• Draw n balls without replacement.
• Let the random variable X be the number of green balls drawn.
• Its pmf is given by the hypergeometric distribution

    P (X = k) = C(K, k) C(N − K, n − k) / C(N, n)

• E[X] = np and Var(X) = np(1 − p)(N − n) / (N − 1)
Sampling without replacement (2nd method)
An urn has 1000 balls: 700 green (G), 300 blue (B).

What is the probability to draw 7 balls in the order GBBGBGB?

    P (1st is G) = 700/1000
    P (2nd is B | 1st is G) = 300/999
    P (3rd is B | first two are G, B) = 299/998

    P (GBBGBGB) = (700/1000)(300/999)(299/998)(699/997)(298/996)(698/995)(297/994)
                = (700 · 699 · 698) · (300 · 299 · 298 · 297) / (1000 · 999 · 998 · 997 · 996 · 995 · 994)

Probability a sample of size 7 has 3 green and 4 blue:
Each sequence of 3 G's and 4 B's has that same probability; the numerator
factors appear in a different order, but the result is equal.
Adding the probabilities of all C(7, 3) sequences of 3 G's and 4 B's gives:

    P (3 green & 4 blue) = C(7, 3) (700 · 699 · 698) · (300 · 299 · 298 · 297) / (1000 · 999 · 998 · 997 · 996 · 995 · 994)
Sampling without replacement
Equivalence of both methods, and approximation by the binomial distribution
An urn has 1000 balls: 700 green, 300 blue.
A sample of 7 balls is drawn, without replacement.
What is the probability that it has 3 green balls and 4 blue balls?

Probability with the hypergeometric distribution (1st method)

    P (3 green & 4 blue) = C(700, 3) C(300, 4) / C(1000, 7)
                         = [(700 · 699 · 698)/3!] · [(300 · 299 · 298 · 297)/4!] / [(1000 · 999 · 998 · 997 · 996 · 995 · 994)/7!]
                         = (7!/(3! 4!)) · (700 · 699 · 698) · (300 · 299 · 298 · 297) / (1000 · 999 · 998 · 997 · 996 · 995 · 994)

which agrees with the 2nd method of computing the hypergeometric probability.

Probability with the binomial distribution

    P (3 green & 4 blue) ≈ C(7, 3) (700/1000)^3 (300/1000)^4

If the numbers of green, blue, and total balls in the sample are much smaller
than in the urn, the hypergeometric pmf ≈ the binomial pmf.
Hypergeometric distribution vs. Binomial distribution

Figure 6: hypergeometric vs. binomial pmfs, p = 0.25, population size N = 10000.
Exercise: Prove the expected value/the variance of the hypergeometric distribution:

    E[X] = np and Var(X) = np(1 − p)(N − n) / (N − 1).
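
This is not a proof, but the formulas in the exercise can be sanity-checked numerically for a particular choice of N, K, n (the values below are illustrative):

    from math import comb

    N, K, n = 50, 20, 10                      # population size, green balls, sample size
    p = K / N
    pmf = {k: comb(K, k) * comb(N - K, n - k) / comb(N, n) for k in range(n + 1)}
    mean = sum(k * pk for k, pk in pmf.items())
    var = sum(k**2 * pk for k, pk in pmf.items()) - mean**2
    print(f"E[X]   = {mean:.6f}   (np                 = {n * p})")
    print(f"Var(X) = {var:.6f}   (np(1-p)(N-n)/(N-1) = {n * p * (1 - p) * (N - n) / (N - 1):.6f})")
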
Reference

Sheldon Ross, A First Course in Probability, 8th edition, Pearson Education
International.
Tesler, G. P., MATH186 Lecture Notes.
