
Stat1301 Probability & Statistics I Spring 2008-2009

Chapter II Random Variables and Probability Distributions

§ 2.1 Random Variables

Definition

A random variable X : Ω → ℜ is a numerical-valued function defined on a sample
space. In other words, a number X(ω), providing a measure of the characteristic of
interest, is assigned to each outcome ω in the sample space.

Remark
Always keep in mind that X is a function rather than a number. The value of X
depends on the outcome. We write X = x to represent the event
{ ω ∈ Ω | X (ω ) = x } and X ≤ x to represent the event { ω ∈ Ω | X (ω ) ≤ x }.

Example
Let X be the number of aces in a hand of three cards drawn randomly from a deck
of 52 cards. Denote A as an ace card and N as a non-ace card. Then

Ω = {AAA, AAN, ANA, ANN, NAA, NAN, NNA, NNN}

The space of X is {0, 1, 2, 3}. Hence X is discrete.

X : Ω → { 0, 1, 2, 3 } such that

X(AAA) = 3
X(AAN) = X(ANA) = X(NAA) = 2
X(ANN) = X(NAN) = X(NNA) = 1
X(NNN) = 0

Referring to the same example from Chapter I, we have

P(X = 0) = P({NNN}) = 0.78262
P(X = 1) = P({ANN, NAN, NNA}) = 0.06805 × 3 = 0.20415
P(X = 2) = P({AAN, ANA, NAA}) = 0.00434 × 3 = 0.01302
P(X = 3) = P({AAA}) = 0.00018
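A minimal Python sketch to reproduce this pmf by enumerating the ordered draws without replacement (the helper name ace_pmf is chosen here for illustration):

```python
from itertools import product
from fractions import Fraction

def ace_pmf(n_cards=3, n_aces=4, deck=52):
    """Sketch: pmf of the number of aces in n_cards drawn without replacement."""
    pmf = {k: Fraction(0) for k in range(n_cards + 1)}
    # Enumerate ordered sequences of A (ace) / N (non-ace) draws.
    for seq in product("AN", repeat=n_cards):
        prob, aces_left, others_left = Fraction(1), n_aces, deck - n_aces
        for c in seq:
            total = aces_left + others_left
            if c == "A":
                prob *= Fraction(aces_left, total)
                aces_left -= 1
            else:
                prob *= Fraction(others_left, total)
                others_left -= 1
        pmf[seq.count("A")] += prob
    return pmf

for k, p in ace_pmf().items():
    print(k, float(p))   # ≈ 0.7826, 0.2042, 0.0130, 0.0002
```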


Example
The annual income ω of a randomly selected citizen has a sample space Ω = [0, ∞ ) .
Suppose the annual income is taxable if it exceeds c. Let X be the taxable income.
Then the space of X is also [0, ∞ ) and X : Ω → [0, ∞ ) such that

X(ω) = 0 if ω ≤ c ,   and   X(ω) = ω − c if ω > c .

Note: Conventionally, we use capital letters X, Y, … to denote random variables
and small letters x, y, … the possible numerical values (or realizations) of
these variables.

§ 2.2 Distribution of the Discrete Type

Definition

A random variable X defined on the sample space Ω is called a discrete random
variable if X(Ω) = { X(ω) : ω ∈ Ω } is countable (e.g. X : Ω → { 0, 1, 2, ... }).

§ 2.2.1 Probability Mass Function and Distribution Function

Definition

The probability mass function (pmf) of a discrete random variable X is defined as

p(x ) = P( X = x ) , x ∈ X (Ω ) ,

where X (Ω ) is the countable set of possible values of X.

Example
For the previous example of card drawing, the pmf of X is

p(0 ) = 0.78262 , p(1) = 0.20415 , p(2 ) = 0.01302 , p(3) = 0.00018

P.40
Stat1301 Probability& Statistics I Spring 2008-2009

Conditions for a pmf

Since the pmf is defined through probability, we have the following conditions for p to
be a valid pmf:

1. p(x) ≥ 0 for x ∈ X(Ω) ;   p(x) = 0 for x ∉ X(Ω)

2. ∑_{x∈X(Ω)} p(x) = 1

3. P(X ∈ A) = ∑_{x∈A} p(x) where A ⊂ X(Ω)

Example

Is p(x) = x/6 , x = 1, 2, 3 a valid pmf ?

[Bar chart of the pmf: p(1) = 1/6, p(2) = 1/3, p(3) = 1/2.]

X(Ω) = { 1, 2, 3 }

1. p(x) = x/6 > 0 for all x = 1, 2, 3 .

2. ∑_{x=1}^{3} p(x) = 1/6 + 1/3 + 1/2 = 1

3. P(X ≤ 2) = p(1) + p(2) = 1/2


Definition

The (cumulative) distribution function (cdf) of the discrete random variable X is
defined as

F(x) = P(X ≤ x) = ∑_{t ≤ x} p(t) ,   −∞ < x < ∞ .

Example

Using the previous example, p(x) = x/6 , x = 1, 2, 3 , we have

F(1) = P(X ≤ 1) = P(X = 1) = 1/6

F(1.5) = P(X ≤ 1.5) = P(X = 1) = 1/6 = F(1.566) = F(1.99999) = ...

F(2) = P(X ≤ 2) = p(1) + p(2) = 1/2

F(3) = P(X ≤ 3) = p(1) + p(2) + p(3) = 1

As can be seen, the cdf of a discrete random variable is a step function with
p(x) as the size of the jump at each possible value x.

[Plot of the cdf F(x): a step function jumping to 1/6 at x = 1, to 1/2 at x = 2, and to 1 at x = 3.]


Properties of a cdf
1. F(x) is nondecreasing, i.e. if a ≤ b , then F(a) ≤ F(b) .

2. F(−∞) = lim_{b→−∞} F(b) = 0

3. F(∞) = lim_{b→∞} F(b) = 1

4. F is right continuous. That is, for any b and any decreasing sequence
   { b_n , n ≥ 1 } that converges to b, lim_{n→∞} F(b_n) = F(b) .

5. F(x) is a step function if X is a discrete random variable. The step size at the
   point x ∈ X(Ω) is P(X = x) .

6. The cdf is useful in describing probability models. The probabilities
   attributed to events concerning a random variable X can be reconstructed from
   the cdf of X, i.e. the cdf of X completely specifies the random behaviour of X.

§ 2.2.2 Mathematical Expectation

Example
Consider the following two games. In each game, three fair dice will be rolled.

Game 1: If all three dice show the same number, you win $24; otherwise
you lose $1.

Game 2: You win $1, $2, or $3 according to whether one, two, or three dice show
a six. If no die shows a six, you lose $1.

Which game is a better choice?

To devise a better betting strategy, one may consider the amount one will win (or
lose) in the long run. First we need to evaluate the probabilities of winning or losing in each
game. Let X, Y be the amounts of money you will win in one single trial of game
1 and game 2 respectively. A negative value means you lose money.


For game 1,

P(X = 24) = P(same number on three dice) = 6 × (1/6) × (1/6) × (1/6) = 1/36

P(X = −1) = 1 − 1/36 = 35/36

For game 2,

P(Y = 1) = 3 × (1/6) × (5/6) × (5/6) = 25/72 ,   P(Y = 2) = 3 × (5/6) × (1/6) × (1/6) = 5/72

P(Y = 3) = (1/6) × (1/6) × (1/6) = 1/216 ,   P(Y = −1) = 1 − 25/72 − 5/72 − 1/216 = 125/216

Suppose we play game 1 for 36000 trials. Since the relative frequency is a good
estimate of the probability when the number of trials is large, in approximately 1000
trials we will win $24 and in 35000 trials we will lose $1. So in these 36000 trials of
game 1, we win

24 × 1000 + (−1) × 35000 = −11000

Approximately we will lose $11000 in 36000 trials of game 1. The average amount
we win in each trial is

−11000 / 36000 = −11/36

This is the long term average of gain if we play game 1. Indeed it can be calculated
as

−11/36 = [24 × 1000 + (−1) × 35000] / 36000
       = 24 × (1/36) + (−1) × (35/36)
       = 24 × P(X = 24) + (−1) × P(X = −1)

Similarly, the long term average of gain if we play game 2 is

1 × P(Y = 1) + 2 × P(Y = 2) + 3 × P(Y = 3) + (−1) × P(Y = −1)
= 1 × (25/72) + 2 × (5/72) + 3 × (1/216) + (−1) × (125/216) = −17/216 > −11/36

Therefore game 2 is better than game 1 in terms of long term average gain.

However, since in the long run you will lose money in both games, the best strategy
is not to gamble at all.
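A small Monte Carlo sketch in Python to illustrate the long-run averages of the two games (the helper names play_game1 and play_game2 are chosen here for illustration):

```python
import random

def play_game1(rng):
    """One trial of game 1: win $24 if all three dice match, else lose $1."""
    dice = [rng.randint(1, 6) for _ in range(3)]
    return 24 if len(set(dice)) == 1 else -1

def play_game2(rng):
    """One trial of game 2: win $k if k dice show a six (k = 1, 2, 3), else lose $1."""
    sixes = sum(rng.randint(1, 6) == 6 for _ in range(3))
    return sixes if sixes > 0 else -1

rng = random.Random(0)
n = 1_000_000
avg1 = sum(play_game1(rng) for _ in range(n)) / n
avg2 = sum(play_game2(rng) for _ in range(n)) / n
print(avg1, -11 / 36)    # empirical vs theoretical ≈ -0.3056
print(avg2, -17 / 216)   # empirical vs theoretical ≈ -0.0787
```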


Definition

Let X be a discrete random variable with pmf p(x). The mathematical expectation
(expected value) of X is defined by

E(X) = ∑_{x∈X(Ω)} x p(x)

provided that the summation exists. In general, for any function g, the expected
value of g(X) is

E(g(X)) = ∑_{x∈X(Ω)} g(x) p(x) .

e.g. E(X²) = ∑_{x∈X(Ω)} x² p(x) ,   E(log X) = ∑_{x∈X(Ω)} (log x) p(x) , …, etc.

Properties
1. If c is a constant, then E(c) = c .

   E(c) = ∑_{x∈X(Ω)} c p(x) = c ∑_{x∈X(Ω)} p(x) = c

2. If c is a constant, then E(c g(X)) = c E(g(X)) .

   E(c g(X)) = ∑_{x∈X(Ω)} c g(x) p(x) = c ∑_{x∈X(Ω)} g(x) p(x) = c E(g(X))

3. If c_1, c_2, ..., c_n are constants, then E( ∑_{i=1}^{n} c_i g_i(X) ) = ∑_{i=1}^{n} c_i E(g_i(X)) .

   e.g. E(5X + 3X²) = 5E(X) + 3E(X²) .

   However, E(X²) ≠ [E(X)]² ,   E(log X) ≠ log E(X)

4. X(ω) ≥ Y(ω) for all ω ∈ Ω  ⇒  E(X) ≥ E(Y)

5. E(|X|) ≥ |E(X)|


Example

In the previous gambling example, for game 1,

E(X²) = (24)² × (1/36) + (−1)² × (35/36) = 611/36 = 16.9722

E( (X − (−11/36))² ) = (24 + 11/36)² × (1/36) + (−1 + 11/36)² × (35/36) = 16.8788

Alternatively,

E( (X − (−11/36))² ) = E(X²) + (11/36)² + 2(11/36)E(X)
                     = 16.9722 + 0.09336 + 2(0.3056)(−0.3056) = 16.8788

The value E( (X − E(X))² ) can tell us the variation of our gains among long term
trials of game 1.

§ 2.2.3 Mean and Variance

Definition
If X is a discrete random variable with pmf p(x) and space X(Ω), then E(X) is
called the (population) mean of X (of the distribution) and is usually denoted by
μ . It is a measure of the central location of the random variable X .

Furthermore, E( (X − μ)² ) = ∑_{x∈X(Ω)} (x − μ)² p(x) is called the (population) variance
of X (of the distribution) and is usually denoted by σ² or Var(X) .

The positive square root σ = √σ² = √Var(X) is called the (population) standard
deviation of X (of the distribution). Both σ and σ² are measures of spread.


Properties

1. Var(X) = E(X²) − μ²

   Var(X) = E( (X − μ)² )
          = E( X² − 2μX + μ² )
          = E(X²) − 2μE(X) + μ²
          = E(X²) − μ²

2. Let a, c be two constants. Then Var(aX + c) = a² Var(X) .

   Var(aX + c) = E( [(aX + c) − E(aX + c)]² )
               = E( [aX − aE(X)]² )
               = E( a² (X − μ)² )
               = a² E( (X − μ)² ) = a² Var(X)

Example

For the gambling example,

Var(X) = 16.8788 ,   σ_X = √16.8788 = 4.1084

E(Y²) = 1 × (25/72) + 4 × (5/72) + 9 × (1/216) + 1 × (125/216) = 1.2454

Var(Y) = E(Y²) − μ_Y² = 1.2454 − (−17/216)² = 1.2392

σ_Y = √1.2392 = 1.1132

Therefore the variation of gain from game 2 is much less than that from game 1.
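A minimal sketch to check these means and variances exactly from the pmfs (the helper pmf_mean_var is an illustrative name):

```python
from fractions import Fraction as F

def pmf_mean_var(pmf):
    """Sketch: exact mean and variance of a discrete pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var

game1 = {24: F(1, 36), -1: F(35, 36)}
game2 = {1: F(25, 72), 2: F(5, 72), 3: F(1, 216), -1: F(125, 216)}

for pmf in (game1, game2):
    mean, var = pmf_mean_var(pmf)
    print(float(mean), float(var), float(var) ** 0.5)
# game 1: -0.3056, 16.8788, 4.1084   game 2: -0.0787, 1.2392, 1.1132
```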


Example

An individual who owns the ice cream concession at a sporting event can expect to
net $600 on the sale of ice cream if the day is sunny, but only $300 if it is cloudy,
and $100 if it rains. The respective probabilities for those events are 0.6, 0.3 and
0.1.

Let $X be his net profit on a particular day. Then the distribution of X can be
described by the following pmf:

P ( X = 600) = 0.6 , P ( X = 300) = 0.3 , P ( X = 100) = 0.1 .

Hence his expected profit is given by

E ( X ) = 600 × 0.6 + 300 × 0.3 + 100 × 0.1 = 460

Variance of profit:

E(X²) = (600)² × 0.6 + (300)² × 0.3 + (100)² × 0.1 = 244000

Var(X) = E(X²) − [E(X)]² = 244000 − (460)² = 32400

Hence the standard deviation of his profit is $180 .

Markov Inequality

If X is a positive random variable with finite mean, then for any constant c > 0 ,

P(X ≥ c) ≤ E(X) / c .

Proof
E(X) = ∑_{x∈X(Ω)} x p(x)
     = ∑_{x≥c} x p(x) + ∑_{x<c} x p(x)
     ≥ ∑_{x≥c} x p(x) ≥ c ∑_{x≥c} p(x) = c P(X ≥ c)


Chebyshev’s Inequality

If the random variable X has a finite mean μ and finite variance σ², then for any
constant k > 0 ,

P( |X − μ| ≥ kσ ) ≤ 1/k² .

e.g. The probability that X deviates from the mean by more than 2 standard
deviations is at most 0.25.

Proof

By Markov's inequality,

P( |X − μ| ≥ kσ ) = P( (X − μ)² ≥ k²σ² ) ≤ E( (X − μ)² ) / (k²σ²) = σ² / (k²σ²) = 1/k²

§ 2.2.4 Moment and Moment Generating Function

Definition

Let r be a positive integer. E(X^r) is called the rth moment of X . E( (X − b)^r ) is
called the rth moment of X about b if it exists. It is also called the rth central
moment if b = μ . For example, E(X) is the 1st moment of X and σ² is the 2nd
central moment of X.

Definition

Let X be a discrete random variable with pmf p(x) and space X(Ω). Then

M_X(t) = E(e^{tX}) = ∑_{x∈X(Ω)} e^{tx} p(x)

is called the moment generating function of X if it exists. The domain of M_X(t)
is the set of all real numbers t such that e^{tX} has finite expected value.


Example
Suppose X is a random variable with pmf p(x) = 2(1/3)^x , x = 1, 2, 3, ...
Then the moment generating function of X is

M_X(t) = ∑_{x=1}^{∞} e^{tx} · 2(1/3)^x = ∑_{x=1}^{∞} 2(e^t/3)^x .

For M_X(t) to exist, the series must converge, i.e. e^t/3 < 1 . Therefore for t < ln 3 ,

M_X(t) = 2 · (e^t/3) / (1 − e^t/3) = 2e^t / (3 − e^t) .

M_X(t) is undefined if t ≥ ln 3 .

Properties of moment generating function


It can be used for “generating” moments:

M^(r)(0) = d^r M(t)/dt^r |_{t=0} = E(X^r)

Proof
M(t) = E(e^{tX}) = ∑_{x∈X(Ω)} e^{tx} p(x)   ⇒   M(0) = 1

M'(t) = ∑_{x∈X(Ω)} x e^{tx} p(x)   ⇒   M'(0) = ∑_{x∈X(Ω)} x p(x) = E(X)

M''(t) = ∑_{x∈X(Ω)} x² e^{tx} p(x)   ⇒   M''(0) = ∑_{x∈X(Ω)} x² p(x) = E(X²)

and so on …


Example

For the previous example,

M'(t) = [(3 − e^t)(2e^t) − (2e^t)(−e^t)] / (3 − e^t)²   ⇒   μ = M'(0) = (4 + 2)/4 = 3/2

σ² = M''(0) − [M'(0)]² may be difficult to find.

Consider the function R(t) = ln M(t) ,

R'(t) = M'(t)/M(t)   ⇒   R'(0) = M'(0)/M(0) = μ

R''(t) = [M(t)M''(t) − M'(t)M'(t)] / [M(t)]²   ⇒   R''(0) = { M(0)M''(0) − [M'(0)]² } / [M(0)]² = σ²

Therefore M(t) = 2e^t / (3 − e^t)   ⇒   R(t) = ln 2 + t − ln(3 − e^t) ,

R'(t) = 1 + e^t / (3 − e^t) ,   R''(t) = [(3 − e^t)e^t − e^t(−e^t)] / (3 − e^t)²

Hence

μ = R'(0) = 1 + 1/(3 − 1) = 3/2 ,   σ² = R''(0) = (2 + 1)/4 = 3/4 .
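As a quick numerical check, one may approximate E(X) and Var(X) by truncating the infinite sum; a minimal sketch:

```python
# Check μ = 3/2 and σ² = 3/4 for p(x) = 2*(1/3)**x, x = 1, 2, 3, ...
# by truncating the series at a large cutoff (the tail is negligible).
cutoff = 200
pmf = {x: 2 * (1 / 3) ** x for x in range(1, cutoff + 1)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
print(mean, var)   # ≈ 1.5 and ≈ 0.75
```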

Remark

The moment generating function uniquely characterizes the distribution. That is, if X
and Y have the same moment generating function, then they must have the same
distribution.


§ 2.2.5 Bernoulli Trials and the Binomial Distribution

Bernoulli experiment

Possible outcome     X     Probability
Success              1     p
Fail                 0     1 − p

e.g.

Trial                              Success          Failure
Tossing a coin                     Head             Tail
Birth of a child                   Boy              Girl
Pure guess in multiple choice      Correct          Wrong
Randomly choose a voter            Support          Not support
Randomly select a product          Non-defective    Defective

pmf of X :   p(x) = p^x (1 − p)^{1−x} ,   x = 0, 1

We call the distribution of X the “Bernoulli distribution”. The outcome of each
experiment is called a “Bernoulli trial”.

μ = p ,   E(X²) = p ,   σ² = p − p² = p(1 − p) .

Binomial Distribution

Let X be the random variable denoting the number of successes in n Bernoulli
trials. If these n Bernoulli trials

(i) have the same success probability p, and

(ii) are independent, i.e. the success probability of any trial is not affected by the
outcomes of the other trials,

then X is said to have a binomial distribution with n trials and success probability
p . It is denoted as
X ~ b(n, p) .


Example

Let X be the number of boys in a family with four children.

Success : boy
Failure : girl

Value of X             0           1           2           3          4
Outcomes               FFFF        SFFF        SSFF        SSSF       SSSS
                                   FSFF        SFSF        SSFS
                                   FFSF        SFFS        SFSS
                                   FFFS        FSSF        FSSS
                                               FSFS
                                               FFSS
Probability            (1−p)^4     p(1−p)^3    p²(1−p)²    p³(1−p)    p^4
No. of permutations    C(4,0)      C(4,1)      C(4,2)      C(4,3)     C(4,4)

Therefore the pmf of X is given by

p(x) = P(X = x) = C(4, x) p^x (1 − p)^{4−x} ,   x = 0, 1, 2, 3, 4 .

In general, the pmf of X ~ b(n, p) is given by

p(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x} ,   x = 0, 1, 2, ..., n .

Binomial theorem

(a + b)^n = ∑_{i=0}^{n} C(n, i) a^i b^{n−i}

Hence p(x) is the (x + 1)th term in the expansion of (p + (1 − p))^n .


Distribution function:

F(x) = P(X ≤ x) = ∑_{i=0}^{x} C(n, i) p^i (1 − p)^{n−i}

Moment generating function:

M_X(t) = ∑_{x=0}^{n} e^{tx} C(n, x) p^x (1 − p)^{n−x}
       = ∑_{x=0}^{n} C(n, x) (e^t p)^x (1 − p)^{n−x}
       = (e^t p + 1 − p)^n

From this moment generating function one can easily derive

μ = np ,   σ² = np(1 − p) .

Example

An examination paper consists of 50 multiple choice questions, with 5 choices for
each question. A student goes into the examination without knowing a thing, and
tries to answer all the questions by pure guessing. Let X be the number of questions
this student can answer correctly. Then obviously X ~ b(50, 0.2) .

On average, he will get E(X) = 50 × 0.2 = 10 correct answers by pure guessing, and
the corresponding variance is Var(X) = 50 × 0.2 × 0.8 = 8 .

The probability of getting 15 correct answers by pure guessing is

P(X = 15) = p(15) = C(50, 15)(0.2)^15 (0.8)^35 = 0.02992 .

Suppose that two marks will be given for each correct answer, while half a mark will
be deducted for each incorrect answer. Let Y be the total score this student can get.
Then Y = 2 × X + (−0.5) × (50 − X) = 2.5X − 25 .

On average, he will get

E(Y) = E(2.5X − 25) = 2.5E(X) − 25 = 2.5 × 10 − 25 = 0


marks and the corresponding variance is

Var(Y) = Var(2.5X − 25) = (2.5)² Var(X) = (2.5)² × 8 = 50 .

If the passing mark is set to 40, the probability that he will pass the examination is

P(Y ≥ 40) = P(2.5X − 25 ≥ 40)
          = P(X ≥ 26)
          = ∑_{i=26}^{50} C(50, i)(0.2)^i (0.8)^{50−i}
          = 0.000000492
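These binomial probabilities can be checked with a few lines of Python (the helper binom_pmf is an illustrative name):

```python
from math import comb

def binom_pmf(k, n, p):
    """Sketch: P(X = k) for X ~ b(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 50, 0.2
print(binom_pmf(15, n, p))                                 # ≈ 0.02992
print(sum(binom_pmf(i, n, p) for i in range(26, n + 1)))   # ≈ 4.92e-07
```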

Example

Eggs are sold in boxes of six. Each egg, independently of the others, has a
probability 0.2 of being cracked. A shopper requires three boxes of eggs and regards
a box as satisfactory if it contains no more than two cracked eggs.

Let X be the number of cracked eggs in a particular box. Then X ~ b(6, 0.2) .

P(a box is satisfactory) = P(X ≤ 2)
                         = (0.8)^6 + C(6, 1)(0.2)(0.8)^5 + C(6, 2)(0.2)²(0.8)^4 = 0.90112

Let Y be the number of satisfactory boxes in five boxes. Then Y ~ b(5, 0.90112) .

P(at least 3 boxes are satisfactory) = P(Y ≥ 3)
    = C(5, 3)(0.90112)³(0.09888)² + C(5, 4)(0.90112)^4 (0.09888) + (0.90112)^5
    = 0.99171


Properties of the Binomial Distribution

1. When n is equal to 1, b(1, p) is just the Bernoulli distribution.

2. X can be viewed as a sum of independent Bernoulli random variables:

   X = ∑_{i=1}^{n} Y_i ,   Y_i ~ b(1, p) iid.

3. If X ~ b(n, p) , then P(X = k) first increases monotonically and then decreases
   monotonically, reaching its largest value when k is the largest integer less than
   or equal to (n + 1)p .

4. If X ~ b(n, p) , then

   P(X = k + 1) = [p/(1 − p)] × [(n − k)/(k + 1)] × P(X = k) .

§ 2.2.6 Geometric and Negative Binomial Distribution

Geometric Distribution
Suppose we perform a sequence of independent Bernoulli trials with success
probability p. Let X be the number of trials performed until the first success is
obtained. Then X is said to have a Geometric distribution. It is denoted by

X ~ Geometric(p) .

The pmf of X is given by

p(x) = P(X = x) = (1 − p)^{x−1} p ,   x = 1, 2, ...

Distribution function :

F(x) = P(X ≤ x) = ∑_{i=1}^{x} p(i)
     = p ∑_{i=1}^{x} (1 − p)^{i−1}
     = p · [1 − (1 − p)^x] / [1 − (1 − p)]
     = 1 − (1 − p)^x ,   x = 1, 2, ...


Moment generating function :

M_X(t) = ∑_{x=1}^{∞} e^{tx} (1 − p)^{x−1} p
       = ∑_{y=0}^{∞} e^{t(y+1)} (1 − p)^y p
       = e^t p ∑_{y=0}^{∞} [e^t (1 − p)]^y
       = e^t p / [1 − e^t (1 − p)] ,   t < −ln(1 − p)

From this moment generating function, one can easily get

μ = 1/p ,   σ² = (1 − p)/p² .

Example

In the casino game roulette, suppose you bet on the number 00 in every trial. Let X
be the number of games played until you win once; then X ~ Geometric(1/38) .
On average, you will need to play E(X) = 38 games in order to win once. The
probability of no more than 4 games played until your first win is

P(X ≤ 4) = F(4) = 1 − (1 − 1/38)^4 = 0.1012 .
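A short simulation sketch to check these geometric results (the helper games_until_win is an illustrative name):

```python
import random

def games_until_win(p, rng):
    """Sketch: number of independent trials until the first success (geometric)."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng, p, trials = random.Random(1), 1 / 38, 200_000
samples = [games_until_win(p, rng) for _ in range(trials)]
print(sum(samples) / trials)                   # ≈ 38 (mean of Geometric(1/38))
print(sum(x <= 4 for x in samples) / trials)   # ≈ 0.1012 = 1 - (37/38)**4
```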

Example

Often packages that you buy in a store include a card with a picture, or other items
of a set, and you try to collect all of the N possible cards. We would be interested
in the expected number of trials one should make in order to collect a set of the
cards. Define

X_1 = 1 , the number of trials until we see the first new card;

X_i = the number of trials after the (i − 1)th new card until the ith new card,

and assume that the packages are independent with equal chances to contain the N
possible cards.

Then for i > 1 , the distribution of X_i is Geometric with success probability
(N − i + 1)/N and therefore E(X_i) = N/(N − i + 1) . Let W be the number of trials
needed for collecting the whole set of N different cards. Then W = ∑_{i=1}^{N} X_i and
therefore

E(W) = ∑_{i=1}^{N} E(X_i) = ∑_{i=1}^{N} N/(N − i + 1) = N ∑_{i=1}^{N} 1/i .

In particular, if N = 9 , then

E(W) = 9 × (1 + 1/2 + ⋯ + 1/9) = 25.4607 .
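A minimal sketch to compute E(W) = N(1 + 1/2 + ⋯ + 1/N) and check it by simulation (helper names are illustrative):

```python
import random

def expected_trials(n):
    """Sketch: exact E(W) = n * H_n for the coupon collector problem."""
    return n * sum(1 / i for i in range(1, n + 1))

def simulate_trials(n, rng):
    """Draw cards uniformly at random until all n distinct cards have been seen."""
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))
        draws += 1
    return draws

rng, n = random.Random(0), 9
print(expected_trials(n))                                               # 25.4607...
print(sum(simulate_trials(n, rng) for _ in range(100_000)) / 100_000)   # ≈ 25.46
```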

Remarks

1. There is another definition of the Geometric distribution. Let Y be the number
   of failures before the first success. Then obviously Y = X − 1 .

   p(y) = (1 − p)^y p ,   y = 0, 1, 2, ...

   F(y) = 1 − (1 − p)^{y+1} ,   y = 0, 1, 2, ...

   M_Y(t) = p / [1 − e^t (1 − p)] ,   t < −ln(1 − p)

   μ = (1 − p)/p ,   σ² = (1 − p)/p²

2. Consider

   P(X > a + b) = 1 − [1 − (1 − p)^{a+b}] = (1 − p)^a (1 − p)^b = P(X > a) P(X > b)

   Therefore

   P(X > a + b | X > a) = P(X > a + b, X > a) / P(X > a) = P(X > b) .

   Hence, conditional on no success in the first a trials, the probability of no success
   in the next b trials is equal to the unconditional probability of no success in the
   first b trials. This property is called the memoryless property. Among all
   discrete distributions, the geometric distribution is the only one that has the
   memoryless property.


Example

Consider the roulette example. Since the geometric distribution is memoryless,
although you may lose 100 games in a row, the probability of waiting more than 5
games until you win is just the same as if you had not lost the 100
games at all. This dispels the so-called Gambler's fallacy that "cold hands" are
more (less) likely to come up after observing a series of "hot hands".

Negative Binomial Distribution

Suppose we perform a sequence of independent Bernoulli trials with success
probability p. Let X be the number of trials until a total of r successes are
accumulated. (In total there are X trials to produce r successes, and the Xth trial is a
success.) Then X is said to have a negative binomial distribution. It is denoted by

X ~ nb(r, p) .

The pmf of X is given by

p(x) = P(X = x) = C(x − 1, r − 1) p^r (1 − p)^{x−r} ,   x = r, r + 1, r + 2, ...

Negative binomial theorem

1/(1 − a)^r = 1 + ra + [r(r + 1)/2!] a² + ⋯ + [r(r + 1)⋯(r + k − 1)/k!] a^k + ⋯
            = ∑_{k=0}^{∞} C(r + k − 1, k) a^k = ∑_{y=r}^{∞} C(y − 1, r − 1) a^{y−r}

Hence p(x) is the (x − r + 1)th term in the expansion of p^r (1 − (1 − p))^{−r} .

Distribution function :

F(x) = ∑_{i=r}^{x} C(i − 1, r − 1) p^r (1 − p)^{i−r} ,   x = r, r + 1, r + 2, ...


Moment generating function :

M_X(t) = ∑_{x=r}^{∞} e^{tx} C(x − 1, r − 1) p^r (1 − p)^{x−r}
       = e^{tr} p^r ∑_{x=r}^{∞} C(x − 1, r − 1) [e^t (1 − p)]^{x−r}
       = [ e^t p / (1 − e^t (1 − p)) ]^r ,   t < −ln(1 − p)

From this moment generating function one can easily derive

μ = r/p ,   σ² = r(1 − p)/p² .

Example

Fermat and Pascal are sitting in a cafe in Paris and decide to play the simplest of all
games, flipping a coin. If the coin comes up heads, Fermat gets a point. If the coin
comes up tails, Pascal gets a point. The first to get 10 points wins the total pot worth
100 Francs. But then a strange thing happens. Fermat is winning 7 points to 6,
when he receives an urgent message that a friend is sick, and he must rush to his
home town of Toulouse immediately. Of course Pascal understands, but later, in
correspondence, the problem arises: how should the 100 Francs be divided?

Ans: Let X be the number of additional games they need to play so that Fermat
can get 3 more points. Then X is the number of trials until 3 heads (successes)
are obtained. Therefore X is a negative binomial random variable with r = 3
and p = 0.5 . The probability mass function is given by

p(x) = C(x − 1, 2)(0.5)³(1 − 0.5)^{x−3} = C(x − 1, 2)(0.5)^x ,   x = 3, 4, 5, ...

For Fermat to win the game, Pascal should get fewer than 4 points before Fermat
gets 3 points, i.e. X must be less than 7. Therefore

P(Fermat wins) = P(X < 7)
    = C(2, 2)(0.5)³ + C(3, 2)(0.5)^4 + C(4, 2)(0.5)^5 + C(5, 2)(0.5)^6
    = 0.65625


Hence Fermat should receive 65.625 Francs, while Pascal should receive
34.375 Francs.
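A minimal sketch to reproduce the division of the pot (the helper prob_first_to_r is an illustrative name):

```python
from math import comb

def prob_first_to_r(r, s, p=0.5):
    """Sketch: probability that the player needing r more points wins before the
    opponent collects s points, i.e. the r-th success arrives within r + s - 1 trials."""
    return sum(comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)
               for x in range(r, r + s))

p_fermat = prob_first_to_r(3, 4)   # Fermat needs 3 points, Pascal needs 4
print(p_fermat, 100 * p_fermat)    # 0.65625, 65.625 Francs
```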

For the general problem of points, if one side needs m more points and the other side
needs n more points, then the probability that the first side wins the game is

∑_{i=m}^{m+n−1} C(i − 1, m − 1) p^m (1 − p)^{i−m} .

Remarks
1. If r is equal to 1, then the negative binomial distribution becomes the
   geometric distribution, i.e. nb(1, p) ≡ Geometric(p) .

2. There is another definition of the negative binomial distribution. Let Y be the
   number of failures before the rth success. Then obviously Y = X − r .

   p(y) = C(y + r − 1, r − 1) p^r (1 − p)^y ,   y = 0, 1, 2, ...

   M_Y(t) = [ p / (1 − e^t (1 − p)) ]^r ,   t < −ln(1 − p)

   μ = r(1 − p)/p ,   σ² = r(1 − p)/p²

3. A technique known as inverse binomial sampling is useful in sampling
   biological populations. If the proportion of individuals possessing a certain
   characteristic is p and we sample until we see r such individuals, then the
   number of individuals sampled is a negative binomial random variable.


§ 2.2.7 Hypergeometric Distribution

Definition

Suppose we have N objects with m objects as type I and ( N − m ) objects as type II.
A sample of n objects is randomly drawn without replacement from the N objects.
Let X be the number of type I objects in the sample. Then X is said to have a
Hypergeometric distribution. It is denoted by

Hypergeometric(N , m, n ) .

The pmf of X is given by

p(x) = P(X = x) = C(m, x) C(N − m, n − x) / C(N, n) ,   max(n − (N − m), 0) ≤ x ≤ min(n, m)

μ = np ,   σ² = [(N − n)/(N − 1)] np(1 − p)

where p = m/N is the proportion of type I objects in the population of N objects.

Remark

The hypergeometric distribution can be regarded as a finite population counterpart
of the binomial distribution.

Sampling with replacement → Binomial distribution

Sampling without replacement → Hypergeometric distribution

The variance of a hypergeometric distribution differs from that of a binomial distribution
by the multiplier (N − n)/(N − 1), and this expression is therefore called the finite
population correction factor.


Example

Let X be the number of 2’s in a hand of 13 cards drawn randomly from a deck of
52 cards. Then X has a hypergeometric distribution with N = 52 , m = 4 , n = 13 .

E(X) = 13 × (4/52) = 1 ,   Var(X) = [(52 − 13)/(52 − 1)] × 13 × (4/52) × (1 − 4/52) = 0.7059

P(X = 3) = C(4, 3) C(48, 10) / C(52, 13) = 0.04120
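A minimal sketch to reproduce these values (the helper hypergeom_pmf is an illustrative name):

```python
from math import comb

def hypergeom_pmf(x, N, m, n):
    """Sketch: P(X = x) when x type-I objects appear in n draws without replacement
    from N objects of which m are type I."""
    return comb(m, x) * comb(N - m, n - x) / comb(N, n)

N, m, n = 52, 4, 13
p = m / N
print(n * p)                                   # mean = 1
print((N - n) / (N - 1) * n * p * (1 - p))     # variance ≈ 0.7059
print(hypergeom_pmf(3, N, m, n))               # ≈ 0.04120
```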

Example

(Capture and Re-capture Experiment)

To estimate the population size of a specific kind of animal in a certain region, e.g.
number of fishes in a lake, ecologists often perform the following procedures.

1. Catch m fishes from the lake.
2. Tag these m fishes with a certain marker and release them back to the lake.
3. After a certain period of time, catch n fishes from the lake.
4. Count the number of tagged fishes found in this new sample, denoted X.

It follows that X is a hypergeometric random variable such that

P(X = i) = C(m, i) C(N − m, n − i) / C(N, n) = P_i(N)   (say)

Consider P_i(N) / P_i(N − 1) = (N − m)(N − n) / [N(N − m − n + i)] . This ratio is greater than 1 if and only if

(N − m)(N − n) ≥ N(N − m − n + i)   ⇔   N ≤ mn/i .

Hence for fixed m, n, i, the value P_i(N) is first increasing, and then decreasing,
and reaches its maximum value at the largest integer not exceeding mn/i .
Therefore a reasonable estimate of the population size N is

⌊mn/i⌋

where i is the number of tagged fishes we found in the new sample.

This kind of estimation is known as maximum likelihood estimation.
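A minimal sketch to check numerically that P_i(N) is maximized at ⌊mn/i⌋; the capture-recapture counts used below (m = 100, n = 50, i = 3) are hypothetical:

```python
from math import comb, floor

def likelihood(N, m, n, i):
    """P_i(N) = C(m, i) C(N - m, n - i) / C(N, n), viewed as a function of N."""
    return comb(m, i) * comb(N - m, n - i) / comb(N, n)

m, n, i = 100, 50, 3                     # hypothetical capture-recapture counts
candidates = range(m + n - i, 20001)     # N is at least m + (n - i)
N_hat = max(candidates, key=lambda N: likelihood(N, m, n, i))
print(N_hat, floor(m * n / i))           # both print 1666
```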



§ 2.2.8 Poisson Distribution

Definition

A random variable X , taking on one of the values 0, 1, 2, ... , is said to have the
Poisson distribution with parameter θ (θ > 0) if

p(x) = P(X = x) = e^{−θ} θ^x / x! ,   x = 0, 1, 2, ...

It is denoted as X ~ ℘(θ) .

Distribution function :

F(x) = ∑_{i=0}^{x} e^{−θ} θ^i / i! .

Moment generating function :

M_X(t) = ∑_{x=0}^{∞} e^{tx} e^{−θ} θ^x / x!
       = e^{−θ} ∑_{x=0}^{∞} (e^t θ)^x / x!
       = e^{−θ} e^{θe^t} = exp(θ(e^t − 1))

From this one can easily derive

μ = σ² = θ .

Computational Formula

If X ~ ℘(θ) , then

P(X = i + 1) = [θ/(i + 1)] P(X = i) .


Definition

Let E be an event whose occurrences in time obey the following postulates:

1. Independence – The numbers of times E occurs in non-overlapping time
   intervals are independent.

2. Lack of clustering – The probability of two or more occurrences in a
   sufficiently short interval is essentially zero.

3. Rate – The probability of exactly one occurrence in a sufficiently short time
   interval of length h is approximately λh , i.e. directly proportional to h.

Denote by N(t) the number of occurrences of E within the time interval [0, t]. Then
{ N(t), t ≥ 0 } is said to be a Poisson Process and the probabilistic behaviour of
N(t) can be modelled by the Poisson distribution with parameter θ = λt .

The formal derivation of the distribution of N(t) requires the knowledge of
differential equations and is omitted here. The following provides an informal
justification of the result.

First we may partition the time interval into n subintervals each with length h = t/n .
For n sufficiently large, i.e. h sufficiently short,

P(one occurrence in a subinterval) = λh = λt/n   (by postulate 3),

P(more than one occurrence in a subinterval) = 0   (by postulate 2).

Hence each subinterval can be regarded as a Bernoulli trial with success
probability λt/n . From postulate 1, the occurrences in the subintervals are
independent. Therefore we have n independent Bernoulli trials each with success
probability λt/n . Hence

N(t) ~ b(n, λt/n)

p(x) = P(N(t) = x) = C(n, x)(λt/n)^x (1 − λt/n)^{n−x} ,   x = 0, 1, 2, ..., n


Since we need the subintervals to be sufficiently small, we should consider the
limit of the above expression as n → ∞ .

p(x) = lim_{n→∞} C(n, x)(λt/n)^x (1 − λt/n)^{n−x}

     = lim_{n→∞} [n(n − 1)⋯(n − x + 1)/x!] · [(λt)^x / n^x] · (1 − λt/n)^{n−x}

     = [(λt)^x / x!] lim_{n→∞} (1 − 1/n)⋯(1 − (x − 1)/n)(1 − λt/n)^n (1 − λt/n)^{−x}

     = e^{−λt} (λt)^x / x!

The pmf of N(t) is then given by

p(x) = P(N(t) = x) = e^{−λt} (λt)^x / x! ,   x = 0, 1, 2, ...

Therefore N(t) has a Poisson distribution with parameter λt , i.e. N(t) ~ ℘(λt) .

Remarks

1. Note that E(N(t)) = λt ⇒ E(N(t)/t) = λ . Therefore λ can be interpreted as the
   average number of occurrences per unit time interval. The value of λ depends
   on the time unit used.

2. According to the above derivation, the Poisson distribution can be used to
   approximate a binomial distribution when n is large and p is small.

Poisson Approximation to Binomial

When n is large and p is small such that np is bounded, the binomial
distribution b(n, p) can be approximated by ℘(np), i.e.

p(x) = C(n, x) p^x (1 − p)^{n−x} ≈ e^{−np} (np)^x / x! .

This approximation should be successful if n ≥ 100 and np ≤ 10 .


Example

Average number of phone calls per hour = 3   (time unit = hour, λ = 3)

Assume that the number of phone calls is a Poisson process.

P(2 phone calls in one hour) = P(N(1) = 2) = e^{−3} 3² / 2! = 0.224

P(2 or more phone calls in one hour) = P(N(1) ≥ 2) = 1 − P(N(1) = 0) − P(N(1) = 1)
                                     = 1 − e^{−3} 3⁰/0! − e^{−3} 3¹/1! = 0.801

What is P(less than 8 phone calls in 2 hours) ?

Let N(2) be the number of phone calls in 2 hours; then λt = 6 , N(2) ~ ℘(6) .

P(N(2) < 8) = ∑_{y=0}^{7} e^{−6} 6^y / y! = 0.744 .
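A minimal sketch to check these Poisson probabilities (the helper poisson_pmf is an illustrative name):

```python
from math import exp, factorial

def poisson_pmf(x, theta):
    """Sketch: P(X = x) for X ~ Poisson(theta)."""
    return exp(-theta) * theta ** x / factorial(x)

print(poisson_pmf(2, 3))                             # ≈ 0.224
print(1 - poisson_pmf(0, 3) - poisson_pmf(1, 3))     # ≈ 0.801
print(sum(poisson_pmf(y, 6) for y in range(8)))      # ≈ 0.744
```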

Example

In a certain manufacturing process in which glass items are being produced,
defects or bubbles occur, occasionally rendering the piece undesirable for
marketing. It is known that on average 1 in every 1000 of these items produced
has one or more bubbles. What is the probability that a random sample of 8000
will yield no more than 7 items possessing bubbles?

Let X be the number of items possessing bubbles. Then

X ~ b(8000, 1/1000) ≅ ℘(8) .

P(X ≤ 7) = ∑_{i=0}^{7} C(8000, i)(0.001)^i (0.999)^{8000−i}
         ≈ ∑_{i=0}^{7} e^{−8} 8^i / i! = 0.4530
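A minimal sketch comparing the exact binomial sum with its Poisson approximation:

```python
from math import comb, exp, factorial

n, p, theta = 8000, 0.001, 8

exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(8))
approx = sum(exp(-theta) * theta ** i / factorial(i) for i in range(8))
print(exact, approx)   # both ≈ 0.45; the Poisson value matches the 0.4530 above
```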


Example

Suppose that the total number of goals in a soccer match of the England Premier
League follows the Poisson distribution with θ = 2.9 , i.e. on average there are 2.9
goals per match. Determine (a) the probability that there will be more than 2 goals;
(b) the probability that there will be an even number of goals (zero is counted as even).

Let X be the number of goals in a particular match. Then X follows ℘(2.9) , i.e.

P(X = r) = e^{−2.9} (2.9)^r / r! ,   r = 0, 1, 2, ...

(a) P(X > 2) = 1 − P(X ≤ 2)
             = 1 − (P(X = 0) + P(X = 1) + P(X = 2))
             = 1 − e^{−2.9} (1 + 2.9 + (2.9)²/2!) = 0.5540

(b) P(X is even) = P(X = 0) + P(X = 2) + P(X = 4) + ⋯
                 = e^{−2.9} (1 + 2.9²/2! + 2.9⁴/4! + ⋯)

From Taylor's expansion, we have

e^{2.9} = 1 + 2.9 + 2.9²/2! + 2.9³/3! + 2.9⁴/4! + ⋯

e^{−2.9} = 1 − 2.9 + 2.9²/2! − 2.9³/3! + 2.9⁴/4! − ⋯

Therefore e^{2.9} + e^{−2.9} = 2(1 + 2.9²/2! + 2.9⁴/4! + ⋯) .

Hence P(X is even) = e^{−2.9} × (e^{2.9} + e^{−2.9})/2 = (1 + e^{−5.8})/2 = 0.5015 .
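A minimal sketch verifying both answers by direct summation:

```python
from math import exp, factorial

theta = 2.9

def pmf(r):
    return exp(-theta) * theta ** r / factorial(r)

print(1 - sum(pmf(r) for r in range(3)))        # (a) ≈ 0.5540
print(sum(pmf(r) for r in range(0, 100, 2)))    # (b) ≈ 0.5015
print((1 + exp(-2 * theta)) / 2)                # closed form for (b), same value
```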


Example

Random variables that are commonly modelled by the Poisson distribution:

1. Number of misprints on a page of a book.

2. Number of people in a community living to 100 years of age.

3. Number of wrong telephone numbers that are dialled in a day.

4. Number of packages of dog biscuits sold in a particular store each day.

5. Number of customers entering a post office on a given day.

6. Number of vacancies occurring during a year in the Supreme Court.

7. Number of α-particles discharged in a fixed period of time from some
   radioactive material.

8. Number of earthquakes occurring during some fixed time span.

9. Number of wars per year.

10. Number of electrons emitted from a heated cathode during a fixed time period.

11. Number of deaths in a given period of time of the policyholders of a life
    insurance company.

12. Number of flaws in a certain type of drapery material.
