Chapter 2.1 Random Variables

Definition
A random variable X is a function that assigns a real number X(ω) to each outcome ω in the sample space Ω, i.e. X : Ω → ℝ.
Remark
Always keep in mind that X is a function rather than a number. The value of X
depends on the outcome. We write X = x to represent the event
{ω ∈ Ω | X(ω) = x} and X ≤ x to represent the event {ω ∈ Ω | X(ω) ≤ x}.
Example
Let X be the number of aces in a hand of three cards drawn randomly from a deck
of 52 cards. Denote A as an ace card and N as a non-ace card. Then
X(AAA) = 3
X(AAN) = X(ANA) = X(NAA) = 2
X(ANN) = X(NAN) = X(NNA) = 1
X(NNN) = 0

P(X = 0) = P({NNN}) = 0.78262
P(X = 1) = P({ANN, NAN, NNA}) = 0.06805 × 3 = 0.20415
P(X = 2) = P({AAN, ANA, NAA}) = 0.00434 × 3 = 0.01302
P(X = 3) = P({AAA}) = 0.00018
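As a quick numerical check (a sketch of mine, not part of the original notes), these probabilities can be reproduced in Python by counting unordered hands, so the single-order factors 0.06805 and 0.00434 appear already multiplied by the number of orders:

from math import comb

# Count unordered 3-card hands containing x aces: choose x of the 4 aces
# and 3 - x of the 48 non-aces, out of all C(52, 3) possible hands.
for x in range(4):
    p = comb(4, x) * comb(48, 3 - x) / comb(52, 3)
    print(f"P(X = {x}) = {p:.5f}")
# P(X = 0) = 0.78262, P(X = 1) = 0.20416,
# P(X = 2) = 0.01303, P(X = 3) = 0.00018
# (matching the values above up to rounding of the intermediate factors)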
Example
The annual income ω of a randomly selected citizen has a sample space Ω = [0, ∞ ) .
Suppose the annual income is taxable if it exceeds c. Let X be the taxable income.
Then the space of X is also [0, ∞ ) and X : Ω → [0, ∞ ) such that
X(ω) = 0 if ω ≤ c,  and  X(ω) = ω − c if ω > c.
Definition
A random variable X is said to be discrete if its space X(Ω) is countable.

Definition
If X is a discrete random variable, then the function
p(x) = P(X = x),   x ∈ X(Ω),
is called the probability mass function (pmf) of X.
Example
For the previous example of card drawing, the pmf of X is

x       0         1         2         3
p(x)    0.78262   0.20415   0.01302   0.00018
Since pmf is defined through probability, we have the following conditions for p to
be a valid pmf:
1. p(x) ≥ 0 for x ∈ X(Ω);  p(x) = 0 for x ∉ X(Ω)
2. ∑_{x∈X(Ω)} p(x) = 1
3. P(X ∈ A) = ∑_{x∈A} p(x), where A ⊂ X(Ω)
Example
Let X(Ω) = {1, 2, 3} with pmf p(1) = 1/6, p(2) = 1/3, p(3) = 1/2.

[Figure: bar chart of the pmf p(x), with bars of heights 1/6, 1/3, 1/2 at x = 1, 2, 3.]

Checking the conditions:
1. p(x) ≥ 0 for all x ∈ X(Ω);
2. ∑_{x=1}^{3} p(x) = 1/6 + 1/3 + 1/2 = 1;
3. e.g. P(X ≤ 2) = p(1) + p(2) = 1/2.
Definition
The cumulative distribution function (cdf) of X is defined as
F(x) = P(X ≤ x) = ∑_{t≤x} p(t),   −∞ < x < ∞.
Example
F(1) = P(X ≤ 1) = P(X = 1) = 1/6
F(1.5) = P(X ≤ 1.5) = P(X = 1) = 1/6 = F(1.566) = F(1.99999) = ...
F(2) = P(X ≤ 2) = p(1) + p(2) = 1/2
As can be seen, the cdf of a discrete random variable is a step function, with
p(x) as the size of the jump at each possible value x.
[Figure: step-function cdf F(x), with jumps at x = 1, 2, 3 up to the values 1/6, 1/2, 1.]
Properties of a cdf
1. F(x) is nondecreasing, i.e. if a ≤ b, then F(a) ≤ F(b).
2. F(−∞) = lim_{b→−∞} F(b) = 0
3. F(∞) = lim_{b→∞} F(b) = 1
4. F is right continuous. That is, for any b and any decreasing sequence
   {bₙ, n ≥ 1} that converges to b, lim_{n→∞} F(bₙ) = F(b).
Example
Consider the following two games. In each game, three fair dice will be rolled.
Game 1: If all the dice face up with the same number, then you win $24;
        otherwise you lose $1.
Game 2: You win $1, $2, or $3 according to whether one, two, or three dice
        face up as six, respectively. If no die faces up as six, then you
        lose $1.
To decide on a betting strategy, one may consider the amount one will win (or
lose) in the long run. First we need to evaluate the probabilities of winning
and losing in each
game. Let X, Y be the amounts of money you will win in one single trial of game
1 and game 2 respectively. A negative value means you lose money.
For game 1,
P(X = 24) = P(same number on three dice) = 6 × (1/6) × (1/6) × (1/6) = 1/36
P(X = −1) = 1 − 1/36 = 35/36
For game 2,
P(Y = 1) = 3 × (1/6) × (5/6) × (5/6) = 25/72,   P(Y = 2) = 3 × (5/6) × (1/6) × (1/6) = 5/72
P(Y = 3) = (1/6) × (1/6) × (1/6) = 1/216,   P(Y = −1) = 1 − 25/72 − 5/72 − 1/216 = 125/216
Suppose we play game 1 for 36000 times. Since the relative frequency is a good
estimate of the probability when the number of trials is large, in approximately
1000 trials we will win $24 and in 35000 trials we will lose $1. So in these
36000 trials of game 1, we win
24 × 1000 + (−1) × 35000 = −11000
i.e. approximately we will lose $11000 in 36000 trials of game 1. The average
amount we win in each trial is
−11000 / 36000 = −11/36
This is the long term average of gain if we play game 1. Indeed it can be
calculated as
−11/36 = [24 × 1000 + (−1) × 35000] / 36000
       = 24 × (1/36) + (−1) × (35/36)
       = 24 × P(X = 24) + (−1) × P(X = −1)
Similarly, the long term average gain in game 2 is
1 × P(Y = 1) + 2 × P(Y = 2) + 3 × P(Y = 3) + (−1) × P(Y = −1)
= 1 × (25/72) + 2 × (5/72) + 3 × (1/216) + (−1) × (125/216) = −17/216 > −11/36
Therefore game 2 is better than game 1 in terms of long term average gain.
However, since in the long run you will lose money in both games, the best
strategy is not to gamble at all.
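As a sanity check (my own illustration, not from the notes), a short Monte Carlo simulation reproduces both long-run averages:

import random

def game1(dice):
    # $24 if all three dice show the same number, otherwise lose $1
    return 24 if dice[0] == dice[1] == dice[2] else -1

def game2(dice):
    # win $1/$2/$3 for one/two/three sixes, otherwise lose $1
    sixes = dice.count(6)
    return sixes if sixes > 0 else -1

random.seed(0)
n = 1_000_000
total1 = total2 = 0
for _ in range(n):
    dice = [random.randint(1, 6) for _ in range(3)]
    total1 += game1(dice)
    total2 += game2(dice)

print(total1 / n, -11 / 36)   # both around -0.3056
print(total2 / n, -17 / 216)  # both around -0.0787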
Definition
The expected value (or expectation) of a discrete random variable X with pmf
p(x) is defined as
E(X) = ∑_{x∈X(Ω)} x p(x)
provided that the summation exists. In general, for any function g, the expected
value of g(X) is
E(g(X)) = ∑_{x∈X(Ω)} g(x) p(x).
e.g. E(X²) = ∑_{x∈X(Ω)} x² p(x),  E(log X) = ∑_{x∈X(Ω)} (log x) p(x), …, etc.
Properties
1. If c is a constant, then E(c) = c.
   E(c) = ∑_{x∈X(Ω)} c p(x) = c ∑_{x∈X(Ω)} p(x) = c
2. If c is a constant, then E(c g(X)) = c E(g(X)).
   E(c g(X)) = ∑_{x∈X(Ω)} c g(x) p(x) = c ∑_{x∈X(Ω)} g(x) p(x) = c E(g(X))
3. If c₁, c₂, ..., cₙ are constants, then E(∑_{i=1}^{n} cᵢ gᵢ(X)) = ∑_{i=1}^{n} cᵢ E(gᵢ(X)).
   e.g. E(5X + 3X²) = 5E(X) + 3E(X²).
4. X(ω) ≥ Y(ω) for all ω ∈ Ω ⇒ E(X) ≥ E(Y)
5. E(|X|) ≥ |E(X)|
Example
For game 1 above,
E(X²) = (24)² × (1/36) + (−1)² × (35/36) = 611/36 = 16.9722
E[(X − (−11/36))²] = (24 + 11/36)² × (1/36) + (−1 + 11/36)² × (35/36) = 16.8788
Alternatively,
E[(X − (−11/36))²] = E(X²) + (11/36)² + 2 × (11/36) × E(X)
                   = 16.9722 + 0.09336 + 2(0.3056)(−0.3056) = 16.8788
The value E[(X − E(X))²] can tell us the variation of our gains among long term
trials of game 1.
Definition
If X is a discrete random variable with pmf p(x) and space X(Ω), then E(X) is
called the (population) mean of X (of the distribution) and is usually denoted by
μ. It is a measure of the central location of the random variable X.
Furthermore, E[(X − μ)²] = ∑_{x∈X(Ω)} (x − μ)² p(x) is called the (population)
variance of X and is usually denoted by σ² or Var(X). Its square root σ is called
the standard deviation. It is a measure of the dispersion of X.
Properties
1. Var(X) = E(X²) − μ²
   Var(X) = E[(X − μ)²]
          = E(X² − 2μX + μ²)
          = E(X²) − 2μE(X) + μ²
          = E(X²) − μ²
2. If a and c are constants, then Var(aX + c) = a² Var(X).
   Var(aX + c) = E{[(aX + c) − E(aX + c)]²}
               = E{[aX − aE(X)]²}
               = E[a²(X − μ)²]
               = a² E[(X − μ)²] = a² Var(X)
Example
For game 2 above,
E(Y²) = 1 × (25/72) + 4 × (5/72) + 9 × (1/216) + 1 × (125/216) = 1.2454
Var(Y) = E(Y²) − μ² = 1.2454 − (−17/216)² = 1.2392
σ_Y = √1.2392 = 1.1132
Therefore the variation of gain from game 2 is much less than that from game 1.
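Both means and variances can be checked exactly with rational arithmetic (a sketch; the dictionaries encode the pmfs derived above, and the variable names are mine):

from fractions import Fraction as F

pmf_X = {24: F(1, 36), -1: F(35, 36)}                               # game 1
pmf_Y = {1: F(25, 72), 2: F(5, 72), 3: F(1, 216), -1: F(125, 216)}  # game 2

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(mean(pmf_X), float(variance(pmf_X)))  # -11/36, 16.8788...
print(mean(pmf_Y), float(variance(pmf_Y)))  # -17/216, 1.2392...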
Example
An individual who owns the ice cream concession at a sporting event can expect to
net $600 on the sale of ice cream if the day is sunny, but only $300 if it is cloudy,
and $100 if it rains. The respective probabilities for those events are 0.6, 0.3 and
0.1.
Let $X be his net profit in a particular day. Then the distribution of X can be
described by the following pmf:

x       600    300    100
p(x)    0.6    0.3    0.1

Mean profit:
E(X) = 600 × 0.6 + 300 × 0.3 + 100 × 0.1 = 460
Variance of profit:
E(X²) = (600)² × 0.6 + (300)² × 0.3 + (100)² × 0.1 = 244000
Var(X) = E(X²) − [E(X)]² = 244000 − (460)² = 32400
Markov Inequality
If X is a non-negative random variable with finite mean, then for any constant c > 0,
P(X ≥ c) ≤ E(X)/c.
Proof
E(X) = ∑_{x∈X(Ω)} x p(x)
     = ∑_{x≥c} x p(x) + ∑_{x<c} x p(x)
     ≥ ∑_{x≥c} x p(x) ≥ c ∑_{x≥c} p(x) = c P(X ≥ c)
Chebyshev's Inequality
If the random variable X has a finite mean μ and finite variance σ², then for any
constant k > 0,
P(|X − μ| ≥ kσ) ≤ 1/k².
e.g. The probability that X deviates from the mean by more than 2 standard
deviations is at most 0.25.
Proof
By Markov's inequality,
P(|X − μ| ≥ kσ) = P((X − μ)² ≥ k²σ²) ≤ E[(X − μ)²] / (k²σ²) = σ² / (k²σ²) = 1/k²
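Both inequalities are easy to test numerically. The sketch below uses a small made-up pmf (uniform on {0, 1, 2, 3}, my own choice, not from the notes) so that neither bound is vacuous:

pmf = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sd = var ** 0.5

# Markov: X is non-negative, so P(X >= c) <= E(X)/c
c = 3.0
print(sum(p for x, p in pmf.items() if x >= c), "<=", mu / c)   # 0.25 <= 0.5

# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2
k = 1.2
print(sum(p for x, p in pmf.items() if abs(x - mu) >= k * sd),
      "<=", 1 / k ** 2)                                         # 0.5 <= 0.694...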
Definition
The moment generating function (mgf) of X is defined as
M_X(t) = E(e^{tX}) = ∑_{x∈X(Ω)} e^{tx} p(x),
provided that the sum exists.
Example
Suppose X is a random variable with pmf p(x) = 2(1/3)^x, x = 1, 2, 3, ...
Then the moment generating function of X is
M_X(t) = ∑_{x=1}^{∞} e^{tx} · 2(1/3)^x = ∑_{x=1}^{∞} 2(e^t/3)^x.
For M_X(t) to exist, the series must converge, i.e. e^t/3 < 1. Therefore for t < ln 3,
M_X(t) = 2 · (e^t/3) / (1 − e^t/3) = 2e^t / (3 − e^t).
M_X(t) is undefined if t ≥ ln 3.
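A quick numerical check of the closed form against a truncated version of the defining sum (a sketch; 200 terms suffice since the summands decay geometrically when t < ln 3):

from math import exp, log

def mgf_series(t, terms=200):
    # partial sum of E(e^{tX}) = sum over x of e^{tx} * 2 * (1/3)^x
    return sum(exp(t * x) * 2 * (1 / 3) ** x for x in range(1, terms + 1))

def mgf_closed(t):
    return 2 * exp(t) / (3 - exp(t))

for t in (-1.0, 0.0, 0.5, 1.0):
    assert t < log(3)
    print(t, mgf_series(t), mgf_closed(t))   # the two columns agree
# At t = 0 both give M(0) = 1, as any mgf must.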
Property
If M_X(t) exists in a neighbourhood of 0, then
M^{(r)}(0) = dʳM(t)/dtʳ |_{t=0} = E(Xʳ).
Proof
M(t) = E(e^{tX}) = ∑_{x∈X(Ω)} e^{tx} p(x)   ⇒ M(0) = 1
M'(t) = ∑_{x∈X(Ω)} x e^{tx} p(x)            ⇒ M'(0) = ∑_{x∈X(Ω)} x p(x) = E(X)
M''(t) = ∑_{x∈X(Ω)} x² e^{tx} p(x)          ⇒ M''(0) = ∑_{x∈X(Ω)} x² p(x) = E(X²)
and so on …
Example
For the previous example, M(t) = 2e^t / (3 − e^t). Then
M'(t) = [(3 − e^t)(2e^t) − (2e^t)(−e^t)] / (3 − e^t)²   ⇒ μ = M'(0) = (4 + 2)/4 = 3/2
The calculation is often simpler with R(t) = ln M(t):
R'(t) = M'(t)/M(t)   ⇒ R'(0) = M'(0)/M(0) = μ
Also R''(t) = [M''(t)M(t) − M'(t)²] / M(t)², so R''(0) = E(X²) − μ² = σ².
Here M(t) = 2e^t / (3 − e^t) ⇒ R(t) = ln 2 + t − ln(3 − e^t), and
R'(t) = 1 + e^t / (3 − e^t),   R''(t) = [(3 − e^t)e^t − e^t(−e^t)] / (3 − e^t)² = 3e^t / (3 − e^t)².
Hence
μ = R'(0) = 1 + 1/(3 − 1) = 3/2,   σ² = R''(0) = (2 + 1)/4 = 3/4.
Remark
When it exists, the mgf determines the distribution of X uniquely, so it can be
used to identify distributions as well as to compute moments.
Bernoulli experiment
A Bernoulli experiment is a random experiment with only two possible outcomes,
"success" and "failure", e.g. tossing a coin. Let p = P(success) and define X = 1
if the outcome is a success and X = 0 otherwise.
pmf of X:  p(x) = p^x (1 − p)^{1−x},   x = 0, 1
μ = p,  E(X²) = p,  σ² = p − p² = p(1 − p).
Binomial Distribution
Suppose a Bernoulli experiment with success probability p is repeated in n trials
which are
(i) identical, i.e. each trial has the same success probability p;
(ii) independent, i.e. the success probability of any trial is not affected by the
     outcomes of the other trials.
If X is the total number of successes in the n trials,
then X is said to have a binomial distribution with n trials and success probability
p. It is denoted as
X ~ b(n, p).
Example
For n = 4 trials, write S for a success and F for a failure. The outcomes giving
each value of X, with their probabilities, are:

Value of X            0          1          2          3          4
Outcomes              FFFF       SFFF       SSFF       SSSF       SSSS
                                 FSFF       SFSF       SSFS
                                 FFSF       SFFS       SFSS
                                 FFFS       FSSF       FSSS
                                            FSFS
                                            FFSS
Probability           (1−p)⁴     p(1−p)³    p²(1−p)²   p³(1−p)    p⁴
No. of permutations   C(4,0)     C(4,1)     C(4,2)     C(4,3)     C(4,4)

Hence
p(x) = P(X = x) = C(4, x) p^x (1 − p)^{4−x},   x = 0, 1, 2, 3, 4.
In general, for X ~ b(n, p),
p(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x},   x = 0, 1, 2, ..., n,
where C(n, x) denotes the binomial coefficient "n choose x".
Binomial theorem
(a + b)^n = ∑_{i=0}^{n} C(n, i) a^i b^{n−i}
Taking a = p and b = 1 − p shows that ∑_{x=0}^{n} p(x) = (p + 1 − p)^n = 1, so the
pmf is valid.
Distribution function:
F(x) = P(X ≤ x) = ∑_{i=0}^{x} C(n, i) p^i (1 − p)^{n−i}
Moment generating function:
M_X(t) = ∑_{x=0}^{n} e^{tx} C(n, x) p^x (1 − p)^{n−x}
       = ∑_{x=0}^{n} C(n, x) (pe^t)^x (1 − p)^{n−x}
       = (pe^t + 1 − p)^n
μ = np,  σ² = np(1 − p).
Example
Suppose a student randomly guesses the answers to 50 multiple-choice questions,
and each guess is correct with probability 0.2, independently of the others. Let
X ~ b(50, 0.2) be the number of correct answers. The probability of exactly 15
correct answers is
P(X = 15) = p(15) = C(50, 15)(0.2)¹⁵(0.8)³⁵ = 0.02992.
Suppose that two marks will be given for each correct answer, while half a mark
will be deducted for each incorrect answer. Let Y be the total score this student
can get. Then Y = 2 × X + (−0.5) × (50 − X) = 2.5X − 25.
If the passing mark is set to 40, the probability that he will pass the examination is
P(Y ≥ 40) = P(2.5X − 25 ≥ 40) = P(X ≥ 26) = ∑_{i=26}^{50} C(50, i)(0.2)^i(0.8)^{50−i}
= 0.000000492
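Both binomial probabilities in this example can be verified with a few lines of Python (a sketch; binom_pmf is my own helper):

from math import comb

def binom_pmf(n, x, p):
    # P(X = x) for X ~ b(n, p)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(binom_pmf(50, 15, 0.2))                              # 0.02992...
print(sum(binom_pmf(50, i, 0.2) for i in range(26, 51)))   # 4.92e-07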
Example
Eggs are sold in boxes of six. Each egg has, independently of the others, a
probability 0.2 of being cracked. A shopper regards a box as satisfactory if it
contains not more than two cracked eggs. Let X ~ b(6, 0.2) be the number of
cracked eggs in a box. Then
P(a box is satisfactory) = P(X ≤ 2)
= (0.8)⁶ + C(6, 1)(0.2)(0.8)⁵ + C(6, 2)(0.2)²(0.8)⁴ = 0.90112
The shopper requires three satisfactory boxes. If five boxes are bought, the
number of satisfactory boxes follows b(5, 0.90112), so
P(at least 3 of the 5 boxes are satisfactory)
= C(5, 3)(0.90112)³(0.09888)² + C(5, 4)(0.90112)⁴(0.09888) + (0.90112)⁵
= 0.99171
Remarks
1. A binomial random variable can be represented as a sum of iid Bernoulli
   random variables:
   X = ∑_{i=1}^{n} Yᵢ,   where Yᵢ ~ b(1, p) independently (iid).
2. If X ~ b(n, p), then the pmf can be computed recursively:
   P(X = k + 1) = [p/(1 − p)] × [(n − k)/(k + 1)] × P(X = k).
Geometric Distribution
Suppose we perform a sequence of independent Bernoulli trials with success
probability p. Let X be the number of trials performed until the first success is
obtained. Then X is said to have a Geometric distribution. It is denoted by
X ~ Geometric(p).
p(x) = P(X = x) = (1 − p)^{x−1} p,   x = 1, 2, ...
Distribution function:
F(x) = P(X ≤ x) = ∑_{i=1}^{x} p(i)
     = p ∑_{i=1}^{x} (1 − p)^{i−1}
     = p × [1 − (1 − p)^x] / [1 − (1 − p)]
     = 1 − (1 − p)^x,   x = 1, 2, ...
Moment generating function:
M_X(t) = ∑_{x=1}^{∞} e^{tx} (1 − p)^{x−1} p
       = ∑_{y=0}^{∞} e^{t(y+1)} (1 − p)^y p
       = e^t p ∑_{y=0}^{∞} [e^t(1 − p)]^y
       = e^t p / [1 − e^t(1 − p)],   t < −ln(1 − p)
μ = 1/p,   σ² = (1 − p)/p².
Example
In the casino game roulette, suppose you bet on the number 00 in every trial. Let X
be the number of games played until you win once; then X ~ Geometric(1/38).
On average, you will need to play E(X) = 38 games in order to win once. The
probability of needing no more than 4 games until your first win is
P(X ≤ 4) = F(4) = 1 − (1 − 1/38)⁴ = 0.1012.
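A two-line check of these numbers (a sketch, assuming the Geometric(1/38) model above):

p = 1 / 38
print(1 / p)              # E(X) = 38 games on average
print(1 - (1 - p) ** 4)   # P(X <= 4) = F(4) = 0.1012...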
Example
Often packages that you buy in a store include a card with a picture, or other items
of a set, and you try to collect all of the N possible cards. We would be interested
in the expected number of trials one should make in order to collect a set of the
cards. Define Wᵢ as the number of additional packages bought to obtain the i-th
distinct card after i − 1 distinct cards have been collected, and let
W = W₁ + W₂ + ⋯ + W_N
be the total number of packages needed, and assume that the packages are
independent with equal chances to contain the N possible cards. When i − 1
distinct cards are in hand, each package yields a new card with probability
(N − i + 1)/N, so Wᵢ ~ Geometric((N − i + 1)/N) and E(Wᵢ) = N/(N − i + 1). Hence
E(W) = ∑_{i=1}^{N} N/(N − i + 1) = N × (1 + 1/2 + ⋯ + 1/N).
In particular, if N = 9, then
E(W) = 9 × (1 + 1/2 + ⋯ + 1/9) = 25.4607.
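The formula can be cross-checked by simulation (a sketch; the function names are mine):

import random

def expected_w(n):
    # E(W) = n * (1 + 1/2 + ... + 1/n)
    return n * sum(1 / i for i in range(1, n + 1))

def simulate_w(n):
    # draw random cards until all n distinct types have been seen
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        draws += 1
    return draws

random.seed(1)
print(expected_w(9))                                          # 25.4607...
print(sum(simulate_w(9) for _ in range(100_000)) / 100_000)   # close to 25.46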
Remarks
1. Sometimes the Geometric distribution is alternatively defined for Y = X − 1,
   the number of failures before the first success. In that case
   p(y) = (1 − p)^y p,   y = 0, 1, 2, ...
   F(y) = 1 − (1 − p)^{y+1},   y = 0, 1, 2, ...
   M_Y(t) = p / [1 − e^t(1 − p)],   t < −ln(1 − p)
   μ = (1 − p)/p,   σ² = (1 − p)/p²
2. Consider
   P(X > a + b) = 1 − [1 − (1 − p)^{a+b}] = (1 − p)^{a+b} = (1 − p)^a (1 − p)^b
                = P(X > a) P(X > b)
   Therefore
   P(X > a + b | X > a) = P(X > a + b, X > a) / P(X > a) = P(X > b).
   This is known as the memoryless property of the Geometric distribution.
Negative Binomial Distribution
Suppose we perform a sequence of independent Bernoulli trials with success
probability p. Let X be the number of trials performed until r successes are
obtained. Then X is said to have a negative binomial distribution. It is denoted by
X ~ nb(r, p).
p(x) = P(X = x) = C(x − 1, r − 1) p^r (1 − p)^{x−r},   x = r, r + 1, r + 2, ...
The pmf sums to one by the negative binomial series expansion: for |a| < 1,
1/(1 − a)^r = 1 + ra + [r(r + 1)/2!] a² + ⋯ + [r(r + 1)⋯(r + k − 1)/k!] a^k + ⋯
            = ∑_{k=0}^{∞} C(r + k − 1, k) a^k = ∑_{y=r}^{∞} C(y − 1, r − 1) a^{y−r}
Distribution function:
F(x) = ∑_{i=r}^{x} C(i − 1, r − 1) p^r (1 − p)^{i−r},   x = r, r + 1, r + 2, ...
Moment generating function:
M_X(t) = ∑_{x=r}^{∞} e^{tx} C(x − 1, r − 1) p^r (1 − p)^{x−r}
       = e^{tr} p^r ∑_{x=r}^{∞} C(x − 1, r − 1) [e^t(1 − p)]^{x−r}
       = {e^t p / [1 − e^t(1 − p)]}^r,   t < −ln(1 − p)
where the last step uses the negative binomial series above with a = e^t(1 − p).
μ = r/p,   σ² = r(1 − p)/p².
Example
Fermat and Pascal are sitting in a cafe in Paris and decide to play the simplest of all
games, flipping a coin. If the coin comes up head, Fermat gets a point. If the coin
comes up tail, Pascal gets a point. The first to get 10 points wins the total pot worth
100 Francs. But then a strange thing happens. Fermat is winning 7 points to 6,
when he receives an urgent message that a friend is sick, and he must rush to his
home town of Toulouse immediately. Of course Pascal understands, but later, in
correspondence, the problem arises: how should the 100 Francs be divided?
Ans: Let X be the number of additional games they need to play so that Fermat
     can get 3 more points. Then X is the number of trials until 3 heads (successes)
     are obtained. Therefore X is a negative binomial random variable with r = 3
     and p = 0.5 . The probability mass function is given by
p(x) = C(x − 1, 3 − 1)(0.5)³(1 − 0.5)^{x−3} = C(x − 1, 2)(0.5)^x,   x = 3, 4, 5, ...
For Fermat to win the game, Pascal should get less than 4 points before he
gets 3 points, i.e. X must be less than 7. Therefore
P(X ≤ 6) = C(2, 2)(0.5)³ + C(3, 2)(0.5)⁴ + C(4, 2)(0.5)⁵ + C(5, 2)(0.5)⁶
         = 0.65625
      Hence Fermat should receive 65.625 Francs, while Pascal should receive
      34.375 Francs.
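The division of the stakes follows from a one-line computation (a sketch, using the nb(3, 0.5) pmf above):

from math import comb

# Fermat wins iff his 3rd head arrives within 6 further flips: P(X <= 6)
p_win = sum(comb(x - 1, 2) * 0.5 ** x for x in range(3, 7))
print(p_win)         # 0.65625
print(100 * p_win)   # Fermat's share of the pot: 65.625 Francs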
For the general problem of points in which one side still needs m points to win
and the other side still needs n points, the probability that the first side (which
wins each round with probability p) takes the pot is
∑_{i=m}^{m+n−1} C(i − 1, m − 1) p^m (1 − p)^{i−m}.
Remarks
1. If r is equal to 1, then the negative binomial distribution becomes the
   geometric distribution, i.e. nb(1, p) ≡ Geometric(p).
2. Sometimes the negative binomial distribution is alternatively defined for
   Y = X − r, the number of failures before the r-th success. In that case
   p(y) = C(y + r − 1, r − 1) p^r (1 − p)^y,   y = 0, 1, 2, ...
   M_Y(t) = {p / [1 − e^t(1 − p)]}^r,   t < −ln(1 − p)
   μ = r(1 − p)/p,   σ² = r(1 − p)/p²
Definition
Suppose we have N objects with m objects of type I and (N − m) objects of type II.
A sample of n objects is randomly drawn without replacement from the N objects.
Let X be the number of type I objects in the sample. Then X is said to have a
Hypergeometric distribution. It is denoted by X ~ Hypergeometric(N, m, n).
p(x) = P(X = x) = C(m, x) C(N − m, n − x) / C(N, n),   max(n − (N − m), 0) ≤ x ≤ min(n, m)
μ = np,   σ² = [(N − n)/(N − 1)] np(1 − p),   where p = m/N.
Remark
The factor (N − n)/(N − 1) in the variance is called the finite population
correction; when N is much larger than n, it is close to 1 and the
Hypergeometric(N, m, n) distribution is well approximated by the binomial b(n, m/N).
Example
Let X be the number of 2's in a hand of 13 cards drawn randomly from a deck of
52 cards. Then X has a hypergeometric distribution with N = 52, m = 4, n = 13.
E(X) = 13 × (4/52) = 1,
Var(X) = [(52 − 13)/(52 − 1)] × 13 × (4/52) × (1 − 4/52) = 0.7059
P(X = 3) = C(4, 3) C(48, 10) / C(52, 13) = 0.04120
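These values can be confirmed directly from the hypergeometric pmf (a sketch):

from math import comb

N, m, n = 52, 4, 13
pmf = [comb(m, x) * comb(N - m, n - x) / comb(N, n) for x in range(m + 1)]
mu = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mu) ** 2 * p for x, p in enumerate(pmf))
print(mu, var)   # 1.0 and 0.7059 (the (N-n)/(N-1) correction is visible here)
print(pmf[3])    # P(X = 3) = 0.04120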
Example
To estimate the population size of a specific kind of animal in a certain region,
e.g. the number of fish in a lake, ecologists often perform the following
procedure: catch m animals, tag them, and release them; later, catch n animals and
count the number X of tagged animals among them. If the population size is N, then
P(X = i) = C(m, i) C(N − m, n − i) / C(N, n) = Pᵢ(N)   (say)
A reasonable estimate of N is the value that maximizes Pᵢ(N). Now
Pᵢ(N) ≥ Pᵢ(N − 1) ⇔ (N − m)(N − n) ≥ N(N − m − n + i) ⇔ N ≤ mn/i.
Hence for fixed m, n, i, the value Pᵢ(N) is first increasing, and then decreasing,
and reaches its maximum value at the largest integral value not exceeding mn/i.
Therefore a reasonable estimation of the population size N is ⌊mn/i⌋.
Definition
A random variable X, taking on one of the values 0, 1, 2, ..., is said to have the
Poisson distribution with parameter θ (θ > 0) if
p(x) = P(X = x) = e^{−θ} θ^x / x!,   x = 0, 1, 2, ...
It is denoted as X ~ ℘(θ).
Distribution function:
F(x) = ∑_{i=0}^{x} e^{−θ} θ^i / i!.
Moment generating function:
M_X(t) = ∑_{x=0}^{∞} e^{tx} e^{−θ} θ^x / x!
       = e^{−θ} ∑_{x=0}^{∞} (e^t θ)^x / x!
       = e^{−θ} e^{θe^t} = e^{θ(e^t − 1)}
μ = σ² = θ.
Computational Formula
If X ~ ℘(θ), then
P(X = i + 1) = [θ/(i + 1)] P(X = i).
Definition
Let N(t) be the number of occurrences of some event in a time interval of length t,
and suppose the following postulates hold:
1. the numbers of occurrences in non-overlapping subintervals are independent;
2. the probability of two or more occurrences in a sufficiently short subinterval
   is negligible;
3. the probability of exactly one occurrence in a sufficiently short subinterval
   of length h is approximately λh.
First we may partition the time interval into n subintervals each with length h = t/n.
Then
P(one occurrence in a subinterval) = λh = λt/n   (by postulate 3),
and, by postulates 1 and 2, N(t) approximately counts n independent success/failure
trials, so that
p(x) = P(N(t) = x) ≈ C(n, x) (λt/n)^x (1 − λt/n)^{n−x},   x = 0, 1, 2, ..., n
Letting n → ∞,
p(x) = lim_{n→∞} C(n, x) (λt/n)^x (1 − λt/n)^{n−x}
     = [(λt)^x / x!] lim_{n→∞} (1 − 1/n)(1 − 2/n)⋯(1 − (x − 1)/n) (1 − λt/n)^n (1 − λt/n)^{−x}
     = e^{−λt} (λt)^x / x!
Hence
p(x) = P(N(t) = x) = e^{−λt} (λt)^x / x!,   x = 0, 1, 2, ...
i.e. N(t) ~ ℘(λt).
Remarks
1. Note that E(N(t)) = λt ⇒ E(N(t)/t) = λ. Therefore λ can be interpreted as the
   average number of occurrences per unit time interval. The value of λ depends
   on the time unit used.
2. When n is large and p is small such that np is bounded, the binomial
   distribution b(n, p) can be approximated by ℘(np), i.e.
   p(x) = C(n, x) p^x (1 − p)^{n−x} ≈ e^{−np}(np)^x / x!.
Example
Suppose phone calls arrive according to a Poisson process at an average rate of
λ = 3 calls per hour, so that N(t) ~ ℘(3t) for a period of t hours. Then
P(2 phone calls in one hour) = P(N(1) = 2) = e^{−3} 3² / 2! = 0.224
P(at least 2 phone calls in one hour) = P(N(1) ≥ 2) = 1 − e^{−3} 3⁰/0! − e^{−3} 3¹/1! = 0.801
P(fewer than 8 phone calls in two hours) = P(N(2) < 8) = ∑_{y=0}^{7} e^{−6} 6^y / y! = 0.744.
Example
Suppose an event occurs with probability 1/1000 in each of 8000 independent
trials, and let X be the number of occurrences. Then
X ~ b(8000, 1/1000) ≅ ℘(8), and
P(X ≤ 7) = ∑_{i=0}^{7} C(8000, i)(0.001)^i(0.999)^{8000−i}
         ≈ ∑_{i=0}^{7} e^{−8} 8^i / i! = 0.4530
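The quality of the approximation can be inspected directly (a sketch comparing the exact binomial sum with the Poisson sum):

from math import comb, exp, factorial

n, p = 8000, 0.001
exact = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(8))
approx = sum(exp(-n * p) * (n * p) ** i / factorial(i) for i in range(8))
print(exact, approx)   # both approximately 0.4530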
Example
Suppose that the total number of goals in a soccer match of the England Premier
League follows the Poisson distribution with θ = 2.9, i.e. on average there are 2.9
goals per match. Determine (a) the probability that there will be more than 2 goals;
(b) the probability that there will be an even number of goals (zero is counted as even).
Let X be the number of goals in a particular match. Then X follows ℘(2.9), i.e.
P(X = r) = e^{−2.9}(2.9)^r / r!,   r = 0, 1, 2, ...
(a) P(X > 2) = 1 − P(X ≤ 2)
             = 1 − [P(X = 0) + P(X = 1) + P(X = 2)]
             = 1 − e^{−2.9} [1 + 2.9 + (2.9)²/2!] = 0.5540
(b) P(X is even) = P(X = 0) + P(X = 2) + P(X = 4) + ⋯
                 = e^{−2.9} [1 + (2.9)²/2! + (2.9)⁴/4! + ⋯]
                 = e^{−2.9} cosh(2.9)
                 = e^{−2.9} (e^{2.9} + e^{−2.9})/2 = (1 + e^{−5.8})/2 = 0.5015
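Part (b) can be double-checked numerically (a sketch; 60 terms of the series are far more than enough):

from math import exp, factorial

theta = 2.9
p_even_series = sum(exp(-theta) * theta ** r / factorial(r)
                    for r in range(0, 60, 2))
p_even_closed = (1 + exp(-2 * theta)) / 2
print(p_even_series, p_even_closed)   # both approximately 0.5015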
Example
Quantities that are often modelled by the Poisson distribution include the number
of particles emitted from a radioactive material during a fixed time period, and
the number of electrons emitted from a heated cathode during a fixed time period.