Prob Review
Rob Hall
 September 9, 2010
What is Probability?
   [Venn diagram: sets A and B, showing the intersection A ∩ B, the union A ∪ B, and the complement A^C.]
Properties of Set Operations
    - Commutativity: A ∪ B = B ∪ A
    - Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C
    - Likewise for intersection.
    - Proof? Follows easily from the commutative and associative properties of “and” and “or” in the definitions.
    - Distributive properties: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
    - Proof? Show each side of the equality contains the other.
    - De Morgan’s laws: see book.
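These identities can be spot-checked with Python’s built-in `set` type; a quick sketch (the concrete sets A, B, C are arbitrary choices, not from the slides):

```python
# Spot-check the associative, distributive, and De Morgan laws on small sets.
A = {1, 2, 3}
B = {2, 3, 4}
C = {3, 4, 5}
U = A | B | C  # treat the union as the universe, so complements make sense

assert A | (B | C) == (A | B) | C          # associativity of union
assert A & (B & C) == (A & B) & C          # associativity of intersection
assert A & (B | C) == (A & B) | (A & C)    # distributivity
assert A | (B & C) == (A | B) & (A | C)    # distributivity
assert U - (A | B) == (U - A) & (U - B)    # De Morgan
assert U - (A & B) == (U - A) | (U - B)    # De Morgan
```

Of course a check on one example is not a proof; the proof is the element-wise argument given above.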
Disjointness and Partitions
   Remark: we may take the event space to be the power set of the sample space (for a discrete sample space; more later).
Probability Terminology
   Remark: we may take the event space to be the power set of the sample space (for a discrete sample space; more later). E.g., rolling a fair die:
   Ω = {1, 2, 3, 4, 5, 6}
   F = 2^Ω = {{1}, {2}, . . . , {1, 2}, . . . , {1, 2, 3}, . . . , {1, 2, 3, 4, 5, 6}, {}}
   P({1}) = P({2}) = . . . = 1/6 (i.e., a fair die)
   P({1, 3, 5}) = 1/2 (i.e., half chance of an odd result)
   P({1, 2, 3, 4, 5, 6}) = 1 (i.e., the result is “almost surely” one of the faces).
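The event space F = 2^Ω can be enumerated explicitly for this example; a small sketch, with P(E) = |E|/|Ω| for the fair die:

```python
from itertools import chain, combinations

# Enumerate the event space F = 2^Omega for a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def P(event):
    # Uniform (fair-die) measure: P(E) = |E| / |Omega|.
    return len(event) / len(omega)

assert len(events) == 2 ** 6             # |2^Omega| = 64 events
assert P(frozenset({1, 3, 5})) == 0.5    # half chance of an odd result
assert P(frozenset(omega)) == 1.0        # the sure event
assert P(frozenset()) == 0.0             # the impossible event
```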
Axioms for Probability
    [Venn diagram: overlapping sets A and B with intersection A ∩ B.]
P(A ∪ B) – General Unions
   For events that may overlap: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Conditional Probabilities

       P(A|B) = P(A ∩ B) / P(B)

   [Venn diagram: overlapping sets A and B with intersection A ∩ B.]
   Interpretation: the outcome is definitely in B, so treat B as the entire sample space and find the probability that the outcome is also in A.
   This rapidly leads to: P(A|B)P(B) = P(A ∩ B), aka the “chain rule for probabilities.” (why?)
   When A_1, A_2, . . . are a partition of Ω:

       P(B) = Σ_{i=1}^∞ P(B ∩ A_i) = Σ_{i=1}^∞ P(B|A_i) P(A_i)
Conditional Probability Example
   Suppose we throw a fair die:
   Ω = {1, 2, 3, 4, 5, 6}, F = 2^Ω, P({i}) = 1/6, i = 1 . . . 6
   A = {1, 2, 3, 4} i.e., “result is less than 5,”
   B = {1, 3, 5} i.e., “result is odd.”

       P(A) = 2/3
       P(B) = 1/2

       P(A|B) = P(A ∩ B) / P(B) = P({1, 3}) / P(B) = 2/3
       P(B|A) = P(A ∩ B) / P(A) = 1/2

       P(B|A) = P(A|B) P(B) / P(A)

   Often this is written as:

       P(B_i |A) = P(A|B_i ) P(B_i ) / Σ_i P(A|B_i ) P(B_i )

   where the B_i are a partition of Ω (note the bottom is just the law of total probability).
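The worked die example can be verified by direct enumeration; a sketch using exact rational arithmetic:

```python
from fractions import Fraction

# Verify the worked die example by counting outcomes.
omega = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3, 4}   # "result is less than 5"
B = {1, 3, 5}      # "result is odd"

def P(E):
    # Uniform (fair-die) measure.
    return Fraction(len(E), len(omega))

P_A_given_B = P(A & B) / P(B)
P_B_given_A = P(A & B) / P(A)

assert P(A) == Fraction(2, 3)
assert P(B) == Fraction(1, 2)
assert P_A_given_B == Fraction(2, 3)
assert P_B_given_A == Fraction(1, 2)
# Bayes' rule: P(B|A) = P(A|B) P(B) / P(A)
assert P_B_given_A == P_A_given_B * P(B) / P(A)
```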
Independence
   [Figure: two CDFs F_X(x); left, a discrete step-function CDF on x ∈ {0, . . . , 4}; right, a continuous CDF on x ∈ [−2, 2].]
Discrete Distributions
   [Figure: a discrete probability mass function over x ∈ [−4, 4], with values up to about 0.2.]
Multiple Random Variables
       E(aX + bY + c) = aE(X) + bE(Y) + c
Characteristics of Distributions
   Questions:
    1. E[EX] = Σ_x (EX) f_X(x) = (EX) Σ_x f_X(x) = EX
    2. E(X · Y) = E(X)E(Y)?
       Not in general, although when f_{X,Y} = f_X f_Y:

           E(X · Y) = Σ_{x,y} x y f_X(x) f_Y(y) = Σ_x x f_X(x) Σ_y y f_Y(y) = EX · EY
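Both cases can be checked exactly for dice; a sketch: independent fair dice give E(XY) = E(X)E(Y), while the dependent case Y = X does not.

```python
from fractions import Fraction as F

# Independent case: two fair dice, joint pmf factors as (1/6)(1/6).
faces = range(1, 7)
p = F(1, 6)
EX = sum(p * x for x in faces)
EXY_indep = sum(p * p * x * y for x in faces for y in faces)
assert EXY_indep == EX * EX          # E(XY) = E(X) E(Y) under independence

# Dependent case: Y = X (perfectly correlated), so E(XY) = E(X^2).
EX2 = sum(p * x * x for x in faces)
assert EX2 != EX * EX                # E(XY) != E(X) E(Y) in general
```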
Characteristics of Distributions

       Var(X) = E(X − EX)²

   This may give an idea of how “spread out” a distribution is.
   A useful alternate form is:

       E(X − EX)² = E[X² − 2X E(X) + (EX)²]
                  = E(X²) − 2E(X)E(X) + (EX)²
                  = E(X²) − (EX)²

       Var(X + Y) = E(X − EX + Y − EY)²
                  = E(X − EX)² + E(Y − EY)² + 2 E[(X − EX)(Y − EY)]
                  = Var(X) + Var(Y) + 2 Cov(X, Y)

   (why?)
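The variance-of-a-sum identity holds exactly for sample moments too; a numerical sketch with deliberately correlated X and Y:

```python
import random

# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on a sample.
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(50_000)]
ys = [x + random.gauss(0, 0.5) for x in xs]   # Y depends on X, so Cov > 0

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return mean([(a - m) ** 2 for a in v])

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return mean([(a - mu) * (b - mv) for a, b in zip(u, v)])

lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov(xs, ys)
assert abs(lhs - rhs) < 1e-8  # the identity is exact, up to float round-off
```

Expanding the square E(X − EX + Y − EY)² term by term is exactly the “why?” asked above.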
Putting it all together

       E(X̄n) = E[(1/n) Σ_{i=1}^n X_i] = (1/n) Σ_{i=1}^n E(X_i) = (1/n) · nµ = µ

       Var(X̄n) = Var((1/n) Σ_{i=1}^n X_i) = (1/n²) · nσ² = σ²/n
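These two facts can be checked by simulation; a sketch for the mean of n = 10 fair-die rolls, where µ = 3.5 and σ² = 35/12:

```python
import random

# Empirically check E(X_bar_n) = mu and Var(X_bar_n) = sigma^2 / n
# for the average of n = 10 fair-die rolls.
random.seed(2)
n, trials = 10, 100_000
means = [sum(random.randint(1, 6) for _ in range(n)) / n
         for _ in range(trials)]

mu, sigma2 = 3.5, 35 / 12       # mean and variance of one fair-die roll
avg = sum(means) / trials
var = sum((m - avg) ** 2 for m in means) / trials
assert abs(avg - mu) < 0.01            # E(X_bar_n) ~ mu
assert abs(var - sigma2 / n) < 0.01    # Var(X_bar_n) ~ sigma^2 / n
```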
Entropy of a Distribution
   Entropy gives the mean depth in the tree (= mean number of bits).
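The entropy referred to here can be computed directly; a small sketch (the tree-depth interpretation is exact for dyadic probabilities, i.e., powers of 1/2):

```python
import math

# Entropy of a discrete distribution, H(p) = -sum_i p_i log2 p_i, in bits.
def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Uniform over 2^k outcomes: exactly k bits, the depth of a balanced
# binary code tree over those outcomes.
assert entropy([0.25] * 4) == 2.0
assert entropy([1.0]) == 0.0                         # a sure outcome: 0 bits
assert abs(entropy([0.5, 0.25, 0.25]) - 1.5) < 1e-12  # mean code depth 1.5
```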
Law of Large Numbers (LLN)
   Recall our variable X̄n = (1/n) Σ_{i=1}^n X_i.
   We may wonder about its behavior as n → ∞.
   The weak law of large numbers:

       P(|X̄n − µ| ≥ ε) ≤ σ²/(nε²) → 0

   for any fixed ε, as n → ∞.
   In English: choose ε and a probability that |X̄n − µ| < ε; I can find you an n so your probability is achieved.
   The strong law of large numbers:

       P(lim_{n→∞} X̄n = µ) = 1
Central Limit Theorem (CLT)
   The distribution of X̄n also converges weakly to a Gaussian,

       lim_{n→∞} F_{X̄n}(x) = Φ(√n (x − µ) / σ)
         Simulated n dice rolls and took average, 5000 times:
   [Figure: density histograms of the average (labeled h) for n = 1, 2, 10, 75; the distribution concentrates around 3.5 as n grows.]
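The simulation described above is easy to reproduce; a sketch for the n = 75 panel, comparing the empirical spread of the 5000 averages to the CLT prediction σ/√n (with σ² = 35/12 for a fair die):

```python
import math
import random

# Simulate n dice rolls, take the average, repeat 5000 times (as on the
# slide), and compare the spread to the CLT prediction sigma / sqrt(n).
random.seed(3)
n, trials = 75, 5000
means = [sum(random.randint(1, 6) for _ in range(n)) / n
         for _ in range(trials)]

avg = sum(means) / trials
sd = math.sqrt(sum((m - avg) ** 2 for m in means) / trials)
predicted_sd = math.sqrt(35 / 12) / math.sqrt(n)   # sigma / sqrt(n)
assert abs(avg - 3.5) < 0.05                       # centered at mu = 3.5
assert abs(sd - predicted_sd) < 0.05 * predicted_sd  # spread matches CLT
```

A histogram of `means` would reproduce the bell-shaped n = 75 panel of the figure.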