18.600: Lecture 13
Lectures 1-12 Review
     Scott Sheffield
          MIT
Outline
   Counting tricks and basic principles of probability
   Discrete random variables
Selected counting tricks

▶ Break the choice of one of the items to be counted into a sequence of stages, so that one always has the same number of choices to make at each stage. Then the total count becomes a product of the number of choices available at each stage.
▶ Overcount by a fixed factor.
▶ If you have $n$ elements you wish to divide into $r$ distinct piles of sizes $n_1, n_2, \ldots, n_r$, how many ways are there to do that?
▶ Answer: $\binom{n}{n_1, n_2, \ldots, n_r} := \frac{n!}{n_1! n_2! \cdots n_r!}$.
▶ How many sequences $a_1, \ldots, a_k$ of non-negative integers satisfy $a_1 + a_2 + \ldots + a_k = n$?
▶ Answer: $\binom{n+k-1}{n}$. Represent a partition by $k-1$ bars and $n$ stars, i.e., a string of $n$ stars separated into $k$ groups by $k-1$ bars.
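A minimal Python sketch (with arbitrarily chosen values of n, r, and k, not taken from the lecture) that checks both counting formulas against brute-force enumeration:

```python
# Check the multinomial coefficient and the stars-and-bars count by brute force.
from math import factorial, comb
from itertools import product

def multinomial(n, sizes):
    """n! / (n_1! n_2! ... n_r!): ways to split n labeled items into piles of the given sizes."""
    assert sum(sizes) == n
    out = factorial(n)
    for s in sizes:
        out //= factorial(s)
    return out

# Brute force: assign each of n = 6 labeled items a pile label and keep the
# assignments whose pile sizes come out to (3, 2, 1).
n, sizes = 6, (3, 2, 1)
brute = sum(
    1
    for assignment in product(range(len(sizes)), repeat=n)
    if tuple(assignment.count(i) for i in range(len(sizes))) == sizes
)
assert brute == multinomial(n, sizes) == 60

# Stars and bars: the number of (a_1, ..., a_k) with a_i >= 0 summing to n is C(n+k-1, n).
n, k = 5, 3
brute = sum(1 for a in product(range(n + 1), repeat=k) if sum(a) == n)
assert brute == comb(n + k - 1, n) == 21
print("counting identities check out")
```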
Axioms of probability

▶ Have a set $S$ called the sample space.
▶ $P(A) \in [0, 1]$ for all (measurable) $A \subseteq S$.
▶ $P(S) = 1$.
▶ Finite additivity: $P(A \cup B) = P(A) + P(B)$ if $A \cap B = \emptyset$.
▶ Countable additivity: $P(\cup_{i=1}^{\infty} E_i) = \sum_{i=1}^{\infty} P(E_i)$ if $E_i \cap E_j = \emptyset$ for each pair $i$ and $j$.
Consequences of axioms

▶ $P(A^c) = 1 - P(A)$
▶ $A \subseteq B$ implies $P(A) \leq P(B)$
▶ $P(A \cup B) = P(A) + P(B) - P(AB)$
▶ $P(AB) \leq P(A)$
Inclusion-exclusion identity

▶ Observe $P(A \cup B) = P(A) + P(B) - P(AB)$.
▶ Also, $P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG)$.
▶ More generally,
$$P(\cup_{i=1}^{n} E_i) = \sum_{i=1}^{n} P(E_i) - \sum_{i_1 < i_2} P(E_{i_1} E_{i_2}) + \ldots + (-1)^{r+1} \sum_{i_1 < i_2 < \ldots < i_r} P(E_{i_1} E_{i_2} \ldots E_{i_r}) + \ldots + (-1)^{n+1} P(E_1 E_2 \ldots E_n).$$
▶ The notation $\sum_{i_1 < i_2 < \ldots < i_r}$ means a sum over all of the $\binom{n}{r}$ subsets of size $r$ of the set $\{1, 2, \ldots, n\}$.
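A small sketch (my own illustration; the uniform finite sample space and the four random events are arbitrary choices) verifying the general identity by summing over index subsets and comparing with the union computed directly:

```python
# Verify inclusion-exclusion for four events on a uniform finite sample space.
from itertools import combinations
from fractions import Fraction
import random

random.seed(0)
S = range(30)                               # finite sample space, uniform measure
P = lambda A: Fraction(len(A), len(S))      # P(A) = |A| / |S|

events = [frozenset(random.sample(S, 12)) for _ in range(4)]   # arbitrary events
n = len(events)

# Right side: sum over nonempty index subsets of (-1)^(r+1) P(E_{i1} ... E_{ir}).
rhs = Fraction(0)
for r in range(1, n + 1):
    for idx in combinations(range(n), r):
        inter = frozenset(S)
        for i in idx:
            inter &= events[i]
        rhs += (-1) ** (r + 1) * P(inter)

# Left side: probability of the union, computed directly.
lhs = P(frozenset().union(*events))
assert lhs == rhs
print("P(union) =", lhs)
```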
Famous hat problem

▶ $n$ people toss hats into a bin, randomly shuffle, return one hat to each person. Find the probability nobody gets own hat.
▶ Inclusion-exclusion. Let $E_i$ be the event that the $i$th person gets own hat.
▶ What is $P(E_{i_1} E_{i_2} \ldots E_{i_r})$?
▶ Answer: $\frac{(n-r)!}{n!}$.
▶ There are $\binom{n}{r}$ terms like that in the inclusion-exclusion sum. What is $\binom{n}{r} \frac{(n-r)!}{n!}$?
▶ Answer: $\frac{1}{r!}$.
▶ $P(\cup_{i=1}^{n} E_i) = 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \ldots \pm \frac{1}{n!}$
▶ $1 - P(\cup_{i=1}^{n} E_i) = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \ldots \pm \frac{1}{n!} \approx 1/e \approx .36788$
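A quick sketch (not part of the slides) that computes the exact no-fixed-point probability by enumerating permutations and compares it with the $1/e$ limit:

```python
# Exact derangement probability for small n versus the 1/e limit.
from itertools import permutations
from math import e, factorial

def p_no_fixed_point(n):
    """Probability that a uniformly random permutation of n hats has no fixed point."""
    derangements = sum(
        1 for perm in permutations(range(n)) if all(perm[i] != i for i in range(n))
    )
    return derangements / factorial(n)

for n in range(2, 9):
    print(n, round(p_no_fixed_point(n), 6))
print("1/e =", round(1 / e, 6))   # ~0.367879; the n = 8 value already agrees to ~5 decimals
```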
Conditional probability

▶ Definition: $P(E|F) = P(EF)/P(F)$.
▶ Call $P(E|F)$ the "conditional probability of E given F" or "probability of E conditioned on F".
▶ Nice fact: $P(E_1 E_2 E_3 \ldots E_n) = P(E_1) P(E_2|E_1) P(E_3|E_1 E_2) \ldots P(E_n|E_1 \ldots E_{n-1})$
▶ Useful when we think about multi-step experiments.
▶ For example, let $E_i$ be the event that the $i$th person gets own hat in the $n$-hat shuffle problem.
Dividing probability into two cases

$$P(E) = P(EF) + P(EF^c) = P(E|F)P(F) + P(E|F^c)P(F^c)$$

▶ In words: want to know the probability of $E$. There are two scenarios $F$ and $F^c$. If I know the probabilities of the two scenarios and the probability of $E$ conditioned on each scenario, I can work out the probability of $E$.
Bayes' theorem

▶ Bayes' theorem/law/rule states the following: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$.
▶ Follows from the definition of conditional probability: $P(AB) = P(B)P(A|B) = P(A)P(B|A)$.
▶ Tells how to update estimate of probability of $A$ when new evidence restricts your sample space to $B$.
▶ So $P(A|B)$ is $\frac{P(B|A)}{P(B)}$ times $P(A)$.
▶ Ratio $\frac{P(B|A)}{P(B)}$ determines "how compelling new evidence is."
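A hedged numerical illustration (the condition/test numbers below are invented, not from the lecture) of the update rule, with $P(B)$ computed from the two-scenario decomposition:

```python
# Bayes' rule P(A|B) = P(B|A) P(A) / P(B), with P(B) = P(B|A)P(A) + P(B|A^c)P(A^c).
def bayes(p_A, p_B_given_A, p_B_given_Ac):
    p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # total probability
    return p_B_given_A * p_A / p_B

# A = "has condition", B = "test is positive" (illustrative numbers only).
posterior = bayes(p_A=0.01, p_B_given_A=0.95, p_B_given_Ac=0.05)
print(round(posterior, 4))   # ~0.161: the evidence multiplies P(A) by P(B|A)/P(B), about 16
```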
P(·|F) is a probability measure

▶ We can check the probability axioms: $0 \leq P(E|F) \leq 1$, $P(S|F) = 1$, and $P(\cup E_i | F) = \sum P(E_i|F)$, if $i$ ranges over a countable set and the $E_i$ are disjoint.
▶ The probability measure $P(\cdot|F)$ is related to $P(\cdot)$.
▶ To get the former from the latter, we set probabilities of elements outside of $F$ to zero and multiply probabilities of events inside of $F$ by $1/P(F)$.
▶ $P(\cdot)$ is the "prior" probability measure and $P(\cdot|F)$ is the "posterior" measure (revised after discovering that $F$ occurs).
Independence

▶ Say $E$ and $F$ are independent if $P(EF) = P(E)P(F)$.
▶ Equivalent statement: $P(E|F) = P(E)$. Also equivalent: $P(F|E) = P(F)$.
Independence of multiple events

▶ Say $E_1, \ldots, E_n$ are independent if for each $\{i_1, i_2, \ldots, i_k\} \subseteq \{1, 2, \ldots, n\}$ we have $P(E_{i_1} E_{i_2} \ldots E_{i_k}) = P(E_{i_1}) P(E_{i_2}) \ldots P(E_{i_k})$.
▶ In other words, the product rule works.
▶ Independence implies $P(E_1 E_2 E_3 | E_4 E_5 E_6) = \frac{P(E_1)P(E_2)P(E_3)P(E_4)P(E_5)P(E_6)}{P(E_4)P(E_5)P(E_6)} = P(E_1 E_2 E_3)$, and other similar statements.
▶ Does pairwise independence imply independence?
▶ No. Consider these three events: first coin heads, second coin heads, odd number of heads. Pairwise independent, not independent, as the sketch below checks.
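A sketch checking the slide's counterexample exactly, using the four equally likely outcomes of two fair tosses:

```python
# E1 = "first coin heads", E2 = "second coin heads", E3 = "odd number of heads":
# pairwise independent but not (mutually) independent.
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))            # four equally likely outcomes
P = lambda A: Fraction(len(A), len(S))

E1 = {s for s in S if s[0] == "H"}
E2 = {s for s in S if s[1] == "H"}
E3 = {s for s in S if s.count("H") % 2 == 1}

# Pairwise: P(Ei Ej) = P(Ei) P(Ej) for each pair.
for A, B in [(E1, E2), (E1, E3), (E2, E3)]:
    assert P(A & B) == P(A) * P(B)

# But the triple product rule fails: P(E1 E2 E3) = 0 while P(E1)P(E2)P(E3) = 1/8.
assert P(E1 & E2 & E3) == 0
assert P(E1) * P(E2) * P(E3) == Fraction(1, 8)
print("pairwise independent, not independent")
```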
Outline
   Counting tricks and basic principles of probability
   Discrete random variables
Random variables

▶ A random variable $X$ is a function from the state space to the real numbers.
▶ Can interpret $X$ as a quantity whose value depends on the outcome of an experiment.
▶ Say $X$ is a discrete random variable if (with probability one) it takes one of a countable set of values.
▶ For each $a$ in this countable set, write $p(a) := P\{X = a\}$. Call $p$ the probability mass function.
▶ Write $F(a) = P\{X \leq a\} = \sum_{x \leq a} p(x)$. Call $F$ the cumulative distribution function.
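A minimal sketch (the two-dice setup is my own example, not the lecture's) of a random variable as a function on the sample space, together with its PMF $p$ and CDF $F$:

```python
# X = sum of two fair dice: build the PMF p(a) = P{X = a} and the CDF F(a) = P{X <= a}.
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # uniform sample space
X = lambda s: s[0] + s[1]                                 # the random variable

p = {}
for s in S:
    p[X(s)] = p.get(X(s), Fraction(0)) + Fraction(1, len(S))

F = lambda a: sum((q for x, q in p.items() if x <= a), Fraction(0))

assert p[7] == Fraction(6, 36)
assert F(12) == 1                       # the CDF reaches 1 at the largest value
print({a: str(q) for a, q in sorted(p.items())})
```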
Indicators

▶ Given any event $E$, can define an indicator random variable, i.e., let $X$ be the random variable equal to 1 on the event $E$ and 0 otherwise. Write this as $X = 1_E$.
▶ The value of $1_E$ (either 1 or 0) indicates whether the event has occurred.
▶ If $E_1, E_2, \ldots, E_k$ are events then $X = \sum_{i=1}^{k} 1_{E_i}$ is the number of these events that occur.
▶ Example: in the $n$-hat shuffle problem, let $E_i$ be the event that the $i$th person gets own hat.
▶ Then $\sum_{i=1}^{n} 1_{E_i}$ is the total number of people who get their own hats.
Expectation of a discrete random variable

▶ Say $X$ is a discrete random variable if (with probability one) it takes one of a countable set of values.
▶ For each $a$ in this countable set, write $p(a) := P\{X = a\}$. Call $p$ the probability mass function.
▶ The expectation of $X$, written $E[X]$, is defined by
$$E[X] = \sum_{x: p(x) > 0} x \, p(x).$$
▶ Represents the weighted average of possible values $X$ can take, each value being weighted by its probability.
Expectation when state space is countable

▶ If the state space $S$ is countable, we can give a SUM OVER STATE SPACE definition of expectation:
$$E[X] = \sum_{s \in S} P\{s\} X(s).$$
▶ Agrees with the SUM OVER POSSIBLE X VALUES definition:
$$E[X] = \sum_{x: p(x) > 0} x \, p(x).$$
Expectation of a function of a random variable

▶ If $X$ is a random variable and $g$ is a function from the real numbers to the real numbers then $g(X)$ is also a random variable.
▶ How can we compute $E[g(X)]$?
▶ Answer:
$$E[g(X)] = \sum_{x: p(x) > 0} g(x) p(x).$$
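A sketch (the two-dice setup and the choice $g(x) = (x-7)^2$ are invented examples) checking that the sum-over-values formula for $E[g(X)]$ agrees with the sum-over-state-space definition:

```python
# Compare E[g(X)] = sum_x g(x) p(x) with E[g(X)] = sum_s P{s} g(X(s)).
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # two fair dice
Ps = Fraction(1, len(S))                                  # P{s} for each outcome s
X = lambda s: s[0] + s[1]
g = lambda x: (x - 7) ** 2                                # an arbitrary function of X

# Sum over possible X values, weighted by the PMF.
p = {}
for s in S:
    p[X(s)] = p.get(X(s), Fraction(0)) + Ps
by_values = sum((g(x) * q for x, q in p.items()), Fraction(0))

# Sum over the state space itself.
by_states = sum((Ps * g(X(s)) for s in S), Fraction(0))

assert by_values == by_states
print("E[(X-7)^2] =", by_values)        # this is Var(X), since E[X] = 7 here
```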
Additivity of expectation

▶ If $X$ and $Y$ are distinct random variables, then $E[X + Y] = E[X] + E[Y]$.
▶ In fact, for real constants $a$ and $b$, we have $E[aX + bY] = aE[X] + bE[Y]$.
▶ This is called the linearity of expectation.
▶ Can extend to more variables: $E[X_1 + X_2 + \ldots + X_n] = E[X_1] + E[X_2] + \ldots + E[X_n]$.
Defining variance in discrete case

▶ Let $X$ be a random variable with mean $\mu$.
▶ The variance of $X$, denoted $\mathrm{Var}(X)$, is defined by $\mathrm{Var}(X) = E[(X - \mu)^2]$.
▶ Taking $g(x) = (x - \mu)^2$, and recalling that $E[g(X)] = \sum_{x: p(x) > 0} g(x) p(x)$, we find that
$$\mathrm{Var}[X] = \sum_{x: p(x) > 0} (x - \mu)^2 p(x).$$
▶ Variance is one way to measure the amount a random variable "varies" from its mean over successive trials.
▶ Very important alternate formula: $\mathrm{Var}[X] = E[X^2] - (E[X])^2$.
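A short sketch (the PMF below is an arbitrary example) checking that the definition of variance and the alternate formula give the same answer:

```python
# Check Var[X] = E[(X - mu)^2] = E[X^2] - (E[X])^2 on a small PMF, exactly.
from fractions import Fraction

p = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}   # a valid PMF
E = lambda f: sum(f(x) * q for x, q in p.items())

mu = E(lambda x: x)
var_def = E(lambda x: (x - mu) ** 2)          # definition
var_alt = E(lambda x: x ** 2) - mu ** 2       # alternate formula
assert var_def == var_alt
print("mean =", mu, " variance =", var_def)
```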
Identity

▶ If $Y = X + b$, where $b$ is constant, then $\mathrm{Var}[Y] = \mathrm{Var}[X]$.
▶ Also, $\mathrm{Var}[aX] = a^2 \mathrm{Var}[X]$.
▶ Proof: $\mathrm{Var}[aX] = E[a^2 X^2] - E[aX]^2 = a^2 E[X^2] - a^2 E[X]^2 = a^2 \mathrm{Var}[X]$.
Standard deviation

▶ Write $\mathrm{SD}[X] = \sqrt{\mathrm{Var}[X]}$.
▶ Satisfies the identity $\mathrm{SD}[aX] = |a| \, \mathrm{SD}[X]$.
▶ Uses the same units as $X$ itself.
▶ If we switch from feet to inches in our "height of randomly chosen person" example, then $X$, $E[X]$, and $\mathrm{SD}[X]$ each get multiplied by 12, but $\mathrm{Var}[X]$ gets multiplied by 144.
Bernoulli random variables

▶ Toss a fair coin $n$ times. (Tosses are independent.) What is the probability of $k$ heads?
▶ Answer: $\binom{n}{k} / 2^n$.
▶ What if the coin has probability $p$ of coming up heads?
▶ Answer: $\binom{n}{k} p^k (1-p)^{n-k}$.
▶ Writing $q = 1 - p$, we can write this as $\binom{n}{k} p^k q^{n-k}$.
▶ Can use the binomial theorem to show the probabilities sum to one: $1 = 1^n = (p+q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}$.
▶ Number of heads is a binomial random variable with parameters $(n, p)$.
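A brief sketch (with arbitrarily chosen n and p) of the binomial probabilities and the binomial-theorem check that they sum to one:

```python
# P{k heads} = C(n, k) p^k q^(n-k); the fair-coin case reduces to C(n, k)/2^n.
from math import comb, isclose

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
probs = [binom_pmf(n, p, k) for k in range(n + 1)]
assert isclose(sum(probs), 1.0)                            # (p + q)^n = 1^n = 1
assert isclose(binom_pmf(4, 0.5, 2), comb(4, 2) / 2**4)    # fair-coin special case
print("P{3 heads in 10 tosses, p=0.3} =", round(probs[3], 4))
```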
Decomposition approach to computing expectation

▶ Let $X$ be a binomial random variable with parameters $(n, p)$. Here is one way to compute $E[X]$.
▶ Think of $X$ as representing the number of heads in $n$ tosses of a coin that is heads with probability $p$.
▶ Write $X = \sum_{j=1}^{n} X_j$, where $X_j$ is 1 if the $j$th coin is heads, 0 otherwise.
▶ In other words, $X_j$ is the number of heads (zero or one) on the $j$th toss.
▶ Note that $E[X_j] = p \cdot 1 + (1-p) \cdot 0 = p$ for each $j$.
▶ Conclude by additivity of expectation that
$$E[X] = \sum_{j=1}^{n} E[X_j] = \sum_{j=1}^{n} p = np.$$
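A sketch of the decomposition in code (the seed, n, p, and trial count are arbitrary): write $X$ as a sum of per-toss indicators and compare the exact linearity answer $np$ with a Monte Carlo estimate:

```python
# X = X_1 + ... + X_n with X_j the indicator of heads on toss j; E[X] = np by linearity.
import random

random.seed(1)
n, p = 20, 0.3

def toss_indicators():
    """One experiment: the list (X_1, ..., X_n) of per-toss indicator values."""
    return [1 if random.random() < p else 0 for _ in range(n)]

trials = 100_000
mean_X = sum(sum(toss_indicators()) for _ in range(trials)) / trials

print("E[X] = n*p =", n * p)             # exact value: sum of n terms, each equal to p
print("simulated   ", round(mean_X, 3))  # should be close to 6.0
```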
Compute variance with decomposition trick

▶ $X = \sum_{j=1}^{n} X_j$, so $E[X^2] = E[\sum_{i=1}^{n} X_i \sum_{j=1}^{n} X_j] = \sum_{i=1}^{n} \sum_{j=1}^{n} E[X_i X_j]$.
▶ $E[X_i X_j]$ is $p$ if $i = j$, $p^2$ otherwise.
▶ $\sum_{i=1}^{n} \sum_{j=1}^{n} E[X_i X_j]$ has $n$ terms equal to $p$ and $(n-1)n$ terms equal to $p^2$.
▶ So $E[X^2] = np + (n-1)n p^2 = np + (np)^2 - np^2$.
▶ Thus $\mathrm{Var}[X] = E[X^2] - E[X]^2 = np - np^2 = np(1-p) = npq$.
▶ Can show generally that if $X_1, \ldots, X_n$ are independent then $\mathrm{Var}[\sum_{j=1}^{n} X_j] = \sum_{j=1}^{n} \mathrm{Var}[X_j]$.
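A sketch (parameters chosen arbitrarily) checking the intermediate step $E[X^2] = np + (n-1)np^2$ and the conclusion $\mathrm{Var}[X] = npq$ directly from the binomial PMF:

```python
# Verify Var[X] = E[X^2] - E[X]^2 = npq for a binomial, using exact PMF sums.
from math import comb, isclose

n, p = 12, 0.25
q = 1 - p
pmf = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]

EX = sum(k * pmf[k] for k in range(n + 1))
EX2 = sum(k * k * pmf[k] for k in range(n + 1))
var = EX2 - EX**2

assert isclose(EX, n * p)
assert isclose(EX2, n * p + (n - 1) * n * p**2)   # the slide's intermediate expression
assert isclose(var, n * p * q)
print("Var[X] =", var, "= npq =", n * p * q)
```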
Bernoulli random variable with n large and np = λ

▶ Let $\lambda$ be some moderate-sized number. Say $\lambda = 2$ or $\lambda = 3$. Let $n$ be a huge number, say $n = 10^6$.
▶ Suppose I have a coin that comes up heads with probability $\lambda/n$ and I toss it $n$ times.
▶ How many heads do I expect to see?
▶ Answer: $np = \lambda$.
▶ Let $k$ be some moderate-sized number (say $k = 4$). What is the probability that I see exactly $k$ heads?
▶ Binomial formula: $\binom{n}{k} p^k (1-p)^{n-k} = \frac{n(n-1)(n-2)\ldots(n-k+1)}{k!} p^k (1-p)^{n-k}$.
▶ This is approximately $\frac{\lambda^k}{k!} (1-p)^{n-k} \approx \frac{\lambda^k}{k!} e^{-\lambda}$.
▶ A Poisson random variable $X$ with parameter $\lambda$ satisfies $P\{X = k\} = \frac{\lambda^k}{k!} e^{-\lambda}$ for integer $k \geq 0$.
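A sketch (λ and n below are arbitrary choices in the spirit of the slide) comparing the exact Binomial($n$, $\lambda/n$) probabilities with the Poisson limit $\frac{\lambda^k}{k!} e^{-\lambda}$:

```python
# Binomial(n, lambda/n) pmf versus the Poisson(lambda) pmf for large n.
from math import comb, exp, factorial

lam, n = 3.0, 10**6
p = lam / n

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    return lam**k * exp(-lam) / factorial(k)

for k in range(6):
    print(k, round(binom_pmf(k), 6), round(poisson_pmf(k), 6))
# The two columns agree to roughly 5-6 decimal places once n is this large.
```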
Expectation and variance

▶ A Poisson random variable $X$ with parameter $\lambda$ satisfies $P\{X = k\} = \frac{\lambda^k}{k!} e^{-\lambda}$ for integer $k \geq 0$.
▶ Clever computation tricks yield $E[X] = \lambda$ and $\mathrm{Var}[X] = \lambda$.
▶ We think of a Poisson random variable as being (roughly) a Bernoulli $(n, p)$ random variable with $n$ very large and $p = \lambda/n$.
▶ This also suggests $E[X] = np = \lambda$ and $\mathrm{Var}[X] = npq \approx \lambda$.