Properties of Expected Values and Variance
Christopher Croke
University of Pennsylvania
Math 115
UPenn, Fall 2011
Expected value

Consider a random variable $Y = r(X)$ for some function $r$, e.g. $Y = X^2 + 3$, so in this case $r(x) = x^2 + 3$. It turns out (and we have already used) that
\[ E(r(X)) = \int_{-\infty}^{\infty} r(x) f(x)\,dx. \]
This is not obvious, since by definition $E(r(X)) = \int_{-\infty}^{\infty} x f_Y(x)\,dx$, where $f_Y(x)$ is the probability density function of $Y = r(X)$. You get from one integral to the other by careful use of $u$-substitution.
One consequence is
\[ E(aX + b) = \int_{-\infty}^{\infty} (ax + b) f(x)\,dx = aE(X) + b. \]
(It is not usually the case that $E(r(X)) = r(E(X))$.)
Similar facts hold for discrete random variables.
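Filling in the step behind the consequence $E(aX + b) = aE(X) + b$ above, using only the fact that $\int_{-\infty}^{\infty} f(x)\,dx = 1$:
\[ \int_{-\infty}^{\infty} (ax + b) f(x)\,dx = a\int_{-\infty}^{\infty} x f(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx = aE(X) + b. \]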
Problem: For $X$ the uniform distribution on $[0, 2]$, what is $E(X^2)$?
If $X_1, X_2, X_3, \dots, X_n$ are random variables and $Y = r(X_1, X_2, X_3, \dots, X_n)$, then
\[ E(Y) = \int \cdots \int r(x_1, x_2, \dots, x_n)\, f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n, \]
where $f(x_1, x_2, \dots, x_n)$ is the joint probability density function.
Problem: Consider again our example of randomly choosing a point in $[0,1] \times [0,1]$. We could let $X$ be the random variable of choosing the first coordinate and $Y$ the second. What is $E(X + Y)$? (Note that $f(x, y) = 1$.)
Easy properties of expected values:
If $\Pr(X \geq a) = 1$ then $E(X) \geq a$.
If $\Pr(X \leq b) = 1$ then $E(X) \leq b$.
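A worked sketch of the two problems above (the solutions are not stated on the slide): for $X$ uniform on $[0, 2]$ the density is $f(x) = \tfrac12$ on $[0, 2]$, and on the square $f(x, y) = 1$, so
\[ E(X^2) = \int_0^2 x^2 \cdot \tfrac12\,dx = \frac{4}{3}, \qquad E(X + Y) = \int_0^1\!\!\int_0^1 (x + y)\,dx\,dy = \tfrac12 + \tfrac12 = 1. \]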
Properties of E(X)

A little more surprising (but not hard, and we have already used it):
\[ E(X_1 + X_2 + X_3 + \dots + X_n) = E(X_1) + E(X_2) + E(X_3) + \dots + E(X_n). \]
Another way to look at binomial random variables: let $X_i$ be $1$ if the $i$th trial is a success and $0$ if it is a failure. Note that $E(X_i) = 0 \cdot q + 1 \cdot p = p$.
Our binomial variable (the number of successes) is $X = X_1 + X_2 + X_3 + \dots + X_n$, so
\[ E(X) = E(X_1) + E(X_2) + E(X_3) + \dots + E(X_n) = np. \]
What about products? This only works out well if the random variables are independent. If $X_1, X_2, X_3, \dots, X_n$ are independent random variables, then
\[ E\Big(\prod_{i=1}^{n} X_i\Big) = \prod_{i=1}^{n} E(X_i). \]
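The product rule holds because independence means the joint density factors as $f(x_1, \dots, x_n) = f_1(x_1) \cdots f_n(x_n)$, so the integral splits into a product of integrals; a sketch for two variables:
\[ E(XY) = \int\!\!\int x y\, f_X(x) f_Y(y)\,dx\,dy = \Big(\int x f_X(x)\,dx\Big)\Big(\int y f_Y(y)\,dy\Big) = E(X)E(Y). \]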
Properties of Var(X)

Problem: Consider independent random variables $X_1$, $X_2$, and $X_3$, where $E(X_1) = 2$, $E(X_2) = -1$, and $E(X_3) = 0$. Compute $E(X_1^2 (X_2 + 3X_3)^2)$.
There is not enough information! Also assume $E(X_1^2) = 3$, $E(X_2^2) = 1$, and $E(X_3^2) = 2$. (A worked sketch follows the facts below.)
Facts about Var(X):
Var(X) = 0 means the same as: there is a $c$ such that $\Pr(X = c) = 1$.
$\sigma^2(X) = E(X^2) - E(X)^2$ (alternative definition).
$\sigma^2(aX + b) = a^2 \sigma^2(X)$. Proof: $\sigma^2(aX + b) = E[(aX + b - (a\mu + b))^2] = E[(aX - a\mu)^2] = a^2 E[(X - \mu)^2] = a^2 \sigma^2(X)$.
For independent $X_1, X_2, X_3, \dots, X_n$,
\[ \sigma^2(X_1 + X_2 + X_3 + \dots + X_n) = \sigma^2(X_1) + \sigma^2(X_2) + \sigma^2(X_3) + \dots + \sigma^2(X_n). \]
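A worked sketch of the problem above (the answer is not stated on the slide): by independence and the expansion $(X_2 + 3X_3)^2 = X_2^2 + 6X_2X_3 + 9X_3^2$,
\[ E\big(X_1^2 (X_2 + 3X_3)^2\big) = E(X_1^2)\big(E(X_2^2) + 6E(X_2)E(X_3) + 9E(X_3^2)\big) = 3\big(1 + 6(-1)(0) + 9 \cdot 2\big) = 57. \]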
Note that the last fact, combined with $\sigma^2(aX) = a^2 \sigma^2(X)$, tells us that
\[ \sigma^2(a_1 X_1 + a_2 X_2 + a_3 X_3 + \dots + a_n X_n) = a_1^2 \sigma^2(X_1) + a_2^2 \sigma^2(X_2) + a_3^2 \sigma^2(X_3) + \dots + a_n^2 \sigma^2(X_n). \]
Now we can compute the variance of the binomial distribution with parameters $n$ and $p$. As before, $X = X_1 + X_2 + \dots + X_n$ where the $X_i$ are independent with $\Pr(X_i = 0) = q$ and $\Pr(X_i = 1) = p$. So $\mu(X_i) = p$, and since $E(X_i^2) = 0^2 \cdot q + 1^2 \cdot p = p$,
\[ \sigma^2(X_i) = E(X_i^2) - \mu(X_i)^2 = p - p^2 = p(1 - p) = pq. \]
Thus $\sigma^2(X) = \sum \sigma^2(X_i) = npq$.
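For example, for $n$ flips of a fair coin ($p = q = \tfrac12$) this gives
\[ \sigma^2(X) = n \cdot \tfrac12 \cdot \tfrac12 = \frac{n}{4}, \qquad \sigma(X) = \frac{\sqrt{n}}{2}. \]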
Consider our random variable $Z$, which is the sum of the coordinates of a point randomly chosen from $[0,1] \times [0,1]$. $Z = X + Y$ where $X$ and $Y$ both represent choosing a point randomly from $[0,1]$. They are independent, and last time we showed $\sigma^2(X) = \frac{1}{12}$. So $\sigma^2(Z) = \frac{1}{6}$.
What is $E(XY)$? They are independent, so $E(XY) = E(X)E(Y) = \frac12 \cdot \frac12 = \frac14$.
($\sigma^2(XY)$ is more complicated; see the sketch below.)
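To see why, one can work $\sigma^2(XY)$ out here using independence (a sketch; not on the slide): $E(X^2) = E(Y^2) = \frac13$, so
\[ \sigma^2(XY) = E(X^2 Y^2) - E(XY)^2 = \tfrac13 \cdot \tfrac13 - \big(\tfrac14\big)^2 = \tfrac19 - \tfrac1{16} = \tfrac{7}{144}, \]
which in particular is not $\sigma^2(X)\,\sigma^2(Y) = \tfrac{1}{144}$.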
Why is $E\big(\sum_{i=1}^{n} X_i\big) = \sum_{i=1}^{n} E(X_i)$ even when the $X_i$ are not independent?
\[ E\Big(\sum_{i=1}^{n} X_i\Big) = \int \cdots \int (x_1 + x_2 + \dots + x_n)\, f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n \]
\[ = \sum_{i=1}^{n} \int \cdots \int x_i\, f(x_1, x_2, \dots, x_n)\,dx_1\,dx_2 \cdots dx_n \]
\[ = \sum_{i=1}^{n} \int x_i\, f_i(x_i)\,dx_i = \sum_{i=1}^{n} E(X_i). \]
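The last step uses the marginal density of $X_i$: integrating the joint density over all of the other variables gives
\[ f_i(x_i) = \int \cdots \int f(x_1, \dots, x_n)\,dx_1 \cdots dx_{i-1}\,dx_{i+1} \cdots dx_n, \]
which is exactly the density used to compute $E(X_i)$.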