Econ 140 (Spring 2018) - Section 1∗
GSIs: Caroline, Chris, Jimmy, Kaushiki, and Leah
1 Random Variable (RV)
Definition: A random variable captures randomness in life. It is the numerical representation of random
events. Informally, a RV X is a function that assigns a real-valued number x = X(ξ) to each outcome ξ in the
sample space S.
Types of random variable:
• Discrete: takes only a finite (more precisely: countable) number of values on the real line; Ex: tossing
a coin, rolling a die, ...
• Continuous: takes a continuum of values on the real line; Ex: Normal distribution, Uniform distribution, ...
1.1 Continuous Random Variables
• Probability distribution: for a continuous variable it can be described by the probability density function
(p.d.f.). The area under the p.d.f. between any two points is the probability that the random variable
falls between those two points (SW p.18)
(i) ∫_a^b f(x)dx = P(a ≤ X ≤ b) for all a ≤ b, and (ii) ∫_R f(x)dx = 1.
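As a quick numerical illustration of (i) and (ii) (our own sketch — the standard normal is just an example choice of density, not part of SW):

```python
import math

# Standard normal p.d.f. (our example choice of a continuous density).
def f(x):
    return math.exp(-x ** 2 / 2) / math.sqrt(2 * math.pi)

# (i) P(a <= X <= b) is the area under the p.d.f.; here a midpoint Riemann sum.
def prob(a, b, n=100_000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Closed-form standard normal c.d.f. via math.erf, for comparison.
def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

print(round(prob(-1.96, 1.96), 4))       # ≈ 0.95, the familiar 95% interval
print(round(Phi(1.96) - Phi(-1.96), 4))  # same probability from the c.d.f.
# (ii): prob(-10, 10) is already 1 up to numerical error
```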
1.2 Discrete Random Variables
For a discrete random variable we list all possible outcomes of the random variable with their associated
probabilities [P(X = x)]; e.g.: if I toss a fair coin and X is the random variable describing the outcome (1
for Heads, 0 for Tails), then the probability distribution of X is {P(X = 1) = 1/2; P(X = 0) = 1/2}
Both discrete and continuous random variables can be described by the
• Cumulative distribution function (c.d.f.), which cumulates the total probability up to a certain value of
the random variable: F(x) = P(X ≤ x); e.g.: for the fair coin example, the cdf of X is {P(X ≤ 0) =
1/2; P(X ≤ 1) = 1}.
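In code, the fair-coin p.m.f. and its c.d.f. can be written directly (a minimal sketch of the definitions above):

```python
# Fair coin: p.m.f. stored as {value: probability}.
pmf = {0: 0.5, 1: 0.5}

def cdf(x):
    # F(x) = P(X <= x): cumulate the probability of every outcome up to x.
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(0))  # 0.5
print(cdf(1))  # 1.0
```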
1.3 Functions of Random Variables
Probability distributions are often summarized or described in terms of their means and variances, which in
turn are defined in terms of the expectation operator E.
• Mean, or expected value, of X:
For a discrete RV X that can take N values (x1, ..., xN): µX = E(X) = p1x1 + p2x2 + ... + pNxN = Σ_{i=1}^N pi xi
∗ We thank previous GSIs for the great section material they built over time. These section notes are heavily based on their
previous work.
For continuous RVs: µX = E(X) = ∫ x f(x) dx
• Variance of X:
For discrete RVs: σX² = Var(X) = E[(X − µX)²] = Σ_{i=1}^N pi(xi − µX)²
For continuous RVs: σX² = Var(X) = ∫ (x − µX)² f(x) dx
The positive square root of the variance is called the standard deviation: σX = √Var(X)
Useful fact: σX² = E[(X − E(X))²] = E(X²) − [E(X)]² (Why? Use the expectation operator's properties
listed below to derive this result); in informal words, the variance is the expectation of the square minus
the square of the expectation.
• Covariance: σXY = Cov(X, Y) = E[(X − µX)(Y − µY)] = Σ_{i=1}^N Σ_{j=1}^K (xj − µX)(yi − µY)P(X = xj, Y = yi), where X can take K values xj and Y can take N values yi
Useful fact: σXY = E [(X − E(X)) (Y − E(Y ))] = E(XY ) − E(X)E(Y ) ; In informal words, the covariance
is the expectation of the product minus the product of expectations.
• Correlation: ρ = Corr(X, Y) = Cov(X, Y)/√(Var(X)Var(Y)) = σXY/(σXσY). It is always between −1 and 1. Two useful
results:
Cov(a + bX, Y) = bCov(X, Y)
E(XY) = Cov(X, Y) + E(X)E(Y)
Properties of the expectation, variance, and covariance operators:
• E(aY + bX + c) = aE(Y ) + bE(X) + c
• V ar(aY + bX + c) = a2 V ar(Y ) + b2 V ar(X) + 2abCov(Y, X)
• Cov(a + bX, c + dY ) = bdCov(X, Y )
Corr(X, Y), if bd > 0
• Corr(a + bX, c + dY) =
−Corr(X, Y), if bd < 0
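The two "useful facts" and the Cov(a + bX, c + dY) = bdCov(X, Y) property can be checked numerically. A small Python sketch (the joint distribution and the constants a = 5, b = 3, c = 7, d = −2 are made up purely for illustration):

```python
# Hypothetical joint p.m.f. of (X, Y); any valid joint distribution works.
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def E(g):
    """Expectation of g(X, Y) under the joint p.m.f."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mu_x = E(lambda x, y: x)
mu_y = E(lambda x, y: y)

# Useful fact: Var(X) = E[(X - mu_X)^2] = E(X^2) - [E(X)]^2
var_def  = E(lambda x, y: (x - mu_x) ** 2)
var_fact = E(lambda x, y: x ** 2) - mu_x ** 2
assert abs(var_def - var_fact) < 1e-12

# Useful fact: Cov(X, Y) = E(XY) - E(X)E(Y)
cov = E(lambda x, y: x * y) - mu_x * mu_y

# Property: Cov(a + bX, c + dY) = b*d*Cov(X, Y); the shifts a, c drop out.
b, d = 3, -2
cov_lin = (E(lambda x, y: (5 + b * x) * (7 + d * y))
           - E(lambda x, y: 5 + b * x) * E(lambda x, y: 7 + d * y))
print(round(cov, 4), round(cov_lin, 4))  # 0.05 -0.3
```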
2 Joint, marginal and conditional distributions
• Joint probability distribution: probability that two RVs, X and Y, simultaneously take a pair of values
x and y, that is P(X = x, Y = y)
• Marginal probability distribution: P(Y = y) = Σ_{i=1}^N P(X = xi, Y = y)
• Conditional distribution of Y given X = x: P(Y = y|X = x) = P(X = x, Y = y)/P(X = x) = P(X = x|Y = y)P(Y = y)/P(X = x)
Conditional expectation: E(Y|X = x) = Σ_{i=1}^N yi P(Y = yi|X = x)
Conditional variance: Var(Y|X = x) = Σ_{i=1}^N [yi − E(Y|X = x)]² P(Y = yi|X = x)
Law of iterated expectations: E[Y] = EX[EY[Y|X]]; the mean of Y is the weighted average
of the conditional expectation of Y given X, weighted by the probability distribution of X.
(SW p. 29) Alternatively, for discrete random variables, it can be rewritten as follows: E(Y) =
Σ_{i=1}^N E(Y|X = xi)P(X = xi)
In plain words, when you are taking the expected value of Y, you are averaging the values of
Y, weighting them by their probability of occurring. This is exactly the same as calculating the
expected value of Y for each level of X and then averaging those values using the probability of each
level of X occurring as weights. Intuitively, if you want to compute the mean height of the US
population, you could compute it the standard way (left-hand side), or you could compute it for
males and females separately and then average these two values using the proportions of males and
females in the population as weights (right-hand side); the two procedures give the very same
result.
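The height example can be sketched in Python with made-up numbers (the proportions and conditional means below are hypothetical, purely to show the right-hand side of the law of iterated expectations):

```python
# Hypothetical population: X = 1 for male, 0 for female.
p_x = {1: 0.48, 0: 0.52}            # P(X = x), made-up proportions
e_y_given_x = {1: 175.0, 0: 162.0}  # E(height | X = x) in cm, made up

# E(Y) = sum over x of E(Y | X = x) * P(X = x)
e_y = sum(e_y_given_x[x] * p for x, p in p_x.items())
print(round(e_y, 2))  # 168.24
```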
• Independence: X and Y are independent if the realization of one does not affect the distribution of
the other. Formally, X ⊥ Y if and only if P(Y = y, X = x) = P(Y = y)P(X = x) for all
possible values y, x that Y and X can take; or equivalently P(Y = y|X = x) = P(Y = y) and
P(X = x|Y = y) = P(X = x) for all possible values y, x that Y and X can take. If X and
Y are independent, then E(XY) = E(X)E(Y). It follows that
Cov(X, Y) = E(XY) − E(X)E(Y) = 0 and hence X and Y are uncorrelated, i.e. Corr(X, Y) = 0. (Be
careful, the converse is not true! Zero correlation does not imply independence.)
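A classic counterexample for that warning (our own illustration): take X uniform on {−1, 0, 1} and Y = X², so Y is completely determined by X, yet Cov(X, Y) = 0:

```python
# Joint p.m.f. of (X, Y) with Y = X^2 and X uniform on {-1, 0, 1}.
joint = {(-1, 1): 1/3, (0, 0): 1/3, (1, 1): 1/3}

E = lambda g: sum(p * g(x, y) for (x, y), p in joint.items())
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)
print(cov)  # 0.0 -> X and Y are uncorrelated

# But independence fails: P(X=0, Y=0) should equal P(X=0)P(Y=0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)  # P(X=0) = 1/3
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)  # P(Y=0) = 1/3
print(joint[(0, 0)], p_x0 * p_y0)  # 1/3 vs 1/9: not independent
```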
3 Practice problems
3.1 Stock and Watson, 2.2
Here is the joint distribution of weather conditions and commuting times:
Rain (X=0) No rain (X=1) Total
Long commute (Y=0) 0.15 0.07 0.22
Short commute (Y=1) 0.15 0.63 0.78
Total 0.30 0.70 1
Compute (a) E(Y) and E(X); (b) σX² and σY²; (c) σXY and corr(X, Y)
3.2 Stock and Watson, 2.6
Here is the joint distribution of employment status and college graduation:
Unemployed (Y=0) Employed (Y=1) Total
Non college grads (X=0) 0.053 0.586 0.639
College grads (X=1) 0.015 0.346 0.361
Total 0.068 0.932 1
(a) Compute E(Y )
(b) The unemployment rate is the fraction of the labor force that is unemployed. Show that it is given
by 1 − E(Y )
(c) Calculate E(Y |X = 0) and E(Y |X = 1)
(d) Calculate the unemployment rate for college graduates and non college graduates
(e) A randomly selected individual of this population reports being unemployed. What is the probability
that this worker is a college graduate? A non college graduate?
(f ) Are educational achievement and employment status independent?
3.3 Stock and Watson, 2.9
Here is the joint distribution of X and Y:
Y=14 Y=22 Y=30 Y=40 Y=65
X=1 0.02 0.05 0.10 0.03 0.01
X=5 0.17 0.15 0.05 0.02 0.01
X=8 0.02 0.03 0.15 0.10 0.09
(a) Calculate the probability distribution, mean and variance of Y.
(b) Calculate the probability distribution, mean and variance of Y given X=8
(c) Calculate the covariance between X and Y.
4 Solutions
4.1 Stock and Watson, 2.2
The table gives us:
• The joint distribution of X and Y : {P (X = 0, Y = 0) = 0.15; P (X = 1, Y = 0) = 0.07; P (X = 0, Y =
1) = 0.15; P (X = 1, Y = 1) = 0.63}
• The marginal distribution of Y :{P (Y = 0) = 0.22; P (Y = 1) = 0.78}
• The marginal distribution of X : {P (X = 0) = 0.30; P (X = 1) = 0.70}
X and Y are Bernoulli distributed: X ∼ Bernoulli(pX) and Y ∼ Bernoulli(pY) where pX = P(X = 1) =
0.70 and pY = P(Y = 1) = 0.78
(a) E(Y) = P(Y = 1) × 1 + P(Y = 0) × 0 = P(Y = 1) = pY = 0.78
E(X) = P(X = 1) × 1 + P(X = 0) × 0 = P(X = 1) = pX = 0.70
This is a useful feature of Bernoulli distributions: if Z ∼ Bernoulli(p) then E(Z) = p
(b) σX² = E[(X − E(X))²] = P(X = 1) × [1 − E(X)]² + P(X = 0) × [0 − E(X)]²
= pX(1 − pX)² + (1 − pX)(−pX)² = pX(1 − pX)[(1 − pX) + pX] = pX(1 − pX)
= 0.70 × (1 − 0.70) = 0.21
σY² = E[(Y − E(Y))²] = pY(1 − pY) = 0.78 × (1 − 0.78) ≈ 0.17
This is another useful feature of Bernoulli distributions: if Z ∼ Bernoulli(p) then σZ² = p(1 − p)
(c) σXY = E[(X − E(X))(Y − E(Y))]
= P(X = 0, Y = 0) × (0 − pX)(0 − pY) + P(X = 0, Y = 1) × (0 − pX)(1 − pY)
+ P(X = 1, Y = 0) × (1 − pX)(0 − pY) + P(X = 1, Y = 1) × (1 − pX)(1 − pY)
Then we can plug in the numbers to find the final answer. Another (less tedious) way to do it could be
the following:
σXY = E(XY) − E(X)E(Y) = P(X = 1, Y = 1) × 1 − pX pY (do you see why? XY equals 1 only when X = Y = 1, and 0 otherwise)
= 0.63 − 0.70 × 0.78 = 0.084
corr(X, Y) = σXY/(σXσY) = 0.084/√(0.21 × 0.1716) ≈ 0.44
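All of 4.1 is easy to double-check with a few lines of Python (simply redoing the arithmetic above from the joint distribution):

```python
import math

# SW 2.2 joint distribution, keyed as (x, y): probability.
joint = {(0, 0): 0.15, (1, 0): 0.07, (0, 1): 0.15, (1, 1): 0.63}

p_x = sum(p for (x, y), p in joint.items() if x == 1)  # P(X=1) = 0.70
p_y = sum(p for (x, y), p in joint.items() if y == 1)  # P(Y=1) = 0.78

var_x = p_x * (1 - p_x)  # Bernoulli variance p(1 - p)
var_y = p_y * (1 - p_y)
e_xy = sum(p * x * y for (x, y), p in joint.items())   # only (1,1) survives
cov = e_xy - p_x * p_y
corr = cov / math.sqrt(var_x * var_y)
print(round(var_x, 2), round(var_y, 4))  # 0.21 0.1716
print(round(cov, 3), round(corr, 2))     # 0.084 0.44
```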
4.2 Stock and Watson, 2.6
(a) Y is Bernoulli distributed with pY = P(Y = 1) = 0.932 so E(Y) = pY = 0.932
(b) The fraction of the population that is unemployed is given by P (Y = 0) so the unemployment rate is:
P (Y = 0) = 1 − P (Y = 1) = 1 − pY = 1 − E(Y )
(c) E(Y|X = 0) = P(Y = 1|X = 0) × 1 + P(Y = 0|X = 0) × 0 = P(Y = 1|X = 0)
Using the definition of conditional probabilities: E(Y|X = 0) = P(Y = 1|X = 0) = P(Y = 1, X = 0)/P(X = 0) =
0.586/0.639 ≈ 0.917
E(Y|X = 1) = P(Y = 1|X = 1) = P(Y = 1, X = 1)/P(X = 1) = 0.346/0.361 ≈ 0.958
(d) The unemployment rate for college graduates is P(Y = 0|X = 1) = 1 − E(Y|X = 1) = 0.042, or 4.2%
The unemployment rate for non college graduates is P(Y = 0|X = 0) = 1 − E(Y|X = 0) = 0.083, or 8.3%
(e) The probability that a random unemployed worker is a college graduate is:
P(X = 1|Y = 0) = P(Y = 0, X = 1)/P(Y = 0) = 0.015/0.068 ≈ 0.221
The probability that a random unemployed worker is a non college graduate is:
P(X = 0|Y = 0) = P(Y = 0, X = 0)/P(Y = 0) = 0.053/0.068 ≈ 0.779
or simply P (X = 0|Y = 0) = 1 − P (X = 1|Y = 0) = 1 − 0.221 = 0.779
(f) No. There are different ways to check that:
• P(X = 1|Y = 0) = 0.221 ≠ P(X = 1) = 0.361, so P(X = x|Y = y) ≠ P(X = x); X and Y are not
independent.
• σXY = E(XY) − E(X)E(Y) = P(Y = 1, X = 1) − pX pY = 0.346 − 0.932 × 0.361 ≈ 0.01 ≠ 0,
therefore corr(X, Y) = σXY/(σXσY) ≠ 0; X and Y are correlated, so they cannot be independent.
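Here too the answers can be reproduced with a short Python check of the conditional probabilities:

```python
# SW 2.6 joint distribution, keyed as (x, y): probability.
joint = {(0, 0): 0.053, (0, 1): 0.586, (1, 0): 0.015, (1, 1): 0.346}

p_x1 = joint[(1, 0)] + joint[(1, 1)]  # P(X=1) = 0.361
p_x0 = joint[(0, 0)] + joint[(0, 1)]  # P(X=0) = 0.639

e_y_given_x0 = joint[(0, 1)] / p_x0   # E(Y|X=0) = P(Y=1|X=0)
e_y_given_x1 = joint[(1, 1)] / p_x1   # E(Y|X=1) = P(Y=1|X=1)
print(round(e_y_given_x0, 3), round(e_y_given_x1, 3))  # 0.917 0.958

# Unemployment rate by group: 1 - E(Y | X = x)
print(round(1 - e_y_given_x1, 3), round(1 - e_y_given_x0, 3))  # 0.042 0.083

# Probability an unemployed worker is a college graduate: P(X=1 | Y=0)
p_y0 = joint[(0, 0)] + joint[(1, 0)]
print(round(joint[(1, 0)] / p_y0, 3))  # 0.221
```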
4.3 Stock and Watson, 2.9
First, let's define X = {1; 5; 8} and Y = {14; 22; 30; 40; 65}. Those are the sets of values that X and Y (resp.)
can take.
(a) The probability distribution of Y is P(Y = y) = Σ_{x∈X} P(Y = y, X = x) for each y ∈ Y
For instance: P(Y = 14) = P(Y = 14, X = 1) + P(Y = 14, X = 5) + P(Y = 14, X = 8) = 0.21
To find the full probability distribution of Y, we need to compute as many probabilities as the number
of values Y can take, that is 5 probabilities. They must add up to 1.
Y=14 Y=22 Y=30 Y=40 Y=65 Prob Distr of X
X=1 0.02 0.05 0.10 0.03 0.01 0.21
X=5 0.17 0.15 0.05 0.02 0.01 0.40
X=8 0.02 0.03 0.15 0.10 0.09 0.39
Prob Distr of Y 0.21 0.23 0.30 0.15 0.11 1
The mean of Y is E(Y) = Σ_{y∈Y} y P(Y = y) = 14 × P(Y = 14) + 22 × P(Y = 22) + 30 × P(Y = 30) + 40 × P(Y = 40) + 65 × P(Y = 65)
= 14 × 0.21 + 22 × 0.23 + 30 × 0.30 + 40 × 0.15 + 65 × 0.11 = 30.15
The variance of Y can be calculated as Var(Y) = E(Y²) − [E(Y)]²
E(Y²) = 14² × 0.21 + 22² × 0.23 + 30² × 0.30 + 40² × 0.15 + 65² × 0.11 = 1127.23
So Var(Y) = E(Y²) − [E(Y)]² = 1127.23 − (30.15)² ≈ 218.21
(b) The probability distribution of Y given X = 8 is P(Y = y|X = 8) = P(Y = y, X = 8)/P(X = 8)
where P(X = 8) = Σ_{y∈Y} P(Y = y, X = 8) = 0.39
For instance: P(Y = 14|X = 8) = P(Y = 14, X = 8)/P(X = 8) = 0.02/0.39 ≈ 0.051
Here again we need to compute 5 probabilities (that must add up to 1).
P(Y=14 | X=8) P(Y=22 | X=8) P(Y=30 | X=8) P(Y=40 | X=8) P(Y=65 | X=8)
0.02/0.39 0.03/0.39 0.15/0.39 0.10/0.39 0.09/0.39
The mean of Y given X = 8 is E(Y|X = 8) = Σ_{y∈Y} y P(Y = y|X = 8) = 14 × (0.02/0.39) +
22 × (0.03/0.39) + 30 × (0.15/0.39) + 40 × (0.10/0.39) + 65 × (0.09/0.39) ≈ 39.21
The variance of Y given X = 8 can be calculated as Var(Y|X = 8) = E(Y²|X = 8) − [E(Y|X = 8)]²
E(Y²|X = 8) = 14² × (0.02/0.39) + 22² × (0.03/0.39) + 30² × (0.15/0.39) + 40² × (0.10/0.39) + 65² ×
(0.09/0.39) ≈ 1778.69
So Var(Y|X = 8) = 1778.69 − (39.2051)² ≈ 241.65 (keeping unrounded intermediate values)
(c) σXY = E(XY) − E(X)E(Y)
E(XY) = 1 × 14 × 0.02 + 1 × 22 × 0.05 + 1 × 30 × 0.10 + ... + 8 × 65 × 0.09 = 171.7
So σXY = E(XY) − E(X)E(Y) = 171.7 − 5.33 × 30.15 ≈ 11
corr(X, Y) = Cov(X, Y)/(σXσY) = 11/(2.6 × 14.77) ≈ 0.286
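Finally, a Python sketch that reproduces 4.3 from the joint table (marginals, moments, conditional mean, covariance, correlation):

```python
import math

# SW 2.9: rows index X = {1, 5, 8}, columns index Y = {14, 22, 30, 40, 65}.
xs, ys = [1, 5, 8], [14, 22, 30, 40, 65]
probs = [[0.02, 0.05, 0.10, 0.03, 0.01],
         [0.17, 0.15, 0.05, 0.02, 0.01],
         [0.02, 0.03, 0.15, 0.10, 0.09]]

p_y = [sum(row[j] for row in probs) for j in range(len(ys))]  # marginal of Y
p_x = [sum(row) for row in probs]                             # marginal of X

e_y = sum(y * p for y, p in zip(ys, p_y))
var_y = sum(y ** 2 * p for y, p in zip(ys, p_y)) - e_y ** 2
e_x = sum(x * p for x, p in zip(xs, p_x))
var_x = sum(x ** 2 * p for x, p in zip(xs, p_x)) - e_x ** 2

# Conditional mean E(Y | X = 8): renormalize the X = 8 row by P(X = 8).
e_y_8 = sum(y * probs[2][j] for j, y in enumerate(ys)) / p_x[2]

e_xy = sum(x * y * probs[i][j]
           for i, x in enumerate(xs) for j, y in enumerate(ys))
cov = e_xy - e_x * e_y
corr = cov / math.sqrt(var_x * var_y)
print(round(e_y, 2), round(var_y, 2))  # 30.15 218.21
print(round(e_y_8, 2))                 # 39.21
print(round(cov, 2), round(corr, 3))   # 11.0 0.286
```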