PROPERTIES OF EXPECTATION

Contents
1 Introduction
2 Expectation of Sums of Random Variables
3 Moments of the Number of Events that Occur
4 Covariance, Variance of Sums, and Correlations
5 Conditional Expectation
6 Conditional Expectation and Prediction
7 Moment Generating Functions
8 Additional Properties of Normal Random Variables
9 General Definition of Expectation

1 Introduction
In this chapter, we develop and exploit additional properties of expected values. The expected value of the random variable X is defined by

    E[X] = Σ_x x p(x)

when X is a discrete random variable with probability mass function p(x), and by

    E[X] = ∫_{-∞}^{∞} x f(x) dx

when X is a continuous random variable with probability density function f(x).

Since E[X] is a weighted average of the possible values of X, it follows that if X must lie between a and b, then so must its expected value. That is, if

    P{a ≤ X ≤ b} = 1

then

    a ≤ E[X] ≤ b

To verify the preceding statement, suppose that X is a discrete random variable for which P{a ≤ X ≤ b} = 1. Since this implies that p(x) = 0 for all x outside of the interval [a, b], it follows that

    E[X] = Σ_{x: p(x)>0} x p(x)
         ≥ Σ_{x: p(x)>0} a p(x)
         = a Σ_{x: p(x)>0} p(x)
         = a

In the same manner, it can be shown that E[X] ≤ b, so the result follows for discrete random variables. As the proof in the continuous case is similar, the result follows in general.

From Chapter 7 of A First Course in Probability, Ninth Edition, by Sheldon Ross. Copyright © 2014 by Pearson Education, Inc. All rights reserved.
2 Expectation of Sums of Random Variables

For a two-dimensional analog of the propositions that give the computational formulas for the expected value of a function of a random variable, suppose that X and Y are random variables and g is a function of two variables. Then we have the following result.

Proposition 2.1
If X and Y have a joint probability mass function p(x, y), then

    E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y)

If X and Y have a joint probability density function f(x, y), then

    E[g(X, Y)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x, y) f(x, y) dx dy
Let us give a proof of Proposition 2.1 when the random variables X and Y are jointly continuous with joint density function f(x, y) and when g(X, Y) is a nonnegative random variable. Because g(X, Y) ≥ 0, we have

    E[g(X, Y)] = ∫_0^∞ P{g(X, Y) > t} dt

Writing

    P{g(X, Y) > t} = ∫∫_{(x,y): g(x,y)>t} f(x, y) dy dx

shows that

    E[g(X, Y)] = ∫_0^∞ ∫∫_{(x,y): g(x,y)>t} f(x, y) dy dx dt

Interchanging the order of integration gives

    E[g(X, Y)] = ∫_x ∫_y ∫_0^{g(x,y)} dt f(x, y) dy dx
               = ∫_x ∫_y g(x, y) f(x, y) dy dx

Thus, the result is proven when g(X, Y) is a nonnegative random variable. The general case then follows as in the one-dimensional case.
Example 2a
An accident occurs at a point X that is uniformly distributed on a road of length L. At the time of the accident, an ambulance is at a location Y that is also uniformly distributed on the road. Assuming that X and Y are independent, find the expected distance between the ambulance and the point of the accident.

Solution We need to compute E[|X − Y|]. Since the joint density function of X and Y is

    f(x, y) = 1/L²,   0 < x < L, 0 < y < L

it follows from Proposition 2.1 that

    E[|X − Y|] = (1/L²) ∫_0^L ∫_0^L |x − y| dy dx
               = (1/L²) ∫_0^L [ x²/2 + (L − x)²/2 ] dx
               = L/3
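As a quick numerical check of the L/3 answer (this sketch is not part of the original text), the following Python simulation estimates E[|X − Y|]; the road length L = 10 and the sample size are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
L = 10.0            # assumed road length for the illustration
n_trials = 1_000_000

# X: location of the accident, Y: location of the ambulance,
# both uniform on (0, L) and independent of each other.
X = rng.uniform(0.0, L, n_trials)
Y = rng.uniform(0.0, L, n_trials)

estimate = np.abs(X - Y).mean()
print(f"simulated E[|X - Y|] = {estimate:.4f}, theory L/3 = {L/3:.4f}")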
Example 2g  Mean of a hypergeometric random variable
If n balls are randomly selected from an urn containing N balls of which m are white, find the expected number of white balls selected.

Solution Let X denote the number of white balls selected, and represent X as

    X = X_1 + ⋯ + X_m

where

    X_i = 1 if the ith white ball is selected
          0 otherwise

Now

    E[X_i] = P{X_i = 1}
           = P{ith white ball is selected}
           = C(1, 1) C(N − 1, n − 1) / C(N, n)
           = n/N

Hence,

    E[X] = E[X_1] + ⋯ + E[X_m] = mn/N

We could also have obtained the preceding result by using the alternative representation

    X = Y_1 + ⋯ + Y_n

where

    Y_i = 1 if the ith ball selected is white
          0 otherwise

Since the ith ball selected is equally likely to be any of the N balls, it follows that

    E[Y_i] = m/N

so

    E[X] = E[Y_1] + ⋯ + E[Y_n] = nm/N
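As a brief illustration (not from the text), the following Python sketch simulates repeated draws without replacement and compares the empirical mean with nm/N; the urn parameters are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
N, m, n = 20, 7, 5          # urn size, white balls, balls drawn (illustrative values)
n_trials = 200_000

urn = np.array([1] * m + [0] * (N - m))     # 1 marks a white ball
draws = np.array([rng.permutation(urn)[:n].sum() for _ in range(n_trials)])

print(f"simulated E[X] = {draws.mean():.4f}, theory nm/N = {n * m / N:.4f}")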
Example  Expected number of matches
Suppose that N people throw their hats into the center of a room. The hats are mixed up, and each person randomly selects one. Find the expected number of people who select their own hat.

Solution Letting X denote the number of matches, we can compute E[X] most easily by writing

    X = X_1 + X_2 + ⋯ + X_N

where

    X_i = 1 if the ith person selects his own hat
          0 otherwise

Since, for each i, the ith person is equally likely to select any of the N hats,

    E[X_i] = P{X_i = 1} = 1/N

Thus,

    E[X] = E[X_1] + ⋯ + E[X_N] = N (1/N) = 1

Hence, on the average, exactly one person selects his own hat.
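A minimal simulation sketch (added here, not from the text) that checks the answer of 1 by counting fixed points of random permutations; N = 12 is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(2)
N = 12                       # number of people/hats (illustrative)
n_trials = 200_000

matches = np.empty(n_trials)
for t in range(n_trials):
    perm = rng.permutation(N)                     # hat received by each person
    matches[t] = np.sum(perm == np.arange(N))     # fixed points = matches

print(f"simulated E[X] = {matches.mean():.4f}, theory = 1")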
Example  Coupon-collecting problems
Suppose that there are N types of coupons, and each time one obtains a coupon, it is equally likely to be any one of the N types. Find the expected number of coupons one needs to amass before obtaining a complete set of at least one of each type.

Solution Let X denote the number of coupons collected before a complete set is attained. We compute E[X] by using the same technique we used in computing the mean of a negative binomial random variable (Example 2f). That is, we define X_i, i = 0, 1, ..., N − 1, to be the number of additional coupons that need be obtained after i distinct types have been collected in order to obtain another distinct type, and we note that

    X = X_0 + X_1 + ⋯ + X_{N−1}

When i distinct types of coupons have already been collected, a new coupon obtained will be of a distinct type with probability (N − i)/N. Therefore,

    P{X_i = k} = [(N − i)/N] (i/N)^{k−1},   k ≥ 1

or, in other words, X_i is a geometric random variable with parameter (N − i)/N. Hence,

    E[X_i] = N/(N − i)

implying that

    E[X] = 1 + N/(N − 1) + N/(N − 2) + ⋯ + N/1
         = N [1 + 1/2 + ⋯ + 1/(N − 1) + 1/N]
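The following short Python simulation (an added illustration, not from the text) compares the empirical mean number of coupons against N times the Nth harmonic number; the value N = 10 is arbitrary.

import numpy as np

rng = np.random.default_rng(3)
N = 10                       # number of coupon types (illustrative)
n_trials = 100_000

def coupons_needed(rng, N):
    # Draw coupons uniformly at random until every type has appeared.
    seen = set()
    count = 0
    while len(seen) < N:
        seen.add(int(rng.integers(N)))
        count += 1
    return count

sim = np.mean([coupons_needed(rng, N) for _ in range(n_trials)])
theory = N * sum(1.0 / i for i in range(1, N + 1))
print(f"simulated E[X] = {sim:.3f}, theory N*H_N = {theory:.3f}")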
Example  Ten hunters are waiting for ducks to fly by. When a flock of ducks flies overhead, the hunters fire at the same time, but each chooses his target at random, independently of the others. If each hunter independently hits his target with probability p, compute the expected number of ducks that escape unhurt when a flock of size 10 flies overhead.

Solution Let X_i equal 1 if the ith duck escapes unhurt and 0 otherwise, for i = 1, 2, ..., 10. The expected number of ducks to escape can be expressed as

    E[X_1 + ⋯ + X_10] = E[X_1] + ⋯ + E[X_10]

To compute E[X_i] = P{X_i = 1}, we note that each of the hunters will, independently, hit the ith duck with probability p/10, so

    P{X_i = 1} = (1 − p/10)^{10}

Hence,

    E[X] = 10 (1 − p/10)^{10}
Example  Expected number of runs
Suppose that a sequence of n 1's and m 0's is randomly permuted so that each of the (n + m)!/(n! m!) possible arrangements is equally likely. Any consecutive string of 1's is said to constitute a run of 1's (for instance, if n = 6, m = 4, and the ordering is 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, then there are 3 runs of 1's), and we are interested in computing the mean number of such runs. To compute this quantity, let

    I_i = 1 if a run of 1's starts at the ith position
          0 otherwise

Therefore, R(1), the number of runs of 1's, can be expressed as

    R(1) = Σ_{i=1}^{n+m} I_i

and it follows that

    E[R(1)] = Σ_{i=1}^{n+m} E[I_i]

Now,

    E[I_1] = P{"1" in position 1} = n/(n + m)

and, for 1 < i ≤ n + m,

    E[I_i] = P{"0" in position i − 1, "1" in position i} = m n/[(n + m)(n + m − 1)]

Hence,

    E[R(1)] = n/(n + m) + (n + m − 1) n m/[(n + m)(n + m − 1)] = n(m + 1)/(n + m)

For another application, consider n steps of unit length taken in the plane, with the direction θ_i of the ith step uniformly distributed on (0, 2π) and the θ_i independent. With X_i = cos θ_i and Y_i = sin θ_i denoting the horizontal and vertical components of the ith step, the position after the n steps is (Σ_i X_i, Σ_i Y_i), and it follows that D², the square of the distance from the origin, is given by

    D² = (Σ_i X_i)² + (Σ_i Y_i)²
       = Σ_i (X_i² + Y_i²) + Σ_{i≠j} (X_i X_j + Y_i Y_j)
       = n + Σ_{i≠j} (cos θ_i cos θ_j + sin θ_i sin θ_j)

where we have used cos² θ_i + sin² θ_i = 1. Taking expectations and using the independence of θ_i and θ_j when i ≠ j and the fact that

    2π E[cos θ_i] = ∫_0^{2π} cos u du = sin 2π − sin 0 = 0
    2π E[sin θ_i] = ∫_0^{2π} sin u du = cos 0 − cos 2π = 0

we arrive at

    E[D²] = n
Example 2m  Analyzing the quick-sort algorithm
Suppose that we are presented with a set of n distinct values x_1, x_2, ..., x_n and that we desire to put them in increasing order, or, as it is commonly stated, to sort them. An efficient procedure for accomplishing this task is the quick-sort algorithm, which is defined as follows: When n = 2, the algorithm compares the two values and then puts them in the appropriate order. When n > 2, one of the elements is randomly chosen, say it is x_i, and then all of the other values are compared with x_i. Those smaller than x_i are put in a bracket to the left of x_i and those larger than x_i are put in a bracket to the right of x_i. The algorithm then repeats itself on these brackets and continues until all values have been sorted. For instance, suppose that we desire to sort the following 10 distinct values:

    5, 9, 3, 10, 11, 14, 8, 4, 17, 6

We start by choosing one of them at random (that is, each value has probability 1/10 of being chosen). Suppose, for instance, that the value 10 is chosen. We then compare each of the others to this value, putting in a bracket to the left of 10 all those values smaller than 10 and to the right all those larger. This gives

    {5, 9, 3, 8, 4, 6}, 10, {11, 14, 17}

We now focus on a bracketed set that contains more than a single value, say the one on the left of the preceding, and randomly choose one of its values, say that 6 is chosen. Comparing each of the values in the bracket with 6 and putting the smaller ones in a new bracket to the left of 6 and the larger ones in a bracket to the right of 6 gives

    {5, 3, 4}, 6, {9, 8}, 10, {11, 14, 17}

If we now consider the leftmost bracket, and randomly choose the value 4 for comparison, then the next iteration yields

    {3}, 4, {5}, 6, {9, 8}, 10, {11, 14, 17}

This continues until there is no bracketed set that contains more than a single value.

If we let X denote the number of comparisons that it takes the quick-sort algorithm to sort n distinct numbers, then E[X] is a measure of the effectiveness of this algorithm. To compute E[X], we will first express X as a sum of other random variables as follows. To begin, give the following names to the values that are to be sorted: Let 1 stand for the smallest, let 2 stand for the next smallest, and so on. Then, for 1 ≤ i < j ≤ n, let I(i, j) equal 1 if i and j are ever directly compared, and let it equal 0 otherwise. With this definition, it follows that
    X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} I(i, j)

implying that

    E[X] = E[ Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} I(i, j) ]
         = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[I(i, j)]
         = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} P{i and j are ever compared}
To determine the probability that i and j are ever compared, note that the values i, i + 1, ..., j − 1, j will initially be in the same bracket (since all values are initially in the same bracket) and will remain in the same bracket if the number chosen for the first comparison is not between i and j. For instance, if the comparison number is larger than j, then all the values i, i + 1, ..., j − 1, j will go in a bracket to the left of the comparison number, and if it is smaller than i, then they will all go in a bracket to the right. Thus all the values i, i + 1, ..., j − 1, j will remain in the same bracket until the first time that one of them is chosen as a comparison value. At that point all the other values between i and j will be compared with this comparison value. Now, if this comparison value is neither i nor j, then upon comparison with it, i will go into a left bracket and j into a right bracket, and thus i and j will be in different brackets and so will never be compared. On the other hand, if the comparison value of the set i, i + 1, ..., j − 1, j is either i or j, then there will be a direct comparison between i and j. Now, given that the comparison value is one of the values between i and j, it follows that it is equally likely to be any of these j − i + 1 values, and thus the probability that it is either i or j is 2/(j − i + 1). Therefore, we can conclude that

    P{i and j are ever compared} = 2/(j − i + 1)

and

    E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
To obtain a rough approximation of the magnitude of E[X] when n is large, we can approximate the sums by integrals. Now

    Σ_{j=i+1}^{n} 2/(j − i + 1) ≈ ∫_{i+1}^{n} 2/(x − i + 1) dx
                                = 2 log(x − i + 1) evaluated from i + 1 to n
                                = 2 log(n − i + 1) − 2 log(2)
                                ≈ 2 log(n − i + 1)

Thus

    E[X] ≈ Σ_{i=1}^{n−1} 2 log(n − i + 1)
         ≈ 2 ∫_1^{n−1} log(n − x + 1) dx
         = 2 ∫_2^{n} log(y) dy
         = 2 [y log(y) − y] evaluated from 2 to n
         ≈ 2n log(n)

Thus we see that when n is large, the quick-sort algorithm requires, on average, approximately 2n log(n) comparisons to sort n distinct values.
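As an added illustration (not from the text), the following Python sketch counts comparisons made by a randomized quick-sort and compares the empirical mean with the exact double sum and with the rough 2n log n approximation; n = 1000 and the number of trials are arbitrary choices.

import numpy as np

rng = np.random.default_rng(4)

def quicksort_comparisons(values, rng):
    # Randomized quick-sort that only tracks the number of pairwise
    # comparisons (the sorted output itself is discarded).
    if len(values) <= 1:
        return 0
    pivot = values[rng.integers(len(values))]
    left = [v for v in values if v < pivot]
    right = [v for v in values if v > pivot]
    comparisons = len(values) - 1        # everything is compared against the pivot
    return comparisons + quicksort_comparisons(left, rng) + quicksort_comparisons(right, rng)

n = 1000
trials = [quicksort_comparisons(list(rng.permutation(n)), rng) for _ in range(200)]
exact = sum(2.0 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
print(f"simulated mean comparisons: {np.mean(trials):.0f}")
print(f"exact sum of 2/(j-i+1):     {exact:.0f}")
print(f"2 n log n approximation:    {2 * n * np.log(n):.0f}")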
Example  The probability of a union of events
Let A_1, ..., A_n denote events, and define the indicator variables X_i, i = 1, ..., n, by

    X_i = 1 if A_i occurs
          0 otherwise

Now, note that

    1 − Π_{i=1}^{n} (1 − X_i) = 1 if ∪_i A_i occurs
                                0 otherwise

Hence,

    E[ 1 − Π_{i=1}^{n} (1 − X_i) ] = P( ∪_i A_i )

Expanding the left side of the preceding formula yields

    P( ∪_i A_i ) = E[ Σ_i X_i − Σ_{i<j} X_i X_j + Σ_{i<j<k} X_i X_j X_k − ⋯ + (−1)^{n+1} X_1 ⋯ X_n ]     (2.3)

However,

    X_{i_1} X_{i_2} ⋯ X_{i_k} = 1 if A_{i_1} A_{i_2} ⋯ A_{i_k} occurs
                                0 otherwise

so

    E[ X_{i_1} ⋯ X_{i_k} ] = P( A_{i_1} ⋯ A_{i_k} )

Thus, Equation (2.3) is just a statement of the well-known inclusion-exclusion formula for the union of events:

    P( ∪_i A_i ) = Σ_i P(A_i) − Σ_{i<j} P(A_i A_j) + Σ_{i<j<k} P(A_i A_j A_k) − ⋯ + (−1)^{n+1} P(A_1 ⋯ A_n)
When one is dealing with an infinite collection of random variables X_i, i ≥ 1, each having a finite expectation, it is not necessarily true that

    E[ Σ_{i=1}^{∞} X_i ] = Σ_{i=1}^{∞} E[X_i]     (2.4)

To determine when (2.4) is valid, we note that Σ_{i=1}^{∞} X_i = lim_{n→∞} Σ_{i=1}^{n} X_i, and thus

    E[ Σ_{i=1}^{∞} X_i ] = E[ lim_{n→∞} Σ_{i=1}^{n} X_i ]     (2.5)

Hence, Equation (2.4) is valid whenever we are justified in interchanging the expectation and limit operations in Equation (2.5), since then the right side of (2.5) equals lim_{n→∞} E[ Σ_{i=1}^{n} X_i ] = lim_{n→∞} Σ_{i=1}^{n} E[X_i] = Σ_{i=1}^{∞} E[X_i]. Although, in general, this interchange is not justified, it can be shown to be valid in two important special cases:

1. The X_i are all nonnegative random variables. (That is, P{X_i ≥ 0} = 1 for all i.)
2. E[ Σ_{i=1}^{∞} |X_i| ] < ∞.

Consider any nonnegative, integer-valued random variable X. If, for each i ≥ 1, we define

    X_i = 1 if X ≥ i
          0 otherwise

then

    Σ_{i=1}^{∞} X_i = Σ_{i=1}^{X} 1 + Σ_{i=X+1}^{∞} 0 = X

Hence, since the X_i are all nonnegative,

    E[X] = Σ_{i=1}^{∞} E[X_i] = Σ_{i=1}^{∞} P{X ≥ i}

Example  Consider a round-robin tournament of n contestants, in which each of the C(n, 2) pairs of contestants plays exactly one game, with the outcome of any game being that one contestant wins and the other loses. Call a permutation i_1, i_2, ..., i_n of the contestants a Hamiltonian path if i_1 beats i_2, i_2 beats i_3, ..., and i_{n−1} beats i_n. Show that there is an outcome of the tournament having at least n!/2^{n−1} Hamiltonian paths.

Solution Consider the C(n, 2) games played, and let A denote the set of all 2^{C(n,2)} possible tournament outcomes. Then, with f(s) defined as the number of Hamiltonian paths that result when the outcome is s ∈ A, we are asked to show that

    max_s f(s) ≥ n!/2^{n−1}
To show this, consider the randomly chosen outcome S that is obtained when the results of the C(n, 2) games are independent, with each contestant being equally likely to win each encounter. To determine E[f(S)], the expected number of Hamiltonian paths that result from the outcome S, number the n! permutations, and, for i = 1, ..., n!, let

    X_i = 1 if permutation i is a Hamiltonian path
          0 otherwise

Since

    f(S) = Σ_i X_i

it follows that

    E[f(S)] = Σ_i E[X_i]

Because, by the assumed independence of the outcomes of the games, the probability that any specified permutation is a Hamiltonian path is (1/2)^{n−1}, it follows that

    E[X_i] = P{X_i = 1} = (1/2)^{n−1}

Therefore,

    E[f(S)] = n! (1/2)^{n−1}

Since f(S) is not a constant random variable, the preceding equation implies that there is an outcome of the tournament having more than n!/2^{n−1} Hamiltonian paths.
Example  A grove of 52 trees is arranged in a circular fashion. If 15 chipmunks live in these trees, show that there is a group of 7 consecutive trees that together house at least 3 chipmunks.

Solution Let the neighborhood of a tree consist of that tree along with the next six trees visited by moving in the clockwise direction. We want to show that, for any choice of living accommodations of the 15 chipmunks, there is a tree that has at least 3 chipmunks living in its neighborhood. To show this, choose a tree at random and let X denote the number of chipmunks that live in its neighborhood. To determine E[X], arbitrarily number the 15 chipmunks and, for i = 1, ..., 15, let

    X_i = 1 if chipmunk i lives in the neighborhood of the randomly chosen tree
          0 otherwise

Because

    X = Σ_{i=1}^{15} X_i

we obtain that

    E[X] = Σ_{i=1}^{15} E[X_i]

However, because X_i will equal 1 if the randomly chosen tree is any of the 7 trees consisting of the tree in which chipmunk i lives along with its 6 neighboring trees when moving in the counterclockwise direction,

    E[X_i] = P{X_i = 1} = 7/52

Consequently,

    E[X] = 105/52 > 2

showing that there exists a tree with more than 2 chipmunks living in its neighborhood.

2.2 The Maximum-Minimums Identity
We start with an identity relating the maximum of a set of numbers to the minimums of the subsets of these numbers.

Proposition 2.2
For arbitrary numbers x_i, i = 1, ..., n,

    max_i x_i = Σ_i x_i − Σ_{i<j} min(x_i, x_j) + Σ_{i<j<k} min(x_i, x_j, x_k) − ⋯ + (−1)^{n+1} min(x_1, ..., x_n)
Proof We will give a probabilistic proof of the proposition. To begin, assume that all the x_i are in the interval [0, 1]. Let U be a uniform (0, 1) random variable, and define the events A_i, i = 1, ..., n, by A_i = {U < x_i}. That is, A_i is the event that the uniform random variable is less than x_i. Because at least one of these events A_i will occur if U is less than at least one of the values x_i, we have that

    ∪_i A_i = {U < max_i x_i}

Therefore,

    P( ∪_i A_i ) = P{U < max_i x_i} = max_i x_i

Also,

    P(A_i) = P{U < x_i} = x_i

In addition, because all of the events A_{i_1}, ..., A_{i_r} will occur if U is less than all the values x_{i_1}, ..., x_{i_r}, we see that the intersection of these events is

    A_{i_1} ⋯ A_{i_r} = {U < min_{j=1,...,r} x_{i_j}}

implying that

    P( A_{i_1} ⋯ A_{i_r} ) = P{U < min_j x_{i_j}} = min_j x_{i_j}

Thus, the proposition follows from the inclusion-exclusion formula for the probability of the union of events:

    P( ∪_i A_i ) = Σ_i P(A_i) − Σ_{i<j} P(A_i A_j) + Σ_{i<j<k} P(A_i A_j A_k) − ⋯ + (−1)^{n+1} P(A_1 ⋯ A_n)
When the x_i are nonnegative, but not restricted to the unit interval, let c be such that all the x_i are less than c. Then the identity holds for the values y_i = x_i/c, and the desired result follows by multiplying through by c. When the x_i can be negative, let b be such that x_i + b > 0 for all i. Therefore, by the preceding,

    max_i (x_i + b) = Σ_i (x_i + b) − Σ_{i<j} min(x_i + b, x_j + b) + ⋯ + (−1)^{n+1} min(x_1 + b, ..., x_n + b)

Letting

    M = Σ_i x_i − Σ_{i<j} min(x_i, x_j) + ⋯ + (−1)^{n+1} min(x_1, ..., x_n)

we can rewrite the foregoing identity as

    max_i x_i + b = M + b [ n − C(n, 2) + C(n, 3) − ⋯ + (−1)^{n+1} C(n, n) ]

But

    0 = (1 − 1)^n = 1 − n + C(n, 2) − C(n, 3) + ⋯ + (−1)^n C(n, n)

so the bracketed term equals 1. The preceding two equations show that

    max_i x_i = M

and the proposition is proven.
It follows from Proposition 2.2 that, for any random variables X_1, ..., X_n,

    max_i X_i = Σ_i X_i − Σ_{i<j} min(X_i, X_j) + ⋯ + (−1)^{n+1} min(X_1, ..., X_n)

Taking expectations of both sides of this equality yields the following relationship between the expected value of the maximum and those of the partial minimums:

    E[ max_i X_i ] = Σ_i E[X_i] − Σ_{i<j} E[min(X_i, X_j)] + ⋯ + (−1)^{n+1} E[min(X_1, ..., X_n)]     (2.7)
Example  Coupon collecting with unequal probabilities
Suppose there are n types of coupons and that each time one collects a coupon, it is, independently of previous coupons collected, a type i coupon with probability p_i, Σ_{i=1}^{n} p_i = 1. Find the expected number of coupons one needs to collect to obtain a complete set of at least one of each type.

Solution If we let X_i denote the number of coupons one needs to collect to obtain a type i, then the number needed for a complete set, X, can be expressed as

    X = max_{1≤i≤n} X_i

Because each new coupon obtained is a type i with probability p_i, X_i is a geometric random variable with parameter p_i. Also, because the minimum of X_i and X_j is the number of coupons needed to obtain either a type i or a type j, it follows that, for i ≠ j, min(X_i, X_j) is a geometric random variable with parameter p_i + p_j. Similarly, min(X_i, X_j, X_k), the number needed to obtain any of types i, j, or k, is a geometric random variable with parameter p_i + p_j + p_k, and so on. Therefore, the identity (2.7) yields

    E[X] = Σ_i 1/p_i − Σ_{i<j} 1/(p_i + p_j) + Σ_{i<j<k} 1/(p_i + p_j + p_k) − ⋯ + (−1)^{n+1} 1/(p_1 + ⋯ + p_n)

Noting that

    1/(p_{i_1} + ⋯ + p_{i_k}) = ∫_0^∞ e^{−(p_{i_1} + ⋯ + p_{i_k}) t} dt

and using the identity

    1 − Π_{i=1}^{n} (1 − e^{−p_i t}) = Σ_i e^{−p_i t} − Σ_{i<j} e^{−(p_i + p_j) t} + ⋯ + (−1)^{n+1} e^{−(p_1 + ⋯ + p_n) t}

shows, upon integrating the identity, that

    E[X] = ∫_0^∞ [ 1 − Π_{i=1}^{n} (1 − e^{−p_i t}) ] dt

is a more useful computational form.
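A minimal numerical sketch (added here, not from the text) that checks the integral form against direct simulation; it assumes SciPy is available, and the probability vector is an arbitrary illustrative choice.

import numpy as np
from scipy import integrate

rng = np.random.default_rng(5)
p = np.array([0.5, 0.25, 0.15, 0.1])     # illustrative type probabilities, summing to 1
n_trials = 50_000

def coupons_needed(rng, p):
    # Collect coupons with unequal type probabilities until all types appear.
    seen = np.zeros(len(p), dtype=bool)
    count = 0
    while not seen.all():
        seen[rng.choice(len(p), p=p)] = True
        count += 1
    return count

sim = np.mean([coupons_needed(rng, p) for _ in range(n_trials)])

# E[X] = integral over t of (1 - prod_i (1 - e^{-p_i t}))
integral, _ = integrate.quad(lambda t: 1.0 - np.prod(1.0 - np.exp(-p * t)), 0, np.inf)
print(f"simulated E[X] = {sim:.3f}, integral formula = {integral:.3f}")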
3 Moments of the Number of Events that Occur

Many of the examples solved in the previous section were of the following form: For given events A_1, ..., A_n, find E[X], where X is the number of these events that occur. The solution then involved defining an indicator variable I_i for event A_i such that

    I_i = 1 if A_i occurs
          0 otherwise

Because

    X = Σ_{i=1}^{n} I_i

we obtained the result

    E[X] = E[ Σ_{i=1}^{n} I_i ] = Σ_{i=1}^{n} E[I_i] = Σ_{i=1}^{n} P(A_i)     (3.1)

Now suppose we are interested in the number of pairs of events that occur. Because I_i I_j will equal 1 if both A_i and A_j occur and will equal 0 otherwise, it follows that the number of pairs is equal to Σ_{i<j} I_i I_j. But because X is the number of events that occur, it also follows that the number of pairs of events that occur is C(X, 2). Consequently,

    C(X, 2) = Σ_{i<j} I_i I_j

where there are C(n, 2) terms in the summation. Taking expectations yields

    E[ C(X, 2) ] = Σ_{i<j} E[I_i I_j] = Σ_{i<j} P(A_i A_j)     (3.2)

or

    E[ X(X − 1)/2 ] = Σ_{i<j} P(A_i A_j)

giving that

    E[X²] − E[X] = 2 Σ_{i<j} P(A_i A_j)     (3.3)

which yields E[X²], and thus Var(X) = E[X²] − (E[X])².

Moreover, by considering the number of distinct subsets of k events that all occur, we see that

    C(X, k) = Σ_{i_1<i_2<⋯<i_k} I_{i_1} I_{i_2} ⋯ I_{i_k}

Taking expectations gives the identity

    E[ C(X, k) ] = Σ_{i_1<⋯<i_k} E[ I_{i_1} I_{i_2} ⋯ I_{i_k} ] = Σ_{i_1<⋯<i_k} P( A_{i_1} A_{i_2} ⋯ A_{i_k} )     (3.4)
Example  Moments of binomial random variables
Consider n independent trials, with each trial being a success with probability p. Let A_i be the event that trial i is a success. When i ≠ j, P(A_i A_j) = p². Consequently, Equation (3.2) yields

    E[ C(X, 2) ] = Σ_{i<j} p² = C(n, 2) p²

or

    E[X(X − 1)] = n(n − 1) p²

or

    E[X²] − E[X] = n(n − 1) p²

Now, E[X] = Σ_{i=1}^{n} P(A_i) = np, so, from the preceding equation,

    Var(X) = E[X²] − (E[X])² = n(n − 1) p² + np − (np)² = np(1 − p)

In general, because P(A_{i_1} A_{i_2} ⋯ A_{i_k}) = p^k, we obtain from Equation (3.4) that

    E[ C(X, k) ] = Σ_{i_1<⋯<i_k} p^k = C(n, k) p^k

or, equivalently,

    E[ X(X − 1) ⋯ (X − k + 1) ] = n(n − 1) ⋯ (n − k + 1) p^k

The successive values E[X^k], k ≥ 3, can be recursively obtained from this identity. For instance, with k = 3, it yields

    E[ X(X − 1)(X − 2) ] = n(n − 1)(n − 2) p³

or

    E[ X³ − 3X² + 2X ] = n(n − 1)(n − 2) p³

so

    E[X³] = 3E[X²] − 2E[X] + n(n − 1)(n − 2) p³
          = 3n(n − 1) p² + np + n(n − 1)(n − 2) p³
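As a quick check of the third factorial moment (an added sketch, not from the text), the following Python snippet compares simulation with n(n − 1)(n − 2)p³ for arbitrary parameter values.

import numpy as np

rng = np.random.default_rng(6)
n, p = 20, 0.3                    # illustrative binomial parameters
X = rng.binomial(n, p, size=1_000_000)

sim = np.mean(X * (X - 1) * (X - 2))
theory = n * (n - 1) * (n - 2) * p**3
print(f"simulated E[X(X-1)(X-2)] = {sim:.2f}, theory = {theory:.2f}")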
Example  Moments of hypergeometric random variables
Suppose n balls are randomly selected from an urn containing N balls, of which m are white. Let A_i be the event that the ith ball selected is white. Then X, the number of white balls selected, is equal to the number of the events A_1, ..., A_n that occur. Because the ith ball selected is equally likely to be any of the N balls, of which m are white, P(A_i) = m/N. Consequently, Equation (3.1) gives that E[X] = Σ_{i=1}^{n} P(A_i) = nm/N. Also, since

    P(A_i A_j) = P(A_i) P(A_j | A_i) = (m/N) (m − 1)/(N − 1)

we obtain, from Equation (3.2), that

    E[ C(X, 2) ] = Σ_{i<j} (m/N) (m − 1)/(N − 1) = C(n, 2) (m/N) (m − 1)/(N − 1)

or

    E[X(X − 1)] = n(n − 1) m(m − 1)/[N(N − 1)]

showing that

    E[X²] = n(n − 1) m(m − 1)/[N(N − 1)] + E[X]

This formula yields the variance of the hypergeometric, namely,

    Var(X) = E[X²] − (E[X])²
           = n(n − 1) m(m − 1)/[N(N − 1)] + nm/N − n²m²/N²
           = (nm/N) [ (n − 1)(m − 1)/(N − 1) + 1 − nm/N ]

Higher moments of X are obtained by using Equation (3.4). Because

    P( A_{i_1} A_{i_2} ⋯ A_{i_k} ) = m(m − 1) ⋯ (m − k + 1)/[N(N − 1) ⋯ (N − k + 1)]

Equation (3.4) yields

    E[ C(X, k) ] = C(n, k) m(m − 1) ⋯ (m − k + 1)/[N(N − 1) ⋯ (N − k + 1)]

or

    E[ X(X − 1) ⋯ (X − k + 1) ] = n(n − 1) ⋯ (n − k + 1) m(m − 1) ⋯ (m − k + 1)/[N(N − 1) ⋯ (N − k + 1)]
Example  The match problem
For i = 1, ..., N, let A_i be the event that person i selects his or her own hat in the match problem. Then

    P(A_i A_j) = P(A_i) P(A_j | A_i) = (1/N) 1/(N − 1)

which follows because, conditional on person i selecting his own hat, the hat selected by person j is equally likely to be any of the other N − 1 hats, of which one is his own. Consequently, with X equal to the number of people who select their own hat, it follows from Equation (3.2) that

    E[ C(X, 2) ] = Σ_{i<j} 1/[N(N − 1)] = C(N, 2) 1/[N(N − 1)] = 1/2

thus showing that

    E[X(X − 1)] = 1

Therefore, E[X²] = 1 + E[X]. Because E[X] = Σ_{i=1}^{N} P(A_i) = 1, we obtain that

    Var(X) = E[X²] − (E[X])² = 1

Hence, both the mean and the variance of the number of matches equal 1. For higher moments, we use Equation (3.4), along with the fact that P(A_{i_1} A_{i_2} ⋯ A_{i_k}) = 1/[N(N − 1) ⋯ (N − k + 1)], to obtain

    E[ C(X, k) ] = C(N, k) 1/[N(N − 1) ⋯ (N − k + 1)] = 1/k!

or

    E[ X(X − 1) ⋯ (X − k + 1) ] = 1
Example  Another coupon-collecting problem
Suppose that there are N distinct types of coupons and that, independently of past types collected, each new one obtained is type j with probability p_j, Σ_{j=1}^{N} p_j = 1. Find the expected value and variance of the number of different types of coupons that appear among the first n collected.

Solution We will find it more convenient to work with the number of uncollected types. So, let Y equal the number of types of coupons collected, and let X = N − Y denote the number of uncollected types. With A_i defined as the event that there are no type i coupons in the collection, X is equal to the number of the events A_1, ..., A_N that occur. Because the types of the successive coupons collected are independent, and, with probability 1 − p_i, each new coupon is not type i, we have

    P(A_i) = (1 − p_i)^n

Hence, E[X] = Σ_{i=1}^{N} (1 − p_i)^n, from which it follows that

    E[Y] = N − E[X] = N − Σ_{i=1}^{N} (1 − p_i)^n

Similarly, because each of the n coupons collected is neither a type i nor a type j coupon with probability 1 − p_i − p_j, we have

    P(A_i A_j) = (1 − p_i − p_j)^n,   i ≠ j

Thus,

    E[X(X − 1)] = 2 Σ_{i<j} P(A_i A_j) = 2 Σ_{i<j} (1 − p_i − p_j)^n

or

    E[X²] = 2 Σ_{i<j} (1 − p_i − p_j)^n + E[X]

Hence, we obtain

    Var(Y) = Var(X)
           = E[X²] − (E[X])²
           = 2 Σ_{i<j} (1 − p_i − p_j)^n + Σ_{i=1}^{N} (1 − p_i)^n − [ Σ_{i=1}^{N} (1 − p_i)^n ]²

In the special case where p_i = 1/N, i = 1, ..., N, the preceding formulas give

    E[Y] = N [ 1 − (1 − 1/N)^n ]

and

    Var(Y) = N(N − 1) (1 − 2/N)^n + N (1 − 1/N)^n − N² (1 − 1/N)^{2n}
Example  The negative hypergeometric random variables
Suppose an urn contains n + m balls, of which n are special and m are ordinary. These items are removed one at a time, with each new removal being equally likely to be any of the balls that remain in the urn. The random variable Y, equal to the number of balls that need be withdrawn until a total of r special balls have been removed, is said to have a negative hypergeometric distribution. The negative hypergeometric distribution bears the same relationship to the hypergeometric distribution as the negative binomial does to the binomial. That is, in both cases, rather than considering a random variable equal to the number of successes in a fixed number of trials (as are the binomial and hypergeometric variables), they refer to the number of trials needed to obtain a fixed number of successes.

To obtain the probability mass function of a negative hypergeometric random variable Y, note that Y will equal k if both

1. the first k − 1 withdrawals consist of r − 1 special and k − r ordinary balls, and
2. the kth ball withdrawn is special.

Consequently,

    P{Y = k} = [ C(n, r − 1) C(m, k − r) / C(n + m, k − 1) ] · (n − r + 1)/(n + m − k + 1)

We will not, however, utilize the preceding probability mass function to obtain the mean and variance of Y. Rather, let us number the m ordinary balls as o_1, ..., o_m, and then, for each i = 1, ..., m, let A_i be the event that o_i is withdrawn before r special balls have been removed. Then, if X is the number of the events A_1, ..., A_m that occur, it follows that X is the number of ordinary balls that are withdrawn before a total of r special balls have been removed. Consequently,

    Y = r + X

showing that

    E[Y] = r + E[X] = r + Σ_{i=1}^{m} P(A_i)

To determine P(A_i), consider the n + 1 balls consisting of o_i along with the n special balls. Of these n + 1 balls, o_i is equally likely to be the first one withdrawn, or the second one withdrawn, ..., or the final one withdrawn. Hence, the probability that it is among the first r of these to be selected (and so is removed before a total of r special balls have been withdrawn) is r/(n + 1). Consequently,

    P(A_i) = r/(n + 1)

and

    E[Y] = r + mr/(n + 1)

Thus, for instance, the expected number of cards of a well-shuffled deck that would need to be turned over until a spade appears is 1 + 39/14 ≈ 3.786, and the expected number of cards that would need to be turned over until an ace appears is 1 + 48/5 = 10.6.

To determine Var(Y) = Var(X), we use the identity

    E[X(X − 1)] = 2 Σ_{i<j} P(A_i A_j)

Now, P(A_i A_j) is the probability that both o_i and o_j are removed before there have been a total of r special balls removed. So consider the n + 2 balls consisting of o_i, o_j, and the n special balls. Because all withdrawal orderings of these balls are equally likely, the probability that o_i and o_j are both among the first r + 1 of them to be removed (and so are both removed before r special balls have been withdrawn) is

    P(A_i A_j) = r(r + 1)/[(n + 1)(n + 2)]

Consequently,

    E[X(X − 1)] = 2 C(m, 2) r(r + 1)/[(n + 1)(n + 2)]

so

    E[X²] = m(m − 1) r(r + 1)/[(n + 1)(n + 2)] + E[X]

Because E[X] = mr/(n + 1), this yields

    Var(Y) = Var(X) = m(m − 1) r(r + 1)/[(n + 1)(n + 2)] + mr/(n + 1) − [ mr/(n + 1) ]²

A little algebra now shows that

    Var(Y) = mr(n + 1 − r)(n + m + 1)/[(n + 1)²(n + 2)]
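A short simulation sketch (added, not from the text) of the deck-of-cards special case discussed above: it estimates the expected number of cards turned over until the first spade and compares it with r + mr/(n + 1) = 1 + 39/14.

import numpy as np

rng = np.random.default_rng(7)
n_special, m_ordinary, r = 13, 39, 1     # spades vs. non-spades, wait for the first spade
n_trials = 200_000

deck = np.array([1] * n_special + [0] * m_ordinary)   # 1 marks a special ball (spade)

def draws_until_r_special(rng, deck, r):
    shuffled = rng.permutation(deck)
    # position (1-based) of the rth special ball in the shuffled order
    return np.flatnonzero(shuffled)[r - 1] + 1

sim = np.mean([draws_until_r_special(rng, deck, r) for _ in range(n_trials)])
theory = r + m_ordinary * r / (n_special + 1)
print(f"simulated E[Y] = {sim:.4f}, theory r + mr/(n+1) = {theory:.4f}")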
Example  Singletons in the coupon collector's problem
Suppose that there are n distinct types of coupons and that, independently of past types collected, each new one obtained is equally likely to be any of the n types. Suppose also that one continues to collect coupons until a complete set of at least one of each type has been obtained. Find the expected value and variance of the number of types for which exactly one coupon of that type is collected.

Solution Let X equal the number of types for which exactly one of that type is collected. Also, let T_i denote the ith type of coupon to be collected, and let A_i be the event that there is only a single type T_i coupon in the complete set. Because X is equal to the number of the events A_1, ..., A_n that occur, we have

    E[X] = Σ_{i=1}^{n} P(A_i)

Now, at the moment when the first type T_i coupon is collected, there remain n − i types that need to be collected to have a complete set. Because, starting at this moment, each of these n − i + 1 types (the n − i not yet collected and type T_i) is equally likely to be the last of these types to be collected, it follows that type T_i will be the last of these types (and so will be a singleton) with probability 1/(n − i + 1). Consequently, P(A_i) = 1/(n − i + 1), yielding

    E[X] = Σ_{i=1}^{n} 1/(n − i + 1) = Σ_{i=1}^{n} 1/i

To determine the variance of the number of singletons, let S_{i,j}, for i < j, be the event that the first type T_i coupon to be collected is still the only one of its type to have been collected at the moment that the first type T_j coupon has been collected. Then

    P(A_i A_j) = P(A_i A_j | S_{i,j}) P(S_{i,j})

Now, P(S_{i,j}) is the probability that when a type T_i has just been collected, of the n − i + 1 types consisting of type T_i and the n − i as yet uncollected types, a type T_i is not among the first j − i of these types to be collected. Because type T_i is equally likely to be the first, or second, or ..., or (n − i + 1)st of these types to be collected, we have

    P(S_{i,j}) = 1 − (j − i)/(n − i + 1) = (n + 1 − j)/(n + 1 − i)

Now, conditional on the event S_{i,j}, both A_i and A_j will occur if, at the time the first type T_j coupon is collected, of the n − j + 2 types consisting of types T_i, T_j, and the n − j as yet uncollected types, T_i and T_j are both collected after the other n − j. But this implies that

    P(A_i A_j | S_{i,j}) = 2 · 1/(n − j + 2) · 1/(n − j + 1)

Therefore,

    P(A_i A_j) = 2/[(n + 1 − i)(n + 2 − j)],   i < j

yielding

    E[X(X − 1)] = 4 Σ_{i<j} 1/[(n + 1 − i)(n + 2 − j)]

Consequently, using the previous result for E[X], we obtain

    Var(X) = 4 Σ_{i<j} 1/[(n + 1 − i)(n + 2 − j)] + Σ_{i=1}^{n} 1/i − ( Σ_{i=1}^{n} 1/i )²
4 Covariance, Variance of Sums, and Correlations

The following proposition shows that the expectation of a product of independent random variables is equal to the product of their expectations.

Proposition 4.1
If X and Y are independent, then, for any functions h and g,

    E[g(X) h(Y)] = E[g(X)] E[h(Y)]

Proof Suppose that X and Y are jointly continuous with joint density f(x, y). Then

    E[g(X) h(Y)] = ∫∫ g(x) h(y) f(x, y) dx dy
                 = ∫∫ g(x) h(y) f_X(x) f_Y(y) dx dy
                 = ∫ h(y) f_Y(y) dy ∫ g(x) f_X(x) dx
                 = E[h(Y)] E[g(X)]

The proof in the discrete case is similar.

Just as the expected value and the variance of a single random variable give us information about that random variable, so does the covariance between two random variables give us information about the relationship between the random variables.

Definition
The covariance between X and Y, denoted by Cov(X, Y), is defined by

    Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

Upon expanding the right side of the preceding definition, we see that

    Cov(X, Y) = E[XY − E[X] Y − X E[Y] + E[Y] E[X]]
              = E[XY] − E[X] E[Y] − E[X] E[Y] + E[X] E[Y]
              = E[XY] − E[X] E[Y]

Note that if X and Y are independent, then, by Proposition 4.1, Cov(X, Y) = 0. However, the converse is not true. A simple example of two dependent random variables X and Y having zero covariance is obtained by letting X be a random variable such that

    P{X = 0} = P{X = 1} = P{X = −1} = 1/3

and defining

    Y = 0 if X ≠ 0
        1 if X = 0

Now, XY = 0, so E[XY] = 0. Also, E[X] = 0. Thus,

    Cov(X, Y) = E[XY] − E[X] E[Y] = 0

However, X and Y are clearly not independent.

The following proposition lists some of the properties of covariance.
Proposition 4.2
(i) Cov(X, Y) = Cov(Y, X)
(ii) Cov(X, X) = Var(X)
(iii) Cov(aX, Y) = a Cov(X, Y)
(iv) Cov( Σ_{i=1}^{n} X_i, Σ_{j=1}^{m} Y_j ) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(X_i, Y_j)

Proof of Proposition 4.2 Parts (i) and (ii) follow immediately from the definition of covariance, and part (iii) is left as an exercise for the reader. To prove part (iv), which states that the covariance operation is additive (as is the operation of taking expectations), let μ_i = E[X_i] and ν_j = E[Y_j]. Then

    E[ Σ_i X_i ] = Σ_i μ_i,   E[ Σ_j Y_j ] = Σ_j ν_j

and

    Cov( Σ_i X_i, Σ_j Y_j ) = E[ ( Σ_i X_i − Σ_i μ_i )( Σ_j Y_j − Σ_j ν_j ) ]
                            = E[ Σ_i (X_i − μ_i) Σ_j (Y_j − ν_j) ]
                            = E[ Σ_i Σ_j (X_i − μ_i)(Y_j − ν_j) ]
                            = Σ_i Σ_j E[(X_i − μ_i)(Y_j − ν_j)]
                            = Σ_i Σ_j Cov(X_i, Y_j)

where the last equality follows because the expected value of a sum of random variables is equal to the sum of the expected values.

It follows from parts (ii) and (iv) of Proposition 4.2, upon taking Y_j = X_j, j = 1, ..., n, that

    Var( Σ_{i=1}^{n} X_i ) = Cov( Σ_i X_i, Σ_j X_j )
                           = Σ_i Σ_j Cov(X_i, X_j)
                           = Σ_i Var(X_i) + Σ_{i≠j} Cov(X_i, X_j)

Since each pair of indices i, j, i ≠ j, appears twice in the double summation, the preceding formula is equivalent to

    Var( Σ_{i=1}^{n} X_i ) = Σ_i Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)     (4.1)
If X_1, ..., X_n are pairwise independent, in that X_i and X_j are independent for i ≠ j, then Equation (4.1) reduces to

    Var( Σ_{i=1}^{n} X_i ) = Σ_{i=1}^{n} Var(X_i)

The following examples illustrate the use of Equation (4.1).

Example 4a
Let X_1, ..., X_n be independent and identically distributed random variables having expected value μ and variance σ², and, as in Example 2c, let X̄ = Σ_{i=1}^{n} X_i / n be the sample mean. The quantities X_i − X̄, i = 1, ..., n, are called deviations, as they equal the differences between the individual data and the sample mean. The random variable

    S² = Σ_{i=1}^{n} (X_i − X̄)² / (n − 1)

is called the sample variance. Find (a) Var(X̄) and (b) E[S²].

Solution
(a) Var(X̄) = (1/n)² Var( Σ_{i=1}^{n} X_i )
            = (1/n)² Σ_{i=1}^{n} Var(X_i)   by independence
            = σ²/n

(b) We start with the following algebraic identity:

    (n − 1) S² = Σ_{i=1}^{n} (X_i − μ + μ − X̄)²
               = Σ_{i=1}^{n} (X_i − μ)² + Σ_{i=1}^{n} (X̄ − μ)² − 2 (X̄ − μ) Σ_{i=1}^{n} (X_i − μ)
               = Σ_{i=1}^{n} (X_i − μ)² + n (X̄ − μ)² − 2 (X̄ − μ) n (X̄ − μ)
               = Σ_{i=1}^{n} (X_i − μ)² − n (X̄ − μ)²

Taking expectations of the preceding yields

    (n − 1) E[S²] = Σ_{i=1}^{n} E[(X_i − μ)²] − n E[(X̄ − μ)²]
                  = n σ² − n Var(X̄)
                  = (n − 1) σ²

where the final equality made use of part (a) of this example and the one preceding it made use of the result of Example 2c, namely, that E[X̄] = μ. Dividing through by n − 1 shows that the expected value of the sample variance is the distribution variance σ².
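A minimal numerical check (added, not from the text) that the sample variance with divisor n − 1 is unbiased; the normal population and the parameter values are arbitrary choices for the illustration.

import numpy as np

rng = np.random.default_rng(8)
n, mu, sigma = 8, 2.0, 3.0               # illustrative sample size and parameters
n_trials = 200_000

samples = rng.normal(mu, sigma, size=(n_trials, n))
s2 = samples.var(axis=1, ddof=1)          # sample variance with the n-1 divisor

print(f"mean of S^2 over trials = {s2.mean():.4f}, sigma^2 = {sigma**2:.4f}")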
Our next example presents another method for obtaining the variance of a binomial random variable.

Example 4b  Variance of a binomial random variable
Compute the variance of a binomial random variable X with parameters n and p.

Solution Since such a random variable represents the number of successes in n independent trials when each trial has the common probability p of being a success, we may write

    X = X_1 + ⋯ + X_n

where the X_i are independent Bernoulli random variables such that

    X_i = 1 if the ith trial is a success
          0 otherwise

Hence, from Equation (4.1), we obtain

    Var(X) = Var(X_1) + ⋯ + Var(X_n)

But

    Var(X_i) = E[X_i²] − (E[X_i])²
             = E[X_i] − (E[X_i])²   since X_i² = X_i
             = p − p²

Thus,

    Var(X) = np(1 − p)
Example 4c  Sampling from a finite population
Consider a set of N people, each of whom has an opinion about a certain subject that is measured by a real number v that represents the person's "strength of feeling" about the subject. Let v_i represent the strength of feeling of person i, i = 1, ..., N.

Suppose that the quantities v_i, i = 1, ..., N, are unknown and, to gather information, a group of n of the N people is "randomly chosen" in the sense that all of the C(N, n) subsets of size n are equally likely to be chosen. These n people are then questioned and their feelings determined. If S denotes the sum of the n sampled values, determine its mean and variance.

An important application of the preceding problem is to a forthcoming election in which each person in the population is either for or against a certain candidate or proposition. If we take v_i to equal 1 if person i is in favor and 0 if he or she is against, then v̄ = Σ_{i=1}^{N} v_i / N represents the proportion of the population that is in favor. To estimate v̄, a random sample of n people is chosen, and these people are polled. The proportion of those polled who are in favor, that is, S/n, is often used as an estimate of v̄.

Solution For each person i, i = 1, ..., N, define an indicator variable I_i to indicate whether or not that person is included in the sample. That is,

    I_i = 1 if person i is in the random sample
          0 otherwise

Now, S can be expressed by

    S = Σ_{i=1}^{N} v_i I_i

so

    E[S] = Σ_{i=1}^{N} v_i E[I_i]

    Var(S) = Σ_{i=1}^{N} Var(v_i I_i) + 2 Σ_{i<j} Cov(v_i I_i, v_j I_j)
           = Σ_{i=1}^{N} v_i² Var(I_i) + 2 Σ_{i<j} v_i v_j Cov(I_i, I_j)

Because

    E[I_i] = n/N
    E[I_i I_j] = (n/N) (n − 1)/(N − 1)

it follows that

    Var(I_i) = (n/N)(1 − n/N)

    Cov(I_i, I_j) = n(n − 1)/[N(N − 1)] − (n/N)² = −n(N − n)/[N²(N − 1)]

Hence,

    E[S] = (n/N) Σ_{i=1}^{N} v_i = n v̄

    Var(S) = (n/N)(1 − n/N) Σ_{i=1}^{N} v_i² − 2 n(N − n)/[N²(N − 1)] Σ_{i<j} v_i v_j

The expression for Var(S) can be simplified somewhat by using the identity (v_1 + ⋯ + v_N)² = Σ_{i=1}^{N} v_i² + 2 Σ_{i<j} v_i v_j. After some simplification, we obtain

    Var(S) = [n(N − n)/(N − 1)] [ Σ_{i=1}^{N} v_i²/N − v̄² ]

Consider now the special case in which Np of the v's are equal to 1 and the remainder equal to 0. Then, in this case, S is a hypergeometric random variable and has mean and variance given, respectively, by

    E[S] = n v̄ = np   since v̄ = Np/N = p

and

    Var(S) = [n(N − n)/(N − 1)] ( Np/N − p² )
           = [n(N − n)/(N − 1)] p(1 − p)

The quantity S/n, equal to the proportion of those sampled that have values equal to 1, is such that

    E[S/n] = p
    Var(S/n) = [(N − n)/(n(N − 1))] p(1 − p)
The correlation of two random variables X and Y, denoted by ρ(X, Y), is defined, as long as Var(X) Var(Y) is positive, by

    ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y))

It can be shown that

    −1 ≤ ρ(X, Y) ≤ 1     (4.2)

To prove Equation (4.2), suppose that X and Y have variances given by σ_x² and σ_y², respectively. Then, on the one hand,

    0 ≤ Var( X/σ_x + Y/σ_y )
      = Var(X)/σ_x² + Var(Y)/σ_y² + 2 Cov(X, Y)/(σ_x σ_y)
      = 2 [1 + ρ(X, Y)]

implying that

    −1 ≤ ρ(X, Y)

On the other hand,

    0 ≤ Var( X/σ_x − Y/σ_y )
      = Var(X)/σ_x² + Var(Y)/σ_y² − 2 Cov(X, Y)/(σ_x σ_y)
      = 2 [1 − ρ(X, Y)]

implying that

    ρ(X, Y) ≤ 1

which completes the proof of Equation (4.2).

In fact, since Var(Z) = 0 implies that Z is constant with probability 1, it follows from the proof of Equation (4.2) that ρ(X, Y) = 1 implies that Y = a + bX, where b = σ_y/σ_x > 0, and ρ(X, Y) = −1 implies that Y = a + bX, where b = −σ_y/σ_x < 0. We leave it as an exercise for the reader to show that the reverse is also true: that if Y = a + bX, then ρ(X, Y) is either +1 or −1, depending on the sign of b.

The correlation coefficient is a measure of the degree of linearity between X and Y. A value of ρ(X, Y) near +1 or −1 indicates a high degree of linearity between X and Y, whereas a value near 0 indicates that such linearity is absent. A positive value of ρ(X, Y) indicates that Y tends to increase when X does, whereas a negative value indicates that Y tends to decrease when X increases. If ρ(X, Y) = 0, then X and Y are said to be uncorrelated.
Example  Let I_A and I_B be indicator variables for the events A and B. That is,

    I_A = 1 if A occurs
          0 otherwise

    I_B = 1 if B occurs
          0 otherwise

Then

    E[I_A] = P(A)
    E[I_B] = P(B)
    E[I_A I_B] = P(AB)

so

    Cov(I_A, I_B) = P(AB) − P(A) P(B)
                  = P(B) [ P(A|B) − P(A) ]

Thus, we obtain the quite intuitive result that the indicator variables for A and B are either positively correlated, uncorrelated, or negatively correlated, depending on whether P(A|B) is, respectively, greater than, equal to, or less than P(A).

Our next example shows that the sample mean and a deviation from the sample mean are uncorrelated.

Example  Let X_1, ..., X_n be independent and identically distributed random variables having variance σ². Show that

    Cov(X_i − X̄, X̄) = 0
Solution We have

    Cov(X_i − X̄, X̄) = Cov(X_i, X̄) − Cov(X̄, X̄)
                     = Cov( X_i, (1/n) Σ_{j=1}^{n} X_j ) − Var(X̄)
                     = (1/n) Σ_{j=1}^{n} Cov(X_i, X_j) − σ²/n
                     = σ²/n − σ²/n = 0

where the next-to-last equality uses the result of Example 4a and the final equality follows because

    Cov(X_i, X_j) = 0 if j ≠ i, by independence
                    σ² if j = i, since Var(X_i) = σ²

Although X̄ and the deviation X_i − X̄ are uncorrelated, they are not, in general, independent. However, in the special case where the X_i are normal random variables, it turns out that not only is X̄ independent of a single deviation, but it is independent of the entire sequence of deviations X_j − X̄, j = 1, ..., n. This result will be established in Section 8, where we will also show that, in this case, the sample mean X̄ and the sample variance S² are independent, with (n − 1)S²/σ² having a chi-squared distribution with n − 1 degrees of freedom. (See Example 4a for the definition of S².)
Example  Consider m independent trials, each of which results in any of r possible outcomes, with probabilities p_1, ..., p_r, Σ_{i=1}^{r} p_i = 1. If we let N_i, i = 1, ..., r, denote the number of the m trials that result in outcome i, then N_1, N_2, ..., N_r have the multinomial distribution

    P{N_1 = n_1, ..., N_r = n_r} = [m!/(n_1! ⋯ n_r!)] p_1^{n_1} ⋯ p_r^{n_r},   Σ_{i=1}^{r} n_i = m

For i ≠ j, it seems likely that when N_i is large, N_j would tend to be small; hence, it is intuitive that they should be negatively correlated. Let us compute their covariance by using Proposition 4.2(iv) and the representation

    N_i = Σ_{k=1}^{m} I_i(k)   and   N_j = Σ_{k=1}^{m} I_j(k)

where

    I_i(k) = 1 if trial k results in outcome i
             0 otherwise

    I_j(k) = 1 if trial k results in outcome j
             0 otherwise

From Proposition 4.2(iv), we have

    Cov(N_i, N_j) = Σ_{ℓ=1}^{m} Σ_{k=1}^{m} Cov(I_i(k), I_j(ℓ))

Now, on the one hand, when k ≠ ℓ,

    Cov(I_i(k), I_j(ℓ)) = 0

since the outcome of trial k is independent of the outcome of trial ℓ. On the other hand,

    Cov(I_i(ℓ), I_j(ℓ)) = E[I_i(ℓ) I_j(ℓ)] − E[I_i(ℓ)] E[I_j(ℓ)]
                        = 0 − p_i p_j = −p_i p_j

where the equation uses the fact that I_i(ℓ) I_j(ℓ) = 0, since trial ℓ cannot result in both outcome i and outcome j. Hence, we obtain

    Cov(N_i, N_j) = −m p_i p_j

which is in accord with our intuition that N_i and N_j are negatively correlated.
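A quick empirical check of the multinomial covariance formula (an added sketch, not from the text); the trial count and probabilities are arbitrary.

import numpy as np

rng = np.random.default_rng(9)
m = 30
p = np.array([0.2, 0.5, 0.3])                   # outcome probabilities (illustrative)
counts = rng.multinomial(m, p, size=500_000)    # each row is (N1, N2, N3)

sample_cov = np.cov(counts[:, 0], counts[:, 1])[0, 1]
print(f"sample Cov(N1, N2) = {sample_cov:.4f}, theory -m p1 p2 = {-m * p[0] * p[1]:.4f}")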
5 Conditional Expectation

5.1 Definitions
If X and Y are jointly discrete random variables, then the conditional probability mass function of X, given that Y = y, is defined, for all y such that P{Y = y} > 0, by

    p_{X|Y}(x|y) = P{X = x | Y = y} = p(x, y)/p_Y(y)

It is therefore natural to define, in this case, the conditional expectation of X given that Y = y, for all values of y such that p_Y(y) > 0, by

    E[X | Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x p_{X|Y}(x|y)

Example 5a
If X and Y are independent binomial random variables with identical parameters n and p, calculate the conditional expected value of X given that X + Y = m.

Solution Let us first calculate the conditional probability mass function of X given that X + Y = m. For k ≤ min(n, m),

    P{X = k | X + Y = m} = P{X = k, X + Y = m}/P{X + Y = m}
                         = P{X = k} P{Y = m − k}/P{X + Y = m}
                         = [ C(n, k) p^k (1 − p)^{n−k} C(n, m − k) p^{m−k} (1 − p)^{n−m+k} ] / [ C(2n, m) p^m (1 − p)^{2n−m} ]
                         = C(n, k) C(n, m − k)/C(2n, m)

where we have used the fact that X + Y is a binomial random variable with parameters 2n and p. Hence, the conditional distribution of X, given that X + Y = m, is the hypergeometric distribution, and from Example 2g, we obtain

    E[X | X + Y = m] = m/2
Similarly, if X and Y are jointly continuous with a joint probability density function f(x, y), then the conditional probability density of X, given that Y = y, is defined for all values of y such that f_Y(y) > 0 by

    f_{X|Y}(x|y) = f(x, y)/f_Y(y)

It is natural, in this case, to define the conditional expectation of X, given that Y = y, by

    E[X | Y = y] = ∫_{-∞}^{∞} x f_{X|Y}(x|y) dx

provided that f_Y(y) > 0.

Example  Suppose that the joint density of X and Y is given by

    f(x, y) = (e^{−x/y} e^{−y})/y,   0 < x < ∞, 0 < y < ∞

Compute E[X | Y = y].

Solution Since f_Y(y) = ∫_0^∞ (e^{−x/y} e^{−y}/y) dx = e^{−y}, the conditional density of X given that Y = y is

    f_{X|Y}(x|y) = f(x, y)/f_Y(y) = (1/y) e^{−x/y},   x > 0

which is the exponential density with mean y. Hence,

    E[X | Y = y] = y
In the game of craps, with R equal to the number of dice rolls in a round and S equal to the initial sum rolled, conditioning on S gives E[R | win] = Σ_i E[R | win, S = i] P{S = i | win}. However, given that the initial sum is i, the number of additional rolls needed and the outcome (whether a win or a loss) are independent. (This is easily seen by first noting that, conditional on an initial sum of i, the outcome is independent of the number of additional dice rolls needed, and then using the symmetry property of independence, which states that if event A is independent of event B, then event B is independent of event A.) Therefore,

    E[R | win] = Σ_i E[R | S = i] P{S = i | win} = 2.938

Although we could determine E[R | player loses] exactly as we did E[R | player wins], it is easier to use

    E[R] = E[R | win] p + E[R | lose] (1 − p)

implying that

    E[R | lose] = ( E[R] − E[R | win] p )/(1 − p) = 3.801
The bivariate normal joint density function of the random variables X and Y is

    f(x, y) = 1/(2π σ_x σ_y √(1 − ρ²)) exp{ −[ ((x − μ_x)/σ_x)² + ((y − μ_y)/σ_y)² − 2ρ(x − μ_x)(y − μ_y)/(σ_x σ_y) ] / [2(1 − ρ²)] }

We will now show that ρ is the correlation between X and Y. As shown earlier, μ_x = E[X], σ_x² = Var(X), and μ_y = E[Y], σ_y² = Var(Y). Consequently,

    Corr(X, Y) = Cov(X, Y)/(σ_x σ_y) = ( E[XY] − μ_x μ_y )/(σ_x σ_y)

To determine E[XY], we condition on Y. That is, we use the identity

    E[XY] = E[ E[XY | Y] ]

Recalling from Example 5d that the conditional distribution of X given that Y = y is normal with mean μ_x + ρ (σ_x/σ_y)(y − μ_y), we see that

    E[XY | Y = y] = E[Xy | Y = y]
                  = y E[X | Y = y]
                  = y μ_x + ρ (σ_x/σ_y)(y² − μ_y y)

Consequently,

    E[XY | Y] = Y μ_x + ρ (σ_x/σ_y)(Y² − μ_y Y)

implying that

    E[XY] = E[ Y μ_x + ρ (σ_x/σ_y)(Y² − μ_y Y) ]
          = μ_x E[Y] + ρ (σ_x/σ_y) E[Y² − μ_y Y]
          = μ_x μ_y + ρ (σ_x/σ_y) ( E[Y²] − μ_y² )
          = μ_x μ_y + ρ (σ_x/σ_y) Var(Y)
          = μ_x μ_y + ρ σ_x σ_y

Therefore,

    Corr(X, Y) = ρ σ_x σ_y/(σ_x σ_y) = ρ
Sometimes E[X] is easy to compute, and we use the conditioning identity to compute a conditional expected value. This approach is illustrated by our next example.

Example  Consider n independent trials, each of which results in one of the outcomes 1, ..., k, with respective probabilities p_1, ..., p_k, Σ_{i=1}^{k} p_i = 1. Let N_i denote the number of trials that result in outcome i, i = 1, ..., k. For i ≠ j, find (a) E[N_j | N_i > 0] and (b) E[N_j | N_i > 1].

Solution To solve (a), let

    I = 0 if N_i = 0
        1 if N_i > 0

Then

    E[N_j] = E[N_j | I = 0] P{I = 0} + E[N_j | I = 1] P{I = 1}

or, equivalently,

    E[N_j] = E[N_j | N_i = 0] P{N_i = 0} + E[N_j | N_i > 0] P{N_i > 0}

Now, the unconditional distribution of N_j is binomial with parameters n, p_j. Also, given that N_i = r, each of the n − r trials that does not result in outcome i will, independently, result in outcome j with probability P(j | not i) = p_j/(1 − p_i). Consequently, the conditional distribution of N_j, given that N_i = r, is binomial with parameters n − r, p_j/(1 − p_i). Because P{N_i = 0} = (1 − p_i)^n, the preceding equation yields

    n p_j = n [p_j/(1 − p_i)] (1 − p_i)^n + E[N_j | N_i > 0] [1 − (1 − p_i)^n]

giving the result

    E[N_j | N_i > 0] = n p_j [1 − (1 − p_i)^{n−1}] / [1 − (1 − p_i)^n]

We can solve part (b) in a similar manner. Let

    J = 0 if N_i = 0
        1 if N_i = 1
        2 if N_i > 1

Then

    E[N_j] = E[N_j | J = 0] P{J = 0} + E[N_j | J = 1] P{J = 1} + E[N_j | J = 2] P{J = 2}

or, equivalently,

    E[N_j] = E[N_j | N_i = 0] P{N_i = 0} + E[N_j | N_i = 1] P{N_i = 1} + E[N_j | N_i > 1] P{N_i > 1}

This equation yields

    n p_j = n [p_j/(1 − p_i)] (1 − p_i)^n + (n − 1) [p_j/(1 − p_i)] n p_i (1 − p_i)^{n−1}
            + E[N_j | N_i > 1] [1 − (1 − p_i)^n − n p_i (1 − p_i)^{n−1}]

giving the result

    E[N_j | N_i > 1] = n p_j [1 − (1 − p_i)^{n−1} − (n − 1) p_i (1 − p_i)^{n−2}] / [1 − (1 − p_i)^n − n p_i (1 − p_i)^{n−1}]
It is also possible to obtain the variance of a random variable by conditioning. We illustrate this approach by the following example.

Example  Variance of the geometric distribution
Independent trials, each resulting in a success with probability p, are successively performed. Let N be the time of the first success. Find Var(N).

Solution Let Y = 1 if the first trial results in a success and Y = 0 otherwise. Now,

    Var(N) = E[N²] − (E[N])²

To calculate E[N²], we condition on Y as follows:

    E[N²] = E[ E[N² | Y] ]

However,

    E[N² | Y = 1] = 1
    E[N² | Y = 0] = E[(1 + N)²]

These two equations follow because, on the one hand, if the first trial results in a success, then, clearly, N = 1; thus, N² = 1. On the other hand, if the first trial results in a failure, then the total number of trials necessary for the first success will have the same distribution as 1 (the first trial that results in failure) plus the necessary number of additional trials. Since the latter quantity has the same distribution as N, we obtain E[N² | Y = 0] = E[(1 + N)²]. Hence,

    E[N²] = E[N² | Y = 1] P{Y = 1} + E[N² | Y = 0] P{Y = 0}
          = p + (1 − p) E[(1 + N)²]
          = 1 + (1 − p) E[2N + N²]

However, E[N] = 1/p; therefore,

    E[N²] = 1 + 2(1 − p)/p + (1 − p) E[N²]

or

    E[N²] = (2 − p)/p²

Consequently,

    Var(N) = E[N²] − (E[N])²
           = (2 − p)/p² − (1/p)²
           = (1 − p)/p²
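As an added verification sketch (not from the text), the following Python snippet samples geometric random variables and compares the empirical variance with (1 − p)/p²; p = 0.3 is an arbitrary choice.

import numpy as np

rng = np.random.default_rng(10)
p = 0.3
N = rng.geometric(p, size=1_000_000)   # number of trials until the first success

print(f"simulated Var(N) = {N.var():.4f}, theory (1-p)/p^2 = {(1 - p) / p**2:.4f}")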
Example  Consider a gambling situation in which there are r players, with player i initially having n_i units, n_i > 0, i = 1, ..., r. At each stage, two of the players are chosen to play a game, with the winner of the game receiving 1 unit from the loser. Any player whose fortune drops to 0 is eliminated, and this continues until a single player has all n = Σ_{i=1}^{r} n_i units, with that player designated as the victor. Assuming that the results of successive games are independent and that each game is equally likely to be won by either of its two players, find the average number of stages until one of the players has all n units.

Solution To find the expected number of stages played, suppose first that there are only 2 players, with players 1 and 2 initially having j and n − j units, respectively. Let X_j denote the number of stages that will be played, and let m_j = E[X_j]. Then, for j = 1, ..., n − 1,

    X_j = 1 + A_j

where A_j is the additional number of stages needed beyond the first stage. Taking expectations gives

    m_j = 1 + E[A_j]

Conditioning on the result of the first stage then yields

    m_j = 1 + E[A_j | 1 wins first stage](1/2) + E[A_j | 2 wins first stage](1/2)

Now, if player 1 wins at the first stage, then the situation from that point on is exactly the same as in a problem that supposes that player 1 starts with j + 1 and player 2 with n − (j + 1) units. Consequently,

    E[A_j | 1 wins first stage] = m_{j+1}

and, analogously,

    E[A_j | 2 wins first stage] = m_{j−1}

Thus,

    m_j = 1 + (1/2) m_{j+1} + (1/2) m_{j−1}

or, equivalently,

    m_{j+1} = 2 m_j − m_{j−1} − 2,   j = 1, ..., n − 1     (5.4)

Using that m_0 = 0, the preceding equation yields

    m_2 = 2m_1 − 2
    m_3 = 2m_2 − m_1 − 2 = 3m_1 − 6 = 3(m_1 − 2)
    m_4 = 2m_3 − m_2 − 2 = 4m_1 − 12 = 4(m_1 − 3)

suggesting that

    m_i = i (m_1 − i + 1),   i = 1, ..., n     (5.5)

To prove the preceding equality, we use mathematical induction. Since we've already shown the equation to be true for i = 1, 2, we take as the induction hypothesis that it is true whenever i ≤ j < n. Now we must prove that it is true for j + 1. Using Equation (5.4) yields

    m_{j+1} = 2 m_j − m_{j−1} − 2
            = 2j(m_1 − j + 1) − (j − 1)(m_1 − j + 2) − 2   (by the induction hypothesis)
            = (j + 1) m_1 − 2j² + 2j + j² − 3j + 2 − 2
            = (j + 1) m_1 − j² − j
            = (j + 1)(m_1 − j)

which completes the induction proof of (5.5). Letting i = n in (5.5), and using that m_n = 0, now yields that

    m_1 = n − 1

which, again using (5.5), gives the result

    m_i = i(n − i)

Thus, the mean number of games played when there are only 2 players with initial amounts i and n − i is the product of their initial amounts. Because both players play all stages, this is also the mean number of stages involving player 1.

Now let us return to the problem involving r players with initial amounts n_i, i = 1, ..., r, Σ_{i=1}^{r} n_i = n. Let X denote the number of stages needed to obtain a victor, and let X_i denote the number of stages involving player i. Now, from the point of view of player i, starting with n_i, he will continue to play stages, independently being equally likely to win or lose each one, until his fortune is either n or 0. Thus, the number of stages he plays is exactly the same as when he has a single opponent with an initial fortune of n − n_i. Consequently, by the preceding result, it follows that

    E[X_i] = n_i (n − n_i)

so

    E[ Σ_{i=1}^{r} X_i ] = Σ_{i=1}^{r} n_i (n − n_i)

But because each stage involves two players,

    X = (1/2) Σ_{i=1}^{r} X_i

Taking expectations now yields

    E[X] = (1/2) Σ_{i=1}^{r} n_i (n − n_i)

It is interesting to note that while our argument shows that the mean number of stages does not depend on the manner in which the players are selected at each stage, the same is not true for the distribution of the number of stages. To see this, suppose r = 3 and n_3 = 2. If players 1 and 2 are chosen in the first stage, then it will take at least three stages to determine a winner, whereas if player 3 is in the first stage, then it is possible for there to be only two stages.
In our next example, we use conditioning to verify that the expected number of uniform (0, 1) random variables that need to be added for their sum to exceed 1 is equal to e.

Example  Let U_1, U_2, ... be a sequence of independent uniform (0, 1) random variables. Find E[N] when

    N = min{ n : Σ_{i=1}^{n} U_i > 1 }

Solution We will find E[N] by obtaining a more general result. For x ∈ [0, 1], let

    N(x) = min{ n : Σ_{i=1}^{n} U_i > x }

and set

    m(x) = E[N(x)]

That is, N(x) is the number of uniform (0, 1) random variables we must add until their sum exceeds x, and m(x) is its expected value. We will now derive an equation for m(x) by conditioning on U_1. This gives, from Equation (5.1b),

    m(x) = ∫_0^1 E[N(x) | U_1 = y] dy     (5.6)

Now,

    E[N(x) | U_1 = y] = 1                if y > x
                        1 + m(x − y)     if y ≤ x     (5.7)

The preceding formula is obviously true when y > x. It is also true when y ≤ x, since, if the first uniform value is y, then, at that point, the remaining number of uniform random variables needed is the same as if we were just starting and were going to add uniform random variables until their sum exceeded x − y. Substituting Equation (5.7) into Equation (5.6) gives

    m(x) = 1 + ∫_0^x m(x − y) dy
         = 1 + ∫_0^x m(u) du   by letting u = x − y

Differentiating the preceding equation yields

    m′(x) = m(x)

or, equivalently,

    m′(x)/m(x) = 1

Integrating this equation gives

    log[m(x)] = x + c

or

    m(x) = k e^x

Since m(0) = 1, it follows that k = 1, so we obtain

    m(x) = e^x

Therefore, m(1), the expected number of uniform (0, 1) random variables that need to be added until their sum exceeds 1, is equal to e.
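A tiny simulation sketch (added, not from the text) confirming that the mean count is close to e.

import numpy as np

rng = np.random.default_rng(11)
n_trials = 500_000

def count_until_sum_exceeds_one(rng):
    # Add uniform (0, 1) values until the running total exceeds 1.
    total, count = 0.0, 0
    while total <= 1.0:
        total += rng.random()
        count += 1
    return count

sim = np.mean([count_until_sum_exceeds_one(rng) for _ in range(n_trials)])
print(f"simulated E[N] = {sim:.5f}, e = {np.e:.5f}")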
5.3 Computing Probabilities by Conditioning
Not only can we obtain expectations by first conditioning on an appropriate random variable, but we can also use this approach to compute probabilities. To see this, let E denote an arbitrary event, and define the indicator random variable X by

    X = 1 if E occurs
        0 if E does not occur

It follows from the definition of X that

    E[X] = P(E)
    E[X | Y = y] = P(E | Y = y)   for any random variable Y

Therefore, from Equations (5.1a) and (5.1b), we obtain

    P(E) = Σ_y P(E | Y = y) P(Y = y)           if Y is discrete
         = ∫_{-∞}^{∞} P(E | Y = y) f_Y(y) dy   if Y is continuous     (5.8)

Note that if Y is a discrete random variable taking on one of the values y_1, ..., y_n, then, by defining the events F_i, i = 1, ..., n, by F_i = {Y = y_i}, Equation (5.8) reduces to the familiar equation

    P(E) = Σ_{i=1}^{n} P(E | F_i) P(F_i)

where F_1, ..., F_n are mutually exclusive events whose union is the sample space.
Example  The best-prize problem
Suppose that we are to be presented with n distinct prizes, in sequence. After being presented with a prize, we must immediately decide whether to accept it or to reject it and consider the next prize. The only information we are given when deciding whether to accept a prize is the relative rank of that prize compared to ones already seen. That is, for instance, when the fifth prize is presented, we learn how it compares with the four prizes we've already seen. Suppose that once a prize is rejected, it is lost, and that our objective is to maximize the probability of obtaining the best prize. Assuming that all n! orderings of the prizes are equally likely, how well can we do?

Solution Rather surprisingly, we can do quite well. To see this, fix a value k, 0 ≤ k < n, and consider the strategy that rejects the first k prizes and then accepts the first one that is better than all of those first k. Let P_k(best) denote the probability that the best prize is selected when this strategy is employed, and let X denote the position of the best prize. Conditioning on X gives

    P_k(best) = Σ_{i=1}^{n} P_k(best | X = i) P(X = i)
              = (1/n) Σ_{i=1}^{n} P_k(best | X = i)

Now, on the one hand, if the overall best prize is among the first k, then no prize is ever selected under the strategy considered. That is,

    P_k(best | X = i) = 0   if i ≤ k

On the other hand, if the best prize is in position i, where i > k, then the best prize will be selected if the best of the first i − 1 prizes is among the first k (for then none of the prizes in positions k + 1, k + 2, ..., i − 1 would be selected). But, conditional on the best prize being in position i, it is easy to verify that all possible orderings of the other prizes remain equally likely, which implies that each of the first i − 1 prizes is equally likely to be the best of that batch. Hence, we have

    P_k(best | X = i) = P(best of first i − 1 is among the first k | X = i)
                      = k/(i − 1)   if i > k

From the preceding, we obtain

    P_k(best) = (k/n) Σ_{i=k+1}^{n} 1/(i − 1)
              ≈ (k/n) log(n/k)

Now, if we consider the function

    g(x) = (x/n) log(n/x)

then

    g′(x) = (1/n) log(n/x) − 1/n

so

    g′(x) = 0  implies  log(n/x) = 1, that is, x = n/e

Thus, since P_k(best) ≈ g(k), we see that the best strategy of the type considered is to let the first n/e prizes go by and then accept the first one to appear that is better than all of those. In addition, since g(n/e) = 1/e, the probability that this strategy selects the best prize is approximately 1/e ≈ .36788.

Remark Most people are quite surprised by the size of the probability of obtaining the best prize, thinking that this probability would be close to 0 when n is large. However, even without going through the calculations, a little thought reveals that the probability of obtaining the best prize can be made reasonably large. Consider the strategy of letting half of the prizes go by and then selecting the first one to appear that is better than all of those. The probability that a prize is actually selected is the probability that the overall best is among the second half, and this is 1/2. In addition, given that a prize is selected, at the time of selection that prize would have been the best of more than n/2 prizes to have appeared and would thus have probability of at least 1/2 of being the overall best. Hence, the strategy of letting the first half of all prizes go by and then accepting the first one that is better than all of those prizes has a probability greater than 1/4 of obtaining the best prize.
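A short simulation sketch (added, not from the text) of the "let about n/e go by" strategy; n = 100 is an arbitrary choice, and the empirical success rate should be close to 1/e.

import numpy as np

rng = np.random.default_rng(12)
n = 100
k = int(round(n / np.e))                  # let roughly n/e prizes go by
n_trials = 200_000

def selects_best(rng, n, k):
    ranks = rng.permutation(n)            # higher value = better prize
    threshold = ranks[:k].max() if k > 0 else -1
    for value in ranks[k:]:
        if value > threshold:             # first prize better than all of the first k
            return value == n - 1         # did we pick the overall best?
    return False                          # no prize was ever accepted

hits = np.mean([selects_best(rng, n, k) for _ in range(n_trials)])
print(f"P(select best) with k = n/e: {hits:.4f}, 1/e = {1 / np.e:.4f}")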
Example  Let U be a uniform random variable on (0, 1), and suppose that the conditional distribution of X, given that U = p, is binomial with parameters n and p. Find the probability mass function of X.

Solution Conditioning on the value of U gives

    P{X = i} = ∫_0^1 P{X = i | U = p} f_U(p) dp
             = ∫_0^1 P{X = i | U = p} dp
             = n!/[i!(n − i)!] ∫_0^1 p^i (1 − p)^{n−i} dp

Now, it can be shown that

    ∫_0^1 p^i (1 − p)^{n−i} dp = i!(n − i)!/(n + 1)!

Hence, we obtain

    P{X = i} = 1/(n + 1),   i = 0, ..., n

That is, we obtain the surprising result that if a coin whose probability of coming up heads is uniformly distributed over (0, 1) is flipped n times, then the number of heads occurring is equally likely to be any of the values 0, ..., n.

Because the preceding conditional distribution has such a nice form, it is worth trying to find another argument to enhance our intuition as to why such a result is true. To do so, let U, U_1, ..., U_n be n + 1 independent uniform (0, 1) random variables, and let X denote the number of the random variables U_1, ..., U_n that are smaller than U. Since all the random variables U, U_1, ..., U_n have the same distribution, it follows that U is equally likely to be the smallest, or second smallest, ..., or largest of them; so X is equally likely to be any of the values 0, 1, ..., n. However, given that U = p, the number of the U_i that are less than U is a binomial random variable with parameters n and p, thus establishing our previous result.
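An added simulation sketch (not from the text) of the coin with uniformly distributed bias: the empirical distribution of the number of heads should be close to uniform on {0, ..., n}; n = 6 is arbitrary.

import numpy as np

rng = np.random.default_rng(13)
n = 6
n_trials = 1_000_000

U = rng.uniform(size=n_trials)            # random coin bias for each trial
X = rng.binomial(n, U)                    # heads in n flips of that coin

counts = np.bincount(X, minlength=n + 1) / n_trials
print("empirical P(X = i):", np.round(counts, 4))
print("theory 1/(n+1)    :", round(1 / (n + 1), 4))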
Example  Suppose that X and Y are independent continuous random variables having densities f_X and f_Y, respectively. Compute P{X < Y}.

Solution Conditioning on the value of Y yields

    P{X < Y} = ∫_{-∞}^{∞} P{X < Y | Y = y} f_Y(y) dy
             = ∫_{-∞}^{∞} P{X < y | Y = y} f_Y(y) dy
             = ∫_{-∞}^{∞} P{X < y} f_Y(y) dy   by independence
             = ∫_{-∞}^{∞} F_X(y) f_Y(y) dy

where F_X is the distribution function of X.

Example  If X is a binomial random variable with parameters n and p, then its moment generating function is

    M(t) = E[e^{tX}] = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n−k}
         = (p e^t + 1 − p)^n

where the last equality follows from the binomial theorem. Differentiation yields

    M′(t) = n (p e^t + 1 − p)^{n−1} p e^t

Thus,

    E[X] = M′(0) = np

Differentiating a second time yields

    M″(t) = n(n − 1) (p e^t + 1 − p)^{n−2} (p e^t)² + n (p e^t + 1 − p)^{n−1} p e^t

so

    E[X²] = M″(0) = n(n − 1) p² + np

The variance of X is given by

    Var(X) = E[X²] − (E[X])²
           = n(n − 1) p² + np − n² p²
           = np(1 − p)

verifying the result obtained previously in the chapter.
Example  Poisson distribution with mean λ
If X is a Poisson random variable with parameter λ, then

    M(t) = E[e^{tX}] = Σ_{n=0}^{∞} e^{tn} e^{−λ} λ^n/n!
         = e^{−λ} Σ_{n=0}^{∞} (λ e^t)^n/n!
         = exp{λ(e^t − 1)}

Differentiation yields

    M′(t) = λ e^t exp{λ(e^t − 1)}
    M″(t) = (λ e^t)² exp{λ(e^t − 1)} + λ e^t exp{λ(e^t − 1)}

Thus,

    E[X] = M′(0) = λ
    E[X²] = M″(0) = λ² + λ
    Var(X) = E[X²] − (E[X])² = λ
Example  Exponential distribution with parameter λ
If X is an exponential random variable with parameter λ, then

    M(t) = E[e^{tX}] = ∫_0^∞ e^{tx} λ e^{−λx} dx
         = λ ∫_0^∞ e^{−(λ−t)x} dx
         = λ/(λ − t)   for t < λ

Table 1 lists the common discrete probability distributions (binomial, Poisson, geometric with parameter 0 < p ≤ 1, negative binomial) together with their moment generating functions, means, and variances; the table itself is not reproduced here.
Another important result is that the moment generating function uniquely determines
the distribution. That is, if M(t) exists and is finite in some region about t = 0,
then the distribution of X is uniquely determined. For instance, if

M(t) = \left(\frac{e^t + 1}{2}\right)^{10}

then it follows from Table 1 that X is a binomial random variable with parameters
10 and 1/2.

Suppose that the moment generating function of a random variable X is given by
M(t) = e^{3(e^t - 1)}. What is P{X = 0}?

Solution  We see from Table 1 that M(t) = e^{3(e^t - 1)} is the moment generating function
of a Poisson random variable with mean 3. Hence, by the one-to-one correspondence
between moment generating functions and distribution functions, it follows
that X must be a Poisson random variable with mean 3. Thus, P{X = 0} = e^{-3}.

Sums of independent binomial random variables

If X and Y are independent binomial random variables with parameters (n, p) and
(m, p), respectively, what is the distribution of X + Y?

Solution  The moment generating function of X + Y is given by

M_{X+Y}(t) = M_X(t)M_Y(t) = (pe^t + 1 - p)^n (pe^t + 1 - p)^m = (pe^t + 1 - p)^{m+n}

However, (pe^t + 1 - p)^{m+n} is the moment generating function of a binomial random
variable having parameters m + n and p. Thus, this must be the distribution
of X + Y.
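To see the identification of the distribution at work numerically, the following sketch (added for illustration; the parameter values are arbitrary) compares the simulated distribution of X + Y with binomial(m + n, p) probabilities.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
n, m, p, trials = 8, 5, 0.3, 200_000

s = rng.binomial(n, p, trials) + rng.binomial(m, p, trials)    # X + Y
empirical = np.bincount(s, minlength=n + m + 1) / trials

theoretical = np.array([comb(n + m, k) * p**k * (1 - p)**(n + m - k)
                        for k in range(n + m + 1)])
print(np.abs(empirical - theoretical).max())    # small, on the order of 1e-3
```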
Table 2  Continuous probability distributions, together with their probability density functions, moment generating functions, means, and variances.

Sums of independent Poisson random variables

Calculate the distribution of X + Y when X and Y are independent Poisson random
variables with respective means λ_1 and λ_2.

Solution

M_{X+Y}(t) = M_X(t)M_Y(t)
           = \exp\{\lambda_1(e^t - 1)\} \exp\{\lambda_2(e^t - 1)\}
           = \exp\{(\lambda_1 + \lambda_2)(e^t - 1)\}

Hence, X + Y is Poisson distributed with mean λ_1 + λ_2.
Sums of independent normal random variables

Show that if X and Y are independent normal random variables with respective
parameters (μ_1, σ_1^2) and (μ_2, σ_2^2), then X + Y is normal with mean μ_1 + μ_2 and
variance σ_1^2 + σ_2^2.

Solution

M_{X+Y}(t) = M_X(t)M_Y(t)
           = \exp\left\{\frac{\sigma_1^2 t^2}{2} + \mu_1 t\right\} \exp\left\{\frac{\sigma_2^2 t^2}{2} + \mu_2 t\right\}
           = \exp\left\{\frac{(\sigma_1^2 + \sigma_2^2) t^2}{2} + (\mu_1 + \mu_2) t\right\}

which is the moment generating function of a normal random variable with mean
μ_1 + μ_2 and variance σ_1^2 + σ_2^2. The desired result then follows because the moment
generating function uniquely determines the distribution.
Compute the moment generating function of a chi-squared random variable with n
degrees of freedom.

Solution  We can represent such a random variable as

Z_1^2 + \cdots + Z_n^2

where Z_1, ..., Z_n are independent standard normal random variables. Let M(t) be
its moment generating function. Then, by the preceding,

M(t) = \left(E\left[e^{tZ^2}\right]\right)^n

where Z is a standard normal random variable. Now,

E\left[e^{tZ^2}\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx^2} e^{-x^2/2}\,dx
                       = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2\sigma^2}\,dx \qquad \text{where } \sigma^2 = (1 - 2t)^{-1}
                       = \sigma
                       = (1 - 2t)^{-1/2}

where the next-to-last equality uses the fact that the normal density with mean 0 and
variance σ² integrates to 1. Therefore,

M(t) = (1 - 2t)^{-n/2}
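The formula M(t) = (1 − 2t)^{−n/2} can be checked by simulation for any fixed t < 1/2; the snippet below (an added sketch, not part of the text) averages e^{tX} over chi-squared values built as sums of squared standard normals.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, trials = 5, 0.2, 500_000                 # requires t < 1/2

z = rng.standard_normal((trials, n))
chi2 = (z**2).sum(axis=1)                      # chi-squared with n degrees of freedom

estimate = np.exp(t * chi2).mean()             # Monte Carlo estimate of E[e^{tX}]
exact = (1 - 2 * t) ** (-n / 2)
print(estimate, exact)                         # both close to about 3.59
```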
Moment generating function of the sum of a random number of random variables
Let X_1, X_2, ... be a sequence of independent and identically distributed random
variables, and let N be a nonnegative, integer-valued random variable that is independent
of the sequence X_i, i ≥ 1. We want to compute the moment generating
function of

Y = \sum_{i=1}^{N} X_i

(In Example 5d, Y was interpreted as the amount of money spent in a store on a
given day when both the amount spent by a customer and the number of customers
are random variables.)

To compute the moment generating function of Y, we first condition on N as
follows:

E\left[\exp\left\{t \sum_{i=1}^{N} X_i\right\} \,\middle|\, N = n\right]
 = E\left[\exp\left\{t \sum_{i=1}^{n} X_i\right\} \,\middle|\, N = n\right]
 = E\left[\exp\left\{t \sum_{i=1}^{n} X_i\right\}\right]
 = [M_X(t)]^n

where

M_X(t) = E\left[e^{tX_i}\right]

Hence,

E\left[e^{tY} \mid N\right] = (M_X(t))^N

Thus,

M_Y(t) = E\left[(M_X(t))^N\right]

The moments of Y can now be obtained upon differentiation, as follows:

M_Y'(t) = E\left[N (M_X(t))^{N-1} M_X'(t)\right]

So

E[Y] = M_Y'(0)
     = E\left[N (M_X(0))^{N-1} M_X'(0)\right]
     = E[N E[X]]            (7.2)
     = E[N] E[X]

verifying the result of Example 5d. (In this last set of equalities, we have used the
fact that M_X(0) = E[e^{0 \cdot X}] = 1.)
Also,

M_Y''(t) = E\left[N(N - 1)(M_X(t))^{N-2}(M_X'(t))^2 + N(M_X(t))^{N-1} M_X''(t)\right]

so

E[Y^2] = M_Y''(0)
       = E\left[N(N - 1)(E[X])^2 + N E[X^2]\right]
       = (E[X])^2\left(E[N^2] - E[N]\right) + E[N] E[X^2]            (7.3)
       = E[N]\left(E[X^2] - (E[X])^2\right) + (E[X])^2 E[N^2]
       = E[N]\operatorname{Var}(X) + (E[X])^2 E[N^2]

Hence, from Equations (7.2) and (7.3), we have

\operatorname{Var}(Y) = E[N]\operatorname{Var}(X) + (E[X])^2\left(E[N^2] - (E[N])^2\right)
                      = E[N]\operatorname{Var}(X) + (E[X])^2 \operatorname{Var}(N)
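The formulas E[Y] = E[N]E[X] and Var(Y) = E[N]Var(X) + (E[X])²Var(N) are easy to confirm numerically. The sketch below (an illustration only; the Poisson number of customers and exponential amounts are assumptions, not from the text) simulates such a random sum.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, mean_x, trials = 4.0, 2.0, 200_000     # N ~ Poisson(lam), X_i ~ exponential(mean_x)

totals = np.array([rng.exponential(mean_x, size=k).sum()
                   for k in rng.poisson(lam, trials)])

# Exponential(mean 2) has Var(X) = 4; Poisson(4) has Var(N) = 4.
ey = lam * mean_x                           # E[N]E[X] = 8
vy = lam * mean_x**2 + mean_x**2 * lam      # E[N]Var(X) + (E[X])^2 Var(N) = 32
print(totals.mean(), ey)
print(totals.var(), vy)
```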
Let Y denote a uniform random variable on (0, 1), and suppose that, conditional on
Y = p, the random variable X has a binomial distribution with parameters n and p.
In Example 5k, we showed that X is equally likely to take on any of the values
0, 1, ..., n. Establish this result by using moment generating functions.

Solution  To compute the moment generating function of X, start by conditioning
on the value of Y. Using the formula for the binomial moment generating function
gives

E\left[e^{tX} \mid Y = p\right] = (pe^t + 1 - p)^n

Now, Y is uniform on (0, 1), so, upon taking expectations, we obtain

E\left[e^{tX}\right] = \int_0^1 (pe^t + 1 - p)^n\,dp
 = \frac{1}{e^t - 1} \int_1^{e^t} y^n\,dy \qquad \text{(by the substitution } y = pe^t + 1 - p\text{)}
 = \frac{1}{n + 1}\,\frac{e^{t(n+1)} - 1}{e^t - 1}
 = \frac{1}{n + 1}\left(1 + e^t + e^{2t} + \cdots + e^{nt}\right)

Because the preceding is the moment generating function of a random variable that
is equally likely to be any of the values 0, 1, ..., n, the desired result follows from the
fact that the moment generating function of a random variable uniquely determines
its distribution.
7.1 Joint Moment Generating Functions
It is also possible to define the joint moment generating function of two or more
random variables. This is done as follows: For any n random variables X_1, ..., X_n,
the joint moment generating function, M(t_1, ..., t_n), is defined, for all real values of
t_1, ..., t_n, by

M(t_1, \ldots, t_n) = E\left[e^{t_1 X_1 + \cdots + t_n X_n}\right]

The individual moment generating functions can be obtained from M(t_1, ..., t_n) by
letting all but one of the t_j's be 0. That is,

M_{X_i}(t) = E\left[e^{tX_i}\right] = M(0, \ldots, 0, t, 0, \ldots, 0)

where the t is in the ith place.
It can be proven (although the proof is too advanced for this chapter) that the
joint moment generating function M(t_1, ..., t_n) uniquely determines the joint distribution
of X_1, ..., X_n. This result can then be used to prove that the n random
variables X_1, ..., X_n are independent if and only if

M(t_1, \ldots, t_n) = M_{X_1}(t_1) \cdots M_{X_n}(t_n)            (7.4)

For the proof in one direction, if the n random variables are independent, then

M(t_1, \ldots, t_n) = E\left[e^{t_1 X_1 + \cdots + t_n X_n}\right]
 = E\left[e^{t_1 X_1} \cdots e^{t_n X_n}\right]
 = E\left[e^{t_1 X_1}\right] \cdots E\left[e^{t_n X_n}\right] \qquad \text{by independence}
 = M_{X_1}(t_1) \cdots M_{X_n}(t_n)

For the proof in the other direction, if Equation (7.4) is satisfied, then the joint
moment generating function M(t_1, ..., t_n) is the same as the joint moment generating
function of n independent random variables, the ith of which has the same distribution
as X_i. As the joint moment generating function uniquely determines the joint
distribution, this must be the joint distribution; hence, the random variables are
independent.
Let X and Y be independent normal random variables, each with mean μ and variance
σ². Let us establish that X + Y and X − Y are independent by computing
their joint moment generating function:

E\left[e^{t(X+Y) + s(X-Y)}\right] = E\left[e^{(t+s)X + (t-s)Y}\right]
 = E\left[e^{(t+s)X}\right] E\left[e^{(t-s)Y}\right]
 = e^{\mu(t+s) + \sigma^2(t+s)^2/2}\, e^{\mu(t-s) + \sigma^2(t-s)^2/2}
 = e^{2\mu t + \sigma^2 t^2}\, e^{\sigma^2 s^2}

But we recognize the preceding as the joint moment generating function of the sum
of a normal random variable with mean 2μ and variance 2σ² and an independent
normal random variable with mean 0 and variance 2σ². Because the joint moment
generating function uniquely determines the joint distribution, it follows that X + Y
and X − Y are independent normal random variables.
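A quick numerical illustration (added here, not from the text): the sample covariance of X + Y and X − Y is essentially zero, and since the pair is jointly normal, zero covariance already implies independence.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, trials = 1.0, 2.0, 300_000

x = rng.normal(mu, sigma, trials)
y = rng.normal(mu, sigma, trials)
u, v = x + y, x - y

print(np.cov(u, v)[0, 1])     # near 0: Cov(X+Y, X-Y) = Var(X) - Var(Y) = 0
print(u.var(), v.var())       # both near 2*sigma**2 = 8
```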
Suppose that the number of events that occur is a Poisson random variable with
mean λ and that each event is independently counted with probability p. Show that
the number of counted events and the number of uncounted events are independent
Poisson random variables with respective means λp and λ(1 − p).

Solution  Let X denote the total number of events, and let X_c denote the number of
them that are counted. To compute the joint moment generating function of X_c, the
number of events that are counted, and X − X_c, the number that are uncounted,
start by conditioning on X to obtain

E\left[e^{sX_c + t(X - X_c)} \mid X = n\right] = e^{tn} E\left[e^{(s-t)X_c} \mid X = n\right]
 = e^{tn}\left(pe^{s-t} + 1 - p\right)^n
 = \left(pe^s + (1 - p)e^t\right)^n

which follows because, conditional on X = n, X_c is a binomial random variable with
parameters n and p. Hence,

E\left[e^{sX_c + t(X - X_c)} \mid X\right] = \left(pe^s + (1 - p)e^t\right)^X

Taking expectations of both sides of this equation yields

E\left[e^{sX_c + t(X - X_c)}\right] = E\left[\left(pe^s + (1 - p)e^t\right)^X\right]

Now, since X is Poisson with mean λ, it follows that E[e^{tX}] = e^{\lambda(e^t - 1)}. Therefore, for
any positive value a, we see (by letting a = e^t) that E[a^X] = e^{\lambda(a - 1)}. Thus,

E\left[e^{sX_c + t(X - X_c)}\right] = e^{\lambda(pe^s + (1 - p)e^t - 1)}
 = e^{\lambda p(e^s - 1)}\, e^{\lambda(1 - p)(e^t - 1)}

As the preceding is the joint moment generating function of independent Poisson
random variables with respective means λp and λ(1 − p), the result is proven.
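The thinning result can be observed directly by simulation; the sketch below (added for illustration, with arbitrary parameter values) splits Poisson counts into counted and uncounted events and compares their moments with the predictions λp and λ(1 − p).

```python
import numpy as np

rng = np.random.default_rng(5)
lam, p, trials = 6.0, 0.3, 300_000

x = rng.poisson(lam, trials)        # total number of events
xc = rng.binomial(x, p)             # counted events: binomial(X, p) given X
xu = x - xc                         # uncounted events

print(xc.mean(), xc.var())          # both near lam*p = 1.8 (a Poisson signature)
print(xu.mean(), xu.var())          # both near lam*(1 - p) = 4.2
print(np.cov(xc, xu)[0, 1])         # near 0, consistent with independence
```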
8 Additional Properties of Normal Random Variables
8.1 The Multivariate Normal Distribution
Let Z_1, ..., Z_n be a set of n independent unit normal random variables. If, for some
constants a_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and μ_i, 1 ≤ i ≤ m,

X_1 = a_{11} Z_1 + \cdots + a_{1n} Z_n + \mu_1
X_2 = a_{21} Z_1 + \cdots + a_{2n} Z_n + \mu_2
\vdots
X_i = a_{i1} Z_1 + \cdots + a_{in} Z_n + \mu_i
\vdots
X_m = a_{m1} Z_1 + \cdots + a_{mn} Z_n + \mu_m

then the random variables X_1, ..., X_m are said to have a multivariate normal distribution.
From the fact that the sum of independent normal random variables is itself a
normal random variable, it follows that each X_i is a normal random variable with
mean and variance given, respectively, by

E[X_i] = \mu_i
\operatorname{Var}(X_i) = \sum_{j=1}^{n} a_{ij}^2

Let us now consider

M(t_1, \ldots, t_m) = E\left[\exp\{t_1 X_1 + \cdots + t_m X_m\}\right]

the joint moment generating function of X_1, ..., X_m. The first thing to note is that
since \sum_{i=1}^{m} t_i X_i is itself a linear combination of the independent normal random variables
Z_1, ..., Z_n, it is also normally distributed. Its mean and variance are

E\left[\sum_{i=1}^{m} t_i X_i\right] = \sum_{i=1}^{m} t_i \mu_i

and

\operatorname{Var}\left(\sum_{i=1}^{m} t_i X_i\right) = \operatorname{Cov}\left(\sum_{i=1}^{m} t_i X_i,\; \sum_{j=1}^{m} t_j X_j\right) = \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j \operatorname{Cov}(X_i, X_j)

Now, if Y is a normal random variable with mean μ and variance σ², then

E\left[e^Y\right] = M_Y(t)\big|_{t=1} = e^{\mu + \sigma^2/2}

Thus,

M(t_1, \ldots, t_m) = \exp\left\{\sum_{i=1}^{m} t_i \mu_i + \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} t_i t_j \operatorname{Cov}(X_i, X_j)\right\}

which shows that the joint distribution of X_1, ..., X_m is completely determined from
a knowledge of the values of E[X_i] and Cov(X_i, X_j), i, j = 1, ..., m.

It can be shown that when m = 2, the multivariate normal distribution reduces
to the bivariate normal.
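Because the joint distribution is determined by the means and covariances, a multivariate normal vector can be generated simply by applying a coefficient matrix A to independent unit normals and adding μ, and its covariance matrix is then AAᵀ. A small sketch (added here; the particular A and μ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
A = np.array([[2.0, 0.0],
              [1.0, 1.5]])                    # arbitrary coefficients a_{ij}
mu = np.array([1.0, -2.0])
trials = 200_000

z = rng.standard_normal((trials, 2))          # independent unit normals Z_1, Z_2
x = z @ A.T + mu                              # X_i = sum_j a_{ij} Z_j + mu_i

print(x.mean(axis=0))                         # near mu
print(np.cov(x, rowvar=False))                # near A @ A.T
```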
Find P{X < Y} for bivariate normal random variables X and Y having parameters

\mu_x = E[X], \quad \mu_y = E[Y], \quad \sigma_x^2 = \operatorname{Var}(X), \quad \sigma_y^2 = \operatorname{Var}(Y), \quad \rho = \operatorname{Corr}(X, Y)

Solution  Because X − Y is normal with mean

E[X - Y] = \mu_x - \mu_y

and variance

\operatorname{Var}(X - Y) = \operatorname{Var}(X) + \operatorname{Var}(-Y) + 2\operatorname{Cov}(X, -Y) = \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y

we obtain

P\{X < Y\} = P\{X - Y < 0\}
 = P\left\{\frac{X - Y - (\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}} < \frac{-(\mu_x - \mu_y)}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}}\right\}
 = \Phi\left(\frac{\mu_y - \mu_x}{\sqrt{\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y}}\right)
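The closed form is easy to check against a simulation of correlated normals; the sketch below (added for illustration, with arbitrary parameter values) constructs the correlated pair from two independent standard normals.

```python
import numpy as np
from math import erf, sqrt

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mux, muy, sx, sy, rho = 1.0, 1.5, 2.0, 1.0, 0.4

rng = np.random.default_rng(7)
trials = 400_000
z1 = rng.standard_normal(trials)
z2 = rng.standard_normal(trials)
x = mux + sx * z1
y = muy + sy * (rho * z1 + sqrt(1 - rho**2) * z2)   # gives Corr(X, Y) = rho

exact = Phi((muy - mux) / sqrt(sx**2 + sy**2 - 2 * rho * sx * sy))
print((x < y).mean(), exact)                        # the two agree closely
```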
Suppose that the conditional distribution of X, given that Θ = θ, is normal with
mean θ and variance 1. Moreover, suppose that Θ itself is a normal random variable
with mean μ and variance σ². Find the conditional distribution of Θ given that
X = x.

Solution  Rather than using and then simplifying Bayes's formula, we will solve this
problem by first showing that X, Θ has a bivariate normal distribution. To do so, note
that the joint density function of X, Θ can be written as

f_{X,\Theta}(x, \theta) = f_{X\mid\Theta}(x \mid \theta)\, f_{\Theta}(\theta)

where f_{X|Θ}(x | θ) is a normal density with mean θ and variance 1. However, if we let Z
be a standard normal random variable that is independent of Θ, then the conditional
distribution of Z + Θ, given that Θ = θ, is also normal with mean θ and variance 1.
Consequently, the joint density of Z + Θ, Θ is the same as that of X, Θ. Because the
former joint density is clearly bivariate normal (since Z + Θ and Θ are both linear
combinations of the independent normal random variables Z and Θ), it follows that
X, Θ has a bivariate normal distribution. Now,

E[X] = E[Z + \Theta] = \mu
\operatorname{Var}(X) = \operatorname{Var}(Z + \Theta) = 1 + \sigma^2

and

\rho = \operatorname{Corr}(X, \Theta)
     = \operatorname{Corr}(Z + \Theta, \Theta)
     = \frac{\operatorname{Cov}(Z + \Theta, \Theta)}{\sqrt{\operatorname{Var}(Z + \Theta)\operatorname{Var}(\Theta)}}
     = \frac{\sigma}{\sqrt{1 + \sigma^2}}

Because X, Θ has a bivariate normal distribution, the conditional distribution of Θ,
given that X = x, is normal with mean

E[\Theta \mid X = x] = E[\Theta] + \rho \sqrt{\frac{\operatorname{Var}(\Theta)}{\operatorname{Var}(X)}}\,(x - E[X]) = \mu + \frac{\sigma^2}{1 + \sigma^2}(x - \mu)

and variance

\operatorname{Var}(\Theta \mid X = x) = \operatorname{Var}(\Theta)(1 - \rho^2) = \frac{\sigma^2}{1 + \sigma^2}
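The posterior mean μ + σ²(x − μ)/(1 + σ²) and variance σ²/(1 + σ²) can be checked crudely by simulating many (Θ, X) pairs and keeping only those whose X lands near a chosen value; the binning tolerance below is an arbitrary choice, and the whole snippet is an added illustration rather than part of the text.

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, trials, x0, tol = 0.0, 1.0, 2_000_000, 1.5, 0.02

theta = rng.normal(mu, sigma, trials)        # Theta ~ normal(mu, sigma^2)
x = theta + rng.standard_normal(trials)      # X | Theta = theta ~ normal(theta, 1)

near = theta[np.abs(x - x0) < tol]           # approximate conditioning on X = x0
post_mean = mu + sigma**2 / (1 + sigma**2) * (x0 - mu)   # = 0.75 here
post_var = sigma**2 / (1 + sigma**2)                     # = 0.5 here
print(near.mean(), post_mean)
print(near.var(), post_var)
```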
8.2 The Joint Distribution of the Sample Mean and Sample Variance
Let X_1, ..., X_n be independent normal random variables, each with mean μ and variance
σ². Let \bar{X} = \sum_{i=1}^{n} X_i/n denote their sample mean. Since the sum of independent
normal random variables is also a normal random variable, it follows that \bar{X} is a normal
random variable with (from Examples 2c and 4a) expected value μ and variance σ²/n.

Now, recall from Example 4e that

\operatorname{Cov}(\bar{X},\, X_i - \bar{X}) = 0, \qquad i = 1, \ldots, n            (8.1)

Also, note that since \bar{X}, X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X} are all linear combinations
of the independent standard normals (X_i - \mu)/\sigma, i = 1, ..., n, it follows that
\bar{X}, X_i - \bar{X}, i = 1, ..., n has a joint distribution that is multivariate normal. If we let
Y be a normal random variable, with mean μ and variance σ²/n, that is independent
of the X_i, i = 1, ..., n, then Y, X_i - \bar{X}, i = 1, ..., n also has a multivariate normal
distribution and, indeed, because of Equation (8.1), has the same expected values
and covariances as the random variables \bar{X}, X_i - \bar{X}, i = 1, ..., n. But since a multivariate
normal distribution is determined completely by its expected values and
covariances, it follows that Y, X_i - \bar{X}, i = 1, ..., n and \bar{X}, X_i - \bar{X}, i = 1, ..., n have
the same joint distribution, thus showing that \bar{X} is independent of the sequence of
deviations X_i - \bar{X}, i = 1, ..., n.

Since \bar{X} is independent of the sequence of deviations X_i - \bar{X}, i = 1, ..., n, it is
also independent of the sample variance S^2 \equiv \sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1).

Since we already know that \bar{X} is normal with mean μ and variance σ²/n, it
remains only to determine the distribution of S². To accomplish this, recall, from
Example 4a, the algebraic identity

(n - 1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2
Upon dividing the preceding equation by σ², we obtain

\frac{(n - 1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2            (8.2)

Now,

\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2

is the sum of the squares of n independent standard normal random variables and
so is a chi-squared random variable with n degrees of freedom. Hence, from the
chi-squared example of Section 7, its moment generating function is (1 - 2t)^{-n/2}.
Also, because

\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2

is the square of a standard normal random variable, it is a chi-squared random variable with
1 degree of freedom, and so has moment generating function (1 - 2t)^{-1/2}. Now, we
have seen previously in the chapter that the two random variables on the left side
of Equation (8.2) are independent. Hence, as the moment generating function of
the sum of independent random variables is equal to the product of their individual
moment generating functions, we have

E\left[e^{t(n-1)S^2/\sigma^2}\right](1 - 2t)^{-1/2} = (1 - 2t)^{-n/2}

or

E\left[e^{t(n-1)S^2/\sigma^2}\right] = (1 - 2t)^{-(n-1)/2}

But as (1 - 2t)^{-(n-1)/2} is the moment generating function of a chi-squared random
variable with n − 1 degrees of freedom, we can conclude, since the moment generating
function uniquely determines the distribution of the random variable, that that
is the distribution of (n - 1)S^2/\sigma^2.
Summing up, we have shown the following.

Proposition 8.1  If X_1, ..., X_n are independent and identically distributed normal random variables
with mean μ and variance σ², then the sample mean \bar{X} and the sample variance S²
are independent. \bar{X} is a normal random variable with mean μ and variance σ²/n;
(n - 1)S²/σ² is a chi-squared random variable with n − 1 degrees of freedom.
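Proposition 8.1 lends itself to a quick simulation check (an added sketch, not part of the text): across many samples, the sample mean and sample variance should be uncorrelated, and (n − 1)S²/σ² should have mean n − 1 and variance 2(n − 1), as a chi-squared random variable with n − 1 degrees of freedom does.

```python
import numpy as np

rng = np.random.default_rng(9)
mu, sigma, n, samples = 5.0, 2.0, 10, 200_000

data = rng.normal(mu, sigma, size=(samples, n))
xbar = data.mean(axis=1)                    # sample means
s2 = data.var(axis=1, ddof=1)               # sample variances S^2

w = (n - 1) * s2 / sigma**2                 # should be chi-squared with n - 1 d.f.
print(np.corrcoef(xbar, s2)[0, 1])          # near 0, consistent with independence
print(w.mean(), w.var())                    # near n - 1 = 9 and 2(n - 1) = 18
```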
9 General Definition of Expectation
Up to this point, we have defined expectations only for discrete and continuous random
variables. However, there also exist random variables that are neither discrete
nor continuous, and they, too, may possess an expectation. As an example of such a
random variable, let X be a Bernoulli random variable with parameter p = 1/2, and
let Y be a uniformly distributed random variable over the interval (0, 1). Furthermore,
suppose that X and Y are independent, and define the new random variable W by

W = \begin{cases} X & \text{if } X = 1 \\ Y & \text{if } X \neq 1 \end{cases}

Clearly, W is neither a discrete (since its set of possible values is uncountable)
nor a continuous (since P{W = 1} = 1/2) random variable.
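Anticipating the general definition below, W does have a well-defined expected value: conditioning on X gives E[W] = (1/2)(1) + (1/2)E[Y] = 3/4. The snippet below (an added illustration) estimates it by simulation.

```python
import numpy as np

rng = np.random.default_rng(10)
trials = 1_000_000

x = rng.integers(0, 2, trials)            # Bernoulli with p = 1/2
y = rng.uniform(0.0, 1.0, trials)         # uniform on (0, 1)
w = np.where(x == 1, 1.0, y)              # W = X if X = 1, otherwise W = Y

print(w.mean())                           # near E[W] = 3/4
```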
In order to define the expectation of an arbitrary random variable, we require
the notion of a Stieltjes integral. Let us first recall that, for any function g, \int_a^b g(x)\,dx
is defined by

\int_a^b g(x)\,dx = \lim \sum_{i=1}^{n} g(x_i)(x_i - x_{i-1})

where the limit is taken over all a = x_0 < x_1 < x_2 < \cdots < x_n = b as n \to \infty and
\max_i (x_i - x_{i-1}) \to 0.

For any distribution function F, we define the Stieltjes integral of the nonnegative
function g over the interval [a, b] by

\int_a^b g(x)\,dF(x) = \lim \sum_{i=1}^{n} g(x_i)\left[F(x_i) - F(x_{i-1})\right]

where, as before, the limit is taken over all a = x_0 < x_1 < \cdots < x_n = b as n \to \infty
and \max_i (x_i - x_{i-1}) \to 0. Further, we define the Stieltjes integral over the whole
real line by

\int_{-\infty}^{\infty} g(x)\,dF(x) = \lim_{\substack{a \to -\infty \\ b \to +\infty}} \int_a^b g(x)\,dF(x)
Finally, if g is not a nonnegative function, we define g^+ and g^- by

g^+(x) = \begin{cases} g(x) & \text{if } g(x) \geq 0 \\ 0 & \text{if } g(x) < 0 \end{cases}
\qquad
g^-(x) = \begin{cases} 0 & \text{if } g(x) \geq 0 \\ -g(x) & \text{if } g(x) < 0 \end{cases}

Because g(x) = g^+(x) - g^-(x) and g^+ and g^- are both nonnegative functions, it is
natural to define

\int_{-\infty}^{\infty} g(x)\,dF(x) = \int_{-\infty}^{\infty} g^+(x)\,dF(x) - \int_{-\infty}^{\infty} g^-(x)\,dF(x)

and we say that \int_{-\infty}^{\infty} g(x)\,dF(x) exists as long as \int_{-\infty}^{\infty} g^+(x)\,dF(x) and \int_{-\infty}^{\infty} g^-(x)\,dF(x)
are not both equal to +∞.

If X is an arbitrary random variable having cumulative distribution F, we define
the expected value of X by

E[X] = \int_{-\infty}^{\infty} x\,dF(x)            (9.1)
It can be shown that if X is a discrete random variable with mass function p(x), then

\int_{-\infty}^{\infty} x\,dF(x) = \sum_{x:\,p(x) > 0} x\,p(x)

whereas if X is a continuous random variable with density function f(x), then

\int_{-\infty}^{\infty} x\,dF(x) = \int_{-\infty}^{\infty} x f(x)\,dx

The reader should note that Equation (9.1) yields an intuitive definition of E[X];
consider the approximating sum

\sum_{i=1}^{n} x_i\left[F(x_i) - F(x_{i-1})\right]

of E[X]. Because F(x_i) − F(x_{i-1}) is just the probability that X will be in the interval
(x_{i-1}, x_i], the approximating sum multiplies the approximate value of X when it is in
the interval (x_{i-1}, x_i] by the probability that it will be in that interval and then sums
over all the intervals. Clearly, as these intervals get smaller and smaller in length, we
obtain the "expected value" of X.

Stieltjes integrals are mainly of theoretical interest because they yield a compact
way of defining and dealing with the properties of expectation. For instance, the
use of Stieltjes integrals avoids the necessity of having to give separate statements
and proofs of theorems for the continuous and the discrete cases. However, their
properties are very much the same as those of ordinary integrals, and all of the proofs
presented in this chapter can easily be translated into proofs in the general case.
Summary
If X and Y have a joint probability mass function p(x, y), then

E[g(X, Y)] = \sum_{y} \sum_{x} g(x, y)\,p(x, y)

whereas if they have a joint density function f(x, y), then

E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy

A consequence of the preceding equations is that

E[X + Y] = E[X] + E[Y]

which generalizes to

E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]

The covariance between random variables X and Y is given by

\operatorname{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]E[Y]

A useful identity is

\operatorname{Cov}\left(\sum_{i=1}^{n} X_i,\; \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \operatorname{Cov}(X_i, Y_j)

When n = m and Y_i = X_i, i = 1, ..., n, the preceding formula gives

\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum_{i < j} \operatorname{Cov}(X_i, X_j)

The correlation between X and Y, denoted by ρ(X, Y), is defined by

\rho(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}
If X and Y are jointly discrete random variables, then the conditional expected
value of X, given that Y = y, is defined by

E[X \mid Y = y] = \sum_{x} x\,P\{X = x \mid Y = y\}

whereas if X and Y are jointly continuous random variables, then

E[X \mid Y = y] = \int_{-\infty}^{\infty} x\,f_{X\mid Y}(x \mid y)\,dx

where f_{X|Y}(x | y) is the conditional density of X given that Y = y. Conditional
expectations, which are computed conditional on the event that Y = y, satisfy all
the properties of ordinary expectations.

Let E[X|Y] denote that function of Y whose value at Y = y is E[X|Y = y]. A
very useful identity is

E[X] = E\left[E[X \mid Y]\right]

In the case of discrete random variables, this equation reduces to the identity

E[X] = \sum_{y} E[X \mid Y = y]\,P\{Y = y\}

and, in the continuous case, to

E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy

The preceding equations can often be applied to obtain E[X] by first "conditioning"
on the value of some other random variable Y. In addition, since, for any event A,
P(A) = E[I_A], where I_A is 1 if A occurs and 0 otherwise, we can use the same
equations to compute probabilities.
The conditional variance of X, given that Y = y, is defined by

\operatorname{Var}(X \mid Y = y) = E\left[(X - E[X \mid Y = y])^2 \mid Y = y\right]

Let Var(X|Y) be that function of Y whose value at Y = y is Var(X|Y = y). The
following is known as the conditional variance formula:

\operatorname{Var}(X) = E\left[\operatorname{Var}(X \mid Y)\right] + \operatorname{Var}\left(E[X \mid Y]\right)

Suppose that the random variable X is to be observed and, on the basis of its value,
one must then predict the value of the random variable Y. In such a situation, it
turns out that among all predictors g(X), the choice g(X) = E[Y|X] minimizes the
expected squared prediction error E[(Y - g(X))^2].
The moment generating function of the random variable X is defined by

M(t) = E\left[e^{tX}\right]

The moments of X can be obtained by successively differentiating M(t) and then
evaluating the resulting quantity at t = 0. Specifically, we have

E[X^n] = \left.\frac{d^n}{dt^n} M(t)\right|_{t=0}, \qquad n = 1, 2, \ldots

Two useful results concerning moment generating functions are, first, that the
moment generating function uniquely determines the distribution function of the