> PROPERTIES OF EXPECTATION Contents 1 Introduction 5 Conditional Expectation 2. Expectation of Sums of Random 6 Conditional Expectation and Variables Prediction 3. Moments of the Number of Events that 7 Moment Generating Functions Occur 8 Additional Properties of Normal Random 4 Covariance, Variance of Sums, and Variables | Correlations Introduction 9 General Definition of Expectation In this chapter, we develop and exploit additional properties of expected values. The expected value of the random variable X is defined by FIX] = xp) where X isa discrete random variable with probability mass function p(x), and by aux= [7 xfards when X is a continuous random variable with probability density function f(x). Since E[X/ is a weighted average of the possible values of X, it follows that if X must lie between a and 6, then so must its expected value. That is, if PlasXsbj=1 then a= 5X] = ‘To verify the preceding statement, suppose that X is a discrete random variable for which Pla = X = b) = 1. Since this implies that p(x) = 0 for all x outside of the interval [a, b), it follows that EIX]= D> we) rH~0 = DO oe xp@ro =a DY p@® peo From Chapter 7 of A First Course in Probability, Ninth Edition. Sheldon Ross. Copyright © 2014 by Pearson Education, Inc. All rights reserved. Properties of Expectation In the same manner, it can be shown that E[X] = b, so the result follows for discrete random variables. As the proof in the continuous case is similar, the result follows, 2 Expectation of Sums of Random Variables Example 2a For a two-dimensional analog of propositions that give the computational formulas for the expected value of a function of a random variable, suppose that X and ¥ are random variables and g is a function of two variables. Then we have the following result, If X and ¥ have a joint probability mass function p(x,y), then ElgX.¥)] =) Dey») yor ICX and ¥ have a joint probability density function (x,y), then aieanvnl= [f° sere aray Let us give a proof of Proposition 2.1 when the random variables X and ¥ are jointly continuous with joint density function f(x,y) and when g(X, Y) is a nonnega- tive random variable. Because g(X, ¥) = 0, we have etacx. y= ["Pigex.y) > nat Writing Pig(X.Y) > a=f ff satynst never tonetey>t shows that Interchanging the order of integration gives 109) ewcrnl= ff [Eso aray as Ie Jy Jiao -f[ f B0xy) fla.y)dy dr ‘Thus, the result is proven when g(X, ¥) is a nonnegative random variable. The gen- «eral case then follows as in the one-dimensional case. ‘An accident occurs at a point X' that is uniformly distributed on a road of length L. ‘At the time of the accident, an ambulance is at a location ¥ that is also uniformly distributed on the road. Assuming that X and Y are independent, find the expected distance between the ambulance and the point of the accident. Solution We need to compute E[|X — ¥/]. Since the joint density function of X and Yis Soy) ap O 307 308 Example 2g Example Properties of Expectation Mean of a hypergeometric random variable i mballs are randomly selected from an urn containing N balls of which m are white, find the expected number of white balls selected. 
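The expected-distance result derived above, E[|X - Y|] = L/3 for independent uniform locations on a road of length L, lends itself to a quick numerical sanity check before the white-ball example is solved below. The following is a minimal Monte Carlo sketch (not part of the text); the road length and the number of trials are arbitrary choices.

```python
import random

def expected_distance(road_length=1.0, trials=200_000):
    """Estimate E[|X - Y|] for independent uniform(0, L) locations X and Y."""
    total = 0.0
    for _ in range(trials):
        x = random.uniform(0.0, road_length)  # accident location
        y = random.uniform(0.0, road_length)  # ambulance location
        total += abs(x - y)
    return total / trials

if __name__ == "__main__":
    L = 1.0
    print("simulated E[|X - Y|]:", round(expected_distance(L), 4))
    print("exact value L/3     :", round(L / 3, 4))
```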
Solution Let X denote the number of white balls selected, and represent X as XEN to + Xm where 11 ifthe ith white balls selected 0 otherwise Now AUX) = PU = 1) Pith white ball is selected} 1\(w~1 1)(n—1 ( A) n N Hence, BLK] = LX) + + EX) = We could also have obtained the preceding result by using the alternative represen- tation XaY tot Yn where y, = |1__ ifthe ith ball selected is white = 10 otherwise Since the ith ball selected is equally likely to be any of the N balls, it follows that aYl=% so mm BUX] = EY) + + EY] = 5 . Expected number of matches ‘Suppose that N people throw their hats into the center of a room. The hats are mixed up, and each person randomly selects one. Find the expected number of people who select their own hat. Solution Letting X denote the number of matches, we can compute E[X] most eas- ily by writing XeX eX tet Xy Example Properties of Expectation where 1 if the ith person selects his own hat 0 otherwise Since, for each i, the ith person is equally likely to select any of the N hats, BUX] = PX = 1) Thus, EIX] = E[Xi) + + + ELXy] R)N= Hence, onthe average, exactly one person selects his own hat . Coupon-collecting problems ‘Suppose that there are N types of coupons, and each time one obtains a coupon, it is equally likely to be any one of the NV types. Find the expected number of coupons ‘one needs to amass before obtaining a complete set of at least one of each type. Solution Let X denote the number of coupons collected before a complete set is attained, We compute E[X] by using the same technique we used in computing the mean of a negative binomial random variable (Example 2f). That is, we define X;,i = 0,1,...,N — 1 to be the number of additional coupons that need be obtained after i distinct types have been collected in order to obtain another distinct type, and we note that XaXy$ Mt Xa ‘When {distinct types of coupons have already been collected, a new coupon obtained will be of a distinct type with probability (N’ — /N. Therefore, me ww) = or, in other words, X; is a geometric random variable with parameter (N — i)/N. Hence, PUG =k) axl= implying that Ten hunters are waiting for ducks to fly by. When a flock of ducks flies overhead, the hunters fire at the same time, but each chooses his target at random, independently of the others. If each hunter independently hits his target with probability p, com- pute the expected number of ducks that escape unhurt when a flock of size 10 flies overhead. 310 Example Properties of Expectation Solution Let X; equal 1 if the ith duck escapes unhurt and 0 otherwise, for i 2,..2,10. The expected number of ducks to escape can be expressed as LX, + ++ + Xo) = ELL) + + + ELMol ‘Tocompute £[Xi] = P(X; = 1), we note that each of the hunters will, independently, hit the ith duck with probability p/10, so px= n= (1 -4) Hence, Expected number of runs Suppose that a sequence of n 1’s and m 0's is randomly permuted so that each of the (x + m)!/(nlml) possible arrangements is equally likely. Any consecutive string of 1'sis said to constitute a run of 1's—for instance, ifn = 6,m = 4, and the ordering is 1, 1,1,0, 1,1, 0, 0,1, 0, then there are 3 runs of 1's—and we are interested in computing the mean number of such runs. To compute this quantity, let 1 ifarun of 1'sstarts at the ith position. 
T= 10 otherwise ‘Therefore, R(1), the number of runs of 1, can be expressed as nim R= Yh and it follows that nee FUR) = 5 Ud Now, E{h] = P(*1" in position 1) atm andforl X;,3> Yi}, it follows that D?, the square of the distance from the origin, is given by »-(5) -(&) =Sa@? + YD + PD Kx + YY) a i =n + CL eosoieng + sinoysing ‘i ant 312 Example 2m Properties of Expectation where cos? 6; + sin®@; = 1. Taking expectations and using the independence of 6 and 6; when i # j and the fact that an 2nEfcos6i] =f cosudu = sin2x — sind =0 N an 2n [sind] f sinudu = cos0 — cos2x =0 lo we arrive at ig Analyzing the qu sort algorithm Suppose that we are presented with a set of n distinct values x1,*2,-.-,%» and that we desire to put them in increasing order, or as it is commonly stated, to sort them. Anefficient procedure for accomplishing this task is the quick-sort algorithm, which is defined as follows: When n = 2, the algorithm compares the two values and then puts them in the appropriate order. When n > 2, one of the elements is randomly chosen—say it is x/—and then all of the other values are compared with xj. Those smaller than x; are put in a bracket to the left of x; and those larger than x; are put in a bracket to the right of x The algorithm then repeats itself on these brackets and, continues until all values have been sorted, For instance, suppose that we desire to sort the following 10 distinct values: 5,9,3,10,11,14,8,4,17,6 We start by choosing one of them at random (that is, each value has probability 4, of being chosen). Suppose, for instance, that the value 10 is chosen. We then compare ‘each ofthe others to this value, putting ina bracket to the left of 10 all those values smaller than 10 and to the right all those larger. This gives 15,9,3,8,4,6),10, (11,14,17] ‘We now focus on a bracketed set that contains more than a single value—say the one on the left of the preceding—and randomly choose one of its values—say that 6 is, chosen. Comparing each of the values in the bracket with 6 and putting the smaller ones in a new bracket to the left of 6 and the larger ones in a bracket to the right of 6 gives 15,3,4),6,(9,8),10, (11, 14,17), If we now consider the leftmost bracket, and randomly choose the value 4 for com- parison, then the next iteration yields (3),4,{5),6, (9,8), 10, (11, 14,17) ‘This continues until there is no bracketed set that contains more than a single value. If we let X denote the number of comparisons that it takes the quick-sort algo- rithm to sort n distinct numbers, then E[X] is a measure of the effectiveness of this algorithm. To compute E[X], we will first express X as a sum of other random vari- ables as follows. To begin, give the following names to the values that are to be sorted: Let 1 stand for the smallest, let 2 stand for the next smallest, and so on. Then, for 1 =i < j =n, let [(i,{) equal 1 if and j are ever directly compared, and let it equal 0 otherwise, With this definition, it follows that Properties of Expectation mtn x-D yp is implying that ie axy=2/3 Oran iam et =D DY Ales) i an =D Piiand jare ever compared} im To determine the probability that i and j are ever compared, note that the values ii + Ly... — 1,j will initially be in the same bracket (since all values are initially in the same bracket) and will remain in the same bracket if the number chosen for the first comparison is not between i and j. 
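As a brief numerical aside before the quick-sort analysis continues, the random-walk identity E[D^2] = n obtained above is easy to confirm by simulation. This is a minimal sketch under arbitrary choices of n and the trial count.

```python
import math
import random

def mean_squared_distance(n=25, trials=100_000):
    """Estimate E[D^2] after n unit steps taken in uniformly random directions."""
    total = 0.0
    for _ in range(trials):
        x = y = 0.0
        for _ in range(n):
            theta = random.uniform(0.0, 2.0 * math.pi)  # random direction
            x += math.cos(theta)
            y += math.sin(theta)
        total += x * x + y * y
    return total / trials

if __name__ == "__main__":
    n = 25
    print("simulated E[D^2]:", round(mean_squared_distance(n), 2))
    print("theoretical n   :", n)
```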
For instance, if the comparison number is, larger than j, then all the values ,i + 1,...,j — 1, will go in a bracket to the left of the comparison number, and if tis smaller than i, then they will all go in a bracket, to the right. Thus all the values i,i + 1,...,j ~ 1,/ will remain in the same bracket ‘until the first time that one of them is chosen as a comparison value. At that point all the other values between i and j will be compared with this comparison value. Now, if this comparison value is neither i nor j, then upon comparison with it, i will go into a left bracket and jinto a right bracket, and thus i and j will be in different brackets, and so will never be compared. On the other hand, if the comparison value of the setii + 1,...,j — Ljiseither i or j, then there will be a direct comparison between i and j. Now, given that the comparison value is one of the values between i and j, it follows that it is equally likely to be any of these j — i + 1 values, and thus the probability that itis either ‘or jis 27 — i + 1). Therefore, we can conclude that P{i and j are ever compared) and aneES ‘To obtain a rough approximation of the magnitude of E[X] when m is large, we can approximate the sums by integrals. Now 2 ™ 2 Ye ysrat* Lyeaat* =2logte - 1+ DI, = 2login — i + 1) — 2logt2) Dlogin — i + 1) 313 Properties of Expectation ‘Thus not E[X] = Yo 2login - i + 1) a 22 [roger — x + Da h =2 f logyndy = 2ylogy) — yz ~ Inlogin) ‘Thus we see that when 7 is large, the quick-sort algorithm requires, on average, approximately 2n log(n) comparisons to sort n distinct values. . Example The probability of a union of events Let A1,...An denote events, and define the indicator variables X;,i = 1,...,m, by Now, note that a 1 if U Aj occurs Hence, Expanding the left side of the preceding formula yields (Ua) =E mn - Lyx + VY Kx tg ‘See a eae, x] ox However, 1 if Aj Ais Ay occurs XX Xa 0 otherwise so BUX + Xig) = Pi) ‘Thus, Equation (2.3) is just a statement of the well-known formula for the union of events: PUA) = PAD — PAA) + DY Pian) 7 13 ‘spe So DM IPCAL An) . 314 Properties of Expectation ‘When one is dealing with an each having a finite expectation, E [& x| 5 Lae e4) finite collection of random variables Xi,i = 1, not necessarily true that ‘To determine when (2.4) is valid, we note that 5° Xi | (25) Hence, Equation (2.4) is valid whenever we are justified in interchanging the expec- tation and limit operations in Equation (2.5). Although, in general, this interchange is not justified, it can be shown to be valid in two important special cases: 1. The X; are all nonnegative random variables. (That is, P(X; = 0) = 1 for all i. 2. BLIXil] < 00. Consider any nonnegative, integer-valued random variable X. If, for each i = 1, we define 1 itxed Malo tx ) games played, and let A denote the set of all 2\°/ pos- sible tournament outcomes. Then, with f(s) defined as the number of Hamiltonian paths that result when the outcome is s € A, we are asked to show that max f(s) = To show tis consider the randomly chosen outcome $ that is obtained when the results ofthe () games ar independent, with each contestant being equally likely to win each encounter. To determine E[f(S)], the expected number of Hamiltonian paths that result from the outcome S, number the n! 
permutations, and, for i Ay--synly let x,— [1+ ifpermutation fis a Hamiltonian = 10, otherwise Since £9) = OX it follows that BY =P ELX Because, by the assumed independence of the outcomes of the games, the probabil- ity that any specified permutation is a Hamiltonian is (1/2)"~, it follows that FIX = PUG =) = ay" Therefore, ELf(S)] = mi/2y" Since f(S) is not a constant random variable, the preceding equation implies that there is an outcome of the tournament having more than n!/2"-! Hamiltonian paths. . A grove of 52 trees is arranged in a circular fashion. If 15 chipmunks live in these trees, show that there is a group of 7 consecutive trees that together house at least 3 chipmunks. Solution Let the neighborhood of a tree consist of that tree along with the next six trees visited by moving in the clockwise direction. We want to show that for any choice of living accommodations of the 15 chipmunks, there isa tree that has at least 317 318 Proposition For arbitrary numbers x, 22 Properties of Expectation 3 chipmunks living in its neighborhood, To show this, choose @ tree at random and let X denote the number of chipmunks that live in its neighborhood. To determine E{X], arbitrarily number the 15 chipmunks and for i = 1,...,15,let x, = [1+ ifchipmunk é ives in the neighborhood of the randomly chosen tree = 0, otherwise Because 1s x= Dx a wwe obtain that 1s EX] = AIX a However, because X will equal 1 ifthe randomly chosen tree is any ofthe 7 trees consisting of the te in which chipmunk i lives along with its 6 neighboring trees ‘when moving in the counterclockwise direction, X= == 5 Consequently, 105 BUX] = > 2 showing that there exists a tree with more than 2 chipmunks living in its neig borhood. "2.2. The Maximum-Minimums Identity ‘We start with an identity relating the maximum of a set of numbers to the minimums. of the subsets of these numbers maxi = ox — Yo minca,x) + D> ming.}.x4) Tf ire eee + (DT ming... yn) Proof We will give a probabilistic proof of the proposition. To begin, assume that all the x; are in the interval [0, 1]. Let U be a uniform (0, 1) random variable, and define the events Ait 1n, by Ai = (U < xj). That is, A; is the event that the uniform random variable is less than xj. Because at least one of these events Aj will occur if U is less than at least one of the values x;, we have that UA [em Therefore, PUAD plu < maxx; maxx; Also, P(A) = PIU < xi} Properties of Expectation In addition, because all of the events Aj,,...,Ai, will occur if U is less than all the values x,,..-,2,, We see that the intersection of these events is Ai Ai us min vf her implying that PIA Ai) fu - min. = inn Fler ‘Thus, the proposition follows from the inclusion-exclusion formula for the probabil- ity of the union of events: PUA) = > P(A) = OPA) + SO PLAASAR) ro ie tie CDP CAL An) When the x; are nonnegative, but not restricted to the unit interval, let c be such that all the x; are less than c. Then the identity holds for the values yi = xi/c, and the desired result follows by multiplying through by c. When the 2 can be negative, let bb be such that x; + 6 > 0 for all i, Therefore, by the preceding, max(x; + 6) = SoG + 6) — So ming; + 6,4; + 6) ‘ 7 iy eee (DT miner + by... + BY Letting M = Yom — Yo min(eix) + + (CDT miner... .%9) i we can rewrite the foregoing identity as But o=a-1)" ‘The preceding two equations show that maxx; =M and the proposition is proven. 
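The maximum-minimums identity just proved can be verified directly for any small list of numbers by brute-force enumeration of the subsets. The following sketch (the test values are arbitrary) evaluates the alternating sum of subset minimums and compares it with the maximum.

```python
from itertools import combinations

def max_via_minimums(xs):
    """Evaluate the maximum-minimums identity:
    max x_i = sum over nonempty subsets S of (-1)^(|S|+1) * min(S)."""
    total = 0.0
    for size in range(1, len(xs) + 1):
        sign = (-1) ** (size + 1)
        total += sign * sum(min(subset) for subset in combinations(xs, size))
    return total

if __name__ == "__main__":
    xs = [2.5, -1.0, 4.0, 0.5]
    print("identity value:", max_via_minimums(xs))
    print("direct maximum:", max(xs))
```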
o It follows from Proposition 2.2 that for any random variables Xi,..., max Xi = DXi — Y)min(X,X) + + (minX, Xn) tf 319 320 Properties of Expectation ‘Taking expectations of both sides of this equality yields the following relationship between the expected value of the maximum and those of the partial minimums: E [max x] = Dax) — D Almincxi,X)] ta to DMT minX, Xn] (27) ‘Coupon collecting with unequal probabilities, Suppose there are n types of coupons and that each time one collects a coupon, it is, independently of previous coupons collected, a type i coupon with probability pi, Lpi = 1. Find the expected number of coupons one needs to collect to obtain a Complete set of at least one of each type. Solution If we let X; denote the number of coupons one needs to collect to obtain a type i, then we can express X as, X= max Xi Because each new coupon obtained is a type i with probability p, Xj is a geometric random variable with parameter pi. Also, because the minimum of X; and X; is the number of coupons needed to obtain either a type i or a type j, it follows that for i # j, min (X;, X)) is a geometric random variable with parameter p; + pj Similarly, min (X;,Xj, Xe), the number needed to obtain any of types i,j, and k, is a geometric random variable with parameter pi + pj + px, and soon. Therefore, the identity (2.7) yields 1 1 1 A= Da, ~ Ln +P + Za +P + PK 1 Pt +Pn tet (DE Noting that and using the identity 1-]Ja- it Doe = oe Pe ge yt ert sper shows, upuutintegsating dae ideutity that ax=[" (1 [Ja - ) a en is a more useful computational form, . Properties of Expectation 3 Moments of the Number of Events that Occur Many of the examples solved in the previous section were of the following form: For given events A1,...,An, ind E[X], where X is the number of these events that occur. ‘The solution then involved defining an indicator variable J; for event Aj such that 1, if Aj occurs 0, otherwise Because x=Dh we obtained the result E[X]=E [=] Yan = ray @.a) int isi iat Now suppose we are interested in the number of pairs of events that occur. Because /if; will equal 1 if both A; and Aj occur and will equal 0 otherwise, it fol- lows that the number of pairs is equal to }°;_, i But because X is the number of events that occur, it also follows that the number of pairs of events that occur is (3). Consequently, 20 @)= where there are (3) terms in the summation. Taking expectations yields x £\ (5) )= Dat =D Paap G2) i i x(x — 1) [22] =D Pa) 5 giving that ELX?] - EX] =20 PAA) (3.3) which yields E[X2), and thus Var(X) = E[X?] — (ELX)?. Moreover, by considering the number of distinct subsets of k events that all occur, we see that ()- YL he ‘Taking expectations gives the identity {(D]- Yo Algtge l= YL PAyAge A ieeeociy hehe G4) 321 Example Example Properties of Expectation ‘Moments of binomial random variables Consider n independent trials, with each trial being a success with probability p. Let, Aj be the event that trial i isa success. When i # j, P(A:A) = p?. Consequently, gent 03) [@]-E"-@e ig E[X(X = 1] =n — Dp? or E[X?] — EX] =n(n — typ? Now, E[X] = D7 Var(X) = FLX" (Ai) = np, so, from the preceding equation — (E[X)? = n(n — 1p? + np — (npy = nptl — p) In general, because P(Aj, Aj, --- Aq) = p*, we obtain from Equation (3.4) that AQ]- 2G or, equivalently, EIX(X — 1) = k + Jan — Yen = k + Dp ‘The successive values E[X*], k = 3, can be recursively obtained from this identity For instance, with k = 3, it yields BIX(X = 1(X = 2)] = n(n — 1m = 2p* BIX? — 3X? + 2X] = n(n ~ 1)(n — 2p? 
ELX?| = 3X7] — 2E[X] + men — 1)(n — 2p 3n(n — 1p? + np + nin — In — 2p? . Moments of hypergeometric random variables ‘Suppose n balls are randomly selected from an urn containing N balls, of which m are white, Let A; be the event that the ith ball selected is white. Then X, the number of white balls selected, is equal to the number of the events Ai,...,An that occur. Because the ith ball selected is equally likely to be any of the N balls, of which m are white, P(Ai) = m/N. Consequently, Equation (3.1) gives that E[X] = D7. P(A) nm/N. Also, since PAA) = PADPCASAD) Example 3e Properties of Expectation ‘we obtain, from Equation (3.2), that (O]-E8e3- Ose or mim — 1) AIX(K — Di =n — Dy showing that BEA] =m yn = 2 + BUX) ‘This formula yields the variance of the hypergeometric, namely, Var(X) = E[X?] — (E[X)? enn = MD tt _ mt NW 1) * -me = Ym — 1) Nw N-1 Higher moments of X are obtained by using Equation (3.4). Because m(m = 1)---(m ~k +1) PAwAa AW) = NOS TN ED, Equation (3.4) yields A () . mm — 0 (m =k +1) “L) | TA NW =D ED, EIX(K =) k + 1) or ke ke) = nln = Doon =k + DR the match problem For i = 1,...,N, let Aj be the event that person i selects his or her own hat in the match problem. Then A) = ANA) = bo PUA) = PADPUAIAD = > ‘which follows because, conditional on person i selecting her own hat, the hat selected by person j is equally likely to be any of the other N’ — 1 hats, of which one is his own. Consequently, with X equal to the number of people who select their own hat, 1 ollows trom Equation (3.2) that [@)|-Exa—- (aa thus showing that ELX(X - 0) 323 324 Properties of Expectation ‘Therefore, E[X?] = 1 + E[X]. Because E[X] = DM, PrAd = 1, we obtain that Var(X) = E[X"] — (E[X1) Hence, both the mean and variance of the number of matches is 1. For higher moments, we use Equation (3.4), along with the fact that P(4,,Ai,- N= aN=rpT to obtain (| = (maa EIX(X = 1) =k +H) Another coupon-collecting problem Suppose that there are N distinct types of coupons and that, independently of past types collected, each new one obtained is type j with probability pj, "pj = 1 Find the expected value and variance of the number of different types of coupons that appear among the fist n collected. Solution We will find it more convenient to work with the number of uncollected, types. So, let ¥ equal the number of types of coupons collected, and let X = N — Y denote the number of uncollected types. With A; defined as the event that there are ‘no type {coupons in the collection, X is equal to the number of the events Ay,..., AN that occur. Because the types of the successive coupons collected are independent, and, with probability 1 — p; each new coupon is not type i, we have P(A) = (0 — pi” N Mi. — po, from which it follows that Hence, E[X] = w E[Y]=N - E[X]=N - oa - pa” i Similarly, because each of the n coupons collected is neither a type i nor a type j coupon, with probability 1 — p; — pj, we have PUA) = 1 pi pp, t#j Thus EIX(X — 0) =2)0 PAA) = 270 — pi - pp” ig *g or — pi" + ELX) Hence, we obtain Var(¥) = Var(X) = UX?) — (EEX)? n N =200 - pi - pp" + Da - pa" - | oa - po" i = a Example 3e Properties of Expectation In the special case where pi = 1/N, ...N, the preceding formulas give i am=nfi-(0-3)] Var(¥) = NON = v(1 = zy + w(t - ») - Pag - iy" 1. and ‘The negative hypergeometric random variables Suppose an urn contains n + m balls, of which n are special and m are ordinary. These items are removed one at a time, with each new removal being equally likely to be any of the balls that remain in the urn. 
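The match-problem result above, that both the mean and the variance of the number of matches equal 1, can be checked by simulating random hat assignments. This is a minimal sketch; the number of people N and the trial count are arbitrary.

```python
import random
import statistics

def simulate_matches(N=10, trials=100_000):
    """Return the simulated mean and variance of the number of people
    who receive their own hat in the match problem."""
    counts = []
    for _ in range(trials):
        hats = list(range(N))
        random.shuffle(hats)  # a uniformly random assignment of hats
        matches = sum(1 for person, hat in enumerate(hats) if person == hat)
        counts.append(matches)
    return statistics.mean(counts), statistics.variance(counts)

if __name__ == "__main__":
    mean, var = simulate_matches()
    print("simulated mean, variance  :", round(mean, 3), round(var, 3))
    print("theoretical mean, variance: 1, 1")
```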
The random variable ¥, equal to the number of balls that need be withdrawn until a total of r special balls have been removed, is said to have a negative hypergeometric distribution. The negative hypergeometric distribution bears the same relationship to the hypergeometric distribution as the negative binomial does to the binomial. That is, in both cases, rather than considering a random variable equal to the number of successes in a fixed number of trials (as are the binomial and hypergeometric variables), they refer to the number of trials needed to obtain a fixed number of successes. ‘To obtain the probability mass function of a negative hypergeometric random variable Y, note that ¥ will equal k if both 1. the first k — 1 withdrawals consist of r — 1 special and k ~ r ordinary balls and 2. the kth ball withdrawn is special. Consequently, poy = ky = Walle) met (it) n+m—ka1 We will not, however, ulilize the preceding probability mass function (o obtain the mean and variance of Y. Rather, let us number the m ordinary balls as 01,...,0ms and then, for each i = 1,...,n, let A; be the event that o; is withdrawn before r special balls have been removed. Then, if X is the number of the events Ay,..-,Am that occur, it follows that X is the number of ordinary balls that are withdrawn before a total of r special balls have been removed. Consequently, Yeor+X showing that BWYJ=r + EIXJ=r + SO PAD a To determine P(A;), consider the m + 1 balls consisting of o; along with the n special balls. Of these m +1 balls, o; is equally likely to be the first one withdrawn, or the second one withdrawn, ..., or the final one withdrawn. Hence, the probability that it is among the first r of these to be selected (and so is removed before a total or r special balls have been withdrawn) is ty. Consequently, net PAD = 325 326 Example af Properties of Expectation and ‘Thus, for instance, the expected number of cards of a well-shuffied deck that would need to be turned over until a spade appears is 1 + }#{ = 3.786, and the expected number of cards that would need to be turned over until an ace appears is 1+ $=106. ‘To determine Var(Y) = Var(X), we use the identity EIX(X = 1) =20 PA) i Now, P(A;4)) is the probability that both o; and o are removed before there have been a total of r special balls removed. So consider the n + 2 balls consisting of 0,0), and the n special balls. Because all withdrawal orderings of these balls are equally likely, the probability that 0; and oj are both among the first r + 1 of them to be removed (and so are both removed before r special balls have been withdrawn) is OW rr +1) PAA) (CH) @ FDO FD Consequently (m rr +1) sxe 01=9(2) aes so » _ rr +1) AXP] = mom — DE Tay + AI Because E[X] = my. this yields re ee Var(Y) = Var(X) = mim = DE Dare + aed ( A\little algebra now shows that mr(n + 1 = rn + m +1) Vary) + 1m + 2) Singletons in the coupon collector's problem Suppose that there are n distinct types of coupons and that, independently of past types collected, each new one obtained is equally likely to be any of the 7 types. Suppose also that one continues to collect coupons until a complete set of at least fone of each type has been obtained. Find the expected value and variance of the ‘number of types for which exactly one coupon of that type is collected. Solution Let X equal the number of types for which exactly one of that type is collected. Also, let 7; denote the ith type of coupon to be collected, and let Ai be the event that there is only a single type T; coupon in the complete set. 
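As a numerical aside (the singleton-coupon calculation continues below), the negative hypergeometric mean formula E[Y] = r + mr/(n + 1) gives 3.786 draws until the first spade and 10.6 until the first ace. The sketch below checks both figures by shuffling; the trial count is arbitrary.

```python
import random

def mean_draws_until(special, ordinary, r=1, trials=200_000):
    """Simulate E[Y], the number of draws needed until r special cards have
    appeared from a deck with `special` special and `ordinary` ordinary cards."""
    deck = [1] * special + [0] * ordinary  # 1 marks a special card
    total = 0
    for _ in range(trials):
        random.shuffle(deck)
        seen = 0
        for position, card in enumerate(deck, start=1):
            seen += card
            if seen == r:
                total += position
                break
    return total / trials

if __name__ == "__main__":
    # First spade: 13 special, 39 ordinary cards; formula gives 1 + 39/14 = 3.786
    print("first spade:", round(mean_draws_until(13, 39), 3), "vs 3.786")
    # First ace: 4 special, 48 ordinary cards; formula gives 1 + 48/5 = 10.6
    print("first ace  :", round(mean_draws_until(4, 48), 3), "vs 10.6")
```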
Because X is equal to the number of the events Ai,...,An that occur, we have al=So Pan Now, at the moment when the first type 7; coupon is collected, there remain n — i types that need to be collected to have a complete set. Because, starting at this Properties of Expectation moment, each of these n — i + 1 types (the m — i not yet collected and type Ti) 18 equally likely to De the last of these types to be collected, it follows that the type 1 wil be the las ofthese types (and so willbe a singleton) with probability ;=7- Consequently, P(A,) yielding aT oy a-Si Lh ‘To determine the variance of the number of singletons, let S,, for i < j, be the event that the first type 7; coupon to be collected is still the only one of its type to have been collected at the moment that the first type 7; coupon has been collected. Then PUVA) = PLAIAIS,) POS) Now, P(Siy) is the probability that when a type 7; has just been collected, of the n — i + 1 types consisting of type T; and the n — ias yet uncollected types, a type T; is not among the first j — i of these types to be collected. Because type T; is equally likely to be the first, or second, or ...,2 — i + 1 of these types to be collected, we have antl “nHi-t P(Sy)=1 - fa Now, conditional on the event S,j, both A; and A; will occur if, at the time the first type T; coupon is collected, of the m ~ j + 2 types consisting of types 7:, 7), and the n — jas yet uncollected types, T; and T; are both collected after the other n — j. But this implies that PALAIS) =2 yet Therefore, PAA) @+i-ia@t2—-p 1 *? yielding 1 AK D=4¥ GST ae Pe Consequently, using the previous result for E[X1], we obtain 1 : a Lasiaomse—p * Li al 4 Covariance, Variance of Sums, and Correlations ‘The following proposition shows that the expectation of a product of independent random variables is equal to the product of their expectations. Proposition If X and Y are independent. then. for any functions A and g. 4 Eig XOhY)] = ElgeOletay)] 328 Proposition 42 Properties of Expectation Proof Suppose that X and ¥ are jointly continuous wit joint density f¢x,y). Then secon] = J J eeororveay) dedy = [°F ecomonneoron acay = [-royronwy [™ seopee as = EIAY)E[g(X)] The proof in the discrete case is similar. a Just as the expected value and the variance of a single random variable give us information about that random variable, so does the covariance between two random variables give us information about the relationship between the random variables. Defi ‘The covariance between X and Y, denoted by Cov (X, ¥), is defined by Cov(X, ¥) = E[(X — FLX) (Y — ELY)) Upon expanding the right side of the preceding definition, we see that Cov(X,¥) EIXY = E[XJY — XE[Y] + ELYIELXI EIXY] - ELXJELY] - ELXIELY] + ELXIELY] EUXY] ~ ELXIE(Y] Note that if X and Y are independent, then, by Proposition 4.1, Cov(X, ¥) = 0. However, the converse is not true. A simple example of two dependent random variables X and Y having zero covariance is obtained by letting X be a random variable such that PIX = =NaPk ==} and defining 0 tx #0 Y=11 itx=0 Now, XY = 0, 0 E[XY] = 0. Also, E[X] = 0. Thus, Cov(X, ¥) = EIXY] ~ ELXIE[Y] = 0 However, X and ¥ are clearly not independent. ‘The following proposition lists some of the properties of covariance, (i) Cov(x, ¥) = Cow, x) (ii) Cow(X,X) = Var(x) (iii) Cov(ax, ¥) = a Cov(x, ¥) Properties of Expectation it jet fat jot cor (Ef) =F Scouxey) Proof of Proposition 42 Parts (i) and (ii) follow immediately from the definition of covariance, and part (iii) is left as an exercise for the reader. 
To prove part (iv), which states that the covariance operation is additive (as is the operation of taking expectations), let 4s = E[Xi] and vj = E[¥)). Then and 0 & Me (X= wen - | fi =E) OO - voy) - »| fai jot YAK = HY) — w)) where the last equality follows because the expected value of a sum of random vari a ables is equal to the sum of the expected values. It follows from parts (ii) and (iv) of Proposition 4.2, upon taking ¥) = X), Ayn, that wm ($x) =e (Eds) LY cowxx) in = Dvarixy + OY Comix) = 4 Since each pair of indices i,j,i # j, appears twice in the double summation, the pre- ceding formula is equivalent to Var («)- Lever + 2D coun. xy (aay 1g 329 Example 4a 330 Properties of Expectation It X;,-..,Xn are pairwise independent, in that Xj and Xj are independent for 1% J, then Equation (4.1) reduces to Var (& x) =Soverex ia ‘The following examples illustrate the use of Equation (4.1). Let Xi: expected value wand variance a?, and as in Example 2c, let X = $5 Xi/m be the LX,, be independent and identically distributed random variables having sample mean, The quantities X; — X,i = 1,...,m, are called deviations, as they equal the differences between the individual data and the sample mean, The random variable yo (Xi = XP SPT is called the sample variance. Find (a) Var(X) and (b) ES" ve (+) 1)? _ +) > var(xXi) by independence Solution (a) Var) = a a g 2 =/9 (b) We start with the following algebraic (n = 1S? = YX — w+ w - KP i wy? + DO - w? - 2K - WM - w iat ia wy? +n ~ pw)? — 20% - wn - 4) = w)? = n& = pw)? ‘Taking expectatinns of the preceding yields DA - wy] = nBLR - 7) 7 no® — nVari®) =(n = Io? (n - DES? Example 4b Example Properties of Expectation where the final equality made use of part (a) of this example and the one preceding it made use of the result of Example Zc, namely, that £[X] = jt. Dividing through by n — 1 shows that the expected value of the sample variance is the distribution variance 0, . Our next example presents another method for obtaining the variance of a bino- mial random variable. Variance of a binomial random var le ‘Compute the variance of a binomial random varlable 2 with parameters m and p. Solution Since such a random variable represents the number of successes in n inde- pendent trials when each trial has the common probability p of being a success, we may write KaX bot hy where the X; are independent Bernoulli random variables such that 1 ifthe ith trial is a success ~ 0 otherwise Xa Hence, from Equation (4.1), we obtain Vat(X) = Var(Xs) + ++ + Var(Xn) But Var(Xi) = ELX?] — (ELXi)? = FIX] — (E[Xi)? siuwe X? = Ae Thus, Var(X) = np ~ p) . Sampling from a finite population Consider a set of N people, each af whom has an opinion about a certain ject that is measured by a real number v that represents the person’s “strength of feeling” about the subject. Let v; represent the strength of feeling of person i, 1..N. Suppose that the quantities vj,i = 1,...,N, are unknown and, to gather infor- ‘mation, a group of m of the IV people is “randomly chosen” in the sense that all of the ™) subsets of size n are equally likely tobe chosen. These n people are then ques- tioned and their feelings determined. If $ denotes the sum of the n sampled values, cteumine ity asean au variance. ‘An important application of the preceding problem is to a forthcoming election in which each person in the population is either for or against acertain candidate or proposition. 
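As a quick check of Example 4a before the sampling application continues below, the following sketch estimates E[S^2], the expected sample variance with divisor n - 1, and compares it with the true variance. The choice of a normal population and the parameter values are arbitrary; any distribution with finite variance would do.

```python
import random
import statistics

def mean_sample_variance(n=5, trials=50_000, mu=2.0, sigma=3.0):
    """Estimate E[S^2], where S^2 is the sample variance (divisor n - 1)
    of n i.i.d. normal(mu, sigma^2) observations."""
    total = 0.0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        total += statistics.variance(sample)  # statistics.variance uses n - 1
    return total / trials

if __name__ == "__main__":
    print("simulated E[S^2] :", round(mean_sample_variance(), 3))
    print("true sigma^2     :", 9.0)
```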
If we take vj to equal 1 if person ‘is in favor and 0 if he or she is against, W then ¥ = 5° vi/N represents the proportion of the population that is in favor. To estimate ¥, a random sample of n people is chosen, and these people are polled. 331 Properties of Expectation ‘The proportion of those polled who are in favor—that is, S/n—is often used as an estimate of ¥. Solution For each person i,i = 1, whether or not that person is include: .N, define an indicator variable J; to indicate in the sample, That is, 1 if person ‘is in the random sample 0 otherwise h= Now, 5 can be expressed by x s= Dut N BIS = vet o ‘Var($) = )> Var(vili) + 2) D> Covivili, wih) 19 ” Yvivarchy + 22D wvjCovili, hp i 1 Because aul=5 nn-1 BU = a it follows that Var(li) = Covilist) Hence, 332 Properties of Expectation ‘The expression for Var(S) can be simplified somewhat by using the identity N (yy + ++: + wy = Ev? + 2 Doi. After some simplification, we obtain 7 nN = n) Varis) = "= Consider now the special case in which Np of the v’s are equal to 1 and the remainder equal to 0. Then, in this case, S is a hypergeometric random variable and hhas mean and variance given, respectively, by E[S]=nv=np since and vai =H (88 — 9) n) “WoT Pa - P) ‘The quantity S/n, equal to the proportion of those sampled that have values equal to, 1, is such that esl ‘The correlation of two random variables X and ¥, denoted by p(X, Y),is defined, as long as Var(X) Var(¥) is postive, by (X,Y) It can be shown that 15 p(X) S1 (42) ‘To prove Equation (4.2), suppose that X and ¥ have variances given by a? and a2, respectively. Then, on the one hand, o=vee(2 + x) a * os Vi wi cit3) nee 2Cov(X, Y) a oF x8 = lt + (X.Y) implying that -15 (X.Y) 334 Example 4e Properties of Expectation osvar(X—-¥ a ay = Yar), Var(¥) _ 2Cov(x, ¥) oe Cay? O20 =2l - (XY) On the other hand, implying that oyet o(X.Y) = which completes the proof of Equation (4.2). In fact, since Var(Z) = 0 implies that Z is constant with probability 1, it follows from the proof of Equation (4.2) that p(X, ¥) = 1 implies that ¥ =a + bX, where b = «4/0, > Vand (X,Y) = —1 implies that ¥ = a + bX, where b = —0,/o. < 0. ‘We leave it as an exercise for the reader to show that the reverse is also true: that if Y =a + bX, then p(X, ¥) is either +1 or ~1, depending on the sign of b. ‘The correlation coefficient is a measure of the degree of linearity between X and ¥. A value of o(X, ¥) near +1 or ~1 indicates a high degree of linearity between X and Y, whereas a value near 0 indicates that such linearity is absent. A positive value of o(X, ¥) indicates that ¥ tends to increase when X does, whereas a negative value indicates that Y tends to decrease when X increases. If p(X, Y) = 0, then X and ¥ are said to be uncorrelated. Let [4 and Ig be indicator variables for the events A and B. That is, =|} it occurs 4=10 otherwise me {: if B occurs 0 otherwise ‘Then Ella) = POA) Ells] = PB) Ellaln] = PCABY so Cov(la, tn) = P(AB) ~ PLA)P(B) ~ PCBYPCAIB) — PAY) Thus, we obtain the quite intuitive result that the indicator variables for A and B are either positively correlated, uncorrelated, or negatively correlated, depending, ‘on whether P(A|B) is, respectively, greater than, equal to, or less than P(A). ‘Our next example shows that the sample mean and a deviation from the sample ‘mean are uncorrelated. Let X1,...,%q be independent and identically distributed random variables having variance 0. 
Show that Cov(X; — X,X)=0 Example at Properties of Expectation Solution We have Cov(X; — ¥,X) = Cov(Xi,X) — Cov(¥,X) i¢ = = Cov| Xi, 0%) ] — Var) where the next-to-last equality uses the result of Example 4a and the final equality follows because 0 ifj #1 byindependence o? ifj=i since Var(X) = Although X and the deviation Xj — X are uncorrelated, they are not, in gen- ‘eral, independent. However, in the special case where the X; are normal random variables, it turns out that not only is ¥ independent of a single deviation, but itis independent of the entire sequence of deviations Xj — X,j = 1,...,n. This result will be established in Section 8, where we will also show that, inthis case, the sample mean and the sample variance S? are independent, with ( — 1)S?/o? having a chi-squared distribution with n — 1 degrees of freedom. (See Example 4a for the definition of S?.) . Consider m independent trials, each of which results in any of r possible outcomes, with probabilities pi,...,Pr, Dhan pi = 1. Ifwe let Niyi r, denote the number of the m trials that result in outcome /, then Ni,Nz,...,Ny have the multinomial distribution PUN, = My .065 Np m= Pt Pr nism ii For i # j, it seems likely that when N; is large, N; would tend to be small; hence, itis, intuitive that they should be negatively correlated. Let us compute their covariance by using Proposition 4.2(iv) and the representation M=Diw and N= DG mi im oa 1 iftrial k results in outcome i t= (Seen bees wo= {oss 335 336 Properties of Expectation From Proposition 4.2(iv), we have Cov(Ni, Nj) = >) Covtlitk, 4(6)) Akal Now, on the one hand, when k # £, Cov(li(k), h()) = 0 since the outcome of trial k is independent of the outcome of trial &. On the other hand, Covi(2),4() = EUCOGO] — EOE) = 0 — pipj = —Pipj where the equation uses the fact that 1:(£)1}(2) — 0, since trial £ cannot result in both ‘outcome i and outcome j. Hence, we obtain CovN.N) = ~mpipj ‘hich is in accord with our intuition mat 1; and IV; are negatively correlated. 5 Conditional Expectation Example Sa 5.1 Definitions If X and ¥ are jointly discrete random variables, then the conditional probability mass function of X, given that Y = y, is defined for all y such that PLY =y) > 0,by pay) = P(X =x|¥ =y) = POY? pxyw(ey) = PLX = x pro) Its therefore natural to define, n this case, the conditional expectation of X given that ¥ = y, forall values of y such that py()) > 0, by Dx =x1¥ =y1 = Dxvey) EIXIY = If X and ¥ are independent binomial random variables with identical parameters n and p, calculate the conditional expected value of X given that X + ¥ =m. Solution Let us first calculate the conditional probability mass function of X given that X + Y =m. For k = min(n,m), PUX =k, X + Y=m) PX + Yam) PIX = KIX + Y =m) Properties of Expectation (ero -ora yt gjrta = ora (Jona - pen where we have used the fact that X + Y is a binomial random variable with param- eters 2n and p. Hence, the conditional distribution of X, given that X + Y =m, is the hypergeometric distribution, and from Example 2g, we obtain EIXIX + ¥= Similarly, if X and ¥ are jointly continuous with a joint probability density func- tion ery), then the conditional probability density of X, given that Y = y, is defined for all values of y such that fy(y) > 0 by _ fey Savvy) Fro). It is natural, in this case, to define the conditional expectation of X, given that Y=y,by wo axv=y)=[ xfay (aly) dx provided that f(y) > 0. 
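Before turning to examples of conditional expectation, here is a quick simulation check of the multinomial covariance formula Cov(N_i, N_j) = -m p_i p_j derived at the end of the preceding section. The number of trials m, the cell probabilities, and the simulation size below are arbitrary choices.

```python
import random

def multinomial_covariance(m=20, p=(0.2, 0.3, 0.5), i=0, j=1, trials=100_000):
    """Estimate Cov(N_i, N_j), where N_k counts how many of m independent
    trials land on outcome k (outcome probabilities given by p)."""
    sum_i = sum_j = sum_ij = 0.0
    for _ in range(trials):
        outcomes = random.choices(range(len(p)), weights=p, k=m)
        ni, nj = outcomes.count(i), outcomes.count(j)
        sum_i += ni
        sum_j += nj
        sum_ij += ni * nj
    return sum_ij / trials - (sum_i / trials) * (sum_j / trials)

if __name__ == "__main__":
    m, p = 20, (0.2, 0.3, 0.5)
    print("simulated Cov(N1, N2) :", round(multinomial_covariance(m, p), 3))
    print("theoretical -m*p1*p2  :", -m * p[0] * p[1])
```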
Example Suppose that the joint density of X and ¥ is given by ewer y fay= O E[RIwin, S = (JQ, However, given that the initial sum is i, the number of additional rolls needed and the outcome (whether a win or a loss) are independent. (This is easily seen by first noting that conditional on an initial sum of i, the outcome is independent of the number of additional dice rolls needed and then using the symmetry property of independence, which states that if event A is independent of event B, then event B is independent of event A.) Therefore, E{Riwin] = > ELRIS =], = 2.938 Although we could determine E[Riplayer loses] exactly as we did E[Riplayer wins], its easier to use E{R] = E[Riwinlp + E{Rilose](1 — p) implying that F{Rilose| = RI = E1Riwinkp _ 3.801 . ‘The bivariate normal joint density function of the random variables X and Y is 2 = —— exp | | (tae)? y (ro 1 ame (ar (CS) + (4) — yp EDO = 2] a. We will now show that pis the correlation between X and ¥. As shown in Exam ple Se, ux = E[X],02 = Var(X), and y1y = E[Y],02 = Var(¥). Consequently, Com(X,¥) = AY XY] = sey amy To determine E[XY], we condition on Y. That is, we use the identity ELXY] = E[ELXYI¥]] Example 58 Properties of Expectation Recalling from Example Sd that the conditional distribution of X given that is normal with mean ji + p2(y — jy), we see that ELXY|Y = y] = = yur + pO — ny) Consequently, « EIXVIY] = Yux + p2(¥? ~ wy¥) % implying that EIXY] = [en + ey? — 9] oy = usFLY] + pEELY? ~ py¥] oy = many + 0% (FIV ~ 13) uly + pEVar(Y) = Matty + porey ‘Therefore, _ Corr(x, ¥) = 28 = p . Oy Sometimes E[X] is easy to compute, and we use the conditioning identity to compute a conditional expected value. This approach is illustrated by our next example. Consider n independent trials, each of which results in one of the outcomes 1,...,k, with respective probabilities p1,....Pks DeaiPi = 1. Let Nj denote the number of trials that result in outcome i, = 1,.....Fori # j,find (@)BINJIN, > 0] and) ELNINE > 1) Solution To solve (a), let Then BIN]] = EIN Oru or, equivalently, BIN)] = E[NjINi = OJP(N; = 0) + ELNjIN: > OJPIN; > 0) Now, the unconditional distribution of Nj is binomial with parameters np). Also, given that N; = r, each of the n — r trials that does not result in outcome i will, independently, result in outcome j with probability P(|not i) = 724. Consequently, Example Properties of Expectation the conditional distribution of Nj, given that N; = r, is binomial with parameters n= riqebe. Because PIN; 1 = pi). the preceding equation yields ny 1 = po" + BININ: > 0] — (1 ~ pa") giving the result a= por E[NjINi > 0] = npj 1-G- py ‘We can solve part (b) in a similar manner. Let 0, ifNj=0 J=}1, it% 2, ifN>1 Then IN|] = EIN} = OPU = 0) + ZIN}U = 1]PU =1) + EIN) 2\PU =2) or, equivalently, IN|] = BINJIN: = OPN; = 0} + ELNGIN: = TPN: + BINJIN; > 1PM: > 1) ‘This equation yields py =n = pit + (= 1B npict = pat! Pi Tp + EIN IN, > 110 ~ 0 = pa" = mpi ~ po") giving the result FINN: > 1) = MAL = @ = part = (ne = dpa = piv} m = 1 0 — pit — mpi — pi It is also possible to obtain the variance of a random variable by conditioning. We illustrate this approach by the following example. ‘Variance of the geometric distribution Independent trials, each resulting in a success with probability p, are successively performed. Let N be the time of the first success. Find Var(N). Solution Let ¥ = 1 ifthe first trial results in a success and ¥ = 0 otherwise. Now, ‘Var(N) = E[N?] — (EIN)? To calculate E[N?], we condi on ¥ as follows: EIN?) 
= E{E(N*1Y]] However, EIN?Y = 1] =1 E[N*|Y = 0] = Ed + N)*] Properties of Expectation ‘These two equations follow because, on the one hand, if the first trial results in a success, ten, clearly, V = 1; thus, V2 = 1. Un the other hand, ifthe hrst trial results in a failure, then the total number of trials necessary for the first success will have the same distribution as 1 (the first trial that results in failure) plus the necessary number of additional trials. Since the latter quantity has the same distribution as N, we obtain E[N*IY = 0] = E[(1 + N)?]. Hence, EIN? = E[N41Y = PLY = 1) + EINA1Y = ORLY = 0) + ~ pela + NY] = 1+ (1 ~ DERN + 4 However, EIN] = 1/p; therefore, Bin’) =1 + A=?) ? + (1 ~ pye{N?] or EIN?) 2-p Consequently, : Var(N) = E[N?] — (ELN)? -G) Consider a gambling situation in which there are r players. with player i initially having n; units, m; > 0, 1 = 1,7. At each stage, two of the players are chosen to play a game, with the winner of the game receiving I unit from the loser. Any player whose fortune drops to 0 is eliminated, and this continues until a single player has all n = S1_,m; units, with that player designated as the victor. Assuming that the results of successive games are independent and that each game is equally likely to bbe won by either of its two players, find the average number of stages until one of, the players has all units. Solution To find the expected number of stages played, suppose first that there are only 2 players, with players 1 and 2 initially having j and n — J units, respectively. Let Xj denote the number of stages that will be played, and let mj = E[X)]. Then, for j= Ayn — 1, Xa +A) where Aj is the additional number of stages needed beyond the first stage. Taking expectations gives mj =1+ EIA) Conditioning on the result of the first stage then yields ‘my =1 + E[A/IL wins frst stage]1/2 + E[Aj|2 wins first stage}1 /2 Now, if player 1 wins at the first stage, then the situation from that point on is exactly the same as in a problem that supposes that player / starts with j + 1 and player 2 withn — (j + 1) units. Consequently, E{Aj|I wins first stage] = mis 346 Properties of Expectation and, analogously, E{AjI2 wins first stage] = mj-1 Thus, 1 1 Te 5m + am4 or, equivalently, mje, = 2my — mj-1 2 Jd 1 (64) Using that mo = 0, the preceding equation yields my = 2m; - 2 m3 = 2my — my ~ 2=3m, — 6 = 30m ~ 2) mg = 2ms — mz ~ 2= 4m, — 12 =4(m ~ 3) suggesting that Sim — 741) 1” 6S) ‘To prove the preceding equality, we use mathematical induction. Since we've already shown the equation to be true for i = 1,2, we take as the induction hypothesis that it is true whenever i = j < n. Now we must prove that it is true for j + 1. Using Equation (5.4) yields Img =2mj — mya ~ 2 2ilm, =] + 1) G = Dim, = 7 + 2)—2 (by the induction hypothesis) =G+ dm, — oP ++ P—y+2-2 G+ 0m - 7 - G+ Dem - 9 which completes the induction proof of (5.5). Letting i = n in (5.5), and using that ‘mn, = 0, now yields that man 1 which, again using (5.5). gives the result main =) ‘Thus, the mean number of games played when there are only 2 players with initial amounts i and n — is the product of their initial amounts, Because both players play all stages, this is also the mean number of stages involving player 1. ‘Now let us return to the problem involving r players with initial amounts 7,4 Aves, Diam = n. Let X denote the number of stages needed to obtain a and let X; denote the number of stages involving player i. 
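Before returning to the r-player problem, the two-player result m_j = j(n - j) just derived is simple to confirm by simulating the fair game directly. The sketch below uses arbitrary initial fortunes.

```python
import random

def mean_stages(j, n, trials=50_000):
    """Simulate the expected number of fair one-unit games until one of two
    players (starting with j and n - j units) holds all n units."""
    total = 0
    for _ in range(trials):
        fortune, stages = j, 0
        while 0 < fortune < n:
            fortune += 1 if random.random() < 0.5 else -1  # fair single game
            stages += 1
        total += stages
    return total / trials

if __name__ == "__main__":
    j, n = 3, 10
    print("simulated mean stages:", round(mean_stages(j, n), 2))
    print("formula j*(n - j)    :", j * (n - j))
```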
Now, from the point of view of player i, starting with nj, he will continue to play stages, independently being equally likely to win or lose each one, until his fortune is either n or 0. Thus, the number of stages he plays is exactly the same as when he has a single opponent with an initial fortune of n — nj. Consequently, by the preceding result, it follows that AX n(n — ni) Example Properties of Expectation £133 | =m - mp a) a But because each stage involves two players, 1s x 2L% “Taking expectations now yields It is interesting to note that while our argument shows that the mean number of stages does not depend on the manner in which the teams are selected at each stage, not true for the distribution of the number of stages. To see this, suppose and n3 = 2. If players 7 and 2 are chosen in the first stage, then it will take at least three stages to determine a winner, whereas if player 3 is in the first stage, then itis possible for there to be only two stages. . In our next example, we use conditioning to verify that the expected number of uniform (0,1) random variables that need to be added for their sum to exceed 1 is equal toe. Let Uj, Uz,... be a sequence of independent uniform (0, 1) random variables. Find E{N] when Ne=min}m )°U; > | i Solution We will find E[N] by obtaining a more general result. For x « [0, 1} let N@) = and set m(x) = EIN@)] ‘That is, N(x) is the number of uniform (0, 1) random variables we must add until their sum exceeds x, and m(x) is its expected value, We will now derive an equation for m(x) by conditioning on Us. This gives, from Equation (5.1b), 1 ma) = f E[N(@ IU: = yldy (66) Now, wena ft cme HZ! 6 ‘The preceding formula is obviously true when y > x. It is also true when y = x, since, if the first uniform value is y, then, at that point, the remaining number of uniform random variables needed is the same as if we were just starting and were 347 348 Properties of Expectation going to add uniform random variables until their sum exceeded x — y, Substituting Equation (5.7) into Equation (5.6) gives mo =1 +f m(x — y)dy f * by letting +f mnta in Differentiating the preceding equation yields m'(x) = mx) or, equivalently, mx) mx) Integrating this equation gives loglm(x)] =x + ¢ m(x) = ke* ‘Since m(0) = 1, it follows that k = 1, so we obtain m(x) = ‘Therefore, m(1), the expected number of uniform (0, 1) random variables that need. to be added until their sum exceeds 1, is equal toe. . 5.3 Computing Probabilities by Conditioning ‘Not only can we obtain expectations by first conditioning on an appropriate random variable, but we can also use this approach to compute probabilities. To see this, let E denote an arbitrary event, and define the indicator random variable X by 1 it B occurs as 0 if Edoes not occur It follows from the definition of X that E[X] = P(E) EIXIY = y] = PCEIY = y) for any random variable ¥ ‘Therefore, from Equations (5.1) and (5.1), we obtain PE) = DPE = PY =)) it Vis discrete A (58) = [© peer =yfrondy it iscontinuous Note that if Y is a discrete random variable taking on one of the values y1,....Jns then by defining the events F;,i = 1,...,n, by Fi = (Y = yi), Equation (5.8) reduces to the familiar equation P(E) = > P(E\F)PCF)) where F,..., a are mutually exclusive events whose union isthe sample space. Example Properties of Expectation ‘The best-prize problem Suppose that we are to be presented with n distinct prizes, in sequence. 
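Before taking up the best-prize problem, here is a quick simulation check of the preceding result that, on average, e uniform (0, 1) random variables must be added for their sum to exceed 1. This is a minimal sketch with an arbitrary trial count.

```python
import math
import random

def mean_count_to_exceed_one(trials=200_000):
    """Estimate E[N], where N is the number of uniform(0, 1) variables
    that must be summed before the running total exceeds 1."""
    total = 0
    for _ in range(trials):
        running_sum, count = 0.0, 0
        while running_sum <= 1.0:
            running_sum += random.random()
            count += 1
        total += count
    return total / trials

if __name__ == "__main__":
    print("simulated E[N]:", round(mean_count_to_exceed_one(), 4))
    print("e             :", round(math.e, 4))
```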
After being presented with a prize, we must immediately decide whether to accept it or to reject, it and consider the next prize. The only information we are given when deciding ‘whether to accept a prize is the relative rank of that prize compared to ones already seen. That is for instance, when the fifth prize is presented, we learn how it compares with the four prizes we've already seen. Suppose that once a prize is rejected, itis lost, and that our objective is to maximize the probability of obtaining the best prize. ‘Assuming that all n! orderings of the prizes are equally likely, how well can we do? Solution Rather surprisingly, we can do quite well. To see this, fix a value k, 0 = k PuibestiX = )PK =) DY Palbestix =i), tai Now, on the one hand, if the overall best prize is among the first k, then no prize is ever selected under the strategy considered. That is, Py(best|X =) =0 ifisk On the other hand, if the best prize isin position i, where i > k, then the best prize will be selected if the best of the first / — 1 prizes is among the first k (for then none of the prizes in positions k + 1,k + 2,...,i — 1 would be selected). But, conditional on the best prize being in position i itis easy to verify that all possible orderings of the other prizes remain equally likely, which implies that each of the first i — 1 prizes is equally likely to be the best of that batch. Hence, we have Py(best|X = P(best of first i — 1 is among the first KIX = i) ifi>k Now, if we consider the function ae) 349 350 Example Properties of Expectation then ge) = htop (2) -i %0 10) =0=>log(2) <1-92=2 ‘Thus, since Py(best) ~ g(k), we sce that the best strategy of the type considered is to let the first n/e prizes go by and then accept the first one to appear that is better than all of those, In addition, since g(n/e) = 1/e, the probability that this strategy selects the best prize is approximately 1/e ~ 36788. Remark Most people are quite surprised by the size of the probability of obtaining the best prize, thinking that this probability would be close to 0 when m is large. How ever, even without going through the calculations, a little thought reveals that the probability of obtaining the best prize can be made reasonably large. Consider the strategy of letting half of the prizes go by and then selecting the first one to appear that is better than all of those, The probability that a prize is actually selected is the probability that the overall best is among the second half, and this is}. In addition, sven that a prize is selected, at the time of selection that prize would have been the best of more than n/2 prizes to have appeared and would thus have probability of at least } of being the overall best. Hence, the strategy of leting the fist half of all prizes go by and then accepting the first one that is better than all of those prizes has a probability greater than } of obtaining the best prize. . Let U be a uniform random variable on (0, 1), and suppose that the conditional distribution of X, given that U = p, is binomial with parameters n and p. Find the probability mass function of X. Solution Conditioning on the value of U gives 1 pix=a= f P(X = iJU = pifup) dp 1 frau = nap 5 nl 1 = att f #0 ~ vee i — di Jo ‘Now, it can be shown that 1 aun = a f pil — pylap = 2 =O! 
Hence, we obtain 1 oO n+ That is, we obtain the surprising result that if a coin whose probability of coming up heads is uniformly distributed over (0, 1) is flipped n times, then the number of heads occurring is equally likely to be any of the values 0,...,7. Because the preceding conditional distribution has such a nice form, itis worth trying to find another argument to enhance our intuition as to why such a result is true. To do so, let U,Ui,...,Un be m + 1 independent uniform (0, 1) random Example Example Properties of Expectation variables, and let X denote the number of the random variables Uj,..., Uy that are smaller than J. Since all the random variables U, Ui,...,Un Nave the same aistri- bution, it follows that U is equally likely to be the smallest, or second smallest, or largest of them; so X is equally likely to be any of the values 0,1,...,7. However, given that U = p, the number of the U; that are less than U is a binomial random variable with parameters n and p, thus establishing our previous result. . Suppose that X and ¥ are independent continuous random variables having densi- ties fy and fy, respectively. Compute P(X < Y}, Solution Conditioning on the value of Y yields PIX <¥) = [ope < YIY =ylfvondy = [Lea ( i) eka = py a =e +1- py ‘where the last equality follows from the binomial theorem. Differentiation yields MQ =n¢pe + 1 — py "pe Thus, FX] Differentiating a second time yields M"() = n(n — 1(pel + 1 = py" (pel? + mipe! + 1 — pyle! MO np so E[X?] = M"0) = n(n — Vp? + np ‘The variance of X is given by Var(X) = E[X?] — (ELX)? (n — Lp? + np — rep? =np(l — p) verifying the result obtained previously in the chapter. Example Poisson distribution with mean 1. If X is a Poisson random variable with parameter 2, then Mi = Ee*] © gng-hyn so Gey" LH Differentiation yields Mo Mo (ae'Pexpla(e! — 1)} + Aetexptate’ — D} Properties of Expectation E[X] = M0) E[X*] = M"(0) = 2? + Var(X) = ELX?] = (E[X)? Example Mo = Fe = [oremas fb aa [omar o 3 = fort 0 expan’ D) A Geometric with parameter O=pst Negative (to t)pa-e [ pe 1 1 2 al sie T-a-pe nrthy Another important result is that the moment generating function uniquely deter- mines the distribution. That is, if M() exists and is finite in some region about f = 0, then the distribution of X is uniquely determined. For instance, if 0 meo=(5) @ +0, then it follows from Table 1 that X is a binomial random variable with parameters Wand}, ‘Suppose that the moment generating function of a random variable X is given by M@ = &€-D, What is P(X = 0)? Solution We see from Table 1 that M() = €%'~ is the moment generating fune- tion of a Poisson random variable with mean 3. Hence, by the one-to-one correspon- dence between moment generating functions and distribution functions, it follows that X must be a Poisson random variable with mean 3. Thus, P(X = 0) =e". ‘Sums of independent binomial random variables If X and Y are independent binomial random variables with parameters (n, p) and (om, p), respectively, what isthe distribution of X + Y? Solution The moment generating function of X + is given by Mxsy(6) = Mx()My() = (pel + 1 — py"(pel + 1 = py™ = (pe +1 - pyr However, (pe! + 1 — p)"*® is the moment generating function of a binomial ran dom variable having parameters m + n and p. Thus, this must be the distriby of X + ¥. Properties of Expectation ) °. (on) eo {ee ofl x wore 9 yen tL = eoy | samme sion wv _ ors o 0< 9) x x (2) (oa }=Cos | szoronmared gous verses) . 
Table 2 lists, for the common continuous probability distributions, the probability density function f(x), the moment generating function M(t), the mean, and the variance of each.

Example 7e  Sums of independent Poisson random variables. Calculate the distribution of X + Y when X and Y are independent Poisson random variables with means λ_1 and λ_2, respectively.

Solution

M_{X+Y}(t) = M_X(t)M_Y(t) = exp{λ_1(e^t − 1)} exp{λ_2(e^t − 1)} = exp{(λ_1 + λ_2)(e^t − 1)}

Hence, X + Y is Poisson distributed with mean λ_1 + λ_2.

Example  Sums of independent normal random variables. Show that if X and Y are independent normal random variables with respective parameters (μ_1, σ_1^2) and (μ_2, σ_2^2), then X + Y is normal with mean μ_1 + μ_2 and variance σ_1^2 + σ_2^2.

Solution

M_{X+Y}(t) = M_X(t)M_Y(t) = exp{σ_1^2 t^2/2 + μ_1 t} exp{σ_2^2 t^2/2 + μ_2 t} = exp{(σ_1^2 + σ_2^2)t^2/2 + (μ_1 + μ_2)t}

which is the moment generating function of a normal random variable with mean μ_1 + μ_2 and variance σ_1^2 + σ_2^2. The desired result then follows because the moment generating function uniquely determines the distribution.

Example 7i  Compute the moment generating function of a chi-squared random variable with n degrees of freedom.

Solution  We can represent such a random variable as

Z_1^2 + ··· + Z_n^2

where Z_1, ..., Z_n are independent standard normal random variables. Let M(t) be its moment generating function. Then, by the preceding,

M(t) = (E[e^{tZ^2}])^n

where Z is a standard normal random variable. Now,

E[e^{tZ^2}] = (1/√(2π)) ∫_{−∞}^{∞} e^{tx^2} e^{−x^2/2} dx
            = (1/√(2π)) ∫_{−∞}^{∞} e^{−x^2/2σ^2} dx    where σ^2 = (1 − 2t)^{−1}
            = σ
            = (1 − 2t)^{−1/2}

where the next-to-last equality uses the fact that the normal density with mean 0 and variance σ^2 integrates to 1. Therefore,

M(t) = (1 − 2t)^{−n/2}

Example  Moment generating function of the sum of a random number of random variables. Let X_1, X_2, ... be a sequence of independent and identically distributed random variables, and let N be a nonnegative, integer-valued random variable that is independent of the sequence X_i, i ≥ 1. We want to compute the moment generating function of

Y = Σ_{i=1}^{N} X_i

(In Example 5d, Y was interpreted as the amount of money spent in a store on a given day when both the amount spent by a customer and the number of customers are random variables.)

To compute the moment generating function of Y, we first condition on N as follows:

E[exp{t Σ_{i=1}^{N} X_i} | N = n] = E[exp{t Σ_{i=1}^{n} X_i} | N = n] = E[exp{t Σ_{i=1}^{n} X_i}] = [M_X(t)]^n

where M_X(t) = E[e^{tX}]. Hence,

E[e^{tY} | N] = (M_X(t))^N

Thus,

M_Y(t) = E[(M_X(t))^N]

The moments of Y can now be obtained upon differentiation, as follows:

M_Y′(t) = E[N(M_X(t))^{N−1} M_X′(t)]

so

E[Y] = M_Y′(0) = E[N(M_X(0))^{N−1} M_X′(0)] = E[N E[X]] = E[N] E[X]    (7.2)

verifying the result of Example 5d. (In this last set of equalities, we have used the fact that M_X(0) = E[e^{0·X}] = 1.)

Also,

M_Y″(t) = E[N(N − 1)(M_X(t))^{N−2}(M_X′(t))^2 + N(M_X(t))^{N−1} M_X″(t)]

so

E[Y^2] = M_Y″(0)
       = E[N(N − 1)(E[X])^2 + N E[X^2]]
       = (E[X])^2 (E[N^2] − E[N]) + E[N] E[X^2]    (7.3)
       = E[N](E[X^2] − (E[X])^2) + (E[X])^2 E[N^2]
       = E[N] Var(X) + (E[X])^2 E[N^2]

Hence, from Equations (7.2) and (7.3), we have

Var(Y) = E[N] Var(X) + (E[X])^2 (E[N^2] − (E[N])^2) = E[N] Var(X) + (E[X])^2 Var(N)
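The compound-sum variance formula Var(Y) = E[N]Var(X) + (E[X])^2 Var(N) is straightforward to check by simulation. The sketch below is ours, not the text's; the specific choices N ~ Poisson(4), X_i exponential with mean 2, the seed, and the use of NumPy are illustrative assumptions only.

```python
import numpy as np

# Monte Carlo check of Var(Y) = E[N] Var(X) + (E[X])^2 Var(N)
# for the compound sum Y = X_1 + ... + X_N.
rng = np.random.default_rng(0)
lam, mean_x, trials = 4.0, 2.0, 200_000      # N ~ Poisson(lam), X_i ~ Exponential(mean_x)

n = rng.poisson(lam, size=trials)
y = np.array([rng.exponential(mean_x, size=k).sum() for k in n])

e_n, var_n = lam, lam                        # Poisson: mean = variance = lam
e_x, var_x = mean_x, mean_x ** 2             # exponential: variance = mean^2
print(y.var())                               # simulated Var(Y)
print(e_n * var_x + e_x ** 2 * var_n)        # formula: 4*4 + 4*4 = 32
```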
Example  Let Y denote a uniform random variable on (0, 1), and suppose that, conditional on Y = p, the random variable X has a binomial distribution with parameters n and p. In Example 5k, we showed that X is equally likely to take on any of the values 0, 1, ..., n. Establish this result by using moment generating functions.

Solution  To compute the moment generating function of X, start by conditioning on the value of Y. Using the formula for the binomial moment generating function gives

E[e^{tX} | Y = p] = (pe^t + 1 − p)^n

Now, Y is uniform on (0, 1), so, upon taking expectations, we obtain

E[e^{tX}] = ∫_0^1 (pe^t + 1 − p)^n dp
          = (1/(e^t − 1)) ∫_1^{e^t} y^n dy    (by the substitution y = pe^t + 1 − p)
          = (1/(n + 1)) · (e^{t(n+1)} − 1)/(e^t − 1)
          = (1/(n + 1)) (1 + e^t + e^{2t} + ··· + e^{nt})

Because the preceding is the moment generating function of a random variable that is equally likely to be any of the values 0, 1, ..., n, the desired result follows from the fact that the moment generating function of a random variable uniquely determines its distribution.

7.1 Joint Moment Generating Functions

It is also possible to define the joint moment generating function of two or more random variables. This is done as follows: For any n random variables X_1, ..., X_n, the joint moment generating function, M(t_1, ..., t_n), is defined, for all real values of t_1, ..., t_n, by

M(t_1, ..., t_n) = E[e^{t_1 X_1 + ··· + t_n X_n}]

The individual moment generating functions can be obtained from M(t_1, ..., t_n) by letting all but one of the t_j's be 0. That is,

M_{X_i}(t) = E[e^{tX_i}] = M(0, ..., 0, t, 0, ..., 0)

where the t is in the ith place.

It can be proven (although the proof is too advanced for this chapter) that the joint moment generating function M(t_1, ..., t_n) uniquely determines the joint distribution of X_1, ..., X_n. This result can then be used to prove that the n random variables X_1, ..., X_n are independent if and only if

M(t_1, ..., t_n) = M_{X_1}(t_1) ··· M_{X_n}(t_n)    (7.4)

For the proof in one direction, if the n random variables are independent, then

M(t_1, ..., t_n) = E[e^{t_1 X_1 + ··· + t_n X_n}] = E[e^{t_1 X_1} ··· e^{t_n X_n}] = E[e^{t_1 X_1}] ··· E[e^{t_n X_n}]    (by independence)
                 = M_{X_1}(t_1) ··· M_{X_n}(t_n)

For the proof in the other direction, if Equation (7.4) is satisfied, then the joint moment generating function M(t_1, ..., t_n) is the same as the joint moment generating function of n independent random variables, the ith of which has the same distribution as X_i. As the joint moment generating function uniquely determines the joint distribution, this must be the joint distribution; hence, the random variables are independent.

Example 7m  Let X and Y be independent normal random variables, each with mean μ and variance σ^2. Let us establish that X + Y and X − Y are independent by computing their joint moment generating function:

E[e^{t(X+Y) + s(X−Y)}] = E[e^{(t+s)X + (t−s)Y}]
                       = E[e^{(t+s)X}] E[e^{(t−s)Y}]
                       = e^{μ(t+s) + σ^2(t+s)^2/2} e^{μ(t−s) + σ^2(t−s)^2/2}
                       = e^{2μt + σ^2 t^2} e^{σ^2 s^2}

But we recognize the preceding as the joint moment generating function of the sum of a normal random variable with mean 2μ and variance 2σ^2 and an independent normal random variable with mean 0 and variance 2σ^2. Because the joint moment generating function uniquely determines the joint distribution, it follows that X + Y and X − Y are independent normal random variables.
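A quick numerical illustration of Example 7m: for jointly normal random variables, zero correlation implies independence, so it is enough to see that X + Y and X − Y are uncorrelated and have the stated means and variances. The sketch below is ours, not the text's; the values μ = 1, σ = 2, the seed, and the use of NumPy are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, size = 1.0, 2.0, 500_000
x = rng.normal(mu, sigma, size)
y = rng.normal(mu, sigma, size)

s, d = x + y, x - y
print(np.corrcoef(s, d)[0, 1])    # close to 0
print(s.mean(), s.var())          # close to 2*mu = 2 and 2*sigma^2 = 8
print(d.mean(), d.var())          # close to 0 and 2*sigma^2 = 8
```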
Example  Suppose that the number of events that occur is a Poisson random variable with mean λ and that each event is independently counted with probability p. Show that the number of counted events and the number of uncounted events are independent Poisson random variables with respective means λp and λ(1 − p).

Solution  Let X denote the total number of events, and let X_c denote the number of them that are counted. To compute the joint moment generating function of X_c, the number of events that are counted, and X − X_c, the number that are uncounted, start by conditioning on X to obtain

E[e^{sX_c + t(X − X_c)} | X = n] = e^{tn} E[e^{(s−t)X_c} | X = n] = e^{tn} (pe^{s−t} + 1 − p)^n = (pe^s + (1 − p)e^t)^n

which follows because, conditional on X = n, X_c is a binomial random variable with parameters n and p. Hence,

E[e^{sX_c + t(X − X_c)} | X] = (pe^s + (1 − p)e^t)^X

Taking expectations of both sides of this equation yields

E[e^{sX_c + t(X − X_c)}] = E[(pe^s + (1 − p)e^t)^X]

Now, since X is Poisson with mean λ, it follows that E[e^{tX}] = e^{λ(e^t − 1)}. Therefore, for any positive value a, we see (by letting a = e^t) that E[a^X] = e^{λ(a − 1)}. Thus,

E[e^{sX_c + t(X − X_c)}] = e^{λ(pe^s + (1−p)e^t − 1)} = e^{λp(e^s − 1)} e^{λ(1−p)(e^t − 1)}

As the preceding is the joint moment generating function of independent Poisson random variables with respective means λp and λ(1 − p), the result is proven.

8 Additional Properties of Normal Random Variables

8.1 The Multivariate Normal Distribution

Let Z_1, ..., Z_n be a set of n independent unit normal random variables. If, for some constants a_{ij}, 1 ≤ i ≤ m, 1 ≤ j ≤ n, and μ_i, 1 ≤ i ≤ m,

X_1 = a_{11}Z_1 + ··· + a_{1n}Z_n + μ_1
X_2 = a_{21}Z_1 + ··· + a_{2n}Z_n + μ_2
  ⋮
X_i = a_{i1}Z_1 + ··· + a_{in}Z_n + μ_i
  ⋮
X_m = a_{m1}Z_1 + ··· + a_{mn}Z_n + μ_m

then the random variables X_1, ..., X_m are said to have a multivariate normal distribution.

From the fact that the sum of independent normal random variables is itself a normal random variable, it follows that each X_i is a normal random variable with mean and variance given, respectively, by

E[X_i] = μ_i,    Var(X_i) = Σ_{j=1}^{n} a_{ij}^2

Let us now consider

M(t_1, ..., t_m) = E[exp{t_1 X_1 + ··· + t_m X_m}]

the joint moment generating function of X_1, ..., X_m. The first thing to note is that since Σ_{i=1}^{m} t_i X_i is itself a linear combination of the independent normal random variables Z_1, ..., Z_n, it is also normally distributed. Its mean and variance are

E[Σ_{i=1}^{m} t_i X_i] = Σ_{i=1}^{m} t_i μ_i

and

Var(Σ_{i=1}^{m} t_i X_i) = Cov(Σ_{i=1}^{m} t_i X_i, Σ_{j=1}^{m} t_j X_j) = Σ_{i=1}^{m} Σ_{j=1}^{m} t_i t_j Cov(X_i, X_j)

Now, if Y is a normal random variable with mean μ and variance σ^2, then

E[e^Y] = M_Y(t)|_{t=1} = e^{μ + σ^2/2}

Thus,

M(t_1, ..., t_m) = exp{Σ_{i=1}^{m} t_i μ_i + (1/2) Σ_{i=1}^{m} Σ_{j=1}^{m} t_i t_j Cov(X_i, X_j)}

which shows that the joint distribution of X_1, ..., X_m is completely determined from a knowledge of the values of E[X_i] and Cov(X_i, X_j), i, j = 1, ..., m.

It can be shown that when m = 2, the multivariate normal distribution reduces to the bivariate normal.

Example  Find P{X < Y} for bivariate normal random variables X and Y having parameters

μ_x = E[X],  μ_y = E[Y],  σ_x^2 = Var(X),  σ_y^2 = Var(Y),  ρ = Corr(X, Y)

Solution  Because X − Y is normal with mean

E[X − Y] = μ_x − μ_y

and variance

Var(X − Y) = Var(X) + Var(−Y) + 2Cov(X, −Y) = σ_x^2 + σ_y^2 − 2ρσ_xσ_y

we obtain

P{X < Y} = P{X − Y < 0}
         = P{ (X − Y − (μ_x − μ_y))/√(σ_x^2 + σ_y^2 − 2ρσ_xσ_y) < −(μ_x − μ_y)/√(σ_x^2 + σ_y^2 − 2ρσ_xσ_y) }
         = Φ( (μ_y − μ_x)/√(σ_x^2 + σ_y^2 − 2ρσ_xσ_y) )
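The closed form just derived is easy to sanity-check against a direct simulation of a bivariate normal pair. The sketch below is ours, not the text's; the parameter values, the seed, and the use of NumPy and math.erf are illustrative assumptions.

```python
import math
import numpy as np

mu_x, mu_y, sx, sy, rho = 1.0, 1.5, 2.0, 1.0, 0.3   # arbitrary illustrative parameters

def std_normal_cdf(z):
    """Phi(z) computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

denom = math.sqrt(sx**2 + sy**2 - 2 * rho * sx * sy)
print(std_normal_cdf((mu_y - mu_x) / denom))        # closed-form P{X < Y}

rng = np.random.default_rng(2)
cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=500_000)
print((xy[:, 0] < xy[:, 1]).mean())                 # simulated estimate; should agree
```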
Example  Suppose that the conditional distribution of X, given that Θ = θ, is normal with mean θ and variance 1. Moreover, suppose that Θ itself is a normal random variable with mean μ and variance σ^2. Find the conditional distribution of Θ given that X = x.

Solution  Rather than using and then simplifying Bayes's formula, we will solve this problem by first showing that X, Θ has a bivariate normal distribution. To do so, note that the joint density function of X, Θ can be written as

f_{X,Θ}(x, θ) = f_{X|Θ}(x|θ) f_Θ(θ)

where f_{X|Θ}(x|θ) is a normal density with mean θ and variance 1. However, if we let Z be a standard normal random variable that is independent of Θ, then the conditional distribution of Z + Θ, given that Θ = θ, is also normal with mean θ and variance 1. Consequently, the joint density of Z + Θ, Θ is the same as that of X, Θ. Because the former joint density is clearly bivariate normal (since Z + Θ and Θ are both linear combinations of the independent normal random variables Z and Θ), it follows that X, Θ has a bivariate normal distribution. Now,

E[X] = E[Z + Θ] = μ
Var(X) = Var(Z + Θ) = 1 + σ^2

and

ρ = Corr(X, Θ) = Corr(Z + Θ, Θ) = Cov(Z + Θ, Θ)/√(Var(Z + Θ)Var(Θ)) = σ^2/(σ√(1 + σ^2)) = σ/√(1 + σ^2)

Because X, Θ has a bivariate normal distribution, the conditional distribution of Θ, given that X = x, is normal with mean

E[Θ | X = x] = E[Θ] + ρ √(Var(Θ)/Var(X)) (x − E[X]) = μ + (σ^2/(1 + σ^2))(x − μ)

and variance

Var(Θ | X = x) = Var(Θ)(1 − ρ^2) = σ^2/(1 + σ^2)

8.2 The Joint Distribution of the Sample Mean and Sample Variance

Let X_1, ..., X_n be independent normal random variables, each with mean μ and variance σ^2. Let X̄ = Σ_{i=1}^{n} X_i/n denote their sample mean. Since the sum of independent normal random variables is also a normal random variable, it follows that X̄ is a normal random variable with (from Examples 2c and 4a) expected value μ and variance σ^2/n.

Now, recall from Example 4e that

Cov(X̄, X_i − X̄) = 0,    i = 1, ..., n    (8.1)

Also, note that since X̄, X_1 − X̄, X_2 − X̄, ..., X_n − X̄ are all linear combinations of the independent standard normals (X_i − μ)/σ, i = 1, ..., n, it follows that X̄, X_i − X̄, i = 1, ..., n has a joint distribution that is multivariate normal. If we let Y be a normal random variable, with mean μ and variance σ^2/n, that is independent of the X_i, i = 1, ..., n, then Y, X_i − X̄, i = 1, ..., n also has a multivariate normal distribution and, indeed, because of Equation (8.1), has the same expected values and covariances as the random variables X̄, X_i − X̄, i = 1, ..., n. But since a multivariate normal distribution is determined completely by its expected values and covariances, it follows that Y, X_i − X̄, i = 1, ..., n and X̄, X_i − X̄, i = 1, ..., n have the same joint distribution, thus showing that X̄ is independent of the sequence of deviations X_i − X̄, i = 1, ..., n.

Since X̄ is independent of the sequence of deviations X_i − X̄, i = 1, ..., n, it is also independent of the sample variance S^2 ≡ Σ_{i=1}^{n} (X_i − X̄)^2/(n − 1).

Since we already know that X̄ is normal with mean μ and variance σ^2/n, it remains only to determine the distribution of S^2. To accomplish this, recall, from Example 4a, the algebraic identity

(n − 1)S^2 = Σ_{i=1}^{n} (X_i − X̄)^2 = Σ_{i=1}^{n} (X_i − μ)^2 − n(X̄ − μ)^2

Upon dividing the preceding equation by σ^2, we obtain

(n − 1)S^2/σ^2 + ((X̄ − μ)/(σ/√n))^2 = Σ_{i=1}^{n} ((X_i − μ)/σ)^2    (8.2)

Now,

Σ_{i=1}^{n} ((X_i − μ)/σ)^2

is the sum of the squares of n independent standard normal random variables and so is a chi-squared random variable with n degrees of freedom. Hence, from Example 7i, its moment generating function is (1 − 2t)^{−n/2}. Also, because

((X̄ − μ)/(σ/√n))^2

is the square of a standard normal variable, it is a chi-squared random variable with 1 degree of freedom, and so has moment generating function (1 − 2t)^{−1/2}. Now, we have seen previously in the chapter that the two random variables on the left side of Equation (8.2) are independent. Hence, as the moment generating function of the sum of independent random variables is equal to the product of their individual moment generating functions, we have

E[e^{t(n−1)S^2/σ^2}] (1 − 2t)^{−1/2} = (1 − 2t)^{−n/2}

or

E[e^{t(n−1)S^2/σ^2}] = (1 − 2t)^{−(n−1)/2}

But as (1 − 2t)^{−(n−1)/2} is the moment generating function of a chi-squared random variable with n − 1 degrees of freedom, we can conclude, since the moment generating function uniquely determines the distribution of the random variable, that that is the distribution of (n − 1)S^2/σ^2.

Summing up, we have shown the following.

Proposition 8.1  If X_1, ..., X_n are independent and identically distributed normal random variables with mean μ and variance σ^2, then the sample mean X̄ and the sample variance S^2 are independent. X̄ is a normal random variable with mean μ and variance σ^2/n; (n − 1)S^2/σ^2 is a chi-squared random variable with n − 1 degrees of freedom.
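Proposition 8.1 lends itself to a quick simulation check: across many normal samples, the sample mean and sample variance should be uncorrelated, and (n − 1)S^2/σ^2 should have the chi-squared mean n − 1 and variance 2(n − 1). The sketch below is ours, not the text's; the values μ = 3, σ = 2, n = 10, the seed, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 3.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)          # sample variance with divisor n - 1

print(np.corrcoef(xbar, s2)[0, 1])        # close to 0, consistent with independence
scaled = (n - 1) * s2 / sigma**2
print(scaled.mean(), scaled.var())        # close to n - 1 = 9 and 2(n - 1) = 18
```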
9 General Definition of Expectation

Up to this point, we have defined expectations only for discrete and continuous random variables. However, there also exist random variables that are neither discrete nor continuous, and they, too, may possess an expectation. As an example of such a random variable, let X be a Bernoulli random variable with parameter p = 1/2, and let Y be a uniformly distributed random variable over the interval (0, 1]. Furthermore, suppose that X and Y are independent, and define the new random variable W by

W = { X  if X = 1
    { Y  if X ≠ 1

Clearly, W is neither a discrete (since its set of possible values, (0, 1], is uncountable) nor a continuous (since P{W = 1} = 1/2) random variable.

In order to define the expectation of an arbitrary random variable, we require the notion of a Stieltjes integral. For any function g, ∫_a^b g(x) dx is defined by

∫_a^b g(x) dx = lim Σ_{i=1}^{n} g(x_i)(x_i − x_{i−1})

where the limit is taken over all a = x_0 < x_1 < x_2 < ··· < x_n = b as n → ∞ and max_i (x_i − x_{i−1}) → 0.

For any distribution function F, we define the Stieltjes integral of the nonnegative function g over the interval [a, b] by

∫_a^b g(x) dF(x) = lim Σ_{i=1}^{n} g(x_i)[F(x_i) − F(x_{i−1})]

where, as before, the limit is taken over all a = x_0 < x_1 < ··· < x_n = b as n → ∞ and max_i (x_i − x_{i−1}) → 0. Further, we define the Stieltjes integral over the whole real line by

∫_{−∞}^{∞} g(x) dF(x) = lim_{a→−∞, b→+∞} ∫_a^b g(x) dF(x)

Finally, if g is not a nonnegative function, we define g^+ and g^− by

g^+(x) = { g(x)  if g(x) ≥ 0          g^−(x) = { 0      if g(x) ≥ 0
         { 0     if g(x) < 0                    { −g(x)  if g(x) < 0

Because g(x) = g^+(x) − g^−(x) and g^+ and g^− are both nonnegative functions, it is natural to define

∫_{−∞}^{∞} g(x) dF(x) = ∫_{−∞}^{∞} g^+(x) dF(x) − ∫_{−∞}^{∞} g^−(x) dF(x)

and we say that ∫_{−∞}^{∞} g(x) dF(x) exists as long as ∫_{−∞}^{∞} g^+(x) dF(x) and ∫_{−∞}^{∞} g^−(x) dF(x) are not both equal to +∞.

If X is an arbitrary random variable having cumulative distribution F, we define the expected value of X by

E[X] = ∫_{−∞}^{∞} x dF(x)    (9.1)

It can be shown that if X is a discrete random variable with mass function p(x), then

∫_{−∞}^{∞} x dF(x) = Σ_{x: p(x)>0} x p(x)

whereas if X is a continuous random variable with density function f(x), then

∫_{−∞}^{∞} x dF(x) = ∫_{−∞}^{∞} x f(x) dx

The reader should note that Equation (9.1) yields an intuitive definition of E[X]; consider the approximating sum

Σ_{i=1}^{n} x_i [F(x_i) − F(x_{i−1})]

of E[X]. Because F(x_i) − F(x_{i−1}) is just the probability that X will be in the interval (x_{i−1}, x_i], the approximating sum multiplies the approximate value of X when it is in the interval (x_{i−1}, x_i] by the probability that it will be in that interval and then sums over all the intervals. Clearly, as these intervals get smaller and smaller in length, we obtain the "expected value" of X.

Stieltjes integrals are mainly of theoretical interest because they yield a compact way of defining and dealing with the properties of expectation. For instance, the use of Stieltjes integrals avoids the necessity of having to give separate statements and proofs of theorems for the continuous and the discrete cases. However, their properties are very much the same as those of ordinary integrals, and all of the proofs presented in this chapter can easily be translated into proofs in the general case.
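The approximating sum in Equation (9.1) can be evaluated numerically for the mixed random variable W above, whose distribution function is F(w) = w/2 for 0 ≤ w < 1 with a jump of size 1/2 at w = 1, giving E[W] = (1/2)(1) + (1/2)(1/2) = 3/4. The sketch below is ours, not the text's; the grid choice and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def F(w):
    """Distribution function of W: uniform part with weight 1/2, jump of 1/2 at w = 1."""
    if w < 0.0:
        return 0.0
    if w < 1.0:
        return 0.5 * w
    return 1.0

# Riemann-Stieltjes approximating sum:  sum_i x_i [F(x_i) - F(x_{i-1})]
grid = np.linspace(-0.5, 1.5, 20_001)
approx = sum(x1 * (F(x1) - F(x0)) for x0, x1 in zip(grid[:-1], grid[1:]))
print(approx)      # approaches 0.75 as the partition is refined
```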
Summary

If X and Y have a joint probability mass function p(x, y), then

E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y)

whereas if they have a joint density function f(x, y), then

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy

A consequence of the preceding equations is that

E[X + Y] = E[X] + E[Y]

which generalizes to

E[Σ_{i=1}^{n} X_i] = Σ_{i=1}^{n} E[X_i]

The covariance between random variables X and Y is given by

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

A useful identity is

Cov(Σ_{i=1}^{n} X_i, Σ_{j=1}^{m} Y_j) = Σ_{i=1}^{n} Σ_{j=1}^{m} Cov(X_i, Y_j)

When n = m and Y_i = X_i, i = 1, ..., n, the preceding formula gives

Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var(X_i) + 2 ΣΣ_{i<j} Cov(X_i, X_j)

The correlation between X and Y, denoted by ρ(X, Y), is defined by

ρ(X, Y) = Cov(X, Y)/√(Var(X)Var(Y))

If X and Y are jointly discrete random variables, then the conditional expected value of X, given that Y = y, is defined by

E[X | Y = y] = Σ_x x P{X = x | Y = y}

If X and Y are jointly continuous random variables, then

E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx

where f_{X|Y}(x|y) is the conditional density of X given that Y = y. Conditional expectations, which are similar to ordinary expectations except that all probabilities are now computed conditional on the event that Y = y, satisfy all the properties of ordinary expectations.

Let E[X|Y] denote that function of Y whose value at Y = y is E[X|Y = y]. A very useful identity is

E[X] = E[E[X|Y]]

In the case of discrete random variables, this equation reduces to the identity

E[X] = Σ_y E[X | Y = y] P{Y = y}

and, in the continuous case, to

E[X] = ∫_{−∞}^{∞} E[X | Y = y] f_Y(y) dy

The preceding equations can often be applied to obtain E[X] by first "conditioning" on the value of some other random variable Y. In addition, since, for any event A, P(A) = E[I_A], where I_A is 1 if A occurs and 0 otherwise, we can use the same equations to compute probabilities.

The conditional variance of X, given that Y = y, is defined by

Var(X | Y = y) = E[(X − E[X | Y = y])^2 | Y = y]

Let Var(X|Y) be that function of Y whose value at Y = y is Var(X|Y = y). The following is known as the conditional variance formula:

Var(X) = E[Var(X|Y)] + Var(E[X|Y])

Suppose that the random variable X is to be observed and, on the basis of its value, one must then predict the value of the random variable Y. In such a situation, it turns out that among all predictors, E[Y|X] has the smallest expectation of the square of the difference between it and Y.

The moment generating function of the random variable X is defined by

M(t) = E[e^{tX}]

The moments of X can be obtained by successively differentiating M(t) and then evaluating the resulting quantity at t = 0. Specifically, we have

E[X^n] = (d^n/dt^n) M(t) |_{t=0},    n = 1, 2, ...

Two useful results concerning moment generating functions are, first, that the moment generating function uniquely determines the distribution function of the random variable and, second, that the moment generating function of the sum of independent random variables is equal to the product of their individual moment generating functions.
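As a closing illustration of the conditional variance formula Var(X) = E[Var(X|Y)] + Var(E[X|Y]) stated above, the sketch below (ours, not the text's; it assumes NumPy and reuses the chapter's example in which Y is uniform on (0, 1) and X given Y = p is binomial with parameters n and p) compares both sides by simulation.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 20, 500_000

p = rng.uniform(0.0, 1.0, reps)        # Y ~ Uniform(0, 1)
x = rng.binomial(n, p)                 # X | Y = p  ~  binomial(n, p)

lhs = x.var()                                      # simulated Var(X)
rhs = (n * p * (1 - p)).mean() + (n * p).var()     # E[Var(X|Y)] + Var(E[X|Y])
print(lhs, rhs)                                    # both close to n/6 + n^2/12
```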
