
Stat1301 Probability & Statistics I Spring 2008-2009

§ 3.8 Conditional Distributions and Conditional Expectation

Oftentimes when two random variables, ( X , Y ) , are observed, the values of the two
variables are related. For example, suppose that

X = a person’s height , Y = same person’s weight.

Surely we would think it more likely that Y > 200 pounds if we were told that
X = 182 cm than if we were told that X = 104 cm .

Knowledge of the value of X gives us some information about the value of
Y even though it does not reveal the exact value of Y.

Recall that for any two events E and F, the conditional probability of E given F is
defined by
$$ P(E \mid F) = \frac{P(E \cap F)}{P(F)} \qquad \text{provided that } P(F) > 0 . $$

This leads to the following definition.

Definition

Let ( X , Y ) be a discrete bivariate random vector with joint pmf p( x, y ) and


marginal pmfs p X ( x ) and pY ( y ) . For any x such that p X ( x ) = P ( X = x ) > 0 , the
conditional pmf of Y given that X = x is the function of y denoted by pY | X ( y | x )
and defined by

$$ p_{Y|X}(y \mid x) = P(Y = y \mid X = x) = \frac{P(Y = y,\, X = x)}{P(X = x)} = \frac{p(x, y)}{p_X(x)} . $$

On the other hand, for any y such that pY ( y ) = P (Y = y ) > 0 , the conditional pmf of
X given that Y = y is the function of x denoted by p X |Y ( x | y ) and defined by

$$ p_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{P(X = x,\, Y = y)}{P(Y = y)} = \frac{p(x, y)}{p_Y(y)} . $$


Example 3.25
Referring to Example 1, we randomly draw 3 balls from an urn containing 3 red balls,
4 white balls, and 5 blue balls. Let X be the number of red balls and Y the number of
white balls in the sample. The joint pmf of (X, Y) is given by the following table.

Values of Y
Values of X 0 1 2 3 Total
0 0.0454 0.1818 0.1364 0.0182 0.3818
1 0.1364 0.2727 0.0818 0 0.4909
2 0.0682 0.0545 0 0 0.1227
3 0.0045 0 0 0 0.0045
Total 0.2545 0.5091 0.2182 0.0182 1.0000

Dividing all the entries by the row totals, we obtain the conditional pmfs of Y | X = x.

Values of Y
Values of X 0 1 2 3 Total
0 0.1190 0.4762 0.3571 0.0476 1
1 0.2778 0.5556 0.1667 0 1
2 0.5556 0.4444 0 0 1
3 1 0 0 0 1

Each row represents a conditional pmf of Y | X = x. For example, the first row is the
conditional pmf of Y given that X = 0, the second row is the conditional pmf of Y
given that X = 1, etc. From these conditional pmfs we can see how our uncertainty
about the value of Y is affected by our knowledge of the value of X.
Similarly, dividing all the entries in the joint pmf table by the column totals gives
the conditional pmf of X | Y = y, which is shown in the following table.

Values of Y
Values of X 0 1 2 3
0 0.1786 0.3571 0.6250 1
1 0.5357 0.5357 0.3750 0
2 0.2679 0.1071 0 0
3 0.0179 0 0 0
Total 1 1 1 1
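
As a quick numerical check, the two conditional-pmf tables above can be reproduced by dividing the joint pmf table by its row and column totals. Below is a minimal sketch in Python (not part of the original notes); the joint probabilities are simply the rounded values from the table of Example 3.25.

```python
import numpy as np

# Joint pmf of (X, Y): rows are X = 0..3, columns are Y = 0..3.
joint = np.array([
    [0.0454, 0.1818, 0.1364, 0.0182],
    [0.1364, 0.2727, 0.0818, 0.0000],
    [0.0682, 0.0545, 0.0000, 0.0000],
    [0.0045, 0.0000, 0.0000, 0.0000],
])

p_X = joint.sum(axis=1)                 # marginal pmf of X (row totals)
p_Y = joint.sum(axis=0)                 # marginal pmf of Y (column totals)

cond_Y_given_X = joint / p_X[:, None]   # each row is p_{Y|X}(. | x)
cond_X_given_Y = joint / p_Y[None, :]   # each column is p_{X|Y}(. | y)

print(np.round(cond_Y_given_X, 4))      # rows sum to 1
print(np.round(cond_X_given_Y, 4))      # columns sum to 1
```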


Example 3.26

Let X ~ ℘(λ1 ) , Y ~ ℘(λ2 ) be two independent Poisson random variables. If it is


known that X + Y = n , what will be the conditional distribution of X ?

Obviously, the possible values of X given X + Y = n should be from 0 to n.

First of all, the random variable X + Y is distributed as ℘(λ1 + λ2), since the moment
generating function of X + Y is

$$ M_{X+Y}(t) = M_X(t)\, M_Y(t) = e^{\lambda_1 (e^t - 1)} \times e^{\lambda_2 (e^t - 1)} = e^{(\lambda_1 + \lambda_2)(e^t - 1)} . $$

Now consider

$$
\begin{aligned}
P(X = k \mid X + Y = n) &= \frac{P(X = k,\, X + Y = n)}{P(X + Y = n)} \\
&= \frac{P(X = k,\, Y = n - k)}{P(X + Y = n)} \\
&= \frac{P(X = k)\, P(Y = n - k)}{P(X + Y = n)} \\
&= \frac{e^{-\lambda_1} \lambda_1^{k}}{k!} \cdot \frac{e^{-\lambda_2} \lambda_2^{\,n-k}}{(n-k)!} \Big/ \frac{e^{-(\lambda_1+\lambda_2)} (\lambda_1+\lambda_2)^{n}}{n!} \qquad (X + Y \sim \wp(\lambda_1 + \lambda_2)) \\
&= \frac{n!}{k!\,(n-k)!} \cdot \frac{\lambda_1^{k}\, \lambda_2^{\,n-k}}{(\lambda_1+\lambda_2)^{n}} \\
&= \binom{n}{k} \left( \frac{\lambda_1}{\lambda_1+\lambda_2} \right)^{\!k} \left( \frac{\lambda_2}{\lambda_1+\lambda_2} \right)^{\!n-k}, \qquad k = 0, 1, \ldots, n
\end{aligned}
$$

Hence $X \mid X + Y = n \;\sim\; b\!\left( n,\ \dfrac{\lambda_1}{\lambda_1 + \lambda_2} \right)$.
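
This result is easy to probe by simulation: generate many independent Poisson pairs, keep only those with X + Y = n, and compare the empirical conditional distribution of X with the binomial pmf. A minimal sketch in Python follows (not part of the original notes); the values of λ1, λ2 and n are arbitrary choices for illustration.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
lam1, lam2, n = 2.0, 3.0, 5          # illustrative parameter values
size = 1_000_000

x = rng.poisson(lam1, size)
y = rng.poisson(lam2, size)
x_given_sum = x[x + y == n]          # condition on the event X + Y = n

p = lam1 / (lam1 + lam2)
for k in range(n + 1):
    empirical = np.mean(x_given_sum == k)
    binomial = comb(n, k) * p**k * (1 - p)**(n - k)
    print(k, round(empirical, 4), round(binomial, 4))
```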


Remarks
1. If X is independent of Y, then the conditional pmf of X given Y = y becomes

$$ p_{X|Y}(x \mid y) = \frac{p(x, y)}{p_Y(y)} = \frac{p_X(x)\, p_Y(y)}{p_Y(y)} = p_X(x) \quad \text{for all } y . $$

Similarly,

$$ p_{Y|X}(y \mid x) = \frac{p(x, y)}{p_X(x)} = \frac{p_X(x)\, p_Y(y)}{p_X(x)} = p_Y(y) \quad \text{for all } x . $$

Hence knowledge of the value of one variable does not affect our uncertainty
about the value of the other, i.e. when the variables are independent, knowledge of
one gives us no information about the other.

2. For continuous random variables, the conditional distributions are defined in an


analogous way.

Conditional pdf of Y given that X = x

$$ f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)} \qquad \text{provided that } f_X(x) > 0 . $$

Conditional pdf of X given that Y = y

$$ f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)} \qquad \text{provided that } f_Y(y) > 0 . $$

3. The conditional distribution of Y given X = x is possibly a different probability


distribution for each value of x. Thus we actually have a family of probability
distributions for Y, one for each x. When we wish to describe this entire family,
we will use the phrase “the distribution of Y | X ”.


4. The conditional pmf/pdf satisfies all the properties of a pmf/pdf and describes
the probabilistic behaviour of a random variable given the value of another
variable. Hence we have the following definitions.

Conditional Distribution Function of Y given X = x

$$
F_{Y|X}(y \mid x) = P(Y \le y \mid X = x) =
\begin{cases}
\displaystyle\sum_{i \le y} f_{Y|X}(i \mid x) & \text{discrete case} \\[2ex]
\displaystyle\int_{-\infty}^{y} f_{Y|X}(t \mid x)\, dt & \text{continuous case}
\end{cases}
$$

Conditional Expectation of g (Y ) given X = x

$$
E(g(Y) \mid X = x) =
\begin{cases}
\displaystyle\sum_{i} g(i)\, f_{Y|X}(i \mid x) & \text{discrete case} \\[2ex]
\displaystyle\int_{-\infty}^{\infty} g(t)\, f_{Y|X}(t \mid x)\, dt & \text{continuous case}
\end{cases}
$$

Conditional Mean of Y given X = x

E (Y | X = x )

Conditional Variance of Y given X = x

$$ \operatorname{Var}(Y \mid X = x) = E\big[ (Y - E(Y \mid X = x))^2 \,\big|\, X = x \big] = E(Y^2 \mid X = x) - \big( E(Y \mid X = x) \big)^2 $$

Example 3.27
Suppose that the joint density of X and Y is given by

$$
f(x, y) =
\begin{cases}
\dfrac{e^{-x/y}\, e^{-y}}{y} & x > 0,\ y > 0 \\[1ex]
0 & \text{otherwise}
\end{cases}
$$

Marginal pdf of Y :

$$ f_Y(y) = \int_0^{\infty} \frac{e^{-x/y}\, e^{-y}}{y}\, dx = e^{-y} \big[ -e^{-x/y} \big]_0^{\infty} = e^{-y}, \qquad y > 0 $$

Hence Y ~ Exp (1) .


Conditional pdf of X | Y = y:

$$ f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{1}{y}\, e^{-x/y}, \qquad x > 0 $$

Hence the conditional distribution of X given Y = y is exponential with parameter
λ = 1/y, or we may write X | Y ~ Exp(1/Y).
The conditional distribution function of X | Y = y is

$$ F_{X|Y}(x \mid y) = \begin{cases} 1 - e^{-x/y} & x > 0 \\ 0 & \text{otherwise} \end{cases} $$

Also, the conditional mean and variance can be determined easily as

$$ E(X \mid Y) = Y, \qquad \operatorname{Var}(X \mid Y) = Y^2 $$

Therefore E ( X | Y ) and Var ( X | Y ) are random variables.

What is E (E ( X | Y )) ?
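
Before answering, note that the model of this example can be simulated directly: Y ~ Exp(1) and, given Y = y, X is exponential with mean y. Anticipating the conditioning formula of the next subsection, the overall mean of X should equal E(E(X | Y)) = E(Y) = 1. A minimal sketch in Python (not part of the original notes), relying on numpy's convention that the exponential generator takes the scale, i.e. the mean:

```python
import numpy as np

rng = np.random.default_rng(1)
size = 1_000_000

y = rng.exponential(scale=1.0, size=size)   # Y ~ Exp(1)
x = rng.exponential(scale=y)                # X | Y = y ~ Exp(1/y), i.e. mean y

print(x.mean())        # ~ 1.0, matching E(E(X|Y)) = E(Y)
print(y.mean())        # ~ 1.0
```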

Computing Expectations by Conditioning

Two important and useful formulae of conditional expectation are given below.

$$ E(u(X)) = E\big( E(u(X) \mid Y) \big) $$

Proof:

$$
\begin{aligned}
E\big( E(u(X) \mid Y) \big) &= \int_{-\infty}^{\infty} E(u(X) \mid Y = y)\, f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} u(x)\, f_{X|Y}(x \mid y)\, dx \right\} f_Y(y)\, dy \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u(x)\, f(x, y)\, dx\, dy \\
&= E(u(X))
\end{aligned}
$$


$$ \operatorname{Var}(X) = E\big( \operatorname{Var}(X \mid Y) \big) + \operatorname{Var}\big( E(X \mid Y) \big) $$

Proof:

$$
\begin{aligned}
E\big( \operatorname{Var}(X \mid Y) \big) &= E\big\{ E(X^2 \mid Y) - (E(X \mid Y))^2 \big\} \\
&= E(X^2) - E\big\{ (E(X \mid Y))^2 \big\} \\[1ex]
\operatorname{Var}\big( E(X \mid Y) \big) &= E\big\{ (E(X \mid Y))^2 \big\} - \big\{ E\big( E(X \mid Y) \big) \big\}^2 \\
&= E(X^2) - E\big( \operatorname{Var}(X \mid Y) \big) - \big( E(X) \big)^2 \\
&= \operatorname{Var}(X) - E\big( \operatorname{Var}(X \mid Y) \big)
\end{aligned}
$$

Example 3.28
Suppose we have a binomial random variable X which represents the number of
successes in n independent Bernoulli experiments. Sometimes the success probability
p is unknown. However, we usually have some understanding of the value of p;
e.g. we may believe that p is itself a random variable picked uniformly from (0,1).
Then we have the following hierarchical model:

$$ p \sim U(0,1), \qquad X \mid p \sim b(n, p) $$

Using the above formulae, we can compute

$$ E(X) = E\big( E(X \mid p) \big) = E(np) = n\, E(p) = \frac{n}{2} $$

$$
\begin{aligned}
\operatorname{Var}(X) &= E\big( \operatorname{Var}(X \mid p) \big) + \operatorname{Var}\big( E(X \mid p) \big) \\
&= E\big( np(1-p) \big) + \operatorname{Var}(np) \\
&= n\, E(p) - n\, E(p^2) + n^2 \operatorname{Var}(p) \\
&= \frac{n}{2} - \frac{n}{3} + \frac{n^2}{12} = \frac{n(n+2)}{12}
\end{aligned}
$$


To find the marginal pmf of X, p X ( x ) = P ( X = x ) , we can let

$$ I_x = \begin{cases} 1 & X = x \\ 0 & \text{otherwise} \end{cases} $$

Then

$$
\begin{aligned}
P(X = x) = E(I_x) &= E\big( E(I_x \mid p) \big) \\
&= E\big( P(X = x \mid p) \big) \\
&= E\!\left( \binom{n}{x} p^x (1-p)^{n-x} \right) \\
&= \binom{n}{x} \int_0^1 p^x (1-p)^{n-x}\, dp \\
&= \binom{n}{x} \frac{\Gamma(x+1)\, \Gamma(n-x+1)}{\Gamma(n+2)} \\
&= \binom{n}{x} \frac{x!\,(n-x)!}{(n+1)!} = \frac{1}{n+1}, \qquad x = 0, 1, \ldots, n
\end{aligned}
$$

Hence X is distributed as integer uniform in { 0,1,2,..., n }.

Using Bayes' theorem, the conditional pdf of p given X = x is given by

$$
\begin{aligned}
f_{p|X}(p \mid x) &= \frac{p_{X|p}(x \mid p)\, f_p(p)}{p_X(x)} \\
&= \binom{n}{x} p^x (1-p)^{n-x} \times 1 \Big/ \frac{1}{n+1} \\
&= \frac{(n+1)!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \\
&= \frac{\Gamma(n+2)}{\Gamma(x+1)\, \Gamma(n-x+1)}\, p^{(x+1)-1} (1-p)^{(n-x+1)-1}, \qquad 0 < p < 1
\end{aligned}
$$

Therefore p | X = x ~ Beta(x + 1, n − x + 1) and

$$ E(p \mid X = x) = \frac{x+1}{(x+1) + (n-x+1)} = \frac{x+1}{n+2} . $$
This formula is known as Laplace's law of succession; it was derived in the 18th century
by Pierre-Simon Laplace in the course of treating the sunrise problem, which asks:
“What is the probability that the sun will rise tomorrow?”
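
A short simulation of this beta-binomial hierarchy can confirm both the uniform marginal pmf of X and the posterior mean (x + 1)/(n + 2). Below is a minimal sketch in Python (not part of the original notes); n and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, size = 10, 1_000_000

p = rng.uniform(0.0, 1.0, size)      # p ~ U(0,1)
x = rng.binomial(n, p)               # X | p ~ b(n, p)

# Marginal pmf of X: should be close to 1/(n+1) for every value 0..n.
print(np.bincount(x, minlength=n + 1) / size)

# Posterior mean of p given X = k: should be close to (k+1)/(n+2).
for k in (0, 5, 10):
    print(k, round(p[x == k].mean(), 4), round((k + 1) / (n + 2), 4))
```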


Example 3.29

Let X ~ ℘(λ1), Y ~ ℘(λ2) be two independent Poisson random variables. Find the
expected value of the proportion X/(X + Y).

Ans: Let N = X + Y. From Example 3.26, we know that $X \mid N \sim b\!\left( N,\ \dfrac{\lambda_1}{\lambda_1+\lambda_2} \right)$.
Consider

$$ E\!\left( \frac{X}{X+Y} \right) = E\!\left( \frac{X}{N} \right) = E\!\left[ E\!\left( \frac{X}{N} \,\Big|\, N \right) \right] = E\!\left\{ \frac{1}{N}\, E(X \mid N) \right\} = E\!\left\{ \frac{1}{N} \cdot \frac{N\lambda_1}{\lambda_1+\lambda_2} \right\} = \frac{\lambda_1}{\lambda_1+\lambda_2} $$

Example 3.30
(Prediction of Y from X)

When X and Y are not independent, we can use the observed value of X to
predict the value of the unobserved random variable Y. That is, we may predict the
value of Y by g(X), where g is a function chosen in such a way that the mean
square error of the prediction, Q = E[(Y − g(X))²], is minimized.
)
First, conditioning on X, consider

$$
\begin{aligned}
E\big( (Y - g(X))^2 \,\big|\, X \big) &= E(Y^2 \mid X) - 2 g(X)\, E(Y \mid X) + g(X)^2 \\
&= \operatorname{Var}(Y \mid X) + (E(Y \mid X))^2 - 2 g(X)\, E(Y \mid X) + g(X)^2 \\
&= \operatorname{Var}(Y \mid X) + \big( g(X) - E(Y \mid X) \big)^2 .
\end{aligned}
$$

Hence

$$ Q = E\big\{ E\big( (Y - g(X))^2 \,\big|\, X \big) \big\} = E\big( \operatorname{Var}(Y \mid X) \big) + E\big\{ \big( g(X) - E(Y \mid X) \big)^2 \big\} . $$
Therefore Q is minimized if we choose g ( x ) = E (Y | X = x ) , i.e. the best predictor
of Y given the value of X is g ( X ) = E (Y | X ) . The mean square error of this
predictor is

$$ E\big( (Y - E(Y \mid X))^2 \big) = E\big( \operatorname{Var}(Y \mid X) \big) = \operatorname{Var}(Y) - \operatorname{Var}\big( E(Y \mid X) \big) \le \operatorname{Var}(Y) . $$


Example 3.31

Two random variables X and Y are said to have a bivariate normal distribution if
their joint pdf is

$$ f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left( \frac{x-\mu_x}{\sigma_x} \right)^{\!2} - \frac{2\rho (x-\mu_x)(y-\mu_y)}{\sigma_x \sigma_y} + \left( \frac{y-\mu_y}{\sigma_y} \right)^{\!2} \right] \right\}, \qquad -\infty < x < \infty,\ -\infty < y < \infty, $$

where μx, σx² are the mean and variance of X; μy, σy² are the mean and variance
of Y; ρ is the correlation coefficient between X and Y. It is denoted as

$$ \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix},\ \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix} \right) . $$

Consider the marginal pdf of X :

$$
\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f(x, y)\, dy \\
&= C(x) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_y^2 (1-\rho^2)}} \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left( \frac{y-\mu_y}{\sigma_y} \right)^{\!2} \right\} \exp\!\left[ \frac{\rho (x-\mu_x)(y-\mu_y)}{(1-\rho^2)\, \sigma_x \sigma_y} \right] dy \\
&\qquad \text{where } C(x) = \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left( \frac{x-\mu_x}{\sigma_x} \right)^{\!2} \right\} \\
&= C(x) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{1}{2} z^2 \right) \exp\!\left[ \frac{\rho (x-\mu_x)}{\sigma_x \sqrt{1-\rho^2}}\, z \right] dz \qquad \left( \text{put } z = \frac{y-\mu_y}{\sigma_y \sqrt{1-\rho^2}} \right) \\
&= C(x)\, M_Z\!\left( \frac{\rho (x-\mu_x)}{\sigma_x \sqrt{1-\rho^2}} \right) \qquad \text{where } M_Z(t) \text{ is the mgf of } Z \sim N(0,1) \\
&= \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left( \frac{x-\mu_x}{\sigma_x} \right)^{\!2} \right\} \exp\!\left\{ \frac{1}{2} \cdot \frac{\rho^2 (x-\mu_x)^2}{\sigma_x^2 (1-\rho^2)} \right\} \\
&= \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\!\left\{ -\frac{1}{2} \left( \frac{x-\mu_x}{\sigma_x} \right)^{\!2} \right\}, \qquad -\infty < x < \infty
\end{aligned}
$$


Hence the marginal distribution of X is N(μx, σx²). The conditional pdf of Y given
X = x is given by

$$ f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)} = \frac{1}{\sqrt{2\pi\sigma_y^2 (1-\rho^2)}} \exp\!\left\{ -\frac{1}{2\sigma_y^2 (1-\rho^2)} \left( y - \mu_y - \frac{\rho\sigma_y}{\sigma_x}(x - \mu_x) \right)^{\!2} \right\}, \qquad -\infty < y < \infty $$

Hence

$$ Y \mid X \;\sim\; N\!\left( \mu_y + \frac{\rho\sigma_y}{\sigma_x}(X - \mu_x),\ (1-\rho^2)\, \sigma_y^2 \right) . $$

The best predictor of Y given the value of X is

$$ E(Y \mid X) = \mu_y + \frac{\rho\sigma_y}{\sigma_x}(X - \mu_x) . $$

It is called the linear regression of Y on X .
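
The regression line E(Y | X) can be checked numerically: simulate a large bivariate normal sample, average Y over observations whose X value falls near a chosen x, and compare with μy + (ρσy/σx)(x − μx). A minimal sketch in Python (not part of the original notes) with arbitrary illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 0.5, 0.8   # illustrative values
size = 1_000_000

cov = [[sd_x**2, rho * sd_x * sd_y],
       [rho * sd_x * sd_y, sd_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size).T

for x0 in (-1.0, 1.0, 3.0):
    near = np.abs(x - x0) < 0.05                 # samples with X close to x0
    empirical = y[near].mean()
    theoretical = mu_y + rho * sd_y / sd_x * (x0 - mu_x)
    print(x0, round(empirical, 3), round(theoretical, 3))
```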

§ 3.9 Transformations of Multivariate Distributions

Let X 1 , X 2 ,..., X n be jointly distributed continuous random variables with joint


probability density function f X ( x1 , x2 ,..., xn ) . It is sometimes necessary to obtain
the joint distribution of the random variables Y1 , Y2 ,..., Yn which arise as functions
of X 1 , X 2 ,..., X n . For example, suppose n = 2 , then

Y1 = g1 ( X 1 , X 2 ) = X 1 + X 2

Y2 = g 2 ( X 1 , X 2 ) = X 1 − X 2

would transform the random variables X 1 , X 2 into their sum and difference. To
determine the joint pdf of the transformed random variables, we may use the
following theorem, which is a generalization of the one-variable transformation
formula in section 2.6.

In general, let Yi = g i ( X 1 ,..., X n ) , i = 1,2,..., n for some functions g ’s such that the
functions g’s satisfy the following conditions :

1. The equations yi = g i ( x1 ,..., xn ) can be uniquely solved for x1 , x2 ,..., xn in terms


of y1 , y 2 ,..., y n with solutions given by, say, xi = hi ( y1 ,..., y n ) , i = 1,2,..., n (i.e.
the transformation from the X's to the Y's is a one-to-one correspondence).


2. The functions g’s have continuous partial derivatives at all points ( x1 , x2 ,..., xn )
and are such that the n × n Jacobian determinant is non-zero, i.e.

$$ J(x_1, x_2, \ldots, x_n) = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \cdots & \dfrac{\partial g_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix} \neq 0 $$

at all points ( x1 , x2 ,..., xn ) .

Under these two conditions, the joint pdf of Y1 , Y2 ,..., Yn is given by the following
formula :
$$ f_Y(y_1, \ldots, y_n) = f_X(x_1, \ldots, x_n) \times \big| J(x_1, \ldots, x_n) \big|^{-1} $$

where xi = hi ( y1 ,..., y n ) , i = 1,2,..., n .

Example 3.31

Suppose that two random variables X 1 , X 2 have a continuous joint distribution for
which the joint pdf is as follows:

$$ f_X(x_1, x_2) = \begin{cases} \tfrac{1}{2} (x_1 + x_2)\, e^{-x_1 - x_2} & \text{for } x_1 > 0,\ x_2 > 0 \\ 0 & \text{otherwise} \end{cases} $$

Obviously the transformation Y₁ = X₁ + X₂, Y₂ = X₁ − X₂ is a one-to-one correspondence
with the inverse transformation X₁ = (Y₁ + Y₂)/2, X₂ = (Y₁ − Y₂)/2. The Jacobian
determinant is given by

$$ J(x_1, x_2) = \begin{vmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} \end{vmatrix} = \begin{vmatrix} 1 & 1 \\ 1 & -1 \end{vmatrix} = 1 \times (-1) - 1 \times 1 = -2 \neq 0 $$

The joint pdf of Y1 , Y2 is therefore given by


$$ f_Y(y_1, y_2) = f_X(x_1, x_2) \times \big| J(x_1, x_2) \big|^{-1} = \frac{1}{2}\, y_1 e^{-y_1} \times |-2|^{-1} = \frac{1}{4}\, y_1 e^{-y_1} $$

with support x1 > 0, x2 > 0 ⇔ − y1 < y2 < y1 , y1 > 0 , i.e.

$$ f_Y(y_1, y_2) = \begin{cases} \tfrac{1}{4}\, y_1 e^{-y_1} & \text{for } -y_1 < y_2 < y_1,\ y_1 > 0 \\ 0 & \text{otherwise} \end{cases} $$

Example 3.32

Let X ~ Γ(α , λ ) , Y ~ Γ(β , λ ) be two independent Gamma random variables. The


joint pdf of X and Y is

$$ f_{X,Y}(x, y) = \begin{cases} \dfrac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} y^{\beta-1} e^{-\lambda(x+y)} & \text{for } x > 0,\ y > 0 \\ 0 & \text{otherwise} \end{cases} $$

Consider the transformation

$$ U = \frac{X}{X+Y}, \qquad V = X + Y . $$

This is a one-to-one transformation with

$$ X = UV, \qquad Y = V(1 - U) . $$

The support of the joint distribution of U ,V is given by

$$ \begin{cases} x > 0 \\ y > 0 \end{cases} \iff \begin{cases} uv > 0 \\ v(1-u) > 0 \end{cases} \iff \begin{cases} 0 < u < 1 \\ v > 0 \end{cases} $$

Jacobian determinant:

$$ \frac{\partial u}{\partial x} = \frac{y}{(x+y)^2} = \frac{1-u}{v}, \qquad \frac{\partial u}{\partial y} = -\frac{x}{(x+y)^2} = -\frac{u}{v}, \qquad \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 1 $$

$$ J(x, y) = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix} = \begin{vmatrix} \dfrac{1-u}{v} & -\dfrac{u}{v} \\ 1 & 1 \end{vmatrix} = \frac{1-u}{v} \times 1 - 1 \times \left( -\frac{u}{v} \right) = \frac{1}{v} \neq 0 $$


Hence the joint pdf of U ,V is given by

$$
\begin{aligned}
f_{U,V}(u, v) &= f_{X,Y}(x, y) \times \big| J(x, y) \big|^{-1} \\
&= \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\, (uv)^{\alpha-1} \big( v(1-u) \big)^{\beta-1} e^{-\lambda v} \times \left| \frac{1}{v} \right|^{-1} \\
&= \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha)\Gamma(\beta)}\, u^{\alpha-1} (1-u)^{\beta-1} v^{\alpha+\beta-1} e^{-\lambda v}, \qquad 0 < u < 1,\ v > 0
\end{aligned}
$$

From the joint pdf it is easily observed that U and V are independent. Note that the
joint pdf of U ,V can be written as

$$ f_{U,V}(u, v) = \left\{ \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, u^{\alpha-1} (1-u)^{\beta-1} \right\} \left\{ \frac{\lambda^{\alpha+\beta}}{\Gamma(\alpha+\beta)}\, v^{\alpha+\beta-1} e^{-\lambda v} \right\}, \qquad 0 < u < 1,\ v > 0 $$

and therefore

$$ U = \frac{X}{X+Y} \sim \text{Beta}(\alpha, \beta), \qquad V = X + Y \sim \Gamma(\alpha+\beta, \lambda) . $$
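
A Monte Carlo check of this conclusion; the sketch below (Python, not part of the original notes, with arbitrary illustrative parameters) compares the first two moments of U = X/(X + Y) with those of Beta(α, β), the moments of V = X + Y with those of Γ(α + β, λ), and verifies that U and V are empirically uncorrelated. Note that numpy's gamma generator takes a shape and a scale, so the rate λ enters as scale 1/λ.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, beta, lam = 2.5, 4.0, 1.5       # illustrative shape/rate parameters
size = 1_000_000

x = rng.gamma(alpha, 1.0 / lam, size)  # X ~ Gamma(alpha, rate lam)
y = rng.gamma(beta, 1.0 / lam, size)   # Y ~ Gamma(beta, rate lam)
u, v = x / (x + y), x + y

print(u.mean(), alpha / (alpha + beta))                                  # Beta mean
print(u.var(), alpha * beta / ((alpha + beta)**2 * (alpha + beta + 1)))  # Beta variance
print(v.mean(), (alpha + beta) / lam)                                    # Gamma mean
print(v.var(), (alpha + beta) / lam**2)                                  # Gamma variance
print(np.corrcoef(u, v)[0, 1])         # ~ 0, consistent with independence
```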

Remark

Sometimes the Jacobian determinant J ( x1 , x2 ,..., xn ) may be complicated. The


following alternative formula may be used

$$ J(x_1, x_2, \ldots, x_n)^{-1} = \begin{vmatrix} \dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2} & \cdots & \dfrac{\partial g_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{vmatrix}^{-1} = \begin{vmatrix} \dfrac{\partial h_1}{\partial y_1} & \dfrac{\partial h_1}{\partial y_2} & \cdots & \dfrac{\partial h_1}{\partial y_n} \\ \dfrac{\partial h_2}{\partial y_1} & \dfrac{\partial h_2}{\partial y_2} & \cdots & \dfrac{\partial h_2}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial h_n}{\partial y_1} & \dfrac{\partial h_n}{\partial y_2} & \cdots & \dfrac{\partial h_n}{\partial y_n} \end{vmatrix} $$

where h’s are the inverse transformations xi = hi ( y1 ,..., y n ) , i = 1,2,..., n .


Example 3.33

In Example 3.32,

$$ \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\ \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \begin{vmatrix} v & u \\ -v & 1-u \end{vmatrix} = v \times (1-u) - (-v) \times u = v = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}^{-1} . $$

Example 3.34

(Proof of the independence between X̄ and S² for a normal sample.)

Suppose $X_1, X_2, \ldots, X_n \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$. The joint pdf of $X_1, X_2, \ldots, X_n$ is given by

$$ f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \right\} = \big( 2\pi\sigma^2 \big)^{-n/2} \exp\!\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \right\}, \qquad -\infty < x_1, x_2, \ldots, x_n < \infty $$

Consider the transformation

$$ Y_1 = \bar{X}, \qquad Y_i = X_i - \bar{X}, \quad i = 2, 3, \ldots, n . $$

This is a one-to-one transformation with

$$ X_1 = n\bar{X} - (X_2 + \cdots + X_n) = \bar{X} - (X_2 - \bar{X}) - \cdots - (X_n - \bar{X}) = Y_1 - Y_2 - \cdots - Y_n $$

$$ X_i = Y_i + Y_1 \quad \text{for } i = 2, 3, \ldots, n $$

$$ J(x_1, x_2, \ldots, x_n)^{-1} = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{vmatrix} = \begin{vmatrix} 1 & -1 & -1 & \cdots & -1 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{vmatrix} = n $$


From section 3.6, it was derived that

$$ \sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 = \left( \sum_{i=2}^{n} y_i \right)^{\!2} + \sum_{i=2}^{n} y_i^2 + n(y_1 - \mu)^2 $$

Therefore the joint pdf of Y₁, Y₂, ..., Yₙ is given by

$$
\begin{aligned}
f_Y(y_1, \ldots, y_n) &= \big( 2\pi\sigma^2 \big)^{-n/2} \exp\!\left\{ -\frac{1}{2\sigma^2} \left[ \left( \sum_{i=2}^{n} y_i \right)^{\!2} + \sum_{i=2}^{n} y_i^2 + n(y_1 - \mu)^2 \right] \right\} \times n \\
&= \left\{ \frac{1}{\sqrt{2\pi (\sigma^2/n)}} \exp\!\left( -\frac{(y_1 - \mu)^2}{2 (\sigma^2/n)} \right) \right\} \times \left\{ \sqrt{n}\, \big( 2\pi\sigma^2 \big)^{-(n-1)/2} \exp\!\left\{ -\frac{1}{2\sigma^2} \left[ \left( \sum_{i=2}^{n} y_i \right)^{\!2} + \sum_{i=2}^{n} y_i^2 \right] \right\} \right\}, \qquad -\infty < y_1, y_2, \ldots, y_n < \infty
\end{aligned}
$$

Therefore Y₁ is independent of (Y₂, ..., Yₙ), i.e. X̄ = Y₁ is independent of

$$ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left( \left( \sum_{i=2}^{n} Y_i \right)^{\!2} + \sum_{i=2}^{n} Y_i^2 \right) . $$
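
The independence of X̄ and S² is easy to probe by simulation: for normal samples their sample correlation should be essentially zero, whereas for a skewed parent distribution (e.g. exponential) it is clearly positive. A minimal sketch in Python (not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(5)
reps, n = 200_000, 10

normal = rng.normal(0.0, 1.0, (reps, n))
xbar = normal.mean(axis=1)
s2 = normal.var(axis=1, ddof=1)             # sample variance S^2
print(np.corrcoef(xbar, s2)[0, 1])          # ~ 0 for normal samples

expo = rng.exponential(1.0, (reps, n))      # skewed counter-example
print(np.corrcoef(expo.mean(axis=1), expo.var(axis=1, ddof=1))[0, 1])  # noticeably positive
```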

§ 3.10 Some Important Transformations

In this section we consider situations in which two random variables are
transformed into one random variable.

§ 3.10.1 Sum of two random variables

Z = X + Y

Discrete case:

$$ p_Z(z) = P(X + Y = z) = \sum_{x} p(x, z-x) = \sum_{y} p(z-y, y) $$

Continuous case:

$$
\begin{aligned}
F_Z(z) = P(Z \le z) = P(X + Y \le z) &= P(Y \le z - X) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-x} f(x, y)\, dy\, dx \\
&= P(X \le z - Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{z-y} f(x, y)\, dx\, dy
\end{aligned}
$$

$$ f_Z(z) = F_Z'(z) = \int_{-\infty}^{\infty} f(x, z-x)\, dx = \int_{-\infty}^{\infty} f(z-y, y)\, dy $$


Example 3.35

Let X and Y be two independent geometric random variables with parameter p.


What is the distribution of Z = X + Y ?

Support of Z: z = 2, 3, ...

$$
\begin{aligned}
p_Z(z) = \sum_{x} p(x, z-x) &= \sum_{x=1}^{z-1} p_X(x)\, p_Y(z-x) \qquad (\,p_Y(z-x) = 0 \text{ if } z-x < 1\,) \\
&= \sum_{x=1}^{z-1} \big( (1-p)^{x-1} p \big) \big( (1-p)^{z-x-1} p \big) \\
&= p^2 \sum_{x=1}^{z-1} (1-p)^{z-2} \\
&= (z-1)\, p^2 (1-p)^{z-2} = \binom{z-1}{2-1} p^2 (1-p)^{z-2}
\end{aligned}
$$

Comparing with the pmf of nb(r, p),

$$ p(x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \qquad x = r, r+1, \ldots, $$

we can see that Z ~ nb(2, p).
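
A quick simulation check: numpy's geometric generator counts the number of trials up to the first success, i.e. it has the same support {1, 2, ...} as the pmf above, so the sum of two independent geometrics can be compared directly with the nb(2, p) pmf. A minimal sketch in Python (not part of the original notes) with an arbitrary p:

```python
import numpy as np

rng = np.random.default_rng(6)
p, size = 0.3, 1_000_000

z = rng.geometric(p, size) + rng.geometric(p, size)   # Z = X + Y

for k in range(2, 8):
    empirical = np.mean(z == k)
    nb_pmf = (k - 1) * p**2 * (1 - p)**(k - 2)        # C(k-1, 1) p^2 (1-p)^(k-2)
    print(k, round(empirical, 4), round(nb_pmf, 4))
```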

Example 3.36

Let (X, Y) be jointly distributed with joint pdf

$$ f(x, y) = \begin{cases} e^{-y} & \text{if } 0 < x < y < \infty \\ 0 & \text{otherwise} \end{cases} $$

What is the distribution of Z = X + Y ?

Support of Z: z ∈ (0, ∞ )

$$ f_Z(z) = \int_{-\infty}^{\infty} f(z-y, y)\, dy = \int_{z/2}^{z} e^{-y}\, dy = \big[ -e^{-y} \big]_{z/2}^{z} = e^{-z/2} - e^{-z}, \qquad z > 0 \qquad (\,f(z-y, y) = 0 \text{ if } z-y < 0 \text{ or } z-y > y\,) $$


Example 3.37

Let X and Y be two independent U (0,1) random variables. What is the distribution
of Z = X + Y ?

Support of Z: z ∈ (0,2 )

$$ f_Z(z) = \int_{-\infty}^{\infty} f(z-y, y)\, dy = \int_{-\infty}^{\infty} f_X(z-y)\, f_Y(y)\, dy $$

For the integrand to be non-zero, we must have

$$ \begin{cases} 0 < x < 1 \\ 0 < y < 1 \end{cases} \iff \begin{cases} 0 < z-y < 1 \\ 0 < y < 1 \end{cases} \iff \begin{cases} z-1 < y < z \\ 0 < y < 1 \end{cases} \iff \max(z-1, 0) < y < \min(z, 1) . $$

For 0 < z ≤ 1,   $ f_Z(z) = \int_0^{z} (1)(1)\, dy = z $

For 1 < z < 2,   $ f_Z(z) = \int_{z-1}^{1} (1)(1)\, dy = 2 - z $

Therefore

$$ f_Z(z) = \begin{cases} z & \text{for } 0 < z \le 1 \\ 2 - z & \text{for } 1 < z < 2 \\ 0 & \text{otherwise} \end{cases} $$
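
This triangular density is easy to verify numerically; the sketch below (Python, not part of the original notes) compares a normalised histogram of X + Y with the piecewise density just derived.

```python
import numpy as np

rng = np.random.default_rng(7)
size = 1_000_000

z = rng.uniform(0, 1, size) + rng.uniform(0, 1, size)

# Compare the empirical density (histogram) with the triangular pdf at bin centres.
counts, edges = np.histogram(z, bins=40, range=(0, 2), density=True)
centres = (edges[:-1] + edges[1:]) / 2
pdf = np.where(centres <= 1, centres, 2 - centres)
print(np.max(np.abs(counts - pdf)))     # small, of order 0.01
```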

§ 3.10.2 Difference between two random variables

Z = X − Y

Discrete case:

$$ p_Z(z) = P(X - Y = z) = \sum_{x} p(x, x-z) = \sum_{y} p(z+y, y) $$

Continuous case:

$$
\begin{aligned}
F_Z(z) = P(Z \le z) = P(X - Y \le z) &= P(Y \ge X - z) = \int_{-\infty}^{\infty} \int_{x-z}^{\infty} f(x, y)\, dy\, dx \\
&= P(X \le z + Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{z+y} f(x, y)\, dx\, dy
\end{aligned}
$$

$$ f_Z(z) = F_Z'(z) = \int_{-\infty}^{\infty} f(x, x-z)\, dx = \int_{-\infty}^{\infty} f(z+y, y)\, dy $$


Example 3.38

Let (X, Y) be jointly distributed with pdf

$$ f(x, y) = \begin{cases} e^{-y} & \text{if } 0 < x < y < \infty \\ 0 & \text{otherwise} \end{cases} $$

Find the distribution function and hence the pdf of Z = Y − X .

Support of Z: z ∈ (0, ∞ )

$$ F_Z(z) = P(Z \le z) = P(Y - X \le z) = \int_0^{\infty} \int_x^{x+z} f(x, y)\, dy\, dx = \int_0^{\infty} \int_x^{x+z} e^{-y}\, dy\, dx = \int_0^{\infty} \big( e^{-x} - e^{-(x+z)} \big)\, dx = 1 - e^{-z} $$

[Figure: the integration region 0 < x < y < x + z in the (x, y)-plane, between the lines y = x and y − x = z.]

$$ f_Z(z) = F_Z'(z) = e^{-z}, \qquad z > 0 $$
Hence Z = Y − X ~ Exp(1) .

Example 3.39

Let X and Y be two independent U (0,1) random variables. What is the distribution
of Z = X − Y ?

Support of Z : z ∈ (− 1,1)

$$ f_Z(z) = \int_{-\infty}^{\infty} f(z+y, y)\, dy = \int_{-\infty}^{\infty} f_X(z+y)\, f_Y(y)\, dy $$

For the integrand to be non-zero, we must have

$$ \begin{cases} 0 < x < 1 \\ 0 < y < 1 \end{cases} \iff \begin{cases} 0 < z+y < 1 \\ 0 < y < 1 \end{cases} \iff \begin{cases} -z < y < 1-z \\ 0 < y < 1 \end{cases} \iff \max(-z, 0) < y < \min(1-z, 1) . $$

For −1 < z ≤ 0,   $ f_Z(z) = \int_{-z}^{1} (1)(1)\, dy = 1 + z $

For 0 < z < 1,   $ f_Z(z) = \int_{0}^{1-z} (1)(1)\, dy = 1 - z $


Therefore

$$ f_Z(z) = \begin{cases} 1 + z & \text{for } -1 < z \le 0 \\ 1 - z & \text{for } 0 < z < 1 \\ 0 & \text{otherwise} \end{cases} $$

§ 3.10.3 Product of two random variables

Z = XY

Discrete case:

$$ p_Z(z) = P(XY = z) = \sum_{x} p\!\left( x, \frac{z}{x} \right) = \sum_{y} p\!\left( \frac{z}{y}, y \right) $$

Continuous case:

$$ F_Z(z) = P(Z \le z) = P(XY \le z) = P(Y \le z/X,\ X > 0) + P(Y \ge z/X,\ X < 0) = \int_0^{\infty} \int_{-\infty}^{z/x} f(x, y)\, dy\, dx + \int_{-\infty}^{0} \int_{z/x}^{\infty} f(x, y)\, dy\, dx $$

$$ f_Z(z) = F_Z'(z) = \int_0^{\infty} f\!\left( x, \frac{z}{x} \right) \frac{1}{x}\, dx - \int_{-\infty}^{0} f\!\left( x, \frac{z}{x} \right) \frac{1}{x}\, dx = \int_{-\infty}^{\infty} f\!\left( x, \frac{z}{x} \right) \frac{1}{|x|}\, dx = \int_{-\infty}^{\infty} f\!\left( \frac{z}{y}, y \right) \frac{1}{|y|}\, dy $$

Example 3.40

Let X and Y be two independent U (0,1) random variables. What is the distribution
of Z = XY ?

Support of Z: z ∈ (0,1)

$$ f_Z(z) = \int_{-\infty}^{\infty} f\!\left( x, \frac{z}{x} \right) \frac{1}{|x|}\, dx = \int_{-\infty}^{\infty} f_X(x)\, f_Y\!\left( \frac{z}{x} \right) \frac{1}{|x|}\, dx $$

For the integrand to be non-zero, we must have

$$ \begin{cases} 0 < x < 1 \\ 0 < y < 1 \end{cases} \iff \begin{cases} 0 < x < 1 \\ 0 < z/x < 1 \end{cases} \iff \begin{cases} 0 < x < 1 \\ 0 < z < x \end{cases} \iff 0 < z < x < 1 . $$

Therefore

$$ f_Z(z) = \int_z^1 (1)(1) \frac{1}{x}\, dx = -\ln z, \qquad 0 < z < 1 . $$
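
A quick check of the −ln z density; the sketch below (Python, not part of the original notes) compares the empirical average density of Z = XY over a few intervals with the exact average of −ln z over the same intervals, using ∫ −ln z dz = z(1 − ln z).

```python
import numpy as np

rng = np.random.default_rng(8)
size = 1_000_000

z = rng.uniform(0, 1, size) * rng.uniform(0, 1, size)

for lo, hi in [(0.1, 0.2), (0.4, 0.5), (0.8, 0.9)]:
    empirical = np.mean((z > lo) & (z < hi)) / (hi - lo)               # average density on the interval
    exact = (hi * (1 - np.log(hi)) - lo * (1 - np.log(lo))) / (hi - lo)  # average of -ln z on the interval
    print(lo, hi, round(empirical, 4), round(float(exact), 4))
```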


§ 3.10.4 Ratio of two random variables

$$ Z = \frac{X}{Y} $$

Discrete case:

$$ p_Z(z) = P(X/Y = z) = \sum_{x} p\!\left( x, \frac{x}{z} \right) = \sum_{y} p(zy, y) $$

Continuous case:

$$ F_Z(z) = P(Z \le z) = P(X/Y \le z) = P(X \le zY,\ Y > 0) + P(X \ge zY,\ Y < 0) = \int_0^{\infty} \int_{-\infty}^{zy} f(x, y)\, dx\, dy + \int_{-\infty}^{0} \int_{zy}^{\infty} f(x, y)\, dx\, dy $$

$$ f_Z(z) = F_Z'(z) = \int_0^{\infty} f(zy, y)\, y\, dy - \int_{-\infty}^{0} f(zy, y)\, y\, dy = \int_{-\infty}^{\infty} f(zy, y)\, |y|\, dy = \int_{-\infty}^{\infty} f\!\left( x, \frac{x}{z} \right) \frac{|x|}{z^2}\, dx $$

Example 3.41

Let X and Y be two independent N(0,1) random variables. What is the distribution
of Z = X/Y?

Support of Z: z ∈ (− ∞, ∞ )

$$
\begin{aligned}
f_Z(z) = \int_{-\infty}^{\infty} f(zy, y)\, |y|\, dy &= \int_{-\infty}^{\infty} f_X(zy)\, f_Y(y)\, |y|\, dy \\
&= \int_{-\infty}^{\infty} \left( \frac{1}{\sqrt{2\pi}}\, e^{-z^2 y^2/2} \right) \left( \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \right) |y|\, dy \\
&= 2 \int_0^{\infty} \frac{1}{2\pi} \exp\!\left( -\frac{(1+z^2)\, y^2}{2} \right) y\, dy \qquad \text{(the integrand is an even function)} \\
&= \frac{1}{\pi} \left[ -\frac{1}{1+z^2} \exp\!\left( -\frac{(1+z^2)\, y^2}{2} \right) \right]_0^{\infty} \\
&= \frac{1}{\pi (1+z^2)}, \qquad -\infty < z < \infty
\end{aligned}
$$

This is known as the Cauchy distribution.
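
A simulation check of the Cauchy result; since the ratio has very heavy tails, it is more stable to compare cumulative probabilities than histograms. The sketch below (Python, not part of the original notes) compares the empirical CDF of X/Y at a few points with the Cauchy CDF F(z) = 1/2 + arctan(z)/π.

```python
import numpy as np

rng = np.random.default_rng(9)
size = 1_000_000

z = rng.normal(size=size) / rng.normal(size=size)   # ratio of two independent N(0,1)

for t in (-3.0, -1.0, 0.0, 1.0, 3.0):
    empirical = np.mean(z <= t)
    cauchy_cdf = 0.5 + np.arctan(t) / np.pi
    print(t, round(empirical, 4), round(float(cauchy_cdf), 4))
```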



Example 3.42

Let X ~ N(0,1) and Y ~ χ²_r be two independent random variables. What is the
distribution of $Z = \dfrac{X}{\sqrt{Y/r}}$?

Let $W = \sqrt{Y/r}$, then $Y = rW^2$. This is a one-to-one function since W and Y are both positive. The pdf of W is given by

$$ f_W(w) = f_Y(rw^2) \times 2rw = \frac{1}{\Gamma(r/2)\, 2^{r/2}} (rw^2)^{r/2 - 1} e^{-rw^2/2} \times 2rw = \frac{r^{r/2}}{\Gamma(r/2)\, 2^{r/2 - 1}}\, w^{r-1} e^{-rw^2/2}, \qquad w > 0 $$

Support of Z : z ∈ (− ∞, ∞ )

$$
\begin{aligned}
f_Z(z) = \int_{-\infty}^{\infty} f(zw, w)\, |w|\, dw &= \int_{-\infty}^{\infty} f_X(zw)\, f_W(w)\, |w|\, dw \\
&= \int_0^{\infty} \left( \frac{1}{\sqrt{2\pi}}\, e^{-z^2 w^2/2} \right) \left( \frac{r^{r/2}}{\Gamma(r/2)\, 2^{r/2-1}}\, w^{r-1} e^{-rw^2/2} \right) w\, dw \qquad (\,f_W(w) = 0 \text{ if } w < 0\,) \\
&= \frac{r^{r/2}}{\sqrt{\pi}\, \Gamma(r/2)\, 2^{(r-1)/2}} \int_0^{\infty} w^{r} e^{-(r+z^2) w^2/2}\, dw \\
&= \frac{r^{r/2}}{\Gamma(1/2)\, \Gamma(r/2)\, 2^{(r-1)/2}} \int_0^{\infty} \left( \frac{2u}{r+z^2} \right)^{\!r/2} e^{-u}\, \frac{1}{2} \left( \frac{2u}{r+z^2} \right)^{\!-1/2} \frac{2}{r+z^2}\, du \qquad \left( \text{put } u = \frac{(r+z^2) w^2}{2} \iff w = \left( \frac{2u}{r+z^2} \right)^{\!1/2} \right) \\
&= \frac{r^{r/2}\, 2^{(r-1)/2}}{\Gamma(1/2)\, \Gamma(r/2)\, 2^{(r-1)/2}\, (r+z^2)^{(r+1)/2}} \int_0^{\infty} u^{(r-1)/2} e^{-u}\, du \\
&= \frac{\Gamma\!\left( \frac{r+1}{2} \right)}{\Gamma\!\left( \frac{1}{2} \right) \Gamma\!\left( \frac{r}{2} \right)}\, r^{r/2} (r+z^2)^{-(r+1)/2} = \frac{\Gamma\!\left( \frac{r+1}{2} \right)}{\Gamma\!\left( \frac{1}{2} \right) \Gamma\!\left( \frac{r}{2} \right)}\, r^{-1/2} \left( 1 + \frac{z^2}{r} \right)^{\!-\frac{r+1}{2}}, \qquad -\infty < z < \infty
\end{aligned}
$$

It is known as the Student’s t distribution and is denoted as Z ~ t r . It is an


important distribution used for statistical inference.
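
As a quick empirical check, the construction N(0,1)/√(χ²_r / r) can be compared with numpy's built-in Student's t generator; if the derivation above is correct, the two samples should have essentially the same quantiles. A minimal sketch in Python (not part of the original notes) with an arbitrary r:

```python
import numpy as np

rng = np.random.default_rng(10)
r, size = 5, 1_000_000

z = rng.normal(size=size) / np.sqrt(rng.chisquare(r, size) / r)
t = rng.standard_t(r, size)                 # reference t_r samples

qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(z, qs))
print(np.quantile(t, qs))                   # should be very close
```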


Example 3.43

Let X and Y be two independent U(0,1) random variables. Determine the
distribution of Z = X/Y.

Support of Z: z ∈ (0, ∞)

Distribution function of Z (since X and Y are i.i.d., X/Y and Y/X have the same distribution, so we may compute the latter):

$$ F_Z(z) = P(Z \le z) = P\!\left( \frac{Y}{X} \le z \right) = P(Y \le zX) $$

For 0 < z < 1,

$$ F_Z(z) = \int_0^1 \int_0^{zx} (1)(1)\, dy\, dx = \int_0^1 zx\, dx = \frac{z}{2} $$

[Figure: the triangular integration region below the line y = zx inside the unit square.]

For 1 ≤ z < ∞,

$$ F_Z(z) = \int_0^1 \int_{y/z}^{1} (1)(1)\, dx\, dy = \int_0^1 \left( 1 - \frac{y}{z} \right) dy = 1 - \frac{1}{2z} $$

[Figure: the integration region of the unit square to the right of the line y = zx.]

Hence

$$ F_Z(z) = \begin{cases} 0 & \text{for } z \le 0 \\ \dfrac{z}{2} & \text{for } 0 < z < 1 \\ 1 - \dfrac{1}{2z} & \text{for } z \ge 1 \end{cases} \qquad\qquad f_Z(z) = \begin{cases} 0 & \text{for } z \le 0 \\ \dfrac{1}{2} & \text{if } 0 < z < 1 \\ \dfrac{1}{2z^2} & \text{if } z \ge 1 \end{cases} $$