
Stat1301 Probability & Statistics I Spring 2008-2009

§ 2.3 Distribution of the Continuous Type

Definition

A random variable X is said to be of the (absolutely) continuous type if its distribution function F(x) = P(X ≤ x) has the form

    F(x) = ∫_{−∞}^{x} f(t) dt ,   −∞ < x < ∞

for some function f : ℜ → [0, ∞).

If a random variable X is of continuous type, then its probabilistic behaviour is no longer described by the pmf p(x) = P(X = x). Instead, the function f, called the probability density function (pdf), is used.

§ 2.3.1 Probability Density Function of Continuous Random Variable

A continuous variable measured on a group of individuals can be represented by a histogram. (Histogram figure omitted.)

In a histogram, the area of each rectangular block should be directly proportional to the frequency of the corresponding class. Usually, the height of each block is set to the following relative frequency density so that the total area of all the blocks equals 1:

    relative frequency density = relative frequency / class width .


For a large data set, one can use a smaller class width to produce a finer histogram. Therefore, for an infinite population, it is reasonable to model the distribution of a continuous variable by a smooth curve, that is, the probability density function.


Properties of probability density function

1. If F is differentiable, then

    f(x) = lim_{t→0} [P(X ≤ x + t) − P(X ≤ x)] / t = (d/dx) F(x) .

2. f(x) ≥ 0 for all x

3. ∫_{−∞}^{∞} f(x) dx = F(∞) = 1

4. P(X ∈ B) = ∫_B f(x) dx where B is any subset of ℜ .

5. If X is of continuous type, then

    (i)   P(X = a) = ∫_a^a f(x) dx = 0 for all a; and hence

    (ii)  P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)
          = ∫_a^b f(x) dx = F(b) − F(a)

    (iii) the distribution function F is continuous everywhere.

Example

    f(x) = c(2x − x²) for 0 < x < 2 ;   f(x) = 0 otherwise.

The constant c is determined by normalization:

    c ∫_0^2 (2x − x²) dx = 1  ⇒  c [x² − x³/3]_0^2 = 1  ⇒  4c/3 = 1  ⇒  c = 3/4

(Plot of f(x) omitted.)


For any a, b ∈ [0, 2] (a < b),

    P(a ≤ X ≤ b) = ∫_a^b (3/4)(2x − x²) dx = (1/4){ 3(b² − a²) − (b³ − a³) } .

Distribution function:

    F(x) = ∫_0^x (3/4)(2t − t²) dt = (1/4)(3x² − x³) for 0 < x < 2 ;
    F(x) = 0 if x ≤ 0 ;   F(x) = 1 if x ≥ 2 .

    P(0.5 ≤ X ≤ 1) = F(1) − F(0.5) = 0.5 − 0.15625 = 0.34375
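As a quick numerical check of the normalization and probability above, one could integrate the density with scipy (an illustrative sketch, not part of the course notes):

```python
# Numerical check: f(x) = (3/4)(2x - x^2) on (0, 2) should integrate to 1,
# and P(0.5 <= X <= 1) should come out to 0.34375.
from scipy.integrate import quad

f = lambda x: 0.75 * (2*x - x**2)

total, _ = quad(f, 0, 2)       # normalization constant check
prob, _ = quad(f, 0.5, 1.0)    # P(0.5 <= X <= 1)

print(round(total, 6))   # 1.0
print(round(prob, 6))    # 0.34375
```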

Remark

Note that f(x) is not a probability. For very small ε > 0,

    P(a − ε/2 ≤ X ≤ a + ε/2) = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ ε f(a) .

Hence f(a) is a measure of how likely it is that the random variable will be near a.

§ 2.3.2 Mean, Variance and Moment Generating Function

Definition
If f(x) is the pdf of a continuous random variable X, then

    E(u(X)) = ∫_{−∞}^{∞} u(x) f(x) dx

is the expected value (mathematical expectation) of u(X), if it exists.

The properties of the mathematical expectation of a discrete random variable also apply to that of a continuous random variable.


Properties
1. If c is a constant, then E(c) = c .

2. If c is a constant, then E(c g(X)) = c E(g(X)) .

3. If c₁, c₂, ..., cₙ are constants, then E( Σ_{i=1}^n cᵢ gᵢ(X) ) = Σ_{i=1}^n cᵢ E(gᵢ(X)) .

4. X(ω) ≥ Y(ω) for all ω ∈ Ω  ⇒  E(X) ≥ E(Y)

5. |E(X)| ≤ E(|X|)

Mean       μ = E(X) = ∫_{−∞}^{∞} x f(x) dx

Variance   σ² = E((X − μ)²) = ∫_{−∞}^{∞} (x − μ)² f(x) dx = E(X²) − μ²

Moment generating function

    M_X(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x) dx

    E(X^r) = M_X^{(r)}(0)

    R(t) = ln M_X(t)  ⇒  μ = R′(0) ,  σ² = R′′(0)

Example

    f(x) = x e^{−x} for 0 < x < ∞ ;   f(x) = 0 otherwise.

    M_X(t) = ∫_0^∞ e^{tx} x e^{−x} dx = ∫_0^∞ x e^{(t−1)x} dx = 1/(1 − t)² ,   t < 1

    R(t) = ln M_X(t) = −2 ln(1 − t)

    μ = R′(0) = 2 ,   σ² = R′′(0) = 2
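The moments above can be verified by direct numerical integration against the density f(x) = x e^{−x} (a sketch using scipy; not part of the original notes):

```python
# Check E(X) = 2 and Var(X) = 2 for f(x) = x e^{-x}, x > 0,
# by computing the first two moments numerically.
from scipy.integrate import quad
import numpy as np

f = lambda x: x * np.exp(-x)

mean, _ = quad(lambda x: x * f(x), 0, np.inf)
second, _ = quad(lambda x: x**2 * f(x), 0, np.inf)

print(round(mean, 6))              # 2.0
print(round(second - mean**2, 6))  # 2.0
```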


§ 2.3.3 Uniform Distribution

Definition

For an interval (a, b), let X be a point randomly drawn from this interval. If the pdf of X is a constant function on (a, b), i.e.

    f(x) = 1/(b − a) for a < x < b ;   f(x) = 0 otherwise,

then X is said to have a uniform distribution, denoted X ~ U(a, b).

(Plot of f(x) omitted.)

Roughly speaking, X is a point selected at random in such a way that it has no preference for any particular region of the interval.

Distribution function   F(x) = 0 for x ≤ a ;   F(x) = (x − a)/(b − a) for a < x < b ;   F(x) = 1 for x ≥ b

Mean                    μ = E(X) = (a + b)/2   (midpoint of the interval)

Variance                σ² = E(X²) − μ² = (b − a)²/12


Example

A straight rod drops freely onto a horizontal plane. Let X be the angle between the rod and the North direction, 0 ≤ X < 2π. Then X ~ U(0, 2π).

    μ = (0 + 2π)/2 = π ,   σ² = (2π − 0)²/12 = π²/3

    F(x) = (x − 0)/(2π − 0) = x/(2π) for 0 ≤ x < 2π

    P(pointing in a direction between NE and E) = P(π/4 ≤ X ≤ π/2)
        = F(π/2) − F(π/4)
        = (1/(2π))(π/2 − π/4) = 0.125

Property

Let X ~ U(0, 1) and Y = cX + d. Then

    Y ~ U(d, c + d) if c is positive;

    Y ~ U(c + d, d) if c is negative.

Proof   F_X(x) = P(X ≤ x) = (x − 0)/(1 − 0) = x for 0 ≤ x ≤ 1.

If c > 0, then the df of Y is given by

    F_Y(y) = P(Y ≤ y) = P(cX + d ≤ y) = P(X ≤ (y − d)/c) = (y − d)/c = (y − d)/((c + d) − d)   for d ≤ y ≤ c + d .

Comparing with the df of a uniform random variable, we have Y ~ U(d, c + d).

Similarly, if c < 0, then Y ~ U(c + d, d).


Example

If X ~ U(0, 1), then 2X − 1 ~ U(−1, 1).

If X ~ U(−2, 6), then (X + 2)/8 ~ U(0, 1).
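The linear-transform property is easy to check empirically; a minimal simulation sketch with numpy (the sample size and seed are arbitrary choices):

```python
# If X ~ U(0,1), then 2X - 1 should behave like U(-1, 1):
# support contained in (-1, 1) and sample mean near 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200_000)
y = 2*x - 1

print(y.min() > -1 and y.max() < 1)  # True: support is (-1, 1)
print(abs(y.mean()) < 0.01)          # True: mean near 0, as for U(-1, 1)
```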

§ 2.3.4 Exponential Distribution

Definition

Let X be a positive random variable with pdf

    f(x) = λ e^{−λx} for x > 0 ;   f(x) = 0 otherwise,

then X is said to have an exponential distribution, denoted X ~ Exp(λ).


Recall that in a Poisson process,

Random variable N(t) = number of occurrences in the time interval [0, t]

    g(y) = P(N(t) = y) = e^{−λt} (λt)^y / y! ,   y = 0, 1, 2, ...

Define the random variable X to be the waiting time until the first occurrence in a Poisson process with rate λ. Then the distribution function of X can be derived as follows:

for x ≤ 0 ,   F(x) = P(X ≤ x) = 0 .

for x > 0 ,   F(x) = P(X ≤ x) = 1 − P(X > x)
    = 1 − P(no occurrence in the time interval [0, x])
    = 1 − P(N(x) = 0) = 1 − e^{−λx}    (set y = 0 in g(y))

Therefore the df of X is F(x) = 1 − e^{−λx} for x > 0 ;   F(x) = 0 for x ≤ 0.

Hence the pdf of X is given by f(x) = F′(x) = λ e^{−λx} ,  x > 0.

Therefore X is distributed as exponential with parameter λ. An exponential random variable can therefore describe the random time elapsing between unpredictable events (e.g. telephone calls, earthquakes, arrivals of buses or customers, etc.)

The moment generating function of Exp(λ) is given by

    M_X(t) = ∫_0^∞ e^{tx} λe^{−λx} dx = ∫_0^∞ λe^{−(λ−t)x} dx = [ −λe^{−(λ−t)x}/(λ − t) ]_0^∞ = λ/(λ − t) ,   t < λ

    R(t) = ln M_X(t) = ln λ − ln(λ − t)

    R′(t) = 1/(λ − t) ,   R′′(t) = 1/(λ − t)²

    μ = R′(0) = 1/λ ,   σ² = R′′(0) = 1/λ²


In summary, we have, for X ~ Exp(λ),

Distribution function        F(x) = 1 − e^{−λx} for x > 0 ;   F(x) = 0 for x ≤ 0

Moment generating function   M_X(t) = λ/(λ − t) ,   t < λ

Mean and variance            μ = 1/λ ,   σ² = 1/λ²

Example

Let X be the failure time of a machine. Assume that X follows an exponential distribution with rate λ = 2 failures per month. Then the average failure time is E(X) = 1/2 month. The probability that the machine can run over 4 months without any failure is given by

    P(X > 4) = 1 − F(4) = 1 − (1 − e^{−2(4)}) = e^{−8} = 0.000335 .
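The same survival probability can be computed two ways with scipy, which parameterizes the exponential by scale = 1/λ rather than by rate (illustrative sketch):

```python
# P(X > 4) for X ~ Exp(rate = 2), directly and via scipy.stats.expon.
import math
from scipy.stats import expon

lam = 2.0                       # rate: 2 failures per month
p_direct = math.exp(-lam * 4)   # P(X > 4) = e^{-8}
p_scipy = expon(scale=1/lam).sf(4)  # sf = survival function = 1 - cdf

print(round(p_direct, 6))  # 0.000335
print(round(p_scipy, 6))   # 0.000335
```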

Memoryless property

For any a > 0 and b > 0 ,

    P(X > a + b) = 1 − F(a + b)
                 = e^{−λ(a+b)}
                 = e^{−λa} e^{−λb}
                 = (1 − F(a))(1 − F(b)) = P(X > a) P(X > b)

This implies Pr ( X > a + b | X > a ) = Pr ( X > b ) .

That is, knowing that the event hasn't occurred in the past a units of time doesn't alter the distribution of the arrival time in the future, i.e. we may assume the process starts afresh at any point of observation.

Among all continuous random variables with support (0, ∞), the exponential distribution is the only one that has the memoryless property.


Example

In the previous example, what is the probability that the machine can run 4 months
more given that it has run for 1 year already?

Ans: P ( X > 16 | X > 12 ) = P ( X > 4 ) = 0.000335

§ 2.3.5 Survival Function and Hazard Rate Function

Let X be a positive continuous random variable that we interpret as being the


lifetime of some item, having distribution function F and probability density
function f.

Definition

Survival function S (t ) = P ( X > t ) = 1 − F (t )

It represents the probability that the item can survive at least for a time t.

Hazard rate function (mortality function, failure rate function)

    λ(t) = f(t) / S(t) .

Remarks

1. The hazard rate function λ(t) represents the conditional probability intensity that a t-unit-old item will fail/die instantly at time t:

    P(t < X < t + dt | X > t) = [F(t + dt) − F(t)] / S(t) ≈ f(t) dt / S(t) = λ(t) dt

2. The hazard rate function λ(t) uniquely determines the distribution F:

    λ(t) = f(t)/S(t) = −S′(t)/S(t) = −(d/dt) ln S(t)

    ∫_0^x λ(t) dt = [ −ln S(t) ]_0^x = −ln S(x) = −ln(1 − F(x))

    ⇒   F(x) = 1 − exp( −∫_0^x λ(t) dt )

Example

If X ~ Exp(λ), then λ(t) = f(t)/S(t) = λe^{−λt} / (1 − (1 − e^{−λt})) = λ .

Hence the exponential random variable has a constant hazard rate, i.e. an old subject is as likely to "die" as a young one, regardless of age. Due to this memoryless property, the exponential distribution is generally not a reasonable model for the survival time of subjects that age naturally.

Example

Usually it is more reasonable to model the lifetime of an item by an increasing hazard rate function rather than a constant one ("older" items have a higher chance to fail/die). For example, we can use a linear hazard rate function λ(t) = a + bt. Then the distribution function is given by

    F(x) = 1 − exp( −∫_0^x (a + bt) dt ) = 1 − exp( −[at + bt²/2]_0^x )
         = 1 − exp( −ax − bx²/2 ) ,   x > 0

    f(x) = F′(x) = (a + bx) exp( −ax − bx²/2 ) ,   x > 0

In particular, if a = 0, the corresponding random variable is said to have the Rayleigh distribution.


§ 2.3.6 The Gamma and Chi-Squared Distribution

Gamma function   Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx ,   α > 0

Properties of the Gamma function

1. Γ(1) = 1

2. For α > 1 ,   Γ(α) = (α − 1) Γ(α − 1)

3. For any integer n ≥ 1 ,   Γ(n) = (n − 1)!

4. Γ(1/2) = √π

Definition

Let X be a positive random variable with pdf

    f(x) = [λ^α / Γ(α)] x^{α−1} e^{−λx} for x > 0 ;   f(x) = 0 for x ≤ 0,

then X is said to have a Gamma distribution, denoted X ~ Γ(α, λ).


Gamma random variable as a waiting time

Let Tn be the waiting time until the nth occurrence of an event in a Poisson process with rate λ. Then the distribution function of Tn can be derived as

    F(t) = P(Tn ≤ t) = 1 − P(Tn > t)
         = 1 − P(N(t) < n)
         = 1 − Σ_{k=0}^{n−1} (λt)^k e^{−λt} / k! ,   t > 0.

Hence the pdf of Tn is

    f(t) = F′(t) = λ(λt)^{n−1} e^{−λt} / (n − 1)! = [λ^n / Γ(n)] t^{n−1} e^{−λt} ,   t > 0.

Therefore Tn is distributed as Gamma with parameters α = n and rate λ. A Gamma random variable can therefore describe the random time elapsing until the accumulation of a specific number of unpredictable events (e.g. telephone calls, earthquakes, arrivals of buses or customers, etc.)

The moment generating function of X ~ Γ(α, λ) is given by

    M_X(t) = ∫_0^∞ e^{tx} [λ^α/Γ(α)] x^{α−1} e^{−λx} dx = [λ^α/Γ(α)] ∫_0^∞ x^{α−1} e^{−(λ−t)x} dx

           = [λ^α/Γ(α)] × [Γ(α)/(λ − t)^α] ∫_0^∞ [(λ − t)^α/Γ(α)] x^{α−1} e^{−(λ−t)x} dx

           = ( λ/(λ − t) )^α ,   t < λ

From the moment generating function, one can easily derive the mean and variance of X.


In summary, we have, for X ~ Gamma(α, λ),

Distribution function        F(x) = ∫_0^x [λ^α/Γ(α)] t^{α−1} e^{−λt} dt

Moment generating function   M_X(t) = ( λ/(λ − t) )^α ,   t < λ

Mean and variance            μ = α/λ ,   σ² = α/λ²

For a Poisson process, the mean waiting time until the nth occurrence is E(Tn) = n/λ.

Example
Assume that the number of phone calls received by a customer service
representative follows a Poisson process with rate 20 calls per hour.

Let T8 be the waiting time (in hours) until the 8th phone call. Then T8 ~ Γ(8, 20).

The mean waiting time for 8 phone calls is E(T8) = 8/20 = 2/5 hour.

Suppose that a customer service representative only needs to serve 8 calls before taking a break. Then the probability that he/she has to work for more than 48 minutes before taking a rest is

    P(T8 > 0.8) = 1 − P(T8 ≤ 0.8) = 1 − ∫_0^{0.8} [20^8/Γ(8)] x^7 e^{−20x} dx
                = 1 − ( 1 − Σ_{k=0}^{7} 16^k e^{−16}/k! ) = 0.01
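This tail probability can be cross-checked with scipy, using the gamma distribution directly and the equivalent Poisson-count identity P(T8 > t) = P(N(t) ≤ 7). A sketch (note scipy uses scale = 1/rate):

```python
# T8 ~ Gamma(shape 8, rate 20): P(T8 > 0.8) two ways.
from scipy.stats import gamma, poisson

tail = gamma(a=8, scale=1/20).sf(0.8)   # P(T8 > 0.8)
via_poisson = poisson(20 * 0.8).cdf(7)  # P(N(0.8) < 8), N(0.8) ~ Poisson(16)

print(round(tail, 4))         # 0.01
print(round(via_poisson, 4))  # 0.01
```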


Properties of Gamma distribution

1. The exponential distribution is a special case of the Gamma distribution with α = 1, i.e. Γ(1, λ) ≡ Exp(λ).

2. Suppose X ~ Γ(α, λ). Let Y = aX where a is a positive number. Then Y ~ Γ(α, λ/a).

Proof

The moment generating function of Y is given by

    M_Y(t) = E(e^{tY}) = E(e^{taX}) = M_X(at) = ( λ/(λ − at) )^α = ( (λ/a)/((λ/a) − t) )^α ,   t < λ/a

which is the moment generating function of Γ(α, λ/a).

3. The parameter α is called the shape parameter, while λ controls the scale (it is the rate, or inverse scale). According to property 2, we may tabulate the distribution function for the Gamma distribution with a standardized scale parameter.

Definition

Let r be a positive integer. If X ~ Γ(α, λ) with α = r/2 and λ = 1/2, then we say that X has a Chi-Squared distribution with r degrees of freedom, denoted X ~ χ²_r.

Probability density function   f(x) = [1/(Γ(r/2) 2^{r/2})] x^{r/2−1} e^{−x/2} ,   x > 0.

Moment generating function     M_X(t) = 1/(1 − 2t)^{r/2} ,   t < 1/2

Mean and variance              μ = r ,   σ² = 2r

The values of the Chi-Squared distribution function are tabulated.


Example

For the previous example, T8 ~ Γ(8, 20). Consider Y = 40 T8 ~ Γ(8, 1/2) ≡ χ²_16. From the Chi-Squared distribution table,

    P(T8 > 0.8) = 1 − P(T8 ≤ 0.8) = 1 − P(Y ≤ 32) = 1 − 0.99 = 0.01


§ 2.3.7 Normal Distribution

(Very important distribution)

Definition

The random variable X is said to have a normal distribution (Gaussian distribution) if its pdf is defined by

    f(x) = [1/√(2πσ²)] exp( −(x − μ)²/(2σ²) ) ,   −∞ < x < ∞

where −∞ < μ < ∞ and σ² > 0 are the location parameter and scale parameter respectively. It is denoted as X ~ N(μ, σ²).

Distribution function        F(x) = ∫_{−∞}^x [1/√(2πσ²)] exp( −(t − μ)²/(2σ²) ) dt ,   −∞ < x < ∞

Moment generating function   M_X(t) = exp( μt + σ²t²/2 ) ,   t ∈ ℜ

Mean and variance            E(X) = μ ,   Var(X) = σ²


If μ = 0 and σ² = 1, then Z ~ N(0, 1) is said to have the standard normal distribution.

Usually the probability density function and distribution function of the standard normal distribution are denoted as

    φ(z) = [1/√(2π)] e^{−z²/2} ,   −∞ < z < ∞

    Φ(z) = P(Z ≤ z) = ∫_{−∞}^z [1/√(2π)] e^{−t²/2} dt ,   −∞ < z < ∞

The values of the standard normal distribution function are tabulated.

The normal distribution is important because many random quantities in the real world have distributions that resemble the normal distribution.

Properties

1. The normal distribution is symmetric about its mean. That is, if X ~ N(μ, σ²), then

    P(X ≤ μ − x) = P(X ≥ μ + x) .

In particular, Φ(x) = 1 − Φ(−x).

2. If X ~ N(μ, σ²), then aX + b ~ N(aμ + b, a²σ²). In particular, the normal score Z = (X − μ)/σ is distributed as standard normal.

3. If X ~ N(0, 1), then X² ~ χ²₁.


(Examples of the standard normal distribution table omitted.)


Example

Let X be the IQ score of a person. Experience reveals that the mean IQ score is 100 and the standard deviation of IQ scores is 15. Then we have

    P(85 ≤ X ≤ 115) = P( (85 − 100)/15 ≤ (X − 100)/15 ≤ (115 − 100)/15 )

                    = P( (85 − 100)/15 ≤ Z ≤ (115 − 100)/15 ) ,   Z ~ N(0, 1)

                    = Φ( (115 − 100)/15 ) − Φ( (85 − 100)/15 )

                    = Φ(1) − Φ(−1)

                    = Φ(1) − (1 − Φ(1))

                    = 2(0.841) − 1 = 0.682

In general, if X ~ N(μ, σ²), then P(a ≤ X ≤ b) = Φ( (b − μ)/σ ) − Φ( (a − μ)/σ ).

Example

In general, for a normal random variable, the probability that its value is within one standard deviation of the mean is

    P(μ − σ ≤ X ≤ μ + σ) = Φ(1) − Φ(−1) = 0.682 = 68.2% .

Similarly, the probability that its value is within two standard deviations of the mean is

    P(μ − 2σ ≤ X ≤ μ + 2σ) = Φ(2) − Φ(−2) = 0.954 = 95.4% .
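These empirical-rule probabilities can be checked with scipy.stats.norm; to four decimals the one- and two-standard-deviation values are 0.6827 and 0.9545 (a quick sketch):

```python
# Probability mass within 1 and 2 standard deviations for a normal variable.
from scipy.stats import norm

within_1 = norm.cdf(1) - norm.cdf(-1)
within_2 = norm.cdf(2) - norm.cdf(-2)

print(round(within_1, 3))  # 0.683
print(round(within_2, 3))  # 0.954
```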


Example

A manufacturer needs washers between .1180 and .1220 inches thick; any thickness outside this range is unusable. One machine shop will sell washers at $3.00 per 1000. Their thickness is normally distributed with a mean of .1200 inch and a standard deviation of .0010 inch.

A second machine shop will sell washers at $2.60 per 1000. Their thickness is normally distributed with a mean of .1200 inch and a standard deviation of .0015 inch.

Which shop offers the better deal?

Ans: Washers purchased from shop 1:   X ~ N(0.12, (0.001)²)

    P(0.118 ≤ X ≤ 0.122) = Φ( (0.122 − 0.12)/0.001 ) − Φ( (0.118 − 0.12)/0.001 )
                         = Φ(2) − Φ(−2)
                         = 0.9772 − (1 − 0.9772) = 0.9544

Hence on average, 954.4 of 1000 washers from shop 1 are usable. The average cost of each usable washer is $3/954.4 = $0.003143.

Washers purchased from shop 2:   Y ~ N(0.12, (0.0015)²)

    P(0.118 ≤ Y ≤ 0.122) = Φ( (0.122 − 0.12)/0.0015 ) − Φ( (0.118 − 0.12)/0.0015 )
                         = Φ(1.33) − Φ(−1.33)
                         = 0.9082 − (1 − 0.9082) = 0.8164

Hence on average, 816.4 of 1000 washers from shop 2 are usable. The average cost of each usable washer is $2.6/816.4 = $0.003185.

We may conclude that washers from shop 1 are slightly better.
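The comparison can be reproduced with scipy.stats.norm; a sketch computing the cost per usable washer for each shop:

```python
# Cost per usable washer for the two machine shops.
from scipy.stats import norm

usable1 = norm(0.12, 0.001).cdf(0.122) - norm(0.12, 0.001).cdf(0.118)
usable2 = norm(0.12, 0.0015).cdf(0.122) - norm(0.12, 0.0015).cdf(0.118)

cost1 = 3.00 / (1000 * usable1)  # shop 1: $3.00 per 1000 washers
cost2 = 2.60 / (1000 * usable2)  # shop 2: $2.60 per 1000 washers

print(cost1 < cost2)  # True: shop 1 is the (slightly) better deal
```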


Example

A brass polish manufacturer wishes to set his filling equipment so that in the long
run only five cans in 1,000 will contain less than a desired minimum net fill of
800gm. It is known from experience that the filled weights are approximately
normally distributed with a standard deviation of 6gm. At what level will the mean
fill have to be set in order to meet this requirement?

Let X be the net fill of a particular can. Then X ~ N(μ, 36). The requirement can be expressed as the following probability statement:

    P(X < 800) = 0.005  ⇒  P( (X − μ)/6 < (800 − μ)/6 ) = 0.005

                        ⇒  Φ( (800 − μ)/6 ) = 0.005

                        ⇒  (800 − μ)/6 = −2.576  ⇒  μ = 815.456
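The required mean can also be found with scipy's inverse normal cdf; using the exact 0.005-quantile (−2.5758...) gives μ ≈ 815.455, agreeing with the table-based answer above to two decimals (sketch only):

```python
# Solve for mu so that P(X < 800) = 0.005 when X ~ N(mu, 6^2):
# mu = 800 - 6 * z_{0.005}, where z_{0.005} = norm.ppf(0.005) ~ -2.576.
from scipy.stats import norm

z = norm.ppf(0.005)
mu = 800 - 6 * z

print(round(mu, 2))                     # 815.45
print(round(norm(mu, 6).cdf(800), 3))   # 0.005  (requirement is met)
```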

§ 2.3.8 Beta Distribution

Definition

Let X be a random variable with pdf

    f(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1} for 0 < x < 1 ;   f(x) = 0 otherwise,

where α and β are positive numbers; then X is said to have a Beta distribution, denoted X ~ Beta(α, β).

Mean and variance   E(X) = α/(α + β) ,   Var(X) = αβ/[ (α + β)² (α + β + 1) ]

It is commonly used for modelling a random quantity distributed within a finite interval.


Remarks

1. The function

    B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α + β)

is called the beta function. Therefore the pdf of the beta distribution is sometimes expressed as

    f(x) = [1/B(α, β)] x^{α−1} (1 − x)^{β−1} for 0 < x < 1 ;   f(x) = 0 otherwise.

2. If α = β = 1, the beta distribution becomes the uniform distribution U(0, 1).

Example

After assessing the current political, social, economic and financial factors, a financial analyst believes that the proportion of stocks that will increase in value tomorrow is a beta random variable with α = 5 and β = 3. What is the expected value of this proportion? How likely is it that the values of at least 70% of the stocks will move up?

Ans: Let p be the proportion. Then p ~ Beta(5, 3) and E(p) = 5/(5 + 3) = 0.625.

    P(p ≥ 0.7) = ∫_{0.7}^1 [Γ(8)/(Γ(5)Γ(3))] x^{5−1} (1 − x)^{3−1} dx
               = (7!/(4! 2!)) ∫_{0.7}^1 (x⁴ − 2x⁵ + x⁶) dx = 105 × 0.0033612 = 0.3529
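Both answers can be checked with scipy.stats.beta; note the normalizing constant is Γ(8)/(Γ(5)Γ(3)) = 7!/(4!·2!) = 105, which gives a tail probability of about 0.353 (sketch only):

```python
# Mean and upper-tail probability for p ~ Beta(5, 3).
from scipy.stats import beta

p = beta(5, 3)

print(round(p.mean(), 3))   # 0.625
print(round(p.sf(0.7), 4))  # 0.3529  = P(p >= 0.7)
```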


§ 2.4 Mixed Distribution

There exist distributions which are neither discrete nor (absolutely) continuous. We call these distributions mixed distributions.

Example

Suppose T is the lifetime of a device (in 1000-hour units) distributed as exponential with rate λ = 1. In a test of the device we cannot wait forever, so we might terminate the test after 2000 hours and record the truncated lifetime X, i.e.

    X = T if T < 2 ;   X = 2 if T ≥ 2.

Therefore we have

    P(X = 2) = P(T ≥ 2) = 1 − (1 − e^{−2}) = e^{−2}

and for 0 < x < 2,

    P(X ≤ x) = P(T ≤ x) = 1 − e^{−x} .

The distribution function of X is therefore given by

    F(x) = 0 for x ≤ 0 ;   F(x) = 1 − e^{−x} for 0 < x < 2 ;   F(x) = 1 for x ≥ 2.

It has a discrete jump at x = 2 and is continuous elsewhere. The expected lifetime recorded can be computed as

    E(X) = ∫_0^2 x e^{−x} dx + 2 e^{−2} = 1 − e^{−2} = 0.8647
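The mixed-type expectation above (continuous part plus a point mass at 2) can be verified numerically; an illustrative sketch:

```python
# E(X) for the truncated lifetime: integral over (0, 2) plus 2 * P(X = 2).
import math
from scipy.integrate import quad

cont_part, _ = quad(lambda x: x * math.exp(-x), 0, 2)  # continuous part
ex = cont_part + 2 * math.exp(-2)                      # add atom at x = 2

print(round(ex, 4))  # 0.8647  (equals 1 - e^{-2})
```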


§ 2.5 Quantiles

In describing the distribution of a group of data, it is more comprehensive if some cut-off points that partition the dataset into consecutive pieces are reported. For example, the lower quartile is the data value such that 1/4 of the data points are lower than or equal to it; the median is the data value such that 1/2 of the data points are lower than or equal to it; etc. In general, these cut-off values are called quantiles.

§ 2.5.1 Quantiles of Continuous Distributions

Definition

When the distribution function of a random variable X is continuous and one-to-one over the whole set of possible values of X, the inverse distribution function F^{−1} exists, and the value F^{−1}(p) is called the p-quantile of X for 0 < p < 1. It is also called the 100p-th percentile of X.

If x_p is the 100p-th percentile of X, then 100p percent of the distribution of X is at or below the value x_p.

Example

Suppose that the final examination scores in a large Stat1301 class approximately follow the normal distribution with mean 69 and standard deviation 12, i.e. X ~ N(69, 144). Then the 85th percentile of the scores is F^{−1}(0.85) and can be evaluated by solving

    F(x_{0.85}) = 0.85  ⇒  P(X ≤ x_{0.85}) = 0.85

                        ⇒  Φ( (x_{0.85} − 69)/12 ) = 0.85

                        ⇒  (x_{0.85} − 69)/12 = 1.04

                        ⇒  x_{0.85} = 81.48

The 85th percentile of the scores is 81.48. Therefore a student with a score higher than 81.5 is in the top 15% of the class.

Note that the calculation relies on the fact that the normal distribution function is one-to-one.


Remark

Certain quantiles have special names. The 1/2-quantile or the 50th percentile is a special case of what we shall call a median. The 1/4-quantile or the 25th percentile is the lower quartile. The 3/4-quantile or the 75th percentile is the upper quartile. These three values partition the distribution into four equal pieces.

Example

If X ~ Exp(λ), then

    F(x) = 1 − e^{−λx} for x > 0 ;   F(x) = 0 for x ≤ 0.

For any 0 < p < 1,

    F(x_p) = p  ⇒  1 − e^{−λx_p} = p  ⇒  x_p = −(1/λ) ln(1 − p) .

Since F(x_p) = p has a unique solution, F^{−1} exists and is given by

    F^{−1}(p) = −(1/λ) ln(1 − p) .

In particular, if the lifetime (in 1000-hour units) of a device is distributed as exponential with rate λ = 1, then

    x_{0.25} = F^{−1}(0.25) = −ln(1 − 0.25) = ln(4/3) = 0.288

    x_{0.5} = F^{−1}(0.5) = −ln(1 − 0.5) = ln 2 = 0.693

    x_{0.75} = F^{−1}(0.75) = −ln(1 − 0.75) = ln 4 = 1.386

i.e. the lower quartile, median, and upper quartile of the lifetime of the devices are 288, 693, and 1386 hours respectively.
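The closed-form quantile function can be checked against scipy's ppf (percent-point function, i.e. the inverse cdf); sketch only:

```python
# Quartiles of Exp(lambda = 1): closed form vs scipy.stats.expon.ppf.
import math
from scipy.stats import expon

lam = 1.0
q = lambda p: -math.log(1 - p) / lam   # F^{-1}(p) for Exp(lambda)

print(round(q(0.5), 3))                         # 0.693
print(round(expon(scale=1/lam).ppf(0.75), 3))   # 1.386
```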


Example

Let X be a binomial random variable with parameters n = 5 and p = 0.3. The following table shows the pmf and df of X.

    x       0        1        2        3        4        5
    p(x)    0.1681   0.3602   0.3087   0.1323   0.0284   0.0024
    F(x)    0.1681   0.5283   0.8370   0.9693   0.9977   1

The p-quantile cannot be evaluated by F^{−1}(p) because F^{−1} does not exist; e.g. there is no x such that F(x) = 0.5.

To define quantiles for distributions that are not continuous, we may make use of a generalization of the inverse for non-decreasing functions.

§ 2.5.2 Quantiles of Discrete Distributions

Definition

Let X be a random variable with distribution function F. For any 0 < p < 1, the p-quantile (100p-th percentile) of X is defined as the smallest x such that F(x) ≥ p.

Example

From the previous example, the values of F(x) for x = 1, 2, 3, 4, 5 are all greater than 0.5. Therefore the median of a b(5, 0.3) random variable is x_{0.5} = 1. Similarly, the 75th percentile is x_{0.75} = min{ x : F(x) ≥ 0.75 } = 2.
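This generalized quantile (smallest x with F(x) ≥ p) is exactly what scipy's binom.ppf computes for a discrete distribution; a quick sketch:

```python
# Quantiles of b(5, 0.3) under the generalized definition.
from scipy.stats import binom

print(binom.ppf(0.5, 5, 0.3))   # 1.0  (median)
print(binom.ppf(0.75, 5, 0.3))  # 2.0  (75th percentile)
```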

Remark

This definition of the quantile applies to any distribution, no matter whether it is discrete, continuous, or of the mixed type. In fact, for a continuous distribution with invertible F, we have

    min{ x : F(x) ≥ p } = min{ x : x ≥ F^{−1}(p) } = F^{−1}(p) .


§ 2.6 Distribution of Functions of Random Variable

In general, functions of random variables are random variables.

Example

1. If Z ~ N(0, 1), then Y = Z² ~ χ²₁ and X = σZ + μ ~ N(μ, σ²).

2. If X ~ Γ(α, λ), then Y = aX ~ Γ(α, λ/a).

3. If X ~ U(0, 1), then Y = −(1/λ) ln X ~ Exp(λ).

Theorem

Let X be a continuous random variable distributed on a space S with pdf f_X(x). Let Y = g(X) where g is a function such that g^{−1} exists. Then the pdf of Y can be obtained by

    f_Y(y) = f_X( g^{−1}(y) ) | (d/dy) g^{−1}(y) | ,   y ∈ g(S) .

Proof

Let F_X(x) be the distribution function of X. The distribution function of Y is

    F_Y(y) = P(Y ≤ y) = P(g(X) ≤ y) .

If g is a strictly increasing function, then

    F_Y(y) = P( X ≤ g^{−1}(y) ) = F_X( g^{−1}(y) )

    ⇒  f_Y(y) = (d/dy) F_X( g^{−1}(y) ) = f_X( g^{−1}(y) ) (d/dy) g^{−1}(y) .


On the other hand, if g is a strictly decreasing function, then

    F_Y(y) = P( X ≥ g^{−1}(y) ) = 1 − F_X( g^{−1}(y) )

    ⇒  f_Y(y) = −(d/dy) F_X( g^{−1}(y) ) = −f_X( g^{−1}(y) ) (d/dy) g^{−1}(y) .

Since (d/dy) g^{−1}(y) is positive when g is increasing and negative when g is decreasing, the pdf of Y can be expressed as

    f_Y(y) = f_X( g^{−1}(y) ) | (d/dy) g^{−1}(y) | ,   y ∈ g(S) .

Example

Let X ~ U(0, 1) and Y = g(X) = −(1/λ) ln X.

    f_X(x) = 1 ,   0 < x < 1

    S = (0, 1) ,   g(S) = (0, ∞)

Since g(x) = y ⇒ −(1/λ) ln x = y ⇒ x = e^{−λy}, the inverse function g^{−1}(y) = e^{−λy} exists. The pdf of Y is

    f_Y(y) = f_X( e^{−λy} ) | (d/dy) e^{−λy} | = 1 × λe^{−λy} = λe^{−λy} ,   y > 0.

Therefore Y = −(1/λ) ln X ~ Exp(λ).
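This transformation is the basis of inverse-transform sampling for the exponential distribution; a minimal simulation sketch (sample size and seed are arbitrary choices):

```python
# If U ~ U(0,1), then -(1/lambda) ln U ~ Exp(lambda):
# the sample mean should be near 1/lambda.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(0, 1, 200_000)
y = -np.log(u) / lam

print(abs(y.mean() - 1/lam) < 0.01)  # True: sample mean near 1/lambda = 0.5
```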


Example

Let X ~ N(0, 1) and Y = Φ(X).

    S = (−∞, ∞) ,   Φ(S) = (0, 1)

Since Φ is a one-to-one function, Φ^{−1} exists. The pdf of Y is given by

    f_Y(y) = f_X( Φ^{−1}(y) ) | (d/dy) Φ^{−1}(y) |

           = φ( Φ^{−1}(y) ) × 1/Φ′( Φ^{−1}(y) )

           = 1 ,   y ∈ (0, 1)    (as Φ′(x) = φ(x))

Hence Y ~ U(0, 1).

Example

(Log-normal Distribution)

Let X ~ N(μ, σ²) and Y = g(X) = e^X.

    f_X(x) = [1/√(2πσ²)] exp( −(x − μ)²/(2σ²) ) ,   −∞ < x < ∞

    S = (−∞, ∞) ,   g(S) = (0, ∞)

Since g(x) = y ⇒ e^x = y ⇒ x = ln y, the inverse g^{−1}(y) = ln y exists. The pdf of Y is given by

    f_Y(y) = f_X(ln y) | (d/dy) ln y | = [1/(y√(2πσ²))] exp( −(ln y − μ)²/(2σ²) ) ,   y > 0.

It is called the log-normal distribution, which is commonly used to model multiplicative products of random quantities such as the return rate on a stock investment.


Example

Let X ~ N(0, 1) and Y = X².

    f_X(x) = φ(x) = [1/√(2π)] e^{−x²/2} ,   −∞ < x < ∞

    S = (−∞, ∞) ,   g(S) = (0, ∞)

Since g(x) = y ⇒ x² = y ⇒ x = ±√y does not have a unique solution, g^{−1}(y) does not exist, and we cannot use the formula. We may instead start from the distribution function. Consider

    F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(−√y ≤ X ≤ √y) = 2Φ(√y) − 1

    f_Y(y) = F_Y′(y) = 2Φ′(√y) (d/dy)√y

           = 2 × [1/√(2π)] exp(−y/2) × 1/(2√y)

           = [1/(2^{1/2}√π)] y^{−1/2} e^{−y/2}

           = [1/(2^{1/2} Γ(1/2))] y^{1/2−1} e^{−y/2} ,   y > 0.

Hence Y = X² ~ χ²₁.


Example

(Weibull Distribution)

Let X ~ Exp(λ) and Y = g(X) = X^{1/β} ,   β > 0.

    f_X(x) = λe^{−λx} ,   x > 0

    S = (0, ∞) ,   g(S) = (0, ∞)

Obviously, g^{−1}(y) = y^β exists. The pdf of Y is given by

    f_Y(y) = f_X( y^β ) (d/dy) y^β = λβ y^{β−1} exp( −λy^β ) ,   y > 0

It is called the Weibull distribution, which is often used in the field of life data analysis. The corresponding distribution function and hazard rate function are respectively

    F(y) = 1 − exp( −λy^β ) ,   y > 0
and
    λ(y) = λβ y^{β−1} ,   y > 0.
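The construction can be checked by simulation: for λ = 1 the df above, F(y) = 1 − exp(−y^β), is exactly scipy's weibull_min with shape c = β. A sketch (β = 1.5 is an arbitrary illustrative choice):

```python
# If X ~ Exp(1), then Y = X**(1/beta) should follow weibull_min(c = beta):
# compare the sample median with the theoretical median.
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
beta = 1.5
y = rng.exponential(1.0, 100_000) ** (1 / beta)

print(abs(np.median(y) - weibull_min(beta).median()) < 0.01)  # True
```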

Example

(Maxwell-Boltzmann Distribution)

Let X ~ Γ(3/2, λ) and Y = √X.

    f_X(x) = [λ^{3/2}/Γ(3/2)] x^{3/2−1} e^{−λx} = [2λ^{3/2}/√π] x^{1/2} e^{−λx} ,   x > 0

    S = (0, ∞) ,   g(S) = (0, ∞)

Obviously, g^{−1}(y) = y² exists.

The pdf of Y is given by

    f_Y(y) = f_X( y² ) (d/dy) y² = [2λ^{3/2}/√π] (y²)^{1/2} e^{−λy²} × 2y = [4λ^{3/2}/√π] y² e^{−λy²} ,   y > 0

It is called the Maxwell-Boltzmann distribution, which is widely used in statistical physics to model the speed of molecules in a uniform gas at equilibrium.
