
Lecture 3

Discrete Random Variables

Outline

• Probability mass function

• Expectation, variance

• Bernoulli, Binomial distributions

• Poisson distribution

• Geometric distribution


Random variables

The best way of thinking about random variables is just to consider them as random numbers. But random means that there must be some kind of experiment behind these numbers. They actually fit well in our framework:

Definition
A random variable is a function from the sample space Ω to the
real numbers R.

The usual notation for random variables is X, Y, Z, etc.; we often don’t mark them as functions: X(ω), Y(ω), Z(ω), etc.


Random variables

Example
Flipping three coins, let X count the number of Heads obtained.
Then, as a function on Ω,

X(T, T, T) = 0;
X(T, T, H) = X(T, H, T) = X(H, T, T) = 1;
X(T, H, H) = X(H, T, H) = X(H, H, T) = 2;
X(H, H, H) = 3.

Instead, we’ll just say that X can take on values 0, 1, 2, 3 with respective probabilities 1/8, 3/8, 3/8, 1/8.
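A quick way to double-check these numbers is to enumerate the eight equally likely outcomes and count Heads. This is a minimal Python sketch (the enumeration code is our own illustration, not part of the lecture):

from itertools import product
from collections import Counter

# Sample space: all 2^3 outcomes of three coin flips, each with probability 1/8.
outcomes = list(product("HT", repeat=3))

# X(omega) = number of Heads; tally how often each value occurs.
counts = Counter(omega.count("H") for omega in outcomes)
pmf = {x: counts[x] / len(outcomes) for x in sorted(counts)}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}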


Discrete random variables


Definition
A random variable X that can take on finitely or countably
infinitely many possible values is called discrete.

Example
The number of Heads in three coinflips is discrete.

Example
The number of coinflips needed to first see a Head is discrete: it
can be 1, 2, 3, . . . .

Example
The lifetime of a device is not discrete: it can be anything in the real interval [0, ∞).

Mass function
The distribution of a random variable will be the object of
central importance to us.

Definition
Let X be a discrete random variable with possible values x_1, x_2, . . . . The probability mass function (pmf), or distribution, of a random variable tells us the probabilities of these possible values:

    p_X(x_i) = P{X = x_i},

for all possible x_i’s.

Often the possible values are just integers, x_i = i, and we can just write p_X(i) for the mass function. We also omit the subscript X if it’s clear which random variable we are considering and simply put p(i).

Mass function
Proposition
For any discrete random variable X,

    p(x_i) ≥ 0 for all i,   and   \sum_i p(x_i) = 1.

Proof.

Remark
Vice versa: any function p which is only non-zero in countably many x_i values, and which has the above properties, is a probability mass function. There is a sample space and a random variable that realises this mass function.

Mass function

Example
We have seen X, the number of Heads in three coinflips. Its possible values are X = 0, 1, 2, 3, and its mass function is given by

    p(0) = p(3) = 1/8;   p(1) = p(2) = 3/8.

Indeed,

    \sum_{i=0}^{3} p(i) = 1/8 + 3/8 + 3/8 + 1/8 = 1.


Mass function

Example
Fix a positive parameter λ > 0, and define

    p(i) = c · λ^i / i!,   i = 0, 1, 2, . . . .

How should we choose c to make this into a mass function? In that case, what are P{X = 0} and P{X > 2} for the random variable X having this mass function?


Mass function

Solution
First, p(i) ≥ 0 iff c ≥ 0. Second, we need

    \sum_{i=0}^{∞} p(i) = c · \sum_{i=0}^{∞} λ^i / i! = c · e^λ = 1,

from which c = e^{−λ}. To answer the probabilities,

    P{X = 0} = p(0) = e^{−λ} · λ^0 / 0! = e^{−λ};


Mass function

Solution (. . . cont’d)
    P{X > 2} = 1 − P{X ≤ 2}
             = 1 − P{X = 0} − P{X = 1} − P{X = 2}
             = 1 − e^{−λ} · λ^0/0! − e^{−λ} · λ^1/1! − e^{−λ} · λ^2/2!
             = 1 − e^{−λ} − e^{−λ} · λ − e^{−λ} · λ^2/2.
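For a concrete feel for these formulas, here is a small numerical check; the value λ = 2 is an arbitrary choice for illustration:

import math

lam = 2.0  # illustrative value of the parameter lambda

# Normalizing constant c = e^{-lambda}, so that sum_i c * lam^i / i! = 1.
c = math.exp(-lam)
print(sum(c * lam**i / math.factorial(i) for i in range(100)))  # ~1.0

# P{X > 2} = 1 - p(0) - p(1) - p(2)
print(1 - c * (1 + lam + lam**2 / 2))  # ~0.3233 for lambda = 2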


Expectation, variance

Once we have a random variable, we would like to quantify its typical behaviour in some sense. Two of the most often used quantities for this are the expectation and the variance.


1. Expectation
Definition
The expectation, or mean, or expected value of a discrete
random variable X is defined as
    EX := \sum_i x_i · p(x_i),

provided that this sum exists.

Remark
The expectation is nothing else than a weighted average of the possible values x_i with weights p(x_i). A center of mass, in other words.
[Figure: point masses p(x_1), p(x_2), p(x_3), p(x_4) placed at x_1, x_2, x_3, x_4 on a line, with E(X) marked as their center of mass.]
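The weighted-average picture translates directly into code. A minimal sketch (the helper function and the example pmf are our own illustration):

def expectation(pmf):
    """Expectation of a discrete r.v. given its pmf as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

# Number of Heads in three coin flips, from the earlier example.
pmf_heads = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(expectation(pmf_heads))  # 1.5, the center of mass of this pmf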


1. Expectation
Remark

Why is this definition natural?

Example (an important one. . . )
Let X be an indicator variable:

    X = 1, if event E occurs,
    X = 0, if event E^c occurs.

(Annotation: an indicator variable is 1 when the event in question occurs and 0 when it does not; its expectation equals the probability of the event it indicates.)

Its mass function is p(1) = P{E} and p(0) = 1 − P{E}. Its expectation is

    EX = 0 · p(0) + 1 · p(1) = P{E}.


1. Expectation

Example (fair die)


Let X be the number shown after rolling a fair die. Then X = 1, 2, . . . , 6, each with probability 1/6. The expectation is

    EX = \sum_{i=1}^{6} i · p(i) = \sum_{i=1}^{6} i · 1/6 = (1 + 6)/2 · 6 · 1/6 = 7/2.

The expected value is not necessarily a possible value. Have you ever seen a die showing 3.5. . . ?


2. A few properties of expectation

Proposition (expectation of a function of a r.v.)


Let X be a discrete random variable, and g : R → R a function. Then

    E g(X) = \sum_i g(x_i) · p(x_i),

provided this sum exists.
This formula is rather natural.
Proof.


2. A few properties of expectation


Corollary (linearity of expectations, first version)
Let X be a discrete random variable, a and b fixed real
numbers. Then

E(aX + b) = a · EX + b.

Proof.
According to the above (with g(x) = ax + b),

    E(aX + b) = \sum_i (a x_i + b) · p(x_i) = a · \sum_i x_i p(x_i) + b · \sum_i p(x_i)
              = a · EX + b · 1 = a · EX + b.
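A quick numerical sanity check of this corollary, reusing the expectation helper sketched above (the pmf and the constants a, b are arbitrary illustrations):

def expectation(pmf):
    return sum(x * p for x, p in pmf.items())

pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}  # number of Heads in three flips
a, b = 2.0, 5.0

lhs = sum((a * x + b) * p for x, p in pmf.items())  # E(aX + b) from the pmf of X
rhs = a * expectation(pmf) + b                      # a * EX + b
print(lhs, rhs)  # both 8.0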


2. A few properties of expectation


Definition (moments)
Let n be a positive integer. The nth moment of a random variable X is defined as

    EX^n.

The nth absolute moment of X is

    E|X|^n.

Remark
Our notation in this definition and in the future will be

    EX^n := E(X^n) ≠ (EX)^n !!


3. Variance
Example
Define X ≡ 0, and

    Y = 1 wp. 1/2, −1 wp. 1/2;
    Z = 2 wp. 1/5, −1/2 wp. 4/5;
    U = 10 wp. 1/2, −10 wp. 1/2.

Notice EX = EY = EZ = EU = 0: the expectation does not distinguish between these r.v.’s. Yet they are clearly different.

Definition (variance, standard deviation)


The variance and the standard deviation of a random variable are defined as VarX := E(X − EX)^2 and SD X := \sqrt{VarX}.

Why this definition?



3. Variance
Example (. . . cont’d)
    VarX = E(X − 0)^2 = 0^2 = 0,                          SD X = \sqrt{0} = 0.
    VarY = E(Y − 0)^2 = 1^2 · 1/2 + (−1)^2 · 1/2 = 1,     SD Y = \sqrt{1} = 1.
    VarZ = E(Z − 0)^2 = 2^2 · 1/5 + (−1/2)^2 · 4/5 = 1,   SD Z = \sqrt{1} = 1.
    VarU = E(U − 0)^2 = 10^2 · 1/2 + (−10)^2 · 1/2 = 100, SD U = \sqrt{100} = 10.
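These computations are easy to reproduce. A minimal sketch with the pmfs written as dictionaries (our own illustration, not part of the slides):

import math

def var(pmf):
    """Variance of a discrete r.v. given its pmf as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

for name, pmf in [("Y", {1: 1/2, -1: 1/2}),
                  ("Z", {2: 1/5, -1/2: 4/5}),
                  ("U", {10: 1/2, -10: 1/2})]:
    print(name, var(pmf), math.sqrt(var(pmf)))  # variances 1, 1, 100; SDs 1, 1, 10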


3. Variance

These numbers do distinguish between most of our variables, though finer information would be needed to make a difference between Y and Z.


4. A few properties of the variance

Proposition (an equivalent form of the variance)


For any X, VarX = EX^2 − (EX)^2.

Proof.

Corollary
For any X, EX^2 ≥ (EX)^2, with equality only if X = const. a.s.

New notation: a.s. (almost surely) means with probability one.


4. A few properties of the variance

Example
The variance of the number X shown after rolling a fair die is

    VarX = EX^2 − (EX)^2 = (1^2 + 2^2 + · · · + 6^2) · 1/6 − (7/2)^2 = 35/12,

and its standard deviation is \sqrt{35/12} ≃ 1.71.

The two most important numbers we can say about a fair die are the average of 3.5 and typical deviations of 1.71 around this average.
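Reusing the same kind of pmf dictionary, the fair-die numbers come out directly (our own check, not part of the slides):

import math

die = {i: 1/6 for i in range(1, 7)}                          # pmf of a fair die
mean = sum(x * p for x, p in die.items())                    # 3.5
variance = sum((x - mean) ** 2 * p for x, p in die.items())  # 35/12
print(mean, variance, math.sqrt(variance))                   # 3.5 2.9166... 1.7078...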


4. A few properties of the variance

Example (an important one)


The variance of the indicator variable X of the event E is

    VarX = EX^2 − (EX)^2 = 1^2 · P{E} − (P{E})^2 = P{E} · (1 − P{E}),

and the standard deviation is SD X = \sqrt{P{E} · (1 − P{E})}.


4. A few properties of the variance

Proposition (nonlinearity of the variance)


Let X be a random variable, a and b fixed real numbers. Then

    Var(aX + b) = a^2 · VarX.

Proof.

Notice the square on a^2 and also that, in particular, Var(X + b) = VarX = Var(−X): the variance is invariant to shifting the random variable by a constant b or to reflecting it.


Bernoulli, Binomial

In this part we’ll get to know the Bernoulli and the Binomial
distributions.

The setting will be that a fixed number of independent trials will be made, each succeeding with probability p. We will be counting the number of successes.


1. Definition

Definition
Suppose that n independent trials are performed, each
succeeding with probability p. Let X count the number of
successes within the n trials. Then X has the
Binomial distribution with parameters n and p or, in short,
X ∼ Binom(n, p).

The special case of n = 1 is called the Bernoulli distribution with parameter p.

Notice that the Bernoulli distribution is just another name for the
indicator variable from before.


2. Mass function
Proposition
Let X ∼ Binom(n, p). Then X = 0, 1, . . . , n, and its mass function is

    p(i) = P{X = i} = \binom{n}{i} p^i (1 − p)^{n−i},   i = 0, 1, . . . , n.

In particular, the Bernoulli(p) variable can take on values 0 or 1, with respective probabilities

    p(0) = 1 − p,   p(1) = p.


2. Mass function

Remark
That the above is indeed a mass function we verify via the Binomial Theorem (p(i) ≥ 0 is clear):

    \sum_{i=0}^{n} p(i) = \sum_{i=0}^{n} \binom{n}{i} p^i (1 − p)^{n−i} = [p + (1 − p)]^n = 1.


2. Mass function
Example
Screws are sold in packages of 10. Due to a manufacturing error, each screw today is independently defective with probability 0.1. If there is a money-back guarantee that at most one screw is defective in a package, what percentage of packages is returned?

Define X to be the number of defective screws in a package. Then X ∼ Binom(10, 0.1), and the answer is the chance that a given package has 2 or more faulty screws:

    P{X ≥ 2} = 1 − P{X = 0} − P{X = 1}
             = 1 − \binom{10}{0} 0.1^0 · 0.9^{10} − \binom{10}{1} 0.1^1 · 0.9^9 ≃ 0.2639.
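The 26% figure is quick to verify with Python's math.comb (a small check of our own, not from the slides):

from math import comb

n, p = 10, 0.1
p_return = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))
print(p_return)  # ~0.2639, so about 26% of packages are returned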


3. Expectation, variance
Proposition
Let X ∼ Binom(n, p). Then

EX = np, and VarX = np(1 − p).

Proof.
We first need to calculate
    EX = \sum_i i · p(i) = \sum_{i=0}^{n} i · \binom{n}{i} p^i (1 − p)^{n−i}.

To handle this, here is a cute trick: i = \frac{d}{dt} t^i |_{t=1}.


3. Expectation, variance

Proof.

    EX = \sum_{i=0}^{n} i · \binom{n}{i} p^i (1 − p)^{n−i}
       = \sum_{i=0}^{n} \frac{d}{dt} t^i |_{t=1} · \binom{n}{i} p^i (1 − p)^{n−i}
       = \frac{d}{dt} \sum_{i=0}^{n} \binom{n}{i} (tp)^i (1 − p)^{n−i} |_{t=1}
       = \frac{d}{dt} (tp + 1 − p)^n |_{t=1} = n(tp + 1 − p)^{n−1} · p |_{t=1} = np.
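The result EX = np (and VarX = np(1 − p)) can also be cross-checked numerically from the mass function; the parameters below are arbitrary illustrations:

from math import comb

n, p = 10, 0.3
pmf = {i: comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)}
mean = sum(i * q for i, q in pmf.items())
variance = sum((i - mean) ** 2 * q for i, q in pmf.items())
print(mean, n * p)                # both 3.0
print(variance, n * p * (1 - p))  # both 2.1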


Poisson

The Poisson distribution is of central importance in Probability. We won’t see immediately why; we’ll just start with defining its distribution. Later we’ll see how it comes from the Binomial.


1. Mass function

Definition
Fix a positive real number λ. The random variable X is
Poisson distributed with parameter λ, in short X ∼ Poi(λ), if it is
non-negative integer valued, and its mass function is

    p(i) = P{X = i} = e^{−λ} · λ^i / i!,   i = 0, 1, 2, . . .

We have already seen in an example that this is indeed a mass function.

Ok, nice, but why this distribution?


2. Poisson approximation of Binomial

Proposition
Fix λ > 0, and suppose that Y_n ∼ Binom(n, p) with p = p(n) in such a way that n · p → λ. Then the distribution of Y_n converges to Poisson(λ):

    ∀ i ≥ 0:   P{Y_n = i} −→ e^{−λ} · λ^i / i!   as n → ∞.

That is, take Y ∼ Binom(n, p) with large n, small p, such that np ≃ λ. Then Y is approximately Poisson(λ) distributed.


2. Poisson approximation of Binomial

Proof.

    P{Y_n = i} = \binom{n}{i} p^i (1 − p)^{n−i}
               = \frac{1}{i!} · [np] · [(n − 1)p] · · · [(n − i + 1)p] · \frac{(1 − p)^n}{(1 − p)^i}.

Now, np → λ, (n − 1)p → λ, . . . , (n − i + 1)p → λ, and

    (1 − p)^n = (1 − \frac{1}{1/p})^n −→ e^{−λ}   as n → ∞,

while (1 − p)^i → 1. Therefore, P{Y_n = i} → \frac{1}{i!} λ^i e^{−λ}.
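A short sketch illustrating the convergence for one fixed i (the values of λ, i and n are arbitrary illustrations):

from math import comb, exp, factorial

lam, i = 3.0, 2
for n in (10, 100, 1000, 10000):
    p = lam / n                                     # chosen so that n*p = lambda
    print(n, comb(n, i) * p**i * (1 - p)**(n - i))  # P{Y_n = i}, approaching the limit
print("Poisson limit:", exp(-lam) * lam**i / factorial(i))  # ~0.2240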


3. Expectation, variance

Proposition
For X ∼ Poi(λ), EX = VarX = λ.

Recall np and np(1 − p) for the Binomial. . .

Proof.

    EX = \sum_{i=0}^{∞} i · p(i) = \sum_{i=1}^{∞} i · e^{−λ} · λ^i / i!
       = λ \sum_{i=1}^{∞} e^{−λ} · λ^{i−1} / (i − 1)! = λ \sum_{j=0}^{∞} e^{−λ} · λ^j / j! = λ.


4. Examples

Example
Because of the approximation of the Binomial, the
◮ number of typos on a page of a book;
◮ number of citizens over 100 years of age in a city;
◮ number of incoming calls per hour in a customer centre;
◮ number of customers in a post office today
are each well approximated by the Poisson distribution.

Many independent small probability events, summing up to “a


few” in expectation.


4. Examples
Example
A book on average has 1/2 typos per page. What is the
probability that the next page has at least three of them?
The number X of typos on a page follows a Poisson(λ) distribution, where λ can be determined from 1/2 = EX = λ. To answer the question,

    P{X ≥ 3} = 1 − P{X ≤ 2}
             = 1 − P{X = 0} − P{X = 1} − P{X = 2}
             = 1 − (1/2)^0/0! · e^{−1/2} − (1/2)^1/1! · e^{−1/2} − (1/2)^2/2! · e^{−1/2}
             ≃ 0.014.


4. Examples
Example
Screws are sold in packages of 10. Due to a manufacturing error, each screw today is independently defective with probability 0.1. If there is a money-back guarantee that at most one screw is defective in a package, what percentage of packages is returned?

Define X as before; X ∼ Binom(10, 0.1). However, it can already be well approximated by a Poi(1) distribution (λ = 1 = 10 · 0.1 = np). Thus,

    P{X ≥ 2} = 1 − P{X = 0} − P{X = 1}
             ≃ 1 − e^{−1} · 1^0/0! − e^{−1} · 1^1/1! ≃ 0.2642.

Compare this with the exact value 0.2639 from the Binomial.
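A sketch comparing the exact Binomial value with its Poisson(1) approximation (our own check):

from math import comb, exp

n, p = 10, 0.1
exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in (0, 1))
approx = 1 - exp(-1) * (1 + 1)  # Poisson(1): 1 - p(0) - p(1) = 1 - e^{-1}(1 + 1)
print(exact, approx)            # ~0.2639 vs ~0.2642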

Geometric

In this setting we again perform independent trials. However, the question we ask is now different: we’ll be waiting for the first success.


1. Mass function
Definition
Suppose that independent trials, each succeeding with
probability p, are repeated until the first success. The total
number X of trials made has the Geometric(p) distribution (in
short, X ∼ Geom(p)).

Proposition
X can take on positive integer values, with probabilities

    p(i) = (1 − p)^{i−1} · p,   i = 1, 2, . . . .

That this is a mass function, we verify by p(i) ≥ 0 and

    \sum_{i=1}^{∞} p(i) = \sum_{i=1}^{∞} (1 − p)^{i−1} · p = \frac{p}{1 − (1 − p)} = 1.


1. Mass function

Remark
For a Geometric(p) random variable and any k ≥ 1 we have P{X ≥ k} = (1 − p)^{k−1} (we have at least k − 1 failures).

Corollary
The Geometric random variable is (discrete) memoryless: for
every k ≥ 1, n ≥ 0

P{X ≥ n + k | X > n} = P{X ≥ k}.
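A quick numerical check of the memoryless property; p, n and k below are arbitrary illustrative values:

p, n, k = 0.3, 4, 3

def tail(j, p):
    """P{X >= j} = (1 - p)^(j - 1) for a Geometric(p) variable, j >= 1."""
    return (1 - p) ** (j - 1)

# P{X >= n + k | X > n} = P{X >= n + k} / P{X > n}, and P{X > n} = P{X >= n + 1}.
print(tail(n + k, p) / tail(n + 1, p), tail(k, p))  # both (1 - p)^(k - 1) = 0.49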


2. Expectation, variance

Proposition
For a Geometric(p) random variable X ,

    EX = 1/p,   VarX = (1 − p)/p^2.


2. Expectation, variance

Proof.


    EX = \sum_{i=1}^{∞} i · (1 − p)^{i−1} p = \sum_{i=0}^{∞} i · (1 − p)^{i−1} p
       = \sum_{i=0}^{∞} \frac{d}{dt} t^i |_{t=1} · (1 − p)^{i−1} p = \frac{d}{dt} \sum_{i=0}^{∞} t^i · (1 − p)^{i−1} p |_{t=1}
       = \frac{p}{1 − p} · \frac{d}{dt} \frac{1}{1 − (1 − p)t} |_{t=1}
       = \frac{p}{1 − p} · \frac{1 − p}{(1 − (1 − p))^2} = \frac{1}{p}.


3. Example

Example
To first see 3 appearing on a fair die, we wait X ∼ Geom(1/6) many rolls. Our average waiting time is EX = 1/(1/6) = 6 rolls, and the standard deviation is

    SD X = \sqrt{VarX} = \sqrt{(1 − 1/6)/(1/6)^2} = \sqrt{30} ≃ 5.48.


3. Example

Example (. . . cont’d)
The chance that 3 first comes on the 7th roll is
    p(7) = P{X = 7} = (1 − 1/6)^6 · 1/6 ≃ 0.056,

while the chance that 3 first comes on the 7th or later rolls is

    P{X ≥ 7} = (1 − 1/6)^6 ≃ 0.335.
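These values follow directly from the Geometric(1/6) mass function; a minimal sketch (our own illustration):

p = 1 / 6  # probability of rolling a 3 on any given roll

def geom_pmf(i):
    return (1 - p) ** (i - 1) * p   # P{X = i}

def tail(k):
    return (1 - p) ** (k - 1)       # P{X >= k}

print(geom_pmf(7))  # ~0.056, first 3 exactly on the 7th roll
print(tail(7))      # ~0.335, first 3 on the 7th roll or later
print(1 / p)        # 6.0, the expected waiting time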

