Probability & Random Variables


Chapter 2: Axioms of Probability

①Proof of 𝑃(∅) = 0:
Consider a sequence of events E1, E2, …, where E1 = S and Ei = ∅ for i > 1. Because these events are mutually exclusive and S = E1 ∪ E2 ∪ ⋯, Axiom 3 gives:

$$P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)$$

Which means:

𝑃(∅) = 0

②More generally, for any finite collection of mutually exclusive events:


$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$$

③the Strong Law of Large Numbers:


With probability 1, the long-run fraction of repetitions in which a specific event E occurs equals P(E) when the experiment is repeated over and over again. (This will be proved later.)

④regarding Eʹ:
1 = 𝑃(𝑆) = 𝑃(𝐸 ∪ 𝐸ʹ) = 𝑃(𝐸) + 𝑃(𝐸ʹ)

⑤summation of probabilities (proof of 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹))


𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸 ∪ (𝐸ʹ ∩ 𝐹)) = 𝑃(𝐸) + 𝑃(𝐸ʹ ∩ 𝐹)

Since 𝐹 = (𝐹 ∩ 𝐸) ∪ (𝐹 ∩ 𝐸ʹ), 𝑃(𝐹) = 𝑃(𝐸 ∩ 𝐹) + 𝑃(𝐸ʹ ∩ 𝐹), so 𝑃(𝐸ʹ ∩ 𝐹) = 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹). Substituting this into the first equation gives 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹).
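As a quick numerical illustration (not part of the original notes), the identity can be checked on a small equally likely sample space; the die and the event choices below are arbitrary.

```python
# Sketch: check P(E ∪ F) = P(E) + P(F) - P(E ∩ F) on a fair die
# (sample space and events chosen only for illustration).
from fractions import Fraction

S = set(range(1, 7))                    # outcomes of one fair die
E = {1, 2, 3}                           # "at most 3"
F = {2, 4, 6}                           # "even"
P = lambda A: Fraction(len(A), len(S))  # equally likely outcomes

assert P(E | F) == P(E) + P(F) - P(E & F) == Fraction(5, 6)
```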

⑥Proof of P(E) = N(E)/N(S):
For S = {1, 2, 3, …, N} with all outcomes equally likely, P({1}) = P({2}) = ⋯ = P({N}), and since {1}, {2}, …, {N} are all mutually exclusive, then, for example, for E = {1, 2, 4}:

$$P(\{1\} \cup \{2\} \cup \{4\}) = P(\{1\}) + P(\{2\}) + P(\{4\}) = \frac{1}{N} + \frac{1}{N} + \frac{1}{N} = \frac{N(E)}{N(S)}$$

⑦regarding urns and balls:


Suppose we have an urn containing n balls, p of which are special. The probability that, when k balls are withdrawn, q of them are special is (the hypergeometric distribution):

$$\frac{\binom{p}{q}\binom{n-p}{k-q}}{\binom{n}{k}}$$
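As a sketch (the numbers below are made up purely for illustration), the formula can be evaluated directly with math.comb:

```python
# Urn with n = 20 balls, p = 5 special; draw k = 6 and ask for q = 2 special.
from math import comb

n, p, k, q = 20, 5, 6, 2
prob = comb(p, q) * comb(n - p, k - q) / comb(n, k)
print(prob)  # ≈ 0.352
```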

Chapter 4: Random Variables


①a general formula (linearity of expectation):
𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏

②Variance definition:
$$\mathrm{Var}(X) = E[(X-\mu)^2] = \sum_x (x-\mu)^2\, p(x)$$

$$= \sum_x (x^2 - 2x\mu + \mu^2)\, p(x) = \sum_x x^2 p(x) - 2\mu \sum_x x\, p(x) + \mu^2 \sum_x p(x)$$

$$= E[X^2] - 2\mu \cdot E[X] + \mu^2 \cdot 1 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2$$

$$\therefore \mathrm{Var}(X) = E[X^2] - (E[X])^2$$
The conclusion is a more convenient way of calculating Var(X).

Also note that:

$$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$$
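A small numerical check of both identities on a made-up pmf (the values are arbitrary, used only for illustration):

```python
# Verify Var(X) = E[X^2] - (E[X])^2 and Var(aX + b) = a^2 Var(X).
xs = [0, 1, 2]
ps = [0.2, 0.5, 0.3]

mean = sum(x * p for x, p in zip(xs, ps))
var  = sum(x**2 * p for x, p in zip(xs, ps)) - mean**2

a, b = 3, 7
mean_y = sum((a*x + b) * p for x, p in zip(xs, ps))
var_y  = sum((a*x + b)**2 * p for x, p in zip(xs, ps)) - mean_y**2

assert abs(var_y - a**2 * var) < 1e-12
```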

③Standard Deviation:
𝑠𝑑(𝑋) = √𝑉𝑎𝑟(𝑋)

④the Bernoulli random variable:


𝑝(0) = 1 − 𝑝
𝑝(1) = 𝑝
The expected value is:

𝐸(𝑋) = 𝑝

⑤Binomial random variable:


$$p(i) = \binom{n}{i}\, p^i (1-p)^{n-i}, \qquad i = 0, 1, \ldots, n$$
The parameters are shown as (n,p)

The expected value is equal to:
$$E[X] = \sum_{i=1}^{n} i \binom{n}{i} p^i (1-p)^{n-i}$$

Note that: $i \binom{n}{i} = n \binom{n-1}{i-1}$, so

$$E[X] = \sum_{i=1}^{n} n \binom{n-1}{i-1} p \cdot p^{i-1} (1-p)^{(n-1)-(i-1)} = np \sum_{i=1}^{n} \binom{n-1}{i-1} p^{i-1} (1-p)^{(n-1)-(i-1)} = np$$

For the variance, we first need E[X²]:

$$E[X^2] = \sum_{i=1}^{n} i^2 \binom{n}{i} p^i (1-p)^{n-i} = np \sum_{i=1}^{n} i \binom{n-1}{i-1} p^{i-1} (1-p)^{(n-1)-(i-1)}$$

$$= np \sum_{i=1}^{n} [1 + (i-1)] \binom{n-1}{i-1} p^{i-1} (1-p)^{(n-1)-(i-1)}$$

$$= np \left[\left(\sum_{i=1}^{n} \binom{n-1}{i-1} p^{i-1} (1-p)^{(n-1)-(i-1)}\right) + \left(\sum_{i=1}^{n} (i-1) \binom{n-1}{i-1} p^{i-1} (1-p)^{(n-1)-(i-1)}\right)\right]$$

The first bracketed sum is 1 (it is a binomial(n−1, p) pmf summed over its whole range), and the second is the mean of a binomial(n−1, p) random variable, namely (n−1)p.

$$\therefore E[X^2] = np\,(1 + (n-1)p)$$


Using this value of E[X²], we can now calculate Var(X):

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = np(1 + (n-1)p) - (np)^2 = np(1-p)$$
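These two results can be confirmed numerically for one (arbitrary) choice of parameters by summing the pmf directly with math.comb:

```python
from math import comb

n, p = 12, 0.3
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var  = sum(i**2 * q for i, q in enumerate(pmf)) - mean**2
assert abs(mean - n * p) < 1e-9           # E[X] = np
assert abs(var - n * p * (1 - p)) < 1e-9  # Var(X) = np(1-p)
```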

⑥The Poisson random variable:


It may be used as an approximation for a binomial random variable with parameters (n,p) when n is large and
p is small enough so that np is of moderate size. It is defined as:

$$p(i) = e^{-\lambda}\, \frac{\lambda^i}{i!}, \qquad i = 0, 1, 2, \ldots$$

It is indeed a probability mass function, since the probabilities sum to 1:

$$\sum_{i=0}^{\infty} p(i) = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1$$

It is derived as follows, taking np=λ:

$$p(i) = \frac{n!}{(n-i)!\, i!}\, p^i (1-p)^{n-i} = \frac{n!}{(n-i)!\, i!} \left(\frac{\lambda}{n}\right)^{i} \left(1 - \frac{\lambda}{n}\right)^{n-i}$$

$$= \frac{n(n-1)\cdots(n-i+1)}{n^i}\, \frac{\lambda^i}{i!}\, \frac{(1-\lambda/n)^n}{(1-\lambda/n)^i}$$

For n large and λ moderate:

$$\left(1 - \frac{\lambda}{n}\right)^{n} \approx e^{-\lambda}, \qquad \frac{n(n-1)\cdots(n-i+1)}{n^i} \approx 1, \qquad \left(1 - \frac{\lambda}{n}\right)^{i} \approx 1$$
Hence,

$$p(i) \approx e^{-\lambda}\, \frac{\lambda^i}{i!}$$
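A rough numerical comparison of the binomial pmf with its Poisson approximation; n = 500 and p = 0.01 are arbitrary example values chosen so that n is large and λ = np = 5 is moderate:

```python
from math import comb, exp, factorial

n, p = 500, 0.01
lam = n * p
for i in range(8):
    binom   = comb(n, i) * p**i * (1 - p)**(n - i)
    poisson = exp(-lam) * lam**i / factorial(i)
    print(i, round(binom, 4), round(poisson, 4))  # the two columns nearly agree
```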

⑦where the Poisson is used:


Suppose we want to approximate the distribution of the number of people, out of n, who reach age 100. If each person independently reaches age 100 with (small) probability p, then the number who do so is approximately Poisson with parameter λ = np, and p(i) gives the probability that exactly i of them reach age 100.

⑧E[X] & Var[X]:


First we calculate E[X]:
$$E[X] = \sum_{i=0}^{\infty} i\, e^{-\lambda} \frac{\lambda^{i}}{i!} = e^{-\lambda} \lambda \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} = e^{-\lambda} \lambda\, e^{\lambda} = \lambda$$

For the variance:


$$E[X^2] = \sum_{i=0}^{\infty} i^2 e^{-\lambda} \frac{\lambda^{i}}{i!} = \lambda \sum_{i=1}^{\infty} i\, e^{-\lambda} \frac{\lambda^{i-1}}{(i-1)!} = \lambda \left[\sum_{i=1}^{\infty} (i-1)\, e^{-\lambda} \frac{\lambda^{i-1}}{(i-1)!} + \sum_{i=1}^{\infty} e^{-\lambda} \frac{\lambda^{i-1}}{(i-1)!}\right] = \lambda[\lambda + 1]$$

Var[X] is equal to:

$$\mathrm{Var}(X) = \lambda(\lambda + 1) - \lambda^2 = \lambda$$
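A quick sanity check of E[X] = λ and Var(X) = λ, truncating the infinite sums at a cutoff beyond which the remaining mass is negligible (λ = 4 is arbitrary):

```python
from math import exp, factorial

lam, cutoff = 4.0, 60
pmf = [exp(-lam) * lam**i / factorial(i) for i in range(cutoff)]

mean = sum(i * q for i, q in enumerate(pmf))
var  = sum(i**2 * q for i, q in enumerate(pmf)) - mean**2
assert abs(mean - lam) < 1e-9 and abs(var - lam) < 1e-9
```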

⑨the Negative Binomial Random Variable


Independent trials, each a success with probability p, are repeated until r successes have occurred. The number of trials required, X, is a negative binomial random variable with parameters (r, p):

$$P(X = n) = \binom{n-1}{r-1}\, p^r (1-p)^{n-r}, \qquad n = r, r+1, \ldots$$

This follows because, for the r-th success to occur on the n-th trial, exactly r−1 successes must have occurred in the previous n−1 trials, and the n-th trial itself must be a success (probability p).

Example: what is the probability of achieving r successes before m failures?

In other words, we need the probability that the r-th success occurs on or before trial r+m−1, which is:

$$\sum_{n=r}^{r+m-1} \binom{n-1}{r-1}\, p^r (1-p)^{n-r}$$
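The sum can be computed directly; the sketch below uses arbitrary example values r = 3, m = 4, p = 0.5 (three successes before four failures with a fair coin):

```python
from math import comb

def prob_r_successes_before_m_failures(r, m, p):
    # the r-th success must occur on trial r, r+1, ..., r+m-1
    return sum(comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)
               for n in range(r, r + m))

print(prob_r_successes_before_m_failures(3, 4, 0.5))  # 0.65625
```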
⑩the expected value and variance of the NB random variable:
First we calculate E[X]:
$$E[X] = \sum_{n=r}^{\infty} n \binom{n-1}{r-1} p^r (1-p)^{n-r} = \frac{r}{p} \sum_{n=r}^{\infty} \binom{n}{r} p^{r+1} (1-p)^{n-r}$$

$$= \frac{r}{p} \sum_{n=r}^{\infty} \binom{(n+1)-1}{(r+1)-1} p^{r+1} (1-p)^{(n+1)-(r+1)} = \frac{r}{p}$$

Here the identity $n\binom{n-1}{r-1} = r\binom{n}{r}$ is used, and the final summand is the negative binomial probability mass function with parameters (r+1, p) evaluated at n+1, so the sum equals 1. Hence:

$$E[X] = \frac{r}{p}$$
For the variance, a similar manipulation gives E[X²] = (r/p) E[Y − 1], where Y is a negative binomial random variable with parameters (r+1, p):

$$E[X^2] = \frac{r}{p}\, E[Y - 1] = \frac{r}{p}\left(\frac{r+1}{p} - 1\right)$$

$$\mathrm{Var}(X) = \frac{r}{p}\left(\frac{r+1}{p} - 1\right) - \left(\frac{r}{p}\right)^2 = \frac{r(1-p)}{p^2}$$
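A numerical check of E[X] = r/p and Var(X) = r(1−p)/p², truncating the sums at a large n (r = 3 and p = 0.4 are chosen only for illustration):

```python
from math import comb

r, p, cutoff = 3, 0.4, 400
pmf = {n: comb(n - 1, r - 1) * p**r * (1 - p)**(n - r) for n in range(r, cutoff)}

mean = sum(n * q for n, q in pmf.items())
var  = sum(n**2 * q for n, q in pmf.items()) - mean**2
assert abs(mean - r / p) < 1e-6
assert abs(var - r * (1 - p) / p**2) < 1e-6
```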

⑪the Hypergeometric random variable:


Suppose that a sample of size n is to be chosen randomly (without replacement) from an urn containing N
balls, of which m are white and N − m are black. If we let X denote the number of white balls selected,
$$P(X = i) = \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}, \qquad i = 0, 1, \ldots, n$$

X is then a hypergeometric random variable with parameters (n, N, m).

⑫the expected value and variance of the hypergeometric variable:

$$E[X] = \sum_{i=1}^{n} i\, \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$$

Using $i\binom{m}{i} = m\binom{m-1}{i-1}$ and $\binom{N}{n} = \frac{N}{n}\binom{N-1}{n-1}$:

$$E[X] = \frac{nm}{N} \sum_{i=1}^{n} \frac{\binom{m-1}{i-1}\binom{(N-1)-(m-1)}{(n-1)-(i-1)}}{\binom{N-1}{n-1}} = \frac{nm}{N} \cdot 1 = \frac{nm}{N}$$

The sum equals 1 because the summand is the hypergeometric probability mass function with parameters (n−1, N−1, m−1), summed over its whole range.

For E[X²]:

$$E[X^2] = \sum_{i=1}^{n} i^2\, \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}} = \frac{nm}{N} \sum_{i=1}^{n} [(i-1)+1]\, \frac{\binom{m-1}{i-1}\binom{(N-1)-(m-1)}{(n-1)-(i-1)}}{\binom{N-1}{n-1}}$$

$$= \frac{nm}{N}\,\big[E(i-1) + 1\big] = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1\right]$$

Here E(i−1) is taken with respect to the hypergeometric pmf with parameters (n−1, N−1, m−1), so it equals (n−1)(m−1)/(N−1).

For the variance:


$$\mathrm{Var}(X) = \frac{nm}{N}\left[\frac{(n-1)(m-1)}{N-1} + 1\right] - \left(\frac{nm}{N}\right)^2$$

Using $\frac{m-1}{N-1} = \frac{m}{N} - \frac{1 - m/N}{N-1}$:

$$\therefore \mathrm{Var}(X) = \frac{nm}{N}\left[(n-1)\left(\frac{m}{N} - \frac{1-m/N}{N-1}\right) + 1 - \frac{nm}{N}\right] = \frac{nm}{N}\left(1 - \frac{m}{N}\right)\left(1 - \frac{n-1}{N-1}\right)$$
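Both results can be verified numerically for one illustrative parameter set (N = 30, m = 12, n = 10; the values are arbitrary):

```python
from math import comb

N, m, n = 30, 12, 10
pmf = [comb(m, i) * comb(N - m, n - i) / comb(N, n) for i in range(n + 1)]

mean = sum(i * q for i, q in enumerate(pmf))
var  = sum(i**2 * q for i, q in enumerate(pmf)) - mean**2
assert abs(mean - n * m / N) < 1e-9
assert abs(var - (n * m / N) * (1 - m / N) * (1 - (n - 1) / (N - 1))) < 1e-9
```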

Chapter 5: Continuous Random Variables


①A continuous random variable X is described by a probability density function f(x):

$$P\{a \le X \le b\} = \int_a^b f(x)\,dx$$

Note that f(a) itself does NOT give P(X = a); in fact:

$$P(X = a) = \int_a^a f(x)\,dx = 0$$

Therefore:
$$P(X < a) = P(X \le a) = F(a) = \int_{-\infty}^{a} f(x)\,dx$$

②The expectation of continuous random variables:


$$E[X] = \int_{-\infty}^{\infty} x\, f(x)\,dx$$

$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x)\,dx$$

For example, if

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

what is $E(e^X)$?

$$E(e^X) = \int_{-\infty}^{\infty} e^{x} f(x)\,dx = \int_{0}^{1} e^{x}\,dx = e^{x}\Big|_0^1 = e^1 - e^0 = e - 1$$
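The same value can be reproduced by numerical integration (assuming SciPy is available; this is just a check, not part of the original notes):

```python
from math import exp, e
from scipy.integrate import quad

value, _ = quad(lambda x: exp(x) * 1.0, 0, 1)  # integrand e^x · f(x), with f(x) = 1 on [0, 1]
assert abs(value - (e - 1)) < 1e-9
```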

③The variance of a continuous random variable is defined exactly as for discrete random variables:

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2, \qquad \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$$

④the uniform random variable:


$$f(x) = \begin{cases} 1 & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}$$

For any 0 < a < b < 1:

$$P(a \le X \le b) = \int_a^b f(x)\,dx = x\Big|_a^b = b - a$$

In its general form:


$$f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & \text{if } \alpha < x < \beta \\ 0 & \text{otherwise} \end{cases}$$

Therefore, for any value a:

$$F(a) = \begin{cases} 0 & a \le \alpha \\ \dfrac{a - \alpha}{\beta - \alpha} & \alpha < a < \beta \\ 1 & a \ge \beta \end{cases}$$

The expectation is equal to (α+β)/2 and the variance to (β−α)²/12.
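A quick numerical check of these two formulas for an arbitrary interval (α = 2, β = 5), again assuming SciPy for the integration:

```python
from scipy.integrate import quad

a, b = 2.0, 5.0                    # α and β
f = lambda x: 1.0 / (b - a)        # uniform density on (α, β)
mean, _ = quad(lambda x: x * f(x), a, b)
ex2, _  = quad(lambda x: x**2 * f(x), a, b)
assert abs(mean - (a + b) / 2) < 1e-9
assert abs(ex2 - mean**2 - (b - a)**2 / 12) < 1e-9
```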

⑤The normal random variable:


$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

For any normally distributed X with parameters µ and σ², Y = aX + b is normally distributed with parameters aµ + b and a²σ². Writing Z = (X − µ)/σ for the standardized variable, the cumulative distribution function of the standard normal is:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\!\left[-\,z^2/2\right] dz$$

$$\therefore P(a < X < b) = P(Z_a < Z < Z_b) = \Phi(Z_b) - \Phi(Z_a), \qquad Z_a = \frac{a - \mu}{\sigma},\; Z_b = \frac{b - \mu}{\sigma}$$
note that: Φ(∞) = 1

⑥The normal approximation to the Binomial Distribution:
If X is the number of successes in n independent trials, each with success probability p, then:

$$P\left(a \le \frac{X - np}{\sqrt{np(1-p)}} \le b\right) \approx \Phi(b) - \Phi(a)$$

The approximation is quite good when np(1-p)>10.

When using the normal approximation for a binomial probability P(X = k), approximate it by the normal probability P(k − 0.5 < X < k + 0.5) (standardizing first); this is called the continuity correction. Similarly, for a binomial probability P(X > k), use the normal probability P(X ≥ k + 0.5).
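A sketch of the continuity correction in action, comparing the exact binomial probability P(X = k) with its normal approximation (n = 40, p = 0.5, k = 20 are arbitrary example values; Φ is built from math.erf):

```python
from math import comb, sqrt, erf

def Phi(z):
    # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, k = 40, 0.5, 20
mu, sd = n * p, sqrt(n * p * (1 - p))

exact  = comb(n, k) * p**k * (1 - p)**(n - k)
approx = Phi((k + 0.5 - mu) / sd) - Phi((k - 0.5 - mu) / sd)
print(round(exact, 4), round(approx, 4))  # ≈ 0.1254 vs ≈ 0.1256
```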

⑦The exponential Random Variable:


For λ > 0, an exponential random variable with parameter λ has density:

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

⑧ Integration by Parts:
To calculate ∫ f(x)gʹ(x) dx, start from the product rule:

$$\frac{d}{dx}\,[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)$$

Integrating both sides gives:

$$\int \frac{d}{dx}[f(x)g(x)]\,dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx$$

Since the left-hand side is just f(x)g(x), rearranging gives the integration by parts formula:

$$\int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx$$

Or, representing f(x) w/ u and g(x) w/ v gives:

$$\int u\,dv = uv - \int v\,du$$

⑨The expectation and Variance of the Exponential Random Variable


The expected value is 1/λ and the variance is 1/λ².
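A numerical check of both values for an arbitrary rate (λ = 2), assuming SciPy for the improper integrals:

```python
from math import exp, inf
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * exp(-lam * x)          # exponential density for x >= 0
mean, _ = quad(lambda x: x * f(x), 0, inf)
ex2, _  = quad(lambda x: x**2 * f(x), 0, inf)
assert abs(mean - 1 / lam) < 1e-8
assert abs(ex2 - mean**2 - 1 / lam**2) < 1e-8
```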

⑩Memoryless Random Variables


Refer to Sheldon Ross, page 210.
