
Random Variable & Distributions

The document discusses random variables and their distribution functions, defining random variables as numerical responses to random experiments. It explains discrete random variables, their properties, and provides examples, including the calculation of expected values and variances. Additionally, it covers cumulative probability distribution functions and introduces concepts like pooled testing for efficiency in identifying diseases.

Uploaded by

jpc

RANDOM VARIABLES & DISTRIBUTION FUNCTIONS
RANDOM VARIABLES
▪ Random variable: the numerical response of a random experiment, taking different values with certain probabilities.
▪ In a random experiment, the quantities of interest, i.e. the real-valued functions defined on the sample space, are known as random variables.
▪ A random variable assigns a real number to every possible outcome of an experiment.

RANDOM VARIABLE
Example:
▪ Experiment: testing two components (D = defective, N = non-defective).
▪ Sample space: S = {DD, DN, ND, NN}
▪ Let X = number of defective components when two components are tested.
▪ Assign numerical values to the outcomes:

Sample point (Outcome)    Assigned Numerical Value (X)
DD                        2
DN                        1
ND                        1
NN                        0

RANDOM VARIABLES
Examples:
▪ Number of defective items in a random sample of 100
▪ Number of car crashes in Delhi tomorrow
▪ Amount of rainfall in India this monsoon
▪ Waiting time for a credit card authorization
▪ Length of time a tumor patient survives
▪ Starting salary after graduation

DISCRETE RANDOM VARIABLES
▪ Discrete random variables can assume only a finite or countable number of
possible values
▪ Discrete variables produce outcomes that come from a counting process.
▪ Examples: number of classes you are taking, number of earthquakes in
Japan in one year, number of goals in a football match.

EXAMPLES OF DISCRETE RANDOM VARIABLES
Random Experiment                              Random Variable (X)                       Possible Values for the Random Variable
Flip a coin                                    Face of coin showing                      1 if heads; 0 if tails
Roll a die                                     Number of dots showing on top of die      1, 2, 3, 4, 5, 6
Contact five customers                         Number of customers who place an order    0, 1, 2, 3, 4, 5
Operate a health care clinic for one day       Number of patients who arrive             0, 1, 2, 3, …
Offer a customer the choice of two products    Product chosen by customer                0 if none; 1 if choose product A; 2 if choose product B

FORMAL DESCRIPTION OF A DISCRETE R.V. X
Value (x) Probability (p(x))
x1 P(X=x1) = p(x1)
x2 P(X=x2) = p(x2)
⋮ ⋮
Total 1

The probability function above is called the probability mass function (p.m.f.) of
X.
It is also known as the distribution of X.
Values can be any set of numbers, typically integers.

DISCRETE PROBABILITY DISTRIBUTION
▪ The probability distribution is defined by a probability function, denoted by P(x), that provides the probability for each value of the random variable.
▪ Two key properties:
– The probability of each value x lies between 0 and 1: $0 \le P(X = x) \le 1$
– The probabilities sum to 1: $\sum_{x} P(X = x) = 1$

TWO DICE

EXAMPLE
X = the minimum of two fair dice
(If both dice show the same number, take X to be that number.)

▪ What is the distribution/p.m.f. of X?

▪ Given X = 4, what is the probability that the first die shows 6? 5? 4?
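
One way to check an answer to both questions is to enumerate the 36 equally likely outcomes. The sketch below assumes two fair six-sided dice; the names `pmf` and `first_die_given_min` are illustrative and not part of the slides.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

# p.m.f. of X = minimum of the two dice, by counting outcomes.
pmf = {k: Fraction(sum(1 for a, b in outcomes if min(a, b) == k), 36)
       for k in range(1, 7)}

def first_die_given_min(face, m):
    """P(first die = face | X = m), by counting within the event {X = m}."""
    given = [(a, b) for a, b in outcomes if min(a, b) == m]
    return Fraction(sum(1 for a, _ in given if a == face), len(given))

for k, prob in pmf.items():
    print(k, prob)
for face in (6, 5, 4):
    print(face, first_die_given_min(face, 4))
```

Counting gives P(X = k) = (13 − 2k)/36, and the event X = 4 contains five outcomes, three of which have the first die showing 4.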

CUMULATIVE PROBABILITY DISTRIBUTION FUNCTION
▪ A discrete random variable can also be described by its cumulative distribution function, $F(x) = P(X \le x)$.
▪ The distribution function tells us the chance that
➢ The number of accidents is below a certain margin
➢ Rainfall is below a certain amount
➢ Salary is above a certain threshold
➢ A patient survives more than some stipulated time limit
▪ The last two cases look at $1 - F(x) = P(X > x)$, which is called the survival function.

PROBLEM
▪ Let p(1) = 0.1, p(2) = 0.2 and p(5) = 0.7.
▪ What is the distribution function?
• The graph of the distribution function of a discrete random variable looks like a staircase, and hence it is called a step function.
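
A minimal sketch of how the distribution function for this problem could be built from the p.m.f. The helper `make_cdf` is a hypothetical name introduced here, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

def make_cdf(pmf):
    """Return F(x) = P(X <= x) for a discrete p.m.f. given as {value: probability}."""
    points = sorted(pmf.items())

    def F(x):
        # Add up the probability of every value not exceeding x.
        return sum(prob for value, prob in points if value <= x)

    return F

F = make_cdf({1: Fraction(1, 10), 2: Fraction(2, 10), 5: Fraction(7, 10)})
for x in (0.5, 1, 1.5, 2, 4.9, 5, 10):
    print(x, float(F(x)))
```

The function is 0 below 1, jumps to 0.1 at x = 1, to 0.3 at x = 2, and to 1 at x = 5, staying flat between jumps.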

PROBLEM
• I am trying to hit a target, and I keep making
successive independent attempts until I hit the
target for the first time.
• The chance of hitting the target in any trial is ¼.
• Let X = number of trials needed. What is the
distribution of X? What is the distribution
function?
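
A worked sketch of this problem, under the stated assumptions of independent attempts with hit probability 1/4: the first hit occurs on trial k exactly when the first k − 1 attempts miss and the k-th attempt hits.

```latex
\begin{align*}
P(X = k) &= \left(\tfrac{3}{4}\right)^{k-1} \cdot \tfrac{1}{4}, \qquad k = 1, 2, 3, \dots \\
F(k) = P(X \le k) &= 1 - P(\text{first } k \text{ attempts all miss}) = 1 - \left(\tfrac{3}{4}\right)^{k}
\end{align*}
```

This is the geometric distribution with success probability 1/4. For a non-integer x ≥ 1, F(x) equals F at the integer part of x, so the distribution function is again a step function.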

PROPERTIES OF RANDOM VARIABLES
EXPECTATION
▪ The expected value of a random variable is the long-run average result you would see if the experiment were repeated a very large number of times.
▪ The expected value is also referred to as the mean.
▪ It is a weighted average of all possible values of X. The weights are the probabilities.
▪ It is denoted by E(X) and indicates a central location.
▪ It is calculated as $E(X) = \sum_i x_i \, P(X = x_i)$
Example: Find E[X], where X is the outcome when we roll a fair die.
Since $p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = \tfrac{1}{6}$,
$E[X] = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = \tfrac{7}{2}$

EXAMPLES
1. p(1) = 0.1, p(2) = 0.2 and p(5) = 0.7. E(X)?

2. I am trying to hit a target, and I keep making successive independent attempts until I hit the target for the first time. The chance of hitting the target in any trial is ¼. What is the expected number of attempts needed?
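
Both examples can be checked numerically. The sketch below applies the weighted-average definition directly; for the second example it assumes P(X = k) = (3/4)^(k−1) · (1/4) and truncates the infinite sum at a large k, which is an approximation chosen here for illustration.

```python
from fractions import Fraction

# Example 1: finite p.m.f., E(X) = sum of x * p(x).
pmf = {1: Fraction(1, 10), 2: Fraction(2, 10), 5: Fraction(7, 10)}
mean_1 = sum(x * prob for x, prob in pmf.items())

# Example 2: X = number of attempts until the first hit, P(hit) = 1/4.
# Truncating at k = 999 leaves a negligible tail.
p_hit = 0.25
mean_2 = sum(k * (1 - p_hit) ** (k - 1) * p_hit for k in range(1, 1000))

print(mean_1)   # 4
print(mean_2)   # very close to 1 / p_hit = 4
```

In the second example the truncated sum agrees with the closed form E(X) = 1/p for a geometric random variable with success probability p.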

PROBLEM
Three buses are carrying 180 students for a weekend trip. Bus A
carries 40 students, Bus B carries 60 students and Bus C carries 80
students.
(a) A bus is randomly chosen. Let X denote the number of students in
that bus. Write the distribution of X. Hence compute E(X).
(b) A student is randomly chosen. Let Y denote the number of
students in that student’s bus. Do you expect E(Y) to be larger
than E(X) or smaller? Why?
(c) Write the distribution of Y. Compute E(Y).
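
A short sketch of parts (a) and (c), assuming "randomly chosen" means uniformly at random over buses in (a) and over students in (c). Variable names are illustrative.

```python
from fractions import Fraction

bus_sizes = {"A": 40, "B": 60, "C": 80}
total = sum(bus_sizes.values())            # 180 students

# (a) Each bus is chosen with probability 1/3.
e_x = sum(Fraction(1, len(bus_sizes)) * n for n in bus_sizes.values())

# (c) A random student sits in a bus of size n with probability n / 180.
e_y = sum(Fraction(n, total) * n for n in bus_sizes.values())

print(e_x)                  # 60
print(e_y, float(e_y))      # 580/9, about 64.44
```

E(Y) exceeds E(X) because choosing a student rather than a bus gives larger buses more weight: a bus with more students is more likely to contain the chosen student.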

EXPECTATION OF A FUNCTION OF X
If X is a discrete random variable that takes on one of the values $x_i$, $i \ge 1$, with respective probabilities $p(x_i)$, then, for any real-valued function g,

$E[g(X)] = \sum_i g(x_i)\, p(x_i)$

Example: Let X denote a random variable that takes on any of the values −1, 0, and 1 with respective probabilities
P{X = −1} = 0.2, P{X = 0} = 0.5, P{X = 1} = 0.3.
Compute $E[X^2]$.
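
As a worked sketch of this example, take g(x) = x² and weight each value of g by the probability of the corresponding x:

```latex
E[X^2] = (-1)^2 (0.2) + 0^2 (0.5) + 1^2 (0.3) = 0.2 + 0 + 0.3 = 0.5
```

Note that this is not the same as $(E[X])^2 = (0.1)^2 = 0.01$; in general $E[g(X)] \ne g(E[X])$.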

EXAMPLE
Monthly sales of a certain product are believed to follow a certain probability distribution. Suppose the company has a fixed monthly production cost of $8000, and each item generates revenue of $2. Find the expected monthly profit from product sales, h(X).

Number of items (x)    P(x)    xP(x)
5000                   0.2     1000
6000                   0.3     1800
7000                   0.2     1400
8000                   0.2     1600
9000                   0.1      900
Total                          6700

Profit: h(X) = aX − b, where a is the per-unit revenue and b is the production cost.
E[h(X)] = 2 × E(X) − 8000 = 2 × 6700 − 8000 = 5400

VARIANCE AND STANDARD DEVIATION
The variance and standard deviation are both measures of variability. They measure the spread of a random variable.
▪ The variance is denoted by Var(X) or $\sigma^2$.
▪ The variance is calculated as
$\mathrm{Var}(X) = E\big[(X - E(X))^2\big] = \sum_{i=1}^{n} (x_i - E(X))^2 \, P(X = x_i)$
$\qquad\quad\;\; = \sum_{i=1}^{n} x_i^2 \, p(x_i) - (E(X))^2 = E(X^2) - (E(X))^2$
▪ The standard deviation, denoted by SD(X) or $\sigma$, is the square root of the variance.
▪ It indicates whether values are clustered about the mean or widely scattered.

MEANING OF VARIANCE
EXAMPLE: Let X take values 1, 2, and 3 with probability 1/3 each. Let Y take the same values with respective probabilities 0.1, 0.8, and 0.1.
Whose variance is higher? Why?
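
The comparison can be verified with a few lines of code. This is a sketch using the shortcut Var(X) = E(X²) − (E(X))²; the helper name `mean_and_variance` is introduced here for illustration.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return (E[X], Var(X)) using Var(X) = E[X^2] - (E[X])^2."""
    mean = sum(x * prob for x, prob in pmf.items())
    second_moment = sum(x * x * prob for x, prob in pmf.items())
    return mean, second_moment - mean ** 2

pmf_x = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}
pmf_y = {1: Fraction(1, 10), 2: Fraction(8, 10), 3: Fraction(1, 10)}

mean_x, var_x = mean_and_variance(pmf_x)
mean_y, var_y = mean_and_variance(pmf_y)
print(mean_x, var_x)   # 2 2/3
print(mean_y, var_y)   # 2 1/5
```

Both variables have mean 2, but Y puts most of its probability on the mean itself, so its variance (0.2) is smaller than that of X (about 0.67).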

COMPUTER SHIPMENTS AND QUALITY
CheapO Computers promptly shipped two servers to its biggest client. The company
profits $5,000 on each one of these big systems. Executives are horrified to learn that
someone restocked 4 refurbished computers along with 11 new systems in the
warehouse. The guys in shipping randomly selected the systems that were delivered
from the 15 computers in stock.
If the client gets two new servers, CheapO earns $10,000 profit. If the client gets a
refurbished computer, it’s coming back. CheapO loses the $5,000 profit on this system
and must also pay a $500 penalty. That leaves a $4,500 profit. If both servers that were
shipped are refurbished, however, the client will return both and cancel the order.
CheapO will be out of any profit and be left with $1,000 in penalties. What is the
expected value and standard deviation of CheapO’s profit?
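
COMPUTER SHIPMENTS AND QUALITY CASE CONT'D…

Outcome             x          P(X = x)
Both refurbished    -$1,000    C(4, 2) / C(15, 2)
One refurbished     $4,500     C(11, 1) × C(4, 1) / C(15, 2)
Both new            $10,000    C(11, 2) / C(15, 2)

Here C(n, r) denotes the number of ways of choosing r items from n.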
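
A sketch of how the expected value and standard deviation of the profit could be computed from this table, assuming the two shipped systems are drawn without replacement from the 15 in stock. Variable names are illustrative.

```python
from math import comb, sqrt

ways = comb(15, 2)                           # ways to pick 2 of the 15 systems
dist = {
    -1000: comb(4, 2) / ways,                # both refurbished
    4500: comb(11, 1) * comb(4, 1) / ways,   # exactly one refurbished
    10000: comb(11, 2) / ways,               # both new
}

mean = sum(x * prob for x, prob in dist.items())
variance = sum((x - mean) ** 2 * prob for x, prob in dist.items())
sd = sqrt(variance)

print(round(mean, 2), round(sd, 2))
```

The three probabilities are 6/105, 44/105 and 55/105, giving an expected profit of roughly $7,067 with a standard deviation of roughly $3,315.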

ADDITION RULE FOR THE EXPECTED VALUE OF THE SUM OF RANDOM VARIABLES
The mean or expected value of the sum of random variables is the sum of their means or expected values:
$E(X + Y) = E(X) + E(Y)$; $E(X - Y) = E(X) - E(Y)$
The variance of the sum of mutually independent random variables is the sum of their variances:
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$; $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
Otherwise, there will be a cross-term called covariance. We'll discuss that later.

ADDITION RULE OF WEIGHTED SUMS
The expected value of a linear function of random variables is:
$E(aX + bY + c) = aE(X) + bE(Y) + c$
The variance of a linear function of random variables is:
$\mathrm{Var}(aX + bY + c) = \mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
provided X and Y are independent. (The expectation rule holds with or without independence.)

EXAMPLE
A coin is tossed once. P(head) = p.
Let X = 1 if head, 0 if tail. E(X) = ? V(X) = ?

Consider a gambling game: you pay me Rs. 6 to play the game. I toss a coin as
above; you get Rs. 10 if the coin comes up heads, zero if tails.

The amount you win, W, is: W = 10X – 6

A new random variable! What are E(W) and V(W)?
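
A worked sketch of this example, using the definition of expectation for X and the linear-function rules for W = 10X − 6:

```latex
\begin{align*}
E(X) &= 1 \cdot p + 0 \cdot (1 - p) = p \\
V(X) &= E(X^2) - (E(X))^2 = p - p^2 = p(1 - p) \\
E(W) &= E(10X - 6) = 10\,E(X) - 6 = 10p - 6 \\
V(W) &= V(10X - 6) = 10^2\,V(X) = 100\,p(1 - p)
\end{align*}
```

The additive constant −6 shifts the mean but does not affect the variance. The game is fair, E(W) = 0, exactly when p = 0.6.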

CASE: POOLED TESTING
EXAMPLE: GROUP/POOLED TESTING

Consider a disease whose presence can be identified by carrying out a blood test.
Let p denote the probability that a randomly selected individual has the disease. Suppose N
individuals are independently selected for testing. One way to proceed is to carry out a
separate test on each of the N blood samples, which can be expensive if N is very large.
A potentially more economical approach, group testing, was introduced during World War II to
identify syphilitic men among army inductees:
First, take a part of each blood sample, combine n of these specimens, and carry out a single test.
If no one has the disease, the result is negative, and only this single test is sufficient for these n.
If at least one individual is diseased, the test on the combined sample yields a positive result, in
which case individual tests are then carried out for this group.
Carry on this way until you cover all N individuals.

ROW POOLED TESTING
Task: If p = 0.1, N = 1000, and n = 3, what is the expected number of tests using this procedure? What is the expected number when n = 5?
For a single pool of size n = 5:
• If the pooled test is negative (all samples are negative):
– Number of tests = 1 (pooled) + 0 (individual) = 1 test
– Probability = 0.9⁵
• If the pooled test is positive (at least one sample is positive):
– Number of tests = 1 (pooled) + 5 (individual) = 6 tests
– Probability = 1 − 0.9⁵

POOL SIZE OF 5
Expected number of tests for one pool:
E[Tests per pool] = 1 + 5 × (1 − 0.9⁵)
Total number of complete pools = N / n = 1000 / 5 = 200
Expected TOTAL number of tests (all 1000 individuals):
E[Total tests] = 200 × (1 + 5 × (1 − 0.9⁵))
= 200 + 1000 × (1 − 0.9⁵)
≈ 609.51

POOL SIZE OF 3
• If the pooled test is negative (all samples are negative):
– Number of tests = 1 (pooled) + 0 (individual) = 1 test
– Probability = 0.9³
• If the pooled test is positive (at least one sample is positive):
– Number of tests = 1 (pooled) + 3 (individual) = 4 tests
– Probability = 1 − 0.9³

POOL SIZE OF 3
Expected number of tests for one pool:
E[Tests per pool] = 1 + 3 × (1 − 0.9³)
Total number of full pools = ⌊N / n⌋ = ⌊1000 / 3⌋ = 333
One individual is left over and will be tested separately.
Expected TOTAL number of tests:
E[Total tests] = 1 + 333 × (1 + 3 × (1 − 0.9³))
= 334 + 999 × (1 − 0.9³)
≈ 604.73
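
The two totals above can be reproduced with a small function. This is a sketch that follows the convention used on these slides, namely that individuals left over after forming full pools are each tested individually; the function name is hypothetical.

```python
def expected_row_pool_tests(N, n, p):
    """Expected number of tests with pools of size n and prevalence p.

    Assumption for this sketch: individuals left over after forming
    full pools are each tested individually (one test apiece).
    """
    full_pools, leftover = divmod(N, n)
    # One pooled test, plus n individual retests if the pool is positive.
    per_pool = 1 + n * (1 - (1 - p) ** n)
    return full_pools * per_pool + leftover

print(round(expected_row_pool_tests(1000, 5, 0.1), 2))   # 609.51
print(round(expected_row_pool_tests(1000, 3, 0.1), 2))   # 604.73
```

With p = 0.1, pools of 3 need slightly fewer expected tests than pools of 5, and both are well below the 1000 tests required for individual testing.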

CHALLENGE
▪ How many people should we pool together?
▪ Which pooling strategy is best?
▪ How do test errors affect our strategy?

WHAT SHOULD BE THE OPTIMAL VALUE FOR POOL SIZE?
Consider a population of size 10000 and a prevalence rate of p = 1% for COVID. Suppose the available test is 100% accurate.
Task 1: Compute the expected number of tests needed for row-pooled testing as the pool size ranges from 1 to 20. What is the best pool size?

TASK 1

Component                           Expected count                                 Example (N = 10 000, p = 0.01)
Initial pooled testing              $\frac{N}{k}$                                  $\frac{10000}{k}$
Follow-up testing (pool turns +)    $\frac{N}{k} \cdot k\,[1 - (1-p)^k]$           $10000\,[1 - 0.99^k]$

TASK 1
Expected number of tests needed:

$E[\text{Tests}] = \frac{N}{k} + N\left[1 - (1-p)^k\right]$

For N = 10000, p = 0.01:

$E[\text{Tests}] = \frac{10000}{k} + 10000\left(1 - 0.99^k\right)$

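WHEN k ∤ N (RESIDUAL SAMPLES)
▪ When 10000/k is not an integer, we will need one extra pool with the remaining samples.
▪ $E[\text{Tests}] = \left[\frac{N}{k}\right] + 1 + k\left[\frac{N}{k}\right]\left(1 - (1-p)^k\right) + m\left(1 - (1-p)^m\right)$

where [x] is the integer part of the number x, and m = N − k[N/k] is the residual (the size of the extra pool).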
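
A sketch of the pool-size sweep for N = 10000 and p = 0.01, comparing the simple formula (which treats N/k as a real number) with the residual-pool formula. The function and variable names are assumptions made here, and the extra pool is added only when a residual actually exists.

```python
def expected_tests(N, p, k):
    """Expected tests for row pooling with pool size k, with a residual pool.

    Full pools: one pooled test each, plus k retests when the pool is positive.
    If k does not divide N, the m leftover samples form one extra pool.
    """
    full_pools, m = divmod(N, k)
    total = full_pools * (1 + k * (1 - (1 - p) ** k))
    if m > 0:
        total += 1 + m * (1 - (1 - p) ** m)
    return total

N, p = 10_000, 0.01
with_residual = {k: expected_tests(N, p, k) for k in range(1, 21)}
simple = {k: N / k + N * (1 - (1 - p) ** k) for k in range(1, 21)}

best_with_residual = min(with_residual, key=with_residual.get)
best_simple = min(simple, key=simple.get)

for k in range(1, 21):
    print(f"k = {k:2d}: simple {simple[k]:9.2f}   with residual {with_residual[k]:9.2f}")
print("best k (simple formula):", best_simple)
print("best k (with residual pool):", best_with_residual)
```

Under these formulas the simple version is minimised at k = 11 and the residual-pool version at k = 10. The two pool sizes differ by less than one expected test either way, so about 1956 tests are needed instead of 10000.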

GROUP/POOLED TESTING: CROSS POOLING
Again, let p denote the probability that a randomly selected individual has the disease, and suppose N individuals are independently selected for testing.
Place samples in an n × n square tray. First, pool each of the n rows and test the n batches created. Then test the n columns in a similar manner.
If no one has the disease, then the process stops, and the total number of tests needed is 2n.
If some rows and columns come out to be positive, test the individuals lying on the intersections of these rows and columns.
Carry on this way until you cover all N individuals.

CROSS POOLED TESTING
Task: If p = 0.01, N = 10000, and n = 2, what is the expected number of tests using this procedure? What is the expected number when n = 5?
For general n = k,

Component                                             Expected count (symbolic)
Compulsory pooled testing (rows + columns)            $\frac{2N}{k}$
Follow-up individual testing (row & column both +)    $N\left[1 - 2(1-p)^k + (1-p)^{2k-1}\right]$
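
The task can be evaluated directly from the two components in the table. The sketch below assumes N is a multiple of k², so that every tray is full; the function name is hypothetical.

```python
def expected_cross_pool_tests(N, p, k):
    """Expected tests for cross pooling on k x k trays.

    Assumes, for this sketch, that N is a multiple of k * k so that
    every tray is full.
    """
    q = 1 - p
    pooled = 2 * N / k          # k row tests + k column tests per tray of k*k samples
    # An individual is retested when both its row and its column test positive:
    # 1 - P(row clean) - P(column clean) + P(row and column both clean).
    p_retest = 1 - 2 * q ** k + q ** (2 * k - 1)
    return pooled + N * p_retest

for k in (2, 5):
    print(k, round(expected_cross_pool_tests(10_000, 0.01, k), 2))
```

For k = 2 the compulsory pooled tests alone already number 10000, so cross pooling gives no saving; for k = 5 the expected total drops to roughly 4115.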

WHAT IF THE TEST IS IMPERFECT?
Prevalence (probability that an arbitrary individual is infected): p
• Intrinsic test error rates
– False-positive probability: $P_{FP}$
– False-negative probability: $P_{FN}$

$E[\text{Total Tests}] = \frac{N}{k}\left[1 + k\left(\left(1 - (1-p)^k\right)(1 - P_{FN}) + (1-p)^k\,P_{FP}\right)\right]$
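
A sketch of the formula above in code. The error rates used in the first call are illustrative assumptions, not values from the slides, and the function name is hypothetical.

```python
def expected_tests_imperfect(N, p, k, p_fp, p_fn):
    """Expected row-pooling tests when the pooled test can err.

    A pool is retested individually whenever its pooled test reads positive:
    either a truly positive pool is detected (probability 1 - p_fn) or a
    truly negative pool is falsely flagged (probability p_fp).
    """
    clean = (1 - p) ** k                       # P(no one in the pool is infected)
    p_reads_positive = (1 - clean) * (1 - p_fn) + clean * p_fp
    return (N / k) * (1 + k * p_reads_positive)

# Illustrative (assumed) error rates: 2% false positives, 5% false negatives.
print(round(expected_tests_imperfect(10_000, 0.01, 10, p_fp=0.02, p_fn=0.05), 2))
# With a perfect test the result reduces to N/k + N * (1 - (1 - p)**k).
print(round(expected_tests_imperfect(10_000, 0.01, 10, p_fp=0.0, p_fn=0.0), 2))
```

This counts tests only. False negatives reduce the expected number of tests because missed pools are never retested, but that comes at the cost of undetected cases.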
