PROBABILITY DISTRIBUTION
A probability distribution shows the possible outcomes of an experiment and the probability of
each of these outcomes. That is, a probability distribution is a complete list of all possible
values of a random variable and their corresponding probabilities. A probability distribution
can be classified as a discrete or continuous probability distribution according to whether it
assumes a discrete or continuous random variable.
For a discrete random variable, the distribution is described by the probability mass function
(pmf), usually denoted by p(x). If X is a discrete random variable taking at most a countably
infinite number of values x_1, x_2, …, then P(x_i) = P(X = x_i), i = 1, 2, …, is called the
probability mass function of the random variable X. The set of ordered pairs {x_i, P(x_i)},
i = 1, 2, …, gives the probability distribution of the random variable X. The numbers P(x_i),
i = 1, 2, …, must satisfy the following conditions: P(x_i) ≥ 0 for every i, and the P(x_i)
sum to 1.
For a continuous random variable, the distribution is described by the probability density
function (pdf), usually denoted by f(x). A random variable X is said to be a continuous random
variable if there is a non-negative function f(x) such that the probability that X lies in an
interval equals the area under f(x) over that interval.
Discrete probability distribution
A discrete probability distribution is a distribution whose random variable is discrete. It
describes a finite or countably infinite set of possible occurrences, as with discrete “count data.”
Properties (Required Conditions) for a Discrete Probability Distribution
The sum of the probabilities of all the events in the sample space must equal 1,
i.e. \( \sum P(x) = 1 \).
The probability of each event in the sample space must be between 0 and 1 inclusive,
i.e. \( 0 \le P(x) \le 1 \).
Example 1) Consider the possible outcomes for the random experiment of tossing three coins
together once.
Sample space, S = {HHH, THH, HTH, HHT, TTH, THT, HTT,TTT}
Let X be the number of heads that turn up when the three coins are tossed. The possible values
of X are 0, 1, 2 and 3.
\( P(X = 0) = P(TTT) = 1/8 \)
\( P(X = 1) = P(HTT) + P(THT) + P(TTH) = 1/8 + 1/8 + 1/8 = 3/8 \)
\( P(X = 2) = P(HHT) + P(HTH) + P(THH) = 1/8 + 1/8 + 1/8 = 3/8 \)
\( P(X = 3) = P(HHH) = 1/8 \)
The probability distribution of X is therefore:
X      0    1    2    3
P(X)   1/8  3/8  3/8  1/8
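The distribution in Example 1 can be checked by listing the sample space programmatically. The following is a minimal sketch, assuming fair coins so that all eight outcomes are equally likely; the function name `heads_distribution` is illustrative and not part of the original text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_coins=3):
    """Return {number of heads: probability} for n fair coins tossed once."""
    # The sample space S: every sequence of H/T of length n_coins.
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many outcomes give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    total = len(outcomes)
    return {x: Fraction(counts[x], total) for x in sorted(counts)}


dist = heads_distribution(3)
for x, p in dist.items():
    print(x, p)  # prints 0 1/8, 1 3/8, 2 3/8, 3 1/8 on separate lines
print(sum(dist.values()))  # prints 1, as required of a probability distribution
```

Exact fractions are used instead of floating-point numbers so that the results match the hand calculation digit for digit.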
EXPECTED VALUE AND VARIANCE OF A PROBABILITY DISTRIBUTION
Expected Value
The expected value, or mean, of a random variable is a measure of the central location for the
random variable. It is denoted by E(x) or μ. The mathematical expression for the expected
value of a discrete random variable x is as follows:
Expected value of a discrete random variable:
\[ E(x) = \mu = x_1 P(x_1) + x_2 P(x_2) + \dots + x_n P(x_n) \]
or,
\[ E(x) = \sum_{i=1}^{n} x_i P(x_i) \]
where x_1, x_2, …, x_n are the outcomes and P(x_1), P(x_2), …, P(x_n) are the
corresponding probabilities.
The above formula shows that in order to compute the expected value of a discrete random
variable, we must multiply each value of the random variable by the corresponding probability
P(x) and then add the resulting products.
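The multiply-and-add rule just described can be sketched in a few lines. The distribution used here, a fair six-sided die, is a hypothetical illustration and does not come from the text.

```python
from fractions import Fraction


def expected_value(distribution):
    """E(x): multiply each value by its probability, then add the products."""
    return sum(x * p for x, p in distribution.items())


# Hypothetical distribution: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # prints 7/2, i.e. 3.5
```

Note that the expected value (3.5) is not itself a possible outcome of the die; it is a long-run average.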
Variance
While the expected value provides the mean value for the random variable, we often need a
measure of dispersion, or variability, for the random variable just as we need variance in block
5 to summarize the dispersion in a data set. The mathematical expression for the variance of a
discrete random variable is as follows:
Variance of a discrete probability distribution, σ²:
\[ \sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 P(x_i) = \sum_{i=1}^{n} x_i^2 P(x_i) - \mu^2 \]
and the standard deviation is \( \sigma = \sqrt{\sigma^2} \).
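As a rough illustration, the variance can be computed both from its definition (the probability-weighted squared deviation from the mean) and from the equivalent shortcut of subtracting μ² from the weighted sum of squares. The sketch below assumes a hypothetical fair-die distribution; the function name `variance` is illustrative.

```python
from fractions import Fraction
from math import sqrt


def variance(distribution):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in distribution.items())
    # Definition: weighted squared deviations from the mean.
    by_definition = sum((x - mu) ** 2 * p for x, p in distribution.items())
    # Shortcut: weighted sum of squares minus the squared mean.
    shortcut = sum(x ** 2 * p for x, p in distribution.items()) - mu ** 2
    assert by_definition == shortcut  # the two forms agree exactly
    return by_definition


# Hypothetical distribution: a fair six-sided die.
die = {face: Fraction(1, 6) for face in range(1, 7)}
var = variance(die)
sd = sqrt(var)
print(var)           # prints 35/12
print(round(sd, 4))  # prints 1.7078
```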
Example 2) If three fair coins are tossed, find the expected number of heads that will occur
and obtain the variance.
Solution:
Begin by constructing the probability distribution for the number of heads in tossing the three
coins. The probability distribution is constructed below:
No of heads, x 0 1 2 3
Probability, P(x) 1/8 3/8 3/8 1/8
Then,
\( E(x) = \sum_{i=1}^{4} x_i P(x_i) = x_1 P(x_1) + x_2 P(x_2) + x_3 P(x_3) + x_4 P(x_4) \)
\( = 0 \cdot 1/8 + 1 \cdot 3/8 + 2 \cdot 3/8 + 3 \cdot 1/8 \)
\( = 0 + 3/8 + 6/8 + 3/8 = 12/8 = 3/2 = 1.5 \)
The theoretical mean μ = 1.5 implies that if the experiment is repeated a large number of times,
then on average 1.5 heads occur per toss of the three coins.
\( \sigma^2 = \sum_{i=1}^{4} (x_i - \mu)^2 P(x_i) \)
\( = (x_1 - \mu)^2 P(x_1) + (x_2 - \mu)^2 P(x_2) + (x_3 - \mu)^2 P(x_3) + (x_4 - \mu)^2 P(x_4) \)
\( = (0 - 1.5)^2 \cdot 1/8 + (1 - 1.5)^2 \cdot 3/8 + (2 - 1.5)^2 \cdot 3/8 + (3 - 1.5)^2 \cdot 1/8 \)
\( = 2.25/8 + 0.75/8 + 0.75/8 + 2.25/8 = 6/8 \)
\( \sigma^2 = 0.75 \)
Example 3) One thousand tickets are sold at $1 each for a color television valued at $350.
What is the expected value if a person purchases one ticket?
Solution:
The problem can be seen as follows:
When a person purchases one ticket, there are two possible outcomes: losing $1 or gaining $349.
Gain, x   $349     -$1
P(x)      1/1000   999/1000
Hence,
E(x) = $349 · 1/1000 + (-$1) · 999/1000 = -$0.65
Or,
E(x) = expected winnings - $1 = $350 · 1/1000 - $1 = -$0.65
i.e. the average loss is $0.65 for each of the 1000 ticket holders.
Continuous probability distribution
A common continuous probability distribution is the normal probability distribution. Several
mathematicians were instrumental in its development; among them was the eighteenth-century
mathematician and astronomer Karl Gauss. In honor of his work, the normal probability
distribution is often called the Gaussian distribution.
A continuous probability distribution is a probability distribution whose random variable is
continuous. It describes an “unbroken” continuum of possible occurrences. The probability of a
single value is zero, and the probability of an interval is the area bounded by the curve of the
probability density function and that interval on the x-axis. Let a and b be any two values with
a < b. The probability that X assumes a value between a and b is equal to the area under the
curve between a and b; that is,
\[ P(a \le X \le b) = \int_a^b f(x)\,dx. \]
The integration from a to b in the case of the continuous variable is
analogous to the summation of probabilities in the discrete case.
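To make the "probability as area" idea concrete, the area under a density can be approximated numerically. The sketch below uses the trapezoidal rule on a hypothetical density, f(x) = 2x on [0, 1], chosen only because its exact areas are easy to verify by hand; neither the density nor the function names come from the text.

```python
def area_under_pdf(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf from a to b with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for k in range(1, steps):
        total += pdf(a + k * width)
    return total * width


def example_pdf(x):
    # Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere.
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total_area = area_under_pdf(example_pdf, 0.0, 1.0)  # total area, close to 1
interval = area_under_pdf(example_pdf, 0.2, 0.5)    # close to 0.5**2 - 0.2**2 = 0.21
single = area_under_pdf(example_pdf, 0.3, 0.3)      # a single point has zero area
print(round(total_area, 6), round(interval, 6), single)
```

The last call illustrates why the probability of any single value of a continuous variable is zero: an interval of zero width encloses no area.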
6.4 Common Discrete Probability Distributions
6.4.1 Binomial Distribution
The binomial distribution originates from the Bernoulli trial. A Bernoulli trial is an experiment
with only two possible outcomes, “success” or “failure”. For such a trial, a success may be
getting heads with a balanced coin, or it may be passing an examination.
Whenever we face such an experiment, we use the binomial distribution under the assumptions
stated below. Any experiment can also be turned into a Bernoulli trial by defining one or more
possible results in which we are interested as “success” and all other possible results as
“failure”. For instance, when rolling a fair die, a “success” may be defined as “getting an even
number on top” and an odd number as “failure”.
Generally, the sample space in a Bernoulli trial is S = {S, F}, S = Success, F = failure.
Notation: Let the probabilities of success and failure be p and q respectively:
P(success) = P(S) = p and P(failure) = P(F) = q, where q = 1 − p.
Definition: Let X be the number of successes in n repeated Bernoulli trials with probability of
success p on each trial. Then the probability distribution of the discrete random variable X is
called the binomial distribution. A binomial random variable with parameters n and p represents
the number of successes x in n independent trials, when each trial has probability p of success.
If X is a binomial random variable, then for x = 0, 1, 2, …, n
\[ P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} = \frac{n!}{(n - x)!\,x!}\, p^x q^{n - x}, \quad \text{where } q = 1 - p. \]
A binomial experiment is a probability experiment that satisfies the following assumptions.
1. The experiment consists of n identical trials.
2. Each trial has only one of the two possible mutually exclusive outcomes, success or a
failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent.
Mean and Variance of the Binomial Distribution
Definition
The mean, variance and standard deviation of a variable that has the binomial
distribution are found as:
Mean: \( \mu = n p \)
Variance: \( \sigma^2 = n p q \)
Standard deviation: \( \sigma = \sqrt{n p q} \)
Example 4) A fair coin is flipped 3 times, what is the probability of getting exactly two
heads?
Solution:
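Let X be the number of heads, with possible values 0, 1, 2, 3.
P(getting a head) = p = 1/2, q = 1 − p = 1/2, n = 3
\[ P(X = 2) = \binom{3}{2} \left(\tfrac{1}{2}\right)^2 \left(\tfrac{1}{2}\right)^{3 - 2} = 3 \left(\tfrac{1}{2}\right)^3 = \tfrac{3}{8} \]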
Example 5) A new drug is effective 60% of the time. What is the probability that in a random
sample of 4 patients, it will be effective on two of them?
Solution:
This is a binomial experiment, as the assumptions of a binomial experiment are satisfied. Define
“effective” as “success” and “not effective” as “failure”. Then,
p = 0.6, q = 1 - 0.6 = 0.4, n = 4, x = 2
Required: P(2) = ?
\[ P(2) = \frac{4!}{(4 - 2)!\,2!} (0.6)^2 (0.4)^2 = 6 \times 0.0576 = 0.3456 \]
Hence, the drug will be effective on two of a random sample of 4 patients with a probability of
0.3456 (or 34.56%).
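The binomial probabilities in Examples 4 and 5 can be reproduced with a short function. This is a minimal sketch using Python's standard `math.comb` for the binomial coefficient; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


print(binomial_pmf(2, 3, 0.5))            # Example 4: prints 0.375, i.e. 3/8
print(round(binomial_pmf(2, 4, 0.6), 4))  # Example 5: prints 0.3456

# The probabilities over all possible x = 0, ..., n add up to 1.
print(round(sum(binomial_pmf(x, 4, 0.6) for x in range(5)), 10))  # prints 1.0
```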
Example 6) A coin is tossed four times. Find the mean, variance and SD of the number of
heads that will be obtained.
Solution:
Here n = 4, p = 1/2, and q = 1/2
\( \mu = n p = 4 \cdot 1/2 = 2 \)
\( \sigma^2 = n p q = 4 \cdot 1/2 \cdot 1/2 = 1 \)
\( \sigma = \sqrt{\sigma^2} = \sqrt{1} = 1 \)
Example 7) A die is rolled 240 times. Find the mean, variance and standard deviation for the
number of 3's that will be rolled.
Solution:
n = 240, p = 1/6, q = 5/6
\( \mu = n p = 240 (1/6) = 40 \)
\( \sigma^2 = n p q = (240)(1/6)(5/6) \approx 33.33 \)
\( \sigma = \sqrt{33.33} \approx 5.77 \)
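The summary measures in Examples 6 and 7 follow directly from n and p, so they are easy to cross-check. The helper below is a sketch only; `binomial_summary` is a hypothetical name.

```python
from math import sqrt


def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial variable."""
    q = 1 - p
    mean = n * p
    var = n * p * q
    return mean, var, sqrt(var)


print(binomial_summary(4, 0.5))  # Example 6: prints (2.0, 1.0, 1.0)

mean, var, sd = binomial_summary(240, 1 / 6)  # Example 7
print(round(mean, 2), round(var, 2), round(sd, 2))  # prints 40.0 33.33 5.77
```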
6.4.2 Poisson Distribution
The Poisson distribution is a discrete probability distribution used to model rare events. It is
useful when n is large and p is small, and when independent events occur over a period of time.
The Poisson distribution counts the number of successes in a fixed interval of time or within a
specified region.
Examples of random variables that usually obey the Poisson distribution are:
The number of car accidents in a day.
The number of telephone calls arriving over an interval of time.
The number of misprints on a typed page (a group of pages) of a book.
The number of natural disasters such as earthquakes.
The number of suicides reported by a particular city.
The number of customers entering a post office on a given day.
To apply the Poisson distribution, two conditions must be met:
i) The number of successes that occur in any interval is independent of the number that occur
in other non-overlapping intervals.
ii) The probability of a success in an interval is proportional to the size of the interval.
In short, the two important traits of the Poisson distribution are independence and
proportionality.
Let X be the number of occurrences in a Poisson process and λ be the average number of
occurrences of an event in a unit length of interval. The probability function for the Poisson
distribution is:
The Poisson probability function
\[ P(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!} \]
where P(x; λ) is the probability of x occurrences in an interval of time, volume,
area, etc., λ denotes the mean number of occurrences, and e ≈ 2.7183.
Remarks
The Poisson distribution possesses only one parameter, λ.
If X has a Poisson distribution with parameter λ, then E(X) = λ and Var(X) = λ,
i.e. E(X) = Var(X) = λ.
\( \sum_{i=0}^{\infty} P(x_i) = 1 \)
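The remarks above can be checked numerically. The sketch below evaluates the Poisson probability function and confirms that the probabilities sum to 1 and that the mean and variance both equal λ. The value λ = 2.5 is a hypothetical choice, and the infinite sum is truncated at x = 59, which is an approximation (the omitted tail is negligible for this λ).

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(x; lam): probability of exactly x occurrences when the mean is lam."""
    return exp(-lam) * lam ** x / factorial(x)


lam = 2.5           # hypothetical mean number of occurrences
xs = range(0, 60)   # truncation of the infinite range x = 0, 1, 2, ...

total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)

print(round(total, 6), round(mean, 6), round(var, 6))  # prints 1.0 2.5 2.5
```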
Example 8) In a small city, 10 accidents took place in a time of 50 days. Find the probability
that there will be a) two accidents in a day and b) three or more accidents in a
day.
Solution:
On average there are λ = 10/50 = 0.2 accidents per day.
Let X be the random variable, the number of accidents per day:
X ~ Poisson(λ = 0.2), X = 0, 1, 2, …
a) \( P(X = 2) = \frac{e^{-0.2} (0.2)^2}{2!} = 0.0164 \)
b) \( P(X \ge 3) = P(X = 3) + P(X = 4) + P(X = 5) + \dots \)
\( = 1 - [P(X = 0) + P(X = 1) + P(X = 2)] \), since \( \sum_{i=0}^{\infty} P(x_i) = 1 \)
\( = 1 - [0.8187 + 0.1637 + 0.0164] = 0.0012 \)
(Using unrounded terms, the value is about 0.0011.)
Example 9) While investigating the safety of a dangerous intersection, past police records
indicate a mean of five accidents per month. The number of accidents follows a Poisson
distribution. Find the probability in any month of
a) Exactly 3 accidents.
b) Fewer than 2 accidents.
Solution: By assumption the given distribution is a Poisson probability distribution.
Given that λ = 5,
\[ P(x) = \frac{\lambda^x e^{-\lambda}}{x!} \]
a) For x = 3:
\[ P(3) = \frac{5^3 (2.7183)^{-5}}{3!} = \frac{125 \times 0.00674}{6} = 0.1404 \]
b) Fewer than 2 accidents comprise 0 and 1 accident during any month.
\[ P(0) + P(1) = \frac{5^0 (2.7183)^{-5}}{0!} + \frac{5^1 (2.7183)^{-5}}{1!} = 0.0067 + 0.0337 = 0.0404 \]
Remark: Although the above probability was determined by evaluating the probability
function, it is often easier to refer to the table for the Poisson probability distribution.
This table provides probabilities for specific values of x and λ. We have included the
table at the end of this block.
For convenience, in Example 9a, λ = 5 and x = 3. In the first column of the table choose
x = 3 and match it with λ = 5; the intersection of these two numbers gives the
required probability, which is 0.1404.
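A table of this kind can be regenerated rather than looked up. The sketch below builds a small table of Poisson probabilities rounded to four decimals, indexed first by x and then by λ, and adds a cumulative function for "at most x" questions. The function names and the chosen λ columns are illustrative assumptions, not the layout of the printed table.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(x; lam) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x): sum of the probabilities of 0, 1, ..., x occurrences."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


def poisson_table(lams, max_x):
    """Nested dict table[x][lam] of probabilities rounded to 4 decimals."""
    return {
        x: {lam: round(poisson_pmf(x, lam), 4) for lam in lams}
        for x in range(max_x + 1)
    }


table = poisson_table([0.2, 0.4, 5], max_x=5)
print(table[3][5])                  # row x = 3, column lam = 5: prints 0.1404
print(round(poisson_cdf(1, 5), 4))  # P(X <= 1) when lam = 5: prints 0.0404
```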
Example 10) If there are 200 typographical errors randomly distributed in a 500-page
manuscript, find the probability that a given page contains exactly 3 errors.
Solution: First of all, find the mean number of errors per page:
\( \lambda = \frac{200}{500} = 0.4 \)
or 0.4 errors per page.
Since x = 3,
\[ P(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{(2.7183)^{-0.4} (0.4)^3}{3!} = 0.00715 \]
Thus, there is less than a 1% probability that a given page contains exactly 3 errors.
6.5. Continuous Probability Distributions
6.5.1 Normal Distributions
The normal distribution is the most important distribution for describing a continuous random
variable, and it is also used as an approximation to other distributions. Many variables in the
practical world follow this distribution, and hence in many ways it is the cornerstone of modern
statistical theory. It has been noticed that empirical distributions of various types of
observations in the natural and social sciences are often very close to the normal distribution.
In statistical analysis the distribution of observations is frequently assumed to be approximately
normal. In statistical estimation and testing of hypotheses the normal distribution plays an
important role.
The graph of the normal distribution is known as the normal curve, which is bell-shaped.
SOME PROPERTIES OF THE NORMAL CURVE
The following are the important properties of the normal curve:
1. The normal curve is “bell-shaped” and symmetrical about the mean. The property of
symmetry can be shown using the pdf as \( f(\mu + c) = f(\mu - c) \), which implies
that \( P(X < \mu) = P(X > \mu) = 0.5 \).
Since this is the property of the median, it follows that, for the normal distribution,
Mean = Median = Mode.
2. The height of the normal curve is at its maximum when X = mean (μ), which means,
again, Mean = Median = Mode.
3. The first and the third quartiles are equidistant from the median,
i.e., \( Q_3 - Q_2 = Q_2 - Q_1 \), or \( Q_2 = \frac{Q_1 + Q_3}{2} \).
4. The Probability that a random variable will have a value between any two points is equal to
the area under the curve between those points.
5. Total area under the standard normal curve is equal to 1 or 100%
6. The curve is continuous and never touches the x-axis, i.e. it is asymptotic to the x-axis.
7. The standard normal curve is symmetric about 0.
8. Most of the area under the standard normal curve lies between z= -3 and z=3.
9. The curve is Uni-modal
10. The mathematical equation of the normal probability distribution is defined by the
probability density function
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \]
where
μ = mean, π ≈ 3.14159
σ = standard deviation, e ≈ 2.7183
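The density in property 10 is straightforward to evaluate, which makes the symmetry and peak properties listed above easy to check. This is a sketch under stated assumptions: the mean 100 and standard deviation 15 are hypothetical values, and `normal_pdf` is an illustrative name.

```python
from math import exp, isclose, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


mu, sigma = 100.0, 15.0  # hypothetical parameters
c = 7.5

# Property 1: the curve is symmetric about the mean.
symmetric = isclose(normal_pdf(mu + c, mu, sigma), normal_pdf(mu - c, mu, sigma))
# Property 2: the curve is highest at the mean.
peak_at_mean = normal_pdf(mu, mu, sigma) > normal_pdf(mu + 0.1, mu, sigma)
print(symmetric, peak_at_mean)  # prints True True

# Height of the standard normal curve (mu = 0, sigma = 1) at its peak.
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # prints 0.3989
```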
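Given a normally distributed random variable X with mean μ and standard deviation σ,
\[ P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) \]
\[ P(X < a) = P\left(\frac{X - \mu}{\sigma} < \frac{a - \mu}{\sigma}\right) \]
But \( \frac{X - \mu}{\sigma} = Z \) is the standard normal random variable, so
\( P(X < a) = P\left(Z < \frac{a - \mu}{\sigma}\right) \).
Note: i) P(a < X < b) = P(a ≤ X < b)
= P(a < X ≤ b)
= P(a ≤ X ≤ b)
ii) P(−∞ < Z < ∞) = 1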
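The standardization step can be sketched in code: convert the interval endpoints to z-values, then take the difference of standard normal cumulative probabilities. The cumulative probability is computed here from the error function `math.erf`, which is a substitute for the printed table, and the numbers in the usage lines (mean 50, standard deviation 10) are hypothetical.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z < z) for the standard normal variable Z, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_probability(a, b, mu, sigma):
    """P(a < X < b) for a normal X, computed by standardizing a and b."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return standard_normal_cdf(z_b) - standard_normal_cdf(z_a)


# Hypothetical variable with mean 50 and standard deviation 10:
# the probability of falling within one standard deviation of the mean.
print(round(normal_probability(40, 60, 50, 10), 4))  # prints 0.6827
print(standard_normal_cdf(0.0))                      # prints 0.5
```

Because a single point carries no probability for a continuous variable, the same function answers the question whether the inequalities are strict or not.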
Example 11) Find the probabilities that a random variable having the standard
normal distribution will take on a value
a) Less than 1.72; b) Less than -0.88;
c) Between 1.30 and 1.75; d) Between -0.25 and 0.45.
Solution: By using the normal table,
a) P(Z < 1.72) = 0.5000 + 0.4573 = 0.9573
b) P(Z < -0.88) = 0.5000 − P(0 < Z < 0.88) = 0.5000 − 0.3106 = 0.1894
c) P(1.30 < Z < 1.75) = P(0 < Z < 1.75) − P(0 < Z < 1.30) = 0.4599 − 0.4032 = 0.0567
d) P(-0.25 < Z < 0.45) = P(-0.25 < Z < 0) + P(0 < Z < 0.45) = P(0 < Z < 0.25) + P(0 < Z < 0.45)
= 0.0987 + 0.1736 = 0.2723
Example 12) Find the area under the normal curve between z=0 and z=2.34
Solution:
[Figure: the standard normal curve with the area between z = 0 and z = 2.34 shaded]
From the table, the intersection of z = 2.3 with 0.04 gives 0.4904 or
49.04%, which is the required area.
Example 13) Find the area under the normal distribution curve between z = -1.93 and z = 2.35
Solution: For ease of reference, draw the normal curve and locate the two z-scores.
[Figure: the normal curve with the area between z = -1.93 and z = 2.35 shaded]
The total area (the shaded region) is the area between -1.93 and 0 plus the
area between 0 and 2.35.
Hence, from the normal distribution table,
Area = 0.4732 + 0.4906 = 0.9638 or 96.38%. Note that this is equivalent to saying that the
probability of the z-value lying between z = -1.93 and z = 2.35 is 96.38%. This can also be
written as:
P(-1.93 < z < 2.35) = 0.9638
Example 14) Find the probability that the z-value of a normally distributed variable lies to the
left of 1.65
Solution
The probability that the z-value lies to the left of 1.65 is equivalent to
the area under the standard normal curve to the left of 1.65.
[Figure: the standard normal curve with the area to the left of z = 1.65 shaded]
Hence, total area = area to the left of 0 plus area between 0 and 1.65:
P(z < 1.65) = 0.5000 + 0.4505 = 0.9505 or 95.05%,
which is the required probability.
Example 15) Find P(z > 1.91)
Solution
P(z > 1.91) = area to the right of 0 minus the area between 0 and 1.91,
i.e.
P(z > 1.91) = P(z > 0) − P(0 < z < 1.91)
= 0.5000 − 0.4719
= 0.0281 or 2.81%
[Figure: the standard normal curve with the area to the right of z = 1.91 shaded]
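The table lookups used in the z-value examples above can be cross-checked with Python's standard `statistics.NormalDist`. The helper `area_0_to` is a hypothetical name that mimics a table listing the area between 0 and z; this is a sketch for verification, not a replacement for the table the text relies on.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def area_0_to(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return Z.cdf(z) - 0.5


print(round(area_0_to(2.34), 4))             # area between 0 and 2.34: prints 0.4904
print(round(Z.cdf(2.35) - Z.cdf(-1.93), 4))  # P(-1.93 < z < 2.35): prints 0.9638
print(round(Z.cdf(1.65), 4))                 # P(z < 1.65): prints 0.9505
print(round(1 - Z.cdf(1.91), 4))             # P(z > 1.91): prints 0.0281
```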
APPLICATIONS OF THE NORMAL DISTRIBUTION
The area under the normal curve is used to solve practical application problems such as finding
probabilities or percentages of values. In order to solve such problems you need only transform
the values of the variable into the z values and read the standard normal distribution table.
Example 16) The scores for an IQ test are normally distributed with a mean of 100 and a
standard deviation of 15. Find the percentage of IQ scores that will fall below
112.
Solution
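Step 1: Draw a figure and represent the area.
[Figure: the normal curve with the area below x = 112 (z = 0.8) shaded; the mean is at x = 100 (z = 0)]
Step 2: Find the z-value corresponding to an IQ score of 112.
\[ Z = \frac{x - \mu}{\sigma} = \frac{112 - 100}{15} = 0.8 \]
Step 3: From the table,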
P(z < 0.8) = P(z < 0) + P(0 < z < 0.8) = 0.5000 + 0.2881 = 0.7881
Hence, 78.81% of the IQ scores fall below 112.
Example 17) The monthly salaries of 2000 workers are normally distributed with a mean of
birr 550 and a standard deviation of birr 80. Find the number of workers whose monthly
salaries are
a) Between birr 600 and 700
b) Less than birr 700.
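Solution: The z-values corresponding to 600 and 700 are
\[ Z = \frac{600 - 550}{80} = 0.625 \]
\[ Z = \frac{700 - 550}{80} = 1.875 \]
[Figure: the normal curve with x = 550, 600, 700 marked, corresponding to z = 0, 0.625, 1.875]
a) Reading the table at z = 0.63 and z = 1.88, P(0.625 < Z < 1.875) ≈ 0.4699 − 0.2357 = 0.2342.
Hence, 23.42% × 2000 = 468.4, so approximately 468 of the workers earn between birr 600 and 700.
b) Reading the table at z = 1.88, P(Z < 1.875) ≈ 0.5000 + 0.4699 = 0.9699.
Hence, 96.99% × 2000 = 1939.8.
Approximately 1940 of the workers earn a monthly salary of less than birr 700.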
Example 18) A college desires to accept only the top 10% of all graduating seniors on the
basis of the results on a national placement test. The test has a mean of 500 and
a standard deviation of 100. Find the cut-off score for the exam.
Solution:
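[Figure: the normal curve with the upper 10% of the area shaded to the right of the cut-off x; the mean is at x = 500 (z = 0)]
We solve the problem backward.
We need to determine the point on the axis that cuts off the upper 10% of the area.
Let it be denoted by x, with corresponding z-value z. The area between 0 and z is
P(0 < Z < z) = 0.5000 – 0.1000 = 0.4000
From the table, the z-value that corresponds to the area 0.4000 is approximately 1.28.
Then, \( \frac{x - 500}{100} = 1.28 \), so \( x = 500 + 1.28 \times 100 = 628 \).
Hence the score 628 should be used as the cut-off score. Any student scoring below 628 should
not be admitted.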
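Working backward from an area to a score is an inverse lookup, which Python's standard `statistics.NormalDist.inv_cdf` performs directly. The sketch below reproduces the cut-off calculation; it uses the unrounded z-value, so the result differs slightly from the table-based 628.

```python
from statistics import NormalDist

# z-value with 90% of the area to its left (so 10% lies above it).
z_cut = NormalDist().inv_cdf(0.90)
print(round(z_cut, 2))  # prints 1.28, matching the table lookup

# Convert the z-value back to a test score: x = mean + z * standard deviation.
cutoff = 500 + z_cut * 100
print(round(cutoff, 1))  # prints 628.2
```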