Statistical and Mathematical
Methods for Data Analysis
Dr. Syed Faisal Bukhari
Associate Professor
Department of Data Science
Faculty of Computing and Information Technology
University of the Punjab
Dr. Faisal Bukhari, Department of Data Science, PU, Lahore 1
Textbooks
Probability & Statistics for Engineers & Scientists,
Ninth Edition, Ronald E. Walpole, Raymond H. Myers
Elementary Statistics: Picturing the World, 6th
Edition, Ron Larson and Betsy Farber
Elementary Statistics, 13th Edition, Mario F. Triola
Reference books
Probability and Statistical Inference, Ninth Edition,
Robert V. Hogg, Elliot A. Tanis, Dale L. Zimmerman
Probability Demystified, Allan G. Bluman
Schaum's Outline of Probability, Second Edition,
Seymour Lipschutz, Marc Lipson
Python for Probability, Statistics, and Machine Learning, José
Unpingco
Practical Statistics for Data Scientists: 50 Essential Concepts,
Peter Bruce and Andrew Bruce
Think Stats: Probability and Statistics for Programmers, Allen
Downey
References
Readings for these lecture notes:
Probability & Statistics for Engineers & Scientists, Ninth edition,
Ronald E. Walpole, Raymond H. Myers
Probability Demystified, Allan G. Bluman
Elementary Statistics: Picturing the World, 6th Edition, Ron Larson
and Betsy Farber
https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/conditional-probability-definition-examples/#:~:text=Conditional%20probability%20is%20the%20probability,probability%20of%200.5%20(50%25).
https://en.wikipedia.org/wiki/Contingency_table#:~:text=In%20statistics%2C%20a%20contingency%20table,%2C%20engineering%2C%20and%20scientific%20research.
These notes contain material from the above resources.
Independent and Dependent Events [1]
Two events A and B are independent if and only if P(B|A)
= P(B) or P(A|B) = P(A), assuming the existence of the
conditional probabilities. Otherwise, A and B are
dependent.
OR
Two events, A and B, are said to be independent if the
fact that event A occurs does not affect the probability
that event B occurs.
The definitions above rely on conditional probability. A
conditional probability is the probability of an event
occurring, given that another event has already occurred.
The conditional probability of event B occurring, given
that event A has occurred, is denoted by P(B|A) and is
read as “probability of B, given A.”
Independent and Dependent Events [2]
Example 1: If a coin is tossed and then a die is rolled,
the outcome of the coin in no way affects or
changes the probability of the outcome of the die.
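As a quick check of the definition, the coin-and-die experiment can be enumerated. The sketch below is illustrative only: the helper `prob` and the event functions are names chosen here, and the outcomes are assumed to be equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair; all 12 outcomes are equally likely.
sample_space = list(product("HT", range(1, 7)))

def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))

def is_tail(outcome):
    return outcome[0] == "T"

def is_five(outcome):
    return outcome[1] == 5

p_five = prob(is_five)
p_tail_and_five = prob(lambda o: is_tail(o) and is_five(o))
p_five_given_tail = p_tail_and_five / prob(is_tail)

# Independence: knowing the coin shows a tail leaves P(5) unchanged.
print(p_five, p_five_given_tail)  # 1/6 1/6
```

Because P(5 | tail) equals P(5), the two events satisfy the definition of independence.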
Independent and Dependent Events [3]
Example 2: Selecting a card from a deck, replacing it,
and then selecting a second card from the same deck. The
outcome of the first card, as long as it is replaced,
has no effect on the probability of the outcome of
the second card.
Independent and Dependent Events [4]
Two events A and B are independent if and only if
P(B|A) = P(B) or P(A|B) = P(A), assuming the existence
of the conditional probabilities. Otherwise, A and B are
dependent.
OR
When the occurrence of the first event in some way
changes the probability of the occurrence of the
second event, the two events are said to be
dependent.
Independent and Dependent Events [5]
Example 1: Suppose a card is selected from a deck
and not replaced, and a second card is selected. In
this case, the probability of selecting any specific
card on the first draw is 1/52, but since this card is not
replaced, the probability of selecting any other
specific card on the second draw is 1/51, since there
are only 51 cards left.
Independent and Dependent Events [6]
Example 2: Drawing a ball from an urn, not replacing
it, and then drawing a second ball.
Example: The table shows the results of a study in
which researchers examined a child’s IQ and the
presence of a specific gene in the child. Find the
probability that a child has a high IQ, given that the
child has the gene.
Solution: Let A be the event that the child has a high IQ,
and let B be the event that the child has the gene.
P(A|B) = P(A and B)/P(B)
= n(child has a high IQ and has the gene)/n(child has the gene)
= 33/72 ≈ 0.458
First Multiplication Rule [1]
Before explaining the first multiplication rule,
consider the example of tossing two coins. The
sample space is {HH, HT, TH, TT}. From classical
probability theory, it can be determined that the
probability of getting two heads is 1/4.
However, there is another way to determine the
probability of getting two heads. In this case, the
probability of getting a head on the first toss is 1/2,
and the probability of getting a head on the second
toss is also 1/2.
First Multiplication Rule [2]
So the probability of getting two heads can be
determined by multiplying 1/2 × 1/2 = 1/4.
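The two routes to the same answer can be compared directly. The following is a minimal sketch, assuming a fair coin; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Classical approach: list the sample space and count favourable outcomes.
tosses = ["".join(t) for t in product("HT", repeat=2)]  # HH, HT, TH, TT
p_classical = Fraction(tosses.count("HH"), len(tosses))

# Multiplication approach: P(head on 1st toss) x P(head on 2nd toss).
p_head = Fraction(1, 2)
p_multiplied = p_head * p_head

print(tosses, p_classical, p_multiplied)  # ['HH', 'HT', 'TH', 'TT'] 1/4 1/4
```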
Multiplication Rule I [1]
Multiplication Rule I: For two independent events A
and B,
P(A and B) = P(A) × P(B).
In other words, when two independent events occur
in sequence, the probability that both events will
occur can be found by multiplying the probabilities of
each individual event.
The key word is “and”: it means that both events
occur in sequence, so their probabilities are multiplied.
Multiplication Rule I [2]
Example: A coin is tossed and a die is rolled. Find the
probability of getting a tail on the coin and a 5 on
the die.
Solution:
Let A be the event of getting a tail on the coin.
P(A) = 1/2 = 0.5 (or 50%)
Let B be the event of getting a 5 on the die.
P(B) = 1/6 ≈ 0.1667 (or 16.67%)
Since A and B are independent events,
P(A and B) = P(A) × P(B)
= 1/2 × 1/6 = 1/12
≈ 0.0833 (or 8.33%)
Multiplication Rule I [3]
The previous example can also be solved using classical
probability. Recall that the sample space for tossing a
coin and rolling a die is
S ={H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
n(S) = 12
Let A be the event of getting a “T5”
A = {T5}
n(A) = 1
P(A) = 1/12
≈ 0.0833 (or 8.33%)
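A third way to sanity-check the value 1/12 is a simulation. This is only a sketch under stated assumptions: a fair coin, a fair die, and a function name (`estimate_tail_and_five`) invented here. The estimate varies with the seed and the number of trials, so it should land near 0.0833 rather than match it exactly.

```python
import random

def estimate_tail_and_five(trials, seed=0):
    """Estimate P(tail and 5) by simulating independent coin/die pairs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        coin = rng.choice("HT")
        die = rng.randint(1, 6)
        if coin == "T" and die == 5:
            hits += 1
    return hits / trials

estimate = estimate_tail_and_five(200_000)
print(estimate, 1 / 12)  # the estimate should be close to 0.0833
```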
Multiplication Rule I [4]
Example: An urn contains 2 red balls, 3 green balls,
and 5 blue balls. A ball is selected at random and its
color is noted. Then it is replaced and another ball is
selected and its color is noted. Find the probability of
each of these:
a. Selecting 2 blue balls
b. Selecting a blue ball and then a red ball
c. Selecting a green ball and then a blue ball
Solution
Let R be the event of getting a red ball.
Let G be the event of getting a green ball.
Let B be the event of getting a blue ball.
P(R) = 2/10, P(G) = 3/10, P(B) = 5/10
Since the first ball is replaced, the two draws are independent, so
a. P(B and B) = P(BB) = P(B) × P(B) = 5/10 × 5/10 = 1/4
= 0.25 (or 25%)
b. P(B and R) = P(B) × P(R) = 5/10 × 2/10 = 1/10
= 0.10 (or 10%)
c. P(G and B) = P(G) × P(B) = 3/10 × 5/10 = 3/20
= 0.15 (or 15%)
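The three parts follow the same pattern, so they can be sketched with one small helper. The function name `p_sequence_with_replacement` and the dictionary layout are illustrative choices, not part of the lecture.

```python
from fractions import Fraction

urn = {"red": 2, "green": 3, "blue": 5}
total = sum(urn.values())

def p_sequence_with_replacement(colors):
    """Multiply the single-draw probability of each color, in order.

    Valid because each ball is replaced, so the draws are independent.
    """
    result = Fraction(1)
    for color in colors:
        result *= Fraction(urn[color], total)
    return result

p_a = p_sequence_with_replacement(["blue", "blue"])
p_b = p_sequence_with_replacement(["blue", "red"])
p_c = p_sequence_with_replacement(["green", "blue"])
print(p_a, p_b, p_c)  # 1/4 1/10 3/20
```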
Multiplication Rule I [5]
Example: A die is tossed 3 times. Find the probability
of getting three 6s.
Solution
Let A be the event of getting a 6 on a single roll.
P(A) = 1/6
Since the rolls are independent,
P(A and A and A) = P(A) × P(A) × P(A)
= 1/6 × 1/6 × 1/6
= 1/216
≈ 0.0046 (or 0.46%)
or, in shorthand, P(AAA) = 1/216.
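Multiplication Rule I extends to any number of independent rolls. A minimal sketch, assuming a fair die; `p_all_sixes` is a name introduced here for illustration.

```python
from fractions import Fraction

def p_all_sixes(rolls):
    """P(every one of `rolls` independent rolls of a fair die shows a 6)."""
    return Fraction(1, 6) ** rolls

p = p_all_sixes(3)
print(p, f"{float(p):.4%}")  # 1/216 0.4630%
```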
Multiplication Rule I [6]
Example: It is known that 66% of the students at a
large college favor building a new fitness center. If
two students are selected at random, find the
probability that both of them favor the building of a
new fitness center.
Solution
Let F be the event that a student favors the building of
a new fitness center.
P(F) = 0.66
Because the college is large, the two selections can be
treated as independent, so
P(F and F) or P(FF) = (0.66)(0.66)
= 0.4356 (or 43.56%)
Multiplication Rule II [1]
When two sequential events are dependent, a
slight variation of the multiplication rule is used to
find the probability of both events occurring.
For example, when a card is selected from an
ordinary deck of 52 cards, the probability of getting
a specific card is 1/52. If that card is not replaced, the
probability of getting another specific card on the
second draw is 1/51, since 51 cards remain.
Example: Two cards are selected from a deck and the
first card is not replaced. Find the probability of
getting two kings.
Solution
P(two kings) = 4/52 × 3/51
= 12/2652
= 1/221
≈ 0.0045 (or 0.45%)
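The product 4/52 × 3/51 can be cross-checked by listing every ordered pair of distinct cards. The sketch below assumes a standard 52-card deck; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for rank in ranks for suit in suits]

# Each ordered pair of distinct cards is one way to draw two cards
# without replacement.
ordered_draws = list(permutations(deck, 2))
two_kings = [d for d in ordered_draws if d[0][0] == "K" and d[1][0] == "K"]

p_enumerated = Fraction(len(two_kings), len(ordered_draws))
p_rule = Fraction(4, 52) * Fraction(3, 51)
print(len(two_kings), len(ordered_draws), p_enumerated, p_rule)
# 12 2652 1/221 1/221
```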
Multiplication Rule II [2]
When the two events A and B are dependent, the
probability that the second event B occurs after
the first event A has already occurred is written as
P(B|A).
This does not mean that B is divided by A; rather, it
means and is read as ‘‘the probability that event B
occurs given that event A has already occurred.’’
P(B|A) also means the conditional probability that
event B occurs given event A has occurred.
Multiplication Rule II [3]
The probability of an event B occurring when it is
known that some event A has occurred is called a
conditional probability and is denoted by P(B|A).
The symbol P(B|A) is usually read “the probability
that B occurs given that A occurs”
OR
simply “the probability of B, given A.”
Multiplication Rule II [4]:
When two events are dependent, the probability of
both events occurring is P(A and B) = P(A) × P(B|A).
Example: A box contains 24 toasters, 3 of which are
defective. If two toasters are selected and tested,
find the probability that both are defective (assume
toasters are not replaced).
Solution
Let D1 be the event that the first toaster is defective.
Let D2 be the event that the second toaster is defective.
P(D1 and D2) = P(D1) × P(D2|D1)
= 3/24 × 2/23
= 1/8 × 2/23
= 1/92
≈ 0.0109 (or 1.09%)
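The same calculation generalizes to any number of draws without replacement. A minimal sketch; `p_all_defective` and its parameter names are assumptions made here for illustration.

```python
from fractions import Fraction

def p_all_defective(total, defective, draws):
    """P(every drawn item is defective) when drawing without replacement."""
    p = Fraction(1)
    for k in range(draws):
        # After k defective items are removed, both counts shrink by k.
        p *= Fraction(defective - k, total - k)
    return p

p_two = p_all_defective(total=24, defective=3, draws=2)
print(p_two, round(float(p_two), 4))  # 1/92 0.0109
```

Each factor in the loop is a conditional probability: the chance that the next toaster is defective given that all earlier ones were.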
Multiplication Rule II [6]:
Example: Two cards are drawn without replacement
from a deck of 52 cards. Find the probability that
both are queens.
Solution
Let Q1 be the event that the first card is a queen.
Let Q2 be the event that the second card is a queen.
P(Q1 and Q2) = P(Q1) × P(Q2|Q1)
= 4/52 × 3/51
= 1/221
≈ 0.0045 (or 0.45%)
Multiplication Rule II [7]:
Example: A box contains 3 orange balls, 3 yellow
balls, and 2 white balls. Three balls are selected
without replacement. Find the probability of
selecting 2 yellow balls and then a white ball, in that order.
Solution
Orange balls   Yellow balls   White balls   Total balls
3              3              2             8
Let Y1 be the event that the first ball is yellow.
Let Y2 be the event that the second ball is yellow.
Let W3 be the event that the third ball is white.
P(Y1 and Y2 and W3) or P(Y1Y2W3) = P(Y1) × P(Y2|Y1) × P(W3|Y1 and Y2)
= 3/8 × 2/7 × 2/6
= 12/336 = 1/28
≈ 0.0357 (or 3.57%)
Note: The key word for the multiplication rule is “and”. It means to multiply.
Multiplication Rule II [8]:
Example: A box contains 3 orange balls, 3 yellow
balls, and 2 white balls. Three balls are selected
without replacement. Find the probability of
selecting a white ball, a yellow ball, and a white ball, in that order.
Solution
Orange balls   Yellow balls   White balls   Total balls
3              3              2             8
Let W1 be the event that the first ball is white.
Let Y2 be the event that the second ball is yellow.
Let W3 be the event that the third ball is white.
P(W1 and Y2 and W3) or P(W1Y2W3) = P(W1) × P(Y2|W1) × P(W3|W1 and Y2)
= 2/8 × 3/7 × 1/6
= 6/336 = 1/56
≈ 0.0179 (or 1.79%)
Note: The key word for the multiplication rule is “and”. It means to multiply.
Multiplication Rule II [9]:
Example: A box contains 3 orange balls, 3 yellow
balls, and 2 white balls. Three balls are selected
without replacement. Find the probability of
selecting a white ball and then 2 yellow balls, in that order.
Solution
Orange balls   Yellow balls   White balls   Total balls
3              3              2             8
Let W1 be the event that the first ball is white.
Let Y2 be the event that the second ball is yellow.
Let Y3 be the event that the third ball is yellow.
P(W1 and Y2 and Y3) or P(W1Y2Y3) = P(W1) × P(Y2|W1) × P(Y3|W1 and Y2)
= 2/8 × 3/7 × 2/6
= 12/336 = 1/28
≈ 0.0357 (or 3.57%)
Note: The key word for the multiplication rule is “and”.
It means to multiply.
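The three box examples differ only in the order of the colors, so one helper can reproduce all of them. This is a sketch, not the lecture's own method: `p_ordered_draws` is a hypothetical name, and it treats each question as asking for a specific order, which is how the solutions above compute the answers.

```python
from fractions import Fraction

def p_ordered_draws(box, sequence):
    """P(drawing the colors in `sequence`, in that order, without replacement)."""
    remaining = dict(box)  # copy so the caller's counts are not modified
    p = Fraction(1)
    for color in sequence:
        total_left = sum(remaining.values())
        p *= Fraction(remaining[color], total_left)
        remaining[color] -= 1  # the drawn ball is not put back
    return p

box = {"orange": 3, "yellow": 3, "white": 2}
p_yyw = p_ordered_draws(box, ["yellow", "yellow", "white"])
p_wyw = p_ordered_draws(box, ["white", "yellow", "white"])
p_wyy = p_ordered_draws(box, ["white", "yellow", "yellow"])
print(p_yyw, p_wyw, p_wyy)  # 1/28 1/56 1/28

# If order did not matter, the three arrangements of two yellows and one
# white would be added together.
p_ywy = p_ordered_draws(box, ["yellow", "white", "yellow"])
p_any_order = p_yyw + p_ywy + p_wyy
print(p_any_order)  # 3/28
```

The last two lines show why the wording matters: "2 yellow balls and a white ball" in any order is three times as likely as the specific sequence yellow, yellow, white.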
Conditional Probability [1]
Previously, conditional probability was used to find
the probability of sequential events occurring when
they were dependent.
Recall that P(B|A) means the probability of event
B occurring given that event A has already
occurred.
Another situation where conditional probability
can be used is when additional information about
an event is known.
Sometimes it might be known that some
outcomes in the sample space have occurred or
that some outcomes cannot occur.
Conditional Probability [2]
When a condition is imposed on an experiment, or is known
to hold, the probability of an event may change.
Example: A die is rolled; find the probability of getting a
4 if it is known that an even number occurred when
the die was rolled.
Alternative Approach: Conditional
Probability [1]
Solution:
If it is known that an even number has occurred, the
sample space is reduced to
S' = {2, 4, 6}
n(S') = 3
Let A be the event of getting a 4.
A = {4}
n(A) = 1
P(A | even number) = n(A)/n(S') = 1/3 ≈ 0.3333 (or 33.33%)
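The reduced-sample-space idea amounts to filtering the outcomes before counting. A minimal sketch, assuming a fair six-sided die; the variable names are illustrative.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]

# Keep only the outcomes consistent with the known condition (even number).
reduced = [x for x in sample_space if x % 2 == 0]
favourable = [x for x in reduced if x == 4]

p_four_given_even = Fraction(len(favourable), len(reduced))
print(reduced, p_four_given_even)  # [2, 4, 6] 1/3
```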
Sample space of two dice using
table
A table can be used for the sample space when two
dice are rolled.
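One way to produce such a table is sketched below. The layout (rows for the first die, columns for the second) is an assumption made here, since the slide's own table is not reproduced in these notes.

```python
from itertools import product

faces = range(1, 7)
sample_space = list(product(faces, repeat=2))  # all ordered pairs (first, second)

# One row per value of the first die, one column per value of the second.
rows = [" ".join(f"({a}, {b})" for b in faces) for a in faces]
print("\n".join(rows))
print(len(sample_space))  # 36
```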
Alternative Approach: Conditional
Probability [2]
Example: Two dice are rolled. Find the probability of
getting a sum of 3 if it is known that the sum of the
spots on the dice was less than six.
Solution
Reduced sample space S' = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1),
(2, 2), (2, 3), (3, 1), (3, 2), (4, 1)}
n(S') = 10
Let A be the event of getting a sum of 3.
A = {(1, 2), (2, 1)}, n(A) = 2
P(sum of 3 | sum less than 6) = n(A)/n(S') = 2/10
= 1/5 = 0.20 (or 20%)
Alternative Approach: Conditional
Probability [3]
The two previous examples of conditional probability
were solved using classical probability and reduced
sample spaces; however, they can be solved by using
the following formula for conditional probability.
Alternative Approach: Conditional
Probability [4]
The conditional probability of event A, given that event B
has occurred, is
P(A|B) = P(A and B)/P(B), provided P(B) > 0.
P(A and B) means the probability of the outcomes
that events A and B have in common.
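For equally likely outcomes, the formula can be sketched as a small function over sets. The name `conditional_probability` and its signature are illustrative assumptions, not a library API.

```python
from fractions import Fraction

def conditional_probability(a, b, sample_space):
    """P(A|B) = P(A and B) / P(B) for equally likely outcomes.

    `a`, `b`, and `sample_space` are sets of outcomes.
    """
    n = len(sample_space)
    p_b = Fraction(len(b & sample_space), n)
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    # "A and B" is the set of outcomes the two events have in common.
    p_a_and_b = Fraction(len(a & b & sample_space), n)
    return p_a_and_b / p_b

die = {1, 2, 3, 4, 5, 6}
p = conditional_probability({4}, {2, 4, 6}, die)
print(p)  # 1/3
```

The usage line repeats the earlier die example (a 4, given an even number) and gives the same 1/3 as the reduced sample space.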
Conditional Probability without
reducing the sample space [1]
Example: A die is rolled; find the probability of getting a
4, if it is known that an even number occurred when
the die was rolled.
Solution
S = {1, 2, 3, 4, 5, 6}
Let the events be defined as:
A: Getting a 4 on the die
B: Getting an even number on the die
∵ P(A|B) = P(A and B)/P(B)
A and B = {4}, so P(A and B) = 1/6
B = {2, 4, 6}, so P(B) = 3/6
P(A|B) = P(A and B)/P(B) = (1/6)/(3/6) = 1/6 × 6/3
= 1/3 ≈ 0.3333 (or 33.33%)
Conditional Probability without
reducing the sample space [2]
Example: Two dice are rolled. Find the probability of
getting a sum of 3 if it is known that the sum of the
spots on the dice was less than 6.
Solution [1]:
∵ P(A|B) = P(A ∩ B)/P(B)
Let the events be defined as:
A: The sum of the spots on the dice is 3
B: The sum of the spots on the dice is less than 6
A ∩ B: The sum is 3 and the sum is less than 6
A ∩ B = {(2, 1), (1, 2)}
n(A ∩ B) = 2
P(A ∩ B) = 2/36 = 1/18
≈ 0.0556 (or 5.56%)
Solution [2]:
B is the event that the sum of the spots on the
dice is less than 6.
B = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3),
(3, 1), (3, 2), (4, 1)}
P(B) = 10/36 = 5/18 ≈ 0.2778 (or 27.78%)
P(A|B) = P(A ∩ B)/P(B)
= 1/18 × 18/5 = 1/5 = 0.2 (or 20%)
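Both approaches to this example can be run side by side to confirm that they agree. A sketch assuming two fair dice; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # 36 ordered pairs
a_and_b = [o for o in sample_space if sum(o) == 3 and sum(o) < 6]
b = [o for o in sample_space if sum(o) < 6]

# Formula on the full sample space: P(A|B) = P(A and B) / P(B)
p_a_and_b = Fraction(len(a_and_b), len(sample_space))  # 2/36
p_b = Fraction(len(b), len(sample_space))              # 10/36
p_formula = p_a_and_b / p_b

# Reduced sample space: count only inside B
p_reduced = Fraction(len(a_and_b), len(b))

print(p_a_and_b, p_b, p_formula, p_reduced)  # 1/18 5/18 1/5 1/5
```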
Alternative approach: Conditional
Probability with reducing the
sample space
Example: Two dice are rolled. Find the probability of
getting a sum of 3 if it is known that the sum of the
spots on the dice was less than 6.
Solution
It is known that the sum of the spots on the dice
was less than 6.
Let the reduced sample space be S'.
⇒ S' = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3),
(3, 1), (3, 2), (4, 1)}
n(S') = 10
Let A be the event of getting a sum of 3.
A = {(2, 1), (1, 2)}, n(A) = 2
P(A | sum less than 6) = n(A)/n(S') = 2/10 = 1/5 = 0.2 (or 20%)
Alternative approach: Conditional
Probability with reducing the
sample space
Example: When two dice were rolled, it is known
that the sum was an even number. In this case, find
the probability that the sum was 8.
Solution:
Reduced sample space S' =
{(1, 1), (1, 3), (1, 5), (2, 2), (2, 4), (2, 6), (3, 1), (3, 3),
(3, 5), (4, 2), (4, 4), (4, 6), (5, 1), (5, 3), (5, 5), (6, 2),
(6, 4), (6, 6)}
n(S') = 18
Let A be the event of getting a sum of 8.
A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
n(A) = 5
P(A | sum is even) = n(A)/n(S') = 5/18 ≈ 0.2778 (or 27.78%)
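The counts 18 and 5 can be reproduced by filtering the 36 outcomes. A minimal sketch, assuming two fair dice; the list names are illustrative.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

# Reduced sample space: outcomes whose sum is even.
even_sum = [o for o in sample_space if sum(o) % 2 == 0]
sum_is_8 = [o for o in even_sum if sum(o) == 8]

p = Fraction(len(sum_is_8), len(even_sum))
print(len(even_sum), len(sum_is_8), p)  # 18 5 5/18
```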
A Contingency Table
In statistics, a contingency table (also known as
a cross tabulation or crosstab) is a type of table in
a matrix format that displays the
(multivariate) frequency distribution of the variables.
They are heavily used in survey research, business
intelligence, engineering, and scientific research.
They provide a basic picture of the interrelation
between two variables and can help find
interactions between them.
Example: This question uses the following
contingency table:
         Have pets   Do not have pets   Total
Male     0.41        0.08               0.49
Female   0.45        0.06               0.51
Total    0.86        0.14               1
What is the probability a randomly selected person is
male, given that they own a pet?
Step 1: Rewrite the formula with new variables.
M stands for male and PO stands for pet owner, so the formula
P(A|B) = P(A and B)/P(B)
becomes
P(M|PO) = P(M ∩ PO)/P(PO) --------------------(1)
Step 2: Find P(M ∩ PO) from the table. The cell where the Male
row meets the Have pets column gives P(M ∩ PO) = 0.41.
Step 3: Find P(PO) from the table. From the Total row, 86%
(0.86) of respondents have a pet, so P(PO) = 0.86.
Step 4: Insert the values into formula (1):
P(M|PO) = P(M ∩ PO)/P(PO)
= 0.41/0.86
≈ 0.477 (or 47.7%)
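The four steps can be sketched in code by storing the joint probabilities from the table and summing a column to get the marginal. The dictionary keys are labels invented here; the numbers are the ones in the table.

```python
from fractions import Fraction

# Joint probabilities from the table, keyed by (sex, pet status).
joint = {
    ("male", "pets"): Fraction("0.41"),
    ("male", "no pets"): Fraction("0.08"),
    ("female", "pets"): Fraction("0.45"),
    ("female", "no pets"): Fraction("0.06"),
}

# Marginal P(PO): add every cell in the "pets" column.
p_pets = sum(p for (sex, pets), p in joint.items() if pets == "pets")

# Conditional P(M|PO) = P(M and PO) / P(PO).
p_male_given_pets = joint[("male", "pets")] / p_pets
print(p_pets, p_male_given_pets, round(float(p_male_given_pets), 3))
# 43/50 41/86 0.477
```

Using exact fractions avoids floating-point rounding until the final display step.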
Example: In a large housing plan, 35% of the homes
have a deck and a two-car garage, and 80% of the
houses have a two-car garage. Find the probability
that a house has a deck given that it has a two-car
garage.
Solution
Let D be the event that a house has a deck.
Let G be the event that a house has a two-car garage.
Given
P(D and G) = 0.35
P(G) = 0.80
P(D|G) = P(D and G)/P(G)
= 0.35/0.80 = 7/16
= 0.4375 (or 43.75%)
A summary of probability