Prob&StatsBook PDF
January 8, 2013
Contents
2 Probability
2.1 What is Probability
2.2 Review of set notation
2.3 Types of Probability
2.4 Laws of Probability
2.5 Counting Rules useful in Probability
2.6 Conditional probability and independence
2.7 Bayes Rule
8 CHAPTER 1. PROBABILITY IN THE WORLD AROUND US
rather to the results of many repetitions. For example, we cannot predict the
result of one particular lottery drawing, but we can calculate probabilities of
certain outcomes. We cannot describe the velocity of a single atom, but we
can say something about the behavior of the velocities in the ensemble of all
atoms.
This is the stuff that probabilistic models are made of. Another example
of a field where probabilistic models are routinely used is actuarial science.
It deals with lifetimes of humans and tries to predict how long any given
person is expected to live, based on other variables describing the particulars
of his/her life. Of course, this expected life span is a poor prediction when
applied to any given person, but it works rather well when applied to many
persons. It can help to decide the rates the insurance company should charge
for covering any given person.
Today’s science deals with enormously complex models, for example, the
models of Earth’s climate (there are many of them available, at different levels
of complexity and resolution). The models should also take into account the
uncertainties from many sources, including our imperfect knowledge of the
current state of Earth, our imperfect understanding of all physical processes
involved, and the uncertainty about future scenarios of human development.2
Understanding and communicating this uncertainty is greatly aided by
the knowledge of the rules of probability.
The authors thank Lynda Ballou for contributing some examples and
exercises.
2. Not least, our ability to calculate the output of such models is also limited by the
current state of computational science.
Chapter 2
Probability
Axioms of Probability
All probability values are non-negative numbers not greater than 1, i.e. 0 ≤ p ≤ 1.
An impossible event has probability zero, while an event that is certain to
occur has probability one.
Outcome: The elementary outcomes are all possible results of the random
experiment.
Sample Space(SS): The sample space is the set or collection of all the
outcomes of an experiment and is denoted by S.
Example 2.2.
a) Flip a coin once, then the sample space is: S = {H, T}
b) Flip a coin twice, then the sample space is: S = {HH, HT, TH, TT}
Venn diagram
A Venn diagram is often used to illustrate the relations between sets (events).
The sets A and B are represented as circles; operations between them
(intersections, unions and complements) can also be represented as parts of the
diagram. The entire sample space S is the bounding box. See Figure 2.1.
Figure 2.1: Venn diagram of events A (in bold) and B, represented as insides
of circles, and various intersections. The four regions are labeled AB′, AB,
A′B, and A′B′ (outside both circles).
De Morgan’s Law
(A ∪ B)′ = A′B′
(AB)′ = A′ ∪ B′
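De Morgan's laws can be checked on a small concrete case. The sketch below uses Python sets; the sample space S and the events A and B are made up purely for illustration and do not come from the text.

```python
# Hypothetical sample space and events, chosen only for illustration.
S = set(range(1, 11))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(event, sample_space=S):
    """Return the complement of an event relative to the sample space."""
    return sample_space - event

# First law: (A ∪ B)' = A'B'
law1 = complement(A | B) == complement(A) & complement(B)
# Second law: (AB)' = A' ∪ B'
law2 = complement(A & B) == complement(A) | complement(B)

print(law1, law2)
```

A check on one example is not a proof, but it is a quick way to catch a misremembered identity.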
Exercises
2.1.
Use the Venn diagrams to illustrate Distributive laws and De Morgan’s law.
2.2.
Simplify the following (Draw the Venn diagrams to visualize)
a) (A′)′
b) (AB)′ ∪ A
c) (AB) ∪ (AB′)
d) (A ∪ B ∪ C)B
2.3.
Represent by set notation the following events
2.4.
The sample space consists of eight capital letters (outcomes), A, B, C, ...,
H. Let V = event that the letter represents a vowel, and L = event that the
letter is made of straight lines. Describe the outcomes that comprise
a) VL
b) V ∪ L′
c) V′L′
2.5.
Out of all items sent for refurbishing, 40% had mechanical defects, 50% had
electrical defects, and 25% had both.
Denoting A = {an item has a mechanical defect} and
B = {an item has an electrical defect}, fill the probabilities into the Venn
diagram and determine the quantities listed below.
a) P(A)
b) P(AB)
c) P(A′B)
d) P(A′B′)
e) P(A ∪ B)
f) P(A′ ∪ B′)
g) P([A ∪ B]′)
[Blank Venn diagram with circles A and B]
2.6.
A sample of mutual funds was classified according to whether a fund was up
or down last year (A and A′) and whether it was investing in international
stocks (B and B′). The probabilities of these events and their intersections
are represented in the two-way table below.
        B       B′
A       0.33    ?       ?
A′      ?       ?       0.52
        0.64    ?       1
• Two-way table
This is a popular way to represent statistical data. The cells of the table
correspond to the intersections of row and column events. Note that the
contents of the table add up across rows and columns of the table. The
bottom-right corner of the table contains P(S) = 1.
        B       B′
A       0.26    0.32    0.58
A′      0.11    ?       0.42
        0.37    0.63    1
• Tree diagram
A tree diagram may be used to show the sequence of choices that lead to the
complete description of outcomes. For example, when tossing two coins, we
may represent this as follows:
First toss   Second toss   Outcome
H            H             HH
H            T             HT
T            H             TH
T            T             TT
(see below).
Example 2.5.
Example 2.6.
The following are the counts of fish of each type, that you have caught before.
Example 2.7.
Based on genetics, the proportion of male children among all children
conceived should be around 0.5. However, based on the statistics from a large
number of live births, the probability that a newborn child is male is about
0.512.
The empirical definition of probability has a weakness: it depends on
the results of a particular experiment. The next time this experiment is
repeated, you are likely to get a somewhat different result.
However, as an experiment is repeated many times, the empirical probability
of an event, based on the combined results, approaches the theoretical
probability of the event.1
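To see this convergence in action, one can simulate tosses of a fair coin and watch the fraction of heads settle near 0.5. This is only a sketch: the function name, the seed, and the sample sizes are choices made here for illustration, not part of the text.

```python
import random

def empirical_probability(n_tosses, seed=0):
    """Fraction of heads in n_tosses simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The empirical probability drifts toward the theoretical value 0.5
# as the number of repetitions grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, empirical_probability(n))
```

With a different seed the individual values change, which is exactly the weakness of the empirical definition for small experiments.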
Subjective Probability: Subjective probabilities result from intuition, ed-
ucated guesses, and estimates. For example: given a patient’s health and
extent of injuries a doctor may feel that the patient has a 90% chance of a
full recovery.
Regardless of the way probabilities are defined, they always follow the
same laws, which we will explore starting with the following Section.
a) 0 ≤ P (ei ) ≤ 1
necessarily which ones would work in practice. However, once we have as-
signed the probability to each outcome, they are subject to further rules
which we will describe below.
Example 2.8. Probability that John passes a Math exam is 4/5 and that he
passes a Chemistry exam is 5/6. If the probability that he passes both exams
is 3/4, find the probability that he will pass at least one exam.
Solution. Let M = John passes Math exam, and C = John passes Chemistry
exam.
P(John passes at least one exam) = P(M ∪ C)
= P(M) + P(C) − P(M ∩ C) = 4/5 + 5/6 − 3/4 = 53/60
P (A ∪ B) = P (A) + P (B).
This follows immediately from (2.2). Since A and B are mutually exclusive,
P (A ∩ B) = 0.
2.4. LAWS OF PROBABILITY 19
Example 2.9. What is the probability of getting a total of 7 or 11, when two
dice are rolled?
      1      2      3      4      5      6
1   (1,1)  (1,2)                       (1,6)
2
3
4
5
6                                      (6,6)
Solution. Let A be the event that the total is 7 and B be the event that it is
11. The sample space for this experiment is
S = {(1, 1), (1, 2), ..., (2, 1), (2, 2), ..., (6, 6)}, n(S) = 36
A = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} and n(A) = 6.
So, P(A) = 6/36 = 1/6.
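The solution as given stops at P(A). A direct enumeration of the 36 outcomes can finish the calculation. The sketch below lists the outcomes in B and adds the two probabilities, which is allowed because a total cannot be both 7 and 11. The variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered pairs (first die, second die).
S = list(product(range(1, 7), repeat=2))

A = [pair for pair in S if sum(pair) == 7]    # total is 7
B = [pair for pair in S if sum(pair) == 11]   # total is 11

p_A = Fraction(len(A), len(S))
p_B = Fraction(len(B), len(S))

# A and B are mutually exclusive, so the probabilities add.
p_A_or_B = p_A + p_B
print(p_A, p_B, p_A_or_B)  # 1/6 1/18 2/9
```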
Exercises
2.7.
Two cards are drawn from a pack, without replacement. What is the prob-
ability that both are greater than 2 and less than 8?
2.8.
A permutation of the word "white" is chosen at random. Find the probability
that it begins with a vowel. Also find the probability that it ends with a
consonant.
2.9.
Find the probability that a leap year will have 53 Sundays.
2.10.
As a foreign language, 40% of the students took Spanish and 30% took
French, while 60% took at least one of these languages. What percent of
students took both Spanish and French?
2.11.
In a class of 100 students, 30 are in mathematics. Moreover, of the 40 females
in the class, 10 are in mathematics. If a student is selected at random from
the class, what is the probability that the student will be a male or be in
mathematics?
2.12.
Suppose that P(A) = 0.4, P(B) = 0.5 and P(AB) = 0.2. Find the following:
a) P(A ∪ B)
b) P(A′B)
c) P[A′(A ∪ B)]
d) P[A ∪ (A′B)]
2.13.
Two tetrahedral (4-sided) symmetrical dice are rolled, one after the other.
a) Find the probability that both dice will land on the same number.
b) Find the probability that each die will land on a number less than 3.
c) Find the probability that the two numbers will differ by at most 1.
d) Will the answers change if we rolled the dice simultaneously?
2.5. COUNTING RULES USEFUL IN PROBABILITY 21
Example 2.10. How large is the sample space when a pair of dice is thrown?
Solution. The first die can be thrown in n1 = 6 ways and the second in
n2 = 6 ways. Therefore, the pair of dice can land in n1 n2 = 36 possible
ways.
Theorem 2.3 can naturally be extended to more than two operations: if
we have n1, n2, ..., nk consecutive choices, then the total number of ways is
n1 n2 · · · nk.
The term permutation refers to an arrangement of objects when the order
matters (for example, letters in a word).
Theorem 2.4. Permutations
Example 2.11.
From among ten employees, three are to be selected to travel to three out-
of-town plants A, B, and C, one to each plant. Since the plants are located
in different cities, the order in which the employees are assigned to the plants
is an important consideration. In how many ways can the assignments be
made?
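Since the solution is not shown here, the following sketch counts the assignments in three ways: with the formula n!/(n − r)!, by brute-force enumeration, and with the standard-library helper `math.perm`. It assumes the question is read as an ordered selection of 3 employees out of 10, as the wording about order suggests.

```python
import math
from itertools import permutations

n, r = 10, 3

# Formula: n! / (n - r)!
by_formula = math.factorial(n) // math.factorial(n - r)

# Brute force: list every ordered choice of 3 employees out of 10.
by_enumeration = sum(1 for _ in permutations(range(n), r))

print(by_formula, by_enumeration, math.perm(n, r))  # 720 720 720
```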
Proof. Start with picking ordered sets of size r. This can be done in
$_nP_r = \frac{n!}{(n-r)!}$ ways. However, many of these are the re-orderings
of the same basic set of objects. Each distinct set of r objects can be
re-ordered in $_rP_r = r!$ ways. Therefore, we need to divide the number of
permutations $_nP_r$ by r!, thus arriving at the equation (2.3).
Example 2.12.
In the previous example, suppose that three employees are to be selected
from among the ten available to go to the same plant. In how many ways
can this selection be made?
Solution. Here, order is not important; we want to know how many subsets
of size r = 3 can be selected from n = 10 people. The result is
\[ \binom{10}{3} = \frac{10!}{3!\,7!} = \frac{10(9)(8)}{3(2)(1)} = 120 \]
Example 2.13.
A package of six light bulbs contains 2 defective bulbs. If three bulbs are
selected for use, find the probability that none of the three is defective.
Solution.
\[ P(\text{none are defective}) = \frac{\text{number of ways 3 nondefectives can be chosen}}{\text{total number of ways a sample of 3 can be chosen}} = \frac{\binom{4}{3}}{\binom{6}{3}} = \frac{1}{5} \]
Example 2.14.
In a poker hand consisting of 5 cards, find the probability of holding 2 aces
and 3 jacks.
Solution. The number of ways of being dealt 2 aces from 4 is $\binom{4}{2} = 6$ and
the number of ways of being dealt 3 jacks from 4 is $\binom{4}{3} = 4$.
The total number of 5-card poker hands, all of which are equally likely, is
\[ \binom{52}{5} = 2{,}598{,}960 \]
Hence, the probability of getting 2 aces and 3 jacks in a 5-card poker hand
is P(C) = (6 · 4)/2,598,960.
Example 2.15.
A university warehouse has received a shipment of 25 printers, of which
10 are laser printers and 15 are inkjet models. If 6 of these 25 are selected
at random to be checked by a particular technician, what is the probability
that exactly 3 of these selected are laser printers? At least 3 inkjet printers?
Solution. First choose 3 of the 15 inkjet and then 3 of the 10 laser printers.
There are $\binom{15}{3}$ and $\binom{10}{3}$ ways to do it, and therefore
\[ P(\text{exactly 3 of the 6}) = \frac{\binom{15}{3}\binom{10}{3}}{\binom{25}{6}} = 0.3083 \]
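The same counting argument can be sketched in code, which also covers the second question of the example (at least 3 inkjet printers), for which no solution is shown here. The helper `p_inkjet` is a name introduced for illustration; it counts samples with exactly k inkjet printers and divides by the total number of samples.

```python
from math import comb

N_INKJET, N_LASER, SAMPLE = 15, 10, 6
total = comb(N_INKJET + N_LASER, SAMPLE)   # all ways to pick 6 of 25

def p_inkjet(k):
    """P(exactly k inkjet printers among the 6 selected)."""
    return comb(N_INKJET, k) * comb(N_LASER, SAMPLE - k) / total

# Exactly 3 laser printers is the same event as exactly 3 inkjet printers.
p_exactly_3_laser = p_inkjet(3)

# At least 3 inkjet: add the mutually exclusive cases k = 3, 4, 5, 6.
p_at_least_3_inkjet = sum(p_inkjet(k) for k in range(3, SAMPLE + 1))

print(round(p_exactly_3_laser, 4), round(p_at_least_3_inkjet, 4))
```

Summing `p_inkjet(k)` over k = 0, ..., 6 gives 1, which is a useful sanity check on the counts.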
Example 2.16.
Consider 10 people to be split into 3 groups to be assigned to 3 plants. If we
are to send 5 people to Plant A, 3 people to Plant B, and 2 people to Plant
C, then the total number of assignments is
\[ \frac{10!}{5!\,3!\,2!} = 2520 \]
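One way to see where this count comes from is to fill the plants one at a time. The sketch below compares the factorial formula with the step-by-step product of combinations; the variable names are illustrative only.

```python
from math import comb, factorial

sizes = (5, 3, 2)          # people sent to plants A, B, C
n = sum(sizes)

# Formula: n! / (n1! n2! n3!)
by_formula = factorial(n)
for size in sizes:
    by_formula //= factorial(size)

# Step by step: choose 5 of 10 for A, then 3 of the remaining 5 for B;
# the last 2 go to C.
step_by_step = comb(10, 5) * comb(5, 3) * comb(2, 2)

print(by_formula, step_by_step)  # 2520 2520
```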
Exercises
2.14.
An incoming lot of silicon wafers is to be inspected for defectives by an engi-
neer in a microchip manufacturing plant. Suppose that, in a tray containing
20 wafers, 4 are defective. Two wafers are to be selected randomly for in-
spection. Find the probability that neither is defective.
2.15.
A person draws 5 cards from a shuffled pack of cards. Find the probability
that the person has at least 3 aces. Find the probability that the person has
at least 4 cards of the same suit.
2.16.
Three people enter the elevator on the basement level. The building has 7
floors. Find the probability that all three get off at different floors.
2.17.
In a group of 7 people, each person shakes hands with every other person.
How many handshakes occur?
2.18.
A marketing director considers that there’s “overwhelming agreement” in a
5-member focus group when either 4 or 5 people like or dislike the product.a
If, in fact, the product’s popularity is 50% (so that all outcomes are equally
likely), what is the probability that the focus group will be in “overwhelming
agreement” about it? Is the marketing director making a judgement error in
declaring such agreement “overwhelming”?
2.19.
A die is tossed 5 times. Find the probability that we will have 4 of a kind.
2.20.
In a lottery, 6 numbers are drawn out of 45. You hit a jackpot if you guess
all 6 numbers correctly, and get $400 if you guess 5 numbers out of 6. What
are the probabilities of each of those events?
2.21.
There are 21 Bachelor of Science programs at New Mexico Tech. Given 21
areas from which to choose, in how many ways can a student select:
2.22.
From a box containing 5 chocolates and 4 hard candies, a child takes a
handful of 4 (at random). What is the probability that exactly 3 of the 4 are
chocolates?
2.23.
If a group consists of 8 men and 6 women, in how many ways can a committee
of 5 be selected if:
2.24.
Suppose we have a lot of 40 transistors of which 8 are defective. If we sample
without replacement, what is the probability that we get 4 good transistors
in the first 5 draws?
2.25.
A housewife is asked to rank four brands A, B, C, and D of household cleaner
according to her preference, number one being the one she prefers most, etc.
She really has no preference among the four brands. Hence, any ordering is
equally likely to occur.
2.26.
How many ways can one arrange the letters of the word ADVANTAGE so
that the three As are adjacent to each other?
2.27.
How many distinct ways are there to permute the letters in the word PROB-
ABILITY?
2.28.
Eight tires of different brands are ranked 1 to 8 (best to worst) according
to mileage performance. If four of these tires are chosen at random by a
customer, find the probability that the best tire among the four selected by
the customer is actually ranked third among the original eight.
           1   1
         1   2   1
       1   3   3   1
     1   4   6   4   1
   1   5  10  10   5   1
A step in construction
The number in each cell represents the number of downward routes from the
vertex to that point (can you explain why?). It is also a number of ways to choose
r objects out of n (can you explain why?), that is, $\binom{n}{r}$.
                       1
                     1   1
                   1   2   1
                 1   3   3   1
               1   4   6   4   1
             1   5  10  10   5   1
           1   6  15  20  15   6   1
         1   7  21  35  35  21   7   1
       1   8  28  56  70  56  28   8   1
     1   9  36  84 126 126  84  36   9   1
   1  10  45 120 210 252 210 120  45  10   1
The first 10 rows
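The construction rule (each entry is the sum of the two entries above it) is easy to try out. The following sketch builds the rows by repeated addition and checks every entry against `math.comb`; the function name `pascal_rows` is an illustrative choice.

```python
from math import comb

def pascal_rows(n_rows):
    """Build rows 0..n_rows of Pascal's triangle by adding neighbours."""
    rows = [[1]]
    for _ in range(n_rows):
        prev = rows[-1]
        # Each inner entry is the sum of the two entries above it.
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(10)
for n, row in enumerate(rows):
    # Every entry agrees with the number of ways to choose r out of n.
    assert row == [comb(n, r) for r in range(n + 1)]
    print(" ".join(map(str, row)).center(50))
```

The agreement with `comb(n, r)` reflects the fact that a downward route to a cell is determined by choosing which r of the n steps go to the right.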
The combination numbers are also called binomial coefficients and are seen in
Calculus. Namely, they are the coefficients in the expansion
\[ (a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^r b^{n-r} \]
Note that, if you let a = b = 1/2, then on the right-hand side you will
get the probabilities
\[ P(\text{a is chosen } r \text{ times and b is chosen } n - r \text{ times}) = \frac{\binom{n}{r}}{2^n} \]
and on the left-hand side you will have 1 (the total of all probabilities).
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \quad \text{for } P(B) > 0 \tag{2.4} \]
\[ P(A \mid B) = \frac{\text{Number of outcomes in } AB}{\text{Number of outcomes in } B} \]
For example, P (a die is 3 | a die is odd) = 1/3 and
P (a die is 4 | a die is odd) = 0.
Example 2.17.
Let A = {a family with two children has two boys} and B = {a family with two children has at least one boy}.
Find P (A | B).
2.6. CONDITIONAL PROBABILITY AND INDEPENDENCE 29
Solution. The event B contains the following outcomes: (B, B), (B, G) and
(G, B). Only one of these is in A. Thus, P (A | B) = 1/3.
However, if I know that the family has two children, and I see one of
the children and it’s a boy, then the probability suddenly changes to 1/2.
There is a subtle difference in the language and this changes the conditional
probability!2
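The counting form of conditional probability can be sketched by listing the four equally likely (older, younger) outcomes. The second calculation below models "I see one of the children and it's a boy" as conditioning on one specific child (here, the older one) being a boy; that modelling choice is an assumption made for this sketch, and the function and event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Four equally likely outcomes: (older child, younger child).
S = list(product("BG", repeat=2))

def conditional(event_a, event_b):
    """P(A | B) = (# outcomes in AB) / (# outcomes in B)."""
    in_b = [s for s in S if event_b(s)]
    in_ab = [s for s in in_b if event_a(s)]
    return Fraction(len(in_ab), len(in_b))

def two_boys(s):
    return s == ("B", "B")

def at_least_one_boy(s):
    return "B" in s

def older_is_boy(s):
    # ASSUMPTION: "I see a boy" is modelled as a specific child being a boy.
    return s[0] == "B"

print(conditional(two_boys, at_least_one_boy))  # 1/3
print(conditional(two_boys, older_is_boy))      # 1/2
```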
Statistical reasoning
Suppose I pick a card at random from a pack of playing cards, without
showing you. I ask you to guess which card it is, and you guess the five of
diamonds. What is the probability that you are right? Since there are 52
cards in a pack, and only one five of diamonds, the probability of the card
being the five of diamonds is 1/52.
Next, I tell you that the card is red, not black. Now what is the probability
that you are right? Clearly you now have a better chance of being right than
you had before. In fact, your chance of being right is twice as big as it was
before, since only half of the 52 cards are red. So the probability of the card
being the five of diamonds is now 1/26. What we have just calculated is a
conditional probability: the probability that the card is the five of diamonds,
given that it is red.
If we let A stand for the card being the five of diamonds, and B stand for
the card being red, then the conditional probability that the card is the five
of diamonds given that it is red is written P (A|B).
In our case, P (A ∩ B) is the probability that the card is the five of
diamonds and red, which is 1/52 (exactly the same as P(A), since there are
no black fives of diamonds!). P(B), the probability that the card is red, is
1/2. So the definition of conditional probability tells us that P (A|B) = 1/26,
exactly as it should. In this simple case we didn’t really need to use a formula
to tell us this, but the formula is very useful in more complex cases.
If we rearrange the definition of conditional probability, we obtain the
multiplication rule for probabilities:
P(A ∩ B) = P(A | B) P(B)
Definition 2.10 can be extended to more than two events, but it’s fairly
difficult to describe.3 However, it is often used in this context:
For example, if we tossed a coin 5 times, the probability that all are Heads
is P(H1) × P(H2) × ... × P(H5) = (1/2)^5 = 1/32. However, this calculation
also extends to outcomes with unequal probabilities.
Example 2.19.
Three bits (0 or 1 digits) are transmitted over a noisy channel, so they will
be flipped independently with probability 0.1 each. What is the probability
that
a) At least one bit is flipped
b) Exactly one bit is flipped?
P(exactly one) = P(F1 F2′ F3′) + P(F1′ F2 F3′) + P(F1′ F2′ F3) = 3(0.1)(1 − 0.1)^2 = 0.243
3. For example, the relation P(ABC) = P(A)P(B)P(C) does not guarantee that the
events A, B, C are independent.
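Both parts can be checked by listing all eight flip patterns and multiplying the per-bit probabilities, which is legitimate because the flips are independent. This is a sketch; `pattern_probability` and the other names are introduced here for illustration.

```python
from itertools import product

P_FLIP = 0.1

def pattern_probability(pattern):
    """Probability of one flip pattern, e.g. (True, False, False)."""
    prob = 1.0
    for flipped in pattern:
        prob *= P_FLIP if flipped else 1 - P_FLIP
    return prob

# All 2^3 patterns: True means the bit in that position was flipped.
patterns = list(product([True, False], repeat=3))

p_at_least_one = sum(pattern_probability(p) for p in patterns if any(p))
p_exactly_one = sum(pattern_probability(p) for p in patterns if sum(p) == 1)

print(round(p_at_least_one, 3), round(p_exactly_one, 3))  # 0.271 0.243
```

Part a) can also be obtained through the complement: 1 − P(no bit is flipped) = 1 − (0.9)^3 = 0.271.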
Self-test questions
Suppose you throw two dice, one after the other.
Answers:
d) For the dice to add up to 4, there are three possibilities: either both dice
show a 2, or the first shows a 3 and the second shows a 1, or the first
shows a 1 and the second shows a 3. Each of these has a probability
of (1/6)(1/6) = 1/36 (using the special multiplication rule, since the
rolls are independent). Hence the probability that the dice add up to 4
is 1/36 + 1/36 + 1/36 = 3/36 = 1/12 (using the special addition rule,
since the outcomes are mutually exclusive).
e) If the first die shows a 2, then for the dice to add up to 4 the second
die must also show a 2. So the probability that the dice add up to 4
given that the first shows a 2 is 1/6.
f) Note that we cannot use the simplified multiplication rule here, because
the dice adding up to 4 is not independent of the first die showing a
2. So we need to use the full multiplication rule. This tells us that
probability that the first die shows a 2 and the dice add up to 4 is
given by the probability that the first die shows a 2, multiplied by the
probability that the dice add up to 4 given that the first die shows a 2.
This is (1/6)(1/6) = 1/36.
Alternatively, see part (c).
First marble      Second marble        Intersection
P(R1) = 7/10      P(R2|R1) = 6/9       P(R1R2) = (7/10)(6/9) = 42/90
                  P(G2|R1) = 3/9       P(R1G2) = 21/90
P(G1) = 3/10      P(R2|G1) = 7/9       P(G1R2) = ?
                  P(G2|G1) = 2/9       P(G1G2) = ?
Now, can you tell me what P (R2 ) and P (R1 | R2 ) are? Maybe you know the
answer already. However, we will get back to this question in Section 2.7.
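For readers who want to check their answers, the branch products can be computed with exact fractions. The sketch below applies the multiplication rule along each branch of the tree and then combines the two paths that end in a red second marble; the variable names are illustrative.

```python
from fractions import Fraction as F

p_R1, p_G1 = F(7, 10), F(3, 10)
p_R2_given_R1, p_G2_given_R1 = F(6, 9), F(3, 9)
p_R2_given_G1, p_G2_given_G1 = F(7, 9), F(2, 9)

# Multiplication rule along each branch of the tree.
p_R1R2 = p_R1 * p_R2_given_R1
p_R1G2 = p_R1 * p_G2_given_R1
p_G1R2 = p_G1 * p_R2_given_G1
p_G1G2 = p_G1 * p_G2_given_G1

# The four branches cover the whole sample space.
assert p_R1R2 + p_R1G2 + p_G1R2 + p_G1G2 == 1

# R2 happens through either R1 or G1 (mutually exclusive paths).
p_R2 = p_R1R2 + p_G1R2
p_R1_given_R2 = p_R1R2 / p_R2

print(p_G1R2, p_G1G2, p_R2, p_R1_given_R2)
```

Note that `Fraction` prints values in lowest terms, so 21/90 appears as 7/30.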
Example 2.21.
Suppose that of all individuals buying a certain digital camera, 60% include
an optional memory card in their purchase, 40% include a set of batteries,
and 30% include both a card and batteries. Consider randomly selecting a
buyer and let A={memory card purchased} and B= {battery purchased}.
Then find P (A|B) and P (B|A).
Solution. From given information, we have P (A) = 0.60, P (B) = 0.40, and
P(both purchased) = P (A ∩ B) =0.30. Given that the selected individual
purchased an extra battery, the probability that an optional card was also
purchased is
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.30}{0.40} = 0.75 \]
That is, of all those purchasing an extra battery, 75% purchased an optional
memory card. Similarly,
\[ P(\text{battery} \mid \text{memory card}) = P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{0.30}{0.60} = 0.50 \]
Notice that P(A|B) ≠ P(A) and P(B|A) ≠ P(B), that is, the events A and
B are dependent.
Exercises
2.29.
A year has 53 Sundays. What is the conditional probability that it is a leap
year?
2.30.
The probability that a majority of the stockholders of a company will attend
a special meeting is 0.5. If the majority attends, then the probability that
an important merger will be approved is 0.9. What is the probability that a
majority will attend and the merger will be approved?
2.31.
Let events A, B have positive probabilities. Show that, if P (A | B) = P (A)
then also P (B | A) = P (B).
2.32.
The cards numbered 1 through 10 are placed in a hat, mixed up, then one
of the cards is drawn. If we are told that the number on the drawn card is
at least five, then what is the probability that it is ten?
2.33.
In the roll of a fair die, consider the events A = {2, 4, 6} = "even numbers"
and B = {4, 5, 6} = "high scores". Find the probability that the die shows an
even number, given that it shows a high score.
2.34.
There are two urns. In the first urn there are 3 white and 2 black balls and
in the second urn there are 1 white and 4 black balls. From a randomly chosen
urn, one ball is drawn. What is the probability that the ball is white?
2.35.
The level of college attainment of US population by racial and ethnic group
in 1998 is given in the following tableb
Racial or          Number of    Percentage    Percentage    Percentage with
Ethnic Group       Adults       with          with          Graduate or
                   (Millions)   Associate’s   Bachelor’s    Professional
                                Degree        Degree        Degree
Native Americans     1.1         6.4           6.1           3.3
Blacks              16.8         5.3           7.5           3.8
Asians               4.3         7.7          22.7          13.9
Hispanics           11.2         4.8           5.9           3.3
Whites             132.0         6.3          13.9           7.7
The percentages given in the right three columns are conditional percentages.
c) Given that the person had an Associate’s degree, what is the probability
that the person was Hispanic?
2.36.
The dealer’s lot contains 40 cars arranged in 5 rows and 8 columns. We pick
one car at random. Are the events A = {the car comes from an odd-numbered row}
and B = {the car comes from one of the last 4 columns} independent? Prove
your point of view.
2.37.
You have sent applications to two colleges. If you are considering your
chances to be accepted to either college as 60%, and believe the results are
statistically independent, what is the probability that you’ll be accepted to
at least one?
How will your answer change if you applied to 5 colleges?
2.38.
Show that, if the events A and B are independent, then so are A′ and B′.
2.39.
In a high school class, 50% of the students took Spanish, 25% took French
and 30% of the students took neither.
Let A = event that a randomly chosen student took Spanish, and B =
event that a student took French. Fill in either the Venn diagram or a 2-way
table and answer the questions:
[Blank Venn diagram with circles A and B, and a blank two-way table with rows A, A′ and columns B, B′]
2.40.
One half of all female physicists are married. Among those married, 50% are
married to other physicists, 29% to scientists other than physicists and 21%
to nonscientists. Among male physicists, 74% are married. Among them, 7%
are married to other physicists, 11% to scientists other than physicists and
82% to nonscientists.c What percent of all physicists are female? [Hint: This
problem can be solved as is, but if you want to, assume that physicists comprise
1% of all population.]
2.41.
Error-correcting codes are designed to withstand errors in data being sent
over communication lines. Suppose we are sending a binary signal (consisting
of a sequence of 0’s and 1’s), and during transmission, any bit may get flipped
with probability p, independently of any other bit. However, we might choose
to repeat each bit 3 times. For example, if we want to send a sequence 010,
2.7. BAYES RULE 37
we will code it as 000111000. If one of the three bits flips, say, the receiver
gets the sequence 001111000, he will still be able to decode it as 010 by
majority voting. That is, reading the first three bits, 001, he will interpret
it as an attempt to send 000. However, if two of the three bits are flipped,
for example 011, this will be interpreted as an attempt to send 111, and thus
decoded incorrectly.
What is the probability of a bit being decoded incorrectly under this
scheme?d
2.42. ⋆
Give an example of events A, B, C such that they are pairwise independent
(i.e. P(AB) = P(A)P(B) etc.) but P(ABC) ≠ P(A)P(B)P(C). [Hint:
You may build them on a sample space with 4 elementary outcomes.]
b) B1 ∪ B2 ∪ · · · ∪ Bk = S
This situation often arises when the statistics are available in subgroups of
a population. For example, an insurance company might know accident rates
for each age group Bi . This will give the company conditional probabilities
P (A | Bi ) (if we denote A = event of accident).
Question: if we know all the conditional probabilities P (A | Bi ), how do
we find the unconditional P (A)?
Consider a case when k = 2:
The event A can be written as the union of mutually exclusive events $AB_1$
and $AB_2$, that is, $A = AB_1 \cup AB_2$, so that
\[ P(A) = P(AB_1) + P(AB_2) = P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) \]
Suppose we want to find probability of the form $P(B_1 \mid A)$, which can be
written as
\[ P(B_1 \mid A) = \frac{P(AB_1)}{P(A)} = \frac{P(A \mid B_1)\,P(B_1)}{P(A)}, \]
therefore
\[ P(B_1 \mid A) = \frac{P(B_1)P(A \mid B_1)}{P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2)} \]
This calculation generalizes to k > 2 events as follows.
Theorem 2.7. Bayes Rule
Subsequently,
\[ P(B_j \mid A) = \frac{P(B_j)P(A \mid B_j)}{P(A)} \tag{2.9} \]
The equation (2.8) is often called the Law of Total Probability.
Example 2.22.
A rare genetic disease (occurring in 1 out of 1000 people) is diagnosed using
a DNA screening test. The test has a false positive rate of 0.5%, meaning that
P(test positive | no disease) = 0.005. Given that a person has tested positive,
what is the probability that this person actually has the disease?
First, guess the answer, then read on.
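A calculation along the lines of Bayes' rule can be sketched as follows. The example gives the prevalence and the false positive rate, but the probability that the test detects the disease when it is present is not stated in the text shown here; the code ASSUMES it equals 1, and the answer changes if that assumption is changed. The function name `posterior` is illustrative.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """Bayes' rule for two groups: P(disease | test positive)."""
    # Law of Total Probability for P(test positive).
    p_pos = (prior * p_pos_given_disease
             + (1 - prior) * p_pos_given_healthy)
    return prior * p_pos_given_disease / p_pos

prior = 1 / 1000          # from the example: 1 out of 1000 people
false_positive = 0.005    # from the example
sensitivity = 1.0         # ASSUMPTION: not stated in the example

print(round(posterior(prior, sensitivity, false_positive), 4))  # 0.1668
```

Under this assumption, only about one in six people who test positive actually has the disease, because the healthy group is so much larger than the diseased group.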
Example 2.23.
At a certain assembly plant, three machines make 30%, 45%, and 25%,
respectively, of the products. It is known from past experience that 2%, 3%,
and 2% of the products made by each machine, respectively, are defective.
Now, suppose that a finished product is randomly selected.
Here, the first branching represents probabilities of the events Bi , and the
second branching represents conditional probabilities P (A | Bi ). The proba-
bilities of intersections, given by the products, are on the right. P (A) is their
sum.
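The tree calculation described above can be sketched in a few lines. Here A is taken to be the event that the selected product is defective and $B_i$ the event that it was made by machine i; that reading is an assumption, since the questions of the example are not shown here.

```python
priors = [0.30, 0.45, 0.25]        # P(B_i): share of each machine
defect_rates = [0.02, 0.03, 0.02]  # P(A | B_i)

# Products along each branch: P(A B_i) = P(B_i) P(A | B_i)
joint = [p * d for p, d in zip(priors, defect_rates)]

# Law of Total Probability: P(A) is the sum of the branch products.
p_defective = sum(joint)

# Bayes' rule: P(B_i | A) = P(A B_i) / P(A)
posteriors = [j / p_defective for j in joint]

print(round(p_defective, 4))
print([round(p, 4) for p in posteriors])
```

The posterior probabilities sum to 1, since a defective product must have come from one of the three machines.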
Exercises
2.43.
Lucy is undecided as to whether to take a Math course or a Chemistry course.
She estimates that her probability of receiving an A grade would be 1/2 in a
math course, and 2/3 in a chemistry course. If Lucy decides to base her decision
on the flip of a fair coin, what is the probability that she gets an A?
2.44.
Of the customers at a gas station, 70% use regular gas, and 30% use diesel.
Of the customers who use regular gas, 60% will fill the tank completely, and
of those who use diesel, 80% will fill the tank completely.
a) What percent of all customers will fill the tank completely?
Star?
(c) What percent of all cameras sold were Zony and were not returned?
[Blank tree diagram: first branching into P(B1), P(B2), P(B3); second branching with P(A | Bi); products P(A Bi) to be filled in on the right]
2.48.
Three newspapers, A, B, and C are published in a certain city. It is estimated
from a survey that, of the adult population, 20% read A, 16% read B,
14% read C, 8% read both A and B, 5% read both A and C, 4% read both B
and C, 2% read all three. What percentage reads at least one of the papers?
Of those that read at least one, what percentage reads both A and B?
2.49.
Suppose P (A|B) = 0.3, P (B) = 0.4, P (B|A) = 0.6. Find:
a) P (A)
b) P (A ∪ B)
2.50. ⋆
This is the famous Monty Hall problem.f A contestant on a game show is
asked to choose among 3 doors. There is a prize behind one door and nothing
behind the other two. You (the contestant) have chosen one door. Then the
host opens one of the other doors, and there is nothing behind it. What
is the best strategy? Should you switch to the remaining door, or just stay
with the door you have chosen? What is your probability of success (getting
the prize) for either strategy?
2.51. ⋆
There are two children in a family. We overheard one of them being referred
to as a boy.
b) Suppose that the oldest child is a boy. Again, find the probability that
there are 2 boys in the family.g [Why is it different from part (a)?]
Chapter exercises
2.52.
At a university, two students were doing well for the entire semester but
failed to show up for a final exam. Their excuse was that they traveled out
of state and had a flat tire. The professor gave them the exam in separate
rooms, with one question worth 95 points: “which tire was it?”. Find the
probability that both students mentioned the same tire.h
2.53.
In firing the company’s CEO, the argument was that during the six years
of her tenure, for the last three years the company’s market share was lower
than for the first three years. The CEO claims bad luck. Find the probability
that, given six random numbers, the last three are the lowest among six.
Notes
a. Taken from Leonard Mlodinow, The Drunkard’s Walk.
b. Source: US Department of Education, National Center for Education Statistics, as
reported in Chronicle of Higher Education Almanac, 1998-1999, 2000.
c. Laurie McNeil and Marc Sher. The dual-career-couple problem. Physics Today, July
1999.
d. See David MacKay, Information Theory, Inference, and Learning Algorithms, 640
pages, published September 2003.
Downloadable from http://www.inference.phy.cam.ac.uk/itprnn/book.html
e. According to "http://www.highbeam.com/doc/1G1-167842487.html", Consumer
Interests Annual, January 1, 2007 by Hanna, Sherman D.; Lindamood, Suzanne.
f. There are some interesting factoids about this in Mlodinow’s book, including Marilyn
vos Savant’s column in Parade magazine and scathing replies from academics, who believed
that the probability was 50%. Vos Savant did it again in 2011 with another probability
question that seems, however, intentionally ambiguously worded.
g. Puzzle cited by Martin Gardner, mentioned in Math Horizons, Sept. 2010. See also the
discussion at http://www.stat.columbia.edu/~cook/movabletype/archives/2010/05/
hype about cond.html
h. This example is also from Mlodinow’s book.
Chapter 3
Discrete probability
distributions
One can think of the shoe size of a randomly chosen person as a random
variable. We have already seen the example when a die was rolled and a
number was recorded. This number is also a random variable.
Example 3.1.
Toss two coins and record the number of heads: 0, 1 or 2. Then the following
outcomes can be observed.
Outcome TT HT TH HH
Number of heads 0 1 1 2
The random variables will be denoted with capital letters X, Y, Z, ... and the
lowercase x would represent a particular value of X. For the above example,
x = 2 if heads comes up twice. Now we want to look at the probabilities of
46 CHAPTER 3. DISCRETE PROBABILITY DISTRIBUTIONS
the outcomes. For the probability that the random variable X has the value
x, we write P (X = x), or just p(x).
For the coin flipping random variable X, we can make the table:
x 0 1 2
p(x) 1/4 1/2 1/4
This table represents the probability distribution of the random variable X.
a) $P(X = x) = p_X(x) \ge 0$
b) $\sum_x P(X = x) = 1$, where the sum is over all possible $x$
The function pX (x) or simply p(x) is called probability mass function (PMF)
of X.
What does this actually mean? A discrete probability function is a func-
tion that can take a discrete number of values (not necessarily finite). This
is most often the non-negative integers or some subset of the non-negative
integers. There is no mathematical restriction that discrete probability func-
tions only be defined at integers, but we will use integers in many practical
situations. For example, if you toss a coin 6 times, you can get 2 heads or 3
heads but not 2.5 heads.
Each of the discrete values has a certain probability of occurrence that is
between zero and one. That is, a discrete function that allows negative values
or values greater than one is not a PMF. The condition that the probabilities
add up to one means that one of the values has to occur.
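The two PMF conditions can be checked mechanically. The sketch below rebuilds the coin-toss table from Example 3.1 by enumerating equally likely outcomes; the function names `pmf_number_of_heads` and `is_valid_pmf` are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

def pmf_number_of_heads(n_coins):
    """PMF of the number of heads in n_coins fair tosses, built by enumeration."""
    outcomes = list(product("HT", repeat=n_coins))
    pmf = {}
    for outcome in outcomes:
        x = outcome.count("H")
        # every outcome is equally likely, so each adds 1 / (number of outcomes)
        pmf[x] = pmf.get(x, 0) + Fraction(1, len(outcomes))
    return pmf

def is_valid_pmf(pmf):
    """Condition (a): no negative values. Condition (b): values sum to 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

two_coins = pmf_number_of_heads(2)
print(two_coins)
print(is_valid_pmf(two_coins))
```

Exact fractions are used so that the sum-to-one check is not affected by floating-point rounding.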
Example 3.2.
A shipment of 8 similar microcomputers to a retail outlet contains 3 that are
defective. If a school makes a random purchase of 2 of these computers, find
the probability mass function for the number of defectives.
Solution. Let X be a random variable whose values x are the possible num-
bers of defective computers purchased by school. Then x must be 0, 1 or 2.
3.1. DISCRETE DISTRIBUTIONS 47
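Then,
$$P(X = 0) = \frac{\binom{3}{0}\binom{5}{2}}{\binom{8}{2}} = \frac{10}{28}$$
$$P(X = 1) = \frac{\binom{3}{1}\binom{5}{1}}{\binom{8}{2}} = \frac{15}{28}$$
$$P(X = 2) = \frac{\binom{3}{2}\binom{5}{0}}{\binom{8}{2}} = \frac{3}{28}$$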
Thus, the probability mass function of X is
x      0      1      2
p(x)   10/28  15/28  3/28
b) $\lim_{x \to \infty} F(x) = 1$
c) $F(x)$ is non-decreasing
d) $p(x) = F(x) - F(x-) = F(x) - \lim_{y \uparrow x} F(y)$
Example 3.3.
Find the CDF of the random variable from Example 3.2. Using F (x), verify
that P (X = 1) = 15/28.
Now, $P(X = 1) = p(1) = F(1) - F(0) = \frac{25}{28} - \frac{10}{28} = \frac{15}{28}$.
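As a cross-check of Examples 3.2 and 3.3, the counting argument can be sketched in a few lines. The helper names `defectives_pmf` and `cdf_at` are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb

def defectives_pmf(total=8, defective=3, drawn=2):
    """PMF of the number of defectives in a sample drawn without replacement."""
    good = total - defective
    ways = comb(total, drawn)
    return {
        x: Fraction(comb(defective, x) * comb(good, drawn - x), ways)
        for x in range(max(0, drawn - good), min(defective, drawn) + 1)
    }

def cdf_at(pmf, x):
    """F(x) = P(X <= x) for a PMF stored as a dict."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

pmf = defectives_pmf()
print(pmf)  # fractions print in lowest terms, so 10/28 appears as 5/14
print(cdf_at(pmf, 1) - cdf_at(pmf, 0))  # p(1) recovered from the CDF
```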
Exercises
3.1.
Suppose that two dice are rolled independently, with outcomes X1 and X2 .
Find the distribution of the random variable Y = X1 + X2 . [Hint: It’s easier
to visualize all the outcomes if you make a two-way table.]
3.2.
What constant c makes p(x) a valid PMF?
3.5.
For an on-line electronics retailer, X = the number of Zony digital cameras
returned per day follows the distribution given by
x 0 1 2 3 4 5
p(x) 0.05 0.1 ? 0.2 0.25 0.1
(a) Fill in the “?”
(b) Find P (X > 3)
(c) Find the CDF of X (make a table).
3.6.
Out of 5 components, 3 are domestic and 2 are imported. 3 components
are selected at random (without replacement). Calculate the PMF for X =
number of domestic components picked (make a table).
3.7.
The CDF of a discrete random variable X is shown in the plot below.
[Figure: plot of the CDF F(x) against x, with x from −2 to 4 and F(x) from 0.0 to 1.0]
$$V(X) = \sigma^2 = E(X - \mu)^2 = E(X^2) - \mu^2,$$
where
$$E(X^2) = \sum_x x^2 p(x).$$
The variance defines the average (or expected) value of the squared dif-
ference from the mean.
If we use V (X) = E (X − µ)2 as a definition, we can see that
$$V(X) = E(X - \mu)^2 = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2$$
The mean describes the center of the probability distribution, while stan-
dard deviation describes the spread. Larger values of σ signify a distribution
with larger variation. This will be undesirable in some situations, e.g. indus-
trial process control, where we would like the manufactured items to have
identical characteristics. On the other hand, a degenerate random variable
X that has P (X = a) = 1 for some value of a is not random at all, and it
has the standard deviation of 0.
Example 3.4.
The number of fire emergencies in a rural county in a week has the following
distribution:
x          0     1     2     3     4
P(X = x)   0.52  0.28  0.14  0.04  0.02
a) $E(aX + b) = aE(X) + b$
b) $V(aX + b) = a^2 V(X) = a^2\sigma^2$
c) $\sigma_{aX+b} = |a|\,\sigma$.
Example 3.5.
Let X be a random variable having probability mass function given in Ex-
ample 3.4. Calculate the mean and variance of g(X) = 4X + 3.
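One way to carry out the calculation in Example 3.5 is sketched below: the mean and variance of 4X + 3 are computed both from rules (a) and (b) above and directly from the PMF of the transformed variable. The helper names `mean` and `variance` are illustrative.

```python
def mean(pmf):
    """E(X) for a PMF stored as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E(X^2) - mu^2."""
    mu = mean(pmf)
    return sum(x * x * p for x, p in pmf.items()) - mu * mu

# distribution from Example 3.4
fire_pmf = {0: 0.52, 1: 0.28, 2: 0.14, 3: 0.04, 4: 0.02}
a, b = 4, 3

# via the rules E(aX + b) = aE(X) + b and V(aX + b) = a^2 V(X)
mean_rule = a * mean(fire_pmf) + b
var_rule = a * a * variance(fire_pmf)

# directly, from the PMF of g(X) = 4X + 3
g_pmf = {a * x + b: p for x, p in fire_pmf.items()}
mean_direct = mean(g_pmf)
var_direct = variance(g_pmf)

print(mean(fire_pmf), variance(fire_pmf))
print(mean_rule, var_rule)
```

Both routes should agree up to floating-point rounding.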
Let X be a random variable with mean µ and variance $\sigma^2$. Then for any
positive k,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$
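To see how conservative this bound can be, the sketch below compares the exact tail probability with 1/k² for the fire-emergency distribution of Example 3.4. The function name `tail_probability` is an assumption for this illustration.

```python
from math import sqrt

def tail_probability(pmf, k):
    """Exact P(|X - mu| >= k * sigma) for a PMF stored as a dict."""
    mu = sum(x * p for x, p in pmf.items())
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
    return sum(p for x, p in pmf.items() if abs(x - mu) >= k * sigma)

fire_pmf = {0: 0.52, 1: 0.28, 2: 0.14, 3: 0.04, 4: 0.02}

for k in (1.5, 2, 3):
    # the exact tail should never exceed the Chebyshev bound 1 / k^2
    print(k, tail_probability(fire_pmf, k), 1 / k**2)
```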
Exercises
3.8.
Timmy is selling chocolates door to door. The probability distribution of X,
the number of chocolates he sells in each house, is given by
x 0 1 2 3 4
P (X = x) 0.45 0.25 0.15 0.1 0.05
Find the expected value and standard deviation of X.
3.9.
In the previous exercise, suppose that Timmy earns 50 cents for school from
each purchase. Find the expected value and standard deviation of his earn-
ings per house.
3.10.
A dollar coin, a quarter, a nickel and a dime are tossed. I get to pocket all
the coins that came up heads. What are my expected winnings?
3.11.
Consider X with the distribution of a random digit, p(x) = 1/10, x =
0, 1, 2, ..., 9
3.12.
In the Numbers game, two players choose a random number between 1 and
6, and compute the absolute difference.
That is, if Player 1 gets the number Y1 , and Player 2 gets Y2 , then they find
X = |Y1 − Y2 |
3.13.
According to ScanUS.com, the number of cars per household in an Albu-
querque neighborhood was distributed as follows
x 0 1 2 3+
P (X = x) 0.047 0.344 0.402 0.207
3+ really means 3 or more, but let’s assume that there are no more than 3
cars in any household.
Find the expected value and standard deviation of X.
3.14.
For the above problem, the web site actually reported an average of 1.9 cars per
household. This is higher than the answer to Problem 3.13, probably because
we capped the number of cars at 3.
Suppose instead that we cap the number of cars at 4. The distribution will then
look like
x      0      1      2      3    4
p(x)   0.047  0.344  0.402  p3   p4
where p3 + p4 = 0.207. Assuming that E(X) = 1.9, reverse-engineer this
information to find p3 and p4.
3.15.
The frequencies of electromagnetic waves in the upper ionosphere observed
in the vicinity of earthquakes have the mean 1.7 kHz, and standard deviation
of 0.2 kHz. According to Chebyshev inequality,
3.16.
Find the mean and variance of the given PMF p(x) = 1/k, where x =
1, 2, 3, ..., k.
3.3. BERNOULLI DISTRIBUTION 57
3.17.
Show that the function defined by $p(x) = 2^{-x}$ for x = 1, 2, 3, ... can represent
a probability mass function of a random variable X. Find the mean and the
variance of X.
3.18.
For t > 0 show that $p(x) = e^{-t}(1 - e^{-t})^{x-1}$, x = 1, 2, 3, ... can represent a
probability mass function. Also, find E(X) and V(X).
This situation applies to many random processes with just two possible out-
comes: a heads-or-tails coin toss, a made or missed free throw in basketball
etc1 . We arbitrarily call one of these outcomes a success and the other a
failure.
1
However, we have to make sure that the probability of success remains constant. Thus,
for example, wins or losses in a series of football games may not be a Bernoulli experiment!
3.4. BINOMIAL DISTRIBUTION 59
Assume that each Bernoulli trial can result in a success with probability p
and a failure with probability q = 1 − p. Then the probability distribution of
the binomial random variable X, the number of successes in n independent
trials, is
$$P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n.$$
The mean and variance of the binomial distribution are $\mu = np$ and $\sigma^2 = npq$.
We can notice that the mean and variance of the Binomial are n times larger
than those of the Bernoulli random variable.
Figure 3.2: Binomial PMF: left, with n = 60, p = 0.6; right, with n = 15,
p = 0.5
Example 3.8.
The probability that a certain kind of component will survive a shock test is
0.75. Find the probability that
Figure 3.3: Binomial PMF: left, with n = 15, p = 0.1; right, with n = 15,
p = 0.8
Example 3.9.
It has been claimed that in 60% of all solar heating installations the utility
bill is reduced by at least one-third. Accordingly, what are the probabilities
that the utility bill will be reduced by at least one-third in
(a) four of five installations;
(b) at least four of five installations?
Solution.
(a) $P(X = 4) = \binom{5}{4}(0.60)^4(0.40)^{5-4} = 5(0.1296)(0.4) = 0.2592$
(b) $P(X = 5) = \binom{5}{5}(0.60)^5(0.40)^{5-5} = 0.60^5 = 0.0778$
Hence, P(reduction for at least four) = $P(X \ge 4) = 0.2592 + 0.0778 = 0.3370$
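The binomial formula is easy to evaluate directly. The following sketch reproduces Example 3.9; the function name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.60
p_four = binomial_pmf(4, n, p)
p_at_least_four = p_four + binomial_pmf(5, n, p)

# sanity checks: the PMF sums to 1 and its mean equals n * p
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(round(p_four, 4), round(p_at_least_four, 4))
```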
Exercises
3.20.
There’s 50% chance that a mutual fund return on any given year will beat the
industry’s average. What proportion of funds will beat the industry average
for at least 4 out of 5 last years?
3.21.
Biologists would like to catch Costa Rican glass frogs for breeding. There is
75% probability that a glass frog they catch is male. If 10 glass frogs of a
certain species are caught, what are the chances that they will have at least
2 male and 2 female frogs? What is the expected value of the number of
female frogs caught?
3.22.
A 5-member focus group are testing a new game console. Suppose that there’s
50% chance that any given group member approves of the new console, and
their opinions are independent of each other.
a) Calculate and fill out the probability distribution for X = number of
group members who approve of the new console.
b) Calculate P (X ≥ 3).
c) How does your answer in part (b) change when there’s 70% chance that
any group member approves of the new console?
3.23.
Suppose that the four engines of a commercial airplane were arranged to
operate independently and that the probability of in-flight failure of a single
engine is 0.01. Find:
$$g(y; p) := P(Y = y) = (1 - p)^{y-1} p, \quad y = 1, 2, 3, \ldots$$
Its CDF is
$$F(y) = 1 - q^y, \quad y = 1, 2, 3, \ldots, \quad q = 1 - p$$
Its mean and variance are
$$\mu = \frac{1}{p} \quad \text{and} \quad \sigma^2 = \frac{1 - p}{p^2}$$
Proof. To achieve the first success on the yth trial means that the first y − 1
trials result in failures and the last, yth, one is a success. Then, by
independence of trials,
$$P(FF \ldots FS) = q^{y-1} p$$
Now the CDF
$$F(y) = P(Y \le y) = 1 - P(Y > y)$$
The latter means that all the trials up to and including the yth one resulted
in failures, which equals P(y failures in a row) = $q^y$, and we get the CDF by
subtracting this from 1.
The mean E(Y) can be found by differentiating a geometric series:
$$E(Y) = \sum_{y=1}^{\infty} y\,p(y) = \sum_{y=1}^{\infty} y\,p\,(1 - p)^{y-1} = p \sum_{y=1}^{\infty} y\,(1 - p)^{y-1} = p \sum_{y=1}^{\infty} \frac{d}{dq} q^y = p \frac{d}{dq} \sum_{y=1}^{\infty} q^y$$
$$= p \frac{d}{dq}\left(1 + q + q^2 + q^3 + \cdots - 1\right) = p\left[\frac{d}{dq}(1 - q)^{-1} - \frac{d}{dq}(1)\right] = \frac{p}{(1 - q)^2} = \frac{1}{p}.$$
The variance can be calculated by differentiating a geometric series twice:
$$E\{Y(Y - 1)\} = \sum_{y=1}^{\infty} y(y - 1)\,p\,q^{y-1} = pq \sum_{y=1}^{\infty} \frac{d^2}{dq^2}(q^y) = pq \frac{d^2}{dq^2}(1 - q)^{-1} = pq\,\frac{2}{(1 - q)^3} = \frac{2q}{p^2}$$
Hence $E(Y^2) = \frac{2q}{p^2} + \frac{1}{p}$ and $V(Y) = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{q}{p^2}$
Figure 3.4: Geometric PMF: left, with p = 0.2; right, with p = 0.5
Example 3.10.
For a certain manufacturing process it is known that, on the average, 1 in
every 100 items is defective. What is the probability that the first defective
item found is the fifth item inspected? What is the average number of items
that should be sampled before the first defective is found?
Solution. Using the geometric distribution with y = 5 and p = 0.01, we have
$g(5; 0.01) = (0.01)(0.99)^4 = 0.0096$.
Mean number of items needed is µ = 1/p = 100.
3.5. GEOMETRIC DISTRIBUTION 65
Example 3.11.
If the probability is 0.20 that a burglar will get caught on any given job,
what is the probability that he will get caught no later than on his fourth
job?
Solution. Substituting y = 4 and p = 0.20 into the geometric CDF, we get
$$P(Y \le 4) = 1 - 0.8^4 = 0.5904$$
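Examples 3.10 and 3.11 can be reproduced with a short sketch of the geometric PMF and CDF. The function names are illustrative, and the mean is approximated by truncating the infinite sum at an arbitrarily chosen cutoff.

```python
def geometric_pmf(y, p):
    """P(first success occurs on trial y) = (1 - p)^(y - 1) * p."""
    return (1 - p) ** (y - 1) * p

def geometric_cdf(y, p):
    """P(Y <= y) = 1 - q^y with q = 1 - p."""
    return 1 - (1 - p) ** y

first_defective_fifth = geometric_pmf(5, 0.01)   # Example 3.10
caught_by_fourth_job = geometric_cdf(4, 0.20)    # Example 3.11

# truncated sum for E(Y); the neglected tail is tiny for this cutoff
approx_mean = sum(y * geometric_pmf(y, 0.01) for y in range(1, 5001))

print(round(first_defective_fifth, 4), round(caught_by_fourth_job, 4))
print(approx_mean)  # should be close to 1/p = 100
```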
Exercises
3.30.
The probability to be caught while running a red light is estimated as 0.1.
What is the probability that a person is first caught on his 10th attempt to
run a red light? What is the probability that a person runs a red light at
least 10 times without being caught?
3.31.
A computing center is interviewing people until they find a qualified person
to fill a vacant position. The probability that any single applicant is qualified
is 0.15.
a) Find the expected number of people to interview.
b) Find the probability the center will need to interview between 4 and 8
people (inclusive).
3.32.
If probability of success is 0.01, how many trials are necessary so that prob-
ability of at least one success is greater than 0.5?
3.33.
From past experience it is known that 3% of accounts in a large accounting
population are in error. What is the probability that the first account in
error is found on the 5th try? What is the probability that the first account
in error occurs in the first five accounts audited?
3.34.
A rat must choose between five doors, one of which contains chocolate. If the
rat chooses the wrong door, it is returned to the starting point and chooses
again (randomly), and continues until it gets the chocolate. What is the
probability of the rat getting chocolate on the second attempt?
Proof. We have P (Y = y) =
Example 3.12.
In an NBA championship series, the team which wins four games out of seven
will be the winner. Suppose that team A has probability 0.55 of winning over
the team B, and the teams A and B face each other in the championship
games.
(a) What is the probability that team A will win the series in six games?
(b) What is the probability that team A will win the series?
3.6. NEGATIVE BINOMIAL DISTRIBUTION 67
Solution.
(a) $nb(6; 4, 0.55) = \binom{5}{3}(0.55)^4(1 - 0.55)^{6-4} = 0.1853$.
Example 3.13.
A pediatrician wishes to recruit 5 couples, each of whom is expecting their
first child, to participate in a new childbirth regimen. She anticipates that
20% of all couples she asks will agree. What is the probability that 15 couples
must be asked before 5 are found who agree to participate?
Solution. Substituting x = 15, p = 0.2, r = 5, we get
$$nb(15; 5, 0.2) = \binom{14}{4}(0.2)^5(0.8)^{15-5} = 0.034$$
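The two worked examples use the form nb(x; r, p) = C(x − 1, r − 1) p^r (1 − p)^(x − r), which is what the sketch below assumes; the function name `negative_binomial_pmf` is illustrative. It also sums the PMF over 4 to 7 games, which is one way to approach part (b) of Example 3.12.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """P(the r-th success occurs on trial x), assuming independent trials."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

series_in_six = negative_binomial_pmf(6, 4, 0.55)    # Example 3.12(a)
couples_asked = negative_binomial_pmf(15, 5, 0.2)    # Example 3.13

# team A wins the series if its 4th win comes in game 4, 5, 6 or 7
series_win = sum(negative_binomial_pmf(x, 4, 0.55) for x in range(4, 8))

print(round(series_in_six, 4), round(couples_asked, 3))
print(series_win)
```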
Exercises
3.35.
Biologists catch Costa Rican glass frogs for breeding. There is 75% proba-
bility that a glass frog they catch is male. Biologists would like to have at
least 2 female frogs. What is the expected value of the total number of frogs
caught, until they reach their goal? What is the probability that they will
need exactly 6 frogs to reach their goal?
3.36.
Jim is a high school baseball player. He has 0.25 batting average, meaning
that he makes a hit in 25% of his tries (“at-bats”). What is the probability
that Jim makes his second hit of the season on his sixth at-bat?
3.37.
In the best-of-5 series, Team A has 60% chance to win any single game, and
the outcomes of the games are independent. Find the probability that Team
A will win the series (i.e. will win the majority of the games).
3.38.
For Problem 3.37, find the expected duration of the series (regardless of
which team wins). [Hint: First, fill out the table containing d, p(d) – the
distribution of the duration D. For example, P (D = 3) = P (team A wins in 3) +
P (team B wins in 3)]
$$pois(x; \mu) := P(X = x) = \frac{e^{-\mu}\mu^x}{x!}$$
where µ is the parameter of the distribution.a
a
Some textbooks use λ for the parameter. We will use λ for the intensity of the Poisson
process, to be discussed later
E (X) = V (X) = µ.
$$= \mu^2 e^{-\mu} \sum_{x=2}^{\infty} \frac{\mu^{x-2}}{(x - 2)!} = \mu^2 e^{-\mu} e^{\mu} = \mu^2$$
Example 3.14.
During World War II, the Nazis bombed London using V-2 missiles. To
study the locations where missiles fell, the British divided the central area
of London into 576 half-kilometer squares.i The following is the distribution
of counts per square
Number of missiles   Number of   Expected (Poisson)
in a square          squares     number of squares
0                    229         227.5
1                    211         211.3
2                    93          98.1
3                    35          30.4
4                    7           7.1
5 and over           1           1.6
Total                576         576.0
Are the counts suggestive of Poisson distribution?
distribution holds, then the expected number of 0 squares (out of 576) will
be
$$576 \times P(X = 0) = 576 \times \frac{e^{-0.9288}\,0.9288^0}{0!} = 227.5$$
The same way, fill out the rest of the expected counts column. As you can
see, the data match the Poisson model very closely!
Poisson distribution is often mentioned as a distribution of spatial ran-
domness. As a result, British command were able to conclude that the mis-
siles were unguided.
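The expected-count column can be filled in with a few lines of code. In this sketch the mean is estimated from the observed counts, treating the single "5 and over" square as exactly 5 hits; that treatment is an assumption made here, so the last cell may differ slightly from the printed table.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**x / factorial(x)

# observed counts; the key 5 stands for "5 and over" (assumed to be exactly 5)
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n_squares = sum(observed.values())
mu = sum(hits * count for hits, count in observed.items()) / n_squares

expected = {k: n_squares * poisson_pmf(k, mu) for k in range(5)}
# whatever is left over belongs to the "5 and over" category
expected["5+"] = n_squares - sum(expected.values())

print(round(mu, 4))
for category, value in expected.items():
    print(category, round(value, 1))
```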
Example 3.15.
During a laboratory experiment, the average number of radioactive particles
passing through a counter in one millisecond is 4. What is the probabil-
ity that 6 particles enter the counter in a given millisecond? What is the
probability of at least 6 particles?
$$pois(6; 4) = \frac{e^{-4}\,4^6}{6!} = 0.1042$$
Example 3.16.
At a certain industrial facility, accidents occur infrequently. It is known that
the probability of an accident on any given day is 0.005 and the accidents
are independent of each other. For a given period of 400 days, what is the
probability that
(a) there will be an accident on only one day?
(b) there are at most two days with an accident?
Solution. Let X be a binomial random variable with n = 400 and p = 0.005.
Thus µ = np = (400)(0.005) = 2. Using the Poisson approximation,
a) $P(X = 1) = \frac{e^{-2}\,2^1}{1!} = 0.2707$
b) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{e^{-2}\,2^0}{0!} + \frac{e^{-2}\,2^1}{1!} + \frac{e^{-2}\,2^2}{2!} = 0.1353 + 0.2707 + 0.2707 = 0.6767$
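To judge how good the Poisson approximation is in Example 3.16, the sketch below computes the same probabilities from the exact Binomial(400, 0.005) model. The function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

n, p = 400, 0.005
mu = n * p

exact_one = binomial_pmf(1, n, p)
approx_one = poisson_pmf(1, mu)
exact_at_most_two = sum(binomial_pmf(k, n, p) for k in range(3))
approx_at_most_two = sum(poisson_pmf(k, mu) for k in range(3))

print(exact_one, approx_one)
print(exact_at_most_two, approx_at_most_two)
```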
Exercises
3.39.
Number of cable breakages in a year is known to have Poisson distribution
with µ = 0.32.
a) Find the mean and standard deviation of the number of cable breakages
in a year.
b) According to Chebyshev’s inequality, what is the upper bound for
P (X ≥ 2)?
c) What is the exact probability P (X ≥ 2), based on Poisson model?
3.40.
At a barber shop, expected number of customers per day is 8. What is a
probability that, on a given day, between 5 and 10 customers (inclusive) show
up? At least 8 customers?
3.41.
Poisson distribution can be derived by considering Binomial with n large and
p small. Compare computationally
3.42.
Bolted assemblies on a hull of spacecraft may become loose with probability
0.005. There are 96 such assemblies on board. Assuming that assemblies
behave statistically independently, find the probability that there is at most
one loose assembly on board.
3.43.
An airline finds that 5% of the people making reservations on a certain flight
will not show up for the flight. If the airline sells 160 tickets for a flight with
155 seats, what is the probability that the flight ends up overbooked, i.e.
more than 155 people will show up? [Hint: Use the Poisson approximation
for the number of people who will not show up.]
3.44.
A region experiences, on average, 7.5 earthquakes (magnitude 5 or higher),
per year. Assuming Poisson distribution, find the probability that
3.45.
A plumbing company estimates to get the average of 60 service calls per
week. Assuming Poisson distribution, find the probability that, in a given
week
3.8. HYPERGEOMETRIC DISTRIBUTION 73
The mean and variance of the hypergeometric distribution are $\mu = n\frac{k}{N}$ and
$$\sigma^2 = n\,\frac{k}{N}\left(1 - \frac{k}{N}\right)\frac{N - n}{N - 1}$$
We have already seen such a random variable: see Example 3.2. Here are
some more examples.
Example 3.17.
Lots of 40 components each are called unacceptable if they contain as many
as 3 defectives or more. The procedure for sampling the lot is to select 5
components at random and to reject the lot if a defective is found. What is
the probability that exactly 1 defective is found in the sample if there are 3
defectives in the entire lot?
Solution. Using the above distribution with n = 5, N = 40, k = 3 and x = 1,
we can find the probability of obtaining one defective to be
$$hg(1; 40, 5, 3) = \frac{\binom{3}{1}\binom{37}{4}}{\binom{40}{5}} = 0.3011$$
Example 3.18.
A shipment of 20 tape recorders contains 5 that are defective. If 10 of them
are randomly chosen for inspection, what is the probability that 2 of the 10
will be defective?
Solution. Substituting x = 2, n = 10, k = 5, and N = 20 into the formula, we
get
$$P(X = 2) = \frac{\binom{5}{2}\binom{15}{8}}{\binom{20}{10}} = \frac{10(6435)}{184756} = 0.348$$
Note that, if we were sampling with replacement, we would have Binomial
distribution (why?) with p = k/N . In fact, if N is much larger than n, then
the difference between Binomial and Hypergeometric distribution becomes
small.
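The closing remark about large N can be illustrated numerically. The sketch below follows the argument order hg(x; N, n, k) used in Example 3.17; the function names and the "large lot" of 2000 items are illustrative choices, not taken from the text.

```python
from math import comb

def hypergeometric_pmf(x, N, n, k):
    """P(x marked items in a sample of n drawn without replacement
    from N items, k of which are marked)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

print(round(hypergeometric_pmf(1, 40, 5, 3), 4))    # Example 3.17
print(round(hypergeometric_pmf(2, 20, 10, 5), 3))   # Example 3.18

# same proportion k/N = 0.25, but with N much larger than n
large_lot = hypergeometric_pmf(2, 2000, 10, 500)
with_replacement = binomial_pmf(2, 10, 0.25)
print(large_lot, with_replacement)
```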
Exercises
3.47.
Out of 10 construction facilities, 4 are in-state and 6 are out of state. Three
facilities are earmarked as test sites for a new technology. What is the prob-
ability that 2 out of 3 are out of state?
3.48.
A box contains 8 diodes, among them 3 are of new design. If 4 diodes are
picked randomly for a circuit, what is the probability that at least one is of
new design?
3.49.
There are 25 schools in a district, 10 of which are performing below standard.
Five schools are selected at random for an in-depth study. Find:
3.50.
A small division, consisting of 6 women and 4 men, picks “employee of the
month” for 3 months in a row. Suppose that, in fact, a random person is
picked each month. Let X be the number of times a woman was picked.
Calculate the distribution of X (make a table with all possible values), for
the cases
3.51.
A jar contains 50 red marbles and 30 blue marbles. Four marbles were
selected at random. Find the probability to obtain at least 3 red marbles, if
the sampling was
a) without replacement;
b) with replacement.
c) Compare the results.
M (t) = E (etY )
The expected values of powers of random variables are often called mo-
ments. For example, E (Y ) is the first moment of Y , and E (Y 2 ) is the second
moment of Y . When M (t) exists, it is differentiable in a neighborhood of
t = 0, and the derivatives may be taken inside the expectation. Thus,
$$M'(t) = \frac{dM(t)}{dt} = \frac{d}{dt} E[e^{tY}] = E\left[\frac{d}{dt} e^{tY}\right] = E[Y e^{tY}]$$
Now if we set t = 0, we have $M'(0) = E(Y)$. Going on to the second derivative,
$$M''(t) = E[Y^2 e^{tY}]$$
and hence $M''(0) = E(Y^2)$. In general, $M^{(k)}(0) = E(Y^k)$.
Theorem 3.8. Properties of MGF’s
a) Uniqueness: Let X and Y be two random variables with moment
generating functions MX (t) and MY (t), respectively. If MX (t) = MY (t)
for all values of t, in some neighborhood of 0, then X and Y have the
same probability distribution.
Example 3.19.
Evaluate the moment generating function for the geometric distribution
On the right, we have an infinite geometric series with first term $qe^t$ and the
ratio $qe^t$. Its sum is $\sum_{x=1}^{\infty} (qe^t)^x = \frac{qe^t}{1 - qe^t}$. We obtain
$$M(t) = p\,e^t\,\frac{1}{1 - qe^t}$$
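A quick numerical sanity check of the geometric MGF M(t) = p e^t / (1 − q e^t): its derivatives at 0 are approximated with central finite differences and compared with the known mean 1/p and variance q/p². The step size `h` and the function names are assumptions of this sketch; it does not replace the analytic derivation.

```python
from math import exp

def geometric_mgf(t, p):
    """M(t) = p e^t / (1 - q e^t), finite only while q e^t < 1."""
    q = 1 - p
    if q * exp(t) >= 1:
        raise ValueError("MGF is not finite for this t")
    return p * exp(t) / (1 - q * exp(t))

def first_two_moments(p, h=1e-4):
    """Approximate M'(0) and M''(0) by central finite differences."""
    m_plus, m_zero, m_minus = (geometric_mgf(t, p) for t in (h, 0.0, -h))
    first = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m_zero + m_minus) / (h * h)
    return first, second

p = 0.2
ey, ey2 = first_two_moments(p)
print(ey, ey2 - ey**2)  # compare with 1/p and q/p^2
```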
Exercises
3.52.
b) Apply the property (d) of Theorem 3.8 to calculate the MGF of the
Binomial distribution. [Hint: Binomial random variable Y with pa-
rameters n, p can be represented as Y = X1 + X2 + ... + Xn , where X’s
are independent and each has Bernoulli distribution with parameter p.]
3.54.
Apply the property (d) of Theorem 3.8 and Example 3.19 to calculate the
MGF of Negative Binomial distribution.
3.55.
Use the derivatives of MGF to calculate the mean and variance of geometric
distribution.
3.56.
Suppose that MGF of a random variable X was found equal to
$$M(t) = \frac{1}{1 - t^2}$$
Using the properties of MGF, find E (X) and E (X 2 ).
3.57. ?
Continuous probability
distributions
80 CHAPTER 4. CONTINUOUS PROBABILITY DISTRIBUTIONS
What does this actually mean? Since continuous probability functions are defined
for an infinite number of points over a continuous interval, the probability at a
single point is always zero. Probabilities are measured over intervals, not single
points. That is, the area under the curve between two distinct points defines the
probability for that interval. This means that the height of the probability func-
tion can in fact be greater than one. The property that the integral must equal
one is equivalent to the property for discrete distributions that the sum of all the
probabilities must equal one.
Example 4.1.
Suppose that the error in the reaction temperature, in ◦ C, for a controlled
laboratory experiment is a continuous random variable X having the density
$$f(x) = \begin{cases} \frac{x^2}{3} & \text{for } -1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
(a) Verify condition (b) of Definition 4.1.
(b) Find P (0 < X < 1).
4.1. CONTINUOUS RV AND THEIR PROB DIST 81
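Solution. (a) $\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{2} \frac{x^2}{3}\,dx = \left.\frac{x^3}{9}\right|_{-1}^{2} = \frac{8}{9} + \frac{1}{9} = 1$
(b) $P(0 < X < 1) = \int_{0}^{1} \frac{x^2}{3}\,dx = \left.\frac{x^3}{9}\right|_{0}^{1} = \frac{1}{9}$.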
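Both integrals in Example 4.1 can be checked numerically. The sketch below uses a plain midpoint rule; the helper `integrate` and its number of subintervals are choices made for this illustration, not part of the text.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def density(x):
    """f(x) = x^2 / 3 on [-1, 2], zero elsewhere (Example 4.1)."""
    return x * x / 3 if -1 <= x <= 2 else 0.0

total_area = integrate(density, -1, 2)   # should be close to 1
prob_0_to_1 = integrate(density, 0, 1)   # should be close to 1/9

print(total_area, prob_0_to_1)
```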
Example 4.2.
For the density function of Example 4.1, find F (x) and use it to evaluate
P (0 < X < 1).
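Solution. For −1 < x < 2, we have
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-1}^{x} \frac{t^2}{3}\,dt = \left.\frac{t^3}{9}\right|_{-1}^{x} = \frac{x^3 + 1}{9},$$
Therefore,
$$F(x) = \begin{cases} 0 & x \le -1 \\ \frac{x^3 + 1}{9} & \text{for } -1 < x < 2 \\ 1 & x \ge 2. \end{cases}$$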
1
Note that the same relation holds for discrete RV’s but in the continuous case P (a ≤
X ≤ b), P (a < X ≤ b) and P (a < X < b) are all the same. Why?
Example 4.3.
The time X in months until failure of a certain product has the PDF
$$f(x) = \begin{cases} \frac{3x^2}{64} \exp\left(-\frac{x^3}{64}\right) & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Example 4.4.
For each of the following functions,
(i) find the constant c so that f (x) is a PDF of a random variable X, and
(ii) find the distribution function F (x).
a) $f(x) = \frac{x^3}{4}$ for 0 < x < c, and 0 elsewhere
b) $f(x) = \frac{3}{16}x^2$ for −c < x < c, and 0 elsewhere
c) $f(x) = 4x^c$ for 0 < x < 1, and 0 elsewhere
d) $f(x) = \frac{c}{x^{3/4}}$ for 0 < x < 1, and 0 elsewhere
Answers. a) c = 2 and $F(x) = \frac{x^4}{16}$, 0 < x < 2.
b) c = 2 and $F(x) = \frac{x^3}{16} + \frac{1}{2}$, −2 < x < 2.
Example 4.5.
The life length of batteries X (in hundreds of hours) has the density
$$f(x) = \begin{cases} \frac{1}{2} e^{-x/2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability that the life of a battery of this type is less than 200 or
greater than 400 hours.
Solution. Let A denote the event that X is less than 2, and let B denote the
event that X is greater than 4. Then
$$P(A \cup B) = P(A) + P(B) \ \text{(why?)} = \int_{0}^{2} \frac{1}{2} e^{-x/2}\,dx + \int_{4}^{\infty} \frac{1}{2} e^{-x/2}\,dx$$
Example 4.6.
Refer to Example 4.5. Find the probability that a battery of this type lasts
more than 300 hours, given that it has already been in use for more than
200 hours.
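Solution. We are interested in P(X > 3 | X > 2); by the definition of
conditional probability,
$$P(X > 3 \mid X > 2) = \frac{P(X > 3,\ X > 2)}{P(X > 2)} = \frac{P(X > 3)}{P(X > 2)}$$
because the intersection of the events (X > 3) and (X > 2) is the event
(X > 3). Now
$$\frac{P(X > 3)}{P(X > 2)} = \frac{\int_{3}^{\infty} \frac{1}{2} e^{-x/2}\,dx}{\int_{2}^{\infty} \frac{1}{2} e^{-x/2}\,dx} = \frac{e^{-3/2}}{e^{-1}} = e^{-1/2} = 0.606$$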
Exercises
4.1.
The lifetime of a vacuum cleaner, in years, is described by
$$f(x) = \begin{cases} x/4 & \text{for } 0 < x < 2 \\ (4 - x)/4 & \text{for } 2 \le x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
4.2.
The proportion of warehouse items claimed within 1 month is given by a
random variable X with density
$$f(x) = \begin{cases} c(x + 1) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
4.3.
The demand for an antibiotic from a local pharmacy is given by a random
variable X with CDF
$$F(x) = \begin{cases} 1 - \frac{2500}{(x + 50)^2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
4.4.
The waiting time, in minutes, between customers coming into a store is a
continuous random variable with CDF
(
0 for x < 0
F (x) =
1 − exp (−x/2) for x ≥ 0
Find the probability of waiting less than 1.5 minutes between successive
customers
a) using the cumulative distribution of X;
b) using the probability density function of X (first, you have to find it).
4.5.
A continuous random variable X that has a density function given by
$$f(x) = \begin{cases} \frac{1}{5} & \text{for } -1 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Example 4.7.
Suppose that X has density function given by
(
3x2 for 0 ≤ x ≤ 1
f (x) =
0 elsewhere
(a) Find the mean and variance of X
(b) Find mean and variance of u(X) = 4X + 3.
Solution. (a) From the above definitions,
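$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{1} x(3x^2)\,dx = \int_{0}^{1} 3x^3\,dx = 3\left.\frac{x^4}{4}\right|_{0}^{1} = \frac{3}{4} = 0.75$$
Now, $E(X^2) = \int_{0}^{1} x^2(3x^2)\,dx = \int_{0}^{1} 3x^4\,dx = 3\left.\frac{x^5}{5}\right|_{0}^{1} = \frac{3}{5} = 0.6$
Hence, $\sigma^2 = E(X^2) - \mu^2 = 0.6 - (0.75)^2 = 0.6 - 0.5625 = 0.0375$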
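The moments in Example 4.7 can be verified numerically, and part (b) then follows from the linear rules E(aX + b) = aE(X) + b and V(aX + b) = a²V(X). The midpoint-rule helper `integrate` below is an assumption of this sketch.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def density(x):
    """f(x) = 3x^2 on [0, 1] (Example 4.7)."""
    return 3 * x * x

mean_x = integrate(lambda x: x * density(x), 0, 1)
second_moment = integrate(lambda x: x * x * density(x), 0, 1)
var_x = second_moment - mean_x**2

# part (b): u(X) = 4X + 3, using the linear rules
mean_u = 4 * mean_x + 3
var_u = 4**2 * var_x

print(mean_x, var_x)
print(mean_u, var_u)
```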
                        Discrete               Continuous
Probability             p(x) = P(X = x)        f(x) = d/dx P(X ≤ x) = F'(x);
                                               P(X = x) is 0 for any x
CDF                     F(x) = P(X ≤ x), with P(a < X ≤ b) = F(b) − F(a)
Mean                    Σ x p(x)               ∫ x f(x) dx
  E(X) = µ_X
Mean of a function      Σ g(x) p(x)            ∫ g(x) f(x) dx
  E g(X)
Variance                Σ (x − µ)² p(x)        ∫ (x − µ)² f(x) dx
  σ_X² = E(X²) − µ²
Exercises
4.6.
For the density described in Exercise 4.2, find the mean and standard devi-
ation of X.
4.7.
For a random variable X with the density
$$f(x) = \begin{cases} \frac{1}{2\sqrt{x}} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(b) $$P(X \ge 3) = \int_{3}^{4} \frac{1}{4}\,dx = \frac{1}{4}.$$
Example 4.9.
The failure of a circuit board interrupts work by a computing system until
a new board is delivered. Delivery time X is uniformly distributed over the
interval of at least one but no more than five days. The cost C of this failure
and interruption consists of a fixed cost C0 for the new part and a cost that
increases proportionally to X 2 , so that
$$C = C_0 + C_1 X^2$$
(a) Find the probability that the delivery time is two or more days.
(b) Find the expected cost of a single failure, in terms of C0 and C1 .
4.3. UNIFORM DISTRIBUTION 91
Solution. a)
$$f(x) = \begin{cases} \frac{1}{4} & \text{for } 1 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$$
Thus,
$$P(X \ge 2) = \int_{2}^{5} \frac{1}{4}\,dx = \frac{1}{4}(5 - 2) = \frac{3}{4}$$
b) We know that
$$E(C) = C_0 + C_1 E(X^2)$$
so it remains for us to find $E(X^2)$. This value could be found directly from
the definition or by using the variance and the fact that $E(X^2) = V(X) + \mu^2$.
Using the latter approach, we find
$$E(X^2) = \frac{(b - a)^2}{12} + \left(\frac{a + b}{2}\right)^2 = \frac{(5 - 1)^2}{12} + \left(\frac{1 + 5}{2}\right)^2 = \frac{31}{3}$$
Thus, $E(C) = C_0 + \frac{31}{3} C_1$.
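The solution mentions two routes to E(X²): through the variance, or directly from the definition. Both are sketched below with exact fractions; the function names and the sample values passed to `expected_cost` are illustrative.

```python
from fractions import Fraction

def uniform_second_moment(a, b):
    """E(X^2) for Uniform[a, b] via E(X^2) = V(X) + mu^2."""
    a, b = Fraction(a), Fraction(b)
    return (b - a) ** 2 / 12 + ((a + b) / 2) ** 2

def uniform_second_moment_direct(a, b):
    """Same quantity from the definition: integral of x^2 / (b - a) over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return (b**3 - a**3) / (3 * (b - a))

def expected_cost(c0, c1, a=1, b=5):
    """E(C) = C0 + C1 * E(X^2) for delivery time X ~ Uniform[a, b]."""
    return c0 + c1 * uniform_second_moment(a, b)

print(uniform_second_moment(1, 5), uniform_second_moment_direct(1, 5))
print(expected_cost(100, 3))  # hypothetical C0 = 100, C1 = 3
```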
Exercises
4.12.
For a digital measuring device, rounding errors have Uniform distribution,
between −0.05 and 0.05 mm.
a) Find the probability that the rounding error is between −0.01 and
0.03mm
b) Find the expected value and the standard deviation of the rounding
error.
c) Calculate and plot the CDF of the rounding errors.
4.13.
The capacitances of “1 µF” (microfarad) capacitors are, in fact, Uniform[0.95, 1.05]
µF.
4.14.
For X having a Uniform[−1, 4] distribution, find the mean and variance.
Then, use the formula for variance and a little algebra to find E (X 2 ).
4.15.
Suppose the radii of spheres R have a uniform distribution on [2, 3]. Find
the mean volume ($V = \frac{4}{3}\pi R^3$). Find the mean surface area ($A = 4\pi R^2$).
$\mu = \beta$ and $\sigma^2 = \beta^2$.
The distribution function for the exponential distribution has the simple
form:
$$F(t) = P(X \le t) = \int_{0}^{t} \frac{1}{\beta} e^{-x/\beta}\,dx = 1 - e^{-t/\beta} \quad \text{for } t \ge 0$$
$$r(t) = \frac{f(t)}{1 - F(t)}, \quad t > 0 \qquad (4.2)$$
Suppose that X, with density f , is a lifetime of an item. Consider the proportion
of items currently alive (at the time t) that will fail in the next time interval
(t, t + ∆t], where ∆t is small. Thus, by the conditional probability formula,
Note that the failure rate $\lambda = 1/\beta$ of an item with exponential lifetime does not
depend on the item’s age. This is known as the memoryless property of exponential
distribution. The exponential distribution is the only continuous distribution to
have a constant failure rate.
In reliability studies, the mean of a positive-valued distribution, is also called
Mean Time To Fail or MTTF. So, we have exponential MTTF = β.
$$pois(0, \lambda t) = \frac{e^{-\lambda t}(\lambda t)^0}{0!} = e^{-\lambda t}.$$
Thus, if the time of first failure is denoted X, then
P (X ≤ t) = 1 − P (X > t) = 1 − e−λt
We see that P (X ≤ t) = F (t), the CDF for X, has the form of an exponential
CDF. Here, $\lambda = 1/\beta$ is again the failure rate. Upon differentiating, we see that
the density of X is given by
Example 4.10.
A downtime due to equipment failure is estimated to have Exponential dis-
tribution with the mean β = 6 hours. What is the probability that the next
downtime will last between 5 and 10 hours?
Solution. $P(5 < X < 10) = F(10) - F(5) = [1 - \exp(-10/6)] - [1 - \exp(-5/6)] = 0.2457$
Example 4.11.
The number of calls to the call center has a Poisson distribution with parameter
λ = 4 calls per minute. What is the probability that we have to wait more
than 20 seconds for the next call?
Solution. The waiting time between calls, X, has an exponential distribution
with parameter $\beta = 1/\lambda = 1/4$ minute. Since 20 seconds is 1/3 of a minute,
$P(X > 1/3) = 1 - F(1/3) = e^{-4/3} = 0.2636$.
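As a quick numerical check of Examples 4.10 and 4.11, the exponential CDF can be evaluated directly. This is only a sketch: the function name `exp_cdf` is made up for illustration and is not part of the text.

```python
import math

def exp_cdf(t, beta):
    """CDF of the exponential distribution with mean beta: F(t) = 1 - exp(-t/beta)."""
    return 1.0 - math.exp(-t / beta) if t >= 0 else 0.0

# Example 4.10: mean beta = 6 hours, P(5 < X < 10) = F(10) - F(5)
p_downtime = exp_cdf(10, 6) - exp_cdf(5, 6)

# Example 4.11: rate 4 calls per minute, so beta = 1/4 minute; 20 s = 1/3 minute
p_wait = 1.0 - exp_cdf(1 / 3, 1 / 4)

print(round(p_downtime, 4), round(p_wait, 4))   # 0.2457 0.2636
```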
Exercises
4.16.
Prove another version of the memoryless property of the exponential distri-
bution,
P (X > t + s | X > t) = P (X > s).
Thus, an item that is t years old has the same probabilistic properties as a
brand-new item. [Hint: Use the definition of conditional probability and the
expression for exponential CDF.]
4.17.
The 1-hour carbon monoxide concentrations in a big city are found to have
an exponential distribution with a mean of 3.6 parts per million (ppm).
(a) Find the probability that a concentration will exceed 9 ppm.
(b) A traffic control policy is trying to reduce the average concentration.
Find the new target mean β so that the probability in part (a) will
equal 0.01
(c) The median of a probability distribution is defined as the solution m to the
equation (F is the CDF)
F (m) = 0.5
Find the median of the concentrations from part (a).
4.5. THE GAMMA DISTRIBUTION 95
4.18.
Customers come to a barber shop as a Poisson process with the frequency of
3 per hour. Suppose Y1 is the time when the first customer comes.
a) Find the expected value and the standard deviation of Y1.
b) Find the probability that the shop is idle for at least the first 30 minutes
after opening.
$\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$.
[Figure: Gamma densities f(x) on 0 ≤ x ≤ 10 for α = 0.5, 1, 2, 5.]
[Figure: event times marked as points along a time axis.]
The Poisson process was first studied² in the 1900s when modeling the
observation times of radioactive particles recorded by a Geiger counter. It consists
of the consecutive event times Y1, Y2, ... such that the interarrival times
X1 = Y1, X2 = Y2 − Y1, ... have independent Exponential distributions. (The
observations start at the time t = 0.)
From the property (c) above, the kth event time has a Gamma distribution
with α = k. As in Section 4.4, the number of particles to appear
during [0, t) has a Poisson distribution with the mean µ = λt, where the rate
λ = 1/β.
The same way, the number of events on any given interval of time, say,
(t1 , t2 ] follows the Poisson distribution with the mean µ = λ(t2 − t1 ). Thus,
the expected number of events to be observed equals the intensity times the
length of the observation period.
Note the units: if the rate λ is measured in events per hour (say), that
is, the unit is hours$^{-1}$, then the mean time between events is measured in
hours.
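The relationship between exponential interarrival times and Poisson counts can be illustrated by simulation. The sketch below assumes an illustrative rate of 3 events per hour observed over 2 hours; the seed, the sample size, and the helper name `count_events` are arbitrary choices, not taken from the text.

```python
import random

def count_events(rate, t_end, rng):
    """Count events in [0, t_end] when interarrival times are Exponential with mean 1/rate."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(rate)   # next interarrival time
        if t > t_end:
            return n
        n += 1

rng = random.Random(1)               # fixed seed, chosen arbitrarily
rate, t_end = 3.0, 2.0               # illustrative: 3 events per hour, 2 hours
counts = [count_events(rate, t_end, rng) for _ in range(20000)]
mean_count = sum(counts) / len(counts)
print(mean_count)                    # should be close to rate * t_end = 6
```

The average count should land near λ(t2 − t1), the intensity times the length of the observation period.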
The Gamma CDF (for integer α) can be derived using this relationship.
Suppose Yk is the time to wait for kth event. Then it is Gamma (α = k,
β) random variable. On one hand, the probability that this event happens
before time t is the CDF F (t). On the other hand, this will happen if and
only if there is a total of at least k events on the interval [0, t]:
² Not by Poisson!
98 CHAPTER 4. CONTINUOUS PROBABILITY DISTRIBUTIONS
[Figure: a time axis with the points 0, Y1, Y2, t, Y3 marked in that order.]
Figure 4.4: Illustration of the principle “$Y_k \le t$ if and only if $N(t) \ge k$”, here
k = 2.
Here, N(t) is the number of events on the interval [0, t]. By the properties of the Poisson
process, N(t) has a Poisson distribution with the mean µ = λt = t/β. Thus,
\[ P(Y_k \le t) = P(N \ge k) = 1 - P(N < k) = 1 - e^{-t/\beta} \sum_{i=0}^{k-1} \frac{(t/\beta)^i}{i!} \tag{4.4} \]
Example 4.13.
For the situation in Example 4.12, find the probability that the total monthly
rainfall exceeds 5 inches.
Solution. P(Y > 5) = 1 − F(5) = 1 − (1 − P(N < k)) = P(N < k), where
k = α = 2. Equation 4.4 yields $P(Y > 5) = e^{-5/1.6}(1 + 5/1.6) = 0.181$.
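Equation 4.4 is easy to evaluate numerically. The sketch below computes the Gamma CDF for integer α from the Poisson sum and compares it with a direct midpoint-rule integral of the Gamma density, assumed here to have the usual form $x^{k-1} e^{-x/\beta} / ((k-1)!\,\beta^k)$. The function names are illustrative only.

```python
import math

def gamma_cdf_integer_shape(t, k, beta):
    """P(Y_k <= t) from Equation 4.4, valid for integer shape alpha = k."""
    mu = t / beta
    poisson_below_k = sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))
    return 1.0 - poisson_below_k

def gamma_cdf_numeric(t, k, beta, steps=100000):
    """Midpoint-rule integral of the (assumed) Gamma(k, beta) density over [0, t]."""
    h = t / steps
    norm = math.factorial(k - 1) * beta**k
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += x**(k - 1) * math.exp(-x / beta) / norm
    return total * h

# Example 4.13: alpha = 2, beta = 1.6, P(Y > 5)
p_exceed = 1.0 - gamma_cdf_integer_shape(5, 2, 1.6)
print(round(p_exceed, 3))   # 0.181
```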
Exercises
4.19.
Customers come to a barber shop with the frequency of 3 per hour. Suppose
Y4 is the time when 4th customer has come.
4.20.
Differentiate Equation 4.4 for k = 2 to show that you indeed will get the
Gamma density function with α = 2.
4.21.
A truck has 2 spare tires. Under intense driving conditions, tire blowouts are
determined to approximately follow a Poisson process with the intensity of
1.2 per 100 miles. Let X be the total distance the truck can go with 2 spare
tires.
Definition 4.8.
[Figure: Normal densities f(x) for (µ, σ) = (−1, 1), (0, 1), (2, 3), (5, 0.5), plotted for x from −2 to 10.]
Usefulness of Z
We are able to transform the observations of any normal random variable X
to a new set of observations of a standard normal random variable Z. This
can be done by means of the transformation
\[ Z = \frac{X - \mu}{\sigma}. \]
The values of the CDF of Z can be obtained from Table A. Namely,
\[ F(z) = \begin{cases} 0.5 + TA(z), & z \ge 0 \\ 0.5 - TA(|z|), & z < 0 \end{cases} \]
where TA(z) = P(0 < Z < z) denotes the table area of z. The second equation
follows from the symmetry of the Z distribution.
Table A allows us to calculate probabilities and percentiles associated
with normal random variables, since the normal density has no closed-form
antiderivative and cannot be integrated directly.
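For readers working without the printed table, the table area can be computed from the error function, using the identity TA(z) = ½ erf(z/√2). The sketch below mirrors the two-case formula for F(z); the function names are chosen for illustration.

```python
import math

def table_area(z):
    """TA(z) = P(0 < Z < z) for z >= 0, computed from the error function."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def std_normal_cdf(z):
    """F(z) built from TA exactly as in the two-case formula."""
    return 0.5 + table_area(z) if z >= 0 else 0.5 - table_area(abs(z))

print(round(table_area(1.0), 4))        # 0.3413, the Table A entry for z = 1.00
print(round(std_normal_cdf(-1.5), 4))   # 0.0668
```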
4.6. NORMAL DISTRIBUTION 101
Example 4.14.
If Z denotes a standard normal variable, find
(a) P (Z ≤ 1) (b) P (Z > 1) (c) P (Z < −1.5) (d) P (−1.5 ≤ Z ≤ 0.5).
(e) Find a number, say z0 , such that P (0 ≤ Z ≤ z0 ) = 0.49
Solution. This example provides practice in using Normal probability Table.
We see that
a) P (Z ≤ 1) = P (Z ≤ 0) + P (0 ≤ Z ≤ 1) = 0.5 + 0.3413 = 0.8413.
b) P (Z > 1) = 0.5 − P (0 ≤ Z ≤ 1) = 0.5 − 0.3413 = 0.1587
c) P (Z < −1.5) = P (Z > 1.5) = 0.5 − P (0 ≤ Z ≤ 1.5) = 0.5 − 0.4332 =
0.0668.
d) P (−1.5 ≤ Z ≤ 0.5) = P (−1.5 ≤ Z ≤ 0) + P (0 ≤ Z ≤ 0.5)
= P (0 ≤ Z ≤ 1.5) + P (0 ≤ Z ≤ 0.5) = 0.4332 + 0.1915 = 0.6247.
e) To find the value of z0 we must look for the given probability of 0.49
on the area side of Normal probability Table. The closest we can come
is at 0.4901, which corresponds to a Z value of 2.33. Hence z0 = 2.33.
Example 4.15.
For $X \sim N(50, 10^2)$, find the probability that X is between 45 and 62.
Solution. The Z-values corresponding to X = 45 and X = 62 are
\[ Z_1 = \frac{45 - 50}{10} = -0.5 \quad \text{and} \quad Z_2 = \frac{62 - 50}{10} = 1.2. \]
Therefore, P(45 ≤ X ≤ 62) = P(−0.5 ≤ Z ≤ 1.2) = TA(1.2) + TA(0.5) =
0.3849 + 0.1915 = 0.5764.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
.0 .0000 .0040 .0080 .0120 .0160 .0199 .0239 .0279 .0319 .0359
.1 .0398 .0438 .0478 .0517 .0557 .0596 .0636 .0675 .0714 .0753
.2 .0793 .0832 .0871 .0910 .0948 .0987 .1026 .1064 .1103 .1141
.3 .1179 .1217 .1255 .1293 .1331 .1368 .1406 .1443 .1480 .1517
.4 .1554 .1591 .1628 .1664 .1700 .1736 .1772 .1808 .1844 .1879
.5 .1915 .1950 .1985 .2019 .2054 .2088 .2123 .2157 .2190 .2224
.6 .2257 .2291 .2324 .2357 .2389 .2422 .2454 .2486 .2517 .2549
.7 .2580 .2611 .2642 .2673 .2704 .2734 .2764 .2794 .2823 .2852
.8 .2881 .2910 .2939 .2967 .2995 .3023 .3051 .3078 .3106 .3133
.9 .3159 .3186 .3212 .3238 .3264 .3289 .3315 .3340 .3365 .3389
1.0 .3413 .3438 .3461 .3485 .3508 .3531 .3554 .3577 .3599 .3621
1.1 .3643 .3665 .3686 .3708 .3729 .3749 .3770 .3790 .3810 .3830
1.2 .3849 .3869 .3888 .3907 .3925 .3944 .3962 .3980 .3997 .4015
1.3 .4032 .4049 .4066 .4082 .4099 .4115 .4131 .4147 .4162 .4177
1.4 .4192 .4207 .4222 .4236 .4251 .4265 .4279 .4292 .4306 .4319
1.5 .4332 .4345 .4357 .4370 .4382 .4394 .4406 .4418 .4429 .4441
1.6 .4452 .4463 .4474 .4484 .4495 .4505 .4515 .4525 .4535 .4545
1.7 .4554 .4564 .4573 .4582 .4591 .4599 .4608 .4616 .4625 .4633
1.8 .4641 .4649 .4656 .4664 .4671 .4678 .4686 .4693 .4699 .4706
1.9 .4713 .4719 .4726 .4732 .4738 .4744 .4750 .4756 .4761 .4767
2.0 .4772 .4778 .4783 .4788 .4793 .4798 .4803 .4808 .4812 .4817
2.1 .4821 .4826 .4830 .4834 .4838 .4842 .4846 .4850 .4854 .4857
2.2 .4861 .4864 .4868 .4871 .4875 .4878 .4881 .4884 .4887 .4890
2.3 .4893 .4896 .4898 .4901 .4904 .4906 .4909 .4911 .4913 .4916
2.4 .4918 .4920 .4922 .4925 .4927 .4929 .4931 .4932 .4934 .4936
2.5 .4938 .4940 .4941 .4943 .4945 .4946 .4948 .4949 .4951 .4952
2.6 .4953 .4955 .4956 .4957 .4959 .4960 .4961 .4962 .4963 .4964
2.7 .4965 .4966 .4967 .4968 .4969 .4970 .4971 .4972 .4973 .4974
2.8 .4974 .4975 .4976 .4977 .4977 .4978 .4979 .4979 .4980 .4981
2.9 .4981 .4982 .4982 .4983 .4984 .4984 .4985 .4985 .4986 .4986
3.0 .4987 .4987 .4987 .4988 .4988 .4989 .4989 .4989 .4990 .4990
Example 4.16.
Given a random variable X having a normal distribution with µ = 300 and
σ = 50, find the probability that X is greater than 362.
Solution. To find P (X > 362), we need to evaluate the area under the normal
curve to the right of x = 362. This can be done by transforming x = 362 to
the corresponding Z-value. We get
\[ z = \frac{x - \mu}{\sigma} = \frac{362 - 300}{50} = 1.24 \]
Hence P (X > 362) = P (Z > 1.24) = P (Z < −1.24) = 0.5 − TA(1.24) =
0.1075.
Example 4.17.
A diameter X of a shaft produced has a normal distribution with parameters
µ = 1.005, σ = 0.01. The shaft will meet specifications if its diameter is
between 0.98 and 1.02 cm. What percentage of shafts will not meet
specifications?
Solution.
\[ 1 - P(0.98 < X < 1.02) = 1 - P\left(\frac{0.98 - 1.005}{0.01} < Z < \frac{1.02 - 1.005}{0.01}\right) = 1 - P(-2.5 < Z < 1.5) \]
\[ = 1 - (0.4938 + 0.4332) = 0.0730, \]
that is, about 7.3% of shafts.
The pth percentile of a random variable X is the point q that leaves the area of
p/100 to the left. That is, q is the solution of the equation
\[ P(X \le q) = p/100 \]
For example, the median (introduced in Exercise 4.17) is the 50th percentile
of a probability distribution.
We will discuss how to find percentiles of the normal distribution. The
previous two examples were solved by going first from a value of x to a z-value
and then computing the desired area. In the next example we reverse the
process and begin with a known area, find the z-value, and then determine
x by rearranging the equation $z = \frac{x - \mu}{\sigma}$ to give
\[ x = \mu + \sigma z \]
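The same reversal can be sketched in code: find the z-value for a given left-tail area, then apply x = µ + σz. The example relies on `statistics.NormalDist` from the Python standard library (version 3.8 or later); the values µ = 500, σ = 100 and the function name are illustrative assumptions.

```python
from statistics import NormalDist

def normal_percentile(p, mu, sigma):
    """Return the p-th percentile of N(mu, sigma^2) via x = mu + sigma * z."""
    z = NormalDist().inv_cdf(p / 100.0)   # z-value leaving area p/100 to the left
    return mu + sigma * z

x90 = normal_percentile(90, 500, 100)
print(round(x90, 1))
```

Because the inverse CDF is computed to full precision, the result can differ slightly from a value obtained with a two-decimal z from the table.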
Using the Normal Table calculations, it is straightforward to show the following.
The famous 68% - 95% rule
For a Normal population, 68% of all values lie in the interval [µ − σ, µ + σ],
and 95% lie in [µ − 2σ, µ + 2σ].
In addition, 99.7% of the population lies in [µ − 3σ, µ + 3σ].
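These percentages can be checked numerically. For any Normal distribution, the probability of falling within k standard deviations of the mean equals erf(k/√2); the short sketch below (function name chosen for illustration) evaluates it for k = 1, 2, 3.

```python
import math

def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any Normal X."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The exact values are slightly different from the rounded 68%, 95% and 99.7%; in particular, exactly 95% corresponds to about 1.96σ rather than 2σ.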
Example 4.18.
Using the situation in Example 4.17, a diameter X of a shaft had µ =
1.005, σ = 0.01. Give an interval that would contain 95% of all diameters.
Solution. The interval is µ ± 2σ = 1.005 ± 2(0.01), that is, from 0.985 to
1.025.
Example 4.19.
The SAT Math exam is scaled to have the average of 500 points, and the
standard deviation of 100 points. What is the cutoff score for top 10% of the
SAT takers?
Solution. In this example we begin with a known area, find the z-value, and
then find x from the formula x = µ + σz. The 90th percentile corresponds to
the 90% area under the normal curve to the left of x. Thus, we require
a z-value that leaves an area of 0.9 to the left and hence a Table Area of 0.4.
From Table A, P(0 < Z < 1.28) = 0.3997. Hence x = 500 + 100(1.28) = 628.
Example 4.20.
Let X = monthly sick leave time have normal distribution with parameters
µ = 200 hours and σ = 20 hours.
a) What percentage of months will have sick leave below 150 hours?
b) What amount of time x0 should be budgeted for sick leave so that the
budget will not be exceeded with 80% probability?
X = µ + σZ
Example 4.21.
Suppose X is Binomial with parameters n = 15 and p = 0.4; then µ = np =
(15)(0.4) = 6 and $\sigma^2 = npq = 15(0.4)(0.6) = 3.6$. Suppose we are interested
in the probability that X assumes a value from 7 to 9 inclusive, that is,
P(7 ≤ X ≤ 9). The exact probability is given by
\[ P(7 \le X \le 9) = \sum_{x=7}^{9} \mathrm{bin}(x; 15, 0.4) = 0.1771 + 0.1181 + 0.0612 = 0.3564 \]
For the Normal approximation we find the area between x1 = 6.5 and x2 = 9.5
using z-values, which are
\[ z_1 = \frac{x_1 - np}{\sqrt{npq}} = \frac{x_1 - \mu}{\sigma} = \frac{6.5 - 6}{1.897} = 0.26, \]
and
\[ z_2 = \frac{9.5 - 6}{1.897} = 1.85 \]
Adding or removing 0.5 is called continuity correction. It arises when we try
to approximate a distribution with integer values (here, Binomial) through
the use of a continuous distribution (here, Normal). Shown in Fig.4.7, the
sum over the discrete set {7 ≤ X ≤ 9} is approximated by the integral of
the continuous density from 6.5 to 9.5.
[Figure 4.7: the sum over {7 ≤ X ≤ 9} approximated by the area under the Normal density from 6.5 to 9.5.]
Now, P(7 ≤ X ≤ 9) ≈ P(0.26 < Z < 1.85) = TA(1.85) − TA(0.26) = 0.4678 − 0.1026 = 0.3652;
therefore, the normal approximation provides a value that agrees closely
with the exact value of 0.3564. The degree of accuracy depends on both n
and p. The approximation is very good when n is large and p is not too
near 0 or 1.
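The comparison in Example 4.21 can be reproduced with a few lines of code. The sketch below computes the exact binomial sum and the Normal approximation with continuity correction; helper names are illustrative. Since the code does not round the z-values to two decimals, its approximate value can differ a little from a table-based one.

```python
import math

def binom_pmf(x, n, p):
    """Binomial probability bin(x; n, p)."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

n, p = 15, 0.4
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

exact = sum(binom_pmf(x, n, p) for x in range(7, 10))
# Continuity correction: integrate the Normal density from 6.5 to 9.5
approx = normal_cdf(9.5, mu, sigma) - normal_cdf(6.5, mu, sigma)
print(round(exact, 4), round(approx, 4))
```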
Example 4.22.
The probability that a patient recovers from a rare blood disease is 0.4. If
100 people are known to have contracted this disease, what is the probability
that at most 30 survive?
Solution. Let the binomial variable X represent the number of patients that
survive. Since n = 100 and p = 0.4, we have
\[ \mu = np = (100)(0.4) = 40 \]
and
\[ \sigma^2 = npq = (100)(0.4)(0.6) = 24, \]
so $\sigma = \sqrt{\sigma^2} = 4.899$. To obtain the desired probability, we compute the z-value
for x = 30.5 (applying the continuity correction to X ≤ 30). Thus,
\[ z = \frac{x - \mu}{\sigma} = \frac{30.5 - 40}{4.899} = -1.94, \]
and the probability of at most 30 of the 100 patients surviving is
P(X ≤ 30) ≈ P(Z < −1.94) = 0.5 − 0.4738 = 0.0262.
Example 4.23.
A fair coin (p = 0.5) is tossed 10,000 times, and the number of Heads X is
recorded. What are the values that contain X with 95% certainty?
Solution.
We have µ = np = 10,000(0.5) = 5,000 and $\sigma = \sqrt{10{,}000(0.5)(1 - 0.5)} = 50$.
We need to find x1 and x2 so that P(x1 ≤ X ≤ x2) = 0.95. Since the mean of
X is large, we will neglect the continuity correction.
Since we will be working with the Normal approximation, let’s find z1 and z2
such that
\[ P(z_1 \le Z \le z_2) = 0.95 \]
The solution is not unique, but we can choose the values of z1, z2 that are
symmetric about 0. This will mean finding z such that P(0 < Z < z) = 0.475.
Using Normal tables “in reverse” we will get z = 1.96. Thus, P(−1.96 <
Z < 1.96) = 0.95.
Next, transforming back into X, use the formula x = µ + σz, so
\[ x_1 = 5000 - 1.96(50) = 4902 \quad \text{and} \quad x_2 = 5000 + 1.96(50) = 5098. \]
Thus, with a large likelihood, our Heads count will be within 100 of the
expected value of 5,000.
This is an example of the famous “2 sigma” rule.
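The steps of Example 4.23 can be sketched in code, again using `statistics.NormalDist` (Python 3.8+) in place of the reverse table lookup. The continuity correction is neglected, as in the example.

```python
import math
from statistics import NormalDist

n, p = 10_000, 0.5
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# Symmetric 95% interval: area 0.475 between 0 and z, so 0.975 to the left of z
z = NormalDist().inv_cdf(0.975)
x1, x2 = mu - z * sigma, mu + z * sigma
print(round(z, 2), round(x1), round(x2))   # 1.96 4902 5098
```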
Exercises
4.22.
Given a standard normal distribution Z, find
4.23.
Given a normal distribution with µ = 30 and σ = 6, find
4.24.
Given the normally distributed variable X with mean 18 and standard devi-
ation 2.5, find
a) P (X < 15)
b) the value of k such that P (X < k) = 0.2236
c) the value of k such that P (X > k) = 0.1814
d) P (17 < X < 21).
4.25.
A soft drink machine is regulated so that it discharges an average of 200
milliliters (ml) per cup. If the amount of drink is normally distributed with
a standard deviation equal to 15 ml,
a) what fraction of the cups will contain more than 224 ml?
b) what is the probability that a cup contains between 191 and 209 milliliters?
c) how many cups will probably overflow if 230 ml cups are used for the
next 1000 drinks?
d) below what value do we get the smallest 25% of the drinks?
4.26.
A company pays its employees an average wage of $15.90 an hour with a
standard deviation of $1.50. If the wages were approximately normally dis-
tributed and paid to the nearest cent,
4.27.
A solar panel produces, on average, 34.5 kWh (kilowatt-hours) per month,
with standard deviation of 2.5 kWh.
a) Find the probability that the panel output will be between 35 and 38
kWh in a month.
b) Find an interval, symmetric about the mean (that is, [µ − a, µ + a] for
some a), that contains 72% of monthly kWh values.
4.28.
The likelihood that a job application will result in an interview is estimated
as 0.1. A grad student has mailed 40 applications. Find the probability that
she will get at least 3 interviews,
a) Using the Normal approximation.
b) Using the Poisson approximation.
c) Find the exact probability. Which approximation has worked better?
Why?
4.29.
It is estimated that 33% of individuals in a population of Atlantic puffins have
a certain recessive gene. If 90 individuals are caught, estimate the probability
that there will be between 30 and 40 (inclusive) with the recessive gene.
The Weibull distribution with γ > 1 typically has an asymmetric shape with
a peak in the middle and the long right “tail”. Shapes of Weibull density are
shown in Fig. 4.8 for various values of γ.
Figure 4.8: Weibull densities f(x) for γ = 1, 2, 5, all with β = 1
Exercises
4.30.
The time it takes for a server to respond to a request is modeled by the
Weibull distribution with γ = 2/3 and β = 15 milliseconds.
4.31.
The lifetime of refrigerators is assumed to follow Weibull distribution with
parameters β = 7 years and γ = 4.
Find:
when the integral exists. For the exponential distribution, this becomes
\[ M(t) = \int_0^\infty e^{tx}\,\frac{1}{\beta} e^{-x/\beta}\,dx = (1 - \beta t)^{-1}, \quad t < 1/\beta \]
For properties of MGFs, see Section 3.9.
Chapter 5
If X and Y are two discrete random variables, the probability that X equals
x while Y equals y is described by p(x, y) = P (X = x, Y = y). That is, the
function p(x, y) describes the probability behavior of the pair X, Y .
114 CHAPTER 5. JOINT PROBABILITY DISTRIBUTIONS
c) P(X = x, Y = y) = p(x, y).
For any region A in the xy-plane, $P[(X, Y) \in A] = \sum_{(x,y) \in A} p(x, y)$.
Example 5.1.
If two dice are rolled independently, then the numbers X and Y on the first
and second die, respectively, will each have marginal PMF p(x) = 1/6 for
x = 1, 2, ..., 6.
The joint PMF is p(x, y) = 1/36, so that $p(x) = \sum_{y=1}^{6} p(x, y)$.
Example 5.2.
Consider X = person’s age and Y = income. The data are abridged from the
US Current Population Survey. For the purposes of this example, we replace
the age and income groups by their midpoints. For example, the first row
represents ages 25-34 and the first column represents incomes $0-$10,000.
Y, income
5 20 40 60 85 Total
X, age 30 0.049 0.116 0.084 0.039 0.032 0.320
40 0.042 0.093 0.081 0.045 0.061 0.322
50 0.047 0.102 0.084 0.053 0.072 0.358
Total 0.139 0.310 0.249 0.137 0.165 1.000
Here, the joint PMF is given inside the table and the marginal PMF’s of X
and Y are row and column totals, respectively.
5.1. BIVARIATE AND MARGINAL PROBAB DIST 115
For example, p(30, 60) = 0.039 and pY (40) = 0.084 + 0.081 + 0.084 =
0.249.
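The marginal PMFs are just row and column sums of the joint table, which the following sketch makes explicit. The dictionary layout and variable names are illustrative choices; the numbers are the table entries from Example 5.2.

```python
ages = [30, 40, 50]
incomes = [5, 20, 40, 60, 85]

# Joint PMF p(x, y): one row per age, columns ordered as in `incomes`
joint = {
    30: [0.049, 0.116, 0.084, 0.039, 0.032],
    40: [0.042, 0.093, 0.081, 0.045, 0.061],
    50: [0.047, 0.102, 0.084, 0.053, 0.072],
}

p_X = {x: sum(row) for x, row in joint.items()}                           # row totals
p_Y = {y: sum(joint[x][j] for x in ages) for j, y in enumerate(incomes)}  # column totals

print(round(p_X[30], 3), round(p_Y[40], 3))   # 0.32 0.249
```

Summing the printed entries gives 0.138 and 0.311 for the first two income columns, against the listed totals 0.139 and 0.310; the difference is presumably rounding in the abridged data.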
For continuous random variables, the PMFs turn into densities, and
summation into integration.
Definition 5.3. Joint density, marginal densities
When X and Y are continuous random variables, the joint density func-
tion f (x, y) describes the likelihood that the pair (X, Y ) belongs to the neigh-
borhood of the point (x, y). It is visualized as a surface lying above the xy
plane.
Example 5.3.
A certain process for producing an industrial chemical yields a product that
contains two main types of impurities. Suppose that the joint probability
distribution of the impurity concentrations (in mg/l) X and Y is given by
\[ f(x, y) = \begin{cases} 2(1 - x) & \text{for } 0 < x < 1,\ 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases} \]
Figure 5.1: An example of a joint density function. Left: surface plot. Right:
contour plot.
Solution. (b)
\[ P(0 < X < 0.5,\ 0.4 < Y < 0.7) = \int_{0.4}^{0.7} \int_0^{0.5} 2(1 - x)\,dx\,dy = 0.225 \]
(c)
\[ f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 2(1 - x)\,dy = 2(1 - x), \quad 0 < x < 1 \]
and
\[ f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^1 2(1 - x)\,dx = 1, \quad 0 < y < 1 \]
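The double integral in part (b) can be checked numerically. The sketch below uses a simple midpoint rule on a grid, which is one possible choice among many; the function names and grid size are illustrative.

```python
def f(x, y):
    """Joint density from Example 5.3."""
    return 2.0 * (1.0 - x) if 0 < x < 1 and 0 < y < 1 else 0.0

def double_integral(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Midpoint-rule approximation of the integral of g over a rectangle."""
    hx, hy = (x_hi - x_lo) / n, (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

p_b = double_integral(f, 0.0, 0.5, 0.4, 0.7)          # part (b)
total_mass = double_integral(f, 0.0, 1.0, 0.0, 1.0)   # a density integrates to 1
print(round(p_b, 4), round(total_mass, 4))            # 0.225 1.0
```

The midpoint rule is exact for integrands that are linear in each variable, which is why the result matches the hand calculation here.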
\[ p(x \mid y) = \frac{p(x, y)}{p_Y(y)} \quad \text{for } y \text{ such that } p_Y(y) > 0 \]
5.2. CONDITIONAL PROBABILITY DISTRIBUTIONS 117
For a pair of continuous RV’s with joint density f (x, y), the conditional
density function of X given Y = y is defined as
\[ f(x \mid y) = \frac{f(x, y)}{f_Y(y)} \quad \text{for } y \text{ such that } f_Y(y) > 0 \]
\[ f(y \mid x) = \frac{f(x, y)}{f_X(x)} \quad \text{for } x \text{ such that } f_X(x) > 0 \]
\[ P(X = 0 \mid Y = 0) = \frac{P(X = 0, Y = 0)}{P(Y = 0)} \]
Example 5.4.
Using the data from Example 5.2,
Y, income
5 20 40 60 85 Total
X, age 30 0.049 0.116 0.084 0.039 0.032 0.320
40 0.042 0.093 0.081 0.045 0.061 0.322
50 0.047 0.102 0.084 0.053 0.072 0.358
Total 0.139 0.310 0.249 0.137 0.165 1.000
Calculate the conditional PMF of Y given X = 30.
Solution. The conditional PMF of Y given X = 30 gives the distribution of
incomes in that age group. Divide each entry of the row X = 30 by its marginal
total to obtain

Y, income     p(y | X = 30)
5             0.049/0.320 = 0.153
20            0.116/0.320 = 0.362
40            0.084/0.320 = 0.263
60            0.039/0.320 = 0.122
85            0.032/0.320 = 0.100
Total         1
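The same division can be written as a short computation. As an extra illustration that goes slightly beyond the example, the sketch also evaluates the mean of the conditional distribution, that is, the average income midpoint within the age group X = 30. Variable names are illustrative.

```python
incomes = [5, 20, 40, 60, 85]
row_30 = [0.049, 0.116, 0.084, 0.039, 0.032]   # joint PMF entries p(30, y)

p_X_30 = sum(row_30)                            # marginal p_X(30) = 0.320
cond = {y: p / p_X_30 for y, p in zip(incomes, row_30)}

# Mean of the conditional distribution of Y given X = 30
cond_mean = sum(y * p for y, p in cond.items())

print({y: round(p, 3) for y, p in cond.items()})
print(round(cond_mean, 2))
```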
Example 5.5.
The joint density for the random variables (X, Y ), where X is the unit tem-
perature change and Y is the proportion of spectrum shift that a certain
atomic particle produces is
\[ f(x, y) = \begin{cases} 10xy^2 & \text{for } 0 < x < y < 1 \\ 0 & \text{elsewhere} \end{cases} \]
Figure 5.2: Left: joint density from Example 5.5. Right: a typical sample
from this distribution.
(b) Now
\[ f(y \mid x) = \frac{f(x, y)}{f_X(x)} = \frac{10xy^2}{(10/3)x(1 - x^3)} = \frac{3y^2}{1 - x^3}, \quad 0 < x < y < 1 \]
and
\[ f(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{10xy^2}{5y^4} = \frac{2x}{y^2}, \quad 0 < x < y < 1 \]
For the last one, say, treat y as fixed (given) and x as the variable.
and
\[ f(x, y) = f_X(x) f_Y(y) \quad \text{for the continuous case} \]
This definition of independence agrees with our definition for the events,
P(AB) = P(A)P(B). For example, if two dice are rolled independently,
then the numbers X and Y on the first and second die, respectively, will
each have PMF p(x) = 1/6 for x = 1, 2, ..., 6. The joint PMF will then be
$p(x, y) = p_X(x) p_Y(y) = (1/6)^2 = 1/36$.
Example 5.6.
Show that the random variables in Example 5.3 are independent.
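Solution. Here,
\[ f(x, y) = \begin{cases} 2(1 - x) & \text{for } 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{elsewhere} \end{cases} \]
From Example 5.3, the marginal densities are $f_X(x) = 2(1 - x)$ and $f_Y(y) = 1$,
so that $f_X(x) f_Y(y) = 2(1 - x) = f(x, y)$
for 0 < x, y < 1 and 0 elsewhere. Hence, X and Y are independent random
variables.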
Exercises
5.1.
The joint distribution for the number of total sales X1 and the number of
electronic equipment sales X2 per hour for a wholesale retailer is given below
X2 0 1 2
X1 = 0 0.1 0 0
X1 = 1 0.1 0.2 0
X1 = 2 0.1 ? 0.15
5.2.
X and Y have the following joint density:
\[ f(x, y) = \begin{cases} k & \text{for } 0 \le x \le y \le 1 \\ 0 & \text{elsewhere} \end{cases} \]
5.3.
A point lands into the [0, 1] × [0, 1] square with random coordinates X, Y
independent, having a Uniform[0, 1] distribution each.
a) What is the probability that the distance from the point to the origin
is less than 1, that is, $P(X^2 + Y^2 < 1)$?
5.4. EXPECTED VALUES OF FUNCTIONS 121
5.4.
The random variables X, Y have joint density $f(x, y) = e^{-(x+y)}$, x, y > 0
Suppose that the discrete RV’s (X, Y) have a joint PMF p(x, y). If g(x, y) is
any real-valued function, then
\[ E[g(X, Y)] = \sum_x \sum_y g(x, y)\,p(x, y). \]
The sum is over all values of (x, y) for which p(x, y) > 0.
If (X, Y) are continuous random variables, with joint PDF f(x, y), then
\[ E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy. \]
[Figure: contour plot of a joint density on −3 ≤ x, y ≤ 3, with regions marked + and −.]
\[ \rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{V(X)\,V(Y)}} \]
Properties of correlation:
• The correlation coefficient lies between −1 and +1.
• The correlation coefficient is dimensionless (while covariance has di-
mension of XY).
Example 5.7.
The fraction X of male runners and the fraction Y of female runners who
compete in marathon races is described by the joint density function
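\[ f(x, y) = \begin{cases} 8xy & \text{for } 0 \le x \le 1,\ 0 \le y \le x \\ 0 & \text{elsewhere} \end{cases} \]
and
\[ f(y) = \begin{cases} 4y(1 - y^2) & \text{for } 0 \le y \le 1 \\ 0 & \text{elsewhere} \end{cases} \]
From the marginal density functions, we get
\[ E(X) = \int_0^1 4x^4\,dx = \frac{4}{5} \quad \text{and} \quad E(Y) = \int_0^1 4y^2(1 - y^2)\,dy = \frac{8}{15} \]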
Then
\[ \mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = \frac{4}{9} - \frac{4}{5} \cdot \frac{8}{15} = \frac{4}{225} \]
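The moments in Example 5.7 can be verified with exact rational arithmetic. The sketch below relies on the closed form $\int_0^1 \int_0^x x^a y^b\,dy\,dx = \frac{1}{(b+1)(a+b+2)}$ for the triangle 0 ≤ y ≤ x ≤ 1; the helper names are illustrative, and the correlation at the end is an extra step not carried out in the example.

```python
import math
from fractions import Fraction

def triangle_moment(a, b):
    """Integral of x**a * y**b over the triangle 0 <= y <= x <= 1."""
    return Fraction(1, (b + 1) * (a + b + 2))

def expect(a, b):
    """E(X**a * Y**b) under the density f(x, y) = 8xy on the triangle."""
    return 8 * triangle_moment(a + 1, b + 1)

e_x, e_y, e_xy = expect(1, 0), expect(0, 1), expect(1, 1)
cov = e_xy - e_x * e_y
print(e_x, e_y, e_xy, cov)   # 4/5 8/15 4/9 4/225

# Correlation coefficient from the definition rho = Cov / sqrt(V(X) V(Y))
var_x = expect(2, 0) - e_x**2
var_y = expect(0, 2) - e_y**2
rho = float(cov) / math.sqrt(float(var_x * var_y))
print(round(rho, 3))
```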
Proof. We will show the proof for the continuous case; the discrete case
follows similarly.
For independent X, Y ,
\[ E(XY) = \iint xy\,f(x, y)\,dx\,dy = \iint x f_X(x)\,y f_Y(y)\,dx\,dy = \int x f_X(x)\,dx \int y f_Y(y)\,dy = E(X)\,E(Y) \]
Example 5.8.
If X and Y are random variables with variances V (X) = 2, V (Y ) = 4,
and covariance Cov(X, Y ) = −2, find the variance of the random variable
Z = 3X − 4Y + 8.
Solution. By Theorem 5.3,
\[ V(Z) = \sigma_Z^2 = V(3X - 4Y + 8) = 9V(X) + 16V(Y) - 24\,\mathrm{Cov}(X, Y) \]
so V(Z) = (9)(2) + (16)(4) − 24(−2) = 130.
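The calculation in Example 5.8 follows the general rule V(aX + bY + c) = a²V(X) + b²V(Y) + 2ab Cov(X, Y). A minimal sketch, with an illustrative function name:

```python
def var_linear(a, b, var_x, var_y, cov_xy):
    """V(aX + bY + c); the additive constant c does not affect the variance."""
    return a * a * var_x + b * b * var_y + 2 * a * b * cov_xy

# Example 5.8: Z = 3X - 4Y + 8 with V(X) = 2, V(Y) = 4, Cov(X, Y) = -2
v_z = var_linear(3, -4, 2, 4, -2)
print(v_z)   # 130
```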
Example 5.10.
A very important application of Theorem 5.3 is the calculation of the variance
of the sample mean
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{Y}{n} \]
where $X_i$ are independent and identically distributed RV’s (representing a
sample of measurements), and Y denotes the total of all measurements.
Suppose that $V(X_i) = \sigma^2$ for each i. Then
\[ V(\bar{X}) = \frac{V(Y)}{n^2} = \frac{V(X_1) + V(X_2) + \dots + V(X_n)}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]
This means that $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, that is, the mean of n independent
measurements is $\sqrt{n}$ times more precise than a single measurement.
Example 5.11.
The error in a single permeability measurement has the standard deviation of
0.01 millidarcies (md). If we made 8 independent measurements, how large
is the error we should expect from their mean?
Solution. $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.01/\sqrt{8} \approx 0.0035$ md
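A small simulation illustrates the σ/√n rule for the setting of Example 5.11. The sketch assumes, purely for illustration, that the measurement errors are Normal with mean 0; the rule itself only needs independence and a common variance. The seed and the number of repetitions are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(0)     # fixed seed, chosen arbitrarily
sigma, n = 0.01, 8         # single-measurement SD (md) and number of measurements

# Repeat the experiment many times: each time average n simulated errors
means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
         for _ in range(20000)]

sd_of_mean = statistics.pstdev(means)
print(sd_of_mean, sigma / math.sqrt(n))   # the two values should be close
```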
Exercises
5.5.
Y 0 1 2
X =0 0.1 0 0
X =1 0.1 0.2 0
X =2 0.1 0.35 0.15
a) Calculate E (X 2 Y ).
b) Calculate E (X/Y ).
5.7.
Using the density in Problem 5.6, find the covariance and correlation between
X and Y .
5.8.
Ten people get into an elevator. Assume that their weights are independent,
with the mean 150 lbs and standard deviation 30 lbs.
a) Find the expected value and the standard deviation of their total
weight.
b) Assuming Normal distribution, find the probability that their combined
weight is less than 1700 pounds.
5.9.
While estimating speed of light in a transparent medium, an individual mea-
surement X is determined to be unbiased (that is, the mean of X equals
the unknown speed of light), but the measurement error, assessed as the
standard deviation of X, equals 35 kilometers per second (km/s).
a) In an experiment, 20 independent measurements of the speed of light
were made. What is the standard deviation of the mean of these mea-
surements?
b) How many measurements should be made so that the error in estimating
the speed of light (measured as $\sigma_{\bar{X}}$) will decrease to 5 km/s?
5.10.
A part is composed of two segments. One segment is produced with the
mean length 4.2cm and standard deviation of 0.1cm, and the second segment
is produced with the mean length 2.5cm and standard deviation of 0.05cm.
Assuming that the production errors are independent, calculate the mean
and standard deviation of the total part length.
5.11.
Random variables X and Y have means 3 and 5, and variances 0.5 and 2,
respectively. Further, the correlation coefficient between X and Y equals
−0.5. Find the mean and variance of W = X − Y .
5.12. ?
Find an example of uncorrelated, but not independent random variables.
[Hint: Two discrete RV’s with 3 values each are enough.]
\[ P(X = x \mid Y = 0) = \frac{P(X = x, Y = 0)}{P(Y = 0)} = \frac{1/6}{1/2} = 1/3 \]
for x = 1, 2, 3, and
P (X = x | Y = 1) = 1/3 for x = 4, 5, 6.
Thus, E (X | Y = 0) = (1/3)(1 + 2 + 3) = 2 and
E (X | Y = 1) = (1/3)(4 + 5 + 6) = 5
(b) E (X | Y ) is 2 or 5, depending on Y . Each value may happen with prob-
ability 1/2. Thus, P [E (X | Y ) = 2] = 0.5 and P [E (X | Y ) = 5] = 0.5
5.5. CONDITIONAL EXPECTATIONS* 129
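\[ = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x f(x \mid y) f(y)\,dx\,dy = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} x f(x \mid y)\,dx \right] f(y)\,dy \]
\[ = \int_{-\infty}^{\infty} E(X \mid Y = y) f(y)\,dy = E[E(X \mid Y)] \]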
Example 5.13.
Suppose that the total weight X of occupants in a car depends on how many
there are, let the number of occupants equal Y , and each occupant weighs
150 lbs on average. Then E (X | Y = y) = 150y. Suppose Y has the following
distribution
y 1 2 3 4
p(y) 0.62 0.28 0.07 0.03
150y 150 300 450 600
Then E(X | Y) has the distribution with values given in the last row of the
table, and probabilities identical to p(y). We can verify by straightforward
calculation that E[E(X | Y)] = E(150Y) = 226.5. Then the Theorem says that
E(X) = 226.5 as well, so we don’t even have to know the distribution of
occupant weights, only its mean (150).
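The "straightforward calculation" in Example 5.13 amounts to averaging the conditional means 150y with weights p(y). A minimal sketch, with illustrative variable names:

```python
p_y = {1: 0.62, 2: 0.28, 3: 0.07, 4: 0.03}   # distribution of the number of occupants
mean_weight = 150                             # average weight per occupant, lbs

# Conditional means E(X | Y = y) = 150 y, then average them over p(y)
cond_mean = {y: mean_weight * y for y in p_y}
e_x = sum(cond_mean[y] * p for y, p in p_y.items())
print(round(e_x, 1))   # 226.5
```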
Exercises
5.13.
For the random variables X and Y from Example 5.12, verify the identity in
part (a) of the Theorem 5.4.
5.14.
Suppose that the number of lobsters caught in a trap follows the distribution
y 0 1 2 3
p(y) 0.5 0.3 0.15 0.05
and the average weight of a lobster is 1.7 lbs, with variance 0.25 lbs². Find the
expected value and the variance of the total catch in one trap.
Chapter 6
6.1 Introduction
At times we are faced with a situation where we must deal not with the
random variable whose distribution is known but rather with some function
of that random variable. For example, we might know the distribution of
particle sizes, and would like to infer the distribution of particle weights.
In the case of a simple linear function, we have already asserted what
the effect is on the mean and variance. What has been omitted was what
actually happens to the distribution.
We will discuss several methods of obtaining the distribution of Y = g(X)
from known distribution of X. The CDF method and the transformation
method are most frequently used. The CDF method is all-purpose and flex-
ible. The transformation method is typically faster (when it works).
6.1.1 Simulation
One use of these methods is to generate random variables with a given distribution.
This is important in simulation studies. Suppose that we have a complex operation
that involves several components. Suppose that each component is described by a
random variable and that the outcome of the operation depends on the components
in a complicated way. One approach to analyzing such a system is to simulate each
component and calculate the outcome for the simulated values. If we repeat the
simulation many times, then we can get an idea of the probability distribution of
the outcomes.
132 CHAPTER 6. FUNCTIONS OF RANDOM VARIABLES
$= 1 - e^{-\lambda y/b} = 1 - e^{-(\lambda/b) y}$
Example 6.3.
Show that if X has a uniform distribution on the interval [0, 1] then
Y = − ln(1 − X) has an exponential distribution with mean 1.
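Solution. Recall that for the uniform distribution on (0, 1), P(X < x) = x.
Also, note that the range of Y is (0, ∞). For y > 0,
$F_Y(y) = P(-\ln(1 - X) \le y) = P(X \le 1 - e^{-y}) = 1 - e^{-y}$,
which is the CDF of the exponential distribution with mean 1.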
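This transformation is a standard way to simulate exponential random variables from uniform ones. The sketch below generates a sample and compares its mean and one CDF value with the exponential distribution with mean 1; the seed and sample size are arbitrary.

```python
import math
import random

rng = random.Random(42)    # fixed seed, chosen arbitrarily

# rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the logarithm is defined
sample = [-math.log(1.0 - rng.random()) for _ in range(50000)]

sample_mean = sum(sample) / len(sample)
frac_le_1 = sum(y <= 1.0 for y in sample) / len(sample)

print(sample_mean)                       # should be close to 1
print(frac_le_1, 1 - math.exp(-1.0))     # empirical vs. exact F(1)
```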
Exercises
6.1.
Show that if X has a uniform distribution on [0, 1], then so does 1 − X.
6.2.
Let X have a uniform distribution on [0, 1]. Let $Y = \sqrt{X}$.
a) Find the distribution of Y.
6.3.
Using the CDF method, show that the Weibull random variable Y (with
some parameter γ > 0, and β = 1) can be obtained from Exponential X
(with the mean 1) as $Y = X^{1/\gamma}$.
6.4.
Suppose the radii of spheres have a normal distribution with mean 2.5 and
variance 1/12. Find the median volume and median surface area.
6.5.
Let X have a uniform distribution on [0, 1]. Show how you could define H(x)
so that Y = H(X) would have a Poisson distribution with mean 1.3.
6.6.
A point lands into [0, 1] × [0, 1] square with random coordinates X, Y inde-
pendent, having Uniform[0, 1] distribution each. Use the CDF method to
find the distribution of U = max(X, Y ).
6.7.
Let X, Y be independent, standard Normal RV’s. Find the distribution of
$Z = \sqrt{X^2 + Y^2}$. You can interpret this as the distance of the random point
(X, Y ) to the origin. [Hint: Use the polar coordinates.]
Example 6.5.
Let X be a geometric random variable with PMF
$$p(x) = \frac{3}{4}\left(\frac{1}{4}\right)^{x-1}, \quad x = 1, 2, 3, \ldots$$
Find the distribution of the random variable $U = X^2$.
Solution. Since the values of X are all positive, the transformation defines a
one-to-one correspondence between the x and u values, $u = x^2$ and $x = \sqrt{u}$.
Hence,
$$g(u) = p(\sqrt{u}) = \frac{3}{4}\left(\frac{1}{4}\right)^{\sqrt{u}-1}, \quad u = 1, 4, 9, \ldots$$
For continuous RV’s, the transformation formula originates from the
change of variable formula for integrals.
Theorem 6.2. Transformations: continuous
Suppose that X is a continuous random variable with density f (x). Let
y = h(x) define a one-to-one transformation that can be uniquely solved for
x, say x = w(y). Then the density of Y = h(X) is
$$f_Y(y) = f(x)\left|\frac{dx}{dy}\right| = f[w(y)] \times |J|,$$
where $J = w'(y) = dx/dy$.
Example 6.6.
Let X be a continuous random variable with probability distribution
$$f(x) = \begin{cases} x/12 & \text{for } 1 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability distribution of the random variable Y = 2X − 3.
Solution. The inverse solution of y = 2x − 3 yields x = (y + 3)/2, from which
we obtain $J = w'(y) = dx/dy = 1/2$. Therefore, using the above Theorem 6.2, we
find the density function of Y to be
$$f_Y(y) = \frac{1}{12}\left(\frac{y+3}{2}\right)\frac{1}{2} = \frac{y+3}{48}, \quad -1 < y < 7$$
Example 6.7.
Let X be a Uniform[0, 1] random variable. Find the distribution of $Y = X^5$.
Solution. Inverting, $x = y^{1/5}$, and $dx/dy = (1/5)y^{-4/5}$. Thus, we obtain
$$f_Y(y) = 1 \times (1/5)y^{-4/5} = (1/5)y^{-4/5}, \quad 0 < y < 1$$
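A density obtained by the transformation method can be checked numerically. The sketch below takes the density from Example 6.7, integrates it over an interval, and compares the result with the fraction of simulated values of X^5 that fall in that interval. The interval (0.1, 0.5), the sample size and the midpoint-rule integrator are arbitrary choices for illustration.

```python
import random

def density_y(y):
    """Density of Y = X**5 from Example 6.7, valid for 0 < y < 1."""
    return 0.2 * y ** (-0.8)

def integrate(f, a, b, steps=10_000):
    """Midpoint rule on [a, b]; keep a > 0 because the density is unbounded at 0."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

rng = random.Random(0)
sample = [rng.random() ** 5 for _ in range(200_000)]

# P(0.1 < Y < 0.5) two ways: by simulation and by integrating the density
simulated = sum(0.1 < y < 0.5 for y in sample) / len(sample)
from_density = integrate(density_y, 0.1, 0.5)
print(simulated, from_density)
```

Both numbers should also agree with the CDF of Y, which is $y^{1/5}$ on (0, 1).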
Example 6.8.
Let X be a continuous random variable with density
$$f(x) = \begin{cases} \frac{x+1}{2} & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
For example, let X be Exponential with the mean 1, and Y = bX. Then
$f(x) = e^{-x}$, x > 0, and (6.1) gives
$$f_Y(y) = \frac{1}{b}e^{-y/b}, \quad y > 0.$$
That is, Y is Exponentially distributed with the mean b. This agrees with
the result of Example 6.1.
Another example of location and scale parameters is provided by the Normal
distribution: if Z is standard Normal, then Y = µ + σZ is a Normal
$(\mu, \sigma^2)$ random variable. Thus, µ is the location and σ is the scale parameter.
Formula (6.1) also provides a faster way to solve some of the above Examples.
Exercises
6.8.
Suppose that Y = cos(πX) where the RV X is given by the table
x −2 −1 0 2 3
p(x) 0.1 0.2 0.3 0.3 0.1
Find the distribution of Y (make a table).
6.9.
The random variable X has a distribution given by the table
x −1 0 1 2
p(x) 0.1 0.2 0.3 0.4
6.10.
Let X be a continuous random variable with density
$$f(x) = \begin{cases} \frac{2}{3}(x + 1) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$$
6.11.
Use the methods of this section to show that linear functions of normal
random variables again have a normal distribution. Let Y = a + bX, where
X is normal with the mean µ and variance $\sigma^2$. How do the mean and variance
of Y relate to those of X? Again, use the methods of this section.
6.12.
The so-called Pareto random variable X with parameters 10 and 2 has the
density function
$$f(x) = \frac{10}{x^2}, \quad x > 10$$
Write down the density function of Y = 4X − 20 (do not forget the limits!)
6.13.
Re-do Example 6.4 (p. 133) using the transform (Jacobian) method.
6.14.
For the following distributions identify the parameters as location or scale
parameters, or neither:
a) Weibull, parameter β.
b) Weibull, parameter γ.
c) Uniform on [−θ, θ], parameter θ.
d) Uniform on [b, b + 1], parameter b.
6.4. CENTRAL LIMIT THEOREM 139
Example 6.10.
The average voltage of the batteries is 9.2V and standard deviation is 0.25V.
Assuming normal distribution and independence, what is the distribution of
total voltage Y = X1 + ... + X4 ? Find the probability that the total voltage
is above 37.
Here we mention (without proof, which can be obtained using the mo-
ment generating functions) some properties of the sums of independent ran-
dom variables.
Distribution of $X_i$ → Distribution of $Y = X_1 + X_2 + \ldots + X_n$ (indep.)
Exponential → Gamma
Normal → Normal
Poisson → Poisson
[Figure: plots of each pair of distributions.]
Let $\bar X_n$ be the mean of a sample of size n coming from some distribution with mean µ
and variance $\sigma^2$. Then, for large n, $\bar X_n$ is approximately Normal with mean
µ and variance $\sigma^2/n$.
If n < 30, the approximation is good only if the population distribution
is not too different from a normal. If the population is normal, the sampling
Dependence on n
As n increases, two things happen to the distribution of $\bar X$: it is becoming
sharper (due to the variance decreasing) and also the shape is becoming more
and more Normal. For example, if $X_i$ are Uniform[0,1], then the density of
$\bar X$ behaves as follows:
[Figure: density of $\bar X$ for Uniform[0,1] observations; horizontal axis from 0 to 1.]
1 There are some cases of the so-called “heavy-tailed” distributions for which the CLT
does not hold, but they will not be discussed here.
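The dependence on n can also be seen in a small simulation. This is a sketch only: the values of n, the number of repetitions and the seed are arbitrary choices. For each n it simulates many sample means of Uniform[0,1] observations and compares their variance with $\sigma^2/n = 1/(12n)$.

```python
import random
import statistics

def simulate_sample_means(n, n_reps=20_000, seed=0):
    """Return n_reps simulated means of n independent Uniform[0, 1] values."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) / n for _ in range(n_reps)]

summary = {}
for n in (1, 2, 5, 30):
    means = simulate_sample_means(n)
    # (simulated mean, simulated variance, theoretical variance 1 / (12 n))
    summary[n] = (statistics.fmean(means),
                  statistics.variance(means),
                  1 / (12 * n))
    print(n, summary[n])
```

A histogram of `means` for each n would show the second effect, the shape approaching the Normal curve.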
Example 6.12.
The fracture strengths of a certain type of glass average 14 (thousands of
pounds per square inch) and have a standard deviation of 2. What is the
probability that the average fracture strength for 100 pieces of this glass
exceeds 14.5?
Solution. By the central limit theorem the average strength $\bar X$ has approximately
a normal distribution with mean 14 and standard deviation
$\sigma_{\bar X} = 2/\sqrt{100} = 0.2$. Thus,
$$P(\bar X > 14.5) \approx P\left(Z > \frac{14.5 - 14}{0.2}\right) = P(Z > 2.5) = 0.5 - 0.4938 = 0.0062$$
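The same calculation can be done without a Normal table. The sketch below uses the complementary error function from the Python standard library; the function names are illustrative, not from the text.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard Normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def prob_mean_exceeds(threshold, mu, sigma, n):
    """CLT approximation of P(sample mean > threshold)."""
    se = sigma / math.sqrt(n)        # standard deviation of the sample mean
    z = (threshold - mu) / se
    return z, normal_upper_tail(z)

z, p = prob_mean_exceeds(14.5, mu=14, sigma=2, n=100)
print(z, round(p, 4))   # z = 2.5, tail probability rounds to 0.0062
```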
Example 6.13.
A fair (p = 0.5) coin is tossed 500 times.
[Figure: Normal approximation for n = 500 and p = 0.5; horizontal axis from 200 to 300.]
Exercises
6.15.
The average concentration of potassium in county soils was determined as 85
ppm, with standard deviation 30 ppm. If n = 20 samples of soils are taken,
find the probability that their average potassium concentration will be in the
“medium” range (80 to 120 ppm).
6.16.
The heights of students have a mean of 174.5 centimeters (cm) and a standard
deviation of 6.9 cm. If a random sample of 25 students is obtained, determine
6.17.
The measurements of an irregular signal’s frequency have mean of 20 Hz and
standard deviation of 5 Hz. 10 independent measurements are done.
6.18.
A process yields 10% defective items. If 200 items are randomly selected from
the process, what is the probability that the sample proportion of defectives
a) exceeds 13%?
b) is less than 8%?
Chapter 7
Descriptive statistics
146 CHAPTER 7. DESCRIPTIVE STATISTICS
Example 7.1.
For some really important matters, the U.S. Census Bureau tries to
reach every single American, but this is practically impossible.
b) The gas mileage of a car is investigated. Suppose that we drive n = 20
times using a full tank of gas, until it’s empty, and calculate the average
gas mileage for each trip. Here, the population is all potential trips
between fillups on this car to be made (under usual driving conditions)
and the sample is the 20 trips actually made.
Definition 7.2.
Example 7.2.
Old Faithful is a famous geyser in Yellowstone National Park. The data
recorded represent waiting times between eruptions (in minutes). There are
n = 272 observations. The first ten observations are 79, 54, 74, 62, 85, 55,
88, 85, 51, 85. Using the bins 41-45, 46-50 etc we get
Bin 41-45 46-50 51-55 56-60 61-65 66-70 71-75
Count 4 22 33 24 14 10 27
Bin 76-80 81-85 86-90 91-95 96-100
Count 54 55 23 5 1
[Figure: Histogram of Y; vertical axis: Frequency; horizontal axis from 40 to 100.]
for larger sample sizes; but not too many bins, so that the histogram keeps a
smooth appearance. Some authors recommend the number of bins no higher
than $\sqrt{n}$ where n is the sample size.
Figure 7.2: histograms of Old Faithful data: bins too wide, bins too narrow
Example 7.3.
The heights of the last 8 US presidents are (in cm): 185, 182, 188, 188, 185,
177, 182, 193. Find the mean and standard deviation of these heights.
Solution. The average height is $\bar X = 185$. To make the calculations more
compact, let’s subtract 180 from each number, as it will not affect the standard
deviation: 5, 2, 8, 8, 5, −3, 2, 13, and $\bar X = 5$. Then, $\sum X_i^2 = 364$ and we get
$$S^2 = \frac{364 - 5^2(8)}{8 - 1} = 23.43 \quad \text{and} \quad S = \sqrt{23.43} = 4.84.$$
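The arithmetic of this example can be reproduced in a few lines. The sketch below uses Python's `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor, and also repeats the shortcut with the shifted data.

```python
import statistics

heights = [185, 182, 188, 188, 185, 177, 182, 193]

mean_height = statistics.mean(heights)

# Shortcut from the example: subtracting 180 does not change the spread
shifted = [h - 180 for h in heights]
n = len(shifted)
shifted_mean = statistics.mean(shifted)
shortcut_var = (sum(x * x for x in shifted) - n * shifted_mean ** 2) / (n - 1)

sd = statistics.stdev(heights)   # sample standard deviation, divisor n - 1
print(mean_height, round(shortcut_var, 2), round(sd, 2))
```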
7.3.2 Percentiles
Definition 7.4.
The pth percentile (or quantile) of a data set is a number q such that p% of
the entire sample are below this number. It can be calculated as
r = ((n + 1)p/100)th smallest number in the sample.
The algorithm for calculating the pth percentile is then as follows.
a) Order the sample, from smallest to largest, denote these as
$X_{(1)}, X_{(2)}, \ldots, X_{(n)}$.
b) Calculate r = (n + 1)p/100, let $k = \lfloor r \rfloor$ be the integer part of r.
c) If interpolation is desired, take $X_{(k)} + (r - k)[X_{(k+1)} - X_{(k)}]$.
If interpolation is not needed, take $X_{(r^*)}$ where $r^*$ is the rounded value
of r.
Generally, if the sample size n is large, the interpolation is not needed.
The 50-th percentile is known as median. It is, along with the mean, a
measure of center of the data set.
Example 7.4.
Back to the example of US presidents: find the median and 22nd percentile
of the presidents’ heights.
Solution. The ordered data are 177, 182, 182, 185, 185, 188, 188, 193. For
n = 8 we have two “middle observations”: ranked 4th and 5th, these are
both 185. Thus, the median is 185 (coincidentally, we have seen that $\bar X = 185$
also).
To find the 22nd percentile, take r = (n + 1)p/100 = 9(0.22) = 1.98, round it to
2. Then, take the 2nd ranked observation, which is 182.
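The percentile algorithm of Definition 7.4 can be sketched as a short function. Two details are assumptions added here, since the text does not specify them: ranks outside 1..n are clamped to the smallest or largest observation, and a rank ending in .5 is rounded up.

```python
import math

def percentile(sample, p, interpolate=True):
    """pth percentile using the rank r = (n + 1) p / 100 from the text."""
    xs = sorted(sample)
    n = len(xs)
    r = (n + 1) * p / 100
    if interpolate:
        k = math.floor(r)
        if k < 1:           # assumption: clamp ranks below the sample
            return xs[0]
        if k >= n:          # assumption: clamp ranks above the sample
            return xs[-1]
        # xs is 0-based, so X_(k) is xs[k - 1]
        return xs[k - 1] + (r - k) * (xs[k] - xs[k - 1])
    r_star = math.floor(r + 0.5)         # round half up (assumption)
    r_star = min(max(r_star, 1), n)
    return xs[r_star - 1]

heights = [185, 182, 188, 188, 185, 177, 182, 193]
print(percentile(heights, 50))                      # median
print(percentile(heights, 22, interpolate=False))   # rounded rank, as in Example 7.4
print(percentile(heights, 22))                      # interpolated version
```

Statistical software packages use several slightly different percentile definitions, so results from a library routine may differ from this rule for small samples.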
they might produce fairly different results. For the right-skewed distribution,
mean > median, and for the left-skewed, mean < median.
The median is resistant to outliers. This means that the unusually high
or low observations do not greatly affect the median. The mean $\bar X$ is not
resistant to outliers.
Mean of a function
We can define the mean of any function g of our data as
$$\overline{g(X)} = \frac{g(X_1) + g(X_2) + \ldots + g(X_n)}{n}$$
Similar to the properties of the expected values (see Theorem 3.2), we have
the following properties:
a) $\overline{aX + b} = a\bar X + b$
b) but, generally, $\overline{g(X)} \neq g(\bar X)$
c) For sample standard deviation, $S_{aX+b} = |a| S_X$
Exercises
7.1.
The temperature data one morning from different weather stations in the
vicinity of Socorro were
71.9, 73.7, 72.3, 74.6, 72.8, 67.5, 72.0 (in ◦F )
a) Find the mean and standard deviation of temperatures
b) Find the median and 86th percentile.
c) Suppose that the last measurement came from Magdalena Ridge and
became equal to 41.7 instead of 72.0. How will this affect the mean
and the median, respectively?
d) Re-calculate the above answers if the temperature is expressed in
Celsius. [Hint: you do not have to do it from scratch!]
7.2.
The heights of the last 20 US presidents are, in cm: 185, 182, 188, 188, 185,
177, 182, 193, 183, 179, 175, 188, 182, 178, 183, 180, 182, 178, 170, 180.
7.3.
The permeabilities of 12 oil pumping locations, in millidarcies, are: 0.07,
0.17, 0.06, 0.09, 0.17, 0.18, 0.04, 0.07, 0.02, 0.57, 0.71, 0.05.
7.4.
Several runners have completed a 1 mile race, with these results: 4.35, 4.51,
4.18, 4.56, 4.10, 3.75 (in minutes).
Statistical inference
8.1 Introduction
In previous sections we emphasized properties of the sample mean. In this
section we will discuss the problem of estimation of population parameters, in
general. A point estimate of some population parameter θ is a single value
θ̂ of a statistic. For example, the value $\bar X$ is the point estimate of population
parameter µ. Similarly, $\hat p = X/n$ is a point estimate of the true proportion p
in a binomial experiment.
Statistical inference deals with the question: can we infer something about
the unknown population parameters (e.g., µ, σ or p)? Two major tools
for statistical inference are confidence intervals (they complement a point
estimate with a margin of error) and hypothesis tests that try to prove some
statement about the parameters.
E (θ̂) = θ.
154 CHAPTER 8. STATISTICAL INFERENCE
The unbiased estimators are correct “on average”, while actual samples
yield results higher or lower than the true value of the parameter.
On the other hand, biased estimators would consistently overestimate or
underestimate the target parameter.
Example 8.1.
One reason that the sample variance $S^2 = \sum (X_i - \bar X)^2/(n - 1)$ is divided by
n − 1 (instead of n) is the unbiasedness property. Indeed, it can be shown
that $E(S^2) = \sigma^2$. However, $E(S) \neq \sigma$.
Proof. Central Limit Theorem (CLT) claims that, regardless of the initial
distribution, the sample mean $\bar X = (X_1 + \ldots + X_n)/n$ will be approximately
Normal:
$$\bar X \approx \text{Normal}(\mu, \sigma^2/n)$$
for n reasonably large (usually n ≥ 30 is considered enough).
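Suppose that a confidence level C = 100%(1 − α) is given. Then, find $z_{\alpha/2}$
such that
$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha.$$
Due to the symmetry of Z-distribution, we need to find the z-value with the
upper tail probability α/2. That is, table area $TA(z_{\alpha/2}) = 0.5 - \alpha/2$.
Then, using CLT, $Z \approx \dfrac{\bar X - \mu}{\sigma/\sqrt{n}}$, therefore
$$P\left(-z_{\alpha/2} < \frac{\bar X - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
Solving for µ, we obtain the result.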
Notes:
(a) If σ is unknown, it can be replaced by S, the sample standard deviation,
with no serious loss in accuracy for the large sample case. Later, we will
discuss what happens for small samples.
(b) This CI (and many to follow) has the following structure
$$\bar X \pm m$$
Example 8.3.
The average zinc concentration recovered from a sample of zinc measurements
in 36 different locations in the river is found to be 2.6 milligrams per liter.
Find the 95% and 99% confidence intervals for the mean zinc concentration
µ. Assume that the population standard deviation is 0.3.
Solution. The point estimate of µ is $\bar X = 2.6$. For 95% confidence, $z_{\alpha/2} = 1.96$.
Hence, the 95% confidence interval is
$$2.6 - 1.96\frac{0.3}{\sqrt{36}} < \mu < 2.6 + 1.96\frac{0.3}{\sqrt{36}}, \quad \text{that is,} \quad (2.50, 2.70)$$
For a 99% confidence, $z_{\alpha/2} = 2.575$ and hence the 99% confidence interval is
$$2.6 - 2.575\frac{0.3}{\sqrt{36}} < \mu < 2.6 + 2.575\frac{0.3}{\sqrt{36}}, \quad \text{that is,} \quad (2.47, 2.73)$$
We see that a wider interval is required to estimate µ with a higher degree
of confidence.
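The large-sample interval is easy to compute directly. In the sketch below the function name is illustrative; the critical value $z_{\alpha/2}$ comes from `statistics.NormalDist` in the Python standard library (Python 3.8 or later) instead of a table. It reproduces the two intervals of Example 8.3.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Large-sample confidence interval: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 critical value
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

low95, high95 = z_interval(2.6, sigma=0.3, n=36, confidence=0.95)
low99, high99 = z_interval(2.6, sigma=0.3, n=36, confidence=0.99)
print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```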
Example 8.4.
An important property of plastic clays is the amount of shrinkage on drying.
For a certain type of plastic clay 45 test specimens showed an average shrink-
age percentage of 18.4 and a standard deviation of 1.2. Estimate the “true”
average shrinkage µ for clays of this type with a 95% confidence interval.
Solution. For these data, a point estimate of µ is $\bar X = 18.4$. The sample
standard deviation is S = 1.2. Since n is fairly large, we can replace σ by S.
Hence, the 95% confidence interval for µ is
$$18.4 - 1.96\frac{1.2}{\sqrt{45}} < \mu < 18.4 + 1.96\frac{1.2}{\sqrt{45}}, \quad \text{that is,} \quad (18.05, 18.75)$$
Thus we are 95% confident that the true mean lies between 18.05 and 18.75.
8.2. CONFIDENCE INTERVALS 157
Exercises
8.1.
In a school district, they would like to estimate the average reading rate of
first-graders. After selecting a random sample of n = 65 readers, they ob-
tained sample mean of 53.4 words per minute (wpm), and standard deviation
of 33.9 wpm. Calculate a 98% confidence interval for the average reading
rate of all first-graders in the district.
8.2.
A random sample of 200 calls initiated while driving had a mean duration
of 3.5 minutes with standard deviation 2.2 minutes. Find a 99% confidence
interval for the mean duration of telephone calls initiated while driving.
8.3.
• Is the new drug more effective than the old one? In this case, we would
like to compare two parameters, e.g. the average effectiveness of the
old drug versus the new one.
In making the decision, we will compare the statement (say, p = 1/2) with
the available data and will reject the claim p = 1/2 if it contradicts the data.
In the subsequent sections we will learn how to set up and test the hypotheses
in various situations.
otherwise Accept H0 .
a
Do not confuse p-value with notation for proportion p
Calculation of P-values
For the two-sided hypothesis, P-value = 2 × P (Z > |z|).
For the right-tailed hypothesis, HA : µ > µ0 , P-value = P (Z > z)
For the left-tailed hypothesis, HA : µ < µ0 , P-value = P (Z < z)
8.3. STATISTICAL HYPOTHESES 161
[Figure: three standard Normal density panels, one for each alternative: µ ≠ µ0, µ > µ0, µ < µ0; horizontal axes from −3 to 3.]
Example 8.6.
A manufacturer of sports equipment has developed a new synthetic fishing
line that he claims has a mean breaking strength of 8 kg with a standard
deviation of 0.5 kg. A random sample of 50 lines is tested and found to have
a mean breaking strength of 7.8 kg. Test the hypothesis that µ = 8 against
the alternative that µ ≠ 8. Use α = 0.01 level of significance.
Solution.
a) H0 : µ = 8
b) HA : µ ≠ 8
c) α = 0.01 and hence critical value $z_{\alpha/2} = 2.57$
d) Test statistic:
$$z = \frac{\sqrt{n}(\bar X - \mu_0)}{\sigma} = \frac{\sqrt{50}(7.8 - 8)}{0.5} = -2.83$$
Example 8.7.
A random sample of 100 recorded deaths in the United States during the
past year showed an average life span of 71.8 years. Assuming a population
standard deviation of 8.9 years, does this seem to indicate that the mean life
span today is greater than 70 years? Use a 0.05 level of significance.
Solution.
a) H0 : µ = 70 years.
b) HA : µ > 70 years.
c) α = 0.05 and zα = 1.645
d) Test statistic:
$$z = \frac{\sqrt{n}(\bar X - \mu_0)}{\sigma} = \frac{\sqrt{100}(71.8 - 70)}{8.9} = 2.02$$
e) Decision: Reject H0 if z > 1.645. Since 2.02 > 1.645, we reject H0 .
f) Conclusion: We conclude that the mean life span today is greater than
70 years.
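The z-test steps, together with the P-value formulas given earlier, can be collected into one function. This is a sketch: the function name and the `alternative` labels are choices made here, not notation from the text. It is applied to the data of Examples 8.6 and 8.7.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """Return (z, p_value) for H0: mu = mu0 with known sigma.

    alternative is "two-sided", "greater" (HA: mu > mu0) or "less" (HA: mu < mu0).
    """
    z = math.sqrt(n) * (xbar - mu0) / sigma
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z, p_value

# Example 8.6 (fishing line, two-sided, alpha = 0.01)
z_line, p_line = z_test(7.8, mu0=8, sigma=0.5, n=50, alternative="two-sided")
# Example 8.7 (life span, right-tailed, alpha = 0.05)
z_life, p_life = z_test(71.8, mu0=70, sigma=8.9, n=100, alternative="greater")
print(round(z_line, 2), p_line < 0.01)
print(round(z_life, 2), p_life < 0.05)
```

Comparing the P-value with α leads to the same decision as comparing the test statistic with the critical value.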
Example 8.8.
The nominal output voltage for a certain electrical circuit is 130V. A random
sample of 40 independent readings on the voltage for this circuit gave a
sample mean of 128.6V and a standard deviation of 2.1V. Test the hypothesis
that the average output voltage is 130 against the alternative that it is less
than 130. Use a 5% significance level.
Solution.
a) H0 : µ = 130
b) HA : µ < 130
8.4. THE CASE OF UNKNOWN σ 163
Exercises
8.6.
It is known that the average height of US adult males is about 173 cm, with
standard deviation of about 6 cm.
Referring to Exercise 7.2, the average height of 20 last US presidents
was 181.9 cm. Are the presidents taller than the average? Test at the level
α = 0.05 and also compute the p-value.
8.7.
Is it more difficult to reject H0 when the significance level is smaller? Suppose
that the p-value for a test was 0.023. Would you reject H0 at the level
α = 0.05? At α = 0.01?
[Figure: t densities f(x) for df = 1, df = 4 and df = 10, compared with the Z distribution; horizontal axis from −3 to 3.]
On the other hand, we already noted that for large n we could simply use
Z-distribution for the C.I. calculation. This is justified by the fact that $t_{\alpha/2}$
values approach $z_{\alpha/2}$ values as n gets larger.
Example 8.9.
The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8,
10.0, 10.2 and 9.6 liters. Find a 95% confidence interval for the mean volume
of all such containers, assuming an approximate normal distribution.
Solution. The sample mean and standard deviation for the given data are
$\bar X = 10.0$ and S = 0.283. Using the T-Table, we find $t_{0.025} = 2.447$ for 6
degrees of freedom. Hence the 95% confidence interval for µ is
$$10.0 - 2.447\frac{0.283}{\sqrt{7}} < \mu < 10.0 + 2.447\frac{0.283}{\sqrt{7}},$$
which reduces to 9.74 < µ < 10.26
Example 8.10.
A random sample of 12 graduates of a certain secretarial school typed an
average of 79.3 words per minute (wpm) with a standard deviation of 7.8
wpm. Assuming a normal distribution for the number of words typed per
minute, find a 99% confidence interval for the average typing speed for all
graduates of this school.
Solution. The sample mean and standard deviation for the given data are
$\bar X = 79.3$ and S = 7.8. Using the T-Table, we find $t_{0.005} = 3.106$ with 11
degrees of freedom. Hence the 99% confidence interval for µ is
$$79.3 - 3.106\frac{7.8}{\sqrt{12}} < \mu < 79.3 + 3.106\frac{7.8}{\sqrt{12}},$$
which reduces to 72.31 < µ < 86.29.
We are 99% confident that the interval 72.31 to 86.29 includes the true
average typing speed for all graduates.
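A small-sample interval can be computed from the raw data once the critical value is known. The Python standard library has no t quantile function, so in this sketch the critical value is passed in from the T-Table (2.447 for 6 degrees of freedom, as in Example 8.9). The function name is illustrative.

```python
import math
import statistics

def t_interval(sample, t_crit):
    """Confidence interval xbar +/- t_crit * S / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom (for example
    in the T-Table); it is not computed here.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # divisor n - 1
    margin = t_crit * s / math.sqrt(n)
    return xbar, s, (xbar - margin, xbar + margin)

volumes = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]   # Example 8.9
xbar, s, (low, high) = t_interval(volumes, t_crit=2.447)
print(round(xbar, 1), round(s, 3), round(low, 2), round(high, 2))
```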
a) Null Hypothesis H0 : µ = µ0
b) Alternative Hypothesis HA : µ ≠ µ0 , or HA : µ > µ0 , or HA : µ < µ0 .
c) Critical value: $t_{\alpha/2}$ for two-tailed or $t_\alpha$ for one-tailed test.
d) Test Statistic $t = \dfrac{\sqrt{n}(\bar X - \mu_0)}{S}$ with n − 1 degrees of freedom
e) Decision Rule: Reject H0 if
Example 8.11.
Engine oil was stated to have the mean viscosity of µ0 = 85.0. A sample of
n = 25 viscosity measurements resulted in a sample mean of $\bar X = 88.3$ and a
sample standard deviation of S = 7.49. What is the evidence that the mean
viscosity is not as stated? Use α = 0.1.
Solution.
a) H0 : µ = 85.0
b) HA : µ ≠ 85.0
c) α = 0.1 and $t_{\alpha/2} = 1.711$ with 24 degrees of freedom.
d) Test statistic:
$$t = \frac{\sqrt{n}(\bar X - \mu_0)}{S} = \frac{\sqrt{25}(88.3 - 85.0)}{7.49} = 2.203$$
e) Decision: Reject H0 since 2.203 > 1.711.
f) Conclusion: We conclude that the average viscosity is not equal to 85.0
Decision based on P-value:
Since the test in this example is two sided, the desired p-value is twice the
tail area. Therefore, using t-table with df = 24, we have
P-value = 2 × P (T > 2.203) = 2(0.0187) = 0.0374,
which allows us to reject the null hypothesis that µ = 85 at a level of signif-
icance smaller than 0.1.
Conclusion: In summary, we conclude that there is fairly strong evidence
that the mean viscosity is not equal to 85.0
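The t statistic and an approximate P-value for this example can be computed without a table. The sketch below makes one implementation choice that is not from the text: the tail area of the t distribution is obtained by integrating its density numerically with Simpson's rule, because the Python standard library has no t distribution function.

```python
import math

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic, sqrt(n) * (xbar - mu0) / s."""
    return math.sqrt(n) * (xbar - mu0) / s

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: one half minus the integral of the density over [0, t].

    Uses Simpson's rule; steps must be even.
    """
    h = t / steps
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

t = t_statistic(xbar=88.3, mu0=85.0, s=7.49, n=25)
p_value = 2 * t_upper_tail(abs(t), df=24)     # two-sided test
reject_at_10_percent = p_value < 0.10
print(round(t, 3), p_value, reject_at_10_percent)
```

The numerical P-value should be close to the table-based value 0.0374 quoted above; small differences come from rounding in the table.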
Example 8.12.
A sample of n = 20 cars driven under varying highway conditions achieved
fuel efficiencies with a sample mean of X = 34.271 miles per gallon (mpg)
and a sample standard deviation of S = 2.915 mpg. Test the hypothesis that
the average highway mpg is less than 35 with α = 0.05.
Solution.
a) H0 : µ = 35.0
b) HA : µ < 35.0
c) α = 0.05 and $t_\alpha = 1.729$ with 19 degrees of freedom.
d) Test statistic:
$$t = \frac{\sqrt{n}(\bar X - \mu_0)}{S} = \frac{\sqrt{20}(34.271 - 35.0)}{2.915} = -1.119$$
e) Decision: since −1.119 > −1.729, we do not reject H0 .
f) Conclusion: There is no evidence that the average highway mpg is any
less than 35.0
Decision based on P-value:
P-value = P (T < −1.119) = P (T > 1.119) > 0.10,
(using df = 19), thus p-value > α = 0.05, do not reject H0 .
H0 : µ = µ0 vs. HA : µ ≠ µ0
at the level α, using a confidence interval with the confidence level 100%(1 −
α). If we found the 100%(1 − α) C.I. for the mean µ, and µ0 belongs to it,
we accept H0 , otherwise we reject H0 .
This way, the C.I. is interpreted as the range of “plausible” values for µ.
The false positive rate in this case will be equal to α = 1 − C/100%
Example 8.13.
Reconsider Example 8.11. There, we had to test H0 : µ = 85.0 with the data
n = 25, $\bar X = 88.3$ and S = 7.49, at the level α = 0.1. Is there evidence that
the mean viscosity is not 85.0?
Solution. The 90% confidence interval for µ is
$$88.3 \pm 1.711\frac{7.49}{\sqrt{25}} = 88.3 \pm 2.6 \quad \text{or} \quad (85.7, 90.9)$$
Since 85.0 does not belong to this interval, there is evidence that the “true”
mean viscosity is not 85.0 (in fact, it’s higher).
We arrived at the same conclusion as in Example 8.11.
Exercises
8.8.
In determining the gas mileage of a new model of hybrid car, the independent
research company collected information from 14 randomly selected drivers.
They obtained the sample mean of 38.4 mpg, with the standard deviation of
5.2 mpg. Obtain a 99% C.I. for µ.
What is the meaning of µ in this problem? What assumptions are necessary
for your C.I. to be correct?
8.9.
This problem is based on the well-known Newcomb data set for the speed
of light. It contains the measurements (in nanoseconds) it took the light to
bounce inside a network of mirrors. The numbers given are the time recorded
minus 24,800 ns. We will only use the first ten values.
28 26 33 24 34 -44 27 16 40 -2
Some mishaps in the experimental procedure led to the two unusually low
values (−44 and −2). Calculate the 95% C.I.’s for the mean in case when
a) all the values are used
b) the two outliers are removed
Which of the intervals will you trust more and why?
8.10.
For the situation in Example 8.6 (fishing line strength), test the hypotheses
using the C.I. approach.
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sigma_D}$$
a) Null Hypothesis H0 : µ1 = µ2
b) Alternative Hypothesis HA : µ1 ≠ µ2 , or HA : µ1 > µ2 , or HA : µ1 < µ2 .
c) Critical value: $t_{\alpha/2}$ for two-tailed or $t_\alpha$ for one-tailed test, for some
chosen significance level α.
d) Test Statistic $t = \dfrac{\bar X_1 - \bar X_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
e) Decision Rule: Reject H0 if
Example 8.14.
A study of iron deficiency among infants compared samples of infants fol-
lowing different feeding regimens. One group contained breast-fed infants,
while the other group were fed a standard baby formula without any iron
supplements. Here are the data on blood hemoglobin levels at 12 months of
age:
Group n X s
Breast-fed 23 13.3 1.7
Formula 19 12.4 1.8
(a) Is there significant evidence that the mean hemoglobin level is higher
among breast-fed babies?
(b) Give a 95% confidence interval for the mean difference in hemoglobin
level between the two populations of infants.
Standard Error
All previous formulas involving t-distribution have a common structure. For
example, (8.3) can be re-written as
$$(\bar X_1 - \bar X_2) \pm t_{\alpha/2}\, SE_{\bar X_1 - \bar X_2},$$
where the quantity $SE_{\bar X_1 - \bar X_2} = \sqrt{S_1^2/n_1 + S_2^2/n_2}$ is called the Standard Error.
Likewise, the one-sample confidence interval for the mean is
$$\bar X \pm t_{\alpha/2}\, SE_{\bar X},$$
where $SE_{\bar X} = s/\sqrt{n}$.
Likewise, the formulas for the t-statistic are
$$t = \frac{\bar X_1 - \bar X_2}{SE_{\bar X_1 - \bar X_2}} \text{ for 2-sample, and } t = \frac{\bar X - \mu_0}{SE_{\bar X}} \text{ for 1-sample situation.}$$
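The standard-error form makes the two-sample calculation short. The sketch below applies it to the summary statistics of Example 8.14 (breast-fed versus formula-fed infants). It computes only the standard error and the t statistic; the critical value still has to be looked up with the appropriate degrees of freedom, which are not computed here. Function names are illustrative.

```python
import math

def se_two_sample(s1, n1, s2, n2):
    """Standard error of the difference of two sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_one_sample(s, n):
    """Standard error of one sample mean."""
    return s / math.sqrt(n)

# Summary statistics from Example 8.14
se_diff = se_two_sample(s1=1.7, n1=23, s2=1.8, n2=19)
t_two_sample = (13.3 - 12.4) / se_diff
print(round(se_diff, 4), round(t_two_sample, 3))
```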
Example 8.15.
The following are the left hippocampus volumes (in cm³) for a group of
twin pairs, in which one twin is affected by schizophrenia and the other is not.
Pair number 1 2 3 4 5 6 7 8 9 10 11 12
Unaffected 1.94 1.44 1.56 1.58 2.06 1.66 1.75 1.77 1.78 1.92 1.25 1.93
Affected 1.27 1.63 1.47 1.39 1.93 1.26 1.71 1.67 1.28 1.85 1.02 1.34
Difference 0.67 -0.19 0.09 0.19 0.13 0.40 0.04 0.10 0.50 0.07 0.23 0.59
Solution. Since the twins’ LH volumes are clearly not independent (if one is
large the other is likely to be large, too – positive correlation!), we cannot
use the 2-sample procedure.
However, we can just compute the differences (Unaffected – Affected) and
test for the mean difference to be equal to 0. That is,
H0 : µ = 0 versus HA : µ ≠ 0
where µ is the “true” average difference, and $\bar X$, S are computed for the
sample of differences.
Given that $\bar X = 0.235$ and S = 0.254, let’s test these hypotheses at
α = 0.10. We obtain $t = (0.235 - 0)/(0.254/\sqrt{12}) = 3.20$. From the T-table
with df = 11 we get p-value between 2(0.001) = 0.002 and 2(0.005) = 0.01.
Since the p-value is below α = 0.10, we Reject H0 , thus stating that there is a significant difference
between LH volumes of normal and schizophrenic people.
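The paired analysis reduces to a one-sample calculation on the differences, which the following sketch reproduces from the table of Example 8.15 (variable names are illustrative).

```python
import math
import statistics

unaffected = [1.94, 1.44, 1.56, 1.58, 2.06, 1.66, 1.75, 1.77, 1.78, 1.92, 1.25, 1.93]
affected = [1.27, 1.63, 1.47, 1.39, 1.93, 1.26, 1.71, 1.67, 1.28, 1.85, 1.02, 1.34]

# Work with the within-pair differences (Unaffected - Affected)
diffs = [u - a for u, a in zip(unaffected, affected)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# One-sample t statistic for H0: mean difference = 0, with n - 1 = 11 df
t = (mean_diff - 0) / (sd_diff / math.sqrt(n))
print(round(mean_diff, 3), round(sd_diff, 3), round(t, 2))
```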
8.5. C.I. AND TESTS FOR TWO MEANS 175
Exercises
8.11.
In studying how humans pick random objects, the subjects were presented
a population of rectangles and have used two different sampling methods.
They then calculated the average areas of the sampled rectangles for each
method. Their results were
mean st.dev. n
Method 1 10.8 4.0 16
Method 2 6.1 2.3 16
Calculate the 99% C.I. for the difference of “true” means by the two methods.
Is there evidence that the two methods produce different results?
8.12.
The sports research lab studies the effects of swimming on maximal volume
of oxygen uptake.
For 8 volunteers, the maximal oxygen uptake was measured before and after
the 6-week swimming program. The results are as follows:
Is there evidence that the swimming program has increased the maximal
oxygen uptake?
8.13.
Visitors to an electronics website rated their satisfaction with two models
of printers/scanners, on the scale of 1 to 5. The following statistics were
obtained:
n mean st.dev.
Model A 31 3.6 1.5
Model B 65 4.2 0.9
At the level of 5%, test the hypothesis that both printers would have the
same average rating in the general population, that is, H0 : µA = µB . Also,
calculate the 95% confidence interval for the mean difference µA − µB .
Solution. To assess whether the data provide evidence that the coin was not
balanced, we test H0 : p = 0.5 versus HA : p ≠ 0.5.
The test statistic is
$$z = \frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.5069 - 0.50}{\sqrt{0.50(1 - 0.5)/4040}} = 0.88$$
From Z chart we find P (Z < 0.88) = 0.8106. Therefore, the p-value is 2(1 −
0.8106) = 0.38. The data are compatible with balanced coin hypothesis.
Now we will calculate a 99% confidence interval for p. The $z_{\alpha/2} = 2.576$
from the normal table. Hence, the 99% CI for p is
$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}} = 0.5069 \pm 2.576\sqrt{\frac{(0.5069)(1 - 0.5069)}{4040}} = 0.5069 \pm (2.576)(0.00786)$$
$$= 0.5069 \pm 0.0202 = (0.4867, 0.5271)$$
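The test and the interval for one proportion can be sketched as two small functions (names are illustrative). Note the difference between them: the test statistic uses the hypothesized $p_0$ in the standard error, while the confidence interval uses $\hat p$. Because the code does not round intermediate values, its interval endpoints may differ from the hand calculation in the fourth decimal place.

```python
import math
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, with the standard error computed under H0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

def proportion_interval(p_hat, n, confidence):
    """Large-sample interval p_hat +/- z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

z = one_proportion_z(0.5069, p0=0.5, n=4040)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided
low, high = proportion_interval(0.5069, n=4040, confidence=0.99)
print(round(z, 2), round(p_value, 2), low, high)
```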
Example 8.18.
To test the effectiveness of a new pain relieving drug, 80 patients at a clinic
were given a pill containing the drug and 80 others were given a placebo.
At the 0.01 level of significance, what can we conclude about the effectiveness
of the drug if in the first group 56 of the patients felt a beneficial effect while
38 out of those who received placebo felt a beneficial effect?
Solution. H0 : p1 − p2 = 0 and HA : p1 − p2 > 0
z = 2.89, where $\hat p_1 = 56/80 = 0.7$ and $\hat p_2 = 38/80 = 0.475$ and
$\hat p = (56 + 38)/(80 + 80) = 0.5875$
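The value z = 2.89 can be reproduced as follows. The sketch assumes the usual pooled two-proportion statistic, $z = (\hat p_1 - \hat p_2)/\sqrt{\hat p(1 - \hat p)(1/n_1 + 1/n_2)}$, which is consistent with the pooled estimate $\hat p$ quoted in the solution; the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z = two_proportion_z(56, 80, 38, 80)
p_value = 1 - NormalDist().cdf(z)       # right-tailed alternative p1 - p2 > 0
print(round(z, 2), p_value < 0.01)
```

Since the P-value is below 0.01, the data support the claim that the drug is more effective than the placebo at that significance level.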
8.6. INFERENCE FOR PROPORTIONS 179
Exercises
8.14.
Suppose that a nutritionist claims that at least 75% of the preschool children
in a certain country have protein deficient diets, and that a sample survey
reveals that 206 preschool children in a sample of 300 have protein deficient
diets. Test the claim at the 0.02 level of significance. Also, compute a 98%
confidence interval.
8.15.
In a survey of 200 office workers, 165 said they were interrupted three or
more times an hour by phone messages, faxes etc. Find and interpret a
90% confidence interval for the population proportion of workers who are
interrupted three or more times an hour.
8.16.
You would like to design a poll to determine what percent of your peers
volunteer for charities. You have no clear idea of what the value of p is going
to be like, and you’ll be satisfied with the 90% margin of error equal to ±10%.
Find the sample size needed for your study.
8.17.
In random samples of 200 tractors from one assembly line and 400 tractors
from another, there were, respectively, 16 tractors and 20 tractors which
required extensive adjustments before they could be shipped. At the 5%
level of significance, can we conclude that there is a difference in the quality
of the work of the two assembly lines?
Chapter Exercises
For each of the questions involving hypothesis tests, state the null and alter-
native hypotheses, compute the test statistic, determine the p-value, make
the decision and summarize the results in plain English. Use α = 0.05 unless
otherwise specified.
8.18.
Two brands of batteries are tested and their voltages are compared. The
summary statistics are below. Find and interpret a 95% confidence interval
for the true difference in means.
mean st.dev. n
Brand 1 9.2 0.3 25
Brand 2 8.9 0.6 27
8.19.
You are studying yield of a new variety of tomato. In the past, yields of
similar types of tomato have shown a standard deviation of 8.5 lbs per plant.
You would like to design a study that will determine the average yield within
a 90% error margin of ±2 lbs. How many plants should you sample?
8.20.
A biologist knows that the average length of a leaf of a certain full-grown
plant is 4 inches. A sample of 45 leaves from the plants that were given a
new type of plant food had an average length of 4.2 inches, with the standard
deviation of 0.6 inches. Is there reason to believe that the new plant food
is responsible for a change in the average growth of leaves? Use α = 0.02.
Would your conclusion have changed if you used α = 0.05?
8.21.
A job placement director claims that mean starting salary for nurses is
$38,000. A random sample of 10 nurses’ salaries has a mean $35,450 and
a standard deviation of $4,700. Is there enough evidence to reject the direc-
tor’s claim at α = 0.01?
8.22.
College Board claims that in 2010, public four-year colleges charged, on
average, $7,605 per year in tuition and fees for in-state students. A sample
of 20 public four-year colleges collected in 2011 indicated a sample mean
of $8,039 and the sample standard deviation was $1,950. Is there sufficient
evidence to conclude that the average in-state tuition has increased?
8.23.
The weights of grapefruit follow a normal distribution. A random sample of
12 new hybrid grapefruit had a mean weight of 1.7 pounds with standard
deviation 0.24 pounds. Find a 95% confidence interval for the mean weight
of the population of the new hybrid grapefruit.
8.24.
The Mountain View Credit Union claims that the average amount of money
owed on their car loans is $7,500. Suppose a random sample of 45 loans shows
the average amount owed equals $8,125, with standard deviation $4,930.
Does this indicate that the average amount owed on their car loans is not
$7,500? Use a 1% level of significance.
8.25.
An overnight package delivery service has a promotional discount rate in
effect this week only. For several years the mean weight of a package delivered
by this company has been 10.7 ounces. However, a random sample of 12
packages mailed this week gave the following weights in ounces:
12.1 15.3 9.5 10.5 14.2 8.8 10.6 11.4 13.7 15.0 9.5 11.1
Use a 1% level of significance to test the claim that the packages are
averaging more than 10.7 ounces during the discount week.
8.26.
Some people claim that during US elections, the taller of the two major party
candidates tends to prevail. Here are some data on the last 15 elections
(heights are in cm).
Year               2008 2004 2000 1996 1992 1988 1984 1980
Winning candidate   185  182  182  188  188  188  185  185
Losing candidate    175  193  185  187  188  173  180  177
8.27.
An item in USA Today reported that 63% of Americans owned a mobile
browsing device. A survey of 143 employees at a large school showed that
85 owned a mobile browsing device. At α = 0.02, test the claim that the
percentage is the same as stated in USA Today.
8.28.
A poll by CNN revealed that 47% of Americans approve of the job
performance of the President. The poll was based on a random sample of 537
adults.
c) Would you have also reached the same conclusion for H0 : p = 0.45?
8.29.
Find a poll cited in a newspaper, web site or other news source, with a
mention of the sample size and the margin of error. (For example,
rasmussenreports.com frequently discusses its polling methods.) Confirm the
margin of error presented by the pollsters, using your own calculations.
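A minimal sketch of such a check is given below. It uses the large-sample formula z × sqrt(p̂(1 − p̂)/n) for the margin of error of a proportion. The poll figures (p̂ = 0.5, n = 1000) are hypothetical and serve only as an illustration; the function name `margin_of_error` is our own.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Large-sample margin of error for a sample proportion."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 50% support, 1000 respondents.
moe = margin_of_error(0.5, 1000)
print(f"margin of error: {moe:.3f}")
```

Using p̂ = 0.5 gives the largest possible margin for a given n, which is why pollsters often quote a single margin of error for the whole poll.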
Chapter 9
Linear Regression
Example 9.1.
Imagine that we are opening an ice cream stand and would like to be able to
predict how many customers we will have. We might use the temperature as
a predictor. We decided to collect data over a 30-week period from March
to July.[q]
Week            1     2     3     4     5     6     7     8     9    10
Mean temp      41    56    63    68    69    65    61    47    32    24
Consumption 0.386 0.374 0.393 0.425 0.406 0.344 0.327 0.288 0.269 0.256

Week           11    12    13    14    15    16    17    18    19    20
Mean temp      28    26    32    40    55    63    72    72    67    60
Consumption 0.286 0.298 0.329 0.318 0.381 0.381  0.47 0.443 0.386 0.342

Week           21    22    23    24    25    26    27    28    29    30
Mean temp      44    40    32    27    28    33    41    52    64    71
Consumption 0.319 0.307 0.284 0.326 0.309 0.359 0.376 0.416 0.437 0.548
[Figure: scatterplot for the ice cream example; vertical axis: Pints per Person]
(SSE stands for the Sum of Squared Errors; however, the quantities Yi − b̂0 − b̂1 Xi
are usually referred to as residuals.)
To find the minimum, we would calculate partial derivatives of SSE with
Solution.
$\sum X = 36$, $\sum Y = 62$, $\sum X^2 = 204$, $\sum Y^2 = 536$, $\sum XY = 175$.
Therefore,
$\bar{X} = 36/9 = 4$, $\bar{Y} = 62/9 = 6.89$, $SSX = 204 - 36^2/9 = 60$,
$SSY = 536 - 62^2/9 = 108.9$, $SSXY = 175 - 36(62)/9 = -73$
and finally,
$\hat{b}_1 = -73/60 = -1.22$, $\hat{b}_0 = 6.89 - (-1.22)4 = 11.76$, $r = -0.903$
Thus, we get the equation
$\hat{Y} = 11.76 - 1.22X$
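The arithmetic in this solution can be checked with a short sketch that starts from the five sums and n = 9. The function name `fit_from_sums` is our own choice, and r is computed as SSXY / sqrt(SSX · SSY), which is the usual formula and reproduces the value quoted above.

```python
from math import sqrt


def fit_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Least squares slope, intercept and correlation from summary sums."""
    ssx = sum_xx - sum_x ** 2 / n
    ssy = sum_yy - sum_y ** 2 / n
    ssxy = sum_xy - sum_x * sum_y / n
    b1 = ssxy / ssx                      # slope
    b0 = sum_y / n - b1 * sum_x / n      # intercept: Y-bar minus slope times X-bar
    r = ssxy / sqrt(ssx * ssy)           # correlation coefficient
    return b0, b1, r


b0, b1, r = fit_from_sums(9, 36, 62, 204, 536, 175)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, r = {r:.3f}")
```

Working with unrounded intermediate values, as the code does, avoids the small discrepancies that appear when the rounded 6.89 and −1.22 are combined by hand.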
Example 9.3.
For the data in Example 9.1,
Figure 9.2: Least squares regression line for the ice cream example
Solution.
(a) We may obtain the following estimates (usually done by a computer)
These can be used to plot the regression line (Fig. 9.2) and make predictions.
Can you interpret the slope and the intercept for this problem in plain
English?
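As a sketch of what "done by a computer" might look like, the following computes the least squares estimates directly from the data of Example 9.1. The variable names and the temperature of 60 used for the prediction are our own choices for illustration. The results should agree with the R output shown in Example 9.4.

```python
# Data from Example 9.1: mean temperature and consumption (pints per person).
temp = [41, 56, 63, 68, 69, 65, 61, 47, 32, 24,
        28, 26, 32, 40, 55, 63, 72, 72, 67, 60,
        44, 40, 32, 27, 28, 33, 41, 52, 64, 71]
cons = [0.386, 0.374, 0.393, 0.425, 0.406, 0.344, 0.327, 0.288, 0.269, 0.256,
        0.286, 0.298, 0.329, 0.318, 0.381, 0.381, 0.47, 0.443, 0.386, 0.342,
        0.319, 0.307, 0.284, 0.326, 0.309, 0.359, 0.376, 0.416, 0.437, 0.548]

n = len(temp)
x_bar = sum(temp) / n
y_bar = sum(cons) / n

# Sums of squares and cross-products about the means.
ssx = sum((x - x_bar) ** 2 for x in temp)
ssxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temp, cons))

b1 = ssxy / ssx            # slope
b0 = y_bar - b1 * x_bar    # intercept

# Prediction at an illustrative temperature of 60 (our choice).
predicted = b0 + b1 * 60
print(f"b0 = {b0:.4f}, b1 = {b1:.6f}, prediction at 60: {predicted:.3f}")
```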
Example 9.4.
Continuing the analysis of data from Example 9.1, let’s examine a portion
of the computer output (produced by the R statistical package).
             Estimate   Std.Error  t-value  Pr(>|t|)
(Intercept)  0.2069     0.0247     8.375    4.13e-09
X            0.003107   0.000478   6.502    4.79e-07
We can calculate confidence intervals and hypothesis tests for the parameters
b0 and b1 .
The 95% C.I. for the slope b1 is
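As a sketch of this calculation, the interval takes the usual form estimate ± t × (standard error), with df = n − 2 = 28 for the 30 observations. The critical value 2.048 is read from a t-table and is hard-coded here as an assumption, because the Python standard library has no t-distribution.

```python
# Slope estimate and standard error, taken from the R output above.
estimate = 0.003107
std_error = 0.000478

n = 30              # number of observations in Example 9.1
df = n - 2          # degrees of freedom for simple linear regression
t_crit = 2.048      # assumed t-table value: df = 28, upper-tail area 0.025

half_width = t_crit * std_error
lower = estimate - half_width
upper = estimate + half_width
print(f"95% C.I. for the slope: ({lower:.6f}, {upper:.6f})")
```

The interval does not contain 0, which is consistent with the very small p-value reported for the slope.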
Figure 9.3: Confidence (solid lines) and prediction bands (broken lines) for
the ice cream example
Exercises
9.1.
In the file http://www.nmt.edu/~olegm/382book/cars2010.csv, there are
some data on several 2010 compact car models. The variables are: engine
displacement (liters), city MPG, highway MPG, and manufacturer’s suggested
price.
[Figure: residual plot; vertical axis: Residual, horizontal axis: Fitted values]
If $Y_i = a_0 e^{a_1 X_i}$ then $\ln Y_i = \ln a_0 + a_1 X_i$
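To illustrate this transformation, here is a small sketch that fits a straight line to (X, ln Y) and then converts the intercept back to a0. The data are synthetic, generated without noise from a0 = 2 and a1 = 0.5, so the fit should recover these values; the function name `fit_exponential` is our own.

```python
from math import exp, log


def fit_exponential(xs, ys):
    """Fit Y = a0 * exp(a1 * X) by least squares on (X, ln Y)."""
    logs = [log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    ssxy = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    a1 = ssxy / ssx                 # slope on the log scale
    ln_a0 = l_bar - a1 * x_bar      # intercept on the log scale
    return exp(ln_a0), a1


# Synthetic, noise-free data (an assumption made for this illustration).
xs = [0, 1, 2, 3, 4]
ys = [2.0 * exp(0.5 * x) for x in xs]

a0, a1 = fit_exponential(xs, ys)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
```

With real data the errors are minimized on the log scale, so the fitted curve is not the same as a least squares fit on the original scale.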
9.3.
The head of a large Hollywood company has seen the following values of its
market share in the last six years:[s]
9.4.
For the Old Faithful geyser, the durations of eruption (X) were recorded,
with the interval to the next eruption (Y), both in minutes.
9.5.
Does the price of the first-class postal stamp follow a linear regression model,
or some other pattern?[t]
Year (since 1900) 32 58 63 68 71 74 75 78 81 85 88 91 95 99
Price (cents) 3 4 5 6 8 10 13 15 20 22 25 29 32 33
Year (since 1900) 101 102 106 108 111 112
Price (cents) 34 37 39 42 44 45
• A test for goodness-of-fit, that is, how well do the observed counts Xi fit
a given distribution.
• A test for independence, for which there are two classification categories
(variables), and we are testing the independence of these variables.
196 CHAPTER 10. CATEGORICAL DATA ANALYSIS
where $p_i$ is the probability that a subject will belong to group $i$ and $p_{0i}$, $i = 1, ..., k$
are given numbers. (Note that $\sum p_i = \sum p_{0i} = 1$, so that $p_k$ can actually be
obtained from the rest of the $p_i$’s.)
Our data (Observed counts) are the counts of each category in the sample, $X_1, X_2, ..., X_k$
such that $\sum_{i=1}^{k} X_i = n$. The total sample size is $n$. For $k = 2$ we would get $X_1$ =
number of successes, and $X_2 = n - X_1$ = number of failures, that is, the Binomial
distribution. For $k > 2$ we deal with the Multinomial distribution.
For testing $H_0$, we compare the observed counts $X_i$ to the ones we would expect
under the null hypothesis, that is, to the expected counts $E_i = n p_{0i}$.
To adjust for the size of each group, we would take the squared difference divided
by $E_i$, that is $(E_i - X_i)^2/E_i$. Adding up, we obtain the Chi-square statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(E_i - X_i)^2}{E_i} \qquad (10.1)$$
We would reject $H_0$ when the $\chi^2$ statistic is large (that is, the Observed counts are far
from the Expected counts). Thus, our test is always one-sided. To find the p-value,
use the $\chi^2$ upper-tail probability table, much as you would use the t-table. See Table C.
Figure 10.1: Chi-square densities (df = 2, 5, 10)
10.1. CHI-SQUARE GOODNESS-OF-FIT TEST 197
Example 10.1.
When studying earthquakes, we recorded the following numbers of earthquakes
(magnitude 1 and above on the Richter scale) for 7 consecutive days in January 2008.
Day           1      2      3      4      5      6      7   Total
Count        85     98     79    118    112    135    137     764
Expected  109.1  109.1  109.1  109.1  109.1  109.1  109.1     764
Here, n = 764. Is there evidence that the rate of earthquake activity changes
during this week?
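A sketch of the computation for this example follows. It evaluates the chi-square statistic of equation (10.1) with equal expected counts n/7 and takes df = k − 1 = 6, the usual degrees of freedom for a goodness-of-fit test with fully specified probabilities. For the p-value it uses a closed-form upper-tail formula that holds only for even df; the helper name `chi2_upper_tail_even_df` is our own, and in practice a table or statistical software would be used.

```python
from math import exp, factorial


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    # For df = 2m the upper tail equals P(Poisson(x/2) <= m - 1).
    return exp(-half) * sum(half ** j / factorial(j) for j in range(df // 2))


observed = [85, 98, 79, 118, 112, 135, 137]
n = sum(observed)
expected = n / len(observed)     # equal rate on every day under H0

chi2 = sum((expected - x) ** 2 / expected for x in observed)
df = len(observed) - 1
p_value = chi2_upper_tail_even_df(chi2, df)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.6f}")
```

The statistic is far above the df = 6 critical values in a chi-square table, so the data give strong evidence that the rate was not constant over the week.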
Example 10.2.
In this example, we will test whether a particular distribution matches our
experimental results. These are the data from the probability board (quincunx); we test
whether the distribution is really Binomial (as is often claimed). The slots are labeled
0-19. Some slots were merged together (why?)
Solution. The expected counts are computed using the Binomial(n = 19, p = 0.5)
distribution, and then multiplying by the Total = 100. For example,
$$E_9 = \binom{19}{9} 0.5^9 (1 - 0.5)^{19-9} \times 100 = 17.6$$
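The same calculation can be sketched for every slot at once. The function name `binomial_expected` and its default arguments are our own; they simply encode the Binomial(n = 19, p = 0.5) model and the total of 100 used above.

```python
from math import comb


def binomial_expected(k, n=19, p=0.5, total=100):
    """Expected count in slot k under the Binomial(n, p) model."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k) * total


e9 = binomial_expected(9)
all_expected = [binomial_expected(k) for k in range(20)]

print(f"E9 = {e9:.1f}")
print(f"E0 = {all_expected[0]:.5f}, E1 = {all_expected[1]:.5f}")
print(f"sum of expected counts = {sum(all_expected):.1f}")
```

Printing the expected counts for the extreme slots is a useful hint for the question of why some slots were merged.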
To test
H0 : Variable 1 is independent of Variable 2 vs
HA : Variable 1 is not independent of Variable 2
we can use the $\chi^2$ random variable with df = (r − 1)(c − 1), where the test statistic is
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(E_{ij} - X_{ij})^2}{E_{ij}} \qquad (10.2)$$
[2] These are not random variables in the sense of Chapter 3, because they are categorical,
not numerical.
Example 10.3.
Suppose that we ordered 50 components from each of the vendors A, B and C,
and the results are as follows
We would like to investigate whether all the vendors are equally reliable. That is,
Solution. We’ll put all the expected counts into the table
Expected counts:
             Succeeded   Failed   Total
_______________________________________
Vendor A     45          5        50
Vendor B     45          5        50
Vendor C     45          5        50
_______________________________________
Total        135         15       150
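The entries of this table can be reproduced from the marginal totals alone, as in the sketch below. It assumes the standard rule for the test of independence, expected count = (row total × column total) / grand total, and takes the totals from the table above. The dictionary names are our own.

```python
# Marginal totals from the expected counts table.
row_totals = {"Vendor A": 50, "Vendor B": 50, "Vendor C": 50}
col_totals = {"Succeeded": 135, "Failed": 15}
grand_total = sum(row_totals.values())

# Expected count for each cell under independence.
expected = {
    (row, col): r_tot * c_tot / grand_total
    for row, r_tot in row_totals.items()
    for col, c_tot in col_totals.items()
}

# Degrees of freedom for the chi-square test of independence.
df = (len(row_totals) - 1) * (len(col_totals) - 1)

for (row, col), value in expected.items():
    print(f"{row:9s} {col:10s} {value:5.1f}")
print(f"df = {df}")
```

With the observed counts in hand, the statistic in equation (10.2) is obtained by summing (expected − observed)² / expected over all six cells.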
Exercises
10.1.
In testing how well people can generate random patterns, the researchers asked
everyone in a group of 20 people to write a list of 5 random digits. The results are
tabulated below
[3] For this particular example, since df = 2, there is a more exact p-value calculation
based on the Exponential distribution: P(Y > 4) = exp(−4/2) = 0.1353. In general, we can
use the Excel function CHIDIST or other software to compute the exact p-values.
10.2. CHI-SQUARE TEST FOR INDEPENDENCE 201
Digits 0 1 2 3 4 5 6 7 8 9 Total
Observed 6 11 10 13 8 13 7 17 8 7 100
Are the digits completely random or do humans have preference for some particular
digits over the others?
10.2.
Forensic statistics. To uncover rigged elections, a variety of statistical tests might
be applied. For example, made-up precinct totals are sometimes likely to have an
excess of 0 or 5 as their last digits. For a city election, the observers counted that
21 precinct totals had the last digit 0, 18 had the last digit 5, while 102 had some
other last digit. Is there evidence that the elections were rigged?
10.3.
In an earlier example of the Poisson distribution, we discussed the number of Nazi
bombs hitting 0.5 × 0.5 km squares in London. The following were counts of squares
that have 0, 1, 2, ... hits:
Test whether the data fit the Poisson distribution (for $p_{01}, ..., p_{0k}$ use the Poisson
probabilities, with the parameter µ estimated as the average number of hits per square,
µ = 0.9288).
10.4.
To test the attitudes toward a tax reform, state officials collected data on the
opinions of likely voters, along with their income level.
              Income Level
          Low   Medium   High
For       182   213      203
Against   154   138      110
Do people with different incomes have significantly different opinions on tax
reform? (That is, test whether the Opinion variable is independent of the Income
variable.)
10.5.
Using the exponential distribution, confirm the calculation of chi-square (df = 2)
critical points from Table C for upper tail area α = 0.1 and α = 0.005. Find the point
for the χ2 (df = 2) distribution with α = 0.2.
Notes
[i] see e.g. http://forgetomori.com/2009/skepticism/seeing-patterns/
[j] see http://www.census.gov/hhes/www/cpstables/032010/perinc/new01_001.htm
[k] http://en.wikipedia.org/wiki/Heights_of_Presidents_of_the_United_States_and_presidential_candidates
[l] see http://www.readingonline.org/articles/bergman/wait.html
[m] “Student” [William Sealy Gosset] (March 1908). “The probable error of a mean”.
Biometrika 6 (1): 1-25.
[n] For example, see http://www.stat.columbia.edu/~gelman/book/data/light.asc
[o] example from “Statistical Sleuth”
[p] http://www.collegeboard.com/student/pay/add-it-up/4494.html
[q] Koteswara Rao Kadiyala (1970). “Testing for the independence of regression
disturbances”. Econometrica, 38, 97-117. Appears in: A Handbook of Small Data Sets,
D. J. Hand, et al., editors (1994). Chapman and Hall, London.
[r] from The R Book by Michael Crawley
[s] Mlodinow again. The director, Sherry Lansing, was subsequently fired only to see
several films developed during her tenure, including Men In Black, hit it big.
[t] see http://www.akdart.com/postrate.html