Statistics I
Consider this list of numbers:
5, 6, 2, 2, 2, 7
The mean of the list is the average:
(5 + 6 + 2 + 2 + 2 + 7) / 6 = 24 / 6 = [4]
The median is the number in the middle when the list is in order. For example, the median for 1,2,3,4,5 is 3.
For our particular list, which looks like
2,2,2,5,6,7
when ordered, there is no single middle number we can consider the median. When that happens, the median
is the average of the two middle numbers:
(2 + 5) / 2 = [3.5]
Now what if the list were 100 numbers long? How would you determine the median? Take half to get 50. The
50th and 51st numbers would be the ones in the middle you would average.
For an ordered list of 101 numbers, take half to get 50.5. Round up. The 51st number is the median.
Seems a little counterintuitive, right? If you find this hard to memorize, just keep the smallest case in your
back pocket. For a list of 3 numbers, the second one is obviously the median. How would we get this
mathematically? Take half of 3 to get 1.5. Round up to 2, which designates the second number. For a list of 4
numbers, the median is the average of the second and third numbers. Take half of 4 to get 2. This designates
the second and third numbers.
In both cases, we “rounded up.” When there was an odd number of numbers, we rounded 1.5 up to 2. When
there was an even number of numbers, we rounded 2 up to 3, which indicated that two numbers would
contribute to the median. This technique may seem a bit odd, but many students have found it helpful in
quickly finding the median of a large batch of numbers.
The mode is the number that shows up the most often. In our particular list, it’s [2].
The range is the difference between the biggest number in the list and the smallest number:
7-2=[5]
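The four measures above can be checked with a short Python sketch. It uses the standard `statistics` module. The helper `median_positions` is a hypothetical name introduced here to illustrate the "take half and round up" shortcut; it is not something the chapter defines.

```python
from statistics import mean, median, mode

numbers = [5, 6, 2, 2, 2, 7]

def median_positions(n):
    """Return the 1-based position(s) of the middle value(s) in an ordered list of n items."""
    if n % 2 == 1:
        # Odd count: half of n rounded up is the single middle position.
        return [(n + 1) // 2]
    # Even count: half of n and the next position are averaged.
    return [n // 2, n // 2 + 1]

list_mean = mean(numbers)                   # sum divided by count
list_median = median(numbers)               # sorts internally, averages the two middle values
list_mode = mode(numbers)                   # most frequent value
list_range = max(numbers) - min(numbers)    # biggest minus smallest

print(list_mean, list_median, list_mode, list_range)
print(median_positions(100), median_positions(101))
```

For a list of 100 numbers the helper points at the 50th and 51st positions, and for 101 numbers at the 51st, matching the shortcut described above.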
SAT & IG BOOK STORE 01227746409 223CHAPTER 25 STATISTICS I
The standard deviation is a measure of how spread out a list of numbers is. In other words, how much they
“deviate” from the mean. The standard deviation is lower when more numbers are closer to the mean. The
standard deviation is higher when more numbers are spread out away from the mean. For example, our
list
2,2,2,5,6,7
would have a higher standard deviation than the following list
5,5,5,5,6,7
because the second list is more tightly clustered around the mean. It turns out that the standard deviation
of our list is about 2.28 and the standard deviation of the second list is about 0.84. Don’t worry about how we got these
values; you'll never be asked to calculate the standard deviation on the SAT. Just know how to compare one
list’s standard deviation with another's as we just did.
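Although the SAT never asks for the calculation, the comparison can be verified with a minimal sketch. It assumes the quoted figures are sample standard deviations (dividing by n - 1), which is what `statistics.stdev` computes and what matches the quoted values to within rounding.

```python
from statistics import stdev

spread_out = [2, 2, 2, 5, 6, 7]   # the chapter's original list
clustered = [5, 5, 5, 5, 6, 7]    # the more tightly clustered list

# stdev() is the sample standard deviation (divides by n - 1); this is an assumption
# about how the book's values were obtained.
sd_spread = stdev(spread_out)
sd_clustered = stdev(clustered)

print(round(sd_spread, 2), round(sd_clustered, 2))
print(sd_spread > sd_clustered)
```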
EXAMPLE 1:
[Histogram: Daily Hours Spent Playing Sports. Horizontal axis: Hours; vertical axis: Number of students.]
The histogram above summarizes the daily number of hours spent playing sports for 80 students at a
school.
PART 1: What is the mean daily number of hours spent playing sports for the 80 students?
PART 2: What is the median daily number of hours spent playing sports for the 80 students?
Part 1 Solution: Sum up the total number of hours for every student. Then divide that by the number of
students.
Total hours / Number of students = ((0 × 5) + (1 × 35) + (2 × 15) + (3 × 25)) / 80 = 140 / 80 = [1.75]
Part 2 Solution: In a group of 80 students, the 40th and 41st students are the two in the middle (the histogram
already orders the students by their hours so we don’t have to). The first 5 students spend 0 hours playing
sports each day. The next 35 students spend 1 hour. This group includes the 40th student, so the 40th student
spends 1 hour. The next 15 students spend 2 hours. Now this group includes the 41st student, so the 41st
student spends 2 hours. Taking the average,
(Daily hours spent by 40th student + Daily hours spent by 41st student) / 2 = (1 + 2) / 2 = [1.5]
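Both parts can be reproduced from a frequency table, sketched below in Python. The counts for 0, 1, and 2 hours come from the solution text. The count of 25 students at 3 hours is an inference from the stated totals (80 students, 140 hours), so treat it as an assumption.

```python
from statistics import median

# hours -> number of students; the final entry (3 hours, 25 students) is inferred
hours_to_students = {0: 5, 1: 35, 2: 15, 3: 25}

# Expand the histogram into one entry per student, already in order.
expanded = [hours for hours, count in hours_to_students.items() for _ in range(count)]

total_students = len(expanded)
mean_hours = sum(expanded) / total_students   # total hours divided by number of students
median_hours = median(expanded)               # average of the 40th and 41st students

print(total_students, mean_hours, median_hours)
```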
EXAMPLE 2:
[Dot plot: number of flights taken in a year by 19 college students.]
The dot plot above summarizes the number of flights taken in a year by 19 college students. If the student
who took 6 flights in a year is removed from the data, which of the following correctly describes the
changes to the statistical measures of the data?
I. The mean decreases.
II. The median decreases.
III. The range decreases.
A) II only
B) I and II only
C) I and III only
D) I, II, and III
The student who took 6 flights in a year is called an outlier, an extreme data point that is far outside where
most of the data lies. Because this outlier is greater than the rest of the data, it brings the average (mean) up. It
also increases the range since there is a larger gap between the minimum (0) and the maximum (6).
When this outlier is removed, the mean decreases and the range decreases. The median, however, is unaffected.
To confirm this, let’s calculate it. Before the outlier is removed, there are 19 students, and the median is
represented by the 10th student, who took one flight. After the outlier is removed, there are 18 students, and
the median is represented by the 9th and 10th students, both of whom took one flight. So the median of 1 does
not change. And in fact, outliers typically affect the mean but not the median. Answer (C).
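The effect of removing an outlier can be seen in a small Python sketch. The dot plot's exact counts are not reproduced here, so the data below is hypothetical. It was chosen only to match the facts stated in the solution: 19 students, a minimum of 0, a single outlier at 6, and a median of 1 both before and after removal.

```python
from statistics import mean, median

# Hypothetical flight counts for 19 students (not the book's actual dot plot).
flights = [0] * 5 + [1] * 8 + [2] * 4 + [3] + [6]

def summarize(data):
    """Return (mean, median, range) for a list of numbers."""
    return mean(data), median(data), max(data) - min(data)

before = summarize(flights)

without_outlier = list(flights)
without_outlier.remove(6)          # drop the single student who took 6 flights
after = summarize(without_outlier)

print(before)
print(after)
```

With this data the mean and range both drop while the median stays at 1, which is the pattern behind answer (C).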
EXAMPLE 3: The average weight of a group of pandas is 200 pounds. Another panda, weighing 230
pounds, joins the group, raising the average weight of the entire group to 205 pounds. How many pandas
were in the original group?
Once in a while, you will get a word problem that involves averages. These questions have less to do with
statistics and more to do with algebra, but because we cover averages in this chapter, we decided to cover
these types of word problems here as well.
When dealing with average questions on the SAT, think in terms of sums or totals. You can always find the
sum by multiplying the average by the number of subjects.
Let the number of pandas in the original group be x. The total weight of the original group is then 200x. When
another panda joins the group, the number of pandas is x + 1 and the total weight is 205(x + 1).
Since that panda weighs 230 pounds,
200x + 230 = 205(x + 1)
200x + 230 = 205x + 205
5x = 25
x = 5
There were [5] pandas in the original group.
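The "think in totals" approach generalizes to any problem of this shape. The sketch below solves the same equation symbolically rearranged; the function name `original_group_size` is a hypothetical helper introduced for illustration.

```python
def original_group_size(old_avg, new_avg, newcomer_weight):
    """Solve old_avg * x + newcomer_weight = new_avg * (x + 1) for x."""
    # Rearranged: x * (new_avg - old_avg) = newcomer_weight - new_avg
    return (newcomer_weight - new_avg) / (new_avg - old_avg)

x = original_group_size(old_avg=200, new_avg=205, newcomer_weight=230)

# Check the answer by comparing the two ways of writing the new total weight.
total_from_old_group = 200 * x + 230
total_from_new_average = 205 * (x + 1)

print(x, total_from_old_group, total_from_new_average)
```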
EXAMPLE 4:
[Bar charts: Neighborhood A and Neighborhood B. Horizontal axis: Number of cars owned; vertical axis: Number of residents.]
The bar charts above summarize the number of cars that residents from two neighborhoods, A and B,
own. Which of the following correctly compares the standard deviation of the number of cars owned by
residents in each of the neighborhoods?
A) The standard deviation of the number of cars owned by residents in Neighborhood A is larger.
B) The standard deviation of the number of cars owned by residents in Neighborhood B is larger.
C) The standard deviation of the number of cars owned by residents in Neighborhood A and
Neighborhood B is the same.
D) The relationship cannot be determined from the information given.
Most of the data for Neighborhood B are at the ends and are much more spread out from the mean, which,
because the bar graph is symmetrical, we can estimate to be 3 cars. The data for Neighborhood A, on the
other hand, are more clustered towards the low end, where the mean is. Therefore, the standard deviation for
Neighborhood B is larger. Answer (B).
CHAPTER EXERCISE: Answers for this chapter start on page 310.
A calculator is allowed on the following questions.

The average height of 14 students in one class is 63 inches. The average height of 21 students in
another class is 68 inches. If the two classes are combined, what is the average height, in inches,
of the students in the combined class?
A) 64.5
B) 6[illegible]
C) 6[illegible]
D) 66.5

Kristie has taken five tests in science class. The average of all five of Kristie's test scores is 94.
The average of her last three test scores is 92. What is the average of her first two test scores?
A) [illegible]
B) 9[illegible]
C) [illegible]
D) 98

[Histogram: horizontal axis: Books read.]
The histogram above shows the number of books read last year by 20 editors at a publishing
company. Which of the following could be the median number of books read by the 20 editors?
[Answer choices illegible.]
[Dot plot: Miss World Titleholders. Horizontal axis: Age (years).]
The dotplot above shows the distribution of ages
for 24 winners of the Miss World beauty pageant
at the time they were crowned. Based on the
data, which of the following is closest to the
average (arithmetic mean) age of the winning
Miss World pageant contestant?
A) 19
B) 20
C) 21
D) 22
Locks are sections of canals in which the water
level can be mechanically changed to raise and
lower boats. The table below shows the number
of locks for 10 canals in France:
Name | # Locks
Aisne | 27
Alsace | 25
Rhone | 5
Centre | 30
Garonne | 23
Lalinde | 27
(3 32.
Ea
93
29
Removing which of the following two canals
from the data would result in the greatest
decrease in the standard deviation of the
number of locks in each canal?
A) Aisne and Lalinde
B) Alsace and Garonne
C) Centre and Midi
D) Rhone and Vosges
The tables below give the distribution of travel
times between two towns for Bus A and Bus B
over the same 40 days.
Bus A
Travel time (minutes) | Frequency
a 5
5 10
a [5 |
so 10
Bus B
Travel time (minutes) | Frequency
Bo | 5 |
30 10
35 15
40 10
Which of the following statements is true about
the data shown for these 40 days?
A) The standard deviation of travel times for
Bus A is smaller.
B) The standard deviation of travel times for
Bus B is smaller.
C) The standard deviation of travel times is
the same for Bus A and Bus B.
D) The standard deviation of travel times for
Bus A and Bus B cannot be compared with
the data provided.
[Bar chart: horizontal axis: Weight (in pounds); vertical axis: Number of kayaks.]
The bar chart above shows the distribution of
weights (to the nearest pound) for 19 kayaks
made by Company A and 19 kayaks made by
Company B. Which of the following correctly
compares the median weight of the kayaks made
by each company?
A) The median weight of the kayaks made by
Company A is smaller.
B) The median weight of the kayaks made by
Company B is smaller.
C) The median weight of the kayaks is the
same for both companies.
D) The relationship cannot be determined
from the information given.
Temperature (°F) | Frequency
60. 3
a 4
6 4
a 10
7 7
The table above gives the distribution of low
temperatures for a city over 28 days. What is the
median low temperature, in degrees Fahrenheit
(°F), of the city for these 28 days?
A shoe store surveyed a random sample of 50
customers to better estimate which shoe sizes
should be kept in stock. The store found that the
median shoe size of the customers in the sample
is 10 inches. Which of the following statements
must be true?
A) The sum of all the shoe sizes in the sample
is 500 inches.
B) The average of the smallest shoe size and
the largest shoe size in the sample is 10
inches
C) The difference between the smallest shoe
size and the largest shoe size in the sample
is 10 inches.
D) At least half of the customers in the sample
have shoe sizes greater than or equal to 10
inches.
A food company hires an independent research
agency to determine its product's shelf life, the
length of time it may be stored before it expires.
Using a random sample of 40 units of the
product, the research agency finds that the
product's shelf life has a range of 3 days. Which
of the following must be true about the units in
the sample?
A) All the units expired within 3 days.
B) The unit with the longest shelf life took 3
days longer to expire than the unit with the
shortest shelf life.
C) The mean shelf life of the units is 3 more
than the median.
D) The median shelf life of the units is 3 more
than the mean.
[Graph: frequency distribution. Horizontal axis: Integers, from 5 to 10.]
The graph above shows the frequency
distribution of a list of randomly generated
integers between 5 and 10. Which of the
following correctly gives the mean and the range
of the list of integers?
A) Mean = 7.6, Range = 4
B) Mean = 7.6, Range = 5
C) Mean = 8.2, Range = 4
D) Mean = 8.2, Range = 5
Quiz | 1 | 2 | 3 | 4 | 5 | 6 | 7
Score | 87 | 75 | 90 | 85 | 98 | 87 | 91
The table above shows the scores for Jay’s first
seven math quizzes. Which of the following are
true about his scores?
I. The mode is greater than the median.
II. The median is greater than the mean.
III. The range is greater than 20.
A) I only
B) III only
C) II and III
D) I, II, and III
[Bar chart: School A and School B. Horizontal axis: Number of films shown; vertical axis: Number of classes.]
The bar chart above shows the number of films
shown in class over the past year for 19 classes
in School A and 15 classes in School B. Which of
the following correctly compares the mean and
median number of films shown in each class for
the two schools?
A) The mean and median number of films
shown in each class are both greater in
School A.
B) The mean and median number of films
shown in each class are both greater in
School B.
C) The mean number of films shown in each
class is greater in School A, but the median
is the same in both schools.
D) The mean and median number of films
shown in each class are the same in both
schools.
Calories in Meals
500 | 500 | 520 | 550 | 550
550 | 550 | 600 | 600 | 900
The table above lists the number of calories in
each of Mary's last 10 meals. If a 900-calorie
meal that she had today is added to the values
listed, which of the following statistical
measures of the data will not change?
I. Median
II. Mode
III. Range
A) I and II only
B) I and III only
C) II and III only
D) I, II, and III
[Dot plot: horizontal axis: Gas mileage (miles per gallon), from 21 to 30.]
The dotplot above gives the gas mileage (in
miles per gallon) of 15 different cars. If the dot
representing the car with the greatest gas
mileage is removed from the dotplot, what will
happen to the mean, median, and standard
deviation of the new data set?
A) Only the mean will decrease.
B) Only the mean and standard deviation will
decrease.
C) Only the mean and median will decrease.
D) The mean, median, and standard deviation
will decrease.
Snowfall (in inches)
45 | 48 | 49 | 50 | 52 | 54
55 | 57 | 57 | 57 | 58 | 59
60 | 60 | 61 | 61 | 65 | 90
The table above lists the amounts of snowfall, to
the nearest inch, experienced by 18 different
cities in the past year. The outlier measurement
of 90 inches is an error. Of the mean, median,
and range of the values listed, which will change
the most if the 90-inch measurement is replaced
by the correct measurement of 20 inches?
A) Mean
B) Median
C) Range
D) None of them will change.
Statistics II
The goal of statistics is to be able to make predictions and estimations based on limited time and information.
For example, a statistician might want to estimate the mean weight of all female raccoons in the United
States. The problem is that it’s impossible to survey the entire female raccoon population. In fact, by the
time that could be accomplished, not only would the data be out of date but there would be new females in
the population. Instead, a statistician takes a random sample of female raccoons to make an estimation of
what the actual mean might be. In other words, the sample mean is used to estimate the population mean.
Using a sample to predict something about the entire population is a common theme in statistics and in SAT
questions.
EXAMPLE 1: A pet food store chose 1,000 customers at random and asked each customer how many pets
[Table: number of pets owned, with categories up to "4 or more", versus number of customers.]
There are a total of 18,000 customers in the store's database. Based on the survey data, what is the expected
total number of customers who own 2 pets?
Using the sample data, we can estimate the total number who own 2 pets to be
m 5g
18.000 2%, =f
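The estimate works by scaling the sample proportion up to the full population. Here is a minimal Python sketch of that step. The sample size (1,000) and database size (18,000) come from the example, but the survey table is not reproduced here, so the count of 250 two-pet owners is a hypothetical placeholder and the printed figure is not the book's answer.

```python
def expected_in_population(sample_count, sample_size, population_size):
    """Scale a count observed in a random sample up to the whole population."""
    return population_size * sample_count / sample_size

# 250 is a made-up count used only to show the arithmetic.
hypothetical_two_pet_owners = 250
estimate = expected_in_population(hypothetical_two_pet_owners, 1_000, 18_000)

print(estimate)
```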
EXAMPLE 2:
[Scatterplot with line of best fit. Horizontal axis: Heart rate (beats per minute); vertical axis: Oxygen Uptake (liters per minute).]
The scatterplot above shows the relationship between heart rate and oxygen uptake at 16 different points
during Kyle's exercise routine. The line of best fit is also shown.
PART 1: Based on the line of best fit, what is Kyle's predicted oxygen uptake at a heart rate of 110 beats
per minute?
PART 2: What is the oxygen uptake, in liters per minute, of the measurement represented by the data
point that is farthest from the line of best fit?
Part 1 Solution: Using the line of best fit, we can see that at a heart rate of 110 beats per minute (along the
x-axis), the oxygen uptake is [1.5] liters per minute.
Using the line of best fit to make a prediction can be dangerous, especially when
• we are making a prediction outside the scope of our data set (predicting the oxygen uptake at a heart rate
of 250 beats per minute, for example, when you'd probably be dead).
• there are outliers that may heavily influence the line of best fit (see Part 2).
• the data is better modeled by a quadratic or exponential curve rather than a linear one. In this case, a
linear model looks to be the right one, but something like compound interest may look linear at first even
though it’s exponential growth.
Part 2 Solution: From the scatterplot, we can see that the data point farthest away from the line of best fit is at
118 along the x-axis. The point represents an oxygen uptake of [2.5] liters per minute.
Note that this data point is likely an outlier, which can heavily influence the line of best fit and throw off our
predictions. Outliers should be removed from the data if they represent special cases or exceptions.
Not only will you be asked to make predictions using the line of best fit, but you'll also be asked to interpret
its slope and y-intercept. We'll use the data from this example in the next one to show you how these concepts
are tested.
EXAMPLE 3:
[Scatterplot with line of best fit: Oxygen Uptake versus Heart Rate. Horizontal axis: Heart rate (beats per minute); vertical axis: Oxygen Uptake (liters per minute).]
The scatterplot above shows the relationship between heart rate and oxygen uptake at 16 different points
during Kyle's exercise routine. The line of best fit is also shown.
PART 1: Which of the following is the best interpretation of the slope of the line of best fit in the context
of this problem?
A) The predicted increase in Kyle's oxygen uptake, in liters per minute, for every one beat per minute
increase in his heart rate
B) The predicted increase in Kyle's heart rate, in beats per minute, for every one liter per minute
increase in his oxygen uptake
C) Kyle's predicted oxygen uptake in liters per minute at a heart rate of 0 beats per minute
D) Kyle's predicted heart rate in beats per minute at an oxygen uptake of 0 liters per minute
PART 2: Which of the following is the best interpretation of the y-intercept of the line of best fit in the
context of this problem?
A) The predicted increase in Kyle's oxygen uptake, in liters per minute, for every one beat per minute
increase in his heart rate
B) The predicted increase in Kyle's heart rate, in beats per minute, for every one liter per minute
increase in his oxygen uptake
C) Kyle's predicted oxygen uptake in liters per minute at a heart rate of 0 beats per minute
D) Kyle's predicted heart rate in beats per minute at an oxygen uptake of 0 liters per minute
Part 1 Solution: As we learned in the linear model questions in the interpretation chapter, the slope is the
increase in y (oxygen uptake) for each increase in x (heart rate). The only difference now is that it’s a predicted
increase. The answer is (A).
Part 2 Solution: The y-intercept is the value of y (oxygen uptake) when x (the heart rate) is 0. The answer
is (C). Note that this value would have no significance in real life since you would be dead at a heart rate of 0.
This again illustrates the danger of predicting values outside the scope of the sample data.
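To make the slope and y-intercept concrete, here is a minimal Python sketch that fits a least-squares line by hand. The five data points are hypothetical stand-ins, not Kyle's 16 measurements, and ordinary least squares is assumed as the method behind the "line of best fit".

```python
# Hypothetical (heart rate, oxygen uptake) measurements for illustration only.
heart_rates = [90, 100, 110, 120, 130]       # x, beats per minute
oxygen_uptakes = [1.0, 1.2, 1.5, 1.7, 2.0]   # y, liters per minute

n = len(heart_rates)
mean_x = sum(heart_rates) / n
mean_y = sum(oxygen_uptakes) / n

# Least-squares slope: predicted change in y for each one-unit increase in x.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heart_rates, oxygen_uptakes))
s_xx = sum((x - mean_x) ** 2 for x in heart_rates)
slope = s_xy / s_xx

# y-intercept: predicted y when x is 0, far outside the range of the data.
intercept = mean_y - slope * mean_x

def predict(heart_rate):
    """Predicted oxygen uptake from the fitted line."""
    return slope * heart_rate + intercept

print(slope, intercept, predict(110))
```

With this made-up data the intercept comes out negative, an impossible oxygen uptake, which shows why a prediction at a heart rate of 0 should not be taken literally.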
EXAMPLE 4: Malden is a town in the state of Massachusetts. A real estate agent randomly surveyed 50
apartments for sale in Malden and found that the average price of each apartment was $150,000. Another
real estate agent intends to replicate the survey and will attempt to get a smaller margin of error. Which
of the following samples will most likely result in a smaller margin of error for the mean price of an
apartment in Malden, Massachusetts?
A) 30 randomly selected apartments in Malden
B) 20 randomly selected apartments in all of Massachusetts
C) 80 randomly selected apartments in Malden
D) 80 randomly selected apartments in all of Massachusetts
The answer is (C). The margin of error refers to the room for error we give to an estimate. For example, we
could say the mean price of an apartment in Malden is $150,000 with a margin of error of $10,000. This implies
that the true mean price of all apartments in Malden is likely between $140,000 and $160,000. This interval is
called a confidence interval (see Example 6).
To get a smaller margin of error in Example 4, we should first only select from apartments in Malden. Selecting
apartments from all of Massachusetts not only introduces more variability to the data but also strays from the
original intent of the survey, which is to find the average price of Malden apartments. Secondly, we should use
a larger sample size. This is common sense. The more apartments we survey, the more accurate our data and
our estimations are and the lower our margin of error is.
In fact, the margin of error for any estimate from an experiment depends on two factors:
• Sample size
• Variability in the data (often measured by standard deviation)
The larger the sample size and the less variable the data is, the lower the margin of error. We typically can’t
control the standard deviation of the data (how spread out it is), but we can control the sample size. So why
don’t researchers always use huge sample sizes? Because it’s too costly and time-consuming to gather data
from everyone and everywhere.
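The two factors can be illustrated with the common textbook approximation margin = z × s / √n, where s is the standard deviation and n is the sample size. This formula is not given in the chapter and is not needed for the SAT; the sketch below assumes it, along with z = 1.96 for 95% confidence and a hypothetical standard deviation of $35,000 for apartment prices.

```python
from math import sqrt

def margin_of_error(sample_sd, sample_size, z=1.96):
    """Approximate margin of error for a sample mean (assumed formula: z * s / sqrt(n))."""
    return z * sample_sd / sqrt(sample_size)

assumed_sd = 35_000  # hypothetical spread of apartment prices, in dollars

# Larger samples give smaller margins of error for the same variability.
margins = {n: margin_of_error(assumed_sd, n) for n in (30, 50, 80)}
for n, margin in margins.items():
    print(n, round(margin))

# More variable data (for example, sampling all of Massachusetts) widens the margin.
print(round(margin_of_error(2 * assumed_sd, 80)))
```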
EXAMPLE 5: Researchers conducted an experiment to determine whether exercise improves student
exam scores. They randomly selected 200 students who exercise at least once a week and 200 students
who do not exercise at least once a week. After tracking the students’ academic performances for a year,
the researchers found that the students who exercise at least once a week performed significantly better
on the same exams than the students who do not. Based on the design and results of the study, which of
the following is an appropriate conclusion?
A) Exercising at least once a week is likely to improve exam scores.
B) Exercising three times a week improves exam scores more than exercising just once a week.
C) Any student who starts exercising at least once a week will improve his or her exam scores.
D) There is a positive association between exercise and student exam scores.
This question deals with a classic case of association (also called correlation) vs. causation. Just because
students who exercise got better exam scores doesn’t mean that exercise causes an improvement in exam
scores. It’s just associated with an improvement in exam scores. Perhaps students who exercise just have more
discipline or they have more demanding parents who make them study harder. Due to the way the experiment
was designed, we can’t tell what the underlying factor is.
Therefore, answer (A) is wrong because it implies causation. Answer (B) is wrong because it not only implies
causation but also implies that the frequency of exercise matters, something that wasn’t tracked in the experiment.
Answer (C) is wrong because it suggests a completely certain outcome. Even if exercise DID improve exam
scores, not every single student who starts exercising will improve their scores. There might be students for
whom exercising makes their scores worse. Any conclusion drawn from sample data is a generalization and
should not be regarded as a truth for every individual.
The answer is (D). There is a positive association between exercise and student exam scores.
One of the things the researchers did correctly was to take random samples from each group. The key word is
random. If the samples weren’t random, we wouldn’t even have been able to conclude that there is a positive
association between exercise and exam scores. Why? Let's say the researchers picked 30 students from the
tennis team for the exercise group and 30 students who just play video games all day for the non-exercise
group. Definitely not random. Now, did the exercise group do better on their exams because they exercise
or because they play tennis? Or was it the video games that made the non-exercise group perform worse?
Because the selection isn’t random, we can't tell how each factor influences the result. When the selection is
random, all the factors except the one we're testing are “averaged out.”
Now what if the researchers wanted to see whether exercise does indeed cause an improvement in exam scores?
What should they have done differently? The answer is random assignment. Instead of randomly selecting 200
students from one group that already exercises regularly and 200 students from another group that does not,
they should have just randomly selected 400 students. The next step would be to randomly assign each student
to exercise or not. Everyone in the exercise group is forced to exercise at least once a week and everyone in the
non-exercise group is not allowed to exercise. If the exercise group performs better on the exams, then we can
conclude that exercise causes an improvement in exam scores. Of course, conducting this type of experiment
can be extremely difficult, which is why proving causation can be such a monumental task.
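Random assignment as described above can be sketched in a few lines of Python. The student labels and the helper name `randomly_assign` are hypothetical; the point is only that a shuffle, not the students' existing habits, decides who lands in which group.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)      # seed only makes the example repeatable
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# 400 randomly selected students, represented here by placeholder labels.
students = [f"student_{i}" for i in range(400)]
exercise_group, no_exercise_group = randomly_assign(students, seed=1)

print(len(exercise_group), len(no_exercise_group))
```

Because chance alone determines the groups, traits such as discipline or demanding parents should be spread roughly evenly across both, which is what allows a cause and effect conclusion.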
The following list summarizes the conclusions you can draw from different experimental designs involving
two variables (e.g. exercise and exam scores).
1. Subjects not selected at random & Subjects not randomly assigned
• Results cannot be generalized to the population.
• Cause and effect cannot be proven.
• Example: Researchers want to see whether medication X is effective in treating the flu. People
with the flu from Town A receive medication X. People with the flu from Town B receive a placebo
(sugar pill). More people in the medication X group experience a reduction in flu symptoms. The
generalization that medication X is associated with a reduction in flu symptoms cannot be made
since it was only tested in Town A and Town B (sample was not randomly selected from the general
population). There may be something special about Town A and Town B. No cause and effect
relationship can be established because the medication was not randomly assigned. Perhaps Town
A experienced a less severe flu epidemic.
2. Subjects not selected at random & Subjects randomly assigned
• Results cannot be generalized to the population.
• Cause and effect can be proven.
• Example: Researchers want to see whether medication X is effective in treating the flu. People
with the flu from Town A and Town B are randomly assigned to either medication X or a placebo
(sugar pill). More people in the medication X group experience a reduction in flu symptoms. The
generalization that medication X is effective for everyone cannot be made since it was only tested
in Town A and Town B (sample was not randomly selected from the general population). Perhaps
only one particular strain of the flu exists in Town A and Town B. A cause and effect relationship
can be established because the medication was randomly assigned. For the people in Town A and
Town B, we can conclude that medication X causes a reduction in flu symptoms. Note that this is
still just a generalization: as with any other medication, medication X does not guarantee you will
definitely get better, even if you live in Town A or Town B.
3. Subjects selected at random & Subjects not randomly assigned
• Results can be generalized to the population.
• Cause and effect cannot be proven.
• Example: Researchers want to see whether medication X is effective in treating the flu. People
with the flu from the general population are randomly selected. They are given the choice of a
new medication (medication X) or a traditional medication (really a sugar pill). More people in the
medication X group experience a reduction in flu symptoms. We can generalize that people who
choose to receive medication X fare better than those who don’t. However, no cause and effect
relationship can be established because the medication was not randomly assigned. We don't know
whether the reduction in symptoms is due to the medication or a difference between those who
volunteered and those who didn’t.
4. Subjects selected at random & Subjects randomly assigned
• Results can be generalized to the population.
• Cause and effect can be proven.
• Example: Researchers want to see whether medication X is effective in treating the flu. People
with the flu from the general population are randomly selected. Using a coin toss (heads or tails),
researchers randomly assign each person to either medication X or a placebo (sugar pill). More
people in the medication X group experience a reduction in flu symptoms. We can conclude that
medication X causes a reduction in flu symptoms. This conclusion can be generalized to the entire
population of people with the flu.
EXAMPLE 6: Environmentalists are testing pH levels in a forest that is being harmed by acid rain. They
analyzed water samples from 40 rainfalls in the past year and found that the mean pH of the water samples
has a 95% confidence interval of 3.2 to 3.8. Which of the following conclusions is the most appropriate
based on the confidence interval?
A) 95% of all the forest rainfalls in the past year have a pH between 3.2 and 3.8.
B) 95% of all the forest rainfalls in the past decade have a pH between 3.2 and 3.8.
C) It is plausible that the true mean pH of all the forest rainfalls in the past year is between 3.2 and 3.8.
D) It is plausible that the true mean pH of all the forest rainfalls in the past decade is between 3.2 and
3.8.
If you don’t know what a confidence interval is, don’t worry. You'll never need to calculate one and the SAT
makes these questions very easy. All a confidence interval does is tell you where the true mean (or some other
statistical measure) for the population is likely to be (e.g. between 3.2 and 3.8). Even though the SAT only
brings up 95% confidence intervals, there are 97% and 99% (any percentage) confidence intervals. The higher
the confidence, the more likely the true mean falls within the interval. So in the example above, we can be
quite confident that the true mean pH of all the forest rainfalls in the past year is between 3.2 and 3.8. Answer
(C). The answer is not (D) because we cannot draw conclusions about the past decade when all the samples
were gathered from the past year.
A confidence interval does NOT say anything about the rainfalls themselves. You cannot say that any one
rainfall has a 95% chance of having a pH between 3.2 and 3.8, and you cannot say that 95% of all the forest
rainfalls in the past year had a pH between 3.2 and 3.8. Always remember that a confidence interval applies
only to the mean, which is a statistical measurement, NOT an individual data point or a group of data
points.
Secondly, a 95% confidence interval does not imply that there is a 95% chance it contains the true mean. Even
though confidence intervals are computed for the mean, you cannot say that the interval of 3.2 to 3.8 has a 95%
chance of containing the true mean pH.
So what does it mean in statistics to be 95% confident in something? If the experiment were repeated again and
again, each with 40 water samples, 95% of those experiments would give us a confidence interval that contains
the true mean. In other words, the confidence interval given in the example is the result of just one experiment.
Another run of the same experiment (another 40 samples) would produce a different confidence interval. Keep
on getting these confidence intervals and 95% of them will contain the true mean. So the 95% pertains to all
the confidence intervals generated by repeated experiments, NOT the chance that any one confidence interval
contains the true mean. Again, don’t worry about how confidence intervals are calculated, but be aware that
this is how “confidence” is defined in statistics.
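If you'd like to see this definition in action, here is a minimal Python sketch that repeats the experiment many times and counts how often the resulting interval contains the true mean. Everything specific in it is an assumption made up for illustration and not taken from the example: the true mean pH of 3.5, the spread of 0.9, the normally distributed data, and the t value of 2.023 (an approximate 95% critical value for 40 samples). You will never have to do this on the SAT.

```python
import random
import statistics

TRUE_MEAN = 3.5     # hypothetical true mean pH of the whole population
TRUE_SD = 0.9       # hypothetical spread of individual rainfall pH values
SAMPLE_SIZE = 40    # water samples per experiment, as in the example
T_CRITICAL = 2.023  # assumed t value for 95% confidence, 39 degrees of freedom


def one_interval(rng):
    """Run one experiment and return its 95% confidence interval for the mean."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE)]
    mean = statistics.mean(sample)
    margin = T_CRITICAL * statistics.stdev(sample) / SAMPLE_SIZE ** 0.5
    return mean - margin, mean + margin


def coverage(trials, seed=0):
    """Fraction of repeated experiments whose interval contains TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        low, high = one_interval(rng)
        if low <= TRUE_MEAN <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Share of intervals containing the true mean: {coverage(2000):.3f}")
```

Each run of `one_interval` gives a different interval, and some of them miss the true mean entirely. Over many runs, the share that contain it should settle near 95%, which is exactly what the "95%" refers to.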
CHAPTER EXERCISE: Answers for this chapter start on page 312.
A calculator is allowed on the following questions.
[Scatterplot with line of best fit. x-axis: Age (years)]
The scatterplot above shows the relationship
between age, in years, and shoe size for 24 males
between 10 and 20 years old. The line of best fit
is also shown. Based on the data, how many 19
year old males had a shoe size greater than the
one predicted by the line of best fit?
A) 1
B) 2
C) 3
D) 4

[Scatterplot: Traffic Light Violations in Various Towns, with line of best fit. x-axis: Number of traffic lights; y-axis: Average weekly number of traffic light violations]
The scatterplot above shows the number of
traffic lights in 15 towns and the average weekly
number of traffic light violations that occur in
each town. The line of best fit is also shown.
Based on the line of best fit, which of the
following is the predicted average weekly
number of traffic light violations in a town with
[...] traffic lights?
A) 40
A university wants to determine the dietary
preferences of the students in its freshman class.
Which of the following survey methods is most
likely to provide the most valid results?
A) Selecting a random sample of 600 students
from the university
B) Selecting a random sample of 300 students
from the university's freshman class
C) Selecting a random sample of 600 students
from the university's freshman class
D) Selecting a random sample of 600 students
from one of the university's freshman
dining halls
Two candidates are running for governor of a
state. A recent poll reports that out of a random
sample of 250 voters, 110 support Candidate A
and 140 support Candidate B. An estimated
500,000 state residents are expected to vote on
election day. According to the poll, Candidate B
is expected to receive how many more votes
than Candidate A?
A) 60,000
B) 130,000
C) 220,000
D) 280,000
[Scatterplot: Consumer Behavior during Store Sales, with line of best fit. x-axis: Store Discount (%); y-axis: Average shopping time (minutes)]
Shopping time refers to the time a customer
spends in one store. The scatterplot above shows
the average shopping time, in minutes, of
customers at 26 different stores offering various
discounts. The line of best fit is also shown.
Which of the following is the best interpretation
of the meaning of the y-intercept of the line of
best fit?
A) The predicted average shopping time, in
minutes, of customers at a store offering no
discount
B) The predicted average shopping time, in
minutes, of customers at a store offering a
50% discount
C) The predicted increase in the average
shopping time, in minutes, for each one
percent increase in the store discount
D) The predicted average number of
customers at a store offering no discount
[Scatterplot: Advertising for 16 Companies, with line of best fit. x-axis: Advertising Expenses (in thousands of dollars); y-axis: Revenue (in thousands of dollars)]
The scatterplot above shows the relationship
between revenue and advertising expenses for
16 companies. The line of best fit is also shown.
Which of the following is the best interpretation
of the meaning of the slope of the line of best fit?
A) The expected increase in revenue for every
one dollar increase in advertising expenses
B) The expected increase in revenue for every
one thousand dollar increase in advertising
expenses
C) The expected increase in advertising
expenses for every one thousand dollar
increase in revenue
D) The expected revenue of a company that
has no advertising expenses

[Scatterplot: Movie Length versus Box Office Sales, with line of best fit. x-axis: Movie Length (minutes); y-axis: Box Office Sales (in millions of dollars)]
The scatterplot above plots the lengths of 15
movies against their box office sales. The line of
best fit is also shown. Which of the following is
the best interpretation of the meaning of the
slope of the line of best fit?
A) The expected decrease in box office sales
per minute increase in movie length
B) The expected increase in box office sales per
minute increase in movie length
C) The expected decrease in box office sales
per 10-minute increase in movie length
D) The expected increase in box office sales per
10-minute increase in movie length
[Scatterplot: Mistakes Made in Incentive-based Task, with line of best fit. x-axis: Prize (in dollars); y-axis: Number of mistakes made]
In a psychological study, researchers asked
participants to each complete a difficult task for
a cash prize, the amount of which varied from
participant to participant. The results of the
study, as well as the line of best fit, are shown in
the scatterplot above. Which of the following is
the best interpretation of the meaning of the
y-intercept of the line of best fit?
A) The expected decrease in the number of
mistakes made per dollar increase in the
cash prize
B) The expected increase in the number of
mistakes made per dollar increase in the
cash prize
C) The expected dollar amount of the cash
prize required for a person to complete the
task with 0 mistakes
D) The expected number of mistakes a person
makes in completing the task when no cash
prize is offered

[Scatterplot: Fat and Calories of Ice Cream, with line of best fit. x-axis: Total fat (grams); y-axis: Total calories]
The scatterplot above shows the fat content and
calorie counts of 8 different cups of ice cream.
Based on the line of best fit to the data shown,
what is the expected increase in the number of
calories for each additional gram of fat in a cup
of ice cream?
A) 5
B) 8
C) 20
D) 40
[Scatterplot with line of best fit. x-axis: Amount of nitrogen applied (pounds per acre)]
The scatterplot above shows the amount of
nitrogen fertilizer applied to 8 oat fields and
their yields. The line of best fit is also shown.
Which of the following is closest to the amount
of nitrogen applied, in pounds per acre, to the
oat field whose yield is best predicted by the line
of best fit?
A) 200
B) 350
C) 400
D) 450

[Scatterplot: Food Courts in Various Malls, with line of best fit. x-axis: Number of restaurants]
The scatterplot above shows the distribution of
[...]
According to the data, what is the total number
of seats at the food court represented by the data
point that is farthest from the line of best fit?
A) 200
B) 240
C) 320
D) 560
Researchers must conduct an experiment to see
whether a new vaccine is effective in relieving
certain allergies. They have selected a random
sample of 100 allergy patients. Some of the
patients are assigned to the new vaccine while
the rest are assigned to the traditional treatment.
Which of the following methods of assigning
each patient's treatment is most likely to lead to
a reliable conclusion about the effectiveness of
the new vaccine?
A) Females are assigned to the new vaccine.
B) Those who have more than one allergy are
assigned to the new vaccine.
C) The patients divide themselves evenly into
two groups. A coin is tossed to decide
which group receives the new vaccine.
D) Each patient is assigned a random number.
Those with an even number are assigned to
the new vaccine.
A basketball manufacturer selects a random
sample of its basketballs each week to ensure a
consistent air pressure within them is
maintained. In Week 1, the sample had a mean
air pressure of 8.2 psi (pounds per square inch)
and a margin of error of 0.1 psi. In Week 2, the
sample had a mean air pressure of 7.7 psi and a
margin of error of 0.3 psi. Based on these results,
which of the following is a reasonable
conclusion?
A) Most of the basketballs produced in Week 1
had an air pressure under 8.2 psi, whereas
most of the basketballs produced in Week 2
had an air pressure under 7.7 psi.
B) The mean air pressure of all the basketballs
produced in Week 1 was 0.5 psi more than
the mean air pressure of all the basketballs
produced in Week 2.
C) The number of basketballs in the Week 1
sample was more than the number of
basketballs in the Week 2 sample.
D) It is very likely that the mean air pressure
of all the basketballs produced in Week 1
was less than the mean air pressure of all
the basketballs produced in Week 2.
A student is assigned to conduct a survey to
determine the mean number of servings of
vegetables eaten by a certain group of people
each day. The student has not yet decided which
group of people will be the focus of this survey.
Selecting a random sample from which of the
following groups would most likely give the
smallest margin of error?
A) Residents of the same city
B) Customers of a certain restaurant
C) Viewers of the same television show
D) Students who are following the same daily
diet plan
The length of a blue-spotted salamander’s tail
can be used to estimate its age. A biologist
selects 80 blue-spotted salamanders at random
and finds that the average length of their tails
has a 95% confidence interval of 5 to 6 inches.
Which of the following conclusions is the most
appropriate based on the confidence interval?
A) 95% of all blue-spotted salamanders have a
tail that is between 5 and 6 inches in length.
B) 95% of all salamanders have a tail that is
between 5 and 6 inches in length.
C) The true average length of the tails of all
blue-spotted salamanders is likely between
5 and 6 inches.
D) The true average length of the tails of all
salamanders is likely between 5 and 6
inches.
An economist conducted research to determine
whether there is a relationship between the price
of food and population density. He collected
data from a random sample of 100 U.S. cities and
found significant evidence that the price of food
is lower in places with a high population
density. Which of the following conclusions is
best supported by these results?
A) In U.S. cities, there is a positive association
between the price of food and population
density.
B) In U.S. cities, there is a negative association
between the price of food and population
density.
C) In U.S. cities, a decrease in the price of food
is caused by an increase in the population
density.
D) In U.S. cities, an increase in the population
density is caused by a decrease in the price
of food.