
Correlation Is Not Causation !!

1) This lesson teaches students how to determine if two variables are correlated or if one causes changes in the other by exploring scatter plots, correlation, causation, and regression to the mean. 2) Part 1 discusses correlation and scatter plots to explore if two variables are related. Part 2 covers determining causation using a causation checklist. 3) Part 3 explains that variables often change over time due to regression to the mean rather than one directly causing the other. 4) Part 4 has students apply what they've learned to assess claims about relationships between variables in case studies.

Uploaded by

danielmugabo

CORRELATION vs CAUSATION TEACHER GUIDE

Does Ice Cream Make the Sun Shine?


TEACHER INFORMATION
CORRELATION vs CAUSATION: DOES ICE CREAM MAKE THE SUN SHINE?

Level: High School
Mathematics Area: Statistics and Probability (CCSS.MATH.CONTENT.HSS.ID.C.9)
Decision Education Area: Dispositions and Knowledge (Statistical Reasoning)
Topics covered: Scatter Plots, Correlation, Causation, Regression to the Mean, Assessing Claims of Causation
Delivery Time: 90 minutes (or 2 x 45 minutes). Can be shortened by discussing questions in pairs or as a group, in place of creating written answers

Equipment required:
Pupil Guide – 1 per pupil
Computer with projector - 'Pupil Guide' PDF also serves as lesson presentation slides
Dice – 1 per pupil
Counters x 10 (to represent 'speed cameras')

Helpful Links:
Scatter Plot Tool - https://scatterplot.online/
The Simpsons Bear Patrol Clip - https://tinyurl.com/y9htbpoc

Lesson Objectives:
How to work out if two variables are related
How to work out if one variable causes change in the other variable
How to use these skills to take better decisions in our lives

Part 1: Correlation
This section introduces the tools we use to explore whether two variables are related (scatter plots, correlation strength/type)

Part 2: Causation
This section introduces 'The Causation Checklist', highlighting helpful criteria we can use to investigate whether one variable actually drives the change in the other variable

Part 3: Regression to the Mean
This section explores why variables typically change anyway over time, with extremes returning to average, so we must be careful when suggesting a cause for such changes

Part 4: Decision Scope
This section provides students with the opportunity to use their knowledge of correlation and causation to assess claims in given case studies
OVERVIEW
WE'LL EXPLORE...
... whether ice cream makes the sun shine,
whether speed cameras really reduce car accident numbers,
and how to tell if your lucky streak is about to run out

LESSON OBJECTIVES...

How to work out if two variables are related
How to work out if one variable CAUSES the change in the other variable
How to use these skills to take better decisions in our lives
Check that students know what a VARIABLE is:

a quantity that can vary over time


CONTENTS :  
PART 1: CORRELATION - Looking for Links

PART 2: CAUSATION - Looking for CAUSES

PART 3: REGRESSION TO THE MEAN - THE HIDDEN DRIVER

PART 4: DECISION SCOPE - APPLYING WHAT WE KNOW

SUMMARY
WHAT DO WE THINK?
This activity is intended to get students thinking about why this topic is important. Students to discuss in pairs to produce written answers, then review answers as a class discussion

Are people with more money happier than people with less? How could we find out?
Encourage students to share observations from their own experience, but also to think about how to gather relevant data, perhaps by asking a large number of people to provide information about personal income and self-reported happiness

Some superstitious sports fans (and players) have bizarre rituals that they always perform before a game. Why? How could we check to see if they really work?
People who perform such rituals usually believe that there is a relationship between their behavior and the team's performance. Could be checked by stopping/changing ritual

The more ice-creams sold on any given day, the hotter the temperature. Is this proof that ice cream sellers control the weather?
Encourage students to see that relationships can work in different directions. Ask them to consider if it's more likely that the weather is determining the ice cream sales

Baseball players who hit the most home runs in the first half of the season typically hit far fewer in the second half of the season. Why do you think that might be?
Students may think that a strong performance is based on skill alone, so may be surprised by this. Encourage them to think about other factors behind a strong performance (e.g. good luck)

Why might it be helpful to work out if different variables are related to each other, or if one variable is even driving the change in the other variable?
Encourage students to think more broadly about the importance of establishing which variables are driving outcomes that they may care about e.g. health, education etc.

If you could investigate the relationship between any two variables, which would you pick? (e.g. meditation vs stress, family size vs self confidence, screen time vs academic performance…?)
Encourage students to think of an outcome that they genuinely care about, so that they might reflect on the importance of establishing which variables may influence it
PART 1: CORRELATION - Looking for Links
Are these two variables related?
Check that the students can plot coordinates. Read through steps as a class, demonstrating how the scatter graph is constructed

SCATTER PLOTS Visualising a Relationship


Scatter plots are used to investigate the relationship between two variables by turning data into a picture

How to Create a Scatter Plot:

(1) Plot each point on the graph, using the values as coordinate points
(2) Draw a line through the points, which shows the general direction of the points. The line should go as much through the middle of all the points as is possible. This is called the 'line of best fit'

Example Scatter Plot

This data shows the temperature and number of ice creams sold at one location on ten different days. Each of these points is plotted on the grid.

Temp        Ice Creams
(celsius)   (number)
22          52
15          30
18          42
13          19
19          44
23          53
17          35
21          48
16          35
18          38

Can you describe the relationship between the temperature and number of ice creams sold?
Students should note that, as the temperature increases, the number of ice creams sold also increases. Temperature is on the x-axis since it is the independent variable (despite the lesson title!)

What could we do if we wanted to be more confident about our description of this relationship?
Various answers, though the simplest would be to increase the total amount of data by making observations on a greater number of days

PART 1: CORRELATION - Looking for Links
Are these two variables related?
Review steps as a class, then support students in building their own scatter plots, in particular, with drawing the line of best fit

BUILD YOUR OWN SCATTER PLOT

By following the steps outlined on the previous page, use the data below to build your own scatter plot on the empty grid.

Data for Your Scatter Plot

This is data for 10 students, showing the number of times they were absent from a Math lesson during the last school year, and the score they received on the end-of-year test:

Absences    Test Score
(number)    (percentage)
4           84
11          70
13          68
8           75
15          62
3           87
17          55
5           73
14          62
9           76

What does the scatter plot tell us about the relationship between the number of absences and test scores?
Students should observe that as the number of absences increases, the test scores decrease

Can you give an explanation for why the student who was absent 5 times falls quite far below your line?
This student was rarely absent but still performed poorly. His weak performance is likely due to general difficulties with Math, rather than poor attendance
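
To confirm which student sits furthest below the line, the gap between each actual score and the score the line predicts can be calculated. The sketch below is one possible check in Python, assuming a least-squares line of best fit (students' hand-drawn lines will differ slightly). The variable names are chosen for this illustration only.

```python
# Absences and test scores for the 10 students, copied from the data above.
absences = [4, 11, 13, 8, 15, 3, 17, 5, 14, 9]
scores = [84, 70, 68, 75, 62, 87, 55, 73, 62, 76]

n = len(absences)
mean_x = sum(absences) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept (an assumed way of placing the line).
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(absences, scores))
sxx = sum((x - mean_x) ** 2 for x in absences)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Gap = actual score minus the score predicted by the line.
# A negative gap means the point lies below the line.
gaps = [(x, y - (slope * x + intercept)) for x, y in zip(absences, scores)]
furthest_below = min(gaps, key=lambda pair: pair[1])

print(f"Furthest below the line: student with {furthest_below[0]} absences "
      f"({furthest_below[1]:.1f} points)")
```

With this data the check picks out the student with 5 absences, which agrees with the question above.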


PART 1: CORRELATION - Looking for Links
Are these two variables related?
Review this page with the students as a class

THE LANGUAGE OF CORRELATION

How are these variables related?

Correlations can be described by STRENGTH and TYPE

STRENGTH of Correlation
This is a measure of how close the points are to falling on a straight line - the closer to the line, the stronger the correlation.

VERY STRONG
The dots are closely clustered along the line

MODERATE
The dots fall a little further from the line

TYPE of Correlation
This describes if the variables are moving in the same direction.

POSITIVE
The variables move in the same direction. As one increases, the other one increases too

NEGATIVE
The variables move in opposite directions. As one increases, the other decreases

NO CORRELATION
There is no relationship between how the variables change
PART 1: CORRELATION - Looking for Links
Are these two variables related?
Students to discuss in pairs, or work individually, to produce written answers, then review answers as a class

DESCRIBING CORRELATIONS
Using the language on the previous page, describe the
correlations shown in the following scatter plots.
Remember to describe both the STRENGTH and TYPE of
correlation

Students should note that, as the population size increases, the country medal score increases. Students should describe correlation as MODERATE and POSITIVE

Students should note that, as the number of training hours increases, the average 100m time decreases. Students should describe the correlation as STRONG and NEGATIVE

Briefly review as class, so students are introduced to the idea

CORRELATION COEFFICIENT
Statisticians describe these kinds of relationships using a value, r, which can vary from -1 to 1.
Values between 0 and 1 indicate a POSITIVE correlation.
Values between 0 and -1 indicate a NEGATIVE correlation.
Values further from 0 indicate a STRONGER correlation
Values closer to 0 indicate a WEAKER correlation
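
For classes that want to see r calculated, the sketch below works it out for the two data sets used earlier in Part 1. It assumes the standard Pearson formula for r; the helper names pearson_r and correlation_type are invented for this illustration. No cut-off values for 'strong' or 'moderate' are built in, because the lesson does not define any.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_type(r):
    """Name the TYPE of correlation from the sign of r (illustrative helper)."""
    if r > 0:
        return "POSITIVE"
    if r < 0:
        return "NEGATIVE"
    return "NO CORRELATION"


# Data copied from the Part 1 examples.
temps = [22, 15, 18, 13, 19, 23, 17, 21, 16, 18]
ice_creams = [52, 30, 42, 19, 44, 53, 35, 48, 35, 38]
absences = [4, 11, 13, 8, 15, 3, 17, 5, 14, 9]
test_scores = [84, 70, 68, 75, 62, 87, 55, 73, 62, 76]

r_ice = pearson_r(temps, ice_creams)
r_abs = pearson_r(absences, test_scores)
print(f"temperature vs ice creams: r = {r_ice:.2f} ({correlation_type(r_ice)})")
print(f"absences vs test scores:   r = {r_abs:.2f} ({correlation_type(r_abs)})")
```

Both values lie far from 0, one on each side, so the first relationship is positive and the second is negative, and both are strong.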
PART 2: CAUSATION - Looking for CAUSES
Does one variable drive the change in the other?
Review this first section with the students as a class

CORRELATION, YES. BUT CAUSATION...?

Variables can be linked without one causing the other

CORRELATION: Variable A and Variable B are related.

CAUSATION: Change in Variable A causes change in Variable B

Students discuss in pairs to produce written answers, then review as a class

THE MIGHTY WINDMILL

The faster windmill blades rotate (A), the greater the strength of the wind (B).

Does the rotation of windmill blades cause the wind?
Encourage students to articulate why someone might suggest this, but also why it's incorrect. The windmill blades don't drive the wind, the wind drives the windmill blades.

PRESIDENTIAL FOOTBALL

The outcome of the Washington Redskins’ final game of the season (A) predicted if the challenger candidate won the presidential election (B), in every election from 1942 to 2000.

Do these football games determine who becomes president?
There is no plausible mechanism by which the football games could influence the election. This is most likely a coincidence.

BUILDING A CHECKLIST

Can you think of any other examples of two variables that might be correlated but not be causally related?
Many possible answers. Encourage students to consider examples from their own experience.

What alternative explanations should we consider before suggesting a causal relationship between two variables?
Gather initial ideas from students here, but this topic is fully explored on the next two pages.
PART 2: CAUSATION - Looking for CAUSES
Does one variable drive the change in the other?
Review this page with the students as a class

THE CAUSATION CHECKLIST


There are a number of ways that two variables can be related, without A causing B

REVERSE CAUSATION
Rather than A causing B, would B causing A make more sense?
Example: Winter coat usage (A) correlates with cold weather (B), but cold weather actually causes winter coat usage

CONFOUNDERS
Might some other variable, C, actually be causing both A and B?
Example: Basketball performance (A) and shoe size (B) are correlated, but 'height' (C) drives both basketball performance and shoe size

COINCIDENCE
If there’s no reasonable connection, could it just be a coincidence?
Example: From 2000-2009, the amount of cheese eaten per person (A) correlated with the number of people who died by becoming tangled in their own bedsheets (B) ....!

A further consideration ... This is still CAUSATION, but included since important idea

MULTIPLE CAUSES
Might A be only one of many causes of B?
Example: Having good friends (A) correlates with well-being (B), but many other factors (C, D...etc.) also contribute to well-being
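
The CONFOUNDERS idea can be demonstrated with made-up data. The Python sketch below is a simulation under stated assumptions, not real basketball data: 'height' is generated at random, and both 'performance' and 'shoe size' are built from height plus their own independent random noise, so by construction neither one affects the other. All names and numbers are invented for this illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_players = 5000

# C: the confounder. Units are arbitrary (0 = average height).
height = [rng.gauss(0, 1) for _ in range(n_players)]
# A and B each depend on height, plus separate random noise.
performance = [h + rng.gauss(0, 1) for h in height]
shoe_size = [h + rng.gauss(0, 1) for h in height]

r_raw = pearson_r(performance, shoe_size)

# Take out the part of A and B that comes from height. This simple
# subtraction works only because the simulation added height directly.
performance_left = [a - h for a, h in zip(performance, height)]
shoe_size_left = [b - h for b, h in zip(shoe_size, height)]
r_without_height = pearson_r(performance_left, shoe_size_left)

print(f"A vs B, ignoring height:        r = {r_raw:.2f}")
print(f"A vs B, with height taken out:  r = {r_without_height:.2f}")
```

The first value shows a clear positive correlation between A and B even though neither causes the other. Once the contribution of height is taken out, the correlation falls to roughly zero.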
PART 2: CAUSATION - Looking for CAUSES
Does one variable drive the change in the other?
Students to discuss in pairs, or work individually, to produce written answers, then review answers as a class

CORRELATION OR CAUSATION?

Each of the following claims of causation is unjustified. Use THE CAUSATION CHECKLIST to write down a more likely explanation for each one:

Briefly mentioned so students become familiar with the idea
Beware the Confirmation Bias! Be careful to scrutinize conclusions that you like as rigorously as ones you don't! Research suggests we are less critical of evidence that supports our existing beliefs

More babies who sleep with a nightlight in their rooms develop vision problems ... so nightlights cause vision problems
Most likely explanation is CONFOUNDERS. The use of nightlights is likely to be more common for parents with poor vision, who are also more likely to have children who share their genetic vision problems

More students who use a tutor have poor academic grades ... so tutors damage academic performance
Most likely explanation is REVERSE CAUSATION. Students who have poor grades are more likely to seek out a tutor

In the 1990s, the stork population of Germany increased and the at-home birth rate also increased ... so storks really do deliver babies
Most likely explanation is COINCIDENCE. There is no reasonable mechanism by which the stork population could drive the birth rates.

More people die if they sleep in a hospital bed than in their own bed ... so hospital beds cause death
Most likely explanation is CONFOUNDERS. Illness increases chance of being in a hospital bed, and also increases chance of death

The more books in a US household, the stronger the academic performance of the children in the family ... so books lead to academic success
Most likely explanation is CONFOUNDERS. Parents who have many books in the home typically place great value on learning, so also actively support their children with their education


HOMER AND LISA
on CORRELATION  vs CAUSATION
Link to clip is given on the TEACHER INFORMATION page. If time, watch clip & discuss Homer's errors in statistical reasoning with the class

HOMER: NOT A BEAR IN SIGHT. THE BEAR PATROL MUST BE WORKING LIKE A CHARM.
LISA: THAT’S SPECIOUS REASONING, DAD.
HOMER: THANK YOU, DEAR.
LISA: BY YOUR LOGIC I COULD CLAIM THAT THIS ROCK KEEPS TIGERS AWAY.
HOMER: OH, HOW DOES IT WORK?
LISA: IT DOESN’T WORK.
HOMER: UH-HUH.
LISA: IT’S JUST A STUPID ROCK.
HOMER: UH-HUH.
LISA: BUT I DON’T SEE ANY TIGERS AROUND, DO YOU?
[HOMER THINKS OF THIS, THEN PULLS OUT SOME MONEY]
HOMER: LISA, I WANT TO BUY YOUR ROCK.
[LISA REFUSES AT FIRST, THEN TAKES THE EXCHANGE]
PART 3: REGRESSION TO THE MEAN - THE HIDDEN DRIVER
Was that change going to happen anyway?
Review this first section with the students as a class

CHANGE WITHOUT A CAUSE

When we work to change something and it then improves, we might be taking credit for a change that would have happened anyway. The role of luck leads to variables we measure going up and down even without a cause, though we are often tempted to search for a story to explain why.

THE SPEED TRAP TEST
Do Speed Cameras Really Reduce Accident Numbers?

Use COUNTERS to represent SPEED CAMERAS

How to play:

(1) Using the ACCIDENT RECORD box, write down a street name you know, and then stand up.
(2) Your teacher will now give everyone a die. Roll your die twice and add the scores. This is the number of ACCIDENTS in Year 1 on your street.
(3) If you score 9 or less, you have a SAFE-STREET. Sit down. You don't need a speed camera.
(4) If you score 10, 11 or 12, you have a DANGER-STREET. Your teacher will now give you a speed camera, and ask you to repeat Step 2 to find the number of accidents on your street in the following year.

ACCIDENT RECORD
STREET NAME:
ACCIDENTS YEAR 1:
ACCIDENTS YEAR 2:

Discuss the following questions as a class

What happened to the number of accidents on the DANGER-STREETS after the speed cameras were introduced?
Students should observe that the number of accidents on danger-streets drops from Year 1 to Year 2

Were the speed cameras really responsible for the change in accident numbers?
No. Most likely result in Year 2 is a mid-range number, which will be viewed as a drop from the high number in the first year

What would happen if you gave speed cameras to the safest streets (scores 2, 3, or 4)? Play again to find out!
Again, since mid-range numbers are most likely, we would expect the number of accidents on the safest streets in Year 2 to INCREASE

What does this activity tell us about the effectiveness of speed cameras?
It tells us accident numbers would probably fall just by chance, so we actually don’t know if speed cameras are effective or not
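
A class only produces a handful of danger-streets, so results can be noisy. The Python sketch below plays the same game for 10,000 imaginary streets, following the rules above: two dice for Year 1, a camera for the selected streets, then two dice again for Year 2. The camera does nothing in the code, which is the point of the activity. The function names and the number of streets are choices made for this illustration.

```python
import random


def roll_two_dice(rng):
    """Accidents on one street in one year: the sum of two dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def speed_trap_test(n_streets, gets_camera, rng):
    """Play the game and return (Year 1 mean, Year 2 mean) for camera streets.

    gets_camera is a rule that decides, from the Year 1 score, whether
    a street receives a speed camera.
    """
    year1, year2 = [], []
    for _ in range(n_streets):
        first = roll_two_dice(rng)
        if gets_camera(first):
            year1.append(first)
            # The camera has no effect here: Year 2 is just another roll.
            year2.append(roll_two_dice(rng))
    return sum(year1) / len(year1), sum(year2) / len(year2)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Cameras on DANGER-STREETS (scores 10, 11 or 12).
danger_y1, danger_y2 = speed_trap_test(10_000, lambda score: score >= 10, rng)
# Cameras on the safest streets (scores 2, 3 or 4).
safest_y1, safest_y2 = speed_trap_test(10_000, lambda score: score <= 4, rng)

print(f"Danger-streets: Year 1 mean {danger_y1:.1f}, Year 2 mean {danger_y2:.1f}")
print(f"Safest streets: Year 1 mean {safest_y1:.1f}, Year 2 mean {safest_y2:.1f}")
```

In both cases the Year 2 average lands near 7, the most likely total for two dice. Accidents appear to fall on the danger-streets and rise on the safest streets, even though the cameras in this simulation have no effect at all.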
PART 3: REGRESSION TO THE MEAN - THE HIDDEN DRIVER
Was that change going to happen anyway?
Review the first two sections with the students as a class, reading the story aloud. For the questions, work individually, or in pairs, to produce written answers, then discuss as a class

THE MATHEMATICS OF HIGH PERFORMANCE

The speed trap test was based purely on random chance, but do we see the same effect with elite performance, where skill is involved?

In his book 'Thinking, Fast and Slow' psychologist Daniel Kahneman tells a story about performance. He was explaining to instructors who teach pilots that praise works better than punishment. However, one of the most experienced instructors tells him that he's wrong. The instructor explains:

"On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver. The next time they try the same maneuver, they usually do worse. On the other hand, I have often screamed into a cadet's earphone for bad execution, and in general he does better on his next try. So please don't tell us that reward works and punishment does not, because the opposite is the case."

PERFORMANCE, SKILL & LUCK

Most outcomes are a result of two main factors - skill and luck

PERFORMANCE = SKILL + LUCK

SKILL is consistent but LUCK is not

GREAT PERFORMANCE: GOOD SKILL + GOOD LUCK
Luck is likely to change, so performance will probably dip to average

POOR PERFORMANCE: POOR SKILL + BAD LUCK
Luck is likely to change, so performance will probably improve to average

How had the aircraft instructor misunderstood the impact of his teaching methods?
Students should understand that he believed his words were responsible for the following changes in performance

Does this story suggest that one teaching tactic (punishment or praise) is more effective than the other?
No. We expect extreme performance to be followed by average performance, so we can't assess the effectiveness of either tactic
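
The PERFORMANCE = SKILL + LUCK idea can be tried out with invented numbers. The Python sketch below is a simulation under stated assumptions: each imaginary cadet has a fixed skill, each attempt adds fresh random luck, and nobody is praised or punished. The number of cadets, the size of the 'best' and 'worst' groups, and all names are made up for this illustration.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable
n_cadets = 2000

# SKILL stays the same for both attempts. Units are arbitrary (0 = average).
skill = [rng.gauss(0, 1) for _ in range(n_cadets)]
# LUCK is drawn afresh for every attempt.
attempt1 = [s + rng.gauss(0, 1) for s in skill]
attempt2 = [s + rng.gauss(0, 1) for s in skill]

# Rank cadets by their first attempt and pick the extremes.
ranked = sorted(range(n_cadets), key=lambda i: attempt1[i])
worst = ranked[:200]   # the 200 poorest first attempts
best = ranked[-200:]   # the 200 greatest first attempts


def mean(values):
    values = list(values)
    return sum(values) / len(values)


best_change = mean(attempt2[i] - attempt1[i] for i in best)
worst_change = mean(attempt2[i] - attempt1[i] for i in worst)

print(f"Best first attempts, average change next time:  {best_change:+.2f}")
print(f"Worst first attempts, average change next time: {worst_change:+.2f}")
```

The best group does worse on average the second time and the worst group does better, with no instructor involved. An instructor who praised the first group and shouted at the second would see exactly the pattern described in the story.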


PART 3: REGRESSION TO THE MEAN - THE HIDDEN DRIVER
Was that change going to happen anyway?
Read each scenario aloud. Students to discuss in pairs, or work individually, to produce written answers. Review answers as a class

THE SPORTS ILLUSTRATED CURSE

In 2002, Sports Illustrated magazine published an article, highlighting the excessive bad luck experienced by many sports stars after appearing on their front cover. The winning streaks of many top athletes have come to an abrupt end following this high profile accolade.

In terms of performance, what do you think athletes invited to appear on the magazine cover have in common?
Students should suggest that all sports stars invited to be on the cover will have been performing at the very highest level in their respective sport in the period before receiving their invitations

Can you think of another explanation for 'The Sports Illustrated Curse'?
Since good performance entails good skill and good luck, and luck is likely to change, their performance is likely to dip anyway. The curse is most probably an example of regression to the mean

DID THAT REALLY HELP?

Imagine the following story. After a terrible set of exam results, a school principal introduces a new policy of school uniform for all students. The following year’s exam results show a clear improvement.

Can the principal justifiably claim that the new uniform policy worked?
The policy was introduced when test scores were unusually low, so they would most likely have increased in the following year anyway, again due to regression to the mean

What could the principal have done differently to really test the effectiveness of his uniform policy?
He could have introduced the policy after an average set of results. Alternatively, he could have made only one of two classes adopt the policy, thus 'controlling' for regression to the mean
PART 4: DECISION SCOPE - APPLYING WHAT WE KNOW
How can this help us make better decisions?
Read each of the three scenarios aloud. Students to discuss in pairs, or work individually, to produce written answers. Review answers as a class

APPLYING OUR UNDERSTANDING

For what kind of situations might these ideas be useful?
In each of the following Case Studies, certain claims are made. Use your new understanding of correlation and causation to indicate if the claims are justifiable, explaining your reasons carefully.

CASE STUDY 1: FAMILY MEALS

‘I read that kids from families that eat meals together 3 or more times per week are more likely to perform better at school, and even have better relationships with their parents’

Students should note that CONFOUNDERS may well explain this correlation. Responsible and attentive parents are probably more likely to organize regular family meals, and are also probably more likely to take an active interest in their children's academic progress

CASE STUDY 2: ALTERNATIVE REMEDIES

‘I was feeling really sick and I had tried all the medicines I usually use. A friend recommended drinking ground-up apricot seeds, which sounded pretty ridiculous, but I was ready to try anything. I drank the tea and 2 days later I was back to full health.’

Students should note that REGRESSION TO THE MEAN may be the hidden driver in this scenario. The protagonist sought a new remedy when his health was much worse than average. In most cases, he would soon return to average health anyway, but the timing of the intervention makes it appear that the apricot seeds were the cause of the improvement.


PART 4: DECISION SCOPE - APPLYING WHAT WE KNOW
How can this help us make better decisions?
Read aloud and discuss as a class

CASE STUDY 3: PIZZA DEFEATS CANCER

'I read about a new study which found that eating pizza can prevent cancer. The study, which involved 12 people from 12 to 68 years of age, reported how many pizzas they ate each month. Cancer rates were lowest in the participants who ate the most pizza.’

Students should note that CONFOUNDERS may also explain this correlation. Study participants that eat the most pizza are probably also the youngest participants, and younger people are much less likely to develop cancer. So AGE may well be a confounder, independently driving both lower pizza consumption and higher cancer rates. Also, students should note the very small sample size (only 12 people), which should reduce our confidence considerably in the study's findings.

THE LIMITS OF DATA

The Causation Checklist can only help increase our confidence that two variables are causally related. In fact, we can never be 100% certain that A causes B. This is because Mathematics and Statistics work differently.

Mathematics reaches conclusions through DEDUCTION. This means that we accept certain basic ideas to be true (axioms), and we connect them using logic to show that other ideas must then also be true.

Statistics reaches conclusions through INDUCTION. This means observations (data) about the world are recorded and then analyzed to identify patterns in the data.

However, we can never be sure of what data we are missing. Europeans used to believe that all swans were white, until 1697 when Dutch explorers sighted Black Swans in Australia. Even if all the swan data you have suggests that swans are white, the conclusion can still be wrong because of missing black swan data.

In the same way, even though we might be confident that A is causing B, we can never be certain that we are not missing data about C, which may be the real driver.
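
The warning about the very small sample size in Case Study 3 can be illustrated with random numbers. The Python sketch below is a simulation under stated assumptions: it repeatedly draws two completely unrelated variables and counts how often they still show a sizeable correlation purely by chance. The cut-off of 0.5, the comparison sample of 200 people and the helper names are arbitrary choices made for this illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient, always between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_flukes(sample_size, trials, rng):
    """Share of trials where two unrelated variables give r beyond +/-0.5."""
    flukes = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]  # unrelated to xs
        if abs(pearson_r(xs, ys)) > 0.5:
            flukes += 1
    return flukes / trials


rng = random.Random(4)  # fixed seed so the run is repeatable
small_study = share_of_flukes(12, 2000, rng)
large_study = share_of_flukes(200, 2000, rng)

print(f"Studies of 12 people showing a fluke correlation:  {small_study:.1%}")
print(f"Studies of 200 people showing a fluke correlation: {large_study:.1%}")
```

With only 12 people, a noticeable share of studies show a sizeable correlation between variables that have nothing to do with each other. With 200 people this almost never happens, which is why a larger sample should increase our confidence in a finding.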
SUMMARY
Review aloud with the class, clarifying any remaining misconceptions

Two variables are CORRELATED if they are related to each other

Two variables are CAUSALLY related if a change in one variable drives the change in the other variable

The CAUSATION CHECKLIST can be used to eliminate alternative explanations for correlations

Outcomes are typically partly due to random chance

Extremely high numbers usually come down and extremely low numbers go up.

If an intervention is introduced when the numbers are at an extreme, the intervention may appear to have an effect even if it didn’t.

THE CAUSATION CHECKLIST

REVERSE CAUSATION
Rather than A causing B, would B causing A make more sense?

CONFOUNDERS
Might some other variable, C, actually be causing both A and B?

COINCIDENCE
If there’s no reasonable connection, could it just be a coincidence?

MULTIPLE CAUSES
Might A be only one of many causes of B?

REGRESSION TO THE MEAN (... added to checklist)
If B was at an extreme value, it may change without the introduction of A
REVIEW: WHAT DO WE THINK NOW?
Students should tackle these final questions independently, producing written answers, which can be reviewed later to assess understanding

Answer the following questions to check you have understood all the important ideas from this lesson:

Why is it helpful to know if two variables are related?
If we want to influence a certain outcome B, it is useful to know which other variables it is related to (A, C, D etc.). These correlations can be further investigated to establish causation, which may then suggest new ways to influence outcome B

What does it mean if two variables are ‘correlated’?
It means that there is a relationship between the two variables and they behave in a consistent way relative to each other

What does it mean if two variables are ‘causally’ related?
It means that a change in one variable actually causes the change in the other variable.

List 3 alternative explanations to consider, before making a claim of causation:
1 - Reverse Causation
2 - Confounders
3 - Coincidence

When a claim is made that a certain intervention has made improvements, what trap should you check for?
Regression to the mean. It is also important to know if the intervention was introduced when the variable was at an extreme value

Why do many variables naturally return from extreme values towards the average?
Many variables are partly due to chance, which is not consistent. Good luck and bad luck will soon be followed by average luck, so extreme values typically soon return to average values

So, does ice cream make the sun shine?
Ice cream sales and hours of sunshine are correlated, but this claim is an example of reverse causation. In reality, an increase in sunshine causes an increase in ice cream sales
