AS STAT-11 Q4 Wk7

The document discusses correlation analysis in statistics and probability. It defines univariate and bivariate data, describes how to construct scatter plots to illustrate the relationship between two variables, and explains how to describe the shape, trend, and strength of correlation based on a scatter plot.

Republic of the Philippines

Department of Education
REGION I
SCHOOLS DIVISION OF THE CITY OF BATAC

ACTIVITY SHEETS IN
STATISTICS AND PROBABILITY
QUARTER 4, WEEK 7

CORRELATION ANALYSIS

Most Essential Learning Competencies (MELCs):

• illustrates the nature of bivariate data (M11/12SP-IVg-2);
• constructs a scatter plot (M11/12SP-IVg-3);
• describes shape (form), trend (direction), and variation (strength) based on a scatter plot (M11/12SP-IVg-4);
• calculates the Pearson’s sample correlation coefficient (M11/12SP-IVh-2); and
• solves problems involving correlation analysis (M11/12SP-IVh-3).

Prepared by:

QUEENY JOYCE F. SEBASTIAN
Master Teacher I

Background Information
Illustrating the Nature of Bivariate Data

Data that involve one variable are called univariate data. Univariate data are often described using measures of central tendency (mean, median, and mode), measures of variation, or other descriptive statistics. Here are examples of univariate data:

Examples | Variable Involved
The Department of Health (DOH) recorded the number of infected COVID-19 cases in the Philippines from April 14 to May 21, 2021. | number of infected cases
The World Health Organization (WHO) summarized the number of COVID-19 recoveries around the world. | number of COVID-19 recoveries

Data that involve two variables are called bivariate data. The statistical procedure used to determine and describe the relationship between two variables is called correlation analysis. Here are examples of bivariate data:

Examples | Variables Involved
In the public market, a consumer observed that the smaller the supply of vegetables, the higher the price gets. | supply and price of vegetables
The Ilocos Norte provincial government emphasized that limiting the number of household members going outside to purchase essential goods will help decrease the rate of COVID-19 infection in the province. | number of household members and rate of COVID-19 infection

Constructing Scatter Plot

A scatter plot (also called a scatter graph, scatter diagram, or scattergram) is a graphical representation that shows the relationship, or correlation, between the two variables of bivariate data.

A scatter plot shows how the points collected from a set of bivariate data are scattered on a Cartesian plane. It gives a good visual picture of how two variables are related in terms of the form, trend, and variation of their correlation. The form is the shape that the points make in the scatter plot. The trend is the direction of the points, which shows whether the variables have a positive, negative, or no correlation. The variation, or strength, of the correlation is based on how close the points are to a trend line, and it determines whether the variables have no, weak, moderate, strong, or perfect correlation.

Example 1
The given numbers are the ages of persons in years and their corresponding weights.

Age of a person (x) | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
Weight of a person (y) | 40 | 42 | 38 | 35 | 45 | 51 | 48 | 48 | 50 | 47

The weight of a person is plotted on the y-axis, while the age of a person is plotted on the x-axis. The scatter plot is shown below.

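To see how the pairs in Example 1 land on a Cartesian plane, the following minimal sketch draws a rough text version of a scatter plot using only the Python standard library. The function name `text_scatter` and the `width` and `height` parameters are hypothetical, chosen only for this illustration, and the sketch assumes that each variable has at least two different values.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatter plot of the (x, y) pairs as a string."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Start with an empty grid of spaces; grid[0] is the top row of the plot.
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) and a row (y) inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    lines = ["|" + "".join(r).rstrip() for r in grid]
    lines.append("+" + "-" * width)  # the x-axis
    return "\n".join(lines)


age = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]     # x values from Example 1
weight = [40, 42, 38, 35, 45, 51, 48, 48, 50, 47]  # y values from Example 1
plot = text_scatter(age, weight)
print(plot)
```

Each `*` is one (age, weight) pair. The points drift upward from left to right, which is the kind of trend described later in this lesson as a positive correlation.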
Example 2
A Math teacher conducted a study on the performance of Grade 11 students in General Mathematics. Their average grades were taken at different class periods. The data are given below.

Order of period of the subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Average grade | 86 | 88 | 84 | 82 | 82 | 81 | 80 | 79

The order of the period of the subject is plotted on the x-axis and the average grades are plotted on the y-axis, as illustrated below.

Example 3
A researcher asked for the weights of 10 students together with the weights of their biological mothers and created a scatter plot.

Weight of mother | 65 | 69 | 74 | 78 | 59 | 81 | 76 | 80 | 81 | 75
Weight of student | 52 | 55 | 62 | 63 | 47 | 66 | 63 | 69 | 68 | 65

The scatter plot is presented below.

Describing the Shape (Form), Trend (Direction), and Variation (Strength) Based
on a Scatter Plot

The correlation of the variables can be described in terms of the form (shape), trend (direction), and variation (strength) shown in a scatter plot. The form of correlation is determined by the shape of the points on the scatter plot and is categorized as linear or curvilinear. The form is linear if the points follow the trend of a straight line. The form is curvilinear (non-linear) if the points follow the trend of a curved line. Sample scatter plots showing the curvilinear form of correlation are given below.

The correlation of variables can also be described in terms of its trend, or direction. The trend of correlation can be positive, negative, or zero/negligible, depending on the direction of the points. The trend of correlation is summarized in the table that follows.

Trend | Graph | Direction of Points | Description
Positive Correlation | [graph] | The points follow a trend rising from left to right. | A positive correlation exists when high values of one variable correspond to high values of another variable or low values of one variable correspond to low values of another variable.
Negative Correlation | [graph] | The points follow a trend rising from right to left. | A negative correlation exists when high values of one variable correspond to low values of another variable or low values of one variable correspond to high values of another variable.
No Correlation/Negligible Correlation | [graph] | The points are neither rising from left to right nor right to left. | A negligible correlation exists when high values of one variable correspond to either high or low values of another variable.

The closeness of the points to the trend line determines the variation, or strength, of the correlation between the variables involved. The closer the points are to the trend line, the stronger the correlation of the variables is. The strength of correlation between two variables can be perfect, strong, moderate, weak, or no/negligible. To summarize the strength of correlation, refer to the table below.

Correlation | Scatter Plot | Description
Strong Positive Correlation | [scatter plot] | This correlation exists when almost all of the points are on the line or the points are closely scattered on the trend line that rises from left to right.
Weak Positive Correlation | [scatter plot] | Compared to strong positive correlation, the points in this correlation are scattered a bit far from the trend line from left to right.
No Correlation or Negligible Correlation | [scatter plot] | The points in this correlation do not follow any trend line. The points are just scattered around the Cartesian plane.
Weak Negative Correlation | [scatter plot] | The points in this correlation are scattered a bit far from the trend line from right to left.
Moderate Negative Correlation | [scatter plot] | This correlation exists when the points are moderately scattered rising from right to left.
Strong Negative Correlation | [scatter plot] | This correlation exists when almost all of the points are on the line or the points are closely scattered on the trend line that rises from right to left.

Two variables can also have a perfect positive or perfect negative correlation. In a scatter plot, variables with perfect correlation show points that all fall exactly on a straight trend line.

Calculating the Pearson’s Sample Correlation Coefficient

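The Pearson’s sample correlation coefficient (also known as Pearson r), denoted by r, is a statistic that measures the strength of the linear relationship between two variables. To find r, the following formula is used:

$$r = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}}$$

where n is the number of pairs of values.

The correlation coefficient (r) is a number between -1 and 1 that describes both the strength and the direction of correlation. In symbols, we write -1 ≤ r ≤ 1.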
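The raw-sum formula used in the worked examples of this lesson can be sketched as a small Python function. The function name `pearson_r` and the two short data sets are hypothetical, chosen only to show the two extreme values of r. The sketch assumes that neither variable is constant, since a constant variable would make the denominator zero.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the sums of X, Y, XY, X^2, and Y^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)  # number of pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: points that lie exactly on a straight line.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])   # line rising from left to right
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # line falling from left to right
print(rising, falling)  # prints 1.0 -1.0
```

Points that fall exactly on a rising line give r = 1, and points that fall exactly on a falling line give r = -1, which are the two ends of the range -1 ≤ r ≤ 1.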

Example 4
Teachers of Masipag National High School instilled among their students the
value of time management and excellence in everything they do. The table below
shows the time in hours spent in studying (X) by six Grade 11 students and their scores
in a test (Y). Solve for the Pearson’s sample correlation coefficient r.

X 1 2 3 4 5 6
Y 5 10 10 15 25 30

Solution

The suggested steps below will guide you in computing the Pearson product-moment correlation coefficient r.

Step 1. Construct a table as shown below.

X | Y | XY | X² | Y²
1 | 5 | | |
2 | 10 | | |
3 | 10 | | |
4 | 15 | | |
5 | 25 | | |
6 | 30 | | |

Step 2. Complete the table.
a. Multiply the entries in the X and Y columns. Put the products under the XY column.
b. Square all the entries in the X column. Put them under the X² column.
c. Square all the entries in the Y column. Put them under the Y² column.

X | Y | XY | X² | Y²
1 | 5 | 5 | 1 | 25
2 | 10 | 20 | 4 | 100
3 | 10 | 30 | 9 | 100
4 | 15 | 60 | 16 | 225
5 | 25 | 125 | 25 | 625
6 | 30 | 180 | 36 | 900

Step 3. Get the sum of each column.
a. Get the sum of all entries in the X column. This is ΣX.
b. Get the sum of all entries in the Y column. This is ΣY.
c. Get the sum of all entries in the XY column. This is ΣXY.
d. Get the sum of all entries in the X² column. This is ΣX².
e. Get the sum of all entries in the Y² column. This is ΣY².

X | Y | XY | X² | Y²
1 | 5 | 5 | 1 | 25
2 | 10 | 20 | 4 | 100
3 | 10 | 30 | 9 | 100
4 | 15 | 60 | 16 | 225
5 | 25 | 125 | 25 | 625
6 | 30 | 180 | 36 | 900
ΣX = 21 | ΣY = 95 | ΣXY = 420 | ΣX² = 91 | ΣY² = 1975

Step 4. Substitute the values obtained from Step 3 in the formula. Here n = 6 because there are six (6) pairs of values.

$$
\begin{aligned}
r &= \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}} \\
  &= \frac{6(420) - (21)(95)}{\sqrt{\left[6(91) - (21)^2\right]\left[6(1975) - (95)^2\right]}} \\
  &= \frac{2520 - 1995}{\sqrt{[546 - 441][11850 - 9025]}} \\
  &= \frac{525}{\sqrt{[105][2825]}} \\
  &= \frac{525}{\sqrt{296{,}625}} \\
  &\approx 0.96395 \text{ or } 0.96
\end{aligned}
$$

The value of r is a positive number. Therefore, we can say that there is a positive correlation between the hours spent in studying and the scores in the test.

Note: For consistency, round your final answer to two decimal places.
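The steps of Example 4 can be checked with a short Python sketch that builds the same table, adds up each column, and substitutes the sums in the formula. The variable names are illustrative and are not part of the original activity sheet.

```python
from math import sqrt

hours = [1, 2, 3, 4, 5, 6]        # X: time in hours spent in studying
scores = [5, 10, 10, 15, 25, 30]  # Y: scores in the test

# Steps 1 and 2: one row per pair, with the XY, X^2, and Y^2 columns filled in.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(hours, scores)]

# Step 3: the sum of each column.
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

# Step 4: substitute the sums in the formula, with n as the number of pairs.
n = len(rows)
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

for row in rows:
    print(*row, sep="\t")
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # sums: 21 95 420 91 1975
print("r =", round(r, 2))                             # r = 0.96
```

The printed sums match the last row of the table in Step 3, and the rounded result matches the value obtained by hand.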

We interpret the value of r by looking at the correlation scale, which gives both the strength and the direction of correlation.

Using the correlation scale, we can determine the strength of the correlation from the coefficient r. In the previous example, r = 0.96, which means that there is a “strong positive correlation” between the time in hours spent in studying by the six Grade 11 students and their scores in the test. To interpret, we can simply state it this way: “The more time the Grade 11 students spend studying their lessons, the higher their test scores,” or “The less time the Grade 11 students spend studying their lessons, the lower their test scores.”

Example 5

Scenario: Filipino employees are known for being persistent and hardworking. That is why they truly value every single cent of their salary. Here are some situations showing the relationship between the salary and spending of a Filipino employee.

Situation 1: In a survey, the correlation coefficient r between the salary and spending of employees was found to be 0.97.
Interpretation: There is a “strong positive correlation” between the salary and spending of employees.

Situation 2: In another survey, the correlation coefficient r between the salary and spending of employees was found to be 0.38.
Interpretation: There is a “weak positive correlation” between the salary and spending of employees.

Situation 3: In another survey, the correlation coefficient r between the salary and spending of employees was found to be -0.81.
Interpretation: There is a “strong negative correlation” between the salary and spending of employees.

Situation 4: In another survey, the correlation coefficient r between the salary and spending of employees was found to be -0.19.
Interpretation: There is a “weak negative correlation” between the salary and spending of employees.

For more examples, see the table below:

Bivariate Data | Computed Pearson’s r | Interpretation
Temperature and the number of hot chocolate products sold | -0.781 | There is a strong negative correlation between the temperature and the number of hot chocolate products sold.
Amount of coffee intake and number of hours you stay awake | 0.426 | There is a weak positive correlation between the amount of coffee intake and the number of hours you stay awake.
Height and salary of teachers | 0 | There is no correlation between the height and salary of teachers.
The number of people who clean the garden and the number of hours spent cleaning the garden | -0.48 | There is a weak negative correlation between the number of people who clean the garden and the number of hours spent cleaning the garden.
The number of pets and the amount of money spent in buying pet food | 0.73 | There is a strong positive correlation between the number of pets and the amount of money spent in buying pet food.

If the data are presented in a scatter plot, we can estimate the strength of the correlation and the value of r. Refer to the examples below:

Scatter Plot | Estimated Strength of the Correlation | Estimated Value of r
[scatter plot] | Strong Positive Correlation | The value of r should be in the range between 0.5 and 1. We can say 0.8 or 0.75, as long as it is within the range in the correlation scale.
[scatter plot] | Weak Negative Correlation | The value of r should be in the range between 0 and -0.5. We can say -0.39, as long as it is within the range in the correlation scale.

The closeness of the points around the trend line determines the strength of
the correlation. The closer the points to the trend line, the stronger the correlation
of the variables is.

Note: Other authors use different scales of interpreting the correlations. Use the
presented scale throughout this lesson.
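The ranges stated in this lesson (between 0.5 and 1 for strong, between 0 and 0.5 for weak, with the sign giving the direction) can be sketched as a small Python function. The function name `describe_r` is hypothetical. The labels for exactly 1 or -1 (perfect) and exactly 0 (no correlation) follow the text of this lesson, while labelling exactly 0.5 or -0.5 as moderate is an assumption, because the correlation scale itself is not reproduced here.

```python
def describe_r(r):
    """Label a value of r using the ranges stated in this lesson."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size > 0.5:
        strength = "strong"
    elif size == 0.5:
        strength = "moderate"  # assumption: boundary label is not stated in the text
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


# Values of r taken from Example 5
for value in (0.97, 0.38, -0.81, -0.19):
    print(value, "->", describe_r(value))
```

The four labels printed for the Example 5 values agree with the interpretations given in that example.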

Example 6
An agriculture student is interested in determining the relationship between the number of workers who planted rice in a 250 m² land area and the time in hours spent in planting. The data are shown below. Calculate the Pearson’s sample correlation coefficient and interpret the result.

Number of workers | 3 | 4 | 5 | 6 | 7 | 8 | 9
Time in hours spent | 4 | 4 | 3.5 | 3.25 | 2.5 | 1.25 | 1

Solution
The suggested steps below will guide you in computing the Pearson product-moment correlation coefficient r.

Step 1. Construct a table as shown below.

X | Y | XY | X² | Y²
3 | 4 | | |
4 | 4 | | |
5 | 3.5 | | |
6 | 3.25 | | |
7 | 2.5 | | |
8 | 1.25 | | |
9 | 1 | | |

Step 2. Complete the table.
a. Multiply the entries in the X and Y columns. Put the products under the XY column.
b. Square all the entries in the X column. Put them under the X² column.
c. Square all the entries in the Y column. Put them under the Y² column.

X | Y | XY | X² | Y²
3 | 4 | 12 | 9 | 16
4 | 4 | 16 | 16 | 16
5 | 3.5 | 17.5 | 25 | 12.25
6 | 3.25 | 19.5 | 36 | 10.56
7 | 2.5 | 17.5 | 49 | 6.25
8 | 1.25 | 10 | 64 | 1.56
9 | 1 | 9 | 81 | 1

Step 3. Get the sum of each column.
a. Get the sum of all entries in the X column. This is ΣX.
b. Get the sum of all entries in the Y column. This is ΣY.
c. Get the sum of all entries in the XY column. This is ΣXY.
d. Get the sum of all entries in the X² column. This is ΣX².
e. Get the sum of all entries in the Y² column. This is ΣY².

X | Y | XY | X² | Y²
3 | 4 | 12 | 9 | 16
4 | 4 | 16 | 16 | 16
5 | 3.5 | 17.5 | 25 | 12.25
6 | 3.25 | 19.5 | 36 | 10.56
7 | 2.5 | 17.5 | 49 | 6.25
8 | 1.25 | 10 | 64 | 1.56
9 | 1 | 9 | 81 | 1
ΣX = 42 | ΣY = 19.5 | ΣXY = 101.5 | ΣX² = 280 | ΣY² = 63.62

Step 4. Substitute the values obtained from Step 3 in the formula. Here n = 7 because there are seven (7) pairs of values.

$$
\begin{aligned}
r &= \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^2\right) - \left(\sum X\right)^2\right]\left[n\left(\sum Y^2\right) - \left(\sum Y\right)^2\right]}} \\
  &= \frac{7(101.5) - (42)(19.5)}{\sqrt{\left[7(280) - (42)^2\right]\left[7(63.62) - (19.5)^2\right]}} \\
  &= \frac{710.5 - 819}{\sqrt{[1960 - 1764][445.34 - 380.25]}} \\
  &= \frac{-108.5}{\sqrt{[196][65.09]}} \\
  &= \frac{-108.5}{\sqrt{12{,}757.64}} \\
  &\approx -0.96060 \text{ or } -0.96
\end{aligned}
$$

Step 5. Interpretation. The value of r is a negative number. Therefore, we can say that there is a strong negative correlation between the number of workers who planted rice and the time spent in planting. This means that the more workers there are, the less time is spent in planting. The fewer workers there are, the more time is spent in planting.

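As a cross-check of Example 6, the sketch below computes r from deviations about the means instead of from the raw sums. This form is not used in the activity sheet, but it is algebraically equivalent to the formula above, and it avoids rounding the Y² entries. The variable names are illustrative.

```python
from math import sqrt

workers = [3, 4, 5, 6, 7, 8, 9]          # X: number of workers
hours = [4, 4, 3.5, 3.25, 2.5, 1.25, 1]  # Y: time in hours spent in planting

n = len(workers)  # number of pairs
mean_x = sum(workers) / n
mean_y = sum(hours) / n

# Deviation of each value from the mean of its own variable
dx = [x - mean_x for x in workers]
dy = [y - mean_y for y in hours]

# r = sum of products of deviations / sqrt(product of sums of squared deviations)
r = sum(a * b for a, b in zip(dx, dy)) / sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy)
)
print(n, round(r, 2))  # prints 7 -0.96
```

With seven pairs of values, the rounded result is -0.96, a strong negative correlation.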
Name:___________________________________________ Date:____________
Grade/Section: ____________________________________Score:____________

Activity 1: Uno or Dos?


Directions: Determine the variables in the following situations and identify whether they involve univariate or bivariate data. The first one is given as an example.

Situation | Variable/s | Univariate or Bivariate
Example: A secretary recorded the daily number of patients a doctor has for a month during the General Community Quarantine. | Number of patients | Univariate
1. Mr. Mangapit will donate face masks to the people in his barangay. He asked a health worker to survey the number of family members living in each house in his barangay. | __________ | __________
2. To properly compensate employees, the administrative aide records the number of hours the employees work and their respective take-home pay. | __________ | __________
3. A GAS student was tasked to conduct a survey on the number of hours students spend playing online games like Mobile Legends. | __________ | __________
4. Jelai asked for the heights of her friends and their mothers. | __________ | __________
5. A researcher records the number of infected COVID-19 patients and the number of days they spent in the hospital before recovering from the disease. | __________ | __________

________________________________________ _______________
Parent’s/Guardian’s Signature Over Printed Name Date
Name:___________________________________________ Date:____________
Grade/Section: ____________________________________Score:____________

Activity 2: Plot Me!


Directions: Construct the scatter plot of the following data by plotting the points on the given Cartesian plane. Then determine whether the trend and strength of correlation is strong positive, weak positive, no/negligible correlation, weak negative, or strong negative.

1. Masipag National High School is known for students who excel in Math. A researcher recorded the IQ of the students and their scores on a 50-item Math test as the focus of his study.

IQ Scores | 85 | 87 | 83 | 85 | 87 | 97 | 105 | 110 | 115 | 120
Test Scores | 21 | 23 | 30 | 34 | 31 | 35 | 40 | 42 | 45 | 48

[Blank Cartesian plane: IQ Scores on the x-axis, Test Scores on the y-axis]

Trend:____________________________________________

2. The Grade 11-Hope students of Masipag National High School believe in the value of thriftiness. That is why they conducted research on selected elementary learners regarding the amount they save from their daily allowance and their corresponding weight, as follows.

Amount saved in peso | 10 | 8 | 15 | 20 | 5 | 3 | 5 | 25 | 10 | 15
Weight in kilogram | 38 | 40 | 37 | 36 | 42 | 41 | 39 | 35 | 36 | 37

[Blank Cartesian plane: Amount saved in peso on the x-axis, Weight in kilogram on the y-axis]

Trend: ____________________________________

3. Jannah interviewed 9 of her classmates on their average daily allowance in peso and their weight in kilogram. The results are given below.

Daily Allowance | 35 | 55 | 60 | 65 | 45 | 55 | 70 | 70 | 77
Weight | 40 | 38 | 45 | 43 | 60 | 41 | 63 | 57 | 60

Trend:___________________________________________________

________________________________________ _______________
Parent’s/Guardian’s Signature Over Printed Name Date
Name:___________________________________________ Date:____________
Grade/Section: ____________________________________Score:____________

Activity 3: Solve Me!


Directions: Read the following situations. Using the given data,
calculate the Pearson’s sample correlation coefficient. After obtaining
Pearson’s r, analyze and interpret the result. Show all your necessary
solutions.

1. The table shows the data obtained from six students of Mapalad Integrated High School in a study about the number of hours a student exercises each week and the score s/he gets in a test.

Student | Hours (X) | Score (Y)
A | 1 | 25
B | 2 | 5
C | 3 | 20
D | 4 | 40
E | 5 | 25
F | 6 | 9

2. A group of Senior High School students is conducting collaborative research, and they want to determine whether there is a correlation between the age of tricycles (in years) in a certain city and their mileage (in kilometers per liter). The data are shown below.

Age of tricycle, in years (X) | 0.5 | 1 | 1.5 | 2 | 3 | 4
Mileage, in km/liter (Y) | 16 | 14 | 10 | 12 | 10 | 12

________________________________________ _______________
Parent’s/Guardian’s Signature Over Printed Name Date