Republic of the Philippines
Department of Education
                                   REGION I
                     SCHOOLS DIVISION OF THE CITY OF BATAC
          ACTIVITY SHEETS IN
      STATISTICS AND PROBABILITY
                       QUARTER 4, WEEK 7
                 CORRELATION ANALYSIS
Most Essential Learning Competencies (MELCs):
  •   illustrates the nature of bivariate data (M11/12SP-IVg-2);
  •   constructs a scatter plot (M11/12SP-IVg-3);
  •   describes shape (form), trend (direction), and variation (strength) based on a
      scatter plot (M11/12SP-IVg-4);
  •   calculates Pearson’s sample correlation coefficient (M11/12SP-IVh-2); and
  •   solves problems involving correlation analysis (M11/12SP-IVh-3).
                                   Prepared by:
                     QUEENY JOYCE F. SEBASTIAN
                                 Master Teacher I
Background Information
Illustrating the Nature of Bivariate Data
        Data that involve one variable are called univariate data. Univariate data are
often described using measures of central tendency (mean or average, median, and
mode), measures of variation, or other descriptive statistics. Here are examples of
univariate data:
 •  The Department of Health (DOH) recorded the number of infected COVID-19 cases
    in the Philippines from April 14 to May 21, 2021.
    Variable involved: number of infected cases
 •  The World Health Organization (WHO) summarized the number of COVID-19
    recoveries around the world.
    Variable involved: number of COVID-19 recoveries
        Data that involve two variables are called bivariate data. The statistical
procedure used to determine and describe the relationship between two variables is
called correlation analysis.
 •  In the public market, a consumer observed that the lower the supply of
    vegetables, the higher the price gets.
    Variables involved: supply and price of vegetables
 •  The Ilocos Norte provincial government emphasized that limiting the number of
    household members going outside to purchase essential goods will help decrease
    the rate of COVID-19 infection in the province.
    Variables involved: number of household members and rate of COVID-19 infection
Constructing Scatter Plot
       A scatter plot (also called a scatter graph, scatter diagram, or scattergram) is a
graphical representation that shows the relationship, or correlation, between the two
variables of bivariate data.
       A scatter plot shows how the points from a set of bivariate data are scattered on
a Cartesian plane. It gives a good visual picture of how two variables are related or
associated with one another in terms of the form, trend, and variation of the
correlation. The form of the points in the scatter plot determines the shape of the
correlation. The trend is the direction of the points; it tells whether the variables
have a positive, negative, or no correlation. The variation, or strength, of the
correlation is based on how closely the points cluster around a trend line; it tells
whether the variables have no, weak, moderate, strong, or perfect correlation.
Example 1
       The table below gives the ages (in years) of ten persons and their corresponding
weights.
 Age of a person (x)       11   12   13   14   15   16   17   18   19   20
 Weight of a person (y)    40   42   38   35   45   51   48   48   50   47
       The age of a person will be plotted on the x-axis, while the weight of a person
will be plotted on the y-axis. The scatter plot is shown below.
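For readers without graph paper, the plotting step can be imitated in plain Python. The sketch below is illustrative only: `text_scatter` is a hypothetical helper (not from any library) that assumes whole-number data, gives each x value one character column and each y value one row, and marks every (age, weight) pair from Example 1 with `*`.

```python
def text_scatter(xs, ys, x_label="x"):
    """Return a rough text scatter plot of whole-number (x, y) pairs.

    Hypothetical helper for illustration: each x value gets one character
    column, each y value gets one row, and every pair is marked with '*'.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width = x_max - x_min + 1
    # Empty grid; row 0 holds the largest y value so the plot reads bottom-up.
    grid = [[" "] * width for _ in range(y_max - y_min + 1)]
    for x, y in zip(xs, ys):
        grid[y_max - y][x - x_min] = "*"
    # Prefix each row with its y value to act as the vertical axis.
    lines = [f"{y_max - i:>3} |" + "".join(row) for i, row in enumerate(grid)]
    lines.append("    +" + "-" * width)                 # horizontal axis
    lines.append(f"     {x_label}: {x_min} to {x_max}")  # axis caption
    return "\n".join(lines)


ages = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]      # x-axis (Example 1)
weights = [40, 42, 38, 35, 45, 51, 48, 48, 50, 47]   # y-axis (Example 1)
print(text_scatter(ages, weights, x_label="Age"))
```

A plotting library such as matplotlib (`matplotlib.pyplot.scatter`) would draw a proper scatter plot; the text version only keeps the essential idea of one mark per ordered pair.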
Example 2
       A Math teacher conducted a study on the performance of Grade 11 students in
General Mathematics. The students' average grades were recorded at different times
or periods. The data are given below.
 Order of period of the subject    1    2    3    4    5    6    7    8
 Average grade                    86   88   84   82   82   81   80   79
       The order of the period will be plotted on the x-axis and the average grades on
the y-axis, as illustrated below.
Example 3
       A researcher recorded the weights of 10 students together with the weights of
their biological mothers and created a scatter plot.
 Weight of mother     65   69   74   78   59   81   76   80   81   75
 Weight of student    52   55   62   63   47   66   63   69   68   65
       The scatter plot is presented below.
Describing the Shape (Form), Trend (Direction), and Variation (Strength) Based
on a Scatter Plot
         The correlation of the variables can be described in terms of the form (shape),
trend (direction), and variation (strength) of the scatter plot. The form of correlation
is determined by the shape of the points on the scatter plot and is categorized as
linear or curvilinear. The form is linear if the points follow a straight-line trend, and
curvilinear (non-linear) if the points follow a curved trend. Sample scatter plots
showing a curvilinear form of correlation are given below.
       The correlation of variables can also be described in terms of its trend or
direction. The trend of correlation can be positive, negative, or zero/negligible
depending on the direction of the points. The trend of correlation is summarized in the
table that follows.
 Positive Correlation
     Direction of points: The points follow a trend rising from left to right.
     Description: A positive correlation exists when high values of one variable
     correspond to high values of another variable, or low values of one variable
     correspond to low values of another variable.
 Negative Correlation
     Direction of points: The points follow a trend rising from right to left.
     Description: A negative correlation exists when high values of one variable
     correspond to low values of another variable, or low values of one variable
     correspond to high values of another variable.
 No Correlation/Negligible Correlation
     Direction of points: The points rise neither from left to right nor from right
     to left.
     Description: A negligible correlation exists when high values of one variable
     correspond to either high or low values of another variable.
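The direction rule in the table can also be checked numerically. A minimal sketch, assuming we judge direction by the sign of Σ(x − x̄)(y − ȳ): pairs where both values lie above (or both below) their means push the sum up, which matches points rising from left to right, while mixed pairs push it down. The function name `trend` is hypothetical, and applying it to the Example 2 data is for illustration only.

```python
def trend(xs, ys):
    """Classify the direction of bivariate data by the sign of the cross-deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Each product is positive when x and y are on the same side of their means.
    s = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "no correlation"


periods = [1, 2, 3, 4, 5, 6, 7, 8]            # Example 2, x-axis
grades = [86, 88, 84, 82, 82, 81, 80, 79]     # Example 2, y-axis
print(trend(periods, grades))                 # negative
```

With real data the sum is rarely exactly zero, so this sketch only gives the direction; how strong (or negligible) the correlation is gets judged with Pearson's r.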
        The closeness of the points to the trend line determines the variation, or
strength, of the correlation between the variables involved. The closer the points are
to the trend line, the stronger the correlation. The strength of correlation between two
variables can be perfect, strong, moderate, weak, or no/negligible correlation. The
table below summarizes the strength of correlation.
 Strong Positive Correlation: This correlation exists when almost all of the points are
     on the line or are closely scattered around the trend line that rises from left to
     right.
 Weak Positive Correlation: Compared to a strong positive correlation, the points in
     this correlation are scattered a bit farther from the trend line that rises from
     left to right.
 No Correlation or Negligible Correlation: The points in this correlation do not follow
     any trend line; they are just scattered around the Cartesian plane.
 Weak Negative Correlation: The points in this correlation are scattered a bit far from
     the trend line that rises from right to left.
 Moderate Negative Correlation: This correlation exists when the points are moderately
     scattered around a trend line rising from right to left.
 Strong Negative Correlation: This correlation exists when almost all of the points are
     on the line or are closely scattered around the trend line that rises from right
     to left.
       Two variables can also have a perfect positive or a perfect negative correlation.
In a scatter plot, variables with perfect correlation show points that all fall exactly
on a straight trend line.
Calculating the Pearson’s Sample Correlation Coefficient
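       Pearson’s sample correlation coefficient (also known as Pearson r), denoted by r,
is a statistic that measures the strength of the linear relationship between two
variables. To find r, the following formula is used, where n is the number of pairs of
values:

$$r = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^{2}\right) - \left(\sum X\right)^{2}\right]\left[n\left(\sum Y^{2}\right) - \left(\sum Y\right)^{2}\right]}}$$

       The correlation coefficient r is a number between −1 and 1 that describes both
the strength and the direction of correlation. In symbols, we write −1 ≤ r ≤ 1.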
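The formula translates directly into code. Below is a short Python sketch of the sums formula (the function name `pearson_r` is our own, not from the lesson), applied to the data of Example 4. It also confirms that points lying exactly on a rising or a falling straight line give r = 1 or r = −1.

```python
import math


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient via the computational (sums) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # All X values (or all Y values) are equal, so r cannot be computed.
        raise ValueError("r is undefined when all X values or all Y values are equal")
    return numerator / denominator


hours = [1, 2, 3, 4, 5, 6]           # Example 4, X
scores = [5, 10, 10, 15, 25, 30]     # Example 4, Y
print(round(pearson_r(hours, scores), 5))   # 0.96395
print(pearson_r([1, 2, 3], [3, 5, 7]))      # 1.0  (points on a rising line)
print(pearson_r([1, 2, 3], [7, 5, 3]))      # -1.0 (points on a falling line)
```

Python 3.10 and later also ship `statistics.correlation`, which returns the same Pearson coefficient; the hand-written version is kept here because it mirrors each term of the formula.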
Example 4
       Teachers of Masipag National High School instilled among their students the
value of time management and excellence in everything they do. The table below
shows the time in hours spent in studying (X) by six Grade 11 students and their scores
in a test (Y). Solve for the Pearson’s sample correlation coefficient r.
     X     1     2     3     4     5     6
     Y     5    10    10    15    25    30
Solution
The suggested steps will guide you on how to compute the Pearson product moment
correlation r.
 1. Construct a table with columns X, Y, XY, X², and Y², and enter the given data.

        X     Y     XY    X²    Y²
        1     5
        2    10
        3    10
        4    15
        5    25
        6    30

 2. Complete the table.
    a. Multiply the entries in the X and Y columns. Put the products under the XY
       column.
    b. Square all the entries in the X column. Put them under the X² column.
    c. Square all the entries in the Y column. Put them under the Y² column.

        X     Y     XY    X²    Y²
        1     5      5     1    25
        2    10     20     4   100
        3    10     30     9   100
        4    15     60    16   225
        5    25    125    25   625
        6    30    180    36   900

 3. a. Get the sum of all entries in the X column. This is ΣX.
    b. Get the sum of all entries in the Y column. This is ΣY.
    c. Get the sum of all entries in the XY column. This is ΣXY.
    d. Get the sum of all entries in the X² column. This is ΣX².
    e. Get the sum of all entries in the Y² column. This is ΣY².

        ΣX = 21    ΣY = 95    ΣXY = 420    ΣX² = 91    ΣY² = 1975
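Steps 1 to 3 amount to building five columns and adding each one. A minimal Python sketch of that bookkeeping, using a hypothetical helper `correlation_table` on the Example 4 data:

```python
def correlation_table(xs, ys):
    """Build the rows (X, Y, XY, X^2, Y^2) and the column sums used in Steps 1-3."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    # zip(*rows) regroups the rows into columns; summing each column gives
    # (sum X, sum Y, sum XY, sum X^2, sum Y^2).
    sums = tuple(sum(column) for column in zip(*rows))
    return rows, sums


rows, sums = correlation_table([1, 2, 3, 4, 5, 6], [5, 10, 10, 15, 25, 30])
print("X", "Y", "XY", "X^2", "Y^2", sep="\t")
for row in rows:
    print(*row, sep="\t")
print("sums:", *sums, sep="\t")
```

The five sums are exactly the values substituted into the formula in the next step.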
 4. Substitute the values obtained in Step 3 into the formula. Here n = 6 because
    there are six (6) pairs of values.

$$
\begin{aligned}
r &= \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^{2}\right) - \left(\sum X\right)^{2}\right]\left[n\left(\sum Y^{2}\right) - \left(\sum Y\right)^{2}\right]}}\\
  &= \frac{6(420) - (21)(95)}{\sqrt{\left[6(91) - (21)^{2}\right]\left[6(1975) - (95)^{2}\right]}}\\
  &= \frac{2520 - 1995}{\sqrt{[546 - 441][11850 - 9025]}}\\
  &= \frac{525}{\sqrt{[105][2825]}}\\
  &= \frac{525}{\sqrt{296\,625}}\\
  &\approx 0.96395 \text{ or } 0.96
\end{aligned}
$$

    The value of r is a positive number. Therefore, we can say that there is a positive
    correlation between the hours spent studying and the students' scores in the test.
    Note: For consistency, round your final answer to two decimal places.
       We interpret the value of r by looking at the correlation scale, which gives both
the strength and the direction of the correlation.
       Using the correlation scale, we can determine the strength of the correlation
coefficient r. In the previous example, r = 0.96, which means that there is a “strong
positive correlation” between the time in hours spent studying by six Grade 11
students and their scores in a test. To interpret, we can simply state it this way: “The
more time the Grade 11 students spend studying their lessons, the higher their test
scores,” or “The less time the Grade 11 students spend studying their lessons, the
lower their test scores.”
Example 5
Scenario: Filipino employees are known for being persistent and hardworking. That is
       why they truly value every single cent of their salary. Here are some situations
       showing the relationship between the salary and spending of a Filipino
       employee.
Situation 1: In a survey, the correlation coefficient r between the salary and spending
       of employees was found to be 0.97.
       Interpretation: There is a “strong positive correlation” between the salary and
       spending of employees.
Situation 2: In another survey, the correlation coefficient r between the salary and
       spending of employees was found to be 0.38.
       Interpretation: There is a “weak positive correlation” between the salary and
       spending of employees.
Situation 3: In another survey, the correlation coefficient r between the salary and
       spending of employees was found to be −0.81.
       Interpretation: There is a “strong negative correlation” between the salary and
       spending of employees.
Situation 4: In another survey, the correlation coefficient r between the salary and
       spending of employees was found to be −0.19.
       Interpretation: There is a “weak negative correlation” between the salary and
       spending of employees.
      For more examples, see the table below:
 Bivariate data: temperature and the number of hot chocolate products sold
     Computed Pearson’s r: −0.781
     Interpretation: There is a strong negative correlation between the temperature
     and the number of hot chocolate products sold.
 Bivariate data: amount of coffee intake and number of hours you stay awake
     Computed Pearson’s r: 0.426
     Interpretation: There is a weak positive correlation between the amount of coffee
     intake and the number of hours you stay awake.
 Bivariate data: height and salary of teachers
     Computed Pearson’s r: 0
     Interpretation: There is no correlation between the height and salary of teachers.
 Bivariate data: number of people who clean the garden and number of hours spent
 cleaning the garden
     Computed Pearson’s r: −0.48
     Interpretation: There is a weak negative correlation between the number of people
     who clean the garden and the number of hours spent cleaning the garden.
 Bivariate data: number of pets and amount of money spent buying pet food
     Computed Pearson’s r: 0.73
     Interpretation: There is a strong positive correlation between the number of pets
     and the amount of money spent buying pet food.
      If the data are given in a scatter plot, we can estimate the strength of the
correlation and the value of r. Refer to the examples below:
 Scatter plot showing a strong positive correlation
     Estimated value of r: between 0.5 and 1. We can say 0.8 or 0.75, as long as it is
     within the range in the correlation scale.
 Scatter plot showing a weak negative correlation
     Estimated value of r: between 0 and −0.5. We can say −0.39, as long as it is
     within the range in the correlation scale.
       The closeness of the points to the trend line determines the strength of the
correlation. The closer the points are to the trend line, the stronger the correlation
between the variables.
Note: Other authors use different scales for interpreting correlation. Use the
presented scale throughout this lesson.
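The ranges quoted in this lesson (r between 0.5 and 1 for a strong positive correlation, between 0 and −0.5 for a weak negative one, the mirror-image ranges for the other sign, and r = 0 for no correlation) can be written as a small lookup. This is a sketch under stated assumptions: the function `interpret` is hypothetical, treating exactly ±0.5 as "strong" is our own choice, and the "moderate" category is left out because the lesson does not give its range.

```python
def interpret(r):
    """Label r using thresholds inferred from this lesson's examples (assumed scale)."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        strength = "perfect"
    elif size >= 0.5:      # assumption: exactly 0.5 is counted as strong
        strength = "strong"
    else:                  # 0 < |r| < 0.5
        strength = "weak"
    return f"{strength} {direction} correlation"


# r values taken from Example 5 and the table of additional examples.
for r in [0.97, 0.38, -0.81, -0.19, -0.781, 0.426, 0, -0.48, 0.73]:
    print(r, interpret(r))
```

If your teacher or textbook uses a different scale, only the threshold numbers in this sketch need to change.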
Example 6
       An agriculture student is interested in determining the relationship between the
time in hours spent planting rice on a 250 m² land area and the number of workers
who planted the rice. The data are shown below. Calculate Pearson’s sample
correlation coefficient and interpret it.
 Number of workers (X)        3     4     5     6     7     8     9
 Time in hours spent (Y)      4     4   3.5  3.25   2.5  1.25     1
Solution
The suggested steps will guide you on how to compute the Pearson product moment
correlation r.
 1. Construct a table with columns X, Y, XY, X², and Y², and enter the given data.

        X      Y       XY      X²      Y²
        3      4
        4      4
        5      3.5
        6      3.25
        7      2.5
        8      1.25
        9      1

 2. Complete the table.
    a. Multiply the entries in the X and Y columns. Put the products under the XY
       column.
    b. Square all the entries in the X column. Put them under the X² column.
    c. Square all the entries in the Y column. Put them under the Y² column.

        X      Y       XY      X²      Y²
        3      4       12       9      16
        4      4       16      16      16
        5      3.5     17.5    25      12.25
        6      3.25    19.5    36      10.56
        7      2.5     17.5    49       6.25
        8      1.25    10      64       1.56
        9      1        9      81       1

    (The Y² entries 10.56 and 1.56 are 10.5625 and 1.5625 rounded to two decimal
    places.)

 3. a. Get the sum of all entries in the X column. This is ΣX.
    b. Get the sum of all entries in the Y column. This is ΣY.
    c. Get the sum of all entries in the XY column. This is ΣXY.
    d. Get the sum of all entries in the X² column. This is ΣX².
    e. Get the sum of all entries in the Y² column. This is ΣY².

        ΣX = 42    ΣY = 19.5    ΣXY = 101.5    ΣX² = 280    ΣY² = 63.62
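 4. Substitute the values obtained in Step 3 into the formula. Here n = 7 because
    there are seven (7) pairs of values.

$$
\begin{aligned}
r &= \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\left(\sum X^{2}\right) - \left(\sum X\right)^{2}\right]\left[n\left(\sum Y^{2}\right) - \left(\sum Y\right)^{2}\right]}}\\
  &= \frac{7(101.5) - (42)(19.5)}{\sqrt{\left[7(280) - (42)^{2}\right]\left[7(63.62) - (19.5)^{2}\right]}}\\
  &= \frac{710.5 - 819}{\sqrt{[1960 - 1764][445.34 - 380.25]}}\\
  &= \frac{-108.5}{\sqrt{[196][65.09]}}\\
  &= \frac{-108.5}{\sqrt{12\,757.64}}\\
  &\approx -0.96060 \text{ or } -0.96
\end{aligned}
$$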
 5. Interpretation. The value of r is a negative number close to −1. Therefore, we can
    say that there is a strong negative correlation between the time spent planting rice
    and the number of workers who planted the rice. It means that the more workers
    plant the rice, the less time is spent planting; the fewer the workers, the more
    time is spent planting.
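Because the Y² column rounds 10.5625 and 1.5625 to two decimal places, it is worth checking that the rounding does not change the answer. A short Python check with unrounded values (the variable names are our own):

```python
import math

workers = [3, 4, 5, 6, 7, 8, 9]                  # X
hours_spent = [4, 4, 3.5, 3.25, 2.5, 1.25, 1]    # Y
n = len(workers)                                 # 7 pairs

sum_x = sum(workers)                                       # 42
sum_y = sum(hours_spent)                                   # 19.5
sum_xy = sum(x * y for x, y in zip(workers, hours_spent))  # 101.5
sum_x2 = sum(x * x for x in workers)                       # 280
sum_y2 = sum(y * y for y in hours_spent)                   # 63.625 (not rounded)

numerator = n * sum_xy - sum_x * sum_y                     # 710.5 - 819 = -108.5
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4), round(r, 2))                            # -0.9603 -0.96
```

With unrounded values r ≈ −0.9603 rather than −0.9606, and both round to −0.96, so the interpretation is unchanged.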
Name:___________________________________________ Date:____________
Grade/Section: ____________________________________Score:____________
                Activity 1: Uno or Dos?
                Directions: Determine the variables in the following situations and
                identify whether they involve univariate or bivariate data. The first one
                is given as an example.
  Example: A secretary recorded the daily number of patients a doctor had for a month
  during the General Community Quarantine.
      Variable/s: Number of patients              Univariate or Bivariate: Univariate
  1. Mr. Mangapit will donate face masks to the people in his barangay. He asked a
     health worker to survey the number of family members living in each house in
     his barangay.
      Variable/s: ________________________        Univariate or Bivariate: ____________
  2. To properly compensate the employees, the administrative aide records the number
     of hours the employees work and their respective take-home pay.
      Variable/s: ________________________        Univariate or Bivariate: ____________
  3. A GAS student was tasked to conduct a survey on the number of hours students
     spend playing online games like Mobile Legends.
      Variable/s: ________________________        Univariate or Bivariate: ____________
  4. Jelai asked for the heights of her friends and their mothers.
      Variable/s: ________________________        Univariate or Bivariate: ____________
  5. A researcher records the number of infected COVID-19 patients and the number of
     days they spent in the hospital before recovering from the disease.
      Variable/s: ________________________        Univariate or Bivariate: ____________
________________________________________                              _______________
Parent’s/Guardian’s Signature Over Printed Name                             Date
              Name:___________________________________________ Date:____________
              Grade/Section: ____________________________________Score:____________
                          Activity 2: Plot Me!
                          Directions: Construct the scatter plot of the following data by plotting
                          the points on the given Cartesian plane. Then determine whether the trend
                          and strength of the correlation are strong positive, weak positive,
                          no/negligible correlation, weak negative, or strong negative.
                1. Masipag National High School is known for students who excel in Math. A
                   researcher recorded the IQ scores of the students and their scores on a
                   50-item Math test as the focus of his study.
                    IQ Scores      85   87   83   85   87   97  105  110  115  120
                    Test Scores    21   23   30   34   31   35   40   42   45   48
                   [Cartesian plane: x-axis, IQ Scores; y-axis, Test Scores]
                   Trend:____________________________________________
                2. The Grade 11-Hope students of Masipag National High School believe in the
                   value of thriftiness. That is why they conducted research on selected
                   elementary learners regarding the amount they save from their daily allowance
                   and their corresponding weight, as follows.
                    Amount saved in pesos    10    8   15   20    5    3    5   25   10   15
                    Weight in kilograms      38   40   37   36   42   41   39   35   36   37
                   [Cartesian plane: x-axis, Amount saved in pesos; y-axis, Weight in kilograms]
                         Trend: ____________________________________
      3. Jannah interviewed 9 of her classmates about their average daily allowance in
         pesos and their weight in kilograms. The results are given below.
                         Daily Allowance   35   55   60   65   45   55   70   70   77
                         Weight            40   38   45   43   60   41   63   57   60
                         Trend:___________________________________________________
________________________________________                                         _______________
Parent’s/Guardian’s Signature Over Printed Name                                        Date
  Name:___________________________________________ Date:____________
  Grade/Section: ____________________________________Score:____________
                Activity 3: Solve Me!
                Directions: Read the following situations. Using the given data,
                calculate the Pearson’s sample correlation coefficient. After obtaining
                Pearson’s r, analyze and interpret the result. Show all your necessary
                solutions.
     1. The table shows the data obtained from six students of Mapalad Integrated
        High School in a study about the number of hours a student exercises each
        week and the score s/he gets in a test.
                  Student          Hours (X)    Score (Y)
                      A               1             25
                      B               2             5
                      C               3             20
                      D               4             40
                      E               5             25
                      F               6             9
     2. A group of Senior High School students is conducting collaborative research to
        determine whether there is a correlation between the age of tricycles (in years)
        in a certain city and their mileage (in kilometers per liter). The data are shown
        below.
   Age of tricycle, in years (X)      0.5     1     1.5     2     3     4
   Mileage, in km/liter (Y)            16    14      10    12    10    12
________________________________________                              _______________
Parent’s/Guardian’s Signature Over Printed Name                             Date