Correlation Analysis
A Performance Task Presented to Ms. Ivy M. Geronimo (Subject Instructor)
Cafino, John Spain D.
Cordero, Kristine M.
Estefani, Louie Althea M.
Galdores, Gabriel Rhen J.
Guinto, Derik Connery F.
Padilla, Dennis Angelo M.
Paguio, Shania M.
Quijano, Anthony Lewi S.
11 STEM - A
INTRODUCTION
A correlation test is an inferential statistical test used to determine whether there
is a statistically significant relationship between two variables. With the aid of
bivariate data, learners can describe, interpret, and analyze the relationship between
two quantitative variables. This data can help to:
Identify trends and patterns
Explore possible cause-and-effect relationships
Make predictions
Inform decision-making
The following types of bivariate analysis are used to conduct correlation and
regression line analyses:
Scatter plots
Correlation
Regression
In line with this, listed below are the sets of bivariate data (a minimum of 70
pairs in the sample) that are relevant to everyday life:
I. Academic Performance of 70 Senior High School Students
The group wants to study the relationship between the number of hours
students spend reviewing and their scores on a standardized test.
Hours of Reviewing Test Scores (over 100)
6 85
3 78
4 57
2 45
7 90
4 70
12 96
3 42
6 62
7 68
1 50
1 75
3 80
4 85
5 78
8 86
11 83
6 100
3 62
14 94
9 67
6 78
2 40
4 43
4 36
5 80
3 89
2 69
4 56
8 75
7 60
1 34
0.5 50
5 77
7 87
6 86
9 89
10 91
12 93
2 64
4 68
5 69
1 21
2 64
5 89
6 89
2 70
13 100
4 94
1 54
6 94
9 32
0.5 65
2 78
3 59
12 98
4 53
2 77
6 89
7 64
8 75
0.5 21
10 98
2 25
7 88
12 97
1.5 66
0.5 65
4 73
8 87
II. Allowance of 70 Senior High School Students
The group wants to study the relationship between age and the amount of
allowance that senior high school students receive from their parents.
Age Money Earned (peso/week)
16 500
18 350
15 460
17 600
17 643
17 575
19 410
18 255
17 650
16 245
16 350
18 380
19 285
17 300
16 180
19 540
18 690
17 245
18 700
16 650
18 260
16 270
16 385
17 460
17 470
17 230
18 440
19 345
16 560
18 640
16 430
18 740
16 330
15 220
16 540
17 525
18 430
16 760
17 586
18 630
17 530
19 750
17 520
16 640
18 420
18 350
17 950
16 640
17 750
18 540
19 420
18 330
17 720
16 640
18 420
17 320
18 240
17 140
17 820
19 520
19 740
17 405
18 640
19 530
16 540
17 720
17 650
16 690
16 530
19 480
III. Time Spent by 70 STEM Students on Social Media
The group wants to study the relationship between the amount of time STEM
students spend on social media and their confidence level.
Time on Social Media (minutes) Confidence Level (1-10)
60 7
120 4
40 6
140 3
70 6
80 7
140 5
40 9
60 6.5
75 7
95 4
105 6
180 3
80 7
20 6
80 9.5
50 6
80 7
60 5
30 9
40 4
55 7
70 7
85 6
115 5.5
60 8
30 6
170 3
140 2
80 7
75 3.5
105 1
45 4
50 8
85 7
65 6
60 5
40 5
20 4.5
10 9
60 8
80 8
95 7
85 6
74 5
56 6.5
90 4
25 8
60 9
45 6
60 4
45 5.5
50 6
40 7
110 3
130 2
60 8
30 4.5
55 8
40 6
30 5
45 4
30 5
100 4.5
105 4
240 1
55 3
75 6
60 6
90 9
The bivariate data that we will be using consist of three data sets: academic
performance, allowance, and social media. The academic performance data set pairs
the hours a student spends reviewing with that student's test score. The allowance
data set pairs a student's age with the weekly allowance the student receives from
the family. The social media data set pairs the time a student spends on social
media with the student's confidence level.
By examining the relationship between these variables using correlation analysis, we
can gain insights into how they are related and better understand the factors that
influence students.
Academic Performance - Hours of reviewing and Test Score
Allowance - Age and Money earned
Social Media - Time on social media and confidence level
The importance of our bivariate data is that it lets the researchers determine and
describe how the two variables in each set are related to each other. The analysis
presented here either supports or fails to support the hypothesized relationships.
This is crucial in understanding the underlying patterns and trends in the data,
which can inform future research and decision-making. Additionally, bivariate data
analysis can help flag potential confounding variables that may affect the
relationship between the two variables of interest. When the variables are
correlated, the value of a dependent variable can be predicted from the value of an
independent variable. This can be useful in a variety of study fields, including
social science, medicine, marketing, and others. We used scatter plots to show how
the two variables are related. The main purpose of a scatter plot is to examine and
display the correlation between two numerical variables. Beyond the values of the
individual data points, the pattern formed by the dots shows how the data behave as
a whole.
The things we want to learn from the data are: (1) how much time students spend
on social media; (2) how that time relates to their confidence level; (3) how the
number of hours students spend reviewing relates to how well they perform on tests;
and (4) how much allowance senior high school students receive from their parents,
and at what age they tend to receive the most. For the social media data in
particular, the scatter plot lets us identify any correlation between time spent on
social media and confidence level. This information can be useful in understanding
the impact of social media on individuals' mental health and well-being.
HYPOTHESIS TESTING AND COMPUTATION
After creating the three sets of bivariate data, the group stated the null and
alternative hypotheses, drew the scatter plot, computed r, and obtained the
regression line for each set. The results are presented below for academic
performance, allowance, and social media, respectively.
I. Academic Performance of 70 Senior High School Students
a) Null and Alternative Hypothesis
H0: There is no correlation between the hours of reviewing and the test
scores of 70 Senior High School students.
HA: There is a correlation between the hours of reviewing and the test
scores of 70 Senior High School students.
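To decide whether an observed correlation differs significantly from zero, r can be converted into a t statistic with n - 2 degrees of freedom. The sketch below is only an illustration, not the procedure the group ran in Excel. The function name `correlation_t_statistic` and the critical value 1.9955 (two-tailed, alpha = 0.05, 68 degrees of freedom, taken from a t table) are assumptions introduced here; r and n are the values reported in this section.

```python
import math


def correlation_t_statistic(r: float, n: int) -> float:
    """Convert a Pearson r from n pairs into a t statistic with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Values reported for the academic performance data.
r = 0.607944602
n = 70

# Assumption: two-tailed critical t for alpha = 0.05 and df = 68 (t table).
T_CRITICAL = 1.9955

t_stat = correlation_t_statistic(r, n)

# The correlation is treated as significant when |t| exceeds the critical value.
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

The resulting t should be close to the t Stat that Excel reports for the slope in the regression output, since in simple linear regression the two tests are equivalent.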
b) Scatter Plot
Figure: Academic Performance of 70 Students
(x-axis: Hours of Reviewing; y-axis: Range of Scores; series: Test Scores (over 100))
Linear trend line: f(x) = 3.55279010301454x + 52.5286858207314
R² = 0.369596638718595
c) The computation for r
Excel Formula For Pearson R: 0.607944602
Correlation Data Analysis Table:
                          Hours of Reviewing    Test Scores (over 100)
Hours of Reviewing        1
Test Scores (over 100)    0.607944602           1
d) The computation for the regression line
Regression Statistics
Multiple R 0.607944602
R Square 0.369596639
Adjusted R Square 0.360326001
Standard Error 16.19347792
Observations 70
ANOVA
              df    SS             MS             F              Significance F
Regression    1     10454.3894     10454.3894     39.86744516    2.38361E-08
Residual      68    17831.55345    262.2287273
Total         69    28285.94286
                      Coefficients    Standard Error    t Stat
Intercept             52.52868582     3.511512081       14.95899334
Hours of Reviewing    3.552790103     0.562678534       6.314067244
                      P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept             3.41734E-23    45.52157256    59.53579908    45.52157256    59.53579908
Hours of Reviewing    2.38361E-08    2.429982569    4.675597637    2.429982569    4.675597637
y= 3.55279010301455x + 52.5286858207314
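The regression line above can be reproduced, and then used for prediction, with a few lines of code. The following is a minimal sketch of an ordinary least-squares fit, assuming the data are available as two lists; the helper names `fit_line` and `predict` and the 5-hour input are hypothetical and chosen only for illustration. The slope and intercept constants are the coefficients reported in the Excel output.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the regression line y = slope * x + intercept at x."""
    return slope * x + intercept


# Coefficients reported in the regression output for this data set.
SLOPE = 3.552790103
INTERCEPT = 52.52868582

# Hypothetical input: a student who reviews for 5 hours.
predicted_score = predict(SLOPE, INTERCEPT, 5)
print(f"Predicted test score for 5 hours of reviewing: {predicted_score:.2f}")
```

Calling `fit_line` on the full lists of hours and scores should give coefficients close to the ones Excel reports. Predictions are only reasonable inside the observed range of 0.5 to 14 hours.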
II. Allowance of 70 Senior High School Students
a) Null and Alternative Hypothesis
H0: There is no relationship between the age and the amount of
allowance 70 Senior High School students receive from their parents.
HA: There is a relationship between the age and the amount of allowance
70 Senior High School students receive from their parents.
b) Scatter Plot
Figure: Allowance of 70 Senior High School Students
(x-axis: Age; y-axis: Range of Money Allowance Quantity; series: Money Earned (peso/week))
Linear trend line: f(x) = 3.61770981507823x + 435.443456614509
R² = 0.000486221370092577
c) The computation for r
Excel Formula For Pearson R: 0.022050428
Correlation Data Analysis Table:
                            Age            Money Earned (peso/week)
Age                         1
Money Earned (peso/week)    0.022050428    1
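For readers who want to see what goes into a Pearson r, the sketch below computes it directly from the definition: the sum of cross-deviations divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` is an assumption introduced here, and the five pairs are simply the first five rows of the allowance data, used for illustration only, so the printed value will not equal the r of the full 70-pair sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant.
    return sxy / math.sqrt(sxx * syy)


# First five (age, allowance) pairs from the data set, for illustration only.
ages = [16, 18, 15, 17, 17]
allowances = [500, 350, 460, 600, 643]

sample_r = pearson_r(ages, allowances)
print(f"r for the five-pair illustration: {sample_r:.4f}")
```

Passing all 70 ages and allowances to the same function should give a value close to the one Excel reports.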
d) The computation for the regression line
Regression Statistics
Multiple R 0.022050428
R Square 0.000486221
Adjusted R Square -0.014212511
Standard Error 178.2913523
Observations 70
ANOVA
              df    SS             MS             F              Significance F
Regression    1     1051.513199    1051.513199    0.033079137    0.85622057
Residual      68    2161570.83     31787.80632
Total         69    2162622.343
             Coefficients    Standard Error    t Stat
Intercept    435.4434566     343.3554575       1.268200191
Age          3.617709815     19.89100118       0.181876708
             P-value        Lower 95%       Upper 95%      Lower 95.0%     Upper 95.0%
Intercept    0.209051307    -249.7116912    1120.598604    -249.7116912    1120.598604
Age          0.85622057     -36.07416505    43.30958468    -36.07416505    43.30958468
y= 3.61770981507823x + 435.443456614509
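The entries of the ANOVA table are tied together by simple arithmetic: R Square is the regression sum of squares divided by the total sum of squares, and F is the regression mean square divided by the residual mean square. The short check below is a sketch using the sums of squares and degrees of freedom reported for the allowance data; the variable names are our own.

```python
# Sums of squares and degrees of freedom reported in the ANOVA table.
ss_regression = 1051.513199
ss_residual = 2161570.83
df_regression = 1
df_residual = 68

# Total variation is the sum of explained and unexplained variation.
ss_total = ss_regression + ss_residual

# R Square: share of the variation in allowance explained by age.
r_squared = ss_regression / ss_total

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic: explained variance relative to unexplained variance.
f_stat = ms_regression / ms_residual

print(f"R Square = {r_squared:.9f}, F = {f_stat:.9f}")
```

An R Square this close to zero means age explains almost none of the variation in weekly allowance in this sample.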
III. Time Spent by 70 STEM Students on Social Media
a) Null and Alternative Hypothesis
H0: There is no connection between the amount of time 70 STEM
students spend on social media and their confidence level.
HA: There is a connection between the amount of time 70 STEM students
spend on social media and their confidence level.
b) Scatter Plot
Figure: Time Spent by 70 STEM Students on Social Media
(x-axis: Range of Minutes Spent; y-axis: Confidence Level Ranking; series: Confidence Level (1-10))
Linear trend line: f(x) = -0.0254716855137361x + 7.58631983225568
R² = 0.269073629443458
c) The computation for r
Excel Formula For Pearson R: -0.518723076
Correlation Data Analysis Table:
                                  Time on Social Media (minutes)    Confidence Level (1-10)
Time on Social Media (minutes)    1
Confidence Level (1-10)           -0.518723076                      1
d) The computation for the regression line
Regression Statistics
Multiple R 0.518723076
R Square 0.269073629
Adjusted R Square 0.258324712
Standard Error 1.706164918
Observations 70
ANOVA
              df    SS             MS             F              Significance F
Regression    1     72.86994374    72.86994374    25.03262646    4.22041E-06
Residual      68    197.9479134    2.910998727
Total         69    270.8178571
                                  Coefficients    Standard Error    t Stat
Intercept                         7.586319832     0.424873109       17.85549536
Time on Social Media (minutes)    -0.025471686    0.005091016       -5.003261582
                                  P-value        Lower 95%      Upper 95%       Lower 95.0%    Upper 95.0%
Intercept                         2.26363E-27    6.738498743    8.434140922     6.738498743    8.434140922
Time on Social Media (minutes)    4.22041E-06    -0.03563065    -0.015312721    -0.03563065    -0.015312721
y= -0.0254716855137361x + 7.58631983225568
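The Lower 95% and Upper 95% columns in the regression output are confidence limits of the form estimate ± (critical t × standard error). The sketch below rebuilds the interval for the slope from the reported coefficient and standard error. The helper name `confidence_interval` and the critical value 1.9955 (two-tailed, alpha = 0.05, 68 degrees of freedom, read from a t table) are assumptions made for this illustration.

```python
def confidence_interval(estimate, standard_error, t_critical):
    """Return (lower, upper) limits: estimate -/+ t_critical * standard_error."""
    margin = t_critical * standard_error
    return estimate - margin, estimate + margin


# Slope and standard error reported for time on social media (minutes).
SLOPE = -0.025471686
SLOPE_SE = 0.005091016

# Assumption: two-tailed critical t for alpha = 0.05 and df = 68 (t table).
T_CRITICAL = 1.9955

lower, upper = confidence_interval(SLOPE, SLOPE_SE, T_CRITICAL)

# If the interval does not contain zero, the slope differs significantly from zero.
excludes_zero = lower > 0 or upper < 0

print(f"95% CI for the slope: ({lower:.6f}, {upper:.6f}), excludes zero: {excludes_zero}")
```

Because both limits are negative, the sample suggests that confidence level tends to drop slightly for each extra minute on social media, although this alone does not show that one causes the other.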
RESULTS AND IMPLICATIONS
(you may include the answer for the essential question in this part)
(add if the results of each scatter plots are either strong positive, strong negative,
moderately positive, moderately negative, weak or no correlation)
DOCUMENTATION