
Linear Regression

Linear regression allows predicting values of a dependent variable from known independent variable values using a regression equation. The linear regression line is estimated using the least squares method to minimize error and expressed as y = a + bx, where a is the y-intercept and b is the slope. Correlation analysis measures the strength of relationships between two variables using a correlation coefficient r that ranges from -1 to 1, indicating the extent to which data points cluster around a linear relationship.

Uploaded by Maria Zenypher


LINEAR REGRESSION

A mathematical equation that allows us to predict the values of one dependent variable from known values of one or more independent variables is called a regression equation. The term regression is derived from the original heredity studies made by Francis Galton. In his study, he found that the heights of the sons of tall fathers regressed, over successive generations, toward the mean height of the population. In other words, sons of unusually tall fathers tend to be shorter than their fathers, and sons of unusually short fathers tend to be taller than their fathers. Today the term regression is applied to all types of prediction problems and does not necessarily imply a regression toward a population mean.

In the study of linear regression, we consider the problem of estimating or predicting the
value of a dependent variable Y on the basis of a known measurement of an independent and
frequently controlled variable X.

Using a scatter diagram, we can determine whether the two variables are linearly related to some extent. Once a reasonable linear relationship has been ascertained, we usually express it mathematically by a straight-line equation called the linear regression line. The linear regression line is written in the slope-intercept form

$$\hat{y} = a + bx$$

where the constants a and b represent the y-intercept and slope, respectively. The symbol $\hat{y}$ is used here to distinguish between the value given by the regression line and an actual observed value y for some value of x.

Once the point estimates a and b are determined from the sample data, the linear regression line can be used to predict the value $\hat{y}$ corresponding to any given value x.

Estimation of Parameters. Given the sample $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$, the least-squares estimates of the parameters in the regression line

$$\hat{y} = a + bx$$

are obtained from the formulas

$$b = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} \quad\text{and}\quad a = \bar{y} - b\bar{x}$$
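The two formulas above translate directly into a few lines of code. The following is a minimal Python sketch, assuming the data are given as two plain lists of numbers; the function name `least_squares_line` is our own and not part of any library.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    Uses the summation formulas:
      b = [n*Sum(xy) - Sum(x)*Sum(y)] / [n*Sum(x^2) - (Sum(x))^2]
      a = y-bar - b * x-bar
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = sum_y / n - b * sum_x / n  # a = y-bar - b * x-bar
    return a, b


if __name__ == "__main__":
    a, b = least_squares_line([1, 2, 3, 4, 5, 6], [6, 4, 3, 5, 4, 2])
    print(f"y-hat = {a:.3f} + ({b:.4f})x")
```

Applied to the data of Example 1, the sketch gives b = −54/105 ≈ −0.5143 and a = 5.8; the hand value a = 5.799 arises from rounding b to −0.514 before computing a.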

Example 1. Consider the following data:

x 1 2 3 4 5 6
y 6 4 3 5 4 2
(a) Find the equation of the regression line.
(b) Graph the line on a scatter diagram.
(c) Find the point estimate of $\mu_{Y|4}$, the mean value of Y when x = 4.
Solution:
       x_i   y_i   x_i y_i   x_i²   y_i²
        1     6       6        1     36
        2     4       8        4     16
        3     3       9        9      9
        4     5      20       16     25
        5     4      20       25     16
        6     2      12       36      4
Total  21    24      75       91    106
(a) $n = 6$, $\sum x_i = 21$, $\sum y_i = 24$, $\sum x_i y_i = 75$, $\sum x_i^2 = 91$, $\sum y_i^2 = 106$, $\bar{y} = 4$ and $\bar{x} = 3.5$.

Substituting these values in the formula for b, we get

$$b = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{6(75) - (21)(24)}{6(91) - (21)^2} = \frac{450 - 504}{546 - 441} = \frac{-54}{105} = -0.5143$$

so b ≈ −0.514, and

$$a = \bar{y} - b\bar{x} = 4 - (-0.514)(3.5) = 4 + 1.799 = 5.799$$

This is the regression line:

$$\hat{y} = a + bx = 5.799 - 0.514x$$

(b) [Figure: scatter diagram of the data with the regression line; horizontal axis X, vertical axis Y.]

Since the slope of the regression line is negative, the predicted value $\hat{y}$ decreases as x increases.

(c) $\hat{y} = a + bx = 5.799 + (-0.514)(4) = 5.799 - 2.056 = 3.743$

LINEAR CORRELATION

We shall consider here the problem of measuring the relationship between two variables X
and Y rather than predicting a value of Y from a knowledge of the independent variable X. For
example, if X represents the amount of money spent yearly on advertising by a retail merchandising
firm and Y represents their total yearly sales, we might ask whether a decrease in advertising is likely
to be accompanied by a decrease in the yearly sales.
Correlation analysis attempts to measure the strength of such relationships between two
variables by means of a single number called a correlation coefficient.

A linear correlation coefficient is a measure of the linear relationship between the two random variables X and Y. It is denoted by r and measures the extent to which the points cluster about a straight line. By constructing a scatter diagram for the n pairs of measurements $\{(x_i, y_i);\ i = 1, 2, \ldots, n\}$ in our random sample (as in the graph below), we are able to draw certain conclusions concerning r. If the points closely follow a straight line of positive slope, as in (a), we have a high positive correlation between the two variables. On the other hand, if the points closely follow a straight line of negative slope, as in (b), we have a high negative correlation between the two variables. The correlation between the two variables decreases numerically as the scattering of points from a straight line increases. If the points follow a strictly random pattern, as in (c) below, we have zero correlation and conclude that no linear relationship exists between X and Y.

[Figure: four scatter diagrams of Y against X. (a) Points close to a line of positive slope. (b) Points close to a line of negative slope. (c) Points in a random pattern. (d) Points following a curved (quadratic) pattern.]
The correlation coefficient between two variables is a measure of their linear relationship, and a value of r = 0 implies a lack of linearity, not a lack of association. Hence, if a strong quadratic relationship exists between X and Y, as indicated in (d), we may still obtain a zero correlation even though there is a strong nonlinear relationship.
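The following sketch illustrates the point with hypothetical data (not from the text): y = x² for the integers x = −3 to 3. Because these x values are symmetric about 0, the computational formula for r gives exactly 0, even though y is completely determined by x.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))


# Hypothetical data with an exact quadratic relationship y = x^2.
xs = list(range(-3, 4))
ys = [x * x for x in xs]
print(pearson_r(xs, ys))  # 0.0: no linear relationship, yet y is a function of x
```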

The most widely used measure of linear correlation between two variables is called the PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT, or simply the SAMPLE CORRELATION COEFFICIENT, and is denoted by r.

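The measure of linear relationship between two variables X and Y is estimated by the sample correlation coefficient r, where

$$r = b\,\frac{S_x}{S_y} = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$

and $S_x$ and $S_y$ are the sample standard deviations of the x and y values.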
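A quick numerical check that the two forms of r agree, sketched in Python under the assumption that $S_x$ and $S_y$ are the sample standard deviations with divisor n − 1 (as computed by `statistics.stdev`). It reuses the data of the regression Example 1, for which r ≈ −0.680.

```python
import math
import statistics

# Data of Example 1 in the regression section.
x = [1, 2, 3, 4, 5, 6]
y = [6, 4, 3, 5, 4, 2]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)

# Computational formula for r.
num = n * sum_xy - sum_x * sum_y
r = num / math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))

# Equivalent form r = b * S_x / S_y, with b the least-squares slope.
b = num / (n * sum_x2 - sum_x**2)
r_alt = b * statistics.stdev(x) / statistics.stdev(y)

print(f"r = {r:.4f}, b*Sx/Sy = {r_alt:.4f}")
```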

Since

$$SSE = (n-1)\left(S_y^2 - b^2 S_x^2\right),$$

where $SSE = \sum (y_i - \hat{y}_i)^2$ is the error sum of squares, dividing both sides of the equation by $(n-1)S_y^2$ gives the relation

$$r^2 = 1 - \frac{SSE}{(n-1)S_y^2}.$$
Since SSE and $S_y^2$ are always nonnegative, $r^2 \le 1$; and since $r^2$ is a square, $r^2 \ge 0$. Hence $r^2$ must lie between 0 and 1, and consequently r must range from −1 to +1. A value of r = −1 occurs when SSE = 0 and all points lie exactly on a straight line having a negative slope. If all points lie exactly on a straight line having a positive slope, SSE is again 0 and we obtain r = +1. Hence a perfect linear relationship exists between the values of X and Y in our sample when r = ±1. If r is close to +1 or −1, the linear relationship between the two variables is strong and we say that we have a high correlation. However, if r is close to zero, the linear relationship between X and Y is weak or perhaps nonexistent.

A number that expresses the proportion of the total variation in the values of the variable Y that can be accounted for, or explained by, the linear relationship with the values of the variable X is called the sample coefficient of determination and is denoted by $r^2$. Thus a correlation of r = 0.6 means that 0.36, or 36%, of the total variation of the values of Y in our sample is accounted for by the linear relationship with the values of X.

The values of r and their interpretation

r                  Interpretation
1                  Perfect positive correlation
0.91 to 0.99       Very highly positively correlated
0.71 to 0.90       Highly positively correlated
0.41 to 0.70       Marked or moderately positively correlated
0.21 to 0.40       Low or slightly positively correlated
0.01 to 0.20       Very low positive or negligible
−0.01 to −0.20     Very low negative or negligible
−0.21 to −0.40     Low or slightly negatively correlated
−0.41 to −0.70     Marked or moderately negatively correlated
−0.71 to −0.90     Highly negatively correlated
−0.91 to −0.99     Very highly negatively correlated
−1                 Perfect negative correlation
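The interpretation table can be encoded as a simple lookup. In this sketch r is first rounded to two decimals, since the table's ranges are stated to two decimal places; the table has no row for r = 0, so that case is labelled "Zero correlation" following the earlier discussion of pattern (c). The function name `interpret_r` is our own.

```python
def interpret_r(r):
    """Return the verbal interpretation of r from the table of values of r.

    r is rounded to two decimals first, because the table's ranges are
    given to two decimal places (e.g. 0.90 and 0.91 are adjacent rows).
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    r = round(r, 2)
    if r == 0:
        # The table has no row for r = 0; the text calls this zero correlation.
        return "Zero correlation"
    sign = "positive" if r > 0 else "negative"
    m = abs(r)
    if m == 1:
        return f"Perfect {sign} correlation"
    if m >= 0.91:
        return f"Very highly {sign}ly correlated"
    if m >= 0.71:
        return f"Highly {sign}ly correlated"
    if m >= 0.41:
        return f"Marked or moderately {sign}ly correlated"
    if m >= 0.21:
        return f"Low or slightly {sign}ly correlated"
    return f"Very low {sign} or negligible"


print(interpret_r(-0.53))  # Marked or moderately negatively correlated
```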

Example 1: Compute and interpret the correlation coefficient for the following data:

X 4 5 9 14 18 22 24
Y 16 22 11 16 7 3 17
Solution:

        x_i   y_i   x_i²   y_i²   x_i y_i
         4    16     16    256      64
         5    22     25    484     110
         9    11     81    121      99
        14    16    196    256     224
        18     7    324     49     126
        22     3    484      9      66
        24    17    576    289     408
Total   96    92   1702   1464    1097

n7 x i  96 y i  92  x y  1097i i

x 2
1  1464 and  y  1702
2
1

substituting these values in the formula for r, we get

n x i y i    x i f   y  i
r
n  x 2
1    x   n  y    y 
i
2 2
1 i
2

7 1097    96   92 

7 1702   96  7 1464   92 
2 2

7679  8832

11914  9216 10248  8464
 1153  1153  1153
    0.5255462
 2698 1784 4813232 2193.9079
r  0.53

Since r= -0.53, the two variables X and Y are moderately negatively correlated.
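The sums and the value of r in this example can be reproduced with the short Python sketch below, which uses only the standard library and the computational formula for r.

```python
import math

# Data from the example above.
x = [4, 5, 9, 14, 18, 22, 24]
y = [16, 22, 11, 16, 7, 3, 17]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 96 92 1702 1464 1097
print(f"r = {r:.4f}")                         # r = -0.5255
```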

Example 2. Compute and interpret the correlation coefficient for the aptitude scores and grade point
averages below:

Grade-point Average (Y)    Aptitude Score (X)
1.93 565
2.55 525
1.72 477
2.48 555
2.87 502
1.87 469
1.34 517
3.03 555
2.54 576
2.34 559
1.40 574
1.45 578
1.72 548
3.80 656
2.13 688
1.81 465
2.33 661
2.53 477
2.04 490
3.20 524

Solution (GPA = grade-point average, y_i; AS = aptitude score, x_i):

 y_i     x_i     x_i y_i      x_i²      y_i²
1.93    565    1090.45    319225     3.7249
2.55    525    1338.75    275625     6.5025
1.72    477     820.44    227529     2.9584
2.48    555    1376.40    308025     6.1504
2.87    502    1440.74    252004     8.2369
1.87    469     877.03    219961     3.4969
1.34    517     692.78    267289     1.7956
3.03    555    1681.65    308025     9.1809
2.54    576    1463.04    331776     6.4516
2.34    559    1308.06    312481     5.4756
1.40    574     803.60    329476     1.9600
1.45    578     838.10    334084     2.1025
1.72    548     942.56    300304     2.9584
3.80    656    2492.80    430336    14.4400
2.13    688    1465.44    473344     4.5369
1.81    465     841.65    216225     3.2761
2.33    661    1540.13    436921     5.4289
2.53    477    1206.81    227529     6.4009
2.04    490     999.60    240100     4.1616
3.20    524    1676.80    274576    10.2400

TOTAL  45.08  10961  24896.83  6084835  109.4790

$n = 20$, $\sum x_i = 10961$, $\sum y_i = 45.08$, $\sum x_i y_i = 24896.83$, $\sum x_i^2 = 6084835$ and $\sum y_i^2 = 109.479$.

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}} = \frac{20(24896.83) - (10961)(45.08)}{\sqrt{\left[20(6084835) - (10961)^2\right]\left[20(109.479) - (45.08)^2\right]}}$$

$$= \frac{497936.6 - 494121.88}{\sqrt{(121696700 - 120143521)(2189.58 - 2032.2064)}} = \frac{3814.72}{\sqrt{(1553179)(157.3736)}} = \frac{3814.72}{\sqrt{244429370.67}} = \frac{3814.72}{15634.2371} = 0.2440$$

$$r \approx 0.24$$

Since r ≈ 0.24, the grade-point averages are only low or slightly positively correlated with the aptitude scores.
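Recomputing r directly from the 20 data pairs is a useful check on the hand arithmetic. In this Python sketch the term $n\sum x_i^2 - (\sum x_i)^2$ evaluates to 1553179, $\sum y_i^2$ to 109.479, and r to about 0.244.

```python
import math

# Grade-point averages (y) and aptitude scores (x) from Example 2.
y = [1.93, 2.55, 1.72, 2.48, 2.87, 1.87, 1.34, 3.03, 2.54, 2.34,
     1.40, 1.45, 1.72, 3.80, 2.13, 1.81, 2.33, 2.53, 2.04, 3.20]
x = [565, 525, 477, 555, 502, 469, 517, 555, 576, 559,
     574, 578, 548, 656, 688, 465, 661, 477, 490, 524]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

sxx_term = n * sum_x2 - sum_x**2   # n*Sum(x^2) - (Sum x)^2
syy_term = n * sum_y2 - sum_y**2   # n*Sum(y^2) - (Sum y)^2
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(sxx_term * syy_term)

print(f"Sum y^2 = {sum_y2:.4f}")               # 109.4790
print(f"n*Sum x^2 - (Sum x)^2 = {sxx_term}")   # 1553179
print(f"r = {r:.4f}")                          # 0.2440
```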

The sample correlation coefficient r is a value computed from a random sample of n pairs of
measurements. Different random samples of size n from the same population will generally produce
different values of r.
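To illustrate this sampling variability, the sketch below draws several random samples of size n = 20 from a hypothetical bivariate normal population with correlation 0.5 (an assumption chosen purely for illustration) and computes r for each. It uses the deviation form of r, which is algebraically equivalent to the computational formula used above.

```python
import math
import random


def sample_r(rng, n, rho):
    """Draw n pairs from a standard bivariate normal with correlation rho; return r."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y = rho*x + sqrt(1 - rho^2)*z has unit variance and corr(x, y) = rho.
        ys.append(rho * x + math.sqrt(1 - rho**2) * z)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is reproducible
rs = [sample_r(rng, n=20, rho=0.5) for _ in range(5)]
print([round(r, 3) for r in rs])
```

Each sample comes from the same population, yet the printed values of r differ from one another and from the population value 0.5.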

EXERCISES: Solve each of the following problems. Show all solutions.

1. The grades of a class of 9 students on a midterm report (x) and on the final examination
(y) are as follows:
x 77 50 71 72 81 94 96 99 67
y 82 66 78 34 47 85 99 99 67

(a) Find the equation of the regression line.

(b) Estimate the final examination grade of a student who received a grade of 85 on the midterm report but was ill at the time of the final examination.
(c) Compute r.

2. A study was made on the amount of converted sugar in a certain process at various
temperatures. The data were coded and recorded as follows:

Temperature, x    Converted Sugar, y
1.0               8.1
1.1               7.8
1.2               8.5
1.3               9.8
1.4               9.5
1.5               8.9
1.6               8.6
1.7               10.2
1.8               9.3
1.9               9.2
2.0               10.5

(a) Estimate the linear regression line.

(b) Estimate the amount of converted sugar produced when the coded temperature is
1.75.

3. A mathematics placement test is given to all entering freshmen at a small college. A student who receives a grade below 35 is denied admission to the regular mathematics course and placed in a remedial class. The placement test scores and the final grades for 20 students who took the regular course were recorded as follows:

Placement Test Course Grade Placement Test Course Grade
50 53 90 54
35 41 80 91
35 61 60 48
40 56 60 71
55 68 60 71
65 36 40 47
35 11 55 53
60 70 50 68
90 79 65 57
35 59 50 79

(a) Plot a scatter diagram.

(b) Find the equation of the regression line to predict course grades from placement
test scores.
(c) Graph the line on the scatter diagram
(d) If 60 is the minimum passing grade, below which placement test score should
students in the future be denied admission to this course?

4. Compute and interpret the correlation for the following grades of 6 students selected at
random.

Mathematics Grade 70 92 80 74 65 83
English Grade 74 84 63 87 78 90

5. The following data were obtained in a study of the relationship between the weight and
chest size of infants at birth:
Weight (kg) Chest Size (cm) Weight (kg) Chest Size (cm)
2.75 29.5 4.32 27.7
2.15 26.3 2.31 28.3
4.41 32.2 4.30 30.3
5.52 36.5 3.71 28.7
3.21 27.2
(a) Calculate r.
(b) Graph the line on a scatter diagram.
(c) Find the point estimate of $\mu_{Y|4}$.
