
III SEM CSE

Unit – III
DESCRIBING RELATIONSHIPS
Correlation – Scatter plots – correlation coefficient for quantitative data – computational formula for correlation
coefficient – Regression – regression line – least squares regression line – Standard error of estimate –
interpretation of r2 – multiple regression equations – regression towards the mean

Correlation
Correlation refers to the process of establishing a relationship between two variables. A simple way to get a general idea of whether or not two variables are related is to plot them on a scatter plot. While there are many measures of association for variables measured at the ordinal or higher levels of measurement, correlation is the most commonly used approach.

Types of Correlation
 Positive Correlation – when the values of the two variables move in the same direction so that an
increase/decrease in the value of one variable is followed by an increase/decrease in the value of the
other variable.
 Negative Correlation – when the values of the two variables move in opposite directions, so that an increase/decrease in the value of one variable is followed by a decrease/increase in the value of the other variable.
 No Correlation – when there is no linear dependence or no relation between the two variables.

SCATTERPLOTS
A scatter plot is a graph containing a cluster of dots that represents all pairs of scores. In other words, a scatter plot presents the relationship between two variables in a data set by plotting the data points on a two-dimensional (Cartesian) plane.

Construction of scatter plots (see Fig 6.1)

 The independent variable or attribute is plotted on the X-axis.
 The dependent variable is plotted on the Y-axis.
 Use each pair of scores to locate a dot within the scatter plot.
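Below is a minimal plotting sketch of these steps in Python, assuming matplotlib is installed; the hours_studied and exam_score values are illustrative only, not data from this unit.

```python
# Minimal scatter plot sketch (illustrative data, not from the text)
import matplotlib.pyplot as plt

hours_studied = [1, 2, 3, 4, 5, 6]        # independent variable -> X-axis
exam_score    = [52, 55, 61, 64, 70, 74]  # dependent variable   -> Y-axis

plt.scatter(hours_studied, exam_score)    # one dot per (x, y) pair of scores
plt.xlabel("Hours studied (independent variable)")
plt.ylabel("Exam score (dependent variable)")
plt.title("Scatter plot")
plt.show()
```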
Positive, Negative, or Little or No Relationship?

The first step is to note the tilt or slope, if any, of a dot cluster.
A dot cluster that has a slope from the lower left to the upper right, as in panel A of the figure below, reflects a positive relationship.

A dot cluster that has a slope from the upper left to the lower right, as in panel B, reflects a negative relationship.

A dot cluster that lacks any apparent slope, as in panel C, reflects little or no relationship.

Perfect Relationship
A dot cluster that equals (rather than merely approximates) a straight line reflects a perfect relationship between
two variables.

Curvilinear Relationship
The previous discussion assumes that a dot cluster approximates a straight line and, therefore, reflects a linear relationship. But this is not always the case. Sometimes a dot cluster approximates a bent or curved line, as in the figure below, and therefore reflects a curvilinear relationship.


A CORRELATION COEFFICIENT FOR QUANTITATIVE DATA: r

The correlation coefficient, r, is a summary measure that describes the extent of the statistical
relationship between two interval or ratio level variables.

Properties of r
 The correlation coefficient is scaled so that it is always between -1 and +1.
 When r is close to 0, there is little linear relationship between the variables; the farther r is from 0, in either the positive or negative direction, the stronger the relationship between the two variables.
 The sign of r indicates the type of linear relationship, whether positive or negative.
 The numerical value of r, without regard to sign, indicates the strength of the linear relationship.
 A number with a plus sign (or no sign) indicates a positive relationship, and a number with a minus sign
indicates a negative relationship

COMPUTATION FORMULA FOR r


Calculate a value for r by using the following computation formula:

r = SPxy / √(SSx × SSy)

Where the two sum of squares terms in the denominator are defined as

SSx = Σ(x − x̄)²    and    SSy = Σ(y − ȳ)²

The sum of the products term in the numerator, SPxy, is defined as

SPxy = Σ(x − x̄)(y − ȳ)

Or the formula is written as

r = [n Σxy − Σx Σy] / √{[n Σx² − (Σx)²] [n Σy² − (Σy)²]}

Where n = number of pairs of scores
Σx = total of the first variable's values
Σy = total of the second variable's values
Σxy = sum of the products of the paired first and second values
Σx² = sum of the squares of the first variable's values
Σy² = sum of the squares of the second variable's values
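As a quick check of the computational formula, here is a minimal Python sketch (no external libraries). It reuses the small (x, y) data set from the regression example later in this unit; the variable names are illustrative.

```python
# Sketch: computing r with the computational formula above
from math import sqrt

x = [2, 3, 5, 7, 9]
y = [4, 5, 7, 10, 15]

n = len(x)                                    # number of pairs of scores
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))   # close to +1, indicating a strong positive relationship
```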


REGRESSION
A regression is a statistical technique that relates a dependent variable to one or more independent
(explanatory) variables. A regression model is able to show whether changes observed in the dependent variable
are associated with changes in one or more of the explanatory variables.
Regression captures the correlation between variables observed in a data set, and quantifies whether
those correlations are statistically significant or not.

A Regression Line
A regression line is a line that best describes the behaviour of a set of data. In other words, it is the line that best fits the trend of the given data.


The purpose of the line is to describe the interrelation of a dependent variable (Y variable) with one or more independent variables (X variables). By using the equation obtained from the regression line, an analyst can forecast future values of the dependent variable by inputting different values for the independent ones.

Types of regression
The two basic types of regression are
 Simple linear regression – uses one independent variable to explain or predict the outcome of the dependent variable Y.
 Multiple linear regression – uses two or more independent variables to predict the outcome of the dependent variable Y.

Predictive Errors
Prediction error refers to the difference between the predicted values made by some model and the
actual values.

LEAST SQUARES REGRESSION LINE


The placement of the regression line minimizes not the total predictive error but the total squared
predictive error, that is, the total for all squared predictive errors. When located in this fashion, the regression
line is often referred to as the least squares regression line.
The Least Squares Regression Line is the line that minimizes the sum of the residuals squared. The
residual is the vertical distance between the observed point and the predicted point, and it is calculated by subtracting ŷ from y.

Formula

y' = bx + a,   where b = slope and a = y-intercept

b = [N Σ(xy) − Σx Σy] / [N Σ(x²) − (Σx)²]

a = (Σy − b Σx) / N
Example

"x" "y"
2 4
3 5
5 7
7 10
9 15

Step 1: For each (x, y) pair, calculate x² and xy:

x    y    x²    xy
2    4     4     8
3    5     9    15
5    7    25    35
7   10    49    70
9   15    81   135

Step 2: Sum x, y, x² and xy (gives us Σx, Σy, Σx² and Σxy):
Σx = 26   Σy = 41   Σx² = 168   Σxy = 263
Step 3: Calculate slope b

b = [N Σ(xy) − Σx Σy] / [N Σ(x²) − (Σx)²]
  = (5 × 263 − 26 × 41) / (5 × 168 − 26²)
  = (1315 − 1066) / (840 − 676)
  = 249 / 164
b = 1.5183

Step 4: Calculate intercept a

a = (Σy − b Σx) / N
  = (41 − 1.5183 × 26) / 5
a = 0.3049

Step 5: Write the least squares regression line y' = bx + a

y' = 1.518x + 0.305


x    y    y' = 1.518x + 0.305    error (y' − y)
2    4    3.34                   −0.66
3    5    4.86                   −0.14
5    7    7.89                    0.89
7   10   10.93                    0.93
9   15   13.97                   −1.03

To predict a y value, substitute any value of x into the equation. For example, assume x = 8.
Then y' = 1.518 × 8 + 0.305 = 12.45
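The same five-step calculation can be reproduced with a short Python sketch, shown below as an illustration using only built-in functions; the printed numbers should match the hand calculation (b ≈ 1.5183, a ≈ 0.3049, y' ≈ 12.45 for x = 8).

```python
# Sketch: least squares slope and intercept for the example data above,
# using b = [NΣxy − ΣxΣy] / [NΣx² − (Σx)²] and a = (Σy − bΣx) / N
x = [2, 3, 5, 7, 9]
y = [4, 5, 7, 10, 15]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope     ≈ 1.5183
a = (sum_y - b * sum_x) / n                                     # intercept ≈ 0.3049

def predict(new_x):
    """Predicted value y' = bx + a."""
    return b * new_x + a

print(round(b, 4), round(a, 4), round(predict(8), 2))           # 1.5183 0.3049 12.45
```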

STANDARD ERROR OF ESTIMATE, sy|x

The standard error of the estimate is a measure of the accuracy of predictions. The regression line is the line that minimizes the sum of squared deviations of prediction (also called the sum of squares error), and the standard error of the estimate is the square root of the average squared deviation.
The standard error of estimate, symbolized as sy|x, complies with the general format for any sample standard deviation: the square root of a sum of squares term divided by its degrees of freedom.

sy|x = √[ Σ(y − y')² / (n − 2) ]

Fig. Predictive errors for five friends

Example
Calculate the standard error of estimate for the given X and Y values. X = 1,2,3,4,5 Y=2,4,5,4,5


Solution
Create columns labeled x, y, x², xy, y' = bx + a, y − y', and (y − y')², with N = 5.

x    y    x²    xy    y' = bx + a    y − y'    (y − y')²
1    2     1     2    2.8            −0.8      0.64
2    4     4     8    3.4             0.6      0.36
3    5     9    15    4.0             1.0      1.00
4    4    16    16    4.6            −0.6      0.36
5    5    25    25    5.2            −0.2      0.04

Σx = 15   Σy = 20   Σx² = 55   Σxy = 66   Σ(y − y')² = 2.4

Note: to find the value of b we also need Σxy and Σx², which is why the xy and x² columns are included in the table.

b = [N Σ(xy) − Σx Σy] / [N Σ(x²) − (Σx)²]
  = (5(66) − 15 × 20) / (5(55) − 15²)
  = (330 − 300) / (275 − 225)
b = 30/50 = 0.6

a = (Σy − b Σx) / N
  = (20 − 0.6 × 15) / 5
  = (20 − 9) / 5
a = 11/5 = 2.2

sy|x = √[ Σ(y − y')² / (n − 2) ]
     = √(2.4 / 3)
sy|x = 0.894
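A minimal Python sketch of the same calculation is given below, reusing the fitted line y' = 0.6x + 2.2 from the worked solution; it should reproduce Σ(y − y')² = 2.4 and sy|x ≈ 0.894.

```python
# Sketch: standard error of estimate for the example above (X = 1..5, Y = 2,4,5,4,5)
from math import sqrt

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2                       # slope and intercept from the worked solution

residuals_sq = [(yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y)]   # (y − y')² per pair
s_yx = sqrt(sum(residuals_sq) / (len(x) - 2))                       # √[Σ(y − y')² / (n − 2)]
print(round(sum(residuals_sq), 2), round(s_yx, 3))                  # 2.4 and 0.894
```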

INTERPRETATION OF r²
R-Squared (R² or the coefficient of determination) is a statistical measure in a regression model that
determines the proportion of variance in the dependent variable that can be explained by the independent
variable. In other words, r-squared shows how well the data fit the regression model (the goodness of fit).
R-squared can take any value between 0 and 1. Although the statistic provides some useful insight into the regression model, the user should not rely on it alone when assessing a statistical model.


In addition, it does not indicate the correctness of the regression model. Therefore, the user should
always draw conclusions about the model by analyzing r-squared together with the other variables in a
statistical model.
The most common interpretation of r-squared is how well the regression model explains observed data.
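As an illustration, the sketch below computes r² for the five-point example from the standard error section as 1 − SSresidual/SStotal, i.e. the proportion of the variance in y explained by the regression; this route to r² is assumed here for demonstration.

```python
# Sketch: r² as the proportion of variance in y explained by the regression line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, a = 0.6, 2.2                                  # fitted line y' = 0.6x + 2.2

y_pred = [b * xi + a for xi in x]
y_mean = sum(y) / len(y)

ss_res = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))   # Σ(y − y')²
ss_tot = sum((yi - y_mean) ** 2 for yi in y)                # Σ(y − ȳ)²
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))   # 0.6: about 60% of the variance in y is explained by x
```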

MULTIPLE REGRESSION EQUATIONS


Multiple regression is a statistical technique applied to datasets to draw out the relationship between one response (dependent) variable and multiple independent variables.
Multiple regression works by considering the values of the available independent variables and predicting the value of the one dependent variable.

Example:
A researcher decides to study students' performance at a school over a period of time. He observes that as lectures move online, the performance of students starts to decline. The dependent variable "decrease in performance" is explained by various independent variables such as "lack of attention", "internet addiction" and "neglecting studies", among others.

Formula for the multiple regression equation

y = b1x1 + b2x2 + … + bnxn + a

where b1, b2, …, bn are the regression coefficients of the independent variables x1, x2, …, xn and a is the intercept.
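A minimal sketch of fitting such an equation with ordinary least squares is shown below, assuming NumPy is available; the two-predictor data set is invented purely for illustration.

```python
# Sketch: fitting y = b1*x1 + b2*x2 + a by ordinary least squares with NumPy
import numpy as np

x1 = np.array([1, 2, 3, 4, 5])        # first independent variable
x2 = np.array([2, 1, 4, 3, 5])        # second independent variable
y  = np.array([3, 4, 8, 9, 13])       # dependent variable

# Design matrix with a column of ones so the intercept a is estimated as well.
X = np.column_stack([x1, x2, np.ones_like(x1)])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
b1, b2, a = coeffs
print(b1, b2, a)                       # predictions use y' = b1*x1 + b2*x2 + a
```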

REGRESSION TOWARD THE MEAN


Regression toward the mean refers to a tendency for scores, particularly extreme scores, to shrink toward the
mean.
In statistics, regression toward the mean (also called reversion to the mean, and reversion to mediocrity) is a
concept that refers to the fact that if one sample of a random variable is extreme, the next sampling of the same
random variable is likely to be closer to its mean.

Example
A military commander has two units return, one with 20% casualties and another with 50% casualties. He praises the first and berates the second. The next time, the two units return with the opposite results. From this experience, he "learns" that praise weakens performance and berating improves it, when in fact both reversals are better explained by regression toward the mean than by his feedback.

The Regression Fallacy


The regression fallacy is committed whenever regression toward the mean is interpreted as a real, rather
than a chance, effect.
The regression fallacy can be avoided by splitting the subset of extreme observations into two groups: one that receives the treatment and a control group that does not. If both groups show a similar shift toward the mean, the change can be attributed to chance rather than to the treatment.
