SLR Prediction

Uploaded by hanandeh0791
Copyright © All Rights Reserved

Simple Linear Regression - Prediction
Research Objective
Research Question: What is the average student height for students whose mother is 64
inches tall?

How would you figure this out?


Prediction in Regression
Research Question: What is the average student height for students whose mother is 64
inches tall?
Answer: Plug the mother's height (64 inches) into the best-fit regression line.

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = 35.653 + 0.503 \times 64 = 67.845$$
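The point prediction is plain arithmetic on the fitted line. Below is a minimal Python sketch, assuming the coefficients come from ordinary least squares (the usual best-fit line); `fit_line` and `predict` are hypothetical helper names, not the course app's code.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def predict(b0, b1, x):
    """Point prediction (y-hat) on the fitted line at a given x."""
    return b0 + b1 * x

# Using the rounded coefficients reported above for the mother/student data:
print(round(predict(35.653, 0.503, 64), 3))  # 67.845
```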


Confidence Intervals for Averages
Using principles similar to those we have used before to build confidence intervals:

$$\hat{y} \pm t^{\star}\,\hat{\sigma}\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

is a confidence interval for the average value of $y$ at a given $x$ (here, the population average student height for 64-inch-tall mothers), where the value of $t^{\star}$ is determined by the confidence level.
For our analysis, this comes out to be (67.662, 68.054) for a 95% interval.
Notes:

1. Don’t worry about the formula (computer will calculate this for you).
2. Interpretation: We are 95% confident that the average height of all students whose mothers are 64 inches tall is between 67.662 and 68.054 inches.
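Even though the computer does this for you, a short sketch shows how the pieces of the formula fit together. It assumes $\hat{\sigma}$ is the usual residual standard error, $\sqrt{\text{SSE}/(n-2)}$, and that you supply $t^{\star}$ yourself from a t table with $n - 2$ degrees of freedom (with SciPy installed, `scipy.stats.t.ppf(0.975, n - 2)` gives the 95% value). The function name `mean_response_ci` is hypothetical, and the data in the usage line are toy values, not the course data.

```python
import math
from statistics import fmean

def mean_response_ci(xs, ys, x_new, t_star):
    """Confidence interval for the average y at x_new in simple linear regression.

    t_star is supplied by the caller: the t critical value with n - 2
    degrees of freedom for the chosen confidence level.
    """
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Least-squares slope and intercept
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    # sigma-hat: residual standard error with n - 2 degrees of freedom
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma_hat = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_new
    margin = t_star * sigma_hat * math.sqrt(1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Toy data (hypothetical); 3.182 is the 95% t value for 5 - 2 = 3 degrees of freedom.
print(mean_response_ci([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 3, t_star=3.182))
```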
Prediction in Regression
Research Question: Shaylee’s mom is 64 inches tall, what will her height be?
Thought Questions:

1. Is this the same question as above? If not, what is the difference?


It’s not the same. One is asking about an average while one is asking about a
specific person.
The “average” is the line while specific people are the “dots”.
Prediction in Regression
Research Question: Shaylee’s mom is 64 inches tall, what will her height be?
Thought Questions:

2. Should our point prediction (1 number prediction) be the same or different?


The point prediction should be the same because individual “dots” scatter both above and below the line, so the line is still our best single guess. In this case, we still think Shaylee’s height will be 67.845.
Prediction in Regression
Research Question: Shaylee’s mom is 64 inches tall, what will her height be?

3. Should our interval for the prediction be the same or different? Why or why not?
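It should be wider: individual heights vary around the average, so predicting one person's height adds that person-to-person variability on top of the uncertainty in the line itself.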
Prediction Intervals for Individuals
Using principles similar to those we have used before to build confidence intervals:

$$\hat{y} \pm t^{\star}\,\hat{\sigma}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$

is a prediction interval for the value of $y$ given an $x$ (for example, Shaylee’s height if her mom is 64 inches tall), where the value of $t^{\star}$ is determined by the confidence level.
For our analysis, this comes out to be (60.449, 75.268) for a 95% interval.
Notes:

1. Don’t worry about the formula (computer will calculate this for you).
2. The interpretation is similar: We are 95% confident that Shaylee’s height, given her mom is 64 inches tall, will be between 60.449 and 75.268 inches.
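The prediction interval differs from the confidence interval only by the extra 1 under the square root. A minimal Python sketch written directly from summary quantities; the function and parameter names are hypothetical, `sxx` stands for $\sum_{i=1}^{n}(x_i-\bar{x})^2$, and $t^{\star}$ again uses $n - 2$ degrees of freedom (an assumption, supplied by the caller).

```python
import math

def prediction_interval(y_hat, sigma_hat, n, x_new, x_bar, sxx, t_star):
    """Prediction interval for a single new y at x_new.

    sxx is sum((x_i - x_bar) ** 2) over the data; t_star uses n - 2 degrees of freedom.
    """
    # The leading 1 accounts for one individual's own scatter around the line.
    margin = t_star * sigma_hat * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin
```

Dropping the `1 +` term gives the confidence-interval margin instead.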
Prediction vs Confidence Intervals
Confidence interval for prediction: An interval estimate for the average of y given an x.
Prediction interval for prediction: An interval estimate for the value of a single y given an
x.

Prediction intervals are ALWAYS wider than confidence intervals. Why?

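There is more variability from student to student than in the average height of students: a prediction interval has to cover one student's scatter around the line as well as the uncertainty in the line itself.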
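Both intervals share the same $\hat{y}$, $t^{\star}$ and $\hat{\sigma}$, so comparing the two formulas term by term shows why. The ratio of the half-widths is

$$\frac{\text{prediction half-width}}{\text{confidence half-width}} = \frac{\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}}{\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}} > 1$$

because of the extra 1 in the numerator. As $n$ grows, the confidence half-width typically shrinks toward 0, while the prediction half-width levels off near $t^{\star}\hat{\sigma}$: more data pins down the line, but not any one student's height.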
Using the Analysis Tool
All previous steps in the tool are the same as those covered in previous lecture notes.
Nuances of Predictions
Research Question: Lucy’s mom is 82 inches tall, what will her height be?

Answer:

Don’t make this prediction, because 82 inches is outside the range of the data! Predicting outside the data range is called extrapolation.
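One simple guard is to refuse to predict when $x$ falls outside the observed range. This is a minimal sketch, assuming “the range of the data” means between the smallest and largest observed $x$; `safe_predict` is a hypothetical name. Staying inside the range rules out the obvious case but does not by itself guarantee a good prediction.

```python
def safe_predict(b0, b1, x_new, xs):
    """Predict b0 + b1 * x_new, but refuse to extrapolate past the observed x range."""
    lo, hi = min(xs), max(xs)
    if not lo <= x_new <= hi:
        raise ValueError(
            f"x = {x_new} is outside the observed range [{lo}, {hi}]: extrapolation"
        )
    return b0 + b1 * x_new

# Hypothetical: if the observed mothers' heights spanned 55 to 70 inches,
# safe_predict(35.653, 0.503, 82, xs) would raise ValueError.
```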
Nuances of Predictions
1. Extrapolation - trying to predict outside of the range of the data.
Nuances of Predictions
2. How do we know if our predictions are any good? For example, how do we know if our
prediction for Shaylee’s height was good or bad?
Issue: To evaluate how well we do at predicting, we essentially need to know the
true answer of the thing we are predicting for.
Solution: Cross-validation
Principles of K-Fold CV
Purpose: Assess how well your model does at predicting
General Idea: Fit your model to part of your data then see how well your model predicts
the remainder of your data
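The general idea can be sketched in plain Python. This is a minimal sketch, not the course app's implementation: the helper names `fit_line` and `kfold_predictions` are hypothetical, folds are formed by shuffling the row indices and dealing them out round-robin, and each point gets exactly one out-of-fold prediction from a line fit without its fold.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b1 * x_bar, b1

def kfold_predictions(xs, ys, k=5, seed=None):
    """Out-of-fold predictions: each point is predicted by a line fit without its fold."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # random split, so results vary with seed
    folds = [idx[j::k] for j in range(k)]     # k roughly equal-sized folds
    preds = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:                        # predict only the held-out points
            preds[i] = b0 + b1 * xs[i]
    return preds
```

Setting `k` equal to the number of observations gives leave-one-out cross-validation, where each fold holds a single point.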
Using the Analysis Tool
Nuances of Cross Validation
1. Randomly split the data into folds → every run of cross-validation will give slightly
different results
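2. There are many performance metrics, but the most common is root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n_{\text{validation}}} \sum_{i=1}^{n_{\text{validation}}} (y_i - \hat{y}_i)^2}$$

where $y_i$ is an observation in the validation set and $\hat{y}_i$ is the corresponding prediction.
3. Intuitively, RMSE is roughly the typical size of a prediction error, in the same units as $y$. Strictly, it is the square root of the average squared error, so large errors count more than small ones.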
4. What constitutes a “small” RMSE is relative to the problem.
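A short Python sketch of the RMSE formula, with mean absolute error alongside for comparison (both function names are hypothetical). RMSE squares each error before averaging, so a single large miss raises it more than it raises the plain average of absolute errors. Whether the course app pools all out-of-fold errors or averages per-fold RMSEs is not stated in these notes.

```python
import math

def rmse(ys, preds):
    """Root mean square error between observed values and predictions."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

def mae(ys, preds):
    """Mean absolute error, shown only for comparison with RMSE."""
    return sum(abs(y - p) for y, p in zip(ys, preds)) / len(ys)

# Hypothetical errors of 0, 0 and 2: one large miss pulls RMSE above MAE.
ys, preds = [10, 12, 14], [10, 12, 16]
print(round(mae(ys, preds), 3), round(rmse(ys, preds), 3))  # 0.667 1.155
```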
Additional Prediction Practice
Measuring possum head size can be difficult. However, measuring total possum length is
easier. What is the relationship between possum length and head size? Use a simple linear
regression model (and the course app) to answer the following questions:

1. Sydney found a huge 96 cm possum. What is your predicted head length for this
possum?
95% prediction interval is (92.431, 102.986).
2. Sydney found a huge 96 cm possum. What is the average head length for possums of
this size?
95% confidence interval is (96.545, 98.872).
3. Sydney found a baby 70 cm possum. What is your predicted head length for this
possum?
EXTRAPOLATION!
4. Is your model good or bad at predicting possum head sizes?
The RMSE from 104-fold cross-validation is 2.0132492.
Key Terminology
Confidence Intervals for Averages
Prediction Intervals for Individuals
Extrapolation
Cross-validation
Root mean square error (RMSE)
