Chapter 8: Regression Wisdom
• Patterns in residual plots (p. 277)
• Example: Population (in millions) of a country for 2000-2005,
  with the years recorded as 0, 1, 2, 3, 4, 5:
Example
    • The regression equation is
      population = 5.19 + 0.686 year
    • R-Sq = 93.5%
Example - continued
          • Nonlinearity is more prominent in the
            residual plot than in the original scatterplot
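
To see why a residual plot can expose curvature that a high R-Sq hides, here is a minimal sketch. The population values are hypothetical (a gently curved trend made up for illustration, not the data behind the slide), and `fit_line` is a helper written for this example.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


years = [0, 1, 2, 3, 4, 5]
# Hypothetical, gently curved values in millions (NOT the slide's data).
population = [5.0 + 0.3 * t + 0.08 * t ** 2 for t in years]

intercept, slope = fit_line(years, population)
residuals = [y - (intercept + slope * t) for t, y in zip(years, population)]

# R-squared = 1 - SS_residual / SS_total
mean_pop = sum(population) / len(population)
ss_res = sum(e ** 2 for e in residuals)
ss_tot = sum((y - mean_pop) ** 2 for y in population)
r_squared = 1 - ss_res / ss_tot

print(f"R-squared = {r_squared:.3f}")
for t, e in zip(years, residuals):
    print(f"year {t}: residual {e:+.3f}")
```

With these made-up values the residuals are positive at both ends and negative in the middle, a bent pattern that signals nonlinearity even though R-squared is high.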
      Sifting Residuals for Groups
• No regression analysis is complete without a
  display of the residuals to check that the linear
  model is reasonable.
• Residuals often reveal subtleties that were not
  clear from a plot of the original data.
• Sometimes they reveal violations of the
  regression conditions that require our attention.
Example: Regression Analysis: Self-Esteem versus Age
• It is a good idea to look at both a histogram of the residuals
  and a scatterplot of the residuals vs. predicted values.
• The residuals appear to fall into two groups
Real Data
Example
    • An examination of residuals
      often leads us to discover
      groups of observations that
      are different from the rest.
    • Histograms might show
      multiple modes.
    • When we discover that there
      is more than one group in a
      regression, we may decide to
      analyze the groups separately,
      using a different model for
      each group.
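
The idea of fitting a different model for each group can be sketched as follows. The two groups, their labels, and all values are hypothetical, chosen so that a single pooled line leaves residuals in two clusters; `fit_line` and `residuals` are helpers written for this example.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(points, intercept, slope):
    """Observed y minus predicted y for each (x, y) point."""
    return [y - (intercept + slope * x) for x, y in points]


# Hypothetical data: two groups that follow roughly parallel lines.
x = [1, 2, 3, 4, 5]
wiggle = [0.1, -0.1, 0.0, 0.1, -0.1]
group_a = [(xi, 2 + xi + w) for xi, w in zip(x, wiggle)]
group_b = [(xi, 6 + xi - w) for xi, w in zip(x, wiggle)]

# One line for all points: residuals split into a negative and a positive cluster.
pooled = group_a + group_b
b0, b1 = fit_line([p[0] for p in pooled], [p[1] for p in pooled])
pooled_res_a = residuals(group_a, b0, b1)
pooled_res_b = residuals(group_b, b0, b1)

# A separate line for each group: residuals shrink toward zero.
a0, a1 = fit_line([p[0] for p in group_a], [p[1] for p in group_a])
c0, c1 = fit_line([p[0] for p in group_b], [p[1] for p in group_b])
sep_res_a = residuals(group_a, a0, a1)
sep_res_b = residuals(group_b, c0, c1)

print("pooled residuals, group A:", [round(e, 2) for e in pooled_res_a])
print("pooled residuals, group B:", [round(e, 2) for e in pooled_res_b])
print("separate-fit residuals, A:", [round(e, 2) for e in sep_res_a])
print("separate-fit residuals, B:", [round(e, 2) for e in sep_res_b])
```

A histogram of the pooled residuals here would show two modes, one per group, which is the cue to consider separate models.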
   Outliers, Leverage, and Influence
• Any point that stands away from the others
  can be called an outlier and deserves your
  special attention.
• Outlying points can strongly influence a
  regression. Even a single point far from the
  body of the data can dominate the analysis.
High Leverage Point: Examples
• A data point that has an x-
  value far from the mean of
  the x-values is called a high
  leverage point.
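
One way to put a number on "far from the mean of the x-values" is the standard leverage formula for simple linear regression, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The slides do not state this formula; it is included here as a sketch, with hypothetical x-values and a helper named `leverages` written for the example.

```python
def leverages(xs):
    """Leverage of each point in a simple (one-predictor) linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical x-values: five clustered points and one far from the rest.
xs = [1, 2, 3, 4, 5, 20]
h = leverages(xs)

for x, hi in zip(xs, h):
    print(f"x = {x:>2}: leverage = {hi:.3f}")
```

Leverage depends only on the x-values, so a point can have high leverage whatever its y-value is. In simple regression the leverages always add up to 2, so one value near 1 means that point takes most of the available pull.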
Influential Observations: Example
• A data point is influential if
  omitting it from the analysis
  gives a very different model.
  (p. 284)
Example (A high leverage point that is
          not influential)
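
The difference between leverage and influence can be checked by fitting the line with and without the point in question. The sketch below uses hypothetical data: five points near a line, plus one extra point at x = 20 that either follows the trend or falls far below it. `fit_line` is a helper written for this example.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical base data lying close to the line y = 1 + 2x.
base_x = [1, 2, 3, 4, 5]
base_y = [3.1, 4.9, 7.0, 9.1, 10.9]
_, slope_base = fit_line(base_x, base_y)

# Case 1: high-leverage point that follows the trend (y near 1 + 2 * 20).
_, slope_on_trend = fit_line(base_x + [20], base_y + [41.0])

# Case 2: high-leverage point far below the trend.
_, slope_off_trend = fit_line(base_x + [20], base_y + [10.0])

print(f"slope without the extra point:  {slope_base:.3f}")
print(f"slope with on-trend point:      {slope_on_trend:.3f}")
print(f"slope with off-trend point:     {slope_off_trend:.3f}")
```

In both cases the extra point has the same x-value and therefore the same leverage. Only the off-trend point changes the slope much, so only that one is influential.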
        Restricted-range problem
• When one of the variables is restricted (you
  only look at some of the values), the
  correlation can be surprisingly low.
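
A small simulation illustrates the restricted-range effect. Everything here is an assumption made for the example: the simulated linear relationship, the noise level, the seed, and the 45-55 window used to restrict x. `correlation` is a helper that computes Pearson's r.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data: y follows x with the same scatter everywhere.
x = [rng.uniform(0, 100) for _ in range(2000)]
y = [xi + rng.gauss(0, 15) for xi in x]
r_full = correlation(x, y)

# Keep only the cases whose x-value lies in a narrow window.
restricted = [(xi, yi) for xi, yi in zip(x, y) if 45 <= xi <= 55]
r_restricted = correlation([p[0] for p in restricted],
                           [p[1] for p in restricted])

print(f"full range:       n = {len(x)}, r = {r_full:.2f}")
print(f"restricted range: n = {len(restricted)}, r = {r_restricted:.2f}")
```

The relationship between x and y is the same in both samples. The correlation drops in the restricted one because x varies so little there that the scatter in y dominates.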
Restricted Range
 Working with summary statistics
• There is a strong,
  positive, linear
  association between
  weight (in pounds) and
  height (in inches) for
  men
 Working with summary statistics
• If instead of data on
  individuals we only had
  the mean weight for
  each height value, we
  would see an even
  stronger association
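
The effect of replacing individuals with group means can be sketched with simulated data. The heights, the weight formula, the noise level, and the seed below are all hypothetical, not the data behind the slides. `correlation` is a helper that computes Pearson's r.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical individuals: 50 men at each whole-inch height from 64 to 76.
heights, weights = [], []
for h in range(64, 77):
    for _ in range(50):
        heights.append(h)
        weights.append(-200 + 5.5 * h + rng.gauss(0, 25))

r_individuals = correlation(heights, weights)

# Summary statistics: one mean weight per height value.
distinct = sorted(set(heights))
mean_weights = [
    sum(w for h, w in zip(heights, weights) if h == d) / heights.count(d)
    for d in distinct
]
r_means = correlation(distinct, mean_weights)

print(f"r for individuals: {r_individuals:.2f}")
print(f"r for mean weights: {r_means:.2f}")
```

Averaging removes most of the person-to-person variation in weight at each height, so the means hug the line more tightly than the individuals do. A correlation computed from summary statistics therefore overstates how well height predicts one man's weight.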