5.8 Adding Fitted Lines from Multiple Existing Models
5.8.1 Problem
You have already created a fitted regression model object for a data set, and you want to plot the lines for that model.
5.8.2 Solution
Use the predictvals()
function from the previous recipe (5.7) along with functions from the dplyr
package, including group_by()
and do()
.
With the heightweight
data set, we’ll make a linear model for each of the levels of sex
. The model building is done for each subset of the data frame by specifying the model computation we want within the do()
function.
The following code splits the heightweight
data frame into two data frames, by the grouping variable sex
. This creates a data frame subset for males and a data frame subset for females. We then apply lm(heightIn ~ ageYear, .)
to each subset. The .
in lm(heightIn ~ ageYear)
represents the data frame we are piping in from the previous line – in this case, the two data frame subsets we have just created. Finally, this code returns a data frame which contains a column, model
, which is a list of the model outputs corresponding to each level of the grouping variable sex
, female and male.
library(gcookbook) # Load gcookbook for the heightweight data set
library(dplyr)
# Create an lm model object for each value of sex; this returns a data frame
heightweight %>%
models <- group_by(sex) %>%
do(model = lm(heightIn ~ ageYear, .)) %>%
ungroup()
# Print the data frame
models#> # A tibble: 2 × 2
#> sex model
#> <fct> <list>
#> 1 f <lm>
#> 2 m <lm>
# Print out the model column of the data frame
$model
models#> [[1]]
#>
#> Call:
#> lm(formula = heightIn ~ ageYear, data = .)
#>
#> Coefficients:
#> (Intercept) ageYear
#> 43.963 1.209
#>
#>
#> [[2]]
#>
#> Call:
#> lm(formula = heightIn ~ ageYear, data = .)
#>
#> Coefficients:
#> (Intercept) ageYear
#> 30.658 2.301
Now that we have the list of model objects, we can run the predictvals()
as defined in 5.7 to get the predicted values from each model:
models %>%
predvals <- group_by(sex) %>%
do(predictvals(.$model[[1]], xvar = "ageYear", yvar = "heightIn"))
Finally, we can plot the data with the predicted values (Figure 5.24):
ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
geom_point() +
geom_line(data = predvals)
# Using facets instead of colors for the groups
ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
geom_point() +
geom_line(data = predvals) +
facet_grid(. ~ sex)
5.8.3 Discussion
The group_by()
and do()
calls are used for splitting the data into parts, running functions on those parts, and then reassembling the output.
With the preceding code, the x range of the predicted values for each group spans the x range of each group, and no further; for the males, the prediction line stops at the oldest male, while for females, the prediction line continues further right, to the oldest female. To form prediction lines that have the same x range across all groups, we can simply pass in xrange
, like this:
models %>%
predvals <- group_by(sex) %>%
do(predictvals(
$model[[1]],
.xvar = "ageYear",
yvar = "heightIn",
xrange = range(heightweight$ageYear))
)
Then we can plot it, the same as we did before:
ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
geom_point() +
geom_line(data = predvals)
As you can see in Figure 5.25, the line for males now extends as far to the right as the line for females. Keep in mind that extrapolating past the data isn’t always appropriate, though; whether or not it’s justified will depend on the nature of your data and the assumptions you bring to the table.