6.4 Making Multiple Density Curves from Grouped Data
6.4.2 Solution
Use geom_density(), and map the grouping variable to an aesthetic like colour or fill, as shown in Figure 6.12. The grouping variable must be a factor or a character vector. In the birthwt data set, the desired grouping variable, smoke, is stored as a number, so we have to convert it to a factor first.
library(MASS)     # Load MASS for the birthwt data set
library(dplyr)    # For mutate(), recode(), and the %>% pipe
library(ggplot2)

# Convert smoke to a factor
birthwt_mod <- birthwt %>%
  mutate(smoke = as.factor(smoke))

# Map smoke to colour
ggplot(birthwt_mod, aes(x = bwt, colour = smoke)) +
  geom_density()

# Map smoke to fill and make the fill semitransparent by setting alpha
ggplot(birthwt_mod, aes(x = bwt, fill = smoke)) +
  geom_density(alpha = .3)
6.4.3 Discussion
To make these plots, the data must all be in one data frame, with one column containing a categorical variable used for grouping.
For this example, we used the birthwt data set. It contains data about birth weights and a number of risk factors for low birth weight:
birthwt
#>    low age lwt race smoke ptl ht ui ftv  bwt
#> 85   0  19 182    2     0   0  0  1   0 2523
#> 86   0  33 155    3     0   0  0  0   3 2551
#> 87   0  20 105    1     1   0  0  0   1 2557
#>  ...<183 more rows>...
#> 82   1  23  94    3     1   0  0  0   0 2495
#> 83   1  17 142    2     0   0  1  0   0 2495
#> 84   1  21 130    1     1   0  1  0   3 2495
We looked at the relationship between smoke (smoking) and bwt (birth weight in grams). The value of smoke is either 0 or 1, but since it’s stored as a numeric vector, ggplot doesn’t know that it should be treated as a categorical variable. To have ggplot treat smoke as categorical, we can either convert that column of the data frame to a factor, or wrap it as factor(smoke) inside the aes() call. For these examples, we converted smoke to a factor.
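The second option, wrapping the variable in factor() inside aes(), might look like the following. This is a minimal sketch that uses the unmodified birthwt data frame; the object name p_inline is only a placeholder for illustration.

```r
library(MASS)     # For the birthwt data set
library(ggplot2)

# smoke is still numeric in birthwt; factor() converts it on the fly,
# so the data frame itself is left unchanged
p_inline <- ggplot(birthwt, aes(x = bwt, colour = factor(smoke))) +
  geom_density()

p_inline
```

With this approach the legend title defaults to the expression factor(smoke); adding labs(colour = "smoke") is one way to tidy it up.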
Another method for visualizing the distributions is to use facets, as shown in Figure 6.13. We can align the facets vertically or horizontally. Here we’ll align them vertically so that it’s easy to compare the two distributions:
ggplot(birthwt_mod, aes(x = bwt)) +
  geom_density() +
  facet_grid(smoke ~ .)
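For comparison, horizontal alignment only requires swapping the two sides of the facet_grid() formula. The sketch below is self-contained and rebuilds the factor column itself; the names birthwt_h and p_horizontal are placeholders, not objects used elsewhere in this recipe.

```r
library(MASS)     # For the birthwt data set
library(ggplot2)

# Work on a copy so the original data set is left untouched
birthwt_h <- birthwt
birthwt_h$smoke <- factor(birthwt_h$smoke)

# Putting smoke on the right-hand side of the formula
# lays the facets out side by side (one column per level)
p_horizontal <- ggplot(birthwt_h, aes(x = bwt)) +
  geom_density() +
  facet_grid(. ~ smoke)

p_horizontal
```

Side-by-side facets share a y-axis, which makes peak heights easy to compare, whereas the vertical layout shares the x-axis and makes it easier to compare where along the birth weight scale each distribution sits.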
One problem with the faceted graph is that the facet labels are just 0 and 1, and there’s no label indicating that those values are for smoke. To change the labels, we need to change the names of the factor levels. First we’ll take a look at the factor levels, then we’ll assign new factor level names:
levels(birthwt_mod$smoke)
#> [1] "0" "1"
birthwt_mod$smoke <- recode(birthwt_mod$smoke, '0' = 'No Smoke', '1' = 'Smoke')
Now when we plot our modified data frame, our desired labels appear (Figure 6.13, right):
ggplot(birthwt_mod, aes(x = bwt)) +
  geom_density() +
  facet_grid(smoke ~ .)
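If you would rather leave the factor levels as they are, another possibility is to have the facet strips show the variable name next to each value by passing ggplot2's label_both labeller to facet_grid(). The sketch below assumes a fresh copy of the data with the original 0/1 levels; birthwt_f and p_labelled are placeholder names.

```r
library(MASS)     # For the birthwt data set
library(ggplot2)

# Fresh copy with smoke as a factor, keeping the original "0"/"1" levels
birthwt_f <- birthwt
birthwt_f$smoke <- factor(birthwt_f$smoke)

# label_both prints "variable: value" in each facet strip,
# e.g. "smoke: 0" and "smoke: 1"
p_labelled <- ggplot(birthwt_f, aes(x = bwt)) +
  geom_density() +
  facet_grid(smoke ~ ., labeller = label_both)

p_labelled
```

This keeps the data untouched, but the strips still show the coded values rather than descriptive names, so renaming the levels remains the better choice when the codes themselves are not self-explanatory.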
If you want to see the histograms along with the density curves, the best option is to use facets, since other methods of visualizing both histograms in a single graph can be difficult to interpret. To do this, map y = ..density.., so that the histogram is scaled down to the height of the density curves. In this example, we’ll also make the histogram bars a little less prominent by changing the colors (Figure 6.14):
ggplot(birthwt_mod, aes(x = bwt, y = ..density..)) +
  geom_histogram(binwidth = 200, fill = "cornsilk", colour = "grey60", size = .2) +
  geom_density() +
  facet_grid(smoke ~ .)
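On recent versions of ggplot2, the ..density.. notation and the size argument for line widths still work but produce deprecation warnings. A sketch of the same plot in the newer style follows, assuming ggplot2 3.4.0 or later, where after_stat(density) replaces ..density.. and linewidth replaces size for lines. The names birthwt_s and p_modern are placeholders, and the factor labels are set with base R's factor() here simply to keep the example self-contained.

```r
library(MASS)     # For the birthwt data set
library(ggplot2)  # Assumes ggplot2 >= 3.4.0

# Build a labelled factor directly from the 0/1 codes
birthwt_s <- birthwt
birthwt_s$smoke <- factor(birthwt_s$smoke,
                          levels = c(0, 1),
                          labels = c("No Smoke", "Smoke"))

# after_stat(density) scales the histogram to density units,
# so it lines up with the density curve drawn on top
p_modern <- ggplot(birthwt_s, aes(x = bwt, y = after_stat(density))) +
  geom_histogram(binwidth = 200, fill = "cornsilk",
                 colour = "grey60", linewidth = .2) +
  geom_density() +
  facet_grid(smoke ~ .)

p_modern
```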