15.15 Calculating New Columns From Existing Columns
15.15.2 Solution
Use mutate() from the dplyr package.
library(gcookbook) # Load gcookbook for the heightweight data set
heightweight
#> sex ageYear ageMonth heightIn weightLb
#> 1 f 11.92 143 56.3 85.0
#> 2 f 12.92 155 62.3 105.0
#> ...<232 more rows>...
#> 236 m 13.92 167 62.0 107.5
#> 237 m 12.58 151 59.3 87.0
This will convert heightIn to centimeters and store it in a new column, heightCm:
library(dplyr)
heightweight %>%
  mutate(heightCm = heightIn * 2.54)
#> sex ageYear ageMonth heightIn weightLb heightCm
#> 1 f 11.92 143 56.3 85.0 143.002
#> 2 f 12.92 155 62.3 105.0 158.242
#> ...<232 more rows>...
#> 236 m 13.92 167 62.0 107.5 157.480
#> 237 m 12.58 151 59.3 87.0 150.622
This returns a new data frame, so if you want to replace the original variable, you will need to save the result over it.
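As a minimal sketch of saving the result, the example below assigns the output of mutate() back to the same variable. To stay self-contained it uses a hypothetical two-row data frame, hw_small, built from the first two rows of heightweight shown above, rather than the full data set.

```r
library(dplyr)

# hw_small is a hypothetical stand-in holding the first two rows of heightweight
hw_small <- data.frame(
  sex      = c("f", "f"),
  heightIn = c(56.3, 62.3),
  weightLb = c(85.0, 105.0)
)

# mutate() does not modify hw_small in place; assign the result to keep the new column
hw_small <- hw_small %>%
  mutate(heightCm = heightIn * 2.54)

names(hw_small)
```

Without the assignment, the new column would be printed but not stored, and hw_small would still have only its original three columns.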
15.15.3 Discussion
You can use mutate() to transform multiple columns at once:
heightweight %>%
  mutate(
    heightCm = heightIn * 2.54,
    weightKg = weightLb / 2.204
  )
#> sex ageYear ageMonth heightIn weightLb heightCm weightKg
#> 1 f 11.92 143 56.3 85.0 143.002 38.56624
#> 2 f 12.92 155 62.3 105.0 158.242 47.64065
#> ...<232 more rows>...
#> 236 m 13.92 167 62.0 107.5 157.480 48.77495
#> 237 m 12.58 151 59.3 87.0 150.622 39.47368
It is also possible to calculate a new column based on multiple columns (this assumes that heightCm and weightKg already exist in the data frame):
heightweight %>%
  mutate(bmi = weightKg / (heightCm / 100)^2)
With mutate(), the columns are added sequentially. That means that we can reference a newly-created column when calculating a new column:
heightweight %>%
  mutate(
    heightCm = heightIn * 2.54,
    weightKg = weightLb / 2.204,
    bmi = weightKg / (heightCm / 100)^2
  )
#> sex ageYear ageMonth heightIn weightLb heightCm weightKg bmi
#> 1 f 11.92 143 56.3 85.0 143.002 38.56624 18.85919
#> 2 f 12.92 155 62.3 105.0 158.242 47.64065 19.02542
#> ...<232 more rows>...
#> 236 m 13.92 167 62.0 107.5 157.480 48.77495 19.66736
#> 237 m 12.58 151 59.3 87.0 150.622 39.47368 17.39926
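Because the columns are added in order, a column can only refer to ones defined earlier in the same mutate() call. The sketch below illustrates this with a hypothetical two-row data frame, hw_small, by listing bmi before the columns it depends on and catching the resulting error. It assumes a clean session with no objects named heightCm or weightKg in the calling environment.

```r
library(dplyr)

# hw_small is a hypothetical stand-in holding the first two rows of heightweight
hw_small <- data.frame(
  heightIn = c(56.3, 62.3),
  weightLb = c(85.0, 105.0)
)

# bmi is listed first, before heightCm and weightKg have been created,
# so mutate() cannot find them and signals an error
result <- tryCatch(
  hw_small %>%
    mutate(
      bmi      = weightKg / (heightCm / 100)^2,
      heightCm = heightIn * 2.54,
      weightKg = weightLb / 2.204
    ),
  error = function(e) "error"
)

result
```

Moving the bmi line to the end of the call, as in the example above, resolves the error.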
With base R, calculating a new column can be done by referencing the new column with the $ operator and assigning some values to it:
heightweight$heightCm <- heightweight$heightIn * 2.54
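The same approach extends to several columns, including ones that depend on columns created a moment earlier. The following is a sketch using a hypothetical two-row data frame, hw_small, built from the first two rows of heightweight; each assignment modifies the data frame in place, so later lines can use the earlier results.

```r
# hw_small is a hypothetical stand-in holding the first two rows of heightweight
hw_small <- data.frame(
  heightIn = c(56.3, 62.3),
  weightLb = c(85.0, 105.0)
)

# Each assignment adds one column to hw_small directly
hw_small$heightCm <- hw_small$heightIn * 2.54
hw_small$weightKg <- hw_small$weightLb / 2.204

# bmi uses the two columns created above
hw_small$bmi <- hw_small$weightKg / (hw_small$heightCm / 100)^2

hw_small
```

Unlike mutate(), this changes the original data frame immediately, and the data frame name must be repeated for every column reference.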
15.15.4 See Also
See Recipe 15.16 for how to perform group-wise transformations on data.