Data Transformation with dplyr
dplyr functions work with pipes and expect tidy data. In tidy data:
- Each variable is in its own column.
- Each observation, or case, is in its own row.

Pipes: x %>% f(y) becomes f(x, y).

Summarise Cases

Apply summary functions to columns to create a new table of summary statistics. Summary functions take vectors as input and return one value (see back).

summarise(.data, …) Compute table of summaries.
    summarise(mtcars, avg = mean(mpg))

count(.data, …, wt = NULL, sort = FALSE, name = NULL) Count number of rows in each group defined by the variables in …. Also tally().
    count(mtcars, cyl)

Group Cases

Use group_by(.data, …, .add = FALSE, .drop = TRUE) to create a "grouped" copy of a table grouped by columns in …. dplyr functions will manipulate each "group" separately and combine the results.
    mtcars %>%
      group_by(cyl) %>%
      summarise(avg = mean(mpg))

Use rowwise(.data, …) to group data into individual rows. dplyr functions will compute results for each row. Also apply functions to list-columns. See the tidyr cheat sheet for the list-column workflow.
    starwars %>%
      rowwise() %>%
      mutate(film_count = length(films))

ungroup(x, …) Returns an ungrouped copy of a table.
    g_mtcars <- group_by(mtcars, cyl)
    ungroup(g_mtcars)

Manipulate Cases

EXTRACT CASES
Row functions return a subset of rows as a new table.

filter(.data, …, .preserve = FALSE) Extract rows that meet logical criteria.
    filter(mtcars, mpg > 20)

distinct(.data, …, .keep_all = FALSE) Remove rows with duplicate values.
    distinct(mtcars, gear)

slice(.data, …, .preserve = FALSE) Select rows by position.
    slice(mtcars, 10:15)

slice_sample(.data, …, n, prop, weight_by = NULL, replace = FALSE) Randomly select rows. Use n to select a number of rows and prop to select a fraction of rows.
    slice_sample(mtcars, n = 5, replace = TRUE)

slice_min(.data, order_by, …, n, prop, with_ties = TRUE) and slice_max() Select rows with the lowest and highest values.
    slice_min(mtcars, mpg, prop = 0.25)

slice_head(.data, …, n, prop) and slice_tail() Select the first or last rows.
    slice_head(mtcars, n = 5)

Logical and boolean operators to use with filter():
    ==    <    <=    is.na()     %in%    |    xor()
    !=    >    >=    !is.na()    !       &
See ?base::Logic and ?Comparison for help.

ARRANGE CASES
arrange(.data, …, .by_group = FALSE) Order rows by values of a column or columns (low to high); use with desc() to order from high to low.
    arrange(mtcars, mpg)
    arrange(mtcars, desc(mpg))

ADD CASES
add_row(.data, …, .before = NULL, .after = NULL) Add one or more rows to a table.
    add_row(cars, speed = 1, dist = 1)

Manipulate Variables

EXTRACT VARIABLES
Column functions return a set of columns as a new vector or table.

pull(.data, var = -1, name = NULL, …) Extract column values as a vector, by name or index.
    pull(mtcars, wt)

select(.data, …) Extract columns as a table.
    select(mtcars, mpg, wt)

relocate(.data, …, .before = NULL, .after = NULL) Move columns to a new position.
    relocate(mtcars, mpg, cyl, .after = last_col())

Use these helpers with select() and across(), e.g. select(mtcars, mpg:cyl):
    contains(match)       num_range(prefix, range)        :, e.g. mpg:cyl
    ends_with(match)      all_of(x)/any_of(x, …, vars)    -, e.g. -gear
    starts_with(match)    matches(match)                  everything()

MANIPULATE MULTIPLE VARIABLES AT ONCE
across(.cols, .fns, …, .names = NULL) Summarise or mutate multiple columns in the same way.
    summarise(mtcars, across(everything(), mean))

c_across(.cols) Compute across columns in row-wise data.
    transmute(rowwise(cars), total = sum(c_across(1:2)))

MAKE NEW VARIABLES
Apply vectorized functions to columns. Vectorized functions take vectors as input and return vectors of the same length as output (see back).

mutate(.data, …, .keep = "all", .before = NULL, .after = NULL) Compute new column(s). Also add_column(), add_count(), and add_tally().
    mutate(mtcars, gpm = 1 / mpg)

transmute(.data, …) Compute new column(s), drop others.

rename(.data, …) Rename columns. Use rename_with() to rename with a function.
    rename(cars, distance = dist)
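The pipe rule above, x %>% f(y) becomes f(x, y), can be checked directly. This is a minimal sketch that assumes dplyr 1.0 or later is installed (dplyr re-exports %>% from magrittr). The object names piped and nested are illustrative only.

```r
library(dplyr)

# mtcars %>% filter(mpg > 20) is rewritten as filter(mtcars, mpg > 20):
# the left-hand side becomes the first argument of the right-hand call.
piped  <- mtcars %>% filter(mpg > 20)
nested <- filter(mtcars, mpg > 20)
identical(piped, nested)

# Longer chains read left to right instead of inside out.
mtcars %>%
  filter(mpg > 20) %>%
  select(mpg, wt)
```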
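The group_by() and ungroup() entries under Group Cases work together: a grouped copy changes how summarise() behaves until the grouping is removed. This is a minimal sketch that assumes dplyr 1.0 or later. The .groups argument of summarise() is not shown on the sheet. It is used here only to make the ungrouped result explicit.

```r
library(dplyr)

# group_by() returns a grouped copy; mtcars itself is unchanged.
g_mtcars <- group_by(mtcars, cyl)

# summarise() computes one row per group. .groups = "drop" returns an
# ungrouped result explicitly instead of relying on the default.
by_cyl <- summarise(g_mtcars, avg = mean(mpg), n = n(), .groups = "drop")
by_cyl

# The grouping stays on g_mtcars until ungroup() removes it.
is_grouped_df(g_mtcars)
is_grouped_df(ungroup(g_mtcars))
```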
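count() collapses a table to one row per group. add_count(), listed alongside mutate(), keeps every row and appends the group size. This sketch contrasts the two and assumes dplyr 1.0 or later with the built-in mtcars data.

```r
library(dplyr)

# One row per cylinder value, with the number of cars in column n.
counts <- count(mtcars, cyl)
counts

# sort = TRUE puts the largest group first.
count(mtcars, cyl, sort = TRUE)

# add_count() keeps all 32 rows and appends the group size instead.
with_n <- add_count(mtcars, cyl)
head(with_n[, c("cyl", "n")])
```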
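The operator table for filter() can be combined in several ways. Passing several conditions to filter() is the same as joining them with &. This is a minimal sketch assuming dplyr 1.0 or later. The variable names a, b, four_or_six and one_of are illustrative.

```r
library(dplyr)

# Comma-separated conditions are combined with &.
a <- filter(mtcars, mpg > 20, hp > 100)
b <- filter(mtcars, mpg > 20 & hp > 100)
identical(a, b)

# %in% keeps rows whose value appears in a set.
four_or_six <- filter(mtcars, cyl %in% c(4, 6))

# xor() keeps rows where exactly one of the two conditions holds.
one_of <- filter(mtcars, xor(am == 1, vs == 1))
```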
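By default, distinct() returns only the columns it was given. With .keep_all = TRUE, it keeps every column and takes the first row seen for each distinct value. This sketch assumes dplyr 1.0 or later.

```r
library(dplyr)

# Only the gear column, one row per distinct value.
gears <- distinct(mtcars, gear)

# .keep_all = TRUE keeps every column, using the first row seen for each gear.
first_per_gear <- distinct(mtcars, gear, .keep_all = TRUE)
```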
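The with_ties argument of slice_min() matters when values repeat. In mtcars, two cars share the lowest mpg, so asking for one row can return two. This is a minimal sketch assuming dplyr 1.0 or later. The seed value passed to set.seed() is arbitrary.

```r
library(dplyr)

# Two cars share the lowest mpg (10.4), so with_ties = TRUE returns both.
lowest_ties <- slice_min(mtcars, mpg, n = 1)
lowest_one  <- slice_min(mtcars, mpg, n = 1, with_ties = FALSE)

# slice_sample() is random; set.seed() makes the draw reproducible.
set.seed(42)
five <- slice_sample(mtcars, n = 5)
```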
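This sketch shows which mtcars columns a few of the select() helpers from the helper table pick out, and where relocate() moves columns. It assumes dplyr 1.0 or later, which re-exports the tidyselect helpers.

```r
library(dplyr)

names(select(mtcars, mpg:cyl))                 # a range of adjacent columns
names(select(mtcars, ends_with("t")))          # names ending in "t"
names(select(mtcars, contains("ar")))          # names containing "ar"
names(select(mtcars, -gear))                   # everything except gear
names(select(mtcars, all_of(c("mpg", "wt"))))  # names given as a character vector

# relocate() moves mpg and cyl to the end, keeping the other columns in order.
names(relocate(mtcars, mpg, cyl, .after = last_col()))
```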
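across() works column-wise and c_across() works row-wise, as described under MANIPULATE MULTIPLE VARIABLES AT ONCE. This sketch assumes dplyr 1.0 or later. The "mean_{.col}" naming pattern is an illustrative choice for the .names argument.

```r
library(dplyr)

# across() applies one function to every selected column; .names controls
# the output names, with {.col} standing for each input column's name.
means <- summarise(mtcars, across(c(mpg, wt), mean, .names = "mean_{.col}"))
means

# c_across() gathers values from several columns of the current row,
# so sum() runs once per row of the rowwise table.
totals <- transmute(rowwise(cars), total = sum(c_across(speed:dist)))
head(totals)
```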
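The rowwise() example under Group Cases relies on films being a list-column in starwars, the dataset bundled with dplyr. This sketch assumes dplyr 1.0 or later. It compares the rowwise result with base R lengths(), which gives the same per-row counts in a single vectorized call.

```r
library(dplyr)

# films is a list-column: each cell holds a character vector of film titles.
sw <- starwars %>%
  rowwise() %>%
  mutate(film_count = length(films)) %>%
  ungroup()

# Without rowwise(), length(films) would be the length of the whole
# list-column (one value recycled to every row). lengths() is the
# vectorized equivalent of the rowwise computation.
identical(sw$film_count, lengths(starwars$films))
```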
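Under MAKE NEW VARIABLES, mutate() keeps the existing columns while transmute() drops them. The .keep and .before arguments from the mutate() signature adjust this behaviour. This is a minimal sketch assuming dplyr 1.0 or later.

```r
library(dplyr)

# mutate() keeps existing columns and appends gpm (gallons per mile).
with_gpm <- mutate(mtcars, gpm = 1 / mpg)

# transmute() keeps only the columns it computes.
only_gpm <- transmute(mtcars, gpm = 1 / mpg)

# .keep = "none" gives mutate() the same column set as transmute().
kept_none <- mutate(mtcars, gpm = 1 / mpg, .keep = "none")

# .before places the new column instead of appending it.
names(mutate(mtcars, gpm = 1 / mpg, .before = 1))[1]
```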
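rename() takes new_name = old_name pairs. rename_with() instead applies a function to the names of the selected columns. This sketch assumes dplyr 1.0 or later and uses base R toupper() as an example function.

```r
library(dplyr)

# rename() uses new_name = old_name pairs.
renamed <- rename(cars, distance = dist)

# rename_with() applies a function to the names of the selected columns
# (all columns by default).
upper <- rename_with(mtcars, toupper)
upper_some <- rename_with(mtcars, toupper, .cols = c(mpg, cyl))
```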
                                                                                                                                                                                                        RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at dplyr.tidyverse.org • dplyr 1.0.7 • Updated: 2021-07
Vectorized Functions
TO USE WITH MUTATE()
mutate() and transmute() apply vectorized functions to columns to create new columns.

Summary Functions
TO USE WITH SUMMARISE()
summarise() applies summary functions to columns to create a new table. Summary

Combine Tables
COMBINE VARIABLES
COMBINE CASES