Collapse Cheat Sheet

collapse is a C/C++ package for R that provides efficient statistical functions and data manipulation capabilities. It allows fast grouped, weighted, and time series computations on matrices and data frames. collapse handles data transformation uniformly while preserving attributes and ensuring compatibility with packages like dplyr, data.table, and panel data classes. It provides full user control for statistical programming with optimization possibilities.

Advanced and Fast Data Transformation with collapse :: CHEAT SHEET

Introduction Fast Statistical Functions Grouping and Ordering Fast Data Manipulation
collapse is a C/C++ based package supporting advanced Fast functions to perform column–wise grouped and Optimized functions for grouping, ordering, unique Minimal overhead implementations
(grouped, weighted, time series, panel data and recursive) weighted computations on matrix-like objects values, splitting & recombining, and dealing with factors
fselect[<-]() - select/replace columns
statistical operations in R, with very efficient low-level
vectorizations across both groups and columns. fmean, fmedian, fmode, fsum, fprod, fsd, fvar GRP() - create a grouping object (class ’GRP’): pass to g arg. fsubset() - subset data (rows and columns)
fmin, fmax, fnth, ffirst, flast, fnobs, fndistinct g <- GRP(iris, ~ Species) # or GRP(iris$Species) or GRP(iris["Species"])
It also offers a flexible, class-agnostic, approach to data fndistinct(iris[1:4], g) # Computation without grouping overhead ss() - fast alternative to [, particularly for data frames
transformation in R: handling matrix and data frame based Syntax ## Sepal.Length Sepal.Width Petal.Length Petal.Width [row|col]order[v]() - reorder (sort) rows and columns
objects in a uniform, attribute preserving, way, and ensuring ## setosa 15 16 9 6
## versicolor 21 14 19 9 fmutate(), fsummarise() - dplyr -like, incl. across() feature
seamless compatibility with dplyr / (grouped) tibble, data.table, FUN(x, g = NULL, [w = NULL], TRA = NULL, ## virginica 21 13 20 12
xts, sf and plm classes for panel data (’pseries’, ’pdata.frame’). [na.rm = TRUE], use.g.names = TRUE, [f|set]transform[v][<-]() - transform cols (by reference)
fgroup_by() - attach ’GRP’ object to data: a class-agnostic
collapse provides full control to the user for statistical [drop = TRUE], [nthreads = 1L])
grouped frame supporting fast computations fcompute[v]() - compute new cols dropping existing ones
programming - with several ways to reach the same outcome mtcars |> fgroup_by(cyl, vs, am) |> ss(1:2)
and rich optimization possibilities. Its default is na.rm = TRUE, x vector, matrix, or (grouped) data frame / list [f|set]rename() - rename (any object with ’names’ attribute)
## mpg cyl disp hp drat wt qsec vs am gear carb
and implemented at very low cost at the algorithm level. g [optional] (list of) vectors / factors or GRP() object ## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 [set]relabel() - assign/change variable labels (’label’ attr.)
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Calling help("collapse-documentation") brings up a w [optional] vector of (frequency) weights ## get_vars[<-]() - select/replace columns (standard eval.)
## Grouped by: cyl, vs, am [7 | 5 (3.8) 1-12]
detailed documentation, which is also available online. See TRA [optional] operation to transform data with computed [num|cat|char|fact|logi|date]_vars[<-]() - select/
# Group Stats: [N. groups | mean (sd) min-max of group sizes]
also the fastverse package/project for a recommended set of statistics (see FUN argument to TRA() and Examples) # Fast Functions also have a grouped_df method: here wt-weighted medians replace columns by data type or retrieve names/indices
complementary packages and easy package management. mtcars |> fgroup_by(cyl, vs, am) |> fmedian(wt) |> head(3)
drop drop matrix / data frame dimensions. default TRUE add_vars[<-]() - add or column-bind columns
## cyl vs am sum.wt mpg disp hp drat qsec gear carb
## 1 4 0 1 2.140 26.0 120.3 91 4.43 16.70 5 2
Examples Examples
Row/Column Arithmetic (by Reference) fmean(AirPassengers) # Vector
## 2 4 1 0 8.805 22.8 140.8 95 3.70 20.01 4 2
## 3 4 1 1 14.198 30.4 79.0 66 4.08 18.61 4 1
mtcars |> fsubset(mpg > fnth(mpg, 0.95), disp:wt, cylinders = cyl)
Column-wise sweeping out of vectors/matrices/DFs/lists ## [1] 280.2986 GRPN(), fgroup_vars(), fungroup() - get group count, ## disp hp drat wt cylinders
fmean(AirPassengers, w = cycle(AirPassengers)) # Weighted mean grouping columns/variables, and ungroup data ## Fiat 128 78.7 66 4.08 2.200 4
%cr%, %c+%, %c-%, %c*%, %c/% e.g. Z = X %c/% rowSums(X) ## [1] 284.3397
## Toyota Corolla 71.1 65 4.22 1.835 4
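The sweeping operators and setop() listed under Row/Column Arithmetic can be sketched on a small matrix. This is a minimal illustration, not taken from the cheat sheet: the matrices X and Y and the result names are made up for the example, and the expected behaviour (rows or columns summing to one) follows from the "e.g." forms shown above.

```r
library(collapse)

# Small numeric matrix used only for illustration
X <- matrix(as.numeric(1:6), nrow = 2)

# %c/% sweeps a vector out column-wise: every column is divided by rowSums(X),
# so each row of the result sums to 1
row_shares <- X %c/% rowSums(X)

# %r/% sweeps a vector out row-wise: every row is divided by colSums(X),
# so each column of the result sums to 1
col_shares <- X %r/% colSums(X)

# setop() does the same arithmetic by reference (Y itself is modified)
Y <- matrix(as.numeric(1:6), nrow = 2)
setop(Y, "/", colSums(Y), rowwise = TRUE)

rowSums(row_shares)
colSums(col_shares)
```

Because setop() changes its first argument in place, the sketch builds a separate matrix Y rather than reusing X.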

Row-wise sweeping vectors from vectors/matrices/DFs/lists fmean(EuStockMarkets) # Matrix

qF(), qG() - quick as.factor, and vector grouping object mtcars |> colorder(cyl, vs, am, pos = 'after') |> head(2)

## DAX SMI CAC FTSE

of class ’qG’: a factor-light without levels attribute ## mpg cyl vs am disp hp drat wt qsec gear carb
%rr%, %r+%, %r-%, %r*%, %r/% e.g. Z = X %r/% colSums(X) ## 2530.657 3376.224 2227.828 3565.643
## Mazda RX4 21 6 0 1 160 110 3.9 2.620 16.46 4 4
group() - (multivariate) group id (’qG’) in appearance order ## Mazda RX4 Wag 21 6 0 1 160 110 3.9 2.875 17.02 4 4
Standard (column-wise) math by reference (returns invisibly) fmean(EuStockMarkets, drop = FALSE) # Don't drop dimensions
i <- base::invisible # These are equivalent, the second option is faster:
## DAX SMI CAC FTSE groupid() - run-length-type group id (’qG’) mtcars |> fgroup_by(cyl, vs, am) |> fmutate(sum_mpg = fsum(mpg)) |> i()
%+=%, %-=%, %*=%, %/=% e.g. X %-=% rowSums(X) ## [1,] 2530.657 3376.224 2227.828 3565.643 mtcars |> fmutate(sum_mpg = fsum(mpg, list(cyl, vs, am), TRA = 1)) |> i()
fmean(airquality) # Data Frame (can also use drop = FALSE)
seqid() - group-id from integer-sequences (’qG’) # These are also equivalent (weighted means), again the second is faster
Same thing, also supports row-wise operations by reference mtcars |> fgroup_by(cyl) |> fmutate(across(disp:drat, fmean, wt)) |> i()
## Ozone Solar.R Wind Temp Month Day radixorder[v]() - (multivariate) radix-based ordering mtcars |> ftransformv(disp:drat, fmean, cyl, wt, 1, apply = FALSE) |> i()
setop(X, "/", rowSums(X)) ## 42.129310 185.931507 9.957516 77.882353 6.993464 15.803922
# ftransform()/fcompute() support list input and ignore attached groupings
setop(X, "/", colSums(X), rowwise = TRUE) fmean(iris[1:4], g = iris$Species) # Grouped finteraction() - fast factor interactions (or return ’qG’) mtcars %>% fgroup_by(cyl) %>% ftransform(fselect(., hp:qsec) %>%
## Sepal.Length Sepal.Width Petal.Length Petal.Width fmedian(TRA = 1) %>% fungroup() %>% fsum(TRA = "/")) |> i()
fdroplevels() - fast removal of unused factor levels # Again a faster equivalent: note the use of 'set' to avoid a deep copy
## setosa 5.006 3.428 1.462 0.246
## versicolor 5.936 2.770 4.260 1.326 mtcars %>% ftransform(fselect(., hp:qsec) %>% fmedian(cyl, TRA = 1) %>%
Transform Data by (Grouped) Replacing or f[n]unique() - fast unique values / rows (by columns) fsum(TRA = "/", set = TRUE)) %>% i()
## virginica 6.588 2.974 5.552 2.026
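The common syntax FUN(x, g, w, TRA, ...) can be checked against base R on the same iris data used above. The following is a hedged sketch: the unit weights w1 and the object names are assumptions made for the comparison, not part of the cheat sheet.

```r
library(collapse)

X <- iris[1:4]        # numeric columns
g <- iris$Species     # grouping factor

# Grouped column means: one row per species
m <- fmean(X, g)

# The same statistics computed with base R, for comparison
m_base <- sapply(X, function(col) tapply(col, g, mean))

# Unit weights (illustrative): the weighted mean then equals the plain mean
w1 <- rep(1, nrow(X))
mw <- fmean(X, g, w1)

# TRA = "-" subtracts each group's mean from its own rows (grouped centering)
centered <- fmean(X, g, TRA = "-")
head(centered, 3)
```

The base R version needs one tapply() call per column, whereas the fast function handles all columns and groups in a single call.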
# Aggregation: weighted standard deviations
Sweeping out Statistics (by Reference) X = iris[1:4]; g = iris$Species; w <- abs(rnorm(nrow(X))) gsplit() - fast splitting vector based on ’GRP’ objects mtcars |> fgroup_by(vs) |> fsummarise(across(disp:drat, fsd, w = wt))
fmean(X, g, w) # Grouped and weighted (random weights)
A generalisation of rowwise operations, that also ## Sepal.Length Sepal.Width Petal.Length Petal.Width greorder() - efficiently reorder y = unlist(gsplit(x, g)) ## vs disp hp drat
supports sweeping by groups e.g. aggregate statistics ## 1 0 101.80094 54.79388 0.4249447
## setosa 5.011663 3.467638 1.504067 0.2525002 such that identical(greorder(y, g), x) ## 2 1 56.30073 23.17952 0.4915196
## versicolor 5.930365 2.773558 4.238593 1.3136082
## virginica 6.588903 2.978017 5.552375 2.0221178 # Grouped linear models: .apply = FALSE applies functions to DF subset
TRA(x, STATS, FUN = "-", g = NULL, set = FALSE) collapse optimizes grouping using both factors / ’qG’ objects qTBL(mtcars) |> fgroup_by(vs) |> fsummarise(across(disp:drat,
## Transformations: here centering data on the weighted group median
setTRA(x, STATS, FUN = "-", g = NULL) TRA(X, fmedian(X, g, w), "-", g) |> head(3) and ’GRP’ objects. ’GRP’ objects contain most information function(x) list(models = list(lm(disp ~., x))), .apply = FALSE))

## Sepal.Length Sepal.Width Petal.Length Petal.Width

and are thus most efficient for complex computations. ## # A tibble: 2 x 2
x vector, matrix, or (grouped) data frame / list ## vs models
## 1 0.1 0.0 -0.1 0 X <- iris[1:4]; v <- as.character(iris$Species) ## <dbl> <list>
## 2 -0.1 -0.5 -0.1 0 f <- qF(v, na.exclude = FALSE) # Adds 'na.included' class: no NA checks ## 1 0 <lm>
STATS statistics matching (columns of) x (i.e. aggregated ## 3 -0.3 -0.3 -0.2 0 gv <- group(v) # 'qG' object: first appearance order, with 'na.included' ## 2 1 <lm>
vector, matrix or data frame / list) fmedian(X, g, w, TRA = "-") |> head(3) # Same thing: more compact microbenchmark(fmode(X, v), fmode(X, f), fmode(X, gv), fmode(X, g))
# Adding some columns. Use ftransform<- to also replace existing ones
## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Unit: microseconds add_vars(iris) <- num_vars(iris) |> fsum(TRA = '%') |> add_stub("perc_")
FUN integer/string indicating transformation to perform: ## 1 0.1 0.0 -0.1 0 ## expr min lq mean median uq max neval
## 2 -0.1 -0.5 -0.1 0 ## fmode(X, v) 11.890 12.9150 15.17697 13.3455 13.7350 162.073 100
Int. String Description
## 3 -0.3 -0.3 -0.2 0 ## fmode(X, f) 9.225 9.8195 11.33035 10.0860 10.4550 92.947 100
0 "replace_NA" replace missing values in x ## fmode(X, gv) 8.569 9.3480 10.73667 9.6555 10.1065 73.021 100
1 "replace_fill" replace data and missing values in x fmedian(X, g, w, "-", set = TRUE) # Modify in-place (same as setTRA()) ## fmode(X, g) 6.683 7.2980 7.71620 7.5440 7.7490 13.489 100 Multi-Type Aggregation
2 "replace" replace data but preserve missing values in x head(iris, 3) # Changed iris too, as X = iris[1:4] did a shallow copy
3 "-" subtract: x - STATS(g) ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Convenient interface to complex multi-type aggregations
4 "-+" x - STATS(g) + fmean(STATS, w = GRPN) ## 1 0.1 0.0 -0.1 0 setosa
5 "/" divide: x / STATS(g)
collap(data, by, FUN = fmean, catFUN = fmode,
6 "%" compute percentages: x * 100/STATS(g)
## 2 -0.1 -0.5 -0.1 0 setosa
## 3 -0.3 -0.3 -0.2 0 setosa
Quick Conversions cols = NULL, w = NULL, wFUN = fsum,
7 "+" add: x + STATS(g)
Fast and exact conversion of common data objects custom = NULL, keep.col.order = TRUE, ...)
8 "*" multiply: x * STATS(g)
9 "%%" modulus: x %% STATS(g) # Population weighted mean (PCGDP, LIFEEX) & mode (country), and sum(POP)
qM(), qDF(), qDT(), qTBL() - convert vectors, arrays,
10 "-%%" subtract modulus: x - x %% STATS(g) Basic Computing with R Functions data.frames or lists to matrix, data.frame, data.table or tibble
collap(wlddev, country + PCGDP + LIFEEX ~ income, w = ~ POP)
## country income PCGDP LIFEEX POP
g [optional] (list of) vectors / factors or GRP() object Apply R functions to rows or columns (by groups) ## 1 United States High income 31284.7366 75.69257 58840837058
m[r|c]tl() - matrix rows/cols to list, data.frame or data.table ## 2 Ethiopia Low income 557.1427 53.50608 20949161394
set TRUE transforms x by reference. setTRA is dapply(x, FUN, ..., MARGIN = 2) - column/row apply ## 3 India Lower middle income 1238.8280 60.58651 113837684528
qF(), as_numeric_factor(), as_character_factor() -
## 4 China Upper middle income 4145.6844 68.26984 119606023798
equivalent to invisible(TRA(..., set = TRUE)) BY(x, g, FUN, ...) - split-apply-combine computing convert to/from factors or all factors in a list / data.frame
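The relationship between TRA() and the TRA argument of the fast statistical functions can be sketched as follows. This is an illustrative example under the signatures shown above; the object names are assumptions, and the base R ave() calls are included only as a cross-check.

```r
library(collapse)

X <- iris[1:4]
g <- iris$Species

# Two-step version: compute group medians, then sweep them out by group
stats <- fmedian(X, g)
swept_two_step <- TRA(X, stats, "-", g)

# One-step version: let the statistical function transform the data itself
swept_one_step <- fmedian(X, g, TRA = "-")

# "replace" overwrites every value with its group statistic
group_means <- fmean(X$Sepal.Length, g, TRA = "replace")

# Base R cross-checks using ave()
base_centered <- X$Sepal.Length - ave(X$Sepal.Length, g, FUN = median)
base_means <- ave(X$Sepal.Length, g)
```

The one-step form avoids materialising the aggregated statistics as a separate object, which is why the cheat sheet calls it more compact.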
Page 1 of 2 CC-BY-SA Sebastian Krantz • Learn more at sebkrantz.github.io/collapse • Source code at github.com/SebKrantz/collapse • Updates announced at twitter.com/collapse_R - #rcollapse • Cheatsheet created for collapse version 1.8.8 • Updated: 2022-08
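For the collap() interface from the Multi-Type Aggregation block on page 1, a small sketch on mtcars (chosen here only because it ships with base R) shows the formula syntax and the effect of the w argument. The object names are illustrative assumptions; the base R lines exist purely to cross-check the results.

```r
library(collapse)

# Unweighted group means of mpg and hp by number of cylinders
agg <- collap(mtcars, mpg + hp ~ cyl, FUN = fmean)

# Weighted version: mpg is averaged using wt as weights, and the weight
# column itself is aggregated with wFUN (fsum by default)
agg_w <- collap(mtcars, mpg ~ cyl, w = ~ wt)

# Base R cross-checks
base_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)
base_wmpg <- sapply(split(mtcars, mtcars$cyl),
                    function(d) weighted.mean(d$mpg, d$wt))
base_wt <- tapply(mtcars$wt, mtcars$cyl, sum)

agg
agg_w
```

The weighted call mirrors the wlddev example on page 1, where the population column is summed while the other columns are population-weighted.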
Advanced Transformations Time Series and Panel Series G(wldi) |> head(2) # default: compute growth of num_vars(), keep ids Recode and Replace Values
Common transformations (in econometrics) Fast and flexible indexed series and data frames: a ## iso3c year G1.decade G1.PCGDP G1.LIFEEX G1.GINI G1.ODA G1.POP recode_num(), recode_char() - recode numeric / character
## 1 AFG 1960 NA NA NA NA NA NA
modern upgrade of plm’s ’pseries’ and ’pdata.frame’ ## 2 AFG 1961 0 NA 1.590335 NA 98.74969 1.916611
values (+ regex recoding) in matrix-like objects
Scaling, Centering and Averaging
##
fscale(x, g = NULL, w = NULL, na.rm = TRUE, ## Indexed by: iso3c [1] | year [2 (61)]
replace_[NA|Inf|outliers]() - replace special values
mean = 0, sd = 1, ...) Turn DF into an ’indexed_frame’ using id and/or time vars
data_ix = findex_by(data, id1, ..., time) settransform(wldi, PCGDP_growth = fgrowth(PCGDP)) pad() - add (missing) observations / rows i.e. expand objects
fwithin(x, g = NULL, w = NULL, na.rm = TRUE, lm(G(PCGDP) ~ L(G(LIFEEX), 0:2), wldi) |> summary() |> coef() |> round(3)
mean = 0, theta = 1, ...) data_ix$indexed_series - columns are ’indexed_series’
fbetween(x, g = NULL, w = NULL, na.rm = TRUE,
index_df = findex(data_ix) - retrieve ’index_df’: DF of ids
## (Intercept) 1.718 0.081 21.256 0.000
## L(G(LIFEEX), 0:2)-- 0.062 0.175 0.353 0.724
(Memory) Efficient Programming
fill = FALSE, ...) ## L(G(LIFEEX), 0:2)L1 0.368 0.220 1.672 0.095 Functions for (memory) efficient R programming
index_df = with(data_ix, findex(indexed_series)) - can ## L(G(LIFEEX), 0:2)L2 0.254 0.173 1.468 0.142
Higher-Dimensional Centering/Avg. and Linear Prediction any|all[v|NA], which[v|NA], %[=|!]=%, copyv, setv, alloc
fetch ’index_df’ from ’indexed_series’ in any caller environment
fhdwithin(x, fl, w = NULL, na.rm = TRUE, psacf(), pspacf(), psccf() - panel series ACF/PACF/CCF missing_cases, na_[insert|rm|omit], vlengths, vtypes,
fill = FALSE, lm.method = "qr", ...) data = unindex(data_ix) - unindex (also ’indexed_series’) psmat() - panel data to array conversion/reshaping vgcd, frange, fnlevels, fn[row|col], fdim, seq_[row|col]
fhdbetween() - same arguments as fhdwithin() reindex(data, index = index_df) - reindex / new pointers fsubset(wlddev, year %==% 2010) # 2x faster than fsubset(wlddev, year == 2010)
attach(mtcars) # Efficient sub-assignment by reference, various options...
Statistical Operators (function shorthands with extra features) ’indexed_series’ can be 1-or-2D atomic objects. Vectors / time Summary Statistics setv(am, 0, vs); setv(am, 1:10, vs); setv(am, 1:10, vs[10:20])
STD(), W(), B(), HDW(), HDB() series / matrices can also be indexed directly using:
qsu() - fast (grouped, weighted, panel-decomposed)
reindex(vec/mat, index = vec/index_df)
Examples summary statistics for cross-sectional and panel data
# Grouped scaling
is_irregular() - irregularity in any index[ed] obj. or time vec # Panel data statistics: overall, on group-means and group-centered data Small (Helper) Functions
qsu(iris, pid = Sepal.Length ~ Species, higher = TRUE)
iris |> fgroup_by(Species) |> fscale() |> head(2) Functions for (meta-)programming and attributes
## Species Sepal.Length Sepal.Width Petal.Length Petal.Width Example: Indexing Panel Data ## N/T Mean SD Min Max Skew Kurt
## 1 setosa 0.2666745 0.1899414 -0.3570112 -0.4364923
## Overall 150 5.8433 0.8281 4.3 7.9 0.3118 2.4264 .c, massign, %=%, vlabels[<-], setLabels, vclasses,
wldi <- wlddev |> findex_by(iso3c, year) # Balanced: 216 countries ## Between 3 5.8433 0.7951 5.006 6.588 -0.2112 1.5
## 2 setosa -0.3007180 -1.1290958 -0.3570112 -0.4364923 fsubset(wldi, 1:2, iso3c, year, PCGDP:POP) namlab, [add|rm]_stub, %!in%, ckmatch, all_identical,
## Within 50 5.8433 0.5113 4.1553 7.1553 0.1187 3.2633
STD(iris, ~ Species, stub = FALSE) |> invisible() # Same thing + faster ## iso3c year PCGDP LIFEEX GINI ODA POP all_obj_equal, all_funs, set[Dim|Row|Col]names,
# Grouped and weighted scaling. Operators support formulas and keep ids ## 1 AFG 1960 NA 32.446 NA 116769997 8996973 qtab() - faster table() function, incl. weights & custom funs unattrib, setAttrib, copyAttrib, copyMostAttrib
STD(mtcars, mpg + carb ~ cyl, w = ~ wt) |> head(2) ## 2 AFG 1961 NA 32.962 NA 232080002 9169410
## cyl wt STD.mpg STD.carb ## descr() - detailed statistical description of data.frame .c(var1, var2, var3) # Non-standard concatenation

## Mazda RX4 6 2.620 0.9691687 0.386125 ## Indexed by: iso3c [1] | year [2 (61)] ## [1] "var1" "var2" "var3"
## Mazda RX4 Wag 6 2.875 0.9691687 0.386125
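The scaling, centering and averaging functions relate to one another in a simple way that a short sketch can make concrete. The example below is an illustration under the signatures listed in this section; the variable names are assumptions, and ave() is used only as a base R reference.

```r
library(collapse)

x <- mtcars$mpg
g <- mtcars$cyl

# Grouped standardisation: (x - group mean) / group sd
z <- fscale(x, g)

# Within transformation: x minus its group mean
x_within <- fwithin(x, g)

# Between transformation: x replaced by its group mean
x_between <- fbetween(x, g)

# Base R references
z_base <- ave(x, g, FUN = function(v) (v - mean(v)) / sd(v))
between_base <- ave(x, g)

# Within and between components add back up to the original data
recombined <- x_within + x_between
```

The operators W() and B() shown in the examples are shorthands for fwithin() and fbetween(), which is what makes expressions such as mpg > B(mpg, cyl) possible inside a formula or subset call.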
varying() - check variation within groups (panel-ids) .c(values, vectors) %=% eigen(cov(mtcars)) # Multiple Assignment
# Index stats: [N. ids] | [N. periods (tot.N. periods: (max-min)/GCD)]
# Much shorter than fsubset(mpg > fmean(mpg, cyl, TRA = "replace")) LIFEEXi = wldi$LIFEEX # Indexed series pwcor(), pwcov(), pwnobs() - pairwise correlations, # Variable labels: vlabels[<-], [set]relabel() etc. namlab() shows summary
str(LIFEEXi, strict.width = "cut") namlab(wlddev[c(2, 9)], N = TRUE, Ndist = TRUE, class = TRUE)
mtcars |> fsubset(mpg > B(mpg, cyl)) |> head(2) covariance and obs. (with P-value and pretty printing)
## mpg cyl disp hp drat wt qsec vs am gear carb ## 'indexed_series' num [1:13176] 32.4 33 33.5 34 34.5 ... ## Variable Class N Ndist Label
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 ## - attr(*, "index_df")=Classes 'index_df', 'pindex' and 'data.frame'.. ## 1 iso3c factor 13176 216 Country Code
## ..$ iso3c: Factor w/ 216 levels "ABW","AFG","AGO",..: 2 2 2 2 2 2 .. ## 2 PCGDP numeric 9470 9470 GDP per capita (constant 2010 US$)
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
# Regression with cyl fixed effects - a la Mundlak (1978)
## ..$ year : Ord.factor w/ 61 levels "1960"<"1961"<..: 1 2 3 4 5 6 7.. List Processing
lm(mpg ~ carb + B(carb, cyl), data = mtcars) |> coef() LIFEEXi[1:7] # Subsetting indexed series Functions to process (nested) lists (of data objects)
## (Intercept) carb B(carb, cyl)
## 34.829652 -0.465511 -4.775032
## [1] 32.446 32.962 33.471 33.971 34.463 34.948 35.430
## ldepth() - level of nesting of list API Extensions
# Fast grouped (vs) bivariate regression slopes: mpg ~ carb
## Indexed by: iso3c [1] | year [7 (61)]
is_unlistable() - is list composed of atomic objects Shorthands for frequently used functions
mtcars |> fgroup_by(vs) |> fmutate(dm_carb = W(carb)) |> c(is_irregular(LIFEEXi), is_irregular(LIFEEXi[-5])) # Is irregular?
fsummarise(beta = fsum(mpg, dm_carb) %/=% fsum(dm_carb^2)) has_elem() - search if list contains certain elements fselect -> slt, fsubset -> sbt, fmutate -> mtt,
## [1] FALSE TRUE
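The panel-aware lag, difference and growth functions take a group id g and a time variable t. The sketch below uses a tiny made-up panel (the data frame d and its columns id, year and y are hypothetical) to show that the computations restart within each group.

```r
library(collapse)

# Hypothetical two-unit panel, three years each
d <- data.frame(
  id   = rep(c("a", "b"), each = 3),
  year = rep(2001:2003, times = 2),
  y    = c(100, 110, 121, 50, 55, 66)
)

# First lag within each id, ordered by year
lag1 <- flag(d$y, 1, g = d$id, t = d$year)     # NA 100 110 NA 50 55

# First difference within each id
dif1 <- fdiff(d$y, 1, g = d$id, t = d$year)    # NA 10 11 NA 5 11

# Growth rate in percent within each id
gr1 <- fgrowth(d$y, 1, g = d$id, t = d$year)   # NA 10 10 NA 10 20
```

The first observation of every unit is NA because there is no earlier period to compare with; no value leaks from unit "a" into unit "b".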
## vs beta
[f/set]transform[v] -> [set]tfm[v], fsummarise ->
## 1 0 -0.5557241 Note: ’indexed_series’ and frames are supported via existing get_elem() - pull out elements from list / subset list smr, across -> acr, fgroup_by -> gby, finteraction
## 2 1 -2.0706468 ’pseries’/’pdata.frame’ methods for time series/panel functions. atomic_elem[<-](), list_elem[<-]() - get list with atomic / -> itn, findex_by -> iby, findex -> ix, frename ->
# Residuals from regressing on 'Petal' vars and 'Species' FE sub-list elements, examining only first level of list rnm, get_vars -> gv, num_vars -> nv, add_vars -> av
fhdwithin(iris[1:2], iris[3:5]) |> head(2) Fast functions to perform time-based computations on
## Sepal.Length Sepal.Width reg_elem(), irreg_elem() - get full list tree leading to atomic
(irregular) time series and (unbalanced) panel data Namespace masking
## 1 0.14989286 0.1102684 (’regular’) or non-atomic (’irregular’) elements
## 2 -0.05010714 -0.3897316 Can set options(collapse_mask = c(...)) with a vector of
# Detrending with country-level cubic polynomials Lags/Leads, Differences, Growth Rates and Cumulative Sums rsplit() - efficient (recursive) splitting
functions starting with f-, to export versions without f-, masking
HDW(wlddev, PCGDP + LIFEEX + POP ~ iso3c * poly(year, 3)) |> head(2) flag(x, n = 1, g = NULL, t = NULL, fill = NA, ...) t_list() - efficient list transpose (transpose lists of lists) base R or dplyr. A few keywords exist to mask multiple
## HDW.PCGDP HDW.LIFEEX HDW.POP fdiff(x, n = 1, diff = 1, g = NULL, t = NULL,
## 43 -258.4069 0.2360285 -317459.1 rapply2d() - recursive apply to lists of data objects functions, see help("collapse-options"). This allows clean
fill = NA, log = FALSE, rho = 1, ...)
## 44 -119.5600 0.1136432 -33900.2 & fast code, but poses additional namespace challenges:
fgrowth(x, n = 1, diff = 1, g = NULL, t = NULL, fill unlist2d() - recursive row-binding to data.frame
# Note: HD centering/prediction and polynomials requires package 'fixest' # Masking all f- functions and specials n = GRPN and table = qtab
= NA, logdiff = FALSE, scale = 100, power = 1, ...) options(collapse_mask = "all")
fcumsum(x, g = NULL, o = NULL, na.rm = TRUE, Example: Nested Linear Models library(collapse)
fill = FALSE, check.o = TRUE, ...) (dl <- mtcars |> rsplit(mpg + hp + carb ~ vs + am)) |> str(max.level = 2) # The following is 100% collapse code, apart from the base pipe
Linear Models ## List of 2
wlddev |>
Statistical Operators: L(), F(), D(), Dlog(), G() ## $ 0:List of 2
## ..$ 0:'data.frame': 12 obs. of 3 variables: subset(year >= 1990) |>
Fast (barebones) linear model fitting with 6 different solvers group_by(year) |>
## ..$ 1:'data.frame': 6 obs. of 3 variables:
flm(y, X, w = NULL, add.icpt = FALSE, method = "lm") Example: Computing Growth Rates ## $ 1:List of 2 summarise(n = n(), across(PCGDP:GINI, mean, w = POP))
## ..$ 0:'data.frame': 7 obs. of 3 variables:
Fast R2 -based F-test of exclusion restrictions for lm’s (with FE) # Ad-hoc use: note that G() supports formulas which fgrowth() doesn't
## ..$ 1:'data.frame': 7 obs. of 3 variables: with(mtcars, table(cyl, vs, am))
fgrowth(AirPassengers) |> head()
fFtest(y, exc, X = NULL, w = NULL, full.df = TRUE) nest_lm <- dl |> rapply2d(lm, formula = mpg ~ .)
sum(mtcars)
## [1] NA 5.357143 11.864407 -2.272727 -6.201550 11.570248 diff(EuStockMarkets)
(nest_coef <- nest_lm |> rapply2d(summary, classes = "lm") |> droplevels(wlddev)
Both functions also have formula interfaces: G(wlddev, c(1, 10), by = PCGDP ~ iso3c, t = ~ year) |> ss(11:12) get_elem("coefficients")) |> str(give.attr = FALSE, strict = "cut") mean(nv(iris), g = iris$Species)
flm(cbind(mpg, disp) ~ hp + carb, weights = wt, mtcars) ## iso3c year G1.PCGDP L10G1.PCGDP ## List of 2 scale(nv(GGDC10S), g = GGDC10S$Variable)
## 1 AFG 1970 NA NA ## $ 0:List of 2 unique(GGDC10S, cols = c("Variable", "Country"))
## mpg disp
## 2 AFG 1971 NA NA ## ..$ 0: num [1:3, 1:4] 15.8791 0.0683 -4.5715 3.655 0.0345 ... range(wlddev$date)
## (Intercept) 28.48401839 42.155002
## hp -0.06834996 2.101036 wlddev |> fgroup_by(iso3c) |> fselect(iso3c, year, PCGDP, LIFEEX) |> ## ..$ 1: num [1:3, 1:4] 26.9556 -0.0319 -0.308 2.293 0.0149 ...
## carb 0.33207257 -38.183910 fmutate(PCGDP_growth = fgrowth(PCGDP, t = year)) |> head(2) ## $ 1:List of 2 wlddev |>
## iso3c year PCGDP LIFEEX PCGDP_growth ## ..$ 0: num [1:3, 1:4] 30.896903 -0.099403 -0.000332 3.346033 0.035.. index_by(iso3c, year) |>
# Test the exclusion of cyl-dummies and hp.
## 1 AFG 1960 NA 32.446 NA ## ..$ 1: num [1:3, 1:4] 37.0012 -0.1155 0.4762 7.3316 0.0894 ... mutate(PCGDP_lag = lag(PCGDP),
fFtest(mpg ~ qF(cyl) + hp | carb + qF(am), weights = wt, mtcars)
## 2 AFG 1961 NA 32.962 NA nest_coef |> unlist2d(c("vs", "am"), row.names = "variable") |> head(2) PCGDP_diff = PCGDP - PCGDP_lag,
## R-Sq. DF1 DF2 F-Stat. P-Value PCGDP_growth = growth(PCGDP)) |> unindex()
## Full Model 0.812 5 26 22.479 0.000 settransform(wlddev, PCGDP_growth = G(PCGDP, g = iso3c, t = year)) ## vs am variable Estimate Std. Error t value Pr(>|t|)
## Restricted Model 0.674 2 29 30.041 0.000 # Note: can omit t -> requires consecutive observations and groups ## 1 0 0 (Intercept) 15.87914500 3.65495315 4.344555 0.001865018 The best way to set this option is inside an .Rprofile file
## Exclusion Rest. 0.138 3 26 6.351 0.002 # Usage with indexed series / frames: ## 2 0 0 hp 0.06832467 0.03449076 1.980956 0.078938069
placed in the user or project directory. Use it carefully.
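The matrix interface of flm() from the Linear Models block can be compared with lm() on the same regression. This is a minimal sketch: the choice of regressors and the object names are assumptions for illustration, and only arguments shown in the signature above are used.

```r
library(collapse)

# Response vector and regressor matrix (qM converts a data frame to a matrix)
y <- mtcars$mpg
X <- qM(mtcars[c("hp", "carb")])

# Barebones fit; add.icpt = TRUE adds an intercept column to X
b_flm <- flm(y, X, add.icpt = TRUE)

# Reference fit with base R
b_lm <- coef(lm(mpg ~ hp + carb, data = mtcars))

b_flm
b_lm
```

flm() returns only the coefficients, so it is suited to cases where many small regressions are run and the full lm object is not needed.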
