Add references to exactextractr #813

Lvulis · 2022-06-25T15:05:02Z

The exactextractr package enables super fast zonal extraction of raster values from polygons. It also reports the coverage fraction of each cell by the input polygon which some could find useful. It works with sf polygons and terra/raster raster data. From the benchmarking reported on the package GitHub, it seems that the improvement with exactextractr::exact_extract over terra::extract is not as significant as the improvement over raster::extract, but could still be useful if someone is working with very large datasets.

Maybe a reference to the package could be added in Chapter 6.

The text was updated successfully, but these errors were encountered:

Robinlovelace · 2022-06-25T16:09:47Z

👍 to that, it's well-maintained, fast and clearly useful. You up for taking a look at this @Lvulis? It's more on Jakub's area, what do you reckon? Thanks for the idea!

kadyb · 2022-06-26T12:26:18Z

It is also worth mentioning that {exactextractr} is more precise than {raster}, ~~{terra}~~ or {stars}. See: 1, 2

Nowosad · 2022-06-26T13:06:59Z

@kadyb including when terra::extract() has exact = TRUE?

kadyb · 2022-06-26T14:07:09Z

There are some differences in results, but are they significant?

library("sf")
library("terra")
library("exactextractr")

f = system.file("ex/elev.tif", package = "terra")
r = rast(f)

geom = spatSample(r, size = 3, as.points = TRUE, na.rm = TRUE)
geom = buffer(geom, width = 2000)

extract(r, geom, fun = "mean", exact = FALSE)
#>   ID elevation
#> 1  1  379.6957
#> 2  2  451.9130
#> 3  3  457.9565

extract(r, geom, fun = "mean", exact = TRUE)
#>      ID elevation
#> [1,]  1  380.0747
#> [2,]  2  452.3375
#> [3,]  3  460.0331

exact_extract(r, st_as_sf(geom), fun = "mean")
#> [1] 380.0745 452.3372 460.0329

Lvulis · 2022-06-26T14:43:29Z

Those are O(10^-6) differences and should be negligible.
I can take a run at writing something integrated with the text and submit it here? Don't have the book source code downloaded currently.

kadyb · 2022-06-28T12:55:38Z

The timings for precise aggregation are also interesting. It seems that the {terra} algorithm is much less efficient.

library("sf")
library("terra")
library("exactextractr") # devel version from GitHub

n = 1000
# in-memory raster
r = rast(nrows = n, ncols = n, vals = rnorm(n^2),
         xmin = 0, xmax = n,  ymin = 0, ymax = n)

s = 1 # numer of points
geom = matrix(c(sample(n, size = s), sample(n, size = s)), ncol = 2)
geom = st_as_sf(as.data.frame(geom), coords = c(1, 2))
geom = st_buffer(geom, dist = 300)
geom2 = vect(geom)

system.time(x <- exact_extract(r, geom, fun = "mean", progress = FALSE))
#> x == 0.00179644
#>  user  system elapsed
#> 0.041   0.008   0.048

system.time(y1 <- extract(r, geom2, fun = "mean", exact = FALSE))
#> y1 == 0.001754078
#>  user  system elapsed
#> 0.971   0.000   0.972

system.time(y2 <- extract(r, geom2, fun = "mean", exact = TRUE))
#> y2 == 0.00179644
#>   user  system elapsed
#> 21.520   0.175  21.697

Nowosad · 2022-06-28T13:26:51Z

@kadyb and what are the timings without aggregation?

kadyb · 2022-06-28T14:00:13Z

The aggregate statistic has a marginal impact (at least in this case):

## `geom` and `geom2` consist of 1 geometry

system.time(x <- exact_extract(r, geom, progress = FALSE))
#>  user  system elapsed
#> 0.042   0.001   0.042

system.time(y1 <- extract(r, geom2, exact = FALSE))
#>  user  system elapsed
#> 0.614   0.001   0.615

system.time(y2 <- extract(r, geom2, exact = TRUE))
#>   user  system elapsed
#> 20.979   0.451  21.430

Lvulis · 2022-06-28T16:05:25Z

@kadyb Your comment motivated me to double check the benchmarking example that's reported on the package GitHub as the discrepancy didn't seem so large. I just noticed that in the example, the close to similar performance between {terra} and {exactextractr} is with terra::extract(exact=F). Running a subset of the example with exact=T yields a ~300x runtime improvement, similar to the 500x improvement you reported in that synthetic data. It seems the package does give a very significant performance boost, so well worth including.

library(raster)
library(sf)
library(exactextractr)
library(microbenchmark)

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))
brazil_spat <- as(brazil, 'SpatVector')

prec_rast <- getData('worldclim', var='prec', res=10)
prec_terra <- terra::rast(prec_rast) 
microbenchmark(
  terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T),
  exact_extract(prec_terra, brazil, 'mean', progress = FALSE),
  times = 10)

Unit: milliseconds
                                                                   expr
 terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T)
            exact_extract(prec_terra, brazil, "mean", progress = FALSE)
       min         lq       mean     median         uq         max neval
 75268.602 75995.8173 90945.6605 78579.3624 81447.6076 203122.7589    10
   212.249   234.2676   263.9664   245.0914   259.0936    455.1932    10

kadyb · 2022-06-28T18:48:17Z

Did you test the stable version from CRAN or development from GitHub? Recently there have been some fixes (isciences/exactextractr@5b88ecf) that make {exactextractr} even faster for SpatRaster object.

terra::extract(exact = FALSE) processing time is just similar to exactextractr::exact_extract() but the biggest differences is for terra::extract(exact = TRUE). Here is one more benchmark from me on satellite data: https://github.com/kadyb/raster-benchmark but extract(exact = FALSE) only.

Lvulis · 2022-06-28T20:11:59Z

The previous benchmark was using the stable version from CRAN. With the development version, for the same example above I have almost the same result. it seems to be a very minor improvement, there was one iteration that was very slow in the previous example and dragged up the mean, but otherwise similar speed.

> microbenchmark(
+   terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T),
+   exact_extract(prec_terra, brazil, 'mean', progress = FALSE),
+   times = 10)
Unit: milliseconds
                                                                   expr
 terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T)
            exact_extract(prec_terra, brazil, "mean", progress = FALSE)
        min         lq      mean     median         uq
 77587.2348 77971.3209 78466.483 78320.0923 79112.2975
   207.5322   231.2555   246.865   243.4413   249.6305
        max neval
 79635.6012    10
   340.1172    10

kadyb · 2022-06-28T20:31:31Z

You can check it on larger data set, because these differences may not be noticeable on small dataset. For this example I got speedup from 17.857 s to 5.572 s on my PC.

Lvulis · 2022-06-28T21:01:13Z

EDIT - I compared the two stable release of {exactextractr} (v0.8.2) versus the GH version (0.9) and did not really find improvement. Also fixed variable names below. See old comment that I did compare against {terra} and found {exactextractr} to be faster.

OLD:
Good idea. I did not do the comparison with different {exactextractr} versions but interestingly I found {exactextractr} to be significantly faster than {terra} with this large dataset, even with exact=F in the extract call. I am a little hesitant to run the extract=T and do replicate benchmarks for how long it'll take.

download.file("https://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_2.5m_prec.zip",
              "prec_2.5m.zip")
unzip("prec_2.5m.zip", exdir = "prec_2.5m")
f = list.files("prec_2.5m", pattern = "\\.tif$", full.names = TRUE)

prec = rast(f)
prec = prec * 1

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))
brazil_spat <- as(brazil, 'SpatVector')

microbenchmark(
  terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
  exact_extract(prec, brazil, 'mean', progress = FALSE),
  times = 10)

Unit: seconds
                                                         expr
 terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F)
        exact_extract(prec, brazil, "mean", progress = FALSE)
      min       lq     mean   median       uq       max
 9.044934 9.168709 9.561833 9.446062 9.607415 11.124775
 2.037946 2.045342 2.148452 2.115836 2.224045  2.365771
 neval
    10
    10
# exact_extract from GH development version 0.9.0


> microbenchmark(
+   terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
+   exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 10)
Error in x$.self$finalize() : attempt to apply non-function
Unit: seconds
                                                             expr
 terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F)
            exact_extract(prec, brazil, "mean", progress = FALSE)
      min       lq     mean   median       uq      max
 9.288340 9.578843 9.615942 9.630635 9.733694 9.810699
 2.072167 2.090151 2.375785 2.168442 2.418485 3.256852
 neval
    10
    10
# exact_extract from stable version 0.8.2

kadyb · 2022-06-28T22:54:21Z

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))

You used only 27 polygons. Try level=3 to get over 10,000. I see:

microbenchmark(exact_extract(prec, brazil, 'mean', progress = FALSE), times = 5)

## devel
#>       min       lq     mean   median       uq      max neval
## (n = 27)
#>  1.654517 1.671902 1.886303 1.873598 2.081351 2.176676     5
## (n = 10000)
#>  5.639051 5.694109 5.824986 5.827941 5.940191 6.023639     5

## 0.8.2
#>       min       lq     mean   median       uq      max neval
## (n = 27)
#>  1.986231 2.184738 2.189063 2.198255 2.218664 2.458221     5
## (n = 10000)
#>  17.26146 17.57488 17.77355 17.60151 18.20275 18.22718     5

Lvulis · 2022-06-29T15:53:47Z

Thanks for the suggestion. I did the following comparison of the devel and stable versions with prec loaded at 2.5 m as in the previous examples. Dropped the extract() call in the devel benchmark as it's 10x slower so takes a long time for the function to execute. By the way, in your example, the faster runtime is the level 1 polygons (N = 27) and the slower is the level 3 polygons (N = ~10k), correct?

The development version is evidently faster. It seems the scaling with N polygons grows faster in the stable version than the devel, one could of course test it farther to figure out exactly what the difference is, but it is clearly an improvement. Great work pinpointing this issue.

brazil <- st_as_sf(getData('GADM', country='BRA', level=3))
brazil_spat <- as(brazil, 'SpatVector')

## devel 0.9.0
> microbenchmark(exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 5)
Unit: seconds
      min       lq     mean   median       uq     max neval
 4.166733 4.190159 4.330995 4.226648 4.364043 4.70739     5

## 0.8.2 (stable)
> microbenchmark(
+   terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
+   exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 5)
Unit: seconds
       min        lq      mean    median        uq
 91.929058 93.775886 97.568340 95.035739 101.92225
  9.184623  9.335977  9.937854  9.564593  10.15296
       max neval
 105.17877     5
  11.45112     5

Robinlovelace · 2022-06-29T18:43:03Z

We should probably merge #821 before deeming this issue fully closed. Many thanks @Lvulis for the contribution and pls review/merge when you get a chance @Nowosad .

Lvulis · 2022-06-29T18:44:47Z

Fair point! Sorry about that.

Robinlovelace assigned Nowosad Jun 25, 2022

Lvulis mentioned this issue Jun 28, 2022

Include reference to exactextractr #821

Merged

Lvulis closed this as completed Jun 29, 2022

Lvulis reopened this Jun 29, 2022

Nowosad closed this as completed Jul 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add references to exactextractr #813

Add references to exactextractr #813

Lvulis commented Jun 25, 2022

Robinlovelace commented Jun 25, 2022

kadyb commented Jun 26, 2022 •

edited

Loading

Nowosad commented Jun 26, 2022 •

edited

Loading

kadyb commented Jun 26, 2022

Lvulis commented Jun 26, 2022 •

edited

Loading

kadyb commented Jun 28, 2022

Nowosad commented Jun 28, 2022

kadyb commented Jun 28, 2022

Lvulis commented Jun 28, 2022 •

edited

Loading

kadyb commented Jun 28, 2022

Lvulis commented Jun 28, 2022

kadyb commented Jun 28, 2022

Lvulis commented Jun 28, 2022 •

edited

Loading

kadyb commented Jun 28, 2022 •

edited

Loading

Lvulis commented Jun 29, 2022 •

edited

Loading

Robinlovelace commented Jun 29, 2022

Lvulis commented Jun 29, 2022

Add references to exactextractr #813

Add references to exactextractr #813

Comments

Lvulis commented Jun 25, 2022

Robinlovelace commented Jun 25, 2022

kadyb commented Jun 26, 2022 • edited Loading

Nowosad commented Jun 26, 2022 • edited Loading

kadyb commented Jun 26, 2022

Lvulis commented Jun 26, 2022 • edited Loading

kadyb commented Jun 28, 2022

Nowosad commented Jun 28, 2022

kadyb commented Jun 28, 2022

Lvulis commented Jun 28, 2022 • edited Loading

kadyb commented Jun 28, 2022

Lvulis commented Jun 28, 2022

kadyb commented Jun 28, 2022

Lvulis commented Jun 28, 2022 • edited Loading

kadyb commented Jun 28, 2022 • edited Loading

Lvulis commented Jun 29, 2022 • edited Loading

Robinlovelace commented Jun 29, 2022

Lvulis commented Jun 29, 2022

kadyb commented Jun 26, 2022 •

edited

Loading

Nowosad commented Jun 26, 2022 •

edited

Loading

Lvulis commented Jun 26, 2022 •

edited

Loading

Lvulis commented Jun 28, 2022 •

edited

Loading

Lvulis commented Jun 28, 2022 •

edited

Loading

kadyb commented Jun 28, 2022 •

edited

Loading

Lvulis commented Jun 29, 2022 •

edited

Loading