Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add references to exactextractr #813

Closed
Lvulis opened this issue Jun 25, 2022 · 17 comments
Closed

Add references to exactextractr #813

Lvulis opened this issue Jun 25, 2022 · 17 comments
Assignees

Comments

@Lvulis
Copy link
Contributor

Lvulis commented Jun 25, 2022

The exactextractr package enables super fast zonal extraction of raster values from polygons. It also reports the coverage fraction of each cell by the input polygon which some could find useful. It works with sf polygons and terra/raster raster data. From the benchmarking reported on the package GitHub, it seems that the improvement with exactextractr::exact_extract over terra::extract is not as significant as the improvement over raster::extract, but could still be useful if someone is working with very large datasets.

Maybe a reference to the package could be added in Chapter 6.

@Robinlovelace
Copy link
Collaborator

👍 to that, it's well-maintained, fast and clearly useful. You up for taking a look at this @Lvulis? It's more on Jakub's area, what do you reckon? Thanks for the idea!

@kadyb
Copy link

kadyb commented Jun 26, 2022

It is also worth mentioning that {exactextractr} is more precise than {raster}, {terra} or {stars}. See: 1, 2

@Nowosad
Copy link
Member

Nowosad commented Jun 26, 2022

@kadyb including when terra::extract() has exact = TRUE?

@kadyb
Copy link

kadyb commented Jun 26, 2022

There are some differences in results, but are they significant?

library("sf")
library("terra")
library("exactextractr")

f = system.file("ex/elev.tif", package = "terra")
r = rast(f)

geom = spatSample(r, size = 3, as.points = TRUE, na.rm = TRUE)
geom = buffer(geom, width = 2000)

extract(r, geom, fun = "mean", exact = FALSE)
#>   ID elevation
#> 1  1  379.6957
#> 2  2  451.9130
#> 3  3  457.9565

extract(r, geom, fun = "mean", exact = TRUE)
#>      ID elevation
#> [1,]  1  380.0747
#> [2,]  2  452.3375
#> [3,]  3  460.0331

exact_extract(r, st_as_sf(geom), fun = "mean")
#> [1] 380.0745 452.3372 460.0329

@Lvulis
Copy link
Contributor Author

Lvulis commented Jun 26, 2022

Those are O(10^-6) differences and should be negligible.
I can take a run at writing something integrated with the text and submit it here? Don't have the book source code downloaded currently.

@kadyb
Copy link

kadyb commented Jun 28, 2022

The timings for precise aggregation are also interesting. It seems that the {terra} algorithm is much less efficient.

library("sf")
library("terra")
library("exactextractr") # devel version from GitHub

n = 1000
# in-memory raster
r = rast(nrows = n, ncols = n, vals = rnorm(n^2),
         xmin = 0, xmax = n,  ymin = 0, ymax = n)

s = 1 # numer of points
geom = matrix(c(sample(n, size = s), sample(n, size = s)), ncol = 2)
geom = st_as_sf(as.data.frame(geom), coords = c(1, 2))
geom = st_buffer(geom, dist = 300)
geom2 = vect(geom)

system.time(x <- exact_extract(r, geom, fun = "mean", progress = FALSE))
#> x == 0.00179644
#>  user  system elapsed
#> 0.041   0.008   0.048

system.time(y1 <- extract(r, geom2, fun = "mean", exact = FALSE))
#> y1 == 0.001754078
#>  user  system elapsed
#> 0.971   0.000   0.972

system.time(y2 <- extract(r, geom2, fun = "mean", exact = TRUE))
#> y2 == 0.00179644
#>   user  system elapsed
#> 21.520   0.175  21.697

@Nowosad
Copy link
Member

Nowosad commented Jun 28, 2022

@kadyb and what are the timings without aggregation?

@kadyb
Copy link

kadyb commented Jun 28, 2022

The aggregate statistic has a marginal impact (at least in this case):

## `geom` and `geom2` consist of 1 geometry

system.time(x <- exact_extract(r, geom, progress = FALSE))
#>  user  system elapsed
#> 0.042   0.001   0.042

system.time(y1 <- extract(r, geom2, exact = FALSE))
#>  user  system elapsed
#> 0.614   0.001   0.615

system.time(y2 <- extract(r, geom2, exact = TRUE))
#>   user  system elapsed
#> 20.979   0.451  21.430

@Lvulis
Copy link
Contributor Author

Lvulis commented Jun 28, 2022

@kadyb Your comment motivated me to double check the benchmarking example that's reported on the package GitHub as the discrepancy didn't seem so large. I just noticed that in the example, the close to similar performance between {terra} and {exactextractr} is with terra::extract(exact=F). Running a subset of the example with exact=T yields a ~300x runtime improvement, similar to the 500x improvement you reported in that synthetic data. It seems the package does give a very significant performance boost, so well worth including.

library(raster)
library(sf)
library(exactextractr)
library(microbenchmark)

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))
brazil_spat <- as(brazil, 'SpatVector')

prec_rast <- getData('worldclim', var='prec', res=10)
prec_terra <- terra::rast(prec_rast) 
microbenchmark(
  terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T),
  exact_extract(prec_terra, brazil, 'mean', progress = FALSE),
  times = 10)

Unit: milliseconds
                                                                   expr
 terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T)
            exact_extract(prec_terra, brazil, "mean", progress = FALSE)
       min         lq       mean     median         uq         max neval
 75268.602 75995.8173 90945.6605 78579.3624 81447.6076 203122.7589    10
   212.249   234.2676   263.9664   245.0914   259.0936    455.1932    10

@kadyb
Copy link

kadyb commented Jun 28, 2022

Did you test the stable version from CRAN or development from GitHub? Recently there have been some fixes (isciences/exactextractr@5b88ecf) that make {exactextractr} even faster for SpatRaster object.

terra::extract(exact = FALSE) processing time is just similar to exactextractr::exact_extract() but the biggest differences is for terra::extract(exact = TRUE). Here is one more benchmark from me on satellite data: https://github.com/kadyb/raster-benchmark but extract(exact = FALSE) only.

@Lvulis
Copy link
Contributor Author

Lvulis commented Jun 28, 2022

The previous benchmark was using the stable version from CRAN. With the development version, for the same example above I have almost the same result. it seems to be a very minor improvement, there was one iteration that was very slow in the previous example and dragged up the mean, but otherwise similar speed.

> microbenchmark(
+   terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T),
+   exact_extract(prec_terra, brazil, 'mean', progress = FALSE),
+   times = 10)
Unit: milliseconds
                                                                   expr
 terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T)
            exact_extract(prec_terra, brazil, "mean", progress = FALSE)
        min         lq      mean     median         uq
 77587.2348 77971.3209 78466.483 78320.0923 79112.2975
   207.5322   231.2555   246.865   243.4413   249.6305
        max neval
 79635.6012    10
   340.1172    10

@kadyb
Copy link

kadyb commented Jun 28, 2022

You can check it on larger data set, because these differences may not be noticeable on small dataset. For this example I got speedup from 17.857 s to 5.572 s on my PC.

@Lvulis
Copy link
Contributor Author

Lvulis commented Jun 28, 2022

EDIT - I compared the two stable release of {exactextractr} (v0.8.2) versus the GH version (0.9) and did not really find improvement. Also fixed variable names below. See old comment that I did compare against {terra} and found {exactextractr} to be faster.

OLD:
Good idea. I did not do the comparison with different {exactextractr} versions but interestingly I found {exactextractr} to be significantly faster than {terra} with this large dataset, even with exact=F in the extract call. I am a little hesitant to run the extract=T and do replicate benchmarks for how long it'll take.

download.file("https://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_2.5m_prec.zip",
              "prec_2.5m.zip")
unzip("prec_2.5m.zip", exdir = "prec_2.5m")
f = list.files("prec_2.5m", pattern = "\\.tif$", full.names = TRUE)

prec = rast(f)
prec = prec * 1

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))
brazil_spat <- as(brazil, 'SpatVector')

microbenchmark(
  terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
  exact_extract(prec, brazil, 'mean', progress = FALSE),
  times = 10)

Unit: seconds
                                                         expr
 terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F)
        exact_extract(prec, brazil, "mean", progress = FALSE)
      min       lq     mean   median       uq       max
 9.044934 9.168709 9.561833 9.446062 9.607415 11.124775
 2.037946 2.045342 2.148452 2.115836 2.224045  2.365771
 neval
    10
    10
# exact_extract from GH development version 0.9.0


> microbenchmark(
+   terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
+   exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 10)
Error in x$.self$finalize() : attempt to apply non-function
Unit: seconds
                                                             expr
 terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F)
            exact_extract(prec, brazil, "mean", progress = FALSE)
      min       lq     mean   median       uq      max
 9.288340 9.578843 9.615942 9.630635 9.733694 9.810699
 2.072167 2.090151 2.375785 2.168442 2.418485 3.256852
 neval
    10
    10
# exact_extract from stable version 0.8.2

@kadyb
Copy link

kadyb commented Jun 28, 2022

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))

You used only 27 polygons. Try level=3 to get over 10,000. I see:

microbenchmark(exact_extract(prec, brazil, 'mean', progress = FALSE), times = 5)

## devel
#>       min       lq     mean   median       uq      max neval
## (n = 27)
#>  1.654517 1.671902 1.886303 1.873598 2.081351 2.176676     5
## (n = 10000)
#>  5.639051 5.694109 5.824986 5.827941 5.940191 6.023639     5

## 0.8.2
#>       min       lq     mean   median       uq      max neval
## (n = 27)
#>  1.986231 2.184738 2.189063 2.198255 2.218664 2.458221     5
## (n = 10000)
#>  17.26146 17.57488 17.77355 17.60151 18.20275 18.22718     5

@Lvulis
Copy link
Contributor Author

Lvulis commented Jun 29, 2022

Thanks for the suggestion. I did the following comparison of the devel and stable versions with prec loaded at 2.5 m as in the previous examples. Dropped the extract() call in the devel benchmark as it's 10x slower so takes a long time for the function to execute. By the way, in your example, the faster runtime is the level 1 polygons (N = 27) and the slower is the level 3 polygons (N = ~10k), correct?

The development version is evidently faster. It seems the scaling with N polygons grows faster in the stable version than the devel, one could of course test it farther to figure out exactly what the difference is, but it is clearly an improvement. Great work pinpointing this issue.

brazil <- st_as_sf(getData('GADM', country='BRA', level=3))
brazil_spat <- as(brazil, 'SpatVector')

## devel 0.9.0
> microbenchmark(exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 5)
Unit: seconds
      min       lq     mean   median       uq     max neval
 4.166733 4.190159 4.330995 4.226648 4.364043 4.70739     5

## 0.8.2 (stable)
> microbenchmark(
+   terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
+   exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 5)
Unit: seconds
       min        lq      mean    median        uq
 91.929058 93.775886 97.568340 95.035739 101.92225
  9.184623  9.335977  9.937854  9.564593  10.15296
       max neval
 105.17877     5
  11.45112     5

@Lvulis Lvulis closed this as completed Jun 29, 2022
@Robinlovelace
Copy link
Collaborator

We should probably merge #821 before deeming this issue fully closed. Many thanks @Lvulis for the contribution and pls review/merge when you get a chance @Nowosad .

@Lvulis Lvulis reopened this Jun 29, 2022
@Lvulis
Copy link
Contributor Author

Lvulis commented Jun 29, 2022

Fair point! Sorry about that.

@Nowosad Nowosad closed this as completed Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants