Add references to exactextractr #813
👍 to that, it's well-maintained, fast and clearly useful. You up for taking a look at this @Lvulis? It's more on Jakub's area, what do you reckon? Thanks for the idea!

---
@kadyb including when

---
There are some differences in results, but are they significant?

```r
library("sf")
library("terra")
library("exactextractr")

f = system.file("ex/elev.tif", package = "terra")
r = rast(f)
geom = spatSample(r, size = 3, as.points = TRUE, na.rm = TRUE)
geom = buffer(geom, width = 2000)

extract(r, geom, fun = "mean", exact = FALSE)
#>   ID elevation
#> 1  1  379.6957
#> 2  2  451.9130
#> 3  3  457.9565
extract(r, geom, fun = "mean", exact = TRUE)
#>      ID elevation
#> [1,]  1  380.0747
#> [2,]  2  452.3375
#> [3,]  3  460.0331
exact_extract(r, st_as_sf(geom), fun = "mean")
#> [1] 380.0745 452.3372 460.0329
```

---
Those are O(10^-6) differences and should be negligible.
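The remaining tiny differences come down to how partially covered boundary cells are weighted. As a hedged sketch (assuming the documented default of `exact_extract()` with `fun = NULL`, which returns each cell's value together with its coverage fraction), the `"mean"` statistic can be reproduced manually:

```r
library("sf")
library("terra")
library("exactextractr")

# same setup as the snippet above, with a single polygon
r = rast(system.file("ex/elev.tif", package = "terra"))
geom = spatSample(r, size = 1, as.points = TRUE, na.rm = TRUE)
geom = buffer(geom, width = 2000)

# with fun = NULL, exact_extract() returns one data frame per feature,
# holding each intersected cell's value and the fraction of it covered
cells = exact_extract(r, st_as_sf(geom))[[1]]

# the "mean" statistic is the coverage-fraction-weighted mean of cell values
weighted.mean(cells$value, cells$coverage_fraction, na.rm = TRUE)
```

Methods that treat boundary cells as all-in or all-out (such as `exact = FALSE`) will therefore drift slightly from this weighted result.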
The timings for precise aggregation are also interesting. It seems that `exact = TRUE` makes `terra::extract()` much slower:

```r
library("sf")
library("terra")
library("exactextractr") # devel version from GitHub

n = 1000
# in-memory raster
r = rast(nrows = n, ncols = n, vals = rnorm(n^2),
         xmin = 0, xmax = n, ymin = 0, ymax = n)
s = 1 # number of points
geom = matrix(c(sample(n, size = s), sample(n, size = s)), ncol = 2)
geom = st_as_sf(as.data.frame(geom), coords = c(1, 2))
geom = st_buffer(geom, dist = 300)
geom2 = vect(geom)

system.time(x <- exact_extract(r, geom, fun = "mean", progress = FALSE))
#> x == 0.00179644
#>   user  system elapsed
#>  0.041   0.008   0.048
system.time(y1 <- extract(r, geom2, fun = "mean", exact = FALSE))
#> y1 == 0.001754078
#>   user  system elapsed
#>  0.971   0.000   0.972
system.time(y2 <- extract(r, geom2, fun = "mean", exact = TRUE))
#> y2 == 0.00179644
#>   user  system elapsed
#> 21.520   0.175  21.697
```

---
@kadyb and what are the timings without aggregation?

---
The aggregate statistic has a marginal impact (at least in this case):

```r
## `geom` and `geom2` consist of 1 geometry
system.time(x <- exact_extract(r, geom, progress = FALSE))
#>   user  system elapsed
#>  0.042   0.001   0.042
system.time(y1 <- extract(r, geom2, exact = FALSE))
#>   user  system elapsed
#>  0.614   0.001   0.615
system.time(y2 <- extract(r, geom2, exact = TRUE))
#>   user  system elapsed
#> 20.979   0.451  21.430
```

---
@kadyb Your comment motivated me to double-check the benchmarking example reported on the package GitHub, as the discrepancy there didn't seem so large. I just noticed that in that example the close-to-similar performance between `terra::extract()` and `exact_extract()` no longer holds once `exact = TRUE` is set:

```r
library(raster)
library(sf)
library(exactextractr)
library(microbenchmark)

brazil <- st_as_sf(getData('GADM', country='BRA', level=1))
brazil_spat <- as(brazil, 'SpatVector')
prec_rast <- getData('worldclim', var='prec', res=10)
prec_terra <- terra::rast(prec_rast)

microbenchmark(
  terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T),
  exact_extract(prec_terra, brazil, 'mean', progress = FALSE),
  times = 10)

Unit: milliseconds
                                                                    expr
 terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T)
            exact_extract(prec_terra, brazil, "mean", progress = FALSE)
        min         lq       mean     median         uq         max neval
  75268.602 75995.8173 90945.6605 78579.3624 81447.6076 203122.7589    10
    212.249   234.2676   263.9664   245.0914   259.0936    455.1932    10
```

---
Did you test the stable version from CRAN or the development version from GitHub? Recently there have been some fixes (isciences/exactextractr@5b88ecf) that make it noticeably faster.
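When rerunning a benchmark like this, it is worth confirming which version is actually loaded first; a quick sketch (the GitHub repository path is taken from the commit reference above):

```r
# check the installed exactextractr version
packageVersion("exactextractr")

# install the development version from GitHub (requires the remotes package)
# remotes::install_github("isciences/exactextractr")
```

Restarting the R session after installing ensures the newly installed build is the one being timed.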
The previous benchmark used the stable version from CRAN. With the development version I get almost the same result for the same example above. It seems to be a very minor improvement: one iteration in the previous run was very slow and dragged up the mean, but otherwise the speed is similar.

```r
> microbenchmark(
+   terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T),
+   exact_extract(prec_terra, brazil, 'mean', progress = FALSE),
+   times = 10)

Unit: milliseconds
                                                                    expr
 terra::extract(prec_terra, brazil_spat, mean, na.rm = TRUE, exact = T)
            exact_extract(prec_terra, brazil, "mean", progress = FALSE)
        min         lq      mean     median         uq        max neval
 77587.2348 77971.3209 78466.483 78320.0923 79112.2975 79635.6012    10
   207.5322   231.2555   246.865   243.4413   249.6305   340.1172    10
```

---
You can check it on a larger dataset, because these differences may not be noticeable on a small one. For this example I got a speedup from 17.857 s to 5.572 s on my PC.

---
EDIT: I compared the two releases of `exactextractr` (GitHub development version 0.9.0 and CRAN stable version 0.8.2) on larger data:

```r
download.file("https://biogeo.ucdavis.edu/data/worldclim/v2.1/base/wc2.1_2.5m_prec.zip",
              "prec_2.5m.zip")
unzip("prec_2.5m.zip", exdir = "prec_2.5m")
f = list.files("prec_2.5m", pattern = "\\.tif$", full.names = TRUE)
prec = rast(f)
prec = prec * 1
brazil <- st_as_sf(getData('GADM', country='BRA', level=1))
brazil_spat <- as(brazil, 'SpatVector')

microbenchmark(
  terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
  exact_extract(prec, brazil, 'mean', progress = FALSE),
  times = 10)

Unit: seconds
                                                              expr
 terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F)
             exact_extract(prec, brazil, "mean", progress = FALSE)
      min       lq     mean   median       uq       max neval
 9.044934 9.168709 9.561833 9.446062 9.607415 11.124775    10
 2.037946 2.045342 2.148452 2.115836 2.224045  2.365771    10
# exact_extract from GH development version 0.9.0

> microbenchmark(
+   terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
+   exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 10)
Error in x$.self$finalize() : attempt to apply non-function

Unit: seconds
                                                              expr
 terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F)
             exact_extract(prec, brazil, "mean", progress = FALSE)
      min       lq     mean   median       uq      max neval
 9.288340 9.578843 9.615942 9.630635 9.733694 9.810699    10
 2.072167 2.090151 2.375785 2.168442 2.418485 3.256852    10
# exact_extract from stable version 0.8.2
```

---
> `brazil <- st_as_sf(getData('GADM', country='BRA', level=1))`

You used only 27 polygons. Try a higher administrative level (e.g. `level=3`) and:

```r
microbenchmark(exact_extract(prec, brazil, 'mean', progress = FALSE), times = 5)

## devel
#>      min       lq     mean   median       uq      max neval
## (n = 27)
#> 1.654517 1.671902 1.886303 1.873598 2.081351 2.176676     5
## (n = 10000)
#> 5.639051 5.694109 5.824986 5.827941 5.940191 6.023639     5

## 0.8.2
#>      min       lq     mean   median       uq      max neval
## (n = 27)
#> 1.986231 2.184738 2.189063 2.198255 2.218664 2.458221     5
## (n = 10000)
#> 17.26146 17.57488 17.77355 17.60151 18.20275 18.22718     5
```

---
Thanks for the suggestion. I did the following comparison of the devel and stable versions with `level=3`. The development version is evidently faster. It seems that the runtime grows faster with the number of polygons in the stable version than in the devel one; one could of course test it further to figure out exactly what the difference is, but it is clearly an improvement. Great work pinpointing this issue.

```r
brazil <- st_as_sf(getData('GADM', country='BRA', level=3))
brazil_spat <- as(brazil, 'SpatVector')

## devel 0.9.0
> microbenchmark(exact_extract(prec, brazil, 'mean', progress = FALSE),
+                times = 5)
Unit: seconds
      min       lq     mean   median       uq     max neval
 4.166733 4.190159 4.330995 4.226648 4.364043 4.70739     5

## 0.8.2 (stable)
> microbenchmark(
+   terra::extract(prec, brazil_spat, mean, na.rm = TRUE, exact = F),
+   exact_extract(prec, brazil, 'mean', progress = FALSE),
+   times = 5)
Unit: seconds
       min        lq      mean    median        uq       max neval
 91.929058 93.775886 97.568340 95.035739 101.92225 105.17877     5
  9.184623  9.335977  9.937854  9.564593  10.15296  11.45112     5
```

---
Fair point! Sorry about that.

---
The `exactextractr` package enables super-fast zonal extraction of raster values from polygons. It also reports the coverage fraction of each cell by the input polygon, which some could find useful. It works with `sf` polygons and `terra`/`raster` raster data. From the benchmarking reported on the package GitHub, it seems that the improvement of `exactextractr::exact_extract` over `terra::extract` is not as significant as the improvement over `raster::extract`, but it could still be useful if someone is working with very large datasets. Maybe a reference to the package could be added in Chapter 6.
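For context, a minimal sketch of the interface in question, using the sample raster shipped with `terra` (the polygons here are illustrative, generated by buffering random sample points):

```r
library("sf")
library("terra")
library("exactextractr")

r = rast(system.file("ex/elev.tif", package = "terra"))
# illustrative polygon layer; any sf polygon object works
poly = st_as_sf(buffer(spatSample(r, size = 2, as.points = TRUE, na.rm = TRUE),
                       width = 1500))

# named summary statistic, weighted by cell coverage fractions
exact_extract(r, poly, fun = "mean")

# several statistics at once
exact_extract(r, poly, fun = c("min", "max", "count"))
```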