feat: add LatLngToCellBatch for lower cgo overhead #119
… Adds a batched LatLng -> Cell API in a new C file (h3_latLngBatch.{c,h}) that supplements the cloned H3 core.
Coverage Report for CI Build 26683177239
Coverage remained the same at 100.0%.
Uncovered changes: none found. Coverage regressions: none found.
Coveralls
Actually, I had a thought on pure Go implementations (as we can see in #120): if it's a pure Go implementation, wouldn't we also get rid of the cgo overhead and no longer need a batched version? Before (CGo): After (pure Go): benchstat (n=10):
As a user and h3 & Go enthusiast, I'm all for pure Go optimizations.
I'm open to allowing this batch function in, with an additional comment that it is specifically an optimization for CGo overhead in bulk. Simultaneously, @justinhwang - looks like you have at least a branch for a pure Go h3? This should be pretty easy to maintain with AI now. Would you be interested in putting up a branch & PR, so we can move this discussion there? The original motivation for the bindings was to avoid the maintenance overhead for us humans as development moved on in core, but this constraint has largely become moot now, imo.
Resolves #113
Adds `LatLngToCellBatch` so a slice of `LatLng` -> `[]Cell` costs one cgo transaction for the whole batch instead of one per row. The motivating use case is pipelines transforming millions of coordinate rows per cycle, where cgo overhead currently dominates actual H3 work.

Per @jogly's suggestion in the issue, the implementation lives in a new C extension (`h3_latLngBatch.{c,h}`) that supplements the cloned H3 core. If H3 core later exposes an equivalent `latLngToCellBatch`, swapping the Go wrapper to call it directly is a one-line change.

Changes
| File | Change |
| --- | --- |
| `h3_latLngBatch.h` (new) | `latLngToCellBatch` |
| `h3_latLngBatch.c` (new) | `latLngToCell` |
| `h3.go` | `#include <h3_latLngBatch.h>` in cgo preamble; new `LatLngToCellBatch` wrapper |
| `h3_test.go` | |
| `bench_test.go` | `BenchmarkLatLngToCellBatch` plus a `BenchmarkLatLngToCellBaseline` for comparing results |

Benchmark
At small `n` (n < 32) the per-call path wins. The crossover is around `n=16-32`; from there the batched version grows roughly linearly while the loop pays the cgo cost n times. At `n=16384` the batched version is ~1.21x faster.

Conventions
Inputs are converted with `LatLng.toC()`, so the C code sees the same radians `LatLng` as the single-call path.