Add geodesic polygon coverage mode for polygonToCellsExperimental#1052
Add geodesic polygon coverage mode for polygonToCellsExperimental#1052holoskii wants to merge 3 commits into
Conversation
|
Thanks @holoskii for the contribution! Geodesic polyfill is an exciting addition that I think a lot of folks would use, myself included! Please bear with us as we try to find the time to properly review this PR. Some initial questions from my end:
|
c9be803 to
11cf632
Compare
Fixes / changes
PerformanceI added a dedicated geodesic benchmark that exercises small, medium, and large shapes.
Notes:
|
1.1 - Yes. 1.2 - The geodesic approach behaves better for large polygons and near poles/antimeridian due to great-circle behavior, but it’s architecturally different from the planar path; there isn’t a practical feature we can “back-port” without re-implementing substantial pieces. 2 - We only share 3 - Performance metrics are now included in the main PR comment (see “Performance” above) with the new benchmark. |
|
A few small fixes for the CI:
|
|
@isaacbrodsky was right, separate fuzzer is a way to go. Old fuzzers left not enough time until timeout in each iteration. In the last commit I've added a separate geodesic fuzzer. PR is now fully ready for review |
|
Hi @nrabinowitz @dfellis @isaacbrodsky - friendly bump on this. |
Thanks for updating the PR and for the bump! We appreciate the contribution and know we need to review this, so it's just a matter of finding time to read through it in depth. We would like to bring this in for the next release of H3. |
| if (iter->_polygon->geoloop.numVerts <= 0) { | ||
| iterErrorPolygonCompact(iter, E_DOMAIN); | ||
| return NULL; | ||
| } |
There was a problem hiding this comment.
a zero-vertex geodesic polygon is not allowed?
There was a problem hiding this comment.
Yes. Algorithm is complex and needs a lot of setup and auxiliary structures, which only work for valid geo loops. Easier to error out right away
There was a problem hiding this comment.
I would actually take this further and require that every loop (outer or hole) have at least 3 vertices, raising an error otherwise. Since it isn't clear what an algorithm should do in the degenerate case of 0, 1, or 2 vertices, I think its easier to eliminate this scenario from consideration entirely.
This would also align with the GeoJSON spec, RFC 7946, Section 3.1.6: "A linear ring is a closed LineString with four or more positions". (We don't repeat the closing vertex, so we can do 3.)
There was a problem hiding this comment.
During testing, I found a use case for two-point 'polygons,' such as long-distance flight paths or marine trajectories. The algorithm correctly covers these start-to-end paths. There is a test case that covers that use case: polygonToCellsLondonNY
There was a problem hiding this comment.
Paths are a totally valid use case, but I'd treat that separate from polygons. Given that this function has polygon in the name, I'd prefer it to be restricted to just that.
A separate function that handles lines would be more than welcome, tho! And, hopefully, it wouldn't be much work to add one, since I'm assuming we can reuse a lot of the machinery from this function.
Side note: In that test, it looks like you repeat the closing vertex for three total points, so you have London -> NY -> London. If you want to test two-point "polygons", should that only be two?
There was a problem hiding this comment.
Addressed this: only loops with >= 3 vertexes are allowed. I kept the test with London -> NY -> London as an edge case of 0-area polygon (as we do not reject those)
f3df47e to
bb067db
Compare
|
@holoskii Thanks for the update! I think CI is failing on the formatting assertion (needs |
My update is a bit overdue, this year has been very busy so far 😅 |
|
Resolved final issues, formatted with version 14. Should be ready for the CI 2 small issues to raise after this PR is merged:
|
|
Windows workflow failed because on 32-bit Windows systems, the math library doesn't handle NaN inputs correctly. Instead of ignoring NaN data when building a bounding box, it lets the NaN poison the BBOX. This leads to us hitting NaN down the road and failing later. I added a check at the very beginning, during polygon creation to catch invalid coordinates (NaN or Infinity). Now, we reject bad input right away with a clear error code (E_DOMAIN) instead of letting it break things further down the line. |
|
Updated PR title and description, should be ready for the merge |
|
Thanks again for this. This is awesome, and the effort is definitely appreciated. I have a few questions about the interface, which don't necessarily need to block this PR, but I think we should discuss and resolve before releasing (or exiting "experimental" status). And since we definitely want to get this functionality over the line and released, we can also split this work up, so it isn't entirely on you, since you've already contributed quite a bit. :) DiscussionI'm primarily interested in how this new mode will work with the "global" polygons (larger than a hemisphere, crossing poles and the antimeridian, etc.) that we can now output with Boundaries and center containmentFor the purposes of testing (and because it'd be generally useful), would it be possible to add a For example, if we take a single cell
Ideally, we'd get just the single cell back, but I understand boundary behavior makes that difficult with FULL and OVERLAPPING. (Side question: Given the known issues with boundaries, are these examples still behaving as you would expect?) Polygon validationMinimum number of points in a loopLike I mention elsewhere in the PR, I think we should only accept polygons with loops containing at least 3 points, and return errors otherwise. There's no notion of "inside" or "outside" when you have only two points in a "polygon". Winding orderFor geodesic mode, I think we should require that polygons follow the "right-hand rule", with outer loops expected (and interpreted) to be in counter-clockwise winding order, and holes in clockwise order. I recognize that this differs from what we do in the planar case, where we accept either ordering and figure out how to interpret it. But if we want to preserve the option of extending this mode to handle "global" polygons in the future, then being strict about loop winding is the only option. This would also align with the GeoJSON spec: RFC 7946, Section 3.1.6. For example, if we take the boundary of a cell, reverse the ordering of the points so that they're in clockwise order, then this loop would represent "everything except the cell", and we'd want this function to interpret it that way. Currently, geodesic mode interprets the boundary and its reversal the same: as the smaller of the two options. API guarantees and error codesThe public API docs should describe the supported input domain and what happens outside it. We should also test against that description. For example, what polygons are guaranteed to work correctly? Which ones aren't? And, in the case of unsupported polygons, do we produce an error code, fail silently, or produce a partial result? To start, I'd propose something like: "We support any polygon which is strictly contained within a hemisphere, and raise an error otherwise." I think that can be implemented fairly easily in the current code by adding a check that all points within a polygon are in some half space, and the loops are oriented correctly (which we can do, for example, with the spherical area calculation code). The plan for future versions of this function would be to relax the constraint on valid inputs, and eventually support all global polygons, like the ones we can get from Testing"Exhaustive" testing over the whole globe is something we've done with other functions that would be helpful here to ensure this is behaving as expected. For example, we could add some tests that evaluate all cells and their boundaries at, say, resolutions 0--4. We could also reverse the boundaries to check that we get the expected output or error code. We might also want to check grid rings and grid disks around the cells, for both small regions and regions approaching or exceeding a hemisphere. In particular, I don't think we currently have any tests for what happens when the function receives a polygon larger than a hemisphere, do we? Again, the Next stepsThat covers my thoughts and concerns on the current PR (which, again, is awesome. thanks!). It involves a few changes, but ones which I expect wouldn't involve a ton of work, since they're more about the API boundary than the function internals. Still, we should definitely have a discussion to see if folks agree, and then how we want to triage and distribute this work. It probably doesn't need to happen all in this PR. |
|
Aside from my API comments above, I may have found a proper bug in the current implementation. The test is the following: For a given cell That is the case for most cells, however, at res 0 and k=1, I'm seeing three test failures: I can find more examples going up to res 1 and k = 2. Edit: Here's the test code I was using: https://gist.github.com/ajfriend/88e366c5b464a15355d511633dcf5b73 |
|
And here's an example where just a single cell boundary (k=0) fails to recover the center cell: |
This is a legitimate issue! |
|
@ajfriend #1052 (comment) Hemisphere approachNow we actually try and find hemisphere that will fit out polygon and fail if we can't find one. CONTAINMENT_CENTERCenter containment mode was added to debug and fix issue with hemespheres. Also main iteration loop was optimized more, so we don't do expensive intersection test if we can avoid it Boundaries issueThe behavior that you highlighted with different results for CW and CCW was not expected. This behavior occured because of the same issue causing wrong results: #1052 (comment). Fixed Winding orderEven though I agree that having determenistic loop order would be a good change it would require doing a bigger refactor, and rethink our approach to current geodesic algorithm. As well as we would need to implement the same changes to the non-geodesic polyfill. Additional testingI have created several "full sweep" type of tests to try and find those nuanced bugs with FP errors.
|
c75a49f to
38293ec
Compare
|
Thanks again for this work, addressing the comments, and fixing the issues, @holoskii. Really excited to get geodesic coverage into the library. One note for alignment:
This isn't actually the case. We can't change the API of the non-geodesic polyfill (like adding a requirement for winding order) without a major version change. Since there are no plans for a major version bump any time soon, the non-geodesic mode is basically fixed as is. However, the geodesic mode is new, so we're free to design that API however we like. The point I was attempting to make is that we don't want paint ourselves into a similar corner with the API for geodesic mode. Since this code is tagged as Edit: And just to be clear, I don't think we need any changes for this PR, necessarily. I'm just calling this out for the future. For this PR, being agnostic to winding order is OK because 1) the function is still |
|
Just noting this here, since I was having trouble finding it earlier: #1052 (comment) The updated PR now requires all loops to have >= 3 vertices. Thanks! |
|
One chore I'd like to propose: splitting the By splitting it up, we'll have an isolated PR focused on the internal refactor, which touches H3's core indexing functions. Subtle numerical changes here could potentially shift cell assignments, or affect indexing performance. @isaacbrodsky already flagged this in his review. Isolating it would let us add exhaustive round-trip tests and benchmarks to build full confidence, beyond what the existing test suite covers (since I'm not sure it was designed to capture regressions from this kind of change). We can also write that PR with an eye towards reusable vec3d components for things like geodesic "line to cells" or "sphere cap to cells" functions. It has the side benefit of making review easier. This is a significant (in size and functionality!) change, so splitting it into smaller PRs is helpful. Curious to hear what other people think. And as mentioned, I'm happy to help out with this. I'll review the recent changes, but my expectation at this point is that--pending this split--this PR is ready to merge. |
|
The idea is great. This is a huge PR, so digesting it in smaller portions will indeed be easier. Thanks for the help with this part! I will be waiting for the focused faceijk/vec3 PR to review, and rebase this one after prep PR is merged |
|
Great! I'll try to get tho this soon. |
Extract _vec3dToClosestFace, _vec3dToHex2d, _vec3dToFaceIjk from LatLng wrappers. Add _vec3dAzimuthRads (3D tangent-plane azimuth) and _hex2dToVec3 (inverse via Rodrigues' rotation). Add vec3ToCell and cellToVec3 public API. Existing LatLng API is unchanged — wrappers delegate to the new Vec3d functions. From uber#1052.
* Add Vec3d math functions: dot, cross, normalize, mag, distSq, latLngToVec3 From #1052. * Refactor faceijk.c: Vec3d as core coordinate type Extract _vec3dToClosestFace, _vec3dToHex2d, _vec3dToFaceIjk from LatLng wrappers. Add _vec3dAzimuthRads (3D tangent-plane azimuth) and _hex2dToVec3 (inverse via Rodrigues' rotation). Add vec3ToCell and cellToVec3 public API. Existing LatLng API is unchanged — wrappers delegate to the new Vec3d functions. From #1052. * Add Vec3d unit tests From #1052. * Route latLngToCell/cellToLatLng through Vec3d; remove dead LatLng-only code * cell boundary code to Vec3d path; remove _hex2dToGeo, _geoAzDistanceRads, and tests * vec3ToCell validation tests * align function names and struct: hex2d -> vec2d * use Vec2 and Vec3 everywhere * normalize * h3 -> cell; years * bench * update bench * correctness * test vertices and along edges * _vec3TangentBasis helper * vec3LinComb helper * return by value * pass by value * clean * test for inlining * header-only; suffering on LTO * header only * clean up benchmark script * _vec3ToFaceIjk * output params * header version of _ijkToVec2 * revert * revert * redo * inline ijk ops * going header only * directedEdgeToBoundary bench * justfile * revert coorijk header change * comments * update benchmarks * again * reduce digits on testCliEdgeLengthM * vertexToLatLng benchmark * cover line in latLng.c * remove temporary benchmarks and assignment tests * _faceIjkToCell back to _faceIjkToH3 * restore Vec3d * missed a few * test files * vec2d.h * Vec2d * vec2 names * more vec2 names * names * comments * years * years * boop * years * old test * haven't i done this one already? * _hex2dToVec3 * pass output by reference * _vec3ToHex2d * forward declare * comment * bah * bah bah * move around _vec3ToClosestFace * todo * bring benchmarks back in * nits * pass by value * expression * _vec3TangentBasis * nits * boop * i swear... * boop * 9 digits * move * 8 digits * comments align * better names for vec3LinComb * dead code: remove faceCenterGeo definition * remove benchmarks and justfile * camel_case * fix more snakeCase * small norm handling * static void _hex2dToVec3 * _faceIjkToVec3 g * fix doc comment on _faceIjkToVec3 * avoid Vec3d casts * unit vector comments * add vec3ToCell tests --------- Co-authored-by: Makar Ivashko <1212makar1212@gmail.com>
Adds geodesic (great-circle) polygon-to-cells support to the polygonToCellsExperimental API via a FLAG_GEODESIC_MASK flag bit. Squashed from 39 commits on geodesic_coverage branch for clean rebase onto upstream/master. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Align macro spacing with upstream style (no space before & in bitmask expressions), as enforced by clang-format-14. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
gcc -O2 inlines _ijkToHex2d and sees the uninitialized read. Initialize to zero like the non-pentagon counterpart already does. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fb857a1 to
b8d354e
Compare
|
PR has been dragging for too long. The new clean rebased PR will be opened |
Introduce a geodesic flag for polygonToCellsExperimental that interprets
polygon edges as great-circle arcs instead of planar segments. This is
critical for accurate coverage of very large polygons spanning hundreds
or thousands of kilometers, where planar interpolation diverges
significantly from the true spherical geometry.
The implementation operates entirely on the unit sphere using 3D
Cartesian coordinates, with sphere-cap and AABB acceleration structures
for efficient cell pruning. Supports CONTAINMENT_FULL and
CONTAINMENT_OVERLAPPING modes. This mode is slower than the planar
algorithm, so lower resolutions are recommended.
Key changes:
Developed at FloeDB for internal use; contributed upstream.