Alternate polygonToCells algorithm#785
Conversation
| // If we make it out of the loop, we're done | ||
| iterDestroyPolygonCompact(iter); |
There was a problem hiding this comment.
I am not sure what I think of the iterator usually being destroyed for you. On one hand it is convenient, on the other hand the caller will always need to be able to destroy the iterator themselves (e.g. if we exceed their memory or time budget). I guess that only happens if canceling the iteration, so that's how the caller should think of it?
There was a problem hiding this comment.
That seems like a reasonable way to think about it. Perhaps instead of calling it a destroyIterator or etc, it's cancelIterator?
There was a problem hiding this comment.
🤷 I like destroy, as it clearly indicates "release memory" - we use it for destroyLinkedMultiPolygon as well. I went back and forth about doing this for the caller, but prefer this option:
- In general, the component allocating memory should be responsible for releasing it, and in this case that's the iterator functions
- This is more ergonomic for the most common use cases, and ensures proper release in those cases, reducing the potential for bugs
| iterDestroyPolygonCompact(&(iter->_cellIter)); | ||
| iter->cell = H3_NULL; | ||
| iter->error = E_SUCCESS; | ||
| } |
There was a problem hiding this comment.
also zero out the child iterator?
| for (; iter.cell; iterStepPolygon(&iter)) { | ||
| out[i++] = iter.cell; | ||
| } |
There was a problem hiding this comment.
you could also assert we don't go over the estimation size here (which would be an out of bounds write)
There was a problem hiding this comment.
I don't think we'd want this in production code, since we assume the caller has sized the memory correctly, but the *out array could be any size, so our assertion would only be useful as an internal check that we were meeting our contract. But I think can use the assertion macros to check this in tests.
src/h3lib/lib/polyfill.c
Outdated
| // Check if the cell is in the polygon | ||
| // TODO: Handle other polyfill modes here | ||
| LatLng center; | ||
| H3_EXPORT(cellToLatLng)(cell, ¢er); |
There was a problem hiding this comment.
this should have a NEVER check for the error
src/h3lib/lib/polyfill.c
Outdated
| err = H3_EXPORT(latLngToCell)(&NORTH_POLE, H3_GET_RESOLUTION(cell), | ||
| &poleTest); | ||
| if (NEVER(err != E_SUCCESS)) { | ||
| return err; | ||
| } |
There was a problem hiding this comment.
It looks like Clang warnings want these stores to be to different variables https://github.com/uber/h3/actions/runs/6408852991/job/17398764952?pr=785 rather than a reused err variable. Not 100% sure I agree (since this is really complaining about something that only affects the coverage build) although I think it is marginally clearer than reusing err.
There was a problem hiding this comment.
Ok, will update - I thought I was being clever by declaring this at the top of each function that could throw, but clearly not.
src/h3lib/lib/polygon.c
Outdated
| test = ((b2->lat - b1->lat) * (a1->lng - b1->lng) - | ||
| (b2->lng - b1->lng) * (a1->lat - b1->lat)) / | ||
| ((b2->lng - b1->lng) * (a2->lat - a1->lat) - | ||
| (b2->lat - b1->lat) * (a2->lng - a1->lng)); |
There was a problem hiding this comment.
In CI, testPolygonInternal hits an unguarded division by zero on this line:
TEST polygonInternal
/Users/isaac/oss/h3/src/h3lib/lib/polygon.c:175:56: runtime error: division by zero
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/isaac/oss/h3/src/h3lib/lib/polygon.c:175:56 in
......................................................................................zsh: abort ./bin/testPolygonInternal
(to configure: cmake -DCMAKE_C_FLAGS="-fsanitize=undefined,float-divide-by-zero -fno-sanitize-recover=undefined,float-divide-by-zero" .. - must be using clang)
There was a problem hiding this comment.
Thanks! Will fix
src/h3lib/lib/polygon.c
Outdated
| test = ((a2->lat - a1->lat) * (a1->lng - b1->lng) - | ||
| (a2->lng - a1->lng) * (a1->lat - b1->lat)) / | ||
| ((b2->lng - b1->lng) * (a2->lat - a1->lat) - | ||
| (b2->lat - b1->lat) * (a2->lng - a1->lng)); |
There was a problem hiding this comment.
same as above, presumably this could result in division by zero
|
With the various optimizations I've added so far, I've closed the gaps in the original benchmarks, and I think the new algorithm is likely to be faster most of the time, though still slower as the ratio of vertexes to cells increases (numbers shown below are ratio of old / new): Res 9 Res 10 |
|
Some more benchmarking data - when transpiled into The Given the memory benefits, I'm considering this a win, though I was hoping it would actually be faster 🤷. |
|
Even more benchmarking here (now in C) suggests that the resolution of the output cells also makes a difference, which makes sense given the hierarchical search involved in the new algo. This means that the performance of the new algo decreases compared to the old algo as the res gets finer. The following benchmark is for Djibouti: By res 9 it's 3x worse than the current implementation 😢 |
|
Further investigation shows that there's something else going on, rather than just resolution, but I'm not sure what. The benchmark for all countries has the old algo better after res 5, but I can't run it for finer resolutions: Testing Indonesia shows the same pattern, old algo is almost twice as fast by res 7. But testing a smaller set of small Indonesian islands at finer resolutions shows the opposite pattern from res 10 onward: So again, it may be a wash 🤷♂️. As far as I can tell, simpler shapes with better compaction yield faster times for the new algo, complex shapes and poor compaction are faster with the old. |
|
I put together a little animation of this process (code: https://github.com/isaacbrodsky/h3/tree/updated-polyfill-animation), to aid in visualizing what it's doing. I intentionally cropped out the first and last iterations were it's going over the base cells because the scale is so different compared to where most of the work happens. |
These are great! I was thinking about making an Observable notebook for this, but it's a little tough without reimplementing the algo, or just animating a log of the output |
|
Latest commit speeds up the bbox calculation, gaining maybe 20% perf, but still slower for 75% of countries in the benchmark :(. |
9e8b1e4 to
0a3b3a5
Compare
|
One additional note here: My new approach to |
2250df0 to
475d8f6
Compare
| switch (normalization) { | ||
| case NORMALIZE_NONE: | ||
| return lng; | ||
| case NORMALIZE_EAST: | ||
| return lng < 0 ? lng + (double)M_2PI : lng; | ||
| case NORMALIZE_WEST: | ||
| return lng > 0 ? lng - (double)M_2PI : lng; | ||
| } |
There was a problem hiding this comment.
GCC is complaining that there's no default case here so it's possible to reach the end of the function without returning a number.
Perhaps this could be rewritten as:
| switch (normalization) { | |
| case NORMALIZE_NONE: | |
| return lng; | |
| case NORMALIZE_EAST: | |
| return lng < 0 ? lng + (double)M_2PI : lng; | |
| case NORMALIZE_WEST: | |
| return lng > 0 ? lng - (double)M_2PI : lng; | |
| } | |
| switch (normalization) { | |
| case NORMALIZE_EAST: | |
| return lng < 0 ? lng + (double)M_2PI : lng; | |
| case NORMALIZE_WEST: | |
| return lng > 0 ? lng - (double)M_2PI : lng; | |
| default: | |
| return lng; | |
| } |
There was a problem hiding this comment.
Updated, thanks
| // -------------------------------------------- | ||
| // maxPolygonToCellsSize | ||
| // -------------------------------------------- |
There was a problem hiding this comment.
Is maxPolygonToCellsSize modified in this PR? Are these tests duplicated?
There was a problem hiding this comment.
These tests are duplicated. My intent was to have copy/paste versions of the old tests with the new function, so that when we drop the old function it would just be removing the -Experimental files. But I guess you're right that there's no point duplicating the maxPolygonToCellsSize tests, I can remove.
| DECLSPEC H3Error H3_EXPORT(polygonToCellsExperimental)( | ||
| const GeoPolygon *polygon, int res, uint32_t flags, H3Index *out); |
There was a problem hiding this comment.
For clarity, this is not intended to be part of the public API when this PR is merged (as it is not in h3api.h.in?)
There was a problem hiding this comment.
Correct - I added DECLSPEC so that it would be available for export, but I don't expect us to expose it in the API until we transparently replace the old function.
src/h3lib/lib/polyfill.c
Outdated
| /** | ||
| * Get a base cell by number, or H3_NULL if out of bounds | ||
| */ | ||
| static H3Index getBaseCell(int baseCellNum) { |
There was a problem hiding this comment.
This function name looks like it could do the opposite and accept and index and return a number, maybe rename to indexFromBaseCell or something?
There was a problem hiding this comment.
Makes sense, will rename to baseCellNumToIndex
src/h3lib/lib/polyfill.c
Outdated
| * Get a base cell by number, or H3_NULL if out of bounds | ||
| */ | ||
| static H3Index getBaseCell(int baseCellNum) { | ||
| if (NEVER(baseCellNum < 0) || baseCellNum >= NUM_BASE_CELLS) { |
There was a problem hiding this comment.
I don't think this needs NEVER if the function is marked non-static and a unit test exercises this case.
There was a problem hiding this comment.
True, can update. It's not clear to me whether static offers any compilation advantages, or whether it just indicates a local function. We settled on static to avoid _underscore names to denote local functions, but I guess it makes sense to test this separately.
src/h3lib/lib/polyfill.c
Outdated
|
|
||
| // Initialize bounding boxes for polygon and any holes. Memory allocated | ||
| // here must be released through iterDestroyPolygonCompact | ||
| iter._bboxes = H3_MEMORY(malloc)((polygon->numHoles + 1) * sizeof(BBox)); |
There was a problem hiding this comment.
| iter._bboxes = H3_MEMORY(malloc)((polygon->numHoles + 1) * sizeof(BBox)); | |
| iter._bboxes = H3_MEMORY(calloc)(polygon->numHoles + 1, sizeof(BBox)); |
Requesting this change to head off any integer overflow problem
There was a problem hiding this comment.
Will update, thanks
|
Thanks everyone for the reviews and support on this! |
This PR implements an alternate algorithm for
polygonToCells, which providespolygonToCompactCellsas a byproduct.Goals
compactCellssteppolygonToCellscontainsandintersects(Add additional modes for polygonToCells #775, Support different polyfill mode #275, H3 Polyfill skipping areas near the boundary h3-java#41)Algorithm Overview
Results
polygonToCellsCompacteffectively for free.cellBoundaryInsidePolygonandcellBoundaryCrossesGeoLoop).For Discussion
polygonToCellsalgo, even if performance degrades in some cases?cellToChildreniterator public as well?Benchmarks
In the following benchmakrks,
SFis a 6-vertex polygon covering San Francisco,Alamedais a 49-vertex polygon covering a similar area, andSouthernExpansionis a 22-vertex polygon covering an area 2x-3x larger than SF.polygonToCellsperforms better when the ratio of output cells to vertexes is low (few cells, complex polygons)polygonToCells2performs better when the ratio of output cells to vertexes is high (many cells, simple polygons)polygonToCellsCompactfunction, which is justpolygonToCells2without the uncompact step, performs about the same aspolygonToCells2🤔. I went over this several times, but it seems like the benchmark is correct?Res 9
Res 10
Res 11