Alternate `maxPolygonToCellsSize` implementation #805
Conversation
You know, you're saying that on average, calculating the max size takes about 290 nanoseconds. I just can't see anyone saying they don't have 290ns of time to spare, myself. ;)
The `polygonToCells` tests are what's failing, btw. I think we hardwired some of the …
About to look, but I'll bet the failure has to do with adding the new polyfill mode - test issue, not code issue |
Agree, but it can add up in a hot loop, and there are probably some pathological cases (e.g. polygons with millions of vertexes). Those might be slow for the existing estimator too, though.
Yes, but when you were talking about the current algo being constant-time, it's only constant-time after the bounding box has been determined, which is … But also, if we are keeping the …
Hmm... But it should be backwards compatible, right? We already required a mode that was forced to be zero with V4, so the actual function call signature and behavior should not have changed. It should only be things like …
The tests still use the old estimator - it was the way we were checking for invalid modes in the tests (using …)
On this, I'm fine with reducing the max size, but the fair comparison would be old algo + old estimator vs new algo + new estimator. The old algo was coupled with a lot of false positives to deal with, right?
Yes, I did - I'm updating the benchmark in this PR. With the new estimator, the new algo is slower by res 4: … Compare with the previous results, which didn't include either estimator. I think this is probably an acceptable tradeoff - we may not need/want to use the estimator at all once we have streaming, and the old estimator was actually wrong for the new modes, and possibly for some polar areas. But it's not no-cost.
```diff
   CONTAINMENT_FULL = 1,             ///< Cell is fully contained in the shape
   CONTAINMENT_OVERLAPPING = 2,      ///< Cell overlaps the shape at any point
-  CONTAINMENT_INVALID = 3           ///< This mode is invalid and should not be used
+  CONTAINMENT_OVERLAPPING_BBOX = 3, ///< Cell bounding box overlaps shape
```
Is this purely an internal detail, or should it be exposed to the user?
I think users could potentially benefit from this - it's a faster (2-3x) version of `CONTAINMENT_OVERLAPPING` that may contain false positives, but no false negatives. Not appropriate for most use cases, but some users might find it useful.
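For context, a hedged sketch of how a caller might opt into the bbox mode through the experimental polyfill API. The signatures below (`maxPolygonToCellsSizeExperimental`, `polygonToCellsExperimental` with a `size` argument) match recent H3 v4 releases and may differ from the code at the time of this PR; the polygon, resolution, and header path are illustrative.

```c
#include <stdio.h>
#include <stdlib.h>
#include <h3/h3api.h>  // header path depends on how H3 was installed

int main(void) {
    // A small, made-up triangle near the equator (coordinates in radians).
    LatLng verts[] = {{0.01, 0.01}, {0.01, -0.01}, {-0.01, 0.0}};
    GeoPolygon polygon = {.geoloop = {.numVerts = 3, .verts = verts},
                          .numHoles = 0};

    int res = 9;
    // Bbox mode: a superset of CONTAINMENT_OVERLAPPING. False positives
    // are possible, false negatives are not.
    uint32_t flags = CONTAINMENT_OVERLAPPING_BBOX;

    int64_t size = 0;
    if (maxPolygonToCellsSizeExperimental(&polygon, res, flags, &size))
        return 1;

    H3Index *out = calloc(size, sizeof(H3Index));
    if (polygonToCellsExperimental(&polygon, res, flags, size, out)) {
        free(out);
        return 1;
    }

    // Unused slots in the output buffer are left as 0 (H3_NULL).
    for (int64_t i = 0; i < size; i++)
        if (out[i]) printf("%llx\n", (unsigned long long)out[i]);
    free(out);
    return 0;
}
```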
```c
    return iter.error;
}

static int MAX_SIZE_CELL_THRESHOLD = 10;
```
Per @dfellis - we could potentially allow the caller to configure this value, by supplying it in the flags. The caller could give either a specific value, or an "order of magnitude" value (exponent of 10) that could make this more accurate.
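A hypothetical sketch of the "order of magnitude" variant of that suggestion. None of these macro or function names exist in H3; the bit layout is invented purely to illustrate how an exponent could ride along in unused flag bits.

```c
#include <stdint.h>

// Hypothetical layout: the low flag bits keep the containment mode, and a
// few spare bits carry a base-10 exponent for the cell threshold.
#define THRESHOLD_EXP_SHIFT 4
#define THRESHOLD_EXP_MASK (UINT32_C(0x7) << THRESHOLD_EXP_SHIFT)

static int maxSizeCellThreshold(uint32_t flags) {
    uint32_t exp = (flags & THRESHOLD_EXP_MASK) >> THRESHOLD_EXP_SHIFT;
    if (exp == 0) return 10;  // default, matching MAX_SIZE_CELL_THRESHOLD
    int threshold = 1;
    while (exp--) threshold *= 10;  // 10^exp
    return threshold;
}

// Usage: CONTAINMENT_OVERLAPPING | (2 << THRESHOLD_EXP_SHIFT)
// would request a threshold of 100.
```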
Co-authored-by: David Ellis <isv.damocles@gmail.com>
Adds an alternate implementation of `maxPolygonToCellsSize`, based on a coarse run of `polygonToCellsExperimental`. This approach is slower than the current `maxPolygonToCellsSize`, but should accommodate the new modes and fix cases where we were previously underestimating the new polyfill algo around the poles. It generally provides a much better ratio of memory allocation to memory use.

Algorithm Overview
- Initialize the `polygonToCellsCompact` iterator, without taking the first step.
- Walk up from the target res to a coarser res expected to contain fewer than `MAX_SIZE_CELL_THRESHOLD` cells (threshold is currently set to 100, which is a little arbitrary), stopping at res 0.
- Run the iterator at the coarse res in the new `CONTAINMENT_OVERLAPPING_BBOX` mode (like `CONTAINMENT_OVERLAPPING`, but checks the cell bounding box rather than the boundary, for about 2-3x perf improvement). For each output cell, add the count of its children at the target res to the output estimate. A condensed sketch of these steps follows below.
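To make those steps concrete, here is a condensed, hedged sketch in the shape of H3's internal polyfill iterator API (`IterCellsPolygonCompact`, `iterInitPolygonCompact`, `iterStepPolygonCompact`). The coarse-res selection and the deferred first step are elided, so treat this as an illustration of the idea rather than the PR's code.

```c
static H3Error estimateMaxPolygonToCellsSize(const GeoPolygon *polygon,
                                             int res, uint32_t flags,
                                             int64_t *out) {
    (void)flags;  // mode validation elided; bbox mode upper-bounds every mode

    // Pick a coarser res expected to yield fewer than
    // MAX_SIZE_CELL_THRESHOLD cells (selection elided; the PR derives this
    // before taking the iterator's first step).
    int coarseRes = /* ... */ 0;

    // Run the compact polyfill at the coarse res in the cheap bbox-overlap
    // mode; false positives only inflate the upper bound.
    IterCellsPolygonCompact iter = iterInitPolygonCompact(
        polygon, coarseRes, CONTAINMENT_OVERLAPPING_BBOX);

    int64_t estimate = 0;
    for (; iter.cell; iterStepPolygonCompact(&iter)) {
        // Every descendant of this coarse cell at the target res could
        // appear in the real output, so count them all.
        int64_t children;
        H3Error err = cellToChildrenSize(iter.cell, res, &children);
        if (err) return err;
        estimate += children;
    }
    if (iter.error) return iter.error;

    *out = estimate;
    return E_SUCCESS;
}
```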
Results

Perf can likely be tuned via the `MAX_SIZE_CELL_THRESHOLD` value, at the cost of precision. Benchmarks: …

If it seems useful, I can try different values for `MAX_SIZE_CELL_THRESHOLD`, which I assume will allow us to trade some of the ratio advantage for faster perf.

TODO
- Fix `polygonToCellsExperimental` tests