Mrb 807 avoid regridding forecasts by Louis-Frey · Pull Request #130 · MeteoSwiss/evalml

Louis-Frey · 2026-04-07T09:34:32Z

Performance bottleneck in map_forecast_to_truth

Date: 2026-04-02

Context

During testing of the spatial verification pipeline (evalml experiment --spatial)
with a single-init-time config (forecasters-ich1-oper-fixed.yaml, 2025-03-01),
the verif_metrics_spatial_baseline jobs for ICON-CH1-EPS took ~20 minutes despite
having only one init time to process — i.e., nothing to aggregate.

Root cause

The bottleneck is in src/verification/spatial.py: map_forecast_to_truth().

Even when the forecast (ICON-CH1-EPS) and truth (KENDA-CH1) are on the same
1km grid and no actual remapping is needed, the function always:

1. Stacks (y, x) into a flat 'values' dimension (~1M+ points)
2. Builds a cKDTree over all source (forecast) lat/lon points -- O(N log N)
3. Queries the tree for all target (truth) lat/lon points -- O(M log N)

At 1km resolution this involves ~1M grid points, making the kd-tree operations
the dominant cost regardless of how many init times are processed.
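
To make the cost concrete, the three steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/verification/spatial.py: the function name `nearest_neighbour_indices`, the plain NumPy inputs, and the use of raw lat/lon distances are assumptions made for the example.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_neighbour_indices(src_lat, src_lon, tgt_lat, tgt_lon):
    """For each target point, return the flat index of the closest source point.

    Hypothetical helper: inputs are 2-D (y, x) coordinate arrays in degrees,
    and distances are measured in plain lat/lon space (a simplification).
    """
    # Step 1: flatten (y, x) into a single 'values' dimension.
    src_points = np.column_stack([src_lat.ravel(), src_lon.ravel()])
    tgt_points = np.column_stack([tgt_lat.ravel(), tgt_lon.ravel()])

    # Step 2: build the kd-tree over the N source (forecast) points.
    tree = cKDTree(src_points)

    # Step 3: look up the nearest source point for each of the M target points.
    _, idx = tree.query(tgt_points, k=1)
    return idx


# Tiny 3 x 4 grid standing in for the ~1M-point 1 km grid.
lat, lon = np.meshgrid(
    np.linspace(45.0, 47.0, 3), np.linspace(6.0, 9.0, 4), indexing="ij"
)
idx = nearest_neighbour_indices(lat, lon, lat, lon)
```

When source and target grids are identical, `idx` is simply `0, 1, ..., N-1`: every point maps to itself, so the tree build and query produce nothing that was not already known.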

There is already a TODO comment in the code acknowledging this:

TODO: return fcst unchanged when forecast and truth are already aligned

(src/verification/spatial.py, line 124)

Recommended fix

Before building the kd-tree, check whether forecast and truth lat/lon
coordinates are already aligned (e.g. same shape and max abs difference
below a small tolerance). If so, return fcst unchanged immediately.

This would make the baseline spatial verification near-instantaneous for
same-grid configurations (ICON-CH1-EPS vs KENDA-CH1), and would also
benefit any other same-grid run/truth combination.
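
A minimal sketch of such an alignment check, assuming the lat/lon coordinates are available as NumPy arrays. The helper name `grids_aligned` and the default tolerance `atol=1e-6` (in degrees) are hypothetical choices for illustration, not taken from the PR.

```python
import numpy as np


def grids_aligned(fcst_lat, fcst_lon, truth_lat, truth_lon, atol=1e-6):
    """Return True if both grids have the same shape and near-identical coordinates.

    Hypothetical helper; `atol` is an assumed tolerance in degrees.
    """
    fcst_lat, fcst_lon = np.asarray(fcst_lat), np.asarray(fcst_lon)
    truth_lat, truth_lon = np.asarray(truth_lat), np.asarray(truth_lon)

    # Shape check first: it is cheap and guards the element-wise subtraction.
    if fcst_lat.shape != truth_lat.shape or fcst_lon.shape != truth_lon.shape:
        return False

    # Max absolute difference below tolerance, as proposed in the description.
    max_lat_diff = np.max(np.abs(fcst_lat - truth_lat))
    max_lon_diff = np.max(np.abs(fcst_lon - truth_lon))
    return bool(max_lat_diff <= atol and max_lon_diff <= atol)


lat, lon = np.meshgrid(
    np.linspace(45.0, 47.0, 3), np.linspace(6.0, 9.0, 4), indexing="ij"
)
same_grid = grids_aligned(lat, lon, lat.copy(), lon.copy())
shifted_grid = grids_aligned(lat, lon, lat + 0.01, lon)
```

The check is a single O(N) pass over the coordinates, so it costs little even when the grids turn out to differ and the kd-tree path is still needed. If a coordinate array contains NaN, the comparison evaluates to False and the function falls back to the regular remapping.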


jonasbhend · 2026-05-08T11:52:09Z

+    result = map_forecast_to_truth(fcst, truth)
+
+    assert result is fcst


Hi @Louis-Frey. Great addition. The below is just a suggestion:

This doesn't really test whether we abort early, no? Either the test should be expanded to actually test timing (probably difficult), or one could add a log statement to the if branch and check that the early-return path is taken as expected (and that it is not taken if the grids differ).

Louis-Frey · 2026-05-08

Hi Jonas, thanks! I will have a look...
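
One way the log-based test suggested above might look, as a sketch only. `map_forecast_to_truth_sketch` is a hypothetical stand-in that works on plain dicts rather than the real gridded data, and the logger name and log messages are invented for the example.

```python
import logging

logger = logging.getLogger("spatial_sketch")  # hypothetical logger name


def map_forecast_to_truth_sketch(fcst, truth):
    """Stand-in with the same early-return shape as the real function."""
    if fcst["coords"] == truth["coords"]:
        logger.info("Grids already aligned, skipping kd-tree remapping")
        return fcst
    logger.info("Grids differ, remapping forecast onto truth grid")
    return dict(fcst, coords=truth["coords"])  # placeholder for the remapping


class ListHandler(logging.Handler):
    """Collect log messages in a list so a test can inspect them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def run_and_capture(fcst, truth):
    """Call the stand-in and return (result, log messages emitted during the call)."""
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    try:
        result = map_forecast_to_truth_sketch(fcst, truth)
    finally:
        logger.removeHandler(handler)
    return result, handler.messages


fcst = {"coords": [(46.0, 7.0), (46.0, 8.0)], "data": [1.0, 2.0]}
same = {"coords": [(46.0, 7.0), (46.0, 8.0)]}
other = {"coords": [(46.5, 7.0), (46.5, 8.0)]}

fast_result, fast_msgs = run_and_capture(fcst, same)
slow_result, slow_msgs = run_and_capture(fcst, other)
```

Asserting on both the returned object and the captured messages covers the aligned case and the differing-grid case. In a pytest suite, the built-in `caplog` fixture provides the same capture without a custom handler.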

Louis-Frey added 2 commits April 7, 2026 11:12

Skip kd-tree remapping in map_forecast_to_truth when grids are already aligned

f708df7

Avoids O(N log N) kd-tree build and query (~1M points at 1km resolution) when forecast and truth share the same grid, reducing baseline spatial verification from ~20 minutes to near-instantaneous.

Add test for map_forecast_to_truth fast path when grids are aligned

f141061

Louis-Frey requested review from dnerini and frazane April 7, 2026 09:34

jonasbhend reviewed May 8, 2026

Louis-Frey wants to merge 2 commits into main from MRB-807-avoid-regridding-forecasts
