🐛 Enrich partial indicator-upgrader mapping in Anomalist#6260
🐛 Enrich partial indicator-upgrader mapping in Anomalist#6260paarriagadap wants to merge 3 commits into
Conversation
|
Quick links (staging server):
Login: chart-diff: ✅No charts for review.data-diff: ✅ No differences foundAutomatically updated datasets matching excess_mortality|covid|fluid|flunet|country_profile|garden/ihme_gbd/2019/gbd_risk are not included Edited: 2026-06-12 13:02:27 UTC |
… in Anomalist The indicator upgrader persists variable mappings only for charted indicators. Anomalist used any non-empty wizard mapping as-is, so its upgrade detectors (upgrade_missing / upgrade_change) silently scanned only those indicators — a partial mapping suppressed the shortName-inference fallback that an empty table would have triggered. Found on the PFMH update, where the USA's dropped pre-1929 history was flagged only on expenditure (the lone charted indicator) and invisible on the other nine. load_variable_mapping now always infers pairs by shortName for each new dataset and its previous version, and overlays the upgrader's explicit entries on top (explicit wins). owidbot now passes the new->old dataset pairs it already computes, so the staging bot benefits too. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1d43f6afee
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Hi @pabloarosado! Claude suggested doing this, so we can access all the This is relevant because we might use indicators in MDims that are not in charts. Let me know if this makes sense. |
Summary
Fixes a coverage gap in Anomalist's upgrade detectors, found during the Public Finances in Modern History update (#6256).
The bug. Anomalist's
upgrade_missing/upgrade_changedetectors only compare old→new variable pairs from the wizard's variable-mapping table. The indicator upgrader persists mappings only for charted indicators, andload_variable_mappingused any non-empty table as-is — so a partial mapping silently restricted the upgrade detectors to the charted subset, while suppressing the shortName-inference fallback that an empty table would have triggered. A partial mapping was strictly worse than none.Real instance. In #6256, the IMF dropped all USA observations before 1929 across all ten indicators, and revised ~1,600 country-year values per column. Only one indicator (expenditure) is used in charts, so only it was in the mapping — Anomalist flagged the USA data loss on expenditure alone, and the other nine indicators (including gross debt, with revisions up to 26 pp of GDP) were never scanned by either upgrade detector.
Why full coverage matters. A dataset update upgrades all of the dataset's indicators, not just the ones currently feeding standalone charts. Indicators outside the chart-derived mapping are still consumed through MDims and explorers, published in the catalog and data API, cited in articles, and may get charted later. Anomalist QA is the gate where regressions in any of them should surface; scoping it to the charted subset means, e.g., an MDim-only indicator could lose or corrupt data without any anomaly ever being raised.
The fix.
load_variable_mappingnow always infers old→new pairs by shortName for each new dataset and its previous version, and overlays the indicator upgrader's explicit entries on top (explicit wins). The empty-table and no-pairs behaviors are unchanged.owidbot/anomalist.pyalready computed the new→old dataset pairs but discarded them, callingload_variable_mappingwithout them — meaning the staging bot never even reached the inference fallback. It now passes them through.Testing
Against the #6256 staging DB (which has the real partial mapping — 2 expenditure-only entries):
load_variable_mapping([7987], {7987: 7049})→ 11 pairs covering all 10 indicators, with both explicit entries preserved and 9 pairs enriched by shortName (log line confirms).load_variable_mapping([7987])(no pairs, old bot call shape) → 2 pairs, unchanged behavior.upgrade_missing(previously: expenditure only) and grewupgrade_changefrom expenditure-only to 1,049 reduced rows.make checkpasses.🤖 Generated with Claude Code