
Fix stoi crash in Arrow format string parsing for w: and +w: types#21692

Merged
Mytherin merged 2 commits into duckdb:v1.5-variegata from yharby:fix/arrow-stoi-geoparquet
Mar 30, 2026

Conversation

@yharby (Contributor) commented Mar 29, 2026

Summary

  • format.find(':') in arrow_duck_schema.cpp scanned the entire Arrow format string, matching colons in extension metadata (e.g. GEOMETRY('ogc:crs84') from GeoParquet CRS). std::stoi then received a non-numeric substring and failed with "stoi: no conversion", crashing the parse.
  • Replaced with fixed offsets per the Arrow C Data Interface spec: colon is always at position 1 for w:NN (fixed-size binary) and position 2 for +w:NN (fixed-size list).
  • Added regression test calling GetTypeFromFormat directly with w: format strings.
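To make the failure mode in the first bullet concrete, here is a minimal sketch of the old "everything after the first colon" step. The helper name OldFindColonParse is hypothetical, and the input is just the metadata text quoted above; how such a string reached this branch in DuckDB is not reproduced here.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical reproduction of the old parsing step: take everything after
// the first ':' found anywhere in the string, then hand it to std::stoi.
// Returns either the parsed width or the exception message from std::stoi.
std::string OldFindColonParse(const std::string &format) {
    const std::string tail = format.substr(format.find(':') + 1);
    try {
        return "parsed width " + std::to_string(std::stoi(tail));
    } catch (const std::invalid_argument &e) {
        // libc++ reports "stoi: no conversion"; libstdc++ reports "stoi".
        return e.what();
    }
}
```

For "w:16" the first colon happens to be the right one and the result is "parsed width 16". For "GEOMETRY('ogc:crs84')" the tail is "crs84')", which has no leading digits, so std::stoi throws std::invalid_argument.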

Fixes #21691
Ref: duckdb/duckdb-wasm#2199

Changes

src/function/table/arrow/arrow_duck_schema.cpp (2 lines):

  • Line 191: format.substr(format.find(':') + 1) -> format.substr(2)
  • Line 230: format.substr(format.find(':') + 1) -> format.substr(3)
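A minimal sketch of what the two changed lines compute, assuming the caller has already matched the w: or +w: prefix. The function names are hypothetical and not part of DuckDB's API.

```cpp
#include <string>

// Per the Arrow C Data Interface, the ':' sits at index 1 in "w:NN"
// (fixed-size binary) and index 2 in "+w:NN" (fixed-size list), so the
// number always starts at a fixed offset. No validation is done here.
int FixedSizeBinaryWidth(const std::string &format) {
    return std::stoi(format.substr(2));  // "w:16" -> "16" -> 16
}

int FixedSizeListSize(const std::string &format) {
    return std::stoi(format.substr(3));  // "+w:4" -> "4" -> 4
}
```

Note that std::stoi stops at the first non-digit character, so a string like "w:16abc" would still yield 16; an explicit format check before parsing is what rejects such malformed input.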

test/arrow/arrow_roundtrip.cpp: new TEST_CASE("Test Arrow fixed-size binary format parsing") exercising w:1, w:16, w:128.

Test plan

  • New unit test passes (6 assertions)
  • All 31 existing [arrow] tests pass (131,509 assertions, 0 failures)
  • Formatted with clang-format 11.0.1

format.find(':') scanned the entire format string, matching colons in
extension metadata (e.g. GEOMETRY('ogc:crs84') from GeoParquet CRS).
Use fixed offsets per the Arrow C Data Interface spec instead: colon is
always at position 1 for "w:NN" and position 2 for "+w:NN".

Fixes duckdb#21691
Ref: duckdb/duckdb-wasm#2199
@yharby yharby changed the base branch from main to v1.5-variegata March 29, 2026 17:05
@Mytherin (Collaborator) left a comment


Thanks! LGTM - one comment

Comment thread src/function/table/arrow/arrow_duck_schema.cpp
Verify format conforms to spec before parsing (w:NN and +w:NN),
throwing InvalidInputException for malformed strings.
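The spec check described in this commit could look roughly like the sketch below. It is not the merged code: ParseFixedWidthFormat is a hypothetical name, and std::invalid_argument stands in for DuckDB's InvalidInputException so the example compiles on its own.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Accept only "w:" or "+w:" followed by one or more ASCII digits, then parse.
// std::invalid_argument is a stand-in for DuckDB's InvalidInputException.
int ParseFixedWidthFormat(const std::string &format) {
    std::size_t offset;
    if (format.rfind("w:", 0) == 0) {
        offset = 2;  // fixed-size binary: "w:NN"
    } else if (format.rfind("+w:", 0) == 0) {
        offset = 3;  // fixed-size list: "+w:NN"
    } else {
        throw std::invalid_argument("not a w: or +w: format: " + format);
    }
    const std::string digits = format.substr(offset);
    if (digits.empty()) {
        throw std::invalid_argument("missing width in format: " + format);
    }
    for (char c : digits) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            throw std::invalid_argument("non-numeric width in format: " + format);
        }
    }
    // Digits only at this point; std::stoi can still throw std::out_of_range.
    return std::stoi(digits);
}
```

Checking the prefix with rfind(prefix, 0) == 0 is the pre-C++20 equivalent of starts_with, and validating every character up front means std::stoi never sees trailing garbage such as "16abc".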
@Mytherin Mytherin marked this pull request as draft March 30, 2026 07:47
@Mytherin Mytherin marked this pull request as ready for review March 30, 2026 07:47
@Mytherin Mytherin merged commit f995d86 into duckdb:v1.5-variegata Mar 30, 2026
127 of 148 checks passed
@yharby yharby deleted the fix/arrow-stoi-geoparquet branch March 30, 2026 12:31
carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Apr 7, 2026
carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Apr 7, 2026
carlopi added a commit to carlopi/duckdb-wasm that referenced this pull request Apr 7, 2026
carlopi added a commit to duckdb/duckdb-wasm that referenced this pull request Apr 7, 2026

Development

Successfully merging this pull request may close these issues.

Arrow format string parsing crashes with stoi on parameterized types (e.g. GEOMETRY with CRS)

2 participants