Tags: hardwood-hq/hardwood
#656 Read legacy repeated-field list encodings

Parquet's LIST backward-compatibility rules require readers to accept legacy encodings that Hardwood surfaced incorrectly. A bare unannotated REPEATED field is a required list of required elements, but was assembled as a scalar (a repeated primitive printed a single value, a repeated group a single struct). A legacy two-level list-of-lists was unreadable: NestedLevelComputer emitted no repetition layer for a bare repeated primitive, leaving the leaf's layer count short of its max repetition level. Both are now read per spec.

The fix lives in the RowReader assembly (a bare repeated field is wrapped in a synthetic required LIST whose element is the field itself) and in the shared NestedLevelComputer layer model, so the ColumnReader path is corrected by the same change. getListElement now implements all five element-resolution rules, including the repeated-child rule that distinguishes a genuine list-of-lists element from a synthetic single-field wrapper.

Fixtures for the legacy forms are produced by footer surgery in simple-datagen.py, since PyArrow only emits the modern fully-annotated forms.
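For illustration, the five element-resolution rules can be sketched as below. This is a minimal Python sketch based on the backward-compatibility rules in the Parquet LogicalTypes specification, not Hardwood's getListElement; the `Node` class and `resolve_list_element` function are hypothetical names introduced here, and the sketch assumes every group has at least one child.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical schema node for this sketch; not Hardwood's actual type."""
    name: str
    repetition: str                          # "required", "optional" or "repeated"
    children: Optional[List["Node"]] = None  # None marks a primitive field

    @property
    def is_group(self) -> bool:
        return self.children is not None


def resolve_list_element(list_name: str, repeated: Node) -> Tuple[Node, bool]:
    """Return (element node, elements are required) for the repeated field
    found inside a LIST-annotated group named `list_name`."""
    # Rule 1: a repeated primitive is itself the element.
    if not repeated.is_group:
        return repeated, True
    # Rule 2: a repeated group with several fields is itself the element.
    if len(repeated.children) > 1:
        return repeated, True
    only = repeated.children[0]
    # Rule 3 (repeated-child rule): a single repeated child means the group
    # is a real element (e.g. an inner list), not a synthetic wrapper.
    if only.repetition == "repeated":
        return repeated, True
    # Rule 4: legacy writer naming conventions mark the group as the element.
    if repeated.name == "array" or repeated.name == list_name + "_tuple":
        return repeated, True
    # Rule 5: standard three-level form; the group is a synthetic wrapper
    # and its single child is the element, with the child's own repetition.
    return only, only.repetition == "required"


# Usage: a two-level legacy list versus the modern three-level form.
legacy = Node("element", "repeated")
modern = Node("list", "repeated", [Node("element", "optional")])
legacy_element, _ = resolve_list_element("my_list", legacy)
modern_element, _ = resolve_list_element("my_list", modern)
```

In the usage lines, `legacy_element` is the repeated primitive itself, while `modern_element` is the `element` child of the wrapper group. The order of the checks matters: the repeated-child test runs before the name-based test, so a single-field group whose only child is repeated is never mistaken for a wrapper.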
[release] Updating release notes and CLI demo script