Tags: hardwood-hq/hardwood

1.0-early-access

#656 Read legacy repeated-field list encodings

Parquet's LIST backward-compatibility rules require readers to accept legacy
encodings that Hardwood previously surfaced incorrectly. A bare unannotated
REPEATED field is a required list of required elements, but it was assembled as
a scalar: a repeated primitive printed a single value, and a repeated group a
single struct. A legacy two-level list-of-lists was unreadable because
NestedLevelComputer emitted no repetition layer for a bare repeated primitive,
leaving the leaf's layer count short of its max repetition level. Both are now
read per spec.
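
To illustrate what "a required list of required elements" means for a bare
repeated primitive, here is a minimal Python sketch of record assembly from
repetition and definition levels. It is not Hardwood's code: the function name
`assemble_bare_repeated` is hypothetical, and it assumes a top-level field such
as `repeated int32 nums`, so the max repetition level and max definition level
are both 1.

```python
def assemble_bare_repeated(rep_levels, def_levels, values):
    """Assemble a top-level bare `repeated` primitive into one list per record.

    Assumes max repetition level 1 and max definition level 1
    (hypothetical helper, for illustration only).
    """
    records = []
    value_iter = iter(values)
    for rep, dfn in zip(rep_levels, def_levels):
        if rep == 0:
            # Repetition level 0 starts a new record, hence a new list.
            records.append([])
        if dfn == 1:
            # Definition level 1 means an element is present.
            # Definition level 0 marks an empty list and carries no value.
            records[-1].append(next(value_iter))
    return records


# Three records: [1, 2], an empty list, and [3, 4, 5].
rows = assemble_bare_repeated(
    rep_levels=[0, 1, 0, 0, 1, 1],
    def_levels=[1, 1, 0, 1, 1, 1],
    values=[1, 2, 3, 4, 5],
)
```

The point of the sketch is that every record yields a list, even when it holds
one element or none; a reader that returns a single scalar loses that shape.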

The fix lives in the RowReader assembly (a bare repeated field is wrapped in a
synthetic required LIST whose element is the field itself) and in the shared
NestedLevelComputer layer model, so the ColumnReader path is corrected by the
same change. getListElement now implements all five element-resolution rules,
including the repeated-child rule that tells a genuine list-of-lists element
from a synthetic single-field wrapper.
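
The five element-resolution rules can be sketched roughly as follows. This is a
paraphrase of the Parquet LIST backward-compatibility rules in Python, not
Hardwood's getListElement: the `Field` class and `resolve_list_element`
function are hypothetical, and the fall-through case is modeled as the standard
three-level form, where the repeated group is a wrapper around a single
element field.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Field:
    """Hypothetical minimal schema node; children is None for a primitive."""
    name: str
    repetition: str  # "required", "optional", or "repeated"
    children: Optional[List["Field"]] = None


def resolve_list_element(list_name: str, repeated: Field) -> Tuple[Field, str]:
    """Return (element field, element repetition) for a LIST's repeated child."""
    # Rule 1: the repeated field is not a group, so it is the element itself.
    if repeated.children is None:
        return repeated, "required"
    # Rule 2: a group with multiple fields is the element itself.
    if len(repeated.children) > 1:
        return repeated, "required"
    only = repeated.children[0]
    # Rule 3: a single repeated child means the group is a genuine element
    # (for example an inner legacy list), not a synthetic wrapper.
    if only.repetition == "repeated":
        return repeated, "required"
    # Rule 4: legacy writer naming conventions mark the group as the element.
    if repeated.name in ("array", list_name + "_tuple"):
        return repeated, "required"
    # Rule 5 (assumed reading): the group is a wrapper and its single child
    # is the element, keeping the child's own repetition.
    return only, only.repetition


# Modern three-level form: the element is the wrapper's single child.
modern = Field("list", "repeated", [Field("element", "optional")])
element, repetition = resolve_list_element("my_list", modern)
```

Rule order matters in this sketch: the repeated-child check runs before the
name check, so a single-field group holding a repeated child is never unwrapped.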

Fixtures for the legacy forms are produced by footer surgery in
simple-datagen.py, since PyArrow only emits the modern fully-annotated forms.

v1.0.0.CR2

[release]copy for tag v1.0.0.CR2

v1.0.0.CR1

[release]copy for tag v1.0.0.CR1

v1.0.0.Beta2

[release]copy for tag v1.0.0.Beta2

v1.0.0.Beta1-docs

[release] Updating release notes and CLI demo script

v1.0.0.Beta1

[release] Fixing POMs

v1.0.0.Alpha1-docs

#109 Adding API Reference link to docs

v1.0.0.Alpha1

[release]copy for tag v1.0.0.Alpha1