Make schema parsing lazy in Metadata #6117
Merged
yili-db merged 2 commits on Feb 27, 2026
// Logical data schema excluding partition columns
private final Lazy<StructType> dataSchema;

public Metadata(
Collaborator
who calls this now? e.g. just a bunch of tests?
Contributor
Author
It's called by `TransactionMetadataFactory` and many tests.
dengsh12 approved these changes Feb 26, 2026
huashi-st pushed a commit to huashi-st/delta that referenced this pull request Apr 24, 2026
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [ ] Other (fill in here)

## Description

Make schema parsing lazy in `Metadata` so that loading a snapshot no longer eagerly deserializes `schemaString`.

Tables created with earlier Delta versions can have `VOID` type columns. Previously, `Metadata.fromColumnVector()` eagerly called `DataTypeJsonSerDe.deserializeStructType()`, which throws a `KernelException` for unsupported types like `VOID`. This blocked the entire snapshot load even for callers that never access the parsed schema.

## How was this patch tested?

- **New unit test in `MetadataSuite`:** *"schema parsing is lazy - void type does not block non-schema access"* constructs a `Metadata` via `fromRow` with a `VOID`-type schema string, verifies `getId()`, `getConfiguration()`, and `getSchemaString()` work without triggering parsing, then verifies `getSchema()` throws `KernelException` with the `VOID` message.
- **Updated integration test in `DeltaTableReadsSuite`:** *"table with void type - schema parsing is lazy"* creates a Delta table with a `VOID` column, verifies `latestSnapshot()` succeeds and `getVersion()` works, then verifies `snapshot.getSchema` throws the expected `KernelException`.
- Existing `MetadataSuite` tests (configuration merging, serialization round trip) continue to pass.

## Does this PR introduce any user-facing changes?

**Yes.** Previously, calling `Table.getLatestSnapshot()` (or any snapshot resolution) on a table with unsupported schema types (e.g., `VOID`) would throw a `KernelException` immediately. After this change, snapshot loading succeeds and the exception is deferred to the point where the schema is actually accessed (e.g., `snapshot.getSchema()`, `snapshot.getScanBuilder()`). Callers that only need metadata like table ID, version, or configuration are no longer blocked.
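The deferral described above can be illustrated with a minimal, self-contained Java sketch. It is not the Kernel implementation: `LazyValue`, `SketchMetadata`, `parseSchema`, and `LazySchemaDemo` are hypothetical names invented for this example, the "parsed schema" is just a string, and `UnsupportedOperationException` stands in for `KernelException`. The only point is that construction stores the raw schema string and the parser runs on the first `getSchema()` call.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical memoizing holder: computes the value on first get() and caches it. */
final class LazyValue<T> {
  private final Supplier<T> supplier;
  private T value;
  private boolean computed;

  LazyValue(Supplier<T> supplier) {
    this.supplier = supplier;
  }

  synchronized T get() {
    if (!computed) {
      // If the supplier throws, nothing is cached and the next call retries.
      value = supplier.get();
      computed = true;
    }
    return value;
  }
}

/** Hypothetical stand-in for a metadata action that parses its schema on demand. */
final class SketchMetadata {
  private final String id;
  private final String schemaString;
  private final Map<String, String> configuration;
  private final LazyValue<String> schema;

  SketchMetadata(String id, String schemaString, Map<String, String> configuration) {
    this.id = id;
    this.schemaString = schemaString;
    this.configuration = configuration;
    // Only a supplier is stored here; the parser does not run during construction.
    this.schema = new LazyValue<>(() -> parseSchema(schemaString));
  }

  String getId() { return id; }

  String getSchemaString() { return schemaString; }

  Map<String, String> getConfiguration() { return configuration; }

  /** First call triggers parsing, so unsupported types surface here. */
  String getSchema() { return schema.get(); }

  /** Toy parser (assumption): rejects schema strings that mention a "void" type. */
  static String parseSchema(String json) {
    if (json.contains("\"void\"")) {
      throw new UnsupportedOperationException("unsupported type: void");
    }
    return json.trim();
  }
}

final class LazySchemaDemo {
  public static void main(String[] args) {
    SketchMetadata metadata =
        new SketchMetadata("table-1", "{\"name\":\"c\",\"type\":\"void\"}", Map.of("k", "v"));
    // Non-schema accessors work even though the schema cannot be parsed.
    System.out.println(metadata.getId() + " " + metadata.getConfiguration());
    try {
      metadata.getSchema();
    } catch (UnsupportedOperationException e) {
      System.out.println("deferred failure: " + e.getMessage());
    }
  }
}
```

In this sketch a failed parse is not cached, so every `getSchema()` call on an unsupported schema throws again. Whether the real `Lazy` helper behaves the same way is not stated in the PR.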