Summary
- Today, types for plans/expressions come from two places: (a) node fields (notably `ValDef.dataType` and schemas on scan/value nodes), and (b) `symbol.symbolInfo.dataType` attached to trees. This can drift and is ambiguous for contributors.
- The goal is to make symbols the single authoritative source for types where it matters (definitions and stable sources), while keeping `relationType` computation functional for intermediate operators. This matches Scala 3's denotation-driven design.
Motivation / Goals
- Single source of truth for types at definitions and named entities.
- Eliminate drift between node fields and symbol info.
- Clarify API: readers consult `node.symbol.dataType` when a symbol exists; intermediate plans continue deriving `relationType` from children.
- Align with the Scala 3 compiler's symbols/denotations model.
Non-Goals
- Do not remove `schema` from leaf source nodes (e.g., scans/files); those nodes originate schemas.
- Do not rework all intermediate operators to store types on nodes; they continue to compute `relationType` from children.
- Do not overhaul expression typing in one pass.
Scope
- In-scope: `ValDef`, `ModelDef`, and leaf source relations (`Values`, `TableScan`, `FileScan`, `ModelScan`, `IncrementalTableScan`).
- Out-of-scope (for now): assigning symbols to every intermediate `Relation`/`Expression` node.
Current State (where DataType/RelationType fields are read)
- `ValDef.dataType` is parsed and read by the typer and codegen; `SymbolLabeler` already seeds `ValSymbolInfo.tpe` from it.
- `ModelDef.givenRelationType` is optional; `relationType` falls back to `child.relationType`. `ModelSymbolInfo.tpe` is created from this.
- Leaf sources carry `schema: RelationType` and compute `relationType` from it (possibly with column projection). The schema is resolved by catalog/inspection passes.
Design (Target)
- Symbols are authoritative for definitions and named sources:
  - `ValDef`: read/write type via `v.symbol.dataType`.
  - `ModelDef`: continue as-is; treat `m.symbol.dataType` as the source of truth.
  - Source relations (`Values`/`*Scan`): keep `schema` on the node, but seed `node.symbol.dataType = schema` so downstream readers can consistently use symbol views.
- Intermediate operators: keep computing `relationType` from inputs (no new fields). Optionally expose `node.symbol.dataType` when a symbol exists, otherwise fall back to `relationType`.
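The lookup rule in this design can be sketched with a small self-contained model. This is only an illustration: `TypeSymbol`, `PlanNode`, `Scan`, `Limit`, `NamedType`, and the `Option`-based symbol slot are hypothetical stand-ins, not the actual wvlet classes, and the real accessor may end up shaped differently.

```scala
// Simplified, hypothetical stand-ins for the compiler's types.
sealed trait DataType
final case class NamedType(name: String) extends DataType
final case class RelationType(columns: List[(String, DataType)]) extends DataType

// A symbol carries the canonical type once a labeling phase has seeded it.
final class TypeSymbol(var dataType: DataType)

trait PlanNode:
  // None models "no symbol assigned" (e.g., an intermediate operator).
  def symbol: Option[TypeSymbol]
  // Structural type derived from the node itself or from its children.
  def relationType: RelationType

// Leaf source: keeps its schema on the node and may also carry a symbol.
final case class Scan(schema: RelationType, symbol: Option[TypeSymbol]) extends PlanNode:
  def relationType: RelationType = schema

// Intermediate operator: no symbol, type is computed from the child.
final case class Limit(child: PlanNode, n: Int) extends PlanNode:
  def symbol: Option[TypeSymbol] = None
  def relationType: RelationType = child.relationType

// Uniform accessor: prefer the symbol's type, otherwise fall back to relationType.
def tpe(node: PlanNode): DataType =
  node.symbol.map(_.dataType).getOrElse(node.relationType)
```

In this model, `tpe(Limit(scan, 10))` resolves through `relationType` because `Limit` has no symbol, while `tpe(scan)` returns whatever type the scan's symbol holds.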
Plan
1. Definitions first (easy win)
   - Add a typed accessor (helper/extension) `tpe(node) = node.symbol.dataType` with fallback to current field where needed.
   - Update reads in:
     - `TypeResolver`: pattern-match on `v.symbol.dataType` for table value constants; avoid reading `v.dataType` directly.
     - `WvletGenerator`: print val types from `v.symbol.dataType`, fallback to parsed type for BC.
     - Tests: assert on `valDef.symbol.dataType` instead of `valDef.dataType`.
   - Keep `ValDef.dataType` for parsing; deprecate direct reads in internal code.
2. Symbolize source relations (moderate)
   - In `SymbolLabeler` or a new light phase (e.g., `SourceSymbolizer` after labeler): assign symbols to `Values`, `TableScan`, `FileScan`, `ModelScan`, `IncrementalTableScan` and set `symbolInfo.dataType = node.schema` (or computed schema for `Values`).
   - Add an invariant check in tests: when a plan has a symbol, `plan.symbol.dataType == plan.relationType`.
   - Gradually migrate internal reads that want a uniform API to consult `node.symbol.dataType` when present.
3. Cleanup + guardrails
   - Deprecate internal usage of `ValDef.dataType`; plan removal in a future minor once call sites are switched.
   - Add a lint/check in specs to flag direct `ValDef.dataType` reads in compiler/optimizer packages.
   - Document the rule: "Definitions and sources → types on SymbolInfo; operators → compute relationType; expressions keep structural `def dataType` until a later phase."
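Step 2 (seeding symbols on source relations and checking the invariant) could look roughly like the sketch below. It uses its own simplified model so that it runs on its own: `SymbolInfo`, `Plan`, `TableScan`, `Filter`, `symbolizeSources`, and `invariantHolds` here are hypothetical, and the real phase would work on the existing wvlet symbol and plan classes instead.

```scala
// Hypothetical, simplified model; not the wvlet API.
final case class RelationType(columns: List[String])

// Symbol info whose type is empty until a labeling phase seeds it.
final class SymbolInfo(var dataType: Option[RelationType] = None)

sealed trait Plan:
  def relationType: RelationType
  def children: List[Plan]
  // Mutable slot filled in by the symbolizing phase; None until then.
  var symbolInfo: Option[SymbolInfo] = None

// Leaf source: originates the schema.
final case class TableScan(name: String, schema: RelationType) extends Plan:
  def relationType: RelationType = schema
  def children: List[Plan] = Nil

// Intermediate operator: derives its type from the child.
final case class Filter(child: Plan, cond: String) extends Plan:
  def relationType: RelationType = child.relationType
  def children: List[Plan] = List(child)

// Walks the plan and seeds a symbol on every leaf source from its schema.
def symbolizeSources(plan: Plan): Unit =
  plan match
    case s: TableScan => s.symbolInfo = Some(SymbolInfo(Some(s.schema)))
    case _            => ()
  plan.children.foreach(symbolizeSources)

// Invariant: wherever a symbol type exists, it equals the node's relationType.
def invariantHolds(plan: Plan): Boolean =
  val ok = plan.symbolInfo.flatMap(_.dataType).forall(_ == plan.relationType)
  ok && plan.children.forall(invariantHolds)
```

A spec could run `symbolizeSources` on a parsed plan and then assert `invariantHolds(plan)`; nodes without a symbol pass trivially, which matches the "when a plan has a symbol" wording of the invariant.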
Acceptance Criteria
- No functional regressions in `./sbt test`.
- `ValDef` typing flows entirely through `symbol.dataType`; codegen and `TypeResolver` no longer rely on `v.dataType` (except in parsing and initial seeding).
- Source relations have symbols with `dataType` matching their `relationType`.
- New helper to get a node's type consistently is available and used in updated sites.
Risks / Considerations
- Rewrites must preserve symbols. Our `SyntaxTreeNode.copyInstance` already copies metadata and updates `symbol.tree`.
- Avoid double sources of truth by treating schema fields as inputs and symbol info as the canonical view post-labeling.
- Performance: symbol lookups are O(1); minimal impact.
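To make the first risk concrete, here is a minimal sketch of what "preserving symbols across a rewrite" means. `Sym`, `Node`, and `copyWithChildren` are hypothetical; they only imitate the behavior described for `SyntaxTreeNode.copyInstance` (carry the metadata over and re-point `symbol.tree`) and say nothing about how that method is actually implemented.

```scala
// Hypothetical stand-ins for a symbol and a tree node.
final class Sym(val name: String):
  // Back-pointer from the symbol to the tree that currently defines it.
  var tree: Option[Node] = None

final case class Node(label: String, children: List[Node]):
  // Metadata slot that is not part of the case-class fields.
  var symbol: Option[Sym] = None

// Copies a node with new children, carries the symbol over,
// and re-points symbol.tree at the new copy.
def copyWithChildren(n: Node, newChildren: List[Node]): Node =
  val copied = n.copy(children = newChildren)
  copied.symbol = n.symbol
  copied.symbol.foreach(s => s.tree = Some(copied))
  copied
```

A plain `n.copy(...)` in this model would silently drop the symbol, which is the failure mode any rewrite rule bypassing the shared copy helper would need to guard against.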
Effort Estimate
- Step 1 (definitions): 0.5–1 day including tests and codegen.
- Step 2 (sources): 1–2 days to assign symbols + invariants.
- Step 3 (cleanup): <0.5 day.
Task Checklist
- Add typed accessor for `symbol.dataType` (with safe fallback where appropriate).
- Switch `TypeResolver` table-value-constant logic to `v.symbol.dataType`.
- Switch `WvletGenerator` to prefer symbol types for `val` printing.
- Assign symbols to `Values`, `TableScan`, `FileScan`, `ModelScan`, `IncrementalTableScan` and seed `symbolInfo.dataType`.
- Add invariant tests equating `plan.symbol.dataType` and `plan.relationType` on symbolized nodes.
- Update parser tests to assert symbol types instead of node fields.
- Add developer note in CONTRIBUTING/docs about symbol-centric typing.
- Deprecate internal reads of `ValDef.dataType`; plan removal milestone.
References
- Mirrors Scala 3’s denotation-driven design: trees are immutable; types live on symbols and evolve by phase.
- Relevant files to touch (paths):
  - `wvlet-lang/src/main/scala/wvlet/lang/compiler/analyzer/TypeResolver.scala`
  - `wvlet-lang/src/main/scala/wvlet/lang/compiler/analyzer/SymbolLabeler.scala` (or new phase)
  - `wvlet-lang/src/main/scala/wvlet/lang/compiler/codegen/WvletGenerator.scala`
  - `wvlet-lang/src/main/scala/wvlet/lang/model/plan/plan.scala` (`ValDef`)
  - `wvlet-lang/src/test/scala/...` parser/typing specs