Skip to content

analyzeQuery: relations omits base tables nested in CTE / derived-table definitions (can't enumerate a query's physical backing tables) #234

@ujaval403

Description

@ujaval403

Summary

analyzeQuery(...).analysis.relations reports only the direct FROM/JOIN relations of the top-level query. When a query reads from a CTE or a derived table, relations lists the CTE/subquery alias but not the physical base tables referenced inside those definitions — so there's no way, from analyzeQuery alone, to enumerate the real backing tables a query touches.

Repro (v0.5.1, DuckDB dialect)

import { analyzeQuery, Dialect } from '@polyglot-sql/sdk';

analyzeQuery('WITH c AS (SELECT id FROM orders) SELECT id FROM c', Dialect.DuckDB).analysis.relations;
// → [{ name: 'c', alias: null, kind: 'cte', columns: ['id'] }]
//   the physical table `orders` is absent

analyzeQuery('SELECT x FROM (SELECT id FROM orders) d', Dialect.DuckDB).analysis.relations;
// → [{ name: 'd', kind: 'derived_table', ... }]   — `orders` absent

analyzeQuery('SELECT * FROM (SELECT * FROM (SELECT id FROM orders) x) y', Dialect.DuckDB).analysis.relations;
// → [{ name: 'y', kind: 'derived_table', ... }]   — `orders` absent

A flat query is fine: SELECT a FROM orders UNION SELECT a FROM usersrelations lists both orders and users.

Expected / desired

A way to get the transitive set of physical base tables a query reads — including those defined inside CTE and derived-table scopes. Either:

  • a flag on analyzeQuery (e.g. include nested/transitive relations), or
  • a dedicated field (backingTables / baseTables), or
  • at minimum, documenting that relations is direct-scope-only and pointing to the supported way to enumerate all base tables.

Why

A common static-analysis need is "which physical tables does this query depend on?" (lineage graphs, permission/scope checks, cache-invalidation keys). projections[].upstream resolves per output column through CTEs, but a query's backing tables aren't necessarily all reachable from the projected columns (e.g. a table joined only for filtering, or a column dropped by an intermediate aggregate), so projection-upstream isn't a complete substitute.

Notes

  • getSourceTables(column, sql) traces a single output column transitively, but it's column-scoped (and errors with Parse error at line 0, column 0: Cannot find column '<x>' in query when asked for a column that exists only inside a subquery, not in the top output) — so it doesn't cover "all backing tables" either.
  • Related: Python API: expose compact query analysis facts without Python AST traversal #229 (compact analysis facts) specified relations but didn't cover transitive base-table enumeration.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions