Human-readable optimizations #12
Open
peterboncz wants to merge 2 commits into
unambiguous about execution order, and try getting human-readable column names.
How?
- Combine multiple operators with an obvious execution order into one "block", following the template Limit-OrderBy-Project-Aggr-Project-Filter-Scan:
  - we only include Scan if it does not have predicate pushdown.
  - we accept any subsequence of this template of size > 1 (simplify).
- Rename the CTE blocks to tX_operator so that tX is very visible (column names may have a tX_ prefix, where tX is the CTE that introduces the column).
- When serializing, swallow the "AS x" in a SELECT x AS x.
- Use tX_something as the standard column expression name, but try to keep it semantic:
  - Scan injects the column name (t1_colname).
  - operators that pass a column through keep passing its name through.
  - if the user introduced an alias with "AS alias", use it: t3_alias.
  - for aggregations, use t4_sum_name (* becomes star, distinct stays distinct).
- Finally, try to remove tX_ from tX_name, via a check at the end that name is unique in the query.
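The last step above, dropping the tX_ prefix when the bare name is unique, can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this branch: `collapse_prefixes`, `bare_name`, the tiny `RESERVED` set, and the lowercase-only "safe identifier" pattern are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny keyword list for illustration only.
# A real serializer would consult the target dialect's reserved words.
RESERVED = {"select", "from", "where", "group", "order", "by",
            "having", "limit", "join", "as", "with"}

PREFIX = re.compile(r"^t\d+_(.+)$")             # tX_ followed by a name
SAFE_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")  # assumed "safe unquoted" shape


def bare_name(name: str) -> str:
    """Strip a leading tX_ prefix if present, else return the name as-is."""
    m = PREFIX.match(name)
    return m.group(1) if m else name


def collapse_prefixes(names: list[str]) -> dict[str, str]:
    """Map each distinct column name to its collapsed form.

    A name loses its tX_ prefix only if the bare name is unique among all
    names in the query, looks like a safe unquoted identifier, and is not
    a reserved word. Otherwise the prefixed name is kept unchanged.
    """
    distinct = list(dict.fromkeys(names))  # de-duplicate, keep order
    counts = Counter(bare_name(n) for n in distinct)
    result = {}
    for n in distinct:
        bare = bare_name(n)
        can_collapse = (
            counts[bare] == 1
            and SAFE_IDENT.fullmatch(bare) is not None
            and bare not in RESERVED
        )
        result[n] = bare if can_collapse else n
    return result
```

With the input `["t1_a", "t2_a", "t1_b", "t3_order", "t4_sum_a"]`, this sketch keeps `t1_a` and `t2_a` (the bare name `a` is ambiguous) and `t3_order` (a keyword), while `t1_b` becomes `b` and `t4_sum_a` becomes `sum_a`.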
Refactor serialization around a structured SelectParts (BuildSelectParts) shared by the single-line and pretty renderers, and make the default output pretty-printed:
- Pretty-printing: each clause (SELECT / FROM / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT) on its own line, keyword left-aligned at the block indent with the expression area at +8 (or +10 when GROUP BY/ORDER BY is present so the 8-char keywords fit). Comma/AND-separated lists wrap at width 100, continuing at the expression column and never breaking inside an expression. WITH sits on its own line, CTE definitions start at column 0, and CTE bodies are indented 4. CTE bodies are named by the header (no redundant body AS).
- HAVING fusion: a filter directly above an aggregate now folds into the merged block's HAVING clause (admitted only when a GROUP BY/aggregate is present) instead of becoming a separate filter CTE.
- Joins: the right (inner) side, which is always materialized in its own CTE, is suffixed with _materialized_for_join. DuckDB output only; other dialects keep the plain name.
- Tests: add pretty_print/identifiers/merge_pipeline suites and a join-suffix check; update the dialect tests' LIKE patterns for the new whitespace and add a Postgres "no suffix" assertion.
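The list-wrapping rule described above (wrap at a fixed width, continue at the expression column, never split an expression) can be illustrated with a small Python sketch. It is not the renderer from this commit: the function name `wrap_list`, its parameters, and the choice to leave the separator at the end of a wrapped line are assumptions made for the example.

```python
def wrap_list(keyword: str, items: list[str], sep: str = ", ",
              indent: int = 0, expr_col: int = 8, width: int = 100) -> str:
    """Render `keyword item<sep>item...`, wrapping only between items.

    The keyword is left-aligned at `indent`; expressions start at
    `indent + expr_col`, and continuation lines resume at that column.
    An item longer than the available width is still emitted whole.
    """
    trail = sep.rstrip()       # part of the separator kept at line end
    gap = sep[len(trail):]     # whitespace between items on one line
    cont = " " * (indent + expr_col)

    lines = []
    current = " " * indent + keyword.ljust(expr_col)
    line_is_empty = True       # no item placed on the current line yet
    for i, item in enumerate(items):
        is_last = i == len(items) - 1
        token = item if is_last else item + trail
        if line_is_empty:
            current += token
            line_is_empty = False
        elif len(current) + len(gap) + len(token) <= width:
            current += gap + token
        else:
            # Token does not fit: close this line, continue at expr column.
            lines.append(current)
            current = cont + token
    lines.append(current)
    return "\n".join(lines)
```

For instance, `wrap_list("SELECT", ["aaaa", "bbbb", "cccc"], width=20)` puts `aaaa, bbbb,` on the first line and `cccc` on a second line indented to column 8. Passing `expr_col=10` models the wider layout used when GROUP BY or ORDER BY is present, and `sep=" AND "` handles predicate lists.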
Make LPTS-generated SQL compact and human-readable
Motivation
LPTS emitted one CTE per logical-plan operator with opaque, fully-prefixed column names, e.g.:
This branch reworks serialization so that the output is (hopefully) more readable while staying exactly round-trip-equivalent and keeping execution order legible (the plan is cut only at joins, set ops, and subqueries):
What changed
- Pipeline merging: runs of Limit / OrderBy / Project / Aggregate / Project / Filter / Scan operators are folded into one flat SELECT .. FROM .. WHERE .. GROUP BY .. HAVING .. ORDER BY .. LIMIT .. via whole-identifier expression substitution, instead of one CTE each. The plan is cut at joins, set ops, and subqueries; scans with a pushed-down predicate stay in their own CTE. This is a new setting; the legacy one-CTE-per-operator output is still available with SET lpts_merge_pipeline=false.
- Column naming: aggregates are named <func>_<col> (count_star, count_distinct_b, sum_a); only genuine computations fall back to scalar_N. At serialization, a tX_ name collapses to its bare form when that bare name is globally unique and a safe unquoted identifier, so prefixes remain only where needed to disambiguate (e.g. across joins, self-joins, or keywords).
- SELECT inlining: the redundant closing SELECT … FROM <last_cte> is removed; the last CTE's body becomes the result directly (re-aliased to the user-facing names), so trivial queries emit no WITH at all.
- Pretty-printing: each clause sits on its own line, with the expression area at +8 (or +10 with GROUP BY/ORDER BY); comma/AND lists wrap at width 100, continuing at the expression column and never splitting an expression; WITH on its own line, CTE definitions at column 0, bodies indented 4. Built on a structured SelectParts that backs both the single-line and pretty renderers. CTE names are t<index>_<operator>.
- Join suffix: the right (inner) side of a join, which is always materialized in its own CTE, is suffixed with _materialized_for_join (DuckDB output only). This is done to help DuckDB users understand which CTEs will be materialized in memory.
Testing