Human-readable optimizations by peterboncz · Pull Request #12 · cwida/lpts

peterboncz · 2026-06-15T23:48:42Z

Make LPTS-generated SQL compact and human-readable

Motivation

LPTS emitted one CTE per logical-plan operator with opaque, fully-prefixed column names, e.g.:

WITH scan_0 (t0_a, t0_d, t0_b) AS (SELECT a, d, b FROM memory.main.t WHERE a>0), aggregate_1 (t2_d, t3_aggregate_0) AS (SELECT t0_d, sum(t0_a) FROM scan_0 GROUP BY t0_d), order_2 (...) AS (...), limit_3 (...) AS
(...) SELECT t2_d AS d, t3_aggregate_0 AS aggregate_0 FROM limit_3;

This branch reworks serialization so the output becomes (hopefully) better readable while staying exactly round-trip-equivalent, and while keeping execution order legible (the plan is cut only at joins, set ops, and subqueries):

WITH
t0_scan (a, d) AS ( 
    SELECT a, d
    FROM   memory.main.t
    WHERE  (a > 0)   
) 
SELECT   d, sum(a) AS sum_a 
FROM     t0_scan 
GROUP BY d
HAVING   ((sum(a)) > (1))
ORDER BY sum(a) ASC NULLS LAST
LIMIT    5;

What changed

Pipeline fusion (lpts_merge_pipeline, default ON) — a maximal chain of single-child pipeline operators (Limit → OrderBy → Filter → Project → Aggregate → Project → Filter, plus a pushdown-free base-table scan) is
folded into one flat SELECT .. FROM .. WHERE .. GROUP BY .. HAVING .. ORDER BY .. LIMIT .. via whole-identifier expression substitution, instead of one CTE each. Cuts the plan at joins/set-ops/subqueries; scans with a pushed-down predicate stay their own CTE. New setting;
legacy one-CTE-per-operator output is still available with SET lpts_merge_pipeline=false.
Semantic column identifiers — scans use the physical name; pure pass-throughs reuse the source name; user aliases win; aggregates are named <func>_<col> (count_star, count_distinct_b, sum_a); only genuine
computations fall back to scalar_N. At serialization, a tX_ name collapses to its bare form when that bare name is globally unique and a safe unquoted identifier, so prefixes remain only where needed to
disambiguate (e.g. across joins/self-joins/keywords).
Final-SELECT inlining — the redundant closing SELECT … FROM <last_cte> is removed; the last CTE's body becomes the result directly (re-aliased to the user-facing names), so trivial queries emit no WITH at all.
Pretty-printing — each clause on its own line, keyword left-aligned at the block indent with the expression area at +8 (or +10 with GROUP BY/ORDER BY); comma/AND lists wrap at width 100, continuing at the expression column and never splitting an expression; WITH on its own line, CTE definitions at column 0, bodies indented 4. Built on a structured SelectParts that backs both the single-line and pretty renderers. CTE names are t<index>_<operator>.
Join build-side marker — the right (inner) side of a join, which is always materialized in its own CTE, is suffixed _materialized_for_join (DuckDB output only). This is done to help DuckDB users understand which CTEs will be materialized in memory.

Testing

Full unit suite green (added merge_pipeline, identifiers, pretty_print test files and a join-suffix check; updated dialect tests for the new whitespace and added a Postgres "no suffix" assertion).
Every change verified against the SQLStorm TPC-H round-trip benchmark (--tpch_sf 0.001) with incorrect=0 throughout; success rose monotonically: 15724 → 15727 → 15729 → 15757 → 15760 / 17036 (92.51%).

unambiguous about execution order, and itry getting human-readable column names. how? combine multiple operators with obvious execution order in one "block" for Limit-OrderBy-Project-Aggr-Project-Filter-Scan - we only include Scan if it does not have predicate pushdown. - we accep[t any subsequence of this of size > 1 (simplify) rename the CTE blocks tX_operator such that tX is very visible (column names may have tx_ prefix and that is the CTE that introdduces it) when serializing, swallow "AS x" in a SELECT x AS x use tx_something as the standard column expression name, but try to keep it semantic - Scan injects the column name (t1_colname) - operators that pass through keep passing through - if the user introduced an alias with "AS alias", use it: t3_alias - for aggregations, use t4_sum_name (* becomes star, distinct itself) finally, try to remove tx_ from tx_name by a check at the end that name is a unique name in the query.

Refactor serialization around a structured SelectParts (BuildSelectParts) shared by the single-line and pretty renderers, and make the default output pretty-printed: - Pretty-printing: each clause (SELECT / FROM / WHERE / GROUP BY / HAVING / ORDER BY / LIMIT) on its own line, keyword left-aligned at the block indent with the expression area at +8 (or +10 when GROUP BY/ORDER BY is present so the 8-char keywords fit). Comma/AND-separated lists wrap at width 100, continuing at the expression column and never breaking inside an expression. WITH sits on its own line, CTE definitions start at column 0, and CTE bodies are indented 4. CTE bodies are named by the header (no redundant body AS). - HAVING fusion: a filter directly above an aggregate now folds into the merged block's HAVING clause (admitted only when a GROUP BY/aggregate is present) instead of becoming a separate filter CTE. - Joins: the right (inner) side, which is always materialized in its own CTE, is suffixed with _materialized_for_join. DuckDB output only; other dialects keep the plain name. - Tests: add pretty_print/identifiers/merge_pipeline suites and a join-suffix check; update the dialect tests' LIKE patterns for the new whitespace and add a Postgres "no suffix" assertion.

peterboncz added 2 commits June 15, 2026 23:00

peterboncz requested a review from ila June 17, 2026 22:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Human-readable optimizations#12

Human-readable optimizations#12
peterboncz wants to merge 2 commits into
mainfrom
pb/human-readable-optimizations

peterboncz commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

peterboncz commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Make LPTS-generated SQL compact and human-readable

Motivation

What changed

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

peterboncz commented Jun 15, 2026 •

edited

Loading