Skip to content

Feat: Add TOAST bloat estimation#1

Merged
Alain-L merged 18 commits into
mainfrom
dev
Mar 13, 2026
Merged

Feat: Add TOAST bloat estimation#1
Alain-L merged 18 commits into
mainfrom
dev

Conversation

@Alain-L

@Alain-L Alain-L commented Feb 18, 2026

Copy link
Copy Markdown
Owner

Summary

  • TOAST bloat estimation via --toast flag, using PostgreSQL system catalogs (no pgstattuple required)
  • Go implementation: avg(length(chunk_data)) full scan on TOAST tables — exact chunk average, ~90 ms per 100 MB
  • Standalone SQL (sql/toast_bloat.sql): DO block + named cursor using pg_column_size() on main tables — no objects created, handles both EXTERNAL and EXTENDED storage
  • JSON output support for TOAST bloat results
  • Integration tests covering all TOAST profiles

Accuracy (vs pgstattuple)

Table Profile pgstattuple Go Standalone Delta max
audit_log 4 KB EXTERNAL 69.46% 69.6% 69.5% +0.14
toast_aligned 2 full chunks 54.95% 55.0% 55.0% +0.05
toast_large 20 KB, 11 chunks 52.88% 53.0% 53.0% +0.12
toast_mixed 2.1/8/15 KB mixed 52.60% 52.7% 52.7% +0.10
toast_narrow 2.1 KB, 2 chunks 51.66% 51.7% 51.7% +0.04
toast_multi_col 2 columns 48.98% 49.0% 48.9% -0.08
toast_compressed EXTENDED 47.55% 47.6% 47.6% +0.05

All within < 0.2 pts of pgstattuple on tables >= 10 MB.

Files changed

  • analysis/toast_bloat.go — TOAST bloat detection with temp helper function
  • analysis/types.goToastBloat struct
  • sql/toast_bloat.sql — Standalone estimation query (unique in the PG community)
  • cmd/root.go--toast / --heap flag wiring
  • output/text.go — TOAST text output with severity grouping
  • output/json.go — TOAST JSON output
  • tests/toast_estimate_test.go — Integration tests
  • testdata/demo.sql — Demo database with 8 TOAST profiles
  • README.md — Documentation and usage examples
  • CHANGELOG.md — v0.3.0 release notes
  • .goreleaser.yml — Include standalone SQL in packages

Test plan

  • go test ./... — 46/46 pass
  • ./bin/qwash --estimate --toast -d qwash_demo — verified
  • ./bin/qwash --estimate --toast --json -d qwash_demo — verified
  • Standalone SQL vs pgstattuple — all within < 0.2 pts
  • --heap --toast combined mode — verified

alesage added 18 commits December 24, 2025 15:46
Add --toast flag to estimate TOAST table bloat using ppc (pages per chunk) algorithm.

Features:
- New --toast and --heap flags for target selection (--heap is default)
- TOAST bloat estimation using chunk sampling for accuracy
- Summary view with severity grouping (critical/high/medium)
- Detailed view for specific tables (-t flag)
- Reliability indicator for tables < 10 MB

Usage:
  qwash --estimate --toast -d mydb
  qwash --estimate --toast -t mytable -d mydb
Add createBloatedToastTable helper with realistic ~10KB/row data
(5 chunks per value, STORAGE EXTERNAL) and 7 test cases covering
basic/high/low/no bloat, small tables, CLI output, and chunk sampling.
- Sample full-size chunks (chunk_seq > 0) with fallback to avoid
  partial last-chunk bias in ppc_ref calculation
- Filter out tables with relpages = 0 (stale/uninitialized stats)
- Use current_setting('block_size') instead of hardcoded 8192
- Move helper function to pg_temp schema for crash-safe cleanup
- Add stale stats warning when no VACUUM in the last 24 hours
  (joins pg_stat_user_tables for last_vacuum/last_autovacuum)
- Align sql/toast_bloat.sql reference file with embedded query
TOAST tables do use the FSM for new insertions (via heap_insert),
contrary to what was previously stated. The reason UPDATE col=col
doesn't compact TOAST is that PostgreSQL reuses the existing TOAST
pointer when the value hasn't changed, not because TOAST is append-only.
The previous chunk_seq > 0 sampling strategy returned the partial last
chunk for values with exactly 2 chunks (e.g. ~3KB data), producing a
~48% false-positive bloat estimate on non-bloated tables.

New strategy: find a multi-chunk chunk_id first, then read its
chunk_seq=0 which is always exactly TOAST_MAX_CHUNK_SIZE (1996 bytes
on 8kB blocks). Falls back to any chunk for single-chunk-only tables.

Adds TestToastEstimateTwoChunkValues regression test.
More readable output for standalone usage: min_pages_required and
bloat_pages are directly comparable to toast_pages, consistent with
the min_pages_required pattern in sql/table_bloat.sql.

The embedded Go query is unchanged (uses raw values internally).
Compute TOAST_MAX_CHUNK_SIZE as block_size/4 - 52 instead of
sampling actual chunks via PL/pgSQL function. Uses constants CTE
pattern matching table_bloat.sql style.
Add --toast and --heap flags to command reference, quick start
examples, output sample, and bloat estimation explanation.
Replace hand-crafted README output examples with real qwash output
from a demo database. Fixes order_items at 41% shown in CRITICAL
(≥50%) section. Examples now show all severity levels (CRITICAL,
HIGH, MEDIUM) with correct categorization.

Add omitempty to BloatReport.Indexes JSON tag to suppress
"indexes": null from JSON output when no index data is present.
Add toast field to BloatReport JSON structure so --toast --json
produces proper TOAST bloat data instead of empty output. Works
for all combinations: --toast alone, --heap --toast, and with -t
table filtering. Add TOAST JSON example to README.
The sampling function was reading only chunk_seq=0 (always full-size
at TOAST_MAX_CHUNK_SIZE), ignoring that the last chunk of each value
is typically much smaller. This caused systematic underestimation.

Fix: average all chunks of a sampled value to capture the real
per-chunk size. On test data, TOAST bloat estimation goes from
55% to 69.6%, matching pgstattuple reference (69.5%).
Replace single-value sampling with avg(length(chunk_data)) across
all live chunks. This naturally handles multi-column tables and
mixed payload sizes with ~90 ms per 100 MB of TOAST data.
Replace octet_length() with pg_column_size() which reads va_extsize
from the TOAST pointer. Unlike octet_length() which returns the
decompressed size, pg_column_size() returns the actual stored size,
making it correct for both EXTERNAL and EXTENDED storage.

Rewrite standalone SQL as DO block + named cursor with dynamic
UNION ALL of pg_column_size() per table. No objects created.

Add toast_compressed table to demo for EXTENDED storage testing.
Use exact multiples of TOAST_MAX_CHUNK_SIZE (1996 bytes) in test
payloads to avoid partial last chunks that skew avg(length(chunk_data))
and cause false bloat overestimation with the full avg scan approach.
- CHANGELOG: Add v0.3.0 section with TOAST estimation feature
- Project docs: Add toast_bloat.go, toast_bloat.sql, --toast/--heap flags,
  update implementation status and TODO list
- sql/embed.go: Clarify that toast_bloat.sql is standalone, not embedded
- README: Precise TOAST estimation method (pg_column_size, not constant)
- goreleaser: Include standalone SQL queries in deb/rpm packages
Already in .gitignore, should not be tracked.
- Bump version references from 0.2.0 to 0.3.0
- Fix package filenames: add linux_ prefix to match goreleaser output
- Update TOAST output examples with realistic data
@Alain-L Alain-L merged commit 9feb83b into main Mar 13, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant