Feat: Add TOAST bloat estimation by Alain-L · Pull Request #1 · Alain-L/qwash

Alain-L · 2026-02-18T22:29:04Z

Summary

TOAST bloat estimation via --toast flag, using PostgreSQL system catalogs (no pgstattuple required)
Go implementation: avg(length(chunk_data)) full scan on TOAST tables — exact chunk average, ~90 ms per 100 MB
Standalone SQL (sql/toast_bloat.sql): DO block + named cursor using pg_column_size() on main tables — no objects created, handles both EXTERNAL and EXTENDED storage
JSON output support for TOAST bloat results
Integration tests covering all TOAST profiles

Accuracy (vs pgstattuple)

Table	Profile	pgstattuple	Go	Standalone	Delta max
audit_log	4 KB EXTERNAL	69.46%	69.6%	69.5%	+0.14
toast_aligned	2 full chunks	54.95%	55.0%	55.0%	+0.05
toast_large	20 KB, 11 chunks	52.88%	53.0%	53.0%	+0.12
toast_mixed	2.1/8/15 KB mixed	52.60%	52.7%	52.7%	+0.10
toast_narrow	2.1 KB, 2 chunks	51.66%	51.7%	51.7%	+0.04
toast_multi_col	2 columns	48.98%	49.0%	48.9%	-0.08
toast_compressed	EXTENDED	47.55%	47.6%	47.6%	+0.05

All within < 0.2 pts of pgstattuple on tables >= 10 MB.

Files changed

analysis/toast_bloat.go — TOAST bloat detection with temp helper function
analysis/types.go — ToastBloat struct
sql/toast_bloat.sql — Standalone estimation query (unique in the PG community)
cmd/root.go — --toast / --heap flag wiring
output/text.go — TOAST text output with severity grouping
output/json.go — TOAST JSON output
tests/toast_estimate_test.go — Integration tests
testdata/demo.sql — Demo database with 8 TOAST profiles
README.md — Documentation and usage examples
CHANGELOG.md — v0.3.0 release notes
.goreleaser.yml — Include standalone SQL in packages

Test plan

go test ./... — 46/46 pass
./bin/qwash --estimate --toast -d qwash_demo — verified
./bin/qwash --estimate --toast --json -d qwash_demo — verified
Standalone SQL vs pgstattuple — all within < 0.2 pts
--heap --toast combined mode — verified

Add --toast flag to estimate TOAST table bloat using ppc (pages per chunk) algorithm. Features: - New --toast and --heap flags for target selection (--heap is default) - TOAST bloat estimation using chunk sampling for accuracy - Summary view with severity grouping (critical/high/medium) - Detailed view for specific tables (-t flag) - Reliability indicator for tables < 10 MB Usage: qwash --estimate --toast -d mydb qwash --estimate --toast -t mytable -d mydb

Add createBloatedToastTable helper with realistic ~10KB/row data (5 chunks per value, STORAGE EXTERNAL) and 7 test cases covering basic/high/low/no bloat, small tables, CLI output, and chunk sampling.

- Sample full-size chunks (chunk_seq > 0) with fallback to avoid partial last-chunk bias in ppc_ref calculation - Filter out tables with relpages = 0 (stale/uninitialized stats) - Use current_setting('block_size') instead of hardcoded 8192 - Move helper function to pg_temp schema for crash-safe cleanup - Add stale stats warning when no VACUUM in the last 24 hours (joins pg_stat_user_tables for last_vacuum/last_autovacuum) - Align sql/toast_bloat.sql reference file with embedded query

TOAST tables do use the FSM for new insertions (via heap_insert), contrary to what was previously stated. The reason UPDATE col=col doesn't compact TOAST is that PostgreSQL reuses the existing TOAST pointer when the value hasn't changed, not because TOAST is append-only.

The previous chunk_seq > 0 sampling strategy returned the partial last chunk for values with exactly 2 chunks (e.g. ~3KB data), producing a ~48% false-positive bloat estimate on non-bloated tables. New strategy: find a multi-chunk chunk_id first, then read its chunk_seq=0 which is always exactly TOAST_MAX_CHUNK_SIZE (1996 bytes on 8kB blocks). Falls back to any chunk for single-chunk-only tables. Adds TestToastEstimateTwoChunkValues regression test.

More readable output for standalone usage: min_pages_required and bloat_pages are directly comparable to toast_pages, consistent with the min_pages_required pattern in sql/table_bloat.sql. The embedded Go query is unchanged (uses raw values internally).

Compute TOAST_MAX_CHUNK_SIZE as block_size/4 - 52 instead of sampling actual chunks via PL/pgSQL function. Uses constants CTE pattern matching table_bloat.sql style.

Add --toast and --heap flags to command reference, quick start examples, output sample, and bloat estimation explanation.

Replace hand-crafted README output examples with real qwash output from a demo database. Fixes order_items at 41% shown in CRITICAL (≥50%) section. Examples now show all severity levels (CRITICAL, HIGH, MEDIUM) with correct categorization. Add omitempty to BloatReport.Indexes JSON tag to suppress "indexes": null from JSON output when no index data is present.

Add toast field to BloatReport JSON structure so --toast --json produces proper TOAST bloat data instead of empty output. Works for all combinations: --toast alone, --heap --toast, and with -t table filtering. Add TOAST JSON example to README.

The sampling function was reading only chunk_seq=0 (always full-size at TOAST_MAX_CHUNK_SIZE), ignoring that the last chunk of each value is typically much smaller. This caused systematic underestimation. Fix: average all chunks of a sampled value to capture the real per-chunk size. On test data, TOAST bloat estimation goes from 55% to 69.6%, matching pgstattuple reference (69.5%).

Replace single-value sampling with avg(length(chunk_data)) across all live chunks. This naturally handles multi-column tables and mixed payload sizes with ~90 ms per 100 MB of TOAST data.

Replace octet_length() with pg_column_size() which reads va_extsize from the TOAST pointer. Unlike octet_length() which returns the decompressed size, pg_column_size() returns the actual stored size, making it correct for both EXTERNAL and EXTENDED storage. Rewrite standalone SQL as DO block + named cursor with dynamic UNION ALL of pg_column_size() per table. No objects created. Add toast_compressed table to demo for EXTENDED storage testing.

Use exact multiples of TOAST_MAX_CHUNK_SIZE (1996 bytes) in test payloads to avoid partial last chunks that skew avg(length(chunk_data)) and cause false bloat overestimation with the full avg scan approach.

- CHANGELOG: Add v0.3.0 section with TOAST estimation feature - Project docs: Add toast_bloat.go, toast_bloat.sql, --toast/--heap flags, update implementation status and TODO list - sql/embed.go: Clarify that toast_bloat.sql is standalone, not embedded - README: Precise TOAST estimation method (pg_column_size, not constant) - goreleaser: Include standalone SQL queries in deb/rpm packages

Already in .gitignore, should not be tracked.

- Bump version references from 0.2.0 to 0.3.0 - Fix package filenames: add linux_ prefix to match goreleaser output - Update TOAST output examples with realistic data

alesage added 18 commits December 24, 2025 15:46

Test: Add TOAST bloat estimation integration tests

4f644d5

Add createBloatedToastTable helper with realistic ~10KB/row data (5 chunks per value, STORAGE EXTERNAL) and 7 test cases covering basic/high/low/no bloat, small tables, CLI output, and chunk sampling.

Docs: Rewrite standalone toast_bloat.sql without helper function

40bd33d

Compute TOAST_MAX_CHUNK_SIZE as block_size/4 - 52 instead of sampling actual chunks via PL/pgSQL function. Uses constants CTE pattern matching table_bloat.sql style.

Docs: Document TOAST bloat estimation in README

dc1177e

Add --toast and --heap flags to command reference, quick start examples, output sample, and bloat estimation explanation.

Feat: Add TOAST bloat to JSON output

0ec510c

Add toast field to BloatReport JSON structure so --toast --json produces proper TOAST bloat data instead of empty output. Works for all combinations: --toast alone, --heap --toast, and with -t table filtering. Add TOAST JSON example to README.

Fix: Use full avg scan for TOAST chunk size estimation

cb6f80f

Replace single-value sampling with avg(length(chunk_data)) across all live chunks. This naturally handles multi-column tables and mixed payload sizes with ~90 ms per 100 MB of TOAST data.

Fix: Use aligned payloads in TOAST tests for accurate estimation

30b4731

Use exact multiples of TOAST_MAX_CHUNK_SIZE (1996 bytes) in test payloads to avoid partial last chunks that skew avg(length(chunk_data)) and cause false bloat overestimation with the full avg scan approach.

Fix: Remove project docs from version control

83db194

Already in .gitignore, should not be tracked.

Fix: Remove research notes from version control

19e9dda

Docs: Update README for v0.3.0 release

d59122e

- Bump version references from 0.2.0 to 0.3.0 - Fix package filenames: add linux_ prefix to match goreleaser output - Update TOAST output examples with realistic data

Alain-L merged commit 9feb83b into main Mar 13, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat: Add TOAST bloat estimation#1

Feat: Add TOAST bloat estimation#1
Alain-L merged 18 commits into
mainfrom
dev

Alain-L commented Feb 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Alain-L commented Feb 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Accuracy (vs pgstattuple)

Files changed

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Alain-L commented Feb 18, 2026 •

edited

Loading