Conversation
added 18 commits
December 24, 2025 15:46
Add --toast flag to estimate TOAST table bloat using ppc (pages per chunk) algorithm. Features: - New --toast and --heap flags for target selection (--heap is default) - TOAST bloat estimation using chunk sampling for accuracy - Summary view with severity grouping (critical/high/medium) - Detailed view for specific tables (-t flag) - Reliability indicator for tables < 10 MB Usage: qwash --estimate --toast -d mydb qwash --estimate --toast -t mytable -d mydb
Add createBloatedToastTable helper with realistic ~10KB/row data (5 chunks per value, STORAGE EXTERNAL) and 7 test cases covering basic/high/low/no bloat, small tables, CLI output, and chunk sampling.
- Sample full-size chunks (chunk_seq > 0) with fallback to avoid
partial last-chunk bias in ppc_ref calculation
- Filter out tables with relpages = 0 (stale/uninitialized stats)
- Use current_setting('block_size') instead of hardcoded 8192
- Move helper function to pg_temp schema for crash-safe cleanup
- Add stale stats warning when no VACUUM in the last 24 hours
(joins pg_stat_user_tables for last_vacuum/last_autovacuum)
- Align sql/toast_bloat.sql reference file with embedded query
TOAST tables do use the FSM for new insertions (via heap_insert), contrary to what was previously stated. The reason UPDATE col=col doesn't compact TOAST is that PostgreSQL reuses the existing TOAST pointer when the value hasn't changed, not because TOAST is append-only.
The previous chunk_seq > 0 sampling strategy returned the partial last chunk for values with exactly 2 chunks (e.g. ~3KB data), producing a ~48% false-positive bloat estimate on non-bloated tables. New strategy: find a multi-chunk chunk_id first, then read its chunk_seq=0 which is always exactly TOAST_MAX_CHUNK_SIZE (1996 bytes on 8kB blocks). Falls back to any chunk for single-chunk-only tables. Adds TestToastEstimateTwoChunkValues regression test.
More readable output for standalone usage: min_pages_required and bloat_pages are directly comparable to toast_pages, consistent with the min_pages_required pattern in sql/table_bloat.sql. The embedded Go query is unchanged (uses raw values internally).
Compute TOAST_MAX_CHUNK_SIZE as block_size/4 - 52 instead of sampling actual chunks via PL/pgSQL function. Uses constants CTE pattern matching table_bloat.sql style.
Add --toast and --heap flags to command reference, quick start examples, output sample, and bloat estimation explanation.
Replace hand-crafted README output examples with real qwash output from a demo database. Fixes order_items at 41% shown in CRITICAL (≥50%) section. Examples now show all severity levels (CRITICAL, HIGH, MEDIUM) with correct categorization. Add omitempty to BloatReport.Indexes JSON tag to suppress "indexes": null from JSON output when no index data is present.
Add toast field to BloatReport JSON structure so --toast --json produces proper TOAST bloat data instead of empty output. Works for all combinations: --toast alone, --heap --toast, and with -t table filtering. Add TOAST JSON example to README.
The sampling function was reading only chunk_seq=0 (always full-size at TOAST_MAX_CHUNK_SIZE), ignoring that the last chunk of each value is typically much smaller. This caused systematic underestimation. Fix: average all chunks of a sampled value to capture the real per-chunk size. On test data, TOAST bloat estimation goes from 55% to 69.6%, matching pgstattuple reference (69.5%).
Replace single-value sampling with avg(length(chunk_data)) across all live chunks. This naturally handles multi-column tables and mixed payload sizes with ~90 ms per 100 MB of TOAST data.
Replace octet_length() with pg_column_size() which reads va_extsize from the TOAST pointer. Unlike octet_length() which returns the decompressed size, pg_column_size() returns the actual stored size, making it correct for both EXTERNAL and EXTENDED storage. Rewrite standalone SQL as DO block + named cursor with dynamic UNION ALL of pg_column_size() per table. No objects created. Add toast_compressed table to demo for EXTENDED storage testing.
Use exact multiples of TOAST_MAX_CHUNK_SIZE (1996 bytes) in test payloads to avoid partial last chunks that skew avg(length(chunk_data)) and cause false bloat overestimation with the full avg scan approach.
- CHANGELOG: Add v0.3.0 section with TOAST estimation feature - Project docs: Add toast_bloat.go, toast_bloat.sql, --toast/--heap flags, update implementation status and TODO list - sql/embed.go: Clarify that toast_bloat.sql is standalone, not embedded - README: Precise TOAST estimation method (pg_column_size, not constant) - goreleaser: Include standalone SQL queries in deb/rpm packages
Already in .gitignore, should not be tracked.
- Bump version references from 0.2.0 to 0.3.0 - Fix package filenames: add linux_ prefix to match goreleaser output - Update TOAST output examples with realistic data
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
--toastflag, using PostgreSQL system catalogs (nopgstattuplerequired)avg(length(chunk_data))full scan on TOAST tables — exact chunk average, ~90 ms per 100 MBsql/toast_bloat.sql): DO block + named cursor usingpg_column_size()on main tables — no objects created, handles both EXTERNAL and EXTENDED storageAccuracy (vs pgstattuple)
All within < 0.2 pts of pgstattuple on tables >= 10 MB.
Files changed
analysis/toast_bloat.go— TOAST bloat detection with temp helper functionanalysis/types.go—ToastBloatstructsql/toast_bloat.sql— Standalone estimation query (unique in the PG community)cmd/root.go—--toast/--heapflag wiringoutput/text.go— TOAST text output with severity groupingoutput/json.go— TOAST JSON outputtests/toast_estimate_test.go— Integration teststestdata/demo.sql— Demo database with 8 TOAST profilesREADME.md— Documentation and usage examplesCHANGELOG.md— v0.3.0 release notes.goreleaser.yml— Include standalone SQL in packagesTest plan
go test ./...— 46/46 pass./bin/qwash --estimate --toast -d qwash_demo— verified./bin/qwash --estimate --toast --json -d qwash_demo— verified--heap --toastcombined mode — verified