Compress LLM context to save tokens and reduce costs
Real session stats: 3,003 compressions · 178,442 tokens saved · 24.7% avg reduction · up to 92% with dedup
sqz compresses command output before it reaches your LLM. Single Rust binary, zero config.
The real win is dedup: when the same file gets read 5 times in a session, sqz sends it once and returns a 13-token reference for every repeat.
| | Without sqz | With sqz |
|---|---|---|
| File read #1 | 2,000 tokens | ~800 tokens (compressed) |
| File read #2 | 2,000 tokens | ~13 tokens (dedup ref) |
| File read #3 | 2,000 tokens | ~13 tokens (dedup ref) |
| Total | 6,000 tokens | ~826 tokens (86% saved) |
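The arithmetic behind this comparison can be sketched in a few lines. This is a back-of-the-envelope model, not sqz's code: it assumes the first read costs the compressed size and every repeat costs a flat 13-token reference. The function names `session_tokens` and `percent_saved` are made up for illustration.

```python
def session_tokens(reads: int, compressed: int = 800, ref: int = 13) -> int:
    """Tokens sent for `reads` reads of one unchanged file under the model above."""
    if reads <= 0:
        return 0
    # First read is sent compressed; each later read is a short reference.
    return compressed + (reads - 1) * ref


def percent_saved(before: int, after: int) -> int:
    """Whole-number percentage saved, rounded to the nearest integer."""
    return round(100 * (before - after) / before)


without_sqz = 3 * 2000          # three uncompressed reads of a 2,000-token file
with_sqz = session_tokens(3)    # 800 + 13 + 13
print(with_sqz, percent_saved(without_sqz, with_sqz))  # 826 86
```

Because repeats are nearly free in this model, the saving grows with the number of re-reads, which is why sessions that revisit the same files benefit most.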
24.7% average reduction across 3,003 real compressions · 92% saved on repeated file reads · 86% on shell/git output · 13-token refs for cached content
One developer's week, measured from actual sqz gain output:
$ sqz gain
sqz token savings (last 7 days)
──────────────────────────────────────────────────
04-13      2,329 saved
04-14          0 saved
04-15     12,954 saved
04-16      9,223 saved
04-17     14,752 saved
04-18    105,569 saved
04-19     30,882 saved
04-20      4,334 saved
──────────────────────────────────────────────────
Total: 3,003 compressions, 178,442 tokens saved (24.7% avg reduction)
Single-command compression (measured via cargo test -p sqz-engine benchmarks):
| Content | Before | After | Saved |
|---|---|---|---|
| Repeated log lines | 148 | 62 | 58% |
| Large JSON array | 259 | 142 | 45% |
| JSON API response | 64 | 53 | 17% |
| Git diff | 61 | 54 | 12% |
| Prose/docs | 124 | 121 | 2% |
| Stack trace (safe mode) | 82 | 82 | 0% |
The real savings come from the cache, which sends each file once; every repeat costs 13 tokens:
| Scenario | Without sqz | With sqz | Saved |
|---|---|---|---|
| Same file read 5× | 10,000 | 826 | 92% |
| Same JSON response 3× | 192 | 79 | 59% |
| Test-fix-test cycle (3 runs) | 15,000 | 5,186 | 65% |
Single-command compression ranges from 2% to 58% depending on content, and safe-mode content such as stack traces passes through at 0%. Repeated reads drop to 13 tokens each. Your mileage will vary with how repetitive your tool calls are: agentic sessions with many file re-reads see the biggest wins.
Prebuilt binaries (no compiler required, works on every platform):
# macOS / Linux
curl -fsSL https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.sh | sh
# Windows (PowerShell)
irm https://raw.githubusercontent.com/ojuschugh1/sqz/main/install.ps1 | iex
# Any platform via npm
npm install -g sqz-cli
# macOS / Linux via Homebrew
brew tap ojuschugh1/sqz
brew install sqz
Build from source via Cargo:
cargo install sqz-cli sqz-mcp
sqz-cli provides the sqz binary; sqz-mcp provides the MCP server. sqz-engine is a library dependency; it compiles automatically and does not need to be installed separately.
Building from source (cargo install sqz-cli) works too, but needs a C toolchain:
- Linux: build-essential (apt) or equivalent
- macOS: Xcode Command Line Tools (xcode-select --install)
- Windows: Visual Studio Build Tools with the "Desktop development with C++" workload
Without these, cargo install fails with a linker error (on Windows, "linker link.exe not found"). If you don't already have them, use the PowerShell or npm install above instead.
Then initialize:
sqz init --global # hooks apply to every project on this machine
# or
sqz init # hooks apply to just this project (.claude/settings.local.json)
--global writes to ~/.claude/settings.json (the user scope per the
Anthropic scope table),
so the sqz hook fires in every Claude Code session on this machine. This is
the common case on first install. Your existing permissions, env,
statusLine, and unrelated hooks in ~/.claude/settings.json are
preserved; sqz merges its entries rather than overwriting.
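The merge-rather-than-overwrite behavior can be pictured with a minimal sketch. It is not sqz's implementation: the hook entry follows the general shape of Claude Code's PreToolUse hook configuration, but the exact matcher and command that sqz writes are assumptions, and `merge_hook` is a hypothetical helper.

```python
import json

# Hypothetical hook entry; the exact JSON that sqz writes is an assumption.
SQZ_HOOK = {
    "matcher": "Bash",
    "hooks": [{"type": "command", "command": "sqz hook claude"}],
}


def merge_hook(settings: dict, entry: dict) -> dict:
    """Return a copy of settings with entry added under hooks.PreToolUse.

    Every other key is left untouched, and the entry is never added twice.
    """
    merged = json.loads(json.dumps(settings))  # cheap deep copy of JSON data
    pre_tool_use = merged.setdefault("hooks", {}).setdefault("PreToolUse", [])
    if entry not in pre_tool_use:
        pre_tool_use.append(entry)
    return merged


existing = {
    "permissions": {"allow": ["Bash(git status)"]},
    "env": {"EDITOR": "vim"},
    "hooks": {"PostToolUse": [{"matcher": "Edit", "hooks": []}]},
}
merged = merge_hook(existing, SQZ_HOOK)
print(json.dumps(merged, indent=2))
```

Running the merge a second time returns the same settings, so repeating the install step in this sketch does not pile up duplicate hooks.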
Plain sqz init (project scope) is useful when you want sqz active only
inside one repo.
Only using one agent? Pass --only (or --skip) to limit which
configs are written:
sqz init --only opencode # just OpenCode, nothing else
sqz init --only opencode,codex # OpenCode and Codex
sqz init --skip cursor,windsurf # everything except Cursor and Windsurf
Accepted names: claude, cursor, windsurf, cline, gemini,
kiro, opencode, codex. Aliases (claude-code, gemini-cli, roo,
kiro-cli) also work. --only and --skip can't be combined.
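The selection rules above can be sketched as a small resolver. This is an illustration only: `select_agents` is a hypothetical function, and the alias targets are guesses (in particular, mapping roo to cline is an assumption, since the text lists the aliases without saying what each one resolves to).

```python
from typing import List, Optional, Set

AGENTS = ["claude", "cursor", "windsurf", "cline", "gemini", "kiro", "opencode", "codex"]

# Alias targets are assumptions; "roo" -> "cline" in particular is a guess.
ALIASES = {"claude-code": "claude", "gemini-cli": "gemini", "roo": "cline", "kiro-cli": "kiro"}


def parse_names(csv: str) -> Set[str]:
    """Split a comma-separated list, resolve aliases, and reject unknown names."""
    names = set()
    for raw in csv.split(","):
        raw = raw.strip()
        name = ALIASES.get(raw, raw)
        if name not in AGENTS:
            raise ValueError(f"unknown agent: {raw}")
        names.add(name)
    return names


def select_agents(only: Optional[str] = None, skip: Optional[str] = None) -> List[str]:
    """Return the agents to configure, in the canonical order of AGENTS."""
    if only and skip:
        raise ValueError("--only and --skip can't be combined")
    if only:
        chosen = parse_names(only)
        return [a for a in AGENTS if a in chosen]
    if skip:
        skipped = parse_names(skip)
        return [a for a in AGENTS if a not in skipped]
    return list(AGENTS)


print(select_agents(only="opencode,codex"))   # ['opencode', 'codex']
print(select_agents(skip="cursor,windsurf"))
```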
sqz init round-trips your config file through a JSON parser to merge
the sqz entry, which drops any comments in your opencode.jsonc (and
the analogous JSON-with-comments files other tools accept). If you've
commented your config carefully and want to keep those comments, install by hand
instead.
OpenCode, in two steps:
1. Drop the plugin file in place. sqz prints the generated TS to stdout so you don't have to hand-write the path-escaping logic:
   mkdir -p ~/.config/opencode/plugins
   sqz print-opencode-plugin > ~/.config/opencode/plugins/sqz.ts
2. Add the MCP entry to your existing opencode.jsonc yourself. Append this block inside the top-level mcp object (create the mcp object if it doesn't exist):
Comments in the rest of your file stay put. OpenCode auto-discovers
the plugin file; no plugin array entry needed (adding one causes
double-loading, see issue #10).
Other tools: Claude Code, Cursor, Windsurf, Cline, Gemini CLI,
and Codex use plain JSON configs without comment support, so the
automated path is non-destructive there. Use sqz init --only <tool>
for those.
That's it. Shell hooks installed, AI tool hooks configured.
sqz installs a PreToolUse hook that intercepts bash commands before your AI tool runs them. The output gets compressed transparently; the AI tool never knows.
Claude → git status → [sqz hook rewrites] → compressed output (85% smaller)
What gets compressed:
- Shell output: git, cargo, npm, docker, kubectl, ls, grep, etc.
- JSON: strips nulls, compact encoding
- Logs: collapses repeated lines
- Test output: shows failures only
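Two of the transforms in this list, collapsing repeated log lines and stripping JSON nulls, are easy to sketch. The code below is a minimal illustration, not sqz's pipeline: the function names, the wording of the "more identical lines" marker, and the choice to keep nulls inside arrays are all assumptions.

```python
import json


def collapse_repeats(lines, max_repeated_lines=3):
    """Keep at most max_repeated_lines copies of each run of identical lines."""
    out = []
    i = 0
    while i < len(lines):
        j = i
        while j < len(lines) and lines[j] == lines[i]:
            j += 1  # advance to the end of the current run
        run = j - i
        out.extend(lines[i:i + min(run, max_repeated_lines)])
        if run > max_repeated_lines:
            out.append(f"[... {run - max_repeated_lines} more identical lines]")
        i = j
    return out


def strip_nulls(value):
    """Recursively drop null-valued object keys; null array items are kept."""
    if isinstance(value, dict):
        return {k: strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_nulls(v) for v in value]
    return value


def compact_json(text):
    """Strip nulls, then re-encode without any optional whitespace."""
    return json.dumps(strip_nulls(json.loads(text)), separators=(",", ":"))


print(collapse_repeats(["connecting..."] * 5 + ["ready"]))
print(compact_json('{"id": 1, "email": null, "tags": ["a", null], "meta": {"trace": null}}'))
# {"id":1,"tags":["a",null],"meta":{}}
```

Dropping null array items would shift the positions of the remaining elements, so this sketch leaves arrays alone; whether sqz does the same is not stated here.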
What doesn't get compressed:
- Stack traces, error messages, secrets: routed to safe mode (0% compression)
- Your prompts and the AI's responses: controlled by the AI tool, not sqz
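One common way to spot secrets is to measure the Shannon entropy of each token, since random keys use far more distinct characters than ordinary words. The sketch below illustrates that general technique only; the length cutoff and entropy threshold are arbitrary assumptions, the function names are hypothetical, and it does not attempt to detect stack traces or error messages.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_like_secret(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens whose characters look close to random (assumed cutoffs)."""
    return len(token) >= min_length and shannon_entropy(token) > threshold


def needs_safe_mode(line: str) -> bool:
    """True if any whitespace-separated token on the line looks like a secret."""
    return any(looks_like_secret(tok) for tok in line.split())


print(needs_safe_mode("export TOKEN sk9Xq2LmZ7pR4vT8wB1nC6yH3dF5gJ0a"))
print(needs_safe_mode("cargo build finished without warnings"))
```

A line flagged this way would be passed through unchanged instead of being compressed, which matches the 0% figure for safe mode.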
| Tool | Integration | Setup |
|---|---|---|
| Claude Code | PreToolUse hook (transparent) | sqz init |
| Cursor | PreToolUse hook (transparent) | sqz init |
| Windsurf | PreToolUse hook (transparent) | sqz init |
| Cline | PreToolUse hook (transparent) | sqz init |
| Gemini CLI | BeforeTool hook (transparent) | sqz init |
| Kiro | PreToolUse hook (transparent) | sqz init |
| OpenCode | TypeScript plugin (transparent) | sqz init |
| VS Code | Extension | Install from Marketplace |
| JetBrains | Plugin | Install from Marketplace |
| Chrome | Browser extension | ChatGPT, Claude.ai, Gemini, Grok, Perplexity |
| Firefox | Browser extension | Same sites |
sqz init --global # Install hooks for every project on this machine
sqz init # Install hooks for just this project
sqz init --only kiro # Only configure Kiro (skip the rest)
sqz init --only opencode # Only configure OpenCode (skip the rest)
sqz init --skip cursor # Configure every agent except Cursor
sqz compress <text> # Compress (or pipe from stdin)
sqz compress --no-cache # Compress without dedup (always full output)
sqz expand <ref> # Recover original content from a §ref:HASH§ token
sqz compact # Evict stale context to free tokens
sqz gain # Show daily token savings (bar chart)
sqz gain --project . # Per-project daily gains
sqz gain --days 30 # Last 30 days
sqz stats # Cumulative compression report
sqz stats --breakdown # Per-command token usage breakdown
sqz stats --project . # Stats for current project only
sqz stats --project list # List all tracked projects
sqz discover # Find missed savings
sqz resume # Re-inject session context after compaction
sqz vizit # Live terminal dashboard (like htop for AI agents)
sqz hook claude # Process a PreToolUse hook (Claude Code)
sqz hook kiro # Process a PreToolUse hook (Kiro)
sqz print-opencode-plugin # Print OpenCode plugin TS for manual install
sqz proxy --port 8080 # API proxy (compresses full request payloads)
When sqz sees the same content twice, it returns a compact §ref:HASH§ token
instead of the full text. Most models handle this fine, but some (e.g., GLM 5.1)
can't parse the ref format and loop. Four ways to work around this:
# 1. Recover original content from a ref
sqz expand a1b2c3d4 # prefix match
sqz expand '§ref:a1b2c3d4§' # paste the whole token
# 2. Compress without dedup (per-invocation)
echo "..." | sqz compress --no-cache
# 3. Disable dedup globally (env var)
export SQZ_NO_DEDUP=1
# 4. MCP passthrough tool (returns input byte-exact, zero transforms)
# Available via tools/list when sqz-mcp is running
Run sqz gain in your shell any time to see your own daily breakdown (see the
Token Savings section above for what the output looks like), and sqz stats
for the full cumulative report:
$ sqz stats
sqz compression stats
──────────────────────────────────────────────────
178,442 tokens saved
24.7% average reduction
Compressions 3,003
Tokens in 721,840
Tokens out 543,398
Tokens saved 178,442
Avg reduction 24.7%
Cache
──────────────────────────────────────────────────
Entries 43
Size 39.1 KB
Add --breakdown to see exactly which commands consume the most tokens:
$ sqz stats --breakdown
Top Token Consumers
──────────────────────────────────────────────────────────────────────
command        calls  tokens in     out  saved
──────────────────────────────────────────────────────────────────────
dedup            249      45541    3237    93%
stdin             51      30851   24289    21%
auto             132      18288    7740    58%
echo              17       1050     558    47%
ls -la             8        948     948     0%
cargo build        7        170     145    15%
git status         4         56       8    86%
──────────────────────────────────────────────────────────────────────
Per-project filtering:
sqz stats --project . # stats for current project only
sqz stats --project list # list all tracked projects
sqz gain --project . # daily gains for current project
sqz gain --days 30 # last 30 days instead of 7
sqz gain --days 30 --project . # combine both
Stats are stored locally in SQLite under ~/.sqz/sessions.db; nothing leaves your machine.
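To show how a per-command breakdown can be derived from locally stored rows, here is a sketch using Python's built-in sqlite3 module. The schema is invented for illustration (the real layout of sessions.db is not described here), and the individual rows are made up so that their totals match two lines of the breakdown shown earlier.

```python
import sqlite3

# In-memory database with a hypothetical schema; not the real sessions.db layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compressions (command TEXT, tokens_in INTEGER, tokens_out INTEGER)")
conn.executemany(
    "INSERT INTO compressions VALUES (?, ?, ?)",
    [("git status", 30, 4), ("git status", 26, 4), ("cargo build", 170, 145)],
)

# Aggregate per command, largest token consumers first.
rows = conn.execute(
    """
    SELECT command,
           COUNT(*)        AS calls,
           SUM(tokens_in)  AS total_in,
           SUM(tokens_out) AS total_out
    FROM compressions
    GROUP BY command
    ORDER BY total_in DESC
    """
).fetchall()


def reduction(tokens_in: int, tokens_out: int) -> int:
    """Percentage of tokens removed, rounded to a whole number."""
    return round(100 * (tokens_in - tokens_out) / tokens_in) if tokens_in else 0


for command, calls, total_in, total_out in rows:
    print(f"{command:<12} {calls:>3} {total_in:>6} {total_out:>6} {reduction(total_in, total_out):>3}%")
```

Since everything lives in one local file, the same kind of query can be run offline with any SQLite client.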
- Per-command formatters: git status → compact summary, cargo test → failures only, docker ps → name/image/status table
- Structural summaries: code files compressed to imports + function signatures + call graph (~70% reduction). The model sees the architecture, not implementation noise.
- Dedup cache: SHA-256 content hash, persistent across sessions. Second read = 13-token reference.
- JSON pipeline: strip nulls → project out debug fields → flatten → collapse arrays → TOON encoding (lossless compact format)
- Safe mode: stack traces, secrets, migrations detected by entropy analysis and routed through with 0% compression
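The dedup cache idea from the list above can be sketched as a content-addressed store keyed by SHA-256. This is an in-memory illustration under stated assumptions: the class and method names are hypothetical, the 8-character hash prefix in the ref is assumed from the a1b2c3d4 example, and the first read is returned unchanged here, whereas sqz would compress it.

```python
import hashlib


class DedupCache:
    """In-memory stand-in for the dedup cache (the real one persists across sessions)."""

    def __init__(self, ref_length: int = 8):
        self.ref_length = ref_length  # assumed length of the hash prefix in a ref
        self.store = {}               # full hex digest -> original content

    def put(self, content: str) -> str:
        """Return the content on first sight, or a short ref token on a repeat."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest in self.store:
            return f"§ref:{digest[:self.ref_length]}§"
        self.store[digest] = content
        return content

    def expand(self, ref: str) -> str:
        """Resolve a bare hash prefix or a whole ref token back to the content."""
        prefix = ref.strip("§")
        if prefix.startswith("ref:"):
            prefix = prefix[len("ref:"):]
        if not prefix:
            raise KeyError("empty reference")
        matches = [d for d in self.store if d.startswith(prefix)]
        if len(matches) != 1:
            raise KeyError(f"{len(matches)} cache entries match {prefix!r}")
        return self.store[matches[0]]


cache = DedupCache()
source = "fn main() {}\n"
first = cache.put(source)    # full content on the first read
second = cache.put(source)   # short reference on the repeat
print(repr(second), cache.expand(second) == source)
```

Keying on the content hash instead of the file path means an edited file is treated as new content, and identical output from different commands shares one entry.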
For the full technical details, see docs/.
# ~/.sqz/presets/default.toml
[preset]
name = "default"
version = "1.0"
[compression.condense]
enabled = true
max_repeated_lines = 3
[compression.strip_nulls]
enabled = true
[budget]
warning_threshold = 0.70
default_window_size = 200000
- Zero telemetry: no data transmitted, no crash reports
- Fully offline: works in air-gapped environments
- All processing local
git clone https://github.com/ojuschugh1/sqz.git
cd sqz
cargo test --workspace
cargo build --release
Licensed under the Elastic License 2.0 (ELv2). Use, fork, and modify freely, with two restrictions: no competing hosted service, no removing license notices.