Next time your Claude Code context is running low, just quit Claude Code, then run `npx ccprune` - it auto-resumes you back into your last thread, now compacted with an intelligent rolling summary. Run it again when you're low again - the summaries stack, so context just keeps rolling forward.
A fork of claude-prune with enhanced features: token-based pruning, AI summarization enabled by default, and improved UX.
- Zero-Config Default: Just run `ccprune` - auto-detects the latest session, keeps 55K tokens (~70K after Claude Code adds system context)
- Token-Based Pruning: Prunes based on actual token count, not message count
- Smart Threshold: Automatically skips pruning if session is under 55K tokens
- AI Summarization: Automatically generates a summary of pruned content (enabled by default)
- Summary Synthesis: Re-pruning synthesizes old summary + new pruned content into one cohesive summary
- Small Session Warning: Prompts for confirmation when auto-selecting sessions with < 5 messages
- Safe by Default: Always preserves session summaries and metadata
- Auto Backup: Creates timestamped backups before modifying files
- Restore Support: Easily restore from backups with the `restore` command
- Dry-Run Preview: Preview changes and summary before committing
```bash
# Using npx (Node.js)
npx ccprune

# Using bunx (Bun)
bunx ccprune

# Using npm
npm install -g ccprune

# Using bun
bun install -g ccprune
```

- Quit Claude Code - press `Ctrl+C` or type `/quit`
- Run prune from the same project directory:

```bash
npx ccprune
```
That's it! ccprune auto-detects your latest session, prunes old messages (keeping a summary), and resumes automatically.
For fast, high-quality summarization, set up a Gemini API key:
- Get a free key from Google AI Studio
- Add it to your shell profile (`~/.zshrc` or `~/.bashrc`): `export GEMINI_API_KEY=your_key`
- Restart your terminal or run `source ~/.zshrc`
With `GEMINI_API_KEY` set, ccprune automatically uses Gemini 2.5 Flash for fast summarization without chunking.
Note: If `GEMINI_API_KEY` is not set, ccprune automatically falls back to the Claude Code CLI for summarization (no additional setup required).
```bash
# Zero-config: auto-detects latest session, keeps 55K tokens
ccprune

# Pick from available sessions interactively
ccprune --pick

# Explicit session ID (if you need a specific session)
ccprune <sessionId>

# Explicit token limit
ccprune --keep 55000
ccprune --keep-tokens 80000

# Subcommands
ccprune restore <sessionId> [--dry-run]
```

`sessionId`: (optional) UUID of the Claude Code session. Auto-detects the latest if omitted.
| Subcommand | Description |
|---|---|
| `restore <sessionId>` | Restore a session from the latest backup |
| `restore <sessionId> --dry-run` | Preview restore without making changes |
| Option | Description |
|---|---|
| `--pick` | Interactively select from available sessions |
| `-n, --no-resume` | Skip automatic session resume |
| `--yolo` | Resume with `--dangerously-skip-permissions` |
| `--resume-model <model>` | Model for resumed session (`opus`, `sonnet`, `haiku`, `opusplan`) |
| `-k, --keep <number>` | Number of tokens to retain (default: 55000) |
| `--keep-tokens <number>` | Number of tokens to retain (alias for `-k`) |
| `--dry-run` | Preview changes and summary without modifying files |
| `--no-summary` | Skip AI summarization of pruned messages |
| `--summary-model <model>` | Model for summarization (`haiku`, `sonnet`, or a full model name) |
| `--summary-timeout <ms>` | Timeout for summarization in milliseconds (default: 360000) |
| `--gemini` | Use Gemini 3 Pro for summarization |
| `--gemini-flash` | Use Gemini 2.5 Flash for summarization |
| `--claude-code` | Use Claude Code CLI for summarization (chunks large transcripts) |
| `--prune-tools` | Replace all non-protected tool outputs with placeholders |
| `--prune-tools-ai` | Use AI to identify which tool outputs to prune |
| `--prune-tools-dedup` | Deduplicate identical tool calls, keeping only the most recent |
| `--prune-tools-max` | Maximum savings: dedup + AI analysis combined |
| `--prune-tools-keep <tools>` | Comma-separated tools to never prune (default: `Edit,Write,TodoWrite,TodoRead,AskUserQuestion`) |
| `-h, --help` | Show help information |
| `-V, --version` | Show version number |
If no session ID is provided, auto-detects the most recently modified session. If no keep option is specified, defaults to 55,000 tokens (~70K actual context after Claude Code adds system prompt and CLAUDE.md).
Summarization priority:
1. `--claude-code` flag: Force Claude Code CLI (chunks transcripts >30K chars)
2. `--gemini` or `--gemini-flash` flags: Use the Gemini API
3. Auto-detect: If `GEMINI_API_KEY` is set, uses Gemini 2.5 Flash
4. Fallback: Claude Code CLI (no API key needed)
```bash
# Simplest: auto-detect, prune, and resume automatically
npx ccprune

# Prune only (don't resume)
npx ccprune -n

# Resume in yolo mode (--dangerously-skip-permissions)
npx ccprune --yolo

# Resume with a specific model (e.g., Opus 4.5)
npx ccprune --resume-model opus

# Combine yolo mode with Opus
npx ccprune --yolo --resume-model opus

# Pick from available sessions interactively
npx ccprune --pick

# Keep 55K tokens (default)
npx ccprune --keep 55000

# Keep 80K tokens (less aggressive pruning)
npx ccprune --keep-tokens 80000

# Preview what would be pruned (shows summary preview too)
npx ccprune --dry-run

# Skip summarization for faster pruning
npx ccprune --no-summary

# Use Claude Code CLI with haiku model (faster/cheaper)
npx ccprune --claude-code --summary-model haiku

# Use Gemini 3 Pro for summarization
npx ccprune --gemini

# Use Gemini 2.5 Flash (default when GEMINI_API_KEY is set)
npx ccprune --gemini-flash

# Force Claude Code CLI for summarization
npx ccprune --claude-code

# Target a specific session by ID
npx ccprune 03953bb8-6855-4e53-a987-e11422a03fc6 --keep 55000

# Restore from the latest backup
npx ccprune restore 03953bb8-6855-4e53-a987-e11422a03fc6
```

Tool pruning runs automatically to reduce tokens before summarization:
- Dedup: Identical tool calls are deduplicated (keeps only most recent)
- AI analysis: Intelligently prunes irrelevant outputs using your summarization backend
```bash
# Default behavior (dedup + AI) - runs automatically
ccprune

# Disable automatic tool pruning
ccprune --skip-tool-pruning

# Explicit modes for specific behavior:
ccprune --prune-tools        # Simple: replace ALL outputs (no AI)
ccprune --prune-tools-dedup  # Dedup only (no AI)
ccprune --prune-tools-ai     # AI only (no dedup)
ccprune --prune-tools-max    # Explicit dedup + AI (same as default)

# Custom protected tools
ccprune --prune-tools-keep "Edit,Write,Bash"
```

Protected tools (never pruned by default):
- `Edit`, `Write` - file modification context
- `TodoWrite`, `TodoRead` - task tracking
- `AskUserQuestion` - user interaction
Modes explained:
- Default (no flags): Runs dedup first (free), then AI analysis - maximum savings
- Simple (`--prune-tools`): Replaces all non-protected tool outputs with `[Pruned: {tool} output - {bytes} bytes]`
- AI (`--prune-tools-ai`): Uses your summarization backend (Gemini or Claude Code CLI) to intelligently identify which outputs are no longer relevant
- Dedup (`--prune-tools-dedup`): Keeps only the most recent output when the same tool is called with identical input. Annotates with `[{total} total calls]`
- Skip (`--skip-tool-pruning`): Disables automatic tool pruning entirely
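The dedup pass can be sketched roughly like this (a hypothetical `dedupToolCalls` helper for illustration, not the actual ccprune source; it assumes each entry carries a tool name and a serialized input):

```typescript
interface ToolCall {
  tool: string;   // tool name, e.g. "Read"
  input: string;  // serialized tool input
  output: string; // tool output (potentially large)
}

// Keep only the most recent output for each identical (tool, input) pair;
// earlier duplicates are replaced with a small placeholder, and the kept
// occurrence is annotated with the total call count.
function dedupToolCalls(calls: ToolCall[]): ToolCall[] {
  const counts = new Map<string, number>();
  const lastIndex = new Map<string, number>();
  calls.forEach((c, i) => {
    const key = `${c.tool}\u0000${c.input}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
    lastIndex.set(key, i); // most recent occurrence wins
  });
  return calls.map((c, i) => {
    const key = `${c.tool}\u0000${c.input}`;
    if (lastIndex.get(key) === i) {
      const total = counts.get(key)!;
      return total > 1
        ? { ...c, output: `${c.output} [${total} total calls]` }
        : c;
    }
    return { ...c, output: `[Pruned: duplicate ${c.tool} call]` };
  });
}
```

This is why dedup is "free": it needs no AI call, just a single pass over the transcript.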
```
BEFORE                AFTER FIRST PRUNE          AFTER RE-PRUNE
──────                ────────────────           ──────────────
┌───────────────┐     ┌───────────────┐          ┌───────────────┐
│ msg 1 (old)   │─┐   │ [SUMMARY]     │─┐        │ [NEW SUMMARY] │ ◄─ synthesized
│ msg 2 (old)   │ │   │ "Previously.."│ │        │ (old+middle)  │
│ ...           │ ├──►├───────────────┤ │        ├───────────────┤
│ msg N (old)   │─┘   │ msg N+1 (kept)│ ├───────►│ msg X (kept)  │
├───────────────┤     │ msg N+2 (kept)│ │        │ msg Y (kept)  │
│ msg N+1 (new) │────►│ msg N+3 (kept)│─┘        │ msg Z (kept)  │
│ msg N+2 (new) │     └───────────────┘          └───────────────┘
│ msg N+3 (new) │
└───────────────┘            ▲                          ▲
                             │                          │
                      old msgs become           old summary + middle
                      summary, recent kept      synthesized, recent kept
```
- Locates Session File: Finds `$CLAUDE_CONFIG_DIR/projects/{project-path}/{sessionId}.jsonl`
- Counts Tokens: Uses Claude's cumulative usage data from the last message: `input_tokens + cache_read_input_tokens + cache_creation_input_tokens`. This matches Claude Code's UI display exactly
- Early Exit: If total tokens ≤ threshold (55K default), skips pruning and auto-resumes
- Preserves Critical Data: Always keeps the first line (file-history-snapshot or session metadata)
- Token-Based Cutoff: Scans right-to-left, accumulating tokens until adding the next message would exceed the threshold
- Content Extraction: Extracts text from messages, including `tool_result` outputs and `thinking` blocks. Tool calls become `[Used tool: ToolName]` placeholders to provide context without verbose tool I/O
- Orphan Cleanup: Removes `tool_result` blocks in kept messages that reference `tool_use` blocks from pruned messages
- AI Summarization: Generates a structured summary with sections: Overview, What Was Accomplished, Files Modified, Key Technical Details, Current State & Pending Work
- Summary Synthesis: Re-pruning synthesizes the old summary + new pruned content into one cohesive summary
  - Gemini (default with API key): Handles large transcripts natively without chunking
  - Claude Code CLI (fallback): May chunk transcripts >30K characters (see Claude Code CLI Summarization below)
- Safe Backup: Creates a timestamped backup in `prune-backup/` before modifying
- Auto-Resume: Optionally resumes the Claude Code session after pruning
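The token-based cutoff can be sketched like this (a hypothetical `SessionMessage` shape and `findCutoff` helper, not ccprune's actual internals; it assumes per-message token estimates have already been computed from the usage data described above):

```typescript
interface SessionMessage {
  id: string;
  tokens: number; // estimated tokens attributed to this message
}

// Walk from the newest message backwards, accumulating tokens until the
// next (older) message would push the total past the threshold. The index
// returned is the first kept message: everything before it is pruned.
function findCutoff(messages: SessionMessage[], keepTokens: number): number {
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (total + messages[i].tokens > keepTokens) {
      // Lenient boundary: include this one extra message to preserve context
      return i;
    }
    total += messages[i].tokens;
  }
  return 0; // everything fits under the threshold, nothing to prune
}
```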
When using the `--claude-code` flag (or when `GEMINI_API_KEY` is not set), ccprune uses the Claude Code CLI for summarization with these specific behaviors:
Chunking for Large Transcripts:
- Transcripts >30,000 characters are automatically split into chunks
- Each chunk is summarized independently
- Chunk summaries are then combined into a final unified summary
- Why: Ensures reliable summarization even for very long sessions
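The chunk-and-combine flow above can be sketched roughly as follows (illustrative TypeScript, not the actual ccprune source; `summarize` stands in for a call to the active backend, and the 30,000-character limit is taken from the text above):

```typescript
const CHUNK_LIMIT = 30_000; // characters, per the transcript threshold above

// Split a transcript into fixed-size chunks no longer than the limit.
function splitTranscript(transcript: string, limit = CHUNK_LIMIT): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < transcript.length; i += limit) {
    chunks.push(transcript.slice(i, i + limit));
  }
  return chunks.length > 0 ? chunks : [""];
}

// Summarize each chunk independently, then combine the partial summaries
// with one final summarization pass. `summarize` is a placeholder for the
// real backend call (e.g. spawning the Claude Code CLI).
async function chunkedSummary(
  transcript: string,
  summarize: (text: string) => Promise<string>,
): Promise<string> {
  if (transcript.length <= CHUNK_LIMIT) return summarize(transcript);
  const parts = await Promise.all(splitTranscript(transcript).map(summarize));
  return summarize(parts.join("\n\n")); // unify the chunk summaries
}
```

The final combining pass is what produces a single coherent summary, at the cost of the context-coherence trade-off noted below.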
Model Selection:
- Default: Uses your Claude Code CLI default model
- Override with `--summary-model haiku` or `--summary-model sonnet`
- Supports full model names (e.g., `claude-3-5-sonnet-20241022`)
Timeout & Retries:
- Default timeout: 360 seconds (6 minutes)
- Override with `--summary-timeout <ms>`
- Automatic retries: Up to 2 attempts on failure
When to Use:
- No API key required (uses existing Claude Code subscription)
- Handles extremely large transcripts via chunking
- Works offline (if Claude Code CLI works offline)
Trade-offs:
- Slower than Gemini API (spawns subprocess)
- Chunking may lose some context coherence for very large sessions
- Requires Claude Code CLI to be installed and authenticated
Claude Code stores sessions in:
`~/.claude/projects/{project-path-with-hyphens}/{sessionId}.jsonl`
For example, a project at `/Users/alice/my-app` becomes:
`~/.claude/projects/-Users-alice-my-app/{sessionId}.jsonl`
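That path-to-directory mapping can be sketched as (an illustrative helper, not ccprune's actual code; it only replaces path separators with hyphens, which is all the example above requires, and the real mapping may normalize other characters too):

```typescript
// Convert an absolute project path into the hyphenated directory name
// used under ~/.claude/projects/.
function projectDirName(projectPath: string): string {
  return projectPath.replace(/\//g, "-");
}

// projectDirName("/Users/alice/my-app") -> "-Users-alice-my-app"
```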
By default, ccprune looks for session files in `~/.claude`. If Claude Code is configured to use a different directory, you can specify it with the `CLAUDE_CONFIG_DIR` environment variable:

```bash
CLAUDE_CONFIG_DIR=/custom/path/to/claude ccprune
```

When `GEMINI_API_KEY` is set, ccprune automatically uses Gemini 2.5 Flash for summarization (recommended). Get your free API key from Google AI Studio.

```bash
export GEMINI_API_KEY=your_api_key_here
ccprune  # automatically uses Gemini 2.5 Flash
```

Use `--gemini` for Gemini 3 Pro, or `--claude-code` to force Claude Code CLI.
If you were using the original claude-prune package, ccprune v3.x has these changes:
```bash
# claude-prune v1.x (message-count based, summary was opt-in)
claude-prune <id> -k 10 --summarize-pruned

# ccprune v2.x (percentage-based, summary enabled by default)
ccprune <id>                     # defaults to 20% of messages
ccprune <id> --keep-percent 25   # keep latest 25% of messages

# ccprune v3.x (token-based, summary enabled by default)
ccprune <id>                     # defaults to 55K tokens
ccprune <id> -k 55000            # keep 55K tokens
ccprune <id> --keep-tokens 80000 # keep 80K tokens
```

Key changes in v3.x:
- Token-based pruning: `-k` now means tokens, not message count
- Removed: `-p, --keep-percent` flag (replaced by the token-based approach)
- Auto-skip: Sessions under 55K tokens are not pruned
- Lenient boundary: Includes one extra message at the boundary to preserve context
- Summary is enabled by default (use `--no-summary` to disable)
- Re-pruning synthesizes the old summary + new pruned content into one summary
Key changes in v4.x:
- Accurate token counting: Uses Claude's cumulative usage data (`input_tokens + cache_read + cache_creation`) to match the Claude Code UI
- Proportional scaling: Per-message tokens are scaled to match total context for accurate pruning
- `--resume-model`: Specify which model to use when auto-resuming (`opus`, `sonnet`, `haiku`, `opusplan`)
- 55K default: Results in ~70K total context after Claude Code adds system prompt, CLAUDE.md, and other overhead
```bash
# Clone and install
git clone https://github.com/nicobailon/claude-prune.git
cd claude-prune
bun install

# Run tests
bun run test

# Build
bun run build

# Test locally
./dist/index.js --help
```

This project is a fork of claude-prune by Danny Aziz. Thanks for the original implementation!
MIT