When AI Assistants Eat Their Own Memory
A technical presentation exploring context window limitations in AI-powered development workflows and practical solutions for developers working with AI coding assistants.
When setting up Claude Code with MCP (Model Context Protocol) servers for Jira, Confluence, and GitLab integration, 91% of the context window was consumed at startup—before writing a single line of code.
Total Context: 200,000 tokens
Used at Startup: 181,000 tokens (91%)
Available: 19,000 tokens (9%)
That's enough for:
→ 1 medium file (~500 lines)
→ OR 5 minutes of conversation
→ OR 3-4 tool outputs
Basically... nothing useful.
The irony: With this setup, it's possible to access Jira to check for bugs, but there's insufficient context remaining to read the files needed to implement fixes. The tools defeat their own purpose.
Baseline (no MCP servers) 68k (34%) ████████████████░░░░░░░░░░░░░░░░░░
✅ Healthy - plenty of room
↓ Add Jira (3 essential tools)
70k (35%) ████████████████░░░░░░░░░░░░░░░░░░
✅ Still fine - barely changed
↓ Enable full Jira (8 tools)
104k (52%) ██████████████████████████░░░░░░░░
⚠️ Noticeable - but workable
↓ Add GitLab (109 tools!)
148k (74%) █████████████████████████████████░░
⚠️ Problematic - getting cramped
↓ Add Confluence + other servers
181k (91%) ████████████████████████████████████
🔴 BROKEN - unusable
Total Context: 200,000 tokens
├─ Baseline (Claude + config): 68,000 (34%)
└─ MCP Tool Definitions: 114,300 (57.2%)
   ├─ GitLab: ~77,000 tokens (109 tools)
   ├─ Jira: 37,000 tokens (8 tools)
   └─ Other: 300 tokens
Key insight: Tool definitions consumed 57% of context BEFORE any tools were even USED. That's like paying rent on an entire warehouse just to store the instruction manuals.
Per-tool cost:
- Jira: ~4,625 tokens/tool (37,000 ÷ 8)
- GitLab: ~706 tokens/tool (77,000 ÷ 109)
- Jira tools are ~6.5x more expensive per tool!
Solution 1: Strategic Tool Selection
What it is: Load only the tools you actually need
When to use: When your MCP server supports individual tool configuration
Before: Full Jira (8 tools)
```json
{
"mcpServers": {
"mcp-atlassian": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-atlassian"]
}
}
}
```
→ 104k tokens (52%)
After: Partial Jira (3 tools)
```json
{
"mcpServers": {
"mcp-atlassian": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-atlassian"],
"env": {
"ENABLED_TOOLS": "jira_get_issue,jira_search,jira_add_comment"
}
}
}
}
```
→ 70k tokens (35%)
Result:
- ✅ Saved 34,000 tokens
- ✅ Dropped 17 percentage points
- ✅ Only loaded tools actually needed
This works great for the Atlassian MCP server, but the GitLab MCP server doesn't support individual tool configuration: it's all-or-nothing, with 109 tools consuming 77,000 tokens.
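For reference, a sketch of the direct GitLab entry (the command and image are the same ones used in the .mcp-funnel.json example later in this document). Note there is no ENABLED_TOOLS-style field to trim the 109 tools:

```json
{
  "mcpServers": {
    "mcp-gitlab": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "GITLAB_PERSONAL_ACCESS_TOKEN", "iwakitakuma/gitlab-mcp"]
    }
  }
}
```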
Even with optimized Jira: 70k (Jira) + 77k (GitLab) = 147k tokens (74%)
Still broken.
Solution 2: mcp-funnel (Lazy Loading)
What it is: A proxy that loads MCP tools on-demand instead of at startup
When to use: When Solution 1 isn't enough or isn't available
Traditional MCP:
[Startup]
↓
Load ALL 117 tool definitions
↓
114,300 tokens consumed
↓
[Ready to work]
↓
19k tokens available 🔴
mcp-funnel:
[Startup]
↓
Load bridge tools only
↓
9,600 tokens consumed
↓
[Ready to work]
↓
123k tokens available ✅
↓
[When you need a tool]
↓
Discover by keyword (~1k tokens)
↓
Load specific tool
↓
Use it
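Under the hood this is ordinary MCP traffic. As a rough sketch, assuming a standard JSON-RPC 2.0 tools/call request (the argument key "words" is an assumption here; check the mcp-funnel documentation for the actual input schema), a discovery call might look like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "discover_tools_by_words",
    "arguments": { "words": "jira issue" }
  }
}
```

Claude can then fetch the full definition with get_tool_schema and invoke the matched tool via bridge_tool_request, paying the token cost only for tools it actually uses.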
BEFORE (Direct MCP)
Context Usage: 181,000 tokens (91%)
MCP Tools: 114,300 tokens
Available: 19,000 tokens
Tool Count: 117 tools (all pre-loaded)
Status: 🔴 UNUSABLE
"Can access Jira but can't read files"
AFTER (mcp-funnel)
Context Usage: 77,000 tokens (38%)
MCP Tools: 9,600 tokens
Available: 123,000 tokens
Tool Count: 12 pre-loaded, 100+ available on-demand
Status: ✅ HEALTHY
"Back to normal development workflow"
The Impact:
- 104,000 tokens saved
- 53 percentage point improvement (91% → 38%)
- 6.5x more working space (19k → 123k tokens)
- 11.9x reduction in MCP overhead (114.3k → 9.6k tokens)
Step 1: Install
```bash
npm install -g mcp-funnel
```
Step 2: Update config (~/.claude/claude_desktop_config.json)
See .mcp.json for a working example with mcp-funnel:
```json
{
"mcpServers": {
"mcp-funnel": {
"command": "npx",
"args": ["-y", "mcp-funnel", ".mcp-funnel.json"]
}
}
}
```
Step 3: Create mcp-funnel config (.mcp-funnel.json)
See .mcp-funnel.json for a complete example (it pairs GitLab with a Sequential Thinking server); a minimal configuration with the GitLab and Atlassian servers looks like this:
```json
{
"servers": {
"mcp-gitlab": {
"command": "docker",
"args": ["run", "-i", "--rm", "-e", "GITLAB_PERSONAL_ACCESS_TOKEN", "iwakitakuma/gitlab-mcp"]
},
"mcp-atlassian": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-atlassian"]
}
},
"exposeCoreTools": [
"discover_tools_by_words",
"get_tool_schema",
"bridge_tool_request"
]
}
```
Step 4: Restart Claude Code
That's it. Tools now load on demand.
✅ Pros:
- Massive context savings (91% → 38%)
- All tools remain available
- Works with ANY MCP server (GitLab included!)
- Simple configuration
- Free and open source
- Discovery is cheap (~1k tokens)
⚠️ Cons:
- Extra discovery step ("discover Jira tools")
- Small latency on first tool use (barely noticeable)
- Slightly less "magical" (Claude must explicitly discover)
- One more dependency to manage
The Reality: Trading "auto-magical" for "actually usable" is an easy choice.
Use /context or /usage to check token consumption periodically.
The 70% rule:
- < 70%: Work normally
- > 70%: Consider /compact or /clear
- > 85%: Definitely clear soon
Store project context OUTSIDE active memory, in files on disk such as CLAUDE.md. Claude Code picks these up automatically, so the knowledge survives /clear and doesn't have to be re-explained in every conversation.
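As a purely hypothetical sketch, such a file might capture architecture notes and the current task (the file name CLAUDE.md follows Claude Code's memory-file convention; the contents below are invented for illustration):

```markdown
# Project notes

## Architecture
- REST API in services/api, background jobs in services/worker

## Current task
- JIRA-123: token refresh fails when the session cookie is missing

## Decisions so far
- Keep the fix in the auth middleware; do not touch the session store
```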
Phase 1: Research → Save summary → /clear
Phase 2: Implement → Commit → /clear
Phase 3: Test → Save results → /clear
This is not just a Claude problem—every AI coding assistant faces finite context:
| AI Tool | Context Window | Same Issues? |
|---|---|---|
| Claude Code | 200K | ✓ MCP servers |
| GitHub Copilot | 8K | ✓ Extensions |
| ChatGPT | 128K | ✓ Plugins |
| Cursor | Varies | ✓ Integrations |
| Cody | 100K | ✓ Context fetching |
The Universal Truth:
- Every AI coding assistant has finite context
- Every tool integration consumes tokens
- The more ambitious your workflow, the bigger the problem
- Context is the new bottleneck
- → Lazy loading should be the default
- → Token costs should be transparent
- → Give users control over what loads
The Takeaways:
1. Context is Finite and Fills Fast
   - 200K tokens sounds like a lot
   - MCP servers can consume 57% at startup
   - Measure your actual usage
2. Strategic Tool Selection Helps (If Available)
   - Load only what you need
   - Can save 30-40k tokens
   - But not all servers support this
3. Lazy Loading is the Real Solution
   - mcp-funnel: 91% → 38%
   - 104k tokens saved
   - Works with any MCP server
   - Gives you your workflow back
4. This Affects Everyone
   - All AI coding assistants face this
   - Context is the new bottleneck
   - Plan your workflows accordingly
5. It's Fixable
   - The tools exist today
   - Free, open source, easy to set up
   - You don't have to live with 91% context usage
- mcp-funnel - Lazy loading proxy for MCP servers
- mcp-atlassian - Jira and Confluence MCP server
- mcp-gitlab - GitLab MCP server
- Model Context Protocol - Official MCP documentation
- Claude Code - AI-powered CLI from Anthropic
This repository contains three configuration scenarios:
- .mcp.json - mcp-funnel setup with Atlassian (partial tools)
- .mcp-funnel.json - GitLab + Sequential Thinking servers through mcp-funnel
- mcp-direct.json - Direct MCP configuration with Atlassian + GitLab (the 91% problem)
See measurements/screenshots/ for visual documentation from all 6 test configurations showing the context usage progression.
| Configuration | Total | % | MCP | Available | Tools |
|---|---|---|---|---|---|
| Baseline | 68k | 34% | 0k | 132k | 0 |
| Jira Partial | 70k | 35% | 3.6k | 130k | 3 |
| Jira Full | 104k | 52% | 37k | 96k | 8 |
| GitLab + Jira | 148k | 74% | 80.9k | 52k | 112 |
| All Combined | 181k | 91% | 114.3k | 19k | 117 |
| mcp-funnel | 77k | 38% | 9.6k | 123k | 12 pre + 100+ lazy |
This presentation and associated materials are shared for educational purposes. Individual tools and projects referenced maintain their respective licenses:
- mcp-funnel: Check repository for license
- mcp-atlassian: Check repository for license
- mcp-gitlab: Check repository for license
From broken to working in one config change.
You don't have to accept broken workflows. The solution exists today—it's free, it's easy, and it works.