Zero-config LLM caching for integration tests via DNS interception
Cache Cow is a transparent caching proxy that intercepts LLM API calls (OpenAI, Anthropic, etc.) via DNS override, automatically caching responses to make your test suite 50-100x faster with absolutely zero code changes.
- ✅ Truly zero config - No code changes, no environment variables, no proxy configuration
- ⚡ Blazing fast - Redis-backed caching with ~50ms response time for cache hits
- 🎯 Smart caching - TTL-based with optional probabilistic bypass to catch API changes
- 📊 Beautiful dashboard - Real-time metrics showing hit rate, latency, and per-provider stats
- 🐳 Easy deployment - Docker Compose setup, up and running in 1 minute
- 🔧 GitHub CI ready - Perfect for self-hosted integration test caching
- 🌐 General purpose - Works with any API, not just LLMs
./scripts/setup.sh
This automatically:
- Generates SSL certificate
- Starts Docker containers (Redis + Cache Cow proxy)
- Configures DNS overrides in /etc/hosts
- Installs certificate in Python's certifi bundle
Note: You'll need sudo for DNS configuration, but not for certificate installation.
Try these demos with zero code changes - they use standard production code:
# OpenAI SDK demo
uv run python demos/demo_openai.py
# Anthropic SDK demo
uv run python demos/demo_anthropic.py
# PydanticAI demo (uses OpenAI and Anthropic)
uv run python demos/demo_pydantic_ai.py
# HTTPBin demo (shows it works with any API)
uv run python demos/demo_httpbin.py
Each demo shows:
- First call: ~0.1-2s (cache MISS → real API)
- Second call: ~0.01s (cache HIT → Redis)
- 30-600x speedup! ⚡
Your production code works unchanged:
# OpenAI SDK
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Hello!"}]
)
# PydanticAI
from pydantic_ai import Agent
agent = Agent('openai:gpt-5.2')
result = agent.run_sync("Hello!")
# Any HTTP library
import requests
response = requests.get("https://api.example.com/data")
First run: ~2 seconds (cache miss; the response is then cached)
Second run: ~0.05 seconds (cache hit) ⚡
Open http://localhost:8001 to see:
- Real-time cache hit rate
- Latency comparison (cached vs uncached)
- Per-provider statistics
- Request distribution
./scripts/teardown.sh
This automatically:
- Removes certificate from certifi
- Removes DNS overrides from
/etc/hosts - Stops Docker containers
Your system returns to normal - API calls go directly to real endpoints.
Your Code → api.openai.com
↓
/etc/hosts override (127.0.0.1)
↓
[Cache Cow Proxy]
↓
[Check Redis Cache]
↓
├─ Cache HIT → Return in ~50ms ⚡
└─ Cache MISS → Forward to real API → Cache → Return in ~2s
DNS Interception:
- /etc/hosts makes api.openai.com point to 127.0.0.1, so HTTPS traffic lands on local port 443
- Cache Cow listens on port 443, receives all requests
- Checks Redis cache, serves or forwards to real API
- Your application has no idea Cache Cow exists!
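The hit/miss flow above can be sketched in a few lines of Python. This is an illustrative sketch, not Cache Cow's actual code: `TTLCache` is a hypothetical in-memory stand-in for Redis, and `handle_request` and its `forward` callback are names invented for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for Redis: maps key -> (expiry time, value)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as a miss
            return None
        return value

    def set(self, key: str, value: bytes, ttl: float) -> None:
        self._store[key] = (self._clock() + ttl, value)


def handle_request(
    cache: TTLCache,
    key: str,
    forward: Callable[[], bytes],
    ttl: float = 86400,
) -> Tuple[str, bytes]:
    """Serve from cache on a hit; otherwise forward upstream and store the result."""
    cached = cache.get(key)
    if cached is not None:
        return "HIT", cached
    body = forward()  # call the real API
    cache.set(key, body, ttl)
    return "MISS", body
```

Calling `handle_request` twice with the same key invokes `forward` only once; the second call returns `("HIT", ...)` until the TTL elapses.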
Cache Key Generation:
- Namespace + host + path + normalized prompt + model + max_tokens
- Ignores: temperature, seed, streaming flags (non-deterministic params)
- Normalizes whitespace in prompts for better hit rate
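One way such a key could be derived is sketched below. It assumes an OpenAI-style JSON body with `model`, `max_tokens`, and `messages` fields; the function names, the SHA-256 hash, and the `namespace:digest` layout are choices made for this example and may differ from what Cache Cow actually does.

```python
import hashlib
import json
import re


def normalize_prompt(text: str) -> str:
    """Collapse runs of whitespace so cosmetic differences do not split the cache."""
    return re.sub(r"\s+", " ", text).strip()


def cache_key(namespace: str, host: str, path: str, body: dict) -> str:
    """Build a key from the fields listed above.

    Anything not copied into `material` (temperature, seed, stream, ...)
    has no effect on the key.
    """
    messages = [
        {
            "role": m.get("role"),
            "content": normalize_prompt(str(m.get("content", ""))),
        }
        for m in body.get("messages", [])
    ]
    material = {
        "namespace": namespace,
        "host": host,
        "path": path,
        "model": body.get("model"),
        "max_tokens": body.get("max_tokens"),
        "messages": messages,
    }
    # sort_keys gives a stable serialization, so equal inputs hash equally
    encoded = json.dumps(material, sort_keys=True).encode("utf-8")
    return f"{namespace}:{hashlib.sha256(encoded).hexdigest()}"
```

With this sketch, two requests that differ only in `temperature` or in prompt whitespace map to the same key, while a different model or namespace yields a different one.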
Edit config/cacheCow.yml to customize:
defaults:
  ttl: 86400 # 24 hours
  namespace: default
domains:
  # OpenAI
  - host: api.openai.com
    target: api.openai.com
    port: 443
    ttl: 86400
    cache: true
  # Anthropic
  - host: api.anthropic.com
    target: api.anthropic.com
    port: 443
    ttl: 86400
    cache: true
  # Add your own domains!
  - host: your-api.example.com
    target: your-api.example.com
    port: 443
    ttl: 3600
    cache: true
# Optional: Probabilistic bypass for testing
probabilistic:
  enabled: false
  bypass_rate: 0.1 # 10% of requests bypass cache
Catch API changes while still getting caching benefits:
probabilistic:
  enabled: true
  bypass_rate: 0.1
This will:
- Cache 90% of requests (fast)
- Bypass cache for 10% of requests (tests real API)
- Use deterministic hashing (same request → same decision)
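A deterministic bypass decision can be made by hashing the cache key into a number in [0, 1) and comparing it with `bypass_rate`. The sketch below shows that technique; the function name and the use of SHA-256 are assumptions for illustration, not necessarily how Cache Cow implements it.

```python
import hashlib


def should_bypass(cache_key: str, bypass_rate: float) -> bool:
    """Return True if this request should skip the cache.

    The first 8 bytes of the SHA-256 digest are mapped to a fraction in
    [0, 1). The same key always maps to the same fraction, so the decision
    is stable across runs.
    """
    digest = hashlib.sha256(cache_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < bypass_rate
```

Because the decision depends only on the key, a given request either always bypasses or always uses the cache for a fixed `bypass_rate`; roughly `bypass_rate` of all distinct requests end up hitting the real API.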
curl http://localhost:8001/api/stats/default
{
  "namespace": "default",
  "cache_hits": 45,
  "cache_misses": 5,
  "bypassed": 0,
  "total_requests": 50,
  "hit_rate": 90.0,
  "avg_cached_latency_ms": 52,
  "avg_uncached_latency_ms": 2340,
  "speedup": 45
}
curl http://localhost:8001/api/stats/default/providers
{
  "namespace": "default",
  "providers": [
    {
      "provider": "api.openai.com",
      "hits": 30,
      "misses": 5,
      "total": 35,
      "hit_rate": 85.7
    }
  ]
}
curl -X DELETE http://localhost:8001/api/cache/default
curl http://localhost:8001/api/health
Out of the box:
- ✅ OpenAI (api.openai.com)
- ✅ Anthropic (api.anthropic.com)
- ✅ Google AI (generativelanguage.googleapis.com)
- ✅ Cohere (api.cohere.ai)
- ✅ Together AI (api.together.xyz)
Add any domain by editing config/cacheCow.yml!
cache-cow/
├── README.md # This file
├── docker-compose.yml # Multi-container orchestration
├── Dockerfile # Container image definition
├── requirements.txt # Production dependencies
│
├── src/ # Source code
│ ├── reverse_proxy.py # Core caching proxy
│ ├── api.py # FastAPI management server
│ └── dashboard.html # Web dashboard UI
│
├── config/ # Configuration
│ └── cacheCow.yml # Domain whitelist & settings
│
├── scripts/ # Setup scripts
│ ├── setup.sh # Complete setup (one command)
│ ├── teardown.sh # Complete teardown (one command)
│ ├── generate_cert.sh # Generate SSL certificate
│ ├── install_cert.sh # Install cert in certifi
│ ├── cleanup_cert.sh # Remove cert from certifi
│ ├── setup_hosts.sh # Setup DNS overrides
│ ├── cleanup_hosts.sh # Remove DNS overrides
│ └── start.sh # Container startup (internal)
│
└── demos/ # Demo scripts
    ├── demo_openai.py # OpenAI SDK demo
    ├── demo_anthropic.py # Anthropic SDK demo
    ├── demo_pydantic_ai.py # PydanticAI demo (uses Anthropic)
    └── demo_httpbin.py # HTTPBin (non-LLM) demo
vs Forward Proxy (HTTP_PROXY):
- ✅ No environment variables needed
- ✅ Works with ALL HTTP clients (no cooperation needed)
- ✅ True zero-config
vs Code Mocking:
- ✅ No code changes
- ✅ Tests real HTTP stack
- ✅ Catches integration issues
vs Recording Fixtures:
- ✅ No manual fixture management
- ✅ Automatic cache management
- ✅ Easy cache invalidation
- Verify DNS override:
  ping api.openai.com # Should show 127.0.0.1
- Check Cache Cow is running:
  docker-compose ps
- Check logs:
  docker-compose logs proxy
- Check dashboard:
  open http://localhost:8001
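The DNS check can also be done from Python, which is handy inside a test fixture. The helper below is a hypothetical sketch (not part of the repository). It uses `socket.gethostbyname`, which goes through the system resolver and therefore normally honours /etc/hosts entries.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname: str) -> bool:
    """Return True if the system resolver maps `hostname` to a loopback address."""
    try:
        address = socket.gethostbyname(hostname)
    except socket.gaierror:
        return False  # name could not be resolved at all
    return ipaddress.ip_address(address).is_loopback


if __name__ == "__main__":
    for host in ("api.openai.com", "api.anthropic.com"):
        state = "overridden" if resolves_to_loopback(host) else "NOT overridden"
        print(f"{host}: {state}")
```

This only confirms the local override used by the single-machine setup; in a centralized deployment the hosts entry points at the Cache Cow server IP instead of 127.0.0.1.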
Make sure Cache Cow is running:
docker-compose ps
# Should show cacheCow-proxy and cacheCow-redis running
./scripts/teardown.sh
Cache Cow can be deployed as a centralized caching service for all your CI test runs. Here's how to make it production-ready:
Note: The current implementation is Python-specific. Certificate installation targets Python's certifi bundle. For other languages (Node.js, Ruby, Go, etc.), you would need system-wide certificate installation (requires sudo), which is not currently implemented or tested.
┌─────────────────────────────────────────────────┐
│ Your Infrastructure (AWS/GCP/Azure/Self-hosted) │
│ │
│ ┌────────────────────────────────────────────┐ │
│ │ Cache Cow Service (Single Instance) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌──────────────────┐ │ │
│ │ │ Redis │ │ Reverse Proxy │ │ │
│ │ │ (Persistent│◄──┤ (Cache Logic) │ │ │
│ │ │ Volume) │ │ │ │ │
│ │ └─────────────┘ └──────────────────┘ │ │
│ │ │ │
│ │ Accessible at: cache-cow.yourcompany.com │ │
│ └────────────────────────────────────────────┘ │
│ ▲ │
└───────────────────────┼──────────────────────────┘
│
┌─────────────┴─────────────┐
│ │
┌──────▼───────┐ ┌────────▼──────┐
│ CI Job #1 │ │ CI Job #2 │
│ │ │ │
│ /etc/hosts: │ │ /etc/hosts: │
│ api.openai │ │ api.openai │
│ .com → │ │ .com → │
│ cache-cow IP │ │ cache-cow IP │
└──────────────┘ └───────────────┘
Required for production HTTPS interception.
# Generate CA certificate
openssl req -x509 -newkey rsa:4096 -keyout ca-key.pem -out ca-cert.pem \
  -days 3650 -nodes -subj "/CN=Cache Cow CA"
# Generate server certificate
openssl req -newkey rsa:4096 -keyout server-key.pem -out server-csr.pem \
  -nodes -subj "/CN=cache-cow.yourcompany.com"
openssl x509 -req -in server-csr.pem -CA ca-cert.pem -CAkey ca-key.pem \
  -CAcreateserial -out server-cert.pem -days 365
Update docker-compose.yml:
services:
  proxy:
    environment:
      - SSL_CERT_FILE=/certs/server-cert.pem
      - SSL_KEY_FILE=/certs/server-key.pem
    volumes:
      - ./certs:/certs:ro
CI Setup (Python):
# .github/workflows/test.yml
- name: Install Cache Cow CA cert
  run: |
    curl -o /tmp/cache-cow-ca.pem https://cache-cow.yourcompany.com/ca-cert.pem
    # Python uses certifi bundle, NOT system certificates
    CERTIFI_PATH=$(python -c "import certifi; print(certifi.where())")
    echo "" >> "$CERTIFI_PATH"
    echo "# Cache Cow Certificate" >> "$CERTIFI_PATH"
    cat /tmp/cache-cow-ca.pem >> "$CERTIFI_PATH"
Note: Python's httpx/requests use the certifi bundle and ignore system certificate stores. For other languages (Node.js, Ruby, Go), you may need system-wide cert installation with sudo, but this is not tested.
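The shell step above appends the certificate every time it runs, so a reused runner can accumulate duplicates. A Python equivalent that skips the append when the certificate is already present might look like the sketch below; `append_ca` and `MARKER` are names chosen for this example.

```python
from pathlib import Path

MARKER = "# Cache Cow Certificate"


def append_ca(bundle_path: str, ca_pem_path: str) -> bool:
    """Append the CA certificate to a PEM bundle unless it is already there.

    Returns True if the bundle was modified, False if the certificate was
    already present.
    """
    bundle = Path(bundle_path)
    ca_pem = Path(ca_pem_path).read_text().strip()
    if ca_pem in bundle.read_text():
        return False  # already installed; leave the bundle untouched
    with bundle.open("a") as fh:
        fh.write(f"\n{MARKER}\n{ca_pem}\n")
    return True
```

In CI this would be called as `append_ca(certifi.where(), "/tmp/cache-cow-ca.pem")` after `import certifi`; the path argument is kept explicit here so the function can be tested against a temporary file.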
Use nginx as SSL termination proxy:
# docker-compose.prod.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - /etc/letsencrypt:/etc/letsencrypt:ro
    depends_on:
      - proxy
  proxy:
    expose:
      - "8443" # Internal only, not published on the host
nginx.conf:
server {
    listen 443 ssl;
    server_name cache-cow.yourcompany.com;
    ssl_certificate /etc/letsencrypt/live/cache-cow.yourcompany.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/cache-cow.yourcompany.com/privkey.pem;
    # Forward all domains to Cache Cow
    location / {
        proxy_pass http://proxy:8443;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
Preserve cache across restarts.
Update docker-compose.yml:
services:
  redis:
    image: redis:7-alpine
    command: redis-server --save 60 1 --loglevel warning
    volumes:
      - redis_data:/data
    restart: always
volumes:
  redis_data:
    driver: local
For cloud deployments:
- AWS: Use EBS volume or ElastiCache
- GCP: Use Persistent Disk or Memorystore
- Azure: Use Azure Disk or Azure Cache for Redis
Isolate cache per team/project.
# Team A's tests
- name: Configure Cache Cow
  run: |
    echo "${{ env.CACHE_COW_IP }} api.openai.com" | sudo tee -a /etc/hosts
    curl -X POST http://cache-cow.yourcompany.com:8001/api/namespace \
      -d '{"namespace": "team-a"}'
# Team B's tests
- name: Configure Cache Cow
  run: |
    echo "${{ env.CACHE_COW_IP }} api.openai.com" | sudo tee -a /etc/hosts
    curl -X POST http://cache-cow.yourcompany.com:8001/api/namespace \
      -d '{"namespace": "team-b"}'
Update config/cacheCow.yml:
defaults:
  namespace: "${NAMESPACE:-default}" # Environment variable support
Complete CI workflow example:
# .github/workflows/integration-tests.yml
name: Integration Tests with Cache Cow
on: [push, pull_request]
env:
  CACHE_COW_HOST: cache-cow.yourcompany.com
  CACHE_COW_IP: 10.0.1.100 # Your Cache Cow server IP
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Configure DNS for Cache Cow
        run: |
          echo "${{ env.CACHE_COW_IP }} api.openai.com" | sudo tee -a /etc/hosts
          echo "${{ env.CACHE_COW_IP }} api.anthropic.com" | sudo tee -a /etc/hosts
      - name: Verify Cache Cow connectivity
        run: |
          curl -f http://${{ env.CACHE_COW_HOST }}:8001/api/health || exit 1
      - name: Run integration tests
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          pytest tests/integration/
          # First run: ~5 minutes
          # Subsequent runs: ~30 seconds ⚡
      - name: View Cache Stats
        if: always()
        run: |
          curl http://${{ env.CACHE_COW_HOST }}:8001/api/stats/default
Add to src/api.py:
# Response is not imported in the original snippet; FastAPI's Response is assumed here
from fastapi import Response
from prometheus_client import Counter, Histogram, generate_latest
cache_hits = Counter('cache_cow_hits_total', 'Total cache hits')
cache_misses = Counter('cache_cow_misses_total', 'Total cache misses')
request_latency = Histogram('cache_cow_request_duration_seconds', 'Request latency')
@app.get("/metrics")
def metrics():
    return Response(generate_latest(), media_type="text/plain")
services:
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
Track API cost savings.
Add to config/cacheCow.yml:
cost_tracking:
  enabled: true
  pricing:
    api.openai.com:
      gpt-4: 0.03 # per 1K tokens
      gpt-3.5-turbo: 0.002
    api.anthropic.com:
      claude-3-opus: 0.015
      claude-3-sonnet: 0.003
- Set up SSL/TLS termination
- Configure persistent Redis storage
- Implement namespace isolation
- Set up monitoring (Prometheus/Grafana)
- Configure log aggregation
- Set up automated backups for Redis
- Document CA certificate distribution
- Test failover behavior
- Set up alerts for high error rates
- Configure rate limiting (optional)
- Set up cost tracking dashboard
- Document DNS configuration for CI
Terraform Example (AWS):
resource "aws_ecs_service" "cache_cow" {
  name            = "cache-cow"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.cache_cow.arn
  desired_count   = 1
  load_balancer {
    target_group_arn = aws_lb_target_group.cache_cow.arn
    container_name   = "cache-cow-proxy"
    container_port   = 443
  }
}
resource "aws_elasticache_cluster" "cache_cow_redis" {
  cluster_id           = "cache-cow-redis"
  engine               = "redis"
  node_type            = "cache.t3.micro"
  num_cache_nodes      = 1
  parameter_group_name = "default.redis7"
}
- API Key Security: Cache Cow sees API keys in requests. Use network isolation and encryption at rest.
- Access Control: Restrict dashboard access with authentication
- Audit Logging: Log all cache operations for compliance
- Rate Limiting: Prevent abuse of the caching service
- Network Policies: Use VPC/subnet isolation in cloud environments
For a typical team running 100 CI jobs/day:
- Without Cache Cow: 100 jobs × 5 min = 500 minutes
- With Cache Cow: 100 jobs × 0.5 min = 50 minutes
- Time Saved: 450 minutes/day = 7.5 hours/day
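The estimate above assumes every job runs fully from cache. To plug in your own numbers, the same arithmetic can be written as a small function; the name `daily_minutes_saved` is invented for this example.

```python
def daily_minutes_saved(jobs_per_day: int, uncached_min: float, cached_min: float) -> float:
    """Minutes saved per day when every job runs at the cached duration."""
    return jobs_per_day * (uncached_min - cached_min)


saved = daily_minutes_saved(jobs_per_day=100, uncached_min=5, cached_min=0.5)
print(saved, saved / 60)  # 450.0 7.5
```

Real savings will be lower on days when the cache is cold or entries have expired, since those jobs still run at the uncached duration.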
MIT
Built with:
- aiohttp - Async HTTP server
- FastAPI - Modern web framework
- Redis - In-memory data store
- Chart.js - Beautiful charts
MOOOOOOO! 🐮