A Python tool that downloads Claude Code documentation from Anthropic's website, converts it to Markdown, and serves it locally. Features date-based archiving and idempotent downloads for crash recovery.
- Download HTML docs from https://docs.anthropic.com/en/docs/claude-code/
- Convert to Markdown with clean content extraction
- Local web server for offline browsing
- Date-based archiving - automatic snapshots by date
- Idempotent downloads - re-runs on the same date recover cleanly from crashes
- Historical versions preserved in `archive/YYYYMMDD/`
Install uv, a fast Python package installer:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
```bash
# Clone and set up
git clone <repo-url>
cd claude-code-docs
make setup
```

What it does: creates a virtual environment using `uv venv` and installs dependencies via `uv sync`.
```bash
# Download and convert to Markdown (recommended)
make run-full
```

What it does: runs `uv run python main.py --html --md` to download the HTML and convert it to Markdown.
Run `make help` to see all available commands:
| Command | Description | Underlying Command |
|---|---|---|
| `make setup` | First-time setup | `uv venv && uv sync` |
| `make run-full` | Download & convert | `uv run python main.py --html --md` |
| `make run-html` | Download HTML only | `uv run python main.py --html` |
| `make run-md` | Convert to Markdown | `uv run python main.py --md` |
| `make serve` | Start web server (port 8000) | `uv run python main.py --serve` |
| Command | Description |
|---|---|
| `make info` | Show download status and stats |
| `make archive-status` | Show current date and archives |
| `make check` | Verify dependencies |
| `make clean` | Remove downloads/archive |
| `make clean-all` | Remove downloads/archive/venv |
The scraper uses `downloads/meta.json` to track the download date and completion status:

```json
{
  "download_date": "20251005",
  "status": "completed"
}
```
Same-date re-runs:
- Completed (`status: "completed"`): skips entirely, already done
- Incomplete (`status: "processing"`): cleans up and restarts from scratch
  - Detects crashed/interrupted downloads
  - Ensures consistency by removing partial data
  - Provides true crash recovery
Different-date re-runs (archiving):
- Moves the old download to `archive/YYYYMMDD/`
- Starts a fresh download with the new date
- Preserves historical snapshots
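To make the flow concrete, here is a minimal sketch of the date check described above. The function and constant names are illustrative assumptions, not the actual code in `main.py`:

```python
import json
import shutil
from datetime import datetime
from pathlib import Path

# Assumed layout; the real paths live in main.py.
DOWNLOADS = Path("downloads")
ARCHIVE = Path("archive")
META = DOWNLOADS / "meta.json"

def prepare_download_dir() -> bool:
    """Apply the re-run rules; return True if a fresh download should run."""
    today = datetime.now().strftime("%Y%m%d")
    if META.exists():
        meta = json.loads(META.read_text())
        if meta["download_date"] == today:
            if meta["status"] == "completed":
                # Same date, already completed: skip entirely (idempotent).
                print(f"Download already completed for {today} - skipping (idempotent)")
                return False
            # Same date but still "processing": a crashed run.
            # Remove partial data and restart from scratch.
            print("WARNING: Previous download was incomplete")
            shutil.rmtree(DOWNLOADS)
        else:
            # Different date: preserve the old snapshot, then start fresh.
            print(f"Archiving previous download to archive/{meta['download_date']}/")
            ARCHIVE.mkdir(exist_ok=True)
            shutil.move(str(DOWNLOADS), str(ARCHIVE / meta["download_date"]))
    DOWNLOADS.mkdir(parents=True, exist_ok=True)
    META.write_text(json.dumps({"download_date": today, "status": "processing"}))
    return True
```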
```bash
# Day 1 (Oct 5) - First run
make run-full
# Creates: meta.json {"download_date": "20251005", "status": "processing"}
# On success: Updates to {"status": "completed"}

# Day 1 - Re-run after successful completion
make run-full
# Output: "Download already completed for 20251005 - skipping (idempotent)"

# Day 1 - Simulate crash (edit meta.json: set status to "processing")
make run-full
# Output: "WARNING: Previous download was incomplete"
# Cleans up html/, md/, db.yaml and restarts fresh

# Day 2 (Oct 6)
make run-full
# Output: "Archiving previous download to archive/20251005/"
# Creates: archive/20251005/{db.yaml,html,md}
# Starts fresh in downloads/
```
```
claude-code-docs/
├── main.py            # Scraper implementation
├── Makefile           # Primary management interface
├── pyproject.toml     # uv package config
├── downloads/         # Current download (git-ignored)
│   ├── meta.json      # Download date tracker
│   ├── db.yaml        # Download database (see note below)
│   ├── html/          # HTML files
│   └── md/            # Markdown files
└── archive/           # Historical downloads (git-ignored)
    └── 20251005/      # Date-based snapshots
        ├── db.yaml
        ├── html/
        └── md/
```
Note: `db.yaml` is now stored at `downloads/db.yaml` (moved from `downloads/html/db.yaml`) for easier visibility. When archiving, it moves to `archive/YYYYMMDD/db.yaml` alongside the `html/` and `md/` folders.
```bash
make serve-port PORT=8080
```

```bash
# Activate virtual environment first
source .venv/bin/activate

# Download from custom URL
python main.py --html --url https://docs.anthropic.com/en/docs/claude-code/quickstart

# Serve on custom port
python main.py --serve --port 8080
```

```bash
# Show current download info
make info

# Show archive history
make archive-status
```
The scraper is implemented in `main.py` as a single-class application with these key components:
- HTML Downloading - recursive crawling with URL tracking
- Content Extraction - multi-strategy main content detection
- Link Rewriting - absolute-to-relative path conversion
- Markdown Conversion - html2text with custom cleaning
- Date Management - meta.json tracking and archiving logic
- Local Serving - Flask-based web server

See CLAUDE.md for detailed architecture documentation.
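As a rough sketch of how the extraction and conversion components might fit together (the CSS selectors and function names below are assumptions for illustration; the real logic lives in `main.py`):

```python
import html2text
from bs4 import BeautifulSoup

def extract_main_content(html: str) -> str:
    """Multi-strategy content detection: try progressively broader
    selectors and fall back to the whole page body."""
    soup = BeautifulSoup(html, "html.parser")
    for candidate in (
        soup.find("main"),                    # semantic <main> element
        soup.find("article"),                 # article wrapper
        soup.find("div", {"id": "content"}),  # common content container
    ):
        if candidate is not None:
            return str(candidate)
    return str(soup.body or soup)

def to_markdown(html: str) -> str:
    """Convert extracted HTML to Markdown via html2text."""
    converter = html2text.HTML2Text()
    converter.body_width = 0      # disable hard line wrapping
    converter.ignore_images = False
    return converter.handle(html)
```

Trying the most specific selector first keeps the extraction tolerant of template changes: if the site drops one wrapper, a broader fallback still yields usable content.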
This repository uses GitHub Actions to automatically download and release Claude Code documentation daily.
- Schedule: runs daily at midnight UTC via cron (`0 0 * * *`)
- Process:
  - Sets up the environment with uv
  - Runs `make run` to download and convert docs
  - Creates a release tagged `docs-YYYYMMDD`
  - Commits the downloads folder to the repository
  - Pushes changes to the main branch
- Assets: each release includes:
  - `claude-code-docs-YYYYMMDD.tar.gz` - complete download
  - `html-YYYYMMDD.tar.gz` - HTML files only
  - `md-YYYYMMDD.tar.gz` - Markdown files only
Note: the `downloads/` folder is tracked in git, so each day's documentation is both:
- Released as downloadable archives
- Committed to the repository for easy browsing on GitHub
```bash
# Download latest release
wget https://github.com/[your-repo]/releases/latest/download/claude-code-docs-YYYYMMDD.tar.gz

# Extract
tar -xzf claude-code-docs-YYYYMMDD.tar.gz

# Browse
cd downloads/md
```
You can manually trigger the workflow from the GitHub Actions tab:
1. Go to Actions → Daily Documentation Release
2. Click "Run workflow"
3. Wait for completion and check Releases
[Your License Here]