A Python + Rust automation system for extracting torrent links from javdb.com and automatically adding them to qBittorrent. Designed as an ingestion pipeline before scraping platforms like MDC-NG.
English | 简体中文
- Modular Spider — 14 specialized modules in `packages/python/javdb_spider/`; fetches and filters entries with subtitle/today tags, extracts magnet links with priority ordering
- Rust Acceleration (optional) — PyO3 + maturin extension for 5-10x faster HTML parsing; falls back to pure Python automatically
- Parallel Processing — Multi-threaded detail page fetching with one worker per proxy; auto-activates in pool mode with 2+ proxies
- Torrent Classification — Priority-based categories: 字幕 (subtitle), hacked (UC无码破解 > UC > U无码破解 > U), no_subtitle
- Dual Mode — Daily mode (default pages) and Ad Hoc mode (custom URLs for actors, tags, etc.)
- qBittorrent Integration — Auto-upload torrents with categorization, file size filtering, and duplicate prevention
- PikPak Bridge — Transfer old torrents from qBittorrent to PikPak cloud storage
- History Tracking — SQLite/Cloudflare D1 dual storage with session-based rollback and pending-mode writes
- Automated Pipeline — GitHub Actions workflows for daily ingestion, ad hoc scraping, file filtering, dedup, and more
- Cross-Runner Coordination (optional) — Cloudflare Worker + Durable Objects for per-proxy throttling and login state sharing across concurrent runners
- Re-download Detection — Automatically re-downloads when a significantly larger torrent becomes available for the same category
- Email Notifications — Pipeline results with intelligent error detection (critical vs. non-critical)
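The optional Rust acceleration noted above follows the common try-import fallback pattern. The sketch below is an assumption about how such a fallback could look — the module name `javdb_rust_core` comes from the repository layout, but the function name `parse_entries` and the fallback body are hypothetical placeholders, not the project's actual API:

```python
import re

# Hedged sketch: prefer the PyO3 extension when it is built, otherwise
# fall back to a pure-Python implementation. Names are illustrative.
try:
    from javdb_rust_core import parse_entries  # compiled PyO3 extension
    RUST_AVAILABLE = True
except ImportError:
    RUST_AVAILABLE = False

    def parse_entries(html):
        """Pure-Python fallback (simplified stand-in for real parsing)."""
        # Extract magnet links as a placeholder for the full parser
        return [{"magnet": m} for m in re.findall(r"magnet:\?xt=[^\"'\s]+", html)]
```

Callers import `parse_entries` once and never need to know which implementation is active.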
```bash
# Clone and install
git clone https://github.com/TongWu/JAVDB_AutoSpider_CICD.git
cd JAVDB_AutoSpider_CICD
pip install -r requirements.txt

# Configure
cp config.py.example config.py
# Edit config.py: set proxy, qBittorrent credentials, etc.

# Run
python3 -m apps.cli.spider            # Daily scraping
python3 -m apps.cli.spider --dry-run  # Preview without writing
python3 -m apps.cli.pipeline          # Full pipeline (spider + upload + notify)
```

For complete setup instructions, see the Local Setup Guide.
```
apps/
├── cli/                     Canonical CLI entrypoints (spider, pipeline, qb_uploader, etc.)
├── api/                     FastAPI REST API
├── web/                     Vite + Vue.js frontend
└── desktop/                 Electron shell (MVP)
packages/
├── python/
│   ├── javdb_spider/        Spider package (14 modules)
│   ├── javdb_platform/      Platform services (db, proxy, logging)
│   ├── javdb_core/          Domain models and utilities
│   ├── javdb_ingestion/     Pipeline orchestration
│   ├── javdb_integrations/  External integrations (qB, PikPak, Rclone)
│   └── javdb_migrations/    Database migrations
└── rust/
    └── javdb_rust_core/     PyO3 Rust extension (optional)
```

Legacy paths (`scripts/`, `pipeline.py`, `migration/`, `api/`) are kept as compatibility wrappers.
Copy `config.py.example` to `config.py` and configure:

```python
# Minimum required settings
PROXY_MODE = 'pool'  # 'pool', 'single', or 'None'
PROXY_POOL = [{'name': 'Proxy-1', 'http': 'http://127.0.0.1:7890', 'https': 'http://127.0.0.1:7890'}]
QB_URL = 'https://192.168.1.100:8080'  # qBittorrent Web UI
QB_USERNAME = 'admin'
QB_PASSWORD = 'password'
```

For the full configuration reference (60+ options), see the Configuration Guide.
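The "one worker per proxy" parallelism that pool mode enables can be sketched roughly as below. This is a hedged illustration, not the project's actual code: `fetch_detail` is a hypothetical stand-in (real code would pass the proxy dict to `requests.get(url, proxies=...)`), and the round-robin assignment is one plausible scheduling choice:

```python
from concurrent.futures import ThreadPoolExecutor

# Pool shape follows the config example above; second entry is invented.
PROXY_POOL = [
    {'name': 'Proxy-1', 'http': 'http://127.0.0.1:7890', 'https': 'http://127.0.0.1:7890'},
    {'name': 'Proxy-2', 'http': 'http://127.0.0.1:7891', 'https': 'http://127.0.0.1:7891'},
]

def fetch_detail(url, proxy):
    # Stubbed for illustration; a real fetch would use the proxies mapping
    return f"{url} via {proxy['name']}"

def fetch_all(urls):
    # One worker per proxy: pool size equals the number of proxies,
    # and URLs are distributed round-robin across them
    with ThreadPoolExecutor(max_workers=len(PROXY_POOL)) as pool:
        futures = [
            pool.submit(fetch_detail, url, PROXY_POOL[i % len(PROXY_POOL)])
            for i, url in enumerate(urls)
        ]
        return [f.result() for f in futures]
```

With two or more proxies configured, this layout caps concurrency at the pool size so no proxy handles overlapping requests.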
| Variable | Default | Description |
|---|---|---|
| `STORAGE_BACKEND` | `sqlite` | `sqlite`, `d1`, or `dual` |
| `WRITE_MODE` | `pending` | `pending` (default) or `audit` (legacy, sunset 2026-08-13) |
| `LOG_LEVEL` | `INFO` | `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `STRICT_DUAL_WRITE` | unset | Set `1` to fail on D1 write errors |
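Reading these variables with their documented defaults can be sketched as below; the helper name `load_storage_settings` is hypothetical, but the variable names and defaults follow the table above:

```python
import os

def load_storage_settings(env=None):
    """Collect storage-related settings, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "backend": env.get("STORAGE_BACKEND", "sqlite"),
        "write_mode": env.get("WRITE_MODE", "pending"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        # STRICT_DUAL_WRITE is a flag: only the value "1" enables it
        "strict_dual_write": env.get("STRICT_DUAL_WRITE") == "1",
    }
```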
```bash
# Spider
python3 -m apps.cli.spider                                        # Daily scraping
python3 -m apps.cli.spider --url "https://javdb.com/actors/EvkJ"  # Ad hoc mode
python3 -m apps.cli.spider --use-proxy --phase 1                  # Force proxy, phase 1 only
python3 -m apps.cli.spider --ignore-release-date                  # All entries, not just today

# Pipeline
python3 -m apps.cli.pipeline              # Full workflow
python3 -m apps.cli.pipeline --use-proxy  # With proxy override

# Uploaders
python3 -m apps.cli.qb_uploader                              # Upload to qBittorrent
python3 -m apps.cli.qb_file_filter --min-size 100 --dry-run  # Filter small files

# Maintenance
python3 -m apps.cli.migration --help           # Database migrations
python3 -m apps.cli.rollback --session-id 332  # Rollback a session
python3 -m apps.cli.login                      # Refresh JavDB session cookie
```

For the full CLI reference, see the CLI Reference.
| Method | Guide | Best For |
|---|---|---|
| Local | Local Setup | Development, manual runs |
| GitHub Actions | GH Actions Setup | Automated daily pipeline |
| Docker | Docker Deploy | Self-hosted server |
| Proxy Coordinator | Proxy Coordinator | Multi-runner coordination |
| Workflow | Trigger | Description |
|---|---|---|
| `DailyIngestion.yml` | Cron 12:00 UTC + manual | Daily scraping pipeline |
| `AdHocIngestion.yml` | Manual | Custom URL scraping |
| `QBFileFilter.yml` | Cron 16:00 UTC + manual | Filter small files (4h after daily) |
| `WeeklyDedup.yml` | Cron Sunday + manual | Rclone deduplication |
| `RollbackD1.yml` | Manual | Session rollback |
| `StaleSessionCleanup.yml` | Cron daily 02:00 UTC | Clean up stuck sessions (>48h) |
| `AuditArchive.yml` | Cron weekly Monday | Prune old audit rows |
| `Migration.yml` | Manual | Database migration runner |
| `TestIngestion.yml` | Manual | Dry-run test pipeline |
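For reference, a scheduled workflow of the kind listed above uses a trigger block like the following. This is a generic GitHub Actions sketch matching the `DailyIngestion.yml` schedule, not the project's actual file; job steps are omitted:

```yaml
name: DailyIngestion
on:
  schedule:
    - cron: '0 12 * * *'  # 12:00 UTC daily
  workflow_dispatch: {}   # allows manual triggering from the Actions tab
```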
The system supports three storage modes via `STORAGE_BACKEND`:

- SQLite (default) — Local files in `reports/` (`history.db`, `reports.db`, `operations.db`)
- D1 — Cloudflare D1 for GitHub Actions environments
- Dual — Writes mirror to both; reads from D1
Every pipeline run is tagged with a session ID and follows the lifecycle: in_progress → finalizing → committed / failed. Pending-mode writes only land in history tables at commit time; failed runs delete pending rows cleanly.
For rollback procedures, see D1 Rollback Guide.
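The pending-mode lifecycle can be sketched with SQLite as below. The schema, table names, and function are illustrative assumptions — the project's real tables and commit logic differ — but the sketch follows the documented state machine: rows are staged while the session is `in_progress`, moved into history at commit time, and pending rows are removed cleanly whether the run commits or fails:

```python
import sqlite3

# Illustrative schema only; not the project's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE pending  (session_id INTEGER, magnet TEXT);
    CREATE TABLE history  (magnet TEXT);
""")

def run_session(conn, magnets):
    """Walk one session through in_progress -> finalizing -> committed/failed."""
    session_id = conn.execute(
        "INSERT INTO sessions (status) VALUES ('in_progress')"
    ).lastrowid
    try:
        # Pending-mode writes: rows are staged, not yet visible in history
        conn.executemany(
            "INSERT INTO pending (session_id, magnet) VALUES (?, ?)",
            [(session_id, m) for m in magnets],
        )
        conn.execute("UPDATE sessions SET status = 'finalizing' WHERE id = ?", (session_id,))
        # Commit time: pending rows land in the history table
        conn.execute(
            "INSERT INTO history (magnet) SELECT magnet FROM pending WHERE session_id = ?",
            (session_id,),
        )
        conn.execute("UPDATE sessions SET status = 'committed' WHERE id = ?", (session_id,))
    except Exception:
        conn.execute("UPDATE sessions SET status = 'failed' WHERE id = ?", (session_id,))
    finally:
        # Pending rows are deleted whether the run committed or failed
        conn.execute("DELETE FROM pending WHERE session_id = ?", (session_id,))
        conn.commit()
    return session_id
```

Because history rows only appear at commit time, a failed run leaves no partial history to roll back.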
- Local Setup — From-scratch installation
- GitHub Actions Setup — CI/CD deployment
- Docker Deploy — Container deployment
- Configuration Reference — All 60+ config options
- Proxy Coordinator — Cross-runner coordination
- Proxy Setup — Proxy pool configuration
- CloudFlare Bypass — CF challenge fallback
- JavDB Login — Session cookie refresh
- Web UI Deploy — Web UI + API stack
- Rust Installation — Optional Rust extension
- CLI Reference — All CLI commands and arguments
- API Usage Guide — Python module and REST API
- History System — Duplicate prevention and tracking
- D1 Rollback — Rollback SOP and dispatch matrix
- Troubleshooting — Common issues and solutions
- Logging — Log configuration and formats
- Migration Scripts — Database migration tools
- CONTEXT.md — Domain language glossary
- JavDB Login Guide — Login troubleshooting
- Proxy Coordinator Worker — Cloudflare Worker source
- DeepWiki — AI-powered documentation explorer
- Never commit `config.py` (excluded in `.gitignore`)
- Do not commit files under `reports/`
- Use GitHub personal access tokens, not passwords
- Store sensitive values in environment variables for CI/CD
- Session cookies auto-expire; refresh via `python3 -m apps.cli.login`
Contributions are welcome! Please feel free to submit issues or pull requests.
This project is for educational and personal use only. Please respect the terms of service of the websites you scrape.