A comprehensive system for fetching ALL media from ALL major tracking clients (AniList, MyAnimeList, Simkl, Kitsu, etc.) and creating complete cross-platform ID mappings.
This system now fetches entire catalogs from each client, not just specific media IDs:
- Complete Catalogs: Fetch ALL anime, manga, movies, and TV shows from each platform
- Full Synchronization: Cross-reference millions of media items across all platforms
- Comprehensive Analytics: Detailed statistics on mapping coverage and quality
- Resumable Operations: Pause and resume large synchronization tasks
- Smart Organization: Individual JSON files for each media ID under client folders
- Multi-client Support: AniList, MyAnimeList, Simkl, Kitsu, AniDB, Trakt, TMDB, TheTVDB
- Complete Catalog Fetching: Downloads entire media databases from each client
- Automatic ID Mapping: Cross-references media IDs across different platforms
- JSON File Generation: Creates individual JSON files for each media ID under client folders
- GitHub Automation: Automated weekly updates via GitHub Actions
- Progress Tracking: Real-time progress with resumable downloads
- Rate Limiting: Built-in rate limiting to respect API limits
- Conflict Detection: Identifies and reports mapping conflicts
- Advanced Analytics: Comprehensive statistics and mapping analysis
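As a rough illustration of the conflict detection feature listed above, the sketch below flags cases where two records from one client claim the same ID on another client. The record shape (`client`, `id`, `mappings`) follows the sample JSON file shown in this README; the function name `findMappingConflicts` and the made-up record with ID 99999 are assumptions for illustration, not part of the project's API or data.

```javascript
// Hypothetical helper: not part of the project's API.
// Reports target IDs that more than one source record maps to.
function findMappingConflicts(records) {
  // key: "sourceClient>targetClient:targetId" -> claim details
  const claims = new Map();

  for (const record of records) {
    for (const [targetClient, targetId] of Object.entries(record.mappings || {})) {
      // Skip the record's own client and missing IDs.
      if (targetClient === record.client || targetId == null) continue;
      const key = `${record.client}>${targetClient}:${targetId}`;
      const entry = claims.get(key) || {
        sourceClient: record.client,
        targetClient,
        targetId,
        sourceIds: new Set(),
      };
      entry.sourceIds.add(record.id);
      claims.set(key, entry);
    }
  }

  // A conflict is any target ID claimed by two or more distinct source IDs.
  return [...claims.values()]
    .filter((entry) => entry.sourceIds.size > 1)
    .map((entry) => ({ ...entry, sourceIds: [...entry.sourceIds] }));
}

const sample = [
  { client: 'anilist', id: 21, mappings: { anilist: 21, mal: 21 } },
  { client: 'anilist', id: 99999, mappings: { anilist: 99999, mal: 21 } }, // made-up record
  { client: 'anilist', id: 30, mappings: { anilist: 30, mal: 30 } },
];
console.log(findMappingConflicts(sample));
```

A real implementation would probably also compare mappings in the reverse direction (for example, MAL records pointing back at AniList), which this sketch leaves out.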
media-id-mapper/
├── anilist/ # All AniList media JSON files (thousands)
├── mal/ # All MyAnimeList media JSON files (thousands)
├── simkl/ # All Simkl media JSON files
├── kitsu/ # All Kitsu media JSON files (thousands)
├── anidb/ # All AniDB media JSON files
├── trakt/ # All Trakt media JSON files
├── tmdb/ # All TMDB media JSON files
├── thetvdb/ # All TheTVDB media JSON files
├── scripts/ # Synchronization and utility scripts
├── .github/workflows/ # GitHub Actions workflows
├── mappings.json # Complete cross-platform mapping database
├── catalog-stats.json # Comprehensive catalog statistics
├── sync-stats.json # Synchronization progress and results
├── progress.json # Real-time progress tracking
├── catalog-fetcher.js # Complete catalog fetching engine
├── complete-catalog-mapper.js # Enhanced mapping system
└── package.json # Dependencies and scripts
npm install
Copy .env.example to .env and add your API keys:
cp .env.example .env
Edit .env with your API credentials:
ANILIST_CLIENT_ID=your_anilist_client_id
MAL_CLIENT_ID=your_mal_client_id
SIMKL_CLIENT_ID=your_simkl_client_id
KITSU_CLIENT_ID=your_kitsu_client_id
# ... add other API keys
# For complete catalog sync, use longer delays
REQUEST_DELAY=3000
MAX_RETRIES=5
- Go to AniList Developer Portal
- Create a new client application
- Copy the Client ID
- Go to MAL API Documentation
- Create a new API application
- Copy the Client ID
- Go to Simkl Developers
- Create a new app
- Copy the Client ID
- Go to Kitsu API Documentation
- Create an account and get your Client ID
- Go to AniDB API Documentation
- Register for API access and get your Client ID
- Note: AniDB has strict rate limits
- Go to Trakt API Documentation
- Create a new app and get Client ID & Secret
- Go to TMDB API Documentation
- Create an account and get your API Key
- Go to TheTVDB API Documentation
- Register for API access and get your API Key
# Fetch ALL media from ALL clients (takes several hours)
npm run sync-all
# Resume interrupted synchronization
npm run resume
# Generate comprehensive catalog statistics
npm run catalog-stats
# Generate mapping analysis and recommendations
npm run mapping-stats
# View catalog statistics
npm run catalog-stats
# Analyze mapping quality and conflicts
npm run mapping-stats
# Generate basic statistics
npm run stats
# Update popular anime from AniList
npm run update-popular
# Update specific media IDs
npm run update-specific anilist 21,30,16498 anime
# Sync existing media files
npm run sync
- AniList: All anime and manga (tens of thousands of items)
- MyAnimeList: All anime and manga (hundreds of thousands of items)
- Simkl: All anime, movies, and TV shows
- Kitsu: All anime and manga (tens of thousands of items)
- AniDB: All anime (with XML parsing and proper API handling)
- Trakt: All movies and TV shows
- TMDB: All movies and TV shows
- TheTVDB: All TV shows
After running npm run sync-all, you'll have:
anilist/
├── 21.json # One Piece
├── 30.json # Neon Genesis Evangelion
├── 16498.json # Attack on Titan
├── ... # Thousands more files
├── 52531.json # Latest anime
└── etc.
mal/
├── 21.json # One Piece
├── 30.json # Neon Genesis Evangelion
├── 16498.json # Attack on Titan
├── ... # Hundreds of thousands more files
└── etc.
# ... similar for other clients
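The layout above, one folder per client and one file per media ID, can be sketched with a few small helpers. This is only an illustration of the naming scheme; the helper names `mediaFilePath`, `writeMediaFile`, and `readMediaFile` are hypothetical and are not taken from the project's scripts.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: builds "<rootDir>/<client>/<id>.json".
function mediaFilePath(rootDir, client, id) {
  return path.join(rootDir, client, `${id}.json`);
}

// Hypothetical helper: writes one record, creating the client folder if needed.
function writeMediaFile(rootDir, record) {
  const filePath = mediaFilePath(rootDir, record.client, record.id);
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(record, null, 2) + '\n');
  return filePath;
}

// Hypothetical helper: reads a record back, or returns null if no file exists.
function readMediaFile(rootDir, client, id) {
  const filePath = mediaFilePath(rootDir, client, id);
  if (!fs.existsSync(filePath)) return null;
  return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}
```

Keeping the path logic in one function means a lookup by client and ID never has to scan a directory, which matters once a folder holds thousands of files.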
Each media ID generates a comprehensive JSON file:
{
"client": "anilist",
"id": 21,
"type": "anime",
"data": {
"id": 21,
"type": "ANIME",
"title": {
"romaji": "One Piece",
"english": "One Piece",
"native": "ONE PIECE"
},
"synonyms": ["OP"],
"format": "TV",
"status": "RELEASING",
"episodes": 1000,
"startDate": {"year": 1999, "month": 10, "day": 20},
"genres": ["Action", "Adventure", "Comedy", "Drama"],
"coverImage": {"large": "https://..."},
"idMal": 21
},
"mappings": {
"anilist": 21,
"mal": 21,
"kitsu": 1,
"simkl": 4324,
"anidb": 236
},
"crossReferences": [
{"client": "mal", "id": 21},
{"client": "kitsu", "id": 1},
{"client": "simkl", "id": 4324},
{"client": "anidb", "id": 236}
],
"crossClientCount": 4,
"mediaKey": "onepiece-1999-anime",
"catalogIndex": 1245,
"totalInCatalog": 15420,
"lastUpdated": "2024-01-15T10:30:00.000Z"
}
The GitHub Actions workflow now runs:
- Schedule: Every Sunday at 3 AM UTC (weekly, not daily due to large data volume)
- Timeout: 6 hours for complete catalog synchronization
- Operations: Full sync, resume, statistics generation
- Artifacts: Complete catalogs, statistics, and analysis reports
You can manually trigger the workflow with custom parameters:
- Operation: full, resume, or stats
- Client: Specific client or all
- Type: anime, manga, or all
Add these secrets to your GitHub repository:
- Go to Settings → Secrets and variables → Actions
- Add the following secrets:
  - ANILIST_CLIENT_ID
  - MAL_CLIENT_ID
  - SIMKL_CLIENT_ID
  - KITSU_CLIENT_ID
  - ANIDB_CLIENT_ID
  - TRAKT_CLIENT_ID
  - TRAKT_CLIENT_SECRET
  - THETVDB_API_KEY
  - TMDB_API_KEY
npm run catalog-stats
Provides:
- Total media items per client
- Storage usage analysis
- Mapping coverage percentages
- Cross-client match statistics
- File count and size analysis
npm run mapping-stats
Provides:
- Mapping pattern analysis
- Conflict detection
- Unmapped item identification
- Improvement recommendations
- Quality assessment
- Real-time Progress: progress.json shows current synchronization status
- Resumable Operations: Interrupt and resume large downloads
- Error Recovery: Automatic retry with exponential backoff
- Statistics Logging: Detailed logs for monitoring and debugging
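One way resumable operations like these could work is a small checkpoint file that is rewritten after each completed page. The sketch below is an assumption about the approach, not the project's implementation: the fields `client`, `lastCompletedPage`, and `itemsSaved` are hypothetical, and the real progress.json schema may differ.

```javascript
const fs = require('node:fs');

// Assumed checkpoint shape; the real progress.json schema may differ.
const EMPTY_PROGRESS = { client: null, lastCompletedPage: 0, itemsSaved: 0 };

// Returns the saved checkpoint, or a fresh one if no file exists yet.
function loadProgress(file) {
  if (!fs.existsSync(file)) return { ...EMPTY_PROGRESS };
  return { ...EMPTY_PROGRESS, ...JSON.parse(fs.readFileSync(file, 'utf8')) };
}

// Write to a temp file first, then rename, so an interrupted run
// does not leave a half-written checkpoint behind.
function saveProgress(file, progress) {
  const tmp = `${file}.tmp`;
  const payload = { ...progress, updatedAt: new Date().toISOString() };
  fs.writeFileSync(tmp, JSON.stringify(payload, null, 2));
  fs.renameSync(tmp, file);
}

// Resume point: the page after the last one that was fully saved.
function nextPage(progress) {
  return progress.lastCompletedPage + 1;
}
```

A sync loop would call `saveProgress` only after every item on a page has been written to disk, so that resuming from `nextPage(loadProgress(file))` never skips data.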
- Time: Complete catalog sync takes several hours
- Storage: Requires significant disk space (several GB)
- API Limits: Respects rate limits with built-in delays
- Network: Large data transfer - ensure stable connection
The system includes comprehensive rate limiting:
- Default delay: 3 seconds between requests
- Max retries: 5 attempts per request
- Exponential backoff: Increases delay on retries
- Batch processing: Processes items in batches to avoid memory issues
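The retry and backoff behaviour described above might look roughly like the following. This is a minimal sketch under stated assumptions: the doubling schedule, the helper names `backoffDelay` and `withRetry`, and reading the settings directly from `process.env` are all illustrative and may not match the project's code. It treats MAX_RETRIES as the total number of attempts, following the wording "5 attempts per request".

```javascript
// Defaults mirror the values documented in this README; reading them from
// process.env here is an assumption about how the project loads its config.
const REQUEST_DELAY = Number(process.env.REQUEST_DELAY) || 3000;
const MAX_RETRIES = Number(process.env.MAX_RETRIES) || 5;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Delay after failed attempt number `attempt` (1-based): base, 2x base, 4x base, ...
function backoffDelay(attempt, baseDelay = REQUEST_DELAY) {
  return baseDelay * 2 ** (attempt - 1);
}

// Hypothetical wrapper: runs `task` and retries on failure, giving up
// once `maxRetries` attempts in total have been made.
async function withRetry(task, { maxRetries = MAX_RETRIES, baseDelay = REQUEST_DELAY } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      await sleep(backoffDelay(attempt, baseDelay));
    }
  }
}
```

With the default 3-second base, the waits between attempts would be 3, 6, 12, and 24 seconds. A production version would likely also honour any retry hints an API returns, which this sketch ignores.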
Configure in .env:
REQUEST_DELAY=3000
MAX_RETRIES=5
- Batch Processing: Processes catalogs in batches
- Streaming: Large datasets are processed incrementally
- Cleanup: Automatic cleanup of temporary data
- Monitoring: Memory usage tracking and optimization
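To illustrate the batch processing idea above, here is a small generator that hands out fixed-size slices of a larger collection so that only one batch needs to be held at a time. The name `inBatches` is hypothetical; the project's own batching lives in catalog-fetcher.js and may work differently.

```javascript
// Hypothetical helper: yields arrays of at most `batchSize` items
// from any iterable, in the original order.
function* inBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  let batch = [];
  for (const item of items) {
    batch.push(item);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly shorter, batch.
  if (batch.length > 0) yield batch;
}

// Example: process five media IDs two at a time.
for (const batch of inBatches([21, 30, 16498, 52531, 1], 2)) {
  console.log(batch);
}
```

Because the generator accepts any iterable, the same helper could wrap a lazily produced stream of IDs rather than a fully loaded array.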
const CatalogSynchronizer = require('./scripts/catalog-synchronizer');
const synchronizer = new CatalogSynchronizer();
// Sync specific clients only
// (run this inside an async function: CommonJS modules have no top-level await)
await synchronizer.synchronizeSpecificClients(['anilist', 'mal'], 'anime');
const CompleteCatalogMapper = require('./complete-catalog-mapper');
const mapper = new CompleteCatalogMapper();
// Find items with incomplete mappings
const incomplete = mapper.findUnmappedItems();
// Analyze mapping patterns
const patterns = mapper.analyzeMappingPatterns();
// Generate custom reports
const report = mapper.generateCustomReport();
// Export complete mapping database
const exportData = mapper.exportAllMappings();
// Import existing mappings
await mapper.importMappings(existingMappings);
- Rate Limiting: Increase REQUEST_DELAY in .env
- Memory Issues: Reduce batchSize in catalog-fetcher.js
- Network Errors: Check internet connection and API key validity
- Disk Space: Ensure sufficient storage for large catalogs
# Resume interrupted sync
npm run resume
# Check progress
cat progress.json
# Regenerate statistics
npm run catalog-stats
npm run mapping-stats
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details.
For issues and questions:
- Create an issue on GitHub
- Check the documentation
- Review existing issues
- Check sync logs and statistics
- NEW: Complete catalog fetching from all clients
- NEW: Resumable synchronization with progress tracking
- NEW: Advanced analytics and mapping analysis
- NEW: Enhanced GitHub Actions workflow
- NEW: Conflict detection and resolution
- NEW: Comprehensive statistics and monitoring
- Improved: Rate limiting and error handling
- Improved: Memory management and performance
- Initial release
- Support for AniList, MAL, Simkl, Kitsu
- Basic GitHub Actions automation
- JSON file generation
- ID mapping system