A modern Python application for monitoring Reddit threads, downloading media content, and storing everything in a local SQLite database. It is built on an asynchronous architecture, with a background scheduler that runs synchronization at regular intervals.
- Automated Scheduling: Background scheduler with configurable intervals for continuous monitoring
- Thread Monitoring: Automatically tracks specified Reddit threads and subreddits
- Enhanced Metrics: Collects upvotes, comment counts, and engagement statistics
- Media Download: Downloads images, videos, and other media content from posts
- SQLite Storage: Stores all data in a structured SQLite database with SQLAlchemy ORM
- Concurrent Processing: Supports concurrent media downloads with configurable limits
- OAuth2 Authentication: Secure Reddit API access using refresh tokens
- Web Interface: Modern web interface for browsing downloaded content with metrics
- Error Handling: Robust error handling with exponential backoff retry logic
- Content Filtering: Duplicate detection and content validation
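A concurrent-download limit such as MAX_CONCURRENT_DOWNLOADS (default 5) is commonly enforced with an `asyncio.Semaphore`. The sketch below is an illustration under assumptions, not code from app/media_downloader.py. `download_all` is a hypothetical name, and the injected `fetch` coroutine is a hypothetical stand-in for whatever HTTP client the project actually uses.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def download_all(
    urls: list[str],
    fetch: Callable[[str], Awaitable[bytes]],
    max_concurrent: int = 5,
) -> list[bytes]:
    """Download every URL with at most `max_concurrent` requests in flight.

    `fetch` is a hypothetical coroutine that downloads one URL and returns
    its bytes. A real downloader would also stream the body and abort once
    MAX_MEDIA_SIZE is exceeded; that part is omitted here.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded_fetch(url: str) -> bytes:
        # Each task waits here until one of the `max_concurrent` slots is free.
        async with semaphore:
            return await fetch(url)

    # gather() runs the tasks concurrently and returns results in input order.
    return await asyncio.gather(*(bounded_fetch(url) for url in urls))
```

Usage would look like `asyncio.run(download_all(urls, fetch, max_concurrent=5))`. Creating one task per URL and letting the semaphore throttle them keeps the code simple; for very large batches, a fixed pool of worker tasks reading from an `asyncio.Queue` avoids creating thousands of waiting tasks.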
Requirements:
- Python 3.11 or higher
- Reddit API credentials (client ID and secret)
- Clone the repository:
git clone https://github.com/yourusername/RedditSync.git
cd RedditSync
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Set up Reddit API credentials:
# Run the interactive token setup script
python tools/1_get_refresh_token.py --save
# Verify your environment configuration
python tools/2_check_env.py
- Database migration (if upgrading from an older version):
# If you have an existing database, migrate it to the new schema
python tools/migrate_add_metrics.py
# Test the scheduler implementation (optional)
python tools/test_scheduler.py
Create a .env file in the project root based on env.example:
# Required Reddit API settings
REDDIT_CLIENT_ID=your_client_id_here
REDDIT_CLIENT_SECRET=your_client_secret_here
REDDIT_USER_AGENT=python:redditsync:v1.0 (by /u/yourusername)
REDDIT_REFRESH_TOKEN=your_refresh_token_here
# Optional configuration (with defaults)
DB_PATH=db.sqlite
MEDIA_DIR=media
MAX_MEDIA_SIZE=52428800 # 50MB
MAX_CONCURRENT_DOWNLOADS=5
REDIRECT_PORT=8000

# Run the main sync application with scheduler
python app/main.py
# Or run the web interface (optional)
cd web && python app.py

RedditSync runs with a background scheduler:
- Initial Sync: Starts 10 seconds after application launch
- Regular Sync: Runs every 2 minutes and processes up to 5 new posts per run
- Automatic Database Setup: Creates tables automatically on first run
- Metrics Update: Automatically updates scores and comment counts for existing posts
The scheduler prevents overlapping tasks and includes rate limiting to respect Reddit's API guidelines.
RedditSync/
├── app/ # Core application modules
│   ├── __init__.py # Package initialization
│   ├── main.py # Application entry point with scheduler
│   ├── config.py # Configuration management
│   ├── db.py # Database operations (SQLAlchemy ORM)
│   ├── models.py # Database models (SQLAlchemy)
│   ├── reddit_client.py # Reddit API client
│   ├── media_downloader.py # Media download functionality
│   ├── sync_worker.py # Synchronization orchestration
│   └── utils.py # Utility functions
├── docs/ # Documentation
│   ├── spec.md # Technical specification
│   ├── get_token.md # OAuth2 setup guide
│   └── CODE_STYLE_EN.md # Code style guidelines
├── tools/ # Utility scripts
│   ├── 1_get_refresh_token.py # OAuth2 token generator
│   └── 2_check_env.py # Environment validator
├── web/ # Web interface (optional)
│   ├── app.py # Flask web application
│   └── templates/ # HTML templates
├── media/ # Downloaded media files (created automatically)
├── requirements.txt # Python dependencies
├── env.example # Environment template
└── README.md # This file
The application uses SQLAlchemy 2.0 with async support as the ORM (Object-Relational Mapping) layer:
- Database Engine: SQLite with aiosqlite for async operations
- ORM: SQLAlchemy 2.0 with modern async/await syntax
- Models: Fully typed SQLAlchemy models with relationships
- Migrations: Built-in migration tool for upgrading from raw SQL
The application uses three main tables:
- subscriptions: List of monitored Reddit threads/subreddits
- news: Posts and comments with metadata and content
- media: Downloaded media files with metadata and references
The ORM models provide:
- Type safety with Python type hints
- Relationship mapping between tables
- Automatic SQL query generation
- Connection pooling and session management
| Variable | Description | Default |
|---|---|---|
| REDDIT_CLIENT_ID | Reddit API client ID | Required |
| REDDIT_CLIENT_SECRET | Reddit API client secret | Required |
| REDDIT_USER_AGENT | User agent string for API requests | Required |
| REDDIT_REFRESH_TOKEN | OAuth2 refresh token | Required |
| DB_PATH | SQLite database file path | db.sqlite |
| MEDIA_DIR | Directory for downloaded media | media |
| MAX_MEDIA_SIZE | Maximum file size for downloads (bytes) | 52428800 (50MB) |
| MAX_CONCURRENT_DOWNLOADS | Concurrent download limit | 5 |
| REDIRECT_PORT | Port for OAuth2 redirect | 8000 |
- Network errors: Automatic retry with exponential backoff
- Rate limiting: Respects Reddit API rate limits
- Large files: Configurable size limits to prevent storage issues
- Duplicate detection: Prevents duplicate downloads and storage
- Logging: Comprehensive logging for debugging and monitoring
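Retrying with exponential backoff means waiting roughly 1, 2, 4, 8, ... seconds between attempts, up to a cap. The minimal asyncio sketch below is not the project's own retry code: `retry_with_backoff` is a hypothetical name, and treating `ConnectionError` and `TimeoutError` as the retryable errors is an assumption. It also applies random "full jitter" so that many clients do not retry in lockstep.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call `operation`, retrying up to `retries` times (retries >= 0) on `retry_on` errors."""
    for attempt in range(retries + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Exponential ceiling: base_delay * 1, 2, 4, 8, ... capped at max_delay.
            ceiling = min(max_delay, base_delay * 2**attempt)
            # Full jitter: sleep a random time between 0 and the ceiling.
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable when retries >= 0")
```

Errors outside `retry_on` (for example a 404 mapped to a custom exception) propagate immediately, since retrying them would only waste requests against Reddit's rate limit.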
This project follows PEP 8 standards with a 99-character line limit. See docs/CODE_STYLE_EN.md for detailed guidelines.
# Install development dependencies
pip install -r requirements-dev.txt
# Run tests
python -m pytest
# Run with coverage
python -m pytest --cov=app
- Fork the repository
- Create a feature branch
- Follow the code style guidelines
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Documentation: Check the docs/ directory for detailed guides
- Issues: Report bugs and feature requests via GitHub Issues
- Reddit API: Refer to the Reddit API documentation for API-related questions
- SQLAlchemy ORM integration
- Background scheduler with automated tasks
- Enhanced metrics collection (scores, comments)
- Notifications for new content
- Advanced content filtering and categorization
- Multi-subreddit batch operations
- Export functionality (JSON, CSV)
- Docker containerization
- RESTful API for external integrations
- Database performance optimizations and indexing