Reddit Stash is a Python script that helps you effortlessly back up your Reddit saved, posted, and upvoted posts and comments to Dropbox or your local machine. Using GitHub Actions, the script runs every 2 hours during peak hours and twice during off-peak hours, automating the archiving of your Reddit data to Dropbox after a simple setup.
When Reddit Stash runs successfully, your saved content is organized by subreddit in a clean folder structure and stored as markdown files:
reddit/
├── r_AskReddit/
│ ├── POST_abcd123.md # Your posted content
│ ├── COMMENT_efgh456.md # Your comments
│ ├── SAVED_POST_xyz789.md # Posts you saved
│ └── SAVED_COMMENT_def012.md # Comments you saved
├── r_ProgrammerHumor/
│ ├── UPVOTE_POST_ghi345.md # Posts you upvoted
│ ├── UPVOTE_COMMENT_mno901.md # Comments you upvoted
│ └── GDPR_POST_jkl678.md # From GDPR export (if enabled)
├── gdpr_data/ # GDPR CSV files (if processing enabled)
│ ├── saved_posts.csv
│ └── saved_comments.csv
└── file_log.json # Tracks processed items
Each post and comment is formatted with:
- Original title and content
- Author information
- Post/comment URL
- Timestamp
- Subreddit details
- Any images or links from the original post
When Reddit Stash completes processing, you'll receive detailed storage information:
Processing completed. 150 items processed, 25 items skipped.
Markdown file storage: 12.45 MB
Media file storage: 89.32 MB
Total combined storage: 101.77 MB
This gives you clear visibility into:
- Text content storage: How much space your saved posts and comments use
- Media storage: How much space downloaded images and videos use
- Total storage: Combined space used for your complete Reddit archive
- What You Get
- How It Works
- Quick Start
- Key Features
- Why Use Reddit Stash
- Setup
- Configuration
- Docker Environment Variables
- Alternative Scheduling: External Cron Setup
- Important Notes
- File Organization and Utilities
- Frequently Asked Questions
- Troubleshooting
- Security Considerations
- Contributing
- Acknowledgement
- Project Status
- License
graph LR
A[Reddit API] -->|Fetch Content| B[Reddit Stash Script]
B -->|Save as Markdown| C[Local Storage]
B -->|Check Settings| D{Save Type}
D -->|SAVED| E[Saved Posts/Comments]
D -->|ACTIVITY| F[User Posts/Comments]
D -->|UPVOTED| G[Upvoted Content]
D -->|ALL| H[All Content Types]
C -->|Optional| I[Dropbox Upload]
J[GDPR Export] -->|Optional| B
1. Data Collection:
   - The script connects to Reddit's API to fetch your saved, posted, or upvoted content
   - Optionally, it can process your GDPR export data for a complete history
2. Processing & Organization:
   - Content is processed based on your settings (SAVED, ACTIVITY, UPVOTED, or ALL)
   - Files are organized by subreddit in a clean folder structure
   - A log file tracks all processed items to avoid duplicates (see the sketch below)
3. Storage Options:
   - Local storage: Content is saved as markdown files on your machine
   - Cloud storage: Optional integration with Dropbox for backup
4. Deployment Methods:
   - GitHub Actions: Fully automated with scheduled runs and Dropbox integration
   - Local Installation: Run manually or schedule with cron jobs on your machine
   - Docker: Run in a containerized environment with optional volume mounts
The script is designed to be flexible, allowing you to choose how you collect, process, and store your Reddit content.
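To make the folder organization and deduplication concrete, here is a minimal, illustrative sketch (not the project's actual code) of how a target path could be derived and checked against `file_log.json` before writing. The log key format shown is an assumption for illustration only:

```python
import json
from pathlib import Path

SAVE_DIR = Path("reddit")
LOG_PATH = SAVE_DIR / "file_log.json"

def load_log() -> dict:
    """Return the processed-items log, or an empty dict on the first run."""
    if LOG_PATH.exists():
        return json.loads(LOG_PATH.read_text())
    return {}

def target_path(subreddit: str, item_id: str, prefix: str) -> Path:
    """Build reddit/r_<subreddit>/<PREFIX>_<id>.md, creating folders as needed."""
    folder = SAVE_DIR / f"r_{subreddit}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{prefix}_{item_id}.md"

log = load_log()
path = target_path("AskReddit", "abcd123", "SAVED_POST")
key = "SAVED_POST_abcd123"  # illustrative key; the real log schema may differ
if key not in log:
    path.write_text("# (markdown content rendered from the Reddit item)\n")
    log[key] = {"path": str(path)}
    LOG_PATH.write_text(json.dumps(log, indent=2))
```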
For those who want to get up and running quickly, here's a streamlined process:
- Fork this repository.
- Set up the required secrets in your GitHub repository:
  - From Reddit: `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET`, `REDDIT_USERNAME`, `REDDIT_PASSWORD`
  - From Dropbox: `DROPBOX_APP_KEY`, `DROPBOX_APP_SECRET`, `DROPBOX_REFRESH_TOKEN`
- Manually trigger the workflow from the Actions tab.
- Clone the repository:
  git clone https://github.com/YOUR_USERNAME/reddit-stash.git
  cd reddit-stash
  Replace `YOUR_USERNAME` with your GitHub username (or `rhnfzl` if using the original repository).
- Install dependencies:
pip install -r requirements.txt
- Set up your environment variables and run:
python reddit_stash.py
- Build the Docker image:
docker build -t reddit-stash .
- Run with your environment variables:
Note: IMGUR_CLIENT_ID and IMGUR_CLIENT_SECRET are optional. Imgur API registration is permanently closed, so only users with existing Imgur applications can use them.
docker run -it \
  -e REDDIT_CLIENT_ID=your_client_id \
  -e REDDIT_CLIENT_SECRET=your_client_secret \
  -e REDDIT_USERNAME=your_username \
  -e REDDIT_PASSWORD=your_password \
  -e DROPBOX_APP_KEY=your_dropbox_key \
  -e DROPBOX_APP_SECRET=your_dropbox_secret \
  -e DROPBOX_REFRESH_TOKEN=your_dropbox_token \
  -e IMGUR_CLIENT_ID=your_imgur_client_id \
  -e IMGUR_CLIENT_SECRET=your_imgur_client_secret \
  -v $(pwd)/reddit:/app/reddit \
  reddit-stash
For detailed setup instructions, continue reading the Setup section.
Feature | GitHub Actions | Local Installation | Docker |
---|---|---|---|
Ease of Setup | ⭐⭐⭐ (Easiest) | ⭐⭐ | ⭐⭐ |
Automation | ✅ Runs on schedule | ✅ Manual control or cron | ✅ Built-in scheduling support |
Requirements | GitHub account | Python 3.10-3.12 | Docker |
Data Storage | Dropbox required | Local or Dropbox | Local or Dropbox |
Maintenance | Minimal | More hands-on | Low to Medium |
Privacy | Credentials in GitHub secrets | Credentials on local machine | Credentials in container |
Best For | Set & forget users | Power users with customization needs | Containerized environments & flexible scheduling |
- 🤖 Automated Reddit Backup: Automatically retrieves your saved posts and comments from Reddit, and can also back up your own posts and comments.
- 🔄 Flexible Storage Options: Choose what to save (all activity or only saved items) via `settings.ini`.
- 📦 Dropbox Integration: Uploads the files to Dropbox for backup and can download them back.
- 📝 Markdown Support: Saves the content as markdown files.
- 🔍 File Deduplication: Uses intelligent file existence checking to avoid re-downloading content.
- ⏱️ Rate Limit Management: Implements dynamic sleep timers to respect Reddit's API rate limits (see the sketch after this list).
- 🔒 GDPR Data Processing: Optional processing of Reddit's GDPR export data.
- 🖼️ Enhanced Media Downloads: Download images, videos, and other media with dramatically improved success rates (~80% vs previous ~10%), featuring intelligent fallback systems and modern web compatibility.
- 🔄 Content Recovery System: 4-provider cascade for failed downloads (Wayback Machine, PullPush.io, Reddit Previews, Reveddit) with SQLite caching and automatic retry across runs.
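The dynamic rate-limit handling mentioned above can be pictured roughly like this. This is a hypothetical sketch using PRAW's `reddit.auth.limits`; the project's actual timer logic may differ:

```python
import time
import praw

def polite_sleep(reddit: praw.Reddit, min_remaining: int = 10) -> None:
    """Sleep until the rate-limit window resets when few requests remain."""
    limits = reddit.auth.limits  # {'remaining': ..., 'reset_timestamp': ..., 'used': ...}
    remaining = limits.get("remaining")
    reset_ts = limits.get("reset_timestamp")
    if remaining is not None and reset_ts is not None and remaining < min_remaining:
        time.sleep(max(0.0, reset_ts - time.time()))

reddit = praw.Reddit(
    client_id="your_client_id",
    client_secret="your_client_secret",
    username="your_username",
    password="your_password",
    user_agent="reddit-stash-example",
)
for item in reddit.user.me().saved(limit=None):
    polite_sleep(reddit)
    # ...process the saved submission or comment here...
```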
Reddit Stash was designed with specific use cases in mind:
Reddit only shows your most recent 1000 saved posts. With Reddit Stash, you can save everything and go beyond this limitation.
Many users save technical posts, tutorials, or valuable discussions on Reddit. Reddit Stash helps you build a searchable archive of this knowledge.
Reddit posts and comments can be deleted by their authors or removed by moderators. Reddit Stash preserves this content in your personal archive.
All of your saved posts are available locally in markdown format, making them easily accessible even without an internet connection.
Since content is saved in markdown, you can easily import it into note-taking systems like Obsidian, Notion, or any markdown-compatible tool.
Beyond text, Reddit Stash can download and preserve images, videos, and other media from posts, ensuring you have complete archives even if external hosting services go offline.
- ✅ Python 3.10-3.12 (Python 3.12 recommended for best performance)
- 🔑 Reddit API credentials
- 📊 A Dropbox account with an API token
Before proceeding with any installation method, ensure that you have set the Reddit environment variables. Follow the Reddit API guide to create a Reddit app and obtain the necessary credentials.
Note: The following process requires the Dropbox App setup. The GitHub Actions workflow runs the script every 2 hours during peak hours (6:00-21:00 UTC) and twice during off-peak hours (23:00 and 3:00 UTC), uploading the files to Dropbox. The workflow is defined in .github/workflows/reddit_scraper.yml.
1. Fork this repository.
2. Set Up Secrets:
   - Go to your forked repository's Settings > Secrets and variables > Actions > Click on New repository secret.
   - Add the following secrets individually:
     - `REDDIT_CLIENT_ID`
     - `REDDIT_CLIENT_SECRET`
     - `REDDIT_USERNAME`
     - `REDDIT_PASSWORD`
   - For Dropbox setup:
     - `DROPBOX_APP_KEY`
     - `DROPBOX_APP_SECRET`
     - `DROPBOX_REFRESH_TOKEN`
   - For enhanced media downloads (optional; Imgur API registration is permanently closed):
     - `IMGUR_CLIENT_ID` (only if you already have an existing Imgur application)
     - `IMGUR_CLIENT_SECRET` (only if you already have an existing Imgur application)
   - Enter the respective secret values without any quotes.
3. Manually Trigger the Workflow:
   - Go to the Actions tab > Select the Reddit Stash Scraper from the list on the left > Click Run workflow > Select the branch `main` > Click the green Run workflow button.
   - The workflow will then be triggered, and you can monitor its progress in the Actions tab. Upon successful completion, you should see the Reddit folder in your Dropbox.
4. The workflow runs automatically on a schedule:
   - Every 2 hours during peak hours (6:00-21:00 UTC)
   - Twice during off-peak hours (23:00 and 3:00 UTC)
   - You can adjust these times in the workflow file to match your timezone if needed.
5. Additional Workflows: The repository includes automated workflows for maintenance and testing:
   - `python-compatibility.yml`: Tests compatibility across Python versions 3.10-3.12
1. Clone this repository:
   git clone https://github.com/YOUR_USERNAME/reddit-stash.git
   cd reddit-stash
   Replace `YOUR_USERNAME` with your GitHub username (or `rhnfzl` if using the original repository).
2. Install the required Python packages:
   pip install -r requirements.txt
3. Complete the Dropbox App setup. Skip it if you don't want to set up Dropbox and only want to save the files locally on your system.
4. Edit the settings.ini file; see the Configuration section below for how to.
5. Set Environment Variables (Optional but preferred):
For macOS and Linux:
export REDDIT_CLIENT_ID='your_client_id'
export REDDIT_CLIENT_SECRET='your_client_secret'
export REDDIT_USERNAME='your_username'
export REDDIT_PASSWORD='your_password'
# Optional, if you need Dropbox locally
export DROPBOX_APP_KEY='dropbox-app-key'
export DROPBOX_APP_SECRET='dropbox-secret-key'
export DROPBOX_REFRESH_TOKEN='dropbox-refresh-token'
# Optional, for enhanced Imgur downloading (if you have existing API access)
export IMGUR_CLIENT_ID='your_imgur_client_id'
export IMGUR_CLIENT_SECRET='your_imgur_client_secret'
For Windows (Command Prompt; omit quotes so they are not stored as part of the value):
set REDDIT_CLIENT_ID=your_client_id
set REDDIT_CLIENT_SECRET=your_client_secret
set REDDIT_USERNAME=your_username
set REDDIT_PASSWORD=your_password
REM Optional, if you need Dropbox locally
set DROPBOX_APP_KEY=dropbox-app-key
set DROPBOX_APP_SECRET=dropbox-secret-key
set DROPBOX_REFRESH_TOKEN=dropbox-refresh-token
REM Optional, for enhanced Imgur downloading (if you have existing API access)
set IMGUR_CLIENT_ID=your_imgur_client_id
set IMGUR_CLIENT_SECRET=your_imgur_client_secret
You can verify the setup with:
echo $REDDIT_CLIENT_ID
echo $REDDIT_CLIENT_SECRET
echo $REDDIT_USERNAME
echo $REDDIT_PASSWORD
echo $DROPBOX_APP_KEY
echo $DROPBOX_APP_SECRET
echo $DROPBOX_REFRESH_TOKEN
echo $IMGUR_CLIENT_ID
echo $IMGUR_CLIENT_SECRET
6. Usage:
- First-time setup:
python reddit_stash.py
To upload to Dropbox (optional):
python dropbox_utils.py --upload
- Subsequent runs, as per your convenience:
- Download from Dropbox (optional):
python dropbox_utils.py --download
- Process Reddit saved items:
python reddit_stash.py
- Upload to Dropbox (optional):
python dropbox_utils.py --upload
🐳 Pre-built Images Available! No build required - pull and run directly from GitHub Container Registry.
📁 Important: File Storage Location
When running via Docker, files are downloaded to /app/reddit/ inside the container. Without a volume mount, these files are lost when the container stops.
To persist your downloaded Reddit content, you must mount a volume:
- Named Volume (recommended): `-v reddit-data:/app/reddit`. Docker manages the storage, and data persists across container restarts.
- Bind Mount (direct access): `-v $(pwd)/reddit:/app/reddit`. Files are stored directly on your host machine at `./reddit/`.
- NAS Path: `-v /volume1/reddit-stash:/app/reddit`. Store the data on your NAS at a specific location.
Pull pre-built multi-platform images (AMD64/ARM64) from GitHub Container Registry:
# Pull the latest stable image
docker pull ghcr.io/rhnfzl/reddit-stash:latest
# Run with your credentials
docker run -d \
--name reddit-stash \
-v reddit-data:/app/reddit \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
ghcr.io/rhnfzl/reddit-stash:latest
Available Image Tags:
Tag | Description | Use Case |
---|---|---|
`latest` | Latest stable from main branch | Production deployments |
`develop` | Development version | Testing new features |
`py3.10-latest`, `py3.11-latest`, `py3.12-latest` | Python-specific versions | Specific Python requirements |
`v1.0.0` | Semantic version tags | Version pinning |
`sha-abc123` | Commit-specific builds | Reproducible deployments |
Platform Support:
- ✅ AMD64 (x86_64) - Standard x86 systems
- ✅ ARM64 - Raspberry Pi, ARM-based NAS devices
NAS/HomeLab Compatibility:
- Synology DSM (Container Manager)
- QNAP (Container Station)
- TrueNAS SCALE
- unRAID (Community Applications)
- OpenMediaVault (Docker plugin)
- Proxmox (LXC/Docker)
- Portainer, Yacht, CasaOS, Dockge
Periodic execution with pre-built image:
docker run -d \
--name reddit-stash \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e SCHEDULE_MODE='periodic' \
-e SCHEDULE_INTERVAL='7200' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
Docker Compose is the recommended deployment method for NAS and HomeLab environments. It provides easier management, configuration persistence, and works seamlessly with Portainer, Yacht, CasaOS, and other GUI tools.
1. Create docker-compose.yml
Basic Setup (Local-only with Periodic Execution):
version: '3.8'
services:
reddit-stash:
image: ghcr.io/rhnfzl/reddit-stash:latest
container_name: reddit-stash
restart: unless-stopped
environment:
- REDDIT_CLIENT_ID=${REDDIT_CLIENT_ID}
- REDDIT_CLIENT_SECRET=${REDDIT_CLIENT_SECRET}
- REDDIT_USERNAME=${REDDIT_USERNAME}
- REDDIT_PASSWORD=${REDDIT_PASSWORD}
- SCHEDULE_MODE=periodic
- SCHEDULE_INTERVAL=7200
volumes:
- reddit-data:/app/reddit
volumes:
reddit-data:
2. Create .env file
Create a `.env` file in the same directory as your `docker-compose.yml`:
REDDIT_CLIENT_ID=your_client_id_here
REDDIT_CLIENT_SECRET=your_client_secret_here
REDDIT_USERNAME=your_reddit_username
REDDIT_PASSWORD=your_reddit_password
3. Run with Docker Compose
# Start the service
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the service
docker-compose down
# Update to latest image
docker-compose pull && docker-compose up -d
With Dropbox Sync:
version: '3.8'
services:
reddit-stash:
image: ghcr.io/rhnfzl/reddit-stash:latest
container_name: reddit-stash
restart: unless-stopped
environment:
- REDDIT_CLIENT_ID=${REDDIT_CLIENT_ID}
- REDDIT_CLIENT_SECRET=${REDDIT_CLIENT_SECRET}
- REDDIT_USERNAME=${REDDIT_USERNAME}
- REDDIT_PASSWORD=${REDDIT_PASSWORD}
- DROPBOX_APP_KEY=${DROPBOX_APP_KEY}
- DROPBOX_APP_SECRET=${DROPBOX_APP_SECRET}
- DROPBOX_REFRESH_TOKEN=${DROPBOX_REFRESH_TOKEN}
- SCHEDULE_MODE=periodic
- SCHEDULE_INTERVAL=7200
volumes:
- reddit-data:/app/reddit
volumes:
reddit-data:
Add to your `.env`:
DROPBOX_APP_KEY=your_dropbox_app_key
DROPBOX_APP_SECRET=your_dropbox_app_secret
DROPBOX_REFRESH_TOKEN=your_dropbox_refresh_token
With Imgur API (for Better Rate Limits):
version: '3.8'
services:
reddit-stash:
image: ghcr.io/rhnfzl/reddit-stash:latest
container_name: reddit-stash
restart: unless-stopped
environment:
- REDDIT_CLIENT_ID=${REDDIT_CLIENT_ID}
- REDDIT_CLIENT_SECRET=${REDDIT_CLIENT_SECRET}
- REDDIT_USERNAME=${REDDIT_USERNAME}
- REDDIT_PASSWORD=${REDDIT_PASSWORD}
- IMGUR_CLIENT_ID=${IMGUR_CLIENT_ID}
- SCHEDULE_MODE=periodic
- SCHEDULE_INTERVAL=7200
volumes:
- reddit-data:/app/reddit
volumes:
reddit-data:
Add to your `.env`:
IMGUR_CLIENT_ID=your_imgur_client_id
Full-Featured Example:
version: '3.8'
services:
reddit-stash:
image: ghcr.io/rhnfzl/reddit-stash:latest
container_name: reddit-stash
restart: unless-stopped
environment:
# Reddit credentials
- REDDIT_CLIENT_ID=${REDDIT_CLIENT_ID}
- REDDIT_CLIENT_SECRET=${REDDIT_CLIENT_SECRET}
- REDDIT_USERNAME=${REDDIT_USERNAME}
- REDDIT_PASSWORD=${REDDIT_PASSWORD}
# Dropbox sync (optional)
- DROPBOX_APP_KEY=${DROPBOX_APP_KEY}
- DROPBOX_APP_SECRET=${DROPBOX_APP_SECRET}
- DROPBOX_REFRESH_TOKEN=${DROPBOX_REFRESH_TOKEN}
# Imgur API (optional - for better rate limits)
- IMGUR_CLIENT_ID=${IMGUR_CLIENT_ID}
# Scheduling
- SCHEDULE_MODE=periodic
- SCHEDULE_INTERVAL=7200
volumes:
- reddit-data:/app/reddit
# Optional: mount custom settings.ini
# - ./settings.ini:/app/settings.ini:ro
deploy:
resources:
limits:
cpus: '1.0'
memory: 512M
volumes:
reddit-data:
Using with Portainer:
- Go to Stacks → Add stack
- Name your stack (e.g., `reddit-stash`)
- Paste one of the compose examples above
- Scroll down to Environment variables and add your credentials:
  - `REDDIT_CLIENT_ID` = your_client_id
  - `REDDIT_CLIENT_SECRET` = your_client_secret
  - `REDDIT_USERNAME` = your_username
  - `REDDIT_PASSWORD` = your_password
  - (Add others as needed)
- Click Deploy the stack
Bind Mounts for Specific Paths:
If you prefer to use a specific directory instead of a named volume, use bind mounts:
volumes:
# Named volume (recommended)
- reddit-data:/app/reddit
# OR bind mount to specific path
# Synology:
# - /volume1/docker/reddit-stash:/app/reddit
# unRAID:
# - /mnt/user/appdata/reddit-stash:/app/reddit
# Generic:
# - ./reddit:/app/reddit
With Dropbox Sync (Single Run):
docker run --rm \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e DROPBOX_APP_KEY='your_dropbox_key' \
-e DROPBOX_APP_SECRET='your_dropbox_secret' \
-e DROPBOX_REFRESH_TOKEN='your_dropbox_token' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
With Dropbox Sync (Periodic):
docker run -d \
--name reddit-stash \
--restart unless-stopped \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e DROPBOX_APP_KEY='your_dropbox_key' \
-e DROPBOX_APP_SECRET='your_dropbox_secret' \
-e DROPBOX_REFRESH_TOKEN='your_dropbox_token' \
-e SCHEDULE_MODE='periodic' \
-e SCHEDULE_INTERVAL='7200' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
With Imgur API for Better Rate Limits:
docker run -d \
--name reddit-stash \
--restart unless-stopped \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e IMGUR_CLIENT_ID='your_imgur_client_id' \
-e SCHEDULE_MODE='periodic' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
With 2FA Authentication:
If you use Reddit's Two-Factor Authentication, append your 6-digit code to your password:
docker run -d \
--name reddit-stash \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password:123456' \
-e SCHEDULE_MODE='periodic' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
With Custom settings.ini:
docker run -d \
--name reddit-stash \
--restart unless-stopped \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e SCHEDULE_MODE='periodic' \
-v reddit-data:/app/reddit \
-v /path/to/your/settings.ini:/app/settings.ini:ro \
ghcr.io/rhnfzl/reddit-stash:latest
Custom Interval (e.g., every hour):
docker run -d \
--name reddit-stash \
--restart unless-stopped \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e SCHEDULE_MODE='periodic' \
-e SCHEDULE_INTERVAL='3600' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
Dropbox Upload Only:
Upload local content to Dropbox without running the main Reddit scraper:
docker run --rm \
-e DROPBOX_APP_KEY='your_dropbox_key' \
-e DROPBOX_APP_SECRET='your_dropbox_secret' \
-e DROPBOX_REFRESH_TOKEN='your_dropbox_token' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest \
dropbox_utils.py --upload
Dropbox Download Only:
Download content from Dropbox to local storage:
docker run --rm \
-e DROPBOX_APP_KEY='your_dropbox_key' \
-e DROPBOX_APP_SECRET='your_dropbox_secret' \
-e DROPBOX_REFRESH_TOKEN='your_dropbox_token' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest \
dropbox_utils.py --download
Named Volumes (Recommended for NAS):
Named volumes are portable across different NAS systems and don't require specific path configurations:
-v reddit-data:/app/reddit
View volume location:
docker volume inspect reddit-data
Bind Mounts (Specific Paths):
Use bind mounts when you need data in a specific directory:
# Synology DSM
-v /volume1/docker/reddit-stash:/app/reddit
# QNAP
-v /share/Container/reddit-stash:/app/reddit
# TrueNAS SCALE
-v /mnt/tank/apps/reddit-stash:/app/reddit
# unRAID
-v /mnt/user/appdata/reddit-stash:/app/reddit
# Generic Linux/macOS
-v /path/to/reddit:/app/reddit
-v $(pwd)/reddit:/app/reddit
Volume Permissions:
The container runs as UID 1000. If you encounter permission errors with bind mounts:
sudo chown -R 1000:1000 /path/to/reddit
Resource Limits for NAS Devices:
Limit CPU and memory usage to prevent overloading your NAS:
Docker CLI:
docker run -d \
--name reddit-stash \
--restart unless-stopped \
--memory="512m" \
--memory-swap="1g" \
--cpus="1.0" \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e SCHEDULE_MODE='periodic' \
-v reddit-data:/app/reddit \
ghcr.io/rhnfzl/reddit-stash:latest
Docker Compose:
services:
reddit-stash:
# ... other config ...
deploy:
resources:
limits:
cpus: '1.0'
memory: 512M
reservations:
cpus: '0.5'
memory: 256M
Monitoring & Logs:
# View real-time logs
docker logs -f reddit-stash
# View last 100 lines
docker logs --tail 100 reddit-stash
# View logs since specific time
docker logs --since 1h reddit-stash
# Monitor resource usage
docker stats reddit-stash
Updating the Container:
# Stop and remove old container
docker stop reddit-stash
docker rm reddit-stash
# Pull latest image
docker pull ghcr.io/rhnfzl/reddit-stash:latest
# Recreate with same command or docker-compose up -d
Or with Docker Compose:
docker-compose pull
docker-compose up -d
For detailed step-by-step guides specific to your NAS platform, see docs/DOCKER_DEPLOYMENT.md:
- Synology DSM - Container Manager setup
- QNAP - Container Station configuration
- TrueNAS SCALE - Apps deployment
- unRAID - Docker tab and Community Applications
- OpenMediaVault - Docker plugin setup
- Proxmox - LXC container with Docker
- Portainer - Stack deployment
If you prefer to build the Docker image yourself:
1. Clone and Build:
# Clone the repository
git clone https://github.com/rhnfzl/reddit-stash.git
cd reddit-stash
# Build with default Python 3.12
docker build -t reddit-stash:local .
# Or build with specific Python version
docker build --build-arg PYTHON_VERSION=3.11 -t reddit-stash:py3.11 .
docker build --build-arg PYTHON_VERSION=3.10 -t reddit-stash:py3.10 .
2. Run the Locally Built Image:
After building, use your local image tag (`reddit-stash:local`) instead of the GitHub Container Registry image (`ghcr.io/rhnfzl/reddit-stash:latest`) in any of the examples from Option 1 above.
Example:
docker run -d \
--name reddit-stash \
--restart unless-stopped \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e SCHEDULE_MODE='periodic' \
-v reddit-data:/app/reddit \
reddit-stash:local
For all other usage scenarios (Dropbox sync, Imgur API, docker-compose, special operations, etc.), refer to the examples in Option 1 above, simply replacing the image name.
- Python Support: Build supports Python 3.10, 3.11, and 3.12 (3.12 is the default)
- Security: The container runs as a non-root user
- Data Persistence: Data is persisted through a volume mount (e.g. `-v $(pwd)/reddit:/app/reddit`) to your local machine
- Runtime Configuration: Environment variables must be provided at runtime
- Flexibility: The container supports running different scripts (main script, Dropbox operations)
- Interactive Mode: Use the `-it` flags for interactive operation with output visible in your terminal
- Shell Special Characters: Always use single quotes around environment variable values to prevent shell interpretation of special characters (!, &, $, etc.)
- Execution Modes:
  - Single execution (`SCHEDULE_MODE=once`, default): Runs once and exits
  - Periodic execution (`SCHEDULE_MODE=periodic`): Runs continuously on a schedule (see the sketch after these notes)
- Scheduling Options:
  - Default interval: 2 hours (7200 seconds)
  - Custom interval: Set `SCHEDULE_INTERVAL` to any value ≥ 60 seconds
  - Graceful shutdown: Responds to SIGTERM/SIGINT for clean container stops
- Two Main Storage Modes:
  - Local-only: Just Reddit credentials, saves to the mounted volume
  - Dropbox sync: Full credentials for automatic cloud backup
- Detached Mode: You can also run in detached mode with `-d` if you prefer:
  docker run -d \
    -e REDDIT_CLIENT_ID='your_client_id' \
    [other environment variables] \
    -v $(pwd)/reddit:/app/reddit \
    reddit-stash
- Logging: Logs are available through Docker's logging system when running in detached mode:
  docker logs <container_id>
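The periodic mode described above boils down to a simple loop around the main script. The following is a minimal, hypothetical sketch of such a scheduler; the container's actual entrypoint may be implemented differently:

```python
import os
import signal
import subprocess
import threading

stop = threading.Event()
signal.signal(signal.SIGTERM, lambda *_: stop.set())  # sent by `docker stop`
signal.signal(signal.SIGINT, lambda *_: stop.set())   # Ctrl+C

mode = os.environ.get("SCHEDULE_MODE", "once")
interval = max(60, int(os.environ.get("SCHEDULE_INTERVAL", "7200")))  # enforce >= 60s

while True:
    subprocess.run(["python", "reddit_stash.py"], check=False)
    if mode != "periodic" or stop.wait(timeout=interval):
        break  # single-run mode, or a stop signal arrived during the wait
```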
If you've forked this repository to make custom modifications, follow these setup steps:
Docker works the same way with your fork - just build locally:
# Clone your fork
git clone https://github.com/YOUR_USERNAME/reddit-stash.git
cd reddit-stash
# Build your customized version
docker build -t reddit-stash .
# Run your custom build
docker run -it \
-e REDDIT_CLIENT_ID=your_client_id \
-e REDDIT_CLIENT_SECRET=your_client_secret \
-e REDDIT_USERNAME=your_username \
-e REDDIT_PASSWORD=your_password \
-e DROPBOX_APP_KEY=your_dropbox_key \
-e DROPBOX_APP_SECRET=your_dropbox_secret \
-e DROPBOX_REFRESH_TOKEN=your_dropbox_token \
-v $(pwd)/reddit:/app/reddit \
reddit-stash
Replace `YOUR_USERNAME` with your GitHub username.
After completing your chosen installation method, verify that everything is working correctly:
- Repository forked successfully
- All required secrets added to repository settings
- Workflow manually triggered at least once
- Workflow completes without errors (check Actions tab)
- Reddit folder appears in your Dropbox account
- Content files are present and readable
- Python 3.10-3.12 installed and working (3.12 recommended)
- Repository cloned successfully
- Dependencies installed via
pip install -r requirements.txt
- Environment variables set correctly
- Script runs without errors
- Content saved to specified directory
- (Optional) Content uploaded to Dropbox if configured
- Docker installed and daemon running
- Image built successfully
- Container runs without errors
- Content appears in mounted volume
- (Optional) Content uploaded to Dropbox if configured
The `settings.ini` file in the root directory of the project allows you to configure how Reddit Stash operates. This comprehensive configuration system controls everything from basic saving behavior to advanced media processing and content recovery.
Jump to any section or browse the complete settings index:
Section | Settings Count | Purpose | Jump To |
---|---|---|---|
[Settings] | 8 | Core behavior (save paths, types, file checking) | ↓ View |
[Configuration] | 4 | Reddit API credentials (use env vars!) | ↓ View |
[Media] | 12 | Media download controls (images, videos, albums) | ↓ View |
[Imgur] | 3 | Imgur API configuration (optional) | ↓ View |
[Recovery] | 9 | Content recovery system (4-provider cascade) | ↓ View |
[Retry] | 7 | Retry queue management (exponential backoff) | ↓ View |
Total | 43 settings | Complete system configuration | ↓ Settings Index |
Quick Tips:
- 🔒 Security First: Use environment variables for credentials, not settings.ini
- ⚡ Performance: `check_type=LOG`, `max_concurrent_downloads=3-5`
- 💾 Storage: Configure `max_image_size`, `max_video_size`, `max_daily_storage_mb`
- 🔄 Recovery: Enable all 4 providers for the best deleted-content recovery
- 📖 Full Docs: Each setting includes type, defaults, examples, and trade-offs
[Settings]
save_directory = reddit/ # Local directory for saved content
dropbox_directory = /reddit # Dropbox folder path
save_type = ALL # Content types to save
check_type = LOG # File existence checking method
unsave_after_download = false # Auto-unsave after downloading
process_gdpr = false # Process GDPR export data
process_api = true # Use Reddit API for content
ignore_tls_errors = false # Bypass SSL certificate validation (use with caution)
-
save_directory
- Where to save downloaded content locally- Type: String (directory path)
- Default:
reddit/
- Valid Values:
- Relative paths:
reddit/
,./backup/
,../archives/reddit/
- Absolute paths:
/home/user/reddit-archive/
,C:\Users\Name\Documents\reddit\
- Relative paths:
- Behavior:
- Creates directory if it doesn't exist
- Relative paths are relative to script location
- Subdirectories created automatically for each subreddit
- Examples:
save_directory = reddit/ # Default, creates ./reddit/ save_directory = /var/backups/reddit/ # Absolute Unix path save_directory = D:\Reddit\ # Absolute Windows path
- Considerations:
- Ensure sufficient disk space (typical usage: 100MB-10GB+ depending on media)
- Write permissions required for the directory
- Avoid network drives for better performance
-
dropbox_directory
- Dropbox cloud storage path- Type: String (Dropbox path)
- Default:
/reddit
- Valid Values: Any valid Dropbox path starting with
/
- Format Rules:
- Must start with
/
(Dropbox root) - Forward slashes only (even on Windows)
- Case-sensitive on Dropbox side
- Must start with
- Examples:
dropbox_directory = /reddit # Root level folder dropbox_directory = /Backups/reddit # Nested folder dropbox_directory = /Archives/2024/reddit # Deep nesting
- Notes:
- Folder created automatically if it doesn't exist
- Only used when Dropbox credentials are configured
- Syncs file_log.json and all content
-
save_type
- What content to download from Reddit-
Type: String (enum)
-
Default:
ALL
-
Valid Values:
ALL
,SAVED
,ACTIVITY
,UPVOTED
(case-insensitive) -
Detailed Options:
Value What It Downloads File Prefixes Use Case ALL
Everything: your posts, comments, saved items, upvoted items POST_
,COMMENT_
,SAVED_POST_
,SAVED_COMMENT_
,UPVOTE_POST_
,UPVOTE_COMMENT_
Complete Reddit archive SAVED
Only items you saved SAVED_POST_
,SAVED_COMMENT_
Personal knowledge base ACTIVITY
Only your posts & comments POST_
,COMMENT_
Your Reddit contributions UPVOTED
Only upvoted items UPVOTE_POST_
,UPVOTE_COMMENT_
Track interests -
Performance Comparison (approximate, 1000 items each):
ALL: ~15-30 minutes (processes 4 categories) SAVED: ~5-10 minutes (1 category) ACTIVITY: ~5-10 minutes (1 category) UPVOTED: ~5-10 minutes (1 category)
-
API Rate Limit Impact: Reddit limits to ~100 requests/minute, so
ALL
uses 4x more requests -
Examples:
save_type = ALL # Everything save_type = SAVED # Just saved items save_type = all # Case doesn't matter
-
-
check_type
- How to detect already-downloaded files-
Type: String (enum)
-
Default:
LOG
-
Valid Values:
LOG
,DIR
(case-insensitive) -
Detailed Comparison:
Feature LOG
DIR
Speed ⚡ Very fast (in-memory check) 🐢 Slower (filesystem scan) Accuracy Requires intact file_log.json
Always accurate Recovery Fails if log corrupted/deleted Self-healing Scalability Excellent (10,000+ files) Degrades with size Best For GitHub Actions, automated runs Local use, manual management -
LOG
Mode Details:- Uses
file_log.json
in save_directory - Tracks: filename, timestamp, subreddit, content type
- JSON structure:
{"file_id-subreddit-type-category": {...}}
- Loads entire log into memory (typically <10MB for 10,000 items)
- Risk: If log file deleted, treats all files as new (duplicates)
- Fix: If log corrupted, switch to
DIR
temporarily to rebuild
- Uses
-
DIR
Mode Details:- Scans filesystem for matching files
- Checks filename patterns:
POST_*.md
,SAVED_POST_*.md
, etc. - Builds in-memory index during startup
- Performance: ~1 second per 1,000 files
- Benefits: No dependency on log file, always correct
- Drawback: Slow for large archives (10,000+ files = ~10s startup)
-
Examples:
check_type = LOG # Fast, recommended for automation check_type = DIR # Thorough, recommended for local
-
When to Switch:
LOG
→DIR
: Log file corrupted, seeing duplicate downloadsDIR
→LOG
: Archive stable, want faster processing
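As a rough illustration of the difference between the two modes (not the project's actual implementation), a LOG check is a dictionary lookup while a DIR check scans the filesystem once at startup. The sketch assumes `file_log.json` already exists:

```python
import json
from pathlib import Path

save_dir = Path("reddit")

# LOG mode: one read of file_log.json, then O(1) membership checks.
log = json.loads((save_dir / "file_log.json").read_text())
def seen_in_log(key: str) -> bool:
    return key in log

# DIR mode: build an in-memory index by scanning every markdown file once.
existing = {p.name for p in save_dir.glob("r_*/*.md")}
def seen_on_disk(filename: str) -> bool:
    # e.g. filename = "SAVED_POST_xyz789.md"
    return filename in existing
```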
-
-
unsave_after_download
- Automatically unsave items after downloading- Type: Boolean
- Default:
false
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
(case-insensitive) ⚠️ CRITICAL WARNING: THIS IS PERMANENT AND IRREVERSIBLE!- How It Works:
- Downloads post/comment to local file
- Verifies download successful
- Unsaves item from Reddit (removes from your saved list)
- Waits 0.5 seconds (rate limit protection)
- Continues to next item
- Use Case - Breaking the 1000-Item Limit:
- Reddit shows max 1000 saved items in your list
- Older items beyond 1000 are hidden (but still saved on Reddit)
- By unsaving downloaded items, older items "bubble up" into view
- Run script multiple times to access progressively older saves
- Workflow Example:
Run 1: Download 1000 most recent, unsave them → Access items 1001-2000 Run 2: Download next 1000, unsave them → Access items 2001-3000 Run 3: Download next 1000, unsave them → Access items 3001-4000 Continue until no more items found
- Safety Recommendations:
- First Run: Keep
false
, verify downloads work correctly - Backup: Ensure
file_log.json
is backed up (Dropbox/Git) - Test: Try with
save_type = UPVOTED
first (less critical) - Enable: Set to
true
only when confident - Monitor: Check logs for "unsave failed" messages
- First Run: Keep
- Error Handling:
- If unsave fails (API error), script continues (doesn't stop)
- Failed unsaves logged but don't retry automatically
- Item remains saved on Reddit, can be re-downloaded (will skip if in log)
- Examples:
unsave_after_download = false # Safe default unsave_after_download = true # Enable with caution unsave_after_download = yes # Also valid
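The unsave-after-verified-download flow can be sketched like this, assuming a PRAW item and a `save_item()` helper of your own (a hypothetical name); `item.unsave()` is the real PRAW call:

```python
import time

def archive_and_unsave(item, save_item, delay: float = 0.5) -> None:
    """Download first, verify the file landed on disk, only then unsave."""
    path = save_item(item)            # hypothetical helper returning a pathlib.Path or None
    if path is None or not path.exists():
        return                        # never unsave something we failed to archive
    try:
        item.unsave()                 # PRAW: removes the item from your saved list
    except Exception as exc:          # an API hiccup should not stop the run
        print(f"unsave failed for {item.id}: {exc}")
    time.sleep(delay)                 # small pause to stay well under rate limits
```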
-
process_gdpr
- Process Reddit GDPR export files- Type: Boolean
- Default:
false
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- Purpose: Access complete Reddit history including items beyond 1000-item API limits
- Requirements:
- GDPR export CSV files in
{save_directory}/gdpr_data/
- Expected files:
saved_posts.csv
,saved_comments.csv
- GDPR export CSV files in
- How It Works:
- Reads post/comment IDs from CSV files
- Fetches full content via Reddit API (one API call per item)
- Saves with
GDPR_POST_
orGDPR_COMMENT_
prefix - Respects rate limits (same 100 req/min as normal processing)
- GDPR Export Process:
- Visit https://www.reddit.com/settings/data-request
- Request data (takes 2-30 days to process)
- Download ZIP file when ready
- Extract
saved_posts.csv
andsaved_comments.csv
- Place in
{save_directory}/gdpr_data/
- Performance Impact:
- Processes AFTER regular API content
- Each item requires separate API call
- 1000 GDPR items ≈ 10-15 additional minutes
- Deduplication: Items already in log are skipped (no duplicates)
- Examples:
process_gdpr = false # Default, skip GDPR processing process_gdpr = true # Process GDPR files if present
- Common Issues:
- "GDPR directory not found" → Create
{save_directory}/gdpr_data/
- "No CSV files found" → Verify files named exactly
saved_posts.csv
- Deleted content → GDPR has IDs but content may be deleted (404 errors normal)
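A sketch of the GDPR pass, assuming the export's `saved_posts.csv` contains an `id` column (column names in Reddit's export may vary) and a configured PRAW client named `reddit`:

```python
import csv
from pathlib import Path

def gdpr_post_ids(save_directory: str = "reddit") -> list[str]:
    """Read submission ids out of the GDPR export CSV, skipping blank rows."""
    csv_path = Path(save_directory) / "gdpr_data" / "saved_posts.csv"
    with csv_path.open(newline="", encoding="utf-8") as fh:
        return [row["id"] for row in csv.DictReader(fh) if row.get("id")]

# for post_id in gdpr_post_ids():
#     submission = reddit.submission(id=post_id)   # one API call per item
#     ...write it out with a GDPR_POST_ prefix, skipping ids already in the log...
```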
-
process_api
- Fetch content from Reddit API-
Type: Boolean
-
Default:
true
-
Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
-
Purpose: Control whether to fetch current content from Reddit API
-
Use Cases:
process_api
process_gdpr
Behavior true
false
Normal mode: Fetch current saved/posted/upvoted content true
true
Complete mode: Current content + GDPR history false
true
GDPR-only: Only process export files (no API calls for current) false
false
⚠️ Does nothing (no content fetched) -
When to Set
false
:- You only want to process GDPR export files
- Testing GDPR processing without affecting API rate limits
- Already ran API processing, want to add GDPR data only
-
Examples:
process_api = true # Normal operation process_api = false # Skip API, only process GDPR
-
-
ignore_tls_errors
- Bypass SSL certificate verification- Type: Boolean
- Default:
false
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
⚠️ SECURITY WARNING: Setting totrue
reduces security significantly!- What It Does:
false
(default): Validates SSL certificates, rejects expired/invalid/self-signed certstrue
: Accepts ANY certificate, including expired, self-signed, or invalid
- Security Implications:
- Vulnerable to man-in-the-middle attacks
- No guarantee you're downloading from legitimate source
- Should NEVER be used with sensitive content
- Legitimate Use Cases (rare):
- Archiving content from sites with expired certificates
- Corporate networks with self-signed proxy certificates
- Historical content preservation where security less critical
- What It Affects:
- Only affects media downloads (images, videos)
- Does NOT affect Reddit API (always uses valid TLS)
- Third-party hosting: Imgur, Gfycat, etc.
- Warnings Generated:
- Script prints "
⚠️ TLS verification disabled" on startup - Configuration validator shows security warning
- Script prints "
- Examples:
ignore_tls_errors = false # Secure default (recommended) ignore_tls_errors = true # Insecure, use only if absolutely necessary
- Better Alternatives:
- Use content recovery system (often has archived copies)
- Manually download problematic images
- Report invalid certificates to site owners
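In practice this setting maps to the `verify` flag on the HTTP client used for media downloads. A minimal sketch with `requests` (illustrative, not necessarily the project's exact downloader):

```python
import requests
import urllib3

IGNORE_TLS_ERRORS = False  # mirrors ignore_tls_errors in settings.ini

if IGNORE_TLS_ERRORS:
    # Silence the InsecureRequestWarning that verify=False would otherwise emit on every request.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def fetch_media(url: str) -> bytes:
    resp = requests.get(url, timeout=30, verify=not IGNORE_TLS_ERRORS)
    resp.raise_for_status()
    return resp.content
```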
[Configuration]
client_id = None # Reddit app client ID
client_secret = None # Reddit app client secret
username = None # Reddit username
password = None # Reddit password (or password:2FA_code)
🔒 USE ENVIRONMENT VARIABLES, NOT settings.ini FOR CREDENTIALS!
Unless you absolutely know what you're doing and have a specific reason to store credentials in settings.ini (like an isolated, encrypted system), always use environment variables instead.
Why Environment Variables Are Safer:
Risk | settings.ini | Environment Variables |
---|---|---|
Accidental Git Commit | ❌ VERY HIGH - Credentials exposed publicly | ✅ Safe - Not in version control |
File Sharing | ❌ HIGH - Shared accidentally with settings | ✅ Safe - Not in shared files |
Backup Exposure | ❌ MEDIUM - In all backups/archives | ✅ Safe - Not in file backups |
Log Exposure | ❌ MEDIUM - May appear in logs | ✅ Safer - Separate from logs |
Access Control | ❌ File permissions only | ✅ Better - OS-level protection |
Real-World Danger Example:
You: Edit settings.ini with credentials
You: Test locally, everything works
You: git add . && git commit -m "Update settings"
You: git push
Result: 🚨 YOUR CREDENTIALS ARE NOW PUBLIC ON GITHUB 🚨
Anyone can: Access your Reddit account, download your data, post as you
-
client_id
- Reddit application client ID- Type: String or
None
- Default:
None
⚠️ SECURITY: Use environment variableREDDIT_CLIENT_ID
instead- When to Use settings.ini:
- Isolated test environment (not connected to internet)
- Encrypted filesystem only you access
- Single-use throwaway credentials
- Never Use settings.ini If:
- Repository is public or will be shared
- File will be backed up to cloud
- Multiple people access the system
- You're not 100% sure about security implications
- Environment Variable Method (RECOMMENDED):
# Linux/macOS export REDDIT_CLIENT_ID='your_client_id_here' # Windows set REDDIT_CLIENT_ID=your_client_id_here
- settings.ini Method (NOT RECOMMENDED):
client_id = your_client_id_here # ⚠️ DANGEROUS
- Type: String or
-
client_secret
- Reddit application client secret- Type: String or
None
- Default:
None
⚠️ CRITICAL SECURITY: This is your API password! UseREDDIT_CLIENT_SECRET
env var- Exposure Risk: CRITICAL - Full API access
- If Compromised: Attacker can use your Reddit app credentials
- NEVER commit this to version control
- Environment Variable (REQUIRED for any shared/public code):
export REDDIT_CLIENT_SECRET='your_secret_here'
- Type: String or
-
username
- Your Reddit username- Type: String or
None
- Default:
None
- Security Level: Medium (username is public anyway, but indicates which account)
- Environment Variable (RECOMMENDED):
export REDDIT_USERNAME='your_username'
- Note: Less critical than password, but still recommended to use env var
- Type: String or
-
password
- Your Reddit account password- Type: String or
None
- Default:
None
⚠️ MAXIMUM SECURITY RISK: YOUR REDDIT PASSWORD!- Exposure = Account Takeover: Someone with this can log into your Reddit account
- 2FA Format: If you have two-factor authentication:
your_password:123456
- NEVER EVER put this in settings.ini unless isolated system
- Environment Variable (ABSOLUTELY REQUIRED):
export REDDIT_PASSWORD='your_password' # With 2FA: export REDDIT_PASSWORD='your_password:123456'
- Type: String or
- Environment Variables (checked first) ← USE THIS
- settings.ini values (fallback) ← AVOID THIS
If environment variable exists, settings.ini value is completely ignored.
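The precedence rule looks roughly like this. This is a sketch using `configparser` and `os.environ`; the real loader may differ in details:

```python
import configparser
import os

def get_credential(env_name: str, ini_key: str, path: str = "settings.ini") -> str | None:
    """Environment variable wins; settings.ini is only a fallback; 'None' means unset."""
    value = os.environ.get(env_name)
    if value:
        return value
    config = configparser.ConfigParser()
    config.read(path)
    ini_value = config.get("Configuration", ini_key, fallback=None)
    return None if ini_value in (None, "None", "") else ini_value

client_id = get_credential("REDDIT_CLIENT_ID", "client_id")
```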
✅ SAFE - Keep credentials as None:
[Configuration]
client_id = None
client_secret = None
username = None
password = None
❌ UNSAFE - Credentials in file:
[Configuration]
client_id = abc123def456 # ⚠️ WILL BE EXPOSED
client_secret = secret_key_here # ⚠️ CRITICAL RISK
username = your_username # ⚠️ ACCOUNT IDENTIFIER
password = your_password # ⚠️ ACCOUNT TAKEOVER RISK
Only use settings.ini for credentials if ALL of these are true:
- ✅ System is completely isolated (no internet, air-gapped)
- ✅ Filesystem is encrypted
- ✅ Only you have access (no shared users)
- ✅ File will never be committed to version control
- ✅ File will never be backed up to cloud
- ✅ File will never be shared with anyone
- ✅ You fully understand the security implications
- ✅ You're using test/throwaway credentials only
If you're unsure about even ONE of these, use environment variables!
# Check if credentials are in settings.ini
grep -E "client_id|client_secret|username|password" settings.ini
# Safe output (all None):
client_id = None
client_secret = None
username = None
password = None
# Unsafe output (has values):
client_id = abc123 # ⚠️ FIX THIS
- Add settings.ini to .gitignore (even with None values, for safety)
- Use .env files for local development (also in .gitignore)
- Rotate credentials if you suspect exposure
- Use GitHub Secrets for GitHub Actions (already encrypted)
- Never screenshot settings with credentials
- Check git history if you accidentally committed credentials
For detailed instructions on setting up environment variables properly, see Setting Up Reddit Environment Variables
[Media]
# Global media download controls
download_enabled = true # Master switch for all media downloads
download_images = true # Enable image downloads
download_videos = true # Enable video downloads
download_audio = true # Enable audio downloads
# Image processing settings
thumbnail_size = 800 # Thumbnail dimensions in pixels
max_image_size = 5242880 # Max image file size (5MB in bytes)
create_thumbnails = true # Generate thumbnails for large images
# Video settings
video_quality = high # Video quality preference
max_video_size = 209715200 # Max video file size (200MB in bytes)
# Album settings
download_albums = true # Process image albums/galleries
max_album_images = 50 # Max images per album (0 = unlimited)
# Performance and resource limits
max_concurrent_downloads = 3 # Parallel download streams
download_timeout = 30 # Download timeout in seconds
max_daily_storage_mb = 1024 # Daily storage limit in MB
Global Controls:
-
download_enabled
- Master switch for all media downloads- Type: Boolean
- Default:
true
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- What It Does:
true
: Enables media download system, attempts to download images/videos/audiofalse
: Completely disables media downloads, only saves text content (markdown)
- Performance Impact:
- Enabled: Adds 5-30 minutes per 1000 posts (depends on media count)
- Disabled: ~80% faster processing, minimal storage usage
- Storage Impact:
- Enabled: 50MB-5GB+ depending on content (images are heavy!)
- Disabled: <50MB for 10,000 text-only posts
- Examples:
download_enabled = true # Enable media downloads download_enabled = false # Text-only archiving
-
download_images
- Control image downloads- Type: Boolean
- Default:
true
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- Requires:
download_enabled = true
- What It Downloads:
- Direct image links (i.redd.it, i.imgur.com)
- Reddit galleries
- External images (direct URLs)
- Success Rate: ~80% with modern web compatibility
- Examples:
download_images = true # Download all images download_images = false # Skip images
-
download_videos
- Control video downloads- Type: Boolean
- Default:
true
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- Requires:
download_enabled = true
- What It Downloads:
- v.redd.it videos (with audio merging via ffmpeg if available)
- Direct video links (.mp4, .webm, .mov)
- Embedded videos from supported hosts
- Dependencies:
- ffmpeg recommended for v.redd.it audio merging
- Without ffmpeg: video-only (no audio track)
- Storage Warning: Videos are large! Single video can be 50-500MB
- Examples:
download_videos = true # Download videos download_videos = false # Skip videos (save storage)
-
download_audio
- Control audio downloads- Type: Boolean
- Default:
true
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- Requires:
download_enabled = true
- What It Downloads:
- Audio-only posts (podcasts, music)
- Audio tracks from videos (when separated)
- Note: Rarely used by Reddit, most audio is embedded in video
- Examples:
download_audio = true # Download audio files download_audio = false # Skip audio
Image Processing:
-
thumbnail_size
- Maximum thumbnail dimension in pixels- Type: Integer (positive)
- Default:
800
- Valid Range: 100-4096 (recommended: 300-1200)
- What It Means:
- Thumbnails are resized to fit within this width × height box
- Aspect ratio is preserved
- If image is 1920×1080 and thumbnail_size=800: → 800×450
- Storage Impact Per Image:
- 300px: ~50KB (very small)
- 800px: ~150KB (balanced)
- 1200px: ~300KB (high quality)
- Use Cases:
- 300-500: Quick previews, storage-constrained
- 800-1000: Balanced quality/size (recommended)
- 1200-2000: High quality, display on large screens
- Examples:
thumbnail_size = 400 # Small previews thumbnail_size = 800 # Default, good balance thumbnail_size = 1200 # High quality
-
max_image_size
- Maximum image file size in bytes- Type: Integer (positive)
- Default:
5242880
(5MB) - Valid Range: 100,000 (100KB) to 52,428,800 (50MB)
- What It Does:
- Images larger than this are SKIPPED (not downloaded)
- Not resized - full skip to save bandwidth/storage
- Applies BEFORE download (uses Content-Length header if available)
- Size Conversion Guide:
1 MB = 1048576 bytes 2 MB = 2097152 bytes 5 MB = 5242880 bytes (default) 10 MB = 10485760 bytes 20 MB = 20971520 bytes 50 MB = 52428800 bytes
- Trade-offs:
- Small limit (1-2MB): Saves storage, may miss high-res images
- Medium limit (5-10MB): Balanced, gets most images
- Large limit (20-50MB): Complete archive, heavy storage
- Examples:
max_image_size = 1048576 # 1MB limit (minimal) max_image_size = 5242880 # 5MB limit (default) max_image_size = 20971520 # 20MB limit (comprehensive)
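The size gate can be applied before the body is transferred by looking at the Content-Length header, falling back to counting bytes as they stream in. A sketch with `requests` (an assumed library choice, not necessarily what the project uses):

```python
import requests

MAX_IMAGE_SIZE = 5 * 1024 * 1024  # 5242880 bytes, the default

def download_if_small_enough(url: str, limit: int = MAX_IMAGE_SIZE) -> bytes | None:
    """Skip oversized files, ideally before the body is transferred."""
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        declared = resp.headers.get("Content-Length")
        if declared is not None and int(declared) > limit:
            return None                       # server told us it is too big
        chunks, total = [], 0
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            total += len(chunk)
            if total > limit:                 # header was missing or wrong
                return None
            chunks.append(chunk)
        return b"".join(chunks)
```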
-
create_thumbnails
- Generate thumbnail versions of images- Type: Boolean
- Default:
true
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- What It Does:
true
: Creates smaller preview version alongside originalfalse
: Only saves original full-size image
- File Naming:
- Original:
abc123_media.jpg
- Thumbnail:
abc123_media_thumb.jpg
- Original:
- Storage Cost: +10-20% additional space
- Benefits:
- Fast loading in file browsers
- Preview without opening large files
- Useful for browsing archives offline
- Processing Cost: +2-5 seconds per image
- Examples:
create_thumbnails = true # Generate previews create_thumbnails = false # Original only
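Thumbnail generation that preserves aspect ratio is what Pillow's `Image.thumbnail` does; a sketch assuming Pillow is installed (the `_thumb` suffix matches the naming shown above):

```python
from pathlib import Path
from PIL import Image

THUMBNAIL_SIZE = 800  # mirrors thumbnail_size in settings.ini

def make_thumbnail(original: Path, size: int = THUMBNAIL_SIZE) -> Path:
    """Write abc123_media_thumb.jpg next to abc123_media.jpg, keeping aspect ratio."""
    thumb_path = original.with_name(f"{original.stem}_thumb{original.suffix}")
    with Image.open(original) as img:
        img.thumbnail((size, size))       # a 1920x1080 image becomes 800x450 at size=800
        img.save(thumb_path)
    return thumb_path
```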
Video Processing:
-
video_quality
- Download quality preference for videos-
Type: String (enum)
-
Default:
high
-
Valid Values:
high
,low
(case-insensitive) -
What It Means:
Quality Resolution Bitrate File Size (per min) Use Case high
720p-1080p+ High 10-50MB/min Best quality, storage available low
360p-480p Lower 3-10MB/min Save bandwidth/storage -
How It Works:
- v.redd.it: Selects from available quality levels
- YouTube: Requests specific quality (if available)
- Gfycat: Downloads HD vs SD version
-
Important: Not all sources provide multiple qualities
-
Examples:
video_quality = high # Best quality (default) video_quality = low # Save storage video_quality = HIGH # Case doesn't matter
-
-
max_video_size
- Maximum video file size in bytes- Type: Integer (positive)
- Default:
209715200
(200MB) - Valid Range: 1,048,576 (1MB) to 1,073,741,824 (1GB)
- What It Does: Videos larger than this limit are SKIPPED
- Size Conversion Guide:
10 MB = 10485760 bytes 50 MB = 52428800 bytes 100 MB = 104857600 bytes 200 MB = 209715200 bytes (default) 500 MB = 524288000 bytes 1 GB = 1073741824 bytes
- Typical Video Sizes:
- Short clip (30s): 5-20MB
- Medium video (2-5 min): 20-100MB
- Long video (10+ min): 100-500MB
- HD long video: 500MB-2GB
- Recommendations By Use Case:
- Storage-constrained: 50-100MB
- Balanced: 200-300MB (default range)
- Complete archive: 500MB-1GB
- Examples:
max_video_size = 52428800 # 50MB (minimal) max_video_size = 209715200 # 200MB (default) max_video_size = 524288000 # 500MB (comprehensive)
Album Handling:
-
download_albums
- Process multi-image posts (albums/galleries)- Type: Boolean
- Default:
true
- Valid Values:
true
,false
,yes
,no
,on
,off
,1
,0
- What It Handles:
- Imgur albums (imgur.com/a/XXXXX)
- Reddit galleries (multiple images in one post)
- Other multi-image services
- File Naming:
{post_id}_media_001.jpg
,{post_id}_media_002.jpg
, etc. - Storage Impact: Albums can be 2-100+ images each
- Examples:
download_albums = true # Download all album images download_albums = false # Skip albums entirely
-
max_album_images
- Limit images per album- Type: Integer (non-negative)
- Default:
50
- Valid Values:
0
(unlimited) or positive integer (1-1000+) - What It Does:
0
: Download ALL images in album (no limit)Positive number
: Download only first N images, skip rest
- Why Limit?:
- Some albums have 100+ images (massive storage)
- Prevent single post from consuming too much space
- Faster processing
- Behavior Example:
- Album has 80 images
max_album_images = 50
: Download first 50, skip remaining 30max_album_images = 0
: Download all 80
- Recommendations:
- Conservative: 20-30 images
- Balanced: 50 images (default)
- Complete: 0 (unlimited, use with caution)
- Examples:
max_album_images = 10 # First 10 only max_album_images = 50 # First 50 (default) max_album_images = 0 # No limit
Performance Controls:
-
max_concurrent_downloads
- Parallel download streams-
Type: Integer (positive)
-
Default:
3
-
Valid Range: 1-20 (recommended: 1-10)
-
What It Does: Number of media files downloaded simultaneously
-
Trade-offs:
Value Speed CPU Usage Memory Usage Network Load Best For 1 Slow Low Low Light Slow connections, low-power devices 3 Moderate Medium Medium Moderate Default, balanced 5-7 Fast High High Heavy Fast connections, powerful machines 10+ Fastest Very High Very High Very Heavy Server environments, very fast connections -
Limiting Factors:
- GitHub Actions: 3-5 recommended (shared resources)
- Home internet: Based on bandwidth (3-5 typical)
- Fast connection: 5-10
-
Warning: Too high can trigger rate limits!
-
Examples:
max_concurrent_downloads = 1 # One at a time (safest) max_concurrent_downloads = 3 # Default (balanced) max_concurrent_downloads = 5 # Fast (requires good connection)
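The concurrency knob corresponds to a worker-pool size; a sketch with `concurrent.futures` (illustrative only, not the project's actual downloader):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_DOWNLOADS = 3  # mirrors max_concurrent_downloads in settings.ini

def download_all(urls, fetch):
    """Run fetch(url) for every url with a bounded number of parallel workers."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_DOWNLOADS) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:      # one failed file should not abort the batch
                results[url] = exc
    return results
```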
-
-
download_timeout
- Per-file download timeout in seconds- Type: Integer (positive)
- Default:
30
- Valid Range: 5-600 (recommended: 15-120)
- What It Does: Max time to wait for single file download
- What Happens on Timeout:
- Download cancelled
- Item added to retry queue
- Script continues to next file
- Recommendations By File Type:
- Images only: 15-30 seconds
- Mixed (images + small videos): 30-60 seconds (default range)
- Large videos: 60-300 seconds
- Network Speed Considerations:
- Slow connection (<1 Mbps): 60-120 seconds
- Medium (1-10 Mbps): 30-60 seconds
- Fast (10+ Mbps): 15-30 seconds
- Examples:
download_timeout = 15 # Fast timeout (risk more failures) download_timeout = 30 # Default (balanced) download_timeout = 120 # Patient (for large files/slow networks)
-
max_daily_storage_mb
- Daily storage consumption limit in megabytes- Type: Integer (positive)
- Default:
1024
(1GB) - Valid Range: 10-100,000+ (10MB to 100GB+)
- What It Does:
- Tracks total storage used for media per run
- Stops downloading media when limit reached
- Text content (markdown) still saved
- Resets on next script run (not calendar day)
- Why Use It:
- Prevent unexpected storage exhaustion
- Control cloud storage costs (Dropbox)
- GitHub Actions storage limits
- Predictable resource usage
- Size Planning Guide:
100 MB: ~500-1000 images OR ~2-5 short videos 500 MB: ~2500-5000 images OR ~10-25 videos 1 GB: ~5000-10000 images OR ~20-50 videos (default) 5 GB: ~25000+ images OR 100+ videos 10 GB: Complete large archive
- Recommendations By Use Case:
- Testing: 100-200MB
- GitHub Actions (free): 500-1000MB
- Home backup: 2000-5000MB (2-5GB)
- Complete archive: 10000+ MB (10GB+)
- Examples:
max_daily_storage_mb = 100 # Testing/minimal max_daily_storage_mb = 1024 # 1GB default max_daily_storage_mb = 5120 # 5GB comprehensive max_daily_storage_mb = 0 # No limit (use with caution!)
[Imgur]
# Optional: Comma-separated client IDs for rate limit rotation
client_ids = None # Multiple Imgur client IDs
client_secrets = None # Corresponding client secrets
recover_deleted = true # Attempt recovery of deleted content
-
client_ids
- Imgur application client IDs for API access-
Type: String (comma-separated list in settings.ini, single value in env var)
-
Default:
None
-
Valid Values:
None
: No Imgur API access (uses fallback methods)- Single ID:
abc123def456
- Multiple IDs (settings.ini only):
id1,id2,id3
(for rate limit rotation)
-
Format Rules:
- No spaces around commas:
id1,id2,id3
✅ - With spaces (invalid):
id1, id2, id3
❌ - Each ID is alphanumeric, typically 15 characters
- No spaces around commas:
-
What It Does:
- Enables official Imgur API access
- Higher rate limits (12,500 requests/day per app)
- Album support and metadata
- Multiple IDs (settings.ini only) rotate to avoid single-app rate limits
-
Without API Access:
- Falls back to direct HTTP downloads
- Lower success rate (~30-40% vs ~70-80% with API)
- Frequent 429 rate limit errors (expected and normal)
- No album support
-
Configuration Methods:
Method Single ID Multiple IDs Best For Environment Variable ✅ Yes ❌ No Most users, single app settings.ini ✅ Yes ✅ Yes Advanced: rate limit rotation -
How to Get (only if you registered before May 2024):
- Go to https://imgur.com/account/settings/apps
- Select your application
- Copy the "Client ID" value
-
Examples:
# settings.ini - Multiple IDs supported client_ids = None # No API (most users) client_ids = abc123def456 # Single app client_ids = abc123,def456,ghi789 # Multiple apps (rotation) ⚠️ settings.ini only
# Environment variable - Single ID only export IMGUR_CLIENT_ID='abc123def456' # Single app only
-
⚠️ Important: Multiple client ID rotation is only supported in settings.ini, not via environment variables. If you need rotation across multiple Imgur apps, you must use settings.ini configuration.
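Rotation over several client IDs is essentially round-robin selection of the `Authorization: Client-ID ...` header on each Imgur API call. A hypothetical sketch (the project's rotation logic may differ):

```python
from itertools import cycle

import requests

CLIENT_IDS = ["abc123", "def456", "ghi789"]   # from client_ids in settings.ini
_rotation = cycle(CLIENT_IDS)

def imgur_get(endpoint: str) -> dict:
    """Call the Imgur API, spreading requests across the configured apps."""
    headers = {"Authorization": f"Client-ID {next(_rotation)}"}
    resp = requests.get(f"https://api.imgur.com/3/{endpoint}", headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()

# imgur_get("image/XyZ123")  # each call uses the next client ID in the list
```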
- `client_secrets` - Imgur application client secrets
  - Type: String (comma-separated list in settings.ini, single value in env var)
  - Default: `None`
  - Valid Values:
    - `None`: No Imgur API access
    - Comma-separated secrets matching `client_ids` order (settings.ini only)
  - MUST MATCH `client_ids`:
    - If `client_ids` has 3 IDs, `client_secrets` must have 3 secrets
    - Order matters: `client_ids[0]` pairs with `client_secrets[0]`
  - Format Rules:
    - Same as `client_ids`: no spaces
    - Each secret is alphanumeric, typically 40 characters
    - Keep these SECRET (never commit to version control!)
  - Security Warning:
    - These are sensitive credentials
    - Environment variables recommended for a single app
    - Never share or expose publicly
  - Configuration Methods:
    | Method | Single Secret | Multiple Secrets | Recommended |
    |---|---|---|---|
    | Environment Variable | ✅ Yes | ❌ No | ✅ Most secure |
    | settings.ini | ✅ Yes | ✅ Yes | ⚠️ Only if multiple apps |
  - Examples:
    # settings.ini - Multiple secrets supported
    client_secrets = None                                              # No API
    client_secrets = abcdef1234567890abcdef1234567890abcdef12          # Single
    client_secrets = secret1_40chars,secret2_40chars,secret3_40chars   # Multiple ⚠️ settings.ini only

    # Environment variable - Single secret only (RECOMMENDED)
    export IMGUR_CLIENT_SECRET='abcdef1234567890abcdef1234567890abcdef12'
  - Validation: The script checks that the count matches `client_ids`
  - ⚠️ Note: Multiple client secrets are only supported in settings.ini. For a single app (most users), use the environment variable for better security.
- `recover_deleted` - Attempt recovery of deleted/unavailable Imgur content
  - Type: Boolean
  - Default: `true`
  - Valid Values: `true`, `false`, `yes`, `no`, `on`, `off`, `1`, `0`
  - What It Does:
    - `true`: When Imgur returns 404, triggers the content recovery cascade
    - `false`: Skip recovery, treat as a permanent failure
  - Recovery Process:
    - Imgur returns 404 (not found) or 429 (rate limited)
    - System tries the Wayback Machine for an archived copy
    - Falls back to Reddit preview URLs
    - Checks other recovery providers
    - Caches the result (success or failure) to avoid re-trying
  - Success Rates (for deleted Imgur content):
    - Recent deletions (<1 month): ~40-60% recovery
    - Older deletions (1-6 months): ~20-40% recovery
    - Very old (>6 months): ~10-20% recovery
    - Popular images: Higher success (more likely archived)
  - Performance Impact:
    - Adds 5-15 seconds per failed Imgur image
    - Only activates on failures (no cost for successful downloads)
    - Results are cached (subsequent failures are instant)
  - When to Disable:
    - You don't care about deleted content
    - You want faster processing (skip recovery attempts)
    - You already know most Imgur links are dead
  - Examples:
    recover_deleted = true    # Try to recover (default)
    recover_deleted = false   # Skip recovery, faster
[Recovery]
# Recovery providers (4-provider cascade)
use_wayback_machine = true # Internet Archive Wayback Machine
use_pushshift_api = true # PullPush.io (Pushshift successor)
use_reddit_previews = true # Reddit's preview/thumbnail system
use_reveddit_api = true # Reveddit deleted content recovery
# Performance settings
timeout_seconds = 10 # Per-provider timeout
cache_duration_hours = 24 # Cache recovery results
# Cache management
max_cache_entries = 10000 # Maximum cached recovery results
max_cache_size_mb = 100 # Cache size limit in MB
cleanup_interval_minutes = 60 # Cache cleanup frequency
enable_background_cleanup = true # Automatic cache maintenance
Reddit Stash includes a sophisticated 4-provider cascade system that attempts to recover deleted, removed, or unavailable content.
- `use_wayback_machine` - Use the Internet Archive Wayback Machine
  - Type: Boolean
  - Default: `true`
  - Valid Values: `true`, `false`, `yes`, `no`, `on`, `off`, `1`, `0`
  - What It Does: Queries the Internet Archive's web snapshots, which go back to 1996
  - Best For: Popular content, older deletions, historical preservation
  - Success Rate: 60-80% for popular content, 20-40% for obscure content
  - Coverage: Billions of web pages, extensive image archives
  - Rate Limit: 60 requests/minute (respectful, non-blocking)
  - Response Time: 2-10 seconds average
  - Examples:
    use_wayback_machine = true    # Enable (recommended)
    use_wayback_machine = false   # Disable to save time
- `use_pushshift_api` - Use PullPush.io (Pushshift successor)
  - Type: Boolean
  - Default: `true`
  - Valid Values: `true`, `false`, `yes`, `no`, `on`, `off`, `1`, `0`
  - What It Does: Queries a Reddit-specific archive of posts and comments
  - Best For: Deleted/removed Reddit text content, metadata
  - Success Rate: 40-70% for Reddit content (higher for older content)
  - Coverage: Reddit posts/comments from 2005-present
  - Rate Limit: 12 requests/minute (conservative, respects the soft limit of 15)
  - Response Time: 1-3 seconds average
  - Note: Sometimes slow or down (community-run service)
  - Examples:
    use_pushshift_api = true    # Enable (recommended for Reddit content)
    use_pushshift_api = false   # Disable if the service is unavailable
- `use_reddit_previews` - Use Reddit's preview/thumbnail system
  - Type: Boolean
  - Default: `true`
  - Valid Values: `true`, `false`, `yes`, `no`, `on`, `off`, `1`, `0`
  - What It Does: Falls back to Reddit's own cached preview images
  - Best For: Recent posts with images, when the original host is down
  - Success Rate: 20-50% (only works if Reddit generated a preview)
  - Coverage: Images from posts where Reddit created thumbnails
  - Quality: Usually lower resolution (preview quality, not original)
  - Rate Limit: 30 requests/minute
  - Response Time: <1 second (very fast)
  - Limitations:
    - Only works for posts Reddit previewed
    - Lower quality than originals
    - May not work for very old posts
  - Examples:
    use_reddit_previews = true    # Enable (fast fallback)
    use_reddit_previews = false   # Disable if quality matters
- `use_reveddit_api` - Use Reveddit deleted content recovery
  - Type: Boolean
  - Default: `true`
  - Valid Values: `true`, `false`, `yes`, `no`, `on`, `off`, `1`, `0`
  - What It Does: Queries a specialized service for recovering deleted Reddit content
  - Best For: Recently deleted posts/comments (within days/weeks)
  - Success Rate: 30-60% for recent deletions, lower for older
  - Coverage: Reddit posts and comments deleted by users or moderators
  - Rate Limit: 20 requests/minute
  - Response Time: 2-5 seconds average
  - Note: Most effective for recent deletions (<30 days)
  - Examples:
    use_reveddit_api = true     # Enable (good for recent deletions)
    use_reveddit_api = false    # Disable to save time
- `timeout_seconds` - Maximum wait time per provider attempt
  - Type: Integer (positive)
  - Default: `10`
  - Valid Range: 3-120 seconds (recommended: 5-30)
  - What It Does: Maximum time to wait for each recovery provider to respond
  - Behavior on Timeout:
    - The provider attempt is cancelled
    - Moves to the next provider in the cascade
    - The failure is logged but doesn't stop processing
  - Trade-offs:
    | Timeout | Success Rate | Speed | Best For |
    |---|---|---|---|
    | 5s | Lower (~50%) | Fast | Quick pass, fast networks |
    | 10s | Good (~70%) | Moderate | Default, balanced |
    | 20-30s | Higher (~85%) | Slow | Thorough recovery, slow networks |
  - Cascade Example (timeout=10s, 4 providers; a sketch of this cascade follows below):
    - Wayback: Try for 10s → Success/Fail → Next
    - PullPush: Try for 10s → Success/Fail → Next
    - Reddit Preview: Try for 10s → Success/Fail → Next
    - Reveddit: Try for 10s → Success/Fail → Give up
    - Total: 0-40 seconds per item (stops at the first success)
  - Examples:
    timeout_seconds = 5     # Fast, may miss some content
    timeout_seconds = 10    # Default, balanced
    timeout_seconds = 30    # Thorough, slow
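The cascade behaves roughly like the following sketch. The function names and call structure are illustrative assumptions, not the project's internal API; only the Wayback Machine availability endpoint shown is a real public service:

# Illustrative provider cascade: try each enabled provider in order,
# respect the per-attempt timeout, and stop at the first success.
import requests

def try_wayback(url: str, timeout: int):
    # Internet Archive availability API; returns an archived URL or None.
    r = requests.get("https://archive.org/wayback/available",
                     params={"url": url}, timeout=timeout)
    snap = r.json().get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

def recover(url: str, providers, timeout: int = 10):
    """Return the first recovered URL, or None if every provider fails."""
    for provider in providers:            # order matters: fastest / most likely first
        try:
            result = provider(url, timeout)
            if result:
                return result
        except requests.RequestException:
            continue                      # timeout or network failure -> next provider
    return None

recovered = recover("https://i.imgur.com/example.jpg", [try_wayback], timeout=10)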
- `cache_duration_hours` - How long to cache recovery results
  - Type: Integer (positive)
  - Default: `24`
  - Valid Range: 1-720 hours (1 hour to 30 days)
  - What It Does: Stores recovery results (success/failure) to avoid re-trying
  - What Gets Cached:
    - Successful recoveries: URL → recovered content location
    - Failed attempts: URL → "not found" (avoids retrying the same failure)
  - Cache Database: `.recovery_cache.db` in the save directory (SQLite; a small sketch of the idea follows below)
  - Why Cache?:
    - Avoid re-querying the same URL across runs
    - Respect provider rate limits
    - Speed up subsequent runs (instant cache hits)
    - Reduce network usage
  - Duration Recommendations:
    | Duration | Use Case | Behavior |
    |---|---|---|
    | 1-6 hours | Testing, rapidly changing content | Short-term cache |
    | 24-48 hours | Normal use, daily runs | Default, balanced |
    | 168 hours (1 week) | Weekly runs, stable content | Longer persistence |
    | 720 hours (30 days) | Monthly runs, archival | Maximum persistence |
  - Auto-Cleanup: Expired entries are automatically removed based on `cleanup_interval_minutes`
  - Examples:
    cache_duration_hours = 6      # 6 hours (short-term)
    cache_duration_hours = 24     # 24 hours (default)
    cache_duration_hours = 168    # 1 week (long-term)
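As an illustration of the caching idea only: the actual schema of `.recovery_cache.db` is not documented here, so the table and column names below are hypothetical, and the sketch uses an in-memory database so it cannot touch the real cache:

# Hypothetical cache table showing how recovery results can be memoized.
import sqlite3, time

con = sqlite3.connect(":memory:")   # demo only; the real cache lives in .recovery_cache.db
con.execute("CREATE TABLE cache (url TEXT PRIMARY KEY, result TEXT, cached_at REAL)")

def lookup(url: str, max_age_hours: int = 24):
    row = con.execute("SELECT result, cached_at FROM cache WHERE url = ?", (url,)).fetchone()
    if row and time.time() - row[1] < max_age_hours * 3600:
        return row[0]        # cache hit (may be a stored "not found" marker)
    return None              # miss or expired -> query the providers again

def store(url: str, result: str) -> None:
    con.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", (url, result, time.time()))
    con.commit()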
- `max_cache_entries` - Maximum number of cached results
  - Type: Integer (positive)
  - Default: `10000`
  - Valid Range: 100-1,000,000+
  - What It Does: Limits the total number of entries in the cache database
  - Cleanup Behavior:
    - When the limit is reached, expired entries are removed first, then the oldest (FIFO)
    - Cleanup is triggered automatically
  - Storage Per Entry: ~200-500 bytes average
  - Total Storage Examples:
    1,000 entries     = ~0.5 MB
    10,000 entries    = ~5 MB (default)
    100,000 entries   = ~50 MB
    1,000,000 entries = ~500 MB
  - Recommendations:
    - Small archive (<1,000 posts): 1,000-5,000 entries
    - Medium archive (1,000-10,000): 10,000-50,000 entries (default range)
    - Large archive (10,000+): 50,000-500,000 entries
  - Examples:
    max_cache_entries = 1000     # Minimal cache
    max_cache_entries = 10000    # Default
    max_cache_entries = 100000   # Large cache
- `max_cache_size_mb` - Cache size limit in megabytes
  - Type: Integer (positive)
  - Default: `100`
  - Valid Range: 1-10,000 MB (1MB to 10GB)
  - What It Does: Limits the total disk space used by the cache database
  - Cleanup Trigger: When the cache file exceeds this size
  - Relationship with `max_cache_entries`:
    - Both limits are enforced independently
    - Whichever limit is reached first triggers cleanup
    - Typically the entries limit is hit first
  - Recommendations:
    10 MB:    Testing, minimal cache
    100 MB:   Default, sufficient for most users
    500 MB:   Large archives, lots of recovery attempts
    1000 MB:  Very large archives, maximum persistence
  - Examples:
    max_cache_size_mb = 10     # Minimal
    max_cache_size_mb = 100    # Default
    max_cache_size_mb = 500    # Large
- `cleanup_interval_minutes` - How often to run cache cleanup
  - Type: Integer (positive)
  - Default: `60`
  - Valid Range: 5-1440 minutes (5 minutes to 24 hours)
  - What It Does: Controls automatic cleanup of expired cache entries
  - What Gets Cleaned:
    - Entries older than `cache_duration_hours`
    - Excess entries beyond `max_cache_entries`
    - Entries removed if the cache exceeds `max_cache_size_mb`
  - Trigger Timing: Based on wall-clock time
  - Performance Impact: Minimal (<1 second per cleanup)
  - Recommendations:
    15-30 min:    Frequent runs, tight control
    60 min:       Default, balanced (hourly cleanup)
    120-240 min:  Infrequent runs, less overhead
    1440 min:     Once per day (very infrequent runs)
  - Examples:
    cleanup_interval_minutes = 30     # Every 30 minutes
    cleanup_interval_minutes = 60     # Every hour (default)
    cleanup_interval_minutes = 1440   # Daily
- `enable_background_cleanup` - Automatic cache maintenance
  - Type: Boolean
  - Default: `true`
  - Valid Values: `true`, `false`, `yes`, `no`, `on`, `off`, `1`, `0`
  - What It Does:
    - `true`: Runs cleanup automatically based on `cleanup_interval_minutes`
    - `false`: Only cleans up when limits are exceeded (manual mode)
  - Background Mode (`true`):
    - Periodic automatic cleanup
    - Prevents cache bloat
    - Recommended for most users
  - Manual Mode (`false`):
    - Only cleans when forced (size/entry limits hit)
    - Slightly less overhead
    - Cache may grow larger before cleanup
    - Use if you want maximum cache retention
  - Examples:
    enable_background_cleanup = true    # Automatic (recommended)
    enable_background_cleanup = false   # Manual only
[Retry]
# Retry behavior
max_retries = 5 # Maximum retry attempts per item
base_retry_delay_high = 5 # Base delay for high priority (seconds)
base_retry_delay_medium = 10 # Base delay for medium priority (seconds)
base_retry_delay_low = 15 # Base delay for low priority (seconds)
# Exponential backoff
exponential_base_delay = 60 # Base delay for exponential backoff
max_retry_delay = 86400 # Maximum delay (24 hours in seconds)
# Dead letter queue
dead_letter_threshold_days = 7 # Days before moving to dead letter queue
The retry system ensures failed downloads are automatically retried across multiple runs with intelligent priority-based backoff strategies. Failed items are queued in a persistent SQLite database (`.retry_queue.db`) and retried on subsequent script runs.
- `max_retries` - Maximum retry attempts before giving up
  - Type: Integer (positive)
  - Default: `5`
  - Valid Range: 1-50 (recommended: 3-10)
  - What It Does: How many times to retry a failed download before moving it to the dead letter queue
  - Retry Counter:
    - Attempt 1: Initial download (not a retry)
    - Attempts 2-6: Actual retries (if max_retries=5)
    - After attempt 6: Move to the dead letter queue
  - What Triggers Retries:
    - Network timeouts
    - HTTP errors (403, 429, 500, 502, 503, 504)
    - Temporary service unavailability
    - Rate limit errors
  - What Doesn't Retry:
    - 404 Not Found (permanent; triggers content recovery instead)
    - Invalid URLs
    - File too large (exceeds size limits)
  - Recommendations By Use Case:
    3 retries:    Quick processing, accept some failures
    5 retries:    Default, balanced persistence
    10 retries:   Maximum persistence, thorough recovery
    20+ retries:  Extreme cases, very unstable networks
  - Examples:
    max_retries = 3     # Quick, fewer attempts
    max_retries = 5     # Default, balanced
    max_retries = 10    # Persistent, thorough
Failed items are assigned priorities that determine their retry delays (a small sketch follows the list below):
- Priority Assignment (automatic):
  - High Priority: Small files (<1MB), recently saved posts, first-time failures
  - Medium Priority: Medium files (1-10MB), standard content
  - Low Priority: Large files (>10MB), repeated failures, low-value content
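A minimal sketch of the assignment idea, using the size and failure-count rules listed above; the function name, the "recently saved" factor being omitted, and the exact decision order are illustrative assumptions:

# Illustrative priority assignment based on the rules above.
def assign_priority(size_bytes: int, previous_failures: int) -> str:
    if previous_failures >= 2 or size_bytes > 10 * 1024 * 1024:
        return "low"      # large files or repeated failures
    if size_bytes < 1 * 1024 * 1024 and previous_failures == 0:
        return "high"     # small, first-time failures
    return "medium"       # everything in between

BASE_DELAYS = {"high": 5, "medium": 10, "low": 15}   # seconds, matching the defaults below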
- `base_retry_delay_high` - Base delay for high-priority items (seconds)
  - Type: Integer (positive)
  - Default: `5`
  - Valid Range: 1-300 seconds
  - When Used: Small, recent, important content
  - Examples:
    base_retry_delay_high = 1     # Retry almost immediately
    base_retry_delay_high = 5     # Default, 5 second delay
    base_retry_delay_high = 30    # More patient
- `base_retry_delay_medium` - Base delay for medium-priority items (seconds)
  - Type: Integer (positive)
  - Default: `10`
  - Valid Range: 5-600 seconds
  - When Used: Standard content, typical failures
  - Examples:
    base_retry_delay_medium = 5     # Quick retry
    base_retry_delay_medium = 10    # Default
    base_retry_delay_medium = 60    # Patient retry
- `base_retry_delay_low` - Base delay for low-priority items (seconds)
  - Type: Integer (positive)
  - Default: `15`
  - Valid Range: 10-1800 seconds
  - When Used: Large files, repeated failures
  - Examples:
    base_retry_delay_low = 10     # Relatively quick
    base_retry_delay_low = 15     # Default
    base_retry_delay_low = 120    # Very patient (2 minutes)
Priority Delay Example:
High priority item (attempt 1): Wait 5 seconds
Medium priority item (attempt 1): Wait 10 seconds
Low priority item (attempt 1): Wait 15 seconds
After the base delay, subsequent retries use exponential backoff to avoid hammering failing services.
- `exponential_base_delay` - Base delay for the exponential backoff calculation
  - Type: Integer (positive)
  - Default: `60`
  - Valid Range: 10-3600 seconds (10 seconds to 1 hour)
  - Formula: `delay = exponential_base_delay × 2^(attempt_number - 1)`
  - Backoff Examples (base=60):
    Attempt 1: 60 × 2^0 = 60 seconds (1 minute)
    Attempt 2: 60 × 2^1 = 120 seconds (2 minutes)
    Attempt 3: 60 × 2^2 = 240 seconds (4 minutes)
    Attempt 4: 60 × 2^3 = 480 seconds (8 minutes)
    Attempt 5: 60 × 2^4 = 960 seconds (16 minutes)
  - Purpose: Gradual backoff reduces load on failing services and increases the chance of success
  - Recommendations:
    30s:   Quick backoff, impatient
    60s:   Default, balanced (1 minute base)
    300s:  Slow backoff, very patient (5 minute base)
  - Examples:
    exponential_base_delay = 30     # Quick backoff
    exponential_base_delay = 60     # Default (1 min)
    exponential_base_delay = 300    # Slow backoff (5 min)
- `max_retry_delay` - Maximum delay between retries (cap)
  - Type: Integer (positive)
  - Default: `86400` (24 hours)
  - Valid Range: 60-604800 seconds (1 minute to 7 days)
  - What It Does: Caps exponential backoff to prevent extremely long delays (see the sketch below)
  - Without a Cap: Delays keep doubling; attempt 10 with base=60 would already be 30,720 seconds (~8.5 hours!)
  - With the Cap (86400): Any computed delay above 24 hours is capped at 24 hours
  - Delay Capping Example (base=60, max=86400):
    Attempt 1:  60s (1 min)
    Attempt 2:  120s (2 min)
    Attempt 3:  240s (4 min)
    Attempt 4:  480s (8 min)
    Attempt 5:  960s (16 min)
    Attempt 6:  1920s (32 min)
    Attempt 7:  3840s (64 min)
    Attempt 8:  7680s (128 min ≈ 2 hours)
    Attempt 9:  15360s (256 min ≈ 4 hours)
    Attempt 10: 30720s (512 min ≈ 8.5 hours)
    Attempt 11: 61440s (~17 hours)
    Attempt 12: 122880s → CAPPED at 86400s (24 hours)
  - Recommendations:
    3600s (1 hour):     Quick turnaround, frequent runs
    86400s (24 hours):  Default, daily runs
    604800s (7 days):   Weekly runs, maximum patience
  - Examples:
    max_retry_delay = 3600      # 1 hour max
    max_retry_delay = 86400     # 24 hours (default)
    max_retry_delay = 604800    # 7 days max
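Put together, the delay for a given attempt follows directly from the documented formula and cap. A minimal sketch (the function name is illustrative):

# Exponential backoff with a cap:
# delay = exponential_base_delay * 2^(attempt - 1), capped at max_retry_delay.
def retry_delay(attempt: int, base: int = 60, cap: int = 86400) -> int:
    return min(base * 2 ** (attempt - 1), cap)

for attempt in range(1, 13):
    print(attempt, retry_delay(attempt))   # attempt 12 prints 86400 (capped at 24 hours)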
Items that exceed retry limits are moved to a "dead letter queue" for manual review or permanent archiving.
- `dead_letter_threshold_days` - Days before giving up permanently
  - Type: Integer (positive)
  - Default: `7`
  - Valid Range: 1-365 days (1 day to 1 year)
  - What It Does: After this many days of retrying, the item moves to the dead letter queue
  - Dead Letter Queue Behavior:
    - Items are marked as "permanently failed"
    - No longer retried automatically
    - Kept in the database for manual review
    - Can be manually cleared or re-queued
  - Calculation:
    - Based on the first failure timestamp, not the retry count
    - Example: Item first fails on Jan 1, threshold=7 days → moves to the DLQ on Jan 8
  - What Happens After the DLQ:
    - Item is logged as a permanent failure
    - Visible in the retry queue status
    - Manual intervention required to retry
    - Can be cleared to reduce database size
  - Recommendations:
    1-2 days:   Quick cleanup, aggressive pruning
    7 days:     Default, balanced (1 week)
    30 days:    Patient, thorough recovery attempts
    365 days:   Maximum persistence (1 year)
  - Examples:
    dead_letter_threshold_days = 1     # Move to DLQ after 1 day
    dead_letter_threshold_days = 7     # Default (1 week)
    dead_letter_threshold_days = 30    # Patient (1 month)
Dead Letter Queue Management:
- View DLQ items: Check `.retry_queue.db` with a SQLite browser (or the sketch below)
- Clear DLQ: Delete the database file (a fresh one is created on the next run)
- Re-queue items: Manual SQL UPDATE statements
- Monitor: Check the logs for "moved to dead letter queue" messages
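For manual review, a schema-agnostic peek at the database is usually enough. Since the internal table layout isn't documented here, this sketch only lists tables and row counts rather than assuming column names:

# Schema-agnostic, read-only look inside the retry queue database.
import sqlite3

con = sqlite3.connect("reddit/.retry_queue.db")   # path assumes the default save directory
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    count = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} rows")
con.close()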
Complete Retry Flow Example (max_retries=5, threshold=7 days):
Day 1, Run 1: Download fails → Queue (attempt 1/5, high priority, 5s delay)
Day 1, Run 2: Retry fails → Queue (attempt 2/5, 60s backoff)
Day 2, Run 1: Retry fails → Queue (attempt 3/5, 120s backoff)
Day 3, Run 1: Retry fails → Queue (attempt 4/5, 240s backoff)
Day 4, Run 1: Retry fails → Queue (attempt 5/5, 480s backoff)
Day 5, Run 1: Final retry fails → Max retries exceeded, keep in queue
Day 8: Item in queue for >7 days → Move to dead letter queue
Quick alphabetical reference of all settings, with links to the detailed documentation:
- `base_retry_delay_high` (Integer, default: 5) - High-priority retry delay | → Retry Section
- `base_retry_delay_low` (Integer, default: 15) - Low-priority retry delay | → Retry Section
- `base_retry_delay_medium` (Integer, default: 10) - Medium-priority retry delay | → Retry Section
- `cache_duration_hours` (Integer, default: 24) - Cache recovery results duration | → Recovery Section
- `check_type` (String, default: LOG) - File existence checking method | → Settings Section
- `cleanup_interval_minutes` (Integer, default: 60) - Cache cleanup frequency | → Recovery Section
- `client_id` (String, default: None) - Reddit API client ID | → Configuration Section
- `client_ids` (String, default: None) - Imgur client IDs (multiple in settings.ini only) | → Imgur Section
- `client_secret` (String, default: None) - Reddit API client secret | → Configuration Section
- `client_secrets` (String, default: None) - Imgur client secrets (multiple in settings.ini only) | → Imgur Section
- `create_thumbnails` (Boolean, default: true) - Generate thumbnail versions | → Media Section
- `dead_letter_threshold_days` (Integer, default: 7) - Days before moving to DLQ | → Retry Section
- `download_albums` (Boolean, default: true) - Process multi-image posts | → Media Section
- `download_audio` (Boolean, default: true) - Control audio downloads | → Media Section
- `download_enabled` (Boolean, default: true) - Master media download switch | → Media Section
- `download_images` (Boolean, default: true) - Control image downloads | → Media Section
- `download_timeout` (Integer, default: 30) - Per-file download timeout | → Media Section
- `download_videos` (Boolean, default: true) - Control video downloads | → Media Section
- `dropbox_directory` (String, default: /reddit) - Dropbox cloud storage path | → Settings Section
- `enable_background_cleanup` (Boolean, default: true) - Automatic cache maintenance | → Recovery Section
- `exponential_base_delay` (Integer, default: 60) - Exponential backoff base delay | → Retry Section
- `ignore_tls_errors` (Boolean, default: false) - Bypass SSL certificate validation | → Settings Section
- `max_album_images` (Integer, default: 50) - Limit images per album | → Media Section
- `max_cache_entries` (Integer, default: 10000) - Maximum cached recovery results | → Recovery Section
- `max_cache_size_mb` (Integer, default: 100) - Cache size limit in MB | → Recovery Section
- `max_concurrent_downloads` (Integer, default: 3) - Parallel download streams | → Media Section
- `max_daily_storage_mb` (Integer, default: 1024) - Daily storage limit in MB | → Media Section
- `max_image_size` (Integer, default: 5242880) - Max image file size in bytes | → Media Section
- `max_retries` (Integer, default: 5) - Maximum retry attempts | → Retry Section
- `max_retry_delay` (Integer, default: 86400) - Maximum retry delay cap | → Retry Section
- `max_video_size` (Integer, default: 209715200) - Max video file size in bytes | → Media Section
- `password` (String, default: None) - Reddit account password | → Configuration Section
- `process_api` (Boolean, default: true) - Fetch content from Reddit API | → Settings Section
- `process_gdpr` (Boolean, default: false) - Process GDPR export files | → Settings Section
- `recover_deleted` (Boolean, default: true) - Attempt Imgur content recovery | → Imgur Section
- `save_directory` (String, default: reddit/) - Local save directory | → Settings Section
- `save_type` (String, default: ALL) - What content to download | → Settings Section
- `thumbnail_size` (Integer, default: 800) - Thumbnail dimensions in pixels | → Media Section
- `timeout_seconds` (Integer, default: 10) - Per-provider recovery timeout | → Recovery Section
- `unsave_after_download` (Boolean, default: false) - Auto-unsave after download | → Settings Section
- `use_pushshift_api` (Boolean, default: true) - Use PullPush.io provider | → Recovery Section
- `use_reddit_previews` (Boolean, default: true) - Use Reddit preview system | → Recovery Section
- `use_reveddit_api` (Boolean, default: true) - Use Reveddit provider | → Recovery Section
- `use_wayback_machine` (Boolean, default: true) - Use Internet Archive | → Recovery Section
- `username` (String, default: None) - Reddit username | → Configuration Section
- `video_quality` (String, default: high) - Video quality preference | → Media Section
💡 Tip: Use your browser's search (Ctrl+F / Cmd+F) to quickly find a specific setting in the detailed sections above.
For Maximum Speed:
check_type = LOG
max_concurrent_downloads = 5
download_timeout = 15
For Maximum Reliability:
check_type = DIR
max_concurrent_downloads = 1
download_timeout = 60
Minimal Storage:
download_images = true
download_videos = false
create_thumbnails = false
max_image_size = 1048576 # 1MB
Complete Archive:
download_albums = true
max_video_size = 524288000 # 500MB
max_album_images = 0 # unlimited
- Never commit settings.ini with credentials to version control
- Use environment variables for production deployments
- Keep `ignore_tls_errors = false` unless absolutely necessary
- Regularly rotate API credentials
Note: Environment variables always take precedence over settings.ini values for credentials.
Reddit Stash includes an advanced media download system that can download images, videos, and other media from Reddit posts and comments. Through modern web technologies and intelligent retry mechanisms, this system achieves ~80% success rates for media downloads - a dramatic improvement over basic HTTP methods (~10% success).
What this means for you:
- Most images and videos from your saved posts will actually be downloaded and preserved
- Better compatibility with modern hosting services and anti-bot protection
- Automatic recovery from temporary failures and rate limiting
- Separate tracking shows you exactly how much media content was successfully saved
What this means for you:
- Most users: Reddit Stash works great without Imgur API - you'll get occasional Imgur download failures, which is normal
- Lucky few: If you already had an Imgur application before 2024, you can use it for better Imgur downloads
This is the normal experience - don't worry about trying to get Imgur credentials:
✅ What works perfectly:
- Reddit-hosted images and videos (i.redd.it, v.redd.it)
- Most other image hosting services
- All text content and metadata
⚠️ What to expect:
- Some Imgur images fail with "429 rate limit" errors
- This is expected and normal - not something you need to fix
If you already have an Imgur application from before 2024:
- Find your credentials at https://imgur.com/account/settings/apps
- Set environment variables:
  export IMGUR_CLIENT_ID='your_imgur_client_id'
  export IMGUR_CLIENT_SECRET='your_imgur_client_secret'
- Enjoy enhanced features: Better reliability, album support, fewer rate limits
For enhanced media downloading, you can optionally set these environment variables:
# Imgur API (if you have existing credentials)
export IMGUR_CLIENT_ID='your_imgur_client_id'
export IMGUR_CLIENT_SECRET='your_imgur_client_secret'
# These improve download reliability and enable advanced features
# Leave unset if you don't have Imgur API access - basic downloads will still work
- Create a Reddit app at https://www.reddit.com/prefs/apps or https://old.reddit.com/prefs/apps/
- Set up the name, select `script`, and provide the `redirect_uri` as per the PRAW docs.
- Copy the provided `REDDIT_CLIENT_ID` and the `REDDIT_CLIENT_SECRET` based on the following screenshot:
- `REDDIT_USERNAME` is your Reddit username
- `REDDIT_PASSWORD` is your Reddit password
Important for Two-Factor Authentication (TFA):
If your Reddit account has TFA enabled, you must provide your password and TFA code together, separated by a colon (`:`), e.g. `REDDIT_PASSWORD='your_password:123456'`, where `123456` is your current TFA code. Alternatively, you can disable TFA for the account. If neither is done, prawcore authentication will fail.
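For reference, this is how the combined value can be passed to PRAW when testing locally. This is a sketch, not the script's exact initialization; the TOTP code is whatever your authenticator app currently shows:

# Sketch: authenticating with PRAW when TFA is enabled on the account.
import praw

totp_code = "123456"   # current code from your authenticator app
reddit = praw.Reddit(
    client_id="REDDIT_CLIENT_ID",
    client_secret="REDDIT_CLIENT_SECRET",
    username="REDDIT_USERNAME",
    password=f"your_password:{totp_code}",   # password and TFA code joined by a colon
    user_agent="reddit-stash setup test",
)
print(reddit.user.me())   # should print your username if authentication works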
Keep these credentials for the setup.
- Go to Dropbox Developer App.
- Click on Create app.
- Select `Scoped access` and choose `Full Dropbox` or `App folder` for the access type.
- Give a name to your app and click `Create app`.
- In the `Permissions` tab, ensure the following are checked under `Files and folders`:
- Your `DROPBOX_APP_KEY` and `DROPBOX_APP_SECRET` are on the settings page of the app you created.
- To get the `DROPBOX_REFRESH_TOKEN`, follow these steps:
Replace `<DROPBOX_APP_KEY>` with the `DROPBOX_APP_KEY` you got in the previous step and add it to the authorization URL below:
https://www.dropbox.com/oauth2/authorize?client_id=<DROPBOX_APP_KEY>&token_access_type=offline&response_type=code
Paste the URL into a browser and complete the authorization flow. At the end you will receive an `<AUTHORIZATION_CODE>`; save it, as you will need it later.
Go to Postman and create a new POST request with the configuration below:
- Add Request URL: https://api.dropboxapi.com/oauth2/token
- Click on the Authorization tab -> Type = Basic Auth -> Username = `<DROPBOX_APP_KEY>`, Password = `<DROPBOX_APP_SECRET>` (refer to this answer for the cURL -u option)
- Body -> Select "x-www-form-urlencoded"
Key | Value |
---|---|
code | <AUTHORIZATION_CODE> |
grant_type | authorization_code |
After you send the request, you will receive a JSON payload containing the refresh_token:
{
"access_token": "sl.****************",
"token_type": "bearer",
"expires_in": 14400,
"refresh_token": "*********************",
"scope": <SCOPES>,
"uid": "**********",
"account_id": "***********************"
}
Add/export the above refresh_token as `DROPBOX_REFRESH_TOKEN` in your environment. For more information about the setup, see the Dropbox OAuth Guide.
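If you prefer not to use Postman, the same token exchange can be done with a few lines of Python against the public Dropbox OAuth endpoint. A sketch; fill in your own placeholder values:

# Sketch: exchange the authorization code for a refresh token without Postman.
import requests

resp = requests.post(
    "https://api.dropboxapi.com/oauth2/token",
    data={"code": "<AUTHORIZATION_CODE>", "grant_type": "authorization_code"},
    auth=("<DROPBOX_APP_KEY>", "<DROPBOX_APP_SECRET>"),   # same as Postman's Basic Auth
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["refresh_token"])   # export this value as DROPBOX_REFRESH_TOKEN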
- Credits for above DROPBOX_REFRESH_TOKEN solution : https://stackoverflow.com/a/71794390/12983596
When using Docker, you can control the scheduling behavior using these additional environment variables:
- `SCHEDULE_MODE`: Controls the execution mode
  - `once` (default): Run the script once and exit
  - `periodic`: Run the script continuously on a schedule
- `SCHEDULE_INTERVAL`: Time between executions in periodic mode
  - Default: `7200` (2 hours)
  - Minimum: `60` (1 minute)
  - Units: seconds
  - Examples: `3600` = 1 hour, `1800` = 30 minutes, `14400` = 4 hours
Run once (default behavior):
docker run -it [...other env vars...] reddit-stash
Run every 2 hours:
docker run -it -e SCHEDULE_MODE=periodic [...other env vars...] reddit-stash
Run every 30 minutes:
docker run -it -e SCHEDULE_MODE=periodic -e SCHEDULE_INTERVAL=1800 [...other env vars...] reddit-stash
Stopping periodic execution:
Use `Ctrl+C` or send SIGTERM to the container for a graceful shutdown.
While Docker's built-in scheduling (`SCHEDULE_MODE=periodic`) is convenient, some users prefer external scheduling for more control. Here's how to set up cron jobs to run Docker containers on a schedule.
Use External Cron when:
- You want system-level control over scheduling
- You need complex scheduling patterns (weekdays only, multiple times per day, etc.)
- You prefer containers to start/stop rather than run continuously
- You want to integrate with existing cron workflows
- You need different schedules for different operations (main script vs Dropbox uploads)
Use Built-in Scheduling when:
- You want simple, consistent intervals
- You prefer minimal setup
- You want the container to run continuously
- You don't need complex scheduling patterns
1. Edit your crontab:
crontab -e
2. Add cron jobs for different schedules:
Every 2 hours (Reddit data fetch):
0 */2 * * * docker run --rm -e REDDIT_CLIENT_ID='your_client_id' -e REDDIT_CLIENT_SECRET='your_client_secret' -e REDDIT_USERNAME='your_username' -e REDDIT_PASSWORD='your_password' -e DROPBOX_APP_KEY='your_dropbox_key' -e DROPBOX_APP_SECRET='your_dropbox_secret' -e DROPBOX_REFRESH_TOKEN='your_dropbox_token' -v /home/user/reddit-data:/app/reddit reddit-stash >> /var/log/reddit-stash.log 2>&1
Daily at 9 AM:
0 9 * * * docker run --rm -e REDDIT_CLIENT_ID='your_client_id' -e REDDIT_CLIENT_SECRET='your_client_secret' -e REDDIT_USERNAME='your_username' -e REDDIT_PASSWORD='your_password' -v /home/user/reddit-data:/app/reddit reddit-stash >> /var/log/reddit-stash.log 2>&1
Weekdays only, every 3 hours during work hours (9 AM - 6 PM):
0 9,12,15,18 * * 1-5 docker run --rm -e REDDIT_CLIENT_ID='your_client_id' -e REDDIT_CLIENT_SECRET='your_client_secret' -e REDDIT_USERNAME='your_username' -e REDDIT_PASSWORD='your_password' -v /home/user/reddit-data:/app/reddit reddit-stash >> /var/log/reddit-stash.log 2>&1
Separate Dropbox upload job (runs 30 minutes after main job):
30 */2 * * * docker run --rm -e DROPBOX_APP_KEY='your_dropbox_key' -e DROPBOX_APP_SECRET='your_dropbox_secret' -e DROPBOX_REFRESH_TOKEN='your_dropbox_token' -v /home/user/reddit-data:/app/reddit reddit-stash dropbox_utils.py --upload >> /var/log/reddit-stash-upload.log 2>&1
1. Open Task Scheduler (search "Task Scheduler" in Start menu)
2. Create Basic Task:
- Name: "Reddit Stash"
- Trigger: Daily/Weekly/Custom
- Action: "Start a program"
- Program: `docker`
- Arguments: `run --rm -e REDDIT_CLIENT_ID=your_client_id -e REDDIT_CLIENT_SECRET=your_client_secret -e REDDIT_USERNAME=your_username -e REDDIT_PASSWORD=your_password -v C:\reddit-data:/app/reddit reddit-stash`
3. Advanced Options:
- Run whether user is logged on or not
- Run with highest privileges
- Configure for your Windows version
Option 1: Create an environment file
Create `/home/user/.reddit-stash.env`:
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USERNAME=your_username
REDDIT_PASSWORD=your_password
DROPBOX_APP_KEY=your_dropbox_key
DROPBOX_APP_SECRET=your_dropbox_secret
DROPBOX_REFRESH_TOKEN=your_dropbox_token
Then use in cron:
0 */2 * * * docker run --rm --env-file /home/user/.reddit-stash.env -v /home/user/reddit-data:/app/reddit reddit-stash >> /var/log/reddit-stash.log 2>&1
Option 2: Create a shell script
Create `/home/user/run-reddit-stash.sh`:
#!/bin/bash
docker run --rm \
-e REDDIT_CLIENT_ID='your_client_id' \
-e REDDIT_CLIENT_SECRET='your_client_secret' \
-e REDDIT_USERNAME='your_username' \
-e REDDIT_PASSWORD='your_password' \
-e DROPBOX_APP_KEY='your_dropbox_key' \
-e DROPBOX_APP_SECRET='your_dropbox_secret' \
-e DROPBOX_REFRESH_TOKEN='your_dropbox_token' \
-v /home/user/reddit-data:/app/reddit \
reddit-stash
Make it executable and add to cron:
chmod +x /home/user/run-reddit-stash.sh
Cron entry:
0 */2 * * * /home/user/run-reddit-stash.sh >> /var/log/reddit-stash.log 2>&1
Create `docker-compose.yml`:
version: '3.8'
services:
reddit-stash:
image: reddit-stash
environment:
- REDDIT_CLIENT_ID=${REDDIT_CLIENT_ID}
- REDDIT_CLIENT_SECRET=${REDDIT_CLIENT_SECRET}
- REDDIT_USERNAME=${REDDIT_USERNAME}
- REDDIT_PASSWORD=${REDDIT_PASSWORD}
- DROPBOX_APP_KEY=${DROPBOX_APP_KEY}
- DROPBOX_APP_SECRET=${DROPBOX_APP_SECRET}
- DROPBOX_REFRESH_TOKEN=${DROPBOX_REFRESH_TOKEN}
volumes:
- ./reddit:/app/reddit
profiles:
- manual
Run via cron:
0 */2 * * * cd /path/to/reddit-stash && docker-compose --profile manual run --rm reddit-stash >> /var/log/reddit-stash.log 2>&1
View cron logs:
# Ubuntu/Debian
tail -f /var/log/syslog | grep CRON
# CentOS/RHEL
tail -f /var/log/cron
# Your application logs
tail -f /var/log/reddit-stash.log
Monitor container execution:
# Check if containers are running
docker ps
# View recent container logs
docker logs $(docker ps -lq)
# Monitor Docker events
docker events --filter container=reddit-stash
Common Issues:
- Path issues: Cron has a limited PATH. Use full paths:
  0 */2 * * * /usr/bin/docker run --rm [...] reddit-stash
- Environment variables: Cron doesn't inherit your shell environment. Use `--env-file` or set variables in the crontab:
  PATH=/usr/local/bin:/usr/bin:/bin
  0 */2 * * * docker run --rm [...] reddit-stash
- Permissions: Ensure your user can run Docker without sudo:
  sudo usermod -aG docker $USER   # Log out and back in
- Testing cron jobs: Test your command manually first:
  # Run the exact command from your crontab
  docker run --rm -e REDDIT_CLIENT_ID='...' [...] reddit-stash
Using the `--rm` flag:
- Automatically removes containers after execution
- Prevents accumulation of stopped containers
- Essential for cron-based scheduling
Memory limits:
docker run --rm --memory=512m -e [...] reddit-stash
CPU limits:
docker run --rm --cpus=0.5 -e [...] reddit-stash
Cleanup old images periodically:
# Add to weekly cron
0 0 * * 0 docker image prune -f >> /var/log/docker-cleanup.log 2>&1
Reddit Stash can optionally unsave each item after it has been downloaded (controlled by `unsave_after_download` in settings.ini). This feature can be used to cycle through older saved posts beyond Reddit's 1000-item limit. When enabled, the process works as follows (a minimal sketch of the loop follows this list):
- The script downloads and saves a post/comment
- If successful, it attempts to unsave the item
- A small delay is added between unsave operations to respect Reddit's rate limits
- Error handling ensures that failed unsaves don't stop the script
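Conceptually the loop looks like this sketch. It uses PRAW's real `saved()` and `unsave()` calls, but the initialization, delay value, and error handling are illustrative rather than the script's exact code:

# Sketch of the download-then-unsave loop.
import time
import praw

reddit = praw.Reddit("reddit_stash")               # assumes credentials in praw.ini/env
for item in reddit.user.me().saved(limit=None):    # saved posts and comments alike
    # ... download/save the item to markdown here ...
    try:
        item.unsave()                              # irreversible: removes it from your saved list
    except Exception as exc:                       # a failed unsave shouldn't stop the run
        print(f"unsave failed for {item.id}: {exc}")
    time.sleep(1)                                  # small delay to respect Reddit's rate limits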
- This process is irreversible - Once items are unsaved, they cannot be automatically restored to your saved items list
- Create backups first - Always ensure you have a backup of your saved items before enabling this feature
- Use with caution - It's recommended to first run the script without unsaving to verify everything works as expected
- Rate Limiting - The script includes built-in delays to avoid hitting Reddit's API limits
- Error Recovery - If an unsave operation fails, the script will continue processing other items
- Set `unsave_after_download = true` in your settings.ini file
- Run the script as normal
- The script will now unsave items after successfully downloading them
- Run the script multiple times to gradually access older saved items
- First run: Keep `unsave_after_download = false` and verify all content downloads correctly
- Create a backup of your downloaded content
- Enable unsaving by setting `unsave_after_download = true`
- Run the script multiple times to access progressively older content
The script can process Reddit's GDPR data export to access your complete saved post history. This feature uses PRAW to fetch full content for each saved item in your export.
- Request your Reddit data:
  - Go to https://www.reddit.com/settings/data-request
  - Request your data (processing may take several days)
  - Download the ZIP file when ready
- Extract and place the CSV files:
  - Inside your save directory (from settings.ini), create a `gdpr_data` folder
  - Example structure:
    reddit/                  # Your save directory
    ├── gdpr_data/           # GDPR data directory
    │   ├── saved_posts.csv
    │   └── saved_comments.csv
    ├── subreddit1/          # Regular saved content
    └── file_log.json
- Enable GDPR processing:
  [Settings]
  process_gdpr = true
- Run the script:
  python reddit_stash.py
- Uses PRAW's built-in rate limiting
- Processes both submissions and comments
- Maintains consistent file naming with "GDPR_" prefix
- Integrates with existing file logging system
- Handles API errors and retries gracefully
- GDPR processing runs after regular API processing
- Each item requires a separate API call to fetch full content (see the sketch after this list)
- Rate limits are shared with regular API processing
- Large exports may take significant time to process
- Duplicate items are automatically skipped via file logging
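Conceptually, the GDPR pass walks the export CSV and hydrates each ID through PRAW, roughly like the sketch below. It assumes the export's saved_posts.csv contains an `id` column, which may differ from your actual export; the initialization and output are also illustrative:

# Sketch: fetch full content for IDs listed in a GDPR export CSV via PRAW.
import csv
import praw

reddit = praw.Reddit("reddit_stash")   # assumes credentials in praw.ini/env
with open("reddit/gdpr_data/saved_posts.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        submission = reddit.submission(id=row["id"])   # one API call per item
        print(submission.title)                        # real processing writes GDPR_POST_<id>.md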
Reddit Stash organizes content by subreddit with a clear file naming convention:
- Your Posts: `POST_[post_id].md`
- Your Comments: `COMMENT_[comment_id].md`
- Saved Posts: `SAVED_POST_[post_id].md`
- Saved Comments: `SAVED_COMMENT_[comment_id].md`
- Upvoted Posts: `UPVOTE_POST_[post_id].md`
- Upvoted Comments: `UPVOTE_COMMENT_[comment_id].md`
- GDPR Posts: `GDPR_POST_[post_id].md`
- GDPR Comments: `GDPR_COMMENT_[comment_id].md`
The system includes several utility modules:
- file_operations.py: Handles all file saving and organization logic
- save_utils.py: Contains the core content formatting functions
- gdpr_processor.py: Processes the GDPR data export
- time_utilities.py: Manages rate limiting and API request timing
- log_utils.py: Tracks processed files to avoid duplicates
Q: Why would I want to backup my Reddit content?
A: Reddit only allows you to access your most recent 1000 saved items. This tool lets you preserve everything beyond that limit and ensures you have a backup even if content is removed from Reddit.
Q: How often does the automated backup run?
A: If you use the GitHub Actions setup, it runs on a schedule:
- Every 2 hours during peak hours (6:00-21:00 UTC)
- Twice during off-peak hours (23:00 and 3:00 UTC)
Q: Can I run this without GitHub Actions?
A: Yes, you can run it locally on your machine or set up the Docker container version. The README provides instructions for both options.
Q: Does this access private/NSFW subreddits I've saved content from?
A: Yes, as long as you're logged in with your own Reddit credentials, the script can access any content you've saved, including from private or NSFW subreddits.
Q: How can I verify the script is working correctly?
A: Check your specified save directory for the backed-up files. They should be organized by subreddit with clear naming conventions.
Q: Will this impact my Reddit account in any way?
A: No, unless you enable the `unsave_after_download` option. This script only reads your data by default; it doesn't modify anything on Reddit unless that specific option is enabled.
Q: What happens if the script encounters rate limits?
A: The script has built-in dynamic sleep timers to respect Reddit's API rate limits. It will automatically pause and retry when necessary.
When using Reddit Stash, keep these security considerations in mind:
- Never share your Reddit API credentials or Dropbox tokens with others
- When using GitHub Actions, your credentials are stored as encrypted secrets
- For local installations, consider using environment variables instead of hardcoding credentials in the settings file
- Regularly rotate your API keys and tokens, especially if you suspect they may have been compromised
- Reddit Stash downloads and stores all content from saved posts, including links and images
- Be aware that this may include sensitive or private information if you've saved such content
- Consider where you're storing the backed-up content and who has access to that location
- Dropbox encryption provides some protection, but for highly sensitive data, consider additional encryption
- The GitHub Actions workflow runs in GitHub's cloud environment
- While GitHub has strong security measures, be aware that your Reddit content is processed in this environment
- The workflow has access to your repository secrets and the content being processed
- For maximum security, consider running the script locally on a trusted machine
- Content is stored in plain text markdown files
- If storing content locally, ensure your device has appropriate security measures (encryption, access controls)
- If you back up your local storage to other services, be mindful of where your Reddit content might end up
Feel free to open issues or submit pull requests if you have any improvements or bug fixes.
- This project was inspired by reddit-saved-saver.
Have an idea for improving Reddit Stash? Feel free to suggest it in the issues or contribute a pull request!
✅ Recently Implemented:
- Content Recovery System - 4-provider cascade for failed downloads (Wayback Machine, PullPush.io, Reddit Previews, Reveddit) with SQLite caching and automatic retry across runs
- Advanced Media Download System - Modern web compatibility with HTTP/2 support and browser impersonation
- Comprehensive Rate Limiting - Multi-layer rate limiting with provider-specific limits and intelligent backoff
🔮 Planned Enhancements:
- Improve error handling for edge cases
- Add support for additional cloud storage providers (Google Drive, OneDrive)
- Create a simple web interface for configuration
- Enhanced Media Processing
- Video compression and format conversion options
- Parallel downloads with queue management
- Selective downloads by file size/type with user-defined rules
- Download progress tracking and statistics
- Additional Recovery Providers
- Archive.today integration
- Library of Congress web archive
- Custom recovery provider plugins
This project is licensed under the MIT License - see the LICENSE file for details.