ayghri/tweet_rss


Nitter RSS Aggregator

Scrape tweets from a Nitter instance and serve them as RSS feeds.

How It Works

This application scrapes tweets from a Nitter instance (a privacy-focused Twitter frontend) and stores them in a database. It uses headless Chrome via nodriver to render JavaScript and scroll through search results, extracting tweets progressively. Make sure infinite scrolling is enabled in your Nitter instance.

Key features:

  • Scrape tweets for multiple Twitter/X users
  • Organize users into lists
  • Generate RSS feeds per list
  • Automatic periodic scraping (configurable interval)
  • Tweet type detection (post, reply, quote, retweet)
  • Progressive saving (tweets saved as they're found)
  • Django admin for management
  • Docker support with minimal Alpine-based image
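The "progressive saving" idea can be pictured with a small sketch. This is not the project's scraper: the function name, the tweet dictionary shape, and the `save` callback are all hypothetical, and each "batch" stands in for whatever tweets are visible after one scroll. The 500-tweet cap mirrors the limit described under Rate Limiting.

```python
from typing import Callable, Iterable

Tweet = dict  # hypothetical shape: {"id": str, "text": str}


def collect_progressively(
    batches: Iterable[list[Tweet]],
    save: Callable[[Tweet], None],
    max_tweets: int = 500,
) -> int:
    """Save each unseen tweet as soon as it appears; return how many were saved."""
    seen: set[str] = set()
    for batch in batches:            # one batch per scroll of the page
        for tweet in batch:
            if tweet["id"] in seen:  # scrolling can re-expose earlier tweets
                continue
            seen.add(tweet["id"])
            save(tweet)              # persisted immediately, not at the end
            if len(seen) >= max_tweets:
                return len(seen)
    return len(seen)
```

Because every tweet is handed to `save` the moment it is first seen, a crash or timeout halfway through a long scroll would lose only the tweets not yet reached.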

Requirements

A self-hosted Nitter instance is strongly recommended. Public Nitter instances are often rate-limited or blocked by Twitter, resulting in empty or incomplete results. Running your own Nitter instance gives far more reliable scraping.

Popular Nitter deployments:

Project Structure

├── scripts/                   # Standalone scraper (optional)
│   ├── scraper.py             # Core scraping module
│   ├── scrape.py              # CLI
│   └── requirements.txt
├── tweets_rss/                # Django web app
│   ├── manage.py
│   ├── settings.py
│   ├── urls.py
│   ├── requirements.txt
│   └── rss_aggregator/
│       ├── models.py          # TwitterUser, Tweet, TweetList
│       ├── admin.py           # Admin interface
│       ├── feeds.py           # RSS feed generation
│       ├── scraper.py         # Integrated scraper
│       ├── scheduler.py       # Background task scheduler
│       └── views.py           # YAML export endpoint
├── Dockerfile
├── docker-compose.yml
├── entrypoint.sh
├── .env.example
└── README.md

Quick Start (Docker)

  1. Clone and configure:
git clone <repo-url>
cd tweets_retrieve
cp .env.example .env
  2. Edit .env with your Nitter instance URL:
NITTER_INSTANCE=http://your-nitter-instance:8080
  3. Start the container:
docker compose up -d
# or with podman
podman compose up -d
  4. Access admin: http://localhost:8000/admin/

Default credentials: admin / admin (change via ADMIN_USER / ADMIN_PASSWORD env vars)


Manual Installation

cd tweets_rss
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

python manage.py migrate
python manage.py createsuperuser
python manage.py runserver

Usage

Adding Users

  1. Go to Admin > Twitter Users > Add
  2. Enter the Twitter username (without @)
  3. Set the Start Date (tweets before this date won't be scraped)
  4. Choose whether to Include Replies
  5. Save - scraping starts automatically in the background
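As an illustration of how the username, Start Date, and Include Replies settings could translate into a search, here is a minimal sketch. It is an assumption, not the project's code: it supposes the Nitter search page accepts `f=tweets` and a `q` parameter carrying Twitter-style search operators (`from:`, `since:`, `-filter:replies`). The function name `build_search_url` is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def build_search_url(instance: str, username: str, start: date,
                     include_replies: bool = False) -> str:
    """Build a search URL for one user's tweets on or after `start`.

    The query parameters are assumed, not taken from the project.
    """
    terms = [f"from:{username}", f"since:{start.isoformat()}"]
    if not include_replies:
        terms.append("-filter:replies")  # assumed operator for excluding replies
    query = urlencode({"f": "tweets", "q": " ".join(terms)})
    return f"{instance.rstrip('/')}/search?{query}"


if __name__ == "__main__":
    print(build_search_url("http://your-nitter-instance:8080", "karpathy",
                           date(2024, 1, 1)))
```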

Creating Lists

  1. Go to Admin > Tweet Lists > Add
  2. Create a list with a name and slug
  3. Add users to the list
  4. Subscribe to RSS at /rss/<slug>/
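Any feed reader can consume the list feed. For scripted use, the sketch below parses a feed with the Python standard library. It assumes the feed is standard RSS 2.0 with `title` and `link` elements in each `item`; the sample document and the `parse_items` helper are illustrative, not output captured from this application.

```python
import xml.etree.ElementTree as ET

# Illustrative RSS 2.0 document, not real output from the app.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>example-list</title>
    <item>
      <title>First tweet</title>
      <link>https://example.com/status/1</link>
    </item>
    <item>
      <title>Second tweet</title>
      <link>https://example.com/status/2</link>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in parse_items(SAMPLE_FEED):
        print(f"{title}: {link}")
```

To read a live feed, the XML text could be fetched first, for example with `urllib.request.urlopen("http://localhost:8000/rss/<slug>/").read()`, and passed to `parse_items`.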

Endpoints

| Endpoint | Description |
|---|---|
| /admin/ | Django admin interface |
| /rss/<slug>/ | RSS feed for a list |
| /export/<username>.yaml | Export user's tweets as YAML |
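A short sketch of building these URLs from a script and saving a YAML export to disk. The helper names are hypothetical, and the base URL assumes the default `http://localhost:8000` address from the Quick Start.

```python
import urllib.request
from urllib.parse import quote


def rss_url(base: str, slug: str) -> str:
    """URL of the RSS feed for a list slug."""
    return f"{base.rstrip('/')}/rss/{quote(slug, safe='')}/"


def export_url(base: str, username: str) -> str:
    """URL of the YAML export for a username."""
    return f"{base.rstrip('/')}/export/{quote(username, safe='')}.yaml"


def download_export(base: str, username: str, path: str) -> None:
    """Fetch a user's YAML export and write it to `path` (needs a running server)."""
    with urllib.request.urlopen(export_url(base, username), timeout=30) as resp:
        data = resp.read()
    with open(path, "wb") as fh:
        fh.write(data)


if __name__ == "__main__":
    print(rss_url("http://localhost:8000", "ai"))
    print(export_url("http://localhost:8000", "karpathy"))
```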

Manual Scraping

From the admin panel:

  • Click "Scrape" button next to any user
  • Select multiple users and use "Scrape selected users" action
  • Click "Test Nitter" to verify your Nitter instance is working

From command line:

# Inside container
python manage.py scrape_tweets
python manage.py scrape_tweets --user karpathy

Configuration

| Variable | Default | Description |
|---|---|---|
| SECRET_KEY | dev key | Django secret key (change in production!) |
| DEBUG | False | Debug mode |
| ALLOWED_HOSTS | * | Comma-separated allowed hosts |
| ADMIN_USER | admin | Admin username |
| ADMIN_PASSWORD | admin | Admin password |
| ADMIN_EMAIL | admin@example.com | Admin email |
| NITTER_INSTANCE | - | Required. Your Nitter instance URL |
| SCRAPE_DELAY | 3.0 | Seconds between scrolls (rate limiting) |
| SCRAPE_INTERVAL_HOURS | 6 | Hours between automatic scrapes |
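For reference, here is one way such variables might be read and validated in Python. It is a sketch, not the project's `settings.py`: the `ScrapeConfig` class, `load_config` function, and the accepted spellings of a true `DEBUG` value are assumptions. Only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ScrapeConfig:
    nitter_instance: str
    scrape_delay: float
    scrape_interval_hours: int
    allowed_hosts: list[str]
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> ScrapeConfig:
    """Read settings from environment-style mappings, applying documented defaults."""
    instance = env.get("NITTER_INSTANCE", "").strip()
    if not instance:
        # NITTER_INSTANCE has no default, so fail early with a clear message.
        raise ValueError("NITTER_INSTANCE is required")
    hosts = [h.strip() for h in env.get("ALLOWED_HOSTS", "*").split(",") if h.strip()]
    return ScrapeConfig(
        nitter_instance=instance.rstrip("/"),
        scrape_delay=float(env.get("SCRAPE_DELAY", "3.0")),
        scrape_interval_hours=int(env.get("SCRAPE_INTERVAL_HOURS", "6")),
        allowed_hosts=hosts,
        debug=env.get("DEBUG", "False").lower() in ("1", "true", "yes"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.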

Rate Limiting

To avoid overloading your Nitter instance:

  • SCRAPE_DELAY: Wait time between page scrolls (default: 3 seconds)
  • Max tweets: Scraping stops after 500 tweets per user per session
  • User delay: 30 seconds wait between scraping different users
  • Scraping resumes from max(start_date, last_scrape - 1 day) to avoid re-fetching old tweets
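The resume rule in the last bullet can be written out directly. This is a sketch with a hypothetical function name, and treating a user who has never been scraped as "start from `start_date`" is an assumption.

```python
from datetime import date, timedelta
from typing import Optional


def resume_from(start_date: date, last_scrape: Optional[date]) -> date:
    """Return max(start_date, last_scrape - 1 day), or start_date on first run."""
    if last_scrape is None:  # assumed behavior for a never-scraped user
        return start_date
    return max(start_date, last_scrape - timedelta(days=1))


if __name__ == "__main__":
    print(resume_from(date(2024, 1, 1), date(2024, 6, 10)))
```

The one-day overlap re-checks the day before the last scrape, which guards against tweets that were missed near the boundary of the previous run, while `start_date` stays a hard lower bound.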

Standalone Scraper

The scripts/ folder contains a standalone scraper that can be used without Django:

cd scripts
pip install -r requirements.txt

# Basic usage
python scrape.py <username> <since> <until>

# Examples
python scrape.py karpathy 2024-01-01 2024-12-31
python scrape.py karpathy 2024-01-01 2024-12-31 --visible  # Show browser
python scrape.py karpathy 2024-01-01 2024-12-31 -o tweets.yaml
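The command-line shape shown above (three positional arguments plus `--visible` and `-o`) can be modeled with `argparse`. This is an illustrative reconstruction, not the contents of `scrape.py`; parsing the dates with `date.fromisoformat` and the help texts are assumptions.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring: scrape.py <username> <since> <until> [--visible] [-o FILE]."""
    parser = argparse.ArgumentParser(prog="scrape.py")
    parser.add_argument("username", help="Twitter username, without @")
    parser.add_argument("since", type=date.fromisoformat, help="start date, YYYY-MM-DD")
    parser.add_argument("until", type=date.fromisoformat, help="end date, YYYY-MM-DD")
    parser.add_argument("--visible", action="store_true", help="show the browser window")
    parser.add_argument("-o", dest="output", default=None, help="output file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```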

Troubleshooting

Empty results / No tweets found

  1. Test your Nitter instance using the "Test Nitter" button in admin
  2. Ensure your Nitter instance is properly configured and can access Twitter
  3. Public Nitter instances are often rate-limited - use a self-hosted instance
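Outside the admin panel, a basic reachability check can be scripted. The sketch below is not what the "Test Nitter" button does internally; it only confirms that the instance answers HTTP requests, which says nothing about whether Nitter itself can reach Twitter. The function name and the injectable `opener` parameter are hypothetical.

```python
import urllib.request
from typing import Callable


def nitter_reachable(instance: str,
                     opener: Callable = urllib.request.urlopen,
                     timeout: float = 10.0) -> bool:
    """Return True if the instance responds with HTTP 200, False on network errors."""
    try:
        with opener(instance, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts all derive from OSError
        return False


if __name__ == "__main__":
    print(nitter_reachable("http://your-nitter-instance:8080"))
```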

Browser connection errors

The scraper runs headless Chrome. If you see sandbox errors:

  • The container uses no_sandbox=True, which is required when running as root
  • Ensure Chromium is installed (included in Docker image)

Database issues

If you get migration errors after updates:

# Reset database (will lose data)
rm tweets_rss/db.sqlite3
docker compose down && docker compose up -d

License

MIT
