Note: Scweet is not affiliated with Twitter/X. Use responsibly and lawfully.
Scweet is:
- A hosted Apify Actor (recommended for production runs and easy scaling).
- A Python library (recommended when you want to embed scraping in your own codebase).
Tweet search scraping in v4 is API-only (Twitter/X web GraphQL). Scweet keeps local state in SQLite (accounts, leases, resume checkpoints).
Full documentation: DOCUMENTATION.md
If you want the fastest path to results (and the best option for production workflows), use the hosted Apify Actor:
- Actor page: https://apify.com/altimis/scweet?fpr=a40q9&fp_sid=jeb97
- Apify offers a free plan/trial (plan limits can change); see the Actor page for current details.
- Open the Actor page and click "Run".
- Fill the comprehensive input form (recommended), or switch to JSON input and paste an input JSON (example below).
- Run, then download results from the dataset.
Minimal input example:
```json
{
"source_mode": "auto",
"search_query": "openai",
"profile_urls": [],
"max_items": 1000,
"since": "2026-02-01",
"until": "2026-02-07",
"search_sort": "Latest",
"lang": "en"
}
```

For more details, see the Apify Python client quickstart: https://apify.com/altimis/scweet/api/python
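If you prefer calling the Actor programmatically, here is a minimal sketch using the official `apify-client` package; the token is a placeholder and the input mirrors the minimal example above:

```python
# pip install apify-client
from apify_client import ApifyClient

# Placeholder token: create one in the Apify Console (https://console.apify.com)
client = ApifyClient("<YOUR_APIFY_API_TOKEN>")

# Same shape as the minimal input example above
run_input = {
    "source_mode": "auto",
    "search_query": "openai",
    "max_items": 1000,
    "since": "2026-02-01",
    "until": "2026-02-07",
    "search_sort": "Latest",
    "lang": "en",
}

# Start the Actor and wait for the run to finish
run = client.actor("altimis/scweet").call(run_input=run_input)

# Stream the scraped tweets from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```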
| Use case | Recommended |
|---|---|
| You want hosted runs, scheduling, scalable execution, and datasets | Apify Actor |
| You want to embed scraping inside your own Python pipeline/app | Python library |
| You don’t want to manage local state DB / provisioning details | Apify Actor |
| You need full control over provisioning/account leasing | Python library |
- This is scraping (Twitter/X web GraphQL), not an official API.
- You need accounts/cookies. Rate limits and anti-bot controls vary.
- For production workflows, the Apify Actor is usually the simplest and most reliable option.
```bash
pip install Scweet
```

```python
from Scweet import Scweet, ScweetConfig, ScweetDB
```

Legacy import path (supported in v4.x, deprecated):

```python
from Scweet.scweet import Scweet
```

```python
from Scweet import Scweet

scweet = Scweet.from_sources(
db_path="scweet_state.db",
# Provide accounts via one (or more) sources:
cookies_file="cookies.json", # also supports Netscape cookies.txt
# accounts_file="accounts.txt",
# env_path=".env",
# cookies={"auth_token": "...", "ct0": "..."},
# cookies="YOUR_AUTH_TOKEN", # convenience auth_token string (ct0 can be bootstrapped if allowed)
output_format="both", # csv|json|both|none
)
tweets = scweet.scrape(
since="2026-02-01",
until="2026-02-07",
words=["openai"],
limit=200,
resume=True,
save_dir="outputs",
custom_csv_name="openai.csv",
)
print("tweets:", len(tweets))Examples you can run from source checkout:
- Sync: `examples/sync_example.py`
- Async: `examples/async_example.py`
Example input templates (placeholders):
- `examples/.env`
- `examples/accounts.txt`
- `examples/cookies.json`
If you want one place to control everything, build a config and pass it to `Scweet(config=...)`. Most advanced knobs are documented in DOCUMENTATION.md.
```python
from Scweet import Scweet, ScweetConfig

cfg = ScweetConfig.from_sources(
db_path="scweet_state.db",
cookies_file="cookies.json", # optional provisioning source
accounts_file="accounts.txt", # optional provisioning source
# cookies={"auth_token": "...", "ct0": "..."}, # optional provisioning source
# cookies="YOUR_AUTH_TOKEN", # optional provisioning source
bootstrap_strategy="auto", # auto|token_only|nodriver_only|none
provision_on_init=True, # import sources during Scweet init
output_format="both", # csv|json|both|none
resume_mode="hybrid_safe", # legacy_csv|db_cursor|hybrid_safe
strict=False, # raise instead of returning empty output for some failures
proxy=None, # used for API calls and nodriver bootstrap
overrides={
"pool": {"concurrency": 4},
"operations": {
"account_lease_ttl_s": 300,
"account_requests_per_min": 30,
"account_min_delay_s": 2,
"account_daily_requests_limit": 30,
"account_daily_tweets_limit": 600,
},
"output": {"dedupe_on_resume_by_tweet_id": True},
},
)
scweet = Scweet(config=cfg)
```

Key knobs most users care about:
- `pool.concurrency`
- `operations.account_requests_per_min`
- `operations.account_min_delay_s`
- `operations.account_daily_requests_limit` and `operations.account_daily_tweets_limit`
- `output.format` and `output.dedupe_on_resume_by_tweet_id`
- `resume.mode` (`legacy_csv`, `db_cursor`, `hybrid_safe`)
Scweet stores accounts in SQLite. Provisioning imports account sources into the DB and marks which accounts are eligible.
Supported sources:
- `accounts.txt`
- `cookies.json` (and Netscape `cookies.txt`)
- `cookies=` payload (cookie dict/list/header string/auth_token string/file path/JSON string)
- `.env` via `env_path`
You can provision on init (recommended) or manually:
```python
from Scweet import Scweet

scweet = Scweet.from_sources(db_path="scweet_state.db", provision_on_init=False)
result = scweet.provision_accounts(
accounts_file="accounts.txt",
cookies_file="cookies.json",
# env_path=".env",
# cookies={"auth_token": "...", "ct0": "..."},
# cookies="YOUR_AUTH_TOKEN",
)
print(result) # {"processed": ..., "eligible": ...}Account/proxy file formats live in DOCUMENTATION.md (kept out of README on purpose).
One account per line (colon-separated):
```
username:password:email:email_password:2fa:auth_token
```
Missing trailing fields are allowed.
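For illustration only (every credential below is a dummy), a file mixing full and partial lines could look like this:

```
alice:alicepass:alice@example.com:mailpass:JBSWY3DPEHPK3PXP:AUTH_TOKEN_1
bob:bobpass:bob@example.com
carol:::::AUTH_TOKEN_ONLY
```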
Example `cookies.json`:

```json
[
{
"username": "acct1",
"cookies": { "auth_token": "...", "ct0": "..." }
}
]
```

By default, `proxy=` (the runtime proxy) applies to all accounts. If you need a different proxy per account, set a proxy on the account record:
- In `cookies.json`: add `"proxy"` to the account object (string URL or dict).
- In `accounts.txt`: append a tab, then a proxy value (string URL or JSON dict).
- Or set it later via `ScweetDB.set_account_proxy(username, proxy)`.
Proxy credentials are supported:
- The HTTP API accepts either a URL string like `"http://user:pass@host:port"` or a dict like `{"host": "...", "port": 8080, "username": "...", "password": "..."}`.
- The nodriver bootstrap also supports authenticated proxies; the dict form is recommended for proxy auth (see the sketch below).
Example `cookies.json` record with proxy:

```json
[
{
"username": "acct1",
"cookies": { "auth_token": "...", "ct0": "..." },
"proxy": { "host": "127.0.0.1", "port": 8080, "username": "proxyuser", "password": "proxypass" }
}
]
```

Example `accounts.txt` line with a proxy (tab-separated):
```
alice:::::AUTH_TOKEN_HERE	{"host":"127.0.0.1","port":8080}
```
Key scrape inputs (combined in the sketch after this list):
- `since`, `until` (YYYY-MM-DD)
- `words` (list, or legacy string split by `//`)
- `from_account`, `to_account`, `mention_account`, `hashtag`, `lang`
- `display_type` ("Top" or "Latest")
- `limit` (best-effort per run)
- `resume=True` appends to outputs and continues using the configured resume mode
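Putting a few of these together (assumes the `scweet` instance from the quickstart; the query values are illustrative):

```python
tweets = scweet.scrape(
    since="2026-02-01",
    until="2026-02-07",
    words=["openai"],      # search terms
    lang="en",
    display_type="Latest",
    limit=500,             # best-effort per run
    resume=True,           # continue via the configured resume mode
)
```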
The return value is always a `list[dict]` of raw GraphQL tweet objects.
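If you need quick access to common fields, a defensive sketch like the one below can help. Note that the `legacy` nesting is an assumption based on typical Twitter/X GraphQL payloads, not a schema Scweet guarantees; verify against your own data:

```python
# The raw GraphQL layout can change; treat every key access as optional.
for tweet in tweets:
    legacy = tweet.get("legacy", {})  # assumed nesting; check your payloads
    print(legacy.get("id_str"), (legacy.get("full_text") or "")[:80])
```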
Files are controlled by `output_format`:
- `csv`: curated "important fields" schema
- `json`: raw tweets (full fidelity)
- `both`: write both
- `none`: return only (no files written)

- `resume=True` appends to existing CSV and JSON outputs.
- To avoid writing duplicates across runs when resuming, set `overrides={"output": {"dedupe_on_resume_by_tweet_id": True}}`.
Use ScweetDB to inspect and maintain the local state DB:
```python
from Scweet import ScweetDB

db = ScweetDB("scweet_state.db")
print(db.accounts_summary())
print(db.list_accounts(limit=10, eligible_only=True))
print(db.set_account_proxy("acct1", {"host": "127.0.0.1", "port": 8080}))
print(db.reset_daily_counters())
print(db.clear_leases(expired_only=True))
print(db.reset_account_cooldowns(clear_leases=True, include_unusable=True))
print(db.collapse_duplicates_by_auth_token(dry_run=True))
```

Roadmap:
- Profile info scraping
- Followers/following scraping
- Profile timeline scraping
- Richer search query inputs
See DOCUMENTATION.md for the full guide (cookies formats, logging setup, strict mode, manifest updates, advanced config knobs).
If you want hosted runs and easy scaling, use the Apify Actor: https://apify.com/altimis/scweet?fpr=a40q9&fp_sid=jeb97
We welcome PRs, bug reports, and feature suggestions!
If you find Scweet useful, consider starring the repo ⭐
MIT License • © 2020–2026 Altimis