Uninews is a universal news smart scraper written in Rust.
It downloads a news article from a given URL, cleans the HTML content, and leverages CloudLLM to convert the content into Markdown format with minimal loss.
The LLM provider is pluggable via UNINEWS_LLM_CLIENT and UNINEWS_LLM_MODEL environment variables — see the LLM Providers section. Out of the box Uninews talks to OpenAI, but you can route Markdown conversion through OpenRouter, xAI Grok, Google Gemini, or Anthropic Claude without changing your code.
Uninews can also translate articles into other languages while preserving formatting, which makes it useful for multilingual content processing.
The final output (via API) is a JSON object containing the article's title, the Markdown-formatted content (translated if specified), and a featured image URL.
It can be used both as a library and as a command-line tool on Linux, macOS, and Windows.
When used as a command-line tool, it outputs the final Markdown with the contents of the news article or blog post in the requested language.
uninews --help
A universal news scraper for extracting content from various news blogs and news sites.
Usage: uninews [OPTIONS] <URL>
Arguments:
<URL> The URL of the news article to scrape
Options:
-l, --language <LANGUAGE> Optional output language (default: english) [default: english]
-j, --json Output the result as JSON instead of human-readable text
-h, --help Print help
-V, --version Print version
- Scraping & Cleaning: Extracts the main content of a news article by targeting the <article> tag (or falling back to <body>) and removing unwanted elements.
- Markdown Conversion: Uses the CloudLLM Rust API to convert the cleaned HTML content into near-lossless Markdown. The LLM provider is pluggable via env vars (see LLM Providers).
- X.com / Twitter Support: Reads individual tweets and full X threads via the X API v2, assembling the thread chronologically before converting it to Markdown.
- Reusable Library: The universal_scrape function is exposed for easy integration into other Rust projects.
- Multilanguage Support: The universal_scrape function accepts a language parameter that sets the output language of the scraped article; the command-line tool defaults to English.
You need to have Rust and Cargo installed on your system.

If you do have Rust installed, follow these steps:
- Install Uninews:
cargo install uninews

If you don't have Rust installed, follow these steps to install Rust and build from source:
- Install Rust:
On Unix/macOS:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
- Verify Installation:
rustc --version
cargo --version
- Clone the Project:
git clone https://github.com/gubatron/uninews.git
cd uninews
- Build & Install the Project:
make build
make install
- Run it in the command line:

OpenAI (default):
export OPEN_AI_SECRET=sk-xxxxxxxxxxxxxxxxxxxxxxxxxx
uninews <some post url>

OpenRouter with any vendor/model slug (e.g. Qwen 3.7 Max):
export UNINEWS_LLM_CLIENT=openrouter
export UNINEWS_LLM_MODEL=qwen/qwen3.7-max
export OPENROUTER_API_KEY=sk-or-xxxxxxxxxxxxxxxxxxxxxxxxxx
uninews <some post url>

Or, on a single line without exporting:
OPEN_AI_SECRET=sk-xxx uninews [-l <some language name>] <some post url>

Uninews selects the LLM provider used to convert HTML to Markdown based on two environment variables, and lets you tune the LLM context window with a third:
| Variable | Default | Description |
|---|---|---|
| UNINEWS_LLM_CLIENT | openai | One of openai, openrouter, grok, gemini, claude. |
| UNINEWS_LLM_MODEL | per-client | Free-form model slug. If unset, each client falls back to the default listed in the table below (e.g. gpt-5.5 for openai, openai/gpt-5.5 for openrouter). For OpenRouter you usually want a vendor/model slug (e.g. qwen/qwen3.7-max). |
| UNINEWS_LLM_CONTEXT_WINDOW | 256000 | LLM context-window budget (in tokens) used by LLMSession while formatting the Markdown. Bump this when the model you point at via UNINEWS_LLM_MODEL supports a larger context (e.g. Gemini-class 1M+ models) or a longer article blows past the default. Library callers can also pass Some(n) to universal_scrape / convert_content_to_markdown to override per call; the explicit argument always wins. Invalid or non-positive values fall back to the default. |
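
The precedence rules for the context window can be sketched in a few lines of Rust. This is only an illustration of the behavior described in the table, not the crate's actual code: the names resolve_context_window and context_window_from_env are hypothetical, and it assumes an explicit argument is used as given without further validation.

```rust
/// Default budget, taken from the table above.
const DEFAULT_CONTEXT_WINDOW: usize = 256_000;

/// Hypothetical helper: pick the context window from an explicit per-call
/// argument and the raw value of UNINEWS_LLM_CONTEXT_WINDOW (if set).
fn resolve_context_window(explicit: Option<usize>, env_value: Option<&str>) -> usize {
    // An explicit per-call argument always wins.
    if let Some(n) = explicit {
        return n;
    }
    // Otherwise parse the env value; invalid or non-positive values
    // fall back to the default.
    env_value
        .and_then(|raw| raw.trim().parse::<i64>().ok())
        .filter(|&n| n > 0)
        .map(|n| n as usize)
        .unwrap_or(DEFAULT_CONTEXT_WINDOW)
}

/// Hypothetical wrapper that reads the real process environment.
fn context_window_from_env(explicit: Option<usize>) -> usize {
    let raw = std::env::var("UNINEWS_LLM_CONTEXT_WINDOW").ok();
    resolve_context_window(explicit, raw.as_deref())
}
```

Keeping the parsing in a function that takes the raw string, rather than reading the environment directly, makes the fallback rules easy to unit-test.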
Each provider reads its API key from a dedicated env var. Only the one matching
the active UNINEWS_LLM_CLIENT is consulted. When UNINEWS_LLM_MODEL is unset,
each client falls back to its built-in default (rightmost column), so you only
need to override UNINEWS_LLM_MODEL when you want a different model.
| UNINEWS_LLM_CLIENT | API key env var | Default model when UNINEWS_LLM_MODEL is unset |
|---|---|---|
| openai | OPEN_AI_SECRET | gpt-5.5 |
| openrouter | OPENROUTER_API_KEY | openai/gpt-5.5 (a vendor/model slug) |
| grok | XAI_API_KEY | grok-4.3 |
| gemini | GEMINI_API_KEY | gemini-3.5-flash |
| claude | CLAUDE_API_KEY | claude-opus-4.7-fast |
OpenAI (default):
export OPEN_AI_SECRET=sk-xxx
uninews https://example.com/article

OpenRouter with Qwen 3.7 Max and a 2M context budget:
export UNINEWS_LLM_CLIENT=openrouter
export UNINEWS_LLM_MODEL=qwen/qwen3.7-max
export UNINEWS_LLM_CONTEXT_WINDOW=2000000
export OPENROUTER_API_KEY=sk-or-xxx
uninews https://example.com/article

Anthropic Claude:
export UNINEWS_LLM_CLIENT=claude
export UNINEWS_LLM_MODEL=claude-sonnet-4-6
export CLAUDE_API_KEY=sk-ant-xxx
uninews https://example.com/article

If UNINEWS_LLM_CLIENT is set to an unsupported value, or the matching API
key env var is missing, Uninews returns a clear error in Post::error.
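
The provider selection rules above can be summarized as a small lookup. The sketch below is a hedged illustration, not the crate's implementation: ProviderConfig and resolve_provider are hypothetical names, the error strings are made up, and it assumes client names are matched case-sensitively. The environment lookup is passed in as a closure so the logic can be exercised without touching real env vars.

```rust
/// Hypothetical description of a resolved provider.
#[derive(Debug, PartialEq)]
struct ProviderConfig {
    client: String,
    api_key_var: &'static str,
    model: String,
}

/// Hypothetical resolver mirroring the provider table: choose the client
/// (default "openai"), its API key variable, and its model.
fn resolve_provider(
    client: Option<&str>,
    model: Option<&str>,
    lookup_env: impl Fn(&str) -> Option<String>,
) -> Result<ProviderConfig, String> {
    let client = client.unwrap_or("openai");
    // (API key env var, default model) per client, as listed in the table.
    let (api_key_var, default_model) = match client {
        "openai" => ("OPEN_AI_SECRET", "gpt-5.5"),
        "openrouter" => ("OPENROUTER_API_KEY", "openai/gpt-5.5"),
        "grok" => ("XAI_API_KEY", "grok-4.3"),
        "gemini" => ("GEMINI_API_KEY", "gemini-3.5-flash"),
        "claude" => ("CLAUDE_API_KEY", "claude-opus-4.7-fast"),
        other => return Err(format!("unsupported UNINEWS_LLM_CLIENT: {other}")),
    };
    // Only the key matching the active client is consulted.
    if lookup_env(api_key_var).is_none() {
        return Err(format!("missing API key: set {api_key_var}"));
    }
    Ok(ProviderConfig {
        client: client.to_string(),
        api_key_var,
        model: model.unwrap_or(default_model).to_string(),
    })
}
```

In a real program the closure would typically be `|name| std::env::var(name).ok()`, with the first two arguments read from UNINEWS_LLM_CLIENT and UNINEWS_LLM_MODEL.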
If you embed Uninews in another Rust app and want to surface the active
provider/model in a chat notification or log line, the active client is
exposed via the upstream cloudllm::LLMClientInfo trait (re-exported as
uninews::LLMClientInfo):
use uninews::{active_llm_client, active_provider_label, llm_context_window, LLMClientInfo};

fn main() {
    // Report the provider and model behind the active client, if one can be built.
    if let Ok(client) = active_llm_client() {
        println!(
            "uninews routed through {} ({})",
            client.llm_provider_name().unwrap_or("unknown"),
            client.llm_model_name().unwrap_or("unknown"),
        );
    }
    println!("uninews is budgeting {} tokens of context", llm_context_window());
    // Or, for a one-line label that's safe to drop into any chat message:
    println!("Extracting with uninews using {}...", active_provider_label());
    // → "Extracting with uninews using OpenRouter (qwen/qwen3.7-max)..."
}

active_provider_label() always reflects whatever UNINEWS_LLM_CLIENT /
UNINEWS_LLM_MODEL are set to at call time, so consumers can replace their
hardcoded "GPT-5.5" / "Claude" / "Qwen" strings with a single call.
To read tweets and X threads, set:
- X_API_KEY as your X App Consumer Key
- X_API_SECRET as your X App Consumer Secret
uninews will exchange them for an app-only bearer token automatically.
You can obtain both values from your X App dashboard under Keys and tokens.
export X_API_KEY=your_x_api_key
export X_API_SECRET=your_x_api_secret
uninews "https://x.com/user/status/1234567890"

| Variable | Required | Description |
|---|---|---|
| X_API_KEY | Yes | X App Consumer Key / API Key from the Keys and tokens page. |
| X_API_SECRET | Yes | X App Consumer Secret / API Secret from the same Keys and tokens page. |
| UNINEWS_CHROME_USER_DATA_DIR | No | Chrome user-data directory for the secondary X Article browser fallback, if X withholds the article body from its web GraphQL payload and guest HTML. |
| UNINEWS_CHROME_PROFILE_DIR | No | Chrome profile directory name such as Default or Profile 1, used with UNINEWS_CHROME_USER_DATA_DIR. |
| UNINEWS_CHROME_BINARY | No | Override the Chrome/Chromium executable used for the secondary X Article browser fallback. |

When a URL starts with https://x.com/ or https://twitter.com/, uninews will:
- Extract the tweet ID from the URL.
- Fetch the tweet (and its author info) via the X API v2.
- If the post is only sharing an external article link, follow the expanded article URL and scrape the linked article directly.
- If the post is only sharing an X Article link (x.com/i/article/...), fetch the article body from X's web GraphQL tweet payload.
- Only if X still withholds the article body there, fall back to the linked article URL / browser fallback path.
- Otherwise, attempt to retrieve the full thread from the same author using the recent-search endpoint (covers the last 7 days).
- Sort all thread tweets chronologically (oldest → newest).
- Pass the assembled content through the AI formatter, preserving the scraped article wording and structure as closely as possible.
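
Two of the steps above, extracting the tweet ID and ordering the thread, are simple enough to sketch with the standard library alone. This is an illustrative sketch rather than the crate's code: extract_tweet_id, ThreadTweet, and sort_thread are hypothetical names, and the sort assumes every created_at value is an ISO 8601 UTC timestamp in one consistent format, so that string order matches time order.

```rust
/// Hypothetical helper: return the numeric tweet ID if `url` is an
/// X/Twitter status URL, or None otherwise.
fn extract_tweet_id(url: &str) -> Option<u64> {
    let rest = url
        .strip_prefix("https://x.com/")
        .or_else(|| url.strip_prefix("https://twitter.com/"))?;
    // Drop any query string or fragment before splitting the path.
    let path = rest.split(|c: char| c == '?' || c == '#').next()?;
    let mut segments = path.split('/');
    // Look for ".../status/<id>".
    while let Some(segment) = segments.next() {
        if segment == "status" {
            return segments.next()?.parse().ok();
        }
    }
    None
}

/// Hypothetical minimal view of a tweet in a thread.
#[derive(Debug, Clone, PartialEq)]
struct ThreadTweet {
    id: u64,
    created_at: String, // assumed ISO 8601 UTC, e.g. "2024-01-01T12:00:00.000Z"
    text: String,
}

/// Sort a thread oldest to newest, breaking timestamp ties by ID.
fn sort_thread(tweets: &mut [ThreadTweet]) {
    tweets.sort_by(|a, b| a.created_at.cmp(&b.created_at).then(a.id.cmp(&b.id)));
}
```

For example, extract_tweet_id("https://x.com/user/status/1234567890") yields Some(1234567890), while an x.com/i/article/... link yields None and would take the article path instead.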
For x.com/i/article/... links, uninews now first asks X's web GraphQL endpoint for the article title and body text tied to the linking tweet. If X still hides the article body there, uninews will try a local Chrome headless fallback automatically. If X still serves the guest wall, point UNINEWS_CHROME_USER_DATA_DIR at a logged-in Chrome user-data directory and optionally set UNINEWS_CHROME_PROFILE_DIR.
When those variables are set, uninews clones the selected Chrome profile into a temporary directory before launching headless Chrome, so your normal Chrome session can stay open and the live profile lock is not touched.
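
Cloning a profile into a temporary directory amounts to a recursive directory copy. The sketch below shows one way to do that with the standard library; copy_dir_recursive is a hypothetical helper, and how uninews itself performs the clone (which files it skips, how it names the temporary directory) is not specified here.

```rust
use std::{fs, io, path::Path};

/// Hypothetical helper: copy `src` and everything under it into `dst`,
/// creating `dst` if needed. Symlinked files are copied by content.
fn copy_dir_recursive(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            // Recurse into subdirectories such as "Default" or "Profile 1".
            copy_dir_recursive(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}
```

Because the browser is then launched against the copy, the original profile directory is only ever read.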
Example on macOS:
export UNINEWS_CHROME_USER_DATA_DIR="$HOME/Library/Application Support/Google/Chrome"
export UNINEWS_CHROME_PROFILE_DIR="Default"
uninews "https://x.com/DiarioBitcoin/status/2034263054754726116"

If either X_API_KEY or X_API_SECRET is missing, a clear error message is returned instead of silently failing.
This is not OAuth 1.0a user-context authentication. uninews uses your Consumer Key and Consumer Secret to obtain an OAuth 2.0 app-only bearer token for read-only X API requests.
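
For readers curious what the app-only exchange involves: in X's documented OAuth 2.0 app-only flow, the client sends the key and secret as HTTP Basic credentials to the token endpoint with grant_type=client_credentials and receives a bearer token back. The sketch below only builds that Authorization header value using the standard library. It is an assumption-laden illustration, not uninews's code: base64_encode and basic_auth_header are hypothetical helpers, and the URL-encoding step the flow prescribes for the key and secret is skipped on the assumption that both are plain alphanumeric strings.

```rust
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal standard Base64 encoder (with '=' padding), written out here
/// only to keep the sketch dependency-free.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[((n >> 18) & 63) as usize] as char);
        out.push(B64[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { B64[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { B64[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Hypothetical helper: build the "Basic ..." header value from the
/// consumer key and secret (assumed to need no URL-encoding).
fn basic_auth_header(consumer_key: &str, consumer_secret: &str) -> String {
    let credentials = format!("{consumer_key}:{consumer_secret}");
    format!("Basic {}", base64_encode(credentials.as_bytes()))
}
```

The resulting bearer token is then sent as "Authorization: Bearer <token>" on read-only API requests; no user context is involved at any point.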
Command line usage
A universal news scraper for extracting content from various news blogs and news sites.
Usage: uninews [OPTIONS] <URL>
Arguments:
<URL> The URL of the news article to scrape
Options:
-l, --language <LANGUAGE> Optional output language (default: english) [default: english]
-j, --json Output the result as JSON instead of human-readable text
-h, --help Print help
-V, --version Print version
Integrating it with your Rust project
Uninews reads the LLM provider from UNINEWS_LLM_CLIENT and UNINEWS_LLM_MODEL, and the context-window budget from UNINEWS_LLM_CONTEXT_WINDOW. If you want to set them in code (via std::env::set_var) rather than in the shell, do it before calling universal_scrape. For example, to force OpenRouter with a Qwen model and a 2M-token context from inside your app:
use uninews::{universal_scrape, Post};

// Assumption: the app runs on a Tokio runtime; adapt this if you use another executor.
#[tokio::main]
async fn main() {
    let url = String::from("https://example.com/article");

    // Route Markdown conversion through OpenRouter.
    // Note: in the Rust 2024 edition std::env::set_var is unsafe and needs an unsafe block.
    std::env::set_var("UNINEWS_LLM_CLIENT", "openrouter");
    std::env::set_var("UNINEWS_LLM_MODEL", "qwen/qwen3.7-max");
    std::env::set_var("UNINEWS_LLM_CONTEXT_WINDOW", "2000000");
    std::env::set_var("OPENROUTER_API_KEY", "sk-or-...");

    // Or, for a single call, pass the context window explicitly:
    let post: Post = universal_scrape(&url, "english", Some(2_000_000)).await;
    if !post.error.is_empty() {
        eprintln!("Error during scraping: {}", post.error);
        return;
    }
    // Print the title and Markdown-formatted content.
    println!("{}\n\n{}", post.title, post.content);
}

If you only need OpenAI, just set OPEN_AI_SECRET once (e.g. before starting
your process) and call universal_scrape(&url, "english", None); Uninews will
pick it up and use the default 256K context window.
Licensed under the MIT License.
Copyright (c) 2026 Ángel León