Diana converts documents, webpages, and news into high-quality MP3 audio using local or cloud AI text-to-speech models. All core processing runs on your machine; cloud features are optional.
Quick Start
- Python 3.10+ (3.10, 3.11, 3.12, 3.13 all supported)
- ffmpeg (required for MP3 encoding)
macOS:
brew install ffmpeg
Windows:
choco install ffmpeg
Or download from https://ffmpeg.org/download.html and add to your PATH.
- Clone the repository:
git clone https://github.com/tfm000/diana.git
cd diana
- Run the setup script (creates venv, installs deps, downloads all models, copies config):
macOS / Linux:
./setup.sh
Windows:
setup.bat
This creates a Diana.command (macOS) or Diana.bat (Windows) launcher you can double-click to start Diana.
Manual installation (if you prefer not to use the setup script)
- Create a virtual environment:
python -m venv .venv
- Activate the virtual environment:
macOS / Linux:
source .venv/bin/activate
Windows:
.venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Download the Kokoro TTS model files (~340 MB total):
# macOS / Linux (create the target directory first if it does not exist)
mkdir -p data/models
curl -L -o data/models/kokoro-v1.0.onnx https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
curl -L -o data/models/voices-v1.0.bin https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
Windows (PowerShell):
mkdir data\models -Force
Invoke-WebRequest -Uri "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx" -OutFile "data\models\kokoro-v1.0.onnx"
Invoke-WebRequest -Uri "https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin" -OutFile "data\models\voices-v1.0.bin"
- Download the Piper TTS model files (~60 MB):
curl -L -o data/models/en_US-lessac-medium.onnx https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
curl -L -o data/models/en_US-lessac-medium.onnx.json https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
- Copy the example config:
cp config.example.yaml config.yaml
Start Diana:
python run.py
Open http://localhost:8501 in your browser.
The landing page displays Diana's artwork and quick links to all pages.
Navigate to the Upload page to convert a document to audio.
- Choose a file: drag-and-drop or browse for a PDF, EPUB, or TXT file (up to the configured max upload size, default 1024 MB).
- Select pages/chapters: for PDFs and EPUBs, Diana shows the total page or chapter count and lets you specify which to convert. Enter ranges and individual numbers separated by commas (e.g. 1-3, 5, 10-15). Leave empty to convert the entire document.
- Configure TTS: pick an engine, voice, and speed. These default to whatever is set in Settings but can be overridden per job.
- Preview voice: click Preview Voice to hear a short sample with your selected engine, voice, and speed before committing.
- Convert: click Convert to Audio to queue the job.
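The page/chapter selection syntax above can be illustrated with a small parser. This is only a sketch of how such a spec might be expanded; the function name `parse_page_ranges` and its validation rules are assumptions for illustration, not Diana's actual parser in diana/models.py.

```python
def parse_page_ranges(spec: str, total: int) -> list[int]:
    """Expand a spec such as "1-3, 5, 10-15" into sorted 1-based page numbers."""
    spec = spec.strip()
    if not spec:
        # An empty field means "convert the entire document".
        return list(range(1, total + 1))

    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > total or start > end:
            raise ValueError(f"Invalid range {part!r} for a document with {total} pages")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_ranges("1-3, 5, 10-15", total=20))
```

Overlapping ranges are deduplicated by the set, and out-of-bounds input raises `ValueError`, which a UI could surface as inline feedback.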
The Library page lists all conversion jobs.
- Pending / In-progress jobs show their current stage and a progress bar during synthesis.
- Completed jobs have an inline audio player, Download, Rename, Move, and Delete buttons.
- Failed jobs display an expandable error details section.
- Search jobs by filename, filter by status, and sort by date, name, or status.
- Folders — organise jobs into folders via the "Manage Folders" panel and per-job "Move" button.
- Pagination — browse large libraries 20 jobs at a time.
The page auto-refreshes while jobs are processing.
The News page is a newspaper aggregator powered by RSS and AI summarisation.
- Add sources — enter a name and homepage URL. Use Edit to attach one or more RSS feed URLs and assign the source to one or more groups.
- Organise sources — sort by name or group; filter by group. Assign multiple groups per source (e.g. Finance, US News).
- Get Latest Stories — requires an LLM configured in Settings. Diana scrapes each source (RSS first, then homepage, then archive.ph as a last resort), passes all articles to the LLM in a single batched call, and returns deduplicated top stories per category.
- Stories older than 3 days are automatically discarded.
- Categories: Finance, Politics, Technology & Science, Sports & Entertainment, World, Health, Other.
- Max stories per category is configurable in Settings.
- Browse stories — categories are collapsible. Each story shows source, importance score, summary, and Visit / Archive links (via archive.ph).
- Convert to Audio — select all stories, a category, or individual stories, then convert to an MP3 job.
- Persistent — the last fetched batch is stored in the database and reloaded automatically when you return to the page.
- Export sources: download all your sources (names, URLs, RSS feeds, groups) as a diana_sources.json file to share with others or back up your configuration.
- Import sources: upload a diana_sources.json file to bulk-add sources. If a source already exists (matched by homepage URL), choose to append the imported feeds/groups alongside existing ones or overwrite them entirely.
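The append-versus-overwrite choice on import can be sketched as a merge keyed on homepage URL. The record layout used here (`name`, `url`, `feeds`, `groups`) and the function `merge_sources` are hypothetical; the real diana_sources.json schema may differ.

```python
def merge_sources(existing: list[dict], imported: list[dict], mode: str = "append") -> list[dict]:
    """Merge imported sources into existing ones, matching on homepage URL."""
    if mode not in ("append", "overwrite"):
        raise ValueError("mode must be 'append' or 'overwrite'")

    by_url = {src["url"]: dict(src) for src in existing}
    for src in imported:
        current = by_url.get(src["url"])
        if current is None or mode == "overwrite":
            # New source, or the user chose to replace the stored entry entirely.
            by_url[src["url"]] = dict(src)
            continue
        # Append mode: keep existing feeds/groups and add any new ones, without duplicates.
        for key in ("feeds", "groups"):
            merged = list(current.get(key, []))
            for item in src.get(key, []):
                if item not in merged:
                    merged.append(item)
            current[key] = merged
    return list(by_url.values())


existing = [{"name": "FT", "url": "https://www.ft.com",
             "feeds": ["https://www.ft.com/rss/home"], "groups": ["Finance"]}]
imported = [{"name": "FT", "url": "https://www.ft.com",
             "feeds": ["https://www.ft.com/rss/home"], "groups": ["World"]}]
print(merge_sources(existing, imported, mode="append"))
```

In append mode the example keeps a single feed URL and ends up with both groups; in overwrite mode the imported entry would replace the stored one.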
The Web URL to Audio page converts any webpage into an audiobook in one step.
- Paste a URL.
- Optionally enable LLM cleaning (requires LLM configured in Settings).
- Pick engine, voice, and speed.
- Click Convert — Diana scrapes the page, cleans the text, and queues a TTS job.
The Settings page lets you configure defaults that apply to new jobs and the dashboard itself. All changes are saved to config.yaml.
| Section | Options |
|---|---|
| TTS Engine | Default engine, voice (dropdown), and speed |
| Processing | Max chunk size, output bitrate, silence gap between chunks |
| Dashboard | Theme (dark / light / device auto-detect), max upload size (MB) |
| Model Paths | Kokoro model + voices files, Piper model file |
| LLM Text Cleaning | Provider (OpenAI / Anthropic / Anthropic CLI / Google), API key, model override, translation target language |
| OpenAI TTS | API key, model (tts-1 / tts-1-hd) |
| ElevenLabs TTS | API key, model ID |
| News | Max stories per category returned by the AI |
Note: Theme and max upload size changes require an app restart to take effect.
Click the Terminate button in the sidebar to stop the server.
| Engine | Type | License | Quality |
|---|---|---|---|
| Kokoro (default) | Local | Apache 2.0 | High — natural-sounding |
| Piper | Local | GPL-3.0 | Good — fast, lightweight |
| OpenAI TTS | Cloud | Proprietary | Very high — requires API key |
| ElevenLabs | Cloud | Proprietary | Very high — requires API key |
Cloud engines require API keys configured in Settings. They return MP3 audio directly and do not require ffmpeg for encoding.
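For local engines, per-chunk audio has to be joined and encoded to MP3 with ffmpeg. The sketch below shows one way such a step could be assembled using ffmpeg's concat demuxer and the libmp3lame encoder; the helper names, the list file name, and the default bitrate are assumptions, and Diana's own merger may work differently (for example, it also inserts a silence gap between chunks).

```python
import shutil
from pathlib import Path


def concat_list(wav_paths: list[Path]) -> str:
    """Build the text of an ffmpeg concat-demuxer list file (one "file" line per chunk)."""
    # Paths containing single quotes would need escaping; omitted in this sketch.
    return "".join(f"file '{path.as_posix()}'\n" for path in wav_paths)


def mp3_command(list_file: Path, output: Path, bitrate: str = "192k") -> list[str]:
    """Return the ffmpeg argv that joins the listed WAV files and encodes them as MP3."""
    return [
        "ffmpeg", "-y",              # overwrite the output file if it exists
        "-f", "concat", "-safe", "0",
        "-i", list_file.as_posix(),
        "-c:a", "libmp3lame",
        "-b:a", bitrate,
        output.as_posix(),
    ]


chunks = [Path("chunk_000.wav"), Path("chunk_001.wav")]
print(concat_list(chunks), end="")
print(" ".join(mp3_command(Path("chunks.txt"), Path("book.mp3"))))
if shutil.which("ffmpeg") is None:
    print("ffmpeg is not on PATH; install it before running the command")
```

The command is only printed here. To run it, write the list text to the list file and pass the argv to `subprocess.run(..., check=True)`.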
Diana can use a large language model for two purposes:
- Text cleaning: before TTS, send each text chunk to the LLM to strip tables, charts, citations, footnotes, and boilerplate. This significantly improves narration quality for academic papers and web articles. It also supports translating text to a target language.
- News summarisation: used by the News page to summarise and deduplicate stories across all sources in a single batched API call.
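Because text cleaning works chunk by chunk, the text first has to be split into pieces that fit a size limit. The following is a minimal sketch of sentence-boundary chunking, assuming the limit is measured in characters; `chunk_text` is a hypothetical helper, not the code in diana/processing/chunker.py.

```python
import re


def chunk_text(text: str, max_chars: int = 8000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


for chunk in chunk_text("One. Two. Three.", max_chars=9):
    print(repr(chunk))
```

Each chunk could then be sent to the LLM independently, which keeps individual requests small at the cost of the model not seeing context across chunk boundaries.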
Supported providers:
| Provider | Default model | API key env var |
|---|---|---|
| OpenAI | gpt-4o-mini | OPENAI_API_KEY |
| Anthropic | claude-haiku-4-5-20251001 | ANTHROPIC_API_KEY |
| Anthropic CLI | claude-sonnet-4-5 | None (uses Claude Code CLI login) |
| Google | gemini-2.0-flash | GOOGLE_API_KEY |
For security, store API keys as environment variables and reference them in config.yaml as ${OPENAI_API_KEY} rather than entering the raw key.
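To make the `${VAR}` convention concrete, here is a sketch of how placeholders in a loaded config could be resolved from the environment. The function `expand_env` and its behaviour for unset variables (substituting an empty string) are assumptions; Diana's config loader may handle this differently.

```python
import os
import re

# Matches ${NAME} where NAME is a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Recursively replace ${NAME} placeholders in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    if isinstance(value, str):
        # Unset variables become an empty string in this sketch.
        return _PLACEHOLDER.sub(lambda match: env.get(match.group(1), ""), value)
    return value


config = {"llm": {"provider": "openai", "api_key": "${OPENAI_API_KEY}", "chunk_size": 8000}}
resolved = expand_env(config, env={"OPENAI_API_KEY": "sk-example"})
print(resolved["llm"]["api_key"])
```

Resolving at load time means the raw key never has to be written to config.yaml, only the placeholder.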
Anthropic CLI lets users with a Claude Pro/Max subscription drive Diana's LLM features without a separate API key. It uses the claude-agent-sdk under the hood and authenticates via your existing Claude Code CLI session. Prerequisites: install Node.js, then npm install -g @anthropic-ai/claude-code and run claude login once. Usage counts against your Pro/Max quota. Note: each call spawns the Claude Code CLI as a subprocess, so keep max_concurrent_calls low (1–2) for this provider.
- Create a new file in diana/tts/ (e.g., my_engine.py)
- Implement the TTSEngine protocol from diana/tts/base.py:
  - name property
  - initialize(): load models
  - synthesize(text, voice, speed): async, returns audio bytes
  - list_voices(): return available voices
  - shutdown(): release resources
  - VOICES class attribute: list of TTSVoice objects
- Register it in diana/tts/registry.py
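As an illustration of the steps above, the sketch below implements a toy engine that returns silent WAV audio. The `TTSVoice` and `TTSEngine` definitions here are hypothetical stand-ins so the example runs on its own; in a real engine you would import them from diana/tts/base.py, and their actual fields and signatures may differ.

```python
import asyncio
import io
import wave
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TTSVoice:  # hypothetical stand-in for the class in diana/tts/base.py
    id: str
    name: str


class TTSEngine(Protocol):  # hypothetical stand-in; the real protocol may differ
    VOICES: list[TTSVoice]

    @property
    def name(self) -> str: ...
    def initialize(self) -> None: ...
    async def synthesize(self, text: str, voice: str, speed: float) -> bytes: ...
    def list_voices(self) -> list[TTSVoice]: ...
    def shutdown(self) -> None: ...


class SilenceEngine:
    """Toy engine: produces silent mono 16-bit WAV audio sized to the input text."""

    VOICES = [TTSVoice(id="silent", name="Silent")]
    SAMPLE_RATE = 24000

    @property
    def name(self) -> str:
        return "silence"

    def initialize(self) -> None:
        self._ready = True  # a real engine would load its model files here

    async def synthesize(self, text: str, voice: str, speed: float = 1.0) -> bytes:
        seconds = max(0.1, len(text) * 0.05 / speed)  # faster speed, shorter audio
        frames = int(self.SAMPLE_RATE * seconds)
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.SAMPLE_RATE)
            wav.writeframes(b"\x00\x00" * frames)
        return buffer.getvalue()

    def list_voices(self) -> list[TTSVoice]:
        return list(self.VOICES)

    def shutdown(self) -> None:
        self._ready = False  # a real engine would release model resources here


engine: TTSEngine = SilenceEngine()
engine.initialize()
audio = asyncio.run(engine.synthesize("Hello from Diana", voice="silent", speed=1.0))
print(f"{engine.name}: {len(audio)} bytes of WAV audio")
engine.shutdown()
```

Registration is not shown because the registry's interface is not described here; check diana/tts/registry.py for how the existing engines are listed.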
Diana uses a config.yaml file in the project root. Copy the example to get started:
cp config.example.yaml config.yaml
All settings can also be changed from the Settings page in the dashboard. Key configuration sections:
tts:
  engine: kokoro # kokoro | piper | openai_tts | elevenlabs
  voice: af_heart
  speed: 1.0
openai_tts:
  api_key: "${OPENAI_API_KEY}"
  model: tts-1
elevenlabs:
  api_key: "${ELEVENLABS_API_KEY}"
  model: eleven_monolingual_v1
llm:
  enabled: false
  provider: openai # openai | anthropic | anthropic-cli | google
  api_key: "${OPENAI_API_KEY}"
  model: "" # leave blank for provider default
  target_language: "" # e.g. "English" to translate; blank = no translation
  chunk_size: 8000
news:
  max_stories_per_category: 5
The Streamlit-specific settings (theme, upload size, toolbar) are synced to .streamlit/config.toml automatically.
diana/
├── run.py                  # Launch the dashboard (syncs config on start)
├── setup.sh                # One-command setup (macOS / Linux)
├── setup.bat               # One-command setup (Windows)
├── Diana.command           # Double-click launcher (macOS, created by setup)
├── Diana.bat               # Double-click launcher (Windows, created by setup)
├── config.yaml             # Your configuration
├── config.example.yaml     # Example config with defaults
├── .streamlit/
│   └── config.toml         # Auto-generated Streamlit settings
├── diana/
│   ├── config.py           # Config loading and saving
│   ├── models.py           # Job data model + page range parser
│   ├── database.py         # SQLite job tracking and news storage
│   ├── parsers/            # PDF, EPUB, TXT text extraction
│   ├── tts/                # Swappable TTS engine layer
│   │   ├── base.py         # TTSEngine protocol + TTSVoice
│   │   ├── kokoro_engine.py
│   │   ├── piper_engine.py
│   │   ├── openai_tts_engine.py
│   │   ├── elevenlabs_engine.py
│   │   └── registry.py     # Engine discovery
│   ├── llm/                # LLM client (OpenAI / Anthropic / Google)
│   │   ├── client.py
│   │   └── registry.py
│   ├── news/               # News scraping and summarisation
│   │   ├── scraper.py      # RSS-first scraper with archive.ph fallback
│   │   └── summarizer.py   # Batched LLM story extraction
│   ├── processing/         # Chunking, synthesis, merging pipeline
│   │   ├── chunker.py      # Smart text chunking
│   │   ├── cleaner.py      # Rule-based text cleaning
│   │   ├── llm_cleaner.py  # LLM-based text cleaning
│   │   ├── synthesizer.py  # Per-chunk TTS synthesis
│   │   ├── merger.py       # WAV → MP3 merging with ffmpeg
│   │   ├── pipeline.py     # Full extraction → audio pipeline
│   │   └── worker.py       # Background job worker
│   └── dashboard/          # Streamlit web UI
│       ├── Home.py         # Home page
│       ├── sidebar.py      # Shared sidebar, logo, global CSS
│       ├── static/         # Images (icon.jpeg, full.png)
│       └── pages/
│           ├── 1_Upload.py
│           ├── 2_Library.py
│           ├── 3_News.py
│           ├── 4_Web.py
│           └── 5_Settings.py
└── data/                   # Runtime data (uploads, output, models, SQLite DB)
| Problem | Solution |
|---|---|
| Piper model not found | Re-run ./setup.sh (or setup.bat) to auto-download, or manually download from piper releases and place it at the path shown in Settings |
| Upload size too small | Increase Max upload size in Settings, save, and restart |
| Theme not applying | Theme changes require a restart — stop the app and run python run.py again |
| ffmpeg not found | Install ffmpeg (see Prerequisites) and ensure it's on your PATH |
| News 403 errors | The site blocks scrapers — add an RSS feed URL via Edit (e.g. https://www.ft.com/rss/home) |
| News: no stories | Ensure an LLM is configured in Settings → LLM Text Cleaning |
| LLM module not found | Install the required provider: pip install openai / pip install anthropic / pip install google-generativeai |
| Git push fails for large files | Run git config http.postBuffer 524288000 then push again |
This project is licensed under the GNU General Public License v3.0. See LICENSE for details.
Major feature release: cloud TTS, LLM cleaning, news aggregator, and web-to-audio.
- Cloud TTS: OpenAI TTS (tts-1 / tts-1-hd) and ElevenLabs engines added alongside Kokoro and Piper
- LLM Text Cleaning: Optional pre-processing via OpenAI, Anthropic, or Google Gemini — strips tables, charts, citations, boilerplate; supports translation
- News Page: Add newspaper sources with multiple RSS feeds and group tags; fetch top stories per category via a single batched LLM call; archive.ph fallback for paywalled sites; stories persisted across sessions; convert to audio; export/import source lists as JSON to share across devices or with others
- Web URL to Audio: Paste any URL, scrape + clean + convert in one step
- UI: Sidebar nav items italic; all page titles italic; Settings moved to last sidebar position; news categories collapsible
- News categories: Merged to 7 (Technology & Science, Sports & Entertainment combined)
- Config: New news.max_stories_per_category setting; new llm section for API-based cleaning
Text cleaning, Piper fixes, and license update.
- Text Cleaning: Rule-based cleaning pipeline strips LaTeX, citations, URLs, and control characters before TTS — fixes "index out of bounds" crashes on academic documents
- Piper Fix: Voice parameter now correctly resolves to voice-specific model files with caching
- Piper Fix: Added piper-tts as a required dependency
- UX: Graceful termination, with no more "Connection error" overlay after shutdown
- UX: Updated terminate confirmation message
- Infrastructure: Updated Piper references from rhasspy/piper to OHF-Voice/piper1-gpl
- Infrastructure: License changed from Apache 2.0 to GPL-3.0 (piper-tts compatibility)
- Infrastructure: Tests for text cleaner and Piper engine
Security, robustness, and UX improvements.
- Security: Sanitize uploaded filenames to prevent path traversal
- Security: Delete and terminate actions now require confirmation
- Library: Search by filename, filter by status, sort by date/name/status
- Library: Organise jobs into folders (create, move, remove)
- Library: Pagination (20 jobs per page)
- Library: Expandable error details for failed jobs
- Upload: Custom voice preview text
- Upload: Page range validation with inline feedback
- Upload: Double-submit prevention
- Settings: Model path validation with warnings on save
- Infrastructure: Apache 2.0 LICENSE, pyproject.toml, .gitignore improvements
- Infrastructure: Unit tests for models, config, database, and chunker
- Infrastructure: CI/CD via GitHub Actions (Python 3.10–3.13)
Initial release.
- Upload PDF, EPUB, and TXT files and convert to MP3
- Page/chapter range selection for PDFs and EPUBs
- Kokoro and Piper TTS engines with voice preview
- Library with audio playback, download, rename, and delete
- Configurable theme (dark / light / device auto-detect)
- Configurable max upload size
- Background job processing with progress tracking
- Modular engine architecture: add new TTS backends by implementing TTSEngine