Releases: DigiJoe79/AudioBook-Maker
Release v1.1.2 - Unified Error Handling
This release includes minor fixes and improves the maintainability and consistency of error responses for frontend i18n translation.
What's New
Unified ApplicationError Exception
All backend errors now use a single ApplicationError class instead of 20+ specialized exception subclasses.
Error Format
All errors follow the i18n-compatible format:
[ERROR_CODE]param1:value1;param2:value2
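The following is a minimal Python sketch of producing this format. The class is a hypothetical stand-in for ApplicationError. Its constructor signature and the to_message() helper are assumptions for illustration, not the backend's actual API.

```python
class ApplicationError(Exception):
    """Hypothetical stand-in: one exception type carrying an i18n code plus parameters."""

    def __init__(self, code: str, **params: str) -> None:
        self.code = code
        self.params = params
        super().__init__(self.to_message())

    def to_message(self) -> str:
        # Serialize as [ERROR_CODE]key1:value1;key2:value2 (parameters in insertion order)
        body = ";".join(f"{key}:{value}" for key, value in self.params.items())
        return f"[{self.code}]{body}"


err = ApplicationError("ENGINE_START_FAILED", error="Engine spacy:docker:local failed")
print(err.to_message())
# [ENGINE_START_FAILED]error:Engine spacy:docker:local failed
```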
Bug Fixes
Frontend Error Message Parsing
Problem: Error messages containing colons in parameter values were incorrectly parsed, causing i18n translation failures.
Example:
[ENGINE_START_FAILED]error:Engine spacy:docker:local failed
The previous parser split on all colons, resulting in:
error → Engine spacy (truncated)
Fix: Split only on the first colon to preserve colons in values:
error → Engine spacy:docker:local failed (complete)
Affected file: frontend/src/utils/translateBackendError.ts
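The fix can be sketched in Python. The real parser is TypeScript in translateBackendError.ts, and the function name parse_backend_error is hypothetical. The key detail is str.partition, which splits on the first separator only:

```python
from __future__ import annotations


def parse_backend_error(message: str) -> tuple[str, dict[str, str]] | None:
    """Parse "[CODE]k1:v1;k2:v2" into (code, params); return None for other text."""
    if not message.startswith("["):
        return None
    end = message.find("]")
    if end == -1:
        return None
    code = message[1:end]
    params: dict[str, str] = {}
    for pair in message[end + 1:].split(";"):
        if not pair:
            continue
        # partition() splits on the FIRST colon only, so colons inside values survive.
        key, sep, value = pair.partition(":")
        if sep:
            params[key] = value
    return code, params


print(parse_backend_error("[ENGINE_START_FAILED]error:Engine spacy:docker:local failed"))
# ('ENGINE_START_FAILED', {'error': 'Engine spacy:docker:local failed'})
```

Like the wire format itself, this sketch still assumes that parameter values contain no semicolons.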
Docker Image Update Cleanup Failure
Problem: When updating a Docker engine image, the cleanup of the old (dangling) image failed with error:
409 Conflict: unable to delete image - image is being used by running container
Cause: The update flow attempted to remove the old image while the container was still running.
Fix: Stop the running engine container before attempting to remove the old dangling image.
Affected file: backend/api/engines.py
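The following sketch shows the corrected ordering. It is written against a small hypothetical DockerOps interface rather than the real Docker SDK, so the sequence can be checked without a daemon. The container name audiobook-xtts, the :latest tag and the image IDs are made up for the example. The actual flow in backend/api/engines.py may differ in detail, for example in whether and when it restarts the engine.

```python
from typing import List, Protocol, Tuple


class DockerOps(Protocol):
    """Hypothetical minimal interface; a real backend would wrap the Docker SDK here."""

    def pull(self, image_ref: str) -> str: ...
    def stop_container(self, name: str) -> None: ...
    def remove_image(self, image_id: str) -> None: ...


def update_engine_image(ops: DockerOps, container: str, image_ref: str, old_image_id: str) -> str:
    """Pull the new image, stop the running engine, then delete the now-dangling old image."""
    new_image_id = ops.pull(image_ref)
    # Stop first: Docker answers 409 Conflict when asked to delete an image
    # that a running container still uses.
    ops.stop_container(container)
    if new_image_id != old_image_id:
        ops.remove_image(old_image_id)
    return new_image_id


class FakeDocker:
    """In-memory stand-in that fails the same way Docker did before the fix."""

    def __init__(self) -> None:
        self.running = True
        self.calls: List[Tuple[str, str]] = []

    def pull(self, image_ref: str) -> str:
        self.calls.append(("pull", image_ref))
        return "sha256:new"

    def stop_container(self, name: str) -> None:
        self.calls.append(("stop", name))
        self.running = False

    def remove_image(self, image_id: str) -> None:
        if self.running:
            raise RuntimeError("409 Conflict: image is being used by running container")
        self.calls.append(("remove", image_id))


fake = FakeDocker()
update_engine_image(fake, "audiobook-xtts", "ghcr.io/digijoe79/audiobook-maker-engines/xtts:latest", "sha256:old")
print(fake.calls)
```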
Upgrade Notes
- No database migration required
Full Changelog: v1.1.1...v1.1.2
Release v1.1.1 - Remote Backend Connectivity Fix
This is a hotfix release that resolves a critical issue preventing the desktop app from connecting to remote Docker backends.
Bug Fix
Remote Backend Connection Failure
Symptoms:
- Desktop app shows "Offline" when connecting to remote backends (e.g., http://server:8765)
- Browser can reach the same backend URL successfully
- No network traffic visible in Wireshark when the app attempts a connection
Root Cause:
The Content Security Policy (CSP) in tauri.conf.json was restricting HTTP connections to localhost:8765 and 127.0.0.1:8765 only. When users configured a remote backend URL, the CSP silently blocked all fetch requests before they could leave the app.
Fix:
Updated CSP directives to allow connections to any host:
| Directive | Before | After |
|---|---|---|
| connect-src | localhost:8765, 127.0.0.1:8765 | http://*:*, https://*:*, ws://*:*, wss://*:* |
| img-src | localhost:8765, 127.0.0.1:8765 | http://*:*, https://*:* |
| media-src | localhost:8765, 127.0.0.1:8765 | http://*:*, https://*:* |
This enables the desktop app to connect to:
- Local backends (localhost, 127.0.0.1)
- Remote backends on LAN (192.168.x.x, hostnames)
- Remote backends over the internet (any URL configured by the user)
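The small Python helper below serializes the new directive values into a CSP policy string. It includes only the three directives from the table. Any other directives in the real tauri.conf.json are not shown, and build_csp is a hypothetical helper, not part of Tauri.

```python
from __future__ import annotations


def build_csp(directives: dict[str, list[str]]) -> str:
    # A CSP policy is a "; "-separated list of "directive source source ..." entries.
    return "; ".join(f"{name} {' '.join(sources)}" for name, sources in directives.items())


after = {
    "connect-src": ["http://*:*", "https://*:*", "ws://*:*", "wss://*:*"],
    "img-src": ["http://*:*", "https://*:*"],
    "media-src": ["http://*:*", "https://*:*"],
}
print(build_csp(after))
```

The host wildcard `*` matches any host, so these values trade the former localhost-only restriction for reachability of whatever backend URL the user configures.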
Affected Versions
- v1.1.0 - Remote backend connections blocked
- v1.1.1 - Fixed
Upgrade Notes
No database migration required. Simply replace the desktop app with v1.1.1.
Acknowledgments
Thanks to the following community members for their contributions:
Bug Reports:
- u/k8-bit - Remote backend connection issue with Wireshark captures
- u/GaryDUnicorn - Remote backend connection issue with Wireshark captures
Security Feedback:
- u/coder543 - Corrected misleading SSH key security description
Full Changelog: v1.1.0...v1.1.1
Release v1.1.0 - Docker-First Architecture & Online Engine Catalog
This release transforms Audiobook Maker into a Docker-first application. Engines are now distributed as prebuilt Docker images via the audiobook-maker-engines repository, eliminating complex Python environment setup for end users.
The Docker-First Vision
For End Users: Zero-Setup Engines
Before v1.1.0: Installing a TTS engine like XTTS meant:
- Installing Python 3.10+
- Creating virtual environments
- Installing CUDA/cuDNN dependencies
- Downloading models manually
- Debugging dependency conflicts
With v1.1.0: Click "Install" in the UI:
- Prebuilt Docker images from GitHub Container Registry
- All dependencies bundled (Python, CUDA, models)
- One-click installation and updates
- Works on Windows, macOS, and Linux
Online Engine Catalog
All engines are distributed via the audiobook-maker-engines repository:
https://github.com/DigiJoe79/audiobook-maker-engines
├── catalog.yaml # Engine metadata, versions, requirements
├── xtts/ # XTTS v2 engine
│ └── Dockerfile
├── whisper/ # Whisper STT engine
│ └── Dockerfile
└── ...
How it works:
- Backend syncs catalog.yaml on startup
- UI shows available engines with version info
- User clicks "Install" → pulls from ghcr.io/digijoe79/audiobook-maker-engines
- Engine is ready to use in seconds (depending on image size)
- Update detection: compares the local digest with the registry
Image Registry: ghcr.io/digijoe79/audiobook-maker-engines/{engine}:{tag}
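This Python sketch shows the reference pattern and the digest comparison. The "latest" tag used below is an assumption. The helper names are hypothetical, and the sketch does not show how the digests are obtained, for example through the Docker SDK.

```python
REGISTRY = "ghcr.io/digijoe79/audiobook-maker-engines"


def image_ref(engine: str, tag: str) -> str:
    # Documented pattern: ghcr.io/digijoe79/audiobook-maker-engines/{engine}:{tag}
    return f"{REGISTRY}/{engine}:{tag}"


def needs_update(local_digest: str, registry_digest: str) -> bool:
    # A digest identifies image content, so a differing digest means the tag
    # now points at a different build than the one installed locally.
    return local_digest != registry_digest


print(image_ref("xtts", "latest"))
# ghcr.io/digijoe79/audiobook-maker-engines/xtts:latest
```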
For Developers: Subprocess Mode
Subprocess execution (LocalRunner) remains available for engine development:
- Clone the engines repo: git clone https://github.com/DigiJoe79/audiobook-maker-engines backend/engines
- Create a VENV, install dependencies, iterate on code
- Test locally before building Docker images
- Backend auto-discovers engines in backend/engines/
Use subprocess mode when:
- Developing new engines
- Debugging engine code
- Contributing to audiobook-maker-engines
Deployment Scenarios
Recommended: Full Docker Stack
Run everything in Docker - backend and engines:
Using docker compose:
docker compose up -d
Or standalone docker run:
docker run -d \
  --name audiobook-maker-backend \
  -p 8765:8765 \
  --add-host=host.docker.internal:host-gateway \
  -e DOCKER_ENGINE_HOST=host.docker.internal \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v audiobook-data:/app/data \
  -v audiobook-media:/app/media \
  ghcr.io/digijoe79/audiobook-maker/backend:latest
Example compose file:
services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
    container_name: audiobook-maker-backend  # Required name (orphan cleanup)
    ports:
      - "8765:8765"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./data:/app/data
      - ./media:/app/media
    environment:
      - DEFAULT_ENGINE_RUNNER=docker
      - DOCKER_ENGINE_HOST=host.docker.internal
Note: The container must be named audiobook-maker-backend. On startup, the backend stops orphaned engine containers (prefix audiobook-) from previous sessions. Using a different name will cause the backend to stop itself.
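The sketch below shows why the name matters. It assumes the cleanup selects containers by the audiobook- prefix and excludes the backend by its exact name. The real selection logic is not documented here, and the container names are illustrative.

```python
from typing import List

BACKEND_CONTAINER = "audiobook-maker-backend"
ORPHAN_PREFIX = "audiobook-"


def orphaned_containers(names: List[str], own_name: str = BACKEND_CONTAINER) -> List[str]:
    # Every container with the shared prefix is treated as a leftover engine,
    # except the backend itself, which is recognized only by its exact name.
    return [n for n in names if n.startswith(ORPHAN_PREFIX) and n != own_name]


running = ["audiobook-maker-backend", "audiobook-xtts", "postgres"]
print(orphaned_containers(running))
# ['audiobook-xtts']
```

Suppose the backend container were renamed to, say, audiobook-backend. It would still match the prefix but no longer match own_name, so it would be stopped along with the engines.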
Remote GPU Offloading
Run GPU-intensive engines on a dedicated server:
┌─────────────────────┐ SSH ┌─────────────────────┐
│ Your PC (CPU) │◄──────────────────► │ GPU Server │
│ - Tauri App │ │ - Docker │
│ - Backend │ │ - NVIDIA Runtime │
│ - Non-GPU engines │ │ - XTTS, Whisper │
└─────────────────────┘ └─────────────────────┘
Features:
- SSH key auto-generation per host
- GPU detection (NVIDIA runtime check)
- Volume mounts for samples/models
- Per-host engine installation
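Here is a plausible shape for the GPU check. It assumes the check inspects the runtimes that `docker info` reports on the remote host. The function name is hypothetical, and the project may detect GPUs differently.

```python
from typing import Any, Dict


def has_nvidia_runtime(info: Dict[str, Any]) -> bool:
    """True if the Docker daemon reports an "nvidia" runtime.

    `info` is the JSON object returned by the Docker Engine /info endpoint, e.g.
    docker.DockerClient(base_url="ssh://user@gpu-server").info() with the Docker SDK.
    """
    return "nvidia" in (info.get("Runtimes") or {})


sample = {"Runtimes": {"nvidia": {"path": "nvidia-container-runtime"}, "runc": {"path": "runc"}}}
print(has_nvidia_runtime(sample))
# True
```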
Developer Setup
For engine development, run backend in VENV:
cd backend
python -m venv venv
./venv/Scripts/activate # Windows
pip install -r requirements.txt
python main.py
Then clone the engines repo and develop locally.
Engine Runner Architecture
Three runner types execute engines in different environments:
EngineRunner (ABC)
├── LocalRunner - Subprocess in local VENV (developers)
├── DockerRunner - Local Docker containers (recommended)
└── RemoteDockerRunner - Remote Docker via SSH (GPU offloading)
Configuration:
| Variable | Default | Description |
|---|---|---|
| DEFAULT_ENGINE_RUNNER | local | Default: local or docker |
| DOCKER_ENGINE_HOST | 127.0.0.1 | Backend address for containers |
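The runner hierarchy and the DEFAULT_ENGINE_RUNNER switch can be sketched as follows. Only the class names and the variable come from these notes. The start() method, its return value and the RUNNERS lookup are illustrative assumptions, and RemoteDockerRunner is omitted for brevity.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod
from typing import Mapping


class EngineRunner(ABC):
    """Hypothetical interface shared by all runners."""

    @abstractmethod
    def start(self, engine: str) -> str:
        """Start `engine` and return a short description of where it runs."""


class LocalRunner(EngineRunner):
    def start(self, engine: str) -> str:
        # A real runner would launch the engine's VENV interpreter as a subprocess here.
        return f"{engine}: subprocess in local VENV"


class DockerRunner(EngineRunner):
    def start(self, engine: str) -> str:
        # A real runner would start the engine's container via the Docker SDK here.
        return f"{engine}: local Docker container"


RUNNERS: dict[str, type[EngineRunner]] = {"local": LocalRunner, "docker": DockerRunner}


def default_runner(env: Mapping[str, str] = os.environ) -> EngineRunner:
    # DEFAULT_ENGINE_RUNNER selects the runner; "local" is the documented default.
    return RUNNERS[env.get("DEFAULT_ENGINE_RUNNER", "local")]()


print(default_runner({"DEFAULT_ENGINE_RUNNER": "docker"}).start("xtts"))
# xtts: local Docker container
```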
Engine Hosts Management UI
New "Engine Hosts" tab in Settings manages all execution environments:
| Host Type | Description |
|---|---|
| Subprocess | Built-in, for developers (VENV-based) |
| Docker Local | Local Docker daemon (recommended) |
| Docker Remote | Remote servers via SSH |
Features:
- Add/remove remote Docker hosts with SSH key wizard
- Test connection with GPU detection
- Volume configuration (samples, models paths)
- SSH public key display with copy-to-clipboard
- Per-host engine installation from online catalog
- Automatic image update detection
REST API
Engine Host Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /engine-hosts | List all hosts |
| GET | /engine-hosts/{id} | Get specific host |
| POST | /engine-hosts | Create new host |
| DELETE | /engine-hosts/{id} | Delete host |
| POST | /engine-hosts/{id}/test | Test connection + GPU detection |
| POST | /engine-hosts/prepare | Generate SSH key for new host |
| GET | /engine-hosts/{id}/volumes | Get volume configuration |
| POST | /engine-hosts/{id}/volumes | Set volume configuration |
| GET | /engine-hosts/{id}/public-key | Get SSH public key for host |
Engine/Catalog Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /engines/catalog/sync | Sync online catalog |
| POST | /engines/{variant}/install | Install Docker engine |
| DELETE | /engines/{variant}/uninstall | Uninstall Docker engine |
| GET | /engines/{variant}/check-update | Check for image updates |
| POST | /engines/{variant}/pull-update | Pull latest image |
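Here is a minimal Python sketch for building requests to these endpoints using only the standard library. The paths come from the tables. The base URL, the variant id and the absence of request bodies for these calls are assumptions, and the response formats are not documented here.

```python
from __future__ import annotations

import json
import urllib.request

# Assumption: backend on the default port 8765 with these routes at the root;
# adjust if they are mounted under a prefix such as /api.
BASE_URL = "http://localhost:8765"


def api_request(method: str, path: str, body: dict | None = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request for one of the endpoints above."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


variant = "xtts"  # hypothetical variant id
sync = api_request("POST", "/engines/catalog/sync")
check = api_request("GET", f"/engines/{variant}/check-update")
pull = api_request("POST", f"/engines/{variant}/pull-update")
# Send against a running backend with urllib.request.urlopen(check).
print(check.get_method(), check.full_url)
# GET http://localhost:8765/engines/xtts/check-update
```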
Database Schema
New engine_hosts table:
CREATE TABLE engine_hosts (
host_id TEXT PRIMARY KEY,
display_name TEXT NOT NULL,
host_type TEXT NOT NULL, -- 'subprocess' | 'docker:local' | 'docker:remote'
ssh_url TEXT, -- For remote: ssh://user@host
is_available INTEGER DEFAULT 1,
has_gpu INTEGER, -- NULL=unknown, 0=no, 1=yes
samples_path TEXT, -- Volume mount for samples
models_path TEXT, -- Volume mount for models
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL
);
Bug Fixes
SQLite WAL Mode Disabled
Disabled WAL journal mode for Docker volume mount compatibility. Database uses DELETE mode for network filesystem support.
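The Python sqlite3 sketch below ties both changes together. It opens a database in DELETE journal mode and creates the engine_hosts table from the schema above. The file name and the sample host values are hypothetical.

```python
import os
import sqlite3
import tempfile
from datetime import datetime

SCHEMA = """
CREATE TABLE IF NOT EXISTS engine_hosts (
    host_id TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    host_type TEXT NOT NULL,        -- 'subprocess' | 'docker:local' | 'docker:remote'
    ssh_url TEXT,                   -- for remote: ssh://user@host
    is_available INTEGER DEFAULT 1,
    has_gpu INTEGER,                -- NULL=unknown, 0=no, 1=yes
    samples_path TEXT,
    models_path TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Rollback journal instead of WAL: WAL needs shared memory between processes
    # on one host, which network filesystems (and some volume mounts) do not provide.
    conn.execute("PRAGMA journal_mode=DELETE")
    conn.execute(SCHEMA)
    return conn


conn = open_db(os.path.join(tempfile.mkdtemp(), "audiobook.db"))
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
now = datetime.now().isoformat()
conn.execute(
    "INSERT INTO engine_hosts (host_id, display_name, host_type, ssh_url, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("gpu-1", "GPU Server", "docker:remote", "ssh://user@gpu-server", now, now),
)
conn.commit()
row = conn.execute(
    "SELECT host_type, is_available, has_gpu FROM engine_hosts WHERE host_id = ?", ("gpu-1",)
).fetchone()
print(journal_mode, row)
# delete ('docker:remote', 1, None)
```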
Technical Details
New Backend Modules
| Module | Description |
|---|---|
| core/engine_runner.py | EngineRunner ABC + EngineEndpoint |
| core/local_runner.py | Subprocess runner (developer mode) |
| core/docker_runner.py | Local Docker runner |
| core/remote_docker_runner.py | SSH-based remote runner |
| core/engine_runner_registry.py | Central runner registry |
| db/engine_host_repository.py | Host CRUD operations |
| api/engine_hosts.py | Host REST API |
| services/docker_service.py | Docker SDK operations |
| services/docker_discovery_service.py | Catalog sync, discovery |
| services/docker_host_monitor.py | Host availability monitoring |
| services/ssh_key_service.py | SSH key management |
| services/online_catalog_service.py | Online catalog management |
New Frontend Components
| Component | Description |
|---|---|
| EngineHostsTab.tsx | Main host management UI |
| HostEnginesSection.tsx | Per-host engine list |
| HostSettingsDialog.tsx | Volume config + SSH key |
| AddHostDialog.tsx | Add remote host wizard |
| AddImageDialog.tsx | Install from catalog |
| useSSEDockerHostHandlers.ts | Host status SSE events |
New Dependencies
- docker>=7.0.0 - Docker SDK for Python
Tests
23 new tests covering:
- EngineRunner abstraction (3)
- LocalRunner (4)
- DockerRunner (3)
- RemoteDockerRunner (2)
- EngineRunnerRegistry (6)
- DockerHostRepository (3)
- BaseEngineManager integration (2)
Migration Notes
- Existing VENV-based installations continue to work
- No database migration required
- Docker mode is opt-in via DEFAULT_ENGINE_RUNNER=docker
- Recommended: Switch to Docker for production use
Full Changelog: v1.0.2...v1.1.0
Engine Repository: https://github.com/DigiJoe79/audiobook-maker-engines
Release v1.0.2 - VibeVoice TTS Engine, EPUB Import and Bugfixes
New Features
VibeVoice TTS Engine (Experimental)
Added Microsoft VibeVoice as a new TTS engine with voice cloning support.
Models:
- VibeVoice-1.5B (~3GB VRAM) - Faster generation, up to 90 min audio
- VibeVoice-7B (~18GB VRAM) - Highest quality, up to 45 min audio
Languages:
- Stable: English (en), Chinese (zh)
- Experimental: German (de), French (fr), Italian (it), Japanese (ja), Korean (ko), Dutch (nl), Polish (pl), Portuguese (pt), Spanish (es)
Requirements:
- Python 3.12
- PyTorch 2.9.1 with CUDA 13.0
- Optional: Flash Attention 2 for ~2x faster inference and ~40% less VRAM
Optional Flash Attention 2 Setup (Windows):
The setup script asks whether to install Flash Attention 2. If you choose yes, it installs:
- triton-windows - Triton compiler for Windows
- Flash-Attention-2_for_Windows - Pre-built Windows wheels
Without Flash Attention, SDPA (Scaled Dot-Product Attention) is used as a fallback with ~80% of the performance.
Voice Sample Recommendations for Voice Cloning with VibeVoice:
| Aspect | Recommendation |
|---|---|
| Duration | 10-60 seconds (10s minimum) |
| Format | WAV or MP3 |
| Sample Rate | 24kHz (VibeVoice native rate) |
| Quality | Clean, no background noise/music |
| Language | EN or ZH for best results |
Best practices:
- Use clean audio without background noise or music
- Natural speaking pace (not too fast)
- Consistent volume throughout
- Avoid intro phrases like "Welcome to..." or "Hello..." (can cause artifacts)
- The 7B model is more stable with fewer unexpected artifacts
EPUB Import
Added EPUB file support to the Import workflow. You can now import e-books directly alongside Markdown files.
Features:
- Automatic EPUB-to-Markdown conversion
- Chapter structure detection
- Front matter filtering (skips cover, ToC, copyright pages)
- Same preview/execute workflow as Markdown import
Dependencies: ebooklib, beautifulsoup4, markdownify
Contributed by @codesterribly
Bug Fixes
- Text Segmentation: Fixed the max_length parameter being ignored during text segmentation. Segments were incorrectly limited to 250 characters instead of the configured limit (e.g., 5000 for VibeVoice). Affected both Upload and Import workflows.
- Segment Text Normalization: Newlines and extra whitespace are now removed when segments are created. This fixes issues with TTS engines (like VibeVoice) that interpret newlines as speaker turn boundaries, causing audio to be truncated.
- Engine Enable/Disable: Fixed engines not being disabled when clicking the disable button. The enabled flag was written to a separate copy of the settings dictionary instead of the original, causing changes to be lost on save.
- TTS Job Timestamps: Fixed job completion timestamps jumping by ~1 hour. SQLite's datetime('now') returns UTC without timezone info, but JavaScript interpreted it as local time. Now uses datetime.now().isoformat() for consistent local timestamps.
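The sketch below shows two of these fixes. The nested settings layout mirrors the settings.{type}.engines.{name}.enabled key listed in the v1.0.0 notes. The function names, and the use of deepcopy to reproduce the bug, are illustrative assumptions, not the project's code.

```python
import copy


def normalize_segment_text(text: str) -> str:
    # str.split() with no argument splits on any whitespace run (spaces, tabs,
    # newlines), so joining with " " leaves single spaces and no line breaks.
    return " ".join(text.split())


def set_engine_enabled(settings: dict, engine_type: str, name: str, enabled: bool) -> None:
    # Mutate the dictionary that is later saved, not a copy of it.
    settings[engine_type]["engines"][name]["enabled"] = enabled


settings = {"tts": {"engines": {"vibevoice": {"enabled": True}}}}

# Bug pattern: the flag is written to a copy, so the saved settings never change.
working_copy = copy.deepcopy(settings)
working_copy["tts"]["engines"]["vibevoice"]["enabled"] = False
enabled_after_bug = settings["tts"]["engines"]["vibevoice"]["enabled"]  # still True

# Fix: write to the original.
set_engine_enabled(settings, "tts", "vibevoice", False)

print(normalize_segment_text("First line\nsecond line\n\n  third"))
# First line second line third
```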
Improvements
- API Types: Migrated frontend API types to OpenAPI-generated source for better type safety
Full Changelog: https://github.com/user/audiobook-maker/compare/v1.0.1...v1.0.2
Bug fixes and stability improvements
Release v1.0.1 - Bug Fixes
Patch release with bug fixes and stability improvements.
Bug Fixes
- Speaker sample paths: Changed from absolute to relative paths for portability. Existing speakers need to be re-added.
- Speaker preview player: Fixed play/pause icon state sync
- Windows server shutdown: Suppressed harmless ConnectionResetError on Ctrl+C
- E2E tests: Made segment tests locale-agnostic (work in German and English)
Migration
Existing users with speaker samples must re-add their speakers after updating (delete and re-create).
Full Changelog: https://github.com/user/audiobook-maker/compare/v1.0.0...v1.0.1
Release v1.0.0 - Multi-Engine Architecture & Quality Assurance
This is a major release of Audiobook Maker featuring a complete multi-engine architecture, quality assurance system, performance optimizations, and modern UI navigation.
Highlights
Multi-Engine Architecture
- 4 Engine Types - TTS, STT, Text Processing, and Audio Analysis
- Isolated Virtual Environments - Each engine runs in its own VENV (no dependency conflicts)
- Engine Enable/Disable - Per-engine enabled flag with database persistence
- Auto-Stop - Non-default engines automatically stop after 5 minutes of inactivity
- Engine Monitoring UI - Real-time status, memory usage, and auto-stop countdown
- Plug-and-Play - Add new engines without modifying backend code
Quality Assurance System
- Whisper Integration - Automatic transcription analysis for quality verification
- Silero-VAD - Audio quality analysis (speech ratio, silence detection, clipping)
- Confidence Scoring - Detect low-quality or mispronounced segments (0-100%)
- Issue Detection - Identify missing words, extra words, mispronunciations
- Quality Status - Visual indicators (perfect/warning/defect) in segment list
- Generic Quality Format - Engine-agnostic result format for UI rendering
Pronunciation Rules System
- Pattern-Based Replacement - Simple text or regex patterns to fix mispronunciations
- Scope-Based Rules - Global, engine-specific, project-specific, or project-engine
- Priority System - Control rule application order
- Live Preview - Test rules before applying
- Import/Export - JSON format for rule sharing
- Automatic Application - Rules applied during TTS generation
Modern Navigation System
- Teams/Discord-Style UI - Icon-based sidebar with 6 views
- Keyboard Shortcuts - Ctrl+1 through Ctrl+6 for view switching
- Collapsible Sidebar - Ctrl+B to toggle
- Event Log - Real-time SSE event monitoring in Monitoring view
Performance Optimizations
- 95% DOM Reduction - Virtual scrolling with @tanstack/react-virtual
- 99% Fewer Re-renders - React.memo optimization for list items
- 95% Faster Event Processing - immer integration (1-2ms vs 20-30ms)
- 60fps Smooth Scrolling - Even with 400+ segments
New Features
Multi-Engine Architecture
Engine Types:
| Type | Purpose | Engines Included |
|---|---|---|
| TTS | Text-to-Speech | XTTS v2, Chatterbox |
| STT | Speech-to-Text | OpenAI Whisper (5 model sizes) |
| Text Processing | Text Segmentation | spaCy (11 languages) |
| Audio Analysis | Audio Quality | Silero-VAD |
Base Server Hierarchy:
BaseEngineServer (Generic)
├── BaseTTSServer (adds /generate endpoint)
├── BaseQualityServer (adds /analyze endpoint)
└── BaseTextServer (adds /segment endpoint)
Engine Management:
- Enable/disable engines via Settings or API
- Auto-stop after 5 minutes inactivity (configurable)
- Real-time status monitoring (disabled/stopped/starting/running/stopping)
- Memory usage tracking
- Start/stop buttons in Monitoring view
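The sketch below models the idle-timeout bookkeeping behind auto-stop and its countdown. It assumes activity is tracked as a per-engine timestamp; the notes mention activity tracking with timestamps. The class and function names are hypothetical. Timestamps are passed in explicitly so the logic is easy to test.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

AUTO_STOP_SECONDS = 5 * 60  # default inactivity window; configurable per these notes


@dataclass
class EngineActivity:
    last_activity: float = field(default_factory=time.monotonic)

    def touch(self, now: float | None = None) -> None:
        """Record a request handled by the engine."""
        self.last_activity = time.monotonic() if now is None else now

    def seconds_until_stop(self, now: float | None = None, timeout: float = AUTO_STOP_SECONDS) -> float:
        """Countdown value a monitoring UI could display."""
        now = time.monotonic() if now is None else now
        return max(0.0, timeout - (now - self.last_activity))


def should_auto_stop(activity: EngineActivity, is_default: bool, now: float | None = None) -> bool:
    # Only non-default engines are stopped automatically.
    return not is_default and activity.seconds_until_stop(now) == 0.0


activity = EngineActivity(last_activity=1000.0)
print(activity.seconds_until_stop(now=1060.0))  # 240.0
```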
Quality Analysis System
Unified Quality Format:
- All engines return a generic AnalysisResult with qualityScore, qualityStatus, details
- Frontend renders any engine's results dynamically
- Supports fields (key-value pairs) and infoBlocks (titled lists)
Quality Levels:
- perfect (score >= 85) - Green indicator
- warning (score 70-84) - Yellow indicator
- defect (score < 70) - Red indicator
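The thresholds map to statuses as follows. This sketch assumes scores on the 0-100 scale used for confidence scoring and treats fractional scores between 84 and 85 as warning. quality_status is a hypothetical helper name.

```python
def quality_status(score: float) -> str:
    # Thresholds from these notes: >= 85 perfect, 70-84 warning, < 70 defect
    if score >= 85:
        return "perfect"
    if score >= 70:
        return "warning"
    return "defect"


print([quality_status(s) for s in (92, 85, 84, 70, 69)])
# ['perfect', 'perfect', 'warning', 'warning', 'defect']
```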
Analysis Types:
- STT Analysis (Whisper): Transcription comparison, word-level confidence, text alignment
- Audio Analysis (Silero-VAD): Speech ratio, silence detection, clipping detection, volume analysis
Navigation System
6 Views:
| View | Shortcut | Description |
|---|---|---|
| Main | Ctrl+1 | Audiobook editing with AudioPlayer |
| Import | Ctrl+2 | Markdown import workflow |
| Speakers | Ctrl+3 | Speaker management |
| Pronunciation | Ctrl+4 | Pronunciation rules |
| Monitoring | Ctrl+5 | Jobs, Quality Jobs, Event Log, Engines |
| Settings | Ctrl+6 | 5 settings tabs |
Additional Shortcuts:
- Ctrl+B - Toggle sidebar
- Ctrl+[ - Go back to previous view
- Mac: Cmd instead of Ctrl
Pronunciation Rules
- Create text transformation rules for mispronounced words
- Support regex and simple text patterns
- Scope system: global → engine → project → project-engine
- Priority ordering within scopes
- Active/inactive toggle
- Import/export as JSON
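The following is a minimal sketch of applying rules. The notes do not say which way priority runs or how scopes are ordered at application time. This version assumes that scopes apply from global to project-engine and that higher priority runs first within a scope. The Rule fields are hypothetical, and the sketch leaves out filtering rules down to the current engine and project.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed application order, from broadest to most specific scope.
SCOPE_ORDER = ["global", "engine", "project", "project-engine"]


@dataclass
class Rule:
    pattern: str
    replacement: str
    scope: str = "global"
    priority: int = 0
    is_regex: bool = False
    is_active: bool = True


def apply_rules(text: str, rules: list[Rule]) -> str:
    """Apply active rules in scope order, higher priority first within a scope."""
    ordered = sorted(
        (r for r in rules if r.is_active),
        key=lambda r: (SCOPE_ORDER.index(r.scope), -r.priority),
    )
    for rule in ordered:
        if rule.is_regex:
            text = re.sub(rule.pattern, rule.replacement, text)
        else:
            text = text.replace(rule.pattern, rule.replacement)
    return text


rules = [
    Rule("Dr.", "Doctor"),
    Rule(r"\bGIF\b", "jif", scope="project", is_regex=True),
    Rule("Smith", "Smyth", is_active=False),
]
print(apply_rules("Dr. Smith saves a GIF.", rules))
# Doctor Smith saves a jif.
```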
Technical Changes
Backend
Multi-Engine System:
- 4 Engine Managers inheriting from BaseEngineManager
- Auto-discovery per engine type from backend/engines/{type}/
- Unified /models endpoint returning ModelInfo objects
- Activity tracking with timestamps for auto-stop
- Engine enable/disable persisted in settings DB
API Consolidation:
- Jobs API: /api/jobs/tts/* and /api/jobs/quality/*
- Engine API: /api/engines/status, /api/engines/{type}/{name}/enable|disable
- Removed duplicate endpoints
Quality System:
- QualityWorker orchestrates STT + Audio engines
- Generic result format for all quality engines
- Quality jobs table with database persistence
New Files:
- core/base_engine_manager.py - Shared manager logic
- core/base_engine_discovery.py - Shared discovery logic
- core/{tts,stt,text,audio}_engine_manager.py - Type-specific managers
- core/quality_worker.py - Quality job processor
- engines/base_quality_server.py - STT + Audio base class
- engines/base_text_server.py - Text processing base class
Frontend
Navigation System:
- store/navigationStore.ts - View state management
- pages/*.tsx - 6 main views + embedded views
- components/layout/NavigationSidebar.tsx - Icon sidebar
Engine Management:
- components/engines/EngineCard.tsx - Status card
- components/engines/EngineStatusBadge.tsx - Status indicator
- hooks/useEngineQueries.ts - React Query hooks
Performance:
- Virtual scrolling in SegmentList
- React.memo with custom comparisons
- immer for O(1) cache updates
- Stable references with useCallback
SSE Handlers:
- Split into 6 domain-specific hooks (2,555 LOC total)
  - useSSETTSHandlers - TTS job events
  - useSSEQualityHandlers - Quality analysis events
  - useSSESystemHandlers - Health, settings, pronunciation
  - useSSESegmentHandlers - Segment/chapter events
  - useSSEExportHandlers - Export events
  - useSSEEngineHandlers - Engine status events
Database
New Tables:
- pronunciation_rules - Rule storage with scope
- quality_jobs - Quality job queue
Engine Settings:
- settings.{type}.engines.{name}.enabled - Per-engine enable flag
Breaking Changes
API Changes
Removed Endpoints:
- /api/tts/engines/* - Use /api/engines/status instead
- /api/stt/* legacy endpoints - Use /api/quality/* and /api/jobs/quality/*
New Endpoints:
- GET /api/engines/status - All engines status
- POST /api/engines/{type}/{name}/enable - Enable engine
- POST /api/engines/{type}/{name}/disable - Disable engine
- POST /api/engines/{type}/{name}/start - Start engine
- POST /api/engines/{type}/{name}/stop - Stop engine
- POST /api/quality/analyze/segment - Analyze single segment
- POST /api/quality/analyze/chapter - Analyze chapter
- GET /api/jobs/quality/* - Quality job management
Engine Directory Structure
Engines moved to type-specific directories:
backend/engines/
├── tts/
│ ├── xtts/
│ ├── chatterbox/
│ └── kani/
├── stt/
│ └── whisper/
├── text_processing/
│ └── spacy/
└── audio_analysis/
└── silero-vad/
Response Model Changes
- engine_model_name replaces tts_model_name in discovery/managers
- All quality responses use the generic AnalysisResult format
Available Engines
TTS Engines
| Engine | Languages | Features | Python |
|---|---|---|---|
| XTTS v2 | 17 | Voice cloning, model hotswap | 3.10 |
| Chatterbox | 23 | Voice cloning, fast generation | 3.11 |
STT Engines
| Engine | Languages | Models | Python |
|---|---|---|---|
| Whisper | 12 | tiny, base, small, medium, large | 3.12 |
Text Processing Engines
| Engine | Languages | Features | Python |
|---|---|---|---|
| spaCy | 11 | MD models only, CPU-only | 3.12 |
Audio Analysis Engines
| Engine | Features | Python |
|---|---|---|
| Silero-VAD | Speech/silence detection, clipping, volume | 3.12 |
Known Issues
- Virtual scrolling may have slight visual jitter on rapid scroll
- Pronunciation rules with complex regex may impact generation speed
- Quality analysis may take some time for long chapters
For Developers
Adding Custom Engines
See the updated Engine Development Guide for complete documentation.
Quick Start:
- Choose engine type (TTS, STT, Text, Audio)
- Copy template: cp -r backend/engines/{type}/_template backend/engines/{type}/my_engine
- Implement required methods:
  - TTS: load_model(), generate_audio(), unload_model(), get_available_models()
  - STT/Audio: load_model(), analyze_audio(), unload_model(), get_available_models()
- Configure engine.yaml
- Create VENV and install dependencies
- Restart backend - engine appears automatically!
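The following sketches the TTS side of that contract. Only the four method names come from these notes. The parameters, return types and the TTSEngine base class are assumptions, and the template's real base class (BaseTTSServer) likely differs. SilentEngine is a toy that returns silence, so the interface can be exercised without a model.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class TTSEngine(ABC):
    """Hypothetical contract; the real template base class likely differs."""

    @abstractmethod
    def load_model(self, model_name: str) -> None: ...

    @abstractmethod
    def generate_audio(self, text: str, language: str) -> bytes: ...

    @abstractmethod
    def unload_model(self) -> None: ...

    @abstractmethod
    def get_available_models(self) -> list[str]: ...


class SilentEngine(TTSEngine):
    """Toy engine returning silent 16-bit mono PCM at 24 kHz."""

    SAMPLE_RATE = 24000

    def __init__(self) -> None:
        self.model: str | None = None

    def get_available_models(self) -> list[str]:
        return ["default"]

    def load_model(self, model_name: str) -> None:
        if model_name not in self.get_available_models():
            raise ValueError(f"unknown model: {model_name}")
        self.model = model_name

    def generate_audio(self, text: str, language: str) -> bytes:
        if self.model is None:
            raise RuntimeError("call load_model() first")
        # 50 ms of silence per word; each 16-bit sample is 2 bytes.
        samples = len(text.split()) * self.SAMPLE_RATE // 20
        return b"\x00\x00" * samples

    def unload_model(self) -> None:
        self.model = None


engine = SilentEngine()
engine.load_model("default")
audio = engine.generate_audio("Hello world", "en")
engine.unload_model()
print(len(audio))  # 4800
```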
Full Changelog
v0.3.0 (not released on GitHub)
- Quality Assurance with Whisper STT analysis
- Pronunciation Rules System
- Performance optimizations (virtual scrolling, immer)
- SSE handler refactoring (6 domain-specific hook...
New TTS engine architecture and improvements
Release v0.2.0 - Architecture & Performance Update
This is a major update to Audiobook Maker focusing on architecture improvements, real-time updates, and performance optimizations.
Highlights
Real-Time Updates & Performance
- Server-Sent Events (SSE) - Push-based real-time updates for instant UI feedback
- 99.5% Network Reduction - 2,000+ requests reduced to ~50 during 10-minute generation
- Zero Polling - Complete elimination of aggressive polling when SSE is active
- Instant Visual Feedback - Segments show status changes immediately without delays
Architecture Improvements
- Plug-and-Play Engine System - Add new TTS engines without touching backend code!
- Complete Dependency Isolation - Each engine runs in its own virtual environment
- Database-Backed Job Queue - Persistent job state survives backend restarts
- Worker-Queue Pattern - Background worker with graceful shutdown and recovery
- Job Management - Resume cancelled jobs, clear finished jobs, track progress
Code Quality & Standards
- 100% Pydantic Compliance - All API endpoints use validated response models
User Experience
- JobsPanel UI - Centralized job management with status tracking
- Queued Status - Visual feedback for segments waiting in job queue
- Segment Regeneration - Improved workflow with safety warnings
- Health Monitoring - Real-time backend health status via SSE
Technical Changes
Backend
Server-Sent Events:
- EventBroadcaster service with channel-based routing (jobs, health, speakers, settings)
- 25 event types for real-time updates across all entities
- TTS Worker emits events during generation
- Health broadcaster sends updates every 5s
Database & API:
- Database migration: renamed columns to use the tts_ prefix consistently
- All endpoints use Pydantic response models with automatic camelCase conversion
- Job persistence in the tts_jobs table (pending → running → completed/failed/cancelled)
- Segment cleanup on cancel/delete/crash (prevents orphaned segments)
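The job lifecycle can be sketched as a small transition table. The pending → running → completed/failed/cancelled path comes from these notes. Two transitions are assumptions based on the "Resume cancelled jobs" feature: pending → cancelled, and re-queuing a cancelled job on resume (cancelled → pending).

```python
from typing import Dict, Set

ALLOWED: Dict[str, Set[str]] = {
    "pending": {"running", "cancelled"},
    "running": {"completed", "failed", "cancelled"},
    "cancelled": {"pending"},  # assumption: resume puts the job back in the queue
    "completed": set(),
    "failed": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"invalid job transition: {current} -> {new}")
    return new


status = "pending"
for step in ("running", "cancelled", "pending", "running", "completed"):
    status = transition(status, step)
print(status)  # completed
```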
Engine Architecture:
- Auto-discovery scans backend/engines/ for new engines at backend startup
- Dynamic imports with optional engine.yaml metadata
- No hardcoded engine references in backend code
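Here is a sketch of directory-based discovery. It assumes an engine is a subdirectory containing a server.py, as in the template steps below, and that names starting with an underscore, such as _template, are skipped. Dynamic importing is left out. The function only reports which engines exist and whether each ships the optional engine.yaml.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def discover_engines(engines_dir: Path) -> dict[str, bool]:
    """Map each engine directory name to whether it ships an optional engine.yaml."""
    found: dict[str, bool] = {}
    for entry in sorted(engines_dir.iterdir()):
        # Skip plain files and private directories such as _template.
        if not entry.is_dir() or entry.name.startswith("_"):
            continue
        if (entry / "server.py").is_file():
            found[entry.name] = (entry / "engine.yaml").is_file()
    return found


# Build a throwaway layout: two engines and the template.
root = Path(tempfile.mkdtemp())
layout = {"xtts": ["server.py", "engine.yaml"], "chatterbox": ["server.py"], "_template": ["server.py"]}
for name, files in layout.items():
    (root / name).mkdir()
    for filename in files:
        (root / name / filename).write_text("")

found = discover_engines(root)
print(found)  # {'chatterbox': False, 'xtts': True}
```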
Frontend
React Query Optimizations:
- SSE-aware polling with automatic fallback (30s interval)
- Increased cache times for static data (10m engines, 5m models)
- Optimized invalidations (chapter-specific instead of project-wide)
Job Management:
- JobsPanel component with status badges and progress tracking
- Resume cancelled jobs with one click
- Clear finished jobs (completed/failed/cancelled)
- Active job monitoring with SSE (no polling)
Breaking Changes
Database Schema
- Migration required: engine → tts_engine, model_name → tts_model_name, speaker_name → tts_speaker_name
- Automatic migration runs on first startup
API Changes
- All responses now use camelCase (automatic conversion via Pydantic)
- Request bodies accept both snake_case and camelCase (backwards compatible)
For Developers: Extending with Custom TTS Engines
v0.2.0 introduces a new plug-and-play engine system!
You can now add your own TTS engines to Audiobook Maker without modifying backend code:
How Easy Is It?
- Copy the template:
  cp -r backend/engines/_template backend/engines/my_engine
- Implement 3 methods in server.py:
  - load_model() - Load your TTS model
  - generate_audio() - Synthesize text to audio
  - unload_model() - Free resources
- Configure engine.yaml:
  - Name, languages, capabilities
  - Models are auto-discovered from the models/ folder
- Create an isolated VENV:
  cd backend/engines/my_engine
  python -m venv venv
  venv\Scripts\activate
  pip install -r requirements.txt
- Restart backend → Engine appears in UI automatically! ✅
Supported TTS Engines
Out of the box:
- XTTS v2 (voice cloning, 17+ languages)
- Chatterbox Multilingual (experimental)
Want to add Piper? Coqui? Just implement the 3 methods!
📖 See ENGINE_DEVELOPMENT_GUIDE.md for complete documentation.
Known Issues
- SSE connection may disconnect on network changes (auto-reconnects within 3s)
- First SSE connection takes 100-200ms to establish
Full Changelog: v0.1.0...v0.2.0
Contributors
Built with Tauri 2.1, React 18, Python FastAPI, and powered by XTTS v2 for voice cloning.
Release v0.1.0 - Initial Release
This is the first public release of Audiobook Maker, a modern desktop application for creating audiobooks with AI-powered voice cloning.
Highlights
- Complete audiobook creation workflow with project, chapter, and segment management
- XTTS voice cloning from audio samples with optional GPU acceleration
- Multi-language support (17+ languages) and smart text segmentation
- Export to MP3, M4A, or WAV formats
- Markdown import for batch project creation
- Session recovery and drag & drop interface
Tech Stack
Built with Tauri 2.1, React 18, Python FastAPI, and XTTS v2 for voice cloning.
Getting Started
See the Quick Start Guide and Usage Guide in the README for installation and setup instructions.
Requirements
- Node.js 18+, Python 3.10+, Rust 1.70+, FFmpeg
- CUDA 11.8/12.1 (optional, for GPU)
What's Next
Planned features include additional TTS engines (OpenAI, ElevenLabs, Azure), Whisper integration, pronunciation dictionary, and audio effects. See the full Roadmap for details.
Full Changelog: Initial Release