Release v1.1.2 - Unified Error Handling

This release contains minor fixes and improves the maintainability and consistency of error responses for frontend i18n translation.

What's New

Unified ApplicationError Exception

All backend errors now use a single ApplicationError class instead of 20+ specialized exception subclasses.

Error Format

All errors follow the i18n-compatible format:

[ERROR_CODE]param1:value1;param2:value2
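As a rough illustration of the format above, the following Python sketch renders an error code and its parameters into that string. The class body and the `format_error` helper are assumptions made for this example; the release notes only state that a single `ApplicationError` class exists, not how it is implemented.

```python
def format_error(code: str, **params: object) -> str:
    """Render "[CODE]key:value;key:value" from a code and keyword parameters."""
    body = ";".join(f"{key}:{value}" for key, value in params.items())
    return f"[{code}]{body}"


class ApplicationError(Exception):
    """Hypothetical stand-in for the backend's unified error class."""

    def __init__(self, code: str, **params: object) -> None:
        self.code = code
        self.params = params
        # The exception message is the i18n-compatible string itself.
        super().__init__(format_error(code, **params))
```

Raising `ApplicationError("ENGINE_START_FAILED", error="Engine spacy:docker:local failed")` would then produce the message used in the parsing example further down.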

Bug Fixes

Frontend Error Message Parsing

Problem: Error messages containing colons in parameter values were incorrectly parsed, causing i18n translation failures.

Example:

[ENGINE_START_FAILED]error:Engine spacy:docker:local failed

The previous parser split on all colons, resulting in:

error → Engine spacy (truncated)

Fix: Split only on the first colon to preserve colons in values:

error → Engine spacy:docker:local failed (complete)

Affected file: frontend/src/utils/translateBackendError.ts
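The real fix lives in the TypeScript file named above; the Python sketch below only illustrates the same idea of splitting each `key:value` pair on the first colon. The function name and the regular expression are assumptions for this example.

```python
import re
from typing import Optional

# "[CODE]" followed by the parameter string.
ERROR_PATTERN = re.compile(r"^\[([A-Z0-9_]+)\](.*)$", re.DOTALL)


def parse_backend_error(message: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (code, params), or None if the message is not in the error format."""
    match = ERROR_PATTERN.match(message)
    if match is None:
        return None
    code, body = match.groups()
    params: dict[str, str] = {}
    for pair in filter(None, body.split(";")):
        # partition() splits on the FIRST colon only, so colons in the value survive.
        key, _, value = pair.partition(":")
        params[key] = value
    return code, params
```

A value that itself contains a semicolon would still be split incorrectly by this sketch; the release notes do not say how the frontend treats that case.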

Docker Image Update Cleanup Failure

Problem: When updating a Docker engine image, cleanup of the old (dangling) image failed with the following error:

409 Conflict: unable to delete image - image is being used by running container

Cause: The update flow attempted to remove the old image while the container was still running.

Fix: Stop the running engine container before attempting to remove the old dangling image.

Affected file: backend/api/engines.py
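The ordering described in the fix can be sketched as follows. The function name is hypothetical, and `client` is assumed to be shaped like a Docker SDK for Python `DockerClient` (`containers.get()`, `Container.stop()`, `images.remove()`). The notes do not say whether the stopped container is also removed before the image is deleted, so that step is left out here.

```python
def replace_engine_image(client, container_name: str, old_image_id: str) -> None:
    """Stop the engine container first, then delete the superseded image."""
    container = client.containers.get(container_name)
    # Stopping first avoids the 409 Conflict raised while the image is in use
    # by a running container.
    container.stop()
    client.images.remove(image=old_image_id)
```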

Upgrade Notes

No database migration required

Full Changelog: v1.1.1...v1.1.2

Release v1.1.1 - Remote Backend Connectivity Fix

This is a hotfix release that resolves a critical issue preventing the desktop app from connecting to remote Docker backends.

Bug Fix

Remote Backend Connection Failure

Symptoms:

Desktop app shows "Offline" when connecting to remote backends (e.g., http://server:8765)
Browser can reach the same backend URL successfully
No network traffic visible in Wireshark when app attempts connection

Root Cause:
The Content Security Policy (CSP) in tauri.conf.json was restricting HTTP connections to localhost:8765 and 127.0.0.1:8765 only. When users configured a remote backend URL, the CSP silently blocked all fetch requests before they could leave the app.

Fix:
Updated CSP directives to allow connections to any host:

Directive	Before	After
`connect-src`	`localhost:8765`, `127.0.0.1:8765`	`http://*:*`, `https://*:*`, `ws://*:*`, `wss://*:*`
`img-src`	`localhost:8765`, `127.0.0.1:8765`	`http://*:*`, `https://*:*`
`media-src`	`localhost:8765`, `127.0.0.1:8765`	`http://*:*`, `https://*:*`

This enables the desktop app to connect to:

Local backends (localhost, 127.0.0.1)
Remote backends on LAN (192.168.x.x, hostnames)
Remote backends over internet (any URL configured by user)
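To make the effect of the change concrete, here is a deliberately simplified Python approximation of how a source list decides whether a URL may be contacted. It is not the CSP matching algorithm a webview actually implements, and it assumes the new sources are wildcard entries of the form `http://*:*`; all function names are made up for this sketch.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def source_allows(source: str, url: str) -> bool:
    """Loosely check one source expression such as "localhost:8765" or "http://*:*"."""
    target = urlsplit(url)
    scheme, sep, host_port = source.rpartition("://")
    if sep and scheme != target.scheme:
        return False  # the source names a scheme and it differs
    host, _, port = host_port.partition(":")
    target_port = target.port or DEFAULT_PORTS.get(target.scheme)
    port_ok = port in ("", "*") or port == str(target_port)
    return fnmatch(target.hostname or "", host) and port_ok


def is_allowed(sources: list[str], url: str) -> bool:
    """A URL is allowed if any source in the directive matches it."""
    return any(source_allows(source, url) for source in sources)
```

With the old `connect-src` list, `is_allowed(["localhost:8765", "127.0.0.1:8765"], "http://server:8765")` is false, which mirrors the "Offline" symptom; with the wildcard list the same URL passes.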

Affected Versions

v1.1.0 - Remote backend connections blocked
v1.1.1 - Fixed

Upgrade Notes

No database migration required. Simply replace the desktop app with v1.1.1.

Acknowledgments

Thanks to the following community members for their contributions:

Bug Reports:

u/k8-bit - Remote backend connection issue with Wireshark captures
u/GaryDUnicorn - Remote backend connection issue with Wireshark captures

Security Feedback:

u/coder543 - Corrected misleading SSH key security description

Full Changelog: v1.1.0...v1.1.1

Release v1.1.0 - Docker-First Architecture & Online Engine Catalog

This release transforms Audiobook Maker into a Docker-first application. Engines are now distributed as prebuilt Docker images via the audiobook-maker-engines repository, eliminating complex Python environment setup for end users.

The Docker-First Vision

For End Users: Zero-Setup Engines

Before v1.1.0: Installing a TTS engine like XTTS meant:

Installing Python 3.10+
Creating virtual environments
Installing CUDA/cuDNN dependencies
Downloading models manually
Debugging dependency conflicts

With v1.1.0: Click "Install" in the UI:

Prebuilt Docker images from GitHub Container Registry
All dependencies bundled (Python, CUDA, models)
One-click installation and updates
Works on Windows, macOS, and Linux

Online Engine Catalog

All engines are distributed via the audiobook-maker-engines repository:

https://github.com/DigiJoe79/audiobook-maker-engines
├── catalog.yaml          # Engine metadata, versions, requirements
├── xtts/                 # XTTS v2 engine
│   └── Dockerfile
├── whisper/              # Whisper STT engine
│   └── Dockerfile
└── ...

How it works:

Backend syncs catalog.yaml on startup
UI shows available engines with version info
User clicks "Install" → pulls from ghcr.io/digijoe79/audiobook-maker-engines
Engine ready to use in seconds (depending on image size)
Update detection: compares local digest with registry

Image Registry: ghcr.io/digijoe79/audiobook-maker-engines/{engine}:{tag}
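The image naming scheme and the digest comparison behind update detection might look roughly like this in Python. Both function names are assumptions, and how the backend obtains the two digests is not described in these notes.

```python
from typing import Optional

REGISTRY = "ghcr.io/digijoe79/audiobook-maker-engines"


def image_reference(engine: str, tag: str) -> str:
    """Build "{registry}/{engine}:{tag}" as given in the Image Registry line."""
    return f"{REGISTRY}/{engine}:{tag}"


def update_available(local_digest: Optional[str], registry_digest: str) -> bool:
    """An update exists when an installed image's digest differs from the registry's."""
    if local_digest is None:
        return False  # not installed, so there is nothing to update
    return local_digest != registry_digest
```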

For Developers: Subprocess Mode

Subprocess execution (LocalRunner) remains available for engine development:

Clone the engines repo: git clone https://github.com/DigiJoe79/audiobook-maker-engines backend/engines
Create VENV, install dependencies, iterate on code
Test locally before building Docker images
Backend auto-discovers engines in backend/engines/

Use subprocess mode when:

Developing new engines
Debugging engine code
Contributing to audiobook-maker-engines

Deployment Scenarios

Recommended: Full Docker Stack

Run everything in Docker - backend and engines:

Using docker compose:

docker compose up -d

Or standalone docker run:

docker run -d \
  --name audiobook-maker-backend \
  -p 8765:8765 \
  --add-host=host.docker.internal:host-gateway \
  -e DOCKER_ENGINE_HOST=host.docker.internal \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v audiobook-data:/app/data \
  -v audiobook-media:/app/media \
  ghcr.io/digijoe79/audiobook-maker/backend:latest

services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
    container_name: audiobook-maker-backend  # Required name (orphan cleanup)
    ports:
      - "8765:8765"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./data:/app/data
      - ./media:/app/media
    environment:
      - DEFAULT_ENGINE_RUNNER=docker
      - DOCKER_ENGINE_HOST=host.docker.internal

Note: The container must be named audiobook-maker-backend. On startup, the backend stops orphaned engine containers (prefix audiobook-) from previous sessions. Using a different name will cause the backend to stop itself.
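The naming requirement is easier to see with a small sketch of the selection step. This is an assumed reconstruction in Python, not the backend's actual code, and the engine container name used in the usage note is hypothetical.

```python
BACKEND_CONTAINER = "audiobook-maker-backend"
ENGINE_PREFIX = "audiobook-"


def orphaned_engine_containers(running_names: list[str]) -> list[str]:
    """Pick containers with the engine prefix, sparing only the backend's own name."""
    return [
        name
        for name in running_names
        if name.startswith(ENGINE_PREFIX) and name != BACKEND_CONTAINER
    ]
```

If the backend container were started as, say, `audiobook-backend`, it would match the prefix without matching the exempt name and would end up in the list of containers to stop.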

Remote GPU Offloading

Run GPU-intensive engines on a dedicated server:

┌─────────────────────┐         SSH         ┌─────────────────────┐
│   Your PC (CPU)     │◄──────────────────► │  GPU Server         │
│   - Tauri App       │                     │  - Docker           │
│   - Backend         │                     │  - NVIDIA Runtime   │
│   - Non-GPU engines │                     │  - XTTS, Whisper    │
└─────────────────────┘                     └─────────────────────┘

Features:

SSH key auto-generation per host
GPU detection (NVIDIA runtime check)
Volume mounts for samples/models
Per-host engine installation

Developer Setup

For engine development, run the backend in a VENV:

cd backend
python -m venv venv
./venv/Scripts/activate  # Windows
pip install -r requirements.txt
python main.py

Then clone the engines repo and develop locally.

Engine Runner Architecture

Three runner types execute engines in different environments:

EngineRunner (ABC)
├── LocalRunner        - Subprocess in local VENV (developers)
├── DockerRunner       - Local Docker containers (recommended)
└── RemoteDockerRunner - Remote Docker via SSH (GPU offloading)

Configuration:

Variable	Default	Description
`DEFAULT_ENGINE_RUNNER`	`local`	Runner used by default: `local` or `docker`
`DOCKER_ENGINE_HOST`	`127.0.0.1`	Backend address for containers
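A minimal Python sketch of how the runner hierarchy and the two variables could fit together is shown below. Only the class names and the variable names with their defaults come from these notes; the `describe()` method, the `RUNNERS` mapping and both helper functions are placeholders invented for the example, and the remote runner is omitted because it is configured per host rather than through these variables.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping


class EngineRunner(ABC):
    """Common interface for all runners (method name is a placeholder)."""

    @abstractmethod
    def describe(self) -> str: ...


class LocalRunner(EngineRunner):
    def describe(self) -> str:
        return "subprocess in local VENV"


class DockerRunner(EngineRunner):
    def describe(self) -> str:
        return "local Docker container"


RUNNERS = {"local": LocalRunner, "docker": DockerRunner}


def default_runner(env: Mapping[str, str] = os.environ) -> EngineRunner:
    """Pick the runner named by DEFAULT_ENGINE_RUNNER, falling back to "local"."""
    name = env.get("DEFAULT_ENGINE_RUNNER", "local")
    if name not in RUNNERS:
        raise ValueError(f"unknown runner: {name!r}")
    return RUNNERS[name]()


def docker_engine_host(env: Mapping[str, str] = os.environ) -> str:
    """Address engine containers use to reach the backend."""
    return env.get("DOCKER_ENGINE_HOST", "127.0.0.1")
```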

Engine Hosts Management UI

New "Engine Hosts" tab in Settings manages all execution environments:

Host Type	Description
Subprocess	Built-in, for developers (VENV-based)
Docker Local	Local Docker daemon (recommended)
Docker Remote	Remote servers via SSH

Features:

Add/remove remote Docker hosts with SSH key wizard
Test connection with GPU detection
Volume configuration (samples, models paths)
SSH public key display with copy-to-clipboard
Per-host engine installation from online catalog
Automatic image update detection

REST API

Engine Host Endpoints

Method	Endpoint	Description
GET	`/engine-hosts`	List all hosts
GET	`/engine-hosts/{id}`	Get specific host
POST	`/engine-hosts`	Create new host
DELETE	`/engine-hosts/{id}`	Delete host
POST	`/engine-hosts/{id}/test`	Test connection + GPU detection
POST	`/engine-hosts/prepare`	Generate SSH key for new host
GET	`/engine-hosts/{id}/volumes`	Get volume configuration
POST	`/engine-hosts/{id}/volumes`	Set volume configuration
GET	`/engine-hosts/{id}/public-key`	Get SSH public key for host

Engine/Catalog Endpoints

Method	Endpoint	Description
POST	`/engines/catalog/sync`	Sync online catalog
POST	`/engines/{variant}/install`	Install Docker engine
DELETE	`/engines/{variant}/uninstall`	Uninstall Docker engine
GET	`/engines/{variant}/check-update`	Check for image updates
POST	`/engines/{variant}/pull-update`	Pull latest image
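For scripting against the engine endpoints, a request builder along these lines may be a useful starting point. It only constructs `urllib.request.Request` objects and sends nothing. The function name is hypothetical, the paths are taken from the table as-is relative to whatever base URL the caller passes (any additional path prefix is not stated here), and request bodies are left out because the notes do not describe them.

```python
from urllib.parse import quote
from urllib.request import Request

# action -> HTTP method, per the Engine/Catalog Endpoints table
ENGINE_ACTIONS = {
    "install": "POST",
    "uninstall": "DELETE",
    "check-update": "GET",
    "pull-update": "POST",
}


def engine_request(base_url: str, variant: str, action: str) -> Request:
    """Build (but do not send) a request for /engines/{variant}/{action}."""
    method = ENGINE_ACTIONS[action]
    # Percent-encode the variant so characters such as ":" stay inside one path segment.
    url = f"{base_url.rstrip('/')}/engines/{quote(variant, safe='')}/{action}"
    return Request(url, method=method)
```

For example, `engine_request("http://localhost:8765", "xtts", "check-update")` yields a GET request that could then be passed to `urllib.request.urlopen`.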

Database Schema

New engine_hosts table:

CREATE TABLE engine_hosts (
    host_id TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    host_type TEXT NOT NULL,        -- 'subprocess' | 'docker:local' | 'docker:remote'
    ssh_url TEXT,                   -- For remote: ssh://user@host
    is_available INTEGER DEFAULT 1,
    has_gpu INTEGER,                -- NULL=unknown, 0=no, 1=yes
    samples_path TEXT,              -- Volume mount for samples
    models_path TEXT,               -- Volume mount for models
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
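The schema can be tried out with Python's built-in `sqlite3` module. In the sketch below the helper names, the sample host values and the choice of `datetime.now().isoformat()` for the timestamp columns are assumptions; only the table definition itself comes from the release notes.

```python
import sqlite3
from datetime import datetime

SCHEMA = """
CREATE TABLE engine_hosts (
    host_id TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    host_type TEXT NOT NULL,        -- 'subprocess' | 'docker:local' | 'docker:remote'
    ssh_url TEXT,                   -- For remote: ssh://user@host
    is_available INTEGER DEFAULT 1,
    has_gpu INTEGER,                -- NULL=unknown, 0=no, 1=yes
    samples_path TEXT,
    models_path TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
"""


def add_remote_host(conn: sqlite3.Connection, host_id: str, name: str, ssh_url: str) -> None:
    """Insert a 'docker:remote' host; has_gpu stays NULL until a connection test runs."""
    now = datetime.now().isoformat()
    conn.execute(
        "INSERT INTO engine_hosts (host_id, display_name, host_type, ssh_url, created_at, updated_at) "
        "VALUES (?, ?, 'docker:remote', ?, ?, ?)",
        (host_id, name, ssh_url, now, now),
    )


def gpu_state(conn: sqlite3.Connection, host_id: str) -> str:
    """Translate the tri-state has_gpu column into a readable label."""
    row = conn.execute("SELECT has_gpu FROM engine_hosts WHERE host_id = ?", (host_id,)).fetchone()
    return {None: "unknown", 0: "no", 1: "yes"}[row[0]]
```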

Bug Fixes

SQLite WAL Mode Disabled

Disabled WAL journal mode for Docker volume mount compatibility. Database uses DELETE mode for network filesystem support.
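In Python's `sqlite3` module, selecting the DELETE journal mode is a single pragma. The helper below is a sketch of that step, not the backend's actual connection code; the function name is made up.

```python
import sqlite3


def open_database(path: str) -> sqlite3.Connection:
    """Open a database file and make sure it uses the DELETE journal mode."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is in effect after the call.
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    if mode.lower() != "delete":
        conn.close()
        raise RuntimeError(f"could not switch journal mode, still {mode!r}")
    return conn
```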

Technical Details

New Backend Modules

Module	Description
`core/engine_runner.py`	EngineRunner ABC + EngineEndpoint
`core/local_runner.py`	Subprocess runner (developer mode)
`core/docker_runner.py`	Local Docker runner
`core/remote_docker_runner.py`	SSH-based remote runner
`core/engine_runner_registry.py`	Central runner registry
`db/engine_host_repository.py`	Host CRUD operations
`api/engine_hosts.py`	Host REST API
`services/docker_service.py`	Docker SDK operations
`services/docker_discovery_service.py`	Catalog sync, discovery
`services/docker_host_monitor.py`	Host availability monitoring
`services/ssh_key_service.py`	SSH key management
`services/online_catalog_service.py`	Online catalog management

New Frontend Components

Component	Description
`EngineHostsTab.tsx`	Main host management UI
`HostEnginesSection.tsx`	Per-host engine list
`HostSettingsDialog.tsx`	Volume config + SSH key
`AddHostDialog.tsx`	Add remote host wizard
`AddImageDialog.tsx`	Install from catalog
`useSSEDockerHostHandlers.ts`	Host status SSE events

New Dependencies

docker>=7.0.0 - Docker SDK for Python

Tests

23 new tests covering:

EngineRunner abstraction (3)
LocalRunner (4)
DockerRunner (3)
RemoteDockerRunner (2)
EngineRunnerRegistry (6)
DockerHostRepository (3)
BaseEngineManager integration (2)

Migration Notes

Existing VENV-based installations continue to work
No database migration required
Docker mode is opt-in via DEFAULT_ENGINE_RUNNER=docker
Recommended: Switch to Docker for production use

Full Changelog: v1.0.2...v1.1.0

Engine Repository: https://github.com/DigiJoe79/audiobook-maker-engines

Release: v1.0.2 - VibeVoice TTS Engine, EPUB Import and Bugfixes

New Features

VibeVoice TTS Engine (Experimental)

Added Microsoft VibeVoice as a new TTS engine with voice cloning support.

Models:

VibeVoice-1.5B (~3GB VRAM) - Faster generation, up to 90 min audio
VibeVoice-7B (~18GB VRAM) - Highest quality, up to 45 min audio

Languages:

Stable: English (en), Chinese (zh)
Experimental: German (de), French (fr), Italian (it), Japanese (ja), Korean (ko), Dutch (nl), Polish (pl), Portuguese (pt), Spanish (es)

Requirements:

Python 3.12
PyTorch 2.9.1 with CUDA 13.0
Optional: Flash Attention 2 for ~2x faster inference and ~40% less VRAM

Optional Flash Attention 2 Setup (Windows):

The setup script asks whether to install Flash Attention 2. If you choose yes, it installs:

triton-windows - Triton compiler for Windows
Flash-Attention-2_for_Windows - Pre-built Windows wheels

Without Flash Attention, SDPA (Scaled Dot-Product Attention) is used as fallback with ~80% of the performance.

Voice Sample Recommendations for Voice Cloning with VibeVoice:

Aspect	Recommendation
Duration	10-60 seconds (10s minimum)
Format	WAV or MP3
Sample Rate	24kHz (VibeVoice native rate)
Quality	Clean, no background noise/music
Language	EN or ZH for best results

Best practices:

Use clean audio without background noise or music
Natural speaking pace (not too fast)
Consistent volume throughout
Avoid intro phrases like "Welcome to..." or "Hello..." (can cause artifacts)
The 7B model is more stable with fewer unexpected artifacts

EPUB Import

Added EPUB file support to the Import workflow. You can now import e-books directly alongside Markdown files.

Features:

Automatic EPUB-to-Markdown conversion
Chapter structure detection
Front matter filtering (skips cover, ToC, copyright pages)
Same preview/execute workflow as Markdown import

Dependencies: ebooklib, beautifulsoup4, markdownify

Contributed by @codesterribly

Bug Fixes

Text Segmentation: Fixed max_length parameter being ignored during text segmentation. Segments were incorrectly limited to 250 characters instead of the configured limit (e.g., 5000 for VibeVoice). Affected both Upload and Import workflows.
Segment Text Normalization: Newlines and extra whitespace are now removed when segments are created. This fixes issues with TTS engines (like VibeVoice) that interpret newlines as speaker turn boundaries, causing audio to be truncated.
Engine Enable/Disable: Fixed engines not being disabled when clicking the disable button. The enabled flag was written to a separate copy of the settings dictionary instead of the original, causing changes to be lost on save.
TTS Job Timestamps: Fixed job completion timestamps jumping by ~1 hour. SQLite's datetime('now') returns UTC without timezone info, but JavaScript interpreted it as local time. Now uses datetime.now().isoformat() for consistent local timestamps.
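The segment text normalization fix can be approximated in one line of Python. This is a sketch of the described behaviour (newlines and extra whitespace removed), with an assumed function name; the backend may handle edge cases differently.

```python
def normalize_segment_text(text: str) -> str:
    """Collapse newlines, tabs and repeated spaces into single spaces and trim the ends."""
    # str.split() with no arguments splits on any run of whitespace.
    return " ".join(text.split())
```

After this step, a segment no longer contains line breaks that an engine could misread as speaker turn boundaries.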

Improvements

API Types: Migrated frontend API types to OpenAPI-generated source for better type safety

Full Changelog: https://github.com/user/audiobook-maker/compare/v1.0.1...v1.0.2

Release v1.0.1 - Bug Fixes

Patch release with bug fixes and stability improvements.

Bug Fixes

Speaker sample paths: Changed from absolute to relative paths for portability. Existing speakers need to be re-added.
Speaker preview player: Fixed play/pause icon state sync
Windows server shutdown: Suppressed harmless ConnectionResetError on Ctrl+C
E2E tests: Made segment tests locale-agnostic (work in German and English)

Migration

Existing users with speaker samples must re-add their speakers after updating (delete and re-create).

Full Changelog: https://github.com/user/audiobook-maker/compare/v1.0.0...v1.0.1

Release v1.0.0 - Multi-Engine Architecture & Quality Assurance

This is a major release of Audiobook Maker featuring a complete multi-engine architecture, quality assurance system, performance optimizations, and modern UI navigation.

Highlights

Multi-Engine Architecture

4 Engine Types - TTS, STT, Text Processing, and Audio Analysis
Isolated Virtual Environments - Each engine runs in its own VENV (no dependency conflicts)
Engine Enable/Disable - Per-engine enabled flag with database persistence
Auto-Stop - Non-default engines automatically stop after 5 minutes of inactivity
Engine Monitoring UI - Real-time status, memory usage, and auto-stop countdown
Plug-and-Play - Add new engines without modifying backend code

Quality Assurance System

Whisper Integration - Automatic transcription analysis for quality verification
Silero-VAD - Audio quality analysis (speech ratio, silence detection, clipping)
Confidence Scoring - Detect low-quality or mispronounced segments (0-100%)
Issue Detection - Identify missing words, extra words, mispronunciations
Quality Status - Visual indicators (perfect/warning/defect) in segment list
Generic Quality Format - Engine-agnostic result format for UI rendering

Pronunciation Rules System

Pattern-Based Replacement - Simple text or regex patterns to fix mispronunciations
Scope-Based Rules - Global, engine-specific, project-specific, or project-engine
Priority System - Control rule application order
Live Preview - Test rules before applying
Import/Export - JSON format for rule sharing
Automatic Application - Rules applied during TTS generation

Modern Navigation System

Teams/Discord-Style UI - Icon-based sidebar with 6 views
Keyboard Shortcuts - Ctrl+1 through Ctrl+6 for view switching
Collapsible Sidebar - Ctrl+B to toggle
Event Log - Real-time SSE event monitoring in Monitoring view

Performance Optimizations

95% DOM Reduction - Virtual scrolling with @tanstack/react-virtual
99% Fewer Re-renders - React.memo optimization for list items
95% Faster Event Processing - immer integration (1-2ms vs 20-30ms)
60fps Smooth Scrolling - Even with 400+ segments

New Features

Multi-Engine Architecture

Engine Types:

Type	Purpose	Engines Included
TTS	Text-to-Speech	XTTS v2, Chatterbox
STT	Speech-to-Text	OpenAI Whisper (5 model sizes)
Text Processing	Text Segmentation	spaCy (11 languages)
Audio Analysis	Audio Quality	Silero-VAD

Base Server Hierarchy:

BaseEngineServer (Generic)
├── BaseTTSServer (adds /generate endpoint)
├── BaseQualityServer (adds /analyze endpoint)
└── BaseTextServer (adds /segment endpoint)

Engine Management:

Enable/disable engines via Settings or API
Auto-stop after 5 minutes inactivity (configurable)
Real-time status monitoring (disabled/stopped/starting/running/stopping)
Memory usage tracking
Start/stop buttons in Monitoring view
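The auto-stop rule (non-default engines stop after 5 minutes of inactivity) and the countdown shown in the UI can be sketched as two small Python functions. Names and signatures are assumptions for illustration.

```python
from datetime import datetime, timedelta

AUTO_STOP_AFTER = timedelta(minutes=5)  # default from the notes; configurable


def should_auto_stop(last_activity: datetime, now: datetime, is_default: bool,
                     timeout: timedelta = AUTO_STOP_AFTER) -> bool:
    """Default engines keep running; others stop once idle for the full timeout."""
    if is_default:
        return False
    return now - last_activity >= timeout


def seconds_until_stop(last_activity: datetime, now: datetime,
                       timeout: timedelta = AUTO_STOP_AFTER) -> int:
    """Remaining idle time in whole seconds, never negative."""
    remaining = timeout - (now - last_activity)
    return max(0, int(remaining.total_seconds()))
```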

Quality Analysis System

Unified Quality Format:

All engines return generic AnalysisResult with qualityScore, qualityStatus, details
Frontend renders any engine's results dynamically
Supports fields (key-value pairs) and infoBlocks (titled lists)

Quality Levels:

perfect (score >= 85) - Green indicator
warning (score 70-84) - Yellow indicator
defect (score < 70) - Red indicator
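Expressed as code, the thresholds map a score to a status like this. The function name is hypothetical, and treating fractional scores between 84 and 85 as `warning` is an assumption, since the notes list whole-number ranges only.

```python
def quality_status(score: float) -> str:
    """Map a 0-100 quality score to perfect / warning / defect."""
    if score >= 85:
        return "perfect"
    if score >= 70:
        return "warning"
    return "defect"
```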

Analysis Types:

STT Analysis (Whisper): Transcription comparison, word-level confidence, text alignment
Audio Analysis (Silero-VAD): Speech ratio, silence detection, clipping detection, volume analysis

Navigation System

6 Views:

View	Shortcut	Description
Main	Ctrl+1	Audiobook editing with AudioPlayer
Import	Ctrl+2	Markdown import workflow
Speakers	Ctrl+3	Speaker management
Pronunciation	Ctrl+4	Pronunciation rules
Monitoring	Ctrl+5	Jobs, Quality Jobs, Event Log, Engines
Settings	Ctrl+6	5 settings tabs

Additional Shortcuts:

Ctrl+B - Toggle sidebar
Ctrl+[ - Go back to previous view
Mac: Cmd instead of Ctrl

Pronunciation Rules

Create text transformation rules for mispronounced words
Support regex and simple text patterns
Scope system: global → engine → project → project-engine
Priority ordering within scopes
Active/inactive toggle
Import/export as JSON
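How such rules might be applied to a text is sketched below in Python. The `Rule` fields, the decision to apply every active rule in scope order (rather than letting a narrower scope override a broader one) and the convention that lower priority numbers run first are all assumptions; the notes describe the concepts but not these details.

```python
import re
from dataclasses import dataclass

SCOPE_ORDER = ["global", "engine", "project", "project-engine"]


@dataclass
class Rule:
    pattern: str
    replacement: str
    is_regex: bool = False
    scope: str = "global"
    priority: int = 0
    active: bool = True


def apply_rules(text: str, rules: list[Rule]) -> str:
    """Apply active rules ordered by scope, then by priority within a scope."""
    ordered = sorted(
        (rule for rule in rules if rule.active),
        key=lambda rule: (SCOPE_ORDER.index(rule.scope), rule.priority),
    )
    for rule in ordered:
        if rule.is_regex:
            text = re.sub(rule.pattern, rule.replacement, text)
        else:
            text = text.replace(rule.pattern, rule.replacement)
    return text
```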

Technical Changes

Backend

Multi-Engine System:

4 Engine Managers inheriting from BaseEngineManager
Auto-discovery per engine type from backend/engines/{type}/
Unified /models endpoint returning ModelInfo objects
Activity tracking with timestamps for auto-stop
Engine enable/disable persisted in settings DB

API Consolidation:

Jobs API: /api/jobs/tts/* and /api/jobs/quality/*
Engine API: /api/engines/status, /api/engines/{type}/{name}/enable|disable
Removed duplicate endpoints

Quality System:

QualityWorker orchestrates STT + Audio engines
Generic result format for all quality engines
Quality jobs table with database persistence

New Files:

core/base_engine_manager.py - Shared manager logic
core/base_engine_discovery.py - Shared discovery logic
core/{tts,stt,text,audio}_engine_manager.py - Type-specific managers
core/quality_worker.py - Quality job processor
engines/base_quality_server.py - STT + Audio base class
engines/base_text_server.py - Text processing base class

Frontend

Navigation System:

store/navigationStore.ts - View state management
pages/*.tsx - 6 main views + embedded views
components/layout/NavigationSidebar.tsx - Icon sidebar

Engine Management:

components/engines/EngineCard.tsx - Status card
components/engines/EngineStatusBadge.tsx - Status indicator
hooks/useEngineQueries.ts - React Query hooks

Performance:

Virtual scrolling in SegmentList
React.memo with custom comparisons
immer for O(1) cache updates
Stable references with useCallback

SSE Handlers:

Split into 6 domain-specific hooks (2,555 LOC total)
useSSETTSHandlers - TTS job events
useSSEQualityHandlers - Quality analysis events
useSSESystemHandlers - Health, settings, pronunciation
useSSESegmentHandlers - Segment/chapter events
useSSEExportHandlers - Export events
useSSEEngineHandlers - Engine status events

Database

New Tables:

pronunciation_rules - Rule storage with scope
quality_jobs - Quality job queue

Engine Settings:

settings.{type}.engines.{name}.enabled - Per-engine enable flag

Breaking Changes

API Changes

Removed Endpoints:

/api/tts/engines/* - Use /api/engines/status instead
/api/stt/* legacy endpoints - Use /api/quality/* and /api/jobs/quality/*

New Endpoints:

GET /api/engines/status - All engines status
POST /api/engines/{type}/{name}/enable - Enable engine
POST /api/engines/{type}/{name}/disable - Disable engine
POST /api/engines/{type}/{name}/start - Start engine
POST /api/engines/{type}/{name}/stop - Stop engine
POST /api/quality/analyze/segment - Analyze single segment
POST /api/quality/analyze/chapter - Analyze chapter
GET /api/jobs/quality/* - Quality job management

Engine Directory Structure

Engines moved to type-specific directories:

backend/engines/
├── tts/
│   ├── xtts/
│   ├── chatterbox/
│   └── kani/
├── stt/
│   └── whisper/
├── text_processing/
│   └── spacy/
└── audio_analysis/
    └── silero-vad/

Response Model Changes

engine_model_name replaces tts_model_name in discovery/managers
All quality responses use generic AnalysisResult format

Available Engines

TTS Engines

Engine	Languages	Features	Python
XTTS v2	17	Voice cloning, model hotswap	3.10
Chatterbox	23	Voice cloning, fast generation	3.11

STT Engines

Engine	Languages	Models	Python
Whisper	12	tiny, base, small, medium, large	3.12

Text Processing Engines

Engine	Languages	Features	Python
spaCy	11	MD models only, CPU-only	3.12

Audio Analysis Engines

Engine	Features	Python
Silero-VAD	Speech/silence detection, clipping, volume	3.12

Known Issues

Virtual scrolling may have slight visual jitter on rapid scroll
Pronunciation rules with complex regex may impact generation speed
Quality analysis may take some time for long chapters

For Developers

Adding Custom Engines

See the updated Engine Development Guide for complete documentation.

Quick Start:

Choose engine type (TTS, STT, Text, Audio)
Copy template: cp -r backend/engines/{type}/_template backend/engines/{type}/my_engine
Implement required methods:
- TTS: load_model(), generate_audio(), unload_model(), get_available_models()
- STT/Audio: load_model(), analyze_audio(), unload_model(), get_available_models()
Configure engine.yaml
Create VENV and install dependencies
Restart backend - engine appears automatically!
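To give a feel for the four TTS methods, here is a standalone Python toy engine that returns silence. A real engine would build on the project's base server classes, whose signatures are not shown in these notes; the class name, parameter lists and the `_silence` helper are therefore invented for illustration.

```python
import io
import wave
from typing import Optional


def _silence(seconds: float, sample_rate: int = 24000) -> bytes:
    """Return a mono 16-bit WAV file containing silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


class MyTTSEngine:
    """Toy engine implementing the four method names listed for TTS engines."""

    def __init__(self) -> None:
        self.model: Optional[str] = None

    def get_available_models(self) -> list[str]:
        return ["default"]

    def load_model(self, model_name: str) -> None:
        self.model = model_name  # a real engine would load weights here

    def generate_audio(self, text: str, language: str) -> bytes:
        if self.model is None:
            raise RuntimeError("load_model() must be called first")
        # Placeholder synthesis: 50 ms of silence per character.
        return _silence(0.05 * len(text))

    def unload_model(self) -> None:
        self.model = None  # a real engine would free GPU/CPU memory here
```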

Full Changelog

v0.3.0 (not released on GitHub)

Quality Assurance with Whisper STT analysis
Pronunciation Rules System
Performance optimizations (virtual scrolling, immer)
SSE handler refactoring (6 domain-specific hook...

Release v0.2.0 - Architecture & Performance Update

This is a major update to Audiobook Maker focusing on architecture improvements, real-time updates, and performance optimizations.

Highlights

Real-Time Updates & Performance

Server-Sent Events (SSE) - Push-based real-time updates for instant UI feedback
99.5% Network Reduction - 2,000+ requests reduced to ~50 during 10-minute generation
Zero Polling - Complete elimination of aggressive polling when SSE is active
Instant Visual Feedback - Segments show status changes immediately without delays

Architecture Improvements

Plug-and-Play Engine System - Add new TTS engines without touching backend code!
Complete Dependency Isolation - Each engine runs in its own virtual environment
Database-Backed Job Queue - Persistent job state survives backend restarts
Worker-Queue Pattern - Background worker with graceful shutdown and recovery
Job Management - Resume cancelled jobs, clear finished jobs, track progress

Code Quality & Standards

100% Pydantic Compliance - All API endpoints use validated response models

User Experience

JobsPanel UI - Centralized job management with status tracking
Queued Status - Visual feedback for segments waiting in job queue
Segment Regeneration - Improved workflow with safety warnings
Health Monitoring - Real-time backend health status via SSE

Technical Changes

Backend

Server-Sent Events:

EventBroadcaster service with channel-based routing (jobs, health, speakers, settings)
25 event types for real-time updates across all entities
TTS Worker emits events during generation
Health broadcaster sends updates every 5s
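Channel-based routing of Server-Sent Events can be illustrated with a synchronous, in-memory Python sketch. The real `EventBroadcaster` is presumably asynchronous and its API is not documented here, so the method names, the callback-based subscription and the event type used in the usage note are assumptions. Only the `event:`/`data:` wire format is standard SSE.

```python
import json
from collections import defaultdict
from typing import Callable


def format_sse(event_type: str, payload: dict) -> str:
    """Serialize one event in the SSE wire format (terminated by a blank line)."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"


class EventBroadcaster:
    """Deliver events only to subscribers of the matching channel."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers[channel].append(callback)

    def broadcast(self, channel: str, event_type: str, payload: dict) -> int:
        """Send to every subscriber of the channel and return how many received it."""
        message = format_sse(event_type, payload)
        receivers = self._subscribers[channel]
        for callback in receivers:
            callback(message)
        return len(receivers)
```

A subscriber on the `jobs` channel would receive a `job.progress` event broadcast there, while a broadcast on `health` would not reach it.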

Database & API:

Database migration: renamed columns to use tts_ prefix consistently
All endpoints use Pydantic response models with automatic camelCase conversion
Job persistence in tts_jobs table (pending → running → completed/failed/cancelled)
Segment cleanup on cancel/delete/crash (prevents orphaned segments)

Engine Architecture:

Auto-discovery scans backend/engines/ for new engines at backend startup
Dynamic imports with optional engine.yaml metadata
No hardcoded engine references in backend code

Frontend

React Query Optimizations:

SSE-aware polling with automatic fallback (30s interval)
Increased cache times for static data (10m engines, 5m models)
Optimized invalidations (chapter-specific instead of project-wide)

Job Management:

JobsPanel component with status badges and progress tracking
Resume cancelled jobs with one click
Clear finished jobs (completed/failed/cancelled)
Active job monitoring with SSE (no polling)

Breaking Changes

Database Schema

Migration required: engine → tts_engine, model_name → tts_model_name, speaker_name → tts_speaker_name
Automatic migration runs on first startup

API Changes

All responses now use camelCase (automatic conversion via Pydantic)
Request bodies accept both snake_case and camelCase (backwards compatible)
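The naming conversion itself is simple to show in plain Python. The project performs it through Pydantic, which this sketch does not use; the three helper names are assumptions, and the key names in the usage note are only examples.

```python
import re


def to_camel(name: str) -> str:
    """snake_case -> camelCase, as used in responses."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_snake(name: str) -> str:
    """camelCase -> snake_case; names already in snake_case pass through unchanged."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def normalize_request_keys(body: dict) -> dict:
    """Accept either naming style in a request body by normalizing keys to snake_case."""
    return {to_snake(key): value for key, value in body.items()}
```

For example, `to_camel("tts_model_name")` gives `ttsModelName`, and a request body mixing `ttsEngine` with `tts_model_name` normalizes to snake_case keys throughout.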

For Developers: Extending with Custom TTS Engines

v0.2.0 introduces a new plug-and-play engine system!

You can now add your own TTS engines to Audiobook Maker without modifying backend code:

How Easy Is It?

Copy the template:

cp -r backend/engines/_template backend/engines/my_engine

Implement 3 methods in server.py:
- load_model() - Load your TTS model
- generate_audio() - Synthesize text to audio
- unload_model() - Free resources
Configure engine.yaml:
- Name, languages, capabilities
- Models are auto-discovered from models/ folder

Create isolated VENV:

cd backend/engines/my_engine
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Restart backend → Engine appears in UI automatically! ✅

Supported TTS Engines

Out of the box:

XTTS v2 (voice cloning, 17+ languages)
Chatterbox Multilingual (experimental)

Want to add Piper? Coqui? Just implement the 3 methods!

📖 See ENGINE_DEVELOPMENT_GUIDE.md for complete documentation.

Known Issues

SSE connection may disconnect on network changes (auto-reconnects within 3s)
First SSE connection takes 100-200ms to establish

Full Changelog: v0.1.0...v0.2.0

Contributors

Built with Tauri 2.1, React 18, Python FastAPI, and powered by XTTS v2 for voice cloning.

Release v0.1.0 - Initial Release

This is the first public release of Audiobook Maker, a modern desktop application for creating audiobooks with AI-powered voice cloning.

Highlights

Complete audiobook creation workflow with project, chapter, and segment management
XTTS voice cloning from audio samples with optional GPU acceleration
Multi-language support (17+ languages) and smart text segmentation
Export to MP3, M4A, or WAV formats
Markdown import for batch project creation
Session recovery and drag & drop interface

Tech Stack

Built with Tauri 2.1, React 18, Python FastAPI, and XTTS v2 for voice cloning.

Getting Started

See the Quick Start Guide and Usage Guide in the README for installation and setup instructions.

Requirements

Node.js 18+, Python 3.10+, Rust 1.70+, FFmpeg
CUDA 11.8/12.1 (optional, for GPU)

What's Next

Planned features include additional TTS engines (OpenAI, ElevenLabs, Azure), Whisper integration, pronunciation dictionary, and audio effects. See the full Roadmap for details.

Full Changelog: Initial Release

Releases: DigiJoe79/AudioBook-Maker
