Releases: DigiJoe79/AudioBook-Maker

Release v1.1.2 - Unified Error Handling

04 Jan 16:56

This release contains minor fixes and improves the maintainability and consistency of error responses for frontend i18n translation.

What's New

Unified ApplicationError Exception

All backend errors now use a single ApplicationError class instead of 20+ specialized exception subclasses.

Error Format

All errors follow the i18n-compatible format:

[ERROR_CODE]param1:value1;param2:value2

Bug Fixes

Frontend Error Message Parsing

Problem: Error messages containing colons in parameter values were incorrectly parsed, causing i18n translation failures.

Example:

[ENGINE_START_FAILED]error:Engine spacy:docker:local failed

The previous parser split on all colons, resulting in:

  • error → Engine spacy (truncated)

Fix: Split only on the first colon to preserve colons in values:

  • error → Engine spacy:docker:local failed (complete)

Affected file: frontend/src/utils/translateBackendError.ts
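
The parsing rule can be sketched in a few lines. This is a minimal Python illustration of the error format and the first-colon split, not the project's TypeScript code in translateBackendError.ts; the function name parse_backend_error and the exact character set allowed in error codes are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# "[ERROR_CODE]" followed by an optional "key:value;key:value" string.
_ERROR_RE = re.compile(r"^\[([A-Z0-9_]+)\](.*)$", re.DOTALL)


def parse_backend_error(message: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (code, params), or None if the message is not in the format."""
    match = _ERROR_RE.match(message)
    if match is None:
        return None
    code, raw_params = match.groups()
    params: Dict[str, str] = {}
    for pair in raw_params.split(";"):
        if not pair:
            continue
        # Split on the FIRST colon only, so values may contain colons.
        key, sep, value = pair.partition(":")
        if sep:
            params[key] = value
    return code, params


code, params = parse_backend_error(
    "[ENGINE_START_FAILED]error:Engine spacy:docker:local failed"
)
print(code, params)
```

Values that contain a semicolon would still be split by this sketch; the release notes do not say how the real parser treats that case.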

Docker Image Update Cleanup Failure

Problem: When updating a Docker engine image, the cleanup of the old (dangling) image failed with error:

409 Conflict: unable to delete image - image is being used by running container

Cause: The update flow attempted to remove the old image while the container was still running.

Fix: Stop the running engine container before attempting to remove the old dangling image.

Affected file: backend/api/engines.py

Upgrade Notes

  • No database migration required

Full Changelog: v1.1.1...v1.1.2

Release v1.1.1 - Remote Backend Connectivity Fix

03 Jan 09:17

This is a hotfix release that resolves a critical issue preventing the desktop app from connecting to remote Docker backends.

Bug Fix

Remote Backend Connection Failure

Symptoms:

  • Desktop app shows "Offline" when connecting to remote backends (e.g., http://server:8765)
  • Browser can reach the same backend URL successfully
  • No network traffic visible in Wireshark when app attempts connection

Root Cause:
The Content Security Policy (CSP) in tauri.conf.json was restricting HTTP connections to localhost:8765 and 127.0.0.1:8765 only. When users configured a remote backend URL, the CSP silently blocked all fetch requests before they could leave the app.

Fix:
Updated CSP directives to allow connections to any host:

| Directive | Before | After |
|---|---|---|
| connect-src | localhost:8765, 127.0.0.1:8765 | http://*:*, https://*:*, ws://*:*, wss://*:* |
| img-src | localhost:8765, 127.0.0.1:8765 | http://*:*, https://*:* |
| media-src | localhost:8765, 127.0.0.1:8765 | http://*:*, https://*:* |

This enables the desktop app to connect to:

  • Local backends (localhost, 127.0.0.1)
  • Remote backends on LAN (192.168.x.x, hostnames)
  • Remote backends over internet (any URL configured by user)

Affected Versions

  • v1.1.0 - Remote backend connections blocked
  • v1.1.1 - Fixed

Upgrade Notes

No database migration required. Simply replace the desktop app with v1.1.1.

Acknowledgments

Thanks to the following community members for their contributions:

Bug Reports:

  • u/k8-bit - Remote backend connection issue with Wireshark captures
  • u/GaryDUnicorn - Remote backend connection issue with Wireshark captures

Security Feedback:

  • u/coder543 - Corrected misleading SSH key security description

Full Changelog: v1.1.0...v1.1.1

Docker-First Architecture & Online Engine Catalog

02 Jan 23:13

Release v1.1.0 - Docker-First Architecture & Online Engine Catalog

This release transforms Audiobook Maker into a Docker-first application. Engines are now distributed as prebuilt Docker images via the audiobook-maker-engines repository, eliminating complex Python environment setup for end users.

The Docker-First Vision

For End Users: Zero-Setup Engines

Before v1.1.0: Installing a TTS engine like XTTS meant:

  • Installing Python 3.10+
  • Creating virtual environments
  • Installing CUDA/cuDNN dependencies
  • Downloading models manually
  • Debugging dependency conflicts

With v1.1.0: Click "Install" in the UI:

  • Prebuilt Docker images from GitHub Container Registry
  • All dependencies bundled (Python, CUDA, models)
  • One-click installation and updates
  • Works on Windows, macOS, and Linux

Online Engine Catalog

All engines are distributed via the audiobook-maker-engines repository:

https://github.com/DigiJoe79/audiobook-maker-engines
├── catalog.yaml          # Engine metadata, versions, requirements
├── xtts/                 # XTTS v2 engine
│   └── Dockerfile
├── whisper/              # Whisper STT engine
│   └── Dockerfile
└── ...

How it works:

  1. Backend syncs catalog.yaml on startup
  2. UI shows available engines with version info
  3. User clicks "Install" → pulls from ghcr.io/digijoe79/audiobook-maker-engines
  4. Engine ready to use in seconds (depending on image size)
  5. Update detection: compares local digest with registry

Image Registry: ghcr.io/digijoe79/audiobook-maker-engines/{engine}:{tag}
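
The image naming scheme and the digest comparison from step 5 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and digests are treated as opaque strings (for example "sha256:..."), since the release notes do not describe how they are fetched.

```python
from typing import Optional

REGISTRY = "ghcr.io/digijoe79/audiobook-maker-engines"


def image_reference(engine: str, tag: str) -> str:
    """Build the full image reference for an engine and tag."""
    return f"{REGISTRY}/{engine}:{tag}"


def update_available(local_digest: Optional[str], registry_digest: str) -> bool:
    """An update exists when the registry digest differs from the local one.

    A missing local digest (image not installed) also counts as different.
    """
    return local_digest != registry_digest


print(image_reference("xtts", "latest"))
print(update_available("sha256:aaa", "sha256:bbb"))
```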

For Developers: Subprocess Mode

Subprocess execution (LocalRunner) remains available for engine development:

  • Clone the engines repo: git clone https://github.com/DigiJoe79/audiobook-maker-engines backend/engines
  • Create VENV, install dependencies, iterate on code
  • Test locally before building Docker images
  • Backend auto-discovers engines in backend/engines/

Use subprocess mode when:

  • Developing new engines
  • Debugging engine code
  • Contributing to audiobook-maker-engines

Deployment Scenarios

Recommended: Full Docker Stack

Run everything in Docker - backend and engines:

Using docker compose:

docker compose up -d

Or standalone docker run:

docker run -d \
  --name audiobook-maker-backend \
  -p 8765:8765 \
  --add-host=host.docker.internal:host-gateway \
  -e DOCKER_ENGINE_HOST=host.docker.internal \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v audiobook-data:/app/data \
  -v audiobook-media:/app/media \
  ghcr.io/digijoe79/audiobook-maker/backend:latest

Compose file:

services:
  backend:
    build:
      context: .
      dockerfile: backend/Dockerfile
    container_name: audiobook-maker-backend  # Required name (orphan cleanup)
    ports:
      - "8765:8765"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./data:/app/data
      - ./media:/app/media
    environment:
      - DEFAULT_ENGINE_RUNNER=docker
      - DOCKER_ENGINE_HOST=host.docker.internal

Note: The container must be named audiobook-maker-backend. On startup, the backend stops orphaned engine containers (prefix audiobook-) from previous sessions. Using a different name will cause the backend to stop itself.
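
Why the name matters can be illustrated with a small sketch of prefix-based orphan selection. It assumes the backend protects itself by excluding its own expected name from the cleanup; the function name and the engine container names below are hypothetical.

```python
from typing import List

BACKEND_CONTAINER_NAME = "audiobook-maker-backend"
ENGINE_PREFIX = "audiobook-"


def find_orphaned_engines(running_names: List[str]) -> List[str]:
    """Select containers to stop: everything with the prefix except the backend."""
    return [
        name
        for name in running_names
        if name.startswith(ENGINE_PREFIX) and name != BACKEND_CONTAINER_NAME
    ]


# Correctly named backend: only engine containers are selected.
print(find_orphaned_engines(
    ["audiobook-maker-backend", "audiobook-xtts", "audiobook-whisper", "postgres"]
))

# Misnamed backend: it matches the prefix and would be stopped as well.
print(find_orphaned_engines(["audiobook-backend", "audiobook-xtts"]))
```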

Remote GPU Offloading

Run GPU-intensive engines on a dedicated server:

┌─────────────────────┐         SSH         ┌─────────────────────┐
│   Your PC (CPU)     │◄──────────────────► │  GPU Server         │
│   - Tauri App       │                     │  - Docker           │
│   - Backend         │                     │  - NVIDIA Runtime   │
│   - Non-GPU engines │                     │  - XTTS, Whisper    │
└─────────────────────┘                     └─────────────────────┘

Features:

  • SSH key auto-generation per host
  • GPU detection (NVIDIA runtime check)
  • Volume mounts for samples/models
  • Per-host engine installation

Developer Setup

For engine development, run the backend in a VENV:

cd backend
python -m venv venv
./venv/Scripts/activate  # Windows
pip install -r requirements.txt
python main.py

Then clone the engines repo and develop locally.

Engine Runner Architecture

Three runner types execute engines in different environments:

EngineRunner (ABC)
├── LocalRunner        - Subprocess in local VENV (developers)
├── DockerRunner       - Local Docker containers (recommended)
└── RemoteDockerRunner - Remote Docker via SSH (GPU offloading)
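
A rough sketch of what such an abstraction could look like is shown below. Only the names EngineRunner and EngineEndpoint come from the release notes; the method names, the endpoint fields, the port number, and the in-memory stand-in class are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EngineEndpoint:
    """Where a started engine can be reached (fields are assumptions)."""
    host: str
    port: int


class EngineRunner(ABC):
    """Common interface for all runners; method names are hypothetical."""

    @abstractmethod
    def start(self, engine_name: str) -> EngineEndpoint:
        """Start the engine and return its endpoint."""

    @abstractmethod
    def stop(self, engine_name: str) -> None:
        """Stop the engine."""


class InMemoryRunner(EngineRunner):
    """Toy stand-in for LocalRunner, DockerRunner, or RemoteDockerRunner."""

    def __init__(self, host: str, first_port: int) -> None:
        self.host = host
        self.next_port = first_port
        self.running: Dict[str, EngineEndpoint] = {}

    def start(self, engine_name: str) -> EngineEndpoint:
        if engine_name not in self.running:
            self.running[engine_name] = EngineEndpoint(self.host, self.next_port)
            self.next_port += 1
        return self.running[engine_name]

    def stop(self, engine_name: str) -> None:
        self.running.pop(engine_name, None)


runner = InMemoryRunner("127.0.0.1", 8766)
endpoint = runner.start("xtts")
print(endpoint)
```

Callers that only depend on the abstract interface can switch between subprocess, local Docker, and remote Docker execution without changes.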

Configuration:

| Variable | Default | Description |
|---|---|---|
| DEFAULT_ENGINE_RUNNER | local | Default: local or docker |
| DOCKER_ENGINE_HOST | 127.0.0.1 | Backend address for containers |
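
Reading these two variables with their documented defaults might look like the following sketch. The function name, the returned dictionary keys, and the validation step are assumptions; the release notes only specify the variable names and defaults.

```python
import os
from typing import Dict, Mapping

VALID_RUNNERS = ("local", "docker")


def load_runner_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the runner configuration, falling back to the documented defaults."""
    runner = env.get("DEFAULT_ENGINE_RUNNER", "local")
    if runner not in VALID_RUNNERS:
        raise ValueError(f"DEFAULT_ENGINE_RUNNER must be one of {VALID_RUNNERS}")
    return {
        "default_engine_runner": runner,
        "docker_engine_host": env.get("DOCKER_ENGINE_HOST", "127.0.0.1"),
    }


print(load_runner_settings({}))
print(load_runner_settings({
    "DEFAULT_ENGINE_RUNNER": "docker",
    "DOCKER_ENGINE_HOST": "host.docker.internal",
}))
```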

Engine Hosts Management UI

New "Engine Hosts" tab in Settings manages all execution environments:

| Host Type | Description |
|---|---|
| Subprocess | Built-in, for developers (VENV-based) |
| Docker Local | Local Docker daemon (recommended) |
| Docker Remote | Remote servers via SSH |

Features:

  • Add/remove remote Docker hosts with SSH key wizard
  • Test connection with GPU detection
  • Volume configuration (samples, models paths)
  • SSH public key display with copy-to-clipboard
  • Per-host engine installation from online catalog
  • Automatic image update detection

REST API

Engine Host Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /engine-hosts | List all hosts |
| GET | /engine-hosts/{id} | Get specific host |
| POST | /engine-hosts | Create new host |
| DELETE | /engine-hosts/{id} | Delete host |
| POST | /engine-hosts/{id}/test | Test connection + GPU detection |
| POST | /engine-hosts/prepare | Generate SSH key for new host |
| GET | /engine-hosts/{id}/volumes | Get volume configuration |
| POST | /engine-hosts/{id}/volumes | Set volume configuration |
| GET | /engine-hosts/{id}/public-key | Get SSH public key for host |

Engine/Catalog Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /engines/catalog/sync | Sync online catalog |
| POST | /engines/{variant}/install | Install Docker engine |
| DELETE | /engines/{variant}/uninstall | Uninstall Docker engine |
| GET | /engines/{variant}/check-update | Check for image updates |
| POST | /engines/{variant}/pull-update | Pull latest image |

Database Schema

New engine_hosts table:

CREATE TABLE engine_hosts (
    host_id TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    host_type TEXT NOT NULL,        -- 'subprocess' | 'docker:local' | 'docker:remote'
    ssh_url TEXT,                   -- For remote: ssh://user@host
    is_available INTEGER DEFAULT 1,
    has_gpu INTEGER,                -- NULL=unknown, 0=no, 1=yes
    samples_path TEXT,              -- Volume mount for samples
    models_path TEXT,               -- Volume mount for models
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
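
To see how the defaults and the tri-state has_gpu column behave, the table can be tried out with Python's built-in sqlite3 module. This is a sketch only: the host_id value, the display name, and the timestamp format are assumptions, not values taken from the application.

```python
import sqlite3
from datetime import datetime

SCHEMA = """
CREATE TABLE engine_hosts (
    host_id TEXT PRIMARY KEY,
    display_name TEXT NOT NULL,
    host_type TEXT NOT NULL,
    ssh_url TEXT,
    is_available INTEGER DEFAULT 1,
    has_gpu INTEGER,
    samples_path TEXT,
    models_path TEXT,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

now = datetime.now().isoformat()
conn.execute(
    "INSERT INTO engine_hosts "
    "(host_id, display_name, host_type, ssh_url, created_at, updated_at) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("gpu-server", "GPU Server", "docker:remote", "ssh://user@gpu-server", now, now),
)

# is_available falls back to 1; has_gpu stays NULL (unknown) until a test sets it.
row = conn.execute(
    "SELECT is_available, has_gpu FROM engine_hosts WHERE host_id = ?",
    ("gpu-server",),
).fetchone()
print(row)
conn.close()
```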

Bug Fixes

SQLite WAL Mode Disabled

Disabled WAL journal mode for Docker volume mount compatibility. Database uses DELETE mode for network filesystem support.
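
Selecting the journal mode is a single PRAGMA. The sketch below shows the general technique with Python's sqlite3 module on a temporary file; where and how the backend issues this statement is not described in the release notes.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "example.db")
    conn = sqlite3.connect(db_path)
    # The PRAGMA returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
    conn.close()

print(mode)
```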

Technical Details

New Backend Modules

| Module | Description |
|---|---|
| core/engine_runner.py | EngineRunner ABC + EngineEndpoint |
| core/local_runner.py | Subprocess runner (developer mode) |
| core/docker_runner.py | Local Docker runner |
| core/remote_docker_runner.py | SSH-based remote runner |
| core/engine_runner_registry.py | Central runner registry |
| db/engine_host_repository.py | Host CRUD operations |
| api/engine_hosts.py | Host REST API |
| services/docker_service.py | Docker SDK operations |
| services/docker_discovery_service.py | Catalog sync, discovery |
| services/docker_host_monitor.py | Host availability monitoring |
| services/ssh_key_service.py | SSH key management |
| services/online_catalog_service.py | Online catalog management |

New Frontend Components

| Component | Description |
|---|---|
| EngineHostsTab.tsx | Main host management UI |
| HostEnginesSection.tsx | Per-host engine list |
| HostSettingsDialog.tsx | Volume config + SSH key |
| AddHostDialog.tsx | Add remote host wizard |
| AddImageDialog.tsx | Install from catalog |
| useSSEDockerHostHandlers.ts | Host status SSE events |

New Dependencies

  • docker>=7.0.0 - Docker SDK for Python

Tests

23 new tests covering:

  • EngineRunner abstraction (3)
  • LocalRunner (4)
  • DockerRunner (3)
  • RemoteDockerRunner (2)
  • EngineRunnerRegistry (6)
  • DockerHostRepository (3)
  • BaseEngineManager integration (2)

Migration Notes

  • Existing VENV-based installations continue to work
  • No database migration required
  • Docker mode is opt-in via DEFAULT_ENGINE_RUNNER=docker
  • Recommended: Switch to Docker for production use

Full Changelog: v1.0.2...v1.1.0

Engine Repository: https://github.com/DigiJoe79/audiobook-maker-engines

VibeVoice TTS Engine, EPUB Import and Bugfixes

10 Dec 22:38

Release: v1.0.2 - VibeVoice TTS Engine, EPUB Import and Bugfixes

New Features

VibeVoice TTS Engine (Experimental)

Added Microsoft VibeVoice as a new TTS engine with voice cloning support.

Models:

  • VibeVoice-1.5B (~3GB VRAM) - Faster generation, up to 90 min audio
  • VibeVoice-7B (~18GB VRAM) - Highest quality, up to 45 min audio

Languages:

  • Stable: English (en), Chinese (zh)
  • Experimental: German (de), French (fr), Italian (it), Japanese (ja), Korean (ko), Dutch (nl), Polish (pl), Portuguese (pt), Spanish (es)

Requirements:

  • Python 3.12
  • PyTorch 2.9.1 with CUDA 13.0
  • Optional: Flash Attention 2 for ~2x faster inference and ~40% less VRAM

Optional Flash Attention 2 Setup (Windows):

The setup script asks whether to install Flash Attention 2 and installs it if you choose yes.

Without Flash Attention, SDPA (Scaled Dot-Product Attention) is used as fallback with ~80% of the performance.

Voice Sample Recommendations for Voice Cloning with VibeVoice:

| Aspect | Recommendation |
|---|---|
| Duration | 10-60 seconds (10s minimum) |
| Format | WAV or MP3 |
| Sample Rate | 24kHz (VibeVoice native rate) |
| Quality | Clean, no background noise/music |
| Language | EN or ZH for best results |

Best practices:

  • Use clean audio without background noise or music
  • Natural speaking pace (not too fast)
  • Consistent volume throughout
  • Avoid intro phrases like "Welcome to..." or "Hello..." (can cause artifacts)
  • The 7B model is more stable with fewer unexpected artifacts

EPUB Import

Added EPUB file support to the Import workflow. You can now import e-books directly alongside Markdown files.

Features:

  • Automatic EPUB-to-Markdown conversion
  • Chapter structure detection
  • Front matter filtering (skips cover, ToC, copyright pages)
  • Same preview/execute workflow as Markdown import

Dependencies: ebooklib, beautifulsoup4, markdownify

Contributed by @codesterribly

Bug Fixes

  • Text Segmentation: Fixed max_length parameter being ignored during text segmentation. Segments were incorrectly limited to 250 characters instead of the configured limit (e.g., 5000 for VibeVoice). Affected both Upload and Import workflows.
  • Segment Text Normalization: Newlines and extra whitespace are now removed when segments are created. This fixes issues with TTS engines (like VibeVoice) that interpret newlines as speaker turn boundaries, causing audio to be truncated.
  • Engine Enable/Disable: Fixed engines not being disabled when clicking the disable button. The enabled flag was written to a separate copy of the settings dictionary instead of the original, causing changes to be lost on save.
  • TTS Job Timestamps: Fixed job completion timestamps jumping by ~1 hour. SQLite's datetime('now') returns UTC without timezone info, but JavaScript interpreted it as local time. Now uses datetime.now().isoformat() for consistent local timestamps.
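
The segment text normalization described above can be approximated with a one-line whitespace collapse. This is a sketch of the general technique; the function name is hypothetical and the project's actual normalization may handle more cases.

```python
def normalize_segment_text(text: str) -> str:
    """Replace newlines and runs of whitespace with single spaces."""
    # str.split() with no arguments splits on any whitespace and drops empties.
    return " ".join(text.split())


raw = "First line of the segment.\nSecond line,   with  extra spaces.\n\n"
print(normalize_segment_text(raw))
```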

Improvements

  • API Types: Migrated frontend API types to OpenAPI-generated source for better type safety

Full Changelog: https://github.com/user/audiobook-maker/compare/v1.0.1...v1.0.2

Bug fixes and stability improvements

09 Dec 10:54

Release v1.0.1 - Bug Fixes

Patch release with bug fixes and stability improvements.

Bug Fixes

  • Speaker sample paths: Changed from absolute to relative paths for portability. Existing speakers need to be re-added.
  • Speaker preview player: Fixed play/pause icon state sync
  • Windows server shutdown: Suppressed harmless ConnectionResetError on Ctrl+C
  • E2E tests: Made segment tests locale-agnostic (work in German and English)

Migration

Existing users with speaker samples must re-add their speakers after updating (delete and re-create).


Full Changelog: https://github.com/user/audiobook-maker/compare/v1.0.0...v1.0.1

Multi-Engine Architecture & Quality Assurance

08 Dec 22:02

Release v1.0.0 - Multi-Engine Architecture & Quality Assurance

This is a major release of Audiobook Maker featuring a complete multi-engine architecture, quality assurance system, performance optimizations, and modern UI navigation.

Highlights

Multi-Engine Architecture

  • 4 Engine Types - TTS, STT, Text Processing, and Audio Analysis
  • Isolated Virtual Environments - Each engine runs in its own VENV (no dependency conflicts)
  • Engine Enable/Disable - Per-engine enabled flag with database persistence
  • Auto-Stop - Non-default engines automatically stop after 5 minutes of inactivity
  • Engine Monitoring UI - Real-time status, memory usage, and auto-stop countdown
  • Plug-and-Play - Add new engines without modifying backend code

Quality Assurance System

  • Whisper Integration - Automatic transcription analysis for quality verification
  • Silero-VAD - Audio quality analysis (speech ratio, silence detection, clipping)
  • Confidence Scoring - Detect low-quality or mispronounced segments (0-100%)
  • Issue Detection - Identify missing words, extra words, mispronunciations
  • Quality Status - Visual indicators (perfect/warning/defect) in segment list
  • Generic Quality Format - Engine-agnostic result format for UI rendering

Pronunciation Rules System

  • Pattern-Based Replacement - Simple text or regex patterns to fix mispronunciations
  • Scope-Based Rules - Global, engine-specific, project-specific, or project-engine
  • Priority System - Control rule application order
  • Live Preview - Test rules before applying
  • Import/Export - JSON format for rule sharing
  • Automatic Application - Rules applied during TTS generation

Modern Navigation System

  • Teams/Discord-Style UI - Icon-based sidebar with 6 views
  • Keyboard Shortcuts - Ctrl+1 through Ctrl+6 for view switching
  • Collapsible Sidebar - Ctrl+B to toggle
  • Event Log - Real-time SSE event monitoring in Monitoring view

Performance Optimizations

  • 95% DOM Reduction - Virtual scrolling with @tanstack/react-virtual
  • 99% Fewer Re-renders - React.memo optimization for list items
  • 95% Faster Event Processing - immer integration (1-2ms vs 20-30ms)
  • 60fps Smooth Scrolling - Even with 400+ segments

New Features

Multi-Engine Architecture

Engine Types:

| Type | Purpose | Engines Included |
|---|---|---|
| TTS | Text-to-Speech | XTTS v2, Chatterbox |
| STT | Speech-to-Text | OpenAI Whisper (5 model sizes) |
| Text Processing | Text Segmentation | spaCy (11 languages) |
| Audio Analysis | Audio Quality | Silero-VAD |

Base Server Hierarchy:

BaseEngineServer (Generic)
├── BaseTTSServer (adds /generate endpoint)
├── BaseQualityServer (adds /analyze endpoint)
└── BaseTextServer (adds /segment endpoint)

Engine Management:

  • Enable/disable engines via Settings or API
  • Auto-stop after 5 minutes inactivity (configurable)
  • Real-time status monitoring (disabled/stopped/starting/running/stopping)
  • Memory usage tracking
  • Start/stop buttons in Monitoring view
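
The auto-stop rule can be expressed as a small predicate over the last activity timestamp. The sketch below assumes, as stated in the highlights, that only non-default engines are stopped; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

AUTO_STOP_AFTER = timedelta(minutes=5)  # documented default, configurable


def should_auto_stop(
    last_activity: datetime,
    now: datetime,
    is_default_engine: bool,
    timeout: timedelta = AUTO_STOP_AFTER,
) -> bool:
    """Return True when an idle, non-default engine has exceeded the timeout."""
    if is_default_engine:
        return False
    return now - last_activity >= timeout


now = datetime(2025, 1, 1, 12, 0, 0)
print(should_auto_stop(now - timedelta(minutes=6), now, is_default_engine=False))
print(should_auto_stop(now - timedelta(minutes=6), now, is_default_engine=True))
```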

Quality Analysis System

Unified Quality Format:

  • All engines return generic AnalysisResult with qualityScore, qualityStatus, details
  • Frontend renders any engine's results dynamically
  • Supports fields (key-value pairs) and infoBlocks (titled lists)

Quality Levels:

  • perfect (score >= 85) - Green indicator
  • warning (score 70-84) - Yellow indicator
  • defect (score < 70) - Red indicator
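
The thresholds translate directly into a mapping function. A minimal sketch, with a hypothetical function name:

```python
def quality_status(score: float) -> str:
    """Map a 0-100 quality score to the status shown in the segment list."""
    if score >= 85:
        return "perfect"
    if score >= 70:
        return "warning"
    return "defect"


for score in (92, 85, 84, 70, 69.9):
    print(score, quality_status(score))
```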

Analysis Types:

  • STT Analysis (Whisper): Transcription comparison, word-level confidence, text alignment
  • Audio Analysis (Silero-VAD): Speech ratio, silence detection, clipping detection, volume analysis

Navigation System

6 Views:

| View | Shortcut | Description |
|---|---|---|
| Main | Ctrl+1 | Audiobook editing with AudioPlayer |
| Import | Ctrl+2 | Markdown import workflow |
| Speakers | Ctrl+3 | Speaker management |
| Pronunciation | Ctrl+4 | Pronunciation rules |
| Monitoring | Ctrl+5 | Jobs, Quality Jobs, Event Log, Engines |
| Settings | Ctrl+6 | 5 settings tabs |

Additional Shortcuts:

  • Ctrl+B - Toggle sidebar
  • Ctrl+[ - Go back to previous view
  • Mac: Cmd instead of Ctrl

Pronunciation Rules

  • Create text transformation rules for mispronounced words
  • Support regex and simple text patterns
  • Scope system: global → engine → project → project-engine
  • Priority ordering within scopes
  • Active/inactive toggle
  • Import/export as JSON
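
How scoped, prioritized rules might be applied to a segment is sketched below. Several details are assumptions rather than documented behavior: rules are applied from the broadest scope to the most specific one, a lower priority number runs first within a scope, and the Rule fields are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

SCOPE_ORDER = ["global", "engine", "project", "project-engine"]


@dataclass
class Rule:
    pattern: str
    replacement: str
    scope: str = "global"
    priority: int = 0
    is_regex: bool = False
    is_active: bool = True


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply active rules ordered by scope, then by priority."""
    ordered = sorted(
        (rule for rule in rules if rule.is_active),
        key=lambda rule: (SCOPE_ORDER.index(rule.scope), rule.priority),
    )
    for rule in ordered:
        if rule.is_regex:
            text = re.sub(rule.pattern, rule.replacement, text)
        else:
            text = text.replace(rule.pattern, rule.replacement)
    return text


rules = [
    Rule("Dr.", "Doctor"),
    Rule(r"\b(\d+)km\b", r"\1 kilometers", is_regex=True),
    Rule("SQL", "sequel", scope="project", is_active=False),
]
result = apply_rules("Dr. Smith ran 5km using SQL.", rules)
print(result)
```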

Technical Changes

Backend

Multi-Engine System:

  • 4 Engine Managers inheriting from BaseEngineManager
  • Auto-discovery per engine type from backend/engines/{type}/
  • Unified /models endpoint returning ModelInfo objects
  • Activity tracking with timestamps for auto-stop
  • Engine enable/disable persisted in settings DB

API Consolidation:

  • Jobs API: /api/jobs/tts/* and /api/jobs/quality/*
  • Engine API: /api/engines/status, /api/engines/{type}/{name}/enable|disable
  • Removed duplicate endpoints

Quality System:

  • QualityWorker orchestrates STT + Audio engines
  • Generic result format for all quality engines
  • Quality jobs table with database persistence

New Files:

  • core/base_engine_manager.py - Shared manager logic
  • core/base_engine_discovery.py - Shared discovery logic
  • core/{tts,stt,text,audio}_engine_manager.py - Type-specific managers
  • core/quality_worker.py - Quality job processor
  • engines/base_quality_server.py - STT + Audio base class
  • engines/base_text_server.py - Text processing base class

Frontend

Navigation System:

  • store/navigationStore.ts - View state management
  • pages/*.tsx - 6 main views + embedded views
  • components/layout/NavigationSidebar.tsx - Icon sidebar

Engine Management:

  • components/engines/EngineCard.tsx - Status card
  • components/engines/EngineStatusBadge.tsx - Status indicator
  • hooks/useEngineQueries.ts - React Query hooks

Performance:

  • Virtual scrolling in SegmentList
  • React.memo with custom comparisons
  • immer for O(1) cache updates
  • Stable references with useCallback

SSE Handlers:

  • Split into 6 domain-specific hooks (2,555 LOC total)
  • useSSETTSHandlers - TTS job events
  • useSSEQualityHandlers - Quality analysis events
  • useSSESystemHandlers - Health, settings, pronunciation
  • useSSESegmentHandlers - Segment/chapter events
  • useSSEExportHandlers - Export events
  • useSSEEngineHandlers - Engine status events

Database

New Tables:

  • pronunciation_rules - Rule storage with scope
  • quality_jobs - Quality job queue

Engine Settings:

  • settings.{type}.engines.{name}.enabled - Per-engine enable flag

Breaking Changes

API Changes

Removed Endpoints:

  • /api/tts/engines/* - Use /api/engines/status instead
  • /api/stt/* legacy endpoints - Use /api/quality/* and /api/jobs/quality/*

New Endpoints:

  • GET /api/engines/status - All engines status
  • POST /api/engines/{type}/{name}/enable - Enable engine
  • POST /api/engines/{type}/{name}/disable - Disable engine
  • POST /api/engines/{type}/{name}/start - Start engine
  • POST /api/engines/{type}/{name}/stop - Stop engine
  • POST /api/quality/analyze/segment - Analyze single segment
  • POST /api/quality/analyze/chapter - Analyze chapter
  • GET /api/jobs/quality/* - Quality job management

Engine Directory Structure

Engines moved to type-specific directories:

backend/engines/
├── tts/
│   ├── xtts/
│   ├── chatterbox/
│   └── kani/
├── stt/
│   └── whisper/
├── text_processing/
│   └── spacy/
└── audio_analysis/
    └── silero-vad/

Response Model Changes

  • engine_model_name replaces tts_model_name in discovery/managers
  • All quality responses use generic AnalysisResult format

Available Engines

TTS Engines

| Engine | Languages | Features | Python |
|---|---|---|---|
| XTTS v2 | 17 | Voice cloning, model hotswap | 3.10 |
| Chatterbox | 23 | Voice cloning, fast generation | 3.11 |

STT Engines

| Engine | Languages | Models | Python |
|---|---|---|---|
| Whisper | 12 | tiny, base, small, medium, large | 3.12 |

Text Processing Engines

| Engine | Languages | Features | Python |
|---|---|---|---|
| spaCy | 11 | MD models only, CPU-only | 3.12 |

Audio Analysis Engines

| Engine | Features | Python |
|---|---|---|
| Silero-VAD | Speech/silence detection, clipping, volume | 3.12 |

Known Issues

  • Virtual scrolling may have slight visual jitter on rapid scroll
  • Pronunciation rules with complex regex may impact generation speed
  • Quality analysis may take some time for long chapters

For Developers

Adding Custom Engines

See the updated Engine Development Guide for complete documentation.

Quick Start:

  1. Choose engine type (TTS, STT, Text, Audio)
  2. Copy template: cp -r backend/engines/{type}/_template backend/engines/{type}/my_engine
  3. Implement required methods:
    • TTS: load_model(), generate_audio(), unload_model(), get_available_models()
    • STT/Audio: load_model(), analyze_audio(), unload_model(), get_available_models()
  4. Configure engine.yaml
  5. Create VENV and install dependencies
  6. Restart backend - engine appears automatically!
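
For orientation, the four TTS methods from step 3 could take roughly the following shape. This is a standalone toy, not the project's base class: a real engine would build on the provided template, and every signature, the sample rate, and the silent-audio output here are assumptions.

```python
import io
import wave
from typing import List, Optional

SAMPLE_RATE = 24000       # assumed output rate
FRAMES_PER_CHAR = 1200    # toy duration: 0.05 s of audio per character


class MyTTSEngine:
    """Hypothetical engine that returns silence instead of speech."""

    def __init__(self) -> None:
        self.model: Optional[str] = None

    def get_available_models(self) -> List[str]:
        return ["default"]

    def load_model(self, model_name: str) -> None:
        if model_name not in self.get_available_models():
            raise ValueError(f"unknown model: {model_name}")
        self.model = model_name  # a real engine would load weights here

    def generate_audio(self, text: str) -> bytes:
        if self.model is None:
            raise RuntimeError("load_model() must be called first")
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 16-bit samples
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(b"\x00\x00" * FRAMES_PER_CHAR * len(text))
        return buffer.getvalue()

    def unload_model(self) -> None:
        self.model = None  # a real engine would free GPU memory here


engine = MyTTSEngine()
engine.load_model(engine.get_available_models()[0])
audio = engine.generate_audio("Hello")
engine.unload_model()
print(len(audio), "bytes of WAV data")
```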

Full Changelog

v0.3.0 (not released on Github)

  • Quality Assurance with Whisper STT analysis
  • Pronunciation Rules System
  • Performance optimizations (virtual scrolling, immer)
  • SSE handler refactoring (6 domain-specific hook...

New TTS engine architecture and improvements

07 Nov 23:36

Release v0.2.0 - Architecture & Performance Update

This is a major update to Audiobook Maker focusing on architecture improvements, real-time updates, and performance optimizations.

Highlights

Real-Time Updates & Performance

  • Server-Sent Events (SSE) - Push-based real-time updates for instant UI feedback
  • 99.5% Network Reduction - 2,000+ requests reduced to ~50 during 10-minute generation
  • Zero Polling - Complete elimination of aggressive polling when SSE is active
  • Instant Visual Feedback - Segments show status changes immediately without delays

Architecture Improvements

  • Plug-and-Play Engine System - Add new TTS engines without touching backend code!
  • Complete Dependency Isolation - Each engine runs in its own virtual environment
  • Database-Backed Job Queue - Persistent job state survives backend restarts
  • Worker-Queue Pattern - Background worker with graceful shutdown and recovery
  • Job Management - Resume cancelled jobs, clear finished jobs, track progress

Code Quality & Standards

  • 100% Pydantic Compliance - All API endpoints use validated response models

User Experience

  • JobsPanel UI - Centralized job management with status tracking
  • Queued Status - Visual feedback for segments waiting in job queue
  • Segment Regeneration - Improved workflow with safety warnings
  • Health Monitoring - Real-time backend health status via SSE

Technical Changes

Backend

Server-Sent Events:

  • EventBroadcaster service with channel-based routing (jobs, health, speakers, settings)
  • 25 event types for real-time updates across all entities
  • TTS Worker emits events during generation
  • Health broadcaster sends updates every 5s
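
Channel-based routing can be sketched with one queue per subscriber. The class name EventBroadcaster and the channel names come from the notes above; the method names, the event payload shape, and the event type "job.progress" are assumptions for illustration.

```python
import asyncio
from collections import defaultdict
from typing import Any, Dict, List, Set


class EventBroadcaster:
    """Deliver each event only to subscribers of its channel."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Set[asyncio.Queue]] = defaultdict(set)

    def subscribe(self, channels: List[str]) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        for channel in channels:
            self._subscribers[channel].add(queue)
        return queue

    def publish(self, channel: str, event_type: str, data: Dict[str, Any]) -> None:
        for queue in self._subscribers.get(channel, ()):
            queue.put_nowait({"channel": channel, "event": event_type, "data": data})


async def main() -> tuple:
    broadcaster = EventBroadcaster()
    jobs_client = broadcaster.subscribe(["jobs"])
    health_client = broadcaster.subscribe(["health"])
    broadcaster.publish("jobs", "job.progress", {"progress": 0.5})
    event = await jobs_client.get()
    return event, health_client.qsize()


event, health_pending = asyncio.run(main())
print(event, health_pending)
```

In an SSE endpoint, each connected client would hold one such queue and stream its events as they arrive.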

Database & API:

  • Database migration: renamed columns to use tts_ prefix consistently
  • All endpoints use Pydantic response models with automatic camelCase conversion
  • Job persistence in tts_jobs table (pending → running → completed/failed/cancelled)
  • Segment cleanup on cancel/delete/crash (prevents orphaned segments)
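
The job lifecycle can be modeled as a small state machine. The main path (pending → running → completed/failed/cancelled) is taken from the notes; cancelling a pending job and re-queuing a cancelled job on resume are assumptions, and the names are hypothetical.

```python
ALLOWED_TRANSITIONS = {
    "pending": {"running", "cancelled"},   # cancelling before start: assumption
    "running": {"completed", "failed", "cancelled"},
    "cancelled": {"pending"},              # assumption: resume re-queues the job
    "completed": set(),
    "failed": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal job transition: {current} -> {new}")
    return new


status = "pending"
for step in ("running", "cancelled", "pending", "running", "completed"):
    status = transition(status, step)
print(status)
```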

Engine Architecture:

  • Auto-discovery scans backend/engines/ for new engines at backend startup
  • Dynamic imports with optional engine.yaml metadata
  • No hardcoded engine references in backend code

Frontend

React Query Optimizations:

  • SSE-aware polling with automatic fallback (30s interval)
  • Increased cache times for static data (10m engines, 5m models)
  • Optimized invalidations (chapter-specific instead of project-wide)

Job Management:

  • JobsPanel component with status badges and progress tracking
  • Resume cancelled jobs with one click
  • Clear finished jobs (completed/failed/cancelled)
  • Active job monitoring with SSE (no polling)

Breaking Changes

Database Schema

  • Migration required: engine → tts_engine, model_name → tts_model_name, speaker_name → tts_speaker_name
  • Automatic migration runs on first startup

API Changes

  • All responses now use camelCase (automatic conversion via Pydantic)
  • Request bodies accept both snake_case and camelCase (backwards compatible)
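
The effect of the conversion on response keys can be shown with a standard-library sketch. It only illustrates the snake_case to camelCase mapping; in the application this is handled by Pydantic, and the helper names below are hypothetical.

```python
from typing import Any, Dict


def to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def camelize_keys(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Convert the top-level keys of a response payload."""
    return {to_camel(key): value for key, value in payload.items()}


response = camelize_keys({"tts_engine": "xtts", "tts_model_name": "v2", "status": "queued"})
print(response)
```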

For Developers: Extending with Custom TTS Engines

v0.2.0 introduces a new plug-and-play engine system!

You can now add your own TTS engines to Audiobook Maker without modifying backend code:

How Easy Is It?

  1. Copy the template:

    cp -r backend/engines/_template backend/engines/my_engine
  2. Implement 3 methods in server.py:

    • load_model() - Load your TTS model
    • generate_audio() - Synthesize text to audio
    • unload_model() - Free resources
  3. Configure engine.yaml:

    • Name, languages, capabilities
    • Models are auto-discovered from models/ folder
  4. Create isolated VENV:

    cd backend/engines/my_engine
    python -m venv venv
    venv\Scripts\activate
    pip install -r requirements.txt
  5. Restart backend → Engine appears in UI automatically! ✅

Supported TTS Engines

Out of the box:

  • XTTS v2 (voice cloning, 17+ languages)
  • Chatterbox Multilingual (experimental)

Want to add Piper? Coqui? Just implement the 3 methods!

📖 See ENGINE_DEVELOPMENT_GUIDE.md for complete documentation.


Known Issues

  • SSE connection may disconnect on network changes (auto-reconnects within 3s)
  • First SSE connection takes 100-200ms to establish

Full Changelog: v0.1.0...v0.2.0

Contributors

Built with Tauri 2.1, React 18, Python FastAPI, and powered by XTTS v2 for voice cloning.

Initial Release

30 Oct 07:37

Release v0.1.0 - Initial Release

This is the first public release of Audiobook Maker, a modern desktop application for creating audiobooks with AI-powered voice cloning.

Highlights

  • Complete audiobook creation workflow with project, chapter, and segment management
  • XTTS voice cloning from audio samples with optional GPU acceleration
  • Multi-language support (17+ languages) and smart text segmentation
  • Export to MP3, M4A, or WAV formats
  • Markdown import for batch project creation
  • Session recovery and drag & drop interface

Tech Stack

Built with Tauri 2.1, React 18, Python FastAPI, and XTTS v2 for voice cloning.

Getting Started

See the Quick Start Guide and Usage Guide in the README for installation and setup instructions.

Requirements

  • Node.js 18+, Python 3.10+, Rust 1.70+, FFmpeg
  • CUDA 11.8/12.1 (optional, for GPU)

What's Next

Planned features include additional TTS engines (OpenAI, ElevenLabs, Azure), Whisper integration, pronunciation dictionary, and audio effects. See the full Roadmap for details.


Full Changelog: Initial Release