convert epub files to audiobooks using qwen3-tts.
- python 3.12+
- ffmpeg
- sox
- uv (python package manager)
- gpu recommended (cuda or rocm) if running tts locally
- or an openai-compatible tts endpoint (no local gpu required)
# cuda gpu (default, includes local tts)
make build-cuda
# amd rocm gpu (gfx1151, includes local tts)
make build-rocm
# cpu only (includes local tts)
make build-cpulocal tts extras are optional. to drive an openai-compatible tts endpoint
instead, install without the [local] extra and set --api-base / OPENAI_BASE_URL.
source .venv/bin/activate
autiobook --helpautiobook chapters book.epubautiobook convert book.epub -o workdir/runs all phases, skipping already-completed steps.
extract chapter text from epub to workdir.
autiobook extract book.epub -o workdir/
creates:
workdir/extract/metadata.json- book metadataworkdir/extract/NN_Title.txt- chapter text filesworkdir/extract/state.json- resumability state
convert text files to wav audio.
autiobook synthesize workdir/ -s ryan
creates:
workdir/synthesize/NN_Title.wav- audio filesworkdir/synthesize/state.json- resumability state
convert wav files to mp3 with metadata.
autiobook export workdir/
creates:
workdir/export/NN_Title.mp3- mp3 files with id3 tagsworkdir/export/state.json- resumability state
generate a full cast performance using an openai-compatible llm (including llama.cpp) and voice cloning.
# 1. extract text
autiobook extract book.epub -o workdir/
# 2. generate cast list (using llm)
autiobook cast workdir/ --api-key sk-...
# 3. generate base voice per character (review/edit characters.json first if needed)
autiobook audition workdir/
# 4. generate per-emotion voice variants
autiobook emote workdir/
# 5. create dramatized script (using llm)
autiobook script workdir/ --api-key sk-...
# 6. review and repair script (optional; --dry-run to only report)
autiobook revise workdir/ --api-key sk-...
# 7. perform the script (voice cloning)
autiobook perform workdir/
# 8. export to mp3
autiobook export workdir/or run the full dramatization pipeline in one go:
autiobook dramatize book.epub --api-key sk-...
# pause after each phase for examination
autiobook dramatize book.epub --step
# re-run the last completed phase
autiobook dramatize book.epub --redo
# enable all inline quality checks (script revise + voice/segment retake)
autiobook dramatize book.epub --strictafter generating scripts, revise reviews them against the source text,
detecting missing or hallucinated segments and repairing them via llm:
# review and repair: fill missing text, remove hallucinated segments
autiobook revise workdir/ --api-key sk-...
# only review; don't modify scripts
autiobook revise workdir/ --dry-run
# local cleanup only: strip hallucinations, skip the llm fix-missing pass
autiobook revise workdir/ --prunecallbackscansaudition/andemote/wavs for silent/clipped/noisy takes and re-generates them with a bumped seed.retakedoes the same forperform/andsynthesize/segments.locatelooks up which segment wav backs a given audio time position (useful for debugging a glitch you heard in the output).
autiobook callback workdir/
autiobook retake workdir/
autiobook locate workdir/perform/NN_Title.wav 00:12:34-o, --output DIR- workdir for intermediate files (default:<epub>_output/)-s, --voice NAME- tts voice forsynthesize(default: ryan)-c, --chapters RANGE- chapter selection (e.g., 1-5, 3,7,10)--tts-model,--tts-design-model,--tts-clone-model- override tts models--api-base,--api-key- openai-compatible endpoint (defaults to$OPENAI_BASE_URL/$OPENAI_API_KEY)--llm-model- llm model name--m4b- export as a single m4b with chapter markers-v, --verbose- verbose output-f, --force- ignore resume state
environment variables (also loadable from .env; see .env.example):
OPENAI_API_KEY, OPENAI_BASE_URL, AUTIOBOOK_LLM_MODEL,
AUTIOBOOK_TTS_CLONE_MODEL, AUTIOBOOK_TTS_INSTRUCT_MODEL,
AUTIOBOOK_TTS_DESIGN_MODEL, AUTIOBOOK_SEED,
AUTIOBOOK_LLM_THINKING_BUDGET, AUTIOBOOK_CAST_BATCH_SIZE.
Vivian, Ryan, Sunny, Aria, Bella, Nova, Echo, Finn, Atlas
creates one mp3 file per chapter in workdir/export/ (or a single .m4b
with chapter markers when --m4b is passed):
workdir/export/
├── 01_Introduction.mp3
├── 02_Chapter_One.mp3
└── ...
perform/ and synthesize/ also emit .srt and .vtt subtitles alongside
each chapter wav (with speaker labels for dramatized output).
compatible with the Voice audiobook player for android.
Intermediate files are organized into subdirectories by command:
workdir/
├── extract/ # extracted text and metadata
│ ├── metadata.json
│ ├── cover.jpg
│ ├── NN_Title.txt
│ └── state.json
├── cast/ # character list and analysis state
│ ├── characters.json
│ └── state.json
├── audition/ # per-character base voices (description only)
│ ├── Character.wav
│ └── state.json
├── emote/ # per-emotion voice variants
│ ├── Character__neutral.wav
│ ├── Character__happy.wav
│ ├── ...
│ └── state.json
├── script/ # dramatized scripts (speaker segments)
│ ├── NN_Title.json
│ └── state.json
├── perform/ # dramatized audio performance
│ ├── NN_Title.wav
│ ├── NN_Title.wav.timing.json # per-chunk start/end offsets + metadata
│ ├── NN_Title.srt # subtitles (with speaker labels)
│ ├── NN_Title.vtt # webvtt subtitles
│ ├── segments/ # segment cache
│ └── state.json
├── synthesize/ # standard mono-voice audio
│ ├── NN_Title.wav
│ ├── NN_Title.wav.timing.json
│ ├── NN_Title.srt
│ ├── NN_Title.vtt
│ ├── segments/ # segment cache
│ └── state.json
└── export/ # final mp3 output
├── NN_Title.mp3
└── state.json
Each command is fully resumable based on content hashes stored in state.json.