Audiblez generates .m4b audiobooks from regular .epub e-books,
using Kokoro's high-quality speech synthesis.
Kokoro-82M is a recently published text-to-speech model with just 82M params and very natural sounding output. It's released under Apache licence and it was trained on < 100 hours of audio. It currently supports these languages: ๐บ๐ธ ๐ฌ๐ง ๐ช๐ธ ๐ซ๐ท ๐ฎ๐ณ ๐ฎ๐น ๐ฏ๐ต ๐ง๐ท ๐จ๐ณ
On a Google Colab's T4 GPU via Cuda, it takes about 5 minutes to convert "Animal's Farm" by Orwell (which is about 160,000 characters) to audiobook, at a rate of about 600 characters per second.
On my M2 MacBook Pro, on CPU, it takes about 1 hour, at a rate of about 60 characters per second.
If you have Python 3 on your computer, you can install it with pip.
You also need espeak-ng and ffmpeg installed on your machine:
sudo apt install ffmpeg espeak-ng # on Ubuntu/Debian ๐ง
pip install audiblezbrew install ffmpeg espeak-ng # on Mac ๐
pip install audiblezThen you can convert an .epub directly with:
audiblez book.epub -v af_sky
It will first create a bunch of book_chapter_1.wav, book_chapter_2.wav, etc. files in the same directory,
and at the end it will produce a book.m4b file with the whole book you can listen with VLC or any
audiobook player.
It will only produce the .m4b file if you have ffmpeg installed on your machine.
The GUI is a simple graphical interface to use audiblez. You need some extra dependencies to run the GUI:
sudo apt install ffmpeg espeak-ng
sudo apt install libgtk-3-dev # just for Ubuntu/Debian ๐ง, Windows/Mac don't need this
pip install audiblez pillow wxpython
Then you can run the GUI with:
audiblez-ui
After many trials, on Windows we recommend to install audiblez in a Python venv:
- Open a Windows terminal
- Create anew folder:
mkdir audiblez - Enter the folder:
cd audiblez - Create a venv:
python -m venv venv - Activate the venv:
.\venv\Scripts\Activate.ps1 - Install the dependencies:
pip install audiblez pillow wxpython - Now you can run
audiblezoraudiblez-ui - For Cuda support, you need to install Pytorch accordingly: https://pytorch.org/get-started/locally/
By default the audio is generated using a normal speed, but you can make it up to twice slower or faster by specifying a speed argument between 0.5 to 2.0:
audiblez book.epub -v af_sky -s 1.5
Use -v option to specify the voice to use. Available voices are listed here.
The first letter is the language code and the second is the gender of the speaker e.g. im_nicola is an italian male voice.
For hearing samples of Kokoro-82M voices, go here
| Language | Voices |
|---|---|
| ๐บ๐ธ American English | af_alloy, af_aoede, af_bella, af_heart, af_jessica, af_kore, af_nicole, af_nova, af_river, af_sarah, af_sky, am_adam, am_echo, am_eric, am_fenrir, am_liam, am_michael, am_onyx, am_puck, am_santa |
| ๐ฌ๐ง British English | bf_alice, bf_emma, bf_isabella, bf_lily, bm_daniel, bm_fable, bm_george, bm_lewis |
| ๐ช๐ธ Spanish | ef_dora, em_alex, em_santa |
| ๐ซ๐ท French | ff_siwis |
| ๐ฎ๐ณ Hindi | hf_alpha, hf_beta, hm_omega, hm_psi |
| ๐ฎ๐น Italian | if_sara, im_nicola |
| ๐ฏ๐ต Japanese | jf_alpha, jf_gongitsune, jf_nezumi, jf_tebukuro, jm_kumo |
| ๐ง๐ท Brazilian Portuguese | pf_dora, pm_alex, pm_santa |
| ๐จ๐ณ Mandarin Chinese | zf_xiaobei, zf_xiaoni, zf_xiaoxiao, zf_xiaoyi, zm_yunjian, zm_yunxi, zm_yunxia, zm_yunyang |
For more detaila about voice quality, check this document: Kokoro-82M voices
Use --device to pick the compute backend:
audiblez book.epub --device auto # default: MPS on Apple Silicon, else CUDA, else CPU
audiblez book.epub --device mps # Apple Silicon Metal GPU (M-series)
audiblez book.epub --device cuda # NVIDIA GPU
audiblez book.epub --device cpu # force CPU
--cuda is kept as a deprecated alias for --device cuda.
Audiblez runs the Kokoro model on Apple's Metal GPU via PyTorch's MPS backend. On an M-series chip MPS is roughly 25โ35% faster than CPU on real-world EPUBs at steady state (e.g. ~228 vs ~170 chars/sec on representative English prose). MPS also tends to use less power than CPU for the same work.
Four implementation details make MPS work well:
- Model on MPS only. The model is moved with
.to('mps'); we deliberately don't calltorch.set_default_device('mps'), since that drags unrelated tensors (tokenizer state, numpy bridges) onto the GPU and causes constant CPUโGPU transfers that actually make MPS slower than CPU. - Conv1d vocoder. The vocoder is built with
disable_complex=True, which swaps Kokoro'storch.stft-based STFT for its built-in conv1d implementation (CustomSTFT). PyTorch's complex-tensor path on MPS is significantly slower than the real-valued conv path. - Sentence batching. Adjacent sentences are batched into ~1200-character chunks before being handed to Kokoro, well past Kokoro's 510-phoneme internal split point. This amortizes the fixed per-call overhead (kernel launch, tensor allocation) and lets Kokoro pick its own optimal split boundaries internally. This alone produced ~30% speedup on both CPU and MPS.
- fp32 weights. Half-precision (bf16/fp16) is exposed via
--precisionbut not recommended on MPS as of PyTorch 2.12: whole-model bf16/fp16 crashes inMPSNDArrayMatrixMultiplication(accumulator/destination dtype mismatch), and the autocast workaround introduces enough dtype-boundary overhead that it's slower than fp32 in practice. The flag is still useful for CUDA users.
The first MPS run takes a few extra seconds while Metal compiles its shader cache; subsequent runs are fast.
Check out this example: Audiblez running on a Google Colab Notebook with CUDA.
Sometimes you want to manually select which chapters/sections in the e-book to read out loud.
To do so, you can use --pick to interactively choose the chapters to convert (without running the GUI).
For all the options available, you can check the help page audiblez --help:
usage: audiblez [-h] [-v VOICE] [-p] [-s SPEED] [-d {auto,cpu,cuda,mps}]
[--precision {fp32,bf16,fp16}] [-c] [-o FOLDER] epub_file_path
positional arguments:
epub_file_path Path to the epub file
options:
-h, --help show this help message and exit
-v VOICE, --voice VOICE
Choose narrating voice: a, b, e, f, h, i, j, p, z
-p, --pick Interactively select which chapters to read in the audiobook
-s SPEED, --speed SPEED
Set speed from 0.5 to 2.0
-d {auto,cpu,cuda,mps}, --device {auto,cpu,cuda,mps}
Compute device: auto (default), cpu, cuda (NVIDIA), or mps (Apple Metal)
--precision {fp32,bf16,fp16}
Model precision: fp32 (default), bf16, fp16. Half precision currently crashes on MPS;
useful on CUDA.
-c, --cuda Deprecated; equivalent to --device cuda
-o FOLDER, --output FOLDER
Output folder for the audiobook and temporary files
example:
audiblez book.epub -l en-us -v af_sky
to use the GUI, run:
audiblez-ui
by Claudio Santini in 2025, distributed under MIT licence.
Related Article: Audiblez v4: Generate Audiobooks from E-books