- Make a custom Piper TTS model out of your own voice samples or any existing voice dataset
- Learn how to convert a public domain dataset into another voice using an RVC model
- Learn how to make custom datasets from audio clips and text transcripts
- Use the dataset recorder to make fun TTS clones of your family and friends
- Listen to your voice as training progresses in a convenient training environment
- Rapidly train custom TTS voices by finetuning pretrained checkpoint files
- Now runs Piper in a docker container for much more convenient installation
- Includes original resources for creating custom pronunciation rules
- Includes original guides for using custom Piper voices with Home Assistant
- 100% free, runs 100% offline.
- https://www.tomshardware.com/raspberry-pi/add-any-voice-to-your-raspberry-pi-project-with-textymcspeechy
- https://www.hackster.io/news/erik-bjorgan-makes-voice-cloning-easy-with-the-applio-and-piper-based-textymcspeechy-e9bcef4246fb
- `download_defaults.sh` can now use a `generic` language pack to download checkpoints that can be used as the starting point for training voices in any language. This option may trade a bit of voice quality for convenience, but it will make it easier for new users to get started.
- Please note that there don't appear to be any compatible `low` quality pretrained checkpoints available on Hugging Face right now. This means that only `medium` and `high` quality voices can be built from downloaded pretrained checkpoint files. You can still train a `low` quality model from scratch and use one of its checkpoints as a pretrained checkpoint for future models (a sketch of this follows below).
- `run_training.sh` now offers to save any unsaved checkpoint found in `training_folder` before wiping it, which will make it easier to resume dojos if they crash due to insufficient memory.
- `piper_training.sh` now correctly applies the quality parameter.
- Teaser: esauvisky has been up to some truly exciting stuff in the pull requests that should make building custom voice datasets vastly easier. Can't wait to see it in action.
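As a rough illustration of that checkpoint reuse, the paths below are hypothetical (your dojo name, Lightning version folder, and checkpoint filename will differ):

```bash
# Hypothetical paths for illustration only: copy a checkpoint produced by a
# scratch-trained low-quality dojo to wherever you keep pretrained checkpoints.
cp "my_low_dojo/training_folder/lightning_logs/version_0/checkpoints/epoch=1999-step=503936.ckpt" \
   "path/to/your/pretrained_checkpoints/"
```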
- `download_defaults.sh` now uses heuristics to rename checkpoint files that do not have the required `epoch=1000-step=3493434.ckpt` filename format (a sketch of this follows below).
- A summary of all voice type and quality combinations provided by the `.conf` file is now listed when the script finishes.
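The renaming itself is simple; roughly speaking (illustrative only - `download_defaults.sh` implements its own heuristics):

```bash
# Illustrative only: take the first two numbers found in a nonstandard
# checkpoint filename and rebuild it as epoch=<N>-step=<N>.ckpt
f="voice-epoch_2164_step_1029476.ckpt"           # hypothetical input filename
nums=($(grep -oE '[0-9]+' <<< "$f"))             # all numeric runs in the name
mv "$f" "epoch=${nums[0]}-step=${nums[1]}.ckpt"  # -> epoch=2164-step=1029476.ckpt
```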
March 3 2025 - improvements to preprocessing workflow, fix for custom pronunciations not being used for training, new `.conf` files.
- Added option to skip or reinitialize preprocessing when a preprocessed dataset is found in `training_folder`.
- Discovered that although custom pronunciation rules had been compiling correctly inside the docker container, `piper_phonemize` had a second set of rules that it was using for preprocessing instead. `container_apply_custom_rules.sh` has been updated to correct this issue.
- Big thanks to kelmoran for putting together a whole bunch of `.conf` files for downloading pretrained checkpoints in languages other than English!
- Added 21 languages supported by `espeak-ng` that were missing from `create_dataset.sh` and `espeak_language_identifiers.txt` due to a truncated list being supplied when ChatGPT reformatted the markdown table.
- Sinhala, Slovak, Slovenian, Lule Saami, Spanish (Spain), Spanish (Latin America), Swahili, Swedish, Tamil, Thai, Turkmen, Tatar, Telugu, Turkish, Uyghur, Urdu, Uzbek, Vietnamese (Central Vietnam), Vietnamese (Northern Vietnam), Vietnamese (Southern Vietnam), and Welsh are now available in addition to all previously supported languages.
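If you want to check which identifier `espeak-ng` uses for a given language, or verify that a language works on your system, both of these use standard `espeak-ng` flags:

```bash
espeak-ng --voices | grep -i welsh   # find the identifier for a language
espeak-ng -v cy "Bore da"            # quick audible test (Welsh)
```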
February 24 2025 - fixed docs for manually editing voices to comply with Home Assistant's requirements
- My previous documentation of this process produced voices that worked in user scripts within Home Assistant, but I discovered that they would crash when used to create entities in `Settings > Voice Assistants` if fields set in the `.onnx.json` file differed even slightly from what was expected.
- I have updated the docs to correct this issue.
- This should not impact voices trained with the latest version of TextyMcSpeechy.
- Voice models are now exported with filenames that comply with Piper's naming convention (e.g. `en_US-bob_1234-medium.onnx`).
- `.onnx.json` files now have fields set correctly when exported.
- These changes should make all models exported to `tts_dojo/tts_voices` usable in Home Assistant without modifications.
- Fixed issues with menus when resuming sessions that were initially trained from scratch.
- Training models from scratch (i.e. without using pretrained checkpoint files) is now an option provided by `run_training.sh`.
- `create_datasets.sh` now stores the `espeak-ng` language identifier in `dataset.conf` so that there is no need to manually set a language during preprocessing.
- The language code needed to build filenames that comply with Piper's naming convention is also stored in `dataset.conf` (see the sketch below).
- Datasets created with earlier versions of TextyMcSpeechy will need to be updated by running `create_datasets.sh <dataset_folder>`.
- `DATASETS/espeak_language_identifiers.txt` provides clear directions about which language codes to use when setting up a dataset.
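To illustrate how these pieces fit together, here is a hypothetical sketch of the kind of values `dataset.conf` stores, plus a quick way to inspect an exported voice. The variable and field names are assumptions for illustration; check your own files:

```bash
# dataset.conf (hypothetical variable names):
ESPEAK_LANGUAGE="en-us"   # espeak-ng identifier used during preprocessing
LANGUAGE_CODE="en_US"     # code used to build Piper-compliant filenames,
                          # e.g. en_US-bob_1234-medium.onnx

# Inspect the metadata of an exported voice (requires jq):
jq '.language, .audio' en_US-bob_1234-medium.onnx.json
```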
- This brand new branch runs Piper in a docker container, which makes installation far, far, far less painful.
- The scripts and docs in this branch have all been overhauled.
- The branch formerly known as `main` is now the `non-containerized` branch. It will be kept around for reference purposes but will not be maintained.
- The layout of the tmux training environment can now be saved by selecting the control console and pressing `t`. This layout will be applied automatically on subsequent runs.
- Custom pronunciation rules can now be defined in `tts_dojo/ESPEAK_RULES`. These can be applied automatically whenever the `textymcspeechy-piper` container launches via `ESPEAK_RULES/automated_espeak_rules.sh`.
Read the quick start guide to learn how to build datasets and train models.
- Customizing pronunciation
- Using custom voices in Home Assistant
- Rendering custom voices for Home Assistant on a networked device with a GPU
- An NVIDIA GPU with drivers capable of running CUDA is required. Training on CPU, while technically possible, is not officially supported.
- A hard drive with sufficient storage capacity for the base installation (~15 GB) and the checkpoint files generated during training. 50 GB of free space is suggested as a practical minimum.
- This project is written entirely in shell script and is primarily intended for Linux users. Debian-based distros are recommended; issues have been reported with Arch. Windows users will need to use WSL to run it.
- Check for a currently installed NVIDIA driver by running `nvidia-smi`. If something like the image below shows up, you may be able to skip to step 3.
- If NVIDIA drivers are not installed on your system, I recommend installing them using whatever "official" method exists for the distribution you are using. That's all the advice I can give you - in the past I have known the pain of spending hours repairing my OS after installing a driver I shouldn't have. If you survive this step, continue to step 3.
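If you would rather see just the essentials than the full `nvidia-smi` table, the standard query flags will print them directly:

```bash
# Print GPU model, driver version, and total VRAM as CSV
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```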
- Check whether Docker is installed on your system by running `docker --version`. If it is installed, skip to step 5.
- You can install Docker using the instructions here: https://docs.docker.com/engine/install/
- You will need the NVIDIA Container Toolkit to enable GPU access within docker containers. https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
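Once the toolkit is installed, a common smoke test is to run `nvidia-smi` inside a throwaway CUDA container (the image tag below is only an example; any CUDA base image will do):

```bash
# If this prints the same table you get from nvidia-smi on the host,
# Docker containers can see your GPU.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```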
- Clone this repo:
git clone https://github.com/domesticatedviking/TextyMcSpeechy
- To install packages, make scripts executable, choose the type of container you wish to run, and verify that the needed tools are installed, run the following from the `TextyMcSpeechy` directory:
sudo bash setup.sh
- Setup is complete. If you chose to use the prebuilt container from Docker Hub, it will download automatically the first time you use the `run_container.sh` script or start to train a model. Take note that it's a 6 GB download and over 10 GB when decompressed.
- Continue with the quick start guide to begin training models.
- The prebuilt docker container will install automatically - you don't need to download it. But if you want to anyway, run this:
docker image pull domesticatedviking/textymcspeechy-piper:latest
- To build your own image from the `Dockerfile` and `docker-compose.yml` in the main `TextyMcSpeechy` directory, change to that directory and run:
docker compose build
- Scripts are provided for launching the `textymcspeechy-piper` image, whether it is prebuilt or locally built.
  - `local_container_run.sh` launches images you have built yourself with `Dockerfile` and `docker-compose.yml`.
  - `prebuilt_container_run.sh` launches a prebuilt image.
  - `run_container.sh` is a script that functions as an alias for one of the scripts above. It is called by `run_training.sh` to automatically bring the container up when training starts (a sketch of this follows below).
  - `stop_container.sh` will shut down the `textymcspeechy-piper` container if it is running.
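As a rough sketch of that alias behavior (not the actual script, which is configured during setup), `run_container.sh` amounts to something like:

```bash
#!/bin/bash
# Hypothetical sketch only: delegate to whichever launcher matches the
# container type chosen during setup.
exec ./prebuilt_container_run.sh "$@"   # or: exec ./local_container_run.sh "$@"
```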
- Custom `espeak-ng` pronunciation rules can be defined in `tts_dojo/ESPEAK_RULES`. A guide for customizing pronunciation can be found here.
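For a sense of what defining a rule involves: `espeak-ng` pronunciations are typically customized by appending entries to a language's `_extra` dictionary file and recompiling. The word and phonemes below are just examples, and the commands assume you are in an espeak-ng `dictsource` directory (the repo's `container_apply_custom_rules.sh` automates this inside the container):

```bash
echo "gif   dZ'If" >> en_extra   # custom entry: soft-g pronunciation of "gif"
espeak-ng --compile=en           # recompile the English dictionary
espeak-ng -v en-us "gif"         # audible check
```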