This is still unstable! Expect breaking changes on version updates!
A Discord bot providing real-time Text-to-Speech (TTS) in voice channels using Pocket-TTS for fast, high-quality speech generation. Built with Python for cross-platform compatibility.
- Primarily needs `.safetensors` models generated from Pocket-TTS
- Automatic TTS for muted users
- Make sure Docker is installed
- Add voices to the `voices` directory
- Configure `.env` (Discord bot token, etc.)
- Run `docker compose up`
On Windows, install WSL2, then follow the instructions above.
- Store custom trained voices in a `voices` directory (create this yourself).
- Add new voices by:
  - Collecting audio samples (`wav`, `mp3`, `flac`, `m4a`, `ogg`, `opus`)
  - Using Pocket-TTS and the `export-voices.sh` script to convert, truncate, and export them
  - Placing the resulting `.safetensors` files in the `voices` directory
- `.wav` files may also be used as voices (untested), but they are slower; `.safetensors` is strongly recommended.
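A voice listing command is one of the wished-for features mentioned under contributions. As a possible starting point, here is a minimal sketch of voice discovery. It assumes a voice's name is its file name without the extension (as the `!multi` example suggests) and that a `.safetensors` file should win over a `.wav` with the same name. `find_voices` is a hypothetical helper, not part of the bot.

```python
from pathlib import Path

# Extensions treated as voices, in priority order (.safetensors preferred).
VOICE_EXTENSIONS = (".safetensors", ".wav")


def find_voices(voices_dir: Path) -> dict[str, Path]:
    """Map each voice name (file stem) to its file, preferring .safetensors."""
    voices: dict[str, Path] = {}
    if not voices_dir.is_dir():
        return voices  # no voices directory yet
    # Walk extensions in priority order; setdefault keeps the first match per name.
    for ext in VOICE_EXTENSIONS:
        for path in sorted(voices_dir.glob(f"*{ext}")):
            voices.setdefault(path.stem, path)
    return voices


if __name__ == "__main__":
    for name, path in sorted(find_voices(Path("voices")).items()):
        print(f"{name}: {path.name}")
```

A `!voices` command could format the keys of this dictionary into a reply; the command name is likewise hypothetical.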
Use the `export-voices.sh` script to convert any supported audio format to a mono, 44 kHz `.wav` truncated to 30 seconds, then export it to `.safetensors` using Pocket-TTS.
- Bash only (not Windows compatible)
- Requires `ffmpeg` and the Python package `uv`
- Usage: `./export-voices.sh <input_file_or_directory> <output_directory>`
- Example: `./export-voices.sh ./samples ./voices`
The `export_voices.py` script should do the same thing and is platform-independent.
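The conversion step described above can be sketched in Python by shelling out to ffmpeg. This covers only the ffmpeg part; the Pocket-TTS export to `.safetensors` is left out because its exact command is not documented here. It assumes "44kHz" means 44,100 Hz. `ffmpeg_command` and `convert_all` are hypothetical names and not the contents of `export_voices.py`.

```python
import subprocess
import sys
from pathlib import Path

# Input formats the export scripts accept, per this README.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".opus"}


def ffmpeg_command(src: Path, dst: Path, seconds: int = 30, rate: int = 44100) -> list[str]:
    """Build an ffmpeg call that truncates, downmixes to mono, and resamples."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", str(src),
        "-t", str(seconds),    # keep only the first N seconds
        "-ac", "1",            # one audio channel (mono)
        "-ar", str(rate),      # output sample rate in Hz
        str(dst),
    ]


def convert_all(input_path: Path, output_dir: Path) -> None:
    """Convert one file, or every supported file in a directory, to .wav."""
    output_dir.mkdir(parents=True, exist_ok=True)
    files = [input_path] if input_path.is_file() else sorted(input_path.iterdir())
    for src in files:
        if src.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip unsupported files
        dst = output_dir / f"{src.stem}.wav"
        subprocess.run(ffmpeg_command(src, dst), check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    convert_all(Path(sys.argv[1]), Path(sys.argv[2]))
```

Keep the output directory separate from the input directory: ffmpeg cannot overwrite a `.wav` input in place.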
See the Pocket-TTS documentation for full training details.
- Python 3.x required
- Install dependencies using `uv`: `uv sync`
- Install ffmpeg (see the ffmpeg download page)
- Create a `voices` directory and add your models
- Place your Discord bot token and config in a `.env` file
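Before starting the bot manually, a small script can confirm the setup steps above are in place. This is a hypothetical helper (`check_setup` is not part of the project); it only checks that `ffmpeg` and `uv` are on `PATH` and that the `voices` directory and `.env` file exist. It does not validate the contents of `.env`, since the variable names are not documented here.

```python
import shutil
from pathlib import Path


def check_setup(root: Path = Path(".")) -> list[str]:
    """Return a list of setup problems; an empty list means everything looks fine."""
    problems: list[str] = []
    # Both tools must be callable from PATH.
    for tool in ("ffmpeg", "uv"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    voices = root / "voices"
    if not voices.is_dir():
        problems.append("missing voices/ directory")
    elif not any(voices.glob("*.safetensors")) and not any(voices.glob("*.wav")):
        problems.append("voices/ contains no .safetensors or .wav files")
    if not (root / ".env").is_file():
        problems.append("missing .env file")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("-", issue)
    print("Setup looks OK" if not issues else f"{len(issues)} problem(s) found")
```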
- `!join` - Joins the voice channel the caller is in.
  - Listens for messages from users who are muted in the voice channel, in the text channel where this command was called.
  - It's recommended to use this command in a text channel alongside the voice channel (it's untested whether a voice channel's built-in text chat works).
- `!voice <voice_name>` - Set your TTS voice
  - Example: `!voice Joe`
- `!s <text>` - Speak text directly (no username prefix)
  - Example: `!s Hello world!`
- `!prefix <on|off>` - Toggle the 'User says:' prefix on or off.
- `!multi` - Plays dialog from different voices back to back. Example:

      !multi
      alba: Hello everyone! How are you doing?
      marius: I'm doing good!

  This depends on having `alba.safetensors` and `marius.safetensors` inside the `voices` directory.
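A `!multi` message like the one above could be split into (voice, text) pairs roughly as follows. This is a minimal sketch, assuming each line has the form `voice_name: text`, that the first colon separates the voice from the text (so the text may contain further colons), and that blank or malformed lines are skipped. `parse_multi` is hypothetical; the bot's real parser may behave differently.

```python
PREFIX = "!multi"


def parse_multi(message: str) -> list[tuple[str, str]]:
    """Split a !multi message into (voice_name, text) pairs, one per line."""
    body = message.strip()
    if body.startswith(PREFIX):
        body = body[len(PREFIX):]  # drop the command itself
    dialog: list[tuple[str, str]] = []
    for line in body.splitlines():
        # partition splits on the first colon only.
        voice, sep, text = line.partition(":")
        voice, text = voice.strip(), text.strip()
        if sep and voice and text:
            dialog.append((voice, text))
    return dialog


msg = "!multi\nalba: Hello everyone! How are you doing?\nmarius: I'm doing good!"
print(parse_multi(msg))
# [('alba', 'Hello everyone! How are you doing?'), ('marius', "I'm doing good!")]
```

Before synthesizing, the parsed voice names can be checked against the files in the `voices` directory so that a missing `marius.safetensors` is reported up front instead of halfway through the dialog.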
- Automatic TTS: Muted users' text messages are spoken aloud in voice channels.
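The rules behind automatic TTS and the `!prefix` toggle can be sketched as a pure function. It assumes that only messages from muted users in the channel where `!join` was called are spoken, and that the prefix takes the form "<username> says:". Ignoring messages that start with `!` is an extra assumption, so commands are not read aloud. `Message` and `utterance_for` are hypothetical and library-agnostic. With discord.py, the muted flag could presumably come from `member.voice.self_mute` or `member.voice.mute`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hypothetical, library-agnostic view of a chat message."""
    author_name: str
    author_muted: bool  # is the author muted in the bot's voice channel?
    channel_id: int
    content: str


def utterance_for(msg: Message, listen_channel_id: int, prefix_on: bool) -> Optional[str]:
    """Return the text to speak for msg, or None if it should be ignored."""
    if msg.channel_id != listen_channel_id or not msg.author_muted:
        return None  # wrong channel, or the author can talk normally
    text = msg.content.strip()
    if not text or text.startswith("!"):
        return None  # empty message or a bot command (assumption)
    return f"{msg.author_name} says: {text}" if prefix_on else text
```

Keeping this logic separate from the Discord event handler makes it easy to test without a live bot.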
There's a simple web UI built with FastAPI and HTMX that lets you play text outside of Discord. It has no authentication or other protection, so be cautious when deploying it publicly.
Contributions welcome! Please:
- Suggest new features
- Submit bug reports and pull requests
- Help extend functionality (such as adding a voice listing command)
Open an issue or PR on GitHub to get involved.