This project is the first implementation of Thai Text-to-Speech (TTS) over the Wyoming protocol, making it fully compatible with Home Assistant. It enables local, streaming Thai voice synthesis for smarter automations and AI assistants, with no cloud required.
Bring your local AI to life in the Thai language with seamless integration, low latency, and a privacy-first design.
All model weights are provided by VIZINTZOR via Hugging Face:
- VITS Thai Female/Male: MMS-TTS-THAI-FEMALEV2, MMS-TTS-THAI-MALEV2
- F5-TTS Thai: F5-TTS-THAI, F5-TTS-TH-V2
Please acknowledge and cite VIZINTZOR if you use these models in your work.
For best quality and performance, use F5-TTS v1.
You can run the server using either direct uv commands or the provided entrypoint.sh script (recommended for Docker and easy switching).
uv run python src/wyoming_thai_vits.py --log-level INFO --host 0.0.0.0 --port 10200 \
--model-id VIZINTZOR/MMS-TTS-THAI-FEMALEV2
uv run python src/wyoming_thai_vits.py --log-level INFO --host 0.0.0.0 --port 10200 \
--model-id VIZINTZOR/MMS-TTS-THAI-MALEV2
uv run python src/wyoming_thai_f5.py --log-level INFO --host 0.0.0.0 --port 10200 \
--model-version v1
uv run python src/wyoming_thai_f5.py --log-level INFO --host 0.0.0.0 --port 10200 \
--model-version v2
Set the backend via the THTTS_BACKEND environment variable:
- VITS for the VITS model
- F5_V1 for F5-TTS v1 (recommended)
- F5_V2 for F5-TTS v2
Example:
THTTS_BACKEND=F5_V1 ./entrypoint.sh
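The relationship between a backend name and the direct uv commands can be sketched in Python. This is only an illustration of the mapping implied by the commands in this README, not the actual contents of entrypoint.sh; the function name build_command and its default arguments are hypothetical.

```python
def build_command(backend, host="0.0.0.0", port=10200, log_level="INFO",
                  model_id="VIZINTZOR/MMS-TTS-THAI-FEMALEV2"):
    """Return the uv command line (as a list) for a THTTS_BACKEND value.

    Hypothetical helper: it mirrors the documented commands and is an
    assumption about what the entrypoint does, not a copy of it.
    """
    common = ["--log-level", log_level, "--host", host, "--port", str(port)]
    if backend == "VITS":
        # The VITS server takes a Hugging Face model ID.
        return ["uv", "run", "python", "src/wyoming_thai_vits.py",
                *common, "--model-id", model_id]
    if backend in ("F5_V1", "F5_V2"):
        # "F5_V1" -> "v1", "F5_V2" -> "v2"
        version = backend.split("_")[1].lower()
        return ["uv", "run", "python", "src/wyoming_thai_f5.py",
                *common, "--model-version", version]
    raise ValueError(f"unknown backend: {backend!r}")
```

Joining the returned list with spaces, for example `" ".join(build_command("F5_V1"))`, gives a command equivalent to the F5-TTS v1 invocation shown earlier.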
You can override other parameters via environment variables (see below).
Variable | Default Value | Description |
---|---|---|
THTTS_BACKEND | VITS | Model backend: VITS, F5_V1, or F5_V2 |
THTTS_HOST | 0.0.0.0 | Bind address |
THTTS_PORT | 10200 | Port to listen on |
THTTS_LOG_LEVEL | INFO | Log level (DEBUG, INFO, etc.) |
THTTS_MODEL | VIZINTZOR/MMS-TTS-THAI-FEMALEV2 | VITS model ID |
THTTS_REF_AUDIO | hf_sample | F5 reference audio path |
THTTS_REF_TEXT | (empty) | F5 reference transcript |
THTTS_DEVICE | auto | auto, cpu, or cuda |
THTTS_SPEED | 1.0 | F5 speech speed multiplier |
THTTS_NFE_STEPS | 32 | F5 denoising steps |
THTTS_MAX_CONCURRENT | 1 | Max concurrent synth requests |
THTTS_CKPT_FILE | (auto-selected by backend) | F5 checkpoint file path |
THTTS_VOCAB_FILE | (auto-selected by backend) | F5 vocab file path |
THTTS_SPEAK_SPEED | | |
THTTS_MAX_WAIT_MS | | |
THTTS_MIN_SENT_CHARS | | |
THTTS_VOICES_YAML | | Voices list YAML (for multiple voice support) (see [#voice-list-file]) |
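As a rough illustration of how these variables and their documented defaults could be read and checked before starting a server, here is a minimal Python sketch. The Settings class and load_settings function are hypothetical names, only a subset of the variables is covered, and the validation rules (allowed backends, allowed devices, port range) are assumptions drawn from the descriptions in the table rather than from the project's code.

```python
import os
from dataclasses import dataclass

BACKENDS = {"VITS", "F5_V1", "F5_V2"}
DEVICES = {"auto", "cpu", "cuda"}


@dataclass(frozen=True)
class Settings:
    backend: str
    host: str
    port: int
    log_level: str
    device: str
    speed: float
    nfe_steps: int
    max_concurrent: int


def load_settings(env=None):
    """Build Settings from a mapping (os.environ by default).

    Defaults are the ones documented in the table; the checks below
    are illustrative assumptions.
    """
    env = os.environ if env is None else env
    settings = Settings(
        backend=env.get("THTTS_BACKEND", "VITS"),
        host=env.get("THTTS_HOST", "0.0.0.0"),
        port=int(env.get("THTTS_PORT", "10200")),
        log_level=env.get("THTTS_LOG_LEVEL", "INFO"),
        device=env.get("THTTS_DEVICE", "auto"),
        speed=float(env.get("THTTS_SPEED", "1.0")),
        nfe_steps=int(env.get("THTTS_NFE_STEPS", "32")),
        max_concurrent=int(env.get("THTTS_MAX_CONCURRENT", "1")),
    )
    if settings.backend not in BACKENDS:
        raise ValueError(f"THTTS_BACKEND must be one of {sorted(BACKENDS)}")
    if settings.device not in DEVICES:
        raise ValueError(f"THTTS_DEVICE must be one of {sorted(DEVICES)}")
    if not 1 <= settings.port <= 65535:
        raise ValueError("THTTS_PORT must be between 1 and 65535")
    return settings
```

Passing a plain dictionary, such as `load_settings({"THTTS_BACKEND": "F5_V1"})`, makes the parsing easy to try without touching the real environment.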
To serve multiple voices at the same time, set THTTS_VOICES_YAML to the path of a YAML file with the following structure:
- name: default
  attribution:
    name: VIZINTZOR/F5-TTS-THAI
    url: https://huggingface.co/VIZINTZOR/F5-TTS-THAI
  languages: ["th", "th-TH"]
  description: Default Original
  installed: true
  version: "1.0"
  ref_sound_path: /mnt/data/services/thtts/ref_sound/original__ฉันเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย.wav
  ref_sound_sentence: ฉันเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย
- name: meme
  attribution:
    name: VIZINTZOR/F5-TTS-THAI
    url: https://huggingface.co/VIZINTZOR/F5-TTS-THAI
  languages: ["th", "th-TH"]
  description: meme Female
  installed: true
  version: "1.0"
  ref_sound_path: /mnt/data/services/thtts/ref_sound/meme__ชั้นเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย.mp3
  ref_sound_sentence: ชั้นเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย
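Before pointing the server at a voices file, it can help to sanity-check the entries. The sketch below works on the list of dictionaries you would get after parsing the file with a YAML library (for example PyYAML's `yaml.safe_load`). The function name check_voices and the choice of required keys are assumptions made for illustration; the server itself may accept or require a different set of fields.

```python
# Keys assumed to be needed for each voice (an assumption, not the server's rule).
REQUIRED_KEYS = ("name", "languages", "ref_sound_path", "ref_sound_sentence")


def check_voices(voices):
    """Return a list of human-readable problems found in parsed voice entries."""
    problems = []
    seen_names = set()
    for index, voice in enumerate(voices):
        label = voice.get("name") or f"entry {index}"
        # Report every required key that is absent or empty.
        for key in REQUIRED_KEYS:
            if not voice.get(key):
                problems.append(f"{label}: missing '{key}'")
        # Voice names are assumed to act as identifiers, so flag duplicates.
        if label in seen_names:
            problems.append(f"{label}: duplicate name")
        seen_names.add(label)
    return problems
```

An empty result means every entry has the assumed fields and a unique name; it does not confirm that the reference audio files exist or match their transcripts.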
services:
  thtts:
    image: ghcr.io/zen3515/thtts:latest
    container_name: thtts
    restart: unless-stopped
    shm_size: "2g" # please adjust
    environment:
      - THTTS_BACKEND=F5_V1
      - THTTS_HOST=0.0.0.0
      - THTTS_PORT=10200
      - THTTS_LOG_LEVEL=INFO
      - THTTS_DEVICE=auto
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    ports:
      - "10200:10200"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
Note:
- Make sure you have NVIDIA Container Toolkit installed.
- Adjust THTTS_BACKEND and other environment variables as needed.
printf '{"type":"describe","data":{}}\n' | nc 127.0.0.1 10200
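The same check can be scripted in Python. The sketch below sends the describe event and reads one reply. It assumes the Wyoming event framing as publicly documented: a one-line JSON header, optionally followed by `data_length` bytes of extra JSON data and `payload_length` bytes of binary payload. A server that follows the protocol is expected to answer with an `info` event, but the exact contents depend on the server. The function names here are hypothetical.

```python
import json
import socket


def describe_request():
    """Encode the describe event as a single JSON line."""
    return (json.dumps({"type": "describe", "data": {}}) + "\n").encode("utf-8")


def read_event(stream):
    """Read one event from a binary file-like object.

    Returns (type, data, payload). Framing is assumed to follow the
    Wyoming header / data_length / payload_length layout.
    """
    header_line = stream.readline()
    if not header_line:
        raise ConnectionError("server closed the connection")
    header = json.loads(header_line)
    data = dict(header.get("data") or {})
    data_length = header.get("data_length") or 0
    if data_length:
        # Extra JSON data follows the header line.
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return header["type"], data, payload


def describe(host="127.0.0.1", port=10200, timeout=5.0):
    """Connect, send describe, and return the first event the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(describe_request())
        with sock.makefile("rb") as stream:
            return read_event(stream)
```

With a server running locally, `event_type, data, _ = describe()` returns the type and data of the reply so the advertised voices can be inspected.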
To use it, connect the server to Home Assistant as a Wyoming service. It is probably the most up-to-spec implementation of the Wyoming protocol.
See individual model pages on Hugging Face for license details.