Zen3515/THTTS

THTTS (Thai TTS)

This project is the first implementation of Thai Text-to-Speech (TTS) over the Wyoming protocol, which makes it fully compatible with Home Assistant. It provides local, streaming Thai voice synthesis for automations and AI assistants, with no cloud service required.

Bring your local AI to life in Thai with seamless integration, low latency, and a privacy-first design.

Model Attribution

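All model weights are provided by VIZINTZOR via Hugging Face. The models referenced in this README are:

  • VIZINTZOR/MMS-TTS-THAI-FEMALEV2
  • VIZINTZOR/MMS-TTS-THAI-MALEV2
  • VIZINTZOR/F5-TTS-THAI (https://huggingface.co/VIZINTZOR/F5-TTS-THAI)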

Please acknowledge and cite VIZINTZOR if you use these models in your work.


Recommended Model

For best quality and performance, use F5-TTS v1.


How to Run

You can run the server using either direct uv commands or the provided entrypoint.sh script (recommended for Docker and easy switching).

1. Using uv Directly

VITS Thai (Female/Male)

uv run python src/wyoming_thai_vits.py --log-level INFO --host 0.0.0.0 --port 10200 \
  --model-id VIZINTZOR/MMS-TTS-THAI-FEMALEV2

uv run python src/wyoming_thai_vits.py --log-level INFO --host 0.0.0.0 --port 10200 \
  --model-id VIZINTZOR/MMS-TTS-THAI-MALEV2

F5-TTS Thai v1 (Recommended)

uv run python src/wyoming_thai_f5.py --log-level INFO --host 0.0.0.0 --port 10200 \
  --model-version v1

F5-TTS Thai v2

uv run python src/wyoming_thai_f5.py --log-level INFO --host 0.0.0.0 --port 10200 \
  --model-version v2

2. Using entrypoint.sh (Recommended)

Set the backend via the THTTS_BACKEND environment variable:

  • VITS for VITS model
  • F5_V1 for F5-TTS v1 (recommended)
  • F5_V2 for F5-TTS v2

Example:

THTTS_BACKEND=F5_V1 ./entrypoint.sh

You can override other parameters via environment variables (see below).
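As a rough illustration of the backend switch, the sketch below maps THTTS_BACKEND to the uv commands listed under "Using uv Directly". The function name `build_command` and the mapping table are hypothetical; the real entrypoint.sh may assemble its command differently and pass more options.

```python
# Mapping taken from the "Using uv Directly" commands in this README.
# Assumption: entrypoint.sh does something equivalent; this is not its source.
BACKEND_COMMANDS = {
    "VITS": ["src/wyoming_thai_vits.py"],
    "F5_V1": ["src/wyoming_thai_f5.py", "--model-version", "v1"],
    "F5_V2": ["src/wyoming_thai_f5.py", "--model-version", "v2"],
}


def build_command(env):
    """Return the uv command line for the backend selected in env (a dict)."""
    backend = env.get("THTTS_BACKEND", "VITS")
    if backend not in BACKEND_COMMANDS:
        raise ValueError(f"unknown THTTS_BACKEND: {backend!r}")
    script, *extra = BACKEND_COMMANDS[backend]
    cmd = [
        "uv", "run", "python", script,
        "--log-level", env.get("THTTS_LOG_LEVEL", "INFO"),
        "--host", env.get("THTTS_HOST", "0.0.0.0"),
        "--port", env.get("THTTS_PORT", "10200"),
    ]
    if backend == "VITS":
        # Only the VITS server takes a model ID in the commands shown above.
        cmd += ["--model-id", env.get("THTTS_MODEL", "VIZINTZOR/MMS-TTS-THAI-FEMALEV2")]
    return cmd + extra
```

Calling `build_command(dict(os.environ))` after `import os` would show the command that corresponds to the current environment.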


Environment Variables

| Variable | Default Value | Description |
|---|---|---|
| THTTS_BACKEND | VITS | Model backend: VITS, F5_V1, or F5_V2 |
| THTTS_HOST | 0.0.0.0 | Bind address |
| THTTS_PORT | 10200 | Port to listen on |
| THTTS_LOG_LEVEL | INFO | Log level (DEBUG, INFO, etc.) |
| THTTS_MODEL | VIZINTZOR/MMS-TTS-THAI-FEMALEV2 | VITS model ID |
| THTTS_REF_AUDIO | hf_sample | F5 reference audio path |
| THTTS_REF_TEXT | (empty) | F5 reference transcript |
| THTTS_DEVICE | auto | auto, cpu, or cuda |
| THTTS_SPEED | 1.0 | F5 speech speed multiplier |
| THTTS_NFE_STEPS | 32 | F5 denoising steps |
| THTTS_MAX_CONCURRENT | 1 | Max concurrent synthesis requests |
| THTTS_CKPT_FILE | (auto-selected by backend) | F5 checkpoint file path |
| THTTS_VOCAB_FILE | (auto-selected by backend) | F5 vocab file path |
| THTTS_SPEAK_SPEED | | |
| THTTS_MAX_WAIT_MS | | |
| THTTS_MIN_SENT_CHARS | | |
| THTTS_VOICES_YAML | | Path to a voices list YAML file for multiple-voice support (see "Voices List YAML File") |
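To make the defaults concrete, here is a minimal sketch of reading these variables with the documented default values. The `Settings` class and `load_settings` function are hypothetical names for illustration; the project's own parsing may differ, and variables without a documented default are left out.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    # Field names mirror the THTTS_* variables; defaults come from the table.
    backend: str = "VITS"
    host: str = "0.0.0.0"
    port: int = 10200
    log_level: str = "INFO"
    model: str = "VIZINTZOR/MMS-TTS-THAI-FEMALEV2"
    ref_audio: str = "hf_sample"
    ref_text: str = ""
    device: str = "auto"
    speed: float = 1.0
    nfe_steps: int = 32
    max_concurrent: int = 1


def load_settings(env=None):
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    values = {}
    for name, default in vars(Settings()).items():
        raw = env.get("THTTS_" + name.upper())
        # Convert the string to the type of the documented default.
        values[name] = default if raw is None else type(default)(raw)
    settings = Settings(**values)
    if settings.backend not in ("VITS", "F5_V1", "F5_V2"):
        raise ValueError(f"unsupported THTTS_BACKEND: {settings.backend!r}")
    if settings.device not in ("auto", "cpu", "cuda"):
        raise ValueError(f"unsupported THTTS_DEVICE: {settings.device!r}")
    return settings
```

Passing an explicit dict, for example `load_settings({"THTTS_PORT": "10300"})`, keeps the sketch easy to try without touching the real environment.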

Voices List YAML File

Set THTTS_VOICES_YAML to the path of a YAML file like the following to serve multiple voices at the same time:

- name: default
  attribution:
    name: VIZINTZOR/F5-TTS-THAI
    url: https://huggingface.co/VIZINTZOR/F5-TTS-THAI
  languages: ["th", "th-TH"]
  description: Default Original
  installed: true
  version: "1.0"
  ref_sound_path: /mnt/data/services/thtts/ref_sound/original__ฉันเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย.wav
  ref_sound_sentence: ฉันเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย

- name: meme
  attribution:
    name: VIZINTZOR/F5-TTS-THAI
    url: https://huggingface.co/VIZINTZOR/F5-TTS-THAI
  languages: ["th", "th-TH"]
  description: meme Female
  installed: true
  version: "1.0"
  ref_sound_path: /mnt/data/services/thtts/ref_sound/meme__ชั้นเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย.mp3
  ref_sound_sentence: ชั้นเดินทางไปเที่ยวที่จังหวัดเชียงใหม่ในช่วงฤดูหนาวเพื่อสัมผัสอากาศเย็นสบาย
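Before pointing THTTS_VOICES_YAML at a file, it can help to sanity-check the entries. The sketch below is an assumption-laden example, not the server's own validation: it treats `name`, `ref_sound_path`, and `ref_sound_sentence` as required and expects unique names. `validate_voices` and `load_voices` are hypothetical helpers, and `load_voices` relies on the third-party PyYAML package.

```python
# Assumed to be required for every voice; the server may accept fewer keys.
REQUIRED_KEYS = ("name", "ref_sound_path", "ref_sound_sentence")


def validate_voices(voices):
    """Return a list of problems found in a parsed voices list (empty if none)."""
    if not isinstance(voices, list):
        return ["top level must be a list of voice entries"]
    problems = []
    seen = set()
    for index, voice in enumerate(voices):
        if not isinstance(voice, dict):
            problems.append(f"entry {index}: not a mapping")
            continue
        label = voice.get("name", f"entry {index}")
        for key in REQUIRED_KEYS:
            if not voice.get(key):
                problems.append(f"{label}: missing {key}")
        name = voice.get("name")
        if isinstance(name, str):
            if name in seen:
                problems.append(f"{label}: duplicate name")
            seen.add(name)
    return problems


def load_voices(path):
    """Parse the YAML file at path (requires PyYAML: pip install pyyaml)."""
    import yaml  # imported lazily so validate_voices works without PyYAML

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)
```

A typical check would be `print(validate_voices(load_voices("voices.yaml")))`, where `voices.yaml` is a placeholder file name.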

3. Docker Compose (NVIDIA GPU)

services:
  thtts:
    image: ghcr.io/zen3515/thtts:latest
    container_name: thtts
    restart: unless-stopped
    shm_size: "2g" # please adjust
    environment:
      - THTTS_BACKEND=F5_V1
      - THTTS_HOST=0.0.0.0
      - THTTS_PORT=10200
      - THTTS_LOG_LEVEL=INFO
      - THTTS_DEVICE=auto
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    ports:
      - "10200:10200"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Note:

  • Make sure you have the NVIDIA Container Toolkit installed.
  • Adjust the THTTS_BACKEND and other environment variables as needed.

How to Test

Query Info

printf '{"type":"describe","data":{}}\n' | nc 127.0.0.1 10200
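The same query can be made from Python, which also makes the reply easier to inspect. This is a minimal sketch assuming the usual Wyoming framing: one JSON header line, optionally followed by `data_length` bytes of extra JSON and `payload_length` bytes of binary payload. The helper names `read_event` and `describe` are hypothetical.

```python
import json
import socket


def read_event(stream):
    """Read one Wyoming event from a binary file-like object.

    Returns (type, data, payload), or None if the stream is closed.
    """
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = header.get("data") or {}
    data_length = header.get("data_length") or 0
    if data_length:
        # Extra JSON data follows the header line and is merged into data.
        data.update(json.loads(stream.read(data_length)))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return header["type"], data, payload


def describe(host="127.0.0.1", port=10200, timeout=10.0):
    """Send a describe event and return the first event the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b'{"type":"describe","data":{}}\n')
        with sock.makefile("rb") as stream:
            return read_event(stream)
```

With the server running, `describe()` should return an event whose data lists the available TTS voices.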

Synthesize Speech

The simplest test is to connect the server to Home Assistant, which is probably the most spec-compliant Wyoming protocol client.
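For a quick check without Home Assistant, a small client can request speech and save it as a WAV file. The sketch below assumes the server accepts a plain Wyoming `synthesize` event and answers with `audio-start`, `audio-chunk`, and `audio-stop` events carrying raw PCM; the helper names are hypothetical and this has not been verified against this server.

```python
import json
import socket
import wave


def read_event(stream):
    """Read one Wyoming event: (type, data, payload), or None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = header.get("data") or {}
    if header.get("data_length"):
        data.update(json.loads(stream.read(header["data_length"])))
    length = header.get("payload_length") or 0
    return header["type"], data, stream.read(length) if length else b""


def collect_audio(stream, wav_path):
    """Write audio-chunk payloads from stream to wav_path; return PCM bytes written."""
    total, wav = 0, None
    try:
        while True:
            event = read_event(stream)
            if event is None or event[0] == "audio-stop":
                break
            kind, data, payload = event
            if kind in ("audio-start", "audio-chunk") and wav is None:
                # The first audio event carries the PCM format.
                wav = wave.open(wav_path, "wb")
                wav.setframerate(data["rate"])
                wav.setsampwidth(data["width"])
                wav.setnchannels(data["channels"])
            if kind == "audio-chunk":
                wav.writeframes(payload)
                total += len(payload)
    finally:
        if wav is not None:
            wav.close()
    return total


def synthesize(text, wav_path, host="127.0.0.1", port=10200, timeout=60.0):
    """Send a synthesize event for text and save the returned audio."""
    request = json.dumps({"type": "synthesize", "data": {"text": text}})
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("utf-8") + b"\n")
        with sock.makefile("rb") as stream:
            return collect_audio(stream, wav_path)
```

For example, `synthesize("สวัสดี", "out.wav")` would write the reply to `out.wav` in the current directory.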


License

See individual model pages on Hugging Face for license details.
