Dots TTS ComfyUI

English | 中文

ComfyUI custom nodes for rednote-hilab/dots.tts.

What's New in v0.1.3

Added an opt-in compile toggle to the bottom of Dots TTS Load Model using native PyTorch Inductor/Triton compilation.
Added CUDA, Triton, Inductor, compile-length, and cudaMallocAsync compatibility guards. CUDA Graph Trees are disabled automatically with cudaMallocAsync while Triton compilation remains active.
Compile works with SDPA and Flash Attention. Changing compile, model, device, dtype, or attention fully unloads the previous bundle before reloading.
Fixed streaming vocoder LSTM compilation without enabling global Dynamo settings that could affect other ComfyUI nodes.
Compiled graphs and static generation workspaces are cleared during manual unload.
The terminal now displays Preparing/compiling until the first audio patch is ready. The first run for each length bucket is slower while PyTorch compiles it.

Nodes

Dots TTS Load Model
Dots TTS Generate
Dots TTS Voice Clone
Dots TTS Whisper Transcribe

Models

The loader catalog shows the official Rednote checkpoints first, then the drbaph BF16 conversions:

dots.tts Base FP32 (auto-download) - rednote-hilab/dots.tts-base
dots.tts SOAR FP32 (auto-download) - rednote-hilab/dots.tts-soar
dots.tts MF FP32 (auto-download) - rednote-hilab/dots.tts-mf
dots.tts Base BF16 (auto-download) - drbaph/dots.tts-base-bf16
dots.tts SOAR BF16 (auto-download) - drbaph/dots.tts-soar-bf16
dots.tts MF BF16 (auto-download) - drbaph/dots.tts-mf-bf16

dots.tts Models (Quick Reference)

| Model | Recommended Steps (NFE) | CFG / Guidance Scale | Primary Use Case |
| --- | --- | --- | --- |
| dots.tts-base | 10–32 | 1.2 (adjustable) | Fine-tuning, research, full quality/latency control |
| dots.tts-soar | 10–32 | 1.2 (adjustable) | Highest-quality zero-shot voice cloning, best speaker similarity |
| dots.tts-mf | 4 | 0 | Low-latency production inference |

Simple Recommendation

Quality first → dots.tts-soar
Speed first → dots.tts-mf
Training / fine-tuning → dots.tts-base
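
The quick reference and recommendation above can be captured as a small lookup. This is an illustrative sketch only: `PRESETS`, `RECOMMENDED`, and `steps_in_range` are hypothetical names, not part of the node's API, and the values are copied from the table.

```python
# Hypothetical lookup built from the quick-reference table; not part of the node's API.
PRESETS = {
    "dots.tts-base": {"steps": (10, 32), "cfg": 1.2},
    "dots.tts-soar": {"steps": (10, 32), "cfg": 1.2},
    "dots.tts-mf": {"steps": (4, 4), "cfg": 0.0},
}

# Priority -> model, following the "Simple Recommendation" list.
RECOMMENDED = {
    "quality": "dots.tts-soar",
    "speed": "dots.tts-mf",
    "training": "dots.tts-base",
}


def steps_in_range(model: str, steps: int) -> bool:
    """Return True when `steps` falls inside the recommended range for `model`."""
    low, high = PRESETS[model]["steps"]
    return low <= steps <= high
```

For example, `steps_in_range("dots.tts-mf", 10)` is false because the MF checkpoint is listed with 4 steps.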

Downloaded model files are placed like this:

ComfyUI/
└── models/
    ├── dotstts/
    │   ├── common/
    │   │   ├── speaker_encoder.safetensors
    │   │   └── vocoder.safetensors
    │   ├── dots.tts-base/
    │   │   └── model.safetensors
    │   ├── dots.tts-soar/
    │   │   └── model.safetensors
    │   ├── dots.tts-mf/
    │   │   └── model.safetensors
    │   ├── dots.tts-base-bf16/
    │   │   └── dots.tts-base-bf16.safetensors
    │   ├── dots.tts-soar-bf16/
    │   │   └── dots.tts-soar-bf16.safetensors
    │   └── dots.tts-mf-bf16/
    │       └── dots.tts-mf-bf16.safetensors
    └── audio_encoders/
        ├── openai_whisper-large-v3-turbo/
        ├── openai_whisper-large-v3/
        ├── openai_whisper-medium/
        ├── openai_whisper-small/
        └── openai_whisper-tiny/
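
The weight-file naming in the tree above follows a simple pattern, sketched below. `weight_path` is a hypothetical helper written for illustration; the actual loader may resolve paths differently.

```python
from pathlib import Path


def weight_path(comfy_root: str, repo_id: str) -> Path:
    """Map a catalog repo id to the weight file location shown in the tree above."""
    name = repo_id.split("/")[-1]  # e.g. "dots.tts-base-bf16"
    # BF16 conversions are named after their folder;
    # the official FP32 checkpoints use model.safetensors.
    if name.endswith("-bf16"):
        filename = f"{name}.safetensors"
    else:
        filename = "model.safetensors"
    return Path(comfy_root) / "models" / "dotstts" / name / filename
```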

Small tokenizer/config assets are bundled in this custom node and separated by source model:

ComfyUI/
└── custom_nodes/
    └── Dots-TTS-ComfyUI/
        └── assets/
            ├── dots.tts-base/
            │   ├── added_tokens.json
            │   ├── chat_template.jinja
            │   ├── config.json
            │   ├── latent_stats.pt
            │   ├── llm_config.json
            │   ├── merges.txt
            │   ├── special_tokens_map.json
            │   ├── tokenizer.json
            │   ├── tokenizer_config.json
            │   └── vocab.json
            ├── dots.tts-soar/
            │   └── same small-file set
            └── dots.tts-mf/
                └── same small-file set

BF16 entries use the matching source-model assets. For example, drbaph/dots.tts-base-bf16 uses assets/dots.tts-base/.
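
A minimal sketch of that mapping, assuming the BF16 repos always carry a `-bf16` suffix as in the catalog above. `assets_dir` is a hypothetical helper, not a function exported by this node.

```python
from pathlib import Path

BF16_SUFFIX = "-bf16"


def assets_dir(node_root: str, repo_id: str) -> Path:
    """Return the bundled assets folder for a catalog entry."""
    name = repo_id.split("/")[-1]
    # A BF16 conversion reuses the small files of the model it was converted from.
    if name.endswith(BF16_SUFFIX):
        name = name[: -len(BF16_SUFFIX)]
    return Path(node_root) / "assets" / name
```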

Shared heavy assets come from drbaph/dots.tts-common and are stored under ComfyUI/models/dotstts/common/. The common repo files live at the repo root:

drbaph/dots.tts-common/speaker_encoder.safetensors
drbaph/dots.tts-common/vocoder.safetensors

At load time the node assembles an upstream-compatible runtime cache under runtime/ using links/copies from node assets, shared heavy assets, and the selected model weight. The loader uses Hugging Face directly and does not use HF mirrors.
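
The "links/copies" approach can be sketched as a link-first, copy-on-failure helper. This is an assumption about how such a cache could be assembled, not the node's actual code; `link_or_copy` and `assemble_runtime` are hypothetical names, and the flat directory layout is chosen only for illustration.

```python
import os
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Place `src` at `dst` via symlink, falling back to a copy. Returns the method used."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.exists() or dst.is_symlink():
        dst.unlink()  # replace stale entries from an earlier load
    try:
        os.symlink(src.resolve(), dst)
        return "link"
    except OSError:
        # Symlinks can fail, for example on Windows without the required privilege.
        shutil.copy2(src, dst)
        return "copy"


def assemble_runtime(runtime_dir, sources):
    """Gather files from several locations into one flat directory."""
    placed = []
    for src in sources:
        dst = Path(runtime_dir) / Path(src).name
        link_or_copy(Path(src), dst)
        placed.append(dst)
    return placed
```

Linking avoids duplicating multi-gigabyte weights on disk, while the copy fallback keeps the cache usable on filesystems that reject symlinks.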

Generation Limits

max_audio_patches on both Generate and Voice Clone is the maximum audio patch budget for that generation, not a text-token limit. The default is 500. With the bundled configs, one patch is about 0.32 seconds, so 500 is about 160 seconds of audio budget. The model can stop earlier when it reaches EOS; very long text can hit the cap and end early. Voice Clone prompt audio paired with reference_text also consumes part of this budget.
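
The budget arithmetic above can be checked with a short sketch. The 0.32 seconds-per-patch figure is the approximate value quoted for the bundled configs; `prompt_patches` is a hypothetical parameter, and treating prompt audio as consuming roughly its own duration in patches is an assumption rather than documented behavior.

```python
import math

SECONDS_PER_PATCH = 0.32  # approximate figure quoted above for the bundled configs


def audio_budget_seconds(max_audio_patches: int, prompt_patches: int = 0) -> float:
    """Approximate seconds left for new audio after prompt audio takes its share."""
    remaining = max(max_audio_patches - prompt_patches, 0)
    return remaining * SECONDS_PER_PATCH


def patches_for_seconds(seconds: float) -> int:
    """Smallest patch count that covers `seconds` of audio."""
    return math.ceil(seconds / SECONDS_PER_PATCH)
```

With the default of 500 patches, `audio_budget_seconds(500)` gives about 160 seconds. Under the assumption above, a 10-second reference clip would take about 32 patches and leave roughly 150 seconds.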

Generation uses a live tqdm terminal progress bar with percentage, elapsed time, estimated remaining time, and iteration speed. Since Dots TTS decides its final length by EOS during generation, the live total is the configured max_audio_patches ceiling; after a successful early stop, the completed bar is normalized to the actual emitted chunk count.

Performance

The loader's optional compile toggle uses upstream's native torch.compile path with PyTorch Inductor and Triton. It is CUDA-only, requires a working Triton installation, and is compatible with both SDPA and Flash Attention. When ComfyUI uses the cudaMallocAsync allocator, the node automatically disables incompatible CUDA Graph Trees while keeping Inductor/Triton compilation enabled. Compilation is lazy: the first generation for each max_audio_patches length bucket is slower while the graph is compiled, then later generations reuse it. Compiled mode supports up to 1024 audio patches. Changing the model, device, dtype, attention, or compile setting fully unloads the active bundle before loading the new one; manual unload also clears compiled graphs and generation workspaces.
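
The "any setting change fully unloads the bundle" rule can be illustrated with a single-slot cache keyed on the five settings. This is a sketch under stated assumptions, not the node's implementation: `BundleKey`, `BundleCache`, and `check_compile_length` are hypothetical names, and the load and unload callbacks stand in for the real model handling.

```python
from dataclasses import dataclass

MAX_COMPILED_PATCHES = 1024  # limit quoted above for compiled mode


@dataclass(frozen=True)
class BundleKey:
    model: str
    device: str
    dtype: str
    attention: str
    compile: bool


class BundleCache:
    """Holds at most one loaded bundle; any key change drops the old one first."""

    def __init__(self, load_fn, unload_fn):
        self._load = load_fn
        self._unload = unload_fn
        self._key = None
        self._bundle = None

    def get(self, key: BundleKey):
        if key != self._key:
            self.unload()  # fully release the previous bundle before loading
            self._bundle = self._load(key)
            self._key = key
        return self._bundle

    def unload(self):
        # Manual unload also goes through here, so cleanup happens in one place.
        if self._bundle is not None:
            self._unload(self._bundle)
        self._key = None
        self._bundle = None


def check_compile_length(max_audio_patches: int, compile_enabled: bool) -> None:
    """Reject patch budgets above the compiled-mode limit."""
    if compile_enabled and max_audio_patches > MAX_COMPILED_PATCHES:
        raise ValueError(
            f"compiled mode supports up to {MAX_COMPILED_PATCHES} audio patches"
        )
```

Unloading before loading, rather than after, keeps two bundles from occupying GPU memory at the same time.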

For the fastest model path, use the MF BF16 checkpoint with steps=4. Smaller max_audio_patches values can also select a smaller compile bucket and reduce compile time and workspace memory. Upstream recommends splitting long text into shorter segments and keeping voice-clone reference audio around 10 seconds.

Languages

Officially benchmarked languages (24): Chinese, English, Cantonese, Japanese, Korean, Arabic, Spanish, Turkish, Indonesian, Portuguese, French, Italian, Dutch, Vietnamese, German, Russian, Ukrainian, Thai, Polish, Romanian, Greek, Czech, Finnish, and Hindi. The model may handle other languages, but only these are officially benchmarked. Quality varies by language, so test your target language before relying on it.

The language dropdown is kept to those 24 languages, plus auto and none: AR, YUE, ZH, CS, NL, EN, FI, FR, DE, EL, HI, ID, IT, JA, KO, PL, PT, RO, RU, ES, TH, TR, UK, VI.
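
For reference, the dropdown codes map to the benchmarked languages as follows. The dictionary below is an illustrative sketch: `LANGUAGE_CODES` and `DROPDOWN` are hypothetical names, and placing auto and none ahead of the codes is an assumption, not the node's confirmed ordering.

```python
# Dropdown code -> benchmarked language, in the order the codes are listed above.
LANGUAGE_CODES = {
    "AR": "Arabic", "YUE": "Cantonese", "ZH": "Chinese", "CS": "Czech",
    "NL": "Dutch", "EN": "English", "FI": "Finnish", "FR": "French",
    "DE": "German", "EL": "Greek", "HI": "Hindi", "ID": "Indonesian",
    "IT": "Italian", "JA": "Japanese", "KO": "Korean", "PL": "Polish",
    "PT": "Portuguese", "RO": "Romanian", "RU": "Russian", "ES": "Spanish",
    "TH": "Thai", "TR": "Turkish", "UK": "Ukrainian", "VI": "Vietnamese",
}

# Assumed ordering: the two special options first, then the language codes.
DROPDOWN = ["auto", "none"] + list(LANGUAGE_CODES)
```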

Install

ComfyUI-Manager (recommended): Open ComfyUI-Manager, search for Dots TTS, and click Install. ComfyUI-Manager will handle everything automatically.

Manual helper install with uv:

python -m uv pip install -r requirements.txt

Manual helper install with pip:

python -m pip install -r requirements.txt

The installer protects ComfyUI's core runtime packages and will not automatically upgrade torch, torchaudio, torchvision, transformers, or pydantic.
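
One way such protection could work is to filter protected names out of the requirement list before installing. The sketch below is an assumption about the approach, not a copy of install.py; `PROTECTED` mirrors the packages named above, while `requirement_name`, `installable`, and the sample requirement lines are hypothetical.

```python
import re

PROTECTED = {"torch", "torchaudio", "torchvision", "transformers", "pydantic"}


def requirement_name(line: str) -> str:
    """Extract the normalized distribution name from a simple requirement line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    if not match:
        return ""
    # Normalize runs of "-", "_" and "." to a single "-", then lowercase.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def installable(lines):
    """Drop blanks, comments, and protected packages from requirement lines."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if requirement_name(stripped) in PROTECTED:
            continue  # leave ComfyUI's core runtime packages untouched
        kept.append(stripped)
    return kept
```

For example, `installable(["torch>=2.1", "soundfile", "pydantic>=2"])` keeps only `soundfile`.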

Notes

Dots upstream recommends recent transformers and pydantic v2. This node warns about those versions instead of changing them automatically, because surprise upgrades can break other ComfyUI nodes.

Audio file I/O tries soundfile first. Speaker features use the original torchaudio/Kaldi fbank path when it is available and fall back to a torchaudio-free implementation when the torchaudio install is broken.

References

Citation

@article{dotstts2026,
  title   = {dots.tts Technical Report},
  author  = {dots.tts Team},
  journal = {arXiv preprint},
  year    = {2026},
}

License

Released under Apache-2.0.
