A TTS model capable of generating ultra-realistic dialogue in one pass. This is a fork of dia with 4-bit quantization.

🎙️ Dia Q4 Bit: Text-to-Speech Dialogue Model

Dia is a 1.6B parameter text-to-speech model developed by Nari Labs, designed to generate highly realistic dialogues directly from transcripts. It supports emotional conditioning and non-verbal cues such as laughter, coughing, and more.

🧪 Pretrained model checkpoints and inference code are publicly available to accelerate research and experimentation.
💬 English only (for now).
🔬 Demo available on Hugging Face Spaces.


🔧 About This Fork

This is a community-enhanced fork of the original nari-labs/dia repository. It focuses on performance optimizations and accessibility for personal hardware.

🚀 Enhancements in This Fork:

  • 🧠 Quantization (4-bit & INT8)
    Reduces VRAM usage by nearly 50%, from roughly 10 GB at float16 to under 6 GB in practice, so the model runs on GPUs with 8 GB of VRAM.

  • Flash Attention Integration
    Experimental support for Flash Attention to speed up inference and reduce memory footprint.

  • 🔄 Continued Development
    Ongoing efforts to improve inference speed, memory efficiency, and model accessibility.
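
As a rough sanity check on the quantization savings listed above, the sketch below estimates the memory taken by the weights alone at each precision. It assumes all 1.6B parameters are stored at the stated bit width, which may not hold for every layer in this fork, and it ignores activations, caches, and the audio codec, so measured VRAM usage will be higher than these figures. The function name `weight_gigabytes` is made up for this illustration.

```python
# Back-of-the-envelope estimate of weight memory only (not total VRAM).
# Assumption: every parameter is stored at the given bit width.
PARAMS = 1.6e9  # parameter count stated for Dia

BITS_PER_PARAM = {"float32": 32, "float16": 16, "int8": 8, "4bit": 4}


def weight_gigabytes(num_params: float, bits: int) -> float:
    """Return the size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: ~{weight_gigabytes(PARAMS, bits):.1f} GB of weights")
```

Halving the bit width halves the weight footprint; the rest of the VRAM budget is runtime overhead that quantizing the weights does not shrink.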


✨ Features

  • 🔈 Generate expressive dialogue using [S1], [S2] tags.
  • 🤖 Realistic non-verbal sounds: (laughs), (sighs), (coughs), etc.
  • 🧬 Optional voice cloning via reference audio.
  • 🪄 Supports speaker conditioning and output diversity.
  • 💡 Simple Python API for generation and saving audio.
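
The speaker tags and non-verbal cues above are plain text inside the transcript string. The following sketch shows one way to assemble such a string from a list of turns. `build_transcript` is a hypothetical helper written for this example and is not part of the `dia` package.

```python
from typing import Iterable, Tuple

SPEAKER_TAGS = ("[S1]", "[S2]")


def build_transcript(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker, line) pairs into one tagged transcript string."""
    parts = []
    for speaker, line in turns:
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        parts.append(f"{SPEAKER_TAGS[speaker - 1]} {line.strip()}")
    return " ".join(parts)


text = build_transcript([
    (1, "Dia is fast and memory-efficient!"),
    (2, "And it sounds great. (laughs)"),
])
print(text)
```

The resulting `text` can be passed to the model's `generate` call in the same way as a hand-written transcript.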

⚡ Quickstart

1. Install via pip (from this fork)

pip install git+https://github.com/rzafiamy/dia-q4-bit.git

2. Launch the Gradio UI

git clone https://github.com/rzafiamy/dia-q4-bit.git
cd dia-q4-bit
uv run app.py

Or using a Python virtual environment:

git clone https://github.com/rzafiamy/dia-q4-bit.git
cd dia-q4-bit
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py

ℹ️ Voices vary between runs unless you condition output with an audio prompt or set a fixed seed.
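
A minimal sketch of fixing the seed before generation is shown below. It assumes the model samples through PyTorch's global random number generator, which this README does not confirm, so treat it as a starting point. `seed_everything` is a hypothetical helper, and NumPy and PyTorch are seeded only if they are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the common random number generators so runs are repeatable."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


seed_everything(42)
```

Call it once before each `generate` call whose voice you want to reproduce. Some GPU kernels are non-deterministic, so small differences between runs can remain even with a fixed seed.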


🐍 Usage in Python

from dia.model import Dia

def get_model(quantize=False, quantize_4bit=False):
    """Load Dia, optionally quantized (enable at most one of the two flags)."""
    print("[INFO] Loading Dia model...")
    return Dia.from_pretrained(
        "nari-labs/Dia-1.6B",
        compute_dtype="float16",
        quantize=quantize,            # INT8 quantization
        quantize_4bit=quantize_4bit,  # 4-bit quantization
    )

model = get_model(quantize_4bit=True)  # or quantize=True for int8

# [S1]/[S2] mark the speaker turns; (laughs) is a non-verbal cue.
text = "[S1] Dia is fast and memory-efficient! [S2] And it sounds great. (laughs)"

output = model.generate(text, use_torch_compile=True, verbose=True)
model.save_audio("simple.mp3", output)

🔧 Notes on Quantization

  • quantize=True: Use INT8 quantization
  • quantize_4bit=True: Use 4-bit quantization
  • Both bring VRAM usage under 6 GB, which fits 8 GB consumer GPUs.
  • Only one of these options should be enabled at a time.
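
Since the two flags are mutually exclusive, one option is to select the mode by name and derive both flags from it. The sketch below does that. `quantization_kwargs` is a hypothetical helper for illustration only; the flag names `quantize` and `quantize_4bit` are the ones used in the usage example above.

```python
def quantization_kwargs(mode: str = "none") -> dict:
    """Map a mode name to the two quantization flags, never enabling both."""
    modes = {
        "none": {"quantize": False, "quantize_4bit": False},
        "int8": {"quantize": True, "quantize_4bit": False},
        "4bit": {"quantize": False, "quantize_4bit": True},
    }
    if mode not in modes:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(modes)}")
    return dict(modes[mode])  # copy so callers cannot mutate the table


print(quantization_kwargs("4bit"))

# Intended use (requires the dia package from this fork):
# model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16",
#                             **quantization_kwargs("4bit"))
```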

📢 CLI and PyPI package coming soon.


🖥️ Hardware & Inference Performance

Tested on PyTorch 2.0+ and CUDA 12.6. CPU support coming soon.

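| Precision | Real-time factor (w/ compile) | Real-time factor (w/o compile) | VRAM usage |
|-----------|-------------------------------|--------------------------------|------------|
| float16   | x2.2                          | x1.3                           | ~10 GB     |
| bfloat16  | x2.1                          | x1.5                           | ~10 GB     |
| float32   | x1.0                          | x0.9                           | ~13 GB     |
| int8/4bit | ✅ Efficient (forked)          | ✅ Efficient (forked)           | < 6 GB     |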
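
To read the real-time figures above, the sketch below converts a factor into an estimated wall-clock generation time. It assumes a factor of x2.2 means 2.2 seconds of audio are produced per second of compute, which is the usual reading but is not spelled out here. The function name is hypothetical, and the 30-second clip length is an arbitrary example.

```python
def estimated_generation_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock time needed to generate a clip of the given length."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# Factors taken from the table above.
for label, factor in [
    ("float16, with compile", 2.2),
    ("float16, without compile", 1.3),
    ("float32, without compile", 0.9),
]:
    seconds = estimated_generation_seconds(30.0, factor)
    print(f"{label}: about {seconds:.1f} s for 30 s of audio")
```

A factor below x1.0 means generation takes longer than the audio it produces.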

🪪 License

Apache License 2.0 – see the LICENSE file for full details.


⚠️ Disclaimer

This model is for research and educational use only. By using it, you agree not to:

  • Mimic real identities without consent.
  • Generate misleading or harmful content.
  • Use it for illegal or malicious purposes.

🔭 TODO / Roadmap

  • 4-bit / INT8 quantization support.
  • Flash Attention integration.
  • CPU inference support.
  • Docker support (incl. ARM/MacOS).
  • Public PyPI release & CLI.
  • Larger model versions.

🤝 Contributing

This fork is maintained by the open-source community.
PRs are welcome! Join us on Discord to collaborate or share ideas.


🙏 Acknowledgements

