Dia is a 1.6B parameter text-to-speech model developed by Nari Labs, designed to generate highly realistic dialogues directly from transcripts. It supports emotional conditioning and non-verbal cues such as laughter and coughing.
🧪 Pretrained model checkpoints and inference code are publicly available to accelerate research and experimentation.
💬 English only (for now).
🔬 Demo available on Hugging Face Spaces.
This is a community-enhanced fork of the original nari-labs/dia repository. It focuses on performance optimizations and accessibility for personal hardware.
- 🧠 Quantization (4-bit & INT8): reduces VRAM usage by nearly 50%, so the model runs on GPUs with just 8 GB of VRAM, consuming less than 6 GB in practice (a VRAM-check sketch follows this list).
- ⚡ Flash Attention integration: experimental support for Flash Attention to speed up inference and reduce the memory footprint.
- 🔄 Continued development: ongoing efforts to improve inference speed, memory efficiency, and model accessibility.
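To verify the memory savings on your own GPU, here is a minimal sketch using standard PyTorch memory counters and the fork's loader flags shown in the usage example further below (the exact numbers will depend on your hardware and driver):

```python
import torch

from dia.model import Dia

# Track peak GPU memory while loading the quantized model and generating a clip.
torch.cuda.reset_peak_memory_stats()

model = Dia.from_pretrained(
    "nari-labs/Dia-1.6B",
    compute_dtype="float16",
    quantize_4bit=True,  # fork-specific flag, see the usage example below
)
output = model.generate("[S1] Quick memory check. (laughs)", verbose=True)

peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM (load + generation): {peak_gb:.1f} GB")
```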
- 🔈 Generate expressive dialogue using `[S1]` and `[S2]` speaker tags.
- 🤖 Realistic non-verbal sounds: `(laughs)`, `(sighs)`, `(coughs)`, etc.
- 🧬 Optional voice cloning via reference audio (see the voice-cloning sketch after this list).
- 🪄 Supports speaker conditioning and output diversity.
- 💡 Simple Python API for generating and saving audio.
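For voice cloning, the upstream Dia examples pass a reference clip together with its transcript to `generate`. The sketch below assumes the argument is named `audio_prompt`; the exact name may differ between versions, so check the repository's voice-cloning example before relying on it:

```python
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# Transcript of the reference clip, followed by the new line to speak in that voice.
clone_transcript = "[S1] This is the transcript of my reference recording."
new_text = "[S1] And this sentence will be spoken in the cloned voice."

# `audio_prompt` is an assumption here; confirm the argument name in your
# installed version of the fork.
output = model.generate(
    clone_transcript + " " + new_text,
    audio_prompt="reference.mp3",
    verbose=True,
)
model.save_audio("cloned.mp3", output)
```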
Install directly with pip:

```bash
pip install git+https://github.com/rzafiamy/dia-q4-bit.git
```

Or clone the repository and run the demo app with uv:

```bash
git clone https://github.com/rzafiamy/dia-q4-bit.git
cd dia-q4-bit
uv run app.py
```

Or using a Python virtual environment:

```bash
git clone https://github.com/rzafiamy/dia-q4-bit.git
cd dia-q4-bit
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py
```

ℹ️ Voices vary between runs unless you condition the output with an audio prompt or set a fixed seed (see the seeding sketch below).
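If you want repeatable output without an audio prompt, one generic approach is to fix the relevant random seeds before generating. This is plain PyTorch/NumPy seeding, not a fork-specific API; the project may expose its own seeding utility:

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Fix Python, NumPy, and PyTorch RNGs for more repeatable sampling."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)


set_seed(42)  # call this before model.generate(...)
```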
```python
from dia.model import Dia


def get_model(quantize=False, quantize_4bit=False):
    print("[INFO] Loading Dia model...")
    return Dia.from_pretrained(
        "nari-labs/Dia-1.6B",
        compute_dtype="float16",
        quantize=quantize,
        quantize_4bit=quantize_4bit,
    )


model = get_model(quantize_4bit=True)  # or quantize=True for INT8

text = "[S1] Dia is fast and memory-efficient! [S2] And it sounds great. (laughs)"
output = model.generate(text, use_torch_compile=True, verbose=True)

model.save_audio("simple.mp3", output)
```

- `quantize=True`: use INT8 quantization.
- `quantize_4bit=True`: use 4-bit quantization.
- Both reduce VRAM usage significantly (under 6 GB), perfect for 8 GB consumer GPUs.
- Only one of these options should be enabled at a time (a loader guard sketch follows this list).
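Since the two flags are mutually exclusive, a small wrapper around the loader avoids passing both by mistake. This is a suggested pattern, not part of the fork's API; `load_dia` is a hypothetical helper name:

```python
from typing import Optional

from dia.model import Dia


def load_dia(mode: Optional[str] = None) -> Dia:
    """Load Dia with at most one quantization mode: None, "int8", or "4bit"."""
    if mode not in (None, "int8", "4bit"):
        raise ValueError(f"Unknown quantization mode: {mode!r}")
    return Dia.from_pretrained(
        "nari-labs/Dia-1.6B",
        compute_dtype="float16",
        quantize=(mode == "int8"),
        quantize_4bit=(mode == "4bit"),
    )


model = load_dia("4bit")
```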
📢 CLI and PyPI package coming soon.
Tested on PyTorch 2.0+ and CUDA 12.6. CPU support coming soon.
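A quick way to confirm your environment matches these requirements, using standard PyTorch calls (nothing fork-specific):

```python
import torch

print("PyTorch version:", torch.__version__)       # expect 2.0 or newer
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)            # e.g. "12.6"
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```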
| Precision | Real-time factor (w/ compile) | Real-time factor (w/o compile) | VRAM Usage |
|---|---|---|---|
| `float16` | x2.2 | x1.3 | ~10 GB |
| `bfloat16` | x2.1 | x1.5 | ~10 GB |
| `float32` | x1.0 | x0.9 | ~13 GB |
| `int8` / 4-bit | ✅ Efficient (forked) | ✅ Efficient (forked) | < 6 GB |
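The real-time factor compares the duration of the generated audio to the wall-clock time spent generating it. A rough way to reproduce the measurement, assuming `generate` returns a waveform array at Dia's 44.1 kHz sample rate (your results will vary with hardware and precision):

```python
import time

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")
text = "[S1] Benchmarking the real-time factor. [S2] Let's see how fast it is."

start = time.perf_counter()
output = model.generate(text, use_torch_compile=True, verbose=True)
elapsed = time.perf_counter() - start

# Assumes `output` is a 1-D waveform sampled at 44.1 kHz (the Descript Audio Codec rate).
audio_seconds = len(output) / 44100
print(f"Real-time factor: x{audio_seconds / elapsed:.2f}")
```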
Apache License 2.0 – see the LICENSE file for full details.
This model is for research and educational use only. By using it, you agree not to:
- Mimic real identities without consent.
- Generate misleading or harmful content.
- Use it for illegal or malicious purposes.
- 4-bit / INT8 quantization support.
- Flash Attention integration.
- CPU inference support.
- Docker support (incl. ARM/macOS).
- Public PyPI release & CLI.
- Larger model versions.
This fork is maintained by the open-source community.
PRs are welcome! Join us on Discord to collaborate or share ideas.
- Nari Labs for the original Dia model.
- Google TPU Research Cloud for computing resources.
- Research inspiration from: SoundStorm, Parakeet, and Descript Audio Codec.
- Hugging Face for hosting weights and demo space.