Dia-1.6B
Dia-1.6B generates lifelike English dialogue and vocal expressions
...Designed for realistic vocal performance, Dia supports expressive features such as emotion and tone control, along with non-verbal cues like laughter, coughing, and sighs. The model accepts speaker conditioning through audio prompts, which allows limited voice cloning and keeps a speaker consistent across generations. It is optimized for English and built for real-time performance on enterprise GPUs; CPU and quantized versions are planned. Input text uses [S1]/[S2] tags to differentiate speakers, and the model integrates easily into Python workflows. The model is not tuned to a specific voice, but user-provided audio can guide the output style. ...
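As a rough illustration of the [S1]/[S2] input format, the sketch below assembles a two-speaker script as a single string in plain Python. The helper name `build_script` is hypothetical and not part of Dia, and writing a non-verbal cue as a parenthesized word such as "(laughs)" is an assumption about the expected spelling. The call that passes the string to the model is left out because the description above does not specify it.

```python
from typing import Iterable, Tuple


def build_script(turns: Iterable[Tuple[int, str]]) -> str:
    """Join (speaker, text) turns into one string tagged with [S1]/[S2].

    Hypothetical helper for illustration only; it is not part of Dia.
    """
    parts = []
    for speaker, text in turns:
        # The description mentions only two speaker tags, so reject anything else.
        if speaker not in (1, 2):
            raise ValueError(f"speaker must be 1 or 2, got {speaker!r}")
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)


if __name__ == "__main__":
    script = build_script(
        [
            (1, "Did you hear the news?"),
            # "(laughs)" is an assumed spelling for a non-verbal cue.
            (2, "I did. (laughs) Nobody saw that coming."),
        ]
    )
    print(script)
```

Building the script from structured turns, rather than concatenating strings by hand, keeps the speaker tags consistent and makes it easy to validate input before sending it to the model.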