LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,143 224 Updated May 19, 2025

supertone-inc / super-monotonic-align

Python 173 13 Updated Sep 19, 2024

WangHelin1997 / SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 154 18 Updated Jan 1, 2025

VITA-MLLM / VITA

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,521 182 Updated Mar 28, 2025

keonlee9420 / evaluate-zero-shot-tts

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 97 10 Updated Mar 12, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 308 25 Updated Oct 12, 2025

BinWang28 / audio-ai-hub

The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio generation.

Python 948 48 Updated Jul 20, 2026

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,093 168 Updated Apr 21, 2025

showlab / Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,964 94 Updated Jan 8, 2026

AI-S2-Lab / GPT-Talker

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

45 2 Updated Oct 28, 2024

NVIDIA / BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,227 146 Updated Sep 5, 2024

Aaron (Yinghao) Li yl4579

Stars

SesameAILabs / csm

facebookresearch / audiobox-aesthetics

zhenye234 / LLaSA_training

deepseek-ai / DeepSeek-R1

facebookresearch / large_concept_model

naver-ai / usdm

Hannibal046 / Awesome-LLM

alessandroragano / scoreq

fishaudio / fish-speech

SWivid / F5-TTS

bytedance / SALMONN

FireRedTeam / FireRedTTS

tencent-ailab / MuCodec

karpathy / LLM101n

haidog-yaqub / EzAudio

SonyCSLParis / music2latent

Aria-K-Alethia / BigCodec

kyutai-labs / moshi

yangdongchao / SimpleSpeech

BayLing-Models / BayLing-Speech