LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,141 223 Updated May 19, 2025

supertone-inc / super-monotonic-align

Python 171 12 Updated Sep 19, 2024

WangHelin1997 / SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 151 18 Updated Jan 1, 2025

VITA-MLLM / VITA

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,518 182 Updated Mar 28, 2025

keonlee9420 / evaluate-zero-shot-tts

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 96 11 Updated Mar 12, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 304 25 Updated Oct 12, 2025

BinWang28 / audio-ai-hub

The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio generation.

Python 934 48 Updated Jun 15, 2026

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,082 165 Updated Apr 21, 2025

showlab / Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,955 91 Updated Jan 8, 2026

AI-S2-Lab / GPT-Talker

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

45 2 Updated Oct 28, 2024

NVIDIA / BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,225 145 Updated Sep 5, 2024

Aaron (Yinghao) Li yl4579

Stars

SesameAILabs / csm

facebookresearch / audiobox-aesthetics

zhenye234 / LLaSA_training

deepseek-ai / DeepSeek-R1

facebookresearch / large_concept_model

naver-ai / usdm

Hannibal046 / Awesome-LLM

alessandroragano / scoreq

fishaudio / fish-speech

SWivid / F5-TTS

bytedance / SALMONN

FireRedTeam / FireRedTTS

tencent-ailab / MuCodec

karpathy / LLM101n

haidog-yaqub / EzAudio

SonyCSLParis / music2latent

Aria-K-Alethia / BigCodec

kyutai-labs / moshi

yangdongchao / SimpleSpeech

ictnlp / LLaMA-Omni