yl4579
  • Columbia University
  • New York, US

A Conversational Speech Generation Model

Python 14,670 1,482 Updated May 27, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 733 52 Updated Jun 5, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 661 49 Updated Jan 21, 2026

Large Concept Models: Language modeling in a sentence representation space

Python 2,363 210 Updated Jan 29, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 95 4 Updated Dec 3, 2024

Awesome-LLM: a curated list of Large Language Models

26,948 2,594 Updated Jul 31, 2025

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python 115 8 Updated Aug 1, 2025

SOTA Open Source TTS

Python 30,871 2,636 Updated Jun 9, 2026

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 14,781 2,158 Updated May 18, 2026

SALMONN family: A suite of advanced multi-modal LLMs

1,453 115 Updated May 26, 2026

An Open-Sourced LLM-empowered Foundation TTS System

Python 913 83 Updated Sep 28, 2025
Python 164 8 Updated Nov 22, 2024

LLM101n: Let's build a Storyteller

37,345 2,052 Updated Aug 1, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 331 25 Updated Dec 17, 2025

Encode and decode audio samples to/from compressed latent representations!

Python 262 28 Updated Sep 19, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 217 18 Updated Sep 19, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 10,429 971 Updated May 16, 2026

The open source code for SimpleSpeech series

Python 146 11 Updated Oct 8, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,141 223 Updated May 19, 2025

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 151 18 Updated Jan 1, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,518 182 Updated Mar 28, 2025

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 96 11 Updated Mar 12, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 304 25 Updated Oct 12, 2025

The hub for audio AI research: papers, open models, benchmarks & datasets across audio LLMs, speech recognition, TTS, music & audio generation.

Python 934 48 Updated Jun 15, 2026

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,082 165 Updated Apr 21, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,955 91 Updated Jan 8, 2026

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

45 2 Updated Oct 28, 2024

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,225 145 Updated Sep 5, 2024