Shu-wen Yang leo19941227

Speech and Audio Foundation Models

118 followers · 90 following

National Taiwan University
Taipei, Taiwan
https://leo19941227.github.io
@leo19941227

Lists (3)

✨ Inspiration

1 repository

save

Tools

1 repository

Stars

xiquan-li / MeanAudio

MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows

Python 116 11 Updated Sep 2, 2025

LTH14 / JiT

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,796 105 Updated Dec 8, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 334 45 Updated Jul 21, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,471 213 Updated Dec 16, 2025

Jiawei-Yang / DeTok

Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"

Jupyter Notebook 164 4 Updated Dec 17, 2025

AmphionTeam / TaDiCodec

This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…

Python 69 3 Updated Sep 21, 2025

XiaomiMiMo / MiMo-Audio-Training

Python 91 10 Updated Oct 16, 2025

HeCheng0625 / Diffusion-Speech-Tokenizer

Python 195 13 Updated Sep 21, 2025

ictnlp / SLED-TTS

Streamable Text-to-Speech model using a language modeling approach, without vector quantization

Python 106 7 Updated May 20, 2025

open-compass / opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,427 703 Updated Dec 17, 2025

mtkresearch / TASTE-SpokenLM

A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenization stage.

Python 101 11 Updated Sep 3, 2025

gemelo-ai / vocos

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Python 1,028 121 Updated Aug 7, 2024

Soul-AILab / SAC

Training, inference, and testing of the SAC speech codec model.

Python 91 6 Updated Nov 1, 2025

Anuttacon / speech_drame

Python 28 2 Updated Nov 4, 2025

yangdongchao / ALMTokenizer

The demo page for ALMTokenizer

Python 55 3 Updated Apr 14, 2025

shiml20 / SVG

Official PyTorch Implementation of "Latent Diffusion Model Without Variational Autoencoder".

Python 378 13 Updated Dec 15, 2025

inclusionAI / Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 405 28 Updated Nov 27, 2025

karpathy / nanochat

The best ChatGPT that $100 can buy.

Python 38,813 4,890 Updated Dec 9, 2025

d223302 / SHANKS

JavaScript 2 Updated Oct 12, 2025

ZhikangNiu / Semantic-VAE

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python 102 4 Updated Oct 26, 2025

IKMLab / NTHU_Natural_Language_Processing

Jupyter Notebook 139 41 Updated Dec 15, 2025

Red-Killer / shit

4,011 280 Updated Jun 30, 2025

huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Python 9,384 1,245 Updated Dec 17, 2025