- National Taiwan University
- Taipei, Taiwan
- https://leo19941227.github.io
- @leo19941227
Stars
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers"
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Lan…
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenization stage.
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Training, inference, and testing of the SAC speech codec model.
Official PyTorch Implementation of "Latent Diffusion Model Without Variational Autoencoder".
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
A programmer's guide to cooking at home (Simplified Chinese only).
VoiceBench: Benchmarking LLM-Based Voice Assistants
GLM-4 series: Open Multilingual Multimodal Chat LMs
Reference implementation for DPO (Direct Preference Optimization)
verl: Volcano Engine Reinforcement Learning for LLMs