Shanghai Jiao Tong University & Shanghai Innovation Institute
Shanghai (UTC +08:00)
https://zhikangniu.github.io/
Semantic-VAE Public
Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
stable-audio-tools Public
Forked from Stability-AI/stable-audio-tools
Generative models for conditional audio generation
Python MIT License Updated Oct 9, 2025
F5-TTS Public
Forked from SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
MELLE Public
Forked from Shy-98/MELLE
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
Python Updated Jun 27, 2025
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
A-DMA Public
[INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
F5R-TTS Public
Forked from FrontierLabs/F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
UniCodec Public
Forked from Jiang-Yidi/UniCodec
[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound
Python Updated May 30, 2025
chatterbox Public
Forked from resemble-ai/chatterbox
SoTA open-source TTS
Python MIT License Updated May 30, 2025
minimind Public
Forked from jingyaogong/minimind
🚀🚀 Train a 26M-parameter GPT from scratch in just 2h!
FAR Public
Forked from showlab/FAR
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Python MIT License Updated Apr 23, 2025
bd3lms Public
Forked from kuleshov-group/bd3lms
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Python Apache License 2.0 Updated Mar 28, 2025
LLaMA-Factory Public
Forked from hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Python Apache License 2.0 Updated Mar 28, 2025
BigVGAN Public
Forked from NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Python MIT License Updated Mar 23, 2025
LLaSA_training Public
Forked from zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Steel-LLM Public
Forked from zhanshijinwat/Steel-LLM
Train a 1B LLM with 1T tokens from scratch by personal
ms-swift Public
Forked from modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, Inter…
Python Apache License 2.0 Updated Feb 28, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Python Apache License 2.0 Updated Feb 25, 2025
Amphion Public
Forked from open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
Python MIT License Updated Feb 24, 2025
Sana Public
Forked from NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Python Apache License 2.0 Updated Feb 23, 2025
OuteTTS Public
Forked from edwko/OuteTTS
Interface for OuteTTS models.
Python Apache License 2.0 Updated Feb 14, 2025
NDVQ Public
[SLT 2024] Official code of "NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization"
transformers Public
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Python Apache License 2.0 Updated Jan 13, 2025
stable-codec Public
Forked from Stability-AI/stable-codec
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
s3prl Public
Forked from s3prl/s3prl
Self-Supervised Speech Pre-training and Representation Learning Toolkit
Python Apache License 2.0 Updated Jan 2, 2025