Stars
A PyTorch coding practice platform — covering LLM, Diffusion, PEFT, and more. A friendly environment to help you deeply understand deep learning components through hands-on practice. Like LeetCode, …
🔥 LeetCode for PyTorch — practice implementing softmax, attention, GPT-2 and more from scratch with instant auto-grading. Jupyter-based, self-hosted or try online.
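The entry above centers on implementing building blocks like softmax from scratch; a minimal, numerically stable sketch in plain Python (the function name and pure-Python form are illustrative — the platform itself grades PyTorch solutions):

```python
import math

def softmax(logits):
    # Subtract the max logit before exponentiating so exp() cannot overflow;
    # this shift leaves the final probabilities unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
```

The resulting probabilities sum to 1, with the largest logit receiving the largest mass — exactly the invariants an auto-grader for such an exercise would check.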
A Non-Autoregressive Text-to-Speech (NAR-TTS) framework, including official PyTorch implementation of PortaSpeech (NeurIPS 2021) and DiffSpeech (AAAI 2022)
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.
Self-Supervised Speech Pre-training and Representation Learning Toolkit
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Real-time global intelligence dashboard. AI-powered news aggregation, geopolitical monitoring, and infrastructure tracking in a unified situational awareness interface
Official Repository for "Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction"
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
This is the main repository of open-sourced speech technology by Huawei Noah's Ark Lab.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
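Several of the TTS entries above (F5R-TTS and the two flow-matching repos) train a network to regress a velocity field along an interpolation path between noise and data. A minimal sketch of how one training pair is formed under the common straight-line (rectified-flow) path — the function name and plain-Python vectors are illustrative, not taken from any of these repos:

```python
def cfm_training_pair(x0, x1, t):
    # Straight-line probability path: x_t = (1 - t) * x0 + t * x1,
    # where x0 is a noise sample and x1 a data sample.
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    # Regression target for the network: the constant velocity v = x1 - x0.
    v = [b - a for a, b in zip(x0, x1)]
    return xt, v

xt, v = cfm_training_pair([0.0, 0.0], [2.0, 4.0], 0.5)
```

At inference the learned velocity field is integrated from noise to data with an ODE solver, which is what lets these models synthesize speech in few steps.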
Sharing AI Infra knowledge & code exercises: introductions to the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, LLM fundamentals 🧠, AI hardware and software 🔧, and more
Efficient Triton Kernels for LLM Training
Minimalistic 4D-parallelism distributed training framework for education purpose
My learning notes for ML SYS.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Use PEFT or full-parameter training for CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, …
Robust Speech Recognition via Large-Scale Weak Supervision