Stars
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
Official implementation of Gumbel Distillation for Parallel Text Generation
Official codebase for the paper "One-step Language Modeling via Continuous Denoising"
[ICML 2026][Ultra Powerful Few-Step Diffusion RL] TDM-R1: Reinforcing Few-Step Diffusion Models with Non-Differentiable Reward
Helios: Real Real-Time Long Video Generation Model
A robust, efficient, low-latency speech-to-text library with advanced voice activity detection, wake word activation and instant transcription.
Open-source framework for conversational voice AI agents
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Claude Code skill implementing Manus-style persistent markdown planning — the workflow pattern behind the $2B acquisition.
Fully featured mobile and web client for Codex and Claude Code, with realtime voice and encryption
Open-Source AI Presentation Generator and API (Gamma, Beautiful AI, Decktopus Alternative)
Towards Scalable Pre-training of Visual Tokenizers for Generation
MiMo-Audio: Audio Language Models are Few-Shot Learners
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
🚀 PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
🔊 Text-Prompted Generative Audio Model
This repository allows reproduction of Poetiq's record-breaking submission to the ARC-AGI-1 and ARC-AGI-2 benchmarks.
slime is an LLM post-training framework for RL Scaling.
Text-audio foundation model from Boson AI
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think & UnifiedReward-Flex
Krea Realtime 14B. An open-source realtime AI video model.
[CVPR 2026] Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny co…
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
The official implementation for [NeurIPS 2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
rCM & Causal-rCM: Best Algorithms/Infrastructures for Bidirectional/Autoregressive Video Diffusion Distillation at Scale