Stars
A Python DSL to write Nvidia PTX for Hopper and Blackwell in JAX and PyTorch
Symphony turns project work into isolated, autonomous implementation runs, allowing teams to manage work instead of supervising coding agents.
AI agents that automatically run research on single-GPU nanochat training
A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs dir…
Low overhead tracing library and trace visualizer for pipelined CUDA kernels
CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds
Official PyTorch implementation of BigVGAN (ICLR 2023)
Open-source framework for the research and development of foundation models.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Render any git repo into a single static HTML page for humans or LLMs
The batteries-included agent harness.
Text-audio foundation model from Boson AI
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
An implementation of Nvidia's Parakeet models for Apple Silicon using MLX.
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Dia-JAX: A JAX port of Dia, the text-to-speech model for generating realistic dialogue from text with emotion and tone control.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A modified VITS that uses ground-truth phoneme durations for better robustness
Implementing DeepSeek R1's GRPO algorithm from scratch
Minimal reproduction of DeepSeek R1-Zero
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.