Stars
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
A lightweight `vLLM-Omni`-style diffusion implementation built around `Wan2.2-TI2V-5B-Diffusers`, inspired by nano-vllm
Skills for Real Engineers. Straight from my .claude directory.
eLLM runs LLM inference on CPUs faster than on GPUs
Puzzles for learning Triton
Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA
The agent that grows with you
Deploy intelligence. Open-source infrastructure for AI agents in production.
Download market data from Yahoo! Finance's API
A framework for efficient model inference with omni-modality models
Chrome DevTools for coding agents
Production-grade engineering skills for AI coding agents.
AI agents that automatically run research on single-GPU nanochat training
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS
A curriculum for learning GPU performance engineering, from scratch to what the frontier AI labs do
Small-scale distributed training of sequential deep learning models, built on NumPy and MPI.
Build compute kernels and load them from the Hub.
Implement a reasoning LLM in PyTorch from scratch, step by step
Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions.
TTS model capable of streaming conversational audio in realtime.
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
AI-Trader: 100% Fully-Automated Agent-Native Trading
FlashInfer: Kernel Library for LLM Serving