Stars
CUDA kernels for linear attention variants, written in CuTe DSL and CUTLASS C++.
Transforming cold farewells into warm skills? It's giving rebirth era. Welcome to Digital Life 1.0. 🫶
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
Sample code for my CUDA programming book
[CVPR2026] BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers
Model Compression Toolbox for Large Language Models and Diffusion Models
NVFP4 Flash-Attention 4 on Blackwell
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Let coding agents use ncu skills to analyze CUDA programs automatically!
Machine Learning Engineering Open Book
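OpenLovart is an AI-based design platform that makes creative design simple and powerful. Use AI chat and a smart canvas to quickly bring your design ideas to life.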
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
flex-block-attn: an efficient block sparse attention computation library
HunyuanVideo-1.5: A leading lightweight video generation model
A high-throughput and memory-efficient inference and serving engine for LLMs
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
Large Language Model (LLM) Systems Paper List
A curated list of recent efficient video generation methods.
Puzzles for learning Triton; play them with minimal environment configuration!