Stars
Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Let coding agents use ncu skills to analyze CUDA programs automatically!
Machine Learning Engineering Open Book
HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more
Official Repo for paper: Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing
A PyTorch-native inference engine with hybrid cache acceleration and massive parallelism for DiTs.
A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.
The Triton TensorRT-LLM Backend
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
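AndroidImageEdit: an open-source image-editing widget for Android devices, supporting skin smoothing and whitening, custom stickers, image filters, image rotation, image cropping, text stickers, undo, rollback, and other operations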
Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
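C++ backend development, newly compiled in 2023: 1,000 excellent blog posts covering memory, networking, architecture design, high performance, data structures, basic components, middleware, and distributed systems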
CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark