Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 917 79 Updated Mar 19, 2026

Let coding agents use ncu skills to analyze CUDA programs automatically!

Shell 81 2 Updated Feb 5, 2026

Machine Learning Engineering Open Book

Python 17,606 1,117 Updated Mar 16, 2026

HPC tutorials covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more

Cuda 388 42 Updated Mar 18, 2026

Official Repo for paper: Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing

Python 128 9 Updated Feb 6, 2026

A PyTorch-native inference engine with hybrid cache acceleration and massive parallelism for DiTs.

Python 1,119 67 Updated Apr 3, 2026

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

21,919 2,240 Updated Dec 12, 2025

Practice Code for text to image trainer

Python 559 37 Updated Feb 27, 2026

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Python 12,828 2,074 Updated Jan 23, 2024

Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.

Swift 2,932 450 Updated Mar 29, 2026

PyTorch Neural Network eXchange

Python 702 45 Updated Apr 3, 2026

The Triton TensorRT-LLM Backend

930 136 Updated Mar 17, 2026

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,841 75 Updated Feb 25, 2026

Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

C++ 5,676 575 Updated Apr 1, 2026

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 1,165 167 Updated Apr 3, 2026

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Python 617 38 Updated Nov 24, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 7,703 473 Updated Feb 10, 2026

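AndroidImageEdit: an open-source image-editing widget for Android devices, supporting skin smoothing and whitening, custom stickers, image filters, image rotation, image cropping, text stickers, undo, rollback, and other operations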

Java 2,270 575 Updated Aug 18, 2025

Model Quantization Benchmark

Python 863 142 Updated Apr 20, 2025

Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)

Python 1,119 66 Updated Jan 27, 2026

A repository documenting problems in the thesis of Yang Jingyuan (杨景媛) of Wuhan University

HTML 3,644 228 Updated Aug 13, 2025

Coding CUDA every day!

Cuda 74 2 Updated Feb 5, 2026

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Python 532 26 Updated Mar 19, 2026

The true story of the bitterness and darkness behind the development of the Noah Pangu large model.

11,414 1,326 Updated Jul 9, 2025

A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…

Shell 894 106 Updated Mar 29, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,102 1,024 Updated Mar 23, 2026

A guide to hand-writing CUDA operators and preparing for interviews

Cuda 913 100 Updated Aug 23, 2025

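Latest 2023 compilation for C++ backend development: 1,000 excellent blog posts covering memory, networking, architecture design, high performance, data structures, basic components, middleware, and distributed systems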

2,151 447 Updated Mar 17, 2023

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark

Python 657 61 Updated Oct 15, 2025