Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,180 112 Updated Mar 19, 2026

Let coding agents use ncu skills to analyze CUDA programs automatically!

Shell 84 3 Updated Feb 5, 2026

Machine Learning Engineering Open Book

Python 17,660 1,119 Updated Mar 16, 2026

HPC tutorial covering collective communication (MPI, NCCL), CUDA programming, SIMD vectorization, RDMA communication, and more

Cuda 404 43 Updated Apr 7, 2026

Official Repo for paper: Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing

Python 134 11 Updated Feb 6, 2026

A PyTorch-native inference engine with cache acceleration, parallelism and quantization for DiTs.

Python 1,130 68 Updated Apr 10, 2026

A curated collection of fun and creative examples generated with Nano Banana & Nano Banana Pro🍌, Gemini-2.5-flash-image based model. We also release Nano-consistent-150K openly to support the commu…

22,034 2,259 Updated Dec 12, 2025

Practice code for a text-to-image trainer

Python 559 37 Updated Feb 27, 2026

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Python 12,833 2,073 Updated Jan 23, 2024

Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.

Swift 2,932 450 Updated Mar 29, 2026

PyTorch Neural Network eXchange

Python 704 45 Updated Apr 9, 2026

The Triton TensorRT-LLM Backend

930 138 Updated Apr 8, 2026

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,849 75 Updated Feb 25, 2026

Diffusion model(SD,Flux,Wan,Qwen Image,Z-Image,...) inference in pure C/C++

C++ 5,720 583 Updated Apr 11, 2026

A high-performance inference engine for LLMs, optimized for diverse AI accelerators.

C++ 1,202 178 Updated Apr 10, 2026

DFloat11 [NeurIPS '25]: Lossless Compression of LLMs and DiTs for Efficient GPU Inference

Python 617 38 Updated Nov 24, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 7,758 477 Updated Feb 10, 2026

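AndroidImageEdit: an open-source image editing widget for Android devices, supporting skin smoothing and whitening, custom stickers, image filters, image rotation, image cropping, text stickers, undo, step back, and other operations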

Java 2,268 574 Updated Aug 18, 2025

Model Quantization Benchmark

Python 864 142 Updated Apr 20, 2025

Repo for SeedVR2 (ICLR2026) & SeedVR (CVPR2025 Highlight)

Python 1,133 66 Updated Jan 27, 2026

A repository documenting problems in the thesis of Yang Jingyuan of Wuhan University

HTML 3,639 228 Updated Aug 13, 2025

Coding CUDA every day!

Cuda 74 2 Updated Feb 5, 2026

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Python 534 26 Updated Mar 19, 2026

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

11,420 1,327 Updated Jul 9, 2025

A 120-day CUDA learning plan covering daily concepts, exercises, pitfalls, and references (including “Programming Massively Parallel Processors”). Features six capstone projects to solidify GPU par…

Shell 900 107 Updated Mar 29, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,236 1,038 Updated Apr 8, 2026

A guide to hand-writing CUDA operators and preparing for interviews

Cuda 921 102 Updated Aug 23, 2025

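Newly compiled in 2023: C++ backend development, 1,000 excellent blog posts covering memory, networking, architecture design, high performance, data structures, basic components, middleware, and distributed systems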

2,159 452 Updated Mar 17, 2023

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark

Python 657 62 Updated Oct 15, 2025