hanzz2007

qiuhan hanzz2007

13 followers · 185 following

China

Achievements

Lists (7)

cpp

🔮 Future ideas

gpt

11 repositories

gpu

onnx

serve

tvm

Stars

lightseekorg / tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,538 183 Updated Jul 2, 2026

Tencent / hpc-ops

High Performance LLM Inference Operator Library

C++ 989 102 Updated Jul 2, 2026

flagos-ai / libtriton_jit

A Triton JIT runtime and ffi provider in C++

C++ 37 18 Updated Jun 30, 2026

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 6,476 549 Updated Jul 2, 2026

End2End-Diffusion / diffusion-bench

Towards Holistic evaluation of Generative Diffusion Transformers!

Python 91 4 Updated Jul 1, 2026

vllm-project / vime

An LLM post-training framework with vLLM for RL Scaling

Python 318 42 Updated Jul 2, 2026

sablin39 / tilelang-cuda-skills

Skills for writing tilelang and debugging with CUDA toolkits.

Python 127 5 Updated May 20, 2026

kraiskil / onnx2c

Open Neural Network Exchange to C compiler.

C 400 76 Updated Apr 23, 2026

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

2,157 113 Updated Jun 21, 2026

xianyu110 / awesome-openclaw-tutorial

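Master OpenClaw from scratch: the most comprehensive Chinese-language tutorial, covering installation, configuration, hands-on case studies, and a guide to avoiding common pitfalls (GitHub edition)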

Shell 4,512 678 Updated Jun 19, 2026

jlxue / claude_profiler

claude code profiler

Python 7 Updated Mar 6, 2026

ROCm / TransferBench

TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs)

C++ 73 26 Updated Jul 1, 2026

maxiaosong1124 / ncu-cuda-profiling-skill

Let coding agents use ncu skills to analyze CUDA programs automatically!

Shell 116 8 Updated May 25, 2026

charmbracelet / crush

Glamourous agentic coding for all 💘

Go 25,972 1,894 Updated Jul 2, 2026

nunchaku-ai / nunchaku

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 3,898 260 Updated Mar 7, 2026

ktprime / emhash

Fast and memory efficient c++ flat hash table/map/set

C++ 721 73 Updated Jul 2, 2026

AmberLJC / meta-research

Claude Code agent skill for autonomous AI/scientific research workflows

HTML 12 Updated Mar 5, 2026

Orchestra-Research / AI-Research-SKILLs

Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your claude code/codex/gemini agent will be an AI research agent with full horsepowe…

TeX 10,314 770 Updated Jun 16, 2026

Tencent / KsanaDiT

KsanaDiT: High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation

Python 59 6 Updated May 13, 2026

cyhdmjzzy / DeepEP-Code-Analysis

Cuda 25 5 Updated Feb 27, 2026

shanraisshan / claude-code-best-practice

from vibe coding to agentic engineering - practice makes claude perfect

HTML 61,829 6,183 Updated Jul 2, 2026

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 6,275 629 Updated Jun 15, 2026

BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda 3,119 283 Updated Jun 28, 2026

sukoncon / TMA-Adaptive-FP8-Grouped-GEMM

Cuda 26 3 Updated Aug 28, 2025

ROCm / aiter

AI Tensor Engine for ROCm

Python 476 386 Updated Jul 2, 2026

nanocoai / nanoclaw

A lightweight alternative to OpenClaw that runs in containers for security. Connects to WhatsApp, Telegram, Slack, Discord, Gmail and other messaging apps, has memory, scheduled jobs, and runs dir…

TypeScript 30,079 12,901 Updated Jul 2, 2026

jundaf2 / FlyDSL

Forked from ROCm/FlyDSL

Python 1 Updated Feb 25, 2026

toyaix / tritonllm

LLM Inference via Triton (Flexible & Modular): Focused on Kernel Optimization using CUBIN binaries, Starting from gpt-oss Model

Python 118 6 Updated Apr 28, 2026

zarazhangrui / frontend-slides

Create beautiful slides on the web using a coding agent's frontend skills

JavaScript 24,320 1,983 Updated Jun 23, 2026

caoshiyi / flashinfer-bench-ksearch

Python 5 Updated Feb 26, 2026