
Autonomous GPU kernel optimization system driven by AI agents.

Python 29 Updated Mar 19, 2026

A PyTorch native platform for training generative AI models

Python 5,175 756 Updated Mar 23, 2026

If you want to purchase Panzhihua Mi Yi Pipa, please contact me.

11 1 Updated Mar 16, 2026

Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.

Rust 25 Updated Mar 18, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 780 65 Updated Mar 19, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,753 165 Updated Mar 23, 2026

Terminal UI for NVIDIA Nsight Systems profiles — timeline viewer, kernel navigator, NVTX hierarchy

Python 45 8 Updated Mar 23, 2026

A Chinese-localized version of Humanizer, a set of Claude Code Skills aimed at removing traces of AI generation from text.

5,106 428 Updated Jan 19, 2026

From Minimal GEMM to Everything

Cuda 188 10 Updated Feb 10, 2026

The ultimate space for work and life — to find, build, and collaborate with agent teammates that grow with you. We are taking agent harness to the next level — enabling multi-agent collaboration, e…

TypeScript 74,172 14,825 Updated Mar 23, 2026

An agentic skills framework & software development methodology that works.

Shell 107,425 8,621 Updated Mar 19, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,205 821 Updated Mar 23, 2026

"A Practical Guide to Open-Source LLMs": tutorials, tailored for beginners in China, on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment.

Jupyter Notebook 29,213 2,873 Updated Mar 22, 2026

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python 153 109 Updated Mar 20, 2026

High Performance LLM Inference Operator Library

C++ 793 74 Updated Feb 5, 2026

High-performance RMSNorm implementation using SM on-chip storage (registers and shared memory).

Cuda 30 1 Updated Jan 22, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 613 66 Updated Mar 23, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,775 517 Updated Mar 13, 2026

A size profiler for CUDA binaries.

Python 71 Updated Jan 15, 2026

Learning materials for NVIDIA cuTile.

Python 165 2 Updated Dec 9, 2025

A PyTorch-native inference engine with hybrid cache acceleration and massive parallelism for DiTs.

Python 1,104 66 Updated Mar 23, 2026

GPU programming related news and material links

2,060 120 Updated Mar 8, 2026

GPU documentation for humans

Python 544 66 Updated Jan 27, 2026

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 741 186 Updated Mar 21, 2026

Expert Specialization MoE Solution based on CUTLASS

Cuda 27 2 Updated Jan 19, 2026

A Quirky Assortment of CuTe Kernels

Python 863 98 Updated Mar 23, 2026

Utility scripts for PyTorch (e.g. making Perfetto show kernels that would otherwise disappear, a memory profiler that understands lower-level allocations such as NCCL's, ...).

Python 94 7 Updated Sep 11, 2025

🌈 Solutions of LeetGPU

Cuda 76 11 Updated Mar 3, 2026

Nano vLLM

Python 12,390 1,772 Updated Nov 3, 2025

LightLLM is a Python-based inference and serving framework for large language models (LLMs), notable for its lightweight design, easy scalability, and high performance.

Python 3,962 311 Updated Mar 23, 2026