BBuf

Xiaoyu Zhang BBuf

Works at Skywork.AI; creator of the GiantPandaCV official account.

2.3k followers · 58 following

SkyWork
ChengDu
www.giantpandacv.com

Achievements

x4 x4 x3


sglang Public
Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 1 Apache License 2.0 Updated Apr 13, 2026
Model-Optimizer Public
Forked from NVIDIA/Model-Optimizer

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …

Python Apache License 2.0 Updated Apr 13, 2026
SGLang-Auto-Driven-SKILLS Public

Python 63 6 Updated Apr 10, 2026
how-to-optim-algorithm-in-cuda Public

How to optimize some algorithms in CUDA.

cuda llm

Cuda 2,915 267 Updated Apr 9, 2026
sgl-cookbook Public
Forked from sgl-project/sgl-cookbook

Cookbook of SGLang - Recipe

JavaScript 1 Apache License 2.0 Updated Apr 3, 2026
torchtitan Public
Forked from pytorch/torchtitan

A PyTorch native platform for training generative AI models

Python BSD 3-Clause "New" or "Revised" License Updated Mar 16, 2026
Panzhihua-Mi-Yi-Pipa Public

If you want to purchase Panzhihua Mi Yi Pipa, please contact me.

11 1 MIT License Updated Mar 16, 2026
BBuf Public

7 5 Updated Feb 28, 2026
self-llm Public
Forked from datawhalechina/self-llm

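"A Practical Guide to Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment.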

Jupyter Notebook 2 Apache License 2.0 Updated Feb 24, 2026
InfiniteTalk Public
Forked from MeiGen-AI/InfiniteTalk

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

Python Apache License 2.0 Updated Feb 12, 2026
lm-sys.github.io Public
Forked from lm-sys/lm-sys.github.io

JavaScript Other Updated Feb 11, 2026
cache-dit Public
Forked from vipshop/cache-dit

🤗A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs: Z-Image, FLUX2, Qwen-Image, etc.

Python 1 Apache License 2.0 Updated Feb 1, 2026
vllm Public
Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1 Apache License 2.0 Updated Nov 29, 2025
gpu-glossary-zh Public

https://bbuf.github.io/gpu-glossary-zh/

Python 26 Other Updated Nov 7, 2025
tilelang Public
Forked from tile-ai/tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 1 Other Updated Oct 9, 2025
llm_benchmark Public
Forked from lvhan028/llm_benchmark

Python MIT License Updated Sep 9, 2025
flashinfer Public
Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda Apache License 2.0 Updated Jul 14, 2025
Awesome-ML-SYS-Tutorial Public
Forked from zhaochenyang20/Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 5 Apache License 2.0 Updated May 5, 2025
tvm_mlir_learn Public

A collection of compiler learning resources.

Python 2,708 370 Updated Mar 19, 2025
PanZhiHua_MiYi_PiPa Public

Updated Mar 1, 2025
DeepGEMM Public
Forked from deepseek-ai/DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda MIT License Updated Feb 27, 2025
ml-engineering Public
Forked from stas00/ml-engineering

Machine Learning Engineering Open Book

Python 1 Creative Commons Attribution Share Alike 4.0 International Updated Feb 19, 2025
tensorrt-llm-moe Public

C++ 33 2 Updated Feb 3, 2025
HunyuanVideo Public
Forked from Tencent-Hunyuan/HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python Other Updated Dec 20, 2024
cfx-article-src Public
Forked from ColfaxResearch/cfx-article-src

C++ 1 Updated Dec 20, 2024
ao Public
Forked from pytorch/ao

PyTorch native quantization and sparsity for training and inference

Python 1 BSD 3-Clause "New" or "Revised" License Updated Oct 31, 2024
flash-attention Public
Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python 1 BSD 3-Clause "New" or "Revised" License Updated Oct 8, 2024
TiledCUDA Public
Forked from TiledTensor/TiledCUDA

TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.

C++ MIT License Updated Sep 6, 2024
ArmNeonOptimization Public

arm-neon

C++ 93 23 Updated Aug 2, 2024
RWKV-World-HF-Tokenizer Public

Python 34 5 Updated Jul 21, 2024
