View BBuf's full-sized avatar

[MLSys 26] 🥇 Solution for the Gated Delta Net Track of the MLSys 26 FlashInfer competition

Python 34 2 Updated May 22, 2026

SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models

Python 489 206 Updated Jun 13, 2026
Python 248 27 Updated Jun 9, 2026

Skills for writing tilelang and debugging with CUDA toolkits.

Python 123 5 Updated May 20, 2026

From Automated Idea Factory to Realization

Shell 1,108 90 Updated Jun 13, 2026
Python 179 29 Updated Jun 13, 2026

Ralph is an autonomous AI agent loop that runs repeatedly until all PRD items are complete.

TypeScript 20,212 1,996 Updated Feb 2, 2026

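Sharing AI Infra knowledge & code exercises: getting started with the PyTorch/vLLM/SGLang frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more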

Jupyter Notebook 2,578 233 Updated May 30, 2026

Efficient and unified implementations for TopK-based sparse attention

Cuda 36 1 Updated Apr 20, 2026

A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative decoding, etc. It compresses deep learning models for downs…

Python 2,923 437 Updated Jun 13, 2026

Information collection for the Happy Horse AI video generator model. Official demo and updates at happyhorses.io.

633 60 Updated May 12, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 207 37 Updated Dec 24, 2025

Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language

Python 286 19 Updated May 31, 2026

Autonomous GPU kernel optimization system driven by AI agents.

Python 31 Updated Mar 29, 2026

A PyTorch native platform for training generative AI models

Python 5,436 859 Updated Jun 13, 2026

Automated CUDA kernel performance diagnostics from NVIDIA Nsight Compute (NCU) CSV exports.

Rust 33 3 Updated Mar 18, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,406 142 Updated Mar 19, 2026

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 2,013 212 Updated Jun 12, 2026

Terminal UI for NVIDIA Nsight Systems profiles — timeline viewer, kernel navigator, NVTX hierarchy

Python 58 19 Updated Jun 12, 2026

A Chinese-localized version of Humanizer, Claude Code Skills, designed to remove traces of AI generation from text.

10,131 773 Updated Jan 19, 2026

From Minimal GEMM to Everything

Python 220 12 Updated Jun 8, 2026

🤯 LobeHub is your Chief Agent Operator, organizing your agents into 7×24 operations by hiring, scheduling, and reporting on your entire AI team.

TypeScript 78,617 15,415 Updated Jun 13, 2026

An agentic skills framework & software development methodology that works.

Shell 226,897 20,176 Updated Jun 13, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,787 1,047 Updated Jun 13, 2026

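"A Guide to Using Open-Source Large Models": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment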

Jupyter Notebook 30,892 3,020 Updated Jun 3, 2026

FlashInfer Bench @ MLSys 2026: Building AI agents to write high performance GPU kernels

Python 171 130 Updated Apr 26, 2026

High Performance LLM Inference Operator Library

C++ 932 96 Updated Jun 11, 2026