  • Bytedance
  • Beijing, China

Starred repositories

The official Lark/Feishu CLI tool, maintained by the larksuite team — built for humans and AI Agents. Covers core business domains including Messenger, Docs, Base, Sheets, Calendar, Mail, Tasks, Me…

Go 14,481 997 Updated Jun 22, 2026

[Experimental] Miles-diffusion is a post-training framework for large-scale diffusion model training and production workloads, forked from and co-evolving with miles.

Python 18 5 Updated Jun 17, 2026

Muon is Scalable for LLM Training

1,493 89 Updated Aug 3, 2025
Python 191 31 Updated Jun 22, 2026

GDM Science Skills to speed up agentic scientific workflows with better grounding and higher token efficiency. Integrate insights from AlphaGenome, AFDB, UniProt and 30+ other databases and tools.

Python 2,000 205 Updated Jun 8, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,536 265 Updated Jun 17, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 239 22 Updated Jun 21, 2026
Python 210 36 Updated May 29, 2026
JavaScript 82 6 Updated Jun 21, 2026

Conveniently export torch.compile compiled products into self-contained Python files

Python 33 2 Updated Jun 5, 2026

Agentic Kernel Optimization — advanced & eXtensible: a closed-loop, campaign-based multi-agent system for optimizing GPU kernels (benchmark-swappable; default flashinfer-bench).

Python 55 10 Updated May 31, 2026

Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language

Python 299 21 Updated May 31, 2026
TypeScript 477 27 Updated May 30, 2026

Build your own high-performance LLM inference engine in C++ and CUDA - a smaller version of vLLM

C++ 808 51 Updated Apr 14, 2026

A structured course built from personal study notes of the book Linux Basics for Hackers by OccupyTheWeb.

1,169 109 Updated Jun 4, 2026

Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch

Python 2,022 2,262 Updated Apr 7, 2026

The NVIDIA VSS Blueprint is a suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications.

C++ 1,570 323 Updated Jun 19, 2026

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 744 100 Updated Jun 11, 2026

A fuzzer for ML compilers

Rust 42 4 Updated Jun 12, 2026

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,474 164 Updated Jun 22, 2026

RSS reader for macOS and iOS.

Swift 10,157 703 Updated Jun 22, 2026

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

190 7 Updated Oct 5, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 1,026 63 Updated Oct 15, 2025

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

Python 791 77 Updated Jun 18, 2026

LongLive 2.0: Infra - Long Video Gen

Python 2,367 215 Updated Jun 13, 2026

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python 216 22 Updated Jun 22, 2026

Code, labs, and resources for O'Reilly AI Systems Performance Engineering: GPU optimization, distributed training, inference scaling, and full-stack tuning.

Python 1,601 227 Updated Jun 18, 2026