xmfbit

Bytedance

256 followers · 199 following

Beijing, China

Starred repositories

fjybiocs / perfetto-ui-overlap-fixer

Python 2 Updated Jun 17, 2026

larksuite / cli

The official Lark/Feishu CLI tool, maintained by the larksuite team — built for humans and AI Agents. Covers core business domains including Messenger, Docs, Base, Sheets, Calendar, Mail, Tasks, Me…

Go 14,481 997 Updated Jun 22, 2026

radixark / miles_diffusion

[Experimental] Miles-diffusion is a post-training framework for large-scale diffusion model training and production workloads, forked from and co-evolving with miles.

Python 18 5 Updated Jun 17, 2026

MoonshotAI / Moonlight

Muon is Scalable for LLM Training

1,493 89 Updated Aug 3, 2025

BBuf / KDA-Pilot

Python 191 31 Updated Jun 22, 2026

google-deepmind / science-skills

GDM Science Skills to speed up agentic scientific workflows with better grounding and higher token efficiency. Integrate insights from AlphaGenome, AFDB, UniProt and 30+ other databases and tools.

Python 2,000 205 Updated Jun 8, 2026

thu-ml / TurboDiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,536 265 Updated Jun 17, 2026

uccl-project / mKernel

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 239 22 Updated Jun 21, 2026

haskaomni / serenity

Python 210 36 Updated May 29, 2026

CatChen / knowledge-wiki-template

JavaScript 82 6 Updated Jun 21, 2026

meta-pytorch / export-python

Conveniently export torch.compile compiled products into self-contained Python files

Python 33 2 Updated Jun 5, 2026

TongmingLAIC / AKO4X

Agentic Kernel Optimization — advanced & eXtensible: a closed-loop, campaign-based multi-agent system for optimizing GPU kernels (benchmark-swappable; default flashinfer-bench).

Python 55 10 Updated May 31, 2026

TongmingLAIC / AKO4ALL

Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language

Python 299 21 Updated May 31, 2026

tantara / openbrief

TypeScript 477 27 Updated May 30, 2026

jmaczan / tiny-vllm

Build your own high performance LLM inference engine in C++ and CUDA - a smaller version of vLLM

C++ 808 51 Updated Apr 14, 2026

ahegazy0 / linux-basics-for-hackers-notes

A structured course built from personal study notes of the book Linux Basics for Hackers by OccupyTheWeb.

1,169 109 Updated Jun 4, 2026

stanford-cs336 / assignment1-basics

Student version of Assignment 1 for Stanford CS336 - Language Modeling From Scratch

Python 2,022 2,262 Updated Apr 7, 2026

NVIDIA-AI-Blueprints / video-search-and-summarization

The NVIDIA VSS Blueprint is a suite of reference architectures for building GPU-accelerated vision agents and AI-powered video analytics applications.

C++ 1,570 323 Updated Jun 19, 2026

NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs

C++ 744 100 Updated Jun 11, 2026

SemiAnalysisAI / FuzzX

A fuzzer for ML compilers

Rust 42 4 Updated Jun 12, 2026

lightseekorg / tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

Python 1,474 164 Updated Jun 22, 2026

Ranchero-Software / NetNewsWire

RSS reader for macOS and iOS.

Swift 10,157 703 Updated Jun 22, 2026

mit-han-lab / mlsys2026-flashinfer-contest

Python 89 5 Updated Jun 13, 2026

dc-ai-projects / DC-VideoGen

DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder

190 7 Updated Oct 5, 2025

mit-han-lab / streaming-vlm

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 1,026 63 Updated Oct 15, 2025

WeianMao / triattention

TriAttention — Efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

Python 791 77 Updated Jun 18, 2026

NVlabs / LongLive

LongLive 2.0: Infra - Long Video Gen

Python 2,367 215 Updated Jun 13, 2026

mit-han-lab / kernel-design-agents

616 51 Updated Jun 2, 2026

open-lm-engine / coda-kernels

CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs

Python 216 22 Updated Jun 22, 2026

cfregly / ai-performance-engineering

Code, labs, and resources for O'Reilly AI Systems Performance Engineering: GPU optimization, distributed training, inference scaling, and full-stack tuning.

Python 1,601 227 Updated Jun 18, 2026

weakly-supervised-learning

one-shot-learning