University of Washington
Seattle, WA
Stars
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
SkyRL: A Modular Full-stack RL Library for LLMs
DeepEP: an efficient expert-parallel communication library
Trace Anything: Representing Any Video in 4D via Trajectory Fields
Post-training with Tinker
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Lightweight coding agent that runs in your terminal
Optimized primitives for collective multi-GPU communication
[NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
MSCCL++: A GPU-driven communication stack for scalable AI applications
slime is an LLM post-training framework for RL Scaling.
Code repo for efficient quantized MoE inference with mixture of low-rank compensators
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
verl: Volcano Engine Reinforcement Learning for LLMs
Fast and memory-efficient exact attention
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
A minimal demo of PyTorch distributed extension functionality for collectives (a sketch of such a collective call follows this list).
A high-throughput and memory-efficient inference and serving engine for LLMs
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Fully open reproduction of DeepSeek-R1
Robust Speech Recognition via Large-Scale Weak Supervision
Transformer: PyTorch Implementation of "Attention Is All You Need"
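The sketch below illustrates the kind of collective demo referenced in the PyTorch distributed entry above: a two-process all-reduce using `torch.distributed` with the `gloo` backend. It is a minimal, self-contained example assuming a single CPU-only machine; the worker function name and the master address/port values are illustrative, not taken from that repository.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def run_worker(rank: int, world_size: int) -> None:
    # Each worker joins the same process group; gloo works on CPU-only machines.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Every rank contributes a tensor; all_reduce sums them in place on all ranks.
    t = torch.ones(4) * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(run_worker, args=(world_size,), nprocs=world_size)
```

Running the script prints `[3.0, 3.0, 3.0, 3.0]` on both ranks, since rank 0 contributes ones and rank 1 contributes twos; swapping `"gloo"` for `"nccl"` and moving the tensor to a GPU gives the multi-GPU version of the same pattern.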