  • University of Washington
  • Seattle, WA

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,144 180 Updated Dec 22, 2025

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 309 23 Updated Dec 20, 2025

SkyRL: A Modular Full-stack RL Library for LLMs

Python 1,396 204 Updated Dec 22, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,826 1,035 Updated Dec 5, 2025

Trace Anything: Representing Any Video in 4D via Trajectory Fields

Python 439 14 Updated Oct 31, 2025

Post-training with Tinker

Python 2,600 255 Updated Dec 20, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,134 106 Updated Dec 22, 2025

Lightweight coding agent that runs in your terminal

Rust 54,504 6,922 Updated Dec 22, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,326 1,095 Updated Dec 2, 2025

[NSDI25] AutoCCL: Automated Collective Communication Tuning for Accelerating Distributed and Parallel DNN Training

C++ 29 3 Updated May 2, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,211 85 Updated Aug 28, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 445 77 Updated Dec 22, 2025

🙌 OpenHands: AI-Driven Development

Python 65,840 8,100 Updated Dec 22, 2025

Python 961 101 Updated Dec 21, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,939 356 Updated Dec 22, 2025

Code repo for efficient quantized MoE inference with mixture of low-rank compensators

Python 28 Updated Apr 14, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,564 246 Updated Dec 18, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,709 2,866 Updated Dec 22, 2025

Fast and memory-efficient exact attention

Python 21,243 2,241 Updated Dec 22, 2025

kernels, of the mega variety

Python 631 34 Updated Sep 28, 2025

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 910 53 Updated Nov 27, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,003 161 Updated Dec 20, 2025

A minimum demo for PyTorch distributed extension functionality for collectives.

C++ 14 4 Updated Jul 29, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 65,952 12,123 Updated Dec 22, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 96,084 26,335 Updated Dec 22, 2025

Fully open reproduction of DeepSeek-R1

Python 25,746 2,405 Updated Nov 24, 2025

Robust Speech Recognition via Large-Scale Weak Supervision

Python 92,241 11,560 Updated Dec 15, 2025

Transformer: PyTorch Implementation of "Attention Is All You Need"

Python 4,332 615 Updated Jul 15, 2025

Linux kernel source tree

C 211,476 59,618 Updated Dec 22, 2025