Starred repositories

Run more RL experiments. Wait less for GPUs.

Python 243 12 Updated Apr 13, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,301 136 Updated Apr 15, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,400 895 Updated Apr 15, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 7,992 1,095 Updated Apr 15, 2026

Quilt is a serverless optimizer that automatically merges workflows that consist of many functions (possibly in different languages) into one process thereby avoiding high invocation latency, commu…

C 7 1 Updated Oct 8, 2025

LLM KV cache compression made easy

Python 1,040 130 Updated Apr 14, 2026

This is a public version of LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision

Python 167 8 Updated Dec 1, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 855 99 Updated Apr 7, 2026
C++ 871 151 Updated Apr 10, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,989 413 Updated Apr 15, 2026

Lightweight Durable Golang Workflows

Go 642 51 Updated Apr 15, 2026

TokenSim is a tool for simulating the behavior of large language models (LLMs) in a distributed environment.

Python 22 2 Updated Sep 20, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 782 207 Updated Apr 2, 2026
Python 5 Updated Jun 26, 2025

A debloater for Python applications

Python 9 Updated Jun 6, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 159,431 32,880 Updated Apr 15, 2026

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

Python 365 38 Updated Apr 13, 2026
Python 971 110 Updated Jan 23, 2025

Serverless LLM Serving for Everyone.

Python 674 70 Updated Mar 6, 2026

Large Language Model Text Generation Inference

Python 10,832 1,261 Updated Mar 21, 2026
Python 158 24 Updated Oct 9, 2024

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)

Python 65 10 Updated Sep 28, 2024

Repo for external large-scale work

Python 6,540 721 Updated Apr 27, 2024

Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]

Python 24 2 Updated Nov 21, 2024

EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).

Python 79 7 Updated Jun 14, 2024

A large-scale simulation framework for LLM inference

Python 585 108 Updated Jul 25, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 76,723 15,633 Updated Apr 15, 2026

A tracker for misogyny issues in tech

140 1 Updated Jul 26, 2024

Managed collective communication service

Rust 24 4 Updated Sep 2, 2024
Python 49 8 Updated Aug 27, 2024