Repositories starred by xutingl

Run more RL experiments. Wait less for GPUs.

Python 238 8 Updated Mar 21, 2026

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,270 131 Updated Apr 3, 2026

FlashInfer: Kernel Library for LLM Serving

Python 5,267 848 Updated Apr 3, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python 7,869 1,064 Updated Apr 3, 2026

Quilt is a serverless optimizer that automatically merges workflows that consist of many functions (possibly in different languages) into one process thereby avoiding high invocation latency, commu…

C 7 1 Updated Oct 8, 2025

LLM KV cache compression made easy

Python 1,009 126 Updated Apr 1, 2026

This is a public version of LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision

Python 167 8 Updated Dec 1, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 840 95 Updated Apr 2, 2026
C++ 843 149 Updated Mar 18, 2026

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 2,902 385 Updated Apr 3, 2026

Lightweight Durable Golang Workflows

Go 635 49 Updated Apr 2, 2026

TokenSim is a tool for simulating the behavior of large language models (LLMs) in a distributed environment.

Python 21 2 Updated Sep 20, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 755 193 Updated Apr 2, 2026
Python 5 Updated Jun 26, 2025

A debloater for Python applications

Python 9 Updated Jun 6, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 158,747 32,724 Updated Apr 3, 2026

Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024

Python 363 37 Updated Feb 5, 2026
Python 971 110 Updated Jan 23, 2025

Serverless LLM Serving for Everyone.

Python 667 69 Updated Mar 6, 2026

Large Language Model Text Generation Inference

Python 10,816 1,260 Updated Mar 21, 2026
Python 156 24 Updated Oct 9, 2024

Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long)

Python 65 10 Updated Sep 28, 2024

Repo for external large-scale work

Python 6,543 721 Updated Apr 27, 2024

Artifact for "Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving" [SOSP '24]

Python 24 2 Updated Nov 21, 2024

EE-LLM is a framework for large-scale training and inference of early-exit (EE) large language models (LLMs).

Python 77 7 Updated Jun 14, 2024

A large-scale simulation framework for LLM inference

Python 570 105 Updated Jul 25, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 75,150 15,126 Updated Apr 3, 2026

A tracker for misogyny issues in tech

140 1 Updated Jul 26, 2024

Managed collective communication service

Rust 24 4 Updated Sep 2, 2024
Python 49 9 Updated Aug 27, 2024