🚀
Member of acceleration stuff

Organizations

@alpa-projects

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,478 993 Updated Apr 4, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 75,188 15,139 Updated Apr 4, 2026

A unified inference and post-training framework for accelerated video generation.

Python 3,342 308 Updated Apr 3, 2026

The source of LMSYS website and blogs

JavaScript 83 75 Updated Mar 31, 2026

d3LLM: Ultra-Fast Diffusion LLM 🚀

Python 114 6 Updated Mar 19, 2026

Simple Distributed Deep Learning on TensorFlow

Python 136 25 Updated Feb 5, 2026

Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning

Python 61 11 Updated Dec 18, 2025

[NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning

Python 67 6 Updated Oct 31, 2025

A curated list of recent papers on efficient video attention for video diffusion models, covering sparsification, quantization, caching, and related techniques.

60 4 Updated Oct 27, 2025
Jupyter Notebook 87 12 Updated Oct 17, 2025

Pygloo provides Python bindings for Gloo.

C++ 22 13 Updated Jul 7, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,450 4,789 Updated Jun 2, 2025

[NeurIPS 2025] A simple extension to vLLM that speeds up reasoning models without training.

Python 227 31 Updated May 31, 2025

shadowsocks.wiki

3,124 515 Updated Apr 22, 2025

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,325 82 Updated Mar 6, 2025

[ICML 2024] CLLMs: Consistency Large Language Models

Python 414 23 Updated Nov 16, 2024

[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank

Python 78 18 Updated Nov 4, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,376 594 Updated Oct 28, 2024

Code for "BayesAdapter: Being Bayesian, Inexpensively and Robustly, via Bayesian Fine-tuning"

Python 32 7 Updated Jul 25, 2024

An end-to-end PyTorch framework for image and video classification

Python 1,614 273 Updated Jun 27, 2024

Hyperparameter tuning via uncertainty modeling

Python 51 4 Updated May 3, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,187 361 Updated Dec 9, 2023

DyNet: The Dynamic Neural Network Toolkit

C++ 3,434 702 Updated Dec 1, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving (OSDI 23)

Python 94 17 Updated Jul 14, 2023

Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes

Python 242 22 Updated May 12, 2023

Resource-adaptive cluster scheduler for deep learning training.

Python 457 81 Updated Mar 5, 2023
Lua 266 64 Updated Jan 26, 2023

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyp…

Python 3 1 Updated Jan 21, 2023

(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.

Python 44 7 Updated Nov 4, 2022