dblate
🐢
AI Infrastructure
  • Baidu
  • Beijing, China


A workload for deploying LLM inference services on Kubernetes

Go 75 19 Updated Oct 8, 2025

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 758 54 Updated Sep 30, 2025

Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on infer…

276 13 Updated Mar 6, 2025

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny BERT model can tell you the verbosity of an LLM (with low latency overhead!)

Jupyter Notebook 44 7 Updated Jun 1, 2024

A Python Linear Programming API

Python 2,343 416 Updated Oct 6, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,288 469 Updated Oct 9, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,099 393 Updated Sep 10, 2025

GoReplay is an open-source tool for capturing and replaying live HTTP traffic into a test environment in order to continuously test your system with real data. It can be used to increase confidence…

Go 19,137 76 Updated Apr 5, 2025

16-fold memory access reduction with nearly no loss

Python 105 8 Updated Mar 26, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,501 627 Updated Oct 9, 2025

[IEEE T-PAMI 2024] All you need for End-to-end Autonomous Driving

3,341 306 Updated Jul 2, 2025

A sparse attention kernel supporting mixed sparse patterns

C++ 313 16 Updated Feb 13, 2025

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Python 16,547 3,672 Updated Jun 2, 2023

A collection of AI Agents papers (Updated biweekly)

608 37 Updated Sep 28, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,986 722 Updated Oct 6, 2025

[ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges

Python 77 2 Updated Feb 27, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 896 43 Updated Sep 17, 2025

Our Clone of Orca used for experimentation

Python 9 4 Updated Oct 15, 2024

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,014 201 Updated Sep 30, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

1,716 72 Updated Oct 9, 2025

The little ASGI framework that shines. 🌟

Python 11,529 1,043 Updated Oct 9, 2025

A curated reading list of research in Mixture-of-Experts(MoE).

646 45 Updated Oct 30, 2024

Materials for learning SGLang

595 48 Updated Oct 1, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,100 790 Updated Oct 9, 2025

A handbook on building your own AI inference engine: everything you need to know, starting from scratch

270 21 Updated Sep 8, 2022

Development repository for the Triton language and compiler

MLIR 17,168 2,289 Updated Oct 10, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,069 389 Updated Oct 9, 2025

My learning notes/codes for ML SYS.

Python 3,823 232 Updated Oct 6, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,264 633 Updated Oct 10, 2025