
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.

C++ 2,013 300 Updated Nov 5, 2025

LLM Inference on consumer devices

Python 125 15 Updated Mar 17, 2025

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 63 5 Updated Sep 15, 2025

[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation

Python 238 16 Updated Dec 16, 2024

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025

Scalable and robust tree-based speculative decoding algorithm

Python 361 37 Updated Jan 28, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 1,976 220 Updated Nov 5, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,704 281 Updated Nov 6, 2025

Efficient and easy multi-instance LLM serving

Python 505 41 Updated Sep 3, 2025

InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)

Python 157 29 Updated Jul 10, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,301 11,072 Updated Nov 6, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,344 479 Updated Nov 6, 2025

Achieve state of the art inference performance with modern accelerators on Kubernetes

Shell 1,983 221 Updated Nov 6, 2025

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,914 691 Updated Nov 6, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 27,554 2,534 Updated Nov 6, 2025

Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.

Python 1,410 236 Updated Jul 31, 2024

🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes. Accepted at RSS 2025.

Python 564 33 Updated Jun 23, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,229 420 Updated Nov 6, 2025

This repository collects papers on VLLM applications. New papers are added irregularly.

174 14 Updated Sep 7, 2025

RDMA core userspace libraries and daemons

C 2,012 792 Updated Nov 2, 2025

This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥

1,583 89 Updated Oct 30, 2025

Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"

Python 831 96 Updated Apr 18, 2024

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 23,908 2,659 Updated Aug 12, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 6,346 468 Updated Aug 7, 2024

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 4,334 518 Updated Mar 23, 2025

Collection of AWESOME vision-language models for vision tasks

2,990 223 Updated Oct 14, 2025

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,110 391 Updated Jul 11, 2024

Python 345 44 Updated Apr 2, 2024

[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 269 18 Updated May 1, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

594 15 Updated Sep 30, 2025