  • Ant Group
  • Shanghai

15 stars written in Python

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,215 11,055 Updated Nov 6, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 19,781 3,275 Updated Nov 6, 2025

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Python 17,578 2,455 Updated Nov 6, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,150 2,430 Updated Nov 6, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,629 2,110 Updated Jul 17, 2025

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 11,943 932 Updated Mar 11, 2025

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python 8,179 885 Updated Nov 4, 2025

My learning notes/codes for ML SYS.

Python 4,073 248 Updated Oct 6, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 2,955 222 Updated Nov 6, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 1,973 220 Updated Nov 5, 2025

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 1,911 313 Updated Nov 6, 2025

Efficient and easy multi-instance LLM serving

Python 505 41 Updated Sep 3, 2025

Chat2Graph: Graph Native Agentic System.

Python 363 43 Updated Oct 30, 2025
Python 34 2 Updated Jul 23, 2024

Integrating SSE with NVIDIA Triton Inference Server using a Python backend and the Zephyr model. There is very little documentation on how to use NVIDIA Triton in streaming use cases (hard to find in their…

Python 10 Updated May 29, 2024