AnnaYue
  • Ant Group
  • Shanghai

Production-Grade Container Scheduling and Management

Go 118,445 41,656 Updated Nov 7, 2025

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.

Go 29,491 4,544 Updated Nov 7, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 19,946 3,298 Updated Nov 7, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,432 11,106 Updated Nov 7, 2025

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

Go 4,670 854 Updated Nov 7, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 2,961 224 Updated Nov 7, 2025

A workload for deploying LLM inference services on Kubernetes

Go 98 25 Updated Nov 7, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,062 1,848 Updated Nov 7, 2025

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …

Go 10,681 535 Updated Nov 7, 2025

Distributed reliable key-value store for the most critical data of a distributed system

Go 50,714 10,207 Updated Nov 7, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,025 559 Updated Nov 7, 2025

Manage k8s resources effectively with risk under control.

Go 95 11 Updated Nov 7, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,217 2,441 Updated Nov 7, 2025

A cloud-native Pipeline resource.

Go 8,793 1,848 Updated Nov 7, 2025

The Triton TensorRT-LLM Backend

905 133 Updated Nov 7, 2025

Temporal service

Go 16,482 1,169 Updated Nov 7, 2025

Heterogeneous AI Computing Virtualization Middleware (Project under CNCF)

Go 2,586 412 Updated Nov 7, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,740 1,520 Updated Nov 7, 2025

Giving Kubernetes Superpowers to everyone

Go 7,111 894 Updated Nov 6, 2025

My learning notes/codes for ML SYS.

Python 4,086 248 Updated Nov 6, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,338 2,266 Updated Nov 6, 2025

Declarative Intent Driven Platform Orchestrator for Internal Developer Platform (IDP).

Go 1,184 94 Updated Nov 6, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,355 480 Updated Nov 6, 2025

HAMi-core compiles libvgpu.so, which ensures hard limit on GPU in container

C 249 114 Updated Nov 6, 2025

HugeSCM - A next generation cloud-based version control system

Go 122 7 Updated Nov 6, 2025

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Python 17,587 2,456 Updated Nov 6, 2025

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 1,918 314 Updated Nov 6, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,235 420 Updated Nov 6, 2025

Kubernetes AI Toolchain Operator

Go 805 145 Updated Nov 5, 2025

Multi-Cluster application progressive delivery controller

Go 18 6 Updated Nov 5, 2025