AnnaYue
  • Ant Group
  • Shanghai

59 results for source starred repositories

Production-Grade Container Scheduling and Management

Go 118,430 41,650 Updated Nov 6, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 62,339 11,080 Updated Nov 6, 2025

Distributed reliable key-value store for the most critical data of a distributed system

Go 50,708 10,208 Updated Nov 6, 2025

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.

Go 29,490 4,544 Updated Nov 6, 2025

LLM training in simple, raw C/CUDA

Cuda 28,091 3,266 Updated Jun 26, 2025

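This project aims to share the technical principles and hands-on experience behind large models (large-model engineering and bringing large-model applications into production).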

HTML 21,719 2,545 Updated Oct 19, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 19,866 3,288 Updated Nov 6, 2025

AI Native Data App Development framework with AWEL(Agentic Workflow Expression Language) and Agents

Python 17,580 2,456 Updated Nov 6, 2025

Temporal service

Go 16,471 1,169 Updated Nov 6, 2025

NVIDIA Linux open GPU kernel module source

C 16,329 1,517 Updated Nov 4, 2025

Kolmogorov Arnold Networks

Jupyter Notebook 15,966 1,523 Updated Jan 19, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,179 2,438 Updated Nov 6, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,640 2,111 Updated Jul 17, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,332 2,264 Updated Nov 6, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,053 1,844 Updated Nov 6, 2025

The official GitHub page for the survey paper "A Survey of Large Language Models".

Python 11,946 932 Updated Mar 11, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,848 896 Updated Sep 30, 2025

vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …

Go 10,677 535 Updated Nov 6, 2025

A cloud-native Pipeline resource.

Go 8,793 1,847 Updated Nov 6, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,736 1,519 Updated Nov 6, 2025

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Python 8,183 885 Updated Nov 4, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,929 286 Updated May 15, 2025

Giving Kubernetes Superpowers to everyone

Go 7,108 894 Updated Nov 5, 2025

Automated management of large-scale applications on Kubernetes (incubating project under CNCF)

Go 5,074 832 Updated Nov 3, 2025

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models

Go 4,669 854 Updated Nov 6, 2025

Fast and reliable background jobs in Go

Go 4,563 126 Updated Oct 27, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 4,345 479 Updated Nov 6, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,232 420 Updated Nov 6, 2025

My learning notes/codes for ML SYS.

Python 4,076 248 Updated Nov 6, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,022 558 Updated Nov 6, 2025