AnnaYue
  • Ant Group
  • Shanghai

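Starred repositories. Each entry gives the description, then the primary language (where shown), star count, fork count, and last-update date.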

Temporal service

Go 16,450 1,166 Updated Nov 5, 2025

HAMi-core compiles libvgpu.so, which enforces hard limits on GPU usage inside containers

C 248 112 Updated Oct 10, 2025

Fast and reliable background jobs in Go

Go 4,562 126 Updated Oct 27, 2025

A workload for deploying LLM inference services on Kubernetes

Go 95 24 Updated Nov 5, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 19,755 3,273 Updated Nov 5, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).

Python 1,971 220 Updated Nov 5, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,219 420 Updated Nov 5, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 15,135 2,428 Updated Nov 5, 2025

My learning notes and code for ML systems (ML SYS).

Python 4,068 247 Updated Oct 6, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,728 1,514 Updated Nov 5, 2025

vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Python 1,907 313 Updated Nov 5, 2025

Production-Grade Container Scheduling and Management

Go 118,408 41,640 Updated Nov 5, 2025

AI Native Data App Development framework with AWEL (Agentic Workflow Expression Language) and Agents

Python 17,578 2,455 Updated Nov 4, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 14,631 2,110 Updated Jul 17, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 2,953 221 Updated Nov 5, 2025

Chat2Graph: Graph Native Agentic System.

Python 362 43 Updated Oct 30, 2025

Cost-efficient and pluggable infrastructure components for GenAI inference

Go 4,341 479 Updated Nov 4, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,018 558 Updated Nov 5, 2025

KCL Programming Language (CNCF Sandbox Project). https://kcl-lang.io

Rust 2,194 152 Updated Oct 30, 2025

Hands-on Machine Learning (动手学机器学习)

Jupyter Notebook 21 3 Updated Aug 14, 2025

LeaderWorkerSet: An API for deploying a group of pods as a unit of replication

Go 605 114 Updated Nov 4, 2025

Heterogeneous AI Computing Virtualization Middleware (Project under CNCF)

Go 2,573 412 Updated Nov 5, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,930 286 Updated May 15, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,842 896 Updated Sep 30, 2025

HugeSCM - A next generation cloud-based version control system

Go 123 7 Updated Nov 5, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,043 1,839 Updated Nov 5, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,326 2,264 Updated Sep 24, 2025

Efficient and easy multi-instance LLM serving

Python 505 41 Updated Sep 3, 2025

The Triton TensorRT-LLM Backend

904 132 Updated Nov 4, 2025

AI fundamentals: GPU architecture, CUDA programming, large-model basics, and AI Agent topics

HTML 555 88 Updated Nov 2, 2025