- alibabacloud
- Beijing
Stars
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
My learning notes for ML SYS.
verl: Volcano Engine Reinforcement Learning for LLMs
Train transformer language models with reinforcement learning.
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
A throughput-oriented high-performance serving framework for LLMs
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
SGLang is a fast serving framework for large language models and vision language models.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A modular graph-based Retrieval-Augmented Generation (RAG) system
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
A high-throughput and memory-efficient inference and serving engine for LLMs
Fast and memory-efficient exact attention
Kubernetes community content
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
🦜🔗 The platform for reliable agents.
Crane is a FinOps Platform for Cloud Resource Analytics and Economics in Kubernetes clusters. The goal is not only to help users to manage cloud cost easier but also ensure the quality of applicati…
An app development platform using cloud native stacks