Stars
Training library for Megatron-based models with bidirectional Hugging Face conversion capability
iFlow CLI is a comprehensive command-line intelligence that embeds in your terminal, analyzes your repositories, does coding tasks, interprets your needs across contexts, and boosts efficiency by p…
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
slime is an LLM post-training framework for RL Scaling.
High-performance distributed data shuffling (all-to-all) library for MoE training and inference
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
andylin-hao / RLinf
Forked from RLinf/RLinf
RLinf is a flexible, scalable and open-source infrastructure designed for reinforcement-learning (RL) post-training of foundation models, including large language models (LLMs), vision-language mo…
RLinf: Reinforcement Learning Infrastructure for Embodied and Agentic AI
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA25)
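AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.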
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.
Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Fast and memory-efficient exact attention
Making large AI models cheaper, faster and more accessible
A utility library for application developers to query the configuration of the Arm Immortalis GPU or Arm Mali GPU present in their system.
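Provides a practical interactive interface for large language models such as GPT/GLM, specially optimized for paper reading, polishing, and writing. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…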
A profiler to disclose and quantify hardware features on GPUs.