Stars
A kernel library written in tilelang
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
You are a once-promising P8-level engineer; when Anthropic first set your grade, expectations for you were high. A high-agency skill for agent use. Your AI has been placed on a PIP. 30 days to show improvement.
A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
An agent for CUDA compute-communication kernel co-design
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
Distributed attention toward linear scalability for ultra-long-context, heterogeneous-data training
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training for deep learning & machine learning, plus cross-platform deployment)
Triton-based implementation of Sparse Mixture of Experts.
LM engine is a library for pretraining/finetuning LLMs
Accelerating MoE with IO and Tile-aware Optimizations
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks …
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Measure and optimize the energy consumption of your AI applications!
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
slime is an LLM post-training framework for RL Scaling.
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
We aim to redefine the portability, performance, programmability, and maintainability of data-parallel libraries by using C++ standard features instead of creating new compilers.
Supercharge Your LLM with the Fastest KV Cache Layer