Starred repositories
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
slime is an LLM post-training framework for RL Scaling.
A framework for efficient model inference with omni-modality models
A Lighting PyTorch Framework for Recommendation Models (PyTorch recommendation algorithm framework), Easy-to-use and Easy-to-extend. https://datawhalechina.github.io/torch-rechub/
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
Perplexity open source garden for inference technology
🌟 100+ original LLM / RL principle diagrams 📚, presented by the author of 《大模型算法》 (Large Model Algorithms)! 💥 (100+ LLM/RL Algorithm Maps)
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various uses.
PipeFusion / PipeFusion
Forked from xdit-project/xDiT
A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters
An industrial deep learning framework for high-dimension sparse data
A unified architecture deep learning framework designed specifically for ultra-large-scale sparse models.
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
A PyTorch native platform for training generative AI models
dInfer: An Efficient Inference Framework for Diffusion Language Models
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
Distributed parallel 3D-Causal-VAE for efficient training and inference
Wan: Open and Advanced Large-Scale Video Generative Models
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels