Starred repositories
Pluggable in-process caching engine to build and scale high-performance services
vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization
Cost-efficient and pluggable infrastructure components for GenAI inference
A Datacenter Scale Distributed Inference Serving Framework
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Integrate the DeepSeek API into popular software
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Community maintained hardware plugin for vLLM on Ascend
A sparse attention kernel supporting mixed sparse patterns
verl: Volcano Engine Reinforcement Learning for LLMs
PyTorch native quantization and sparsity for training and inference
🚀 Efficient implementations of state-of-the-art linear attention models
Janus-Series: Unified Multimodal Understanding and Generation Models
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
16-fold memory access reduction with nearly no loss
Build custom inference engines for models, agents, multi-modal systems, RAG, pipelines and more.
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.