Starred repositories
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Official inference repo for FLUX.1 models
An early research stage expert-parallel load balancer for MoE models based on linear programming.
A guidance language for controlling large language models.
Disaggregated serving system for Large Language Models (LLMs).
High performance Transformer implementation in C++.
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
MSCCL++: A GPU-driven communication stack for scalable AI applications
A throughput-oriented high-performance serving framework for LLMs
Venus Collective Communication Library, supported by SII and Infrawaves.
Efficient Compute-Communication Overlap for Distributed LLM Inference
A deep dive into new Linux kernel features, exemplified by io_uring, cgroup, eBPF, and LLVM, including open-source projects, code examples, articles, videos, and architecture mind maps
Seamless operability between C++11 and Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
An annotated nano_vllm repository, with a completed MiniCPM4 adaptation and support for registering new models
bytedance-iaas / sglang
Forked from sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
preminstrel / vllm
Forked from vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing production. Qlib supports diverse ML modeling paradigms, i…
ArcticInference: vLLM plugin for high-throughput, low-latency inference
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Ring attention implementation with flash attention
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation…