Mooncake Public
Forked from kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
C++ · Apache License 2.0 · Updated Dec 5, 2025
ThunderKittens Public
Forked from HazyResearch/ThunderKittens
Tile primitives for speedy kernels
Cuda · MIT License · Updated Dec 4, 2025
DLSlime Public
Forked from DeepLink-org/DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
C++ · BSD 3-Clause "New" or "Revised" License · Updated Oct 26, 2025
mscclpp Public
Forked from microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
C++ · MIT License · Updated Oct 20, 2025
Nanoflow Public
Forked from efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Jupyter Notebook · Updated Oct 20, 2025
guidance Public
Forked from guidance-ai/guidance
A guidance language for controlling large language models.
Jupyter Notebook · MIT License · Updated Oct 14, 2025
VCCL Public
Forked from sii-research/VCCL
Venus Collective Communication Library, supported by SII and Infrawaves.
C++ · Other · Updated Oct 13, 2025
lmdeploy Public
Forked from InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Python · Apache License 2.0 · Updated Sep 12, 2025
pybind11 Public
Forked from pybind/pybind11
Seamless operability between C++11 and Python
C++ · Other · Updated Sep 8, 2025
tokenweave Public
Forked from microsoft/tokenweave
Efficient Compute-Communication Overlap for Distributed LLM Inference
Python · Other · Updated Sep 8, 2025
sglang Public
Forked from bytedance-iaas/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python · Apache License 2.0 · Updated Sep 4, 2025
ArcticInference Public
Forked from snowflakedb/ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
Python · Apache License 2.0 · Updated Aug 19, 2025
tilelang Public
Forked from tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
C++ · Other · Updated Aug 12, 2025
vllm Public
Forked from preminstrel/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python · Apache License 2.0 · Updated Aug 12, 2025
nano_vllm_note Public
Forked from LDLINGLINGLING/nano_vllm_note
An annotated copy of the nano_vllm repository that also adds MiniCPM4 adaptation and support for registering new models
Python · MIT License · Updated Aug 11, 2025
flux-models Public
Forked from black-forest-labs/flux
Official inference repo for FLUX.1 models
Python · Apache License 2.0 · Updated Jul 31, 2025
coze-studio Public
Forked from coze-dev/coze-studio
An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.
TypeScript · Apache License 2.0 · Updated Jul 30, 2025
coze-loop Public
Forked from coze-dev/coze-loop
Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation…
Go · Apache License 2.0 · Updated Jul 29, 2025
nano-vllm Public
Forked from GeeeekExplorer/nano-vllm
Nano vLLM
Python · MIT License · Updated Jun 13, 2025
rocSHMEM Public
Forked from ROCm/rocSHMEM
rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
C++ · MIT License · Updated Apr 10, 2025
Triton-distributed Public
Forked from ByteDance-Seed/Triton-distributed
Distributed Triton for Parallel Systems
MLIR · MIT License · Updated Apr 7, 2025
pplx-kernels Public
Forked from perplexityai/pplx-kernels
Perplexity GPU Kernels
C++ · MIT License · Updated Apr 7, 2025
DistServe Public
Forked from LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
Jupyter Notebook · Apache License 2.0 · Updated Apr 6, 2025
tccl-triton Public
Forked from cchan/tccl
extensible collectives library in triton
Python · Updated Mar 31, 2025
AlexNet-Source-Code Public
Forked from computerhistory/AlexNet-Source-Code
This package contains the original 2012 AlexNet code.
Cuda · BSD 2-Clause "Simplified" License · Updated Mar 12, 2025
flux Public
Forked from bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
C++ · Apache License 2.0 · Updated Mar 11, 2025
DeepEP Public
Forked from deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
Cuda · MIT License · Updated Feb 25, 2025