Moore Threads
Shanghai, China
Stars
Cloud Native WireGuard Management Platform built on WireGuard
My learning notes for ML SYS.
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Eigent: The Open Source Cowork Desktop to Unlock Your Exceptional Productivity. Local and Free Alternative to Claude Cowork.
OpenSource Claude Cowork. A desktop AI assistant that helps you with programming, file management, and any task you can describe.
An open-source alternative to Claude Cowork built for teams, powered by opencode
A Lightweight LLM Inference Performance Simulator
A distributed key-value storage system developed by Alibaba Group
Provides a Python interface to GPU management and monitoring functions. This is a wrapper around the MTML library.
An adapter layer that ensures torch_musa🔦 delivers a CUDA-compatible PyTorch experience.
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Kubernetes-native AI serving platform for scalable model serving.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
An early research stage expert-parallel load balancer for MoE models based on linear programming.
How to optimize some algorithms in CUDA.
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
antgroup / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
yeahdongcn / sglang
Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
PyTorch distributed tutorials
Use kwok and kind to simulate a 100,000-GPU-node cluster to test scheduler performance.
Distributed KV cache scheduling & offloading libraries