  • Moore Threads
  • Shanghai, China


Cloud-native management platform built on WireGuard

Go · 10 stars · 5 forks · Updated Mar 9, 2026

Archive of presentation materials from the Ant Group open-source technology salon

18 stars · Updated Jan 21, 2026

My learning notes for ML systems (MLSys).

Python · 5,767 stars · 374 forks · Updated Mar 19, 2026

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.

Python · 372 stars · 61 forks · Updated Mar 24, 2026

Eigent: The Open Source Cowork Desktop to Unlock Your Exceptional Productivity. Local and Free Alternative to Claude Cowork.

TypeScript · 13,193 stars · 1,531 forks · Updated Mar 25, 2026

Open-source Claude Cowork: a desktop AI assistant that helps you with programming, file management, and any task you can describe.

TypeScript · 3,068 stars · 413 forks · Updated Mar 21, 2026

An open-source alternative to Claude Cowork built for teams, powered by opencode

TypeScript · 12,411 stars · 1,118 forks · Updated Mar 25, 2026

A Lightweight LLM Inference Performance Simulator

Python · 67 stars · 18 forks · Updated Mar 18, 2026

A distributed key-value storage system developed by Alibaba Group

C++ · 2,306 stars · 615 forks · Updated Nov 19, 2019

Dockerfile formatter; a modern dockfmt.

Dockerfile · 591 stars · 20 forks · Updated Feb 15, 2026

MUSA AI Tensor Engine

C++ · 5 stars · Updated Dec 19, 2025

Provides a Python interface to GPU management and monitoring functions. This is a wrapper around the MTML library.

C · 6 stars · 5 forks · Updated Mar 10, 2026

An adapter layer that ensures torch_musa🔦 delivers a CUDA-compatible PyTorch experience.

Python · 31 stars · 5 forks · Updated Mar 19, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python · 3,785 stars · 517 forks · Updated Mar 13, 2026

Kubernetes-native AI serving platform for scalable model serving.

Go · 272 stars · 73 forks · Updated Mar 19, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python · 158,356 stars · 32,597 forks · Updated Mar 24, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 687 stars · 40 forks · Updated Mar 8, 2026

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python · 499 stars · 34 forks · Updated Nov 19, 2025

How to optimize common algorithms in CUDA.

Cuda · 2,884 stars · 264 forks · Updated Mar 24, 2026

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Go · 404 stars · 66 forks · Updated Mar 24, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 10,008 stars · 1,001 forks · Updated Mar 23, 2026

Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization

JavaScript · 6,992 stars · 746 forks · Updated Mar 21, 2026

A domain-specific language designed to streamline the development of high-performance kernels for GPUs, CPUs, and accelerators.

Python · 5,420 stars · 485 forks · Updated Mar 24, 2026

SGLang is a fast serving framework for large language models and vision language models.

Python · 30 stars · 5 forks · Updated Mar 25, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 4,969 stars · 623 forks · Updated Mar 25, 2026

SGLang is a fast serving framework for large language models and vision language models.

Python · 3 stars · Updated Mar 25, 2026

PyTorch distributed training tutorials.

Jupyter Notebook · 172 stars · 32 forks · Updated Jun 16, 2025

Use kwok and kind to simulate a 100,000-GPU-node cluster to test scheduler performance.

Shell · 3 stars · Updated Aug 14, 2025

Distributed KV cache scheduling & offloading libraries

Go · 121 stars · 104 forks · Updated Mar 24, 2026

Inference scheduler for llm-d

Go · 145 stars · 148 forks · Updated Mar 24, 2026