  • Moore Threads
  • Shanghai, China


Cloud Native WireGuard Management Platform built on WireGuard

Go 10 5 Updated Mar 9, 2026

Archive of presentation materials from the Ant Group open-source technology salon

18 Updated Jan 21, 2026

My learning notes for ML SYS.

Python 5,744 373 Updated Mar 19, 2026

vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.

Python 346 61 Updated Mar 22, 2026

Eigent: The Open Source Cowork Desktop to Unlock Your Exceptional Productivity. Local and Free Alternative to Claude Cowork.

TypeScript 13,127 1,515 Updated Mar 22, 2026

Open-source Claude Cowork. A desktop AI assistant that helps you with programming, file management, and any task you can describe.

TypeScript 3,053 411 Updated Mar 21, 2026

An open-source alternative to Claude Cowork built for teams, powered by opencode

TypeScript 12,236 1,109 Updated Mar 22, 2026

A Lightweight LLM Inference Performance Simulator

Python 67 18 Updated Mar 18, 2026

A distributed key-value storage system developed by Alibaba Group

C++ 2,307 616 Updated Nov 19, 2019

Dockerfile formatter. A modern dockfmt.

Dockerfile 590 20 Updated Feb 15, 2026

MUSA AI Tensor Engine

C++ 5 Updated Dec 19, 2025

Provides a Python interface to GPU management and monitoring functions. This is a wrapper around the MTML library.

C 6 5 Updated Mar 10, 2026

An adapter layer that ensures torch_musa🔦 delivers a CUDA-compatible PyTorch experience.

Python 31 5 Updated Mar 19, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 3,770 515 Updated Mar 13, 2026

Kubernetes-native AI serving platform for scalable model serving.

Go 267 72 Updated Mar 19, 2026

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 158,258 32,575 Updated Mar 22, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 687 40 Updated Mar 8, 2026

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 499 34 Updated Nov 19, 2025

How to optimize algorithms in CUDA.

Cuda 2,882 264 Updated Mar 22, 2026

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Go 401 66 Updated Mar 20, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,971 993 Updated Mar 20, 2026

Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization

JavaScript 6,981 746 Updated Mar 21, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 5,409 482 Updated Mar 22, 2026

SGLang is a fast serving framework for large language models and vision language models.

Python 30 5 Updated Mar 22, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,960 616 Updated Mar 21, 2026

SGLang is a fast serving framework for large language models and vision language models.

Python 3 Updated Mar 20, 2026

PyTorch distributed tutorials

Jupyter Notebook 172 32 Updated Jun 16, 2025

Use kwok and kind to simulate a 100,000-GPU-node cluster to test scheduler performance.

Shell 3 Updated Aug 14, 2025

Distributed KV cache scheduling & offloading libraries

Go 119 102 Updated Mar 20, 2026

Inference scheduler for llm-d

Go 144 146 Updated Mar 22, 2026