zfy3000163

Starred repositories

Jupyter Notebook 383 74 Updated Dec 21, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,460 475 Updated Dec 21, 2025

Tile primitives for speedy kernels

Cuda 3,009 217 Updated Dec 9, 2025

Official inference repo for FLUX.1 models

Python 24,935 1,829 Updated Jul 31, 2025

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 471 27 Updated Nov 19, 2025
Python 1 1 Updated Jan 22, 2025

A guidance language for controlling large language models.

Jupyter Notebook 21,013 1,129 Updated Dec 17, 2025
Jupyter Notebook 19 3 Updated Sep 26, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 754 82 Updated Apr 6, 2025

High performance Transformer implementation in C++.

C++ 146 17 Updated Jan 18, 2025

DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit

C++ 84 7 Updated Dec 20, 2025

Contexts Optical Compression

Python 21,510 1,924 Updated Oct 25, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 444 77 Updated Dec 19, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 924 45 Updated Oct 29, 2025

Venus Collective Communication Library, supported by SII and Infrawaves.

C++ 125 5 Updated Dec 18, 2025

Efficient Compute-Communication Overlap for Distributed LLM Inference

Python 66 4 Updated Oct 31, 2025
C++ 15 5 Updated Sep 10, 2025

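A deep dive into new Linux kernel features, represented by io_uring, cgroup, ebpf, and llvm, including open-source projects, code examples, articles, videos, architecture mind maps, and more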

C 1,866 283 Updated May 20, 2024

Seamless operability between C++11 and Python

C++ 17,565 2,253 Updated Dec 15, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,408 635 Updated Dec 20, 2025

An annotated nano_vllm repository that also adds MiniCPM4 adaptation and support for registering new models

Python 121 23 Updated Aug 11, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 17 10 Updated Dec 19, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1 1 Updated Aug 12, 2025

Qlib is an AI-oriented Quant investment platform that aims to use AI tech to empower Quant Research, from exploring ideas to implementing productions. Qlib supports diverse ML modeling paradigms, i…

Python 34,715 5,402 Updated Dec 18, 2025

ArcticInference: vLLM plugin for high-throughput, low-latency inference

Python 352 40 Updated Dec 16, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,266 350 Updated Dec 21, 2025

Ring attention implementation with flash attention

Python 949 90 Updated Sep 10, 2025

Infiniband Verbs Performance Tests

C 889 363 Updated Dec 14, 2025

Extensible collectives library in Triton

Python 91 6 Updated Mar 31, 2025

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation…

Go 5,181 712 Updated Dec 19, 2025