zfy3000163

zfy3000 zfy3000163

18 followers · 102 following

Achievements

Starred repositories

deepseek-ai / TileKernels

A kernel library written in tilelang

Python 1,302 106 Updated Apr 23, 2026

kyegomez / OpenMythos

A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.

Python 10,960 2,449 Updated Apr 27, 2026

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,152 146 Updated Mar 21, 2025

sphish / Programming-Practice

My homework for Programming Practice.

C++ 1 1 Updated May 31, 2017

sphish / Matrix-Calculator

A simple matrix calculator

Python 1 1 Updated Dec 5, 2016

sphish / qperf

Forked from linux-rdma/qperf

C 1 Updated May 27, 2020

sphish / perftest

Forked from linux-rdma/perftest

Infiniband Verbs Performance Tests

C 1 1 Updated Feb 22, 2022

sphish / RDMA-Example

C++ 1 2 Updated Mar 7, 2022

tukuaiai / vibe-coding-cn

Forked from EnzeD/vibe-coding

Vibe Coding Guide: an AI programming workstation covering prompts, a skill library, and workflows

Python 12,026 1,240 Updated Apr 28, 2026

BytedTsinghua-SIA / CUDA-Agent

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Python 940 75 Updated Mar 4, 2026

tsint / bpfviewer

A developer tool for disassembling, analyzing, debugging, and visualizing BPF object files.

HTML 25 4 Updated Feb 11, 2026

NVIDIA / nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 517 76 Updated Apr 14, 2026

openclaw / openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 365,630 74,933 Updated Apr 28, 2026

UChi-JCL / CacheGen

Python 158 24 Updated Oct 9, 2024

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,334 143 Updated Apr 28, 2026

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Python 8,144 1,135 Updated Apr 28, 2026

ApostaC / LMCache

Forked from LMCache/LMCache

Python 5 Updated Apr 27, 2026

intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python 8,794 1,422 Updated Jan 28, 2026

thu-ml / TurboDiffusion

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,477 250 Updated Apr 15, 2026

perplexityai / pplx-garden

Perplexity open source garden for inference technology

Rust 401 38 Updated Dec 25, 2025

david-xinyuwei / david-share

Jupyter Notebook 407 82 Updated Apr 28, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,215 713 Updated Apr 28, 2026

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,327 276 Updated Apr 25, 2026

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 25,469 1,878 Updated Jul 31, 2025

deepseek-ai / LPLB

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 503 34 Updated Nov 19, 2025

YitaoYuan / pytorch-nccl-test

Python 1 2 Updated Jan 22, 2025

guidance-ai / guidance

A guidance language for controlling large language models.

Jupyter Notebook 21,409 1,157 Updated Apr 10, 2026

LLMServe / FastServe

Jupyter Notebook 27 4 Updated Sep 26, 2025

LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 804 92 Updated Apr 6, 2025

LLMServe / SwiftTransformer

High performance Transformer implementation in C++.

C++ 154 18 Updated Jan 18, 2025

Starred topics

packet-filter

socket-server-c

fstackqperf

qperfdpdk

noviswitch