
Starred repositories of zfy3000163


A kernel library written in tilelang

Python · 1,302 stars · 106 forks · Updated Apr 23, 2026

A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.

Python · 10,960 stars · 2,449 forks · Updated Apr 27, 2026

Analyze computation-communication overlap in V3/R1.

1,152 stars · 146 forks · Updated Mar 21, 2025

My homework for Programming Practice.

C++ · 1 star · 1 fork · Updated May 31, 2017

A simple matrix calculator

Python · 1 star · 1 fork · Updated Dec 5, 2016

C · 1 star · Updated May 27, 2020

Infiniband Verbs Performance Tests

C · 1 star · 1 fork · Updated Feb 22, 2022

C++ · 1 star · 2 forks · Updated Mar 7, 2022

Vibe Coding guide: an AI programming workstation covering prompts, a skill library, and workflows

Python · 12,026 stars · 1,240 forks · Updated Apr 28, 2026

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

Python · 940 stars · 75 forks · Updated Mar 4, 2026

A developer tool for disassembling, analyzing, debugging, and visualizing BPF object files.

HTML · 25 stars · 4 forks · Updated Feb 11, 2026

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ · 517 stars · 76 forks · Updated Apr 14, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript · 365,630 stars · 74,933 forks · Updated Apr 28, 2026

Python · 158 stars · 24 forks · Updated Oct 9, 2024

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ · 1,334 stars · 143 forks · Updated Apr 28, 2026

Supercharge Your LLM with the Fastest KV Cache Layer

Python · 8,144 stars · 1,135 forks · Updated Apr 28, 2026

Python · 5 stars · Updated Apr 27, 2026

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discr…

Python · 8,794 stars · 1,422 forks · Updated Jan 28, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python · 3,477 stars · 250 forks · Updated Apr 15, 2026

Perplexity open source garden for inference technology

Rust · 401 stars · 38 forks · Updated Dec 25, 2025

Jupyter Notebook · 407 stars · 82 forks · Updated Apr 28, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 5,215 stars · 713 forks · Updated Apr 28, 2026

Tile primitives for speedy kernels

Cuda · 3,327 stars · 276 forks · Updated Apr 25, 2026

Official inference repo for FLUX.1 models

Python · 25,469 stars · 1,878 forks · Updated Jul 31, 2025

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python · 503 stars · 34 forks · Updated Nov 19, 2025

Python · 1 star · 2 forks · Updated Jan 22, 2025

A guidance language for controlling large language models.

Jupyter Notebook · 21,409 stars · 1,157 forks · Updated Apr 10, 2026

Jupyter Notebook · 27 stars · 4 forks · Updated Sep 26, 2025

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook · 804 stars · 92 forks · Updated Apr 6, 2025

High performance Transformer implementation in C++.

C++ · 154 stars · 18 forks · Updated Jan 18, 2025