zigzagcai

🏝️

Happy coding, happy life!

Zheng Cai zigzagcai

An engineer and learner passionate about practical distributed systems

75 followers · 119 following

Shanghai, China
UTC +08:00

Achievements

Highlights

Developer Program Member

Starred repositories

vllm-project / semantic-router

Intelligent Router for Mixture-of-Models

Go · 2,582 stars · 366 forks · Updated Dec 25, 2025

nex-agi / NexAU

NexAU (AU for Agent Universe), a general-purpose agent framework for building intelligent agents with tool capabilities.

Python · 36 stars · 6 forks · Updated Dec 25, 2025

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda · 2,715 stars · 244 forks · Updated Dec 23, 2025

NVIDIA-NeMo / Megatron-Bridge

HuggingFace conversion and training library for Megatron-based models

Python · 310 stars · 111 forks · Updated Dec 25, 2025

NVIDIA / cuda-tile

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR · 344 stars · 26 forks · Updated Dec 20, 2025

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python · 727 stars · 73 forks · Updated Nov 30, 2025

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python · 8,201 stars · 737 forks · Updated May 31, 2024

facebookresearch / segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook · 53,005 stars · 6,184 forks · Updated Sep 18, 2024

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python · 1,657 stars · 209 forks · Updated Dec 25, 2025

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python · 462 stars · 27 forks · Updated Dec 25, 2025

deepseek-ai / LPLB

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python · 476 stars · 27 forks · Updated Nov 19, 2025

sgl-project / mini-sglang

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python · 2,344 stars · 206 forks · Updated Dec 23, 2025

vllm-project / router

A high-performance and light-weight router for vLLM large scale deployment

Rust · 63 stars · 11 forks · Updated Dec 23, 2025

NVIDIA / ib-traffic-monitor

A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node

C · 60 stars · 5 forks · Updated Dec 19, 2025

feifeibear / long-context-attention

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python · 618 stars · 74 forks · Updated Dec 24, 2025

SWE-bench / SWE-bench

SWE-bench: Can Language Models Resolve Real-world Github Issues?

Python · 4,015 stars · 720 forks · Updated Dec 18, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda · 9,053 stars · 891 forks · Updated Dec 24, 2025

tile-ai / TileRT

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python · 467 stars · 22 forks · Updated Dec 23, 2025

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 1,672 stars · 86 forks · Updated Dec 20, 2025

keivenchang / dynamo-utils

These are personal utilities that are useful for personal use

Python · 1 star · Updated Dec 25, 2025

NVIDIA / nsight-python

Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools

Python · 78 stars · 6 forks · Updated Dec 23, 2025

ChenQiaoling00 / HydroJobSche

Python · 12 stars · Updated Nov 28, 2025

Deep-Learning-Profiling-Tools / triton-viz

Python · 267 stars · 24 forks · Updated Dec 23, 2025

Tencent-Hunyuan / HunyuanOCR

Python · 1,342 stars · 106 forks · Updated Dec 4, 2025

meta-pytorch / KernelAgent

Autonomous GPU Kernel Generation via Deep Agents

Python · 192 stars · 21 forks · Updated Dec 20, 2025

vllm-project / speculators

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python · 174 stars · 22 forks · Updated Dec 19, 2025

radixark / miles

Python · 627 stars · 60 forks · Updated Dec 25, 2025

nex-agi / NexDR

NexDR (Nex Deep Research), a leading deep research agent that autonomously investigates complex topics and generates rich, structured reports.

Python · 27 stars · 1 fork · Updated Dec 4, 2025

nex-agi / NexRL

NexRL is an ultra-loosely-coupled LLM post-training framework.

Python · 62 stars · 4 forks · Updated Nov 18, 2025

apache / tvm-ffi

Open ABI and FFI for Machine Learning Systems

C++ · 258 stars · 43 forks · Updated Dec 24, 2025

Starred topics

CUDA

Bitcoin