🏝️
Happy coding, happy life!
  • Shanghai, China
  • 00:15 (UTC +08:00)

Starred repositories

Intelligent Router for Mixture-of-Models

Go 2,582 366 Updated Dec 25, 2025

NexAU (AU for Agent Universe), a general-purpose agent framework for building intelligent agents with tool capabilities.

Python 36 6 Updated Dec 25, 2025

How to optimize some algorithms in CUDA.

Cuda 2,715 244 Updated Dec 23, 2025

HuggingFace conversion and training library for Megatron-based models

Python 310 111 Updated Dec 25, 2025

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR 344 26 Updated Dec 20, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 727 73 Updated Nov 30, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 8,201 737 Updated May 31, 2024

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 53,005 6,184 Updated Sep 18, 2024

A framework for efficient model inference with omni-modality models

Python 1,657 209 Updated Dec 25, 2025

Accelerating MoE with IO and Tile-aware Optimizations

Python 462 27 Updated Dec 25, 2025

An early research stage expert-parallel load balancer for MoE models based on linear programming.

Python 476 27 Updated Nov 19, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,344 206 Updated Dec 23, 2025

A high-performance and light-weight router for vLLM large scale deployment

Rust 63 11 Updated Dec 23, 2025

A TUI-based utility for real-time monitoring of InfiniBand traffic and performance metrics on the local node

C 60 5 Updated Dec 19, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 618 74 Updated Dec 24, 2025

SWE-bench: Can Language Models Resolve Real-world GitHub Issues?

Python 4,015 720 Updated Dec 18, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,053 891 Updated Dec 24, 2025

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 467 22 Updated Dec 23, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,672 86 Updated Dec 20, 2025

These are personal utilities that are useful for personal use

Python 1 Updated Dec 25, 2025

Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools

Python 78 6 Updated Dec 23, 2025
Python 12 Updated Nov 28, 2025

Autonomous GPU Kernel Generation via Deep Agents

Python 192 21 Updated Dec 20, 2025

A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM

Python 174 22 Updated Dec 19, 2025
Python 627 60 Updated Dec 25, 2025

NexDR (Nex Deep Research), a leading deep research agent that autonomously investigates complex topics and generates rich, structured reports.

Python 27 1 Updated Dec 4, 2025

NexRL is an ultra-loosely-coupled LLM post-training framework.

Python 62 4 Updated Nov 18, 2025

Open ABI and FFI for Machine Learning Systems

C++ 258 43 Updated Dec 24, 2025