View MaoZiming's full-sized avatar
🔭
Thinking

Organizations

@NetSys @Y-Hack @Yale-LILY @ucbsky @yale-nova @skypilot-org @berkeley-cs168 @Trinity-data-store @uccl-project


htop-like TUI for real-time RDMA network monitoring.

Rust 48 3 Updated Jun 23, 2026

Can LLMs Write Correct and Efficient GPU Communication Code?

Python 36 1 Updated Jun 9, 2026

The end of web parsing. The beginning of scalable pixel-native search.

Python 3,617 324 Updated Jun 23, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 241 22 Updated Jun 21, 2026

Ring attention implementation with flash attention

Python 1,026 99 Updated Sep 10, 2025

Fast and memory-efficient exact kmeans

Python 662 39 Updated Jun 3, 2026

A Proof-oriented Programming Language

F* 3,047 256 Updated Jun 23, 2026

Python 412 28 Updated Jun 22, 2026

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 855 58 Updated Jun 22, 2026

A benchmark of real-world DL kernel problems

Python 236 26 Updated May 28, 2026

Share your GPU without MIG or MPS

Python 50 4 Updated Jan 27, 2026

Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.

Rust 27,224 1,431 Updated Jun 10, 2026

Automated High-Performance GPU Kernel Generation

Python 116 22 Updated Jun 1, 2026

Fast and Furious AMD Kernels

C++ 434 69 Updated Jun 21, 2026

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Python 303 105 Updated Nov 3, 2025

Shell 22 3 Updated Jan 18, 2026

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,405 1,059 Updated Jun 4, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,329 223 Updated Jun 22, 2026

Research works from Tencent AI Lab regarding self-evolving agents

Python 98 5 Updated Jan 30, 2026

SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems

Python 21 3 Updated Jun 23, 2026

Autonomous GPU Kernel Generation & Optimization via Deep Agents

Python 456 76 Updated Jun 6, 2026

Building the Virtuous Cycle for AI-driven LLM Systems

Python 250 41 Updated May 1, 2026

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,331 105 Updated Aug 28, 2025

Tile-based language built for AI computation across all scales

C++ 170 8 Updated Jun 16, 2026

Tile-Based Runtime for Ultra-Low-Latency LLM Inference

Python 1,459 92 Updated Jun 8, 2026

[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Python 366 44 Updated Jun 19, 2026

Distributed MoE in a Single Kernel [NeurIPS '25]

Cuda 268 39 Updated May 5, 2026

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 17,313 1,321 Updated Jun 22, 2026