- UC Berkeley
- Berkeley, CA
- UTC -07:00
- https://maoziming.github.io/
- @ziming_mao
- in/maoziming
Stars
htop-like TUI for real-time RDMA network monitoring.
Can LLMs Write Correct and Efficient GPU Communication Code?
The end of web parsing. The beginning of scalable pixel-native search.
mKernel: fast multi-node, multi-GPU fused kernels
Ring attention implementation with flash attention
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
A benchmark of real-world DL kernel problems
Google Workspace CLI — one command-line tool for Drive, Gmail, Calendar, Sheets, Docs, Chat, Admin, and more. Dynamically built from Google Discovery Service. Includes AI agent skills.
Automated High-Performance GPU Kernel Generation
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
Research works from Tencent AI Lab regarding self-evolving agents
SysMoBench: Evaluating AI on Formally Modeling Complex Real-World Systems
Autonomous GPU Kernel Generation & Optimization via Deep Agents
Building the Virtuous Cycle for AI-driven LLM Systems
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
tile-ai / tilescale
Forked from tile-ai/tilelang
Tile-based language built for AI computation across all scales
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
Distributed MoE in a Single Kernel [NeurIPS '25]
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations