🎯
Focusing
  • AMD
  • Shenzhen

Organizations

@opencv @ROCm


DFlash: Block Diffusion for Flash Speculative Decoding

Python 5,090 368 Updated May 10, 2026

RDNA-native LLM inference engine in Rust.

Rust 432 46 Updated Jun 13, 2026
Jupyter Notebook 3 Updated Jun 5, 2026

A context and memory system for AI coding agents. Persistent memory, personal rules, skills, and scheduled observations

Python 583 142 Updated Jun 12, 2026

mKernel: fast multi-node, multi-GPU fused kernels

Cuda 231 22 Updated Jun 8, 2026

A kernel library written in tilelang

Python 1,586 138 Updated Apr 23, 2026

LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss

Python 57 9 Updated Mar 30, 2026

Use Garry Tan's exact Claude Code setup: 23 opinionated tools that serve as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA

TypeScript 109,697 16,296 Updated Jun 12, 2026

AI agents running research on single-GPU nanochat training automatically

Python 86,489 12,527 Updated Mar 26, 2026

Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.

Python 1,406 142 Updated Mar 19, 2026

Official inference framework for 1-bit LLMs

Python 39,300 3,592 Updated Mar 10, 2026

Clspv is a compiler for OpenCL C to Vulkan compute shaders

LLVM 718 102 Updated Jun 8, 2026

LLM inference in C/C++

C++ 116,367 19,539 Updated Jun 13, 2026

Official codebase for the MLSys 2026 paper "IntAttention: A Fully Integer Attention Pipeline for Efficient Edge Inference". It enables high-fidelity and high-speed LLM/ViT deployment on ARM CPUs.

Python 15 3 Updated May 29, 2026

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE, WebAssembly, VSX, RISC-V)

C++ 2,705 302 Updated Jun 11, 2026

AI Edge Quantizer: flexible post training quantization for LiteRT models.

Python 154 30 Updated Jun 12, 2026

The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry.🦞

50,182 4,888 Updated Jun 8, 2026

Let coding agents use ncu skills to analyze CUDA programs automatically!

Shell 109 8 Updated May 25, 2026

Build resilient agents.

Python 34,615 5,815 Updated Jun 13, 2026

llmbasedos — Local-First OS Where Your AI Agents Wake Up and Work

Python 287 20 Updated Jan 6, 2026

[MLSys 2026] AccelOpt: Self-improving Agents for AI Accelerator Kernel Optimization

Python 55 7 Updated Jun 7, 2026

Our first fully AI generated deep learning system

Python 628 48 Updated Feb 2, 2026

Everyone can use English

TypeScript 34,750 4,868 Updated Feb 3, 2026

An unbiased CPU benchmark by OpenCV that provides an evaluation of different CPUs under real-world computer vision and AI workloads.

Python 10 1 Updated Feb 4, 2026

[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.

Cuda 1,003 94 Updated Feb 25, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 4,386 698 Updated May 17, 2026

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Python 5,314 797 Updated Jun 12, 2026

Light Image Video Generation Inference Framework

Python 2,389 216 Updated Jun 13, 2026