zihaomu
🎯 Focusing
  • Shenzhen

Organizations

@opencv


Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Python 5,082 744 Updated Dec 17, 2025

Light Video Generation Inference Framework

Python 1,246 79 Updated Dec 19, 2025

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation

Python 503 35 Updated Dec 13, 2025

Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

PostScript 20,975 2,519 Updated Jun 30, 2025

Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)

Python 3,175 382 Updated Jun 11, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 804 178 Updated Dec 19, 2025

CUDA & Triton Learning Project: Exploring Flash Attention Implementations

Python 14 2 Updated Aug 14, 2025

LeetGPU Challenges

Python 545 43 Updated Dec 11, 2025

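微舆 (Weiyu): a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no framework dependencies.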

Python 33,237 6,398 Updated Dec 20, 2025

Simple high-throughput inference library

Python 153 10 Updated May 14, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

624 18 Updated Sep 30, 2025

[ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression

Python 28 3 Updated Aug 7, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,876 289 Updated Dec 11, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 12,929 1,506 Updated Dec 17, 2025

LeetGPU Solutions

Python 91 5 Updated Oct 9, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,001 161 Updated Dec 20, 2025

A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,266 350 Updated Dec 20, 2025

Triton for OpenCL backend, and use mlir-translate to get source OpenCL code

MLIR 23 4 Updated Aug 27, 2025

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 4,422 429 Updated Oct 27, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,064 522 Updated May 5, 2025

Fork of the Triton language and compiler for Windows support and easy installation

MLIR 1,676 93 Updated Dec 15, 2025

A collection of compiler learning resources.

Python 2,618 362 Updated Mar 19, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,819 1,034 Updated Dec 5, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,377 1,011 Updated Dec 4, 2025
Python 94 9 Updated Dec 17, 2024

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,232 1,186 Updated Dec 20, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 475 72 Updated Oct 24, 2025

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 507 87 Updated Sep 8, 2024

CUDA Matrix Multiplication Optimization

Cuda 246 24 Updated Jul 19, 2024

An Easy-to-Understand TensorOp Matmul Tutorial

C++ 397 52 Updated Oct 10, 2025