xinhaoc

The open-source agent-serving project

Python 22 4 Updated Apr 11, 2026

PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.

Python 168 29 Updated Dec 24, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python 208 19 Updated Jul 18, 2025

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python 288 21 Updated Apr 7, 2026
Python 18 2 Updated Feb 10, 2026

My Python scripts to make high-quality figures for publications in top AI conferences and journals.

Python 763 60 Updated Apr 10, 2026

A curated list of projects related to the reMarkable tablet

7,336 250 Updated Mar 4, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ 925 76 Updated Apr 1, 2026

A lightweight design for computation-communication overlap.

Python 226 15 Updated Jan 20, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust 6,533 1,013 Updated Apr 11, 2026

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

253 13 Updated May 6, 2025

FlashInfer: Kernel Library for LLM Serving

Python 5,367 886 Updated Apr 11, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,557 1,004 Updated Apr 7, 2026

Makefile Tutorial

HTML 301 35 Updated Mar 4, 2024

GitHub mirror of the triton-lang/triton repo.

MLIR 155 42 Updated Apr 11, 2026

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 77 11 Updated Sep 15, 2025

Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower execution latency, and lower execution cost. Also has a simple …

Python 274 30 Updated May 16, 2025

Translation of C++ Core Guidelines [https://github.com/isocpp/CppCoreGuidelines] into Simplified Chinese.

2,551 349 Updated Apr 2, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda 2,186 193 Updated Apr 10, 2026

Make a personal website using Notion and GitHub Pages

Shell 142 66 Updated Oct 27, 2023

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,558 1,782 Updated Apr 9, 2026

An Attention Superoptimizer

C++ 22 Updated Jan 20, 2025

MLX: An array framework for Apple silicon

C++ 25,303 1,673 Updated Apr 11, 2026

Paper collection on retrieval-based (augmented) language models.

232 12 Updated May 24, 2024

Papers and their code for AI systems

357 23 Updated Feb 10, 2026

Universal cross-platform tokenizers binding to HF and sentencepiece

C++ 474 118 Updated Feb 20, 2026

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,872 250 Updated Apr 9, 2026