GitHub user xinhaoc
Sass · 12 stars · 4 forks · Updated Apr 24, 2026

The open-source agent-serving project

Python · 348 stars · 23 forks · Updated Apr 28, 2026

PTX ISA 9.1 documentation converted to searchable Markdown. Includes a Claude Code skill for CUDA development.

Python · 185 stars · 34 forks · Updated Dec 24, 2025

Unofficial description of the CUDA assembly (SASS) instruction sets.

Python · 211 stars · 19 forks · Updated Jul 18, 2025

A plug-and-play compiler that delivers free-lunch optimizations for both inference and training.

Python · 299 stars · 23 forks · Updated Apr 27, 2026

Python · 18 stars · 2 forks · Updated Feb 10, 2026

My Python scripts to make high-quality figures for publications in top AI conferences and journals.

Python · 887 stars · 69 forks · Updated Apr 19, 2026

A curated list of projects related to the reMarkable tablet

7,360 stars · 255 forks · Updated Apr 15, 2026

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

C++ · 949 stars · 77 forks · Updated Apr 1, 2026

A lightweight design for computation-communication overlap.

Python · 229 stars · 15 forks · Updated Jan 20, 2026

A Datacenter Scale Distributed Inference Serving Framework

Rust · 6,688 stars · 1,066 forks · Updated Apr 28, 2026

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

256 stars · 14 forks · Updated May 6, 2025

FlashInfer: Kernel Library for LLM Serving

Python · 5,523 stars · 937 forks · Updated Apr 28, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 12,604 stars · 1,016 forks · Updated Apr 27, 2026

Makefile tutorial

HTML · 302 stars · 35 forks · Updated Mar 4, 2024

GitHub mirror of the triton-lang/triton repo.

MLIR · 162 stars · 43 forks · Updated Apr 28, 2026

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ · 83 stars · 11 forks · Updated Sep 15, 2025

Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, and DSPy programs for better quality, lower execution latency, and lower execution cost. Also has a simple …

Python · 275 stars · 29 forks · Updated May 16, 2025

Translation of C++ Core Guidelines [https://github.com/isocpp/CppCoreGuidelines] into Simplified Chinese.

2,559 stars · 349 forks · Updated Apr 2, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

Cuda · 2,231 stars · 200 forks · Updated Apr 27, 2026

Make a personal website using Notion and GitHub Pages

Shell · 143 stars · 66 forks · Updated Oct 27, 2023

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,644 stars · 1,823 forks · Updated Apr 25, 2026

An Attention Superoptimizer

C++ · 22 stars · Updated Jan 20, 2025

MLX: An array framework for Apple silicon

C++ · 25,818 stars · 1,730 forks · Updated Apr 28, 2026

A paper collection on retrieval-based (augmented) language models.

232 stars · 12 forks · Updated May 24, 2024

Papers and their code for AI systems

359 stars · 23 forks · Updated Feb 10, 2026

Universal cross-platform tokenizers binding to HF and sentencepiece

C++ · 484 stars · 118 forks · Updated Feb 20, 2026