A lightweight design for computation-communication overlap.

Cuda 199 9 Updated Oct 10, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,666 750 Updated Dec 21, 2025

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

244 12 Updated May 6, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,317 610 Updated Dec 22, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,928 919 Updated Dec 15, 2025

Makefile tutorial

HTML 298 34 Updated Mar 4, 2024

GitHub mirror of the triton-lang/triton repo.

MLIR 109 31 Updated Dec 21, 2025

FlexFlow Serve: Low-Latency, High-Performance LLM Serving

C++ 65 6 Updated Sep 15, 2025

Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower execution latency, and lower execution cost. Also has a simple …

Python 266 31 Updated May 16, 2025

Translation of C++ Core Guidelines [https://github.com/isocpp/CppCoreGuidelines] into Simplified Chinese.

2,485 345 Updated Sep 22, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,002 161 Updated Dec 20, 2025

Make a personal website using Notion and GitHub Pages

Shell 143 66 Updated Oct 27, 2023

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,995 1,588 Updated Dec 21, 2025

An Attention Superoptimizer

C++ 22 Updated Jan 20, 2025

MLX: An array framework for Apple silicon

C++ 23,185 1,426 Updated Dec 21, 2025

Paper collection on retrieval-based (augmented) language models.

232 12 Updated May 24, 2024

Papers and their code for AI systems

341 23 Updated Dec 13, 2025

Universal cross-platform tokenizers binding to HF and sentencepiece

C++ 435 104 Updated Aug 8, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,850 246 Updated Dec 21, 2025

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Jupyter Notebook 52,969 6,181 Updated Sep 18, 2024

📝 My blog / notes

247 34 Updated Sep 22, 2022

Quick, visual, principled introduction to PyTorch code through five Colab notebooks.

Jupyter Notebook 452 70 Updated Jan 13, 2025

We want to create a repo to illustrate the usage of transformers in Chinese

Shell 3,041 491 Updated Aug 18, 2024

Third party libraries for FlexFlow

CMake 1 Updated Oct 19, 2024

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

C++ 1,218 120 Updated Aug 12, 2024

A curated list of awesome READMEs

20,167 3,922 Updated Nov 28, 2025