wx-csy

👋

bonjour

Shaoyuan CHEN wx-csy

MadSys@Tsinghua

162 followers · 61 following

Tsinghua University
Beijing, China

Achievements

x3 x2

Organizations

Starred repositories

bertdobbelaere / SorterHunter

An evolutionary approach to find small and low latency sorting networks

HTML 82 10 Updated Feb 22, 2026

deepseek-ai / DeepSeek-V3.2-Exp

Python 1,603 178 Updated Nov 18, 2025

andreas-abel / nanoBench

A tool for running small microbenchmarks on recent Intel and AMD x86 CPUs.

Python 523 69 Updated Mar 29, 2026

NVIDIA / cuda-python

CUDA Python: Performance meets Productivity

Cython 3,294 301 Updated Jun 22, 2026

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,981 1,055 Updated May 7, 2026

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python 1,390 204 Updated Mar 24, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,168 148 Updated Mar 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,966 327 Updated Jan 14, 2026

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,398 1,058 Updated Jun 4, 2026

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 9,751 1,293 Updated Jun 15, 2026

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,709 1,063 Updated Apr 30, 2026

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,007 288 Updated May 15, 2025

MoonshotAI / MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 2,128 151 Updated Apr 3, 2025

tinygrad / open-gpu-kernel-modules

Forked from NVIDIA/open-gpu-kernel-modules

NVIDIA Linux open GPU with P2P support

C 1,382 142 Updated Jun 6, 2025

gkamradt / needle-in-a-haystack

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 2,320 247 Updated Jun 8, 2026

aliireza / ddio-bench

Reexamining Direct Cache Access to Optimize I/O Intensive Applications for Multi-hundred-gigabit Networks

Makefile 103 22 Updated Sep 2, 2021

NVIDIA / gdrcopy

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C 1,391 189 Updated Jun 15, 2026

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 17,317 1,321 Updated Jun 21, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,626 869 Updated Jun 22, 2026

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda 3,093 279 Updated Jun 21, 2026

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,936 1,918 Updated Jun 21, 2026

huggingface / text-generation-inference

Large Language Model Text Generation Inference

Python 10,863 1,270 Updated Mar 21, 2026

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 83,535 18,309 Updated Jun 22, 2026

NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

C++ 6,427 935 Updated Mar 27, 2024

google / rust-crate-audits

268 12 Updated Mar 29, 2026

meta-llama / llama

Inference code for Llama models

Python 59,461 9,790 Updated Jan 26, 2025

intel / pcm

Intel® Performance Counter Monitor (Intel® PCM)

C++ 3,289 524 Updated Jun 19, 2026

yaobaiwei / Grasper

Grasper: A High Performance Distributed System for OLAP on Property Graphs.

C++ 30 9 Updated Apr 3, 2021

ciaranm / glasgow-subgraph-solver

A solver for subgraph isomorphism problems, based upon a series of papers by subsets of McCreesh, Prosser, and Trimble.

C++ 107 30 Updated Jun 21, 2026

ciaranm / cp2015-subgraph-isomorphism

CP 2015 subgraph isomorphism experiments, data and paper

C++ 13 5 Updated Sep 5, 2015

program-synthesis