Cheng Wan ch-wan

110 followers · 62 following

Achievements

x4 x3 x2

Organizations

Lists (1)

🚀 My stack

1 repository

Stars

radixark / miles_diffusion

[Experimental] Miles-diffusion is a post-training framework for large-scale diffusion model training and production workloads, forked from and co-evolving with miles.

Python 19 5 Updated Jun 17, 2026

Comradery64 / Clau-Decode

Browse, search, analyze and respond to your AI chat history. Local and private by design.

Python 4 Updated Jun 17, 2026

radixark / miles

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,584 266 Updated Jun 18, 2026

fzyzcjy / torch_utils

Utility scripts for PyTorch (e.g., making Perfetto show some disappearing kernels, a memory profiler that understands more low-level allocations such as NCCL, ...)

Python 111 8 Updated Sep 11, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Python 5,820 1,061 Updated Jun 18, 2026

flashinfer-ai / debug-print

Debug print operator for cudagraph debugging

Cuda 15 2 Updated Aug 2, 2024

fzyzcjy / py_gil_spy

Periodically (e.g., every 1 ms or more often) dump which thread is holding the GIL in Python

Rust 1 Updated Apr 28, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,391 1,053 Updated Jun 4, 2026

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,007 287 Updated May 15, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,608 859 Updated Jun 18, 2026

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,462 151 Updated Apr 22, 2026

ByteDance-Seed / ByteCheckpoint

ByteCheckpoint: A Unified Checkpointing Library for LFMs

Python 283 19 Updated Feb 2, 2026

perplexityai / pplx-kernels

Perplexity GPU Kernels

C++ 585 95 Updated Nov 7, 2025

sgl-project / sglang

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,165 6,604 Updated Jun 18, 2026

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python 1,388 203 Updated Mar 24, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 9,741 1,289 Updated Jun 15, 2026

Infini-AI-Lab / TriForce

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 281 20 Updated Aug 31, 2024

volcengine / veScale

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 1,025 62 Updated Mar 3, 2026

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 19,469 2,944 Updated Jun 18, 2026

horseee / Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Python 2,019 165 Updated Jun 17, 2025

aqlaboratory / openfold

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2

Python 3,383 681 Updated Dec 16, 2025

graphdeeplearning / benchmarking-gnns

Repository for benchmarking graph neural networks (JMLR 2023)

Jupyter Notebook 2,662 457 Updated Jun 22, 2023

hsung2 / Bit-GraphBLAS

Cuda 2 Updated Apr 21, 2022

lambda7xx / awesome-AI-system

Papers and their code for AI systems

366 23 Updated May 14, 2026

snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning

Python 2,089 409 Updated May 6, 2025

BlinkDL / RWKV-LM

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 14,567 1,006 Updated Jun 13, 2026

lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,479 4,790 Updated May 1, 2026

FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,365 591 Updated Oct 28, 2024

ZaidQureshi / bam

Cuda 228 75 Updated Mar 28, 2026

uwsampl / SparseTIR

SparseTIR: Sparse Tensor Compiler for Deep Learning

Python 142 14 Updated Mar 31, 2023