
Organizations

@GATECH-EIC


[Experimental] Miles-diffusion is a post-training framework for large-scale diffusion model training and production workloads, forked from and co-evolving with miles.

Python 19 5 Updated Jun 17, 2026

Browse, search, analyze and respond to your AI chat history. Local and private by design.

Python 4 Updated Jun 17, 2026

Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.

Python 1,584 266 Updated Jun 18, 2026

Utility scripts for PyTorch (e.g., making Perfetto show some disappearing kernels, a memory profiler that understands more low-level allocations such as NCCL, ...)

Python 111 8 Updated Sep 11, 2025

FlashInfer: Kernel Library for LLM Serving

Python 5,820 1,061 Updated Jun 18, 2026

Debug print operator for cudagraph debugging

Cuda 15 2 Updated Aug 2, 2024

Periodically (e.g., every 1 ms or less) dump which thread is holding the GIL in Python

Rust 1 Updated Apr 28, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 7,391 1,053 Updated Jun 4, 2026

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

8,007 287 Updated May 15, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 5,608 859 Updated Jun 18, 2026

Distributed Compiler based on Triton for Parallel Systems

Python 1,462 151 Updated Apr 22, 2026

ByteCheckpoint: A Unified Checkpointing Library for LFMs

Python 283 19 Updated Feb 2, 2026

Perplexity GPU Kernels

C++ 585 95 Updated Nov 7, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 29,165 6,604 Updated Jun 18, 2026

Expert Parallelism Load Balancer

Python 1,388 203 Updated Mar 24, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 9,741 1,289 Updated Jun 15, 2026

[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Python 281 20 Updated Aug 31, 2024

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 1,025 62 Updated Mar 3, 2026

Development repository for the Triton language and compiler

MLIR 19,469 2,944 Updated Jun 18, 2026

A curated list for Efficient Large Language Models

Python 2,019 165 Updated Jun 17, 2025

Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2

Python 3,383 681 Updated Dec 16, 2025

Repository for benchmarking graph neural networks (JMLR 2023)

Jupyter Notebook 2,662 457 Updated Jun 22, 2023

Cuda 2 Updated Apr 21, 2022

Papers and their code for AI systems

366 23 Updated May 14, 2026

Benchmark datasets, data loaders, and evaluators for graph machine learning

Python 2,089 409 Updated May 6, 2025

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 14,567 1,006 Updated Jun 13, 2026

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 39,479 4,790 Updated May 1, 2026

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,365 591 Updated Oct 28, 2024

Cuda 228 75 Updated Mar 28, 2026

SparseTIR: Sparse Tensor Compiler for Deep Learning

Python 142 14 Updated Mar 31, 2023