162 results for source starred repositories

Allow torch tensor memory to be released and resumed later

Python 163 26 Updated Nov 1, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,274 93 Updated Nov 7, 2025

The best workflows and configurations I've developed, having heavily used Claude Code since the day of its release. Workflows are based on applied learnings from our AI-native startup.

3,053 461 Updated Sep 14, 2025

Training library for Megatron-based models

Python 172 47 Updated Nov 7, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,395 244 Updated Nov 7, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,222 104 Updated Oct 17, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,871 301 Updated Nov 7, 2025

Pipeline Parallelism Emulation and Visualization

Python 70 5 Updated Jun 12, 2025

Analyze computation-communication overlap in V3/R1.

1,114 143 Updated Mar 21, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,697 976 Updated Nov 6, 2025

A library to analyze PyTorch traces.

Python 425 70 Updated Nov 6, 2025

PyTorch centric eager mode debugger

TypeScript 48 1 Updated Dec 16, 2024

A PyTorch native platform for training generative AI models

Python 4,657 596 Updated Nov 7, 2025

A PyTorch Toolbox for Grouped GEMM in MoE Model Training

6 1 Updated May 28, 2024

A Project dedicated to making GPU Partitioning on Windows easier!

PowerShell 5,175 521 Updated Oct 6, 2025

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,824 294 Updated Jan 16, 2024

chinalist for SwitchyOmega and SmartProxy

Python 146 13 Updated Oct 27, 2025

Python 155 49 Updated Feb 22, 2024

Scalable toolkit for efficient model alignment

Python 844 102 Updated Oct 6, 2025

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 109,829 11,432 Updated Nov 6, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,061 1,848 Updated Nov 7, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 2,890 540 Updated Nov 7, 2025

Microsoft Automatic Mixed Precision Library

Python 627 48 Updated Sep 29, 2024

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 461 54 Updated Apr 19, 2025