Stars
Allow torch tensor memory to be released and resumed later
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
The best workflows and configurations I've developed from heavy use of Claude Code since the day of its release. Workflows are based on applied learnings from our AI-native startup.
Training library for Megatron-based models
slime is an LLM post-training framework for RL Scaling.
Distributed Compiler based on Triton for Parallel Systems
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Pipeline Parallelism Emulation and Visualization
Analyze computation-communication overlap in V3/R1.
DeepEP: an efficient expert-parallel communication library
A library to analyze PyTorch traces.
A PyTorch native platform for training generative AI models
A PyTorch Toolbox for Grouped GEMM in MoE Model Training
A project dedicated to making GPU partitioning on Windows easier!
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
chinalist for SwitchyOmega and SmartProxy
Scalable toolkit for efficient model alignment
Virtual whiteboard for sketching hand-drawn like diagrams
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
Latency and Memory Analysis of Transformer Models for Training and Inference