Stars
Tensors and dynamic neural networks in Python with strong GPU acceleration
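This description matches PyTorch; assuming so, a minimal sketch of the two ideas it names — GPU-backed tensors and define-by-run autograd:

```python
import torch

# Tensors with GPU acceleration (falls back to CPU if no GPU is present)
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 3, device=device, requires_grad=True)

# Dynamic (define-by-run) autograd: the graph is built as operations execute
y = (x * x).sum()
y.backward()
print(x.grad)  # dy/dx = 2x
```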
Ongoing research on training transformer models at scale
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance…
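This description matches NVIDIA's Transformer Engine; assuming so, a minimal FP8 sketch following its documented PyTorch quickstart (the dimensions and recipe arguments are illustrative, and an FP8-capable GPU such as Hopper is required):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Drop-in replacement for torch.nn.Linear with FP8 support
layer = te.Linear(768, 768, bias=True)
inp = torch.randn(16, 768, device="cuda")

# Delayed-scaling FP8 recipe; hyperparameters here are illustrative
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

# Run the forward pass under FP8 autocasting
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(inp)
out.sum().backward()
```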
A PyTorch native platform for training generative AI models
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
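A sketch of that high-level Python API, following the project's published LLM-API quickstart; the model name is just an example, and the engine is built or loaded behind the scenes:

```python
from tensorrt_llm import LLM, SamplingParams

# High-level entry point; the checkpoint name is illustrative
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(temperature=0.8, top_p=0.95)
for output in llm.generate(["Hello, my name is"], params):
    print(output.outputs[0].text)
```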
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
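This description matches JAX; assuming so, the three composable transformations it names in one small, self-contained example:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)                        # differentiate
batched = jax.vmap(grad_loss, in_axes=(None, 0))  # vectorize over a batch of x
fast = jax.jit(batched)                           # JIT-compile to CPU/GPU/TPU

w = jnp.ones((4, 2))
xs = jnp.ones((8, 4))          # batch of 8 inputs
print(fast(w, xs).shape)       # (8, 4, 2): one gradient per batch element
```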
Development repository for the Triton language and compiler
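A minimal Triton kernel in the style of the project's first tutorial (an elementwise vector add); this sketch assumes a CUDA GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements          # guard the tail of the array
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(98432, device="cuda")
y = torch.rand(98432, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)   # one program instance per block
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```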
Training library for Megatron-based models
Virtual whiteboard for sketching hand-drawn-like diagrams
A scalable generative AI framework built for researchers and developers working on Large Language Models, multimodal models, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
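A minimal text-to-image sketch with the 🤗 Diffusers pipeline API; the checkpoint name is illustrative and the weights download on first use:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained pipeline (checkpoint name is just an example)
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```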
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
Making large AI models cheaper, faster, and more accessible
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
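A minimal training-loop sketch around `deepspeed.initialize`, DeepSpeed's documented entry point; the config values (ZeRO stage 2, fp16) and the model are illustrative, and the script is meant to be launched with the `deepspeed` launcher:

```python
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)
ds_config = {  # illustrative config: fp16 with ZeRO stage 2
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 1024, device=engine.device, dtype=torch.float16)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)  # handles loss scaling and gradient partitioning
engine.step()
```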
slime is an LLM post-training framework for RL scaling.
A validation and profiling tool for AI infrastructure
A library to analyze PyTorch traces.
Optimized primitives for collective multi-GPU communication
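This description matches NCCL, which itself exposes a C API (`ncclAllReduce` and friends); from Python it is usually reached through a framework. A sketch using PyTorch's NCCL backend, to be launched with `torchrun --nproc_per_node=<gpus>`:

```python
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # one process per GPU
rank = dist.get_rank()
torch.cuda.set_device(rank)

t = torch.full((4,), float(rank), device="cuda")
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # ncclAllReduce under the hood
print(f"rank {rank}: {t.tolist()}")

dist.destroy_process_group()
```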
DeepEP: an efficient expert-parallel communication library
The largest collection of PyTorch image encoders / backbones, including train, eval, inference, and export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), …
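This description matches timm; assuming so, creating one of those pretrained backbones via `timm.create_model` (the architecture name is an example):

```python
import timm
import torch

model = timm.create_model("resnet50", pretrained=True).eval()

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # (1, 1000): ImageNet class logits
```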
Fast and memory-efficient exact attention
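This description matches the flash-attn package; a sketch of its `flash_attn_func` entry point, which expects half-precision tensors on a CUDA device in (batch, seqlen, nheads, headdim) layout:

```python
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim); fp16/bf16 on GPU is required
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # exact attention, memory-efficient
print(out.shape)  # (2, 1024, 8, 64)
```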
Reference implementations of MLPerf® training benchmarks
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training