Stars
Tensors and Dynamic neural networks in Python with strong GPU acceleration
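The "dynamic" part means the autograd graph is built as operations execute. A minimal sketch (the device fallback is illustrative):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(3, 3, device=device, requires_grad=True)
y = (x ** 2).sum()   # the graph is built on the fly as ops run
y.backward()         # reverse-mode autodiff
print(x.grad)        # dy/dx_i = 2*x_i
```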
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Making large AI models cheaper, faster and more accessible
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
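A minimal sketch of wrapping a model in the DeepSpeed engine; the config values (batch size, ZeRO stage, learning rate) are illustrative assumptions, not recommendations:

```python
import torch
import deepspeed

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},   # partition optimizer state + gradients
}

model = torch.nn.Linear(10, 10)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Training steps then go through the engine:
# loss = engine(inputs).sum(); engine.backward(loss); engine.step()
```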
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (V…
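A minimal sketch of loading a pretrained backbone ("resnet50" is one of many available model names):

```python
import timm
import torch

model = timm.create_model("resnet50", pretrained=True).eval()

x = torch.randn(1, 3, 224, 224)   # dummy ImageNet-sized input
with torch.no_grad():
    logits = model(x)             # (1, 1000) class logits
```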
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
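A minimal sketch of the three transformations named here (differentiate, JIT, vectorize):

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    return jnp.sum((x @ w) ** 2)

grad_loss = jax.grad(loss)                    # differentiate w.r.t. w
fast_grad = jax.jit(grad_loss)                # JIT-compile for CPU/GPU/TPU
batched = jax.vmap(loss, in_axes=(None, 0))   # vectorize over a batch of x

w = jnp.ones((4, 2))
x = jnp.ones((8, 4))                          # batch of 8 inputs
print(fast_grad(w, x[0]))
print(batched(w, x))
```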
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
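A minimal text-to-image sketch; the checkpoint id is an assumption, and any diffusers-compatible text-to-image model works in its place:

```python
import torch
from diffusers import DiffusionPipeline

# Assumed checkpoint id; substitute any text-to-image model on the Hub.
pipe = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")                      # assumes a CUDA GPU

image = pipe("an astronaut riding a horse").images[0]
image.save("astronaut.png")
```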
Fast and memory-efficient exact attention
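A minimal sketch of the core kernel (assumes an Ampere-or-newer CUDA GPU and half-precision inputs):

```python
import torch
from flash_attn import flash_attn_func

# q/k/v are (batch, seqlen, nheads, headdim) fp16/bf16 CUDA tensors.
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)   # causal mask for autoregressive models
```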
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
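A minimal encode/decode round trip:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")    # the GPT-4 / GPT-3.5 encoding
tokens = enc.encode("tiktoken is fast")
print(tokens)                                 # list of token ids
assert enc.decode(tokens) == "tiktoken is fast"
```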
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
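A minimal sketch of the PyTorch integration (one process per GPU, launched with e.g. `horovodrun -np 4 python train.py`):

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Start all workers from identical weights, then allreduce gradients on step().
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())
```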
Ongoing research training transformer models at scale
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
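A minimal sketch of adapting one layer with loralib (the 768-dim sizes and rank are illustrative):

```python
import torch.nn as nn
import loralib as lora

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = lora.Linear(768, 768, r=8)   # rank-8 low-rank update

model = Model()
lora.mark_only_lora_as_trainable(model)          # freeze everything but A/B

# Checkpoints only need the (tiny) LoRA weights:
state = lora.lora_state_dict(model)
```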
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
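A minimal sketch of Apex's amp API (now largely superseded by torch.cuda.amp, but still illustrative):

```python
import torch
from apex import amp

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# "O1" = patch ops to run in fp16 where safe, keep fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(4, 10, device="cuda")).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()        # loss scaling avoids fp16 gradient underflow
optimizer.step()
```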
Home of StarCoder: fine-tuning & inference!
A PyTorch native platform for training generative AI models
Data Structure and Algorithm notes, with solutions to LeetCode and LintCode problems
PyTorch extensions for high performance and large scale training.
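A minimal FSDP sketch with FairScale (assumes `torch.distributed` is already initialized, e.g. via `torchrun`):

```python
import torch
from fairscale.nn import FullyShardedDataParallel as FSDP

# Shards parameters, gradients, and optimizer state across workers.
model = FSDP(torch.nn.Linear(1024, 1024).cuda())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```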
An Automatic Model Compression (AutoMC) framework for developing smaller and faster AI applications.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
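A minimal FP8 sketch (assumes a Hopper-or-newer GPU; layer sizes are illustrative):

```python
import torch
import transformer_engine.pytorch as te

layer = te.Linear(1024, 1024, bias=True).cuda()
x = torch.randn(8, 1024, device="cuda")

# Run the forward pass with FP8 compute where supported.
with te.fp8_autocast(enabled=True):
    y = layer(x)
```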
slime is an LLM post-training framework for RL scaling.
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models