Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 94,793 25,824 Updated Nov 7, 2025

Ongoing research training transformer models at scale

Python 14,128 3,252 Updated Nov 7, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 2,891 540 Updated Nov 7, 2025

A PyTorch native platform for training generative AI models

Python 4,660 596 Updated Nov 7, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,065 1,848 Updated Nov 7, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 33,914 3,233 Updated Nov 7, 2025

Development repository for the Triton language and compiler

MLIR 17,501 2,364 Updated Nov 7, 2025

Training library for Megatron-based models

Python 172 48 Updated Nov 7, 2025

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 109,852 11,438 Updated Nov 7, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,057 3,180 Updated Nov 7, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 31,515 6,481 Updated Nov 7, 2025

Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍

Shell 23,741 3,516 Updated Nov 7, 2025

Making large AI models cheaper, faster and more accessible

Python 41,225 4,540 Updated Nov 7, 2025

Parallel computing with task scheduling

Python 13,579 1,818 Updated Nov 7, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,872 301 Updated Nov 7, 2025

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

Python 1,274 93 Updated Nov 7, 2025

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python 8,841 1,500 Updated Nov 7, 2025

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Python 12,797 3,693 Updated Nov 7, 2025

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it

C++ 638 134 Updated Nov 7, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,632 4,613 Updated Nov 7, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,397 244 Updated Nov 7, 2025

A validation and profiling tool for AI infrastructure

Python 347 76 Updated Nov 6, 2025

A library to analyze PyTorch traces.

Python 425 70 Updated Nov 6, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,213 1,063 Updated Nov 6, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,697 976 Updated Nov 6, 2025

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…

Python 35,702 5,060 Updated Nov 6, 2025

Fast and memory-efficient exact attention

Python 20,394 2,121 Updated Nov 5, 2025

Reference implementations of MLPerf® training benchmarks

Python 1,723 584 Updated Nov 5, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025