27 stars written in C++

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

C++ 20,831 6,753 Updated Oct 25, 2023

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,051 1,843 Updated Nov 6, 2025

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,368 1,010 Updated Aug 20, 2025

A C++ High Performance Web Server

C++ 8,130 2,138 Updated Sep 27, 2023

Transformer related optimization, including BERT, GPT

C++ 6,343 920 Updated Mar 27, 2024

Header-only, dependency-free deep learning framework in C++14

C++ 5,995 1,398 Updated Apr 17, 2022

Optimized primitives for collective multi-GPU communication

C++ 4,211 1,063 Updated Nov 6, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,865 301 Updated Nov 6, 2025

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,597 241 Updated May 21, 2025

TinySTL is a subset of STL (omitting some containers and algorithms) and also a superset of STL (adding some other containers and algorithms)

C++ 2,487 655 Updated Oct 27, 2018

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

C++ 2,309 191 Updated Feb 7, 2024

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025

A primitive library for neural network

C++ 1,367 222 Updated Nov 24, 2024

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 995 165 Updated Sep 19, 2024

Notes about modern C++, C++11, C++14 and C++17, Boost Libraries, ABI, foreign function interface and reference cards.

C++ 750 146 Updated Feb 16, 2025

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it

C++ 638 135 Updated Nov 6, 2025

Some CMake templates (examples) for Qt, Boost, OpenCV, C++11, etc.

C++ 544 142 Updated Dec 7, 2023

Demonstration of various hardware effects on CUDA GPUs.

C++ 388 30 Updated Nov 22, 2023

Distributed LR and FM models on a parameter server, with FTRL and SGD optimization algorithms.

C++ 225 83 Updated Mar 14, 2018

[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration

C++ 200 34 Updated Apr 27, 2022

Google Colab Notebooks for Udacity CS344 - Intro to Parallel Programming

C++ 134 52 Updated Apr 14, 2021

LR and FM models solved with FTRL and SGD, parallelized on MPI

C++ 111 50 Updated Dec 3, 2017

Accelerating DNN Convolutional Layers with Micro-batches

C++ 63 9 Updated Apr 30, 2020

Canvas: End-to-End Kernel Architecture Search in Neural Networks

C++ 26 4 Updated Nov 18, 2024

A deep learning framework in C++

C++ 6 1 Updated Sep 17, 2020

A C++ reimplementation of a CTC decoder

C++ 5 Updated Mar 15, 2021