SuperCB
🏠 Working from home
  • rednote-hilab
  • Beijing

Starred repositories

145 stars written in C++

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,165 82 Updated Aug 28, 2025

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 994 165 Updated Sep 19, 2024

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 960 85 Updated Nov 11, 2025

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 932 374 Updated Nov 5, 2025

[代码随想录 Knowledge Planet] Project showcase: a Raft-based key-value storage database 🔥

C++ 921 193 Updated Nov 1, 2025

C++11/14/17 std::optional with functional-style extensions and reference support

C++ 912 73 Updated Jun 10, 2024

asyncio is a C++20 library for writing concurrent code using the async/await syntax.

C++ 902 90 Updated Feb 3, 2024

A common bricks library for building scalable and portable distributed machine learning.

C++ 893 527 Updated Nov 10, 2025

Parameter server framework for distributed machine learning

C++ 777 223 Updated Jan 20, 2019

C++ implementation of a fast hash map and hash set using hopscotch hashing

C++ 763 69 Updated Nov 2, 2025

a lightweight LLM model inference framework

C++ 740 94 Updated Apr 7, 2024

The Tensor Algebra SuperOptimizer for Deep Learning

C++ 730 93 Updated Jan 26, 2023

🍦 Never use cout/printf to debug again

C++ 727 36 Updated Aug 7, 2025

A simple but usable RPC framework

C++ 665 176 Updated Sep 12, 2020

Fast and memory efficient c++ flat hash table/map/set

C++ 653 67 Updated Nov 8, 2025

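Easy-Reactor is a high-performance TCP server framework for Linux, written in C++ and based on the Reactor pattern. It supports single-threaded and multi-threaded Reactors, and also supports UDP services.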

C++ 540 128 Updated Dec 27, 2023
C++ 512 41 Updated Sep 12, 2025

A high-performance inference system for large language models, designed for production environments.

C++ 482 37 Updated Nov 6, 2025

optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052

C++ 478 37 Updated Mar 15, 2024

Running BERT without Padding

C++ 475 53 Updated Mar 18, 2022

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

C++ 469 49 Updated Apr 20, 2025

collection of benchmarks to measure basic GPU capabilities

C++ 451 68 Updated Oct 24, 2025

A high-performance, extensible Python AOT compiler.

C++ 444 44 Updated Sep 26, 2023

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 433 72 Updated Nov 11, 2025

Demonstration of various hardware effects on CUDA GPUs.

C++ 389 30 Updated Nov 22, 2023

Code Examples from "C++ Software Design: Design Principles and Patterns for High-Quality Software" (ISBN: 1098113160)

C++ 387 94 Updated Apr 20, 2023

Conversion to/from half-precision floating point formats

C++ 373 97 Updated Aug 16, 2025

Data Processing benchmark featuring Rust, Go, Swift, Zig, Julia etc.

C++ 373 103 Updated Jul 21, 2025

KV cache store for distributed LLM inference

C++ 360 30 Updated Sep 10, 2025

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

C++ 331 29 Updated Dec 28, 2024