CuiBo SuperCB

🏠

Working from home

Learning Machine Learning System

48 followers · 176 following

rednote-hilab
Beijing

Lists (1)

MLsys

Starred repositories

145 stars written in C++

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,165 82 Updated Aug 28, 2025

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ 994 165 Updated Sep 19, 2024

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 960 85 Updated Nov 11, 2025

onnx / onnx-mlir

Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure

C++ 932 374 Updated Nov 5, 2025

youngyangyang04 / KVstorageBaseRaft-cpp

[代码随想录 Knowledge Planet] Project share: a Raft-based key-value storage database 🔥

C++ 921 193 Updated Nov 1, 2025

TartanLlama / optional

C++11/14/17 std::optional with functional-style extensions and reference support

C++ 912 73 Updated Jun 10, 2024

netcan / asyncio

asyncio is a c++20 library to write concurrent code using the async/await syntax.

C++ 902 90 Updated Feb 3, 2024

dmlc / dmlc-core

A common bricks library for building scalable and portable distributed machine learning.

C++ 893 527 Updated Nov 10, 2025

microsoft / Multiverso

Parameter server framework for distributed machine learning

C++ 777 223 Updated Jan 20, 2019

Tessil / hopscotch-map

C++ implementation of a fast hash map and hash set using hopscotch hashing

C++ 763 69 Updated Nov 2, 2025

MegEngine / InferLLM

a lightweight LLM model inference framework

C++ 740 94 Updated Apr 7, 2024

jiazhihao / TASO

The Tensor Algebra SuperOptimizer for Deep Learning

C++ 730 93 Updated Jan 26, 2023

renatoGarcia / icecream-cpp

🍦 Never use cout/printf to debug again

C++ 727 36 Updated Aug 7, 2025

hjk41 / Remmy

A simple but usable RPC framework

C++ 665 176 Updated Sep 12, 2020

ktprime / emhash

Fast and memory efficient c++ flat hash table/map/set

C++ 653 67 Updated Nov 8, 2025

LeechanX / Easy-Reactor

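Easy-Reactor is a high-performance C++ TCP server framework for Linux, based on the Reactor pattern. It supports single-threaded and multi-threaded Reactors, as well as UDP services.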

C++ 540 128 Updated Dec 27, 2023

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

C++ 482 37 Updated Nov 6, 2025

bytedance / ByteTransformer

optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052

C++ 478 37 Updated Mar 15, 2024

bytedance / effective_transformer

Running BERT without Padding

C++ 475 53 Updated Mar 18, 2022

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

C++ 469 49 Updated Apr 20, 2025