DachengLi1

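34 starred repositories written in C++ (each entry: description, then language, star count, fork count, and last update date)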

An Open Source Machine Learning Framework for Everyone

C++ 192,326 74,960 Updated Nov 6, 2025

LLM inference in C/C++

C++ 89,243 13,583 Updated Nov 6, 2025

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 76,869 8,296 Updated May 27, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 37,814 4,098 Updated Nov 5, 2025

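PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle 『飞桨』 core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)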

C++ 23,374 5,868 Updated Nov 6, 2025

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

C++ 20,831 6,753 Updated Oct 25, 2023

Tensor library for machine learning

C++ 13,421 1,380 Updated Nov 4, 2025

a small build system with a focus on speed

C++ 12,405 1,742 Updated Oct 27, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,848 896 Updated Sep 30, 2025

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

C++ 9,992 4,745 Updated May 15, 2024

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 9,369 1,010 Updated Aug 20, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,735 1,519 Updated Nov 6, 2025

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive lea…

C++ 8,613 1,932 Updated Oct 17, 2024

Transformer related optimization, including BERT, GPT

C++ 6,343 920 Updated Mar 27, 2024

MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.

C++ 4,630 747 Updated Jul 29, 2024

Optimized primitives for collective multi-GPU communication

C++ 4,211 1,063 Updated Nov 6, 2025

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,844 245 Updated Nov 4, 2025

a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

C++ 1,536 206 Updated Jul 18, 2025

A primitive library for neural network

C++ 1,367 222 Updated Nov 24, 2024

Collective communications library with various primitives for multi-machine training.

C++ 1,364 337 Updated Oct 21, 2025

C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.

C++ 1,203 117 Updated Aug 12, 2024

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,162 82 Updated Aug 28, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 854 75 Updated Nov 6, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 775 53 Updated Mar 6, 2025

The Legion Parallel Programming System

C++ 746 152 Updated Oct 1, 2025

Fast Multi-dimensional Sparse Attention

C++ 654 51 Updated Oct 26, 2025

C++ 437 133 Updated Jun 19, 2024

Fast Matrix Multiplications for Lookup Table-Quantized LLMs

C++ 376 18 Updated Apr 13, 2025

PMLS-Caffe: Distributed Deep Learning Framework for Parallel ML System

C++ 193 63 Updated May 10, 2018