Starred repositories
An Open Source Machine Learning Framework for Everyone
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
A library for efficient similarity search and clustering of dense vectors.
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the "PaddlePaddle" core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with a Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, JavaScript and more
A small build system with a focus on speed
FlashMLA: Efficient Multi-head Latent Attention Kernels
Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning
Transformer related optimization, including BERT, GPT
MindSpore is a new open source deep learning training/inference framework that could be used for mobile, edge and cloud scenarios.
Optimized primitives for collective multi-GPU communication
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
A fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU
Collective communications library with various primitives for multi-machine training.
C++-based high-performance parallel environment execution engine (vectorized env) for general RL environments.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
PMLS-Caffe: Distributed Deep Learning Framework for Parallel ML System
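Several of the starred projects above (Faiss in particular) are built around nearest-neighbor search over dense vectors. As a point of reference for what those libraries accelerate, here is a minimal brute-force sketch of the same operation in plain Python; the function names and the toy data are illustrative, not part of any of the libraries listed.

```python
# Brute-force k-nearest-neighbor search over dense vectors: the baseline
# computation that libraries such as Faiss optimize with indexing, SIMD,
# and GPU kernels. Pure Python, no dependencies; names are illustrative.

def l2_sq(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_search(database, query, k=1):
    """Return the indices of the k database vectors closest to the query."""
    order = sorted(range(len(database)), key=lambda i: l2_sq(database[i], query))
    return order[:k]

db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
print(knn_search(db, [0.9, 1.1], k=2))  # indices of the two nearest vectors
```

This scans every vector per query (O(n·d) per lookup); specialized libraries trade exactness or memory for sub-linear search time on millions of vectors.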