feifeibear
64 stars written in C++

An Open Source Machine Learning Framework for Everyone

C++ 194,352 75,247 Updated Mar 26, 2026

LLM inference in C/C++

C++ 99,460 15,847 Updated Mar 26, 2026

Caffe: a fast open framework for deep learning.

C++ 34,769 18,540 Updated Jul 31, 2024

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

C++ 20,817 6,719 Updated Oct 25, 2023

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

C++ 14,666 2,260 Updated Mar 26, 2026

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 12,537 1,006 Updated Feb 6, 2026

A distributed, fast open-source graph database featuring horizontal scalability and high availability

C++ 12,088 1,302 Updated Oct 22, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,779 1,018 Updated Mar 9, 2026

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,496 1,750 Updated Mar 24, 2026

High-speed Large Language Model Serving for Local Deployment

C++ 9,121 538 Updated Jan 24, 2026

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,759 603 Updated Mar 25, 2026

Header-only C++/Python library for fast approximate nearest neighbors

C++ 5,131 802 Updated Mar 25, 2026

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,982 630 Updated Mar 26, 2026

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation

C++ 4,806 549 Updated Oct 24, 2024

Fast inference engine for Transformer models

C++ 4,384 463 Updated Feb 4, 2026

Rime Input Method Engine, the core library

C++ 4,282 677 Updated Mar 9, 2026

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,302 333 Updated May 16, 2023

Tendis is a high-performance distributed storage system fully compatible with the Redis protocol.

C++ 3,140 340 Updated Mar 26, 2026

Go AI program which implements the AlphaGo Zero paper

C++ 2,925 573 Updated Mar 11, 2019

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

C++ 2,876 216 Updated Feb 10, 2026

Enabling PyTorch on XLA Devices (e.g. Google TPU)

C++ 2,762 568 Updated Dec 18, 2025
C++ 1,653 277 Updated Sep 11, 2018

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,548 730 Updated Mar 26, 2026

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

C++ 1,544 206 Updated Jul 18, 2025

Fast Neural Machine Translation in C++

C++ 1,436 247 Updated Aug 25, 2023

Async++ concurrency framework for C++11

C++ 1,410 205 Updated Oct 11, 2024

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,274 98 Updated Aug 28, 2025

common in-memory tensor structure

C++ 1,181 160 Updated Jan 26, 2026

HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models

C++ 1,054 204 Updated Mar 12, 2026

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.

C++ 1,004 166 Updated Sep 19, 2024