feifeibear

Jiarui Fang（方佳瑞） feifeibear

Democratizing AGI

1.8k followers · 102 following

Achievements

x3 x4 x3

Lists (5)

Stars

64 stars written in C++

tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone

C++ · 194,352 stars · 75,247 forks · Updated Mar 26, 2026

ggml-org / llama.cpp

LLM inference in C/C++

C++ · 99,460 stars · 15,847 forks · Updated Mar 26, 2026

BVLC / caffe

Caffe: a fast open framework for deep learning.

C++ · 34,769 stars · 18,540 forks · Updated Jul 31, 2024

apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

C++ · 20,817 stars · 6,719 forks · Updated Oct 25, 2023

alibaba / MNN

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

C++ · 14,666 stars · 2,260 forks · Updated Mar 26, 2026

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 12,537 stars · 1,006 forks · Updated Feb 6, 2026

vesoft-inc / nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

C++ · 12,088 stars · 1,302 forks · Updated Oct 22, 2025

deepseek-ai / 3FS

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 9,779 stars · 1,018 forks · Updated Mar 9, 2026

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 9,496 stars · 1,750 forks · Updated Mar 24, 2026

Tiiny-AI / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ · 9,121 stars · 538 forks · Updated Jan 24, 2026

google / gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.

C++ · 6,759 stars · 603 forks · Updated Mar 25, 2026

nmslib / hnswlib

Header-only C++/python library for fast approximate nearest neighbors

C++ · 5,131 stars · 802 forks · Updated Mar 25, 2026

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 4,982 stars · 630 forks · Updated Mar 26, 2026

MegEngine / MegEngine

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation.

C++ · 4,806 stars · 549 forks · Updated Oct 24, 2024

OpenNMT / CTranslate2

Fast inference engine for Transformer models

C++ · 4,384 stars · 463 forks · Updated Feb 4, 2026

rime / librime

Rime Input Method Engine, the core library

C++ · 4,282 stars · 677 forks · Updated Mar 9, 2026

bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ · 3,302 stars · 333 forks · Updated May 16, 2023

Tencent / Tendis

Tendis is a high-performance distributed storage system fully compatible with the Redis protocol.

C++ · 3,140 stars · 340 forks · Updated Mar 26, 2026

Tencent / PhoenixGo

Go AI program which implements the AlphaGo Zero paper

C++ · 2,925 stars · 573 forks · Updated Mar 11, 2019

b4rtaz / distributed-llama

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

C++ · 2,876 stars · 216 forks · Updated Feb 10, 2026

pytorch / xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)

C++ · 2,762 stars · 568 forks · Updated Dec 18, 2025

dmlc / nnvm

C++ · 1,653 stars · 277 forks · Updated Sep 11, 2018

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ · 1,548 stars · 730 forks · Updated Mar 26, 2026

Tencent / TurboTransformers

a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU.

C++ · 1,544 stars · 206 forks · Updated Jul 18, 2025

marian-nmt / marian

Fast Neural Machine Translation in C++

C++ · 1,436 stars · 247 forks · Updated Aug 25, 2023

Amanieu / asyncplusplus

Async++ concurrency framework for C++11

C++ · 1,410 stars · 205 forks · Updated Oct 11, 2024

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 1,274 stars · 98 forks · Updated Aug 28, 2025

dmlc / dlpack

common in-memory tensor structure

C++ · 1,181 stars · 160 forks · Updated Jan 26, 2026

NVIDIA-Merlin / HugeCTR

HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training

C++ · 1,054 stars · 204 forks · Updated Mar 12, 2026

microsoft / nnfusion

A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.

C++ · 1,004 stars · 166 forks · Updated Sep 19, 2024

Lists (5)

Diffusion Models

Diffusion Models Inference

GPU Acceleration

LLM Inference

LLM Models
