torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.

C++ 181 36 Updated Aug 29, 2025

xiaoyang-sde / co-uring-http

High performance HTTP server built on C++20 coroutines and io_uring

C++ 172 5 Updated Aug 7, 2024

hunterzju / llvm-tutorial

llvm-tutorial documentation, translation, and code repository

C++ 168 26 Updated Oct 9, 2023

archibate / mallocvis

allocation visualization in svg graph

C++ 159 23 Updated Jul 7, 2025

layerism / brpc_faiss_server

Vector Search Engine based on BRPC + FAISS

C++ 150 52 Updated Oct 21, 2019

TrivialCompiler / TrivialCompiler

A toy compiler written in C++17 that translates SysY (a C-like toy language) into ARM-v7a assembly.

C++ 145 14 Updated Aug 14, 2021

reed-lau / cute-gemm

C++ 140 41 Updated Nov 9, 2025

june505 / SearchEngine

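A mini search engine project covering word segmentation, inverted index construction, web page deduplication, similarity computation, text clustering, multi-process programming, network programming, daemon writing, makefile writing, project organization, and more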

C++ 139 53 Updated Oct 16, 2015

faaxm / exmpl-cmake-grpc

Example cmake project for grpc / protobuf

C++ 130 33 Updated Mar 24, 2025

CNevd / Difacto_DMLC

Distributed FM and LR based on Parameter Server with Ftrl

C++ 128 76 Updated Sep 19, 2017

dlsys-course / tinyflow

Forked from tqchen/tinyflow

Tutorial code on how to build your own Deep Learning System in 2k Lines

C++ 124 25 Updated Apr 11, 2017

tlc-pack / libflash_attn

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 112 15 Updated Sep 10, 2024

8sileus / zedio

A runtime for writing asynchronous applications with Modern C++, based on C++20 coroutine and liburing (io-uring)

C++ 108 14 Updated Mar 29, 2025

rapidsai / gputreeshap

C++ 103 30 Updated Sep 23, 2025

pwlnk / cuda-neural-network

Simple neural network implementation using CUDA technology. It is an educational implementation.

C++ 97 28 Updated Apr 12, 2018

tlc-pack / cutlass_fpA_intB_gemm

A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer

C++ 96 24 Updated Sep 13, 2025

ParCIS / Magicube

Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.

C++ 89 16 Updated Nov 23, 2022

Gan-Tu / cppGFS2.0

A distributed Google File System (GFS), partially implemented in C++. (http://bit.ly/gfs-impl)

C++ 83 20 Updated Jun 24, 2025

InfiniTensor / RefactorGraph

A layered, decoupled deep learning inference engine

C++ 76 15 Updated Feb 17, 2025

sunbelbd / song

SONG: Approximate Nearest Neighbor Search on GPU. SONG is a graph-based approximate nearest neighbor search toolbox.

C++ 72 20 Updated Apr 29, 2025


CuiBo SuperCB
