MNN is a blazing-fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 13,720 2,139 Updated Nov 20, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,920 918 Updated Dec 15, 2025

WasmEdge / WasmEdge

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices,…

C++ 10,283 924 Updated Dec 18, 2025

BYVoid / OpenCC

Conversion between Traditional and Simplified Chinese

C++ 9,378 1,033 Updated Nov 10, 2025

LostRuins / koboldcpp

Forked from ggml-org/llama.cpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

C++ 9,097 592 Updated Dec 18, 2025

async-profiler / async-profiler

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,731 949 Updated Dec 16, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,491 461 Updated Aug 2, 2025

google / gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,643 581 Updated Dec 18, 2025

leejet / stable-diffusion.cpp

Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

C++ 4,901 472 Updated Dec 17, 2025

infiniflow / infinity

The AI-native database built for LLM applications, providing incredibly fast hybrid search across dense vectors, sparse vectors, tensors (multi-vector), and full text.

C++ 4,270 405 Updated Dec 16, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,249 349 Updated Dec 18, 2025

ztxz16 / fastllm

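fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and hybrid-mode inference for MoE models, and any GPU with 10 GB or more of memory can run the full DeepSeek model. Deploying the original full-size, full-precision DeepSeek model on a dual-socket 9004/9005 server plus a single GPU reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps with multiple concurrent requests.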

C++ 4,114 415 Updated Dec 4, 2025

AnswerDotAI / gpu.cpp

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,924 190 Updated Oct 8, 2025

unum-cloud / USearch

Fast Open-Source Search & Clustering engine for Vectors & Arbitrary Objects, in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

C++ 3,477 249 Updated Nov 30, 2025

li-plus / chatglm.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

C++ 2,968 335 Updated Jul 31, 2024

MuShibo / Micro-Wheeled_leg-Robot

The world's smallest desktop two-wheeled-legged robot!

C++ 2,532 378 Updated Dec 12, 2024

windirstat / windirstat

WinDirStat is a disk usage statistics viewer and cleanup tool for Microsoft Windows

C++ 2,432 137 Updated Dec 17, 2025

oceanbase / seekdb

The AI-Native Search Database. Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.

C++ 1,816 143 Updated Dec 17, 2025

AlibabaResearch / AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,803 201 Updated Apr 9, 2025

ikawrakow / ik_llama.cpp

llama.cpp fork with additional SOTA quants and improved performance

C++ 1,391 166 Updated Dec 17, 2025

zhihu / ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 906 102 Updated Jul 10, 2025

microsoft / T-MAC

Low-bit LLM inference on CPU/NPU with lookup tables

C++ 902 74 Updated Jun 5, 2025

PABannier / bark.cpp

Suno AI's Bark model in C/C++ for fast text-to-speech generation

C++ 848 79 Updated Nov 16, 2024

leixy76

Lists (9)

agent

BI

fine-tuning

function

Inference

llama

orchestration

prompt

rag

Starred repositories

deep-research

dify