leixy76
Starred repositories

89 stars written in C++

LLM inference in C/C++

C++ 91,514 14,142 Updated Dec 18, 2025

Port of OpenAI's Whisper model in C/C++

C++ 45,156 5,023 Updated Dec 18, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 38,463 4,153 Updated Dec 18, 2025

DuckDB is an analytical in-process SQL database management system

C++ 34,831 2,792 Updated Dec 18, 2025

Windows Subsystem for Linux

C++ 30,618 1,559 Updated Dec 18, 2025

Android real-time display control software

C++ 27,310 3,352 Updated Nov 26, 2025

An MCP-based chatbot

C++ 22,458 4,680 Updated Dec 17, 2025

MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…

C++ 13,720 2,139 Updated Nov 20, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,920 918 Updated Dec 15, 2025

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices,…

C++ 10,283 924 Updated Dec 18, 2025

Conversion between Traditional and Simplified Chinese

C++ 9,378 1,033 Updated Nov 10, 2025

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

C++ 9,097 592 Updated Dec 18, 2025

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,731 949 Updated Dec 16, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,491 461 Updated Aug 2, 2025

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,643 581 Updated Dec 18, 2025

Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++

C++ 4,901 472 Updated Dec 17, 2025

The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text.

C++ 4,270 405 Updated Dec 16, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 4,249 349 Updated Dec 18, 2025

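fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can deploy the original full-size, full-precision DeepSeek model at 20 tps for a single request; the INT4-quantized model reaches 30 tps for a single request and 60+ tps with multiple concurrent requests.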

C++ 4,114 415 Updated Dec 4, 2025

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,924 190 Updated Oct 8, 2025

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

C++ 3,477 249 Updated Nov 30, 2025

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

C++ 2,968 335 Updated Jul 31, 2024

The world's smallest desktop-class two-wheeled legged robot!

C++ 2,532 378 Updated Dec 12, 2024

WinDirStat is a disk usage statistics viewer and cleanup tool for Microsoft Windows

C++ 2,432 137 Updated Dec 17, 2025

The AI-Native Search Database. Unifies vector, text, structured and semi-structured data in a single engine, enabling hybrid search and in-database AI workflows.

C++ 1,816 143 Updated Dec 17, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,803 201 Updated Apr 9, 2025

llama.cpp fork with additional SOTA quants and improved performance

C++ 1,391 166 Updated Dec 17, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 906 102 Updated Jul 10, 2025

Low-bit LLM inference on CPU/NPU with lookup table

C++ 902 74 Updated Jun 5, 2025

Suno AI's Bark model in C/C++ for fast text-to-speech generation

C++ 848 79 Updated Nov 16, 2024