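seshurajup: 45 starred repositories written in C++

Each entry gives the repository description, followed by its primary language, star count, fork count, and last-update date.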

LLM inference in C/C++

C++ 89,252 13,585 Updated Nov 6, 2025

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 76,868 8,296 Updated May 27, 2025

Port of OpenAI's Whisper model in C/C++

C++ 44,296 4,895 Updated Nov 1, 2025

Open-source alternative to Algolia + Pinecone and an easier-to-use alternative to Elasticsearch ⚡ 🔍 ✨ Fast, typo-tolerant, in-memory fuzzy search engine for building delightful search experiences

C++ 24,652 827 Updated Nov 6, 2025

MLX: An array framework for Apple silicon

C++ 22,727 1,378 Updated Nov 6, 2025

A fast, distributed, high-performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning …

C++ 17,816 3,954 Updated Nov 6, 2025

Tensor library for machine learning

C++ 13,436 1,379 Updated Nov 4, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,053 1,845 Updated Nov 6, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,848 896 Updated Sep 30, 2025

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 11,423 1,305 Updated Nov 6, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,442 957 Updated Oct 24, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,736 1,519 Updated Nov 6, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,377 450 Updated Aug 2, 2025

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,609 573 Updated Nov 6, 2025

Transformer-related optimizations, including BERT and GPT

C++ 6,343 920 Updated Mar 27, 2024

cuML - RAPIDS Machine Learning Library

C++ 4,997 600 Updated Nov 6, 2025

Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++

C++ 4,524 439 Updated Nov 3, 2025

Lightning fast C++/CUDA neural network framework

C++ 4,293 531 Updated Oct 13, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,232 420 Updated Nov 6, 2025

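fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference of dense models and hybrid-mode inference of MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. Deploying the full, full-precision original DeepSeek model on a dual-socket 9004/9005 server plus a single GPU reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps with multiple concurrent requests.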

C++ 4,066 412 Updated Oct 28, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,867 301 Updated Nov 6, 2025

Kernels & AI inference engine for phones

C++ 3,637 214 Updated Nov 6, 2025

Perforator is a cluster-wide continuous profiling tool designed for large data centers

C++ 3,345 148 Updated Nov 6, 2025

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

C++ 3,235 233 Updated Oct 29, 2025

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.

C++ 2,727 190 Updated Nov 2, 2025

KenLM: Faster and Smaller Language Model Queries

C++ 2,687 532 Updated Mar 30, 2025

A LiDAR odometry pipeline that just works

C++ 1,975 399 Updated Oct 29, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,794 200 Updated Apr 9, 2025

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ 1,774 73 Updated Jun 16, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 902 103 Updated Jul 10, 2025