46 stars written in C++ (starred by seshurajup)

LLM inference in C/C++

C++ 89,504 13,623 Updated Nov 10, 2025

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 76,880 8,297 Updated May 27, 2025

Port of OpenAI's Whisper model in C/C++

C++ 44,385 4,913 Updated Nov 9, 2025

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

C++ 24,662 829 Updated Nov 7, 2025

MLX: An array framework for Apple silicon

C++ 22,768 1,383 Updated Nov 8, 2025

A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning …

C++ 17,820 3,958 Updated Nov 10, 2025

Tensor library for machine learning

C++ 13,527 1,385 Updated Nov 9, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,089 1,851 Updated Nov 10, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,856 898 Updated Sep 30, 2025

Unsupervised text tokenizer for Neural Network-based text generation.

C++ 11,428 1,305 Updated Nov 6, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,451 958 Updated Oct 24, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,751 1,521 Updated Nov 7, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,382 450 Updated Aug 2, 2025

Lightweight, standalone C++ inference engine for Google's Gemma models.

C++ 6,610 573 Updated Nov 7, 2025

Transformer related optimization, including BERT, GPT

C++ 6,344 920 Updated Mar 27, 2024

cuML - RAPIDS Machine Learning Library

C++ 5,002 600 Updated Nov 7, 2025

Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++

C++ 4,537 441 Updated Nov 9, 2025

Lightning fast C++/CUDA neural network framework

C++ 4,298 530 Updated Oct 13, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,245 421 Updated Nov 10, 2025

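fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and hybrid-mode inference for MoE models, and any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can deploy the original full-size, full-precision DeepSeek model at 20 tps with a single concurrent request; the INT4-quantized model reaches 30 tps with a single concurrent request and 60+ tps with multiple concurrent requests.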

C++ 4,068 412 Updated Oct 28, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,880 303 Updated Nov 10, 2025

Kernels & AI inference engine for phones

C++ 3,692 216 Updated Nov 9, 2025

Perforator is a cluster-wide continuous profiling tool designed for large data centers

C++ 3,346 149 Updated Nov 10, 2025

Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

C++ 3,245 233 Updated Oct 29, 2025

Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.

C++ 2,733 191 Updated Nov 2, 2025

KenLM: Faster and Smaller Language Model Queries

C++ 2,686 532 Updated Mar 30, 2025

A LiDAR odometry pipeline that just works

C++ 1,977 399 Updated Oct 29, 2025

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ 1,794 201 Updated Apr 9, 2025

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ 1,776 73 Updated Jun 16, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 903 103 Updated Jul 10, 2025