Stars
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
Open-source alternative to Algolia + Pinecone and an easier-to-use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo-tolerant, in-memory fuzzy search engine for building delightful search experiences
A fast, distributed, high-performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning …
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs (see the usage sketch after this list). Tensor…
FlashMLA: Efficient Multi-head Latent Attention Kernels
Unsupervised text tokenizer for Neural Network-based text generation (see the tokenizer sketch after this list).
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
High-speed Large Language Model Serving for Local Deployment
Lightweight, standalone C++ inference engine for Google's Gemma models.
Transformer-related optimizations, including BERT and GPT
Diffusion model (SD, Flux, Wan, Qwen Image, ...) inference in pure C/C++
Lightning-fast C++/CUDA neural network framework
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
fastllm is a high-performance LLM inference library with no backend dependencies. It supports tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with more than 10 GB of VRAM can run the full DeepSeek model. A dual-socket 9004/9005 server plus a single GPU can deploy the original full-size, full-precision DeepSeek model at 20 tps with single concurrency; the INT4-quantized model reaches 30 tps with single concurrency and 60+ tps under multiple concurrency.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Perforator is a cluster-wide continuous profiling tool designed for large data centers
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference; more devices mean faster inference.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
SCUDA is a GPU-over-IP bridge that allows GPUs on remote machines to be attached to CPU-only machines.
A highly optimized LLM inference acceleration engine for Llama and its variants.
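
For the TensorRT-LLM entry above, a minimal sketch of what its high-level Python `LLM` API can look like; the model id and sampling values are placeholders, not recommendations:

```python
# Minimal TensorRT-LLM usage sketch (high-level LLM API).
from tensorrt_llm import LLM, SamplingParams

# Loads a Hugging Face checkpoint and builds/loads a TensorRT engine for it.
# The model id is a placeholder; any supported checkpoint works.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches the prompts and runs optimized inference on the GPU.
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```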
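
And for the SentencePiece entry, a minimal sketch of training an unsupervised subword tokenizer and encoding text with it; the corpus path, model prefix, and vocabulary size are illustrative assumptions:

```python
# Minimal SentencePiece sketch: train an unsupervised subword model, then encode.
import sentencepiece as spm

# Train on a plain-text corpus (one sentence per line); paths/sizes are assumptions.
spm.SentencePieceTrainer.train(
    input="corpus.txt", model_prefix="tok", vocab_size=8000
)

# Load the trained model and tokenize raw text into pieces and ids.
sp = spm.SentencePieceProcessor(model_file="tok.model")
print(sp.encode("Hello world.", out_type=str))  # subword pieces
print(sp.encode("Hello world.", out_type=int))  # token ids
```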