Fast inference engine for Transformer models
A quantum puzzle and adventure into Native Language decolonization; features an introduction to the master quantum plane and the truthful history of indigenous peoples. Not G-rated.
Repo for AMX + FAST
A tool that performs layer-wise quantization of LLMs to optimize the quality-to-size trade-off beyond uniform quantization methods. Built on top of llama.cpp.
A lightweight library for the RaBitQ algorithm and its applications in vector search.
Supports fixed-posit quantised training, inference, and fine-tuning of neural networks (PyTorch-based) using highly optimised floating-point multiplication on the GPU.
Faiss-based library for efficient similarity search
A header-only library for serializing and quantizing bits
[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
[SIGMOD 2025] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search
A simple BMP image viewer, converter, and editor, focused primarily on a from-scratch implementation of BMP image handling.
A resource-conscious neural network implementation for MCUs
An attempt to work through the MoE quantization issues encountered when converting models from safetensors to GGUF.
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
LinearCosine: Adding beats multiplying for lower-precision efficient cosine similarity
TinyChatEngine: On-Device LLM Inference Library
C++ image manipulation program. Edit your photos by swapping colours, quantising colours, flipping the image, and more.
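Most of the repositories above build on the same core operation: mapping floating-point values onto a small integer grid and back. A minimal sketch of symmetric uniform quantization, for illustration only (the `quantize`/`dequantize` names are hypothetical and not taken from any listed project):

```python
def quantize(values, bits=8):
    """Symmetric uniform quantization: map floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for INT8
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integers and the stored scale."""
    return [x * scale for x in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, s = quantize(weights)
approx = dequantize(q, s)
# each reconstructed value differs from the original by at most scale / 2
```

Layer-wise schemes like the llama.cpp-based tool above refine this idea by choosing a different bit width or scale per layer, spending precision where it most affects model quality.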