A tiny CPU-first, in-memory vector search library in C++ with Python bindings.
- Overview
- Features
- Applications
- Try It Out
- Getting Started
- Examples
- Benchmarks
- Architecture
- Status
- Roadmap
- License
- Disclosure
Spheni is a C++ library with Python bindings for searching points in space that are close to a given query point. The aim is to build and document the architectural and performance improvements over time.
- Indexes: Flat, IVF
- Metrics: Cosine, L2
- Storage: F32, INT8
- Ops: add, search, search_batch, train, save, load
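The INT8 storage option trades a little precision for a large memory saving. As a general illustration of the technique (a NumPy sketch of symmetric per-vector INT8 quantization, not Spheni's actual codepath), each row is scaled into the signed 8-bit range and one float scale per vector is kept for dequantization:

```python
import numpy as np

def quantize_int8(vectors: np.ndarray):
    """Symmetric per-vector INT8 quantization: one float scale per row."""
    scales = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(vectors / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_int8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

base = np.random.rand(1000, 128).astype(np.float32)
q, scales = quantize_int8(base)

# INT8 codes use 1/4 the bytes of F32, plus one scale per vector
print(base.nbytes, q.nbytes + scales.nbytes)

# Reconstruction error stays small relative to the data range
print(np.abs(dequantize_int8(q, scales) - base).max())
```

Storing 1 byte per component instead of 4 (plus a per-vector scale) is what makes memory reductions in the ~73-75% range possible.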
Check out the API references for full details:
Spheni manages the low-level indexing and storage of CLIP-generated embeddings to enable vector similarity calculations. It compares the mathematical representation of a text query against the indexed image vectors to find the best semantic matches.
It retrieves relevant lines based on meaning rather than exact keywords. It embeds text once and uses Spheni for fast, offline vector search.
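The core operation behind both applications is the same: embed once, then rank the stored vectors by similarity to the embedded query. A toy NumPy illustration of cosine ranking (the random vectors below are stand-ins for real CLIP or text-model embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for precomputed embeddings (e.g. CLIP image vectors)
index_vectors = rng.normal(size=(100, 512)).astype(np.float32)

# A query embedding that is a slightly perturbed copy of vector 42
query = index_vectors[42] + 0.01 * rng.normal(size=512).astype(np.float32)

def normalize(x):
    # Cosine similarity is the dot product of L2-normalized vectors
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(index_vectors) @ normalize(query)
top3 = np.argsort(-scores)[:3]
print(top3[0])  # the perturbed source vector ranks first: 42
```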
Run this semantic paper search demo in Google Colab:
Searches 5000 ArXiv papers using IVF + INT8 quantization in ~25 lines of code.
Command launcher note:
- Linux: use `python3` if `python` is not available.
- Windows: use `py` (for example, `py -m pip install spheni`).
Install from PyPI:
```shell
python -m pip install --upgrade pip
python -m pip install spheni
```

Verify:

```shell
python -c "import spheni; print(spheni.__version__)"
```

Clone the repository and navigate into the root directory. Make sure CMake, pybind11, and OpenMP are installed.
Build from the repo root:
```shell
./build_spheni.sh --python --install ./dist
```

Check out the full guide.
Build a local wheel (PEP 427):
```shell
python -m pip install --upgrade pip
python -m pip wheel . --no-deps -w dist
```

For local-only source builds, you can enable native CPU tuning with:
```shell
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DSPHENI_BUILD_PYTHON=ON -DSPHENI_ENABLE_MARCH_NATIVE=ON
cmake --build build
```

C++:
```cpp
#include "spheni/engine.h"
#include <vector>

int main() {
    // 3-dimensional flat index with L2 distance, no normalization
    spheni::IndexSpec spec(3, spheni::Metric::L2, spheni::IndexKind::Flat, false);
    spheni::Engine engine(spec);

    // Three base vectors, flattened row-major
    std::vector<float> data = {1, 0, 0, 0, 1, 0, 0, 0, 1};
    engine.add(data);

    // Nearest neighbor of the query (k = 1)
    std::vector<float> query = {0.1f, 0.9f, 0.0f};
    auto hits = engine.search(query, 1);
}
```

Python:
```python
import numpy as np
import spheni

# 4-dimensional flat index with L2 distance
spec = spheni.IndexSpec(4, spheni.Metric.L2, spheni.IndexKind.Flat)
engine = spheni.Engine(spec)

base = np.random.rand(10, 4).astype(np.float32)
engine.add(base)

query = np.random.rand(4).astype(np.float32)
results = engine.search(query, 3)
for hit in results:
    print(f"ID: {hit.id}, Score: {hit.score}")
```

IVF achieves ~97% Recall@10 with ~12x higher throughput than brute force and stable tail latency. INT8 quantization reduces memory by ~73% with negligible accuracy loss, and OpenMP parallelism adds a further ~2.4x throughput.
Read the full benchmark report.
Architecture snapshot reference: docs/arch/v0.1.1.md.
Current code is split by responsibility:
- `include/spheni/`: public API (`IndexSpec`, `Engine`, enums, contracts)
- `src/core/`: orchestration/factory (`Engine`, index dispatch)
- `src/indexes/`: index algorithms (`FlatIndex`, `IVFIndex`)
- `src/math/`: shared math kernels and utilities (kernels, k-means, `TopK`)
- `src/storage/`: storage-specific transforms (quantization)
- `src/io/`: binary serialization helpers
- `src/python/`: pybind11 bindings
Contributor workflow:
- Add/modify algorithm behavior in `src/indexes/`.
- Add reusable scoring/math in `src/math/` (instead of duplicating it in indexes).
- Add representation-specific behavior in `src/storage/`.
- Keep persistence logic in index state serializers and `src/io/`.
Lifecycle contracts:
- `Engine::train()` is explicit and currently IVF-only.
- `IVFIndex::add()` buffers vectors before training; `IVFIndex::search()` requires trained state.
- `SearchParams.nprobe` is an IVF query-time control (number of coarse clusters scanned).
- Cosine normalization is controlled by `IndexSpec.normalize` and applied on add/query where relevant.
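To make the `nprobe` contract concrete, here is a schematic NumPy sketch of IVF-style search (an illustration of the general technique, not Spheni's implementation; real IVF training runs k-means, while this sketch picks random centroids for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(2000, 16)).astype(np.float32)

# "Training": choose coarse centroids (real IVF runs k-means here)
n_clusters = 32
centroids = base[rng.choice(len(base), n_clusters, replace=False)]

# Build the inverted lists: each base vector goes to its nearest centroid
assign = np.argmin(((base[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
lists = {c: np.flatnonzero(assign == c) for c in range(n_clusters)}

def ivf_search(query, k=10, nprobe=4):
    # Scan only the nprobe closest clusters instead of the whole base set
    d2c = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(d2c)[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    d = ((base[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d)[:k]]

query = base[7]
print(ivf_search(query)[0])  # the exact vector is found in its own cluster: 7
```

Raising `nprobe` scans more clusters, trading throughput for recall; `nprobe = n_clusters` degenerates to a brute-force scan.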
Spheni is usable for experimentation and benchmarking, but not production-ready.
Current limitations:
- No SIMD kernels
- No deletion or updates
- Limited parameter validation
- IVF uses brute-force centroid assignment
- Harden memory alignment; cut search-time allocations
- Improve IVF cache locality (repack cluster layout)
- Parallelize `search_batch`, Flat scan, and IVF training
- Add SIMD kernels + runtime ISA dispatch
- Micro-optimize distance and INT8 scoring kernels
- Retune Top-K for small `k` and faster merge
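The Top-K utility referenced in the roadmap is, conceptually, bounded selection over scored candidates. A generic sketch of the technique using Python's `heapq` (illustrative only, unrelated to Spheni's actual `TopK` code):

```python
import heapq

def top_k_smallest(scores, k):
    """Keep the k smallest scores seen so far with a bounded max-heap.

    Each push/replace is O(log k), so a full scan costs O(n log k)
    instead of the O(n log n) of sorting every candidate.
    """
    heap = []  # negated scores so heapq's min-heap acts as a max-heap
    for i, s in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (-s, i))
        elif -heap[0][0] > s:  # current worst kept score is larger: replace
            heapq.heapreplace(heap, (-s, i))
    return sorted((-s, i) for s, i in heap)

print(top_k_smallest([5.0, 1.0, 3.0, 0.5, 4.0], k=2))  # [(0.5, 3), (1.0, 1)]
```

For very small `k` a simple insertion into a sorted array often beats a heap in practice, which is one reason retuning for small `k` is on the roadmap.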
- FAISS: ArXiv Paper
- Near Neighbor Search in Large Metric Spaces
- The Binary Vector as the Basis of an Inverted Index File
Apache 2.0
This project used AI assistance (Codex) to generate the serialization, exception handling, and Python bindings. Claude Sonnet 4.5 was used to iteratively brainstorm the architecture; the prompt can be found here.
Beyond that, inspiration and references were drawn from the following projects and forums: