A tiny CPU-first, in-memory vector search library in C++ with Python bindings.
- Overview
- Features
- Applications
- Try It Out
- Getting Started
- Examples
- Benchmarks
- Architecture
- Status
- Roadmap
- License
- Disclosure
Spheni is a C++ library with Python bindings for searching points in space that are close to a given query point. The aim is to build and document the architectural and performance improvements over time.
- Indexes: Flat, IVF
- Metrics: Cosine, L2
- Storage: F32, INT8
- Ops: add, search, search_batch, train, save, load
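The INT8 storage option trades a little precision for a large memory saving. As a general illustration of the technique (a NumPy sketch of symmetric per-vector INT8 quantization, not Spheni's actual codepath), each row is scaled into the signed 8-bit range and one float scale per vector is kept for dequantization:

```python
import numpy as np

def quantize_int8(vectors: np.ndarray):
    """Symmetric per-vector INT8 quantization: one float scale per row."""
    scales = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(vectors / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float32)

def dequantize_int8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scales

base = np.random.rand(1000, 128).astype(np.float32)
q, scales = quantize_int8(base)

# INT8 codes use 1/4 the bytes of F32, plus one scale per vector
print(base.nbytes, q.nbytes + scales.nbytes)

# Reconstruction error stays small relative to the data range
print(np.abs(dequantize_int8(q, scales) - base).max())
```

Storing 1 byte per component instead of 4 (plus a per-vector scale) is what makes memory reductions in the ~73-75% range possible.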
Check out the API references for full details:
Spheni manages the low-level indexing and storage of CLIP-generated embeddings to enable vector similarity calculations. It compares the mathematical representation of a text query against the indexed image vectors to find the best semantic matches.
It retrieves relevant lines based on meaning rather than exact keywords. It embeds text once and uses Spheni for fast, offline vector search.
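The core operation behind both applications is the same: embed once, then rank the stored vectors by similarity to the embedded query. A toy NumPy illustration of cosine ranking (the random vectors below are stand-ins for real CLIP or text-model embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for precomputed embeddings (e.g. CLIP image vectors)
index_vectors = rng.normal(size=(100, 512)).astype(np.float32)

# A query embedding that is a slightly perturbed copy of vector 42
query = index_vectors[42] + 0.01 * rng.normal(size=512).astype(np.float32)

def normalize(x):
    # Cosine similarity is the dot product of L2-normalized vectors
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

scores = normalize(index_vectors) @ normalize(query)
top3 = np.argsort(-scores)[:3]
print(top3[0])  # the perturbed source vector ranks first: 42
```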
Run this semantic paper search demo in Google Colab:
Searches 5000 ArXiv papers using IVF + INT8 quantization in ~25 lines of code.
Command launcher note:
- Linux: use `python3` if `python` is not available.
- Windows: use `py` (for example, `py -m pip install spheni`).
Install from PyPI:
```shell
python -m pip install --upgrade pip
python -m pip install spheni
```

Verify:

```shell
python -c "import spheni; print(spheni.__version__)"
```

Clone the repository and navigate into the root directory. Make sure CMake, pybind11, and OpenMP are installed.
Build from the repo root:
```shell
./build_spheni.sh --python --install ./dist
```

Check out the full guide.
Build a local wheel (PEP 427):
```shell
python -m pip install --upgrade pip
python -m pip wheel . --no-deps -w dist
```

For local-only source builds, you can enable native CPU tuning with:
```shell
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DSPHENI_BUILD_PYTHON=ON -DSPHENI_ENABLE_MARCH_NATIVE=ON
cmake --build build
```

C++:
```cpp
#include "spheni/engine.h"
#include <vector>

int main() {
    // 3-dimensional flat index with L2 distance, no normalization
    spheni::IndexSpec spec(3, spheni::Metric::L2, spheni::IndexKind::Flat, false);
    spheni::Engine engine(spec);

    // Three base vectors, flattened row-major
    std::vector<float> data = {1, 0, 0, 0, 1, 0, 0, 0, 1};
    engine.add(data);

    // Nearest neighbor of the query (k = 1)
    std::vector<float> query = {0.1f, 0.9f, 0.0f};
    auto hits = engine.search(query, 1);
}
```

Python:
```python
import numpy as np
import spheni

# 4-dimensional flat index with L2 distance
spec = spheni.IndexSpec(4, spheni.Metric.L2, spheni.IndexKind.Flat)
engine = spheni.Engine(spec)

base = np.random.rand(10, 4).astype(np.float32)
engine.add(base)

query = np.random.rand(4).astype(np.float32)
results = engine.search(query, 3)
for hit in results:
    print(f"ID: {hit.id}, Score: {hit.score}")
```

IVF achieves ~97% Recall@10 with ~12x higher throughput than brute force and stable tail latency. INT8 quantization reduces memory by ~73% with negligible accuracy loss, and OpenMP parallelism adds a further ~2.4x throughput.
Read the full benchmark report.
Architecture snapshot reference: docs/arch/v0.1.1.md.
Current code is split by responsibility:
- `include/spheni/`: public API (`IndexSpec`, `Engine`, enums, contracts)
- `src/core/`: orchestration/factory (`Engine`, index dispatch)
- `src/indexes/`: index algorithms (`FlatIndex`, `IVFIndex`)
- `src/math/`: shared math kernels and utilities (kernels, k-means, `TopK`)
- `src/storage/`: storage-specific transforms (quantization)
- `src/io/`: binary serialization helpers
- `src/python/`: pybind11 bindings
Contributor workflow:
- Add/modify algorithm behavior in `src/indexes/`.
- Add reusable scoring/math in `src/math/` (instead of duplicating it in indexes).
- Add representation-specific behavior in `src/storage/`.
- Keep persistence logic in index state serializers and `src/io/`.
Lifecycle contracts:
- `Engine::train()` is explicit and currently IVF-only.
- `IVFIndex::add()` buffers vectors before training; `IVFIndex::search()` requires trained state.
- `SearchParams.nprobe` is an IVF query-time control (number of coarse clusters scanned).
- Cosine normalization is controlled by `IndexSpec.normalize` and applied on add/query where relevant.
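To make the `nprobe` contract concrete, here is a schematic NumPy sketch of IVF-style search (an illustration of the general technique, not Spheni's implementation; real IVF training runs k-means, while this sketch picks random centroids for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(2000, 16)).astype(np.float32)

# "Training": choose coarse centroids (real IVF runs k-means here)
n_clusters = 32
centroids = base[rng.choice(len(base), n_clusters, replace=False)]

# Build the inverted lists: each base vector goes to its nearest centroid
assign = np.argmin(((base[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
lists = {c: np.flatnonzero(assign == c) for c in range(n_clusters)}

def ivf_search(query, k=10, nprobe=4):
    # Scan only the nprobe closest clusters instead of the whole base set
    d2c = ((centroids - query) ** 2).sum(-1)
    probe = np.argsort(d2c)[:nprobe]
    cand = np.concatenate([lists[c] for c in probe])
    d = ((base[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(d)[:k]]

query = base[7]
print(ivf_search(query)[0])  # the exact vector is found in its own cluster: 7
```

Raising `nprobe` scans more clusters, trading throughput for recall; `nprobe = n_clusters` degenerates to a brute-force scan.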
Spheni is usable for experimentation and benchmarking, but not production-ready.
Current limitations:
- No SIMD kernels
- No deletion or updates
- Limited parameter validation
- IVF uses brute-force centroid assignment
- Harden memory alignment; cut search-time allocations
- Improve IVF cache locality (repack cluster layout)
- Parallelize `search_batch`, Flat scan, and IVF training
- Add SIMD kernels + runtime ISA dispatch
- Micro-optimize distance and INT8 scoring kernels
- Retune Top-K for small `k` and faster merge
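The Top-K utility referenced in the roadmap is, conceptually, bounded selection over scored candidates. A generic sketch of the technique using Python's `heapq` (illustrative only, unrelated to Spheni's actual `TopK` code):

```python
import heapq

def top_k_smallest(scores, k):
    """Keep the k smallest scores seen so far with a bounded max-heap.

    Each push/replace is O(log k), so a full scan costs O(n log k)
    instead of the O(n log n) of sorting every candidate.
    """
    heap = []  # negated scores so heapq's min-heap acts as a max-heap
    for i, s in enumerate(scores):
        if len(heap) < k:
            heapq.heappush(heap, (-s, i))
        elif -heap[0][0] > s:  # current worst kept score is larger: replace
            heapq.heapreplace(heap, (-s, i))
    return sorted((-s, i) for s, i in heap)

print(top_k_smallest([5.0, 1.0, 3.0, 0.5, 4.0], k=2))  # [(0.5, 3), (1.0, 1)]
```

For very small `k` a simple insertion into a sorted array often beats a heap in practice, which is one reason retuning for small `k` is on the roadmap.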
- FAISS: ArXiv Paper
- Near Neighbor Search in Large Metric Spaces
- The Binary Vector as the Basis of an Inverted Index File
Apache 2.0
This project used AI assistance (Codex) to generate the serialization, exception handling, and Python bindings. Claude Sonnet 4.5 was used to iteratively brainstorm the architecture; the prompt can be found here.
Beyond that, inspiration and references were drawn from the following projects and forums: