
Spheni

A tiny CPU-first, in-memory vector search library in C++ with Python bindings.

Index

  1. Overview
  2. Features
  3. Applications
  4. Try It Out
  5. Getting Started
  6. Examples
  7. Benchmarks
  8. Architecture
  9. Status
  10. Roadmap
  11. References
  12. License
  13. Disclosure

Overview

Spheni is a C++ library with Python bindings for finding points in space that are close to a given query point. The aim is to build, and document, architectural and performance improvements over time.

Features

  1. Indexes: Flat, IVF
  2. Metrics: Cosine, L2
  3. Storage: F32, INT8
  4. Ops: add, search, search_batch, train, save, load

Check out the API references for full details.

Applications

Semantic Image Search

Spheni manages the low-level indexing and storage of CLIP-generated embeddings to enable vector similarity search. It compares the embedding of a text query against the indexed image vectors to find the best semantic matches.

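A minimal sketch of this pattern, with stub encoders standing in for a real CLIP model (swap in e.g. open_clip); the Metric.Cosine name follows the feature list, and exact signatures may differ:

import numpy as np
import spheni

DIM = 512  # typical CLIP embedding size; match your model

def embed_images(paths):
    # Stand-in for a real CLIP image encoder: one float32 vector per image.
    return np.random.rand(len(paths), DIM).astype(np.float32)

def embed_text(text):
    # Stand-in for a real CLIP text encoder.
    return np.random.rand(DIM).astype(np.float32)

spec = spheni.IndexSpec(DIM, spheni.Metric.Cosine, spheni.IndexKind.Flat)
engine = spheni.Engine(spec)
engine.add(embed_images(["cat.jpg", "dog.jpg", "car.jpg"]))

hits = engine.search(embed_text("a small dog"), 2)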

Semantic grep

sphgrep retrieves relevant lines based on meaning rather than exact keywords: it embeds text once and uses Spheni for fast, offline vector search.


Try It Out

Run this semantic paper search demo in Google Colab.


It searches 5,000 arXiv papers using IVF + INT8 quantization in about 25 lines of code.

Getting Started

Command launcher note:

  • Linux: use python3 if python is not available.
  • Windows: use py (for example, py -m pip install spheni).

Quick Start (Python package)

Install from PyPI:

python -m pip install --upgrade pip
python -m pip install spheni

Verify:

python -c "import spheni; print(spheni.__version__)"

Build From Source (C++ / local Python module)

Clone the repository (datavorous/spheni) and change into its root directory. Make sure CMake, pybind11, and OpenMP are installed.

Build from the repo root:

./build_spheni.sh --python --install ./dist

Check out the full guide.

Build a local wheel (PEP 427):

python -m pip install --upgrade pip
python -m pip wheel . --no-deps -w dist

For local-only source builds, you can enable native CPU tuning with:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release -DSPHENI_BUILD_PYTHON=ON -DSPHENI_ENABLE_MARCH_NATIVE=ON
cmake --build build

Examples

C++:

#include "spheni/engine.h"
#include <vector>

int main() {
    spheni::IndexSpec spec(3, spheni::Metric::L2, spheni::IndexKind::Flat, false);
    spheni::Engine engine(spec);
    std::vector<float> data = {1,0,0, 0,1,0, 0,0,1};
    engine.add(data);
    std::vector<float> query = {0.1f, 0.9f, 0.0f};
    auto hits = engine.search(query, 1);
}

Python:

import numpy as np
import spheni

# 4-dimensional flat index with L2 distance
spec = spheni.IndexSpec(4, spheni.Metric.L2, spheni.IndexKind.Flat)
engine = spheni.Engine(spec)

# Ten random base vectors; Spheni expects float32
base = np.random.rand(10, 4).astype(np.float32)
engine.add(base)

# Top-3 nearest neighbours for a random query
query = np.random.rand(4).astype(np.float32)
results = engine.search(query, 3)

for hit in results:
    print(f"ID: {hit.id}, Score: {hit.score}")

Benchmarks

IVF achieves ~97% Recall@10 with ~12x higher throughput than brute force and stable tail latency. INT8 quantization reduces memory by ~73% with negligible accuracy loss, and OpenMP parallelism adds ~2.4x more throughput.

Read the full benchmark report.

Architecture

Architecture snapshot reference: docs/arch/v0.1.1.md.

Current code is split by responsibility:

  • include/spheni/: public API (IndexSpec, Engine, enums, contracts)
  • src/core/: orchestration/factory (Engine, index dispatch)
  • src/indexes/: index algorithms (FlatIndex, IVFIndex)
  • src/math/: shared math kernels and utilities (kernels, kmeans, TopK)
  • src/storage/: storage-specific transforms (quantization)
  • src/io/: binary serialization helpers
  • src/python/: pybind11 bindings

Contributor workflow:

  1. Add/modify algorithm behavior in src/indexes/.
  2. Add reusable scoring/math in src/math/ (instead of duplicating in indexes).
  3. Add representation-specific behavior in src/storage/.
  4. Keep persistence logic in index state serializers and src/io/.

Lifecycle contracts (a usage sketch follows the list):

  • Engine::train() is explicit and currently IVF-only.
  • IVFIndex::add() buffers vectors before training; IVFIndex::search() requires trained state.
  • SearchParams.nprobe is an IVF query-time control (coarse clusters scanned).
  • Cosine normalization is controlled by IndexSpec.normalize and applied on add/query where relevant.
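
A minimal sketch of the IVF lifecycle under these contracts; the SearchParams construction and how it is passed to search() are assumptions:

import numpy as np
import spheni

spec = spheni.IndexSpec(64, spheni.Metric.L2, spheni.IndexKind.IVF)
engine = spheni.Engine(spec)

# add() buffers vectors; train() then builds the coarse clusters (IVF-only).
engine.add(np.random.rand(1000, 64).astype(np.float32))
engine.train()

# nprobe sets how many coarse clusters are scanned per query.
params = spheni.SearchParams()
params.nprobe = 8
query = np.random.rand(64).astype(np.float32)
results = engine.search(query, 10, params)  # assumed optional params argument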

Status

Spheni is usable for experimentation and benchmarking, but not production-ready.

Current limitations:

  • No SIMD kernels
  • No deletion or updates
  • Limited parameter validation
  • IVF uses brute-force centroid assignment

Roadmap

  • Harden memory alignment; cut search-time allocations
  • Improve IVF cache locality (repack cluster layout)
  • Parallelize search_batch, Flat scan, and IVF training
  • Add SIMD kernels + runtime ISA dispatch
  • Micro-optimize distance and INT8 scoring kernels
  • Retune Top-K for small k and faster merge

References

  1. FAISS: arXiv paper
  2. Near Neighbor Search in Large Metric Spaces
  3. The Binary Vector as the Basis of an Inverted Index File

License

Apache 2.0

Disclosure

This project used AI assistance (Codex) to generate the serialization, exception handling, and Python bindings. Claude Sonnet 4.5 was used to iteratively brainstorm the architecture; the prompt for it can be found here.

Other than that, some inspiration and references were taken from the following projects/forums:

  1. tinyvector
  2. comet
  3. r/database Reddit thread