              ·
             /|\
        · ─── ◆ ─── ·
         /·\ /|\ /·\
  · ──◆──── ◆───◆ ────◆── ·
 / / /|\ \ /|\ /|\ / |\ \ \
· · · · · · · · · · · · · · ·
╔═══════════════════════════════════════════╗
║            P R O X I M A K I T            ║
║     Search by meaning, not keywords.      ║
╚═══════════════════════════════════════════╝
Pure-Swift vector search for Apple platforms — powered by Accelerate.
◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
ProximaKit finds similar content by understanding what it means — not by matching keywords. Type "beach vacation" and it finds photos of oceans, notes about travel, articles about tropical destinations. None of them need to contain the words "beach" or "vacation."
Everything runs on-device. No server, no API key, no internet. Just your app and Apple Silicon.
HNSW implemented from scratch in Swift. Zero dependencies. Zero C++ wrappers.
No Cloud Required | Pure Swift | From Scratch
◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
ProximaKit is a pure-Swift approximate nearest-neighbour library built from scratch on Apple's Accelerate framework. It provides HNSW-based semantic search that runs entirely on-device — no server, no API key, no C++ wrapper required.
The library ships three targets: ProximaKit (core index + distance metrics + persistence), ProximaEmbeddings (text/image → vector converters using Apple's NaturalLanguage, Vision, and CoreML frameworks), and ProximaDemo (CLI) plus ProximaDemoApp (macOS SwiftUI app). All targets are distributed as a single Swift package.
ProximaKit is the foundation of the Chakravyuha stack and is used by TinyBrain (inference) and Lumen (knowledge retrieval) as their vector-search layer.
| | ProximaKit | FAISS (C++) | Pinecone (Cloud) |
|---|---|---|---|
| Language | Pure Swift | C++ wrapper | REST API |
| On-device | Yes | Needs bridging | No (cloud only) |
| Dependencies | Zero | libfaiss, numpy | API key + internet |
| Thread safety | Swift actors (compile-time) | Manual locks | N/A |
| iOS/macOS native | Yes | No | No |
| Setup time | 30 seconds | Hours | Minutes + billing |
◇ ── ◆ ── ◇ ── ◆ ── ◇
- macOS 14+ (macOS 15 recommended)
- Xcode 15+ / Swift 5.9+
- Apple Silicon (M1 or newer) — Accelerate SIMD paths are Apple Silicon–optimised
// Package.swift
dependencies: [
    .package(url: "https://github.com/vivekptnk/ProximaKit.git", from: "1.0.0")
],
targets: [
    .target(
        name: "YourApp",
        dependencies: [
            "ProximaKit",        // Core: vectors, search indices, persistence
            "ProximaEmbeddings", // Optional: turns text/images into vectors
        ]
    )
]

Or clone the repo and run the CLI demo:

git clone https://github.com/vivekptnk/ProximaKit.git
cd ProximaKit
swift run ProximaDemo

Or open Examples/ProximaDemoApp/ProximaDemoApp.xcodeproj in Xcode for the full GUI experience.
◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
You type: "beach vacation"
|
v
┌─────────────────┐
│ EmbeddingProvider│ Converts text to numbers
│ "beach" → [0.23, │ that capture its MEANING
│ -0.41, 0.87...]│ (using Apple's NaturalLanguage)
└────────┬─────────┘
v
┌─────────────────┐
│ HNSWIndex │ Searches a graph structure:
│ │ 1. Start at top layer (express lane)
│ Layer 2: ·──· │ 2. Greedily descend to best region
│ Layer 1: ·─·──· │ 3. Beam search on layer 0
│ Layer 0: ········ │ 4. Return k closest matches
└────────┬──────────┘
v
┌──────────────────┐
│ Search Results │ Ranked by similarity:
│ 0.12 Ocean waves │ Lower distance = more similar
│ 0.18 Tropical... │
│ 0.25 Travel... │
└──────────────────┘
All of this happens on your device, using Apple's Accelerate framework for SIMD math. No internet required.
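To make that concrete, here is what the underlying SIMD math looks like when written directly against vDSP. This is an illustrative sketch of the building blocks, not ProximaKit's internal code:

import Accelerate
import Foundation

/// Cosine distance between two equal-length vectors, computed with vDSP.
/// Returns 0 for identical directions, up to 2 for opposite directions.
/// (Illustrative only; ProximaKit ships its own optimised version.)
func cosineDistance(_ a: [Float], _ b: [Float]) -> Float {
    let dot = vDSP.dot(a, b)               // sum of a[i] * b[i]
    let normA = sqrt(vDSP.sumOfSquares(a)) // length of a
    let normB = sqrt(vDSP.sumOfSquares(b)) // length of b
    return 1 - dot / (normA * normB)
}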
◇ ── ◆ ── ◇ ── ◆ ── ◇
ProximaDemoApp is a macOS SwiftUI app that ships with the repo. It indexes 48 sample documents at startup and lets you search by meaning in real time, tune efSearch with a slider, add your own notes to the live index, and persist the index across app launches.
┌──────────────────────────────────────────────────────────────────┐
│ ProximaDemoApp — semantic search over 48 sample documents        │
│                                                                  │
│  ┌───────────────────┐  ┌─────────────────────────────────────┐  │
│  │ efSearch ─── 50   │  │ Query: "space exploration"          │  │
│  │ ▐██████████░░░░░░ │  │ ─────────────────────────────────── │  │
│  │                   │  │ ● 0.41 Astronauts aboard the ISS... │  │
│  │ Corpus: 48 docs   │  │ ● 0.44 NASA launched a new rover... │  │
│  │ Dimension: 512d   │  │ ● 0.48 The moon landing changed...  │  │
│  │ Build: ~0.9 s     │  │ ● 0.51 Scientists study black holes │  │
│  │ Query: ~104 ms    │  │ ● 0.55 The James Webb telescope...  │  │
│  │                   │  │                                     │  │
│  │ [ Add Note ]      │  │ ● dist < 0.55 — strong match        │  │
│  │ [ Add Image ]     │  │ ● dist < 0.68 — partial match       │  │
│  │                   │  │ ● dist ≥ 0.68 — weak match          │  │
│  └───────────────────┘  └─────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
Open in Xcode: open Examples/ProximaDemoApp/ProximaDemoApp.xcodeproj
◇ ── ◆ ── ◇ ── ◆ ── ◇
The simplest thing you can do. Uses Apple's built-in language model — no downloads, no setup.
import ProximaKit
import ProximaEmbeddings
// Set up
let embedder = try NLEmbeddingProvider(language: .english)
let index = HNSWIndex(dimension: embedder.dimension, metric: CosineDistance())
// Add content
let sentences = [
"The cat sat on the warm windowsill",
"Dogs love playing fetch in the park",
"Fresh pasta tastes better than dried",
"The sunset painted the sky orange",
]
for sentence in sentences {
let vector = try await embedder.embed(sentence)
let metadata = try JSONEncoder().encode(["text": sentence])
try await index.add(vector, id: UUID(), metadata: metadata)
}
// Search by meaning
let query = try await embedder.embed("animals playing outside")
let results = try await index.search(query: query, k: 3)
// Results: "Dogs love playing fetch" (closest match!)
// "The cat sat on the warm windowsill"
// "The sunset painted the sky orange"What happened: "animals playing outside" found the dog and cat sentences — even though none contain those exact words. It searched by meaning.
◇ ── ◆ ── ◇ ── ◆ ── ◇
let vision = VisionEmbeddingProvider()
let imageIndex = HNSWIndex(dimension: vision.dimension, metric: CosineDistance())

let vector = try await vision.embed(myCGImage)
try await imageIndex.add(vector, id: photoID)

// Find visually similar images
let queryVector = try await vision.embed(anotherImage)
let similar = try await imageIndex.search(query: queryVector, k: 5)

◇ ── ◆ ── ◇ ── ◆ ── ◇
Don't rebuild the index every time your app launches.
// Save (compact binary format)
try await index.save(to: fileURL)
// Load (memory-mapped for instant startup)
let loaded = try HNSWIndex.load(from: fileURL)

◇ ── ◆ ── ◇ ── ◆ ── ◇
For higher quality search, bring a real sentence-transformer model:
let provider = try CoreMLEmbeddingProvider(
modelAt: modelURL,
vocabURL: vocabURL // WordPiece vocab for proper tokenization
)
let vector = try await provider.embed("sunset over the ocean")

To convert a HuggingFace model to CoreML, use coremltools:
pip install coremltools transformers
python -c "
import coremltools as ct
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
# Export to CoreML with coremltools.convert()
"Place the exported .mlmodelc in Models/ and ProximaKit will discover it automatically.
◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
        ┌─────────────────────────────────────┐
        │           Y O U R   A P P           │
        │              (SwiftUI)              │
        └────────────────┬────────────────────┘
                         │
             embed()     │    search()
                         │
┌────────────────────────┼────────────────────────────┐
│                        │          ProximaEmbeddings │
│                        v                            │
│    ┌──────────┐   ┌──────────┐   ┌────────┐         │
│    │ NLEmbed  │   │ Vision   │   │ CoreML │         │
│    │ Provider │   │ Provider │   │Provider│         │
│    └─────┬────┘   └────┬─────┘   └───┬────┘         │
│          └─────────────┼─────────────┘              │
│                        │                            │
│           EmbeddingProvider protocol                │
└────────────────────────┼────────────────────────────┘
                         │
                  [Float] vectors
                         │
┌────────────────────────┼────────────────────────────┐
│                        v             ProximaKit     │
│                                                     │
│       ┌────────────────────────────────────┐        │
│       │       I N D E X   L A Y E R        │        │
│       │                                    │        │
│       │   HNSWIndex            BruteForce  │        │
│       │    ◆──◆──◆             ◆ ◆ ◆ ◆     │        │
│       │    │╲ │ ╱│             ◆ ◆ ◆ ◆     │        │
│       │    ◆──◆──◆             ◆ ◆ ◆ ◆     │        │
│       │    O(log n)              O(n)      │        │
│       └──────────────┬─────────────────────┘        │
│                      │                              │
│       ┌──────────────┴─────────────────────┐        │
│       │  D I S T A N C E   M E T R I C S   │        │
│       │  cosine · euclidean · dot product  │        │
│       │       manhattan · hamming          │        │
│       │        (vDSP / Accelerate)         │        │
│       └──────────────┬─────────────────────┘        │
│                      │                              │
│       ┌──────────────┴─────────────────────┐        │
│       │       P E R S I S T E N C E        │        │
│       │ binary save · mmap load · compact  │        │
│       └────────────────────────────────────┘        │
│                                                     │
│            Foundation + Accelerate ONLY             │
└─────────────────────────────────────────────────────┘
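The EmbeddingProvider protocol in the diagram is the seam between the two modules: every provider turns content into a [Float] vector of a fixed dimension. A minimal sketch, inferred from the usage shown in the examples above (embedder.dimension, try await embedder.embed(...)), might look like this. The actual declaration in ProximaEmbeddings may differ in detail:

import Foundation

// Sketch of the provider seam, inferred from usage in this README.
// Image providers presumably accept a CGImage instead of a String.
protocol EmbeddingProvider: Sendable {
    /// Length of the vectors this provider produces (512 in the demo app).
    var dimension: Int { get }
    /// Convert one piece of text into a dense vector capturing its meaning.
    func embed(_ text: String) async throws -> [Float]
}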
| Module | What It Does |
|---|---|
| `ProximaKit` | Core engine: vectors, distance metrics, HNSW graph search, persistence |
| `ProximaEmbeddings` | Converts text/images to vectors using Apple frameworks |
| `ProximaDemo` | Interactive CLI demo with live semantic search |
◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
╔════════════════════════════════════════════════════╗
║               P E R F O R M A N C E                ║
╠════════════════════════════════════════════════════╣
║                                                    ║
║  ⚡ Query      104 ms   ████████████░░░░░░          ║
║  ⚡ Cold start  50 ms   █████░░░░░░░░░░░░░          ║
║  ⚡ Build      ~3.0 s   ██████████████████░         ║
║                                                    ║
║  ◎ Recall@10 (1K)  98-99%  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓      ║
║  ◎ Recall@10 (10K) 87%+    ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓░░░      ║
║                                                    ║
║  ✓ Save/load roundtrip: exact binary match         ║
║  ✓ Memory-mapped I/O for instant startup           ║
║                                                    ║
╚════════════════════════════════════════════════════╝
◇ ── ◆ ── ◇ ── ◆ ── ◇
| Index | When | Speed |
|---|---|---|
| `HNSWIndex` | Most cases. Fast approximate search, scales to millions. | O(log n) |
| `BruteForceIndex` | Under ~1,000 items. Exact results, 100% recall. | O(n) |

Both expose the exact same API, so you can swap one for the other without changing any other code (see the sketch below).
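A minimal sketch of the swap, assuming BruteForceIndex mirrors the HNSWIndex initializer shown in the quick-start:

// Only the initializer changes; add() and search() calls stay identical.
let exact  = BruteForceIndex(dimension: 512, metric: CosineDistance()) // O(n), exact
let approx = HNSWIndex(dimension: 512, metric: CosineDistance())       // O(log n), approximate

try await exact.add(vector, id: UUID())
try await approx.add(vector, id: UUID())

let exactHits  = try await exact.search(query: queryVector, k: 10)
let approxHits = try await approx.search(query: queryVector, k: 10)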
◇ ── ◆ ── ◇ ── ◆ ── ◇
| Metric | When | Plain English |
|---|---|---|
| `CosineDistance()` | Text search. Use this unless you have a reason not to. | "How different is the direction?" |
| `EuclideanDistance()` | Spatial data (coordinates, sensors). | "How far apart are these?" |
| `DotProductDistance()` | Pre-normalized vectors (advanced). | "How aligned are these?" |
| `ManhattanDistance()` | Sparse data, grid-based problems. | "How many blocks apart?" |
| `HammingDistance()` | Binary/quantized vectors. | "How many bits differ?" |
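Choosing a metric is just a constructor argument. For example (the dimensions here are arbitrary):

// Text: compare direction, ignore magnitude.
let textIndex = HNSWIndex(dimension: 512, metric: CosineDistance())

// Spatial data: compare straight-line distance, e.g. 3-D sensor positions.
let sensorIndex = HNSWIndex(dimension: 3, metric: EuclideanDistance())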
◇ ── ◆ ── ◇ ── ◆ ── ◇
let config = HNSWConfiguration(
m: 16, // Connections per node
efConstruction: 200, // Build quality
efSearch: 50 // Search quality
)

| Problem | Fix |
|---|---|
| Results aren't relevant | Increase efSearch (try 100-200) |
| Search too slow | Decrease efSearch (try 20) |
| Too much memory | Decrease m (try 8) |
| Build takes too long | Decrease efConstruction (try 100) |
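To apply the tuned values, pass the configuration when creating the index. The configuration: parameter label is an assumption; check the HNSWIndex initializer in your installed version:

// A memory- and speed-leaning setup using the table's suggested values.
let config = HNSWConfiguration(m: 8, efConstruction: 100, efSearch: 20)
let index = HNSWIndex(
    dimension: 512,
    metric: CosineDistance(),
    configuration: config  // assumed parameter label
)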
◇ ── ◆ ── ◇ ── ◆ ── ◇
ProximaKit is fully thread-safe. Both indices are Swift actor types — search from any thread, no crashes, no data races. The compiler enforces this at build time.
// Safe from any thread or Task:
let results = try await index.search(query: vector, k: 10)
try await index.add(newVector, id: UUID())

◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
From ProximaKit:

| Type | What |
|---|---|
| `Vector` | A list of floats. The fundamental data type. |
| `HNSWIndex` | Fast approximate search (use this one). |
| `BruteForceIndex` | Exact search (for small datasets). |
| `CosineDistance` | Direction-based similarity (best for text). |
| `EuclideanDistance` | Straight-line distance. |
| `DotProductDistance` | Alignment-based (for normalized vectors). |
| `ManhattanDistance` | L1 / taxicab distance (sparse data). |
| `HammingDistance` | Count of differing positions (binary vectors). |
| `SearchResult` | Result: id, distance, metadata. |
| `HNSWConfiguration` | Tuning: m, efConstruction, efSearch. |
| `PersistenceEngine` | Binary save/load with memory mapping. |
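Reading results back out, using the SearchResult fields listed above (id, distance, metadata) and decoding the JSON metadata written in the quick-start example. That metadata is optional Data here is an assumption:

for result in try await index.search(query: queryVector, k: 5) {
    // The quick-start stored metadata as a JSON-encoded dictionary.
    if let data = result.metadata,
       let meta = try? JSONDecoder().decode([String: String].self, from: data) {
        print(result.distance, meta["text"] ?? "", result.id)
    }
}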
From ProximaEmbeddings:

| Type | What |
|---|---|
| `NLEmbeddingProvider` | Text to vector. Apple's built-in model. No setup. |
| `VisionEmbeddingProvider` | Image to vector. Apple's Vision framework. |
| `CoreMLEmbeddingProvider` | Any CoreML model (BERT, MiniLM, etc.). |
| `WordPieceTokenizer` | BERT-compatible tokenizer for CoreML models. |
◇ ── ◆ ── ◇ ── ◆ ── ◇
See docs/adr/ for Architecture Decision Records:
- ADR-001: Why Accelerate/vDSP for all vector math
- ADR-002: Why actors for thread safety
- ADR-003: Why custom binary (not JSON)
- ADR-004: Why heuristic neighbor selection
◇ ── ◆ ── ◇ ── ◆ ── ◇
# Build
swift build
# Unit + integration tests (fast)
swift test --skip RecallBenchmarkTests
# Full recall benchmarks (slow, needs Release mode)
swift test -c release --filter RecallBenchmarkTests
# Generate DocC documentation
swift package generate-documentation --target ProximaKit

◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
See docs/ROADMAP.md for the detailed plan. Highlights:
| Area | Status |
|---|---|
| Additional distance metrics — Mahalanobis, Chebyshev, Bray-Curtis | Planned |
| GPU acceleration — Metal/MPSGraph backend for batch index builds | Planned |
| Binary quantization — INT8 scalar, product quantization (PQ) | Planned |
| Filtered search — pre-filter by metadata predicate before ANN | Planned |
| ADR backlog — quantization strategy, filtered search design | In progress |
| Demo app — iOS target, CoreML model download UI, result export | Planned |
Items flagged in the documentation audit (CONTRIBUTING.md polish, CHANGELOG.md, demo app README expansion) are tracked in the roadmap but are out of scope for this release.
◆ ─────── ◇ ─────── ◆ ─────── ◇ ─────── ◆
MIT — use it for anything.
Author: Vivek Pattanaik