TinyDiskANN is a lightweight vector search library designed for scenarios where:
- You have billions of vectors but limited RAM
- You need extreme simplicity without complex dependencies
- Query latency of 100ms-1s is acceptable
- You want zero configuration and minimal setup
Instead of building complex in-memory indexes like graphs or trees, TinyDiskANN:
- Compresses vectors using Product Quantization (PQ)
- Stores them sequentially on disk
- Scans through them linearly during queries
- Keeps only the current batch and top results in RAM
This brute-force approach is surprisingly effective when combined with:
- Modern SSD throughput (3-7GB/s)
- Smart batching and vectorization
- Heavy compression (4-8 bits per dimension)
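The core loop above — train per-subspace codebooks, quantize to byte codes, then scan with precomputed lookup tables — can be sketched in plain NumPy. This is an illustrative toy, not TinyDiskANN's actual implementation; all function names, and the small sizes (4 subspaces, 16 centroids), are assumptions chosen to keep the example fast:

```python
import numpy as np

def train_codebooks(data, n_sub=4, n_centroids=16, iters=10):
    """One tiny k-means codebook per subspace (illustrative sizes, not TinyDiskANN's)."""
    d_sub = data.shape[1] // n_sub
    books = []
    for s in range(n_sub):
        chunk = data[:, s * d_sub:(s + 1) * d_sub]
        # initialize centroids from random training sub-vectors
        cent = chunk[np.random.choice(len(chunk), n_centroids, replace=False)].copy()
        for _ in range(iters):
            # assign each sub-vector to its nearest centroid, then re-center
            assign = ((chunk[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
            for c in range(n_centroids):
                if (assign == c).any():
                    cent[c] = chunk[assign == c].mean(0)
        books.append(cent)
    return books

def quantize(data, books):
    """Replace each sub-vector with the index of its nearest centroid (1 byte each)."""
    d_sub = data.shape[1] // len(books)
    codes = np.empty((len(data), len(books)), dtype=np.uint8)
    for s, cent in enumerate(books):
        chunk = data[:, s * d_sub:(s + 1) * d_sub]
        codes[:, s] = ((chunk[:, None, :] - cent[None]) ** 2).sum(-1).argmin(1)
    return codes

def search(query, books, codes, top_k=5):
    """Asymmetric distance: precompute query-to-centroid tables, then linear scan."""
    d_sub = len(query) // len(books)
    # tables[s, c] = squared distance from query slice s to centroid c
    tables = np.stack([((query[s * d_sub:(s + 1) * d_sub] - cent) ** 2).sum(-1)
                       for s, cent in enumerate(books)])
    # scoring a vector is just one table lookup per code byte, summed
    dists = tables[np.arange(len(books)), codes].sum(1)
    return np.argsort(dists)[:top_k]

data = np.random.randn(2000, 32).astype(np.float32)
books = train_codebooks(data)
codes = quantize(data, books)       # 2000 x 4 bytes instead of 2000 x 32 floats
ids = search(data[0], books, codes)
```

Note the asymmetry: the query stays uncompressed, so the only error comes from quantizing the database side, and the scan itself never decompresses a vector.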
| Technique | RAM Use | Query Speed | Accuracy | Setup Complexity | Best For |
|---|---|---|---|---|---|
| TinyDiskANN | Minimal | 100ms-1s | Good | None | Memory-constrained billion-scale |
| HNSW | High | 1-10ms | Excellent | Moderate | Low-latency in-memory search |
| FAISS (Flat) | High | 1-100ms | Perfect | None | Small datasets, exact search |
| DiskANN | Medium | 10-100ms | Excellent | High | Balanced disk/memory use |
| SPANN | Medium | 10-100ms | Good | High | Very large distributed datasets |
Key differences:
- No graph construction - TinyDiskANN skips the hours-long index build phase
- No memory spikes - RAM usage stays constant regardless of dataset size
- Portable - Just Python + NumPy, no C++/CUDA dependencies
- Predictable performance - Linear scan means consistent, if slow, queries
✅ Best for:
- "Cold" archival data where queries are rare
- Memory-constrained environments (even <1GB RAM)
- Simple deployments where complex systems won't fit
- Educational purposes to understand vector search basics
❌ Not for:
- Real-time applications needing <100ms latency
- Frequently updated datasets (rebuilds are expensive)
- Cases where 95%+ recall is mandatory
```python
import numpy as np
from tinydiskann import TinyDiskANN

# 1. Create some random vectors
data = np.random.randn(10000, 128).astype(np.float32)

# 2. Train and store the index
codebooks = TinyDiskANN.train_codebooks(data)
pq_codes = TinyDiskANN.quantize_vectors(data, codebooks)
TinyDiskANN.store_pq_codes(pq_codes, codebooks, "my_index")

# 3. Search!
query = np.random.randn(128).astype(np.float32)
results = TinyDiskANN.search(query, codebooks, "my_index", top_k=5)
```

On a standard laptop with an SSD (1M 128-dim vectors):
- Build time: ~2 minutes
- Query latency: ~300ms
- RAM usage: <50MB (vs 2GB+ for in-memory indexes)
- Accuracy: ~80% recall@10 vs exact search
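As a sanity check, the I/O side of that latency pencils out. Assuming 8-bit PQ codes (1 byte per dimension) and a conservative 3 GB/s sequential read rate — both assumed, not measured:

```python
# Back-of-envelope I/O cost for the 1M-vector benchmark above.
n_vectors, dims = 1_000_000, 128
index_bytes = n_vectors * dims         # 1 byte per dim at 8-bit PQ -> 128 MB on disk
ssd_bytes_per_s = 3e9                  # conservative NVMe sequential throughput

io_seconds = index_bytes / ssd_bytes_per_s
print(f"{index_bytes / 1e6:.0f} MB index, ~{io_seconds * 1e3:.0f} ms of raw I/O")
# -> 128 MB index, ~43 ms of raw I/O
```

If those assumptions hold, disk reads account for well under the ~300 ms observed, suggesting the scan is compute-bound rather than I/O-bound at this scale.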
This is purposefully minimal software. For production systems consider:
- Adding multi-threading for batch queries
- Implementing memory-mapped files for zero-copy reads
- Adding support for incremental updates
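Of those extensions, memory-mapped reads are the easiest to bolt on, since NumPy's `np.memmap` provides them directly. A minimal sketch, with the filename, layout, and batch size all assumed for illustration:

```python
import numpy as np

N_SUB = 8  # assumed layout: 8 PQ code bytes per vector

# Write the packed uint8 codes once; tofile() dumps raw bytes with no header.
codes = np.random.randint(0, 256, size=(1000, N_SUB), dtype=np.uint8)
codes.tofile("codes.bin")

# np.memmap maps the file into virtual memory: the OS pages codes in on
# demand during the linear scan, so resident RAM stays at roughly one batch.
mm = np.memmap("codes.bin", dtype=np.uint8, mode="r").reshape(-1, N_SUB)

scanned = 0
for start in range(0, len(mm), 256):
    batch = np.asarray(mm[start:start + 256])  # copy one batch; score it here
    scanned += len(batch)
```

Because the mapping is read-only and sequential, the OS readahead does most of the batching work for free; the explicit loop exists only to bound how much is materialized at once.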
But sometimes, simple is exactly what you need!