PDX is a vertical data layout for vectors that stores the dimensions of different vectors together. In a nutshell, PDX is PAX for vector similarity search.
- ✂️ Efficient pruning of dimensions with partial distance calculations.
- ⚡ Up to 7x faster IVF queries (compared to FAISS+AVX512) when pairing PDX with the pruning algorithm ADSampling.
- ⚡ Up to 13x faster exhaustive search thanks to pruning.
- ⚡ Raw distance kernels (no pruning) in PDX are up to 1.6x faster than the `float32` kernels available in SimSIMD (USearch) and FAISS.
  - Why? Distance kernels in PDX are free of dependencies and have fewer LOAD/STORE operations.
  - Distance kernels can auto-vectorize efficiently without explicit SIMD for `float32`.
  - Distance kernels on small vectors (`d < 16`) are up to 8x faster than SIMD kernels in SimSIMD.
- More efficient compressed representations of vectors (WIP).
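As a rough illustration of the layout, here is a minimal NumPy sketch (not the repository's C++ implementation; the sizes and the `block_size` value are only for illustration):

```python
# Minimal sketch of a PDX-style (vertical, PAX-like) layout in NumPy.
# NOT the library's actual storage format; sizes are illustrative.
import numpy as np

n, d, block_size = 1_000, 128, 64
vectors = np.random.rand(n, d).astype(np.float32)   # horizontal: row i holds all d dims of vector i

# Vertical blocks: within each block, dimension j of every vector in the
# block is stored contiguously (a per-block transpose).
blocks = [vectors[i:i + block_size].T.copy()         # each block has shape (d, <=block_size)
          for i in range(0, n, block_size)]
```

Storing blocks dimension-major lets a kernel stream one dimension of many vectors at a time, which is what makes per-segment pruning and auto-vectorization effective.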
Pruning means avoiding checking all the dimensions of a vector to determine whether it will make it into the KNN of a query. Pruning speeds up vector similarity search because (i) less data must be fetched and (ii) fewer computations must be done.
However, pruning methods that do partial distance calculations have a hard time keeping up with SIMD-optimized kernels like the ones in FAISS and SimSIMD.
It is only thanks to the PDX layout that pruning methods outperform SIMD-optimized kernels on all CPU microarchitectures (Intel, AMD Zen, AWS Graviton). This is because, in PDX, distance calculations are efficient even on small vector segments, and the evaluation of the pruning predicate is not interleaved with distance calculations.
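The idea of partial distance calculations with pruning can be sketched on a vertical block as follows (illustrative only; the function `prune_block`, the fixed segment size, and the plain threshold test are assumptions, not the actual ADSampling or PDX-BOND predicate):

```python
# Sketch of pruning via partial distances on one vertical (d, m) block.
# Hypothetical helper, not part of the pdxearch API.
import numpy as np

def prune_block(block, query, threshold, segment=16):
    """block: (d, m) dimension-major; query: (d,); threshold: current k-th NN distance."""
    d, m = block.shape
    alive = np.arange(m)                       # candidates still in the running
    partial = np.zeros(m, dtype=np.float32)    # partial squared L2 distances
    for start in range(0, d, segment):
        seg = block[start:start + segment][:, alive]      # only surviving candidates
        diff = seg - query[start:start + segment, None]
        partial[alive] += np.sum(diff * diff, axis=0)
        alive = alive[partial[alive] <= threshold]        # evaluate the pruning predicate per segment
        if alive.size == 0:
            break
    return alive, partial[alive]
```

With the distance of the current k-th nearest neighbor as `threshold`, most candidates are discarded after only a few dimension segments, while each segment is still processed with contiguous, vectorizable loops.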
Pruning algorithms are especially effective when:
- Vectors are of high dimensionality (`d > 512`).
- High recalls are needed (`> 0.90`).
- Exact results are needed.
- `k` is relatively low (`k = 10, 20`).
Below are some quick benchmarks of our Python bindings vs FAISS. The complete benchmarks are available in our publication, where you will also find details on how we adapted these novel pruning algorithms to work in PDX.
We also refer to the recent research on pruning algorithms with partial distance calculations: ADSampling, DDC (previously named BSA), and DADE. All of these rely on rotating the vector collection to prune effectively. Alongside PDX, we also introduce PDX-BOND, a simpler pruning algorithm that does not need to rotate the vectors.
You can quickly try PDX with your own data using our Python bindings and examples. We have implemented PDX on flat IVF indexes and in exhaustive search settings.
- Python 3.11 or higher
- FAISS with Python Bindings
- Clang++17 or higher
- CMake 3.26 or higher
- Clone the repository and init submodules (`Eigen` for efficient matrix operations and `pybind11`):
```sh
git clone https://github.com/cwida/PDX
git submodule init
git submodule update
```
- Install Python dependencies and the `pdxearch` Python bindings:
```sh
export CXX="/usr/bin/clang++-18" # Set proper CXX first
pip install -r requirements.txt
python setup.py clean --all
python -m pip install .
```
- Run the examples under `/examples`:
```sh
# Creates an IVF index with FAISS on random data
# Then, it compares the search performance of PDXearch and FAISS
python ./examples/pdxearch_simple.py
```
For more details on the available examples and how to use your own data, refer to `/examples/README.md`.
- We heavily rely on FAISS to create the underlying IVF indexes.
- PDX is an ongoing research project. In its current state, it is not production-quality code.
PDX paired with ADSampling on IVF indexes works great in most scenarios with less than 0.001 recall loss. The higher the dimensionality, the higher the gains from pruning. The following benchmarks are from /examples/pdxearch_ivf.py.
NOTE that in these benchmarks: (i) both FAISS and PDXearch scan exactly the same vectors, and (ii) the recall loss of ADSampling is always less than 0.001.
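For reference, the FAISS baseline in such a comparison can be set up roughly as follows (a hedged sketch, not the exact code of `/examples/pdxearch_ivf.py`; the dataset sizes, `nlist`, and `nprobe` values are placeholders that would be tuned to hit the reported recall targets):

```python
# Rough sketch of timing a FAISS IVFFlat baseline; sizes and parameters are
# illustrative placeholders, not the benchmark's actual configuration.
import time
import faiss
import numpy as np

d, n, nq, k = 128, 100_000, 1_000, 10
data = np.random.rand(n, d).astype(np.float32)
queries = np.random.rand(nq, d).astype(np.float32)

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFFlat(quantizer, d, 1024)   # nlist = 1024 (illustrative)
index.train(data)
index.add(data)

index.nprobe = 32   # raise or lower nprobe to trade recall for speed
start = time.perf_counter()
distances, labels = index.search(queries, k)
print(f"avg. query time: {(time.perf_counter() - start) / nq * 1e3:.2f} ms")
```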
| Avg. query time [Intel SPR \| r7iz.2x] | FAISS AVX512 (R@10: 0.99 · 0.95 · 0.90) | PDXearch | Improvement |
|---|---|---|---|
| DBPedia · d=1536 · 1M | 53.6 · 18.0 · 7.7 ms | 7.4 · 3.7 · 2.4 ms | 7.2x · 4.9x · 3.2x |
| arXiv · d=768 · 2.25M | 25.9 · 7.7 · 3.5 | 6.2 · 2.8 · 1.7 | 4.2x · 2.8x · 2.1x |
| SIFT · d=128 · 1M | 1.7 · 0.6 · 0.4 | 1.0 · 0.4 · 0.3 | 1.7x · 1.5x · 1.3x |
| Avg. query time [Graviton 4 \| r8g.2x] | FAISS SVE (R@10: 0.99 · 0.95 · 0.90) | PDXearch | Improvement |
|---|---|---|---|
| DBPedia · d=1536 · 1M | 47.1 · 18.4 · 6.7 ms | 7.1 · 4.1 · 2.5 ms | 6.6x · 4.5x · 2.7x |
| arXiv · d=768 · 2.25M | 25.3 · 7.0 · 3.2 | 5.9 · 2.7 · 1.7 | 4.3x · 2.6x · 1.9x |
| SIFT · d=128 · 1M | 1.1 · 0.5 · 0.3 | 0.7 · 0.4 · 0.2 | 1.6x · 1.3x · 1.3x |
In PDX, building an IVF index can significantly improve exact search speed: the index lets the search visit the most promising clusters first, which tightens the pruning bounds early even though every vector is still scanned. The following benchmarks are from `/examples/pdxearch_ivf_exhaustive.py`.
| Avg. query time [Intel SPR \| r7iz.2x] | FAISS AVX512 | PDXearch | Improvement |
|---|---|---|---|
| DBPedia · d=1536 · 1M | 411.7 ms | 34.9 ms | 11.8x |
| GIST · d=960 · 1M | 252.9 | 27.6 | 9.1x |
| arXiv · d=768 · 2.25M | 454.5 | 41.4 | 11.0x |
| SIFT · d=128 · 1M | 34.4 | 6.5 | 5.3x |
| Avg. query time [Graviton 4 \| r8g.2x] | FAISS SVE | PDXearch | Improvement |
|---|---|---|---|
| DBPedia · d=1536 · 1M | 277.2 ms | 21.7 ms | 12.8x |
| GIST · d=960 · 1M | 170.9 | 19.4 | 8.8x |
| arXiv · d=768 · 2.25M | 306.0 | 27.2 | 11.3x |
| SIFT · d=128 · 1M | 21.3 | 4.7 | 4.5x |
Use PDX-BOND, our pruning algorithm. Here, vectors are not transformed, and no additional index is used. Gains vary depending on how values are distributed across the dimensions. The following benchmarks are from `/examples/pdxearch_exact_bond.py`.
| Avg. query time [Intel SPR \| r7iz.2x] | FAISS AVX512 | PDXearch | Improvement |
|---|---|---|---|
| DBPedia · d=1536 · 1M | 374 ms | 216 ms | 1.7x |
| arXiv · d=768 · 2.25M | 422 | 212 | 2.0x |
| MSong · d=420 · 1M | 117 | 15 | 7.8x |
| SIFT · d=128 · 1M | 31 | 10 | 3.1x |
| Avg. query time [Graviton 4 \| r8g.2x] | FAISS SVE | PDXearch | Improvement |
|---|---|---|---|
| DBPedia · d=1536 · 1M | 278 ms | 139 ms | 2.0x |
| arXiv · d=768 · 2.25M | 305 | 155 | 2.0x |
| MSong · d=420 · 1M | 70 | 15 | 4.7x |
| SIFT · d=128 · 1M | 22 | 11 | 2.0x |
PDX distance kernels are also faster than state-of-the-art SIMD kernels on all major architectures, relying only on auto-vectorization (for `float32`). The following benchmarks are from `/examples/pdx_brute.py`.
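For intuition on why auto-vectorization suffices, here is the access pattern of a vertical kernel sketched in NumPy (the repository's actual kernels are plain C++ loops over fixed-size blocks; `l2_block_vertical` is an illustrative name, not part of the library):

```python
# Access pattern of a dimension-major (vertical) distance kernel.
# NumPy sketch for illustration; the real kernels are C++ loops that
# compilers auto-vectorize without explicit SIMD intrinsics.
import numpy as np

def l2_block_vertical(block, query):
    """block: (d, m) dimension-major PDX block; query: (d,). Returns m squared L2 distances."""
    d, m = block.shape
    acc = np.zeros(m, dtype=np.float32)
    for j in range(d):                  # outer loop over dimensions
        diff = block[j] - query[j]      # contiguous slice: dimension j of every vector in the block
        acc += diff * diff              # this inner (per-block) loop is the one that vectorizes
    return acc
```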
| Avg. query time [Intel SPR \| r7iz.2x] | USearch | FAISS AVX512 | PDXearch | Improvement |
|---|---|---|---|---|
| GIST · d=960 · n=1M | 260 ms | 247 ms | 208 ms | 1.3x · 1.2x |
| STL · d=9216 · n=90K | 223 | 220 | 176 | 1.3x · 1.3x |
| Avg. query time [Graviton 4 \| r8g.2x] | USearch | FAISS SVE | PDXearch | Improvement |
|---|---|---|---|---|
| GIST · d=960 · n=1M | 203 ms | 172 ms | 124 ms | 1.6x · 1.4x |
| STL · d=9216 · n=90K | 175 | 160 | 109 | 1.6x · 1.5x |
Refer to our publication for the complete benchmarks of PDXearch on 4 microarchitectures: Intel SPR, Zen 4, Zen 3, and Graviton 4. There, you will also find a comparison against ADSampling in the horizontal layout (`.fvecs` layout), as well as details on how we adapted pruning algorithms to work in PDX.
- Compression: The vertical layout opens opportunities for compression, as indexing algorithms group together vectors that share some numerical similarity within their dimensions. The next step for PDX is to apply our scalar quantization algorithm LEP, which uses database compression techniques (ALP) to compress vectors with higher compression ratios and less information loss.
- More data types: For compressed vectors, we need to implement vertical distance kernels on vectors of variable bit size.
- PDX in HNSW: For this, we need a layout similar to the ones proposed in Starling or AiSAQ, in which neighborhoods of the graph are stored and fetched in blocks.
- Improve code readability and usability.
- Add a testing framework.
- Add the DDC or DADE algorithms to the Python bindings.
To run our benchmark suite in C++, refer to BENCHMARKING.md.
The code used for the experiments presented at SIGMOD'25 can be found in the `sigmod` branch.