A personal research project exploring whether biometric authentication is possible without storing any biometric data — not even a derived template.
ZKHD combines three ideas:
1. SimHash encoding: ArcFace face embeddings (512-dimensional float vectors) are projected into 32,000-bit binary hypervectors via Sign Random Projections. The encoding preserves angular distance: same-person pairs land close together in Hamming space, while different-person pairs land near a normalized Hamming distance of 0.5.
2. Fuzzy extraction: a cryptographic primitive that derives a stable key from a noisy biometric using stored helper data. Currently mocked with an oracle; the real construction requires an error-correcting code (SC-LDPC) that this project has not yet implemented.
3. Zero-knowledge proof: the user should prove "my biometric is close enough" without revealing anything else. Currently mocked with an Ed25519 signature.
The representation layer (ArcFace + SimHash) is implemented and empirically validated. The cryptographic layer (fuzzy extractor, ZKP) is scaffolded with placeholders.
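The SimHash step described above can be illustrated with a minimal, standard-library-only sketch. It is not the code in hv.py: the function names are hypothetical, the sizes are scaled down (128 dimensions and 2,048 bits rather than 512 and 32,000) so it runs quickly, and random Gaussian vectors stand in for ArcFace embeddings. It shows the property the project relies on: the normalized Hamming distance between two hashes approximates the angle between the inputs divided by pi.

```python
import math
import random


def make_projection(dim, n_bits, seed=0):
    """Draw one Gaussian hyperplane (a row of `dim` weights) per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec, projection):
    """Sign random projection: each bit is the sign of one dot product."""
    return [1 if sum(w * x for w, x in zip(row, vec)) >= 0 else 0
            for row in projection]


def hamming_fraction(a, b):
    """Normalized Hamming distance between two equal-length bit lists."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def angle_fraction(u, v):
    """Angle between u and v divided by pi (the value SimHash estimates)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine) / math.pi


# Scaled-down demo with synthetic vectors (assumption: not real embeddings).
dim, n_bits = 128, 2048
projection = make_projection(dim, n_bits, seed=0)
rng = random.Random(1)
base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
noisy = [x + 0.3 * rng.gauss(0.0, 1.0) for x in base]   # "same person"
other = [rng.gauss(0.0, 1.0) for _ in range(dim)]        # "different person"

d_same = hamming_fraction(simhash(base, projection), simhash(noisy, projection))
d_other = hamming_fraction(simhash(base, projection), simhash(other, projection))
print("same-source distance:", d_same, "angle/pi:", angle_fraction(base, noisy))
print("unrelated distance:", d_other)
```

The projection matrix has to be identical at enrollment and at verification, which is why this sketch derives it from a fixed seed.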
The whitepaper reports a negative result: under standard fuzzy extractor bounds, the error rate required for 2D face recognition in the wild forces the helper string to leak more min-entropy than a 2D face embedding is estimated to contain. Secure 256-bit key extraction from a 2D camera is not achievable under these assumptions.
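A back-of-the-envelope version of this argument can be sketched as follows. This is not the whitepaper's derivation. It assumes a code-offset construction whose error-correcting code operates at the Shannon rate 1 - H2(p) for bit error rate p, so the helper string can cost up to n * H2(p) bits of min-entropy. It also assumes the code must tolerate errors up to the decision threshold tau. The function names and the `source_entropy_bits` parameter are hypothetical; no entropy estimate for face embeddings is supplied here.

```python
import math


def binary_entropy(p):
    """Binary entropy H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def helper_leakage_bits(n_bits, error_rate):
    """Entropy loss n * H2(p), assuming a code at the Shannon rate 1 - H2(p)."""
    return n_bits * binary_entropy(error_rate)


def residual_entropy_bits(source_entropy_bits, n_bits, error_rate):
    """Min-entropy left after publishing helper data (negative means none)."""
    return source_entropy_bits - helper_leakage_bits(n_bits, error_rate)


# Thresholds taken from the results table; 32,000 is the hypervector length.
for tau in (0.41, 0.43, 0.44):
    print(tau, round(helper_leakage_bits(32_000, tau)))
```

Because the hypervector is a deterministic function of the embedding, it cannot hold more min-entropy than the embedding does. Under these assumptions, a 256-bit key would need `residual_entropy_bits(...)` to remain at or above 256.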
| Dataset | Identities | Best tau | FRR | FAR |
|---|---|---|---|---|
| Five Faces | 5 | 0.44 | 0.15% | 0.09% |
| LFW (>=5 img. per identity) | 420 | 0.41 | 1.23% | 0.08% |
| VGGFace2 subset | 50 | 0.43 | 2.63% | 0.07% |
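
For readers unfamiliar with the columns above, here is a minimal sketch of how FRR and FAR follow from a threshold tau on normalized Hamming distance. It assumes a pair is accepted when its distance is at most tau, and that the "best" tau minimizes FRR + FAR; the source does not state the selection criterion, so that part is a guess. The function names and toy distances are hypothetical and are not taken from simulate.py.

```python
def frr_far(genuine, impostor, tau):
    """Return (FRR, FAR) for a given threshold.

    genuine:  distances between same-identity pairs
    impostor: distances between different-identity pairs
    A pair is accepted when its distance is <= tau (assumption).
    """
    false_rejects = sum(d > tau for d in genuine)
    false_accepts = sum(d <= tau for d in impostor)
    return false_rejects / len(genuine), false_accepts / len(impostor)


def best_tau(genuine, impostor, candidates):
    """Pick the candidate threshold minimizing FRR + FAR (assumed criterion)."""
    return min(candidates, key=lambda tau: sum(frr_far(genuine, impostor, tau)))


# Toy distances for illustration only; not measured data.
genuine = [0.30, 0.35, 0.38, 0.45]
impostor = [0.42, 0.43, 0.50, 0.51]
print(frr_far(genuine, impostor, 0.41))
print(best_tau(genuine, impostor, [0.36, 0.41, 0.46]))
```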
ZKHD/
├── Code/
│   ├── hv.py                 SimHash encoder, projection matrix, integrity guard
│   ├── simulate.py           Full pipeline: extract, encode, enroll, evaluate
│   ├── fuzzy_extractor.py    Fuzzy extractor (oracle mock)
│   ├── zk.py                 ZKP (Ed25519 mock)
│   └── requirements.txt
└── Dataset/                  Not tracked; place your datasets here
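Dataset layout: either one subdirectory per identity (`Dataset/<name>/<user>/*.jpg` or `*.jpeg`) or a flat directory of files prefixed with the identity (`Dataset/<name>/<identity>_*.jpg` or `*.jpeg`).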
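
The two layouts can be told apart from the relative path alone. The sketch below is illustrative and is not the loader in simulate.py; `identity_of` and `group_by_identity` are hypothetical helpers. For the flat layout it assumes the identity is everything before the last underscore, which matches LFW-style names such as `George_W_Bush_0001.jpg` but may not match the project's own rule.

```python
from collections import defaultdict
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def identity_of(relative_path):
    """Map a path relative to Dataset/<name>/ to an identity, or None."""
    path = PurePosixPath(relative_path)
    if path.suffix.lower() not in IMAGE_SUFFIXES:
        return None
    if len(path.parts) == 2:
        # Nested layout: <user>/<file>.jpg
        return path.parts[0]
    if len(path.parts) == 1 and "_" in path.stem:
        # Flat layout: <identity>_<anything>.jpg (split at the last underscore)
        return path.stem.rsplit("_", 1)[0]
    return None


def group_by_identity(relative_paths):
    """Group image paths by identity, skipping files that fit neither layout."""
    groups = defaultdict(list)
    for rel in relative_paths:
        identity = identity_of(rel)
        if identity is not None:
            groups[identity].append(rel)
    return dict(groups)


print(group_by_identity(["alice/1.jpg", "alice/2.jpeg", "bob_001.jpg", "notes.txt"]))
```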
    cd Code
    pip3 install -r requirements.txt
    python3 simulate.py --enroll 3

The program prompts for backend (buffalo_l / antelopev2), dataset, and core count.