Find target hash collisions for Apple's NeuralHash perceptual hash function.
For example, starting from a picture of this cat, we can find an adversarial image that has the same hash as the picture of the dog in this post:
```
$ python collide.py --image cat.jpg --target 59a34eabe31910abfb06f308
...
```

(This took about 2.5 minutes to run on an i7-5930K.)
We can confirm the hash collision using `nnhash.py` from AsuharietYgvar/AppleNeuralHash2ONNX:

```
$ python nnhash.py dog.png
59a34eabe31910abfb06f308
$ python nnhash.py adv.png
59a34eabe31910abfb06f308
```
NeuralHash is a perceptual hash function that uses a neural network. Images are resized to 360x360 and passed through a neural network to produce a 128-dimensional feature vector. Then, the vector is projected onto R^96 using a 128x96 "seed" matrix. Finally, to produce a 96-bit hash, the 96-dimensional vector is thresholded: negative entries turn into a 0 bit, and non-negative entries turn into a 1 bit.
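To make the projection and thresholding steps concrete, here is a schematic sketch in NumPy; the feature vector and seed matrix are random stand-ins for the output of Apple's network and the real seed data:

```python
# Schematic sketch of the projection and thresholding steps described
# above. The inputs here are random stand-ins, not Apple's real values.
import numpy as np

def hash_from_features(features, seed_matrix):
    """features: shape (128,); seed_matrix: shape (128, 96)."""
    projected = features @ seed_matrix          # project onto R^96
    bits = (projected >= 0).astype(np.uint8)    # negative -> 0, non-negative -> 1
    # Pack the 96 bits into the 24-hex-digit form shown above.
    return np.packbits(bits).tobytes().hex()

rng = np.random.default_rng(0)
print(hash_from_features(rng.standard_normal(128), rng.standard_normal((128, 96))))
```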
This entire process, except for the final thresholding, is differentiable, so we can use gradient descent to find hash collisions; this exploits the well-known vulnerability of neural networks to adversarial examples.
We can define a loss that captures how close an image is to a given target hash: this loss is basically just the NeuralHash algorithm as described above, but with the final "hard" thresholding step tweaked so that it is "soft" (in particular, differentiable). Exactly how this is done (choices of activation functions, parameters, etc.) can affect convergence, so it can require some experimentation. Refer to collide.py to see what the implementation currently does.
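For illustration, one way to soften the threshold is to replace the hard step function with a steep sigmoid and measure the squared distance to the target bits. This is only a sketch of the idea, not necessarily the exact formulation in collide.py, and the `temperature` parameter is an assumed knob:

```python
# One possible "soft" loss, for illustration; collide.py's actual
# choice of activation and parameters may differ.
import tensorflow as tf

def soft_hash_loss(projected, target_bits, temperature=10.0):
    """projected: 96-dim pre-threshold vector; target_bits: 96 values in {0, 1}."""
    # A steep sigmoid approximates the hard 0/1 threshold while
    # remaining differentiable; `temperature` controls the steepness.
    soft_bits = tf.sigmoid(temperature * projected)
    return tf.reduce_sum((soft_bits - tf.cast(target_bits, tf.float32)) ** 2)
```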
After choosing a loss function, we can follow the standard method for finding adversarial examples for neural networks: gradient descent on the input image.
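Sketched in TensorFlow, such a loop might look like the following, where `model` stands in for the NeuralHash network, `seed_matrix` for the 128x96 seed matrix, and `soft_hash_loss` is the sketch above; the step count and learning rate are illustrative defaults, not the repository's:

```python
# Sketch of the attack loop (not collide.py verbatim): repeatedly step
# the image against the gradient of the soft loss defined above.
import tensorflow as tf

def find_collision(image, target_bits, model, seed_matrix, steps=1000, lr=2.0):
    x = tf.Variable(image)                        # adversarial image being optimized
    for _ in range(steps):
        with tf.GradientTape() as tape:
            features = model(x)                   # 128-dim feature vector
            projected = tf.tensordot(features, seed_matrix, axes=1)  # onto R^96
            loss = soft_hash_loss(projected, target_bits)
        grad = tape.gradient(loss, x)
        x.assign_sub(lr * grad)                   # vanilla gradient descent step
        x.assign(tf.clip_by_value(x, 0.0, 1.0))   # keep pixel values valid
    return x.numpy()
```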
- Get Apple's NeuralHash model following the instructions in AsuharietYgvar/AppleNeuralHash2ONNX and either put all the files in this directory or supply the `--model`/`--seed` arguments.
- Install Python dependencies: `pip install onnx coremltools onnx_tf tensorflow numpy Pillow`.

Run `python collide.py --image [path to image] --target [target hash]` to generate a hash collision. Run `python collide.py --help` to see all the options, including some knobs you can tweak, like the learning rate and some other parameters.
The code in this repository is intended to be a demonstration, and perhaps a starting point for other exploration. Tweaking the implementation (choice of loss function, choice of parameters, etc.) might produce much better results than this code currently achieves.
The code in this repository currently implements a simple loss function that just measures the distance to the target hash value. It happens that starting from a particular image produces a final image that looks somewhat similar. To better enforce this property, the loss function could be modified to penalize making the image look different, e.g. by adding the l2 distance between the original image and the computed adversarial example (another standard technique). Alternatively, we could use projected gradient descent to project onto an l-infinity ball centered at the original image as we optimize (yet another standard technique); sketches of both appear below.
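Here is a sketch of those two tweaks; the `l2_weight` and `epsilon` values are assumptions for illustration, not options the current code exposes:

```python
# Sketches of the two standard tweaks mentioned above; neither is
# implemented in collide.py.
import tensorflow as tf

def loss_with_l2_penalty(hash_loss, x, x_original, l2_weight=0.01):
    # Add a penalty for visually drifting away from the original image.
    return hash_loss + l2_weight * tf.reduce_sum((x - x_original) ** 2)

def project_linf(x, x_original, epsilon=8.0 / 255.0):
    # Projected gradient descent: after each step, clip every pixel back
    # into an l-infinity ball of radius epsilon around the original.
    return tf.clip_by_value(x, x_original - epsilon, x_original + epsilon)
```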
The code in this repository does not currently use any fancy optimization algorithm, just vanilla gradient descent.
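Swapping in a stock optimizer such as Adam would be one easy experiment; this is a hypothetical change, not something collide.py currently does:

```python
# Hypothetical tweak: replace the plain gradient step with Adam.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
# Inside the loop sketched earlier, instead of x.assign_sub(lr * grad):
#     optimizer.apply_gradients([(grad, x)])
```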