jangilcher/cryptoTesting

Fuzzing Crypto

Code release for

Fenzi, G., Gilcher, J., Virdia, F. (2026). Finding Bugs and Features Using Cryptographically-Informed Functional Testing. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2026(1). https://eprint.iacr.org/2024/1122

In this repository, we include instructions to run our tests, instructions to run a baseline of generic fuzzing, and an explanation of how our source code is structured.

Dependencies

  • Docker
  • (Soft dependency): a server with 16+ cores to run the tests in parallel. See Table 2 of the paper for wall times for test completion on 31-core and 75-core machines.

Instructions to run

First, create and run the Docker container by running

bash run.sh

A terminal within the container will open.

To run experiments on liboqs

To run experiments on a given liboqs version:

  • 0.14.0, replace ver_liboqs with ches_liboqs below.
  • 0.8.0, replace ver_liboqs with cur_liboqs below.
  • 0.4.0, replace ver_liboqs with mid_liboqs below.
  • 2018-11, replace ver_liboqs with old_liboqs below.

Within the container, run

bash reproduce.sh ver_liboqs
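
As a concrete illustration of the substitution, the sketch below maps each liboqs version listed above to its target name and builds the matching command line. The target names come from the list above; the `LIBOQS_TARGETS` dictionary and the `reproduce_command` helper are hypothetical conveniences and are not part of this repository.

```python
# Target names copied from the version list above.
# The dictionary and helper are hypothetical, for illustration only.
LIBOQS_TARGETS = {
    "0.14.0": "ches_liboqs",
    "0.8.0": "cur_liboqs",
    "0.4.0": "mid_liboqs",
    "2018-11": "old_liboqs",
}


def reproduce_command(version: str) -> list[str]:
    """Return the argv that runs reproduce.sh for the given liboqs version."""
    if version not in LIBOQS_TARGETS:
        raise ValueError(f"no target listed for liboqs version {version!r}")
    return ["bash", "reproduce.sh", LIBOQS_TARGETS[version]]


print(" ".join(reproduce_command("0.14.0")))  # bash reproduce.sh ches_liboqs
```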

Comment out the values in BLACKLIST on lines 35-42 of fuzz_liboqs.py if you want a full run (it will take significantly longer).

To run experiments on supercop 20240107

Within the container, run

bash reproduce.sh supercop

Reports

The reports generated by the code above can be found inside /reports/, which is mounted as a volume within the Docker container. The reports are generated in three formats: an SQLite database, an Excel file, and a LaTeX table.

The reports from the experiments described in the paper can be found in /paper_reports/. The SQLite format was omitted due to the size of the databases.
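
The SQLite reports can be explored with standard tooling. The sketch below is a minimal example using Python's built-in `sqlite3` module. It makes no assumption about the report schema, which is not described here: it only lists whatever tables a database contains and counts their rows. The `summarize` helper is hypothetical, and the database path must be supplied by the user.

```python
import os
import sqlite3
import sys
from contextlib import closing


def summarize(db_path: str) -> dict[str, int]:
    """Return {table name: row count} for every table in the database."""
    if not os.path.exists(db_path):
        # sqlite3.connect would otherwise silently create an empty file.
        raise FileNotFoundError(db_path)
    with closing(sqlite3.connect(db_path)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Quote the identifier, since table names are not known in advance.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of a generated report database as the first argument.
    for table, rows in summarize(sys.argv[1]).items():
        print(f"{table}: {rows} rows")
```

Once the table names are known, individual crashes can be queried with ordinary SQL.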

Reading the reports

The reports generated by our code refer to the tests with specific names rather than numbers (as done in the paper). The mapping between the two is the following:

| Paper test number | Paper test name | Report test name |
|---|---|---|
| Test 1 | Hash(Maul(x)) | (this is the test performed on SUPERCOP) |
| Test 2 | Gen(; Maul(r)) | KEM/Keygen/badrng and SIGN/Keygen/badrng |
| Test 3 | Encaps(Maul(pk); r) | KEM/Encaps/pk-0 |
| Test 4 | Decaps(sk, Encaps(Maul(pk); r)) | KEM/Encaps/pk |
| Test 5 | Encaps(pk; Maul(r)) | KEM/Encaps/badrng |
| Test 6 | Decaps(Maul(sk), c) | KEM/Decaps/sk |
| Test 7 | Decaps(sk, Maul(c)) | KEM/Decaps/c |
| Test 8 | Sign(Maul(sk), m; r) | SIGN/Sign/sk |
| Test 9 | Sign(sk, Maul(m); r) | SIGN/Sign/m |
| Test 10 | Sign(sk, m; Maul(r)) | SIGN/Sign/badrng |
| Test 11 | Verify(Maul(pk), m, sigma) | SIGN/Verify/pk |
| Test 12 | Verify(pk, Maul(m), sigma) | SIGN/Verify/m |
| Test 13 | Verify(pk, m, Maul(sigma)) | SIGN/Verify/sig |
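
When post-processing reports, the same mapping can be kept as a lookup table. The sketch below transcribes the liboqs rows of the mapping (Test 1 has no report test name, since it is the SUPERCOP test). The `REPORT_TO_PAPER` dictionary and the `paper_test` helper are hypothetical names and are not part of the reporting scripts.

```python
# Report test name -> paper test number, transcribed from the mapping above.
# Test 1 (Hash(Maul(x))) is the SUPERCOP test and has no report test name.
REPORT_TO_PAPER = {
    "KEM/Keygen/badrng": 2,
    "SIGN/Keygen/badrng": 2,
    "KEM/Encaps/pk-0": 3,
    "KEM/Encaps/pk": 4,
    "KEM/Encaps/badrng": 5,
    "KEM/Decaps/sk": 6,
    "KEM/Decaps/c": 7,
    "SIGN/Sign/sk": 8,
    "SIGN/Sign/m": 9,
    "SIGN/Sign/badrng": 10,
    "SIGN/Verify/pk": 11,
    "SIGN/Verify/m": 12,
    "SIGN/Verify/sig": 13,
}


def paper_test(report_name: str) -> str:
    """Translate a report test name into the paper's 'Test N' label."""
    return f"Test {REPORT_TO_PAPER[report_name]}"


print(paper_test("KEM/Decaps/c"))  # Test 7
```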

Expected deviations

Some observed "software" crashes are partially probabilistic in nature. For example, hangs are measured by wall time, meaning that running the same tests on a slower CPU could result in more hangs being reported. Similarly, out-of-bounds memory writes may not cause segmentation faults if the memory they write to is not currently allocated to a different process.

This may result in slightly different numbers if reproducing our experiments on the same libraries but different hardware.
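
To see why a wall-time threshold makes hang counts hardware-dependent, consider the toy classifier below. It is only an illustration and not the mechanism AFL++ or our scripts use. The `classify` helper, the one-second stand-in workload, and both timeout values are assumptions chosen for the example.

```python
import subprocess
import sys


def classify(cmd: list[str], timeout_s: float) -> str:
    """Run cmd and label the outcome as 'ok', 'crash', or 'hang'."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # The child exceeded the wall-time budget and was killed.
        return "hang"
    # On POSIX, a negative return code means the child was killed by a signal.
    return "crash" if proc.returncode < 0 else "ok"


# Stand-in workload: a child process that takes about one second.
workload = [sys.executable, "-c", "import time; time.sleep(1.0)"]

print(classify(workload, timeout_s=10.0))  # generous wall-time budget
print(classify(workload, timeout_s=0.2))   # strict wall-time budget
```

The same program is labelled differently depending only on how its running time compares to the budget. A slower CPU has the same effect as tightening the budget.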

Baseline

The instructions in this section allow reproducing the experimental results from section "5.1.1 Baseline" in the paper.

First, create and run the same container as for the above experiments by running

bash run.sh

A terminal within the container will open.

To run the baseline on liboqs

To run the baseline on a given liboqs version:

  • 0.14.0, replace ver_liboqs with ches_liboqs_afl below.
  • 0.8.0, replace ver_liboqs with cur_liboqs_afl below.
  • 0.4.0, replace ver_liboqs with mid_liboqs_afl below.
  • 2018-11, replace ver_liboqs with old_liboqs_afl below.

Within the container, run

bash reproduce.sh ver_liboqs baseline

To run the baseline on supercop 20240107

Within the container, run

bash reproduce.sh supercop baseline

Source code structure

  • Dockerfile: configuration to build an environment that can reproduce results
  • Makefile: configuration to build dependencies for the experiments; clones the correct snapshots of LibOQS/SUPERCOP, installs dependencies and the correct compiler versions, and fetches, patches, and installs AFL++
  • build.sh / run.sh: scripts for creating the Docker container in which to reproduce results
  • reproduce.sh: wrapper script that consolidates the various steps to reproduce our experiments within the Docker container
  • fuzz_liboqs.py: starts the parallel testing of every implementation provided by LibOQS. For each scheme, it runs the relevant tests by internally calling AFL++ on a harness implementing the metamorphic test
  • fuzz_liboqs_baseline.py: similar to fuzz_liboqs.py, this script runs the baseline fuzzing campaign.
  • report.py: collects crashes generated by AFL++ within fuzz_liboqs.py, generating reports in three formats: an Excel table, a SQLite database, and a less detailed LaTeX table. The Excel and SQLite reports contain every crash found, including all inputs to the algorithm being tested that cause the crash/security notion violation, and the diff between the original input and the mauled input that caused the crash/security notion violation
  • report_baseline.py: similar to report.py, collects results from the baseline fuzzing campaign and reports them in Excel and SQLite formats
  • supercop_report.py/supercop_report_baseline.py: similar to report.py/report_baseline.py, they collect crashes and build Excel and SQLite reports for SUPERCOP experiments and baseline fuzzing
  • paper_reports/: directory containing Excel-format reports from the experiments reported in the paper, generated with the various reporting scripts mentioned above.
  • /tech/paper_fuzzing/liboqs: contains C and Python code implementing our testing framework for LibOQS. Test harnesses for KEM (resp. SIGN) tests can be found in the KEM (resp. SIGN) subdirectory. For example, consider the KEM.Decaps(sk, Maul(c)) test (source files within /tech/paper_fuzzing/liboqs/KEM/Decaps/c):
    • Call.c: implements the Call function from Definition 7
    • GenInput.c: implements GenInput from Definition 7
    • ParseInput.c: implements a program that, given a crash dump generated by AFL++ as input, prints the inputs and outputs of the Call function evaluated on that dump
    • CodeGen.py: given a crash dump as input, it outputs a C source file that replicates the crashing Call evaluation as a standalone binary (useful when inspecting the cause of a crash)
    • Makefile: contains the necessary commands to run AFL++ on a desired KEM and library version, using our custom mutator.
    • Note: Match and Maul from Definition 7 do not appear in this directory, since they are not specific to the KEM/Decaps/c test. Instead, these are shared by most tests and can be found in /tech/paper_fuzzing/liboqs/
  • /tech/paper_fuzzing/supercop/crypto_hash: Implements our testing specification from Definition 7 (Call, GenInput, Maul, Match). The testing loop is implemented in supercop.sh, which tries to follow the structure of the testing scripts provided by SUPERCOP such as do-part or data-run
  • /tech/paper_fuzzing/utilities: contains small utility C libraries to perform operations on buffers, and an implementation of a custom PRNG for LibOQS tests where we maul the randomness source
  • /tech/paper_fuzzing/vanilla: structured like /tech/paper_fuzzing, it contains the source code needed to run the baseline fuzzing campaign to compare against.
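
To give a feel for how the four components named above (GenInput, Maul, Call, Match) fit together, here is a toy sketch in the spirit of Test 1, Hash(Maul(x)). It is not our implementation and does not reproduce Definition 7, which remains the authoritative description. The roles assigned to each function are assumptions inferred from their names, mauling is done by a random single-bit flip rather than by the AFL++ custom mutator, and `call_broken` is a deliberately faulty hash invented for contrast.

```python
import hashlib
import random
from typing import Callable


def gen_input(rng: random.Random, length: int = 32) -> bytes:
    """GenInput (assumed role): produce a fresh, well-formed input."""
    return bytes(rng.getrandbits(8) for _ in range(length))


def maul(x: bytes, bit_index: int) -> bytes:
    """Maul (assumed role): perturb the input, here by flipping one bit."""
    out = bytearray(x)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def call_sha256(x: bytes) -> bytes:
    """Call (assumed role): evaluate the primitive under test."""
    return hashlib.sha256(x).digest()


def call_broken(x: bytes) -> bytes:
    """Invented faulty hash that ignores the last input byte."""
    return hashlib.sha256(x[:-1]).digest()


def match(out_original: bytes, out_mauled: bytes) -> bool:
    """Match (assumed role): equal digests for different inputs are flagged."""
    return out_original == out_mauled


def run_test(call: Callable[[bytes], bytes], iterations: int, seed: int):
    """Collect (input, mauled input) pairs whose outputs match."""
    rng = random.Random(seed)
    violations = []
    for _ in range(iterations):
        x = gen_input(rng)
        x_mauled = maul(x, rng.randrange(8 * len(x)))
        if match(call(x), call(x_mauled)):
            violations.append((x, x_mauled))
    return violations


print(len(run_test(call_sha256, 1000, seed=0)))  # 0
print(len(run_test(call_broken, 1000, seed=0)))  # flips landing in the last byte
```

Every pair flagged for `call_broken` differs only in the final byte, which is the kind of input diff the Excel and SQLite reports record for real findings.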

License

This software is distributed under the GNU General Public License version 3. See LICENSE for more details.

Contributors

Code was contributed by

  • Jan Gilcher
  • Fernando Virdia
  • Giacomo Fenzi
