Code release for
Fenzi, G., Gilcher, J., Virdia, F. (2026). Finding Bugs and Features Using Cryptographically-Informed Functional Testing. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2026(1). https://eprint.iacr.org/2024/1122
In this repository, we include instructions to run our tests, instructions to run a baseline of generic fuzzing, and an explanation of how our source code is structured.
- Docker
- (Soft dependency): a server with 16+ cores to run the tests in parallel. See Table 2 of the paper for wall times for test completion on 31-core and 75-core machines.
`bash run.sh`

A terminal within the container will open.
To run experiments on version:
- 0.14.0, replace `ver_liboqs` with `ches_liboqs` below.
- 0.8.0, replace `ver_liboqs` with `cur_liboqs` below.
- 0.4.0, replace `ver_liboqs` with `mid_liboqs` below.
- 2018-11, replace `ver_liboqs` with `old_liboqs` below.
Within the container, run
`bash reproduce.sh ver_liboqs`

Comment out the values in BLACKLIST on lines 35-42 of `fuzz_liboqs.py` if you want a full run (it will take significantly longer).
Within the container, run

`bash reproduce.sh supercop`

The reports generated by the code above can be found inside /reports/, which is mounted as a volume within the Docker container. The reports are generated in three formats: as an SQLite database, as an Excel file, and as a LaTeX table.
The reports from the experiments described in the paper can be found in /paper_reports/.
The SQLite format was omitted due to the size of the databases.
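For reports generated by your own runs, a first step when inspecting the SQLite output might be to list the tables the database contains. The sketch below uses only Python's standard `sqlite3` module and makes no assumption about the schema produced by the reporting scripts; the file name in the usage comment is hypothetical.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    # Note: sqlite3.connect creates an empty database if db_path does not exist.
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]


# Example usage (hypothetical file name):
#   print(list_tables("/reports/some_report.sqlite"))
```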
The reports generated by our code refer to the tests with specific names rather than numbers (as done in the paper). The mapping between the two is the following:
| Paper test number | Paper test name | Report test name |
|---|---|---|
| Test 1 | Hash(Maul(x)) | this is the test performed on SUPERCOP |
| Test 2 | Gen(; Maul(r)) | KEM/Keygen/badrng and SIGN/Keygen/badrng |
| Test 3 | Encaps(Maul(pk); r) | KEM/Encaps/pk-0 |
| Test 4 | Decaps(sk, Encaps(Maul(pk); r)) | KEM/Encaps/pk |
| Test 5 | Encaps(pk; Maul(r)) | KEM/Encaps/badrng |
| Test 6 | Decaps(Maul(sk), c) | KEM/Decaps/sk |
| Test 7 | Decaps(sk, Maul(c)) | KEM/Decaps/c |
| Test 8 | Sign(Maul(sk), m; r) | SIGN/Sign/sk |
| Test 9 | Sign(sk, Maul(m); r) | SIGN/Sign/m |
| Test 10 | Sign(sk, m; Maul(r)) | SIGN/Sign/badrng |
| Test 11 | Verify(Maul(pk), m, sigma) | SIGN/Verify/pk |
| Test 12 | Verify(pk, Maul(m), sigma) | SIGN/Verify/m |
| Test 13 | Verify(pk, m, Maul(sigma)) | SIGN/Verify/sig |
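When post-processing reports, it can be convenient to translate report test names back to the paper's numbering. The following sketch simply encodes the table above as a dictionary; it assumes the report test names appear in the reports exactly as written in the table, and the helper name `paper_label` is ours. Test 1 is omitted because the table gives no report test name for the SUPERCOP test.

```python
# Mapping from report test name to paper test number, copied from the table above.
PAPER_TEST_NUMBER = {
    "KEM/Keygen/badrng": 2,
    "SIGN/Keygen/badrng": 2,
    "KEM/Encaps/pk-0": 3,
    "KEM/Encaps/pk": 4,
    "KEM/Encaps/badrng": 5,
    "KEM/Decaps/sk": 6,
    "KEM/Decaps/c": 7,
    "SIGN/Sign/sk": 8,
    "SIGN/Sign/m": 9,
    "SIGN/Sign/badrng": 10,
    "SIGN/Verify/pk": 11,
    "SIGN/Verify/m": 12,
    "SIGN/Verify/sig": 13,
}


def paper_label(report_test_name):
    """Return the paper's label (e.g. 'Test 7') for a report test name."""
    return f"Test {PAPER_TEST_NUMBER[report_test_name]}"
```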
Some observed "software" crashes are partially probabilistic in nature. For example, hangs are measured by wall time, meaning that running the same tests on a slower CPU could result in more hangs being reported. Similarly, out-of-bounds memory writes may not cause segmentation faults if the memory they write to is not currently allocated to a different process.
This may result in slightly different numbers if reproducing our experiments on the same libraries but different hardware.
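To illustrate why wall-time hang detection depends on the hardware, here is a minimal sketch of a timeout-based classifier. It is not the mechanism used by AFL++ or by our scripts; the function names and the timeout values are illustrative assumptions. The same child process is classified as a hang or not depending only on how long it takes relative to the limit.

```python
import subprocess
import sys


def is_hang(argv, timeout_s):
    """Run argv and report whether it exceeded timeout_s seconds of wall time."""
    try:
        subprocess.run(
            argv,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception is raised.
        return True
    return False


def sleeper(seconds):
    """Build a command that stands in for a target taking `seconds` to finish."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]
```

A target that finishes just under the limit on a fast machine may exceed it on a slower one, which is how the same test campaign can report different hang counts.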
The instructions in this section allow reproducing the experimental results from section "5.1.1 Baseline" in the paper.
First, create and run the same container as for the above experiments by running

`bash run.sh`

A terminal within the container will open.
To run experiments on version:
- 0.14.0, replace `ver_liboqs` with `ches_liboqs_afl` below.
- 0.8.0, replace `ver_liboqs` with `cur_liboqs_afl` below.
- 0.4.0, replace `ver_liboqs` with `mid_liboqs_afl` below.
- 2018-11, replace `ver_liboqs` with `old_liboqs_afl` below.
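The two version lists in this README differ only in the `_afl` suffix on the target name and the extra `baseline` argument. If you script several runs, the mapping could be collected in a small helper such as the sketch below; the helper `reproduce_command` is hypothetical and not part of this repository, and it only builds the argument list without running anything.

```python
# LibOQS version -> target name, as listed in this README.
LIBOQS_TARGETS = {
    "0.14.0": "ches_liboqs",
    "0.8.0": "cur_liboqs",
    "0.4.0": "mid_liboqs",
    "2018-11": "old_liboqs",
}


def reproduce_command(version, baseline=False):
    """Return the argv for reproduce.sh for a given LibOQS version."""
    target = LIBOQS_TARGETS[version]
    if baseline:
        # Baseline runs use the *_afl target plus the 'baseline' argument.
        return ["bash", "reproduce.sh", target + "_afl", "baseline"]
    return ["bash", "reproduce.sh", target]
```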
Within the container, run
`bash reproduce.sh ver_liboqs baseline`

Within the container, run
`bash reproduce.sh supercop baseline`

- `Dockerfile`: configuration to build an environment that can reproduce results
- `Makefile`: configuration to build dependencies for the experiments; takes care of cloning the correct snapshots of LibOQS/SUPERCOP, installs dependencies and the correct versions of the compiler, and fetches, patches and installs AFL++
- `build.sh` / `run.sh`: scripts for creating the Docker container to reproduce results in
- `reproduce.sh`: wrapper script that consolidates the various steps to reproduce our experiments within the Docker container
- `fuzz_liboqs.py`: starts the parallel testing of every implementation provided by LibOQS. For each scheme, it runs the relevant tests by internally calling AFL++ on a harness implementing the metamorphic test
- `fuzz_liboqs_baseline.py`: similar to `fuzz_liboqs.py`, this script runs the baseline fuzzing campaign
- `report.py`: collects crashes generated by AFL++ within `fuzz_liboqs.py`, generating reports in three formats: an Excel table, a SQLite database, and a less detailed LaTeX table. The Excel and SQLite reports contain every crash found, including all inputs to the algorithm being tested that cause the crash/security notion violation, and the diff between the original input and the mauled input that caused the crash/security notion violation
- `report_baseline.py`: similar to `report.py`, collects results from the baseline fuzzing campaign and reports them in Excel and SQLite formats
- `supercop_report.py` / `supercop_report_baseline.py`: similar to `report.py` / `report_baseline.py`, they collect crashes and build Excel and SQLite reports for SUPERCOP experiments and baseline fuzzing
- `paper_reports/`: directory containing Excel-format reports from the experiments reported in the paper, generated with the various reporting scripts mentioned above.
- `/tech/paper_fuzzing/liboqs`: contains C and Python code implementing our testing framework for LibOQS. Test harnesses for KEM (resp. SIGN) tests can be found in the KEM (resp. SIGN) subdirectory.
  For example, consider the KEM.Decaps(sk, Maul(c)) test (source files within `/tech/paper_fuzzing/liboqs/KEM/Decaps/c`):
  - `Call.c`: implements the Call function from Definition 7
  - `GenInput.c`: implements GenInput from Definition 7
  - `ParseInput.c`: implements a program that, given a crash dump generated by AFL++ as input, prints the inputs and outputs of the Call function evaluated on the corresponding dump
  - `CodeGen.py`: given a crash dump as input, it outputs a C source file that replicates the crashing Call evaluation as a standalone binary (useful when inspecting the cause of a crash)
  - `Makefile`: contains the necessary commands to run AFL++ on a desired KEM and library version, using our custom mutator.
  - Note: Match and Maul from Definition 7 do not appear in this directory, since they are not specific to the `KEM/Decaps/c` test. Instead, these are shared by most tests and can be found in `/tech/paper_fuzzing/liboqs/`
- `/tech/paper_fuzzing/supercop/crypto_hash`: implements our testing specification from Definition 7 (Call, GenInput, Maul, Match). The testing loop is implemented in `supercop.sh`, which tries to follow the structure of the testing scripts provided by SUPERCOP, such as `do-part` or `data-run`
- `/tech/paper_fuzzing/utilities`: contains small utility C libraries to perform operations on buffers, and an implementation of a custom PRNG for LibOQS tests where we maul the randomness source
- `/tech/paper_fuzzing/vanilla`: structured like `/tech/paper_fuzzing`, it contains the source code needed to run the baseline fuzzing campaign to compare against.
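As a rough orientation for readers new to the code, the toy sketch below shows one way the four components named above (GenInput, Maul, Call, Match) might fit together for a hash test in the spirit of Hash(Maul(x)). It is not the repository's implementation and does not reproduce Definition 7: the function signatures, the single-bit-flip mauling, and the assumption that Match checks that the two outputs differ are all ours, and SHA-256 from Python's `hashlib` merely stands in for a hash under test.

```python
import hashlib
import os


def gen_input(n=32):
    """GenInput (assumed form): sample a random n-byte message."""
    return os.urandom(n)


def maul(x, bit):
    """Maul (assumed form): flip a single bit of the input."""
    y = bytearray(x)
    y[bit // 8] ^= 1 << (bit % 8)
    return bytes(y)


def call(x):
    """Call (stand-in): evaluate the function under test, here SHA-256."""
    return hashlib.sha256(x).digest()


def match(out_original, out_mauled):
    """Match (assumed form): the property holds if the two outputs differ."""
    return out_original != out_mauled


def run_test(trials=100):
    """Return every (input, mauled input) pair that violates the property."""
    failures = []
    for _ in range(trials):
        x = gen_input()
        for bit in range(len(x) * 8):
            x_mauled = maul(x, bit)
            if not match(call(x), call(x_mauled)):
                failures.append((x, x_mauled))
    return failures
```

In the repository itself, mauling is driven by AFL++ through the custom mutator mentioned above rather than by an exhaustive loop like this one.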
This software is distributed under the GNU General Public License version 3. See LICENSE for more details.
Code was contributed by
- Jan Gilcher
- Fernando Virdia
- Giacomo Fenzi