ISSTA'20 Artifact for: How Far We Have Come: Testing Decompilation Correctness of C Decompilers
Our experiments were conducted on 64-bit Ubuntu 18.04. We recommend setting up the artifact on the same OS.
- src/: source code directory
- runtime/: CSmith runtime library
- seed_for_retdec and seed_for_r2: sample seeds for EMI testing (see clarifications below).
- fuzzer.py: main component; initialize a fuzzing test campaign by calling functions in this script
- generator.py: to compile and decompile files (to interact with Radare2 and IDA-Pro, we provide two scripts as follows; the other two decompilers can be used directly via command line)
- R2_decompile.py: to decompile with Radare2/Ghidra
- IDA_decompile.py and idapy_decompile.py: to decompile with IDA (IDA-Pro is not provided in this Artifact Evaluation Package; see clarifications below)
- EMI_generator.py: to generate EMI variants
- MySQL_connector.py: to connect to MySQL, which is used in the implementation of EMI mutation
- CFG_measurer.py: to measure the CFG distance of two programs (used for EMI mutation)
- ENV_Profiler.py: to provide the live code EMI mutation function
- ContextTable.py: context structure
- replacer.py: to replace main() in the original code with the decompilation result
- modifier.py: to replace custom macros in decompilation results
- checker.py: to compare the output of the two programs for consistency
- Config.py: constant values/strings/paths
sudo apt install gcc-multilib
sudo apt install m4
sudo apt install openssl libssl-dev -y
sudo apt install flex bison
sudo apt install pkg-config
CMake version 3.12 or later is needed to build r2ghidra-dec. To install the latest version of CMake, download the source code from the CMake website, then build it following the instructions there:
./bootstrap
make
sudo make install
MySQL is used in EMI mutation. To install it on Ubuntu:
apt-get install mysql-server
Then start the MySQL service:
service mysql start
Remember to update user and passwd in MySQL_connector.py if you set a different user and password. You can check your default user and password with:
sudo cat /etc/mysql/debian.cnf
To install the MySQL Driver for Python3:
apt-get install python3-pip
pip3 install PyMySQL
As reported in the paper, four decompilers are tested as follows:
- IDA Pro: https://www.hex-rays.com/products/ida/
- JEB3: https://www.pnfsoftware.com/
- RetDec: https://retdec.com/
- Radare2: https://www.radare.org/n/radare2.html (we tested the r2ghidra plugin of Radare2, more specifically)
We note that IDA Pro and JEB3 are commercial tools, so we decided not to provide them in this artifact evaluation phase. Instead, we provide instructions to set up the other two free decompilers, RetDec and Radare2 with the Ghidra plugin. The two commercial decompilers were tested in exactly the same way.
To install Radare2:
git clone https://github.com/radareorg/radare2
cd radare2 ; sys/install.sh ; cd ..
We used commit 06ab29b93cb0168a8ec1cb39f860c6b990678838 when writing this README.
To further install the Ghidra decompiler plugin (named r2ghidra):
r2pm update
r2pm -i r2ghidra-dec
Then we need to install r2pipe to use our decompiler script R2_decompile.py:
pip3 install r2pipe
To install RetDec, we recommend downloading and unpacking the pre-built package to save time. You can also build it from source by following the instructions on their GitHub page (note that the unpacked RetDec takes about 5.5 GB).
Download and unpack the pre-built RetDec (ver. 4.0) for Ubuntu from the RetDec releases page; you can then use retdec-decompiler.py under retdec/bin/.
Remember to update the absolute path to retdec-decompiler.py in Config.py. For example:
RetDec_absolute_path = '/home/fuzz/Documents/retdec-install/bin/retdec-decompiler.py'
Clone this repository:
git clone https://github.com/monkbai/DecFuzzer.git
Then do not forget to update runtime_dir, the absolute path to the Csmith runtime, in Config.py. For example:
runtime_dir = '/home/fuzz/Documents/DecFuzzer/runtime/'
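Since both Config.py entries must be absolute paths that exist on the local machine, a quick check before launching a long run can save time. The following is a minimal sketch, not part of the artifact: the helper name check_config_paths is hypothetical, and it only assumes that the two values are the ones assigned to RetDec_absolute_path and runtime_dir in Config.py.

```python
import os


def check_config_paths(retdec_path, runtime_dir):
    """Return a list of problems found; an empty list means both paths look usable.

    Hypothetical helper: pass in the values of RetDec_absolute_path and
    runtime_dir from Config.py.
    """
    problems = []

    # The decompiler script must be given as an absolute path to an existing file.
    if not os.path.isabs(retdec_path):
        problems.append("RetDec_absolute_path is not absolute: " + retdec_path)
    elif not os.path.isfile(retdec_path):
        problems.append("RetDec_absolute_path is not an existing file: " + retdec_path)

    # The Csmith runtime must be given as an absolute path to an existing directory.
    if not os.path.isabs(runtime_dir):
        problems.append("runtime_dir is not absolute: " + runtime_dir)
    elif not os.path.isdir(runtime_dir):
        problems.append("runtime_dir is not an existing directory: " + runtime_dir)

    return problems
```

For example, calling check_config_paths with the two values from Config.py and printing the returned list shows any path that still needs to be fixed.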
python3 run.py
The script run.py runs the fuzzing test on RetDec and r2ghidra separately. It first tests the 1000 Csmith-generated programs in the directory ./seed_for_[retdec|r2]; the results are stored in ./seed_for_[retdec|r2]/result/ and ./seed_for_[retdec|r2]/error/, and the EMI variants are stored in ./seed_for_[retdec|r2]/emi/.
It then tests all generated EMI variants; the results are stored in a similar manner.
The whole process takes several hours to finish. While it is unlikely to produce exactly the same numbers (since randomness is involved in generating EMI mutations), the results should be very close to those reported in Table 3 of our paper.
Meanwhile, to make it easier to understand and check the results reported in the paper, we also provide all Csmith-generated programs and EMI mutations that can be used to reproduce the findings in Table 3; you can download them from here.
You can then reproduce the experiment results using the reproduce.py script we provide. It takes two steps:
Step 1
Put all the C source files to be tested in a directory. For instance, to test RetDec, downloading the corresponding folder from Dropbox will result in the following folder structure:
➜ ~ tree retdec_folder
retdec_folder
├── csmith_files
│   ├── error
│   └── result
└── emi_files
    ├── error
    └── result
Our current results are put inside “error” and “result” subfolders. So before testing, consider removing those four subfolders.
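Removing the existing "error" and "result" subfolders can be scripted. The sketch below is an illustration under assumptions rather than part of the artifact: the helper name clear_previous_results is hypothetical, and it simply deletes any folder named error or result found directly inside each immediate subdirectory of the downloaded folder, leaving the C files untouched.

```python
import os
import shutil


def clear_previous_results(root, names=("error", "result")):
    """Delete `error` and `result` folders one level below each subdirectory of `root`.

    Hypothetical helper; returns the list of removed paths so the caller
    can see exactly what was deleted.
    """
    removed = []
    for entry in sorted(os.listdir(root)):
        subdir = os.path.join(root, entry)
        if not os.path.isdir(subdir):
            continue  # skip plain files at the top level
        for name in names:
            target = os.path.join(subdir, name)
            if os.path.isdir(target):
                shutil.rmtree(target)
                removed.append(target)
    return removed
```

Calling clear_previous_results("retdec_folder") on the structure above would remove the four subfolders. The deletion is permanent, so keep a copy of the provided results if you want to compare against them later.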
Step 2
Run ./reproduce.py as follows:
python3 ./reproduce.py --decompiler <decompiler name> --files_dir <directory to C files>
For instance,
python3 ./reproduce.py --decompiler retdec --files_dir retdec_folder/emi_files
And
python3 ./reproduce.py --decompiler retdec --files_dir retdec_folder/csmith_files
reproduce.py is designed so that users who want to test any of our four decompilers can set the --decompiler parameter to retdec, r2, jeb or ida. In addition to directly taking EMI or Csmith-generated C files as inputs, reproduce.py provides another option, --EMI, which enables the generation of new EMI variants during testing. For example:
python3 ./reproduce.py --decompiler r2 --files_dir ./radare2_folder/csmith_files --emi_dir ./new_seed_for_radare2/emi --EMI
Here, --emi_dir accompanies --EMI and specifies the output directory for the newly generated EMI variants.
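To make the relationship between these options concrete, here is a minimal argparse sketch of a command-line interface with the same four flags. It is an illustration only and not the actual argument handling in reproduce.py: the function names are hypothetical, and the rule that --EMI is rejected without --emi_dir is an assumption based on the description above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags described for reproduce.py (sketch only)."""
    parser = argparse.ArgumentParser(description="Sketch of a reproduce.py-style CLI")
    parser.add_argument("--decompiler", required=True,
                        choices=["retdec", "r2", "jeb", "ida"],
                        help="decompiler to test")
    parser.add_argument("--files_dir", required=True,
                        help="directory containing the C files to test")
    parser.add_argument("--emi_dir", default=None,
                        help="output directory for newly generated EMI variants")
    parser.add_argument("--EMI", action="store_true",
                        help="generate new EMI variants while testing")
    return parser


def parse_cli(argv):
    """Parse argv and enforce the assumed rule that --EMI needs --emi_dir."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.EMI and args.emi_dir is None:
        # Assumption: new variants need a destination, so refuse to continue.
        parser.error("--EMI requires --emi_dir")
    return args
```

With this sketch, parse_cli(["--decompiler", "retdec", "--files_dir", "retdec_folder/emi_files"]) yields a namespace whose EMI attribute is False, matching the plain testing mode.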
As noted in our paper, suppose a C file 10.c is to be tested. It is compiled first:
10.c == compile ==> 10
Then the executable 10 is decompiled by the corresponding decompiler:
10 == decompile ==> 10_retdec.c or 10_r2.c
We then try to generate a new compilable file by replacing the func_1 function in 10.c with the code in 10_retdec.c or 10_r2.c:
10_retdec.c or 10_r2.c == replace ==> 10_new.c == recompile ==> 10_new
If recompilation fails, the source code is stored in the error folder and the error information is logged in error/error_log.txt.
Finally, we compare the outputs of 10 and 10_new. If they differ, the program is stored in the result folder and logged in result/result_log.txt.
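The file naming and the final comparison step can be sketched as follows. This is a simplified illustration, not the code in checker.py: the function names are hypothetical, and the sketch assumes that "output" means the bytes a program writes to standard output (exit codes and stderr are ignored).

```python
import subprocess


def artifact_names(source_path, decompiler):
    """Derive the file names used in the workflow, e.g. 10.c -> 10, 10_retdec.c, 10_new.c, 10_new."""
    base = source_path[:-2] if source_path.endswith(".c") else source_path
    return {
        "binary": base,
        "decompiled": base + "_" + decompiler + ".c",
        "new_source": base + "_new.c",
        "new_binary": base + "_new",
    }


def capture_stdout(cmd, timeout=10):
    """Run a command and return what it wrote to stdout (assumption: stderr is ignored)."""
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL, timeout=timeout)
    return proc.stdout


def outputs_differ(cmd_a, cmd_b, timeout=10):
    """Return True when the two commands print different output.

    A subprocess.TimeoutExpired exception propagates to the caller, who
    decides how to treat programs that do not terminate in time.
    """
    return capture_stdout(cmd_a, timeout) != capture_stdout(cmd_b, timeout)
```

In this sketch, outputs_differ(["./10"], ["./10_new"]) returning True corresponds to the case that would be recorded in the result folder.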