Thank you for evaluating this artifact!
To evaluate this artifact, you need a Linux machine with Docker installed.
- CDD simplifies ProbDD by substituting probabilities with counters, which eases comprehension and implementation.
- Across various initial settings, CDD consistently performs comparably to ProbDD, or even better.
- All the experiments take a long time to finish, so it is recommended to use tools like `screen` or `tmux` to manage sessions if the experiments are run on a remote server. We also provide flags for multi-processing.
- The evaluation results may not be exactly the same as shown in the paper, because both ProbDD and CDD are affected by randomness. Replicating the experiments multiple times will reduce such impact. However, the deviation should be trivial, and the results should still support the original claims in the paper.
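The first note above (counters in place of probabilities) can be sketched as follows. This is an illustrative contrast only: the function names and update rules are simplified stand-ins, not the exact formulas used by ProbDD or CDD.

```python
# Illustrative contrast only -- simplified stand-ins, not the exact
# update rules from ProbDD or CDD.

def prob_update(p, kept_after_failed_deletion):
    """ProbDD-style: track per-element state with a probability."""
    if kept_after_failed_deletion:
        # Hypothetical multiplicative bump toward "likely essential".
        return min(1.0, p * 1.5)
    return p

def counter_update(c, kept_after_failed_deletion):
    """CDD-style: track the same signal with a plain integer counter,
    which is simpler to reason about and to implement."""
    if kept_after_failed_deletion:
        return c + 1
    return c
```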
If Docker is not installed, install it by following Docker's official installation instructions.
Pull the Docker image:

```shell
# This step might take a while, depending mainly on network bandwidth.
# The image also takes up a lot of disk space (nearly 80 GB).
docker pull codesubmission/cdd:latest
```
Start a container:

```shell
docker container run --cap-add SYS_PTRACE --interactive --tty codesubmission/cdd:latest /bin/bash
# You should be at /tmp after the above command finishes.
# Your user name should be `coq`, and all the following commands are
# executed inside Docker. The root folder of the project is /home/coq/cdd.
cd /home/coq/cdd
```
In this project, the benchmark suites are in the folder `./benchmarks`:

- `./benchmarks/compilerbug`: 20 cases for program reduction.
- `./benchmarks/debloating`: 10 cases for software debloating.
In the container, run the following commands to build the tools:

```shell
cd /home/coq/cdd
./scripts/build_hdd.sh
./scripts/build_chisel.sh
```
Evaluate DDMIN, ProbDD, and CDD on 20 programs triggering compiler bugs:

```shell
cd /home/coq/cdd

# Evaluate the algorithms on 20 compiler bugs.
# ddmin (around 53 hours given a single process):
./scripts/run_hdd.sh --args_for_picireny "--dd ddmin"
# ProbDD (around 25 hours given a single process):
./scripts/run_hdd.sh --args_for_picireny "--dd probdd"
# CDD (around 25 hours given a single process):
./scripts/run_hdd.sh --args_for_picireny "--dd counterdd"

# To evaluate multiple benchmarks concurrently, use the flag --max_jobs, for example:
./scripts/run_hdd.sh --args_for_picireny "--dd ddmin" --max_jobs "8"
# To evaluate a specific benchmark, use the flag --benchmark, for example:
./scripts/run_hdd.sh --args_for_picireny "--dd ddmin" --benchmark "clang-22382"
```
Results and logs.

Note that every time you start `./scripts/run_hdd.sh`, a folder named by the current timestamp is created in `~/cdd/results/hdd`. For instance, if the current time is 2023/09/12, 23:06:25, all results produced by that run will be saved in `~/cdd/results/hdd/20230912230625/`. Besides, a `config.txt` recording the options used in the run is saved under the folder `20230912230625`.

Summarize the results of the run:

```shell
cd ~/cdd/results/hdd/20230912230625/
python ~/cdd/script/summarize_hdd.py .
```
Then, the file `summary.csv` will be saved in `~/cdd/results/hdd/20230912230625/`. In `summary.csv`, data such as time, final size, and query number for each benchmark are displayed.
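For a quick programmatic look at `summary.csv`, a small sketch like the following can be used. The column names in the usage comment are assumptions, so check the real header produced by `summarize_hdd.py`.

```python
import csv

def load_summary(path):
    """Read a summary.csv into a list of per-benchmark row dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Usage (inside a results folder such as ~/cdd/results/hdd/<timestamp>/):
#   for row in load_summary("summary.csv"):
#       print(row)  # e.g. keys like "benchmark", "time", "queries" (assumed)
```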
Evaluate DDMIN, ProbDD, and CDD on 10 programs in software debloating.

Similar to how we evaluate algorithms on `compilerbug`, just run `./scripts/run_chisel.sh` with the correct options:

```shell
# ddmin
./scripts/run_chisel.sh --args_for_chisel "--algorithm ddmin"
# ProbDD
./scripts/run_chisel.sh --args_for_chisel "--algorithm probdd"
# CDD
./scripts/run_chisel.sh --args_for_chisel "--algorithm counterdd"

# To run multiple benchmarks concurrently, use --max_jobs:
./scripts/run_chisel.sh --args_for_chisel "--algorithm ddmin" --max_jobs "8"
# To run a specific benchmark, use --benchmark:
./scripts/run_chisel.sh --args_for_chisel "--algorithm ddmin" --benchmark "mkdir-5.2.1"
```
Similarly, results will be stored in a folder named by the current timestamp under `~/cdd/results/chisel`. Run `summarize_chisel.py` to generate `summary.csv`:

```shell
cd ~/cdd/results/chisel/20230912230625/
python ~/cdd/script/summarize_chisel.py .
```
In RQ1 and RQ2, the initial probability is 0.1 by default. In this RQ, we explicitly specify different initial probabilities (0.05, 0.15, 0.2, 0.25) for evaluation. Everything else is the same as in RQ1 and RQ2.
```shell
# For the 20 cases about compiler bugs:
./scripts/run_hdd.sh --args_for_picireny "--dd probdd --init-probability 0.05"
# For the 10 cases about software debloating:
./scripts/run_chisel.sh --args_for_chisel "--algorithm probdd --init_probability 0.05"
```
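The four per-probability runs can also be generated in a loop. Below is a hypothetical sketch (`rq3_commands` is not part of the artifact) that prints the compiler-bug commands rather than executing them, so the output can be reviewed or piped into a shell:

```python
# Hypothetical driver for the RQ3 sweep; prints one run_hdd.sh command
# per initial probability instead of executing it.
PROBS = [0.05, 0.15, 0.2, 0.25]

def rq3_commands(probs=PROBS):
    return [
        f'./scripts/run_hdd.sh --args_for_picireny "--dd probdd --init-probability {p}"'
        for p in probs
    ]

for cmd in rq3_commands():
    print(cmd)
```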