NeurIPS 2025 Artifact
This repository accompanies our NeurIPS 2025 paper, “Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency.” If you want to re-run the full pipeline (prompting, constraint synthesis, fuzzing, execution), use the steps below.
Our benchmark, PerfForge, is available at https://github.com/UChiSeclab/perfforge.
- code-contest-exp/: End-to-end experimental pipeline and Python dependencies.
- feedback_collection/: Coverage/feedback collection for cpp/, python/, and java/.
- scripts/: Orchestration utilities (collection, generation, post-processing).
- scripts/compile-stats/: Stats aggregation, LaTeX tables, and plots from JSON outputs.
- use-cases/: Example problems with solution variants.
- llm-prompts/: Prompts for constraint, validator, and mutator generation.
- README-for-supplemental-materials.md: Supplemental notes for artifact consumers.
- Ubuntu 20.04+ (tested)
- Python 3.9+ (for the main pipeline), plus Python 2.7/3.8 for legacy solution runners
- GCC/GCov (for C++ coverage)
- Linux perf (5.15+ recommended)
- Java 11 (for Java solutions’ coverage)
- AFL++ (for fuzzing-based test generation)
We use three conda envs: perf (main), py27 (Python2 solutions), py38 (Python3 solutions).
# Miniconda (if needed)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# Environments
conda create -n py27 python==2.7.13
conda create -n py38 python==3.8
conda create -n perf python==3.9
conda activate perf
cd code-contest-exp
pip install -r requirements.txt

# Java 11 (OpenJDK)
wget https://download.java.net/java/ga/jdk11/openjdk-11_linux-x64_bin.tar.gz
tar -zxvf openjdk-11_linux-x64_bin.tar.gz

# Linux perf
sudo apt-get update
sudo apt-get install -y linux-tools-common linux-tools-generic linux-tools-`uname -r`
sudo sysctl kernel.perf_event_paranoid=-1

# C++
which gcov || sudo apt-get install -y gcovr gcc g++
pip install gcovr
# Python
pip install -r ../feedback_collection/python/requirements.txt

# AFL++
mkdir -p aflpp && cd aflpp
git clone https://github.com/AFLplusplus/AFLplusplus.git
cd AFLplusplus
git checkout v4.30c
export AFLPP_DIR=$(pwd)
# Build per AFL++ docs:
# https://github.com/AFLplusplus/AFLplusplus/blob/stable/docs/INSTALL.md

cd code-contest-exp
export PYTHONPATH=$(pwd)/scripts:../feedback_collection

Below is the minimal end-to-end workflow to (i) prepare problems and baselines, (ii) synthesize constraints, (iii) generate tests via prompting and fuzzing, (iv) sanitize, and (v) execute and aggregate results.
Tip: Create logs upfront to track progress.
mkdir -p log/{constraint_gen,corpus_gen,cov_collect,dump,feedback_collect,gen,mutator_gen,validator_gen,run}

- Choose problem IDs (see use-cases/ for examples or configure your own set).
- Update the first element of code-contest-exp/problem_list.json with your list (used when use_specified_problem=True in get_cf_problems).
- Materialize problem assets:

python scripts/add_problem.py

Running the alphacode experiment generates result JSONs under config["result_root_dir"]/alphacode, which later steps use.

python code-contest-exp/scripts/run.py --experiment_name alphacode > log/run/alphacode.log

Validator generation stores validators under config["problem_root_dir"]/{problem_id}/validator_gen/{validator_mode}. The run may retry until a “good” validator is found (indicated by VAL_GT_INPUT_PASS).

python code-contest-exp/scripts/input_validator_gen.py --validator_mode self_reflect_feedback > log/validator_gen/self_reflect_feedback.log

We support C++/Python/Java; C++ is the most robust in our artifact.
python code-contest-exp/scripts/feedback_collect.py --experiment_name feedback_diff_solution --solution_language cpp > log/feedback_collect/feedback_diff_solution.log 2>&1
python code-contest-exp/scripts/feedback_collect.py --experiment_name feedback_diff_input --solution_language cpp > log/feedback_collect/feedback_diff_input.log 2>&1
python code-contest-exp/scripts/feedback_collect.py --experiment_name feedback_multi_solution_diff_input --solution_language cpp --top_k 5 > log/feedback_collect/feedback_multi_solution_diff_input.log 2>&1
python code-contest-exp/scripts/collect_cov_report.py --experiment_name alphacode

Mine contrasting input pairs (ranked by similarity and performance ratio):

python code-contest-exp/scripts/cgig/mine_input_pairs.py

Synthesize performance constraints and instrument solutions:

python code-contest-exp/scripts/cgig/constraint_gen.py > log/constraint_gen/constraint_gen.log

Outputs include: GPT response files, instrumented programs, and at most 5 constraints per problem under config["constraints_dir"].
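The pair-mining step above ranks input pairs by similarity and performance ratio. The sketch below illustrates that idea only; it is not the logic of mine_input_pairs.py. It assumes each input is a text string with one measured cost, uses difflib.SequenceMatcher as the similarity measure, and the names rank_input_pairs and min_ratio are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def rank_input_pairs(costs, min_ratio=2.0):
    """Rank (fast, slow) input pairs: most similar first, then largest slowdown.

    costs: dict mapping input text -> measured cost (hypothetical unit, e.g.
    an instruction count). min_ratio is an illustrative threshold, not a
    value taken from the artifact.
    """
    ranked = []
    for a, b in combinations(costs, 2):
        fast, slow = sorted((a, b), key=costs.get)
        if costs[fast] <= 0:
            continue  # skip degenerate measurements to avoid dividing by zero
        ratio = costs[slow] / costs[fast]
        if ratio < min_ratio:
            continue  # the pair does not contrast enough in performance
        similarity = SequenceMatcher(None, fast, slow).ratio()
        ranked.append((similarity, ratio, fast, slow))
    # Highest similarity first; ties broken by the larger performance ratio.
    ranked.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return ranked
```

Pairs near the top differ little in content but a lot in cost, which is what makes them useful for spotting the input property that triggers the slow path.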
The following step generates a custom AFL++ mutator and validates it with a short dry-run (success yields MUTATOR_CHECK_PASS).

python code-contest-exp/scripts/cgig/mutator_gen.py --mutator_type mutator_with_constraint_multi --problem_with_extracted_constraint_only True --mutator_mode self_reflect_feedback > log/mutator_gen/mutator_with_constraint.log 2>&1

Prompting strategies (examples):
- Plain problem prompting (no constraints):
python code-contest-exp/scripts/gen_tests.py \
--experiment_name plain_problem \
--prompt_language java \
--run_tests True \
--validator_mode self_reflect_feedback \
--max_retry 10

- With feedback (multi-solution diffs):
python code-contest-exp/scripts/gen_tests.py \
--experiment_name with_feedback_multi_solution_diff_input \
--prompt_language java \
--run_tests True \
--validator_mode self_reflect_feedback \
--max_retry 10

Fuzzing strategies (AFL++):

python code-contest-exp/scripts/cgig/corpus_gen.py --mutator_type mutator_with_constraint_multi --problem_with_extracted_constraint_only True > log/corpus_gen/mutator_with_constraint_multi.log 2>&1

Sanitize AlphaCode tests with generated validators:

python code-contest-exp/scripts/sanitize_alphacode_result.py

This creates config["problem_root_dir"]/{problem_id}/alphacode_sanitized (evaluated as a separate strategy).
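Sanitization, as described above, filters existing tests through a generated validator. The sketch below shows the filtering pattern under a simplifying assumption: the validator is a Python callable that returns True for well-formed inputs. How the artifact actually invokes its validators is not covered here, and sanitize_tests and toy_validator are hypothetical names.

```python
def sanitize_tests(tests, validator):
    """Split tests into (kept, rejected) according to a validator callable."""
    kept, rejected = [], []
    for test in tests:
        try:
            ok = bool(validator(test))
        except Exception:
            ok = False  # an input that crashes the validator is rejected
        (kept if ok else rejected).append(test)
    return kept, rejected


def toy_validator(text):
    """Hypothetical constraint: first line n (1 <= n <= 10), then n integers."""
    lines = text.strip().split("\n")
    n = int(lines[0])
    values = [int(tok) for tok in lines[1].split()]
    return 1 <= n <= 10 and len(values) == n
```

Treating a validator crash as a rejection keeps malformed inputs out of the sanitized set instead of aborting the whole pass.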
Validate and dump AFL++ corpora into per-problem strategy directories:
python code-contest-exp/scripts/cgig/dump_corpus_inputs.py --mutator_type mutator_with_constraint_multi --problem_with_extracted_constraint_only True > log/dump/corpus_mutator_with_constraint_multi.log

Prompting strategies:

python code-contest-exp/scripts/run.py --experiment_name ${strategy} >> log/run/${strategy}.log

Fuzzing strategies:

python code-contest-exp/scripts/run.py --experiment_name ${strategy} --problem_with_extracted_constraint_only True >> log/run/${strategy}.log

You can compile cross-technique statistics and figures from the experiment outputs using scripts/compile-stats:
Example:
python3 scripts/compile-stats/main.py \
--json-dir /abs/path/to/results/json/root \
--output-csv-dir ./scripts/compile-stats/csv_out \
--plot-output-dir ./scripts/compile-stats/plots \
--language cpp

This populates CSVs and plots under scripts/compile-stats/{csv_out,plots}. To customize the techniques, baselines, styles, and problem set used for the compiled figures, edit scripts/compile-stats/config.py.
- Long-running steps: Coverage collection, constraint synthesis, and AFL++ fuzzing can be time-consuming. We recommend starting with a small problem subset to validate your setup end-to-end, then scaling out.
- LLM access: The validator/constraint/mutator steps use LLM prompting. Configure your provider credentials via environment variables as appropriate for your setup. See scripts under scripts/cgig/ and llm-prompts/.
- Monitoring: Check log files and output directories continuously (e.g., results/${strategy} growth during runs).
- Languages: While C++ has the most mature support in this artifact, we provide Python/Java paths for completeness.
- perf permissions: Make sure kernel.perf_event_paranoid=-1. A reboot may be necessary on some systems after changing sysctl.
- AFL++ build: Follow the upstream install docs for your distro and ensure AFLPP_DIR is exported in the shell running our scripts.
- AFL++ hack: We increased the max file size that AFL++ can mutate from 1MB to 10MB by modifying #define MAX_FILE (1 * 1024 * 1024L) in include/config.h.
- gcov/coverage mismatch: Ensure your compiler and gcov versions are compatible; recompile solutions if needed.
- Java coverage: Ensure Java 11 runtime and the provided coverage jars are accessible (see feedback_collection/java/lib/).
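Several of the notes above can be checked before starting a long run. The sketch below is a hypothetical preflight helper, not part of the artifact: it checks that AFLPP_DIR is set, reads kernel.perf_event_paranoid from /proc/sys/kernel/perf_event_paranoid (assuming a Linux host), and looks for gcov, perf, and java on PATH. The function name preflight and its parameters are assumptions made for illustration.

```python
import os
import shutil
from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def preflight(env=None, paranoid_text=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []

    # AFLPP_DIR must be exported in the shell that runs the pipeline scripts.
    if not env.get("AFLPP_DIR"):
        problems.append("AFLPP_DIR is not set")

    # The setup steps set kernel.perf_event_paranoid to -1.
    if paranoid_text is None and PARANOID_PATH.exists():
        paranoid_text = PARANOID_PATH.read_text()
    if paranoid_text is None:
        problems.append("cannot read kernel.perf_event_paranoid")
    elif int(paranoid_text.strip()) != -1:
        problems.append("kernel.perf_event_paranoid is not -1")

    # gcov, perf, and java are used for coverage and measurement.
    for tool in ("gcov", "perf", "java"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Calling preflight() with no arguments inspects the current machine; the optional parameters exist only so the checks can be exercised without touching the real environment.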
@inproceedings{
yang2025synthesizing,
title={Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency},
author={Jun Yang and Cheng-Chi Wang and Bogdan Alexandru Stoica and Kexin Pei},
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
year={2025},
url={https://openreview.net/forum?id=Qh458ZamHm}
}

Licensing may vary across components. Please see per-directory notices. If in doubt, contact the authors.