
ConcoLLMic: Agentic Concolic Execution

License: GPL v3 / Commercial

Paper: IEEE S&P 2026


ConcoLLMic is the first language- and theory-agnostic concolic executor powered by LLM agents. Unlike traditional symbolic execution tools that require language-specific implementations and struggle with constraint solving, ConcoLLMic:

  • Works with any programming language and environment — supports C, C++, Python, Java, and more, including multi-language systems, and handles environment interactions efficiently without additional modeling.
  • Handles diverse constraint theories — including floating-point arithmetic, strings, structured data, and bit-level operations.
  • Achieves superior coverage — in ~4 hours, ConcoLLMic covers 115%-233% more branches than state-of-the-art tools like KLEE and 81% more than AFL++ running for 48 hours.
  • Finds real bugs — has already discovered 10+ previously unknown vulnerabilities in real-world software.

How it works: ConcoLLMic uses LLM agents to

  1. instrument code at source level with logging statements like fprintf(stderr, ...);
  2. summarize path constraints flexibly in natural language, code, or SMT formulas;
  3. solve constraints intelligently — reasoning directly over constraint semantics and delegating to grounded solvers when needed.

This eliminates the need for hand-crafted language-specific symbolic interpreters and environment modeling as well as expensive constraint solving.
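To make step 1 concrete, here is a hypothetical sketch (in Python, for brevity) of what agent-inserted logging might look like. The marker format and the function itself are illustrative, not actual ACE.py output:

```python
import sys

def classify(x: int) -> str:
    # Hypothetical source-level instrumentation: the agent inserts
    # stderr logging at entry and branch points so the concrete
    # execution path can be reconstructed from the trace alone.
    print(f"[trace] enter classify({x})", file=sys.stderr)
    if x > 0:
        print("[trace] branch (x > 0): taken", file=sys.stderr)
        return "positive"
    print("[trace] branch (x > 0): not taken", file=sys.stderr)
    return "non-positive"
```

From the stderr trace, the executor can tell which side of each branch a concrete input exercised, then negate the corresponding path constraint to target the other side.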





Workflow Example

Phase 1: Instrumentation

Instrumentation of ConcoLLMic

Example instrumentation of src/count.c from FP-Bench, produced by inserting fprintf calls at key points in the execution. The stderr output lets us reconstruct the dynamic execution path.

Phase 2: Concolic Testing

Testing Phase of ConcoLLMic

This shows one round of concolic execution with ConcoLLMic. Conventional tools produce verbose, implementation-level formulas such as:

start = atof(I[0]) ∧ end = atof(I[1]) ∧ start < end ∧ cur[0] = start ∧ 
(∀ 0 ≤ i < cnt, cur[i] ≠ end) ∧ cur[cnt] = end ∧ cnt ≤ 20

ConcoLLMic instead summarizes the path constraints concisely in high-level natural language:

The number of representable FP values in the range must be ≤ 20.
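When such a high-level constraint needs grounding, it can be checked concretely. A minimal Python sketch (assuming IEEE-754 doubles and using math.nextafter; this is not the tool's actual solver) of the "number of representable FP values ≤ 20" condition:

```python
import math

def fp_values_in_range(start: float, end: float) -> int:
    # Walk from start to end one representable double at a time,
    # mirroring the kind of loop suggested by the path constraint above.
    cnt = 0
    cur = start
    while cur != end:
        cur = math.nextafter(cur, end)
        cnt += 1
    return cnt

# Near 1.0 the spacing between doubles is 2**-52, so an input pair like
# (1.0, 1.0 + 5 * 2**-52) spans exactly 5 representable values and
# satisfies the constraint.
```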

Getting Started

Prerequisites

  • Python 3.10 or higher
  • API key for Claude Sonnet (from Anthropic)

Installation

  1. Clone the repository
git clone https://github.com/ConcoLLMic/ConcoLLMic.git
cd ConcoLLMic
  2. Install Python dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt # optional
  3. Set up your API key
export ANTHROPIC_API_KEY="your_api_key_here"

By default, ConcoLLMic uses Claude 4.5 Sonnet. You can configure other models in ACE.py (see setup_model()).

Running Your First Example

ConcoLLMic's workflow consists of two main steps: instrumentation & compilation and concolic execution. Below are two examples to get started:

Example 1: Bracket Matching Validation

A simple bracket matching validation program that checks if brackets are balanced.
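For context, the core logic of such a checker can be sketched as follows (a hypothetical Python reimplementation for illustration; the actual target is validate_brackets.cpp):

```python
def brackets_balanced(s: str) -> bool:
    # Classic stack-based check: every closing bracket must match the
    # most recently opened one, and nothing may remain open at the end.
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack
```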

Step 1: Instrumentation & Compilation

Instrumented code is already available in code_example/instr/validate_brackets.cpp.

To instrument from scratch:

rm ./code_example/instr/validate_brackets.cpp

python3 ACE.py instrument \
    --src_dir ./code_example/src/ \
    --out_dir ./code_example/instr/ \
    --instr_languages c,cpp,python,java

Note: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.

Compile the instrumented code normally:

g++ -o ./code_example/instr/validate_brackets ./code_example/instr/validate_brackets.cpp

Step 2: Concolic Execution

Run the concolic execution agent with the provided test harness code_example/harness/validate_brackets.py, a Python script that prepares the program input and executes the target:

python3 ACE.py run \
    --project_dir ./code_example/instr/ \
    --execution ./code_example/harness/validate_brackets.py \
    --out ./out/ \
    --rounds 2 \
    --parallel_num 3

Command Parameters Explained:

  • --project_dir: Directory containing the instrumented code and the compiled program
  • --execution: Test harness (a Python script that executes the target program and returns stderr + exit code)
  • --out: Output directory for generated test cases and logs
  • --rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)
  • --parallel_num: Maximum number of concurrent test case generations per round
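The harness contract can be sketched as follows (a hypothetical sketch; the actual interface expected by ACE.py is defined in code_example/harness/validate_brackets.py). The key idea is that the harness runs the instrumented binary and hands back the stderr trace plus the exit code:

```python
import subprocess
import sys

def execute_program(cmd, input_str):
    # Hypothetical harness sketch: run the target, feed it the test
    # input on stdin, and return (stderr, exit_code) so the agent can
    # reconstruct the executed path from the trace.
    proc = subprocess.run(
        cmd,
        input=input_str,
        capture_output=True,
        text=True,
    )
    return proc.stderr, proc.returncode

# Example with a stand-in target that just emits one trace marker:
stderr, code = execute_program(
    [sys.executable, "-c", "import sys; sys.stderr.write('[trace] ok\\n')"],
    "",
)
```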

Output:

  • ./out/ConcoLLMic_*.log — detailed execution log
  • ./out/queue/id:*.yaml — generated test cases with metadata

Step 3: View Statistics (Optional)

Analyze testing statistics (tokens used, costs):

python3 ACE.py run_data --out_dir ./out/

Step 4: Replay & Measure Coverage (Optional)

# Compile with coverage
cd ./code_example/src/
g++ --coverage -o validate_brackets validate_brackets.cpp
lcov -z -d .
cd ../../

# Replay test cases
python3 ACE.py replay \
    ./out/ \
    ./code_example/src/ \
    ./coverage.csv \
    --cov_script ./code_example/src/coverage.sh

Output:

  • ./coverage.csv — time-series coverage data for each test case
  • ./code_example/src/validate_brackets.cpp.gcov — line-by-line coverage report
Example 2: Floating-Point Counting

A program from FP-Bench that counts representable floating-point values in a range.

Step 1: Instrumentation & Compilation

Instrumented code is already available in code_example/instr/count.c.

To instrument from scratch:

rm ./code_example/instr/count.c

python3 ACE.py instrument \
    --src_dir ./code_example/src/ \
    --out_dir ./code_example/instr/ \
    --instr_languages c,cpp,python,java

Note: The --instr_languages flag is only used for filtering files by extension. No additional language-specific support is needed.

Compile the instrumented code normally:

gcc -o ./code_example/instr/count ./code_example/instr/count.c

Step 2: Concolic Execution

Run the concolic execution agent with the provided test harness code_example/harness/count.py, a Python script that prepares the program input and executes the target:

python3 ACE.py run \
    --project_dir ./code_example/instr/ \
    --execution ./code_example/harness/count.py \
    --out ./out/ \
    --rounds 2 \
    --parallel_num 3

Command Parameters Explained:

  • --project_dir: Directory containing the instrumented code and the compiled program
  • --execution: Test harness (a Python script that executes the target program and returns stderr + exit code)
  • --out: Output directory for generated test cases and logs
  • --rounds: Number of concolic execution rounds (default: unlimited until coverage plateau)
  • --parallel_num: Maximum number of concurrent test case generations per round

Output:

  • ./out/ConcoLLMic_*.log — detailed execution log
  • ./out/queue/id:*.yaml — generated test cases with metadata

Expected cost: ~$0.40 for 2 rounds with Claude-3.7-Sonnet

Step 3: View Statistics (Optional)

Analyze testing statistics (tokens used, costs):

python3 ACE.py run_data --out_dir ./out/

Step 4: Replay & Measure Coverage (Optional)

# Compile with coverage
cd ./code_example/src/
gcc --coverage -o count count.c
lcov -z -d .
cd ../../

# Replay test cases
python3 ACE.py replay \
    ./out/ \
    ./code_example/src/ \
    ./coverage.csv \
    --cov_script ./code_example/src/coverage.sh

Output:

  • ./coverage.csv — time-series coverage data for each test case
  • ./code_example/src/count.c.gcov — line-by-line coverage report

The gcov report will show that the bug at line 67 has been triggered.
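To inspect a .gcov report programmatically rather than by eye, a small parser can be sketched (assuming the standard count:lineno:source text format gcov emits, where '#####' marks instrumented lines that never executed and '-' marks non-code lines; the sample fragment below is illustrative, not the real count.c.gcov):

```python
def gcov_summary(report: str):
    # Tally executed vs. never-executed lines in a gcov text report.
    executed, missed = 0, 0
    for line in report.splitlines():
        parts = line.split(":", 2)
        if len(parts) < 3:
            continue
        count = parts[0].strip()
        if count in ("-", ""):
            continue  # non-code line (declarations, blanks)
        if count.startswith("#"):
            missed += 1  # instrumented but never executed
        else:
            executed += 1
    return executed, missed

# A fragment shaped like count.c.gcov (contents illustrative):
sample = (
    "        -:    0:Source:count.c\n"
    "        5:   10:    int cnt = 0;\n"
    "    #####:   42:    /* never executed */\n"
    "        1:   67:    /* bug site, executed once */\n"
)
```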


Command Reference

ConcoLLMic provides a unified CLI interface:

python3 ACE.py <command> [options]

Main Commands

Command          Description
instrument       Instrument source code for tracing
run              Execute concolic testing
replay           Replay generated test cases
instrument_data  Analyze instrumentation statistics
run_data         Display testing statistics and costs

For detailed options, run:

python3 ACE.py <command> --help

Or visit the documentation.


Running Large-Scale Experiments

To reproduce the paper's evaluation results or run ConcoLLMic on real-world benchmarks, see the experiments/README.md.

We provide Docker-based setups for:

  • C/C++ benchmarks
  • Multi-language systems
  • Comparison with other tools — KLEE, KLEE-Pending, AFL++, SymCC, SymSan

License

This project is dual licensed:

  • Non-commercial use: Licensed under GPLv3 for academic and non-commercial purposes
  • Commercial use: Please contact the authors for commercial licensing

For full license details, see LICENSE.

For commercial licensing inquiries, contact:


Citation

@inproceedings{luo2026concollmic,
  title={Agentic Concolic Execution},
  author={Luo, Zhengxiong and Zhao, Huan and Wolff, Dylan and Cadar, Cristian and Roychoudhury, Abhik},
  booktitle={2026 IEEE Symposium on Security and Privacy (S\&P)},
  pages={1--19},
  year={2026},
  organization={IEEE}
}
