D3-Gym

D3-Gym is the first automatically constructed dataset of verifiable environments for Data-Driven Discovery. It contains 565 tasks derived from 239 real-world multi-disciplinary scientific repositories.

Each task includes:

a natural language instruction,
an executable environment with pre-installed dependencies,
input datasets and artifact previews,
a reference implementation,
and an automatically generated evaluation script.

Using D3-Gym Environments

All task environments are distributed as Docker images via Docker Hub.

Each image is a self-contained unit representing a single data-driven discovery task. It includes the task specification, datasets and previews, reference outputs, and evaluation script, along with pre-installed dependencies.

To solve a task, provide a solution.py that:

reads the provided datasets, and
writes outputs to pred_results/.
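
As an illustration, a solution.py could be shaped roughly like the sketch below. Everything specific in it is a hypothetical placeholder: the row-count summary, the output file name `summary.csv`, and the assumption that the script runs with `/task` as its working directory. The actual inputs and expected outputs are defined by each task's instruction and dataset previews.

```python
"""Hypothetical solution.py skeleton: read inputs, write outputs to pred_results/."""
import csv
from pathlib import Path


def solve(datasets_dir: Path, out_dir: Path) -> Path:
    """Count the data rows of every CSV under datasets_dir and write a summary CSV.

    The summary itself is a placeholder; a real task defines what to compute.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for csv_path in sorted(datasets_dir.rglob("*.csv")):
        with csv_path.open(newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row, if any
            n_rows = sum(1 for _ in reader)
        rows.append((csv_path.name, n_rows))

    out_path = out_dir / "summary.csv"  # hypothetical output name
    with out_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "n_rows"])
        writer.writerows(rows)
    return out_path


if __name__ == "__main__":
    # Assumption: the script is executed from the task root, so these
    # relative paths resolve to /task/datasets and /task/pred_results.
    solve(Path("datasets"), Path("pred_results"))
```

Keeping the logic in a function that takes both directories as arguments makes it possible to try the script locally against a copy of the data before mounting it into the container.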

The evaluation script compares your outputs against the reference and returns a pass/fail decision with a short explanation.

For easier browsing, we also provide an annotation sheet with metadata for all tasks on HuggingFace.

Quick Start

Pull a task image and inspect it:

docker pull hananemoussa/d3-gym:task_1
docker run --rm hananemoussa/d3-gym:task_1 inspect

Run your solution and evaluate:

docker run --rm \
  -v "$(pwd)/solution.py:/task/solution.py:ro" \
  hananemoussa/d3-gym:task_1 run_and_eval

Environment Structure

Each Docker image exposes the following directory layout:

/task/
  task_instruction.txt     # task description
  datasets/                # input data (CSV, JSON, images, etc.)
  *_preview.txt            # dataset schema previews
  eval_script.py           # evaluation logic
  gold_results/            # reference outputs
  pred_results/            # expected location for your outputs
  entrypoint.sh            # command routing
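
To get oriented inside a container (for example from the interactive shell), the layout above can be read programmatically. The following is a minimal sketch, assuming the preview files sit directly under the task root as shown; the helper name `describe_task` is hypothetical and not part of D3-Gym.

```python
from pathlib import Path


def describe_task(task_dir: Path) -> dict:
    """Collect the instruction, dataset previews, and dataset file list of a task."""
    instruction_file = task_dir / "task_instruction.txt"
    instruction = instruction_file.read_text() if instruction_file.exists() else ""

    # Preview files are matched by the *_preview.txt pattern from the layout.
    previews = {
        p.name: p.read_text() for p in sorted(task_dir.glob("*_preview.txt"))
    }

    # List every input file, relative to datasets/, in a stable order.
    datasets_dir = task_dir / "datasets"
    datasets = []
    if datasets_dir.is_dir():
        datasets = sorted(
            p.relative_to(datasets_dir).as_posix()
            for p in datasets_dir.rglob("*")
            if p.is_file()
        )

    return {"instruction": instruction, "previews": previews, "datasets": datasets}
```

Inside a container, `describe_task(Path("/task"))` would return the task description together with the schema previews, which is usually enough context to start drafting a solution.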

Providing Your Solution or Outputs

# Run and evaluate a solution
docker run --rm \
  -v "$(pwd)/solution.py:/task/solution.py:ro" \
  hananemoussa/d3-gym:task_151 run_and_eval

# Evaluate precomputed results
docker run --rm \
  -v "$(pwd)/my_results:/task/pred_results:ro" \
  hananemoussa/d3-gym:task_151 eval

# Interactive debugging session
docker run --rm -it hananemoussa/d3-gym:task_151 shell
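
When many tasks need to be run, the commands above can be driven from a script. The sketch below builds the same `run_and_eval` invocation in Python. The function names are hypothetical, and the `task_<n>` tag pattern is inferred from the examples in this README. Because the exit code and output format of the evaluation are not specified here, the sketch returns the raw process result without parsing it.

```python
import subprocess
from pathlib import Path

IMAGE_REPO = "hananemoussa/d3-gym"  # image repository used in the commands above


def run_and_eval_cmd(task_id: int, solution: Path) -> list[str]:
    """Build the docker command that runs and evaluates one solution file."""
    solution = solution.resolve()  # docker bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",
        "-v", f"{solution}:/task/solution.py:ro",
        f"{IMAGE_REPO}:task_{task_id}",  # assumed tag pattern: task_<n>
        "run_and_eval",
    ]


def run_task(task_id: int, solution: Path) -> subprocess.CompletedProcess:
    """Run the container and return its result; stdout and stderr are captured as text."""
    return subprocess.run(
        run_and_eval_cmd(task_id, solution),
        capture_output=True,
        text=True,
        check=False,  # leave interpretation of the exit code to the caller
    )
```

Passing the command as an argument list rather than a shell string avoids quoting problems when the solution path contains spaces.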

Downstream Use Cases

D3-Gym supports workflows that require executable environments with verifiable evaluation signals for data-driven discovery, such as reinforcement learning and self-improvement.

One use case is generating training trajectories (e.g., reasoning traces and solutions). The trajectories used in our experiments are available on HuggingFace.

Disclaimer

The repositories used to create D3-Gym are mostly under permissive licenses, with the remainder under copyleft, Creative Commons, or custom licenses. We provide a full breakdown of licenses below. There are also 39 repositories that do not provide any license information; we assume these permit use for research purposes.

License Distribution

License	Count
MIT	99
GNU (GPL, AGPL, LGPL)	43
None	39
BSD	29
Apache	22
CC	4
ISC	1
Custom	2
Total	239

Custom-Licensed Repositories

BrainIAC
DeepDelta

Citation

If you find our paper or resources useful in your work, please cite us:

@article{d3gym2026,
  title   = {D3-Gym: Constructing Verifiable Environments for Data-Driven Discovery},
  author  = {Hanane Nour Moussa and Yifei Li and Zhuoyang Li and Yankai Yang and Cheng Tang and Tianshu Zhang and Nesreen K. Ahmed and Ali Payani and Ziru Chen and Huan Sun},
  journal = {arXiv preprint arXiv:2604.27977},
  year    = {2026},
  url     = {https://arxiv.org/abs/2604.27977}
}
