This repository contains the official implementation of the Iterative Compositional Data Generation (ICDG) pipeline introduced in "Iterative Compositional Data Generation for Robot Control" (Pham et al.). ICDG is a self-improving generative pipeline for robotic manipulation that uses a semantic compositional diffusion transformer to synthesize high-quality expert data for unseen tasks.
Robotic manipulation domains often contain a combinatorial number of possible tasks, arising from combinations of different components, such as robots, objects, obstacles, and objectives. Collecting real demonstrations for all combinations is prohibitively expensive. ICDG leverages the underlying compositional structure of these domains to generalize far beyond the tasks it has been trained on, enabling large-scale capability growth from limited real data.
- Semantic Compositional Diffusion Transformer: Factorizes each transition into its task components (robot, object, obstacle, objective) and learns their interactions through attention, enabling strong compositional generalization.
- Zero-Shot Generation: Generates full state–action–next-state transitions for new task combinations that were never observed in real data.
- Iterative Self-Improvement: Synthetic data is evaluated using offline RL; only high-quality, policy-validated transitions are added back into the training pool, allowing the model to continuously refine itself without additional real data collection.
- Data Efficiency and Generalization: Trained on real data from approximately 20 percent of possible task combinations, ICDG generates useful data for the remaining tasks and ultimately solves nearly all held-out tasks.
- Emergent Compositional Structure: Attention patterns and intervention tests reveal that the model recovers meaningful task-factor dependencies, despite no hand-crafted structure being imposed.
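As a rough picture of this factorization, the sketch below embeds each task component of a transition as its own token and lets self-attention model the cross-component interactions. It is an illustration only, not the repository's model code; the class name, component feature sizes, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class FactorizedTransitionEncoder(nn.Module):
    """Illustrative sketch: embed each task component of a transition as its
    own token and let self-attention model the interactions between them."""

    def __init__(self, component_dims, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # One embedding per component (e.g. robot, object, obstacle, objective).
        self.embed = nn.ModuleList(nn.Linear(d, d_model) for d in component_dims)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, components):
        # components: list of (batch, dim_i) tensors, one per factor.
        tokens = torch.stack([f(x) for f, x in zip(self.embed, components)], dim=1)
        return self.encoder(tokens)  # (batch, n_components, d_model)

# Hypothetical feature sizes for four factors.
encoder = FactorizedTransitionEncoder(component_dims=[32, 16, 16, 8])
parts = [torch.randn(2, d) for d in (32, 16, 16, 8)]
print(encoder(parts).shape)  # torch.Size([2, 4, 128])
```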
- Python 3.9.6
- CUDA-capable GPU (for training diffusion models and policies)
- SLURM cluster access (for running experiments)
- Create a Python 3.9.6 virtual environment:

```bash
python3.9 -m venv first_3.9.6
source first_3.9.6/bin/activate  # On Linux/Mac
```

- Install dependencies from requirements.txt:

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

Note: The requirements.txt includes an editable install of CompoSuite from a specific git commit for reproducibility:

```
-e git+https://github.com/Lifelong-ML/CompoSuite.git@1fa36f67f31aeccc9ef75748bfc797960e044a86#egg=composuite
```
- Set up the data directory:
  - Download the expert datasets from Dryad
  - Organize the data according to the structure described in data/README.md
  - Only the expert datasets are needed for this project
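Once the environment and data are in place, a quick import check can confirm the installation; this assumes PyTorch and the editable CompoSuite package were both installed from requirements.txt.

```python
# Quick environment check: the editable CompoSuite install and PyTorch should both import.
import composuite
import torch

print("CUDA available:", torch.cuda.is_available())
```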
The main pipeline implements the iterative self-improvement procedure from the paper (see Figure 1). Each iteration consists of the following steps (a simplified sketch of the loop follows the list):
- Compositional Diffusion Training: Train the semantic compositional diffusion transformer on N expert datasets + M high-quality synthetic datasets from previous iterations
- Zero-shot Data Generation: Generate synthetic transitions for all remaining task combinations (all combinations minus the N + M tasks above)
- Offline RL Validation: Train policies on synthetic data and evaluate performance via offline RL
- Quality-based Filtering:
- Good datasets: Added to training set for next iteration (M synthetic datasets)
- Bad datasets: Removed from future generation cycles
- Iteration: Repeat until convergence or max iterations reached
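The loop can be summarized with the sketch below. Every function is a placeholder standing in for the repository's training, generation, and offline-RL evaluation scripts, and the task names and thresholds are illustrative.

```python
import random

# Placeholders standing in for the repository's training, generation, and
# offline-RL evaluation scripts.
def train_diffusion(real_tasks, synthetic_pool):
    return "diffusion_model"                 # stand-in for a trained model

def generate_transitions(model, task):
    return f"synthetic_dataset_for_{task}"   # stand-in for a samples_0.npz file

def evaluate_offline_rl(dataset):
    return random.random()                   # stand-in for a policy success rate

all_tasks = {f"task_{i}" for i in range(10)}     # all task combinations
train_tasks = {f"task_{i}" for i in range(3)}    # N real expert datasets
synthetic_pool = {}                              # M validated synthetic datasets
remaining = all_tasks - train_tasks
success_threshold, max_iterations = 0.8, 5

for iteration in range(max_iterations):
    model = train_diffusion(train_tasks, synthetic_pool)                 # 1. diffusion training
    candidates = {t: generate_transitions(model, t) for t in remaining}  # 2. zero-shot generation
    scores = {t: evaluate_offline_rl(d) for t, d in candidates.items()}  # 3. offline RL validation
    good = {t for t, s in scores.items() if s >= success_threshold}      # 4. quality filtering
    synthetic_pool.update({t: candidates[t] for t in good})              #    good data joins the training pool
    remaining -= good                                                    #    validated tasks need no regeneration
    # Bad candidate datasets are simply discarded.
    if not remaining:                                                    # 5. iterate until done
        break
```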
Run the pipeline:
```bash
python3 -u -m scripts.automated_iterative_diffusion_dits_iiwa \
    --max_iterations 5 \
    --num_train 14 \
    --diffusion_seed 0 \
    --curriculum_seed 0 \
    2>&1 | tee iterative_diffusion_0_dits_iiwa.out
```

Key arguments:
- --max_iterations: Maximum number of iterations to run
- --num_train: Number of training tasks (14 for the IIWA subset)
- --diffusion_seed: Random seed for diffusion model training
- --curriculum_seed: Random seed for curriculum schedule generation
- --success_threshold: Success rate threshold for good tasks (default: 0.8)
- --threshold_reduction_amount: Amount to reduce the threshold by when no good tasks are found (default: 0.1)
- --threshold_reduction_cycle: Number of consecutive iterations with no good tasks before reducing the threshold (default: 1)
- --min_threshold: Minimum threshold value (default: 0.5)
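The three threshold flags implement the adaptive-threshold behavior listed under Key Features: after a configurable number of consecutive iterations with no good tasks, the success threshold is lowered, but never below the minimum. Below is a minimal sketch of that rule using the defaults above; it is an illustration, not the script's exact code.

```python
def update_threshold(threshold, iterations_without_good,
                     reduction_amount=0.1, reduction_cycle=1, min_threshold=0.5):
    """Lower the success threshold after `reduction_cycle` consecutive
    iterations without any good tasks, never going below `min_threshold`."""
    if iterations_without_good >= reduction_cycle:
        threshold = max(min_threshold, threshold - reduction_amount)
    return threshold

print(update_threshold(0.8, iterations_without_good=1))  # ~0.7
print(update_threshold(0.5, iterations_without_good=3))  # stays at 0.5
```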
Output:
- Diffusion models: results/augmented_{iteration}/diffusion/
- Synthetic data: results/augmented_{iteration}/diffusion/{model_name}/{task}/samples_0.npz
- Policy checkpoints: results/augmented_{iteration}/policies/
- Analysis logs: scripts/policies_slurm_logs/
- Best test task dataset: results/best_testtask_dataset/
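Each samples_0.npz file can be inspected with NumPy. The array keys depend on how the generation script saves transitions, so the snippet below simply lists whatever is stored; the path is a made-up example.

```python
import numpy as np

# Example path only; substitute a real iteration, model name, and task folder.
path = "results/augmented_1/diffusion/my_model/my_task/samples_0.npz"

with np.load(path) as data:
    for key in data.files:
        print(key, data[key].shape)
```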
Run the transformer TD3+BC multitask baseline for comparison:
```bash
python3 -u -m scripts.run_transformer_baseline_pipeline \
    --num_train 14 \
    --seeds 10 11 12 13 14 \
    --memory 50 \
    --time 24 \
    2>&1 | tee multitask_Trans_OfflineRL_iwa_seed2.out
```

Key arguments:
- --num_train: Number of training tasks (14 for the IIWA subset)
- --seeds: List of random seeds to run (e.g., 10 11 12 13 14)
- --memory: Memory per job in GB (default: 50)
- --time: Time limit per job in hours (default: 24)
- --max_timesteps: Maximum training timesteps (default: 50000)
- --batch_size: Batch size (default: 1792)
Output:
- Model checkpoints: results/transformer_baseline/seed_{seed}/
- Results CSV: results/transformer_baseline/transformer_baseline_results.csv
- Training logs: scripts/transformer_baseline_logs/
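The results CSV can be inspected with pandas; its column layout is produced by the pipeline, so the snippet below just prints whatever is there rather than assuming specific column names.

```python
import pandas as pd

# Load the baseline results and show the available columns and first rows.
df = pd.read_csv("results/transformer_baseline/transformer_baseline_results.csv")
print(df.columns.tolist())
print(df.head())
```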
```
.
├── data/                                          # Dataset directory (see data/README.md)
├── results/                                       # Experiment results
│   ├── augmented_{iteration}/                     # Iterative diffusion results
│   └── transformer_baseline/                      # Transformer baseline results
├── scripts/                                       # Main scripts (baseline and large-scale scripts mirror the representative structure below)
│   ├── automated_iterative_diffusion_dits_iiwa.py # Main pipeline
│   ├── run_transformer_baseline_pipeline.py       # Transformer baseline
│   ├── train_augmented_diffusion.py               # Diffusion training
│   ├── train_augmented_policy.py                  # Policy training
│   └── generate_augmented_data_dits.py            # Data generation
├── diffusion/                                     # Diffusion model code
├── corl/                                          # Offline RL algorithms (TD3+BC, IQL)
├── config/                                        # Configuration files
└── requirements.txt                               # Python dependencies
```
- Semantic Compositional Architecture: Diffusion transformer with factorized components (robot, object, obstacle, objective)
- Iterative Self-Improvement: Each iteration uses validated high-quality synthetic tasks to improve the diffusion model
- Zero-shot Generation: Generates data for unseen task combinations without additional training
- Automatic Retry: Failed jobs are automatically retried with increased resources
- Curriculum Filtering: Component-specific curriculum filtering for iterations 5+ (optional)
- Adaptive Threshold: Success threshold automatically reduces if no good tasks are found
- Comprehensive Logging: Detailed logs and CSV analysis files for each iteration
Default paths are set in the script configuration classes. Modify these in the scripts if needed:
- base_path: Project root directory
- data_path: Path to expert datasets
- results_path: Path to save results
- tasks_path: Path to task list JSON files
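These fields typically sit in a small configuration class near the top of each script. The sketch below is only illustrative; the class name, default paths, and field types are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class PipelineConfig:
    # Illustrative defaults; the real scripts may use different names and values.
    base_path: Path = Path(".")               # project root directory
    data_path: Path = Path("data")            # expert datasets
    results_path: Path = Path("results")      # experiment outputs
    tasks_path: Path = Path("config/tasks")   # task list JSON files

# Override a path without editing the script body.
config = PipelineConfig(data_path=Path("/path/to/expert/data"))
print(config)
```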
If you use this code, please cite:
```bibtex
@article{pham2025iterative,
  title={Iterative Compositional Data Generation for Robot Control},
  author={Pham, Anh-Quan and Hussing, Marcel and Patankar, Shubhankar P. and Bassett, Dani S. and Mendez-Mendez, Jorge and Eaton, Eric},
  journal={arXiv preprint arXiv:2512.10891},
  year={2025},
}
```

Related Resources:
For inquiries, please contact Anh-Quan Pham.