A tool for benchmarking and hyper-parameter optimization using TerraTorch.
Leverages MLflow for experiment logging, Optuna for hyperparameter optimization, and Ray for parallelization.
We recommend using Python 3.10, 3.11, or 3.12, and using a virtual environment for all commands in this guide.
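For example, a virtual environment can be created and activated with Python's built-in `venv` module (the directory name `.venv` below is just a convention):

```
python3 -m venv .venv
source .venv/bin/activate
```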
Install from PyPI:

```
pip install terratorch-iterate
```

Alternatively, install from source:

```
git clone https://github.com/IBM/terratorch-iterate.git
cd terratorch-iterate
pip install --upgrade pip setuptools wheel
pip install -e .
pip install -e ".[dev]"
pip install -e ".[test]"
pip install -e ".[utilities]"
```

This tool allows you to design a benchmark test for a backbone that exists in TerraTorch over:
- Several tasks
- Several hyperparameter configurations
To do this, it relies on a configuration file where the benchmark is defined. This consists of:
- `experiment_name`: MLflow experiment to run the benchmark on. This is the highest-level grouping of runs in MLflow.
- `run_name`: Name of the parent (top-level) run under the experiment. NOTE: This item should not be included in the config if you wish to use the parameters extraction function in `mlflow_utils` to compile results.
- `defaults`: Defaults that are set for all tasks. Can be overridden under each task.
- `tasks`: List of tasks to perform. Tasks specify parameters for the decoder, the datamodule to be used, and training parameters.
- `n_trials`: Number of trials to be carried out per task, in the case of hyperparameter tuning.
- `save_models`: Whether to save models. Defaults to False. (Setting this to true can take up a lot of space.) Models will be logged as artifacts for each run in MLflow.
- `optimization_space`: Hyperparameter space to search over. Bayesian optimization tends to work well with a small number of hyperparameters.
- `run_repetitions`: Number of repetitions to be run using the best settings from optimization.
- `storage_uri`: Location to use for MLflow storage for the hyperparameter optimization (HPO) stage. During optimization, additional folders will be created in the parent directory of `storage_uri`. For example, if `storage_uri` is `/opt/benchmark_experiments/hpo`, additional folders will include:

```
/opt/benchmark_experiments/
├── hpo_results
├── hpo_repeated_exp
├── repeated_exp_output_csv
├── job_logs
└── optuna_db
```
Please see `configs/benchmark_v2_template.yaml` in the git repo for an example.
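As a rough illustration, a benchmark config might look like the sketch below. The top-level keys are the ones described above, but the nested contents of `defaults`, `tasks`, and `optimization_space` are placeholders, not the real schema; treat `configs/benchmark_v2_template.yaml` as the authoritative reference.

```yaml
# Illustrative sketch only -- the nested structure is hypothetical;
# see configs/benchmark_v2_template.yaml for the real layout.
experiment_name: my_backbone_benchmark
# run_name is omitted so the mlflow_utils parameters extraction can be used
defaults:
  # settings shared by all tasks; can be overridden per task
tasks:
  # one entry per task, each specifying decoder, datamodule and training parameters
n_trials: 16          # HPO trials per task
save_models: false    # true logs model artifacts per run (can use a lot of space)
optimization_space:
  # hyperparameters to search over; keep this small for Bayesian optimization
run_repetitions: 5    # repetitions using the best settings from optimization
storage_uri: /opt/benchmark_experiments/hpo
```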
Besides the `--config` argument, terratorch-iterate also has two other arguments:
- If users include the `--hpo` argument, terratorch-iterate will optimize hyperparameters. Otherwise, it will rerun the best experiment.
- If users include the `--repeat` argument, terratorch-iterate will repeat the best experiment. Otherwise, terratorch-iterate will not rerun any experiment.
If users want to optimize hyperparameters:

```
terratorch iterate --hpo --config <config-file>
```

Another way to run terratorch-iterate is to omit `terratorch`:

```
iterate --hpo --config <config-file>
```

For instance:

```
iterate --hpo --config configs/dofa_large_patch16_224_upernetdecoder_true_modified.yaml
```

If users want to rerun the best experiment, please use the same config file. Additionally, the `parent_run_id`, which is the MLflow run id from optimization, should be added as shown below:

```
iterate --repeat --config <config-file> --parent_run_id <mlflow run_id from hpo>
```

For instance:

```
iterate --repeat --config configs/dofa_large_patch16_224_upernetdecoder_true_modified.yaml --parent_run_id 61bdee4a35a94f988ad30c46c87d4fbd
```

If users want to optimize hyperparameters and then rerun the best experiment in a single command, please use both arguments as shown below:

```
iterate --hpo --repeat --config <config-file>
```

For instance:

```
iterate --hpo --repeat --config configs/dofa_large_patch16_224_upernetdecoder_true_modified.yaml
```

To check the experiment results, use:

```
mlflow ui --host $(hostname -f) --port <port> --backend-store-uri <storage_uri>
```
Summarizing results and hyperparameters of multiple experiments relies on a configuration file where the experiments and tasks are defined. This consists of:
- `list_of_experiment_names`: List of MLflow experiment names which have been completed.
- `task_names`: List of tasks found in each experiment in `list_of_experiment_names`.
- `run_repetitions`: Number of repetitions to be expected for each experiment. This should be the same as the value used in each experiment config file during optimization.
- `storage_uri`: Location to use for MLflow storage for the hyperparameter optimization (HPO) stage. This should be the same value used as `storage_uri` in each experiment config file during optimization (see above).
- `benchmark_name`: String used to name the resulting CSV file.
See `configs/summarize_results_template.yaml` in the git repo for an example.
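A minimal sketch of such a config, with placeholder experiment and task names (the key names follow the list above; see `configs/summarize_results_template.yaml` for the actual template):

```yaml
# Placeholder values for illustration only
list_of_experiment_names:
  - my_backbone_benchmark
  - another_backbone_benchmark
task_names:
  - my_segmentation_task
  - my_classification_task
run_repetitions: 5                           # must match the experiment configs
storage_uri: /opt/benchmark_experiments/hpo  # must match the experiment configs
benchmark_name: my_benchmark                 # names the resulting csv file
```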
To summarize results and hyperparameters, please run the following:
```
iterate --summarize --config <summarize-config-file>
```

For instance:

```
iterate --summarize --config configs/summarize_results.yaml
```

The results and hyperparameters are extracted into a CSV file. For example, if `storage_uri` is `/opt/benchmark_experiments/hpo`, then the summarized results will be saved in the last file shown below:

```
/opt/benchmark_experiments/
├── hpo_results
├── hpo_repeated_exp
├── repeated_exp_output_csv
├── job_logs
├── optuna_db
└── summarized_results/
    └── <benchmark_name>/
        └── results_and_parameters.csv
```
You can also parallelize your runs over a Ray cluster.
Check out the instructions in the docs.
terratorch-iterate provides a utility to convert terratorch config files into a single terratorch-iterate config file. Follow these steps:
- Go to `<$TERRATORCH-ITERATE-HOME>/benchmark/config_util`.
- Copy all terratorch config files that you want to be converted into a new directory. Note that all YAML files within this directory will be parsed to generate the terratorch-iterate file, so make sure that only YAML files that you want to be parsed are located in this directory.
- Run the `build_geobench_configs.py` script, specifying the following input parameters:
  - `input_dir`: Full path to the directory that contains all terratorch config YAML files
  - `output_dir`: Full path to the directory that will store the generated files
  - `template`: Full or relative path to the template file
  - `prefix`: Prefix for the generated file names; e.g., if prefix is set to `my_prefix`, then a generated filename could be `my_prefix_Clay.yaml`
For instance:

```
python3 build_geobench_configs.py --input_dir /Users/john/terratorch/examples/confs/geobenchv2_detection --output_dir /Users/john/terratorch-iterate/benchmark/config_util --template geobenchv2_template.yaml --prefix my_prefix_
```