❓ Question
Hi,
I’ve been adapting your code for PPO hyperparameter optimization on my custom environment, and I have a question about the evaluation metric used.
In exp_manager.py, on line 810, the optimization objective is defined as:
reward = eval_callback.last_mean_reward
This means that only the last evaluation determines whether the current trial is the best one. Is there a specific reason for this approach? Would you consider using
reward = eval_callback.best_mean_reward
instead?
Checklist
I have checked that there is no similar issue in the repo