❓ Question
Hi,
I’ve been adapting your code for PPO hyperparameter optimization on my custom environment, and I have a question about the evaluation metric used.
In exp_manager.py, on line 810, the optimization objective is defined as:
reward = eval_callback.last_mean_reward
This means that only the last evaluation determines whether the current trial is the best one. Is there a specific reason for this approach? Would you consider using
reward = eval_callback.best_mean_reward
instead?
Checklist
I have checked that there is no similar issue in the repo