
how to improve the convergence performance of training loss? #510


Description

@williamyuanv0

Hi kengz, I find that the convergence of the training loss (= value loss + policy loss) of the PPO algorithm applied to the game Pong is poor (see Fig. 1), but the corresponding mean_returns shows a good upward trend and reaches convergence (see Fig. 2).
Why is that? How can I improve the convergence of the training loss? I have tried many improvement tricks with PPO, but none of them worked.
Fig. 1: training loss vs. frame (ppo_pong_t0_s0_session_graph_eval_loss_vs_frame)
Fig. 2: mean returns vs. frames (ppo_pong_t0_s0_session_graph_eval_mean_returns_vs_frames)
