
how to improve the convergence performance of training loss? #510


Description

@williamyuanv0

Hi kengz, I find that the convergence of the training loss (= value loss + policy loss) of the PPO algorithm applied to the game Pong is poor (see Fig. 1), but the corresponding mean_returns shows a good upward trend and reaches convergence (see Fig. 2).
Why is that? How can I improve the convergence of the training loss? I have tried many improvement tricks with PPO, but none of them worked.
Fig. 1: training loss vs. frame (ppo_pong_t0_s0_session_graph_eval_loss_vs_frame)
Fig. 2: mean returns vs. frames (ppo_pong_t0_s0_session_graph_eval_mean_returns_vs_frames)
