Intuition for the loss function in MBPO

I am running the HalfCheetah env using MBPO (SAC + 3 NNs for the dynamics), and my training loss increases along with the reward.
I don't have the intuition to interpret this.
Why would the training loss of model-based policy optimization increase while the reward is also going up?
I can share my wandb run if that helps.

Thanks