Algorithm Key Points
●
Actor and Critic have separate networks
●
Loss functions are related to one another (nested function calls)
●
Replay buffer is large, finite, and sampled uniformly; overwrite
earliest memories
●
Target networks for both actor and critic, with soft updates
●
Batch norm for both networks
●
Actions introduced to critic after states
●
Stochastic policy adds noise function to deterministic policy