Hardware: Google Colab L4
| Environment | Model Type | Average Reward | Total Training Steps | HuggingFace |
|---|---|---|---|---|
| Reach | TQC | -0.59 | 1,000,000 | Link |
| Reach | DDPG | -0.65 | 1,000,000 | Link |
| Push | TQC | -2.02 | 1,000,000 | Link |
| Slide | TQC | -6.65 | 1,000,000 | Link |
| Pick and Place | TQC | -2.07 | 1,000,000 | Link |
| Pick and Place | DDPG | -11.01 | 1,000,000 | Link |
- Set `learning_starts` to a value greater than 100, since the Fetch environments default to ending episodes at 50 steps (`max_episode_steps=50`); otherwise TQC may begin updating before even two full episodes are stored in the replay buffer.
- Due to overestimation issues, DDPG struggles in more complex environments such as Pick & Place and Slide, so DDPG is expected to struggle to "solve" those Fetch environments.
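The `learning_starts` rule above can be sketched as a small helper. This is a minimal illustration, not code from this repo: the function name and the `min_episodes` parameter are assumptions, and the idea is simply that warm-up should span whole episodes before gradient updates begin.

```python
# Hypothetical helper: pick a learning_starts value that covers
# at least `min_episodes` complete episodes, so the replay buffer
# holds full trajectories before the agent starts updating.

def safe_learning_starts(max_episode_steps: int, min_episodes: int = 2) -> int:
    """Return max_episode_steps * min_episodes.

    For the Fetch environments (max_episode_steps=50), two full
    episodes give 100 steps, so anything above this satisfies the
    "greater than 100" guideline.
    """
    return max_episode_steps * min_episodes

print(safe_learning_starts(50))     # two 50-step episodes -> 100
print(safe_learning_starts(50, 3))  # a safer margin -> 150
```

In a training script the result would be passed straight to the algorithm constructor, e.g. `TQC("MultiInputPolicy", env, learning_starts=safe_learning_starts(50, 3), ...)` (sketched call, assuming the sb3-contrib TQC API).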