We work in the never-ending learning gym environment (https://github.com/eaplatanios/nel_framework), an endless gridworld in which an agent collects valuable items and tools. Collecting rewards efficiently requires both taking the shortest path to each item and following a high-level plan that guides the agent by prioritizing where to go and what to collect.
We propose a greedy heuristic and implement Deep Q-networks (DQN), imitation learning and Deep Q-learning from Demonstrations (DQfD).
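
To make the idea of a greedy heuristic concrete, the following is a minimal sketch and not the heuristic actually used in this work, whose details are not given here. It assumes that "greedy" means stepping toward the nearest visible item by Manhattan distance on an obstacle-free grid. The names `greedy_step` and `manhattan`, the coordinate convention, and the action strings are hypothetical and are not part of the nel_framework API.

```python
from typing import Iterable, Optional, Tuple

Position = Tuple[int, int]  # (x, y) grid coordinates; an assumed convention


def manhattan(a: Position, b: Position) -> int:
    """Manhattan (L1) distance between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_step(agent: Position, items: Iterable[Position]) -> Optional[str]:
    """Pick one move toward the nearest item.

    Returns 'up', 'down', 'left', or 'right' (hypothetical action names),
    or None when no item is visible or the agent already stands on the
    nearest one. Obstacles are ignored in this sketch.
    """
    items = list(items)
    if not items:
        return None
    # Greedy choice: the closest item wins; ties go to the first one listed.
    target = min(items, key=lambda p: manhattan(agent, p))
    dx = target[0] - agent[0]
    dy = target[1] - agent[1]
    if dx == 0 and dy == 0:
        return None
    # Close the larger gap first; prefer the horizontal axis on ties.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "up" if dy > 0 else "down"
```

A rule of this kind always takes a shortest path to the current target on an open grid, but it has no high-level plan: it never weighs item value or looks past the nearest item, which is the gap the learned policies (DQN, imitation learning, DQfD) are meant to address.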