Fix PER bug. Add loadable memory. N-step TD.#276
Open
jakegrigsby wants to merge 5 commits into
Open
Conversation
Contributor
|
@jakegrigsby Thanks for your PR! I've left you some comments. Also, it seems it doesn't pass on Tensorflow backend. Can you clear this out? |
Author
|
@RaphaelMeudec I use tensorflow as my backend and it seems to be working. I use 1.10 so I'll update and see if that's the issue. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is basically #214 with most of the extra features cut out, because the PER bug needs to get fixed, whether or not the rest ever merges. DQfD, NoisyNets, and the csv logging callback are removed.
Fixes bug discussed in Prioritized Experience Replay #212 where observation indices rotate while their priorities stay static.
Adds loadable memory. The
PartitionedRingBufferis similar to the old implementation ofRingBufferbut the data stays locked to the idx it was originally assigned to. That's what fixed the PER issue, and also makes it pretty easy to load demonstration data and never delete it. I originally used it in DQfD but I think it has more applications than that. A feature like this was requested in [feature-request] Add the possibility to load and store memory. #262.Adds n-step temporal difference error to dqn (only for prioritized memories). This basically involves a change to how the experiences are retrieved from memory. I'd appreciate someone double checking my math on that.