Replies: 1 comment
@yining043 tagging you here too, since I guess you will need step-wise states (or at least rewards) to save for improvement methods!
Problem
In most combinatorial settings such as the ones we consider, the initial `td` (e.g. `locs` in a Euclidean routing problem) does not really change, so we do not need to carry information about the whole computational graph. This is why, unlike TorchRL, we modified the `step()` function of the environment here not to save all previous `td` (since they would just increase runtime). However, this is not true in general when we consider dynamic / stochastic settings.
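To make this concrete, here is a minimal sketch of a static-case step (field names like `locs`, `visited` and `current_node` are illustrative, not the exact rl4co schema): only the dynamic part of the state is rewritten, so there is nothing to gain from keeping the previous TensorDicts around.

```python
from tensordict import TensorDict


def step_static(td: TensorDict) -> TensorDict:
    """Advance one decoding step without stacking previous TensorDicts."""
    action = td["action"]
    # Only the dynamic fields are rewritten; the static instance data ("locs")
    # never changes, so carrying old states would just add runtime and memory.
    visited = td["visited"].scatter(-1, action.unsqueeze(-1), True)
    td.update(
        {
            "current_node": action,
            "visited": visited,
            "done": visited.all(-1),
        }
    )
    return td
```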
Solution
We should better explain why we do this and allow users to save intermediate states during decoding as an option, perhaps by specifying the problem as `static` or `dynamic` instead of having the `_torchrl_mode` in here.

PS: optionally, one could save the TensorDicts alongside the actions here - i.e., saving each step inside of the `DecodingStrategy` from @LTluttmann upon request and giving back the full nested `td` as usually done in TorchRL.

CC: @Furffico @cbhua
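As a rough sketch of what the opt-in could look like (the `save_steps` flag, the loop structure and the TorchRL-style `env.step(td)["next"]` call are assumptions for illustration, not the current API), the decoding loop would only pay the memory cost when intermediate states are actually requested:

```python
import torch
from tensordict import TensorDict


def decode(env, td: TensorDict, policy, save_steps: bool = False):
    """Roll out the policy; optionally keep every intermediate TensorDict."""
    step_tds = []
    while not td["done"].all():
        td["action"] = policy(td)        # pick the next action
        if save_steps:
            step_tds.append(td.clone())  # needed for dynamic / stochastic problems
        td = env.step(td)["next"]        # assumed TorchRL-style step output
    if save_steps:
        # give back the full nested trajectory, as usually done in TorchRL
        return torch.stack(step_tds, dim=1)
    return td  # static case: the final state (and reward) is all we need
```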