Renamed agent arguments baseline_policy/baseline_network/critic_network to baseline/critic
Renamed agent reward_estimation arguments estimate_horizon to predict_horizon_values, estimate_actions to predict_action_values, estimate_terminal to predict_terminal_values
Renamed agent argument preprocessing to state_preprocessing
New default agent state_preprocessing: linear_normalization
Moved agent arguments for reward/return/advantage processing from preprocessing to reward_preprocessing and reward_estimation[return_processing/advantage_processing]
New agent argument config with options buffer_observe, enable_int_action_masking and seed
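As a rough sketch of the renamed arguments above on the generic 'tensorforce' agent (environment choice, argument values and network sizes are illustrative assumptions, not recommended settings):

```python
from tensorforce import Agent, Environment

environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)

agent = Agent.create(
    agent='tensorforce',
    environment=environment,
    policy=dict(network='auto'),
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(optimizer='adam', learning_rate=1e-3),
    objective='policy_gradient',
    # formerly estimate_horizon / estimate_actions / estimate_terminal
    reward_estimation=dict(
        horizon=20,
        predict_horizon_values='late',
        predict_action_values=False,
        predict_terminal_values=False
    ),
    # formerly baseline_policy / baseline_network / critic_network
    baseline=dict(type='auto', size=32),
    # formerly preprocessing; linear_normalization is now the default
    state_preprocessing='linear_normalization',
    # reward/return/advantage processing moved out of preprocessing
    reward_preprocessing=None,
    # new config argument
    config=dict(buffer_observe=True, enable_int_action_masking=True, seed=42)
)
```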
Renamed PPO/TRPO/DPG arguments critic_network/critic_optimizer to baseline/baseline_optimizer
Renamed PPO argument optimization_steps to multi_step
New TRPO argument subsampling_fraction
Changed agent argument use_beta_distribution default to false
Added double DQN agent (double_dqn)
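For illustration, the renamed PPO/TRPO arguments and the new double_dqn agent might be used roughly as follows (all values are placeholders, environment as created above):

```python
# PPO: optimization_steps -> multi_step, critic_network/critic_optimizer -> baseline/baseline_optimizer
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    multi_step=10,                                    # formerly optimization_steps
    baseline=dict(type='auto', size=32),              # formerly critic_network
    baseline_optimizer=dict(optimizer='adam', learning_rate=1e-3),  # formerly critic_optimizer
    use_beta_distribution=False                       # now the default
)

# TRPO: new subsampling_fraction argument
agent = Agent.create(
    agent='trpo', environment=environment, batch_size=10,
    subsampling_fraction=0.33
)

# New double DQN agent
agent = Agent.create(
    agent='double_dqn', environment=environment, memory=10000, batch_size=32
)
```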
Removed Agent.act() argument evaluation
Removed the query argument of agent functions (functionality removed)
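A minimal act/observe loop without the removed evaluation and query arguments; this assumes the independent/deterministic flags cover the former evaluation use case:

```python
# Training interaction
states = environment.reset()
terminal = False
while not terminal:
    actions = agent.act(states=states)
    states, terminal, reward = environment.execute(actions=actions)
    agent.observe(terminal=terminal, reward=reward)

# Evaluation-style acting, not recorded by the agent (assumed replacement for evaluation=True)
actions = agent.act(states=states, independent=True, deterministic=True)
```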
Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): save/load functions and saver argument changed
Default behavior when specifying saver is not to load the agent, unless the agent is created via Agent.load
Agent summarizer functionality changed: summarizer argument changed, some summary labels and other options removed
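A sketch of the changed saver/summarizer usage; the sub-options and save formats shown are assumptions and depend on the exact version:

```python
# Checkpoint-based saver; specifying it no longer loads an existing agent
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='checkpoints', frequency=100),
    summarizer=dict(directory='summaries')
)

# Loading has to be requested explicitly
agent = Agent.load(directory='checkpoints', environment=environment)

# Explicit save, e.g. as TensorFlow checkpoint (SavedModel export assumed analogous)
agent.save(directory='model', format='checkpoint')
```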
Renamed RNN layers internal_{rnn/lstm/gru} to rnn/lstm/gru, and the previous rnn/lstm/gru layers to input_{rnn/lstm/gru}
Renamed auto network argument internal_rnn to rnn
Renamed (internal_)rnn/lstm/gru layer argument length to horizon
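For example, a network using the renamed RNN layers and the renamed auto-network argument might look like this (sizes and horizons are placeholders):

```python
# Layer-list network: internal_lstm -> lstm, length -> horizon
network = [
    dict(type='dense', size=64),
    dict(type='lstm', size=64, horizon=10)  # formerly internal_lstm with length=10
]

# Auto network: internal_rnn -> rnn
network = dict(type='auto', size=64, depth=2, rnn=10)
```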
Renamed update_modifier_wrapper to optimizer_wrapper
Renamed optimizing_step to linesearch_step, and UpdateModifierWrapper argument optimizing_iterations to linesearch_iterations
Optimizer subsampling_step accepts both absolute (int) and relative (float) fractions
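A hedged sketch of the renamed optimizer_wrapper arguments, with subsampling given either as a relative float or an absolute int:

```python
# Wrapper arguments specified alongside the base optimizer (formerly update_modifier_wrapper)
optimizer = dict(
    optimizer='adam', learning_rate=1e-3,
    multi_step=5,
    subsampling_fraction=0.25,   # relative fraction (float) ...
    # subsampling_fraction=16,   # ... or absolute count (int)
    linesearch_iterations=5      # formerly optimizing_iterations
)
```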
Objective policy_gradient argument ratio_based renamed to importance_sampling
Added objectives state_value and action_value
Added Gaussian distribution arguments global_stddev and bounded_transform (for improved bounded action space handling)
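Illustrative specs for the renamed/new objectives and the new Gaussian distribution options; argument values here are assumptions:

```python
# policy_gradient objective: ratio_based -> importance_sampling; new state_value / action_value objectives
objective = dict(type='policy_gradient', importance_sampling=True)
baseline_objective = 'state_value'

# Gaussian distribution with global stddev and a bounded-action transform
policy = dict(
    network='auto',
    distributions=dict(
        float=dict(type='gaussian', global_stddev=True, bounded_transform='tanh')
    )
)
```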
Changed default memory device argument to CPU:0
Renamed rewards summaries
Agent.create() accepts act-function as agent argument for recording
Singleton states and actions are now consistently handled as singletons
Major change to policy handling and defaults, in particular parametrized_distributions; new default policies parametrized_state/action_value
Combined long and int type
Always wrap environment in EnvironmentWrapper class
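Since every environment is now wrapped, the usual entry point is Environment.create, for example:

```python
from tensorforce import Environment

# Returns the environment wrapped in EnvironmentWrapper, which enforces
# max_episode_timesteps and the standard reset/execute interface
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)
```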