Tensorforce 0.6.0

@AlexKuhnle released this 30 Aug 16:48
  • Removed agent arguments execution, buffer_observe, seed
  • Renamed agent arguments baseline_policy/baseline_network/critic_network to baseline/critic
  • Renamed agent reward_estimation arguments estimate_horizon to predict_horizon_values, estimate_actions to predict_action_values, estimate_terminal to predict_terminal_values
  • Renamed agent argument preprocessing to state_preprocessing
  • Default agent preprocessing is now linear_normalization
  • Moved agent arguments for reward/return/advantage processing from preprocessing to reward_preprocessing and reward_estimation[return_/advantage_processing]
  • New agent argument config with values buffer_observe, enable_int_action_masking, seed (see the first sketch after this list)
  • Renamed PPO/TRPO/DPG argument critic_network/_optimizer to baseline/baseline_optimizer
  • Renamed PPO argument optimization_steps to multi_step
  • New TRPO argument subsampling_fraction
  • Changed agent argument use_beta_distribution default to false
  • Added double DQN agent (double_dqn)
  • Removed Agent.act() argument evaluation
  • Removed agent function arguments query (functionality removed)
  • Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): save/load functions and saver argument changed
  • Default behavior when specifying saver is not to load the agent, unless it is created via Agent.load (see the second sketch after this list)
  • Agent summarizer functionality changed: summarizer argument changed, some summary labels and other options removed
  • Renamed RNN layers internal_{rnn/lstm/gru} to rnn/lstm/gru and rnn/lstm/gru to input_{rnn/lstm/gru}
  • Renamed auto network argument internal_rnn to rnn
  • Renamed (internal_)rnn/lstm/gru layer argument length to horizon
  • Renamed update_modifier_wrapper to optimizer_wrapper
  • Renamed optimizing_step to linesearch_step, and UpdateModifierWrapper argument optimizing_iterations to linesearch_iterations
  • Optimizer subsampling_step accepts both an absolute number (int) and a relative fraction (float)
  • Objective policy_gradient argument ratio_based renamed to importance_sampling
  • Added objectives state_value and action_value
  • Added Gaussian distribution arguments global_stddev and bounded_transform (for improved bounded action space handling)
  • Changed default memory device argument to CPU:0
  • Renamed rewards summaries
  • Agent.create() accepts act-function as agent argument for recording
  • Singleton states and actions are now consistently handled as singletons
  • Major change to policy handling and defaults, in particular parametrized_distributions, new default policies parametrized_state/action_value
  • Combined long and int types
  • Environment is now always wrapped in the EnvironmentWrapper class
  • Changed tune.py arguments
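
For orientation, here is a minimal, unofficial sketch of how several of the renamed and new agent arguments fit together in 0.6. All concrete values (environment, network spec, horizon, seed) are placeholders, and the exact accepted specification formats may differ from what is shown:

```python
from tensorforce import Agent, Environment

# Placeholder environment and hyperparameters, for illustration only.
environment = Environment.create(
    environment='gym', level='CartPole-v1', max_episode_timesteps=500
)

agent = Agent.create(
    agent='ppo',
    environment=environment,
    batch_size=10,
    multi_step=10,  # renamed from optimization_steps
    baseline=dict(type='auto', size=32, depth=1),  # renamed from critic_network
    baseline_optimizer=dict(type='adam', learning_rate=1e-3),  # renamed from critic_optimizer
    # state_preprocessing='linear_normalization',  # renamed from preprocessing; now the default
    reward_preprocessing=None,  # reward processing no longer lives under preprocessing
    reward_estimation=dict(
        horizon=10,
        predict_horizon_values='early',   # renamed from estimate_horizon
        predict_action_values=False,      # renamed from estimate_actions
        predict_terminal_values=False,    # renamed from estimate_terminal
        # return_processing=..., advantage_processing=...  (moved here from preprocessing)
    ),
    # config replaces the removed top-level buffer_observe / seed arguments
    config=dict(buffer_observe=True, enable_int_action_masking=True, seed=0),
    use_beta_distribution=False,  # default changed to false
)
```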
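Similarly, a rough sketch of the changed saver/load behaviour; the saver dict keys and the explicit save call shown here are assumptions based on the notes above, not a verified recipe:

```python
# Checkpoint-style saving via the saver argument (keys below are assumptions).
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='checkpoints', frequency=100),
)

# ... run training ...

agent.save(directory='checkpoints')  # explicit save (TensorFlow Checkpoint format)
agent.close()

# Specifying saver no longer restores an existing agent automatically;
# to restore one, create it via Agent.load instead:
agent = Agent.load(directory='checkpoints', environment=environment)
```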