Releases · tensorforce/tensorforce
Tensorforce 0.6.5
Agents:
- Renamed agent argument `reward_preprocessing` to `reward_processing`; for the Tensorforce agent, it moved to `reward_estimation[reward_processing]`
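For illustration, a minimal sketch of the renamed argument on a sub-type agent; the Gym CartPole environment and the `clipping` preprocessing spec are assumptions for the example, not part of the release:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# Previously reward_preprocessing=...; for the Tensorforce agent the same spec would
# instead go into reward_estimation=dict(..., reward_processing=...)
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    reward_processing=dict(type='clipping', lower=-1.0, upper=1.0)  # assumed preprocessing spec
)
```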
Distributions:
- New `categorical` distribution argument `skip_linear` to not add the implicit linear logits layer
Environments:
- Support for multi-actor parallel environments via new function `Environment.num_actors()` (see the sketch after this list)
- `Runner` uses multi-actor parallelism by default if the environment is multi-actor
- New optional `Environment` function `episode_return()`, which returns the true return of the last episode in case the cumulative sum of environment rewards is not a good metric for runner display
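A rough sketch of how a custom multi-actor environment might advertise its actor count and a custom episode return; the batched shapes in `reset()`/`execute()` are simplified assumptions, see `multiactor_environment.py` for the authoritative interface:

```python
import numpy as np
from tensorforce import Environment

class MultiActorEnv(Environment):
    """Hypothetical multi-actor environment; batching details simplified."""

    def states(self):
        return dict(type='float', shape=(4,))

    def actions(self):
        return dict(type='int', num_values=2)

    def num_actors(self):
        # > 1 marks the environment as multi-actor; Runner then parallelizes over actors
        return 8

    def episode_return(self):
        # Custom per-episode metric for runner display instead of the plain reward sum
        return self._true_return

    def reset(self):
        self._true_return = 0.0
        # One state per actor, batched along the leading axis
        return np.zeros(shape=(self.num_actors(), 4))

    def execute(self, actions):
        next_states = np.zeros(shape=(self.num_actors(), 4))
        terminal = np.zeros(shape=(self.num_actors(),), dtype=bool)
        reward = np.ones(shape=(self.num_actors(),))
        self._true_return += float(reward.sum())
        return next_states, terminal, reward
```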
Examples:
- New `vectorized_environment.py` and `multiactor_environment.py` scripts to illustrate how to set up a vectorized/multi-actor environment
Tensorforce 0.6.4
Agents:
- Agent argument `update_frequency` / `update[frequency]` now supports float values > 0.0, which specify the update frequency relative to the batch size
- Changed default value for argument `update_frequency` from `1.0` to `0.25` for DQN, DoubleDQN, DuelingDQN agents
- New arguments `return_processing` and `advantage_processing` (where applicable) for all agent sub-types
- New function `Agent.get_specification()` which returns the agent specification as a dictionary
- New function `Agent.get_architecture()` which returns a string representation of the network layer architecture
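A short sketch of the fractional `update_frequency` and the two new inspection functions, assuming a Gym CartPole environment:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# Fractional update_frequency: update every 0.25 * batch_size timesteps
agent = Agent.create(
    agent='dqn', environment=environment,
    memory=10000, batch_size=32, update_frequency=0.25
)

print(agent.get_specification())  # agent specification as a dictionary
print(agent.get_architecture())   # string summary of the network layer architecture
```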
Modules:
- Improved and simplified module specification, for instance: `network=my_module` instead of `network=my_module.TestNetwork`, or `environment=envs.custom_env` instead of `environment=envs.custom_env.CustomEnvironment` (module file needs to be in the same directory or a sub-directory)
Networks:
- New argument `single_output=True` for some policy types which, if `False`, allows the specification of additional network outputs for some/all actions via registered tensors
- `KerasNetwork` argument `model` now supports arbitrary functions as long as they return a `tf.keras.Model`
Layers:
- New layer type `SelfAttention` (specification key: `self_attention`)
Parameters:
- Support tracking of non-constant parameter values
Runner:
- Renamed attribute `episode_rewards` to `episode_returns`, and TQDM status `reward` to `return`
- Extended argument `agent` to support `Agent.load()` keyword arguments, to load an existing agent instead of creating a new one
Examples:
- Added `action_masking.py` example script to illustrate an environment implementation with built-in action masking
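As a sketch of the idea (see `examples/action_masking.py` for the authoritative version), an environment can pass a boolean mask for its int action alongside the state; the `action_mask` state key below follows the `<action-name>_mask` convention and should be treated as an assumption:

```python
import numpy as np
from tensorforce import Environment

class MaskedEnvironment(Environment):
    """Sketch: only a subset of the 3 discrete actions is valid at each step."""

    def states(self):
        return dict(type='float', shape=(4,))

    def actions(self):
        return dict(type='int', num_values=3)

    def reset(self):
        state = np.random.random_sample(size=(4,))
        # Mask provided alongside the state; the single action here is named 'action'
        return dict(state=state, action_mask=[True, True, False])

    def execute(self, actions):
        next_state = np.random.random_sample(size=(4,))
        states = dict(state=next_state, action_mask=[True, False, True])
        return states, False, 1.0
```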
Bugfixes:
- Customized device placement was not applied to most tensors
Tensorforce 0.6.3
Agents:
- New agent argument `tracking` and corresponding function `tracked_tensors()` to track and retrieve the current value of predefined tensors, similar to `summarizer` for TensorBoard summaries (see the sketch after this list)
- New experimental values `trace_decay` and `gae_decay` for Tensorforce agent argument `reward_estimation`, soon for other agent types as well
- New options `"early"` and `"late"` for value `estimate_advantage` of Tensorforce agent argument `reward_estimation`
- Changed default value for `Agent.act()` argument `deterministic` from `False` to `True`
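A small sketch of the `tracking` argument; the value `'all'` is assumed to be accepted analogously to `summarizer`:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    tracking='all'  # assumed value: track all predefined tensors
)

states = environment.reset()
actions = agent.act(states=states)
states, terminal, reward = environment.execute(actions=actions)
agent.observe(terminal=terminal, reward=reward)

print(agent.tracked_tensors())  # current values of the tracked tensors
```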
Networks:
- New network type `KerasNetwork` (specification key: `keras`) as wrapper for networks specified as Keras model
- Passing a Keras model class/object as policy/network argument is automatically interpreted as `KerasNetwork`
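For instance, a Keras model class can be passed directly as the network (a sketch; the layer sizes and surrounding setup are arbitrary):

```python
import tensorflow as tf
from tensorforce import Agent, Environment

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense1 = tf.keras.layers.Dense(32, activation='relu')
        self.dense2 = tf.keras.layers.Dense(32, activation='relu')

    def call(self, inputs):
        return self.dense2(self.dense1(inputs))

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# Passing the Keras model class is interpreted as KerasNetwork (specification key: keras)
agent = Agent.create(agent='ppo', environment=environment, network=MyModel, batch_size=10)
```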
Distributions:
- Changed `Gaussian` distribution argument `global_stddev=False` to `stddev_mode='predicted'`
- New `Categorical` distribution argument `temperature_mode=None`
Layers:
- New option for `Function` layer argument `function` to pass a string function expression with argument "x", e.g. `"(x+1.0)/2.0"`
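For example, a network specification might include such an expression (a sketch; the dense layer sizes are arbitrary):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

network = [
    dict(type='dense', size=32),
    dict(type='function', function='(x+1.0)/2.0'),  # string expression in "x"
    dict(type='dense', size=32)
]
agent = Agent.create(agent='ppo', environment=environment, network=network, batch_size=10)
```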
Summarizer:
- New summary `episode-length` recorded as part of summary label "reward"
Environments:
- Support for vectorized parallel environments via new function `Environment.is_vectorizable()` and new argument `num_parallel` for `Environment.reset()`
  - See `tensorforce/environments/cartpole.py` for a vectorizable environment example
- `Runner` uses vectorized parallelism by default if `num_parallel > 1`, `remote=None` and the environment supports vectorization (see the sketch after this list)
  - See `examples/act_observe_vectorized.py` for more details on the act-observe interaction
- New extended and vectorizable custom CartPole environment via key `custom_cartpole` (work in progress)
- New environment argument `reward_shaping` to provide a simple way to modify/shape the rewards of an environment; can be specified either as callable or as string function expression
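A rough sketch of Runner-based vectorized execution; the `parallel_interactions` setting is an assumed requirement, see `examples/act_observe_vectorized.py` for the authoritative usage:

```python
from tensorforce import Agent, Environment, Runner

# The extended CartPole environment (key: custom_cartpole) supports vectorization
environment = Environment.create(environment='custom_cartpole', max_episode_timesteps=500)
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    parallel_interactions=4  # assumed: sized to the number of parallel environment copies
)

# With num_parallel > 1 and remote=None, a vectorizable environment is run vectorized
runner = Runner(agent=agent, environment=environment, num_parallel=4)
runner.run(num_episodes=100)
runner.close()
```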
run.py script:
- New option for command line arguments `--checkpoints` and `--summaries` to add a comma-separated checkpoint/summary filename in addition to the directory
- Added episode lengths to the logging plot besides episode returns
Bugfixes:
- Temporal horizon handling of RNN layers
- Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
- GPU problems with scatter operations
Tensorforce 0.6.2
Bugfixes:
- Critical bugfix for DQN variants and DPG agent
Tensorforce 0.6.1
Agents:
- Removed default value `"adam"` for Tensorforce agent argument `optimizer` (since default optimizer argument `learning_rate` removed, see below)
- Removed option `"minimum"` for Tensorforce agent argument `memory`, use `None` instead
- Changed default value for `dqn`/`double_dqn`/`dueling_dqn` agent argument `huber_loss` from `0.0` to `None`
Layers:
- Removed default value `0.999` for `exponential_normalization` layer argument `decay`
- Added new layer `batch_normalization` (generally should only be used for the agent arguments `reward_processing[return_processing]` and `reward_processing[advantage_processing]`)
- Added `exponential/instance_normalization` layer argument `only_mean` with default `False`
- Added `exponential/instance_normalization` layer argument `min_variance` with default `1e-4`
Optimizers:
- Removed default value `1e-3` for optimizer argument `learning_rate`
- Changed default value for optimizer argument `gradient_norm_clipping` from `1.0` to `None` (no gradient clipping)
- Added new optimizer `doublecheck_step` and corresponding argument `doublecheck_update` for the optimizer wrapper
- Removed `linesearch_step` optimizer argument `accept_ratio`
- Removed `natural_gradient` optimizer argument `return_improvement_estimate`
Saver:
- Added option to specify agent argument `saver` as string, which is interpreted as `saver[directory]` with otherwise default values
- Added default value for agent argument `saver[frequency]` as `10` (save model every 10 updates by default)
- Changed default value of agent argument `saver[max_checkpoints]` from `5` to `10`
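Putting the saver options together (a sketch; the directory name is arbitrary):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# String shorthand: saver='model-dir' is interpreted as saver=dict(directory='model-dir')
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    saver=dict(directory='model-dir', frequency=10, max_checkpoints=10)
)
```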
Summarizer:
- Added option to specify agent argument `summarizer` as string, which is interpreted as `summarizer[directory]` with otherwise default values
- Renamed option of agent argument `summarizer` from `summarizer[labels]` to `summarizer[summaries]` (the term "label" dates from an earlier version and had become confusing)
- Changed interpretation of agent argument `summarizer[summaries] = "all"` to include only numerical summaries, i.e. all summaries except "graph"
- Changed default value of agent argument `summarizer[summaries]` from `["graph"]` to `"all"`
- Changed default value of agent argument `summarizer[max_summaries]` from `5` to `7` (number of different colors in TensorBoard)
- Added option `summarizer[filename]` to agent argument `summarizer`
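Correspondingly for the summarizer (a sketch; the directory name is arbitrary):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# String shorthand: summarizer='summary-dir' is interpreted as summarizer=dict(directory=...)
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    summarizer=dict(directory='summary-dir', summaries='all', max_summaries=7)
)
```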
Recorder:
- Added option to specify agent argument `recorder` as string, which is interpreted as `recorder[directory]` with otherwise default values
run.py script:
- Added `--checkpoints`/`--summaries`/`--recordings` command line arguments to enable saver/summarizer/recorder agent argument specification separate from the core agent configuration
Examples:
- Added `save_load_agent.py` example script to illustrate regular agent saving and loading
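The save/load pattern the example illustrates looks roughly like this (directory name arbitrary, default checkpoint format assumed):

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# ... training ...
agent.save(directory='agent-checkpoint')
agent.close()

# Later: restore the agent, including its specification, from the checkpoint directory
agent = Agent.load(directory='agent-checkpoint', environment=environment)
```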
Bugfixes:
- Fixed problem with optimizer argument `gradient_norm_clipping` not being applied correctly
- Fixed problem with `exponential_normalization` layer not updating moving mean and variance correctly
- Fixed problem with `recent` memory for timestep-based updates sometimes sampling invalid memory indices
Tensorforce 0.6.0
- Removed agent arguments `execution`, `buffer_observe`, `seed`
- Renamed agent arguments `baseline_policy`/`baseline_network`/`critic_network` to `baseline`/`critic`
- Renamed agent `reward_estimation` arguments `estimate_horizon` to `predict_horizon_values`, `estimate_actions` to `predict_action_values`, `estimate_terminal` to `predict_terminal_values`
- Renamed agent argument `preprocessing` to `state_preprocessing`
- Default agent preprocessing `linear_normalization`
- Moved agent arguments for reward/return/advantage processing from `preprocessing` to `reward_preprocessing` and `reward_estimation[return_/advantage_processing]`
- New agent argument `config` with values `buffer_observe`, `enable_int_action_masking`, `seed` (see the sketch after this list)
- Renamed PPO/TRPO/DPG argument `critic_network`/`_optimizer` to `baseline`/`baseline_optimizer`
- Renamed PPO argument `optimization_steps` to `multi_step`
- New TRPO argument `subsampling_fraction`
- Changed agent argument `use_beta_distribution` default to `False`
- Added double DQN agent (`double_dqn`)
- Removed `Agent.act()` argument `evaluation`
- Removed agent function arguments `query` (functionality removed)
- Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): `save`/`load` functions and `saver` argument changed
- Default behavior when specifying `saver` is not to load the agent, unless the agent is created via `Agent.load`
- Agent summarizer functionality changed: `summarizer` argument changed, some summary labels and other options removed
- Renamed RNN layers `internal_{rnn/lstm/gru}` to `rnn/lstm/gru`, and `rnn/lstm/gru` to `input_{rnn/lstm/gru}`
- Renamed `auto` network argument `internal_rnn` to `rnn`
- Renamed `(internal_)rnn/lstm/gru` layer argument `length` to `horizon`
- Renamed `update_modifier_wrapper` to `optimizer_wrapper`
- Renamed `optimizing_step` to `linesearch_step`, and `UpdateModifierWrapper` argument `optimizing_iterations` to `linesearch_iterations`
- Optimizer `subsampling_step` accepts both absolute (int) and relative (float) fractions
- Objective `policy_gradient` argument `ratio_based` renamed to `importance_sampling`
- Added objectives `state_value` and `action_value`
- Added `Gaussian` distribution arguments `global_stddev` and `bounded_transform` (for improved bounded action space handling)
- Changed default memory `device` argument to `CPU:0`
- Renamed rewards summaries
- `Agent.create()` accepts act-function as `agent` argument for recording
- Singleton states and actions are now consistently handled as singletons
- Major change to policy handling and defaults, in particular `parametrized_distributions`, new default policies `parametrized_state/action_value`
- Combined `long` and `int` type
- Always wrap environment in `EnvironmentWrapper` class
- Changed `tune.py` arguments
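A minimal sketch of the new `config` argument, assuming a Gym CartPole environment; the particular values are arbitrary:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)

# Former top-level arguments buffer_observe and seed now live inside config
agent = Agent.create(
    agent='ppo', environment=environment, batch_size=10,
    config=dict(seed=0, buffer_observe=True, enable_int_action_masking=False)
)
```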
Tensorforce 0.5.5
- Changed independent mode of `agent.act` to use final values of dynamic hyperparameters and avoid TensorFlow conditions
- Extended `"tensorflow"` format of `agent.save` to include an optimized Protobuf model with an act-only graph as `.pb` file, and `Agent.load` format `"pb-actonly"` to load an act-only agent based on the Protobuf model
- Support for custom summaries via new `summarizer` argument value `custom` to specify the summary type, and `Agent.summarize(...)` to record summary values
- Added min/max-bounds for dynamic hyperparameters to assert a valid range and infer other arguments
- Argument `batch_size` now mandatory for all agent classes
- Removed `Estimator` argument `capacity`, now always automatically inferred
- Internal changes related to agent arguments `memory`, `update` and `reward_estimation`
- Changed the default `bias` and `activation` argument of some layers
- Fixed issues with `sequence` preprocessor
- DQN and dueling DQN properly constrained to `int` actions only
- Added `use_beta_distribution` argument with default `True` to many agents and `ParametrizedDistributions` policy, so the default can be changed
Tensorforce 0.5.4
- DQN/DuelingDQN/DPG argument `memory` now required to be specified explicitly, plus `update_frequency` default changed
- Removed (temporarily) `conv1d/conv2d_transpose` layers due to TensorFlow gradient problems
- `Agent`, `Environment` and `Runner` can now be imported via `from tensorforce import ...`
- New generic reshape layer available as `reshape`
- Support for batched version of `Agent.act` and `Agent.observe`
- Support for parallelized remote environments based on Python's `multiprocessing` and `socket` (replacing `tensorforce/contrib/socket_remote_env/` and `tensorforce/environments/environment_process_wrapper.py`), available via `Environment.create(...)`, `Runner(...)` and `run.py`
- Removed `ParallelRunner` and merged functionality with `Runner`
- Changed `run.py` arguments
- Changed independent mode for `Agent.act`: additional argument `internals` and corresponding return value, initial internals via `Agent.initial_internals()`, `Agent.reset()` not required anymore (see the sketch after this list)
- Removed `deterministic` argument for `Agent.act` unless in independent mode
- Added `format` argument to `save`/`load`/`restore` with supported formats `tensorflow`, `numpy` and `hdf5`
- Changed `save` argument `append_timestep` to `append` with default `None` (instead of `'timesteps'`)
- Added `get_variable` and `assign_variable` agent functions
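A sketch of the revised independent mode; the overall loop and the `independent` flag reflect a reading of the API rather than the release text itself:

```python
from tensorforce import Agent, Environment

environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

# Independent mode: internals are passed in and returned explicitly
states = environment.reset()
internals = agent.initial_internals()
terminal = False
while not terminal:
    actions, internals = agent.act(states=states, internals=internals, independent=True)
    states, terminal, reward = environment.execute(actions=actions)
```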
Tensorforce 0.5.3
- Added optional `memory` argument to various agents
- Improved summary labels, particularly `"entropy"` and `"kl-divergence"`
- `linear` layer now accepts tensors of rank 1 to 3
- Network output / distribution input does not need to be a vector anymore
- Transposed convolution layers (`conv1d/2d_transpose`)
- Parallel execution functionality contributed by @jerabaul29, currently under `tensorforce/contrib/`
- Accept string for runner `save_best_agent` argument to specify a best-model directory different from the `saver` configuration
- `saver` argument `steps` removed and `seconds` renamed to `frequency`
- Moved `Parallel`/`Runner` argument `max_episode_timesteps` from `run(...)` to constructor
- New `Environment.create(...)` argument `max_episode_timesteps` (see the sketch after this list)
- TensorFlow 2.0 support
- Improved TensorBoard summaries recording
- Summary labels `graph`, `variables` and `variables-histogram` temporarily not working
- TF-optimizers updated to TensorFlow 2.0 Keras optimizers
- Added TensorFlow Addons dependency, and support for TFA optimizers
- Changed unit of `target_sync_frequency` from timesteps to updates for `dqn` and `dueling_dqn` agent
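For instance (a sketch, assuming the Gym CartPole environment):

```python
from tensorforce import Agent, Environment, Runner

# max_episode_timesteps is now set once at environment creation rather than per run(...)
environment = Environment.create(environment='gym', level='CartPole-v1', max_episode_timesteps=500)
agent = Agent.create(agent='ppo', environment=environment, batch_size=10)

runner = Runner(agent=agent, environment=environment)
runner.run(num_episodes=100)
runner.close()
```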
Tensorforce 0.5.2
- Improved unittest performance
- Added `updates` and renamed `timesteps`/`episodes` counters for agents and runners
- Renamed `critic_{network,optimizer}` argument to `baseline_{network,optimizer}`
- Added Actor-Critic (`ac`), Advantage Actor-Critic (`a2c`) and Dueling DQN (`dueling_dqn`) agents
- Improved "same" baseline optimizer mode and added optional weight specification
- Reuse layer now global for parameter sharing across modules
- New block layer type (`block`) for easier sharing of layer blocks
- Renamed `PolicyAgent`/`-Model` to `TensorforceAgent`/`-Model`
- New `Agent.load(...)` function, saving includes agent specification
- Removed `PolicyAgent` argument `(baseline-)network`
- Added policy argument `temperature`
- Removed `"same"` and `"equal"` options for `baseline_*` arguments and changed internal baseline handling
- Combined `state/action_value` to `value` objective with argument `value` either `"state"` or `"action"`