Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google's TensorFlow framework and requires Python 3.

Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:

  • Modular component-based design: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at some cost of faithfully resembling details of the introducing paper.
  • Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
  • Full-on TensorFlow models: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.




A stable version of Tensorforce is periodically updated on PyPI and can be installed as follows:

pip3 install tensorforce

To always use the latest version of Tensorforce, install the GitHub version instead:

git clone https://github.com/tensorforce/tensorforce.git
pip3 install -e tensorforce

Note on installation on M1 Macs: At the moment, TensorFlow, which is a core dependency of Tensorforce, cannot be installed on M1 Macs directly. Follow the "M1 Macs" section in the documentation for a workaround.

Environments require additional packages, for which setup options are available (ale, gym, retro, vizdoom, carla; or envs for all environments). However, some environments require additional tools to be installed separately (see the environments documentation). Other setup options include tfa for TensorFlow Addons and tune for HpBandSter, which is required by the tune.py script.

Note on GPU usage: Unlike (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on environment and agent configuration. In particular, for environments with low-dimensional state spaces (i.e., no images), it is therefore worth trying to run on CPU only.
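One way to try CPU-only execution can be sketched as follows, assuming a CUDA-based GPU setup. The helper name hide_cuda_gpus is hypothetical and not part of Tensorforce; CUDA_VISIBLE_DEVICES is the standard CUDA environment variable, it has to be set before TensorFlow is imported, and it does not affect non-CUDA backends.

```python
import os


def hide_cuda_gpus():
    """Hide all CUDA devices from libraries that are imported afterwards.

    Hypothetical helper for illustration: call it before importing
    TensorFlow or Tensorforce, otherwise it has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    hide_cuda_gpus()
    # Import tensorflow / tensorforce only after this point.
    print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Comparing the time per timestep with and without this call gives a quick indication of whether the GPU helps for a given environment.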

Quickstart example code

from tensorforce import Agent, Environment

# Pre-defined or custom environment
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='tensorforce',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)

# Train for 300 episodes
for _ in range(300):

    # Initialize episode
    states = environment.reset()
    terminal = False

    while not terminal:
        # Episode timestep
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

# Release resources
agent.close()
environment.close()
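The act-observe loop above relies only on a small interface: reset() and execute() on the environment, act() and observe() on the agent. The following sketch illustrates that call order in pure Python. ToyEnvironment, RandomAgent and run_episode are hypothetical stand-ins written for this illustration; they are not Tensorforce classes and do no learning.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in: an episode ends after a fixed number of timesteps."""

    def __init__(self, max_episode_timesteps=5):
        self.max_episode_timesteps = max_episode_timesteps
        self.timestep = 0

    def reset(self):
        self.timestep = 0
        return 0.0  # initial state

    def execute(self, actions):
        self.timestep += 1
        states = float(self.timestep)
        terminal = self.timestep >= self.max_episode_timesteps
        reward = 1.0 if actions == 1 else 0.0
        return states, terminal, reward


class RandomAgent:
    """Hypothetical stand-in: acts randomly and only records observed rewards."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.rewards = []

    def act(self, states):
        return self.rng.randint(0, 1)

    def observe(self, terminal, reward):
        self.rewards.append(reward)


def run_episode(agent, environment):
    """Run one episode using the same call order as the quickstart loop."""
    states = environment.reset()
    terminal = False
    episode_return = 0.0
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
        episode_return += reward
    return episode_return


if __name__ == "__main__":
    print(run_episode(RandomAgent(), ToyEnvironment()))
```

Because the loop touches nothing beyond these four calls, the stand-ins can be replaced without changing the loop itself.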


Command line usage

Tensorforce comes with a range of example configurations for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular Proximal Policy Optimization (PPO) algorithm on the OpenAI Gym CartPole environment, execute the following line:

python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
    --level CartPole-v1 --episodes 100

For more information check out the documentation.


  • Network layers: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; plus support of Keras layers.
  • Network architecture: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register/retrieve layer, plus support for arbitrary architectures.
  • Memory types: Simple batch buffer memory, random replay memory.
  • Policy distributions: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.
  • Reward estimation: Configuration options for estimation horizon, future reward discount, state/state-action/advantage estimation, and for whether to consider terminal and horizon states.
  • Training objectives: (Deterministic) policy gradient, state-(action-)value approximation.
  • Optimization algorithms: Various gradient-based optimizers provided by TensorFlow like Adam/AdaDelta/RMSProp/etc, evolutionary optimizer, natural-gradient-based optimizer, plus a range of meta-optimizers.
  • Exploration: Randomized actions, sampling temperature, variable noise.
  • Preprocessing: Clipping, deltafier, sequence, image processing.
  • Regularization: L2 and entropy regularization.
  • Execution modes: Parallelized execution of multiple environments based on Python's multiprocessing and socket.
  • Optimized act-only SavedModel extraction.
  • TensorBoard support.
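To make the reward estimation entry in the list above more concrete, here is a minimal sketch of how an estimation horizon and a future reward discount combine into a return per timestep. The function discounted_returns is hypothetical and simplified: it does not bootstrap with a value estimate at the horizon, and it is not Tensorforce's implementation.

```python
def discounted_returns(rewards, discount=0.99, horizon=None):
    """Discounted sum of up to `horizon` future rewards for each timestep.

    Illustrative only: no value bootstrapping at the horizon, and the
    episode is assumed to end after the last reward.
    """
    n = len(rewards)
    if horizon is None:
        horizon = n
    returns = []
    for t in range(n):
        end = min(n, t + horizon)
        total = 0.0
        for k, reward in enumerate(rewards[t:end]):
            total += (discount ** k) * reward
        returns.append(total)
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], discount=0.5, horizon=2))
```

A shorter horizon limits how far into the future rewards are summed, while the discount controls how quickly their weight decays.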

By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated:

Note that, in general, the replication is not 100% faithful, since the models as described in the corresponding papers often involve additional minor tweaks and modifications which are hard to support with a modular design (and it is arguably questionable whether supporting them is important or desirable). On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce.

Environment adapters

  • Arcade Learning Environment, a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.
  • CARLA, an open-source simulator for autonomous driving research.
  • OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.
  • OpenAI Retro, which lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games.
  • OpenSim, reinforcement learning with musculoskeletal models.
  • PyGame Learning Environment, a learning environment which allows a quick start to reinforcement learning in Python.
  • ViZDoom, which allows developing AI bots that play Doom using only the visual information.

Support, feedback and donating

Please get in touch via mail or on Gitter if you have questions, feedback, ideas for features/collaboration, or if you seek support for applying Tensorforce to your problem.

If you want to support the Tensorforce core team (see below), please also consider donating: GitHub Sponsors or Liberapay.

Core team and contributors

Tensorforce is currently developed and maintained by Alexander Kuhnle.

Earlier versions of Tensorforce (<= 0.4.2) were developed by Michael Schaarschmidt, Alexander Kuhnle and Kai Fricke.

The advanced parallel execution functionality was originally contributed by Jean Rabault (@jerabaul29) and Vincent Belus (@vbelus). Moreover, the pretraining feature was largely developed in collaboration with Hongwei Tang (@thw1021) and Jean Rabault (@jerabaul29).

The CARLA environment wrapper is currently developed by Luca Anzalone (@luca96).

We are very grateful for our open-source contributors (listed according to GitHub, updated periodically):

Islandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers, mryellow, ImpulseAdventure, janislavjankov, andrewekhalel, HassamSheikh, skervim, beflix, coord-e, benelot, tms1337, vwxyzjn, erniejunior, Deathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko, AdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan.

Cite Tensorforce

Please cite the framework as follows:

@misc{tensorforce,
  author       = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},
  title        = {Tensorforce: a TensorFlow library for applied reinforcement learning},
  howpublished = {Web page},
  url          = {https://github.com/tensorforce/tensorforce},
  year         = {2017}
}

If you use the parallel execution functionality, please additionally cite it as follows:

@article{rabault2019accelerating,
  title        = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},
  author       = {Rabault, Jean and Kuhnle, Alexander},
  journal      = {Physics of Fluids},
  volume       = {31},
  number       = {9},
  pages        = {094105},
  year         = {2019},
  publisher    = {AIP Publishing}
}

If you use Tensorforce in your research, you may additionally consider citing the following paper:

@article{lift-tensorforce,
  author       = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},
  title        = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},
  journal      = {CoRR},
  volume       = {abs/1808.07903},
  year         = {2018},
  url          = {http://arxiv.org/abs/1808.07903},
  archivePrefix = {arXiv},
  eprint       = {1808.07903}
}

  • GPU integration for MacOs12.3 M1 Max

    I ran the Quickstart.py example and got the following error:

    Metal device set to: Apple M1 Max

    systemMemory: 32.00 GB maxCacheSize: 10.67 GB

    WARNING:root:Infinite min_value bound for state.
    Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]
    Traceback (most recent call last):
      File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 53, in <module>
        main()
      File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 46, in main
        runner.run(num_episodes=200)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 649, in run
        self.handle_act(parallel=n)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 697, in handle_act
        actions = self.agent.act(states=self.states[parallel], parallel=parallel)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 415, in act
        return super().act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/recorder.py", line 262, in act
        actions, internals = self.fn_act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 462, in fn_act
        actions, timesteps = self.model.act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/core/module.py", line 136, in decorated
        output_args = function_graphsstr(graph_params)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx.handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation agent/VerifyFinite/CheckNumerics: Could not satisfy explicit device specification '' because the node {{colocation_node agent/VerifyFinite/CheckNumerics}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    Identity: GPU CPU
    Switch: GPU CPU
    CheckNumerics: CPU
    _Arg: GPU CPU

    Colocation members, user-requested devices, and framework assigned devices, if any:
      args_0 (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      agent/VerifyFinite/CheckNumerics (CheckNumerics)
      agent/VerifyFinite/control_dependency (Identity)
      agent/assert_greater_equal/Assert/AssertGuard/args_0/_16 (Switch)
      agent/assert_less_equal/Assert/AssertGuard/args_0/_26 (Switch)
      Func/agent/StatefulPartitionedCall/input/_80 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
      Func/agent/assert_greater_equal/Assert/AssertGuard/then/_10/input/_153 (Identity)
      Func/agent/assert_greater_equal/Assert/AssertGuard/else/_11/input/_159 (Identity)
      Func/agent/assert_less_equal/Assert/AssertGuard/then/_20/input/_165 (Identity)
      Func/agent/assert_less_equal/Assert/AssertGuard/else/_21/input/_171 (Identity)
      Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/input/_260 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
      Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/linear_normalization0/PartitionedCall/input/_356 (Identity) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node agent/VerifyFinite/CheckNumerics}}]] [Op:__inference_act_1848]

    Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]

    I installed Tensorforce in a new Conda environment, following the M1 Mac instructions in this guide: https://tensorforce.readthedocs.io/en/latest/basics/installation.html

    I also had to upgrade numpy to 1.22 to run the code.

    My Conda environment is built as follows:

    Name Version Build Channel

    absl-py 1.2.0 pypi_0 pypi
    astunparse 1.6.3 pypi_0 pypi
    blas 1.0 openblas
    bzip2 1.0.8 h620ffc9_4
    c-ares 1.18.1 h1a28f6b_0
    ca-certificates 2022.07.19 hca03da5_0
    cachetools 5.2.0 pypi_0 pypi
    certifi 2022.6.15 py310hca03da5_0
    charset-normalizer 2.1.0 pypi_0 pypi
    cloudpickle 2.1.0 pypi_0 pypi
    cycler 0.11.0 pypi_0 pypi
    flatbuffers 1.12 pypi_0 pypi
    fonttools 4.34.4 pypi_0 pypi
    gast 0.4.0 pypi_0 pypi
    google-auth 2.10.0 pypi_0 pypi
    google-auth-oauthlib 0.4.6 pypi_0 pypi
    google-pasta 0.2.0 pypi_0 pypi
    grpcio 1.42.0 py310h95c9599_0
    gym 0.21.0 pypi_0 pypi
    h5py 3.6.0 py310h181c318_0
    hdf5 1.12.1 h160e8cb_2
    idna 3.3 pypi_0 pypi
    keras 2.9.0 pypi_0 pypi
    keras-preprocessing 1.1.2 pypi_0 pypi
    kiwisolver 1.4.4 pypi_0 pypi
    krb5 1.19.2 h3b8d789_0
    libclang 14.0.6 pypi_0 pypi
    libcurl 7.84.0 hc6d1d07_0
    libcxx 12.0.0 hf6beb65_1
    libedit 3.1.20210910 h1a28f6b_0
    libev 4.33 h1a28f6b_1
    libffi 3.4.2 hc377ac9_4
    libgfortran 5.0.0 11_2_0_he6877d6_26
    libgfortran5 11.2.0 he6877d6_26
    libnghttp2 1.46.0 h95c9599_0
    libopenblas 0.3.20 hea475bc_0
    libssh2 1.10.0 hf27765b_0
    llvm-openmp 12.0.0 haf9daa7_1
    markdown 3.4.1 pypi_0 pypi
    markupsafe 2.1.1 pypi_0 pypi
    matplotlib 3.5.1 pypi_0 pypi
    msgpack 1.0.3 pypi_0 pypi
    msgpack-numpy pypi_0 pypi
    ncurses 6.3 h1a28f6b_3
    numpy 1.22.0 pypi_0 pypi
    oauthlib 3.2.0 pypi_0 pypi
    openssl 1.1.1q h1a28f6b_0
    opt-einsum 3.3.0 pypi_0 pypi
    packaging 21.3 pypi_0 pypi
    pillow 9.2.0 pypi_0 pypi
    pip 22.1.2 py310hca03da5_0
    protobuf 3.19.4 pypi_0 pypi
    pyasn1 0.4.8 pypi_0 pypi
    pyasn1-modules 0.2.8 pypi_0 pypi
    pyparsing 3.0.9 pypi_0 pypi
    python 3.10.4 hbdb9e5c_0
    python-dateutil 2.8.2 pypi_0 pypi
    readline 8.1.2 h1a28f6b_1
    requests 2.28.1 pypi_0 pypi
    requests-oauthlib 1.3.1 pypi_0 pypi
    rsa 4.9 pypi_0 pypi
    setuptools 61.2.0 py310hca03da5_0
    six 1.15.0 pypi_0 pypi
    sqlite 3.39.2 h1058600_0
    tensorboard 2.9.1 pypi_0 pypi
    tensorboard-data-server 0.6.1 pypi_0 pypi
    tensorboard-plugin-wit 1.8.1 pypi_0 pypi
    tensorflow-deps 2.8.0 0 apple
    tensorflow-estimator 2.9.0 pypi_0 pypi
    tensorflow-macos 2.9.2 pypi_0 pypi
    tensorflow-metal 0.5.0 pypi_0 pypi
    tensorforce 0.6.5 pypi_0 pypi
    termcolor 1.1.0 pypi_0 pypi
    tk 8.6.12 hb8d0fd4_0
    tqdm 4.62.3 pypi_0 pypi
    typing-extensions 4.3.0 pypi_0 pypi
    tzdata 2022a hda174b7_0
    urllib3 1.26.11 pypi_0 pypi
    werkzeug 2.2.2 pypi_0 pypi
    wheel 0.37.1 pyhd3eb1b0_0
    wrapt 1.14.1 pypi_0 pypi
    xz 5.2.5 h1a28f6b_1
    zlib 1.2.12 h5a0b063_2

    Is there any way to address this issue? I also tried downgrading Python to 3.9, which did not work. Is macOS not supposed to be supported when using tensorflow-metal?

    Thank you.

    opened by doric35 0
  • Bump mistune from 0.8.4 to 2.0.3 in /docs

    Bumps mistune from 0.8.4 to 2.0.3.

    Release notes

    Sourced from mistune's releases.

    Version 2.0.2

    Fix escape_url via lepture/mistune#295

    Version 2.0.1

    Fix XSS for image link syntax.

    Version 2.0.0

    First release of Mistune v2.

    Version 2.0.0 RC1

    In this release, we have a Security Fix for harmful links.

    Version 2.0.0 Alpha 1

    This is the first release of v2. An alpha version for users to have a preview of the new mistune.


    Sourced from mistune's changelog.


    Here is the full history of mistune v2.

    Version 2.0.4

    Released on Jul 15, 2022
    • Fix url plugin in <a> tag
    • Fix * formatting

    Version 2.0.3

    Released on Jun 27, 2022

    • Fix table plugin
    • Security fix for CVE-2022-34749

    Version 2.0.2

    Released on Jan 14, 2022

    Fix escape_url

    Version 2.0.1

    Released on Dec 30, 2021

    XSS fix for image link syntax.

    Version 2.0.0

    Released on Dec 5, 2021

    This is the first non-alpha release of mistune v2.

    Version 2.0.0rc1

    Released on Feb 16, 2021

    Version 2.0.0a6


    ... (truncated)

    • 3f422f1 Version bump 2.0.3
    • a6d4321 Fix asteris emphasis regex CVE-2022-34749
    • 5638e46 Merge pull request #307 from jieter/patch-1
    • 0eba471 Fix typo in guide.rst
    • 61e9337 Fix table plugin
    • 76dec68 Add documentation for renderer heading when TOC enabled
    • 799cd11 Version bump 2.0.2
    • babb0cf Merge pull request #295 from dairiki/bug.escape_url
    • fc2cd53 Make mistune.util.escape_url less aggressive
    • 3e8d352 Version bump 2.0.1
    • Additional commits viewable in compare view

    opened by dependabot[bot] 0
  • 0.6.5 (Aug 30, 2021)

    • Renamed agent argument reward_preprocessing to reward_processing, and in case of Tensorforce agent moved to reward_estimation[reward_processing]
    • New categorical distribution argument skip_linear to not add the implicit linear logits layer
    • Support for multi-actor parallel environments via new function Environment.num_actors()
      • Runner uses multi-actor parallelism by default if environment is multi-actor
    • New optional Environment function episode_return() which returns the true return of the last episode, if cumulative sum of environment rewards is not a good metric for runner display
    • New vectorized_environment.py and multiactor_environment.py scripts to illustrate how to set up a vectorized/multi-actor environment.
  • 0.6.4 (Jun 5, 2021)

    • Agent argument update_frequency / update[frequency] now supports float values > 0.0, which specify the update-frequency relative to the batch-size
    • Changed default value for argument update_frequency from 1.0 to 0.25 for DQN, DoubleDQN, DuelingDQN agents
    • New argument return_processing and advantage_processing (where applicable) for all agent sub-types
    • New function Agent.get_specification() which returns the agent specification as dictionary
    • New function Agent.get_architecture() which returns a string representation of the network layer architecture
    • Improved and simplified module specification, for instance: network=my_module instead of network=my_module.TestNetwork, or environment=envs.custom_env instead of environment=envs.custom_env.CustomEnvironment (module file needs to be in the same directory or a sub-directory)
    • New argument single_output=True for some policy types which, if False, allows the specification of additional network outputs for some/all actions via registered tensors
    • KerasNetwork argument model now supports arbitrary functions as long as they return a tf.keras.Model
    • New layer type SelfAttention (specification key: self_attention)
    • Support tracking of non-constant parameter values
    • Rename attribute episode_rewards as episode_returns, and TQDM status reward as return
    • Extend argument agent to support Agent.load() keyword arguments to load an existing agent instead of creating a new one.
    • Added action_masking.py example script to illustrate an environment implementation with built-in action masking.
    Bugfixes:
    • Customized device placement was not applied to most tensors
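The relative update frequency mentioned in this release can be illustrated with a small calculation. The sketch below is an assumption about how such a value could be turned into an update interval: the function name and the rounding rule are hypothetical, and Tensorforce's actual handling may differ.

```python
def units_between_updates(batch_size, update_frequency):
    """Convert a relative update frequency into an interval in update units.

    Assumption for illustration: the product is rounded and never falls
    below 1; Tensorforce's own rounding is not specified here.
    """
    if update_frequency <= 0.0:
        raise ValueError("update_frequency must be > 0.0")
    return max(1, round(batch_size * update_frequency))


if __name__ == "__main__":
    # Under the stated assumption, batch size 64 with frequency 0.25
    # corresponds to an update every 16 units.
    print(units_between_updates(batch_size=64, update_frequency=0.25))
```

Under this reading, the new DQN default of 0.25 means four updates per batch-size worth of collected units, compared with one update for the previous default of 1.0.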
  • 0.6.3 (Mar 22, 2021)

    • New agent argument tracking and corresponding function tracked_tensors() to track and retrieve the current value of predefined tensors, similar to summarizer for TensorBoard summaries
    • New experimental values trace_decay and gae_decay for Tensorforce agent argument reward_estimation, soon for other agent types as well
    • New options "early" and "late" for value estimate_advantage of Tensorforce agent argument reward_estimation
    • Changed default value for Agent.act() argument deterministic from False to True
    • New network type KerasNetwork (specification key: keras) as wrapper for networks specified as Keras model
    • Passing a Keras model class/object as policy/network argument is automatically interpreted as KerasNetwork
    • Changed Gaussian distribution argument global_stddev=False to stddev_mode='predicted'
    • New Categorical distribution argument temperature_mode=None
    • New option for Function layer argument function to pass string function expression with argument "x", e.g. "(x+1.0)/2.0"
    • New summary episode-length recorded as part of summary label "reward"
    • Support for vectorized parallel environments via new function Environment.is_vectorizable() and new argument num_parallel for Environment.reset()
      • See tensorforce/environments.cartpole.py for a vectorizable environment example
      • Runner uses vectorized parallelism by default if num_parallel > 1, remote=None and environment supports vectorization
      • See examples/act_observe_vectorized.py for more details on act-observe interaction
    • New extended and vectorizable custom CartPole environment via key custom_cartpole (work in progress)
    • New environment argument reward_shaping to provide a simple way to modify/shape rewards of an environment, can be specified either as callable or string function expression
    run.py script:
    • New option for command line arguments --checkpoints and --summaries to add comma-separated checkpoint/summary filename in addition to directory
    • Added episode lengths to logging plot besides episode returns
    Bugfixes:
    • Temporal horizon handling of RNN layers
    • Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
    • GPU problems with scatter operations
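The string function expressions introduced in this release (for the Function layer argument function and the environment argument reward_shaping) take the form of an expression in the variable "x". As a rough sketch of the idea, and not necessarily how Tensorforce evaluates such strings, an expression can be turned into a callable like this; compile_expression is a hypothetical helper.

```python
def compile_expression(expression):
    """Turn a string expression in the variable `x` into a callable.

    Sketch only: eval with builtins disabled is not a security sandbox,
    so this must not be used with untrusted input.
    """
    return eval("lambda x: " + expression, {"__builtins__": {}})


if __name__ == "__main__":
    rescale = compile_expression("(x+1.0)/2.0")
    print(rescale(1.0))
```

The example expression "(x+1.0)/2.0" from the release notes maps the range [-1, 1] onto [0, 1].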
  • 0.6.2 (Oct 3, 2020)

  • 0.6.1 (Sep 19, 2020)

    • Removed default value "adam" for Tensorforce agent argument optimizer (since default optimizer argument learning_rate removed, see below)
    • Removed option "minimum" for Tensorforce agent argument memory, use None instead
    • Changed default value for dqn/double_dqn/dueling_dqn agent argument huber_loss from 0.0 to None
    • Removed default value 0.999 for exponential_normalization layer argument decay
    • Added new layer batch_normalization (generally should only be used for the agent arguments reward_processing[return_processing] and reward_processing[advantage_processing])
    • Added exponential/instance_normalization layer argument only_mean with default False
    • Added exponential/instance_normalization layer argument min_variance with default 1e-4
    • Removed default value 1e-3 for optimizer argument learning_rate
    • Changed default value for optimizer argument gradient_norm_clipping from 1.0 to None (no gradient clipping)
    • Added new optimizer doublecheck_step and corresponding argument doublecheck_update for optimizer wrapper
    • Removed linesearch_step optimizer argument accept_ratio
    • Removed natural_gradient optimizer argument return_improvement_estimate
    • Added option to specify agent argument saver as string, which is interpreted as saver[directory] with otherwise default values
    • Added default value for agent argument saver[frequency] as 10 (save model every 10 updates by default)
    • Changed default value of agent argument saver[max_checkpoints] from 5 to 10
    • Added option to specify agent argument summarizer as string, which is interpreted as summarizer[directory] with otherwise default values
    • Renamed option of agent argument summarizer from summarizer[labels] to summarizer[summaries] (use of the term "label" due to earlier version, outdated and confusing by now)
    • Changed interpretation of agent argument summarizer[summaries] = "all" to include only numerical summaries, so all summaries except "graph"
    • Changed default value of agent argument summarizer[summaries] from ["graph"] to "all"
    • Changed default value of agent argument summarizer[max_summaries] from 5 to 7 (number of different colors in TensorBoard)
    • Added option summarizer[filename] to agent argument summarizer
    • Added option to specify agent argument recorder as string, which is interpreted as recorder[directory] with otherwise default values
    run.py script:
    • Added --checkpoints/--summaries/--recordings command line argument to enable saver/summarizer/recorder agent argument specification separate from core agent configuration
    • Added save_load_agent.py example script to illustrate regular agent saving and loading
    • Fixed problem with optimizer argument gradient_norm_clipping not being applied correctly
    • Fixed problem with exponential_normalization layer not updating moving mean and variance correctly
    • Fixed problem with recent memory for timestep-based updates sometimes sampling invalid memory indices
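The string shorthand for the saver argument described in this release can be pictured as a small normalization step. The following sketch uses a hypothetical helper, normalize_saver, which is not Tensorforce code; the default values are the ones listed in the notes above (frequency 10, max_checkpoints 10).

```python
def normalize_saver(saver):
    """Expand the string shorthand into a dict with the defaults listed above.

    Hypothetical helper for illustration, not Tensorforce code.
    """
    defaults = dict(frequency=10, max_checkpoints=10)
    if saver is None:
        return None
    if isinstance(saver, str):
        # A plain string is interpreted as the checkpoint directory.
        saver = dict(directory=saver)
    return {**defaults, **saver}


if __name__ == "__main__":
    print(normalize_saver("checkpoints"))
```

The same pattern would apply to the summarizer and recorder shorthands, each with its own defaults.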
  • 0.6.0 (Aug 30, 2020)

    • Removed agent arguments execution, buffer_observe, seed
    • Renamed agent arguments baseline_policy/baseline_network/critic_network to baseline/critic
    • Renamed agent reward_estimation arguments estimate_horizon to predict_horizon_values, estimate_actions to predict_action_values, estimate_terminal to predict_terminal_values
    • Renamed agent argument preprocessing to state_preprocessing
    • Default agent preprocessing linear_normalization
    • Moved agent arguments for reward/return/advantage processing from preprocessing to reward_preprocessing and reward_estimation[return_/advantage_processing]
    • New agent argument config with values buffer_observe, enable_int_action_masking, seed
    • Renamed PPO/TRPO/DPG argument critic_network/_optimizer to baseline/baseline_optimizer
    • Renamed PPO argument optimization_steps to multi_step
    • New TRPO argument subsampling_fraction
    • Changed agent argument use_beta_distribution default to false
    • Added double DQN agent (double_dqn)
    • Removed Agent.act() argument evaluation
    • Removed agent function arguments query (functionality removed)
    • Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): save/load functions and saver argument changed
    • Default behavior when specifying saver is not to load agent, unless agent is created via Agent.load
    • Agent summarizer functionality changed: summarizer argument changed, some summary labels and other options removed
    • Renamed RNN layers internal_{rnn/lstm/gru} to rnn/lstm/gru and rnn/lstm/gru to input_{rnn/lstm/gru}
    • Renamed auto network argument internal_rnn to rnn
    • Renamed (internal_)rnn/lstm/gru layer argument length to horizon
    • Renamed update_modifier_wrapper to optimizer_wrapper
    • Renamed optimizing_step to linesearch_step, and UpdateModifierWrapper argument optimizing_iterations to linesearch_iterations
    • Optimizer subsampling_step accepts both absolute (int) and relative (float) fractions
    • Objective policy_gradient argument ratio_based renamed to importance_sampling
    • Added objectives state_value and action_value
    • Added Gaussian distribution arguments global_stddev and bounded_transform (for improved bounded action space handling)
    • Changed default memory device argument to CPU:0
    • Renamed rewards summaries
    • Agent.create() accepts act-function as agent argument for recording
    • Singleton states and actions are now consistently handled as singletons
    • Major change to policy handling and defaults, in particular parametrized_distributions, new default policies parametrized_state/action_value
    • Combined long and int type
    • Always wrap environment in EnvironmentWrapper class
    • Changed tune.py arguments
  • 0.5.5 (Jun 16, 2020)

    • Changed independent mode of agent.act to use final values of dynamic hyperparameters and avoid TensorFlow conditions
    • Extended "tensorflow" format of agent.save to include an optimized Protobuf model with an act-only graph as .pb file, and Agent.load format "pb-actonly" to load act-only agent based on Protobuf model
    • Support for custom summaries via new summarizer argument value custom to specify summary type, and Agent.summarize(...) to record summary values
    • Added min/max-bounds for dynamic hyperparameters to assert valid range and infer other arguments
    • Argument batch_size now mandatory for all agent classes
    • Removed Estimator argument capacity, now always automatically inferred
    • Internal changes related to agent arguments memory, update and reward_estimation
    • Changed the default bias and activation argument of some layers
    • Fixed issues with sequence preprocessor
    • DQN and dueling DQN properly constrained to int actions only
    • Added use_beta_distribution argument with default True to many agents and ParametrizedDistributions policy, so default can be changed
  • 0.5.4(Feb 15, 2020)

    • DQN/DuelingDQN/DPG argument memory now required to be specified explicitly, plus update_frequency default changed
    • Removed (temporarily) conv1d/conv2d_transpose layers due to TensorFlow gradient problems
    • Agent, Environment and Runner can now be imported via from tensorforce import ...
    • New generic reshape layer available as reshape
    • Support for batched version of Agent.act and Agent.observe
    • Support for parallelized remote environments based on Python's multiprocessing and socket (replacing tensorforce/contrib/socket_remote_env/ and tensorforce/environments/environment_process_wrapper.py), available via Environment.create(...), Runner(...) and run.py
    • Removed ParallelRunner and merged functionality with Runner
    • Changed run.py arguments
    • Changed independent mode for Agent.act: additional argument internals and corresponding return value, initial internals via Agent.initial_internals(), Agent.reset() not required anymore
    • Removed deterministic argument for Agent.act unless independent mode
    • Added format argument to save/load/restore with supported formats tensorflow, numpy and hdf5
    • Changed save argument append_timestep to append with default None (instead of 'timesteps')
    • Added get_variable and assign_variable agent functions
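
The changed independent mode of Agent.act can be sketched as an act-only evaluation loop. The helper name evaluate is hypothetical, and the sketch assumes an agent and environment that expose the methods named in these notes (Agent.initial_internals(), Agent.act(...) with internals, independent and deterministic) together with Environment.reset() and Environment.execute(...). Exact signatures may differ between versions, so treat this as an illustration rather than a reference.

```python
def evaluate(agent, environment, num_episodes=10):
    """Run act-only episodes and return the mean episode return.

    Hypothetical helper: agent and environment are assumed to follow the
    Tensorforce-style interface described in the release notes.
    """
    total_reward = 0.0
    for _ in range(num_episodes):
        states = environment.reset()
        # Independent mode: internals are passed in and returned explicitly,
        # so no Agent.reset() or Agent.observe() call is involved here.
        internals = agent.initial_internals()
        terminal = False
        while not terminal:
            actions, internals = agent.act(
                states=states, internals=internals,
                independent=True, deterministic=True
            )
            states, terminal, reward = environment.execute(actions=actions)
            total_reward += reward
    return total_reward / num_episodes
```

Because the loop never calls Agent.observe(), it leaves the agent's training state untouched, which is why it suits evaluation runs between training phases.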
  • 0.5.3(Dec 26, 2019)

    • Added optional memory argument to various agents
    • Improved summary labels, particularly "entropy" and "kl-divergence"
    • linear layer now accepts tensors of rank 1 to 3
    • Network output / distribution input does not need to be a vector anymore
    • Added transposed convolution layers (conv1d/2d_transpose)
    • Parallel execution functionality contributed by @jerabaul29, currently under tensorforce/contrib/
    • Accept string for runner save_best_agent argument to specify best model directory different from saver configuration
    • saver argument steps removed and seconds renamed to frequency
    • Moved Parallel/Runner argument max_episode_timesteps from run(...) to constructor
    • New Environment.create(...) argument max_episode_timesteps
    • TensorFlow 2.0 support
    • Improved Tensorboard summaries recording
    • Summary labels graph, variables and variables-histogram temporarily not working
    • TF-optimizers updated to TensorFlow 2.0 Keras optimizers
    • Added TensorFlow Addons dependency, and support for TFA optimizers
    • Changed unit of target_sync_frequency from timesteps to updates for dqn and dueling_dqn agent
  • 0.5.2(Oct 14, 2019)

    • Improved unittest performance
    • Added updates and renamed timesteps/episodes counter for agents and runners
    • Renamed critic_{network,optimizer} argument to baseline_{network,optimizer}
    • Added Actor-Critic (ac), Advantage Actor-Critic (a2c) and Dueling DQN (dueling_dqn) agents
    • Improved "same" baseline optimizer mode and added optional weight specification
    • Reuse layer now global for parameter sharing across modules
    • New block layer type (block) for easier sharing of layer blocks
    • Renamed PolicyAgent/-Model to TensorforceAgent/-Model
    • New Agent.load(...) function, saving includes agent specification
    • Removed PolicyAgent argument (baseline-)network
    • Added policy argument temperature
    • Removed "same" and "equal" options for baseline_* arguments and changed internal baseline handling
    • Combined state/action_value to value objective with argument value either "state" or "action"
  • 0.5.1(Sep 10, 2019)

  • 0.5.0(Sep 8, 2019)

    Major Revision

    • DQFDAgent removed (temporarily)
    • DQNNstepAgent and NAFAgent part of DQNAgent
    • Agents need to be initialized via agent.initialize() before application
    • States/actions of type int require an entry num_values (instead of num_actions)
    • Agent.from_spec() changed and renamed to Agent.create()
    • Agent.act() argument fetch_tensors changed and renamed to query, index renamed to parallel, buffered removed
    • Agent.observe() argument index renamed to parallel
    • Agent.atomic_observe() removed
    • Agent.save/restore_model() renamed to Agent.save/restore()
    Agent arguments:
    • update_mode renamed to update
    • states_preprocessing and reward_preprocessing changed and combined to preprocessing
    • actions_exploration changed and renamed to exploration
    • execution entry num_parallel replaced by a separate argument parallel_interactions
    • batched_observe and batching_capacity replaced by argument buffer_observe
    • scope renamed to name
    DQNAgent arguments:
    • update_mode replaced by batch_size, update_frequency and start_updating
    • optimizer removed, implicitly defined as 'adam', learning_rate added
    • memory defines capacity of implicitly defined memory 'replay'
    • double_q_model removed (temporarily)
    Policy gradient agent arguments:
    • New mandatory argument max_episode_timesteps
    • update_mode replaced by batch_size and update_frequency
    • memory removed
    • baseline_mode removed
    • baseline argument changed and renamed to critic_network
    • baseline_optimizer renamed to critic_optimizer
    • gae_lambda removed (temporarily)
    PPOAgent arguments:
    • step_optimizer removed, implicitly defined as 'adam', learning_rate added
    TRPOAgent arguments:
    • cg_* and ls_* arguments removed
    VPGAgent arguments:
    • optimizer removed, implicitly defined as 'adam', learning_rate added
    Environment:
    • Environment properties states and actions are now functions states() and actions()
    • States/actions of type int require an entry num_values (instead of num_actions)
    • New function Environment.max_episode_timesteps()
    Contrib environments:
    • ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to tensorforce.environments
    • Other environment implementations removed (may be upgraded in the future)
    Runner:
    • Improved run() API for Runner and ParallelRunner
    • ThreadedRunner removed
    Other:
    • examples folder (including configs) removed, apart from quickstart.py
    • New benchmarks folder to replace parts of old examples folder
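
For configurations written before 0.5.0, the plain renames listed above can be applied mechanically. The sketch below is not part of Tensorforce: the helper names migrate_agent_arguments and migrate_int_spec are hypothetical, and only one-to-one renames are covered. Arguments described as "changed", "replaced" or "removed" still need manual attention.

```python
# One-to-one renames taken from the 0.5.0 notes. Arguments described as
# "changed and renamed" may also need their values adapted by hand.
RENAMED_AGENT_ARGUMENTS = {
    "update_mode": "update",
    "actions_exploration": "exploration",
    "scope": "name",
}


def migrate_agent_arguments(old_arguments):
    """Return a copy of an agent-argument dict with pre-0.5.0 keys renamed."""
    return {
        RENAMED_AGENT_ARGUMENTS.get(key, key): value
        for key, value in old_arguments.items()
    }


def migrate_int_spec(spec):
    """Rename num_actions to num_values in an int state/action spec."""
    migrated = dict(spec)
    if migrated.get("type") == "int" and "num_actions" in migrated:
        migrated["num_values"] = migrated.pop("num_actions")
    return migrated


print(migrate_agent_arguments({"scope": "ppo", "discount": 0.99}))
print(migrate_int_spec({"type": "int", "num_actions": 4}))
```

Both helpers return new dictionaries and leave their inputs unchanged, so an old configuration can be kept alongside the migrated one for comparison.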
