Tensorforce: a TensorFlow library for applied reinforcement learning

Overview

Introduction

Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google's TensorFlow framework and requires Python 3.

Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:

  • Modular component-based design: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at some cost of faithfully resembling details of the introducing paper.
  • Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
  • Full-on TensorFlow models: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models (see the sketch after this list).
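
For instance, a trained agent can be exported as a TensorFlow SavedModel for act-only deployment. A minimal sketch, assuming an agent instance as in the quickstart below; the format value passed to Agent.save is based on the documentation and may differ between versions:

# Export the act-only computation graph as a TensorFlow SavedModel
# ('saved-model' as format value is an assumption based on the docs)
agent.save(directory='saved-model-export', format='saved-model')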

Installation

A stable version of Tensorforce is periodically published on PyPI and can be installed as follows:

pip3 install tensorforce

To always use the latest version of Tensorforce, install the GitHub version instead:

git clone https://github.com/tensorforce/tensorforce.git
pip3 install -e tensorforce

Environments require additional packages, for which setup options are available (ale, gym, retro, vizdoom, carla; or envs for all environments); note that some environments also require additional tools to be installed separately (see the environments documentation). Other setup options include tfa for TensorFlow Addons and tune for HpBandSter, which is required for the tune.py script.
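
For instance, the Gym dependencies, or all environment dependencies at once, should be installable via the usual pip extras syntax (extras names as listed above):

pip3 install tensorforce[gym]
pip3 install tensorforce[envs]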

Note on GPU usage: Unlike (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on the environment and agent configuration. In particular, for environments with low-dimensional state spaces (i.e., no images), it is often worth trying to run on CPU only.
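
One way to force CPU-only execution is to hide the GPU via TensorFlow's standard device configuration before creating the agent; this is a minimal sketch using plain TensorFlow, not a Tensorforce-specific option:

import tensorflow as tf

# Hide all GPUs from TensorFlow so that the agent's graph runs on CPU only
tf.config.set_visible_devices([], 'GPU')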

Quickstart example code

from tensorforce import Agent, Environment

# Pre-defined or custom environment
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='tensorforce',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)

# Train for 300 episodes
for _ in range(300):

    # Initialize episode
    states = environment.reset()
    terminal = False

    while not terminal:
        # Episode timestep
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
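
# A minimal evaluation sketch, following the independent-act pattern from the
# Tensorforce documentation (argument details may differ between versions):
sum_rewards = 0.0
for _ in range(100):
    states = environment.reset()
    internals = agent.initial_internals()
    terminal = False
    while not terminal:
        actions, internals = agent.act(
            states=states, internals=internals, independent=True, deterministic=True
        )
        states, terminal, reward = environment.execute(actions=actions)
        sum_rewards += reward
print('Mean evaluation return:', sum_rewards / 100.0)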

agent.close()
environment.close()

Command line usage

Tensorforce comes with a range of example configurations for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular Proximal Policy Optimization (PPO) algorithm on the OpenAI Gym CartPole environment, execute the following line:

python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
    --level CartPole-v1 --episodes 100

For more information check out the documentation.
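
The same run can also be expressed programmatically via the Runner execution utility; a minimal sketch, assuming a repository checkout so that the benchmark config path above exists:

from tensorforce import Runner

runner = Runner(
    agent='benchmarks/configs/ppo.json',
    environment=dict(environment='gym', level='CartPole-v1')
)
runner.run(num_episodes=100)
runner.close()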

Features

  • Network layers: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; plus support of Keras layers.
  • Network architecture: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register/retrieve layers, plus support for arbitrary architectures (see the sketch after this list).
  • Memory types: Simple batch buffer memory, random replay memory.
  • Policy distributions: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.
  • Reward estimation: Configuration options for estimation horizon, future reward discount, state/state-action/advantage estimation, and for whether to consider terminal and horizon states.
  • Training objectives: (Deterministic) policy gradient, state-(action-)value approximation.
  • Optimization algorithms: Various gradient-based optimizers provided by TensorFlow, such as Adam, AdaDelta and RMSProp; an evolutionary optimizer; a natural-gradient-based optimizer; plus a range of meta-optimizers.
  • Exploration: Randomized actions, sampling temperature, variable noise.
  • Preprocessing: Clipping, deltafier, sequence, image processing.
  • Regularization: L2 and entropy regularization.
  • Execution modes: Parallelized execution of multiple environments based on Python's multiprocessing and socket.
  • Optimized act-only SavedModel extraction.
  • TensorBoard support.
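
As an illustration of the register/retrieve mechanism, here is a minimal multi-input network sketch modeled on the patterns in the documentation; the state names ('image', 'vector') and layer sizes are made up for illustration:

network = [
    [
        dict(type='retrieve', tensors=['image']),
        dict(type='conv2d', size=32),
        dict(type='flatten'),
        dict(type='register', tensor='image-embedding')
    ],
    [
        dict(type='retrieve', tensors=['vector']),
        dict(type='dense', size=32),
        dict(type='register', tensor='vector-embedding')
    ],
    [
        dict(type='retrieve', tensors=['image-embedding', 'vector-embedding'], aggregation='concat'),
        dict(type='dense', size=64)
    ]
]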

By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated:

Note that, in general, the replication is not 100% faithful, since the models as described in the corresponding papers often involve additional minor tweaks and modifications which are hard to support with a modular design (and it is arguably questionable whether supporting them is important or desirable). On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce.
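
For example, starting from the quickstart agent above, exploration and regularization components from the feature list can simply be mixed in; a rough sketch, with the added argument values chosen arbitrarily for illustration:

agent = Agent.create(
    agent='tensorforce',
    environment=environment,
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20),
    # additional modular components (values for illustration only)
    exploration=0.1,
    variable_noise=0.0,
    l2_regularization=0.0,
    entropy_regularization=0.01
)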

Environment adapters

  • Arcade Learning Environment, a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.
  • CARLA, an open-source simulator for autonomous driving research.
  • OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.
  • OpenAI Retro, which lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games.
  • OpenSim, reinforcement learning with musculoskeletal models.
  • PyGame Learning Environment, a learning environment which allows a quick start to reinforcement learning in Python.
  • ViZDoom, which allows developing AI bots that play Doom using only the visual information.

Support, feedback and donating

Please get in touch via email or on Gitter if you have questions, feedback, ideas for features or collaboration, or if you seek support for applying Tensorforce to your problem.

If you want to support the Tensorforce core team (see below), please also consider donating: GitHub Sponsors or Liberapay.

Core team and contributors

Tensorforce is currently developed and maintained by Alexander Kuhnle.

Earlier versions of Tensorforce (<= 0.4.2) were developed by Michael Schaarschmidt, Alexander Kuhnle and Kai Fricke.

The advanced parallel execution functionality was originally contributed by Jean Rabault (@jerabaul29) and Vincent Belus (@vbelus). Moreover, the pretraining feature was largely developed in collaboration with Hongwei Tang (@thw1021) and Jean Rabault (@jerabaul29).

The CARLA environment wrapper is currently developed by Luca Anzalone (@luca96).

We are very grateful for our open-source contributors (listed according to GitHub, updated periodically):

Islandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers, mryellow, ImpulseAdventure, janislavjankov, andrewekhalel, HassamSheikh, skervim, beflix, coord-e, benelot, tms1337, vwxyzjn, erniejunior, Deathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko, AdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan.

Cite Tensorforce

Please cite the framework as follows:

@misc{tensorforce,
  author       = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},
  title        = {Tensorforce: a TensorFlow library for applied reinforcement learning},
  howpublished = {Web page},
  url          = {https://github.com/tensorforce/tensorforce},
  year         = {2017}
}

If you use the parallel execution functionality, please additionally cite it as follows:

@article{rabault2019accelerating,
  title        = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},
  author       = {Rabault, Jean and Kuhnle, Alexander},
  journal      = {Physics of Fluids},
  volume       = {31},
  number       = {9},
  pages        = {094105},
  year         = {2019},
  publisher    = {AIP Publishing}
}

If you use Tensorforce in your research, you may additionally consider citing the following paper:

@article{lift-tensorforce,
  author       = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},
  title        = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},
  journal      = {CoRR},
  volume       = {abs/1808.07903},
  year         = {2018},
  url          = {http://arxiv.org/abs/1808.07903},
  archivePrefix = {arXiv},
  eprint       = {1808.07903}
}
Comments
  • Unable to train for many episodes: RAM usage too high!

    Unable to train for many episodes: RAM usage too high!

    Hi @AlexKuhnle, I have some trouble training a PPO agent. Basically, I'm able to train it for only very few episodes (e.g. 4 or 8). If I increase the number of episodes, my laptop will crash or freeze due to running out of RAM.

    I have a Linux machine with 16 GB of RAM, TensorFlow 2.1.0 (CPU-only) and Tensorforce 0.5.4. The agent I'm trying to train is defined as follows:

    policy_network = dict(type='auto', size=64, depth=2, 
                          final_size=256, final_depth=2, internal_rnn=10)
            
    agent = Agent.create(
                agent='ppo', 
                environment=environment, 
                max_episode_timesteps=200,
                network=policy_network,
                # Optimization
                batch_size=4, 
                update_frequency=1, 
                learning_rate=1e-3, 
                subsampling_fraction=0.5,
                optimization_steps=5,
                # Reward estimation
                likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False,
                # Critic
                critic_network='auto',
                critic_optimizer=dict(optimizer='adam', multi_step=10, learning_rate=1e-3),
                # Exploration
                exploration=0.0, variable_noise=0.0,
                # Regularization
                l2_regularization=0.0, entropy_regularization=0.0,
            )
    

    The environment is a custom one I've made: it has a complex state space (i.e. an image and some feature vectors), and a simple action space (i.e. five float actions).

    I use a Runner to train the agent:

    runner = Runner(agent, environment, max_episode_timesteps=200, num_parallel=None)
    runner.run(num_episodes=100)
    

    As you can see from the above code snippet, I'd like to train my agent for (at least) 100 episodes but the system crashes after completing episode 4.

    I noticed that, during training, every batch_size episodes (4 in my case) Tensorforce allocates an additional 6-7 GB of RAM, which causes the system to crash: my OS uses 1 GB, the environment simulator 2-3 GB, and the agent plus environment another 3-4 GB.

    This is what happens (slightly before freezing): memory_issue_tensorforce

    Is this behaviour normal? Just to be sure, I tested a similar (but simpler) agent on the CartPole environment for 300 episodes, and it works fine with very little memory overhead. How is that possible?

    Thank you in advance.

    opened by Luca96 36
  • Custom network and layer freezing

    Custom network and layer freezing

    Hi,

    I want to build a custom environment in which an action would be a 2D matrix (basically a b/w image), and one of the solutions I found is to use a policy-based algorithm such as PPO with a policy network having layers of deconvolutions (I would probably use a U-net).

    I first intended to use baselines, but I want the output of my network to match the action pixel-wise (the output at position (x,y) is used for the value at position (x,y) in the action), which I believe is not the case in the PPO2 implementation of baselines, where there is a fully-connected layer before the output of the network becomes the parameters of a probability distribution from which the action is sampled.

    Would it be possible to simply write the U-net architecture as a dictionary in your implementation and have it work the way I want, given that the action space and network output shape match, or am I missing something?

    Also, is it possible to freeze layers of the network and/or use a pre-trained network?

    I read through the documentation, but some of my questions probably have an obvious answer somewhere in the repo, sorry for that!

    opened by vbelus 29
  • Quickstart example gets stuck [GPU]

    Quickstart example gets stuck [GPU]

    Hi,

    I just installed tensorforce (from pip) with tensorflow-gpu 1.7 and tried to run example/quickstart.py. The training starts but then gets stuck after n episodes, where n is the minimum of the batch_size and frequency values of the update_mode argument of PPOAgent.

    update_mode=dict(
        unit='episodes',
        # 20 episodes per update
        batch_size=20,
        # every 20 episodes
        frequency=20
    ),
    

    No error message is displayed; it just hangs forever. Has anyone experienced something similar?

    Thanks,

    opened by nisace 28
  • tf2 branch: unable to use "saved_model"

    tf2 branch: unable to use "saved_model"

    Hi,

    I've started to look at the saved_model export in the tf2 branch and I'm facing some issues. First, I had to change tensorforce/core/utils/dicts.py, line 121, to accept all data types, since it seems that TensorFlow tries to rebuild dictionaries in the process: value_type=(tf.IndexedSlices, tf.Tensor, tf.Variable, object)

    Then, in tensorforce/core/models/model.py line 678, I got errors caused by the signature: ValueError: Got non-flat outputs '(TensorDict(main_sail_angle=Tensor("StatefulPartitionedCall:1", shape=(None,), dtype=float32), jib_angle=Tensor("StatefulPartitionedCall:0", shape=(None,), dtype=float32), rudder_angle=Tensor("StatefulPartitionedCall:2", shape=(None,), dtype=float32)), TensorDict())' from 'b'__inference_function_graph_2203'' for SavedModel signature 'serving_default'. Signatures have one Tensor per output, so to have predictable names Python functions used to generate these signatures should avoid outputting Tensors in nested structures.

    I tried to remove the signature in the saved_model.save call, and then I ran into trouble with tensorforce/core/module.py: the function tf_function builds function_graphs with keys that are tuples, which TensorFlow doesn't like. I converted them to strings and could then save a file, but it's totally unusable.

    So I'm stuck here and I'd need more help: what is tf_function doing exactly? Why don't you use tf.function instead?

    Thanks! Ben

    opened by bezineb5 24
  • Error: Invalid Gradient

    Error: Invalid Gradient

    Hi! I got this error during training, which I have never seen before. Could you please help me with it?

    Thank you very much! Zebin Li

    Traceback (most recent call last):
      File "C:/Users/lizeb/Box/research projects/active learning for image classification/code/run_manytimes_RAL_AL_samedata_0522.py", line 18, in <module>
        performance_history_RL, performance_history_RL_test, performance_history_AL, performance_history_AL_test, test_RL, test_AL, rewards = runmanytimes()
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\RAL_AL_samedata_0522.py", line 407, in runmanytimes
        agent.observe(terminal=terminal, reward=reward)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorforce\agents\agent.py", line 510, in observe
        updated, episodes, updates = self.model.observe(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorforce\core\module.py", line 128, in decorated
        output_args = function_graphsstr(graph_params)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in call
        result = self._call(*args, **kwds)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 814, in _call
        results = self._stateful_fn(*args, **kwds)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in call
        return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
        return self._call_flat(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
        return self._build_call_outputs(self._inference_function.call(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
        outputs = execute.execute(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid gradient: contains inf or nan. : Tensor had NaN values
      [[{{node agent/StatefulPartitionedCall/agent/cond_1/then/_262/agent/cond_1/StatefulPartitionedCall/agent/StatefulPartitionedCall_5/policy_optimizer/StatefulPartitionedCall/policy_optimizer/StatefulPartitionedCall/policy_optimizer/while/body/_1185/policy_optimizer/while/StatefulPartitionedCall/policy_optimizer/cond/then/_1464/policy_optimizer/cond/StatefulPartitionedCall/policy_optimizer/VerifyFinite/CheckNumerics}}]] [Op:__inference_observe_5103]

    Function call stack: observe

    opened by Zebin-Li 23
  • Some questions about tensorforce

    Some questions about tensorforce

    Hi, thanks for your great work. But when I read the docs I have some questions about this framework.

    Q1: How does the network update? Does agent.observe(terminal=terminal, reward=reward) collect gradients until the specified timesteps/episodes in update_model?

    Q2: Is the output layer of the network defined automatically when we define an agent? For example, if I define a DQNAgent which has three actions to choose from, do I not need to define the last layer as dict(type='dense', size=3, activation='softmax')?

    Q3: DQNAgent needs to collect [St, a, r, St+1], in the following examples:

    while True:
        state2=f(state)  
        action = agent.act(states=state2)
        action2=g(action) 
        state, reward, terminal = environment.execute(actions=action2)
        agent.observe(reward=reward, terminal=terminal)
    

    Does it collect [state2, action2, r, state2'] or [state, action, r, state']?

    Q4: How can I output the training loss?

    Actually, I use a DQNAgent for a robot navigation task. The input is a compressed image, the goal, and the previous action. The output is three actions (forward, left, right) to choose from. The agent is defined as:

    network_spec = [
        dict(type='dense', size=128, activation='relu'),
        dict(type='dense', size=32, activation='relu')
    ]
    
    memory = dict(
        type='replay',
        include_next_states=True,
        capacity=10000
    )
    
    exploration = dict(
        type='epsilon_decay',
        initial_epsilon=1.0,
        final_epsilon=0.1,
        timesteps=10000,
        start_timestep=0
    )
    
    update_model = dict(
        unit='timesteps',
        batch_size=64,
        frequency=64
    )
    
    optimizer = dict(
        type='adam',
        learning_rate=0.0001
    )
    
    agent = DQNAgent(
        states=dict(shape=(36,), type='float'), 
        actions=dict(shape=(3,), type='int'), 
        network=network_spec,
        update_mode=update_model,
        memory=memory,
        actions_exploration=exploration,
        optimizer=optimizer,
        double_q_model=True
    )
    

    Because I need to compress the captured image into a vector as part of the state, I run an episode as follows rather than using a runner.

        while True:
            compressed_image = compress_image(observation)   # map the capture image to a 32-dim vector
            goal = env.goal   # shape(2, )
            pre_action = action  # shape(2, )
            state = compressed_image + goal + pre_action
            action = agent.act(state)
            observation, terminal, reward = env.execute(action)
            agent.observe(terminal=terminal, reward=reward)
            timestep += 1
            episode_reward += reward
            if terminal or timestep == max_timesteps:
                success = env.success
                break
    

    Can it work? I have trained for a long time but the results are not ideal, so I want to know whether I am using tensorforce correctly. Thank you!

    opened by marooncn 23
  • [silent BUG] Saving/Restoring/Seeding PPO model when action_spec has multiple actions

    [silent BUG] Saving/Restoring/Seeding PPO model when action_spec has multiple actions

    I'm still a novice with tensorforce. I'm trying to save my PPO agent after training. The agent trains well, but when I save the model, stop the program, relaunch the program, and restore the model, the agent's performance is as if starting from scratch, whereas it was working well before.

    To save/restore I use:

    agent.save_model(directory=directory)
    agent.restore_model(directory=directory)
    

    I have checked, using:

     tf_weights =agent.model.network.get_variables()
     weights = agent.model.session.run(tf_weights)
     print(weights)
    

    that the saved weights are correctly restored.

    I tried to set a seed using tf.set_random_seed(42) at the beginning of my program in the hope of obtaining reproducible results (my env is fully deterministic), but across two sequential launches from the same restored weights, I get different actions for the same input.

    First run first action after restore :

    input : 
    [[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
       0.05  0.    0.    0.    0.05  0.    0.    0.    0.05]]
    action : 
    [-1.65043855 -0.12582253  0.33019719 -0.42400551  0.39128172 -0.1892394
     -1.38783872 -0.84797424 -0.76125687 -0.44233581  0.2647087   0.57517719]
    

    Second run first action after restore :

    input : 
    [[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
       0.05  0.    0.    0.    0.05  0.    0.    0.    0.05]]
    action : 
    [ 0.00452828  1.70186901  0.18290332  0.1153125   0.80178595 -1.31738091
      0.2404308  -0.16986398 -1.69459999  2.09507513 -0.46165684 -0.34024456]
    

    I have disabled exploration and created the agent with :

    layerSize=300
     actions = {}
     for i in range(12):
           actions[str(i)] = {'type': 'float'}
    network_spec = [
                dict(type='dense', size=layerSize, activation='selu'),
                dict(type='dense', size=layerSize, activation='selu'),
                dict(type='dense', size=layerSize, activation='selu')
            ]
    agent = PPOAgent(
                states=dict(type='float', shape=(12+9,)),
                actions=actions,
                batching_capacity=1000,
                network=network_spec,
                states_preprocessing=None,
                entropy_regularization=1e-3,
                actions_exploration=None,
                step_optimizer=dict(
                    type='adam',
                    learning_rate=1e-5
                ),
            )
    

    Are there some extra parameters which need to be saved when saving a PPO agent? (Maybe the parameters of the last layer, which are used to generate the mean and variance of the Gaussians needed to generate the continuous actions.)

    tensorforce.__version__
    '0.4.2'
    

    Thanks

    opened by unrealwill 23
  • What is the output of Agent Neural Network? If there is a std, can we fix it manually?

    What is the output of Agent Neural Network? If there is a std, can we fix it manually?

    Hi there, I'm curious about the output of the actor NN. In RL, the action is obtained by sampling from the output distribution of the actor NN. Therefore, the output of the actor NN must include something like a mean and a standard deviation if it is a Gaussian. We can also fix the std and let the NN give us the mean. What is the setting in your library? Can we change it manually?

    Besides, when we create the agent, we only need to provide the max and min values of actions. How do you choose the action if the sampled action is outside the range? Do you select the boundary value or shrink the distribution?

    Help appreciated!

    opened by XueminLiu111 22
  • Network Spec / Layers Documentation

    Network Spec / Layers Documentation

    First of all, hello! I'm glad to have discovered this project and am planning on trying to use it.

    As for my question - I am unable to find any documentation describing what each of the layers are, what they do, and what their parameters are. Have I missed it or is it nonexistent? If it doesn't exist, I'd be happy to add some.

    opened by chairbender 22
  • InvalidArgumentError on terminal observe call

    InvalidArgumentError on terminal observe call

    Perhaps something is wrong with my code, but almost half the time when the episode ends, I get an assertion error when I run observe on my PPO agent:

    Traceback (most recent call last):
      File "ll.py", line 208, in <module>
        main()
      File "ll.py", line 181, in main
        agent.give_reward(reward, terminal)
      File "ll.py", line 123, in give_reward
        self.agent.observe(reward=reward, terminal=terminal)
      File "c:\users\connor\desktop\tensorforce\tensorforce\agents\agent.py", line 534, in observe
        terminal=terminal, reward=reward, parallel=[parallel], **kwargs
      File "c:\users\connor\desktop\tensorforce\tensorforce\core\module.py", line 578, in fn
        fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
        run_metadata=run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
        raise six.reraise(*original_exc_info)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\six.py", line 696, in reraise
        raise value
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
        return self._sess.run(*args, **kwargs)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
        run_metadata=run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
        return self._sess.run(*args, **kwargs)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
        run_metadata_ptr)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
        run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x <= y did not hold element-wise:x (baseline-network-state.observe/baseline-network-state.core_observe/baseline-network-state.core_experience/memory.enqueue/strided_slice:0) = ] [18243] [y (baseline-network-state.observe/baseline-network-state.core_observe/baseline-network-state.core_experience/memory.enqueue/sub_2:0) = ] [17999]
             [[{{node Assert}}]]
    

    My original theory was that I was accidentally calling observe again after setting terminal=True and before resetting the agent, or some other abuse of observe, but I prevented that from happening in my code, so I don't believe that's the case. Also, the episode runs completely fine, and I get through thousands of calls to observe without ever running into any issues. It's only when terminal=True that it seems to occur.

    Running on Windows 10 x64, with tensorflow-gpu v2.0.0 on an RTX2070, Tensorforce installed from the Github at commit 827febcf8ffda851e5e4f0c9d12d4a8e8502b282

    opened by connorlbark 21
  • Configuration refactoring - thoughts and suggestions welcome!

    Configuration refactoring - thoughts and suggestions welcome!

    Configuration has been a topic of discussion for quite some time now, so I thought it'd be a good idea to get all those thoughts in one place and hopefully solicit some user thoughts as well.

    From my understanding, the current purposes of configs are:

    1. make it easy for people to get up and running
    2. get all parameters in one place for ease of setting up experiments and making them interpretable
    3. keep signatures simple, which makes it easy to create arbitrary things from one big blob

    The current issues I'm having with configs are:

    1. they are somewhere between a dictionary and a blob-object, which makes them confusing
    2. they aren't serializable, so I have to create config wrappers around Configurations. Eww.
    3. defaults and unused parameters make it challenging to know what's really being used.

    My personal opinion is that we can keep benefits (1) and (2) above and get rid of all three issues in exchange for the small sacrifice of benefit (3). In fact, I don't know how much of a benefit (3) is, as it obfuscates the true parameters of all objects in the codebase.

    I would propose doing so by putting the burden of the parameter creation and passing into the constructors of objects onto the user. Any intermediate user will have experience creating parameter/config generation wrappers. Less experienced users who want to get up and running quickly can still use the same JSON objects you've already written with something like this when actually creating the objects downstream:

    SomeObject(config_dict['some_key'], config_dict['another_key'])

    Users who want to create defaults can do this:

    SomeObject(config_dict.get('some_key', default_value), config_dict.get('another_key', another_default_value))

    Which to me is a much more clear way of going about defaults.

    The side benefit of all of this is that you aren't stuck supporting configurations for users. Configuration and deployment are two challenging parts of any project, and I personally would prefer everyone's time spent on RL, not trying to solve a fundamental CS issue (configuration!) that in the end is always problem-specific, and no matter how hard we try, never suits everyone's needs.

    opened by trickmeyer 21
  • Issues installing Tensorforce from pip on Python 3.10

    Issues installing Tensorforce from pip on Python 3.10

    I've been trying to use Tensorforce for a project in my college machine learning course, and ran into this issue: On Python 3.10, pip for some reason automatically installed Tensorforce 0.5.5 (probably because of issues discovered below). I was then able to import modules from it into a Jupyter notebook and define a custom environment class, but it threw a confusing error when I tried to initialize an agent. Trying to upgrade Tensorforce to 0.6.5 caused errors with backend dependency numpy, "could not build wheels for numpy." I eventually discovered that this was because Tensorforce 0.6.5 uses numpy 1.19.5, which is only compatible with Python releases up to 3.9. I'm currently trying to work around the problem with pyenv, and running into other issues with tensorflow and keras, but that's probably irrelevant to this project specifically. The main issue I wanted to point out is just that pip allowed the older version of Tensorforce to be installed on Python 3.10 and somehow missed the dependency issue, resulting in esoteric NoneType errors deep in the optimizer code. (I'm using KDE Plasma 5.24.7 on Ubuntu 22.04.1 custom-installed on an HP laptop with an Intel Core i5 10th Gen CPU and integrated graphics, just in case any of that is relevant at all.)

    opened by Nat-the-Chicken 0
  • Bump tensorflow from 2.8.0 to 2.9.3

    Bump tensorflow from 2.8.0 to 2.9.3

    Bumps tensorflow from 2.8.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Gym environment broken: 'dict' object has no attribute 'env_specs'

    Gym environment broken: 'dict' object has no attribute 'env_specs'

    It seems that the Gym module was updated recently, leaving examples that use it broken.

    I tried act_observe_interface.py

    Traceback:

    AttributeError                            Traceback (most recent call last)
    Cell In [75], line 59
         55     environment.close()
         58 if __name__ == '__main__':
    ---> 59     main()
    
    Cell In [75], line 20, in main()
         19 def main():
    ---> 20     environment = Environment.create(environment='cartpole.json')
         21     agent = Agent.create(agent='ppo.json', environment=environment)
         23     # Train for 100 episodes
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\environment.py:176, in Environment.create(environment, max_episode_timesteps, remote, blocking, host, port, **kwargs)
        173     if max_episode_timesteps is None:
        174         max_episode_timesteps = kwargs.pop('max_episode_timesteps', None)
    --> 176     return Environment.create(
        177         environment=environment, max_episode_timesteps=max_episode_timesteps, **kwargs
        178     )
        180 elif '.' in environment:
        181     # Library specification
        182     library_name, module_name = environment.rsplit('.', 1)
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\environment.py:192, in Environment.create(environment, max_episode_timesteps, remote, blocking, host, port, **kwargs)
        189 elif environment in tensorforce.environments.environments:
        190     # Keyword specification
        191     environment = tensorforce.environments.environments[environment]
    --> 192     return Environment.create(
        193         environment=environment, max_episode_timesteps=max_episode_timesteps, **kwargs
        194     )
        196 else:
        197     # Default: OpenAI Gym
        198     try:
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\environment.py:146, in Environment.create(environment, max_episode_timesteps, remote, blocking, host, port, **kwargs)
        143     return environment
        145 elif isinstance(environment, type) and issubclass(environment, Environment):
    --> 146     environment = environment(**kwargs)
        147     assert isinstance(environment, Environment)
        148     return Environment.create(
        149         environment=environment, max_episode_timesteps=max_episode_timesteps
        150     )
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\openai_gym.py:163, in OpenAIGym.__init__(self, level, visualize, max_episode_steps, terminal_reward, reward_threshold, drop_states_indices, visualize_directory, **kwargs)
        161     self.max_episode_steps = max_episode_steps
        162 else:
    --> 163     self.environment, self.max_episode_steps = self.__class__.create_level(
        164         level=self.level, max_episode_steps=max_episode_steps,
        165         reward_threshold=reward_threshold, **kwargs
        166     )
        168 if visualize_directory is not None:
        169     self.environment = gym.wrappers.Monitor(
        170         env=self.environment, directory=visualize_directory
        171     )
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\openai_gym.py:67, in OpenAIGym.create_level(cls, level, max_episode_steps, reward_threshold, **kwargs)
         64 requires_register = False
         66 # Find level
    ---> 67 if level not in gym.envs.registry.env_specs:
         68     if max_episode_steps is None:  # interpret as false if level does not exist
         69         max_episode_steps = False
    
    AttributeError: 'dict' object has no attribute 'env_specs'
    
    opened by Ammar-AlDabbagh 1
  • lstm+ppo

    lstm+ppo

    While training Pendulum-v0 with LSTM+PPO, the following problem occurred when using parallel environments:

    Traceback (most recent call last):
      File "D:\desktop\lunwen_dabao\xinsuanfa0912\tian_sac_test\launch_multiprocessing_traning_cylinder.py", line 91, in <module>
        runner = Runner(
      File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\execution\runner.py", line 168, in __init__
        environment = Environment.create(
      File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\environments\environment.py", line 94, in create
        environment = MultiprocessingEnvironment(
      File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\environments\multiprocessing_environment.py", line 62, in __init__
        process.start()
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'overwrite_staticmethod.<locals>.overwritten'
    

    Below is my code; what is wrong with that part? And are there any problems with the LSTM settings?

    import argparse
    import re
    from tensorforce import Runner, Agent,Environment
    # import envobject_cylinder
    import os
    import gym
    
    parser = argparse.ArgumentParser()
    shell_args = vars(parser.parse_args())
    shell_args['num_episodes']=300
    shell_args['max_episode_timesteps']=200
    
    number_servers=10
    environments = []
    for i in range(10):
        env = Environment.create(
        environment='gym', level='Pendulum-v0', max_episode_timesteps=200
        )
        environments.append(env)
    
    # environment = Environment.create(
    #     environment='gym', level='Pendulum-v0', max_episode_timesteps=200
    # )
    
    network_spec = [
        dict(type='rnn', size=512,horizon=4,activation='tanh',cell='lstm'),
        dict(type='rnn', size=512,horizon=4,activation='tanh',cell='lstm')
    ]
    baseline_spec = [
       dict(type='rnn', size=512,horizon=4,activation='tanh',cell='lstm', ),
        dict(type='rnn', size=512,horizon=4,activation='tanh',cell='lstm', )
    ]
    
    # env=gym.make('Pendulum-v0')
    # Instantiate a Tensorforce agent
    agent = Agent.create(
        states=dict(
                    type='float',
                    shape=(int(3), )
                ),
        actions=dict(
                type='float',
                shape=(1, ),
                min_value=-2,
                max_value=2
            ),
        max_episode_timesteps=200,
        agent='ppo',
        # num_parallel=10,
        environment=env,
        # max_episode_timesteps=200,
        batch_size=20,
         network=network_spec,
        learning_rate=0.001,state_preprocessing=None,
        entropy_regularization=0.01, likelihood_ratio_clipping=0.2, subsampling_fraction=0.2,
        predict_terminal_values=True,
        discount=0.97,
        # baseline=dict(type='1', size=[32, 32]),
        baseline=baseline_spec,
        baseline_optimizer=dict(
            type='multi_step',
            optimizer=dict(
                type='adam',
                learning_rate=1e-3
            ),
            num_steps=5
        ),
        multi_step=25,
        parallel_interactions=number_servers,
        saver=dict(directory=os.path.join(os.getcwd(), 'saved_models/checkpoint'),frequency=1  
        # save checkpoint every 100 updates
        ),
        summarizer=dict(
            directory='summary',
            # list of labels, or 'all'
            summaries=['entropy', 'kl-divergence', 'loss', 'reward', 'update-norm']
        ),
    )
    print('Agent defined DONE!')
    
    runner = Runner(
        agent=agent,
        num_parallel=10,
        environments=environments,
        max_episode_timesteps=200,
        evaluation=False,
        remote='multiprocessing',
    )
    print('Runner defined DONE!')
    
    runner.run(num_episodes=shell_args['num_episodes'],
               save_best_agent ='best_model',
               sync_episodes=False,
               )
    runner.close()
    
    opened by 1900360 0
  • GPU integration for MacOs12.3 M1 Max

    GPU integration for MacOs12.3 M1 Max

    I ran the Quickstart.py example, and I get the following error:

    Metal device set to: Apple M1 Max

    systemMemory: 32.00 GB maxCacheSize: 10.67 GB

    WARNING:root:Infinite min_value bound for state.
    Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]
    Traceback (most recent call last):
      File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 53, in <module>
        main()
      File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 46, in main
        runner.run(num_episodes=200)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 649, in run
        self.handle_act(parallel=n)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 697, in handle_act
        actions = self.agent.act(states=self.states[parallel], parallel=parallel)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 415, in act
        return super().act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/recorder.py", line 262, in act
        actions, internals = self.fn_act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 462, in fn_act
        actions, timesteps = self.model.act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/core/module.py", line 136, in decorated
        output_args = function_graphsstr(graph_params)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx.handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation agent/VerifyFinite/CheckNumerics: Could not satisfy explicit device specification '' because the node {{colocation_node agent/VerifyFinite/CheckNumerics}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    Identity: GPU CPU
    Switch: GPU CPU
    CheckNumerics: CPU
    _Arg: GPU CPU

    Colocation members, user-requested devices, and framework assigned devices, if any: args_0 (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0 agent/VerifyFinite/CheckNumerics (CheckNumerics) agent/VerifyFinite/control_dependency (Identity) agent/assert_greater_equal/Assert/AssertGuard/args_0/_16 (Switch) agent/assert_less_equal/Assert/AssertGuard/args_0/_26 (Switch) Func/agent/StatefulPartitionedCall/input/_80 (Identity) /job:localhost/replica:0/task:0/device:GPU:0 Func/agent/assert_greater_equal/Assert/AssertGuard/then/_10/input/_153 (Identity) Func/agent/assert_greater_equal/Assert/AssertGuard/else/_11/input/_159 (Identity) Func/agent/assert_less_equal/Assert/AssertGuard/then/_20/input/_165 (Identity) Func/agent/assert_less_equal/Assert/AssertGuard/else/_21/input/_171 (Identity) Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/input/_260 (Identity) /job:localhost/replica:0/task:0/device:GPU:0 Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/linear_normalization0/PartitionedCall/input/_356 (Identity) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node agent/VerifyFinite/CheckNumerics}}]] [Op:__inference_act_1848]
    

    Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]


    I installed Tensorforce using this guide: https://tensorforce.readthedocs.io/en/latest/basics/installation.html

    for M1 Mac in a new Conda environment. I also had to upgrade numpy to 1.22 to run the code.

    My Conda env is built as follows:

    Name Version Build Channel

    absl-py 1.2.0 pypi_0 pypi astunparse 1.6.3 pypi_0 pypi blas 1.0 openblas
    bzip2 1.0.8 h620ffc9_4
    c-ares 1.18.1 h1a28f6b_0
    ca-certificates 2022.07.19 hca03da5_0
    cachetools 5.2.0 pypi_0 pypi certifi 2022.6.15 py310hca03da5_0
    charset-normalizer 2.1.0 pypi_0 pypi cloudpickle 2.1.0 pypi_0 pypi cycler 0.11.0 pypi_0 pypi flatbuffers 1.12 pypi_0 pypi fonttools 4.34.4 pypi_0 pypi gast 0.4.0 pypi_0 pypi google-auth 2.10.0 pypi_0 pypi google-auth-oauthlib 0.4.6 pypi_0 pypi google-pasta 0.2.0 pypi_0 pypi grpcio 1.42.0 py310h95c9599_0
    gym 0.21.0 pypi_0 pypi h5py 3.6.0 py310h181c318_0
    hdf5 1.12.1 h160e8cb_2
    idna 3.3 pypi_0 pypi keras 2.9.0 pypi_0 pypi keras-preprocessing 1.1.2 pypi_0 pypi kiwisolver 1.4.4 pypi_0 pypi krb5 1.19.2 h3b8d789_0
    libclang 14.0.6 pypi_0 pypi libcurl 7.84.0 hc6d1d07_0
    libcxx 12.0.0 hf6beb65_1
    libedit 3.1.20210910 h1a28f6b_0
    libev 4.33 h1a28f6b_1
    libffi 3.4.2 hc377ac9_4
    libgfortran 5.0.0 11_2_0_he6877d6_26
    libgfortran5 11.2.0 he6877d6_26
    libnghttp2 1.46.0 h95c9599_0
    libopenblas 0.3.20 hea475bc_0
    libssh2 1.10.0 hf27765b_0
    llvm-openmp 12.0.0 haf9daa7_1
    markdown 3.4.1 pypi_0 pypi markupsafe 2.1.1 pypi_0 pypi matplotlib 3.5.1 pypi_0 pypi msgpack 1.0.3 pypi_0 pypi msgpack-numpy 0.4.7.1 pypi_0 pypi ncurses 6.3 h1a28f6b_3
    numpy 1.22.0 pypi_0 pypi oauthlib 3.2.0 pypi_0 pypi openssl 1.1.1q h1a28f6b_0
    opt-einsum 3.3.0 pypi_0 pypi packaging 21.3 pypi_0 pypi pillow 9.2.0 pypi_0 pypi pip 22.1.2 py310hca03da5_0
    protobuf 3.19.4 pypi_0 pypi pyasn1 0.4.8 pypi_0 pypi pyasn1-modules 0.2.8 pypi_0 pypi pyparsing 3.0.9 pypi_0 pypi python 3.10.4 hbdb9e5c_0
    python-dateutil 2.8.2 pypi_0 pypi readline 8.1.2 h1a28f6b_1
    requests 2.28.1 pypi_0 pypi requests-oauthlib 1.3.1 pypi_0 pypi rsa 4.9 pypi_0 pypi setuptools 61.2.0 py310hca03da5_0
    six 1.15.0 pypi_0 pypi sqlite 3.39.2 h1058600_0
    tensorboard 2.9.1 pypi_0 pypi tensorboard-data-server 0.6.1 pypi_0 pypi tensorboard-plugin-wit 1.8.1 pypi_0 pypi tensorflow-deps 2.8.0 0 apple tensorflow-estimator 2.9.0 pypi_0 pypi tensorflow-macos 2.9.2 pypi_0 pypi tensorflow-metal 0.5.0 pypi_0 pypi tensorforce 0.6.5 pypi_0 pypi termcolor 1.1.0 pypi_0 pypi tk 8.6.12 hb8d0fd4_0
    tqdm 4.62.3 pypi_0 pypi typing-extensions 4.3.0 pypi_0 pypi tzdata 2022a hda174b7_0
    urllib3 1.26.11 pypi_0 pypi werkzeug 2.2.2 pypi_0 pypi wheel 0.37.1 pyhd3eb1b0_0
    wrapt 1.14.1 pypi_0 pypi xz 5.2.5 h1a28f6b_1
    zlib 1.2.12 h5a0b063_2


    Is there any way to address this issue? I also tried downgrading Python to 3.9, which did not work. Is macOS not supposed to be supported when using tensorflow-metal?

    Thank you.

    opened by doric35 0
  • Bump mistune from 0.8.4 to 2.0.3 in /docs

    Bump mistune from 0.8.4 to 2.0.3 in /docs

    Bumps mistune from 0.8.4 to 2.0.3.

    Release notes

    Sourced from mistune's releases.

    Version 2.0.2

    Fix escape_url via lepture/mistune#295

    Version 2.0.1

    Fix XSS for image link syntax.

    Version 2.0.0

    First release of Mistune v2.

    Version 2.0.0 RC1

    In this release, we have a Security Fix for harmful links.

    Version 2.0.0 Alpha 1

    This is the first release of v2. An alpha version for users to have a preview of the new mistune.

    Changelog

    Sourced from mistune's changelog.

    Changelog

    Here is the full history of mistune v2.

    Version 2.0.4

    
    Released on Jul 15, 2022
    
    • Fix url plugin in <a> tag
    • Fix * formatting

    Version 2.0.3

    Released on Jun 27, 2022

    • Fix table plugin
    • Security fix for CVE-2022-34749

    Version 2.0.2

    
    Released on Jan 14, 2022
    

    Fix escape_url

    Version 2.0.1

    Released on Dec 30, 2021

    XSS fix for image link syntax.

    Version 2.0.0

    
    Released on Dec 5, 2021
    

    This is the first non-alpha release of mistune v2.

    Version 2.0.0rc1

    Released on Feb 16, 2021

    Version 2.0.0a6

    
    

    ... (truncated)

    Commits
    • 3f422f1 Version bump 2.0.3
    • a6d4321 Fix asteris emphasis regex CVE-2022-34749
    • 5638e46 Merge pull request #307 from jieter/patch-1
    • 0eba471 Fix typo in guide.rst
    • 61e9337 Fix table plugin
    • 76dec68 Add documentation for renderer heading when TOC enabled
    • 799cd11 Version bump 2.0.2
    • babb0cf Merge pull request #295 from dairiki/bug.escape_url
    • fc2cd53 Make mistune.util.escape_url less aggressive
    • 3e8d352 Version bump 2.0.1
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Releases(0.6.5)
  • 0.6.5(Aug 30, 2021)

    Agents:
    • Renamed agent argument reward_preprocessing to reward_processing and, for the Tensorforce agent, moved it to reward_estimation[reward_processing] (see the sketch after this release entry)
    Distributions:
    • New categorical distribution argument skip_linear to not add the implicit linear logits layer
    Environments:
    • Support for multi-actor parallel environments via new function Environment.num_actors()
      • Runner uses multi-actor parallelism by default if environment is multi-actor
    • New optional Environment function episode_return() which returns the true return of the last episode, if cumulative sum of environment rewards is not a good metric for runner display
    Examples:
    • New vectorized_environment.py and multiactor_environment.py example scripts to illustrate how to set up a vectorized/multi-actor environment.
    Source code(tar.gz)
    Source code(zip)
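    A minimal sketch of the reward-processing rename mentioned above, for the 'tensorforce' agent; the toy state/action specs and the 'clipping' processing spec are assumptions chosen only to keep the snippet self-contained:

    from tensorforce import Agent

    # Hedged sketch (not the official example): for the 'tensorforce' agent,
    # reward processing is now given under reward_estimation[reward_processing].
    # State/action specs and the 'clipping' layer spec are illustrative assumptions.
    agent = Agent.create(
        agent='tensorforce',
        states=dict(type='float', shape=(4,)),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        memory=10000,
        update=dict(unit='timesteps', batch_size=64),
        optimizer=dict(type='adam', learning_rate=3e-4),
        policy=dict(network='auto'),
        objective='policy_gradient',
        reward_estimation=dict(
            horizon=20,
            # formerly the top-level agent argument reward_preprocessing
            reward_processing=dict(type='clipping', lower=-1.0, upper=1.0)
        )
    )
    agent.close()
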
  • 0.6.4(Jun 5, 2021)

    Agents:
    • Agent argument update_frequency / update[frequency] now supports float values > 0.0, which specify the update-frequency relative to the batch-size
    • Changed default value for argument update_frequency from 1.0 to 0.25 for DQN, DoubleDQN, DuelingDQN agents
    • New argument return_processing and advantage_processing (where applicable) for all agent sub-types
    • New function Agent.get_specification() which returns the agent specification as dictionary
    • New function Agent.get_architecture() which returns a string representation of the network layer architecture (see the sketch after this release entry)
    Modules:
    • Improved and simplified module specification, for instance: network=my_module instead of network=my_module.TestNetwork, or environment=envs.custom_env instead of environment=envs.custom_env.CustomEnvironment (module file needs to be in the same directory or a sub-directory)
    Networks:
    • New argument single_output=True for some policy types which, if False, allows the specification of additional network outputs for some/all actions via registered tensors
    • KerasNetwork argument model now supports arbitrary functions as long as they return a tf.keras.Model
    Layers:
    • New layer type SelfAttention (specification key: self_attention)
    Parameters:
    • Support tracking of non-constant parameter values
    Runner:
    • Renamed attribute episode_rewards to episode_returns, and TQDM status reward to return
    • Extended argument agent to support Agent.load() keyword arguments, to load an existing agent instead of creating a new one.
    Examples:
    • Added action_masking.py example script to illustrate an environment implementation with built-in action masking.
    Bugfixes:
    • Customized device placement was not applied to most tensors
    Source code(tar.gz)
    Source code(zip)
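    A hedged sketch of the fractional update frequency and the new introspection helpers referenced above, using a DQN agent; the state/action specs are placeholders chosen only to make the snippet runnable:

    from tensorforce import Agent

    # Hedged sketch: update_frequency=0.25 is interpreted relative to batch_size
    # (0.6.4+); state/action specs are placeholders for illustration only.
    agent = Agent.create(
        agent='dqn',
        states=dict(type='float', shape=(4,)),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        memory=10000,
        batch_size=32,
        update_frequency=0.25,
        learning_rate=1e-3
    )

    print(agent.get_architecture())    # string representation of the network layer architecture
    print(agent.get_specification())   # agent specification as a dictionary
    agent.close()
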
  • 0.6.3(Mar 22, 2021)

    Agents:
    • New agent argument tracking and corresponding function tracked_tensors() to track and retrieve the current value of predefined tensors, similar to summarizer for TensorBoard summaries (see the sketch after this release entry)
    • New experimental value trace_decay and gae_decay for Tensorforce agent argument reward_estimation, soon for other agent types as well
    • New options "early" and "late" for value estimate_advantage of Tensorforce agent argument reward_estimation
    • Changed default value for Agent.act() argument deterministic from False to True
    Networks:
    • New network type KerasNetwork (specification key: keras) as wrapper for networks specified as Keras model
    • Passing a Keras model class/object as policy/network argument is automatically interpreted as KerasNetwork
    Distributions:
    • Changed Gaussian distribution argument global_stddev=False to stddev_mode='predicted'
    • New Categorical distribution argument temperature_mode=None
    Layers:
    • New option for Function layer argument function to pass string function expression with argument "x", e.g. "(x+1.0)/2.0"
    Summarizer:
    • New summary episode-length recorded as part of summary label "reward"
    Environments:
    • Support for vectorized parallel environments via new function Environment.is_vectorizable() and new argument num_parallel for Environment.reset()
      • See tensorforce/environments/cartpole.py for a vectorizable environment example
      • Runner uses vectorized parallelism by default if num_parallel > 1, remote=None and environment supports vectorization
      • See examples/act_observe_vectorized.py for more details on act-observe interaction
    • New extended and vectorizable custom CartPole environment via key custom_cartpole (work in progress)
    • New environment argument reward_shaping to provide a simple way to modify/shape rewards of an environment, can be specified either as callable or string function expression
    run.py script:
    • New option for command line arguments --checkpoints and --summaries to add comma-separated checkpoint/summary filename in addition to directory
    • Added episode lengths to logging plot besides episode returns
    Bugfixes:
    • Temporal horizon handling of RNN layers
    • Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
    • GPU problems with scatter operations
    Source code(tar.gz)
    Source code(zip)
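    To make the tracking mechanism and the string-expression Function layer above more concrete, here is a hedged sketch; tracking='all' and the 'function' specification key are assumptions based on the notes in this entry:

    from tensorforce import Agent

    # Hedged sketch: 'all' is assumed to track every predefined tensor, and the
    # 'function' layer evaluates a string expression over its input "x" (0.6.3).
    agent = Agent.create(
        agent='ppo',
        states=dict(type='float', shape=(4,)),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        batch_size=10,
        learning_rate=1e-3,
        network=[
            dict(type='dense', size=32),
            dict(type='function', function='(x + 1.0) / 2.0')
        ],
        tracking='all'
    )

    actions = agent.act(states=[0.1, -0.2, 0.0, 0.3])
    agent.observe(terminal=True, reward=1.0)
    print(agent.tracked_tensors())    # current values of the tracked tensors
    agent.close()
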
  • 0.6.2(Oct 3, 2020)

  • 0.6.1(Sep 19, 2020)

    Agents:
    • Removed default value "adam" for Tensorforce agent argument optimizer (since default optimizer argument learning_rate removed, see below)
    • Removed option "minimum" for Tensorforce agent argument memory, use None instead
    • Changed default value for dqn/double_dqn/dueling_dqn agent argument huber_loss from 0.0 to None
    Layers:
    • Removed default value 0.999 for exponential_normalization layer argument decay
    • Added new layer batch_normalization (generally should only be used for the agent arguments reward_processing[return_processing] and reward_processing[advantage_processing])
    • Added exponential/instance_normalization layer argument only_mean with default False
    • Added exponential/instance_normalization layer argument min_variance with default 1e-4
    Optimizers:
    • Removed default value 1e-3 for optimizer argument learning_rate
    • Changed default value for optimizer argument gradient_norm_clipping from 1.0 to None (no gradient clipping)
    • Added new optimizer doublecheck_step and corresponding argument doublecheck_update for optimizer wrapper
    • Removed linesearch_step optimizer argument accept_ratio
    • Removed natural_gradient optimizer argument return_improvement_estimate
    Saver:
    • Added option to specify agent argument saver as string, which is interpreted as saver[directory] with otherwise default values (see the sketch after this release entry)
    • Added default value for agent argument saver[frequency] as 10 (save model every 10 updates by default)
    • Changed default value of agent argument saver[max_checkpoints] from 5 to 10
    Summarizer:
    • Added option to specify agent argument summarizer as string, which is interpreted as summarizer[directory] with otherwise default values
    • Renamed option of agent argument summarizer from summarizer[labels] to summarizer[summaries] (the term "label" stems from an earlier version and had become outdated and confusing)
    • Changed interpretation of agent argument summarizer[summaries] = "all" to include only numerical summaries, so all summaries except "graph"
    • Changed default value of agent argument summarizer[summaries] from ["graph"] to "all"
    • Changed default value of agent argument summarizer[max_summaries] from 5 to 7 (number of different colors in TensorBoard)
    • Added option summarizer[filename] to agent argument summarizer
    Recorder:
    • Added option to specify agent argument recorder as string, which is interpreted as recorder[directory] with otherwise default values
    run.py script:
    • Added --checkpoints/--summaries/--recordings command line argument to enable saver/summarizer/recorder agent argument specification separate from core agent configuration
    Examples:
    • Added save_load_agent.py example script to illustrate regular agent saving and loading
    Bugfixes:
    • Fixed problem with optimizer argument gradient_norm_clipping not being applied correctly
    • Fixed problem with exponential_normalization layer not updating moving mean and variance correctly
    • Fixed problem with recent memory for timestep-based updates sometimes sampling invalid memory indices
    Source code(tar.gz)
    Source code(zip)
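    The string shorthands for saver, summarizer and recorder mentioned above can be illustrated with a short hedged sketch; the directory names and the toy state/action specs are placeholders:

    from tensorforce import Agent

    # Hedged sketch: a plain string is interpreted as the corresponding
    # [directory] entry with otherwise default values (0.6.1+).
    agent = Agent.create(
        agent='ppo',
        states=dict(type='float', shape=(4,)),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        batch_size=10,
        learning_rate=1e-3,
        saver='checkpoints',     # like saver=dict(directory='checkpoints'); saves every 10 updates by default
        summarizer='summaries',  # like summarizer=dict(directory='summaries'); summaries='all' by default
        recorder='recordings'    # like recorder=dict(directory='recordings')
    )
    agent.close()
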
  • 0.6.0(Aug 30, 2020)

    • Removed agent arguments execution, buffer_observe, seed
    • Renamed agent arguments baseline_policy/baseline_network/critic_network to baseline/critic
    • Renamed agent reward_estimation arguments estimate_horizon to predict_horizon_values, estimate_actions to predict_action_values, estimate_terminal to predict_terminal_values
    • Renamed agent argument preprocessing to state_preprocessing
    • Default agent state_preprocessing is now linear_normalization
    • Moved agent arguments for reward/return/advantage processing from preprocessing to reward_preprocessing and reward_estimation[return_/advantage_processing]
    • New agent argument config with values buffer_observe, enable_int_action_masking, seed (see the sketch after this release entry)
    • Renamed PPO/TRPO/DPG argument critic_network/_optimizer to baseline/baseline_optimizer
    • Renamed PPO argument optimization_steps to multi_step
    • New TRPO argument subsampling_fraction
    • Changed agent argument use_beta_distribution default to False
    • Added double DQN agent (double_dqn)
    • Removed Agent.act() argument evaluation
    • Removed agent function arguments query (functionality removed)
    • Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): save/load functions and saver argument changed
    • Default behavior when specifying saver is not to load agent, unless agent is created via Agent.load
    • Agent summarizer functionality changed: summarizer argument changed, some summary labels and other options removed
    • Renamed RNN layers internal_{rnn/lstm/gru} to rnn/lstm/gru and rnn/lstm/gru to input_{rnn/lstm/gru}
    • Renamed auto network argument internal_rnn to rnn
    • Renamed (internal_)rnn/lstm/gru layer argument length to horizon
    • Renamed update_modifier_wrapper to optimizer_wrapper
    • Renamed optimizing_step to linesearch_step, and UpdateModifierWrapper argument optimizing_iterations to linesearch_iterations
    • Optimizer subsampling_step accepts both absolute (int) and relative (float) fractions
    • Objective policy_gradient argument ratio_based renamed to importance_sampling
    • Added objectives state_value and action_value
    • Added Gaussian distribution arguments global_stddev and bounded_transform (for improved bounded action space handling)
    • Changed default memory device argument to CPU:0
    • Renamed rewards summaries
    • Agent.create() accepts act-function as agent argument for recording
    • Singleton states and actions are now consistently handled as singletons
    • Major change to policy handling and defaults, in particular parametrized_distributions, new default policies parametrized_state/action_value
    • Combined long and int type
    • Always wrap environment in EnvironmentWrapper class
    • Changed tune.py arguments
    Source code(tar.gz)
    Source code(zip)
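    A short hedged sketch of the consolidated config argument named above; the chosen values are illustrative, and buffer_observe could be set in the same dictionary:

    from tensorforce import Agent

    # Hedged sketch: config bundles what used to be separate agent arguments
    # (buffer_observe, enable_int_action_masking, seed); values are illustrative.
    agent = Agent.create(
        agent='ppo',
        states=dict(type='float', shape=(4,)),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        batch_size=10,
        learning_rate=1e-3,
        config=dict(seed=42, enable_int_action_masking=False)
    )
    agent.close()
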
  • 0.5.5(Jun 16, 2020)

    • Changed independent mode of agent.act to use final values of dynamic hyperparameters and avoid TensorFlow conditions
    • Extended "tensorflow" format of agent.save to include an optimized Protobuf model with an act-only graph as .pb file, and Agent.load format "pb-actonly" to load act-only agent based on Protobuf model
    • Support for custom summaries via new summarizer argument value custom to specify summary type, and Agent.summarize(...) to record summary values
    • Added min/max-bounds for dynamic hyperparameters to assert valid ranges and infer other arguments
    • Argument batch_size now mandatory for all agent classes
    • Removed Estimator argument capacity, now always automatically inferred
    • Internal changes related to agent arguments memory, update and reward_estimation
    • Changed the default bias and activation argument of some layers
    • Fixed issues with sequence preprocessor
    • DQN and dueling DQN properly constrained to int actions only
    • Added use_beta_distribution argument with default True to many agents and ParametrizedDistributions policy, so default can be changed
    Source code(tar.gz)
    Source code(zip)
  • 0.5.4(Feb 15, 2020)

    • DQN/DuelingDQN/DPG argument memory now required to be specified explicitly, plus update_frequency default changed
    • Removed (temporarily) conv1d/conv2d_transpose layers due to TensorFlow gradient problems
    • Agent, Environment and Runner can now be imported via from tensorforce import ...
    • New generic reshape layer available as reshape
    • Support for batched version of Agent.act and Agent.observe
    • Support for parallelized remote environments based on Python's multiprocessing and socket (replacing tensorforce/contrib/socket_remote_env/ and tensorforce/environments/environment_process_wrapper.py), available via Environment.create(...), Runner(...) and run.py
    • Removed ParallelRunner and merged functionality with Runner
    • Changed run.py arguments
    • Changed independent mode for Agent.act: additional argument internals and corresponding return value, initial internals via Agent.initial_internals(), Agent.reset() not required anymore
    • Removed deterministic argument for Agent.act unless independent mode
    • Added format argument to save/load/restore with supported formats tensorflow, numpy and hdf5 (see the sketch after this release entry)
    • Changed save argument append_timestep to append with default None (instead of 'timesteps')
    • Added get_variable and assign_variable agent functions
    Source code(tar.gz)
    Source code(zip)
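    The format argument for saving and loading referenced above can be sketched as follows; the directory name is a placeholder and 'numpy' is just one of the supported formats:

    from tensorforce import Agent

    # Hedged sketch: save in one of the supported formats ('tensorflow', 'numpy',
    # 'hdf5') and load the agent back from the same directory.
    agent = Agent.create(
        agent='ppo',
        states=dict(type='float', shape=(4,)),
        actions=dict(type='int', num_values=2),
        max_episode_timesteps=500,
        batch_size=10,
        learning_rate=1e-3
    )
    agent.save(directory='saved-model', format='numpy')
    agent.close()

    agent = Agent.load(directory='saved-model', format='numpy')
    agent.close()
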
  • 0.5.3(Dec 26, 2019)

    • Added optional memory argument to various agents
    • Improved summary labels, particularly "entropy" and "kl-divergence"
    • linear layer now accepts tensors of rank 1 to 3
    • Network output / distribution input does not need to be a vector anymore
    • Transposed convolution layers (conv1d/2d_transpose)
    • Parallel execution functionality contributed by @jerabaul29, currently under tensorforce/contrib/
    • Accept string for runner save_best_agent argument to specify best model directory different from saver configuration
    • saver argument steps removed and seconds renamed to frequency
    • Moved Parallel/Runner argument max_episode_timesteps from run(...) to constructor
    • New Environment.create(...) argument max_episode_timesteps
    • TensorFlow 2.0 support
    • Improved Tensorboard summaries recording
    • Summary labels graph, variables and variables-histogram temporarily not working
    • TF-optimizers updated to TensorFlow 2.0 Keras optimizers
    • Added TensorFlow Addons dependency, and support for TFA optimizers
    • Changed unit of target_sync_frequency from timesteps to updates for dqn and dueling_dqn agent
    Source code(tar.gz)
    Source code(zip)
  • 0.5.2(Oct 14, 2019)

    • Improved unittest performance
    • Added updates and renamed timesteps/episodes counter for agents and runners
    • Renamed critic_{network,optimizer} argument to baseline_{network,optimizer}
    • Added Actor-Critic (ac), Advantage Actor-Critic (a2c) and Dueling DQN (dueling_dqn) agents
    • Improved "same" baseline optimizer mode and added optional weight specification
    • Reuse layer now global for parameter sharing across modules
    • New block layer type (block) for easier sharing of layer blocks
    • Renamed PolicyAgent/-Model to TensorforceAgent/-Model
    • New Agent.load(...) function, saving includes agent specification
    • Removed PolicyAgent argument (baseline-)network
    • Added policy argument temperature
    • Removed "same" and "equal" options for baseline_* arguments and changed internal baseline handling
    • Combined state/action_value to value objective with argument value either "state" or "action"
    Source code(tar.gz)
    Source code(zip)
  • 0.5.1(Sep 10, 2019)

  • 0.5.0(Sep 8, 2019)

    Major Revision

    Agent:
    • DQFDAgent removed (temporarily)
    • DQNNstepAgent and NAFAgent part of DQNAgent
    • Agents need to be initialized via agent.initialize() before application
    • States/actions of type int require an entry num_values (instead of num_actions)
    • Agent.from_spec() changed and renamed to Agent.create()
    • Agent.act() argument fetch_tensors changed and renamed to query, index renamed to parallel, buffered removed
    • Agent.observe() argument index renamed to parallel
    • Agent.atomic_observe() removed
    • Agent.save/restore_model() renamed to Agent.save/restore()
    Agent arguments:
    • update_mode renamed to update
    • states_preprocessing and reward_preprocessing changed and combined to preprocessing
    • actions_exploration changed and renamed to exploration
    • execution entry num_parallel replaced by a separate argument parallel_interactions
    • batched_observe and batching_capacity replaced by argument buffer_observe
    • scope renamed to name
    DQNAgent arguments:
    • update_mode replaced by batch_size, update_frequency and start_updating
    • optimizer removed, implicitly defined as 'adam', learning_rate added
    • memory defines capacity of implicitly defined memory 'replay'
    • double_q_model removed (temporarily)
    Policy gradient agent arguments:
    • New mandatory argument max_episode_timesteps
    • update_mode replaced by batch_size and update_frequency
    • memory removed
    • baseline_mode removed
    • baseline argument changed and renamed to critic_network
    • baseline_optimizer renamed to critic_optimizer
    • gae_lambda removed (temporarily)
    PPOAgent arguments:
    • step_optimizer removed, implicitly defined as 'adam', learning_rate added
    TRPOAgent arguments:
    • cg_* and ls_* arguments removed
    VPGAgent arguments:
    • optimizer removed, implicitly defined as 'adam', learning_rate added
    Environment:
    • Environment properties states and actions are now functions states() and actions()
    • States/actions of type int require an entry num_values (instead of num_actions)
    • New function Environment.max_episode_timesteps()
    Contrib environments:
    • ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to tensorforce.environments
    • Other environment implementations removed (may be upgraded in the future)
    Runners:
    • Improved run() API for Runner and ParallelRunner
    • ThreadedRunner removed
    Other:
    • examples folder (including configs) removed, apart from quickstart.py
    • New benchmarks folder to replace parts of old examples folder
    Source code(tar.gz)
    Source code(zip)
Owner
Tensorforce