Tensorforce: a TensorFlow library for applied reinforcement learning

Introduction

Tensorforce is an open-source deep reinforcement learning framework, with an emphasis on modularized flexible library design and straightforward usability for applications in research and practice. Tensorforce is built on top of Google's TensorFlow framework and requires Python 3.

Tensorforce follows a set of high-level design choices which differentiate it from other similar libraries:

  • Modular component-based design: Feature implementations, above all, strive to be as generally applicable and configurable as possible, potentially at some cost of faithfully resembling details of the introducing paper.
  • Separation of RL algorithm and application: Algorithms are agnostic to the type and structure of inputs (states/observations) and outputs (actions/decisions), as well as the interaction with the application environment.
  • Full-on TensorFlow models: The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, to enable portable computation graphs independent of application programming language, and to facilitate the deployment of models.

Installation

A stable version of Tensorforce is periodically updated on PyPI and installed as follows:

pip3 install tensorforce

To always use the latest version of Tensorforce, install the GitHub version instead:

git clone https://github.com/tensorforce/tensorforce.git
pip3 install -e tensorforce

Environments require additional packages, for which setup options are available (ale, gym, retro, vizdoom, carla; or envs for all environments). Some environments also require additional tools to be installed separately (see the environments documentation). Other setup options include tfa for TensorFlow Addons and tune for HpBandSter, which is required for the tune.py script.

Note on GPU usage: Unlike (un)supervised deep learning, RL does not always benefit from running on a GPU, depending on the environment and agent configuration. In particular, for environments with low-dimensional state spaces (i.e., no images), it is worth trying to run on CPU only.
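To try a CPU-only run, one common approach is to hide the GPUs from TensorFlow before it is imported. The following is a minimal sketch, assuming the installed TensorFlow build honours the standard CUDA_VISIBLE_DEVICES environment variable; the helper name force_cpu_only is hypothetical and not part of Tensorforce.

```python
import os


def force_cpu_only():
    """Hide all CUDA devices so that TensorFlow falls back to the CPU.

    Assumption: this is called before TensorFlow (and therefore Tensorforce)
    is imported for the first time in the process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return os.environ["CUDA_VISIBLE_DEVICES"]


force_cpu_only()
# from tensorforce import Agent, Environment  # import only after the call above
```

Timing a few hundred episodes with and without this call is usually enough to see which device suits a given environment.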

Quickstart example code

from tensorforce import Agent, Environment

# Pre-defined or custom environment
environment = Environment.create(
    environment='gym', level='CartPole', max_episode_timesteps=500
)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='tensorforce',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    memory=10000,
    update=dict(unit='timesteps', batch_size=64),
    optimizer=dict(type='adam', learning_rate=3e-4),
    policy=dict(network='auto'),
    objective='policy_gradient',
    reward_estimation=dict(horizon=20)
)

# Train for 300 episodes
for _ in range(300):

    # Initialize episode
    states = environment.reset()
    terminal = False

    while not terminal:
        # Episode timestep
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)

agent.close()
environment.close()
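The act/execute/observe loop above does not depend on what the environment actually is. As an illustration only, the sketch below runs the same loop against a toy environment and a stand-in agent written in plain Python. CoinFlipEnvironment, EchoAgent and run_episode are hypothetical names; they mimic the method names used in the quickstart but do not subclass or import anything from Tensorforce.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical toy environment: the state shows a coin, the action must match it."""

    def __init__(self, max_episode_timesteps=10, seed=0):
        self.max_episode_timesteps = max_episode_timesteps
        self.rng = random.Random(seed)
        self.timestep = 0
        self.coin = 0

    def reset(self):
        self.timestep = 0
        self.coin = self.rng.randint(0, 1)
        return [float(self.coin)]

    def execute(self, actions):
        # Reward 1.0 if the action matches the coin that was shown in the state
        reward = 1.0 if actions == self.coin else 0.0
        self.timestep += 1
        self.coin = self.rng.randint(0, 1)
        terminal = self.timestep >= self.max_episode_timesteps
        # Same return order as in the quickstart loop: states, terminal, reward
        return [float(self.coin)], terminal, reward


class EchoAgent:
    """Stand-in for a learning agent: always repeats the observed coin value."""

    def act(self, states):
        return int(states[0])

    def observe(self, terminal, reward):
        pass  # a real agent would store the reward and possibly update here


def run_episode(agent, environment):
    """Run one episode with the quickstart loop and return the summed reward."""
    states = environment.reset()
    terminal = False
    episode_reward = 0.0
    while not terminal:
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
        episode_reward += reward
    return episode_reward


total = run_episode(EchoAgent(), CoinFlipEnvironment())
```

Because the stand-in agent always echoes the coin it sees, it collects the maximum reward of 1.0 on each of the 10 timesteps.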

Command line usage

Tensorforce comes with a range of example configurations for different popular reinforcement learning environments. For instance, to run Tensorforce's implementation of the popular Proximal Policy Optimization (PPO) algorithm on the OpenAI Gym CartPole environment, execute the following line:

python3 run.py --agent benchmarks/configs/ppo.json --environment gym \
    --level CartPole-v1 --episodes 100

For more information check out the documentation.

Features

  • Network layers: Fully-connected, 1- and 2-dimensional convolutions, embeddings, pooling, RNNs, dropout, normalization, and more; plus support of Keras layers.
  • Network architecture: Support for multi-state inputs and layer (block) reuse, simple definition of directed acyclic graph structures via register/retrieve layer, plus support for arbitrary architectures.
  • Memory types: Simple batch buffer memory, random replay memory.
  • Policy distributions: Bernoulli distribution for boolean actions, categorical distribution for (finite) integer actions, Gaussian distribution for continuous actions, Beta distribution for range-constrained continuous actions, multi-action support.
  • Reward estimation: Configuration options for estimation horizon, future reward discount, state/state-action/advantage estimation, and for whether to consider terminal and horizon states.
  • Training objectives: (Deterministic) policy gradient, state-(action-)value approximation.
  • Optimization algorithms: Various gradient-based optimizers provided by TensorFlow like Adam/AdaDelta/RMSProp/etc, evolutionary optimizer, natural-gradient-based optimizer, plus a range of meta-optimizers.
  • Exploration: Randomized actions, sampling temperature, variable noise.
  • Preprocessing: Clipping, deltafier, sequence, image processing.
  • Regularization: L2 and entropy regularization.
  • Execution modes: Parallelized execution of multiple environments based on Python's multiprocessing and socket.
  • Optimized act-only SavedModel extraction.
  • TensorBoard support.
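To make the reward estimation options concrete, the sketch below computes discounted returns with an optional estimation horizon in plain Python. It is an illustration of the general idea, not Tensorforce's implementation: the function name discounted_returns is hypothetical, and the sketch leaves out value-function bootstrapping at the horizon state.

```python
def discounted_returns(rewards, discount=0.99, horizon=None):
    """For every timestep t, sum discount**k * rewards[t + k] for k = 0 .. horizon - 1.

    With horizon=None the sum runs until the end of the episode.
    """
    returns = []
    for t in range(len(rewards)):
        end = len(rewards) if horizon is None else min(len(rewards), t + horizon)
        returns.append(sum(discount ** (k - t) * rewards[k] for k in range(t, end)))
    return returns


full = discounted_returns([1.0, 1.0, 1.0], discount=0.5)
short = discounted_returns([1.0, 1.0, 1.0], discount=0.5, horizon=2)
```

A short horizon truncates the sum, which lowers variance at the cost of ignoring rewards further in the future unless a value estimate is added at the horizon state.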

By combining these modular components in different ways, a variety of popular deep reinforcement learning models/features can be replicated.

Note that, in general, the replication is not 100% faithful, since the models as described in the corresponding papers often involve additional minor tweaks and modifications that are hard to support with a modular design (and it is arguably questionable whether supporting them is important or desirable). On the upside, these models are just a few examples from the multitude of module combinations supported by Tensorforce.

Environment adapters

  • Arcade Learning Environment, a simple object-oriented framework that allows researchers and hobbyists to develop AI agents for Atari 2600 games.
  • CARLA, an open-source simulator for autonomous driving research.
  • OpenAI Gym, a toolkit for developing and comparing reinforcement learning algorithms which supports teaching agents everything from walking to playing games like Pong or Pinball.
  • OpenAI Retro, which lets you turn classic video games into Gym environments for reinforcement learning and comes with integrations for ~1000 games.
  • OpenSim, reinforcement learning with musculoskeletal models.
  • PyGame Learning Environment, a learning environment which allows a quick start to reinforcement learning in Python.
  • ViZDoom, which allows developing AI bots that play Doom using only the visual information.

Support, feedback and donating

Please get in touch via mail or on Gitter if you have questions, feedback, ideas for features/collaboration, or if you seek support for applying Tensorforce to your problem.

If you want to support the Tensorforce core team (see below), please also consider donating: GitHub Sponsors or Liberapay.

Core team and contributors

Tensorforce is currently developed and maintained by Alexander Kuhnle.

Earlier versions of Tensorforce (<= 0.4.2) were developed by Michael Schaarschmidt, Alexander Kuhnle and Kai Fricke.

The advanced parallel execution functionality was originally contributed by Jean Rabault (@jerabaul29) and Vincent Belus (@vbelus). Moreover, the pretraining feature was largely developed in collaboration with Hongwei Tang (@thw1021) and Jean Rabault (@jerabaul29).

The CARLA environment wrapper is currently developed by Luca Anzalone (@luca96).

We are very grateful for our open-source contributors (listed according to GitHub, updated periodically):

Islandman93, sven1977, Mazecreator, wassname, lefnire, daggertye, trickmeyer, mkempers, mryellow, ImpulseAdventure, janislavjankov, andrewekhalel, HassamSheikh, skervim, beflix, coord-e, benelot, tms1337, vwxyzjn, erniejunior, Deathn0t, petrbel, nrhodes, batu, yellowbee686, tgianko, AdamStelmaszczyk, BorisSchaeling, christianhidber, Davidnet, ekerazha, gitter-badger, kborozdin, Kismuz, mannsi, milesmcc, nagachika, neitzal, ngoodger, perara, sohakes, tomhennigan.

Cite Tensorforce

Please cite the framework as follows:

@misc{tensorforce,
  author       = {Kuhnle, Alexander and Schaarschmidt, Michael and Fricke, Kai},
  title        = {Tensorforce: a TensorFlow library for applied reinforcement learning},
  howpublished = {Web page},
  url          = {https://github.com/tensorforce/tensorforce},
  year         = {2017}
}

If you use the parallel execution functionality, please additionally cite it as follows:

@article{rabault2019accelerating,
  title        = {Accelerating deep reinforcement learning strategies of flow control through a multi-environment approach},
  author       = {Rabault, Jean and Kuhnle, Alexander},
  journal      = {Physics of Fluids},
  volume       = {31},
  number       = {9},
  pages        = {094105},
  year         = {2019},
  publisher    = {AIP Publishing}
}

If you use Tensorforce in your research, you may additionally consider citing the following paper:

@article{lift-tensorforce,
  author       = {Schaarschmidt, Michael and Kuhnle, Alexander and Ellis, Ben and Fricke, Kai and Gessert, Felix and Yoneki, Eiko},
  title        = {{LIFT}: Reinforcement Learning in Computer Systems by Learning From Demonstrations},
  journal      = {CoRR},
  volume       = {abs/1808.07903},
  year         = {2018},
  url          = {http://arxiv.org/abs/1808.07903},
  archivePrefix = {arXiv},
  eprint       = {1808.07903}
}
Comments
  • Unable to train for many episodes: RAM usage too high!

    Hi @AlexKuhnle, I have some trouble training a ppo agent. Basically, I'm able to train it for only very few episodes (e.g. 4, 8). If I increase the number of episodes, my laptop will crash or freeze due to running out of RAM.

    I have a linux machine with 16GB of RAM. Tensorflow 2.1.0 (cpu-only) and Tensorforce 0.5.4. The agent I'm trying to train is defined as follows:

    policy_network = dict(type='auto', size=64, depth=2,
                          final_size=256, final_depth=2, internal_rnn=10)

    agent = Agent.create(
        agent='ppo',
        environment=environment,
        max_episode_timesteps=200,
        network=policy_network,
        # Optimization
        batch_size=4,
        update_frequency=1,
        learning_rate=1e-3,
        subsampling_fraction=0.5,
        optimization_steps=5,
        # Reward estimation
        likelihood_ratio_clipping=0.2, discount=0.99, estimate_terminal=False,
        # Critic
        critic_network='auto',
        critic_optimizer=dict(optimizer='adam', multi_step=10, learning_rate=1e-3),
        # Exploration
        exploration=0.0, variable_noise=0.0,
        # Regularization
        l2_regularization=0.0, entropy_regularization=0.0,
    )
    

    The environment is a custom one I've made: it has a complex state space (i.e. an image and some feature vectors), and a simple action space (i.e. five float actions).

    I use a Runner to train the agent:

    runner = Runner(agent, environment, max_episode_timesteps=200, num_parallel=None)
    runner.run(num_episodes=100)
    

    As you can see from the above code snippet, I'd like to train my agent for (at least) 100 episodes but the system crashes after completing episode 4.

    I noticed that, during training, every batch_size episodes (4 in my case) Tensorforce allocates an additional 6-7 GB of RAM, which causes the system to crash: my OS uses 1 GB, the environment simulator 2-3 GB, and the agent and environment 3-4 GB.

    This is what happens (slightly before freezing): [screenshot: memory_issue_tensorforce]

    Is this behaviour normal? Just to be sure, I tested a similar (but simpler) agent on the CartPole environment for 300 episodes and it works fine with very little memory overhead. How is that possible?
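One way to narrow down this kind of growth is to record memory after every episode and check whether it rises in steps. The sketch below uses the standard-library tracemalloc module and a deliberately leaky stand-in for "run one episode". Note the assumption: tracemalloc only tracks allocations made through Python's allocator, so memory held natively by TensorFlow would not show up here, and the helper name python_heap_growth is hypothetical.

```python
import tracemalloc


def python_heap_growth(step, repetitions):
    """Call step() repeatedly and return the traced Python heap size (bytes) after each call."""
    tracemalloc.start()
    sizes = []
    for _ in range(repetitions):
        step()
        current, _peak = tracemalloc.get_traced_memory()
        sizes.append(current)
    tracemalloc.stop()
    return sizes


# Deliberately leaky stand-in for "run one episode": keeps 100 kB per call alive
leak = []
sizes = python_heap_growth(lambda: leak.append(bytearray(100_000)), repetitions=5)
```

If the recorded sizes stay flat while the process still grows, the growth is likely outside the Python heap, for example in native buffers.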

    Thank you in advance.

    opened by Luca96 36
  • Custom network and layer freezing

    Hi,

    I want to build a custom environment in which an action is a 2D matrix (basically a b/w image). One of the solutions I found is to use a policy-based algorithm such as PPO, with a policy network that contains deconvolution layers (I would probably use a U-Net).

    I first intended to use baselines, but I want the output of my network to match the action pixel-wise (the output at position (x,y) is used for the value at position (x,y) in the action). I believe this is not the case in the PPO2 implementation of baselines, where a fully-connected layer turns the output of the network into the parameters of a probability distribution from which the action is sampled.

    Would it be possible to simply write the U-Net architecture as a dictionary in your implementation and have it work as I want, given that the action space and network output shape match, or am I missing something?

    Also, is it possible to freeze layers of the network, and/or use a pre-trained network?

    I read through the documentation, but some of my questions probably have an obvious answer somewhere in the repo, sorry for that!

    opened by vbelus 29
  • Quickstart example get stuck [GPU]

    Hi,

    I just installed tensorforce (from pip) with tensorflow-gpu 1.7 and tried to run example/quickstart.py. The training starts but then gets stuck after n episodes, where n is the minimum of the batch_size and frequency values of the update_mode argument of PPOAgent.

    update_mode=dict(
        unit='episodes',
        # 20 episodes per update
        batch_size=20,
        # Every 20 episodes
        frequency=20
    ),
    

    No error message is displayed, it just hangs forever. Has anyone experienced something similar?

    Thanks,

    opened by nisace 28
  • tf2 branch: unable to use "saved_model"

    Hi,

    I've started to look at the saved_model export in the tf2 branch and I've run into some issues. First, I had to change tensorforce/core/utils/dicts.py, line 121, to accept all data types, since it seems that TensorFlow tries to rebuild dictionaries in the process: value_type=(tf.IndexedSlices, tf.Tensor, tf.Variable, object)

    Then, in tensorforce/core/models/model.py line 678, I got errors caused by the signature: ValueError: Got non-flat outputs '(TensorDict(main_sail_angle=Tensor("StatefulPartitionedCall:1", shape=(None,), dtype=float32), jib_angle=Tensor("StatefulPartitionedCall:0", shape=(None,), dtype=float32), rudder_angle=Tensor("StatefulPartitionedCall:2", shape=(None,), dtype=float32)), TensorDict())' from 'b'__inference_function_graph_2203'' for SavedModel signature 'serving_default'. Signatures have one Tensor per output, so to have predictable names Python functions used to generate these signatures should avoid outputting Tensors in nested structures.

    I tried to remove the signature in the saved_model.save call, and then ran into trouble with the function tf_function in tensorforce/core/module.py, which builds function_graphs with tuple keys, which TensorFlow doesn't like. I converted them to strings and could save a file, but it's totally unusable.

    So I'm stuck here and need more help: what is tf_function doing exactly? Why don't you use tf.function instead?

    Thanks! Ben

    opened by bezineb5 24
  • Error: Invalid Gradient

    Hi! I got this error during the training which I never saw before. Could you please help me with it?

    Thank you very much! Zebin Li

    Traceback (most recent call last):
      File "C:/Users/lizeb/Box/research projects/active learning for image classification/code/run_manytimes_RAL_AL_samedata_0522.py", line 18, in <module>
        performance_history_RL, performance_history_RL_test, performance_history_AL, performance_history_AL_test, test_RL, test_AL, rewards = runmanytimes()
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\RAL_AL_samedata_0522.py", line 407, in runmanytimes
        agent.observe(terminal=terminal, reward=reward)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorforce\agents\agent.py", line 510, in observe
        updated, episodes, updates = self.model.observe(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorforce\core\module.py", line 128, in decorated
        output_args = function_graphs[str(graph_params)](*graph_args)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
        result = self._call(*args, **kwds)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\def_function.py", line 814, in _call
        results = self._stateful_fn(*args, **kwds)
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
        return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
        return self._call_flat(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
        return self._build_call_outputs(self._inference_function.call(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
        outputs = execute.execute(
      File "C:\Users\lizeb\Box\research projects\active learning for image classification\code\venv\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid gradient: contains inf or nan. : Tensor had NaN values
             [[{{node agent/StatefulPartitionedCall/agent/cond_1/then/_262/agent/cond_1/StatefulPartitionedCall/agent/StatefulPartitionedCall_5/policy_optimizer/StatefulPartitionedCall/policy_optimizer/StatefulPartitionedCall/policy_optimizer/while/body/_1185/policy_optimizer/while/StatefulPartitionedCall/policy_optimizer/cond/then/_1464/policy_optimizer/cond/StatefulPartitionedCall/policy_optimizer/VerifyFinite/CheckNumerics}}]] [Op:__inference_observe_5103]

    Function call stack: observe

    opened by Zebin-Li 23
  • Some questions about tensorforce

    Hi, thanks for your great work. While reading the docs, I came up with some questions about this framework.

    Q1: How is the network updated? Does agent.observe(terminal=terminal, reward=reward) collect gradients until the number of timesteps/episodes specified in update_model is reached?

    Q2: Is the output layer of the network defined automatically when we define an agent? For example, if I define a DQNAgent with three actions to choose from, do I not need to define the last layer as dict(type='dense', size=3, activation='softmax')?

    Q3: DQNAgent needs to collect [St, a, r, St+1]. In the following example:

    while True:
        state2 = f(state)
        action = agent.act(states=state2)
        action2 = g(action)
        state, terminal, reward = environment.execute(actions=action2)
        agent.observe(reward=reward, terminal=terminal)
    

    Does it collect [state2, action2, r, state2'] or [state, action, r, state']?
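One way to reason about Q3, offered as a sketch rather than a statement about Tensorforce internals: an agent can only record what passes through its own interface, i.e. the argument given to act and the value act returns. The stand-in RecordingAgent and the transforms f and g below are hypothetical and written in plain Python.

```python
class RecordingAgent:
    """Hypothetical stand-in that logs what crosses the act/observe interface."""

    def __init__(self):
        self.transitions = []
        self._pending = None

    def act(self, states):
        action = 0  # fixed dummy decision
        self._pending = (states, action)
        return action

    def observe(self, reward, terminal):
        states, action = self._pending
        self.transitions.append((states, action, reward, terminal))


def f(state):
    """Hypothetical state preprocessing applied outside the agent."""
    return state * 10


def g(action):
    """Hypothetical action postprocessing applied outside the agent."""
    return action + 100


agent = RecordingAgent()
state = 1
state2 = f(state)
action = agent.act(states=state2)
action2 = g(action)  # what the environment would actually receive
agent.observe(reward=1.0, terminal=False)
```

In this sketch the log contains state2 and action: the raw state and the transformed action2 never reach the agent.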

    Q4: How can I output the training loss?

    Actually, I use a DQNAgent for a robot navigation task. The input is a compressed image, the goal and the previous action. The output is one of three actions (forward, left, right). The agent is defined as:

    network_spec = [
        dict(type='dense', size=128, activation='relu'),
        dict(type='dense', size=32, activation='relu')
    ]
    
    memory = dict(
        type='replay',
        include_next_states=True,
        capacity=10000
    )
    
    exploration = dict(
        type='epsilon_decay',
        initial_epsilon=1.0,
        final_epsilon=0.1,
        timesteps=10000,
        start_timestep=0
    )
    
    update_model = dict(
        unit='timesteps',
        batch_size=64,
        frequency=64
    )
    
    optimizer = dict(
        type='adam',
        learning_rate=0.0001
    )
    
    agent = DQNAgent(
        states=dict(shape=(36,), type='float'), 
        actions=dict(shape=(3,), type='int'), 
        network=network_spec,
        update_mode=update_model,
        memory=memory,
        actions_exploration=exploration,
        optimizer=optimizer,
        double_q_model=True
    )
    

    Because I need to compress the captured image into a vector that forms part of the state, I run an episode as follows rather than using a runner.

    while True:
        compressed_image = compress_image(observation)  # map the captured image to a 32-dim vector
        goal = env.goal  # shape (2,)
        pre_action = action  # shape (2,)
        state = compressed_image + goal + pre_action
        action = agent.act(state)
        observation, terminal, reward = env.execute(action)
        agent.observe(terminal=terminal, reward=reward)
        timestep += 1
        episode_reward += reward
        if terminal or timestep == max_timesteps:
            success = env.success
            break
    

    Can this work? I have trained for a long time but the results are not good, so I want to know whether I am using Tensorforce correctly. Thank you!

    opened by marooncn 23
  • [silent BUG] Saving/Restoring/Seeding PPO model when action_spec has multiple actions

    I'm still a novice with Tensorforce. I'm trying to save my PPO agent after training. The agent trains well, but when I save the model, stop the program, relaunch it and restore the model, the agent performs as if trained from scratch, whereas it was working well before.

    To save/restore I use:

    agent.save_model(directory=directory)
    agent.restore_model(directory=directory)
    

    Using the following, I have checked that the saved weights are correctly restored:

    tf_weights = agent.model.network.get_variables()
    weights = agent.model.session.run(tf_weights)
    print(weights)

    I tried to set a seed using tf.set_random_seed(42) at the beginning of my program in the hope of obtaining reproducible results (my env is fully deterministic), but across two sequential launches from the same restored weights, I get different actions for the same input.

    First run, first action after restore:

    input : 
    [[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
       0.05  0.    0.    0.    0.05  0.    0.    0.    0.05]]
    action : 
    [-1.65043855 -0.12582253  0.33019719 -0.42400551  0.39128172 -0.1892394
     -1.38783872 -0.84797424 -0.76125687 -0.44233581  0.2647087   0.57517719]
    

    Second run, first action after restore:

    input : 
    [[ 0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.    0.
       0.05  0.    0.    0.    0.05  0.    0.    0.    0.05]]
    action : 
    [ 0.00452828  1.70186901  0.18290332  0.1153125   0.80178595 -1.31738091
      0.2404308  -0.16986398 -1.69459999  2.09507513 -0.46165684 -0.34024456]
    

    I have disabled exploration and created the agent with:

    layerSize = 300
    actions = {}
    for i in range(12):
        actions[str(i)] = {'type': 'float'}
    network_spec = [
        dict(type='dense', size=layerSize, activation='selu'),
        dict(type='dense', size=layerSize, activation='selu'),
        dict(type='dense', size=layerSize, activation='selu')
    ]
    agent = PPOAgent(
        states=dict(type='float', shape=(12+9,)),
        actions=actions,
        batching_capacity=1000,
        network=network_spec,
        states_preprocessing=None,
        entropy_regularization=1e-3,
        actions_exploration=None,
        step_optimizer=dict(
            type='adam',
            learning_rate=1e-5
        ),
    )
    

    Are there some extra parameters which need to be saved when saving a PPO agent? (Maybe the parameters of the last layer, which are used to generate the mean and variance of the Gaussians needed to generate the continuous actions.)

    tensorforce.__version__
    '0.4.2'
    

    Thanks

    opened by unrealwill 23
  • What is the output of Agent Neural Network? If there is a std, can we fix it manually?

    Hi there, I'm curious about the output of the actor NN. In RL, the action is obtained by sampling from the output distribution of the actor NN. Therefore, the output of the actor NN must contain something like a mean and a standard deviation if the distribution is Gaussian. We could also fix the std and let the NN give us only the mean. What is the setting in your library? Can we change it manually?

    Besides, when we create the agent, we only need to provide the max and min values of the actions. How do you choose the action if the sampled action lies outside the range? Do you select the boundary value or shrink the distribution?
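To make the "select the boundary value" option concrete, the sketch below samples from a Gaussian with a fixed standard deviation and clips the result into the allowed range. This is a plain-Python illustration of one of the two options named in the question, not a description of what Tensorforce does (its feature list mentions a Beta distribution for range-constrained continuous actions). The function name sample_bounded_action is hypothetical.

```python
import random


def sample_bounded_action(mean, stddev, min_value, max_value, rng):
    """Sample from N(mean, stddev^2) and clip the result into [min_value, max_value]."""
    raw = rng.gauss(mean, stddev)
    return max(min_value, min(max_value, raw))


rng = random.Random(0)
# Mean close to the upper bound, so a sizeable share of samples gets clipped to 1.0
samples = [sample_bounded_action(0.9, 0.5, -1.0, 1.0, rng) for _ in range(1000)]

# With stddev fixed to 0.0 the "policy" becomes deterministic and returns the mean
deterministic = sample_bounded_action(0.3, 0.0, -1.0, 1.0, random.Random(1))
```

Clipping piles probability mass onto the boundaries, which is one reason a range-constrained distribution can be preferable to clipping a Gaussian.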

    Help appreciated!

    opened by XueminLiu111 22
  • Network Spec / Layers Documentation

    First of all, hello! I'm glad to have discovered this project and am planning on trying to use it.

    As for my question - I am unable to find any documentation describing what each of the layers are, what they do, and what their parameters are. Have I missed it or is it nonexistent? If it doesn't exist, I'd be happy to add some.

    opened by chairbender 22
  • InvalidArgumentError on terminal observe call

    Perhaps something is wrong with my code, but almost half the time when the episode ends, I get an assertion error when I run observe on my PPO agent:

    Traceback (most recent call last):
      File "ll.py", line 208, in <module>
        main()
      File "ll.py", line 181, in main
        agent.give_reward(reward, terminal)
      File "ll.py", line 123, in give_reward
        self.agent.observe(reward=reward, terminal=terminal)
      File "c:\users\connor\desktop\tensorforce\tensorforce\agents\agent.py", line 534, in observe
        terminal=terminal, reward=reward, parallel=[parallel], **kwargs
      File "c:\users\connor\desktop\tensorforce\tensorforce\core\module.py", line 578, in fn
        fetched = self.monitored_session.run(fetches=fetches, feed_dict=feed_dict)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 754, in run
        run_metadata=run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1360, in run
        raise six.reraise(*original_exc_info)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\six.py", line 696, in reraise
        raise value
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1345, in run
        return self._sess.run(*args, **kwargs)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1418, in run
        run_metadata=run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\training\monitored_session.py", line 1176, in run
        return self._sess.run(*args, **kwargs)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 956, in run
        run_metadata_ptr)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1180, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1359, in _do_run
        run_metadata)
      File "C:\Users\Connor\Anaconda3\envs\ll\lib\site-packages\tensorflow_core\python\client\session.py", line 1384, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x <= y did not hold element-wise:x (baseline-network-state.observe/baseline-network-state.core_observe/baseline-network-state.core_experience/memory.enqueue/strided_slice:0) = ] [18243] [y (baseline-network-state.observe/baseline-network-state.core_observe/baseline-network-state.core_experience/memory.enqueue/sub_2:0) = ] [17999]
             [[{{node Assert}}]]
    

    My original theory was that I was accidentally calling observe again after setting terminal=True and before resetting the agent, or some other abuse of observe, but I prevented that from happening in my code, so I don't believe that's the case. Also, the episode runs completely fine, and I get through thousands of calls to observe without ever running into any issues. It's only when terminal=True that it seems to occur.

    Running on Windows 10 x64, with tensorflow-gpu v2.0.0 on an RTX2070, Tensorforce installed from GitHub at commit 827febcf8ffda851e5e4f0c9d12d4a8e8502b282

    opened by connorlbark 21
  • Configuration refactoring - thoughts and suggestions welcome!

    Configuration has been a topic of discussion for quite some time now, so I thought it'd be a good idea to get all those thoughts in one place and hopefully solicit some user thoughts as well.

    From my understanding, the current purposes of configs are:

    1. make it easy for people to get up and running
    2. get all parameters in one place for ease of setting up experiments and making them interpretable
    3. keep signatures simple, which makes it easy to create arbitrary things from one big blob

    The current issues I'm having with configs are:

    1. they are somewhere between a dictionary and a blob-object, which makes them confusing
    2. they aren't serializable, so I have to create config wrappers around Configurations. Eww.
    3. defaults and unused parameters make it challenging to know what's really being used.

    My personal opinion is that we can keep benefits (1) and (2) above and get rid of all three issues all in exchange for the small sacrifice of benefit (3). In fact, I don't know how much of a benefit (3) is, as it obfuscates the true parameters of all objects in the codebase.

    I would propose doing so by putting the burden of the parameter creation and passing into the constructors of objects onto the user. Any intermediate user will have experience creating parameter/config generation wrappers. Less experienced users who want to get up and running quickly can still use the same JSON objects you've already written with something like this when actually creating the objects downstream:

    SomeObject(config_dict['some_key'], config_dict['another_key'])

    Users who want to create defaults can do this:

    SomeObject(config_dict.get('some_key', default_value), config_dict.get('another_key', another_default_value))

    To me, this is a much clearer way of handling defaults.
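
    The two call patterns above can be sketched end to end as follows. This is only an illustration of the proposal, not Tensorforce code: the `SomeObject` dataclass, its fields, and the contents of `config_dict` are hypothetical stand-ins.

```python
from dataclasses import asdict, dataclass


@dataclass
class SomeObject:
    """Hypothetical component with an explicit constructor signature."""
    some_key: int
    another_key: float


# Hypothetical configuration, e.g. the result of json.load() on a config file.
config_dict = {'some_key': 32}

# Required parameter: a missing key fails immediately with a KeyError.
# Optional parameter: the default value is visible at the call site.
obj = SomeObject(
    some_key=config_dict['some_key'],
    another_key=config_dict.get('another_key', 0.1),
)

# Because the object only holds plain parameters, it serializes back to a dict.
print(asdict(obj))
```

    With explicit signatures, the parameters an object actually uses can be read directly from its constructor call.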

    The side benefit of all of this is that you aren't stuck supporting configurations for users. Configuration and deployment are two challenging parts of any project, and I personally would prefer everyone's time spent on RL, not trying to solve a fundamental CS issue (configuration!) that in the end is always problem-specific, and no matter how hard we try, never suits everyone's needs.

    opened by trickmeyer 21
  • Issues installing Tensorforce from pip on Python 3.10

    I've been trying to use Tensorforce for a project in my college machine learning course, and ran into this issue: On Python 3.10, pip for some reason automatically installed Tensorforce 0.5.5 (probably because of issues discovered below). I was then able to import modules from it into a Jupyter notebook and define a custom environment class, but it threw a confusing error when I tried to initialize an agent. Trying to upgrade Tensorforce to 0.6.5 caused errors with backend dependency numpy, "could not build wheels for numpy." I eventually discovered that this was because Tensorforce 0.6.5 uses numpy 1.19.5, which is only compatible with Python releases up to 3.9. I'm currently trying to work around the problem with pyenv, and running into other issues with tensorflow and keras, but that's probably irrelevant to this project specifically. The main issue I wanted to point out is just that pip allowed the older version of Tensorforce to be installed on Python 3.10 and somehow missed the dependency issue, resulting in esoteric NoneType errors deep in the optimizer code. (I'm using KDE Plasma 5.24.7 on Ubuntu 22.04.1 custom-installed on an HP laptop with an Intel Core i5 10th Gen CPU and integrated graphics, just in case any of that is relevant at all.)
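
    A guard like the following could surface the mismatch before anything is installed. It is a minimal sketch: the upper bound of Python 3.9 is taken from the report above (the numpy 1.19.5 pin), and the function name is a hypothetical one chosen for illustration.

```python
import sys

# Assumption from the report above: the pinned numpy 1.19.5 supports Python up to 3.9.
MAX_SUPPORTED_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, max_supported=MAX_SUPPORTED_PYTHON):
    """Return True if the (major, minor) version does not exceed the assumed bound."""
    return tuple(version_info[:2]) <= max_supported


if not python_is_supported():
    print('Warning: this Python version is newer than the assumed supported maximum.')
```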

    opened by Nat-the-Chicken 0
  • Bump tensorflow from 2.8.0 to 2.9.3

    Bumps tensorflow from 2.8.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
  • Gym environment broken: 'dict' object has no attribute 'env_specs'

    It seems that the Gym module was updated recently, which broke the examples that use it.

    I tried act_observe_interface.py

    Traceback:

    AttributeError                            Traceback (most recent call last)
    Cell In [75], line 59
         55     environment.close()
         58 if __name__ == '__main__':
    ---> 59     main()
    
    Cell In [75], line 20, in main()
         19 def main():
    ---> 20     environment = Environment.create(environment='cartpole.json')
         21     agent = Agent.create(agent='ppo.json', environment=environment)
         23     # Train for 100 episodes
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\environment.py:176, in Environment.create(environment, max_episode_timesteps, remote, blocking, host, port, **kwargs)
        173     if max_episode_timesteps is None:
        174         max_episode_timesteps = kwargs.pop('max_episode_timesteps', None)
    --> 176     return Environment.create(
        177         environment=environment, max_episode_timesteps=max_episode_timesteps, **kwargs
        178     )
        180 elif '.' in environment:
        181     # Library specification
        182     library_name, module_name = environment.rsplit('.', 1)
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\environment.py:192, in Environment.create(environment, max_episode_timesteps, remote, blocking, host, port, **kwargs)
        189 elif environment in tensorforce.environments.environments:
        190     # Keyword specification
        191     environment = tensorforce.environments.environments[environment]
    --> 192     return Environment.create(
        193         environment=environment, max_episode_timesteps=max_episode_timesteps, **kwargs
        194     )
        196 else:
        197     # Default: OpenAI Gym
        198     try:
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\environment.py:146, in Environment.create(environment, max_episode_timesteps, remote, blocking, host, port, **kwargs)
        143     return environment
        145 elif isinstance(environment, type) and issubclass(environment, Environment):
    --> 146     environment = environment(**kwargs)
        147     assert isinstance(environment, Environment)
        148     return Environment.create(
        149         environment=environment, max_episode_timesteps=max_episode_timesteps
        150     )
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\openai_gym.py:163, in OpenAIGym.__init__(self, level, visualize, max_episode_steps, terminal_reward, reward_threshold, drop_states_indices, visualize_directory, **kwargs)
        161     self.max_episode_steps = max_episode_steps
        162 else:
    --> 163     self.environment, self.max_episode_steps = self.__class__.create_level(
        164         level=self.level, max_episode_steps=max_episode_steps,
        165         reward_threshold=reward_threshold, **kwargs
        166     )
        168 if visualize_directory is not None:
        169     self.environment = gym.wrappers.Monitor(
        170         env=self.environment, directory=visualize_directory
        171     )
    
    File c:\Stuff\Code\ML-main\ML_env\lib\site-packages\tensorforce\environments\openai_gym.py:67, in OpenAIGym.create_level(cls, level, max_episode_steps, reward_threshold, **kwargs)
         64 requires_register = False
         66 # Find level
    ---> 67 if level not in gym.envs.registry.env_specs:
         68     if max_episode_steps is None:  # interpret as false if level does not exist
         69         max_episode_steps = False
    
    AttributeError: 'dict' object has no attribute 'env_specs'
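
    The error suggests that `gym.envs.registry` is a plain dictionary in the installed Gym version, whereas the failing line expects an object with an `env_specs` attribute. A version-tolerant membership check could look like the sketch below. The helper name and the two stand-in registries are hypothetical, and this is not the fix adopted by Tensorforce.

```python
def level_is_registered(registry, level):
    """Return True if `level` is a known environment id.

    Assumption: older Gym releases expose the id-to-spec mapping as
    `registry.env_specs`, while newer ones make `registry` itself a dict.
    """
    specs = getattr(registry, 'env_specs', registry)
    return level in specs


class OldStyleRegistry:
    """Stand-in for the older registry object (no Gym import needed here)."""

    def __init__(self, env_specs):
        self.env_specs = env_specs


new_style = {'CartPole-v1': object()}
old_style = OldStyleRegistry({'CartPole-v1': object()})

print(level_is_registered(new_style, 'CartPole-v1'))
print(level_is_registered(old_style, 'CartPole-v1'))
```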
    
    opened by Ammar-AlDabbagh 1
  • lstm+ppo

    While training Pendulum-v0 with lstm+ppo, the following error occurred when using parallel environments:

    Traceback (most recent call last):
      File "D:\desktop\lunwen_dabao\xinsuanfa0912\tian_sac_test\launch_multiprocessing_traning_cylinder.py", line 91, in <module>
        runner = Runner(
      File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\execution\runner.py", line 168, in __init__
        environment = Environment.create(
      File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\environments\environment.py", line 94, in create
        environment = MultiprocessingEnvironment(
      File "C:\Users\1900\.conda\envs\yl\lib\site-packages\tensorforce\environments\multiprocessing_environment.py", line 62, in __init__
        process.start()
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\context.py", line 327, in _Popen
        return Popen(process_obj)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
        reduction.dump(process_obj, to_child)
      File "C:\Users\1900\.conda\envs\yl\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    AttributeError: Can't pickle local object 'overwrite_staticmethod.<locals>.overwritten'
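
    The last line of the traceback is a general Python limitation rather than something specific to LSTM or PPO: with the spawn start method used on Windows, objects handed to a child process must be picklable, and functions defined inside another function are not. The sketch below reproduces this with the standard library only; `make_wrapper` and `overwritten` are hypothetical names that merely mimic the one in the error message.

```python
import math
import pickle


def can_pickle(obj):
    """Return True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


def make_wrapper():
    # A function defined inside another function is a "local object":
    # pickle cannot look it up again by its qualified name.
    def overwritten():
        return 200
    return overwritten


local_function = make_wrapper()

print(can_pickle(math.sqrt))        # importable by name
print(can_pickle(local_function))   # local object
```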
    

    Below is my code. What is wrong with it? Also, is there a problem with the LSTM settings?

    import argparse
    import re
    from tensorforce import Runner, Agent, Environment
    # import envobject_cylinder
    import os
    import gym
    
    parser = argparse.ArgumentParser()
    shell_args = vars(parser.parse_args())
    shell_args['num_episodes'] = 300
    shell_args['max_episode_timesteps'] = 200
    
    # Create one environment instance per parallel worker
    number_servers = 10
    environments = []
    for i in range(10):
        env = Environment.create(
            environment='gym', level='Pendulum-v0', max_episode_timesteps=200
        )
        environments.append(env)
    
    # environment = Environment.create(
    #     environment='gym', level='Pendulum-v0', max_episode_timesteps=200
    # )
    
    # Two stacked LSTM layers for the policy network
    network_spec = [
        dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm'),
        dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm')
    ]
    # Same architecture for the baseline network
    baseline_spec = [
        dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm'),
        dict(type='rnn', size=512, horizon=4, activation='tanh', cell='lstm')
    ]
    
    # env=gym.make('Pendulum-v0')
    # Instantiate a Tensorforce agent
    agent = Agent.create(
        states=dict(type='float', shape=(3,)),
        actions=dict(type='float', shape=(1,), min_value=-2, max_value=2),
        max_episode_timesteps=200,
        agent='ppo',
        # num_parallel=10,
        environment=env,
        # max_episode_timesteps=200,
        batch_size=20,
        network=network_spec,
        learning_rate=0.001,
        state_preprocessing=None,
        entropy_regularization=0.01,
        likelihood_ratio_clipping=0.2,
        subsampling_fraction=0.2,
        predict_terminal_values=True,
        discount=0.97,
        # baseline=dict(type='1', size=[32, 32]),
        baseline=baseline_spec,
        baseline_optimizer=dict(
            type='multi_step',
            optimizer=dict(type='adam', learning_rate=1e-3),
            num_steps=5
        ),
        multi_step=25,
        parallel_interactions=number_servers,
        saver=dict(
            directory=os.path.join(os.getcwd(), 'saved_models/checkpoint'),
            frequency=1  # save a checkpoint at every update
        ),
        summarizer=dict(
            directory='summary',
            # list of labels, or 'all'
            summaries=['entropy', 'kl-divergence', 'loss', 'reward', 'update-norm']
        ),
    )
    print('Agent defined DONE!')
    
    runner = Runner(
        agent=agent,
        num_parallel=10,
        environments=environments,
        max_episode_timesteps=200,
        evaluation=False,
        remote='multiprocessing',
    )
    print('Runner defined DONE!')
    
    runner.run(
        num_episodes=shell_args['num_episodes'],
        save_best_agent='best_model',
        sync_episodes=False,
    )
    runner.close()
    
    opened by 1900360 0
  • GPU integration for MacOs12.3 M1 Max

    I ran the Quickstart.py example and got the following error:

    Metal device set to: Apple M1 Max

    systemMemory: 32.00 GB maxCacheSize: 10.67 GB

    WARNING:root:Infinite min_value bound for state.
    Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]
    Traceback (most recent call last):
      File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 53, in <module>
        main()
      File "/Users/dominikrichard/workspace/minesweeping/minesweepingpython/main/tensforce_testing.py", line 46, in main
        runner.run(num_episodes=200)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 649, in run
        self.handle_act(parallel=n)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/execution/runner.py", line 697, in handle_act
        actions = self.agent.act(states=self.states[parallel], parallel=parallel)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 415, in act
        return super().act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/recorder.py", line 262, in act
        actions, internals = self.fn_act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/agents/agent.py", line 462, in fn_act
        actions, timesteps = self.model.act(
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorforce/core/module.py", line 136, in decorated
        output_args = function_graphsstr(graph_params)
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/opt/homebrew/anaconda3/envs/TensFenv/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
        tensors = pywrap_tfe.TFE_Py_Execute(ctx.handle, device_name, op_name,
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation agent/VerifyFinite/CheckNumerics: Could not satisfy explicit device specification '' because the node {{colocation_node agent/VerifyFinite/CheckNumerics}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0].
    Colocation Debug Info:
    Colocation group had the following types and supported devices:
    Root Member(assigned_device_name_index=1 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
    Identity: GPU CPU
    Switch: GPU CPU
    CheckNumerics: CPU
    _Arg: GPU CPU

    Colocation members, user-requested devices, and framework assigned devices, if any:
      args_0 (_Arg) framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
      agent/VerifyFinite/CheckNumerics (CheckNumerics)
      agent/VerifyFinite/control_dependency (Identity)
      agent/assert_greater_equal/Assert/AssertGuard/args_0/_16 (Switch)
      agent/assert_less_equal/Assert/AssertGuard/args_0/_26 (Switch)
      Func/agent/StatefulPartitionedCall/input/_80 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
      Func/agent/assert_greater_equal/Assert/AssertGuard/then/_10/input/_153 (Identity)
      Func/agent/assert_greater_equal/Assert/AssertGuard/else/_11/input/_159 (Identity)
      Func/agent/assert_less_equal/Assert/AssertGuard/then/_20/input/_165 (Identity)
      Func/agent/assert_less_equal/Assert/AssertGuard/else/_21/input/_171 (Identity)
      Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/input/_260 (Identity) /job:localhost/replica:0/task:0/device:GPU:0
      Func/agent/StatefulPartitionedCall/state_preprocessing/PartitionedCall/linear_normalization0/PartitionedCall/input/_356 (Identity) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node agent/VerifyFinite/CheckNumerics}}]] [Op:__inference_act_1848]
    

    Episodes: 0%| | 0/200 [00:00, return=0.00, ts/ep=0, sec/ep=0.00, ms/ts=0.0, agent=0.0%]


    I installed Tensorforce using this guide; https://tensorforce.readthedocs.io/en/latest/basics/installation.html

    for M1 Mac in a new Conda environment. I also had to upgrade numpy to 1.22 to run the code.

    My Conda env is built as follows:

    Name                     Version       Build                 Channel

    absl-py                  1.2.0         pypi_0                pypi
    astunparse               1.6.3         pypi_0                pypi
    blas                     1.0           openblas
    bzip2                    1.0.8         h620ffc9_4
    c-ares                   1.18.1        h1a28f6b_0
    ca-certificates          2022.07.19    hca03da5_0
    cachetools               5.2.0         pypi_0                pypi
    certifi                  2022.6.15     py310hca03da5_0
    charset-normalizer       2.1.0         pypi_0                pypi
    cloudpickle              2.1.0         pypi_0                pypi
    cycler                   0.11.0        pypi_0                pypi
    flatbuffers              1.12          pypi_0                pypi
    fonttools                4.34.4        pypi_0                pypi
    gast                     0.4.0         pypi_0                pypi
    google-auth              2.10.0        pypi_0                pypi
    google-auth-oauthlib     0.4.6         pypi_0                pypi
    google-pasta             0.2.0         pypi_0                pypi
    grpcio                   1.42.0        py310h95c9599_0
    gym                      0.21.0        pypi_0                pypi
    h5py                     3.6.0         py310h181c318_0
    hdf5                     1.12.1        h160e8cb_2
    idna                     3.3           pypi_0                pypi
    keras                    2.9.0         pypi_0                pypi
    keras-preprocessing      1.1.2         pypi_0                pypi
    kiwisolver               1.4.4         pypi_0                pypi
    krb5                     1.19.2        h3b8d789_0
    libclang                 14.0.6        pypi_0                pypi
    libcurl                  7.84.0        hc6d1d07_0
    libcxx                   12.0.0        hf6beb65_1
    libedit                  3.1.20210910  h1a28f6b_0
    libev                    4.33          h1a28f6b_1
    libffi                   3.4.2         hc377ac9_4
    libgfortran              5.0.0         11_2_0_he6877d6_26
    libgfortran5             11.2.0        he6877d6_26
    libnghttp2               1.46.0        h95c9599_0
    libopenblas              0.3.20        hea475bc_0
    libssh2                  1.10.0        hf27765b_0
    llvm-openmp              12.0.0        haf9daa7_1
    markdown                 3.4.1         pypi_0                pypi
    markupsafe               2.1.1         pypi_0                pypi
    matplotlib               3.5.1         pypi_0                pypi
    msgpack                  1.0.3         pypi_0                pypi
    msgpack-numpy            0.4.7.1       pypi_0                pypi
    ncurses                  6.3           h1a28f6b_3
    numpy                    1.22.0        pypi_0                pypi
    oauthlib                 3.2.0         pypi_0                pypi
    openssl                  1.1.1q        h1a28f6b_0
    opt-einsum               3.3.0         pypi_0                pypi
    packaging                21.3          pypi_0                pypi
    pillow                   9.2.0         pypi_0                pypi
    pip                      22.1.2        py310hca03da5_0
    protobuf                 3.19.4        pypi_0                pypi
    pyasn1                   0.4.8         pypi_0                pypi
    pyasn1-modules           0.2.8         pypi_0                pypi
    pyparsing                3.0.9         pypi_0                pypi
    python                   3.10.4        hbdb9e5c_0
    python-dateutil          2.8.2         pypi_0                pypi
    readline                 8.1.2         h1a28f6b_1
    requests                 2.28.1        pypi_0                pypi
    requests-oauthlib        1.3.1         pypi_0                pypi
    rsa                      4.9           pypi_0                pypi
    setuptools               61.2.0        py310hca03da5_0
    six                      1.15.0        pypi_0                pypi
    sqlite                   3.39.2        h1058600_0
    tensorboard              2.9.1         pypi_0                pypi
    tensorboard-data-server  0.6.1         pypi_0                pypi
    tensorboard-plugin-wit   1.8.1         pypi_0                pypi
    tensorflow-deps          2.8.0         0                     apple
    tensorflow-estimator     2.9.0         pypi_0                pypi
    tensorflow-macos         2.9.2         pypi_0                pypi
    tensorflow-metal         0.5.0         pypi_0                pypi
    tensorforce              0.6.5         pypi_0                pypi
    termcolor                1.1.0         pypi_0                pypi
    tk                       8.6.12        hb8d0fd4_0
    tqdm                     4.62.3        pypi_0                pypi
    typing-extensions        4.3.0         pypi_0                pypi
    tzdata                   2022a         hda174b7_0
    urllib3                  1.26.11       pypi_0                pypi
    werkzeug                 2.2.2         pypi_0                pypi
    wheel                    0.37.1        pyhd3eb1b0_0
    wrapt                    1.14.1        pypi_0                pypi
    xz                       5.2.5         h1a28f6b_1
    zlib                     1.2.12        h5a0b063_2


    Is there any way to address this issue? I also tried downgrading Python to 3.9, which did not work. Is macOS not supposed to be supported when using tensorflow-metal?

    Thank you.

    opened by doric35 0
  • Bump mistune from 0.8.4 to 2.0.3 in /docs

    Bumps mistune from 0.8.4 to 2.0.3.

    Release notes

    Sourced from mistune's releases.

    Version 2.0.2

    Fix escape_url via lepture/mistune#295

    Version 2.0.1

    Fix XSS for image link syntax.

    Version 2.0.0

    First release of Mistune v2.

    Version 2.0.0 RC1

    In this release, we have a Security Fix for harmful links.

    Version 2.0.0 Alpha 1

    This is the first release of v2. An alpha version for users to have a preview of the new mistune.

    Changelog

    Sourced from mistune's changelog.

    Changelog

    Here is the full history of mistune v2.

    Version 2.0.4

    
    Released on Jul 15, 2022
    
    • Fix url plugin in <a> tag
    • Fix * formatting

    Version 2.0.3

    Released on Jun 27, 2022

    • Fix table plugin
    • Security fix for CVE-2022-34749

    Version 2.0.2

    
    Released on Jan 14, 2022
    

    Fix escape_url

    Version 2.0.1

    Released on Dec 30, 2021

    XSS fix for image link syntax.

    Version 2.0.0

    
    Released on Dec 5, 2021
    

    This is the first non-alpha release of mistune v2.

    Version 2.0.0rc1

    Released on Feb 16, 2021

    Version 2.0.0a6


    ... (truncated)

    Commits
    • 3f422f1 Version bump 2.0.3
    • a6d4321 Fix asteris emphasis regex CVE-2022-34749
    • 5638e46 Merge pull request #307 from jieter/patch-1
    • 0eba471 Fix typo in guide.rst
    • 61e9337 Fix table plugin
    • 76dec68 Add documentation for renderer heading when TOC enabled
    • 799cd11 Version bump 2.0.2
    • babb0cf Merge pull request #295 from dairiki/bug.escape_url
    • fc2cd53 Make mistune.util.escape_url less aggressive
    • 3e8d352 Version bump 2.0.1
    • Additional commits viewable in compare view


    dependencies 
    opened by dependabot[bot] 0
Releases(0.6.5)
  • 0.6.5(Aug 30, 2021)

    Agents:
    • Renamed agent argument reward_preprocessing to reward_processing, and in case of Tensorforce agent moved to reward_estimation[reward_processing]
    Distributions:
    • New categorical distribution argument skip_linear to not add the implicit linear logits layer
    Environments:
    • Support for multi-actor parallel environments via new function Environment.num_actors()
      • Runner uses multi-actor parallelism by default if environment is multi-actor
    • New optional Environment function episode_return() which returns the true return of the last episode, if cumulative sum of environment rewards is not a good metric for runner display
    Examples:
    • New vectorized_environment.py and multiactor_environment.py scripts to illustrate how to set up a vectorized/multi-actor environment.
  • 0.6.4(Jun 5, 2021)

    Agents:
    • Agent argument update_frequency / update[frequency] now supports float values > 0.0, which specify the update-frequency relative to the batch-size
    • Changed default value for argument update_frequency from 1.0 to 0.25 for DQN, DoubleDQN, DuelingDQN agents
    • New arguments return_processing and advantage_processing (where applicable) for all agent sub-types
    • New function Agent.get_specification() which returns the agent specification as dictionary
    • New function Agent.get_architecture() which returns a string representation of the network layer architecture
    Modules:
    • Improved and simplified module specification, for instance: network=my_module instead of network=my_module.TestNetwork, or environment=envs.custom_env instead of environment=envs.custom_env.CustomEnvironment (module file needs to be in the same directory or a sub-directory)
    Networks:
    • New argument single_output=True for some policy types which, if False, allows the specification of additional network outputs for some/all actions via registered tensors
    • KerasNetwork argument model now supports arbitrary functions as long as they return a tf.keras.Model
    Layers:
    • New layer type SelfAttention (specification key: self_attention)
    Parameters:
    • Support tracking of non-constant parameter values
    Runner:
    • Rename attribute episode_rewards as episode_returns, and TQDM status reward as return
    • Extend argument agent to support Agent.load() keyword arguments to load an existing agent instead of creating a new one.
    Examples:
    • Added action_masking.py example script to illustrate an environment implementation with built-in action masking.
    Bugfixes:
    • Customized device placement was not applied to most tensors
  • 0.6.3(Mar 22, 2021)

    Agents:
    • New agent argument tracking and corresponding function tracked_tensors() to track and retrieve the current value of predefined tensors, similar to summarizer for TensorBoard summaries
    • New experimental value trace_decay and gae_decay for Tensorforce agent argument reward_estimation, soon for other agent types as well
    • New options "early" and "late" for value estimate_advantage of Tensorforce agent argument reward_estimation
    • Changed default value for Agent.act() argument deterministic from False to True
    Networks:
    • New network type KerasNetwork (specification key: keras) as wrapper for networks specified as Keras model
    • Passing a Keras model class/object as policy/network argument is automatically interpreted as KerasNetwork
    Distributions:
    • Changed Gaussian distribution argument global_stddev=False to stddev_mode='predicted'
    • New Categorical distribution argument temperature_mode=None
    Layers:
    • New option for Function layer argument function to pass string function expression with argument "x", e.g. "(x+1.0)/2.0"
    Summarizer:
    • New summary episode-length recorded as part of summary label "reward"
    Environments:
    • Support for vectorized parallel environments via new function Environment.is_vectorizable() and new argument num_parallel for Environment.reset()
      • See tensorforce/environments/cartpole.py for a vectorizable environment example
      • Runner uses vectorized parallelism by default if num_parallel > 1, remote=None and environment supports vectorization
      • See examples/act_observe_vectorized.py for more details on act-observe interaction
    • New extended and vectorizable custom CartPole environment via key custom_cartpole (work in progress)
    • New environment argument reward_shaping to provide a simple way to modify/shape rewards of an environment, can be specified either as callable or string function expression
    run.py script:
    • New option for command line arguments --checkpoints and --summaries to add comma-separated checkpoint/summary filename in addition to directory
    • Added episode lengths to logging plot besides episode returns
    Bugfixes:
    • Temporal horizon handling of RNN layers
    • Critical bugfix for late horizon value prediction (including DQN variants and DPG agent) in combination with baseline RNN
    • GPU problems with scatter operations
  • 0.6.2(Oct 3, 2020)

  • 0.6.1(Sep 19, 2020)

    Agents:
    • Removed default value "adam" for Tensorforce agent argument optimizer (since default optimizer argument learning_rate removed, see below)
    • Removed option "minimum" for Tensorforce agent argument memory, use None instead
    • Changed default value for dqn/double_dqn/dueling_dqn agent argument huber_loss from 0.0 to None
    Layers:
    • Removed default value 0.999 for exponential_normalization layer argument decay
    • Added new layer batch_normalization (generally should only be used for the agent arguments reward_processing[return_processing] and reward_processing[advantage_processing])
    • Added exponential/instance_normalization layer argument only_mean with default False
    • Added exponential/instance_normalization layer argument min_variance with default 1e-4
    Optimizers:
    • Removed default value 1e-3 for optimizer argument learning_rate
    • Changed default value for optimizer argument gradient_norm_clipping from 1.0 to None (no gradient clipping)
    • Added new optimizer doublecheck_step and corresponding argument doublecheck_update for optimizer wrapper
    • Removed linesearch_step optimizer argument accept_ratio
    • Removed natural_gradient optimizer argument return_improvement_estimate
    Saver:
    • Added option to specify agent argument saver as string, which is interpreted as saver[directory] with otherwise default values
    • Added default value for agent argument saver[frequency] as 10 (save model every 10 updates by default)
    • Changed default value of agent argument saver[max_checkpoints] from 5 to 10
    Summarizer:
    • Added option to specify agent argument summarizer as string, which is interpreted as summarizer[directory] with otherwise default values
    • Renamed option of agent argument summarizer from summarizer[labels] to summarizer[summaries] (use of the term "label" due to earlier version, outdated and confusing by now)
    • Changed interpretation of agent argument summarizer[summaries] = "all" to include only numerical summaries, so all summaries except "graph"
    • Changed default value of agent argument summarizer[summaries] from ["graph"] to "all"
    • Changed default value of agent argument summarizer[max_summaries] from 5 to 7 (number of different colors in TensorBoard)
    • Added option summarizer[filename] to agent argument summarizer
    Recorder:
    • Added option to specify agent argument recorder as string, which is interpreted as recorder[directory] with otherwise default values
    run.py script:
    • Added --checkpoints/--summaries/--recordings command line argument to enable saver/summarizer/recorder agent argument specification separate from core agent configuration
    Examples:
    • Added save_load_agent.py example script to illustrate regular agent saving and loading
    Bugfixes:
    • Fixed problem with optimizer argument gradient_norm_clipping not being applied correctly
    • Fixed problem with exponential_normalization layer not updating moving mean and variance correctly
    • Fixed problem with recent memory for timestep-based updates sometimes sampling invalid memory indices
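
The string shorthand for saver, summarizer and recorder described above can be pictured as follows. This is a minimal sketch in plain Python, not Tensorforce's actual implementation: the helper name expand_shorthand is hypothetical, and the default values are only the ones stated in these notes.

```python
from typing import Optional, Union

# Defaults as stated in the 0.6.1 notes; other keys are intentionally omitted.
SAVER_DEFAULTS = {"frequency": 10, "max_checkpoints": 10}
SUMMARIZER_DEFAULTS = {"summaries": "all", "max_summaries": 7}


def expand_shorthand(
    value: Union[None, str, dict], defaults: Optional[dict] = None
) -> Optional[dict]:
    """Hypothetical helper: read a string as {'directory': <string>} plus defaults."""
    if value is None:
        return None
    if isinstance(value, str):
        value = {"directory": value}
    expanded = dict(defaults or {})
    expanded.update(value)  # explicitly given entries win over defaults
    return expanded


saver = expand_shorthand("checkpoints", SAVER_DEFAULTS)
summarizer = expand_shorthand("summaries", SUMMARIZER_DEFAULTS)
recorder = expand_shorthand("recordings")
custom_saver = expand_shorthand(
    {"directory": "checkpoints", "frequency": 50}, SAVER_DEFAULTS
)
```

Passing a full dictionary remains possible in this sketch, so the shorthand only saves typing in the common case where just the directory differs from the defaults.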
  • 0.6.0 (Aug 30, 2020)

    • Removed agent arguments execution, buffer_observe, seed
    • Renamed agent arguments baseline_policy/baseline_network/critic_network to baseline/critic
    • Renamed agent reward_estimation arguments estimate_horizon to predict_horizon_values, estimate_actions to predict_action_values, estimate_terminal to predict_terminal_values
    • Renamed agent argument preprocessing to state_preprocessing
    • Default agent preprocessing linear_normalization
    • Moved agent arguments for reward/return/advantage processing from preprocessing to reward_preprocessing and reward_estimation[return_/advantage_processing]
    • New agent argument config with values buffer_observe, enable_int_action_masking, seed
    • Renamed PPO/TRPO/DPG argument critic_network/_optimizer to baseline/baseline_optimizer
    • Renamed PPO argument optimization_steps to multi_step
    • New TRPO argument subsampling_fraction
    • Changed agent argument use_beta_distribution default to false
    • Added double DQN agent (double_dqn)
    • Removed Agent.act() argument evaluation
    • Removed agent function arguments query (functionality removed)
    • Agent saver functionality changed (Checkpoint/SavedModel instead of Saver/Protobuf): save/load functions and saver argument changed
    • Default behavior when specifying saver is not to load agent, unless agent is created via Agent.load
    • Agent summarizer functionality changed: summarizer argument changed, some summary labels and other options removed
    • Renamed RNN layers internal_{rnn/lstm/gru} to rnn/lstm/gru and rnn/lstm/gru to input_{rnn/lstm/gru}
    • Renamed auto network argument internal_rnn to rnn
    • Renamed (internal_)rnn/lstm/gru layer argument length to horizon
    • Renamed update_modifier_wrapper to optimizer_wrapper
    • Renamed optimizing_step to linesearch_step, and UpdateModifierWrapper argument optimizing_iterations to linesearch_iterations
    • Optimizer subsampling_step accepts both absolute (int) and relative (float) fractions
    • Objective policy_gradient argument ratio_based renamed to importance_sampling
    • Added objectives state_value and action_value
    • Added Gaussian distribution arguments global_stddev and bounded_transform (for improved bounded action space handling)
    • Changed default memory device argument to CPU:0
    • Renamed rewards summaries
    • Agent.create() accepts act-function as agent argument for recording
    • Singleton states and actions are now consistently handled as singletons
    • Major change to policy handling and defaults, in particular parametrized_distributions, new default policies parametrized_state/action_value
    • Combined long and int type
    • Always wrap environment in EnvironmentWrapper class
    • Changed tune.py arguments
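
Several of the 0.6.0 changes are plain renames, so an existing argument dictionary can in principle be updated mechanically. The sketch below is a hypothetical migration helper in plain Python (migrate_to_0_6_0 is not part of Tensorforce). It assumes the old arguments are held in a dict and covers only the unambiguous renames and the move of buffer_observe and seed into config; the preprocessing split and the baseline/critic renames of the Tensorforce agent need manual attention.

```python
import copy

# Renames listed for PPO/TRPO/DPG in the 0.6.0 notes.
TOP_LEVEL_RENAMES = {
    "critic_network": "baseline",
    "critic_optimizer": "baseline_optimizer",
    "optimization_steps": "multi_step",
}
# Renames inside the reward_estimation argument.
REWARD_ESTIMATION_RENAMES = {
    "estimate_horizon": "predict_horizon_values",
    "estimate_actions": "predict_action_values",
    "estimate_terminal": "predict_terminal_values",
}
# Former top-level arguments that now live under the config argument.
MOVED_TO_CONFIG = ("buffer_observe", "seed")


def migrate_to_0_6_0(old_args):
    """Hypothetical helper: return a copy of old_args using 0.6.0 names."""
    new_args = copy.deepcopy(old_args)
    for old_key, new_key in TOP_LEVEL_RENAMES.items():
        if old_key in new_args:
            new_args[new_key] = new_args.pop(old_key)
    estimation = new_args.get("reward_estimation")
    if isinstance(estimation, dict):
        for old_key, new_key in REWARD_ESTIMATION_RENAMES.items():
            if old_key in estimation:
                estimation[new_key] = estimation.pop(old_key)
    moved = {key: new_args.pop(key) for key in MOVED_TO_CONFIG if key in new_args}
    if moved:
        config = dict(new_args.get("config") or {})
        config.update(moved)
        new_args["config"] = config
    new_args.pop("execution", None)  # removed in 0.6.0, no replacement listed
    return new_args


# Illustrative old-style PPO arguments (values are made up).
old_ppo = {
    "agent": "ppo",
    "critic_network": "auto",
    "critic_optimizer": {"optimizer": "adam", "learning_rate": 1e-3},
    "optimization_steps": 10,
    "seed": 0,
    "execution": None,
}
new_ppo = migrate_to_0_6_0(old_ppo)
```

The input dictionary is copied rather than modified, so the old and new forms can be compared side by side while upgrading.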
  • 0.5.5 (Jun 16, 2020)

    • Changed independent mode of agent.act to use final values of dynamic hyperparameters and avoid TensorFlow conditions
    • Extended "tensorflow" format of agent.save to include an optimized Protobuf model with an act-only graph as .pb file, and Agent.load format "pb-actonly" to load act-only agent based on Protobuf model
    • Support for custom summaries via new summarizer argument value custom to specify summary type, and Agent.summarize(...) to record summary values
    • Added min/max-bounds for dynamic hyperparameters to assert valid range and infer other arguments
    • Argument batch_size now mandatory for all agent classes
    • Removed Estimator argument capacity, now always automatically inferred
    • Internal changes related to agent arguments memory, update and reward_estimation
    • Changed the default bias and activation argument of some layers
    • Fixed issues with sequence preprocessor
    • DQN and dueling DQN properly constrained to int actions only
    • Added use_beta_distribution argument with default True to many agents and ParametrizedDistributions policy, so default can be changed
  • 0.5.4 (Feb 15, 2020)

    • DQN/DuelingDQN/DPG argument memory now required to be specified explicitly, plus update_frequency default changed
    • Removed (temporarily) conv1d/conv2d_transpose layers due to TensorFlow gradient problems
    • Agent, Environment and Runner can now be imported via from tensorforce import ...
    • New generic reshape layer available as reshape
    • Support for batched version of Agent.act and Agent.observe
    • Support for parallelized remote environments based on Python's multiprocessing and socket (replacing tensorforce/contrib/socket_remote_env/ and tensorforce/environments/environment_process_wrapper.py), available via Environment.create(...), Runner(...) and run.py
    • Removed ParallelRunner and merged functionality with Runner
    • Changed run.py arguments
    • Changed independent mode for Agent.act: additional argument internals and corresponding return value, initial internals via Agent.initial_internals(), Agent.reset() not required anymore
    • Removed deterministic argument for Agent.act unless independent mode
    • Added format argument to save/load/restore with supported formats tensorflow, numpy and hdf5
    • Changed save argument append_timestep to append with default None (instead of 'timesteps')
    • Added get_variable and assign_variable agent functions
  • 0.5.3 (Dec 26, 2019)

    • Added optional memory argument to various agents
    • Improved summary labels, particularly "entropy" and "kl-divergence"
    • linear layer now accepts tensors of rank 1 to 3
    • Network output / distribution input does not need to be a vector anymore
    • Transposed convolution layers (conv1d/2d_transpose)
    • Parallel execution functionality contributed by @jerabaul29, currently under tensorforce/contrib/
    • Accept string for runner save_best_agent argument to specify best model directory different from saver configuration
    • saver argument steps removed and seconds renamed to frequency
    • Moved Parallel/Runner argument max_episode_timesteps from run(...) to constructor
    • New Environment.create(...) argument max_episode_timesteps
    • TensorFlow 2.0 support
    • Improved Tensorboard summaries recording
    • Summary labels graph, variables and variables-histogram temporarily not working
    • TF-optimizers updated to TensorFlow 2.0 Keras optimizers
    • Added TensorFlow Addons dependency, and support for TFA optimizers
    • Changed unit of target_sync_frequency from timesteps to updates for dqn and dueling_dqn agent
  • 0.5.2 (Oct 14, 2019)

    • Improved unittest performance
    • Added updates and renamed timesteps/episodes counter for agents and runners
    • Renamed critic_{network,optimizer} argument to baseline_{network,optimizer}
    • Added Actor-Critic (ac), Advantage Actor-Critic (a2c) and Dueling DQN (dueling_dqn) agents
    • Improved "same" baseline optimizer mode and added optional weight specification
    • Reuse layer now global for parameter sharing across modules
    • New block layer type (block) for easier sharing of layer blocks
    • Renamed PolicyAgent/-Model to TensorforceAgent/-Model
    • New Agent.load(...) function, saving includes agent specification
    • Removed PolicyAgent argument (baseline-)network
    • Added policy argument temperature
    • Removed "same" and "equal" options for baseline_* arguments and changed internal baseline handling
    • Combined state/action_value to value objective with argument value either "state" or "action"
  • 0.5.1 (Sep 10, 2019)

  • 0.5.0 (Sep 8, 2019)

    Major Revision

    Agent:
    • DQFDAgent removed (temporarily)
    • DQNNstepAgent and NAFAgent part of DQNAgent
    • Agents need to be initialized via agent.initialize() before application
    • States/actions of type int require an entry num_values (instead of num_actions)
    • Agent.from_spec() changed and renamed to Agent.create()
    • Agent.act() argument fetch_tensors changed and renamed to query, index renamed to parallel, buffered removed
    • Agent.observe() argument index renamed to parallel
    • Agent.atomic_observe() removed
    • Agent.save/restore_model() renamed to Agent.save/restore()
    Agent arguments:
    • update_mode renamed to update
    • states_preprocessing and reward_preprocessing changed and combined to preprocessing
    • actions_exploration changed and renamed to exploration
    • execution entry num_parallel replaced by a separate argument parallel_interactions
    • batched_observe and batching_capacity replaced by argument buffer_observe
    • scope renamed to name
    DQNAgent arguments:
    • update_mode replaced by batch_size, update_frequency and start_updating
    • optimizer removed, implicitly defined as 'adam', learning_rate added
    • memory defines capacity of implicitly defined memory 'replay'
    • double_q_model removed (temporarily)
    Policy gradient agent arguments:
    • New mandatory argument max_episode_timesteps
    • update_mode replaced by batch_size and update_frequency
    • memory removed
    • baseline_mode removed
    • baseline argument changed and renamed to critic_network
    • baseline_optimizer renamed to critic_optimizer
    • gae_lambda removed (temporarily)
    PPOAgent arguments:
    • step_optimizer removed, implicitly defined as 'adam', learning_rate added
    TRPOAgent arguments:
    • cg_* and ls_* arguments removed
    VPGAgent arguments:
    • optimizer removed, implicitly defined as 'adam', learning_rate added
    Environment:
    • Environment properties states and actions are now functions states() and actions()
    • States/actions of type int require an entry num_values (instead of num_actions)
    • New function Environment.max_episode_timesteps()
    Contrib environments:
    • ALE, MazeExp, OpenSim, Gym, Retro, PyGame and ViZDoom moved to tensorforce.environments
    • Other environment implementations removed (may be upgraded in the future)
    Runners:
    • Improved run() API for Runner and ParallelRunner
    • ThreadedRunner removed
    Other:
    • examples folder (including configs) removed, apart from quickstart.py
    • New benchmarks folder to replace parts of old examples folder
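
The change from num_actions to num_values for int-typed states and actions can be applied to existing specification dictionaries with a small recursive rename. The following is a sketch under the assumption that specs are nested plain dicts with a type entry; upgrade_spec is a hypothetical helper, not a Tensorforce function.

```python
def upgrade_spec(spec):
    """Hypothetical helper: rename num_actions to num_values in int-typed specs."""
    if not isinstance(spec, dict):
        return spec  # leaf value such as a string, number or shape tuple
    # Rebuild the dict so the original specification is left untouched.
    upgraded = {key: upgrade_spec(value) for key, value in spec.items()}
    if upgraded.get("type") == "int" and "num_actions" in upgraded:
        upgraded["num_values"] = upgraded.pop("num_actions")
    return upgraded


# Illustrative pre-0.5.0 action specification (names and values are made up).
old_actions = {
    "move": {"type": "int", "num_actions": 4},
    "throttle": {"type": "float", "shape": (1,)},
}
new_actions = upgrade_spec(old_actions)
```

Float-typed entries pass through unchanged, since the notes only mention the rename for type int.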