Deep reinforcement learning library built on top of Neural Network Libraries

Overview

License Build status

Deep Reinforcement Learning Library built on top of Neural Network Libraries

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL only supports Python version >= 3.6 and NNabla version >= 1.17.

Enabling GPU accelaration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]
# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, set the gpu id to run the algorithm on through algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has friendly Python APIs which enables to start training with only 3 lines of python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

To get more details about NNablaRL, see documentation and examples.

Many builtin algorithms

Most of famous/SOTA deep reinforcement learning algorithms, such as DQN, SAC, BCQ, GAIL, etc., are implemented in NNablaRL. Implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get completely the same results when running the reproduction code on your computer. The result may slightly change depending on your machine, nnabla/nnabla-rl's package version, etc.

Seemless switching of online and offline training

In reinforcement learning, there are two main training procedures, online and offline, to train the agent. Online training is a training procedure that executes both data collection and network update alternately. Conversely, offline training is a training procedure that updates the network using only existing data. With NNablaRL, you can switch these two training procedures seemlessly. For example, as shown below, you can easily train a robot's controller online using simulated environment and finetune it offline with real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)

Getting started

Try below interactive demos to get started.
You can run it directly on Colab from the links in the table below.

Title Notebook Target RL task
Simple reinforcement learning training to get started Open In Colab Pendulum
Learn how to use training algorithms Open In Colab Pendulum
Learn how to use customized network model for training Open In Colab Mountain car
Learn how to use different network solver for training Open In Colab Pendulum
Learn how to use different replay buffer for training Open In Colab Pendulum
Learn how to use your own environment for training Open In Colab Customized environment
Atari game training example Open In Colab Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License Version 2.0 license.

Comments
  • Update cem function interface

    Update cem function interface

    Updated interface of cross entropy function methods. The args, pop_size is now changed to sample_size. In addition, the given objective function to CEM function will be called with variable x which has (batch_size, sample_size, x_dim). This is different from previous interface. If you want to know the details, please see the function docs.

    opened by sbsekiguchi 1
  • Add implementation for RNN support and DRQN algorithm

    Add implementation for RNN support and DRQN algorithm

    Add RNN model support and DRQN algorithm.

    Following trainers will support RNN-model.

    • Q value-based trainers
    • Deterministic gradient and Soft policy trainers

    Other trainers can support RNN models in future but is not implemented in the initial release.

    See this paper for the details of the DRQN algorithm.

    opened by ishihara-y 1
  • Implement SACD

    Implement SACD

    This PR implements SAC-D algorithm. https://arxiv.org/abs/2206.13901

    These changes have been made:

    • New environments with factored reward functions have been added
      • FactoredLunarLanderContinuousV2NNablaRL-v1
      • FactoredAntV4NNablaRL-v1
      • FactoredHopperV4NNablaRL-v1
      • FactoredHalfCheetahV4NNablaRL-v1
      • FactoredWalker2dV4NNablaRL-v1
      • FactoredHumanoidV4NNablaRL-v1
    • SACD algorithms has been added
    • SoftQDTrainer has been added
    • _InfluenceMetricsEvaluator has been added
    • reproduction script has been added (not benchmarked yet)

    visualizing influence metrics

    import gym
    
    import numpy as np
    import matplotlib.pyplot as plt
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
    
    evaluation_hook = H.EvaluationHook(
        eval_env,
        EpisodicEvaluator(run_per_evaluation=10),
        timing=5000,
        writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
    )
    iteration_num_hook = H.IterationNumHook(timing=100)
    
    config = A.SACDConfig(gpu_id=0, reward_dimension=9)
    sacd = A.SACD(env, config=config)
    sacd.set_hooks([iteration_num_hook, evaluation_hook])
    sacd.train_online(env, total_iterations=100000)
    
    influence_history = []
    
    state = env.reset()
    while True:
        action = sacd.compute_eval_action(state)
        influence = sacd.compute_influence_metrics(state, action)
        influence_history.append(influence)
        state, _, done, _ = env.step(action)
        if done:
            break
    
    influence_history = np.array(influence_history)
    for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_eingine", "side_engine", "failure", "success"]):
        plt.plot(influence_history[:, i], label=label)
    plt.xlabel("step")
    plt.ylabel("influence metrics")
    plt.legend()
    plt.show()
    

    image

    sample animation

    sample

    opened by ishihara-y 0
  • Add gmm and Update gaussian

    Add gmm and Update gaussian

    Added gmm and gaussian of the numpy models. In addition, updated the gaussian distribution's API.

    The API change is like following:

    Previous :

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    distribution = D.Gaussian(mean, ln_var)
    # return nn.Variable
    assert isinstance(distribution.sample(), nn.Variable)
    

    Updated:

    batch_size = 10
    output_dim = 10
    input_shape = (batch_size, output_dim)
    mean = np.zeros(shape=input_shape)
    sigma = np.ones(shape=input_shape) * 5.
    ln_var = np.log(sigma) * 2.
    # You have to pass the nn.Variable if you want to get nn.Variable as all class method's return.
    distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
    assert isinstance(distribution.sample(), nn.Variable)
    
    # If you pass np.ndarray, then all class methods return np.ndarray
    # Currently, only support without batch shape (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
    distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
    assert isinstance(distribution.sample(), np.ndarray)
    
    opened by sbsekiguchi 0
  • Support nnabla-browser

    Support nnabla-browser

    • [x] add MonitorWriter
    • [x] save computational graph as nntxt

    example

    import gym
    
    import nnabla_rl.algorithms as A
    import nnabla_rl.hooks as H
    import nnabla_rl.writers as W
    from nnabla_rl.utils.evaluator import EpisodicEvaluator
    
    # save training computational graph
    training_graph_hook = H.TrainingGraphHook(outdir="test")
    
    # evaluation hook with nnabla's Monitor
    eval_env = gym.make("Pendulum-v0")
    evaluator = EpisodicEvaluator(run_per_evaluation=10)
    evaluation_hook = H.EvaluationHook(
        eval_env,
        evaluator,
        timing=10,
        writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
    )
    
    env = gym.make("Pendulum-v0")
    sac = A.SAC(env)
    sac.set_hooks([training_graph_hook, evaluation_hook])
    
    sac.train_online(env, total_iterations=100)
    

    image image

    opened by ishihara-y 0
  • Add iLQR and LQR

    Add iLQR and LQR

    Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.

    Co-authored-by: Yu Ishihara [email protected] Co-authored-by: Shunichi Sekiguchi [email protected]

    opened by ishihara-y 0
  • Check np_random instance and use correct randint alternative

    Check np_random instance and use correct randint alternative

    I am not sure when this change was made but in some environment, gym.unwrapped.np_random returns Generator instead of RandomState.

    # in case of RandomState
    # this line works
    gym.unwrapped.np_random.rand_int(...)
    # in case of Generator
    # rand_int does not exist and we must use integers as an alternative
    gym.unwrapped.np_random.integers(...)
    

    This PR will fix this issue and chooses correct function for sampling integers.

    opened by ishihara-y 0
  • Add icra2018 qtopt

    Add icra2018 qtopt

    opened by sbsekiguchi 0
Releases(v0.12.0)
Owner
Sony
Sony Group Corporation
Sony
Opensea-upload-with-recaptcha-solution - Updated opensea uploading solution with recaptcha pass

opensea-upload-with-recaptcha-solution updated opensea uploading solution with r

byeonggeon sim 25 Nov 15, 2022
The official wrapper for spyse.com API, written in Python, aimed to help developers build their integrations with Spyse.

Python wrapper for Spyse API The official wrapper for spyse.com API, written in Python, aimed to help developers build their integrations with Spyse.

Spyse 15 Nov 22, 2022
A Telegram Bot That Provides Permanent Download Links For Sent Files.

FileStreamBot A Telegram bot to all media and documents files to web link . Report a Bug | Request Feature Demo Bot: 🍁 About This Bot : This bot will

Flux Inc. 1 Nov 02, 2021
Apps related to Odoo it's calendar features

calendar Apps related to Odoo it's calendar/appointments features: online_appointment_locations: allow setting an online URL per employee online_appoi

Yenthe Van Ginneken 3 Oct 27, 2022
📷 Instagram Bot - Tool for automated Instagram interactions

InstaPy Tooling that automates your social media interactions to “farm” Likes, Comments, and Followers on Instagram Implemented in Python using the Se

Tim Großmann 13.5k Dec 01, 2021
Discondelete, is a Discord self-bot to delete dm's or purge all messages from a guild.

Discondelete Discondelete, is a Discord self-bot to delete dm's or purge all messages from a guild. Report Bug · Request Feature Table of Contents Abo

core 4 Feb 28, 2022
A multi-purpose Discord bot with simple moderation commands, reaction roles, reminders, and much more!

Nokari This is the rewrite of Nokari. There are still a lot of things to be done. I'm still working on the internal logic, so the bot basically has no

Norizon 13 Nov 17, 2022
This is a very easy to use tool developed in python that will search for free courses from multiple sites including youtube and enroll in the ones in which it can.

Free-Course-Hunter-and-Enroller This is a very easy to use tool developed in python that will search for free courses from multiple sites including yo

Zain 12 Nov 12, 2022
The implementation of Learning Instance and Task-Aware Dynamic Kernels for Few Shot Learning

INSTA: Learning Instance and Task-Aware Dynamic Kernels for Few Shot Learning This repository provides the implementation and demo of Learning Instanc

11 Jan 02, 2023
API para realizar parser de frases

NLP API Simple api to parse and apply some preprocessing steps in portuguses phrases (pt_BR) This api uses the great FastAPI and spaCy packages! Usage

⟠ Rodolfo De Nadai 1 Dec 28, 2021
Translator based on Google API

Yakusu Toshiko Translator based on Google API. Instance of this bot is running as @yakusubot. Features Add a plus to a language's name to show an orig

Arisu W. 2 Sep 21, 2022
Telegram bot to stream videos in telegram voicechat for both groups and channels. Supports live strams, YouTube videos and telegram media.

Telegram VCVideoPlayBot An Telegram Bot By @ZauteKm To Stream Videos in Telegram Voice Chat. NOTE: Make sure you have started a VoiceChat in your Grou

Zaute 20 Oct 21, 2022
Python API for British Geological Survey magnetic field calculator

Magnetic field calculator Python API for British Geological Survey magnetic field calculator. Description This project magnetic field calculator. It u

Filip Š 3 Mar 11, 2022
Telegram Bot for generating and decoding QR-codes

Telegram openqrgen_bot Telegram Bot that generates from user's messages and decodes QR-codes from photos. Also contains rickroll detection :) Just typ

2 Nov 14, 2021
Bot Telegram per creare e gestire un Babbo Natale Segreto con amici ecc

Babbo Natale Segreto: Telegram Bot Bot Telegram per creare e gestire un Babbo Natale Segreto con amici ecc. Che cos'è? Il Babbo Natale Segreto è un gi

Francesco Ciociola 2 Jul 18, 2022
Async client API for the Telegram Group Calls

PyTgCalls This project allow to make Telegram group call with MTProto Api using Pyrogram and WebRTC, this is possible thanks to the power of NodeJS's

185 Jan 03, 2023
a discord libary that use to make discord bot with low efficiency and bad performance because I don't know how to manage the project

Aircord 🛩️ a discord libary that use to make discord bot with low efficiency and bad performance because I don't know how to manage the project Examp

Aircord 2 Oct 24, 2021
OSINT tool to get information from a Github and Gitlab profile and find user's email addresses leaked on commits.

gitrecon OSINT tool to get information from a Github or Gitlab profile and find user's email addresses leaked on commits. 📚 How does this work? GitHu

GOΠZO 211 Dec 17, 2022
This is Pdisk Upload Bot made using Python with Pyrogram Framework. Its capable of uploading direct download link with thumbnail or without thumbnail & with Title Support.

Pdisk-Upload-Bot Introduction This Is PDisk Upload Bot Used To Upload Direct Link To Pdisk With Thumb Support Deploy Heroku Deploy Local Deploy pip in

HEIMAN PICTURES 32 Oct 21, 2022
A powerfull Telegram Leech Bot

owner of this repo :- Abijthkutty contact me :- Abijth Telegram Torrent and Direct links Leecher Dont Abuse The Repo ... this is intented to run in Sm

αвιנтн 9 Jun 11, 2022