A toolkit for reproducible reinforcement learning research.

Overview


garage

garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit.

The toolkit provides a wide range of modular tools for implementing RL algorithms, including:

  • Composable neural network models
  • Replay buffers
  • High-performance samplers
  • An expressive experiment definition interface
  • Tools for reproducibility (e.g. set a global random seed which all components respect)
  • Logging to many outputs, including TensorBoard
  • Reliable experiment checkpointing and resuming
  • Environment interfaces for many popular benchmark suites
  • Support for running garage in diverse environments, including always up-to-date Docker containers
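The reproducibility tooling centers on a single global seed that all components respect. The following is a minimal standalone sketch of that idea using only the standard library; garage's own deterministic.set_seed is similar in spirit but also seeds numpy, TensorFlow, and PyTorch when they are present.

```python
import random

def set_global_seed(seed):
    # Seed the stdlib RNG. This is an illustrative stand-in for garage's
    # deterministic.set_seed, which seeds every framework it knows about.
    random.seed(seed)

set_global_seed(42)
first = [random.random() for _ in range(3)]
set_global_seed(42)
second = [random.random() for _ in range(3)]
assert first == second  # re-seeding reproduces the same stream
```

The same pattern extends to any stochastic component: as long as every RNG is derived from the one global seed, two runs with the same seed produce the same results.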

See the latest documentation for getting started instructions and detailed APIs.

Installation

pip install --user garage

Examples

Starting from version v2020.10.0, garage comes packaged with examples. To get a list of examples, run:

garage examples

You can also run garage examples --help, or visit the documentation for even more details.

Join the Community

Join the garage-announce mailing list for infrequent updates (<1/mo.) on the status of the project and new releases.

Need some help? Want to ask whether garage is right for your project? Have a question which is not quite a bug and not quite a feature request?

Join the community Slack by filling out this Google Form.

Algorithms

The table below summarizes the algorithms available in garage.

Algorithm Framework(s)
CEM numpy
CMA-ES numpy
REINFORCE (a.k.a. VPG) PyTorch, TensorFlow
DDPG PyTorch, TensorFlow
DQN PyTorch, TensorFlow
DDQN PyTorch, TensorFlow
ERWR TensorFlow
NPO TensorFlow
PPO PyTorch, TensorFlow
REPS TensorFlow
TD3 PyTorch, TensorFlow
TNPG TensorFlow
TRPO PyTorch, TensorFlow
MAML PyTorch
RL2 TensorFlow
PEARL PyTorch
SAC PyTorch
MTSAC PyTorch
MTPPO PyTorch, TensorFlow
MTTRPO PyTorch, TensorFlow
Task Embedding TensorFlow
Behavioral Cloning PyTorch

Supported Tools and Frameworks

garage requires Python 3.6+. If you need Python 3.5 support, the last garage release to support Python 3.5 was v2020.06.

The package is tested on Ubuntu 18.04. It is also known to run on Ubuntu 16.04 and 20.04, and on recent versions of macOS using Homebrew. Windows users can install garage via WSL, or by making use of the Docker containers.

We currently support PyTorch and TensorFlow for implementing the neural network portions of RL algorithms, and additions of new framework support are always welcome. PyTorch modules can be found in the package garage.torch and TensorFlow modules can be found in the package garage.tf. Algorithms which do not require neural networks are found in the package garage.np.

The package is available for download on PyPI, and we ensure that it installs successfully into environments defined using conda, Pipenv, and virtualenv.

Testing

The most important feature of garage is its comprehensive automated unit test and benchmarking suite, which helps ensure that the algorithms and modules in garage maintain state-of-the-art performance as the software changes.

Our testing strategy has three pillars:

  • Automation: We use continuous integration to test all modules and algorithms in garage before adding any change. The full installation and test suite is also run nightly, to detect regressions.
  • Acceptance Testing: Any commit which might change the performance of an algorithm is subjected to comprehensive benchmarks on the relevant algorithms before it is merged.
  • Benchmarks and Monitoring: We benchmark the full suite of algorithms against their relevant benchmarks and widely-used implementations regularly, to detect regressions and improvements we may have missed.

Supported Releases

Release Build Status Last date of support
v2020.06 Garage CI Release-2020.06 February 28th, 2021

Garage releases a new stable version approximately every 4 months, in February, June, and October. Maintenance releases have a stable API and dependency tree, and receive bug fixes and critical improvements but not new features. We currently support each release for a window of 8 months.

Citing garage

If you use garage for academic research, please cite the repository using the following BibTeX entry. You should update the commit field with the commit or release tag your publication uses.

@misc{garage,
 author = {The garage contributors},
 title = {Garage: A toolkit for reproducible reinforcement learning research},
 year = {2019},
 publisher = {GitHub},
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/rlworkgroup/garage}},
 commit = {be070842071f736eb24f28e4b902a9f144f5c97b}
}

Credits

The earliest code for garage was adapted from a predecessor project called rllab. The garage project is grateful for the contributions of the original rllab authors, and hopes to continue advancing the state of reproducibility in RL research in the same spirit. garage has previously been supported by the Amazon Research Award "Watch, Practice, Learn, Do: Unsupervised Learning of Robust and Composable Robot Motion Skills by Fusing Expert Demonstrations with Robot Experience."



Comments
  • Bug fixes for importing Box

    Bug fixes for importing Box

    There are two major changes:

    1. Change importing gym.spaces.Box to from gym.spaces import Box
    2. Add spec property method for MujocoEnv to fix the no method error when calling env.spec
    opened by bohan-zhang 30
  • Delete stub()

    Delete stub()

    Traceback (most recent call last):
      File "/Users/jonathon/Documents/garage/garage/scripts/run_experiment.py", line 191, in <module>
        run_experiment(sys.argv)
      File "/Users/jonathon/Documents/garage/garage/scripts/run_experiment.py", line 146, in run_experiment
        logger.log_parameters_lite(params_log_file, args)
      File "/Users/jonathon/Documents/garage/garage/garage/misc/logger.py", line 372, in log_parameters_lite
        json.dump(log_params, f, indent=2, sort_keys=True, cls=MyEncoder)
      File "/anaconda2/envs/garage/lib/python3.6/json/__init__.py", line 179, in dump
        for chunk in iterable:
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 430, in _iterencode
        yield from _iterencode_dict(o, _current_indent_level)
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      [Previous line repeated 1 more times]
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 437, in _iterencode
        o = _default(o)
      File "/Users/jonathon/Documents/garage/garage/garage/misc/logger.py", line 352, in default
        return json.JSONEncoder.default(self, o)
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'TimeLimit' is not JSON serializable
    
    bug envs 
    opened by jonashen 25
  • Fix a typo to allow evaluating algos deterministically

    Fix a typo to allow evaluating algos deterministically

    When running some experiments with SAC I have discovered that my algorithm does not act deterministically during the evaluation (i.e. the action is sampled from the policy distribution instead of taking the mean/mode of the distribution). The code for obtaining evaluation samples uses the rollout function from sampler.utils with the argument deterministic=True.

    The rollout function is then supposed to look into agent_info dictionary and use the mean value stored there. Unfortunately, currently in the code it looks into the agent_infos (with s at the end), which is a list containing agent_info dictionaries and as such obviously does not contain the mean key. This means that the stochastic, sampled action is used instead. My pull request solves this issue by fixing the typo.

    Technical sidenote - maybe there should be an exception raised if deterministic=True and there is no mean key in the dict?

    ready-to-merge backport-to-2019.10 backport-to-2020.06 
    opened by maciejwolczyk 24
  • Replace CategoricalConvPolicy

    Replace CategoricalConvPolicy

    • Remove all occurrences of CategoricalConvPolicy
    • Rename CategoricalConvPolicyWithModel to CategoricalConvPolicy
    • Create and remove integration test

    Benchmark script is located in origin/benchmark_categorical_cnn_policy.

    Results: MemorizeDigits-v0 PPO benchmark figures (per-run and mean) are attached.

    Also tried running both versions in an Atari environment, PongNoFrameskip, with PPO, but realized that this combination of environment and algorithm is not ideal for our testing.

    As discussed, results from MemorizeDigits are sufficient to show that the layer implementation can be replaced with the model implementation.

    opened by lywong92 24
  • Fix sleeping processes

    Fix sleeping processes

    The joblib package responsible for the MemmappingPool has been updated to address bugs that could produce sleeping processes in the parallel sampler. The environment variable JOBLIB_START_METHOD has also been removed, since joblib no longer implements it. However, if run_experiment is interrupted during the optimization steps, sleeping processes are still produced. To fix this, the child processes of the parallel sampler now ignore SIGINT, so they are not killed while holding a lock that is also acquired by the parent process, avoiding a deadlock. To make sure the child processes are terminated, the SIGINT handler in the parent process is overridden to call the terminate and join functions on the process pool. The process (thread in TF) used in Plotter is terminated by registering the shutdown method with atexit, but one important missing step was cleaning the Queue that interacts with the worker process.

    opened by ghost 22
  • Failed to reproduce example her_ddpg_fetchreach

    Failed to reproduce example her_ddpg_fetchreach

    Hi,

    I was trying to run examples/tf/her_ddpg_fetchreach.py but got a much worse performance. I attached the results as follows, and it seems that it's not working at all. Do you have any idea how to make it work? Though the default parameters look reasonable, should I try to tune some parameters?

    Thank you in advance.

    AverageSuccessRate 0
    Epoch 49
    Evaluation/AverageDiscountedReturn -9.94112
    Evaluation/AverageReturn -999.93
    Evaluation/CompletionRate 0
    Evaluation/Iteration 980
    Evaluation/MaxReturn -998
    Evaluation/MinReturn -1000
    Evaluation/NumTrajs 100
    Evaluation/StdReturn 0.324191
    Policy/AveragePolicyLoss 4.17451
    QFunction/AverageAbsQ 4.19709
    QFunction/AverageAbsY 4.19412
    QFunction/AverageQ -4.16916
    QFunction/AverageQFunctionLoss 0.0232313
    QFunction/AverageY -4.16946
    QFunction/MaxQ 2.87869
    QFunction/MaxY 2.62668
    TotalEnvSteps 100000

    bug 
    opened by st2yang 20
  • Master branch can't pass make test

    Master branch can't pass make test

    The current master branch can't pass make test. However, the failing tests pass when run separately with unittest.

    ======================================================================
    ERROR: test_dm_control_tf_policy (tests.garage.envs.dm_control.test_dm_control_tf_policy.TestDmControlTfPolicy)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/envs/dm_control/test_dm_control_tf_policy.py", line 38, in test_dm_control_tf_policy
        runner.train(n_epochs=1, batch_size=10)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 321, in train
        start_epoch=0)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 407, in _train
        self.save(epoch, paths if store_paths else None)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 210, in save
        snapshotter.save_itr_params(epoch, params)
      File "/root/code/garage/garage/logger/snapshotter.py", line 85, in save_itr_params
        with open(file_name, 'wb') as file:
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgplyc983/params.pkl'
    
    ======================================================================
    ERROR: test_cem_cartpole (tests.garage.np.algos.test_cem.TestCEM)
    Test CEM with Cartpole-v1 environment.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/np/algos/test_cem.py", line 35, in test_cem_cartpole
        n_epochs=5, batch_size=2000, n_epoch_cycles=n_samples)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 321, in train
        start_epoch=0)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 407, in _train
        self.save(epoch, paths if store_paths else None)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 210, in save
        snapshotter.save_itr_params(epoch, params)
      File "/root/code/garage/garage/logger/snapshotter.py", line 85, in save_itr_params
        with open(file_name, 'wb') as file:
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgplyc983/params.pkl'
    
    ======================================================================
    ERROR: test_cma_es_cartpole (tests.garage.np.algos.test_cma_es.TestCMAES)
    Test CMAES with Cartpole-v1 environment.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/np/algos/test_cma_es.py", line 33, in test_cma_es_cartpole
        runner.train(n_epochs=1, batch_size=1000, n_epoch_cycles=n_samples)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 321, in train
        start_epoch=0)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 407, in _train
        self.save(epoch, paths if store_paths else None)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 210, in save
        snapshotter.save_itr_params(epoch, params)
      File "/root/code/garage/garage/logger/snapshotter.py", line 85, in save_itr_params
        with open(file_name, 'wb') as file:
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgplyc983/params.pkl'
    
    ======================================================================
    FAIL: test_trpo_recurrent_cartpole (tests.garage.tf.algos.test_trpo_with_model.TestTRPO)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/tf/algos/test_trpo_with_model.py", line 39, in test_trpo_recurrent_cartpole
        assert last_avg_ret > 90
    AssertionError
    
    ----------------------------------------------------------------------
    Ran 623 tests in 789.240s
    
    FAILED (failures=1, errors=3)
    Makefile:60: recipe for target 'run-headless' failed
    make: *** [run-headless] Error 1
    
    bug 
    opened by zequnyu 19
  • Update setup script for OS X

    Update setup script for OS X

    The script has been updated based on setup_linux.sh, using Homebrew to install the required packages and updating the requirements for mujoco-py, gym, and baselines according to their documentation for OS X. Since mujoco-py is installed within the setup scripts for Linux and OS X, the separate MuJoCo setup script has been removed. The tensorflow package was also added to environment.yml so it can be installed out of the box without the scripts. The option to install the GPU flavor of TensorFlow may be removed from the Linux script once everything can be installed using environment.yml (or another single list of dependencies) alone. Finally, the default for set-envvar and the correct string replacement in the print-error and warning functions have been added to the Linux setup script.

    opened by ghost 19
  • Add Pytorch TRPO

    Add Pytorch TRPO

    Implemented Trust Region Policy Optimization in PyTorch.

    Benchmarks are currently running and should be finished by tomorrow. I opened this PR to get some feedback, since initial results and tests looked good.

    ready-to-merge 
    opened by utkarshjp7 18
  • Wrap examples in tests to run on CI

    Wrap examples in tests to run on CI

    Doesn't include the test for sim_policy.py. Will do that separately.

    Update: Also excludes tf/ppo_memorize_digits, tf/dqn_pong.py and tf/trpo_cubecrash.py for now, since they take too long to run on CI even with 1 epoch. Still figuring out decent enough parameters to run them on CI, but want to get this merged first.

    needs force push backport-to-2019.10 
    opened by gitanshu 18
  • Replace CategoricalLSTMPolicy with Model

    Replace CategoricalLSTMPolicy with Model

    This PR replaces CategoricalLSTMPolicy with the one implemented using garage.tf.models.

    Benchmark script is in benchmark_categorical_lstm_policy branch, modified from benchmark_ppo_catogorical.py.

    There seems to be some randomness in the benchmark. In some environments, the trials of the old and the new policies do not match very well. When this happens, usually the new one performs worse. See the figures below.

    Raw data for tensorboard. 2019-09-21-16-09-09-112853.zip

    Benchmark figures are attached for Assault-ramDeterministic-v4 (seeds 42, 70, 96), Breakout-ramDeterministic-v4 (seeds 40, 47, 86), ChopperCommand-ramDeterministic-v4 (seeds 3, 71, 99), LunarLander-v2 (seeds 23, 24, 35), and Tutankham-ramDeterministic-v4 (seeds 41, 65, 79).

    opened by naeioi 18
  • Multinode experiments

    Multinode experiments

    Is it possible to train the garage DQN algorithm on multiple nodes (CPUs only)? If yes, can you share a demo script or program? My samplers run on 2 nodes with 56 PPN using the Ray sampler, and I'm wondering how to parallelize training as well.

    opened by aruhela 0
  • Reproducibility issue

    Reproducibility issue

    Hi all, I've used deterministic.set_seed but still get different results. It seems that the samples starting from itr2 become different (itr0 & itr1 are the same). Do you have any ideas about this issue?

    opened by YY-GX 1
  • implementation detail question about MAMLPPO

    implementation detail question about MAMLPPO

    Hi, thank you for providing this quality code. I have an implementation detail question.

    1. Why don't you just call optimizer.step() after meta_objective.backward()?

    https://github.com/rlworkgroup/garage/blob/f056fb8f6226c83d340c869e0d5312d61acf07f0/src/garage/torch/algos/maml.py#L124-L127

    opened by seolhokim 0
  • Problems with workers when using custom Gym/Reacher Environment

    Problems with workers when using custom Gym/Reacher Environment

    Hi

    I implemented a customized version of the gym reacher environment, using an additional parameter (cameras) for rendering from multiple camera perspectives, and wrapped the environment with GymEnv. This generally works fine, but I run into problems when using multiple workers (with LocalSampler). As far as I understand, multiple environments are created to split the work across workers. Unfortunately, my environments are then no longer built correctly; deep-copying seems unable to handle the custom parameter. Does anyone know any help?

    File "/home/name/Desktop/launch_script_reacher.py", line 86, in <module>
        run_my_reacher(mode="rgb_array",iterations=10)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/site-packages/garage/experiment/experiment.py", line 369, in __call__
        result = self.function(ctxt, **kwargs)
      File "/home/name/Desktop/my_folder/launch_reacher.py", line 53, in run_my_reacher
        worker_class=worker_with_disc)
      File "/home/name/Desktop/my_folder/helper/own_local_sampler.py", line 154, in __init__
        worker_class=worker_class,worker_args=worker_args
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/site-packages/garage/sampler/local_sampler.py", line 79, in __init__
        envs, preprocess=copy.deepcopy)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/site-packages/garage/sampler/worker_factory.py", line 95, in prepare_worker_messages
        return [preprocess(objs) for _ in range(self.n_workers)]
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/site-packages/garage/sampler/worker_factory.py", line 95, in <listcomp>
        return [preprocess(objs) for _ in range(self.n_workers)]
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/copy.py", line 169, in deepcopy
        rv = reductor(4)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/site-packages/garage/envs/gym_env.py", line 353, in __getstate__
        state = copy.deepcopy(self.__dict__)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/copy.py", line 150, in deepcopy
        y = copier(x, memo)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/copy.py", line 241, in _deepcopy_dict
        y[deepcopy(key, memo)] = deepcopy(value, memo)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/copy.py", line 180, in deepcopy
        y = _reconstruct(x, memo, *rv)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/copy.py", line 283, in _reconstruct
        y.__setstate__(state)
      File "/home/name/anaconda3/envs/my_env/lib/python3.7/site-packages/gym/utils/ezpickle.py", line 26, in __setstate__
        out = type(self)(*d["_ezpickle_args"], **d["_ezpickle_kwargs"])
    TypeError: __init__() missing 1 required positional argument: 'cameras'
    
    opened by Suli1223 0
  • Suggestion to add how to implement pre-trained policies.

    Suggestion to add how to implement pre-trained policies.

    Hi! I'm currently working to deploy a policy trained using SAC in MuJoCo onto a real robot. I'm trying to load the two Q-functions, but I obtain weird results in q_loss and the returns. Any suggestions on how to load the policy correctly? Thanks!

    opened by bara-bba 1
Releases(v2020.10.0rc5)
  • v2020.10.0rc5(Oct 2, 2020)

    This is a pre-release of v2020.10. It contains changes to garage since v2020.06.0.

    This pre-release makes cutting-edge features available via PyPI, but comes with no promises of support or bug fixes. If you encounter problems with this release, you are encouraged to either install from master or revert to the v2020.06 release.

    For information on what to expect in garage v2020.10, see the release notes for v2020.06.0

    Source code(tar.gz)
    Source code(zip)
  • v2020.06.3(Sep 14, 2020)

    This is a maintenance release for 2020.06.

    • Fixed
      • PyTorch 1.7 support (#1934)
      • LocalRunner ignores worker_cls attribute of algorithms (#1984)
      • mujoco_py versions greater than v2.0.2.8 are incompatible with some GCC versions in conda (#2000)
      • MTSAC not learning because it corrupts the termination signal by wrapping with GarageEnv twice (#2029)
      • MTSAC does not respect max_episode_length_eval hyperparameter (#2029)
      • MTSAC MetaWorld examples do not use the correct number of tasks (#2029)
      • MTSAC now supports a separate max_episode_length for evaluation via the max_episode_length_eval hyperparameter (#2029)
      • MTSAC MetaWorld MT50 example used an incorrect max_episode_length (#2029)
    Source code(tar.gz)
    Source code(zip)
  • v2020.09.0rc4(Sep 14, 2020)

    This is a pre-release of v2020.09. It contains changes to garage since v2020.06.0.

    This pre-release makes cutting-edge features available via PyPI, but comes with no promises of support or bug fixes. If you encounter problems with this release, you are encouraged to either install from master or revert to the v2020.06 release.

    For information on what to expect in garage v2020.09, see the release notes for v2020.06.0

    Source code(tar.gz)
    Source code(zip)
  • v2020.09.0rc3(Aug 20, 2020)

    This is a pre-release of v2020.09. It contains changes to garage since v2020.06.0.

    This pre-release makes cutting-edge features available via PyPI, but comes with no promises of support or bug fixes. If you encounter problems with this release, you are encouraged to either install from master or revert to the v2020.06 release.

    For information on what to expect in garage v2020.09, see the release notes for v2020.06.0

    Source code(tar.gz)
    Source code(zip)
  • v2020.06.2(Aug 17, 2020)

    This is a maintenance release for 2020.06.

    • Fixed
      • Better parameters for example her_ddpg_fetchreach (#1763)
      • Ensure determinism in TensorFlow by using tfp.SeedStream (#1821)
      • Broken rendering of MuJoCo environments to pixels in the NVIDIA Docker container (#1838)
      • Enable cudnn in the NVIDIA Docker container (#1840)
      • Bug in DiscreteQfDerivedPolicy in which parameters were not returned (#1847)
      • Populate TimeLimit.truncated at every step when using gym.Env (#1852)
      • Bug in which parameters where not copied when TensorFlow primitives are clone()ed (#1855)
      • Typo in the Makefile target run-nvidia (#1914)
    Source code(tar.gz)
    Source code(zip)
  • v2020.09.0rc2(Aug 17, 2020)

    This is a pre-release of v2020.09. It contains changes to garage since v2020.06.0.

    This pre-release makes cutting-edge features available via PyPI, but comes with no promises of support or bug fixes. If you encounter problems with this release, you are encouraged to either install from master or revert to the v2020.06 release.

    For information on what to expect in garage v2020.09, see the release notes for v2020.06.0

    Source code(tar.gz)
    Source code(zip)
  • v2019.10.3(Aug 17, 2020)

    This is a maintenance release for 2019.10.

    Fixed

    • Better parameters for example her_ddpg_fetchreach (#1764)
    • Bug in DiscreteQfDerivedPolicy in which parameters were not returned (#1847)
    • Bug which made it impossible to evaluate stochastic policies deterministically (#1715)
    Source code(tar.gz)
    Source code(zip)
  • v2020.06.1(Jul 13, 2020)

    This is a maintenance release for v2020.06

    Fixed

    • Pipenv fails to resolve a stable dependency set because of excessively-narrow dependencies in tensorflow-probability (#1721)
    • Bug which prevented rollout from running policies deterministically (#1714)
    Source code(tar.gz)
    Source code(zip)
  • v2020.09.0rc1(Jul 5, 2020)

    This is a pre-release of v2020.09. It contains changes to garage since v2020.06.0.

    This pre-release makes cutting-edge features available via PyPI, but comes with no promises of support or bug fixes. If you encounter problems with this release, you are encouraged to either install from master or revert to the v2020.06 release.

    For information on what to expect in garage v2020.09, see the release notes for v2020.06.0

    Source code(tar.gz)
    Source code(zip)
  • v2019.10.2(Jun 24, 2020)

    This is a maintenance release for 2019.10.

    Fixed

    • Use a GitHub Token in the CI to retrieve packages to avoid hitting GitHub API rate limit (#1250)
    • Avoid installing dev extra dependencies during the conda check (#1296)
    • Install dm_control from PyPI (#1406)
    • Pin tfp to 0.8.x to avoid breaking pipenv (#1480)
    • Force python 3.5 in CI (#1522)
    • Separate terminal and completion signal in vectorized sampler (#1581)
    • Disable certificate check for roboti.us (#1595)
    • Fix advantages shape in compute_advantage() in torch tree (#1209)
    • Fix plotting using tf.plotter (#1292)
    • Fix duplicate window rendering when using garage.Plotter (#1299)
    • Fix setting garage.model parameters (#1363)
    • Fix two example Jupyter notebooks (#1584)
    • Fix collecting samples in RaySampler (#1583)
    Source code(tar.gz)
    Source code(zip)
  • v2020.06.0(Jun 23, 2020)

    The Reinforcement Learning Working Group is proud to announce the 2020.06 release of garage.

    As always, we are actively seeking new contributors. If you use garage, please consider submitting a PR with your algorithm or improvements to the framework.

    Summary

    Please see the CHANGELOG for detailed information on the changes in this release.

    This release focused primarily on adding first-class support for meta-RL and multi-task RL. To achieve this, we rewrote the sampling API and subsystem completely, adding a Sampler API which is now multi-environment and multi-agent aware. We also added a library of baseline meta-RL and multi-task algorithms which reach state-of-the-art performance: MAML, PEARL, RL2, MTPPO, MTTRPO, MTSAC, Task Embeddings.

    Highlights in this release:

    • First-class support for meta-RL and multi-task RL, demonstrated using the MetaWorld benchmark
    • More PyTorch algorithms, including MAML, SAC, MTSAC, PEARL, PPO, and TRPO (97% test coverage)
    • More TensorFlow meta-RL algorithms, including RL2 and Task Embeddings (95% test coverage)
    • All-new Sampler API, with first-class support for multiple agents and environments
    • All-new experiment definition decorator @wrap_experiment, which replaces the old run_experiment function
    • Continued improvements to quality and test coverage. Garage now has 90% overall test coverage
    • Simplified and updated the Docker containers, adding better support for CUDA/nvidia-docker2 and removing the complex docker-compose based system

    Read below for more information on what's new in this release. See Looking forward for more information on what to expect in the next release.

    First-class support for meta-RL and MTRL

    We added first-class support for meta-RL and multi-task RL, including state-of-the-art performing versions of the following baseline algorithms: MAML, PEARL, RL2, MTPPO, MTTRPO, MTSAC, and Task Embeddings.

    We also added explicit support for meta-task sampling and evaluation.

    New Sampler API

    The new Sampler API allows you to define a custom worker or rollout function for your algorithm, to control the algorithm's sampling behavior. These Workers are agnostic of the sampling parallelization backend used. This makes it easy to customize sampling behavior without forcing you to write your own sampler.

    For example, you can define one Worker and use it to collect samples inside the local process, or alternatively use it to collect many samples in parallel using multiprocessing, without ever having to interact with multiprocessing code and synchronization. Both RL2 and PEARL define custom workers, which allow them to implement the special sampling procedure necessary for these meta-RL algorithms.

    The sampler is also aware of multiple policies and environments, allowing you to customize it for use with multi-task/meta-RL or multi-agent RL.

    Currently-available sampling backends are:

    • LocalSampler - collects samples serially within the main optimization process
    • MultiprocessingSampler - collects samples in parallel across multiple processors using the Python standard library's multiprocessing library
    • RaySampler - collect samples in parallel using a ray cluster (that cluster can just be your local machine, of course)

    The API for defining a new Sampler backend is small and well-defined. If you have a new bright idea for a parallel sampler backend, send us a PR!
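The Worker/backend split described above can be sketched in a few lines. This is an illustrative toy, not garage's actual Worker or Sampler API: the class and method names below (Worker, SerialSampler, obtain_samples) are stand-ins, and the scalar "environment" is a placeholder for a real env.

```python
from typing import Callable, List

class Worker:
    """A rollout worker, agnostic of the parallelization backend
    (illustrative sketch only, not garage's actual Worker class)."""

    def __init__(self, agent: Callable[[float], float],
                 env_step: Callable[[float], float]):
        self.agent = agent        # maps observation -> action
        self.env_step = env_step  # maps action -> next observation

    def rollout(self, horizon: int) -> List[float]:
        obs, trajectory = 0.0, []
        for _ in range(horizon):
            action = self.agent(obs)
            obs = self.env_step(action)
            trajectory.append(obs)
        return trajectory

class SerialSampler:
    """Collects episodes in the main process, in the spirit of
    LocalSampler; a parallel backend would run the same Worker in
    subprocesses or on a ray cluster instead."""

    def __init__(self, worker: Worker):
        self.worker = worker

    def obtain_samples(self, n_episodes: int, horizon: int) -> List[List[float]]:
        return [self.worker.rollout(horizon) for _ in range(n_episodes)]

sampler = SerialSampler(Worker(lambda obs: obs + 1.0, lambda a: a * 0.5))
episodes = sampler.obtain_samples(n_episodes=2, horizon=3)
assert len(episodes) == 2 and all(len(ep) == 3 for ep in episodes)
```

The point of the split is that the rollout logic lives entirely in the Worker, so swapping the serial loop for a multiprocessing or ray backend changes how workers are scheduled, not what they do.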

    New Experiment Definition API

    We added the @wrap_experiment decorator, which defines the new standard way of declaring an experiment and its hyperparameters in garage. In short, an experiment is a function, and a hyperparameters are the arguments to that function. You can wrap your experiment function with @wrap_experiment to set experiment meta-data such as snapshot schedules and log directories.

    Calling your experiment function runs the experiment.

    wrap_experiment has features such as saving the current git context, automatically naming experiments, and automatically saving the hyperparameters of any experiment function it decorates. Take a look at the examples/ directory for hands-on examples of how to use it.
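    Because hyperparameters are just function arguments, a decorator can capture them automatically at call time. A minimal sketch of that idea (an illustrative stand-in, not garage's implementation of wrap_experiment):

```python
import functools
import inspect


def wrap_experiment(func):
    """Record an experiment's hyperparameters (its call arguments)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Bind the actual call against the signature, filling in defaults,
        # so every hyperparameter is recorded -- even ones left implicit.
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        wrapper.hyperparameters = dict(bound.arguments)  # e.g. saved to disk
        return func(*args, **kwargs)
    return wrapper


@wrap_experiment
def my_experiment(seed=1, batch_size=4000):
    return seed * batch_size  # stands in for the actual training loop
```

    Calling `my_experiment(seed=2)` runs the experiment, and `my_experiment.hyperparameters` then holds `{'seed': 2, 'batch_size': 4000}`, including the defaulted argument.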

    Improvements to quality and test coverage

    Overall test coverage increased from 85% to 90% since v2019.10, and we expect this to keep climbing. We also now define standard benchmarks for all algorithms in the separate benchmarks directory.

    Why we skipped 2020.02

    Our focus on adding meta- and multi-task RL support required changing and generalizing many APIs in garage. Around January 2020, this support existed, and we were in the process of polishing it for the February 2020 release. Around this time, our development was impacted by the COVID-19 pandemic, which forced many members of the garage core maintainer team to socially isolate in their homes, slowing communication and the overall development of garage. Rather than rushing to release the software during stressful times, the team decided to skip the February 2020 release and put together a much more polished version for this release milestone.

    We intend to return to our regularly-scheduled release cadence for 2020.09.

    Who should use this release, and how

    Users who want to base a project on a semi-stable version of this software, and are not interested in bleeding-edge features should use the release branch and tags.

    Platform support

    This release has been tested extensively on Ubuntu 18.04 and 20.04. We have also used it successfully on Ubuntu 16.04 and macOS 10.13, 10.14, and 10.15.

    Maintenance Plan

    We plan on supporting this branch until at least February 2021. Our support will come mostly in the form of attempting to reproduce and fix critical user-reported bugs, conducting quality control on user-contributed PRs to the release branch, and releasing new versions when fixes are committed.

    We have no intention of performing proactive maintenance such as dependency upgrades, nor of adding new features, tests, platform support, or documentation. However, we welcome PRs to the maintenance branch (release-2020.06) from contributors wishing to see these enhancements to this version of the software.

    Hotfixes

    We will post backwards-compatible hotfixes for this release to the branch release-2020.06. New hotfixes will also trigger a new release tag which complies with semantic versioning, i.e. the first hotfix release would be tagged v2020.06.1, the second would be tagged v2020.06.2, etc.

    We will not add new features, nor remove existing features from the branch release-2020.06 unless it is absolutely necessary for the integrity of the software.

    Next release

    We hope to release 2-3 times per year, approximately aligned with the North American academic calendar. We hope to release next around late September 2020, e.g. v2020.09.

    Looking forward

    The next release of garage will focus primarily on complete documentation, stable and well-defined component APIs for fundamental RL abstractions (such as Policy, QFunction, ValueFunction, Sampler, ReplayBuffer, and Optimizer), more flexible packaging, and more algorithms and training environments.

    Complete documentation

    We are working feverishly to document garage and its APIs, to give the toolkit a full user manual, how-tos, tutorials, per-algorithm documentation and baseline curves, and a reference guide motivating the design and usage of all APIs.

    Stable and well-defined component APIs

    The toolkit has matured enough that most components implement either a fully-described formal API or an informal API shared by all components of that type, and grown large enough that we are confident our existing components cover most current RL use cases.

    Now we will turn to formalizing the major component APIs and ensuring that the components in garage all conform to them. This will allow us to simplify logic throughout the toolkit, and will make it easier to mix components defined outside garage with those defined inside garage.

    More flexible packaging

    We intend on removing hard dependencies on TensorFlow, PyTorch, and OpenAI Gym. Instead, garage will detect what software you have installed and activate features accordingly. This will make it much easier to mix-and-match garage features you'd like to take advantage of, without having to install a giant list of all possible garage dependencies into your project.

    More algorithms and training environments

    We plan on adding more multi-task and meta-RL methods, such as PCGrad and ProMP. We also plan to add better support for gameplay domains and associated DQN-family algorithms, and will start adding first-class support for imitation learning.

    For training environments, we are actively working on adding PyBullet support.

    What about TensorFlow 2.0 support?

    Given the uncertainty about the future of TensorFlow, and frequent reports of performance regressions when using TF2, core maintainers have paused work on moving the TensorFlow tree to use the new TF2 eager execution semantics. Note that garage can be installed using TensorFlow 2, but will still make use of the Graph APIs under tf.compat.v1. We are also focusing new algorithm development on the PyTorch tree, but will continue to perform proactive maintenance and usability improvements in the TensorFlow tree.

    We'll revisit this decision after the next release (v2020.09), when we hope the future of the TensorFlow APIs is clearer. We suggest that those who need eager execution APIs today focus on garage.torch instead.

    Users who are eager to add garage support for TF2 are welcome to become contributors and start sending us Pull Requests.

    Contributors to this release

    • Ryan Julian (@ryanjulian)
    • K.R. Zentner (@krzentner)
    • Anson Wong (@ahtsan)
    • Gitanshu Sardana (@gitanshu)
    • Zequn Yu (@zequnyu)
    • Keren Zhu (@naeioi)
    • Avnish Narayan (@avnishn)
    • Linda Wong (@lywong92)
    • Mishari Aliesa (@maliesa96)
    • Yonghyun Cho (@yonghyuc)
    • Utkarsh Patel (@utkarshjp7)
    • Chang Su (@CatherineSue)
    • Eric Yihan Chen (@AiRuiChen)
    • Iris Liu (@irisliucy)
    • Ruofu Wang (@yeukfu)
    • Hayden Shively (@haydenshively)
    • Gagan Khandate (@gagankhandate)
    • Lucas Barcelos de Oliveira (@lubaroli)
  • v2020.05rc1 (May 19, 2020)

  • v2020.04rc1 (Apr 29, 2020)

    This is the second release candidate for the forthcoming v2020.04 release. It contains several API changes and improvements over the v2019.10 series, including more PyTorch algorithms and support for meta- and multi-task RL.

    We encourage users to install release candidates if they'd like cutting-edge features without the day-to-day instability of installing from tip. Please see the release notes for v2019.10 for more info on what to expect in the v2020.04 release.

    Note: due to COVID-19, the 2020.02 release has been delayed to April, and will be numbered v2020.04 to reflect this new reality.

  • v2020.02.0rc1 (Dec 9, 2019)

    This is the first release candidate for the forthcoming v2020.02 release. It contains several API changes and improvements over the v2019.10 series.

    We encourage users to install release candidates if they'd like cutting-edge features without the day-to-day instability of installing from tip. Please see the release notes for v2019.10 for more info on what to expect in the v2020.02 release.

  • v2019.10.1 (Dec 9, 2019)

    This is a maintenance release for 2019.10.

    Added

    • Integration tests which cover all example scripts (#1078, #1090)
    • Deterministic mode support for PyTorch (#1068)
    • Install script support for macOS 10.15.1 (#1051)
    • PyTorch modules now support either functions or modules for specifying their non-linearities (#1038)

    Fixed

    • Errors in the documentation on implementing new algorithms (#1074)
    • Broken example for DDPG+HER in TensorFlow (#1070)
    • Error in the documentation for using garage with conda (#1066)
    • Broken pickling of environment wrappers (#1061)
    • garage.torch was not included in the PyPI distribution (#1037)
    • A few broken examples for garage.tf (#1032)
  • v2019.10.0 (Nov 5, 2019)

    The Reinforcement Learning Working Group is proud to announce the 2019.10 release of garage.

    As always, we are actively seeking new contributors. If you use garage, please consider submitting a PR with your algorithm or improvements to the framework.

    Summary

    Please see the CHANGELOG for detailed information on the changes in this release.

    This release contains an immense number of improvements and new features for garage.

    It includes:

    • PyTorch support, including DDPG and VPG (94% test coverage)
    • Flexible new TensorFlow Model API and complete re-write of the TensorFlow neural network library using it (93% test coverage)
    • Better APIs for defining, running, and resuming experiments
    • New logging API with dowel, which allows a single log() call to stream logs of virtually any object to the screen, disk, CSV files, TensorBoard, and more.
    • New algorithms including (D)DQN and TD3 in TensorFlow, and DDPG and VPG in PyTorch
    • Distribution via PyPI -- you can now pip install garage!

    Read below for more information on what's new in this release. See Looking forward for more information on what to expect in the next release.

    Why we skipped 2019.06

    After 2019.02 we made some large, fundamental changes to garage APIs. Around June these APIs were defined, but the library was in limbo, with some components using the new APIs and others using the old ones. Rather than release a half-baked version, we decided our time was better spent getting the toolkit in shape for the next release.

    We intend to return to our regularly-scheduled release cadence for 2020.02.

    PyTorch Support

    We added the garage.torch tree and primitives which allow you to define and train on-policy and off-policy algorithms in PyTorch.

    Though the tree is small, the algorithms in this tree achieve state-of-the-art performance, have 94% test coverage, and use idiomatic PyTorch constructs with garage APIs. Expect to see many more algorithms and primitives in PyTorch in future releases.

    garage.tf.Model API and TensorFlow primitives re-write

    The garage.tf.layers library quickly became a maintenance burden, and was hindering progress in TensorFlow.

    To escape from under this unmaintainable custom library, we embarked on a complete re-write of the TensorFlow primitives around a new API called garage.tf.Model. This new API allows you to use idiomatic TensorFlow APIs to define reusable components for RL algorithms such as Policies and Q-functions.

    Defining a new primitive in garage is easier than ever, and most components you want (e.g. MLPs, CNNs, RNNs) already exist as re-usable and composable Model classes.

    Runner API and improvements to experiment snapshotting and resuming

    We defined a new Runner API, which unifies how all algorithms, samplers, and environments interact to create an experiment. Using LocalRunner handles many of the important minutiae of running a successful experiment, including logging, snapshotting, and consistent definitions of batch size and other hyperparameters.

    LocalRunner also makes it very easy to resume an experiment from an arbitrary iteration on disk, either using the Python API or from the command line using the garage command (e.g. garage resume path/to/experiment).

    See the examples for how to run an algorithm using LocalRunner.
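    The training loop a runner manages can be sketched roughly as follows (hypothetical names and a toy snapshot format, not garage's actual LocalRunner):

```python
import pickle


class Runner:
    """Ties an algorithm to an environment and snapshots every epoch."""

    def __init__(self, snapshot_path='snapshot.pkl'):
        self.snapshot_path = snapshot_path

    def setup(self, algo, env):
        self.algo = algo
        self.env = env

    def train(self, n_epochs, batch_size):
        for epoch in range(n_epochs):
            samples = self.algo.obtain_samples(self.env, batch_size)
            self.algo.train_once(samples)
            # Persist plain state so the experiment can be resumed later.
            with open(self.snapshot_path, 'wb') as f:
                pickle.dump({'epoch': epoch, 'state': self.algo.state()}, f)

    def resume(self):
        """Load the most recent snapshot from disk."""
        with open(self.snapshot_path, 'rb') as f:
            return pickle.load(f)
```

    Because every epoch writes a snapshot, resuming from an arbitrary iteration is just a matter of loading the saved state and continuing the loop.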

    Log anything to anywhere with dowel

    We replaced the garage.misc.logger package with a new flexible logger, which is implemented in a new package called dowel.

    dowel has all of the features of the old logger, but a simpler, well-defined API, and supports logging any object to any number of outputs, provided a handler exists for that object and output. For instance, this allows us to log the TensorFlow graph to TensorBoard using a line like logger.log(tf.get_default_graph()), and a few lines below to log a message to the console like logger.log('Starting training...').

    Dowel knows how to log key-value pairs, TensorFlow graphs, strings, and even histograms. Defining new logger outputs and input handlers is easy. Currently dowel supports output to the console, text files, CSVs, and TensorBoard. Add your own today!
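    The dispatch-by-type idea can be sketched in a few lines (an illustrative stand-in, not dowel's actual API):

```python
class Logger:
    """Routes each logged object to every output that handles its type."""

    def __init__(self):
        self._outputs = []

    def add_output(self, output):
        self._outputs.append(output)

    def log(self, data):
        for output in self._outputs:
            if isinstance(data, output.types_accepted):
                output.record(data)


class ConsoleOutput:
    types_accepted = (str,)  # handles plain messages

    def __init__(self):
        self.lines = []

    def record(self, data):
        self.lines.append(data)  # a real output would print to the screen


class CsvOutput:
    types_accepted = (dict,)  # handles key-value pairs, e.g. metrics

    def __init__(self):
        self.rows = []

    def record(self, data):
        self.rows.append(data)  # a real output would write a CSV row
```

    A single logger.log() call then fans out to every registered output that declares it can handle the object's type, which is how one call can reach the console, CSV files, and TensorBoard at once.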

    pip install garage

    We delivered many improvements to make garage installable using only pip. You no longer need to run a setup script to install system dependencies, unless you'd like support for MuJoCo. We now automatically release new versions to pip.

    This also means using garage with the environment manager of your choice is easy. We test virtualenv, pipenv, and conda in our CI pipeline so that garage can always successfully install in your environment.

    Extensive maintainability and documentation improvements

    This release includes extensive maintainability and documentation improvements. Most of these are behind-the-scenes, but make an immense difference in the reliability and usability of the toolkit.

    Highlights:

    • Unit test coverage increased from ~30% to ~80%
    • Overall test coverage increased from ~50% to ~85%
    • Overall coverage for garage.tf and garage.torch (which is where algorithm-performance critical code lives) is ~94%
    • TensorFlow and PyTorch algorithms are benchmarked before every commit to master
    • Every primitive is pickleable/snapshottable and this is tested in the CI
    • Docstrings added to all major APIs, including type information
    • API documentation is automatically generated and posted to https://garage.readthedocs.io
    • Large amounts of old and/or unused code deleted, especially from garage.misc

    Who should use this release, and how

    Users who want to base a project on a semi-stable version of this software, and are not interested in bleeding-edge features should use the release branch and tags.

    Platform support

    This release has been tested extensively on Ubuntu 16.04 and 18.04. We have also used it successfully on macOS 10.13, 10.14, and 10.15.

    Maintenance Plan

    We plan on supporting this branch until at least June 2020. Our support will come mostly in the form of attempting to reproduce and fix critical user-reported bugs, conducting quality control on user-contributed PRs to the release branch, and releasing new versions when fixes are committed.

    We have no intention of performing proactive maintenance such as dependency upgrades, nor of adding new features, tests, platform support, or documentation. However, we welcome PRs to the maintenance branch (release-2019.10) from contributors wishing to see these enhancements to this version of the software.

    Hotfixes

    We will post backwards-compatible hotfixes for this release to the branch release-2019.10. New hotfixes will also trigger a new release tag which complies with semantic versioning, i.e. the first hotfix release would be tagged v2019.10.1, the second would be tagged v2019.10.2, etc.

    We will not add new features, nor remove existing features from the branch release-2019.10 unless it is absolutely necessary for the integrity of the software.

    Next release

    We hope to release 2-3 times per year, approximately aligned with the North American academic calendar. We hope to release next around early February 2020, e.g. v2020.02.

    Looking forward

    The next release of garage will focus primarily on two goals: meta- and multi-task RL algorithms (and associated toolkit support) and stable, well-defined component APIs for fundamental RL abstractions such as Policy, QFunction, ValueFunction, Sampler, ReplayBuffer, Optimizer, etc.

    Meta- and Multi-Task RL

    We are adding a full suite of meta-RL and multi-task RL algorithms to the toolkit, and associated toolkit support where necessary. We would like garage to be the gold standard library for meta- and multi-task RL implementations.

    As always, all new meta- and multi-task RL algorithms will be thoroughly tested and verified to meet-or-exceed the best state-of-the-art implementation we can find.

    Stable and well-defined component APIs

    The toolkit has matured enough that most components implement either a fully-described formal API or an informal API shared by all components of that type, and grown large enough that we are confident our existing components cover most current RL use cases.

    Now we will turn to formalizing the major component APIs and ensuring that the components in garage all conform to these APIs. This will allow us to simplify lots of logic throughout the toolkit, and will make it easier to mix components defined outside garage with those defined inside garage.

    Idiomatic TensorFlow model and tensorflow_probability

    While the implementation of the primitives using garage.tf.Model is complete, their external API still uses the old style from rllab which defines a new feedforward graph for every call to a symbolic API. For instance, a call to GaussianMLPPolicy.log_likelihood_sym() will create a copy of the GaussianMLPPolicy graph which implements GaussianMLPPolicy.get_action() (the two graphs share parameters so optimization results are unaffected). This is not idiomatic TensorFlow, and can be a source of confusion for algorithm implementers.

    Now that we have a stable and well-tested back-end for the primitives, we will embark on simplifying their APIs to have only a single feedforward path. We will also transition to using tensorflow_probability for modeling stochastic primitives.

    Now that TensorFlow has started to define first-party APIs for composable models (specifically tf.Module and tf.keras.Model), we will look into integrating these with garage.tf.Model.

    What about TensorFlow 2.0 support?

    We intend to support TensorFlow 2.x and eager execution in the near future, but it may take a release or two to get there. We believe that the garage.tf.Model API already makes writing neural network code for RL nearly as painless as TensorFlow 2.0, so most users won't notice much of a difference.

    We suggest that those who need eager execution APIs today focus on garage.torch instead.

    For the coming release, we will focus on moving all of our algorithms and primitives to using idiomatic TensorFlow and TensorFlow Probability. Our in-progress transition to garage.tf.Model and idiomatic usage of TensorFlow will drastically reduce the amount of code which changes between TensorFlow 2.x and 1.x, so we will focus on that before embarking on TF2 support. This will also give TensorFlow 2.x APIs time to stabilize, and time for its performance to catch up to TensorFlow 1.x (there is currently a 10-20% performance hit for using eager execution).

    If all goes well, we may be able to begin TF2 support around the 2020.06 release. If you are interested in seeing this happen faster, please contact us on the issue tracker and we will get you started helping with the port!

    Contributors to this release

    • Ryan Julian (@ryanjulian)
    • Anson Wong (@ahtsan)
    • Nisanth Hegde (@nish21)
    • Keren Zhu (@naeioi)
    • Zequn Yu (@zequnyu)
    • Gitanshu Sardana (@gitanshu)
    • Utkarsh Patel (@utkarshjp7)
    • Avnish Narayan (@avnishn)
    • Linda Wong (@lywong92)
    • Yong Cho (@yonghyuc)
    • K.R. Zentner (@krzentner)
    • Peter Lillian (@pelillian)
    • Angel Ivan Gonzalez (@gonzaiva)
    • Kevin Cheng (@cheng-kevin)
    • Chang Su (@CatherineSue)
    • Jonathon Shen (@jonashen)
    • Zhanpeng He (@zhanpenghe)
    • Shadi Akiki (@shadiakiki1986)
    • Nate Pham (@nhanph)
    • Dhiaeddine Gharsallah (@dgharsallah)
    • @wyjw
  • v2019.02.2 (Nov 5, 2019)

    This is a maintenance release for 2019.02.

    This is the final maintenance release for this version, as described in our maintenance plan.

    Users should expect no further bug fixes for 2019.02, and should plan on moving their projects onto 2019.10 ASAP. Maintainers will accept PRs for the 2019.02 branch which fully conform to the contributor's guide, but will not proactively backport new fixes into the release branch.

    This release fixes several small bugs:

    • Improper implementation of entropy regularization in TensorFlow PPO/TRPO (#579)
    • Advantage normalization was broken for recurrent policies (#626)
    • Bug in examples/sim_policy.py (#691)
    • FiniteDifferenceHvp was not pickleable (#745)
  • v2019.02.1 (Nov 5, 2019)

    This is a maintenance release for v2019.02.

    This release fixes a bug (#622) in GaussianMLPRegressor which causes many on-policy algorithms to run slower with each iteration, eventually virtually-stopping the training process.

    Projects based on v2019.02 are encouraged to upgrade ASAP.

  • v2019.02.0 (Mar 2, 2019)

    The Reinforcement Learning Working Group is proud to announce the 2019.02 release of garage.

    We are actively seeking new contributors. If you use garage, please consider submitting a PR with your algorithm or improvements to the framework.

    Summary

    Please see the CHANGELOG for detailed information on the changes in this release.

    Splitting garage into packages

    Most changes in this release are focused on moving garage towards a modular future. We are moving the framework from a single monolithic repository to a family of independent Python packages, where each package serves a well-defined, single purpose.

    This will help garage have the widest impact by:

    • Allowing users to pick-and-choose which parts of the software fit well for their project, making using garage not an all-or-nothing decision
    • Making the framework more stable, because smaller codebases are easier to test and maintain
    • Making it easier to introduce new frameworks (e.g. PyTorch) and features, by forcing API separation between different parts of the software
    • Separating parts of the software at different maturity levels into different packages, making it easier for users to know which parts are stable and well-tested, and which parts are experimental and quickly-changing

    In service of that goal, in this release we moved four packages to independent repositories with their own packages on PyPI (e.g. you can pip install <package>).

    • akro: Spaces types for reinforcement learning (from garage.spaces)
    • viskit: Hyperparameter-tuning dashboard for reinforcement learning experiments (from garage.viskit)
    • metaworlds: Environments for benchmarking meta-learning and multi-task learning (from garage.envs.mujoco and garage.envs.box2d)
    • gym-sawyer: Simulations and ROS bindings for the Sawyer robot, based on the openai/gym interface (from garage.envs.mujoco.sawyer and garage.envs.ros)

    Deleting redundant or unused code

    We've also started aggressively deleting unused code, or code where a better implementation already exists in the community. The largest example of this is MuJoCo and Box2D environments, many of which we removed because they have well-tested equivalents in openai/gym. Expect to find many other smaller examples in this and future releases.

    Deleting Theano

    We completed feature-parity between the Theano and TensorFlow trees, and deleted the Theano tree because we have not found any future interest in maintaining it. We made sure to port over all algorithms available in Theano to TensorFlow before making this change.

    Preparing garage for PyTorch and other frameworks

    We have started a full rewrite of the experiment definition, experiment deployment, snapshotting, and logging functionality in garage. This will allow new algorithm libraries or research projects to easily use garage tooling (e.g. logging, snapshotting, environment wrappers), irrespective of what numerical framework they use.

    conda is now optional

    While we still use conda in the CI environment for garage, we've moved all Python dependency information into a canonical setup.py file. While we are not releasing garage on PyPI yet, this means you can use any Python environment manager you'd like (e.g. pipenv, virtualenv, etc.) for your garage projects. In the future, we will add CI checks to make sure that the environment installs successfully in the most popular Python environment managers.

    Primitives for pixel-based policies

    We added CNN and wrapper primitives useful for pixel-based algorithms. Our implementation of DQN is forthcoming, since we are still benchmarking it to make sure we can guarantee state-of-the-art performance.

    Updated Docker support

    We completely rewrote the garage Dockerfiles, added docker-compose examples for using them in your projects, and added a Makefile to help you easily execute your experiments using Docker (for both CPU and GPU machines). We use these Dockerfiles to run our own CI environment, so you can be sure that they are always up to date.

    Who should use this release, and how

    Users who want to base a project on a semi-stable version of this software, and are not interested in bleeding-edge features should use the release branch and tags.

    As always, we recommend existing rllab users migrate their code to a garage release ASAP.

    Platform support

    This release has been tested extensively on Ubuntu 16.04 and 18.04. We have also used it successfully on macOS 10.12, 10.13, and 10.14.

    Maintenance Plan

    We plan on supporting this branch until at least October 2019. Our support will come mostly in the form of attempting to reproduce and fix critical user-reported bugs, conducting quality control on user-contributed PRs to the release branch, and releasing new versions when fixes are committed.

    We have no intention of performing proactive maintenance such as dependency upgrades, nor of adding new features, tests, platform support, or documentation. However, we welcome PRs to the maintenance branch (release-2019.02) from contributors wishing to see these enhancements to this version of the software.

    Hotfixes

    We will post backwards-compatible hotfixes for this release to the branch release-2019.02. New hotfixes will also trigger a new release tag which complies with semantic versioning, i.e. the first hotfix release would be tagged v2019.02.1, the second would be tagged v2019.02.2, etc.

    We will not add new features, nor remove existing features from the branch release-2019.02 unless it is absolutely necessary for the integrity of the software.

    Next release

    We hope to release 2-3 times per year, approximately aligned with the North American academic calendar. We hope to release next around early June 2019, e.g. v2019.06.

    See Looking forward for more information on what to expect in the next release.

    Looking forward

    The next release of garage will focus primarily on two related goals: PyTorch support and completely-revamped component APIs. These are linked because gracefully supporting more than one framework requires well-defined interfaces for the sampler, logger, snapshotter, RL agent, and other components.

    For TensorFlow algorithm development, we are focusing on adding a full suite of pixel-oriented RL algorithms to the TensorFlow tree, and on adding meta-RL algorithms and associated new interfaces. We will also finish removing the custom layers library from the TensorFlow tree, replacing it with code based on vanilla TensorFlow and a new abstraction called Model (inspired by the torch.nn.Module interface). We will also finish removing the custom garage.tf.distributions library and replacing it with fully-differentiable components from tensorflow-probability.

    For PyTorch algorithms development, we hope to add garage support to a fork of rlkit, to prove the usefulness of our tooling for different algorithm libraries.

    You can expect to see several more packages split from garage (e.g. the TensorFlow algorithm suite and experiment runner/sampler/logger), along with many API changes which make it easier to use those components independently from the garage codebase.

    Contributors to this release

    • Ryan Julian (@ryanjulian)
    • Chang Su (@CatherineSue)
    • Angel Ivan-Gonzalez (@gonzaiva)
    • Anson Wong (@ahtsan)
    • Keren Zhu (@naeioi)
    • K.R. Zentner (@krzentner)
    • Zhanpeng He (@zhanpenghe)
    • Jonathon Shen (@jonashen)
    • Gautam Salhotra (@gautams3)
  • v2018.10.1 (Mar 1, 2019)

    This is a maintenance release for v2018.10. It contains several bug fixes on top of the v2018.10.0 release, but no new features and API changes.

    We encourage projects based on v2018.10.0 to rebase onto v2018.10.1 without fear, so that they can enjoy better stability.

  • v2018.10.0 (Oct 31, 2018)

    The Reinforcement Learning Working Group is proud to announce the 2018.10 release of garage.

    We are actively seeking new contributors. If you use garage, please consider submitting a PR with your algorithm or improvements to the framework.

    Summary

    This release's life began as a maintenance fork of rllab. The original authors of rllab, current maintainers, and heavy users conferred about the future of the project. We reached a consensus to continue development of rllab under the new name "garage," and to organize future development within a GitHub organization which is detached from any particular institution. We named this organization the Reinforcement Learning Working Group.

    Most changes in this release concern stability, dependency updates, platform support, testing, and maintainability. We added many pieces of automation which are invisible to everyday users, but greatly assist in speeding garage development and keeping the framework stable. We have made many attempts to remove code which we did not think we could support in the future, though some unstable parts (e.g. viskit, EC2 support in garage.misc.instrument) remain and should be treated with caution. We welcome PRs for features which need updates and improvements.

    We finished building out the TensorFlow tree, added a few algorithms in TensorFlow (e.g. PPO, TRPO, DDPG with HER), and promoted it out of sandbox into the main tree as garage.tf.

    Likewise, we moved all Theano-specific modules into their own subtree (garage.theano) to separate the framework-agnostic and framework-specific parts of garage clearly.

    New features include TensorBoard support in the logger, support for dm_control environments, and a general implementation of dynamics randomization for MuJoCo-based environments.

    Users migrating their projects from rllab should consult the migration instructions below.

    Please see the CHANGELOG for detailed information on the changes in this release.

    Who should use this release, and how

    Users who want to base a project on a semi-stable version of this software, and are not interested in bleeding-edge features (e.g. PyTorch support) should use the release branch and tags. We also recommend existing rllab users migrate their code to this release ASAP.

    Platform support

    This release has been tested extensively on Ubuntu 16.04. We have also used it successfully on Ubuntu 18.04 and on macOS 10.12, 10.13, and 10.14.

    Maintenance Plan

    We plan on supporting this branch until at least June 2019. Our support will come mostly in the form of attempting to reproduce and fix critical user-reported bugs, conducting quality control on user-contributed PRs to the release branch, and releasing new versions when fixes are committed.

We have no intention of performing proactive maintenance such as dependency upgrades, nor of adding new features, tests, platform support, or documentation. However, we welcome PRs to the maintenance branch (release-2018.10) from contributors wishing to see these enhancements to this version of the software.

    Hotfixes

    We will post backwards-compatible hotfixes for this release to the branch release-2018.10. New hotfixes will also trigger a new release tag which complies with semantic versioning, i.e. the first hotfix release would be tagged v2018.10.1, the second would be tagged v2018.10.2, etc.

    We will not add new features, nor remove existing features from the branch release-2018.10 unless it is absolutely necessary for the integrity of the software.

    Next release

    v2018.10 marks the first in what will hopefully be a long line of regular releases. We hope to release 2-3 times per year, approximately aligned with the North American academic calendar. We hope to release next around early February 2019, e.g. v2019.02.

    See Looking forward for more information on what to expect in the next release.

    Migrating from rllab

    garage is based on a predecessor project called rllab. Migrating from rllab to garage should be mostly painless, but not completely automatic. Some classes and functions from rllab have been renamed, moved, or had their signatures changed. Very few have been removed. Follow the process below to migrate.

1. Install the garage conda environment. Execute the installation script for Linux or macOS. This will create a separate conda environment named "garage", so there won't be any conflicts with a previous installation of an "rllab" environment. However, be aware that both scripts try to install miniconda, so there could be conflicts if you already have a different conda installation. If you're not using conda for any other purpose, the best option is to remove it as indicated here. For a more granular installation of garage, read the installation scripts and only execute the commands required for your system.

    2. Rebase or retarget your repository on garage This step will be very specific to your project. Essentially, get garage into your PYTHONPATH, e.g. by moving your rllab sandbox into garage/sandbox, or by editing your environment configuration.
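As a sketch, one way to get garage onto your PYTHONPATH (the checkout location is illustrative; adapt it to your layout):

```shell
# Assuming garage is checked out at ~/garage (adjust the path to your
# layout), prepend it to PYTHONPATH so your launcher files can import it:
export PYTHONPATH="$HOME/garage:$PYTHONPATH"

# Add the line above to your shell profile (e.g. ~/.bashrc) to make it
# persistent across sessions.
```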

3. Replace rllab package imports with garage package imports, e.g. from rllab.core import Serializable becomes from garage.core import Serializable. Note that some import paths have changed. Please check the CHANGELOG for hints on where to look for changes which affect your project.
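For large projects, a mechanical first pass can handle most of the renames. The sketch below assumes GNU sed and a hypothetical project directory; review the resulting diff by hand, since some modules moved into new subtrees (e.g. garage.tf) rather than being renamed one-for-one:

```shell
# Create a sample file with an rllab-era import (my_project/ is a
# hypothetical directory standing in for your codebase):
mkdir -p my_project
echo 'from rllab.core import Serializable' > my_project/model.py

# Rewrite the top-level package name in every file that mentions rllab.
grep -rl 'rllab' my_project | xargs sed -i 's/\brllab\b/garage/g'

cat my_project/model.py   # from garage.core import Serializable
```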

4. Run your launcher files and tests, and make sure everything is working as expected. Fix errors as you find them.

    Looking forward

    The next release of garage will focus primarily on two goals: PyTorch support and splitting garage into federated packages.

Our goal is to make garage the software foundation for reproducible reinforcement learning research. That requires good maintenance, stability, and widespread adoption. We believe breaking garage up is necessary to ensure the future maintainability of the project, and to speed adoption.

    The next release will likely bring many breaking changes to garage, along with a new federated project structure which splits what is currently called garage into a family of several Python packages with independent repositories, development infrastructure, dependency management, and documentation. The federated packages will be pip dependencies which may be downloaded and used in any project.

Today, using garage is an all-or-nothing choice for a prospective user. They may either buy into the entire ecosystem — algorithms, experiment runner, conda environment, custom environments, plotter, etc. — or use none of it at all.

Our goal for the next release is to take the first steps towards unbundling garage into a family of easy-to-adopt Python packages with well-designed interfaces which allow them to easily work together, or be used separately. Stay tuned to this GitHub repository for details and proposals, and to give your own input.

    Here's an example of how a federated garage might be split into packages:

    Experiment runner

    • Experiment runner framework and algorithm interface (parallel samplers, logging, live plotting, deployment support for local, EC2, GCP targets, etc.). Contains abstractions for defining experiments, running them, monitoring them, collecting results, and visualizing results.

      Likely based on lagom and/or ray

    Algorithm libraries

    • Library of numpy-based RL algorithms, math utilities for RL, and useful algorithm base classes

    • Library of TensorFlow-based algorithms

    • Library of PyTorch-based RL algorithms

      Note: we plan on removing Theano support by the next release.

    Environments

    • gym.Env wrappers and dependency management for popular environments (e.g. dm_control, ALE, pybullet)
    • Custom single-task environments
    • Custom multi-task/meta-learning environments and supporting wrappers
    • Custom robotics environments, and bindings to ROS for real robot execution

    Utilities

    • A small library of Python types for RL (e.g. garage.spaces)
    • Experiment results visualization toolkit (e.g. viskit)

    Distribution Repository (garage) garage becomes an application repository which pulls all of the above together into a single environment, and demonstrates how to use them. New projects can use garage as a template or as an upstream.

    Contributors to this release

    • Ryan Julian (@ryanjulian)
    • Jonathon Shen (@jonashen)
    • Angel Ivan-Gonzalez (@gonzaiva)
    • Chang Su (@CatherineSue)
    • Hejia Zhang (@hjzh4)
    • Zhanpeng He (@zhanpenghe)
    • Junchao Chen (@cjcchen)
    • Keren Zhu (@naeioi)
    • Peter Lillian (@pelillian)
    • Gautam Salhotra (@gautams3)
    • Anson Wong (@ahtsan)