Multi-objective gym environments for reinforcement learning.

Last update: Jan 03, 2023

Overview

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard gym's API, but return vectorized rewards as numpy arrays.

For details on multi-objective MPDS (MOMDP's) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))

Environments

Env	Obs/Action spaces	Objectives	Description
`deep-sea-treasure-v0`	Discrete / Discrete	`[treasure, time_penalty]`	Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from Yang et al. 2019.
`resource-gathering-v0`	Discrete / Discrete	`[enemy, gold, gem]`	Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barret & Narayanan 2008.
`four-room-v0`	Discrete / Discrete	`[item1, item2, item3]`	Agent must collect three different types of items in the map and reach the goal.
`mo-mountaincar-v0`	Continuous / Discrete	`[time_penalty, reverse_penalty, forward_penalty]`	Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011.
`mo-reacher-v0`	Continuous / Discrete	`[target_1, target_2, target_3, target_4]`	Reacher robot from PyBullet, but there are 4 different target positions.
`minecart-v0`	Continuous or Image / Discrete	`[ore1, ore2, fuel]`	Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019.
`mo-supermario-v0`	Image / Discrete	`[x_pos, time, death, coin, enemy]`	Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly as in Yang et al. 2019.

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

The minecart-v0 env is a refactor of https://github.com/axelabels/DynMORL.
The deep-sea-treasure-v0 and mo-supermario-v0 are based on https://github.com/RunzheYang/MORL.
The four-room-v0 is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.

Comments

Adds the breakable bottles environment

Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

I wasn't really planning for creating a pull request, so the commit history is a bit messy...

opened by rk1a 4
A few bug fixes
DST:

The bounds of the rewards were hardcoded for the convex map.

The way to fix the seed is deprecated. From what I saw in the official gym envs, the seed is now fixed just using the reset method. (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198)

setup.py:

Gym 0.25.0 introduces breaking changes. So I fixed the version to 0.24.1.
opened by ffelten 2
Consider using info field for reward vector

Hello,

Thanks for this repository, it will be very useful to the MORL community :-).

I was just wondering if you think it would be a good idea to enforce gym compatibility by specifying rewards as scalar and giving the vectorial rewards elsewhere. The idea would be to use a field in the info dictionary as they do in PGMORL. This would allow to use existing RL algorithms and logging libraries out of box (e.g. stable-baselines, tensorboard logs, ...).

For example: In a DST env, if you return the treasure reward only in the reward field, you can use the DQN implementation from baselines and have insights on the average reward, as well as the episode length in the tensorboard logs. Of course, you can extract the full vectorial reward from the info dictionary in order to learn with MORL :-).

With kind regards,

Florian

opened by ffelten 2
Add MO reward wrappers

I added two wrappers commonly used: normalize and clip.

The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

opened by ffelten 1

Fix notebook

There are still issues with the video recorder :(

/usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
  logger.warn(

opened by ffelten 0

Add fishwood env

Code was provided by Denis Steckelmacher, I did a bit of refactoring and migrated it to 0.26.

I didn't bother making the render with the images, but I did upload them in case somebody gets motivated, the env is super simple.

opened by ffelten 0
Add wrapper to help logging episode returns

The implementation is mostly a copy paste of the original gym. I had to copy paste instead of override and call to super because the way the return is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the return will always be a vector of zeros (because resetted)

opened by ffelten 0

Releases(0.2.1)

0.2.1(Dec 9, 2022)
5 new environments: fishwood-v0 (ESR), mo-MountainCarContinuous-v0, water-reservoir-v0, mo-highway-v0 and mo-highway-fast-v0;

Revamped README file;

Linting and automatic imports optimization;

Updated bib file and citation;

Few bugfixes.

Source code(tar.gz)
Source code(zip)
0.2.0(Sep 25, 2022)

Support for new Gym>=0.26 API
Source code(tar.gz)
Source code(zip)
0.1.2(Sep 25, 2022)

Source code(tar.gz)
Source code(zip)
0.1.1(Aug 24, 2022)

Source code(tar.gz)
Source code(zip)

Owner

Lucas Alegre

PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.

GitHub Repository

Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering

Consensus Learning from Heterogeneous Objectives for One-Class Collaborative Filtering This repository provides the source code of "Consensus Learning

6 Apr 29, 2022

public repo for ESTER dataset and modeling (EMNLP'21)

Project / Paper Introduction This is the project repo for our EMNLP'21 paper: https://arxiv.org/abs/2104.08350 Here, we provide brief descriptions of

19 Oct 27, 2022

Differentiable Neural Computers, Sparse Access Memory and Sparse Differentiable Neural Computers, for Pytorch

Differentiable Neural Computers and family, for Pytorch Includes: Differentiable Neural Computers (DNC) Sparse Access Memory (SAM) Sparse Differentiab

302 Dec 14, 2022

Explore the Expression: Facial Expression Generation using Auxiliary Classifier Generative Adversarial Network

Explore the Expression: Facial Expression Generation using Auxiliary Classifier Generative Adversarial Network This is the official implementation of

2 Jul 09, 2022

Multi-resolution SeqMatch based long-term Place Recognition

MRS-SLAM for long-term place recognition In this work, we imply an multi-resolution sambling based visual place recognition method. This work is based

6 Dec 06, 2022

This project deploys a yolo fastest model in the form of tflite on raspberry 3b+. The model is from another repository of mine called -Trash-Classification-Car

Deploy-yolo-fastest-tflite-on-raspberry 觉得有用的话可以顺手点个star嗷这个项目将垃圾分类小车中的tflite模型移植到了树莓派3b+上面。该项目主要是为了记录在树莓派部署yolo fastest tflite的流程 (之后有时间会尝试用C++部署来提升

7 Aug 16, 2022

MIMO-UNet - Official Pytorch Implementation

MIMO-UNet - Official Pytorch Implementation This repository provides the official PyTorch implementation of the following paper: Rethinking Coarse-to-

248 Jan 02, 2023

Python module providing a framework to trace individual edges in an image using Gaussian process regression.

Edge Tracing using Gaussian Process Regression Repository storing python module which implements a framework to trace individual edges in an image usi

7 Dec 27, 2022

Implementing yolov4 target detection and tracking based on nao robot

6 Apr 19, 2022

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

SA-Net: Shuffle Attention for Deep Convolutional Neural Networks (paper) By Qing-Long Zhang and Yu-Bin Yang [State Key Laboratory for Novel Software T

199 Jan 08, 2023

The official implementation of EIGNN: Efficient Infinite-Depth Graph Neural Networks (NeurIPS 2021)

EIGNN: Efficient Infinite-Depth Graph Neural Networks The official implementation of EIGNN: Efficient Infinite-Depth Graph Neural Networks (NeurIPS 20

14 Nov 22, 2022

A pure PyTorch batched computation implementation of "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition"

14 Dec 02, 2022

Fuzzing the Kernel Using Unicornafl and AFL++

Unicorefuzz Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19. Is it any good? ye

283 Dec 26, 2022

basic tutorial on pytorch

Quick Tutorial on PyTorch PyTorch Basics Linear Regression Logistic Regression Artificial Neural Networks Convolutional Neural Networks Recurrent Neur

7 Sep 15, 2022

Official Implementation of SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Official Implementation of SimIPU SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations Since

37 Dec 01, 2022

This repository contains the database and code used in the paper Embedding Arithmetic for Text-driven Image Transformation

This repository contains the database and code used in the paper Embedding Arithmetic for Text-driven Image Transformation (Guillaume Couairon, Holger

31 Oct 17, 2022

Multi-objective gym environments for reinforcement learning.

Related tags

Overview

MO-Gym: Multi-Objective Reinforcement Learning Environments

Install

Usage

Environments

Citing

Acknowledgments

Comments

Adds the breakable bottles environment

A few bug fixes

Consider using info field for reward vector

Add MO reward wrappers

Fix notebook

Add fishwood env

Add wrapper to help logging episode returns