Multi-objective gym environments for reinforcement learning.

Overview

tests Project Status: Active – The project has reached a stable, usable state and is being actively developed. License

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard gym's API, but return vectorized rewards as numpy arrays.

For details on multi-objective MPDS (MOMDP's) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))

Environments

Env Obs/Action spaces Objectives Description
deep-sea-treasure-v0
Discrete / Discrete [treasure, time_penalty] Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasures values taken from Yang et al. 2019.
resource-gathering-v0
Discrete / Discrete [enemy, gold, gem] Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barret & Narayanan 2008.
four-room-v0
Discrete / Discrete [item1, item2, item3] Agent must collect three different types of items in the map and reach the goal.
mo-mountaincar-v0
Continuous / Discrete [time_penalty, reverse_penalty, forward_penalty] Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011.
mo-reacher-v0
Continuous / Discrete [target_1, target_2, target_3, target_4] Reacher robot from PyBullet, but there are 4 different target positions.
minecart-v0
Continuous or Image / Discrete [ore1, ore2, fuel] Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019.
mo-supermario-v0
Image / Discrete [x_pos, time, death, coin, enemy] Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly as in Yang et al. 2019.

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

Comments
  • Adds the breakable bottles environment

    Adds the breakable bottles environment

    Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

    I wasn't really planning for creating a pull request, so the commit history is a bit messy...

    opened by rk1a 4
  • A few bug fixes

    A few bug fixes

    DST:

    • The bounds of the rewards were hardcoded for the convex map.
    • The way to fix the seed is deprecated. From what I saw in the official gym envs, the seed is now fixed just using the reset method. (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198)

    setup.py:

    • Gym 0.25.0 introduces breaking changes. So I fixed the version to 0.24.1.
    opened by ffelten 2
  • Consider using info field for reward vector

    Consider using info field for reward vector

    Hello,

    Thanks for this repository, it will be very useful to the MORL community :-).

    I was just wondering if you think it would be a good idea to enforce gym compatibility by specifying rewards as scalar and giving the vectorial rewards elsewhere. The idea would be to use a field in the info dictionary as they do in PGMORL. This would allow to use existing RL algorithms and logging libraries out of box (e.g. stable-baselines, tensorboard logs, ...).

    For example: In a DST env, if you return the treasure reward only in the reward field, you can use the DQN implementation from baselines and have insights on the average reward, as well as the episode length in the tensorboard logs. Of course, you can extract the full vectorial reward from the info dictionary in order to learn with MORL :-).

    With kind regards,

    Florian

    opened by ffelten 2
  • Add MO reward wrappers

    Add MO reward wrappers

    I added two wrappers commonly used: normalize and clip.

    The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

    opened by ffelten 1
  • Fix notebook

    Fix notebook

    There are still issues with the video recorder :(

    /usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
      logger.warn(
    
    opened by ffelten 0
  • Add fishwood env

    Add fishwood env

    Code was provided by Denis Steckelmacher, I did a bit of refactoring and migrated it to 0.26.

    I didn't bother making the render with the images, but I did upload them in case somebody gets motivated, the env is super simple.

    opened by ffelten 0
  • Add wrapper to help logging episode returns

    Add wrapper to help logging episode returns

    The implementation is mostly a copy paste of the original gym. I had to copy paste instead of override and call to super because the way the return is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the return will always be a vector of zeros (because resetted)

    opened by ffelten 0
Releases(0.2.1)
Owner
Lucas Alegre
PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.
Lucas Alegre
Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training. Pyr

1.3k Jan 04, 2023
In this project, two programs can help you take full agvantage of time on the model training with a remote server

In this project, two programs can help you take full agvantage of time on the model training with a remote server, which can push notification to your phone about the information during model trainin

GrayLee 8 Dec 27, 2022
RetinaFace: Deep Face Detection Library in TensorFlow for Python

RetinaFace is a deep learning based cutting-edge facial detector for Python coming with facial landmarks.

Sefik Ilkin Serengil 512 Dec 29, 2022
ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS.

ONNX Runtime Web demo is an interactive demo portal showing real use cases running ONNX Runtime Web in VueJS. It currently supports four examples for you to quickly experience the power of ONNX Runti

Microsoft 58 Dec 18, 2022
Complete* list of autonomous driving related datasets

AD Datasets Complete* and curated list of autonomous driving related datasets Contributing Contributions are very welcome! To add or update a dataset:

Daniel Bogdoll 13 Dec 19, 2022
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Vision Transformer Pytorch reimplementation of Google's repository for the ViT model that was released with the paper An Image is Worth 16x16 Words: T

Eunkwang Jeon 1.4k Dec 28, 2022
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
Official code for Next Check-ins Prediction via History and Friendship on Location-Based Social Networks (MDM 2018)

MUC Next Check-ins Prediction via History and Friendship on Location-Based Social Networks (MDM 2018) Performance Details for Accuracy: | Dataset

Yijun Su 3 Oct 09, 2022
Codes accompanying the paper "Learning Nearly Decomposable Value Functions with Communication Minimization" (ICLR 2020)

NDQ: Learning Nearly Decomposable Value Functions with Communication Minimization Note This codebase accompanies paper Learning Nearly Decomposable Va

Tonghan Wang 69 Nov 26, 2022
code for generating data set ES-ImageNet with corresponding training code

es-imagenet-master code for generating data set ES-ImageNet with corresponding training code dataset generator some codes of ODG algorithm The variabl

Ordinarabbit 18 Dec 25, 2022
Multi-Task Learning as a Bargaining Game

Nash-MTL Official implementation of "Multi-Task Learning as a Bargaining Game". Setup environment conda create -n nashmtl python=3.9.7 conda activate

Aviv Navon 87 Dec 26, 2022
PyTorch wrappers for using your model in audacity!

audacitorch This package contains utilities for prepping PyTorch audio models for use in Audacity. More specifically, it provides abstract classes for

Hugo Flores García 130 Dec 14, 2022
Improving Contrastive Learning by Visualizing Feature Transformation, ICCV 2021 Oral

Improving Contrastive Learning by Visualizing Feature Transformation This project hosts the codes, models and visualization tools for the paper: Impro

Bingchen Zhao 83 Dec 15, 2022
Face recognition with trained classifiers for detecting objects using OpenCV

Face_Detector Face recognition with trained classifiers for detecting objects using OpenCV Libraries required to be installed using pip Command: cv2 n

Chumui Tripura 0 Oct 31, 2021
3D Pose Estimation for Vehicles

3D Pose Estimation for Vehicles Introduction This work generates 4 key-points and 2 key-edges from vertices and edges of vehicles as ground truth. The

Jingyi Wang 1 Nov 01, 2021
FaceAnon - Anonymize people in images and videos using yolov5-crowdhuman

Face Anonymizer Blur faces from image and video files in /input/ folder. Require

22 Nov 03, 2022
Official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION.

IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSUMPTION This is the official repository of IMPROVING DEEP IMAGE MATTING VIA LOCAL SMOOTHNESS ASSU

电线杆 14 Dec 15, 2022
Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Noise Contrastive Estimation for pyTorch Overview This repository contains a re-implementation of the Noise Contrastive Estimation algorithm, implemen

Denis Emelin 42 Nov 24, 2022
Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

SEDE SEDE (Stack Exchange Data Explorer) is new dataset for Text-to-SQL tasks with more than 12,000 SQL queries and their natural language description

Rupert. 83 Nov 11, 2022
Emotion Recognition from Facial Images

Reconhecimento de Emoções a partir de imagens faciais Este projeto implementa um classificador simples que utiliza técncias de deep learning e transfe

Gabriel 2 Feb 09, 2022