An OpenAI Gym environment for Super Mario Bros

Overview

gym-super-mario-bros

BuildStatus PackageVersion PythonVersion Stable Format License

Mario

An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) using the nes-py emulator.

Installation

The preferred installation of gym-super-mario-bros is from pip:

pip install gym-super-mario-bros

Usage

Python

You must import gym_super_mario_bros before trying to make an environment. This is because gym environments are registered at runtime. By default, gym_super_mario_bros environments use the full NES action space of 256 discrete actions. To contstrain this, gym_super_mario_bros.actions provides three actions lists (RIGHT_ONLY, SIMPLE_MOVEMENT, and COMPLEX_MOVEMENT) for the nes_py.wrappers.JoypadSpace wrapper. See gym_super_mario_bros/actions.py for a breakdown of the legal actions in each of these three lists.

from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = JoypadSpace(env, SIMPLE_MOVEMENT)

done = True
for step in range(5000):
    if done:
        state = env.reset()
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

env.close()

NOTE: gym_super_mario_bros.make is just an alias to gym.make for convenience.

NOTE: remove calls to render in training code for a nontrivial speedup.

Command Line

gym_super_mario_bros features a command line interface for playing environments using either the keyboard, or uniform random movement.

gym_super_mario_bros -e <the environment ID to play> -m <`human` or `random`>

NOTE: by default, -e is set to SuperMarioBros-v0 and -m is set to human.

Environments

These environments allow 3 attempts (lives) to make it through the 32 stages in the game. The environments only send reward-able game-play frames to agents; No cut-scenes, loading screens, etc. are sent from the NES emulator to an agent nor can an agent perform actions during these instances. If a cut-scene is not able to be skipped by hacking the NES's RAM, the environment will lock the Python process until the emulator is ready for the next action.

Environment Game ROM Screenshot
SuperMarioBros-v0 SMB standard
SuperMarioBros-v1 SMB downsample
SuperMarioBros-v2 SMB pixel
SuperMarioBros-v3 SMB rectangle
SuperMarioBros2-v0 SMB2 standard
SuperMarioBros2-v1 SMB2 downsample

Individual Stages

These environments allow a single attempt (life) to make it through a single stage of the game.

Use the template

SuperMarioBros-<world>-<stage>-v<version>

where:

  • <world> is a number in {1, 2, 3, 4, 5, 6, 7, 8} indicating the world
  • <stage> is a number in {1, 2, 3, 4} indicating the stage within a world
  • <version> is a number in {0, 1, 2, 3} specifying the ROM mode to use
    • 0: standard ROM
    • 1: downsampled ROM
    • 2: pixel ROM
    • 3: rectangle ROM

For example, to play 4-2 on the downsampled ROM, you would use the environment id SuperMarioBros-4-2-v1.

Random Stage Selection

The random stage selection environment randomly selects a stage and allows a single attempt to clear it. Upon a death and subsequent call to reset, the environment randomly selects a new stage. This is only available for the standard Super Mario Bros. game, not Lost Levels (at the moment). To use these environments, append RandomStages to the SuperMarioBros id. For example, to use the standard ROM with random stage selection use SuperMarioBrosRandomStages-v0. To seed the random stage selection use the seed method of the env, i.e., env.seed(1), before any calls to reset.

Step

Info about the rewards and info returned by the step method.

Reward Function

The reward function assumes the objective of the game is to move as far right as possible (increase the agent's x value), as fast as possible, without dying. To model this game, three separate variables compose the reward:

  1. v: the difference in agent x values between states
    • in this case this is instantaneous velocity for the given step
    • v = x1 - x0
      • x0 is the x position before the step
      • x1 is the x position after the step
    • moving right ⇔ v > 0
    • moving left ⇔ v < 0
    • not moving ⇔ v = 0
  2. c: the difference in the game clock between frames
    • the penalty prevents the agent from standing still
    • c = c0 - c1
      • c0 is the clock reading before the step
      • c1 is the clock reading after the step
    • no clock tick ⇔ c = 0
    • clock tick ⇔ c < 0
  3. d: a death penalty that penalizes the agent for dying in a state
    • this penalty encourages the agent to avoid death
    • alive ⇔ d = 0
    • dead ⇔ d = -15

r = v + c + d

The reward is clipped into the range (-15, 15).

info dictionary

The info dictionary returned by the step method contains the following keys:

Key Type Description
coins int The number of collected coins
flag_get bool True if Mario reached a flag or ax
life int The number of lives left, i.e., {3, 2, 1}
score int The cumulative in-game score
stage int The current stage, i.e., {1, ..., 4}
status str Mario's status, i.e., {'small', 'tall', 'fireball'}
time int The time left on the clock
world int The current world, i.e., {1, ..., 8}
x_pos int Mario's x position in the stage (from the left)
y_pos int Mario's y position in the stage (from the bottom)

Citation

Please cite gym-super-mario-bros if you use it in your research.

@misc{gym-super-mario-bros,
  author = {Christian Kauten},
  howpublished = {GitHub},
  title = {{S}uper {M}ario {B}ros for {O}pen{AI} {G}ym},
  URL = {https://github.com/Kautenja/gym-super-mario-bros},
  year = {2018},
}
Owner
Andrew Stelmach
Andrew Stelmach
A code implementation of AC-GC: Activation Compression with Guaranteed Convergence, in NeurIPS 2021.

Code For AC-GC: Lossy Activation Compression with Guaranteed Convergence This code is intended to be used as a supplemental material for submission to

Dave Evans 2 Nov 01, 2022
Clustering with variational Bayes and population Monte Carlo

pypmc pypmc is a python package focusing on adaptive importance sampling. It can be used for integration and sampling from a user-defined target densi

45 Feb 06, 2022
DEMix Layers for Modular Language Modeling

DEMix This repository contains modeling utilities for "DEMix Layers: Disentangling Domains for Modular Language Modeling" (Gururangan et. al, 2021). T

Suchin 43 Nov 11, 2022
Several simple examples for popular neural network toolkits calling custom CUDA operators.

Neural Network CUDA Example Several simple examples for neural network toolkits (PyTorch, TensorFlow, etc.) calling custom CUDA operators. We provide

WeiYang 798 Jan 01, 2023
Barlow Twins and HSIC

Barlow Twins and HSIC Unofficial Pytorch implementation for Barlow Twins and HSIC_SSL on small datasets (CIFAR10, STL10, and Tiny ImageNet). Correspon

Yao-Hung Hubert Tsai 49 Nov 24, 2022
Simple Pixelbot for Diablo 2 Resurrected written in python and opencv.

Simple Pixelbot for Diablo 2 Resurrected written in python and opencv. Obviously only use it in offline mode as it is against the TOS of Blizzard to use it in online mode!

468 Jan 03, 2023
GAN-based 3D human pose estimation model for 3DV'17 paper

Tensorflow implementation for 3DV 2017 conference paper "Adversarially Parameterized Optimization for 3D Human Pose Estimation". @inproceedings{jack20

Dominic Jack 15 Feb 27, 2021
Example-custom-ml-block-keras - Custom Keras ML block example for Edge Impulse

Custom Keras ML block example for Edge Impulse This repository is an example on

Edge Impulse 8 Nov 02, 2022
Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals"

The Temporal Robustness of Stochastic Signals Code needed to reproduce the examples found in "The Temporal Robustness of Stochastic Signals" Case stud

0 Oct 28, 2021
A pytorch-based deep learning framework for multi-modal 2D/3D medical image segmentation

A 3D multi-modal medical image segmentation library in PyTorch We strongly believe in open and reproducible deep learning research. Our goal is to imp

Adaloglou Nikolas 1.2k Dec 27, 2022
This is the official implementation for "Do Transformers Really Perform Bad for Graph Representation?".

Graphormer By Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng*, Guolin Ke, Di He*, Yanming Shen and Tie-Yan Liu. This repo is the official impl

Microsoft 1.3k Dec 29, 2022
Pytorch-diffusion - A basic PyTorch implementation of 'Denoising Diffusion Probabilistic Models'

PyTorch implementation of 'Denoising Diffusion Probabilistic Models' This reposi

Arthur Juliani 76 Jan 07, 2023
Inflated i3d network with inception backbone, weights transfered from tensorflow

I3D models transfered from Tensorflow to PyTorch This repo contains several scripts that allow to transfer the weights from the tensorflow implementat

Yana 479 Dec 08, 2022
Classify music genre from a 10 second sound stream using a Neural Network.

MusicGenreClassification Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University. Featured in

Matan Lachmish 453 Dec 27, 2022
Prometheus exporter for Cisco Unified Computing System (UCS) Manager

prometheus-ucs-exporter Overview Use metrics from the UCS API to export relevant metrics to Prometheus This repository is a fork of Drew Stinnett's or

Marshall Wace 6 Nov 07, 2022
Rl-quickstart - Reinforcement Learning Quickstart

Reinforcement Learning Quickstart To get setup with the repository, git clone ht

UCLA DataRes 3 Jun 16, 2022
Official code for 'Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning' [ICCV 2021]

RTFM This repo contains the Pytorch implementation of our paper: Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Lear

Yu Tian 242 Jan 08, 2023
VOGUE: Try-On by StyleGAN Interpolation Optimization

VOGUE is a StyleGAN interpolation optimization algorithm for photo-realistic try-on. Top: shirt try-on automatically synthesized by our method in two different examples.

Wei ZHANG 66 Dec 09, 2022
Focal Loss for Dense Rotation Object Detection

Convert ResNets weights from GluonCV to Tensorflow Abstract GluonCV released some new resnet pre-training weights and designed some new resnets (such

17 Nov 24, 2021
Code for the paper "SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness" (NeurIPS 2021)

SmoothMix: Training Confidence-calibrated Smoothed Classifiers for Certified Robustness (NeurIPS2021) This repository contains code for the paper "Smo

Jongheon Jeong 17 Dec 27, 2022