A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset.

Last update: Oct 22, 2022

Related tags

Overview

MUGEN Dataset

Setup

conda create --name MUGEN python=3.6
conda activate MUGEN
pip install --ignore-installed https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.12.0-cp36-cp36m-linux_x86_64.whl 
module load cuda/9.0
module load cudnn/v7.4-cuda.10.0
git clone coinrun_MUGEN
cd coinrun_MUGEN
pip install -r requirements.txt
conda install -c conda-forge mpi4py
pip install -e .

Training Agents

Basic training commands:

python -m coinrun.train_agent --run-id myrun --save-interval 1

After each parameter update, this will save a copy of the agent to ./saved_models/. Results are logged to /tmp/tensorflow by default.

Run parallel training using MPI:

mpiexec -np 8 python -m coinrun.train_agent --run-id myrun

Train an agent on a fixed set of N levels. With N = 0, the training set is unbounded.

python -m coinrun.train_agent --run-id myrun --num-levels N

Continue training an agent from a checkpoint:

python -m coinrun.train_agent --run-id newrun --restore-id myrun

View training options:

python -m coinrun.train_agent --help

Example commands for MUGEN agents:

Base model

python -m coinrun.train_agent --run-id name_your_agent \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 80 \
                --bump-head-penalty 0.25 -kill-monster-reward 10.0

Add squat penalty to reduce excessive squating

python -m coinrun.train_agent --run-id gamev2_fine_tune_m4_squat_penalty \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 811 \
                --bump-head-penalty 0.1 --kill-monster-reward 5.0 --squat-penalty 0.1 \
                --restore-id gamev2_fine_tune_m4_0

Larger model

python -m coinrun.train_agent --run-id gamev2_largearch_bump_head_penalty_0.05_0 \
                --architecture impalalarge --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 51 \
                --bump-head-penalty 0.05 -kill-monster-reward 10.0

Add reward for dying

python -m coinrun.train_agent --run-id gamev2_fine_tune_squat_penalty_die_reward_3.0 \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 857 \
                --bump-head-penalty 0.1 --kill-monster-reward 5.0 --squat-penalty 0.1 \
                --restore-id gamev2_fine_tune_m4_squat_penalty --die-penalty -3.0

Add jump penalty

python -m coinrun.train_agent --run-id gamev2_fine_tune_m4_jump_penalty \
                --architecture impala --paint-vel-info 1 --dropout 0.0 --l2-weight 0.0001 \
                --num-levels 0 --use-lstm 1 --num-envs 96 --set-seed 811 \
                --bump-head-penalty 0.1 --kill-monster-reward 10.0 --jump-penalty 0.1 \
                --restore-id gamev2_fine_tune_m4_0

Data Collection

Collect video data with trained agent. The following command will create a folder {save_dir}/{model_name}_seed_{seed}, which contains the audio semantic maps to reconstruct game audio, as well as the csv containing all game metadata. We use the csv for reconstructing video data in the next step.

python -m coinrun.collect_data --collect_data --paint-vel-info 1 \
                --set-seed 406 --restore-id gamev2_fine_tune_squat_penalty_timeout_300 \
                --save-dir  \
                --level-timeout 600 --num-levels-to-collect 2000

The next step is to create 3.2 second videos with audio by running the script gen_videos.sh. This script first parses the csv metadata of agent gameplay into a json format. Then, we sample 3 second clips, render to RGB, generate audio, and save .mp4s. Note that we apply some sampling logic in gen_videos.py to only generate videos for levels of sufficient length and with interesting game events. You can adjust the sampling logic to your liking here.

There are three outputs from this script:

./json_metadata - where full level jsons are saved for longer video rendering
./video_metadata - where 3.2 second video jsons are saved
./videos - where 3.2s .mp4 videos with audio are saved. We use these videos for human annotation.

bash gen_videos.sh

For example:

bash gen_videos.sh video_data model_gamev2_fine_tune_squat_penalty_timeout_300_seed_406

License Info

The majority of MUGEN is licensed under CC-BY-NC, however portions of the project are available under separate license terms: CoinRun, VideoGPT, VideoCLIP, and S3D are licensed under the MIT license; Tokenizer is licensed under the Apache 2.0 Pycocoevalcap is licensed under the BSD license; VGGSound is licensed under the CC-BY-4.0 license.

A repository for the updated version of CoinRun used to collect MUGEN, a multimodal video-audio-text dataset.

Related tags

Overview

MUGEN Dataset

Setup

Training Agents

Basic training commands:

Example commands for MUGEN agents:

Data Collection

License Info

Owner

MUGEN

JAX code for the paper "Control-Oriented Model-Based Reinforcement Learning with Implicit Differentiation"

Source code for the paper "PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction" in ACL2021

Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark (ICCV 2021)

A unified framework for machine learning with time series

PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

DTCN IJCAI - Sequential prediction learning framework and algorithm

Code for generating the figures in the paper "Capacity of Group-invariant Linear Readouts from Equivariant Representations: How Many Objects can be Linearly Classified Under All Possible Views?"

FeTaQA: Free-form Table Question Answering

An official source code for paper Deep Graph Clustering via Dual Correlation Reduction, accepted by AAAI 2022

TensorFlow Implementation of "Show, Attend and Tell"

Self-training for Few-shot Transfer Across Extreme Task Differences

MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images (ISBI 2021, MELBA 2021)

BMN: Boundary-Matching Network

PyTorch implementation of DARDet: A Dense Anchor-free Rotated Object Detector in Aerial Images

Study of human inductive biases in CNNs and Transformers.

Code to go with the paper "Decentralized Bayesian Learning with Metropolis-Adjusted Hamiltonian Monte Carlo"

Pseudo-rng-app - whos needs science to make a random number when you have pseudoscience?

Exposure Time Calculator (ETC) and radial velocity precision estimator for the Near InfraRed Planet Searcher (NIRPS) spectrograph

Synthesizing and manipulating 2048x1024 images with conditional GANs

Clustering is a popular approach to detect patterns in unlabeled data