PyTorch implementation of MulMON

Overview

MulMON

This repository contains a PyTorch implementation of the paper:
Learning Object-Centric Representations of Multi-object Scenes from Multiple Views

Li Nanbo, Cian Eastwood, Robert B. Fisher
NeurIPS 2020 (Spotlight)

Working examples

Check our video presentation for more: https://youtu.be/Og2ic2L77Pw.

Requirements

Hardware:

  • GPU. At least one GPU device is currently required to run this code. We will consider adding CPU demo code in the future.
  • Disk space. There is no hard requirement on disk space; it depends entirely on the data you use. Using all the datasets we provide takes ~9GB. You do not need all of our datasets (or even our datasets at all); see the Data section for details.
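Both hardware requirements can be checked up front with a short Python sketch. The function names are illustrative and not part of this repository; the sizes are the extracted dataset sizes listed in the Data section, and the GPU check assumes PyTorch is already installed (it reports False otherwise).

```python
import shutil

# Approximate extracted sizes in GB, taken from the Data section of this README.
DATASET_SIZES_GB = {"clevr_mv": 1.8, "clevr_aug": 3.8, "gqn_jaco": 3.2}


def gpu_available():
    """Return True if PyTorch is installed and can see at least one CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def has_enough_disk(path, required_gb):
    """Return True if the filesystem holding `path` has `required_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    required = sum(DATASET_SIZES_GB.values())
    print(f"CUDA GPU visible: {gpu_available()}")
    print(f"Room for all datasets (~{required:.1f} GB): {has_enough_disk('.', required)}")
```

The downloaded archives take extra space until they are deleted, so treat the total as a lower bound.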

Python Environment:

  1. We use Anaconda to manage our Python environment. See the conda installation guide: https://docs.anaconda.com/anaconda/install/linux/.

  2. Open a new terminal and change to the MulMON directory:

cd <YOUR-PATH-TO-MulMON>/MulMON/

Create a new conda environment called "mulmon" and then activate it:

conda env create -f ./conda-env-spec.yml  
conda activate mulmon

  3. Install a GPU-enabled PyTorch build (tested with PyTorch 1.1, 1.2 and 1.7). The official PyTorch site very likely offers an installer that is compatible with both your CUDA version and this code; installation is typically a single command.

  4. Install additional packages:

pip install tensorboard  
pip install scikit-image

If you use PyTorch <= 1.2, also run pip install tensorboardX and import it in the ./trainer/base_trainer.py file. To do so, comment out the 4th line AND uncomment the 5th line of that file.
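As an alternative to editing the file by hand, a guarded import can pick whichever writer is installed. This is only a sketch: it assumes the two lines in ./trainer/base_trainer.py import a SummaryWriter class, which this README does not state, and load_summary_writer is a made-up helper name.

```python
import importlib


def load_summary_writer():
    """Return a SummaryWriter class from whichever backend is installed, else None."""
    # Try the writer bundled with PyTorch first, then fall back to tensorboardX.
    for module_name in ("torch.utils.tensorboard", "tensorboardX"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        return getattr(module, "SummaryWriter", None)
    return None


SummaryWriter = load_summary_writer()
if SummaryWriter is None:
    print("No TensorBoard writer found; install tensorboard or tensorboardX first.")
```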

Data

  • Data structure (important):
    We use the following directory structure:

    <YOUR-PATH>                                          
        ├── ...
        └── mulmon_datasets
              ├── clevr                                   # place your own CLEVR-MV under this directory if you go the fun way
              │    ├── ...
              │    ├── clevr_mv            
              │    │    └── ... (omit)                    # see clevr_<xxx> for subdirectory details
              │    ├── clevr_aug           
              │    │    └── ... (omit)                    # see clevr_<xxx> for subdirectory details
              │    └── clevr_<xxx>
              │         ├── ...
              │         ├── data                          # contains a list of scene files
              │         │    ├── CLEVR_new_#.npy          # one .npy --> one scene sample
              │         │    ├── CLEVR_new_#.npy       
              │         │    └── ...
              │         ├── clevr_<xxx>_train.json        # meta information of the training scenes
              │         └── clevr_<xxx>_test.json         # meta information of the testing scenes  
              └── GQN  
                   ├── ...
                   └── gqn-jaco                 
                        ├── gqn_jaco_train.h5
                        └── gqn_jaco_test.h5
    

    We recommend creating the necessary data folders before downloading or generating the data files:

    mkdir <YOUR-PATH>/mulmon_datasets  
    mkdir <YOUR-PATH>/mulmon_datasets/clevr  
    mkdir <YOUR-PATH>/mulmon_datasets/GQN
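The same folders can be created from Python, which is convenient on systems without a POSIX shell. This is a minimal sketch; make_dataset_dirs is an illustrative name and `root` stands for your <YOUR-PATH>.

```python
from pathlib import Path


def make_dataset_dirs(root):
    """Create the mulmon_datasets/{clevr,GQN} folders under `root` and return them."""
    base = Path(root) / "mulmon_datasets"
    created = []
    for name in ("clevr", "GQN"):
        folder = base / name
        # parents=True also creates mulmon_datasets; exist_ok makes reruns harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```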
    
  • Get Datasets

    • Easy way:
      Download our datasets:

      • clevr_mv.tar.gz and place it under the <YOUR-PATH>/mulmon_datasets/clevr/ directory (~1.8GB when extracted).
      • clevr_aug.tar.gz and place it under the <YOUR-PATH>/mulmon_datasets/clevr/ directory (~3.8GB when extracted).
      • gqn_jaco.tar.gz and place it under the <YOUR-PATH>/mulmon_datasets/GQN/ directory (~3.2GB when extracted).

      Then extract each archive in place. For example, to extract clevr_mv.tar.gz:

      tar -zxvf <YOUR-PATH>/mulmon_datasets/clevr/clevr_mv.tar.gz -C <YOUR-PATH>/mulmon_datasets/clevr/
      

      Note that: 1) we used only a subset of the DeepMind GQN-Jaco dataset (more is available at deepmind/gqn-datasets), and 2) the published clevr_aug dataset differs slightly from the CLE-Aug used in the paper: we added more shapes (such as dolphins) to make the dataset more interesting (and more complex).

    • Fun way:
      Customise your own multi-view CLEVR data. (available soon...)
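The extraction step of the easy way can also be scripted with Python's standard tarfile module. This is a minimal sketch; extract_in_place is an illustrative name, and it should only be used on archives you trust, such as the ones downloaded above.

```python
import tarfile
from pathlib import Path


def extract_in_place(archive_path):
    """Extract a .tar.gz archive into the directory that contains it."""
    archive = Path(archive_path)
    target = archive.parent
    # "r:gz" opens a gzip-compressed tar archive for reading.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(path=target)
    return target
```

For example, calling extract_in_place on <YOUR-PATH>/mulmon_datasets/clevr/clevr_mv.tar.gz mirrors the tar command shown above.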

Pre-trained models

Download the pretrained models and place the archive under MulMON/, i.e. the root directory of this repository, then extract it by executing: tar -zxvf ./logs.tar.gz. Note that some of the models are slightly under-trained, so you can train them further to achieve better results (see Train under Usage).

Usage

Configure data path
To run the code, the data path, i.e. the <YOUR-PATH> in a script, needs to be configured correctly. For example, suppose the MulMON dataset folder mulmon_datasets is stored in ../myDatasets/. To train MulMON on the GQN-Jaco dataset using a single GPU, the 4th line of the ./scripts/train_jaco.sh script should then read: data_path=../myDatasets/mulmon_datasets/GQN.
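If you switch between dataset locations often, the edit can be automated. The sketch below rewrites the first line that starts with data_path= in a script's text; set_data_path is a hypothetical helper, and the only thing it assumes about the scripts is that such a line exists, as described above.

```python
import re


def set_data_path(script_text, new_path):
    """Return `script_text` with its first `data_path=...` line pointing at `new_path`."""
    pattern = re.compile(r"^data_path=.*$", flags=re.MULTILINE)
    if not pattern.search(script_text):
        raise ValueError("no line starting with 'data_path=' found")
    # A function replacement keeps backslashes in new_path from being read as escapes.
    return pattern.sub(lambda _: f"data_path={new_path}", script_text, count=1)
```

A typical use would read the script with pathlib.Path.read_text, pass the text through set_data_path, and write the result back with write_text.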

  • Demo (Environment Test)
    Before running the command below, make sure the pretrained models have been downloaded and extracted:

    . scripts/demo.sh  
    

    Check ./logs folder for the generated demos.

    • Notes for disentanglement demos: we randomly pick one object slot per scene to create the disentanglement demo. For scene samples where an empty slot is picked, you will not see any object manipulation effect in the corresponding gifs (especially for the GQN-Jaco scenes). To create a demo like the one shown, you need to specify (hard-code) an object slot of interest and traverse the informative latent dimensions, since some dimensions are redundant and capture no object property.
  • Train

    • On a single GPU (e.g. using the GQN-Jaco dataset):
    . scripts/train_jaco.sh  
    
    • On multiple GPUs (e.g. using the GQN-Jaco dataset):
    . scripts/train_jaco_parallel.sh  
    
    • To resume training from a stopped session, i.e. from saved weights checkpoint-epoch<#number>.pth, append the flag --resume_epoch <#number> to the flags in the script file.
      For example, to resume a previous training run (saved as checkpoint-epoch2000.pth) on GQN-Jaco data, change the 10th line of ./scripts/train_jaco.sh to:
      --input_dir ${data_path} --output_dir ${log_path} --resume_epoch 2000 \
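To find the value to pass to --resume_epoch, you can scan a folder for files that follow the checkpoint-epoch<#number>.pth naming pattern. This is a sketch only: latest_checkpoint_epoch is an illustrative name, and because this README does not say where checkpoints are written, the folder is passed in as an argument.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-epoch(\d+)\.pth$")


def latest_checkpoint_epoch(folder):
    """Return the highest epoch among checkpoint-epoch<N>.pth files in `folder`, or None."""
    epochs = []
    for path in Path(folder).iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match:
            epochs.append(int(match.group(1)))
    return max(epochs) if epochs else None
```

Epochs are compared as integers, so checkpoint-epoch2000.pth ranks above checkpoint-epoch300.pth even though it sorts lower as a string.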
  • Evaluation

    • On a single GPU (e.g. using the Clevr_MV dataset):
    . scripts/eval_clevr.sh  
    
    • Here is a list of important evaluation settings that you might want to adjust:
      --resume_epoch which saved model (epoch) to evaluate.
      --test_batch how many batches of test data to use for evaluation.
      --vis_batch how many batches of output to visualise (save) during evaluation. (note: <= --test_batch)
      --analyse_batch how many batches of latent codes to save for post-hoc analysis, e.g. disentanglement. (note: <= --test_batch)
      --eval_all (boolean) set to True to enable all of [--eval_recon, --eval_seg, --eval_qry_obs, --eval_qry_seg]; each of the four can also be used independently.
      --eval_dist (boolean) save latent codes for disentanglement analysis. (note: not controlled by --eval_all)
    • For the disentanglement evaluation, run the scripts/eval_clevr.sh script with the --eval_dist flag set to True and with --analyse_batch (which controls how many scenes of latent codes to analyse) set to a value greater than 0. This saves the output latent codes and ground-truth information needed to quantify disentanglement with the QEDR framework.
    • You might observe that the evaluation results on the CLE-Aug dataset differ from those in the original paper. This is because the CLE-Aug published here is slightly different from the one we used for the paper (see the note in the Data section).
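The constraints between these settings can be checked before launching a long evaluation. The sketch below uses a stand-in argparse parser, not the repository's own: the flag names come from the list above, while the types, the defaults, and the use of store_true for --eval_dist are assumptions.

```python
import argparse


def build_parser():
    """Stand-in parser for some of the evaluation flags (not the repository's parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_batch", type=int, default=1)      # assumed default
    parser.add_argument("--vis_batch", type=int, default=0)       # assumed default
    parser.add_argument("--analyse_batch", type=int, default=0)   # assumed default
    parser.add_argument("--eval_dist", action="store_true")       # assumed flag style
    return parser


def check_eval_args(args):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    if args.vis_batch > args.test_batch:
        problems.append("--vis_batch must be <= --test_batch")
    if args.analyse_batch > args.test_batch:
        problems.append("--analyse_batch must be <= --test_batch")
    if args.eval_dist and args.analyse_batch <= 0:
        problems.append("--eval_dist needs --analyse_batch > 0")
    return problems
```

Returning a list of messages rather than raising on the first problem lets you see every inconsistent setting at once.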

Contact

We regularly respond to issues raised about running the code. For further inquiries and discussions (e.g. questions about the paper), email: [email protected].

Cite

Please cite our paper if you find this code useful.

@inproceedings{nanbo2020mulmon,
  title={Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views},
  author={Nanbo, Li and Eastwood, Cian and Fisher, Robert B},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}