PyTorch implementation of MulMON

Overview


This repository contains a PyTorch implementation of the paper:
Learning Object-Centric Representations of Multi-object Scenes from Multiple Views

Li Nanbo, Cian Eastwood, Robert B. Fisher
NeurIPS 2020 (Spotlight)

Working examples

Check our video presentation for more: https://youtu.be/Og2ic2L77Pw.

Requirements

Hardware:

  • GPU. Currently, at least one GPU device is required to run this code; we may add CPU demo code in the future.
  • Disk space: we do NOT impose a hard disk-space requirement, as usage is entirely data-dependent. Using all the datasets we provide takes ~9GB of disk space, but it is not necessary to use all of them (or even our datasets at all); see the Data section for more details.

Python Environment:

  1. We use Anaconda to manage our Python environment. Check the conda installation guide here: https://docs.anaconda.com/anaconda/install/linux/.

  2. Open a new terminal and change to the MulMON directory:

cd <YOUR-PATH-TO-MulMON>/MulMON/

create a new conda environment called "mulmon" and then activate it:

conda env create -f ./conda-env-spec.yml  
conda activate mulmon
  3. Install a GPU-supported PyTorch (tested with PyTorch 1.1, 1.2 and 1.7). A PyTorch build that is compatible with both your CUDA version and this code is very likely available; find it on the official PyTorch site and install it with a single command.

  4. Install additional packages:

pip install tensorboard  
pip install scikit-image

If PyTorch <= 1.2 is used, you will also need to execute pip install tensorboardX and import it in the ./trainer/base_trainer.py file. This can be done by commenting out the 4th line AND uncommenting the 5th line of that file.
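
To verify the setup, here is a minimal Python sanity check (our sketch, not part of the repository); run it inside the activated mulmon environment. The try/except mirrors the tensorboard/tensorboardX import swap described above:

import torch

print(torch.__version__)
# MulMON currently requires at least one visible GPU (see Hardware above).
assert torch.cuda.is_available(), "no CUDA device visible to PyTorch"
print(torch.cuda.get_device_name(0))

# PyTorch > 1.2 ships its own TensorBoard bindings; for PyTorch <= 1.2,
# fall back to tensorboardX as described above.
try:
    from torch.utils.tensorboard import SummaryWriter
except ImportError:
    from tensorboardX import SummaryWriter
print("TensorBoard writer available:", SummaryWriter.__name__)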

Data

  • Data structure (important):
    We use a data structure as follows:

    <YOUR-PATH>                                          
        ├── ...
        └── mulmon_datasets
              ├── clevr                                   # place your own CLEVR-MV under this directory if you go the fun way
              │    ├── ...
              │    ├── clevr_mv            
              │    │    └── ... (omit)                    # see clevr_<xxx> for subdirectory details
              │    ├── clevr_aug           
              │    │    └── ... (omit)                    # see clevr_<xxx> for subdirectory details
              │    └── clevr_<xxx>
              │         ├── ...
              │         ├── data                          # contains a list of scene files
              │         │    ├── CLEVR_new_#.npy          # one .npy --> one scene sample
              │         │    ├── CLEVR_new_#.npy       
              │         │    └── ...
              │         ├── clevr_<xxx>_train.json        # meta information of the training scenes
              │         └── clevr_<xxx>_test.json         # meta information of the testing scenes  
              └── GQN  
                   ├── ...
                   └── gqn-jaco                 
                        ├── gqn_jaco_train.h5
                        └── gqn_jaco_test.h5
    

    We recommend getting the necessary data folders ready before downloading/generating the data files:

    mkdir <YOUR-PATH>/mulmon_datasets  
    mkdir <YOUR-PATH>/mulmon_datasets/clevr  
    mkdir <YOUR-PATH>/mulmon_datasets/GQN
    
  • Get Datasets

    • Easy way:
      Download our datasets:

      • clevr_mv.tar.gz and place it under the <YOUR-PATH>/mulmon_datasets/clevr/ directory (~1.8GB when extracted).
      • clevr_aug.tar.gz and place it under the <YOUR-PATH>/mulmon_datasets/clevr/ directory (~3.8GB when extracted).
      • gqn_jaco.tar.gz and place it under the <YOUR-PATH>/mulmon_datasets/GQN/ directory (~3.2GB when extracted).

      and extract them in place. For example, the command for extracting clevr_mv.tar.gz is:

      tar -zxvf <YOUR-PATH>/mulmon_datasets/clevr/clevr_mv.tar.gz -C <YOUR-PATH>/mulmon_datasets/clevr/
      

      Note that: 1) we used only a subset of the DeepMind GQN-Jaco dataset (more is available at deepmind/gqn-datasets), and 2) the published clevr_aug dataset differs slightly from the CLE-Aug used in the paper: we added more shapes (such as dolphins) to make the dataset more interesting (and more complex). A hedged sanity-check sketch for the extracted files follows this list.

    • Fun way:
      Customise your own multi-view CLEVR data. (available soon...)
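
    Once the archives are extracted, the layout can be sanity-checked as follows (a hedged sketch: the internal structure of the scene files is not documented here, so we only load and report; h5py is assumed to be available in the environment):

    import glob, json
    import numpy as np
    import h5py

    root = "<YOUR-PATH>/mulmon_datasets"   # set to your actual data path

    # CLEVR-MV: one .npy file per scene, plus train/test meta files.
    scenes = sorted(glob.glob(root + "/clevr/clevr_mv/data/CLEVR_new_*.npy"))
    print(len(scenes), "CLEVR-MV scene files found")
    sample = np.load(scenes[0], allow_pickle=True)
    print("first scene loads as:", type(sample))

    with open(root + "/clevr/clevr_mv/clevr_mv_train.json") as f:
        print("training-scene meta entries:", len(json.load(f)))

    # GQN-Jaco: one h5 file per split; list what is stored inside.
    with h5py.File(root + "/GQN/gqn-jaco/gqn_jaco_train.h5", "r") as f:
        f.visit(print)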

Pre-trained models

Download the pretrained models (← click) and place it under `MulMON/`, i.e. the root directory of this repository, then extract it by executing: tar -zxvf ./logs.tar.gz. Note that some of the models are slightly under-trained, so one could train them further to achieve better results (see How to train?).
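
After extraction, the available checkpoints can be listed as a quick check (a sketch; it assumes the checkpoint-epoch<#number>.pth naming used elsewhere in this README):

import glob

# List every extracted checkpoint under ./logs.
for ckpt in sorted(glob.glob("./logs/**/checkpoint-epoch*.pth", recursive=True)):
    print(ckpt)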

Usage

Configure data path
To run the code, the data path, i.e. the <YOUR-PATH> in a script, needs to be configured correctly. For example, suppose the MulMON dataset folder mulmon_datasets is stored in ../myDatasets/; then, to train MulMON on the GQN-Jaco dataset using a single GPU, the 4th line of the ./scripts/train_jaco.sh script should read: data_path=../myDatasets/mulmon_datasets/GQN.
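
A small guard against misconfiguration is to check the path before launching a run (a sketch mirroring the example above):

import os

data_path = "../myDatasets/mulmon_datasets/GQN"   # mirror line 4 of the script
assert os.path.isdir(data_path), "data path not found: " + data_path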

  • Demo (Environment Test)
    Before running the code below, make sure the pretrained models have been downloaded and extracted:

    . scripts/demo.sh  
    

    Check the ./logs folder for the generated demos.

    • Notes on the disentanglement demos: we randomly pick one object per scene to create the disentanglement demo, so for scene samples where an empty object slot is picked, you won't see any object-manipulation effect in the corresponding gifs (especially for the GQN-Jaco scenes). To create a demo like the one shown, one needs to hard-code an object slot of interest and traverse its informative latent dimensions (some dimensions are redundant, i.e. capture no object property); a schematic sketch follows.
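
      Schematically, such a traversal could look like this (the per-slot decoder interface below is hypothetical; adapt it to the actual model code):

      import torch

      def traverse_dim(decode, z_slot, dim, values):
          """Decode variants of one object-slot latent along one dimension."""
          frames = []
          for v in values:
              z_mod = z_slot.clone()
              z_mod[dim] = v                 # overwrite the chosen latent dim
              frames.append(decode(z_mod))   # hypothetical per-slot decoder
          return frames

      # e.g. sweep dimension 3 of a hard-coded slot of interest:
      # frames = traverse_dim(model.decode_slot, z_slot, 3, torch.linspace(-2, 2, 8))
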
  • Train

    • On a single GPU (e.g. using the GQN-Jaco dataset):
    . scripts/train_jaco.sh  
    
    • On multiple GPUs (e.g. using the GQN-Jaco dataset):
    . scripts/train_jaco_parallel.sh  
    
    • To resume training from a stopped session, i.e. from saved weights checkpoint-epoch<#number>.pth, simply append the flag --resume_epoch <#number> to the flag lines in the script file.
      For example, to resume the previous training (saved as checkpoint-epoch2000.pth) on the GQN-Jaco data, we just need to reconfigure the 10th line of ./scripts/train_jaco.sh as:
      --input_dir ${data_path} --output_dir ${log_path} --resume_epoch 2000 \.
  • Evaluation

    • On a single GPU (e.g. using the Clevr_MV dataset):
    . scripts/eval_clevr.sh  
    
    • Here is a list of important evaluation settings one might want to play with:
      --resume_epoch: specifies which saved model to evaluate.
      --test_batch: how many batches of test data to use for evaluation.
      --vis_batch: how many batches of output to visualise (save) during evaluation (note: <= --test_batch).
      --analyse_batch: how many batches of latent codes to save for post-hoc analysis, e.g. disentanglement (note: <= --test_batch).
      --eval_all (boolean): set True to enable all of [--eval_recon, --eval_seg, --eval_qry_obs, --eval_qry_seg]; each of the four can also be used independently.
      --eval_dist (boolean): save latent codes for disentanglement analysis (note: not controlled by --eval_all).
    • For the disentanglement evaluation, run the scripts/eval_clevr.sh script with the --eval_dist flag set to True and the --analyse_batch variable (which controls how many scenes' latent codes to analyse) set greater than 0. This saves the output latent codes and the ground-truth information that allows you to conduct disentanglement quantification using the QEDR framework; a hedged loading sketch follows this list.
    • You might observe that the evaluation results on the CLE-Aug dataset differ from those in the original paper; this is because the CLE-Aug published here is slightly different from the one used for the paper (see the dataset notes in the Data section).
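
      For the post-hoc QEDR analysis, collecting the saved codes could look roughly like this (file names and layout below are assumptions; check the actual output of scripts/eval_clevr.sh under ./logs):

      import glob
      import numpy as np

      # Gather every saved .npy analysis file (hypothetical layout).
      files = sorted(glob.glob("./logs/**/*.npy", recursive=True))
      print(len(files), "saved analysis files found")
      # Stack the per-scene latent codes into an (N, D) matrix, pair them
      # with the saved ground-truth factors, and feed both to QEDR.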

Contact

We regularly respond to issues raised about running the code. For further inquiries and discussions (e.g. questions about the paper), email: [email protected].

Cite

Please cite our paper if you find this code useful.

@inproceedings{nanbo2020mulmon,
  title={Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views},
  author={Nanbo, Li and Eastwood, Cian and Fisher, Robert B},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}