Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (NeurIPS 2021)

Related tags

Deep LearningVRDP
Overview

VRDP (NeurIPS 2021)

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, and Chuang Gan

image

More details can be found at the Project Page.

If you find our work useful in your research please consider citing our paper:

@inproceedings{ding2021dynamic,
  author = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
  title = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
  booktitle = {Advances In Neural Information Processing Systems},
  year = {2021}
}

Prerequisites

  • Python 3
  • PyTorch 1.3 or higher
  • All relative packages are covered by Miniconda
  • Both CPUs and GPUs are supported

Dataset preparation

  • Download videos, video annotation, questions and answers, and object proposals accordingly from the official website

  • Transform videos into ".png" frames with ffmpeg.

  • Organize the data as shown below.

    clevrer
    ├── annotation_00000-01000
    │   ├── annotation_00000.json
    │   ├── annotation_00001.json
    │   └── ...
    ├── ...
    ├── image_00000-01000
    │   │   ├── 1.png
    │   │   ├── 2.png
    │   │   └── ...
    │   └── ...
    ├── ...
    ├── questions
    │   ├── train.json
    │   ├── validation.json
    │   └── test.json
    ├── proposals
    │   ├── proposal_00000.json
    │   ├── proposal_00001.json
    │   └── ...
    
  • We also provide data for physics learning and program execution in Google Drive. You can download them optionally and put them in the ./data/ folder.

  • Download the processed data executor_data.zip for the executor. Put it in and unzip it to ./executor/data/.

Get Object Dictionaries (Concepts and Trajectories)

Download the object proposals from the region proposal network and follow the Step-by-step Training in DCL to get object concepts and trajectories.

The above process includes:

  • trajectory extraction
  • concept learning
  • trajectory refinement

Or you can download our extracted object dictionaries object_dicts.zip directly from Google Drive.

Learning

1. Differentiable Physics Learning

After we get the above object dictionaries, we learn physical parameters from object properties and trajectories.

cd dynamics/
python3 learn_dynamics.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The output object physical parameters object_dicts_with_physics.zip can be downloaded from Google Drive.

2. Physics Simulation (counterfactual)

Physical simulation using learned physical parameters.

cd dynamics/
python3 physics_simulation.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The output simulated trajectories/events object_simulated.zip can be downloaded from Google Drive.

3. Physics Simulation (predictive)

Correction of long-range prediction according to video observations.

cd dynamics/
python3 refine_prediction.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The output refined trajectories/events object_updated_results.zip can be downloaded from Google Drive.

Evaluation

After we get the final trajectories/events, we perform the neuro-symbolic execution and evaluate the performance on the validation set.

cd executor/
python3 evaluation.py

The test json file for evaluation on evalAI can be generated by

cd executor/
python3 get_results.py

The Generalized Clerver Dataset (counterfactual_mass)

Examples

  • Predictive question image
  • Counterfactual question image

Acknowledgements

For questions regarding VRDP, feel free to post here or directly contact the author ([email protected]).

Owner
Mingyu Ding
Mingyu Ding
Anagram Generator in Python

Anagrams Generator This is a program for computing multiword anagrams. It makes no effort to come up with sentences that make sense; it only finds ana

Day Fundora 5 Nov 17, 2022
Official implementation of "Motif-based Graph Self-Supervised Learning forMolecular Property Prediction"

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction Official Pytorch implementation of NeurIPS'21 paper "Motif-based Graph Se

zaixi 71 Dec 20, 2022
Demonstrational Session git repo for H SAF User Workshop (28/1)

5th H SAF User Workshop The 5th H SAF User Workshop supported by EUMeTrain will be held in online in January 24-28 2022. This repository contains inst

H SAF 4 Aug 04, 2022
Reproduce ResNet-v2(Identity Mappings in Deep Residual Networks) with MXNet

Reproduce ResNet-v2 using MXNet Requirements Install MXNet on a machine with CUDA GPU, and it's better also installed with cuDNN v5 Please fix the ran

Wei Wu 531 Dec 04, 2022
Train emoji embeddings based on emoji descriptions.

emoji2vec This is my attempt to train, visualize and evaluate emoji embeddings as presented by Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko

Miruna Pislar 17 Sep 03, 2022
Framework to build and train RL algorithms

RayLink RayLink is a RL framework used to build and train RL algorithms. RayLink was used to build a RL framework, and tested in a large-scale multi-a

Bytedance Inc. 32 Oct 07, 2022
MBPO (paper: When to trust your model: Model-based policy optimization) in offline RL settings

offline-MBPO This repository contains the code of a version of model-based RL algorithm MBPO, which is modified to perform in offline RL settings Pape

LxzGordon 1 Oct 24, 2021
Pytorch implementation of PCT: Point Cloud Transformer

PCT: Point Cloud Transformer This is a Pytorch implementation of PCT: Point Cloud Transformer.

Yi_Zhang 265 Dec 22, 2022
Official Implementation (PyTorch) of "Point Cloud Augmentation with Weighted Local Transformations", ICCV 2021

PointWOLF: Point Cloud Augmentation with Weighted Local Transformations This repository is the implementation of PointWOLF(To appear). Sihyeon Kim1*,

MLV Lab (Machine Learning and Vision Lab at Korea University) 16 Nov 03, 2022
PyTorch implementation of SmoothGrad: removing noise by adding noise.

SmoothGrad implementation in PyTorch PyTorch implementation of SmoothGrad: removing noise by adding noise. Vanilla Gradients SmoothGrad Guided backpro

SSKH 143 Jan 05, 2023
This repository compare a selfie with images from identity documents and response if the selfie match.

aws-rekognition-facecompare This repository compare a selfie with images from identity documents and response if the selfie match. This code was made

1 Jan 27, 2022
A Deep learning based streamlit web app which can tell with which bollywood celebrity your face resembles.

Project Name: Which Bollywood Celebrity You look like A Deep learning based streamlit web app which can tell with which bollywood celebrity your face

BAPPY AHMED 20 Dec 28, 2021
Title: Heart-Failure-Classification

This Notebook is based off an open source dataset available on where I have created models to classify patients who can potentially witness heart failure on the basis of various parameters. The best

Akarsh Singh 2 Sep 13, 2022
Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021.

NL-CSNet-Pytorch Pytorch code for paper "Image Compressed Sensing Using Non-local Neural Network" TMM 2021. Note: this repo only shows the strategy of

WenxueCui 7 Nov 07, 2022
Self-supervised Multi-modal Hybrid Fusion Network for Brain Tumor Segmentation

JBHI-Pytorch This repository contains a reference implementation of the algorithms described in our paper "Self-supervised Multi-modal Hybrid Fusion N

FeiyiFANG 5 Dec 13, 2021
A basic reminder tool written in Python.

A simple Python Reminder Here's a basic reminder tool written in Python that speaks to the user and sends a notification. Run pip3 install pyttsx3 w

Sachit Yadav 4 Feb 05, 2022
A curated list of the top 10 computer vision papers in 2021 with video demos, articles, code and paper reference.

The Top 10 Computer Vision Papers of 2021 The top 10 computer vision papers in 2021 with video demos, articles, code, and paper reference. While the w

Louis-François Bouchard 118 Dec 21, 2022
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.

Non-Metric Space Library (NMSLIB) Important Notes NMSLIB is generic but fast, see the results of ANN benchmarks. A standalone implementation of our fa

2.9k Jan 04, 2023
a reimplementation of LiteFlowNet in PyTorch that matches the official Caffe version

pytorch-liteflownet This is a personal reimplementation of LiteFlowNet [1] using PyTorch. Should you be making use of this work, please cite the paper

Simon Niklaus 365 Dec 31, 2022
Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022.

Jadena Official implementation of "Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection" in CVPR 2022. arXiv

Qing Guo 13 Nov 29, 2022