Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language (NeurIPS 2021)

Related tags

Deep LearningVRDP
Overview

VRDP (NeurIPS 2021)

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Mingyu Ding, Zhenfang Chen, Tao Du, Ping Luo, Joshua B. Tenenbaum, and Chuang Gan

image

More details can be found at the Project Page.

If you find our work useful in your research please consider citing our paper:

@inproceedings{ding2021dynamic,
  author = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
  title = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
  booktitle = {Advances In Neural Information Processing Systems},
  year = {2021}
}

Prerequisites

  • Python 3
  • PyTorch 1.3 or higher
  • All relative packages are covered by Miniconda
  • Both CPUs and GPUs are supported

Dataset preparation

  • Download videos, video annotation, questions and answers, and object proposals accordingly from the official website

  • Transform videos into ".png" frames with ffmpeg.

  • Organize the data as shown below.

    clevrer
    ├── annotation_00000-01000
    │   ├── annotation_00000.json
    │   ├── annotation_00001.json
    │   └── ...
    ├── ...
    ├── image_00000-01000
    │   │   ├── 1.png
    │   │   ├── 2.png
    │   │   └── ...
    │   └── ...
    ├── ...
    ├── questions
    │   ├── train.json
    │   ├── validation.json
    │   └── test.json
    ├── proposals
    │   ├── proposal_00000.json
    │   ├── proposal_00001.json
    │   └── ...
    
  • We also provide data for physics learning and program execution in Google Drive. You can download them optionally and put them in the ./data/ folder.

  • Download the processed data executor_data.zip for the executor. Put it in and unzip it to ./executor/data/.

Get Object Dictionaries (Concepts and Trajectories)

Download the object proposals from the region proposal network and follow the Step-by-step Training in DCL to get object concepts and trajectories.

The above process includes:

  • trajectory extraction
  • concept learning
  • trajectory refinement

Or you can download our extracted object dictionaries object_dicts.zip directly from Google Drive.

Learning

1. Differentiable Physics Learning

After we get the above object dictionaries, we learn physical parameters from object properties and trajectories.

cd dynamics/
python3 learn_dynamics.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The output object physical parameters object_dicts_with_physics.zip can be downloaded from Google Drive.

2. Physics Simulation (counterfactual)

Physical simulation using learned physical parameters.

cd dynamics/
python3 physics_simulation.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The output simulated trajectories/events object_simulated.zip can be downloaded from Google Drive.

3. Physics Simulation (predictive)

Correction of long-range prediction according to video observations.

cd dynamics/
python3 refine_prediction.py 10000 15000
# Here argv[1] and argv[2] represent the start and end processing index respectively.

The output refined trajectories/events object_updated_results.zip can be downloaded from Google Drive.

Evaluation

After we get the final trajectories/events, we perform the neuro-symbolic execution and evaluate the performance on the validation set.

cd executor/
python3 evaluation.py

The test json file for evaluation on evalAI can be generated by

cd executor/
python3 get_results.py

The Generalized Clerver Dataset (counterfactual_mass)

Examples

  • Predictive question image
  • Counterfactual question image

Acknowledgements

For questions regarding VRDP, feel free to post here or directly contact the author ([email protected]).

Owner
Mingyu Ding
Mingyu Ding
(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)

IsoTree Fast and multi-threaded implementation of Extended Isolation Forest, Fair-Cut Forest, SCiForest (a.k.a. Split-Criterion iForest), and regular

141 Dec 29, 2022
Syed Waqas Zamir 906 Dec 30, 2022
Code for Discriminative Sounding Objects Localization (NeurIPS 2020)

Discriminative Sounding Objects Localization Code for our NeurIPS 2020 paper Discriminative Sounding Objects Localization via Self-supervised Audiovis

51 Dec 11, 2022
Language-Driven Semantic Segmentation

Language-driven Semantic Segmentation (LSeg) The repo contains official PyTorch Implementation of paper Language-driven Semantic Segmentation. Authors

Intelligent Systems Lab Org 416 Jan 03, 2023
Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

52 Nov 09, 2022
GARCH and Multivariate LSTM forecasting models for Bitcoin realized volatility with potential applications in crypto options trading, hedging, portfolio management, and risk management

Bitcoin Realized Volatility Forecasting with GARCH and Multivariate LSTM Author: Chi Bui This Repository Repository Directory ├── README.md

Chi Bui 113 Dec 29, 2022
Code and models for "Pano3D: A Holistic Benchmark and a Solid Baseline for 360 Depth Estimation", OmniCV Workshop @ CVPR21.

Pano3D A Holistic Benchmark and a Solid Baseline for 360o Depth Estimation Pano3D is a new benchmark for depth estimation from spherical panoramas. We

Visual Computing Lab, Information Technologies Institute, Centre for Reseach and Technology Hellas 50 Dec 29, 2022
An Exact Solver for Semi-supervised Minimum Sum-of-Squares Clustering

PC-SOS-SDP: an Exact Solver for Semi-supervised Minimum Sum-of-Squares Clustering PC-SOS-SDP is an exact algorithm based on the branch-and-bound techn

Antonio M. Sudoso 1 Nov 13, 2022
Implementation for paper "STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement" (ICCV 2021).

STAR-pytorch Implementation for paper "STAR: A Structure-aware Lightweight Transformer for Real-time Image Enhancement" (ICCV 2021). CVF (pdf) STAR-DC

43 Dec 21, 2022
Official PyTorch implementation of "Synthesis of Screentone Patterns of Manga Characters"

Manga Character Screentone Synthesis Official PyTorch implementation of "Synthesis of Screentone Patterns of Manga Characters" presented in IEEE ISM 2

Tsubota 2 Nov 20, 2021
Official repository for MixFaceNets: Extremely Efficient Face Recognition Networks

MixFaceNets This is the official repository of the paper: MixFaceNets: Extremely Efficient Face Recognition Networks. (Accepted in IJCB2021) https://i

Fadi Boutros 51 Dec 13, 2022
Yoloxkeypointsegment - An anchor-free version of YOLO, with a simpler design but better performance

Introduction 关键点版本:已完成 全景分割版本:已完成 实例分割版本:已完成 YOLOX is an anchor-free version of

23 Oct 20, 2022
Model Zoo for MindSpore

Welcome to the Model Zoo for MindSpore In order to facilitate developers to enjoy the benefits of MindSpore framework, we will continue to add typical

MindSpore 226 Jan 07, 2023
Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks.

Heterogeneous Graph Benchmark Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks. Roadmap We organize our repo by task, and on

THUDM 176 Dec 17, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

EMI-Group 175 Dec 30, 2022
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training

UC2 UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu,

Mingyang Zhou 28 Dec 30, 2022
A web porting for NVlabs' StyleGAN2, to facilitate exploring all kinds characteristic of StyleGAN networks

This project is a web porting for NVlabs' StyleGAN2, to facilitate exploring all kinds characteristic of StyleGAN networks. Thanks for NVlabs' excelle

K.L. 150 Dec 15, 2022
PyTorch implementation of Lip to Speech Synthesis with Visual Context Attentional GAN (NeurIPS2021)

Lip to Speech Synthesis with Visual Context Attentional GAN This repository contains the PyTorch implementation of the following paper: Lip to Speech

6 Nov 02, 2022
Text and code for the forthcoming second edition of Think Bayes, by Allen Downey.

Think Bayes 2 by Allen B. Downey The HTML version of this book is here. Think Bayes is an introduction to Bayesian statistics using computational meth

Allen Downey 1.5k Jan 08, 2023