This repo contains the PyTorch implementation of the Dynamic Concept Learner (DCL), accepted by ICLR 2021.

Overview

DCL-PyTorch

PyTorch implementation of the Dynamic Concept Learner (DCL). More details can be found at the project page.

Framework

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee K. Wong, Joshua B. Tenenbaum, and Chuang Gan

Prerequisites

  • Python 3
  • PyTorch 1.0 or higher, with NVIDIA CUDA Support
  • Other required Python packages, as specified in requirements.txt (see Installation).
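
The prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` and the report keys are made up for illustration, and it only verifies the Python major version, the PyTorch major version, and whether PyTorch can see a CUDA device.

```python
import sys


def check_prerequisites():
    """Return a dict saying whether each listed prerequisite appears satisfied."""
    report = {"python3": sys.version_info.major >= 3}
    try:
        import torch  # only needed for the PyTorch and CUDA checks
    except ImportError:
        torch = None

    if torch is None:
        report["torch_installed"] = False
        report["torch_version_ok"] = False
        report["cuda_available"] = False
        return report

    # torch.__version__ looks like "1.7.1" or "2.1.0+cu118"; keep the major part.
    major = int(torch.__version__.split(".")[0])
    report["torch_installed"] = True
    report["torch_version_ok"] = major >= 1
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'NOT SATISFIED'}")
```

A report with `cuda_available` set to false usually means either a CPU-only PyTorch build or a missing NVIDIA driver.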

Installation

Install Jacinle: Clone the package, and add the bin path to your global PATH environment variable:

git clone https://github.com/vacancy/Jacinle --recursive
export PATH=<path_to_jacinle>/bin:$PATH
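
To confirm that the PATH change took effect, one option is to look up a Jacinle command from Python. This is a small sketch under the assumption that `jac-crun` (the launcher used in the evaluation commands of this README) is one of the tools in Jacinle's bin directory; the helper name `find_jacinle_tool` is hypothetical.

```python
import shutil


def find_jacinle_tool(tool="jac-crun", path=None):
    """Return the full path of a Jacinle command-line tool, or None if not found.

    With path=None the lookup uses the current PATH environment variable.
    """
    return shutil.which(tool, path=path)


if __name__ == "__main__":
    location = find_jacinle_tool()
    if location is None:
        print("jac-crun not found: check that <path_to_jacinle>/bin is on PATH")
    else:
        print(f"jac-crun found at {location}")
```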

Clone this repository:

git clone https://github.com/zfchenUnique/DCL-Release.git --recursive

Create a conda environment for NS-CL and install the requirements. This includes the required Python packages from both Jacinle and NS-CL. Most of the required packages are already included in the built-in Anaconda distribution.

Dataset preparation

  • Download the videos, video annotations, questions and answers, and object proposals from the official website.
  • Transform videos into ".png" frames with ffmpeg.
  • Organize the data as shown below.
    clevrer
    ├── annotation_00000-01000
    │   ├── annotation_00000.json
    │   ├── annotation_00001.json
    │   └── ...
    ├── ...
    ├── image_00000-01000
    │   │   ├── 1.png
    │   │   ├── 2.png
    │   │   └── ...
    │   └── ...
    ├── ...
    ├── questions
    │   ├── train.json
    │   ├── validation.json
    │   └── test.json
    ├── proposals
    │   ├── proposal_00000.json
    │   ├── proposal_00001.json
    │   └── ...
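
The frame-extraction and layout steps above can be sketched in Python as follows. This is an illustration only, under some assumptions: the helper names are hypothetical, the `%d.png` output pattern is chosen to match the `1.png`, `2.png` names in the tree (ffmpeg numbers image sequences from 1 by default), and the layout check only looks for the `questions` and `proposals` folders shown above.

```python
import os


def build_ffmpeg_cmd(video_path, frame_dir):
    """Build an ffmpeg command that writes the frames as 1.png, 2.png, ..."""
    pattern = os.path.join(frame_dir, "%d.png")
    return ["ffmpeg", "-i", video_path, pattern]


def missing_entries(root, expected=("questions", "proposals")):
    """List the expected sub-directories that are absent under the dataset root."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(root, name))]


if __name__ == "__main__":
    # "video_00000.mp4" and the output folder are placeholder names.
    print(" ".join(build_ffmpeg_cmd("video_00000.mp4", "frames_00000")))
    print("missing under ./clevrer:", missing_entries("clevrer"))
```

To actually run the conversion, create the output folder first and pass the returned list to `subprocess.run(cmd, check=True)`.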
    

Fast Evaluation

    git clone https://github.com/zfchenUnique/clevrer_dynamic_propnet.git
    cd clevrer_dynamic_propnet
    sh ./scripts/eval_fast_release_v2.sh 0
   sh scripts/script_test_prp_clevrer_qa.sh 0

Step-by-step Training

  • Step 1: download the proposals from the region proposal network and extract object trajectories for the train and val sets with
   sh scripts/script_gen_tubes.sh
  • Step 2: train a concept learner with descriptive and explanatory questions for static concepts (i.e. color, shape and material)
   sh scripts/script_train_dcl_stage1.sh 0
  • Step 3: extract static attributes & refine object trajectories
    extract static attributes
   sh scripts/script_extract_attribute.sh
    refine object trajectories
   sh scripts/script_gen_tubes_refine.sh
  • Step 4: extract predictive and counterfactual scenes by
    cd clevrer_dynamic_propnet
    sh ./scripts/train_tube_box_only.sh # train
    sh ./scripts/train_tube.sh # train
    sh ./scripts/eval_fast_release_v2.sh 0 # val
  • Step 5: train DCL with all questions and the refined trajectories
   sh scripts/script_train_dcl_stage2.sh 0
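
The five steps above can be chained in a small driver. The sketch below is not part of the repository: it copies the commands listed above, assumes that the trailing `0` is a GPU id and that `clevrer_dynamic_propnet` is cloned inside the DCL repository root, and defaults to a dry run that only returns the planned commands.

```python
import subprocess

# (working directory, command) pairs copied from the steps above.
# "." stands for the DCL repository root (an assumption of this sketch).
STEPS = [
    (".", ["sh", "scripts/script_gen_tubes.sh"]),                      # Step 1
    (".", ["sh", "scripts/script_train_dcl_stage1.sh", "0"]),          # Step 2
    (".", ["sh", "scripts/script_extract_attribute.sh"]),              # Step 3
    (".", ["sh", "scripts/script_gen_tubes_refine.sh"]),               # Step 3
    ("clevrer_dynamic_propnet", ["sh", "./scripts/train_tube_box_only.sh"]),         # Step 4
    ("clevrer_dynamic_propnet", ["sh", "./scripts/train_tube.sh"]),                  # Step 4
    ("clevrer_dynamic_propnet", ["sh", "./scripts/eval_fast_release_v2.sh", "0"]),   # Step 4
    (".", ["sh", "scripts/script_train_dcl_stage2.sh", "0"]),          # Step 5
]


def run_pipeline(steps=STEPS, dry_run=True):
    """Return the planned commands; execute them in order when dry_run is False."""
    planned = []
    for cwd, cmd in steps:
        planned.append(f"(cd {cwd} && {' '.join(cmd)})")
        if not dry_run:
            # check=True stops the pipeline at the first failing script.
            subprocess.run(cmd, cwd=cwd, check=True)
    return planned


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Calling `run_pipeline(dry_run=False)` from the repository root would launch the scripts one after another; since each stage can take a long time, running them by hand as listed above remains the safer default.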

Generalization to CLEVRER-Grounding

    sh ./scripts/script_grounding.sh 0
    jac-crun 0 scripts/script_evaluate_grounding.py

Generalization to CLEVRER-Retrieval

    sh ./scripts/script_retrieval.sh 0
    jac-crun 0 scripts/script_evaluate_retrieval.py

Extension to Tower Blocks

    sh ./scripts/script_train_blocks.sh 0
  • Step 3: download the pretrained model from Google Drive and evaluate on Tower Blocks QA
    sh ./scripts/script_eval_blocks.sh 0

Others

Citation

If you find this repo useful in your research, please consider citing:

@inproceedings{zfchen2021iclr,
    title={Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning},
    author={Chen, Zhenfang and Mao, Jiayuan and Wu, Jiajun and Wong, Kwan-Yee K and Tenenbaum, Joshua B. and Gan, Chuang},
    booktitle={International Conference on Learning Representations},
    year={2021}
    }