Official Repository of NeurIPS2021 paper: PTR

Related tags

Deep LearningPTR
Overview

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

Dataset Overview

Figure 1. Dataset Overview.

Introduction

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts and relations, thus playing an important role in the interpretation and organization of visual signals as well as for the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning due to finer-grained concepts, richer geometry relations, and more complex physics. Therefore, to better serve for part-based conceptual, relational and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations regarding semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering various types of reasoning types, making them a good testbed for visual reasoning models. We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.

PTR is accepted by NeurIPS 2021.

Authors: Yining Hong, Li Yi, Joshua B Tenenbaum, Antonio Torralba and Chuang Gan from UCLA, MIT, IBM, Stanford and Tsinghua.

Arxiv Version: https://arxiv.org/abs/2112.05136

Project Page: http://ptr.csail.mit.edu/

Download

Data and evaluation server can be found here

TODOs

baseline models will be available soon!

About the Data

The data includes train/val/test images / questions / scene annotations / depths. Note that due to data cleaning process, the indices of the images are not necessarily consecutive.

The scene annotation is a json file that contains the following keys:

    cam_location        #location of the camera
    cam_rotation        #rotation of the camera
    directions          #Based on the camera, the vectors of the directions
    image_filename      #the filename of the image
    image_index         #the index of the image
    objects             #the objects in the scene, which contains a list of objects
        3d_coords       #the location of the object
        category        #the object category
        line_geo        #a dictionary containing (part, line unit normal vector) pairs. See the [unit normal vector](https://sites.math.washington.edu/~king/coursedir/m445w04/notes/vector/normals-plane.html) of a line. If the vector is not a unit vector, then the part cannot be considered a line.
        plane_geo       #a dictionary containing (part, plane unit normal vector) pairs. See the [unit normal vector](https://sites.math.washington.edu/~king/coursedir/m445w04/notes/vector/normals-plane.html) of a plane. If the vector is not a unit vector, then the part cannot be considered a line.
        obj_mask        #the mask of the object
        part_color      #a dictionary containing the colors of the parts
        part_count      #a dictionary containing the number of the parts
        part_mask       #a dictionary containing the masks of the parts
        partnet_id      #the id of the original partnet object in the PartNet dataset
        pixel_coords    #the pixel of the object
    relationships       #according to the directions, the spatial relationships of the objects
    projection_matrix   #the projection matrix of the camera to reconstruct 3D scene using depths
    physics(optional)   #if physics in the keys and the key is True, this is a physical scene.

The question file is a json file which contains a list of questions. Each question has the following keys:

    image_filename      #the image file that the question asks about
    image_index         #the image index that the question asks about
    program             #the original program used to generate the question
    program_nsclseq     #rearranged program as described in the paper
    question            #the question text
    answer              #the answer text
    type1               #the five questions types
    type2               #the 14 subtypes described in Table 2 in the paper

Data Generation Engine

The images and scene annotations can be generated via invoking data_generation/image_generation/render_images_partnet.py

blender --background --python render_images_partnet.py -- [args]

To generate physical scenes, invoke data_generation/image_generation/render_images_physics.py

blender --background --python render_images_physics.py -- [args]

For more instructions on image generation, please go to this directory and see the README file

To generate questions and answers based on the images, please go to this directory, and run

python generate_questions.py --input_scene_dir $INPUT_SCENE_DIR --output_dir $OUTPUT_QUESTION_DIR --output_questions_file $OUTPUT_FILE

The data generation engine is based partly on the CLEVR generation engine.

Errata

We have manually examined the images, annotations and questions twice. However, provided that there are annotation errors of the PartNet dataset we used, there could still be some errors in the scene annotations. If you find any errors that make the questions unanswerable, please contact [email protected].

Citations

@inproceedings{hong2021ptr,
author = {Hong, Yining and Yi, Li and Tenenbaum, Joshua B and Torralba, Antonio and Gan, Chuang},
title = {PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning},
booktitle = {Advances In Neural Information Processing Systems},
year = {2021}
}
Owner
Yining Hong
https://evelinehong.github.io
Yining Hong
This repository contains the code for the paper in EMNLP 2021: "HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression".

HRKD: Hierarchical Relational Knowledge Distillation for Cross-domain Language Model Compression This repository contains the code for the paper in EM

Chenhe Dong 2 Mar 24, 2022
Direct design of biquad filter cascades with deep learning by sampling random polynomials.

IIRNet Direct design of biquad filter cascades with deep learning by sampling random polynomials. Usage git clone https://github.com/csteinmetz1/IIRNe

Christian J. Steinmetz 55 Nov 02, 2022
Code for the paper "Unsupervised Contrastive Learning of Sound Event Representations", ICASSP 2021.

Unsupervised Contrastive Learning of Sound Event Representations This repository contains the code for the following paper. If you use this code or pa

Eduardo Fonseca 81 Dec 22, 2022
68 keypoint annotations for COFW test data

68 keypoint annotations for COFW test data This repository contains manually annotated 68 keypoints for COFW test data (original annotation of CFOW da

31 Dec 06, 2022
[TIP2020] Adaptive Graph Representation Learning for Video Person Re-identification

Introduction This is the PyTorch implementation for Adaptive Graph Representation Learning for Video Person Re-identification. Get started git clone h

WuYiming 41 Dec 12, 2022
A python library for self-supervised learning on images.

Lightly is a computer vision framework for self-supervised learning. We, at Lightly, are passionate engineers who want to make deep learning more effi

Lightly 2k Jan 08, 2023
Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Networks

CyGNet This repository reproduces the AAAI'21 paper “Learning from History: Modeling Temporal Knowledge Graphs with Sequential Copy-Generation Network

CunchaoZ 89 Jan 03, 2023
Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling

TGraM Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling, Qibin He, Xian Sun, Zhiyuan Yan, Beibei Li, Kun Fu Abstract Rece

Qibin He 6 Nov 25, 2022
Wind Speed Prediction using LSTMs in PyTorch

Implementation of Deep-Forecast using PyTorch Deep Forecast: Deep Learning-based Spatio-Temporal Forecasting Adapted from original implementation Setu

Onur Kaplan 151 Dec 14, 2022
CPPE - 5 (Medical Personal Protective Equipment) is a new challenging object detection dataset

CPPE - 5 CPPE - 5 (Medical Personal Protective Equipment) is a new challenging dataset with the goal to allow the study of subordinate categorization

Rishit Dagli 53 Dec 17, 2022
Haze Removal can remove slight to extreme cases of haze affecting an image

Haze Removal can remove slight to extreme cases of haze affecting an image. Its most typical use is for landscape photography where the haze causes low contrast and low saturation, but it can also be

Grace Ugochi Nneji 3 Feb 15, 2022
用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本和PARL(paddle)版本

用强化学习玩合成大西瓜 代码地址:https://github.com/Sharpiless/play-daxigua-using-Reinforcement-Learning 用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本、PARL(paddle)版本和pytorch版本

72 Dec 17, 2022
Runtime type annotations for the shape, dtype etc. of PyTorch Tensors.

torchtyping Type annotations for a tensor's shape, dtype, names, ... Turn this: def batch_outer_product(x: torch.Tensor, y: torch.Tensor) - torch.Ten

Patrick Kidger 1.2k Jan 03, 2023
Official code for the paper "Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks".

Why Do Self-Supervised Models Transfer? Investigating the Impact of Invariance on Downstream Tasks This repository contains the official code for the

Linus Ericsson 11 Dec 16, 2022
Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

structshot Code and data for paper "Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning", Yi Yang and Arz

ASAPP Research 47 Dec 27, 2022
Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

82 Nov 29, 2022
Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction [Paper] [Official Paddle Implementation] [Huggingface Gradio Demo] [Unofficial

442 Dec 16, 2022
The Official TensorFlow Implementation for SPatchGAN (ICCV2021)

SPatchGAN: Official TensorFlow Implementation Paper "SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation"

39 Dec 30, 2022
PyTorch implementation of "Simple and Deep Graph Convolutional Networks"

Simple and Deep Graph Convolutional Networks This repository contains a PyTorch implementation of "Simple and Deep Graph Convolutional Networks".(http

chenm 253 Dec 08, 2022
Finding Donors for CharityML

Finding-Donors-for-CharityML - Investigated factors that affect the likelihood of charity donations being made based on real census data.

Moamen Abdelkawy 1 Dec 30, 2021