A PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral)

❇️   ❇️     Please visit our Project Page to learn more about Panoptic Narrative Grounding.    ❇️   ❇️

Panoptic Narrative Grounding

This repository provides a PyTorch implementation of the baseline method in Panoptic Narrative Grounding (ICCV 2021 Oral). Panoptic Narrative Grounding is a spatially fine and general formulation of the natural language visual grounding problem. We establish an experimental framework for the study of this new task, including new ground truth and metrics, and we propose a strong baseline method to serve as a stepping stone for future work. We exploit the intrinsic semantic richness in an image by including panoptic categories, and we approach visual grounding at a fine-grained level by using segmentations. In terms of ground truth, we propose an algorithm to automatically transfer Localized Narratives annotations to specific regions in the panoptic segmentations of the MS COCO dataset. The proposed baseline achieves a performance of 55.4 absolute Average Recall points. This result is a suitable foundation to push the envelope further in the development of methods for Panoptic Narrative Grounding.

Paper

Panoptic Narrative Grounding,
Cristina González¹, Nicolás Ayobi¹, Isabela Hernández¹, José Hernández¹, Jordi Pont-Tuset², Pablo Arbeláez¹
ICCV 2021 Oral.

¹ Center for Research and Formation in Artificial Intelligence (CINFONIA), Universidad de Los Andes.
² Google Research, Switzerland.

Installation

Requirements

  • Python
  • NumPy
  • PyTorch 1.7.1
  • tqdm 4.56.0
  • SciPy 1.5.3
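
As a quick sanity check before training, the pinned versions above can be compared against the current environment. The following is a minimal sketch, not part of the repository: the function name `check_requirements` is hypothetical, and it assumes the packages were installed under their usual PyPI distribution names (`torch`, `tqdm`, `scipy`, `numpy`).

```python
from importlib import metadata

# Versions pinned in the requirements list; Python and NumPy are unpinned there.
PINNED = {"torch": "1.7.1", "tqdm": "4.56.0", "scipy": "1.5.3"}
UNPINNED = ["numpy"]


def check_requirements(pinned=PINNED, unpinned=UNPINNED, get_version=metadata.version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name in list(pinned) + list(unpinned):
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        wanted = pinned.get(name)
        # Compare only the public version, ignoring local tags such as "+cu110".
        if wanted is not None and installed.split("+")[0] != wanted:
            problems.append(f"{name}: found {installed}, README lists {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all listed requirements satisfied"]:
        print(line)
```

A version mismatch is reported rather than raised, since nearby versions may still work; the list above only records what the authors used.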

Cloning the repository

$ git clone git@github.com:BCV-Uniandes/PNG.git
$ cd PNG

Dataset Preparation

Panoptic Narrative Grounding Benchmark

  1. Download the 2017 MS COCO dataset from its official webpage. You will need the images and the panoptic segmentation annotations for the train and validation splits.

  2. Download the Panoptic Narrative Grounding Benchmark and the pre-computed features from our project webpage, and arrange them in the following folder structure:

panoptic_narrative_grounding
|_ images
|  |_ train2017
|  |_ val2017
|_ features
|  |_ train2017
|  |  |_ mask_features
|  |  |_ sem_seg_features
|  |  |_ panoptic_seg_predictions
|  |_ val2017
|     |_ mask_features
|     |_ sem_seg_features
|     |_ panoptic_seg_predictions
|_ annotations
   |_ png_coco_train2017.json
   |_ png_coco_val2017.json
   |_ panoptic_segmentation
      |_ train2017
      |_ val2017
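
Because a misplaced folder typically only surfaces once data loading starts, it can help to verify the layout up front. The sketch below checks for every entry in the tree above; the helper name `missing_entries` is hypothetical and not part of the repository, and the default root `panoptic_narrative_grounding` is taken from the tree.

```python
from pathlib import Path

# Relative paths taken from the folder tree above.
EXPECTED = [
    "images/train2017",
    "images/val2017",
    "annotations/png_coco_train2017.json",
    "annotations/png_coco_val2017.json",
    "annotations/panoptic_segmentation/train2017",
    "annotations/panoptic_segmentation/val2017",
] + [
    f"features/{split}/{kind}"
    for split in ("train2017", "val2017")
    for kind in ("mask_features", "sem_seg_features", "panoptic_seg_predictions")
]


def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "panoptic_narrative_grounding"
    missing = missing_entries(root)
    print("layout OK" if not missing else "missing:\n  " + "\n  ".join(missing))
```

The check only tests for existence; it does not validate the contents of the feature folders or the annotation files.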

Train setup:

Modify the paths in train_net.sh to match your local setup.

python main --init_method "tcp://localhost:8080" NUM_GPUS 1 DATA.PATH_TO_DATA_DIR path_to_your_data_dir DATA.PATH_TO_FEATURES_DIR path_to_your_features_dir OUTPUT_DIR output_dir

Test setup:

Modify the paths in test_net.sh to match your local setup.

python main --init_method "tcp://localhost:8080" NUM_GPUS 1 DATA.PATH_TO_DATA_DIR path_to_your_data_dir DATA.PATH_TO_FEATURES_DIR path_to_your_features_dir OUTPUT_DIR output_dir TRAIN.ENABLE "False"
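
The train and test commands differ only in the trailing `TRAIN.ENABLE "False"` override. When launching runs from Python rather than from the shell scripts, the argument list can be assembled as in the sketch below. The function `build_command` and its parameter names are hypothetical; only the flags and keys shown in the two commands above are used.

```python
import shlex


def build_command(data_dir, features_dir, output_dir, train=True,
                  num_gpus=1, init_method="tcp://localhost:8080"):
    """Assemble the argument list for the train or test command shown above."""
    args = [
        "python", "main",
        "--init_method", init_method,
        "NUM_GPUS", str(num_gpus),
        "DATA.PATH_TO_DATA_DIR", str(data_dir),
        "DATA.PATH_TO_FEATURES_DIR", str(features_dir),
        "OUTPUT_DIR", str(output_dir),
    ]
    if not train:
        # The test command differs only by this trailing override.
        args += ["TRAIN.ENABLE", "False"]
    return args


if __name__ == "__main__":
    cmd = build_command("path_to_your_data_dir", "path_to_your_features_dir",
                        "output_dir", train=False)
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues when paths contain spaces.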

Pretrained model

To reproduce all our results as reported below, you can use our pretrained model and our source code.

Method    things + stuff    things    stuff
Oracle    64.4              67.3      60.4
Ours      55.4              56.2      54.3
MCN       -                 48.2      -

Method    singulars + plurals    singulars    plurals
Oracle    64.4                   64.8         60.7
Ours      55.4                   56.2         48.8

Citation

If you find Panoptic Narrative Grounding useful in your research, please use the following BibTeX entry for citation:

@inproceedings{gonzalez2021png,
  title={Panoptic Narrative Grounding},
  author={Gonz{\'a}lez, Cristina and Ayobi, Nicol{\'a}s and Hern{\'a}ndez, Isabela and Hern{\'a}ndez, Jose and Pont-Tuset, Jordi and Arbel{\'a}ez, Pablo},
  booktitle={ICCV},
  year={2021}
}
Owner
Biomedical Computer Vision @ Uniandes
Our field of research is computer vision, the area of artificial intelligence seeking automated understanding of visual information