Refer-it-in-RGBD

This is the repository for our paper 'Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images', published at CVPR 2021.

Paper - ArXiv - pdf (abs)
Project page: https://unclemedm.github.io/Refer-it-in-RGBD/

Introduction

We present a novel task of 3D visual grounding in single-view RGB-D images, where the referred objects are often only partially scanned. In contrast to previous works that directly generate object proposals for grounding in 3D scenes, we propose a bottom-up approach that gradually aggregates information, effectively addressing the challenge posed by partial scans. Our approach first fuses the language and visual features at the bottom level to generate a heatmap that coarsely localizes the relevant regions in the RGB-D image. It then performs an adaptive search based on the heatmap and carries out object-level matching with a second visio-linguistic fusion to ground the referred object. We evaluate the proposed method against state-of-the-art methods on both the RGB-D images extracted from the ScanRefer dataset and our newly collected SUN-Refer dataset. Experiments show that our method outperforms the previous methods by a large margin (by 11.1% and 11.2% Acc@0.5) on both datasets.
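For readers new to the metric, Acc@0.5 in 3D grounding benchmarks such as ScanRefer is the fraction of referring expressions whose predicted box overlaps the ground-truth box with a 3D IoU of at least 0.5. The sketch below illustrates that computation only; it is not taken from this repository's evaluation code. It assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax), which is a simplification (SUN RGB-D boxes, for instance, carry an orientation), and the function names `box_iou_3d` and `accuracy_at_iou` are hypothetical.

```python
def box_iou_3d(box_a, box_b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax).

    Assumption: boxes are axis-aligned; oriented boxes need a different
    intersection routine.
    """
    inter = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        # No overlap along this axis means no overlap at all.
        inter *= max(hi - lo, 0.0)

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of predictions whose IoU with the ground truth is >= threshold."""
    if len(pred_boxes) != len(gt_boxes):
        raise ValueError("pred_boxes and gt_boxes must have the same length")
    if not pred_boxes:
        return 0.0
    hits = sum(
        box_iou_3d(pred, gt) >= threshold
        for pred, gt in zip(pred_boxes, gt_boxes)
    )
    return hits / len(pred_boxes)
```

Calling `accuracy_at_iou(preds, gts)` with one predicted box per expression gives Acc@0.5; passing `threshold=0.25` gives the looser Acc@0.25 variant that is often reported alongside it.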

Dataset

Download SUNREFER_v2 dataset
The SUNREFER dataset contains 38,495 referring expressions corresponding to 7,699 objects from the SUNRGBD dataset. Here is one example from the SUNREFER dataset:

Owner

Haolin Liu
