Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Last update: Dec 05, 2022

Introduction

Multi-task indoor scene understanding is widely considered an intriguing formulation, as the affinity between different tasks may lead to improved performance. In this paper, we tackle the new problem of joint semantic, affordance and attribute parsing. Successfully resolving it, however, requires a model to capture long-range dependencies, learn from weakly aligned data and properly balance the sub-tasks during training. To this end, we propose an attention-based architecture named Cerberus and a tailored training framework. Our method effectively addresses the aforementioned challenges and achieves state-of-the-art performance on all three tasks. Moreover, an in-depth analysis shows concept affinity consistent with human cognition, which inspires us to explore the possibility of extremely low-shot learning. Surprisingly, Cerberus achieves strong results using only 0.1%-1% of the annotations. Visualizations further confirm that this success can be credited to attention maps shared across tasks. Code and models are publicly available.

Citation

If you find our work useful in your research, please consider citing:

Installation

Requirements

Data preparation

Attribute

Affordance

Semantic

Run Pre-trained Model

You can download the pre-trained model HERE.

Training and evaluating

To train Cerberus on NYUd2 with a single GPU:

CUDA_VISIBLE_DEVICES=0 python main.py train -d [dataset_path] -s 512 --batch-size 2 --random-scale 2 --random-rotate 10 --epochs 200 --lr 0.007 --momentum 0.9 --lr-mode poly --workers 12
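
The command above selects `--lr-mode poly` with `--lr 0.007` and `--epochs 200`. As a rough illustration of what a polynomial learning-rate schedule usually looks like, here is a minimal sketch. The function name `poly_lr` and the exponent `power=0.9` are assumptions made for this example (0.9 is a common choice in segmentation codebases); the exact formula used by `main.py` is not stated here and may differ.

```python
def poly_lr(base_lr: float, epoch: int, total_epochs: int, power: float = 0.9) -> float:
    """Decay base_lr polynomially towards zero over total_epochs.

    Hypothetical helper for illustration; power=0.9 is an assumed default.
    """
    if total_epochs <= 0:
        raise ValueError("total_epochs must be positive")
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs]")
    return base_lr * (1.0 - epoch / total_epochs) ** power


if __name__ == "__main__":
    # Print the learning rate at a few epochs for the settings in the command.
    for epoch in (0, 50, 100, 150, 199):
        print(f"epoch {epoch:3d}: lr = {poly_lr(0.007, epoch, 200):.6f}")
```

Under this form the rate starts at the base value and shrinks smoothly to zero at the final epoch, which tends to give a gentler decay early on than step schedules.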

To test the trained model with its checkpoint:

CUDA_VISIBLE_DEVICES=0 python main.py test -d [dataset_path] -s 512 --resume model_best.pth.tar --phase val --batch-size 1 --ms --workers 10
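
The test command passes `--ms`. Assuming this flag stands for multi-scale testing, as it does in similar segmentation codebases, the idea can be sketched as follows: run the model on several resized copies of the input, resize each score map back to the original size, and average. Everything in this sketch (`resize_nearest`, `multi_scale_average`, the `predict` callable, the scale list, and the use of nearest-neighbor resizing on plain nested lists) is hypothetical and chosen to keep the example dependency-free; it is not taken from `main.py`.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]  # a 2D map: an input channel or a per-class score map


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Resize a 2D grid to (out_h, out_w) with nearest-neighbor sampling."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def multi_scale_average(
    image: Grid,
    predict: Callable[[Grid], Grid],
    scales: Sequence[float],
) -> Grid:
    """Average predictions made at several input scales.

    `predict` is a stand-in for the model: it maps a grid to a score map of
    the same size. The scale values are supplied by the caller (assumed).
    """
    if not scales:
        raise ValueError("scales must not be empty")
    h, w = len(image), len(image[0])
    total = [[0.0] * w for _ in range(h)]
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = predict(resize_nearest(image, sh, sw))
        scores = resize_nearest(scores, h, w)  # back to the original size
        for r in range(h):
            for c in range(w):
                total[r][c] += scores[r][c]
    return [[v / len(scales) for v in row] for row in total]
```

A real implementation would operate on tensors and typically use bilinear interpolation; the averaging step is the part this sketch is meant to show.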
