Cross-Task Consistency Learning Framework for Multi-Task Learning

Related tags

Deep Learningxtask_mt
Overview

Cross-Task Consistency Learning Framework for Multi-Task Learning

Tested on

  • numpy(v1.19.1)
  • opencv-python(v4.4.0.42)
  • torch(v1.7.0)
  • torchvision(v0.8.0)
  • tqdm(v4.48.2)
  • matplotlib(v3.3.1)
  • seaborn(v0.11.0)
  • pandas(v.1.1.2)

Data

Cityscapes (CS)

Download Cityscapes dataset and put it in a subdirectory named ./data/cityscapes. The folder should have the following subfolders:

  • RGB image in folder leftImg8bit
  • Segmentation in folder gtFine
  • Disparity maps in folder disparity

NYU

We use the preprocessed NYUv2 dataset provided by this repo. Download the dataset and put it in the dataset folder in ./data/nyu.

Model

The model consists of one encoder (ResNet) and two decoders, one for each task. The decoders outputs the predictions for each task ("direct predictions"), which are fed to the TaskTransferNet.
The objective of the TaskTranferNet is to predict the other task given a prediction image as an input (Segmentation prediction -> Depth prediction, vice versa), which I refer to as "transferred predictions"

Loss function

When computing the losses, the direct predictions are compared with the target while the transferred predictions are compared with the direct predictions so that they "align themselves".
The total loss consists of 4 different losses:

  • direct segmentation loss: CrossEntropyLoss()
  • direct depth loss: L1() or MSE() or logL1() or SmoothL1()
  • transferred segmentation loss:
    CrossEntropyLoss() or KLDivergence()
  • transferred depth loss: L1() or SSIM()

* Label smoothing: To "smooth" the one-hot probability by taking some of the probability from the correct class and distributing it among other classes.
* SSIM: Structural Similarity Loss

Flags

The flags are the same for both datasets. The flags and its usage are as written below,

Flag Name Usage Comments
input_path Path to dataset default is data/cityscapes (CS) or data/nyu (NYU)
height height of prediction default: 128 (CS) or 288 (NYU)
width width of prediction default: 256 (CS) or 384 (NYU)
epochs # of epochs default: 250 (CS) or 100 (NYU)
enc_layers which encoder to use default: 34, can choose from 18, 34, 50, 101, 152
use_pretrain toggle on to use pretrained encoder weights available for both datasets
batch_size batch size default: 8 (CS) or 6 (NYU)
scheduler_step_size step size for scheduler default: 80 (CS) or 60 (NYU), note that we use StepLR
scheduler_gamma decay rate of scheduler default: 0.5
alpha weight of adding transferred depth loss default: 0.01 (CS) or 0.0001 (NYU)
gamma weight of adding transferred segmentation loss default: 0.01 (CS) or 0.0001 (NYU)
label_smoothing amount of label smoothing default: 0.0
lp loss fn for direct depth loss default: L1, can choose from L1, MSE, logL1, smoothL1
tdep_loss loss fn for transferred depth loss default: L1, can choose from L1 or SSIM
tseg_loss loss fn for transferred segmentation loss default: cross, can choose from cross or kl
batch_norm toggle to enable batch normalization layer in TaskTransferNet slightly improves segmentation task
wider_ttnet toggle to double the # of channels in TaskTransferNet
uncertainty_weights toggle to use uncertainty weights (Kendall, et al. 2018) we used this for best results
gradnorm toggle to use GradNorm (Chen, et al. 2018)

Training

Cityscapes

For the Cityscapes dataset, there are two versions of segmentation task, which are 7-classes task and 19-classes task (Use flag 'num_classes' to switch tasks, default is 7).
So far, the results show near-SOTA for 7-class segmentation task + depth estimation.

ResNet34 was used as the encoder, L1() for direct depth loss and CrossEntropyLoss() for transferred segmentation loss.
The hyperparameter weights for both transferred predictions were 0.01.
I used Adam as my optimizer with an initial learning rate of 0.0001 and trained for 250 epochs with batch size 8. The learning rate was halved every 80 epochs.

To reproduce the code, use the following:

python main_cross_cs.py --uncertainty_weights

NYU

Our results show SOTA for NYU dataset.

ResNet34 was used as the encoder, L1() for direct depth loss and CrossEntropyLoss() for transferred segmentation loss.
The hyperparameter weights for both transferred predictions were 0.0001.
I used Adam as my optimizer with an initial learning rate of 0.0001 and trained for 100 epochs with batch size 6. The learning rate was halved every 60 epochs.

To reproduce the code, use the following:

python main_cross_nyu.py --uncertainty_weights

Comparisons

Evaluation metrics are the following:

Segmentation

  • Pixel accuracy (Pix Acc): percentage of pixels with the correct label
  • mIoU: mean Intersection over Union

Depth

  • Absolute Error (Abs)
  • Absolute Relative Error (Abs Rel): Absolute error divided by ground truth depth

The results are the following:

Cityscapes

Models mIoU Pix Acc Abs Abs Rel
MTAN 53.04 91.11 0.0144 33.63
KD4MTL 52.71 91.54 0.0139 27.33
PCGrad 53.59 91.45 0.0171 31.34
AdaMT-Net 62.53 94.16 0.0125 22.23
Ours 66.51 93.56 0.0122 19.40

NYU

Models mIoU Pix Acc Abs Abs Rel
MTAN* 21.07 55.70 0.6035 0.2472
MTAN† 20.10 53.73 0.6417 0.2758
KD4MTL* 20.75 57.90 0.5816 0.2445
KD4MTL† 22.44 57.32 0.6003 0.2601
PCGrad* 20.17 56.65 0.5904 0.2467
PCGrad† 21.29 54.07 0.6705 0.3000
AdaMT-Net* 21.86 60.35 0.5933 0.2456
AdaMT-Net† 20.61 58.91 0.6136 0.2547
Ours† 30.31 63.02 0.5954 0.2235

*: Trained on 3 tasks (segmentation, depth, and surface normal)
†: Trained on 2 tasks (segmentation and depth)
Italic: Reproduced by ourselves

Scores with models trained on 3 tasks for NYU dataset are shown only as reference.

Papers referred

MTAN: [paper][github]
KD4MTL: [paper][github]
PCGrad: [paper][github (tensorflow)][github (pytorch)]
AdaMT-Net: [paper]

Owner
Aki Nakano
Student at the University of Tokyo pursuing master's degree. Joined UC Berkeley Summer Session 2019. Researching deep learning. Python/R
Aki Nakano
The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

RegSeg The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation" Paper: arxiv D block Decoder Setup Install the

Roland 61 Dec 27, 2022
Miscellaneous and lightweight network tools

Network Tools Collection of miscellaneous and lightweight network tools to simplify daily operations, administration, and troubleshooting of networks.

Nicholas Russo 22 Mar 22, 2022
Source code for EquiDock: Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking (ICLR 2022)

Source code for EquiDock: Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking (ICLR 2022) Please cite "Independent SE(3)-Equivar

Octavian Ganea 154 Jan 02, 2023
Privacy as Code for DSAR Orchestration: Privacy Request automation to fulfill GDPR, CCPA, and LGPD data subject requests.

Meet Fidesops: Privacy as Code for DSAR Orchestration A part of the greater Fides ecosystem. ⚡ Overview Fidesops (fee-dez-äps, combination of the Lati

Ethyca 44 Dec 06, 2022
Pytorch implementation for A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose

A-NeRF: Articulated Neural Radiance Fields for Learning Human Shape, Appearance, and Pose Paper | Website | Data A-NeRF: Articulated Neural Radiance F

Shih-Yang Su 172 Dec 22, 2022
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 2022)

Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR2022)[paper] Authors: Chenhang He, Ruihuang Li, Shuai Li, L

Billy HE 141 Dec 30, 2022
Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Source code of the paper "Deep Learning of Latent Variable Models for Industrial Process Monitoring".

Xiangyin Kong 7 Nov 08, 2022
Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

tonne 1.4k Dec 29, 2022
Official code for 'Robust Siamese Object Tracking for Unmanned Aerial Manipulator' and offical introduction to UAMT100 benchmark

SiamSA: Robust Siamese Object Tracking for Unmanned Aerial Manipulator Demo video 📹 Our video on Youtube and bilibili demonstrates the evaluation of

Intelligent Vision for Robotics in Complex Environment 12 Dec 18, 2022
Cl datasets - PyTorch image dataloaders and utility functions to load datasets for supervised continual learning

Continual learning datasets Introduction This repository contains PyTorch image

berjaoui 5 Aug 28, 2022
improvement of CLIP features over the traditional resnet features on the visual question answering, image captioning, navigation and visual entailment tasks.

CLIP-ViL In our paper "How Much Can CLIP Benefit Vision-and-Language Tasks?", we show the improvement of CLIP features over the traditional resnet fea

310 Dec 28, 2022
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

The first comprehensive Robustness investigation benchmark on large-scale dataset ImageNet regarding ARchitecture design and Training techniques towards diverse noises.

132 Dec 23, 2022
An Exact Solver for Semi-supervised Minimum Sum-of-Squares Clustering

PC-SOS-SDP: an Exact Solver for Semi-supervised Minimum Sum-of-Squares Clustering PC-SOS-SDP is an exact algorithm based on the branch-and-bound techn

Antonio M. Sudoso 1 Nov 13, 2022
Motion planning environment for Sampling-based Planners

Sampling-Based Motion Planners' Testing Environment Sampling-based motion planners' testing environment (sbp-env) is a full feature framework to quick

Soraxas 23 Aug 23, 2022
Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [Arxiv] [Paper] As acquiring pixel-wise an

Lukas Hoyer 305 Dec 29, 2022
One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Introduction One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing". Users

seq-to-mind 18 Dec 11, 2022
TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

ICNet_tensorflow This repo provides a TensorFlow-based implementation of paper "ICNet for Real-Time Semantic Segmentation on High-Resolution Images,"

HsuanKung Yang 406 Nov 27, 2022
Cascaded Pyramid Network (CPN) based on Keras (Tensorflow backend)

ML2 Takehome Project Reimplementing the paper: Cascaded Pyramid Network for Multi-Person Pose Estimation Dataset The model uses the COCO dataset which

Vo Van Tu 1 Nov 22, 2021
Official implementation of the Implicit Behavioral Cloning (IBC) algorithm

Implicit Behavioral Cloning This codebase contains the official implementation of the Implicit Behavioral Cloning (IBC) algorithm from our paper: Impl

Google Research 210 Dec 09, 2022
Generate Cartoon Images using Generative Adversarial Network

AvatarGAN ✨ Generate Cartoon Images using DC-GAN Deep Convolutional GAN is a generative adversarial network architecture. It uses a couple of guidelin

Aakash Jhawar 50 Dec 29, 2022