This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

Last update: Dec 20, 2022

Overview

Auto-Lambda

This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

We encourage readers to check out our project page, including more interesting discussions and insights which are not covered in our technical paper.

Multi-task Methods

We implemented all weighting and gradient-based baselines presented in the paper for computer vision tasks: Dense Prediction Tasks (for NYUv2 and CityScapes) and Multi-domain Classification Tasks (for CIFAR-100).

Specifically, we have covered the implementation of these following multi-task optimisation methods:

Weighting-based:

Equal - All task weightings are 1. --weight equal
Uncertainty - https://arxiv.org/abs/1705.07115 --weight uncert
Dynamic Weight Average - https://arxiv.org/abs/1803.10704 --weight dwa
Auto-Lambda - Our approach. --weight autol

Gradient-based:

GradDrop - https://arxiv.org/abs/2010.06808 --grad_method graddrop
PCGrad - https://arxiv.org/abs/2001.06782 --grad_method pcgrad
CAGrad - https://arxiv.org/abs/2110.14048 --grad_method cagrad

Note: Applying a combination of both weighting and gradient-based methods can further improve performance.

Datasets

We applied the same data pre-processing following our previous project: MTAN which experimented on:

NYUv2 [3 Tasks] - 13 Class Segmentation + Depth Estimation + Surface Normal. [288 x 384] Resolution.
CityScapes [3 Tasks] - 19 Class Segmentation + 10 Class Part Segmentation + Disparity (Inverse Depth) Estimation. [256 x 512] Resolution.

Note: We have included a new task: Part Segmentation for CityScapes dataset. The pre-processing file for CityScapes has also been included in the dataset folder.

Experiments

All experiments were written in PyTorch 1.7 and can be trained with different flags (hyper-parameters) when running each training script. We briefly introduce some important flags below.

Flag Name	Usage	Comments
`network`	choose multi-task network: `split, mtan`	both architectures are based on ResNet-50; only available in dense prediction tasks
`dataset`	choose dataset: `nyuv2, cityscapes`	only available in dense prediction tasks
`weight`	choose weighting-based method: `equal, uncert, dwa, autol`	only `autol` will behave differently when set to different primary tasks
`grad_method`	choose gradient-based method: `graddrop, pcgrad, cagrad`	`weight` and `grad_method` can be applied together
`task`	choose primary tasks: `seg, depth, normal` for NYUv2, `seg, part_seg, disp` for CityScapes, `all`: a combination of all standard 3 tasks	only available in dense prediction tasks
`with_noise`	toggle on to add noise prediction task for training (to evaluate robustness in auxiliary learning setting)	only available in dense prediction tasks
`subset_id`	choose domain ID for CIFAR-100, choose `-1` for the multi-task learning setting	only available in CIFAR-100 tasks
`autol_init`	initialisation of Auto-Lambda, default `0.1`	only available when applying Auto-Lambda
`autol_lr`	learning rate of Auto-Lambda, default `1e-4` for NYUv2 and `3e-5` for CityScapes	only available when applying Auto-Lambda

Training Auto-Lambda in Multi-task / Auxiliary Learning Mode:

python trainer_dense.py --dataset [nyuv2, cityscapes] --task [PRIMARY_TASK] --weight autol --gpu 0   # for NYUv2 or CityScapes dataset
python trainer_cifar.py --subset_id [PRIMARY_DOMAIN_ID] --weight autol --gpu 0   # for CIFAR-100 dataset

Training in Single-task Learning Mode:

python trainer_dense_single.py --dataset [nyuv2, cityscapes] --task [PRIMARY_TASK]  --gpu 0   # for NYUv2 or CityScapes dataset
python trainer_cifar_single.py --subset_id [PRIMARY_DOMAIN_ID] --gpu 0   # for CIFAR-100 dataset

Note: All experiments in the original paper were trained from scratch without pre-training.

Benchmark

For standard 3 tasks in NYUv2 (without dense prediction task) in the multi-task learning setting with Split architecture, please follow the results below.

Method	Sem. Seg. (mIOU)	Depth (aErr.)	Normal (mDist.)	Delta MTL
Single	43.37	52.24	22.40	-
Equal	44.64	43.32	24.48	+3.57%
DWA	45.14	43.06	24.17	+4.58%
GradDrop	45.39	43.23	24.18	+4.65%
PCGrad	45.15	42.38	24.13	+5.09%
Uncertainty	45.98	41.26	24.09	+6.50%
CAGrad	46.14	41.91	23.52	+7.05%
Auto-Lambda	47.17	40.97	23.68	+8.21%
Auto-Lambda + CAGrad	48.26	39.82	22.81	+11.07%

Note: The results were averaged across three random seeds. You should expect the error range less than +/-1%.

Citation

If you found this code/work to be useful in your own research, please considering citing the following:

@article{liu2022auto-lambda,
  title={Auto-Lambda: Disentangling Dynamic Task Relationships},
  author={Liu, Shikun and James, Stephen and Davison, Andrew J and Johns, Edward},
  journal={arXiv preprint arXiv:2202.03091},
  year={2022}
}

Acknowledgement

We would like to thank @Cranial-XIX for his clean implementation for gradient-based optimisation methods.

Contact

If you have any questions, please contact [email protected].

This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

Related tags

Overview

Auto-Lambda

Multi-task Methods

Weighting-based:

Gradient-based:

Datasets

Experiments

Benchmark

Citation

Acknowledgement

Contact

Owner

Shikun Liu

Punctuation Restoration using Transformer Models for High-and Low-Resource Languages

The official implementation of ICCV paper "Box-Aware Feature Enhancement for Single Object Tracking on Point Clouds".

Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch.

A booklet on machine learning systems design with exercises

Old Photo Restoration (Official PyTorch Implementation)

Dynamic Divide-and-Conquer Adversarial Training for Robust Semantic Segmentation （ICCV2021）

CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator

Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

Generating Digital Painting Lighting Effects via RGB-space Geometry (SIGGRAPH2020/TOG2020)

10x faster matrix and vector operations

Improving Factual Completeness and Consistency of Image-to-text Radiology Report Generation

Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

The official codes of "Semi-supervised Models are Strong Unsupervised Domain Adaptation Learners".

Learning to Disambiguate Strongly Interacting Hands via Probabilistic Per-Pixel Part Segmentation [3DV 2021 Oral]

MetaTTE: a Meta-Learning Based Travel Time Estimation Model for Multi-city Scenarios

we propose EfficientDerain for high-efficiency single-image deraining

Code of our paper "Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning"

Minimal implementation of PAWS (https://arxiv.org/abs/2104.13963) in TensorFlow.

Custom studies about block sparse attention.

Contrastive Loss Gradient Attack (CLGA)