Codebase for the paper titled "Continual learning with local module selection"

Related tags

Deep LearningLMC
Overview

This repository contains the codebase for the paper Continual Learning via Local Module Composition.


Setting up the environemnt

Create a new conda environment and install the requirements.

conda create --name ENV python=3.7
conda activate ENV
pip install -r requirements.txt
pip install -e Utils/ctrl/
pip install Utils/nngeometry/

CTrL Benchmark

All experiments were run on Nvidia Quadro RTX 8000 GPUs. To run CTrL experiments use the following comands for different streams:

Stream S-

LMC (task agnostic)

python main_transfer.py --activate_after_str_oh=0 --momentum_bn 0.1 --track_running_stats_bn 1 --pr_name lmc_cr --shuffle_test 0 --init_oh=none --task_sequence s_minus --momentum_bn_decoder=0.1 --activation_structural=sigmoid --deviation_threshold=4 --depth=4 --epochs=100 --fix_layers_below_on_addition=0 --hidden_size=64 --lr=0.001 --mask_str_loss=1 --module_init=mean --multihead=gated_linear --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=20 --reg_factor=10  --running_stats_steps=100 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=1 --temp=0.1 --wdecay=0.001

(test acc. 0.6863, 15 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --momentum_bn 0.1 --pr_name lmc_cr --copy_batchstats 1 --track_running_stats_bn 1 --task_sequence s_minus --gating MNTDP --shuffle_test 0 --epochs 100 --lr 1e-3 --wdecay 1e-3

(test acc. 0.667, 12 modules)

Stream S+

LMC

python main_transfer.py --activate_after_str_oh=0 --activation_structural=sigmoid --deviation_threshold=1.5 --early_stop_complete=0 --pr_name lmc_cr --epochs=100 --epochs_str_only_after_addition=1 --hidden_size=64 --init_oh=none --init_runingstats_on_addition=1 --keep_bn_in_eval_after_freeze=1 --lr=0.001 --module_init=most_likely --momentum_bn=0.1 --momentum_bn_decoder=0.1 --multihead=gated_linear --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=5 --reg_factor=10 --running_stats_steps=100 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=1 --task_sequence=s_plus --temp=1 --wdecay=0.001

(test acc. 0.6244, 22 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --momentum_bn 0.1 --pr_name lmc_cr --copy_batchstats 1 --track_running_stats_bn 1 --task_sequence s_plus --gating MNTDP --shuffle_test 0 --epochs 100 --lr 1e-3 --wdecay 1e-3 --regenerate_seed 0

(test acc. 0.609, 18 modules)

Stream Sin

LMC

python main_transfer.py --activate_after_str_oh=0 --momentum_bn 0.1 --track_running_stats_bn 1 --pr_name lmc_cr --shuffle_test 0 --init_oh=none --task_sequence s_in --momentum_bn_decoder=0.1 --activation_structural=sigmoid --deviation_threshold=4 --depth=4 --epochs=100 --fix_layers_below_on_addition=0 --hidden_size=64 --lr=0.001 --mask_str_loss=1 --module_init=most_likely --multihead=gated_linear --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=20 --reg_factor=10  --running_stats_steps=100 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=1 --temp=0.1 --wdecay=0.001

(test acc. 0.7081, 21 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --momentum_bn 0.1 --pr_name lmc_cr --copy_batchstats 1 --track_running_stats_bn 1 --task_sequence s_in --gating MNTDP --shuffle_test 0 --epochs 100 --lr 1e-3 --wdecay 1e-3 --regenerate_seed 0

(test acc. 0.6646, 15 modules)

Stream Sout

LMC

python main_transfer.py --activate_after_str_oh=0 --momentum_bn 0.1 --track_running_stats_bn 1 --pr_name lmc_cr --shuffle_test 0 --init_oh=none --task_sequence s_out --momentum_bn_decoder=0.1 --activation_structural=sigmoid --deviation_threshold=4 --depth=4 --epochs=100 --fix_layers_below_on_addition=0 --hidden_size=64 --lr=0.001 --mask_str_loss=1 --module_init=mean --multihead=gated_linear --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=20 --reg_factor=10  --running_stats_steps=100 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=1 --temp=0.1 --wdecay=0.001

(test acc. 0.5849, 15 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --momentum_bn 0.1 --pr_name lmc_cr --copy_batchstats 1 --track_running_stats_bn 1 --task_sequence s_out --gating MNTDP --shuffle_test 0 --epochs 100 --lr 1e-3 --wdecay 0 --regenerate_seed 0

(test acc. 0.6567, 11 modules)

Stream Spl

LMC

python main_transfer.py --activate_after_str_oh=0 --activation_structural=sigmoid --pr_name lmc_cr --deviation_threshold=1.5 --early_stop_complete=0 --epochs=100 --hidden_size=64 --init_oh=none --init_runingstats_on_addition=0 --keep_bn_in_eval_after_freeze=1 --lr=0.001 --module_init=most_likely --momentum_bn=0.1 --momentum_bn_decoder=0.1 --multihead=gated_linear --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=10 --reg_factor=10 --running_stats_steps=100 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=1 --task_sequence=s_pl --temp=1 --regenerate_seed 0 --wdecay=0.001

(test acc. 0.6241, 19 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --momentum_bn 0.1 --pr_name lmc_cr --copy_batchstats 1 --track_running_stats_bn 1 --task_sequence s_pl --gating MNTDP --shuffle_test 0 --epochs 100 --lr 1e-3 --wdecay 1e-4 --regenerate_seed 0

(test acc. 0.6391, 18 modules)


Stream Slong30 -- 30 tasks

LMC (task aware)

python main_transfer.py --activate_after_str_oh=0 --activation_structural=sigmoid --deviation_threshold=1.5 --epochs=50 --hidden_size=64 --init_oh=none --keep_bn_in_eval_after_freeze=1 --lr=0.001 --module_init=most_likely --momentum_bn_decoder=0.1 --multihead=gated_linear --n_tasks=100 --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=5 --reg_factor=1 --running_stats_steps=50 --seed=180 --str_prior_factor=1 --str_prior_temp=0.01 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=0 --task_sequence=s_long30 --temp=1 --wdecay=0.001

(test acc. 62.44, 50 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --epochs=50 --hidden_size=64 --lr=0.001 --module_init=most_likely --multihead=gated_linear --n_tasks=100 --seed=180 --task_sequence=s_long30 --wdecay=0.001

(test acc. 64.58, 64 modules)


Stream Slong -- 100 tasks

LMC (task aware)

python main_transfer.py --activate_after_str_oh=0 --activation_structural=sigmoid --deviation_threshold=4 --epochs=100 --hidden_size=64 --init_oh=none --keep_bn_in_eval_after_freeze=1 --lr=0.001 --module_init=most_likely --momentum_bn_decoder=0.1 --multihead=gated_linear --n_tasks=100 --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=5 --reg_factor=1 --running_stats_steps=50 --seed=180 --str_prior_factor=1 --str_prior_temp=0.01 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=0 --task_sequence=s_long --temp=1 --pr_name s_long_cr --wdecay=0

(test acc. 63.88, 32 modules)

MNTDP (task aware)

python main_transfer_mntdp.py --momentum_bn 0.1 --n_tasks 100 --hidden_size 64 --searchspace topdown --keep_bn_in_eval_after_freeze 1 --pr_name s_long_cr --copy_batchstats 1 --track_running_stats_bn 1 --wand_notes correct_MNTDP --task_sequence s_long --gating MNTDP --shuffle_test 0 --epochs 50 --lr 1e-3 --wdecay 1e-3

(test acc. 68.92, 142 modules)


OOD generalization experiments

LMC

python main_transfer.py --regenerate_seed 0 --deviation_threshold=8 --epochs=50 --pr_name lmc_cr --hidden_size=64 --keep_bn_in_eval_after_freeze=0 --lr=0.001 --module_init=none --momentum_bn_decoder=0.1 --normalize_data=1 --optmize_structure_only_free_modules=0 --projection_phase_length=10 --no_projection_phase 0 --reg_factor=10 --running_stats_steps=1000 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=linear_no_act --task_sequence=s_ood --temp=1 --wdecay=0 --task_agnostic_test=0

EWC

python main_transfer.py --epochs=50 --ewc=1000 --hidden_size=256 --keep_bn_in_eval_after_freeze=0 --lr=0.001 --module_init=none --pr_name lmc_cr --multihead=usual --normalize_data=1  --task_sequence=s_ood --use_structural=0 --wdecay=0 --projection_phase_length=0

MNTDP

python main_transfer_mntdp.py --epochs=50 --regenerate_seed 0 --hidden_size=64 --keep_bn_in_eval_after_freeze=0 --pr_name lmc_cr --lr=0.01 --module_init=none --multihead=usual --normalize_data=1 --task_sequence=s_ood --use_structural=0 --wdecay=0

LMC (no projetion)

python main_transfer.py --regenerate_seed 0 --deviation_threshold=8 --epochs=50 --pr_name lmc_cr --hidden_size=64 --keep_bn_in_eval_after_freeze=0 --lr=0.001 --module_init=none --momentum_bn_decoder=0.1 --normalize_data=1 --optmize_structure_only_free_modules=0 --projection_phase_length=0 --no_projection_phase 1 --reg_factor=10 --running_stats_steps=1000 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=linear_no_act --task_sequence=s_ood --temp=1 --wdecay=0

Plug and play (combining independently trained modular learners)

python main_plug_and_play.py --activate_after_str_oh=0 --activation_structural=sigmoid --deviation_threshold=1.5 --early_stop_complete=0 --epochs=100 --epochs_str_only_after_addition=1 --pr_name lmc_cr --hidden_size=64 --init_oh=none --init_runingstats_on_addition=1 --keep_bn_in_eval_after_freeze=1 --lr=0.001 --module_init=mean --momentum_bn=0.1 --momentum_bn_decoder=0.1 --multihead=gated_linear --n_tasks=3 --normalize_oh=1 --optmize_structure_only_free_modules=1 --projection_layer_oh=0 --projection_phase_length=5 --reg_factor=10 --running_stats_steps=10 --str_prior_factor=1 --str_prior_temp=0.1 --structure_inv=ae --structure_inv_oh=linear_no_act --task_agnostic_test=1 --task_sequence=s_pnp_comp --temp=1 --wdecay=0.001

A list of hyperparameters used for other baselines can be found in the baselines.txt file.


References

Owner
Oleksiy Ostapenko
Oleksiy Ostapenko
A simple and extensible library to create Bayesian Neural Network layers on PyTorch.

Blitz - Bayesian Layers in Torch Zoo BLiTZ is a simple and extensible library to create Bayesian Neural Network Layers (based on whats proposed in Wei

Pi Esposito 722 Jan 08, 2023
Code for BMVC2021 paper "Boundary Guided Context Aggregation for Semantic Segmentation"

Boundary-Guided-Context-Aggregation Boundary Guided Context Aggregation for Semantic Segmentation Haoxiang Ma, Hongyu Yang, Di Huang In BMVC'2021 Pape

Haoxiang Ma 31 Jan 08, 2023
Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP 2021.

The Stem Cell Hypothesis Codes for our paper The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders published to EMNLP

Emory NLP 5 Jul 08, 2022
Face Identity Disentanglement via Latent Space Mapping [SIGGRAPH ASIA 2020]

Face Identity Disentanglement via Latent Space Mapping Description Official Implementation of the paper Face Identity Disentanglement via Latent Space

150 Dec 07, 2022
2D Human Pose estimation using transformers. Implementation in Pytorch

PE-former: Pose Estimation Transformer Vision transformer architectures perform very well for image classification tasks. Efforts to solve more challe

Panteleris Paschalis 23 Oct 17, 2022
Stochastic Extragradient: General Analysis and Improved Rates

Stochastic Extragradient: General Analysis and Improved Rates This repository is the official implementation of the paper "Stochastic Extragradient: G

Hugo Berard 4 Nov 11, 2022
World Models with TensorFlow 2

World Models This repo reproduces the original implementation of World Models. This implementation uses TensorFlow 2.2. Docker The easiest way to hand

Zac Wellmer 234 Nov 30, 2022
SelfAugment extends MoCo to include automatic unsupervised augmentation selection.

SelfAugment extends MoCo to include automatic unsupervised augmentation selection. In addition, we've included the ability to pretrain on several new datasets and included a wandb integration.

Colorado Reed 24 Oct 26, 2022
PyTorch reimplementation of REALM and ORQA

PyTorch reimplementation of REALM and ORQA

Li-Huai (Allan) Lin 17 Aug 20, 2022
Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).

Zero-shot Natural Language Video Localization (ZSNLVL) by Pseudo-Supervised Video Localization (PSVL) This repository is for Zero-shot Natural Languag

Computer Vision Lab. @ GIST 37 Dec 27, 2022
High-performance moving least squares material point method (MLS-MPM) solver.

High-Performance MLS-MPM Solver with Cutting and Coupling (CPIC) (MIT License) A Moving Least Squares Material Point Method with Displacement Disconti

Yuanming Hu 2.2k Dec 31, 2022
An efficient and easy-to-use deep learning model compression framework

TinyNeuralNetwork 简体中文 TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework, which contains features like neura

Alibaba 441 Dec 25, 2022
Weighted QMIX: Expanding Monotonic Value Function Factorisation

This repo contains the cleaned-up code that was used in "Weighted QMIX: Expanding Monotonic Value Function Factorisation"

whirl 82 Dec 29, 2022
TensorFlow for Raspberry Pi

TensorFlow on Raspberry Pi It's officially supported! As of TensorFlow 1.9, Python wheels for TensorFlow are being officially supported. As such, this

Sam Abrahams 2.2k Dec 16, 2022
Twin-deep neural network for semi-supervised learning of materials properties

Deep Semi-Supervised Teacher-Student Material Synthesizability Prediction Citation: Semi-supervised teacher-student deep neural network for materials

MLEG 3 Dec 14, 2022
PyTorch Implementation of Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation.

DosGAN-PyTorch PyTorch Implementation of Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation

40 Nov 30, 2022
git《Joint Entity and Relation Extraction with Set Prediction Networks》(2020) GitHub:

Joint Entity and Relation Extraction with Set Prediction Networks Source code for Joint Entity and Relation Extraction with Set Prediction Networks. W

130 Dec 13, 2022
Tiny Object Detection in Aerial Images.

AI-TOD AI-TOD is a dataset for tiny object detection in aerial images. [Paper] [Dataset] Description AI-TOD comes with 700,621 object instances for ei

jwwangchn 116 Dec 30, 2022
Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations

HierarchicyBandit Introduction This is the implementation of WSDM 2022 paper : Show Me the Whole World: Towards Entire Item Space Exploration for Inte

yu song 5 Sep 09, 2022
Code release for ICCV 2021 paper "Anticipative Video Transformer"

Anticipative Video Transformer Ranked first in the Action Anticipation task of the CVPR 2021 EPIC-Kitchens Challenge! (entry: AVT-FB-UT) [project page

Facebook Research 123 Dec 13, 2022