Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

Overview

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity
Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elena Mocanu, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu

https://openreview.net/forum?id=RLtqs6pzj1-

Abstract: The success of deep ensembles in improving predictive performance, uncertainty estimation, and out-of-distribution robustness has been extensively demonstrated in the machine learning literature. Despite the promising results, naively training multiple deep neural networks and combining their predictions at test time leads to prohibitive computational costs and memory requirements. Recently proposed efficient ensemble approaches match the performance of traditional deep ensembles at significantly lower cost. However, the training resources required by these approaches are still at least the same as those for training a single dense model. In this work, we draw a unique connection between sparse neural network training and deep ensembles, yielding a novel efficient ensemble learning framework called FreeTickets. Instead of training multiple dense networks and averaging them, we directly train sparse subnetworks from scratch and extract diverse yet accurate subnetworks during this efficient, sparse-to-sparse training. Our framework, FreeTickets, is defined as the ensemble of these relatively cheap sparse subnetworks. Despite being an ensemble method, FreeTickets has even fewer parameters and training FLOPs than a single dense model. This seemingly counter-intuitive outcome is due to the high training efficiency of dynamic sparse training. FreeTickets improves over the dense baseline on the following criteria: prediction accuracy, uncertainty estimation, out-of-distribution (OoD) robustness, and training/inference efficiency. Impressively, FreeTickets outperforms the naive deep ensemble with ResNet-50 on ImageNet using only around 1/5 of the training FLOPs required by the latter.

This code base was created by Shiwei Liu ([email protected]) during his Ph.D. at Eindhoven University of Technology.

Requirements

Python 3.6, PyTorch v1.5.1, and CUDA v10.2.

How to Run Experiments

CIFAR-10/100 Experiments

To train Wide ResNet28-10 on CIFAR-10/100 with the DST ensemble at sparsity 0.8:

python main_DST.py --sparse --model wrn-28-10 --data cifar10 --seed 17 --sparse-init ERK \
--update-frequency 1000 --batch-size 128 --death-rate 0.5 --large-death-rate 0.8 \
--growth gradient --death magnitude --redistribution none --epochs 250 --density 0.2
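
Under the hood, --death magnitude and --growth gradient specify how the sparse connectivity is updated every --update-frequency steps: a fraction --death-rate of the smallest-magnitude active weights is pruned, and an equal number of connections is regrown where the gradient magnitude is largest. A minimal per-layer sketch with illustrative names and simplified bookkeeping, not the repo's exact implementation:

import torch

def prune_and_grow(weight, grad, mask, death_rate=0.5):
    """One dynamic-sparse-training mask update: magnitude death, gradient growth."""
    n_update = int(death_rate * int(mask.sum().item()))
    m, w, g = mask.view(-1), weight.view(-1), grad.view(-1)
    # Death: drop the n_update active weights with the smallest magnitude.
    scores = w.abs().masked_fill(m == 0, float("inf"))
    m[torch.topk(scores, n_update, largest=False).indices] = 0.0
    # Growth: activate the n_update inactive positions with the largest gradient
    # (a real implementation would typically exclude just-dropped weights).
    cand = g.abs().masked_fill(m == 1, float("-inf"))
    m[torch.topk(cand, n_update, largest=True).indices] = 1.0
    return mask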

To train Wide ResNet28-10 on CIFAR-10/100 with the EDST ensemble at sparsity 0.8:

python3 main_EDST.py --sparse --model wrn-28-10 --data cifar10 --nolrsche \
--decay-schedule constant --seed 17 --epochs-explo 150 --model-num 3 --sparse-init ERK \
--update-frequency 1000 --batch-size 128 --death-rate 0.5 --large-death-rate 0.8 \
--growth gradient --death magnitude --redistribution none --epochs 450 --density 0.2

[Training module] The training module is controlled by the following arguments; a sketch of the resulting schedule follows the list:

  • --epochs-explo - An integer that controls the number of training epochs in the exploration phase.
  • --model-num - An integer, the number of free tickets to produce.
  • --large-death-rate - A float, the ratio of parameters to explore in each refinement phase.
  • --density - A float, the density (1 - sparsity) level of each free ticket.
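
A minimal sketch of how these arguments split the EDST training budget, assuming the values from the command above (the exact phase boundaries in the repo may differ):

# Illustrative split: one exploration phase, then model-num refinement
# phases, each of which yields one free ticket.
epochs, epochs_explo, model_num = 450, 150, 3
refine_epochs = (epochs - epochs_explo) // model_num  # 100 epochs per ticket
for m in range(model_num):
    start = epochs_explo + m * refine_epochs
    print(f"free ticket {m + 1}: epochs {start}..{start + refine_epochs - 1}")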

To train Wide ResNet28-10 on CIFAR-10/100 with the PF (pruning and finetuning) ensemble at sparsity 0.8:

First, train a dense model with:

python3 main_individual.py  --model wrn-28-10 --data cifar10 --decay-schedule cosine --seed 18 \
--sparse-init ERK --update-frequency 1000 --batch-size 128 --death-rate 0.5 --large-death-rate 0.5 \
--growth gradient --death magnitude --redistribution none --epochs 250 --density 0.2

Then, perform pruning and finetuning with:

pretrain='results/wrn-28-10/cifar10/individual/dense/18.pt'
python3 main_PF.py --sparse --model wrn-28-10 --resume --pretrain $pretrain --lr 0.001 \
--fix --data cifar10 --nolrsche --decay-schedule constant --seed 18 \
--epochs-fs 150 --model-num 3 --sparse-init pruning --update-frequency 1000 --batch-size 128 \
--death-rate 0.5 --large-death-rate 0.8 --growth gradient --death magnitude \
--redistribution none --epochs $epoch --density 0.2
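
PF derives each ensemble member from the dense checkpoint by magnitude pruning followed by finetuning. A minimal sketch of layer-wise magnitude pruning to density 0.2, with illustrative names (the repo's --sparse-init pruning path implements the actual logic):

import torch

def magnitude_masks(state_dict, density=0.2):
    """Keep the largest-magnitude fraction `density` of each weight tensor."""
    masks = {}
    for name, w in state_dict.items():
        if w.dim() < 2:  # skip biases and BatchNorm parameters
            continue
        n_keep = max(1, int(w.numel() * density))
        threshold = torch.topk(w.abs().view(-1), n_keep).values[-1]
        masks[name] = (w.abs() >= threshold).float()
    return masks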

After the training of the various ensemble methods finishes, run the following command to evaluate the ensemble:

resume=results/wrn-28-10/cifar10/density_0.2/EDST/M=3/
python ensemble_freetickets.py --mode predict --resume $resume --dataset cifar10 --model wrn-28-10 \
--seed 18 --test-batch-size 128
  • --resume - A folder path that contains all the free tickets obtained during training.
  • --mode - A string that controls the evaluation mode; one of: predict, disagreement, calibration, KD, tsne.
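
In predict mode the free tickets are combined with the standard deep-ensemble rule: average the softmax outputs of all subnetworks and take the argmax. A minimal sketch, assuming each checkpoint in the folder is a plain loadable state dict (names are illustrative; ensemble_freetickets.py implements the real evaluation):

import glob
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(model, ckpt_dir, x):
    """Average softmax probabilities over all free tickets in ckpt_dir."""
    paths = sorted(glob.glob(f"{ckpt_dir}/*.pt"))
    probs = 0.0
    for path in paths:
        model.load_state_dict(torch.load(path))  # assumes plain state dicts
        model.eval()
        probs = probs + F.softmax(model(x), dim=1)
    return (probs / len(paths)).argmax(dim=1)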

ImageNet Experiments

cd ImageNet
python $1multiproc.py --nproc_per_node 2 $1main.py --sparse_init ERK --multiplier 1 \
--growth gradient --seed 17 --master_port 4545 -j5 -p 500 --arch resnet50 -c fanin \
--update_frequency 4000 --label-smoothing 0.1 -b 64 --lr 0.1 --warmup 5 --epochs 310 \
--density 0.2 $2 ../data/

Citation

If you find this repo helpful, please cite:

@inproceedings{
liu2022deep,
title={Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity},
author={Shiwei Liu and Tianlong Chen and Zahra Atashgahi and Xiaohan Chen and Ghada Sokar and Elena Mocanu and Mykola Pechenizkiy and Zhangyang Wang and Decebal Constantin Mocanu},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=RLtqs6pzj1-}
}