🔪 Elimination-based Lightweight Neural Net with Pretrained Weights

Overview

ELimNet

ELimNet: Eliminating Layers in a Neural Network Pretrained with Large Dataset for Downstream Task

  • Removed top layers from pretrained EfficientNet B0 and ResNet18 to construct lightweight CNN models with fewer than 1M parameters (see the sketch after this list).
  • Evaluated on the Trash Annotations in Context (TACO) dataset, sampled down to 6 classes with 20,851 images.
  • Compared performance against lightweight models generated by Optuna's Neural Architecture Search (NAS) built from the same convolutional blocks.
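
The elimination itself is simple to sketch. Below is a minimal illustration of the idea using torchvision's EfficientNet B0; the repository builds its models from YAML configs, so the cut point and classifier head here are assumptions for illustration, not the repo's exact code.

import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained EfficientNet B0 (torchvision >= 0.13 weights API).
backbone = models.efficientnet_b0(weights="IMAGENET1K_V1")

# backbone.features is an nn.Sequential of 9 blocks (stem, 7 MBConv stages,
# and a final 1x1 conv). Keeping only the first 6 eliminates the top stages,
# roughly analogous to "EfficientNet B0 Elim 3".
features = backbone.features[:6]

model = nn.Sequential(
    features,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(112, 6),  # stage 5 of B0 outputs 112 channels; 6 TACO classes
)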

Quickstart

Installation

# clone the repository
git clone https://github.com/snoop2head/elimnet

# fetch image dataset and unzip
wget -cq https://aistages-prod-server-public.s3.amazonaws.com/app/Competitions/000081/data/data.zip
unzip ./data.zip -d ./

Train

# fine-tune the pretrained model on the dataset
python train.py --model ./model/efficientnet_b0.yaml

# fine-tune ElimNet on the dataset
python train.py --model ./model/efficientnet_b0_elim_3.yaml

Inference

# run inference with the latest trained model
python inference.py --model_dir ./exp/latest/

Performance

Performance is compared against (1) the original pretrained models and (2) models constructed by Optuna NAS without pretrained weights.

  • Pretrained CNN models with their top convolutional layers eliminated outperform empty Optuna NAS models generated from the same convolutional blocks.
  • Eliminating top convolutional layers yields a lightweight model whose classification performance is similar to (or better than) that of the original pretrained model.
  • Parameter count is reduced to 7% (or less) of the original while performance is maintained (or improved), and inference time drops by 20% or more; a quick way to check the parameter figures is sketched after this list.
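
The parameter counts quoted here and in the tables below can be verified with plain PyTorch; a minimal sketch, assuming nothing repository-specific:

import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of parameters in a model."""
    return sum(p.numel() for p in model.parameters())

# e.g. Resnet18 Elim 2 vs pretrained Resnet18: 0.68M / 11.17M ≈ 6%,
# consistent with the "7% (or less)" figure above.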

ELimNet vs Pretrained Models (Train)

| [100 epochs] | # of Parameters | # of Layers | Train | Validation | Test F1 |
| --- | --- | --- | --- | --- | --- |
| Pretrained EfficientNet B0 | 4.0M | 352 | Loss: 0.43, Acc: 81.23%, F1: 0.84 | Loss: 0.469, Acc: 82.17%, F1: 0.76 | 0.7493 |
| EfficientNet B0 Elim 2 | 0.9M | 245 | Loss: 0.652, Acc: 87.22%, F1: 0.84 | Loss: 0.622, Acc: 87.22%, F1: 0.77 | 0.7603 |
| EfficientNet B0 Elim 3 | 0.30M | 181 | Loss: 0.602, Acc: 78.17%, F1: 0.74 | Loss: 0.661, Acc: 77.41%, F1: 0.74 | 0.7349 |
| Resnet18 | 11.17M | 69 | Loss: 0.578, Acc: 78.90%, F1: 0.76 | Loss: 0.700, Acc: 76.17%, F1: 0.719 | - |
| Resnet18 Elim 2 | 0.68M | 37 | Loss: 0.447, Acc: 83.73%, F1: 0.71 | Loss: 0.712, Acc: 75.42%, F1: 0.71 | - |

ELimNet vs Pretrained Models (Inference)

| Model | # of Parameters | # of Layers | CPU time (sec) | CUDA time (sec) | Test Inference Time (sec) |
| --- | --- | --- | --- | --- | --- |
| Pretrained EfficientNet B0 | 4.0M | 352 | 3.9 | 4.0 | 105.7 |
| EfficientNet B0 Elim 2 | 0.9M | 245 | 4.1 | 13.0 | 83.4 |
| EfficientNet B0 Elim 3 | 0.30M | 181 | 3.0 | 9.0 | 73.5 |
| Resnet18 | 11.17M | 69 | - | - | - |
| Resnet18 Elim 2 | 0.68M | 37 | - | - | - |
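
The CPU/CUDA columns look like profiler-style measurements. One common way to collect such numbers, shown here as an assumption rather than the repository's actual procedure, is torch.autograd.profiler:

import torch
from torchvision import models

# Any model can be profiled this way; EfficientNet B0 stands in for the
# table's entries.
model = models.efficientnet_b0(weights="IMAGENET1K_V1").cuda().eval()
x = torch.randn(1, 3, 224, 224, device="cuda")

with torch.no_grad(), torch.autograd.profiler.profile(use_cuda=True) as prof:
    model(x)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))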

ELimNet vs Empty Optuna NAS Models (Train)

| [100 epochs] | # of Parameters | # of Layers | Train | Validation | Test F1 |
| --- | --- | --- | --- | --- | --- |
| Empty MobileNet V3 | 4.2M | 227 | Loss: 0.925, Acc: 65.18%, F1: 0.58 | Loss: 0.993, Acc: 62.83%, F1: 0.56 | - |
| Empty EfficientNet B0 | 1.3M | 352 | Loss: 0.867, Acc: 67.28%, F1: 0.61 | Loss: 0.898, Acc: 66.80%, F1: 0.61 | 0.6337 |
| Empty DWConv & InvertedResidualv3 NAS | 0.08M | 66 | - | Loss: 0.766, Acc: 71.71%, F1: 0.68 | 0.6740 |
| Empty MBConv NAS | 0.33M | 141 | Loss: 0.786, Acc: 70.72%, F1: 0.66 | Loss: 0.866, Acc: 68.09%, F1: 0.62 | 0.6245 |
| Resnet18 Elim 2 | 0.68M | 37 | Loss: 0.447, Acc: 83.73%, F1: 0.71 | Loss: 0.712, Acc: 75.42%, F1: 0.71 | - |
| EfficientNet B0 Elim 3 | 0.30M | 181 | Loss: 0.602, Acc: 78.17%, F1: 0.74 | Loss: 0.661, Acc: 77.41%, F1: 0.74 | 0.7349 |

ELimNet vs Empty Optuna NAS Models (Inference)

| Model | # of Parameters | # of Layers | CPU time (sec) | CUDA time (sec) | Test Inference Time (sec) |
| --- | --- | --- | --- | --- | --- |
| Empty MobileNet V3 | 4.2M | 227 | 4 | 13 | - |
| Empty EfficientNet B0 | 1.3M | 352 | 3.780 | 3.782 | 68.4 |
| Empty DWConv & InvertedResidualv3 NAS | 0.08M | 66 | 1 | 3.5 | 61.1 |
| Empty MBConv NAS | 0.33M | 141 | 2.14 | 7.201 | 67.1 |
| Resnet18 Elim 2 | 0.68M | 37 | - | - | - |
| EfficientNet B0 Elim 3 | 0.30M | 181 | 3.0 | 9.0 | 73.5 |

Background & WiP

Work in Progress

  • Will test replacing pretrained convolutional blocks with a single convolutional layer without pretrained weights.
  • Will add ResNet18's inference-time data and compare it against Optuna's NAS-constructed lightweight models.
  • Will test elimination-based lightweight architecture search on torchvision's pretrained MobileNetV3 and MnasNet.
  • Will apply the method to other small datasets such as Fashion-MNIST and Plant Village.

Others

  • "Empty" stands for model with no pretrained weights.
  • "EfficientNet B0 Elim 2" means 2 convolutional blocks have been eliminated from pretrained EfficientNet B0. Number next to "Elim" annotates how many convolutional blocks have been removed.
  • Table's performance illustrates best performance out of 100 epochs of finetuning on TACO Dataset.

Authors

snoop2head