SOTA model in CIFAR10

Overview

A PyTorch Implementation of CIFAR Tricks

调研了CIFAR10数据集上各种trick,数据增强,正则化方法,并进行了实现。目前项目告一段落,如果有更好的想法,或者希望一起维护这个项目可以提issue或者在我的主页找到我的联系方式。

0. Requirements

  • Python 3.6+
  • torch=1.8.0+cu111
  • torchvision+0.9.0+cu111
  • tqdm=4.26.0
  • PyYAML=6.0

1. Implements

1.1 Tricks

  • Warmup
  • Cosine LR Decay
  • SAM
  • Label Smooth
  • KD
  • Adabound
  • Xavier Kaiming init
  • lr finder

1.2 Augmentation

  • Auto Augmentation
  • Cutout
  • Mixup
  • RICAP
  • Random Erase
  • ShakeDrop

2. Training

2.1 CIFAR-10训练示例

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True

3. Results

3.1 原pytorch-ricap的结果

Model Error rate Loss Error rate (paper)
WideResNet28-10 baseline 3.82(96.18) 0.158 3.89
WideResNet28-10 +RICAP 2.82(97.18) 0.141 2.85
WideResNet28-10 +Random Erasing 3.18(96.82) 0.114 4.65
WideResNet28-10 +Mixup 3.02(96.98) 0.158 3.02

3.2 Reimplementation结果

Model Error rate Loss Error rate (paper)
WideResNet28-10 baseline 3.78(96.22) 3.89
WideResNet28-10 +RICAP 2.81(97.19) 2.85
WideResNet28-10 +Random Erasing 3.03(96.97) 0.113 4.65
WideResNet28-10 +Mixup 2.93(97.07) 0.158 3.02

3.3 Half data快速训练验证各网络结构

reimplementation models(no augmentation, half data,epoch200,bs128)

Model Error rate Loss
lenet(cpu爆炸) (70.76)
wideresnet 3.78(96.22)
resnet20 (89.72)
senet (92.34)
resnet18 (92.08)
resnet34 (92.48)
resnet50 (91.72)
regnet (92.58)
nasnet out of mem
shake_resnet26_2x32d (93.06)
shake_resnet26_2x64d (94.14)
densenet (92.06)
dla (92.58)
googlenet (91.90) 0.2675
efficientnetb0(利用率低且慢) (86.82) 0.5024
mobilenet(利用率低) (89.18)
mobilenetv2 (91.06)
pnasnet (90.44)
preact_resnet (90.76)
resnext (92.30)
vgg(cpugpu利用率都高) (88.38)
inceptionv3 (91.84)
inceptionv4 (91.10)
inception_resnet_v2 (83.46)
rir (92.34) 0.3932
squeezenet(CPU利用率高) (89.16) 0.4311
stochastic_depth_resnet18 (90.22)
xception
dpn (92.06) 0.3002
ge_resnext29_8x64d (93.86) 巨慢

3.4 测试cpu gpu影响

TEST: scale/kernel ToyNet

修改网络的卷积层深度,并进行训练,可以得到以下结论:

结论:lenet这种卷积量比较少,只有两层的,cpu利用率高,gpu利用率低。在这个基础上增加深度,用vgg那种直筒方式增加深度,发现深度越深,cpu利用率越低,gpu利用率越高。

修改训练过程的batch size,可以得到以下结论:

结论:bs会影响收敛效果。

3.5 StepLR优化下测试cutout和mixup

architecture epoch cutout mixup C10 test acc (%)
shake_resnet26_2x64d 200 96.33
shake_resnet26_2x64d 200 96.99
shake_resnet26_2x64d 200 96.60
shake_resnet26_2x64d 200 96.46

3.6 测试SAM,ASAM,Cosine,LabelSmooth

architecture epoch SAM ASAM Cosine LR Decay LabelSmooth C10 test acc (%)
shake_resnet26_2x64d 200 96.51
shake_resnet26_2x64d 200 96.80
shake_resnet26_2x64d 200 96.61
shake_resnet26_2x64d 200 96.57

PS:其他库在加长训练过程(epoch=1800)情况下可以实现 shake_resnet26_2x64d achieved 97.71% test accuracy with cutout and mixup!!

3.7 测试cosine lr + shake

architecture epoch cutout mixup C10 test acc (%)
shake_resnet26_2x64d 300 96.66
shake_resnet26_2x64d 300 97.21
shake_resnet26_2x64d 300 96.90
shake_resnet26_2x64d 300 96.73

1800 epoch CIFAR ZOO中结果,由于耗时过久,未进行复现。

architecture epoch cutout mixup C10 test acc (%)
shake_resnet26_2x64d 1800 96.94(cifar zoo)
shake_resnet26_2x64d 1800 97.20(cifar zoo)
shake_resnet26_2x64d 1800 97.42(cifar zoo)
shake_resnet26_2x64d 1800 97.71(cifar zoo)

3.8 Divide and Co-training方案研究

  • lr:
    • warmup (20 epoch)
    • cosine lr decay
    • lr=0.1
    • total epoch(300 epoch)
  • bs=128
  • aug:
    • Random Crop and resize
    • Random left-right flipping
    • AutoAugment
    • Normalization
    • Random Erasing
    • Mixup
  • weight decay=5e-4 (bias and bn undecayed)
  • kaiming weight init
  • optimizer: nesterov

复现:((v100:gpu1) 4min*300/60=20h) top1: 97.59% 本项目目前最高值。

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \ 
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'

3.9 测试多种数据增强

architecture epoch cutout mixup autoaugment random-erase C10 test acc (%)
shake_resnet26_2x64d 200 96.42
shake_resnet26_2x64d 200 96.49
shake_resnet26_2x64d 200 96.17
shake_resnet26_2x64d 200 96.25
shake_resnet26_2x64d 200 96.20
shake_resnet26_2x64d 200 95.82
shake_resnet26_2x64d 200 96.02
shake_resnet26_2x64d 200 96.00
shake_resnet26_2x64d 200 95.83
shake_resnet26_2x64d 200 95.89
shake_resnet26_2x64d 200 96.25
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True  --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm'  --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True  --bs 64

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

Owner
PJDong
Computer vision learner, deep learner
PJDong
MVSDF - Learning Signed Distance Field for Multi-view Surface Reconstruction

MVSDF - Learning Signed Distance Field for Multi-view Surface Reconstruction This is the official implementation for the ICCV 2021 paper Learning Sign

110 Dec 20, 2022
VIsually-Pivoted Audio and(N) Text

VIP-ANT: VIsually-Pivoted Audio and(N) Text Code for the paper Connecting the Dots between Audio and Text without Parallel Data through Visual Knowled

Yän.PnG 16 Nov 04, 2022
Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

CGTransformer Code for our AAAI 2022 paper "Contrastive-Geometry Transformer network for Generalized 3D Pose Transfer" Contrastive-Geometry Transforme

18 Jun 28, 2022
Implementation of Bagging and AdaBoost Algorithm

Bagging-and-AdaBoost Implementation of Bagging and AdaBoost Algorithm Dataset Red Wine Quality Data Sets For simplicity, we will have 2 classes of win

Zechen Ma 1 Nov 01, 2021
ONNX Command-Line Toolbox

ONNX Command Line Toolbox Aims to improve your experience of investigating ONNX models. Use it like onnx infershape /path/to/model.onnx. (See the usag

黎明灰烬 (王振华 Zhenhua WANG) 23 Nov 13, 2022
Tensorflow 2.x implementation of Vision-Transformer model

Vision Transformer Unofficial Tensorflow 2.x implementation of the Transformer based Image Classification model proposed by the paper AN IMAGE IS WORT

Soumik Rakshit 16 Jul 20, 2022
[CVPR 2021] Pytorch implementation of Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs In this work, we propose a framework HijackGAN, which enables non-linear latent space travers

Hui-Po Wang 46 Sep 05, 2022
Keras Image Embeddings using Contrastive Loss

Image to Embedding projection in vector space. Implementation in keras and tensorflow of batch all triplet loss for one-shot/few-shot learning.

Shravan Anand K 5 Mar 21, 2022
StyleGAN - Official TensorFlow Implementation

StyleGAN — Official TensorFlow Implementation Picture: These people are not real – they were produced by our generator that allows control over differ

NVIDIA Research Projects 13.1k Jan 09, 2023
Mscp jamf - Build compliance in jamf

mscp_jamf Build compliance in Jamf. This will build the following xml pieces to

Bob Gendler 3 Jul 25, 2022
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

bottom-up-attention This code implements a bottom-up attention model, based on multi-gpu training of Faster R-CNN with ResNet-101, using object and at

Peter Anderson 1.3k Jan 09, 2023
A object detecting neural network powered by the yolo architecture and leveraging the PyTorch framework and associated libraries.

Yolo-Powered-Detector A object detecting neural network powered by the yolo architecture and leveraging the PyTorch framework and associated libraries

Luke Wilson 1 Dec 03, 2021
Official implementation of the Neurips 2021 paper Searching Parameterized AP Loss for Object Detection.

Parameterized AP Loss By Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai This is the official implementation of the Neurips 2021

46 Jul 06, 2022
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 02, 2023
Pytorch implementation for "Adversarial Robustness under Long-Tailed Distribution" (CVPR 2021 Oral)

Adversarial Long-Tail This repository contains the PyTorch implementation of the paper: Adversarial Robustness under Long-Tailed Distribution, CVPR 20

Tong WU 89 Dec 15, 2022
FreeSOLO for unsupervised instance segmentation, CVPR 2022

FreeSOLO: Learning to Segment Objects without Annotations This project hosts the code for implementing the FreeSOLO algorithm for unsupervised instanc

NVIDIA Research Projects 253 Jan 02, 2023
Deep Learning Training Scripts With Python

Deep Learning Training Scripts DNN Frameworks Caffe PyTorch Tensorflow CNN Models VGG ResNet DenseNet Inception Language Modeling GatedCNN-LM Attentio

Multicore Computing Research Lab 16 Dec 15, 2022
Malmo Collaborative AI Challenge - Team Pig Catcher

The Malmo Collaborative AI Challenge - Team Pig Catcher Approach The challenge involves 2 agents who can either cooperate or defect. The optimal polic

Kai Arulkumaran 66 Jun 29, 2022
Optimizes image files by converting them to webp while also updating all references.

About Optimizes images by (re-)saving them as webp. For every file it replaced it automatically updates all references. Works on single files as well

Watermelon Wolverine 18 Dec 23, 2022
Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Racing Line Optimization with PSO This repository contains a racing line optimization algorithm in python that uses Particle Swarm Optimization. Requi

Parsa Dahesh 6 Dec 14, 2022