SOTA model in CIFAR10

Overview

A PyTorch Implementation of CIFAR Tricks

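This project surveys tricks, data augmentation, and regularization methods for the CIFAR10 dataset and implements them. The project has wrapped up for now. If you have better ideas or would like to help maintain it, open an issue or find my contact information on my homepage.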

0. Requirements

  • Python 3.6+
  • torch=1.8.0+cu111
  • torchvision=0.9.0+cu111
  • tqdm=4.26.0
  • PyYAML=6.0

1. Implementations

1.1 Tricks

  • Warmup
  • Cosine LR Decay
  • SAM
  • Label Smoothing
  • KD (knowledge distillation)
  • AdaBound
  • Xavier / Kaiming init
  • LR finder
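
As an illustration of one of the tricks above, label smoothing can be sketched in plain Python. This is a minimal sketch of the common formulation (the true class gets 1 - eps + eps/K, every other class gets eps/K), not necessarily what this repository's code does; the function names and the default eps=0.1 are assumptions made for the example.

```python
import math


def smoothed_targets(label, num_classes, eps=0.1):
    """Smoothed one-hot target: eps/K on every class, plus 1 - eps on the true class."""
    off = eps / num_classes
    return [off + (1.0 - eps if k == label else 0.0) for k in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothing_loss(logits, label, eps=0.1):
    """Cross-entropy between the smoothed target and the softmax of the logits."""
    targets = smoothed_targets(label, len(logits), eps)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

With eps=0 the loss reduces to ordinary cross-entropy. With eps>0 a confident correct prediction is penalized slightly, which is the regularizing effect.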

1.2 Augmentation

  • Auto Augmentation
  • Cutout
  • Mixup
  • RICAP
  • Random Erasing
  • ShakeDrop
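
Two of the augmentations above, Mixup and Cutout, are simple enough to sketch without any framework. The following is a hedged illustration on plain Python lists, assuming a Beta(alpha, alpha) mixing weight for Mixup and a square mask clipped at the image border for Cutout; the helper names and defaults are hypothetical and not taken from this repository.

```python
import random


def mixup_pair(x_a, x_b, alpha=1.0, rng=random):
    """Blend two equally sized inputs; returns the mixed input and the weight lam."""
    lam = rng.betavariate(alpha, alpha)
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return mixed, lam


def mixup_loss(loss_a, loss_b, lam):
    """Mix the losses against both labels with the same weight used for the inputs."""
    return lam * loss_a + (1.0 - lam) * loss_b


def cutout_mask(height, width, size, rng=random):
    """Mask of 1s with a zeroed square of side `size` around a random center, clipped at borders."""
    cy, cx = rng.randrange(height), rng.randrange(width)
    y0, y1 = max(0, cy - size // 2), min(height, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(width, cx + size // 2)
    return [
        [0 if (y0 <= y < y1 and x0 <= x < x1) else 1 for x in range(width)]
        for y in range(height)
    ]
```

In training, the Cutout mask would be multiplied into each image channel, and the Mixup loss would be computed from the model output on the mixed input against both original labels.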

2. Training

2.1 CIFAR-10 training examples

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True

3. Results

3.1 Results from the original pytorch-ricap

| Model | Error rate % (accuracy %) | Loss | Error rate % (paper) |
| --- | --- | --- | --- |
| WideResNet28-10 baseline | 3.82 (96.18) | 0.158 | 3.89 |
| WideResNet28-10 +RICAP | 2.82 (97.18) | 0.141 | 2.85 |
| WideResNet28-10 +Random Erasing | 3.18 (96.82) | 0.114 | 4.65 |
| WideResNet28-10 +Mixup | 3.02 (96.98) | 0.158 | 3.02 |

3.2 Reimplementation results

| Model | Error rate % (accuracy %) | Loss | Error rate % (paper) |
| --- | --- | --- | --- |
| WideResNet28-10 baseline | 3.78 (96.22) | | 3.89 |
| WideResNet28-10 +RICAP | 2.81 (97.19) | | 2.85 |
| WideResNet28-10 +Random Erasing | 3.03 (96.97) | 0.113 | 4.65 |
| WideResNet28-10 +Mixup | 2.93 (97.07) | 0.158 | 3.02 |

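3.3 Quick validation of network architectures with half the data

Reimplemented models (no augmentation, half data, 200 epochs, batch size 128)

| Model | Error rate % (accuracy %) | Loss |
| --- | --- | --- |
| lenet (CPU usage maxed out) | (70.76) | |
| wideresnet | 3.78 (96.22) | |
| resnet20 | (89.72) | |
| senet | (92.34) | |
| resnet18 | (92.08) | |
| resnet34 | (92.48) | |
| resnet50 | (91.72) | |
| regnet | (92.58) | |
| nasnet | out of memory | |
| shake_resnet26_2x32d | (93.06) | |
| shake_resnet26_2x64d | (94.14) | |
| densenet | (92.06) | |
| dla | (92.58) | |
| googlenet | (91.90) | 0.2675 |
| efficientnetb0 (low utilization and slow) | (86.82) | 0.5024 |
| mobilenet (low utilization) | (89.18) | |
| mobilenetv2 | (91.06) | |
| pnasnet | (90.44) | |
| preact_resnet | (90.76) | |
| resnext | (92.30) | |
| vgg (both CPU and GPU utilization high) | (88.38) | |
| inceptionv3 | (91.84) | |
| inceptionv4 | (91.10) | |
| inception_resnet_v2 | (83.46) | |
| rir | (92.34) | 0.3932 |
| squeezenet (high CPU utilization) | (89.16) | 0.4311 |
| stochastic_depth_resnet18 | (90.22) | |
| xception | | |
| dpn | (92.06) | 0.3002 |
| ge_resnext29_8x64d | (93.86) | extremely slow |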

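3.4 Testing the effect on CPU and GPU utilization

TEST: scale/kernel ToyNet

Varying the depth of the network's convolutional layers and training leads to the following conclusion:

Conclusion: a network like lenet, with little convolution (only two layers), shows high CPU utilization and low GPU utilization. When depth is added on top of this in the straight-stacked VGG style, the deeper the network, the lower the CPU utilization and the higher the GPU utilization.

Varying the batch size used in training leads to the following conclusion:

Conclusion: batch size affects convergence.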

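3.5 Testing cutout and mixup with StepLR

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | 96.33 |
| shake_resnet26_2x64d | 200 | | | 96.99 |
| shake_resnet26_2x64d | 200 | | | 96.60 |
| shake_resnet26_2x64d | 200 | | | 96.46 |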

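3.6 Testing SAM, ASAM, Cosine LR Decay, and Label Smoothing

| architecture | epoch | SAM | ASAM | Cosine LR Decay | LabelSmooth | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | | | 96.51 |
| shake_resnet26_2x64d | 200 | | | | | 96.80 |
| shake_resnet26_2x64d | 200 | | | | | 96.61 |
| shake_resnet26_2x64d | 200 | | | | | 96.57 |

PS: With a longer training schedule (1800 epochs), other repositories report that shake_resnet26_2x64d achieves 97.71% test accuracy with cutout and mixup.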

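3.7 Testing cosine LR + shake

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 300 | | | 96.66 |
| shake_resnet26_2x64d | 300 | | | 97.21 |
| shake_resnet26_2x64d | 300 | | | 96.90 |
| shake_resnet26_2x64d | 300 | | | 96.73 |

Results at 1800 epochs from CIFAR-ZOO. They were not reproduced here because training takes too long.

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 1800 | | | 96.94 (CIFAR-ZOO) |
| shake_resnet26_2x64d | 1800 | | | 97.20 (CIFAR-ZOO) |
| shake_resnet26_2x64d | 1800 | | | 97.42 (CIFAR-ZOO) |
| shake_resnet26_2x64d | 1800 | | | 97.71 (CIFAR-ZOO) |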

3.8 Study of the Divide and Co-training recipe

  • lr:
    • warmup (20 epochs)
    • cosine lr decay
    • lr=0.1
    • total epochs: 300
  • bs=128
  • aug:
    • Random Crop and resize
    • Random left-right flipping
    • AutoAugment
    • Normalization
    • Random Erasing
    • Mixup
  • weight decay=5e-4 (bias and bn undecayed)
  • kaiming weight init
  • optimizer: nesterov
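
The learning-rate and weight-decay settings above can be sketched as follows. This is an illustration under stated assumptions, not the repository's 'warmcosine' scheduler: the warmup is assumed to be linear, the cosine is assumed to decay to zero, and "bias and bn undecayed" is approximated by the common heuristic that parameters with at most one dimension get no weight decay. Both function names are hypothetical.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=20, total_epochs=300):
    """Learning rate for a 0-indexed epoch: linear warmup, then cosine decay (assumed shape)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def split_weight_decay(named_shapes, weight_decay=5e-4):
    """Group parameter names: 1-D parameters (biases, BN scale/shift) get no decay.

    `named_shapes` is an iterable of (name, shape) pairs. With a real model the
    groups would hold the parameter tensors themselves rather than their names.
    """
    decay, no_decay = [], []
    for name, shape in named_shapes:
        (no_decay if len(shape) <= 1 else decay).append(name)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

Under these assumptions the rate climbs from 0.005 at epoch 0 to 0.1 at epoch 19, stays at 0.1 at epoch 20, and falls to 0.05 at epoch 160, the midpoint of the cosine phase.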

Reproduction (one V100 GPU, 4 min/epoch × 300 epochs / 60 = 20 h): top-1 97.59%, currently the best result in this project.

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'

3.9 Testing multiple data augmentations

| architecture | epoch | cutout | mixup | autoaugment | random-erase | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | | | 96.42 |
| shake_resnet26_2x64d | 200 | | | | | 96.49 |
| shake_resnet26_2x64d | 200 | | | | | 96.17 |
| shake_resnet26_2x64d | 200 | | | | | 96.25 |
| shake_resnet26_2x64d | 200 | | | | | 96.20 |
| shake_resnet26_2x64d | 200 | | | | | 95.82 |
| shake_resnet26_2x64d | 200 | | | | | 96.02 |
| shake_resnet26_2x64d | 200 | | | | | 96.00 |
| shake_resnet26_2x64d | 200 | | | | | 95.83 |
| shake_resnet26_2x64d | 200 | | | | | 95.89 |
| shake_resnet26_2x64d | 200 | | | | | 96.25 |

python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm' --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True --bs 64
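
The eleven commands above follow a regular pattern: no augmentation, each augmentation alone, then every pair. A small helper along the following lines could generate them instead of writing them by hand. The flags and run names are copied from the commands above; the helper itself (`build_commands`) is a hypothetical convenience and is not part of this repository.

```python
from itertools import combinations

# (name suffix, CLI flag) pairs, taken from the commands listed above.
AUGS = [
    ("c", "--cutout"),
    ("m", "--mixup"),
    ("a", "--autoaugmentation"),
    ("r", "--random-erase"),
]


def build_commands(model="shake_resnet26_2x64d", prefix="ss64", bs=64):
    """Return the command lines for: no augmentation, each single one, and each pair."""
    runs = [()] + [(aug,) for aug in AUGS] + list(combinations(AUGS, 2))
    commands = []
    for run in runs:
        # The baseline run keeps the original spelling of its name, 'orgin'.
        suffix = "".join(tag for tag, _ in run) or "orgin"
        flags = " ".join(f"{flag} True" for _, flag in run)
        parts = [
            f"python train.py --model '{model}' --name '{prefix}_{suffix}'",
            flags,
            f"--bs {bs}",
        ]
        commands.append(" ".join(part for part in parts if part))
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

Printing the commands rather than executing them keeps the sketch side-effect free. The output can be pasted into a shell script or a job scheduler.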

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

Owner
PJDong
Computer vision learner, deep learner