Instance-based label smoothing for improving deep neural networks generalization and calibration

Last update: Aug 13, 2022

Overview

Instance-based Label Smoothing for Neural Networks

PyTorch implementation of the algorithm.
This repository implements a newly proposed method for instance-based label smoothing in neural networks. Unlike standard label smoothing, the target probability mass is not distributed uniformly among the incorrect classes. Instead, each incorrect class is assigned a target probability proportional to its output score, relative to all the remaining classes, under a network trained with vanilla cross-entropy loss on the hard target labels.
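A minimal NumPy sketch contrasting the two kinds of targets, for illustration only. It assumes that the smoothing factor `epsilon` is taken from the true class, that a class's "output score" is the teacher network's softmax probability, and that these scores are renormalized over the incorrect classes. The function names (`uniform_smoothed_targets`, `instance_smoothed_targets`, `soft_cross_entropy`) are hypothetical and are not taken from Instance-based-smoothing.py.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def uniform_smoothed_targets(labels, num_classes, epsilon=0.1):
    """Standard smoothing variant: epsilon is split evenly among the K - 1 incorrect classes."""
    labels = np.asarray(labels)
    targets = np.full((labels.size, num_classes), epsilon / (num_classes - 1))
    targets[np.arange(labels.size), labels] = 1.0 - epsilon
    return targets


def instance_smoothed_targets(labels, teacher_logits, epsilon=0.1):
    """Instance-based variant (assumed form): epsilon is split in proportion to teacher scores."""
    labels = np.asarray(labels)
    rows = np.arange(labels.size)
    # Softmax probabilities of a network trained with vanilla cross-entropy.
    scores = np.exp(log_softmax(teacher_logits))
    # Drop the true class so that only the incorrect classes share epsilon.
    scores[rows, labels] = 0.0
    scores /= scores.sum(axis=1, keepdims=True)
    targets = epsilon * scores
    targets[rows, labels] = 1.0 - epsilon
    return targets


def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between soft target distributions and the model's predictions."""
    return float(-(np.asarray(targets) * log_softmax(logits)).sum(axis=1).mean())


labels = [0]
teacher_logits = [[4.0, 2.0, 0.0]]
print(uniform_smoothed_targets(labels, num_classes=3))
print(instance_smoothed_targets(labels, teacher_logits))
```

With these teacher logits both variants give the true class 0.9. The uniform targets give each incorrect class 0.05, whereas the instance-based targets give class 1, which the teacher scores higher, a larger share of the remaining 0.1 than class 2.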

The following figure summarizes the idea of our instance-based label smoothing, which aims to preserve information about the similarity structure between classes while training with label smoothing.

Requirements

Python 3.x
pandas
NumPy
PyTorch

Usage

Datasets

CIFAR10 / CIFAR100 / FashionMNIST

Files Content

The project has the following structure:

├── Vanilla-cross-entropy.py
├── Label-smoothing.py
├── Instance-based-smoothing.py
├── Models-evaluation.py
├── Network-distillation.py
├── utils
│   ├── data_loader.py
│   ├── utils.py
│   ├── evaluate.py
│   ├── params.json
├── models
│   ├── resnet.py
│   ├── densenet.py
│   ├── inception.py
│   ├── shallownet.py

Vanilla-cross-entropy.py trains the networks with cross-entropy loss and no label smoothing.
Label-smoothing.py trains the networks with cross-entropy loss and standard label smoothing.
Instance-based-smoothing.py trains the networks with cross-entropy loss and instance-based label smoothing.
Models-evaluation.py evaluates the trained networks.
Network-distillation.py distills the trained networks into a shallow 5-layer convolutional network.
models/ contains the implementations of the architectures used in our evaluation (ResNet, DenseNet, and Inception-V4), as well as the shallow CNN student network used in the distillation experiments.
utils/ contains the utility functions needed to train and evaluate the different models.
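Which distillation objective Network-distillation.py optimizes is not specified here. As an assumption, the NumPy sketch below shows one common choice, the formulation of Hinton et al. (2015): a temperature-softened KL divergence between teacher and student, mixed with ordinary cross-entropy on the hard labels. `distillation_loss`, `temperature`, and `alpha` are hypothetical names and defaults, not documented options of this repository.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits / temperature over the class axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Hinton-style distillation objective averaged over a batch (hypothetical defaults)."""
    labels = np.asarray(labels)
    rows = np.arange(labels.size)
    student_log_p = log_softmax(student_logits, temperature)
    teacher_log_p = log_softmax(teacher_logits, temperature)
    # KL(teacher || student) between the temperature-softened distributions.
    kl = (np.exp(teacher_log_p) * (teacher_log_p - student_log_p)).sum(axis=1).mean()
    # Standard cross-entropy of the student on the hard labels (temperature 1).
    ce = -log_softmax(student_logits)[rows, labels].mean()
    # Scaling by temperature**2 keeps soft-target gradients comparable across temperatures.
    return float(alpha * temperature ** 2 * kl + (1.0 - alpha) * ce)
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted hard-label cross-entropy remains.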

Example

python Instance-based-smoothing.py --dataset cifar10 --model resnet18 --num_classes 10

Arguments accepted by the training and evaluation scripts:

--lr type = float, default = 0.1, help = Initial learning rate (a weight decay of $10^{-4}$ is used).
--tr_size type = float, default = 0.8, help = Fraction of the original training set used for training; the remainder (0.2) is held out for validation.
--batch_size type = int, default = 512, help = Mini-batch size used during training.
--epochs type = int, default = 100, help = Number of training epochs.
--estop type = int, default = 10, help = Number of epochs without loss improvement before early stopping is triggered.
--ece_bins type = int, default = 10, help = Number of bins used to compute the expected calibration error (ECE).
--dataset type = str, help = Name of the dataset to use (cifar10 / cifar100 / fashionmnist).
--num_classes type = int, default = 10, help = Number of classes in the dataset.
--model type = str, help = Name of the model to train, e.g. resnet18 / resnet50 / inceptionv4 / densenet (the latter works for FashionMNIST only).
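The flags above map naturally onto Python's `argparse`. The sketch below rebuilds the documented interface and adds a small patience-based early-stopping helper to illustrate `--estop`. `build_parser` and `EarlyStopping` are hypothetical names, and the actual scripts may declare their arguments differently (for example with `choices` or defaults for `--dataset` and `--model`).

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line flags."""
    parser = argparse.ArgumentParser(description="Train or evaluate a model (sketch).")
    parser.add_argument("--lr", type=float, default=0.1, help="Initial learning rate.")
    parser.add_argument("--tr_size", type=float, default=0.8, help="Training fraction; the rest is for validation.")
    parser.add_argument("--batch_size", type=int, default=512, help="Mini-batch size.")
    parser.add_argument("--epochs", type=int, default=100, help="Number of training epochs.")
    parser.add_argument("--estop", type=int, default=10, help="Epochs without loss improvement before stopping.")
    parser.add_argument("--ece_bins", type=int, default=10, help="Number of bins for ECE.")
    parser.add_argument("--dataset", type=str, help="cifar10 / cifar100 / fashionmnist")
    parser.add_argument("--num_classes", type=int, default=10, help="Number of classes in the dataset.")
    parser.add_argument("--model", type=str, help="resnet18 / resnet50 / inceptionv4 / densenet")
    return parser


class EarlyStopping:
    """Signal a stop once the monitored loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


args = build_parser().parse_args(["--dataset", "cifar10", "--model", "resnet18", "--num_classes", "10"])
stopper = EarlyStopping(patience=args.estop)
```

Parsing the example command line from the Example section leaves every unspecified flag at its documented default.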

Results

The following table reports the comparison of the different methods on 3 datasets using 4 different architectures.
The experiments were repeated 3 times; the table reports the mean $\pm$ standard deviation of the log loss, expected calibration error (ECE), accuracy, distilled student network accuracy, and distilled student log loss.
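For reference, a minimal NumPy sketch of two of the reported metrics: log loss, and ECE computed over `--ece_bins` equal-width confidence bins using the usual definition (the bin-size-weighted average gap between accuracy and mean confidence). The function names are hypothetical, and the repository's utils/evaluate.py may handle bin edges or probability clipping differently.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width, right-closed confidence bins over (0, 1]."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Weight each bin's |accuracy - confidence| gap by its share of samples.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


def log_loss(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    picked = probs[np.arange(labels.size), labels]
    return float(-np.log(np.clip(picked, eps, 1.0)).mean())
```

A model that predicts with confidence 0.8 but is right only half the time lands in a single bin with a gap of 0.3, so its ECE is 0.3.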

A t-SNE visualization of the logits for 3 different classes in CIFAR-10 is shown below:


Owner

Mohamed Maher
