Implementing DropPath/StochasticDepth in PyTorch

DropPath is available in glasses, my computer vision library!

Introduction

Today we are going to implement Stochastic Depth, also known as Drop Path, in PyTorch! Stochastic Depth, introduced by Gao Huang et al., is a technique to "deactivate" some layers during training.

Let's take a look at a normal ResNet block that uses residual connections (like almost all models now). If you are not familiar with ResNet, I have an article showing how to implement it.

Basically, the block's output is added to its input: output = block(input) + input. This is called a residual connection.
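
As a minimal sketch of that formula, the wrapper below adds a module's output to its input. `Residual` is a hypothetical helper name I made up for illustration; it is not part of PyTorch.

```python
import torch
from torch import nn, Tensor


class Residual(nn.Module):
    """Hypothetical helper: wraps a shape-preserving module with a skip connection."""

    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block

    def forward(self, x: Tensor) -> Tensor:
        # output = block(input) + input
        return self.block(x) + x


# nn.Identity returns its input unchanged, so the result here is x + x
x = torch.ones((2, 3))
out = Residual(nn.Identity())(x)
```

The only requirement is that the wrapped block returns a tensor with the same shape as its input, otherwise the addition fails.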

Consider four ResNet-like blocks, one after the other.

Stochastic Depth/Drop Path randomly deactivates some of these blocks.

The idea is to reduce the number of layers/blocks used during training, saving time and making the network generalize better.

In practice, this means setting the block's output to zero before adding it to the input.
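
To make this concrete, here is a tiny sketch with a hand-written mask. The values of `block_out` and `mask` are made up for illustration, and rescaling is left out on purpose to keep the arithmetic obvious.

```python
import torch

x = torch.ones((4, 1, 1, 1))                # the block's input (the residual)
block_out = torch.full((4, 1, 1, 1), 3.0)   # pretend output of the block
# hand-picked mask for illustration: drop samples 0 and 2, keep samples 1 and 3
mask = torch.tensor([0.0, 1.0, 0.0, 1.0]).view(4, 1, 1, 1)

out = block_out * mask + x
# dropped samples come out equal to the input (1), kept samples are 3 + 1 = 4
```

For the dropped samples the block behaves as if it was not there at all.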

Implementation

Let's start by importing our best friend, torch.

import torch
from torch import nn
from torch import Tensor

We can define a 4D tensor (batch x channels x height x width). In our case, let's just send 4 images with one pixel each :)

x = torch.ones((4, 1, 1, 1))

We need a tensor of shape batch x 1 x 1 x 1 that will be used to set some of the elements in the batch to zero with a given probability. Bernoulli to the rescue!

keep_prob: float = .5
mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    
mask
tensor([[[[0.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])

Btw, this is equivalent to

mask: Tensor = (torch.rand(x.shape[0], 1, 1, 1) < keep_prob).float()
mask
tensor([[[[1.]]],


        [[[1.]]],


        [[[1.]]],


        [[[1.]]]])
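
A quick empirical check of that equivalence, assuming a keep_prob other than 0.5 so that the direction of the comparison matters. The thresholded version needs `rand < keep_prob` for the mask to be 1 with probability keep_prob. The sample size and seed are arbitrary choices.

```python
import torch

torch.manual_seed(0)
keep_prob = 0.8
n = 100_000

# mask filled in place with samples from a Bernoulli distribution
mask_a = torch.empty(n, 1, 1, 1).bernoulli_(keep_prob)
# mask built by thresholding uniform samples in [0, 1)
mask_b = (torch.rand(n, 1, 1, 1) < keep_prob).float()

mean_a = mask_a.mean().item()
mean_b = mask_b.mean().item()
print(mean_a, mean_b)  # both should be close to keep_prob
```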

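Before we multiply x by the mask, we need to divide x by keep_prob. This scales up the activations that survive, so the expected value of the output during training matches the unscaled output at inference time (inverted dropout, see cs231n). So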

x_scaled : Tensor = x / keep_prob
x_scaled
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])

Finally

output: Tensor = x_scaled * mask
output
tensor([[[[2.]]],


        [[[2.]]],


        [[[2.]]],


        [[[2.]]]])
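
Why divide by keep_prob? Each element of the mask is 1 with probability keep_prob, so on average mask * x / keep_prob equals x. Here is a rough numerical check on a large batch; the batch size and seed are arbitrary.

```python
import torch

torch.manual_seed(0)
keep_prob = 0.5
x = torch.ones((100_000, 1, 1, 1))

mask = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
output = (x / keep_prob) * mask

# each element is either 0 (dropped) or 1 / keep_prob = 2 (kept and rescaled)
values = sorted(output.unique().tolist())
# the batch mean stays close to 1, the mean of x
mean = output.mean().item()
```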

We can put it all together in a function

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x_scaled: Tensor = x / keep_prob
    return x_scaled * mask

drop_path(x, keep_prob=0.5)
tensor([[[[0.]]],


        [[[0.]]],


        [[[2.]]],


        [[[0.]]]])

We can also do the operation in place

def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    x.div_(keep_prob)
    x.mul_(mask)
    return x


drop_path(x, keep_prob=0.5)
tensor([[[[2.]]],


        [[[2.]]],


        [[[0.]]],


        [[[0.]]]])

However, we may want to use x somewhere else, and dividing x or mask by keep_prob gives the same result. This leads us to the final implementation

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1, 1, 1))
drop_path(x, keep_prob=0.8)
tensor([[[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]],


        [[[1.2500]]]])
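
To see what the inplace flag changes, here is a small sketch. The function is repeated only so the snippet runs on its own.

```python
import torch
from torch import Tensor


def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    # same function as above, repeated so the snippet is self-contained
    mask: Tensor = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x


x = torch.ones((4, 1, 1, 1))
out = drop_path(x, keep_prob=0.8, inplace=False)
untouched = torch.equal(x, torch.ones((4, 1, 1, 1)))  # x is still all ones

y = torch.ones((4, 1, 1, 1))
out_inplace = drop_path(y, keep_prob=0.8, inplace=True)
same_object = out_inplace is y  # the returned tensor is y itself, now modified
```

With inplace=False the caller's tensor is left alone, which is the safer default when x is reused, for example as the residual.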

drop_path only works for 2D data (4D tensors). To make it work for any kind of data, we need to compute the mask shape from the number of dimensions of the input

from typing import Tuple

def drop_path(x: Tensor, keep_prob: float = 1.0, inplace: bool = False) -> Tensor:
    # remember tuples have the * operator -> (1,) * 3 = (1,1,1)
    mask_shape: Tuple[int, ...] = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask: Tensor = x.new_empty(mask_shape).bernoulli_(keep_prob)
    mask.div_(keep_prob)
    if inplace:
        x.mul_(mask)
    else:
        x = x * mask
    return x

x = torch.ones((4, 1))
drop_path(x, keep_prob=0.8)
tensor([[0.],
        [0.],
        [0.],
        [0.]])
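
A short sketch of how the mask shape follows the input's number of dimensions. The input shapes are arbitrary examples, and keep_prob is fixed at 0.5 just for the demo.

```python
import torch

mask_shapes = []
per_sample_constant = []
for shape in [(4, 8), (4, 16, 8), (4, 3, 28, 28), (4, 3, 8, 28, 28)]:
    x = torch.ones(shape)
    # one entry per sample, and a 1 for every other dimension
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(mask_shape).bernoulli_(0.5)
    # broadcasting expands the mask over every non-batch dimension
    out = x * mask
    mask_shapes.append(mask_shape)
    # every sample is either entirely kept or entirely dropped
    flat = out.flatten(start_dim=1)
    per_sample_constant.append(bool((flat == flat[:, :1]).all()))
    print(shape, "->", mask_shape)
```

Because the mask has size 1 everywhere except the batch dimension, a whole sample is kept or dropped at once, never individual features.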

Let's create a nice DropPath nn.Module

class DropPath(nn.Module):
    def __init__(self, p: float = 0.5, inplace: bool = False):
        super().__init__()
        self.p = p
        self.inplace = inplace

    def forward(self, x: Tensor) -> Tensor:
        if self.training and self.p > 0:
            # p is the drop probability, while drop_path expects the keep probability
            x = drop_path(x, 1 - self.p, self.inplace)
        return x

    def __repr__(self):
        return f"{self.__class__.__name__}(p={self.p})"

    
DropPath()(torch.ones((4, 1)))
tensor([[2.],
        [0.],
        [0.],
        [0.]])
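
The self.training check matters: the module should only drop samples while training. The sketch below uses a trimmed-down copy of the module (no inplace option) and assumes p is the probability of dropping a sample, as in nn.Dropout, so it passes 1 - p as keep_prob.

```python
import torch
from torch import nn, Tensor


def drop_path(x: Tensor, keep_prob: float = 1.0) -> Tensor:
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = x.new_empty(mask_shape).bernoulli_(keep_prob)
    return x * mask.div_(keep_prob)


class DropPath(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p  # assumption: p is the probability of dropping a sample

    def forward(self, x: Tensor) -> Tensor:
        if self.training and self.p > 0:
            x = drop_path(x, keep_prob=1 - self.p)
        return x


layer = DropPath(p=0.5)
x = torch.ones((4, 1))

layer.train()
train_out = layer(x)  # random: each row is either 0 or 2

layer.eval()
eval_out = layer(x)   # deterministic: identical to x
```

Calling model.eval() on a parent model switches every DropPath inside it to the identity, so no extra code is needed at inference time.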

Usage with Residual Connections

We have our DropPath, cool, but how do we use it? We need a classic ResNet block, so let's implement our good old friend, the BottleNeck block

from torch import nn


class ConvBnAct(nn.Sequential):
    def __init__(self, in_features: int, out_features: int, kernel_size=1):
        super().__init__(
            nn.Conv2d(in_features, out_features, kernel_size=kernel_size, padding=kernel_size // 2),
            nn.BatchNorm2d(out_features),
            nn.ReLU()
        )
         

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct(out_features // reduction, out_features // reduction, kernel_size=3),
            # narrow -> wide
            ConvBnAct(out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        return x + res
    
    
BottleNeck(64, 64)(torch.ones((1,64, 28, 28))).shape
torch.Size([1, 64, 28, 28])

To deactivate the block the operation x + res must be equal to res, so our DropPath has to be applied after the block.

class BottleNeck(nn.Module):
    def __init__(self, in_features: int, out_features: int, reduction: int = 4):
        super().__init__()
        self.block = nn.Sequential(
            # wide -> narrow
            ConvBnAct(in_features, out_features // reduction, kernel_size=1),
            # narrow -> narrow
            ConvBnAct(out_features // reduction, out_features // reduction, kernel_size=3),
            # narrow -> wide
            ConvBnAct(out_features // reduction, out_features, kernel_size=1),
        )
        # I am lazy, no shortcut etc
        self.drop_path = DropPath()
        
    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        x = self.drop_path(x)
        return x + res
    
BottleNeck(64, 64)(torch.ones((1,64, 28, 28)))
tensor([[[[1.0009, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          ...,
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0134, 1.0034, 1.0034,  ..., 1.0034, 1.0034, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0005, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0421],
          [1.0000, 1.0011, 1.0011,  ..., 1.0011, 1.0011, 1.0247]],

         [[1.0203, 1.0123, 1.0123,  ..., 1.0123, 1.0123, 1.0299],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          ...,
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0005, 1.0005,  ..., 1.0005, 1.0005, 1.0548],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         ...,

         [[1.0011, 1.0180, 1.0180,  ..., 1.0180, 1.0180, 1.0465],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0245],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]],

         [[1.0130, 1.0170, 1.0170,  ..., 1.0170, 1.0170, 1.0213],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          ...,
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0052, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0065],
          [1.0012, 1.0139, 1.0139,  ..., 1.0139, 1.0139, 1.0065]],

         [[1.0103, 1.0181, 1.0181,  ..., 1.0181, 1.0181, 1.0539],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          ...,
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0001, 1.0016, 1.0016,  ..., 1.0016, 1.0016, 1.0231],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]]]],
       grad_fn=<AddBackward0>)

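Tada 🎉! Now, during training, the output of our .block is randomly zeroed for some samples in the batch, so those samples pass through the residual connection untouched!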
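
To check this on a whole batch, here is a sketch with a much smaller stand-in for the block. `TinyResidual` is a hypothetical class made up for this check; it uses a single nn.Linear instead of the convolutions and inlines the drop path logic so the snippet runs on its own.

```python
import torch
from torch import nn, Tensor


class TinyResidual(nn.Module):
    """Hypothetical stand-in for BottleNeck: a linear block with drop path on its branch."""

    def __init__(self, features: int, keep_prob: float = 0.5):
        super().__init__()
        self.block = nn.Linear(features, features)
        self.keep_prob = keep_prob

    def forward(self, x: Tensor) -> Tensor:
        res = x
        x = self.block(x)
        if self.training:
            # one Bernoulli draw per sample, broadcast over the features
            mask = x.new_empty(x.shape[0], 1).bernoulli_(self.keep_prob)
            x = x * mask / self.keep_prob
        return x + res


torch.manual_seed(0)
model = TinyResidual(8).train()
x = torch.randn(64, 8)
out = model(x)

# a sample whose branch was dropped comes out exactly equal to its input
dropped = (out == x).all(dim=1)
print(f"{int(dropped.sum())} of {len(x)} samples skipped the block")
```

Note that the block is still computed for every sample here and only its output is zeroed, so this sketch gives the regularization effect rather than a speed-up.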


Owner
Francesco Saverio Zuppichini
Computer Vision Engineer @ 🤗 BSc informatics. MSc AI. Artificial Intelligence /Deep Learning Enthusiast & Full Stack developer