A memory-efficient implementation of DenseNets

Last update: Dec 25, 2022

Overview

efficient_densenet_pytorch

A PyTorch >=1.0 implementation of DenseNets, optimized to save GPU memory.

Recent updates

Now works on PyTorch 1.0! It uses the checkpointing feature, which makes this code WAY more efficient!!!

Motivation

While DenseNets are fairly easy to implement in deep learning frameworks, most implementations (such as the original) tend to be memory-hungry. In particular, the number of intermediate feature maps generated by batch normalization and concatenation operations grows quadratically with network depth. It is worth emphasizing that this is not a property inherent to DenseNets, but rather to the implementation.
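The quadratic growth can be illustrated with a rough count. The sketch below is a simplification, not a measurement: it counts channels per spatial position inside a single dense block, assumes the naive version keeps one concatenated copy and one batch-normalized copy of each layer's input, and ignores bottleneck and transition layers. The function names and the example values (growth rate 12, 24 initial channels) are chosen only for illustration.

```python
def naive_stored_channels(num_layers, growth_rate, init_channels):
    """Channels kept alive when every concat and batch-norm output is stored."""
    total = 0
    for layer in range(num_layers):
        # Each layer sees the block input plus the outputs of all earlier layers.
        in_channels = init_channels + layer * growth_rate
        # Assumption: one stored copy for the concatenation, one for batch norm.
        total += 2 * in_channels
    return total


def layer_output_channels(num_layers, growth_rate, init_channels):
    """Channels that are inherently needed: the block input plus each layer's output."""
    return init_channels + num_layers * growth_rate


if __name__ == "__main__":
    for depth in (16, 32, 64):
        naive = naive_stored_channels(depth, growth_rate=12, init_channels=24)
        inherent = layer_output_channels(depth, growth_rate=12, init_channels=24)
        print(f"{depth} layers: naive={naive}, inherent={inherent}")
```

Doubling the number of layers roughly quadruples the naive count, while the count of layer outputs only doubles.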

This implementation uses a new strategy to reduce the memory consumption of DenseNets. We use checkpointing to compute the Batch Norm and concatenation feature maps. These intermediate feature maps are discarded during the forward pass and recomputed for the backward pass. This adds 15-20% time overhead for training, but reduces the memory consumed by feature maps from quadratic to linear in network depth.
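A minimal sketch of the checkpointing idea follows. It is not the code in models/densenet.py: the class name CheckpointedBottleneck and its layout are hypothetical, and the use_reentrant=False argument assumes a recent PyTorch release (older versions do not accept it). The point is only that the concatenation, batch norm, and ReLU outputs live inside a checkpointed function, so they are freed after the forward pass and recomputed during the backward pass.

```python
import torch
import torch.nn as nn
import torch.utils.checkpoint as cp


class CheckpointedBottleneck(nn.Module):
    """Hypothetical layer: concat -> batch norm -> ReLU -> 1x1 conv."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def _concat_norm_conv(self, *prev_features):
        # The large intermediates (concatenation, batch norm, ReLU) are created here.
        concatenated = torch.cat(prev_features, dim=1)
        return self.conv(self.relu(self.norm(concatenated)))

    def forward(self, *prev_features):
        # checkpoint() stores only the inputs and the output of the wrapped
        # function, then reruns it during backward to rebuild the intermediates.
        # Assumption: use_reentrant is available (recent PyTorch versions).
        return cp.checkpoint(self._concat_norm_conv, *prev_features, use_reentrant=False)


if __name__ == "__main__":
    layer = CheckpointedBottleneck(in_channels=10, out_channels=5)
    a = torch.randn(2, 4, 8, 8, requires_grad=True)
    b = torch.randn(2, 6, 8, 8, requires_grad=True)
    out = layer(a, b)
    out.sum().backward()
    print(out.shape, a.grad.shape)
```

The layer receives the earlier feature maps as separate tensors rather than one pre-concatenated tensor, so the concatenated copy itself is among the intermediates that get discarded.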

This implementation is inspired by this technical report, which outlines a strategy for efficient DenseNets via memory sharing.

Requirements

PyTorch >=1.0.0
CUDA

Usage

In your existing project: There is one file in the models folder.

models/densenet.py is an implementation based off the torchvision and project killer implementations.

If you care about speed and memory is not a concern, pass the efficient=False argument into the DenseNet constructor. Otherwise, pass in efficient=True.

Options:

All options are described in the docstrings of the model files
The depth is controlled by the block_config option
efficient=True uses the memory-efficient version
If you want to use the model for ImageNet, set small_inputs=False. For CIFAR or SVHN, set small_inputs=True.
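As a rough guide to how a depth maps to block_config, the sketch below assumes the standard three-block DenseNet-BC layout, where depth = 3 * 2 * n + 4 for n layers per block (each layer contributes a 1x1 and a 3x3 convolution; the remaining 4 are the initial convolution, two transition convolutions, and the classifier). The helper block_config_for_depth is hypothetical and not part of this repository.

```python
def block_config_for_depth(depth):
    """Return layers per dense block for a three-block DenseNet-BC of the given depth."""
    num_blocks = 3
    convs_per_layer = 2  # 1x1 bottleneck conv + 3x3 conv
    remaining = depth - 4  # initial conv, two transition convs, classifier
    if remaining <= 0 or remaining % (num_blocks * convs_per_layer) != 0:
        raise ValueError(f"depth {depth} does not fit 3 * 2 * n + 4")
    layers_per_block = remaining // (num_blocks * convs_per_layer)
    return tuple(layers_per_block for _ in range(num_blocks))


if __name__ == "__main__":
    print(block_config_for_depth(100))
    print(block_config_for_depth(40))
    # The result would then be passed to the constructor together with the
    # options described above, e.g. (not executed here):
    #   DenseNet(block_config=block_config_for_depth(100),
    #            small_inputs=True, efficient=True)
```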

Running the demo:

The only extra package you need to install is python-fire:

pip install fire

Single GPU:

CUDA_VISIBLE_DEVICES=0 python demo.py --efficient True --data <path_to_folder_with_cifar10> --save <path_to_save_dir>

Multiple GPU:

CUDA_VISIBLE_DEVICES=0,1,2 python demo.py --efficient True --data <path_to_folder_with_cifar10> --save <path_to_save_dir>

Options:

--depth (int) - depth of the network (number of convolution layers) (default 40)
--growth_rate (int) - number of features added per DenseNet layer (default 12)
--n_epochs (int) - number of epochs for training (default 300)
--batch_size (int) - size of minibatch (default 256)
--seed (int) - manually set the random seed (default None)

Performance

A comparison of the two implementations (each is a DenseNet-BC with 100 layers, batch size 64, tested on an NVIDIA Pascal Titan-X):

Implementation	Memory consumption (GB/GPU)	Speed (sec/mini batch)
Naive	2.863	0.165
Efficient	1.605	0.207
Efficient (multi-GPU)	0.985	-
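The trade-off between the two single-GPU rows can be worked out directly from the table. The snippet below only restates the reported numbers as ratios; the variable names are illustrative.

```python
# Values copied from the table above (single-GPU rows).
naive_memory_gb, naive_sec_per_batch = 2.863, 0.165
efficient_memory_gb, efficient_sec_per_batch = 1.605, 0.207

# Fraction of memory saved by the efficient implementation.
memory_saving = 1 - efficient_memory_gb / naive_memory_gb

# Extra time per mini-batch relative to the naive implementation.
time_overhead = efficient_sec_per_batch / naive_sec_per_batch - 1

print(f"memory saved: {memory_saving:.1%}")
print(f"time overhead: {time_overhead:.1%}")
```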

Other efficient implementations

LuaTorch (by Gao Huang)
Tensorflow (by Joe Yearsley)
Caffe (by Tongcheng Li)

Reference

@article{pleiss2017memory,
  title={Memory-Efficient Implementation of DenseNets},
  author={Pleiss, Geoff and Chen, Danlu and Huang, Gao and Li, Tongcheng and van der Maaten, Laurens and Weinberger, Kilian Q},
  journal={arXiv preprint arXiv:1707.06990},
  year={2017}
}

Owner

Geoff Pleiss
