A PyTorch implementation of DenseNet.

Overview

A PyTorch Implementation of DenseNet

This is a PyTorch implementation of the DenseNet-BC architecture as described in the paper Densely Connected Convolutional Networks by G. Huang, Z. Liu, K. Weinberger, and L. van der Maaten. This implementation gets a CIFAR-10+ error rate of 4.77 with a 100-layer DenseNet-BC with a growth rate of 12. Their official implementation and links to many other third-party implementations are available in the liuzhuang13/DenseNet repo on GitHub.

Why DenseNet?

As this table from the DenseNet paper shows, it provides competitive state of the art results on CIFAR-10, CIFAR-100, and SVHN.

Why yet another DenseNet implementation?

PyTorch is a great new framework and it's nice to have these kinds of re-implementations around so that they can be integrated with other PyTorch projects.

How do you know this implementation is correct?

Interestingly while implementing this, I had a lot of trouble getting it to converge and looked at every part of the code closer than I usually would. I compared all of the model's hidden states and gradients with the official implementation to make sure my code was correct and even trained a VGG-style network on CIFAR-10 with the training code here. It turns out that I uncovered a new critical PyTorch bug (now fixed) that was causing this.

I have left around my original message about how this isn't working and the things that I have checked in this document. I think this should be interesting for other people to see my development and debugging strategies when having issues implementing a model that's known to converge. I also started this PyTorch forum thread, which has a few other discussion points. You may also be interested in my script that compares PyTorch gradients to Torch gradients and my script that numerically checks PyTorch gradients.

My convergence issues were due to a critical PyTorch bug related to using torch.cat with convolutions with cuDNN enabled (which it is by default when CUDA is used). This bug caused incorrect gradients and the fix to this bug is to disable cuDNN (which doesn't have to be done anymore because it's fixed). The oversight in my debugging strategies that caused me to not find this error is that I did not think to disable cuDNN. Until now, I have assumed that the cuDNN option in frameworks are bug-free, but have learned that this is not always the case. I may have also found something if I would have numerically debugged torch.cat layers with convolutions instead of fully connected layers.

Adam fixed the PyTorch bug that caused this in this PR and has been merged into Torch's master branch. If you are interested in using the DenseNet code in this repository, make sure your PyTorch version contains this PR and was downloaded after 2017-02-10.

What does the PyTorch compute graph of the model look like?

You can see the compute graph here, which I created with make_graph.py, which I copied from Adam Paszke's gist. Adam says PyTorch will soon have a better way to create compute graphs.

How does this implementation perform?

By default, this repo trains a 100-layer DenseNet-BC with an growth rate of 12 on the CIFAR-10 dataset with data augmentations. Due to GPU memory sizes, this is the largest model I am able to run. The paper reports a final test error of 4.51 with this architecture and we obtain a final test error of 4.77.

Why don't people use ADAM instead of SGD for training ResNet-style models?

I also tried training a net with ADAM and found that it didn't converge as well with the default hyper-parameters compared to SGD with a reasonable learning rate schedule.

What about the non-BC version?

I haven't tested this as thoroughly, you should make sure it's working as expected if you plan to use and modify it. Let me know if you find anything wrong with it.

A paradigm for ML code

I like to include a few features in my projects that I don't see in some other re-implementations that are present in this repo. The training code in train.py uses argparse so the batch size and some other hyper-params can easily be changed and as the model is training, progress is written out to csv files in a work directory also defined by the arguments. Then a separate script plot.py plots the progress written out by the training script. The training script calls plot.py after every epoch, but it can importantly be run on its own so figures can be tweaked without re-running the entire experiment.

Help wanted: Improving memory utilization and multi-GPU support

I think there are ways to improve the memory utilization in this code as in the the official space-efficient Torch implementation. I also would be interested in multi-GPU support.

Running the code and viewing convergence

First install PyTorch (ideally in an anaconda3 distribution). ./train.py will create a model, start training it, and save progress to args.save, which is work/cifar10.base by default. The training script will call plot.py after every epoch to create plots from the saved progress.

Citations

The following is a BibTeX entry for the DenseNet paper that you should cite if you use this model.

@article{Huang2016Densely,
  author = {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.},
  title = {Densely Connected Convolutional Networks},
  journal = {arXiv preprint arXiv:1608.06993},
  year = {2016}
}

If you use this implementation, please also consider citing this implementation and code repository with the following BibTeX or plaintext entry. The BibTeX entry requires the url LaTeX package.

@misc{amos2017densenet,
  title = {{A PyTorch Implementation of DenseNet}},
  author = {Amos, Brandon and Kolter, J. Zico},
  howpublished = {\url{https://github.com/bamos/densenet.pytorch}},
  note = {Accessed: [Insert date here]}
}

Brandon Amos, J. Zico Kolter
A PyTorch Implementation of DenseNet
https://github.com/bamos/densenet.pytorch.
Accessed: [Insert date here]

Licensing

This repository is Apache-licensed.

Owner
Brandon Amos
Brandon Amos
Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

BiDR Repo for WWW 2022 paper: Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval. Requirements torch==

Microsoft 11 Oct 20, 2022
Official implementation of NPMs: Neural Parametric Models for 3D Deformable Shapes - ICCV 2021

NPMs: Neural Parametric Models Project Page | Paper | ArXiv | Video NPMs: Neural Parametric Models for 3D Deformable Shapes Pablo Palafox, Aljaz Bozic

PabloPalafox 109 Nov 22, 2022
On-device speech-to-index engine powered by deep learning.

On-device speech-to-index engine powered by deep learning.

Picovoice 30 Nov 24, 2022
ByteTrack with ReID module following the paradigm of FairMOT, tracking strategy is borrowed from FairMOT/JDE.

ByteTrack_ReID ByteTrack is the SOTA tracker in MOT benchmarks with strong detector YOLOX and a simple association strategy only based on motion infor

Han GuangXin 46 Dec 29, 2022
An architecture that makes any doodle realistic, in any specified style, using VQGAN, CLIP and some basic embedding arithmetics.

Sketch Simulator An architecture that makes any doodle realistic, in any specified style, using VQGAN, CLIP and some basic embedding arithmetics. See

12 Dec 18, 2022
OneShot Learning-based hotword detection.

EfficientWord-Net Hotword detection based on one-shot learning Home assistants require special phrases called hotwords to get activated (eg:"ok google

ANT-BRaiN 102 Dec 25, 2022
Implementation of "Large Steps in Inverse Rendering of Geometry"

Large Steps in Inverse Rendering of Geometry ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia), December 2021. Baptiste Nicolet · Alec Jacob

RGL: Realistic Graphics Lab 274 Jan 06, 2023
Bilinear attention networks for visual question answering

Bilinear Attention Networks This repository is the implementation of Bilinear Attention Networks for the visual question answering and Flickr30k Entit

Jin-Hwa Kim 506 Nov 29, 2022
Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks

OnsagerNet Learning hidden low dimensional dyanmics using a Generalized Onsager Principle and neural networks This is the original pyTorch implemenati

Haijun.Yu 3 Aug 24, 2022
Citation Intent Classification in scientific papers using the Scicite dataset an Pytorch

Citation Intent Classification Table of Contents About the Project Built With Installation Usage Acknowledgments About The Project Citation Intent Cla

Federico Nocentini 4 Mar 04, 2022
Checking fibonacci - Generating the Fibonacci sequence is a classic recursive problem

Fibonaaci Series Generating the Fibonacci sequence is a classic recursive proble

Moureen Caroline O 1 Feb 15, 2022
Perception-aware multi-sensor fusion for 3D LiDAR semantic segmentation (ICCV 2021)

Perception-Aware Multi-Sensor Fusion for 3D LiDAR Semantic Segmentation (ICCV 2021) [中文|EN] 概述 本工作主要探索一种高效的多传感器(激光雷达和摄像头)融合点云语义分割方法。现有的多传感器融合方法主要将点云投影

ICE 126 Dec 30, 2022
PyTorch Implementation of AnimeGANv2

PyTorch implementation of AnimeGANv2

4k Jan 07, 2023
Instant-nerf-pytorch - NeRF trained SUPER FAST in pytorch

instant-nerf-pytorch This is WORK IN PROGRESS, please feel free to contribute vi

94 Nov 22, 2022
Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS of first stage is 3.42 and second stage is 3.47.

SDDNet Coarse implement of the paper "A Simultaneous Denoising and Dereverberation Framework with Target Decoupling", On DNS-2020 dataset, the DNSMOS

Cyril Lv 43 Nov 21, 2022
Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique

AOS: Airborne Optical Sectioning Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique that employs manned or unmanned airc

JKU Linz, Institute of Computer Graphics 39 Dec 09, 2022
Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Free Book about Deep-Learning approaches for Chess (like AlphaZero, Leela Chess Zero and Stockfish NNUE)

Dominik Klein 189 Dec 21, 2022
The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble

Wordle RL The aim of this project is to build an AI bot that can play the Wordle game, or more generally Squabble I know there are more deterministic

Aditya Arora 3 Feb 22, 2022
Python framework for Stochastic Differential Equations modeling

SDElearn: a Python package for SDE modeling This package implements functionalities for working with Stochastic Differential Equations models (SDEs fo

4 May 10, 2022
Code for the paper "Improved Techniques for Training GANs"

Status: Archive (code is provided as-is, no updates expected) improved-gan code for the paper "Improved Techniques for Training GANs" MNIST, SVHN, CIF

OpenAI 2.2k Jan 01, 2023