Convnet transfer - Code for paper How transferable are features in deep neural networks?

Overview

How transferable are features in deep neural networks?

This repository contains source code necessary to reproduce the results presented in the following paper:

@inproceedings{yosinski_2014_NIPS
  title={How transferable are features in deep neural networks?},
  author={Yosinski, Jason and Clune, Jeff and Bengio, Yoshua and Lipson, Hod},
  booktitle={Advances in Neural Information Processing Systems 27 (NIPS '14)},
  editor = {Z. Ghahramani and M. Welling and C. Cortes and N.D. Lawrence and K.Q. Weinberger},
  publisher = {Curran Associates, Inc.},
  pages = {3320--3328},
  year={2014}
}

The are four steps to using this codebase to reproduce the results in the paper.

  • Assemble prerequisites
  • Create datasets
  • Train models
  • Gather and plot results

Each is described below. Training results are also provided in the results directory for those just wishing to compare results to their own work without undertaking the arduous training process.

Assemble prerequisites

Several dependencies should be installed.

  • To run experiments: Caffe and its relevant dependencies (see install tutorial).
  • To produce plots: the IPython, numpy, and matplotlib packages for python. Depending on your setup, it may be possible to install these via pip install ipython numpy matplotlib.

Create Datasets

1. Obtain ILSVRC 2012 dataset

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 dataset can be downloaded here (registration required).

2. Create derivative dataset splits

The necessary smaller derivative datasets (random halves, natural and man-made halves, and reduced volume versions) can be created from the raw ILSVRC12 dataset.

$ cd ilsvrc12
$ ./make_reduced_datasets.sh

The script will do most of the work, including setting random seeds to hopefully produce the exact same random splits used in the paper. Md5sums are listed for each dataset file at the bottom of make_reduced_datasets.sh, which can be used to verify the match. Results may vary on different platforms though, so don't worry too much if your sums don't match.

3. Convert datasets to databases

The datasets created above are so far just text files providing a list of image filenames and class ids. To train a Caffe model, they should be converted to a LevelDB or LMDB, one per dataset. See the Caffe ImageNet Tutorial for a more in depth look at this process.

First, edit create_all_leveldbs.sh and set the IMAGENET_DIR and CAFFE_TOOLS_DIR to point to the directories containing the ImageNet image files and compiled caffe tools (like convert_imageset.bin), respectively. Then run:

$ ./create_all_leveldbs.sh

This step takes a lot of space (and time), approximately 230 GB for the base training dataset, and on average 115 GB for each of the 10 split versions, for a total of about 1.5 TB. If this is prohibitive, you might consider using a different type of data layer type for Caffe that loads images directly from a single shared directory.

4. Compute the mean of each dataset

Again, edit the paths in the script to point to the appropriate locations, and then run:

$ ./create_all_means.sh

This just computes the mean of each dataset and saves it in the dataset directory. Means are subtracted from input images during training and inference.

Train models

A total of 163 networks were trained to produce the results in the paper. Many of these networks can be trained in parallel, but because weights are transferred from one network to another, some must be trained serially. In particular, all networks in the first block below must be trained before any in the second block can be trained. All networks within a block may be trained at the same time. The "whenever" block does not contain dependencies and can be trained any time.

Block: one
  half*       (10 nets)

Block: two
  transfer*   (140 nets)

Block: whenever
  netbase     (1 net)
  reduced-*   (12 nets)

To train a given network, change to its directory, copy (or symlink) the required caffe executable, and run the training procedure. This can be accomplished using the following commands, demonstrated for the half0A network:

$ cd results/half0A
$ cp /path/to/caffe/build/tools/caffe.bin .
$ ./caffe.bin train -solver imagenet_solver.prototxt

Repeat this process for all networks in block: one and block: whenever above. Once the networks in block: one are trained, train all the networks in block: two similarly. This time the command is slightly different, because we need to load the base network in order to fine-tune it on the target task. Here's an example for the transfer0A0A_1_1 network:

$ cd results/transfer0A0A_1_1
$ cp /path/to/caffe/build/tools/caffe.bin .
$ ./caffe.bin train -solver imagenet_solver.prototxt -weights basenet/caffe_imagenet_train_iter_450000

The basenet symlinks have been added to point to the appropriate base network, but the basenet/caffe_imagenet_train_iter_450000 file will not exist until the relevant block: one networks has been trained.

Training notes: while the above procedure should work if followed literally, because each network takes about 9.5 days to train (on a K20 GPU), it will be much faster to train networks in parallel in a cluster environment. To do so, create and submit jobs as appropriate for your system. You'll also want to ensure that the output of the training procedure is logged, either by piping to a file

$ ./caffe.bin train ... > log_file 2>&1

or via whatever logging facilities are supplied by your cluster or job manager setup.

Plot results

Once the networks are trained, the results can be plotted using the included IPython notebook plots/transfer_plots.ipynb. Start the IPython Notebook server:

$ cd plots
$ ipython notebook

Select the transfer_plots.ipynb notebook and execute the included code. Note that without modification, the code will load results from the cached log files included in this repository. If you've run your own training and wish to plot those log files, change the paths in the "Load all the data" section to point to your log files instead.

Shortcut: to skip all the work and just see the results, take a look at this notebook with cached plots.

Questions?

Please drop me a line if you have any questions!

Owner
Jason Yosinski
Jason Yosinski
Implementation of the paper: "SinGAN: Learning a Generative Model from a Single Natural Image"

SinGAN This is an unofficial implementation of SinGAN from someone who's been sitting right next to SinGAN's creator for almost five years. Please ref

35 Nov 10, 2022
[NeurIPS 2020] Code for the paper "Balanced Meta-Softmax for Long-Tailed Visual Recognition"

Balanced Meta-Softmax Code for the paper Balanced Meta-Softmax for Long-Tailed Visual Recognition Jiawei Ren, Cunjun Yu, Shunan Sheng, Xiao Ma, Haiyu

Jiawei Ren 65 Dec 21, 2022
Solver for Large-Scale Rank-One Semidefinite Relaxations

STRIDE: spectrahedral proximal gradient descent along vertices A Solver for Large-Scale Rank-One Semidefinite Relaxations About STRIDE is designed for

48 Dec 20, 2022
Official Pytorch implementation of 6DRepNet: 6D Rotation representation for unconstrained head pose estimation.

6D Rotation Representation for Unconstrained Head Pose Estimation (Pytorch) Paper Thorsten Hempel and Ahmed A. Abdelrahman and Ayoub Al-Hamadi, "6D Ro

Thorsten Hempel 284 Dec 23, 2022
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.

NVIDIA Merlin NVIDIA Merlin is an open source library designed to accelerate recommender systems on NVIDIA’s GPUs. It enables data scientists, machine

419 Jan 03, 2023
Spectral normalization (SN) is a widely-used technique for improving the stability and sample quality of Generative Adversarial Networks (GANs)

Why Spectral Normalization Stabilizes GANs: Analysis and Improvements [paper (NeurIPS 2021)] [paper (arXiv)] [code] Authors: Zinan Lin, Vyas Sekar, Gi

Zinan Lin 32 Dec 16, 2022
[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

MVFNet: Multi-View Fusion Network for Efficient Video Recognition (AAAI 2021) Overview We release the code of the MVFNet (Multi-View Fusion Network).

Wenhao Wu 114 Nov 27, 2022
Augmented Traffic Control: A tool to simulate network conditions

Augmented Traffic Control Full documentation for the project is available at http://facebook.github.io/augmented-traffic-control/. Overview Augmented

Meta Archive 4.3k Jan 08, 2023
nanodet_plus,yolov5_v6.0

OAK_Detection OAK设备上适配nanodet_plus,yolov5_v6.0 Environment pytorch = 1.7.0

炼丹去了 1 Feb 18, 2022
Explainer for black box models that predict molecule properties

Explaining why that molecule exmol is a package to explain black-box predictions of molecules. The package uses model agnostic explanations to help us

White Laboratory 172 Dec 19, 2022
Code for KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs

KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs Check out the paper on arXiv: https://arxiv.org/abs/2103.13744 This repo cont

Christian Reiser 373 Dec 20, 2022
Athena is the only tool that you will ever need to optimize your portfolio.

Athena Portfolio optimization is the process of selecting the best portfolio (asset distribution), out of the set of all portfolios being considered,

Indrajit 1 Mar 25, 2022
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors, CVPR 2021

Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors Human POSEitioning System (H

Aymen Mir 66 Dec 21, 2022
A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (ICCV 2021)

A-SDF: Learning Disentangled Signed Distance Functions for Articulated Shape Representation (ICCV 2021) This repository contains the official implemen

81 Dec 14, 2022
Multi-angle c(q)uestion answering

Macaw Introduction Macaw (Multi-angle c(q)uestion answering) is a ready-to-use model capable of general question answering, showing robustness outside

AI2 430 Jan 04, 2023
Hyperbolic Hierarchical Clustering.

Hyperbolic Hierarchical Clustering (HypHC) This code is the official PyTorch implementation of the NeurIPS 2020 paper: From Trees to Continuous Embedd

HazyResearch 154 Dec 15, 2022
PyTorch implementation of ICLR 2022 paper PiCO: Contrastive Label Disambiguation for Partial Label Learning

PiCO: Contrastive Label Disambiguation for Partial Label Learning This is a PyTorch implementation of ICLR 2022 Oral paper PiCO; also see our Project

王皓波 147 Jan 07, 2023
A certifiable defense against adversarial examples by training neural networks to be provably robust

DiffAI v3 DiffAI is a system for training neural networks to be provably robust and for proving that they are robust. The system was developed for the

SRI Lab, ETH Zurich 202 Dec 13, 2022
A Partition Filter Network for Joint Entity and Relation Extraction EMNLP 2021

EMNLP 2021 - A Partition Filter Network for Joint Entity and Relation Extraction

zhy 127 Jan 04, 2023
a baseline to practice

ccks2021_track3_baseline a baseline to practice 路径可能会有问题,自己改改 torch==1.7.1 pyhton==3.7.1 transformers==4.7.0 cuda==11.0 this is a baseline, you can fi

45 Nov 23, 2022