Code for visualizing the loss landscape of neural nets

Overview

Visualizing the Loss Landscape of Neural Nets

This repository contains the PyTorch code for the paper

Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer and Tom Goldstein. Visualizing the Loss Landscape of Neural Nets. NIPS, 2018.

An interactive 3D visualizer for loss surfaces has been provided by telesens.

Given a network architecture and its pre-trained parameters, this tool calculates and visualizes the loss surface along random direction(s) near the optimal parameters. The calculation can be done in parallel with multiple GPUs per node, and multiple nodes. The random direction(s) and loss surface values are stored in HDF5 (.h5) files after they are produced.

Setup

Environment: One or more multi-GPU node(s) with the following software/libraries installed:

Pre-trained models: The code accepts pre-trained PyTorch models for the CIFAR-10 dataset. To load the pre-trained model correctly, the model file should contain state_dict, which is saved from the state_dict() method. The default path for pre-trained networks is cifar10/trained_nets. Some of the pre-trained models and plotted figures can be downloaded here:

Data preprocessing: The data pre-processing method used for visualization should be consistent with the one used for model training. No data augmentation (random cropping or horizontal flipping) is used in calculating the loss values.

Visualizing 1D loss curve

Creating 1D linear interpolations

The 1D linear interpolation method [1] evaluates the loss values along the direction between two minimizers of the same network loss function. This method has been used to compare the flatness of minimizers trained with different batch sizes [2]. A 1D linear interpolation plot is produced using the plot_surface.py method.

mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-0.5:1.5:401 --dir_type states \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--model_file2 cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=8192_wd=0.0_save_epoch=1/model_300.t7 --plot
  • --x=-0.5:1.5:401 sets the range and resolution for the plot. The x-coordinates in the plot will run from -0.5 to 1.5 (the minimizers are located at 0 and 1), and the loss value will be evaluated at 401 locations along this line.
  • --dir_type states indicates the direction contains dimensions for all parameters as well as the statistics of the BN layers (running_mean and running_var). Note that ignoring running_mean and running_var cannot produce correct loss values when plotting two solutions togeather in the same figure.
  • The two model files contain network parameters describing the two distinct minimizers of the loss function. The plot will interpolate between these two minima.

VGG-9 SGD, WD=0

Producing plots along random normalized directions

A random direction with the same dimension as the model parameters is created and "filter normalized." Then we can sample loss values along this direction.

mpirun -n 4 python plot_surface.py --mpi --cuda --model vgg9 --x=-1:1:51 \
--model_file cifar10/trained_nets/vgg9_sgd_lr=0.1_bs=128_wd=0.0_save_epoch=1/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --plot
  • --dir_type weights indicates the direction has the same dimensions as the learned parameters, including bias and parameters in the BN layers.
  • --xnorm filter normalizes the random direction at the filter level. Here, a "filter" refers to the parameters that produce a single feature map. For fully connected layers, a "filter" contains the weights that contribute to a single neuron.
  • --xignore biasbn ignores the direction corresponding to bias and BN parameters (fill the corresponding entries in the random vector with zeros).

VGG-9 SGD, WD=0

We can also customize the appearance of the 1D plots by calling plot_1D.py once the surface file is available.

Visualizing 2D loss contours

To plot the loss contours, we choose two random directions and normalize them in the same way as the 1D plotting.

mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot

ResNet-56

Once a surface is generated and stored in a .h5 file, we can produce and customize a contour plot using the script plot_2D.py.

python plot_2D.py --surf_file path_to_surf_file --surf_name train_loss
  • --surf_name specifies the type of surface. The default choice is train_loss,
  • --vmin and --vmax sets the range of values to be plotted.
  • --vlevel sets the step of the contours.

Visualizing 3D loss surface

plot_2D.py can make a basic 3D loss surface plot with matplotlib. If you want a more detailed rendering that uses lighting to display details, you can render the loss surface with ParaView.

ResNet-56-noshort ResNet-56

To do this, you must

  1. Convert the surface .h5 file to a .vtp file.
python h52vtp.py --surf_file path_to_surf_file --surf_name train_loss --zmax  10 --log

This will generate a VTK file containing the loss surface with max value 10 in the log scale.

  1. Open the .vtp file with ParaView. In ParaView, open the .vtp file with the VTK reader. Click the eye icon in the Pipeline Browser to make the figure show up. You can drag the surface around, and change the colors in the Properties window.

  2. If the surface appears extremely skinny and needle-like, you may need to adjust the "transforming" parameters in the left control panel. Enter numbers larger than 1 in the "scale" fields to widen the plot.

  3. Select Save screenshot in the File menu to save the image.

Reference

[1] Ian J Goodfellow, Oriol Vinyals, and Andrew M Saxe. Qualitatively characterizing neural network optimization problems. ICLR, 2015.

[2] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. ICLR, 2017.

Citation

If you find this code useful in your research, please cite:

@inproceedings{visualloss,
  title={Visualizing the Loss Landscape of Neural Nets},
  author={Li, Hao and Xu, Zheng and Taylor, Gavin and Studer, Christoph and Goldstein, Tom},
  booktitle={Neural Information Processing Systems},
  year={2018}
}
Owner
Tom Goldstein
Tom Goldstein
This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search"

InvariantAncestrySearch This repository contains python code necessary to replicated the experiments performed in our paper "Invariant Ancestry Search

Phillip Bredahl Mogensen 0 Feb 02, 2022
👐OpenHands : Making Sign Language Recognition Accessible (WiP 🚧👷‍♂️🏗)

👐 OpenHands: Sign Language Recognition Library Making Sign Language Recognition Accessible Check the documentation on how to use the library: ReadThe

AI4Bhārat 69 Dec 12, 2022
Image super-resolution (SR) is a fast-moving field with novel architectures attracting the spotlight

Revisiting RCAN: Improved Training for Image Super-Resolution Introduction Image super-resolution (SR) is a fast-moving field with novel architectures

Zudi Lin 76 Dec 01, 2022
Hardware-accelerated DNN model inference ROS2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU

Isaac ROS DNN Inference Overview This repository provides two NVIDIA GPU-accelerated ROS2 nodes that perform deep learning inference using custom mode

NVIDIA Isaac ROS 62 Dec 14, 2022
Gauge equivariant mesh cnn

Geometric Mesh CNN The code in this repository is an implementation of the Gauge Equivariant Mesh CNN introduced in the paper Gauge Equivariant Mesh C

50 Dec 18, 2022
Fairness Metrics: All you need to know

Fairness Metrics: All you need to know Testing machine learning software for ethical bias has become a pressing current concern. Recent research has p

Anonymous2020 1 Jan 17, 2022
DanceTrack: Multiple Object Tracking in Uniform Appearance and Diverse Motion

DanceTrack DanceTrack is a benchmark for tracking multiple objects in uniform appearance and diverse motion. DanceTrack provides box and identity anno

260 Dec 28, 2022
Very Deep Convolutional Networks for Large-Scale Image Recognition

pytorch-vgg Some scripts to convert the VGG-16 and VGG-19 models [1] from Caffe to PyTorch. The converted models can be used with the PyTorch model zo

Justin Johnson 217 Dec 05, 2022
Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control

Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control Official implementation of: Cooperative multi-agent reinfor

0 Nov 16, 2021
用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本和PARL(paddle)版本

用强化学习玩合成大西瓜 代码地址:https://github.com/Sharpiless/play-daxigua-using-Reinforcement-Learning 用强化学习DQN算法,训练AI模型来玩合成大西瓜游戏,提供Keras版本、PARL(paddle)版本和pytorch版本

72 Dec 17, 2022
Object Database for Super Mario Galaxy 1/2.

Super Mario Galaxy Object Database Welcome to the public object database for Super Mario Galaxy and Super Mario Galaxy 2. Here, we document all object

Aurum 9 Dec 04, 2022
AntiFuzz: Impeding Fuzzing Audits of Binary Executables

AntiFuzz: Impeding Fuzzing Audits of Binary Executables Get the paper here: https://www.usenix.org/system/files/sec19-guler.pdf Usage: The python scri

Chair for Sys­tems Se­cu­ri­ty 88 Dec 21, 2022
Official Repository for the paper "Improving Baselines in the Wild".

iWildCam and FMoW baselines (WILDS) This repository was originally forked from the official repository of WILDS datasets (commit 7e103ed) For general

Kazuki Irie 3 Nov 24, 2022
Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo Block diagram of FCL-taco2, where the decode

Disong Wang 39 Sep 28, 2022
Novel Instances Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection

Novel Instances Mining with Pseudo-Margin Evaluation for Few-Shot Object Detection (NimPme) The official implementation of Novel Instances Mining with

12 Sep 08, 2022
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
The implementation of the CVPR2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes"

STAR-FC This code is the implementation for the CVPR 2021 paper "Structure-Aware Face Clustering on a Large-Scale Graph with 10^7 Nodes" 🌟 🌟 . 🎓 Re

Shuai Shen 87 Dec 28, 2022
Code accompanying the paper "How Tight Can PAC-Bayes be in the Small Data Regime?"

How Tight Can PAC-Bayes be in the Small Data Regime? This is the code to reproduce all experiments for the following paper: @inproceedings{Foong:2021:

5 Dec 21, 2021
Time Series Cross-Validation -- an extension for scikit-learn

TSCV: Time Series Cross-Validation This repository is a scikit-learn extension for time series cross-validation. It introduces gaps between the traini

Wenjie Zheng 222 Jan 01, 2023
An introduction to satellite image analysis using Python + OpenCV and JavaScript + Google Earth Engine

A Gentle Introduction to Satellite Image Processing Welcome to this introductory course on Satellite Image Analysis! Satellite imagery has become a pr

Edward Oughton 32 Jan 03, 2023