Create animations for the optimization trajectory of neural nets

Animating the Optimization Trajectory of Neural Nets

loss-landscape-anim lets you create an animated optimization path in a 2D slice of the loss landscape of your neural network. It is based on PyTorch Lightning; please follow its suggested style if you want to add your own model.

Check out my article Visualizing Optimization Trajectory of Neural Nets for more examples and some intuitive explanations.

0. Installation

From PyPI:

pip install loss-landscape-anim

To install from source, you need Poetry. Once you have cloned this repo, run the command below to install the dependencies.

poetry install

1. Basic Examples

With the provided spirals dataset and the default multilayer perceptron (MLP) model, you can directly call loss_landscape_anim to get a sample animated GIF like this:

from loss_landscape_anim import loss_landscape_anim

# Use default MLP model and sample spirals dataset
loss_landscape_anim(n_epochs=300)

sample gif 1

Note: if you are using it in a notebook, don't forget to include the following at the top:

%matplotlib notebook

Here's another example – the LeNet5 convolutional network on the MNIST dataset. There are many levers you can tune: learning rate, batch size, epochs, frames per second of the GIF output, a seed for reproducible results, whether to load from a trained model, etc. Check out the function signature for more details.

SEED = 42  # hypothetical value; any fixed integer gives reproducible runs
bs = 16
lr = 1e-3
datamodule = MNISTDataModule(batch_size=bs, n_examples=3000)
model = LeNet(learning_rate=lr)

optim_path, loss_steps, accu_steps = loss_landscape_anim(
    n_epochs=10,
    model=model,
    datamodule=datamodule,
    optimizer="adam",
    giffps=15,
    seed=SEED,
    load_model=False,
    output_to_file=True,
    return_data=True,  # Optional return values if you need them
    gpus=1  # Enable GPU training if available
)

GPU training is supported. Just pass gpus into loss_landscape_anim if they are available.

The output of LeNet5 on the MNIST dataset looks like this:

sample gif 2

2. Why PCA?

To create a 2D visualization, the first thing to do is to pick the 2 directions that define the plane. In the paper Visualizing the Loss Landscape of Neural Nets, the authors explain why 2 random directions don't work and why PCA is much better. In summary,

  1. Two random vectors in a high-dimensional space are nearly orthogonal with high probability, and they can hardly capture any variation along the optimization path. The path’s projection onto the plane spanned by the two vectors will just look like a random walk.

  2. If we pick one direction to be the vector pointing from the initial parameters to the final trained parameters, and another direction at random, the visualization will look like a straight line because the second direction doesn’t capture much variance compared to the first.

  3. If we use principal component analysis (PCA) on the optimization path and get the top 2 components, we can visualize the loss over the 2 orthogonal directions with the most variance.

For showing the most motion in 2D, PCA is preferred. If you need a quick recap on PCA, here's a minimal example you can go over in under 3 minutes.
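The PCA step in point 3 can be sketched as follows. This is a minimal illustration, assuming the optimization path has been collected as a matrix with one row of flattened weights per training step. The function name `pca_directions` and the toy path are hypothetical, and the sketch centers on the mean as in textbook PCA; the library's own implementation may differ.

```python
import numpy as np


def pca_directions(path, n_components=2):
    """Return the top principal directions of an optimization path.

    path: array of shape (n_steps, n_params), one flattened weight
    vector per training step (an assumed layout for this sketch).
    """
    path = np.asarray(path, dtype=float)
    centered = path - path.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    directions = vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    # 2D coordinates of every step in the plane spanned by the directions.
    coords = centered @ directions.T
    return directions, coords, explained[:n_components]


# Toy path: 50 steps in 1000 dimensions that move mostly along
# two hidden directions, plus a little noise.
rng = np.random.default_rng(0)
n_steps, n_params = 50, 1000
basis = rng.normal(size=(2, n_params))
t = np.linspace(0, 1, n_steps)
toy_path = (
    np.outer(t, basis[0])
    + 0.3 * np.outer(np.sin(3 * t), basis[1])
    + 0.01 * rng.normal(size=(n_steps, n_params))
)

directions, coords, explained = pca_directions(toy_path)
print(directions.shape, coords.shape, explained.sum())
```

In this toy case, the two components account for almost all of the variance, which is why the projected path shows clear motion instead of noise.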

3. Random and Custom Directions

Although PCA is a good approach for picking the directions, if you need more control, the code also allows you to set any 2 fixed directions, either generated at random or handpicked.

For 2 random directions, set reduction_method to "random", e.g.

loss_landscape_anim(n_epochs=300, load_model=False, reduction_method="random")

For 2 fixed directions of your choosing, set reduction_method to "custom", e.g.

import numpy as np

n_params = ... # number of parameters your model has
u_gen = np.random.normal(size=n_params)
u = u_gen / np.linalg.norm(u_gen)
v_gen = np.random.normal(size=n_params)
v = v_gen / np.linalg.norm(v_gen)

loss_landscape_anim(
    n_epochs=300, load_model=False, reduction_method="custom", custom_directions=(u, v)
)

Here is a sample GIF produced by two random directions:

sample gif 3

By default, reduction_method="pca".
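Random directions like the u and v in the custom example are only approximately orthogonal. The sketch below checks how close they are and, if you want an exactly orthonormal pair, applies one Gram-Schmidt step. The parameter count and the helper name `random_unit_vector` are hypothetical, and whether the library requires orthogonal directions is not stated here, so treat this as optional preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params = 100_000  # hypothetical parameter count, for illustration only


def random_unit_vector(n, rng):
    """Draw a Gaussian vector and scale it to unit length."""
    vec = rng.normal(size=n)
    return vec / np.linalg.norm(vec)


u = random_unit_vector(n_params, rng)
v = random_unit_vector(n_params, rng)

# Cosine of the angle between u and v; it shrinks roughly like
# 1 / sqrt(n_params) as the dimension grows.
cosine = float(u @ v)

# One Gram-Schmidt step: remove the component of v along u, then renormalize.
v_orth = v - (v @ u) * u
v_orth /= np.linalg.norm(v_orth)

print(f"cosine before: {cosine:.5f}, after: {float(u @ v_orth):.2e}")
```

The pair (u, v_orth) can then be passed as custom_directions in the same way as (u, v).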

4. Custom Dataset and Model

  1. Prepare your DataModule. Refer to datamodule.py for examples.
  2. Define your custom model that inherits model.GenericModel. Refer to model.py for examples.
  3. Once you have correctly set up your custom DataModule and model, call the function as shown below to train the model and plot the loss landscape animation.

bs = ...
lr = ...
SEED = ...  # any fixed integer, for reproducible results
datamodule = YourDataModule(batch_size=bs)
model = YourModel(learning_rate=lr)

loss_landscape_anim(
    n_epochs=10,
    model=model,
    datamodule=datamodule,
    optimizer="adam",
    seed=SEED,
    load_model=False,
    output_to_file=True
)

5. Comparing Different Optimizers

As mentioned in section 2, the optimization path usually falls into a very low-dimensional space, and its projection in other directions may look like a random walk. On the other hand, different optimizers can take very different paths in the high-dimensional space. As a result, it is difficult to pick 2 directions to effectively compare different optimizers.

In this example, I have adam, sgd, adagrad, rmsprop initialized with the same parameters. The two figures below share the same 2 random directions but are centered around different local minima. The first figure centers around the one Adam finds, the second centers around the one RMSprop finds. Essentially, the planes are 2 parallel slices of the loss landscape.

The first figure shows that when centering on the end of Adam's path, it looks like RMSprop is going somewhere with a larger loss value. But that is an illusion. If you inspect the loss values of RMSprop, it actually finds a local optimum that has a lower loss than Adam's.

Same 2 directions centering on Adam's path:

adam

Same 2 directions centering on RMSprop's path:

rmsprop

This is a good reminder that the contours are just a 2D slice out of a very high-dimensional loss landscape, and the projections can't reflect the actual path.
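To make the idea of parallel slices concrete, here is a small sketch that evaluates a loss on the plane center + a*u + b*v for two different centers. The quadratic toy loss, the helper `loss_slice`, and all parameter names are assumptions made for illustration; they are not the library's API.

```python
import numpy as np


def loss_slice(loss_fn, center, u, v, span=1.0, resolution=25):
    """Evaluate loss_fn on the grid center + a*u + b*v."""
    alphas = np.linspace(-span, span, resolution)
    betas = np.linspace(-span, span, resolution)
    grid = np.empty((resolution, resolution))
    for i, b in enumerate(betas):
        for j, a in enumerate(alphas):
            grid[i, j] = loss_fn(center + a * u + b * v)
    return alphas, betas, grid


# Toy loss: a quadratic bowl in 10 dimensions with its minimum at `optimum`.
rng = np.random.default_rng(0)
dim = 10
optimum = rng.normal(size=dim)


def toy_loss(w):
    return float(np.sum((w - optimum) ** 2))


# Two fixed directions shared by both slices.
u = np.eye(dim)[0]
v = np.eye(dim)[1]

# Center A sits on the optimum; center B is shifted along a third
# direction that lies outside the plane spanned by u and v.
center_a = optimum.copy()
center_b = optimum.copy()
center_b[2] += 2.0

_, _, slice_a = loss_slice(toy_loss, center_a, u, v)
_, _, slice_b = loss_slice(toy_loss, center_b, u, v)
print(slice_a.min(), slice_b.min())
```

Both slices have the same contour shape, but every value in the second one is larger by a constant, because the shift happens in a direction the plane cannot show. A point projected onto the wrong slice therefore appears to have a loss it does not actually have.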

However, we can see that the contours are convex no matter where they are centered in these 2 special cases. This more or less reflects that the optimizers shouldn't have a hard time finding a relatively good local minimum. To measure convexity more rigorously, the paper [1] mentions a better method – using principal curvatures, i.e. the eigenvalues of the Hessian. Check out the end of section 6 in the paper for more details.
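As a rough illustration of the eigenvalue idea, the sketch below estimates the Hessian of a tiny two-parameter function with central finite differences and inspects its eigenvalues. The helper `numerical_hessian` and the toy functions are hypothetical; for a real network the Hessian is far too large to build this way, and the paper's procedure should be consulted for the practical approach.

```python
import numpy as np


def numerical_hessian(f, x, eps=1e-4):
    """Central finite-difference estimate of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    hessian = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_j = np.zeros(n)
            e_i[i] = eps
            e_j[j] = eps
            hessian[i, j] = (
                f(x + e_i + e_j)
                - f(x + e_i - e_j)
                - f(x - e_i + e_j)
                + f(x - e_i - e_j)
            ) / (4 * eps**2)
    return hessian


def bowl(w):
    # Convex everywhere: both curvatures are positive.
    return w[0] ** 2 + 3 * w[1] ** 2


def saddle(w):
    # Curves up along w[0] and down along w[1].
    return w[0] ** 2 - w[1] ** 2


point = np.zeros(2)
bowl_eigs = np.linalg.eigvalsh(numerical_hessian(bowl, point))
saddle_eigs = np.linalg.eigvalsh(numerical_hessian(saddle, point))
print("bowl:", bowl_eigs, "saddle:", saddle_eigs)
```

All eigenvalues being non-negative indicates local convexity at that point, while a negative eigenvalue reveals a direction of negative curvature that a 2D contour plot may hide.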

Reference

[1] Visualizing the Loss Landscape of Neural Nets
