Implements pytorch code for the Accelerated SGD algorithm.

Overview

AccSGD

This is the code associated with Accelerated SGD algorithm used in the paper On the insufficiency of existing momentum schemes for Stochastic Optimization, selected to appear at ICLR 2018.

Usage:

The code can be downloaded and placed in a given local directory. In a manner similar to using any usual optimizer from the pytorch toolkit, it is also possible to use the AccSGD optimizer with little effort. First, we require importing the optimizer through the following command:

from AccSGD import *

Next, an ASGD optimizer working with a given pytorch model can be invoked using the following command:

optimizer = AccSGD(model.parameters(), lr=0.1, kappa = 1000.0, xi = 10.0)

where, lr is the learning rate, kappa the long step parameter and xi is the statistical advantage parameter.

Guidelines on setting parameters/debugging:

The learning rate lr: lr is set in a manner similar to schemes such as vanilla Stochastic Gradient Descent (SGD)/Standard Momentum (Heavy Ball)/Nesterov's Acceleration. Note that lr is a function of batch size - a rigorous quantification of this phenomenon can be found in the following paper. Such a characterization has been observed in several empirical works.

Long Step kappa: As the networks grow deeper (e.g. with resnets) and when dealing with typically harder datasets such as CIFAR/ImageNet, employing kappa to be 10^4 or more helps. For shallow nets and easier datasets such as MNIST, a typical value of kappa can be set as 10^3 or even 10^2.

Statistical Advantage Parameter xi: xi lies between 1.0 and sqrt(kappa). When large batch sizes (nearly matching batch gradient descent) are used, it is advisable to use xi that is closer to sqrt(kappa). In general, as the batch size increases by a factor of k, increase xi by sqrt(k).

Effective ways to debug:

For Nets with ReLU/ELU type activations:

(--1--) Slower convergence: There are three reasons for this to happen:

  • This could be a result of setting the learning rate too low (similar to SGD/vanilla momentum/Nesterov's acceleration).
  • This could be as a result of setting kappa to be too high.
  • The other reason could be that xi has been set to a small value and needs to be increased.

(--2--) Oscillatory behavior/Divergence: There are two reasons for this to happen:

  • This could be a result of setting the learning rate to be too high (similar to SGD/vanilla momentum/Nesterov's acceleration).
  • The other reason is that xi has been set to a large value and needs to be decreased.

For nets with Sigmoid activations:

Slower convergence after an initial rapid decrease in error: This is a sign of an over aggressive setting of parameters and must be treated in a similar manner as the oscillatory/divergence behavior (--2--) encountered in the ReLU/ELU activation case.

Slow convergence right from the start: This is more likely related to slower convergence (--1--) encountered in the ReLU/ELU case.

Citation:

If AccSGD is used in your paper/experiments, please cite the following papers.

@inproceedings{Kidambi2018Insufficiency,
  title={On the insufficiency of existing momentum schemes for Stochastic Optimization},
  author={Kidambi, Rahul and Netrapalli, Praneeth and Jain, Prateek and Kakade, Sham},
  booktitle={International Conference on Learning Representations},
  year={2018}
}

@Article{Jain2017Accelerating,
  title={Accelerating Stochastic Gradient Descent},
  author={Jain, Prateek and Kakade, Sham and Kidambi, Rahul and Netrapalli, Praneeth and Sidford, Aaron},
  journal={CoRR},
  volume = {abs/1704.08227},
  year={2017}
}
A tiny package to compare two neural networks in PyTorch

Compare neural networks by their feature similarity

Anand Krishnamoorthy 180 Dec 30, 2022
A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.

A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.

Fidelity Investments 56 Sep 13, 2022
Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.

Tez: a simple pytorch trainer NOTE: Currently, we are not accepting any pull requests! All PRs will be closed. If you want a feature or something does

abhishek thakur 1.1k Jan 04, 2023
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

News March 3: v0.9.97 has various bug fixes and improvements: Bug fixes for NTXentLoss Efficiency improvement for AccuracyCalculator, by using torch i

Kevin Musgrave 5k Jan 02, 2023
PyTorch Lightning Optical Flow models, scripts, and pretrained weights.

PyTorch Lightning Optical Flow models, scripts, and pretrained weights.

Henrique Morimitsu 105 Dec 16, 2022
Pytorch implementation of Distributed Proximal Policy Optimization

Pytorch-DPPO Pytorch implementation of Distributed Proximal Policy Optimization: https://arxiv.org/abs/1707.02286 Using PPO with clip loss (from https

Alexis David Jacq 164 Jan 05, 2023
A Closer Look at Structured Pruning for Neural Network Compression

A Closer Look at Structured Pruning for Neural Network Compression Code used to reproduce experiments in https://arxiv.org/abs/1810.04622. To prune, w

Bayesian and Neural Systems Group 140 Dec 05, 2022
A PyTorch implementation of Learning to learn by gradient descent by gradient descent

Intro PyTorch implementation of Learning to learn by gradient descent by gradient descent. Run python main.py TODO Initial implementation Toy data LST

Ilya Kostrikov 300 Dec 11, 2022
PyGCL: Graph Contrastive Learning Library for PyTorch

PyGCL is an open-source library for graph contrastive learning (GCL), which features modularized GCL components from published papers, standardized evaluation, and experiment management.

GCL: Graph Contrastive Learning Library for PyTorch 592 Jan 07, 2023
A pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch.

Compact Bilinear Pooling for PyTorch. This repository has a pure Python implementation of Compact Bilinear Pooling and Count Sketch for PyTorch. This

Grégoire Payen de La Garanderie 234 Dec 07, 2022
A PyTorch implementation of L-BFGS.

PyTorch-LBFGS: A PyTorch Implementation of L-BFGS Authors: Hao-Jun Michael Shi (Northwestern University) and Dheevatsa Mudigere (Facebook) What is it?

Hao-Jun Michael Shi 478 Dec 27, 2022
PyNIF3D is an open-source PyTorch-based library for research on neural implicit functions (NIF)-based 3D geometry representation.

PyNIF3D is an open-source PyTorch-based library for research on neural implicit functions (NIF)-based 3D geometry representation. It aims to accelerate research by providing a modular design that all

Preferred Networks, Inc. 96 Nov 28, 2022
Unofficial PyTorch implementation of DeepMind's Perceiver IO with PyTorch Lightning scripts for distributed training

Unofficial PyTorch implementation of DeepMind's Perceiver IO with PyTorch Lightning scripts for distributed training

Martin Krasser 251 Dec 25, 2022
Fast, general, and tested differentiable structured prediction in PyTorch

Torch-Struct: Structured Prediction Library A library of tested, GPU implementations of core structured prediction algorithms for deep learning applic

HNLP 1.1k Jan 07, 2023
You like pytorch? You like micrograd? You love tinygrad! ❤️

For something in between a pytorch and a karpathy/micrograd This may not be the best deep learning framework, but it is a deep learning framework. Due

George Hotz 9.7k Jan 05, 2023
A few Windows specific scripts for PyTorch

It is a repo that contains scripts that makes using PyTorch on Windows easier. Easy Installation Update: Starting from 0.4.0, you can go to the offici

408 Dec 15, 2022
Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation.

PyTorch Implementation of Differentiable ODE Solvers This library provides ordinary differential equation (ODE) solvers implemented in PyTorch. Backpr

Ricky Chen 4.4k Jan 04, 2023
ocaml-torch provides some ocaml bindings for the PyTorch tensor library.

ocaml-torch provides some ocaml bindings for the PyTorch tensor library. This brings to OCaml NumPy-like tensor computations with GPU acceleration and tape-based automatic differentiation.

Laurent Mazare 369 Jan 03, 2023
PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

Cong Cai 12 Dec 19, 2021
This is an differentiable pytorch implementation of SIFT patch descriptor.

This is an differentiable pytorch implementation of SIFT patch descriptor. It is very slow for describing one patch, but quite fast for batch. It can

Dmytro Mishkin 150 Dec 24, 2022