A collection of optimizer implementations in PyTorch

Overview

pytorch-optimizer

A collection of optimizer implementations in PyTorch with clean code and strict typing, along with several useful optimization ideas.
Most of the implementations follow the original papers, with some additional tweaks.
Highly inspired by pytorch-optimizer.

Documentation

https://pytorch-optimizers.readthedocs.io/en/latest/

Usage

Install

$ pip3 install pytorch-optimizer

Simple Usage

from pytorch_optimizer import Ranger21

...
model = YourModel()
optimizer = Ranger21(model.parameters(), num_iterations=1000)  # Ranger21 requires the total number of training iterations
...

for input, output in data:
  optimizer.zero_grad()
  loss = loss_function(output, model(input))
  loss.backward()
  optimizer.step()

Supported Optimizers

Optimizer | Description | Official Code | Paper
AdaBelief | Adapting Stepsizes by the Belief in Observed Gradients | github | https://arxiv.org/abs/2010.07468
AdaBound | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | github | https://openreview.net/forum?id=Bkg3g2R9FX
AdaHessian | An Adaptive Second Order Optimizer for Machine Learning | github | https://arxiv.org/abs/2006.00719
AdamP | Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | github | https://arxiv.org/abs/2006.08217
diffGrad | An Optimization Method for Convolutional Neural Networks | github | https://arxiv.org/abs/1909.11015v3
MADGRAD | A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization | github | https://arxiv.org/abs/2101.11075
RAdam | On the Variance of the Adaptive Learning Rate and Beyond | github | https://arxiv.org/abs/1908.03265
Ranger | A synergistic optimizer combining RAdam and LookAhead, and now GC, in one optimizer | github | https://bit.ly/3zyspC3
Ranger21 | A synergistic deep learning optimizer | github | https://arxiv.org/abs/2106.13731
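
All of the optimizers above expose the standard torch.optim.Optimizer interface, so they can be dropped in for a stock PyTorch optimizer. A minimal sketch (the learning rates below are illustrative, not recommended values):

import torch
from pytorch_optimizer import AdaBelief, AdamP, MADGRAD

model = torch.nn.Linear(10, 2)

# any of the listed optimizers can replace e.g. torch.optim.Adam
optimizer = AdaBelief(model.parameters(), lr=1e-3)
# optimizer = AdamP(model.parameters(), lr=1e-3)
# optimizer = MADGRAD(model.parameters(), lr=1e-2)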

Useful Resources

Several optimization ideas for regularizing and stabilizing training. Most of these ideas are applied in the Ranger21 optimizer.

Most of the figures below are taken from the Ranger21 paper.

• Adaptive Gradient Clipping
• Gradient Centralization
• Softplus Transformation
• Gradient Normalization
• Norm Loss
• Positive-Negative Momentum
• Linear learning rate warmup
• Stable weight decay
• Explore-exploit learning rate schedule
• Lookahead
• Chebyshev learning rate schedule
• (Adaptive) Sharpness-Aware Minimization
• On the Convergence of Adam and Beyond

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Networks) paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
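
A minimal sketch of the idea, simplified to per-tensor norms rather than the paper's unit-wise norms (the helper below is illustrative, not the library's API):

import torch

def agc(parameters, clipping: float = 1e-2, eps: float = 1e-3):
    # clip each gradient so that ||grad|| / max(||param||, eps) <= clipping
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = p.detach().norm().clamp_(min=eps)
        grad_norm = p.grad.detach().norm()
        max_norm = param_norm * clipping
        if grad_norm > max_norm:
            p.grad.detach().mul_(max_norm / (grad_norm + 1e-6))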

Gradient Centralization

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
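
A minimal sketch of the operation (an illustrative helper, not the library's API):

import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # subtract the mean over all dimensions except the first (output) dimension,
    # so gradients of weight tensors with rank > 1 (conv / linear) have zero mean
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad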

Softplus Transformation

Running the final variance denominator through the softplus function lifts extremely tiny values, keeping them numerically viable.
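
A toy illustration of the effect (Ranger21 exposes this via its use_softplus / beta_softplus arguments; the beta value below is only illustrative):

import torch
import torch.nn.functional as F

exp_avg_sq = torch.tensor([1e-16, 1e-8, 1e-2])  # second-moment estimates, some extremely tiny
eps = 1e-8

plain_denom = exp_avg_sq.sqrt() + eps                      # tiny values stay tiny -> huge effective steps
softplus_denom = F.softplus(exp_avg_sq.sqrt(), beta=50.0)  # tiny values are lifted to roughly ln(2) / 50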

Gradient Normalization

Norm Loss

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png

Positive-Negative Momentum

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png

Stable weight decay

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights, which is
updated and substituted for the current weights every k_{lookahead} steps (5 by default).
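
A minimal sketch of the mechanism as a wrapper around an inner optimizer (illustrative only, not the library's Lookahead API):

import torch

class LookaheadSketch:
    def __init__(self, optimizer, k: int = 5, alpha: float = 0.5):
        self.optimizer, self.k, self.alpha = optimizer, k, alpha
        self.step_count = 0
        # slow weights: a copy of the parameters, refreshed every k steps
        self.slow_weights = [
            [p.detach().clone() for p in group['params']]
            for group in optimizer.param_groups
        ]

    def step(self):
        self.optimizer.step()  # fast weights take one inner-optimizer step
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.optimizer.param_groups, self.slow_weights):
                for fast, slow in zip(group['params'], slow_group):
                    slow.add_(fast.detach() - slow, alpha=self.alpha)  # slow += alpha * (fast - slow)
                    fast.data.copy_(slow)                              # fast restarts from the slow weights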

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
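
A hedged usage sketch of the two-pass update, based on the first_step / second_step calls shown in the issues below; the constructor arguments (base optimizer class plus its hyper-parameters) are assumptions, so check the documentation for the exact signature:

import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

# first forward-backward pass: gradients at the current weights, then perturb toward higher loss
criterion(model(inputs), targets).backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass: gradients at the perturbed weights, then the actual parameter update
criterion(model(inputs), targets).backward()
optimizer.second_step(zero_grad=True)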

On the Convergence of Adam and Beyond

Citations

AdamP

@inproceedings{heo2021adamp,
    title={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},
    author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Kim, Gyuwan and Uh, Youngjung and Ha, Jung-Woo},
    year={2021},
    booktitle={International Conference on Learning Representations (ICLR)},
}

Adaptive Gradient Clipping (AGC)

@article{brock2021high,
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  journal={arXiv preprint arXiv:2102.06171},
  year={2021}
}

Chebyshev LR Schedules

@article{agarwal2021acceleration,
  title={Acceleration via Fractal Learning Rate Schedules},
  author={Agarwal, Naman and Goel, Surbhi and Zhang, Cyril},
  journal={arXiv preprint arXiv:2103.01338},
  year={2021}
}

Gradient Centralization (GC)

@inproceedings{yong2020gradient,
  title={Gradient centralization: A new optimization technique for deep neural networks},
  author={Yong, Hongwei and Huang, Jianqiang and Hua, Xiansheng and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  pages={635--652},
  year={2020},
  organization={Springer}
}

Lookahead

@article{zhang2019lookahead,
  title={Lookahead optimizer: k steps forward, 1 step back},
  author={Zhang, Michael R and Lucas, James and Hinton, Geoffrey and Ba, Jimmy},
  journal={arXiv preprint arXiv:1907.08610},
  year={2019}
}

RAdam

@inproceedings{liu2019radam,
 author = {Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
 booktitle = {Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020)},
 month = {April},
 title = {On the Variance of the Adaptive Learning Rate and Beyond},
 year = {2020}
}

Norm Loss

@inproceedings{georgiou2021norm,
  title={Norm Loss: An efficient yet effective regularization method for deep neural networks},
  author={Georgiou, Theodoros and Schmitt, Sebastian and B{\"a}ck, Thomas and Chen, Wei and Lew, Michael},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  pages={8812--8818},
  year={2021},
  organization={IEEE}
}

Positive-Negative Momentum

@article{xie2021positive,
  title={Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author={Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2103.17182},
  year={2021}
}

Explore-Exploit learning rate schedule

@article{iyer2020wide,
  title={Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule},
  author={Iyer, Nikhil and Thejas, V and Kwatra, Nipun and Ramjee, Ramachandran and Sivathanu, Muthian},
  journal={arXiv preprint arXiv:2003.03977},
  year={2020}
}

Linear learning-rate warm-up

@article{ma2019adequacy,
  title={On the adequacy of untuned warmup for adaptive optimization},
  author={Ma, Jerry and Yarats, Denis},
  journal={arXiv preprint arXiv:1910.04209},
  volume={7},
  year={2019}
}

Stable weight decay

@article{xie2020stable,
  title={Stable weight decay regularization},
  author={Xie, Zeke and Sato, Issei and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2011.11152},
  year={2020}
}

Softplus transformation

@article{tong2019calibrating,
  title={Calibrating the adaptive learning rate to improve convergence of adam},
  author={Tong, Qianqian and Liang, Guannan and Bi, Jinbo},
  journal={arXiv preprint arXiv:1908.00700},
  year={2019}
}

MADGRAD

@article{defazio2021adaptivity,
  title={Adaptivity without compromise: a momentumized, adaptive, dual averaged gradient method for stochastic optimization},
  author={Defazio, Aaron and Jelassi, Samy},
  journal={arXiv preprint arXiv:2101.11075},
  year={2021}
}

AdaHessian

@article{yao2020adahessian,
  title={ADAHESSIAN: An adaptive second order optimizer for machine learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2006.00719},
  year={2020}
}

AdaBound

@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}

AdaBelief

@article{zhuang2020adabelief,
  title={Adabelief optimizer: Adapting stepsizes by the belief in observed gradients},
  author={Zhuang, Juntang and Tang, Tommy and Ding, Yifan and Tatikonda, Sekhar and Dvornek, Nicha and Papademetris, Xenophon and Duncan, James S},
  journal={arXiv preprint arXiv:2010.07468},
  year={2020}
}

Sharpness-Aware Minimization

@article{foret2020sharpness,
  title={Sharpness-aware minimization for efficiently improving generalization},
  author={Foret, Pierre and Kleiner, Ariel and Mobahi, Hossein and Neyshabur, Behnam},
  journal={arXiv preprint arXiv:2010.01412},
  year={2020}
}

Adaptive Sharpness-Aware Minimization

@article{kwon2021asam,
  title={ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks},
  author={Kwon, Jungmin and Kim, Jeongseop and Park, Hyunseo and Choi, In Kwon},
  journal={arXiv preprint arXiv:2102.11600},
  year={2021}
}

diffGrad

@article{dubey2019diffgrad,
  title={diffgrad: An optimization method for convolutional neural networks},
  author={Dubey, Shiv Ram and Chakraborty, Soumendu and Roy, Swalpa Kumar and Mukherjee, Snehasis and Singh, Satish Kumar and Chaudhuri, Bidyut Baran},
  journal={IEEE transactions on neural networks and learning systems},
  volume={31},
  number={11},
  pages={4500--4511},
  year={2019},
  publisher={IEEE}
}

On the Convergence of Adam and Beyond

@article{reddi2019convergence,
  title={On the convergence of adam and beyond},
  author={Reddi, Sashank J and Kale, Satyen and Kumar, Sanjiv},
  journal={arXiv preprint arXiv:1904.09237},
  year={2019}
}

Author

Hyeongchan Kim / @kozistr

Comments
  • Sharpness Aware Minimization (SAM) requires closure

    Hi, thank you so much for your repo. I am using the SAM optimizer, but I am facing this error; how do I fix it?

    RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

    question 
    opened by manza-ari 21
  •  Trying to use SAM optimizer for Random Sampling Image Classification

    I am trying to use the SAM optimizer. When I call backward() twice in train_epoch() (the second forward-backward pass), it gives me the error below; otherwise it works fine.

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    def train_epoch(models, criterion, optimizers, dataloaders):
        models.train()
        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers.zero_grad()
            # pdb.set_trace()
            scores, _, features = models(inputs)

            target_loss = criterion(scores, labels)
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss

            # ----------------- SAM Optimizer -------------------
            # first forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.first_step(zero_grad=True)

            # second forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.second_step(zero_grad=True)
        # return loss
    
    question 
    opened by manza-ari 14
  • Ranger21 does not work

    Below is the trace when I try to use Ranger21; other optimizers work as they should.

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in __init__(self, params, lr, beta0, betas, use_softplus, beta_softplus, num_iterations, num_warm_up_iterations, num_warm_down_iterations, warm_down_min_lr, agc_clipping_value, agc_eps, centralize_gradients, normalize_gradients, lookahead_merge_time, lookahead_blending_alpha, weight_decay, norm_loss_factor, eps)
        114         # warmup iterations
        115         self.num_warm_up_iterations: int = (
    --> 116             self.build_warm_up_iterations(num_iterations, betas[1])
        117             if num_warm_up_iterations is None
        118             else num_warm_up_iterations

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in build_warm_up_iterations(total_iterations, beta2, warm_up_pct)
        150     def build_warm_up_iterations(total_iterations: int, beta2: float, warm_up_pct: float = 0.22) -> int:
        151         warm_up_iterations: int = math.ceil(2.0 / (1.0 - beta2))  # default un-tuned linear warmup
    --> 152         beta_pct: float = warm_up_iterations / total_iterations
        153         if beta_pct > 0.45:
        154             return int(warm_up_pct * total_iterations)

    TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

    bug 
    opened by BaconGabe 3
  • [Feature] support torch.hub.load

    Problem (Why?)

    loading optimizers via torch.hub.load

    Solution (What/How?)

    It is inconvenient to define functions one by one, so I used a trick with globals().

    Other changes (bug fixes, small refactors)

    Changed Callable to Type[Optimizer]. Perhaps this is what you intended; see typing.Type.

    Notes

    example colab

    import torch
    
    Adan = torch.hub.load("Bing-su/pytorch_optimizer:hubconf", "Adan")
    
    feature size/S 
    opened by Bing-su 2
  • [Test] Increase the test coverage

    Problem (Why?)

    heading to coverage 98%

    Solution (What/How?)

    • [x] update test_no_gradient

    Other changes (bug fixes, small refactors)

    • [x] fix API documentation

    Notes

    nope

    documentation enhancement size/XL 
    opened by kozistr 1
  • [Refactor/Docs] Organize Class docstring & Add custom exceptions

    Problem (Why?)

    there's no proper exception class (e.g. no sparse gradient, zero parameter size)

    Solution (What/How?)

    • [x] register custom exceptions
    • [x] refactor the docstrings
    • [x] support gradient centralization for Adai optimizer
    • [x] support AdamD debias for AdaPNM optimizer
    • [x] fix SAM optimizer
    • [x] add API documentation

    Other changes (bug fixes, small refactors)

    • [x] wrapper to the module (not __init__) in hubconf.py
    • [x] add a citation to README.rst

    Notes

    to v2.1.1

    bug documentation enhancement refactoring size/XXL 
    opened by kozistr 1
  • [Feature] Implement `Adai` optimizer

    Problem (Why?)

    Implement Adai optimizer

    Solution (What/How?)

    • [x] implement Adai & Adai v2 optimizers

    Other changes (bug fixes, small refactors)

    nope

    Notes

    version to v2.1.0

    feature size/L 
    opened by kozistr 1
  • [CI] Reduce `num_iterations` to speed up the testing

    Problem (Why?)

    The current num_iterations used to train the test model is more than enough, while the testing takes about 2 minutes.

    Solution (What/How?)

    • [x] reduce num_iterations to 100 ~ 200, which is enough to train the model (it takes 2 mins -> 1 min)

    Other changes (bug fixes, small refactors)

    • [x] explicit torch package to CPU version

    Notes

    nope

    enhancement size/L 
    opened by kozistr 1
  • [CI] Add `pytest-testmon` to reduce testing time

    Problem (Why?)

    Run only the tests that are needed, not the whole suite.

    Solution (What/How?)

    • [x] add pytest-testmon

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/M 
    opened by kozistr 1
  • [Build] Upgrade Python version to 3.11 for CI/CD pipeline

    Problem (Why?)

    just upgrading

    Solution (What/How?)

    • [x] Python version to 3.11 for CI/CD pipeline
    • [x] github action
      • [x] codecov/codecov-action to v3
      • [x] actions/setup-python to v4
    • [x] remove CUDA-related packages from the dependencies manually
    • [x] upgrade dev dependencies
    • [x] replace lint.py with pylint built-in option (fail-under)
    • [x] update .pylintrc

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/L 
    opened by kozistr 1
  • [Build] Bump setuptools from 65.5.0 to 65.5.1

    Bumps setuptools from 65.5.0 to 65.5.1.

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc ^^^^

    • #3638: Drop a test dependency on the mock package, always use :external+python:py:mod:unittest.mock -- by :user:hroncok
    • #3659: Fixed REDoS vector in package_index.

    dependencies size/XS 
    opened by dependabot[bot] 1
  • Versions of codes that work with half precision models

    Hi, I just discovered your repo and I would like to try it to fine-tune my ParlAI BlenderBot2 model (see https://github.com/facebookresearch/ParlAI). However, I am running the model in FP16 precision to make better use of my GPU. ParlAI has versions of a few optimizers that can use FP16 models, and I have tried installing a couple of other optimizers that also work with FP16 models by casting the state parameters and gradients to FP32 within the optimizer, determining the new state parameters with FP32 accuracy, and recasting the state parameters back to FP16 for updating the model. If you had a version of your library that automatically did this, it would greatly simplify its use with FP16-precision models. Thanks!

    P.S. It looks like adabelief, radam, and diffgrad do something like this, but not in a consistent way.

    feature request 
    opened by sjscotti 1