A collection of optimizer implementations in PyTorch

Overview

pytorch-optimizer

A collection of optimizer implementations in PyTorch with clean code and strict typing, along with several useful optimization ideas.
Most of the implementations follow the original papers, with some additional tweaks.
Highly inspired by pytorch-optimizer.

Documentation

https://pytorch-optimizers.readthedocs.io/en/latest/

Usage

Install

$ pip3 install pytorch-optimizer

Simple Usage

from pytorch_optimizer import Ranger21

...
model = YourModel()
optimizer = Ranger21(model.parameters(), num_iterations=1000)  # Ranger21 requires the total number of training iterations
...

for input, output in data:
  optimizer.zero_grad()
  loss = loss_function(output, model(input))
  loss.backward()
  optimizer.step()

Supported Optimizers

Optimizer | Description | Official Code | Paper
AdaBelief | Adapting Stepsizes by the Belief in Observed Gradients | github | https://arxiv.org/abs/2010.07468
AdaBound | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | github | https://openreview.net/forum?id=Bkg3g2R9FX
AdaHessian | An Adaptive Second Order Optimizer for Machine Learning | github | https://arxiv.org/abs/2006.00719
AdamP | Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | github | https://arxiv.org/abs/2006.08217
diffGrad | An Optimization Method for Convolutional Neural Networks | github | https://arxiv.org/abs/1909.11015v3
MADGRAD | A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization | github | https://arxiv.org/abs/2101.11075
RAdam | On the Variance of the Adaptive Learning Rate and Beyond | github | https://arxiv.org/abs/1908.03265
Ranger | A synergistic optimizer combining RAdam and LookAhead, and now GC, in one optimizer | github | https://bit.ly/3zyspC3
Ranger21 | A synergistic deep learning optimizer | github | https://arxiv.org/abs/2106.13731
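
All of the optimizers above expose the standard torch.optim.Optimizer interface, so they can be dropped in for a stock PyTorch optimizer. A minimal sketch (the learning rates below are illustrative, not recommended values):

import torch
from pytorch_optimizer import AdaBelief, AdamP, MADGRAD

model = torch.nn.Linear(10, 2)

# any of the listed optimizers can replace e.g. torch.optim.Adam
optimizer = AdaBelief(model.parameters(), lr=1e-3)
# optimizer = AdamP(model.parameters(), lr=1e-3)
# optimizer = MADGRAD(model.parameters(), lr=1e-2)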

Useful Resources

Several optimization ideas for regularizing and stabilizing training. Most of these ideas are applied in the Ranger21 optimizer.

Most of the figures below are taken from the Ranger21 paper.

• Adaptive Gradient Clipping
• Gradient Centralization
• Softplus Transformation
• Gradient Normalization
• Norm Loss
• Positive-Negative Momentum
• Linear learning rate warmup
• Stable weight decay
• Explore-exploit learning rate schedule
• Lookahead
• Chebyshev learning rate schedule
• (Adaptive) Sharpness-Aware Minimization
• On the Convergence of Adam and Beyond

Adaptive Gradient Clipping

This idea was originally proposed in the NFNet (Normalizer-Free Networks) paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
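
A minimal sketch of the idea, simplified to per-tensor norms rather than the paper's unit-wise norms (the helper below is illustrative, not the library's API):

import torch

def agc(parameters, clipping: float = 1e-2, eps: float = 1e-3):
    # clip each gradient so that ||grad|| / max(||param||, eps) <= clipping
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = p.detach().norm().clamp_(min=eps)
        grad_norm = p.grad.detach().norm()
        max_norm = param_norm * clipping
        if grad_norm > max_norm:
            p.grad.detach().mul_(max_norm / (grad_norm + 1e-6))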

Gradient Centralization

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png

Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
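
A minimal sketch of the operation (an illustrative helper, not the library's API):

import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # subtract the mean over all dimensions except the first (output) dimension,
    # so gradients of weight tensors with rank > 1 (conv / linear) have zero mean
    if grad.dim() > 1:
        grad = grad - grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True)
    return grad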

Softplus Transformation

Running the final variance denominator through the softplus function lifts extremely tiny values, keeping them numerically viable.
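
A toy illustration of the effect (Ranger21 exposes this via its use_softplus / beta_softplus arguments; the beta value below is only illustrative):

import torch
import torch.nn.functional as F

exp_avg_sq = torch.tensor([1e-16, 1e-8, 1e-2])  # second-moment estimates, some extremely tiny
eps = 1e-8

plain_denom = exp_avg_sq.sqrt() + eps                      # tiny values stay tiny -> huge effective steps
softplus_denom = F.softplus(exp_avg_sq.sqrt(), beta=50.0)  # tiny values are lifted to roughly ln(2) / 50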

Gradient Normalization

Norm Loss

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png

Positive-Negative Momentum

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png

Linear learning rate warmup

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png

Stable weight decay

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png

Explore-exploit learning rate schedule

https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png

Lookahead

k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights, which is
updated and substituted for the current weights every k_{lookahead} steps (5 by default).
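
A minimal sketch of the mechanism as a wrapper around an inner optimizer (illustrative only, not the library's Lookahead API):

import torch

class LookaheadSketch:
    def __init__(self, optimizer, k: int = 5, alpha: float = 0.5):
        self.optimizer, self.k, self.alpha = optimizer, k, alpha
        self.step_count = 0
        # slow weights: a copy of the parameters, refreshed every k steps
        self.slow_weights = [
            [p.detach().clone() for p in group['params']]
            for group in optimizer.param_groups
        ]

    def step(self):
        self.optimizer.step()  # fast weights take one inner-optimizer step
        self.step_count += 1
        if self.step_count % self.k == 0:
            for group, slow_group in zip(self.optimizer.param_groups, self.slow_weights):
                for fast, slow in zip(group['params'], slow_group):
                    slow.add_(fast.detach() - slow, alpha=self.alpha)  # slow += alpha * (fast - slow)
                    fast.data.copy_(slow)                              # fast restarts from the slow weights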

Chebyshev learning rate schedule

Acceleration via Fractal Learning Rate Schedules

(Adaptive) Sharpness-Aware Minimization

Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
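
A hedged usage sketch of the two-pass update, based on the first_step / second_step calls shown in the issues below; the constructor arguments (base optimizer class plus its hyper-parameters) are assumptions, so check the documentation for the exact signature:

import torch
from pytorch_optimizer import SAM

model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)

inputs, targets = torch.randn(8, 10), torch.randint(0, 2, (8,))

# first forward-backward pass: gradients at the current weights, then perturb toward higher loss
criterion(model(inputs), targets).backward()
optimizer.first_step(zero_grad=True)

# second forward-backward pass: gradients at the perturbed weights, then the actual parameter update
criterion(model(inputs), targets).backward()
optimizer.second_step(zero_grad=True)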

On the Convergence of Adam and Beyond

Citations

AdamP

@inproceedings{heo2021adamp,
    title={AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights},
    author={Heo, Byeongho and Chun, Sanghyuk and Oh, Seong Joon and Han, Dongyoon and Yun, Sangdoo and Kim, Gyuwan and Uh, Youngjung and Ha, Jung-Woo},
    year={2021},
    booktitle={International Conference on Learning Representations (ICLR)},
}

Adaptive Gradient Clipping (AGC)

@article{brock2021high,
  author={Andrew Brock and Soham De and Samuel L. Smith and Karen Simonyan},
  title={High-Performance Large-Scale Image Recognition Without Normalization},
  journal={arXiv preprint arXiv:2102.06171},
  year={2021}
}

Chebyshev LR Schedules

@article{agarwal2021acceleration,
  title={Acceleration via Fractal Learning Rate Schedules},
  author={Agarwal, Naman and Goel, Surbhi and Zhang, Cyril},
  journal={arXiv preprint arXiv:2103.01338},
  year={2021}
}

Gradient Centralization (GC)

@inproceedings{yong2020gradient,
  title={Gradient centralization: A new optimization technique for deep neural networks},
  author={Yong, Hongwei and Huang, Jianqiang and Hua, Xiansheng and Zhang, Lei},
  booktitle={European Conference on Computer Vision},
  pages={635--652},
  year={2020},
  organization={Springer}
}

Lookahead

@article{zhang2019lookahead,
  title={Lookahead optimizer: k steps forward, 1 step back},
  author={Zhang, Michael R and Lucas, James and Hinton, Geoffrey and Ba, Jimmy},
  journal={arXiv preprint arXiv:1907.08610},
  year={2019}
}

RAdam

@inproceedings{liu2019radam,
 author = {Liu, Liyuan and Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Han, Jiawei},
 booktitle = {Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020)},
 month = {April},
 title = {On the Variance of the Adaptive Learning Rate and Beyond},
 year = {2020}
}

Norm Loss

@inproceedings{georgiou2021norm,
  title={Norm Loss: An efficient yet effective regularization method for deep neural networks},
  author={Georgiou, Theodoros and Schmitt, Sebastian and B{\"a}ck, Thomas and Chen, Wei and Lew, Michael},
  booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
  pages={8812--8818},
  year={2021},
  organization={IEEE}
}

Positive-Negative Momentum

@article{xie2021positive,
  title={Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to Improve Generalization},
  author={Xie, Zeke and Yuan, Li and Zhu, Zhanxing and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2103.17182},
  year={2021}
}

Explore-Exploit learning rate schedule

@article{iyer2020wide,
  title={Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule},
  author={Iyer, Nikhil and Thejas, V and Kwatra, Nipun and Ramjee, Ramachandran and Sivathanu, Muthian},
  journal={arXiv preprint arXiv:2003.03977},
  year={2020}
}

Linear learning-rate warm-up

@article{ma2019adequacy,
  title={On the adequacy of untuned warmup for adaptive optimization},
  author={Ma, Jerry and Yarats, Denis},
  journal={arXiv preprint arXiv:1910.04209},
  volume={7},
  year={2019}
}

Stable weight decay

@article{xie2020stable,
  title={Stable weight decay regularization},
  author={Xie, Zeke and Sato, Issei and Sugiyama, Masashi},
  journal={arXiv preprint arXiv:2011.11152},
  year={2020}
}

Softplus transformation

@article{tong2019calibrating,
  title={Calibrating the adaptive learning rate to improve convergence of adam},
  author={Tong, Qianqian and Liang, Guannan and Bi, Jinbo},
  journal={arXiv preprint arXiv:1908.00700},
  year={2019}
}

MADGRAD

@article{defazio2021adaptivity,
  title={Adaptivity without compromise: a momentumized, adaptive, dual averaged gradient method for stochastic optimization},
  author={Defazio, Aaron and Jelassi, Samy},
  journal={arXiv preprint arXiv:2101.11075},
  year={2021}
}

AdaHessian

@article{yao2020adahessian,
  title={ADAHESSIAN: An adaptive second order optimizer for machine learning},
  author={Yao, Zhewei and Gholami, Amir and Shen, Sheng and Mustafa, Mustafa and Keutzer, Kurt and Mahoney, Michael W},
  journal={arXiv preprint arXiv:2006.00719},
  year={2020}
}

AdaBound

@inproceedings{Luo2019AdaBound,
  author = {Luo, Liangchen and Xiong, Yuanhao and Liu, Yan and Sun, Xu},
  title = {Adaptive Gradient Methods with Dynamic Bound of Learning Rate},
  booktitle = {Proceedings of the 7th International Conference on Learning Representations},
  month = {May},
  year = {2019},
  address = {New Orleans, Louisiana}
}

AdaBelief

@article{zhuang2020adabelief,
  title={Adabelief optimizer: Adapting stepsizes by the belief in observed gradients},
  author={Zhuang, Juntang and Tang, Tommy and Ding, Yifan and Tatikonda, Sekhar and Dvornek, Nicha and Papademetris, Xenophon and Duncan, James S},
  journal={arXiv preprint arXiv:2010.07468},
  year={2020}
}

Sharpness-Aware Minimization

@article{foret2020sharpness,
  title={Sharpness-aware minimization for efficiently improving generalization},
  author={Foret, Pierre and Kleiner, Ariel and Mobahi, Hossein and Neyshabur, Behnam},
  journal={arXiv preprint arXiv:2010.01412},
  year={2020}
}

Adaptive Sharpness-Aware Minimization

@article{kwon2021asam,
  title={ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks},
  author={Kwon, Jungmin and Kim, Jeongseop and Park, Hyunseo and Choi, In Kwon},
  journal={arXiv preprint arXiv:2102.11600},
  year={2021}
}

diffGrad

@article{dubey2019diffgrad,
  title={diffgrad: An optimization method for convolutional neural networks},
  author={Dubey, Shiv Ram and Chakraborty, Soumendu and Roy, Swalpa Kumar and Mukherjee, Snehasis and Singh, Satish Kumar and Chaudhuri, Bidyut Baran},
  journal={IEEE transactions on neural networks and learning systems},
  volume={31},
  number={11},
  pages={4500--4511},
  year={2019},
  publisher={IEEE}
}

On the Convergence of Adam and Beyond

@article{reddi2019convergence,
  title={On the convergence of adam and beyond},
  author={Reddi, Sashank J and Kale, Satyen and Kumar, Sanjiv},
  journal={arXiv preprint arXiv:1904.09237},
  year={2019}
}

Author

Hyeongchan Kim / @kozistr

Comments
  • Sharpness Aware Minimization (SAM) requires closure

    Hi, thank you so much for your repo. I am using the SAM optimizer, but I am facing this error; how do I fix it?

    RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

    question 
    opened by manza-ari 21
  •  Trying to use SAM optimizer for Random Sampling Image Classification

    I am trying to use the SAM optimizer. When I call backward() twice in train_epoch() (the second forward-backward pass), it gives me the error below; otherwise it works fine.

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

    def train_epoch(models, criterion, optimizers, dataloaders):
        models.train()
        global iters
        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers.zero_grad()
            # pdb.set_trace()
            scores, _, features = models(inputs)

            target_loss = criterion(scores, labels)
            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss = m_backbone_loss

            # ----------------- SAM Optimizer -------------------
            # first forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.first_step(zero_grad=True)

            # second forward-backward pass
            criterion(models(inputs)[0], labels)
            loss.backward(retain_graph=True)
            optimizers.second_step(zero_grad=True)
        # return loss
    
    question 
    opened by manza-ari 14
  • Ranger21 does not work

    Below is the trace when I try to use Ranger21; other optimizers work as they should.

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in __init__(self, params, lr, beta0, betas, use_softplus, beta_softplus, num_iterations, num_warm_up_iterations, num_warm_down_iterations, warm_down_min_lr, agc_clipping_value, agc_eps, centralize_gradients, normalize_gradients, lookahead_merge_time, lookahead_blending_alpha, weight_decay, norm_loss_factor, eps)
        114         # warmup iterations
        115         self.num_warm_up_iterations: int = (
    --> 116             self.build_warm_up_iterations(num_iterations, betas[1])
        117             if num_warm_up_iterations is None
        118             else num_warm_up_iterations

    c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pytorch_optimizer\ranger21.py in build_warm_up_iterations(total_iterations, beta2, warm_up_pct)
        150     def build_warm_up_iterations(total_iterations: int, beta2: float, warm_up_pct: float = 0.22) -> int:
        151         warm_up_iterations: int = math.ceil(2.0 / (1.0 - beta2))  # default un-tuned linear warmup
    --> 152         beta_pct: float = warm_up_iterations / total_iterations
        153         if beta_pct > 0.45:
        154             return int(warm_up_pct * total_iterations)

    TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

    bug 
    opened by BaconGabe 3
  • [Feature] support torch.hub.load

    Problem (Why?)

    loading optimizers via torch.hub.load

    Solution (What/How?)

    It is inconvenient to define functions one by one, so I used a trick with globals().

    Other changes (bug fixes, small refactors)

    Changed Callable to Type[Optimizer]. Perhaps this is what you intended; see typing.Type.

    Notes

    example colab

    import torch
    
    Adan = torch.hub.load("Bing-su/pytorch_optimizer:hubconf", "Adan")
    
    feature size/S 
    opened by Bing-su 2
  • [Test] Increase the test coverage

    Problem (Why?)

    heading to coverage 98%

    Solution (What/How?)

    • [x] update test_no_gradient

    Other changes (bug fixes, small refactors)

    • [x] fix API documentation

    Notes

    nope

    documentation enhancement size/XL 
    opened by kozistr 1
  • [Refactor/Docs] Organize Class docstring & Add custom exceptions

    Problem (Why?)

    there's no proper exception class (e.g. no sparse gradient, zero parameter size)

    Solution (What/How?)

    • [x] register custom exceptions
    • [x] refactor the docstrings
    • [x] support gradient centralization for Adai optimizer
    • [x] support AdamD debias for AdaPNM optimizer
    • [x] fix SAM optimizer
    • [x] add API documentation

    Other changes (bug fixes, small refactors)

    • [x] wrapper to the module (not __init__) in hubconf.py
    • [x] add a citation to README.rst

    Notes

    to v2.1.1

    bug documentation enhancement refactoring size/XXL 
    opened by kozistr 1
  • [Feature] Implement `Adai` optimizer

    Problem (Why?)

    Implement Adai optimizer

    Solution (What/How?)

    • [x] implement Adai & Adai v2 optimizers

    Other changes (bug fixes, small refactors)

    nope

    Notes

    version to v2.1.0

    feature size/L 
    opened by kozistr 1
  • [CI] Reduce `num_iterations` to speed up the testing

    Problem (Why?)

    The current num_iterations used to train the test model is more than enough, while the testing takes about 2 minutes.

    Solution (What/How?)

    • [x] reduce num_iterations to 100 ~ 200, which is enough to train the model (it takes 2 mins -> 1 min)

    Other changes (bug fixes, small refactors)

    • [x] explicit torch package to CPU version

    Notes

    nope

    enhancement size/L 
    opened by kozistr 1
  • [CI] Add `pytest-testmon` to reduce testing time

    Problem (Why?)

    Run only the tests that are needed, not the whole suite.

    Solution (What/How?)

    • [x] add pytest-testmon

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/M 
    opened by kozistr 1
  • [Build] Upgrade Python version to 3.11 for CI/CD pipeline

    Problem (Why?)

    just upgrading

    Solution (What/How?)

    • [x] Python version to 3.11 for CI/CD pipeline
    • [x] github action
      • [x] codecov/codecov-action to v3
      • [x] actions/setup-python to v4
    • [x] remove CUDA-related packages from the dependencies manually
    • [x] upgrade dev dependencies
    • [x] replace lint.py with pylint built-in option (fail-under)
    • [x] update .pylintrc

    Other changes (bug fixes, small refactors)

    nope

    Notes

    nope

    dependencies size/L 
    opened by kozistr 1
  • [Build] Bump setuptools from 65.5.0 to 65.5.1

    Bumps setuptools from 65.5.0 to 65.5.1.

    Changelog

    Sourced from setuptools's changelog.

    v65.5.1

    Misc ^^^^

    • #3638: Drop a test dependency on the mock package, always use :external+python:py:mod:unittest.mock -- by :user:hroncok
    • #3659: Fixed REDoS vector in package_index.

    dependencies size/XS 
    opened by dependabot[bot] 1
  • Versions of codes that work with half precision models

    Hi, I just discovered your repo and I would like to try it to fine-tune my ParlAI BlenderBot2 model (see https://github.com/facebookresearch/ParlAI). However, I am running the model in FP16 precision to make better use of my GPU. ParlAI has versions of a few optimizers that can use FP16 models, and I have tried installing a couple of other optimizers that also work with FP16 models by casting the state parameters and gradients to FP32 within the optimizer, determining the new state parameters with FP32 accuracy, and recasting the state parameters back to FP16 for updating the model. If you had a version of your library that automatically did this, it would greatly simplify its use with FP16-precision models. Thanks!

    P.S. It looks like adabelief, radam, and diffgrad do something like this, but not in a consistent way.

    feature request 
    opened by sjscotti 1