SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers

Overview

SAGE: Sensitivity-guided Adaptive Learning Rate for Transformers

This repo contains our codes for the paper "No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models" (ICLR 2022).


Getting Start

  1. Pull and run docker
    pytorch/pytorch:1.5.1-cuda10.1-cudnn7-devel
  2. Install requirements
    pip install -r requirements.txt

Data and Model

  1. Download data and pre-trained models
    ./download.sh
    Please refer to this link for details on the GLUE benchmark.
  2. Preprocess data
    ./experiments/glue/prepro.sh
    For the most updated data processing details, please refer to the mt-dnn repo.

Fine-tuning Pre-trained Models using SAGE

We provide an example script for fine-tuning a pre-trained BERT-base model on MNLI using Adamax-SAGE:

./scripts/train_mnli_usadamax.sh GPUID

A few notices:

  • learning_rate and beta3 are two of the most important hyper-parameters. learning_rate that works well for Adamax/AdamW-SAGE is usually 2 to 5 times larger than that works well for Adamax/AdamW, depending on the tasks. beta3 that works well for Adamax/AdamW-SAGE is usually in the range of 0.6 and 0.9, depending on the tasks.

  • To use AdamW-SAGE, set argument --optim=usadamw. The current codebase only contains the implementation of Adamax-SAGE and AdamW-SAGE. Please refer to module/bert_optim.py for details. Please refer to our paper for integrating SAGE on other optimizers.

  • To fine-tune a pre-trained RoBERTa-base model, set arguments --init_checkpoint to the model path and set --encoder_type to 2. Other supported models are listed in pretrained_models.py.

  • To fine-tune on other tasks, set arguments --train_datasets and --test_datasets to the corresponding task names.


Citation

@inproceedings{
liang2022no,
title={No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models},
author={Chen Liang and Haoming Jiang and Simiao Zuo and Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen and Tuo Zhao},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=cuvga_CiVND}
}

Contact Information

For help or issues related to this package, please submit a GitHub issue. For personal questions related to this paper, please contact Chen Liang ([email protected]).

Owner
Chen Liang
Chen Liang
A PyTorch implementation of SlowFast based on ICCV 2019 paper "SlowFast Networks for Video Recognition"

SlowFast A PyTorch implementation of SlowFast based on ICCV 2019 paper SlowFast Networks for Video Recognition. Requirements Anaconda PyTorch conda in

Hao Ren 8 Dec 23, 2022
PyTorch experiments with the Zalando fashion-mnist dataset

zalando-pytorch PyTorch experiments with the Zalando fashion-mnist dataset Project Organization ├── LICENSE ├── Makefile - Makefile with co

Federico Baldassarre 31 Sep 25, 2021
Analyses of the individual electric field magnitudes with Roast.

Aloi Davide - PhD Student (UoB) Analysis of electric field magnitudes (wp2a dataset only at the moment) and correlation analysis with Dynamic Causal M

Davide Aloi 7 Dec 15, 2022
Unofficial Pytorch Implementation of WaveGrad2

WaveGrad 2 — Unofficial PyTorch Implementation WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis Unofficial PyTorch+Lightning Implementati

MINDs Lab 104 Nov 29, 2022
Experiments for Fake News explainability project

fake-news-explainability Experiments for fake news explainability project This repository only contains the notebooks used to train the models and eva

Lorenzo Flores (Lj) 1 Dec 03, 2022
Extreme Lightwegith Portrait Segmentation

Extreme Lightwegith Portrait Segmentation Please go to this link to download code Requirements python 3 pytorch = 0.4.1 torchvision==0.2.1 opencv-pyt

HYOJINPARK 59 Dec 16, 2022
[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

TBE The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Le

Jinpeng Wang 150 Dec 28, 2022
QuadTree Attention for Vision Transformers (ICLR2022)

This repository contains codes for quadtree attention. This repo contains codes for feature matching, image classficiation, object detection and seman

tangshitao 222 Dec 28, 2022
Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

CorrNet This project provides the code and results for 'Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation'

Gongyang Li 13 Nov 03, 2022
Answering Open-Domain Questions of Varying Reasoning Steps from Text

This repository contains the authors' implementation of the Iterative Retriever, Reader, and Reranker (IRRR) model in the EMNLP 2021 paper "Answering Open-Domain Questions of Varying Reasoning Steps

26 Dec 22, 2022
Gesture-controlled Video Game. Just swing your finger and play the game without touching your PC

Gesture Controlled Video Game Detailed Blog : https://www.analyticsvidhya.com/blog/2021/06/gesture-controlled-video-game/ Introduction This project is

Devbrat Anuragi 35 Jan 06, 2023
ThunderGBM: Fast GBDTs and Random Forests on GPUs

Documentations | Installation | Parameters | Python (scikit-learn) interface What's new? ThunderGBM won 2019 Best Paper Award from IEEE Transactions o

Xtra Computing Group 647 Jan 04, 2023
UltraGCN: An Ultra Simplification of Graph Convolutional Networks for Recommendation

UltraGCN This is our Pytorch implementation for our CIKM 2021 paper: Kelong Mao, Jieming Zhu, Xi Xiao, Biao Lu, Zhaowei Wang, Xiuqiang He. UltraGCN: A

XUEPAI 93 Jan 03, 2023
tree-math: mathematical operations for JAX pytrees

tree-math: mathematical operations for JAX pytrees tree-math makes it easy to implement numerical algorithms that work on JAX pytrees, such as iterati

Google 137 Dec 28, 2022
Official code of the paper "Expanding Low-Density Latent Regions for Open-Set Object Detection" (CVPR 2022)

OpenDet Expanding Low-Density Latent Regions for Open-Set Object Detection (CVPR2022) Jiaming Han, Yuqiang Ren, Jian Ding, Xingjia Pan, Ke Yan, Gui-So

csuhan 64 Jan 07, 2023
Simulate genealogical trees and genomic sequence data using population genetic models

msprime msprime is a population genetics simulator based on tskit. Msprime can simulate random ancestral histories for a sample of individuals (consis

Tskit developers 150 Dec 14, 2022
Implementation of Ag-Grid component for Streamlit

streamlit-aggrid AgGrid is an awsome grid for web frontend. More information in https://www.ag-grid.com/. Consider purchasing a license from Ag-Grid i

Pablo Fonseca 556 Dec 31, 2022
Code of paper Interact, Embed, and EnlargE (IEEE): Boosting Modality-specific Representations for Multi-Modal Person Re-identification.

Interact, Embed, and EnlargE (IEEE): Boosting Modality-specific Representations for Multi-Modal Person Re-identification We provide the codes for repr

12 Dec 12, 2022
Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN

Segmentation and Identification of Vertebrae in CT Scans using CNN, k-means Clustering and k-NN If you use this code for your research, please cite ou

41 Dec 08, 2022
Differentiable Annealed Importance Sampling (DAIS)

Differentiable Annealed Importance Sampling (DAIS) This repository contains the code to reproduce the DAIS results from the paper Differentiable Annea

Guodong Zhang 6 Dec 26, 2021