Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism

Last update: Sep 06, 2021

Overview

Period-alternatives-of-Softmax

Experimental Demo for our paper

'Escaping the Gradient Vanishing: Periodic Alternatives of Softmax in Attention Mechanism'

We propose replacing the exponential function in Softmax with periodic functions. In experiments on a simple demo modeled on LeViT, this alleviates the gradient vanishing problem and yields substantial improvements over Softmax and its variants.
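To make the gradient problem concrete, the sketch below (not taken from the repository) evaluates the Softmax Jacobian, d p_i / d x_j = p_i (delta_ij - p_j), on a hypothetical score vector at increasing scales. The `scores` values and the scales are illustrative assumptions; NumPy is used only to keep the example short.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_jacobian(x):
    """Jacobian of softmax for a 1-D input: J[i, j] = p_i * (delta_ij - p_j)."""
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

# Hypothetical attention scores for one query over four keys.
scores = np.array([1.0, 0.5, -0.5, 0.0])

for scale in (1.0, 10.0, 100.0):
    jac = softmax_jacobian(scale * scores)
    # The largest Jacobian entry shrinks as the scores grow in magnitude.
    print(f"scale={scale:6.1f}  max |dp_i/dx_j| = {np.abs(jac).max():.3e}")
```

As the scores grow, the output approaches a one-hot vector and every entry of the Jacobian shrinks toward zero, which is the saturation the paper aims to escape.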

Note: Create your own 'dataset' folder. For datasets other than cifar-10, cifar-100 and Tiny-imageNet, you may also need to modify the demo.py file.

Functions available:

softmax, norm_softmax
sinmax, norm_sinmax
cosmax, norm_cosmax
sin_2_max, norm_sin_2_max
sin_2_max_move, norm_sin_2_max_move
sirenmax, norm_sirenmax
sin_softmax, norm_sin_softmax
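The idea behind these functions can be sketched as follows. This is only an illustration under stated assumptions, not the repository's code: `sin2_normalize` is a hypothetical name, and it assumes the exponential is replaced by sin^2(x), chosen here because it is periodic and non-negative. The exact definitions of the functions listed above, including what the `norm_` prefix does, should be read from the repository itself.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax, shown for comparison."""
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def sin2_normalize(x, axis=-1, eps=1e-12):
    """Hypothetical periodic alternative: weights proportional to sin^2(x).

    eps guards against division by zero when every sin^2 value is 0.
    """
    w = np.sin(x) ** 2
    return w / (np.sum(w, axis=axis, keepdims=True) + eps)

# Hypothetical attention scores for one query over four keys.
scores = np.array([[0.3, 1.2, -0.7, 2.5]])

for scale in (1.0, 50.0):
    s = scale * scores
    print(f"scale={scale}")
    print("  softmax:", np.round(softmax(s), 4))
    print("  sin^2  :", np.round(sin2_normalize(s), 4))
```

Because sin^2 is bounded, the weights in this sketch depend on the phase of each score rather than its magnitude, so scaling the scores up does not by itself push the output toward a one-hot vector.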

Modes available:

search:
        Random search for a suitable learning rate and weight decay. The results are recorded in
        Attention_test/*functions/lr_wd_search.txt
run:
        Train the demo. Four .npy files are created in the root directory:
        (1) 'record_val_acc.npy': validation accuracy, recorded every 100 iterations;
        (2) 'record_train_acc.npy': training accuracy, recorded every batch;
        (3) 'record_loss.npy': training loss, recorded every batch;
        (4) 'kq_value.npy': Q.K values, recorded *before scaling*.
att_run:
        Same as the run mode, except:
        (1) kq_value is not recorded;
        (2) Every 5 epochs, a test image is fed to the model and the attention score map of every head in every layer is recorded.
            The maps are saved in 'Attention_test/attention_maps.npy'
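The record files written by the run mode can be inspected with NumPy. The sketch below is an assumption-laden helper, not part of the repository: `summarize_records` is a hypothetical name, and it assumes each file holds a plain numeric array that can be flattened. Only the three file names come from the description above.

```python
from pathlib import Path

import numpy as np

# File names as listed for the run mode.
RECORD_FILES = ("record_val_acc.npy", "record_train_acc.npy", "record_loss.npy")

def summarize_records(root="."):
    """Return basic statistics for each record file found in `root`.

    Assumption: every file stores a numeric array; it is flattened before
    the statistics are computed. Missing or empty files are skipped.
    """
    summary = {}
    for name in RECORD_FILES:
        path = Path(root) / name
        if not path.exists():
            continue
        values = np.load(path).ravel()
        if values.size == 0:
            continue
        summary[name] = {
            "count": int(values.size),
            "last": float(values[-1]),
            "min": float(values.min()),
            "max": float(values.max()),
        }
    return summary

if __name__ == "__main__":
    for name, stats in summarize_records().items():
        print(name, stats)
```

Run from the directory where the demo was trained, this prints one line per record file that exists. The layout of 'kq_value.npy' and 'Attention_test/attention_maps.npy' is not described here, so they are left out.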

Owner

slwang9353
