Implementation of parameterized soft-exponential activation function.

Last update: Feb 23, 2022

Overview

Soft-Exponential-Activation-Function:

Implementation of a parameterized soft-exponential activation function. In this implementation, each neuron has its own $\alpha$ parameter, and all of them start from the same initial value of -0.01. The activation function revolves around the idea of a "soft" exponential function. This function resembles the exponential function, but depending on $\alpha$ it can be less steep, exactly linear, or even logarithmic. Because every neuron can learn its own shape, the soft-exponential function is a good choice for neural networks that have many neurons and connections.

This activation function is based on the idea of a smooth continuum between logarithmic, linear, and exponential functions.

The equation for the soft-exponential function is:

$$ f(\alpha,x)= \left\{ \begin{array}{ll} -\frac{\ln(1-\alpha(x + \alpha))}{\alpha} & \alpha < 0 \\ x & \alpha = 0 \\ \frac{e^{\alpha x} - 1}{\alpha} + \alpha & \alpha > 0 \end{array} \right. $$
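The piecewise definition translates almost directly into code. Below is a minimal scalar sketch in plain Python. The name `soft_exponential` is illustrative and not taken from the repository. The repository's layer presumably works on tensors, with one $\alpha$ per neuron. The sketch also makes the continuum concrete: $\alpha = -1$ reduces to $\ln x$ and $\alpha = 1$ reduces to $e^x$.

```python
import math


def soft_exponential(alpha: float, x: float) -> float:
    """Scalar soft-exponential f(alpha, x), following the piecewise definition."""
    if alpha < 0:
        # Logarithmic branch; math.log raises ValueError if its argument is <= 0.
        return -math.log(1 - alpha * (x + alpha)) / alpha
    if alpha == 0:
        # Identity branch.
        return x
    # Exponential branch.
    return (math.exp(alpha * x) - 1) / alpha + alpha


# alpha = -1 reduces to ln(x); alpha = 1 reduces to exp(x).
print(soft_exponential(-1.0, math.e))  # approximately 1.0
print(soft_exponential(1.0, 1.0))      # approximately 2.718281828
print(soft_exponential(0.0, 3.5))      # 3.5
```

As $\alpha$ approaches 0 from either side, both non-identity branches tend to $x$. The function is therefore continuous in $\alpha$.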

Problems faced:

1. Misinformation about the function

In the paper "A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks", Figure 2 shows the soft-exponential function as a logarithmic function. This is not the case.

[Figure: correct plot of the soft-exponential function]

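Here we can see that the soft-exponential function is undefined for some combinations of $\alpha$ and $x$. Neither $\alpha$ nor $x$ is constant: $\alpha$ is learned during training and $x$ depends on the input. Such combinations can therefore actually occur.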
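For $\alpha < 0$ the logarithm needs $1-\alpha(x+\alpha) > 0$. Dividing $\alpha(x+\alpha) < 1$ by the negative $\alpha$ flips the inequality, so that branch is defined only for $x > 1/\alpha - \alpha$. The sketch below checks this numerically. The helper names are hypothetical, not taken from the repository. With $\alpha = -0.01$ the cut-off is $x = -99.99$. With $\alpha = -1$ it is $x = 0$, matching $\ln x$.

```python
def log_branch_lower_bound(alpha: float) -> float:
    """For alpha < 0, f(alpha, x) is defined only for x > 1/alpha - alpha."""
    if alpha >= 0:
        raise ValueError("the logarithmic branch only applies when alpha < 0")
    # 1 - alpha*(x + alpha) > 0  <=>  x + alpha > 1/alpha  (dividing by alpha < 0 flips the inequality)
    return 1 / alpha - alpha


def log_branch_is_defined(alpha: float, x: float) -> bool:
    # True when the argument of the logarithm is strictly positive.
    return 1 - alpha * (x + alpha) > 0


print(log_branch_lower_bound(-0.01))          # approximately -99.99
print(log_branch_is_defined(-0.01, -50.0))    # True
print(log_branch_is_defined(-0.01, -150.0))   # False
```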

2. Negative values inside logarithm

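Here comes the tricky part. The soft-exponential function is meant to be usable for all values of $\alpha$ and $x$. However, the logarithm is only defined for positive arguments. In the $\alpha < 0$ branch, the argument $1-\alpha(x+\alpha)$ can become zero or negative.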

In the Keras issue tracker, one user suggested using the inverse hyperbolic sine, $\sinh^{-1}()$, instead of $\ln()$. Unlike the logarithm, $\sinh^{-1}$ is defined for every real input.
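The sketch below shows one way to apply that suggestion. It is an assumption and not necessarily the exact form proposed in the Keras issue. The $\alpha < 0$ branch is written as $-\ln(1+u)/\alpha$ with $u = -\alpha(x+\alpha)$, and $\ln(1+u)$ is replaced by $\sinh^{-1}(u)$. Both behave like $u$ near $u = 0$, so the branch still tends to $x$ as $\alpha \to 0$, but $\sinh^{-1}$ accepts every real $u$. The function name is hypothetical.

```python
import math


def soft_exponential_asinh(alpha: float, x: float) -> float:
    """Soft-exponential with the log branch rewritten using asinh (an assumed variant).

    For alpha < 0 the original branch is -ln(1 + u) / alpha with u = -alpha * (x + alpha).
    Here ln(1 + u) is replaced by asinh(u): both are approximately u near u = 0,
    but asinh is defined for every real u, so no input makes this branch fail.
    """
    if alpha < 0:
        u = -alpha * (x + alpha)
        return -math.asinh(u) / alpha
    if alpha == 0:
        return x
    return (math.exp(alpha * x) - 1) / alpha + alpha


# x = -150 with alpha = -0.01 makes the original log branch undefined,
# but the asinh variant still returns a finite value.
print(soft_exponential_asinh(-0.01, -150.0))
```

This is a different activation, not just a safer implementation of the same one. For large $|u|$, $\sinh^{-1}(u)$ grows like $\operatorname{sign}(u)\ln(2|u|)$, so values away from $u = 0$ differ from the original logarithmic branch.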

3. Initialization of alpha

With $\alpha$ initialized to -0.01, the soft-exponential function starts out close to the identity but slightly logarithmic. It is steeper for small $x$ and becomes more gradual as $x$ grows. This initialization worked well.
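For $\alpha < 0$ the derivative of the logarithmic branch is $1/(1-\alpha(x+\alpha))$. The small sketch below evaluates it at the initial value $\alpha = -0.01$. The names are hypothetical, and it assumes one $\alpha$ per neuron, consistent with the 128 activation parameters per 128-unit layer in the model summary. The slope is about 1 at $x = 0$ and shrinks as $x$ grows.

```python
def log_branch_slope(alpha: float, x: float) -> float:
    """df/dx of the alpha < 0 branch: 1 / (1 - alpha * (x + alpha))."""
    return 1 / (1 - alpha * (x + alpha))


# One trainable alpha per neuron, all starting at the same value.
units = 128
alphas = [-0.01] * units

for x in (0.0, 10.0, 100.0):
    # All neurons share the initial alpha, so checking the first one is enough here.
    print(x, log_branch_slope(alphas[0], x))
# slope is about 1.0 at x = 0, about 0.91 at x = 10, about 0.5 at x = 100
```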

Performance:

The first plot shows the accuracy of the model using the soft-exponential activation.

The second plot shows the loss of the same model.

Model Structure:

```
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_1 (InputLayer)        [(None, 28, 28)]          0

 flatten (Flatten)           (None, 784)               0

 dense_layer (Dense_layer)   (None, 128)               100480

 parametric_soft_exp (Parame  (None, 128)              128
 tricSoftExp)

 dense_layer_1 (Dense_layer)  (None, 128)              16512

 parametric_soft_exp_1 (Para  (None, 128)              128
 metricSoftExp)

 dense (Dense)               (None, 10)                1290

=================================================================
Total params: 118,538
Trainable params: 118,538
Non-trainable params: 0
```
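The parameter counts in the summary can be reproduced by hand. This assumes that each dense layer has a bias and that each soft-exponential layer holds one $\alpha$ per unit, which is consistent with the 128 parameters reported for those layers. The helper names are illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out


def soft_exp_params(units: int) -> int:
    # One trainable alpha per neuron.
    return units


layers = [
    ("flatten", 0),
    ("dense_layer", dense_params(28 * 28, 128)),
    ("parametric_soft_exp", soft_exp_params(128)),
    ("dense_layer_1", dense_params(128, 128)),
    ("parametric_soft_exp_1", soft_exp_params(128)),
    ("dense", dense_params(128, 10)),
]

for name, count in layers:
    print(f"{name:24s}{count:>8,}")
print("total", sum(count for _, count in layers))  # 118538
```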


Owner

Shuvrajeet Das
