An efficient semi-sigmoidal non-linear activation function approach for deep neural networks
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | http://eprints.uthm.edu.my/8409/1/24p%20CHIENG%20HOCK%20HUNG.pdf http://eprints.uthm.edu.my/8409/2/CHIENG%20HOCK%20HUNG%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/8409/3/CHIENG%20HOCK%20HUNG%20WATERMARK.pdf |
Summary: | A non-linear activation function is one of the key contributing factors to the success
of Deep Learning (DL). Since the revival of DL in 2012, the Rectified Linear
Unit (ReLU) has been regarded as a de facto standard for many DL models by the
community. Despite its popularity, however, ReLU contains several shortcomings that
could result in inefficient learning of the DL models. These shortcomings are: 1) the
inherent negative cancellation property of ReLU removes all negative inputs
and causes massive information loss in the network; 2) the derivative of ReLU
can cause the dead neuron problem in the networks; 3) the
mean activation generated by ReLU is highly positive and leads to a bias shift effect in
the network layers; 4) the inherent multilinear structure of ReLU restricts the non-linear
capability of the networks; 5) the predefined nature of ReLU limits the flexibility
of the networks. To address these shortcomings, this study proposed a new type of
activation function based on the Semi-sigmoidal (Sig) approach. Based on this
approach, three variants of the activation function were introduced, namely, Shifted Semi-sigmoidal
(SSig), Adaptive Shifted Semi-sigmoidal (ASSig), and Bi-directional
Adaptive Shifted Semi-sigmoidal (BiASSig). The proposed activation functions were
tested against the ReLU (baseline) and state-of-the-art methods using eight Deep
Neural Networks (DNNs) on seven benchmark image datasets. Further, Adaptive
Moment Estimation (ADAM) and Stochastic Gradient Descent (SGD) were selected
as optimizers to train the DNNs. The baseline comparison score and mean rank were
used to consolidate and analyse the experimental results. The overall
baseline comparison scores showed that SSig, ASSig, and
BiASSig scored 79, 87, and 86 out of 112, respectively, outperforming
ReLU in more than 70% of the cases. In terms of overall
mean rank (OMR), ReLU ranked tenth (10th), whereas SSig, ASSig, and BiASSig
ranked fifth (5th), first (1st), and second (2nd), respectively, outperforming
ReLU and the other compared methods. |
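
The first three ReLU shortcomings listed in the summary can be illustrated numerically. The sketch below is not taken from the thesis: it assumes standard-normal inputs, uses the common convention of a zero gradient at x = 0, and the helper names `relu` and `relu_grad` are hypothetical. It does not implement SSig, ASSig, or BiASSig, whose definitions are not given in this record.

```python
import random


def relu(x: float) -> float:
    """Standard ReLU: max(0, x)."""
    return x if x > 0.0 else 0.0


def relu_grad(x: float) -> float:
    """Derivative of ReLU (assumed convention: gradient 0 at x == 0)."""
    return 1.0 if x > 0.0 else 0.0


# Assumption for illustration only: zero-mean, unit-variance inputs.
random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(10_000)]
outputs = [relu(x) for x in inputs]

# 1) Negative cancellation: every negative input is mapped to exactly 0.
negatives = [x for x in inputs if x < 0.0]
all_negatives_zeroed = all(relu(x) == 0.0 for x in negatives)

# 2) Zero gradient on non-positive inputs: a unit whose input is negative
#    for every sample receives no gradient and cannot recover ("dead" neuron).
zero_grad_fraction = sum(relu_grad(x) == 0.0 for x in inputs) / len(inputs)

# 3) Bias shift: the mean output is pushed above the (roughly zero) mean input.
mean_in = sum(inputs) / len(inputs)
mean_out = sum(outputs) / len(outputs)

print(f"all negative inputs zeroed: {all_negatives_zeroed}")
print(f"fraction of inputs with zero gradient: {zero_grad_fraction:.3f}")
print(f"mean input: {mean_in:.3f}, mean output: {mean_out:.3f}")
```

With zero-centred inputs, roughly half of the samples fall in the zero-gradient region and the mean activation becomes clearly positive, which is the behaviour the summary describes as negative cancellation, dead neurons, and bias shift.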
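
The reported baseline comparison scores can be checked against the "more than 70% of the cases" claim with a few lines of arithmetic. The sketch below rests on two assumptions that the summary does not state explicitly: that each score counts the cases in which a function outperformed ReLU, and that the 112 cases correspond to 8 DNNs × 7 datasets × 2 optimizers.

```python
# Scores reported in the summary (out of 112 cases).
TOTAL_CASES = 112
scores = {"SSig": 79, "ASSig": 87, "BiASSig": 86}

# Assumption: 112 = 8 networks x 7 datasets x 2 optimizers (ADAM, SGD).
assumed_cases = 8 * 7 * 2

# Share of cases in which each function is assumed to have beaten ReLU.
win_rates = {name: score / TOTAL_CASES for name, score in scores.items()}

for name, rate in win_rates.items():
    print(f"{name}: {scores[name]}/{TOTAL_CASES} = {rate:.1%}")
```

Under these assumptions, all three ratios exceed 0.70, which is consistent with the figure quoted in the summary.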