Activation functions in deep neural networks
The ability of machine learning algorithms to generalize is arguably their most important aspect, as it determines their performance on unseen data. The impressive generalization abilities of deep neural networks (DNNs) are not yet well understood. In particular, the influence of activation functions on the learning process has received limited theoretical attention, even though phenomena such as vanishing gradients, node saturation and sparsity have been identified as possible contributors when comparing different activation functions. In this study, we present findings based on a comparison of several DNN architectures trained with two popular activation functions, and investigate the effect of these activation functions on training and generalization. We aim to determine the principal factors that contribute to the superior generalization performance of rectified linear (ReLU) networks compared with sigmoidal networks. We investigate these factors using fully-connected feedforward networks trained on three standard benchmark tasks. We find that the most salient differences between networks trained with these activation functions relate to the way in which class-distinctive information is separated and propagated through the network. We find that the behavior of nodes in ReLU and sigmoidal networks shows similar regularities in some cases. We also find relationships between the ability of hidden layers to accurately use the information available to them and the capacity (specifically, the depth and width) of the models. The study contributes to open questions regarding the generalization performance of deep neural networks, specifically by giving an informed perspective on the role of two historically popular activation functions.
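The vanishing-gradient phenomenon mentioned above can be illustrated with a minimal sketch. This is a hypothetical toy setting, not the experimental setup of the study: we assume a chain of layers in which each node receives the same pre-activation value, and track how the local derivative of each activation function scales a backpropagated gradient with depth. The derivative of the sigmoid is at most 0.25, so the gradient shrinks geometrically; the derivative of ReLU is 1 for active nodes, so the gradient passes through unchanged.

```python
import math

def sigmoid(x):
    # Logistic sigmoid activation.
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), with maximum 0.25 at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_deriv(x):
    # Derivative of ReLU: 1 for active nodes (x > 0), 0 otherwise.
    return 1.0 if x > 0 else 0.0

def grad_through_depth(deriv, x, depth):
    # Gradient magnitude after backpropagating through `depth` layers,
    # assuming every node sees pre-activation x and unit weights.
    g = 1.0
    for _ in range(depth):
        g *= deriv(x)
    return g

depth = 10
print(grad_through_depth(sigmoid_deriv, 0.0, depth))  # 0.25**10, about 9.5e-7
print(grad_through_depth(relu_deriv, 1.0, depth))     # 1.0
```

Even in this best case for the sigmoid (pre-activations at 0, where its derivative peaks), the gradient through ten layers is attenuated by roughly six orders of magnitude, while active ReLU paths preserve it, which is one of the candidate explanations the abstract refers to.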