Show simple item record

dc.contributor.advisor | Davel, M.H. | en_US
dc.contributor.advisor | Barnard, E. | en_US
dc.contributor.author | Pretorius, A.M. | en_US
dc.date.accessioned | 2020-11-05T07:10:38Z
dc.date.available | 2020-11-05T07:10:38Z
dc.date.issued | 2020 | en_US
dc.identifier.uri | https://orcid.org/0000-0002-6873-8904 | en_US
dc.identifier.uri | http://hdl.handle.net/10394/36251
dc.description | MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
dc.description.abstract | The ability of machine learning algorithms to generalize is arguably their most important aspect, as it determines their ability to perform appropriately on unseen data. The impressive generalization abilities of deep neural networks (DNNs) are not yet well understood. In particular, the influence of activation functions on the learning process has received limited theoretical attention, even though phenomena such as vanishing gradients, node saturation and sparsity have been identified as possible contributors when comparing different activation functions. In this study, we present findings based on a comparison of several DNN architectures trained with two popular activation functions, and investigate the effect of these activation functions on training and generalization. We aim to determine the principal factors that contribute towards the superior generalization performance of rectified linear networks when compared with sigmoidal networks. We investigate these factors using fully-connected feedforward networks trained on three standard benchmark tasks. We find that the most salient differences between networks trained with these activation functions relate to the way in which class-distinctive information is separated and propagated through the network. We find that the behavior of nodes in ReLU and sigmoidal networks shows similar regularities in some cases. We also find relationships between the ability of hidden layers to accurately use the information available to them and the capacity (specifically the depth and width) of the models. The study contributes towards open questions regarding the generalization performance of deep neural networks, specifically by giving an informed perspective on the role of two historically popular activation functions.
dc.language.iso | en | en_US
dc.publisher | North-West University (South Africa) | en_US
dc.subject | Deep neural network
dc.subject | Generalization
dc.subject | Non-linear activation function
dc.subject | Activation distribution
dc.subject | Node activity
dc.title | Activation functions in deep neural networks | en_US
dc.type | Thesis | en_US
dc.description.thesistype | Masters | en_US
dc.contributor.researchID | 23607955 - Davel, Marelie Hattingh (Supervisor) | en_US
dc.contributor.researchID | 21021287 - Barnard, Etienne (Supervisor) | en_US
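
The abstract above describes experiments in which fully-connected feedforward networks that differ only in their hidden activation function (ReLU versus sigmoid) are trained and their node behavior compared. The sketch below is a minimal illustration of how such a comparison could be set up, not the code used in the thesis: PyTorch, the MNIST-like 784-to-10 dimensions, the layer depth and width, and the random input batch are all assumptions made for illustration, since this record does not name the framework or the three benchmark tasks.

# Minimal sketch (illustrative only): two fully-connected feedforward networks that
# differ only in their hidden activation, plus a hook for collecting per-layer
# activation statistics. All sizes and data below are assumptions, not thesis details.
import torch
import torch.nn as nn

def make_mlp(act_cls, depth=3, width=256, in_dim=784, out_dim=10):
    """Fully-connected feedforward network with act_cls() after each hidden layer."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), act_cls()]
        d = width
    layers.append(nn.Linear(d, out_dim))  # linear output layer (logits)
    return nn.Sequential(*layers)

relu_net = make_mlp(nn.ReLU)        # rectified linear network
sigmoid_net = make_mlp(nn.Sigmoid)  # sigmoidal network of identical capacity

def collect_activations(net, x):
    """Forward pass that records the post-activation output of every hidden layer."""
    acts = []
    hooks = [m.register_forward_hook(lambda _m, _i, out: acts.append(out.detach()))
             for m in net if isinstance(m, (nn.ReLU, nn.Sigmoid))]
    net(x)
    for h in hooks:
        h.remove()
    return acts  # one (batch, width) tensor of node activations per hidden layer

x = torch.randn(64, 784)  # stand-in for a batch of flattened inputs
relu_acts = collect_activations(relu_net, x)
sigm_acts = collect_activations(sigmoid_net, x)
print([float((a > 0).float().mean()) for a in relu_acts])  # fraction of active ReLU nodes per layer
print([float(a.mean()) for a in sigm_acts])                 # mean sigmoid activation level per layer

Recording per-layer activation statistics in this way is one straightforward route to the kinds of activation-distribution and node-activity comparisons listed among the subject terms; training both networks with identical data and optimization settings would then allow the train/test gap of each to be compared.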

