
    Activation functions in deep neural networks

    File
    Pretorius_AM.pdf (38.24 MB)
    Date
    2020
    Author
    Pretorius, A.M.
    Abstract
    The ability of machine learning algorithms to generalize is arguably their most important aspect, as it determines their ability to perform appropriately on unseen data. The impressive generalization abilities of deep neural networks (DNNs) are not yet well understood. In particular, the influence of activation functions on the learning process has received limited theoretical attention, even though phenomena such as vanishing gradients, node saturation and sparsity have been identified as possible contributors when comparing different activation functions. In this study, we present findings based on a comparison of several DNN architectures trained with two popular activation functions, and investigate the effect of these activation functions on training and generalization. We aim to determine the principal factors that contribute towards the superior generalization performance of rectified linear networks when compared with sigmoidal networks. We investigate these factors using fully-connected feedforward networks trained on three standard benchmark tasks. We find that the most salient differences between networks trained with these activation functions relate to the way in which class-distinctive information is separated and propagated through the network. We find that the behavior of nodes in ReLU and sigmoidal networks shows similar regularities in some cases. We also find relationships between the ability of hidden layers to accurately use the information available to them and the capacity (specifically depth and width) of the models. The study contributes towards open questions regarding the generalization performance of deep neural networks, specifically giving an informed perspective on the role of two historically popular activation functions.
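
    The comparison described in the abstract, fully-connected feedforward networks that are identical apart from their activation function (ReLU vs. sigmoid), can be illustrated with a short, hypothetical sketch. The sketch below assumes PyTorch; the layer sizes, dummy data and single SGD step are illustrative assumptions only, not the architectures, benchmark tasks or training protocol used in the thesis itself (see the full text in Pretorius_AM.pdf above).

    # Illustrative sketch (PyTorch, assumed): two fully-connected feedforward
    # networks of identical depth and width, differing only in activation function.
    # Layer sizes, data and optimiser settings are placeholders, not the thesis setup.
    import torch
    import torch.nn as nn

    def make_mlp(in_dim, hidden_dims, out_dim, activation):
        """Build a fully-connected feedforward network with the given activation."""
        layers = []
        dims = [in_dim] + list(hidden_dims)
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers.append(nn.Linear(d_in, d_out))
            layers.append(activation())
        layers.append(nn.Linear(dims[-1], out_dim))
        return nn.Sequential(*layers)

    # Same capacity (depth and width), different activation function.
    relu_net    = make_mlp(784, [256, 256], 10, nn.ReLU)
    sigmoid_net = make_mlp(784, [256, 256], 10, nn.Sigmoid)

    # One training step on dummy data, identical for both networks, so any
    # difference in behaviour is attributable to the activation function alone.
    x = torch.randn(64, 784)          # dummy batch (e.g. flattened 28x28 inputs)
    y = torch.randint(0, 10, (64,))   # dummy class labels
    criterion = nn.CrossEntropyLoss()
    for net in (relu_net, sigmoid_net):
        opt = torch.optim.SGD(net.parameters(), lr=0.1)
        loss = criterion(net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()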
    URI
    https://orcid.org/0000-0002-6873-8904
    http://hdl.handle.net/10394/36251
    Collections
    • Engineering [1424]

    Copyright © North-West University
    Contact Us | Send Feedback