
    Generalization in deep learning: bilateral synergies in MLP learning

    File
    Theunissen MW 22721339.pdf (13.97 MB)
    Date
    2021
    Author
    Theunissen, Marthinus Wilhelmus
    Abstract
    We present an investigation of how simple artificial neural networks (specifically, feed-forward networks with full connections between each successive pair of layers) generalize to out-of-sample data. By emphasizing the substructures formed within these networks, we are able to shed light on several phenomena and relevant open questions in the literature. Specifically, we show that hidden units with piecewise linear activation functions are optimized on the training set in a distributed manner, meaning each sub-unit is optimized to reduce the loss of only a specific sub-population of the training set. This mechanism gives rise to a type of modularity that is not often considered in investigations of artificial neural networks and generalization. We are able to uncover informative regularity in sub-unit behavior and to elucidate known phenomena such as the following: different artificial neural networks tend to prioritize similar samples; over-parameterization does not necessarily lead to poor generalization; artificial neural networks are able to interpolate large amounts of noise and still generalize appropriately; and generalization error as a function of representational capacity undergoes a second descent beyond the point of interpolation (a.k.a. the double descent phenomenon). We motivate a perspective on generalization in deep learning that is less focused on the complexity of hypothesis spaces and instead looks to substructures, and to the manner in which training data is compartmentalized, as a way of understanding the observed ability of these networks to generalize. Under certain conditions, this perspective contradicts classical ideas of generalization and complexity.
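
    The abstract's central mechanism, distributed optimization of piecewise linear hidden units, can be made concrete with a small sketch. The following is a hypothetical NumPy example (not the thesis code; the layer sizes, weights, and variable names are invented for illustration) of a one-hidden-layer ReLU network. Because ReLU is piecewise linear, each hidden unit is active only for a sub-population of samples, and only those samples send gradients through that unit.

        import numpy as np

        rng = np.random.default_rng(0)

        # Invented toy dimensions: 2-D inputs, 8 hidden ReLU units, scalar output.
        W1 = rng.standard_normal((2, 8))   # input -> hidden weights
        b1 = rng.standard_normal(8)        # hidden biases
        W2 = rng.standard_normal((8, 1))   # hidden -> output weights

        X = rng.standard_normal((5, 2))    # five toy "training" samples

        pre = X @ W1 + b1                  # pre-activations, shape (5, 8)
        hidden = np.maximum(pre, 0.0)      # ReLU: a piecewise linear activation
        output = hidden @ W2               # network output, shape (5, 1)

        # 1 where a unit is active for a sample, 0 where it is inactive.
        # The ReLU derivative is zero for inactive entries, so a unit's
        # incoming weights receive gradient only from the samples in its
        # active sub-population.
        pattern = (pre > 0).astype(int)
        print(pattern)

    Reading a column of pattern gives the sub-population of samples that a single hidden unit responds to; during training, only those samples adjust that unit's incoming weights, which is the sense in which optimization is distributed across sub-units.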
    ORCID
    https://orcid.org/0000-0002-7456-7769
    URI
    http://hdl.handle.net/10394/38073
    Collections
    • Engineering [1424]


    Copyright © North-West University
    Contact Us | Send Feedback
    Theme by Atmire NV