NWU Institutional Repository

Generalization in deep learning : bilateral synergies in MLP learning



Publisher

North-West University (South Africa).

Abstract

We present an investigation of how simple artificial neural networks (specifically, feed-forward networks with full connections between each successive pair of layers) generalize to out-of-sample data. By emphasizing the substructures formed within these networks, we are able to shed light on several phenomena and relevant open questions in the literature. Specifically, we show that hidden units with piecewise linear activation functions are optimized on the training set in a distributed manner, meaning each sub-unit is optimized to reduce the loss of only a specific sub-population of the training set. This mechanism gives rise to a type of modularity that is not often considered in investigations of artificial neural networks and generalization. We are able to uncover informative regularity in sub-unit behavior and elucidate known phenomena such as: different artificial neural networks tend to prioritize similar samples; over-parameterization does not necessarily lead to poor generalization; artificial neural networks are able to interpolate large amounts of noise and still generalize appropriately; and generalization error as a function of representational capacity undergoes a second descent beyond the point of interpolation (the double descent phenomenon). We motivate a perspective on generalization in deep learning that is less focused on the complexity of hypothesis spaces and instead looks to substructures, and the manner in which training data is compartmentalized, as a means of understanding the observed ability of these networks to generalize. Under certain conditions, this perspective contradicts classical ideas of generalization and complexity.
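The sub-population mechanism described in the abstract can be illustrated with a minimal sketch (not the thesis' own code): for a one-hidden-layer network with ReLU (piecewise linear) units, each unit's pre-activation sign determines which training samples that unit is "active" on, and ReLU gates the backward pass wherever the pre-activation is non-positive, so only the active sub-population can influence that unit's weights. The data, shapes, and weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D training inputs
X = rng.uniform(-1.0, 1.0, size=(8, 1))

# One hidden layer of 4 ReLU units (piecewise linear activations)
W = rng.normal(size=(1, 4))   # input -> hidden weights
b = rng.normal(size=4)        # hidden biases

pre = X @ W + b               # pre-activations, shape (8, 4)
active = pre > 0              # boolean mask: sample i activates unit j

# Each column of `active` is the sub-population of training samples that
# can contribute gradient to that unit: d(ReLU)/d(pre) is zero elsewhere.
for j in range(active.shape[1]):
    samples = np.flatnonzero(active[:, j]).tolist()
    print(f"unit {j}: active on samples {samples}")
```

Different units typically end up gated on different subsets of the data, which is the kind of implicit partitioning of the training set the abstract refers to.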

Description

PhD (Computer and Electronic Engineering), North-West University, Potchefstroom Campus
