Contrasting Convolutional Neural Networks with alternative architectures for transformation invariance
Publisher
North-West University (South Africa)
Abstract
Convolutional Neural Networks (CNNs) have become the standard for image classification tasks; however, they are not completely invariant to transformations of the input image. We empirically investigate the degree to which CNNs can handle transformed input images, and compare their abilities to those of multilayer perceptrons (MLPs) and spatial transformer networks (STNs). We measure invariance to three affine transformations, namely translation, rotation, and scale, with a specific focus on translation.
The lack of translation invariance in CNNs is attributed to the use of stride, which sub-samples the input and thereby discards information, and to fully connected layers, which lack spatial reasoning. We first show theoretically that stride can greatly benefit translation invariance provided it is combined with sufficient similarity between neighbouring pixels, a characteristic we refer to as local homogeneity. We then empirically verify this hypothesis and observe that this characteristic is dataset-specific, which dictates the relationship between pooling kernel size and stride required for translation invariance. Furthermore, we find that pooling kernel size and stride involve a trade-off between generalization and translation invariance: larger kernel sizes and strides yield better invariance but poorer generalization. We then compare the translation, scale, and rotation invariance of CNNs to that of STN-CNNs and MLPs. As expected, we find that MLPs fare far worse than CNNs and STN-CNNs in terms of both transformation invariance and generalization. We find that STNs can improve the transformation invariance of a CNN architecture, provided they are exposed to enough transformed samples during training. Finally, we observe that without explicit regularization, STNs provide no benefit over CNNs in terms of generalization ability.
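The interaction between stride and local homogeneity described above can be illustrated with a minimal sketch (a hypothetical illustration, not code from the thesis): under a one-sample shift of the input, strided max-pooling changes its output far more for a signal whose neighbouring values are dissimilar than for a locally homogeneous one, where adjacent values are nearly equal.

```python
import numpy as np

rng = np.random.default_rng(0)

def maxpool1d(x, kernel=2, stride=2):
    # Strided max-pooling over a 1-D signal.
    return np.array([x[i:i + kernel].max()
                     for i in range(0, len(x) - kernel + 1, stride)])

def shift_error(x, kernel, stride):
    # Mean absolute change in the pooled output when the input
    # is shifted by one sample (smaller = more shift-invariant).
    a = maxpool1d(x, kernel, stride)
    b = maxpool1d(np.roll(x, 1), kernel, stride)
    return np.mean(np.abs(a - b))

rough = rng.standard_normal(256)   # i.i.d. noise: low local homogeneity
smooth = np.cumsum(rough) / 16     # random walk: neighbouring samples similar

# The locally homogeneous signal yields a much smaller shift error.
print("rough: ", shift_error(rough, 2, 2))
print("smooth:", shift_error(smooth, 2, 2))
```

Here the choice of kernel size and stride (2, 2) is illustrative; the thesis's observation that the required kernel-size/stride relationship is dataset-specific corresponds to how much homogeneity a given signal actually exhibits.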
Description
MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus