Contrasting Convolutional Neural Networks with alternative architectures for transformation invariance
Publisher
North-West University (South Africa)
Abstract
Convolutional Neural Networks (CNNs) have become the standard for image classification tasks; however, they are not completely invariant to transformations of the input image. We empirically investigate the degree to which CNNs can handle transformed input images, and compare their abilities to those of multilayer perceptrons (MLPs) and spatial transformer networks (STNs). We measure invariance to three affine transformations, namely translation, rotation, and scale, with a specific focus on translation.
The lack of translation invariance in CNNs is attributed to the use of stride, which sub-samples the input and thereby discards information, and to fully connected layers, which lack spatial reasoning. We first show theoretically that stride can greatly benefit translation invariance provided it is combined with sufficient similarity between neighbouring pixels, a characteristic we refer to as local homogeneity. We then empirically verify this hypothesis and observe that this characteristic is dataset-specific, which dictates the relationship between pooling kernel size and stride required for translation invariance. Furthermore, we find that pooling kernel size and stride involve a trade-off between generalization and translation invariance: larger kernel sizes and strides yield better invariance but poorer generalization. We then compare the translation, scale, and rotation invariance of CNNs to that of STN-CNNs and MLPs. As expected, we find that MLPs fare far worse than CNNs and STN-CNNs in terms of both transformation invariance and generalization. We find that STNs can improve the transformation invariance of a CNN architecture, provided they are exposed to enough transformed samples during training. Finally, we observe that without explicit regularization, STNs provide no benefit over CNNs in terms of generalization ability.
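The interaction between stride and local homogeneity described above can be illustrated with a minimal sketch (a hypothetical illustration, not code from the thesis): under a one-sample shift of the input, strided max-pooling changes its output far more for a signal whose neighbouring values are dissimilar than for a locally homogeneous one, where adjacent values are nearly equal.

```python
import numpy as np

rng = np.random.default_rng(0)

def maxpool1d(x, kernel=2, stride=2):
    # Strided max-pooling over a 1-D signal.
    return np.array([x[i:i + kernel].max()
                     for i in range(0, len(x) - kernel + 1, stride)])

def shift_error(x, kernel, stride):
    # Mean absolute change in the pooled output when the input
    # is shifted by one sample (smaller = more shift-invariant).
    a = maxpool1d(x, kernel, stride)
    b = maxpool1d(np.roll(x, 1), kernel, stride)
    return np.mean(np.abs(a - b))

rough = rng.standard_normal(256)   # i.i.d. noise: low local homogeneity
smooth = np.cumsum(rough) / 16     # random walk: neighbouring samples similar

# The locally homogeneous signal yields a much smaller shift error.
print("rough: ", shift_error(rough, 2, 2))
print("smooth:", shift_error(smooth, 2, 2))
```

Here the choice of kernel size and stride (2, 2) is illustrative; the thesis's observation that the required kernel-size/stride relationship is dataset-specific corresponds to how much homogeneity a given signal actually exhibits.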
Description
MEng (Computer and Electronic Engineering), North-West University, Potchefstroom Campus