Novel data augmentation schemes for pose classification using a convolutional neural network
Abstract
Population aging is a global trend that can be attributed to declining fertility rates, longer life expectancies, and aging cohorts in many developed countries. The number of individuals over the age of 60 is expected to rise significantly in the coming decades and will likely place additional pressure on healthcare systems worldwide. Currently, falls are one of the leading causes of hospitalisation among the elderly and can result in life-threatening injuries and long-term disability. Promptly administering aid to a fall victim can significantly improve their chances of recovery and reduce the need for specialised care. To address these problems, fall detection systems have been developed that facilitate early intervention and reduce the negative consequences of falls. This allows seniors to maintain their independence even after sustaining a fall, while also alleviating the anticipated pressures on healthcare since injuries are treated promptly. However, there is no universal approach or solution to fall detection due to the complexity of the problem domain. New avenues of research are continually stimulated and explored as technological advances are made and new sensor technology becomes available. Vision-based approaches to fall detection are more favourable than wearable and ambient sensor-based approaches since they are less obtrusive and can provide more information about the context of a fall. However, fall recognition requires quantifying the human body captured in the recorded footage, which can be achieved through pose estimation, that is, approximating the locations of the different body parts. This study investigates the effectiveness of data augmentation techniques applied to such pose-estimated mappings to improve pose classification with a convolutional neural network (CNN). A novel pose descriptor is designed that can be superimposed onto the imaged human body to encode the kinematic arrangement of the limbs and relevant positional cues. This method emphasises the postural differences between poses in the abstracted feature space of a CNN classifier, thereby improving the classifier's ability to reliably differentiate between poses. The approach is demonstrated on a fall dataset that consists of multiple pose classes typical of everyday activities. The study results indicate that the proposed visual augmentation schemes are effective in improving CNN-based pose classification. An improvement of up to 11 percentage points was achieved over the baseline pose recognition accuracy obtained without any augmentation. The simplicity of the approach makes it suitable for real-time applications that warrant timely action and response. These findings are expected to pave the way for more accurate fall detection systems that minimise the risk of fall-related injuries while reducing the intensity of the required medical care.
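As a rough illustration of the kind of pipeline summarised above, the Python sketch below draws a hypothetical 17-keypoint skeleton (a COCO-style ordering is assumed here) onto a video frame before passing the result to a small CNN classifier. The limb connectivity, colours, network layout, and class count are placeholders chosen for the example and do not reflect the specific descriptor or architecture evaluated in the study.

```python
import numpy as np
import torch
import torch.nn as nn

# Assumed limb connectivity for a 17-keypoint, COCO-style skeleton.
LIMBS = [(5, 7), (7, 9), (6, 8), (8, 10),          # arms
         (11, 13), (13, 15), (12, 14), (14, 16),   # legs
         (5, 6), (11, 12), (5, 11), (6, 12)]       # torso

def superimpose_pose(frame, keypoints, limb_colour=(0, 255, 0), joint_colour=(255, 0, 0)):
    """Draw estimated joints and limbs onto a copy of the frame.

    frame     : H x W x 3 uint8 image
    keypoints : (17, 2) array of (x, y) joint coordinates from a pose estimator
    """
    import cv2  # OpenCV is used only for the drawing primitives
    canvas = frame.copy()
    for a, b in LIMBS:
        pa = tuple(int(v) for v in keypoints[a])
        pb = tuple(int(v) for v in keypoints[b])
        cv2.line(canvas, pa, pb, limb_colour, thickness=3)
    for x, y in keypoints:
        cv2.circle(canvas, (int(x), int(y)), radius=4, color=joint_colour, thickness=-1)
    return canvas

class PoseCNN(nn.Module):
    """Small CNN that classifies augmented frames into pose classes."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    # Dummy frame and random keypoints stand in for real footage and a pose estimator.
    frame = np.zeros((224, 224, 3), dtype=np.uint8)
    keypoints = np.random.rand(17, 2) * 224
    augmented = superimpose_pose(frame, keypoints)

    model = PoseCNN(num_classes=5)
    x = torch.from_numpy(augmented).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    logits = model(x)
    print(logits.shape)  # torch.Size([1, 5])
```

In practice, the superimposition step runs per frame on the output of an off-the-shelf pose estimator, which keeps the added cost small and consistent with the real-time use case noted above.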