Affective computing and deep learning to perform sentiment analysis
Companies often rely on feedback from consumers to make strategic decisions. However, respondents often fail to provide honest answers due to issues such as response bias and social desirability bias. This may be caused by several external factors, such as difficulty in accurately expressing their feelings about a subject or holding an opinion that is not aligned with societal norms. As a result, the accuracy of the data from such studies is negatively affected, leading to invalid results. Sentiment analysis has provided a means of delving into the true opinions of customers and consumers based on text documents, such as tweets and Facebook posts. However, these texts can often be ambiguous and devoid of emotion. It may, therefore, be beneficial to incorporate affective computing into this process to gain information about the customer's opinion from facial expressions. Another useful tool that may ease this process is deep neural networks. In this study, a method for performing sentiment analysis based on a subject's facial expressions is proposed. Affective computing is employed to extract meaningful metrics or features from the faces, which are then given as input to a deep multilayer perceptron neural network to classify the corresponding sentiment. Five models were trained using different data sets to test the validity of this approach. For the first two models, which served as a pilot study, a data set consisting of videos of nine participants' faces was used for training and testing. The videos were processed to extract 42 affective metrics, which served as input for the first model, and six emotions, which served as input for the second model. The results obtained from these two models showed that it was better to use the 42 metrics rather than merely the six emotions to train a model to perform sentiment analysis. However, the models may have overfitted because the training, validation and test data sets were created at frame level, allowing frames from the same video to appear in more than one set.
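The classification step described above, feeding 42 per-frame affective metrics into a deep multilayer perceptron, can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data: the layer sizes, class labels and random features are assumptions for demonstration, not the architecture or data used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_frames, n_metrics = 600, 42                # 42 affective metrics per frame, as in the study
X = rng.normal(size=(n_frames, n_metrics))   # placeholder features; real input comes from face videos
y = rng.integers(0, 3, size=n_frames)        # hypothetical labels: 0=negative, 1=neutral, 2=positive

# Deep multilayer perceptron; the hidden-layer sizes here are illustrative assumptions.
clf = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=300, random_state=0)
clf.fit(X, y)
pred = clf.predict(X)                        # one sentiment prediction per frame
```

In a real pipeline the synthetic `X` and `y` would be replaced by metrics extracted from the video frames and their annotated sentiments.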
A third model was created by following a similar approach, but with the number of participants increased to 22 and the data subdivided into training, validation and test sets at video level instead of at frame level. To reduce the influence of human bias on the models, an existing, pre-annotated data set was used to train the next models. This data set had to be relabelled to use only three distinct sentiment classes. Two ways of doing this were identified; thus, two more models were created. The first variation of the data set had a class imbalance, leading to a model with somewhat skewed results. For the second variation, the classes were more evenly distributed, which was reflected in the performance of the model. The overall results obtained from the study show that the proposed techniques produce models with accuracies comparable to those reported in the literature, thereby indicating the usability of the proposed techniques. However, it is suggested that other types of neural networks that process time-series data, such as long short-term memory neural networks, may be used to improve the results even further.
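The video-level split used for the third model can be expressed with a group-aware splitter, so that all frames from one video land in the same set. The sketch below, a minimal illustration with synthetic data, uses scikit-learn's GroupShuffleSplit with a hypothetical per-frame video identifier; the sizes and proportions are assumptions, not those of the study.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(1)
n_frames = 200
X = rng.normal(size=(n_frames, 42))          # placeholder 42-metric feature vectors
groups = rng.integers(0, 22, size=n_frames)  # hypothetical video id per frame (22 participants)

# Split at video level: frames sharing a video id never cross the train/test boundary.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, groups=groups))

# No video contributes frames to both sets, preventing the frame-level leakage
# suspected in the pilot models.
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

A second GroupShuffleSplit on the training portion would carve out the validation set in the same leakage-free way.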