Machine learning and deep learning techniques for natural language processing with application to audio recordings
Abstract
Many debt collection companies rely on research into data analysis methods that
can help them analyse their unstructured data, which holds information that could help them
assign their collection agents to accounts with a high probability of repayment. These accounts
are characterised by the debtor’s ability to repay, which depends on employment status among
many other driving factors. Unfortunately, analysing unstructured data is extremely challenging
because it comes in natural forms such as audio recordings, videos and images, to mention a few.
The aim of this study was to identify data analysis methods that can accurately predict the employment
status of the debtor using audio call recordings. The recordings were transcribed to text
using Automatic Speech Recognition (ASR) and then cleaned, and the transcribed text
was represented in numerical form using Term Frequency-Inverse Document Frequency (TF-IDF)
and Count Vectorizer (term count) features. The study then compared the accuracy of Artificial Neural
Network (ANN) and Naïve Bayes classifiers in predicting the employment status of the debtor. To
evaluate the performance of the ASR transcription method, word error rate (WER) was used; to
compare the ANN and Naïve Bayes text classifiers, accuracy, recall and F1-score were used. An
overall WER of 106.93 was achieved by the ASR method. ANN with TF-IDF
was identified as the best model for predicting employment status from transcribed audio
recordings.
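
For readers unfamiliar with WER, the metric can be sketched as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The sketch below is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `word_error_rate`, the lowercasing and the whitespace tokenisation are all illustrative choices. If the reported figure is a percentage, a value above 100 is possible because inserted words count as errors while the denominator is the reference length only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption (not from the study): text is lowercased and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    print(word_error_rate("i am employed", "i am unemployed"))
    print(word_error_rate("yes", "yes i am working now"))
```

In the second call the hypothesis contains four inserted words against a one-word reference, so the ratio exceeds 1 (that is, 100%).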