A regression model for predicting the likelihood of reporting a crime based on the victim’s demographic variables and their perceptions towards the police
Montshiwa, Tlhalitshi Volition
Despite growing criminal activity in South Africa, many victims still do not report crimes; there was therefore a need to understand the determinants of the likelihood of reporting a crime in the country. Binary logistic regression is a supervised machine learning algorithm that can assist in predicting the likelihood of reporting a crime, but the selection of relevant variables to include in the model varies from one author to another. Selecting theoretically sound and statistically relevant independent variables is key to achieving parsimonious multivariate models. This study sought to test the efficiency of some commonly used variable selection methods for logistic regression models in order to identify the most relevant determinants of the likelihood of reporting a crime of housebreaking. The study used 17 candidate variables, including the victims’ demographic variables and their perceptions of the police. The multivariate model fitted using stepwise selection was found to be the best fit for the data, based on the lowest AIC, the highest classification accuracy rate and the highest area under the Receiver Operating Characteristic (ROC) curve. The model fitted using the Hosmer-Lemeshow (H-L) algorithm was the worst fit for the data. The study revealed a limitation of the stepwise selection method: it may select different independent variables for each unique set of randomly selected observations from the same dataset. The study established a multivariate logistic regression model to predict the likelihood of a victim reporting a crime of housebreaking, and the determinants thereof.
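The three comparison criteria named in the abstract (AIC, classification accuracy and area under the ROC curve) can be illustrated with a minimal sketch. It does not use the study's data or code: the data are synthetic, the predictors `x1` (informative) and `x2` (pure noise) are hypothetical, and the model is fitted by plain gradient ascent rather than by whatever software the author used. AIC is computed as 2k − 2 ln L, where k is the number of estimated parameters, and the AUC as the proportion of (positive, negative) pairs in which the positive case receives the higher predicted probability.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, X):
    # w[0] is the intercept; w[1:] are the slope coefficients.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

def fit_logistic(X, y, lr=0.5, iters=500):
    # Batch gradient ascent on the mean log-likelihood (illustrative, not optimised).
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi, pi in zip(X, y, predict_proba(w, X)):
            err = yi - pi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def aic(w, X, y):
    # AIC = 2k - 2 ln L, with k = number of estimated parameters.
    eps = 1e-12
    ll = sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
             for pi, yi in zip(predict_proba(w, X), y))
    return 2 * len(w) - 2 * ll

def accuracy(w, X, y, threshold=0.5):
    # Share of observations classified correctly at the given cut-off.
    hits = sum((pi >= threshold) == (yi == 1) for pi, yi in zip(predict_proba(w, X), y))
    return hits / len(y)

def auc(w, X, y):
    # Pairwise (Mann-Whitney) form of the area under the ROC curve; ties count 0.5.
    probs = predict_proba(w, X)
    pos = [pi for pi, yi in zip(probs, y) if yi == 1]
    neg = [pi for pi, yi in zip(probs, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Synthetic data (assumption): only x1 truly affects the binary outcome.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < sigmoid(-0.5 + 1.5 * a) else 0 for a in x1]

# Two hypothetical candidate models, each with one predictor.
candidates = {"x1 only": [[a] for a in x1], "x2 only": [[b] for b in x2]}
results = {}
for name, X in candidates.items():
    w = fit_logistic(X, y)
    results[name] = {"aic": aic(w, X, y), "accuracy": accuracy(w, X, y), "auc": auc(w, X, y)}
    print(name, results[name])
```

A lower AIC and a higher accuracy and AUC indicate the better-fitting candidate. Refitting on a different random subset of the rows (for example by changing the seed) is one simple way to observe how a selection procedure's chosen variables can change from sample to sample.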