Interpretability of the random forest model under class imbalance.

Dube, Lindani; Verster, Tanja

dc.contributor.author	Dube, Lindani
dc.contributor.author	Verster, Tanja
dc.date.accessioned	2026-02-04T07:40:29Z
dc.date.issued	2024
dc.description	Journal Article. Centre for Business Mathematics & Informatics, North West University, Potchefstroom
dc.description.abstract	Class imbalance is a common challenge in predictive modeling, particularly in domains where certain classes are underrepresented. This study investigated how class imbalance affects both the performance and the interpretability of random forest models in churn and fraud detection scenarios. We trained and evaluated random forest models on churn datasets with class imbalances ranging from 20% to 50% and fraud datasets with imbalances from 1% to 15%. The results revealed consistent improvements in precision, recall, F1-score, and accuracy as class imbalance decreased, indicating that the models identify rare events more precisely and accurately when trained on more balanced datasets. Additionally, we employed interpretability techniques such as Shapley values, partial dependence plots (PDPs), and breakdown plots to elucidate the effect of class imbalance on model interpretability. Shapley values showed varying feature importance across different class distributions, with a general decrease as datasets became more balanced. PDPs illustrated a consistent upward trend in estimated values as datasets approached balance, indicating consistent relationships between input variables and predicted outcomes. Breakdown plots highlighted significant changes in individual predictions as class imbalance varied, underscoring the importance of considering class distribution in interpreting model outputs. These findings contribute to our understanding of the complex interplay between class balance, model performance, and interpretability, offering insights for developing more robust and reliable predictive models in real-world applications.
dc.identifier.citation	Dube, L. and Verster, T. 2024. Interpretability of the random forest model under class imbalance. Data Sci Financ Econ 4: 446–468. https://dx.doi.org/10.3934/DSFE.2024019
dc.identifier.uri	http://hdl.handle.net/10394/45850
dc.language.iso	en
dc.publisher	AIMS Press
dc.subject	Credit
dc.subject	Fraud
dc.subject	Modeling
dc.subject	Classification
dc.subject	Imbalance
dc.subject	Random forest
dc.subject	Interpretability
dc.title	Interpretability of the random forest model under class imbalance.
dc.type	Article
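
The abstract reports that precision and F1-score improve as class imbalance decreases. One mechanism that can contribute to such a trend is simple arithmetic: for a classifier whose true-positive and false-positive rates stay fixed, precision rises with the share of the minority class. The sketch below illustrates that arithmetic only. It is not the authors' method and does not reproduce their results; the function name and the rates `tpr=0.8` and `fpr=0.05` are hypothetical values chosen for illustration, while the minority-class shares are the endpoints of the ranges quoted in the abstract.

```python
def metrics_at_prevalence(prevalence, tpr=0.8, fpr=0.05):
    """Expected precision, recall and F1 for a classifier with a fixed
    true-positive rate (tpr) and false-positive rate (fpr).

    tpr and fpr are hypothetical illustration values, not figures
    from the paper.
    """
    tp = tpr * prevalence              # expected true-positive fraction
    fn = (1 - tpr) * prevalence        # expected false-negative fraction
    fp = fpr * (1 - prevalence)        # expected false-positive fraction
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # equals tpr by construction
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Minority-class shares: endpoints of the ranges quoted in the abstract.
for share in (0.01, 0.15, 0.20, 0.50):
    m = metrics_at_prevalence(share)
    print(f"share={share:.2f}  precision={m['precision']:.3f}  "
          f"recall={m['recall']:.3f}  f1={m['f1']:.3f}")
```

Recall is constant here by construction, so this sketch accounts only for the precision and F1 trend; the recall and accuracy improvements reported in the abstract would have to come from changes in the fitted model itself.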

Files

Original bundle

Name: Dube and Verster.pdf
Size: 1.69 MB
Format: Adobe Portable Document Format

Collections

Articles