Two statistical problems related to credit scoring
De la Rey, Tanja
MetadataShow full item record
This thesis focuses on two statistical problems related to credit scoring. In credit scoring of individuals, two classes are distinguished, namely low and high risk individuals (the so-called "good" and "bad" risk classes). Firstly, we suggest a measure which may be used to study the nature of a classifier for distinguishing between the two risk classes. Secondly, we derive a new method DOUW (detecting outliers using weights) which may be used to fit logistic regression models robustly and for the detection of outliers. In the first problem, the focus is on a measure which may be used to study the nature of a classifier. This measure transforms a random variable so that it has the same distribution as another random variable. Assuming a linear form of this measure, three methods for estimating the parameters (slope and intercept) and for constructing confidence bands are developed and compared by means of a Monte Carlo study. The application of these estimators is illustrated on a number of datasets. We also construct statistical hypothesis to test this linearity assumption. In the second problem, the focus is on providing a robust logistic regression fit and the identification of outliers. It is well-known that maximum likelihood estimators of logistic regression parameters are adversely affected by outliers. We propose a robust approach that also serves as an outlier detection procedure and is called DOUW. The approach is based on associating high and low weights with the observations as a result of the likelihood maximization. It turns out that the outliers are those observations to which low weights are assigned. This procedure depends on two tuning constants. A simulation study is presented to show the effects of these constants on the performance of the proposed methodology. The results are presented in terms of four benchmark datasets as well as a large new dataset from the application area of retail marketing campaign analysis. In the last chapter we apply the techniques developed in this thesis on a practical credit scoring dataset. We show that the DOUW method improves the classifier performance and that the measure developed to study the nature of a classifier is useful in a credit scoring context and may be used for assessing whether the distribution of the good and the bad risk individuals is from the same translation-scale family.