A study on models for credit scoring with support vector machines
Student thesis: Doctoral Thesis
Award date: 2 Oct 2008
Credit risk assessment is one of the most important topics in financial risk management. Owing to recent financial crises and the regulatory requirements of Basel II, credit risk assessment has become a major focus of the financial services and banking industries. For credit-granting institutions such as commercial banks and credit companies in particular, the ability to discriminate good customers from bad ones is crucial to the success of their business. Credit scoring is the main tool for credit risk assessment, and many quantitative methods have been widely used for credit scoring in finance and banking. Support vector machines (SVM) are a set of data-driven, supervised learning methods that do not require specific assumptions about the underlying data-generating process. This feature is particularly appealing in practical business situations where data are abundant or easily available but the theoretical model or underlying relationship is unknown. In many practical applications, SVM generalizes significantly better than competing methods. In this thesis, we study credit scoring models with SVM from three aspects. First, a direct search method is introduced to optimize the parameters of credit scoring models with SVM. Empirical experiments on two credit datasets show that SVM-based credit scoring models with direct search for parameter selection achieve good accuracy, consume less computational time than traditional search methods, and outperform commonly used methods in classification accuracy. Second, two new ensemble strategies are proposed for SVM ensemble models: a reliability-based strategy that minimizes correlations among classifiers, and a strategy that assigns weights according to performance on tough samples. The models were compared with 19 single methods in tests on real-world credit datasets. The results show that the ensemble models are robust and achieve higher classification accuracy.
In addition, a straightforward measure, d+-d-, is proposed in an attempt to describe the complexity of the testing samples relative to the training samples in binary classification problems. Third, a genetic algorithm (GA) based weighted SVM (GAWSVM) model is proposed for multi-objective credit scoring. This model can also provide the relative importance of the input features, whereas most previous research uses GA only for feature selection for SVM, which cannot determine the relative importance of selected or unselected features. Empirical experiments show that the GAWSVM model can implement multi-objective credit scoring models effectively. However, when the trade-offs between the objectives are not properly defined, the GAWSVM model may not be robust.
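To illustrate the idea of direct search for SVM parameter selection mentioned in the abstract, the sketch below uses compass search, one simple member of the direct search family. The abstract does not say which direct search variant the thesis uses, so this choice is an assumption. The function `compass_search`, the parametrization by (log2 C, log2 gamma), and the stand-in objective `toy_cv_error` are all hypothetical; in practice the objective would be a cross-validated error of an SVM trained with the candidate parameters.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def compass_search(
    objective: Callable[[Point], float],
    start: Point,
    step: float = 2.0,
    min_step: float = 0.05,
    max_evals: int = 500,
) -> Tuple[Point, float, int]:
    """Minimise `objective` over two parameters by compass search.

    Derivative-free: only objective values at trial points are compared.
    """
    best = start
    best_val = objective(best)
    evals = 1
    # Probe one step along each coordinate axis, in both directions.
    directions = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    while step >= min_step and evals < max_evals:
        improved = False
        for dx, dy in directions:
            cand = (best[0] + step * dx, best[1] + step * dy)
            val = objective(cand)
            evals += 1
            if val < best_val:
                # Accept the first improving move and restart the probe.
                best, best_val = cand, val
                improved = True
                break
        if not improved:
            # No neighbour is better: refine the search by halving the step.
            step /= 2.0
    return best, best_val, evals


def toy_cv_error(p: Point) -> float:
    """Hypothetical stand-in for cross-validated SVM error.

    Assumed smooth bowl with its minimum at log2 C = 3, log2 gamma = -5.
    """
    log_c, log_gamma = p
    return 0.20 + 0.01 * (log_c - 3.0) ** 2 + 0.02 * (log_gamma + 5.0) ** 2


best_point, best_error, n_evals = compass_search(toy_cv_error, start=(0.0, 0.0))
print(best_point, best_error, n_evals)
```

On this toy surface the search settles near the assumed minimum without evaluating a full grid of (C, gamma) pairs. This is the kind of saving in computational time the abstract attributes to direct search, though the sketch says nothing about the thesis's actual results.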
Keywords: Machine learning, Credit scoring systems, Algorithms