Comparison of ensemble cost sensitive algorithms: application to credit scoring prediction

Meryem Saidi — High School of Management, GBM Laboratory, Tlemcen University — miryem.saidi@gmail.com
Mostafa El Habib Daho — Biomedical Engineering Laboratory, Tlemcen University — mostafa.elhabibdaho@gmail.com
Nesma Settouti — Biomedical Engineering Laboratory, Tlemcen University — nesma.settouti@gmail.com
Mohammed El Amine Bechar — Biomedical Engineering Laboratory, Tlemcen University — am.bechar@gmail.com

Abstract

In recent years, the increase in the demand for credit has led financial institutions to consider artificial intelligence and machine learning techniques as a solution for making decisions in a reduced time. These decision support systems reach good results in classifying loan applications into good loans and bad loans. However, they suffer from some limitations; mainly, they consider that all misclassification errors have the same financial impact.
In this work, we study the performance of ensemble cost sensitive algorithms in reducing the most expensive errors. We apply these techniques to the German credit data. By comparing the different algorithms, we demonstrate the effectiveness of cost sensitive ensemble algorithms in determining the potential loan defaulters so as to reduce the financial cost.

Keywords: Cost sensitive learning, credit scoring, ensemble algorithms.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE18), Constantine, Algeria, 1-2 December 2018, published at http://ceur-ws.org

1 Introduction

Credit scoring is the process of analyzing credit files to decide the creditworthiness of an individual. Distinguishing a good applicant for a loan from a bad one is important to cut financial institutions' losses [AEW13]. The use of machine learning tools allows auditors to analyze large amounts of information for evaluating credit risk in a reasonable time [Yu17]. These algorithms tend to decrease the classification error and assume that all misclassifications have the same cost. However, the cost of labeling a positive example as negative is different from the cost of labeling a negative example as positive. Indeed, approving a bad loan is much more costly than rejecting a potentially good loan [KBC16]: if a borrower cannot fulfill the loan obligations, this may result in negative impacts on bank profits and big financial losses, whereas if a good loan is rejected, the bank only loses the expected profit.

On the other hand, credit datasets are highly imbalanced, which worsens the situation. Traditional machine learning algorithms tend to maximize accuracy by considering most of the cases as good loans (the majority class), thus causing significant default losses.

Motivated by this non-uniform cost classification problem, data mining researchers have proposed new cost-sensitive learning approaches that take into account the misclassification costs or other types of costs such as acquisition cost or computation cost [Tur00, Dom99, Elk01, Mar02]. Some studies have been conducted on the use of cost-sensitive (CS) learning in credit scoring, such as CS-boosted trees [XLL17], CS-neural networks [AGM+13], CS-decision trees [BAO15] and CS-logistic regression [BAO14].

The objective of this study is to compare the effectiveness of different techniques to assist the loan officer in screening out potential loan defaulters in the credit environment. The rest of the paper is framed as follows: Section 2 describes the used algorithms. Experimental results and discussion are presented in Section 3. Finally, we conclude with a summary of results and directions for future work.

2 Research methodology

In this section, we present the cost sensitive learning principle and the selected algorithms for the evaluation.

2.1 Cost sensitive learning

There are several methods to deal with unequal misclassification costs. The first one is to use a learning algorithm that takes the costs into account when building the classifier. The second strategy is to use sampling (oversampling and undersampling) to alter the class distribution of the training data. In cost-sensitive classification, the misclassification cost plays an important role in the learning process. A cost matrix is used to encode the penalty of misclassifying an example from one class as another [Dom99]. Table 1 represents a misclassification cost matrix, used to obtain the cost of a false positive (FP), false negative (FN), true positive (TP), and true negative (TN).

Table 1: Cost matrix

                     Predicted
Actual      Positive      Negative
Positive    C_TP          C_FN
Negative    C_FP          C_TN

The positive class is the most expensive class, and C(i, j) denotes the cost of predicting an instance from class i as class j. Usually, C(i, i) has a null or a negative cost, and the FN cost is more expensive than the FP cost (C(0, 1) > C(1, 0)). The best evaluation metric in cost sensitive learning is the total cost (see equation 1):

Total Cost = (FN * C_FN) + (FP * C_FP)    (1)

Cost-sensitive learning methods can be categorized into two categories: direct and indirect.

Direct methods

In the direct methods, the learning algorithm is itself cost-sensitive (CS): the CS learning algorithm uses the misclassification cost during the learning process. There are several works on cost-sensitive learning algorithms, such as ICET [Tur95], an evolutionary algorithm using a misclassification cost in its fitness function. Many cost sensitive decision tree approaches have been proposed [MZ12, Tur95, DHR+06, FCPB07, ZL16]. In [KK98], the authors perform a comparative study of different cost-sensitive neural networks. Other works propose cost sensitive ensemble methods [KW14, KWS14, SKWW07, MSV11, Mar99].

Indirect methods

The indirect methods, also called cost-sensitive meta-learning, convert existing cost-insensitive learning algorithms into cost-sensitive ones without modifying them. Cost-sensitive meta-learning relies on two major mechanisms: a pre-processing step that samples or weights the training dataset, and a threshold adjustment of the output of a cost-insensitive algorithm [Zha08]. In this category, we can cite MetaCost [Dom99], which manipulates the training set labels, Costing [ZLA03], Weighting [Tin02] and Empirical Thresholding [SL06].

2.2 Used algorithms

Classification and regression trees (CART)

Proposed by Breiman et al. [BFOS84], CART is a binary decision tree. This algorithm handles continuous and categorical attributes and targets. CART uses the Gini splitting rule to search for the best possible variable to split a node into two child nodes, and grows the tree to its maximum size until no further splits are possible.

Bagging

Bootstrap Aggregation (Bagging) is one of the earliest and simplest ensemble algorithms [Bre96]. The learners are fitted to bootstrap replicates of the training set, obtained by sampling randomly from the original set with replacement, i.e. an observation x_i may appear multiple times in a sample. After the base learners have been fit, the aggregated response is the majority vote.
Hence, bagging has no memory and is easily parallelized (as can be seen in Algorithm 1).

Algorithm 1 Bagging
Input: S = ((x_1, y_1), ..., (x_m, y_m)), P: the number of classifiers to train.
for p := 1 to P do
    S_p = Bootstrap(S), i.i.d. sampling with replacement from S.
    h_p = TrainClassifier(S_p).
    Add h_p to the ensemble.
end for

Boosting

Proposed by Schapire [SFBL97, SF12], boosting is a technique for sequentially combining multiple base classifiers whose combined performance is significantly better than that of any of the base classifiers. Each base classifier is trained on data that is weighted based on the performance of the previous classifier, and each classifier votes to obtain a final decision.

CS-CART

To generate a cost-sensitive CART algorithm, Breiman et al. [BFOS84] modify the class probabilities P(i) used in the information gain measure. Instead of estimating P(i) by N_i/N, it is weighted by the relative cost:

P(i) = cost(i)(N_i/N) / Σ_j cost(j)(N_j/N)

where cost(j), the cost of misclassifying an example of class j, is: cost(j) = Σ_i C_ij

CS-bagging

CS-bagging learns the different individual classifiers, then uses the available classifiers for a better estimation of the posterior probabilities according to a voting scheme. This approach is applicable regardless of the underlying learning method [Shi15] (see Algorithm 2).

Algorithm 2 CS-bagging
Input: S = ((x_1, y_1), ..., (x_m, y_m)), P: the number of classifiers to train.
for p := 1 to P do
    S_p = Bootstrap(S), i.i.d. sampling with replacement from S.
    h_p = TrainClassifier(S_p).
    Add h_p to the ensemble.
end for
for p := 1 to P do
    Ŷ_p(w) = the prediction with h_p.
end for
From the proportions observed over the P predictions, estimate P(Y = y_k | X(w)).
Make the prediction which minimizes the expected cost.

MetaCost

This algorithm was proposed by [Dom99]. MetaCost estimates the class probabilities, then relabels the training instances to minimize the expected cost. Finally, a new classifier is built on the relabeled dataset (see Algorithm 3).

Algorithm 3 MetaCost
Input: S = ((x_1, y_1), ..., (x_m, y_m)), L: cost matrix, H: classifier.
Estimate the class probabilities P(j | x_i).
Relabel: y_i = argmin_γ Σ_{j=1}^{k} P(j | x_i) L(γ, j), ∀i.
T = H(x, y).
Output: T.

We test different combinations of the former algorithms:

• The cost-insensitive classifiers: CART, BAGGING CART, BOOSTING CART.
• One-phase cost classifiers: CS-CART, BAGGING CS-CART, CS-BAGGING CART, MULTICOST CART, BOOSTING CS-CART.
• Two-phase cost classifier: CS-BAGGING CS-CART.

3 Experimentation

3.1 Dataset

The empirical evaluation was made on the German credit scoring dataset from the UCI Machine Learning Repository. This dataset consists of 20 features and 1000 instances, including 700 instances of credit-worthy applicants and 300 instances of insolvent customers who should not have been granted credit. The dataset is provided with a cost matrix (Table 2).

Table 2: Cost matrix

                          Predicted
Actual          Insolvent     Creditworthy
Insolvent       0             5
Creditworthy    1             0

3.2 Results and discussion

In this section, we present the results obtained by the different algorithms on the German Credit dataset. All computations were performed under Tanagra¹.
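CS-bagging and MetaCost share the same final step: given class-probability estimates and the cost matrix, predict the label with the lowest expected cost. A minimal sketch of that step, using the German credit cost matrix of Table 2; the vote proportions are made-up numbers for illustration, not results from this paper:

```python
# Cost matrix L[actual][predicted], as in Table 2: accepting an insolvent
# applicant costs 5, rejecting a creditworthy one costs 1.
COST = {
    "insolvent":    {"insolvent": 0, "creditworthy": 5},
    "creditworthy": {"insolvent": 1, "creditworthy": 0},
}

def min_cost_label(proba):
    """Return the prediction g minimizing sum_j P(j|x) * L(j, g)."""
    classes = list(COST)
    return min(
        classes,
        key=lambda g: sum(proba[j] * COST[j][g] for j in classes),
    )

# Suppose the bagged trees vote 30% insolvent / 70% creditworthy for some
# applicant (illustrative proportions).
proba = {"insolvent": 0.3, "creditworthy": 0.7}

# Expected cost of predicting "creditworthy": 0.3*5 + 0.7*0 = 1.5
# Expected cost of predicting "insolvent":    0.3*0 + 0.7*1 = 0.7
print(min_cost_label(proba))  # -> insolvent
```

Note that the majority vote here is "creditworthy", yet the cost-sensitive rule predicts "insolvent": with these costs, any posterior probability of insolvency above 1/6 is enough to tip the decision.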
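Equation 1 and the Table 2 cost matrix also make the evaluation concrete: two classifiers can rank differently on error and on cost. The sketch below recomputes the metrics from confusion-matrix counts; the counts are invented for illustration and are not taken from the experiments.

```python
def total_cost(fn, fp, c_fn=5, c_fp=1):
    """Total Cost = FN * C_FN + FP * C_FP (equation 1). With the German
    credit matrix, a missed insolvent applicant (FN) costs 5 and a
    rejected good applicant (FP) costs 1."""
    return fn * c_fn + fp * c_fp

def sensitivity(tp, fn):
    """Recognition rate of the positive (insolvent) class."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Recognition rate of the negative (creditworthy) class."""
    return tn / (tn + fp)

# Hypothetical confusion counts for 1000 applicants (300 insolvent,
# 700 creditworthy) -- illustrative numbers only.
accuracy_driven = dict(tp=120, fn=180, tn=650, fp=50)   # error 23.0%
cost_aware      = dict(tp=250, fn=50,  tn=480, fp=220)  # error 27.0%

for name, m in [("accuracy-driven", accuracy_driven),
                ("cost-aware", cost_aware)]:
    print(name, total_cost(m["fn"], m["fp"]))
```

Here the accuracy-driven classifier has the lower error (23% vs 27%) but a total cost of 950 against 470, because its errors fall mostly on the expensive insolvent class; this is exactly the trade-off the comparison in Section 3.2 examines.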
¹ http://eric.univ-lyon2.fr/~ricco/tanagra/fr/tanagra.html

Table 3: Classification Results

                        Error    Specificity    Sensitivity    Cost     Friedman test
CART                    27.57    88.52          35.87          47.68    7
CS-CART                 39.34    49.76          83.70          26.91    3.5
BAGGING CART            21.93    91.87          46.74          39.16    6.62
BAGGING CS-CART         36.88    55.50          80.43          27.35    4.76
CS-BAGGING CART         38.87    51.67          82.61          27.06    4
CS-BAGGING CS-CART      43.85    41.63          89.13          25.71    3
MULTICOST CART          40.53    49.76          81.52          28.40    5
BOOSTING CART           21.93    87.08          57.61          33.18    5.87
BOOSTING CS-CART        37.21    55.50          79.35          28.10    5.25

We compare the performances of the methods with different metrics: error, misclassification cost, sensitivity and specificity.

Error = (FP + FN) / (TP + TN + FP + FN)

Cost = FP * C_FP + FN * C_FN

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

Table 3 presents the general results of the nine algorithms. Following the recommendation of [TSTL12], we employ the non-parametric Friedman test to compare the classifiers. The Friedman test ranks the algorithms: the best performing gets rank 1, the second best rank 2, and so on. The last column depicts this statistical test.

A number of conclusions emerge from this table. First, it emphasizes the superiority of ensemble methods compared to the individual classifier. When we consider the classification error, the best performances are reached by classical bagging and boosting. However, these algorithms focus on improving the classification accuracy at the expense of the minority class, so they obtain a low sensitivity, which increases the cost.

On the other hand, CS-bagging of CS-CART obtains the lowest misclassification cost, followed by the individual classifier CS-CART. In this case, the statistical improvement is not significant (just 1.19), so one could consider that this small improvement is not worth the computational cost. However, in some cases a small gain in performance represents a great gain in economic benefits, albeit this technique obtains the highest classification error.

Figure 1 compares the average results of the different methods. In Figures 2 and 3, we can see error vs. cost and specificity vs. sensitivity for each classifier. Considering those values, we can suppose that BOOSTING CART offers a good trade-off between error and misclassification cost, followed by BAGGING CART. But it is clear that even if the ensemble cost-insensitive algorithms obtain generally good results, they promote a good classification of the majority class at the expense of the minority class (insolvent loans). However, when we carry out a bagging of cost sensitive trees, the recognition of the insolvent loans is higher without a major loss in specificity. On the other hand, CS-bagging obtains a better sensitivity but a lower specificity.

From a general point of view, if we consider both misclassification error and cost, we can say that the winner in this study is a bagging of CS-trees.

Figure 1: Classification error vs. misclassification cost

Figure 2: Sensitivity vs. specificity

4 Conclusion

In recent years, the number of insolvent loans has increased due to the financial crisis. It has become necessary for banks to find new methods for the evaluation of credit applications. Machine learning techniques have been used to perform financial decision making. However, these methods are intended to minimize the misclassification error and assume that the different errors are equal. Cost sensitive techniques are used to handle the misclassification cost in many real world problems.

In this paper, we compare the performance of different cost-sensitive and cost-insensitive ensemble algorithms in determining the creditworthiness of an individual. The experiments drew the following conclusions: (1) the ensemble approaches obtain better results than the individual classifier; (2) the cost-insensitive approaches reach the best classification accuracy, but since the class distribution is highly imbalanced, the minority class (insolvent loans) is less well recognized; (3) the cost sensitive approaches reduce the cost at the expense of the accuracy.

Finally, we found that the cost sensitive bagging algorithm offers the best trade-off between accuracy and misclassification cost. For future research, we aim to use techniques to handle imbalanced datasets and experiment with other cost sensitive algorithms.

References

[AEW13] Shweta Arya, Catherine Eckel, and Colin Wichman. Anatomy of the credit score. Journal of Economic Behavior & Organization, 95:175-185, 2013.

[AGM+13] R. Alejo, V. García, A. I. Marqués, J. S. Sánchez, and J. A. Antonio-Velázquez. Making accurate credit risk predictions with cost-sensitive MLP neural networks. In Jorge Casillas, Francisco J. Martínez-López, Rosa Vicari, and Fernando De la Prieta, editors, Management Intelligent Systems, pages 1-8, Heidelberg, 2013. Springer International Publishing.

[BAO14] Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. Example-dependent cost-sensitive logistic regression for credit scoring. In 13th International Conference on Machine Learning and Applications, 2014.

[BAO15] Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. Example-dependent cost-sensitive decision trees. Expert Systems with Applications, 42(19):6609-6619, 2015.

[BFOS84] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification And Regression Trees. Chapman and Hall, New York, 1984.

[Bre96] L. Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.

[DHR+06] J. V. Davis, J. Ha, C. J. Rossbach, H. E. Ramadan, and E. Witchel. Cost-sensitive decision tree learning for forensic classification. In Proceedings of the 17th European Conference on Machine Learning, pages 622-629, 2006.

[Dom99] Pedro M. Domingos. MetaCost: a general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 155-164, 1999.

[Elk01] Charles Elkan. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'01, pages 973-978, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.

[FCPB07] A. Freitas, A. Costa-Pereira, and P. Brazdil. Cost-sensitive decision trees applied to medical data. In Data Warehousing and Knowledge Discovery, pages 303-312, 2007.

[KBC16] Yeonkook J. Kim, Bok Baik, and Sungzoon Cho. Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Systems with Applications, 62:32-43, 2016.

[KK98] M. Kukar and I. Kononenko. Cost-sensitive learning with neural networks. In Proceedings of the Thirteenth European Conference on Artificial Intelligence, Chichester, NY, 1998.

[KW14] Bartosz Krawczyk and Michał Woźniak. Evolutionary cost-sensitive ensemble for malware detection. In International Joint Conference SOCO'14-CISIS'14-ICEUTE'14, pages 433-442, Cham, 2014. Springer International Publishing.

[KWS14] Bartosz Krawczyk, Michał Woźniak, and Gerald Schaefer. Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, 14:554-562, 2014.

[Mar99] D. D. Margineantu. Building ensembles of classifiers for loss minimization. In Proceedings of the 31st Symposium on the Interface: Models, Predictions and Computing, pages 190-194, 1999.

[Mar02] Dragos Dorin Margineantu. Methods for Cost-sensitive Learning. PhD thesis, Oregon State University, Corvallis, OR, USA, 2002. AAI3029569.

[MSV11] H. Masnadi-Shirazi and N. Vasconcelos. Cost-sensitive boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):294-309, 2011.

[MZ12] F. Min and W. Zhu. A competition strategy to cost-sensitive decision trees. In Rough Sets and Knowledge Technology, pages 359-368, 2012.

[SF12] R. E. Schapire and Y. Freund. Boosting: Foundations and Algorithms. The MIT Press, 2012.

[SFBL97] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997.

[Shi15] S. A. Shilbayeh. Cost sensitive meta learning. PhD thesis, School of Computing, Science and Engineering, University of Salford, Manchester, UK, 2015.

[SKWW07] Yanmin Sun, Mohamed S. Kamel, Andrew K. C. Wong, and Yang Wang. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40(12):3358-3378, 2007.

[SL06] V. S. Sheng and C. X. Ling. Thresholding for making classifiers cost-sensitive. In Proceedings of the National Conference on Artificial Intelligence, Boston, Massachusetts, 2006.

[Tin02] K. M. Ting. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering, 14(3):659-665, 2002.

[TSTL12] Bogdan Trawiński, Magdalena Smętek, Zbigniew Telec, and Tadeusz Lasota. Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. Applied Mathematics and Computer Science, 22(4):867-881, 2012.

[Tur95] P. Turney. Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research (JAIR), 2:369-409, 1995.

[Tur00] Peter D. Turney. Types of cost in inductive concept learning. In Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, volume cs.LG/0212034, 2000.

[XLL17] Yufei Xia, Chuanzhe Liu, and Nana Liu. Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending. Electronic Commerce Research and Applications, 24:30-49, 2017.

[Yu17] Xiaojiao Yu. Machine learning application in online lending risk prediction. ArXiv e-prints, July 2017.

[Zha08] Huimin Zhao. Instance weighting versus threshold adjusting for cost-sensitive classification. Knowledge and Information Systems, 15(3):321-334, June 2008.

[ZL16] H. Zhao and X. Li. A cost sensitive decision tree algorithm based on weighted class distribution with batch deleting attribute mechanism. Information Sciences, 2016.

[ZLA03] Bianca Zadrozny, John Langford, and Naoki Abe. Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the Third IEEE International Conference on Data Mining, ICDM '03, pages 435-, Washington, DC, USA, 2003. IEEE Computer Society.