
International Conference on Advanced Aspects of Software Engineering ICAASE, December

Comparison of ensemble cost sensitive algorithms: application to credit scoring prediction

Meryem Saidi

miryem.saidi@gmail.com 1

Nesma Settouti

nesma.settouti@gmail.com 0

Mostafa El Habib Daho

mostafa.elhabibdaho@gmail.com 0

Mohammed El Amine Bechar

am.bechar@gmail.com 0

0 Biomedical Engineering Laboratory, Tlemcen University
1 High School of Management, GBM Laboratory, Tlemcen University

2018


Keywords: cost-sensitive learning, credit scoring, ensemble algorithms

In recent years, the growing demand for credit has led financial institutions to consider artificial intelligence and machine learning techniques as a way to make decisions in less time. These decision support systems achieve good results in classifying loan applications into good loans and bad loans. However, they suffer from some limitations; chiefly, they assume that all misclassification errors have the same financial impact.

In this work, we study the performance of ensemble cost-sensitive algorithms in reducing the most expensive errors. We apply these techniques to the German credit data. By comparing the different algorithms, we demonstrate the effectiveness of cost-sensitive ensemble algorithms in identifying potential loan defaulters and thereby reducing the financial cost.

In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE18), Constantine, Algeria, 1-2 December 2018, published at http://ceur-ws.org

1 Introduction

Credit scoring is the process of analyzing credit files to decide the creditworthiness of an individual. Distinguishing a good loan applicant from a bad one is important to limit a financial institution's losses [AEW13]. Machine learning tools allow auditors to analyze large amounts of information and evaluate credit risk in a reasonable time [Yu17].

These algorithms tend to minimize the classification error and assume that all misclassifications have the same cost. However, the cost of labeling a positive example as negative differs from the cost of labeling a negative example as positive. Indeed, approving a bad loan is much more costly than rejecting a potentially good loan [KBC16]. If a borrower cannot fulfill the loan obligations, the bank's profits suffer and large financial losses may follow, whereas rejecting a good loan causes a smaller loss of profit.

Moreover, credit datasets are highly imbalanced, which worsens the situation. Traditional machine learning algorithms tend to maximize accuracy by treating most cases as good loans (the majority class), thus causing significant default losses.

Motivated by the non-uniform cost classification problem, data mining researchers have proposed cost-sensitive learning approaches that take into account misclassification costs or other types of cost, such as acquisition cost or computation cost [Tur00, Dom99, Elk01, Mar02]. Several studies have applied cost-sensitive (CS) learning to credit scoring, including a CS-boosted tree [XLL17], a CS-neural network [AGM+13], a CS-decision tree [BAO15] and CS-logistic regression [BAO14].

The objective of this study is to compare the effectiveness of different techniques in assisting the loan officer to screen out potential loan defaulters. The rest of the paper is organized as follows: Section 2 describes the algorithms used. Experimental results and discussion are presented in Section 3. Finally, we conclude with a summary of results and directions for future work.

2 Research methodology

In this section, we present the cost-sensitive learning principle and the algorithms selected for the evaluation.

2.1 Cost sensitive learning

There are several methods to deal with unequal misclassification costs. The first is to use a learning algorithm that takes the costs into account when building the classifier. The second is to use sampling (over-sampling and under-sampling) to alter the class distribution of the training data. In cost-sensitive classification, the misclassification cost plays an important role in the learning process. A cost matrix encodes the penalty of misclassifying an example from one class as another [Dom99]. Table 1 shows a misclassification cost matrix, which gives the cost of a false positive (FP), false negative (FN), true positive (TP), and true negative (TN).

Table 1: Misclassification cost matrix (columns: actual class, Positive and Negative).

The positive class is the most expensive class, and C(i, j) denotes the cost of predicting an instance from class i as class j. Usually, C(i, i) is zero or negative, and the cost of a FN is higher than the cost of a FP (C(0, 1) > C(1, 0)). The most suitable evaluation metric in cost-sensitive learning is the total cost (see equation 1).

$$\text{Total Cost} = (FN \times C_{FN}) + (FP \times C_{FP}) \qquad (1)$$

The cost-sensitive learning methods can be grouped into two categories: direct and indirect.
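As an illustration of Equation (1), the following minimal sketch counts the four outcomes for a binary problem and computes the total cost. The function names, the toy labels and the cost values (5 for a false negative, 1 for a false positive) are assumptions chosen for the example, not values taken from the experiments.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def total_cost(fn, fp, cost_fn, cost_fp):
    """Equation (1): cost of the errors only; correct decisions cost nothing here."""
    return fn * cost_fn + fp * cost_fp


# Hypothetical labels: 1 is the positive (most expensive) class.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
cost = total_cost(fn, fp, cost_fn=5, cost_fp=1)  # illustrative costs
print(tp, fp, fn, tn, cost)
```

Two classifiers with the same error rate can therefore have very different total costs, depending on how their errors split between FN and FP.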

Direct methods

In the direct method, the learning algorithm is itself cost-sensitive (CS): it uses the misclassification cost during the learning process. There are several works on cost-sensitive learning algorithms, such as ICET [Tur95], an evolutionary algorithm that uses the misclassification cost in the fitness function. Many cost-sensitive decision tree approaches have been proposed [MZ12, Tur95, DHR+06, FCPB07, ZL16]. In [KK98], the authors perform a comparative study of different cost-sensitive neural networks. Other studies propose cost-sensitive ensemble methods [KW14, KWS14, SKWW07, MSV11, Mar99].

Indirect methods

The indirect methods, called cost-sensitive meta-learning, convert existing cost-insensitive learning algorithms into cost-sensitive ones without modifying them. Cost-sensitive meta-learning relies on two major mechanisms: pre-processing the training dataset by instance sampling or weighting, and adjusting the threshold on the output of a cost-insensitive algorithm [Zha08]. In this category, we can cite MetaCost [Dom99], which manipulates the training set labels, Costing [ZLA03], Weighting [Tin02] and Empirical Thresholding [SL06].

2.2 Used algorithms

Classification and regression trees (CART)

Proposed by Breiman et al. [BFOS84], CART is a binary decision tree. The algorithm handles continuous and categorical attributes and targets. CART uses the Gini splitting rule to search for the best variable with which to split a node into two child nodes, and grows the tree to its maximum size, until no further split is possible.
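The Gini splitting rule can be sketched as follows for a single numeric attribute: each candidate threshold is scored by the size-weighted Gini impurity of the two child nodes, and the lowest score wins. This is a simplified illustration rather than the full CART procedure; the attribute values, labels and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, impurity) of the best binary split 'x <= threshold'."""
    n = len(values)
    best = (None, gini(labels))  # no split: impurity of the parent node
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical attribute (e.g. age) and loan labels.
ages = [22, 25, 30, 41, 47, 52]
labels = ["bad", "bad", "bad", "good", "good", "good"]

threshold, impurity = best_threshold(ages, labels)
print(threshold, impurity)
```

A full tree repeats this search over every attribute at every node and recurses on the two children until no split reduces the impurity.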

Bagging

Bootstrap aggregation (Bagging) is one of the earliest and simplest ensemble algorithms [Bre96]. The learners are fitted to bootstrap replicates of the training set, obtained by sampling randomly with replacement from the original set, i.e., an observation x_i may appear multiple times in a sample. After the base learners have been fitted, the aggregated response is the majority vote. Since Bagging keeps no memory between learners, it is easily parallelized (as can be seen in Algorithm 1).

Algorithm 1 Bagging Algorithm

Input: S = ((x_1, y_1), ..., (x_m, y_m)), P: the number of classifiers to train.
for p := 1 to P do
    S_p = Bootstrap(S), i.i.d. sampling with replacement from S.
    h_p = TrainClassifier(S_p).
    Add h_p to the ensemble.
end for
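The bagging loop, followed by the majority vote described in the text, might be sketched as below. The base learner is a deliberately simple one-dimensional threshold rule standing in for CART, and the data, names and number of learners are hypothetical.

```python
import random
from collections import Counter


def train_stump(sample):
    """Hypothetical base learner: the rule 'predict 1 if x > t' with fewest errors."""
    best_t, best_err = None, None
    for t, _ in sample:
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x, t=best_t: int(x > t)


def bagging(data, n_learners, rng):
    """Train n_learners classifiers, each on its own bootstrap replicate."""
    ensemble = []
    for _ in range(n_learners):
        sample = [rng.choice(data) for _ in data]  # sampling with replacement
        ensemble.append(train_stump(sample))
    return ensemble


def predict(ensemble, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(h(x) for h in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical training set of (feature, label) pairs.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
ensemble = bagging(data, n_learners=25, rng=random.Random(0))
print(predict(ensemble, 0), predict(ensemble, 10))
```

Because each iteration depends only on its own bootstrap sample, the loop in `bagging` could be distributed across workers without any change to the result.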

Boosting

Proposed by Schapire [SFBL97, SF12], boosting is a technique for sequentially combining multiple base classifiers so that the combined performance is significantly better than that of any single base classifier. Each base classifier is trained on data weighted according to the performance of the previous classifier, and the classifiers vote to obtain the final decision.
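The reweighting idea can be illustrated with one round of AdaBoost, a common boosting variant. The text does not state which variant was used in the experiments, so AdaBoost is an assumption here, as are the toy labels and the function name.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step for labels in {-1, +1}.

    Returns the voting weight alpha of the current classifier and the
    normalized instance weights passed to the next classifier.
    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * math.log((1 - err) / err)
    raw = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Hypothetical round: the classifier misclassifies only the second instance.
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]
weights = [0.25, 0.25, 0.25, 0.25]

alpha, weights = adaboost_round(weights, y_true, y_pred)
print(alpha, weights)
```

After the update, the misclassified instance carries half of the total weight, so the next classifier is pushed to get it right.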

CS-CART

To generate a cost-sensitive CART algorithm, Breiman et al. [BFOS84] modify the class probabilities P(i) used in the information gain measure. Instead of estimating P(i) by N_i/N, it is weighted by the relative cost:

$$P(i) = \frac{C_{ij}\,(N_i/N)}{\sum_{j} \mathrm{cost}(j)\,(N_j/N)}$$

where $C_{ij}$ is the cost of misclassifying an example of class $j$ as class $i$, and $\mathrm{cost}(j) = \sum_{i} C_{ij}$.
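A small numerical sketch of this cost weighting is given below. It rests on one interpretation that the text does not spell out: the weight applied to class i in the numerator is taken to be cost(i), the total cost of misclassifying an example of class i. The class counts and the costs (5 and 1) are illustrative.

```python
def altered_priors(class_counts, cost):
    """Cost-weighted class probabilities.

    class_counts[j] is N_j; cost[i][j] is the cost of predicting class i
    for an example whose true class is j.
    Assumption: the numerator weight for class j is cost(j) = sum_i cost[i][j].
    """
    classes = list(class_counts)
    n = sum(class_counts.values())
    total = {j: sum(cost[i][j] for i in classes) for j in classes}
    weighted = {j: total[j] * class_counts[j] / n for j in classes}
    norm = sum(weighted.values())
    return {j: weighted[j] / norm for j in classes}


# Illustrative counts and costs: accepting a bad loan costs 5,
# rejecting a good loan costs 1.
counts = {"good": 700, "bad": 300}
cost = {
    "good": {"good": 0, "bad": 5},
    "bad": {"good": 1, "bad": 0},
}

priors = altered_priors(counts, cost)
print(priors)
```

Under these assumptions the minority class, which holds 30% of the examples, receives about 68% of the probability mass, so splits that isolate it become more attractive to the tree.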

CS-bagging

CS-bagging first learns the individual classifiers, then uses them to better estimate the posterior probabilities through a voting scheme. This approach is applicable regardless of the underlying learning method [Shi15].
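The decision step of CS-bagging can be sketched as follows: the share of votes for each class serves as the posterior estimate, and the predicted class is the one with the lowest expected cost. The votes, the cost values and the function names are hypothetical.

```python
def vote_probabilities(predictions, classes):
    """Estimate P(Y = k | x) as the share of ensemble members voting for k."""
    return {k: predictions.count(k) / len(predictions) for k in classes}


def min_cost_prediction(probs, cost):
    """Pick the class with the lowest expected cost.

    cost[i][j] is the cost of predicting class i when the true class is j.
    """
    expected = {i: sum(probs[j] * cost[i][j] for j in probs) for i in probs}
    return min(expected, key=expected.get), expected


# Hypothetical votes of P = 10 bagged classifiers for one applicant.
votes = ["good"] * 7 + ["bad"] * 3
probs = vote_probabilities(votes, ["good", "bad"])

# Illustrative costs: accepting a bad loan costs 5, rejecting a good one costs 1.
cost = {
    "good": {"good": 0, "bad": 5},
    "bad": {"good": 1, "bad": 0},
}

label, expected = min_cost_prediction(probs, cost)
print(label, expected)
```

In this example the majority vote would accept the loan, whereas the cost-minimizing decision rejects it, because a 30% risk of default outweighs the smaller cost of turning away a good applicant.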

MetaCost

This algorithm was proposed in [Dom99]. MetaCost estimates the class probabilities, then relabels the training instances to minimize the expected cost. Finally, a new classifier is built on the relabeled dataset.
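The relabel-then-retrain idea might look like the sketch below. The class probabilities are supplied directly as hypothetical inputs rather than estimated from an ensemble, and the final learner is a toy cutoff rule; both are simplifications made for the example.

```python
def metacost_relabel(probabilities, cost):
    """Give each training instance the label with the lowest expected cost.

    probabilities[i][j] is P(j | x_i); cost[c][j] is the cost of predicting
    class c when the true class is j.
    """
    labels = []
    for p in probabilities:
        risk = {c: sum(p[j] * cost[c][j] for j in p) for c in cost}
        labels.append(min(risk, key=risk.get))
    return labels


def train_cutoff(xs, ys):
    """Hypothetical learner: predict 'bad' up to the largest x labeled 'bad'."""
    cutoff = max(x for x, y in zip(xs, ys) if y == "bad")
    return lambda x: "bad" if x <= cutoff else "good"


# Hypothetical feature values and estimated probabilities of default.
xs = [1, 2, 3, 4, 5, 6]
p_bad = [0.9, 0.7, 0.4, 0.2, 0.1, 0.05]
probabilities = [{"bad": p, "good": 1 - p} for p in p_bad]

# Illustrative costs: accepting a bad loan costs 5, rejecting a good one costs 1.
cost = {
    "good": {"good": 0, "bad": 5},
    "bad": {"good": 1, "bad": 0},
}

relabeled = metacost_relabel(probabilities, cost)
final_classifier = train_cutoff(xs, relabeled)  # trained on the new labels
print(relabeled, final_classifier(4), final_classifier(5))
```

The final classifier never sees the cost matrix; the costs reach it only through the modified labels, which is why the approach works with any base learner.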

We test different combinations of the former algorithms:

Cost-insensitive classifiers: CART, BAGGING of CART, BOOSTING of CART.

Algorithm 2 CS-bagging Algorithm

Input: S = ((x_1, y_1), ..., (x_m, y_m)), P: the number of classifiers to train.
for p := 1 to P do
    S_p = Bootstrap(S), i.i.d. sampling with replacement from S.
    h_p = TrainClassifier(S_p).
    Add h_p to the ensemble.
end for
for p := 1 to P do
    Set Ŷ_p(w) to the prediction of h_p.
end for
From the proportions observed over the P predictions, estimate P(Y = y_k | X(w)).
Make the prediction that minimizes the cost.

Algorithm 3 MetaCost Algorithm

Input: S = ((x_1, y_1), ..., (x_m, y_m)), L: cost matrix, H: classifier.
Estimate the class probabilities P(y_i | x_i).
Relabel $y_i = \arg\min_{c} \sum_{j=1}^{k} P(j \mid x_i)\, L(c, j)$, for all i.
T = H(x, y).
Output: T.

One-phase cost classifiers: CS-CART, BAGGING CS-CART, CS-BAGGING CART, MULTI COST CART, BOOSTING CS-CART.

Two-phase cost classifier: CS-BAGGING CS-CART

3 Experimentation

3.1 Dataset

The empirical evaluation was made on the German credit scoring dataset from the UCI Machine Learning Repository. This dataset consists of 20 features and 1000 instances, including 700 instances of creditworthy applicants and 300 instances of insolvent customers who should not have been granted credit. This dataset is provided with a cost matrix.

Cost matrix of the German credit dataset (columns: actual class, Insolvent and Creditworthy).

Table 3 presents the general results of the nine algorithms. Following the recommendation of [TSTL12], we employ the non-parametric Friedman test to compare the classifiers. The Friedman test ranks the algorithms: the best-performing algorithm receives rank 1, the second best rank 2, and so on. The last column reports the result of the statistical test.
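The ranking step and the Friedman statistic can be sketched as follows. The score table is entirely hypothetical (rows are evaluation settings, columns are algorithms, lower is better) and is not taken from Table 3; the statistic uses the standard chi-square form without tie correction.

```python
def ranks(scores):
    """Rank one row of scores: 1 for the lowest value, ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result


def friedman(table):
    """Return the mean rank of each algorithm and the Friedman chi-square statistic."""
    n, k = len(table), len(table[0])
    ranked = [ranks(row) for row in table]
    mean_ranks = [sum(col) / n for col in zip(*ranked)]
    stat = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, stat


# Hypothetical scores of three algorithms over four evaluation settings.
table = [
    [0.30, 0.25, 0.28],
    [0.32, 0.26, 0.27],
    [0.31, 0.24, 0.29],
    [0.33, 0.27, 0.26],
]

mean_ranks, stat = friedman(table)
print(mean_ranks, stat)
```

The statistic is then compared with a chi-square distribution with k - 1 degrees of freedom to decide whether the differences in mean rank are significant.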

A number of conclusions emerge from this table. First, it highlights the superiority of ensemble methods over the individual classifier. In terms of classification error, the best performance is reached by classical bagging and boosting. However, these algorithms improve the classification accuracy at the expense of the minority class, so they obtain a low sensitivity, which increases the cost.

On the other hand, CS-bagging of CS-CART obtains the lowest misclassification cost, followed by the individual classifier CS-CART. In this case, the improvement is not statistically significant (just 1.19). One may consider that this small improvement is not worth the computational cost; however, in some cases a small gain in performance represents a large economic benefit. This technique nevertheless obtains the highest classification error.

Figure 1 compares the average results of the different methods. Figures 2 and 3 show, for each classifier, error versus cost and specificity versus sensitivity, respectively. Considering those values, we can suppose

4 Conclusion

In recent years, the number of insolvent loans has increased due to the financial crisis. Banks therefore need new methods for evaluating credit applications. Machine learning techniques have been used to support financial decision making. However, these methods aim to minimize the misclassification error and assume that all errors are equal. Cost-sensitive techniques are used to handle the misclassification cost in many real-world problems.

In this paper, we compare the performance of different cost-sensitive and cost-insensitive ensemble algorithms in determining the creditworthiness of an individual. The experiments lead to the following conclusions: (1) the ensemble approaches obtain better results than the individual classifier; (2) the cost-insensitive approaches reach the best classification accuracy, but since the class distribution is highly imbalanced, the minority class (insolvent loans) is less well recognized; (3) the cost-sensitive approaches reduce the cost at the expense of accuracy.

Finally, we found that the cost-sensitive bagging algorithm offers the best trade-off between accuracy and misclassification cost. For future research, we aim to use techniques for handling imbalanced datasets and to experiment with other cost-sensitive algorithms.

References

[AEW13] Anatomy of the credit score. Journal of Economic Behavior & Organization, 95:175-185, 2013.

[AGM+13] Alejo, García, A. I. Marqués, J. S. Sánchez, and J. A. Antonio-Velázquez. Making accurate credit risk predictions with cost-sensitive MLP neural networks. In Jorge Casillas, Francisco J. Martínez-López, Rosa Vicari, and Fernando De la Prieta, editors, Management Intelligent Systems, pages 1-8, Heidelberg, 2013. Springer International Publishing.

[BAO14] Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. Example-dependent cost-sensitive logistic regression for credit scoring. In 13th International Conference on Machine Learning and Applications, 2014.

[BAO15] Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. Example-dependent cost-sensitive decision trees. Expert Systems with Applications, 42(19):6609-6619, 2015.

[BFOS84] Stone. Classification and Regression Trees. Chapman and Hall, New York, 1984.

[Bre96] Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.

[DHR+06] J.V. Davis, Ha, C.J. Rossbach, H.E. Ramadan, and Witchel. Cost-sensitive decision tree learning for forensic classification. In Proceedings of the 17th European Conference on Machine Learning, pages 622-629, 2006.

[Dom99] Pedro Domingos. MetaCost: a general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 155-164, 1999.

[Elk01] Charles Elkan. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'01, pages 973-978, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.

[FCPB07] Data Warehousing and Knowledge Discovery, pages 303-312, 2007.

[KBC16] Yeonkook J. Kim, Bok Baik, and Sungzoon Cho. Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Systems with Applications, 62:32-43, 2016.

[KK98] Kukar and I. Kononenko. Cost-sensitive learning with neural networks. In Proceedings of the Thirteenth European Conference on Artificial Intelligence, Chichester, NY., 1998.

[KW14] Bartosz Krawczyk and Michał Woźniak. Evolutionary cost-sensitive ensemble for malware detection. In International Joint Conference SOCO'14-CISIS'14-ICEUTE'14, pages 433-442, Cham, 2014.

[KWS14] Bartosz Krawczyk, Michał Woźniak, and Gerald Schaefer. Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, 14:554-562, 2014.

[Mar99] D.D. Margineantu. Building ensembles of classifiers for loss minimization. In Proceedings of the 31st Symposium on the Interface, Models, Predictions and Computing, pages 190-194, 1999.

[Mar02] Dragos Dorin Margineantu. Methods for Cost-sensitive Learning. PhD thesis, Oregon State University, Corvallis, OR, USA, 2002. AAI3029569.

[MSV11] Masnadi-Shirazi and Vasconcelos. Cost-sensitive boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):294-309, 2011.

[MZ12] Min and Zhu. A competition strategy to cost-sensitive decision trees. Rough Sets and Knowledge Technology, pages 359-368, 2012.

[SF12] R.E. Schapire and Freund. Boosting: Foundations and Algorithms. The MIT Press, 2012.

[SFBL97] Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997.

[Shi15] S.A. Shilbayeh. Cost sensitive meta learning. PhD thesis, School of Computing, Science and Engineering, University of Salford, Manchester, UK, 2015.

[SKWW07] Wong and Yang Wang. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40(12):3358-3378, 2007.

[SL06] V. S. Sheng and C. X. Ling. Thresholding for making classifiers cost-sensitive. In Proceedings of the st national conference on artificial intelligence, Boston, Massachusetts, 2006.

[Tin02] K. M. Ting. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering, 14(3):659-665, 2002.

[TSTL12] Bogdan Trawinski, Magdalena Smetek, Zbigniew Telec, and Tadeusz Lasota. Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. Applied Mathematics and Computer Science, 22(4):867-881, 2012.

[Tur95] Turney. Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research (JAIR), 2:369-409, 1995.

[Tur00] Peter D. Turney. Types of cost in inductive concept learning. In Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, volume cs.LG/0212034, 2000.

[XLL17] Yufei Xia, Chuanzhe Liu, and Nana Liu. Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending. Electronic Commerce Research and Applications, 24:30-49, 2017.

[Yu17] Xiaojiao Yu. Machine learning application in online lending risk prediction. ArXiv e-prints, July 2017.

[Zha08] Huimin Zhao. Instance weighting versus threshold adjusting for cost-sensitive classification. Knowledge and Information Systems, 15(3):321-334, Jun 2008.

[ZL16] Zhao and Li. A cost sensitive decision tree algorithm based on weighted class distribution with batch deleting attribute mechanism. Information Sciences, 2016.

[ZLA03] Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the Third IEEE International Conference on Data Mining, ICDM '03, pages 435-, Washington, DC, USA, 2003. IEEE Computer Society.