Comparison of ensemble cost sensitive algorithms: application to credit scoring prediction

Meryem Saidi — High School of Management, GBM Laboratory, Tlemcen University — miryem.saidi@gmail.com
Mostafa El Habib Daho — Biomedical Engineering Laboratory, Tlemcen University — mostafa.elhabibdaho@gmail.com
Nesma Settouti — Biomedical Engineering Laboratory, Tlemcen University — nesma.settouti@gmail.com
Mohammed El Amine Bechar — Biomedical Engineering Laboratory, Tlemcen University — am.bechar@gmail.com

Abstract

In recent years, the increase in the demand for credit has led financial institutions to consider artificial intelligence and machine learning techniques as a solution for making decisions in a reduced time. These decision support systems reach good results in classifying loan applications into good loans and bad loans. However, they suffer from some limitations; mainly, they consider that all misclassification errors have the same financial impact.
In this work, we study the performance of ensemble cost sensitive algorithms in reducing the most expensive errors. We apply these techniques to the German credit data. By comparing the different algorithms, we demonstrate the effectiveness of cost sensitive ensemble algorithms in determining the potential loan defaulters so as to reduce the financial cost.

Keywords: Cost sensitive learning, credit scoring, ensemble algorithms.

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In: Proceedings of the 3rd Edition of the International Conference on Advanced Aspects of Software Engineering (ICAASE18), Constantine, Algeria, 1-2 December 2018, published at http://ceur-ws.org

1 Introduction

Credit scoring is the process of analyzing credit files to decide the creditworthiness of an individual. Distinguishing a good applicant for a loan from a bad one is important to cut financial institutions' losses [AEW13]. The use of machine learning tools allows auditors to analyze large amounts of information for evaluating credit risk in a reasonable time [Yu17]. These algorithms tend to decrease the classification error and assume that all misclassifications have the same cost. However, the cost of labeling a positive example as negative is different from the cost of labeling a negative example as positive. Indeed, approving a bad loan is much more costly than rejecting a potentially good loan [KBC16]: if a borrower cannot fulfill the loan obligations, this may result in negative impacts on bank profits and big financial losses, whereas if a good loan is rejected, the bank only loses the expected profit.

On the other hand, credit datasets are highly imbalanced, which worsens the situation. Traditional machine learning algorithms tend to maximize accuracy by considering most of the cases as good loans (the majority class), thus causing significant default losses.

Motivated by this non-uniform cost classification problem, data mining researchers have proposed new cost-sensitive learning approaches that take into account the misclassification costs or other types of costs such as acquisition cost or computation cost [Tur00, Dom99, Elk01, Mar02]. Some studies have been conducted on the use of cost-sensitive (CS) learning in credit scoring, such as CS-boosted trees [XLL17], CS-neural networks [AGM+13], CS-decision trees [BAO15] and CS-logistic regression [BAO14].

The objective of this study is to compare the effectiveness of different techniques to assist the loan officer in screening out potential loan defaulters in the credit environment. The rest of the paper is framed as follows: Section 2 describes the used algorithms. Experimental results and discussion are presented in Section 3. Finally, we conclude with a summary of results and directions for future work.

2 Research methodology

In this section, we present the cost sensitive learning principle and the selected algorithms for the evaluation.

2.1 Cost sensitive learning

There are several methods to deal with unequal misclassification costs. The first one is to use a learning algorithm that takes the costs into account when building the classifier. The second strategy is to use sampling (oversampling and undersampling) to alter the class distribution of the training data. In cost-sensitive classification, the misclassification cost plays an important role in the learning process. A cost matrix is used to encode the penalty of misclassifying an example from one class as another [Dom99]. Table 1 represents a misclassification cost matrix, used to obtain the cost of a false positive (FP), false negative (FN), true positive (TP), and true negative (TN).

Table 1: Cost matrix

                     Predicted
Actual      Positive      Negative
Positive    C_TP          C_FN
Negative    C_FP          C_TN

The positive class is the most expensive class, and C(i, j) denotes the cost of predicting an instance from class i as class j. Usually, C(i, i) has a null or a negative cost, and the FN cost is more expensive than the FP cost (C(0, 1) > C(1, 0)). The best evaluation metric in cost sensitive learning is the total cost (see equation 1):

Total Cost = (FN * C_FN) + (FP * C_FP)    (1)

Cost-sensitive learning methods can be categorized into two categories: direct and indirect.

Direct methods

In the direct methods, the learning algorithm is itself cost-sensitive (CS): the CS learning algorithm uses the misclassification cost during the learning process. There are several works on cost-sensitive learning algorithms, such as ICET [Tur95], an evolutionary algorithm using a misclassification cost in its fitness function. Many cost sensitive decision tree approaches have been proposed [MZ12, Tur95, DHR+06, FCPB07, ZL16]. In [KK98], the authors perform a comparative study of different cost-sensitive neural networks. Other works propose cost sensitive ensemble methods [KW14, KWS14, SKWW07, MSV11, Mar99].

Indirect methods

The indirect methods, also called cost-sensitive meta-learning, convert existing cost-insensitive learning algorithms into cost-sensitive ones without modifying them. Cost-sensitive meta-learning relies on two major mechanisms: a pre-processing step that samples or weights the training dataset, and a threshold adjustment of the output of a cost-insensitive algorithm [Zha08]. In this category, we can cite MetaCost [Dom99], which manipulates the training set labels, Costing [ZLA03], Weighting [Tin02] and Empirical Thresholding [SL06].

2.2 Used algorithms

Classification and regression trees (CART)

Proposed by Breiman et al. [BFOS84], CART is a binary decision tree. This algorithm handles continuous and categorical attributes and targets. CART uses the Gini splitting rule to search for the best possible variable to split a node into two child nodes, and grows the tree to its maximum size until no further splits are possible.

Bagging

Bootstrap Aggregation (Bagging) is one of the earliest and simplest ensemble algorithms [Bre96]. The learners are fitted to bootstrap replicates of the training set, obtained by sampling randomly from the original set with replacement, i.e. an observation x_i may appear multiple times in a sample. After the base learners have been fit, the aggregated response is the majority vote.
Hence, bagging has no memory and is easily parallelized (as can be seen in Algorithm 1).

Algorithm 1 Bagging
Input: S = ((x_1, y_1), ..., (x_m, y_m)), P: the number of classifiers to train.
for p := 1 to P do
    S_p = Bootstrap(S), i.i.d. sampling with replacement from S.
    h_p = TrainClassifier(S_p).
    Add h_p to the ensemble.
end for

Boosting

Proposed by Schapire [SFBL97, SF12], boosting is a technique for sequentially combining multiple base classifiers whose combined performance is significantly better than that of any of the base classifiers. Each base classifier is trained on data that is weighted based on the performance of the previous classifier, and each classifier votes to obtain a final decision.

CS-CART

To generate a cost-sensitive CART algorithm, Breiman et al. [BFOS84] modify the class probabilities P(i) used in the information gain measure. Instead of estimating P(i) by N_i/N, it is weighted by the relative cost:

P(i) = cost(i)(N_i/N) / Σ_j cost(j)(N_j/N)

where cost(j), the cost of misclassifying an example of class j, is: cost(j) = Σ_i C_ij

CS-bagging

CS-bagging learns the different individual classifiers, then uses the available classifiers for a better estimation of the posterior probabilities according to a voting scheme. This approach is applicable regardless of the underlying learning method [Shi15] (see Algorithm 2).

Algorithm 2 CS-bagging
Input: S = ((x_1, y_1), ..., (x_m, y_m)), P: the number of classifiers to train.
for p := 1 to P do
    S_p = Bootstrap(S), i.i.d. sampling with replacement from S.
    h_p = TrainClassifier(S_p).
    Add h_p to the ensemble.
end for
for p := 1 to P do
    Ŷ_p(w) = the prediction with h_p.
end for
From the proportions observed over the P predictions, estimate P(Y = y_k | X(w)).
Make the prediction which minimizes the expected cost.

MetaCost

This algorithm was proposed by [Dom99]. MetaCost estimates the class probabilities, then relabels the training instances to minimize the expected cost. Finally, a new classifier is built on the relabeled dataset (see Algorithm 3).

Algorithm 3 MetaCost
Input: S = ((x_1, y_1), ..., (x_m, y_m)), L: cost matrix, H: classifier.
Estimate the class probabilities P(j | x_i).
Relabel: y_i = argmin_γ Σ_{j=1}^{k} P(j | x_i) L(γ, j), ∀i.
T = H(x, y).
Output: T.

We test different combinations of the former algorithms:

• The cost-insensitive classifiers: CART, BAGGING CART, BOOSTING CART.
• One-phase cost classifiers: CS-CART, BAGGING CS-CART, CS-BAGGING CART, MULTICOST CART, BOOSTING CS-CART.
• Two-phase cost classifier: CS-BAGGING CS-CART.

3 Experimentation

3.1 Dataset

The empirical evaluation was made on the German credit scoring dataset from the UCI Machine Learning Repository. This dataset consists of 20 features and 1000 instances, including 700 instances of credit-worthy applicants and 300 instances of insolvent customers who should not have been granted credit. The dataset is provided with a cost matrix (Table 2).

Table 2: Cost matrix

                          Predicted
Actual          Insolvent     Creditworthy
Insolvent       0             5
Creditworthy    1             0

3.2 Results and discussion

In this section, we present the results obtained by the different algorithms on the German Credit dataset. All computations were performed under Tanagra¹.
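CS-bagging and MetaCost share the same final step: given class-probability estimates and the cost matrix, predict the label with the lowest expected cost. A minimal sketch of that step, using the German credit cost matrix of Table 2; the vote proportions are made-up numbers for illustration, not results from this paper:

```python
# Cost matrix L[actual][predicted], as in Table 2: accepting an insolvent
# applicant costs 5, rejecting a creditworthy one costs 1.
COST = {
    "insolvent":    {"insolvent": 0, "creditworthy": 5},
    "creditworthy": {"insolvent": 1, "creditworthy": 0},
}

def min_cost_label(proba):
    """Return the prediction g minimizing sum_j P(j|x) * L(j, g)."""
    classes = list(COST)
    return min(
        classes,
        key=lambda g: sum(proba[j] * COST[j][g] for j in classes),
    )

# Suppose the bagged trees vote 30% insolvent / 70% creditworthy for some
# applicant (illustrative proportions).
proba = {"insolvent": 0.3, "creditworthy": 0.7}

# Expected cost of predicting "creditworthy": 0.3*5 + 0.7*0 = 1.5
# Expected cost of predicting "insolvent":    0.3*0 + 0.7*1 = 0.7
print(min_cost_label(proba))  # -> insolvent
```

Note that the majority vote here is "creditworthy", yet the cost-sensitive rule predicts "insolvent": with these costs, any posterior probability of insolvency above 1/6 is enough to tip the decision.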
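Equation 1 and the Table 2 cost matrix also make the evaluation concrete: two classifiers can rank differently on error and on cost. The sketch below recomputes the metrics from confusion-matrix counts; the counts are invented for illustration and are not taken from the experiments.

```python
def total_cost(fn, fp, c_fn=5, c_fp=1):
    """Total Cost = FN * C_FN + FP * C_FP (equation 1). With the German
    credit matrix, a missed insolvent applicant (FN) costs 5 and a
    rejected good applicant (FP) costs 1."""
    return fn * c_fn + fp * c_fp

def sensitivity(tp, fn):
    """Recognition rate of the positive (insolvent) class."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Recognition rate of the negative (creditworthy) class."""
    return tn / (tn + fp)

# Hypothetical confusion counts for 1000 applicants (300 insolvent,
# 700 creditworthy) -- illustrative numbers only.
accuracy_driven = dict(tp=120, fn=180, tn=650, fp=50)   # error 23.0%
cost_aware      = dict(tp=250, fn=50,  tn=480, fp=220)  # error 27.0%

for name, m in [("accuracy-driven", accuracy_driven),
                ("cost-aware", cost_aware)]:
    print(name, total_cost(m["fn"], m["fp"]))
```

Here the accuracy-driven classifier has the lower error (23% vs 27%) but a total cost of 950 against 470, because its errors fall mostly on the expensive insolvent class; this is exactly the trade-off the comparison in Section 3.2 examines.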
¹ http://eric.univ-lyon2.fr/~ricco/tanagra/fr/tanagra.html

Table 3: Classification Results

                        Error    Specificity    Sensitivity    Cost     Friedman test
CART                    27.57    88.52          35.87          47.68    7
CS-CART                 39.34    49.76          83.70          26.91    3.5
BAGGING CART            21.93    91.87          46.74          39.16    6.62
BAGGING CS-CART         36.88    55.50          80.43          27.35    4.76
CS-BAGGING CART         38.87    51.67          82.61          27.06    4
CS-BAGGING CS-CART      43.85    41.63          89.13          25.71    3
MULTICOST CART          40.53    49.76          81.52          28.40    5
BOOSTING CART           21.93    87.08          57.61          33.18    5.87
BOOSTING CS-CART        37.21    55.50          79.35          28.10    5.25

We compare the performances of the methods with different metrics: error, misclassification cost, sensitivity and specificity.

Error = (FP + FN) / (TP + TN + FP + FN)

Cost = FP * C_FP + FN * C_FN

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

Table 3 presents the general results of the nine algorithms. Following the recommendation of [TSTL12], we employ the non-parametric Friedman test to compare the classifiers. The Friedman test ranks the algorithms: the best performing gets rank 1, the second best rank 2, and so on. The last column depicts this statistical test.

A number of conclusions emerge from this table. First, it emphasizes the superiority of ensemble methods compared to the individual classifier. When we consider the classification error, the best performances are reached by classical bagging and boosting. However, these algorithms focus on improving the classification accuracy at the expense of the minority class, so they obtain a low sensitivity, which increases the cost.

On the other hand, CS-bagging of CS-CART obtains the lowest misclassification cost, followed by the individual classifier CS-CART. In this case, the statistical improvement is not significant (just 1.19), so one could consider that this small improvement is not worth the computational cost. However, in some cases a small gain in performance represents a great gain in economic benefits, albeit this technique obtains the highest classification error.

Figure 1 compares the average results of the different methods. In Figures 2 and 3, we can see error vs. cost and specificity vs. sensitivity for each classifier. Considering those values, we can suppose that BOOSTING CART offers a good trade-off between error and misclassification cost, followed by BAGGING CART. But it is clear that even if the ensemble cost-insensitive algorithms obtain generally good results, they promote a good classification of the majority class at the expense of the minority class (insolvent loans). However, when we carry out a bagging of cost sensitive trees, the recognition of the insolvent loans is higher without a major loss in specificity. On the other hand, CS-bagging obtains a better sensitivity but a lower specificity.

From a general point of view, if we consider both misclassification error and cost, we can say that the winner in this study is a bagging of CS-trees.

Figure 1: Classification error vs. misclassification cost

Figure 2: Sensitivity vs. specificity

4 Conclusion

In recent years, the number of insolvent loans has increased due to the financial crisis. It has become necessary for banks to find new methods for the evaluation of credit applications. Machine learning techniques have been used to perform financial decision making. However, these methods are intended to minimize the misclassification error and assume that the different errors are equal. Cost sensitive techniques are used to handle the misclassification cost in many real world problems.

In this paper, we compare the performance of different cost-sensitive and cost-insensitive ensemble algorithms in determining the creditworthiness of an individual. The experiments drew the following conclusions: (1) the ensemble approaches obtain better results than the individual classifier; (2) the cost-insensitive approaches reach the best classification accuracy, but since the class distribution is highly imbalanced, the minority class (insolvent loans) is less well recognized; (3) the cost sensitive approaches reduce the cost at the expense of the accuracy.

Finally, we found that the cost sensitive bagging algorithm offers the best trade-off between accuracy and misclassification cost. For future research, we aim to use techniques to handle imbalanced datasets and experiment with other cost sensitive algorithms.

References

[AEW13] Shweta Arya, Catherine Eckel, and Colin Wichman. Anatomy of the credit score. Journal of Economic Behavior & Organization, 95:175-185, 2013.

[AGM+13] R. Alejo, V. García, A. I. Marqués, J. S. Sánchez, and J. A. Antonio-Velázquez. Making accurate credit risk predictions with cost-sensitive MLP neural networks. In Jorge Casillas, Francisco J. Martínez-López, Rosa Vicari, and Fernando De la Prieta, editors, Management Intelligent Systems, pages 1-8, Heidelberg, 2013. Springer International Publishing.

[BAO14] Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. Example-dependent cost-sensitive logistic regression for credit scoring. In 13th International Conference on Machine Learning and Applications, 2014.

[BAO15] Alejandro Correa Bahnsen, Djamila Aouada, and Björn Ottersten. Example-dependent cost-sensitive decision trees. Expert Systems with Applications, 42(19):6609-6619, 2015.

[BFOS84] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification And Regression Trees. Chapman and Hall, New York, 1984.

[Bre96] L. Breiman. Bagging predictors. Machine Learning, 24:123-140, 1996.

[DHR+06] J. V. Davis, J. Ha, C. J. Rossbach, H. E. Ramadan, and E. Witchel. Cost-sensitive decision tree learning for forensic classification. In Proceedings of the 17th European Conference on Machine Learning, pages 622-629, 2006.

[Dom99] Pedro M. Domingos. MetaCost: a general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 155-164, 1999.

[Elk01] Charles Elkan. The foundations of cost-sensitive learning. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'01, pages 973-978, San Francisco, CA, USA, 2001. Morgan Kaufmann Publishers Inc.

[FCPB07] A. Freitas, A. Costa-Pereira, and P. Brazdil. Cost-sensitive decision trees applied to medical data. In Data Warehousing and Knowledge Discovery, pages 303-312, 2007.

[KBC16] Yeonkook J. Kim, Bok Baik, and Sungzoon Cho. Detecting financial misstatements with fraud intention using multi-class cost-sensitive learning. Expert Systems with Applications, 62:32-43, 2016.

[KK98] M. Kukar and I. Kononenko. Cost-sensitive learning with neural networks. In Proceedings of the Thirteenth European Conference on Artificial Intelligence, Chichester, NY, 1998.

[KW14] Bartosz Krawczyk and Michał Woźniak. Evolutionary cost-sensitive ensemble for malware detection. In International Joint Conference SOCO'14-CISIS'14-ICEUTE'14, pages 433-442, Cham, 2014. Springer International Publishing.

[KWS14] Bartosz Krawczyk, Michał Woźniak, and Gerald Schaefer. Cost-sensitive decision tree ensembles for effective imbalanced classification. Applied Soft Computing, 14:554-562, 2014.

[Mar99] D. D. Margineantu. Building ensembles of classifiers for loss minimization. In Proceedings of the 31st Symposium on the Interface: Models, Predictions and Computing, pages 190-194, 1999.

[Mar02] Dragos Dorin Margineantu. Methods for Cost-sensitive Learning. PhD thesis, Oregon State University, Corvallis, OR, USA, 2002. AAI3029569.

[MSV11] H. Masnadi-Shirazi and N. Vasconcelos. Cost-sensitive boosting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(2):294-309, 2011.

[MZ12] F. Min and W. Zhu. A competition strategy to cost-sensitive decision trees. In Rough Sets and Knowledge Technology, pages 359-368, 2012.

[SF12] R. E. Schapire and Y. Freund. Boosting: Foundations and Algorithms. The MIT Press, 2012.

[SFBL97] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. In Machine Learning: Proceedings of the Fourteenth International Conference, 1997.

[Shi15] S. A. Shilbayeh. Cost sensitive meta learning. PhD thesis, School of Computing, Science and Engineering, University of Salford, Manchester, UK, 2015.

[SKWW07] Yanmin Sun, Mohamed S. Kamel, Andrew K. C. Wong, and Yang Wang. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognition, 40(12):3358-3378, 2007.

[SL06] V. S. Sheng and C. X. Ling. Thresholding for making classifiers cost-sensitive. In Proceedings of the National Conference on Artificial Intelligence, Boston, Massachusetts, 2006.

[Tin02] K. M. Ting. An instance-weighting method to induce cost-sensitive trees. IEEE Transactions on Knowledge and Data Engineering, 14(3):659-665, 2002.

[TSTL12] Bogdan Trawiński, Magdalena Smętek, Zbigniew Telec, and Tadeusz Lasota. Nonparametric statistical analysis for multiple comparison of machine learning regression algorithms. Applied Mathematics and Computer Science, 22(4):867-881, 2012.

[Tur95] P. Turney. Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm. Journal of Artificial Intelligence Research (JAIR), 2:369-409, 1995.

[Tur00] Peter D. Turney. Types of cost in inductive concept learning. In Workshop on Cost-Sensitive Learning at the Seventeenth International Conference on Machine Learning, volume cs.LG/0212034, 2000.

[XLL17] Yufei Xia, Chuanzhe Liu, and Nana Liu. Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending. Electronic Commerce Research and Applications, 24:30-49, 2017.

[Yu17] Xiaojiao Yu. Machine learning application in online lending risk prediction. ArXiv e-prints, July 2017.

[Zha08] Huimin Zhao. Instance weighting versus threshold adjusting for cost-sensitive classification. Knowledge and Information Systems, 15(3):321-334, June 2008.

[ZL16] H. Zhao and X. Li. A cost sensitive decision tree algorithm based on weighted class distribution with batch deleting attribute mechanism. Information Sciences, 2016.

[ZLA03] Bianca Zadrozny, John Langford, and Naoki Abe. Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the Third IEEE International Conference on Data Mining, ICDM '03, pages 435-, Washington, DC, USA, 2003. IEEE Computer Society.