<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BoostEMM – Transparent Boosting using Exceptional Model Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon van der Zon</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oren Zeev Ben Mordehai</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Vrijdag</string-name>
          <email>t.s.vrijdag@student.tue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Werner van Ipenburg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Veldsink</string-name>
          <email>jan.veldsink@rabobank.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wouter Duivesteijn</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Pechenizkiy</string-name>
          <email>m.pechenizkiy@tue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cooperatieve Rabobank U.A.</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Boosting is an iterative ensemble-learning paradigm. In every iteration, a weak predictor learns a classification task, taking into account performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have been shown to be effective models at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might imply limitations for its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people might throw up serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan because the underlying algorithm made a decision specifically focusing on the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and explicit definitions of the subgroups of people being boosted enhance the transparency of the algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>Boosting</kwd>
        <kwd>class imbalance</kwd>
        <kwd>Exceptional Model Mining</kwd>
        <kwd>model transparency</kwd>
        <kwd>responsible analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>decisions of the earlier, blunter base classifiers with the later, more specific ones,
an ensemble classifier is built that performs well overall. Hence, AdaBoost as
a mechanism shines in devoting appropriate attention of a classifier to those
records of the dataset that prove to be problematic.</p>
      <p>For all its strengths, AdaBoost also comes with two weaknesses. The records
proving problematic for the classifier might encompass outliers, which could
lead to overfitting. Moreover, in the age of responsible data analytics, we want
to know not only what our algorithm does, but also why it is reasonable. A
recent European Parliament resolution on the future of robotics and artificial
intelligence in Europe contains the following [9, Section "Ethical principles",
point 12]:
[...] it should always be possible to supply the rationale behind any
decision taken with the aid of AI that can have a substantive impact on
one or more persons' lives; [...] it must always be possible to reduce the
AI system's computations to a form comprehensible by humans;
If the decision in a dataset on loan applications is outsourced to AdaBoost,
the customer might demand insight into the reasoning behind rejection. When
this resolution is put into law, it is likely that a customer demanding insight into
the reasoning AdaBoost deployed behind its decision (which has a substantive
impact on the customer's life) must be presented with not only how AdaBoost
decides where to focus its extra attention, but also why. Hence, in the near
future, financial institutions will shy away from the liability associated with
using AdaBoost if there is no transparent form of boosting.</p>
      <p>
        In this paper, we fill that void by proposing BoostEMM. This method
combines AdaBoost-style iterative learning of base classifiers, where the focus shifts
towards problematic parts of the input space, with Exceptional Model Mining (EMM)
[
        <xref ref-type="bibr" rid="ref18 ref5">18,5</xref>
        ]. EMM is a local pattern mining method, designed to find subgroups (subsets
of the dataset at hand) that satisfy two conditions. On the one hand, subgroups
must be interpretable. This is typically enforced by only allowing subgroups
that can be defined as a conjunction of a few conditions on input attributes of
the dataset; hence subgroups come in terms that any domain expert can
understand. On the other hand, subgroups must be exceptional. This is typically
formalized in terms of an unusual interaction between several target attributes;
hence subgroups represent unusual behavior in the dataset. We choose targets
that represent the actual class label and the label predicted by base classifiers,
and define several quality measures that gauge unusual interaction between those
targets. Hence, for various types of bad base classifier performance, EMM finds
coherent parts of the classifier input space where this behavior occurs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Main Contributions</title>
      <p>We provide BoostEMM, a method encompassing core ideas from both AdaBoost
and Exceptional Model Mining. By dovetailing these techniques, BoostEMM
achieves two benefits over AdaBoost:
1. by defining specific EMM variants (cf. Section 3.3), we steer boosting to
punish specific kinds of bad behavior (error rate, class imbalance, FPR/TPR),
which is relevant for cost-sensitive applications;
2. by dovetailing an EMM run with every iteration of the boosting algorithm,
at every step we can report the subgroups where extra boosting is needed. This
adds transparency to the boosting process (cf. Section 3.4), which is relevant
in the light of looming EU law.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          The groundbreaking paper on AdaBoost in its traditional form is [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]; this form
is also explained in Section 3. A version incorporating class probabilities was
introduced in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Convex potential boosting algorithms (which include
AdaBoost) cannot handle random classification noise well [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]; boosting tends to
overfit towards the noisily labeled records of the dataset, reducing the generality of
the learned classification model.
        </p>
        <p>
          The resurgence of neural networks through the deep learning hype has led
to a reappreciation of complex classifiers that perform extremely well on the
task for which they have been designed, but whose internal reasoning is far too
complex for a human to fully understand. As a reaction, papers have emerged that
take a peek into the black box. Some of the first such papers include [
          <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
          ] for
hard classifiers, and [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for soft classifiers. These papers share the objective of
transparency with BoostEMM, but they do not (as BoostEMM does) loop back
the interpretable results into the classification process to improve performance.
        </p>
        <p>
          The study of how local patterns can aid global models was the topic of the
LeGo workshop [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. This workshop encompassed papers both enhancing existing
classifiers with local patterns and combining local patterns into a classifier. A few
years later, LeGo-EMM [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] enhanced multi-label classifiers with local patterns
found through Exceptional Model Mining with the Bayesian networks model
class [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which improved multi-label SVM classification. These methods have in
common that the learning process is a single straight line: first local patterns are mined,
then a subset of those patterns is selected, and finally that subset is used to
enhance or replace the feature set of a classifier, with which predictions
are subsequently made. Hence, LeGo papers share the incorporation of local pattern
knowledge in classification with BoostEMM, but they do not (as BoostEMM does)
loop back the output into an iterative learning process.
        </p>
        <p>
          Exceptional Model Mining seeks to find subgroups of a dataset where several
targets interact in an unusual manner. A simpler cousin of EMM is Subgroup
Discovery (SD) [
          <xref ref-type="bibr" rid="ref14 ref15 ref21">15,21,14</xref>
          ]: the task of finding subgroups of a dataset where a
single target displays an unusual distribution. SD is closely related to Contrast
Set Mining (CSM) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and Emerging Pattern Mining (EPM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]; the results of
the latter technique have also been exploited to enhance classification [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], in the
style of the LeGo workshop papers discussed in the previous paragraph. The
relation between SD, CSM, and EPM is explored in detail in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>The BoostEMM</title>
      </sec>
      <sec id="sec-2-3">
        <title>Method</title>
        <p>We are given a dataset $\Omega$, which is a bag of $N$ records $r \in \Omega$ of the form $r =
(a_1, \ldots, a_k, \ell)$, where $\{a_1, \ldots, a_k\}$ are the input attributes of the dataset, taken
from some collective domain $\mathcal{A}$, and $\ell$ is the class label, typically taken to be
binary. If we need to refer to a specific record (or its corresponding data
components), we do so by superscripts: the $i$th record is denoted $r^i$, $\ell^i$ is its class label,
and the value of its $j$th input attribute is $a^i_j$. We also use the shorthand $a^i$ to
denote the collective input attribute values of $r^i$: $a^i = (a^i_1, \ldots, a^i_k)$.</p>
        <p>The goal of classification is to find a mapping $P$ from the input attribute
space to the class label, $P : \mathcal{A} \to \{0,1\}$, such that we can predict the latter
for unseen instances of the former. In boosting, these predictions are improved
through the following methodology. We iteratively build one strong learner or
expert $P = (E, A)$. This is done by constructing an ensemble $E$ of $h$ weak
learners or predictors (i.e. classifiers that perform (slightly) better than random),
$E = (P_1, \ldots, P_h)$, and an associated tuple $A = (\alpha_1, \ldots, \alpha_h)$ of weights related
to the performance of each predictor. AdaBoost obtains these weights for each
weak classifier $P_j$ by a transformation (cf. Section 3.2) of its error rate $\mathit{err}_j$. In
each iteration of the boosting process, a new weak learner $P_j$ is constructed,
taking into account the whole training set but also the up-to-date priorities,
or weights, of the records. These weights are maintained as another tuple of $N$
weights $W$, associated with the records of the dataset: $W = (w_1, \ldots, w_N)$. In the
first iteration, all training data (unless given initial weights by the end user) are
initialized with equal weights $w_i = 1/N$, and the first weak learner is trained. In
subsequent iterations, the weights are increased for all records that are classified
incorrectly by $P_j$, after which the weights are normalized. In later sections, we
replace this selection mechanism for the erroneous records by an equivalent EMM
function defining the subgroups to be boosted. AdaBoost updates the record weights
$W$ in a manner similar to the predictor weights $A$, by a transformation (cf. Section 3.2)
based on $\mathit{err}_j$.</p>
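        <p>To make this interplay concrete, the following Python sketch outlines one possible shape of the loop. The helpers <italic>train_weak_learner</italic> and <italic>mine_top_subgroups</italic> are hypothetical stand-ins (they are not the authors' implementation), and the weight update anticipates the scheme of Section 3.2.</p>
        <preformat><![CDATA[
import numpy as np

def boost_emm(X, y, T, train_weak_learner, mine_top_subgroups):
    """Minimal sketch of the BoostEMM loop; both helper functions are assumed."""
    N = len(y)
    w = np.full(N, 1.0 / N)            # record weights W, uniform start
    ensemble, alphas = [], []
    for _ in range(T):
        P_j = train_weak_learner(X, y, sample_weight=w)   # weak learner P_j
        pred = P_j.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)         # err_j
        alpha = np.log((1.0 - err) / err)                 # alpha_j
        # BoostEMM: instead of boosting every misclassified record,
        # boost every record covered by at least one top-q subgroup.
        masks = mine_top_subgroups(X, y, pred, w)         # list of boolean masks
        covered = np.any(np.vstack(masks), axis=0)
        w = w * np.exp(alpha * covered)
        w = w / np.sum(w)                                 # normalize
        ensemble.append(P_j)
        alphas.append(alpha)
    return ensemble, alphas
]]></preformat>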
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Exceptional Model Mining</title>
      <p>We are given a dataset $\Omega$, which is a bag of $N$ records $r \in \Omega$ of the form $r = (a_1, \ldots, a_k,
t_1, \ldots, t_m)$, where $\{a_1, \ldots, a_k\}$ are the descriptors of the dataset, and $\{t_1, \ldots, t_m\}$
the targets. The goal of EMM is to find subgroups of the dataset at hand, defined
in terms of a conjunction of a few conditions on single descriptors of the dataset
(e.g.: $a_7 \geq 3 \wedge a_3 = \mathrm{true}$), for which the targets interact in an unusual manner.</p>
      <p>Definition 1 (Subgroup). A subgroup corresponding to a description $D$ is
the bag of records $G_D$ that $D$ covers, i.e.
$$G_D = \{\, r^i \in \Omega \mid D(a^i) = 1 \,\}$$
From now on we omit the $D$ if no confusion can arise, and refer to the coverage
of a subgroup by $n = |G|$.</p>
      <p>In order to objectively evaluate a candidate description in a given dataset,
we need to define a quality measure. For each description $D$ in a user-defined
description language $\mathcal{D}$, this function quantifies the extent to which the subgroup
$G_D$ corresponding to the description deviates from the norm.</p>
      <p>Definition 2 (Quality Measure). A quality measure is a function $\varphi : \mathcal{D} \to \mathbb{R}$
that assigns a numeric value to a description $D$.</p>
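      <p>As an illustration only (not the notation of any particular EMM framework), a description can be represented as a conjunction of conditions on single descriptors, and its subgroup as the boolean coverage mask it induces:</p>
      <preformat><![CDATA[
import numpy as np

# A description D: a conjunction of (column, comparison, threshold) conditions
# on single descriptors. Its subgroup G_D is the bag of covered records.
def make_description(conditions):
    def D(A):                              # A: descriptor matrix, shape (N, k)
        mask = np.ones(len(A), dtype=bool)
        for col, cmp, val in conditions:
            mask &= cmp(A[:, col], val)
        return mask                        # mask[i] == True  <=>  D(a^i) = 1
    return D

# Example mirroring the text: a_7 >= 3  AND  a_3 = true
D = make_description([(7, np.greater_equal, 3), (3, np.equal, True)])
# coverage n = |G_D| for some descriptor matrix A:  n = D(A).sum()
]]></preformat>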
    </sec>
    <sec id="sec-4">
      <title>Mining Descriptions for Boosting and Updating the Weights</title>
      <p>The input attributes in classification/boosting correspond to the descriptors in
EMM. Having trained a weak learner $P_j$, we generate targets that reflect how
well the classification performs on each record. We explore several choices for
unusual interaction between these targets in Section 3.3. Having thus defined
a model class for EMM, we run the beam search algorithm [5, Algorithm 1]
to generate a set $\mathcal{D}_{\text{top-}q}$ of subgroups $G_D$ with their associated descriptions $D$.
AdaBoost constructs the subset to be boosted by simply picking all erroneously
classified records. Instead, BoostEMM picks every record that is covered by at
least one of the top subgroups found with EMM. Hence, BoostEMM adheres
to the following scheme:</p>
      <p>
$$\mathit{err}_j = \frac{\sum_{i=1}^{N} w_i \, I(\ell^i \neq P_j(a^i))}{\sum_{i=1}^{N} w_i}
\qquad\qquad
\alpha_j = \log \frac{1 - \mathit{err}_j}{\mathit{err}_j}$$
$$w_i \leftarrow w_i \exp\left( \alpha_j \, I\left( \exists D \in \mathcal{D}_{\text{top-}q} : D(a^i) = 1 \right) \right)
\qquad\qquad
P(a) = \arg\max_{\ell \in \{0,1\}} \sum_{j=1}^{h} \alpha_j \, I(P_j(a) = \ell)$$
In AdaBoost, the weight update function is instead given by:
$$w_i \leftarrow w_i \exp\left( \alpha_j \, I(\ell^i \neq P_j(a^i)) \right)$$
      </p>
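      <p>Read as code, the two update rules differ only in their indicator. A minimal sketch (assuming record weights <italic>w</italic>, a predictor weight <italic>alpha_j</italic>, and boolean arrays of matching shape; the normalization afterwards follows the text):</p>
      <preformat><![CDATA[
import numpy as np

def boostemm_update(w, alpha_j, covered):
    """covered[i] is True iff some D in D_top-q covers record i."""
    w = w * np.exp(alpha_j * covered)
    return w / np.sum(w)                  # weights are normalized afterwards

def adaboost_update(w, alpha_j, y, pred):
    """AdaBoost boosts exactly the records that P_j misclassified."""
    w = w * np.exp(alpha_j * (pred != y))
    return w / np.sum(w)
]]></preformat>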
    </sec>
    <sec id="sec-5">
      <title>The Transparent Boosting Model Class for EMM</title>
      <p>The missing ingredient in the description of BoostEMM in the previous section
is: how do we find the subgroup set $\mathcal{D}_{\text{top-}q}$? We do so by Exceptional Model
Mining. Within BoostEMM, every EMM run is encompassed by the boosting
process. Hence, we have just trained a weak learner $P_j$ to predict a specific class
label $\ell$ for every possible input attribute vector $a \in \mathcal{A}$. Within this setting, in
order to employ EMM, we need to cast the available building blocks into a form
that fits the EMM problem specification, as outlined in Section 3.1. We need to
describe the dataset in terms of descriptors and targets, formulate a model class
over the targets, and define a quality measure over this model class.</p>
      <p>For the descriptors in EMM we take the input attributes $a^i_1, \ldots, a^i_k$ of the
classification task given at the start of Section 3. In the Transparent Boosting
model class for EMM there are two targets: the original class label, $t^i_1 = \ell^i$, and
the class label predicted by the available weak learner $P_j$, $t^i_2 = P_j(a^i)$. The kind
of interaction in which the Transparent Boosting model class is interested is an
exceptional discord between the original class label and the predicted class label:
where does our weak learner perform not so well?</p>
      <p>The last question can be answered in many reasonable manners. Which
answer we choose depends on what kind of boosting we want to achieve. In EMM,
the quality measure governs what exactly we find interesting within the kind of
interaction defined by the model class. As is common in EMM, we build up the
quality measure from two components: $\varphi_{\mathrm{TB}}(D) = \varphi_{\mathrm{size}}(D) \cdot \varphi_{\mathrm{dev}}(D)$. The latter
component measures the exceptionality degree of target interaction. Since a large
value for this can easily be obtained in tiny subgroups, we need to prevent
overfitting by multiplying with a component explicitly representing subgroup size.
For this, we take $\varphi_{\mathrm{size}}(D) = \log(|G_D|)$. We employ the logarithm here, since
we do not want to put a penalty on medium-sized subgroups compared to large
subgroups; this component is only meant to discourage tiny subgroups. For the
deviation component $\varphi_{\mathrm{dev}}(D)$, we develop four alternatives.</p>
      <p>Error-based boosting with $\varphi_{\mathrm{err}}$. If one were merely interested in the
error rate, we define the target interaction exceptionality of the subgroup as
follows:
$$\varphi_{\mathrm{err}}(D) = \frac{\sum_{i : D(a^i)=1} w_i \, I(\ell^i \neq P_j(a^i))}{\sum_{i : D(a^i)=1} w_i}$$
This quality measure computes the error rate, but only on the records covered
by the subgroup. Hence, unlike AdaBoost, BoostEMM will also boost records of
the dataset that were classified correctly by the weak learner. This is deliberate,
since it ought to reduce the overfitting effect from which AdaBoost suffers.</p>
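      <p>A direct rendering of $\varphi_{\mathrm{err}}$ in Python (a sketch; <italic>mask</italic> is the boolean coverage of $G_D$, <italic>w</italic> the record weights, <italic>y</italic> the true and <italic>pred</italic> the predicted labels):</p>
      <preformat><![CDATA[
import numpy as np

def phi_err(mask, w, y, pred):
    """Weighted error rate of P_j restricted to the records covered by D."""
    w_cov = w[mask]
    return np.sum(w_cov * (pred[mask] != y[mask])) / np.sum(w_cov)
]]></preformat>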
      <p>Kappa-based boosting with $\varphi_\kappa$. In the presence of class imbalance,
optimizing for the Kappa statistic is more appropriate than the error rate:
$$\varphi_\kappa(D) = \frac{\mathit{acc}_{\mathrm{obs}}(D) - \mathit{acc}_{\mathrm{exp}}(D)}{1 - \mathit{acc}_{\mathrm{exp}}(D)}, \quad \text{where}$$
$$\mathit{acc}_{\mathrm{obs}}(D) = \frac{1}{w(D)} \sum_{i : D(a^i)=1} w_i \, I(\ell^i = P_j(a^i)), \qquad
\mathit{acc}_{\mathrm{exp}}(D) = \frac{\mathit{pos}(D) \cdot \mathit{pos}_p(D) + \mathit{neg}(D) \cdot \mathit{neg}_p(D)}{w(D)^2},$$
$$w(D) = \sum_{i : D(a^i)=1} w_i, \qquad
\mathit{pos}(D) = \sum_{i : D(a^i)=1} w_i \, I(\ell^i = 1), \qquad
\mathit{neg}(D) = \sum_{i : D(a^i)=1} w_i \, I(\ell^i = 0),$$
and $\mathit{pos}_p(D)$ and $\mathit{neg}_p(D)$ are the analogous weighted sums over the predicted labels $P_j(a^i)$.</p>
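      <p>The same quantities in code, under the (assumed) reading that all sums run over the covered records with their current weights:</p>
      <preformat><![CDATA[
import numpy as np

def phi_kappa(mask, w, y, pred):
    """Kappa-style deviation component on the subgroup covered by D."""
    wc, yc, pc = w[mask], y[mask], pred[mask]
    wD = np.sum(wc)                                   # w(D)
    acc_obs = np.sum(wc * (yc == pc)) / wD            # observed accuracy
    pos, neg = np.sum(wc * (yc == 1)), np.sum(wc * (yc == 0))
    pos_p, neg_p = np.sum(wc * (pc == 1)), np.sum(wc * (pc == 0))
    acc_exp = (pos * pos_p + neg * neg_p) / wD**2     # accuracy expected by chance
    return (acc_obs - acc_exp) / (1.0 - acc_exp)
]]></preformat>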
      <p>Cost-sensitive boosting with $\varphi_{\mathrm{FNR}}$ and $\varphi_{\mathrm{FPR}}$. Based on the dataset domain
at hand, one might be interested in cost-sensitive classification. When the costs
of false negatives and false positives are substantially skewed, one would desire
to find subgroups that boost for either of these components. Hence, we employ
each type of classification mistake as a deviation component of its own:
$$\varphi_{\mathrm{FNR}}(D) = \frac{1}{n} \sum_{i : D(a^i)=1} I(\ell^i = 1 \wedge P_j(a^i) = 0), \qquad
\varphi_{\mathrm{FPR}}(D) = \frac{1}{n} \sum_{i : D(a^i)=1} I(\ell^i = 0 \wedge P_j(a^i) = 1)$$</p>
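      <p>In code (a sketch; the indicator sums count false negatives and false positives among the covered records, and $n$ is the subgroup size):</p>
      <preformat><![CDATA[
import numpy as np

def phi_fnr(mask, y, pred):
    """Share of covered records that are false negatives."""
    return np.sum((y[mask] == 1) & (pred[mask] == 0)) / mask.sum()

def phi_fpr(mask, y, pred):
    """Share of covered records that are false positives."""
    return np.sum((y[mask] == 0) & (pred[mask] == 1)) / mask.sum()
]]></preformat>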
      <p>After one has trained an ensemble using boosting, it is insightful to know how
the ensemble was constructed (i.e. which data were emphasized most during the
boosting process). Especially when boosting with various quality measures, it can
be interesting to see how the various boosting strategies behave. We present a
visualization that shows the user exactly which regions have (successfully) been
boosted the most. Our method can show a high number of descriptions by
visualizing them in a tree. The tree is constructed by looping over the descriptions.
For each description we create a branch:
1. A branch consists of nodes represented by the literals of a description, and
the root of the branch is the first literal (which makes sense, since each
following literal is a refinement of the description).
2. The last node (literal) of the branch stores the weight of the description. The
weight corresponds to the error of the weak learner that was constructed from
this description ($w$).
3. If a literal already exists during creation of the branch, we increase the weight
of the existing leaf by the weight of the literal, because the description was
used more heavily (i.e. by multiple classifiers). We proceed with the creation of
the branch using the existing path.</p>
      <p>After tree construction, we merge sibling leaf nodes defined on the same attribute
that originate from the same boosting iteration. For instance, two sibling leaf
nodes "age &lt; 20" and "age &lt; 30" can be merged into a single leaf node "age &lt;
30", since all descriptions from the same round are boosted together.</p>
      <p>Figure 1 shows the descriptions encountered in a run of the BoostEMM
process. The size of the nodes represents the weight, indicating the degree to
which the constraint contributes to the selection of samples for boosting.</p>
      <sec id="sec-5-1">
        <title>Experiments</title>
        <p>
          Three datasets with a binary classification task were used for the experiments
(cf. Table 1). The well-known Adults dataset stems from the UCI repository [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ];
the positive class consists of high earners. The Credit-card fraud dataset [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] can be found at Kaggle;
the task is to detect fraudulent transactions. Metadata for both these datasets
is readily available online. The third dataset, however, is proprietary: Rabobank
provided an anonymized real-life dataset related to online fraud.
        </p>
        <p>Rabobank is a financial institution, one of the three biggest banks in the
Netherlands. Rabobank supplies a full range of financial products to its
customers, including internet and mobile banking. The dataset we work with
encompasses over 30 million samples of internet transactions and mobile payments,
performed from January 2016 up to February 2017. Most of these samples
represent genuine, non-fraudulent transactions, but a tiny fraction (≈ 2 000) were
manually marked as fraudulent by domain experts. Known types of fraud
include trojan attacks, phishing, and ID takeover. In trojan attacks and phishing,
the client takes part in the fraudulent transaction by providing the two-factor
authentication to the bank, after being misled by the fraudster into doing so on a
payment prepared by the fraudster. In ID takeover, the fraudster has stolen the
credentials of the client and is able to provide the authentication herself.
Typically, different attackers and attack types occur in the same period. As attacks
are blocked, the modus operandi is changed or renewed within days or
weeks; old attacks are retried over time, and new attack vectors show up.</p>
        <p>Each record consists of a timestamp, an identification hash, a binary label
to indicate fraud, and 1 013 anonymized features. Attributes are masked by
renaming. Each attribute is inspected algorithmically; if an attribute contains no
more than 200 unique values, it is considered to be a code, which is recoded to
a numeric value. Numeric and text values are transformed, based on frequency, into
up to 801 bins. Base attributes are constructed from the current transaction, as
well as from the history of the account holder and beneficiary. Aggregations found to
be useful for the current business rule system were added.</p>
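        <p>One plausible reading of this preprocessing in pandas (a sketch under assumptions: the 200-value threshold decides between code recoding and frequency-based binning, and the bank's exact binning procedure is not public):</p>
        <preformat><![CDATA[
import numpy as np
import pandas as pd

def anonymize_feature(s: pd.Series, code_threshold=200, max_bins=801):
    if s.nunique() <= code_threshold:          # treated as a code ...
        return s.astype("category").cat.codes  # ... recoded to a numeric value
    # frequency-based transformation into up to `max_bins` bins
    pct = s.rank(method="average", pct=True)   # frequency rank in (0, 1]
    return np.ceil(pct * max_bins).astype(int) # bin index in 1..max_bins
]]></preformat>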
        <p>The task in the Rabobank dataset is to predict whether transactions are fraudulent
(label=1) or not (label=0), having learned from historical data only. In order to
be useful alongside the current fraud detection, the bank requires the FPR to
be stable and far below 1:10 000.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experimental Setup</title>
      <p>All datasets are imbalanced: there are substantially more negative records than
positive ones. Since we build on scikit-learn's Python Decision Tree
implementation as weak learner, we use dummy variables to handle the categorical variables
in the Adult dataset. For the Credit-card and Rabobank datasets, all values
were given as numeric in the first place. We discard the `Time' column in the
Credit-card dataset. The beam search algorithm for EMM [5, Algorithm 1] is
parametrized with search width w = 3 and search depth d = 3, and incorporates
the top-q subgroups into BoostEMM with q = 6. We use an AdaBoost
implementation with decision stumps (i.e. decision trees of depth 1).</p>
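      <p>In scikit-learn terms, this setup might look as follows (a sketch; the OpenML fetch is a convenience stand-in for the UCI download, and the constants merely echo the parameters above):</p>
      <preformat><![CDATA[
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# The Adult dataset (UCI); fetched here from OpenML for convenience.
adult = fetch_openml("adult", version=2, as_frame=True)
X = pd.get_dummies(adult.data)          # dummy variables for categoricals
y = adult.target                        # positive class: high earners

# AdaBoost baseline with decision stumps (decision trees of depth 1).
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)).fit(X, y)

# Beam-search parametrization of the EMM runs inside BoostEMM.
BEAM_WIDTH, SEARCH_DEPTH, TOP_Q = 3, 3, 6
]]></preformat>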
    </sec>
    <sec id="sec-7">
      <title>Experimental Results</title>
      <p>
        We run comparative experiments with seven competitors; results can be found
in Table 2. The models are Straw Man (majority class), AdaBoost SAMME [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
AdaBoost SAMME.R [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and BoostEMM with each of the target interaction
exceptionality components (cf. Section 3.3). For each competitor we report
accuracy evaluated on a withheld test set. Since all datasets are imbalanced, we
also report Kappa and AUC. For all measures, higher is better.
      </p>
      <p>Since Adults is the only non-anonymized dataset, we present descriptions
discovered during the BoostEMM training for a qualitative inspection. Subgroups
that are deemed most problematic by the four compound quality measures in
the first five iterations of the BoostEMM process can be found in Tables 3–6.</p>
      <sec id="sec-7-1">
        <title>Discussion</title>
        <p>Table 2 shows that BoostEMM can sometimes match the performance of
AdaBoost, but sometimes it does not do so well. As expected, $\varphi_{\mathrm{err}}$ mimics AdaBoost
best in terms of pure performance; it barely loses accuracy on the Rabobank and
Credit-card datasets in comparison with AdaBoost, while it has to cede some
ground on the Adults dataset. Interestingly, while BoostEMM with $\varphi_{\mathrm{FNR}}$ leads
to substantial accuracy loss on two of the datasets, it performs unexpectedly well
on the third. All methods have a high accuracy on the Credit-card dataset, but
in terms of AUC, $\varphi_{\mathrm{FNR}}$ outperforms all other methods including AdaBoost.</p>
        <p>When we inspect subgroups in more detail (cf. Tables 3–6), we obtain more
transparency and hence accountability in the boosting process. This
transparency is augmented by the visualization introduced in Figure 1.
Additionally, from Table 6 we learn that BoostEMM suffers from a familiar problem
in data mining. This table follows the process of boosting subgroups featuring
an unusually high False Positive Rate. As the table shows, the top subgroups
found in the first five iterations have an undefined FPR: there are no positives
in these subgroups at all. This is caused by the first weak learner assigning all
records to the majority class, which is negative: the process only features true
and false negatives! In this setting, FPR boosting makes no sense. Therefore, in
future work, we plan to tackle this problem by dovetailing the various kinds of
boosting BoostEMM has to offer.</p>
        <p>[Table 6: top subgroups boosted with $\varphi_{\mathrm{FPR}}$ in the first five iterations (Adults dataset); each row lists the description $D$, then $|G_D|$, TN, FP, FN, TP:
capital-gain ≤ 6896.48 ∧ age ≤ 19.52 ∧ education: 7th-8th ≠ 1 (1632; 17, 0, 1615, 0);
capital-loss ≤ 1802.48 ∧ [...] ∧ marital-status: Married-civ-spouse = 1 (493; 16, 0, 477, 0);
capital-gain ≤ 24137.69 ∧ marital-status: Married-civ-spouse ≠ 1 ∧ age ≤ 19.52 (98; 1, 0, 97, 0);
capital-gain ≤ 1802.48 ∧ marital-status: Married-civ-spouse = 1 ∧ age &gt; 27.07 (887; 133, 0, 754, 0);
capital-loss ≤ 24137.69 ∧ marital-status: Married-civ-spouse ≠ 1 ∧ age ≤ 19.52 (98; 1, 0, 97, 0)]</p>
        <p>A similarly detailed investigation as the one in Tables 3–6 has been made for
the Rabobank dataset. Here, the attribute names are all obfuscated; we find them
in the form C 0010. However, we presented the resulting subgroups in such tables
to domain experts at Rabobank, who possess the key to translate obfuscated
features back to real-life information. They reported back that the subgroups
focus on the historical behavior of the customer or counterparty. Subgroups
reported in the first iteration make the initial, rough cut. Subgroups reported
in the second iteration give it more detail towards a specific modus operandi.
Client confidentiality disallows us to discuss more details about these subgroups,
but the domain experts confirm that the problematic areas have clear meaning
to them, which provides us with confidence that BoostEMM indeed adds the
desired transparency to the boosting process.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.D.</given-names>
            <surname>Bay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          .
          <article-title>Detecting Change in Categorical Data: Mining Contrast Sets</article-title>
          .
          <source>Proc. KDD</source>
          , pp.
          <volume>302</volume>
          –
          <issue>306</issue>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Dal Pozzolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Caelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , G. Bontempi.
          <article-title>Calibrating Probability with Undersampling for Unbalanced Classification</article-title>
          .
          <source>Proc. SSCI</source>
          , pp.
          <volume>159</volume>
          –
          <issue>166</issue>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Efficient Mining of Emerging Patterns: Discovering Trends and Differences</article-title>
          .
          <source>Proc. KDD</source>
          , pp.
          <volume>43</volume>
          –
          <issue>52</issue>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramamohanarao</surname>
          </string-name>
          .
          <article-title>Enhancing Traditional Classifiers Using Emerging Patterns</article-title>
          . In: G. Dong, J. Bailey (eds.):
          <source>Contrast Data Mining: Concepts, Algorithms, and Applications</source>
          , pp.
          <volume>187</volume>
          –
          <issue>196</issue>
          , CRC Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Feelders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          .
          <article-title>Exceptional model mining</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <volume>47</volume>
          –
          <fpage>98</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feelders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Leeuwen</surname>
          </string-name>
          .
          <article-title>Subgroup Discovery meets Bayesian networks – an Exceptional Model Mining approach</article-title>
          .
          <source>Proc. ICDM</source>
          , pp.
          <volume>158</volume>
          –
          <issue>167</issue>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Loza Mencía</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          .
          <article-title>Multi-label LeGo – Enhancing Multi-label Classifiers with Local Patterns</article-title>
          .
          <source>Proc. IDA</source>
          , pp.
          <volume>114</volume>
          –
          <issue>125</issue>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thaele</surname>
          </string-name>
          .
          <article-title>Understanding Where Your Classifier Does (Not) Work – The SCaPE Model Class for EMM</article-title>
          .
          <source>Proc. ICDM</source>
          , pp.
          <volume>809</volume>
          –
          <issue>814</issue>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>European Parliament</surname>
          </string-name>
          .
          <article-title>Resolution of 16 February 2017 with recommendations to the Commission on Civil Law Rules on Robotics (2015/2103(INL))</article-title>
          . http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//TEXT+TA+P8-TA-2017-0051+0+DOC+XML+V0//EN [accessed July 3, 2017],
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <article-title>A decision-theoretic generalization of on-line learning and an application to boosting</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>55</volume>
          (
          <issue>1</issue>
          ):
          <volume>119</volume>
          –
          <fpage>139</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <article-title>Additive logistic regression: a statistical view of boosting</article-title>
          .
          <source>Annals of Statistics</source>
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <volume>337</volume>
          –
          <fpage>407</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Henelius</surname>
          </string-name>
          , K. Puolamaki, H. Bostrom, L. Asker,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papapetrou</surname>
          </string-name>
          .
          <article-title>A peek into the black box: exploring classifiers by randomization</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>28</volume>
          (
          <issue>5</issue>
          –6):
          <volume>1503</volume>
          –
          <fpage>1529</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>A.</given-names>
            <surname>Henelius</surname>
          </string-name>
          , K. Puolamaki, I. Karlsson,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Asker</surname>
          </string-name>
          , H. Bostrom, P. Papapetrou. GoldenEye++
          <article-title>: A Closer Look into the Black Box</article-title>
          .
          <source>Proc. SLDS</source>
          , pp.
          <volume>96</volume>
          –
          <issue>105</issue>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>del Jesus</surname>
          </string-name>
          .
          <article-title>An overview on subgroup discovery: foundations and applications</article-title>
          .
          <source>Knowledge and Information Systems</source>
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <volume>495</volume>
          –
          <fpage>525</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. W. Klosgen. Explora:
          <article-title>A Multipattern and Multistrategy Discovery Assistant</article-title>
          .
          <source>Advances in Knowledge Discovery and Data Mining</source>
          , pp.
          <volume>249</volume>
          –
          <issue>271</issue>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cremilleux</surname>
          </string-name>
          , J. Furnkranz,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scholz</surname>
          </string-name>
          . From Local Patterns to Global Models:
          <article-title>The LeGo Approach to Data Mining</article-title>
          .
          <source>Proc. LeGo:</source>
          From Local Patterns to Global Models workshop @ ECML/PKDD, pp.
          <volume>1</volume>
          –
          <issue>16</issue>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>P.</given-names>
            <surname>Kralj Novak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lavrač</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.I.</given-names>
            <surname>Webb</surname>
          </string-name>
          .
          <article-title>Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>10</volume>
          :
          <fpage>377</fpage>
          –
          <fpage>403</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>D.</given-names>
            <surname>Leman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feelders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          .
          <article-title>Exceptional Model Mining</article-title>
          .
          <source>Proc. ECML/PKDD (2)</source>
          , pp.
          <volume>1</volume>
          –
          <issue>16</issue>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          , UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, University of California, Irvine, School of Information and Computer Sciences,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>P.M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Servedio</surname>
          </string-name>
          .
          <article-title>Random classification noise defeats all convex potential boosters</article-title>
          .
          <source>Machine Learning</source>
          <volume>78</volume>
          (
          <issue>3</issue>
          ):
          <fpage>287</fpage>
          -
          <lpage>304</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>S.</given-names>
            <surname>Wrobel</surname>
          </string-name>
          .
          <article-title>An Algorithm for Multi-relational Discovery of Subgroups</article-title>
          .
          <source>Proc. PKDD</source>
          , pp.
          <volume>78</volume>
          –
          <issue>87</issue>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>