<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BoostEMM – Transparent Boosting using Exceptional Model Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simon van der Zon</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oren Zeev Ben Mordehai</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Vrijdag</string-name>
          <email>t.s.vrijdag@student.tue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Werner van Ipenburg</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jan Veldsink</string-name>
          <email>jan.veldsink@rabobank.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wouter Duivesteijn</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Pechenizkiy</string-name>
          <email>m.pechenizkiy@tue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cooperatieve Rabobank U.A.</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Boosting is an iterative ensemble-learning paradigm. In every iteration, a weak predictor learns a classification task, taking into account performance achieved in previous iterations. This is done by assigning weights to individual records of the dataset, which are increased if the record is misclassified by the previous weak predictor. Hence, subsequent predictors learn to focus on problematic records in the dataset. Boosting ensembles such as AdaBoost have been shown to be effective models at fighting both high variance and high bias, even in challenging situations such as class imbalance. However, some aspects of AdaBoost might imply limitations for its deployment in the real world. On the one hand, focusing on problematic records can lead to overfitting in the presence of random noise. On the other hand, learning a boosting ensemble that assigns higher weights to hard-to-classify people might throw up serious questions in the age of responsible and transparent data analytics; if a bank must tell a customer that they are denied a loan because the underlying algorithm made a decision specifically focusing on the customer since they are hard to classify, this could be legally dubious. To kill these two birds with one stone, we introduce BoostEMM: a variant of AdaBoost where in every iteration of the procedure, rather than boosting problematic records, we boost problematic subgroups as found through Exceptional Model Mining. Boosted records being part of a coherent group should prevent overfitting, and explicit definitions of the subgroups of people being boosted enhance the transparency of the algorithm.</p>
      </abstract>
      <kwd-group>
        <kwd>Boosting</kwd>
        <kwd>class imbalance</kwd>
        <kwd>Exceptional Model Mining</kwd>
        <kwd>model transparency</kwd>
        <kwd>responsible analytics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>decisions of the earlier, blunter base classifiers with the later, more specific ones,
an ensemble classifier is built that performs well overall. Hence, AdaBoost as
a mechanism shines in devoting appropriate attention of a classifier to those
records of the dataset that prove to be problematic.</p>
      <p>For all its strengths, AdaBoost also comes with two weaknesses. The records
proving problematic for the classifier might encompass outliers, which could
lead to overfitting. Moreover, in the age of responsible data analytics, we want
to know not only what our algorithm does, but also why it is reasonable. A
recent European Parliament resolution on the future of robotics and artificial
intelligence in Europe contains the following [9, Section "Ethical principles",
point 12]:
[...] it should always be possible to supply the rationale behind any
decision taken with the aid of AI that can have a substantive impact on
one or more persons' lives; [...] it must always be possible to reduce the
AI system's computations to a form comprehensible by humans;
If the decision in a dataset on loan applications is outsourced to AdaBoost,
the customer might demand insight into the reasoning behind rejection. When
this resolution is put into law, it is likely that a customer demanding insight into
the reasoning AdaBoost deployed behind its decision (which has a substantive
impact on the customer's life) must be presented with not only how AdaBoost
decides where to focus its extra attention, but also why. Hence, in the near
future, financial institutions will shy away from the liability associated with
using AdaBoost if there is no transparent form of boosting.</p>
      <p>
        In this paper, we fill that void by proposing BoostEMM. This method
combines AdaBoost-style iterative learning of base classifiers, where the focus shifts
towards problematic parts of the input space, with Exceptional Model Mining (EMM)
[
        <xref ref-type="bibr" rid="ref18 ref5">18,5</xref>
        ]. EMM is a local pattern mining method, designed to find subgroups (subsets
of the dataset at hand) that satisfy two conditions. On the one hand, subgroups
must be interpretable. This is typically enforced by only allowing subgroups
that can be defined as a conjunction of a few conditions on input attributes of
the dataset; hence subgroups come in terms that any domain expert can
understand. On the other hand, subgroups must be exceptional. This is typically
formalized in terms of an unusual interaction between several target attributes;
hence subgroups represent unusual behavior in the dataset. We choose targets
that represent the actual class label and the label predicted by base classifiers,
and define several quality measures that gauge unusual interaction between those
targets. Hence, for various types of bad base classifier performance, EMM finds
coherent parts of the classifier input space where this behavior occurs.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Main Contributions</title>
      <p>We provide BoostEMM, a method encompassing core ideas from both AdaBoost
and Exceptional Model Mining. By dovetailing these techniques, BoostEMM
achieves two benefits over AdaBoost:
1. by defining specific EMM variants (cf. Section 3.3), we steer boosting to
punish specific kinds of bad behavior (error rate, class imbalance, FPR/TPR),
which is relevant for cost-sensitive applications;
2. by dovetailing an EMM run with every iteration of the boosting algorithm,
at every step we can report the subgroups where extra boosting is needed. This
adds transparency to the boosting process (cf. Section 3.4), which is relevant
in the light of looming EU law.</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
        <p>
          The groundbreaking paper on AdaBoost in its traditional form is [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]; this form
is also explained in Section 3. A version incorporating class probabilities was
introduced in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Convex potential boosting algorithms (which include
AdaBoost) cannot handle random classification noise well [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]; boosting tends to
overfit towards the noisily labeled records of the dataset, reducing the generality of
the learned classification model.
        </p>
        <p>
          The resurgence of neural networks through the deep learning hype has led
to a reappreciation of complex classifiers that perform extremely well on the
task for which they have been designed, but whose internal reasoning is far too
complex for a human to fully understand. As a reaction, papers have emerged that
take a peek into the black box. Some of the first such papers include [
          <xref ref-type="bibr" rid="ref12 ref13">12,13</xref>
          ] for
hard classifiers, and [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for soft classifiers. These papers share the objective of
transparency with BoostEMM, but they do not (as BoostEMM does) loop back
the interpretable results into the classification process to improve performance.
        </p>
        <p>
          The study of how local patterns can aid global models was the topic of the
LeGo workshop [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. This workshop encompassed papers both enhancing existing
classifiers with local patterns and combining local patterns into a classifier. A few
years later, LeGo-EMM [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] enhanced multi-label classifiers with local patterns
found through Exceptional Model Mining with the Bayesian networks model
class [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], which improved multi-label SVM classification. These methods have in
common that the learning process is a single straight line: first local patterns are mined,
then a subset of those patterns is selected, and finally that subset is used to
enhance or replace the feature set of a classifier, with which predictions
are subsequently made. Hence, LeGo papers share the incorporation of local pattern
knowledge in classification with BoostEMM, but they do not (as BoostEMM does)
loop back the output into an iterative learning process.
        </p>
        <p>
          Exceptional Model Mining seeks to find subgroups of a dataset where several
targets interact in an unusual manner. A simpler cousin of EMM is Subgroup
Discovery (SD) [
          <xref ref-type="bibr" rid="ref14 ref15 ref21">15,21,14</xref>
          ]: the task of finding subgroups of a dataset where a
single target displays an unusual distribution. SD is closely related to Contrast
Set Mining (CSM) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and Emerging Pattern Mining (EPM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]; the results of
the latter technique have also been exploited to enhance classification [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], in the
style of the LeGo workshop papers discussed in the previous paragraph. The
relation between SD, CSM, and EPM is explored in detail in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>The BoostEMM</title>
      </sec>
      <sec id="sec-2-3">
        <title>Method</title>
        <p>We are given a dataset $\Omega$, which is a bag of $N$ records $r \in \Omega$ of the form $r =
(a_1, \ldots, a_k, \ell)$, where $\{a_1, \ldots, a_k\}$ are the input attributes of the dataset, taken
from some collective domain $\mathcal{A}$, and $\ell$ is the class label, typically taken to be
binary. If we need to refer to a specific record (or its corresponding data
components), we do so by superscripts: the $i$th record is denoted $r^i$, $\ell^i$ is its class label,
and the value of its $j$th input attribute is $a^i_j$. We also use the shorthand $a^i$ to
denote the collective input attribute values of $r^i$: $a^i = (a^i_1, \ldots, a^i_k)$.</p>
        <p>The goal of classification is to find a mapping $P$ from the input attribute
space to the class label, $P : \mathcal{A} \to \{0,1\}$, such that we can predict the latter
for unseen instances of the former. In boosting, these predictions are improved
through the following methodology. We iteratively build one strong learner or
expert $P = (E, A)$. This is done by constructing an ensemble $E$ of $h$ weak
learners or predictors (i.e. classifiers that perform (slightly) better than random),
$E = (P_1, \ldots, P_h)$, and an associated tuple $A = (\alpha_1, \ldots, \alpha_h)$ of weights related
to the performance of each predictor. AdaBoost obtains these weights for each
weak classifier $P_j$ by a transformation (cf. Section 3.2) of its error rate $\mathit{err}_j$. In
each iteration of the boosting process, a new weak learner $P_j$ is constructed,
taking into account the whole training set but also the up-to-date priorities,
or weights, of the records. These weights are maintained as another tuple of $N$
weights $W$, associated with the records of the dataset: $W = (w_1, \ldots, w_N)$. In the
first iteration, all training data (unless given initial weights by the end user) are
initialized with equal weights $w_i = 1/N$, and the first weak learner is trained. In
subsequent iterations, the weights are increased for all records that are classified
incorrectly by $P_j$, after which the weights are normalized. In later sections, we
replace this selection mechanism for the erroneous records by an equivalent EMM
function defining the subgroups to be boosted. AdaBoost updates the record weights
$W$ in a manner similar to the predictor weights $A$, by a transformation (cf. Section 3.2)
based on $\mathit{err}_j$.</p>
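        <p>To make this interplay concrete, the following Python sketch outlines one possible shape of the loop. The helpers <italic>train_weak_learner</italic> and <italic>mine_top_subgroups</italic> are hypothetical stand-ins (they are not the authors' implementation), and the weight update anticipates the scheme of Section 3.2.</p>
        <preformat><![CDATA[
import numpy as np

def boost_emm(X, y, T, train_weak_learner, mine_top_subgroups):
    """Minimal sketch of the BoostEMM loop; both helper functions are assumed."""
    N = len(y)
    w = np.full(N, 1.0 / N)            # record weights W, uniform start
    ensemble, alphas = [], []
    for _ in range(T):
        P_j = train_weak_learner(X, y, sample_weight=w)   # weak learner P_j
        pred = P_j.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)         # err_j
        alpha = np.log((1.0 - err) / err)                 # alpha_j
        # BoostEMM: instead of boosting every misclassified record,
        # boost every record covered by at least one top-q subgroup.
        masks = mine_top_subgroups(X, y, pred, w)         # list of boolean masks
        covered = np.any(np.vstack(masks), axis=0)
        w = w * np.exp(alpha * covered)
        w = w / np.sum(w)                                 # normalize
        ensemble.append(P_j)
        alphas.append(alpha)
    return ensemble, alphas
]]></preformat>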
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Exceptional Model Mining</title>
      <p>We are given a dataset $\Omega$, which is a bag of $N$ records $r \in \Omega$ of the form $r = (a_1, \ldots, a_k,
t_1, \ldots, t_m)$, where $\{a_1, \ldots, a_k\}$ are the descriptors of the dataset, and $\{t_1, \ldots, t_m\}$
the targets. The goal of EMM is to find subgroups of the dataset at hand, defined
in terms of a conjunction of a few conditions on single descriptors of the dataset
(e.g.: $a_7 \geq 3 \wedge a_3 = \mathrm{true}$), for which the targets interact in an unusual manner.</p>
      <p>Definition 1 (Subgroup). A subgroup corresponding to a description $D$ is
the bag of records $G_D$ that $D$ covers, i.e.
$$G_D = \{\, r^i \in \Omega \mid D(a^i) = 1 \,\}$$
From now on we omit the $D$ if no confusion can arise, and refer to the coverage
of a subgroup by $n = |G|$.</p>
      <p>In order to objectively evaluate a candidate description in a given dataset,
we need to define a quality measure. For each description $D$ in a user-defined
description language $\mathcal{D}$, this function quantifies the extent to which the subgroup
$G_D$ corresponding to the description deviates from the norm.</p>
      <p>Definition 2 (Quality Measure). A quality measure is a function $\varphi : \mathcal{D} \to \mathbb{R}$
that assigns a numeric value to a description $D$.</p>
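      <p>As an illustration only (not the notation of any particular EMM framework), a description can be represented as a conjunction of conditions on single descriptors, and its subgroup as the boolean coverage mask it induces:</p>
      <preformat><![CDATA[
import numpy as np

# A description D: a conjunction of (column, comparison, threshold) conditions
# on single descriptors. Its subgroup G_D is the bag of covered records.
def make_description(conditions):
    def D(A):                              # A: descriptor matrix, shape (N, k)
        mask = np.ones(len(A), dtype=bool)
        for col, cmp, val in conditions:
            mask &= cmp(A[:, col], val)
        return mask                        # mask[i] == True  <=>  D(a^i) = 1
    return D

# Example mirroring the text: a_7 >= 3  AND  a_3 = true
D = make_description([(7, np.greater_equal, 3), (3, np.equal, True)])
# coverage n = |G_D| for some descriptor matrix A:  n = D(A).sum()
]]></preformat>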
    </sec>
    <sec id="sec-4">
      <title>Mining Descriptions for Boosting and Updating the Weights</title>
      <p>The input attributes in classification/boosting correspond to the descriptors in
EMM. Having trained a weak learner $P_j$, we generate targets that reflect how
well the classification performs on each record. We explore several choices for
unusual interaction between these targets in Section 3.3. Having thus defined
a model class for EMM, we run the beam search algorithm [5, Algorithm 1]
to generate a set $\mathcal{D}_{\text{top-}q}$ of subgroups $G_D$ with their associated descriptions $D$.
AdaBoost constructs the subset to be boosted by simply picking all erroneously
classified records. Instead, BoostEMM picks every record that is covered by at
least one of the top subgroups found with EMM. Hence, BoostEMM adheres
to the following scheme:</p>
      <p>
$$\mathit{err}_j = \frac{\sum_{i=1}^{N} w_i \, I(\ell^i \neq P_j(a^i))}{\sum_{i=1}^{N} w_i}
\qquad\qquad
\alpha_j = \log \frac{1 - \mathit{err}_j}{\mathit{err}_j}$$
$$w_i \leftarrow w_i \exp\left( \alpha_j \, I\left( \exists D \in \mathcal{D}_{\text{top-}q} : D(a^i) = 1 \right) \right)
\qquad\qquad
P(a) = \arg\max_{\ell \in \{0,1\}} \sum_{j=1}^{h} \alpha_j \, I(P_j(a) = \ell)$$
In AdaBoost, the weight update function is instead given by:
$$w_i \leftarrow w_i \exp\left( \alpha_j \, I(\ell^i \neq P_j(a^i)) \right)$$
      </p>
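      <p>Read as code, the two update rules differ only in their indicator. A minimal sketch (assuming record weights <italic>w</italic>, a predictor weight <italic>alpha_j</italic>, and boolean arrays of matching shape; the normalization afterwards follows the text):</p>
      <preformat><![CDATA[
import numpy as np

def boostemm_update(w, alpha_j, covered):
    """covered[i] is True iff some D in D_top-q covers record i."""
    w = w * np.exp(alpha_j * covered)
    return w / np.sum(w)                  # weights are normalized afterwards

def adaboost_update(w, alpha_j, y, pred):
    """AdaBoost boosts exactly the records that P_j misclassified."""
    w = w * np.exp(alpha_j * (pred != y))
    return w / np.sum(w)
]]></preformat>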
    </sec>
    <sec id="sec-5">
      <title>The Transparent Boosting Model Class for EMM</title>
      <p>The missing ingredient in the description of BoostEMM in the previous section
is: how do we find the subgroup set $\mathcal{D}_{\text{top-}q}$? We do so by Exceptional Model
Mining. Within BoostEMM, every EMM run is encompassed by the boosting
process. Hence, we have just trained a weak learner $P_j$ to predict a specific class
label $\ell$ for every possible input attribute vector $a \in \mathcal{A}$. Within this setting, in
order to employ EMM, we need to cast the available building blocks into a form
that fits the EMM problem specification, as outlined in Section 3.1. We need to
describe the dataset in terms of descriptors and targets, formulate a model class
over the targets, and define a quality measure over this model class.</p>
      <p>For the descriptors in EMM we take the input attributes $a^i_1, \ldots, a^i_k$ of the
classification task given at the start of Section 3. In the Transparent Boosting
model class for EMM there are two targets: the original class label, $t^i_1 = \ell^i$, and
the class label predicted by the available weak learner $P_j$, $t^i_2 = P_j(a^i)$. The kind
of interaction in which the Transparent Boosting model class is interested is an
exceptional discord between the original class label and the predicted class label:
where does our weak learner perform not so well?</p>
      <p>The last question can be answered in many reasonable manners. Which
answer we choose depends on what kind of boosting we want to achieve. In EMM,
the quality measure governs what exactly we find interesting within the kind of
interaction defined by the model class. As is common in EMM, we build up the
quality measure from two components: $\varphi_{\mathrm{TB}}(D) = \varphi_{\mathrm{size}}(D) \cdot \varphi_{\mathrm{dev}}(D)$. The latter
component measures the exceptionality degree of target interaction. Since a large
value for this can easily be obtained in tiny subgroups, we need to prevent
overfitting by multiplying with a component explicitly representing subgroup size.
For this, we take $\varphi_{\mathrm{size}}(D) = \log(|G_D|)$. We employ the logarithm here, since
we do not want to put a penalty on medium-sized subgroups compared to large
subgroups; this component is only meant to discourage tiny subgroups. For the
deviation component $\varphi_{\mathrm{dev}}(D)$, we develop four alternatives.</p>
      <p>Error-based boosting with $\varphi_{\mathrm{err}}$. If one were merely interested in the
error rate, we define the target interaction exceptionality of the subgroup as
follows:
$$\varphi_{\mathrm{err}}(D) = \frac{\sum_{i : D(a^i)=1} w_i \, I(\ell^i \neq P_j(a^i))}{\sum_{i : D(a^i)=1} w_i}$$
This quality measure computes the error rate, but only on the records covered
by the subgroup. Hence, unlike AdaBoost, BoostEMM will also boost records of
the dataset that were classified correctly by the weak learner. This is deliberate,
since it ought to reduce the overfitting effect from which AdaBoost suffers.</p>
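      <p>A direct rendering of $\varphi_{\mathrm{err}}$ in Python (a sketch; <italic>mask</italic> is the boolean coverage of $G_D$, <italic>w</italic> the record weights, <italic>y</italic> the true and <italic>pred</italic> the predicted labels):</p>
      <preformat><![CDATA[
import numpy as np

def phi_err(mask, w, y, pred):
    """Weighted error rate of P_j restricted to the records covered by D."""
    w_cov = w[mask]
    return np.sum(w_cov * (pred[mask] != y[mask])) / np.sum(w_cov)
]]></preformat>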
      <p>Kappa-based boosting with $\varphi_\kappa$. In the presence of class imbalance,
optimizing for the Kappa statistic is more appropriate than the error rate:
$$\varphi_\kappa(D) = \frac{\mathit{acc}_{\mathrm{obs}}(D) - \mathit{acc}_{\mathrm{exp}}(D)}{1 - \mathit{acc}_{\mathrm{exp}}(D)}, \quad \text{where}$$
$$\mathit{acc}_{\mathrm{obs}}(D) = \frac{1}{w(D)} \sum_{i : D(a^i)=1} w_i \, I(\ell^i = P_j(a^i)), \qquad
\mathit{acc}_{\mathrm{exp}}(D) = \frac{\mathit{pos}(D) \cdot \mathit{pos}_p(D) + \mathit{neg}(D) \cdot \mathit{neg}_p(D)}{w(D)^2},$$
$$w(D) = \sum_{i : D(a^i)=1} w_i, \qquad
\mathit{pos}(D) = \sum_{i : D(a^i)=1} w_i \, I(\ell^i = 1), \qquad
\mathit{neg}(D) = \sum_{i : D(a^i)=1} w_i \, I(\ell^i = 0),$$
and $\mathit{pos}_p(D)$ and $\mathit{neg}_p(D)$ are the analogous weighted sums over the predicted labels $P_j(a^i)$.</p>
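      <p>The same quantities in code, under the (assumed) reading that all sums run over the covered records with their current weights:</p>
      <preformat><![CDATA[
import numpy as np

def phi_kappa(mask, w, y, pred):
    """Kappa-style deviation component on the subgroup covered by D."""
    wc, yc, pc = w[mask], y[mask], pred[mask]
    wD = np.sum(wc)                                   # w(D)
    acc_obs = np.sum(wc * (yc == pc)) / wD            # observed accuracy
    pos, neg = np.sum(wc * (yc == 1)), np.sum(wc * (yc == 0))
    pos_p, neg_p = np.sum(wc * (pc == 1)), np.sum(wc * (pc == 0))
    acc_exp = (pos * pos_p + neg * neg_p) / wD**2     # accuracy expected by chance
    return (acc_obs - acc_exp) / (1.0 - acc_exp)
]]></preformat>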
      <p>Cost-sensitive boosting with $\varphi_{\mathrm{FNR}}$ and $\varphi_{\mathrm{FPR}}$. Based on the dataset domain
at hand, one might be interested in cost-sensitive classification. When the costs
of false negatives and false positives are substantially skewed, one would desire
to find subgroups that boost for either of these components. Hence, we employ
each type of classification mistake as a deviation component of its own:
$$\varphi_{\mathrm{FNR}}(D) = \frac{1}{n} \sum_{i : D(a^i)=1} I(\ell^i = 1 \wedge P_j(a^i) = 0), \qquad
\varphi_{\mathrm{FPR}}(D) = \frac{1}{n} \sum_{i : D(a^i)=1} I(\ell^i = 0 \wedge P_j(a^i) = 1)$$</p>
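      <p>In code (a sketch; the indicator sums count false negatives and false positives among the covered records, and $n$ is the subgroup size):</p>
      <preformat><![CDATA[
import numpy as np

def phi_fnr(mask, y, pred):
    """Share of covered records that are false negatives."""
    return np.sum((y[mask] == 1) & (pred[mask] == 0)) / mask.sum()

def phi_fpr(mask, y, pred):
    """Share of covered records that are false positives."""
    return np.sum((y[mask] == 0) & (pred[mask] == 1)) / mask.sum()
]]></preformat>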
      <p>After one has trained an ensemble using boosting, it is insightful to know how
the ensemble was constructed (i.e. which data were emphasized most during the
boosting process). Especially when boosting with various quality measures, it can
be interesting to see how the various boosting strategies behave. We present a
visualization that shows the user exactly which regions have (successfully) been
boosted the most. Our method can show a high number of descriptions by
visualizing them in a tree. The tree is constructed by looping over the descriptions.
For each description we create a branch:
1. A branch consists of nodes represented by the literals of a description, and
the root of the branch is the first literal (which makes sense, since each
following literal is a refinement of the description).
2. The last node (literal) of the branch stores the weight of the description. The
weight corresponds to the error of the weak learner that was constructed from
this description ($w$).
3. If a literal already exists during creation of the branch, we increase the weight
of the existing leaf by the weight of the literal, because the description was
used more heavily (i.e. by multiple classifiers). We proceed with the creation of
the branch using the existing path.</p>
      <p>After tree construction, we merge sibling leaf nodes defined on the same attribute
that originate from the same boosting iteration. For instance, two sibling leaf
nodes "age &lt; 20" and "age &lt; 30" can be merged into a single leaf node "age &lt;
30", since all descriptions from the same round are boosted together.</p>
      <p>Figure 1 shows the descriptions encountered in a run of the BoostEMM
process. The size of the nodes represents the weight, indicating the degree to
which the constraint contributes to the selection of samples for boosting.</p>
      <sec id="sec-5-1">
        <title>Experiments</title>
        <p>
          Three datasets with a binary classification task were used for the experiments
(cf. Table 1). The well-known Adults dataset stems from the UCI repository [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ];
the positive class consists of high earners. The Credit-card fraud dataset [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] can be found at Kaggle;
the task is to detect fraudulent transactions. Metadata for both these datasets
is readily available online. The third dataset, however, is proprietary: Rabobank
provided an anonymized real-life dataset related to online fraud.
        </p>
        <p>Rabobank is a financial institution, one of the three biggest banks in the
Netherlands. Rabobank supplies a full range of financial products to its
customers, including internet and mobile banking. The dataset we work with
encompasses over 30 million samples of internet transactions and mobile payments,
performed from January 2016 up to February 2017. Most of these samples
represent genuine, non-fraudulent transactions, but a tiny fraction (≈ 2 000) were
manually marked as fraudulent by domain experts. Known types of fraud
include trojan attacks, phishing, and ID takeover. In trojan attacks and phishing,
the client takes part in the fraudulent transaction by providing the two-factor
authentication to the bank, after being misled by the fraudster into doing so on a
payment prepared by the fraudster. In ID takeover, the fraudster has stolen the
credentials of the client and is able to provide the authentication herself.
Typically, different attackers and attack types occur in the same period. As attacks
are blocked, the modus operandi is changed or renewed within days or
weeks; old attacks are retried over time, and new attack vectors show up.</p>
        <p>Each record consists of a timestamp, an identification hash, a binary label
to indicate fraud, and 1 013 anonymized features. Attributes are masked by
renaming. Each attribute is inspected algorithmically; if an attribute contains no
more than 200 unique values, it is considered to be a code, which is recoded to
a numeric value. Numeric and text values are transformed, based on frequency, into
up to 801 bins. Base attributes are constructed from the current transaction, as
well as from the history of the account holder and beneficiary. Aggregations found to
be useful for the current business rule system were added.</p>
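        <p>One plausible reading of this preprocessing in pandas (a sketch under assumptions: the 200-value threshold decides between code recoding and frequency-based binning, and the bank's exact binning procedure is not public):</p>
        <preformat><![CDATA[
import numpy as np
import pandas as pd

def anonymize_feature(s: pd.Series, code_threshold=200, max_bins=801):
    if s.nunique() <= code_threshold:          # treated as a code ...
        return s.astype("category").cat.codes  # ... recoded to a numeric value
    # frequency-based transformation into up to `max_bins` bins
    pct = s.rank(method="average", pct=True)   # frequency rank in (0, 1]
    return np.ceil(pct * max_bins).astype(int) # bin index in 1..max_bins
]]></preformat>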
        <p>The task in the Rabobank dataset is to predict whether transactions are fraudulent
(label=1) or not (label=0), having learned from historical data only. In order to
be useful alongside the current fraud detection, the bank requires the FPR to
be stable and far below 1:10 000.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experimental Setup</title>
      <p>All datasets are imbalanced: there are substantially more negative records than
positive ones. Since we build on scikit-learn's Python Decision Tree
implementation as weak learner, we use dummy variables to handle the categorical variables
in the Adult dataset. For the Credit-card and Rabobank datasets, all values
were given as numeric in the first place. We discard the `Time' column in the
Credit-card dataset. The beam search algorithm for EMM [5, Algorithm 1] is
parametrized with search width w = 3 and search depth d = 3, and incorporates
the top-q subgroups into BoostEMM with q = 6. We use an AdaBoost
implementation with decision stumps (i.e. decision trees of depth 1).</p>
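      <p>In scikit-learn terms, this setup might look as follows (a sketch; the OpenML fetch is a convenience stand-in for the UCI download, and the constants merely echo the parameters above):</p>
      <preformat><![CDATA[
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# The Adult dataset (UCI); fetched here from OpenML for convenience.
adult = fetch_openml("adult", version=2, as_frame=True)
X = pd.get_dummies(adult.data)          # dummy variables for categoricals
y = adult.target                        # positive class: high earners

# AdaBoost baseline with decision stumps (decision trees of depth 1).
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1)).fit(X, y)

# Beam-search parametrization of the EMM runs inside BoostEMM.
BEAM_WIDTH, SEARCH_DEPTH, TOP_Q = 3, 3, 6
]]></preformat>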
    </sec>
    <sec id="sec-7">
      <title>Experimental Results</title>
      <p>
        We run comparative experiments with seven competitors; results can be found
in Table 2. The models are Straw Man (majority class), AdaBoost SAMME [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
AdaBoost SAMME.R [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and BoostEMM with each of the target interaction
exceptionality components (cf. Section 3.3). For each competitor we report
accuracy evaluated on a withheld test set. Since all datasets are imbalanced, we
also report Kappa and AUC. For all measures, higher is better.
      </p>
      <p>Since Adults is the only non-anonymized dataset, we present descriptions
discovered during the BoostEMM training for a qualitative inspection. Subgroups
that are deemed most problematic by the four compound quality measures in
the first five iterations of the BoostEMM process can be found in Tables 3–6.</p>
      <sec id="sec-7-1">
        <title>Discussion</title>
        <p>Table 2 shows that BoostEMM can sometimes match the performance of
AdaBoost, but sometimes it does not do so well. As expected, $\varphi_{\mathrm{err}}$ mimics AdaBoost
best in terms of pure performance; it barely loses accuracy on the Rabobank and
Credit-card datasets in comparison with AdaBoost, while it has to cede some
ground on the Adults dataset. Interestingly, while BoostEMM with $\varphi_{\mathrm{FNR}}$ leads
to substantial accuracy loss on two of the datasets, it performs unexpectedly well
on the third. All methods have a high accuracy on the Credit-card dataset, but
in terms of AUC, $\varphi_{\mathrm{FNR}}$ outperforms all other methods including AdaBoost.</p>
        <p>When we inspect subgroups in more detail (cf. Tables 3–6), we obtain more
transparency and hence accountability in the boosting process. This
transparency is augmented by the visualization introduced in Figure 1.
Additionally, from Table 6 we learn that BoostEMM suffers from a familiar problem
in data mining. This table follows the process of boosting subgroups featuring
an unusually high False Positive Rate. As the table shows, the top subgroups
found in the first five iterations have an undefined FPR: there are no positives
in these subgroups at all. This is caused by the first weak learner assigning all
records to the majority class, which is negative: the process only features true
and false negatives! In this setting, FPR boosting makes no sense. Therefore, in
future work, we plan to tackle this problem by dovetailing the various kinds of
boosting BoostEMM has to offer.</p>
        <p>[Table 6: top subgroups boosted with $\varphi_{\mathrm{FPR}}$ in the first five iterations (Adults dataset); each row lists the description $D$, then $|G_D|$, TN, FP, FN, TP:
capital-gain ≤ 6896.48 ∧ age ≤ 19.52 ∧ education: 7th-8th ≠ 1 (1632; 17, 0, 1615, 0);
capital-loss ≤ 1802.48 ∧ [...] ∧ marital-status: Married-civ-spouse = 1 (493; 16, 0, 477, 0);
capital-gain ≤ 24137.69 ∧ marital-status: Married-civ-spouse ≠ 1 ∧ age ≤ 19.52 (98; 1, 0, 97, 0);
capital-gain ≤ 1802.48 ∧ marital-status: Married-civ-spouse = 1 ∧ age &gt; 27.07 (887; 133, 0, 754, 0);
capital-loss ≤ 24137.69 ∧ marital-status: Married-civ-spouse ≠ 1 ∧ age ≤ 19.52 (98; 1, 0, 97, 0)]</p>
        <p>A similarly detailed investigation as the one in Tables 3–6 has been made for
the Rabobank dataset. Here, the attribute names are all obfuscated; we find them
in the form C 0010. However, we presented the resulting subgroups in such tables
to domain experts at Rabobank, who possess the key to translate obfuscated
features back to real-life information. They reported back that the subgroups
focus on the historical behavior of the customer or counterparty. Subgroups
reported in the first iteration make the initial, rough cut. Subgroups reported
in the second iteration give it more detail towards a specific modus operandi.
Client confidentiality disallows us to discuss more details about these subgroups,
but the domain experts confirm that the problematic areas have clear meaning
to them, which provides us with confidence that BoostEMM indeed adds the
desired transparency to the boosting process.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.D.</given-names>
            <surname>Bay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          .
          <article-title>Detecting Change in Categorical Data: Mining Contrast Sets</article-title>
          .
          <source>Proc. KDD</source>
          , pp.
          <volume>302</volume>
          –
          <issue>306</issue>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Dal Pozzolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Caelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , G. Bontempi.
          <article-title>Calibrating Probability with Undersampling for Unbalanced Classification</article-title>
          .
          <source>Proc. SSCI</source>
          , pp.
          <volume>159</volume>
          –
          <issue>166</issue>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Efficient Mining of Emerging Patterns: Discovering Trends and Differences</article-title>
          .
          <source>Proc. KDD</source>
          , pp.
          <volume>43</volume>
          –
          <issue>52</issue>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ramamohanarao</surname>
          </string-name>
          .
          <article-title>Enhancing Traditional Classifiers Using Emerging Patterns</article-title>
          . In: G. Dong, J. Bailey (eds.):
          <source>Contrast Data Mining: Concepts, Algorithms, and Applications</source>
          , pp.
          <volume>187</volume>
          –
          <issue>196</issue>
          , CRC Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Feelders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          .
          <article-title>Exceptional model mining</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>30</volume>
          (
          <issue>1</issue>
          ):
          <volume>47</volume>
          –
          <fpage>98</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feelders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Leeuwen</surname>
          </string-name>
          .
          <article-title>Subgroup Discovery meets Bayesian networks – an Exceptional Model Mining approach</article-title>
          .
          <source>Proc. ICDM</source>
          , pp.
          <volume>158</volume>
          –
          <issue>167</issue>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Loza Mencía</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fürnkranz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          .
          <article-title>Multi-label LeGo – Enhancing Multi-label Classifiers with Local Patterns</article-title>
          .
          <source>Proc. IDA</source>
          , pp.
          <volume>114</volume>
          –
          <issue>125</issue>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>W.</given-names>
            <surname>Duivesteijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Thaele</surname>
          </string-name>
          .
          <article-title>Understanding Where Your Classifier Does (Not) Work – The SCaPE Model Class for EMM</article-title>
          .
          <source>Proc. ICDM</source>
          , pp.
          <volume>809</volume>
          –
          <issue>814</issue>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>European Parliament</surname>
          </string-name>
          .
          <article-title>Resolution of 16 February 2017 with recommendations to the Commission on Civil Law Rules on Robotics (2015/2103(INL))</article-title>
          . http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//TEXT+TA+P8-TA-2017-0051+0+DOC+XML+V0//EN [accessed July 3, 2017],
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <article-title>A decision-theoretic generalization of on-line learning and an application to boosting</article-title>
          .
          <source>Journal of Computer and System Sciences</source>
          <volume>55</volume>
          (
          <issue>1</issue>
          ):
          <volume>119</volume>
          –
          <fpage>139</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          .
          <article-title>Additive logistic regression: a statistical view of boosting</article-title>
          .
          <source>Annals of Statistics</source>
          <volume>28</volume>
          (
          <issue>2</issue>
          ):
          <volume>337</volume>
          –
          <fpage>407</fpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Henelius</surname>
          </string-name>
          , K. Puolamaki, H. Bostrom, L. Asker,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papapetrou</surname>
          </string-name>
          .
          <article-title>A peek into the black box: exploring classifiers by randomization</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>28</volume>
          (
          <issue>5</issue>
          –6):
          <volume>1503</volume>
          –
          <fpage>1529</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>A.</given-names>
            <surname>Henelius</surname>
          </string-name>
          , K. Puolamaki, I. Karlsson,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Asker</surname>
          </string-name>
          , H. Bostrom, P. Papapetrou. GoldenEye++
          <article-title>: A Closer Look into the Black Box</article-title>
          .
          <source>Proc. SLDS</source>
          , pp.
          <volume>96</volume>
          –
          <issue>105</issue>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.J.</given-names>
            <surname>Carmona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>del Jesus</surname>
          </string-name>
          .
          <article-title>An overview on subgroup discovery: foundations and applications</article-title>
          .
          <source>Knowledge and Information Systems</source>
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <volume>495</volume>
          –
          <fpage>525</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. W. Klosgen. Explora:
          <article-title>A Multipattern and Multistrategy Discovery Assistant</article-title>
          .
          <source>Advances in Knowledge Discovery and Data Mining</source>
          , pp.
          <volume>249</volume>
          –
          <issue>271</issue>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>A.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cremilleux</surname>
          </string-name>
          , J. Furnkranz,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scholz</surname>
          </string-name>
          . From Local Patterns to Global Models:
          <article-title>The LeGo Approach to Data Mining</article-title>
          .
          <source>Proc. LeGo:</source>
          From Local Patterns to Global Models workshop @ ECML/PKDD, pp.
          <volume>1</volume>
          –
          <issue>16</issue>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>P.</given-names>
            <surname>Kralj Novak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lavrač</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.I.</given-names>
            <surname>Webb</surname>
          </string-name>
          .
          <article-title>Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>10</volume>
          :
          <fpage>377</fpage>
          –
          <fpage>403</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>D.</given-names>
            <surname>Leman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feelders</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.J.</given-names>
            <surname>Knobbe</surname>
          </string-name>
          .
          <article-title>Exceptional Model Mining</article-title>
          .
          <source>Proc. ECML/PKDD (2)</source>
          , pp.
          <volume>1</volume>
          –
          <issue>16</issue>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          , UCI Machine Learning Repository, http://archive.ics.uci.edu/ml, University of California, Irvine, School of Information and Computer Sciences,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>P.M.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Servedio</surname>
          </string-name>
          .
          <article-title>Random classification noise defeats all convex potential boosters</article-title>
          .
          <source>Machine Learning</source>
          <volume>78</volume>
          (
          <issue>3</issue>
          ):
          <fpage>287</fpage>
          -
          <lpage>304</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>S.</given-names>
            <surname>Wrobel</surname>
          </string-name>
          .
          <article-title>An Algorithm for Multi-relational Discovery of Subgroups</article-title>
          .
          <source>Proc. PKDD</source>
          , pp.
          <volume>78</volume>
          –
          <issue>87</issue>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>