<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>What Can We Expect from Active Class Selection?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mirko Bunse</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katharina Morik</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>TU Dortmund, AI Group</institution>
          ,
          <addr-line>44221 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The promise of active class selection is that the proportions of classes can be optimized in newly acquired data. In this short paper, we take a step towards the identification of properties that data sets must meet in order to make active class selection (potentially) successful. Also, we compare the conceivable benefit of active class selection to that of active learning and we identify open research issues. It becomes apparent that active class selection is a tough task, in which informed strategies often exhibit only minor improvements over random sampling.</p>
      </abstract>
      <kwd-group>
        <kwd>Active class selection</kwd>
        <kwd>Active learning</kwd>
        <kwd>Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Active class selection (ACS) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] seeks to optimize the proportions of classes in
newly acquired data. This process is carried out sequentially: in each iteration,
the most promising class proportions are selected and instances are generated
according to these proportions. Due to this iterative collection of training data,
there is a certain similarity between ACS and active learning (AL) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However,
data acquisition differs fundamentally between the two paradigms: where
AL selects unlabeled instances to be labeled, ACS selects classes for which new
instances are to be generated.
      </p>
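      <p>The iterative procedure can be sketched as follows. This is a hypothetical outline, not a published implementation; the callables generate, train, and score are placeholders for whatever the concrete application provides:</p>
      <preformat>
```python
def active_class_selection(generate, train, score, classes, budget, batch_size):
    """Sketch of a generic ACS loop: in each iteration, choose class
    proportions, query the data generator for new labeled instances,
    and retrain. All three callables are application-specific placeholders."""
    data = []
    while budget - len(data) >= batch_size:
        # score() returns one weight per class, e.g. its empirical error;
        # normalizing the weights yields the next batch's class proportions
        weights = score(train(data), classes) if data else [1.0] * len(classes)
        total = sum(weights)
        proportions = [w / total for w in weights]
        for label, p in zip(classes, proportions):
            for _ in range(round(p * batch_size)):
                data.append((generate(label), label))  # one query per instance
    return train(data)
```
      </preformat>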
      <p>
        This distinction reveals the contrasting assumptions that underlie AL and
ACS with regard to the data-generating process: AL assumes an external oracle
that is able to assign labels to observations, e.g. a human annotator. ACS
assumes a data generator which produces observations from label queries. One
prominent example of such a generator is the artificial nose experiment, where a
vapor (the label) must be selected before a sensor array can record data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In
both cases, each query is assumed to be costly. Therefore, ACS and AL try
to minimize the amount of training data by selecting only the most promising
examples. Let us thus narrow the question raised in the title: given that new
training data can be generated from label queries, can we expect ACS to make optimal
use of a limited data generation budget? Which preconditions must hold to make
ACS a success? Our contribution with respect to these questions is three-fold:
• We identify common properties of the data used in ACS publications.
• We compare the potential benefit of ACS to that of AL.
• We recognize open issues in ACS research.
      </p>
      <p>The first of these contributions is detailed in Sec. 2. The second and third
are presented in Sec. 3 and Sec. 4, respectively. Finally, Sec. 5 concludes our findings.</p>
    </sec>
    <sec id="sec-2">
      <title>Data Used in ACS</title>
      <p>
        Despite the potential relevance of ACS, we could identify only two papers
that propose algorithms for this task. Lomasky et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], who also introduced
ACS, present five approaches, the most successful of which seek to stabilize the
empirical error of each class. Kottke et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] compare these approaches to a
framework with which AL methods can be adapted to ACS. Namely, they use
AL to score pseudo instances and aggregate the scores for each class. Both papers
use random sampling (proportional and uniform) as a baseline.
      </p>
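      <p>The error-stabilizing idea can be sketched as follows. This is a simplified reading of that family of strategies, not the authors’ exact algorithm: classes with a higher empirical error receive a larger share of the next batch.</p>
      <preformat>
```python
def error_based_proportions(y_true, y_pred, classes):
    """Allocate the next batch's class proportions proportionally to the
    empirical per-class error rate. A simplified sketch of the idea behind
    error-stabilizing ACS strategies, not a published implementation."""
    errors = []
    for c in classes:
        pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == c]
        if pairs:
            err = sum(1 for t, p in pairs if p != t) / len(pairs)
        else:
            err = 1.0  # no data yet: treat the class as maximally difficult
        errors.append(err)
    total = sum(errors)
    if total == 0:
        return [1.0 / len(classes)] * len(classes)  # all perfect: uniform
    return [e / total for e in errors]
```
      </preformat>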
      <p>
        [Tab. 1 also lists Redistricting [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] among the compared methods; the table itself is not reproduced here.]
      </p>
      <p>Tab. 1 summarizes the results that have been reported for these methods.
The columns with upright names indicate whether a method clearly outperforms
its competitors (✓) or not (✗). Missing values denote that a method has not
been evaluated. Please consider that the qualification of a “winner” must remain
somewhat subjective. We therefore declare multiple methods as winners wherever
a single winner cannot be made out from the published plots and tables.</p>
      <p>
        One observation is that the random strategies “proportional” and
“uniform” are highly competitive. In this overview, they win on five out of
eight data sets. Moreover, they come for free, whereas the informed (i.e.
non-random) strategies imply a certain computational overhead which needs to be
justified by the data acquisition cost. Also, one may be concerned about the
applicability of (informed) active sampling in general [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Note that proportional
sampling assumes that the correct label proportions of the test set are known at
training time, which may not hold in some use cases.
      </p>
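      <p>The two random baselines can be sketched as follows; the function and its test_proportions argument are hypothetical names for this illustration, which makes explicit that proportional sampling needs the test-set proportions as an input:</p>
      <preformat>
```python
def baseline_proportions(classes, mode, test_proportions=None):
    """The two random ACS baselines: 'uniform' assigns equal shares to all
    classes; 'proportional' reuses the (assumed known) test-set proportions.
    test_proportions is a hypothetical argument for this sketch."""
    if mode == "uniform":
        return {c: 1.0 / len(classes) for c in classes}
    if mode == "proportional":
        if test_proportions is None:
            raise ValueError("proportional sampling needs known proportions")
        return dict(test_proportions)
    raise ValueError("unknown mode: " + mode)
```
      </preformat>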
      <p>
        All of the data sets used so far distinguish between at least three classes.
Moreover, the predictability differs among their classes. The synthetic
data sets, for instance, are modeled such that one class can easily be distinguished
from the other two classes, which in turn are hard to distinguish from each
other. For the UCI data sets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we provide the confusion matrices in Tab. 2.
Displayed are the mean values over 50 trials, using proportional sampling and the
classifier from the ACS experiments. Each row is scaled to unit sum to account
for class imbalance. We see that the yeast data exhibits large differences among
class difficulties (78.7% vs 41.4% class-wise accuracy). The differences on the
vertebral data are smaller, yet considerable (74.0% vs 56.1%).
      </p>
      <p>
        Given a data set consisting of at least three classes of varying difficulty, what
is the improvement that we can expect from ACS? How does it relate to the
improvement AL methods achieve? To answer these questions, we reproduce
some of the experiments described in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We add one strategy to these
experiments that is optimal for the spirals data: it uses only a single example from
the easy class and randomly samples from the difficult classes. It is “optimal”
with regard to the overall accuracy because a single example is already enough
to achieve 100% accuracy on the easy class. Even though this strategy does not
adapt to any other data set, it shows how well ACS could potentially perform.
Moreover, we extend the experiments by evaluating an AL strategy, namely the
probabilistic active learning (PAL) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which is also used inside of PAL-ACS.
Fig. 1 presents the results of these extensions, specifically the mean error over
500 trials.
      </p>
      <p>[Fig. 1: mis-classification rate on the spirals, vertebral, and yeast data sets; figure not reproduced here.]</p>
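      <p>The row normalization used for Tab. 2 can be sketched as follows (a minimal illustration, not the original evaluation code); after scaling, each row sums to one and the diagonal holds the class-wise accuracies irrespective of class imbalance:</p>
      <preformat>
```python
def normalized_confusion_matrix(y_true, y_pred, classes):
    """Confusion matrix with each row scaled to unit sum, so that entry
    [i][j] is the fraction of class i instances predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return [
        [c / max(sum(row), 1) for c in row]  # guard against empty rows
        for row in counts
    ]
```
      </preformat>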
      <p>The optimal strategy indicates that there is still room for improving ACS
methods. In particular, knowing the difficulty of the classes in advance allows
us to outperform the other strategies on the spirals data set. However, PAL
is even better than that. Knowing which examples are available thus allows
us to improve even further. These observations cannot be made on the two
UCI data sets. On both of them, neither uniform sampling nor PAL-ACS is a
clear winner, a finding we deem consistent with the original experiments. What
is perhaps surprising is that the AL strategy performs worse than ACS. We
conjecture that the identification of relevant examples is not necessarily easier
than, but considerably different from, the identification of relevant classes.</p>
    </sec>
    <sec id="sec-3">
      <title>Open Issues in ACS</title>
      <p>
        It remains open whether the current limits of (informed) ACS stem from the
problem itself, i.e. from sequentially optimizing only the class proportions,
or from the methods proposed to date. We suggest approaching this question
by studying relaxations of “pure” ACS. Indeed, example generators are often
controlled not only by class proportions but also by auxiliary parameters. In
the artificial nose experiment, for instance, not only must a vapor (the label)
be selected before data can be recorded, but also the vapor’s concentration [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Optimizing the data generation only with respect to the class proportions means
limiting the actual task artificially, and maybe even detrimentally.
      </p>
      <p>
        An issue that has so far been neglected in ACS is the problem of imbalanced
data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This problem refers to situations in which one class is abundant and
another one is scarce, typically leading to the degradation of classifiers and
evaluation metrics. It has also been argued that within-class imbalances, i.e.
abundant and scarce sub-groups within single classes, can hinder learning [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In ACS, we
are free to choose how balanced the data is, but only with respect to the label.
Methods for imbalanced learning could therefore guide ACS by constraining the
class proportions for between-class balance, and they may also correct the effects
of within-class imbalances.
      </p>
      <p>ACS addresses use cases which distinguish between at least three classes of
varying predictability. However, this precondition does not necessarily lead to a
successful application of ACS. Experiments suggest that a random sampling of
classes is hard to beat with informed strategies. We expect future advances to
be made by (i) queries which combine the label with auxiliary parameters that
control the data generator and (ii) accounting for data imbalances.</p>
      <p>Acknowledgments We thank Daniel Kottke for the discussions we had and
for his great support in reproducing the experiments on PAL-ACS. We also thank
our reviewers for their valuable comments, in particular for pointing out
imbalanced learning to us.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Attenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Provost</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          :
          <article-title>Inactive learning? Difficulties employing active learning in practice</article-title>
          .
          <source>SIGKDD Explorations</source>
          <volume>12</volume>
          (
          <issue>2</issue>
          ),
          <fpage>36</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Brodley</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedl</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Improving automated land cover mapping by identifying and eliminating mislabeled observations from training data</article-title>
          .
          <source>In: Int. Geoscience and Remote Sensing Symp</source>
          . vol.
          <volume>2</volume>
          , pp.
          <fpage>1382</fpage>
          -
          <lpage>1384</lpage>
          .
          Citeseer
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chapelle</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Active learning for Parzen window classifier</article-title>
          .
          <source>In: Proc. of the AISTATS 2005. Society for Artificial Intelligence and Statistics</source>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dua</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graff</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>UCI machine learning repository</article-title>
          (
          <year>2017</year>
          ), http://archive.ics.uci.edu/ml
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fernández</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prati</surname>
            ,
            <given-names>R.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krawczyk</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herrera</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <source>Learning from Imbalanced Data Sets</source>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kottke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krempl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teschner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spiliopoulou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multi-class probabilistic active learning</article-title>
          .
          <source>In: Proc. of the ECAI 2016. Frontiers in Artificial Intelligence and Applications</source>
          , vol.
          <volume>285</volume>
          , pp.
          <fpage>586</fpage>
          -
          <lpage>594</lpage>
          . IOS Press (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kottke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krempl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stecklina</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>von Rekowski</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sabsch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minh</surname>
            ,
            <given-names>T.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deliano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spiliopoulou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sick</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Probabilistic active learning for active class selection</article-title>
          .
          <source>In: Proc. of the NIPS Workshop on the Future of Interactive Learning Machines</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lomasky</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brodley</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aernecke</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedl</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Active class selection</article-title>
          .
          <source>In: Proc. of the ECML 2007. LNCS</source>
          , vol.
          <volume>4701</volume>
          , pp.
          <fpage>640</fpage>
          -
          <lpage>647</lpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rodriguez-Lujan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonollosa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vergara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Homer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huerta</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>On the calibration of sensor arrays for pattern recognition using the minimal number of experiments</article-title>
          .
          <source>Chemometrics and Intelligent Laboratory Systems</source>
          <volume>130</volume>
          ,
          <fpage>123</fpage>
          -
          <lpage>134</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Active Learning</article-title>
          .
          <source>Synthesis Lectures on Artificial Intelligence and Machine Learning</source>
          , Morgan &amp; Claypool Publishers (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          :
          <article-title>Mining with rarity: A unifying framework</article-title>
          .
          <source>SIGKDD Explorations</source>
          <volume>6</volume>
          (
          <issue>1</issue>
          ),
          <fpage>7</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>