<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Probabilistic Active Learning with Structure-Sensitive Kernels</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dominik Lang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Kottke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georg Krempl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernhard Sick</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IES Group, Faculty of Computer Science, University of Kassel</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KMD Lab, Faculty of Computer Science, Otto-von-Guericke University</institution>
          ,
          <addr-line>Magdeburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>37</fpage>
      <lpage>48</lpage>
      <abstract>
<p>This work proposes two approaches to improve the pool-based active learning strategy 'Multi-Class Probabilistic Active Learning' (McPAL) by using two kernel functions based on Gaussian mixture models (GMMs). The first uses the kernels for the instance selection of the McPAL strategy; the second employs them in the classification step. The results of the evaluation show that using a classification model different from the one used for selection, especially an SVM with one of these kernels, can improve the performance of the active learner in some cases.</p>
      </abstract>
      <kwd-group>
        <kwd>active learning</kwd>
        <kwd>Gaussian mixture</kwd>
        <kwd>kernel function</kwd>
        <kwd>support vector machine</kwd>
        <kwd>McPAL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction &amp;</title>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>
        Active learning (AL) is a special case of semi-supervised machine learning, in
which a learning algorithm has both labeled and unlabeled data available to it
and is able to acquire the true labels of instances from an external source, in
most cases one or multiple human agents. Since the number of labels that can
be acquired is limited due to the cost that the acquisition entails, AL strategies
aim to select instances that maximize the learner's classification performance
while being efficient with respect to the costs. A pool-based AL strategy named
'Multi-Class Probabilistic Active Learning' (McPAL) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] has been shown to
outperform competing strategies. This paper investigates the possibility of improving
the performance of the method by including the information captured by a
Gaussian mixture model (GMM) into the active learner. To achieve this, two kernel
functions that are based on a GMM are used. These structure-sensitive kernel
functions, based on the GMM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and RWM [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] distance measures, are
leveraged by the active learner in two different ways: (1) by being included in the
computation of the McPAL score, (2) by being used in the model that performs
classification based on the sampled set of labeled instances. These approaches
are compared to the original McPAL method as well as random sampling in a
series of experiments on one artificial data set and nine real-world data sets from
the UCI machine learning repository [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Related Research</title>
      <p>
        In active learning, various criteria have been proposed to determine which
instances are most helpful to learn a classification model. One of the most common
is the model’s uncertainty regarding the classification of a sample. A strategy
that solely relies on this criterion is known as uncertainty sampling (US) [
        <xref ref-type="bibr" rid="ref13 ref18">13, 18</xref>
        ].
In their application of US to SVMs, Tong and Koller [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] argued that the
goal is to approximately halve the version space through selecting instances
that lie closest to the current decision boundary of the classifier. To extend
this to multi-class problems, many methods have been proposed, for example,
’Best-versus-Second-Best’ [
        <xref ref-type="bibr" rid="ref11 ref8 ref9">8, 9, 11</xref>
        ] (also referred to as ’Margin Sampling’ [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ])
or Entropy-based sampling [
        <xref ref-type="bibr" rid="ref18 ref8 ref9">8, 18, 9</xref>
        ]. Solely relying on this criterion to select
instances has been shown to be prone to being ’locked in’, ignoring possibly
informative instances in favor of refining the current decision boundary [
        <xref ref-type="bibr" rid="ref12 ref18">18, 12</xref>
        ].
Hence, various approaches have been proposed that, in addition to uncertainty,
also include other criteria. These include, for example, the diversity of the
sampled instances [
        <xref ref-type="bibr" rid="ref2 ref5">2, 5</xref>
        ] or the density around a candidate instance [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. A promising
AL strategy is Multi-class Probabilistic Active Learning (McPAL) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which
has shown promising results compared to other approaches. It combines the
density, the class posterior probability and the number of already sampled instances
in the neighborhood of a candidate instance to estimate the potential gain of
acquiring an instance’s label. Instances that entail the highest potential gain are
selected for labeling by the strategy. This acquisition is performed in a
one-by-one fashion.
      </p>
      <p>
        However, the selection does not have to be solely based on supervised models
but can also use unsupervised approaches. Known clustering algorithms like
k-medoid [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] or hierarchical clustering [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] as well as generative models like GMMs
[
        <xref ref-type="bibr" rid="ref16 ref7">16, 7</xref>
        ] can be used to model the structure of the data and include it in the
selection process.
      </p>
      <p>
        The classification models that are used in the process of instance selection are
often used for training the final classifier. Tomanek &amp; Morik [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] investigated
to which degree the bias towards the learning algorithm used in the selection
process affects what they call label reusability, i.e., the training of a classifier
other than the one used for selection on the acquired labeled data. They
introduced the terms selector and consumer classifier to describe the model used
for selecting instances and the model performing classification based on them, respectively.
Contrary to their initial assumptions, they concluded that self-selection (the
selector and consumer classifier are the same) is in fact not in all cases the best
choice.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Using GMM-based Kernels with the McPAL Strategy</title>
      <p>
        Since the majority of data available in the scenario of AL is unlabeled and
therefore carries no explicit information about the mapping f : x ↦ y, implicit
information contained in the structure of the data becomes even more important.
The GMM ([
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Eq. 6) and RWM ([
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Eq. 7) distance measures (denoted as
ΔGMM and ΔRWM) are based on Gaussian mixture models (GMMs). A GMM
models data with a number J of multivariate Gaussian distributions, referred to
as 'components'. To speed
up the training process, first k-means clustering is performed to find J clusters
in the data. Based on the samples belonging to the clusters, the initial means
and variances are computed to initialize the Gaussian distributions. Then the
components are refined, either with the Expectation Maximization (EM) or
Variational Inference (VI) training method [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], using only the feature vectors x of
the samples. The GMMs used in this paper are trained with the VI method.
The result of the training is a GMM with J components, component weights
φj that determine the influence of each component in the mixture (as part of the
VI training method, the weights of some components can be set close to zero,
effectively 'pruning' them from the model), as well as the
component covariance matrices Σj. Building on such a mixture model, the GMM
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and RWM distance measures [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] consist of the Mahalanobis distance of two
instances a and b with respect to the covariance matrices of the mixture model,
weighted in two different ways: (1) the distance is weighted by the mixture
coefficients of the model (GMM-distance, Eq. 1); (2) the distance is weighted by half
the sum of the components' responsibilities for the two instances (RWM-distance,
Eq. 2). Both measures include the information captured by the GMM in the
distance measure. The resulting distance is small if both instances lie closest to
the same GMM component, and large if their closest GMM components differ.
These distance measures are incorporated into kernel functions by substituting
the Euclidean distance in the Gaussian RBF kernel with the GMM or RWM
distance respectively [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. The kernel functions thereby keep the parameter γ
of the RBF kernel. These kernel functions can be used in kernel-based learning
methods like SVMs or Parzen-Window kernel density estimation.
      </p>
      <p>
        $\Delta_{GMM}(a, b) = \sum_{j=1}^{J} \phi_j \, \Delta_{\Sigma_j}(a, b)$ (1)
      </p>
      <p>
        $\Delta_{RWM}(a, b) = \sum_{j=1}^{J} \frac{1}{2} \left( p(j \mid a) + p(j \mid b) \right) \Delta_{\Sigma_j}(a, b)$ (2)
      </p>
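      <p>
        For illustration, the following is a minimal, hedged sketch of Eqs. 1 and 2 and of the resulting kernel functions, assuming a mixture fitted with scikit-learn's BayesianGaussianMixture (VI training with k-means initialization, as described above). The function names and the squared-distance form of the kernel are assumptions made for illustration, not the authors' implementation.
      </p>
      <preformat>
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_mixture(X, n_components=5, seed=0):
    # VI-trained GMM with k-means initialization, as described in the text
    return BayesianGaussianMixture(
        n_components=n_components, covariance_type="full",
        init_params="kmeans", random_state=seed).fit(X)

def mahalanobis(a, b, cov):
    # Mahalanobis distance of instances a and b w.r.t. one component covariance
    d = a - b
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def gmm_distance(gmm, a, b):
    # Eq. 1: component-wise distances weighted by the mixture coefficients phi_j
    return sum(w * mahalanobis(a, b, c)
               for w, c in zip(gmm.weights_, gmm.covariances_))

def rwm_distance(gmm, a, b):
    # Eq. 2: weighted by half the sum of the responsibilities p(j|a) and p(j|b)
    resp = gmm.predict_proba(np.vstack([a, b]))
    return sum(0.5 * (resp[0, j] + resp[1, j]) * mahalanobis(a, b, gmm.covariances_[j])
               for j in range(gmm.n_components))

def make_kernel(distance, gamma):
    # RBF kernel with the Euclidean distance replaced by the GMM or RWM
    # distance; the squared form mirrors the Gaussian RBF kernel (assumption)
    return lambda a, b: np.exp(-gamma * distance(a, b) ** 2)
      </preformat>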
      <p>
        The research question of this work is whether the inclusion of structural
information by means of such kernels improves the performance of the McPAL approach.
To examine this, this work investigates two possible ways in which the McPAL strategy
can employ such kernels and to which extent these benefit the strategy. The first
approach of using these structure-sensitive kernel functions in combination with
the McPAL strategy is to incorporate them into the process of instance selection.
To this end, two changes to the method are made that are described in the
following.</p>
      <p>
        First, the GMM/RWM kernels replace the Gaussian RBF kernel in the
computation of the kernel frequency estimates (denoted as $\vec{k}_x$ in Eq. 4), which are
required by the McPAL method, by means of the Parzen-Window method. These
frequency estimates are computed in the same way as in the original McPAL
approach [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], i.e. by computing the kernel density estimates with the
Parzen-Window method, but leaving out the normalization by the number of samples:
      </p>
      <p>
        $k_{x,y} = \sum_{\{(x', y');\, y' = y\}} K_{GMM/RWM}(x, x')$ (3)
      </p>
      <p>
        $\vec{k}_x = (k_{x,y_1}, k_{x,y_2}, \dots, k_{x,y_n})$ (4)
      </p>
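      <p>
        A hypothetical sketch of these unnormalized kernel frequency estimates (Eqs. 3 and 4), reusing a kernel constructed as in the earlier sketch; the names are illustrative:
      </p>
      <preformat>
import numpy as np

def kernel_frequencies(x, X_lab, y_lab, classes, kernel):
    # Eq. 3/4: for each class y, sum the kernel values between x and all
    # labeled instances (x', y') with y' == y; no 1/N normalization
    return np.array([sum(kernel(x, xp) for xp, yp in zip(X_lab, y_lab) if yp == y)
                     for y in classes])
      </preformat>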
      <p>
Second, instead of using Parzen-Window estimation, the density estimates are
directly taken from the GMM used by the GMM and RWM kernels. The
Parzen-Window method places a kernel K with bandwidth h on each of the N samples
in the data set, with each of them contributing equally to the resulting density
estimate (see Eq. 5). The GMM uses a fixed number of J multivariate Gaussians
to model the data, the contribution of each of these components being weighted
by the mixture coefficient or component weight φ of the component (see Eq. 6).
These changes enable the McPAL strategy to use the information provided by
the GMM and RWM kernels in the instance selection process. For the purpose of
disambiguation, this modified version of the McPAL strategy is in the following
referred to as StrucPAL.</p>
      <p>
        $p(x) = \frac{1}{N} \sum_{n=1}^{N} K_h(x - x_n) = \frac{1}{Nh} \sum_{n=1}^{N} K\left( \frac{x - x_n}{h} \right)$ (5)
      </p>
      <p>
        $p(x) = \sum_{j=1}^{J} \phi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)$ (6)
      </p>
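      <p>
        A minimal sketch of the two density estimates, assuming a Gaussian choice for the Parzen kernel K (an illustrative assumption) and a mixture fitted as in the earlier sketch:
      </p>
      <preformat>
import numpy as np
from scipy.stats import multivariate_normal

def parzen_density(x, X, h):
    # Eq. 5: every one of the N samples contributes equally via a kernel
    # of bandwidth h (here a Gaussian, as an illustrative choice)
    d = X.shape[1]
    return float(np.mean([multivariate_normal.pdf(x, mean=xn, cov=h**2 * np.eye(d))
                          for xn in X]))

def gmm_density(gmm, x):
    # Eq. 6: the mixture density, read directly from the fitted model
    return float(np.exp(gmm.score_samples(x.reshape(1, -1))[0]))
      </preformat>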
      <p>
The second approach to use structure-sensitive kernels to improve the
performance of the McPAL strategy is by using them in the consumer classifier. This
is possible in two ways, either as ’self-selection’ or ’foreign-selection’. Tomanek
&amp; Morik [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] use these terms to refer to, in the first case, the selector and
consumer classifiers being the same, or in the second case, the selector and
consumer classifiers being different. Therefore, two scenarios for using the GMM and
RWM kernels in the classification process are possible. The first is the StrucPAL
method being used with self-selection, so pwcrwm or pwcgmm act as both selector
and consumer classifier respectively. The second is that the McPAL or StrucPAL
strategy is employed for instance selection but classification is performed by a
foreign classifier which uses the GMM or RWM kernels, i.e., a Parzen-Window
classifier or an SVM.
      </p>
      <p>This work aims to investigate two questions regarding the use of structural
information by the McPAL method, in order to gain additional insight into what
approaches are worth exploring in future research.</p>
      <p>The first question is whether, and to what extent, the performance of the
StrucPAL method differs from the original McPAL method. Due to the already
mentioned inclusion of the information of the underlying mixture model into the
instance selection process, a positive impact on the performance is expected.
The second question is if and to what extent the McPAL and StrucPAL learners
benefit from foreign-selection. The first part of this question is to investigate, how
the performance of McPAL and StrucPAL learners using self-selection compares
to using a SVM with the same kernel as the selector as consumer classifier. The
second part of this question is to investigate, how the original McPAL strategy
can benefit from consumer classifiers that use the GMM or RWM kernels.
</p>
    </sec>
    <sec id="sec-5">
      <title>Experiments</title>
      <p>
        In our experiments, eight data sets from the UCI machine learning repository [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]
are used, namely australian, glass, haberman, heart, qsar-biodeg (in the following
abbreviated as 'qsar'), steel-plates-fault (in the following abbreviated as 'steel'),
vehicle and vertebral. Furthermore, the phoneme data set from OpenML
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is used. In addition to that, an artificial 2D data set referred to as blobs is
used, consisting of three Gaussians that make up the classes.
      </p>
      <p>The experiments include three AL strategies: McPAL (mp), StrucPAL (sp) and
random sampling (rl). As classifiers, the Parzen-Window classifier (pwc) and
support vector machine classifier (svm) are used. The PWC and SVM classifiers
use either the Gaussian RBF, the RWM or the GMM kernels, as introduced
earlier. The kernel that the classifier uses is denoted in subscript, for example
pwcrbf . An active learner in the experiments consists of three components: the
AL strategy (al), the selector classifier (clal) and the consumer classifier (cl). In
the case of self-selection, cl and clal are identical.</p>
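      <p>As an illustration of how a consumer classifier with one of these kernels could be realized, the following sketch plugs the RWM kernel from the earlier sketch into scikit-learn's SVC via a precomputed Gram matrix; gmm, X_lab, y_lab and X_new are placeholders, and this is an assumed realization, not the authors' implementation.</p>
      <preformat>
import numpy as np
from sklearn.svm import SVC

def gram(kernel, A, B):
    # Gram matrix between two instance sets for a callable kernel
    return np.array([[kernel(a, b) for b in B] for a in A])

# gmm: mixture fitted on all available feature vectors (see fit_mixture above)
kernel = make_kernel(lambda a, b: rwm_distance(gmm, a, b), gamma=0.5)
svm_rwm = SVC(kernel="precomputed")
svm_rwm.fit(gram(kernel, X_lab, X_lab), y_lab)      # train on the labeled set L
predictions = svm_rwm.predict(gram(kernel, X_new, X_lab))
      </preformat>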
      <p>Based on a set of 10 seeds for randomization, for each seed, the data sets are split
using five-fold stratified cross-validation. One fold per split is used as holdout set
to test the trained consumer classifier, while the four remaining folds are used
as training data. The initial labeled set L is initialized with one instance from
two randomly picked classes in the training data. The random choice is based
on the seed used in the current cross-validation split. Starting with L initialized
with 2 instances and the rest of the training set comprising the unlabeled set
U, pool-based AL is performed. As part of this, the labels of 60 instances in
total are acquired in a one-by-one fashion with both the selector and consumer
classifier being updated after each acquisition. Then the consumer classifier is
evaluated on the holdout set using the accuracy metric. This process is repeated
until every fold has been used as test set once. The performance scores at every
point in the AL process are averaged over all folds. After each of the 10 seeds
has been used to seed the cross-validation split, the final results are obtained by
computing the average accuracy per step in the AL process and the standard
deviation of the accuracy over all seeds.</p>
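      <p>The following hedged sketch summarizes one cross-validation run of this procedure; select (the AL strategy), selector and consumer are stand-ins for components not shown here, so this is an assumption about the setup described above rather than the authors' code.</p>
      <preformat>
import numpy as np

def run_active_learning(select, selector, consumer, X_train, y_train,
                        initial_idx, X_test, y_test, budget=60):
    # One cross-validation run: 60 one-by-one label acquisitions, with both
    # classifiers retrained and the consumer evaluated after each acquisition
    labeled = set(initial_idx)          # initial L: one instance from each of
    accuracies = []                     # two randomly picked classes
    for _ in range(budget):
        unlabeled = [i for i in range(len(X_train)) if i not in labeled]
        chosen = select(selector, X_train, y_train, sorted(labeled), unlabeled)
        labeled.add(chosen)             # acquire the true label of this instance
        idx = sorted(labeled)
        selector.fit(X_train[idx], y_train[idx])
        consumer.fit(X_train[idx], y_train[idx])
        accuracies.append(consumer.score(X_test, y_test))
    return accuracies
      </preformat>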
      <p>The hyperparameters of the models for each data set are determined by
performing an exhaustive search over a parameter grid on a subset of the data.
This subset is a stratified, seed-based random subsample consisting of 90
instances (the seed used for the model selection was not used in the splits for
the experiments themselves). The 90 instances are split using three-fold stratified
cross-validation, with one fold being used to train a classifier with a given set of
parameters, while the other two folds are used to evaluate the performance of this
classifier. The small size of this tuning data set is due to the fact that in AL
applications there is little labeled data available; performing model selection
with a large tuning set would therefore be unrealistic. However, a review of the
literature on active and semi-supervised learning did not provide a fitting way to
determine the hyperparameters without using more labeled data than would be
available in this scenario.</p>
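      <p>A sketch of this tuning protocol, under the stated assumptions (the parameter grid and estimator are illustrative; note the inverted cross-validation split, in which the single fold trains and the remaining two folds validate):</p>
      <preformat>
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, train_test_split

def tune(estimator, grid, X, y, seed=0):
    # Stratified, seed-based subsample of 90 instances for tuning
    X_sub, _, y_sub, _ = train_test_split(
        X, y, train_size=90, stratify=y, random_state=seed)
    best_params, best_score = None, -np.inf
    for params in grid:  # e.g. [{"gamma": 0.1}, {"gamma": 1.0}, ...]
        scores = []
        skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
        for big_idx, small_idx in skf.split(X_sub, y_sub):
            # inverted split: the single fold trains, the other two validate
            model = clone(estimator).set_params(**params)
            model.fit(X_sub[small_idx], y_sub[small_idx])
            scores.append(model.score(X_sub[big_idx], y_sub[big_idx]))
        if np.mean(scores) > best_score:
            best_params, best_score = params, float(np.mean(scores))
    return best_params
      </preformat>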
    </sec>
    <sec id="sec-6">
      <title>Results &amp; Discussion</title>
      <p>In the following, the results for the scenarios of self-selection and foreign-selection
are presented in two ways.</p>
      <p>First, the average accuracy scores and the corresponding standard deviations are
tabulated for the different active learners on each data set. The highest accuracy
score on a data set is printed in bold font. In the case of learners scoring equally,
the lower standard deviation decides the winner. In case these are also identical,
the first place is shared. For each learner, the difference in accuracy
score to the highest score on each data set is computed and averaged over all data
sets. This average difference in accuracy to the winners is shown in the column
'diff'. Based on this difference, the learners are ranked, as shown in the column 'rank'.
The second way of illustrating the results is so-called learning curve plots. These
show the performance of the learners on a given data set over the entire AL
process, that is, for each acquired label.</p>
      <sec id="sec-6-1">
        <title>Self-Selection</title>
      <p>Self-Selection
First, the results of the experiments for the scenario of self-selection, shown
in Tab. 1, will be considered. In the three moments in the learning process
at 10, 20 and 30 acquired labels a good performance of the original McPAL
method can be observed. It performs best on 6 of 10 data sets at 10 sampled
instances, scoring the first rank in the comparison to the two StrucPAL variants
and random selection learners with the Parzen-Window classifier with the RBF,
GMM and RWM kernels. At 20 sampled instances, McPAL performs best on 5
data sets, scoring second rank and at 30 sampled instances it is best on 8 data
7 The seed used for the model selection was not used in the splits for the experiments
themselves.
sets, scoring first rank again. Based on these observations a solid performance can
be attested to the McPAL strategy, although it does not manage to perform best
on the steel and vehicle, where it is beaten by random selection with only one
exception (vehicle, 10 sampled instances). The StrucPAL method only manages
to perform better than McPAL on the blobs data set at 10 and 20 sampled
instances as well as on heart at 20 sampled instances, although Fig. 1 shows an
overall better performance of StrucPAL on heart. The gap between the scores
of StrucPAL to the best performing learner on each data set varies in size, but
when averaged leads to the two StrucPAL learners taking the last two ranks in
the ranking.</p>
        <p>Concluding the results of the self-selection scenario, the StrucPAL method did
not provide better classification performance than the original method. Based
on this observation, it appears that including the structural information from
the Gaussian mixture model in the selection process did not improve the McPAL
method.</p>
        <p>[Tab. 1: Average accuracy and standard deviation after 10, 20 and 30 labeled instances with the Parzen-Window classifier (PWC), using self-selection or random selection. Abbreviations are explained in Sec. 4.]</p>
      </sec>
      <sec id="sec-6-2">
        <title>Foreign-Selection</title>
        <p>[Tab. 2: Accuracy scores at the stages of 10, 20 and 30 sampled instances. For every AL strategy, self-selection is compared to the use of an SVM (with the same kernel as the selector) as consumer classifier.]</p>
      <p>
        As originally pointed out by Tomanek and Morik [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], it can be observed that
foreign-selection can indeed be beneficial with regard to classification
performance. However, the extent of this varies in the experiments, ranging from a
difference in accuracy of 0.01 to 0.08, and is limited to some of the data sets.
Based on the averaged difference in score to the best-performing method,
self-selection scores better than foreign-selection in all three segments. This analysis,
however, only included a consumer classifier (SVM) that uses the same kernel
function as the selector. In order to investigate how McPAL learners perform when
paired with consumer classifiers using the GMM and RWM kernels, a separate
tabulation is shown in Tab. 3.
      </p>
      <p>Although the self-selection learner McPAL+pwcrbf scores the first rank
in all three stages, it can be observed that McPAL can benefit from different
consumer classifiers. At the stage of 10 sampled instances, an SVM with the GMM
kernel scores a higher accuracy on the steel (+0.1) and vehicle (+0.07) data sets,
with minor gains being provided by an SVM with the RBF kernel (+0.01 on qsar,
+0.02 on glass) and an SVM with the RWM kernel (+0.02 on haberman). However,
these gains are accompanied by worse performance than McPAL on other data
sets. The advantage provided by the foreign classifiers diminishes at the stages of 20
and 30 sampled instances, with svmgmm still showing good gains at 20 sampled
instances (+0.05 on steel, +0.1 on vehicle).</p>
      <p>Fig. 2 shows the learning curves on the vehicle and steel data sets. While on
vehicle a solid advantage of svmgmm, svmrbf and pwcgmm over McPAL in terms
of accuracy can be observed, the development on steel is different. While the
svmgmm and svmrwm learners perform well early on due to a stagnating but better
performance, they fail to exploit the additionally acquired labels in the fashion
of the other learners, resulting in a slight but increasing advantage for the learners
using GMM-based PWCs later, which are surpassed by svmrbf in the last phase
of the learning process.</p>
      <p>Concluding the results of the foreign-selection scenario, it can be summarized
that although self-selection McPAL has performed solidly in the experiments, the
results indicate that the use of classifiers with GMM-based kernels in this
scenario shows potential, and the general use of foreign-selection motivates further
research.</p>
      <p>[Fig. 2: Learning curves (accuracy over the number of labeled instances, 0 to 60) on the vehicle and steel data sets.]</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>The experiments explored two possible approaches to incorporate the
information of a GMM into the McPAL method. The first approach, using two
GMM-based kernel functions in the instance selection process, has been shown not to
provide an advantage in performance compared to the original method. In
total, the original McPAL selection strategy with pwcrbf as both selector and
consumer classifier has been shown to perform better than the StrucPAL learners,
with random sampling performing better than both methods in a few cases.
Especially data sets like australian, glass, vehicle and vertebral proved harder for
the StrucPAL learners. One possible explanation for this is that the assumption
of the GMM, i.e. that the subpopulations in the data representing the different
classes fit a multivariate Gaussian distribution, does not hold in these cases.
The second approach, using the GMM-based kernel functions in the consumer
classifiers of a foreign-selection scenario, showed potential gains regarding
classification accuracy. Using a svmgmm as consumer classifier for the original McPAL
learner has shown to improve the classification performance on the steel and
vehicle data sets while performing slightly worse on others. The fact that learners
using the StrucPAL method and a PWC with the GMM or RWM kernel
generally did not benefit from using an SVM with these kernels is interesting.
It appears that either the use of the same kernel function did not mitigate the
adverse effect foreign-selection seems to entail in this case, or that the labeled
set sampled by StrucPAL is simply less fit for classification with svmgmm or
svmrwm.</p>
      <p>Regarding the performance of the SVM classifiers used in the experiments, it
has to be considered that the model selection procedure employed in the
experiments is admittedly weak. Therefore, it is possible that the hyperparameters
used in the experiments, not only for the SVMs but also the other classifiers,
are suboptimal. Considering the restrictive nature of the AL setting regarding
the availability of labeled data, this circumstance is acceptable, since using more
data for model selection would be even more unrealistic in this setting.
It appears that the McPAL strategy already performs very well at selecting the
most useful instances and including the information of the GMM does not add
to this, in some cases even hindering a good selection. Based on these results it
appears that work on the McPAL strategy in the future should focus on
improving the method regarding other aspects, for example imbalanced data.
However, using other classifiers to exploit the labeled set sampled with the
McPAL strategy has shown possible gains for improving the overall
classification performance of the active learner. The use of SVMs as consumer
classifiers showed potential, although determining fitting hyperparameters
in the setting of active learning still poses a problem that should be
investigated further.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <source>Pattern Recognition and Machine Learning (Information Science and Statistics)</source>
          , 1st edn., corr. 2nd printing. Springer, New York,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Brinker</surname>
          </string-name>
          .
          <article-title>Incorporating diversity in active learning with support vector machines</article-title>
          .
          <source>In ICML</source>
          , volume
          <volume>3</volume>
          , pages
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Cebron</surname>
          </string-name>
          and
          <string-name>
            <given-names>Michael R</given-names>
            <surname>Berthold</surname>
          </string-name>
          .
          <article-title>Active learning for object classification: from exploration to exploitation</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>283</fpage>
          -
          <lpage>299</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Gang</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tian-jiang</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Li-yu</given-names>
            <surname>Gong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Perfecto</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>Multiclass support vector machine active learning for music annotation</article-title>
          .
          <source>International Journal of Innovative Computing, Information and Control</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <fpage>921</fpage>
          -
          <lpage>930</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Charlie K</given-names>
            <surname>Dagli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shyamsundar</given-names>
            <surname>Rajaram</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Thomas S</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Utilizing information theoretic diversity for svm active learning</article-title>
          .
          <source>In 18th International Conference on Pattern Recognition (ICPR'06)</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>506</fpage>
          -
          <lpage>511</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Sanjoy</given-names>
            <surname>Dasgupta</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Hsu</surname>
          </string-name>
          .
          <article-title>Hierarchical sampling for active learning</article-title>
          .
          <source>In Proceedings of the 25th international conference on Machine learning</source>
          , pages
          <fpage>208</fpage>
          -
          <lpage>215</lpage>
          . ACM,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Dezhi</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hongning</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kamin</given-names>
            <surname>Whitehouse</surname>
          </string-name>
          .
          <article-title>Clustering-based active learning on sensor type classification in buildings</article-title>
          .
          <source>In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management</source>
          , pages
          <fpage>363</fpage>
          -
          <lpage>372</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ajay J</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fatih</given-names>
            <surname>Porikli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nikolaos</given-names>
            <surname>Papanikolopoulos</surname>
          </string-name>
          .
          <article-title>Multi-class active learning for image classification</article-title>
          .
          <source>In Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on</source>
          , pages
          <fpage>2372</fpage>
          -
          <lpage>2379</lpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Ajay J</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Fatih</given-names>
            <surname>Porikli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Nikolaos P</given-names>
            <surname>Papanikolopoulos</surname>
          </string-name>
          .
          <article-title>Scalable active learning for multiclass image classification</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          ,
          <volume>34</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2259</fpage>
          -
          <lpage>2273</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kottke</surname>
          </string-name>
          , Georg Krempl, Dominik Lang, Johannes Teschner, and Myra Spiliopoulou.
          <article-title>Multi-Class Probabilistic Active Learning</article-title>
          , volume
          <volume>285</volume>
          <source>of Frontiers in Artificial Intelligence and Applications</source>
          , pages
          <fpage>586</fpage>
          -
          <lpage>594</lpage>
          . IOS Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Kremer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kim Steenstrup</given-names>
            <surname>Pedersen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Igel</surname>
          </string-name>
          .
          <article-title>Active learning with support vector machines</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          ,
          <volume>4</volume>
          (
          <issue>4</issue>
          ):
          <fpage>313</fpage>
          -
          <lpage>326</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kottke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Georg</given-names>
            <surname>Krempl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Myra</given-names>
            <surname>Spiliopoulou</surname>
          </string-name>
          .
          <article-title>Investigating exploratory capabilities of uncertainty sampling using svms in active learning</article-title>
          .
          <source>In Active Learning: Applications, Foundations and Emerging Trends</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>David D</given-names>
            <surname>Lewis</surname>
          </string-name>
          and
          <string-name>
            <given-names>William A</given-names>
            <surname>Gale</surname>
          </string-name>
          .
          <article-title>A sequential algorithm for training text classifiers</article-title>
          .
          <source>In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          . Springer-Verlag New York, Inc.,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          .
          <article-title>UCI machine learning repository</article-title>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Hieu T</given-names>
            <surname>Nguyen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Arnold</given-names>
            <surname>Smeulders</surname>
          </string-name>
          .
          <article-title>Active learning using pre-clustering</article-title>
          .
          <source>In Proceedings of the twenty-first international conference on Machine learning, page 79. ACM</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Reitmaier</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Sick</surname>
          </string-name>
          .
          <article-title>Active classifier training with the 3DS strategy</article-title>
          .
          <source>In Computational Intelligence and Data Mining (CIDM)</source>
          ,
          <source>2011 IEEE Symposium on</source>
          , pages
          <fpage>88</fpage>
          -
          <lpage>95</lpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Tobias</given-names>
            <surname>Reitmaier</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Sick</surname>
          </string-name>
          .
          <article-title>The responsibility weighted mahalanobis kernel for semi-supervised training of support vector machines for classification</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>323</volume>
          :
          <fpage>179</fpage>
          -
          <lpage>198</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Burr</given-names>
            <surname>Settles</surname>
          </string-name>
          .
          <article-title>Active learning literature survey</article-title>
          . Computer Sciences Technical Report 1648, University of Wisconsin-Madison,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Katrin</given-names>
            <surname>Tomanek</surname>
          </string-name>
          and
          <string-name>
            <given-names>Katharina</given-names>
            <surname>Morik</surname>
          </string-name>
          .
          <article-title>Inspecting sample reusability for active learning</article-title>
          .
          In Isabelle Guyon, Gavin C. Cawley, Gideon Dror, Vincent Lemaire, and Alexander R. Statnikov, editors,
          <source>AISTATS workshop on Active Learning and Experimental Design</source>
          , volume
          <volume>16</volume>
          <source>of JMLR Proceedings</source>
          , pages
          <fpage>169</fpage>
          -
          <lpage>181</lpage>
          . JMLR.org,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Simon</given-names>
            <surname>Tong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daphne</given-names>
            <surname>Koller</surname>
          </string-name>
          .
          <article-title>Support vector machine active learning with applications to text classification</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>2</volume>
          (Nov):
          <fpage>45</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Joaquin</given-names>
            <surname>Vanschoren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jan N.</given-names>
            <surname>van Rijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bernd</given-names>
            <surname>Bischl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luis</given-names>
            <surname>Torgo</surname>
          </string-name>
          .
          <article-title>OpenML: networked science in machine learning</article-title>
          .
          <source>SIGKDD Explorations</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>