<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Learning active learning at the crossroads? Evaluation and discussion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>L. Desreumaux</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Lemaire</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Orange Labs</institution>
          ,
          <addr-line>Lannion</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SAP Labs</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>38</fpage>
      <lpage>54</lpage>
      <abstract>
        <p>Active learning aims to reduce annotation cost by predicting which samples are useful for a human expert to label. Although this field is quite old, several important challenges to using active learning in real-world settings still remain unsolved. In particular, most selection strategies are hand-designed, and it has become clear that there is no best active learning strategy that consistently outperforms all others in all applications. This has motivated research into meta-learning algorithms for “learning how to actively learn”. In this paper, we compare this kind of approach with the association of a Random Forest with the margin sampling strategy, reported in recent comparative studies as a very competitive heuristic. To this end, we present the results of a benchmark performed on 20 datasets that compares a strategy learned using a recent meta-learning algorithm with margin sampling. We also present some lessons learned and open future perspectives.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Modern supervised learning methods are known to require large amounts of
training examples to reach their full potential. Since these examples are mainly
obtained through human experts who manually label samples, the labelling
process may have a high cost. Active learning (AL) is the field that includes all the
selection strategies that make it possible to iteratively build the training set of a model in
interaction with a human expert, also called the oracle. The aim is to select the most
informative examples in order to minimize the labelling cost.</p>
<p>In this article, we consider the selective sampling framework, in which the
strategies manipulate a set of examples D = L ∪ U of constant size, where
L = {(x_i, y_i)}_{i=1}^{l} is the set of labelled examples and U = {x_i}_{i=l+1}^{n} is the set of
unlabelled examples. In this framework, active learning is an iterative process
that continues until a labelling budget is exhausted or a pre-defined
performance threshold is reached. Each iteration begins with the selection of the most
informative example x* ∈ U. This selection is generally based on information
collected during previous iterations (predictions of a classifier, density measures,
etc.). The example x* is then submitted to the oracle, which returns the
corresponding class y*, and the pair (x*, y*) is added to L. The new learning set is
then used to improve the model, and the new predictions are used in the next
iteration.</p>
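<p>As an illustration, the selective sampling loop described above can be sketched as follows (a minimal sketch of our own; the classifier, utility function and seeding policy are placeholders, not the implementation evaluated in this paper):

```python
import numpy as np

def active_learning_loop(X, y_oracle, clf, utility, budget):
    """Selective sampling: D = L ∪ U has constant size, and one example
    moves from U to L at each iteration."""
    n = len(X)
    labelled = list(np.random.choice(n, 2, replace=False))  # initial seed set
    unlabelled = [i for i in range(n) if i not in labelled]
    for _ in range(budget):
        clf.fit(X[labelled], y_oracle[labelled])       # re-train on L
        scores = utility(clf, X[unlabelled])           # informativeness of each x in U
        best = unlabelled[int(np.argmax(scores))]      # x* = most informative example
        labelled.append(best)                          # the oracle labels x*
        unlabelled.remove(best)
    clf.fit(X[labelled], y_oracle[labelled])
    return clf, labelled
```
</p>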
      <p>
        The utility measures defined by the active learning strategies in the
literature [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] differ in their positioning with respect to a dilemma between the
exploitation of the current classifier and the exploration of the training data. Selecting
an unlabelled example in an unknown region of the observation space R^d helps
to explore the data, so as to limit the risk of learning a hypothesis too specific
to the current set L. Conversely, selecting an example in an already sampled region of R^d
locally refines the predictive model.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Traditional heuristic-based AL</title>
<p>The active learning field stems from a parallel between active educational
methods and machine learning theory. The learner is now a statistical model
rather than a student. The interactions between the student and the teacher
correspond to the interactions between the model and the oracle. The examples are
situations used by the model to generate knowledge about the problem.</p>
      <p>
        The first AL algorithms were designed with the objective of transposing these
“educational” methods to the machine learning domain. The easiest way was
to keep the usual supervised learning methods and to add “strategies” relying
on various heuristics to guide the selection of the most informative examples.
From the first initiative and up to now, a lot of strategies motivated by human
intuitions have been suggested in the literature. The purpose of this paper is not
to give an overview of the existing strategies but the reader may find in [
        <xref ref-type="bibr" rid="ref1 ref36">36, 1</xref>
        ]
a lot of them.
      </p>
      <p>
        A careful reading of the experimental results published in the literature shows
that there is no best AL strategy that consistently outperforms all others in all
applications, and some strategies cater to specific classifiers or to specific
applications. Based on this observation, several comprehensive benchmarks carried
out on numerous datasets have highlighted the strategies which, on average, are
the most suitable for several classification models [
        <xref ref-type="bibr" rid="ref28 ref29 ref41">28, 41, 29</xref>
        ]. They are given
in Table 1. For example, the most appropriate strategy for logistic regression
and random forest is an uncertainty-based sampling4 strategy, named margin
sampling, which consists in selecting at each iteration the instance for which
the difference between the probabilities of the two most likely classes is the
smallest [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. To produce this table, we purposefully omitted studies that have
a restricted scope, such as focusing on too few datasets [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], specific tasks [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ],
an insufficient number of strategies [
        <xref ref-type="bibr" rid="ref31 ref35">35, 31</xref>
        ], or variants of a single strategy [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
4 The reader interested in the measures used to quantify the degree of uncertainty in
the context of active learning may find in [
        <xref ref-type="bibr" rid="ref18 ref25">25, 18</xref>
        ] an interesting view which advocates
a distinction between two different types of uncertainty, referred to as epistemic and
aleatoric.
      </p>
      <p>[Table 1: best heuristic strategies per model; rows: Margin [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], Entropy, QBD, Density, OER [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]; columns: RF, SVM, 5-NN, GNB, C4.5, LR, VFDT.]</p>
    </sec>
    <sec id="sec-3">
      <title>Meta-learning approaches to active learning</title>
      <p>
        While the traditional AL strategies can achieve remarkable performance, it is
often challenging to predict in advance which strategy is the most suitable in a
particular situation. In recent years, meta-learning algorithms have been gaining
in popularity [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Some of them have been proposed to tackle the problem of
learning AL strategies instead of relying on manually designed strategies.
      </p>
      <p>
        Motivated by the success of methods that combine predictors, the first AL
algorithms within this paradigm were designed to combine traditional AL
strategies with bandit algorithms [
        <xref ref-type="bibr" rid="ref10 ref12 ref17 ref26 ref3 ref8">3, 12, 17, 8, 10, 26</xref>
        ]. These algorithms learn how to
select the best AL criterion for any given dataset and adapt it over time as the
learner improves. However, all the learning must be achieved within a few
examples to be helpful, and these algorithms suffer from a cold start issue. Moreover,
these approaches are restricted to combining existing AL heuristic strategies.
      </p>
      <p>
        Within the meta-learning framework, some other algorithms have been
developed to learn from scratch an AL strategy on multiple source datasets and
transfer it to new target datasets [
        <xref ref-type="bibr" rid="ref19 ref20 ref27">19, 20, 27</xref>
        ]. Most of them are based on modern
reinforcement learning methods. The key challenge consists in learning an AL
strategy that is general enough to automatically control the exploitation/exploration
trade-off when used on new unlabelled datasets, which is not possible when using
heuristic strategies.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Objective of this paper</title>
      <p>From the state of the art, it appears that meta-learned AL strategies can
outperform the most widely used traditional AL strategies, like uncertainty sampling.
However, most of the papers that introduce new meta-learning algorithms do
not include comprehensive benchmarks that could ascertain the transferability
of the learned strategies and demonstrate that these strategies can safely be used
in real-world settings.</p>
      <p>The objective of this article is thus to compare two possible options in the
realization of an AL solution that could be used in an industrial context: using a
traditional heuristic-based strategy (see Section 1.1) that, on average, is the best
one for a given model and could be used as a strong baseline easy to implement
and not so easy to beat, or using a more sophisticated strategy learned in a
data-driven fashion that comes from the very recent literature on meta-learning
(see Section 1.2).</p>
      <p>
        To this end, we present the results of a benchmark performed on 20 datasets
that compares a strategy learned using the meta-learning algorithm proposed
in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] with margin sampling [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], the models used being in both cases logistic
regression and random forest. We evaluated the work of [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] since the authors claim
to be able to learn a “general-purpose” AL strategy that can generalise across
diverse problems and outperform the best heuristic and bandit approaches.
      </p>
      <p>
        The rest of the paper is organized as follows. In Section 2, we explain all
the aspects of the Learning Active Learning (LAL) method proposed in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ],
namely the Deep Q-Learning algorithm and the modeling of active learning as a
Markov decision process (MDP). In Section 3, we present the protocol used to do
extensive comparative experiments on public datasets from various application
areas. In Section 4, we give the results of our experimental study and make
some observations. Finally, we present some lessons learned and we open future
perspectives in Section 5.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Learning active learning strategies</title>
      <sec id="sec-5-1">
        <title>Q-Learning</title>
        <p>A Markov decision process is a formalism for modeling the interaction between
an agent and its environment. This formalism uses the concepts of state, which
describes the situation in which the environment finds itself, action, which
describes the decision made by the agent, and reward, received by the agent when
it performs an action. The procedure followed by the agent to select the action
to be performed at time t is the policy. Given a policy π, the state-action table
is the function Q^π(s, a), which gives the expectation of the discounted sum of the
rewards received from the state s if the agent first executes the action a and
then follows the policy π.</p>
<p>Q-Learning is a reinforcement learning algorithm that estimates the optimal
state-action table Q* = max_π Q^π from interactions between the agent and the
environment. The state-action table Q is updated at each step from the current
state s, the action a = π(s) where π is the policy derived from Q, the reward
received r and the next state of the environment s′:</p>
        <p>
Q_{t+1}(s, a) = (1 − α_t(s, a)) Q_t(s, a) + α_t(s, a) [ r + γ max_{a′∈A} Q_t(s′, a′) ],   (1)
where γ ∈ [0, 1[ is the discount factor of the rewards and the α_t(s, a) ∈ ]0, 1[ are
the learning steps that determine the weight of the new experience in relation
to the knowledge acquired at previous steps. Assuming that all the state-action
pairs are visited an infinite number of times and under some conditions on the
learning steps, the resulting sequence of state-action tables converges to Q* [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ].
        </p>
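        <p>A minimal tabular implementation of update (1) can be written as follows (our own illustration, using a dictionary as the state-action table):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning step, following Equation (1):
    Q(s,a) <- (1 - alpha) Q(s,a) + alpha [ r + gamma max_{a'} Q(s',a') ]."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1.0 - alpha) * Q[(s, a)] + alpha * target
    return Q
```
</p>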
<p>The goal of a reinforcement learning agent is to maximize the rewards
received over the long term. To do this, in addition to actions that seem to lead to
high rewards (exploitation), the agent must select potentially suboptimal actions
that allow it to acquire new knowledge about the environment (exploration).
For Q-Learning, the ε-greedy method is the most commonly used to manage this
dilemma. It consists in exploring randomly with a probability of ε and acting
according to a greedy strategy that chooses the best action with a probability
of (1 − ε). It is also possible to decrease the probability ε at each transition to
model the fact that exploration becomes less and less useful with time.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Deep Q-Learning</title>
<p>In the Q-Learning algorithm, if the state-action table is implemented as a
two-input table, then it is impossible to deal with high-dimensional problems. It is
necessary to use a parametric model, which will be noted Q(s, a; θ). If this model is a
deep neural network, the algorithm is called Deep Q-Learning.</p>
      <p>The training of a neural network requires the prior definition of an error
criterion to quantify the loss between the value returned by the network and the
actual value. In the context of Q-Learning, the latter value does not exist: one
can only use the reward obtained after the completion of an action to calculate
a new value, and then estimate the error achieved by calculating the difference
between the old value and the new one. A possible cost function would thus be
the following:</p>
<p>L(s, a, r, s′, θ) = [ r + γ max_{a′∈A} Q(s′, a′; θ) − Q(s, a; θ) ]².</p>
      <p>However, this poses an obvious problem: updating the parameters leads to
updating the target. In practice, this means that the training procedure does not
converge.</p>
      <p>
        In 2013, a successful implementation of Deep Q-Learning introducing several
new features was published [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The first novelty is the introduction of a target
network, which is a copy of the first network that is regularly updated. This has
the effect of stabilizing learning. The cost function becomes:
L(s, a, r, s′, θ) = [ r + γ max_{a′∈A} Q(s′, a′; θ−) − Q(s, a; θ) ]²,   (2)
      </p>
      <p>where θ− is the vector of the target network parameters. The second
novelty is experience replay. It consists in saving each experience of the agent
(si, ai, ri, si+1) in a memory of size m and using random samples drawn from
it to update the parameters by stochastic gradient descent. This random draw
avoids systematically selecting consecutive, potentially correlated experiences.</p>
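      <p>A uniform experience replay memory can be sketched in a few lines (an illustration of the mechanism, not the published implementation):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size memory of transitions (s, a, r, s_next); when full, the
    oldest experiences are dropped. Uniform sampling breaks the correlation
    between consecutive experiences."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)
    def add(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))
    def sample(self, batch_size=32):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```
</p>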
    </sec>
    <sec id="sec-7">
      <title>Improvements to Deep Q-Learning</title>
      <p>Many improvements to Deep Q-Learning have been published since the article
that introduced it. We present here the improvements that interest us for the
study of the LAL method.</p>
      <p>
        Double Deep Q-Learning. A first improvement is the correction of the
overestimation bias. It has indeed been empirically shown that Deep Q-Learning as
presented in Section 2.2 can produce a positive bias that increases the convergence
time and has a significant negative impact on the quality of the asymptotically
obtained policy. The importance of this bias and its consequences have been
verified in particular in the configurations least favourable to its emergence,
i.e. when the environment and rewards are deterministic. In addition, its value
increases with the size of the set of actions. To correct this bias, the solution
which has been proposed in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] consists in not using the parameters θ− to both
select and evaluate an action. The cost function then becomes:
L(s, a, r, s′, θ) = [ r + γ Q(s′, arg max_{a′∈A} Q(s′, a′; θ); θ−) − Q(s, a; θ) ]².
Prioritized Experience Replay. Another improvement is the introduction of the
notion of priority in experience replay. In its initial version, Deep Q-Learning
considers that all the experiences can identically advance learning. However,
reusing some experiences at the expense of others can reduce the learning time.
This requires the ability to measure the acceleration potential of learning
associated with an experience. The priority measure proposed in [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] is the absolute
value of the temporal difference error:
δ_i = | r_i + γ max_{a′∈A} Q(s_{i+1}, a′; θ−) − Q(s_i, a_i; θ) |.
      </p>
      <p>A maximum priority is assigned to each new experience, so that all the
experiences are used at least once to update the parameters.</p>
      <p>
        However, the experiences that produce a small temporal difference error at
first use may never be reused. To address this issue, a method was introduced
in [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] to manage the trade-off between uniform sampling and sampling focusing
on experiences producing a large error. It consists in defining the probability of
selecting an experience i as follows:
p_i = ρ_i^β / Σ_{k=1}^{m} ρ_k^β,   with ρ_i = δ_i + e,   (6)
      </p>
      <p>where β ∈ R+ is a parameter that determines the shape of the distribution and
e is a small positive constant that guarantees p_i &gt; 0. The case where β = 0 is
equivalent to uniform sampling.</p>
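      <p>Equation (6) can be implemented directly (a sketch; td_errors holds the absolute temporal difference errors δ_i):

```python
import numpy as np

def priority_probabilities(td_errors, beta=3.0, e=1e-6):
    """Selection probabilities of Equation (6): p_i = rho_i^beta / sum_k rho_k^beta,
    with rho_i = delta_i + e. beta = 0 recovers uniform sampling."""
    rho = np.asarray(td_errors, dtype=float) + e
    weights = rho ** beta
    return weights / weights.sum()
```
</p>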
    </sec>
    <sec id="sec-8">
      <title>Formulating active learning as a Markov decision process</title>
<p>The formulation of active learning as an MDP is quite natural. In each MDP
state, the agent performs an action, namely the selection of an instance to be
labelled, and receives a reward that depends on the quality of the
model learned with the new instance. The active learning strategy becomes the
MDP policy that associates an action with a state.</p>
<p>In this framework, iteration t of the policy learning process from a dataset
divided into a learning set D = Lt ∪ Ut and a test set5 D′ consists of the following
steps:
1. A model h(t) is learned from Lt. Together with Lt and Ut, it characterizes
a state st.
2. The agent performs the action at = π(st) ∈ At, which defines the instance
x(t) ∈ Ut to label.
3. The label y(t) associated with x(t) is retrieved and the training set is updated,
i.e. Lt+1 = Lt ∪ {(x(t), y(t))} and Ut+1 = Ut \ {x(t)}.
4. The agent receives the reward rt associated with the performance ℓt on the
test set D′. This reward is used to update the policy (see Section 2.5).
The set of actions At depends on time because it is not possible to select the
same instance several times. These steps are repeated until a terminal state sT is
reached. Here, we consider that we are in a terminal state when all the instances
have been labelled or when ℓt ≥ q, where q is a performance threshold that has
been chosen as 98% of the performance obtained when the model is learned on
all the training data.</p>
<p>The precise definition of the set of states, the set of actions and the reward
function is not obvious. To define a state, it has been proposed to use a vector
whose components are the scores ŷt(x) = P(Y = 0 | x) associated with the
unlabelled instances of a subset V set aside. This is the simplest representation
that can be used to characterize the uncertainty of a classifier on a dataset at a
given time t.</p>
<p>The set of actions has been defined at iteration t as the set of vectors
a_i = [ŷt(x_i), g(x_i, Lt), g(x_i, Ut)], where x_i ∈ Ut and:
g(x_i, Lt) = (1/|Lt|) Σ_{x_j∈Lt} dist(x_i, x_j),   g(x_i, Ut) = (1/|Ut|) Σ_{x_j∈Ut} dist(x_i, x_j),   (7)
where dist is the cosine distance. An action is therefore characterized by the
uncertainty on the associated instance, as well as by two statistics related to the
density of the neighbourhood of the instance.</p>
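      <p>A possible implementation of the action representation of Equation (7) is the following (our own sketch; x_i is a numpy vector, L and U are lists of vectors, and y_hat_i is the score ŷt(x_i)):

```python
import numpy as np

def cosine_dist(u, v):
    """Cosine distance: 1 - cos(u, v)."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def action_vector(x_i, y_hat_i, L, U):
    """Action representation of Equation (7): the classifier score and the
    mean cosine distances to the labelled and unlabelled sets."""
    g_L = np.mean([cosine_dist(x_i, x_j) for x_j in L])
    g_U = np.mean([cosine_dist(x_i, x_j) for x_j in U])
    return np.array([y_hat_i, g_L, g_U])
```
</p>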
<p>The reward function has been chosen constant and negative until arrival in a
terminal state (rt = −1). Thus, to maximize its reward, the agent must perform
as few interactions as possible.
5 Given that active learning is usually applied in cases where labelled data is scarce,
this test set is assumed to be small or very small, and the performance evaluated
on it may therefore be a poor approximation. This issue and techniques for avoiding
it are not examined in this paper.</p>
    </sec>
    <sec id="sec-9">
      <title>Learning the optimal policy through Deep Q-Learning</title>
      <p>The Deep Q-Learning algorithm with the improvements presented in Section 2.3
is used to learn the optimal policy. To be able to process a state space that evolves
with each iteration, the neural network architecture has been modified. The new
architecture considers actions as inputs to the Q function in the same way as
states. It then returns only one value, while the classical architecture takes only
one state as input and returns the values associated with all the actions.</p>
<p>The learning procedure involves a collection of Z labelled datasets {Z_i}_{1≤i≤Z}.
It consists in repeating the following steps (see Figure 1):
1. A dataset Z ∈ {Zi} is randomly selected and divided into a training set D
and a test set D0.
2. The policy π derived from the Deep Q-Network is used to simulate several
active learning episodes on Z according to the procedure described in
Section 2.4. Experiences (st, at, rt, st+1) are collected in a finite size memory.
3. The Deep Q-Network parameters are updated several times from a
minibatch of experiences extracted from the memory (according to the method
described in Section 2.3).</p>
      <p>To initialize the Deep Q-Network, some warm start episodes are simulated
using a random sampling policy, followed by several parameter updates. Once
the strategy is learned, its deployment is very simple. At each iteration of the
sampling process, the classifier is re-trained, then the vector characterizing the
process state and all the vectors associated with the actions are calculated. The
vector a* corresponding to the example to label x* is then the one that satisfies
a* = arg max_{a∈A} Q(s, a; θ), the parameters θ being set at the end of the policy
learning procedure.</p>
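      <p>The deployment step thus reduces to an arg max over the candidate action vectors (a sketch with a hypothetical q_value function standing in for the trained network):

```python
import numpy as np

def select_example(q_value, state, candidate_actions):
    """Deployment of the learned strategy: return the index of the candidate
    whose action vector maximizes Q(s, a; theta)."""
    scores = [q_value(state, a) for a in candidate_actions]
    return int(np.argmax(scores))
```
</p>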
      <p>[Figure 1: the policy learning loop: initial examples feed the simulation of 10 active
learning episodes, whose experiences fill an experience replay memory of size 10 000.]</p>
      <sec id="sec-9-1">
        <title>Experimental protocol</title>
<p>In this section, we introduce the protocol of the comparative experimental study
we conducted.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Policy learning</title>
      <p>
        To learn the strategy, we used the same code6, the same hyperparameters and
the same datasets as those used in [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The complete list of hyperparameters
is given in Table 2 with the variable names from the code that represent them.
The datasets from which the strategy is learned are given in Table 3.
      </p>
      <p>The specification of the neural network architecture is very simple (all the
layers are fully connected): (i) the first layer (linear + sigmoid) receives the
vector s (i.e. |V| = 30 input neurons) and has 10 output neurons; (ii) the second
layer (linear + sigmoid) concatenates the 10 output neurons of the first layer
with the vector a (i.e. 13 neurons in total) and has 5 output neurons; (iii) finally,
the last layer (linear) has only one output to estimate Q(s, a).</p>
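      <p>This architecture can be sketched as a numpy forward pass (random placeholder weights for illustration; the trained parameters come from the LAL-RL code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Layer 1 (linear + sigmoid): state s (|V| = 30 scores) -> 10 hidden units
W1, b1 = rng.normal(size=(10, 30)), np.zeros(10)
# Layer 2 (linear + sigmoid): [h1 ; a] (10 + 3 = 13 inputs) -> 5 hidden units
W2, b2 = rng.normal(size=(5, 13)), np.zeros(5)
# Layer 3 (linear): 5 -> 1 output, the estimate of Q(s, a)
W3, b3 = rng.normal(size=(1, 5)), np.zeros(1)

def q_value(s, a):
    h1 = sigmoid(W1 @ s + b1)
    h2 = sigmoid(W2 @ np.concatenate([h1, a]) + b2)
    return float(W3 @ h2 + b3)
```
</p>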
      <p>[Table 2: hyperparameters, with the variable names from the code:
N_STATE_ESTIMATION = 30 (size of V); REPLAY_BUFFER_SIZE = 10000 (experience replay
memory size); PRIORITIZED_REPLAY_EXPONENT = 3 (exponent β involved in Equation (6));
BATCH_SIZE = 32 (minibatch size for stochastic gradient descent); LEARNING_RATE = 0.0001
(learning rate); TARGET_COPY_FACTOR = 0.01 (value that sets the target network update1);
EPSILON_START = 1 (exploration probability at start); EPSILON_END = 0.1 (minimum
exploration probability); EPSILON_STEPS = 1000 (number of updates of ε during the
training); WARM_START_EPISODES = 100 (number of warm start episodes);
NN_UPDATES_PER_WARM_START = 100 (number of parameter updates after the warm start);
TRAINING_ITERATIONS = 1000 (number of training iterations);
TRAINING_EPISODES_PER_ITERATION = 10 (number of episodes per training iteration);
NN_UPDATES_PER_ITERATION = 60 (number of updates per training iteration).]</p>
      <p>1 In this implementation, the target network parameters θ− are updated each time the
parameters θ are changed as follows: θ− ← (1 − TARGET_COPY_FACTOR)·θ− + TARGET_COPY_FACTOR·θ.</p>
      <p>
Our objective is to compare the performance of a strategy learned using LAL
with the performance of a heuristic strategy that, on average, is the best one for</p>
      <sec id="sec-10-1">
        <title>6 https://github.com/ksenia-konyushkova/LAL-RL</title>
        <p>[Table 3: the datasets used for policy learning, described by |D|, |Y|, #num, #cat,
maj (%), min (%).]</p>
        <p>
a given model. Several benchmarks conducted on numerous datasets have
highlighted the fact that margin sampling is the best heuristic strategy for logistic
regression (LR) and random forest (RF) [
          <xref ref-type="bibr" rid="ref29 ref41">41, 29</xref>
          ].
        </p>
<p>Margin sampling consists in choosing the instance for which the difference (or
margin) between the probabilities of the two most likely classes is the smallest:
x* = arg min_{x∈U} [ P(y1 | x) − P(y2 | x) ],   (8)
where y1 and y2 are respectively the first and second most probable classes for
x. The main advantage of this strategy is that it is easy to implement: at each
iteration, a single training of the model and |U| predictions are sufficient to
select an example to label. A major disadvantage, however, is its total lack of
exploration, as it only exploits locally the hypothesis learned by the model.</p>
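        <p>With a matrix of predicted class probabilities, Equation (8) is a few lines of code (a sketch of our own):

```python
import numpy as np

def margin_sampling(proba):
    """Equation (8): return the index of the instance minimizing the gap
    between the two most probable classes. `proba` is the |U| x |Y| matrix
    of class probabilities predicted by the current model."""
    part = np.sort(proba, axis=1)          # probabilities sorted ascending per row
    margins = part[:, -1] - part[:, -2]    # P(y1 | x) - P(y2 | x)
    return int(np.argmin(margins))
```
</p>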
        <p>
          We chose to evaluate the Margin/LR association because it is with logistic
regression that the hyperparameters of Table 2 were optimized in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. In
addition, in order to determine whether it is necessary to modify them when another
model is used, we also evaluated the Margin/RF association. This last
association is particularly interesting because it is the best association highlighted in a
recent and large benchmark carried out on 73 datasets, including 5 classification
models and 8 active learning strategies [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. In addition, we evaluated random
sampling (Rnd) for both models.
        </p>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Datasets</title>
      <p>The datasets were selected so as to have a high diversity according to the
following criteria: (i) number of examples; (ii) number of numerical variables; (iii)
number of categorical variables; (iv) class imbalance.</p>
      <p>
        We have also taken care to exclude datasets that are too small and not
representative of those used in an industrial context. The 20 selected datasets
are described in Table 4. They all come from the UCI database [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], apart
from the dataset “orange-fraud”, which is a dataset on fraud detection. Four of
the datasets have been used in a challenge on active learning that took place
in 2010 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and the dataset “nomao” comes from another challenge on active
learning [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
In our evaluation protocol, the active sampling process begins with the random
selection of one instance in each class and ends when 250 instances are labelled.
This value ensures that our results are comparable to other studies in the
literature. For performance comparison, we used the area under the learning curve
(ALC) based on the classification accuracy. We do not claim that the ALC is
a “perfect metric”7 but it is the de facto standard evaluation criterion in active
learning, and it has been chosen as part of a challenge [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
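      <p>The ALC can be computed as a normalized area under the accuracy curve (a sketch of our own; the exact normalization used in the challenge of [14] may differ):

```python
import numpy as np

def alc(accuracies, n_labels):
    """Area under the learning curve (accuracy vs. number of labelled
    examples), via the trapezoidal rule, scaled so that a curve constantly
    at accuracy 1 scores 100."""
    acc = np.asarray(accuracies, dtype=float)
    n = np.asarray(n_labels, dtype=float)
    area = np.sum((acc[1:] + acc[:-1]) / 2.0 * np.diff(n))
    return 100.0 * area / (n[-1] - n[0])
```
</p>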
      <p>
        Our evaluation was carried out by cross-validation with 5 partitions, in which
class imbalance within the complete dataset was preserved. For each partition,
the sampling process was repeated 5 times with different initializations to get a
mean and a variance on the result. However, we have made sure that the initial
instances are identical for all the strategy/model associations on each partition
so as not to introduce bias into the results. In addition, for Rnd, the random
sequence of numbers was identical for all the models.
7 There is literature on more expressive summary statistics of the active learning curve
[
        <xref ref-type="bibr" rid="ref30 ref39">39, 30</xref>
        ]. This could be a limitation of the current article; other metrics could be tested
in future versions of the experiments.
      </p>
      <sec id="sec-11-1">
        <title>Results</title>
        <p>The results of our experimental study are given in Table 5. The mean ALC
obtained for each dataset/classifier/strategy association are reported (the optimal
score is 100). The left part of the table gives the results for logistic regression
and the right part gives the results for random forest. The penultimate line
corresponds to the averages calculated on all the datasets and the last line gives
the number of times the strategy has won, tied or lost. The non-significant
differences were established on the basis of a paired t-test at the 99% significance
level (H0: same mean between populations, the mean being estimated over 5
repetitions × cross-validation with 5 partitions for each method).</p>
        <p>[Table 5: mean ALC per dataset for Rnd/LR, Margin/LR, LAL/LR, Rnd/RF, Margin/RF,
LAL/RF and the majority-class score (maj), over the datasets adult, banana,
bank-marketing-full, climate-simulation, eeg-eye-state, hiva, ibn-sina, magic, musk,
nomao, orange-fraud, ozone-onehr, qsar-biodegradation, seismic-bumps, skin-segmentation,
statlog-german-credit, thoracic-surgery, thyroid-hypothyroid, wilt and zebra, with a
Mean row and a win/tie/loss row.]</p>
        <p>
          Several observations can be made. First of all, it should be noted that the
choice of model is decisive: the results of random forest are all better than those
of logistic regression. The random forest model indeed learns very well from little
data, as highlighted in [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. We can notice that even with random sampling, RF is
almost always better than LR, regardless of the strategy used. In addition, using
margin sampling with this model allows a significant performance improvement.
This model is very competitive in itself because by its nature, it includes terms
of exploration and exploitation (see Section 5 Conclusion about this point).
        </p>
        <p>In addition, the results of the learned strategy clearly show that a good
active learning strategy has been learned, since it performs better than random
sampling on a large number of datasets. However, the learned strategy is no
better than margin sampling. These results are nevertheless very interesting,
since only 8 datasets were used in the learning procedure.</p>
        <p>
          Finally, the results show a well-known fact about active learning: on very
unbalanced datasets, it is difficult to achieve a better performance than random
sampling, as shown in the last column of Table 5 in which the results obtained
by always predicting the majority class are given. The “cold start” problem that
occurs in active learning, i.e. the inability to make reliable predictions in early
iterations (when the training data is insufficient), is further aggravated
when a dataset has highly imbalanced classes, since the selected samples are
likely to belong to the majority class [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ]. However, if the imbalance is known, it
may be interesting to associate strategies with a model or criterion appropriate
to this case, as illustrated in [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
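        <p>This aggravation can be quantified with a back-of-the-envelope computation (the 2% minority fraction below is a hypothetical figure, assuming i.i.d. draws): even a 16-instance random seed set is entirely majority-class about 72% of the time.</p>

```python
minority_frac = 0.02  # hypothetical 2% positive class
for batch_size in (8, 16, 32):
    # probability that every instance of a random batch is majority-class
    p_all_majority = (1 - minority_frac) ** batch_size
    print(batch_size, round(p_all_majority, 3))
```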
        <p>
          To investigate the “learning speed”, we show results for different sizes of L
in Table 6. They lead to similar conclusions and our results for |L| = 32 confirm
the results of [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. The reader may find all our experimental results on GitHub8.
[Table 6: mean ALC of Rnd, Margin and LAL for |L| = 32, 64, 128 and 250
on each dataset; the numeric cells were lost in extraction.]
        </p>
        <sec id="sec-11-1-1">
          <title>8 https://github.com/ldesreumaux/lal_evaluation</title>
        </sec>
      </sec>
      <sec id="sec-11-2">
        <title>Discussion and open questions</title>
        <p>
          In this article, we evaluated a method representative of a recent orientation of
active learning research towards meta-learning methods for “learning how to
actively learn”, which represents the state of the art [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], against a traditional
heuristic-based active learning approach (the association of Random Forest and Margin),
which is one of the best methods reported in recent comparative studies [
          <xref ref-type="bibr" rid="ref29 ref41">41, 29</xref>
          ].
The comparison is limited to just one representative of each of the two classes
(meta-learning and traditional heuristic-based), but since each represents the
state of the art, several lessons can be drawn from our study.
        </p>
        <p>
          Relevance of LAL. First of all, the experiments carried out confirm the relevance
of the LAL method, since it has enabled us to learn a strategy that achieves the
performance of a very good heuristic, namely margin sampling; however, contrary
to the results in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], the strategy is not always better than random sampling.
This method still raises many problems, including that of the transferability
of the learned strategies. An active learning solution that can be used in an
industrial context must perform well on real data of an unknown nature and
must not involve parameters to be adjusted. With regard to the LAL method, a
first major problem is therefore the constitution of a “dataset of datasets” large
and varied enough to learn a strategy that is effective in very different contexts.
        </p>
        <p>Moreover, the learning procedure is sensitive to the performance criterion used,
which in our view is a problem. Ideally, the learned strategy should
be usable on new datasets with arbitrary performance criteria (AUC, F-score,
etc.). From our point of view, the work of optimizing the many hyperparameters
of the method (see Table 2) cannot be carried out by a user with no expertise
in deep reinforcement learning.</p>
        <p>
          About the Margin/RF association. In addition to the evaluation of the LAL
method, we confirmed a result of [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ], namely that margin sampling, associated
with a random forest, is a very competitive strategy. From an industrial point
of view, regarding computational complexity, the performance obtained
and the absence of domain knowledge required for its use, the Margin/RF
association remains a very strong baseline that is difficult to beat. However, it shares a
major drawback with many active learning strategies, namely its lack of reliability:
no strategy is better than or equivalent to random sampling on
all datasets and with all models. The literature on active learning is incomplete
with regard to this problem, which is nevertheless a major obstacle to using
active learning in real-world settings.
        </p>
        <p>
          Another important problem in real-world applications, little studied in the
literature, is the estimation of the generalization error without a test set. It
would be interesting to check whether the Out-Of-Bag samples of random forests
[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] can be used in an active learning context to estimate this error.
        </p>
        <p>Concerning the exploitation/exploration dilemma, margin sampling clearly
performs only exploitation. The good results of the Margin/RF association may
suggest that the RF algorithm intrinsically contains a part of exploration due to
the bagging paradigm. It would be interesting to add experiments in the future
to test this point.</p>
        <p>
          Still with regard to random forests, an open question is whether a better
strategy than margin sampling could be designed. Since random forests are
ensemble classifiers, a possible research direction for designing such a strategy is to check
whether they could be used in the credal uncertainty framework [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which seeks to
differentiate between the reducible and irreducible parts of the uncertainty in a
prediction.
        </p>
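        <p>A common ensemble-based approximation of this distinction (a sketch in the spirit of the epistemic/aleatoric decomposition, not the credal framework of [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] itself) takes the entropy of the averaged tree prediction as total uncertainty and the average per-tree entropy as its irreducible part; the difference is the reducible, disagreement-driven part:</p>

```python
import numpy as np

def decompose_uncertainty(tree_probas):
    """tree_probas: (n_trees, n_classes) per-tree class probabilities
    for one instance. Returns (total, aleatoric, epistemic) in nats."""
    eps = 1e-12
    mean_p = tree_probas.mean(axis=0)
    total = -np.sum(mean_p * np.log(mean_p + eps))   # entropy of the mean
    aleatoric = -np.mean(np.sum(tree_probas * np.log(tree_probas + eps),
                                axis=1))             # mean per-tree entropy
    return total, aleatoric, total - aleatoric       # epistemic = difference

# Trees that agree on an ambiguous instance: purely irreducible uncertainty.
agree = np.array([[0.5, 0.5], [0.5, 0.5]])
# Trees that disagree confidently: mostly reducible (epistemic) uncertainty.
disagree = np.array([[0.9, 0.1], [0.1, 0.9]])
print(decompose_uncertainty(agree)[2], decompose_uncertainty(disagree)[2])
```

An active learner could then rank pool instances by the epistemic term rather than the raw margin, targeting uncertainty that more labels can actually reduce.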
        <p>
          About error generalization. In real-world applications, AL must most of
the time be used in the absence of a test dataset. An open question could be to use another
known result about RF: the possibility of estimating the generalization
error using the Out-Of-Bag (OOB) samples [
          <xref ref-type="bibr" rid="ref16 ref5">16, 5</xref>
          ]. We did not present
experiments on this topic in this paper, but an idea could be to analyze the convergence,
as a function of the number of labelled examples, of the OOB performance towards the
test performance, to check at which “moment” (|L|) one could trust9 the OOB
performance (OOB performance ≈ test performance). The use of a “random
uniform forest” [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], for which the OOB performance seems to be more reliable,
could also be investigated.
        </p>
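        <p>A minimal sketch of such a check, assuming scikit-learn is available (its RandomForestClassifier exposes Breiman's OOB estimate via oob_score=True; the dataset is synthetic and the |L| values follow Table 6):</p>

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_lab, X_test, y_lab, y_test = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

for n in (32, 64, 128, 250):  # growing labelled set |L|
    rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0)
    rf.fit(X_lab[:n], y_lab[:n])
    gap = abs(rf.oob_score_ - rf.score(X_test, y_test))
    # one could trust the OOB estimate once the gap becomes small and stable
    print(n, round(rf.oob_score_, 3), round(gap, 3))
```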
        <p>
          About the benchmarking methodology. Recent benchmarks have highlighted the
need for extensive experimentation to compare active learning strategies. The
research community might benefit from a “reference” benchmark, as in the field
of time series classification [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], so that new results can be rigorously compared
to the state of the art on the same large set of datasets. In this way, one would
have comprehensive benchmarks that could ascertain the transferability of the
learned strategies and demonstrate that these strategies can safely be used in
real-world settings.
        </p>
        <p>
          If this reference benchmark is created, the second step would be to decide
how to compare the AL strategies. This comparison could be made using not
a single criterion but a “pool” of criteria. This pool may be chosen to reflect
different “aspects” of the results [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
9 When |L| is very low, the RF overfits, so its training performance is not
a good indicator of the generalization error.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kong</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          :
          <article-title>Active Learning: A Survey</article-title>
          . In: Aggarwal, C.C. (ed.)
          <article-title>Data Classification: Algorithms and Applications</article-title>
          , chap. 22, pp.
          <fpage>571</fpage>
          -
          <lpage>605</lpage>
          . CRC Press (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Antonucci</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corani</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernaschina</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Active Learning by the Naive Credal Classifier</article-title>
          .
          <source>In: Proceedings of the Sixth European Workshop on Probabilistic Graphical Models (PGM)</source>
          . pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baram</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>El-Yaniv</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Online Choice of Active Learning Algorithms</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>5</volume>
          ,
          <fpage>255</fpage>
          -
          <lpage>291</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krempl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemaire</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>How to Select Information That Matters: A Comparative Study on Active Learning Strategies for Classification</article-title>
          .
          <source>In: Proceedings of the 15th International Conference on Knowledge Technologies and Datadriven Business. ACM</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Breiman</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Out-of-bag estimation (</article-title>
          <year>1996</year>
          ), https://www.stat.berkeley.edu/~breiman/OOBestimation.pdf,
          <source>last visited 08/03/2020</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Candillier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemaire</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Design and analysis of the nomao challenge active learning in the real-world</article-title>
          .
          <source>In: Proceedings of the ALRA: Active Learning in Realworld Applications</source>
          , Workshop ECML-PKDD. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keogh</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Begum</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagnall</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mueen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Batista</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The UCR Time Series Classification Archive (</article-title>
          <year>2015</year>
          ), www.cs.ucr.edu/~eamonn/time_series_data/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          :
          <source>Can Active Learning Experience Be Transferred? 2016 IEEE 16th International Conference on Data Mining</source>
          pp.
          <fpage>841</fpage>
          -
          <lpage>846</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ciss</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Generalization Error and Out-of-bag Bounds in Random (Uniform) Forests, working paper or preprint</article-title>
          , https://hal.archives-ouvertes.fr/hal-01110524/document, last visited 06/03/2020
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Collet</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Optimistic Methods in Active Learning for Classification</article-title>
          .
          <source>Ph.D. thesis</source>
          , Université de Lorraine (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Dua</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graff</surname>
            ,
            <given-names>C.:</given-names>
          </string-name>
          <article-title>UCI Machine Learning Repository (</article-title>
          <year>2017</year>
          ), http://archive.ics.uci.edu/ml
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ebert</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fritz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schiele</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Ralf: A reinforced active learning formulation for object class recognition</article-title>
          .
          <source>In: 2012 IEEE Conference on Computer Vision and Pattern Recognition</source>
          . pp.
          <fpage>3626</fpage>
          -
          <lpage>3633</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ertekin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giles</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Learning on the Border: Active Learning in Imbalanced Data Classification</article-title>
          .
          <source>In: Conference on Information and Knowledge Management</source>
          . pp.
          <fpage>127</fpage>
          -
          <lpage>136</lpage>
          . CIKM (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cawley</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dror</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemaire</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Results of the Active Learning Challenge</article-title>
          .
          <source>In: Proceedings of Machine Learning Research</source>
          . vol.
          <volume>16</volume>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>45</lpage>
          . PMLR (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hasselt</surname>
            ,
            <given-names>H.v.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Deep Reinforcement Learning with Double QLearning</article-title>
          .
          <source>In: AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>2094</fpage>
          -
          <lpage>2100</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Hastie</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tibshirani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedman</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The elements of statistical learning: data mining, inference and prediction</article-title>
          . Springer,
          <volume>2</volume>
          <fpage>edn</fpage>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Hsu</surname>
            ,
            <given-names>W.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>H.T.</given-names>
          </string-name>
          :
          <article-title>Active Learning by Learning</article-title>
          .
          <source>In: Proceedings of the TwentyNinth AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>2659</fpage>
          -
          <lpage>2665</lpage>
          . AAAI Press (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. Hüllermeier, E.,
          <string-name>
            <surname>Waegeman</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods</article-title>
          . arXiv:1910.09457 [cs.LG] (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Konyushkova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sznitman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fua</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning Active Learning from Data</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>30</volume>
          , pp.
          <fpage>4225</fpage>
          -
          <lpage>4235</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Konyushkova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sznitman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fua</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Discovering General-Purpose Active Learning Strategies</article-title>
          . arXiv:1810.04114 [cs.LG] (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Körner</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wrobel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Multi-class Ensemble-Based Active Learning</article-title>
          .
          <source>In: Proceedings of the 17th European Conference on Machine Learning</source>
          . pp.
          <fpage>687</fpage>
          -
          <lpage>694</lpage>
          . SpringerVerlag (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Kottke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huseljic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krempl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sick</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Challenges of Reliable, Realistic and Comparable Active Learning Evaluation</article-title>
          .
          <source>In: Proceedings of the Workshop and Tutorial on Interactive Adaptive Learning</source>
          . pp.
          <fpage>2</fpage>
          -
          <lpage>14</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Lemke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Budka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabrys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Metalearning: a survey of trends and technologies</article-title>
          .
          <source>Artificial Intelligence Review</source>
          <volume>44</volume>
          ,
          <fpage>117</fpage>
          -
          <lpage>130</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Mnih</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonoglou</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wierstra</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedmiller</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Playing Atari with Deep Reinforcement Learning</article-title>
          .
          <source>arXiv:1312.5602 [cs.LG]</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>V.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Destercke</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Hüllermeier, E.:
          <article-title>Epistemic Uncertainty Sampling</article-title>
          . In: Discovery Science (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hospedales</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          :
          <article-title>Dynamic Ensemble Active Learning: A Non-Stationary Bandit with Expert Advice</article-title>
          .
          <source>In: Proceedings of the 24th International Conference on Pattern Recognition</source>
          . pp.
          <fpage>2269</fpage>
          -
          <lpage>2276</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hospedales</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          :
          <article-title>Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning</article-title>
          . arXiv:1806.04798 [cs.LG] (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Pereira-Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Carvalho</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Comparison of Active Learning Strategies and Proposal of a Multiclass Hypothesis Space Search</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Hybrid Artificial Intelligence Systems</source>
          - Volume
          <volume>8480</volume>
          . pp.
          <fpage>618</fpage>
          -
          <lpage>629</lpage>
          . Springer-Verlag (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Pereira-Santos</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prudêncio</surname>
            ,
            <given-names>R.B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Carvalho</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>Empirical investigation of active learning strategies</article-title>
          .
          <source>Neurocomputing</source>
          <volume>326-327</volume>
          ,
          <fpage>15</fpage>
          -
          <lpage>27</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Pupo</surname>
            ,
            <given-names>O.G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altalhi</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ventura</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Statistical comparisons of active learning strategies over multiple datasets</article-title>
          .
          <source>Knowl. Based Syst</source>
          .
          <volume>145</volume>
          ,
          <fpage>274</fpage>
          -
          <lpage>288</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Ramirez-Loaiza</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilgic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Active learning: an empirical study of common baselines</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>31</volume>
          (
          <issue>2</issue>
          ),
          <fpage>287</fpage>
          -
          <lpage>313</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Salperwyck</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemaire</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Learning with few examples: an empirical study on leading classifiers</article-title>
          .
          <source>In: Proceedings of the 2011 International Joint Conference on Neural Networks</source>
          . pp.
          <fpage>1010</fpage>
          -
          <lpage>1019</lpage>
          . IEEE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Schaul</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antonoglou</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silver</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Prioritized Experience Replay</article-title>
          .
          <source>arXiv:1511.05952 [cs.LG]</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Scheffer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decomain</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wrobel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Active Hidden Markov Models for Information Extraction</article-title>
          . In:
          <string-name>
            <surname>Hoffmann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hand</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fisher</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guimaraes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (eds.)
          <source>Advances in Intelligent Data Analysis</source>
          . pp.
          <fpage>309</fpage>
          -
          <lpage>318</lpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Schein</surname>
            ,
            <given-names>A.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ungar</surname>
            ,
            <given-names>L.H.</given-names>
          </string-name>
          :
          <article-title>Active learning for logistic regression: an evaluation</article-title>
          .
          <source>Machine Learning</source>
          <volume>68</volume>
          ,
          <fpage>235</fpage>
          -
          <lpage>265</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Active Learning</article-title>
          . Morgan &amp; Claypool Publishers (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Craven</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>An Analysis of Active Learning Strategies for Sequence Labeling Tasks</article-title>
          .
          <source>In: Proceedings of the Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>1070</fpage>
          -
          <lpage>1079</lpage>
          . Association for Computational Linguistics (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Shao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Learning to Sample: An Active Learning Framework</article-title>
          .
          <source>IEEE International Conference on Data Mining (ICDM)</source>
          pp.
          <fpage>538</fpage>
          -
          <lpage>547</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Trittenbach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Englhardt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Böhm</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An overview and a benchmark of active learning for one-class classification</article-title>
          .
          <source>CoRR abs/1808.04759</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Watkins</surname>
            ,
            <given-names>C.J.C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dayan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Q-learning</article-title>
          .
          <source>Machine Learning</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>279</fpage>
          -
          <lpage>292</lpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loog</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A benchmark and comparison of active learning for logistic regression</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>83</volume>
          ,
          <fpage>401</fpage>
          -
          <lpage>415</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>