<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Active Learning Strategy for Text Categorization Based on Support Vectors Relative Positioning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladimir Vakurin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Kopylov</string-name>
          <email>and.kopylov@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleg Seredin</string-name>
          <email>oseredin@yandex.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantin Mertsalov</string-name>
          <email>kmertsalov@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Rensselaer Polytechnic Institute</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Tula State University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A method of decreasing the number of requests to a human labeler, required for the annotation of a large text corpus, is proposed. We use an active learning strategy based on the subdivision of the labeling process into iterative steps, starting from some initial training set and using SVM classification results to select a set of objects to be labeled by an expert and added to the training set at the next step. Such a procedure can significantly reduce the time and the number of objects needed for classifier training without loss of recognition accuracy.</p>
      </abstract>
      <kwd-group>
        <kwd>Active Learning</kwd>
        <kwd>SVM margin</kwd>
        <kwd>Text Categorization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>We address the issue of efficient use of the time of human reviewers, who are often
employed to review and categorize electronic texts in order to create training sets for
supervised and semi-supervised learning methods. Applications of such methods
aim to use the training data to develop automated classifiers capable of
annotating arbitrarily large data sets. However, the construction of such a classifier
is constrained by the time and cost required for human reviewers to review and
categorize a sufficiently large training set. The research problem addressed here
stems from the trade-off between the need to develop a training set large enough
for an accurate classifier and the need to control the costs of creating
such a training set, which limits the number of documents that can be reviewed by
humans.</p>
      <p>We consider a method of choosing text objects to be reviewed by the human
reviewer from the pool of available data in a way that accelerates the learning
process. The proposed method belongs to the group of online recognition
methods and is based on the analysis of the interclass border areas of the general
population. Experimental results show the effectiveness of the method in
comparison with other methods of active learning, in spite of its relative simplicity.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        All machine learning approaches can be divided into two large groups, namely
online [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] and offline [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] methods, according to the way they obtain and process
training data.
      </p>
      <p>In the case of offline approaches, it is assumed that the whole training set is
available for analysis and remains unchanged during the functioning of the recognition
system, while in online systems new labeled objects, or even sets of labeled
objects, unavailable at the initial training stage, become available over time.</p>
      <p>
        The data used for training may later become unavailable, or limited and impossible
to use effectively, so that all the information about this part of the data is represented
only through the decision rule. Machine learning approaches that deal with such
a situation are referred to as Incremental Learning [
        <xref ref-type="bibr" rid="ref1 ref11">1,11</xref>
        ] or, as in the early
Soviet works, Recurrent Learning. Sometimes Incremental Learning is used as a
synonym for Online Learning.
      </p>
      <p>
        Reinforcement Learning [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] is an area of machine learning in which the
considered system or agent is trained in the process of interaction with some
environment. Reinforcement signals are formed by the reaction of the environment
to the accepted decisions, rather than by a special reinforcement control system as
in the case of supervised learning. Reinforcement learning is therefore a particular case
of supervised learning in which the environment itself, or its model, plays the role of
the teacher. It is also necessary to take into account that some reinforcement rules
are based on implicit teachers, for example, in the case of an artificial neural
network, on the simultaneous activity of formal neurons, so that they can be
attributed to unsupervised learning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>A situation similar to online recognition can occur when the training set
is so large that the available computational tools do not allow it to be processed
entirely. Such a situation was typical of the initial stages of the development
of the theory of machine learning, when computing resources were extremely
limited. Now this problem again comes to the fore in connection with the increased
amount of available data produced by information systems, and it gives rise
to a new area of research known as "Big Data".</p>
      <p>
        Some pattern recognition tasks have a specificity that makes it possible to
separate them into a class of problems with a semi-labeled sample set
(Semi-supervised Learning) [
        <xref ref-type="bibr" rid="ref23 ref35 ref36">23,35,36</xref>
        ], where the features of additional objects, but not their
classification labels, are known in addition to the training set. The presence
of such objects provides necessary information about the population. This
additional information can be used to accelerate the training process or to increase
accuracy. The tasks where we can request a class label for some objects from
the teacher form an important subclass of such problems. Object selection
strategies, together with the corresponding methods of correction of decision rules, form
the Active Learning subclass of pattern recognition problems [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ].
      </p>
      <p>
        If data about an object's class are received from several experts, or the
accuracy of the teacher can be questioned and has some degree of reliability,
then a further extension of active learning methods, called Proactive Learning,
can be applied [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        A comprehensive survey devoted to the problems of information retrieval
on text data, the arrangement of experiments, and the evaluation of
their results can be found in [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. Problems of generalization ability
improvement and active learning tasks are considered in [
        <xref ref-type="bibr" rid="ref17 ref21 ref26 ref27">26,27,21,17</xref>
        ].
      </p>
      <p>
        This paper addresses the known pool-based learning problem [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] of
labeling objects from some subset of the general population, with further model retraining
to improve its generalization ability on that population.
      </p>
      <p>
        A technique of querying, for further expert annotation, the objects whose labels the
classifier is least certain about (uncertainty sampling) is described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and,
in application to support vector machines, object selection from the interclass
margin is described in [31]. The common characteristic feature of such techniques is the
use of some classification uncertainty measure, so that the borderline objects are
the most preferable for disclosure by an expert [31]. For example, when using a
probabilistic model for binary classification, uncertainty sampling simply queries
the instance whose posterior probability of being positive is nearest to 0.5. For linear
classifiers (like the linear SVM) such objects are those that are close to the margin
between classes.
      </p>
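      <p>As a minimal sketch of uncertainty sampling for a linear SVM (assuming a scikit-learn-style classifier; the function name is ours, not taken from the cited papers), one can rank the unlabeled pool by the absolute value of the decision function, which for a linear model is proportional to the distance to the separating hyperplane:</p>
      <preformat>
# Uncertainty sampling sketch: query the unlabeled objects closest
# to the separating hyperplane of a linear SVM.
import numpy as np
from sklearn.svm import SVC

def query_most_uncertain(clf: SVC, X_pool: np.ndarray, n_queries: int) -> np.ndarray:
    """Indices of the n_queries pool objects nearest the decision boundary."""
    # |f(x)| is proportional to the distance from x to the hyperplane.
    uncertainty = np.abs(clf.decision_function(X_pool))
    return np.argsort(uncertainty)[:n_queries]
      </preformat>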
      <p>
        The set of hypotheses consistent with the current labeled training set is called
the version space (VS) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The generalization error of most possible classifiers
in the VS (named by the authors the egalitarian generalisation error
bound) is controlled by the size of the VS relative to the size of the hypothesis
space [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The learner that chooses successive queries halving the version space is
the learner that minimizes the maximum expected size of the version space (p. 51
in [31]). If the queried object lies near the hyperplane, the rate of version space
reduction approaches a factor of two, in accordance with the distance from this object to
the hyperplane, and does not depend on the real class label. This rule holds true
if the training set can be linearly separated (though this requirement is not
considered too strict by the author; p. 49 in [31]).
      </p>
      <p>
        We propose to use the distance from an object to the support vectors, as well
as the distance to the hyperplane, for object selection, since the compactness and
symmetry assumptions are often not satisfied in practice [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the separating
hyperplane may not intersect the version space at all (p. 52 in [31]). A trivial
argument for such an object selection is the necessity to reduce the number of
manually labeled duplicate texts.
      </p>
      <p>According to [31], active learning means a directed strategy of choosing
objects from the general population for labeling. A good strategy minimizes the
number of queries to human labelers.</p>
      <p>
        Several methods for choosing objects for class membership disclosure have been
investigated in [
        <xref ref-type="bibr" rid="ref18 ref20">18,20</xref>
        ]. A comparison of the generalization performance
speedup of these two techniques was published in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In [30] a demo program
was introduced for an active learning algorithm based on a naive Bayes classifier,
together with a generalization performance speedup study and a user survey.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] the expert provides explanations and labels domain-featured
fragments of texts. Based on these fragments of interest, and on some other specific
fragments that oppose the whole-document label, the authors introduce a method of
Learning with Explanations. In [34] the problem of active learning for networked
data, where samples are connected with links and their labels are correlated
with each other, is studied. The authors particularly focus on the setting of using
a probabilistic graphical model to simulate the networked data, due to its
effectiveness in capturing the dependency between labels of linked samples. A
two-stage active learning technique for the multi-label problem was suggested in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which
summarizes principles from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In the first stage, an effective multi-label classification
model combining label ranking with threshold learning, which is incrementally
trained to avoid retraining from scratch after every query, was introduced. Then,
based on this model, the authors propose to exploit both uncertainty and diversity
in the instance space as well as in the label space, and to actively query the
instance-label pairs that can improve the classification model most.
      </p>
      <p>It is noticed in [31] that there is no reason to prefer one of the classes for
labeling. So, we base our strategy on the principle of equal significance of label
disclosure for the class of interest and the other class.</p>
    </sec>
    <sec id="sec-3">
      <title>Active Learning Algorithm based on Support Vectors</title>
    </sec>
    <sec id="sec-4">
      <title>Relative Positioning</title>
      <p>We use the principle of uncertainty sampling and the analysis of borderline
objects. Correction of the decision rule is implemented via disclosing the labels of
objects that are close to the decision boundary. It is a well-known fact that the
support vectors in Vapnik's SVM [32] uniquely define the so-called separating
hyperplane. The score function classifies any unlabeled test object (document, text)
according to its sign, and the farther the object lies from the boundary, the more
reliable the class membership decision. The idea of the suggested algorithm
is to pass to the human labeler exactly those documents that are close to
the decision boundary and at the same time far from the set of support objects.
The formal algorithm is as follows:
1. At the first step, the optimal separating hyperplane is built for the initial
training set and the subset of support objects is fixed. The initial training set can
be obtained via review of documents sampled randomly from the complete
population.
2. Among all unlabeled objects, take into account only the subset $H$ of objects
closest to the hyperplane.
3. From $H$, select for labeling the $N$ objects $\omega_i \in H$ under the
condition $L_i &gt; L_j$, $i = 1, \ldots, N$, $j &gt; N$, where
$L_j = \min_{\omega_k^{SV}} d(\omega_j, \omega_k^{SV})$ is the distance from the
object $\omega_j \in H$ to the set of support objects $\{\omega_k^{SV}\}$.</p>
      <p>The distance is the regular Euclidean distance in the corresponding feature space.</p>
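      <p>A minimal sketch of this selection rule follows. It assumes an sklearn-style SVC with a linear kernel and a pool of unlabeled feature vectors; the size of the boundary subset H is left open in the description above, so the value below is purely illustrative:</p>
      <preformat>
# Sketch of the proposed rule: among the pool objects closest to the
# hyperplane (the set H), prefer those farthest from the support vectors.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import pairwise_distances

def select_for_labeling(clf: SVC, X_pool: np.ndarray,
                        n_boundary: int = 100, n_queries: int = 20) -> np.ndarray:
    # Step 2: H = indices of the n_boundary objects closest to the hyperplane.
    margin = np.abs(clf.decision_function(X_pool))
    H = np.argsort(margin)[:n_boundary]
    # Step 3: L_j = min_k d(omega_j, omega_k_SV), the Euclidean distance
    # from each candidate to its nearest support object.
    L = pairwise_distances(X_pool[H], clf.support_vectors_).min(axis=1)
    # Query the n_queries objects of H with the largest L.
    return H[np.argsort(L)[::-1][:n_queries]]
      </preformat>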
    </sec>
    <sec id="sec-5">
      <title>Experimental Study. The Procedure</title>
      <p>
        It is clear that the computational costs of each iteration of such an experiment are
considerable and raise the problem of time sharing between the computer (decision
rule correction) and the human labeler (reading documents and deciding on their
categorization). Moreover, the portion of texts presented at one iteration should
be large enough to be representative, but not so large as to deny the expert a
reasonable opportunity to read and consider each document before labeling. For
our experiments we have chosen 20 documents per labeling iteration, which is consistent
with the discussion published in (p. 49 in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]). For the experimental study we used a
labeled data set (imitating the general population), which was divided into three parts:
the initial training (start) set, the selection set and the verification set. These three
sets are defined randomly at the beginning of each experiment in the proportion
of 5%, 45% and 50%, respectively. For all competitor methods the start and
verification sets were fixed and isolated from the selection set. Thus we provide
objective monitoring of the decision rule quality and of the generalization performance
speedup while transferring 20 objects at a time from the selection set into the training
set according to the active learning strategy.
      </p>
      <p>In the case when our algorithm does not yield 20 objects (N &lt; 20, a shortage
of objects $\omega_j \in H$), we take the missing objects by random choice.</p>
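      <p>The overall procedure of one experimental run can be sketched as below. The split sizes and the 20-document batch follow the description above; select_for_labeling is the sketch from the previous section, and all other names are illustrative, not the authors' implementation:</p>
      <preformat>
# One experimental run: 5% start / 45% selection / 50% verification split,
# then iterative disclosure of 20 documents per step.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import f1_score

def run_experiment(X, y, n_steps=15, batch=20, seed=0):
    X_rest, X_ver, y_rest, y_ver = train_test_split(
        X, y, test_size=0.50, random_state=seed)             # 50% verification
    X_train, X_sel, y_train, y_sel = train_test_split(
        X_rest, y_rest, train_size=0.10, random_state=seed)  # 5% of the total
    quality = []
    for _ in range(n_steps):
        clf = SVC(kernel='linear', C=1e5).fit(X_train, y_train)
        quality.append(f1_score(y_ver, clf.predict(X_ver)))
        idx = select_for_labeling(clf, X_sel, n_queries=batch)
        # The "expert" is simulated by disclosing the known labels.
        X_train = np.vstack([X_train, X_sel[idx]])
        y_train = np.concatenate([y_train, y_sel[idx]])
        X_sel = np.delete(X_sel, idx, axis=0)
        y_sel = np.delete(y_sel, idx)
    return quality
      </preformat>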
      <p>
        We use a two-class SVM with a linear kernel as the base classifier. The value of the
parameter C was equal to 100 000. The feature space was formed from frequency
properties of the text documents using the TF*IDF technique [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]; the number of
features was limited to the 15 000 with the highest rate in the corpus.
      </p>
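      <p>A sketch of the described feature extraction and classifier setup, assuming scikit-learn's TfidfVectorizer as a stand-in for the authors' TF*IDF implementation (corpus_texts and labels are hypothetical placeholders):</p>
      <preformat>
# TF*IDF features capped at the 15 000 most frequent terms, plus a
# linear SVM with C = 100 000, mirroring the described setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

vectorizer = TfidfVectorizer(max_features=15000)      # highest-rate terms
X = vectorizer.fit_transform(corpus_texts).toarray()  # dense, for distances
clf = SVC(kernel='linear', C=1e5).fit(X, labels)
      </preformat>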
      <p>
        The core of our algorithm has the same essence as [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], but we do not try
to find representative examples; from our point of view that is not possible.
Moreover, we apply our method to the task of text classification beyond the
well-known benchmark databases.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Experimental Study. Results</title>
      <p>
        Two corpora of texts were used for the experimental study: Enron with categories
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and BBC News [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Statistics for these datasets are presented in Table 1.
      </p>
      <p>
        The quality of the active learning strategy was evaluated by the common indices:
average values of precision, recall and F1 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] over multiple splittings of the data into the
initial training set, selection set and verification set. For most of the published
results the number of splittings was equal to 100, so the total computation cost was
nearly two months on an Intel Core i5 (4 cores at 4 GHz, 4 GB RAM).
      </p>
      <p>The formal quality estimation is given for three stages of each experiment,
after the expert has read (automatic disclosure of labels and moving of objects into the
training set) 100, 200 and 300 documents from the selection part (see Table 2). The charts
(Figures 1 and 2) demonstrate the growth of average quality for two methods:
ours (the blue line) and the random choice of 20 objects (the red line).</p>
      <p>
        For the comparative estimation of the quality improvement of our method we use
as a competitor the algorithm of multi-class classification with active learning [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Its results are published for the Enron corpus; see Figure 1 of the above-named paper.
Note that the decision of the classifier is assumed to be correct when it matches the
decision of at least one expert. Due to the different setting of the task (multi-class
classification) in Active Query Driven by Uncertainty and Diversity for Incremental
Multi-Label Learning (AUDI) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the micro-F1 measure [33] is used.
      </p>
      <p>Results are presented in Table 3. It is obvious that the algorithms show
comparable growth of quality.</p>
      <p>Fig. 2. Comparison of quality curves on the test set for the Enron dataset:
growth of average quality (F1) over the steps of the algorithm, 20 documents per step.
The blue line is our method, the red one is random choice.</p>
      <p>The experimental study demonstrates that the suggested active learning technique,
despite its simplicity, shows good results and can be used in industrial tasks.
The quality improvement in comparison with random selection of objects on the
BBC corpus was 23%, and for different subsets of the Enron database it was 3.4%.
To reach the same quality on the test set, our algorithm requires on average
586 documents fewer than random choice for the BBC corpus and 163 documents
fewer for the Enron corpus. In our opinion the BBC corpus (news texts), used in many
investigations, is not typical of industrial tasks, so the real improvement of
recognition quality will be closer to the results on Enron with categories.</p>
      <p>The experimental study also reveals a number of problems that will be interesting
for future investigation:
1. which active learning algorithm is appropriate to a particular feature space;
2. how to use a priori information about the non-labeled set in active learning;
3. what criterion indicates that the labeling process has reached quality saturation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Angluin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>A survey of inductive inference: Theory and</article-title>
          methods// Computing Surveys.
          <year>1983</year>
          . 15: P.
          <fpage>237</fpage>
          -
          <lpage>269</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>D. A.</given-names>
          </string-name>
          <string-name>
            <surname>Atlas</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Ladner</surname>
            ,
            <given-names>R. E.</given-names>
          </string-name>
          <article-title>Improving generalization with active learning // Machine Learning</article-title>
          .
          <year>1994</year>
          .
          <volume>15</volume>
          (
          <issue>2</issue>
          ). P.
          <fpage>201</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Guillory</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chastain</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilmes</surname>
            <given-names>J. A.</given-names>
          </string-name>
          <article-title>Active Learning as Non-Convex Optimization /</article-title>
          /AISTATS.
          <year>2009</year>
          . p.
          <fpage>201</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Greene</surname>
            <given-names>D.</given-names>
          </string-name>
          , Cunningham P.
          <article-title>Practical solutions to the problem of diagonal dominance in kernel document clustering //</article-title>
          <source>Proceedings of the 23rd international conference on Machine learning. ACM</source>
          ,
          <year>2006</year>
          . p.
          <fpage>377</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Herbrich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Graepel</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Williamson</surname>
            ,
            <given-names>R. C.</given-names>
          </string-name>
          <article-title>The Structure of Version Space in Innovations in Machine Learning: Theory</article-title>
          and Applications / edited by Holmes D.E., Jain L.C. // p.
          <fpage>263</fpage>
          -
          <lpage>279</lpage>
          .-Berlin: Springer, (
          <volume>2</volume>
          )
          <year>2006</year>
          .- ISBN 3-540-30609-9.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Learning from imbalanced data// IEEE Transactions on Knowledge and Data Engineering</article-title>
          .
          <source>(21.9)</source>
          .
          <year>2009</year>
          . P.
          <volume>1263</volume>
          -
          <fpage>1284</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Huang</surname>
            <given-names>S. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            <given-names>R.</given-names>
          </string-name>
          , Zhou
          <string-name>
            <surname>Z. H.</surname>
          </string-name>
          <article-title>Active learning by querying informative and representative examples //</article-title>
          <source>Advances in neural information processing systems</source>
          .
          <year>2010</year>
          . pp.
          <fpage>892</fpage>
          -
          <lpage>900</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.-J.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Z.-H.</given-names>
          </string-name>
          <article-title>Active Query Driven by Uncertainty and Diversity for Incremental Multi-Label Learning //</article-title>
          <source>Data Mining (ICDM), IEEE 13th International Conference</source>
          .
          <year>2013</year>
          .- ISSN 1550-4786.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          <string-name>
            <surname>Catlett</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>Heterogeneous uncertainty sampling for supervised learning/</article-title>
          <source>In Proceedings ICML 94</source>
          ,
          <year>1994</year>
          . pages
          <fpage>148</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Lin</surname>
            <given-names>C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mausam</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weld</surname>
            <given-names>D. S.</given-names>
          </string-name>
          <article-title>Re-Active Learning: Active Learning with Relabeling</article-title>
          //AAAI.
          <year>2016</year>
          . p.
          <fpage>1845</fpage>
          -
          <lpage>1852</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Jantke P.
          <article-title>Types of incremental learning //AAAI Symposium on Training Issues in Incremental Learning</article-title>
          .
          <year>1993</year>
          . p.
          <fpage>23</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Klimt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <article-title>The Enron Corpus: A New Dataset for Email Classification Research / in Proceedings ECML04</article-title>
          . P.
          <volume>217</volume>
          -
          <fpage>226</fpage>
          .- Pisa, Italy,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kohonen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <article-title>The self-organizing map // Proceedings of the Institute of Electrical and Electronics Engineers</article-title>
          , vol.
          <volume>78</volume>
          , P.
          <fpage>1464</fpage>
          -
          <lpage>1480</lpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kremer</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steenstrup Pedersen</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Igel</surname>
            <given-names>C</given-names>
          </string-name>
          .
          <article-title>Active learning with support vector machines //Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</article-title>
          .
          <year>2014</year>
          . Vol.
          <volume>4</volume>
          . No.
          <issue>4</issue>
          . pp.
          <fpage>313</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Melville</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Sindhwani</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <article-title>Active dual supervision: Reducing the cost of annotating examples</article-title>
          and features //
          <source>In Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing</source>
          .
          <year>2009</year>
          . P.
          <fpage>49</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Mitchell, T. Generalization as search // Artificial Intelligence, (
          <volume>18</volume>
          )
          <year>1982</year>
          . P.
          <fpage>203</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Olsson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>A literature survey of active machine learning in the context of natural language processing // SICS Report</article-title>
          , T2009:
          <fpage>06</fpage>
          .
          <year>2009</year>
          .-ISSN 1100-3154.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Fung</surname>
            <given-names>G. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mangasarian</surname>
            <given-names>O. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shavlik</surname>
            <given-names>J. W.</given-names>
          </string-name>
          <article-title>Knowledge-based support vector machine classifiers</article-title>
          <source>//Advances in neural information processing systems</source>
          .
          <year>2003</year>
          . p.
          <fpage>537</fpage>
          -
          <lpage>544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Tandem learning: A learning framework for document categorization /</article-title>
          <source>Ph.D. thesis</source>
          .-Amherst:Graduate school of Massachusetts,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Madani</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Active Learning with Feedback on Both Features</article-title>
          and Instances //
          <source>Journal of Machine Learning Research. (7</source>
          )
          <year>2006</year>
          . P.
          <fpage>1655</fpage>
          -
          <lpage>1686</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rubens</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Elahi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Kaplan</surname>
          </string-name>
          , D. Active Learning in Recommender Systems //In ed. Ricci F.,
          <string-name>
            <surname>Rokach</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shapira</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <source>Recommender Systems Handbook (2 ed.)</source>
          .-US: Springer,
          <year>2016</year>
          .- ISBN 978-1-
          <fpage>4899</fpage>
          -7637-6.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Salton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>McGill</surname>
            ,
            <given-names>M. J.</given-names>
          </string-name>
          <article-title>Introduction to modern information retrieval</article-title>
          .
          <source>McGrawHill.</source>
          ,
          <year>1986</year>
          .-ISBN 978-
          <fpage>0070544840</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Scudder</surname>
            ,
            <given-names>H. J.</given-names>
          </string-name>
          <article-title>Probability of Error of Some Adaptive Pattern-Recognition Machines/</article-title>
          / IEEE Transaction on Information Theory, (
          <volume>11</volume>
          )
          <year>1965</year>
          . P.
          <fpage>363</fpage>
          -
          <lpage>371</lpage>
          . Cited in Chapelle et al.
          <year>2006</year>
          , page 3.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Sculley</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>Online active learning methods for fast label-efficient spam filtering /</article-title>
          /CEAS.
          <year>2007</year>
          . Vol.
          <volume>7</volume>
          . p.
          <fpage>143</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25. Sebastiani F.
          <source>Machine learning in automated text categorization //ACM computing surveys (CSUR)</source>
          .
          <year>2002</year>
          . Vol.
          <volume>34</volume>
          . No.
          <volume>1</volume>
          . p.
          <fpage>1</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          Active Learning Literature Survey // Computer Sciences Technical Report 1648. University of Wisconsin-Madison,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Settles</surname>
          </string-name>
          , B. Active Learning // Synthesis Lectures on Artificial Intelligence and Machine Learning.
          <year>2012</year>
          . Vol.
          <volume>6</volume>
          , No. 1. P.
          <fpage>1</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Sharma</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bilgic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>Towards Learning with Feature-Based Explanations for Document Classification</article-title>
          .-IL:. Illinois Institute of Technology, Chicago, USA.
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Sutton</surname>
            ,
            <given-names>R. S.</given-names>
          </string-name>
          <article-title>Temporal Credit Assignment in Reinforcement Learning</article-title>
          (
          <source>PhD thesis</source>
          ). Amherst, MA (US): University of Massachusetts,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
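      <ref id="ref30">
        <mixed-citation>30. Stumpf, S. et al. Integrating rich user feedback into intelligent user interfaces // Proceedings of the 13th International Conference on Intelligent User Interfaces. ACM, 2008. p. 50-59.</mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>31. Tong, S., Koller, D. Support vector machine active learning with applications to text classification // Journal of Machine Learning Research. (2) 2002. p. 45-66.</mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>32. Vapnik, V. Statistical Learning Theory. Wiley-Interscience, NY, 1998.</mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>33. Yang, Y. A study of thresholding strategies for text categorization // Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2001. p. 137-145. doi: 10.1145/383952.383975.</mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>34. Yang, Z. et al. Active learning for networked data based on non-progressive diffusion model // Proceedings of the 7th ACM International Conference on Web Search and Data Mining. ACM, 2014. p. 363-372.</mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>35. Zhu, X. Semi-supervised learning literature survey // Computer Sciences. University of Wisconsin-Madison, 2008.</mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>36. Zhu, X., Goldberg, A. B. Introduction to semi-supervised learning // Synthesis Lectures on Artificial Intelligence and Machine Learning. 2009. Vol. 3. No. 1. p. 1-130.</mixed-citation>
      </ref>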
    </ref-list>
  </back>
</article>