<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sanremo's winner is... Category-driven Selection Strategies for Active Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anne-Lyse Minard</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuela Speranza</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammed R. H. Qwaider</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Magnini</string-name>
        </contrib>
        <aff>Fondazione Bruno Kessler, Trento, Italy. Email: {minard, manspera, qwaider, magninig}@fbk.eu</aff>
      </contrib-group>
      <pub-date>
        <year>2009</year>
      </pub-date>
      <volume>5478</volume>
      <fpage>6</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>This paper compares Active Learning selection strategies for sentiment analysis of Twitter data. We focus mainly on category-driven strategies, which select training instances taking into consideration the confidence of the system as well as the category of the tweet (e.g. positive or negative). We show that this combination is particularly effective when the performance of the system is unbalanced over the different categories. This work was conducted in the framework of automatically ranking the songs of “Festival di Sanremo 2017” based on sentiment analysis of the tweets posted during the contest.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Active Learning (AL)
        <xref ref-type="bibr" rid="ref1">(Cohn et al., 1994)</xref>
        aims to reduce the human annotation effort needed to build training data by selecting the most useful samples for manual labeling. In the AL framework samples are usually selected according to several criteria, such as informativeness, representativeness, and diversity
        <xref ref-type="bibr" rid="ref10">(Shen et al., 2004)</xref>
        .
      </p>
      <p>This paper investigates AL selection strategies
that consider the categories the current classifier
assigns to samples, combined with the confidence
of the classifier on the same samples. We are
interested in understanding whether these strategies
are effective, particularly when category
distribution and category performance are unbalanced. By
comparing several options, we show that
selecting low confidence samples of the category with
the highest performance is a better strategy than
selecting high confidence samples of the category
with the lowest performance.</p>
      <p>The context of our study is the development of a
sentiment analysis system that classifies tweets in
Italian. We used the system to automatically rank
the songs of Sanremo 2017 based on the sentiment
of the tweets posted during the contest.</p>
      <p>The paper is structured as follows. In Section 2
we give an overview of the state-of-the-art in
selection strategies for AL. Then we present our
experimental setting (Section 3) before detailing the
tested selection strategies (Section 4). Finally, we
describe the results of our experiment in Section 5
and the application of the system to ranking
Sanremo’s songs in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        AL
        <xref ref-type="bibr" rid="ref1 ref9">(Cohn et al., 1994; Settles, 2010)</xref>
        provides a
well known methodology for reducing the amount
of human supervision (and the corresponding cost)
for the production of training datasets necessary
in many Natural Language Processing tasks. An
incomplete list of references includes Shen et al.
(2004) for Named Entity Recognition, Ringger et
al. (2007) for PoS Tagging, and Schohn and Cohn
(2000) for Text Classification.
      </p>
      <p>
        AL methods are based on strategies for
sample selection. Although there are two main
types of selection methods, certainty-based and
committee-based, here we concentrate only on
certainty-based selection methods. The main
certainty-based strategy used is the uncertainty
sampling method
        <xref ref-type="bibr" rid="ref4">(Lewis and Gale, 1994)</xref>
        . Shen et
al. (2004) propose a strategy which is based on the
combination of several criteria: informativeness,
representativeness, and diversity. The results
presented by Settles and Craven (2008) show that
information density is the best criterion for sequence
labeling. Tong and Koller (2002) propose three
selection strategies that are specific to SVM
learners and are based on different measures taking into
consideration the distances to the decision
hyperplane and margins.
      </p>
      <p>Many NLP tasks suffer from unbalanced data.
Ertekin et al. (2007) show that selecting examples
within the margin overcomes the problem of
unbalanced data.</p>
      <p>The previously cited selection strategies are
often applied to binary classification and do not take
into account the predicted class. In this work we
are interested in multi-class classification tasks,
and in the problem of unbalanced data and
dominant classes in terms of performance.</p>
      <p>Esuli and Sebastiani (2009) define three
criteria that they combine to create different selection
strategies in the context of multi-label text
classification. The criteria are based on the confidence
of the system for each label, a combination of the
confidence of each class for one document, and a
weight (based on the F1-measure) assigned to each
class to distinguish those for which the system
performs badly. They show that in most cases
this last criterion does not improve the selection.</p>
      <p>Our application context differs in that we
are not working on a multi-label task. Instead of
computing a weight according to the F1-measure,
we experimented with a change of strategy that
focuses on a single class.</p>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setting</title>
      <p>The context of our study was the development of
a supervised sentiment analysis system that
classifies tweets into one of the following four classes:
positive, negative, neutral, and n/a
(i.e. not applicable).</p>
      <p>The manual annotation of the data was mainly
performed by 25 3rd and 4th year students from
local high schools who were doing a one-week
group internship at Fondazione Bruno Kessler.</p>
      <p>We created an initial training set using an AL
mechanism that selects the samples with the
lowest system confidence [1], i.e. those closest to the
hyperplane and therefore the most difficult to classify. In
the following we describe the sentiment analysis
system, the Active Learning process, and the
creation of the test set and the initial training set. Finally,
we introduce the experiments performed on
selection strategies for Active Learning.</p>
      <p>Sentiment Analysis System. Our system for
sentiment analysis is based on a supervised
machine learning method using the SVM-MultiClass
tool (Joachims et al., 2009) [2]. We extract the
following features from each tweet: the tokens
composing the tweet, and the number of urls, hashtags,
and aliases it contains. The system takes as input a
tokenized tweet [3] and returns as output its polarity.</p>
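      <p>As a rough illustration, the feature extraction described above can be sketched as follows (a minimal sketch: the function and feature names are ours, not the actual implementation):</p>
      <preformat>
```python
def extract_features(tokens):
    """Features described in the text: the tokens of the tweet plus
    counts of urls, hashtags and aliases (@-mentions).
    Names are illustrative, not the authors' actual feature set."""
    feats = {}
    for tok in tokens:
        feats["tok=" + tok.lower()] = 1          # bag-of-tokens features
    feats["n_urls"] = sum(1 for t in tokens if t.startswith("http"))
    feats["n_hashtags"] = sum(1 for t in tokens if t.startswith("#"))
    feats["n_aliases"] = sum(1 for t in tokens if t.startswith("@"))
    return feats
```
      </preformat>
      <p>Such a feature dictionary would then be mapped to the sparse vector format expected by the learner.</p>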
      <p>
        AL Process. We used TextPro-AL, a platform
which integrates an NLP pipeline, an AL
mechanism and an annotation interface
        <xref ref-type="bibr" rid="ref5">(Magnini et al.,
2016)</xref>
        . The AL process is as follows: (i) a large
unlabeled dataset is annotated by the sentiment
analysis system (with a small temporary model
used to initialize the AL process [4]); (ii) samples are
selected according to a selection strategy; (iii)
annotators annotate the selected tweets; (iv) the newly
annotated samples are accumulated in a batch;
(v) when the batch is full, the annotated data are
added to the existing training dataset and a new
model is built; (vi) the unlabeled dataset is
annotated again using the newly built model and the
cycle begins again at (ii).
      </p>
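      <p>The cycle (i)-(vi) can be sketched in a few lines of Python (an illustrative sketch with hypothetical function names, not the TextPro-AL code; `oracle` stands in for the human annotators):</p>
      <preformat>
```python
def active_learning(unlabeled, oracle, train, select, batch_size=120, rounds=3):
    """Sketch of the AL cycle described above (steps i-vi)."""
    labeled = []
    model = train(labeled)                          # small initial model
    for _ in range(rounds):
        preds = [(x, model(x)) for x in unlabeled]  # (i) auto-annotate pool
        batch = select(preds, batch_size)           # (ii) selection strategy
        for x, _ in batch:                          # (iii)-(iv) human labels
            labeled.append((x, oracle(x)))          #   accumulate the batch
            unlabeled.remove(x)
        model = train(labeled)                      # (v) retrain on full set
    return model, labeled                           # (vi) cycle repeats
```
      </preformat>
      <p>Here `model` maps a tweet to a (label, confidence) pair, and `select` implements one of the selection strategies of Section 4.</p>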
      <p>The unlabeled dataset consists of 400,000
tweets that contained the hashtag #Sanremo2017.
The maximum size of the batch is 120, so
retraining takes place every 120 annotated tweets.</p>
      <p>Training and Performance. The initial training
set, whose creation required half a day of work
(the 25 high school students worked in pairs and
trios, for a total of 12 groups), is composed of
2,702 tweets. The class negative is the most
represented, covering almost 40% of the total,
compared to around 30% for positive. The
distribution of the two minor classes is rather close,
with 18% for neutral and 13% for n/a.</p>
      <p>[1] The confidence score is computed as the average of the
margin estimated by the SVM classifier for each entity.</p>
      <p>[2] https://www.cs.cornell.edu/people/tj/svm_light/svm_multiclass.html</p>
      <p>[3] Tokenization is performed using the Twokenizer java
library: https://github.com/vinhkhuc/Twitter-Tokenizer/blob/master/src/Twokenizer.java</p>
      <p>[4] The temporary model was built using 155 tweets
annotated manually by one annotator. After the first step of
the AL process, these tweets are removed from the training
set.</p>
      <p>As a test set we used 1,136 tweets randomly
selected from among all the tweets which mentioned
either a Sanremo song or singer. The test set was
annotated partly by the high school students (656
tweets) and partly by two expert annotators (480
tweets); each tweet was annotated with the same
category by at least two annotators. 58% of the
tweets are positive, 20% are negative, 14%
are neutral, and 8% are n/a.</p>
      <p>We built the test set by selecting tweets
randomly from the unlabeled dataset, in order to make
it representative of the whole dataset.</p>
      <p>The overall performance of the system trained
on the initial set is 40.7 in terms of F1 (see
EVAL2702 in Table 1). The F1 obtained on
the two main categories, i.e. positive and
negative, is 54.5, but the system performs more
poorly on negative than on positive, with
F1-measures of 33.6 and 75.4 respectively.
Experiment. As the evaluation showed good
results on positive but poor results on
negative, we devised and tested novel selection
strategies better able to balance the performance of
the system over the two classes. We divided the 25
annotators into three different groups: each group
annotated 775 tweets. The tweets annotated by the
first group were selected with the same strategy
used before, whereas for the other two groups we
implemented two new selection strategies taking
into account not only the confidence of the system
but also the class it assigns to a tweet. As a
result we obtained three different extensions of the
same size and were thus able to compare the
performance of the system trained on the initial
training set plus each of the extensions.</p>
    </sec>
    <sec id="sec-4">
      <title>Selection Strategies</title>
      <p>We tested three selection strategies that take into
account the classification proposed by the
system in order to select the most useful samples to
improve the distinction between positive and
negative.</p>
      <p>S1: low confidence. The first strategy we tested
is the baseline strategy, which selects tweets
classified by the system with the lowest confidence.
The low confidence strategy was also used to build
the initial training set (S0: lowC), as described in
Section 3.</p>
      <p>S2: NEGATIVE with high confidence. The
second strategy consists of selecting the samples
classified as negative with the highest
confidence. We assume that this will increase the
amount of negative tweets selected, thus enabling
us to improve the performance of the system on
the negative class. Nevertheless, as the
system has a high confidence on the classification of
these tweets, through this strategy we are adding
easy examples to the training set that the system is
probably already able to classify correctly.</p>
    </sec>
    <sec id="sec-5">
      <title>S3: POSITIVE with low confidence</title>
      <p>The third strategy aims at selecting the positive tweets
for which the system has the lowest confidence.
We expect in this way to get the difficult cases, i.e.
tweets that are close to the hyperplane and that are
classified as positive but whose classification
has a high chance of being incorrect.</p>
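      <p>Assuming each tweet is mapped to a (label, confidence) pair by the current model, the three strategies can be sketched as follows (illustrative code, not the authors' implementation):</p>
      <preformat>
```python
def lowest_confidence(preds, k):
    """S1 (baseline): the k tweets with the lowest confidence.
    `preds` maps each tweet to a (label, confidence) pair."""
    return sorted(preds, key=lambda x: preds[x][1])[:k]

def negative_high_confidence(preds, k):
    """S2: tweets classified negative with the highest confidence."""
    neg = [x for x in preds if preds[x][0] == "negative"]
    return sorted(neg, key=lambda x: preds[x][1], reverse=True)[:k]

def positive_low_confidence(preds, k):
    """S3: tweets classified positive with the lowest confidence,
    i.e. the difficult cases close to the decision hyperplane."""
    pos = [x for x in preds if preds[x][0] == "positive"]
    return sorted(pos, key=lambda x: preds[x][1])[:k]
```
      </preformat>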
      <p>As the initial system has high recall (82.8) but
low precision (69.3) for the class positive, we
assume that it needs to improve on the examples
wrongly classified as positive. We expect that
among the tweets wrongly classified as positive
we will find difficult cases of negative tweets
which will help to improve the system on the
negative class. On the other hand, recall for the
negative class is low (25.7), whereas precision
is slightly better (48.7), which is why we decided
to extract positive tweets with low confidence
instead of negative tweets with low confidence.</p>
    </sec>
    <sec id="sec-6">
      <title>Results and Discussion</title>
      <p>In Table 1 we present the results (in terms of F1)
obtained by the system using the additional
training data selected through the three different
selection strategies described above. In order to
facilitate the interpretation of the results, we also report
the performance obtained by the system trained
only on the initial set of 2,702 tweets.
Additionally, in Table 2, we give the results obtained by
the system for each configuration also in terms of
recall and precision (besides F1).</p>
      <p>The first four lines report the results for each of
the four categories, while lines six and seven
report respectively the macro-average F1 over the
four classes and the macro-average F1 over the
two most important classes, i.e. positive and
negative. For each selection strategy, we
indicate the difference in performance obtained with
respect to the system trained on the initial set, as
well as the number of annotated tweets that have
been added.</p>
      <sec id="sec-6-8">
        <title>Tables 1 and 2</title>
        <p>[Tables 1 and 2: F1 (Table 1) and recall/precision/F1 (Table 2) per strategy for the classes NEGATIVE, POSITIVE, NEUTRAL and N/A, with macro-averages over the 4 classes and over POS/NEG, and differences wrt S0. The numeric cells were lost in conversion.]</p>
        <p>With the baseline strategy (S1: lowC, i.e.,
selection of the tweets for which the system has the
lowest confidence) the performance of the system
decreases slightly, from an F1 of 40.7 to an F1
of 39.8. The largest share of the added samples are
negative tweets (38%), which enables the system to
increase its performance on this class by 1.2 points.</p>
        <p>When using the second strategy (S2:
NEGhighC, i.e. selection of the negative tweets with
the highest confidence), 76% of the new tweets are
negative, but the performance of the system on this
class decreases. Even the overall performance of
the system decreases, despite adding 775 tweets.</p>
        <p>We observe that the best strategy is S3
(POSlowC, i.e., selection of the positive tweets with
the lowest confidence), with an improvement of
the macro-average F1-measure over the 4 classes
by 1.6 points and over the positive and
negative classes by 3.4 points. Although we
add more positive than negative tweets to the
training data (34%), the performance of the system on
the negative class increases as well, from F1
33.6 to F1 39.3. This strategy worked very well in
enabling us to select the examples which help the
system discriminate between the two main classes.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Application: Sanremo’s Ranking</title>
      <p>After evaluating the three different selection
strategies, we trained a new model using all the
tweets that had been annotated. With this new
model, as expected, we obtained the best results.
The average F-measure on the negative and
positive classes is 58.2, the average F-measure
over the 4 classes is 42.1.</p>
      <p>For the annotation used to produce the
automatic ranking, we provided the system with
gazetteers, i.e. a list of words that carry positive
polarity and a list of words that carry negative
polarity. This yielded a small improvement in
system performance, with an F1 of 42.8 averaged
over the four classes and an F1 of 58.3 averaged
over positive and negative.</p>
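      <p>The paper does not detail how the gazetteers enter the model; one plausible integration, sketched here purely as an assumption, is to add two count features to the representation of each tweet (the word lists below are toy examples, not the actual gazetteers):</p>
      <preformat>
```python
# Hypothetical integration of the polarity gazetteers as features.
POSITIVE_WORDS = {"bellissima", "fantastico", "vincitore"}   # toy lexicon
NEGATIVE_WORDS = {"brutta", "noioso", "delusione"}           # toy lexicon

def gazetteer_features(tokens):
    """Two count features: positive-lexicon and negative-lexicon hits."""
    toks = [t.lower() for t in tokens]
    return {
        "n_pos_words": sum(1 for t in toks if t in POSITIVE_WORDS),
        "n_neg_words": sum(1 for t in toks if t in NEGATIVE_WORDS),
    }
```
      </preformat>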
      <p>As explained in the Introduction, the applicative
scope of our work was to rank the songs
competing in Sanremo 2017. For this, we used only the
total number of tweets talking about each singer
and the polarity assigned to each tweet by the
system. In total we had 118,000 tweets containing
either a reference to a competing singer or song that
had been annotated automatically by the sentiment
analysis system. Ranking the songs according
to the proportion of positive tweets for each singer,
we were able to identify 4 of the top 5 songs
and 4 of the bottom 5 songs. In Table 3,
we show the official ranking versus the automatic
ranking. The Spearman’s rank correlation
coefficient between the official ranking and our ranking
is 0.83, and the Kendall’s tau coefficient is 0.67.</p>
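      <p>The reported correlations can be checked directly from the ranks in Table 3. The pure-Python sketch below reproduces the Spearman value of 0.83; for Kendall we use the simple tau-a variant, which may differ slightly from the reported figure depending on the variant and rounding used:</p>
      <preformat>
```python
# Official vs. automatic ranks from Table 3, in row order
# (Francesco Gabbani first, Clementino last).
official = list(range(1, 17))
system = [8, 4, 1, 2, 5, 6, 3, 9, 13, 7, 10, 12, 14, 11, 16, 15]

def spearman(a, b):
    """Spearman rank correlation for two tie-free rank lists."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(a, b):
    """Kendall's tau (tau-a) for two tie-free rank lists."""
    n = len(a)
    pairs = n * (n - 1) // 2
    concordant = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (a[i] - a[j]) * (b[i] - b[j]) > 0
    )
    return (2 * concordant - pairs) / pairs

rho = spearman(official, system)  # rounds to 0.83, as reported
tau = kendall(official, system)
```
      </preformat>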
      <sec id="sec-7-1">
        <title>Conclusions</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Singer</th><th>Official</th><th>System</th></tr>
            </thead>
            <tbody>
              <tr><td>Francesco Gabbani</td><td>1</td><td>8</td></tr>
              <tr><td>Fiorella Mannoia</td><td>2</td><td>4</td></tr>
              <tr><td>Ermal Meta</td><td>3</td><td>1</td></tr>
              <tr><td>Michele Bravi</td><td>4</td><td>2</td></tr>
              <tr><td>Paola Turci</td><td>5</td><td>5</td></tr>
              <tr><td>Sergio Sylvestre</td><td>6</td><td>6</td></tr>
              <tr><td>Fabrizio Moro</td><td>7</td><td>3</td></tr>
              <tr><td>Elodie</td><td>8</td><td>9</td></tr>
              <tr><td>Bianca Atzei</td><td>9</td><td>13</td></tr>
              <tr><td>Samuel</td><td>10</td><td>7</td></tr>
              <tr><td>Michele Zarrillo</td><td>11</td><td>10</td></tr>
              <tr><td>Lodovica Comello</td><td>12</td><td>12</td></tr>
              <tr><td>Marco Masini</td><td>13</td><td>14</td></tr>
              <tr><td>Chiara</td><td>14</td><td>11</td></tr>
              <tr><td>Alessio Bernabei</td><td>15</td><td>16</td></tr>
              <tr><td>Clementino</td><td>16</td><td>15</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We have presented a comparative study of three
AL selection strategies. We have shown that a
strategy that takes into account both the
automatically assigned category and the system’s
confidence performs well in the case of unbalanced
performance over the different classes.</p>
        <p>To complete our study, it would be interesting
to perform further experiments on other
multi-class classification problems. Unfortunately, this work
required intensive annotation effort, so replicating
it on other tasks would be very expensive. A
lot of work on Active Learning has been done
using existing annotated corpora, but we think that
this is too far from a real annotation situation, as the
datasets used are generally limited in terms of size.</p>
        <p>In order to test different selection strategies,
we have evaluated the sentiment analysis
system against a gold standard, but we have also
performed an application-oriented evaluation by
ranking the songs participating in Sanremo 2017.</p>
        <p>As future work, we want to explore the
possibility of automatically adapting the selection
strategies while annotating. For example, if the
performance of the classifier of one class is low, the
strategy in use could be changed in order to select the
samples needed to improve on that class.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work has been partially funded by the
EuclipRes project, under the program Bando
Innovazione 2016 of the Autonomous Province of
Bolzano. We also thank the high school students
who contributed to this study with their annotation
work within the FBK Junior initiative.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Cohn</surname>
          </string-name>
          , Les Atlas, and
          <string-name>
            <given-names>Richard</given-names>
            <surname>Ladner</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Improving generalization with active learning</article-title>
          .
          <source>Machine Learning</source>
          , pages
          <fpage>201</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Seyda</given-names>
            <surname>Ertekin</surname>
          </string-name>
          , Jian Huang, Léon Bottou, and
          <string-name>
            <given-names>C. Lee</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Learning on the border: active learning in imbalanced data classification</article-title>
          . In Mário
          <string-name>
            <given-names>J.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <surname>Alberto H. F. Laender</surname>
          </string-name>
          , Ricardo A. Baeza-Yates, Deborah L.
          <string-name>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <surname>Bjørn Olstad</surname>
          </string-name>
          , Øystein Haug Olsen, and
          <string-name>
            <surname>André O. Falcão</surname>
          </string-name>
          , editors,
          <source>Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management</source>
          ,
          <string-name>
            <surname>CIKM</surname>
          </string-name>
          <year>2007</year>
          , Lisbon, Portugal, November 6-
          <issue>10</issue>
          ,
          <year>2007</year>
          , pages
          <fpage>127</fpage>
          -
          <lpage>136</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Esuli</surname>
          </string-name>
          and
          <string-name>
            <given-names>Fabrizio</given-names>
            <surname>Sebastiani</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Active learning strategies for multi-label text classification</article-title>
          . In Mohand Boughanem, Catherine Berrut, Josiane Mothe, and Chantal Soulé-Dupuy, editors,
          <source>Advances in Information Retrieval, 31st European Conference on IR Research (ECIR 2009)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3b">
        <mixed-citation>
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Joachims</surname>
          </string-name>
          , Thomas Finley, and
          <string-name>
            <given-names>Chun-Nam John</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Cutting-plane training of structural SVMs</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>77</volume>
          (
          <issue>1</issue>
          ):
          <fpage>27</fpage>
          -
          <lpage>59</lpage>
          , October.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>David D.</given-names>
            <surname>Lewis</surname>
          </string-name>
          and
          <string-name>
            <given-names>William A.</given-names>
            <surname>Gale</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>A sequential algorithm for training text classifiers</article-title>
          .
          <source>In Proc.International ACM SIGIR conference on Research and development in information retrieval (SIGIR)</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          , New York, NY, USA. Springer-Verlag New York, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Bernardo</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anne-Lyse</given-names>
            <surname>Minard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mohammed R. H.</given-names>
            <surname>Qwaider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Speranza</surname>
          </string-name>
          .
          <year>2016</year>
          . TEXTPRO-AL:
          <article-title>An Active Learning Platform for Flexible and Efficient Production of Training Data for NLP Tasks</article-title>
          .
          <source>In Proceedings of COLING</source>
          <year>2016</year>
          ,
          <article-title>the 26th International Conference on Computational Linguistics: System Demonstrations</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Eric</given-names>
            <surname>Ringger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>McClanahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Robbie</given-names>
            <surname>Haertel</surname>
          </string-name>
          , George Busby, Marc Carmen, James Carroll, Kevin Seppi, and
          <string-name>
            <given-names>Deryle</given-names>
            <surname>Lonsdale</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Active learning for part-of-speech tagging: Accelerating corpus annotation</article-title>
          .
          <source>In Proceedings of the Linguistic Annotation Workshop</source>
          , LAW '
          <volume>07</volume>
          , pages
          <fpage>101</fpage>
          -
          <lpage>108</lpage>
          , Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Greg</given-names>
            <surname>Schohn</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Cohn</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Less is more: Active learning with support vector machines</article-title>
          .
          <source>In Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00</source>
          , pages
          <fpage>839</fpage>
          -
          <lpage>846</lpage>
          , San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Burr</given-names>
            <surname>Settles</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Craven</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>An analysis of active learning strategies for sequence labeling tasks</article-title>
          .
          <source>In 2008 Conference on Empirical Methods in Natural Language Processing, EMNLP 2008, Proceedings of the Conference</source>
          , 25-27 October 2008, Honolulu, Hawaii, USA,
          <article-title>A meeting of SIGDAT, a Special Interest Group of the ACL</article-title>
          , pages
          <fpage>1070</fpage>
          -
          <lpage>1079</lpage>
          . ACL.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Burr</given-names>
            <surname>Settles</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Active learning literature survey</article-title>
          .
          <source>Technical report.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Dan</given-names>
            <surname>Shen</surname>
          </string-name>
          , Jie Zhang, Jian Su,
          <string-name>
            <surname>Guodong Zhou</surname>
          </string-name>
          , and
          <string-name>
            <surname>Chew-Lim Tan</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Multi-criteria-based active learning for named entity recognition</article-title>
          .
          <source>In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics</source>
          , ACL '04, Stroudsburg, PA, USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Simon</given-names>
            <surname>Tong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daphne</given-names>
            <surname>Koller</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Support vector machine active learning with applications to text classification</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>2</volume>
          :
          <fpage>45</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>March</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>