<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Active learning experiments for the classification of smoking tweets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aki Härmä</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Polyakov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ekaterina Chernyak</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Philips Research</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Philips Research</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In automated health services based on text and voice interfaces, there is a need to understand what the user is talking about and what the user's attitude towards the subject is. Typical machine learning methods for text analysis require a lot of annotated training data. This is often a problem in addressing specific and possibly very personal health care needs. In this paper we propose a new active learning algorithm, Query by Embedded Committee (QBEC), for the training of a text classifier for a conversational therapy application in the area of health behavior change. The method is particularly suitable for text classification in a dynamic environment and performs well with realistic test data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The application context of the current paper is the development of automated
therapeutic conversational interventions for behavior change [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in particular, related to
substance abuse. Counseling is known to be the most effective intervention for many
lifestyle diseases, but counseling sessions are expensive for the health care system and
often inconvenient for patients. Automating the effective mechanisms of
counseling with automated agents would lead to better coverage and cost savings. In a typical
application, a conversational agent would implement some elements of Cognitive
Behavioral Therapy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Typically, the agent would be available through a social media
platform possibly with a speech interface. The text understanding system should be able
to detect the topics and sentiment structures relevant for the control of the conversation
according to the selected therapeutic strategy.
      </p>
      <p>Recurrent neural networks are popular for text understanding, but they require a
large corpus of labeled training data, which is difficult to collect. In addition,
natural language communication is an example of a non-stationary learning environment,
where the evolution of the conversational culture over time and across populations requires
local customization and maintenance of the classifier, possibly even at the level of an
individual customer.</p>
      <p>
        One approach for the maintenance and continuous improvement of a classifier in the
production environment is to use active learning (AL) methods [
        <xref ref-type="bibr" rid="ref15 ref3">3, 15</xref>
        ]. In pool-based
AL methods only a small part of the available content is manually labeled and used to
train the classifier. A typical approach is to use a committee of classifiers [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to select
items that are difficult to classify based on the current statistics. This approach works
well in many conventional problems but often leads to robustness problems that are
common in many deep learning architectures [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In this paper we demonstrate an application of active learning to the classification
of short text messages (tweets) from a social media platform using a text classifier based
on recurrent neural networks (RNNs) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We propose an algorithm for the pool-based
selection where the committee method is applied in a space spanned by the class
likelihoods of the current classifier. In this paper the method is called Query by Embedded
Committee, QBEC. The method has interesting properties and it is computationally
significantly lighter than the conventional Query-by-Committee, QBC, method.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Sample selection methods</title>
      <p>
        In pool-based active learning [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], new samples are selected for the training data from
a large pool of unlabeled content. The selection may be based on different principles:
it may aim at selecting the most informative or representative samples [
        <xref ref-type="bibr" rid="ref16 ref9">9, 16</xref>
        ], at reducing the
variance of the classification errors [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], or at reducing the diameter of a space spanned by alternative
classifiers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
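      <p>The AL process starts with an initial set P<sub>0</sub> of K labeled tuples, each consisting of a feature vector
x<sub>k</sub> and the corresponding label l<sub>k</sub>, i.e.,</p>
      <p>
        <disp-formula id="eq1"><label>(1)</label><tex-math>P_0 = \{x_k, l_k\}, \quad k = 0, \dots, K-1</tex-math></disp-formula>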
The initial classification model M<sub>0</sub> is trained using P<sub>0</sub>. Next, a new set S<sub>1</sub> is selected
from the pool. The samples are manually labeled by a human oracle, for example, a
health counselor. The new training data P<sub>1</sub> is produced by adding the samples S<sub>1</sub> to P<sub>0</sub>.
The model is then updated and deployed. The same update cycle can be repeated
continuously.</p>
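      <p>The update cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the one-dimensional nearest-centroid model, the margin-based selection rule, and the names train, margin, active_learning, and oracle are hypothetical stand-ins for the classifier, the selection method, and the human annotator.</p>
      <preformat>
```python
import random


def train(labeled):
    """Toy model: one centroid per class for 1-D features (stand-in for M_j)."""
    groups = {}
    for x, label in labeled:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}


def margin(model, x):
    """Gap between the two smallest centroid distances; small means uncertain."""
    dists = sorted(abs(x - c) for c in model.values())
    return dists[1] - dists[0]


def active_learning(p0, pool, oracle, batch_size, iterations):
    labeled = list(p0)          # P_0
    pool = list(pool)           # unlabeled pool
    model = train(labeled)      # M_0
    for _ in range(iterations):
        # Select S_{j+1}: the batch_size pool items with the smallest margin.
        pool.sort(key=lambda x: margin(model, x))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # The oracle labels the batch; P_{j+1} is P_j plus the new samples.
        labeled.extend((x, oracle(x)) for x in batch)
        model = train(labeled)  # updated model M_{j+1}
    return model, labeled, pool


rng = random.Random(0)
oracle = lambda x: int(x > 0.5)   # hypothetical human annotator
p0 = [(0.1, 0), (0.9, 1)]
pool = [rng.random() for _ in range(200)]
model, labeled, remaining = active_learning(
    p0, pool, oracle, batch_size=5, iterations=4)
print(len(labeled), len(remaining))   # 22 180
```
      </preformat>
      <p>Because selected items are removed from the pool before labeling, each new batch in this sketch is disjoint from the existing training data by construction.</p>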
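      <p>The selection of the next batch S<sub>j+1</sub> of B samples can be based on many different
criteria. The minimum requirements for the jth iteration are
        <list list-type="order">
          <list-item><p>novelty: <inline-formula><tex-math>P_j \cap S_{j+1} = \emptyset</tex-math></inline-formula></p></list-item>
          <list-item><p>richness: <inline-formula><tex-math>x_n \neq x_m \quad \forall\, n, m \in S_{j+1},\ n \neq m</tex-math></inline-formula></p></list-item>
        </list>
i.e., the B new samples in S<sub>j+1</sub> should be novel and they should be different from each
other.</p>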
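      <p>The two minimum requirements can be checked mechanically. The following sketch assumes that feature vectors are represented as hashable tuples and compared by exact equality; the function name check_batch is hypothetical and only serves the illustration.</p>
      <preformat>
```python
def check_batch(labeled_features, batch):
    """Return (novel, rich) for a candidate batch S_{j+1} against P_j."""
    seen = set(labeled_features)
    novel = all(x not in seen for x in batch)   # empty intersection with P_j
    rich = len(set(batch)) == len(batch)        # all batch members are distinct
    return novel, rich


labeled = [(0.0, 1.0), (0.5, 0.5)]
print(check_batch(labeled, [(0.2, 0.3), (0.9, 0.1)]))   # (True, True)
print(check_batch(labeled, [(0.5, 0.5), (0.9, 0.1)]))   # (False, True)
print(check_batch(labeled, [(0.2, 0.3), (0.2, 0.3)]))   # (True, False)
```
      </preformat>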
      <sec id="sec-2-1">
        <title>Query by Committee</title>
        <p>
          In the popular Query-by-Committee (QBC) method [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the novelty condition is
addressed by measuring the disagreement in a committee of R different classifiers C<sub>r</sub>
trained using P<sub>j</sub>.
        </p>
        <p>
          <disp-formula id="eq2"><label>(2)</label><tex-math>d_k = D\left[C_0(x_k), C_1(x_k), \dots, C_R(x_k)\right]</tex-math></disp-formula>
where D[·] is some measure of the disagreement.</p>
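        <p>In a typical case, the disagreement is based on the vectors of class likelihoods given by
the classifiers, p<sub>r</sub>(x<sub>k</sub>) = C<sub>r</sub>(x<sub>k</sub>). In a committee of two classifiers, the disagreement
can be defined as a norm of the difference, <inline-formula><tex-math>d_k = \left| p_0(x_k) - p_1(x_k) \right|</tex-math></inline-formula>.</p>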
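        <p>As a small numerical illustration of the disagreement measure, the sketch below computes the norm of the difference for a committee of two and, for a larger committee, takes the largest pairwise difference. The choice of the Euclidean norm and of the maximum over pairs as D[·] are assumptions made for this example; the text leaves the measure open.</p>
        <preformat>
```python
import math
from itertools import combinations


def pair_disagreement(p0, p1):
    """Euclidean norm of the difference of two class-likelihood vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p0, p1)))


def committee_disagreement(likelihoods):
    """One possible D[.]: the largest pairwise disagreement in the committee."""
    return max(pair_disagreement(p, q) for p, q in combinations(likelihoods, 2))


p0 = [0.9, 0.1]
p1 = [0.6, 0.4]
p2 = [0.8, 0.2]
print(round(pair_disagreement(p0, p1), 4))             # 0.4243
print(round(committee_disagreement([p0, p1, p2]), 4))  # 0.4243
```
        </preformat>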
        <sec id="sec-2-1-1">
          <title>Algorithm 1 Query by Committee</title>
          <p>
            The committee often disagrees on very similar samples, and therefore the basic
algorithm does not provide the required richness for the new sample collection. A
Pareto-optimal solution is needed to meet both the novelty and richness conditions. The
richness is related to the nearest neighbor (NN) problem. The k-nearest-neighbor (kNN) search
problem is to find the k nearest points to a query point <inline-formula><tex-math>q \in \mathbb{R}^d</tex-math></inline-formula> in a dataset <inline-formula><tex-math>X \subset \mathbb{R}^d</tex-math></inline-formula> containing n points,
under some norm. There are several effective methods for
this problem when the dimension d is small (e.g., 1, 2, or 3), such as Voronoi diagrams [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]
or Delaunay triangulation [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. When the dimension is moderate (e.g., up to the tens),
it is possible to use k-d trees [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] and metric trees [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. If the dimension is high,
Locality-Sensitive Hashing (LSH) is a very popular method in applications. In the
current paper we use an iterative algorithm in which new samples that are close to
already selected samples are penalized.
          </p>
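          <p>The idea of penalizing candidates that lie close to already selected samples can be sketched as a greedy loop. The exact penalty used in the paper is not specified, so the exponential distance penalty, its parameters penalty and scale, and the function name select_diverse are assumptions introduced only for this illustration.</p>
          <preformat>
```python
import math


def select_diverse(candidates, scores, batch_size, penalty=1.0, scale=0.1):
    """Greedy batch selection: prefer high scores, penalize near earlier picks."""
    adjusted = list(scores)
    selected = []
    for _ in range(min(batch_size, len(candidates))):
        best = max((i for i in range(len(candidates)) if i not in selected),
                   key=lambda i: adjusted[i])
        selected.append(best)
        # Lower the score of every candidate according to its distance
        # from the sample that was just selected.
        for i, x in enumerate(candidates):
            dist = math.dist(x, candidates[best])
            adjusted[i] -= penalty * math.exp(-dist / scale)
    return selected


points = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0), (0.02, 0.01)]
scores = [0.9, 0.89, 0.5, 0.88]   # e.g. committee disagreement per sample
print(select_diverse(points, scores, batch_size=2))   # [0, 2]
```
          </preformat>
          <p>In this example a plain ranking by score would pick the two nearly identical points 0 and 1, whereas the penalized selection picks the distant point 2 as the second sample.</p>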
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Query by Embedded Committee</title>
        <p>
          The selection of the new samples based on a disagreement of a committee assumes
a certain variability among the committee members [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. This is typically achieved
by using different initializations of the classifiers C<sub>r</sub>, or by using different classifier
prototypes or kernels. In the case of a complex model, for example, one based on multiple
layers of memory networks and dense layers, the training of a committee can be a
large effort and may take, for example, several hours of processing time on a GPU.
In principle, the training of each committee member takes as much time as the
training of the main model itself. However, the final scoring of the network is light and
can be performed, for example, on a smartphone or other end-user device.
        </p>
        <p>In this paper the proposed method is to use the committee in another feature space
derived from the outputs of the model. A multi-class classifier is often developed using
the one-hot encoding principle, where the classifier produces a vector of class
likelihoods p<sub>r</sub>(x<sub>k</sub>) = M(x<sub>k</sub>) for a feature vector x<sub>k</sub>. The likelihoods represent the class
predictions for the testing data. In a geometric sense, the likelihood vectors p<sub>r</sub>(x<sub>k</sub>) span
an orthonormal space, the class space of the current classifier, where each axis represents
a class.</p>
        <p>The proposed method is a variation of the QBC method where the selection task is
performed in the class space of the current classifier. The class space is a metric
low-dimensional space, and there the committee can be based on conventional classification
tools, e.g., a random forest or another relatively light algorithm. The training
of the committee of classifiers and their testing on new data can be performed on
an end-user device. Therefore, this enables local active learning of the classification
model.</p>
        <p>The processing steps of the proposed method are described in Algorithm 2 below.</p>
        <sec id="sec-2-2-1">
          <title>Algorithm 2 Query by Embedded Committee (QBEC)</title>
          <p><preformat>repeat
    Use classifier M_j to get the class likelihood vectors p_r(x_k) for all x_k in P_j
    Use the QBC method defined in the class space to select the new samples for labeling
    Add the new samples to the training data to obtain P_{j+1}; set j = j + 1
until stopping criteria are met</preformat></p>
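          <p>The selection step of Algorithm 2 can be illustrated with a small, self-contained sketch. It is not the implementation used in the paper: the committee here consists of nearest-centroid classifiers trained on bootstrap resamples of the class-space vectors (the paper mentions a random forest as one option), disagreement is measured as the fraction of votes that differ from the majority vote, and fake_likelihood is a hypothetical stand-in for the output of the main model M_j.</p>
          <preformat>
```python
import math
import random
from collections import Counter


def fit_centroids(vectors, labels):
    """Light committee member: one centroid per class in the class space."""
    groups = {}
    for v, y in zip(vectors, labels):
        groups.setdefault(y, []).append(v)
    return {y: [sum(col) / len(vs) for col in zip(*vs)]
            for y, vs in groups.items()}


def predict(centroids, v):
    return min(centroids, key=lambda y: math.dist(v, centroids[y]))


def qbec_select(train_probs, train_labels, pool_probs, batch_size,
                committee_size=5, seed=0):
    """Rank pool items by committee disagreement measured in the class space."""
    rng = random.Random(seed)
    n = len(train_probs)
    committee = []
    for _ in range(committee_size):
        # Bootstrap resampling gives the members their variability.
        idx = [rng.randrange(n) for _ in range(n)]
        committee.append(fit_centroids([train_probs[i] for i in idx],
                                       [train_labels[i] for i in idx]))

    def disagreement(v):
        votes = Counter(predict(member, v) for member in committee)
        return 1.0 - votes.most_common(1)[0][1] / committee_size

    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: disagreement(pool_probs[i]), reverse=True)
    return ranked[:batch_size]


rng = random.Random(1)


def fake_likelihood(label):
    """Hypothetical stand-in for M_j(x_k): a noisy 3-class likelihood vector."""
    raw = [rng.random() + (2.0 if c == label else 0.0) for c in range(3)]
    total = sum(raw)
    return [r / total for r in raw]


train_labels = [k % 3 for k in range(30)]
train_probs = [fake_likelihood(y) for y in train_labels]
pool_probs = [fake_likelihood(rng.randrange(3)) for _ in range(50)]
picked = qbec_select(train_probs, train_labels, pool_probs, batch_size=5)
print(picked)   # indices of the pool items proposed for manual labeling
```
          </preformat>
          <p>Only the light committee is retrained in each iteration of this sketch; the expensive main model is used solely to produce the likelihood vectors, which is the property that makes the approach attractive for end-user devices.</p>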
          <p>In this paper we call the modified method Query by Embedded Committee (QBEC)
to distinguish it from the conventional QBC. In the current paper the committee is
embedded in the class space. Naturally, the same can also be performed in another output
space, for example, one corresponding to intermediate layers of the network.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>Let us start with a synthetic example to illustrate the differences between QBC and the
proposed QBEC method, and their benefits over random sampling from the pool.</p>
      <sec id="sec-3-1">
        <title>Synthetic example</title>
        <p>The original synthetic data is shown in Fig. 4(a), with two classes illustrated by red
crosses and blue circles. A random forest (RF) classifier was trained on the set P<sub>0</sub>
with 100 labeled samples. In the QBC method, a committee of two RF classifiers was
trained using different initializations. New samples were selected by choosing
the samples with the largest difference in the class likelihood values between
the classifiers. Fig. 4(b) shows an example of QBC sampling. The random sampling used
in the reference condition is illustrated in Fig. 4(c). The QBC method clearly takes more
samples from the class borders than the random sampling method. The QBEC method
also focuses on the class borders but puts more emphasis on the borders between classes
than on the outer borders.</p>
        <p>The accuracy during training for the three methods is shown in Figs. 2(a) and 2(b).
QBC and QBEC have a similar performance in the first batches, but the accuracy of
the QBEC method keeps improving beyond the point where the performance of QBC saturates.
In this case, this may be understood by comparing the selections of the two methods
in Fig. 4(b). In QBEC the sampling focuses on the borders between the classes, while
in the conventional QBC solution a large number of samples are selected from the
outskirts of the feature space, which is less relevant for the class confusions measured
by the accuracy.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiment with tweets</title>
      <p>In this paper the content is from Twitter, which is a popular short-text messaging
platform. The content was selected by keywords that relate to smoking and tobacco use.
In the typical flow of content the test system gave approximately 1000 tweets per day
when excluding repetitions (re-tweets) of the same message.</p>
      <p>
        In the current paper the content is manually classified into three classes: sustain
talk, change talk, and neutral communication. The first two classes are considered
important elements in many therapeutic techniques for substance abuse, such as CBT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
or Motivational Interviewing (MI) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The target behavior is to reduce or quit
smoking. Sustain talk and change talk contain all client talk that speaks against or for the target
behavior, respectively. The neutral class contains all other content with the same keywords.
The data contains all English-language messages from Oct. 2017 until the end of
Jan. 2018.<sup>3</sup> There are cultural elements in the data. For example, the tweets from
October contain messages that relate to the Stoptober smoking cessation campaign in the
UK and other countries, in December there are tweets from people who plan to quit in
January, and there are several references to a popular song called Cigarette Daydreams.
      </p>
      <sec id="sec-4-1">
        <title>Text classification system</title>
        <p>In this paper we use a typical architecture for a text classifier based on state-of-the-art
deep learning RNN tools. The text classifier model has six components, presented in
Table 1.</p>
        <p>The embedding layer maps each word of the input text into a low-dimensional
embedding vector, while the bidirectional layers extract higher-level features from
the input, with dropout used for regularization. The hard attention layer is used for
global re-weighting of the hidden layers, and the class label is chosen using a regular
dense layer with softmax activation.<fn id="fn3"><label>3</label><p>Ethical and legal approval of the data collection was granted by the Internal Committee for Biomedical Experiments (ICBE) of Philips, and the data were handled accordingly.</p></fn></p>
        <p>The initial classifier C(0) was trained using a manually classified set of 2398 tweets.
Examples of typical tweet types and their counts in the initial training set are shown
in Table 2. In addition, an independent test data set with manually labeled tweets was
used for testing. The performance of C(0) on the independent test set is poor; the
accuracy is barely above 0.5. In the following experiment the active learning process
was executed sequentially, so that the current dump of tweets about the target topic was
downloaded once a day and classified using the classifier C(n). Approximately one percent,
typically around 30 tweets, were selected for manual labeling using one of the
selection methods. The samples were manually labeled and included in the training set, and
subsequently used in the training of the next model C(n+1).</p>
        <p>The numbers of new labeled samples resulting from the 23 daily iterations in the three
methods are illustrated in Figs. 4a-c. First, it seems that random selection rarely picks
samples from the change talk category, while those are much more common in the two
other methods. The accuracies of the three methods are shown in Fig. 4d. With random
selection the accuracy does not clearly improve over the iterations, but in the two other
methods there is a clear improving trend. Unlike the results with the checkerboard
data, there is no real difference in accuracy between QBC and QBEC, although
the computational requirements and processing time of QBEC are significantly
lower than those of QBC.</p>
        <p>In this paper we propose a new algorithm for active learning for text
classification in health counseling applications. The training of a classifier for a
specific complex talk type requires a large labeled database, which is typically difficult,
expensive, and time consuming to collect. There may also be continuous concept drift in the target
area, for example, due to various cultural influences.</p>
        <p>A popular approach for active learning is to use the disagreement in a committee of
classifiers to select samples for manual labeling and inclusion in the training data. These
methods are commonly called Query-by-Committee (QBC) methods. The QBC
methods require that multiple classifiers are trained for the task. In applications where
the classifier is complex, e.g., a deep neural network model, and requires a long training time,
this may be problematic. In the method introduced in the current paper, the committee
selection is performed in a low-dimensional space spanned by the likelihoods of the
current classifier model. In this case the actual classifiers of the committee can be fairly
simple. The method is called Query-by-Embedded-Committee (QBEC).</p>
        <p>We demonstrate the performance of QBEC first using synthetic data. The
performance of QBEC turns out to be superior to the random selection of training samples
and, surprisingly, it exceeds the performance of QBC. One may speculate that this is
because the embedding based on the prediction likelihoods inherently makes the
committee zoom into areas where the disagreement is largest.</p>
        <p>In a second experiment we trained a complex classifier for the classification of tweets
related to smoking behavior into three classes. The classes represent change talk,
sustain talk, and neutral communication of the writer in relation to tobacco use. This is a
very challenging classification problem requiring a large labeled database. In the
active learning experiment, 1% of the tweet content downloaded each day was manually
labeled and included in the training of the new model. It was shown that QBC and QBEC outperform
random selection of samples, while the results of the two methods are similar to each other.
It should be noted, however, that the computational cost of QBEC is significantly lower than that of
QBC. Therefore, the sample selection in QBEC could be performed even on a customer
device such as a smartphone.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Bickmore</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giorgino</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Health dialog systems for patients and consumers</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>39</volume>
          (
          <issue>5</issue>
          ),
          <fpage>556</fpage>
          -
          <lpage>571</lpage>
          (
          <year>Oct 2006</year>
          ). https://doi.org/10.1016/j.jbi.2005.12.004
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Cohn</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghahramani</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Active Learning with Statistical Models</article-title>
          .
          <source>J. Artif. Int. Res</source>
          .
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>129</fpage>
          -
          <lpage>145</lpage>
          (
          <year>Mar 1996</year>
          ), http://dl.acm.org/citation.cfm?id=1622737.1622744
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Dasgupta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Coarse Sample Complexity Bounds for Active Learning</article-title>
          . pp.
          <fpage>235</fpage>
          -
          <lpage>242</lpage>
          . NIPS'05, MIT Press, Cambridge, MA, USA (
          <year>2005</year>
          ), http://dl.acm.org/citation.cfm?id=2976248.2976278
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Delaunay</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Sur la sphère vide. À la mémoire de Georges Voronoï</article-title>
          .
          <source>Bulletin de l'Académie des Sciences de l'URSS</source>
          , no.
          <issue>6</issue>
          ,
          <fpage>793</fpage>
          -
          <lpage>800</lpage>
          (
          <year>1934</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Fawzi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moosavi-Dezfooli</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frossard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Robustness of Deep Networks: A Geometrical Perspective</article-title>
          .
          <source>IEEE Signal Processing Magazine</source>
          <volume>34</volume>
          (
          <issue>6</issue>
          ),
          <fpage>50</fpage>
          -
          <lpage>62</lpage>
          (
          <year>Nov 2017</year>
          ). https://doi.org/10.1109/MSP.2017.2740965
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          )
          ,
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>S.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asnaani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vonk</surname>
            ,
            <given-names>I.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sawyer</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Efficacy of Cognitive Behavioral Therapy: A Review of Meta-analyses</article-title>
          .
          <source>Cognitive Therapy and Research</source>
          <volume>36</volume>
          (
          <issue>5</issue>
          )
          ,
          <fpage>427</fpage>
          -
          <lpage>440</lpage>
          (
          <year>Oct 2012</year>
          ). https://doi.org/10.1007/s10608-012-9476-1, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3584580/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Hoi</surname>
            ,
            <given-names>S.C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          :
          <article-title>Batch Mode Active Learning with Applications to Text Categorization and Image Retrieval</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>21</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1233</fpage>
          -
          <lpage>1248</lpage>
          (
          <year>Sep 2009</year>
          ). https://doi.org/10.1109/TKDE.2009.60
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>An investigation of practical approximate nearest neighbor algorithms</article-title>
          . In L. K. Saul,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Weiss</surname>
          </string-name>
          , and L. Bottou, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>17</volume>
          , pp.
          <fpage>825</fpage>
          -
          <lpage>832</lpage>
          . MIT Press, Cambridge, MA (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nigam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Employing EM and Pool-Based Active Learning for Text Classification</article-title>
          . pp.
          <fpage>350</fpage>
          -
          <lpage>358</lpage>
          . ICML '98, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (
          <year>1998</year>
          ), http://dl.acm.org/citation.cfm?id=645527.757765
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rose</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          :
          <article-title>Toward a Theory of Motivational Interviewing</article-title>
          .
          <source>The American Psychologist</source>
          <volume>64</volume>
          (
          <issue>6</issue>
          )
          ,
          <fpage>527</fpage>
          -
          <lpage>537</lpage>
          (
          <year>Sep 2009</year>
          ). https://doi.org/10.1037/a0016830, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2759607/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          In:
          <source>Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Tong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koller</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Support Vector Machine Active Learning with Applications to Text Classification</article-title>
          .
          <source>J. Mach. Learn. Res</source>
          .
          <volume>2</volume>
          ,
          <fpage>45</fpage>
          -
          <lpage>66</lpage>
          (
          <year>Mar 2002</year>
          ). https://doi.org/10.1162/153244302760185243
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yuan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Active learning via query synthesis and nearest neighbour search</article-title>
          .
          <source>Neurocomputing</source>
          <volume>147</volume>
          (Supplement C),
          <fpage>426</fpage>
          -
          <lpage>434</lpage>
          (
          <year>Jan 2015</year>
          ). https://doi.org/10.1016/j.neucom.2014.06.042, http://www.sciencedirect.com/science/article/pii/S0925231214008145
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Ying</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Point cluster analysis using a 3D Voronoi diagram with applications in point cloud segmentation</article-title>
          .
          <source>ISPRS International Journal of Geo-Information</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1480</fpage>
          -
          <lpage>1499</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>