<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an Evaluation Framework for Topic Extraction Systems for Online Reputation Management</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Enrique Amigó</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damiano Spina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardino Beotas</string-name>
          <email>b.beotas@almatech.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Gonzalo</string-name>
          <email>juliog@lsi.uned.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Departamento de Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia, C/ Juan del Rosal</institution>,
          <addr-line>16 28020 Madrid, España</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Grupo ALMA, C/ Valentín Beato</institution>,
          <addr-line>23 28037 Madrid, España</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work presents a novel evaluation framework for topic extraction over user-generated content. The motivation of this work is the development of systems that monitor the evolution of opinionated topics around a certain entity (a person, company or product) on the Web. Currently, due to the effort that would be required to develop a gold standard, topic extraction systems are evaluated qualitatively over case studies or by means of intrinsic evaluation metrics that cannot be applied across heterogeneous systems. We propose evaluation metrics based on available document metadata (link structure and time stamps) which do not require manual annotation of the test corpus. Our preliminary experiments show that these metrics are sensitive to the number of iterations in LDA-based topic extraction algorithms, which is an indication of the consistency of the metrics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by Alma Technologies and the Spanish Government (projects Webopinion and Text-Mess/Ines).</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>
        The growing interest in monitoring opinions on the Web 2.0 is well known.
Online Reputation Management consists of monitoring the opinion of Web users on
people, companies or products, and it is already a fundamental tool in
corporate communication. A particularly relevant problem is to detect new topics or
opinion trends which deserve the attention of communication experts, such as a
burst of tweets or blog entries about a controversial issue involving a company, or
a defect in a product. A system that assists a communication expert should be
able to detect (particularly new) topics, tag them in an interpretable way, cluster
documents related to each topic, and analyze the evolution of topics over time.
What makes this a distinctive problem is the fact that documents are naturally
multi-topic: a document's relevance to a topic may even be sub-sentential. This
problem is sometimes referred to as Temporal Text Mining [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ].
      </p>
      <p>Models and systems to solve these tasks have recently started to appear in
scholarly publications. A major bottleneck so far, however, is the absence of a
benchmarking test suite to evaluate and compare systems. Creating such a gold standard
is, in fact, a complex task: defining the set of topics in a document stream is a
subtle task, because topics tend to co-occur in documents and the appropriate
level of granularity in topic and sub-topic distinctions is fuzzy to fix.
For similar reasons it is also difficult, once the set of topics is established, to
decide which documents talk about each of the topics and how central each
document is to each of the topics it discusses. In the absence of a
gold standard, extrinsic precision/recall-based metrics cannot be applied.</p>
      <p>For this reason, current systems are evaluated informally via use cases, or
else using intrinsic evaluation measures which are specific to the model
being tested.</p>
      <p>There are, however, basic restrictions on how a good system should behave.
For instance, documents which share outlinks to the same web pages should tend
to be more related than documents which do not share outlinks. This type of
information has not yet been used by current topic detection systems, because
it relates only a small subset of the documents to each other. This information,
however, might be used as a (limited) evaluation or validation mechanism to
optimize system parameters. In this paper we address the task of defining an
evaluation methodology based on this idea, and check its suitability on an
LDA-based approach to topic detection over time.</p>
    </sec>
    <sec id="sec-3">
      <title>State of the art</title>
      <p>We will start with an overview of models to solve the task, and then we will
summarize the evaluation methodologies used so far and discuss their limitations.</p>
      <sec id="sec-3-1">
        <title>Topic detection models</title>
        <p>
          The most basic approaches to topic monitoring focus on word frequency. The
assumption is that frequent words indicate, in general, salient topics in a document
collection. Some available web services are Blogpulse Trends3, Mood Views4
and Blogscope5. Brooks and Montanez showed that frequent words (extracted
according to tf.idf) produce tags that generate document clusters with more
cohesion than user tags in blogs [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
          Gruhl included the temporal dimension in his model by extracting topic
terms with frequency peaks over time [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Chi also considered the distribution of
terms across blogs [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. He assumed that topics gain prominence in blog subsets.
His model consists of computing the singular values of the time-blog frequency
matrix. Mei et al. combine topological information with the temporal dimension
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Their model employs the EM algorithm to identify the topic distributions
along time and location that maximize the likelihood of word occurrences. An
interesting feature of this model is that it assumes that several topics can appear
in the same document.
        </p>
        <p>
          Many novel proposals are currently based on the LDA (Latent Dirichlet
Allocation) model [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Like Mei's approach, LDA is a probabilistic model that
estimates the distribution θd of topics for each document d and the distribution
of words for each topic. The particularity of LDA is that the distribution
parameters are generated by a Dirichlet distribution with certain hyperparameters
that are stated a priori.
        </p>
        <p>
          One example of these models is TOT (Topic Over Time) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The most
characteristic aspect of this work is that the temporal variable is added to the LDA
model, assuming that topics follow a Beta distribution over time. One drawback
of this work is that the whole document collection must be processed to infer
the temporal distribution when new documents appear in the input stream. The
Dynamic Topic Model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] tries to solve this by estimating topic
distributions for each time slot independently. After this, the model employs temporal
series techniques in order to analyze topic evolution. Another model that tackles
this issue is On-line LDA [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This model states that the knowledge produced
over a time slot represents the a priori knowledge for the next time slot. This
idea makes it possible to process new documents without reprocessing the whole
collection. However, an additional mechanism to detect new topics over time
becomes necessary. Another interesting model based on LDA is called Multiscale
Topic Tomography [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. In this approach the topic distribution includes different
granularity levels.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Evaluation approaches</title>
        <p>The main bottleneck in this area of research is the absence of a common
evaluation methodology to compare approaches. Let us summarize the approaches to
evaluation in the research described above.</p>
        <p>In terms of efficiency and suitability to assist experts in the online
reputation management task, some approaches are better suited than others. For
instance, On-line LDA and the Dynamic Topic Model are able to process
new documents without re-processing the collection. Multiscale Topic
Tomography, on the other hand, allows topic visualization at different granularity
levels. However, it is still necessary to define an evaluation framework to
compare approaches in terms of accuracy.
3 www.blogpulse.com/trends
4 ilps.science.uva.nl/MoodViews
5 www.blogscope.net</p>
        <p>
          Some approaches are simply evaluated over case studies. This is the case of
Mei's approach [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and the Dynamic topic model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Gruhl's model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], on the
other hand, is evaluated against human-annotated topic terms; an evaluation
method that cannot, for instance, be applied to LDA-based models.
        </p>
        <p>
          The model Topic over Time [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is evaluated with intrinsic clustering metrics
based on the KL-divergence between topics; but this methodology is only
appropriate for comparing similar systems. For instance, in their evaluation the
authors obtained evidence about the advantages of including the temporal
variable in the model. It is not possible, however, to evaluate heterogeneous systems
with intrinsic clustering metrics: systems based on KL-divergence, for instance,
would be rewarded by this evaluation method. Something similar happens with
the evaluation of Multiscale Topic Tomography [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], where the perplexity
of the model is compared against the perplexity obtained with other models.
In this case, LDA-based models could not be compared with models based on
traditional clustering algorithms. Other proposed evaluation metrics focus on
extrinsic tasks using topic descriptors, such as multi-document summarization
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>With these limitations in mind, our goal is to define and apply an
automatic evaluation framework enabling comparison between heterogeneous,
arbitrary systems, one which does not depend on cost-intensive manual annotation
of data.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation methodology</title>
      <sec id="sec-4-1">
        <title>System prerequisites</title>
        <p>We start from a few prerequisites for topic detection systems in opinion mining:</p>
        <p>- Aggregation: The system must detect a finite number of topics. Documents
will be associated with zero, one or more topics in a discrete or continuous way.
The key point is that related documents should share at least one topic.</p>
        <p>- Temporality: In order to analyze the evolution of the reputation of a given
entity, the system must reflect differences in topic distribution across time.
This implies showing the intensity of topics across time slots.</p>
        <p>- Interpretability: Identified topics should be tagged in a way that is
interpretable for the user.</p>
        <p>- Accessibility: For each topic, the corresponding documents must be ranked
according to their relevance in the context of the topic.</p>
        <p>In this work, we focus on the first two functionalities: "Aggregation" and
"Temporality". The interesting aspect of these two features is that it is possible
to automatically generate a benchmark for evaluation purposes.</p>
      </sec>
      <sec id="sec-4-2">
        <title>System Output variables</title>
        <p>The aggregation functionality requires inferring to what extent each document is
related to each topic. This output can be formalized as P(θ|d), which represents
the distribution of topics in each document d. For instance, a traditional
discrete clustering algorithm would return P(θi|d) = 1 if the document d belongs
to the cluster associated with the topic θi.</p>
        <p>Analogously, the temporality functionality requires an output variable P(θ|t)
representing the distribution of topics in each time slot t. From the perspective of
evaluation, a key aspect is that all functionalities must be mutually consistent.
In particular, the intensity of topics (temporality) has to correspond with the
number of associated documents in the time slot (aggregation). Therefore,
temporality can be inferred from the output P(θ|d). Assuming that the intensity of
a topic is proportional to the number of related documents in the time slot, we can
state that:</p>
        <p>P(θ|t) = Σd∈t P(θ|d)</p>
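        <p>As a sketch of how this aggregation could be computed in practice, the following Python fragment sums the per-document distributions P(θ|d) within each time slot. The function and variable names, and the final normalisation step, are our own illustrative choices, not part of the model:</p>
        <preformat>
```python
from collections import defaultdict

def topic_intensity(p_topic_given_doc, doc_slot):
    """Aggregate per-document distributions P(theta|d) into per-slot
    intensities, following P(theta|t) = sum over d in t of P(theta|d).
    p_topic_given_doc: dict doc_id -> list of topic probabilities.
    doc_slot: dict doc_id -> time slot label (e.g. a week number)."""
    totals = defaultdict(list)
    for doc, dist in p_topic_given_doc.items():
        slot = doc_slot[doc]
        if not totals[slot]:
            totals[slot] = [0.0] * len(dist)
        for i, p in enumerate(dist):
            totals[slot][i] += p
    # Normalise each slot so intensities form a distribution
    # (a presentation choice; the raw sums carry the same ranking).
    return {slot: [v / sum(vec) for v in vec]
            for slot, vec in totals.items()}
```
        </preformat>
        <p>For two documents in the same week with distributions (0.8, 0.2) and (0.4, 0.6), the intensity of the first topic in that week would be 0.6 after normalisation.</p>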
      </sec>
      <sec id="sec-4-3">
        <title>Evaluation measures</title>
        <p>Our evaluation methodology is based on two assumptions about the desired behavior
of systems:</p>
        <p>- Documents with outlinks that point to the same page, and documents
produced by the same author, will tend to be more topically related than the
average.</p>
        <p>- It is easier to find highly related documents in the same time slot (say, blog
posts in the same week) than separated by long time periods (such as several
months).</p>
        <p>As most current systems do not rely on this kind of information, it is possible
to use it at least for parameter optimization cycles. Some systems do
employ temporal information, and therefore the second restriction is not totally
system-independent. In such cases, however, the improvement obtained from the
use of temporal information can still be measured in terms of the first restriction.</p>
        <p>The first step in evaluating systems according to these two assumptions consists
of defining when the system considers that two documents are related (as to their
topics). This is not straightforward, given that systems generate a distribution
of weighted topics for each document. We will assume that one topic is enough
to consider two documents related, but only if both documents focus on
this topic. Accordingly, we define the Connectedness of a document pair
as:</p>
        <p>Connectedness(d1, d2) = maxi(min(P(θi|d1), P(θi|d2)))</p>
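        <p>The max-min combination above is straightforward to implement. A minimal sketch (the function name is ours), assuming each document is represented by its vector of topic probabilities:</p>
        <preformat>
```python
def connectedness(p_d1, p_d2):
    """Connectedness of two documents (Section 3.3): the strongest topic
    shared by both, scored by the weaker of the two memberships."""
    return max(min(a, b) for a, b in zip(p_d1, p_d2))

# Two documents concentrated on the same topic are well connected:
connectedness([0.9, 0.1], [0.7, 0.3])   # 0.7
# Documents concentrated on different topics are not:
connectedness([0.9, 0.1], [0.1, 0.9])   # 0.1
```
        </preformat>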
        <p>Our evaluation metrics will compare the connectedness of document pairs
in two sets according to the assumptions introduced above. We will call these
sets RDP (Related Document Pairs) and NRDP (Non-Related Document Pairs).
RDP consists of document pairs with, for instance, one or more common
outlinks, while NRDP consists of document pairs without common outlinks.
According to our assumptions, document pairs in RDP should have, on average, a
higher connectedness than document pairs in NRDP.</p>
        <p>In order to avoid dependencies on scale properties of the distribution P(θ|d)
associated with each system, we will formulate evaluation metrics in a non-parametric
way, estimating, for each system s:</p>
        <p>metric_value(s) = P(Connectedness(dr, dr′) &gt; Connectedness(dn, dn′))
where ⟨dr, dr′⟩ ∈ RDP and ⟨dn, dn′⟩ ∈ NRDP.</p>
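        <p>This probability can be estimated by counting, over combinations of one pair from RDP and one from NRDP, how often the related pair is the more connected one. A minimal sketch under that exhaustive-pairing assumption (names are ours):</p>
        <preformat>
```python
def connectedness(p1, p2):
    # Max-min topic overlap from Section 3.3.
    return max(min(a, b) for a, b in zip(p1, p2))

def metric_value(rdp, nrdp):
    """Estimate the probability that a related pair is more connected
    than a non-related one, over all RDP and NRDP combinations.
    rdp, nrdp: lists of (P(theta|d), P(theta|d')) vector pairs."""
    wins, total = 0, 0
    for r1, r2 in rdp:
        cr = connectedness(r1, r2)
        for n1, n2 in nrdp:
            total += 1
            if cr > connectedness(n1, n2):
                wins += 1
    return wins / total
```
        </preformat>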
        <p>In other words, the quality of the system is measured as the probability that
two documents from the RDP set have a higher topic overlap (according to the
system) than two documents from NRDP. Different criteria to form RDP and
NRDP lead to different evaluation metrics; we now discuss some examples.</p>
        <p>Outlink Aggregation: In order to obtain the set of related document pairs
(RDP), we assume that two documents are more likely to be related if they
share an outlink to the same web page, provided this outlink does not appear in other
documents (this restriction eliminates frequent outlinks which are not related to
the document content, such as links to Facebook).</p>
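        <p>The outlink criterion can be sketched as follows; the uniqueness filter keeps only links that occur in exactly the two documents of the pair. All names are illustrative:</p>
        <preformat>
```python
from collections import Counter
from itertools import combinations

def outlink_pairs(doc_outlinks):
    """Split document pairs into RDP / NRDP by the outlink criterion:
    related if they share an outlink found in no other document,
    non-related if they share no outlink at all.
    doc_outlinks: dict doc_id -> set of outlink URLs."""
    counts = Counter(url for links in doc_outlinks.values()
                     for url in links)
    rdp, nrdp = [], []
    for d1, d2 in combinations(sorted(doc_outlinks), 2):
        shared = doc_outlinks[d1].intersection(doc_outlinks[d2])
        if any(counts[u] == 2 for u in shared):
            rdp.append((d1, d2))
        elif not shared:
            nrdp.append((d1, d2))
    return rdp, nrdp
```
        </preformat>
        <p>Note that, under this sketch, pairs sharing only frequent outlinks (such as links to Facebook) fall into neither set.</p>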
        <p>Author Aggregation: As for documents related by a common author, we will
simply consider pairs of documents with the same author for RDP and pairs
from different authors for NRDP.</p>
        <p>Temporality: As for temporality, we will assume that it is easier to find documents
sharing a topic when both documents belong to the same time slot. In particular,
we will build RDP with the 100 most related document pairs (according to the
system output) which are created in the same week. NRDP is formed by the 100
most related document pairs which are created with a difference of at least three
months.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Test Case: Iterations in an LDA-based system</title>
      <p>
        To test our evaluation methodology, we have implemented the LDA approach,
starting with the algorithm described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and eliminating the temporal variable
component (which will be tested in future work). LDA is a generative process
where each document d is associated with a multinomial distribution of topics,
and uses Dirichlet distributions as priors with fixed hyperparameters. The model assumes that
each document token is associated with a single topic, and therefore the topic
distribution in a document is given by the individual token assignments.
The article by Wang and McCallum describes the approach in detail, as well as
the derivation of the Gibbs sampling.
      </p>
      <p>The implemented algorithm consists of the following steps:
1. Random initialization of each token to one of the k topics.
2. For each token in document d, the topic is updated by drawing on the
probability P(z) for each topic z. The probabilities are:</p>
      <p>P(z) = (md,z + α) · (nz,w + β) / Σv∈V (nz,v + β)</p>
      <p>where md,z represents the number of tokens in document d assigned to
topic z; nz,w represents the number of occurrences of the word w of the
corresponding token in topic z, and V is the vocabulary. β and α are two
hyperparameters that reflect, respectively, the topic dispersion per word and
per document.
3. md,z and nz,w are updated, and then we go back to step 2, for as many
iterations as desired.</p>
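      <p>The three steps above can be sketched as a collapsed Gibbs sampler. The fragment below is a minimal illustration with fixed hyperparameters and our own naming, not the exact implementation evaluated in this paper:</p>
      <preformat>
```python
import random

def gibbs_lda(docs, k, alpha=0.5, beta=0.1, iters=100, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (steps 1-3 above).
    docs: list of token lists; returns m, where m[d][z] counts the
    tokens of document d assigned to topic z."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})   # vocabulary size
    # Step 1: random initialization of every token's topic.
    z = [[rng.randrange(k) for _ in doc] for doc in docs]
    m = [[0] * k for _ in docs]   # m[d][z]: topic counts per document
    n = [{} for _ in range(k)]    # n[z][w]: occurrences of w in topic z
    nz = [0] * k                  # total tokens assigned to each topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            m[d][t] += 1
            n[t][w] = n[t].get(w, 0) + 1
            nz[t] += 1
    for _ in range(iters):
        # Step 2: resample each token's topic with probability
        # proportional to (m[d,z] + alpha)(n[z,w] + beta)/(nz + V beta).
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                m[d][t] -= 1          # step 3: keep counts updated
                n[t][w] -= 1
                nz[t] -= 1
                weights = [(m[d][t2] + alpha) * (n[t2].get(w, 0) + beta) /
                           (nz[t2] + V * beta) for t2 in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                m[d][t] += 1
                n[t][w] = n[t].get(w, 0) + 1
                nz[t] += 1
    return m
```
      </preformat>
      <p>The document-topic distribution P(θ|d) needed by the evaluation metrics can then be obtained by normalising each row of m (adding the α smoothing if desired).</p>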
      <p>Implementations known to us use fixed hyperparameters for every word in the
vocabulary and for every document. In future work, however, with an
automatic evaluation mechanism at hand, we could test whether α should have some
relation with the document length.</p>
    </sec>
    <sec id="sec-6">
      <title>Hypothesis validation</title>
      <p>In order to validate our assumptions, we have performed a small experiment
which involves manual validation of document pairs.</p>
      <p>From our testbed, we have generated 64 random tuples, each consisting of
two document pairs: in one pair, both documents share at least one outlink that
does not appear in any other document (see Section 3.3); in the other pair, they
do not share any outlink. According to our hypothesis, document pairs which
share outlinks should, on average, be more topically related than pairs that do not
share outlinks.</p>
      <p>For each tuple, we have manually annotated which document pair is the most topically
related (sometimes this is not obvious, and the tuple is then
annotated as undecidable). For 50 tuples (78%), the document pair sharing
the outlink was more topically related than the other. In 12 cases (19%) it was
undecidable, and only in 2 cases (3%) was the linked document pair less topically
related than the other.</p>
      <p>An analogous process was conducted for the co-authorship criterion,
comparing tweet pairs written by the same author with pairs written by different
authors. The results over 97 tuples are similar to the previous ones: co-authored
tweets are more related in 80% of the cases, while non-co-authored tweets are
more related in only one case (1%).</p>
      <p>These results suggest that our assumptions are reasonable for our testbed.
This is, of course, just a preliminary result that must be validated with larger
manual annotations over different testbeds. Note also that the experimental
procedure must be refined, because "undecidable" cases (which are 20% of the
assessed samples) might become decidable with a more precise, testbed-specific
definition of relatedness.</p>
    </sec>
    <sec id="sec-7">
      <title>Experiment and Evaluation Results</title>
      <p>The goal of our experiment is to test the behavior of our evaluation metrics.
As a dataset we use 5,000 tweets and 500 blog posts in Spanish
containing the term BBVA (a Spanish bank operating in several countries). We have
generated a vocabulary excluding stop words. In general, for all the approaches
compared, topics detected by the LDA system consist of (i) information about
"Liga BBVA", the Spanish premier football league, which is sponsored by
the bank; (ii) economic information about the bank; (iii) information in languages
other than Spanish (such as Catalan); and (iv) topics with infrequent terms. In
general, the granularity of the topics is relatively low. Note that, unlike other
related experiments, we are focusing on a single entity, while other approaches
cover several totally independent topics.</p>
      <p>Table 2 shows the results on 5,000 tweets, this time measuring aggregation
by author. Again, the metric values seem to stabilize around 100 iterations, and
they show a clear correlation with the number of iterations.</p>
      <p>Another variable that can be analyzed with our evaluation methodology is
the effect of different values of the hyperparameter α. Table 3 shows that α does
not have a strong effect on the results. In fact, the maximum seems to be around
α = 1. This implies that, in general, documents tend to be centered on one
single topic. This is perhaps due to the low granularity of the topics produced
in our experiment.</p>
      <p>Finally, we have studied the effect of the number of topics on the results of
the evaluation. Is it possible that LDA, in this context, reaches a more adequate
topic granularity by increasing their number? Table 4 shows the results obtained
for 500 blog entries, 2000 iterations and α = 1. Note that, although there is
some positive effect when increasing the number of topics, it is not as clear as
in previous experiments.</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusions</title>
      <p>In this work we have proposed an early version of an automatic evaluation
methodology which permits the optimization of topic extraction models for
online reputation management using external information not employed by the
system. In a preliminary experiment using blog entries and tweets about a bank, we
have been able to observe quantitative effects such as the small influence of the
hyperparameter α on the final results, the number of iterations that leads to
stable results for LDA, and the effect produced by the number of topics.</p>
      <p>Our evaluation methodology still has unresolved issues: we do not yet know to
what extent the selection of the RDP and NRDP sets biases the results (they are,
after all, just a small sample of the full test set, with very precise characteristics).
We also need to revise the "temporality" measure to obtain more stable results
in our experimental framework.</p>
      <p>In any case, the methodology provides a way of testing hypotheses not yet
evaluated quantitatively in other studies, such as the effect of including a
temporal variable in the model, the possibility of processing time slots independently,
the effects of structuring topics hierarchically, etc.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hurst</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Temporal text mining</article-title>
          .
          <source>In: Proceedings of the AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Subasic</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berendt</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>From bursty patterns to bursty facts: The effectiveness of temporal text mining for news</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brooks</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montanez</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Improved annotation of the blogosphere via autotagging and hierarchical clustering</article-title>
          .
          <source>In: WWW '06: Proceedings of the 15th international conference on World Wide Web</source>
          , New York, NY, USA, ACM Press (
          <year>2006</year>
          )
          <volume>625</volume>
          –
          <fpage>632</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gruhl</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guha</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liben-Nowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tomkins</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Information diffusion through blogspace</article-title>
          .
          <source>In: WWW '04: Proceedings of the 13th international conference on World Wide Web</source>
          , New York, NY, USA, ACM (
          <year>2004</year>
          )
          <volume>491</volume>
          –
          <fpage>501</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tseng</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tatemura</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Eigen-trend: trend analysis in the blogosphere based on singular value decompositions</article-title>
          .
          <source>In: CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management</source>
          , New York, NY, USA, ACM Press (
          <year>2006</year>
          )
          <volume>68</volume>
          –
          <fpage>77</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mei</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A probabilistic approach to spatiotemporal theme pattern mining on weblogs</article-title>
          .
          <source>In: WWW '06: Proceedings of the 15th international conference on World Wide Web</source>
          , New York, NY, USA, ACM Press (
          <year>2006</year>
          )
          <volume>533</volume>
          –
          <fpage>542</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Topics over time: a non-markov continuous-time model of topical trends</article-title>
          .
          <source>In: KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , New York, NY, USA, ACM (
          <year>2006</year>
          )
          <volume>424</volume>
          –
          <fpage>433</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Dynamic topic models</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on Machine learning</source>
          , ACM New York, NY, USA (
          <year>2006</year>
          )
          <volume>113</volume>
          –
          <fpage>120</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>AlSumait</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbará</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Domeniconi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>On-line LDA: adaptive topic models for mining text streams with applications to topic detection and tracking</article-title>
          .
          <source>In: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining</source>
          , IEEE Computer Society Washington, DC, USA (
          <year>2008</year>
          )
          <volume>3</volume>
          –
          <fpage>12</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Nallapati</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ditmore</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ung</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Multiscale topic tomography</article-title>
          .
          <source>In: KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          , New York, NY, USA, ACM (
          <year>2007</year>
          )
          <volume>520</volume>
          –
          <fpage>529</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>