<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Hybrid Approach for Dynamic Topic Models with Fluctuating Number of Topics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christin Katharina Kreutz</string-name>
          <email>kreutzch@uni-trier.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Trier University</institution>
          ,
          <addr-line>54286 Trier</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>Scientific communities are constantly changing and evolving. Topics of today might split or even disappear in the future; other topics might merge or appear at some point. Currently, the closest we come to capturing these developments are dynamic topic models, which require a fixed number of topics k. It would be desirable to drop this requirement. This work outlines a research agenda for approaching that task by using LDA as a base in combination with the observation of state transitions of topics at consecutive times.</p>
      </abstract>
      <kwd-group>
        <kwd>Trend Mining</kwd>
        <kwd>Dynamic Topic Models</kwd>
        <kwd>LDA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Algorithms</title>
      <sec id="sec-1-1">
        <title>1. INTRODUCTION</title>
        <p>With today's publication methods, the number of papers
increases rapidly, and losing track of how most themes evolve
is common. At the same time, identifying important
publications is difficult but crucial for scientists.</p>
        <p>Automatic detection of trends and their indicators in a
scientific community (trend mining) could benefit
researchers, politicians or entrepreneurs who are not at the forefront of
current developments but want to gain quick insights into
promising areas.</p>
        <p>Our goal is to construct a system which autonomously
identifies trends together with the influential persons and
papers behind them from a variety of bibliographic data. The corresponding
research plan is divided into three successive steps.
First, the transformation over time of topics generated from a
bibliographic data set, together with their assigned papers, authors
and keywords, should be mapped in a dynamic topic model
with a variable number of topics. Second, potential upcoming
trends in the topics across the years should automatically be
detected, predicted and extracted from this model, so that they
can be evaluated. Third, influential authors, papers and
venues should be determined within these trends. The
resulting insights about what supports the development
of a topic can be used to improve the identification of trends.</p>
        <p>The steps are relatively independent of one another; step two
would be applicable to another suitable topic model without
requiring a solution to step one. Figure 1 gives a schematic
overview of our projected course of action.</p>
        <p>In this work, we focus on outlining a research direction
for the first step, present the current state of research on
related models and point out the open problems. We touch on
trend mining before we close with an evaluation plan and
an outlook on possible applications of our future model.</p>
      </sec>
      <sec id="sec-1-2">
        <title>DEVELOPMENT OF TOPICS</title>
        <p>
          We assume that the importance and the set of topics are not
static over time. Topics might sprout, expand, diminish, split,
merge or vanish. The terms that represent topics change as
new words appear [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. To better understand the dynamics
of topics, we examined real bibliographic data.
        </p>
      </sec>
      <sec id="sec-1-3">
        <title>Notation</title>
        <p>Before discussing our experiments or the
proposed model in detail, some basic terms need to be defined in order to
discuss our concepts formally.</p>
        <p>A paper has a number of fundamental, possibly latent,
ideas. They can be grouped by motive into more general topics,
denoted by si. By observing co-occurring topics and terms in
papers, conclusions about the assignment of terms to topics
can be drawn. Topics can be alike term-wise or (partially)
overlap with other topics; this can be assessed
from the term distributions of the topics.</p>
        <p>The total observed time t can be sliced into disjoint
consecutive intervals, which are called times t0, ..., tn. Given two
times tx and ty with x &lt; y, tx denotes an interval (and a real
period) before ty. Given two times tx and tx+1, tx denotes
the interval immediately before tx+1.</p>
        <p>Publications can be uniquely attached to intervals if the
time is sliced by year and their year of publication
determines the assignment, since exact publication dates are mostly not
available. This classification is an approximate observation
raster: in theory, time is a continuum, but in reality
we only have rough year specifications. The states of topics are
considered at these discrete times.</p>
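        <p>The slicing into yearly intervals can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation; the function name slice_by_year and the (paper_id, year) input format are hypothetical.</p>
```python
from collections import defaultdict


def slice_by_year(papers):
    """Assign papers to disjoint yearly intervals t0, ..., tn.

    `papers` is an iterable of (paper_id, year) pairs (hypothetical
    input format). Only the publication year decides the interval,
    since exact publication dates are mostly unavailable.
    """
    by_year = defaultdict(list)
    for paper_id, year in papers:
        by_year[year].append(paper_id)
    first, last = min(by_year), max(by_year)
    # One interval per calendar year, including years without papers,
    # so that index x (time t_x) is always immediately before x + 1.
    return [(year, by_year.get(year, [])) for year in range(first, last + 1)]
```
        <p>Index x of the returned list corresponds to tx; for papers from 1990 and 1992 only, the list still contains an empty interval for 1991.</p>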
        <p>A topic si is said to be trending at time tx+y, y ≥ 1, if it is
unpopular or does not even exist at time tx, but its significance
soars. This could be indicated by an increasing number
of publications on this subject or by its appearance in
important journals or conferences. Essential members of the
scientific community might start to work in this direction,
or the subject might produce its own experts who become widely
known.</p>
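        <p>The definition of a trending topic can be turned into a simple test on a popularity series. This sketch uses the topic's share of all publications as the only indicator; the thresholds low and growth are hypothetical placeholders, not values from this work.</p>
```python
def is_trending(shares, x, y, low=0.005, growth=3.0):
    """Check whether a topic is trending at time t_(x+y), with y >= 1.

    shares[i] is the topic's share of all publications at time t_i,
    one possible popularity indicator. The thresholds `low` (what
    counts as unpopular) and `growth` (how strongly the share must
    rise) are hypothetical placeholders.
    """
    if 1 > y:
        raise ValueError("y must be at least 1")
    before, after = shares[x], shares[x + y]
    # Unpopular at t_x; a share of 0 means the topic did not exist yet.
    unpopular_before = low >= before
    # Significance soars: clearly above `low` and `growth` times larger.
    soared = after > low and after >= growth * before
    return unpopular_before and soared
```
        <p>Other indicators named above, such as appearances in important venues or the involvement of essential community members, would need additional signals beyond this share.</p>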
        <p>A topic that has not (yet) been assigned any publications is
denoted by s∅. This case occurs before a topic is born or while it
is inactive. A topic is inactive if the number of publications
assigned to it does not surpass a threshold, or if the papers
assigned to it only cite papers from the same
topic and are only cited by papers from this area. Such a
topic has hardly any influence on the rest of the corpus; the
community working on it is very tightly connected
but relatively isolated from the rest of the scientific world.
We describe these enclaves as sects.</p>
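        <p>Both criteria for inactivity can be combined in one check. This is a sketch under simplifying assumptions: each paper is reduced to its dominant topic, and the threshold min_papers as well as the input format are hypothetical.</p>
```python
def is_inactive(topic, papers_at_tx, topic_of, min_papers=50):
    """Decide whether `topic` is inactive at one time t_x.

    papers_at_tx: list of (paper_id, cited_ids, citing_ids) tuples for
        the papers published at t_x (hypothetical input format).
    topic_of: maps any paper id to its dominant topic; using only the
        dominant topic is a simplification of full topic distributions.
    min_papers: hypothetical publication threshold.
    """
    own = [p for p in papers_at_tx if topic_of[p[0]] == topic]
    if min_papers > len(own):
        return True  # the publication count does not surpass the threshold
    # "Sect" test: every citation, in either direction, stays inside the topic.
    for _, cited_ids, citing_ids in own:
        for other in list(cited_ids) + list(citing_ids):
            if topic_of.get(other) != topic:
                return False
    return True
```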
        <p>Topics that are not inactive are active. The set of
active topics at time tx is denoted by kx, and the set of
inactive topics at time tx by k̄x.</p>
      </sec>
      <sec id="sec-1-4">
        <title>Data Set</title>
        <p>
          The data set used in this research is a partially
enriched version of the dblp computer science bibliography, supplemented
with part of the data from the Open Academic Graph. The dblp
data contains bibliographic information related to
publications, authors, conferences and journals from the field of
computer science and adjacent areas [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. As of February
2018, it holds metadata of over 4 million publications and
more than 2 million authors. From the Open Academic Graph, the Microsoft Academic Graph
part is used. It contains over 166
million publications and, amongst other things, citation information,
abstracts and details on authors [
          <xref ref-type="bibr" rid="ref21 ref22">22, 21</xref>
          ].
        </p>
        <p>In our set, the dblp data was used completely. In
addition, information from the Open Academic Graph was included wherever
publications could be matched by DOI or, if no DOI was available, by title
and authors. The extension contains author affiliations, citation
data, abstracts, full texts, keywords and topics. The
structure of the data set is depicted in Figure 2. Because we
focus only on bibliographic information, further data sources such as
Twitter are not incorporated in our set.</p>
        <p>For the experiments in this paper, only the data contained
in dblp as well as the abstracts were taken into consideration.
At the moment, full texts are only available for a certain
small area of computer science, so using them could
have drastically distorted the outcome of our initial trials.</p>
      </sec>
      <sec id="sec-1-5">
        <title>Methodology</title>
        <p>
          Of the enriched dblp data, only English publications
whose abstract was of considerable length (≥ 10 words; fewer
words indicate flawed data) were taken into account. The
titles and abstracts were cleaned and stemmed with a Porter
stemmer. Afterwards, LDA [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] with k = 100 was run on all
2.5 million of them. Terms occurring in over 50
percent of publications (collection-dependent stop words) or
in fewer than 100 papers (often system names) were ignored.
        </p>
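        <p>The abstract-length filter and the frequency-based term pruning described above could look as follows. This sketch is an assumption about the mechanics, not the authors' code; stemming and the LDA run itself are omitted, and libraries such as gensim offer comparable pruning (for example via the no_below and no_above parameters of Dictionary.filter_extremes).</p>
```python
from collections import Counter


def long_enough(abstract, min_words=10):
    """Keep only abstracts with at least `min_words` words."""
    return len(abstract.split()) >= min_words


def prune_vocabulary(docs, max_df_ratio=0.5, min_df=100):
    """Return the terms kept for LDA.

    docs: list of token lists (already cleaned and stemmed).
    A term is dropped if it occurs in more than `max_df_ratio` of all
    documents (collection-dependent stop word) or in fewer than
    `min_df` documents.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency, not raw term count
    limit = max_df_ratio * len(docs)
    return {term for term, count in df.items() if count >= min_df and limit >= count}
```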
        <p>A visualisation of the data enabled us to draw conclusions
about the characteristics of topics.</p>
      </sec>
      <sec id="sec-1-6">
        <title>Initial Observations</title>
        <p>In Figure 3, the popularity of four selected topics relative to all
topics in the corpus is visualised per year from 1990
to 2015. We assume the number of
topics is appropriate. Different settings can be observed:</p>
        <p>There are subjects which are inactive and whose
popularity rises so that they become active, like topic 12, which
is about mobile devices.</p>
        <p>There are subjects which were always active and
whose popularity increases, as seen in topic 13, which covers
terms like management, knowledge and business.</p>
        <p>There are subjects whose popularity declines, as
seen with topic 27, which includes papers concerning
logic programming and reasoning.</p>
        <p>There are subjects whose popularity does not really
seem to change over the years, such as topic
76, which deals with image processing.</p>
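        <p>The yearly popularity values behind such a visualisation can be computed from the per-paper topic distributions. The sketch assumes that popularity is the mean topic proportion over all papers of a year; the exact normalisation used for Figure 3 is not stated, so this is an illustration only.</p>
```python
from collections import defaultdict


def yearly_popularity(doc_topics, years, k):
    """Relative popularity of each of the k topics per year.

    doc_topics[d] is the topic distribution (length k) of paper d and
    years[d] its publication year. A topic's popularity in a year is
    taken here as its mean proportion over that year's papers, so the
    k values of one year sum to 1 (an assumed normalisation).
    """
    sums = defaultdict(lambda: [0.0] * k)
    counts = defaultdict(int)
    for dist, year in zip(doc_topics, years):
        row = sums[year]
        for topic, proportion in enumerate(dist):
            row[topic] += proportion
        counts[year] += 1
    return {year: [s / counts[year] for s in row] for year, row in sums.items()}
```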
        <p>In our data set, we found the case of a topic being
active at a point in time but unrepresented by publications
for a few following years. Later, it re-emerged. The topic's
top keywords contained cloud, so early publications with a
portion of this topic might have a background in weather,
whereas the later publications which were (partly) assigned
to the topic probably pick up on cloud computing.</p>
        <p>[Figure 3: (a) Overview of the popularity of selected topics; the topic distributions of papers are sliced by year, and the size of a bubble indicates the relative importance of a topic among all papers from that year. (b) Topic numbers with the corresponding most important assigned stems.]</p>
        <p>The importance and number of active topics vary
considerably throughout the years.</p>
      </sec>
      <sec id="sec-1-7">
        <title>PROBLEM</title>
        <p>Topics can be generated from a corpus by several
probabilistic topic models. The most popular ones all share the
significant weakness of an unchangeable number of topics.
Before we dive into the problem, we present some existing
methods.</p>
      </sec>
      <sec id="sec-1-8">
        <title>Topic Models</title>
        <p>
          The assignment of topics to papers can be performed by
a number of approaches. The simplest one is Latent
Dirichlet Allocation (LDA). Here, it is assumed that every
document is a mixture of topics and every word in the
document comes from a specific drawn topic. No word
is left unassigned or assigned to a residual
topic. Hidden random variables contain information on the
structure of topics in the documents. First, topic proportions
for a document are drawn. Then, for every
word position in the document, a topic is drawn from this
distribution. Finally, the actual word is drawn from
that topic's word distribution. LDA and the models built on it
assume that documents are interchangeable in time. The
number of topics k is fixed for a corpus and has to be chosen
beforehand. The vocabulary of the corpus is also fixed. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
        </p>
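        <p>The generative story of LDA described above can be simulated with Python's standard library. This toy sketch only makes the three drawing steps concrete and performs no inference; the function names, the toy parameters and the use of normalised Gamma draws to sample from a Dirichlet distribution are choices made for illustration.</p>
```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, topic_word, vocab, rng):
    """Generate one document following LDA's generative process.

    alpha: Dirichlet prior over the k topics.
    topic_word: k word distributions over `vocab` (fixed vocabulary).
    """
    # 1. Draw the document's topic proportions.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. For every word position, draw a topic from theta.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        # 3. Draw the actual word from that topic's word distribution.
        words.append(rng.choices(vocab, weights=topic_word[z])[0])
    return theta, words
```
        <p>Note that k (the length of alpha) and the vocabulary are fixed inputs of this process.</p>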
        <p>
          Many approaches build upon LDA, such as the
Author-Topic Model (ATM). Here, an additional dimension, the
authors, is taken into account: the individual author
co-determines the topic from which a word is drawn. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
        </p>
        <p>
          The correlation of topics was addressed with Correlated
Topic Models (CTM). Here, LDA was modified so that, instead of
drawing the topic distributions of documents from a Dirichlet
distribution, they are taken from a logistic normal
distribution. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]
        </p>
        <p>
          The temporal aspect of a collection and the development
of topics were widely disregarded until the introduction
of Dynamic Topic Models (DTM). This method extends CTM
by dividing a corpus by year so that the topic distributions can
change over time. Topics in slice tx+1 are derived from the
topics in slice tx. The words assigned to a subject can vary,
but k is still fixed. Information relating to authors is not
used, but papers are no longer interchangeable. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]
        </p>
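        <p>In DTM, the parameters of each topic's word distribution evolve from slice to slice through a Gaussian random walk and are mapped to probabilities with a softmax. The following toy simulation illustrates one such chain; it omits inference entirely, and sigma and the starting parameters are arbitrary assumptions.</p>
```python
import math
import random


def softmax(values):
    """Map natural parameters to a probability distribution over terms."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def topic_chain(beta0, n_slices, sigma, rng):
    """Term distributions of one topic over n_slices consecutive slices.

    The parameters at slice t_(x+1) are drawn around those at t_x:
    beta_(x+1) ~ Normal(beta_x, sigma^2 I).
    """
    betas = [beta0]
    for _ in range(n_slices - 1):
        betas.append([rng.gauss(b, sigma) for b in betas[-1]])
    return [softmax(beta) for beta in betas]
```
        <p>A DTM with k topics runs k such chains side by side; since the number of chains never changes, k stays fixed.</p>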
      </sec>
      <sec id="sec-1-9">
        <title>Problem Description</title>
        <p>The described methods cannot fully map the dynamics in
a corpus, as the number of topics k is unchangeable. If data
up to a point in time tx is used to generate a DTM,
new publications at time tx+1 can only be assigned to the
k already existing topics. If DTM were run with the new
publications and k + n topics, the resulting topics would
not necessarily resemble the former k topics plus n new
ones, not even closely: changing k slightly already results in a different
document-topic distribution.</p>
        <p>An easy way to capture the dynamics of topics would be
to find a suitable k, perform LDA on the whole corpus, slice
the corpus by year and look at topics changing over time, as
we did in our experiment. Trends could then be found
retrospectively. If new data is integrated, LDA could be run again
on all publications, and trends could again be located
in retrospect. The major disadvantages are the determination of k
and the inability to map the topics of the first run to the
topics of subsequent runs, especially if k is incremented.
The terms that get mapped to subjects shift, and it is
impossible to recover old patterns. It would thus be infeasible to measure
whether the identification of future trends was successful.</p>
        <p>Emergence, disappearance, splitting and merging of topics
over the course of time cannot be modelled with existing
probabilistic topic models. Yet changes in subjects are indicators
of trends and should therefore be observed.</p>
        <p>
          There are other approaches to finding trends which make use
of a number of other features: Asooja et al. utilise keyword
distributions in textual information [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], Glanzel et al. work
on citations and textual information [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and Salatino et al.
observe a topic network derived from connections between
keywords, publications, authors, venues and organisations
[
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
        <p>
          Current methods usually only use a small portion of the
spectrum of available data. A model which incorporates
authors and affiliations as well as scientometric measures [
          <xref ref-type="bibr" rid="ref10 ref13 ref20">20, 13,
10</xref>
          ], publication information such as citations [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and
venues, in addition to titles, abstracts, full texts, keywords and
topics, has the potential to detect trends reliably.
        </p>
        <p>[Figure 4: Possible state transitions a) to f) of topics si and sj over time tx, tx+1, ..., tx+y.]</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-1">
        <title>HYBRID APPROACH</title>
        <p>Our theoretical approach is based on the assumption that
there are different topic state transitions, which need to be
represented by our model.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Evolution of Topics over Time</title>
        <p>We identified possible state transitions with which the
evolution of topics can be described; they are shown in
Figure 4. There are six distinguishable forms: case a) shows a topic
which does not change significantly; b) shows the split of a
topic si into possibly numerous, somewhat coherent topics si′, ..., si″,
or the emergence of a topic si″ from an
already existing (and persisting) topic si; c) shows the
merging of possibly numerous disconnected topics si, ..., sj into
one; d) shows a vanishing topic; e) shows the birth of a new
topic; and f) shows a combination of cases d) and e), with
the particularity that topic si is inactive over a span of time
and then re-emerges as the same topic. The different transitions
can be combined ad libitum.</p>
        <p>An example of case a) could be the image processing topic we
already encountered in Figure 3. The distribution of words in
the topic surely changes over time because the
fundamental terms vary, though the overall motive stays the
same. As an instance of case b), algorithms concerning depth-first
search could be the base from which other algorithms,
such as those for the computation of strongly connected
components, were derived. The original topic persisted while new ones
were emerging from it. A topic describing machine learning
might be a good example of case c). Many areas dealing with
algorithms are collapsing into this big one, as machine
learning has the potential to outperform even the most refined
hand-crafted approaches. A topic describing RSA could
fall into category d), as RSA is no longer considered safe;
publications on this subject are therefore likely
to decrease over the next years until the topic is
inactive. This is a good candidate for the formation of a sect.
The development of a topic on quantum computers could be
mapped to case e). It was, to some extent, the birth of this topic in
computer science. There certainly were influences from different
communities on the subject, but in a corpus restricted to
information technology, this representation might be fitting.
As neural networks are currently experiencing a renaissance,
they are an example of f).</p>
      </sec>
      <sec id="sec-2-3">
        <title>Hybrid Topic Model</title>
        <p>Our future model needs to be able to find and represent all
described transitions of topics. In the following, we explain
the core components of a hybrid model.</p>
        <p>
          The rough plan is to split t into years and use LDA
to generate a baseline of topics for t0. For every new year,
the topics of the prior year need to be considered when
calculating the current developments. Citations are a key part
of this, as they indicate how information is being spread.
At time tx+1, we examine kx as well as k̄x and observe
co-authorships, the words used and how new publications cite
already classified papers. By looking at the topic distributions
of the cited papers and summing the percentages for each topic, it can be
calculated which topics a new paper cites, with corresponding weights.
A metric such as the Wasserstein metric
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] can be used to calculate the distance disttd between the term distributions of topics.
A threshold thtd describes
the distance value above which topic term distributions are
considered dissimilar.
        </p>
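        <p>Both computations of this paragraph can be sketched as follows. The citation weights sum the topic distributions of the cited papers and normalise them. For disttd, the sketch uses the one-dimensional Wasserstein distance, which treats term indices as points on a line; this ground metric is an assumption made only to keep the example short, since no ground metric is specified here, and a semantic ground distance (for example based on word embeddings) would be more meaningful. The threshold value is likewise hypothetical.</p>
```python
from collections import defaultdict


def cited_topic_weights(cited_ids, doc_topics):
    """Weights of the topics a new paper cites.

    doc_topics maps a paper id to {topic: proportion}. The proportions
    of all cited papers are summed per topic and normalised to 1.
    """
    weights = defaultdict(float)
    for pid in cited_ids:
        for topic, proportion in doc_topics[pid].items():
            weights[topic] += proportion
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()} if total else {}


def wasserstein_1d(p, q):
    """W1 distance between two term distributions over the same vocabulary,
    with term i placed at position i (assumed ground metric)."""
    dist, carried = 0.0, 0.0
    for pi, qi in zip(p, q):
        carried += pi - qi  # mass that still has to move past term i
        dist += abs(carried)
    return dist


def dissimilar(p, q, th_td):
    """Term distributions count as dissimilar if dist_td exceeds th_td."""
    return wasserstein_1d(p, q) > th_td
```
        <p>The same one-dimensional quantity can also be obtained with SciPy's scipy.stats.wasserstein_distance by passing the term positions as values and p and q as weights.</p>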
        <p>For every topic, the following strategies decide which state
transition has occurred from tx to tx+1:
a) In the first case, there is no major change in the
underlying motives from tx to tx+1. Publications in this
topic reference roughly the same topics that were cited
at tx, and thtd &gt; disttd. The content of cited
publications is typically very similar to the content of the
new ones.
b) In this situation, we observe the same phenomena as in
case a), but a clustering of the publications of this topic
produces multiple distinguishable groups, which are
regarded as new topics split from the old one, with thtd &lt; disttd
amongst the new topics. New words are likely to occur
in the publications. If they appear solely in the papers
from this area and not throughout the whole corpus,
they strongly hint at a change or split in the topic.
c) If topics merge, the observed effects
resemble those of case a), although publications
which would have been assigned to the prior topics harmonise their
term distributions and citation behaviour. A clustering
would group the topics together.
d) A dying topic is assigned few or no new publications.
The number of papers in this topic might
already have been declining for a few years. A topic becoming
inactive all of a sudden is highly unlikely.
e) If a new topic emerges, its publications do not really match the
term distributions of existing ones. They usually cite a
lot of different topics, as they have no clear predecessor.
The content overlap between a new publication and the papers (not topics)
it cites should be calculated, as it is expected to be rather small.
f) With the sudden re-emergence of a topic, the term
distribution of its publications matches a topic in k̄x.</p>
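        <p>The rules a) to f) can be read as a decision procedure over signals computed for each topic. The following sketch shows one hypothetical way to order these checks; the signal names, the order of the tests and the thresholds are assumptions, and computing the signals themselves (clustering, citation analysis) is left out.</p>
```python
def classify_transition(signals, th_td, min_papers):
    """Map precomputed signals of one topic to a transition a) to f).

    Hypothetical keys of `signals`:
      is_new        - the publications match no existing topic
      was_inactive  - the topic was in the inactive set at t_x
      n_new         - number of new publications assigned at t_(x+1)
      n_clusters    - distinguishable groups found by clustering them
      merging_with  - other topics whose term distributions converge
      dist_td       - distance of the topic's term distributions at t_x and t_(x+1)
    """
    if signals["is_new"]:
        return "e"  # birth: no clear predecessor
    if signals["was_inactive"]:
        # f) re-emergence; otherwise the topic simply stays inactive
        return "f" if signals["n_new"] >= min_papers else "inactive"
    if min_papers > signals["n_new"]:
        return "d"  # dying: few or no new publications
    if signals["n_clusters"] > 1:
        return "b"  # split into several distinguishable groups
    if signals["merging_with"]:
        return "c"  # merge with converging topics
    if th_td > signals["dist_td"]:
        return "a"  # no major change
    return None  # not covered by a) to f); needs closer inspection
```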
        <p>After the topic distributions of the new publications are
computed, the then active and inactive topics are assigned
to kx+1 and k̄x+1, respectively. The next year of papers
is then processed in the same manner.</p>
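        <p>The yearly iteration can be summarised as a loop skeleton. In this sketch, baseline, step and is_active are hypothetical callables standing in for the LDA baseline, the transition analysis and the activity test; only the bookkeeping of the active and inactive sets follows the text.</p>
```python
def run_hybrid(years, papers_by_year, baseline, step, is_active):
    """Iterate over consecutive years and track active/inactive topics.

    baseline(papers) -> topics for t0 (e.g. from an LDA run)
    step(topics, papers, active, inactive) -> topics after one year
    is_active(topic, papers) -> True if the topic is active
    Returns a list of (year, active set, inactive set).
    """
    first, *rest = years
    topics = baseline(papers_by_year[first])
    active = {t for t in topics if is_active(t, papers_by_year[first])}
    history = [(first, active, set(topics) - active)]
    for year in rest:
        papers = papers_by_year[year]
        inactive = history[-1][2]
        # The prior year's active and inactive topics feed into this year.
        topics = step(topics, papers, active, inactive)
        active = {t for t in topics if is_active(t, papers)}
        history.append((year, active, set(topics) - active))
    return history
```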
      </sec>
      <sec id="sec-2-4">
        <title>Topic Development Prediction and Trend Mining</title>
      </sec>
      <sec id="sec-2-5">
        <p>
          Predicting the development of a topic is directly linked to
trend mining: topics which are about to take off are future
trends. The upcoming number of publications in a field, the
estimated number of citations a new paper is going to gain [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
and possible collaborations between researchers can only be
computed if the underlying author-publication graph of the
past is thoroughly analysed and the influences on its evolution
are discovered.
        </p>
        <p>The computation of trends in currently active topics is
a step which follows directly from the hybrid topic model.
Topics which changed a lot from tx to tx+1 are candidates
for trends. Not only will the development of topics from the previous
to the current time frame be observed; the
overall behaviour of the term distributions and cited topics is also
relevant. The appearance of new and popular words among the
assigned terms of a topic could signal the beginning of a
trend and is worth further investigation.</p>
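        <p>One simple way to surface new words among a topic's assigned terms is to compare its top terms at tx and tx+1. The cut-off n and the function names are hypothetical; judging whether such words are also popular would need further signals.</p>
```python
def top_terms(dist, vocab, n=10):
    """The n most probable terms of a topic's term distribution."""
    ranked = sorted(range(len(vocab)), key=lambda i: dist[i], reverse=True)
    return [vocab[i] for i in ranked[:n]]


def emerging_terms(dist_prev, dist_curr, vocab, n=10):
    """Top terms at t_(x+1) that were not among the top terms at t_x."""
    previous = set(top_terms(dist_prev, vocab, n))
    return [w for w in top_terms(dist_curr, vocab, n) if w not in previous]
```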
        <p>
          Popular papers are often written by well-known and
well-connected authors, appear in high-impact journals
or are presented at seminal conferences. Here, the
enriched data is going to be used. A co-author graph with
researchers' affiliations, linked to a paper citation graph
complete with venues and relationships between journals and
conferences, could help discover core persons [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], venues and
publications in topics and trends. Sometimes, trends also
develop from sects, so these have to be monitored continuously.
Topics which were active in tx+1 are judged on whether they
are likely to be trending in the future. Their evolution
can be predicted based on the progress of the topic and the
influences found.
        </p>
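        <p>As a first approximation, core persons of a topic could be found by counting distinct co-authors in its co-author graph. Degree centrality is only an assumed proxy chosen for brevity; the enriched data would allow richer measures that also use affiliations, citations and venues.</p>
```python
from collections import defaultdict


def coauthor_degrees(papers):
    """Number of distinct co-authors per researcher.

    papers: iterable of author lists, e.g. all papers of one topic.
    """
    neighbours = defaultdict(set)
    for authors in papers:
        team = set(authors)
        for author in team:
            neighbours[author].update(team - {author})
    return {author: len(others) for author, others in neighbours.items()}


def core_persons(papers, n=3):
    """The n researchers with the most co-authors (ties broken by name)."""
    degrees = coauthor_degrees(papers)
    return sorted(degrees, key=lambda a: (-degrees[a], a))[:n]
```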
      </sec>
      <sec id="sec-2-6">
        <title>FUTURE PROSPECTS</title>
        <p>After the construction of our hybrid approach is completed,
an evaluation needs to prove and quantify the validity of the
proposed system. Furthermore, we present several practical uses for
the model.</p>
      </sec>
      <sec id="sec-2-7">
        <title>Evaluation Plan</title>
        <p>The evaluation of our planned system, which includes the
trend mining part, contains multiple steps. The results need
to be cross-validated.</p>
        <p>Our hybrid model is going to be run on a base of data
up to 1995; then topic developments are computed by
the iterative part with data for the next 10 years. For the
following 5 years, trends are predicted. Afterwards, a manual
evaluation of our model and the trends found involves expert
researchers from different domains within computer science.
A list containing our results is presented to them, and they
should rate it against the real trends with the corresponding
years.</p>
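        <p>The planned time split can be written down explicitly. The sketch assumes that 1995 itself still belongs to the base data; the function name is hypothetical.</p>
```python
def evaluation_windows(base_end=1995, iterate_years=10, predict_years=5):
    """Years processed iteratively and years whose trends are predicted.

    The base run uses data up to and including base_end (assumed).
    """
    iterate = list(range(base_end + 1, base_end + 1 + iterate_years))
    predict_start = base_end + 1 + iterate_years
    predict = list(range(predict_start, predict_start + predict_years))
    return iterate, predict
```
        <p>With the default values, the iterative part covers 1996 to 2005, and trends are predicted for 2006 to 2010.</p>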
        <p>Additionally, the trends, important researchers and
venues identified by our system will be presented to those
experts, who then rate the correctness of the findings.</p>
        <p>An automatic method to quantify the accuracy of the
model would involve observing data up to a time tx.
Potential trends at this time will be detected, their
evolution and future importance will be predicted for the
succeeding five years, and the predictions will be compared
to the real development of the significance of these topics.
The numbers of papers in topics and citation behaviour could be
forecast. If there are discrepancies between predicted and
real data, a manual step could be added in which experts are asked
to explain the actual development.</p>
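        <p>The comparison of predicted and real paper counts could, for instance, use the mean absolute error and flag years with large relative deviations for the manual expert step. The error measure and the tolerance are assumptions; they are not fixed in this plan.</p>
```python
def compare_predictions(predicted, actual, tolerance=0.25):
    """Compare predicted and real yearly paper counts of one topic.

    predicted, actual: dicts mapping year -> number of papers.
    Returns the mean absolute error and the years whose absolute error
    exceeds `tolerance` times the real count (at least 1); these years
    would be handed to experts for an explanation.
    """
    years = sorted(predicted)
    errors = {year: abs(predicted[year] - actual[year]) for year in years}
    mae = sum(errors.values()) / len(years)
    flagged = [y for y in years if errors[y] > tolerance * max(actual[y], 1)]
    return mae, flagged
```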
        <p>The hybrid approach also needs to be tested against a
purely incremental model which does not use LDA with a
predetermined k as its first step.</p>
      </sec>
      <sec id="sec-2-8">
        <title>Applications</title>
        <p>Possible applications of the dynamic topic model with a
varying number of topics, complete with the identification
of trends, are manifold: a reviewer recommendation system
for given publications, a citation recommendation system, a
keynote speaker recommendation system or a visualisation
tool for exploring bibliographic data with a special focus on
trends could be constructed.</p>
        <p>
          Some reviewer recommendation systems work on word-topic
and topic-citation distributions [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] or are only usable
for already established conferences, as they use former
program committees [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Others are more refined and aim to
integrate the research interests and directions of scientists into
the recommendations [
          <xref ref-type="bibr" rid="ref12 ref16">16, 12</xref>
          ]. Our model is independent of
past conferences. It could make use of the enriched
author-publication graph to find scientists capable of and willing to
review new publications from the field of their current
research interest. As the available data for this task is extensive,
the results could be excellent.
        </p>
        <p>
          Citation recommendation systems suggest fitting
publications based on their content, but they do not focus on
returning fundamental papers which lead the way in a topic or
those written by influential authors in an area [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. The
relative importance of a paper for an area and its development
is not considered. With our hybrid model, the identification
of influential papers and persons is a by-product and could
easily be incorporated into such a system.
        </p>
        <p>Keynote speakers for a conference on topic si should
be influential scientists from a different topic sj which is
related to si. A linkage of the topics could be predicted if
the term distributions of the topics harmonise or one topic
adopts words from the other area; the findings in one
topic could then greatly benefit the other. Our model contains this
information, so it could be used for this application.</p>
        <p>
          A visualisation tool for the exploration of found topics,
relationships and trends in the data would be beneficial for
researchers, politicians and entrepreneurs [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Past work on
the exploration of topics or trends in bibliographic data
sometimes lacks support for growing and big data sets [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]
or is based on a topic model with a fixed number of topics [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. A
tool using our model and data would inherently avoid these
weaknesses.
        </p>
        <p>This work proposed a hybrid approach which aims at
modelling the agile evolution of topics and trends in a growing
corpus of bibliographic data without a fixed and predefined
number of topics, with the help of an LDA base. Different state
transitions were used to describe the development of topics
over time in detail. A link to trend mining was drawn. The
work concludes with the presentation of an evaluation
concept to confirm the utility of the approach and numerous
application examples to underline the potential of our future
model.</p>
      </sec>
      <sec id="sec-2-9">
        <title>Acknowledgements</title>
        <p>Special thanks go to my supervisor Ralf Schenkel for his
invaluable support.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Asooja</surname>
          </string-name>
          , G. Bordea, G. Vulcu, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Forecasting emerging trends from scientific literature</article-title>
          .
          <source>In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)</source>
          , Portoroz, Slovenia, May 23-28,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          .
          <article-title>Correlated topic models</article-title>
          .
          <source>In Advances in Neural Information Processing Systems 18 (NIPS 2005), December 5-8, 2005, Vancouver, British Columbia, Canada</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>154</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          .
          <article-title>Dynamic topic models</article-title>
          .
          <source>In Machine Learning, Proceedings of the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25-29, 2006</source>
          , pages
          <fpage>113</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Jordan</surname>
          </string-name>
          .
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>3</volume>
          :
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Mimno</surname>
          </string-name>
          .
          <article-title>Applications of topic models</article-title>
          .
          <volume>11</volume>
          :
          <fpage>143</fpage>
          -
          <lpage>296</lpage>
          , January
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Chaney</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          .
          <article-title>Visualizing topic models</article-title>
          .
          <source>In Proceedings of the Sixth International Conference on Weblogs and Social Media</source>
          , Dublin, Ireland, June 4-7,
          <year>2012</year>
          .
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Fiallos Ordoñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jimenes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vaca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Ochoa</surname>
          </string-name>
          .
          <article-title>Scientific communities detection and analysis in the bibliographic database: Scopus</article-title>
          . April
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Gibbs</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. E.</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>On choosing and bounding probability metrics</article-title>
          .
          <source>International Statistical Review</source>
          , pages
          <fpage>419</fpage>
          -
          <lpage>435</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Glänzel</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Thijs</surname>
          </string-name>
          .
          <article-title>Using 'core documents' for detecting and labelling new emerging topics</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>91</volume>
          (
          <issue>2</issue>
          ):
          <fpage>399</fpage>
          -
          <lpage>416</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Herrmannova</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Knoth</surname>
          </string-name>
          .
          <article-title>Semantometrics: Towards fulltext-based research evaluation</article-title>
          .
          <source>In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016</source>
          , Newark, NJ, USA, June 19-23, 2016, pages
          <fpage>235</fpage>
          -
          <lpage>236</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mitra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <article-title>RefSeer: A citation recommendation system</article-title>
          .
          <source>In IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014</source>
          , London, United Kingdom, September 8-12, 2014, pages
          <fpage>371</fpage>
          -
          <lpage>374</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Integrating the trend of research interest for reviewer assignment</article-title>
          .
          <source>In Proceedings of the 26th International Conference on World Wide Web Companion</source>
          , Perth, Australia, April 3-7, 2017, pages
          <fpage>1233</fpage>
          -
          <lpage>1241</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Knoth</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Herrmannova</surname>
          </string-name>
          .
          <article-title>Towards semantometrics: A new semantic similarity based measure for assessing a research publication's contribution</article-title>
          .
          <source>D-Lib Magazine</source>
          ,
          <volume>20</volume>
          (
          <issue>11/12</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>B.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Robertson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Czerwinski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Tan</surname>
          </string-name>
          .
          <article-title>FacetLens: exposing trends and relationships to support sensemaking within faceted datasets</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009</source>
          , Boston, MA, USA, April 4-9, 2009, pages
          <fpage>1293</fpage>
          -
          <lpage>1302</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ley</surname>
          </string-name>
          .
          <article-title>DBLP - some lessons learned</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1493</fpage>
          -
          <lpage>1500</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Suel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Memon</surname>
          </string-name>
          .
          <article-title>A robust model for paper reviewer assignment</article-title>
          .
          <source>In Eighth ACM Conference on Recommender Systems, RecSys '14</source>
          , Foster City, Silicon Valley, CA, USA, October 6-10, 2014, pages
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Livne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Adar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Teevan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          .
          <article-title>Predicting citation counts using text and graph mining</article-title>
          .
          February
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rosen-Zvi</surname>
          </string-name>
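          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Griffiths</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Steyvers</surname>
          </string-name>
          , and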
          <string-name>
            <given-names>P.</given-names>
            <surname>Smyth</surname>
          </string-name>
          .
          <article-title>The author-topic model for authors and documents</article-title>
          .
          <source>In UAI '04, Proceedings of the 20th Conference in Uncertainty in Artificial Intelligence</source>
          , Banff, Canada, July 7-11, 2004, pages
          <fpage>487</fpage>
          -
          <lpage>494</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Salatino</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>Detection of embryonic research topics by analysing semantic topic networks</article-title>
          . In A. Gonzalez-Beltran, F. Osborne, and S. Peroni, editors,
          <source>Semantics, Analytics, Visualization. Enhancing Scholarly Data</source>
          , pages
          <fpage>131</fpage>
          -
          <lpage>146</lpage>
          , Cham,
          <year>2016</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Siebert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dinesh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Feyer</surname>
          </string-name>
          .
          <article-title>Extending a research-paper recommendation system with bibliometric measures</article-title>
          .
          <source>In Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017)</source>
          , Aberdeen, UK, April 9, 2017, pages
          <fpage>112</fpage>
          -
          <lpage>121</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sinha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Eide</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.-J. P.</given-names>
            <surname>Hsu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>An overview of Microsoft Academic Service (MAS) and applications</article-title>
          .
          <source>In Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion</source>
          , pages
          <fpage>243</fpage>
          -
          <lpage>246</lpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>ArnetMiner: Extraction and mining of academic social networks</article-title>
          .
          <source>In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08</source>
          , pages
          <fpage>990</fpage>
          -
          <lpage>998</lpage>
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cabanac</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hubert</surname>
          </string-name>
          .
          <article-title>Expert suggestion for conference program committees</article-title>
          .
          <source>In 11th International Conference on Research Challenges in Information Science, RCIS 2017</source>
          , Brighton, United Kingdom, May 10-12, 2017, pages
          <fpage>221</fpage>
          -
          <lpage>232</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>