=Paper= {{Paper |id=Vol-2126/paper5 |storemode=property |title=A Hybrid Approach for Dynamic Topic Models with Fluctuating Number of Topics |pdfUrl=https://ceur-ws.org/Vol-2126/paper5.pdf |volume=Vol-2126 |authors=Christin Katharina Kreutz |dblpUrl=https://dblp.org/rec/conf/gvd/Kreutz18 }} ==A Hybrid Approach for Dynamic Topic Models with Fluctuating Number of Topics== https://ceur-ws.org/Vol-2126/paper5.pdf

A Hybrid Approach for Dynamic Topic Models with
Fluctuating Number of Topics

Christin Katharina Kreutz
Trier University
54286 Trier, DE
kreutzch@uni-trier.de

ABSTRACT
Scientific communities are always changing and evolving. Topics of today might split or even disappear in the future, other topics might merge or appear at some time. Nowadays, the closest we come to picture these developments are dynamic topic models which come with a fixed number of topics k. It would be desirable to omit k. This work outlines a research agenda for approaching that task by using LDA as a base in combination with the observation of state transitions in topics at consecutive times.

Categories and Subject Descriptors
H.1 [Models and Principles]: Document Topic Models;
I.5 [Pattern Recognition]: Trend Mining

General Terms
Algorithms

Keywords
Trend Mining, Dynamic Topic Models, LDA
Figure 1: Simplified visualisation of our research plan.
1. INTRODUCTION
With today’s publication methods, the number of papers increases rapidly. Losing track of the evolution of the majority of themes is common. Simultaneously, identifying important publications is difficult but cardinal for scientists.
Automatic detection of trends and their indicators in a scientific community (trend mining) could benefit researchers, politicians or entrepreneurs who are not ahead of current developments but want to get quick insights into promising areas.
Our goal is to construct a system which autonomously identifies trends and accompanying influential persons and papers from a variety of bibliographic data. The associated research plan is partitioned into three successive steps: First, the transformation of topics generated from a bibliographic data set over time, their assigned papers, authors and keywords should be mapped in a dynamic topic model with a variable number of topics. Second, potential upcoming trends in the topics across the years should automatically be detected, predicted and extracted from this model, so they can be evaluated. And third, influential authors, papers and venues should be determined in these found trends. The resulting new insights about what supports the development of a topic can be used to enhance the identification of trends.
The steps are relatively independent of one another; step two would be applicable to another suitable topic model without requiring a solution of step one. Figure 1 gives a schematic overview of our projected line of action.
In this work, we focus on outlining a research direction for the first step, present the current state of research on related models and mark the problems at hand. We touch on trend mining, before we close with an evaluation plan and an outlook on possible applications for our future model.

2. DEVELOPMENT OF TOPICS
We assume the importance and set of topics is not static over time. Topics might sprout, expand, diminish, split, merge or vanish. Terms that represent the topics change as new words appear [5]. To better understand the dynamics of topics, we wanted to observe real bibliographical data.

30th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 22.05.2018 - 25.05.2018, Wuppertal, Germany.
Copyright is held by the author/owner(s).

2.1 Notation
Before diving into details of our experiments or the proposed model, some basic terms need to be set in order to formally discuss our concepts.
A paper has a number of fundamental, possibly latent, ideas. They can be grouped by motive into more general topics denoted by si. By observing co-occurring topics and terms in papers, conclusions about the assignment of terms to topics can be drawn. Topics can be term-wise alike or (partially) overlap with other topics. Assertions on this can be derived from the term distributions for topics.
The total time observed t can be sliced into disjoint consecutive intervals which are called times t0, ..., tn. Given two times tx and ty, if x < y, tx indicates an interval (and real period) before ty. Given two times tx and tx+1, tx describes the interval immediately before tx+1.
Publications can be uniquely attached to intervals if the time is sliced by year and their year of publication determines the assignment. Exact publication dates are mostly not available. This classification is an approximate observation raster as in theory there is a time continuum and in reality we only have rough year specifications. States of topics are regarded at times.
A topic si is said to be trending at time tx+y, y ≥ 1, if it is unpopular or not even existing at time tx, but its significance soars. This could be indicated by an increasing number of publications targeting this subject or its appearance in important journals or conferences. Essential members of the scientific community might start to work in this direction or the subject builds its own experts who become widely known.
A topic that has not (yet) been assigned any publications is described by s∅. This case occurs before a topic is born or if it is inactive. A topic is inactive if the number of publications assigned to the topic does not surpass a threshold or if papers assigned to this topic only cite papers from the same topic and are only cited by papers from this area. The topic has hardly any influence on the rest of the corpus. The community which works on this is very tightly connected but relatively isolated from the rest of the scientific world. These enclaves can be described as sects.
Opposing inactive topics are active topics. The set of active topics at a time tx can be identified by kx. The set of inactive topics at a time tx can be described by k̄x.

2.2 Data Set
The data set used in this research is an incompletely enriched form of the dblp computer science bibliography data with part of the data from open academic graph. The dblp data contains bibliographic information related to publications, authors, conferences and journals from the field of computer science and adjacent areas [15]. As of February 2018, it holds metadata of over 4 million publications and more than 2 million authors. The Microsoft Academic Graph within open academic graph is used. It contains over 166 million publications and amongst others citation information, abstracts and details on authors [22, 21].
In our set, data from dblp was used completely. In addition, where publications could be matched based on DOI or title and author matches where DOI information was not available, information from open academic graph was included. The extension contains author affiliations, citation data, abstracts, full texts, keywords and topics. The structure of the data set is depicted in Figure 2. Because we only focus on bibliographic information, further data sources like Twitter are not incorporated in our set.
For the experiments in this paper, only the data contained in dblp as well as abstracts were taken into consideration. At the moment, full texts are only available for a certain small area in computer science so the usage of them could have distorted the outcome of our initial trials drastically.

Figure 2: Simplified depiction of the composition of the extended dblp data set. Data is partial.

2.3 Methodology
Of the enriched dblp data, only English publications whose abstract was of considerable length (≥ 10 words, fewer words indicate flawed data) were taken into account. The titles and abstracts were cleaned and stemmed with a Porter stemmer. Afterwards, LDA [4] with k = 100 was run on all 2.5 million of them. We ignore terms occurring in over 50 percent of publications (collection dependent stop words) or in under 100 papers as they are often system names.
A visualisation of the data enabled us to draw conclusions about the characteristics of topics.

2.4 Initial Observations
In Figure 3, the popularity of a topic in relation to all topics in the corpus per year is visualised for the years 1990 to 2015 for four selected topics. We assume the number of topics is appropriate. Different settings can be observed:

• There are subjects which are inactive and whose popularity rises, so they become active, like topic 12, which is about mobile devices.
• There are subjects which were always active and whose popularity increases, as seen in topic 13, which covers terms like management, knowledge and business.
• There are subjects whose popularity declines, such as seen with topic 27, which includes papers concerning logic programming and reasoning.
• There are subjects whose popularity does not really seem to change over the course of years, such as topic 76, which deals with image processing.

In our data set, we found the case of a topic being active at a point in time but unrepresented by publications
(a) Overview of popularity of selected topics, topic distributions of papers are sliced by year. Size of bubble indicates relative importance of topic in all papers from this year.

(b) Topic number with corresponding assigned most important stems.

Topic  10 most important stems
12     mobil, devic, network, commun, peer, music, ad, hoc, messag, wireless
13     manag, knowledg, studi, inform, research, technolog, organ, busi, factor, effect
27     program, logic, fuzzi, oper, reason, gener, comput, base, languag, execut
76     imag, color, reconstruct, map, method, algorithm, base, render, resolut, pixel

Figure 3: Exemplary illustration of the development of selected topics over time and their associated stems by running LDA with k = 100 on the whole extended dblp data set.
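The per-year view of Figure 3 (a) rests on slicing the document topic distributions by year. The aggregation could be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the experiments: each paper is assumed to be given as a (year, topic distribution) pair, the relative importance of a topic in a year is taken to be its mean proportion over that year's papers, and the function name yearly_topic_share as well as the toy data are hypothetical.

```python
from collections import defaultdict


def yearly_topic_share(papers, num_topics):
    """Average topic proportions per publication year.

    papers: iterable of (year, distribution) pairs, where distribution is a
    list of num_topics proportions summing to 1 (e.g. LDA output per paper).
    Returns {year: [share of topic 0, share of topic 1, ...]}.
    """
    sums = defaultdict(lambda: [0.0] * num_topics)
    counts = defaultdict(int)
    for year, dist in papers:
        if len(dist) != num_topics:
            raise ValueError("distribution length does not match num_topics")
        for topic, weight in enumerate(dist):
            sums[year][topic] += weight
        counts[year] += 1
    # Normalise by the number of papers so the shares of a year sum to 1.
    return {
        year: [total / counts[year] for total in totals]
        for year, totals in sums.items()
    }


# Toy data (hypothetical): three papers, two topics.
toy_papers = [
    (1990, [0.75, 0.25]),
    (1990, [0.25, 0.75]),
    (1991, [0.5, 0.5]),
]
shares = yearly_topic_share(toy_papers, num_topics=2)
```

The resulting shares per topic and year could then be drawn as bubble sizes.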

for a few following years. Later, it re-emerged. The topic’s top keywords contained cloud, so early publications with a portion of this topic might have a background in weather, whereas the late publications which were (partly) assigned to the topic probably pick up on cloud computing.
The importance and number of active topics is highly varying throughout the years.

3. PROBLEM
Topics can be generated from a corpus by several probabilistic topic models. The most popular ones all have the significant weakness of an unchangeable number of topics. Before we dive into the problem, we present some existing methods.

3.1 Topic Models
The assignment of topics to papers can be performed by a number of approaches. The simplest one would be Latent Dirichlet Allocation (LDA). Here, it is assumed that every document is a mixture of topics and every word in the documents comes from a specific drawn topic. There are no words that are partially assigned to no or even a residue topic. Hidden random variables contain information on the structure of topics in the documents. First, topic proportions for a document are drawn. After this step, for every position of a word in the document, a topic is drawn from this distribution. In the last part, actual words are drawn from the topic word distribution. LDA and models building on it assume that documents are interchangeable in time. The number of topics k is fixed for a corpus and has to be chosen beforehand. The vocabulary of the corpus is also fixed. [4]
A lot of approaches build upon LDA, such as the Author-Topic Model (ATM). Here, an additional dimension, the authors, is taken into account. The individual author codetermines the topic from which a word is drawn. [18]
The correlation of topics was presented with Correlated Topic Models (CTM). Here, LDA was modified so that, instead of drawing topic distributions for documents from a Dirichlet distribution, they were now taken from a logistic normal distribution. [2]
The temporal aspect of a collection and the development of topics had been widely disregarded until the introduction of Dynamic Topic Models (DTM). This method extends CTM by dividing a corpus by year so the topic distribution can change over time. Topics in slice tx+1 are derived from the topics in slice tx. Words assigned to a subject are variable but k is still fixed. Information relating to authors is not used but papers are no longer interchangeable. [3]

3.2 Problem Description
The described methods cannot fully map the dynamics in a corpus, as the number of topics k is unchangeable. If data up until a point in time tx is used to generate a DTM, at time tx+1 new publications can only be assigned to these already existing k topics. If DTM were run with new publications and k + n topics, the resulting topics would not necessarily represent the former k and additional n new ones even closely. Changing k slightly results in a different document topic distribution.
An easy way to capture the dynamics of topics would be to find a suitable k, perform LDA on the whole corpus, slice the corpus by year and look at topics changing over time like we did in our experiment. Trends could be found retrospectively. If new data is integrated, LDA could be used another time on all the publications. Again, trends could be located in retrospect. Big disadvantages are the determination of k and the inability to map the topics of the first run to the topics of the subsequent runs, especially if k is incremented. Terms which get mapped to subjects shift and it is impossible to regain old patterns. It would be unfeasible to measure if the identification of future trends was successful.
Emergence, disappearance, splitting and merging of topics over the course of time cannot be modelled with existing probabilistic topic models. Changes in subjects are indicators for trends and should therefore be observed.
There are other approaches to find trends which make use of a number of other features: Asooja et al. utilise keyword distributions on textual information [1], Glänzel et al. work on citations and textual information [9], Salatino et al. observe a topic network deployed from connections between keywords, publications, authors, venues and organisations [19].
Current methods usually only use a small portion of the spectrum of available data. A model which incorporates authors, affiliations as well as scientometric measures [20, 13, 10], publication information such as citations [17] and venues in addition to titles, abstracts, full texts, keywords and topics has the potential to detect trends reliably.

a) si → si
b) si → si′, ..., si″
c) si, ..., sj → sij
d) si → s∅
e) s∅ → si
f) si → s∅ → si
Time axis: tx, tx+1, tx+..., tx+y

Figure 4: Possible state transitions of topics si over time t.

4. HYBRID APPROACH
Our theoretic approach is based on the assumption that there are different topic state transitions. They need to be represented by our model.

4.1 Evolution of Topics over Time
We identified possible state transitions with which the evolution of topics can be described; they are shown in Figure 4. There are six distinguishable forms: Case a) shows a topic which does not significantly change, b) shows the split of a topic si into possibly numerous topics si′, ..., si″ that are somewhat coherent or the emergence of a topic si″ from an already existing (and persisting) topic si, c) shows the merging of possibly numerous disconnected topics si, ..., sj into one, d) shows a vanishing topic, e) shows the birth of a new topic and f) shows a combination of cases d) and e) with the anomaly that the topic si is inactive for a span of time and re-emerges as the same topic. The different transitions can be joined ad libitum.
An example for a) could be the image topic we already encountered in Figure 3. The distribution of words in the topic surely changes over time, because the fundamental terms vary, though the overall motive in them stays the same. As an instance of case b), algorithms concerning depth-first search could be the base from which other algorithms, such as ones for the computation of strongly connected components, were derived. The original topic persisted while new ones were emerging from it. A topic describing machine learning might be a good example of case c). Many areas treating algorithms are collapsing into this big one, as machine learning has the potential to outperform even the most refined hand-crafted approaches. If a topic describes RSA, it could fall into category d), as it is no longer considered safe, therefore publications concerning this subject are most likely going to decrease over the next years until the topic is inactive. This is a good candidate for the forming of a sect. The development of a topic for quantum computers could be mapped to case e). It somewhat was the birth of this topic in computer science. There certainly were influences from different communities on the subject but in a corpus restricted to information technology, the representation might be fitting. As neural networks are currently experiencing a renaissance, they are an example of f).

4.2 Hybrid Topic Model
Our future model needs to be able to find and represent all described transitions of topics. In the following, we explain the core components of a hybrid model.
The rough plan would be to split t in years and use LDA to generate a baseline of topics for t0. For every new year, the topics of the prior year need to be considered when calculating the current developments. Citations are a key part in this as they indicate how information is being spread. At time tx+1, we examine kx as well as k̄x and observe co-authorships, used words and how new publications cite already classified papers. By looking at the topic distributions and summing the percentages for each topic, it can be calculated which topics are cited with corresponding weights by a new paper. With for example the Wasserstein metric [8], the distance between term distributions of topics disttd is calculated as their difference. A threshold thtd describes the distance value over which topic term distributions are considered dissimilar.
For every topic, the following strategies decide which state transition has occurred from tx to tx+1:

a) With the first case, there is no major change in underlying motives from tx to tx+1. Publications in this topic reference about the same topics that were cited at tx and thtd > disttd. The content in cited publications is typically pretty similar to the content of the new ones.
b) In this situation, we have the same phenomena as in case a) but a clustering on publications of this topic produces multiple distinguishable groups which are regarded as new topics split from the old one, thtd < disttd amongst the new topics. New words are likely to occur in the publications. If they solely appear in the papers from this area and not throughout the whole corpus, they strongly hint at a change or split in the topic.
c) If a merging of topics occurs, the witnessed effects will resemble those of case a), although publications which would be assigned to prior topics harmonise their term distributions and citation behaviour. A clustering would group the topics together.
d) A dying topic gets no or few new publications assigned to it. The number of papers in this topic might already have been declining for a few years. A topic becoming inactive all of a sudden is highly unlikely.
e) If a new topic emerges, publications do not really match term distributions of existing ones. They usually cite a lot of different topics as they have no clear predecessor. The overlap of content from cited papers (not topics) by a new publication and the citing paper should be calculated, as it is deemed to be rather small.
f) With the sudden re-emergence of a topic, the term distribution of publications matches a topic in k̄x.

After the topic distributions for the new publications are computed, the then active and inactive topics are assigned to kx+1 and k̄x+1 respectively. A run concludes with the processing of the next year of papers in the same manner.

4.3 Topic Development Prediction and Trend Mining
Predicting the development of a topic is directly linked to trend mining. Topics which are about to blow up are future trends. The upcoming number of publications in a field, the estimation of citations a new paper is going to gain [17] and possible collaborations between researchers can only be computed if the underlying author-publication-graph of the past is thoroughly analysed and influences on its evolution are discovered.
The computation of trends in currently active topics is a step which follows directly from the hybrid topic model. Topics which changed a lot from tx to tx+1 are candidates for trends. Not only the development of topics from the last to the current time frame is going to be observed; the overall behaviour of the term distributions and cited topics is relevant as well. The appearance of new and popular words in the assigned terms of a topic could signal the beginning of a trend and is worth further investigation.
Often, popular papers are written by well-known and highly linked authors, they appear in journals with a lot of impact or are presented at seminal conferences. Here, the enriched data is going to be used. A co-author-graph with researchers’ affiliations linked to a paper-citation-graph complete with venues and relationships between journals and conferences could help discover core persons [7], venues and publications in topics and trends. Sometimes, trends also develop from sects, so they have to be steadily looked at. Topics which were active in tx+1 are judged on whether they are likely going to be trending in the future. The evolution can be predicted based on the progress of the topic and the found influences.

5. FUTURE PROSPECTS
After completing the construction of our hybrid approach, an evaluation of the proposed system needs to prove and quantify its validity. Furthermore, several practical uses for the model are presented.

5.1 Evaluation Plan
The evaluation of our planned system, which includes the trend mining part, contains multiple steps. The results need to be cross-validated.
Our hybrid model is going to be run on a base of data up until 1995, then topic developments are computed by the iterative part with data for the next 10 years. For the following 5 years, trends are predicted. Afterwards, a manual evaluation of our model and the found trends involves expert researchers from different domains within computer science. A list which contains our results is presented to them. They should rate it against the real trends with corresponding years.
Additionally, the trends, important researchers and venues identified by our system will be presented to those experts. They then should rank the correctness of the findings.
An automatic method to quantify the accuracy of the model would involve the observation of data up until a time tx. Potential trends at this time will be detected, their evolution and future importance is going to be predicted for the succeeding five years and the predictions will be compared to the real development of significance of these topics. Numbers of papers from topics and citation behaviour could be prognosticated. If there are discrepancies between predicted and real data, a manual step could be added in which experts are asked to explain the actual development.
The hybrid approach also needs to be tested against the purely incremental model which does not use LDA with a predetermined k as first step.

5.2 Applications
Possible applications of the dynamic topic model with varying number of topics complete with the identification of trends are manifold. A reviewer recommendation system for given publications, a citation recommendation system, a keynote speaker recommendation system or a visualisation tool for exploring bibliographic data with special focus on trends could be constructed.
Some reviewer recommendation systems work on word topic and topic citation distributions [11] or are only usable for already established conferences as they use former program committees [23]. Others are more refined and want to integrate the research interest and direction of scientists into the recommendations [16, 12]. Our model is independent of past conferences. It could make use of the enriched author-publication-graph to find scientists capable and willing to review new publications from the field of their current research interest. As the available data for this task is extensive, the results could be excellent.
Citation recommendation systems suggest fitting publications based on their content, but they do not focus on returning fundamental papers which lead the way of a topic or those written by influential authors for an area [11]. The relative importance of a paper for an area and its development is not considered. With our hybrid model, the identification of influential papers and persons is a by-product and could be easily incorporated in such a system.
Keynote speakers for a conference from topic si should be influential scientists from a different topic sj, which is related to si. A linkage of the topics could be predicted if the term distributions of the topics harmonise or one topic adopts words from the other area. The findings in one topic could highly benefit the other. Our model contains this information so it could be used for this application.
A visualisation tool for the exploration of found topics, relationships and trends in the data would be beneficial for researchers, politicians and entrepreneurs [5]. Past work on the exploration of topics or trends in bibliographic data sometimes lacks the support for growing and big data sets [14] or is based on a topic model with fixed number of topics [6]. A tool using our model and data would inherently dodge these weaknesses.
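Looking back at the distance test of Section 4.2, the comparison of disttd against thtd could be sketched as follows. This is a minimal illustration under stated assumptions, not the planned implementation: term distributions are assumed to be dictionaries mapping stems to probabilities, and the trivial 0/1 ground metric between distinct terms is used, under which the Wasserstein distance coincides with the total variation distance. The names term_distance and is_dissimilar, the toy distributions and the threshold value are hypothetical.

```python
def term_distance(p, q):
    """Total variation distance between two term distributions.

    p, q: dicts mapping a term (stem) to its probability; missing terms
    count as probability 0. With a 0/1 ground metric between distinct terms
    this equals the Wasserstein-1 distance (an assumption of this sketch).
    """
    terms = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in terms)


def is_dissimilar(p, q, th_td):
    """True if the distance exceeds the threshold, i.e. dist_td > th_td."""
    return term_distance(p, q) > th_td


# Toy term distributions (hypothetical values).
topic_at_tx = {"imag": 0.5, "color": 0.25, "pixel": 0.25}
topic_at_tx1 = {"imag": 0.5, "color": 0.25, "render": 0.25}
other_topic = {"logic": 0.5, "program": 0.5}

TH_TD = 0.5  # hypothetical threshold, would have to be tuned on real data

stable = not is_dissimilar(topic_at_tx, topic_at_tx1, TH_TD)
unrelated = is_dissimilar(topic_at_tx, other_topic, TH_TD)
```

In this toy setting the image topic at tx and tx+1 stays below the threshold, as expected for case a), while the unrelated topic exceeds it. A ground metric that reflects similarity between terms would be a natural refinement.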
6. CONCLUSION
This work proposed a hybrid approach which aims at modelling the agile evolution of topics and trends in a growing corpus of bibliographic data without a fixed and predefined number of topics with help of an LDA base. Different state transitions were used to describe the development of topics over time in detail. A link to trend mining was drawn. The work concludes with the presentation of an evaluation concept to confirm the utility of the approach and numerous examples of use to underline the potential of our future model.

Acknowledgements
Special thanks go to my supervisor Ralf Schenkel for his invaluable support.

7. REFERENCES
[1] K. Asooja, G. Bordea, G. Vulcu, and P. Buitelaar. Forecasting emerging trends from scientific literature. In Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, Portorož, Slovenia, May 23-28, 2016, 2016.
[2] D. M. Blei and J. D. Lafferty. Correlated topic models. In Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, NIPS 2005, December 5-8, 2005, Vancouver, British Columbia, Canada], pages 147–154, 2005.
[3] D. M. Blei and J. D. Lafferty. Dynamic topic models. In Machine Learning, Proceedings of the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25-29, 2006, pages 113–120, 2006.
[4] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[5] J. Boyd-Graber, Y. Hu, and D. Mimno. Applications of topic models. 11:143–296, 01 2017.
[6] A. J. Chaney and D. M. Blei. Visualizing topic models. In Proceedings of the Sixth International Conference on Weblogs and Social Media, Dublin, Ireland, June 4-7, 2012, 2012.
[7] A. Fiallos Ordoñez, K. Jimenes, C. Vaca, and X. Ochoa. Scientific communities detection and analysis in the bibliographic database: Scopus, 04 2017.
[8] A. L. Gibbs and F. E. Su. On choosing and bounding probability metrics. INTERNAT. STATIST. REV., pages 419–435, 2002.
[9] W. Glänzel and B. Thijs. Using ’core documents’ for detecting and labelling new emerging topics. Scientometrics, 91(2):399–416, 2012.
[10] D. Herrmannova and P. Knoth. Semantometrics: Towards fulltext-based research evaluation. In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, JCDL 2016, Newark, NJ, USA, June 19 - 23, 2016, pages 235–236, 2016.
[11] W. Huang, Z. Wu, P. Mitra, and C. L. Giles. Refseer: A citation recommendation system. In IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014, London, United Kingdom, September 8-12, 2014, pages 371–374, 2014.
[12] J. Jin, Q. Geng, Q. Zhao, and L. Zhang. Integrating the trend of research interest for reviewer assignment. In Proceedings of the 26th International Conference on World Wide Web Companion, Perth, Australia, April 3-7, 2017, pages 1233–1241, 2017.
[13] P. Knoth and D. Herrmannova. Towards semantometrics: A new semantic similarity based measure for assessing a research publication’s contribution. D-Lib Magazine, 20(11/12), 2014.
[14] B. Lee, G. Smith, G. G. Robertson, M. Czerwinski, and D. S. Tan. Facetlens: exposing trends and relationships to support sensemaking within faceted datasets. In Proceedings of the 27th International Conference on Human Factors in Computing Systems, CHI 2009, Boston, MA, USA, April 4-9, 2009, pages 1293–1302, 2009.
[15] M. Ley. DBLP - some lessons learned. PVLDB, 2(2):1493–1500, 2009.
[16] X. Liu, T. Suel, and N. D. Memon. A robust model for paper reviewer assignment. In Eighth ACM Conference on Recommender Systems, RecSys ’14, Foster City, Silicon Valley, CA, USA - October 06 - 10, 2014, pages 25–32, 2014.
[17] A. Livne, E. Adar, J. Teevan, and S. Dumais. Predicting citation counts using text and graph mining. February 2013.
[18] M. Rosen-Zvi, T. L. Griffiths, M. Steyvers, and P. Smyth. The author-topic model for authors and documents. In UAI ’04, Proceedings of the 20th Conference in Uncertainty in Artificial Intelligence, Banff, Canada, July 7-11, 2004, pages 487–494, 2004.
[19] A. A. Salatino and E. Motta. Detection of embryonic research topics by analysing semantic topic networks. In A. González-Beltrán, F. Osborne, and S. Peroni, editors, Semantics, Analytics, Visualization. Enhancing Scholarly Data, pages 131–146, Cham, 2016. Springer International Publishing.
[20] S. Siebert, S. Dinesh, and S. Feyer. Extending a research-paper recommendation system with bibliometric measures. In Proceedings of the Fifth Workshop on Bibliometric-enhanced Information Retrieval (BIR) co-located with the 39th European Conference on Information Retrieval (ECIR 2017), Aberdeen, UK, April 9th, 2017, pages 112–121, 2017.
[21] A. Sinha, Z. Shen, Y. Song, H. Ma, D. Eide, B.-J. P. Hsu, and K. Wang. An overview of microsoft academic service (mas) and applications. In Proceedings of the 24th International Conference on World Wide Web, WWW ’15 Companion, pages 243–246, New York, NY, USA, 2015. ACM.
[22] J. Tang, J. Zhang, L. Yao, J. Li, L. Zhang, and Z. Su. Arnetminer: Extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’08, pages 990–998, New York, NY, USA, 2008. ACM.
[23] H. D. Tran, G. Cabanac, and G. Hubert. Expert suggestion for conference program committees. In 11th International Conference on Research Challenges in Information Science, RCIS 2017, Brighton, United Kingdom, May 10-12, 2017, pages 221–232, 2017.