<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Approaches to assessing the semantic similarity and future citation of publications by identifying informative terms with predictive properties</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>A.Kh. Khakimova</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ANO «Scientific and Research Center for Information in Physics and Technique»</institution>
          ,
          <addr-line>Nizhny Novgorod</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>FRC CSC of the Russian Academy of Sciences</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Khakimova Aida Kh., PhD, docent, Kama Institute (Naberezhnye Chelny, Russia), ANO «Scientific and Research Center for Information in Physics and Technique» (Nizhny Novgorod</institution>
          ,
          <addr-line>Russia), Е-mail:</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The article discusses new approaches to assessing the semantic similarity of documents in a vector space, taking into account statistically significant and informative terms. Informative terms reflect the current state of research in a given field. To select informative terms, an algorithm for calculating the impact factor of a term (IFT) is proposed. It is shown that informative terms make it possible both to evaluate the semantic similarity of texts and to predict future citations. The developed methods for assessing the semantic similarity and future impact of scientific publications can be used in the framework of “predictive optimization”, a modern technology for making decisions based on forecasts. In evaluating research activity and individual scientists, bibliometric indicators often play an important role. However, citation-based indicators are problematic for determining the impact of recent publications: two years after publication, most articles have received only a few citations. The probability of future citation can be predicted using the proposed indicator, the IFT.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Measuring the similarity between documents is an
important component in various tasks such as document
clustering, topic detection, topic tracking, question
answering, information retrieval and text summarization.</p>
      <p>
        For scientific articles, there are two main types of
similarity measures: citation-based similarity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and
semantic textual similarity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These two types of
similarity measures should correlate, and maximizing this
correlation is a convenient way to tune the coefficients
and parameters on which these measures depend.
      </p>
      <p>
        Citation-based similarity measures such as
bibliographic coupling (if two documents share a reference
in their bibliography) and co-citation (if two documents are
cited by a third document) are an integral component of
many information retrieval systems. Semantic textual
similarity measures analyze situations where two
documents share certain words (co-word linkages [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]),
phrases or ideas [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
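      <p>The two citation-based measures described above can be illustrated with a minimal sketch. The toy citation graph, the identifiers and the function names below are hypothetical and serve only to show how the counts could be obtained; this is not the implementation used in the cited systems.</p>
      <preformat>
```python
from collections import Counter
from itertools import combinations

# Hypothetical toy citation graph: paper -> set of papers it cites.
references = {
    "P1": {"R1", "R2"},
    "P2": {"R2", "R3"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2", "R1"},
}


def bibliographic_coupling(refs):
    """Count the references shared by every pair of citing papers."""
    strength = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a].intersection(refs[b]))
        if shared:
            strength[(a, b)] = shared
    return strength


def co_citation(refs):
    """Count how many papers cite both members of each pair."""
    counts = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


coupling = bibliographic_coupling(references)
cocited = co_citation(references)
print(coupling)
print(cocited)
```
      </preformat>
      <p>In this toy graph, P3 and P4 are coupled through two shared references, while P1 and P2 are co-cited by two later papers.</p>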
      <p>
        Latent Semantic Analysis (LSA) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Generalized
Latent Semantic Analysis (GLSA) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are among the most popular
corpus-based techniques for semantic textual similarity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
GLSA extends the LSA approach by focusing on term
vectors instead of the dual document-term representation.
      </p>
      <p>
        Efficiently filtering out non-informative words remains
a problem. LSA and GLSA suffer from noise introduced by
typos and by infrequent and non-informative
words [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. To solve this problem, we present a new
citation-based method that efficiently filters the core
vocabulary and keeps only content-bearing words. The
method is called the Impact Factor of
Terms (IFT) and is described in Section 2. IFT assesses the
significance and informational content of terms in scientific
articles based on citation analysis of the articles containing these
terms. IFT is also useful for predicting future citations and
promising topics in different subject areas, such as smart
energy systems.
      </p>
      <p>Maximizing correlation between citation-based
similarity and IFT-based semantic textual similarity is a
convenient way to adjust the coefficients and parameters of
the IFT method.</p>
      <p>IFT is similar to the journal impact factor (JIF), which has
been used for many years and has proven effective. JIF is a
scientometric index that reflects the yearly average number
of citations received by articles published in a given journal
during the last two years. If all articles of a journal are highly
cited, then this journal has a high JIF value and is
considered significant and authoritative. Similarly, if all
articles sharing some term are highly cited, then this
term has a high IFT value and is considered significant and
informative. The IFT helps to identify informative terms
that indicate significant fundamental ideas. Words and
terms with a consistently high IFT (for example, neural
networks) denote significant ideas in which interest has been
stable for many years. Such words also show a high
correlation between the IFT values of the current and the next year.
This correlation, as well as the conditions for the stability
and predictability of the IFT, is discussed in Section 4.
Section 3 describes the collection of articles used in
experiments to study the empirical properties of IFT,
including its correlations. The next section gives a formal
description of the IFT.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Impact Factor of Terms (IFT)</title>
      <p>There are currently several journal ranking systems, but
the oldest and most influential is the journal impact
factor (JIF). JIF is used as an indicator of the importance of
a journal for its field.</p>
      <p>A journal's impact factor is based on how often articles
published in that journal during the previous two years (e.g.
2017 and 2018) were cited by articles published in a
particular year (e.g. 2019).</p>
      <p>The higher the JIF, the more often articles in that
journal are cited by other articles. Thus, the impact factor
can give an approximate idea of how prestigious the
journal is in its field of science.</p>
      <p>
        The journal with the highest JIF is the one that
publishes the most frequently cited articles over a two-year
period. One easy way to increase JIF is to publish more
review articles, which are usually cited more often than
research reports [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Author Impact Factor (AIF) is an extension of the
impact factor for authors. The AIF of an author A in year t
is the average number of citations given by papers
published in year t to papers published by A in a period of
Δt years before year t. AIF is able to capture trends and
variations in the influence of scientists over time, in
contrast to the h-index, which is a measure that takes into
account the entire career path [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>We propose to extend the impact factor idea to
terms by introducing a new numerical indicator of the authority
of words and terms, called the impact factor of the term
(IFT).</p>
      <p>IFT (formula 1) can be used to effectively filter the
dictionary, excluding uninformative words and terms. With
the help of IFT, we can identify promising topics and ideas,
find implicit links between articles and texts, and discover
ideologically influential sites.
<disp-formula id="eq-1"><label>(1)</label><tex-math>\mathrm{IFT}_t = \frac{A_t}{N_t},</tex-math></disp-formula>
where A<sub>t</sub> is the number of citations made by articles with the term
A published in year t to articles with the term A published in the
∆t years before year t, and N<sub>t</sub> is the total number of articles with
term A over the time period of ∆t + 1 years.</p>
      <p>Therefore, the IFT of term A in year t is the average
number of citations made by articles with term A
published in year t to articles with term A published in the ∆t
years before year t.</p>
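      <p>As an illustration, the definition of the IFT given in this section can be sketched on a toy data set. The data, the function name and the substring test on titles are hypothetical. The sketch also assumes that cited articles come from the ∆t years before year t and that the denominator counts term articles published in the ∆t + 1 years ending in year t; the text leaves these window boundaries open to interpretation.</p>
      <preformat>
```python
def ift(articles, citations, term, year, dt=2):
    """Sketch of the impact factor of `term` in `year`.

    articles:  dict mapping article id to (publication year, title)
    citations: list of (citing id, cited id) pairs
    """
    def has_term(article_id):
        return term in articles[article_id][1].lower()

    def pub_year(article_id):
        return articles[article_id][0]

    # A_t: citations made by term articles of `year` to term articles
    # published in the dt years before `year` (assumed window).
    a_t = sum(
        1
        for citing, cited in citations
        if has_term(citing)
        and has_term(cited)
        and pub_year(citing) == year
        and pub_year(cited) in range(year - dt, year)
    )
    # N_t: term articles published in the dt + 1 years ending in `year`.
    n_t = sum(
        1
        for article_id in articles
        if has_term(article_id)
        and pub_year(article_id) in range(year - dt, year + 1)
    )
    return a_t / n_t if n_t else 0.0


# Hypothetical toy collection.
articles = {
    1: (2015, "Neural networks for vision"),
    2: (2016, "Deep neural models"),
    3: (2017, "Neural sequence learning"),
    4: (2017, "Fuzzy control"),
    5: (2017, "Neural parsing"),
}
citations = [(3, 1), (3, 2), (5, 2), (4, 1), (5, 3)]

# 3 qualifying citations over 4 articles with the term.
ift_neural_2017 = ift(articles, citations, "neural", 2017)
print(ift_neural_2017)
```
      </preformat>
      <p>Only citations in which both the citing and the cited title contain the term are counted, which mirrors the restriction to titles stated later in this section.</p>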
      <p>It follows from formula (1) that the method can be expected to
increase the correlation between the similarity measure
of texts and their bibliographic relationship, since the IFT
depends linearly on the number of bibliographic references
over the past two years (or over a period of ∆t years).</p>
      <p>Various approaches to the calculation of IFT were
investigated.</p>
      <p>The modified impact factor of the term (IFTm) is the
ratio of citations of articles with term A to the total number
of articles with this term over 3 years.</p>
      <p><disp-formula id="eq-2"><label>(2)</label><tex-math>\mathrm{IFTm} = \frac{A_{t-2} + A_{t-1} + A_t}{N},</tex-math></disp-formula>
where A<sub>t-2</sub> is the number of citations to articles with the term
A made two years ago, counted within that same year; A<sub>t-1</sub> is the number of citations to
articles with term A made last year, counted over that year and the previous
one; A<sub>t</sub> is the number of citations to articles with term A
over the three-year period including the current year; and N is the
total number of articles with term A over the three years.</p>
      <p>Both the IFT and the IFTm are computed only over articles
in which the given term appears in the title. Only citations from
articles containing the specified term in the title are taken
into account.</p>
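      <p>A small worked example may help to read the IFTm ratio. The counts below are hypothetical and are not taken from the AI collection: suppose a term received 2, 5 and 9 citations in the three yearly components of the numerator, and 40 articles with this term were published over the three years.</p>
      <preformat>
```latex
\mathrm{IFTm} = \frac{A_{t-2} + A_{t-1} + A_t}{N}
              = \frac{2 + 5 + 9}{40}
              = \frac{16}{40}
              = 0.4
```
      </preformat>
      <p>Under these assumed counts, an article with the term attracts on average 0.4 citations from other articles with the same term over the three-year window.</p>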
    </sec>
    <sec id="sec-3">
      <title>3. AI collection (Data Set)</title>
      <p>In our experiments, we analyze the DBLP citation network,
a collection of articles on Artificial Intelligence
from 1936 to 2017, compiled by aminer.org and referred to
here as the AI collection.</p>
      <p>The citation data is extracted from DBLP (Digital
Bibliography &amp; Library Project, dblp.org), ACM
(Association for Computing Machinery, acm.org), MAG
(Microsoft Academic Graph), and other sources.</p>
      <p>We used the V10 version released in October 2017.
This data set consists of 3,079,007 articles and 25,166,994
citation relationships. For each article there is a title,
authors, year of publication and references. We have processed
all titles and citation relationships.</p>
      <p>In this paper, the AI collection was analyzed along several
directions, described in the next section.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Results of a statistical analysis of term trends</title>
      <p>The main goal of the statistical analysis of the AI
collection is to study the empirical properties of Impact
Factor of Terms (IFT), including the correlation of its
current and future values to assess its stability and forecast
future citations.</p>
      <p>Statistical analysis of the collection was carried out
using the authors' Trend+ program, which built a frequency
dictionary of all words and terms in the collection. In addition, for
each term with a frequency of more than 5, Trend+
calculated its trend indicators (trending situations),
including the number of articles with this term for the year,
the number of citations from other articles with this term, and
the IFT and IFTm indicators for the current and next year.</p>
      <p>To calculate the correlation, situations (points) were
selected for different words in different years in which the
values of IFT and IFTm for the current year were greater than
zero. There could be several such situations for one word
in different years. The selected situations were divided into
groups according to the number of articles with the word over
the past 3 years. The IFTm groups turned out to contain more
situations than the IFT groups,
because IFTm takes into account more citations. Fig. 1
shows graphs of the number of situations/points in these
groups for calculating correlations.</p>
      <p>Fig. 1. Graphs of the number of points for calculating the correlations of the current and future years for the indicators IFTm (upper)
and IFT, depending on the number of articles with the word in the last 3 years</p>
      <p>In Fig. 1, the upper graph corresponds to the IFTm and
the lower one to the IFT. The y-axis represents the number of points
used to calculate the correlations between the current and future
years. The x-axis represents the frequency of terms, i.e. the
number of articles with the term over the last 3 years. The
maximum on both graphs is reached when the
number of articles is 5, because the experiment did not
analyze terms that occurred fewer than 5 times in the
collection over its entire time span.</p>
      <p>On the IFT graph, the maximum number of points
54326 is reached at X = 5, and the minimum 2423 at X =
50. On the IFTm graph, the maximum number of points
91997 is reached at X = 5, and the minimum 2913 at X =
50.</p>
      <p>For each group of trending situations/points (i.e., for
each X) individually, a correlation was calculated between
the current and future values of IFT and IFTm. The results
of calculating the correlations are shown in Fig. 2.</p>
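      <p>The grouping and per-group correlation described above can be sketched as follows. This is only an illustration under stated assumptions: the use of the Pearson coefficient, the tuple layout of a situation and all names are hypothetical, since the text does not specify how Trend+ computes the correlation.</p>
      <preformat>
```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation, or None when one of the series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def correlation_by_group(situations):
    """Correlate current and next-year values within each frequency group.

    situations: iterable of (article_count, value_now, value_next),
    one tuple per (term, year) situation.
    """
    groups = defaultdict(list)
    for count, now, nxt in situations:
        if now > 0:  # keep only situations with a positive current value
            groups[count].append((now, nxt))
    return {
        count: pearson([p[0] for p in points], [p[1] for p in points])
        for count, points in groups.items()
    }


# Hypothetical situations: (articles in the last 3 years, IFT now, IFT next).
situations = [
    (5, 1.0, 2.0), (5, 2.0, 4.0), (5, 3.0, 6.0),
    (6, 1.0, 3.0), (6, 2.0, 1.0), (6, 0.0, 5.0),
]
by_group = correlation_by_group(situations)
print(by_group)
```
      </preformat>
      <p>Each key of the result corresponds to one value of X on the graphs, so plotting the dictionary values against the keys would give a curve of the kind discussed in this section.</p>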
      <p>Fig. 2. Graph of IFT correlations (upper) and IFTm correlations of the current and future years depending on the number of articles
with the word in the last 3 years</p>
      <p>The upper graph is the IFT correlations, and the lower
graph is the IFTm correlations.</p>
      <p>Both graphs behave very similarly, but the correlations
of the IFT (upper graph) are almost always greater than the
correlations of the IFTm. The correlation on the graphs
reaches 0.5 at a frequency of 17 articles over the past three
years, 0.6 at 26 articles, and 0.7 at 45 articles. Thus, IFT
behaves more stably and predictably than IFTm, but IFTm
covers more different situations and words/terms.</p>
      <p>The graphs show that the higher the current frequency
of the term (the number of articles with the term), the higher
the correlation and, therefore, the more stably the IFT
behaves over time. A stable IFT makes it possible to accurately
predict the average number of future citations, since the IFT
is exactly equal to the average number of citations of
articles with the specified word/term. Thus,
words/terms with high frequencies and high IFT values
define promising topics in different subject areas, such as
artificial intelligence or smart energy systems.</p>
      <p>The most stable and predictable words/terms with high
IFT values are called informative terms. Informative
words/terms have high frequencies and IFT values
above a certain threshold. The form of the function for filtering out
non-informative words, which grows with increasing IFT
and frequency, can be selected by maximizing the
correlation between citation-based similarity and
IFT-based semantic textual similarity. As a first approximation,
this filtering function can be taken as the product of IFT
and frequency with a certain minimum threshold for IFT.</p>
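      <p>The first-approximation filtering function mentioned above can be sketched as follows. The threshold value, the term statistics and the function names are hypothetical; the text fixes only the general form, namely the product of IFT and frequency with a minimum threshold on IFT.</p>
      <preformat>
```python
def informativeness(ift_value, frequency, min_ift=0.5):
    """Score a term as IFT times frequency, or 0.0 below the IFT threshold."""
    if ift_value >= min_ift:
        return ift_value * frequency
    return 0.0


def informative_terms(stats, min_ift=0.5, top_k=3):
    """Return up to top_k terms with a positive score, best first.

    stats: dict mapping term to (IFT value, frequency).
    """
    scored = {
        term: informativeness(value, freq, min_ift)
        for term, (value, freq) in stats.items()
    }
    kept = [term for term, score in scored.items() if score > 0]
    return sorted(kept, key=lambda term: scored[term], reverse=True)[:top_k]


# Hypothetical statistics: term -> (IFT, number of articles).
stats = {
    "neural networks": (1.2, 40),
    "clustering": (0.8, 30),
    "approach": (0.1, 500),
    "novel": (0.05, 300),
}
selected = informative_terms(stats)
print(selected)
```
      </preformat>
      <p>Frequent but weakly cited words such as "approach" fall below the IFT threshold and are filtered out regardless of their frequency, which is the intended effect of the threshold.</p>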
      <p>Here are examples of the most informative words/terms
in the collection of AI articles that have the largest total
values of IFT multiplied by the current frequency: web
(year 1982), fuzzy (1969), sensor networks (1992), neural
(1962), video (1976), social (1971), cognitive (1973),
semantic (1967), clustering (1970), neural networks
(1986).</p>
      <p>These examples point to the most actively and stably
developing areas of AI, and also confirm the usefulness of
the proposed filtering function and its ability to evaluate the
significance and information content of words/terms.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Predicting the citations with IFT</title>
      <p>Predicting the citation of scientific works has been studied by
many researchers. The described approaches are mainly
based on the analysis of a number of features, including
information about the authors (number of authors, country,
author rating, etc.), features of the journal (total number of
citations to the journal, impact factor of the journal), article
parameters (topic, length, number of references, etc.), type
of research (for example, original research compared to a
literature review), as well as other characteristics
(reputation of institutions, etc.). In addition, altmetrics are
also used to predict the citation of a scientific paper.</p>
      <p>
        Citation prediction methods have been proposed, for
example, by Walters (2006) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Haslam et al. (2008) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
Fu and Aliferis (2010) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Wang, Yu and Yu (2011) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
Wang et al. (2012) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Didegah and Thelwall (2013) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
Yu, Yu, Li and Wang (2014) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], Onodera and Yoshikane
(2015) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Cao et al. (2016) [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], Golosovsky and
Solomon (2017) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], Fiala and Tutoky (2018) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], Bai et
al. (2019) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For example, Wang et al. (2013) [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
propose mathematical models that describe how
publications accumulate citations over time. Using these
models, the authors predict the longer-term citation impact
of a publication based on its short-term citation
history. Bornmann et al. (2013) [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] present an empirical
analysis of the correlation between short-term and
long-term citation indicators.
      </p>
      <p>IFT evaluates the significance and informativeness of
terms in scientific articles based on an analysis of the
citation of articles with these terms. IFT can also be used to
predict future citations of new articles.</p>
      <p>Given the practical importance of incorporating the
latest publications in evaluations of scientific performance,
one of the goals of our study is to develop a model to
predict the impact that recent publications will have in the
long run.</p>
      <p>Our model predicts publication citations
from the following predictors: the impact factor of
significant terms (for example, authors' keywords) and the
time of appearance of subsequent articles associated through
implicit links with the original article.</p>
      <p>Both predictors are readily available and,
unlike most prediction approaches, they allow predictions
to be made soon after publication.</p>
      <p>Citation forecasts have a high degree of uncertainty.
We therefore believe that it is more important to know the
likelihood that a publication will receive a certain number
of citations in the future. Accordingly, we do not predict the
average number of citations that the publication should attract
in the future; instead, we predict the probability distribution of
the future number of citations based on the developed
probabilistic model of the dependence of the
number of direct citations on terms with high IFT.</p>
      <p>It is important to emphasize that the purpose of our
work differs from that of the studies mentioned above. As in
those studies, we are interested in predicting future
citations. However, many indicators that have been found to
correlate with citation impact are easy to
manipulate.</p>
      <p>Suppose, for example, that researchers know that future
citations of a publication will be predicted from
the number of pages or the number of references. In
this case, authors can artificially increase the number of
pages or the number of bibliographic references.
Therefore, we consider only variables that cannot be changed by
the authors of the publication.</p>
      <p>Based on IFT values, we can choose informative terms
that indicate important fundamental ideas. Words and
terms with a consistently high IFT indicate important ideas
that have been stable for many years.</p>
      <p>In our experiments, we analyze the DBLP citation
network, which is a collection of articles on artificial
intelligence from 1936 to 2017, including 3,079,007
articles and 25,166,994 citation links. Statistical analysis of the
collection was carried out using the Trend+ program,
which built a frequency dictionary and trend indicators,
including the number of articles with a given term per year, the
number of links to other articles with this term, and the IFT and
IFTm indicators for the current and next year.</p>
      <p>We propose the indicator “trend of the initial frequency” (TIF):
the number of years from the first article
with a certain term to the nth article with this term. A
relationship was found between TIF, IFT, and citation
trends. It is shown that the higher the trend of the initial
frequency, the higher the trend of fresh citation links, that
is, the higher the likelihood that citations of
the article will appear quickly.</p>
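      <p>The TIF definition can be illustrated with a short sketch. The function name, the input format and the sample years are hypothetical; the sketch only counts the years elapsed between the first and the nth article with a term and does not reproduce the relationship with IFT reported above.</p>
      <preformat>
```python
def trend_of_initial_frequency(years, n):
    """Years elapsed between the first and the n-th article with a term.

    years: publication years of all articles containing the term.
    Returns None when the term has no n-th article.
    """
    ordered = sorted(years)
    if n not in range(1, len(ordered) + 1):
        return None
    return ordered[n - 1] - ordered[0]


# Hypothetical publication years of articles containing one term.
years_with_term = [1996, 1990, 1995, 1997, 1996]
tif_4 = trend_of_initial_frequency(years_with_term, 4)
print(tif_4)
```
      </preformat>
      <p>A small value means that the first n articles with the term appeared within a few years of each other, that is, the term spread quickly after its first appearance.</p>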
      <p>Of particular interest are trend terms with a large
number of new articles (more than 10 articles in the
previous 2 years). For trend terms, the correlation of current
and future IFTm is more than 60%, which allows us to
make a fairly confident forecast of IFTm (i.e. citation
forecast) for the next year.</p>
      <p>We summarize how our study differs from existing
works:
˗ we are interested in predicting the long-term citation impact
based solely on the impact factors of
significant terms (as mentioned above, we do not want
to use variables that can be easily manipulated);
˗ we are interested in predicting the long-term citation impact
within one or two years after the publication;
˗ unlike most earlier papers, our interest is in predicting
the probability distribution for the future number of
citations of a publication. We do not aim to give an accurate
estimate of the future number of citations of the publication.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgment</title>
      <p>The reported study was funded by RFBR according to
the research projects № 18-07-00909, 19-07-00857 and
20-04-60185.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Gipp</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Citation-based Document Similarity</article-title>
          .
          <source>Citation-based Plagiarism Detection</source>
          . Springer Fachmedien Wiesbaden, pp.
          <fpage>43</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Gomaa</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Fahmy</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>A survey of text similarity approaches</article-title>
          ,
          <source>Int. J. Comput. Appl.</source>
          , vol.
          <volume>68</volume>
          , no.
          <issue>13</issue>
          , doi: https://doi.org/10.5120/11638-7118
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Leydesdorff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>1989</year>
          ).
          <article-title>Words and co-words as indicators of intellectual organization</article-title>
          .
          <source>Research Policy</source>
          <volume>18</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>209</fpage>
          -
          <lpage>223</lpage>
          . DOI http://dx.doi.org/10.1016/0048-7333(89)90016-4. URL http://www.sciencedirect.com/science/article/pii/0048733389900164
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Charnine</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimenko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <source>Measuring of “Idea-based” Influence of Scientific Papers // Proceedings of the 2015 International Conference on Information Science and Security (ICISS</source>
          <year>2015</year>
          ),
          <source>December 14-16</source>
          , Seoul, South Korea, pp.
          <fpage>160</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Dumais</surname>
            ,
            <given-names>S.T.</given-names>
          </string-name>
          (
          <year>1997</year>
          ).
          <article-title>A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge</article-title>
          ,
          <source>Psychological Review</source>
          ,
          <volume>104</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Matveeva</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levow</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farahat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Royer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>Generalized latent semantic analysis for term representation</article-title>
          .
          <source>In Proc. of RANLP.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Zaidi</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dwivedi</surname>
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Current views and implications of journal impact factor: A key note</article-title>
          .
          <source>Indian J Dent</source>
          .
          <volume>6</volume>
          (
          <issue>2</issue>
          ):
          <fpage>113</fpage>
          -
          <lpage>114</lpage>
          . doi: 10.4103/0975-962X.154375
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fortunato</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Author Impact Factor: tracking the dynamics of individual scientific impact</article-title>
          .
          <source>Sci Rep</source>
          <volume>4</volume>
          ,
          <fpage>4880</fpage>
          . https://doi.org/10.1038/srep04880.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Walters</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>Predicting subsequent citations to articles published in twelve crime-psychology journals: Author impact versus journal impact</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>69</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>499</fpage>
          -
          <lpage>510</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Haslam</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ban</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kaufmann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loughnan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whelan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al. (
          <year>2008</year>
          ).
          <article-title>What makes an article influential? Predicting impact in social and personality psychology</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>76</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>169</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Aliferis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>85</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>257</fpage>
          -
          <lpage>270</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Mining typical features for highly cited papers</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>87</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>695</fpage>
          -
          <lpage>706</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>An</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2012</year>
          ).
          <article-title>Development a case-based classifier for predicting highly cited papers</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>586</fpage>
          -
          <lpage>599</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Didegah</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Thelwall</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (2013a).
          <article-title>Determinants of research citation impact in nanoscience and nanotechnology</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>64</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>1055</fpage>
          -
          <lpage>1064</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.-Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Citation impact prediction for scientific papers using stepwise regression analysis</article-title>
          .
          <source>Scientometrics</source>
          ,
          <volume>101</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>1233</fpage>
          -
          <lpage>1252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Onodera</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Yoshikane</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Factors affecting citation rates of research articles</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          ,
          <volume>66</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>739</fpage>
          -
          <lpage>764</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>K.J.R.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>A data analytic approach to quantifying scientific impact</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ), pp.
          <fpage>471</fpage>
          -
          <lpage>484</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Golosovsky</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solomon</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Growing complex network of citations of scientific papers: Modeling and measurements</article-title>
          .
          <source>Physical Review E</source>
          ,
          <volume>95</volume>
          (
          <issue>1</issue>
          ), p.
          <fpage>012324</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Fiala</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tutoky</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>PageRank-based prediction of award-winning researchers and the impact of citations</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>1044</fpage>
          -
          <lpage>1068</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A.-L.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Quantifying long-term scientific impact</article-title>
          .
          <source>Science</source>
          ,
          <volume>342</volume>
          (
          <issue>6154</issue>
          ), pp.
          <fpage>127</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Bornmann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leydesdorff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Which percentile-based approach should be preferred for calculating normalized citation impact values? An empirical comparison of five approaches including a newly developed citation-rank approach (P100)</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>933</fpage>
          -
          <lpage>944</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Predicting the citations of scholarly paper</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ), pp.
          <fpage>407</fpage>
          -
          <lpage>418</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>