<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Important Citations Identification with Semi-supervised Classification Model</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Xin An</string-name>
          <email>anxin@bjfu.edu.cn</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xin Sun</string-name>
          <email>@outlook.com</email>
        </contrib>
        <aff id="aff1">
          <institution>School of Economics and Management, Beijing Forestry University</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Economics and Management, Beijing University of Technology</institution>
          ,
          <addr-line>Beijing 100124</addr-line>
          ,
          <country country="CN">P.R. China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1819</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Given that citations are not equally important, various techniques based on supervised machine learning models have been presented to identify important citations. However, only a small volume of data has been manually annotated with labels. To make full use of unlabeled data and improve learning performance, the semi-supervised self-training technique is utilized to identify important citations in this work. After six groups of features are engineered, the semi-supervised versions of the SVM and RF models significantly improve on the performance of their conventional supervised versions when un-annotated samples at the 75% and 95% confidence levels, respectively, are added to the training set. The AUC-PR and AUC-ROC of the SVM model are 0.8102 and 0.9622, and those of the RF model reach 0.9248 and 0.9841, both outperforming their supervised counterparts. This demonstrates the effectiveness of our semi-supervised self-training strategy for important citation identification.</p>
      </abstract>
      <kwd-group>
        <kwd>Important Citation</kwd>
        <kwd>Semi-supervised Learning</kwd>
        <kwd>Self-training</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Citations are regarded as a proxy for scientific knowledge flow in the literature, so
they are commonly used for a variety of academic evaluation purposes, such as ranking
researchers [1], journals [
        <xref ref-type="bibr" rid="ref1">2</xref>
        ], organizations [
        <xref ref-type="bibr" rid="ref2">3</xref>
        ], etc. However, most studies treat all
references as equally important to the citing publication, which does not reflect
actual practice. In recent years, researchers have argued that citations are not
equally important and have presented various techniques to identify important citations
[<xref ref-type="bibr" rid="ref3 ref4 ref5 ref6 ref7 ref8 ref9 ref10">4-11</xref>].
      </p>
      <p>
        Supervised learning methods are commonly used for this task; they learn the
feature space of the labeled data to form a classification model. However, most
supervised learning methods require a large amount of labeled data to ensure the performance
of the resulting models [
        <xref ref-type="bibr" rid="ref11">12</xref>
        ]. Currently, only a small number of citations have been labeled
manually because annotation is time-consuming and labor-intensive. As a result,
large amounts of unlabeled data remain unexploited. The last two decades have
witnessed significant progress in semi-supervised learning, and many
successful applications in various fields have been reported in the literature [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">12-15</xref>
        ]. However, important
citation identification with semi-supervised models remains largely under-studied.
      </p>
      <p>
        To make full use of unlabeled data and improve model performance, a
semi-supervised self-training method is deployed in this work. After Section 2 briefly
describes the related work, the framework of semi-supervised self-training for important
citation identification is introduced in Section 3 along with six groups of features [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ].
Section 4 presents the statistics of the labeled and unlabeled data. In Section 5,
experiments with SVM and RF models equipped with the semi-supervised self-training strategy are
conducted, and Section 6 concludes this work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        In the literature, various techniques have been presented to identify important citations.
Valenzuela et al. [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ] annotated 465 citations from the ACL Anthology and used two
supervised learning models (SVM and RF) to classify important citations.
Since then, a number of studies have applied different supervised
learning models to this annotated dataset [
        <xref ref-type="bibr" rid="ref10 ref5 ref6 ref7 ref8 ref9">6-11</xref>
        ], including SVM, RF, Naïve Bayes,
K-Nearest Neighbors, Decision Tree, Deep Learning, etc. Among these supervised
models, SVM and RF were the most commonly used and outperformed their
counterparts. Supervised learning is thus the mainstream technique
for this task. However, it relies on a large amount of labeled data to maintain its
performance, whereas in reality labeled data are costly to obtain.
      </p>
      <p>
        In practice, to overcome the scarcity of labeled data and make full
use of unlabeled data, semi-supervised learning algorithms have received increasing
attention. Many semi-supervised learning methods have been proposed, such as co-training [
        <xref ref-type="bibr" rid="ref12">13</xref>
        ],
semi-supervised support vector machine (S3VM) [
        <xref ref-type="bibr" rid="ref13">14</xref>
        ], self-training [
        <xref ref-type="bibr" rid="ref14">15</xref>
        ], etc. These
methods have proven effective in improving predictive performance
when large amounts of unlabeled data are leveraged together with a small amount of labeled data.
      </p>
      <p>Among these approaches, the self-training method expands the training data with
predictions on unlabeled data. It is easy to implement and offers great flexibility in threshold
setting, which gives more choices for model selection. Therefore, to make full use of
the unlabeled data, the semi-supervised self-training method is adopted to identify
important citations in this paper.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>After learning the training set of each fold, the labels of the unlabeled data are predicted.
Samples predicted at the 95%, 90%, 85%, 80%, 75%, and 70% confidence
levels are selected as pseudo-labeled data and added to the training set. For each fold, the model is
retrained on the combined data and evaluated on the testing set. The involved
parameters are optimized correspondingly. The areas under the PR and ROC curves are
used as indicators for evaluating the performance.</p>
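      <p>One self-training round of the kind described above can be illustrated with a minimal sketch. It is not the implementation used in the experiments: a toy nearest-centroid scorer stands in for the SVM and RF classifiers, the softmax over negative distances is only an assumed proxy for classifier confidence, and the names fit_centroids, predict_proba, and self_train_once are hypothetical.</p>
      <preformat>
```python
import math


def fit_centroids(samples):
    """Toy stand-in for SVM/RF: one mean feature vector per class label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_proba(centroids, features):
    """Softmax over negative Euclidean distances; returns {label: probability}."""
    scores = {label: math.exp(-math.dist(features, c)) for label, c in centroids.items()}
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def self_train_once(labeled, unlabeled, threshold):
    """Pseudo-label the confident unlabeled samples, then retrain on the union."""
    model = fit_centroids(labeled)
    pseudo = []
    for features in unlabeled:
        proba = predict_proba(model, features)
        label = max(proba, key=proba.get)
        if proba[label] >= threshold:  # keep only confident predictions
            pseudo.append((features, label))
    retrained = fit_centroids(labeled + pseudo)
    return retrained, pseudo


# Made-up two-dimensional data, not the citation corpus.
labeled = [((0.0, 0.0), 0), ((1.0, 1.0), 0), ((9.0, 9.0), 1), ((10.0, 10.0), 1)]
unlabeled = [(0.2, 0.4), (9.8, 9.9), (5.0, 5.0)]
retrained, pseudo = self_train_once(labeled, unlabeled, threshold=0.95)
print(len(pseudo))  # 2: the ambiguous midpoint (5.0, 5.0) is not added
```
      </preformat>
      <p>In this sketch, lowering the threshold admits more pseudo-labeled samples at the cost of more label noise, which is the trade-off explored by the six confidence levels.</p>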
      <p>
        As for the feature engineering, the following six groups of features from our previous
study [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] are utilized here: G1 (two generative features extracted from the CIM
model), G2 (structure-based features, containing 7 features), G3 (separate
citation-based feature, containing 1 feature), G4 (author overlap-based feature, containing 1
feature), G5 (cue word-based features, containing 2 features), and G6 (similarity-based
feature, containing 1 feature). Please refer to [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ] for more details.
      </p>
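      <p>For orientation, the six feature groups can be written down as a small lookup table. The sketch below only restates the group sizes given in the text; the dictionary layout and the helper name feature_count are hypothetical, and the short descriptions are paraphrases.</p>
      <preformat>
```python
# Feature-group sizes as listed in the text; descriptions are paraphrased.
FEATURE_GROUPS = {
    "G1": ("CIM model-based", 2),
    "G2": ("structure-based", 7),
    "G3": ("separate citation-based", 1),
    "G4": ("author overlap-based", 1),
    "G5": ("cue word-based", 2),
    "G6": ("similarity-based", 1),
}


def feature_count(groups):
    """Total feature-vector length for the selected group names."""
    return sum(FEATURE_GROUPS[name][1] for name in groups)


print(feature_count(FEATURE_GROUPS))  # all six groups: 14
print(feature_count(["G2", "G4"]))    # structural features plus G4: 8
```
      </preformat>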
    </sec>
    <sec id="sec-4">
      <title>Data and preprocessing</title>
      <p>
        The annotated corpus in [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ] is used in this work. The citations in this dataset were randomly chosen
from the ACL Anthology and manually annotated by one expert with the labels 0
(related work), 1 (comparison), 2 (using the work), and 3 (extending the work). To
identify important citations, we merge the related
work and comparison classes into an incidental class with the label 0, and the using the work
and extending the work classes into an important class with the label 1. The
inter-annotator agreement between two experts was verified to reduce the bias introduced by human
annotation and reached 93.9% on this coarse label set. Table 1 summarizes the
labeled dataset. In the end, 456 labeled citing-cited pairs were collected after preprocessing,
of which 14.7% are important citations.
      </p>
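      <p>The merging of the four annotated classes into two can be sketched as follows. The function names to_binary and important_share are hypothetical, and the example labels are made up rather than drawn from the corpus.</p>
      <preformat>
```python
from collections import Counter

FINE_LABELS = {
    0: "related work",
    1: "comparison",
    2: "using the work",
    3: "extending the work",
}


def to_binary(fine_label):
    """Map the four annotated classes to incidental (0) or important (1)."""
    if fine_label not in FINE_LABELS:
        raise ValueError(f"unknown label: {fine_label}")
    return 1 if fine_label >= 2 else 0


def important_share(fine_labels):
    """Fraction of citations that fall into the important class."""
    counts = Counter(to_binary(label) for label in fine_labels)
    return counts[1] / len(fine_labels)


example = [0, 0, 1, 2, 3, 0, 1, 0]  # made-up annotations
print([to_binary(label) for label in example])  # [0, 0, 0, 1, 1, 0, 0, 0]
print(important_share(example))                 # 0.25
```
      </preformat>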
      <p>The preprocessing steps are as follows: (1) collecting the citing papers in PDF format and
converting them to text with Xpdf; (2) parsing the text with ParsCit to extract the
title, authors, abstract, and references of each citing paper as well as the generic section
headers; (3) extracting citation contexts with regular expressions; and (4)
preprocessing all textual information, including citation contexts and abstracts, with the NLTK
toolkit. During preprocessing, 434 citing papers were collected, which yield 8,541
citing-cited pairs in total. Table 2 lists the statistics of the citing papers and references.
Apart from the 456 labeled pairs described above, 8,085 unlabeled citations remain.
As with the labeled data, feature engineering and preprocessing are also
conducted on all unlabeled data.</p>
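      <p>Step (3) can be illustrated with a small sketch. The regular expression below is an assumption made for illustration, covering only parenthetical author-year citations of the kind common in ACL papers; the actual expressions used in this work are not given in the text. The fixed character window is likewise a simplification of a citation context, and the name citation_contexts is hypothetical.</p>
      <preformat>
```python
import re

# Hypothetical pattern for citations such as "(Smith, 2010)",
# "(Smith et al., 2010)" or "(Lee and Kim, 2012)".
CITATION = re.compile(
    r"\(([A-Z][A-Za-z-]+(?: et al\.| and [A-Z][A-Za-z-]+)?), (\d{4})\)"
)


def citation_contexts(text, window=60):
    """Return (authors, year, context) for each citation found in text.

    The context is the text within `window` characters on either side of
    the citation marker, a simplification of sentence-level contexts.
    """
    results = []
    for match in CITATION.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        results.append((match.group(1), match.group(2), text[start:end].strip()))
    return results


sample = (
    "We extend the parser (Smith et al., 2010) with new features. "
    "Unlike (Lee and Kim, 2012) we use an SVM."
)
for authors, year, context in citation_contexts(sample):
    print(authors, year)
```
      </preformat>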
    </sec>
    <sec id="sec-5">
      <title>Experimental results and discussion</title>
      <p>
        As two state-of-the-art discriminative models, SVM and RF are utilized here as our
classifiers. First, these two models are trained on the labeled data. To tune the
parameters of the two classifiers, grid search with 5-fold cross-validation [
        <xref ref-type="bibr" rid="ref15">16</xref>
        ] is used
in this study. Figure 2 shows the PR and ROC curves of SVM and RF. As one
can see, the areas under the ROC curve (AUC-ROC) of the SVM and RF models are 0.9287
and 0.9798, respectively, and the areas under the PR curve (AUC-PR) are 0.7628 and
0.9056, respectively. The RF model outperforms the SVM model, which is in
accordance with most previous studies [
        <xref ref-type="bibr" rid="ref10 ref3 ref4 ref5 ref6 ref7 ref8 ref9">4-11</xref>
        ].
      </p>
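      <p>The two evaluation indicators can be made concrete with a short sketch. It assumes binary labels (1 for important) and real-valued classifier scores. AUC-ROC is computed through its pairwise-ranking interpretation, and AUC-PR is approximated by average precision, which is one common estimator and not necessarily the one used to produce the reported values. The function names are hypothetical, and at least one positive and one negative sample are assumed.</p>
      <preformat>
```python
def auc_roc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive sample."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up scores for five citations, two of them important.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(auc_roc(labels, scores), 4))            # 0.8333
print(round(average_precision(labels, scores), 4))  # 0.8333
```
      </preformat>
      <p>With only 14.7% important citations in the labeled data, the PR-based indicator is the more sensitive of the two to errors on the minority class, which is one reason to report both.</p>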
      <p>Then, semi-supervised self-training on the unlabeled data is conducted. After
learning the training set of each fold of the above 5-fold split, the labels of the
unlabeled data are predicted. We select samples at the 95%, 90%, 85%, 80%, 75%, and 70%
confidence levels to add to the training set. Table 3 lists the number of new samples for
each fold at each confidence level. After that, for each fold, the resulting model is
retrained on the combined data and evaluated on the testing set. As before, grid search
is used to tune the involved parameters. Table 4 reports the mean
AUC-ROC and AUC-PR over the 5 folds at each confidence level. It can be seen that the
AUC-PR and AUC-ROC of the SVM model reach their maximum at the 75% confidence
level, at 0.8102 and 0.9622, respectively. The RF model has the highest
AUC-PR and AUC-ROC at the 95% confidence level (0.9248 and 0.9841). Both are better than
the results of their supervised learning counterparts above.</p>
      <p>Table 3 (fragment; column header RF, 70% confidence level), counts in source order:
7,533; 5,909; 7,502; 6,054; 7,517; 6,086; 7,521; 6,054; 7,499; 6,054.</p>
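      <p>The per-level sample counts of the kind listed in Table 3 follow directly from the predicted class probabilities. A minimal sketch, assuming that the confidence of a prediction is the largest class probability returned by the classifier (an assumption, since the text does not define it) and using made-up probabilities:</p>
      <preformat>
```python
THRESHOLDS = (0.95, 0.90, 0.85, 0.80, 0.75, 0.70)


def count_selected(max_probs, thresholds=THRESHOLDS):
    """Number of unlabeled samples whose top class probability meets each level."""
    return {t: sum(1 for p in max_probs if p >= t) for t in thresholds}


# Made-up top-class probabilities for five unlabeled citations.
max_probs = [0.99, 0.92, 0.81, 0.74, 0.60]
print(count_selected(max_probs))
# {0.95: 1, 0.9: 2, 0.85: 2, 0.8: 3, 0.75: 3, 0.7: 4}
```
      </preformat>
      <p>The counts can only grow as the level is lowered, so each lower level adds all samples of the higher levels plus less certain ones.</p>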
      <p>
        Further, to assess the contribution of each group of features, we perform an
additional experiment to observe the changes in mean AUC-PR and mean AUC-ROC.
Table 5 shows the mean AUC-PR and AUC-ROC scores of the SVM model at the 75%
confidence level and the RF model at the 95% confidence level, their rankings (in
parentheses), and the average rank for different groups of features under 5-fold
cross-validation, controlling for the structural features (G2). For each combination, the
resulting parameters are optimized separately. As we can observe, the baseline model
based on the structural features achieves a mean AUC-PR of about 0.7600 and 0.7903,
and an AUC-ROC of about 0.8906 and 0.4743. The author overlap-based feature (G4)
ranks first, increasing the AUC-PR to 0.9462 and 0.8145 and the AUC-ROC
to 0.8145 and 0.4798, respectively. The CIM (Citation Influence Model) [
        <xref ref-type="bibr" rid="ref16">17</xref>
        ] model-based features
(G1) rank second, which demonstrates that the features derived from the
generative model can improve the performance of important citation identification. This
observation is in accordance with previous work [
        <xref ref-type="bibr" rid="ref10">11</xref>
        ].
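The average-rank comparison can be sketched as follows, under stated assumptions: each metric table maps a feature combination to a score, a higher score is better, tied scores share the best rank, and the scores in the example are invented rather than taken from Table 5. The function names are hypothetical.
      <preformat>
```python
def rank_descending(scores):
    """Rank 1 is the highest score; tied scores share the best rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def average_rank(metric_tables):
    """Average each combination's rank over several metric tables."""
    ranks = [rank_descending(table) for table in metric_tables]
    names = metric_tables[0].keys()
    return {name: sum(r[name] for r in ranks) / len(ranks) for name in names}


# Invented scores for three feature combinations under two metrics.
auc_pr = {"G2": 0.76, "G2+G4": 0.95, "G2+G1": 0.90}
auc_roc = {"G2": 0.89, "G2+G4": 0.91, "G2+G1": 0.93}
print(average_rank([auc_pr, auc_roc]))
# {'G2': 3.0, 'G2+G4': 1.5, 'G2+G1': 1.5}
```
      </preformat>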
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        In this paper, we follow the practice in [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ] of dividing citations into important and
incidental classes and use a semi-supervised self-training strategy to identify important
citations, leveraging both labeled and unlabeled data to improve performance and
generalization ability. Through semi-supervised self-training on the unlabeled data,
the mean AUC-ROC of the SVM model improves from 0.9287 to 0.9622 and its mean AUC-PR from
0.7628 to 0.8102, while the mean AUC-ROC of the RF model improves from 0.9798 to 0.9841
and its mean AUC-PR from 0.9056 to 0.9248. This demonstrates the
effectiveness of our semi-supervised self-training strategy for important citation
identification. Additionally, the CIM model-based features, structure-based features, and
author overlap-based features contribute greatly to important citation identification.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This research received financial support from the National Natural Science Foundation of
China under grant numbers 72004012 and 72074014.</p>
      <p>1. Hirsch, J.E.: An index to quantify an individual's scientific research output. Proceedings of
the National Academy of Sciences 102(46), 16569-16572 (2005).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          2.
          <string-name>
            <surname>Garfield</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Citation indexes to science: a new dimension in documentation through association of ideas</article-title>
          .
          <source>Science</source>
          ,
          <volume>122</volume>
          :
          <fpage>108</fpage>
          -
          <lpage>111</lpage>
          (
          <year>1955</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          3.
          <string-name>
            <surname>Lazaridis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Ranking university departments using the mean h-index</article-title>
          .
          <source>Scientometrics</source>
          <volume>82</volume>
          (
          <issue>2</issue>
          ),
          <fpage>211</fpage>
          -
          <lpage>216</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          4.
          <string-name>
            <surname>Valenzuela</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ha</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Etzioni</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Identifying meaningful citations</article-title>
          .
          <source>In: Workshops at the twenty-ninth AAAI conference on artificial intelligence</source>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          . AAAI, Austin
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          5.
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lemire</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Measuring academic influence: not all citations are equal</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>66</volume>
          (
          <issue>2</issue>
          ),
          <fpage>408</fpage>
          -
          <lpage>427</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>S.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akram</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haddawy</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Identifying important citations using contextual information from full text</article-title>
          .
          <source>In: 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL)</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE, New York (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>S.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Safder</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akram</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kamiran</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A novel machine-learning approach to measuring scientific knowledge flows using citation context analysis</article-title>
          .
          <source>Scientometrics</source>
          <volume>116</volume>
          (
          <issue>2</issue>
          ),
          <fpage>973</fpage>
          -
          <lpage>996</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hassan</surname>
            ,
            <given-names>S.U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Imran</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iqbal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aljohani</surname>
            ,
            <given-names>N.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nawaz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Deep context of citations using machine-learning models in scholarly full-text articles</article-title>
          .
          <source>Scientometrics</source>
          <volume>117</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1645</fpage>
          -
          <lpage>1662</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          9.
          <string-name>
            <surname>Qayyum</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afzal</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>Identification of important citations by exploiting research articles' metadata and cue-terms from content</article-title>
          .
          <source>Scientometrics</source>
          <volume>118</volume>
          (
          <issue>1</issue>
          ),
          <fpage>21</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Important citation identification by exploiting the syntactic and contextual information of citations</article-title>
          .
          <source>Scientometrics</source>
          <volume>125</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>21</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          11.
          <string-name>
            <surname>An</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Important Citations Identification by Exploiting Generative Model into Discriminative Model</article-title>
          .
          <source>Journal of Information Science</source>
          . (
          <year>2021</year>
          ) doi:10.1177/0165551521991034.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          12.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiao</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised least-squares support vector regression machines</article-title>
          .
          <source>Journal of Information &amp; Computational</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <fpage>885</fpage>
          -
          <lpage>892</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          13.
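          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mitchell</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Combining labeled and unlabeled data with co-training</article-title>
          .
          <source>In: Proceedings of the eleventh annual conference on Computational learning theory</source>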
          , pp.
          <fpage>92</fpage>
          -
          <lpage>100</lpage>
          . ACM, Madison, Wisconsin (
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          14.
          <string-name>
            <surname>Chapelle</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sindhwani</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keerthi</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          :
          <article-title>Optimization techniques for semi-supervised support vector machines</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <issue>2</issue>
          ), (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          15.
          <string-name>
            <surname>Tanha</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Someren</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afsarmanesh</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised self-training for decision tree classifiers</article-title>
          .
          <source>International Journal of Machine Learning and Cybernetics</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>355</fpage>
          -
          <lpage>370</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          16.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Learn from the Information Contained in the False Splice Sites as well as in the True Splice Sites using SVM</article-title>
          .
          <source>Proceedings of the International Conference on Intelligent Systems and Knowledge Engineering</source>
          ,
          <fpage>1360</fpage>
          -
          <lpage>1366</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          17.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>An</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Emerging Research Topics Detection with Multiple Machine Learning Models</article-title>
          .
          <source>Journal of Informetrics</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ),
          <elocation-id>100983</elocation-id>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>