<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic identification and classification of integrated knowledge content in an interdisciplinary field: A case study on eHealth</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Studies of Information Resources, Wuhan University</institution>
          ,
          <addr-line>Wuhan 430072</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Information Management, Wuhan University</institution>
          ,
          <addr-line>Wuhan 430072</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Investigating knowledge integration characteristics, especially from the content perspective, is essential for understanding the formation and evolution of interdisciplinary fields. In previous studies, however, recognizing the integrated knowledge content in an interdisciplinary field and analyzing its characteristics required considerable time and effort from researchers. We therefore study automatic methods that identify explicit integrated knowledge phrases from citation contexts in papers of an interdisciplinary field and recognize the functions of these phrases, using word embedding techniques and deep learning models. To evaluate the methodology, we constructed an experimental dataset with the eHealth field as a case of an interdisciplinary field. In the experiments, we obtained Recall, Precision and F1 scores of 0.838, 0.989 and 0.907 for explicit integrated knowledge identification, and Recall, Precision and F1 scores of 0.856, 0.863 and 0.842 on the unknown-phrases test dataset for knowledge function classification.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Integration</kwd>
        <kwd>Interdisciplinary Field</kwd>
        <kwd>Semantic Function Recognition</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Interdisciplinary research is often considered an important driver of modern science
[1]. The essence of interdisciplinary research is the successful recombination of existing,
previously disconnected knowledge units from various disciplines [2], which may lead to novel
ideas and accelerate scientific breakthroughs. The increasing number of emerging
interdisciplinary fields demonstrates that interdisciplinary research has become an
important mode of science.</p>
      <p>Scientists and policy makers have attempted to explore the characteristics of
interdisciplinary research in order to promote its development. Several
bibliometric indicators, e.g., Rao-Stirling [3], have been proposed to measure the
interdisciplinarity of research domains or publications. Most studies used citation analysis
to investigate knowledge diffusion relations between an interdisciplinary field and its
source disciplines through references [4-6]. However, these studies only measure
knowledge dissemination at the paper and journal level, rather than from the
perspective of knowledge units. A few recent studies have explored the knowledge integration and
evolution of an interdisciplinary field from the content perspective to understand its
formation and development [1,7-9]. Nonetheless, the
identification of integrated knowledge content involved considerable human effort in these
studies. To support subsequent analysis and knowledge mining at a large scale,
the automatic identification of integrated knowledge content is in great demand.</p>
      <p>In this study, we investigate the knowledge content integrated by an interdisciplinary
field. We applied NLP techniques to automatically identify the integrated knowledge
units from citation sentences and the texts of reference publications, and we classified
the functions of the integrated knowledge with deep learning models. An experimental
dataset of the eHealth field was constructed to validate the effectiveness of our
methodology.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <sec id="sec-2-1">
        <title>Definitions and Problem Formulation</title>
        <p>Scientific publications record various forms of knowledge integration, e.g., the
cooperation of researchers and the citations to the references of different disciplines. In this
article, we investigate the knowledge integrated in an interdisciplinary field, which can
be reflected through citation relations. To this end, we propose integrated knowledge
phrases and a classification scheme based on knowledge functions, which are defined
as follows.</p>
        <p>Integrated Knowledge Phrases. Citation contexts often express relevant information
about cited articles [10]. To reflect the integrated knowledge units, we use the noun
phrases extracted from citation sentences that also appear in the text of the
corresponding cited publications. We contend that the shared phrases between the two
counterparts explicitly signify the transferred knowledge.</p>
        <p>Knowledge Classification Schema. Several classification frameworks have been
proposed to annotate the semantic functions of concepts or terms in scientific papers [11-12].
However, the classes in these frameworks, e.g., problems, solutions, and goals, are too
general to analyze knowledge integration in a domain at a fine-grained level. In our
previous study, we proposed a knowledge classification schema for the annotation of
integrated knowledge content, which comprises seven categories: Research
Subject, Theory, Research Methodology, Technology, Entity, Data, and Others [9].
Research Subject is broader than the other categories and covers multiple kinds of
domain-specific research subjects, e.g., diseases, drugs, and research themes. The identification
of domain entities has become a significant task in many recent studies, e.g., the
biomedical NER task [13], and tools such as PubTator Central [14] have been developed for it.
Therefore, we exclude the Research Subject category, as well as the Others
category, from this study. The final framework is shown in Table 1.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr>
                <th>Category</th>
                <th>Description</th>
                <th>Exemplar phrases</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Theory</td>
                <td>Theory-related phrases</td>
                <td>e.g., TAM, social cognitive theory, transtheoretical model</td>
              </tr>
              <tr>
                <td>Research methodology</td>
                <td>Methodology used in research</td>
                <td>e.g., systematic review, analysis, meta analysis, randomized control trial</td>
              </tr>
              <tr>
                <td>Technology</td>
                <td>Technique, device, and system used in research</td>
                <td>e.g., mobile phone, web, smartphone, app</td>
              </tr>
              <tr>
                <td>Entity</td>
                <td>Human-related research object</td>
                <td>e.g., patient, woman, child, adolescent</td>
              </tr>
              <tr>
                <td>Data</td>
                <td>Phrases related to dataset, data source, and data material</td>
                <td>e.g., twitter, qualitative datum, clinical datum</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The research design of this study is summarized in Fig. 1. The task of identifying
integrated knowledge phrases includes extracting knowledge phrases from citation
sentences and the corresponding reference texts, and matching the knowledge phrases between
the two sources. For the task of classifying the functions of integrated knowledge
phrases, several deep learning models are adopted.</p>
        <p>We collected the full texts of papers in an interdisciplinary field. For each paper, citation
sentences and bibliography data (title, PMID, etc.) were extracted and linked via the
in-text citation tags, e.g., “[1], [2-8]”. Next, we complemented the metadata of
the references, e.g., titles and abstracts, which serve as the cited texts. Each in-text citation
then generates a citation-reference pair; an in-text citation of several references was split
into multiple citation-reference pairs. The section titles of citation sentences were also
fetched. The citation-reference pair records constitute the initial dataset for the
following processes.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Integrated Knowledge Phrases Identification</title>
        <p>Knowledge Phrases Extraction. For each citation-reference pair, we used spaCy, an
open-source Python natural language processing toolkit, to extract noun phrases from the
citation sentences and reference texts. We only selected phrases with 2 to 4 words, rather
than retaining all phrases with fewer than 7 words as in our previous study [9]. We
also removed phrases that start or end with numbers and phrases with single
characters. Moreover, scispaCy [15], a Python package for processing biomedical
scientific text, was applied to expand abbreviations in the text.</p>
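        <p>The filtering rules above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name keep_phrase is hypothetical, the candidate phrases are assumed to come from an upstream noun-phrase extractor such as spaCy's noun chunks, and reading "phrases with single characters" as "phrases containing a one-character token" is an assumption.</p>
        <preformat>
```python
import re

NUMBER = re.compile(r"\d+(\.\d+)?")


def keep_phrase(phrase, min_words=2, max_words=4):
    """Return True if a candidate noun phrase passes the length and token filters."""
    tokens = phrase.split()
    # Keep only phrases with 2 to 4 words.
    if len(tokens) not in range(min_words, max_words + 1):
        return False
    # Drop phrases that start or end with a number.
    if NUMBER.fullmatch(tokens[0]) or NUMBER.fullmatch(tokens[-1]):
        return False
    # Assumption: a "single character" phrase is one containing a one-character token.
    if any(len(token) == 1 for token in tokens):
        return False
    return True


candidates = [
    "focus group",
    "patient",
    "3 focus groups",
    "randomized control trial",
    "mobile health app in primary care",
    "group b participant",
]
kept = [p for p in candidates if keep_phrase(p)]
print(kept)  # ['focus group', 'randomized control trial']
```
        </preformat>
        <p>Keeping the filter separate from the extractor makes it easy to compare the 2 to 4 word window with the looser setting used in the previous study by changing max_words.</p>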
        <p>Knowledge Phrases Matching. The knowledge phrases extracted from the two
sources were lemmatized and stemmed with the NLTK package before matching, so
that different variations of the same word could be matched correctly. Next, we used a
combination of three approaches to match the phrases from citation sentences with
those from the corresponding references for each citation-reference pair.</p>
        <p>Direct Matching. This approach only counts identical knowledge phrases from the
two sources as matched phrases.</p>
        <p>Indirect Matching. In this approach, the knowledge phrases extracted from the citation
sentences are matched exactly against the sentences of the corresponding reference text
using regular expressions, and vice versa. This approach can identify phrases
with the same meaning but different collocations. For example, “focus group” and
“focus group method” can be matched in this way.</p>
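        <p>The two string-matching approaches described above can be sketched as follows. The function names and example sentences are hypothetical, and the sketch omits the lemmatization and stemming step, so it only illustrates the matching logic under those assumptions.</p>
        <preformat>
```python
import re


def direct_match(citation_phrases, reference_phrases):
    """Phrases that appear identically in both phrase lists."""
    return set(citation_phrases).intersection(reference_phrases)


def indirect_match(phrases, sentences):
    """Phrases that occur as a whole-word span in any sentence of the other source."""
    matched = set()
    for phrase in phrases:
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        if any(pattern.search(sentence) for sentence in sentences):
            matched.add(phrase)
    return matched


citation_phrases = ["focus group", "mobile phone"]
reference_phrases = ["focus group method", "mobile phone"]
citation_sentences = ["A focus group was held with mobile phone owners [3]."]
reference_sentences = ["We used the focus group method with mobile phone users."]

matched = direct_match(citation_phrases, reference_phrases)
# Citation phrases searched in the reference text, and vice versa.
matched.update(indirect_match(citation_phrases, reference_sentences))
matched.update(indirect_match(reference_phrases, citation_sentences))
print(sorted(matched))  # ['focus group', 'mobile phone']
```
        </preformat>
        <p>Here “focus group” is recovered only by the indirect step, because the reference side contains the longer phrase “focus group method”.</p>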
        <p>The above two approaches were combined as the baseline method for the knowledge
matching process. We further applied a phrase similarity calculation approach based on
word embedding techniques to identify phrases that have similar meanings but are
expressed with different word collocations.</p>
        <p>Word Embedding + Cosine Similarity. Word embedding techniques were used to
transform phrases into high-dimensional vectors, and the cosine similarity of the phrase
vectors was then calculated. We selected two word embedding models, GloVe and BERT
(Bidirectional Encoder Representations from Transformers). In short, both are
language representation models that can be used to vectorize words, and the resulting
word vectors carry rich semantic and contextual information about the words in
the training corpus. However, compared with conventional word embedding models such
as GloVe, BERT introduces position encoding to describe sequence position
information and jointly takes the left and right contexts of each occurrence of a word
into account, which can capture more contextual information. In this paper, we used
a 100-dimensional GloVe model [16] pre-trained on Wikipedia
2014 and Gigaword 5, and a 12-layer BERT base model [17] with a hidden size of 768, pre-trained
on Wikipedia and BookCorpus, a corpus of 11,038 unpublished books.</p>
        <p>Then, we merged the matched noun phrases identified by the above three approaches
and removed duplicates. The retained phrases were denoted as the
integrated knowledge phrases.</p>
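        <p>The similarity-based matching step can be illustrated with the sketch below. It makes two assumptions that the text does not state: a phrase vector is the average of its word vectors, and the toy three-dimensional vectors stand in for pre-trained GloVe or BERT embeddings. All names and values are hypothetical.</p>
        <preformat>
```python
import math

# Toy vectors standing in for pre-trained embeddings (hypothetical values).
TOY_VECTORS = {
    "mobile": [0.9, 0.1, 0.0],
    "cell": [0.8, 0.2, 0.1],
    "phone": [0.1, 0.9, 0.0],
    "theory": [0.0, 0.1, 0.9],
}


def phrase_vector(phrase, vectors):
    """Average the word vectors of a phrase (assumed pooling; unknown words are skipped)."""
    rows = [vectors[word] for word in phrase.lower().split() if word in vectors]
    if not rows:
        return None
    return [sum(column) / len(rows) for column in zip(*rows)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_match(phrase_a, phrase_b, vectors, threshold=0.9):
    """Treat two phrases as matched when their similarity reaches the threshold."""
    vec_a = phrase_vector(phrase_a, vectors)
    vec_b = phrase_vector(phrase_b, vectors)
    if vec_a is None or vec_b is None:
        return False
    return cosine_similarity(vec_a, vec_b) >= threshold


print(is_match("mobile phone", "cell phone", TOY_VECTORS))  # True
print(is_match("mobile phone", "theory", TOY_VECTORS))      # False
```
        </preformat>
        <p>Raising the threshold makes the matcher stricter, which is the precision and recall trade-off examined in the experiments.</p>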
      </sec>
      <sec id="sec-2-3">
        <title>Integrated Knowledge Phrases Classification</title>
        <p>We applied several deep learning models to classify integrated knowledge phrases.
The classification framework consists of the following major modules.</p>
        <p>Input Module. For the input module, we considered several kinds of contextual information about
the phrases, which can be divided into semantic information and syntactic
information.</p>
        <p>Semantic Information. Both citation sentences and corresponding reference texts
contain the semantic contextual information of phrases. Therefore, we included both of
them as the input text features.</p>
        <p>Syntactic Information. Different sections of a scientific text may have different
functions [18]. For example, the Introduction section gives more information about the
background of the study, while the Methods section describes the methods applied. The
title of the section in which the citation sentence containing an integrated knowledge phrase occurs may
therefore provide useful contextual information for knowledge classification.</p>
        <p>For each integrated knowledge phrase of each citation-reference pair, we combined the
three features, i.e., the citation sentence, the corresponding reference text, and the citation
section title, as the contextual information field of the integrated knowledge phrase, and
concatenated it with the phrase to form the input sequence of the deep learning models.</p>
        <p>Classification Module. We applied five deep learning models, namely LSTM,
TextCNN, BERT, BERT+SVM, and BERT+XGBoost, to the phrase classification
task.</p>
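        <p>The input sequence described for the Input Module might be assembled as in the sketch below. The field order and the “[SEP]” separator are assumptions made for illustration, since the text only states that the three context features are combined and joined with the phrase; the function name is hypothetical.</p>
        <preformat>
```python
def build_input_sequence(phrase, citation_sentence, reference_text, section_title,
                         separator=" [SEP] "):
    """Join a phrase with its three context fields into one text sequence."""
    # Assumed order: section title, citation sentence, reference text.
    fields = [section_title, citation_sentence, reference_text]
    context = " ".join(field.strip() for field in fields if field.strip())
    return phrase.strip() + separator + context


sequence = build_input_sequence(
    phrase="focus group",
    citation_sentence="A focus group was held with mobile phone owners [3].",
    reference_text="We used the focus group method with mobile phone users.",
    section_title="Methods",
)
print(sequence)
```
        </preformat>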
        <p>LSTM. LSTM (Long Short-Term Memory) [19] is a type of Recurrent Neural Network
(RNN) designed to model chronological sequences and their long-range
dependencies more precisely than conventional RNNs. It can therefore capture more
long-distance contextual information. In this paper, the contextual information and the integrated knowledge
phrases were each embedded in the embedding layers using the GloVe model.</p>
        <p>TextCNN. TextCNN [20] is a text classification technique based on the Convolutional
Neural Network (CNN). A CNN is a kind of artificial neural network in which the output of
each layer of neurons is used as the input of the next layer. Generally, it consists of
four parts: an embedding layer, a convolutional layer, a pooling layer, and a fully
connected layer. The input layer of the model is the same as that of the LSTM model.</p>
        <p>BERT. BERT [17] is not only a pre-trained language representation model but can also
act as a classifier when fine-tuned with just one additional output layer. For
each integrated knowledge phrase, the text sequence containing the contextual information and
the phrase was fed to the BERT model and embedded into
vectors in the embedding layers, which include token, segment, and
position embeddings. These embeddings were then combined in the fully connected layer
and fed to the output layer, which acted as a classifier predicting the function label
of the integrated knowledge phrase.</p>
        <p>BERT + SVM. SVM (Support Vector Machine) [21] is a kind of generalized linear
classifier that determines the best decision boundary between vectors that belong to
a category and those that do not. In this model, we used BERT to extract
pre-trained embeddings of our text features and fed them to the SVM classifier.</p>
        <p>BERT + XGBoost. XGBoost [22] stands for “Extreme Gradient Boosting”, an
implementation of gradient boosted decision trees. It is designed to push the limits of
computational resources for boosted tree algorithms and has achieved strong performance
and speed in applied machine learning. We also used BERT to vectorize the
input text sequence in this model.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Evaluation</title>
        <p>We chose precision, recall and F1 score to measure the performance of the integrated
knowledge phrases identification and classification methods.</p>
        <p>For the identification task, precision is the number of correctly identified
phrases divided by the total number of phrases identified by the automatic
methods. Recall is the number of correctly identified phrases divided by
the number of phrases that should be identified according to the manual annotation.</p>
        <p>In the classification task, we used weighted precision and recall scores over all
categories. For each category, precision is the number of correctly labelled
integrated knowledge phrases divided by the total number of integrated knowledge phrases
that the classification model assigned to this category, and recall is the number
of correctly labelled integrated knowledge phrases of this category divided by the number of
phrases that should be labelled as this category. For the overall precision and recall
scores on the test dataset, we weight the precision and recall of each
category by the proportion of phrases in that category and then sum the weighted
precision scores and the weighted recall scores over all categories.</p>
        <p>Finally, each F1 score is twice the product of the precision and recall scores
divided by their sum.</p>
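        <p>The evaluation measures defined above can be written down as a short sketch. The function names and the toy labels are hypothetical; the weighting follows the description in the text, with each category weighted by its share of the gold-standard phrases.</p>
        <preformat>
```python
from collections import Counter


def f1_score(precision, recall):
    """Twice the product of precision and recall divided by their sum."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_scores(gold, predicted):
    """Weighted precision and recall over all categories in the gold labels."""
    support = Counter(gold)
    predicted_counts = Counter(predicted)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    total = len(gold)
    precision = 0.0
    recall = 0.0
    for label, n_gold in support.items():
        weight = n_gold / total  # proportion of phrases in this category
        n_predicted = predicted_counts[label]
        label_precision = correct[label] / n_predicted if n_predicted else 0.0
        label_recall = correct[label] / n_gold
        precision += weight * label_precision
        recall += weight * label_recall
    return precision, recall


gold = ["Theory", "Theory", "Data", "Entity"]
predicted = ["Theory", "Data", "Data", "Entity"]
p, r = weighted_scores(gold, predicted)
print(p, r)  # 0.875 0.75

# Identification task: the reported precision and recall give the reported F1.
print(round(f1_score(0.989, 0.838), 3))  # 0.907
```
        </preformat>
        <p>With this weighting, the weighted recall equals the overall share of correctly labelled phrases, which is a quick sanity check on an implementation.</p>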
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <sec id="sec-3-1">
        <title>Experimental Datasets</title>
        <p>We constructed an eHealth dataset to test the performance of our methodology
framework. XML files of 3,221 eHealth papers published from 1999 to 2018 were
downloaded from two high-impact journals, Journal of Medical Internet Research and JMIR
mHealth and uHealth. The metadata of the references were complemented from Web
of Science (WoS) and PubMed. Overall, we obtained 199,461 citation-reference pairs.</p>
        <p>Two datasets were constructed for the two tasks in our methodology. For the
identification procedure, we randomly selected 100 citation-reference pairs, manually
annotated the matched phrases, and obtained 105 matched phrases in total. For the
classification task, we selected 45,166 matched phrases identified in our previous study [9],
which were labelled as “Research Methodology”, “Technology”, “Entity”, “Data”, or
“Theory”. This dataset was randomly divided into ten folds: eight for training,
containing 36,133 phrases; one for validation, containing 4,517 phrases; and the
remaining one, containing 4,516 phrases, for testing.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Results of Integrated Knowledge Phrases Identification</title>
        <p>We chose the combination of the direct and indirect matching approaches, which was used
in the previous study [9], as the baseline method. In this paper, we applied a new
approach, Word Embedding + Cosine Similarity, to improve on it. Two word embedding
models, GloVe and BERT, were compared. The cosine similarity threshold was
tuned over 0.7, 0.8, and 0.9. The evaluation results in Table 2 show that the baseline
approach already performs well, with a precision of 1.0 and an F1
score of 0.900. However, its recall is relatively low. All of the new
approaches obtained a higher recall than the baseline method, which
means that more integrated knowledge phrases can be recognized with the new
approach, making it more effective for knowledge integration analysis. We observe
that as the cosine similarity threshold increases, precision
rises while recall decreases. To measure the
performance of the new method comprehensively, we further calculated the F1 score. The
Baseline + BERT approach with a cosine similarity threshold of 0.9 performs
best, with an F1 score of 0.907.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Results of Phrases Classification</title>
        <p>The five models were trained on the training dataset, and their hyperparameters
were tuned on the validation dataset. The test dataset was then used for
evaluation. We calculated the weighted indicators on the overall test dataset as well as
on the unknown phrases dataset, which contains the 291 phrases in
the test dataset that occur in neither the training dataset nor the validation dataset. We
consider these unknown phrases a better measure of the generalization ability of the
models than the whole test dataset, since some phrases in the test dataset also
occur in the training and validation datasets, and the models may have memorized
the features of these phrases even though their contextual information differs.</p>
        <p>As shown in Table 3, the models with BERT embeddings, i.e., BERT, BERT+SVM and
BERT+XGBoost, were more effective than the other deep learning models, i.e., LSTM
and TextCNN. The BERT model itself already has powerful semantic understanding
and syntax analysis capabilities, and it achieved the best performance on the overall
test dataset, with a precision of 0.980, a recall of 0.980 and an F1 score of 0.980. On
the unknown phrases dataset, BERT+XGBoost is the best model, which may reflect
the efficiency, flexibility and portability of the XGBoost algorithm.</p>
        <p>[Table 3: rows for LSTM, TextCNN, BERT, BERT+SVM, and BERT+XGBoost, each evaluated on overall phrases and unknown phrases.]</p>
        <p>In this paper, we provided a new methodology that automatically identifies explicit
integrated knowledge phrases from citation contexts in interdisciplinary field papers
through word embedding techniques and then uses several deep learning models to
classify the functions of the integrated knowledge phrases. The eHealth field was taken
as a case of an interdisciplinary field to evaluate the methodology. The
results show that BERT performs well not only as a pre-trained language
representation model but also as a classifier. In the integrated knowledge phrases
identification process, it obtained Recall, Precision and F1 scores of 0.838, 0.989 and 0.907,
respectively, when combined with the baseline string matching method. It also
achieved weighted Recall, Precision and F1 scores of 0.980, 0.980 and 0.980 on the
overall test dataset in the classification task. Moreover, when the XGBoost
algorithm was used together with the BERT model, the model achieved the best performance on
the unknown phrases dataset, with weighted Recall, Precision and F1 scores of 0.856,
0.863 and 0.842.</p>
        <p>In general, this paper is one of the early works to apply word embedding
techniques and deep learning models to the identification and
classification of integrated knowledge phrases in an interdisciplinary field. This automatic methodology can
contribute to in-depth investigation of knowledge integration in an interdisciplinary field from
the content perspective. In addition, it could be applied to exploring knowledge interaction
between any source and target publications.</p>
        <p>However, this study also has some limitations. First, in the integrated
knowledge phrases identification step, we only considered the knowledge explicitly
integrated from the references into the citation sentences and did not include other
forms of knowledge integration, which would be needed to investigate the knowledge integration
of an interdisciplinary field comprehensively. Second, the cited reference texts in this article were
represented by the metadata of the references rather than by the cited texts identified in the full text
of the references. Integrated knowledge that is not contained in the metadata of the references
may therefore be lost. Finally, the dataset we used covers only two journals in the eHealth field;
more annotated datasets from various disciplines are needed to test the
performance of our methodology.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A hierarchical approach to analyzing knowledge integration between two fields-a case study on medical informatics and computer science</article-title>
          .
          <source>Scientometrics</source>
          <volume>119</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1455</fpage>
          -
          <lpage>1486</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          &amp;
          <string-name>
            <surname>Barabási</surname>
            ,
            <given-names>A. L.</given-names>
          </string-name>
          :
          <article-title>Science of science</article-title>
          .
          <source>Science</source>
          <volume>359</volume>
          (
          <issue>6379</issue>
          ),
          <elocation-id>eaao0185</elocation-id>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Leydesdorff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rafols</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations</article-title>
          .
          <source>Journal of Informetrics</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>87</fpage>
          -
          <lpage>100</lpage>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Borgman</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Rice</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          :
          <article-title>The convergence of information science and communication: A bibliometric analysis</article-title>
          .
          <source>Journal of the American Society for Information Science</source>
          <volume>43</volume>
          (
          <issue>6</issue>
          ),
          <fpage>397</fpage>
          -
          <lpage>411</lpage>
          (
          <year>1992</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>Y. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          :
          <article-title>A study of the evolution of interdisciplinarity in library and information science: Using three bibliometric methods</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>63</volume>
          (
          <issue>1</issue>
          ),
          <fpage>22</fpage>
          -
          <lpage>33</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Leydesdorff</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Probst</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The delineation of an interdisciplinary specialty in terms of a journal set: The case of communication studies</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>60</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1709</fpage>
          -
          <lpage>1718</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Understanding the formation of interdisciplinary research from the perspective of keyword evolution: A case study on joint attention</article-title>
          .
          <source>Scientometrics</source>
          <volume>117</volume>
          (
          <issue>2</issue>
          ),
          <fpage>973</fpage>
          -
          <lpage>995</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Engerer</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>68</volume>
          (
          <issue>3</issue>
          ),
          <fpage>660</fpage>
          -
          <lpage>680</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          :
          <article-title>Investigating interdisciplinary knowledge flow from the content perspective of citances</article-title>
          .
          <source>In: EEKE@JCDL</source>
          <year>2020</year>
          , pp.
          <fpage>40</fpage>
          -
          <lpage>44</lpage>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Elkiss</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fader</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erkan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>States</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Radev</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Blind men and elephants: What do citation summaries tell us about a research article?</article-title>
          <source>Journal of the American Society for Information Science and Technology</source>
          <volume>59</volume>
          (
          <issue>1</issue>
          ),
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Kondo</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nanba</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Takezawa</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>Technical trend analysis by analyzing research papers' titles</article-title>
          .
          <source>In: Proceedings of the Language and Technology Conference on Human Language Technology, Challenges for Computer Science and Linguistics</source>
          , pp.
          <fpage>512</fpage>
          -
          <lpage>521</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <publisher-name>Springer</publisher-name>
          ,
          <publisher-loc>Heidelberg</publisher-loc>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Heffernan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teufel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Identifying problems and solutions in scientific text</article-title>
          .
          <source>Scientometrics</source>
          <volume>116</volume>
          (
          <issue>2</issue>
          ),
          <fpage>1367</fpage>
          -
          <lpage>1382</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          Briefings in Bioinformatics, bbaa057 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>C. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allot</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leaman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>PubTator central: automated concept annotation for biomedical full text articles</article-title>
          .
          <source>Nucleic Acids Research</source>
          <volume>47</volume>
          (
          <issue>W1</issue>
          ),
          <fpage>W587</fpage>
          -
          <lpage>W593</lpage>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>King</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ammar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>ScispaCy: Fast and robust models for biomedical natural language processing</article-title>
          . arXiv preprint arXiv:1902.07669 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C. D.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M. W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Bertin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atanassova</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gingras</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Larivière</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The invariant distribution of references in scientific articles</article-title>
          .
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>67</volume>
          (
          <issue>1</issue>
          ),
          <fpage>164</fpage>
          -
          <lpage>177</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>arXiv preprint arXiv:1408.5882</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine Learning</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Guestrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>XGBoost: A Scalable Tree Boosting System</article-title>
          .
          <source>In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , pp.
          <fpage>785</fpage>
          -
          <lpage>794</lpage>
          . ACM, New York, USA (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>