<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Embeddings from Scientific Corpora using Lexical, Grammatical and Semantic Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andres Garcia-Silva</string-name>
          <email>agarcia@expertsystem.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ronald Denaux</string-name>
          <email>rdenaux@expertsystem.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose Manuel Gomez-Perez</string-name>
          <email>jmgomez@expertsystem.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Expert System</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Expert System</institution>
          ,
          <addr-line>Madrid, Spain</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Natural language processing can assist scientists in leveraging the increasing amount of information contained in scientific bibliography. The current trend, based on deep learning and embeddings, uses representations at the (sub)word level that require large amounts of training data and neural architectures with millions of parameters to learn successful language models, like BERT. However, these representations may not be well suited for the scientific domain, where it is common to find complex, e.g. multi-word, terms with a domain-specific meaning in a very specific context. In this paper we propose an approach based on a linguistic analysis of the corpus using a knowledge graph to learn representations that can unambiguously capture such terms and their meaning. We learn embeddings from different linguistic annotations on the text and evaluate them through a classification task over the SciGraph taxonomy, showing that our representations outperform (sub)word-level approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies → Natural language
processing; Neural networks; Semantic networks; Machine learning
approaches.</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Nowadays scholarly communications are evolving beyond the
conventional document-based delivery method, thanks to the
effort of research communities, funding agencies and publishers,
to gain better visibility and reuse capabilities and to foster broader
data accessibility [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The list of enhancements is wide and includes
the availability of supporting material such as code and research
software [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], the use of Digital Object Identifiers to favor reusability
and proper credit to authors, the emergence of specialized academic
search engines such as Semantic Scholar, and the adoption of the
FAIR principles [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ] to make data findable, accessible, interoperable
and reusable.
      </p>
      <p>
        Aligned with the FAIR principles, in particular with the goal of
assisting humans and machines in managing data, research objects
[
        <xref ref-type="bibr" rid="ref13 ref2 ref3 ref33">2, 3, 13, 33</xref>
        ] encapsulate and semantically annotate all the resources
involved in a research endeavour, enabling data interoperability and
machine-readability, among other benefits. In addition, publishers
have started releasing knowledge graphs such as Springer Nature
SciGraph (https://www.springernature.com/gp/researchers/scigraph),
an open linked data graph about publications from
the editorial group and cooperating partners, and the Literature
Graph in Semantic Scholar [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nevertheless, the knowledge in scholarly
communications is still mainly text, which is difficult to
process by software agents. Research objects shed some light on the
publication content through semantic annotations; however, they
are user-generated and scarce in existing repositories [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Semantic
Scholar, on the other hand, uses Natural Language Processing to
extract keywords and identify topics relevant for the publications.
      </p>
      <p>
        In fact, NLP technology is progressing at a fast pace thanks to
word embeddings [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and pre-trained language models based on
transformers [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], which have improved the state of the
art on different evaluation tasks [
        <xref ref-type="bibr" rid="ref12 ref24">12, 24</xref>
        ]. Most existing word
embeddings and pre-trained language models use sequences of
characters, word pieces, and words in a sentence as their main input.
However, in the scientific domain there are terms consisting of more
than one word that have a domain-specific semantics. For example,
the meaning of a term such as Molecularly imprinted polymer
(according to Wikipedia, a polymer that has been processed using
the molecular imprinting technique) can hardly be identified from
the single words, word pieces or other character-based
representations. Hence the neural models used for NLP need to learn the
relation between the single words, word pieces or characters,
requiring complex architectures with a high number of parameters to
optimize and a huge amount of training data.
      </p>
      <p>Scientific terminology is domain specific and scarce in a general
corpus, and hence accumulating the necessary amount of evidence
from documents to identify a term as a single entity with a specific
meaning is very unlikely if we analyse single-word and subword
representations. On the other hand, precisely in the scientific domain,
a wealth of structured resources, including catalogs, taxonomies
and knowledge graphs with specific terminology and the
corresponding definitions, is available. Thus, the question arises: what is
the minimum information unit, or combination thereof, that
allows for efficient representations in vector form and at the same
time can be linked to a semantically significant concept?</p>
      <p>In this paper we propose to generate embeddings using surface
forms, lemmas and concepts that are able to represent complex
terms consisting of more than one word. The linguistic information
is the result of applying a linguistic analysis that relies on a
knowledge graph in which linguistic knowledge is encoded. The linguistic
analysis performs a grammatical, syntactical and semantic analysis
to recognize and disambiguate terms that can consist of more than
one word. We generate embeddings from a scholarly
communications corpus for single and joined representations (surface forms,
lemmas, part-of-speech, and concepts). We experiment with these
embeddings in a text classification task where the goal is to classify
academic publications in a topic taxonomy.</p>
      <p>Our results show that using linguistic annotation embeddings
helps to learn better classifiers than using
only word or subword embeddings. According to our
experimentation, the best approach is to use surface form and lemma
embeddings jointly. When surface form and lemma embeddings are
enriched with grammar information embeddings, such as part-of-speech
tag embeddings, the classifier with the greatest precision is learned.
On the other hand, concept embedding results were mixed,
probably due to the limited coverage of the scientific vocabulary by the
general-purpose annotator used in the experiments.</p>
      <p>This paper is structured as follows. Section 2 describes the
related work and the paper contributions. Section 3 summarizes the
approach to learn the embeddings for linguistic annotations. Next,
Section 4 presents the experimental work, where we evaluate the
embeddings in a text classification task. Finally, Section 5 presents
the conclusions and future lines of work.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Recent work on distributional representations of words has moved
from static [
        <xref ref-type="bibr" rid="ref17 ref19 ref21 ref28 ref6">6, 17, 19, 21, 28</xref>
        ] to contextualized word embeddings
[
        <xref ref-type="bibr" rid="ref12 ref22">12, 22</xref>
        ], in an effort to generate them dynamically according to the
context and deal with phenomena like polysemy and homonymy.
A main problem with traditional word embeddings is that
unseen or rare words are not represented in the distributional
space and are hence considered out-of-vocabulary (OOV) words. To
overcome the OOV problem, different embedding representations
have been proposed, including the character level used in ELMo [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ],
character n-grams used in FastText [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], subwords used in GPT [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
and word pieces [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] used in BERT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        In parallel, researchers have proposed to learn concept
and word embeddings jointly as an alternative approach to cope with the
ambiguity of language. For example, Camacho-Collados et al.
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] rely on Wikipedia and Chen et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] on WordNet to generate
concept embeddings. Many approaches learn embeddings straight
from knowledge graphs [
        <xref ref-type="bibr" rid="ref20 ref25 ref26 ref7">7, 20, 25, 26</xref>
        ], and others use linguistic
annotations on a text corpus [
        <xref ref-type="bibr" rid="ref11 ref18">11, 18</xref>
        ].
      </p>
      <p>
        In the scientific domain, Wang et al. [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] highlighted the
limitations of general-purpose word embeddings in NLP tasks. To
deal with such limitations, Beltagy et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use BERT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to learn
embeddings from the scientific domain. In this work we adopt the
Vecsigrafo approach [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to generate embeddings from a scientific
corpus for surface forms, lemmas and concepts. Vecsigrafo
embeddings encode linguistic information, in contrast to approaches
like that of Beltagy et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which rely on word pieces.
      </p>
      <p>The main contribution of this paper is a comprehensive
experimentation in the scientific domain with Vecsigrafo embeddings
jointly learned from linguistic annotations, and their comparison with
word and subword embeddings.</p>
    </sec>
    <sec id="sec-5">
      <title>LEVERAGING LEXICAL, GRAMMATICAL AND SEMANTIC INFORMATION</title>
      <p>
        To learn embeddings from different linguistic annotations we use
Vecsigrafo [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], a method to learn embeddings for linguistic
annotations on a text corpus. Vecsigrafo extends the Swivel algorithm [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]
to jointly learn embeddings for surface forms, lemmas, grammar
types, and concepts on a corpus enriched with linguistic
annotations. Vecsigrafo embeddings outperformed the previous state of
the art in word and word-sense embeddings by co-training surface
form, lemma and concept embeddings as opposed to training each
individually.
      </p>
      <p>In contrast to the simple tokens produced by whitespace
tokenization, the linguistic annotations used in Vecsigrafo are based on
terms that can span one or more words. Surface forms are
terms as they appear in the text, and lemmas are the base forms of
these terms. Surface forms and lemmas can refer to concepts in a
knowledge graph. For example, table 1 shows the linguistic
annotations added to a text excerpt taken from a publication. Note how at
the surface form level some tokens are grouped into terms like local
anesthetic and phrenic nerve, and at the lemma level some surface
forms such as concerns and relating are turned into their base forms
concern and relate. The grammar information indicates the role of
the terms as nouns (N), verbs (V), noun and verb phrases (NP, VP),
prepositions (P) and punctuation marks (PNT). In addition, some of
the terms are linked to concepts; for example, local anesthetic is
annotated with the concept en%23107824862, defined as An
anesthetic that numbs a local area.</p>
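<p>As an illustration only, the kind of annotations shown in table 1 can be represented as tuples of surface form, lemma, grammar type and concept; the tuple layout below is our own sketch, not Cogito's actual output format.</p>

```python
# Hypothetical representation of the table 1 annotations discussed above
# (the tuple layout is illustrative, not Cogito's actual output format).
annotations = [
    # (surface form, lemma, grammar type, concept id or None)
    ("concerns",         "concern",          "N", None),
    ("relating",         "relate",           "V", None),
    ("local anesthetic", "local anesthetic", "N", "en%23107824862"),
    ("phrenic nerve",    "phrenic nerve",    "N", None),
]

# Multi-word surface forms remain single units instead of being split apart.
multiword_terms = [sf for sf, _, _, _ in annotations if " " in sf]
```

<p>The key point is that a multi-word term keeps a single entry, so it can receive its own embedding.</p>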
      <p>Formally, Vecsigrafo generates, from a corpus, an embedding
space Φ = {(x, e) : x ∈ SF ∪ L ∪ G ∪ C, e ∈ Rn }, where SF, L, G, and
C are sets of surface forms, lemmas, grammar types, and concepts.
One of the benefits of Vecsigrafo is that concept embeddings
contribute to identifying the intended meaning of ambiguous terms
in the corpus, since the term and concept embeddings are learned
jointly. To use the Vecsigrafo embeddings in Φ we need to annotate the
target corpus with the linguistic elements used to learn the
embeddings. Note that embeddings representing linguistic annotations
for the same term can be merged to generate a single embedding
for the term, for example by applying vector operations such as
concatenation or averaging, or dimensionality reduction techniques
like PCA or SVD.</p>
    </sec>
    <sec id="sec-6">
      <title>EXPERIMENTAL WORK</title>
      <p>In this section we describe the scholarly communication corpus
used to learn the linguistic embeddings, the NLP toolkit used to
annotate the corpus, and the neural network that uses the linguistic
embeddings to classify the research publications, and we report the
evaluation results of the classifiers.</p>
    </sec>
    <sec id="sec-7">
      <title>Embeddings for Scholarly Communications</title>
      <p>
        SciGraph [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is a linked open data platform for the scientific
domain. It contains information from the complete research process:
research projects, conferences, authors and publications, among
others. The knowledge graph contains more than 1 billion facts
about objects of interest to the scholarly domain, distributed over
some 85 million entities described using 50 classes and more than
250 properties. Most of the knowledge graph is available under the CC
BY 4.0 License (i.e., attribution), with the exception of abstracts and
grant metadata, which are available under the CC BY-NC 4.0 License
(i.e., attribution and non-commercial). A core ontology expressed in
OWL encodes the semantics of the data in the knowledge graph
consisting of 47 classes and 253 properties. From SciGraph we
extract publications including articles and book chapters published
from 2001 to 2017. We use the titles and abstracts of the publications
to generate the corpus with roughly 3.2 million publications, 1.4
million distinct words, and 700 million tokens.</p>
      <p>Next we use Expert System's NLP suite (Cogito) to parse the text
and add linguistic annotations. Cogito's disambiguator relies on its
own knowledge graph, called Sensigrafo, which encodes linguistic
knowledge in a way similar to WordNet, and applies a rule-based
approach to disambiguation. The Sensigrafo contains about 400K
lemmas and 300K concepts interlinked via 61 relation types. Note
that we could have used any other NLP toolkit as long as it generates
the linguistic annotations used in this work. The corpus parsing
and annotations generated by Cogito are reported in table 2.</p>
      <p>
        For each linguistic element we learned an initial set of
embeddings with 300 dimensions using Vecsigrafo. The difference between
the number of learned embeddings and the number of linguistic annotations
is due to a filter that we applied based on previous results [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We
filter out elements with grammar type article, punctuation mark
or auxiliary verb, and generalize tokens with grammar type entity
or person proper noun, replacing the original token with the special
tokens grammar#ENT and grammar#NPH respectively. In addition
to these embeddings, we learned 10 Vecsigrafo embedding spaces
for the possible combinations of size 2 and 3 between the linguistic
elements sf, l, g and c.
      </p>
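<p>A minimal sketch of the vocabulary filter described above; the grammar-type tag names used here (ART, PNT, AUX, ENT, NPH) are our assumptions for illustration, not necessarily Cogito's actual tag set.</p>

```python
# Sketch of the filtering and generalization step described above.
# The tag names below are assumed for illustration, not Cogito's actual tags.
DROPPED_TYPES = {"ART", "PNT", "AUX"}   # articles, punctuation, auxiliary verbs
GENERALIZED = {"ENT": "grammar#ENT",    # entities
               "NPH": "grammar#NPH"}    # person proper nouns

def preprocess(token, grammar_type):
    """Return the vocabulary entry for a token, or None to filter it out."""
    if grammar_type in DROPPED_TYPES:
        return None
    return GENERALIZED.get(grammar_type, token)
```

<p>Dropped elements never enter the training vocabulary, while generalized ones share a single special token and hence a single embedding.</p>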
      <sec id="sec-7-2">
        <title>Embeddings Generation</title>
        <p>Publications in SciGraph have one or more field of research codes
that classify the documents in 22 categories such as Mathematical
Sciences, Engineering or Medical and Health Sciences. Thus, we can
formulate a multi-label classification task that aims at predicting
one or more of these 22 first level categories for each publication.</p>
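<p>The multi-label formulation above amounts to encoding each publication as a 22-dimensional binary target vector, one position per first-level category. The minimal sketch below lists only three category names for brevity; in practice a utility such as scikit-learn's MultiLabelBinarizer does the same job.</p>

```python
# Minimal multi-label target encoding for the task described above.
# Only three of the 22 first-level categories are listed, for brevity.
CATEGORIES = ["Engineering",
              "Mathematical Sciences",
              "Medical and Health Sciences"]

def encode(labels):
    """Map a set of category names to a binary indicator vector."""
    return [1 if c in labels else 0 for c in CATEGORIES]

y = encode({"Engineering", "Medical and Health Sciences"})
```

<p>A publication with several codes simply gets several ones in its target vector.</p>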
        <p>
          Embeddings are the natural numerical representation of text
for neural networks. Kim [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] showed that convolutional neural
networks (CNNs) are well suited for text classification, improving the
state of the art on different text classification tasks
and benchmarks. CNNs are based on convolutional layers that slide
filters (a.k.a. kernels) across the input data and return the dot products
of the elements of the filter and each fragment of the input. These
convolutions allow the network to learn features from the data,
alleviating the manual feature selection required in traditional approaches.
Stacking several convolutional layers allows feature composition,
increasing the level of abstraction from the initial layers to the
output.
        </p>
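<p>Reduced to a single one-dimensional filter, the convolution described above slides a window over the input and returns one dot product per position. The following NumPy sketch uses a 5-element window, matching the window size used in our experiments; the input and kernel values are arbitrary.</p>

```python
import numpy as np

# One 1-D filter of a convolutional layer: slide a 5-element window over the
# input and compute the dot product at each position (arbitrary example data).
def conv1d(x, kernel):
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

x = np.arange(10.0)      # stand-in for an input sequence of 10 features
kernel = np.ones(5) / 5  # a 5-element averaging filter
out = conv1d(x, kernel)  # 10 - 5 + 1 = 6 output positions
```

<p>A convolutional layer computes many such filters in parallel, with the kernel values learned rather than fixed.</p>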
        <p>To learn the classifier we use an off-the-shelf CNN
implementation available in Keras, with 3 convolutional layers, 128 filters
and a 5-element window size. As corpus we use the 187,795 articles
available in SciGraph published in 2011. To evaluate the classifiers
we use ten-fold cross-validation with precision, recall and f-measure
as metrics. We use a vocabulary with at most 20K entries, and
sequences of size 1,000.</p>
        <p>As a baseline, we train a classifier that learns from embeddings
generated randomly following a normal distribution. As an upper
bound, we learn a classifier that is able to optimize the embeddings
during the learning process. The evaluation of the baseline and upper
bound classifiers is presented in table 3.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Classifiers using Vecsigrafo embeddings</title>
      <p>We train classifiers using single Vecsigrafo embeddings for each
linguistic annotation (sf, l, c) and for the ten size-2 and size-3
combinations of (sf, l, g, c). Grammar embeddings were not evaluated
independently due to the low number of distinct grammar types
used to annotate the terms. When using embeddings of two or
three linguistic annotations, two different approaches are used. The
first approach relies on a single vocabulary containing at most 20K
entries per linguistic annotation in the text, and no merging
operation is carried out, while in the second one embeddings are
merged using concatenation or averaging. Evaluation results are
reported in table 4.</p>
    </sec>
    <sec id="sec-9">
      <title>Lemmas better than surface forms and tokens</title>
      <p>Regarding single linguistic annotations, lemma (l) and surface
form (sf) embeddings each contribute to learning a better classifier
than token (t) embeddings. This shows that
the classifier learning process benefits from the conflation of
different term and word variations (sf, t) into a base form (l). However,
grouping raw tokens into terms (sf) only generates a slight
improvement in classifier performance with respect to using only
tokens (t). On the other hand, the performance of concept (c)
embeddings in this task is worse than that of t embeddings. The low
number of c embeddings (see table 2) compared to the number of
tokens and the other linguistic annotations affects the learning
process negatively. The difference between concepts and tokens is a
consequence of the limited coverage of the general-purpose annotator
in a highly specialized domain such as the scientific one.</p>
    </sec>
    <sec id="sec-10">
      <title>Lemmas and surface forms the best combination</title>
      <p>To analyse the results of the different combinations of embeddings
for linguistic annotations we focus on each evaluation metric.
Regarding precision, the top 2 classifiers are learned from
combinations of sf, l and g. In addition, note that the common linguistic
element in the top 6 classifiers is g, combined either with sf or l,
and in general removing g produced less precise classifiers. Thus,
precision-wise, part-of-speech information in
combination with surface forms and lemmas is very relevant.
Semantic information (c) also contributes to enhancing precision when it
is combined with lemmas and surface forms, or with lemmas and
grammar information. In addition, the precision of 16 classifiers
out of 22 is better than the upper bound reported in table 3, where
the embeddings are optimized in the classifier learning phase, even
though Vecsigrafo embeddings were not learned for this specific
purpose.</p>
      <p>The recall analysis shows a different picture, since grammar
information (g) does not seem to have a decisive role in
classifier performance. Surface forms and lemmas generate the
classifier with the highest recall. Nevertheless, in this analysis
concepts (c) gain more relevance, always in combination with either sf
or l. The combination of l and c seems to benefit recall, since it is
present in 3 of the top 5 classifiers. In contrast, when concepts are
combined with sf the recall is lower. In general, g-based embedding
combinations generate classifiers with lower recall. Note that none
of the classifiers reached the recall of the upper bound classifier.</p>
      <p>The f-measure data shows more heterogeneous results since,
by definition, the f-measure is the harmonic mean of precision and
recall, and hence the embedding combinations that yield the best
f-measure need both high precision and high recall. The combination of
surface form (sf) and lemma (l) embeddings is at the top of
the f-measure ranking, followed by their combination with c. In
general, concept embeddings improve the f-measure when
combined with either lemmas or surface forms. However, when used in
conjunction with both lemma and surface form embeddings, the
performance is worse. In general, due to the low coverage of concepts in
the scientific domain, the classifiers that rely only on c embeddings
perform worse, even when combined with grammar information.
Similarly, surface forms offer poor performance when combined
with grammar information.</p>
      <p>Finally, note that the best classifiers were learned when the
linguistic annotation embeddings were used independently, in
contrast to the worse results achieved when merging the
embeddings.</p>
    </sec>
    <sec id="sec-11">
      <title>Words and subwords</title>
      <p>
        We also test embeddings generated from word constituents. We
resorted to FastText [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], since the Vecsigrafo approach was not designed
to generate embeddings for word constituents. We use FastText to
generate token and character-ngram embeddings, with n ranging
from 3 to 6. We use these embeddings to learn the classifiers using
the same CNN architecture and evaluation procedure used in the
experiments described above. Evaluation results, presented in table 5,
show that token embeddings are better than combined token and
character-ngram embeddings, which is in line with our assumption
that subword representations might not be convenient in the
scientific domain. Note that one of the benefits of using
character-ngram embeddings is avoiding out-of-vocabulary (OOV)
words. However, in our case, the embeddings were learned from
the whole SciGraph corpus, so we do not face the OOV problem in
our experiments.</p>
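<p>For reference, the character n-grams (n from 3 to 6) that FastText derives from a word can be sketched as follows; FastText wraps each word in angle-bracket boundary markers before extracting n-grams. This illustrates the representation only, not gensim's or FastText's actual API.</p>

```python
# Character n-grams (n = 3..6) in the style of FastText, which wraps each
# word in angle-bracket boundary markers before extracting n-grams.
def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"
    return [w[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

grams = char_ngrams("polymer")  # "<polymer>" has 9 characters
```

<p>A word's vector is then the sum of the vectors of its n-grams, which is why unseen words can still be represented.</p>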
      <p>On the other hand, the results in tables 4 and 5 are not directly
comparable, since the embeddings are generated with different
algorithms (FastText vs. Vecsigrafo). For example, FastText token
embeddings generate a better classifier than Vecsigrafo token
embeddings, and remarkably FastText embeddings in both cases
reach the highest precision of all the tested embeddings.
Nevertheless, we can see that the f-measure of the classifier that uses
FastText character-ngram embeddings is lower than the first 11
results reported in table 4, including the classifier that uses only
lemmas.</p>
    </sec>
    <sec id="sec-12">
      <title>CONCLUSIONS</title>
      <p>Natural language processing has the potential to help scientists
manage and get insights out of the huge amount of scholarly
communications available. Nowadays, deep learning techniques based
on word embeddings and language models have advanced the state
of the art in different NLP tasks. Nevertheless, the predominant
approach in NLP is to use word or subword representations as the
input of deep neural architectures that require large corpora to
learn well-performing language models. However, in contrast to
general-purpose corpora, the scientific vocabulary often contains complex
terms comprising more than one word, with the additional
characteristic that these terms are very specific and only make sense in
certain fields of knowledge (e.g., Cosmic Microwave Background
Radiation). Thus, models using word or subword representations
may have problems gathering the necessary textual evidence to
capture their meaning.</p>
      <p>To overcome the limitations of word and subword representations,
we propose to use embeddings based on linguistic annotations such as
surface forms, lemmas, part-of-speech information, and concepts.
These embeddings are jointly learned from a corpus of scientific
communications using an existing approach called Vecsigrafo. We
evaluated the linguistic annotation embeddings in a multi-label
classification task where the goal was to assign scientific topics to each
publication. Our evaluation results show that lemmas help to learn
better classifiers than space-separated words and subword
representations based on character n-grams. The best results were
achieved when lemma and surface form embeddings were used jointly.
Grammar information was very useful for high precision. Concepts, on
the other hand, were less helpful in general, mainly due to the low
coverage of concepts in the scientific domain. Since the identification
of surface forms and lemmas is based on lexical and syntactical
analysis, their coverage was higher.</p>
      <p>As future work, we want to evaluate the linguistic annotation
embeddings on evaluation tasks other than text classification,
where understanding the terminology can have more impact, such as
entailment and question answering. In addition, another line
of research is to evaluate the impact of the linguistic annotations
when used as input representations to learn language models.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research has been supported by the European Language Grid
project funded by the European Union's Horizon 2020 research and
innovation programme under grant agreement No. 825627 (ELG).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Waleed</given-names>
            <surname>Ammar</surname>
          </string-name>
          , Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Kohlmeier</surname>
          </string-name>
          , Kyle Lo, Tyler C. Murray, HsuHan Ooi, Matthew E. Peters, Joanna L. Power, Sam Skjonsberg, Lucy Lu Wang, Christopher Wilhelm, Zheng Yuan, Madeleine van Zuylen,
          <string-name>
            <given-names>and Oren</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Construction of the Literature Graph in Semantic Scholar</article-title>
          . In NAACL-HLT.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I</given-names>
            <surname>Buchan</surname>
          </string-name>
          , D De Roure,
          <string-name>
            <given-names>P</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Ainsworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Bhagat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Couch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Cruickshank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Delderfield</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I</given-names>
            <surname>Dunlop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Gamble</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Michaelides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Owen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Sufi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C</given-names>
            <surname>Goble</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Why linked data is not enough for scientists</article-title>
          .
          <source>Future Generation Computer Systems</source>
          <volume>29</volume>
          ,
          <issue>2</issue>
          (
          <year>2013</year>
          ),
          <fpage>599</fpage>
          -
          <lpage>611</lpage>
          . https://doi.org/10.1016/j.future.2011.08.004. Special section: Recent advances in e-Science.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>Garijo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Missier</surname>
          </string-name>
          , DR Newman,
          <string-name>
            <given-names>R</given-names>
            <surname>Palma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>Garcia-Cuesta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>JM</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G</given-names>
            <surname>Klyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K</given-names>
            <surname>Page</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Roos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>JE</given-names>
            <surname>Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Soiland-Reyes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L</given-names>
            <surname>Verdes-Montenegro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>De Roure</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C</given-names>
            <surname>Goble</surname>
          </string-name>
          . [n. d.].
          <article-title>Workflow-Centric Research Objects: A First Class Citizen in the Scholarly Discourse</article-title>
          .
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . http://ceur-ws.org/Vol-903/paper-01.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Iz</given-names>
            <surname>Beltagy</surname>
          </string-name>
          , Arman Cohan, and
          <string-name>
            <given-names>Kyle</given-names>
            <surname>Lo</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>SciBERT: Pretrained Contextualized Embeddings for Scientific Text</article-title>
          . arXiv:1903.10676
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          .
          <source>arXiv preprint arXiv:1607.04606</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          ),
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Bordes</surname>
          </string-name>
          , Nicolas Usunier, Jason Weston, and
          <string-name>
            <given-names>Oksana</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Translating Embeddings for Modeling Multi-Relational Data</article-title>
          .
          <source>Advances in NIPS 26</source>
          (
          <year>2013</year>
          ),
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          . https://doi.org/10.1007/s13398-014-0173-7.2 arXiv:1011.1669v3
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Philip E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , Timothy W. Clark, Robert Dale, Anita de Waard, Ivan Herman,
          <string-name>
            <given-names>Eduard H.</given-names>
            <surname>Hovy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>David</given-names>
            <surname>Shotton</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Improving The Future of Research Communications and e-Scholarship (Dagstuhl Perspectives Workshop 11331)</article-title>
          .
          <source>Dagstuhl Manifestos</source>
          <volume>1</volume>
          ,
          <issue>1</issue>
          (
          <year>2012</year>
          ),
          <fpage>41</fpage>
          -
          <lpage>60</lpage>
          . https://doi.org/10.4230/DagMan.1.1.
          <fpage>41</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>José</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          , Mohammad Taher Pilehvar, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>240</volume>
          (
          <year>2016</year>
          ),
          <fpage>36</fpage>
          -
          <lpage>64</lpage>
          . https://doi.org/10.1016/j.artint.2016.07.005
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Xinxiong</given-names>
            <surname>Chen</surname>
          </string-name>
          , Zhiyuan Liu, and
          <string-name>
            <given-names>Maosong</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A Unified Model for Word Sense Representation and Disambiguation</article-title>
          . In EMNLP.
          <fpage>1025</fpage>
          -
          <lpage>1035</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R</given-names>
            <surname>Denaux</surname>
          </string-name>
          and
          <string-name>
            <given-names>JM</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Vecsigrafo: Corpus-based Word-Concept Embeddings - Bridging the Statistic-Symbolic Representational Gap in Natural Language Processing</article-title>
          . To appear in Semantic Web Journal http://www.semanticweb-journal.net/system/files/swj2148.pdf (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          . arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Andres</given-names>
            <surname>Garcia-Silva</surname>
          </string-name>
          , Jose Manuel Gomez-Perez, Raul Palma, Marcin Krystek, Simone Mantovani, Federica Foglini, Valentina Grande, Francesco De Leo, Stefano Salvi, Elisa Trasatti, Vito Romaniello, Mirko Albani, Cristiano Silvagni, Rosemarie Leone, Fulvio Marelli, Sergio Albani, Michele Lazzarini,
          <string-name>
            <given-names>Hazel J.</given-names>
            <surname>Napier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Helen M.</given-names>
            <surname>Glaves</surname>
          </string-name>
          , Timothy Aldridge, Charles Meertens, Fran Boler, Henry W. Loescher, Christine Laney, Melissa A. Genazzio, Daniel Crawl, and
          <string-name>
            <given-names>Ilkay</given-names>
            <surname>Altintas</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Enabling FAIR research in Earth Science through research objects</article-title>
          .
          <source>Future Generation Computer Systems</source>
          <volume>98</volume>
          (
          <year>2019</year>
          ),
          <fpage>550</fpage>
          -
          <lpage>564</lpage>
          . https://doi.org/10.1016/j.future.2019.03.046
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Jose Manuel</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Raul</given-names>
            <surname>Palma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andres</given-names>
            <surname>Garcia-Silva</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Towards a human-machine scientific partnership based on semantically rich research objects</article-title>
          .
          <source>In 2017 IEEE 13th International Conference on e-Science (e-Science). IEEE</source>
          ,
          <fpage>266</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Tony</given-names>
            <surname>Hammond</surname>
          </string-name>
          , Michele Pasin, and
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Theodoridis</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Data integration and disintegration: Managing Springer Nature SciGraph with SHACL and OWL</article-title>
          .
          <source>In International Semantic Web Conference (Posters, Demos and Industry Tracks) (CEUR Workshop Proceedings)</source>
          , Nadeschda Nikitina, Dezhao Song,
          <source>Achille Fokoue, and Peter Haase (Eds.)</source>
          , Vol.
          <volume>1963</volume>
          . CEUR-WS.org
          . http://dblp.uni-trier.de/db/conf/semweb/iswc2017p.html#HammondPT17
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Yoon</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Convolutional Neural Networks for Sentence Classification</article-title>
          .
          <source>In EMNLP.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoav</given-names>
            <surname>Goldberg</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Neural Word Embedding As Implicit Matrix Factorization</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2 (NIPS'14)</source>
          . MIT Press, Cambridge, MA, USA,
          <fpage>2177</fpage>
          -
          <lpage>2185</lpage>
          . http://dl.acm.org/citation.cfm?id=2969033.2969070
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Massimiliano</given-names>
            <surname>Mancini</surname>
          </string-name>
          , José Camacho-Collados,
          <string-name>
            <given-names>Ignacio</given-names>
            <surname>Iacobacci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Embedding Words and Senses Together via Joint Knowledge-Enhanced Training</article-title>
          . In CoNLL.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>CoRR abs/1301.3781</source>
          (
          <year>2013</year>
          ). arXiv:1301.3781 http://arxiv.org/abs/1301.3781
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Maximilian</given-names>
            <surname>Nickel</surname>
          </string-name>
          , Lorenzo Rosasco, and
          <string-name>
            <given-names>Tomaso A.</given-names>
            <surname>Poggio</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Holographic Embeddings of Knowledge Graphs</article-title>
          .
          <source>In AAAI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          .
          <source>In EMNLP</source>
          , Vol.
          <volume>14</volume>
          .
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Peters</surname>
          </string-name>
          , Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep Contextualized Word Representations</article-title>
          .
          <source>In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers).
          <source>Association for Computational Linguistics</source>
          ,
          <fpage>2227</fpage>
          -
          <lpage>2237</lpage>
          . https://doi.org/10.18653/v1/N18-1202
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Karthik Narasimhan, Tim Salimans, and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Improving language understanding by generative pre-training</article-title>
          .
          URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Jeffrey Wu, Rewon Child, David Luan,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
          <source>OpenAI Blog</source>
          <volume>1</volume>
          ,
          <issue>8</issue>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Petar</given-names>
            <surname>Ristoski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Heiko</given-names>
            <surname>Paulheim</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>RDF2Vec: RDF graph embeddings for data mining</article-title>
          .
          <source>In International Semantic Web Conference</source>
          , Vol.
          <volume>9981</volume>
          LNCS.
          <fpage>498</fpage>
          -
          <lpage>514</lpage>
          . https://doi.org/10.1007/978-3-319-46523-4_30
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Schlichtkrull</surname>
          </string-name>
          , Thomas N Kipf,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Bloem</surname>
          </string-name>
          , Rianne van den Berg, Ivan Titov, and
          <string-name>
            <given-names>Max</given-names>
            <surname>Welling</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Modeling Relational Data with Graph Convolutional Networks</article-title>
          .
          <source>arXiv:1703.06103</source>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Mike</given-names>
            <surname>Schuster</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kaisuke</given-names>
            <surname>Nakajima</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Japanese and Korean voice search</article-title>
          .
          <source>In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          . IEEE,
          <fpage>5149</fpage>
          -
          <lpage>5152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Noam</given-names>
            <surname>Shazeer</surname>
          </string-name>
          , Ryan Doherty,
          <string-name>
            <given-names>Colin</given-names>
            <surname>Evans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Waterson</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Swivel: Improving Embeddings by Noticing What's Missing</article-title>
          . arXiv preprint (
          <year>2016</year>
          ). arXiv:1602.02215
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Arfon M.</given-names>
            <surname>Smith</surname>
          </string-name>
          , Daniel S. Katz, and
          <string-name>
            <given-names>Kyle E.</given-names>
            <surname>Niemeyer</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Software citation principles</article-title>
          .
          <source>PeerJ Computer Science</source>
          <volume>2</volume>
          (Sept.
          <year>2016</year>
          ), e86
          . https://doi.org/10.7717/peerj-cs.86
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones,
          <string-name>
            <given-names>Aidan N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Lukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention Is All You Need</article-title>
          .
          <source>CoRR abs/1706.03762</source>
          (
          <year>2017</year>
          ). arXiv:1706.03762 http://arxiv.org/abs/1706.03762
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Yanshan</given-names>
            <surname>Wang</surname>
          </string-name>
          , Sijia Liu, Naveed Afzal, Majid Rastegar-Mojarad,
          <string-name>
            <given-names>Liwei</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Feichen</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>Kingsbury</surname>
          </string-name>
          , and Hongfang Liu.
          <year>2018</year>
          .
          <article-title>A comparison of word embeddings for the biomedical natural language processing</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>87</volume>
          (
          <year>2018</year>
          ),
          <fpage>12</fpage>
          -
          <lpage>20</lpage>
          . https://doi.org/10.1016/j.jbi.2018.09.008
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          and et al.
          <year>2016</year>
          .
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          .
          <source>Nature Scientific Data</source>
          <volume>160018</volume>
          (
          <year>2016</year>
          ). http://www.nature.com/articles/sdata201618
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>J</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>JM</given-names>
            <surname>Gomez-Perez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G</given-names>
            <surname>Klyne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E</given-names>
            <surname>García-Cuesta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Garrido</surname>
          </string-name>
          , KM Hettne,
          <string-name>
            <given-names>M</given-names>
            <surname>Roos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D</given-names>
            <surname>De Roure</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C</given-names>
            <surname>Goble</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Why workflows break - Understanding and combating decay in Taverna workflows</article-title>
          .
          <source>In 8th IEEE International Conference on E-Science</source>
          .
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . https://doi.org/10.1109/eScience.2012.6404482
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>