<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lorcán Pigott-Dix</string-name>
          <email>lorcan.pigott-dix@earlham.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert P. Davey</string-name>
          <email>robert.davey@earlham.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <!-- Keywords: Deep Learning, Ontology, Named Entity Recognition, Concept Recognition -->
        <aff id="aff0">
          <label>0</label>
          <institution>Earlham Institute</institution>
          ,
          <addr-line>Norwich Research Park, Colney Lane, Norwich NR4 7UZ</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increasing scale of scientific output necessitates the use of machine-based tools to index, interpret, and allow scientists to digest the expanding volumes of data and literature. These tools depend on rich machine-readable ontology-based metadata. The scale of this task renders manual annotation infeasible. This work compares multi-ontology deep learning-based models for identifying ontology concepts in the natural language text of scientific literature. An existing convolutional neural net (CNN) architecture was improved and compared with two attention-based variants, a CNN with a Squeeze-and-Excite (SAE) mechanism, and a self-attention architecture that had been adapted for limited training data. The models were assessed against a gold-standard dataset of 228 PubMed abstracts, annotated with Human Phenotype Ontology terms. All models exceeded the previous state-of-the-art, with the SAE model promising to be the best candidate for multi-domain ontology concept extraction.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-3">
      <title>2. Related work</title>
      <sec id="sec-3-1">
        <title>2.1. Concept recognition</title>
        <p>
          Previous tools for ontology-based concept recognition have largely been rule-based [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ]. They
typically identify potential concepts within text using string matching, coupled with heuristics
to refine the candidate concepts. These methods tend to have high precision but low recall
scores, as they struggle to identify synonyms of concepts that are not represented verbatim
in the ontology.
        </p>
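        <p>The precision–recall trade-off described above can be sketched with a toy dictionary matcher. The terms and concept IDs below are hypothetical illustrations, not entries from the tools cited:</p>

```python
# Illustrative mini-dictionary (hypothetical entries): exact string matching
# finds terms verbatim but misses synonymous phrasings absent from the ontology.

ontology_dictionary = {
    "abnormality of the eye": "HP:0000478",
    "visual impairment": "HP:0000505",
}

def dictionary_match(text):
    """Return (term, concept ID) pairs for exact dictionary hits."""
    text = text.lower()
    return [(term, cid) for term, cid in ontology_dictionary.items() if term in text]

# A verbatim mention is found (high precision)...
hits = dictionary_match("The patient presented with visual impairment.")
# ...but a synonymous phrasing is missed entirely (low recall).
misses = dictionary_match("The patient's eyesight was severely reduced.")
```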
        <p>
          Recently, ontology-based concept recognition methodologies have begun to incorporate
neural nets, typically employing recurrent neural nets (RNNs), as RNNs can learn dependencies
between words in sequences. These methods employ word embeddings, which help to address
the synonym gap as they represent meaning rather than lexical structure. However, these
methods rely on substantial manual annotation or noisy heuristic data generation. For example,
one [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] used an ontology to heuristically label a training corpus, while another [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] relied on
manual annotation carried out by medical specialists.
        </p>
        <p>
          Arbabi et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] created a method that exploits word embeddings and only requires an
ontology for training: the Neural Concept Recognizer (NCR). NCR is a “neural dictionary”
that uses a convolutional neural net (CNN) to learn associations between sequences of word
embeddings and embeddings representing ontology concepts. The neural dictionary converts
input text into the representation space of the concepts and finds the most similar concept
embeddings. It only requires an OBO format ontology, and no heuristic annotation. When the
NCR model was evaluated against text annotated with concepts from the Human Phenotype
Ontology (HPO), it achieved micro and macro F1 scores of 70.2% and 73.9%, respectively.
        </p>
        <p>“PhenoTagger” [12] combined a string-matching approach with a neural classifier for the task
of HPO concept recognition. A dictionary of concept names, synonyms, and their lemmatised
forms was created from an ontology. The dictionary was used to label a distantly supervised
training set, which, in turn, was used to fine-tune a pre-trained BioBERT model. In
deployment, PhenoTagger combines the outputs from the string-matching dictionary and BioBERT.
PhenoTagger achieves a “document-level” F-score of 75.7% – the current state-of-the-art (SOTA)
for neural dictionary methods.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Attention</title>
        <p>In recent years, models containing attention mechanisms have become the SOTA for many
natural language processing (NLP) and computer vision tasks, from machine translation [13] to
object detection [14]. In neural networks, attention mechanisms are specific trainable weights
that learn to amplify the signal of task-relevant features and diminish
less relevant features. For example, the Squeeze-and-Excite (SAE) attention mechanism [15]
emphasises or diminishes feature signals by modelling dependencies between convolutional
filters in a CNN. A more sophisticated attention mechanism, multi-headed self-attention (MHSA),
explicitly models the semantic dependencies between words in a sequence, in order to compute
updated word embeddings.</p>
        <p>
          MHSA is an integral part of SOTA transformer models [13]. However, these models require
large volumes of training data to be effective. Guo et al. [16] argue that this is because
self-attention models have a poor inductive bias, and instead rely heavily on these large volumes in
order to generalise well. This becomes a problem for models that are trained on limited datasets,
such as the text provided by an ontology. Arbabi et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] tried attention mechanisms as an
alternative to the CNN used in their model, but they were not effective. This may be explained
by either the poor inductive bias, the limited training data, or the relative efficacy of CNNs at
local feature extraction.
        </p>
        <p>Guo et al. [16] describe an alternative configuration of MHSA, called Scale-Aware
Self-Attention (SASA). With SASA, each attention head attends to a variable scale. The scaling
restricts attention to within a certain neighbourhood of each sequence position. The intuition
here is that words in close proximity within a sentence are more likely to have more
relevance to each other. This scaling forces the attention heads to attend to a smaller set
of features, so the relative differences between the remaining features are more pronounced,
improving the inductive bias. SASA models exceed the SOTA, or are competitive, for a number
of low-resource NLP tasks – requiring far fewer training examples than typical MHSA models.</p>
        <p>Other methods have been developed to improve the performance of MHSA models. Zhou
et al. [17] randomly drop entire attention heads during training to prevent a minority of
heads from dominating the model, improving the model’s ability to generalise. Wu et al. [18]
applied dropout at various points within the attention mechanism: to the attention weights
and activation layer; to the query, key, and value matrices; and to the output features prior to
the linear transformation. They also randomly removed a proportion of tokens from the input
sequence. This was found to improve the performance of MHSA models without additional
training data or computational power.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Contribution</title>
      <p>We achieve a new SOTA for neural ontology-based concept recognition. The improved
performance is obtained by incorporating an attention mechanism in the CNN architecture and by
using higher quality word embeddings. Unlike previous neural dictionaries, our approach can incorporate
multiple domain ontologies at once. We found that models trained using a combination of diverse
ontologies performed the best. This work also demonstrates that transformer-based
architectures can be modified, with variable attention scaling, to perform competitively with CNNs in
situations where training data is substantially limited. All of the models tested are available
here: https://github.com/lorcanpd/adorNER.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Methods</title>
      <sec id="sec-5-1">
        <title>4.1. Neural Concept Recognition (NCR) adapted to use ELMo Word Embeddings</title>
        <p>The NCR classifier comprises two parts: the concept embeddings and the CNN classifier.
The concepts are represented by a matrix of randomly initialised embeddings, which share
information between related concepts via multiplication by an ancestry matrix. The CNN
classifier passes filters over the sequence of word embeddings (obtained from a pre-trained
ELMo model [19]) representing the natural language descriptions of the concepts, extracts
semantic signals, and outputs them into the representational space of the concept embeddings.
As the model trains, it learns to reduce the distance between the CNN output and the correct
concept’s position in representational space.</p>
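        <p>The neural-dictionary lookup can be illustrated with a pure-Python sketch. This is not the authors’ implementation: the 2-D embeddings, the toy ancestry matrix, and the input phrase vector are hand-picked stand-ins for quantities the real model learns, but the structure (ancestry-weighted concept embeddings, dot-product similarity, softmax over concepts) follows the description above:</p>

```python
import math

# Toy neural-dictionary lookup (hypothetical numbers; learned in the real model).
# Row c of the ancestry matrix marks concept c and all of its ancestors, so
# related concepts share information when embeddings are combined.
ancestry = [
    [1, 0, 0],  # root
    [1, 1, 0],  # child of root
    [1, 1, 1],  # grandchild
]
raw_embeddings = [[0.2, 0.1], [0.4, -0.3], [-0.1, 0.5]]

def matmul(a, b):
    """Plain-Python matrix multiplication."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Each concept's embedding aggregates its ancestors' raw embeddings.
concept_embeddings = matmul(ancestry, raw_embeddings)

def classify(phrase_vector):
    """Score a phrase vector against every concept and softmax the similarities."""
    scores = [sum(p * e for p, e in zip(phrase_vector, emb)) for emb in concept_embeddings]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = classify([0.3, 0.2])              # hypothetical CNN output for a phrase
best = max(range(len(probs)), key=probs.__getitem__)  # most similar concept
```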
        <p>To perform concept recognition, input sentences are split into all of the possible n-grams they
contain, where n ∈ {1, ..., 7}. Each n-gram is passed to the classifier, returning candidates with
the highest match confidence score above a given threshold. Heuristics then resolve overlap in
the text, favouring longer – likely more specific – terms.</p>
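        <p>The n-gram generation and longest-match heuristic might be sketched as follows. The matched spans are invented for illustration; the real system ranks candidates by the classifier’s confidence scores:</p>

```python
# Generate every n-gram span with n in {1, ..., 7}, then resolve overlapping
# candidate matches in favour of longer (likely more specific) spans.

def ngrams(tokens, max_n=7):
    """Yield (start, end) token spans for all n-grams with n in {1, ..., max_n}."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield (i, i + n)

def resolve_overlaps(matches):
    """Keep longer spans first, dropping any shorter span that overlaps one kept."""
    kept = []
    for span in sorted(matches, key=lambda s: s[1] - s[0], reverse=True):
        if all(span[1] <= k[0] or span[0] >= k[1] for k in kept):
            kept.append(span)
    return sorted(kept)

tokens = "short stature and abnormality of the eye".split()
candidate_spans = list(ngrams(tokens))
# Pretend the classifier accepted these three candidate spans:
resolved = resolve_overlaps([(0, 2), (0, 1), (3, 7)])
```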
      </sec>
      <sec id="sec-5-3">
        <title>4.2. Squeeze-and-Excite (SAE)</title>
        <p>The CNN model is augmented to include an SAE mechanism. Here, the methodology described
in Hu et al. [15] is adapted for a one-dimensional CNN. Average-pooling is applied to
the non-zero elements of each feature map produced by the convolutional layer. This reduces
the feature maps into a single vector, z ∈ ℛ^C, where C is the number of feature maps. Each
element of z is calculated like so:</p>
        <p>z_c = (1/N_c) ∑ u_c</p>
        <p>where z_c is the statistic for the c-th filter, u_c is the c-th feature map’s vector, and N_c is the
number of non-zero elements in that vector. The vector of all the feature map statistics is then
compressed and decompressed, as follows:</p>
        <p>s = σ(W_2 δ(W_1 z))</p>
        <p>where s is the vector of filter weights, W_1 ∈ ℛ^(C/r × C) the parameter weights of the compression
transformation, W_2 ∈ ℛ^(C × C/r) the parameter weights of the decompression transformation, σ and
δ are the sigmoid and ReLU activation functions respectively, and r is the compression ratio.
The max-pooled feature maps are then scaled element-wise by s. As the model is trained, the
SAE mechanism learns to model dependencies between feature maps.</p>
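        <p>A minimal pure-Python sketch of the 1-D squeeze-and-excite computation described above. The feature maps and weight matrices are hand-picked toy values; in the model, W_1 and W_2 are learned parameters:</p>

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def squeeze(feature_maps):
    """Average-pool the non-zero elements of each feature map into one statistic."""
    z = []
    for u in feature_maps:
        nonzero = [v for v in u if v != 0]
        z.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
    return z

def excite(z, w1, w2):
    """Compress with w1 (ReLU), decompress with w2 (sigmoid) to get filter weights."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]

feature_maps = [[0.5, 0.0, 1.5], [0.0, 2.0, 0.0]]  # C = 2 maps of length 3
w1 = [[0.5, 0.5]]                                  # compression ratio r = 2: C -> C/r
w2 = [[1.0], [-1.0]]                               # decompression: C/r -> C
s = excite(squeeze(feature_maps), w1, w2)
# Each (max-pooled) feature map is then scaled element-wise by its weight in s.
scaled = [[v * w for v in u] for u, w in zip(feature_maps, s)]
```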
      </sec>
      <sec id="sec-5-4">
        <title>4.3. Multi-Scale Self Attention (MSSA)</title>
        <sec id="sec-5-4-1">
          <title>Architecture</title>
          <p>Here, the CNN architecture is replaced entirely by an encoder based upon Guo’s
SASA transformer [16]. Given an input of word embeddings X ∈ ℛ^(N × D), where N represents
the number of embeddings and D their dimensionality, each scale-aware attention head can be
described as follows:</p>
          <p>head_j(X, ω)_i = softmax(Q_i C(K, ω)_i^⊤ / √(D/h)) C(V, ω)_i (1)</p>
          <p>where ω is the scale parameter, j corresponds to the j-th head, and i to the i-th element of the
sequence. K, Q, V are the projections of X into N × D/h subspaces, with h being the number of
heads. X is projected into these subspaces by multiplying it by the parameter matrices W^Q,
W^K, and W^V (all W ∈ ℛ^(D × D/h)):</p>
          <p>Q = XW^Q, K = XW^K, V = XW^V (2)</p>
          <p>C(x, ω)_i is the context-extraction function, which is defined as:</p>
          <p>C(X, ω)_i = [x_(i−ω), ..., x_(i+ω)] (3)</p>
          <p>When the scale parameter exceeds the range of the sequence, the context-extraction function
pads the sequence with zeros of the appropriate dimensions.</p>
          <p>The h heads are incorporated into a Multi-Scale Multi-headed Self-Attention (MSMSA) block.
This block consists of the scale-aware attention layer and a feed-forward network and is
computed as follows:</p>
          <p>MSMSA(X, Ω) = norm([head_1(X, ω_1), ..., head_h(X, ω_h)]W + X) (4)</p>
          <p>where Ω = {ω_1, ..., ω_h} is the set of scale parameters, W is a parameter matrix and norm is the
layer normalisation function. The output is then passed to a feed-forward (FF) layer. MSMSA
blocks can be stacked multiple times, with varying sets of scale parameters. The final output is
reduced into a single vector z by summing each output vector element-wise, normalised by the
square root of the sequence length:</p>
          <p>z = (1/√N) ∑_(i=1)^(N) X_i (5)</p>
          <p>z is then passed onto a final feed-forward layer that converts the sentence representation into
the representation space of the concept embeddings. This layer comprises two consecutive
linear transformations, each with GELU activations and l2 normalisation, followed by a final
linear transformation with no activation layer.</p>
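          <p>The effect of the scale parameter ω can be illustrated with a scalar toy version of one attention position. This is a deliberate simplification: real heads operate on projected query, key, and value vectors rather than the scalars used here, and out-of-range positions stand in for zero padding:</p>

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scale_aware_attention(q, k, v, i, omega):
    """Attend from position i only to positions i-omega..i+omega (zero-padded)."""
    window = range(i - omega, i + omega + 1)
    # Out-of-range positions behave like zero padding: score and value are 0.
    keys = [k[j] if 0 <= j < len(k) else 0.0 for j in window]
    values = [v[j] if 0 <= j < len(v) else 0.0 for j in window]
    weights = softmax([q[i] * kj for kj in keys])
    return sum(w * vj for w, vj in zip(weights, values))

q = k = v = [0.1, 0.9, 0.2, 0.7]
narrow = scale_aware_attention(q, k, v, i=1, omega=1)  # local context only
wide = scale_aware_attention(q, k, v, i=1, omega=3)    # effectively full sequence
```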
        </sec>
        <sec id="sec-5-4-2">
          <title>Dropout regime</title>
          <p>For each input batch, there was a 50% chance of the dropout function being
applied to the entire batch. If applied, entire embeddings were dropped from a sequence with
a probability of 20%. Sequences shorter than three tokens in length were excluded from this,
as there was a chance the input signal would be too degraded. Each attention head had a 25%
chance of being dropped out. A subsequent dropout was applied after the
activation function within the attention block’s feed-forward network, where random elements
were dropped from the matrix with a 10% probability. This dropout was also applied within
the final feed-forward layer. Inside the attention heads, a further dropout was applied to the
iteratively extracted sections of the Q, K, and V matrices. A final dropout was applied to the
attention scores prior to being scaled and the softmax function being applied.</p>
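          <p>The head-dropout step might be sketched as follows. This is an illustrative simplification (a 25% chance of zeroing a head’s output at training time); the cited methods apply scheduled or otherwise structured variants:</p>

```python
import random

def drop_heads(head_outputs, p_drop=0.25, rng=random):
    """Zero out entire attention-head outputs with probability p_drop (training only)."""
    dropped = []
    for head in head_outputs:
        if rng.random() < p_drop:
            dropped.append([0.0] * len(head))  # head removed for this batch
        else:
            dropped.append(list(head))
    return dropped

rng = random.Random(0)
heads = [[0.2, 0.4], [0.5, 0.1], [0.3, 0.3], [0.6, 0.2]]
survivors = drop_heads(heads, p_drop=0.25, rng=rng)
```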
        </sec>
        <sec id="sec-5-4-3">
          <title>Scale regimes</title>
          <p>The scale regimes for SASA were originally designed for much larger
sequences, whereas these models have a maximum input size of ten tokens. Figure 1 illustrates
the different combinations of self-attention head scaling parameters tested.</p>
        </sec>
      </sec>
      <sec id="sec-5-5">
        <title>4.4. Ontologies</title>
        <p>This work also explored combining multiple ontologies from various domains of knowledge,
rather than a single ontology. Table 1 displays an overview of the ontology combinations, the
number of unique concepts represented, and their total number of training examples. One
set contained the HPO alone, another both the HPO and the semantically similar Mammalian
Phenotype Ontology (MPO). The last set comprised three semantically distinct ontologies: the
HPO, the Cell Ontology (CLO), and the Ontology of Host-Pathogen Interactions (OHPI). Unique
concept IDs in common between ontologies were combined into single concepts.</p>
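        <p>The merging of shared concept IDs can be sketched like so. The ontology entries below are invented for illustration (including the shared ID); real OBO files would be parsed into comparable concept-to-names mappings:</p>

```python
# Combine multiple ontologies into one vocabulary: a concept ID appearing in
# more than one ontology becomes a single concept with pooled names/synonyms.

hpo = {"HP:0000478": ["Abnormality of the eye"], "HP:0001263": ["Developmental delay"]}
other = {"CL:0000540": ["neuron"], "HP:0000478": ["eye abnormality"]}

def merge_ontologies(*ontologies):
    merged = {}
    for onto in ontologies:
        for concept_id, names in onto.items():
            merged.setdefault(concept_id, [])
            for name in names:
                if name not in merged[concept_id]:
                    merged[concept_id].append(name)
    return merged

combined = merge_ontologies(hpo, other)
# "HP:0000478" now appears once, with training examples pooled from both sources.
```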
      </sec>
      <sec id="sec-5-6">
        <title>4.5. Training</title>
        <p>
          In total 81 models were trained (78 MSSA, and 3 NCR) in a Python 3.6.13 environment using
TensorFlow 2.2.0 [20]. An unresolvable compiler incompatibility prevented the use of FastText
and TensorFlow 2 together, as was used in Arbabi et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Instead, the word embeddings
were obtained from the pre-trained ELMo model (v3) using TensorFlow Hub. Nine models (all
SAE) were trained in a Python 3.9.12 environment with TensorFlow 2.9.1. A bug in the earlier
version of TensorFlow prevented the calculation of the means of only non-zero elements for
each feature map.
        </p>
        <p>All models were trained with a batch size of 256. After five epochs with no improvement in
the training loss, the NCR and SAE models would cease training and revert to the best performing
parameter weights. For the MSSA models, after five epochs of no improvement, the model
parameters reverted to the previous best parameter weights, and training resumed with 1/5th of
the previous learning rate. After the 5th learning rate change, when there was no improvement
for five epochs, training was stopped with the model reverting to the best scoring parameter
weights. Both the SAE and NCR models had an initial learning rate of 1/512. The MSSA models
had a warm-up period where the learning rate increased linearly from zero to 1/512 over</p>
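        <p>The MSSA plateau policy might be simulated as follows. The loss sequence is illustrative, and weight reversion is only noted in comments; the real training loop monitors the training loss each epoch:</p>

```python
# Sketch of the plateau policy: after five epochs with no improvement, revert
# to the best weights and divide the learning rate by 5; stop after the fifth
# such reduction fails to help.

def train_schedule(losses, initial_lr=1 / 512, patience=5, max_reductions=5):
    """Simulate the policy over a loss trace; return (final_lr, reductions made)."""
    lr, best, stale, reductions = initial_lr, float("inf"), 0, 0
    for loss in losses:
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience:
            reductions += 1
            if reductions > max_reductions:
                break        # stop training, keep the best-scoring weights
            lr /= 5          # revert to best weights, shrink the step size
            stale = 0
    return lr, reductions

final_lr, n_reductions = train_schedule([1.0] + [2.0] * 5)
```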
      </sec>
      <sec id="sec-5-7">
        <title>4.6. Evaluation</title>
        <p>Once trained, the models were calibrated and then assessed against an annotated gold-standard
corpus of 228 PubMed abstracts [21] (available here: https://github.com/lasigeBioTM/IHP), an
update of the benchmarking dataset created by Groza et al. [22]. To calibrate, the models
were used to annotate 40 randomly selected abstracts using confidence thresholds
t ∈ {0.05, 0.10, ..., 0.95}. The thresholds with the highest sum of both macro and micro F-scores for each
model were then used as the threshold for annotating the remaining 188 abstracts.</p>
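        <p>The calibration sweep can be sketched as below. The scoring function is a hypothetical stand-in for evaluating a model’s annotations of the 40 calibration abstracts at a given threshold:</p>

```python
# Sweep confidence thresholds over {0.05, 0.10, ..., 0.95} and keep the one
# that maximises the sum of macro and micro F-scores on the calibration set.

def pick_threshold(score_fn):
    """score_fn(t) -> (macro_f, micro_f); return the best-scoring threshold."""
    thresholds = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95
    return max(thresholds, key=lambda t: sum(score_fn(t)))

# Hypothetical evaluation whose scores peak in the middle of the range:
toy_scores = lambda t: (1 - abs(t - 0.45), 1 - abs(t - 0.45))
best_t = pick_threshold(toy_scores)
```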
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results</title>
      <p>Table 2 displays the performance metrics for the NCR and SAE models and Table 3 shows the
results of best performing MSSA models for each ontology combination. The best performing
score for each metric is highlighted in bold.</p>
      <p>
        When trained solely using the HPO, ELMo embeddings improved both F1 scores of the
NCR model by at least 4%, compared with the FastText NCR model reported in Arbabi et al.
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. However, the performance declined when additional ontologies were combined with the
HPO. The SAE model, trained with diverse domain ontologies, achieved a new SOTA, while
only slightly increasing the total number of model parameters.
      </p>
      <p>The MSSA model scores were competitive with the other models, with performance increasing
with the inclusion of additional domain ontologies in the training set. Both Tables 2 and 3 show
that the models trained using a combination of the HPO and MPO did not perform as well
as those with the HPO alone, or in combination with the CLO and OHPI. Figure 2 contains
density-contour plots of the concept embeddings for the best performing SAE models for each
ontology combination.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Discussion</title>
      <p>In addition to the SAE model setting a new SOTA for concept recognition, this work reinforces
the findings that scaled attention architectures can be competitive with CNNs in low-resource
settings.</p>
      <p>Training the SAE and MSSA models with diverse domain ontologies resulted in better
performance extracting HPO terms. The additional ontologies carry general English-language
semantics that may improve the inductive bias. Conversely, when the domains of the ontologies
have significant overlap, performance is reduced. In Figure 2 panel B, HPO and MPO terms
are less clearly separated compared to the more clearly demarcated ontologies in C. Although
HPO and MPO contain similar concepts with similar natural language names, their ancestry
structure is different, which separates these parallel concepts in representational space. To
discriminate between them, the models need to attend to more features. This is reflected in the
explained variance percentages in Figure 2. Further research regarding how the specific ontology
composition and features impact model performance is needed, and assessment may pose a
challenge. Currently, we are unaware of any other benchmark dataset annotated with terms
from another domain ontology. While the performance of each model will depend upon the
language, we expect the scaled attention architecture to be particularly sensitive to syntactical
variation. Specific efforts will need to be carried out by fluent native or multilingual experts.</p>
      <p>Qualitative analysis of our model predictions found that the heuristics preventing overlapping
matches led to a not-insignificant number of false negatives. Lobo et al. [21] found that 26% of
annotated concepts in the gold standard corpora overlap. As a result, Luo et al. [12] altered their
heuristics to allow overlap. Indeed, concept overlap is required for multi-domain annotation.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Thank you to the reviewers for their kind feedback. This work was funded as part of the
Norwich Research Park Biosciences Doctoral Training Partnership, grant number BB/M011216/1,
reference code 2243628.</p>
      <p>Text Using Ontology-Guided Machine Learning, JMIR Medical Informatics 7 (2019) e12596.
doi:10.2196/12596.
[12] L. Luo, S. Yan, P.-T. Lai, D. Veltri, A. Oler, S. Xirasagar, R. Ghosh, M. Similuk, P. N. Robinson,
Z. Lu, PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human
Phenotype Ontology, Bioinformatics 37 (2021) 1884–1890. doi:10.1093/bioinformatics/btab019.
[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser,
I. Polosukhin, Attention is all you need, in: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, R. Garnett (Eds.), Advances in Neural Information Processing
Systems, volume 30, Curran Associates, Inc., 2017. URL: https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf.
[14] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-End Object
Detection with Transformers, in: A. Vedaldi, H. Bischof, T. Brox, J.-M. Frahm (Eds.),
European Conference on Computer Vision, Springer International Publishing, Cham, 2020,
pp. 213–229. doi:10.1007/978-3-030-58452-8_13.
[15] J. Hu, L. Shen, G. Sun, Squeeze-and-Excitation Networks, in: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7132–7141.
[16] Q. Guo, X. Qiu, P. Liu, X. Xue, Z. Zhang, Multi-scale Self-Attention for Text Classification,
in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, 2020, pp.
7847–7854. doi:10.1609/aaai.v34i05.6290.
[17] W. Zhou, T. Ge, K. Xu, F. Wei, M. Zhou, Scheduled DropHead: A Regularization Method
for Transformer Models, arXiv preprint arXiv:2004.13342 (2020). doi:10.48550/arXiv.2004.13342.
[18] Z. Wu, L. Wu, Q. Meng, Y. Xia, S. Xie, T. Qin, X. Dai, T.-Y. Liu, UniDrop: A Simple
yet Effective Technique to Improve Transformer without Extra Cost, arXiv preprint
arXiv:2104.04946 (2021). doi:10.48550/arXiv.2104.04946.
[19] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep
contextualized word representations, arXiv preprint arXiv:1802.05365 (2018). doi:10.48550/arXiv.1802.05365.
[20] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis,
J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R.
Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray,
C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V.
Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu,
X. Zheng, TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed
Systems, 2016. URL: https://www.tensorflow.org/. doi:10.48550/arXiv.1603.04467, software
available from tensorflow.org.
[21] M. Lobo, A. Lamurias, F. M. Couto, Identifying Human Phenotype Terms by Combining
Machine Learning and Validation Rules, BioMed Research International 2017 (2017).
doi:10.1155/2017/8565739.
[22] T. Groza, S. Köhler, S. Doelken, N. Collier, A. Oellrich, D. Smedley, F. M. Couto, G. Baynam,
A. Zankl, P. N. Robinson, Automatic concept recognition using the Human Phenotype
Ontology reference and test suite corpora, Database 2015 (2015). doi:10.1093/database/bav005.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Fortunato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Bergstrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Börner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Helbing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Milojević</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Radicchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sinatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Uzzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vespignani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Waltman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-L.</given-names>
            <surname>Barabási</surname>
          </string-name>
          ,
          <source>Science of science, Science</source>
          <volume>359</volume>
          (
          <year>2018</year>
          )
          <article-title>eaao0185</article-title>
          . URL: https://www.science.org/doi/abs/10.1126/science.aao0185. doi:10.1126/science.aao0185.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dahlhaus</surname>
          </string-name>
          ,
          <article-title>The Role of FAIR Data towards Sustainable Agricultural Performance: A Systematic Literature Review</article-title>
          ,
          <source>Agriculture</source>
          <volume>12</volume>
          (
          <year>2022</year>
          )
          <article-title>309. doi:10.3390/agriculture12020309</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L.</given-names>
            <surname>Walls</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Elser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gandolfo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Stevenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Preece</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Athreya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Mungall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rensing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Reski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. Z.</given-names>
            <surname>Berardini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Huala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schaefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Menda</surname>
          </string-name>
          , E. Arnaud,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shrestha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamazaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          ,
          <article-title>The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses</article-title>
          ,
          <source>Plant and Cell Physiology</source>
          <volume>54</volume>
          (
          <year>2013</year>
          )
          <fpage>e1</fpage>
          -
          <lpage>e1</lpage>
          .
          doi:10.1093/pcp/pcs163.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jupp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Burdett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. E.</given-names>
            <surname>Parkinson</surname>
          </string-name>
          ,
          <article-title>A new Ontology Lookup Service at EMBL-EBI</article-title>
          ,
          <source>in: SWAT4LS</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>118</fpage>
          -
          <lpage>119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Eine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jurisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Quint</surname>
          </string-name>
          ,
          <article-title>Ontology-Based Big Data Management</article-title>
          ,
          <source>Systems</source>
          <volume>5</volume>
          (
          <year>2017</year>
          ). URL: https://www.mdpi.com/2079-8954/5/3/45. doi:10.3390/systems5030045.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
          doi:10.1038/sdata.2016.18.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tseytlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Legowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corrigan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Jacobson</surname>
          </string-name>
          ,
          <article-title>NOBLE - Flexible concept recognition for large-scale biomedical natural language processing</article-title>
          ,
          <source>BMC Bioinformatics</source>
          <volume>17</volume>
          (
          <year>2016</year>
          ).
          doi:10.1186/s12859-015-0871-y.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>C.</given-names>
            <surname>Jonquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Youn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callendar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Musen</surname>
          </string-name>
          ,
          <article-title>NCBO Annotator: Semantic Annotation of Biomedical Data</article-title>
          , in: International Semantic Web Conference, Poster and Demo session, volume
          <volume>110</volume>
          ,
          Washington DC, USA,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Batbaatar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. H.</given-names>
            <surname>Ryu</surname>
          </string-name>
          ,
          <article-title>Ontology-Based Healthcare Named Entity Recognition from Twitter Messages Using a Recurrent Neural Network Approach</article-title>
          ,
          <source>International Journal of Environmental Research and Public Health</source>
          <volume>16</volume>
          (
          <year>2019</year>
          )
          <fpage>3628</fpage>
          .
          doi:10.3390/ijerph16193628.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chowdhury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>Transfer bi-directional LSTM RNN for named entity recognition in Chinese electronic medical records</article-title>
          ,
          <source>in: 2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
          doi:10.1109/HealthCom.2017.8210840.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Arbabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fidler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brudno</surname>
          </string-name>
          , et al.,
          <article-title>Identifying Clinical Terms in Medical</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>