<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>with deep learning architectures to improve prediction of ontology concepts from literature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pratik Devkota</string-name>
          <email>p_devkota@uncg.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Somya D. Mohanty</string-name>
          <email>mohanty.somya@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Prashanti Manda</string-name>
          <email>p_manda@uncg.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Informatics and Analytics, University of North Carolina at Greensboro</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Natural language processing methods powered by deep learning have been well-studied in recent years for the task of automated ontology-based annotation of scientific literature. Many of these approaches focus solely on learning associations between text and ontology concepts and use that knowledge to annotate new text. However, a great deal of information is embedded in the ontology structure and semantics. Here, we present deep learning architectures that learn not only associations between text and ontology concepts but also the structure of the ontology. Our experiments show that creating architectures capable of learning the structure of the ontology results in enhanced annotation performance.</p>
      </abstract>
      <kwd-group>
        <kwd>natural language processing</kwd>
        <kwd>gene ontology</kwd>
        <kwd>deep learning</kwd>
        <kwd>ontology annotation</kwd>
        <kwd>ontology embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Biological ontologies are widely used for representing biological knowledge across a wide range
of sub-domains ranging from gene function to clinical diagnoses to evolutionary phenotypes
[
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. While the ontologies provide the necessary structure and concepts, the real benefits
of the ontologies can be reaped only when knowledge in scientific literature is represented
using these ontologies through annotation. The scale and pace of scientific publishing demands
sophisticated, fast, and most importantly, automated ways of processing scientific literature to
annotate relevant pieces of text with ontology concepts [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Natural Language Processing (NLP) techniques beginning with lexical analysis, standard
machine learning approaches, and of late, powered by deep learning models have made big
strides in this area [
        <xref ref-type="bibr" rid="ref10 ref5 ref6 ref7 ref8 ref9">5, 6, 7, 8, 9, 10</xref>
        ]. Most NLP approaches for automated ontology annotation
treat the task as that of named entity recognition where relevant entities are identified and
associated with snippets of text. However, ontology-based annotation is different from named
entity recognition in that there is a great amount of information embedded in the structure and
semantics of an ontology, whereas generic entities can be independent objects.
      </p>
      <p>Knowledge of the
ontological structure and relationships is a crucial part of biological annotation when performed
by a human curator. It is therefore imperative to develop NLP models that are cognizant of
the ontological hierarchy and can effectively incorporate it into the prediction mechanism for
improved ontology concept recognition.</p>
      <p>
        The automated annotation models previously developed by this team [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref8 ref9">11, 8, 12, 10, 9</xref>
        ] have
shown good accuracy in recognizing ontology concepts from text. In these studies our focus
was to teach the models to learn associations between text and ontology concepts found in the
gold standard corpus and use that knowledge to create new annotations. In a few studies, we
experimented with different techniques of using the ontology structure as one of the inputs in a
bid to improve annotation performance [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">10, 8, 9</xref>
        ]. In some cases, these systems are able to predict
the same ontology concept as the ground truth in the gold standard data achieving perfect
accuracy. Incorporating ontology structure was a bid to improve partial accuracy in cases
where the model does not achieve a perfect match to the actual annotation. Our hypothesis
was that having knowledge of the ontology structure would enable the model to choose a
closely related/semantically similar concept to the actual annotation thereby improving overall
annotation performance as evaluated by semantic similarity.
      </p>
      <p>Our goal in this study is to develop deep learning architectures that learn not only patterns in
text but also the ontology structure. Our hypothesis is that the process of learning the ontology
structure would in turn improve prediction of annotations. Deep learning models learn patterns
in text and annotations from a gold standard corpus and similarly, we need to provide a gold
standard representation of the ontology structure so the models can learn to predict the ontology
structure.</p>
      <p>
        In this study, we use graph embeddings for representing the ontology structure. These graph
embeddings are used as a reference and reinforcement tool for the model as it learns to predict
the ontology structure. Semantic embedding of large knowledge graphs has been long used
successfully for predictive tasks including natural language processing [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In recent years,
these semantic embeddings have been extended to OWL ontologies resulting in approaches
that can create embeddings for ontology concepts that effectively represent the structure and
semantics of the ontology [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]. These embedding algorithms translate ontologies represented
as directed acyclic graphs into a vector space where the structure and the inherent semantics of
the graph are preserved [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        There are several approaches for learning ontology embeddings [
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ] each with different
strengths. The approaches differ based on whether the ontology is directed, weighted, if it
dynamically changes over time, and the approach for learning the network [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In this study,
we selected Node2Vec [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for learning ontology embeddings from the Gene Ontology since it
is widely used in literature for this task [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for
training and testing the performance of our architectures [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. CRAFT is a widely used training
resource for automated annotation approaches. The current version of the CRAFT corpus (v4.0.1)
provides annotations for 97 biological/biomedical articles with concepts from 7 ontologies
including the GO.
      </p>
      <p>We hypothesize that the added information gained from ontology embeddings can improve
model performance in recognizing ontology concepts from scientific literature. We present
two deep learning architectures and explore how the different architectures, combined with the
inclusion of ontology embeddings, impact annotation performance.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The rise of deep learning in the areas of image and speech recognition has translated into
text-based problems as well. Preliminary research has shown that deep learning methods
result in greater accuracy for text-based tasks including identifying ontology concepts in
text [
        <xref ref-type="bibr" rid="ref19 ref20 ref21 ref5 ref8">5, 19, 20, 21, 8</xref>
        ]. These methods use vector representations that enable them to capture
dependencies and relationships between words using enriched representations of character and
word embeddings from training data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Our initial foray into this area involved a feasibility study of using deep learning for the task
of recognizing ontology concepts [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In a comparison of Gated Recurrent Units (GRUs), Long
Short Term Memory (LSTM), Recurrent Neural Networks (RNNs), and Multi Layer Perceptrons
(MLPs) along with a new deep learning model/architecture based on combining multiple GRUs,
we found GRUs to outperform the rest. These findings indicated that deep learning algorithms
are a promising avenue to be explored for automated ontology-based curation of data.
      </p>
      <p>
        In 2020, we presented new architectures based on GRUs and LSTMs combined with different
input encoding formats for automated annotation of ontology concepts [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We also created
multi-level deep learning models designed to incorporate ontology hierarchy into the
prediction. Surprisingly, inclusion of ontology semantics via subsumption reasoning yielded only modest
performance improvement [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This result indicated that more sophisticated approaches to take
advantage of the ontology hierarchy are needed.
      </p>
      <p>
        Continuing this work, a 2022 study [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] presented state of the art deep learning architectures
based on GRUs for annotating text with ontology concepts. We augmented the models with
additional information sources including NCBI’s BioThesaurus and Unified Medical Language
System (UMLS) to augment information from CRAFT for increasing prediction accuracy. We
demonstrated that augmenting the model with additional input pipelines can substantially
enhance prediction performance.
      </p>
      <p>
        Our next work explored a different approach to providing the ontology as input to the deep
learning model [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Subsequently, we presented an intelligent annotation system [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] that uses
the ontology hierarchy for training and predicting ontology concepts for pieces of text. Here,
we used a vector of semantic similarity scores to the ground truth and all ancestors in the
ontology to train the model. This representation allowed the model to identify the target GO
term followed by “similar” GO terms that are partially accurate predictions. We showed that
our ontology aware models can result in a 2% - 10% improvement over a baseline model that
doesn’t use ontology hierarchies.
      </p>
      <p>
        Our most recent contribution presented a method called Ontology Boosting [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. A key
component of this approach is to combine the prediction of the deep learning architectures
with the graph structure of ontological concepts. Boosting amplifies the predicted probabilities
of certain concept predictions by combining them with the model predictions of the candidate’s
ancestors/subsumers. Results showed that the boosting step can result in a substantial bump in
prediction accuracy.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Generating ontology embeddings</title>
        <p>We used the Node2Vec approach for generating ontology embeddings from the Gene Ontology.</p>
        <p>
          The Node2Vec algorithm consists of two steps:
1. Conduct random walks over the graph or ontology to generate sentences, each of which is a list of
ontology concepts. Once all random walks are conducted, the set of all sentences forms
the corpus, which is a representation of the ontology.
2. The Word2Vec [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] algorithm is applied to the corpus to learn and generate
embeddings for each concept in the ontology. These embeddings are low-dimensional vector
representations of ontology concepts.
        </p>
        <p>These embeddings or feature vectors can be used in downstream tasks such as node classification or
natural language processing.</p>
        <p>We set the weight of all edges to 1 for weighted random walks, indicating that all edges are
weighted equally. The length of each random walk was set to 5 and the number of walks to 100.
Dimensionality of embeddings was set to 128, batch size to 50, and the model was
trained for 2 epochs to learn the embeddings.</p>
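        <p>Step 1 of the algorithm (random walks with uniform edge weights, walk length 5, 100 walks) can be sketched in plain Python. This is an illustrative sketch rather than our implementation; the graph, seed, and GO IDs below are toy stand-ins, and in practice the resulting corpus would be passed to a Word2Vec implementation (e.g., gensim) to learn the 128-dimensional embeddings.</p>

```python
import random

def random_walks(graph, walk_length=5, num_walks=100, seed=7):
    """Generate Node2Vec-style 'sentences' by random walks over a graph.

    graph: dict mapping each concept ID to a list of neighbour IDs.
    With every edge weight set to 1, a weighted walk reduces to a
    uniform choice among neighbours.
    """
    rng = random.Random(seed)
    corpus = []
    for _ in range(num_walks):
        for start in graph:
            walk = [start]
            while len(walk) != walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # node with no outgoing edges (e.g., the root)
                walk.append(rng.choice(neighbours))
            corpus.append(walk)
    return corpus

# Toy fragment of the GO cellular_component hierarchy (child-to-parent is_a edges).
toy_go = {
    "GO:0031090": ["GO:0016020"],  # organelle membrane is_a membrane
    "GO:0016020": ["GO:0005575"],  # membrane is_a cellular_component
    "GO:0005575": [],              # cellular_component (root)
}
corpus = random_walks(toy_go, walk_length=5, num_walks=100)
# Each 'sentence' is a list of ontology concept IDs; a walk starting at
# GO:0031090 can only climb towards the root and then stops.
```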
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Deep Learning Architectures</title>
        <p>Here, we present and test three sets of architectures:
1. Baseline
• Tag only (Tag)
• Ontology Embedding only (OE)
2. Cross-connected:
• Tag to Ontology Embedding (Tag -&gt; OE)
• Ontology Embedding to Tag (OE -&gt; Tag)
3. Multi-connected:
• Embedding to Tag to Embedding (OE -&gt; Tag -&gt; OE)</p>
        <p>
3.2.1. Baseline Architectures
We created two baseline architectures (Figure 1): Tag only (Tag) and Ontology Embedding only
(OE). The Tag architecture predicts tags/ontology IDs while the OE architecture predicts
ontology embeddings. The Tag architecture has previously been presented in our prior work
[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This architecture has been adjusted to create the OE architecture that predicts ontology
embeddings. Both baseline architectures consist of input pipelines, embedding/latent
representations, and a deep learning model, and produce either a probability vector of ontology IDs (Tag)
or an ontology embedding (OE) as the output.
        </p>
        <p>The baseline architectures use three inputs. Each word in a sentence from the CRAFT corpus
is represented by three inputs: 1) token (tok), 2) character sequence (char), and 3)
Parts-Of-Speech (pos). The token (tok) input is a sequential tensor consisting of tokens, each
represented with a high-dimensional one-hot encoded vector. The character sequence (char) is
also a sequential tensor consisting of the character sequences present in a word/token. POS tags
(pos) indicate the type of words in a sentence.</p>
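        <p>As a concrete, hypothetical illustration of these three inputs, each word can be mapped to index sequences that an embedding layer would then turn into one-hot or dense vectors. The vocabularies, POS tags, and function names below are toy stand-ins, not those used with CRAFT:</p>

```python
def encode_sentence(tagged_sentence, vocab, char_vocab, pos_vocab):
    """Build the three per-word inputs: token index, character index
    sequence, and POS index. A downstream embedding layer would map
    these indices to vector representations."""
    tok, char, pos = [], [], []
    for word, tag in tagged_sentence:
        tok.append(vocab.get(word.lower(), vocab["UNK"]))
        char.append([char_vocab.get(c, char_vocab["UNK"]) for c in word])
        pos.append(pos_vocab.get(tag, pos_vocab["UNK"]))
    return tok, char, pos

# Toy vocabularies (the real token vocabulary has 34,166 entries).
vocab = {"UNK": 0, "vesicle": 1, "formation": 2, "in": 3}
char_vocab = {"UNK": 0}
for i, c in enumerate(sorted(set("vesicleformationin"))):
    char_vocab[c] = i + 1
pos_vocab = {"UNK": 0, "NN": 1, "IN": 2}

sentence = [("vesicle", "NN"), ("formation", "NN"), ("in", "IN")]
tok, char, pos = encode_sentence(sentence, vocab, char_vocab, pos_vocab)
# tok -> [1, 2, 3]; pos -> [1, 1, 2]; char[0] has one index per character of "vesicle"
```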
        <p>Embeddings are used to provide a compressed latent space representation for very high
dimensional input components. For example, the one hot vectorization of an individual word
has a dimensionality of 34,166 (vocabulary size). In order to represent them succinctly and
with contextual representation, we use supervised embeddings created from the CRAFT corpus.
Note that these embeddings are different from the ontology embeddings discussed above. These
embeddings provide low dimensional representations of words in the training corpus and do
not use the ontology in any way.</p>
        <p>
          Both baseline architectures use a bi-directional gated recurrent model (Bi-GRU). The choice of
Bi-GRU for the architectures was informed by several of our prior works where this model has
consistently outperformed other models such as CNNs, RNNs, and LSTMs [
          <xref ref-type="bibr" rid="ref11 ref8">8, 11</xref>
          ]. Architecture
hyper-parameters were evaluated using a grid search approach. We used Adam [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] as our
optimiser for all of the experiments with a default learning rate of 0.001.
        </p>
        <p>The two baseline architectures differ primarily in what they produce as output and
what they use during the propagation stages. In Tag, the output is a tag/ontology ID where each
word in the input data is mapped to either a GO annotation or a non-annotation. Tag takes
the hidden/learned representations of the input from the preceding layers of the network and
applies softmax activation to produce a probability distribution over all possible ontology IDs.
The predicted vector output values and ground truth values are compared to compute sparse
categorical cross entropy as loss, followed by backpropagation, which involves computing the
gradients of the loss with respect to the model’s weights. The ontology ID with the highest
probability is regarded as the prediction.</p>
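        <p>The softmax output and sparse categorical cross-entropy loss described above can be sketched in plain Python (framework code would use library implementations; the logits here are made up):</p>

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution over ontology IDs."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_cross_entropy(probs, true_index):
    """The loss depends only on the probability assigned to the ground-truth
    ID, so the ground truth is an integer index rather than a one-hot vector."""
    return -math.log(probs[true_index])

logits = [2.0, 0.5, -1.0]  # made-up scores over three ontology IDs
probs = softmax(logits)
loss = sparse_categorical_cross_entropy(probs, true_index=0)
predicted_id = probs.index(max(probs))  # highest-probability ID is the prediction
```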
        <p>In contrast, OE uses ground truth ontology embeddings generated using Node2Vec during
backpropagation and for computing the loss function. The intuition is that providing ontology
embeddings to the architecture during the propagation stages will enable it to gain an
understanding of the ontology structure and eventually make more accurate and intelligent
predictions. The output of OE is an ontology embedding. The predicted ontology embedding
is compared to all ground truth ontology embeddings using cosine similarity. The
ground truth ontology embedding that is most similar to the predicted embedding is identified
and the ontology ID associated with it is treated as the architecture’s prediction. Accuracy
metrics are then computed by comparing the predicted ontology ID to that in the CRAFT corpus.</p>
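        <p>The nearest-embedding lookup that turns a predicted embedding into an ontology ID can be sketched as follows; the 3-dimensional toy vectors stand in for the 128-dimensional Node2Vec embeddings:</p>

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_concept(predicted, concept_embeddings):
    """Return the ontology ID whose ground-truth embedding is most
    cosine-similar to the predicted embedding."""
    return max(concept_embeddings,
               key=lambda cid: cosine_similarity(predicted, concept_embeddings[cid]))

# Toy 3-d stand-ins for the 128-d Node2Vec ground-truth embeddings.
ground_truth = {
    "GO:0005575": [1.0, 0.0, 0.0],
    "GO:0016020": [0.7, 0.7, 0.0],
    "GO:0031090": [0.0, 0.0, 1.0],
}
predicted = [0.6, 0.75, 0.1]  # hypothetical model output
prediction = nearest_concept(predicted, ground_truth)  # -> "GO:0016020"
```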
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Cross connected architectures</title>
        <p>We developed two cross-connected architectures: 1) Tag to Ontology Embedding (Tag -&gt; OE)
and 2) Ontology Embedding to Tag (OE -&gt; Tag). Here we test whether connecting the tag and ontology
embedding architectures, so that one informs the prediction of the other, results in
improved accuracy, and whether the direction of the connection matters. The Tag -&gt; OE architecture
(Figure 2) has two different outputs, tags/ontology IDs and ontology embeddings. The tag output
(Tag in Figure 2) is concatenated with the output of the main Bi-GRU layer to give a higher
dimensional vector output. The concatenation is then passed through dense layers to further
learn the hierarchical representations of the ontology before generating an ontology embedding
for each input token. This predicted ontology embedding is compared with the ground truth
ontology embeddings learned using Node2Vec. Using cosine similarity as the loss function, the loss
is calculated and the gradients are backpropagated to adjust the model’s weights for convergence.</p>
        <p>In OE -&gt; Tag, the ontology embedding output (OE) is concatenated with the output of the
main Bi-GRU layer to give a higher dimensional vector output. The concatenation is then
passed through dense layers before generating a tag for each input token. This predicted tag
is compared with the ground truth tag in CRAFT. The loss is calculated and the gradients are
backpropagated to adjust the model’s weights for convergence. The OE -&gt; Tag architecture can
be depicted by switching the Tag and OE blocks as well as the two outputs in Figure 2.</p>
        <p>Figure 3 presents an explanation of the Tag -&gt; OE cross-connected architecture on three
example tokens. Cross-connected architectures differ from the baseline architectures by producing
both tags and ontology embeddings instead of one or the other. Here, we show how the training/
inference is done on a sequence of tokens “vesicle”, “formation”, and “in” (which are parts of a
sentence in the CRAFT corpus) as it is evaluated by the network. Each token is preprocessed to
obtain the representative tensors (token, character sequence, and POS) which are passed through embedding
layers learned from CRAFT. The embedding of the character sequence is also passed via a Bi-GRU layer. All of the
resulting values are concatenated and processed via the main Bi-GRU layer. The output from
the ‘Tag Dense Layer’ is concatenated with the output of the main ‘Bi-GRU layer’ and passed as input
to the ‘Ontology Embedding Dense Layer’ where the model generates ontology embeddings for
each of the input tokens.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Multi-connected Architecture</title>
        <p>The final architecture (OE -&gt; Tag -&gt; OE) explores whether ontology embeddings can be improved
iteratively by connecting a preliminary ontology embedding output to the tag output, enabling
improvements to the tag prediction. The predicted tag block is then connected back to the ontology
embedding block to encourage further learning.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Performance Evaluation Metrics</title>
        <p>
          We evaluate our architectures using a modified F1 score and semantic similarity [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Metrics
such as F1 are designed for traditional information retrieval systems that either retrieve a piece
of information or fail to do so (a binary evaluation). However, this is not a true indication of
the performance of ontology-based retrieval or prediction systems where the notion of partial
accuracy applies. A model might not predict the exact concept as a gold standard but might
predict the parent or an ancestor of the ground truth as indicated by the ontology. Semantic
similarity metrics [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] designed to measure different degrees of similarity between ontology
concepts can be leveraged to measure the similarity between the predicted concept and the
actual annotation to quantify the partial prediction accuracy. Here, we use Jaccard similarity
[
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] that measures the ontological distance between two concepts to assess partial similarity.
        </p>
        <p>Since the majority of tags in the training corpus are non-annotations, the model predicts
them with great accuracy. In order to avoid biasing the F1 score, we omit accurate predictions of
non-annotations and focus instead on annotations only, reporting a relatively conservative modified
F1 score.</p>
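        <p>One plausible reading of this modified F1, sketched in Python under the assumption that non-annotations carry a designated tag (the label "O" below is a stand-in): correct non-annotation predictions are simply excluded from the counts.</p>

```python
def modified_f1(predicted, actual, non_annotation="O"):
    """F1 over annotation tags only: correct predictions of the
    non-annotation tag cannot inflate the score."""
    tp = sum(1 for p, a in zip(predicted, actual)
             if p == a and a != non_annotation)
    pred_pos = sum(1 for p in predicted if p != non_annotation)
    actual_pos = sum(1 for a in actual if a != non_annotation)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

pred   = ["O", "GO:0016020", "O", "GO:0005575", "GO:0031090"]
actual = ["O", "GO:0016020", "O", "O",          "GO:0005575"]
score = modified_f1(pred, actual)  # precision 1/3, recall 1/2 -> F1 = 0.4
```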
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>The CRAFT v4.0.1 dataset contains 18689 annotations pertaining to 974 concepts from the three
GO sub-ontologies across 97 articles. Table 1 provides further information on the coverage of
GO terms in CRAFT.</p>
      <p>The baseline tag-only architecture (Tag) resulted in a 0.80 F1 and a 0.83 semantic similarity
score. The baseline ontology-embedding-only architecture (OE) resulted in a 0.65 F1 and a
0.74 semantic similarity.</p>
      <p>Among the two cross-connected architectures, we found that the Tag to Ontology Embedding
architecture (Tag -&gt; OE) substantially outperformed the OE -&gt; Tag architecture according to F1
and was able to achieve similar performance as measured by semantic similarity. This indicates
that Tag -&gt; OE is better at generating exactly matching predictions, resulting in high F1 and
semantic similarity. In contrast, OE -&gt; Tag performs better at generating semantically similar
matches rather than exact matches, leading to lower F1 than semantic similarity scores.</p>
      <p>The Tag -&gt; OE architecture was able to improve upon OE’s prediction of ontology embeddings
by 23% (F1) and 9.4% (semantic similarity). We observed relatively modest improvements to
Tag’s tag prediction with 3.8% (F1) and 1.2% (semantic similarity).</p>
      <p>Connecting the ontology embedding output to the tag output (OE -&gt; Tag) either did not improve
on the embedding prediction (F1) or resulted in a slight improvement (semantic similarity).
OE -&gt; Tag did produce improvements for tag prediction over the Tag model by 3.7% (F1) and
1.2% (semantic similarity). The multi-connected architecture did poorly in comparison to the
cross-connected architectures.</p>
      <p>Overall, the results suggest that architectures that use only ontology embeddings, without
learning associations between text and annotations, perform poorly. The other takeaway is that
connecting tag predictions to the ontology embedding block (Tag -&gt; OE) and letting the embedding
prediction learn from the predicted tag iteratively results in more robust architectures. The
Tag -&gt; OE cross-connected architecture results in improved performance in predicting both tags
and ontology embeddings across both metrics.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work is funded by a CAREER grant to Manda from the Division of Biological Infrastructure
at the National Science Foundation of the United States of America (#1942727).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. R.</given-names>
            <surname>Dalmer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Clugston</surname>
          </string-name>
          ,
          <article-title>Gene ontology enrichment analysis of congenital diaphragmatic hernia-associated genes</article-title>
          ,
          <source>Pediatric research 85</source>
          (
          <year>2019</year>
          )
          <fpage>13</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          , N. de Keizer,
          <string-name>
            <given-names>F.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cornet</surname>
          </string-name>
          ,
          <article-title>Literature review of snomed ct use</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>21</volume>
          (
          <year>2014</year>
          )
          <fpage>e11</fpage>
          -
          <lpage>e19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Edmunds</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Balhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. F.</given-names>
            <surname>Eames</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Dahdul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lapp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Vision</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Dunham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Mabee</surname>
          </string-name>
          , et al.,
          <article-title>Phenoscape: identifying candidate genes for evolutionary phenotypes</article-title>
          ,
          <source>Molecular biology and evolution 33</source>
          (
          <year>2015</year>
          )
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W.</given-names>
            <surname>Dahdul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Dececchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ibrahim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lapp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mabee</surname>
          </string-name>
          ,
          <article-title>Moving the mountain: analysis of the effort required to transform comparative anatomy into computable anatomy</article-title>
          ,
          <year>Database 2015</year>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lample</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Subramanian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kawakami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dyer</surname>
          </string-name>
          ,
          <article-title>Neural architectures for named entity recognition</article-title>
          ,
          <source>arXiv preprint arXiv:1603.01360</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Boguslav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Hailu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <article-title>Concept recognition as a machine translation problem</article-title>
          ,
          <source>BMC bioinformatics 22</source>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casteleiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Demetriou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Read</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J. F.</given-names>
            <surname>Prieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Maroto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature</article-title>
          ,
          <source>Journal of biomedical semantics 9</source>
          (
          <year>2018</year>
          )
          <fpage>13</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>SayedAhmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Mohanty</surname>
          </string-name>
          ,
          <article-title>Automated ontology-based annotation of scientific literature using deep learning</article-title>
          ,
          <source>in: Proceedings of The International Workshop on Semantic Big Data, SBD '20</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          . URL: https://doi.org/10.1145/3391274.3393636. doi:10.1145/3391274.3393636.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Devkota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Mohanty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manda</surname>
          </string-name>
          ,
          <article-title>Ontology-powered boosting for improved recognition of ontology concepts from biological literature</article-title>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Devkota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohanty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manda</surname>
          </string-name>
          ,
          <article-title>Knowledge of the ancestors: Intelligent ontology-aware annotation of biological literature using semantic similarity</article-title>
          ,
          <source>Proceedings of the International Conference on Biomedical Ontology</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Manda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beasley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohanty</surname>
          </string-name>
          ,
          <article-title>Taking a dive: Experiments in deep learning for automatic ontology-based annotation of scientific literature</article-title>
          ,
          <source>Proceedings of the International Conference on Biomedical Ontology</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Devkota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Mohanty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Manda</surname>
          </string-name>
          ,
          <article-title>A gated recurrent unit based architecture for recognizing ontology concepts from biological literature</article-title>
          ,
          <source>BioData Mining</source>
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. M.</given-names>
            <surname>Holter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Antonyrajah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>Owl2vec*: Embedding of owl ontologies</article-title>
          ,
          <source>Machine Learning</source>
          <volume>110</volume>
          (
          <year>2021</year>
          )
          <fpage>1813</fpage>
          -
          <lpage>1845</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Grover</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>node2vec: Scalable feature learning for networks</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Asymmetric transitivity preserving graph embedding</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1105</fpage>
          -
          <lpage>1114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. W.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. C.-C.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of graph embedding: Problems, techniques, and applications</article-title>
          ,
          <source>IEEE transactions on knowledge and data engineering 30</source>
          (
          <year>2018</year>
          )
          <fpage>1616</fpage>
          -
          <lpage>1637</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Makarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiselev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nikitinsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Subelj</surname>
          </string-name>
          ,
          <article-title>Survey on graph embeddings and their applications to machine learning problems on graphs</article-title>
          ,
          <source>PeerJ Computer Science</source>
          <volume>7</volume>
          (
          <year>2021</year>
          )
          <fpage>e357</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eckert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Shipley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sitnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. A.</given-names>
            <surname>Baumgartner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Verspoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Blake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Hunter</surname>
          </string-name>
          ,
          <article-title>Concept annotation in the craft corpus</article-title>
          ,
          <source>BMC Bioinformatics 13</source>
          (
          <year>2012</year>
          )
          <fpage>161</fpage>
          . URL: https://doi.org/10.1186/1471-2105-13-161. doi:10.1186/1471-2105-13-161.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Habibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Wiegandt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Leser</surname>
          </string-name>
          ,
          <article-title>Deep learning with word embeddings improves biomedical named entity recognition</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>33</volume>
          (
          <year>2017</year>
          )
          <fpage>i37</fpage>
          -
          <lpage>i48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lyu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>Long short-term memory rnn for biomedical named entity recognition</article-title>
          ,
          <source>BMC bioinformatics 18</source>
          (
          <year>2017</year>
          )
          <fpage>462</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zitnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Langlotz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <article-title>Crosstype biomedical named entity recognition with deep multi-task learning</article-title>
          ,
          <source>arXiv preprint arXiv:1801.09851</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Church</surname>
          </string-name>
          ,
          <article-title>Word2vec</article-title>
          ,
          <source>Natural Language Engineering</source>
          <volume>23</volume>
          (
          <year>2017</year>
          )
          <fpage>155</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <year>2017</year>
          . arXiv:1412.6980.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. O.</given-names>
            <surname>Falcao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>Semantic similarity in biomedical ontologies</article-title>
          ,
          <source>PLoS computational biology 5</source>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>