                                Using ontology embeddings with deep learning
                                architectures to improve prediction of ontology
                                concepts from literature
                                Pratik Devkota1 , Somya D. Mohanty2 and Prashanti Manda1
1 Informatics and Analytics, University of North Carolina at Greensboro
2 United Healthcare


                                                                         Abstract
Natural language processing methods powered by deep learning have been well studied in recent years for the task of automated ontology-based annotation of scientific literature. Many of these approaches focus solely on learning associations between text and ontology concepts and use that knowledge to annotate new text. However, a great deal of information is embedded in the structure and semantics of an ontology. Here, we present deep learning architectures that learn not only associations between text and ontology concepts but also the structure of the ontology. Our experiments show that architectures capable of learning the structure of the ontology yield enhanced annotation performance.

                                                                         Keywords
                                                                         natural language processing, gene ontology, deep learning, ontology annotation, ontology embeddings




                                1. Introduction
Biological ontologies are widely used for representing biological knowledge across sub-domains ranging from gene function to clinical diagnoses to evolutionary phenotypes [1, 2, 3]. While the ontologies provide the necessary structure and concepts, their real benefits can be reaped only when knowledge in scientific literature is represented using these ontologies through annotation. The scale and pace of scientific publishing demand
                                sophisticated, fast, and most importantly, automated ways of processing scientific literature to
                                annotate relevant pieces of text with ontology concepts [4].
                                   Natural Language Processing (NLP) techniques beginning with lexical analysis, standard
                                machine learning approaches, and of late, powered by deep learning models have made big
                                strides in this area [5, 6, 7, 8, 9, 10]. Most NLP approaches for automated ontology annotation
                                treat the task as that of named entity recognition where relevant entities are identified and
                                associated with snippets of text. However, ontology based annotation is different from named
                                entity recognition in that there is a great amount of information embedded in the structure and

                                Proceedings of the International Conference on Biomedical Ontologies 2023, August 28th-September 1st, 2023, Brasilia,
                                Brazil
p_devkota@uncg.edu (P. Devkota); mohanty.somya@gmail.com (S. D. Mohanty); p_manda@uncg.edu (P. Manda)
ORCID: 0000-0001-5161-0798 (P. Devkota); 0000-0002-4253-5201 (S. D. Mohanty); 0000-0002-7162-7770 (P. Manda)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
semantics of an ontology whereas generic entities can be independent objects. Knowledge of the
ontological structure and relationships is a crucial part of biological annotation when performed
by a human curator. It is therefore imperative to develop NLP models that are cognizant of
the ontological hierarchy and can effectively incorporate it into the prediction mechanism for
improved ontology concept recognition.
   The automated annotation models previously developed by this team [11, 8, 12, 10, 9] have
shown good accuracy in recognizing ontology concepts from text. In these studies our focus
was to teach the models to learn associations between text and ontology concepts found in the
gold standard corpus and use that knowledge to create new annotations. In a few studies, we
experimented with different techniques of using the ontology structure as one of the inputs in a
bid to improve annotation performance [10, 8, 9]. In some cases, these systems are able to predict the same ontology concept as the ground truth in the gold standard data, achieving perfect accuracy. Incorporating ontology structure was a bid to improve partial accuracy in cases where the model does not achieve a perfect match to the actual annotation. Our hypothesis
was that having knowledge of the ontology structure would enable the model to choose a
closely related/semantically similar concept to the actual annotation thereby improving overall
annotation performance as evaluated by semantic similarity.
   Our goal in this study is to develop deep learning architectures that learn not only patterns in
text but also the ontology structure. Our hypothesis is that the process of learning the ontology
structure would in turn improve prediction of annotations. Deep learning models learn patterns
in text and annotations from a gold standard corpus and similarly, we need to provide a gold
standard representation of the ontology structure so the models can learn to predict the ontology
structure.
   In this study, we use graph embeddings for representing the ontology structure. These graph
embeddings are used as a reference and reinforcement tool for the model as it learns to predict
the ontology structure. Semantic embedding of large knowledge graphs has been long used
successfully for predictive tasks including natural language processing [13]. In recent years,
these semantic embeddings have been extended to OWL ontologies resulting in approaches
that can create embeddings for ontology concepts that effectively represent the structure and
semantics of the ontology [13, 14]. These embedding algorithms translate ontologies represented
as directed acyclic graphs into a vector space where the structure and the inherent semantics of
the graph are preserved [15].
   There are several approaches for learning ontology embeddings [16, 17] each with different
strengths. The approaches differ based on whether the ontology is directed, weighted, if it
dynamically changes over time, and the approach for learning the network [17]. In this study,
we selected Node2Vec [14] for learning ontology embeddings from the Gene Ontology since it
is widely used in literature for this task [17].
   We use the Colorado Richly Annotated Full Text Corpus (CRAFT) as a gold standard for
training and testing the performance of our architectures [18]. CRAFT is a widely used training
resource for automated annotation approaches. The current version of the CRAFT corpus (v4.0.1)
provides annotations for 97 biological/biomedical articles with concepts from 7 ontologies
including the GO.
   We hypothesize that the added information gained from ontology embeddings can improve
model performance in recognizing ontology concepts from scientific literature. We present two deep learning architectures and explore how the different architectures, combined with the inclusion of ontology embeddings, impact annotation performance.


2. Related Work
The rise of deep learning in the areas of image and speech recognition has translated into
text-based problems as well. Preliminary research has shown that deep learning methods
result in greater accuracy for text-based tasks including identifying ontology concepts in
text [5, 19, 20, 21, 8]. These methods use vector representations that enable them to capture
dependencies and relationships between words using enriched representations of character and
word embeddings from training data [7].
   Our initial foray into this area involved a feasibility study of using deep learning for the task
of recognizing ontology concepts [11]. In a comparison of Gated Recurrent Units (GRUs), Long
Short Term Memory (LSTM), Recurrent Neural Networks (RNNs), and Multi Layer Perceptrons
(MLPs) along with a new deep learning model/architecture based on combining multiple GRUs,
we found GRUs to outperform the rest. These findings indicated that deep learning algorithms
are a promising avenue to be explored for automated ontology-based curation of data.
   In 2020, we presented new architectures based on GRUs and LSTMs combined with different
input encoding formats for automated annotation of ontology concepts [8]. We also created
multi-level deep learning models designed to incorporate the ontology hierarchy into the prediction. Surprisingly, inclusion of ontology semantics via subsumption reasoning yielded only modest performance improvement [8]. This result indicated that more sophisticated approaches to take
advantage of the ontology hierarchy are needed.
   Continuing this work, a 2022 study [12] presented state of the art deep learning architectures
based on GRUs for annotating text with ontology concepts. We augmented the models with
additional information sources, including NCBI's BioThesaurus and the Unified Medical Language System (UMLS), to supplement the information from CRAFT and increase prediction accuracy. We
demonstrated that augmenting the model with additional input pipelines can substantially
enhance prediction performance.
   Our next work explored a different approach to providing the ontology as input to the deep
learning model [8]. Subsequently, we presented an intelligent annotation system [10] that uses
the ontology hierarchy for training and predicting ontology concepts for pieces of text. Here,
we used a vector of semantic similarity scores to the ground truth and all ancestors in the
ontology to train the model. This representation allowed the model to identify the target GO
term followed by “similar” GO terms that are partially accurate predictions. We showed that
our ontology aware models can result in a 2% - 10% improvement over a baseline model that
doesn’t use ontology hierarchies.
   Our most recent contribution presented a method called Ontology Boosting [9]. A key
component of this approach is to combine the prediction of the deep learning architectures
with the graph structure of ontological concepts. Boosting amplifies the predicted probabilities
of certain concept predictions by combining them with the model predictions of the candidate’s
ancestors/subsumers. Results showed that the boosting step can result in a substantial bump in
prediction accuracy.




3. Methods
3.1. Generating ontology embeddings
We used the Node2Vec approach for generating ontology embeddings from the Gene Ontology.
 The Node2Vec algorithm consists of two steps:
   1. Conduct random walks over the graph/ontology to generate "sentences", each of which is a list of ontology concepts. Once all random walks are conducted, the set of all sentences forms a corpus that represents the ontology.
   2. Apply the Word2Vec [22] algorithm to the corpus to learn and generate embeddings for each concept in the ontology. These embeddings are low-dimensional vector representations of ontology concepts.
These embeddings or feature vectors can then be used in downstream tasks such as node classification or natural language processing.
   We set the weight of all edges to 1 for the weighted random walks, indicating that all edges are weighted equally. The length of each random walk was set to 5 and the number of walks to 100. The dimensionality of the embeddings was set to 128, the batch size to 50, and the model was trained for 2 epochs to learn the embeddings.
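   As an illustration, the following sketch shows one way such embeddings could be generated in Python, assuming the obonet and node2vec packages and a local copy of the Gene Ontology (go-basic.obo). The parameter values mirror those reported above; the exact setup used here may differ.

   import obonet
   import networkx as nx
   from node2vec import Node2Vec

   # Load the Gene Ontology and treat it as an undirected graph for random walks
   go = obonet.read_obo("go-basic.obo")
   graph = nx.Graph(go)
   nx.set_edge_attributes(graph, 1, "weight")   # all edges weighted equally

   # Random-walk step: walk length 5, 100 walks, 128-dimensional embeddings
   n2v = Node2Vec(graph, dimensions=128, walk_length=5, num_walks=100, weight_key="weight")

   # Word2Vec step over the walk corpus: batch size 50, 2 epochs
   model = n2v.fit(window=5, batch_words=50, epochs=2)

   embedding = model.wv["GO:0008150"]   # 128-dimensional vector for an example GO concept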

3.2. Deep Learning Architectures
Here, we present and test three sets of architectures:
   1. Baseline
          • Tag only (𝑇 𝑂)
          • Ontology Embedding only (𝑂𝐸𝑂)
   2. Cross-connected:
          • Tag to Ontology Embedding (𝑇 − > 𝑂𝐸)
          • Ontology Embedding to Tag (𝑂𝐸− > 𝑇)
   3. Multi-connected:
          • Ontology Embedding to Tag to Ontology Embedding (𝑂𝐸− > 𝑇 − > 𝑂𝐸)

3.2.1. Baseline Architectures
We created two baseline architectures (Figure 1): Tag only (𝑇 𝑂) and Ontology Embedding only
(𝑂𝐸𝑂). The 𝑇 𝑂 architecture predicts tags/ontology IDs while the 𝑂𝐸𝑂 architecture predicts
ontology embeddings. The 𝑇 𝑂 architecture has previously been presented in our prior work
[10]. This architecture has been adjusted to create the 𝑂𝐸𝑂 architecture that predicts ontology
embeddings. Both baseline architectures consist of input pipelines, embedding/latent represen-
tations, and a deep learning model and produce either a probability vector of ontology IDs (𝑇 𝑂)
or an ontology embedding (𝑂𝐸𝑂) as the output.
   The baseline architectures use three inputs. Each word in a sentence from the CRAFT corpus is represented by three inputs: 1) token ($X_{train}^{token}$), 2) character sequence ($X_{train}^{char}$), and 3) Part-Of-Speech (POS) tag ($X_{train}^{POS}$). The token input ($X_{train}^{token}$) is a sequential tensor consisting of tokens, each represented with a high-dimensional one-hot encoded vector. The character sequence ($X_{train}^{char}$) is also a sequential tensor consisting of the character sequences present in a word/token. The POS tags ($X_{train}^{POS}$) indicate the type of words in a sentence.




Figure 1: Tag-only (𝑇 𝑂) and Ontology Embeddings only (𝑂𝐸𝑂) baseline architectures. 𝑇 𝑂 produces a
tag/ontology ID (𝑌𝑡𝑎𝑔 ) as output in the final block whereas 𝑂𝐸𝑂 produces an ontology embedding (𝑌𝑒𝑚𝑏 ).
The architectures also differ in what is used during back propagation to compute the loss gradient. 𝑇 𝑂 uses
tags while 𝑂𝐸𝑂 uses Node2Vec ontology embeddings.


  Embeddings are used to provide a compressed latent space representation for very high
dimensional input components. For example, the one hot vectorization of an individual word
has a dimensionality of 34,166 (vocabulary size). In order to represent words succinctly while capturing context, we use supervised embeddings created from the CRAFT corpus.
Note that these embeddings are different from the ontology embeddings discussed above. These
embeddings provide low dimensional representations of words in the training corpus and do
not use the ontology in any way.
  Both baseline architectures use a bi-directional gated recurrent model (Bi-GRU). The choice of
Bi-GRU for the architectures was informed by several of our prior works where this model has
consistently outperformed other models such as CNNs, RNNs, and LSTMs [8, 11]. Architecture
hyper-parameters were evaluated using a grid search approach. We used Adam [23] as our
optimiser for all of the experiments with a default learning rate of 0.001.
  The two baseline architectures differ primarily because of what they produce as output and
what they use during the propagation stages. In 𝑇 𝑂, the output is a tag/ontology ID where each
word in the input data is mapped to either a GO annotation or a non-annotation. 𝑇 𝑂 takes
the hidden/learned representations of the input from the preceding layers of the network and
applies softmax activation to produce a probability distribution over all possible ontology IDs.
The predicted vector output values and ground truth values are compared to compute sparse
categorical cross entropy as loss, followed by backpropagation which involves computing the
gradients of the loss with respect to the model’s weights. The ontology ID with the highest
probability is regarded as the prediction.
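   For concreteness, a minimal Keras sketch of a 𝑇 𝑂-style tagger is shown below. It keeps only the token input pipeline (the full architecture also includes character and POS pipelines), and the tag count and sequence length are assumptions rather than exact settings used here.

   import tensorflow as tf
   from tensorflow.keras import layers, models

   VOCAB_SIZE = 34166    # word vocabulary size reported above
   NUM_TAGS = 975        # hypothetical: ontology IDs plus a non-annotation tag
   MAX_LEN = 128         # assumed maximum sentence length

   tokens = layers.Input(shape=(MAX_LEN,), name="X_train_token")
   x = layers.Embedding(VOCAB_SIZE, 128, mask_zero=True)(tokens)          # learned word embeddings
   x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)    # main Bi-GRU layer
   y_tag = layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax"),
                                  name="Y_tag")(x)                        # probabilities over ontology IDs

   model = models.Model(tokens, y_tag)
   model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                 loss="sparse_categorical_crossentropy")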




   In contrast, 𝑂𝐸𝑂 uses ground truth ontology embeddings generated using Node2Vec during
back propagation and for computing the loss functions. The intuition is that providing ontology
embeddings to the architecture during the propagation stages will enable it to get an understand-
ing of the ontology structure and eventually enable it to make more accurate and intelligent
predictions. The output of 𝑂𝐸𝑂 is an ontology embedding. The predicted ontology embedding
is compared to all ground truth ontology embeddings using cosine similarity calculation. The
ground truth ontology embedding that is most similar to the predicted embedding is identified
and the ontology ID associated with it is treated as the architecture’s prediction. Accuracy
metrics are then computed by comparing the predicted ontology ID to that in the CRAFT corpus.
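   The cosine-similarity lookup described above can be sketched as follows; the function name and array shapes are illustrative assumptions.

   import numpy as np

   def nearest_concept(pred_emb, concept_ids, concept_embs):
       # concept_embs: (n_concepts, dim) matrix of ground-truth Node2Vec embeddings;
       # concept_ids: list of the corresponding GO IDs.
       pred = pred_emb / (np.linalg.norm(pred_emb) + 1e-12)
       refs = concept_embs / (np.linalg.norm(concept_embs, axis=1, keepdims=True) + 1e-12)
       sims = refs @ pred                         # cosine similarity to every concept embedding
       return concept_ids[int(np.argmax(sims))]   # most similar concept is the prediction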

3.3. Cross-connected architectures
We developed two cross-connected architectures: 1) Tag to Ontology Embedding (𝑇 − > 𝑂𝐸) and 2) Ontology Embedding to Tag (𝑂𝐸− > 𝑇). Here we test whether connecting the tag and ontology embedding architectures, so that one informs the prediction of the other, results in improved accuracy, and whether the direction of the connection matters. The 𝑇 − > 𝑂𝐸 architecture
(Figure 2) has two different outputs, tags/ontology ids and ontology embeddings. The tag output
(𝑌𝑡𝑎𝑔 in Figure 2) is concatenated with the output of the main Bi-GRU layer to give a higher
dimensional vector output. The concatenation is then passed through dense layers to further
learn the hierarchical representations of the ontology before generating an ontology embedding
for each input token. This predicted ontology embedding is compared with the ground truth
ontology embeddings learned using Node2Vec. Using cosine similarity as the loss function, loss
is calculated and the gradients are backpropagated to adjust the model’s weight for convergence.
   In 𝑂𝐸− > 𝑇, the ontology embedding output (𝑌𝑒𝑚𝑏 ) is concatenated with the output of the main Bi-GRU layer to give a higher dimensional vector output. The concatenation is then passed through dense layers before generating a tag for each input token. This predicted tag is compared with the ground truth tag in CRAFT. The loss is calculated and the gradients are backpropagated to adjust the model's weights for convergence. The 𝑂𝐸− > 𝑇 architecture can be depicted by switching the 𝑌𝑡𝑎𝑔 and 𝑌𝑒𝑚𝑏 blocks as well as the two outputs in Figure 2.
   Figure 3 presents an explanation of the 𝑇 − > 𝑂𝐸 cross-connected architecture on three exam-
ple tokens. Cross connected architectures differ from the baseline architectures by producing
both tags and ontology embeddings instead of one or the other. Here, we show how training/inference proceeds on the sequence of tokens "vesicle", "formation", and "in" (which are parts of a sentence in the CRAFT corpus) as it is evaluated by the network. Each token is preprocessed to obtain the representative tensors $X_{train}^{token}$, $X_{train}^{char}$, and $X_{train}^{POS}$, which are passed through embedding layers learned from CRAFT. The embedding of $X_{train}^{char}$ is also passed through a Bi-GRU layer. All of the

resulting values are concatenated to be processed via the main Bi-GRU layer. The output from
‘Tag Dense Layer’ is concatenated with the output of main ‘Bi-GRU layer’ and passed as input
to the ‘Ontology Embedding Dense Layer’ where the model generates ontology embeddings for
each of the input tokens.
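   A hedged Keras sketch of the 𝑇 − > 𝑂𝐸 cross-connection is given below. The input feature dimensionality, tag count, and layer sizes are assumptions; only the connection pattern (the tag output concatenated with the main Bi-GRU output before the embedding head) follows the description above.

   import tensorflow as tf
   from tensorflow.keras import layers, models

   NUM_TAGS = 975    # hypothetical tag vocabulary (ontology IDs + non-annotation)
   EMB_DIM = 128     # Node2Vec embedding dimensionality
   MAX_LEN = 128
   FEAT_DIM = 256    # assumed dimensionality of the concatenated token/char/POS features

   features = layers.Input(shape=(MAX_LEN, FEAT_DIM), name="concatenated_inputs")
   h = layers.Bidirectional(layers.GRU(128, return_sequences=True))(features)   # main Bi-GRU layer

   # Tag head: probability distribution over ontology IDs
   y_tag = layers.TimeDistributed(layers.Dense(NUM_TAGS, activation="softmax"), name="Y_tag")(h)

   # Cross-connection: concatenate the tag output with the Bi-GRU output,
   # then pass through dense layers to predict an ontology embedding
   x = layers.Concatenate()([h, y_tag])
   x = layers.TimeDistributed(layers.Dense(256, activation="relu"))(x)
   y_emb = layers.TimeDistributed(layers.Dense(EMB_DIM), name="Y_emb")(x)

   model = models.Model(features, [y_tag, y_emb])
   model.compile(optimizer="adam",
                 loss={"Y_tag": "sparse_categorical_crossentropy",
                       "Y_emb": tf.keras.losses.CosineSimilarity()})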




Figure 2: Tag to Embedding architecture (𝑇 − > 𝑂𝐸). The tag output is further fed to the ontology
embedding prediction block resulting in a better embedding prediction.


3.4. Multi-connected Architecture
The final architecture (𝑂𝐸− > 𝑇 − > 𝑂𝐸) explores whether ontology embeddings can be improved iteratively by connecting a preliminary ontology embedding output to the tag output, enabling improvements to the tag prediction. The predicted tag block is then connected back to the ontology embedding block to encourage further refinement of the embedding prediction.

3.5. Performance Evaluation Metrics
We evaluate our architectures using a modified F1 score and semantic similarity [24]. Metrics
such as F1 are designed for traditional information retrieval systems that either retrieve a piece
of information or fail to do so (a binary evaluation). However, this is not a true indication of
the performance of ontology-based retrieval or prediction systems where the notion of partial
accuracy applies. A model might not predict the exact concept as a gold standard but might
predict the parent or an ancestor of the ground truth as indicated by the ontology. Semantic
similarity metrics [24] designed to measure different degrees of similarity between ontology
concepts can be leveraged to measure the similarity between the predicted concept and the
actual annotation to quantify the partial prediction accuracy. Here, we use Jaccard similarity
[24] that measures the ontological distance between two concepts to assess partial similarity.
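   For illustration, one common way to compute Jaccard similarity over GO terms uses the ancestor sets of the two concepts. The sketch below assumes the obonet package and a local GO file, and may differ from the exact formulation in [24].

   import obonet
   import networkx as nx

   go = obonet.read_obo("go-basic.obo")    # assumed local copy of the Gene Ontology

   def ancestors_plus_self(term):
       # obonet edges point from child to parent, so networkx "descendants"
       # of a node are its ontological ancestors
       return nx.descendants(go, term) | {term}

   def jaccard_similarity(term_a, term_b):
       a, b = ancestors_plus_self(term_a), ancestors_plus_self(term_b)
       return len(a & b) / len(a | b)

   # Example: partial credit for predicting a concept close to the gold-standard annotation
   print(jaccard_similarity("GO:0006915", "GO:0012501"))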
   Since the majority of tags in the training corpus are non-annotations, the model predicts them with great accuracy. In order to avoid biasing the F1 score, we omit accurate predictions of non-annotations, focus on annotations only, and report a relatively conservative modified F1 score.
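   One plausible reading of this modified F1, restricted so that correctly predicted non-annotation tokens contribute nothing, is sketched below; the label convention is an assumption.

   NON_ANNOTATION = "O"    # assumed label for tokens without a GO annotation

   def modified_f1(gold_tags, pred_tags):
       # true positives: annotation tokens predicted exactly
       tp = sum(g == p for g, p in zip(gold_tags, pred_tags) if g != NON_ANNOTATION)
       # false negatives: annotation tokens predicted incorrectly or missed
       fn = sum(g != p for g, p in zip(gold_tags, pred_tags) if g != NON_ANNOTATION)
       # false positives: non-annotation tokens wrongly tagged as annotations
       fp = sum(p != NON_ANNOTATION for g, p in zip(gold_tags, pred_tags) if g == NON_ANNOTATION)
       precision = tp / (tp + fp) if tp + fp else 0.0
       recall = tp / (tp + fn) if tp + fn else 0.0
       return 2 * precision * recall / (precision + recall) if precision + recall else 0.0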




Figure 3: Illustration of the working of the 𝑇 − > 𝑂𝐸 architecture with an example sequence. The
architecture produces two outputs - 1) a tag and 2) an ontology embedding.


4. Results and Discussion
The CRAFT v4.0.1 dataset contains 18689 annotations pertaining to 974 concepts from the three
GO sub-ontologies across 97 articles. Table 1 provides further information on the coverage of GO terms in CRAFT.
   The baseline tag-only architecture (𝑇 𝑂) resulted in a 0.80 F1 and a 0.83 semantic similarity
score. The baseline ontology embeddings only architecture (𝑂𝐸𝑂) resulted in a 0.65 F1 and a
0.74 semantic similarity.
   Among the two cross-connected architectures, we found that the Tag to Ontology Embedding
architecture (𝑇 − > 𝑂𝐸) substantially outperformed the 𝑂𝐸− > 𝑇 architecture according to F1
and was able to achieve similar performance as measured by semantic similarity. This indicates
that 𝑇 − > 𝑂𝐸 is better at generating exactly matching predictions resulting in high F1 and




semantic similarity. In contrast, 𝑂𝐸− > 𝑇 performs better at generating semantically similar matches rather than exact matches, leading to lower F1 than semantic similarity scores.
   The 𝑇 − > 𝑂𝐸 architecture was able to improve upon 𝑂𝐸𝑂’s prediction of ontology embeddings
by 23% (F1) and 9.4% (semantic similarity). We observed relatively modest improvements to
𝑇 𝑂’s tag prediction with 3.8% (F1) and 1.2% (semantic similarity).
   Connecting ontology embedding output to the tag output (𝑂𝐸− > 𝑇) either did not improve
on the embedding prediction (F1) or resulted in a slight improvement (semantic similarity).
𝑂𝐸− > 𝑇 did produce improvements for tag prediction over the 𝑇 𝑂 model by 3.7% (F1) and
1.2% (semantic similarity). The multi-connected architecture did poorly in comparison to the
cross-connected architectures.
   Overall, the results suggest that architectures that use ontology embeddings only without
learning associations between text and annotations perform poorly. The other takeaway is that
connecting tag predictions to the ontology embedding block (𝑇 − > 𝑂𝐸) and letting embedding
prediction learn from the predicted tag iteratively results in more robust architectures. The
𝑇 − > 𝑂𝐸 cross-connected architecture results in improved performance in predicting both tags
and ontology embeddings across both metrics.

Table 1
Coverage of GO ontology concepts and annotations in the CRAFT corpus
      GO sub-ontology             Concepts in ontology   Total annotations in CRAFT   Unique occurrences in CRAFT
      Biological Process (BP)     30490                  18392                        710
      Cellular Component (CC)     4463                   6976                         241
      Molecular Function (MF)     12257                  464                          5



Table 2
Performance metrics of the three sets of architectures measured by F1 and Jaccard semantic similarity
      Architecture                              Ontology Embedding   Ontology Embedding   Tag        Tag
                                                F1 Score             Similarity Score     F1 Score   Similarity Score

      Baseline Architectures
      Tag-only (𝑇 𝑂)                            -                    -                    0.80       0.83
      Ontology Embedding Only (𝑂𝐸𝑂)             0.65                 0.74                 -          -

      Cross-connected Architectures
      Tag to Ontology Embedding (𝑇 − > 𝑂𝐸)      0.80                 0.81                 0.83       0.84
      Ontology Embedding to Tag (𝑂𝐸− > 𝑇)       0.64                 0.75                 0.83       0.84

      Multi-connected Architecture
      𝑂𝐸− > 𝑇 − > 𝑂𝐸                            0.78                 0.80                 0.82       0.83




Acknowledgments
This work is funded by a CAREER grant to Manda from the Division of Biological Infrastructure
at the National Science Foundation of United States of America (#1942727).


References
 [1] T. R. Dalmer, R. D. Clugston, Gene ontology enrichment analysis of congenital diaphrag-
     matic hernia-associated genes, Pediatric research 85 (2019) 13–19.
 [2] D. Lee, N. de Keizer, F. Lau, R. Cornet, Literature review of snomed ct use, Journal of the
     American Medical Informatics Association 21 (2014) e11–e19.
 [3] R. C. Edmunds, B. Su, J. P. Balhoff, B. F. Eames, W. M. Dahdul, H. Lapp, J. G. Lundberg, T. J.
     Vision, R. A. Dunham, P. M. Mabee, et al., Phenoscape: identifying candidate genes for
     evolutionary phenotypes, Molecular biology and evolution 33 (2015) 13–24.
 [4] W. Dahdul, T. A. Dececchi, N. Ibrahim, H. Lapp, P. Mabee, Moving the mountain: analysis of
     the effort required to transform comparative anatomy into computable anatomy, Database
     2015 (2015).
 [5] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures
     for named entity recognition, arXiv preprint arXiv:1603.01360 (2016).
 [6] M. R. Boguslav, N. D. Hailu, M. Bada, W. A. Baumgartner, L. E. Hunter, Concept recognition
     as a machine translation problem, BMC bioinformatics 22 (2021) 1–39.
 [7] M. A. Casteleiro, G. Demetriou, W. Read, M. J. F. Prieto, N. Maroto, D. M. Fernandez,
     G. Nenadic, J. Klein, J. Keane, R. Stevens, Deep learning meets ontologies: experiments
     to anchor the cardiovascular disease ontology in the biomedical literature, Journal of
     biomedical semantics 9 (2018) 13.
 [8] P. Manda, S. SayedAhmed, S. D. Mohanty, Automated ontology-based annotation of
     scientific literature using deep learning, in: Proceedings of The International Workshop on
     Semantic Big Data, SBD ’20, Association for Computing Machinery, New York, NY, USA,
     2020. URL: https://doi.org/10.1145/3391274.3393636. doi:10.1145/3391274.3393636 .
 [9] P. Devkota, S. D. Mohanty, P. Manda, Ontology-powered boosting for improved recognition
     of ontology concepts from biological literature (2023).
[10] P. Devkota, S. Mohanty, P. Manda, Knowledge of the ancestors: Intelligent ontology-
     aware annotation of biological literature using semantic similarity, Proceedings of the
     International Conference on Biomedical Ontology (2022).
[11] P. Manda, L. Beasley, S. Mohanty, Taking a dive: Experiments in deep learning for auto-
     matic ontology-based annotation of scientific literature, Proceedings of the International
     Conference on Biomedical Ontology (2018).
[12] P. Devkota, S. D. Mohanty, P. Manda, A gated recurrent unit based architecture for
     recognizing ontology concepts from biological literature, BioData Mining 15 (2022) 1–23.
[13] J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, I. Horrocks, Owl2vec*:
     Embedding of owl ontologies, Machine Learning 110 (2021) 1813–1845.
[14] A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings




     of the 22nd ACM SIGKDD international conference on Knowledge discovery and data
     mining, 2016, pp. 855–864.
[15] M. Ou, P. Cui, J. Pei, Z. Zhang, W. Zhu, Asymmetric transitivity preserving graph embed-
     ding, in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge
     discovery and data mining, 2016, pp. 1105–1114.
[16] H. Cai, V. W. Zheng, K. C.-C. Chang, A comprehensive survey of graph embedding: Prob-
     lems, techniques, and applications, IEEE transactions on knowledge and data engineering
     30 (2018) 1616–1637.
[17] I. Makarov, D. Kiselev, N. Nikitinsky, L. Subelj, Survey on graph embeddings and their
     applications to machine learning problems on graphs, PeerJ Computer Science 7 (2021)
     e357.
[18] M. Bada, M. Eckert, D. Evans, K. Garcia, K. Shipley, D. Sitnikov, W. A. Baumgartner,
     K. B. Cohen, K. Verspoor, J. A. Blake, L. E. Hunter, Concept annotation in the craft
     corpus, BMC Bioinformatics 13 (2012) 161. URL: https://doi.org/10.1186/1471-2105-13-161.
      doi:10.1186/1471-2105-13-161.
[19] M. Habibi, L. Weber, M. Neves, D. L. Wiegandt, U. Leser, Deep learning with word
     embeddings improves biomedical named entity recognition, Bioinformatics 33 (2017)
     i37–i48.
[20] C. Lyu, B. Chen, Y. Ren, D. Ji, Long short-term memory rnn for biomedical named entity
     recognition, BMC bioinformatics 18 (2017) 462.
[21] X. Wang, Y. Zhang, X. Ren, Y. Zhang, M. Zitnik, J. Shang, C. Langlotz, J. Han, Cross-
     type biomedical named entity recognition with deep multi-task learning, arXiv preprint
     arXiv:1801.09851 (2018).
[22] K. W. Church, Word2vec, Natural Language Engineering 23 (2017) 155–162.
[23] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, 2017. arXiv:1412.6980 .
[24] C. Pesquita, D. Faria, A. O. Falcao, P. Lord, F. M. Couto, Semantic similarity in biomedical
     ontologies, PLoS computational biology 5 (2009).



