<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Truveta Mapper: A Zero-shot Ontology Alignment Framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mariyam Amir</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Murchana Baruah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mahsa Eslamialishah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sina Ehsani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alireza Bahramali</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sadra Naddaf-Sh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saman Zarandioon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Truveta</institution>
          ,
          <addr-line>Bellevue, WA 98004</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>In this paper, a new perspective is suggested for unsupervised Ontology Matching (OM) or Ontology Alignment (OA) by treating it as a translation task. Ontologies are represented as graphs, and the translation is performed from a node in the source ontology graph to a path in the target ontology graph. The proposed framework, Truveta Mapper (TM), leverages a multi-task sequence-to-sequence transformer model to perform alignment across multiple ontologies in a zero-shot, unified, and end-to-end manner. Multi-tasking enables the model to implicitly learn the relationship between different ontologies via transfer learning without requiring any explicit cross-ontology manually labeled data. This also enables the formulated framework to outperform existing solutions in both runtime latency and alignment quality. The model is pre-trained and fine-tuned only on publicly available text corpora and inner-ontologies data. The proposed solution outperforms state-of-the-art approaches (Edit-Similarity, LogMap, AML, BERTMap, and the OM frameworks recently presented in the Ontology Alignment Evaluation Initiative (OAEI22)), offers log-linear complexity, and overall makes the OM task efficient and more straightforward without much post-processing involving mapping extension or mapping repair. We are open sourcing our solution.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Alignment</kwd>
        <kwd>Ontology Matching</kwd>
        <kwd>multi-task sequence-to-sequence</kwd>
        <kwd>contextual matching</kwd>
        <kwd>transfer learning</kwd>
        <kwd>zero-shot learning</kwd>
        <kwd>attention-based models</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Ontology Matching (OM) or Ontology Alignment (OA) is the process of finding correspondences
between the entities of two ontologies. The purpose of this process is to unify data from different
sources and reduce heterogeneity, making data more viable for research and development [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Classical state-of-the-art (SOTA) approaches to OM are based on non-contextual matching,
where the model captures lexical similarity but fails to understand textual semantics. With
contextual approaches, on the other hand, the objective is to match complex pairs that are
lexically different but semantically similar, and vice versa.
      </p>
      <p>
        Recently, a transformer-based contextual framework using BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], has been proposed
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which showed promising results compared to other OM systems. Their approach
used an existing pre-trained BERT model, which allowed it to learn textual semantics.
However, it did not capture the ontology graph structure, which could significantly extend
the capabilities of ontology graph matching.
      </p>
      <p>Motivated by the potential of the transformer models for understanding textual semantic
context and overcoming the limitations in the existing methods, the present work proposes
Truveta Mapper (TM), a novel zero-shot sequence-to-sequence multi-task transformer-based
framework for OM, with the capability of learning both the graph-structure and textual semantics
of the ontologies. The model is first pre-trained to learn the hierarchical graph structure of
ontology and semantics of each class using Masked Language Modeling (MLM), then fine-tuned
using class labels and synonyms as input and class hierarchical-ID as the output, capturing
the structure of the ontology. As such, we treat OM as a translation task, where the source
ontology class is translated to a path to the matching class in the target ontology, in a zero-shot and
multi-task manner. The proposed approach is based on zero-shot learning and prediction, where
“zero-shot learning” refers to the ability of the model to make source-to-target predictions
without requiring manually labeled cross-ontology matching pairs, and “zero-shot prediction”
performs end-to-end mapping from the source to the target without the need for similarity
calculation across the entire target ontology (or a subset of it) or for post-processing such as extension/repair.
With multi-task training, a single model is capable of matching different ontologies such as
SNOMED to FMA, SNOMED to NCIT, and so on, by taking advantage of transfer learning.</p>
      <p>
        In this work, empirical comparison is made with the state-of-the-art lexical matching
approaches and the recent contextual models presented in [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] on the Unified Medical Language
System (UMLS) datasets as part of the Bio-ML track for OAEI 2022. The Ontology Alignment
Evaluation Initiative (OAEI) organizes yearly campaigns on ontology matching tasks. Our
solution surpasses state-of-the-art LogMap, AML models, Edit-similarity, and recently proposed
BERTMap, AMD, LogMap-Lite, BERTMap-Lite, LSMatch, Matcha and ATMatcher.
      </p>
      <p>The remainder of this paper is as follows. Section 2 reviews the recent SOTA-related works
on OM/OA; Section 3 defines the problem statement; Section 4 elaborates on pre-training,
fine-tuning, zero-shot learning, and predictions; Section 5 shows the overall model performance;
and lastly, Section 6 provides conclusions and outlines potential future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        OM classical approaches are primarily based on non-contextual matching. Related to that, some
notable works in the field of OM include Edit-Similarity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], LSMatch [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], LogMap [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and
AgreementMakerLight (AML) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], among others. Edit-Similarity is a naïve lexical matching
approach based on normalized edit similarity scores. LSMatch is another lexical matching
approach based on string similarity match. LogMap and AML are two classical OM systems
with leading performance in many equivalence matching tasks. These two approaches are
based on lexical matching, mapping extension (adding new mappings for semantically related
classes of the current mappings), and mapping repair (removing mappings that can lead to
logical conflicts). However, these lexical approaches do not consider contextual semantics.
      </p>
      <p>Several OM systems, such as OntoEmma [10], DeepAlignment [11], VeeAlign [12] leveraged
dense word embeddings, in which words are projected into a vector. Word pairs with smaller
Euclidean distances in the vector space have closer semantic meanings. Different techniques
are used to generate these embeddings. OntoEmma and [13] use word2vec [14], which is
trained on Wikipedia; [15] uses FastText [16]; LogMap-ML [17] uses OWL2Vec* [18], which
is a word2vec model trained on corpora extracted from the ontology with different kinds
of semantics; DeepAlignment uses refined word embeddings using counter-fitting; VeeAlign
proposes dual embeddings using class labels; [19] employs hyperbolic embeddings. These are
primarily traditional non-contextual word embedding methods and do not consider contextual
meaning. Some of these approaches, such as VeeAlign, are based on supervised training, which
requires high-quality labeled mappings for training and can be challenging to obtain.</p>
      <p>
        Transformer-based models [20], thanks to their ability to learn textual contexts, obtained
SOTA for several tasks in natural language processing such as machine translation [21, 22, 23],
question answering [24], among others. Similarly, in the field of OM, recent developments have
also shown the potential of using transformer-based frameworks [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3, 25</xref>
        ]. Neutel and de Boer
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] employed contextual BERT embeddings to match two domain ontologies associated with
occupations. Each sentence is embedded using BERT, and similarity is applied to get the scores
for OM. More recently, [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed BERTMap model, which is obtained by fine-tuning the
already pre-trained BERT model for the binary classification task. The BERTMap model often
outperformed non-contextual approaches such as LogMap, AML, and LogMap-ML. AMD [25]
is another recent context-based matching approach that uses a BERT-based model to generate
mappings and then filters these mappings using graph embedding techniques. Other related
ontology matching systems that participated in OAEI 2022 [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are LogMap-Lite, BERTMap-Lite,
Matcha, and ATMatcher.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Problem statement</title>
        <p>
          Ontology Matching (OM) or Ontology Alignment (OA) is the process of finding correspondence
between the entities/classes of two ontologies [
          <xref ref-type="bibr" rid="ref5">5</xref>
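          ]. In this work, a new perspective is presented
by treating OM as a translation task for equivalence matching, which can be mathematically
presented as <inline-formula><tex-math>f(c_1, T)</tex-math></inline-formula>, where the function <inline-formula><tex-math>f</tex-math></inline-formula> gives the matching target ontology class <inline-formula><tex-math>c_2 \in C_2</tex-math></inline-formula>, given
a source class <inline-formula><tex-math>c_1 \in C_1</tex-math></inline-formula> and an alignment task identifier <inline-formula><tex-math>T</tex-math></inline-formula>. Here, <inline-formula><tex-math>O_1</tex-math></inline-formula> and <inline-formula><tex-math>O_2</tex-math></inline-formula> are the source and target
ontologies, with <inline-formula><tex-math>C_1</tex-math></inline-formula> and <inline-formula><tex-math>C_2</tex-math></inline-formula> being their respective named class sets. Since we are training a
multi-task model, a unique identifier is used for each task.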
        </p>
        <p>The present work focuses on equivalence matching, where classes having the same semantic
meaning in different ontologies are matched with each other. As shown in Figure 1, each
ontology is presented in the form of a hierarchical graph structure with parent-child relations,
where each class is a node in the given ontology graph, and the target class <inline-formula><tex-math>c_2 \in C_2</tex-math></inline-formula> is obtained
as a path in the target ontology graph for a given input node <inline-formula><tex-math>c_1 \in C_1</tex-math></inline-formula> in the input ontology<sup>1</sup>.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Ontologies</title>
        <p>
          In this work, as a part of the OAEI 2022 Bio-ML track [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], we focus on three UMLS equivalence
matching tasks, SNOMED to FMA (Body), SNOMED to NCIT (Neoplas), and SNOMED to
NCIT (Pharm), in an unsupervised setting from [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], where the matching pairs between these
ontologies are only divided into validation (10%) and testing (90%) sets, without any training
data.<fn id="fn1"><label>1</label><p>Note that each class is represented as a node in the hierarchical graph structure of the ontology; as such, “class” and “node” are used interchangeably, as appropriate.</p></fn> Pharm, Neoplas, and Body are associated with the semantic types of “Pharmacologic
Substance”, “Neoplastic Process”, and “Body Part, Organ, or Organ Components” in UMLS,
respectively. Based on these semantic types, subset ontologies are provided in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], and are given
as SNOMED (Body), SNOMED (Neoplas), SNOMED (Pharm), FMA (Body), NCIT (Neoplas) and
NCIT (Pharm), where the first three are the source and last three are the target ontologies in
our matching task. For each of the classes present in the given ontologies, class ID is provided
along with its associated label and possible synonyms (class descriptions). For example, in
Figure 1, for SNOMED ID 78904004, the class label is “Chest Wall Structure,” and its synonyms
are “Thoracic Wall” and “Chest Wall”.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Truveta Mapper (TM): Proposed approach for OM</title>
      <sec id="sec-4-1">
        <title>4.1. Hierarchical-ID generation</title>
        <p>An ontology is represented in
the form of a graph where each node represents a class, and the parent and child relations of
the ontology serve as connections between classes. Based on this graph structure of each
full ontology, hierarchical-IDs are generated for all the classes.</p>
        <p>Figure 3: Hierarchical-IDs generation for the Enzyme concept in the SNOMED ontology. The shortest ID
(highlighted) is chosen as a Hierarchical-ID, and others are SynonymIDs for this concept.</p>
        <p>These are constructed by starting from the root node, separated by “-” at each hierarchy level,
and traversing through each node in that level as shown in Figure 3. Following this method, a
unique ID is generated for each path traversed. As such, for ontologies like SNOMED, where
there are multiple paths between the root and any given class, there could be multiple IDs
for that node. In such cases, the shortest ID is considered the hierarchical-ID of that node
(highlighted in yellow in Figure 3), while the other path IDs are considered its synonymIDs.</p>
        <p>[Figure: pre-training tasks (ID: Child &amp; Parent; ID: SYN &amp; SYN; ID &amp; SYN; ID &amp; FSN; FSN &amp; SYN) and fine-tuning tasks (Pharm, Neoplas, Body).]</p>
        <p>Each node ID inherently captures the information of all its ancestors. This enables the model to
trace from a broader class, starting from the root and getting more granular at each level, thus
simplifying the translation task.</p>
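        <p>The ID construction described in this subsection can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy graph, the choice of labelling the root as "0", the zero-based child positions, and the lexicographic tie-break between equally short IDs are all assumptions made for the example.</p>
        <preformat><![CDATA[
```python
def path_ids(children, root):
    """Return {node: [ID, ...]} with one ID per distinct root-to-node path.

    `children` maps a node to an ordered list of its child nodes (a DAG).
    """
    ids = {}
    stack = [(root, "0")]  # assumption: the root itself is labelled "0"
    while stack:
        node, node_id = stack.pop()
        ids.setdefault(node, []).append(node_id)
        # Each child appends its position among its siblings, separated by "-".
        for position, child in enumerate(children.get(node, [])):
            stack.append((child, f"{node_id}-{position}"))
    return ids


def split_ids(all_ids):
    """Shortest path ID becomes the hierarchical-ID; the rest are synonymIDs."""
    result = {}
    for node, candidates in all_ids.items():
        # Shortest = fewest levels; ties broken lexicographically (assumption).
        ranked = sorted(candidates, key=lambda s: (s.count("-"), s))
        result[node] = {"hierarchical_id": ranked[0], "synonym_ids": ranked[1:]}
    return result


# Toy DAG (hypothetical): node "C" is reachable through two different paths.
toy_children = {"root": ["A", "B"], "A": ["C"], "B": ["D"], "D": ["C"]}
toy_ids = split_ids(path_ids(toy_children, "root"))
print(toy_ids["C"])
```
]]></preformat>
        <p>In the toy graph, the node reachable through two paths receives two IDs, and the one with fewer levels is kept as its hierarchical-ID. Enumerating every path can grow quickly in ontologies with many multi-parent classes, so a production version would likely need to bound or prune the enumeration.</p>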
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Pre-training</title>
        <p>After generating the hierarchical-IDs, multi-task pre-training is done on the full ontologies
(SNOMED, FMA, NCIT) using MLM by randomly masking the nodes, enabling the model to
learn the hierarchy and semantics. For instance, “Structure of Forel’s H2 bundle” is represented
as “1-1-0-0-0-0-4-1-1-0-0-0-7” and is masked as “1-1-0-0-0-0-[MASK]-1-0-0-0-7”. Furthermore,
additional tasks are included in order for the model to learn the semantics of each class in
the form of class-level synonyms, labels, and descriptions; class-level relations between child
and parent nodes; and the relation between synonym-ID and hierarchical-ID, using separate
identifiers for each task in the pre-training step (Figure 2). Task identifiers are added in the
form of prefixes to distinguish between different ontologies. For example, the SNOMED ontology
is prefixed as “F0:”, where “F” represents fully specified name (class label) and “0” indicates
SNOMED Ontology. Similarly, FMA and NCIT are represented using “1” and “2” identifiers.
Some representative examples are presented in Table 1, where similar tasks are defined for each
ontology, with the objective of learning their hierarchical structure and semantics using MLM.</p>
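        <p>As a rough illustration of how a prefixed, masked pre-training instance might be built, consider the sketch below. The helper names are hypothetical, the single space after the prefix is an assumption (the text only shows the prefix “F0:”), and the sketch masks individual hierarchy levels independently, which may differ from the exact masking used by the authors.</p>
        <preformat><![CDATA[
```python
import random

MASK = "[MASK]"


def with_prefix(task_letter, ontology_index, text):
    """Prepend a task identifier such as "F0:" (F = fully specified name, 0 = SNOMED)."""
    # Assumption: one space separates the prefix from the text.
    return f"{task_letter}{ontology_index}: {text}"


def mask_levels(hier_id, mask_fraction, rng):
    """Replace a random subset of levels in a hierarchical-ID with the mask token."""
    levels = hier_id.split("-")
    n_mask = max(1, round(mask_fraction * len(levels)))  # mask at least one level
    positions = set(rng.sample(range(len(levels)), n_mask))
    return "-".join(MASK if i in positions else lvl for i, lvl in enumerate(levels))


rng = random.Random(0)  # fixed seed so the example is reproducible
print(with_prefix("F", 0, "Chest Wall Structure"))
print(mask_levels("1-1-0-0-0-0-4-1-1-0-0-0-7", 0.15, rng))
```
]]></preformat>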
        <p>Based on the tasks stated in Table 1, we generate the pre-training dataset which has 2,406,456
instances drawn from the SNOMED, NCIT, and FMA ontologies. The model is trained for 3 epochs,
with the masking percentage increasing linearly over time, starting at 10% and reaching 35%
in the final batch. The pre-training is done on 8 Nvidia V100 32GB GPUs with a batch size of 20,
using a learning rate of 1e-3 with linear decay scheduler and AdamW optimizer.</p>
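        <p>The linearly increasing masking percentage can be written as a simple interpolation over training steps. The sketch below assumes the rate is updated once per batch and uses a hypothetical function name; only the 10% and 35% endpoints come from the text.</p>
        <preformat><![CDATA[
```python
def masking_rate(step, total_steps, start=0.10, end=0.35):
    """Linearly interpolate the masking rate from `start` (first batch) to `end` (final batch)."""
    if total_steps <= 1:
        return end
    clamped = min(max(step, 0), total_steps - 1)  # keep the step inside the schedule
    fraction = clamped / (total_steps - 1)
    return start + (end - start) * fraction


# Example with a hypothetical schedule of 101 batches.
for step in (0, 50, 100):
    print(step, round(masking_rate(step, 101), 4))
```
]]></preformat>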
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Fine-tuning</title>
        <p>The fine-tuning step aims to train the model on the downstream OM tasks. Only target subset
ontologies, i.e., NCIT (Pharm), NCIT (Neoplas), and FMA (Body), are used for fine-tuning. The
training data of each target sub-ontology is augmented using the exact matches present in
the labels and synonyms of the other subset ontologies. We are also taking advantage of older
ontology versions to add more synonyms to each target label. This expands the training corpus,
enriches the data with minimal processing, and helps to perform more comprehensive learning.
After the data augmentation for all the target sub-ontologies, fine-tuning is performed only
on these target sub-ontologies corpora. Training data is generated for each class in the target
ontologies, where the input is the class label, synonyms, and descriptions, and output is the
corresponding node hierarchical-ID (generated in Section 4.1), using a separate identifier for
each task. Similar to our pre-training approach, multi-task fine-tuning is performed on the
downstream OM tasks. Some examples are presented in Table 1.</p>
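        <p>The fine-tuning data described above pairs each surface form of a target class with that class's hierarchical-ID. A minimal sketch, assuming a hypothetical task prefix ("body"), a hypothetical record layout, and an invented hierarchical-ID, might look like this:</p>
        <preformat><![CDATA[
```python
def build_finetuning_pairs(task_prefix, classes):
    """Return (input_text, target_id) pairs, one per label or synonym of each target class."""
    pairs = []
    for cls in classes:
        surface_forms = [cls["label"], *cls.get("synonyms", [])]
        # dict.fromkeys drops duplicate surface forms while keeping their order.
        for text in dict.fromkeys(surface_forms):
            pairs.append((f"{task_prefix}: {text}", cls["hierarchical_id"]))
    return pairs


# Hypothetical target class; the ID "0-2-1" is invented for illustration.
toy_classes = [
    {
        "label": "Chest wall",
        "synonyms": ["Thoracic wall", "Chest wall"],
        "hierarchical_id": "0-2-1",
    }
]
for source_text, target_id in build_finetuning_pairs("body", toy_classes):
    print(source_text, "->", target_id)
```
]]></preformat>
        <p>Because every synonym maps to the same output sequence, adding synonyms from other subset ontologies or older ontology versions enlarges the training set without changing the set of possible targets.</p>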
        <p>Based on the fine-tuning tasks described in Table 1, we generate the fine-tuning training
data which has 462,789 samples from Pharm, Neoplas, and Body subsets. Using 8 Nvidia V100
32GB GPUs with a batch size of 20, the fine-tuning ran for around 21 epochs. For the fine-tuning,
a learning rate of 1e-3 with a linear decay scheduler and a warm-up of 1.5 epochs is used, together with the AdamW
optimizer with eps of 1e-8 and weight decay of 1e-2.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Zero-shot Predictions</title>
        <p>TM is a multi-task model with the capability to translate between multiple ontologies from the
input source class labels/synonyms to target hierarchical-IDs. Thus, given a source term, the
model predicts the potential candidate in the target ontology graph. For confidence scoring,
two approaches have been adopted here: (i) Greedy search score: Scores are generated based
on greedy search with softmax probabilities using temperature scaling. This is a naive way
to compute the confidence directly from the model prediction. (ii) Using embeddings: This
is a sophisticated method proposed to make the TM predictions more robust and improve
model precision, by leveraging semantic similarity using embeddings of source terms and
predicted target candidates. Using the same model, the embeddings are generated for the target
candidate and the similarity score is obtained between the source term and predicted target term
embeddings. Scores are generated across the source and predicted class labels and synonyms, all
of which are also augmented by singularization. The maximum generated score is considered
as the similarity score. As such, the proposed model takes advantage of both graph search and
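semantic matching. Mathematically, the similarity score <inline-formula><tex-math>S</tex-math></inline-formula> is given as:</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math><![CDATA[S = \begin{cases} 1.0, & \text{if } \Omega(c_1) \cap \Omega(c_2) \neq \emptyset \\ \max(\cos(\Omega(c_1), \Omega(c_2))), & \text{otherwise} \end{cases}]]></tex-math>
          </disp-formula>
        </p>
        <p>where <inline-formula><tex-math>c_2</tex-math></inline-formula> is the predicted class for <inline-formula><tex-math>c_1</tex-math></inline-formula>, <inline-formula><tex-math>\Omega(c_1)</tex-math></inline-formula> and <inline-formula><tex-math>\Omega(c_2)</tex-math></inline-formula> are the sets of labels and synonyms for <inline-formula><tex-math>c_1</tex-math></inline-formula> and <inline-formula><tex-math>c_2</tex-math></inline-formula>,
respectively, and <inline-formula><tex-math>\max(\cos(\Omega(c_1), \Omega(c_2)))</tex-math></inline-formula> selects the maximum cosine similarity score across all
the labels and synonyms of <inline-formula><tex-math>c_1</tex-math></inline-formula> (source) and <inline-formula><tex-math>c_2</tex-math></inline-formula> (predicted). If an exact match is available between
the labels and synonyms of the source and target classes, we assign the maximum similarity score,
since embedding similarity would give a similar result. The source and the target candidates
are considered valid mapping pairs if their similarity score exceeds a selected threshold, for both
approaches.</p>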
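        <p>The embedding-based score of Equation (1) can be sketched in a few lines. The sketch assumes non-empty term sets and a hypothetical embed function that maps a term to a vector; here it is backed by a toy lookup table, whereas the paper derives embeddings from the TM model itself.</p>
        <preformat><![CDATA[
```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_score(source_terms, target_terms, embed):
    """Eq. (1): 1.0 on an exact term overlap, else the maximum pairwise cosine similarity."""
    source_set, target_set = set(source_terms), set(target_terms)
    if source_set & target_set:
        return 1.0
    return max(cosine(embed(s), embed(t)) for s in source_set for t in target_set)


# Toy embeddings (hypothetical values, for illustration only).
toy_vectors = {"chest wall": [1.0, 0.0], "thoracic wall": [0.6, 0.8]}
score = similarity_score(["chest wall"], ["thoracic wall"], toy_vectors.__getitem__)
print(round(score, 3))
```
]]></preformat>
        <p>A pair would then be accepted as a mapping when the returned score exceeds the chosen threshold.</p>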
        <p>One of the main advantages of our proposed TM is that it reduces the time complexity to
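log-linear, as opposed to the naive search solution, which results in quadratic complexity<sup>2</sup>. Given
an input term with a specified task identifier, TM is able to predict the best possible match
from the target ontology with <inline-formula><tex-math>O(\log(n))</tex-math></inline-formula> complexity, where <inline-formula><tex-math>n</tex-math></inline-formula> corresponds to the number of
nodes in the target ontology graph (same as the number of classes). Overall, TM reduces the
time complexity to <inline-formula><tex-math>O(n \log(n))</tex-math></inline-formula>, noting that a single search in a tree structure with <inline-formula><tex-math>n</tex-math></inline-formula> nodes can
be performed in <inline-formula><tex-math>O(\log(n))</tex-math></inline-formula> time.</p>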
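        <p>To give a feel for the gap between quadratic and log-linear cost, the sketch below counts operations under simplifying assumptions: the target hierarchy is treated as a balanced binary tree (real ontologies are generally not balanced), and the ontology size of 100,000 classes is an illustrative figure, not one taken from the paper.</p>
        <preformat><![CDATA[
```python
import math


def pairwise_comparisons(n_source, n_target):
    """Naive search: score every source class against every target class."""
    return n_source * n_target


def tree_descent_steps(n_source, n_target):
    """One root-to-node descent per source class, assuming depth ~ log2(n_target)."""
    return n_source * math.ceil(math.log2(n_target))


n = 100_000  # illustrative size only
print(pairwise_comparisons(n, n))  # quadratic count
print(tree_descent_steps(n, n))    # log-linear count
```
]]></preformat>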
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <sec id="sec-5-1">
        <title>5.1. Evaluation criteria</title>
        <p>
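          The metrics commonly used for evaluating OM systems [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], namely Precision (P), Recall (R), and F-score,
are used as the global evaluation metrics. Mathematically,
        </p>
        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math><![CDATA[P = \frac{|\mathcal{M}_{out} \cap \mathcal{M}_{ref}|}{|\mathcal{M}_{out}|}, \qquad R = \frac{|\mathcal{M}_{out} \cap \mathcal{M}_{ref}|}{|\mathcal{M}_{ref}|}, \qquad F_{\beta} = (1 + \beta^{2}) \, \frac{P \cdot R}{\beta^{2} \cdot P + R}]]></tex-math>
          </disp-formula>
        </p>
        <p>where <inline-formula><tex-math>\mathcal{M}_{ref}</tex-math></inline-formula> are the reference mappings, consisting of matching pairs <inline-formula><tex-math>m = (c, c')</tex-math></inline-formula>, such that <inline-formula><tex-math>c</tex-math></inline-formula>
and <inline-formula><tex-math>c'</tex-math></inline-formula> are two classes from the to-be-aligned ontologies, <inline-formula><tex-math>\mathcal{M}_{out}</tex-math></inline-formula> are the mappings computed
by OM systems, and <inline-formula><tex-math>\beta = 1</tex-math></inline-formula>.</p>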
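        <p>The global metrics can be computed directly from the sets of predicted and reference mappings. The following is a small sketch with hypothetical names and toy data; the evaluation in the paper relies on an existing library rather than on this code.</p>
        <preformat><![CDATA[
```python
def precision_recall_f(predicted, reference, beta=1.0):
    """Precision, recall, and F-beta over sets of (source, target) mapping pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denom = beta**2 * precision + recall
    f_score = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy mappings (hypothetical identifiers).
predicted = {("s1", "t1"), ("s2", "t2"), ("s3", "t9")}
reference = {("s1", "t1"), ("s2", "t2"), ("s4", "t4"), ("s5", "t5")}
print(precision_recall_f(predicted, reference))
```
]]></preformat>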
        <p>
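          The local evaluation metrics Hits@K and Mean Reciprocal Rank (MRR), introduced in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], are
also used for the current evaluation and can be represented as:
        </p>
        <p>
          <disp-formula>
            <label>(3)</label>
            <tex-math><![CDATA[\mathrm{Hits@}K = \frac{|\{m \in \mathcal{M}_{ref} \mid \mathrm{Rank}(m) \leq K\}|}{|\mathcal{M}_{ref}|}, \qquad \mathrm{MRR} = \frac{\sum_{m \in \mathcal{M}_{ref}} \mathrm{Rank}(m)^{-1}}{|\mathcal{M}_{ref}|}]]></tex-math>
          </disp-formula>
        </p>
        <p>
          where <inline-formula><tex-math>\mathrm{Rank}(m)</tex-math></inline-formula> returns the ranking position of <inline-formula><tex-math>m</tex-math></inline-formula> among <inline-formula><tex-math>\mathcal{M}_{m} \cup \{m\}</tex-math></inline-formula> according to their scores, and
<inline-formula><tex-math>\mathcal{M}_{m}</tex-math></inline-formula> represents a set of negative mapping pairs for each source term <inline-formula><tex-math>c</tex-math></inline-formula> in <inline-formula><tex-math>\mathcal{M}_{ref}</tex-math></inline-formula>, such that
<inline-formula><tex-math>(c, c''_{i}) \in \mathcal{M}_{m}</tex-math></inline-formula> with <inline-formula><tex-math>i \in \{1, 2, ..., 100\}</tex-math></inline-formula>, where <inline-formula><tex-math>c''_{i}</tex-math></inline-formula> are the 100 negative output candidates from the target
ontologies for each source term <inline-formula><tex-math>c</tex-math></inline-formula> in <inline-formula><tex-math>\mathcal{M}_{ref}</tex-math></inline-formula>. As such, Hits@K and MRR would be different
for a different selection of the 100 samples. We have published the results of our model based on the
<inline-formula><tex-math>\mathcal{M}_{m}</tex-math></inline-formula> set provided in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for a fair comparison. To provide a more robust measure of local metrics,
we also report overall accuracy, although this is not provided for any of the other
models. Accuracy here can be mathematically presented as:
        </p>
        <p>
          <disp-formula>
            <label>(4)</label>
            <tex-math><![CDATA[\mathrm{Acc} = \frac{|\{m \in \mathcal{M}_{ref} \mid f(c, T) = c'\}|}{|\mathcal{M}_{ref}|}]]></tex-math>
          </disp-formula>
        </p>
        <p>
          where <inline-formula><tex-math>m = (c, c')</tex-math></inline-formula> represents matching pairs in the <inline-formula><tex-math>\mathcal{M}_{ref}</tex-math></inline-formula> set, and <inline-formula><tex-math>f(c, T)</tex-math></inline-formula> refers to the target
candidate predicted by the model, given an input term <inline-formula><tex-math>c</tex-math></inline-formula> and the appropriate task identifier <inline-formula><tex-math>T</tex-math></inline-formula>.<fn id="fn2"><label>2</label><p>Note that BERTMap reduces the time complexity from <inline-formula><tex-math>O(n^{2})</tex-math></inline-formula> in traditional approaches to <inline-formula><tex-math>O(kn)</tex-math></inline-formula>, where <inline-formula><tex-math>k \ll n</tex-math></inline-formula>, with
an additional preprocessing step that considers only a small portion of the target subset ontology classes having at least
one subword token in common with the source class candidate. This adds a dependency on the tokenizer and could be
error-prone, since some semantically matching cases with lexical variations could be filtered out in this process.
In contrast, no such limitation exists in TM, since it performs matching from source to target without
reducing the target corpus size.</p></fn>
        </p>
        <p>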
Baselines. Results are compared with the SOTA approaches: Edit-Similarity, LogMap, AML,
BERTMap [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], and recently published results in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. To be consistent, evaluation for P, R, F-score,
Hit@1, and MRR is done using [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] library.
        </p>
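        <p>The local metrics (Hits@K and MRR) and the accuracy measure described in this subsection can be sketched as follows. Function names and toy values are hypothetical, and ties between a reference score and a negative score are resolved in favour of the reference mapping, which is an assumption of this sketch.</p>
        <preformat><![CDATA[
```python
def rank_of_reference(ref_score, negative_scores):
    """1-based rank of the reference mapping among itself plus its negative candidates."""
    return 1 + sum(score > ref_score for score in negative_scores)


def hits_at_k(ranks, k):
    """Fraction of reference mappings ranked within the top k."""
    return sum(rank <= k for rank in ranks) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all reference mappings."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


def accuracy(predictions, reference):
    """Fraction of reference pairs (c, c') whose predicted target equals c'."""
    return sum(predictions.get(c) == c_ref for c, c_ref in reference) / len(reference)


# Toy example: three reference mappings with ranks 1, 2, and 4.
ranks = [1, rank_of_reference(0.9, [0.5, 0.95, 0.1]), 4]
print(hits_at_k(ranks, 1), mean_reciprocal_rank(ranks))
print(accuracy({"s1": "t1", "s2": "tX"}, [("s1", "t1"), ("s2", "t2")]))
```
]]></preformat>
        <p>Unlike Hits@K and MRR, the accuracy function does not depend on a sampled set of negative candidates, which is why it stays the same regardless of which negatives are drawn.</p>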
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Prediction Results</title>
        <p>
          Prediction results are shown in Tables 2–4, for the three equivalence OM tasks, from SNOMED
to FMA (Body), SNOMED to NCIT (Pharm), and SNOMED to NCIT (Neoplas). The results
demonstrate precision, recall, F-score, Hit@1, MRR, and accuracy for TM and baseline
approaches presented in He et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and OAEI [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] on the test data for the unsupervised setting.
In the given tables, superscripts <sup>1</sup> and <sup>2</sup> denote our proposed TM, where the former is based
on the embedding similarity score and the latter is based on the greedy search score; superscript <sup>∗</sup> results
are taken from He et al. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], and we used the same evaluation metrics for TM; and superscript <sup>∗∗</sup>
corresponds to OAEI [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] published results. The highest numbers for each of these metrics are
highlighted in the tables to emphasize which model is outperforming others in each category.
        </p>
        <p>The overall results illustrate that TM is outperforming all the baselines for all three OM
tasks in F-score, Hit@1, and MRR. A high threshold is selected to generate the most confident
cross-ontology matching pairs. Note that a single unified model is trained and leveraged here to
predict all the results in the form of a source class to target hierarchical-IDs, using appropriate
task identifiers.</p>
        <p>There are two TM results presented in the given tables, based on different
scoring schemes. TM<sup>2</sup> is based on greedy search scores and TM<sup>1</sup> is based on a new and more
robust prediction scheme using embeddings described in Subsection 4.4, taking advantage of
both graph search and semantic similarity. It can be seen that both of our methods surpass
SOTA for all the tasks, but TM<sup>1</sup> is more robust and shows significant improvements as compared
to any of the existing methods. Specifically, in F-score, TM<sup>1</sup> achieves a 2.3% improvement over the second-best result
(AML) in Body, an 11.0% improvement over AMD in Pharm, and a 4.3% improvement
over BERTMap-Lite and Edit-Similarity in Neoplas. It
should be noted that even without TM, none of these methods is SOTA in all the tasks.</p>
        <p>[Tables 2–4 list, for each task, the following systems: TM (Ours)<sup>1</sup>, TM (Ours)<sup>2</sup>, Edit-Similarity<sup>∗</sup>, LogMap<sup>∗</sup>, AML<sup>∗</sup>, BERTMap<sup>∗</sup>, LogMap-Lite<sup>∗∗</sup>, AMD<sup>∗∗</sup>, BERTMap-Lite<sup>∗∗</sup>, Matcha<sup>∗∗</sup>, ATMatcher<sup>∗∗</sup>, and LSMatch<sup>∗∗</sup>.]</p>
        <p>For generating local metrics for Hit@1 and MRR, TM is used to generate the embedding
similarity score of input terms in the test set and their corresponding candidates in the <inline-formula><tex-math>\mathcal{M}_{m} \cup \{m\}</tex-math></inline-formula> set.
TM also outperforms all existing SOTA methods on MRR and Hit@1. Additionally,
we report the accuracy metric, which is consistent and more representative of the model
performance. For this metric, the TM predictions are obtained across the entire target ontology
without using any smaller subset of negative samples from the test set, while reducing the time
complexity from quadratic to log-linear.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Discussions</title>
      <p>
        This work presents a new approach to OM by treating the OM process as a translation task
and performing multi-task pre-training, fine-tuning, and predictions in a zero-shot, unified
and end-to-end manner. The proposed approach takes advantage of transfer learning across
different ontologies and does not require manual annotations for training. Additionally, the
trained model understands the semantics of the text as well as the structure of the ontologies.
We show that our proposed method outperforms Edit-Similarity, LogMap, AML, BERTMap, and
the recently proposed OM frameworks in the OM22 conference [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in all the tasks.
      </p>
      <p>Our approach provides several advantages: (1) It reduces the time complexity to log-linear
during inference, (2) It is based on zero-shot prediction, without requiring much post-processing
and does not employ mapping extension or mapping repair in contrast to the other methods,
(3) It does not require any manually labeled cross-ontology matching pairs due to zero-shot
learning, (4) One unified framework is used as a result of multi-tasking, which makes it easier to
productionize these large transformer-based models, (5) It is robust toward different tokenization
schemes as it uses byte level tokenization, (6) It learns complete ontologies graphs, using the
hierarchical-IDs which provides a more natural path for translation, and would be significantly
helpful for subsumption mappings.</p>
      <p>In the future, we will pre-train the starting checkpoint with more domain-related corpus (e.g.,
PubMed, MIMIC-III, clinical notes) instead of the mC4 dataset. Another interesting track can be
ensemble learning of existing SOTA models with TM.
[10] L. L. Wang, C. Bhagavatula, M. Neumann, K. Lo, C. Wilhelm, W. Ammar, Ontology
alignment in the biomedical domain using entity definitions and context, arXiv preprint
arXiv:1806.07976 (2018).
[11] P. Kolyvakis, A. Kalousis, D. Kiritsis, Deepalignment: Unsupervised ontology matching
with refined word vectors, in: Proceedings of the 2018 Conference of the North American
Chapter of the Association for Computational Linguistics: Human Language Technologies,
Volume 1 (Long Papers), 2018, pp. 787–798.
[12] V. Iyer, A. Agarwal, H. Kumar, VeeAlign: a supervised deep learning approach to ontology
alignment., in: OM@ ISWC, 2020, pp. 216–224.
[13] Y. Zhang, X. Wang, S. Lai, S. He, K. Liu, J. Zhao, X. Lv, Ontology matching with word
embeddings, in: Chinese computational linguistics and natural language processing based
on naturally annotated big data, Springer, 2014, pp. 34–45.
[14] T. Mikolov, K. Chen, G. Corrado, J. Dean, Eficient estimation of word representations in
vector space, arXiv preprint arXiv:1301.3781 (2013).
[15] M. Tounsi Dhouib, C. Faron Zucker, A. G. Tettamanzi, An ontology alignment approach
combining word embedding and the radius measure, in: International Conference on
Semantic Systems, Springer, Cham, 2019, pp. 191–197.
[16] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword
information, Transactions of the association for computational linguistics 5 (2017) 135–146.
[17] J. Chen, E. Jiménez-Ruiz, I. Horrocks, D. Antonyrajah, A. Hadian, J. Lee, Augmenting
ontology alignment by semantic embedding and distant supervision, in: European Semantic
Web Conference, Springer, 2021, pp. 392–408.
[18] J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, I. Horrocks, Owl2vec*:</p>
      <p>Embedding of owl ontologies, Machine Learning 110 (2021) 1813–1845.
[19] P. Kolyvakis, Approaching Ontology Alignment through Representation Learning to Bridge
the Semantic Gap in Engineering Applications, Technical Report, EPFL, 2020.
[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I.
Polosukhin, Attention is all you need, Advances in neural information processing systems 30
(2017).
[21] M. Johnson, M. Schuster, Q. V. Le, et al., Google’s multilingual neural machine translation
system: Enabling zero-shot translation, Transactions of the Association for Computational
Linguistics 5 (2017) 339–351.
[22] W. Xu, M. Carpuat, Editor: an edit-based transformer with repositioning for neural
machine translation with soft lexical constraints, Transactions of the Association for
Computational Linguistics 9 (2021) 311–328.
[23] H.-I. Liu, W.-L. Chen, Re-transformer: a self-attention based model for machine translation,
Procedia Computer Science 189 (2021) 3–10.
[24] C. Clark, K. Lee, M.-W. Chang, T. Kwiatkowski, M. Collins, K. Toutanova, BoolQ: Exploring
the surprising difficulty of natural yes/no questions, arXiv preprint arXiv:1905.10044
(2019).
[25] Z. Wang, AMD Results for OAEI 2022 (2022).
[26] L. Xue, A. Barua, N. Constant, R. Al-Rfou, S. Narang, M. Kale, A. Roberts, C. Raffel, ByT5:
Towards a token-free future with pre-trained byte-to-byte models, Transactions of the
Association for Computational Linguistics 10 (2022) 291–306.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Neutel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <article-title>Towards Automatic Ontology Alignment using BERT</article-title>
          ,
          <source>in: AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Antonyrajah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>BERTMap: a BERT-based ontology alignment system</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>36</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>5684</fpage>
          -
          <lpage>5691</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] OAEI, https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/2022/ Accessed: 2022-08
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching</article-title>
          ,
          <source>arXiv preprint arXiv:2205.03447</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] DeepOnto, https://github.com/KRR-Oxford/DeepOnto Accessed: 2022-10-14
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <article-title>LSMatch and LSMatch-Multilingual Results for OAEI</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          ,
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          ,
          <source>in: International Semantic Web Conference</source>
          , Springer,
          <year>2011</year>
          , pp.
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>The AgreementMakerLight ontology matching system</article-title>
          ,
          <source>in: OTM Confederated International Conferences “On the Move to Meaningful Internet Systems”</source>
          , Springer,
          <year>2013</year>
          , pp.
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>