<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Contextualized Structural Self-supervised Learning for Ontology Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhu Wang</string-name>
          <email>zwang260@uic.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Illinois at Chicago</institution>
          ,
          <addr-line>Chicago, IL, 60607</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology matching (OM) entails the identification of semantic relationships between concepts within two or more knowledge graphs (KGs) and serves as a critical step in integrating KGs from various sources. Recent advancements in deep OM models have harnessed the power of transformer-based language models and the advantages of knowledge graph embedding. Nevertheless, these OM models still face persistent challenges, such as a lack of reference alignments, runtime latency, and different graph structures that remain unexplored within an end-to-end framework. In this study, we introduce a novel self-supervised learning OM framework with input ontologies, called LaKERMap. This framework capitalizes on the contextual and structural information of concepts by integrating implicit knowledge into transformers. Specifically, we aim to capture multiple structural contexts, encompassing both local and global interactions, by employing distinct training objectives. To assess our methods, we utilize the Bio-ML datasets and tasks. The results show that LaKERMap surpasses state-of-the-art systems in terms of alignment quality and inference time. Our models and code are available at https://github.com/ellenzhuwang/lakermap.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR</p>
      <p>ceur-ws.org
https://github.com/ellenzhuwang/lakermap.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        An ontology or knowledge graph (KG) provides a vocabulary to describe a domain of knowledge
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, many ontologies have been developed independently across different systems
and domains, leading to heterogeneity and concept naming ambiguity. To facilitate knowledge
sharing and reuse, ontology matching (OM) plays a crucial role in knowledge engineering and
semantic data integration. Ontology matching aims to establish semantic correspondences
between concepts from different ontologies. Traditional OM systems, such as
AgreementMakerLight (AML) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and LogMap [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], have consistently obtained top results in the Ontology
Alignment Evaluation Initiative (OAEI). These systems mainly apply lexical matching algorithms
but fail to capture semantic information between the concepts.
      </p>
      <p>
        Recent pre-trained language models (PLMs), such as BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and BioBERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], have achieved
dominant performance on various Natural Language Processing (NLP) tasks. Thus, systems
like MELT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and BERTMap [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] have applied transformer-based models, demonstrating
that the use of these pre-trained models leads to notable improvements in ontology matching
tasks. Nevertheless, these systems fine-tune on the synonyms of the concepts, disregarding
the abundant graph structures between the concepts. To address the challenge of learning
structural information, knowledge graph embedding-based methods have shown promising
results. Examples include MultiOM [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and AMD [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], which learn relations from the triplets
by employing translation-based algorithms [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. However, these approaches typically focus
solely on learning the relational embeddings, without considering contextual semantics, such
as language embeddings of labels/names of the concepts.
      </p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings. Author homepage: https://ellenzhuwang.github.io (Z. Wang).</p>
      <p>Motivated by the aforementioned observations, we propose a novel ontology matching
method named LaKERMap. This method employs two transformer encoders to incorporate
both contextual and structural information in the ontologies. Specifically, LaKERMap uses
self-supervised training on designed objectives at both the triplet and path levels, such as triplet
contrastive learning and masked concept prediction in paths. During mapping inference, we
predict candidates using the pre-trained model and select the final alignments with a relation
regularization based on knowledge graph embedding methods.</p>
      <p>
        To evaluate the effectiveness of LaKERMap, we conduct multiple experiments on the Bio-ML
track [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in the annual OAEI competitions. When compared to the state-of-the-art systems in
OAEI 2022, our experimental results demonstrate that LaKERMap consistently outperforms
baseline systems across various evaluation metrics. Moreover, considering inference latency,
LaKERMap generates alignments for large ontologies within minutes. In summary, our
contributions are as follows: (1) We propose LaKERMap, a novel ontology matching system that
infuses both contextual and structural information into transformer models. (2) We introduce
multiple training objectives with carefully designed strategies. These objectives are beneficial
for learning contextualized structural representations both locally and globally. (3) Extensive
evaluations on different datasets reveal that LaKERMap outperforms state-of-the-art ontology
matching baseline systems in terms of runtime, recall, and F-score.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Preliminaries</title>
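      <p>
        Transformers. Many PLMs apply a multi-layer Transformer architecture to encode texts [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
and the core component is the multi-head self-attention (MHA) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Given input <inline-formula><tex-math>X \in \mathbb{R}^{N \times d}</tex-math></inline-formula>
as a sequence of N tokens, the <inline-formula><tex-math>l</tex-math></inline-formula>-th layer of MHA is derived as:
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>H^{l} = \mathrm{MHA}(Q^{l}, K^{l}, V^{l}) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_H)\, W^{0}, \qquad \mathrm{head}_h = \mathrm{SA}(Q_h, K_h, V_h) = \mathrm{softmax}\!\left(\frac{Q_h (K_h)^{\top}}{\sqrt{d}}\right) V_h</tex-math>
        </disp-formula>
where <inline-formula><tex-math>Q_h = X W_{Q,h}</tex-math></inline-formula>, <inline-formula><tex-math>K_h = X W_{K,h}</tex-math></inline-formula>, <inline-formula><tex-math>V_h = X W_{V,h}</tex-math></inline-formula>,
<inline-formula><tex-math>W^{0}</tex-math></inline-formula> is a linear projection matrix, each head <inline-formula><tex-math>\mathrm{head}_h</tex-math></inline-formula>
is a self-attention (SA) module, <inline-formula><tex-math>h \in [1, H]</tex-math></inline-formula>,
and <inline-formula><tex-math>W_{Q,h}</tex-math></inline-formula>, <inline-formula><tex-math>W_{K,h}</tex-math></inline-formula>, <inline-formula><tex-math>W_{V,h}</tex-math></inline-formula> are query, key, value matrices.
      </p>
      <p>Problem Formulation. Our general goal is to learn both contextual and structural
information from the ontologies to enhance ontology matching. Given an ontology <inline-formula><tex-math>O</tex-math></inline-formula>, we can extract
its concept set <inline-formula><tex-math>\mathcal{C}</tex-math></inline-formula> and relation set <inline-formula><tex-math>\mathcal{R}</tex-math></inline-formula>. Triplets in <inline-formula><tex-math>\mathcal{T}</tex-math></inline-formula> are defined as <inline-formula><tex-math>(h, r, t)</tex-math></inline-formula>, where <inline-formula><tex-math>h \in \mathcal{C}</tex-math></inline-formula> is the
head concept, <inline-formula><tex-math>t \in \mathcal{C}</tex-math></inline-formula> is the tail concept, and <inline-formula><tex-math>r \in \mathcal{R}</tex-math></inline-formula> is the relation between head and tail concept. We
use the triplets in all the public ontologies as our training set.</p>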
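      <p>As a rough illustration of the self-attention module in Eq 1, the following sketch computes a single head on plain Python lists. It is not the implementation used in LaKERMap; the function names <monospace>softmax</monospace> and <monospace>self_attention</monospace> are illustrative assumptions, and the query, key and value rows are assumed to be already projected.</p>
      <preformat preformat-type="code">
```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of row vectors that are assumed to be already
    projected (X W_Q, X W_K, X W_V in the notation of Eq 1).
    """
    d = len(K[0])
    output = []
    for q in Q:
        # Scaled dot product of this query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output
```
      </preformat>
      <p>When all keys are identical, the attention weights are uniform and each output row is simply the mean of the value rows.</p>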
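      <p>
        Ontology matching is finding a set of mappings M between the concepts of a source ontology
<inline-formula><tex-math>O</tex-math></inline-formula> and a target ontology <inline-formula><tex-math>O'</tex-math></inline-formula>. In this work, we mainly focus on equivalence matching. A
mapping is a tuple <inline-formula><tex-math>(c, c', \equiv, s)</tex-math></inline-formula>, where <inline-formula><tex-math>c \in \mathcal{C}</tex-math></inline-formula> and <inline-formula><tex-math>c' \in \mathcal{C}'</tex-math></inline-formula> are concepts, and <inline-formula><tex-math>s \in [0, 1]</tex-math></inline-formula> indicates the
probability that concepts <inline-formula><tex-math>c</tex-math></inline-formula> and <inline-formula><tex-math>c'</tex-math></inline-formula> are equivalent, as follows:
        <disp-formula id="eq-2">
          <label>(2)</label>
          <tex-math>P(c \equiv c') = \cos\big(f(c), f(c')\big)</tex-math>
        </disp-formula>
where <inline-formula><tex-math>f(\cdot)</tex-math></inline-formula> is the feature extraction module, such as the output of the final layer from Eq 1, and cos is
the cosine similarity between the feature embeddings of the given concepts.
      </p>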
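      <p>The mapping score of Eq 2 can be sketched as follows. This is a minimal example on toy vectors, assuming concept embeddings are already available as lists of floats; <monospace>cosine_similarity</monospace> and <monospace>score_mappings</monospace> are hypothetical helper names, not part of the released code.</p>
      <preformat preformat-type="code">
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def score_mappings(source_vec, target_vecs):
    """Score one source concept against target concepts.

    target_vecs maps a target concept id to its embedding. The result is a
    list of (target_id, score) pairs sorted by descending similarity.
    """
    scored = [
        (target_id, cosine_similarity(source_vec, vec))
        for target_id, vec in target_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```
      </preformat>
      <p>For example, <monospace>score_mappings([1.0, 0.0], {"a": [1.0, 0.0], "b": [0.0, 1.0]})</monospace> ranks target "a" first.</p>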
    </sec>
    <sec id="sec-4">
      <title>3. Proposed method</title>
      <p>In this section, we detail our proposed LaKERMap framework. Specifically, we design multiple
training objectives to incorporate structural context at the triplet and path levels. We first
introduce how we jointly encode text and relations between concepts using masked triplet
contrastive learning and relation prediction for triplets. Furthermore, global structure
information is crucial for learning KG structures, as similar concepts often have similar parents or
children. To this end, we aim to distinguish positive and negative paths using a transformer
encoder. Moreover, ontologies typically contain multiple, lengthy paths leading from the root
to given concepts. Consequently, masked concept prediction at the path level is beneficial for
representing the global structure of a KG.</p>
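      <p>LaKERMap Architecture. The architecture of LaKERMap is illustrated in Figure 1.
Initially, we construct triplet and path sets for the training corpus. LaKERMap incorporates two
transformer encoders trained on different tasks, with all encoders sharing parameters.
During inference, we extract the embeddings of a given concept <inline-formula><tex-math>c</tex-math></inline-formula> from the learned encoders and
generate target concept candidate sets as <inline-formula><tex-math>(c'_1, \dots, c'_k)</tex-math></inline-formula>. Then, we compute the cosine similarity <inline-formula><tex-math>s_i</tex-math></inline-formula>
between <inline-formula><tex-math>c</tex-math></inline-formula> and each <inline-formula><tex-math>c'_i</tex-math></inline-formula>, <inline-formula><tex-math>i \in [1, k]</tex-math></inline-formula>, and select the final alignment based on similarity scores.</p>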
      <sec id="sec-4-1">
        <title>3.1. Triplet Encoding</title>
        <p>Triplet-level representation learning considers a batch of triplets as inputs and estimates the
joint distribution of elements in the triplets. We first construct the triplet set for each ontology <inline-formula><tex-math>O</tex-math></inline-formula>,
and identify the relation set <inline-formula><tex-math>\mathcal{R}</tex-math></inline-formula> (e.g., subclass of, disjoint with, synonym, etc.). Then, we iterate
over all the relations in the KGs. Specifically, for each relation <inline-formula><tex-math>r_i</tex-math></inline-formula>, we find the connected
head and tail concepts to create a triplet <inline-formula><tex-math>(h, r_i, t)</tex-math></inline-formula>. All the triplets are subsequently passed to the
transformer encoders to learn meaningful representations.</p>
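        <p>To incorporate contextual and structural information in the triplets, we adopt multiple
strategies for different relations to generate <inline-formula><tex-math>n</tex-math></inline-formula> negative triplets for each positive pair. Notably,
we mainly focus on the relations ‘subclass of’ and ‘synonym’. To generate negative triplets
for the ‘subclass of’ relation, given a head concept <inline-formula><tex-math>h</tex-math></inline-formula>, we randomly sample <inline-formula><tex-math>n</tex-math></inline-formula> tail concepts from
those not in the subclass set. In contrast, for the ‘synonym’ relation, we randomly sample
<inline-formula><tex-math>n</tex-math></inline-formula> tail concepts from the subclass set that do not have overlapping tokens with the given
head concept.</p>
        <p>Triplet contrastive learning. Mask prediction is a core technique in PLMs; it yields
contextualized representations and encourages bidirectional context learning.
Thus, we randomly mask concepts in the triplets and predict them. To be specific,
given a set of inputs <inline-formula><tex-math>\mathcal{T} = \{(h_i, r_i, t_i)\}_{i=1}^{|\mathcal{T}|}</tex-math></inline-formula>, we encode the distribution when one of the concepts is
masked, e.g., <inline-formula><tex-math>t_i</tex-math></inline-formula>, written as:
          <disp-formula id="eq-3">
            <label>(3)</label>
            <tex-math>-\mathbb{E}_{(h,t) \sim \mathcal{T}} \log\big(P(f(t) \mid f(h), f(r))\big)</tex-math>
          </disp-formula>
where <inline-formula><tex-math>f(\cdot)</tex-math></inline-formula> represents the feature extraction module.</p>
        <p>To improve robustness and performance on downstream tasks, such as ontology alignment,
we also propose to learn from the contrastive representations of masked triplets. It is beneficial
to maximize the similarity of matched pairs while minimizing the similarity of unmatched pairs.
For instance, the positive pair is <inline-formula><tex-math>\big((h_i, r_i, [\mathrm{MASK}]), ([\mathrm{MASK}], [\mathrm{MASK}], t_i)\big)</tex-math></inline-formula>. We cast contrastive
learning as a classification task and define the Concept-Concept cross-entropy loss as:
          <disp-formula id="eq-4">
            <label>(4)</label>
            <tex-math>\mathcal{L}_{c2c} = -\sum_{i} \log\left(\frac{\exp\big(\mathrm{sim}(f(h_i, r_i, [\mathrm{MASK}]),\, f([\mathrm{MASK}], [\mathrm{MASK}], t_i))/\tau\big)}{\sum_{j} \exp\big(\mathrm{sim}(f(h_i, r_i, [\mathrm{MASK}]),\, f([\mathrm{MASK}], [\mathrm{MASK}], t_j))/\tau\big)}\right)</tex-math>
          </disp-formula>
where <inline-formula><tex-math>\mathrm{sim}(\cdot, \cdot)</tex-math></inline-formula> denotes the similarity of two representations and <inline-formula><tex-math>\tau</tex-math></inline-formula> is a scalar temperature hyperparameter.</p>
        <p>Relation classification. We also encode the distribution of relations when the relations are
masked with the [MASK] token, written as:
          <disp-formula id="eq-5">
            <label>(5)</label>
            <tex-math>-\mathbb{E}_{(h,t) \sim \mathcal{T}} \log\big(P(f(r) \mid f(h), f(t))\big)</tex-math>
          </disp-formula>
        </p>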
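        <p>The Concept-Concept loss of Eq 4 has the shape of a softmax cross-entropy over similarity scores. The sketch below illustrates that shape on toy vectors under two assumptions that are not taken from the paper: cosine similarity stands in for the similarity function, and the other tails in the batch serve as negatives. The name <monospace>c2c_loss</monospace> is hypothetical.</p>
        <preformat preformat-type="code">
```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def c2c_loss(anchors, tails, tau=2.0):
    """Softmax cross-entropy over similarities, in the spirit of Eq 4.

    anchors[i] stands for f(h_i, r_i, [MASK]) and tails[i] for
    f([MASK], [MASK], t_i). Assumption: tails[j] with j != i act as the
    negatives of anchor i (in-batch negatives).
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, tail) / tau for tail in tails]
        peak = max(logits)
        # log of the denominator, computed stably (log-sum-exp).
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total -= logits[i] - log_denominator
    return total
```
        </preformat>
        <p>The default <monospace>tau=2.0</monospace> mirrors the temperature reported in the training settings; aligned anchor and tail pairs yield a lower loss than mismatched ones.</p>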
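        <p>There are various relation types connecting concepts. Given head and tail concepts
<inline-formula><tex-math>(h_j, [\mathrm{MASK}], t_j)</tex-math></inline-formula>, we treat relation prediction as a multi-class classification task. We extract
relation representations through an auxiliary two-layer MLP and define the Concept-Relation loss as:
          <disp-formula id="eq-6">
            <label>(6)</label>
            <tex-math>\mathcal{L}_{c2r} = -\sum_{j=1}^{|\mathcal{T}|} \sum_{r_i \in \mathcal{R}} y_i \cdot \log\big(\mathrm{MLP}(f(h_j, [\mathrm{MASK}], t_j))\big)</tex-math>
          </disp-formula>
where <inline-formula><tex-math>y_i = 1</tex-math></inline-formula> is the true label of the i-th relation. Note that we generate negative pairs in
different ways according to the relation types.</p>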
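        <p>The relation-specific negative sampling described in this subsection could look roughly like the sketch below. Several details are assumptions made for illustration only: the "subclass set" of a head is modelled as a dictionary <monospace>subclasses_of</monospace>, token overlap is tested on lower-cased whitespace tokens of the labels, and both function names are hypothetical.</p>
        <preformat preformat-type="code">
```python
import random


def sample_subclass_negatives(head, concepts, subclasses_of, n, rng):
    """Negative tails for 'subclass of': concepts outside the head's subclass set."""
    related = subclasses_of.get(head, set())
    pool = sorted(c for c in concepts if c != head and c not in related)
    return rng.sample(pool, min(n, len(pool)))


def sample_synonym_negatives(head, labels, subclasses_of, n, rng):
    """Negative tails for 'synonym': concepts from the subclass set whose
    labels share no token with the head label."""
    head_tokens = set(labels[head].lower().split())
    pool = sorted(
        c
        for c in subclasses_of.get(head, set())
        if not head_tokens.intersection(labels[c].lower().split())
    )
    return rng.sample(pool, min(n, len(pool)))
```
        </preformat>
        <p>Passing an explicit <monospace>random.Random</monospace> instance as <monospace>rng</monospace> keeps the sampling reproducible across runs.</p>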
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Path Encoding</title>
        <p>Path-level representation learning aims at capturing global interactions between the
concepts. In particular, there are many paths from the root to the given concepts, some of
which are considerably long. The hierarchical subclass paths are extracted from the
graph structure of the ontologies. Additionally, we incorporate contextual information into
the paths. For example, given triplets <inline-formula><tex-math>(c_1, \text{subclass of}, c_2)</tex-math></inline-formula> and <inline-formula><tex-math>(c_2, \text{synonym}, c_3)</tex-math></inline-formula>, the path is
<inline-formula><tex-math>(c_1, \text{subclass of}, c_2, \text{synonym}, c_3)</tex-math></inline-formula>. To generate negative paths, we randomly replace one concept in
paths with lengths less than 5, such as <inline-formula><tex-math>(c_1, \text{subclass of}, c_5, \text{synonym}, c_3)</tex-math></inline-formula>. In contrast,
for longer paths, 20% of the concepts are replaced randomly.</p>
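        <p>The path construction and corruption rules above can be sketched as follows. This is an illustrative sketch rather than the released implementation: path length is assumed to be counted in concepts, the 20% share is rounded up, and <monospace>build_path</monospace> and <monospace>corrupt_path</monospace> are hypothetical names.</p>
        <preformat preformat-type="code">
```python
import math
import random


def build_path(triplets):
    """Chain triplets (h, r, t), where each tail is the next head, into a flat path."""
    path = [triplets[0][0]]
    for head, relation, tail in triplets:
        if head != path[-1]:
            raise ValueError("triplets do not form a chain")
        path.extend([relation, tail])
    return path


def corrupt_path(path, concepts, rng, short_length=5, ratio=0.2):
    """Create a negative path by replacing concepts at random.

    Paths with fewer than short_length concepts get one replacement; longer
    paths get about ratio of their concepts replaced (rounded up, an assumption).
    """
    positions = list(range(0, len(path), 2))  # concepts sit at even indices
    if short_length > len(positions):
        n_replace = 1
    else:
        n_replace = max(1, math.ceil(ratio * len(positions)))
    corrupted = list(path)
    for pos in rng.sample(positions, n_replace):
        choices = [c for c in concepts if c != path[pos]]
        corrupted[pos] = rng.choice(choices)
    return corrupted
```
        </preformat>
        <p>Relations are left untouched, so a negative path differs from its positive counterpart only in the replaced concepts.</p>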
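        <p>Path contrastive learning. The goal is to differentiate between positive and negative paths
by minimizing the contrastive loss. This training objective helps the model learn to identify
meaningful paths and capture the global contextualized structural information among the
ontologies. Given a set of paths <inline-formula><tex-math>\mathcal{P} = \{(c_1^i, c_2^i, \dots, c_L^i)\}_{i=1}^{|\mathcal{P}|}</tex-math></inline-formula>, we treat it as a binary classification
task that predicts whether a given path is positive or negative. The CPath loss is defined as:
          <disp-formula id="eq-7">
            <label>(7)</label>
            <tex-math>\mathcal{L}_{cpath} = -\sum_{i=1}^{|\mathcal{P}|} \sum_{p_i \in \mathcal{P}} y_i \cdot \log\big(\sigma(f(p_i))\big)</tex-math>
          </disp-formula>
where <inline-formula><tex-math>p_i</tex-math></inline-formula> is the i-th path in <inline-formula><tex-math>\mathcal{P}</tex-math></inline-formula>, <inline-formula><tex-math>y_i</tex-math></inline-formula> is the label, and <inline-formula><tex-math>\sigma(\cdot)</tex-math></inline-formula> is the sigmoid function.</p>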
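        <p>Masked concept prediction. The transformer architecture is able to
capture global structural information in long sequences. Similar to the masked
concept prediction in triplets, we randomly mask a concept in the path. Specifically, given a set
of positive paths <inline-formula><tex-math>\mathcal{P} = \{(c_1^i, c_2^i, \dots, c_L^i)\}_{i=1}^{|\mathcal{P}|}</tex-math></inline-formula>, we feed a masked path such as <inline-formula><tex-math>(c_1^i, [\mathrm{MASK}], \dots, c_L^i)</tex-math></inline-formula>
to the transformer encoder. The path encoding task is to predict the masked concept in the path,
and we derive the distribution of masked concepts as:
          <disp-formula id="eq-8">
            <label>(8)</label>
            <tex-math>-\mathbb{E}_{(c_1, \dots, c_L) \sim \mathcal{P}} \log\big(P(f(c_m) \mid f(c_1), \dots, f(c_L))\big)</tex-math>
          </disp-formula>
The MPath loss is defined as:
          <disp-formula id="eq-9">
            <label>(9)</label>
            <tex-math>\mathcal{L}_{mpath} = -\sum_{i=1}^{|\mathcal{P}|} \sum_{p_i \in \mathcal{P}} y_m^i \cdot \log\big(\sigma(f(c_m^i))\big)</tex-math>
          </disp-formula>
where <inline-formula><tex-math>y_m^i</tex-math></inline-formula> is the label of the masked concept in the i-th path and <inline-formula><tex-math>m</tex-math></inline-formula> is the masked position.</p>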
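        <p>Preparing one training example for masked concept prediction can be sketched as below. The flat path layout (concepts at even positions, relations at odd positions) and the helper name <monospace>mask_path</monospace> are assumptions made for illustration.</p>
        <preformat preformat-type="code">
```python
import random

MASK = "[MASK]"


def mask_path(path, rng):
    """Mask one randomly chosen concept of a flat path.

    Returns the masked path, the masked position m, and the original
    concept, which serves as the prediction label.
    """
    positions = list(range(0, len(path), 2))  # concepts sit at even indices
    pos = rng.choice(positions)
    masked = list(path)
    masked[pos] = MASK
    return masked, pos, path[pos]
```
        </preformat>
        <p>The relations stay visible, so the encoder can use both the neighbouring concepts and the relation labels to recover the masked concept.</p>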
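        <p>Finally, the overall training loss is:
          <disp-formula id="eq-10">
            <label>(10)</label>
            <tex-math>\mathcal{L} = \mathcal{L}_{c2c} + \mathcal{L}_{c2r} + \mathcal{L}_{cpath} + \mathcal{L}_{mpath}</tex-math>
          </disp-formula>
        </p>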
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Mappings inference</title>
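        <p>During inference, the goal is to find an aligned concept from the target ontology for a
given concept from the source ontology. Our model is trained in a self-supervised manner on all ontologies
and applied in a zero-shot setting. Thus, we treat the mapping problem as masked concept prediction.
Specifically, given the concept <inline-formula><tex-math>c_i</tex-math></inline-formula> from source ontology <inline-formula><tex-math>O</tex-math></inline-formula>, the predictions of concept candidate
sets are derived as:
          <disp-formula id="eq-11">
            <label>(11)</label>
            <tex-math>\Omega(c_i) = \operatorname{top-}k\big(P(c_i, c_j)\big)</tex-math>
          </disp-formula>
where <inline-formula><tex-math>\Omega(c_i) = (c_1, \dots, c_k)</tex-math></inline-formula> is the top-<inline-formula><tex-math>k</tex-math></inline-formula> concepts based on probabilities. Then, we search each
concept of the candidate sets in the target ontology <inline-formula><tex-math>O'</tex-math></inline-formula>.</p>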
        <p>
          Inspired by filtering strategies in many OM systems [
          <xref ref-type="bibr" rid="ref13 ref9">9, 13</xref>
          ], we select final alignments
by using a translation-based model, TransE [
          <xref ref-type="bibr" rid="ref10">10</xref>
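          ], to encode relation information. To avoid
increasing model complexity, we pre-compute TransE embeddings and regard them as relation
regularizations. Then, we calculate similarity scores of the given concept by concatenating
TransE embeddings to the outputs of the transformers. Suppose the candidates found in the target ontology are
<inline-formula><tex-math>\Omega(c') = (c'_1, \dots, c'_q) \in O'</tex-math></inline-formula>, <inline-formula><tex-math>q \le k</tex-math></inline-formula>; we compute the cosine similarity for the mapping score <inline-formula><tex-math>s = (s_1, \dots, s_q)</tex-math></inline-formula>, where
<inline-formula><tex-math>s_j = \cos\big((f(c_i), g(c_i)),\, (f(c'_j), g(c'_j))\big)</tex-math></inline-formula>, and <inline-formula><tex-math>g(\cdot)</tex-math></inline-formula> is the embedding from TransE. Finally,
the final mapping is selected by the top scored candidate as follows:
          <disp-formula id="eq-12">
            <label>(12)</label>
            <tex-math>M = \big(c, c', \equiv, \arg\max(s(c, c'))\big)</tex-math>
          </disp-formula>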
        </p>
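        <p>The candidate filtering step can be illustrated with the sketch below, which concatenates a contextual embedding with a pre-computed knowledge graph embedding and keeps the best scoring candidate. The vectors are toy inputs, and <monospace>select_alignment</monospace> is a hypothetical helper rather than a function from the released code.</p>
        <preformat preformat-type="code">
```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_alignment(source_id, source_ctx, source_kge, candidates):
    """Pick the best candidate after concatenating both embedding types.

    candidates is a list of (target_id, ctx_vec, kge_vec) tuples. Returns a
    tuple (source_id, target_id, "≡", score), or None without candidates.
    """
    source_vec = list(source_ctx) + list(source_kge)
    best = None
    for target_id, ctx_vec, kge_vec in candidates:
        score = cosine(source_vec, list(ctx_vec) + list(kge_vec))
        if best is None or score > best[3]:
            best = (source_id, target_id, "≡", score)
    return best
```
        </preformat>
        <p>Because the score depends on both parts of the concatenated vector, a candidate that is not ranked first by the language model alone can still be selected.</p>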
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experiments</title>
      <p>We mainly conduct experiments with LaKERMap on equivalence matching and use the
Bio-ML track of OAEI 2022 as the showcase. Note that our method is general and
extends to other tracks or datasets. The yearly results of OAEI illustrate the performance of
state-of-the-art ontology matching systems. In this section, we introduce the implementation
details of our model, its performance and analysis on different datasets, and ablation studies of
different settings and training objectives.</p>
      <sec id="sec-5-1">
        <title>4.1. Implementation details</title>
        <p>
          Dataset. We train on all ontologies without any reference mappings in Mondo and UMLS from
Bio-ML track. The statistical details of the ontologies are shown in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. During inference, the model
takes two particular ontologies from the track as input files. For evaluation, we use the subsets of the
equivalence matching task and the unsupervised settings described in the track.
        </p>
        <p>Ontology pre-processing. Ontologies are usually formatted as OWL or RDF, but the inputs
of our model require the format of text tokens. Firstly, we extract meta-information from
ontologies using owlready2, including ID, labels, resources, and descriptions of concepts. The
structural relations are extracted from subClass and disjointWith. Note that since the ontologies
in the tasks were developed by different organizations, we parse relations from
different tags, such as rdf:ID="isPartOf" and rdf:resource="UNDEFINED_part_of". We construct
sets of triplets and paths following the descriptions in Sections 3.1 and 3.2. The details of our
training set are presented in Table 1.</p>
        <p>Table 1. The statistical details of pre-processed intra-ontology triplets and paths. We use them to construct the training set.</p>
        <p>
          Training settings. In practice, we fine-tune BioBERT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] on our proposed training objectives.
The number of fine-tuning epochs is 5, with a batch size of 32 and a learning rate of 5e-5. The other
hyperparameter settings are as follows: the temperature τ in Eq 4 is 2; the positive-to-negative sample ratio in both triplets and
paths is 1:2; the number of candidate concepts in inference is 5. We train on 8 NVIDIA RTX
2080Ti GPUs and infer on 1 NVIDIA RTX 2080Ti GPU. To compare the performance with other
OM systems in OAEI, we adopt the evaluation of DeepOnto to avoid errors and bias during the
evaluation process.
        </p>
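        <p>Evaluation Metrics. Following the OAEI evaluation metrics, we evaluate our models in terms of
Precision, Recall and F1-score as follows:
          <disp-formula id="eq-metrics">
            <tex-math>P = \frac{|M_{out} \cap M_{ref}|}{|M_{out}|}, \qquad R = \frac{|M_{out} \cap M_{ref}|}{|M_{ref}|}, \qquad F = \frac{2PR}{P + R}</tex-math>
          </disp-formula>
where <inline-formula><tex-math>P</tex-math></inline-formula> denotes Precision, <inline-formula><tex-math>R</tex-math></inline-formula> denotes Recall, <inline-formula><tex-math>F</tex-math></inline-formula> denotes F-score, <inline-formula><tex-math>M_{out}</tex-math></inline-formula> is the output mappings of LaKERMap, and
<inline-formula><tex-math>M_{ref}</tex-math></inline-formula> refers to the reference mappings, which are annotated and verified by OAEI. Note that we
extract mapping pairs with the equivalence relation from the references and perform one-to-one mapping.</p>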
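        <p>The three metrics can be computed directly from two sets of mapping pairs, as in the following sketch. It is only an illustration with a hypothetical function name <monospace>evaluate</monospace>; the reported results rely on the DeepOnto evaluation, not on this snippet.</p>
        <preformat preformat-type="code">
```python
def evaluate(predicted, reference):
    """Precision, recall and F-score of predicted mappings against references.

    Both arguments are iterables of (source_concept, target_concept) pairs.
    """
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted.intersection(reference))
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score
```
        </preformat>
        <p>Empty inputs are mapped to a score of 0.0 here, which is a convention chosen for this sketch.</p>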
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Results and analysis</title>
        <p>
          Baselines. The baselines and their results are chosen from the participants in the Bio-ML track in
OAEI 2022. We mainly select Ontology Matching (OM) systems from two categories, including
traditional OM matching with lexical and structure information (LogMap [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and Matcha [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]),
and machine/deep learning methods with pre-trained large language models and knowledge
graph embedding (BERTMap [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], AMD [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], ATMatcher [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]). Note that Matcha applies the
same matching algorithms as AML [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] but with different ontology processing.
        </p>
        <p>Main results and analysis. The mapping results are shown in Tables 2 and 3. Other
model results are taken from OAEI 2022. We also provide the results of training on different
combinations of triplet and path objectives; these are analyzed in the ablation studies in
Section 4.3. Overall, LaKERMap achieves the best or second-best performance
compared with these state-of-the-art baselines. It outperforms the best baseline in terms of
recall and F-score by 4% and 8%, respectively. Although ATMatcher has the best precision
scores, its recall is low. In contrast, LaKERMap achieves the second-best
precision and surpasses the other baselines by 4% to 11% on the OMIM-ORDO dataset.</p>
        <p>Compared with the OM models that consider structural information, such as ATMatcher
and AMD, LaKERMap is able to predict more correct mappings with contextualized structural
representations at the triplet level and the path level. ATMatcher and AMD use limited
structural contexts to filter mappings, which prevents them from achieving further improvements.
BERTMap has shown the power of large language models compared to the lexical matching
methods of LogMap and Matcha. However, BERTMap mainly focuses on learning contextual
representations and repairing mappings with ontology structure, and thus neglects
structural information during training. In contrast, our proposed model learns contextual and
structural information jointly, and the integration of local and global structures
further improves the performance on mapping predictions. In summary, LaKERMap learns more
generalized representations for the concepts and predicts high-quality mappings.</p>
        <p>Efficiency analysis. The runtime latency of an Ontology Matching (OM) system is another
crucial evaluation metric. To compare the runtime of the mapping predictions, we reproduce
AMD (https://github.com/ellenzhuwang/AMD-v2) and BERTMap (https://github.com/KRR-Oxford/DeepOnto)
with the same computation resources as LaKERMap. As BERTMap
adopts fine-tuning and mapping repair techniques during the predictions, it requires hours to
generate the final alignments, which makes it impractical for matching large-scale ontologies. Compared
with AMD, our model achieves significant improvements in terms of runtime and mapping
quality, as shown in Table 3.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Ablation studies</title>
        <p>We further evaluate the effectiveness of the different proposed encoding methods described
in Sections 3.1 and 3.2. Moreover, we conduct a variety of experiments on the factors that
affect representation learning during training and mapping prediction during inference.</p>
        <p>Which objective(s): Triplet or Path? We propose incorporating contextual and
structural information with different training objectives. We therefore evaluate each objective and their
combinations on the downstream task, which is ontology matching in our case.
As shown in Table 2, triplet learning surpasses path learning in
terms of F-score across both datasets, possibly due to the more significant semantic information
captured by triplet encoding. Compared with using only triplet encoding or only path
encoding, training on all objectives significantly increases the performance in terms of all
evaluation metrics. Therefore, encoding local and global interactions between concepts with
contextual representations leads to the best performance in mapping predictions.</p>
        <p>Which ratio for masks and negative samples? The quality of the corpus is pivotal in
training deep learning models. Consequently, we conduct experiments on corpus construction,
initially considering the number of negative samples for each positive sample. Negative samples
in both triplet and path encoding can help distinguish between different samples, such as
incorrect relations and paths. The results from varying the number of negative samples are
shown in Fig 2a. Positive-to-negative ratios of 1 ∶ 2 and 1 ∶ 3 outperform the scenario with a
ratio of 1 ∶ 1. Thus, negative samples are beneficial for improving generalization. However, an
excess of negative samples can impair performance and robustness due to the bias introduced by
imbalanced data. In path encoding, the model learns to capture the semantic and
syntactic relationships between concepts globally by predicting the masked concepts based on
the surrounding concepts. The number of masked concepts affects how well the model can do this.
Therefore, we report experiments on the average number of masks in Fig 2b. It
should be noted that the number of masks varies with the path length. Considering the average
length of paths shown in Table 1, the performance declines with more than 2 masks. Since
the model receives less information from the paths, more masks could potentially degrade the
quality of the learned representations.</p>
        <p>How many candidates for inference? During mapping inference, we predict concepts
from the pre-trained model and employ knowledge graph embeddings to encode relations,
which allows us to filter mappings from the candidate sets. As a result, the final alignments
may differ from the model’s outputs. For instance, the probability <inline-formula><tex-math>P(c')</tex-math></inline-formula> of a candidate <inline-formula><tex-math>c'</tex-math></inline-formula> might be the
fifth highest in the candidate set, but <inline-formula><tex-math>c'</tex-math></inline-formula> may become the final alignment after considering its TransE embedding <inline-formula><tex-math>g(c')</tex-math></inline-formula>.
As shown in Fig 2c, the recall score decreases as the number of candidates increases. Indeed,
the final mappings are sensitive to the selection of the candidate sets in terms of precision and
recall. In practice, we have observed that a candidate set size of 5 generally performs well.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Related Work</title>
      <p>
        Traditional Ontology Matching. Traditional ontology matching systems, such as
AgreementMaker [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and AgreementMakerLight [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], typically employ feature-based methods. As
leading ontology matching systems, they offer high flexibility and extensibility by relying on a
combination of various approaches, including lexical matching, structure matching, and external
knowledge resources. The LogMap family [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], another prominent system in OAEI, extends
mapping by adopting structural information and repairing mapping through logical reasoning.
While these systems have been proven to be effective over the past decades, their matching
processes rely heavily on outdated NLP techniques such as lexical matching, thereby neglecting
the enrichment of contextual semantics. Recent methods have begun to apply word embeddings.
DeepAlignment [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] leverages representation learning for ontology matching problems. It uses
word embeddings such as Word2Vec [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] to extract synonyms with joint structure encoding.
ALOD2Vec [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] utilizes large RDF data with hypernymy relations as external resources, deriving
embeddings for concepts based on the RDF2Vec [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] method. However, much like AML and
LogMap, these models focus on feature engineering and do not sufficiently consider semantics.
      </p>
      <p>
        Machine/Deep learning models based Ontology Matching. Recent works, such as
MultiOM [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], perform the matching process using different embedding techniques from multiple
views in graphs. Log-ML [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], a machine learning extension built atop LogMap, trains a complex
neural network to incorporate semantics. However, some of these models require seed or
reference alignments. Moreover, they are not particularly efficient in mapping prediction, which
typically requires high computational resources and several hours to generate the ontology
alignments. As an extension of AML, AMD [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] adopts a pre-trained transformer model to
generate mappings and filters them using knowledge graph embedding methods. Neutel et al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]
extract embeddings from Sentence-BERT [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] for ontology matching. BERTMap [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a
BERT-based system that fine-tunes BioBERT [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] on a semantic text corpus derived from ontologies.
MELT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] incorporates a transformer-based filter in their machine learning module and discusses
performance based on different pre-trained language models and fine-tuning settings. However,
these methods mainly focus on textual semantic representations, thereby neglecting important
structural information in the ontologies.
      </p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>In this paper, we propose capturing both contextual and structural information of the ontologies
using distinct training objectives. LaKERMap is composed of two transformers, encoding triplets
and paths respectively. We train LaKERMap in a self-supervised manner and infer mappings in
a zero-shot setting. Specifically, we fuse and feed contextual information, such as label names,
and structural information, such as relations between concepts, into the transformers. The
training tasks encompass triplet contrastive learning, relation classification, path contrastive
learning, and masked concept prediction. Through the benefits of self-supervised learning, our
model learns generalized representations of the concepts and can generate mappings within
seconds. By conducting extensive experiments on different datasets and in various settings, we
demonstrate that LaKERMap surpasses state-of-the-art baseline ontology matching systems in
terms of speed and accuracy.</p>
      <p>Limitations and future work. Due to computational resource limitations, we initialize
and fine-tune BioBERT as our backbone encoder. Performance could potentially be improved
by adopting larger and more complex language models. Furthermore, we believe that
self-supervised training on a broader range of ontologies will enhance generalization and robustness.
Specifically, we plan to incorporate additional ontology resources with the aim of providing a
large-scale pre-trained model for ontology matching as part of our future work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <article-title>The Role of Ontologies in Data Integration</article-title>
          ,
          <source>Journal of Engineering Intelligent Systems</source>
          <volume>13</volume>
          (
          <year>2005</year>
          )
          <fpage>245</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          , E. Santos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>The AgreementMakerLight Ontology Matching System</article-title>
          , in:
          <source>International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE)</source>
          , Springer,
          <year>2013</year>
          , pp.
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          ,
          <source>LogMap: Logic-based and Scalable Ontology Matching</source>
          (
          <year>2011</year>
          )
          <fpage>273</fpage>
          -
          <lpage>288</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alsentzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Boag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-H.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Naumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McDermott</surname>
          </string-name>
          ,
          <article-title>Publicly available clinical BERT embeddings</article-title>
          ,
          <source>in: Proceedings of the 2nd Clinical Natural Language Processing Workshop</source>
          , Association for Computational Linguistics, Minneapolis, Minnesota, USA,
          <year>2019</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>78</lpage>
          . URL: https://www.aclweb.org/anthology/W19-1909. doi:
          <pub-id pub-id-type="doi">10.18653/v1/W19-1909</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Matching with transformers in MELT</article-title>
          ,
          <source>arXiv preprint arXiv:2109.07401</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Antonyrajah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>BERTMap: A BERT-based ontology alignment system</article-title>
          ,
          <source>arXiv preprint arXiv:2112.02682</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>Multi-view embedding for biomedical ontology matching</article-title>
          ,
          <source>OM@ISWC</source>
          <volume>2536</volume>
          (
          <year>2019</year>
          )
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>AMD results for OAEI 2022</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Duran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating Embeddings for Modeling Multi-relational Data</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          (
          <year>2013</year>
          )
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <article-title>Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching</article-title>
          ,
          <source>in: The Semantic Web-ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23-27, 2022, Proceedings</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>575</fpage>
          -
          <lpage>591</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>ATBox results for OAEI 2022</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <article-title>LogMap family participation in the OAEI 2022</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cotovio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Eugénio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <article-title>Matcha and Matcha-DL results for OAEI 2022</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Palandri Antonelli</surname>
          </string-name>
          , C. Stroe,
          <article-title>AgreementMaker: Efficient Matching for Large Real-World Schemas and Ontologies</article-title>
          ,
          <source>PVLDB</source>
          <volume>2</volume>
          (
          <year>2009</year>
          )
          <fpage>1586</fpage>
          -
          <lpage>1589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kolyvakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalousis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiritsis</surname>
          </string-name>
          ,
          <article-title>DeepAlignment: Unsupervised ontology matching with refined word vectors</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>787</fpage>
          -
          <lpage>798</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          , G. Corrado,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient estimation of word representations in vector space</article-title>
          ,
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          , H. Paulheim,
          <article-title>ALOD2Vec matcher</article-title>
          ,
          <source>OM@ISWC</source>
          <volume>2288</volume>
          (
          <year>2018</year>
          )
          <fpage>132</fpage>
          -
          <lpage>137</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          , H. Paulheim,
          <article-title>RDF2Vec: RDF graph embeddings for data mining</article-title>
          , in: International Semantic Web Conference, Springer,
          <year>2016</year>
          , pp.
          <fpage>498</fpage>
          -
          <lpage>514</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Antonyrajah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hadian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Augmenting ontology alignment by semantic embedding and distant supervision</article-title>
          ,
          <source>in: The Semantic Web: 18th International Conference, ESWC 2021, Virtual Event, June 6-10, 2021, Proceedings 18</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>392</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <article-title>AgreementMakerDeep results for OAEI 2021</article-title>
          ,
          <source>in: Proceedings of the 16th International Workshop on Ontology Matching co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual conference, October 25, 2021</source>
          , volume
          <volume>3063</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2021</year>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Neutel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>de Boer</surname>
          </string-name>
          ,
          <article-title>Towards automatic ontology alignment using BERT</article-title>
          ,
          <source>in: AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          ,
          <source>arXiv preprint arXiv:1908.10084</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>