
Truveta Mapper: A Zero-shot Ontology Alignment Framework

Mariyam Amir

Murchana Baruah

Mahsa Eslamialishah

Sina Ehsani

Alireza Bahramali

Sadra Naddaf-Sh

Saman Zarandioon

Truveta, Bellevue, WA 98004, USA

2023

In this paper, a new perspective is suggested for unsupervised Ontology Matching (OM) or Ontology Alignment (OA) by treating it as a translation task. Ontologies are represented as graphs, and the translation is performed from a node in the source ontology graph to a path in the target ontology graph. The proposed framework, Truveta Mapper (TM), leverages a multi-task sequence-to-sequence transformer model to perform alignment across multiple ontologies in a zero-shot, unified, and end-to-end manner. Multi-tasking enables the model to implicitly learn the relationship between different ontologies via transfer learning without requiring any explicit cross-ontology manually labeled data. This also enables the formulated framework to outperform existing solutions for both runtime latency and alignment quality. The model is pre-trained and fine-tuned only on a publicly available text corpus and inner-ontologies data. The proposed solution outperforms state-of-the-art approaches, Edit-Similarity, LogMap, AML, BERTMap, and the recently presented new OM frameworks in the Ontology Alignment Evaluation Initiative (OAEI22), offers log-linear complexity, and overall makes the OM task efficient and more straightforward without much post-processing involving mapping extension or mapping repair. We are open sourcing our solution.

Keywords: Ontology Alignment, Ontology Matching, multi-task, sequence-to-sequence, contextual matching, transfer learning, zero-shot learning, attention-based models, deep learning

1. Introduction

Ontology Matching (OM) or Ontology Alignment (OA) is the process of finding correspondences between the entities of two ontologies. The purpose of this process is to unify data from different sources and reduce heterogeneity, making data more viable for research and development [1]. Classical state-of-the-art (SOTA) approaches to OM are based on non-contextual matching, where the model captures lexical similarity but fails to understand textual semantics. With contextual approaches, on the other hand, the objective is to match complex pairs that are lexically different but semantically similar, and vice versa.

Recently, a transformer-based contextual framework using BERT [2] has been proposed in [3], which showed promising results compared to other OM systems. In that approach, an existing pre-trained BERT model was used, which allowed the system to learn textual semantics. However, it did not capture the ontology graph structure, an understanding of which could significantly extend the capabilities of ontology graph matching.

Motivated by the potential of transformer models for understanding textual semantic context and by the limitations of the existing methods, the present work proposes Truveta Mapper (TM), a novel zero-shot sequence-to-sequence multi-task transformer-based framework for OM, with the capability of learning both the graph structure and the textual semantics of the ontologies. The model is first pre-trained to learn the hierarchical graph structure of each ontology and the semantics of each class using Masked Language Modeling (MLM), then fine-tuned using class labels and synonyms as input and the class hierarchical-ID as output, capturing the structure of the ontology. As such, we treat OM as a translation task, where the source ontology class is translated to a path to the matching target ontology class in a zero-shot and multi-task manner. The proposed approach is based on zero-shot learning and prediction, where “zero-shot learning” refers to the ability of the model to make source-to-target predictions without requiring manually labeled cross-ontology matching pairs, and “zero-shot prediction” performs end-to-end mapping from the source to the target without the need for similarity calculation across the entire target ontology (or a subset of it) or post-processing such as extension/repair. With multi-task training, a single model is capable of matching different ontologies, such as SNOMED to FMA, SNOMED to NCIT, and so on, by taking advantage of transfer learning.

In this work, an empirical comparison is made with the state-of-the-art lexical matching approaches and the recent contextual models presented in [4, 5] on the Unified Medical Language System (UMLS) datasets as part of the Bio-ML track for OAEI 2022. The Ontology Alignment Evaluation Initiative (OAEI) organizes yearly campaigns on ontology matching tasks. Our solution surpasses the state-of-the-art LogMap and AML models, Edit-Similarity, and the recently proposed BERTMap, AMD, LogMap-Lite, BERTMap-Lite, LSMatch, Matcha, and ATMatcher.

The remainder of this paper is organized as follows. Section 2 reviews recent SOTA-related work on OM/OA; Section 3 defines the problem statement; Section 4 elaborates on pre-training, fine-tuning, zero-shot learning, and predictions; Section 5 shows the overall model performance; and lastly, Section 6 concludes and outlines potential future work.

2. Related Work

Classical OM approaches are primarily based on non-contextual matching. Notable works in this category include Edit-Similarity [6], LSMatch [7], LogMap [8], and AgreementMakerLight (AML) [9], among others. Edit-Similarity is a naïve lexical matching approach based on normalized edit similarity scores. LSMatch is another lexical matching approach based on string similarity match. LogMap and AML are two classical OM systems with leading performance in many equivalence matching tasks. These two approaches are based on lexical matching, mapping extension (adding new mappings for semantically related classes of the current mappings), and mapping repair (removing mappings that can lead to logical conflicts). However, these lexical approaches do not consider contextual semantics.

Several OM systems, such as OntoEmma [10], DeepAlignment [11], and VeeAlign [12], leverage dense word embeddings, in which words are projected into a vector space. Word pairs with smaller Euclidean distances in the vector space have closer semantic meanings. Different techniques are used to generate these embeddings. OntoEmma and [13] use word2vec [14], which is trained on Wikipedia; [15] uses FastText [16]; LogMap-ML [17] uses OWL2Vec* [18], which is a word2vec model trained on corpora extracted from the ontology with different kinds of semantics; DeepAlignment uses word embeddings refined by counter-fitting; VeeAlign proposes dual embeddings using class labels; and [19] employs hyperbolic embeddings. These are primarily traditional non-contextual word embedding methods and do not consider contextual meaning. Some of these approaches, such as VeeAlign, are based on supervised training, which requires high-quality labeled mappings that can be challenging to obtain.

Transformer-based models [20], thanks to their ability to learn textual contexts, have obtained SOTA results for several tasks in natural language processing, such as machine translation [21, 22, 23] and question answering [24], among others. Similarly, in the field of OM, recent developments have also shown the potential of using transformer-based frameworks [1, 3, 25]. Neutel and de Boer [1] employed contextual BERT embeddings to match two domain ontologies associated with occupations. Each sentence is embedded using BERT, and a similarity measure is applied to get the scores for OM. More recently, [3] proposed the BERTMap model, which is obtained by fine-tuning the already pre-trained BERT model for a binary classification task. The BERTMap model often outperformed non-contextual approaches such as LogMap, AML, and LogMap-ML. AMD [25] is another recent context-based matching approach that uses a BERT-based model to generate mappings and then filters these mappings using graph embedding techniques. Other related ontology matching systems that participated in OAEI 2022 [4] are LogMap-Lite, BERTMap-Lite, Matcha, and ATMatcher.

3. Methodology

3.1. Problem statement

Ontology Matching (OM) or Ontology Alignment (OA) is the process of finding correspondences between the entities/classes of two ontologies [5]. In this work, a new perspective is presented by treating OM as a translation task for equivalence matching, which can be expressed mathematically as $f(c_1, t)$, where the function $f$ gives the matching target ontology class $c_2 \in C_2$, given a source class $c_1 \in C_1$ and an alignment task identifier $t$. Here $O_1$ and $O_2$ are the source and target ontologies, with $C_1$ and $C_2$ being their respective named class sets. Since we are training a multi-task model, a unique identifier is used for each task.

The present work focuses on equivalence matching, where classes having the same semantic meaning in different ontologies are matched with each other. As shown in Figure 1, each ontology is presented in the form of a hierarchical graph structure with parent-child relations, where each class is a node in the given ontology graph, and the target class $c_2 \in C_2$ is obtained as a path in the target ontology graph for a given input node $c_1 \in C_1$ in the input ontology.¹

3.2. Ontologies

In this work, as a part of the OAEI 2022 Bio-ML track [4], we focus on three UMLS equivalence matching tasks, SNOMED to FMA (Body), SNOMED to NCIT (Neoplas), and SNOMED to NCIT (Pharm), in an unsupervised setting from [4], where the matching pairs between these ontologies are only divided into validation (10%) and testing (90%) sets, without any training data. Pharm, Neoplas, and Body are associated with the semantic types of “Pharmacologic Substance”, “Neoplastic Process”, and “Body Part, Organ, or Organ Components” in UMLS, respectively. Based on these semantic types, subset ontologies are provided in [4], and are given as SNOMED (Body), SNOMED (Neoplas), SNOMED (Pharm), FMA (Body), NCIT (Neoplas), and NCIT (Pharm), where the first three are the source ontologies and the last three are the target ontologies in our matching task. For each of the classes present in the given ontologies, a class ID is provided along with its associated label and possible synonyms (class descriptions). For example, in Figure 1, for SNOMED ID 78904004, the class label is “Chest Wall Structure,” and its synonyms are “Thoracic Wall” and “Chest Wall”.

¹ Note that each class is presented as a node in the ontology hierarchical graph structure; as such, class and node are used interchangeably, as appropriate.

4. Truveta Mapper (TM): Proposed approach for OM

4.1. Hierarchical-ID generation

An ontology is represented in the form of a graph where each node represents a class, and the parent and child relations of the ontology serve as connections between classes. Based on this graph structure of each full ontology, hierarchical-IDs are generated for all the classes. These are constructed by starting from the root node, separated by “-” at each hierarchy level, and traversing through each node in that level as shown in Figure 3. Following this method, a unique ID is generated for each path traversed. As such, for ontologies like SNOMED, where there are multiple paths between the root and any given class, there could be multiple IDs for that node. In such cases, the shortest ID is considered the hierarchical-ID of that node (highlighted in yellow in Figure 3), while the other path IDs are considered its synonymIDs.

Figure 3: Hierarchical-IDs generation for the Enzyme concept in the SNOMED ontology. The shortest ID (highlighted) is chosen as the hierarchical-ID, and the others are synonymIDs for this concept.

[Figure 2: pre-training tasks (ID: Child & Parent; ID: SYN & SYN; ID & SYN; ID & FSN; FSN & SYN) and fine-tuning tasks (Pharm, Neoplas, Body).]

Each node ID inherently captures the information of all its ancestors. This enables the model to trace from a broader class, starting from the root and getting more granular at each level, thus simplifying the translation task.
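The ID construction described in this subsection can be illustrated with a minimal sketch. It assumes the ontology is an acyclic parent-to-children mapping, that each level of an ID is the position of the node among its parent's children, and that the root is encoded as "0"; the exact numbering scheme and tie-breaking rule used by TM are not specified in the text, so these choices, along with the function name and toy graph, are hypothetical.

```python
from collections import defaultdict


def hierarchical_ids(children, root):
    """Encode every root-to-node path as a '-'-separated ID.

    Assumptions (not from the paper): the root is encoded as "0", each
    level is the child's index among its siblings, and the graph is acyclic.
    """
    path_ids = defaultdict(list)
    stack = [(root, "0")]
    while stack:
        node, path_id = stack.pop()
        path_ids[node].append(path_id)
        for index, child in enumerate(children.get(node, [])):
            stack.append((child, f"{path_id}-{index}"))

    result = {}
    for node, ids in path_ids.items():
        # Shortest path (fewest levels) first; ties broken lexicographically.
        ids.sort(key=lambda s: (len(s.split("-")), s))
        result[node] = {"hierarchical_id": ids[0], "synonym_ids": ids[1:]}
    return result


# Toy graph in which node "c" is reachable through two different paths.
toy_children = {"root": ["a", "b"], "a": ["c"], "b": ["d"], "d": ["c"]}
toy_ids = hierarchical_ids(toy_children, "root")
print(toy_ids["c"])
```

Because every path is enumerated, the number of IDs can grow quickly in ontologies with many multi-parent classes; a production version would likely need to bound or prune the enumeration.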

4.2. Pre-training

After generating the hierarchical-IDs, multi-task pre-training is done on the full ontologies (SNOMED, FMA, NCIT) using MLM by randomly masking the nodes, enabling the model to learn the hierarchy and semantics. For instance, “Structure of Forel’s H2 bundle” is represented as “1-1-0-0-0-0-4-1-1-0-0-0-7” and is masked as “1-1-0-0-0-0-[MASK]-1-0-0-0-7”. Furthermore, additional tasks are included in order for the model to learn the semantics of each class in the form of class-level synonyms, labels, and descriptions; class-level relations between child and parent nodes; and the relation between synonym-ID and hierarchical-ID, using separate identifiers for each task in the pre-training step (Figure 2). Task identifiers are added in the form of prefixes to distinguish between different ontologies. For example, the SNOMED ontology is prefixed as “F0:”, where “F” represents fully specified name (class label) and “0” indicates the SNOMED ontology. Similarly, FMA and NCIT are represented using the “1” and “2” identifiers. Some representative examples are presented in Table 1, where similar tasks are defined for each ontology, with the objective of learning their hierarchical structure and semantics using MLM.
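As a rough illustration of node-level masking on a hierarchical-ID, the sketch below masks a random subset of levels and returns the hidden values as targets. The function names, the `[MASK]` placeholder handling, and the rule of masking at least one level are assumptions made for illustration; the actual input/target format used for TM pre-training is not given in the text.

```python
import random

MASK = "[MASK]"


def mask_hierarchical_id(hier_id, mask_ratio, rng):
    """Replace a random subset of levels with MASK; return (masked_id, targets)."""
    levels = hier_id.split("-")
    # Assumption: always mask at least one level.
    n_mask = max(1, round(len(levels) * mask_ratio))
    positions = sorted(rng.sample(range(len(levels)), n_mask))
    targets = [levels[p] for p in positions]
    for p in positions:
        levels[p] = MASK
    return "-".join(levels), targets


def unmask(masked_id, targets):
    """Fill the masked levels back in, left to right."""
    remaining = iter(targets)
    return "-".join(
        next(remaining) if token == MASK else token
        for token in masked_id.split("-")
    )


rng = random.Random(0)  # fixed seed only to make the example repeatable
masked, targets = mask_hierarchical_id("1-1-0-0-0-0-4-1-1-0-0-0-7", 0.15, rng)
print(masked, targets)
```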

Based on the tasks stated in Table 1, we generate the pre-training dataset, which has 2,406,456 instances drawn from the SNOMED, NCIT, and FMA ontologies. The model is trained for 3 epochs, with the masking percentage increasing linearly over time, starting at 10% and reaching 35% in the final batch. The pre-training is done on 8 Nvidia V100 32GB GPUs with a batch size of 20, using a learning rate of 1e-3 with a linear decay scheduler and the AdamW optimizer.
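The linear masking schedule can be written down directly. The sketch below assumes the ratio is updated once per batch and interpolates between the stated endpoints (10% and 35%); the function name and the per-batch granularity are assumptions.

```python
def masking_ratio(step, total_steps, start=0.10, end=0.35):
    """Linearly interpolate the masking ratio from `start` (first batch)
    to `end` (final batch). `step` is the zero-based batch index."""
    if total_steps <= 1:
        return end
    clamped = min(max(step, 0), total_steps - 1)
    return start + (end - start) * clamped / (total_steps - 1)


# Ratio at the first, middle, and last of 101 hypothetical batches.
for step in (0, 50, 100):
    print(step, masking_ratio(step, total_steps=101))
```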

4.3. Fine-tuning

The fine-tuning step aims to train the model on the downstream OM tasks. Only the target subset ontologies, i.e., NCIT (Pharm), NCIT (Neoplas), and FMA (Body), are used for fine-tuning. The training data of each target sub-ontology is augmented using the exact matches present in the labels and synonyms of the other subset ontologies. We also take advantage of older ontology versions to add more synonyms to each target label. This expands the training corpus, enriches the data with minimal processing, and supports more comprehensive learning. After the data augmentation for all the target sub-ontologies, fine-tuning is performed only on these target sub-ontology corpora. Training data is generated for each class in the target ontologies, where the input is the class label, synonyms, and descriptions, and the output is the corresponding node hierarchical-ID (generated in Section 4.1), using a separate identifier for each task. Similar to our pre-training approach, multi-task fine-tuning is performed on the downstream OM tasks. Some examples are presented in Table 1.
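A minimal sketch of how such input/output pairs might be assembled is given below. The record layout, the `body:` task prefix, the hierarchical-ID value, and the case-insensitive de-duplication are all hypothetical; the text only states that labels, synonyms, and descriptions form the input and the hierarchical-ID forms the output, with a separate identifier per task.

```python
def build_finetuning_pairs(classes, task_prefix):
    """Turn each label/synonym of a target class into an (input, output) pair.

    The output is the class hierarchical-ID; the input is the text with a
    task prefix. Duplicate texts (ignoring case) are emitted only once.
    """
    pairs = []
    for cls in classes:
        seen = set()
        for text in [cls["label"], *cls.get("synonyms", [])]:
            cleaned = text.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                pairs.append((f"{task_prefix} {cleaned}", cls["hierarchical_id"]))
    return pairs


# Hypothetical target class; the ID is illustrative only.
toy_classes = [
    {
        "label": "Chest wall",
        "synonyms": ["Thoracic wall", "chest wall"],
        "hierarchical_id": "1-0-3",
    }
]
print(build_finetuning_pairs(toy_classes, "body:"))
```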

Based on the fine-tuning tasks described in Table 1, we generate the fine-tuning training data, which has 462,789 samples from the Pharm, Neoplas, and Body subsets. Using 8 Nvidia V100 32GB GPUs with a batch size of 20, the fine-tuning took around 21 epochs. For the fine-tuning, a learning rate of 1e-3 with a linear decay scheduler and a warm-up of 1.5 epochs is used, together with the AdamW optimizer with an eps of 1e-8 and a weight decay of 1e-2.

4.4. Zero-shot Predictions

TM is a multi-task model with the capability to translate between multiple ontologies from the input source class labels/synonyms to target hierarchical-IDs. Thus, given a source term, the model predicts the potential candidate in the target ontology graph. For confidence scoring, two approaches have been adopted here: (i) Greedy search score: Scores are generated based on greedy search with softmax probabilities using temperature scaling. This is a naive way to compute the confidence directly from the model prediction. (ii) Using embeddings: This is a sophisticated method proposed to make the TM predictions more robust and improve model precision, by leveraging semantic similarity using embeddings of source terms and predicted target candidates. Using the same model, the embeddings are generated for the target candidate and the similarity score is obtained between the source term and predicted target term embeddings. Scores are generated across the source and predicted class labels and synonyms, all of which are also augmented by singularization. The maximum generated score is considered as the similarity score. As such, the proposed model takes advantage of both graph search and semantic matching. Mathematically, similarity score is given as:

$$
S(c_1, c_2) =
\begin{cases}
1.0, & \text{if } \Omega(c_1) \cap \Omega(c_2) \neq \emptyset \\
\max\big(\cos(\Omega(c_1), \Omega(c_2))\big), & \text{otherwise}
\end{cases}
\qquad (1)
$$

where $c_2$ is the predicted class for $c_1$, $\Omega(c_1)$ and $\Omega(c_2)$ are the sets of labels and synonyms for $c_1$ and $c_2$, respectively, and $\max(\cos(\Omega(c_1), \Omega(c_2)))$ selects the maximum cosine similarity score across all the labels and synonyms of $c_1$ (source) and $c_2$ (predicted). If an exact match is available between the labels and synonyms of the source and target classes, we assign the maximum similarity score, since embedding similarity would give a similar result. The source and the target candidates are considered valid mapping pairs if their similarity score exceeds a selected threshold, for both approaches.
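The embedding-based score can be sketched as follows. This is an illustration under stated assumptions rather than the TM implementation: `embed` stands in for whatever produces a vector for a term (in the paper, the TM model itself), the exact-match test is done case-insensitively, and the singularization step mentioned above is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_score(source_terms, target_terms, embed):
    """Return 1.0 on any exact label/synonym overlap, else the maximum
    pairwise cosine similarity between source and target term embeddings."""
    source = {t.strip().lower() for t in source_terms}
    target = {t.strip().lower() for t in target_terms}
    if not source or not target:
        return 0.0
    if source & target:
        return 1.0
    return max(cosine(embed(s), embed(t)) for s in source for t in target)


# Toy embedding table used only for this example.
toy_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
score = similarity_score(["a"], ["c"], toy_vectors.__getitem__)
print(score >= 0.5)  # compare against a chosen threshold
```

A mapping would then be accepted when the score exceeds the selected threshold; the value 0.5 above is arbitrary.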

One of the main advantages of our proposed TM is that it reduces the time complexity to log-linear, as opposed to the naive search solution, which results in quadratic complexity.² Given an input term with a specified task identifier, TM is able to predict the best possible match from the target ontology with $O(\log(n))$ complexity, where $n$ corresponds to the number of nodes in the target ontology graph (the same as the number of classes). Overall, TM reduces the time complexity to $O(n \log(n))$, noting that a single search in a tree structure with $n$ nodes can be performed in $O(\log(n))$ time.

5. Results

5.1. Evaluation criteria

Commonly used metrics for evaluating OM systems [5], namely Precision (P), Recall (R), and F-score, are used as the global evaluation metrics. Mathematically,

$$
P = \frac{|M_{out} \cap M_{ref}|}{|M_{out}|}, \quad
R = \frac{|M_{out} \cap M_{ref}|}{|M_{ref}|}, \quad
F_{\beta} = (1 + \beta^2) \, \frac{P \cdot R}{\beta^2 \cdot P + R}
\qquad (2)
$$

where $M_{ref}$ are the reference mappings, consisting of matching pairs $m = (c, c')$ such that $c$ and $c'$ are two classes from the to-be-aligned ontologies, $M_{out}$ are the mappings computed by the OM systems, and $\beta = 1$.
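For concreteness, the global metrics (precision, recall, and F-score over sets of predicted and reference mapping pairs) can be computed as in the sketch below. The function name and the convention of returning 0.0 for empty inputs are assumptions; the reported results in this paper were produced with the DeepOnto library, not with this code.

```python
def precision_recall_f(predicted, reference, beta=1.0):
    """Precision, recall, and F-beta over sets of (source, target) pairs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    denominator = beta**2 * precision + recall
    f_score = (1 + beta**2) * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score


# Toy mappings: two of three predictions are correct, two of four references found.
predicted = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
print(precision_recall_f(predicted, reference))
```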

Local evaluation metrics, Hits@K and Mean Reciprocal Rank (MRR), introduced in [5], are also used for the current evaluation and can be represented as:

$$
\mathrm{Hits@}K = \frac{|\{m \in M_{ref} \mid \mathrm{Rank}(m) \leq K\}|}{|M_{ref}|}, \quad
\mathrm{MRR} = \frac{\sum_{m \in M_{ref}} \mathrm{Rank}(m)^{-1}}{|M_{ref}|}
\qquad (3)
$$

where $\mathrm{Rank}(m)$ returns the ranking position of $m$ among $M_m \cup \{m\}$ according to their scores, and $M_m$ represents a set of negative mapping pairs for each source term in $M_{ref}$, such that $(c, c''_i) \in M_m$ with $i \in \{1, 2, ..., 100\}$, where $c''_i$ are the 100 negative output candidates from the target ontologies for each of the source terms in $M_{ref}$. As such, Hits@K and MRR would differ for different selections of the 100 samples. We have published the results of our model based on the set provided in [5] for a fair comparison. To provide a more robust measure of local metrics, we also report overall accuracy, although this is not provided for any of the other models. Accuracy here can be mathematically presented as:

$$
\mathrm{Accuracy} = \frac{|\{m \in M_{ref} \mid f(c, t) = c'\}|}{|M_{ref}|}
\qquad (4)
$$

where $m = (c, c')$ represents the matching pairs in the $M_{ref}$ set, and $f(c, t)$ refers to the target candidate predicted by the model, given an input term $c$ and the appropriate task identifier $t$.

² Note that BERTMap reduces the time complexity from $O(n^2)$ in traditional approaches to $O(kn)$, where $k \ll n$, with an additional preprocessing step that considers only a small portion of the target subset ontology classes, namely those with at least one subword token in common with the source class candidate. This adds a dependency on the tokenizer and could be error prone, since some semantically matching cases with lexical variations could get filtered out in this process. By contrast, such a limitation does not exist in TM, since it performs matching from source to target without reducing the target corpus size.

Baselines. Results are compared with the SOTA approaches: Edit-Similarity, LogMap, AML, BERTMap [5], and the recently published results in [4]. To be consistent, evaluation for P, R, F-score, Hit@1, and MRR is done using the DeepOnto library [6].
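The local ranking metrics used in this evaluation (Hits@K and MRR over a reference mapping ranked against its negative candidates) can be sketched as below. The function names are hypothetical, and the tie-handling rule, where only strictly higher-scoring negatives push the reference down, is an assumption rather than the behavior of the DeepOnto evaluation code.

```python
def rank_of_reference(reference_score, negative_scores):
    """1-based rank of the reference mapping among itself and its negatives.

    Assumption: ties do not penalize the reference mapping.
    """
    return 1 + sum(1 for score in negative_scores if score > reference_score)


def hits_at_k(ranks, k):
    """Fraction of reference mappings ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all reference mappings."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Toy ranks for three reference mappings.
toy_ranks = [1, 2, 4]
print(hits_at_k(toy_ranks, 1), mean_reciprocal_rank(toy_ranks))
```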

5.2. Prediction Results

Prediction results are shown in Tables 2–4 for the three equivalence OM tasks, from SNOMED to FMA (Body), SNOMED to NCIT (Pharm), and SNOMED to NCIT (Neoplas). The results report precision, recall, F-score, Hit@1, MRR, and accuracy for TM and the baseline approaches presented in He et al. [5] and OAEI [4] on the test data for the unsupervised setting. In the given tables, superscripts 1 and 2 denote our proposed TM, where the former is based on the embedding similarity score and the latter on the greedy search score; superscript ∗ results are taken from He et al. [5], and we used the same evaluation metrics for TM; and superscript ∗∗ corresponds to the published OAEI results [4]. The highest numbers for each of these metrics are highlighted in the tables to emphasize which model outperforms the others in each category.

The overall results illustrate that TM is outperforming all the baselines for all three OM tasks in F-score, Hit@1, and MRR. A high threshold is selected to generate the most confident cross-ontology matching pairs. Note that a single unified model is trained and leveraged here to predict all the results in the form of a source class to target hierarchical-IDs, using appropriate task identifiers.

There are two TM results presented in the given tables, each based on a different scoring scheme. TM² is based on greedy search scores, and TM¹ is based on a new and more robust prediction scheme using embeddings, described in Subsection 4.4, which takes advantage of both graph search and semantic similarity. It can be seen that both of our methods surpass SOTA for all the tasks, but TM¹ is more robust and shows significant improvements compared to any of the existing methods. To be precise, in F-score TM¹ achieves a 2.3% improvement over the second-best result (AML) in Body, an 11.0% improvement for Pharm (compared to AMD), and a 4.3% improvement for Neoplas (compared to BERTMap-Lite and Edit-Similarity). It should be noted that even without TM, none of these methods is SOTA in all the tasks.

For generating the local metrics Hit@1 and MRR, TM is used to generate the embedding similarity score between the input terms in the test set and their corresponding candidates, i.e., the negative candidates together with the reference mapping ($M_m \cup \{m\}$). We also outperform all existing SOTA methods on MRR and Hit@1. Additionally, we report the accuracy metric, which is consistent and more representative of the model performance. For this metric, the TM predictions are obtained across the entire target ontology without using any smaller subset of negative samples from the test set, while reducing the time complexity from quadratic to log-linear.

6. Conclusions and Discussions

This work presents a new approach to OM by treating the OM process as a translation task and performing multi-task pre-training, fine-tuning, and predictions in a zero-shot, unified, and end-to-end manner. The proposed approach takes advantage of transfer learning across different ontologies and does not require manual annotations for training. Additionally, the trained model understands the semantics of the text as well as the structure of the ontologies. We show that our proposed method outperforms Edit-Similarity, LogMap, AML, BERTMap, and the recently proposed OM frameworks in the OM22 conference [4] in all the tasks.

Our approach provides several advantages: (1) it reduces the time complexity to log-linear during inference; (2) it is based on zero-shot prediction, requires little post-processing, and does not employ mapping extension or mapping repair, in contrast to the other methods; (3) it does not require any manually labeled cross-ontology matching pairs, due to zero-shot learning; (4) one unified framework is used as a result of multi-tasking, which makes it easier to productionize these large transformer-based models; (5) it is robust to different tokenization schemes, as it uses byte-level tokenization; and (6) it learns complete ontology graphs using the hierarchical-IDs, which provides a more natural path for translation and would be significantly helpful for subsumption mappings.

In the future, we will pre-train the starting checkpoint with a more domain-related corpus (e.g., PubMed, MIMIC-III, clinical notes) instead of the mC4 dataset. Another interesting track can be ensemble learning of existing SOTA models with TM.

References

[1] Neutel, M. H. de Boer, Towards Automatic Ontology Alignment using BERT, in: AAAI Spring Symposium: Combining Machine Learning with Knowledge Engineering, 2021.

[2] Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).

[3] He, J. Chen, D. Antonyrajah, I. Horrocks, BERTMap: a BERT-based ontology alignment system, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, pp. 5684–5691.

[4] OAEI, https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/2022/ Accessed: 2022-08 (2022).

[5] He, J. Chen, Dong, E. Jiménez-Ruiz, A. Hadian, I. Horrocks, Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching, arXiv preprint arXiv:2205.03447 (2022).

[6] DeepOnto, https://github.com/KRR-Oxford/DeepOnto Accessed: 2022-10-14 (2022).

[7] Sharma, Patel, S. Jain, LSMatch and LSMatch-Multilingual Results for OAEI (2022).

[8] E. Jiménez-Ruiz, B. Cuenca Grau, LogMap: Logic-based and scalable ontology matching, in: International Semantic Web Conference, Springer, 2011, pp. 273–288.

[9] Faria, Pesquita, E. Santos, Palmonari, I. F. Cruz, F. M. Couto, The AgreementMakerLight ontology matching system, in: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems", Springer, 2013, pp. 527–541.

[10] L. L. Wang, C. Bhagavatula, M. Neumann, K. Lo, C. Wilhelm, W. Ammar, Ontology alignment in the biomedical domain using entity definitions and context, arXiv preprint arXiv:1806.07976 (2018).

[11] P. Kolyvakis, A. Kalousis, D. Kiritsis, DeepAlignment: Unsupervised ontology matching with refined word vectors, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 787–798.

[12] V. Iyer, A. Agarwal, H. Kumar, VeeAlign: a supervised deep learning approach to ontology alignment, in: OM@ISWC, 2020, pp. 216–224.

[13] Y. Zhang, X. Wang, S. Lai, S. He, K. Liu, J. Zhao, X. Lv, Ontology matching with word embeddings, in: Chinese computational linguistics and natural language processing based on naturally annotated big data, Springer, 2014, pp. 34–45.

[14] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781 (2013).

[15] M. Tounsi Dhouib, C. Faron Zucker, A. G. Tettamanzi, An ontology alignment approach combining word embedding and the radius measure, in: International Conference on Semantic Systems, Springer, Cham, 2019, pp. 191–197.

[16] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017) 135–146.

[17] J. Chen, E. Jiménez-Ruiz, I. Horrocks, D. Antonyrajah, A. Hadian, J. Lee, Augmenting ontology alignment by semantic embedding and distant supervision, in: European Semantic Web Conference, Springer, 2021, pp. 392–408.

[18] J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, I. Horrocks, OWL2Vec*: Embedding of OWL ontologies, Machine Learning 110 (2021) 1813–1845.

[19] P. Kolyvakis, Approaching Ontology Alignment through Representation Learning to Bridge the Semantic Gap in Engineering Applications, Technical Report, EPFL, 2020.

[20] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).

[21] M. Johnson, M. Schuster, Q. V. Le, et al., Google’s multilingual neural machine translation system: Enabling zero-shot translation, Transactions of the Association for Computational Linguistics 5 (2017) 339–351.

[22] W. Xu, M. Carpuat, Editor: an edit-based transformer with repositioning for neural machine translation with soft lexical constraints, Transactions of the Association for Computational Linguistics 9 (2021) 311–328.

[23] H.-I. Liu, W.-L. Chen, Re-transformer: a self-attention based model for machine translation, Procedia Computer Science 189 (2021) 3–10.

[24] C. Clark, K. Lee, M.-W. Chang, T. Kwiatkowski, M. Collins, K. Toutanova, BoolQ: Exploring the surprising difficulty of natural yes/no questions, arXiv preprint arXiv:1905.10044 (2019).

[25] Z. Wang, AMD Results for OAEI 2022 (2022).

[26] L. Xue, A. Barua, N. Constant, R. Al-Rfou, S. Narang, M. Kale, A. Roberts, C. Raffel, ByT5: Towards a token-free future with pre-trained byte-to-byte models, Transactions of the Association for Computational Linguistics 10 (2022) 291–306.