<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>ADVIS Lab</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dept of Computer Science</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Illinois at Chicago</institution>
          ,
          <addr-line>Chicago IL 60607</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>24</lpage>
      <abstract>
        <p>AgreementMakerDeep (AMD) is a new flexible and extensible ontology matching system. It exploits the contextual and structural information of ontologies by infusing knowledge into a pre-trained masked language model, and then filters the output mappings using knowledge graph embedding techniques. AMD learns from classes and the relations between classes by constructing vector representations in a low-dimensional embedding space with knowledge graph embedding methods. The results demonstrate that AMD achieves competitive performance in many OAEI tracks, but it has limitations for property and instance matching.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology matching</kwd>
        <kwd>Knowledge graph embedding</kwd>
        <kwd>Pre-trained language model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>https://ellenzhuwang.github.io (Z. Wang)</p>
    </sec>
    <sec id="sec-2">
      <title>2. Specific Techniques Used</title>
      <p>The architecture of AMD is shown in Fig. 1, comprising ontology parsing, textual matching with
large pre-trained language models, knowledge graph embedding, model learning, and candidate
selection.</p>
      <p>
        Ontology parsing. owlready2 [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is used to extract the metadata of classes from
the source and target ontologies, such as super-/sub-classes, labels, annotations, partOf, and
disjointWith. BeautifulSoup [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is used to extract synonyms.
      </p>
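      <p>As an illustration, the per-class metadata gathered in this step could be organized as a simple record; the field names below are our own invention for the sketch, not AMD's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class ClassRecord:
    """Metadata extracted for one ontology class (illustrative structure)."""
    iri: str
    labels: list = field(default_factory=list)        # rdfs:label values
    annotations: list = field(default_factory=list)   # comments, definitions
    superclasses: list = field(default_factory=list)  # IRIs of super-classes
    subclasses: list = field(default_factory=list)    # IRIs of sub-classes
    synonyms: list = field(default_factory=list)      # e.g. scraped synonyms

rec = ClassRecord(iri="http://example.org/onto#Heart", labels=["heart"])
rec.superclasses.append("http://example.org/onto#Organ")
```

Grouping all lexical and structural facts per class like this lets the textual matcher and the graph embedder consume the same parse.</p>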
      <p>
        Textual matching. We apply several text pre-processing techniques, such as stop-word removal
and tokenization, to class labels and annotations. AMD uses sentence-BERT [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] to compute the
cosine similarity between two concepts' labels and annotations in unsupervised tasks. We use the
textual matching results as our mapping candidates.
      </p>
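      <p>A minimal sketch of the pre-processing step, assuming a toy stop-word list (a real system would use a fuller list, e.g. NLTK's):

```python
import re

STOP_WORDS = {"of", "the", "a", "an", "and", "or"}  # tiny illustrative list

def preprocess(label: str) -> list:
    """Lower-case, split camelCase and underscores, tokenize, drop stop words."""
    label = re.sub(r"([A-Z])", r" \1", label)  # camelCase -> camel Case
    tokens = re.findall(r"[a-z0-9]+", label.lower().replace("_", " "))
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("partOf_theHeart"))  # -> ['part', 'heart']
```

The cleaned token lists are what the sentence encoder then embeds.</p>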
      <p>
        Knowledge graph embedding. We characterize the structural information of ontologies by
relations translated from one class to another using a modified TransR [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] model in
relational embedding spaces.
      </p>
      <p>Problem formulation. Given two ontologies O and O′, we construct knowledge graphs X and Y, and define a
correspondence between two concepts as a triplet t(c, c′) = ⟨c, r, c′⟩, where r is the relation
between c and c′. The problem is to find the mapping set M = {(c, c′) ∈ O × O′ | c ≡ c′}. In this study,
we focus on one-to-one alignment, and the relation between concepts is equality.</p>
      <p>Let v⃗(c) = {e_1, e_2, ..., e_m} and v⃗(c′) = {e′_1, e′_2, ..., e′_n} be two d-dimensional vector sets of sizes m and
n. We compute their distance with simple cosine similarity, d(v⃗(c), v⃗(c′)) = 1 − sim(v⃗(c), v⃗(c′)),
where

sim(v⃗(c), v⃗(c′)) = ∑_{i=1..m} max_j cos(e_i, e′_j)   (1)

We define the probability of aligned labels between concepts c and c′ by P(c′ | c) as follows:

P(c′ | c) = σ(sim(v⃗(c), v⃗(c′)))   (2)

where σ is the sigmoid function.</p>
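      <p>The greedy max-similarity score and its sigmoid can be sketched in plain Python, assuming each concept is represented by a set of label/annotation vectors (toy 2-d vectors below):

```python
import math

def cos(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sim(vs, vs2):
    """Each vector of c is greedily matched to its most similar vector of c'."""
    return sum(max(cos(u, v) for v in vs2) for u in vs)

def p_aligned(vs, vs2):
    """Sigmoid of the aggregate similarity."""
    return 1.0 / (1.0 + math.exp(-sim(vs, vs2)))

# Distance of two identical singleton sets is 0.0
d = 1.0 - sim([[1.0, 0.0]], [[1.0, 0.0]])
```

Note the sum is unnormalized, so sim grows with the number of vectors; a real implementation might average instead.</p>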
      <sec id="sec-2-1">
        <title>2.1. Masked Language Modeling</title>
        <p>
          In the bio-ML track, we train on an ontology corpus for semi-supervised tasks. We use the standard
transformer architecture [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] following RoBERTa [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The text encoder takes a sequence of tokens
from triples {h, r, t} as input, and computes L layers to obtain contextualized
representations H ∈ ℝ^{N×d}, where N is the number of tokens in our vocabulary and d is the
dimension.
        </p>
        <p>Concept prediction. Concepts are the dominant elements in the ontology matching problem;
therefore, predicting concepts forces the model to learn their semantic information. At the
same time, we infuse structural knowledge through triples to enable the model to learn
contextualized representations for each ontology. Here, concept prediction means predicting the
head or tail concept, and the difference between the two lies in the position embedding of the masked tokens.</p>
        <p>For all the concepts c ∈ C, we randomly select 30% of them to predict. For each selected
concept, the head or tail token is replaced with the special token [MASK] with probability 80%,
replaced with a random token with probability 10%, and left unchanged otherwise. The loss of
concept prediction is defined as:

ℒ_cp = − ∑_{x ∈ m(C)} log P(x | X ∖ m(C))   (3)

where m(C) denotes the masked concept tokens and X the input sequence.</p>
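        <p>The 30% selection with 80–10–10 corruption can be sketched as follows; this is a simplified token-level version, not the actual training code:

```python
import random

MASK = "[MASK]"

def mask_positions(tokens, positions, rng):
    """Apply BERT-style corruption to the selected positions:
    80% -> [MASK], 10% -> random vocabulary token, 10% -> unchanged."""
    vocab = sorted(set(tokens))
    out = list(tokens)
    for i in positions:
        r = rng.random()
        if r >= 0.2:            # 80% of draws: replace with [MASK]
            out[i] = MASK
        elif r >= 0.1:          # 10% of draws: replace with a random token
            out[i] = rng.choice(vocab)
        # remaining 10% of draws: keep the original token
    return out

rng = random.Random(0)
tokens = ["heart", "subclassOf", "organ"]
# select ~30% of positions to predict
picked = rng.sample(range(len(tokens)), k=max(1, int(0.3 * len(tokens))))
corrupted = mask_positions(tokens, picked, rng)
```

Keeping some selected tokens unchanged prevents the encoder from only ever seeing [MASK] at prediction positions.</p>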
        <p>Relation prediction. Relations express the way head and tail concepts are connected, and
also provide rich hierarchical structural information. We concatenate all tokens
in r for prediction, because relation labels usually consist of few words or tokens and are
meaningless as separate tokens. For training, relation prediction follows the same random masking
process as concept prediction. Thus, the loss of relation prediction is written as:

ℒ_rp = − ∑_{x ∈ m(R)} log P(x | X ∖ m(R))   (4)

where m(R) denotes the masked relation tokens.</p>
        <p>Therefore, the masked language modeling loss function can now be written as

ℒ_mlm = ℒ_cp + ℒ_rp   (5)

where we take a linear combination of both loss terms.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Knowledge graph embedding</title>
        <p>
          In AMD, we apply a modified TransR method, which translates concepts and relations into
a concept space and relation-specific concept spaces, since there are multiple relations in the
ontologies, e.g., subclassOf and disjointWith. In the original TransR, the projected vectors are
defined as c_r = cM_r and c′_r = c′M_r, and the score function as f_r(c, c′) = ‖c_r + r − c′_r‖₂² [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. Inspired by
Sun et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], the absolute scores of positive triples should be lower than those of negative ones, so we
modify the loss function by using two γ hyper-parameters as follows:

ℒ = ∑_{τ ∈ T} ∑_{τ′ ∈ T′} max(0, (f(τ) − γ₁) − μ(f(τ′) + γ₂))   (6)

where T denotes the set of triples ⟨h, r, t⟩, T′ represents negative triples, γ₁, γ₂, μ &gt; 0, and γ₂ &gt; γ₁. Negative
triples are generated by a negative sampling method following AMD [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and Multi-OM [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
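        <p>A toy numeric sketch of the TransR-style score and a two-margin hinge over positive/negative scores; the matrix shapes and hyper-parameter values are arbitrary, and the loss follows one reading of Eq. (6) rather than the actual AMD code:

```python
def project(c, M):
    """Project concept vector c into the relation space: c M_r."""
    return [sum(c[i] * M[i][j] for i in range(len(c))) for j in range(len(M[0]))]

def score(c, r, c2, M):
    """f_r(c, c') = || c M_r + r - c' M_r ||_2^2  (TransR-style score)."""
    pc, pc2 = project(c, M), project(c2, M)
    return sum((pc[j] + r[j] - pc2[j]) ** 2 for j in range(len(r)))

def limit_loss(pos_scores, neg_scores, g1=0.1, g2=0.5, mu=0.2):
    """Two-margin hinge: pushes positive-triple scores down and
    discounts already well-separated negatives (illustrative only)."""
    return sum(max(0.0, (fp - g1) - mu * (fn + g2))
               for fp in pos_scores for fn in neg_scores)
```

With an identity projection, a well-translated triple scores 0 and contributes nothing to the loss.</p>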
        <p>
          During the process of computing vectors, we need to generate negative triples. Following the
work of Sun et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Li et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], we refine uniform negative sampling by choosing from
the k-nearest neighbors in the embedding space, and by constraining candidate selection to
exclude concepts related by subclassOf or disjointWith. In this way, we avoid vector
sparsity and obtain better-quality vector representations for the concepts.
        </p>
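        <p>A sketch of the refined negative sampling, assuming Euclidean distance and a dictionary of candidate embeddings (all names illustrative):

```python
import math

def knn_negatives(anchor_vec, candidates, related_ids, k=3):
    """Return the k nearest candidates in embedding space, skipping
    concepts related to the anchor by subclassOf/disjointWith."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    pool = sorted((dist(anchor_vec, vec), cid)
                  for cid, vec in candidates.items()
                  if cid not in related_ids)
    return [cid for _, cid in pool[:k]]

cands = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [5.0, 5.0], "d": [0.1, 0.0]}
negs = knn_negatives([0.0, 0.0], cands, related_ids={"a"}, k=2)  # -> ['d', 'b']
```

Sampling hard (nearby) negatives while excluding true relatives gives more informative contrasts than uniform sampling.</p>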
        <p>Candidate selection. We select candidates based on a threshold on the similarity of the classes'
knowledge graph embedding vectors, and then compare the similarity with the baseline if the pairs
are in the baseline result sets.</p>
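        <p>One way to read this selection step, with a hypothetical dictionary of pair scores (the 0.9 threshold matches Section 2.3):

```python
def select_candidates(kge_sims, baseline, threshold=0.9):
    """Keep pairs above the KGE-similarity threshold; when a pair also
    appears in the baseline, keep the higher of the two scores."""
    selected = {}
    for pair, s in kge_sims.items():
        if s >= threshold:
            selected[pair] = max(s, baseline.get(pair, s))
    return selected

sims = {("Heart", "heart"): 0.95, ("Organ", "limb"): 0.40}
chosen = select_candidates(sims, baseline={("Heart", "heart"): 0.97})
```

The comparison against the baseline is our interpretation; AMD's tie-breaking rule may differ.</p>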
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Parameter settings</title>
        <p>Ontology pre-processing. Ontologies are usually in OWL or RDF format, but the inputs
of masked language models and knowledge graph embedding models require word
or token embeddings. First, we extract metadata from the ontologies using owlready2¹,
such as the ID, labels, resources, and descriptions of classes (also called concepts). The
natural-language information of relations is extracted from restrictions, properties, subClass, or superClass. Since
the ontologies in the tasks were developed by different organizations, we handle ontology
parsing from different tags, e.g., rdf:ID="isPartOf" and rdf:resource="UNDEFINED_part_of".</p>
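        <p>Normalizing relation names drawn from such differing tags might look like the sketch below; the prefix list is our assumption, not AMD's actual rule set:

```python
import re

def normalize_relation(raw: str) -> str:
    """Map variant spellings such as "UNDEFINED_part_of" and "isPartOf"
    to one canonical camelCase name ("partOf")."""
    s = re.sub(r"^(UNDEFINED_|is_?)", "", raw, flags=re.IGNORECASE)
    s = re.sub(r"([A-Z])", r"_\1", s)          # break camelCase on capitals
    parts = [p for p in s.split("_") if p]
    return parts[0].lower() + "".join(p.capitalize() for p in parts[1:])

print(normalize_relation("UNDEFINED_part_of"))  # -> partOf
print(normalize_relation("isPartOf"))           # -> partOf
```

Canonical relation names let triples from differently authored ontologies share one relation vocabulary.</p>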
        <p>
          MLM pre-training settings. In practice, we use the RoBERTa implementation from
Huggingface² as the base pre-trained model in all our experiments. LaKERMap is initialized with the
roberta.base parameters; the base model has 12 layers and 768-dimensional hidden
states (L = 12, d = 768). For the MLM training task, we use the words or tokens in knowledge
triples as our corpora for fine-tuning. We select the first 5 mapping pairs from the lexical matching
method for few-shot learning. Hyper-parameters are the same as in [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>
          KGE training settings. We use the outputs of the MLM as word or token embeddings to
initialize the knowledge graph embedding training process, and the dimension d is set to
768. The remaining KGE hyper-parameters are set up following AMD [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>The threshold for textual matching is 0.925, and the threshold for candidate selection is 0.9.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Datasets</title>
        <p>We use the datasets provided by OAEI. AMD can be executed by the organizers in four schema
matching tracks: Conference, Anatomy, bio-ML, and the Common Knowledge Graph track.
Moreover, AMD supports most tracks in our local environment setup, except for the Interactive
Matching track.
¹ https://github.com/pwin/owlready2
² https://huggingface.co/roberta-base</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Adaptations made for the evaluation</title>
        <p>Our framework uses Python with PyTorch³ and RDFLib⁴, and is packaged for SEALS using MELT.
We use the best parameter set from our local alignments for the OAEI submission; see Section 2.3.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Anatomy</title>
        <p>The Anatomy track results of AMD are shown in Table 1. This year, AMD returns 1299
correspondences in 160 seconds. The result shows that AMD is competitive among the top
matching systems. The mapping candidate generation runtime is still 3 seconds,
the same as last year. However, to improve overall performance in terms of recall and
F1-score, we conduct a more time-consuming filtering process.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Conference</title>
        <p>The Conference track results of AMD are shown in Table 1. As expected, the performance
of AMD in the Conference track is not good, with an F-measure only slightly higher than that of
the baseline method (StringEquiv). AMD shows a lack of ability to extract and match
properties in the M2 and M3 evaluation variants. However, AMD has higher values in terms of
Precision in most tasks. We have a 0.01 improvement in F1-score this year.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. bio-ML</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. General comments</title>
      <sec id="sec-4-1">
        <title>4.1. Comments on the result</title>
        <p>Overall, the results show that AMD is able to complete several tasks in different domains for
class-level matching in a timely manner. This year, we have improvements in the Anatomy and
bio-ML tracks in terms of evaluation metrics. In contrast to last year, we solved memory issues
for large-scale ontologies. Moreover, we enable a semi-supervised capability in AMD,
and it is beneficial to train on intra-ontology and inter-ontology triples.
³ https://pytorch.org
⁴ https://github.com/RDFLib</p>
        <p>However, AMD is still under development: at the current stage it returns only class
correspondences and is not able to match properties and instances for some tracks.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Improvements</title>
        <p>The current development of AMD touches on several aspects. Besides considering property
and instance matching, we will utilize joint embedding to combine contextualized knowledge
graph embeddings such as CoKE with additional knowledge resources such as WebIsA [21] as a
lexicon database. Moreover, we will adapt AMD with more data-type parsers and
parameter selections for different tracks.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this paper, we have introduced an ontology matching system called AMD. This year, we
use a BERT-like pre-trained language model to obtain contextualized representations.
To improve the overall performance, we adapted a modified TransR model to fit the ontology
matching problem: thus, we learn low-dimensional representations for each class and relation
to capture the hidden semantics of ontologies, rather than measuring the similarities between
classes directly, as in other traditional systems. AMD makes full use of the textual and structural
knowledge of ontologies. The results demonstrate the high efficiency and the promising
performance of our proposed matching method as compared to other systems' results in several
tracks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Palandri Antonelli</surname>
          </string-name>
          , C. Stroe,
          <source>AgreementMaker: Efficient Matching for Large Real-World Schemas and Ontologies, PVLDB</source>
          <volume>2</volume>
          (
          <year>2009</year>
          )
          <fpage>1586</fpage>
          -
          <lpage>1589</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Palandri Antonelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stroe</surname>
          </string-name>
          ,
          <article-title>Efficient Selection of Mappings and Automatic Quality-driven Combination of Matching Methods</article-title>
          , volume
          <volume>551</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Faria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          , E. Santos,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Couto</surname>
          </string-name>
          ,
          <article-title>The AgreementMakerLight Ontology Matching System</article-title>
          , in: International Conference on Ontologies,
          <source>DataBases, and Applications of Semantics (ODBASE)</source>
          , Springer,
          <year>2013</year>
          , pp.
          <fpage>527</fpage>
          -
          <lpage>541</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Y. Qu,
          <article-title>Bootstrapping Entity Alignment with Knowledge Graph Embedding</article-title>
          , in: IJCAI, volume
          <volume>18</volume>
          ,
          <year>2018</year>
          , pp.
          <fpage>4396</fpage>
          -
          <lpage>4402</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          , Ontology Matching, Springer-Verlag, Heidelberg (DE),
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qi</surname>
          </string-name>
          ,
          <article-title>Multi-view embedding for biomedical ontology matching</article-title>
          ,
          <source>OM@ ISWC</source>
          <volume>2536</volume>
          (
          <year>2019</year>
          )
          <fpage>13</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Laadhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ghozzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Megdiche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ravat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Teste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gargouri</surname>
          </string-name>
          ,
          <article-title>Partitioning and Local Matching Learning of Large Biomedical Ontologies</article-title>
          ,
          <source>in: ACM SIGAPP Symposium on Applied Computing</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2285</fpage>
          -
          <lpage>2292</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Melnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          ,
          <article-title>Similarity Flooding: A Versatile Graph Matching Algorithm</article-title>
          and Its Application to Schema Matching,
          <year>2002</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          , W. Sunna,
          <article-title>Structural Alignment Methods with Applications to Geospatial Ontologies, Transactions in GIS, Special Issue on Semantic Similarity Measurement</article-title>
          and
          <source>Geospatial Applications</source>
          <volume>12</volume>
          (
          <year>2008</year>
          )
          <fpage>683</fpage>
          -
          <lpage>711</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kolyvakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalousis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kiritsis</surname>
          </string-name>
          , Deepalignment:
          <article-title>Unsupervised ontology matching with refined word vectors</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long Papers),
          <year>2018</year>
          , pp.
          <fpage>787</fpage>
          -
          <lpage>798</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Di</surname>
          </string-name>
          <string-name>
            <surname>Noia</surname>
          </string-name>
          , R. De Leone, H. Paulheim,
          <article-title>RDF2Vec: RDF Graph Embeddings Their Applications</article-title>
          ,
          <source>Semantic Web</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>721</fpage>
          -
          <lpage>752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheatham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          ,
          <article-title>String similarity metrics for ontology alignment</article-title>
          , in: International semantic web conference, Springer,
          <year>2013</year>
          , pp.
          <fpage>294</fpage>
          -
          <lpage>309</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Universal Representation Learning of Knowledge Bases by Jointly Embedding Instances and Ontological Concepts</article-title>
          ,
          <source>in: ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1709</fpage>
          -
          <lpage>1719</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Cruz</surname>
          </string-name>
          , Agreementmakerdeep results for oaei
          <year>2021</year>
          ., in: ISWC International Workshop on Ontology Matching (OM),
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>124</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>J.-B.</given-names>
            <surname>Lamy</surname>
          </string-name>
          , Owlready:
          <article-title>Ontology-oriented programming in python with automatic classification and high level constructs for biomedical ontologies</article-title>
          ,
          <source>Artificial intelligence in medicine 80</source>
          (
          <year>2017</year>
          )
          <fpage>11</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Richardson</surname>
          </string-name>
          , Beautiful soup documentation,
          <source>April</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          . URL: http://arxiv.org/abs/1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>
          ,
          <source>in: AAAI Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Ł. Kaiser,
          <string-name>
            <surname>I. Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>in: Advances in neural information processing systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>Roberta: A robustly optimized bert pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Seitner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eckert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Faralli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Meusel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <article-title>A large database of hypernymy relations extracted from the web</article-title>
          ,
          <source>in: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>360</fpage>
          -
          <lpage>367</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>