A-LIOn - Alignment Learning through Inconsistency negatives of the aligned Ontologies

Sarah M. Alghamdi (sarah.alghamdi.1@kaust.edu.sa) [0, 1, 2]

Fernando Zhapa-Camacho (fernando.zhapacamacho@kaust.edu.sa) [0, 2]

Robert Hoehndorf (robert.hoehndorf@kaust.edu.sa) [0, 2]

Affiliations:
0 Computational Bioscience Research Center, Computer, Electrical & Mathematical Sciences and Engineering Division
1 King Abdul-Aziz University, Faculty of Computing and Information Technology, Rabigh, 25732, Kingdom of Saudi Arabia
2 King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia

Ontologies play an important role in sharing and reusing knowledge. Several ontologies have been developed to describe a particular domain, but from the different perspectives of communities of developers and users. This has led to the existence of multiple ontologies covering the same or different domains with varying degrees of variability. Ontology alignment is typically used to address this problem by identifying correspondences between semantically related elements of two or more ontologies.

Keywords: Ontologies, Ontology alignments, Ontology matching, Inconsistency negatives

2. Proposed Methods

An alignment between a source ontology $O_s$ and a target ontology $O_t$ is denoted $A$, where $A$ is a set of concept pairs $(c_s, c_t) \in C_s \times C_t$ that are considered as being equivalent or standing in a subclass relation within certain contexts.

A graph $G$ is defined as a tuple $G = (E, R, T)$, where $E$ is a set of entity names, $R$ is a set of relation names and $T \subseteq E \times R \times E$ is a set of triples of the form $(h, r, t)$.

A projection of an ontology $O$ into a graph $G$ is a mapping $p \colon O \to G$ that maps the ontology classes into graph nodes, ontology roles into graph relations, and ontology axioms into graph triples following a particular set of rules.

Our method A-LIOn combines different matching techniques and consists of four main components (see Figure 1):
• Learning lexical matching seeds.
• Graph construction from source and target ontology.
• Graph embedding and transformation learning.
• Consistency checking.

These components cover element-wise, structure-wise, and formal semantics learning techniques. (a) Element-wise techniques consider the entities in the ontology in isolation in order to find alignments, disregarding the fact that they are part of the ontology's structure. This means that we use only the information belonging to an ontology class itself, such as its textual annotations and labels. (b) Structure-wise techniques, on the other hand, analyze the entities as part of their structure. In our case, we focus on the adjacency structure within the ontology and extract the structure in the form of a graph. (c) Finally, the semantic component employs formal semantics learning techniques and logical inference to identify correspondences and repair inconsistencies.

2.1. Learning Lexical matching seeds

To begin learning an ontology alignment, we need some known-to-be-positive seed alignments. We chose to align the classes of both ontologies that have the same IRI, or lexically matched labels and relative IRIs. For lexical matching, we utilize fuzzy lexical matching, a method for approximate string matching that returns a score representing the similarity of one string to another. We begin with an exact matching score and then decrease the threshold iteratively until a sufficient number of seeds is obtained or a minimal accepted threshold is reached. The number of matching seeds required is a parameter of our method.
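The iterative threshold relaxation described above can be sketched as follows. This is a minimal illustration and not the A-LIOn implementation: the similarity measure (`difflib.SequenceMatcher`), the function names, the step size of 0.05, and the minimal threshold of 0.8 are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity score in [0, 1] between two labels (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def learn_seeds(source_labels, target_labels, min_seeds,
                min_threshold=0.8, step=0.05):
    """Lower the threshold from exact match (1.0) until enough seeds are found.

    source_labels and target_labels are hypothetical dicts mapping class
    IRIs to labels. Returns the seeds and the threshold that produced them.
    """
    threshold = 1.0
    while True:
        seeds = {}
        for s_iri, s_label in source_labels.items():
            # Keep the best-scoring target class for each source class.
            best_iri, best_score = None, 0.0
            for t_iri, t_label in target_labels.items():
                score = similarity(s_label, t_label)
                if score > best_score:
                    best_iri, best_score = t_iri, score
            if best_iri is not None and best_score >= threshold:
                seeds[s_iri] = (best_iri, best_score)
        # Stop when enough seeds exist or the next step would go below the floor.
        if len(seeds) >= min_seeds or threshold - step < min_threshold - 1e-9:
            return seeds, threshold
        threshold = round(threshold - step, 10)


source = {"s:1": "heart", "s:2": "left lung", "s:3": "kidney"}
target = {"t:1": "Heart", "t:2": "lung, left", "t:3": "Kidneys"}
seeds, used_threshold = learn_seeds(source, target, min_seeds=2)
print(sorted(seeds), used_threshold)
```

The number of required seeds (`min_seeds`) plays the role of the method parameter mentioned above; a very low floor would admit more seeds at the cost of more false positives.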

2.2. Ontology Projection

[Figure 1. Recovered labels: Symbolic; Neural; Source ontology; Target ontology; Graph projection; Lexical seed learning; Embedding learning; Transformation learning; min || Mr.c1 + r1 - Mr.c2 ||2; min || M c1 - c1 ||2; Merging ontologies + alignment; Is consistent? (OWL reasoner); yes: return alignments; no: find inconsistent alignments (explanations of unsatisfiability); add as negatives: min || (M.cs - ct) - (M.cns - cnt) ||2.]

We project each ontology as a graph in order to learn structure-level information from the source and the target ontologies. We evaluate two graph construction techniques:
• Subsumption hierarchy: in this method, we only utilized the subclass axioms asserted between the ontology classes to generate a directed graph for the source ontology and the target ontology. We evaluated this technique for the Anatomy, Conference, Biodiversity and Ecology, and Material Sciences and Engineering tracks.
• OWL Projection: this method was proposed in [1], where OWL axioms are transformed directly into edges in the graph, and complex axioms are approximated in the graph to avoid the use of blank nodes. Although this transformation method does not preserve exact logical relations, it enables correlation and learning alignments between classes of the source and target ontologies as well as within the same ontology. We evaluated this technique using the Phenotype ontology alignment.
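As an illustration of the subsumption hierarchy projection, the sketch below turns a list of asserted subclass axioms into a graph given as entities, relations, and triples. The input format (pairs of class names) and the relation name `subClassOf` are assumptions for the example; the OWL projection of [1] is more involved and is not shown.

```python
def project_subsumption(subclass_axioms):
    """Project asserted subclass axioms into a directed graph (E, R, T).

    subclass_axioms is a hypothetical iterable of (subclass, superclass) pairs.
    """
    entities, triples = set(), set()
    for sub, sup in subclass_axioms:
        entities.update((sub, sup))
        # Each axiom becomes one directed edge from subclass to superclass.
        triples.add((sub, "subClassOf", sup))
    relations = {"subClassOf"} if triples else set()
    return entities, relations, triples


axioms = [("Heart", "Organ"), ("Lung", "Organ"), ("Organ", "AnatomicalEntity")]
entities, relations, triples = project_subsumption(axioms)
print(len(entities), relations, len(triples))
```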

2.3. Transformation Learning

After projecting an ontology, the result is a graph. Depending on the chosen projection method (Section 2.2), these graphs would encode the taxonomical structure or relational information found in the ontologies.

In our method, we start with two ontologies $O_s$ (source) and $O_t$ (target), which, after applying the graph projection, become two graphs $G_s$ and $G_t$, respectively. Several graph alignment methods can align two graphs from a small number of seed alignments; we follow the method in [2].

To learn representations of the two graphs $G_s$, $G_t$, we define two vector spaces $V_s$, $V_t$, where the entities (nodes and edges) of each graph are processed separately. To learn the graph embeddings we rely on knowledge graph embedding methods such as TransR [3], optimizing the following loss functions:

$$L_s = \| M_r \cdot h + r - M_r \cdot t \|_2 \qquad (1)$$

for each relation $r$ in the source graph where the triple $(h, r, t)$ exists, and

$$L_t = \| M_r \cdot h + r - M_r \cdot t \|_2 \qquad (2)$$

for each relation $r$ in the target graph where the triple $(h, r, t)$ exists.
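To make the TransR-style score in Equations (1) and (2) concrete, the following sketch evaluates it for a single triple using plain Python lists. The dimensions and values are made up for the example, and no training is performed; it only shows how the relation-specific matrix projects head and tail before the translation is applied.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transr_score(m_r, h, r, t):
    """Compute || M_r h + r - M_r t ||_2 for one triple (h, r, t)."""
    h_r = mat_vec(m_r, h)  # head projected into the relation space
    t_r = mat_vec(m_r, t)  # tail projected into the relation space
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r)))


# Hypothetical 3-dimensional entities and a 2-dimensional relation space.
m_r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
h, r = [1.0, 2.0, 3.0], [1.0, 1.0]
print(transr_score(m_r, h, r, [2.0, 3.0, 9.0]))
print(transr_score(m_r, h, r, [2.0, 0.0, 0.0]))
```

A triple that holds in the graph should receive a low score after training, while corrupted triples should score higher.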

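Simultaneously, we use a transformation $M \colon V_s \to V_t$ that takes the entities from the seeds we found earlier (Section 2.1) from the source embedding space to the target space, using the following loss (cf. Figure 1):

$$L_M = \| M \cdot c_s - c_t \|_2$$

for each seed pair $(c_s, c_t)$.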
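A linear transformation between the two embedding spaces can be fitted on seed pairs by gradient descent. The sketch below is only an illustration under several assumptions: it minimizes the squared residual (easier to differentiate than the plain norm), keeps the embeddings fixed instead of training them jointly, and uses made-up learning rate, epoch count, and data.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transformation_loss(m, seed_pairs):
    """Sum of squared residuals ||M c_s - c_t||^2 over all seed pairs."""
    total = 0.0
    for c_s, c_t in seed_pairs:
        total += sum((p - q) ** 2 for p, q in zip(mat_vec(m, c_s), c_t))
    return total


def fit_transformation(seed_pairs, dim_s, dim_t, lr=0.1, epochs=500):
    """Fit M by stochastic gradient descent on the squared residual."""
    m = [[0.0] * dim_s for _ in range(dim_t)]
    for _ in range(epochs):
        for c_s, c_t in seed_pairs:
            residual = [p - q for p, q in zip(mat_vec(m, c_s), c_t)]
            # Gradient of ||M c_s - c_t||^2 w.r.t. M[i][j] is 2 * residual[i] * c_s[j].
            for i in range(dim_t):
                for j in range(dim_s):
                    m[i][j] -= lr * 2.0 * residual[i] * c_s[j]
    return m


# Hypothetical seed embeddings: the target space swaps the two coordinates.
seeds = [([1.0, 0.0], [0.0, 1.0]), ([0.0, 1.0], [1.0, 0.0])]
m = fit_transformation(seeds, dim_s=2, dim_t=2)
print(transformation_loss(m, seeds))
```

Once fitted, candidate alignments for a source class can be ranked by the distance between its transformed embedding and each target class embedding.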

2.4. Inconsistency negatives learning

OWL ontologies are based on Description Logics and support automated reasoners, which compute entailments from the asserted ontology axioms. These inferences can also be used to determine whether a class in an ontology is satisfiable or unsatisfiable. A class is unsatisfiable if it cannot have any instances (i.e., the axioms constrain the class in a contradictory way); an ontology is inconsistent if it has at least one instance of a logical contradiction [4, 5]. We utilize the ELK reasoner [6] to find alignments that lead to unsatisfiable classes. In order to find unsatisfiable classes when aligning $O_s$ and $O_t$, we first merge both ontologies (i.e., we combine their axioms into a new ontology) and add all alignments predicted by our model as equivalence class axioms to the merged ontology $O_m$. We define this ontology as $O_m := (C_s \cup C_t, R_s \cup R_t, I_s \cup I_t, \mathit{Ax}_s \cup \mathit{Ax}_t, A)$, where $C_i$ is the set of concepts from ontology $O_i$, $R_i$ is the set of relations from ontology $O_i$, $I_i$ is the set of individuals from ontology $O_i$, $\mathit{Ax}_i$ is the set of axioms from ontology $O_i$, and $A$ is the predicted alignment.

Then we use the ELK reasoner [6] to identify unsatisfiable classes in the merged ontology. If we identify an unsatisfiable class, we generate explanations for the entailment generated by ELK; an explanation consists of a small set of axioms from which the unsatisfiability follows directly. Within the generated explanations, we specifically identify any of the equivalence class axioms we have added, as these are likely causing the class to become unsatisfiable. We remove the equivalence class axioms causing unsatisfiable classes from the merged ontology and iterate.

Finally, we return to the transformation learning step with an updated loss to optimize for alignment learning as follows (cf. Figure 1):

$$L = \| (M \cdot c_s - c_t) - (M \cdot c_{ns} - c_{nt}) \|_2$$

where $c_s$, $c_t$ are positive class pairs from the source ontology and target ontology, respectively, and $c_{ns}$, $c_{nt}$ are pairs of classes which gave rise to unsatisfiable classes and which we removed in the repair step. The new iteration of our method now uses these pairs as negatives during training in the alignment of both ontologies. We repeat this step until no more unsatisfiable classes remain.
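The repair loop can be illustrated with a toy stand-in for the reasoner. The sketch below does not use ELK and handles only subclass, equivalence, and disjointness axioms: a class counts as unsatisfiable when it reaches both members of a disjoint pair by following subclass and alignment edges, and the "explanation" is simply the set of edges on the two paths found by breadth-first search. All names and the input format are assumptions for the example.

```python
from collections import deque


def superclass_paths(start, edges):
    """Breadth-first search; maps each reachable class to the edge path from start."""
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in paths:
                paths[nxt] = paths[node] + [(node, nxt)]
                queue.append(nxt)
    return paths


def repair_alignments(subclass_axioms, disjoint_pairs, alignments):
    """Remove alignments that occur in explanations of unsatisfiable classes.

    Returns the surviving alignments and the removed ones (the negatives).
    """
    alignments, negatives = set(alignments), set()
    while True:
        edges = {}
        for sub, sup in subclass_axioms:
            edges.setdefault(sub, set()).add(sup)
        for c_s, c_t in alignments:
            # An equivalence acts like a subclass edge in both directions.
            edges.setdefault(c_s, set()).add(c_t)
            edges.setdefault(c_t, set()).add(c_s)
        blamed = set()
        for cls in list(edges):
            paths = superclass_paths(cls, edges)
            for a, b in disjoint_pairs:
                if a in paths and b in paths:  # cls is unsatisfiable
                    for u, v in paths[a] + paths[b]:
                        blamed |= {(u, v), (v, u)} & alignments
        if not blamed:
            return alignments, negatives
        alignments -= blamed
        negatives |= blamed


subclass_axioms = [("s:Heart", "s:Organ"), ("t:Heart", "t:Organ")]
disjoint = [("t:Organ", "t:Cell")]
predicted = {("s:Heart", "t:Heart"), ("s:Organ", "t:Cell"), ("s:Lung", "t:Lung")}
kept, negatives = repair_alignments(subclass_axioms, disjoint, predicted)
print(sorted(kept), sorted(negatives))
```

Note that every alignment appearing in an explanation is removed, so a correct alignment can be discarded together with the faulty one; in this toy example both `s:Heart`/`t:Heart` and `s:Organ`/`t:Cell` become negatives.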

3. Results

For this year’s evaluation, we tested A-LIOn in three tracks: Anatomy, Conference and Material Sciences and Engineering (MSE). We have also tested our system on the phenotypes track using last year’s evaluation tests.

3.1. Participation in OAEI

We selected tracks that align ontologies containing disjoint class assertion axioms. Disjoint class assertion axioms are a common cause of inconsistencies, and these tracks therefore let us observe the performance of our method in correcting inconsistencies and in training to avoid them. In the Anatomy track, the ontology file human.owl contains 17 disjoint class assertion axioms. In the Conference track, the numbers of disjoint class assertion axioms are as follows: 81 in cmt.owl, 42 in Conference.owl, 15 in confious.owl, 129 in confOf.owl, 36 in crs_dr.owl, 1,221 in edas.owl, 222 in ekaw.owl, 3 in iasted.owl, 12 in MICRO.owl, 384 in MyReview.owl, 237 in OpenConf.owl, and 396 in paperdyne.owl. Finally, in the Material Sciences and Engineering track, 158 disjoint class axioms were found in Matonto. All results can be found on the OAEI 2022 campaign page, http://oaei.ontologymatching.org/2022/results/.

3.1.1. Anatomy

In terms of precision, recall, and F-measure, the matching performance of A-LIOn in the Anatomy track was below the string equivalence baseline. Two issues affected our performance in this track: the small number of predicted alignments, together with the small number of inconsistencies discovered using the OWL EL reasoner, and the limited number of initial seeds discovered with the parameter settings we used, which substantially affected recall. To overcome this limitation, an adaptive method that uses the specific pair of ontologies to determine parameters (such as those for seed matching) could be developed.

3.1.2. Conference

The Conference track contains ontologies about conference organization. This track comes in two versions: standard and uncertain. The standard version of the Conference track contains a reference alignment which was the result of a "Consensus Workshop" in 2008. However, some of these alignments may not be detectable either by a computational algorithm or manually by humans [7, 8]. For that reason, the uncertain version of the Conference track was generated by consulting a group of experts and computing the ratio of agreement on each match. As a consequence, the uncertain track is more realistic because it removes the controversial alignments (i.e., the ones for which the experts could not reach a consensus). Systems are therefore expected to perform better when the evaluation is done on the uncertain version of the track. A-LIOn has the highest increase with respect to the standard version among all the systems. This suggests that A-LIOn detects non-controversial alignments more easily than controversial ones.

The current version of A-LIOn uses OWL EL reasoning to detect and exclude alignments that cause inconsistencies. However, the results show that A-LIOn does not detect all inconsistent alignments. The main reason why not all incoherent alignments are removed is that the ontologies use description logics more expressive than OWL EL. A-LIOn only uses OWL EL reasoning because computing entailments in expressive description logics has a high computational complexity and may not always be successful for larger ontologies, such as those used in the biomedical domain. However, the ontologies used in the Conference track are small compared to ontologies in other tracks such as Anatomy. In a future version of A-LIOn, we may include additional reasoners, including reasoners for more expressive logics.

3.1.3. Material Sciences and Engineering

There are three test cases for the Material Sciences and Engineering track. The first and second test cases align the MatOnto ontology to the Material Information ontology, and the third aligns the EMMO ontology to the Material Information ontology. MatOnto contains 158 disjoint class axioms and could thus introduce useful inconsistencies that can be exploited by our method. The results indicate that A-LIOn had the highest recall and an F-measure comparable to the other tested methods. However, in one test case A-LIOn failed to parse the labels in the ontology (the EMMO ontology); consequently, A-LIOn failed to produce any alignments for that case.

3.2. Phenotype matching use case

We tested the OWL projection method on the problem of aligning phenotype ontologies. To test this approach, we utilized the datasets provided last year [9] for aligning the Human Phenotype Ontology (HP) [10] and the Mammalian Phenotype Ontology (MP) [11]. The seed alignments we used are exactly matching IRIs of classes, as well as lexical alignments for HP and MP classes only. We tested two different approaches for generating the graphs from the source and target ontologies (Section 2.2). Results are shown in Table 3.2, where we included the results of some of the participating systems from last year for comparison [12, 13, 14, 15]. Comparing the results of the graph generation techniques, we found that using the OWL projection for phenotype mappings allows for the discovery of more mappings, whereas the subsumption hierarchy produces alignments with high precision but finds fewer alignments, thereby decreasing recall.
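The precision and recall trade-off discussed above follows from the standard definitions, which can be computed from a predicted alignment and a reference alignment as in the sketch below. The class identifiers are invented for the example and are not taken from the evaluation.

```python
def evaluate(predicted, reference):
    """Return (precision, recall, F-measure) of a predicted alignment."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference and predicted mappings between two ontologies.
reference = {("hp:a", "mp:a"), ("hp:b", "mp:b"), ("hp:c", "mp:c"), ("hp:d", "mp:d")}
predicted = {("hp:a", "mp:a"), ("hp:b", "mp:x")}
print(evaluate(predicted, reference))
```

A system that predicts few but mostly correct mappings obtains high precision and low recall, which is the behaviour reported for the subsumption hierarchy projection.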

4. Conclusion

A-LIOn is a system that incorporates both entity-level and structure-level information in learning alignments between two ontologies; A-LIOn also uses logical reasoning to correct alignments that are likely faulty because they lead to unsatisfiable classes, and incorporates the results of this symbolic step in the learning process to generate new negatives. In the future, we plan to make our system able to learn better parameters based on the features of the input ontologies and to self-evaluate the predicted alignment. For example, using a different set of parameters for Anatomy and for the first task of the Material Sciences and Engineering track allowed us to increase the F-score by 10% and 3.3%, respectively. A further improvement will be the use of language models in seed selection.

References

[1] Chen, Hu, Jimenez-Ruiz, O. M. Holter, Antonyrajah, I. Horrocks, Owl2vec*: Embedding of owl ontologies, Machine Learning 110 (2021) 1813–1845.

[2] Chen, Tian, Yang, Zaniolo, Multilingual knowledge graph embeddings for cross-lingual knowledge alignment, arXiv preprint arXiv:1611.03954 (2016).

[3] Lin, Liu, Sun, Liu, Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Twenty-ninth AAAI conference on artificial intelligence, 2015.

[4] L. T. Slater, G. V. Gkoutos, Hoehndorf, Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies, BMC Medical Informatics and Decision Making 20 (2020) 1–13.

[5] Martinez-Gil, Yin, Küng, Morvan, Matching large biomedical ontologies using symbolic regression, in: The 23rd International Conference on Information Integration and Web Intelligence, 2021, pp. 162–167.

[6] Kazakov, Krötzsch, Simančík, Elk: a reasoner for owl el ontologies, System Description (2012).

[7] Cheatham, P. Hitzler, Conference v2.0: An uncertain version of the oaei conference benchmark, in: International Semantic Web Conference, Springer, 2014, pp. 33–48.

[8] Bock, Dänschel, Stumpp, Mappso and mapevo results for oaei 2011, in: Proceedings of the 6th International Conference on Ontology Matching - Volume 814, OM'11, CEUR-WS.org, Aachen, DEU, 2011, p. 179–183.

[9] M. Pour, A. Algergawy, F. Amardeilh, R. Amini, O. Fallatah, D. Faria, I. Fundulaki, I. Harrow, S. Hertling, P. Hitzler, et al., Results of the ontology alignment evaluation initiative 2021, in: CEUR Workshop Proceedings 2021, volume 3063, CEUR, 2021, pp. 62–108.

[10] S. Köhler, M. Gargano, N. Matentzoglu, L. C. Carmody, D. Lewis-Smith, N. A. Vasilevsky, D. Danis, G. Balagura, G. Baynam, A. M. Brower, et al., The human phenotype ontology in 2021, Nucleic acids research 49 (2021) D1207–D1217.

[11] C. L. Smith, J. T. Eppig, The mammalian phenotype ontology as a unifying standard for experimental and high-throughput phenotyping data, Mammalian genome 23 (2012) 653–668.

[12] E. Jiménez-Ruiz, Logmap family participation in the oaei 2021, in: CEUR Workshop Proceedings, volume 3063, 2021, pp. 175–177.

[13] D. Faria, B. Lima, F. Couto, M. Silva, C. Pesquita, Aml and amlc results for oaei 2021, in: The 23rd International Conference on Information Integration and Web Intelligence, 2021, pp. 131–136.

[14] S. Hertling, H. Paulheim, Atbox results for oaei 2021, in: CEUR Workshop Proceedings, volume 3063, RWTH Aachen, 2021, pp. 137–143.

[15] D. Kossack, N. Borg, L. Knorr, J. Portisch, Tom matcher results for oaei 2021, in: CEUR Workshop Proceedings, volume 3063, RWTH, 2022, pp. 193–198.