=Paper=
{{Paper
|id=Vol-3324/oaei22_paper2
|storemode=property
|title=A-LIOn - alignment learning through inconsistency negatives of the aligned ontologies
|pdfUrl=https://ceur-ws.org/Vol-3324/oaei22_paper2.pdf
|volume=Vol-3324
|authors=Sarah M. Alghamdi,Fernando Zhapa-Camacho,Robert Hoehndorf
|dblpUrl=https://dblp.org/rec/conf/semweb/AlghamdiZH22
}}
==A-LIOn - alignment learning through inconsistency negatives of the aligned ontologies==
Sarah M. Alghamdi (1, 2), Fernando Zhapa-Camacho (1) and Robert Hoehndorf (1)

(1) Computational Bioscience Research Center, Computer, Electrical & Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, 4700 KAUST, 23955 Thuwal, Saudi Arabia

(2) King Abdul-Aziz University, Faculty of Computing and Information Technology, Rabigh, 25732, Kingdom of Saudi Arabia

OAEI 2022. Email: sarah.alghamdi.1@kaust.edu.sa (S. M. Alghamdi); fernando.zhapacamacho@kaust.edu.sa (F. Zhapa-Camacho); robert.hoehndorf@kaust.edu.sa (R. Hoehndorf). ORCID: 0000-0001-5544-7166 (S. M. Alghamdi); 0000-0002-0710-2259 (F. Zhapa-Camacho); 0000-0001-8149-5890 (R. Hoehndorf). © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (http://ceur-ws.org), ISSN 1613-0073.

===Abstract===
Ontologies play an important role in sharing and reusing knowledge. Several ontologies are often developed to describe the same domain, but from the different perspectives of different communities of developers and users. This has led to the existence of multiple ontologies that cover the same or different domains and differ from each other to varying degrees. Ontology alignment is typically used to address this problem by identifying correspondences between semantically related elements of two or more ontologies. We propose A-LIOn, a system that learns alignments by combining lexical and semantic approaches with machine learning. The system uses OWL EL reasoning for negative sampling, which is applied iteratively to correct the learned alignments. We demonstrate that A-LIOn produces alignments that are coherent with respect to OWL EL.

Keywords: Ontology Alignments, Ontology matching, Inconsistency negatives

===1. Presentation of the System===
Alignment Learning through Inconsistency negatives of the aligned Ontologies (A-LIOn) is a system that discovers alignments between ontologies by combining various matching techniques, ranging from entity-level label matching, through structure-level taxonomy learning and graph projection, to logical reasoning and inconsistency detection and learning. This is the first participation of A-LIOn in the Ontology Alignment Evaluation Initiative (OAEI).

===2. Proposed Methods===
An ontology <math>\mathcal{O}</math> can be defined over a signature <math>\mathcal{O} := (C, R, I; ax)</math>, where <math>C</math> is a set of concept names, <math>R</math> is a set of relation names, <math>I</math> is a set of individual names, and <math>ax</math> is a set of axioms. Given two ontologies <math>\mathcal{O}_s, \mathcal{O}_t</math>, the purpose of ontology alignment is to find the pairs of entities <math>(c_i^{\mathcal{O}_s}, c_j^{\mathcal{O}_t}) \in C^{\mathcal{O}_s} \times C^{\mathcal{O}_t}</math> that are considered equivalent, or that stand in a subclass relation, within certain contexts.

A graph is defined as a tuple <math>G = (E, R, T)</math>, where <math>E</math> is a set of entity names, <math>R</math> is a set of relation names and <math>T \subseteq E \times R \times E</math> is a set of triples of the form (head, relation, tail). A projection of an ontology into a graph is a mapping <math>p : \mathcal{O} \to G</math> that maps ontology classes to graph nodes, ontology roles to graph relations, and ontology axioms to graph triples, following a particular set of rules.

Our method A-LIOn combines different matching techniques and consists of four main components (see Figure 1):
* Learning lexical matching seeds.
* Graph construction from source and target ontology.
* Graph embedding and transformation learning.
* Consistency checking.

These components cover element-wise, structure-wise, and formal semantics learning techniques.

(a) Element-wise techniques consider the entities in the ontology in isolation in order to find alignments, disregarding the fact that they are part of the ontology's structure.
This means that we use only the information belonging to an ontology class itself, such as its textual annotations and labels.

(b) Structure-wise techniques, in contrast, analyze the entities as part of their structure. In our case, we focus on the adjacency structure within the ontology and extract that structure in the form of a graph.

(c) Finally, the semantic component employs formal semantics learning techniques and logical inference to identify correspondences and repair inconsistencies.

====2.1. Learning Lexical matching seeds====
To begin learning an ontology alignment, we need some seed alignments that are known to be positive. As seeds, we align the classes of both ontologies that have the same IRI, or whose labels and relative IRIs match lexically. For lexical matching, we use fuzzy lexical matching, an approximate string matching method that returns a score representing the similarity between two strings. We begin by requiring an exact match and then decrease the score threshold iteratively until a sufficient number of seeds is obtained or a minimal accepted threshold is reached. The number of matching seeds required is a parameter of our method.

====2.2. Ontology Projection====
We project each ontology as a graph in order to learn structure-level information from the source and the target ontologies. We evaluate two graph construction techniques:
* Subsumption hierarchy: in this method, we use only the subclass axioms asserted between the ontology classes to generate a directed graph for the source ontology and for the target ontology. We evaluated this technique for the Anatomy, Conference, Biodiversity and Ecology, and Material Sciences and Engineering tracks.
[Figure 1: A-LIOn system component workflow. The diagram connects the source and target ontologies to lexical seed learning, graph projection, embedding learning and transformation learning; the merged ontologies plus alignment are checked by an OWL reasoner, inconsistent alignments (explanations of unsatisfiability) are added as negatives, and the alignments are returned once the result is consistent.]

* OWL Projection: This method was proposed in [1] where OWL axioms are transformed directly into edges in the graph, and complex axioms are approximated in the graph to avoid the use of blank nodes. Despite the fact that this transformation method does not preserve exact logical relations, it enables correlation and learning alignments between classes of the source and target ontologies as well as within the same ontology. We evaluated this technique using the Phenotype ontology alignment.

====2.3. Transformation Learning====
After projecting an ontology, the result is a graph. Depending on the chosen projection method (Section 2.2), these graphs would encode the taxonomical structure or relational information found in the ontologies. In our method, we start with two ontologies <math>\mathcal{O}_s</math> (source) and <math>\mathcal{O}_t</math> (target), which, after applying the graph projection, will become two graphs <math>G_s, G_t</math>, respectively. When we deal with two graphs, there are several graph alignment methods that can align two graphs from a small number of seed alignments; we follow the method in [2]. To learn representations of the two graphs <math>G_s, G_t</math>, we define two vector spaces <math>V_s, V_t</math>, where the entities (nodes and edges) of each graph will be processed separately. To learn the graph embeddings we rely on knowledge graph embeddings methods such as TransR [3], optimizing the following loss function:

<math>\mathrm{loss}_s = \| M_{r_s} \cdot c_s^i + r_s - M_{r_s} \cdot c_s^j \| \qquad (1)</math>

for each relation <math>r_s</math> in the source graph where the triple <math>(c_s^i, r_s, c_s^j)</math> exists.
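As an illustration of the TransR-style term in Equation 1, the following minimal sketch computes the norm for a single triple in plain Python. The helper names (`matvec`, `transr_score`) and the toy vectors are assumptions made for this example; they are not taken from the A-LIOn implementation, and a real system would train the embeddings with a machine learning library.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transr_score(m_r, r, c_head, c_tail):
    """Return || M_r * c_head + r - M_r * c_tail || for one triple.

    m_r projects entity vectors into the relation space; r is the
    relation vector in that space. A score near 0 means the triple
    fits the embedding well.
    """
    head = matvec(m_r, c_head)
    tail = matvec(m_r, c_tail)
    return math.sqrt(sum((h + rk - t) ** 2 for h, rk, t in zip(head, r, tail)))


# Toy data (hypothetical): 3-d entity vectors projected into a 2-d relation space.
m_r = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
c_i = [1.0, 2.0, 5.0]
c_j = [4.0, 6.0, -1.0]

score_fit = transr_score(m_r, [3.0, 4.0], c_i, c_j)   # relation vector that fits exactly
score_zero = transr_score(m_r, [0.0, 0.0], c_i, c_j)  # zero relation vector
print(score_fit, score_zero)
```

In this toy setting the fitted relation vector gives a score of 0, while the zero vector leaves a residual equal to the distance between the two projected entities.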
The target graph is embedded analogously, with the loss

<math>\mathrm{loss}_t = \| M_{r_t} \cdot c_t^i + r_t - M_{r_t} \cdot c_t^j \| \qquad (2)</math>

for each relation <math>r_t</math> in the target graph where the triple <math>(c_t^i, r_t, c_t^j)</math> exists. Simultaneously, we learn a transformation <math>M : E_{G_s} \to E_{G_t}</math> that maps the entities of the seeds found earlier (Section 2.1) from the source embedding space to the target space, using the following loss:

<math>A_{\mathrm{loss}} = \| M \cdot c_s - c_t \| \qquad (3)</math>

====2.4. Inconsistency negatives learning====
OWL ontologies are based on Description Logics and support automated reasoners, which compute entailments from the asserted ontology axioms. In addition, these inferences can be used to determine whether a class in an ontology is satisfiable or unsatisfiable. A class is unsatisfiable if it cannot have any instances (i.e., the axioms constrain the class in a contradictory way); an ontology is inconsistent if it has at least one instance of a logical contradiction [4, 5]. We use the ELK reasoner [6] to find alignments that lead to unsatisfiable classes.

In order to find unsatisfiable classes when aligning <math>\mathcal{O}_s</math> and <math>\mathcal{O}_t</math>, we first merge both ontologies (i.e., we combine their axioms into a new ontology) and add all alignments predicted by our model as equivalence class axioms to the merged ontology <math>\mathcal{O}_{merged}</math>. We define this ontology as <math>\mathcal{O}_{merged} := (C_s \cup C_t, R_s \cup R_t, I_s \cup I_t, ax_s \cup ax_t, A)</math>, where, for <math>i \in \{s, t\}</math>, <math>C_i</math> is the set of concepts, <math>R_i</math> the set of relations, <math>I_i</math> the set of individuals and <math>ax_i</math> the set of axioms of ontology <math>i</math>, and <math>A</math> is the predicted alignment. Then we use the ELK reasoner [6] to identify unsatisfiable classes in the merged ontology.
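The merging step described above can be illustrated with a toy model. The sketch below is not ELK and does not perform OWL EL reasoning: it assumes each ontology is reduced to a set of subclass pairs plus a set of declared disjoint pairs, adds each predicted alignment as an equivalence (two subclass pairs), and flags a class as unsatisfiable when it ends up below two disjoint classes. All function names and class names are hypothetical.

```python
def merge_with_alignment(sub_s, sub_t, alignment):
    """Union the subclass axioms of both ontologies and add every predicted
    alignment (a, b) as an equivalence, i.e. as a <= b and b <= a."""
    merged = set(sub_s) | set(sub_t)
    for a, b in alignment:
        merged.add((a, b))
        merged.add((b, a))
    return merged


def superclasses(cls, subclass_axioms):
    """All classes reachable from cls through subclass axioms (cls included)."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def unsatisfiable_classes(subclass_axioms, disjoint_pairs):
    """Classes subsumed by both members of some declared disjoint pair."""
    classes = {c for axiom in subclass_axioms for c in axiom}
    unsat = set()
    for cls in classes:
        supers = superclasses(cls, subclass_axioms)
        if any(a in supers and b in supers for a, b in disjoint_pairs):
            unsat.add(cls)
    return unsat


# Toy ontologies (hypothetical class names).
source = {("s:Heart", "s:Organ")}
target = {("t:Heart", "t:Organ"), ("t:Vein", "t:Vessel")}
disjoint = {("t:Organ", "t:Vessel")}

good = merge_with_alignment(source, target, [("s:Heart", "t:Heart")])
bad = merge_with_alignment(source, target,
                           [("s:Heart", "t:Heart"), ("s:Heart", "t:Vein")])

unsat_good = unsatisfiable_classes(good, disjoint)
unsat_bad = unsatisfiable_classes(bad, disjoint)
print(sorted(unsat_good), sorted(unsat_bad))
```

With the second alignment, `s:Heart` becomes equivalent to both `t:Heart` and `t:Vein`, so all three classes fall below the disjoint pair and are reported. A real reasoner covers far more axiom types than this reachability check.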
If we identify an unsatisfiable class, we generate explanations for the entailment derived by ELK. An explanation consists of a small set of axioms from which the unsatisfiability follows directly. Within the generated explanations, we specifically identify any of the equivalence class axioms we have added, as these are likely causing the class to become unsatisfiable. We remove the equivalence class axioms that cause unsatisfiable classes from the merged ontology and iterate. Finally, we return to the transformation learning step with an updated loss for alignment learning:

<math>A_{\mathrm{loss}} = \| (M \cdot c_s - c_t) - (M \cdot c_{ns} - c_{nt}) \| \qquad (4)</math>

where <math>c_s, c_t</math> are positive class pairs from the source ontology and the target ontology, respectively, and <math>c_{ns}, c_{nt}</math> are pairs of classes that gave rise to unsatisfiable classes and that we removed in the repair step. The new iteration of our method uses these pairs as negatives during training in the alignment of both ontologies. We repeat this step until no unsatisfiable classes remain.

===3. Results===
For this year's evaluation, we tested A-LIOn in three tracks: Anatomy, Conference, and Material Sciences and Engineering (MSE). We also tested our system on the phenotype track using last year's evaluation tests.

====3.1. Participation in OAEI====
We selected tracks that align ontologies containing disjoint class assertion axioms. Disjoint class assertion axioms are a common cause of inconsistencies and therefore allow us to observe how well our method corrects inconsistencies and learns to avoid them. In the Anatomy track, the ontology file human.owl contains 17 disjoint class assertion axioms. In the Conference track, the numbers of disjoint class assertion axioms are as follows: 81 in cmt.owl, 42 in Conference.owl, 15 in confious.owl, 129 in confOf.owl, 36 in crs_dr.owl, 1,221 in edas.owl, 222 in ekaw.owl, 3 in iasted.owl, 12 in MICRO.owl, 384 in MyReview.owl, 237 in OpenConf.owl, and 396 in paperdyne.owl. Finally, in the Material Sciences and Engineering track, 158 disjoint class axioms were found in MatOnto. All the results can be found on the OAEI 2022 campaign page http://oaei.ontologymatching.org/2022/results/.

=====3.1.1. Anatomy=====
In terms of precision, recall, and F-measure, the matching performance of A-LIOn in the Anatomy track was below the string equivalence baseline. The main issues that affected our performance in this track are the small number of predicted alignments and the small number of inconsistencies discovered using the OWL EL reasoner. The number of predicted alignments was small because only a limited number of initial seeds was discovered with the parameter settings we used, which substantially affected recall. To overcome this limitation, an adaptive method could be developed that determines parameters (such as those for seed matching) from the specific pair of ontologies being aligned.

=====3.1.2. Conference=====
The Conference track contains information about conference organization. This track comes in two versions: standard and uncertain. The standard version of the Conference track contains a reference alignment that resulted from a "Consensus Workshop" in 2008. However, some of these alignments may not be possible to detect, either by a computational algorithm or manually by humans [7, 8]. For that reason, the uncertain version of the Conference track was generated by consulting a group of experts and computing the ratio of agreement on each match. As a consequence, the uncertain track is more realistic because it removes the controversial alignments (i.e., the ones for which the experts could not reach a consensus). Systems are therefore expected to perform better when evaluated on the uncertain version of the track.
A-LIOn has the highest increase with respect to the standard version among all the systems. This suggests that A-LIOn detects non-controversial alignments more easily than controversial ones. The current version of A-LIOn uses OWL EL reasoning to detect and exclude alignments that cause inconsistencies. However, the results show that A-LIOn does not detect all inconsistent alignments. The main reason why not all incoherent alignments are removed is that the ontologies use description logics more expressive than OWL EL. A-LIOn only uses OWL EL reasoning because computing entailments in expressive description logics has a high computational complexity and may not always succeed for larger ontologies, such as those used in the biomedical domain. However, the ontologies used in the Conference track are small compared to ontologies in other tracks such as Anatomy. In a future version of A-LIOn, we may include additional reasoners, including reasoners for more expressive logics.

=====3.1.3. Material Sciences and Engineering=====
There are three test cases in the Material Sciences and Engineering track. The first and second test cases align the MatOnto ontology to the Material Information ontology, and the third test case aligns the EMMO ontology to the Material Information ontology. MatOnto contains 158 disjoint class axioms and could thus introduce useful inconsistencies that our method can exploit. The results indicate that A-LIOn had the highest recall and an F-measure comparable to the other tested methods. However, in one test case (the one involving the EMMO ontology) A-LIOn failed to parse the labels in the ontology; consequently, A-LIOn failed to produce any alignments for it.

====3.2. Phenotype matching use case====
We tested the OWL projection method on the problem of aligning phenotype ontologies. To test this approach, we used the datasets provided last year [9] for aligning the Human Phenotype Ontology (HP) [10] and the Mammalian Phenotype Ontology (MP) [11].
The seed alignments we used are exactly matching IRIs of classes, as well as lexical alignments for HP and MP classes only. We tested two different approaches for generating the graphs from the source and target ontologies (Section 2.2). Results are shown in Table 1, where we included the results of some of last year's participating systems for comparison [12, 13, 14, 15]. Comparing the results of the two graph generation techniques, we found that using the OWL projection for phenotype mappings allows more mappings to be discovered, whereas the subsumption hierarchy produces alignments with high precision but finds fewer alignments, thereby decreasing recall.

{| class="wikitable"
|+ Table 1: Phenotype use case test results on last year's 5-consensus. We show the results for different variations of A-LIOn: with the subsumption hierarchy graph (SH) and with the projected graph (P).
|-
! System !! Number of alignments !! Precision !! Recall !! F-score
|-
| LogMap || 2,136 || 0.767 || 0.908 || 0.831
|-
| AML || 2,029 || 0.810 || 0.910 || 0.857
|-
| ATMatcher || 769 || 0.968 || 0.412 || 0.578
|-
| TOM || 306 || 0.101 || 0.140 || 0.117
|-
| A-LIOn - (SH) || 700 || 0.986 || 0.382 || 0.551
|-
| A-LIOn - (P) || 1,078 || 0.822 || 0.732 || 0.774
|}

===4. Conclusion===
A-LIOn is a system that incorporates both entity-level and structure-level information in learning alignments between two ontologies. A-LIOn also uses logical reasoning to correct alignments that are likely faulty because they lead to unsatisfiable classes, and it feeds the results of this symbolic step back into the learning process as new negatives. In the future, we plan to enable our system to learn better parameters based on the features of the input ontologies and to self-evaluate the predicted alignment. For example, using a different set of parameters for Anatomy and for the first task of the Material Sciences and Engineering track allowed us to increase the F-score by 10% and 3.3%, respectively. A further improvement will be the use of language models in seed selection.

===References===
[1] J. Chen, P. Hu, E. Jimenez-Ruiz, O. M. Holter, D. Antonyrajah, I. Horrocks, OWL2Vec*: Embedding of OWL ontologies, Machine Learning 110 (2021) 1813–1845.

[2] M. Chen, Y. Tian, M. Yang, C. Zaniolo, Multilingual knowledge graph embeddings for cross-lingual knowledge alignment, arXiv preprint arXiv:1611.03954 (2016).

[3] Y. Lin, Z. Liu, M. Sun, Y. Liu, X. Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Twenty-ninth AAAI Conference on Artificial Intelligence, 2015.

[4] L. T. Slater, G. V. Gkoutos, R. Hoehndorf, Towards semantic interoperability: finding and repairing hidden contradictions in biomedical ontologies, BMC Medical Informatics and Decision Making 20 (2020) 1–13.

[5] J. Martinez-Gil, S. Yin, J. Küng, F. Morvan, Matching large biomedical ontologies using symbolic regression, in: The 23rd International Conference on Information Integration and Web Intelligence, 2021, pp. 162–167.

[6] Y. Kazakov, M. Krötzsch, F. Simančík, ELK: a reasoner for OWL EL ontologies, System Description (2012).

[7] M. Cheatham, P. Hitzler, Conference v2.0: An uncertain version of the OAEI conference benchmark, in: International Semantic Web Conference, Springer, 2014, pp. 33–48.

[8] J. Bock, C. Dänschel, M. Stumpp, MapPSO and MapEVO results for OAEI 2011, in: Proceedings of the 6th International Conference on Ontology Matching - Volume 814, OM'11, CEUR-WS.org, Aachen, DEU, 2011, pp. 179–183.

[9] M. Pour, A. Algergawy, F. Amardeilh, R. Amini, O. Fallatah, D. Faria, I. Fundulaki, I. Harrow, S. Hertling, P. Hitzler, et al., Results of the ontology alignment evaluation initiative 2021, in: CEUR Workshop Proceedings 2021, volume 3063, CEUR, 2021, pp. 62–108.

[10] S. Köhler, M. Gargano, N. Matentzoglu, L. C. Carmody, D. Lewis-Smith, N. A. Vasilevsky, D. Danis, G. Balagura, G. Baynam, A. M. Brower, et al., The human phenotype ontology in 2021, Nucleic Acids Research 49 (2021) D1207–D1217.

[11] C. L. Smith, J. T. Eppig, The mammalian phenotype ontology as a unifying standard for experimental and high-throughput phenotyping data, Mammalian Genome 23 (2012) 653–668.

[12] E. Jiménez-Ruiz, LogMap family participation in the OAEI 2021, in: CEUR Workshop Proceedings, volume 3063, 2021, pp. 175–177.

[13] D. Faria, B. Lima, F. Couto, M. Silva, C. Pesquita, AML and AMLC results for OAEI 2021, in: The 23rd International Conference on Information Integration and Web Intelligence, 2021, pp. 131–136.

[14] S. Hertling, H. Paulheim, ATBox results for OAEI 2021, in: CEUR Workshop Proceedings, volume 3063, RWTH Aachen, 2021, pp. 137–143.

[15] D. Kossack, N. Borg, L. Knorr, J. Portisch, TOM matcher results for OAEI 2021, in: CEUR Workshop Proceedings, volume 3063, RWTH, 2022, pp. 193–198.