GMRs: Reconciliation of Generic Merge Requirements in Ontology Integration Samira Babalou[0000−0002−4203−1329] , Birgitta König-Ries[0000−0002−2382−9722] Heinz-Nixdorf Chair for Distributed Information Systems Institute for Computer Science, Friedrich Schiller University Jena, Germany {samira.babalou,birgitta.koenig-ries}@uni-jena.de Abstract. With growing popularity of ontologies as semantics-aware integration solutions, various ontology merging methods have been proposed. Each of them uses their own set of criteria to determine a desirable merge result. In this work, we first categorize these criteria into generic merge requirements (GMRs). Not all of these requirements can be met simultaneously. We argue that users should be able to select those requirements that are most important to them, but that system support is required to determine a compatible set of requirements based on user inputs. We propose a graph-theory based approach to determining such sets. Keywords: Ontology merging . Merge requirements . Graph theory 1 Introduction An ontology is a formal explicit description of a domain. It contains a set of entities including classes, properties, and instances. Given a set of input ontologies and a set of correspondences between them, an ontology merging process creates a new merged ontology. One aspect that various approaches to ontology merging differ in is the set of criteria they aim to fulfill, i.e., what are the requirements they expect the merged ontology to meet. We have conducted an analysis of the literature and determined which criteria, here called generic merge requirements (GMR), are used by different approaches. Our ultimate aim is to provide a flexible merging approach, where users can actively choose which requirements are important to them, instead of allowing only a very indirect choice by picking the right merging method, something that is not transparent to the user. Unfortunately, not all GMRs are compatible with each other. For instance, one may want to preserve all classes contained in the original ontology in the merged ontology. On the other hand, one could wish to achieve class acyclicity. Likely, these goals conflict. Thus, once a user has chosen which GMRs they consider important, a system is needed that is able to check whether these are compatible and which other requirements can be met simultaneously. In this paper, we take the first step towards this goal: We analyse the literature to compile a list of GMRs, we provide first insights into their compatibility and we describe a graph-theory based method for determining maximal sets of compatible GMRs. Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Completeness Deduction Acyclicity Logic Properties 𝑅𝑅𝑅. Class preservation 𝑅𝑅12. Entailment deduction 𝑅𝑅𝑅𝑅. Acyclicity in the class hierarchy Integrity satisfaction 𝑅𝑅𝑅𝑅. Acyclicity in the property 𝑅𝑅𝑅. Property preservation hierarchy Minimality Constraint 𝑅𝑅𝑅. Instance preservation 𝑅𝑅18. Prohibition of properties 𝑅𝑅13. One type restriction being inverses of themselves 𝑅𝑅8. Class redundancy prohibition 𝑅𝑅4. Correspondence preservation 𝑅𝑅9. Property redundancy 𝑅𝑅14. Property value’s constraint Connectivity 𝑅𝑅5. Correspondences’ Property prohibition preservation 𝑅𝑅𝑅𝑅. Property’s domain and 𝑅𝑅19. Unconnected class prohibition 𝑅𝑅𝑅𝑅. Instance redundancy range oneness 𝑅𝑅6. Property’s value preservation prohibition 𝑅𝑅20. Unconnected property prohibition 𝑅𝑅𝑅𝑅. Extraneous entity Model Properties 𝑅𝑅7. Structure preservation prohibition Legend Dimension Aspect GMR Fig. 1: GMR classification 2 GMRs Overview and Classification To discover the most commonly used GMRs, we have studied and analyzed three different research areas including (i) ontology merging methods [1,2,3,6,7,10], (ii) ontology merging benchmarks [4,9], and (iii) ontology engineering [5,8]. Overall, we classify the GMRs according to three dimensions subdivided into different aspects (see Fig. 1). The three dimensions we identified are: – Integrity: This dimension refers to the degree of knowledge coverage in the merge process via (i) the completeness aspect, and to the amount of knowledge redundancy by (ii) the minimality aspect. – Model Properties: Within this dimension, the principles of creating a new ontology model are investigated. In this dimension, (i) acyclicity, and (ii) connectivity satisfaction aspects are taken into account. – Logic Properties: The inference of the expected knowledge with involved constraints is analyzed in this dimension. This includes (i) deduction, and (ii) constraint satisfaction aspects. We determined six aspects that GMRs can be grouped into: Completeness refers to knowledge preservation and coverage: R1. Class preservation: Each class in (all/target) 1 input ontologies should have a mapped class in the merged ontology [1,3,6,7,10]. R2. Property preservation: Each property from the (all/target) input ontologies is explicitly in or implied by the merged ontology [7,10]. R3. Instance preservation: All instances of (all/target) input ontologies should be preserved in the merged ontology [7,10]. R4. Correspondence preservation: If two entities of the input ontologies are corresponding 2 , both should map to the same merged entity in the merged ontology [6,7,10]. 1 Preserving all entities from all input ontologies or a preferred one. 2 This can be equality, similarity or is-a correspondences. In each case, the same type of corresponding should be preserved. R5. Correspondences’ property preservation: If any of the corresponding entities from the input ontologies has a certain property, the merged entity should also have this property [6,7]. R6. Property’s value preservation: Properties’ values from the (all/target) input ontologies should be preserved in the merged ontology [6,7]. In case of conflicts a resolution strategy is required. R7. Structure preservation: If two entities are connected via a certain property in an input ontology, their mapped entities in the merged ontology should be connected via the respective mapped property [1], thus preserving the input ontologies’ structures in the merged ontology. Minimality refers to knowledge redundancy and controlling of semantic overlap: R8. Class redundancy prohibition: A class from the (all/target) input ontologies should have at most one mapping in the merged ontology [1,3,4,7,10]. R9. Property redundancy prohibition: A property from the (all/target) input ontologies should have at most one mapping in the merged ontology [4]. R10. Instance redundancy prohibition: An instance from the (all/target) input ontologies should have at most one mapping in the merged ontology. R11. Extraneous entity prohibition: No additional entities other than input ontologies’ entities should be added in the merged result [7]. Deduction refers to the deduction satisfaction with R12 : R12. Entailment deduction satisfaction: The merged ontology is desirable to be able to entail all entailments of the (all/target) input ontologies [2]. As the semantic consequences of the integration, it can include more entailments but it should at least not miss knowledge from the input ontologies. Constraint reflects the satisfaction of the ontology constraints: R13. One type restriction: Two corresponding entities should follow the same data type [7]; e.g., if the range of author Id in one of the input ontology is String and in the other one is Integer, then the range of the merged entity author Id in the merged ontology cannot have both types. R14. Property value’s constraint: If the (all/target) input ontologies place some restriction on a property’s values (e.g., in terms of cardinality or by enumerating possible values) this should be preserved without conflict in the merged ontology [7]. R15. Property’s domain and range oneness: The merge process should not result in multiple domains or ranges defined for a single property. This rule is recast from the ontology modelling issues in [8]. Acyclicity refers to controlling the chain problem in the merged ontology: R16. Acyclicity in the class hierarchy: A cycle of is-a relationships implies equality of all of the classes in the cycle, since is-a is transitive. Therefore, the merge process should not produce a cycle in the class hierarchy [1,3,5,7,8,10]. R17. Acyclicity in the property hierarchy: The merge process should not produce a cycle between properties with respect to the is-subproperty-of relationship [8]. R18. Prohibition of properties being inverses of themselves: The merged process should not cause an inverse recursive definition on the properties [8]. Connectivity refers to the hierarchy connectivity satisfaction: R19. Unconnected class prohibition: The merge process should not make the classes unconnected [3,7]. Every class that had some connections in the input ontologies before the merge process, should not be unconnected after merge process in the merged ontology. R20. Unconnected property prohibition: The merge process should not make the properties unconnected [6,8]. Every property that had some connections in the input ontologies before the merge process, should not be unconnected after merge process in the merged ontology. 3 GMR Compatibility Intuitively, not all GMRs are compatible, e.g., completeness and minimality are difficult to achieve simultaneously. In our ongoing work, we are determining the precise relationships. Typically, users will, however, not be interested in all of them anyhow. Depending on the application they want to create an ontology for, some GMRs might be important to them while others are only nice to have. We therefore propose an approach that takes as input a set of user-selected GMRs, namely U and an undirected graph G reflecting the relationship among GMRs. G = (V, E), where V is the set of vertices representing the GMRs, and E is the set of edges. Two GMRs are connected via an edge iff they are compatible. Given the set U containing the GMRs the user is interested in, we find the maximum subset of V containing all vertices out of U and no incompatible nodes. This may not always be possible, since the user might have chosen incompatible GMRs already. In this case, we search for a maximum subset of V that preserves as many nodes out of U as possible and contains compatible nodes only. Precisely, we recast the problem at hand as maximum clique extraction on G. A clique is a set of fully connected vertices. We thus extract a compatible clique KC -Clique, where K indicates the number of vertices in the clique, and C denotes that the clique is compatible. KC -Clique includes compatible GMRs from U and some additional compatible GMRs related to U’s elements. KC -max-Clique, is a clique containing at least K vertices that is not a subset of any other cliques. To compute the KC -max-Clique, we use the CLIQUES algorithm in [11]. To illustrate our approach with an example, Fig. 2 left shows the graph G, for six GMRs where U = {R2 , R8 , R11 }. Based on this graph, three different cliques have been extracted, in which the user-selected compatible GMRs, the incompatible U’s elements, and the additional compatible GMRs related to U are indicated with green, red, and yellow circles, respectively. Two possible represented 3C -Clique are compatible cliques however they are not a maximal clique. The 4C -max-Clique is the best match to the U and indicates the maximal compatible subset on the sketched G. 𝑹𝑹𝟒𝟒 𝑹𝑹𝟐𝟐 𝑹𝑹𝟒𝟒 𝑹𝑹𝟐𝟐 𝑹𝑹𝟒𝟒 𝑹𝑹𝟐𝟐 𝑹𝑹𝟒𝟒 𝑹𝑹𝟐𝟐 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟖𝟖 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟖𝟖 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟖𝟖 𝑹𝑹𝟏𝟏𝟏𝟏 𝑹𝑹𝟖𝟖 𝑹𝑹𝟏𝟏𝟏𝟏 Fig. 2: From left to right: a graph G for six GMRs; two different compatible 3C -Clique; a compatible 4C -max-Clique. 4 Conclusion There is a variety of generic merge requirements (GMR) in the ontology engineering domain. We gathered the most common GMRs 3 , and classified them into three dimensions. Since not all GMRs can be fulfilled at the same time, we propose an approach that allows users to specify the most important GMRs for their specific task, then determines the maximal compatible subset of GMRs. This result can then be used to select a proper merge method or to parameterize a generic merge method. We are currently working on two aspects of this problem: First, a systematic determination of GMR compatibility and second the development of a generic, parameterizable merge method. Acknowledgments. S. Babalou is supported by a scholarship from German Academic Exchange Service (DAAD). References 1. F. Duchateau and Z. Bellahsene. Measuring the quality of an integrated schema. In ER, pages 261–273, 2010. 2. E. Jiménez-Ruiz, B. C. Grau, I. Horrocks, and R. Berlanga. Ontology integration using mappings: Towards getting the right logical consequences. In ESWC, 2009. 3. S. P. Ju, H. E. Esquivel, A. M. Rebollar, and M. C. Su. Creado–a methodology to create domain ontologies using parameter-based ontology merging techniques. In MICAI, 2011. 4. M. Mahfoudh, G. Forestier, and M. Hassenforder. A benchmark for ontologies merging assessment. In KSEM, pages 555–566, 2016. 5. N. F. Noy, D. L. McGuinness, et al. Ontology development 101: A guide to creating your first ontology, 2001. 6. N. F. Noy and M. A. Musen. The prompt suite: interactive tools for ontology merging and mapping. IJHCS, 59(6):983–1024, 2003. 7. R. A. Pottinger and P. A. Bernstein. Merging models based on given correspondences. In VLDB Endowment, 2003. 8. M. Poveda-Villalón, M. C. Suárez-Figueroa, and A. Gómez-Pérez. Validating ontologies with oops! pages 267–281, 2012. 9. S. Raunich and E. Rahm. Towards a benchmark for ontology merging. In OTM, volume 7567, pages 124–133, 2012. 10. S. Raunich and E. Rahm. Target-driven merging of taxonomies with atom. Inf. Syst., 42:1–14, 2014. 11. E. Tomita, A. Tanaka, and H. Takahashi. The worst-case time complexity for generating all maximal cliques and computational experiments. TCS, 2006. 3 see also: http://comerger.uni-jena.de/requirement.jsp