Pattern-Guided Association Rule Mining for Complex Ontology Alignment Beatriz Lima, Daniel Faria, and Catia Pesquita LASIGE, Dep. Informática, Faculdade de Ciências, Universidade de Lisboa, Portugal Abstract. Aligning real-world ontology pairs often requires establish- ing complex correspondences, as conceptual differences between them may be too profound to bridge with simple equivalence correspondences. Yet, most ontology alignment algorithms are restricted to finding simple equivalences between ontology entities. This work presents a suite of novel algorithms for Complex Ontology Alignment (COA) that rely on a targeted application of Association Rule Mining (ARM) to known complex alignment patterns. This approach reduces the ARM search space, and enables the application of tailored semantic filtering algorithms for refining the mappings. We evaluated our approach using a pattern-oriented manual method, which yielded a global weighted precision of 75%, but revealed our ap- proach was unable to find mappings for some of the patterns present in the reference. On the other hand, our approach found several mappings for patterns not present in the reference with high weighted precision, highlighting the importance of establishing evaluation metrics that con- sider varying degrees of correctness while being fully automated. Keywords: Ontology Matching · Ontology Alignment · Complex On- tology Matching · Association Rule Mining 1 Introduction Ontology alignment is critical to address the semantic heterogeneity problem, as it finds correspondences that enable integrating data across the Semantic Web. One of its biggest challenges is that ontology schemas often differ conceptually, making it necessary to establish complex correspondences. A complex correspon- dence is an ontology mapping where at least one of the mapped entities is an expression, rather than a simple ontology entity. The expressions used in com- plex mappings include restrictions (e.g. ∀x, y, o1 : M arriedP erson(x) ≡ o2 : hasSpouse(x, y) ∧ o2 : P erson(y)) and constructions using logical operatores (e.g. ∀xo1 : M other(x) ≡ o2 : P arent(x) ∧ o2 : W oman(x)). The relevance of the Complex Ontology Alignment (COA) sub-field has been acknowledged by the Ontology Alignment Evaluation Initiative (OAEI) 1 who Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 1 http://oaei.ontologymatching.org Beatriz Lima, Daniel Faria, and Catia Pesquita introduced a Complex track in 2018 [6]. As of 2020, only three out of twelve par- ticipating systems were able to produce complex mappings (AMLC [2], AROA [9] and CANARD [8]) and their performance was very modest in comparison with the results of simple matching tracks [4]. The strategies employed by these sys- tems can be divided into two categories: lexical and instance-based approaches. AMLC employs a lexical approach, which is inherently limited to finding the subset of complex mappings where there is lexical similarity between all entities mapped. CANARD, AROA and this work employ instance-based approaches, which use statistical and pattern mining techniques over a dataset of individuals shared (or mapped) between the two ontologies. AROA uses an Association Rule Mining (ARM) algorithm, FP-Growth [3], over a transaction database derived from the instance-level triples shared by two ontologies, thus demonstrating how a complex ontology alignment dataset can be transformed into a traditional ARM problem. Predefined complex alignment patterns [5] are then used to filter the generated association rules and produce complex ontology mappings. ARM exploring a shared set of instances is a promising approach. However, the fact that we have prior knowledge of the complex alignment patterns we want to find makes it inefficient to use a “catch-all” ARM algorithm to perform an exhaustive search for frequent itemsets, and only use the knowledge of the patterns a posteriori to filter the rules. Therefore, we propose to invert this paradigm, by using predefined complex alignment patterns to guide ARM. This effectively reduces the search space and allows the application of semantic-based filtering algorithms tailored to each kind of pattern, to select and refine the most relevant mappings. 2 Algorithms Our ontology alignment approach consists of the following steps: 1. An initial ontology loading step retrieves the set of shared individuals be- tween the two ontologies and organises the ontology information (types, re- lations and property values of each individual, ranges and domains of the properties and hierarchical relations between classes) in hash-tables. 2. For each complex alignment pattern, an individual pattern matching algo- rithm iterates through the set of shared individuals, and, for each individual, it searches the hash-table data structures containing the relevant data for the targeted alignment pattern. For each mapping candidate found, we in- crement the support (i.e., the frequency) of the source and target entities in the mapping and the support of the mapping itself (i.e., the fraction of shared individuals that have both the source and target entities). 3. A common ARM matching algorithm is then invoked by each pattern match- ing algorithm to filter mapping candidates by support and confidence, there- fore extracting association rules. 4. Filtering algorithms select which of the candidate mappings to include in the final alignment, excluding redundant mappings and conflicting mappings with lower confidence. An aggregator algorithm combines mappings for the Pattern-Guided Association Rule Mining for Complex Ontology Alignment same entity into a single mapping using logical operators, such as “AND” and “OR”. Our algorithms cover eight distinct complex patterns, from which seven were found in the cmt − conf erence dataset (see Table 1). Additionally, they can produce combinations of these patterns through disjunction and conjunction. 3 Evaluation We integrated our algorithms in the ontology matching system AMLC [1,2], and assessed their performance in the cmt − conf erence alignment 2 by manually classifying the mappings according to a rating scale consisting of the following five categories with associated scores: ◦ Correct [1.0]: The mapping is formally correct (regardless of whether it is present in the reference alignment). ◦ Nearly correct [0.75]: Only minor corrections necessary (e.g., alter the map- ping relation type or substitute a class for its sub- or super-class). ◦ Plausible [0.5]: The mapping seems sensible and no information in the on- tologies or reference alignment contradicts it. ◦ Implausible [0.25]: The mapping seems incorrect and is likely derived from biases in the dataset, but no information in the ontologies or reference align- ment contradicts it. ◦ False [0.0]: The mapping is contradictory to the reference alignment and/or ontologies. Our approach allows for fine tuning of matchers and filters, specific to each pattern, yielding precise results (Table 1). While it was unable to find mappings for some of the patterns present in the reference, it found several mappings for patterns not present in the reference with high weighted precision. 4 Conclusions We developed a novel complex ontology matching method based on pattern- guided ARM, which represents a paradigm shift by making used of the alignment patterns to steer, rather than filter, the ARM process. Our manual evaluation re- vealed that the majority of mappings we found are correct or nearly correct, even if not present in the reference alignment. These results highlight the importance of establishing evaluation metrics that consider varying degrees of correctness while being fully automated. Going forward we will investigate the computa- tional performance of our approach versus classical ARM, and extend the types of patterns it captures. 2 Available at: http://oaei.ontologymatching.org/2020/complex/index.html# popconf; Reference alignments provided by Thiéblin et al. [7] Beatriz Lima, Daniel Faria, and Catia Pesquita Table 1. Pattern-oriented analysis of the results obtained in the cmt − conf erence alignment using the filtered approach. N: number of mappings; Ref: reference align- ment; W: weighted. The total alignment size does not correspond to the sum of pattern occurrences as the same mapping may contain multiple patterns. Ref. Result W.Precision Pattern N N (%) Class - Class 16 10 77.5 Class - cardinality restriction on Object Property 5 22 73.9 Class - someValues restriction on Object Property 4 6 58.3 Class - hasValue restriction on Data Property - - - Class - someValues restriction on Data Property - 1 100 Object Property - Object Property 10 3 75.0 Data Property - Data Property 1 - - Object Property - Data Property - - - Object Property - InverseOf Object Property 2 - - Object Property - Object Property+range restriction - 9 80.6 Object Property - Object Property+domain restriction - 8 78.1 Data Property - Data Property+domain restriction - - - Total alignment 35 51 75.0 Acknowledgments This work was supported by FCT through the LASIGE Research Unit (UIDB/00408/2020 and UIDP/00408/2020). It was also partially supported by the KATY project which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 101017453. References 1. Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I.F., Couto, F.M.: The AgreementMakerLight ontology matching system. In: On the Move to Meaningful Internet Systems: OTM 2013 Conferences. pp. 527–541. Springer Berlin Heidelberg, Berlin, Heidelberg (2013) 2. Faria, D., Pesquita, C., Tervo, T., Couto, F., Cruz, I.: AML and AMLC results for OAEI 2019. vol. 2536, pp. 101–106 (2019) 3. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. p. 1–12. SIGMOD ’00, Association for Computing Machinery (2000) 4. Pour, N., Algergawy, A., Amini, R., Faria, D., Fundulaki, I., Harrow, I., Hertling, S., Jimenez-Ruiz, E., Jonquet, C., Karam, N., et al.: Results of the Ontology Alignment Evaluation Initiative 2020. vol. 2788, pp. 92–138. CEUR-WS (2020) 5. Ritze, D., Meilicke, C., Sváb-Zamazal, O., Stuckenschmidt, H.: A pattern-based ontology matching approach for detecting complex correspondences. vol. 551 (2009) 6. Thiéblin, É., Cheatham, M., Santos, C., Sváb-Zamazal, O., Zhou, L.: The First Version of the OAEI Complex Alignment Benchmark. In: International Semantic Web Conference. vol. 2180. Springer (2018) Pattern-Guided Association Rule Mining for Complex Ontology Alignment 7. Thiéblin, É., Haemmerlé, O., Hernandez, N., Trojahn, C.: Task-oriented complex ontology alignment: Two alignment evaluation sets. In: Gangemi, A., Navigli, R., Vidal, M.E., Hitzler, P., Troncy, R., Hollink, L., Tordai, A., Alam, M. (eds.) The Semantic Web. pp. 655–670. Springer International Publishing, Cham (2018) 8. Thiéblin, E., Haemmerlé, O., Trojahn, C.: CANARD complex matching system: results of the 2019 OAEI evaluation campaign. vol. 2536, pp. 114–122 (2019) 9. Zhou, L., Hitzler, P.: AROA Results for 2020 OAEI. vol. 2788, pp. 161–167 (2020)