ALIN Results for OAEI 2021

Jomar da Silva¹, Kate Revoredo², Fernanda Araujo Baião³, and Cabral Lima¹

¹ Graduate Program in Informatics, Federal University of Rio de Janeiro (UFRJ), Brazil
² Vienna University of Economics and Business, Vienna, Austria
³ Department of Industrial Engineering, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil
jomar.silva@uniriotec.br, kate.revoredo@wu.ac.at, fbaiao@puc-rio.br, cabrallima@ufrj.br

Abstract⁴. Alin is a system for interactive ontology matching. In last year's version, Alin applied natural language processing (NLP) techniques to standardize the concept names of the ontologies that participate in the matching process. In the current version, we modified the grammars used during this process in order to improve the quality of the alignments. This article describes the participation of Alin in OAEI 2021 and discusses its results.

Keywords: ontology matching, WordNet, interactive ontology matching, ontology alignment, interactive ontology alignment, natural language processing

1 Presentation of the system

Due to advances in information and communication technologies, a large number of data repositories have become available. These repositories, however, are highly heterogeneous from a semantic point of view, which hinders their integration. Ontology matching has been successfully applied to this problem: it discovers mappings between two distinct ontologies, each of which conceptually defines the data stored in its repository. The ontology matching process seeks correspondences (mappings) between entities of different ontologies, and it may be performed manually, semi-automatically, or automatically [1]. Among the semi-automatic approaches, the interactive approach, which exploits the knowledge of domain experts by having them participate in the matching process, has stood out [2].
A domain expert is an expensive, scarce, and time-consuming resource; when available, however, this resource has improved the achieved results. Nevertheless, there is still room for improvement [2], as evidenced by the most recent results of the evaluation of interactive tools in the OAEI⁵ (Ontology Alignment Evaluation Initiative). Alin [3] is an interactive ontology matching system that has participated in every OAEI edition since 2016, with results in the Anatomy interactive track improving from year to year.

⁴ Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
⁵ Available at http://oaei.ontologymatching.org/2021/results/interactive/index.htm, last accessed on Oct. 20, 2021.

1.1 State, Purpose and General Statement

Interactive ontology matching systems select mappings for the domain expert to evaluate. Alin selects many of these mappings through semantic and lexical metrics. Because the concept names of the ontologies are not standardized, these metrics may return lower values than they would if the names were standardized, and these lower values may cause Alin not to select the corresponding mappings for evaluation by the domain expert.

In its 2020 version, Alin introduced Natural Language Processing (NLP) resources: regular grammars (in practice, their equivalent regular expressions) and context-free grammars, together with their respective lexical analyzers (scanners) and syntax analyzers (parsers). These resources make it possible to translate the different naming patterns used in the two ontologies into a single one, and this standardization allows Alin to select better mappings for the domain expert to evaluate. To perform the standardization, Alin has a new phase that takes place before the program runs. In this phase, an NLP expert manually develops grammars, and their respective scanners and parsers, for the ontologies.
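As an illustration only, the Python sketch below shows the kind of standardization such a scanner and parser might perform. Everything in it is hypothetical: the function name standardize, the SPELLING_VARIANTS table, and the single "<head> of <modifier>" rewrite rule were invented for this example and are not Alin's actual grammars. A regular expression plays the role of the scanner, and a token-level rewrite stands in for a parser built from a context-free grammar.

```python
import re

# Hypothetical spelling rules an NLP expert might write for one ontology pair.
SPELLING_VARIANTS = {
    "oesophagus": "esophagus",
    "haemoglobin": "hemoglobin",
}

# Scanner: one token per capitalized word, lower-case run, acronym or number,
# so underscores, hyphens, spaces and camelCase boundaries act as separators.
TOKEN_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def standardize(concept_name: str) -> str:
    """Map differently written concept names onto one standard form."""
    tokens = [t.lower() for t in TOKEN_RE.findall(concept_name)]
    # Unify spelling variants token by token.
    tokens = [SPELLING_VARIANTS.get(t, t) for t in tokens]
    # Parser-like rewrite of one naming pattern (hypothetical rule):
    # "<head> of <modifier>" becomes "<modifier> <head>".
    if "of" in tokens:
        i = tokens.index("of")
        if 0 < i < len(tokens) - 1:
            tokens = tokens[i + 1:] + tokens[:i]
    return " ".join(tokens)
```

For example, standardize("Mucosa_of_Oesophagus") and standardize("esophagusMucosa") both return "esophagus mucosa", so a lexical metric applied afterwards compares identical strings instead of two differently written names.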
Alin uses these scanners and parsers during the execution of the program. This new phase is feasible in an interactive ontology matching system because:
1. We know which ontologies will be matched before the program runs, since we need to find experts in the domain of those ontologies to interact with the program;
2. Searching for the expert, meeting them, and scheduling a day on which they are available to participate in the process can take a long time, probably a few days. We can use these days before the execution of the program to develop the necessary grammars, scanners, and parsers for the ontologies.
To test the new techniques, the authors of this paper played the role of the NLP experts.

1.2 Specific techniques used

During its matching process, Alin handles three sets of mappings: (i) Accepted, the mappings that will definitely be retained in the alignment; (ii) Selected, the mappings for which it is still to be decided whether they will be included in the alignment; and (iii) Suspended, the mappings that were previously selected but have been (temporarily or permanently) filtered out of the selected mappings. Given these definitions, the Alin procedure follows five steps:
1. Select mappings: select the initial mappings and automatically accept some of them. The selection and acceptance techniques are explained in the 'Description of the used techniques' paragraph below;
2. Filter mappings: suspend some selected mappings, using lexical criteria;
3. Ask the domain expert: accept or reject selected mappings, according to the domain expert's feedback;
4. Propagate: select new mappings, reject some selected mappings, or unsuspend some suspended mappings (depending on the newly accepted mappings);
5. Go back to step 3 as long as there are undecided selected mappings.
All versions of Alin (since its very first OAEI participation) follow this general procedure.
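The general procedure, together with the step details given in this section, can be summarized in a simplified Python sketch under several assumptions. Mappings are pairs of already standardized concept names; the synonym set, the subconcept dictionaries sub1 and sub2, the threshold value, and all function names are hypothetical; the sum of a character-trigram Jaccard score and a word-token Jaccard score stands in for Alin's combination of the Jaccard, Jaro-Winkler, and n-gram metrics; the only anti-pattern checked is "an entity cannot be equivalent to two entities of the other ontology"; and the selection of data and object property mappings is omitted.

```python
from typing import Callable

# A mapping pairs a standardized concept name of O with one of O'.
Mapping = tuple[str, str]


def _trigrams(s: str) -> set[str]:
    padded = f" {s} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def _jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(m: Mapping) -> float:
    # Stand-in for Alin's metrics: sum of two Jaccard scores (trigrams, words).
    x, y = m
    return (_jaccard(_trigrams(x), _trigrams(y))
            + _jaccard(set(x.split()), set(y.split())))


def conflicts(m: Mapping, accepted: set[Mapping]) -> bool:
    # Anti-pattern: an entity would be equivalent to two entities of the other ontology.
    return any(m[0] == a[0] or m[1] == a[1] for a in accepted)


def interactive_match(candidates: set[Mapping], synonyms: set[Mapping],
                      sub1: dict[str, set[str]], sub2: dict[str, set[str]],
                      ask_expert: Callable[[Mapping], bool],
                      threshold: float = 0.5) -> set[Mapping]:
    # Step 1: accept mappings between synonyms; select the remaining candidates.
    accepted = candidates & synonyms
    selected = {m for m in candidates - accepted if not conflicts(m, accepted)}
    # Step 2: suspend selected mappings with low lexical similarity.
    suspended = {m for m in selected if similarity(m) < threshold}
    selected -= suspended
    # Step 5: repeat steps 3 and 4 while undecided selected mappings remain.
    while selected:
        # Step 3: ask about the highest-scoring selected mapping first.
        m = max(selected, key=similarity)
        selected.remove(m)
        if not ask_expert(m):
            continue  # a rejected mapping is simply discarded
        accepted.add(m)
        # Step 4: unsuspend mappings whose two entities are subconcepts of m's entities...
        revived = {s for s in suspended
                   if s[0] in sub1.get(m[0], set()) and s[1] in sub2.get(m[1], set())}
        suspended -= revived
        # ...and drop selected mappings that would form the anti-pattern with accepted ones.
        selected = {s for s in selected | revived if not conflicts(s, accepted)}
    return accepted
```

Because the next mapping is chosen with max() on every iteration, the expert is always asked about the highest-scoring undecided mapping, even after propagation has added or removed candidates; a pre-sorted queue would need re-sorting after each propagation step to behave the same way.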
In its 2020 version, Alin added a new step in which an NLP expert develops grammars, and their respective scanners and parsers, for the concept names of the ontologies. Alin uses these scanners and parsers to standardize the concept names of the ontologies and thus improve the generated alignment. This new step can, for example, correct spelling errors and unify different spellings of the same concept name. More detailed examples of possible standardizations of concept names are presented in [4]. Alin uses the developed scanners and parsers in step 1 of the procedure.

Description of the used techniques
– Step 1. Alin runs the scanners and parsers on each concept name of the ontologies, modifying and standardizing it. Alin uses a blocking strategy that discards all data properties and object properties of the ontologies, so in this step it selects only concept mappings, based on linguistic similarities between the standardized concept names. Alin automatically accepts concept mappings whose standardized names are synonyms. To find synonyms between entities, Alin uses WordNet and domain-specific ontologies (the FMA ontology in the Anatomy track).
– Step 2. Alin suspends the selected mappings whose entities have low lexical similarity. We use the Jaccard, Jaro-Winkler, and n-gram lexical metrics to compute the lexical similarity of the selected mappings; these metrics were chosen based on their performance in the assessment reported in [5]. Suspended mappings can later be unsuspended, returning to the status of selected mappings, as proposed in [6].
– Step 3. At this point, the interaction with the domain expert begins. Alin sorts the selected mappings in descending order of the sum of their similarity metric values and submits them to the domain expert in that order.
– Step 4. Initially, the set of selected mappings contains only concept mappings.
At each interaction with the domain expert, if the expert accepts the mapping, Alin (i) removes from the set of selected mappings all the mappings that, together with the accepted mappings, instantiate a mapping anti-pattern [7][8] (mapping anti-patterns are explained in the 'Mapping anti-patterns' paragraph below); (ii) selects the data property mappings (as proposed in [9]) and object property mappings related to the accepted concept mappings; and (iii) unsuspends all concept mappings whose two entities are subconcepts of the respective concepts of an accepted mapping (as proposed in [6]).
– Step 5. The interaction phase continues until there are no selected mappings left.

Mapping anti-patterns. A mapping anti-pattern is a combination of mappings that generates a problematic alignment, i.e., one that causes a logical inconsistency or violates a constraint. The violated constraint can be a construction constraint of an ontology or a constraint on the alignment. For example, an ontology may have the construction constraint that a concept cannot be equivalent to its superconcept, and an alignment between two ontologies may have the constraint that an entity of ontology O cannot be equivalent to two entities of ontology O′.

In its 2021 version, we modified the grammars used by Alin to improve the quality of the generated alignments.

1.3 Link to the system and parameters file

Alin is available⁶ as a package to be run through the SEALS client.

2 Results

Interactive ontology matching is the focus of the Alin system. The comparison between the participation of Alin in 2021 and in 2020 (Tables 4 and 5) shows an improvement in the quality of the generated alignment, which supports the effectiveness of the changes made to the grammars.

2.1 Comments on the participation of Alin in non-interactive tracks
The modified grammars generated a minor increase in the F-measure of the non-interactively generated alignments in the Anatomy track: precision decreased slightly, while recall increased (Table 1). The current version of Alin did not participate in the Conference track.

⁶ https://drive.google.com/file/d/1SxJL6fLRVqI84epm8DbAMlcscEoGbgZ/view?usp=sharing

Table 1. Participation of Alin in the Anatomy non-interactive track in OAEI 2020 [10] and OAEI 2021 [11]

Year   Precision   Recall   F-measure
2020   0.986       0.72     0.832
2021   0.983       0.726    0.835

2.2 Comments on the participation of Alin in interactive tracks

In the Anatomy track, Alin obtained a higher F-measure than LogMap but required more expert requests, and it was outperformed by AML on both criteria (Table 2). In the Conference track, Alin ranked first in quality (F-measure) and third in number of requests, i.e., it required the most requests among the three systems (Table 3).

Table 2. Participation of Alin in the Anatomy interactive track, OAEI 2021 [12], error rate 0.0

Tool     Precision   Recall   F-measure   Total requests
ALIN     0.986       0.887    0.934       404
AML      0.972       0.933    0.952       189
LogMap   0.988       0.846    0.912       388

Table 3. Participation of Alin in the Conference interactive track, OAEI 2021 [12], error rate 0.0

Tool     Precision   Recall   F-measure   Total requests
ALIN     0.916       0.718    0.799       281
AML      0.91        0.698    0.79        220
LogMap   0.886       0.61     0.723       82

2.3 Participation of Alin in OAEI 2021 and OAEI 2020

Alin improved the quality of the generated alignment in both the Anatomy and the Conference tracks, with a larger improvement in Anatomy (F-measure from 0.917 to 0.934, versus 0.796 to 0.799 in Conference). The number of interactions also increased in both tracks (Tables 4 and 5). The quality of the alignment generated by Alin depends on correct feedback from the domain expert, because Alin uses this feedback to select new mappings: erroneous feedback leads Alin to select wrong mappings, which tends to decrease the quality of the generated alignment. As Table 6 shows, with an expert error rate of 0.1 the F-measure declines more sharply in the current version than in the 2020 version: in Anatomy it drops from 0.934 to 0.899 (versus 0.917 to 0.887 in 2020), and in Conference from 0.799 to 0.698 (versus 0.796 to 0.713).
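The error rates in Table 6 refer to simulated expert feedback. As a hedged illustration, the Python sketch below models an erroneous expert by assuming that each answer is independently wrong with probability equal to the error rate (an assumed error model, not a description of the OAEI oracle's code), and computes the F-measure reported in the tables as the harmonic mean of precision and recall. The names noisy_oracle and f_measure are hypothetical.

```python
import random
from typing import Callable

Mapping = tuple[str, str]


def noisy_oracle(reference: set[Mapping], error_rate: float,
                 rng: random.Random) -> Callable[[Mapping], bool]:
    """Simulated expert: consults the reference alignment, but each answer is
    flipped with probability error_rate (assumed error model)."""
    def ask(m: Mapping) -> bool:
        correct = m in reference
        # rng.random() is in [0, 1): error_rate 0.0 never flips, 1.0 always flips.
        return (not correct) if rng.random() < error_rate else correct
    return ask


def f_measure(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, f_measure(0.986, 0.887) rounds to 0.934, the value reported for ALIN in Table 2. Under this model, every wrong "yes" adds a false positive that Alin may then use to select further mappings, which is the mechanism the paragraph above describes.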
The run time of Alin 2021 was shorter than last year's (Table 7), but the other systems had a similar reduction, so this reduction may be due to changes in the computing environment rather than to changes in the program.

Table 4. Participation of ALIN in the Anatomy interactive track, OAEI 2016 [13], 2017 [14], 2018 [15], 2019 [16], 2020 [10], and 2021 [12], error rate 0.0

Year   Precision   Recall   F-measure   Total requests
2016   0.993       0.749    0.854       803
2017   0.993       0.794    0.882       939
2018   0.994       0.826    0.902       602
2019   0.979       0.85     0.91        365
2020   0.988       0.856    0.917       360
2021   0.986       0.887    0.934       404

Table 5. Participation of ALIN in the Conference interactive track, OAEI 2016 [13], 2017 [14], 2018 [15], 2019 [16], 2020 [10], and 2021 [12], error rate 0.0

Year   Precision   Recall   F-measure   Total requests
2016   0.957       0.735    0.831       326
2017   0.957       0.731    0.829       329
2018   0.921       0.721    0.809       276
2019   0.914       0.695    0.79        228
2020   0.915       0.705    0.796       233
2021   0.916       0.718    0.799       281

3 General comments

The analysis of the OAEI 2021 results showed that Alin 2021 improved the quality of the generated alignment in the interactive track. However, an increase in the expert error rate caused a sharper fall in alignment quality than in the 2020 version (Table 6). The number of interactions with the expert increased (from 233 to 281 requests in the Conference track and from 360 to 404 requests in the Anatomy track). Finally, Alin had a shorter run time, as did all other participants (Table 7), so the reduction seems to be caused by changes in the computing environment rather than by changes made to the program.

Table 6. F-measure of ALIN in the Anatomy and Conference interactive tracks, OAEI 2020 [10] and OAEI 2021 [12], with different error rates

Track        Year   Error rate 0.0   Error rate 0.1
Anatomy      2020   0.917            0.887
Anatomy      2021   0.934            0.899
Conference   2020   0.796            0.713
Conference   2021   0.799            0.698

Table 7. Run time (s) in the Anatomy and Conference interactive tracks, OAEI 2020 [10] and OAEI 2021 [12]

Track        Tool     2020    2021
Anatomy      ALIN     1152    880.66
Anatomy      AML      37.3    16.101
Anatomy      LogMap   7.6     4.209
Conference   ALIN     136.9   82.381
Conference   AML      30.1    20.727
Conference   LogMap   37.96   23.999

3.1 Conclusions

This paper presented the newest version of Alin (Alin 2021), an interactive matching tool that makes use of natural language processing (NLP) resources, and the results it obtained in OAEI 2021. Compared to its previous version, Alin 2021 uses modified grammars in the NLP process to achieve a higher level of standardization of the terms used by the ontologies being aligned. These grammar modifications were tested on the Anatomy and Conference tracks and proved effective in increasing the quality of the generated alignment: with an expert error rate of zero, Alin obtained the highest F-measure in the Conference interactive track. In future versions, we intend to implement further grammar changes to improve the quality of the generated alignment.

References

1. Euzenat, J., Shvaiko, P.: Ontology Matching - Second Edition. Springer-Verlag (2013)
2. Li, H., Dragisic, Z., Faria, D., Ivanova, V., Jiménez-Ruiz, E., Lambrix, P., Pesquita, C.: User validation in ontology alignment: functional assessment and impact. The Knowledge Engineering Review (2019)
3. Da Silva, J., Revoredo, K., Baião, F., Euzenat, J.: Alin: improving interactive ontology matching by interactively revising mapping suggestions. The Knowledge Engineering Review 35 (2020)
4. Real, F.J.Q., Bella, G., McNeill, F., Bundy, A.: Using domain lexicon and grammar for ontology matching. (2020)
5. Cheatham, M., Hitzler, P.: String similarity metrics for ontology alignment. In: Proceedings of the 12th International Semantic Web Conference - Part II. ISWC ’13, New York, NY, USA, Springer-Verlag New York, Inc. (2013) 294–309
6. Silva, J., Baião, F., Revoredo, K., Euzenat, J.: Semantic interactive ontology matching: Synergistic combination of techniques to improve the set of candidate correspondences. In: OM-2017: Proceedings of the Twelfth International Workshop on Ontology Matching. Volume 2032. (2017) 13–24
7. Guedes, A., Baião, F., Shivaprabhu, Revoredo, R.: On the Identification and Representation of Ontology Correspondence Antipatterns. In: Proc. 5th Int. Conf. Ontol. Semant. Web Patterns (WOP’14), CEUR Work. Proc. (2014)
8. Guedes, A., Baião, F., Revoredo, K.: Digging Ontology Correspondence Antipatterns. In: Proceeding WOP’14 Proc. 5th Int. Conf. Ontol. Semant. Web Patterns. Volume 1032. (2014) 38–48
9. Silva, J., Revoredo, K., Baião, F.A., Euzenat, J.: Interactive Ontology Matching: Using Expert Feedback to Select Attribute Mappings. In: CEUR Workshop Proceedings. Volume 2288. (2018) 25–36
10. Silva, J., Delgado, C., Revoredo, K., Baião, F.: Alin results for OAEI 2020. In: Proceedings of the 15th International Workshop on Ontology Matching. OM’20 (2020) 139–146
11. Results for OAEI 2021 - Anatomy track. http://oaei.ontologymatching.org/2021/results/anatomy/index.html Accessed: 2021-10-19.
12. Results for OAEI 2021 - Interactive track. http://oaei.ontologymatching.org/2021/results/interactive/index.htm Accessed: 2021-10-19.
13. Silva, J., Baião, F., Revoredo, K.: Alin results for OAEI 2016. In: OM-2016: Proceedings of the Eleventh International Workshop on Ontology Matching. OM’16 (2016) 130–137
14. Silva, J., Baião, F., Revoredo, K.: Alin results for OAEI 2017. In: OM-2017: Proceedings of the Twelfth International Workshop on Ontology Matching. OM’17 (2017) 114–121
15. Silva, J., Baião, F., Revoredo, K.: Alin results for OAEI 2018. In: Ontology Matching: OM-2018: Proceedings of the ISWC Workshop. OM’18 (2018) 117–124
16. Silva, J., Delgado, C., Revoredo, K., Baião, F.: Alin results for OAEI 2019. In: Proceedings of the 14th International Workshop on Ontology Matching. OM’19 (2019) 94–100