LSMatch Results for OAEI 2021

Abhisek Sharma¹, Archana Patel², and Sarika Jain¹

¹ Department of Computer Applications, National Institute of Technology Kurukshetra, India
{abhisek 61900048, jasarika}@nitkkr.ac.in
² Department of Software Engineering, Eastern International University, Vietnam
archanamca92@gmail.com

Abstract. This paper presents the Large Scale Ontology Matching System (LSMatch) and its results on the OAEI 2021 datasets. LSMatch is an element-level, label-based ontology matching system that uses a string similarity matcher and a synonym matcher. The current version of the system focuses on finding similarities between the classes of two ontologies. This is LSMatch's first participation in the OAEI campaign, covering five tracks: Anatomy, Conference, Disease and Phenotype, Common Knowledge Graphs, and Knowledge Graph. LSMatch shows promising results in all five tracks, most notably in precision. We also discuss the strengths and weaknesses of the system.

Keywords: Ontology Matching · Knowledge Schema · Alignment · String similarity · Synonym matcher.

1 Presentation of the system

1.1 State, purpose, general statement

LSMatch (Large Scale Ontology Matching System) is an ontology matching system that exploits lexical properties to find correspondences between ontologies. It uses the Levenshtein string similarity measure and a synonym matcher, which draws on background knowledge containing synonyms to identify concepts that are similar in meaning but have different lexical representations [1]. This is LSMatch's first OAEI participation, and it was tested on five tracks: Anatomy, Conference, Disease and Phenotype, Common Knowledge Graphs, and Knowledge Graph. LSMatch was wrapped using the MELT framework [2]. It performs on par with several of the other systems and achieves the best precision in the Anatomy and Conference tracks.

Copyright © 2021 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1.2 Specific techniques used

The current version of LSMatch addresses monolingual ontology alignment, i.e., the concepts of both ontologies are in the same language, English [3]. We refer to an ontology as a knowledge schema (KS) because LSMatch matches classes only. Fig. 1 shows how the system works. We introduce its parts by following two knowledge schemas (KS1 and KS2) from input to the final set of alignments. LSMatch accepts input in any format, and its loaders load the input knowledge schemas (KS1 and KS2) as RDF graphs.

Fig. 1. Overview of LSMatch system

– Levenshtein matcher: LSMatch uses a string similarity matcher that calculates the Levenshtein distance between concepts [4]. A concept is represented either by its rdfs:label or directly by its class name in the ontology. The Levenshtein distance is defined as "The smallest number of insertions, deletions, and substitutions required to change one string or tree into another" [5].
– Background knowledge [6]: To identify different lexical representations, LSMatch uses a synonym matcher that fetches synonyms from thesaurus.com through its API [7]. So that synonyms are immediately available at matching time, we pre-fetch them and keep them in JSON format.
– Synonym matcher: LSMatch fetches synonyms from thesaurus.com. Although the synonyms are pre-fetched, during execution each concept is checked to see whether its synonyms are present. If a concept has no pre-fetched synonyms, we fetch them on the fly.

LSMatch uses a dictionary to store and retrieve alignments. In the dictionary, we store information as key-value pairs in which the key is hashed [8, 9].
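As a rough illustration of the two matchers and the hashed dictionary described in this subsection, the following Python sketch computes a Levenshtein-based similarity, consults a pre-fetched synonym table, and stores pair scores in a dictionary keyed by label pairs. It is not LSMatch's actual code. The function names, the normalisation of the distance by the length of the longer label, the score of 1.0 for a synonym hit, and the use of the higher of the two matcher scores as the combined score are all assumptions made for this example. Only the retention threshold of 0.5 and the final threshold of 0.95 are taken from the system description.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple


def levenshtein_distance(a: str, b: str) -> int:
    """Smallest number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # drop a character
                               current[j - 1] + 1,       # add a character
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer label."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def synonym_similarity(a: str, b: str, synonyms: Mapping[str, Set[str]]) -> float:
    """Assumed rule: 1.0 if either label is listed as a synonym of the other."""
    a, b = a.lower(), b.lower()
    if b in synonyms.get(a, set()) or a in synonyms.get(b, set()):
        return 1.0
    return 0.0


def match_labels(labels1: Iterable[str],
                 labels2: Iterable[str],
                 synonyms: Mapping[str, Set[str]],
                 keep_threshold: float = 0.5) -> Dict[Tuple[str, str], float]:
    """Score every label pair and keep those at or above keep_threshold."""
    labels2 = list(labels2)
    scores: Dict[Tuple[str, str], float] = {}
    for label1 in labels1:
        for label2 in labels2:
            # Assumed combination: the higher of the two matcher scores.
            score = max(levenshtein_similarity(label1, label2),
                        synonym_similarity(label1, label2, synonyms))
            if score >= keep_threshold:
                scores[(label1, label2)] = score  # tuple key is hashed by dict
    return scores


def select_final(scores: Mapping[Tuple[str, str], float],
                 final_threshold: float = 0.95) -> Dict[Tuple[str, str], float]:
    """Final selection of alignments at the stricter threshold."""
    return {pair: s for pair, s in scores.items() if s >= final_threshold}
```

Keying the dictionary by the label pair means that looking up or overwriting the score of a pair takes constant time on average, which matters when scores are updated repeatedly during matching.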
LSMatch stores the alignments received from both matchers along with their similarity scores. The scores of pairs are stored and updated multiple times during the alignment process, and hashed keys allow us to do this efficiently. By default, LSMatch keeps all alignments with a combined score (Levenshtein + Synonym) of 0.5 or above so that the alignments can be checked over variable thresholds. For the final selection of alignments, the current version of LSMatch uses a threshold of 0.95.

2 Results

This section describes the results of the LSMatch system on five tracks: Anatomy, Conference, Disease and Phenotype, Common Knowledge Graphs, and Knowledge Graph.

2.1 Anatomy

The Anatomy track consists of finding alignments between the Adult Mouse Anatomy and the part of the NCI Thesaurus that describes human anatomy. A 16 GB machine was used for the evaluation (see footnote 1). Table 1 shows the performance of LSMatch on the Anatomy track. LSMatch generated 940 correspondences in total, of which 937 were true positives and 3 were false positives. In the Anatomy track, LSMatch achieves the best precision, tied with the baseline matcher.

2.2 Conference

The Conference track contains 16 ontologies from the same domain (conference organization). Seven ontologies are involved in the reference alignment: Cmt, ConfTool, Edas, Ekaw, Iasted, Sigkdd, and Sofsem. Table 2 shows the performance of LSMatch on the Conference track. LSMatch generated 147 correspondences in total, of which 129 were true positives and 18 were false positives. In the Conference track, LSMatch achieved the best precision. In recall, LSMatch is on par with other systems such as AMD and the baseline matcher.

2.3 Disease and Phenotype

This track is based on a real use case: finding alignments between disease and phenotype ontologies.
Specifically, the selected ontologies are the Human Phenotype Ontology (HPO), the Mammalian Phenotype Ontology (MP), the Human Disease Ontology (DOID), and the Orphanet and Rare Diseases Ontology (ORDO). Table 3 shows the performance of LSMatch on the Disease and Phenotype track as obtained in our own runs (official results were not available on the OAEI website at the time of submission of this paper). For DOID-ORDO, LSMatch generated 1193 correspondences in total, of which 1178 were true positives and 15 were false positives. For HP-MP, LSMatch generated 685 correspondences in total, of which 683 were true positives and 2 were false positives.

Footnote 1: http://oaei.ontologymatching.org/2021/results/anatomy/index.html

Table 1. Results of LSMatch on Anatomy track 2021

Matcher      Precision  Recall  F1     Runtime (sec)
LSMatch      0.997      0.618   0.763  98
ALIN         0.983      0.726   0.835  2190
ATMatcher    0.978      0.669   0.794  146
LogMapLite   0.962      0.728   0.828  2
AMD          0.96       0.739   0.835  3
Wiktionary   0.956      0.753   0.843  493
AML          0.956      0.927   0.941  32
Fine-TOM     0.933      0.808   0.866  15068
LogMap       0.917      0.848   0.881  7
TOM          0.916      0.794   0.851  2647
Lily         0.901      0.902   0.901  430
LogMapBio    0.874      0.914   0.894  1043
ALOD2Vec     0.828      0.766   0.796  261
OTMapOnto    0.646      0.811   0.72   16
Baseline     0.997      0.622   0.766  -

Table 2. Results of LSMatch on Conference track 2021

Matcher      Precision  Recall  F1
LSMatch      0.83       0.41    0.55
ATMatcher    0.69       0.51    0.59
LogMapLite   0.68       0.47    0.56
AMD          0.81       0.41    0.54
Wiktionary   0.66       0.53    0.59
AML          0.78       0.62    0.69
Fine-TOM     0.64       0.53    0.58
LogMap       0.76       0.56    0.64
TOM          0.69       0.48    0.57
Lily         0.62       0.43    0.51
ALOD2Vec     0.64       0.49    0.56
OTMapOnto    0.22       0.7     0.33
Baseline     0.76       0.41    0.53

Table 3. The results of Disease and Phenotype track.

Test Case  Precision    Recall       F1           Runtime (sec)
doid-ordo  0.987426655  0.952303961  0.969547325  2004
hp-mp      0.997080292  0.981321839  0.989138306  2483

2.4 Common Knowledge Graphs

This track evaluates the ability of matching systems to map the schema (classes) of large common knowledge graphs such as DBpedia, YAGO, and NELL. Publicly available knowledge graphs are highly complementary and are known for sharing data about real-world entities such as persons, organizations, and places. The goal of this task is to align classes from highly influential, domain-independent knowledge graphs. Table 4 shows the performance of LSMatch on the Common Knowledge Graphs track. The evaluation was executed on a Linux virtual machine with 128 GB of RAM and 16 vCPUs (2.4 GHz) (see footnote 2). LSMatch generated 102 correspondences in total, of which 101 were true positives and 1 was a false positive.

Footnote 2: http://oaei.ontologymatching.org/2021/results/commonKG/index.html

Table 4. The results of Common Knowledge Graphs track 2021

Matcher     Precision  Recall  F1    Runtime (sec)
LSMatch     0.99       0.78    0.87  1005
AML         0.0        0.0     0.0   319
LogMap      0.99       0.80    0.88  199
ALOD2Vec    1.0        0.80    0.89  253
OTMapOnto   0.90       0.84    0.87  496
KGMatcher   0.97       0.91    0.94  6935
Wiktionary  1.0        0.80    0.89  272
AMD         0.0        0.0     0.0   1107
ATMatcher   1.0        0.80    0.89  196
Baseline    1.0        0.60    0.75  37

2.5 Knowledge Graph

The Knowledge Graph track contains isolated knowledge graphs with instance and schema data. The goal of the task is to match both the instances and the schema. The knowledge graphs were created in the course of the DBkWik project by running the DBpedia extraction framework on wikis from the Fandom wiki hosting platform. Table 5 shows the performance of LSMatch on the Knowledge Graph track. The evaluation was executed on a virtual machine (VM) with 32 GB of RAM and 16 vCPUs (2.4 GHz) running the Debian 9 operating system (see footnote 3). Because LSMatch targets class matching only, it returned correspondences for classes only in the Knowledge Graph track.

Table 5.
The results of Knowledge Graph track 2021 (classes only)

Matcher           Precision  Recall  F1
LSMatch           1.0        0.64    0.78
ALOD2Vec          1.0        0.67    0.80
AMD               0.40       0.18    0.25
AML               0.98       0.81    0.89
ATMatcher         0.97       0.79    0.87
BaselineAltLabel  1.0        0.59    0.74
BaselineLabel     1.0        0.59    0.74
Fine-TOM          1.0        0.66    0.80
KGMatcher         1.0        0.66    0.79
LogMap            0.93       0.71    0.81
OTMapOnto         0.59       0.64    0.61
TOM               1.0        0.71    0.83
Wiktionary        1.0        0.67    0.80

Footnote 3: http://oaei.ontologymatching.org/2021/results/knowledgegraph/index.html

3 General Comments

The results show that LSMatch performs on par with many of the systems tested at OAEI and outperforms some of them in certain cases. The system achieves good precision in all the tracks. The current version falls short in recall, which lowers the F1 score. Future iterations of the system will target improving these measures.

The system has considerable room for improvement. Multiple matchers could be employed together to find missed and tricky alignments. In future versions, along with thesaurus.com, we could use knowledge bases such as DBpedia, YAGO, or Wikidata as background knowledge, which can give more insight into the concepts represented in the ontologies and thus yield better alignments.

4 Conclusions

LSMatch performs well on multiple tracks. This year, the system was tested on five tracks: Anatomy, Conference, Disease and Phenotype, Common Knowledge Graphs, and Knowledge Graph. The system achieved good precision in all the tracks but lagged behind in recall. In future versions, we will add a set of matchers and work on making better use of background knowledge, so that we can find correspondences between concepts that string similarity measures alone do not align properly.

References

1. Zhang, S., Hu, Y., and Bian, G. (2017, March).
Research on string similarity algorithm based on Levenshtein Distance. In 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) (pp. 2247-2251). IEEE.
2. Hertling, S., Portisch, J., and Paulheim, H. (2019, September). MELT - Matching EvaLuation Toolkit. In International Conference on Semantic Systems (pp. 231-245). Springer, Cham.
3. Shvaiko, P., and Euzenat, J. (2011). Ontology matching: state of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, 25(1), 158-176.
4. Nguyen, T. T. A., and Conrad, S. (2015, November). Ontology matching using multiple similarity measures. In 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K) (Vol. 1, pp. 603-611). IEEE.
5. https://xlinux.nist.gov/dads/HTML/Levenshtein.html
6. Aleksovski, Z., Ten Kate, W., and Van Harmelen, F. (2006, November). Exploiting the Structure of Background Knowledge Used in Ontology Matching. In Ontology Matching (p. 13).
7. thesaurus.com, URL: https://www.thesaurus.com/
8. Ochieng, P., and Kyanda, S. (2018). Large-scale ontology matching: state-of-the-art analysis. ACM Computing Surveys (CSUR), 51(4), 1-35.
9. Anam, S., Kim, Y. S., Kang, B. H., and Liu, Q. (2015). Review of ontology matching approaches and challenges. International Journal of Computer Science and Network Solutions, 3(3), 1-27.