1.2. Specific techniques used

LSMatch and LSMatch-Multilingual Results for OAEI 2022

Abhisek Sharma

abhisek_61900048@nitkkr.ac.in 1

Archana Patel

archana.patel@eiu.edu.vn 0

Sarika Jain

jasarika@nitkkr.ac.in 1 0 Eastern International University , Vietnam 1 National Institute of Technology Kurukshetra , India

2022

The Large-Scale Ontology Matching System (LSMatch and LSMatch-Multilingual) and its findings using OAEI 2022 datasets are presented in this paper. A string similarity and synonyms matcher is used in the element-level and label-based ontology matching system called LSMatch. Same configuration in addition with MyMemory translation memory is used in the creation of multilingual capable system called LSMatch-Multilingual. The system(s) is/are capable of identifying classes, instances, and properties (both in monolingual and multilingual settings) between two ontologies. This year LSMatch and LSMatchMultilingual are collectively participating on OAEI's six tracks-Anatomy, Conference, Multifarm, Bio-ML, Common Knowledge Graphs, and Knowledge Graph. LSMatch has shown encouraging outcomes across all six tracks.

1.2. Specific techniques used

The current version of LSMatch (as compared to last year’s submission) is now capable addresses both monolingual and multilingual ontology alignments. The working of the LSMatch system is shown in figure 1. We introduce the multiple parts of the system by taking two Knowledge schemas/ontologies. LSMatch system takes input in any format and loads the input schemas/ontologies as RDF graphs. After extracting classes, properties, and instances we perform stemming, removing stopwords and non-alphabetic characters, and normalizing letters. Then we pass the ontology concepts from Levenshtein and synonyms matcher modules. The underline modules have following functionality:

Input Layer Pre-processing

Source Ontology Target Ontology Loading ontology as

Graph object Extracting Concepts/ Properties/Instances Text Preprocessing

Processing

Levenshtein

Matcher Synonyms Matcher Similarity Matrix Alignment Filtering

Output Layer

Final Alignment

Evaluation

Pre. Recall F1 External Resources /

Synonym Source

Translations • Levenshtein matcher: The LSMatch uses a string similarity matcher that calculates Levenshtein distance between the concepts [3]. The concepts are represented as rdfs:label or directly as the class name in the ontologies. The oficial definition of Levenshtein distance is stated as “The smallest number of insertions, deletions, and substitutions required to change one string or tree into another”1. • Background knowledge [4]: To identify diferent lexical representations, LSMatch uses a synonym matcher that fetches synonyms Wordnet [5]. Python’s nltk library is used for wordnet inclusion. • Synonym Matcher: LSMatch fetches synonyms from wordnet. Although we have prefetched the synonyms but during the execution, the concepts are cross-checked whether the synonyms for every concept are present or not. If some concept doesn’t have synonyms pre-fetched for it, we fetch them on the fly. 1https://xlinux.nist.gov/dads/HTML/Levenshtein.html • Translations2: for translations we have used MyMemory’s translations memory as its provide good translations, is free, and is the world’s largest Translation Memory.

For the purpose of storage and retrieval of alignments LSMatch uses dictionary. In the dictionary, we store information as <key, value> pairs where key is hashed [6, 7]. LSMatch stores the alignments received from both the matchers along with the similarity score. We target storing and updating the scores of pairs multiple times during the alignment process and having hashed keys allow us to do that eficiently. By default, LSMatch keeps all the alignments with a combined score (Levenshtein + Synonym) of 0.5 or above to check the alignments over variable thresholds. For the final selection of alignments the current version of LSMatch has used 0.95 as the threshold. 2. Results

2.1. Anatomy 2.3. Multifarm 2.4. Bio-ML

This section describes the results of the LSMatch and LSMatch-multilingual system collectively on six tracks namely: Anatomy, Conference, Multifarm, Bio-ML, Common Knowledge Graphs, and Knowledge Graph. The results are presented collectively in Table 1. Diferences from OAEI2021 are discussed in the subsections below.

In anatomy overall result is almost same as last year with 2% improvement in recall, though overall F-measure got afected and it decreased by 0.2%.

2.2. Conference

For conference track the result are exactly same as last year as due to some error we had to use the last year’s LSMatch for this track, because of which the results are identical. This is the first entry of LSMatch in Multifarm track. For this track we specifically developed LSMatch-multilingual. Though both the versions of LSMatch were tested on Multifarm track, LSMatch-multilingual obtained best F1-score among all the systems with 0.47 (see Table 2 for comparative results).

The Bio-ML track is Machine Learning (ML) friendly Biomedical track. This track supersedes the previous largebio and phenotype tracks. There are 5 tasks in total (on which LSMatch was tested), all Equivalent matching have been performed with 5 ontology pairs, OMIN-ORDO(Disease), NCIT-DOID(Disease), SNOMED-FMA(Body), SNOMED-NCIT(Pharm), and SNOMED-NCIT(Neoplas). On OMIN-ORDO(Disease) and NCIT-DOID(Disease) LSMatch 2https://mymemory.translated.net/ got average results. On SNOMED-FMA(Body), LSMatch has 6th best precision out of 9. On SNOMED-NCIT(Pharm) and SNOMED-NCIT(Neoplas), LSMatch has 2nd best precision just after LogMap-Lite. All the above stated resutls are on Unsupervised (90% Test Mapping). For Semi-supervised(70% Test Mappings), LSMatch has average performance in all tasks.

2.5. Common Knowledge Graphs

This year common Knowledge Graph track has one more task, namely Yago-Wikidata where LSMatch’s performance was decent though need improvement. In Nell-DBPedia task, LSMatch has almost similar result to last year.

2.6. Knowledge Graph

In OAEI 2021 LSMatch only supported class matching, this year (OAEI 2022) LSMatch had added functionality to also match instance and properties. Class matching results this year are same as last year, with this year’s property and instance matching overall result was 0.66, 0.63, and 0.61 precision, F1, and recall respectively. Which last year was 1, 0.01, and 0. 3. Conclusion This year, the system was tested on six tracks, i.e., Anatomy, Conference, Multifarm, Bio-ML, Common Knowledge Graphs, and Knowledge Graph. The system achieved considerably good precision in all the tracks but lacked behind in recall. In future versions, we will be adding a set of matchers and working to improve the utilization of background knowledge by which we can ifnd better correlations between concepts that are not properly aligned using just the lexical measures. [3] T. T. A. Nguyen, S. Conrad, Ontology matching using multiple similarity measures, in: 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), volume 1, IEEE, 2015, pp. 603–611. [4] Z. Aleksovski, W. Ten Kate, F. Van Harmelen, Exploiting the structure of background knowledge used in ontology matching., in: Ontology Matching, 2006, p. 13. [5] G. A. Miller, Wordnet: a lexical database for english, Communications of the ACM 38 (1995) 39–41. [6] P. Ochieng, S. Kyanda, Large-scale ontology matching: State-of-the-art analysis, ACM

Computing Surveys (CSUR) 51 (2018) 1–35. [7] S. Anam, Y. S. Kim, B. H. Kang, Q. Liu, Review of ontology matching approaches and challenges, International Journal of Computer Science and Network Solutions 3 (2015) 1–27.

[1]

Zhang ,

Hu , G. Bian, Research on string similarity algorithm based on levenshtein distance , in: 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) , IEEE, 2017 , pp. 2247 - 2251 .

[2]

Hertling ,

Portisch ,

Paulheim , Melt-matching evaluation toolkit , in: International conference on semantic systems , Springer, Cham, 2019 , pp. 231 - 245 .