<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LSMatch and LSMatch-Multilingual Results for OAEI 2022</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Abhisek Sharma</string-name>
          <email>abhisek_61900048@nitkkr.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Archana Patel</string-name>
          <email>archana.patel@eiu.edu.vn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sarika Jain</string-name>
          <email>jasarika@nitkkr.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eastern International University</institution>
          ,
          <country country="VN">Vietnam</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Institute of Technology Kurukshetra</institution>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>This paper presents the Large-Scale Ontology Matching System (LSMatch), its multilingual variant LSMatch-Multilingual, and their results on the OAEI 2022 datasets. LSMatch is an element-level, label-based ontology matching system that combines a string similarity matcher with a synonym matcher. LSMatch-Multilingual uses the same configuration together with the MyMemory translation memory to support multilingual matching. Both systems can match classes, instances, and properties between two ontologies, in monolingual and multilingual settings. This year LSMatch and LSMatch-Multilingual collectively participate in six OAEI tracks: Anatomy, Conference, Multifarm, Bio-ML, Common Knowledge Graphs, and Knowledge Graph. LSMatch has shown encouraging results across all six tracks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1.2. Specific techniques used</title>
      <p>Compared to last year’s submission, the current version of LSMatch addresses
both monolingual and multilingual ontology alignment. The working of the LSMatch system
is shown in Figure 1. We introduce the parts of the system by taking two knowledge
schemas/ontologies as input. LSMatch accepts input in any format and loads the input
schemas/ontologies as RDF graphs. After extracting classes, properties, and instances, we preprocess
the text by stemming, removing stopwords and non-alphabetic characters, and normalizing letters.
We then pass the ontology concepts through the Levenshtein and synonym matcher modules. The
underlying modules have the following functionality:</p>
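      <p>The text preprocessing step described above can be illustrated with a minimal Python sketch. It is not the LSMatch implementation: the stopword list, the camel-case splitting, and the crude_stem helper are assumptions made for illustration, and a real stemmer (for example one from nltk) would replace the rough suffix stripping shown here.</p>
      <preformat><![CDATA[
```python
import re
import unicodedata
from typing import List

# Assumption: a tiny illustrative stopword list; the paper does not list the one LSMatch uses.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to"}


def split_label(label: str) -> List[str]:
    """Split a label or local name on camelCase boundaries and non-alphabetic characters."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)
    return re.findall(r"[A-Za-z]+", spaced)


def crude_stem(token: str) -> str:
    """Very rough suffix stripping; a stand-in for a real stemmer, not a faithful one."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(label: str) -> str:
    """Normalize letters, drop stopwords and non-alphabetic characters, then stem."""
    # Normalizing letters: strip accents, then lowercase each token.
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    tokens = [t.lower() for t in split_label(ascii_label)]
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(crude_stem(t) for t in tokens)


print(preprocess("Reviewing_of_Papers"))  # review paper
print(preprocess("ConferencePaper"))      # conference paper
```
]]></preformat>
      <p>Under these assumptions, a label such as Reviewing_of_Papers is reduced to "review paper" before it is handed to the matchers.</p>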
      <p>[Figure 1: Working of the LSMatch system. Input layer: source ontology and target ontology. Pre-processing: loading each ontology as a graph object, extracting concepts/properties/instances, text preprocessing. Processing: Levenshtein matcher, synonyms matcher, similarity matrix, alignment filtering. Output layer: final alignment and evaluation (precision, recall, F1). External resources: synonym source and translations.]</p>
      <sec id="sec-1-3">
        <p>
• Levenshtein matcher: LSMatch uses a string similarity matcher that calculates the
Levenshtein distance between concepts [3]. The concepts are represented by their rdfs:label
or directly by the class name in the ontologies. The official definition of Levenshtein
distance is “The smallest number of insertions, deletions, and substitutions
required to change one string or tree into another”<sup>1</sup>.
• Background knowledge [4]: To identify different lexical representations, LSMatch uses a
synonym matcher that fetches synonyms from WordNet [5]. Python’s nltk library is used to
access WordNet.
• Synonym matcher: LSMatch fetches synonyms from WordNet. Although the synonyms are
prefetched, during execution LSMatch checks whether synonyms are available for every
concept. If a concept has no prefetched synonyms, they are fetched on the fly.
• Translations<sup>2</sup>: For translations we use MyMemory’s translation memory, as it
provides good translations, is free, and is the world’s largest translation memory.</p>
        <p><sup>1</sup> https://xlinux.nist.gov/dads/HTML/Levenshtein.html</p>
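        <p>The Levenshtein and synonym matchers described above can be sketched as follows. This is an illustrative sketch rather than the LSMatch code: dividing the distance by the length of the longer string is an assumed way of turning it into a similarity score, and SynonymStore with its fetch callback is a hypothetical stand-in for the prefetched WordNet synonyms and the on-the-fly lookup.</p>
        <preformat><![CDATA[
```python
from typing import Callable, Dict, Set


def levenshtein(a: str, b: str) -> int:
    """Smallest number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def string_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


class SynonymStore:
    """Hypothetical cache: prefetched synonyms first, fetch on the fly when missing."""

    def __init__(self, prefetched: Dict[str, Set[str]], fetch: Callable[[str], Set[str]]):
        self._cache = dict(prefetched)
        self._fetch = fetch  # stand-in for a WordNet lookup
        self.fetch_calls = 0

    def synonyms(self, concept: str) -> Set[str]:
        if concept not in self._cache:
            self.fetch_calls += 1
            self._cache[concept] = set(self._fetch(concept))
        return self._cache[concept]

    def are_synonyms(self, a: str, b: str) -> bool:
        return a == b or b in self.synonyms(a) or a in self.synonyms(b)


# Toy thesaurus used only for this example.
toy = {"paper": {"article"}, "author": {"writer"}}
store = SynonymStore({"paper": {"article"}}, lambda concept: toy.get(concept, set()))

print(levenshtein("kitten", "sitting"))         # 3
print(store.are_synonyms("paper", "article"))   # True, served from the prefetched cache
print(store.are_synonyms("writer", "author"))   # True, after on-the-fly lookups
```
]]></preformat>
        <p>In LSMatch itself the fallback lookup would query WordNet through nltk; in this sketch any callable that returns a set of synonyms can be plugged in.</p>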
        <p>LSMatch uses a dictionary to store and retrieve alignments. The dictionary holds
information as &lt;key, value&gt; pairs in which the key is hashed [6, 7]. LSMatch
stores the alignments received from both matchers along with their similarity scores. The
scores of pairs are stored and updated multiple times during the alignment process, and
hashed keys allow us to do that efficiently. By default, LSMatch keeps all alignments
with a combined score (Levenshtein + synonym) of 0.5 or above, so that the alignments can be
checked over variable thresholds. For the final selection of alignments, the current version of
LSMatch uses a threshold of 0.95.
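A minimal sketch of this storage scheme is given below. It assumes that a Python dict (a hash table) plays the role of the dictionary, that a pair is keyed by its (source, target) labels, and that updating a pair keeps the highest score seen so far; the class and method names are hypothetical, and the update rule is an assumption because it is not specified here.
<preformat><![CDATA[
```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


class AlignmentStore:
    """Hypothetical container for candidate alignments, backed by a dict."""

    KEEP_THRESHOLD = 0.5    # candidates below this combined score are discarded
    FINAL_THRESHOLD = 0.95  # threshold used for the final selection

    def __init__(self) -> None:
        self._scores: Dict[Pair, float] = {}

    def update(self, source: str, target: str, score: float) -> None:
        """Store or refresh the score of a pair (assumed rule: keep the best score)."""
        if score < self.KEEP_THRESHOLD:
            return
        key = (source, target)  # tuples are hashable, so lookups are O(1) on average
        self._scores[key] = max(score, self._scores.get(key, 0.0))

    def select(self, threshold: float = FINAL_THRESHOLD) -> List[Tuple[str, str, float]]:
        """Return the stored pairs whose score reaches the given threshold."""
        return sorted(
            (s, t, score) for (s, t), score in self._scores.items() if score >= threshold
        )


store = AlignmentStore()
store.update("author", "writer", 0.60)  # first score for the pair
store.update("author", "writer", 0.97)  # later, higher score replaces it
store.update("paper", "review", 0.30)   # below 0.5, never stored
print(store.select())     # [('author', 'writer', 0.97)]
print(store.select(0.5))  # same pair; a lower threshold can be explored without rematching
```
]]></preformat>
Keeping every candidate at or above 0.5 is what makes it possible to re-run the selection at different thresholds without repeating the matching step.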
</p>
        <p><sup>2</sup> https://mymemory.translated.net/</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Results</title>
      <p>This section describes the results of LSMatch and LSMatch-Multilingual collectively
on six tracks, namely Anatomy, Conference, Multifarm, Bio-ML, Common Knowledge Graphs,
and Knowledge Graph. The results are presented collectively in Table 1. Differences from
OAEI 2021 are discussed in the subsections below.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Anatomy</title>
      <p>In the Anatomy track, the overall result is almost the same as last year: recall improved
by 2%, but the overall F-measure decreased by 0.2%.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Conference</title>
      <p>For the Conference track, the results are exactly the same as last year: due to an error, we
had to use last year’s version of LSMatch for this track.</p>
    </sec>
    <sec>
      <title>2.3. Multifarm</title>
      <p>This is the first entry of LSMatch in the Multifarm track, for which we specifically developed
LSMatch-Multilingual. Although both versions of LSMatch were tested on the Multifarm track,
LSMatch-Multilingual obtained the best F1-score among all systems, 0.47 (see Table 2 for
comparative results).</p>
    </sec>
    <sec>
      <title>2.4. Bio-ML</title>
      <p>The Bio-ML track is a Machine Learning (ML) friendly biomedical track that
supersedes the previous largebio and phenotype tracks. LSMatch was tested on all five tasks,
each an equivalence matching task on one ontology pair:
OMIM-ORDO (Disease), NCIT-DOID (Disease), SNOMED-FMA (Body), SNOMED-NCIT (Pharm),
and SNOMED-NCIT (Neoplas). On OMIM-ORDO (Disease) and NCIT-DOID (Disease), LSMatch
obtained average results. On SNOMED-FMA (Body), LSMatch has the 6th best precision out of 9. On
SNOMED-NCIT (Pharm) and SNOMED-NCIT (Neoplas), LSMatch has the 2nd best precision, just
after LogMap-Lite. All of the above results are for the unsupervised setting (90% test mappings). In
the semi-supervised setting (70% test mappings), LSMatch has average performance in all tasks.</p>
    </sec>
    <sec id="sec-5">
      <title>2.5. Common Knowledge Graphs</title>
      <p>This year the Common Knowledge Graphs track has one more task, namely Yago-Wikidata, where
LSMatch’s performance was decent, though it needs improvement. In the Nell-DBpedia task, LSMatch’s
result is almost the same as last year’s.</p>
    </sec>
    <sec id="sec-6">
      <title>2.6. Knowledge Graph</title>
      <p>In OAEI 2021, LSMatch only supported class matching; this year (OAEI 2022), LSMatch added
functionality to also match instances and properties. Class matching results this year are the same
as last year. With property and instance matching added, this year’s overall result is 0.66 precision,
0.63 F1, and 0.61 recall, compared with 1, 0.01, and 0 respectively last year.
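As a consistency check on this year’s figures, the following sketch assumes that F1 is the usual harmonic mean of precision and recall and recomputes it from the reported 0.66 precision and 0.61 recall; the function name is illustrative.
<preformat><![CDATA[
```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall Knowledge Graph figures reported for OAEI 2022.
kg_f1 = f1_score(0.66, 0.61)
print(f"{kg_f1:.2f}")  # 0.63
```
]]></preformat>
Rounded to two decimals, the recomputed value agrees with the reported F1 of 0.63.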
</p>
    </sec>
    <sec>
      <title>3. Conclusion</title>
      <p>This year, the system was tested on six tracks: Anatomy, Conference, Multifarm, Bio-ML,
Common Knowledge Graphs, and Knowledge Graph. The system achieved considerably good
precision on all tracks but lagged behind in recall. In future versions, we will add a set
of matchers and work to improve the utilization of background knowledge, so that we can
find better correlations between concepts that are not properly aligned using lexical
measures alone.
</p>
      <p>[3] T. T. A. Nguyen, S. Conrad, Ontology matching using multiple similarity measures, in: 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), volume 1, IEEE, 2015, pp. 603–611.</p>
      <p>[4] Z. Aleksovski, W. Ten Kate, F. Van Harmelen, Exploiting the structure of background knowledge used in ontology matching, in: Ontology Matching, 2006, p. 13.</p>
      <p>[5] G. A. Miller, WordNet: a lexical database for English, Communications of the ACM 38 (1995) 39–41.</p>
      <p>[6] P. Ochieng, S. Kyanda, Large-scale ontology matching: State-of-the-art analysis, ACM Computing Surveys (CSUR) 51 (2018) 1–35.</p>
      <p>[7] S. Anam, Y. S. Kim, B. H. Kang, Q. Liu, Review of ontology matching approaches and challenges, International Journal of Computer Science and Network Solutions 3 (2015) 1–27.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <article-title>Research on string similarity algorithm based on Levenshtein distance</article-title>
          ,
          <source>in: 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>2247</fpage>
          -
          <lpage>2251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hertling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Portisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Melt-matching evaluation toolkit</article-title>
          ,
          <source>in: International conference on semantic systems</source>
          , Springer, Cham,
          <year>2019</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>