<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ATBox Results for OAEI 2021</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The ATBox matcher is a system for matching the instances (Abox) as well as the schema (Tbox) of two given knowledge graphs (KGs). The matcher focuses on scalability so that it can handle large tasks such as the Knowledge Graph and Large Bio tracks. ATBox participates in the OAEI for the second time. This paper describes the basic system as well as its improvements. For matching, two pipelines (schema and instance) generate candidates, and the schema matches are then used to further improve the instance alignments.</p>
      </abstract>
      <kwd-group>
        <kwd>Ontology Matching</kwd>
        <kwd>Knowledge Graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
    </sec>
    <sec id="sec-2">
      <title>State, purpose, general statement</title>
      <p>The overall matching strategy of ATBox is shown in Figure 1. The Tbox and
Abox have different processing pipelines, but the correspondences are combined
in the end to get the final alignment. One of the main differences in comparison
to the system submitted last year is the additional bounded path matching for
classes.</p>
      <p>[Figure 1: matching pipeline of ATBox. Labels recoverable from the figure: inputs TBox 1, TBox 2, ABox 1, ABox 2; components Stopword Extraction, String Matching, Synonym Extension, Bounded Path Matching, Cardinality Filter, Similar Neighbors Filter, Cosine Similarity Filter, Instance Filter, Type Filter, Common Properties Filter; output final alignment.]</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>We first look at the Tbox matching. It is applied to all classes and
properties (owl:ObjectProperty, owl:DatatypeProperty, and rdf:Property). They
are retrieved with the Jena<sup>1</sup> methods OntModel.listClasses() and
OntModel.listAllOntProperties().</p>
      <p>The first step is to extract KG-specific stopwords, because in some cases the
labels and/or fragments contain tokens that appear very often, such as class or
infobox. If a token appears in more than 20% of all classes/properties,
it is assumed to be a stopword.</p>
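      <p>The 20% rule can be sketched as follows. This is a minimal illustration and not the ATBox implementation: the function name extract_stopwords, the pre-tokenized input, and the toy labels are assumptions made for this example.</p>
      <preformat>
```python
from collections import Counter


def extract_stopwords(token_lists, threshold=0.2):
    """Return tokens that occur in more than `threshold` of all resources.

    token_lists: one list of tokens per class/property (hypothetical input).
    """
    if not token_lists:
        return set()
    # Count every token at most once per resource (document frequency).
    doc_freq = Counter(tok for tokens in token_lists for tok in set(tokens))
    limit = threshold * len(token_lists)
    return {tok for tok, count in doc_freq.items() if count > limit}


# Toy labels: "infobox" occurs in 4 of 6 resources, every other token in 1.
labels = [
    ["infobox", "person"],
    ["infobox", "city"],
    ["infobox", "river"],
    ["infobox", "planet"],
    ["album"],
    ["song"],
]
print(sorted(extract_stopwords(labels)))  # prints ['infobox']
```
      </preformat>
      <p>Counting a token once per resource, rather than once per occurrence, keeps a single long label from pushing a token over the threshold.</p>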
      <p>
        The synonyms are extracted from the English Wiktionary via DBnary [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
The extraction process, like the string matching component, is detailed in the previous results paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
After these components, the new bounded path
matching is executed. This component matches classes that lie between
two already matched classes in a hierarchy. It is thus a structural approach
that requires already matched resources. Figure 2 shows an example: the class
book is matched to the class books, and novel to novel. With this information, the
class in between is a candidate for another correspondence, so it is added
with the average confidence of the other two correspondences.
      </p>
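      <p>A minimal sketch of the bounded path idea is given below. It rests on several assumptions that are not taken from the ATBox code: each class has at most one direct superclass, only paths with exactly one intermediate class on both sides are considered, and the function name, data structures, and class names are hypothetical.</p>
      <preformat>
```python
def bounded_path_matches(parents_one, parents_two, alignment):
    """Propose matches for classes lying between two matched class pairs.

    parents_one / parents_two: dict from a class to its direct superclass.
    alignment: dict from (class_one, class_two) to a confidence value.
    Assumption: exactly one intermediate class on each side.
    """
    candidates = {}
    for (low_one, low_two), low_conf in alignment.items():
        mid_one = parents_one.get(low_one)
        mid_two = parents_two.get(low_two)
        if mid_one is None or mid_two is None:
            continue
        # The superclasses of the intermediate classes must already be matched.
        top_pair = (parents_one.get(mid_one), parents_two.get(mid_two))
        if top_pair in alignment and (mid_one, mid_two) not in alignment:
            # New candidate gets the average confidence of the two anchors.
            candidates[(mid_one, mid_two)] = (low_conf + alignment[top_pair]) / 2
    return candidates


# Hypothetical hierarchies, written as child -> parent.
parents_one = {"one:Novel": "one:Fiction", "one:Fiction": "one:Book"}
parents_two = {"two:Novel": "two:Literature", "two:Literature": "two:Books"}
alignment = {("one:Book", "two:Books"): 0.8, ("one:Novel", "two:Novel"): 1.0}
print(bounded_path_matches(parents_one, parents_two, alignment))
```
      </preformat>
      <p>In this toy input, the pair (one:Fiction, two:Literature) is proposed because both classes sit between two matched pairs.</p>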
      <p>The instance matching (Abox, shown in the lower part of Figure 1) is
unchanged from the last submission. As a last step, all correspondences
are combined, and a final cardinality filter ensures a one-to-one alignment by
comparing the confidence scores.</p>
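      <p>One common way to obtain a one-to-one alignment from confidence scores is a greedy pass in descending confidence order. The sketch below assumes this greedy strategy for illustration only; the text does not state which strategy the ATBox filter uses, and the function name and tuple layout are hypothetical.</p>
      <preformat>
```python
def one_to_one_filter(correspondences):
    """Greedily keep the most confident correspondence per entity.

    correspondences: iterable of (source, target, confidence) tuples.
    """
    used_sources, used_targets = set(), set()
    kept = []
    # Visit correspondences from highest to lowest confidence.
    ranked = sorted(correspondences, key=lambda c: c[2], reverse=True)
    for source, target, confidence in ranked:
        if source in used_sources or target in used_targets:
            continue  # one side is already part of a better correspondence
        used_sources.add(source)
        used_targets.add(target)
        kept.append((source, target, confidence))
    return kept


candidates = [
    ("one:a", "two:x", 0.9),
    ("one:a", "two:y", 0.8),
    ("one:b", "two:x", 0.7),
    ("one:b", "two:y", 0.6),
]
print(one_to_one_filter(candidates))
```
      </preformat>
      <p>A greedy pass is simple and fast but not guaranteed to maximize the total confidence; an optimal assignment would need, for example, the Hungarian algorithm.</p>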
      <sec id="sec-2-1">
        <p><sup>1</sup> https://jena.apache.org</p>
        <p>[Figure 2: example for the bounded path matching. Labels recoverable from the figure: one:Book, two:Books, one:novel, crime, one:Fiction, Book, two:Entertainment, two:Novel, connected by rdfs:subClassOf edges.]</p>
        <p>
          The ATBox matcher is also available as a Docker-based matcher that runs an HTTP
endpoint. The matcher is packaged with the MELT framework [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which generates
a Docker image that also contains the code for running a small server.
The ATBox matcher can be downloaded from
https://www.dropbox.com/s/l344aawh0mw6rjm/atmatcher-1.0-web-latest.tar.gz?dl=0.
        </p>
        <sec id="sec-2-1-1">
          <title>Results</title>
          <p>This section discusses the results of ATBox for each track of OAEI 2021 where the
matcher is able to produce results. The following tracks are included: anatomy,
conference, largebio, phenotype, biodiv, commonKG and knowledge graph track.
The results for were not reported this year.</p>
          <p>Specific matching strategies and interfaces for the interactive and complex
tracks are still not implemented and thus not described. Because
ATBox has no multi-language support, the multifarm track is also excluded
from the results discussion.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Anatomy</title>
      <p>In comparison to last year's participation, the F-Measure slightly decreased from
0.799 to 0.794 but still beats the baseline by a small margin. The matcher is
rather precision-oriented and achieves the third highest value after the string
baseline, LSMatch, and ALIN. Recall should be improved beyond what synonyms alone
provide, and an alignment repair step could be introduced to make the
alignment coherent (which is not yet the case).</p>
    </sec>
    <sec id="sec-4">
      <title>Conference</title>
      <p>
        In the conference track, the ATBox matcher increased the F-Measure from 0.57 to
0.59 using the rar2-M3 evaluation setup [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (which is a violation-free version
of the entailed reference alignment for classes and properties). This is the third
highest value after AML, LogMap, and GMap. Again, the recall (0.51) is
lower than the precision (0.69).
      </p>
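      <p>The reported conference values can be cross-checked with a short calculation. The sketch assumes that the F-Measure is the balanced harmonic mean of precision and recall (F1); the function name is chosen for this example.</p>
      <preformat>
```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 0.69 and recall 0.51 as reported for the conference track.
print(round(f_measure(0.69, 0.51), 2))  # prints 0.59
```
      </preformat>
      <p>Under this assumption, the result agrees with the reported F-Measure of 0.59.</p>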
    </sec>
    <sec id="sec-5">
      <title>Largebio</title>
      <p>The ATBox matcher is able to run on three out of six tasks in largebio. In the first task
(FMA-NCI), the presented system returned 2,332 correspondences and scored
0.867 in terms of F-measure.</p>
      <p>The third task (FMA-SNOMED) could be solved in 30 seconds, which is the
third best time in this test case. In this short time, the matcher returned 6,226
correspondences. Only the LogMap matcher family and AML achieve better results,
but they also need more time.</p>
      <p>The FMA-SNOMED task is the only one where the whole ontologies
could also be matched. This results in a higher runtime of 77 seconds. Unfortunately,
the recall (0.206) was too low to return many correct mappings.</p>
      <p>Overall, the system needs to be tuned to find more correspondences (also in
larger ontologies).</p>
    </sec>
    <sec id="sec-6">
      <title>Phenotype</title>
      <p>In the phenotype track, the presented matcher is able to run on the HP-MP task but
not on DOID-ORDO. We will investigate which components prevent a successful
run of the latter task.</p>
      <p>For the HP-MP task, the matcher was again quite fast. Only AML and LogMap
are better, but the differences in terms of F-measure are quite large (0.454 in
comparison to AML with 0.804 and LogMap with 0.818).</p>
    </sec>
    <sec id="sec-7">
      <title>Biodiv</title>
      <p>In the Biodiv track, ATBox scored differently across the four given tasks. For the
envo-sweet task, only a score of 0.671 could be achieved, but for the anaeethes-gemet
task ATBox is the second best matcher with 0.748. Furthermore, it is by far
one of the fastest matchers, together with LogMapLt (which has a much lower
F-Measure for the second task).</p>
      <p>For the agrovoc-nalt and ncbitaxon-taxrefld tasks, our matcher could not
produce any result. We will investigate this further so that the system is able
to match these tasks in the upcoming campaign.</p>
    </sec>
    <sec id="sec-8">
      <title>Common Knowledge Graphs</title>
      <p>This is a new track which was introduced in OAEI 2021. The task is to align
classes between NELL and DBpedia. NELL has 134 classes and 1,184,377
instances whereas DBpedia has 138 classes and 631,461 instances.</p>
      <p>ATBox is the second best matcher, together with ALOD2Vec and
Wiktionary, with an F-Measure of 0.89. Only KGMatcher (0.94) could find more
correct correspondences. For this track, it would help to find classes based on the
instance matches, as already done by the DOME matcher. The current version of
ATBox only uses the classes to improve the instance correspondences. In the
next version, we plan to also add this component to increase the capabilities of
the matcher.</p>
    </sec>
    <sec id="sec-9">
      <title>Knowledge Graph</title>
      <p>The results of ATBox are similar to previous years because the class hierarchy in
this track is not deep. One possibility would be to use the categories (connected
with the property dcterms:subject<sup>2</sup>) as an additional type of class information.</p>
      <p>The F-Measure is 0.85 which is only slightly higher than the baseline using
label and alternative label (0.84). Only ALOD2Vec and Wiktionary can improve
on these results (both 0.87).</p>
      <p>Regarding the runtime, ATBox is the fastest system with only 20 minutes
for all test cases. Only the baselines are faster, usually needing 11 minutes.</p>
      <p>
        The confidences of the overall KG track alignment are visualized in Figure 3
(generated with the MELT dashboard [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). The different hard-coded confidence values
can be seen very well and show that 0.4 and 0.5 have many false positives, similar
to 0.8.
      </p>
    </sec>
    <sec id="sec-10">
      <title>Discussions on the way to improve the proposed system</title>
      <p>
        We would like to extend the matching pipeline with further components such as a
transformer-based [
        <xref ref-type="bibr" rid="ref1 ref6">1,6</xref>
        ] comparison of textual representations of resources.
      </p>
      <sec id="sec-10-1">
        <p><sup>2</sup> http://purl.org/dc/terms/subject</p>
        <sec id="sec-10-1-1">
          <title>General comments</title>
          <p>
            Such a transformer-based comparison only helps when already created correspondences need a more precise confidence
based on text; it does not retrieve any new correspondences, because comparing
all resources in a cross-product manner is too expensive. One way to
mitigate this problem is to use sentence transformers [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. They embed the text
in a high-dimensional space and thus allow retrieving the top-k neighbors of a
given resource.
          </p>
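          <p>The retrieval step can be illustrated with a small sketch. It assumes that embeddings have already been computed by some sentence embedding model (not shown here); the three-dimensional toy vectors, the resource names, and the function names are hypothetical.</p>
          <preformat>
```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(query, candidates, k=2):
    """Return the names of the k candidates most similar to the query vector."""
    scored = ((cosine(query, vec), name) for name, vec in candidates.items())
    return [name for _, name in heapq.nlargest(k, scored)]


# Toy embeddings; real sentence embeddings have hundreds of dimensions.
target_embeddings = {
    "two:Novel": [0.9, 0.1, 0.0],
    "two:Film": [0.1, 0.9, 0.1],
    "two:Book": [0.8, 0.3, 0.0],
}
print(top_k_neighbors([1.0, 0.1, 0.0], target_embeddings, k=2))
```
          </preformat>
          <p>This sketch still scans all candidates for each query. On large KGs, an approximate nearest neighbor index over the embeddings would typically replace the linear scan.</p>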
          <p>
            Because most of the returned alignments are not consistent with
the ontology, we also plan to include alignment repair steps [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] such as the
ALCOMO component [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ].
          </p>
          <p>If the resources have attached images, it would also be interesting to
compare those; in the KG track, for example, there are instances with an image displaying
the concept. With a visual comparison (e.g., recognizing the same person), the confidence of
a correspondence could be further increased.</p>
          <p>
            Furthermore, the schema matches could be improved with the help of instance
correspondences, as already shown in the DOME matcher [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ].
          </p>
        </sec>
        <sec id="sec-10-1-2">
          <title>Conclusions</title>
          <p>In this paper, we have analyzed the results of ATBox matcher in OAEI 2021.
It shows that the system is very scalable and can generate class, property and
instance alignments.</p>
          <p>
            Most of the used matching components are furthermore included in the
MELT framework[
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] to allow other system developers to reuse them.
          </p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          . arXiv preprint arXiv:1810.04805
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>DOME results for OAEI 2019</article-title>
          .
          <source>OM@ISWC</source>
          <volume>2536</volume>
          ,
          <fpage>123</fpage>
          –
          <lpage>130</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>ATBox results for OAEI 2020</article-title>
          .
          <source>OM@ISWC</source>
          <volume>2788</volume>
          ,
          <fpage>168</fpage>
          –
          <lpage>175</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>The knowledge graph track at OAEI - gold standards, baselines, and the golden hammer bias</article-title>
          .
          <source>In: The Semantic Web: ESWC 2020</source>
          . pp.
          <fpage>343</fpage>
          –
          <lpage>359</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>MELT - matching evaluation toolkit</article-title>
          .
          <source>In: SEMANTICS. Karlsruhe</source>
          . (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>Matching with transformers in MELT</article-title>
          .
          <source>In: OM@ ISWC</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Evaluating mapping repair systems with large biomedical ontologies</article-title>
          .
          <source>Description Logics</source>
          <volume>13</volume>
          ,
          <fpage>246</fpage>
          –
          <lpage>257</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Alignment incoherence in ontology matching</article-title>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Portisch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hertling</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>Visual analysis of ontology matching results with the MELT dashboard</article-title>
          .
          <source>In: European Semantic Web Conference</source>
          . pp.
          <fpage>186</fpage>
          –
          <lpage>190</lpage>
          . Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Reimers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurevych</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Sentence-BERT: Sentence embeddings using Siamese BERT-networks</article-title>
          .
          <source>In: EMNLP</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Serasset</surname>
          </string-name>
          , G.:
          <article-title>DBnary: Wiktionary as a Lemon-based multilingual lexical resource in RDF</article-title>
          .
          <source>Semantic Web</source>
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <fpage>355</fpage>
          –
          <lpage>361</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Zamazal</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svatek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The ten-year OntoFarm and its fertilization within the onto-sphere</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>43</volume>
          ,
          <fpage>46</fpage>
          –
          <lpage>53</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>