=Paper= {{Paper |id=Vol-3324/oaei22_paper4 |storemode=property |title=ATBox results for OAEI 2022 |pdfUrl=https://ceur-ws.org/Vol-3324/oaei22_paper4.pdf |volume=Vol-3324 |authors=Sven Hertling,Heiko Paulheim |dblpUrl=https://dblp.org/rec/conf/semweb/HertlingP22 }} ==ATBox results for OAEI 2022== https://ceur-ws.org/Vol-3324/oaei22_paper4.pdf
ATBox Results for OAEI 2022
Sven Hertling, Heiko Paulheim
Data and Web Science Group, University of Mannheim, Germany


Abstract
In this paper we present the results of the ATBox matcher (ATMatcher for short) in the OAEI
2022 campaign. The system is able to match instances (Abox) as well as the schema (Tbox) of given
knowledge graphs. It scales to large inputs by using effective and modular approaches implemented in
the MELT framework. The system participates in the OAEI for the third time. First, two pipelines for
matching the schema and the instances are used to generate candidates. Afterwards, the instance
matches are improved and repaired by reusing the candidate class correspondences.

                                      Keywords
                                      Ontology Matching, Knowledge Graph




1. Presentation of the system
ATBox (also called ATMatcher) is a knowledge graph (KG) matching system that matches
ontologies (Tbox), including properties and classes, as well as instances (Abox). KG
matching is becoming an important step towards building consolidated large-scale knowledge
graphs. In recent years, more datasets that require instance alignment have been published
(such as the Spimbench, Link Discovery, Common KG, and Knowledge Graph tracks). ATBox tackles
this problem by running independent pipelines for instance and schema matching, which could
also be parallelized to decrease runtime. In a next step, the schema correspondences are used to
improve the instance alignment.

1.1. State, purpose, general statement
This year we extended the MELT framework [2] by implementing the alignment repair filter [3]
of LogMap. Due to time constraints, this component was not yet integrated into the final
system. A comparison between this filter and the other repair component in MELT would be a
useful next step; a similar evaluation was carried out in [4]. Therefore, this year's submission
contains only minor modifications to the system. To make this paper self-contained, we keep
the description of the system from last year's submission [1], which is shown below.
   The overall matching strategy of ATBox is shown in figure 1. The Tbox and Abox have
different processing pipelines but the correspondences are combined in the end to get the final

Ontology Matching Workshop co-located with ISWC 2022
Email: sven.hertling@uni-mannheim.de (S. Hertling); heiko.paulheim@uni-mannheim.de (H. Paulheim)
Web: https://www.uni-mannheim.de/dws/people/researchers/phd-students/sven-hertling/ (S. Hertling);
https://www.uni-mannheim.de/dws/people/professors/prof-dr-heiko-paulheim/ (H. Paulheim)
ORCID: 0000-0003-0333-5888 (S. Hertling); 0000-0003-4386-8195 (H. Paulheim)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
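[Figure 1 (diagram): The Tbox pipeline (inputs TBox 1 and TBox 2) comprises Stopword Extraction,
Synonym Extension, String Matching, and Bounded Path Matching. The Abox pipeline (inputs ABox 1 and
ABox 2) comprises String Matching, Similar Neighbors Filter, Cosine Similarity Filter, Common
Properties Filter, Type Filter, and Instance Filter. A Cardinality Filter produces the final
alignment. One step is annotated "if #entities < 10,000".]
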
Figure 1: Overview of the ATBox matcher strategy (from [1]).


alignment. One of the main differences in comparison to the system submitted last year is the
additional bounded path matching for classes.
    We first look at the Tbox matching. It is applied to all classes and properties
(owl:ObjectProperty, owl:DatatypeProperty, and rdf:Property). They are retrieved by the Jena¹ methods
OntModel.listClasses() and OntModel.listAllOntProperties().
    The first step is to extract KG-specific stopwords, because in some cases the labels and/or
fragments contain tokens that appear very often, such as "class" or "infobox". If a token
appears in more than 20 % of all classes/properties, it is assumed to be a stopword.
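
The 20 % rule can be illustrated with a minimal sketch. The tokenizer and the function names (`tokenize`, `extract_stopwords`) are assumptions made for this illustration and do not reproduce the MELT StopwordExtraction component; only the threshold is taken from the description above.

```python
import re
from collections import Counter


def tokenize(label):
    """Split a label or URI fragment into lowercase tokens (illustrative tokenizer)."""
    # Insert a space at camelCase boundaries, e.g. "InfoboxRiver" -> "Infobox River".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    return {tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok}


def extract_stopwords(labels_per_resource, threshold=0.2):
    """Return tokens occurring in more than `threshold` of all resources.

    labels_per_resource: one list of labels/fragments per class or property.
    """
    resource_frequency = Counter()
    for labels in labels_per_resource:
        tokens = set()
        for label in labels:
            tokens |= tokenize(label)
        # Count each token at most once per resource.
        resource_frequency.update(tokens)
    total = len(labels_per_resource)
    return {tok for tok, count in resource_frequency.items() if count / total > threshold}


resources = [
    ["Infobox person"], ["infobox_city"], ["InfoboxRiver"], ["Mountain"], ["Lake"],
    ["Forest"], ["Village"], ["Bridge"], ["Island"], ["Valley"],
]
stopwords = extract_stopwords(resources)
```

In this toy input, "infobox" occurs in 3 of 10 resources and is therefore the only token above the threshold.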
    The synonyms are extracted from the English Wiktionary via DBnary [5]. The extraction
process is detailed in the previous results paper [6], as is the string matching component.
After these components, the new bounded path matching is executed. This component matches
classes that lie between two already matched classes in a hierarchy. Thus it is a structural
approach which requires already matched resources. Figure 2 shows an example. The class
Book is matched to the class Books, and novel to Novel. With this information, the class in between
is a candidate for another correspondence. Thus it is added with the average confidence of
the other two correspondences.
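
A simplified sketch of this idea follows. It assumes single inheritance (each class has at most one parent) and considers only paths with exactly one class in between on both sides; the actual BoundedPathMatching component in MELT may handle longer paths and multiple parents. All function names are hypothetical.

```python
def classes_between(parent_of, bottom, top):
    """Return the classes strictly between `bottom` and `top`, or None if
    `top` is not an ancestor of `bottom` (follows single-parent links)."""
    between = []
    seen = set()
    current = parent_of.get(bottom)
    while current is not None and current not in seen:
        if current == top:
            return between
        seen.add(current)
        between.append(current)
        current = parent_of.get(current)
    return None


def bounded_path_matching(parent1, parent2, alignment):
    """alignment maps (class in KG 1, class in KG 2) -> confidence.
    Returns new correspondences for classes enclosed by two matched pairs."""
    new_correspondences = {}
    pairs = list(alignment.items())
    for (top1, top2), conf_top in pairs:
        for (bottom1, bottom2), conf_bottom in pairs:
            if (top1, top2) == (bottom1, bottom2):
                continue
            middle1 = classes_between(parent1, bottom1, top1)
            middle2 = classes_between(parent2, bottom2, top2)
            # Simplification: exactly one class in between in both hierarchies.
            if middle1 and middle2 and len(middle1) == 1 and len(middle2) == 1:
                candidate = (middle1[0], middle2[0])
                if candidate not in alignment:
                    new_correspondences[candidate] = (conf_top + conf_bottom) / 2
    return new_correspondences


# Hierarchies from Figure 2 (child -> parent); the confidences are made up.
parent1 = {"one:novel": "one:FictionBook", "one:FictionBook": "one:Book"}
parent2 = {"two:Novel": "two:Entertainment", "two:Entertainment": "two:Books"}
alignment = {("one:Book", "two:Books"): 1.0, ("one:novel", "two:Novel"): 0.5}
new = bounded_path_matching(parent1, parent2, alignment)
```

With these inputs, the only new candidate is the pair (one:FictionBook, two:Entertainment), whose confidence is the mean of the two enclosing correspondences.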
    The instance matching (Abox, shown in the lower part of Figure 1) is unchanged
compared to the last submission. As a last step, all correspondences are combined and a final
cardinality filter ensures a one-to-one alignment by comparing the confidence scores.
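
One common way to obtain a one-to-one alignment from confidence scores is a greedy descending extraction, sketched below. Whether the cardinality filter of ATBox behaves exactly like this is an assumption; the function name and the tuple layout are chosen for illustration only.

```python
def greedy_one_to_one(correspondences):
    """Keep the highest-confidence correspondences such that every source and
    every target entity appears at most once.

    correspondences: iterable of (source, target, confidence) tuples.
    """
    used_sources, used_targets = set(), set()
    selected = []
    # Visit correspondences from highest to lowest confidence.
    for source, target, confidence in sorted(
        correspondences, key=lambda c: c[2], reverse=True
    ):
        if source in used_sources or target in used_targets:
            continue  # a better correspondence already uses this entity
        used_sources.add(source)
        used_targets.add(target)
        selected.append((source, target, confidence))
    return selected


candidates = [("a", "x", 0.9), ("a", "y", 0.8), ("b", "x", 0.7), ("b", "y", 0.6)]
final_alignment = greedy_one_to_one(candidates)
```

The greedy strategy is fast but not guaranteed to maximize the total confidence; an optimal assignment would require, for example, the Hungarian algorithm.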

1.2. Specific techniques used
We used the following matching components of MELT [2]:

¹ https://jena.apache.org
[Figure 2 (diagram): KG 1 contains one:Book, one:FictionBook, one:novel, and one:novel crime;
KG 2 contains two:Books, two:Entertainment, and two:Novel. Within each KG the classes are
connected by rdfs:subClassOf.]

Figure 2: Bounded path matching of a class hierarchy. The top and bottom lines are already matched
classes. The middle line represents a new correspondence (from [1]).


       • ScalableStringProcessingMatcher
       • StopwordExtraction
       • SimilarNeighborsFilter
       • CommonPropertiesFilter
       • CosineSimilarityConfidenceMatcher
       • SimilarTypeFilter
       • NaiveDescendingExtractor
       • BoundedPathMatching

1.3. Adaptations made for the evaluation
ATBox matcher is available as a Docker image. When a container is started from this image,
an HTTP endpoint is started on port 8080 which fulfills the requirements of the web-based
interface described in the MELT user guide².
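
As a rough sketch of how a client could address such an endpoint, the snippet below builds (but does not send) a POST request. The `/match` path, the form fields `source` and `target`, and the URL-encoded content type are assumptions about the MELT web interface and should be checked against the MELT user guide; the ontology URLs are placeholders.

```python
from urllib import parse, request


def build_match_request(source_url, target_url, endpoint="http://localhost:8080/match"):
    """Build a POST request that passes two ontology URLs to a matcher endpoint.

    Assumption: the endpoint accepts URL-encoded form fields named
    'source' and 'target' (verify against the MELT user guide).
    """
    body = parse.urlencode({"source": source_url, "target": target_url}).encode("utf-8")
    return request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_match_request("http://example.org/a.rdf", "http://example.org/b.rdf")
# To actually run the matcher, the container must be up; then:
# with request.urlopen(req) as response:
#     alignment = response.read().decode("utf-8")
```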

1.4. Link to the system and parameters file
ATBox matcher can be downloaded from
https://www.dropbox.com/s/l344aawh0mw6rjm/atmatcher-1.0-web-latest.tar.gz?dl=0.


2. Results
This section discusses the results of ATBox for each track of OAEI 2022 where the matcher
is able to produce results. The following tracks are included: anatomy, conference, bio-ml,
commonKG and knowledge graph track.


² https://dwslab.github.io/melt/matcher-packaging/web
2.1. Anatomy
The F-measure did not change in comparison to last year's submission, which was expected. It is
still 0.794. This beats the baseline, but only by a small margin. The matcher is very precision
oriented and achieves the third highest precision after the string baseline and ALIN. The recall
could be improved by using synonyms not only from WordNet but also from other external sources. We
hope to create a coherent alignment in the next submission by using the aforementioned repair
strategies.

2.2. Conference
In the conference track, ATBox matcher has an F-measure of 0.59 using the rar2-M3 evaluation
setup [7] (which is a violation-free version of the entailed reference alignment for classes and
properties). This is the fourth highest value after LogMap, GraphMatcher, and SEBMatcher.
Again, the recall (0.51) is lower than the precision (0.69).
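
The relation between these three numbers can be checked with the standard F-measure, the weighted harmonic mean of precision and recall. The sketch below assumes the reported value is the balanced F1 computed from the rounded precision and recall, so small deviations from the official evaluation are possible.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 for beta = 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_squared = beta * beta
    return (1 + beta_squared) * precision * recall / (beta_squared * precision + recall)


# Precision and recall as reported above for the conference track.
conference_f1 = f_measure(0.69, 0.51)
print(round(conference_f1, 2))
```

The rounded result agrees with the reported F-measure of 0.59.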
   This year a new subtrack was also evaluated. The task is to match DBpedia to the OntoFarm
ontologies. ATBox was one of six systems that were able to solve this task. Its F-measure is
0.55 (the same as KGMatcher+ and LSMatch), and only LogMap was better in those test cases.

2.3. Common Knowledge Graphs
In this track the task is to align classes in two given KGs. This makes it possible to use instance
matches to find useful class correspondences. In 2021 there was only one task, in which the input
graphs were NELL and DBpedia. This year a new task was added which aligns YAGO and Wikidata.
   In the first task, ATBox scored 0.89 (only Matcha and KGMatcher+ are better). In the second task
the situation is exactly the same, meaning that the proposed system is in third place. This
also shows that the two matching tasks are very similar to each other.
   For this track it would be beneficial if class matches were created with the help of instance
correspondences, as already done by the DOME matcher. The current version only uses the schema
matches to improve the instance alignment.

2.4. Knowledge Graph
In the KG track, ATMatcher is the best matching system with an overall F-measure of 0.84. In
previous years, some systems, such as Wiktionary matcher, were able to beat this score by 0.03.
   Scalability was a crucial factor in developing the system, and the results clearly show
that it is the fastest system after the baselines. The proposed system needs only 19 minutes for all
test cases.


3. General comments
3.1. Discussions on the way to improve the proposed system
Due to the fact that the two matching pipelines are independent of each other, the runtime of
the system could be further decreased by parallelization.
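
The following sketch illustrates this idea with two placeholder pipelines that are submitted to a thread pool and joined afterwards. The pipeline functions are hypothetical stand-ins; ATBox itself is built on the Java-based MELT framework, so this only shows the control flow, not an implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelines_in_parallel(schema_pipeline, instance_pipeline, kg1, kg2):
    """Run two independent matching pipelines concurrently and return both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        schema_future = pool.submit(schema_pipeline, kg1, kg2)
        instance_future = pool.submit(instance_pipeline, kg1, kg2)
        # result() blocks until the respective pipeline has finished.
        return schema_future.result(), instance_future.result()


# Hypothetical stand-ins for the Tbox and Abox pipelines.
def dummy_schema_pipeline(kg1, kg2):
    return [("one:Book", "two:Books", 1.0)]


def dummy_instance_pipeline(kg1, kg2):
    return [("one:i1", "two:i7", 0.9)]


schema_matches, instance_matches = run_pipelines_in_parallel(
    dummy_schema_pipeline, dummy_instance_pipeline, "kg1.rdf", "kg2.rdf"
)
```

The step that improves instance matches with class correspondences has to wait for both results, so only the candidate generation can overlap.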
   Another possible way to improve the system is to incorporate a transformer model [8], as
already shown in [9]. Furthermore, the created alignments can be logically checked at the
end of the pipeline. Possible approaches are the LogMap repair strategy [3] or the ALCOMO
component [10].
   Finally, one could improve not only instance matches by schema matches but also the other
way around. This was not implemented in the presented approach but would help especially in
the Common KG track.


4. Conclusions
In this paper, we have analyzed the results of ATBox matcher in OAEI 2022. The system is
scalable and can generate class, property and instance alignments.
  Most of the components used in ATBox are included in the MELT framework [2],
which allows other researchers to reuse and compose them in their own systems.


References
 [1] S. Hertling, H. Paulheim, ATBox results for OAEI 2021, OM@ISWC 3063 (2021) 137–143.
 [2] S. Hertling, J. Portisch, H. Paulheim, MELT - matching evaluation toolkit, in: SEMANTICS,
     Karlsruhe, 2019, pp. 231–245.
 [3] A. Solimando, E. Jiménez-Ruiz, G. Guerrini, Minimizing conservativity violations in
     ontology alignments: Algorithms and evaluation, Knowledge and Information Systems 51
     (2017) 775–819.
 [4] E. Jiménez-Ruiz, C. Meilicke, B. C. Grau, I. Horrocks, Evaluating mapping repair systems
     with large biomedical ontologies, Description Logics 13 (2013) 246–257.
 [5] G. Sérasset, DBnary: Wiktionary as a lemon-based multilingual lexical resource in RDF,
     Semantic Web 6 (2015) 355–361.
 [6] S. Hertling, H. Paulheim, ATBox results for OAEI 2020, OM@ISWC 2788 (2020) 168–175.
 [7] O. Zamazal, V. Svátek, The ten-year OntoFarm and its fertilization within the onto-sphere,
     Journal of Web Semantics 43 (2017) 46–53.
 [8] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional
     transformers for language understanding, arXiv preprint arXiv:1810.04805 (2018).
 [9] S. Hertling, J. Portisch, H. Paulheim, Matching with transformers in MELT, in: OM@ISWC,
     2021, pp. 13–24.
[10] C. Meilicke, Alignment incoherence in ontology matching (2011).