A Benchmark for Testing Instance-Based Ontology Matching Methods

Katrin Zaiss, Sven Vater, Stefan Conrad
Institute of Computer Science
Universitaetsstr. 1
40225 Duesseldorf, Germany
zaiss@cs.uni-duesseldorf.de, Sven.Vater@uni-duesseldorf.de,
conrad@cs.uni-duesseldorf.de


ABSTRACT
The matching of ontologies is a problem addressed by many different matching
systems using various algorithms. To test different methods or a complete
system, or to compare systems with each other, a common data set is needed.
Some benchmarks containing many test scenarios are already available, but
they mainly focus on concept-based matching algorithms or on instance
matching (the process of finding similar instances). Instance-based methods
cannot be tested sufficiently, because the ontologies contain no instances at
all or only a very small number of them. In this poster we introduce a new
benchmark, ONTOBI, which makes use of Wikipedia to create a benchmark test
series with ontologies that contain many instances.

1.   INTRODUCTION
Ontologies represent knowledge in a structured way. In many application areas
there is a need to match ontologies, e.g. in the field of query answering on
heterogeneous sources. In the past, many matching systems have been developed
to cope with this problem; an overview can be found in [ES07]. Generally, the
methods used can be divided into concept-, structure- and instance-based
approaches, and in most cases a matching system uses a combination of those
approaches. To test the efficiency of single algorithms or complete systems,
or to compare systems with each other based on a common data set, appropriate
ontologies and test cases have to be developed.
Currently, there are already some benchmarks, e.g. the one published by the
OAEI [OAE09], the STBenchmark [ATV08] or the IIMB [FLMV08]. The OAEI
benchmark ontologies only contain a very small number of instances for a few
concepts. In 2009 some instance-matching tasks were added, but they cannot
easily be adapted for instance-based ontology matching methods, because the
concept information remains the same for all tests and the reference
alignment only contains instance-to-instance correspondences. The instances
of the ontologies created with the STBenchmark are created artificially and
do not contain instance variations/modifications, and the IIMB benchmark is
again designed for instance matching tasks and only provides a small ontology
with no changes on the concept level. The lack of a reasonable number of
instances in existing benchmarks motivates the development of an additional
benchmark, which we present in this paper.

2.   DEVELOPING THE BENCHMARK
Considering the advantages and disadvantages of existing benchmarks and our
own experience with matching systems, we defined a set of requirements that
our new benchmark should fulfill. First of all, the general requirements for
evaluation frameworks as described in [ES07] should be considered, i.e.
systematic procedure, continuity, quality and equity, dissemination and
intelligibility. Additionally, we formulated some further criteria: a larger
number of instances, varying structure, different data formats, spelling
mistakes and 1:n mappings. As described before, our focus is on a large
number of instances to enable the evaluation of instance-based matching
methods, but we also want to provide a complete benchmark with which all
kinds of methods and systems can be tested.
Similar to the OAEI benchmark, our ONTOBI benchmark consists of different
test scenarios, in which a reference ontology provides the basis for each
test case. The reference ontology is modified by applying one or more of the
modifications described in Table 1. The modifications are applied to
different parts of the ontology (instance set, concept names, etc.), and in
most cases only a subset of the corresponding data is changed. The modified
ontology has to be matched against the original reference ontology; an
overview of the process is given in Figure 1. The reference alignment is
given as well (the format for this alignment is borrowed from the Ontology
Alignment API [Euz06]), so that the results can be evaluated by e.g.
calculating Precision and Recall.
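The evaluation step mentioned above can be illustrated with a minimal sketch.
It assumes that an alignment has already been reduced to a set of
(modified entity, reference entity) pairs; the Alignment API format itself is
not parsed here, and the entity names are hypothetical and not taken from
ONTOBI.

```python
from typing import Set, Tuple

# Assumption: a correspondence is a pair
# (entity in the modified ontology, entity in the reference ontology).
Correspondence = Tuple[str, str]


def precision_recall(found: Set[Correspondence],
                     reference: Set[Correspondence]) -> Tuple[float, float]:
    """Compare a matcher's output with the reference alignment."""
    correct = found & reference
    # Precision: share of found correspondences that are correct.
    precision = len(correct) / len(found) if found else 0.0
    # Recall: share of reference correspondences that were found.
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data (not part of the benchmark).
reference = {("Land", "Country"), ("Stadt", "City"),
             ("Sprache", "Language"), ("Buch", "Book")}
found = {("Land", "Country"), ("Stadt", "City"), ("Sprache", "Book")}

p, r = precision_recall(found, reference)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Two of the three found correspondences are correct, and two of the four
reference correspondences are found, which gives the values shown.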
We decided to use Wikipedia as the basis for our reference ontology, because
Wikipedia provides a lot of information within its structured infoboxes, and
developed a tool that extracts concepts, attributes, relations and instances
from these infoboxes. The reference ontology consists of 17 classes, 13
object properties and 128 data type properties. It is constructed around
different concepts describing the geographical structure of our world, i.e.
countries, states,
identifier    modification
simple transformations
M             spelling mistakes
F             changed format
L1            different naming conventions
S1            suppressed comments
S2            no data types
I1            overlapping data sets
I2            subset data sets
complex transformations
H1            expanded structure
H2            flattened structure
L2            another language
L3            random names
L4            synonyms
I3            disjunct data sets

Table 1: Overview of the modifications

test number   modification(s)
simple tests
OS1           M
OS2           S1
OS3           I2
OS4           L4
OS5           L2
OS6           L3
OS7           H1
OS8           H2
complex tests (two mods)
OC1           M, F
OC2           L3, S1
OC3           L4, I1
OC4           F, I3
OC5           H2, I1
OC6           H1, S1
complex tests (three mods)
OCC1          M, L4, S1
OCC2          H2, L1, I2
OCC3          I3, H1, L4
OCC4          S1, F, L3, I2

Table 2: Overview of the benchmark
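
To make the idea of a modification that changes only a subset of the data
more concrete, the following is a minimal sketch of how modification M
(spelling mistakes) from Table 1 might be applied to instance values. It is
not the procedure used for ONTOBI. The function names, the choice of swapping
two adjacent characters, the dictionary representation of instances and the
`fraction` parameter are all assumptions made for illustration.

```python
import random
from typing import Dict


def add_spelling_mistake(value: str, rng: random.Random) -> str:
    """Swap two adjacent characters at a random position.

    Values with fewer than two characters are returned unchanged. If the
    two swapped characters are identical, the value also stays the same.
    """
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def apply_spelling_mistakes(instances: Dict[str, str],
                            fraction: float,
                            seed: int = 0) -> Dict[str, str]:
    """Return a copy in which a random subset of values is misspelled.

    `instances` maps a hypothetical instance identifier to one string
    value; `fraction` is the share of instances to modify.
    """
    rng = random.Random(seed)  # fixed seed keeps the test case reproducible
    keys = sorted(instances)
    chosen = set(rng.sample(keys, round(fraction * len(keys))))
    return {key: add_spelling_mistake(value, rng) if key in chosen else value
            for key, value in instances.items()}


# Hypothetical instance values (not taken from the benchmark).
cities = {"c1": "Duesseldorf", "c2": "Karlsruhe",
          "c3": "Heidelberg", "c4": "Stanford"}
modified = apply_spelling_mistakes(cities, fraction=0.5)
for key in cities:
    print(key, cities[key], "->", modified[key])
```

Fixing the random seed means that the same modified data set, and therefore
the same reference alignment, can be regenerated for every run.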


[Figure 1: Overview of a test case. Diagram labels: test case, reference
ontology, mods, modified ontology, alignment.]

cities and languages. Additionally there are concepts describing different
kinds of entertainment media, such as books, movies or songs with their
corresponding authors, actors and singers. Another part of the ontology deals
with companies and their products, e.g. cars, mobile phones or magazines. The
most important issue for ONTOBI is the instance set. Currently, the reference
ontology contains more than 3500 instances, but the number grows constantly.
The different combinations of modifications that we applied to the reference
ontology, and hence the different test cases, can be found in Table 2. All
modifications are executed manually using an ontology editor like Protégé
[Pro09].
The complete benchmark, including the ontologies and the reference
alignments, is available for download on demand.

3.   FUTURE WORK
The work on this benchmark is still in progress. We have recently enhanced
the quality of our benchmark by directly deriving the ontologies from DBpedia
[LBK+09] (see www.dbs.cs.uni-duesseldorf.de/projekte/ONTOBI). We have also
increased the number of concepts, attributes and instances; the modifications
have been slightly changed and the test cases have been reorganized.
Additionally, an ontology modifier has been implemented which automatically
applies selected transformations to the reference ontology. In future work we
want to focus on developing more complex modifications on the instance level.

4.   REFERENCES
[ATV08]   Bogdan Alexe, Wang-Chiew Tan, and Yannis Velegrakis. STBenchmark:
          Towards a Benchmark for Mapping Systems. Proc. VLDB Endow.,
          1(1):230–244, 2008.
[ES07]    Jérôme Euzenat and Pavel Shvaiko. Ontology Matching.
          Springer-Verlag, Heidelberg (DE), 2007.
[Euz06]   Jérôme Euzenat. An API for ontology Alignment (version 2.1).
          http://gforge.inria.fr/docman/view.php/117/251/align.pdf, 2006.
[FLMV08]  Alfio Ferrara, Davide Lorusso, Stefano Montanelli, and Gaia Varese.
          Towards a Benchmark for Instance Matching. In Proceedings of the
          3rd International Workshop on Ontology Matching (OM-2008)
          Collocated with the 7th International Semantic Web Conference
          (ISWC-2008), Karlsruhe, Germany, October 26, 2008.
[LBK+09]  Jens Lehmann, Chris Bizer, Georgi Kobilarov, Søren Auer, Christian
          Becker, Richard Cyganiak, and Sebastian Hellmann. DBpedia - A
          Crystallization Point for the Web of Data. Journal of Web
          Semantics, 2009.
[OAE09]   Ontology Alignment Evaluation Initiative - OAEI-2009 Campaign.
          http://oaei.ontologymatching.org/2009/, 2009.
[Pro09]   The Protégé Ontology Editor and Knowledge Acquisition System.
          http://protege.stanford.edu/, December 2009.