A Benchmark for Testing Instance-Based Ontology Matching Methods

Katrin Zaiss, Sven Vater, Stefan Conrad
Institute of Computer Science
Universitaetsstr. 1
40225 Duesseldorf, Germany
zaiss@cs.uni-duesseldorf.de, Sven.Vater@uni-duesseldorf.de, conrad@cs.uni-duesseldorf.de

ABSTRACT
Ontology matching is a problem addressed by many different matching systems using various algorithms. To test individual methods or a complete system, or to compare systems with each other, a common data set is needed. Several benchmarks with many test scenarios are already available, but they mainly focus on concept-based matching algorithms or on instance matching (the process of finding similar instances). Instance-based methods cannot be tested sufficiently, because the ontologies contain no instances at all or only a very small number of them. In this poster we introduce a new benchmark, ONTOBI, which makes use of Wikipedia to create a benchmark test series with ontologies that contain many instances.

1. INTRODUCTION
Ontologies represent knowledge in a structured way. In many application areas there is a need to match ontologies, e.g. in the field of query answering on heterogeneous sources. In the past, many matching systems have been developed to cope with this problem; an overview can be found in [ES07]. Generally, the methods used can be divided into concept-, structure- and instance-based approaches, and in most cases a matching system uses a combination of those approaches. To test the efficiency of single algorithms or complete systems, or to compare systems with each other based on a common data set, appropriate ontologies and test cases have to be developed.

There are already some benchmarks, e.g. the one published by the OAEI [OAE09], the STBenchmark [ATV08] or the IIMB [FLMV08]. The OAEI benchmark ontologies contain only a very small number of instances for a few concepts. In 2009 some instance-matching tasks were added, but they cannot easily be adapted for instance-based ontology matching methods, because the concept information remains the same for all tests and the reference alignment only contains instance-to-instance correspondences. The instances of the ontologies created with the STBenchmark are generated artificially and do not contain instance variations or modifications, and the IIMB benchmark is again designed for instance matching tasks and only provides a small ontology with no changes on the concept level. The lack of a reasonable number of instances in existing benchmarks motivates the development of an additional benchmark, which we present in this paper.

2. DEVELOPING THE BENCHMARK
Considering the advantages and disadvantages of existing benchmarks and our own experience with matching systems, we defined a set of requirements that our new benchmark should fulfill. First of all, the general requirements for evaluation frameworks described in [ES07] should be met, i.e. systematic procedure, continuity, quality and equity, dissemination and intelligibility. Additionally, we formulated some further criteria: a larger number of instances, varying structure, different data formats, spelling mistakes and 1:n mappings. As described before, our focus is on a large number of instances to enable the evaluation of instance-based matching methods, but we also want to provide a complete benchmark with which all kinds of methods and systems can be tested.

Similar to the OAEI benchmark, our ONTOBI benchmark consists of different test scenarios, where a reference ontology provides the basis for each test case. The reference ontology is modified by applying one or more of the modifications described in Table 1. The modifications are applied to different parts of the ontology (instance set, concept names, etc.), and in most cases only a subset of the corresponding data set is changed. The modified ontology then has to be matched against the original reference ontology; an overview of the process is given in Figure 1. The reference alignment is given as well (the format for this alignment is borrowed from the Ontology Alignment API [Euz06]), so that the results can be evaluated, e.g. by calculating Precision and Recall.

identifier   modification
simple transformations
M            spelling mistakes
F            changed format
L1           different naming conventions
S1           suppressed comments
S2           no data types
I1           overlapping data sets
I2           subset data sets
complex transformations
H1           expanded structure
H2           flattened structure
L2           another language
L3           random names
L4           synonyms
I3           disjoint data sets

Table 1: Overview of the modifications

test number  modification(s)
simple tests
OS1          M
OS2          S1
OS3          I2
OS4          L4
OS5          L2
OS6          L3
OS7          H1
OS8          H2
complex tests (two mods)
OC1          M, F
OC2          L3, S1
OC3          L4, I1
OC4          F, I3
OC5          H2, I1
OC6          H1, S1
complex tests (three mods)
OCC1         M, L4, S1
OCC2         H2, L1, I2
OCC3         I3, H1, L4
OCC4         S1, F, L3, I2

Table 2: Overview of the benchmark

[Figure 1: Overview of a test case (diagram labels: test case, reference ontology, mods, modified ontology, alignment)]

We decided to use Wikipedia as the basis for our reference ontology, because Wikipedia provides a lot of information within its structured infoboxes, and developed a tool that extracts concepts, attributes, relations and instances from these infoboxes. The reference ontology consists of 17 classes, 13 object properties and 128 data type properties. It is constructed around different concepts describing the geographical structure of our world, i.e. countries, states, cities and languages. Additionally, there are concepts describing different kinds of entertainment media, such as books, movies or songs with their corresponding authors, actors and singers. Another part of the ontology deals with companies and their products, e.g. cars, mobile phones or magazines. The most important aspect of ONTOBI is the instance set. Currently, the reference ontology contains more than 3500 instances, and the number is growing constantly.

The different combinations of modifications that we applied to the reference ontology, and hence the different test cases, can be found in Table 2. All modifications are carried out manually using an ontology editor such as Protégé [Pro09].

The complete benchmark, including the ontologies and the reference alignments, is available for download on request.

3. FUTURE WORK
The work on this benchmark is still in progress. We have recently enhanced the quality of our benchmark by deriving the ontologies directly from DBpedia [LBK+09] (see www.dbs.cs.uni-duesseldorf.de/projekte/ONTOBI). We have also increased the number of concepts, attributes and instances, slightly changed the modifications and reorganized the test cases. Additionally, an ontology modifier has been implemented which automatically applies selected transformations to the reference ontology. In future work we want to focus on developing more complex modifications on the instance level.

4. REFERENCES
[ATV08] Bogdan Alexe, Wang-Chiew Tan, and Yannis Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. Proc. VLDB Endow., 1(1):230–244, 2008.
[ES07] Jérôme Euzenat and Pavel Shvaiko. Ontology Matching. Springer-Verlag, Heidelberg (DE), 2007.
[Euz06] Jérôme Euzenat. An API for ontology Alignment (version 2.1). http://gforge.inria.fr/docman/view.php/117/251/align.pdf, 2006.
[FLMV08] Alfio Ferrara, Davide Lorusso, Stefano Montanelli, and Gaia Varese. Towards a Benchmark for Instance Matching. In Proceedings of the 3rd International Workshop on Ontology Matching (OM-2008) Collocated with the 7th International Semantic Web Conference (ISWC-2008), Karlsruhe, Germany, October 26, 2008.
[LBK+09] Jens Lehmann, Chris Bizer, Georgi Kobilarov, Søren Auer, Christian Becker, Richard Cyganiak, and Sebastian Hellmann. DBpedia - A Crystallization Point for the Web of Data. Journal of Web Semantics, 2009.
[OAE09] Ontology Alignment Evaluation Initiative - OAEI-2009 Campaign. http://oaei.ontologymatching.org/2009/, 2009.
[Pro09] The Protégé Ontology Editor and Knowledge Acquisition System. http://protege.stanford.edu/, December 2009.
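
NOTE ON EVALUATION
The evaluation step mentioned in Section 2 (comparing a matcher's output with the reference alignment by calculating Precision and Recall) can be illustrated with a minimal sketch. It assumes that both alignments have already been read into sets of entity pairs; the function name precision_recall and all entity names in the example data are hypothetical and are not part of ONTOBI or of the Ontology Alignment API.

```python
from typing import Set, Tuple

# A correspondence is modeled as a pair:
# (entity in the reference ontology, entity in the modified ontology).
Correspondence = Tuple[str, str]


def precision_recall(found: Set[Correspondence],
                     reference: Set[Correspondence]) -> Tuple[float, float]:
    """Return (precision, recall) of a found alignment w.r.t. a reference alignment."""
    correct = found & reference  # correspondences present in both alignments
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example data, invented purely for illustration.
reference = {("Country", "Land"), ("City", "Stadt"),
             ("Language", "Sprache"), ("Book", "Buch")}
found = {("Country", "Land"), ("City", "Stadt"), ("Language", "Buch")}

precision, recall = precision_recall(found, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.67 recall=0.50
```

In this example two of the three found correspondences are correct (precision 2/3) and two of the four reference correspondences are found (recall 2/4). Treating an empty alignment as scoring 0.0 is a convention chosen for this sketch only.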