<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LANCE: A Generic Benchmark Generator for Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tzanina Saveta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evangelia Daskalaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Flouris</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irini Fundulaki</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel-Cyrille Ngonga Ngomo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IFI/AKSW, University of Leipzig</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science-FORTH</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Identifying duplicate instances in the Data Web is most commonly performed (semi-)automatically using instance matching frameworks. However, current instance matching benchmarks fail to provide end users and developers with the necessary insights pertaining to how current frameworks behave when dealing with real data. In this demo paper, we present Lance, a domain-independent instance matching benchmark generator for Linked Data. Lance is the first benchmark generator for Linked Data to support semantics-aware test cases that take into account complex OWL constructs in addition to the standard test cases related to structure and value transformations. Lance supports the definition of matching tasks with varying degrees of difficulty and produces a weighted gold standard, which allows a more fine-grained analysis of the performance of instance matching tools. It can accept as input any linked dataset and its accompanying schema to produce a target dataset implementing test cases of varying levels of difficulty. In this demo, we will present the benchmark generation process underlying Lance as well as the user interface designed to support Lance users.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Instance matching (IM) refers to the problem of identifying instances that
describe the same real-world object. With the increasing adoption of Semantic Web
technologies and the publication of large interrelated RDF datasets and
ontologies that form the Linked Data (LD) Cloud, a number of IM techniques adapted
to this setting have been proposed [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1,2,3</xref>
        ]. Clearly, the large variety of IM
techniques requires their comparative evaluation to determine which technique is
best suited for a given application. Assessing the performance of these systems
generally requires well-defined and widely accepted benchmarks that determine the
weak and strong points of the methods or systems and motivate the
development of better systems that overcome the identified weak points. Hence,
suitable benchmarks help push the limits of existing systems [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4,5,6,7,8</xref>
        ], advancing
both research and technology.
      </p>
      <p>
        In this paper, we describe Lance, a flexible, generic and domain-independent
benchmark generator for IM systems. This demo paper is a companion paper to the accepted ISWC research paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and this work was partially supported by the EU FP7 project LDBC (FP7-ICT-2011-8
#317548) and the H2020 project PARTHENOS (#654119). Lance supports a large variety of value-based,
structure-based and semantics-aware transformations with varying degrees of
difficulty. The results of these transformations can be recorded in the form of a
weighted gold standard that allows a more fine-grained analysis of the
performance of instance matching tools. This paper focuses on describing the interface
that allows users to generate a benchmark by providing the different parameters
that determine the characteristics of the benchmark (source datasets, types and
severity of transformations, size of the generated dataset, and further
configurations such as the namespace for the transformed instances, the output date format
and others). Details on the different types of transformations, our weighted gold
standard and metrics, as well as the evaluation of our system, can be found in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
Our demo can be found at http://tinyurl.com/pvex9hu.
      </p>
    </sec>
    <sec id="sec-2">
      <title>LANCE Approach</title>
      <p>
Lance [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is a flexible, generic and domain-independent benchmark generator
for IM systems whose main features are:
Transformation-based test cases. Lance supports a set of test cases based
on transformations that distinguish different types of matching entities. Similarly
to existing IM benchmarks, Lance supports value-based (typos, date/number
formats, etc.) and structure-based (deletion of classes/properties, aggregations,
splits, etc.) test cases. Lance is the first benchmark generator to support
semantics-aware test cases that go beyond the standard RDFS constructs and allow testing
the ability of IM systems to use the semantics of RDFS/OWL axioms to
identify matches; these include tests involving instance (in)equality, class and property
equivalence and disjointness, property constraints, as well as complex class
definitions. Lance also supports simple combination (SC) test cases (implemented
using the aforementioned transformations applied on different triples pertaining
to the same instance), as well as complex combination (CC) test cases
(implemented by combinations of individual transformations on the same triple).
Similarity score and fine-grained evaluation metrics. Lance provides an
enriched, weighted gold standard and related evaluation metrics, which allow a
more fine-grained analysis of the performance of systems for tests with varying
difficulty. The gold standard indicates the matches between source and target
instances. In particular, each match in the gold standard is enriched with
annotations specific to the test case that generated the pair, i.e., the type of test case
it represents, the property on which a transformation was applied, and a
similarity score (or weight) of the pair of reported matched instances that essentially
quantifies the difficulty of finding that particular match. This detailed information
allows Lance to provide more detailed views and novel evaluation metrics to
assess the completeness, soundness, and overall matching quality of an IM
system on top of the standard precision/recall metrics. Therewith, Lance provides
fine-grained information to support debugging and extending IM systems.
High level of customization and scalability testing. Lance provides the
ability to build benchmarks with different characteristics on top of any input
dataset, thereby allowing the implementation of diverse test cases for different
domains, dataset sizes and morphologies. This makes Lance highly customizable
and domain-independent; it also allows systematic scalability testing of IM
systems, a feature which is not available in most state-of-the-art IM benchmarks.
      </p>
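      <p>The transformation-based test cases described above can be illustrated with a small sketch that applies a value-based (typo) and a structure-based (property split) transformation to a single triple, and records a weighted gold standard entry. The function names, the triple representation and the similarity weight below are assumptions made for illustration, not Lance's actual API.</p>
      <preformat>
```python
# Illustrative sketch of Lance-style transformation-based test cases;
# all names and the fixed similarity weight are hypothetical.
import random

def value_transformation_typo(literal, severity=0.3, seed=42):
    """Value-based test case: corrupt a fraction of the characters."""
    rng = random.Random(seed)
    chars = list(literal)
    n = max(1, int(len(chars) * severity))
    for i in rng.sample(range(len(chars)), n):
        chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def structure_transformation_split(triple, parts=2):
    """Structure-based test case: split one property value across
    several new properties (e.g. 'name' into 'name_1', 'name_2')."""
    s, p, o = triple
    words = o.split()
    chunk = max(1, len(words) // parts)
    return [(s, f"{p}_{i + 1}", " ".join(words[i * chunk:(i + 1) * chunk]))
            for i in range(parts)]

source = ("ex:person1", "foaf:name", "Tzanina Saveta")
target_value = (source[0] + "_t", source[1], value_transformation_typo(source[2]))
target_struct = structure_transformation_split(source)

# Each match in the weighted gold standard records the test case type,
# the transformed property, and a similarity score quantifying difficulty.
gold_standard_entry = {
    "source": source[0],
    "target": target_value[0],
    "test_case": "value/typo",
    "property": source[1],
    "similarity": 0.7,  # illustrative weight, not a Lance-computed score
}
```
      </preformat>
      <p>A harder test case would combine several such transformations (the SC and CC cases above) and be assigned a correspondingly lower similarity weight.</p>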
      <p>[Fig. 1: The Lance architecture: an RDF repository holding the source dataset, queried via SPARQL (schema and instance queries); the test case generator with its Initialization, Resource Generator and Resource Transformation modules; and the Weight Computation Module (with MATCHER, SAMPLER and RESCAL components) producing the matched instances.]</p>
      <p>
In the following, we present the functionality which we will also explain during
the demo. Architecturally, Lance consists of two components: (i) an RDF
repository that stores the source datasets, and (ii) a test case generator, which takes
a source dataset as input and produces a target dataset. The target dataset is
generated by using some or all of the various test cases implemented by Lance
according to the configuration parameters specified by the user (see Figure 1).</p>
      <p>The test case generator consists of the initialization, resource generator and
the resource transformation modules. The first reads the generation parameters
and retrieves the schema that will be used for producing the target dataset.
The Resource Generator uses this input to retrieve instances of those schema
constructs and to pass them (along with the configuration parameters) to the
resource transformation module, which creates and stores one transformed
instance per source instance. Once Lance has performed all the requested
transformations, the Weight Computation Module calculates the similarity scores of
the produced matches.</p>
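      <p>The module pipeline described above can be sketched as follows; the function names, the configuration keys and the stand-in transformation and weight are assumptions made for this sketch, not Lance's actual implementation.</p>
      <preformat>
```python
# Minimal sketch of the test case generator pipeline:
# Initialization -> Resource Generator -> Resource Transformation
# -> Weight Computation. All names are hypothetical.

def initialize(config):
    """Read the generation parameters and select the schema
    constructs used to produce the target dataset."""
    return config.get("schema_classes", [])

def generate_resources(repository, schema_classes):
    """Retrieve the source instances of the selected schema constructs
    (Lance does this with SPARQL queries against the RDF repository)."""
    return [inst for cls in schema_classes
            for inst in repository.get(cls, [])]

def transform(instance, config):
    """Create one transformed target instance per source instance;
    a trivial namespace rename stands in for the real transformations."""
    return instance.replace(config["source_ns"], config["target_ns"])

def compute_weight(source_inst, target_inst):
    """Stand-in similarity score for the weighted gold standard."""
    return 1.0 if source_inst.split(":")[-1] == target_inst.split(":")[-1] else 0.5

config = {"schema_classes": ["foaf:Person"],
          "source_ns": "src:", "target_ns": "tgt:"}
repository = {"foaf:Person": ["src:alice", "src:bob"]}

schema = initialize(config)
sources = generate_resources(repository, schema)
gold = [(s, transform(s, config), compute_weight(s, transform(s, config)))
        for s in sources]
```
      </preformat>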
      <p>We have developed a Web application on top of Lance accessible at http:
//tinyurl.com/pvex9hu that allows one to produce benchmarks by selecting
the source dataset (which will be transformed to produce the target dataset)
and the corresponding gold standard. The produced benchmark (source and
target dataset, gold standard) is then sent to an email address (also speci ed via
the interface), allowing the user to test IM systems by comparing the produced
matches (between the source and target dataset) against the gold standard. The
benchmark generation is based on a set of con guration parameters which can
be tuned via the interface (see Figure 2). The con guration parameters specify
the part of the schema and data to consider when producing the target dataset
as well as the percentage and type of transformations to consider. The idea
behind con guration parameters is to allow one to tune the benchmark generator
into producing benchmarks of varying degrees of di culty which test di erent
aspects of an instance matching tool. The interested reader may also nd a video
demonstrating the basic functionality in http://tinyurl.com/ou69jt9.</p>
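      <p>A benchmark configuration of the kind tuned via the interface could be captured in a structure like the following; the parameter names and values are hypothetical and only mirror the options described above (dataset selection, schema and transformation percentages, target size, namespace, date format, email delivery).</p>
      <preformat>
```python
# Hypothetical configuration mirroring the interface parameters
# described above; the key names are assumptions, not Lance's own.
benchmark_config = {
    "source_dataset": "http://example.org/source.nt",  # dataset to transform
    "schema_fraction": 0.8,        # part of the schema to consider
    "target_size": 10000,          # size of the generated target dataset
    "target_namespace": "http://example.org/target/",
    "output_date_format": "yyyy-MM-dd",
    "transformations": {           # percentage per transformation type
        "value": 0.4,
        "structure": 0.3,
        "semantics": 0.2,
        "complex_combination": 0.1,
    },
    "email": "user@example.org",   # where the produced benchmark is sent
}

# Sanity check: the transformation percentages should cover the dataset.
assert abs(sum(benchmark_config["transformations"].values()) - 1.0) < 1e-9
```
      </preformat>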
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jentzsch</surname>
          </string-name>
          , et al.
          <article-title>Silk Server - Adding missing Links while consuming Linked Data</article-title>
          .
          <source>In COLD</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>LIMES - A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data</article-title>
          .
          <source>In IJCAI</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          , et al.
          <article-title>Entity resolution in the web of data</article-title>
          .
          <source>In WWW (Companion Volume)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <article-title>Ontology Alignment Evaluation Initiative</article-title>
          . http://oaei.ontologymatching.org/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>K.</given-names>
            <surname>Zaiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Conrad</surname>
          </string-name>
          , et al.
          <article-title>A Benchmark for Testing Instance-Based Ontology Matching Methods</article-title>
          .
          <source>In KMIS</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>B.</given-names>
            <surname>Alexe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Tan</surname>
          </string-name>
          , et al.
          <article-title>STBenchmark: Towards a benchmark for mapping systems</article-title>
          .
          <source>In PVLDB</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>T.</given-names>
            <surname>Saveta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daskalaki</surname>
          </string-name>
          , et al.
          <article-title>Pushing the Limits of Instance Matching Systems: A Semantics-Aware Benchmark for Linked Data</article-title>
          .
          <source>In WWW (Companion Volume)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T.</given-names>
            <surname>Saveta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Daskalaki</surname>
          </string-name>
          , et al.
          <article-title>LANCE: Piercing to the Heart of Instance Matching Tools</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2015</year>
          . To appear.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>