<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Instance-Based Property Matching in Linked Open Data Environment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cheng Xie</string-name>
          <email>chengxie@sjtu.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominique Ritze</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Blerina Spahiu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongming Cai</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Shanghai Jiao Tong University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Mannheim</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Milano-Bicocca</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <abstract>
        <p>Instance matching frameworks that identify links between instances, expressed as owl:sameAs assertions, have achieved high performance, while the performance of property matching lags behind. In this paper, we leverage owl:sameAs links and show how they can help property matching.</p>
        <p>Introduction. The performance of ontology matching systems on property matching lags significantly behind that on class and instance matching [1]. Current state-of-the-art techniques achieve high performance on instance matching, which focuses on finding owl:sameAs links between LOD datasets [2]. These linked instances provide important information for the property matching process, which we explore further in this paper. We argue that owl:sameAs instance pairs share similar values on similar properties. We therefore investigate to what extent matching properties can be found automatically by exploiting owl:sameAs instance pairs.</p>
        <p>Approach. Figure 1 shows a concrete example of matching DBpedia to BTC2014. The proposed approach has four steps, which are described in detail below.</p>
        <p>Linked Data Extraction: As a first step, we extract all owl:sameAs triples whose subject is an instance in BTC2014 and whose object is an instance in DBpedia. In the same way, we extract owl:sameAs links from DBpedia to BTC2014, so that we obtain a complete set of linked instances between the two datasets.</p>
      </abstract>
      <kwd-group>
        <kwd>Property matching</kwd>
        <kwd>Instance-based matching</kwd>
        <kwd>Linked Open Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Linked Data as Tables: The table generation approach is based on
DBpediaAsTable<sup>5</sup>, with modifications to fit BTC2014 triples. We create one
table for each rdf:type and place the instances of that type in rows, while the
columns hold the values of their properties. In practice, large tables are
split into several smaller tables of at most 500 rows each, and only columns
with a density greater than 20% are kept.</p>
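      <p>The table generation step can be sketched as follows. This is a minimal illustration, not the DBpediaAsTable implementation: the function name build_tables, the triple format (plain string tuples) and the choice to keep only the first value per property are assumptions made here. Density is assumed to mean the share of rows in a table that have a value for the column, computed per 500-row table.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
MAX_ROWS = 500       # row limit per table, taken from the text
MIN_DENSITY = 0.20   # a column is kept only if its density exceeds 20%


def build_tables(triples, max_rows=MAX_ROWS, min_density=MIN_DENSITY):
    """Group (subject, predicate, object) triples into per-type tables.

    Returns {rdf_type: [table, ...]} where each table is a dict with
    "columns" (kept property names) and "rows" ({instance: {property: value}}).
    """
    types = defaultdict(set)     # instance -> set of rdf:type values
    values = defaultdict(dict)   # instance -> {property: first value seen}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            values[s].setdefault(p, o)  # assumption: keep the first value only

    by_type = defaultdict(list)
    for inst in sorted(types):
        for t in types[inst]:
            by_type[t].append(inst)

    tables = {}
    for t, instances in by_type.items():
        chunks = []
        # Split large tables into chunks of at most max_rows rows.
        for start in range(0, len(instances), max_rows):
            chunk = instances[start:start + max_rows]
            counts = defaultdict(int)
            for inst in chunk:
                for prop in values[inst]:
                    counts[prop] += 1
            # Assumption: density is measured within the chunk.
            columns = sorted(
                p for p, c in counts.items() if c / len(chunk) > min_density
            )
            rows = {
                inst: {p: values[inst].get(p) for p in columns}
                for inst in chunk
            }
            chunks.append({"columns": columns, "rows": rows})
        tables[t] = chunks
    return tables
```
]]></preformat>
      <p>An instance with several rdf:type values appears in one table per type, which matches the one-table-per-type description but may differ from the original tooling in details such as multi-valued properties.</p>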
      <p>Property Matching: We argue that "owl:sameAs instances share similar
values on similar properties". Given the owl:sameAs instance pairs, we detect
similar values by computing similarity measures on literal, numeric and date
cells. Property pairs whose cells hold similar values are then inferred to be
similar properties.</p>
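      <p>A minimal sketch of this step is given below. The paper does not name the similarity measures it uses, so everything concrete here is an assumption: the function names, the relative tolerance for numbers, exact equality for ISO-formatted dates, and the difflib ratio with a 0.8 cut-off for literals.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from datetime import date
from difflib import SequenceMatcher


def _as_number(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _as_date(value):
    try:
        return date.fromisoformat(str(value))
    except ValueError:
        return None


def values_similar(a, b, string_threshold=0.8, numeric_tolerance=0.05):
    """Compare two cells as numbers, then as dates, then as strings."""
    na, nb = _as_number(a), _as_number(b)
    if na is not None and nb is not None:
        scale = max(abs(na), abs(nb))
        return scale == 0 or abs(na - nb) / scale <= numeric_tolerance
    da, db = _as_date(a), _as_date(b)
    if da is not None and db is not None:
        return da == db
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= string_threshold


def candidate_pairs(linked_rows):
    """Count, per property pair, how many owl:sameAs pairs have similar values.

    linked_rows is an iterable of (row_a, row_b) dicts, one per owl:sameAs
    instance pair, mapping property names to cell values.
    """
    counts = Counter()
    for row_a, row_b in linked_rows:
        for p1, v1 in row_a.items():
            for p2, v2 in row_b.items():
                if v1 is None or v2 is None:
                    continue
                if values_similar(v1, v2):
                    counts[(p1, p2)] += 1
    return counts
```
]]></preformat>
      <p>Property pairs with a non-zero count form the candidate set that a later filtering step would prune.</p>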
      <p>Parametrization: The final property correspondences are selected from the
candidate set produced by the property matching step. The
selection filters property pairs using a support threshold su and a
confidence threshold co. A property pair (p1,p2) holds with support su if su%
of the owl:sameAs instances involved with p1 or p2 contain both p1 and p2.
A property pair (p1,p2) holds with confidence co if co% of the value pairs on
(p1,p2) share similar values. We divide our gold standard into a learning set and a
testing set. A genetic learning algorithm is applied to the learning set to obtain
suitable values for su and co.</p>
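      <p>The two definitions can be written out as a short sketch. The function names and the row representation (one pair of dicts per owl:sameAs instance pair) are assumptions made for illustration; the similarity test is passed in as a callable, and the thresholds su and co are taken as given rather than learned, so the genetic algorithm is not shown.</p>
      <preformat><![CDATA[
```python
def support_and_confidence(linked_rows, p1, p2, similar):
    """Return (support, confidence) of property pair (p1, p2) as fractions.

    support    = pairs containing both p1 and p2 / pairs containing p1 or p2
    confidence = pairs with similar values / pairs containing both p1 and p2
    """
    involved = both = matching = 0
    for row_a, row_b in linked_rows:
        has1 = row_a.get(p1) is not None
        has2 = row_b.get(p2) is not None
        if has1 or has2:
            involved += 1
        if has1 and has2:
            both += 1
            if similar(row_a[p1], row_b[p2]):
                matching += 1
    support = both / involved if involved else 0.0
    confidence = matching / both if both else 0.0
    return support, confidence


def select_pairs(linked_rows, candidates, similar, su, co):
    """Keep candidate pairs whose support and confidence reach su and co."""
    selected = []
    for p1, p2 in candidates:
        support, confidence = support_and_confidence(linked_rows, p1, p2, similar)
        if support >= su and confidence >= co:
            selected.append((p1, p2))
    return selected
```
]]></preformat>
      <p>Here su and co are fractions between 0 and 1 rather than percentages, which is only a notational choice.</p>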
      <p>Result. We use three string-based metrics, Jaccard, Levenshtein and
ExactEqual, as baselines to compare with our approach. All metrics are applied to the
testing set to find equivalent properties between BTC2014 and DBpedia. The
results and the comparison are shown in Table 1.</p>
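      <p>For reference, the three baseline metrics can be sketched as plain string functions. The text does not say how the baselines tokenize or normalize their input, so the word-level Jaccard and the length-normalized Levenshtein similarity below are assumptions, as are the function names.</p>
      <preformat><![CDATA[
```python
def exact_equal(a, b):
    """1.0 if the two strings are identical, else 0.0."""
    return 1.0 if a == b else 0.0


def jaccard(a, b):
    """Jaccard similarity over lower-cased whitespace tokens (an assumption)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    shared = tokens_a.intersection(tokens_b)
    return len(shared) / len(tokens_a.union(tokens_b))


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Map the edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```
]]></preformat>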
      <p>Table 1. Column headings: Experiments, True Positive, False Positive, GS, Pre, F1. Values: 84, 52, 52, 32.</p>
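      <p>The measures named in the table header follow from the counts in the usual way. The sketch below assumes that GS is the number of correspondences in the gold standard and that recall, which F1 requires, is computed against it; the function name is chosen here for illustration.</p>
      <preformat><![CDATA[
```python
def precision_recall_f1(true_positives, false_positives, gold_size):
    """Precision, recall and F1 from match counts and gold standard size."""
    found = true_positives + false_positives
    precision = true_positives / found if found else 0.0
    recall = true_positives / gold_size if gold_size else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```
]]></preformat>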
      <p>The proposed approach can effectively match property pairs that share
similar values, such as "landArea" with "areaTotal" and "diedIn" with
"deathPlace". However, similar values also lead to wrong matches, such as
"happenedOnDate" with "date", "capital" with "largestCity" and "hasPhotoCollection"
with "label", which require semantic matching on the property labels more than on
their values.</p>
      <p><sup>5</sup> http://wiki.dbpedia.org/services-resources/downloads/dbpedia-tables</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cheatham</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          .
          <article-title>The properties of property alignment</article-title>
          .
          <source>In Proc. of the 9th Int'l Workshop on Ontology Matching (OM)</source>
          , pages
          <fpage>13</fpage>
          –
          <fpage>24</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nentwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hartung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Rahm</surname>
          </string-name>
          .
          <article-title>A Survey of Current Link Discovery Frameworks</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>