<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Annotating Web Tables Through Ontology Matching</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vasilis Efthymiou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oktie Hassanzadeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Sadoghi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mariano Rodriguez-Muro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM T.J. Watson Research Center</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Crete</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Web tables have proven to be valuable sources of information for applications ranging from Web search to data discovery in spreadsheet software and knowledge base (KB) augmentation [1]. These applications need to understand the semantics of Web tables and, potentially, to match their contents with existing URIs in the Web of Data, a process known as Web table annotation [4]. Recent works on Web table annotation iterate between instance-level and schema-level refinements until convergence [6, 7]. In this work, we annotate Web tables using ontology matching. As this field has well-established tools and benchmarks<sup>3</sup>, we design a framework that provides the required input to any ontology matching tool, resulting in Web table annotations. Moreover, our blocking step enables even the less scalable ontology matching tools to provide annotations against large-scale KBs such as DBpedia. The contributions of our work are:
          <list list-type="bullet">
            <list-item>
              <p>We introduce a generic and scalable framework for Web table annotation using existing ontology alignment systems.</p>
            </list-item>
            <list-item>
              <p>We evaluate our framework and compare the results against state-of-the-art Web table annotation tools, with promising results.</p>
            </list-item>
            <list-item>
              <p>Our framework can be extended into a benchmark for ontology matching tools.</p>
            </list-item>
          </list>
        </p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Model. We assume that each table row describes a real-world entity and each
column represents a property. Each cell of the header row defines the name of
a property, except the cell of the label column, which defines the name of the
table's class. All the entities in the table are instances of this class. The values of
a column can be either literals or references to other entities, corresponding to
datatype or object properties, respectively. To make this distinction, we sample
the data types of each column and also identify the label column, as in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In
a second scan, we create a new instance of the table class for each row, whose
property values are the cell contents of this row for the respective column.
      </p>
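      <p>The two-scan conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: rows are lists of strings, a column counts as literal-valued when most sampled cells parse as numbers, and the label column is taken to be the entity-valued column with the most distinct values (leftmost on ties). That heuristic and every name in the sketch (<monospace>Instance</monospace>, <monospace>column_kinds</monospace>, <monospace>label_column</monospace>, <monospace>table_to_instances</monospace>) are hypothetical and not necessarily the procedure of [6].</p>
      <preformat preformat-type="code">
```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A table row modelled as an instance of the table's class (hypothetical structure)."""
    cls: str                                            # header cell of the label column
    label: str                                          # the row's cell in the label column
    datatype_props: dict = field(default_factory=dict)  # literal-valued columns
    object_props: dict = field(default_factory=dict)    # entity-valued columns


def is_literal(value: str) -> bool:
    # Assumption: a cell is a literal if it parses as a number (thousands separators allowed).
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False


def column_kinds(rows, n_cols, sample_size=20, seed=0):
    """First scan: classify each column as 'literal' or 'entity' by majority vote over a sample."""
    rng = random.Random(seed)
    sample = rows if sample_size >= len(rows) else rng.sample(rows, sample_size)
    kinds = []
    for c in range(n_cols):
        literal_votes = sum(is_literal(r[c]) for r in sample)
        kinds.append("literal" if 2 * literal_votes > len(sample) else "entity")
    return kinds


def label_column(rows, kinds):
    """Hypothetical heuristic: the entity column with the most distinct values (leftmost on ties)."""
    best, best_unique = None, -1
    for c, kind in enumerate(kinds):
        if kind == "entity":
            unique = len({r[c] for r in rows})
            if unique > best_unique:
                best, best_unique = c, unique
    return best


def table_to_instances(header, rows):
    kinds = column_kinds(rows, len(header))
    lc = label_column(rows, kinds)
    instances = []
    for r in rows:  # second scan: one instance of the table class per row
        inst = Instance(cls=header[lc], label=r[lc])
        for c, name in enumerate(header):
            if c != lc:
                props = inst.datatype_props if kinds[c] == "literal" else inst.object_props
                props[name] = r[c]
        instances.append(inst)
    return instances


header = ["Country", "Capital", "Population"]
rows = [
    ["Greece", "Athens", "10,700,000"],
    ["France", "Paris", "67,000,000"],
    ["Italy", "Rome", "59,000,000"],
]
instances = table_to_instances(header, rows)
```
      </preformat>
      <p>Here the first row becomes an instance of class Country labelled Greece, with Capital as an object property and Population as a datatype property. The sketch assumes at least one entity-valued column exists.</p>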
      <p>
        Blocking. To make ontology matching tools that do not scale well applicable
in this framework, and to improve the efficiency of those that do scale, we apply
a pre-processing step of candidate mapping selection, known as blocking [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Specifically, from DBpedia, the target ontology, we retain only those
instances whose labels match the labels of our table's instances.
      </p>
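      <p>A minimal sketch of this label-based blocking, assuming that two labels "match" when they are equal after a simple normalization (accent stripping, lowercasing, punctuation removal, whitespace collapsing) and that DBpedia is available as a mapping from instance URIs to labels. The normalization rule and the names <monospace>normalize</monospace> and <monospace>block</monospace> are illustrative assumptions; the exact matching criterion is not specified here.</p>
      <preformat preformat-type="code">
```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Assumed normalization: strip accents, lowercase, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def block(table_labels, kb_labels):
    """Keep only the KB instances whose normalized label equals a normalized table label.

    table_labels: cell values of the label column.
    kb_labels: dict from KB instance URI to its label (e.g. its rdfs:label).
    """
    keys = {normalize(lab) for lab in table_labels}  # hash set: one lookup per KB instance
    return {uri: lab for uri, lab in kb_labels.items() if normalize(lab) in keys}


# Toy stand-in for a DBpedia label dump.
kb = {
    "http://dbpedia.org/resource/Athens": "Athens",
    "http://dbpedia.org/resource/Paris": "Paris",
    "http://dbpedia.org/resource/Berlin": "Berlin",
}
candidates = block(["Athens", "PARIS "], kb)
```
      </preformat>
      <p>Only the Athens and Paris instances survive, so the matcher receives a much smaller target ontology. Exact matching on normalized labels keeps the step to a single pass over the KB, at the cost of missing spelling variants and aliases.</p>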
    </sec>
    <sec id="sec-2">
      <title>3 http://oaei.ontologymatching.org/</title>
      <p>Finally, we invoke an ontology matching tool with the table ontology and the
blocked DBpedia ontology as input, and return the resulting mappings.</p>
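      <p>The overall flow (table ontology, blocking, matcher call) can be sketched with the matcher hidden behind a small interface, so that any alignment system can be plugged in. The <monospace>OntologyMatcher</monospace> protocol, the <monospace>Mapping</monospace> record, and <monospace>ExactLabelMatcher</monospace> are hypothetical; the latter is only a toy stand-in for a real tool such as LogMap [5]. A wrapper around a real tool would presumably have to serialize both inputs as ontology files rather than pass Python dictionaries.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Mapping:
    source: str        # table instance id, e.g. "row0"
    target: str        # KB instance URI
    confidence: float


class OntologyMatcher(Protocol):
    """Hypothetical interface: an alignment system wrapped to take two label maps."""
    def match(self, source: Dict[str, str], target: Dict[str, str]) -> List[Mapping]: ...


class ExactLabelMatcher:
    """Toy stand-in for a real matcher: pairs instances whose labels are equal ignoring case."""
    def match(self, source, target):
        index = {}
        for uri, label in target.items():
            index.setdefault(label.casefold(), []).append(uri)
        return [Mapping(sid, uri, 1.0)
                for sid, label in source.items()
                for uri in index.get(label.casefold(), [])]


def annotate(table_labels: Dict[str, str], kb_labels: Dict[str, str],
             matcher: OntologyMatcher) -> List[Mapping]:
    # Blocking: restrict the target to KB instances whose label matches some table label.
    wanted = {lab.casefold() for lab in table_labels.values()}
    blocked = {uri: lab for uri, lab in kb_labels.items() if lab.casefold() in wanted}
    # Matching: pass the table instances and the blocked KB to the matcher, return its mappings.
    return matcher.match(table_labels, blocked)


table = {"row0": "Athens", "row1": "Paris"}
kb = {
    "http://dbpedia.org/resource/Athens": "Athens",
    "http://dbpedia.org/resource/Berlin": "Berlin",
}
result = annotate(table, kb, ExactLabelMatcher())
```
      </preformat>
      <p>Swapping in another matcher only requires another class with a compatible <monospace>match</monospace> method; the blocking and the result handling stay unchanged.</p>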
      <p>
        Evaluation. We evaluate our approach using the instance mappings of the
T2D gold standard<sup>4</sup> and LogMap [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], one of the most efficient ontology matching
tools [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Our MapReduce-based framework annotates and evaluates the whole
corpus in less than 4 minutes. Table 1 compares the micro-averaged recall,
precision, and F-measure of our approach against T2K [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and two baselines:
        <list list-type="bullet">
          <list-item>
            <p>DBpedia lookup. For each entity label in our table, we use the top-1 DBpedia Lookup<sup>5</sup> result as the annotation.</p>
          </list-item>
          <list-item>
            <p>DBpedia lookup refined. In a first scan of the table, we keep the type of the top-1 lookup result for each cell, and take the top-5 most frequent types of each column as its acceptable types. We then perform a second lookup restricted to the acceptable types and use the top-1 result as the annotation.</p>
          </list-item>
        </list>
      </p>
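      <p>The refined baseline can be sketched per column as below. The <monospace>lookup</monospace> callable is a hypothetical stand-in for a lookup service such as DBpedia Lookup: it returns ranked (URI, type) candidates for a label, optionally restricted to a set of types. Its signature is an assumption, not the service's API, and the toy index and its ranking are made up for illustration.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Set, Tuple

# Hypothetical lookup signature: (label, allowed types or None) -> ranked [(uri, type), ...].
Lookup = Callable[[str, Optional[Set[str]]], List[Tuple[str, str]]]


def refined_lookup(column: Sequence[str], lookup: Lookup, k: int = 5) -> List[Optional[str]]:
    # First scan: record the type of the top-1 result for every cell.
    type_counts = Counter()
    for cell in column:
        hits = lookup(cell, None)
        if hits:
            type_counts[hits[0][1]] += 1
    # The k most frequent types of the column become its acceptable types.
    acceptable = {t for t, _ in type_counts.most_common(k)}
    # Second lookup restricted to acceptable types; keep the top-1 URI (None if nothing is left).
    annotations = []
    for cell in column:
        hits = lookup(cell, acceptable)
        annotations.append(hits[0][0] if hits else None)
    return annotations


# Toy ranked index standing in for the lookup service.
INDEX = {
    "paris": [("http://dbpedia.org/resource/Paris_Hilton", "Person"),
              ("http://dbpedia.org/resource/Paris", "City")],
    "athens": [("http://dbpedia.org/resource/Athens", "City")],
    "rome": [("http://dbpedia.org/resource/Rome", "City")],
}


def toy_lookup(label, types=None):
    hits = INDEX.get(label.lower(), [])
    return [h for h in hits if types is None or h[1] in types]


column = ["Paris", "Athens", "Rome"]
plain = [toy_lookup(c)[0][0] for c in column]    # plain baseline: top-1 result per cell
refined = refined_lookup(column, toy_lookup, k=1)
```
      </preformat>
      <p>With k = 5, as in the baseline described above, both Person and City would be acceptable in this three-row column and "Paris" would keep its top-1 Person result; the toy call therefore uses k = 1 to show the type filter at work.</p>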
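      <table-wrap>
        <label>Table 1</label>
        <caption>
          <p>Results over the T2D gold standard. Blocking results in parentheses.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Method</th><th>Recall</th><th>Precision</th><th>F-measure</th></tr>
          </thead>
          <tbody>
            <tr><td>DBpedia lookup</td><td>0.73</td><td>0.79</td><td>0.76</td></tr>
            <tr><td>DBpedia lookup refined</td><td>0.76</td><td>0.86</td><td>0.81</td></tr>
            <tr><td>T2K</td><td>0.76</td><td>0.90</td><td>0.82</td></tr>
            <tr><td>Ontology matching</td><td>0.57 (0.71)</td><td>0.89 (0.32)</td><td>0.70 (0.44)</td></tr>
          </tbody>
        </table>
      </table-wrap>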
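      <p>Micro-averaging pools the counts of correct, predicted, and gold annotations over all tables before dividing, so larger tables weigh more than in a per-table (macro) average; the F-measure is the harmonic mean F = 2PR/(P + R). The sketch below computes the three measures this way on made-up toy mappings; the function names are hypothetical. Recomputing F from the rounded recall and precision of each row of Table 1 reproduces the reported F-measures to within 0.01.</p>
      <preformat preformat-type="code">
```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (table entity id, KB URI)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(per_table: Iterable[Tuple[Set[Pair], Set[Pair]]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, F-measure: pool counts over all tables, then divide."""
    tp = predicted = gold = 0
    for pred, truth in per_table:
        tp += len(pred.intersection(truth))
        predicted += len(pred)
        gold += len(truth)
    precision = tp / predicted if predicted else 0.0
    recall = tp / gold if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy input (made-up ids): (predicted mappings, gold mappings) for each table.
tables = [
    ({("t1/r1", "db:A"), ("t1/r2", "db:B")},
     {("t1/r1", "db:A"), ("t1/r2", "db:C"), ("t1/r3", "db:D")}),
    ({("t2/r1", "db:E")}, {("t2/r1", "db:E")}),
]
precision, recall, f = micro_prf(tables)  # pooled: 2 correct of 3 predicted, 2 of 4 gold

# Sanity check against Table 1: F from the DBpedia lookup row (precision 0.79, recall 0.73).
lookup_f = round(f_measure(0.79, 0.73), 2)
```
      </preformat>
      <p>On this toy input the micro-averaged precision is 2/3, whereas averaging the two per-table precisions (1/2 and 1) would give 3/4.</p>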
      <p>The results show that our framework, using LogMap, produces a good number
of correct annotations with high precision (0.89), although its recall (0.57) is lower
than that of the baselines and T2K. In the future, we plan to improve blocking
and extend our model to provide a first alignment, which can be utilized by many
ontology matching tools. Our goal is to provide an ontology matching benchmark
for instance-, class-, and property-mappings, which could become a new track in
upcoming OAEI campaigns.</p>
    </sec>
    <sec id="sec-3">
      <title>4 http://webdatacommons.org/webtables/goldstandard.html</title>
      <p>5 http://wiki.dbpedia.org/projects/dbpedia-lookup</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>S.</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Harb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rostamizadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wilder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Applying WebTables in Practice</article-title>
          .
          <source>In CIDR</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>V.</given-names>
            <surname>Christophides</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          .
          <article-title>Entity Resolution in the Web of Data. Synthesis Lectures on the Semantic Web: Theory and Technology</article-title>
          . Morgan &amp; Claypool Publishers,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>E.</given-names>
            <surname>Daskalaki</surname>
          </string-name>
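          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Flouris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Fundulaki</surname>
          </string-name>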
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Saveta</surname>
          </string-name>
          .
          <article-title>Instance matching benchmarks in the era of linked data</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>39</volume>
          :
          <fpage>1</fpage>
          –
          <lpage>14</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez-Muro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          .
          <article-title>Understanding a large corpus of web tables through matching with knowledge bases: an empirical study</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>25</fpage>
          –
          <lpage>34</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Cuenca Grau</surname>
          </string-name>
          .
          <article-title>LogMap: Logic-based and scalable ontology matching</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>273</fpage>
          –
          <lpage>288</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lehmberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Matching HTML tables to dbpedia</article-title>
          .
          <source>In WIMS</source>
          , pages 10:1–10:6
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Towards efficient and effective semantic table interpretation</article-title>
          .
          <source>In ISWC</source>
          , pages
          <fpage>487</fpage>
          –
          <lpage>502</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>