<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>T2LD: Interpreting and Representing Tables as Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Varish Mulwad</string-name>
          <email>varish1@cs.umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Finin</string-name>
          <email>finin@cs.umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zareen Syed</string-name>
          <email>zareensyed@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anupam Joshi</string-name>
          <email>joshi@cs.umbc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Maryland</institution>
          ,
          <addr-line>Baltimore County, Baltimore MD USA 21250</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe a framework and prototype system for interpreting tables and extracting entities and relations from them, and producing a linked data representation of the table's contents. This can be used to annotate the table or to add new facts to the linked data collection.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Existing systems for extracting knowledge from tables [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] require human
intervention and focus neither on a complete interpretation of the table nor on
integrating the table with the linked open data (LOD) cloud. This poster paper focuses on
an automatic framework for generating linked RDF that can be integrated
into the LOD cloud. The eventual goal of this work is to enrich the LOD cloud by
learning new facts and knowledge from tables and publishing them on the Semantic
Web.
      </p>
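      <p>To develop an overall interpretation
of a table, we assign every column header
a class label from an appropriate ontology,
e.g., the column with header City
is assigned a class label dbpedia-owl:City
from the DBpedia ontology. For the table
in Figure 1, we link "Baltimore" to
dbpedia:Baltimore. Numbers can be mapped
as values of properties which can
be associated with entities in the table.</p>
      <table-wrap id="fig-1">
        <label>Fig. 1</label>
        <caption>
          <p>In simple tables, column headers suggest the type of data stored in columns and cell values denote instances of that type.</p>
        </caption>
        <table>
          <thead>
            <tr><th>city</th><th>state</th><th>mayor</th><th>population</th></tr>
          </thead>
          <tbody>
            <tr><td>Baltimore</td><td>MD</td><td>S.Dixon</td><td>640000</td></tr>
            <tr><td>Philadelphia</td><td>PA</td><td>M.Nutter</td><td>1500000</td></tr>
            <tr><td>Washington</td><td>DC</td><td>A.Fenty</td><td>595000</td></tr>
            <tr><td>New York</td><td>NY</td><td>M.Bloomberg</td><td>8400000</td></tr>
            <tr><td>Boston</td><td>MA</td><td>T.Menino</td><td>610000</td></tr>
          </tbody>
        </table>
      </table-wrap>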
      <p>We also identify the relations implicit between columns, e.g., that
dbpedia-owl:largestCity seems to hold between the entities denoted by cell values in the
first two columns (i.e., city and state). Finally, this information is represented in
an N3 serialization of RDF.</p>
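      <p>As a rough illustration of the final serialization step, the sketch below emits N3 statements for a single linked cell, using the "is rdfs:label of" pattern that appears in the paper's N3 template. The function names and the choice to omit @prefix declarations are assumptions made for this example and are not taken from the T2LD implementation.</p>
      <preformat>
```python
def n3_literal(text: str, lang: str = "en") -> str:
    """Quote a cell string as an N3 literal with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def cell_to_n3(cell_text: str, entity: str, class_label: str) -> list[str]:
    """Emit two N3 statements for one linked table cell (hypothetical helper).

    The first statement ties the cell string to the entity as its label;
    the second asserts the entity's class. Prefix declarations are assumed
    to be written elsewhere in the output document.
    """
    return [
        f"{n3_literal(cell_text)} is rdfs:label of {entity} .",
        f"{entity} a {class_label} .",
    ]


for line in cell_to_n3("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City"):
    print(line)
```
      </preformat>
      <p>For the "Baltimore" cell, this prints one statement linking the string to dbpedia:Baltimore and one typing that entity as dbpedia-owl:City.</p>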
    </sec>
    <sec id="sec-2">
      <title>T2LD Framework</title>
      <p>
        Given a table as input, the T2LD framework [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] begins with the process of
assigning a class label to every column in the table. For all the cell values in every
column of the table, the algorithm for assigning class labels (see Algorithm 1 in
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) submits a complex query to the Wikitology knowledge base to determine the
type of each cell value in the column. Each class label from the set of possible
class labels obtained from query results is scored. The class label with the highest
score is chosen as the class label to be associated with the column. We predict
class labels from four vocabularies - DBpedia Ontology, Freebase, WordNet, and
Yago.
      </p>
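      <p>The paper does not spell out the scoring function here, so the following is only a minimal sketch of the general idea: pool the candidate class labels returned for every cell in a column, accumulate a score per label, and keep the highest-scoring one. Summing per-cell scores is an assumption of this example, as are the function name and the sample scores; the actual scoring is defined in Algorithm 1 of the cited thesis.</p>
      <preformat>
```python
from collections import defaultdict


def choose_column_label(cell_candidates):
    """Pick a class label for one column (illustrative scoring only).

    cell_candidates holds one list per cell; each list contains
    (class_label, score) pairs assumed to come from the KB query
    for that cell. Scores are summed per label across all cells.
    """
    totals = defaultdict(float)
    for candidates in cell_candidates:
        for label, score in candidates:
            totals[label] += score
    if not totals:
        return None, []
    # Highest total first; ties are broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[0][0], ranked


# Made-up scores for three cells of a "city" column.
column = [
    [("dbpedia-owl:City", 0.9), ("dbpedia-owl:Place", 0.6)],
    [("dbpedia-owl:City", 0.8), ("dbpedia-owl:Place", 0.7)],
    [("dbpedia-owl:City", 0.4), ("dbpedia-owl:Place", 0.5)],
]
best, ranking = choose_column_label(column)
print(best)
```
      </preformat>
      <p>The ranked list is returned alongside the winner because the evaluation compares a ranked list of labels, not only the top choice.</p>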
      <p>
        Using the class labels as additional evidence, for every
table cell, the algorithm for linking table cells to entities (see
Algorithm 2 in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for the detailed algorithm) re-queries the KB.
For every table cell, the KB returns the top N possible entities.
For each of the top N entities, the algorithm generates
a feature vector consisting of the entity's KB score, the entity's
Wikipedia page length, the entity's page rank, the Levenshtein
distance between the entity and the string in the query, and
the Dice score between the entity and the string. The set
of feature vectors for each table cell is ranked using an
SVM-Rank classifier. To the highest-ranked feature vector from SVM-Rank, two more
features are added: the SVM-Rank score of the feature vector and the difference
in SVM-Rank scores between the top two feature vectors. Based on this new
feature vector, a second SVM classifier decides whether to link the table cell to
this top-ranked entity or not. If the evidence is not strong enough, it is likely
that the table cell is a new entity not present in the KB; this step is useful in
discovery of new entities in a given table. If the evidence is strong enough, the
table cell is linked to the top-ranked entity returned by SVM-Rank.
      </p>
      <table-wrap id="fig-2">
        <label>Fig. 2</label>
        <caption>
          <p>The percentage of columns with various MAP and recall scores.</p>
        </caption>
        <table>
          <thead>
            <tr><th>MAP</th><th>columns</th><th>Recall</th><th>columns</th></tr>
          </thead>
          <tbody>
            <tr><td>m = 1</td><td>11.53%</td><td>r = 1</td><td>46.15%</td></tr>
            <tr><td>0 &lt; m &lt; 1</td><td>69.23%</td><td>0 &lt; r &lt; 1</td><td>34.61%</td></tr>
            <tr><td>m = 0</td><td>19.24%</td><td>r = 0</td><td>19.24%</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <fig id="fig-3">
        <label>Fig. 3</label>
        <preformat>"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .</preformat>
      </fig>
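      <p>To make the string-similarity features concrete, the sketch below computes a Levenshtein distance and a Dice score between a cell string and a candidate entity label, assembles the five-element feature vector, and then appends the two extra features for the top-ranked candidate. Several details are assumptions of this example and not statements about T2LD: the Dice score is taken over character bigrams, the rank scores are supplied by the caller in place of a real SVM-Rank model, and the score gap is set to 0.0 when there is only one candidate.</p>
      <preformat>
```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(text: str) -> set[str]:
    """Set of lowercased character bigrams (an assumed tokenization)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over the two strings' bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x.intersection(y)) / (len(x) + len(y))


def feature_vector(cell, entity_label, kb_score, page_length, page_rank):
    """Five features per candidate entity, in the order the text lists them."""
    return [kb_score, page_length, page_rank,
            levenshtein(cell, entity_label), dice_score(cell, entity_label)]


def augment_top(ranked):
    """ranked: list of (rank_score, features) pairs, best first.

    Returns the top feature vector extended with its rank score and the
    gap to the runner-up (0.0 if there is no runner-up, by assumption).
    """
    top_score, top_features = ranked[0]
    gap = top_score - ranked[1][0] if len(ranked) > 1 else 0.0
    return top_features + [top_score, gap]


print(levenshtein("kitten", "sitting"), dice_score("night", "nacht"))
```
      </preformat>
      <p>The augmented vector returned by augment_top is what a second-stage classifier would consume when deciding whether to link the cell or treat it as a new entity.</p>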
      <p>
        We also present a preliminary approach for identifying relations between table
columns (see Algorithm 3 in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). The algorithm generates a set of candidate
relations from the relations that exist between the strings in each row of the
two columns. Each candidate relation is scored, and the relation with the highest
score is selected to represent the relation between the two columns. We have also
developed a preliminary template for representing tables as LOD in N3 (see Figure 3),
a compact and human-readable serialization of RDF.
      </p>
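      <p>One simple way to score candidate relations, shown here only as a sketch, is to let every row vote for the relations found between its two entities and to select the relation supported by the largest share of rows. The voting scheme, the function name, and the ex:capital relation in the sample data are assumptions for illustration; the scoring actually used is described in Algorithm 3 of the cited thesis.</p>
      <preformat>
```python
from collections import Counter


def choose_relation(row_relations):
    """Pick the relation supported by the largest share of rows.

    row_relations has one entry per table row: the set of relations
    assumed to be found in the KB between that row's two entities.
    Returns (relation, share_of_rows), or (None, 0.0) if nothing is found.
    """
    if not row_relations:
        return None, 0.0
    votes = Counter()
    for relations in row_relations:
        votes.update(set(relations))  # each row votes once per relation
    if not votes:
        return None, 0.0
    # Most votes wins; ties are broken alphabetically for determinism.
    best, count = min(votes.items(), key=lambda item: (-item[1], item[0]))
    return best, count / len(row_relations)


# Hypothetical lookups for five (city, state) rows; one row has no KB facts.
rows = [
    {"dbpedia-owl:largestCity"},
    {"dbpedia-owl:largestCity"},
    set(),
    {"dbpedia-owl:largestCity"},
    {"dbpedia-owl:largestCity", "ex:capital"},
]
relation, share = choose_relation(rows)
print(relation, share)
```
      </preformat>
      <p>Normalizing by the number of rows keeps the score comparable across tables of different sizes and leaves room for a minimum-support threshold.</p>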
    </sec>
    <sec id="sec-3">
      <title>Evaluation and Conclusion</title>
      <p>Our implemented prototype was evaluated against 15 tables obtained from Google
Squared, Wikipedia, and a collection of tables extracted from the Web.
Excluding the columns with numbers, the 15 tables have 52 columns and 611 entities
for evaluation of our algorithms. We used a subset of 23 columns for evaluation
of relation identification between columns.</p>
      <p>
        In the first evaluation of the algorithm for assigning class labels to columns,
we compared the ranked list of possible class labels generated by the system
against the list of possible class labels ranked by the evaluators. As shown in
Figure 2, for 80.76% of the columns the Mean Average Precision (MAP) between
the system's and the evaluators' lists is greater than 0, which indicates that there was
at least one relevant label in the top three of the system-ranked list. As also seen
in Figure 2, for 75% of the columns, the recall of the algorithm was greater
than or equal to 0.6. We also assessed whether our predicted class labels were
reasonable based on the judgment of human subjects (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). 76.92% of the
class labels predicted were considered correct by the evaluators. The accuracy in
each of the four categories is shown in Figure 4. Our algorithm for linking table cells
correctly linked 66.12% of the table cell strings. The breakdown of
accuracy based on the categories is shown in Figure 4. Our dataset had 24 new
entities, and our algorithm correctly predicted all 24
as new entities not present in the KB. We did not get encouraging results for
relationship identification, which had an accuracy of 25% (see [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for details).
      </p>
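      <p>For readers unfamiliar with the metrics, the sketch below shows one standard way to compute average precision and recall for a single column, given the system's ranked labels and the set of labels the evaluators considered relevant. The exact evaluation protocol is given in the cited thesis; the normalization by the number of relevant labels and the sample labels are assumptions of this example.</p>
      <preformat>
```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank holding a relevant label.

    Normalized by the total number of relevant labels, so relevant labels
    that the system never returns lower the score.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def recall(ranked, relevant):
    """Fraction of relevant labels that appear anywhere in the ranked list."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked)) / len(relevant)


# Hypothetical top-three system output and evaluator judgments for one column.
system_labels = ["City", "Place", "Thing"]
evaluator_labels = {"City", "Thing"}
print(average_precision(system_labels, evaluator_labels))
print(recall(system_labels, evaluator_labels))
```
      </preformat>
      <p>Under this definition, a column scores above 0 exactly when at least one evaluator-approved label appears in the system's list, which is the reading the text gives to a MAP greater than 0.</p>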
      <p>Our existing system performs reasonably well in selecting appropriate types
for columns and linking cell values to LOD entities. We have preliminary results
for identifying and encoding the relationships implicit in the columns as well.
Our current work is focused on improving relationship discovery and generating
new facts and knowledge from tables that contain entities not present in the
LOD knowledge bases.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Syed</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Creating and Exploiting a Web of Semantic Data</article-title>
          .
          <source>In: Proc. 2nd Int. Conf. on Agents and Artificial Intelligence</source>
          , Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Webtables: exploring the power of tables on the web</article-title>
          .
          <source>PVLDB</source>
          <volume>1</volume>
          (
          <year>2008</year>
          )
          <fpage>538</fpage>
          –
          <lpage>549</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Mulwad</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>T2LD - An automatic framework for extracting, interpreting and representing tables as Linked Data</article-title>
          .
          <source>Master's thesis</source>
          , U. of Maryland, Baltimore County
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sahoo</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halb</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Idehen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thibodeau</surname>
            Jr,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ezzat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A survey of current approaches for mapping of relational databases to rdf</article-title>
          .
          <source>Technical report, W3C</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parr</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>RDF123: from Spreadsheets to RDF</article-title>
          .
          <source>In: Seventh International Semantic Web Conference</source>
          , Springer (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Syed</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mulwad</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Exploiting a Web of Semantic Data for Interpreting Tables</article-title>
          .
          <source>In: Proceedings of the Second Web Science Conference</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mulwad</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Syed</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Using linked data to interpret tables</article-title>
          .
          <source>In: Proc. First Int. Workshop on Consuming Linked Data</source>
          . (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>