<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Tool for Creating and Visualizing Semantic Annotations on Relational Tables</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Suvodeep Mazumdar</string-name>
          <email>s.mazumdar@she</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ziqi Zhang</string-name>
          <email>ziqi.zhang@ntu.ac.uk</email>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Semantically annotating content from relational tables on the Web is a crucial task towards realizing the vision of the Semantic Web. However, there is a lack of open source, user-friendly tools to facilitate this. This paper describes an extension of the TableMiner+ system, an open source Semantic Table Interpretation system that automatically annotates Web tables using Linked Data in an effective and efficient approach. It adds a graphical user interface to TableMiner+, to facilitate the visualization and correction of automatically generated annotations. This makes TableMiner+ an ideal tool for the semi-automatic creation of high-quality semantic annotations on relational tables, which facilitates the publication of Linked Data on the Web.</p>
      </abstract>
      <kwd-group>
        <kwd>Web table</kwd>
        <kwd>Named Entity Disambiguation</kwd>
        <kwd>Semantic Table Interpretation</kwd>
        <kwd>table annotation</kwd>
        <kwd>Linked Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Recovering semantics from the growing amount of tabular data on the Web is
a crucial task in realizing the vision of the Semantic Web. Traditional search
engines perform poorly on such data, as they ignore the semantics of tabular
structures [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Recent years have seen increasing research on Semantic
Table Interpretation [
        <xref ref-type="bibr" rid="ref2 ref3 ref5 ref6">2, 3, 5, 6</xref>
        ], which annotates relational tables using the schema
and entities defined in a reference knowledge base. The process involves three
types of annotation tasks. Starting from a well-formed
relational table, it (1) links entity mentions in content cells to named entities; (2)
annotates columns with concepts if they contain entity mentions, or with properties
of concepts if they contain data literals; and (3) identifies the semantic relations
between columns. The resulting annotations enable semantic indexing and
search of the data, and can be used to create Linked Open Data (LOD).
      </p>
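      <p>As an illustration only, the three annotation types and their use for creating Linked Data can be sketched as follows. This is a minimal sketch and not the TableMiner+ data model: the class name TableAnnotation, the function to_triples and the example.org URIs used with it are hypothetical, and the rule for turning annotations into triples is an assumption made for this example.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class TableAnnotation:
    """Hypothetical container for the three annotation types."""
    # Task (1): (row, column) -> URI of the named entity linked to that cell
    cell_entities: dict = field(default_factory=dict)
    # Task (2): column -> URI of the concept describing that column
    column_concepts: dict = field(default_factory=dict)
    # Task (3): (subject column, other column) -> URI of the relation
    column_relations: dict = field(default_factory=dict)


def to_triples(table, ann):
    """Turn annotations into (subject, predicate, object) triples."""
    triples = []
    # Each linked cell becomes an instance of its column's concept.
    for (row, col), entity in sorted(ann.cell_entities.items()):
        concept = ann.column_concepts.get(col)
        if concept is not None:
            triples.append((entity, RDF_TYPE, concept))
    # A column relation yields one triple per row whose subject cell is linked.
    for (s_col, o_col), prop in sorted(ann.column_relations.items()):
        for row in range(len(table)):
            subject = ann.cell_entities.get((row, s_col))
            if subject is None:
                continue
            # Use the linked entity if the other cell has one, else its literal text.
            obj = ann.cell_entities.get((row, o_col), table[row][o_col])
            triples.append((subject, prop, obj))
    return triples
```
]]></preformat>
      <p>In this sketch, a row whose subject cell has no linked entity produces no triples, which mirrors the need for manual correction when an entity cannot be identified automatically.</p>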
      <p>
        Semantic Table Interpretation systems are intrinsically difficult to implement,
due to, e.g., the complexity of the inter-dependent tasks (e.g., the annotation
of a cell depends on that of the containing column and vice versa), and the
use of different knowledge bases. TableMiner+ [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is one such method. It adopts
an incremental, bootstrapping approach that starts by creating preliminary and
partial annotations of a table using `sample' data, then uses the outcome as a
`seed' to guide interpretation of the remaining content. This is followed by
a message passing process that iteratively refines the results on the entire table
to create the final optimal annotations. It has been implemented as open-source
software (as part of the STI library; see footnote 1). However, the system lacks an intuitive
user interface, which makes it difficult to use for an average person with
limited technical knowledge.
      </p>
      <p>This work implements a graphical user interface specifically for TableMiner+,
to make it an easy-to-use tool for annotating Web tables using Linked Data, and
extends it by enabling users to visualise and correct the generated annotations
and Linked Data triples. As a result, data publishers can use TableMiner+ to
transform tabular data on the Web into high-quality Linked Data, or to create
gold standards for experimental purposes. The remainder of this paper is structured
as follows. Section 2 briefly discusses related work; Section 3 gives an overview
of TableMiner+; Section 4 introduces the improvements carried out in this work;
Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recent years have seen an increasing amount of work on Semantic Table
Interpretation. Venetis et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] annotate columns in a table with semantic concepts
and identify relations between the subject column (typically containing entities
that the table is about) and other columns using a database mined with regular
lexico-syntactic patterns such as the Hearst patterns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The database records
co-occurrence statistics for each pair of values extracted by such patterns. A
maximum likelihood inference model is used to predict the best concepts and
relations from candidates using these statistics. Limaye et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] use a joint
inference model, i.e., a factor graph, to model a table and the interdependencies
between its components. Table components are modeled as variables, represented
as nodes in the graph, and the interdependencies among variables are
modeled by factors. Inference then amounts to searching for an assignment
of values to the variables that maximizes the joint probability. Mulwad et al.
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] also use joint inference with semantic message passing. TableMiner [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and
TableMiner+ [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] adopt a bootstrapping approach that starts by creating
preliminary annotations of a table using automatically selected `sample' data in the
table, followed by a message passing process that iteratively refines the
preliminary annotations to create the final optimal results. These methods differ in
terms of the inference models, features and background knowledge bases used.
As discussed before, existing tools remain difficult to use due to the lack of a
user-friendly interface.
      </p>
      <sec id="sec-2-1">
        <title>Footnote 1: https://github.com/ziqizhang/sti</title>
        <sec id="sec-2-1-1">
          <title>Overview of TableMiner+</title>
          <p>
            Figure 1 shows a high-level view of the components and workflow of TableMiner+.
We refer readers to Zhang [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] for details of the methodology. The system can
be divided into three major components. Firstly, it detects a `subject column'
(SUBJECT COLUMN DETECTION), which is the column in the table
containing the named entities that are the subjects of each row. TableMiner+ assumes
that the other columns in a relational table contain data describing the subjects. It then
identifies other columns that also contain named entities (NE-columns), and
performs column classification (assigning a URI from a knowledge base to the
column) and cell disambiguation (assigning a URI from a knowledge base to each
cell) on these as well as on the subject column. Working on one NE-column at
a time, this step is further divided into two phases. In the LEARNING phase,
the system attempts to use a subset (Sample Ranking) of rows from the
NE-column to infer a concept URI for the column (Preliminary Col. Classify. with
I-Inf). The idea is that humans usually only need to see some (and
rarely all) of the data in a column in order to classify it.
However, our understanding is likely to be biased because of this `partial'
view. We therefore call these results `preliminary'; they are optimized
later. The LEARNING phase also uses the preliminary column annotations as
input to guide Preliminary Cell Disambiguation. In this part of the process, the
concept URI assigned to the column determines the candidate named entities
for each row in that column. Next, the preliminary annotations for a column
and its content cells are optimized in the UPDATE phase. In this phase, the
system attempts to ensure that annotations on different NE-columns are consistent,
e.g., that they belong to the same domain (Compute Domain Representation). The
computation can alter the preliminary annotations in some columns or content
cells, which then causes a chain of alterations due to the interdependency of
the tasks. A semantic message passing algorithm is implemented to control this
update process until convergence. With the column and cell annotations
finalized, TableMiner+ moves on to infer relations between the subject column and
other columns (RELATION ENUMERATION + LITERAL COLUMN
ANNOTATION). In simple terms, the relation between the subject column and
another column is selected based on the relations derived on each row between
the subject entity and the data in the other column.
          </p>
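          <p>The sample-then-constrain idea of the LEARNING phase and the row-based selection of a column relation can be illustrated with a deliberately simplified sketch. It is not the TableMiner+ algorithm: taking the first few rows stands in for Sample Ranking, a plain majority vote stands in for I-Inf and for relation scoring, and the UPDATE phase is omitted. The function names and the candidates_for lookup (which returns pairs of entity URI and concept URI for a cell text) are hypothetical.</p>
          <preformat><![CDATA[
```python
from collections import Counter


def preliminary_column_concept(cells, candidates_for, sample_size=3):
    """Vote for a column concept using only the first `sample_size` cells."""
    votes = Counter()
    for text in cells[:sample_size]:
        for _entity, concept in candidates_for(text):
            votes[concept] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def preliminary_disambiguation(cells, candidates_for, concept):
    """For every cell, keep the first candidate entity of the column concept."""
    result = []
    for text in cells:
        match = next(
            (entity for entity, c in candidates_for(text) if c == concept),
            None,  # no candidate of that concept: the cell stays unannotated
        )
        result.append(match)
    return result


def select_column_relation(row_relations):
    """Pick the relation found on the most rows; None marks a row without one."""
    votes = Counter(r for r in row_relations if r is not None)
    return votes.most_common(1)[0][0] if votes else None
```
]]></preformat>
          <p>The column concept is decided from a partial view and then restricts which candidate entities are considered for every cell, including cells outside the sample. This is why a later optimization step is needed when the sample is unrepresentative.</p>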
        </sec>
        <sec id="sec-2-1-2">
          <title>Description of the TableMiner+ Application Interface</title>
          <p>
            In this section, we describe the TableMiner+ user interface and the use of the
tool through this interface. We use the implementation distributed as part of the
STI library as the basis for this work. The STI library provides an implementation
of the system introduced in Zhang [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], and a few baseline systems. The library
is implemented in Java, and uses DBpedia as the knowledge base. Currently, a
Web-based interface consisting of two components is implemented: one that lets
users define, configure and start a table annotation task; and another that
lets users visualise and correct annotation results. In both cases, interaction
takes place via a Web browser (see footnote 2).
          </p>
          <p>Starting a table annotation process</p>
          <p>The interface for starting a Semantic Table Annotation task is illustrated in
Figure 2 (see footnote 3). Users first enter the URL of a webpage containing relational tables
that are to be annotated. Upon entering the details, the user is shown a preview
of the page along with a highlighted list of tables potentially containing relational
data. The user then selects the tables they wish to annotate. They can also
configure the system to alter settings such as feature weights and knowledge
base query constraints. The user may provide an email address to receive
an automatic alert when the annotation task completes. When satisfied
with the configuration and the input, the user can click the button to start
the task, which will create annotations in JSON format. These will be interpreted
and displayed using the visualisation component described below.</p>
          <p>Visualisation and correction of annotations</p>
          <p>The JSON files are then passed to the visualisation component, which consists
of two interactive elements: an annotated table and a graph visualisation module.</p>
          <p>Footnote 2: However, it is not recommended to deploy TableMiner+ as a Web service, as it does
not support the concurrent access typically found in multi-user environments.</p>
          <p>Footnote 3: See https://github.com/ziqizhang/sti/tree/master/ui for a demo and usage instructions.</p>
          <p>The annotated table is the first point of interaction with the user, and presents
the original table, annotated with the entities, concepts and relations identified
by TableMiner+. The first step in the UI is to inspect the header cells of
the table: TableMiner+ creates a set of candidate concepts that best describe
the header and the data in the column. Each candidate concept has a score
indicating the system's confidence. This set of candidate concepts is presented
as a dropdown with the scores (Figure 3, section B). Users can select a more
appropriate annotation by clicking on the respective
concept. Concepts are further colour-coded on the basis of their scores (the highest scoring
concepts are shown in green and the lowest in red), which provides an
indication of confidence. Content cell annotations can be visualised in the
same way.</p>
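          <p>One possible way to derive such a colour coding is sketched below. The actual colour scale used by the interface is not specified here, so the linear red-to-green interpolation, the min-max normalisation and the function names score_to_rgb and colour_candidates are all assumptions made for illustration.</p>
          <preformat><![CDATA[
```python
def score_to_rgb(score, lowest, highest):
    """Map a score to an RGB colour between red (lowest) and green (highest)."""
    if highest == lowest:
        return (0, 255, 0)  # a single score level is treated as the top score
    t = (score - lowest) / (highest - lowest)
    t = max(0.0, min(1.0, t))  # clamp scores that fall outside the range
    return (round(255 * (1 - t)), round(255 * t), 0)


def colour_candidates(candidates):
    """Sort (concept, score) pairs by descending score and attach a colour."""
    if not candidates:
        return []
    scores = [score for _concept, score in candidates]
    lowest, highest = min(scores), max(scores)
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [
        (concept, score, score_to_rgb(score, lowest, highest))
        for concept, score in ranked
    ]
```
]]></preformat>
          <p>Normalising against the lowest and highest score of the same dropdown makes the colours relative to the candidates of one cell or header, not comparable across columns. An absolute scale would be an equally valid design choice.</p>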
          <p>As can also be seen from the figure, some entities have already been
recognized, while others have not. In such cases the user can provide an appropriate URI
for any missing annotation by clicking the relevant
cell, which opens a prompt for text input (Figure 4). Further SPARQL
queries can also be sent to the respective endpoints (based on the user
customisations) to identify any missing annotations.</p>
          <p>While tables provide a clean, annotated replica of the original source
document, with the added ability for users to provide their own annotations and
correct any mistakes they observe, a need may arise for greater
customisation and control of annotations. The next aspect of the UI, the graph
visualisation, addresses this need: it provides users with a means to
visualise (and annotate) possible relations among table columns, in addition to
visualising candidate annotations. It is invoked from the `inspect' button in the first cell of each
table row (Figure 3, section C). As an example, the header and its
candidate concepts have been plotted as a graph in Figure 5.</p>
          <p>Header cells are shown as nodes labelled with the header columns (0-3),
while the candidate classes are shown as nodes linked to the header nodes. The
most relevant class is shown with a strong link, while the others are linked
with dashed lines. Clicking on an individual node makes all other nodes more
transparent, and hence keeps the current node and its links in focus. Right-clicking
a concept linked by a dashed line annotates the relevant header cell with that concept,
and the change is confirmed by a strong link (here, a straight thick
line). Header cells are also linked with each other by dashed lines, which
indicate only a tentative relation. However, if TableMiner+ creates any
relations between the columns, these are shown as straight lines, as can be seen in
Figure 5.
          </p>
          <p>Described Scenario</p>
          <p>In the example shown so far, the source URL (https://en.wikipedia.org/wiki/Commedia_all%27italiana) describes movies
that belong to an Italian film genre. The extraction process in TableMiner+
annotated several cells of the table selected by the user in Figure 2 (Notable
films); however, several films could not be identified. This becomes evident when
the user visualises the annotated table (Figure 3). Missing cells can then be
manually annotated by adding the URLs of unidentified entities, if the user can provide them
(Figure 4). For example, the user observes that the cell `Boccaccio 70' could not be
identified and hence chooses to manually add the resource URL. To further
inspect the different concepts, users can click on the `inspect' button to visualise
the concepts on a node-link graph (Figure 5). While interacting with the graph,
the user notes that, since the original webpage discusses an Italian movie genre,
the `Italian Film Directors' concept would be appropriate to describe the first
column in the table. Hence the user can right-click on the concept to add a new
annotation for the column. Each row in the table can be visualised as a graph,
and hence the user can introduce row-specific annotations as well. Finally, when
all annotations are complete, the user can click on `Save changes' to submit all
annotations to TableMiner+.</p>
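          <p>The node-link view of header cells and candidate concepts described in this section can be modelled with a small data structure. The following is a minimal sketch under stated assumptions, not the interface's actual implementation: the dictionary layout, the style values `solid' and `dashed', and the functions build_links and confirm_concept are hypothetical.</p>
          <preformat><![CDATA[
```python
from itertools import combinations


def build_links(columns, candidates, relations=frozenset()):
    """Build the links of the graph view.

    columns: list of header column indices
    candidates: {column: [(concept, score), ...]}
    relations: set of (column, column) pairs for which a relation was created
    """
    links = []
    for col in columns:
        ranked = sorted(candidates.get(col, []), key=lambda c: c[1], reverse=True)
        for rank, (concept, _score) in enumerate(ranked):
            # The top-ranked concept gets a strong link, the rest are dashed.
            style = "solid" if rank == 0 else "dashed"
            links.append({"kind": "concept", "source": col,
                          "target": concept, "style": style})
    for a, b in combinations(sorted(columns), 2):
        # Header-to-header links are only indicative unless a relation exists.
        style = "solid" if (a, b) in relations else "dashed"
        links.append({"kind": "relation", "source": a,
                      "target": b, "style": style})
    return links


def confirm_concept(links, col, concept):
    """Mimic the right-click action: make `concept` the annotation of `col`."""
    own = [l for l in links if l["kind"] == "concept" and l["source"] == col]
    if concept not in {l["target"] for l in own}:
        raise ValueError("concept is not a candidate for this column")
    for link in own:
        link["style"] = "solid" if link["target"] == concept else "dashed"
```
]]></preformat>
          <p>Keeping the confirmed annotation as a property of the link, instead of deleting the alternatives, leaves the other candidates visible so that a user can revise the choice later.</p>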
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>This paper introduced a graphical user interface for TableMiner+ to facilitate the
semi-automatic creation of high-quality Linked Data and annotations on Web
tables. Future work will extend the system to support, e.g., different knowledge
bases, other algorithms, and fine-grained task definitions that enable batch processing
and zoning on tables (e.g., specific columns). Furthermore, we will explore
different visualisations and mechanisms for users to introduce new annotations, e.g.,
visualising relevant sections of ontologies while exploring table annotations. We
also plan a series of user evaluations to understand how users make
use of the interface.</p>
      <p>Acknowledgement. This work is funded by the EU FP7 WeSenseIt (grant
agreement 308429; see footnote 4) and EU Horizon 2020 Seta (grant agreement 688082; see footnote 5) projects.
We also thank the ADEQUATe (see footnote 6) project team under the lead of Dr Tomas Knap
for contributing valuable design ideas.</p>
      <sec id="sec-3-1">
        <title>Footnote 4: http://wesenseit.eu/</title>
        <p>Footnote 5: http://setamobility.eu/</p>
        <p>Footnote 6: http://www.adequate.at</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Marti A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          .
          <source>In Proceedings of the 14th Conference on Computational Linguistics - Volume 2, COLING '92</source>
          , pages
          <fpage>539</fpage>
          -
          <lpage>545</lpage>
          , Stroudsburg, PA, USA,
          <year>1992</year>
          . Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Girija</given-names>
            <surname>Limaye</surname>
          </string-name>
          , Sunita Sarawagi, and
          <string-name>
            <given-names>Soumen</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          .
          <article-title>Annotating and searching web tables using entities, types and relationships</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>3</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>1338</fpage>
          -
          <lpage>1347</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Varish</given-names>
            <surname>Mulwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tim</given-names>
            <surname>Finin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Anupam</given-names>
            <surname>Joshi</surname>
          </string-name>
          .
          <article-title>Semantic message passing for generating linked data from tables</article-title>
          .
          <source>In International Semantic Web Conference (1), Lecture Notes in Computer Science</source>
          , pages
          <fpage>363</fpage>
          -
          <lpage>378</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Petros</given-names>
            <surname>Venetis</surname>
          </string-name>
          , Alon Halevy, Jayant Madhavan, Marius Pasca, Warren Shen, Fei Wu, Gengxin Miao, and
          <string-name>
            <given-names>Chung</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Recovering semantics of tables on the web</article-title>
          .
          <source>Proceedings of VLDB Endowment</source>
          ,
          <volume>4</volume>
          (
          <issue>9</issue>
          ):
          <fpage>528</fpage>
          -
          <lpage>538</lpage>
          , June
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Jingjing</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Haixun</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhongyuan</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kenny Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <article-title>Understanding tables on the web</article-title>
          .
          <source>In Proceedings of the 31st international conference on Conceptual Modeling, ER'12</source>
          , pages
          <fpage>141</fpage>
          -
          <lpage>155</lpage>
          , Berlin, Heidelberg,
          <year>2012</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Ziqi</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Towards effective and efficient semantic table interpretation</article-title>
          .
          <source>In Proceedings of the 13th International Semantic Web Conference</source>
          , pages
          <fpage>487</fpage>
          -
          <lpage>502</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Ziqi</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Effective and efficient semantic table interpretation using TableMiner+</article-title>
          .
          <source>Semantic Web Journal</source>
          , Accepted. Tracking: 1339-2551,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>