<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Help me describe my data: A demonstration of the Open PHACTS VoID Editor</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carole Goble</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alasdair J G Gray</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eleftherios Tatakis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Heriot-Watt University</institution>
          ,
          <addr-line>Edinburgh</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, University of Manchester</institution>
          ,
          <addr-line>Manchester</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Open PHACTS VoID Editor helps non-Semantic Web experts to create machine interpretable descriptions for their datasets. The web app guides the user, an expert in the domain of the data, through a series of questions to capture details of their dataset and then generates a VoID dataset description. The generated dataset description conforms to the Open PHACTS dataset description guidelines that ensure suitable provenance information is available about the dataset to enable its discovery and reuse. The VoID Editor is available at http://voideditor.cs.man.ac.uk. The source code can be found at https://github.com/openphacts/Void-Editor2.</p>
      </abstract>
      <kwd-group>
        <kwd>Dataset descriptions</kwd>
        <kwd>VoID</kwd>
        <kwd>Provenance</kwd>
        <kwd>Metadata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Users of systems such as the Open PHACTS Discovery Platform
(https://dev.openphacts.org/, accessed July 2014) [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] need to
know which datasets have been integrated. In the scientific domain they
particularly need to know which version of a dataset is loaded in order to correctly
interpret the results returned by the platform. To satisfy this need, the
provenance of the datasets loaded into the Open PHACTS Discovery Platform is
required. This provenance information is then available for any data returned
by the platform's API. Within the Open PHACTS project we have identified
a minimal set of metadata that should be provided to aid understanding and
reuse of the data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Additionally, we recommend that the metadata is provided
using the VoID vocabulary [
        <xref ref-type="bibr" rid="ref4 ref6">4</xref>
        ] so that the data is self-describing and machine
processable.
      </p>
      <p>
        Open PHACTS does not publish its own datasets; it integrates existing
publicly available domain data. Typically the publishers of these scientific datasets
are experts in their scientific domain, viz. chemistry or biology, but not in the
semantic web. They need to be supported in the creation of VoID descriptions
of their datasets, which may have been published in a database and converted
into RDF. A tool that hides the underlying details of the semantic web but
enables the creation of descriptions understandable to a domain expert is thus
needed.
      </p>
      <p>
        The aim of the VoID Editor (see screenshot in Figure 1) is to allow a data
publisher to create validated dataset descriptions within 30 minutes. In particular,
the data publisher does not need to read and understand the Open PHACTS
dataset description guidelines [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which provide a checklist of the RDF
properties that must and should be provided. There is also no need for the data
publisher to understand RDF or the VoID vocabulary.
      </p>
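      <p>To give a feel for the kind of output such a tool produces, the following minimal sketch renders a handful of answers as a VoID dataset description in Turtle. It is an illustration only and not the editor's implementation: the selection of properties (dcterms:title, dcterms:description, dcterms:publisher, dcterms:license, pav:version), the function names, and the example URIs are assumptions, and the authoritative checklist of required properties is the one given in the guidelines.</p>
      <preformat><![CDATA[
```python
from typing import Dict

# Namespaces of the vocabularies used in this sketch.
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
}


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def void_description(dataset_uri: str, answers: Dict[str, str]) -> str:
    """Render the user's answers as a void:Dataset description in Turtle.

    The keys of `answers` are hypothetical names chosen for this sketch.
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    statements = [
        "a void:Dataset",
        f'dcterms:title "{escape_literal(answers["title"])}"',
        f'dcterms:description "{escape_literal(answers["description"])}"',
        f'dcterms:publisher <{answers["publisher"]}>',
        f'dcterms:license <{answers["license"]}>',
        f'pav:version "{escape_literal(answers["version"])}"',
    ]
    lines.append(f"<{dataset_uri}>")
    lines.append(" ;\n".join("    " + s for s in statements) + " .")
    return "\n".join(lines)


# Example with made-up values.
example = void_description(
    "http://example.org/dataset",
    {
        "title": "Example compound data",
        "description": "An illustrative dataset.",
        "publisher": "http://example.org/organisation",
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "version": "2.1",
    },
)
print(example)
```
]]></preformat>
      <p>The point of the sketch is that the domain expert only supplies plain answers such as a title or a version string; the mapping to RDF properties stays hidden inside the tool.</p>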
      <p>The VoID Editor is a web application that guides the data provider through
a series of questions to acquire the required metadata properties. The user is first
asked for details about themselves and other individuals involved in the
authoring of the data. Core publishing metadata such as the publishing organisation
and the license are then gathered. The user is then asked for versioning
information and the expected update frequency of the data. The Sources tab helps
the user to provide details of source datasets from which their data is derived.
They can either select from the datasets already known to the Open PHACTS
Discovery Platform or enter the details manually. The list of known datasets is
populated by a call to the Open PHACTS API. The Distribution Formats tab
allows the user to describe the distributions in which the data is provided, e.g.
RDF, database dump, or CSV. The final screen allows the user to export the
RDF of their dataset description and provides a summary of any
validation errors, e.g. not supplying a license, which is a required field. Such errors
will already have been indicated by a red bar at the top of the screen containing
an error message. Note that the 'Export RDF' button is only activated when a
valid dataset description can be created, i.e. all required fields have been filled
in.</p>
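      <p>The gating of the export on required fields can be sketched as follows. This is a simplified illustration under stated assumptions, not the editor's code: the field names and the split into required and recommended fields are hypothetical stand-ins for the "must" and "should" properties of the guidelines.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List

# Hypothetical field names; the real checklist is defined by the guidelines.
REQUIRED_FIELDS = ["title", "description", "publisher", "license", "version"]
RECOMMENDED_FIELDS = ["update_frequency", "sources"]


def missing(answers: Dict[str, str], fields: List[str]) -> List[str]:
    """Return the fields that are absent or contain only whitespace."""
    return [name for name in fields if not answers.get(name, "").strip()]


def validation_errors(answers: Dict[str, str]) -> List[str]:
    """Messages for missing required fields; these block the export."""
    return [f"Missing required field: {name}"
            for name in missing(answers, REQUIRED_FIELDS)]


def validation_warnings(answers: Dict[str, str]) -> List[str]:
    """Messages for missing recommended fields; these do not block the export."""
    return [f"Missing recommended field: {name}"
            for name in missing(answers, RECOMMENDED_FIELDS)]


def can_export(answers: Dict[str, str]) -> bool:
    """Export is enabled only when no required field is missing."""
    return not validation_errors(answers)


# Example with made-up values: the license has not been supplied yet.
partial = {
    "title": "Example compound data",
    "description": "An illustrative dataset.",
    "publisher": "http://example.org/organisation",
    "version": "2.1",
}
print(validation_errors(partial))
print(can_export(partial))
```
]]></preformat>
      <p>Keeping errors and warnings separate mirrors the distinction between properties that must be provided and those that merely should be provided.</p>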
      <p>At any stage, the generated RDF dataset description may be inspected by
clicking the 'Under the Hood' button. This button can also be used to save a
partially generated description that can later be imported into the editor through
the 'Import VoID' button. The 'Under the Hood' feature is also useful for
semantic web experts to see what is being generated at any stage.</p>
    </sec>
    <sec id="sec-2">
      <title>Linkset Editor</title>
      <p>As a companion to the VoID Editor, a Linkset Editor (see screenshot in Figure 2)
has been developed. The Linkset Editor allows for the creation of descriptions
of the links between two datasets. The same interface design and framework are
used.</p>
      <p>
        The Linkset Editor reuses the first three tabs of the VoID Editor to capture
details of the authors, core publishing information, and details about versioning.
The Source/Target tab allows the user to select the pair of datasets that are
connected by the linkset. Again, the list of possible datasets is generated by
a call to the Open PHACTS API. The Link Info tab asks the user to declare
the link predicate used in the linkset and provide some justification to capture
the nature of the equality relationship encoded in the links. (For details about
linkset justifications, please see Section 5 of [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].)
      </p>
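      <p>A linkset description of this kind can be sketched with the VoID linkset terms void:Linkset, void:subjectsTarget, void:objectsTarget and void:linkPredicate. The sketch below is illustrative only: the function name and example URIs are made up, and the justification property is a hypothetical placeholder, since the property actually prescribed is defined in the guidelines rather than here.</p>
      <preformat><![CDATA[
```python
VOID = "http://rdfs.org/ns/void#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical placeholder for the justification property; see the guidelines
# for the property that should actually be used.
JUSTIFICATION_PROPERTY = "http://example.org/vocab#linksetJustification"


def linkset_description(linkset_uri: str, source_uri: str, target_uri: str,
                        link_predicate: str, justification_uri: str) -> str:
    """Render a linkset description as N-Triples (all values are URIs)."""
    triples = [
        (linkset_uri, RDF_TYPE, VOID + "Linkset"),
        (linkset_uri, VOID + "subjectsTarget", source_uri),
        (linkset_uri, VOID + "objectsTarget", target_uri),
        (linkset_uri, VOID + "linkPredicate", link_predicate),
        (linkset_uri, JUSTIFICATION_PROPERTY, justification_uri),
    ]
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Example with made-up dataset and justification URIs.
linkset = linkset_description(
    "http://example.org/linkset",
    "http://example.org/dataset-a",
    "http://example.org/dataset-b",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://example.org/justification/same-structure",
)
print(linkset)
```
]]></preformat>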
    </sec>
    <sec id="sec-3">
      <title>Implementation</title>
      <p>
        The VoID and Linkset Editors have been implemented using AngularJS as the
JavaScript framework for the web client, with a server implementation using the Jena
libraries. A user-centric approach was followed for the design and development
of the VoID Editor. A small number of data providers were consulted about the
type of tool they required with regular interviews and feedback on prototype
versions. A larger number of potential users were involved in an evaluation of
the VoID Editor. Full details can be found in [
        <xref ref-type="bibr" rid="ref5 ref7">5</xref>
        ].
      </p>
      <p>In the future we plan to investigate how the VoID Editor can generate
template descriptions that can be populated as part of the data publishing pipeline.
We also plan to look at how the editor could be adapted to other dataset
description guidelines, e.g. DCAT (http://www.w3.org/TR/vocab-dcat/, accessed July 2014)
or the W3C HCLS community profile
(http://www.w3.org/2001/sw/hcls/notes/hclsdataset/, accessed July 2014). However,
this is not a straightforward process since considerable care and attention has been paid
to the phrasing and grouping of questions to ensure a pleasant user experience.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>The research has received support from the Innovative Medicines Initiative Joint
Undertaking under grant agreement number 115191, resources of which are
composed of financial contribution from the European Union's Seventh Framework
Programme (FP7/2007-2013) and EFPIA companies' in kind contribution.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loizou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Askjaer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brenninkmeijer</surname>
            ,
            <given-names>C.Y.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burger</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chichester</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evelo</surname>
            ,
            <given-names>C.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harland</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettifer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waagmeester</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A.J.:</given-names>
          </string-name>
          <article-title>Applying linked data approaches to pharmacology: Architectural decisions and implementation</article-title>
          .
          <source>Semantic Web</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ) (
          <year>2014</year>
          )
          <fpage>101</fpage>
          –
          <lpage>113</lpage>
          . doi:10.3233/SW-2012-0088.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loizou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.J.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harland</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pettifer</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>API-centric Linked Data Integration: The Open PHACTS Discovery Platform Case Study</article-title>
          .
          <source>Journal of Web Semantics</source>
          (
          <year>2014</year>
          ) In press. doi:10.1016/j.websem.2014.03.003.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>A.J.G.</given-names>
          </string-name>
          :
          <article-title>Dataset descriptions for the Open Pharmacological Space</article-title>
          . Working draft, Open PHACTS (September
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Alexander</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hausenblas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Describing Linked Datasets with the VoID Vocabulary</article-title>
          . Note,
          <source>W3C</source>
          (March
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Tatakis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>VoID Editor v2</source>
          . Undergraduate dissertation, School of Computer Science, University of Manchester, Manchester, UK
          (April
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>4 http://www.w3.org/TR/vocab-dcat/ accessed July 2014</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>5 http://www.w3.org/2001/sw/hcls/notes/hclsdataset/ accessed July 2014</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>