<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>DataGraft beta v2: New Features and Capabilities</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikolay Nikolov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dina Sukhobok</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Dragnev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steffen Dalgard</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Elvesaeter</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bjørn Marius von Zernichow</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dumitru Roman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontotext AD</institution>
          ,
          <addr-line>Tsarigradsko Shosse 47A, 1784 Sofia</addr-line>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SINTEF</institution>
          ,
          <addr-line>Forskningsveien 1A, 0373 Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this demonstrator, we will introduce the latest features and capabilities added to DataGraft, a Data-as-a-Service platform for data preparation and knowledge graph generation. DataGraft provides data transformation, publishing and hosting capabilities that aim to simplify the data publishing lifecycle for data workers (i.e., Open Data publishers, Linked Data developers and data scientists). This demonstrator highlights the recent features added to DataGraft through an example of statistical data publication. The example goes from the raw data published on a public portal to published and accessible Linked Data, using the platform's tools and features.</p>
      </abstract>
      <kwd-group>
        <kwd>open data</kwd>
        <kwd>linked data</kwd>
        <kwd>data publication</kwd>
        <kwd>data transformation</kwd>
        <kwd>data management</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, a large number of public organisations and governments have become
increasingly committed to publishing data in reusable formats and under open licenses.
Nevertheless, in many cases such organisations choose to prioritise e-government and
open government initiatives, largely because of the high costs and domain-specific
expertise required. DataGraft<fn id="fn1"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="https://datagraft.io/">https://datagraft.io/</ext-link></p></fn> started as an initiative with the goal of alleviating such
obstacles by providing new tools and services for faster and lower-cost publication of
Open Data. The lifecycle for the creation and provisioning of (Linked) Open Data
typically involves cleaning, transforming and preparing raw data (most often in
tabular formats), mapping it to standard Linked Data vocabularies, and generating a
semantic RDF graph. The resulting semantic graph is then stored in a triple store, where
applications can reliably access and query the data. Conceptually, this process is rather
straightforward; however, such an integrated workflow is not commonly implemented.
Instead, publishing and consuming (Linked) Open Data remains a tedious task for a
variety of reasons, including:</p>
      <list list-type="order">
        <list-item><p>The technical complexity of preparing Open Data for publication is high: toolkits
are poorly integrated and require expert knowledge, particularly for publishing
Linked Data;</p></list-item>
        <list-item><p>Publishing data and providing reliable access to it carries considerable cost. In
the absence of clear monetisation channels and cost recovery incentives, the relative
investment costs can easily become excessively high for many organisations;</p></list-item>
        <list-item><p>The poorly maintained and fragmented supply of Open Data reduces data reuse:
datasets are often provided through disconnected outlets, and sequential releases of
the same dataset are often inconsistently formatted and structured.</p></list-item>
      </list>
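      <p>A minimal Python sketch can illustrate the lifecycle described above: cleaning tabular data, mapping it to a vocabulary and generating an RDF graph. It is not Grafterizer or the Grafter DSL. It uses only the standard library, and the namespaces, the column names (id, name, population) and the Municipality class are hypothetical, chosen for illustration. The output is N-Triples, a line-based RDF serialisation that triple stores can load.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import re
from urllib.parse import quote

# Hypothetical namespaces and column names, for illustration only.
BASE = "http://example.org/resource/"
EX = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def clean_row(row):
    """Trim every cell; ignore surplus cells and drop rows without an id."""
    cleaned = {
        key.strip(): (value or "").strip()
        for key, value in row.items()
        if key is not None  # csv.DictReader stores surplus cells under the key None
    }
    return cleaned if cleaned.get("id") else None


def literal(text):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def row_to_triples(row):
    """Map one cleaned row to (subject, predicate, object) N-Triples terms."""
    subject = f"<{BASE}municipality/{quote(row['id'])}>"
    triples = [(subject, f"<{RDF_TYPE}>", f"<{EX}Municipality>")]
    if row.get("name"):
        triples.append((subject, f"<{EX}name>", literal(row["name"])))
    # Conditional mapping: emit a typed population value only for plain integers.
    population = row.get("population", "")
    if re.fullmatch(r"[0-9]+", population):
        triples.append((subject, f"<{EX}population>", f'"{population}"^^<{XSD_INTEGER}>'))
    return triples


def csv_to_ntriples(text):
    """Clean, map and serialise CSV text as N-Triples (one statement per line)."""
    statements = []
    for raw_row in csv.DictReader(io.StringIO(text)):
        row = clean_row(raw_row)
        if row is None:
            continue  # skip rows that cannot be identified
        statements.extend(f"{s} {p} {o} ." for s, p, o in row_to_triples(row))
    return "\n".join(statements)
```
]]></preformat>
      <p>Cleaning drops rows without an id. A population value becomes an xsd:integer literal only when it consists solely of ASCII digits, which is a simple example of a conditional mapping. Writing one statement per line keeps the output easy to stream or append.</p>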
      <p>DataGraft features and capabilities have been developed to tackle such Open Data
challenges.</p>
    </sec>
    <sec id="sec-2">
      <title>The DataGraft Platform (beta v2)</title>
      <p>DataGraft aims to streamline the process of publishing, managing and accessing Open
Data and related artefacts in order to support data workers and the users of their data.
The beta v2 version of DataGraft supports the following features:
        <list list-type="bullet">
          <list-item><p>Asset management (metadata, public/private access management and sharing) for
files, SPARQL endpoints, data transformations and queries;</p></list-item>
          <list-item><p>Automatic, reliable storage of files and RDF databases;</p></list-item>
          <list-item><p>Data hosting and access services for all data stored on DataGraft;</p></list-item>
          <list-item><p>Interactive design of data transformations and RDF generation;</p></list-item>
          <list-item><p>Interactive visual exploration of RDF data in the published SPARQL endpoints;</p></list-item>
          <list-item><p>REST-based access to all platform assets;</p></list-item>
          <list-item><p>Layered security: encrypted user login information and SSL, asset security using
OAuth2, and API keys for semantic graph databases.</p></list-item>
        </list></p>
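      <p>Access to a published SPARQL endpoint can be sketched with the standard SPARQL 1.1 Protocol. The client sends the query as the query parameter of an HTTP GET request and asks for results in the SPARQL 1.1 Query Results JSON format. The sketch uses only the Python standard library, and the caller supplies the endpoint URL. It assumes that the graph database API key and secret are sent as HTTP Basic credentials. That is an assumption for illustration, not a documented DataGraft call.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import base64
import json
import urllib.parse
import urllib.request


def build_query_request(endpoint, query, api_key=None, api_secret=None):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", "application/sparql-results+json")
    if api_key is not None:
        # Assumption: the API key and secret are sent as HTTP Basic credentials;
        # this is not a documented DataGraft call.
        token = base64.b64encode(f"{api_key}:{api_secret}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into {variable: value} dicts.

    Unbound variables are simply absent from a row, as in the JSON format itself.
    """
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in data["results"]["bindings"]
    ]


def run_query(endpoint, query, api_key=None, api_secret=None):
    """Send the query to the endpoint and return the flattened rows (network I/O)."""
    request = build_query_request(endpoint, query, api_key, api_secret)
    with urllib.request.urlopen(request) as response:
        return bindings_to_rows(response.read().decode("utf-8"))
```
]]></preformat>
      <p>Only run_query performs network I/O. Request construction and result parsing stay separate from it, so both can be checked offline before the sketch is pointed at a real endpoint.</p>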
      <fig id="fig1">
        <label>Fig. 1</label>
        <caption>
          <p>Main components of the DataGraft platform: the DataGraft portal, Grafterizer and Ontotext S4.</p>
        </caption>
      </fig>
      <p>
The DataGraft platform consists of three main components (Fig. 1): the DataGraft portal,
Grafterizer, and a cloud-enabled semantic graph database-as-a-service based on a dedicated
instance of the Ontotext S4 platform<sup>2</sup>.</p>
      <p>Since the initial release of the DataGraft platform reported in [<xref ref-type="bibr" rid="ref1">1</xref>, <xref ref-type="bibr" rid="ref2">2</xref>], a number of
new features and improvements have been implemented as part of the new release (beta
v2):
        <list list-type="bullet">
          <list-item><p>New asset types in the catalogue (SPARQL endpoints, queries and file pages) that can
be shared between users of the platform;</p></list-item>
          <list-item><p>Improved Grafterizer capabilities, including conditional RDF mappings and support for
various types and formats of tabular input;</p></list-item>
          <list-item><p>Asset versioning, including browsing of versions and recording of provenance when
assets are copied;</p></list-item>
          <list-item><p>Visual browsing of data from SPARQL endpoints (using RDF Surveyor<sup>3</sup>);</p></list-item>
          <list-item><p>A new dashboard with more control over user assets, instant search and various filters;</p></list-item>
          <list-item><p>Improved security (authentication) using OAuth2;</p></list-item>
          <list-item><p>A REST API re-implemented using Swagger;</p></list-item>
          <list-item><p>An updated version of the semantic graph database, which now supports geospatial
queries and serialisation to GeoJSON.</p></list-item>
        </list>
      </p>
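      <p>The semantic graph database performs GeoJSON serialisation itself. To make the target format concrete, the following client-side Python sketch converts query results that contain WKT point literals into a GeoJSON (RFC 7946) FeatureCollection. It is not the database's implementation. It assumes that each result row is a dictionary from variable names to values and that the geometry variable is named wkt, a hypothetical choice. It also assumes that points are written in longitude-latitude (x y) order without an explicit CRS prefix.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumption: a 2D WKT point such as "POINT(10.75 59.91)", longitude (x) first,
# with no CRS prefix.
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def wkt_point_to_coordinates(wkt):
    """Parse a WKT POINT literal into a GeoJSON [longitude, latitude] pair."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a 2D WKT point: {wkt!r}")
    return [float(match.group(1)), float(match.group(2))]


def rows_to_feature_collection(rows, geometry_var="wkt"):
    """Build a GeoJSON FeatureCollection of points from {variable: value} rows.

    The default geometry variable name "wkt" is hypothetical.
    """
    features = []
    for row in rows:
        if geometry_var not in row:
            continue  # no geometry bound for this result row
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": wkt_point_to_coordinates(row[geometry_var]),
            },
            # Every other variable becomes a feature property.
            "properties": {k: v for k, v in row.items() if k != geometry_var},
        })
    return {"type": "FeatureCollection", "features": features}
```
]]></preformat>
      <p>Rows without a geometry binding are skipped rather than emitted with a null geometry. GeoJSON fixes coordinate order as longitude then latitude, which matches the x y order assumed for the WKT input.</p>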
      <p><bold>Related Work.</bold> In the current state of the art, there are several software tool
ecosystems that support the publication of Linked Data (data extraction, RDFisation,
storage, access). Examples include the Linked Data Stack<sup>4</sup>, LinDA<sup>5</sup> and
COMSODE<sup>6</sup> projects, which provide software tools and methodologies for Linked
Data management. These tools are functionally close to the features provided by
DataGraft, but they are not provided "as-a-service", which puts the burden of deploying and
managing the infrastructure on the data publisher. OpenRefine<sup>7</sup>, with its RDF plugin
(including powerful RDF mapping features such as RDF reconciliation), comes closest
to the data transformation approach implemented in DataGraft. Nevertheless,
OpenRefine is unsuitable for use in a service offering, and its data transformation engine
is memory-intensive with large data volumes.
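</p>
    </sec>
    <sec id="sec-3">
      <title>Demo Scenario: Statistical Data Publication and Access</title>
      <p>The usage scenario will demonstrate the core capabilities of DataGraft using statistical
data from Statistics Norway<sup>8</sup>. First, it will demonstrate how to use the DataGraft
dashboard to explore published data and user assets with the new filters and search
functionalities. We will then show the different types of assets that the platform now
supports, along with the features for browsing asset versions.</p>
      <p>Furthermore, we will demonstrate how to use the DataGraft wizards to publish Linked Data
from raw tabular data with the improved Grafterizer tool. Grafterizer allows users to
interactively specify data transformations and mappings of tabular data to RDF. We will
demonstrate how transformations can be applied on-the-fly or offline. The resulting Linked
Data will then be published in a dynamically created semantic graph database and registered
in the DataGraft catalogue as a SPARQL endpoint page (such as the one shown in Fig. 2). We
will demonstrate how the endpoint can be explored using user-defined queries (entered as free
text or chosen from pre-defined ones) or the new browsing tool integrated in DataGraft.
Finally, we will showcase how users can directly use the new interactive APIs to access data
hosted on the platform.</p>
      <p>As of September 2017, DataGraft is available via <ext-link ext-link-type="uri" xlink:href="http://datagraft.io">http://datagraft.io</ext-link>, and more
details can be found in the platform's online documentation<sup>9</sup>.</p>
      <p><bold>Acknowledgements.</bold> This work is partly funded by the EC H2020 project
proDataMarket (Grant number: 644497).</p>
      <fn-group>
        <fn id="fn2"><label>2</label><p><ext-link ext-link-type="uri" xlink:href="https://console.s4.ontotext.com">https://console.s4.ontotext.com</ext-link></p></fn>
        <fn id="fn3"><label>3</label><p><ext-link ext-link-type="uri" xlink:href="https://github.com/guiveg/rdfsurveyor">https://github.com/guiveg/rdfsurveyor</ext-link></p></fn>
        <fn id="fn4"><label>4</label><p><ext-link ext-link-type="uri" xlink:href="http://stack.linkeddata.org">http://stack.linkeddata.org</ext-link></p></fn>
        <fn id="fn5"><label>5</label><p><ext-link ext-link-type="uri" xlink:href="http://linda-project.eu">http://linda-project.eu</ext-link></p></fn>
        <fn id="fn6"><label>6</label><p><ext-link ext-link-type="uri" xlink:href="http://www.comsode.eu">http://www.comsode.eu</ext-link></p></fn>
        <fn id="fn7"><label>7</label><p><ext-link ext-link-type="uri" xlink:href="http://openrefine.org">http://openrefine.org</ext-link></p></fn>
        <fn id="fn8"><label>8</label><p><ext-link ext-link-type="uri" xlink:href="https://www.ssb.no/">https://www.ssb.no/</ext-link></p></fn>
        <fn id="fn9"><label>9</label><p><ext-link ext-link-type="uri" xlink:href="https://github.com/datagraft/datagraft-reference/blob/master/documentation.md">https://github.com/datagraft/datagraft-reference/blob/master/documentation.md</ext-link></p></fn>
      </fn-group>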
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Roman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pultier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sukhobok</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elvesaeter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimitrov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zarev</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moynihan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berlocher</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>DataGraft: One-Stop-Shop for Open Data Management</article-title>
          .
          <source>Semantic Web</source>
          , Preprint, pp.
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . DOI:
          <pub-id pub-id-type="doi">10.3233/SW-170263</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Roman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimitrov</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolov</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pultier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sukhobok</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elvesaeter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Petkov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          (
          <year>2016</year>
          , May).
          <article-title>DataGraft: Simplifying open data publishing</article-title>
          . In:
          <source>ESWC (Satellite Events) 2016</source>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>