<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Overview of the Linked Data AppStore</article-title>
      </title-group>
      <contrib-group>
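        <contrib contrib-type="author">
          <string-name>Dumitru Roman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudia D. Pop</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roxana I. Roman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bjørn M. Mathisen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leendert Wienhofen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Brian Elvesaeter</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arne J. Berre</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>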
        <aff id="aff1">
          <label>1</label>
          <institution>SINTEF</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country>Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This demo/poster paper provides an overview of a Software-as-a-Service platform prototype for data integration on the Web - The Linked Data AppStore (LD-AppStore). It builds upon Linked Data technologies, targets data scientists/engineers and data integration application developers, and aims to provide a solution for simplifying tasks such as data transformation, querying, entity extraction, data visualization, crawling, etc. This paper focuses on the overall architecture of the LD-AppStore, basic data operations supported by the current prototype, and outlines the demonstration of the prototype.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years a significant amount of data has been made available as Open and/or
Linked Data, yet few applications make use of it.<sup>1</sup> Reasons include the technical
complexity and economic cost of integrating, publishing, and interlinking the data and of
providing reliable access to it; the lack of simplified and unified solutions for data
consumption; and the lack of tools and infrastructures where datasets and 3rd party
components can easily be made available to application developers to reuse, combine, and
develop novel data-driven applications. At present, Linked Data publishers and application
developers need to rely on generic platforms (such as the Amazon Web Services or Google
App Engine cloud providers) and build, deploy, and maintain complex Linked Data software
and data stacks from scratch. Tools addressing various aspects of the data integration
process, though available in a Linked Data context, are difficult to use for more complex,
interesting data integration tasks. As a result, data integration at large scale is costly,
complicated, and time-consuming. New ways of simplifying data integration in a Linked Data
context are needed.</p>
      <p><sup>1</sup> As of Sept 2014, for example, the official EU public open data portal (http://publicdata.eu/)
contains more than 48,000 datasets but lists fewer than 80 applications using the data. The
situation is not much different for other open data portals (see e.g. http://www.datacatalogs.org/).</p>
      <p>To simplify the data integration process and to support data publishers and application
developers, this paper provides an overview of a Software-as-a-Service platform, the
Linked Data AppStore (LD-AppStore). The platform is aimed at data scientists/engineers and
lets them use, in a simplified manner, tools/services for tasks such as data
transformation, entity extraction, data visualization, and crawling. At the same time,
developers of data integration applications can promote the use of their
tools/services by plugging them into the LD-AppStore.</p>
    </sec>
    <sec id="sec-2">
      <title>The LD-AppStore Platform Overview</title>
      <p>The LD-AppStore is meant to be a service where data engineers can access
various types of data operations, such as data transformation, storage, querying, linking,
and visualization, and apply them to their data. For each operation, they can choose among
tool/service implementations provided by developers. The LD-AppStore thus serves as a
registry of data operations and their implementations.</p>
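      <p>The registry idea described above can be illustrated with a minimal sketch. It is not the LD-AppStore code: the class names, the tool names (tool-a, tool-b, tool-c), and the localhost endpoints are all hypothetical and serve only to show how operations might be mapped to their registered implementations.</p>
      <preformat>
```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """A tool/service registered for one data operation (hypothetical record)."""
    name: str
    endpoint: str  # hypothetical URL where the tool exposes the operation's Web API


class Registry:
    """Maps each data operation to the implementations registered for it."""

    def __init__(self):
        self._by_operation = defaultdict(dict)

    def register(self, operation, implementation):
        # A developer registers a tool under the operation it implements.
        self._by_operation[operation][implementation.name] = implementation

    def implementations(self, operation):
        # A data engineer lists the names of the tools available for an operation.
        return sorted(self._by_operation[operation])

    def select(self, operation, name):
        # A data engineer picks one implementation by name.
        return self._by_operation[operation][name]


registry = Registry()
registry.register("storage", Implementation("tool-a", "http://localhost:8001/storage"))
registry.register("storage", Implementation("tool-b", "http://localhost:8002/storage"))
registry.register("visualization", Implementation("tool-c", "http://localhost:8003/vis"))
```
      </preformat>
      <p>In this sketch, registry.implementations("storage") lists both storage tools, and registry.select("storage", "tool-b") returns the record for the chosen one.</p>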
      <p>Figure 1 provides a high-level overview of the LD-AppStore architecture.</p>
      <p>The upper part of the figure depicts the components for the basic data operations currently
being considered: RDF-ization of relational databases (mapping relational tables to
RDF graphs), data visualization (visualization of RDF graphs), entity extraction
(extracting entities from various sources), data storage (storage of RDF data manipulated
in the platform), link discovery (finding links between data in RDF graphs), crawling
(searching through RDF graphs), and data streaming (querying streams of RDF data).
A set of Web APIs has been designed for these data operations. The tools/services that
implement these basic data operations are made available through the registry
functionality of the platform (lower right part of the figure). When using a specific data
operation, the data engineer may select which implementation of that operation to use.
Linked Data tool/service developers have access to the platform for
registering their implementations, i.e., implementations of the Web APIs that correspond
to the data operations. The lower left part depicts a set of data integration
workflows, configurable by the data engineers, that combine the basic data operations and
can provide further insights into the data to which they are applied.</p>
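      <p>As a rough illustration of how a workflow might chain basic data operations, the sketch below applies a list of steps in order. Everything in it is an assumption made for illustration: the function names, the "ex:" prefix, and the toy linking rule are hypothetical stand-ins and do not reflect how the LD-AppStore or any of its integrated tools work.</p>
      <preformat>
```python
def rdfize(rows):
    # Toy stand-in for DB-RDFization: turn (id, column, value) rows into triples.
    return [(f"ex:row{row_id}", f"ex:{column}", value) for row_id, column, value in rows]


def link_equal_values(triples):
    # Toy stand-in for link discovery: link subjects that share a predicate and value.
    seen = {}
    links = []
    for subject, predicate, value in triples:
        key = (predicate, value)
        if key in seen and seen[key] != subject:
            links.append((seen[key], "owl:sameAs", subject))
        else:
            seen.setdefault(key, subject)
    return triples + links


def run_workflow(steps, data):
    # Apply the configured steps in order, feeding each output to the next step.
    for step in steps:
        data = step(data)
    return data


rows = [(1, "name", "Oslo"), (2, "name", "Oslo"), (3, "name", "Bergen")]
result = run_workflow([rdfize, link_equal_values], rows)
```
      </preformat>
      <p>Because a workflow here is just an ordered list of steps, a data engineer could reconfigure it by reordering, adding, or replacing steps without changing run_workflow.</p>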
      <p>In the current design, the platform offers seven types of basic data operations
for which Web APIs have been designed: DB-RDFization (for mapping data from
relational databases to RDF); Entity Extraction (for extracting entities from various
sources); Data Visualization; Storage (for storing/querying data); Streaming (for
querying streams of data); Link Discovery (for discovering relations between different
datasets); and Web Crawling (for searching Linked Data).</p>
    </sec>
    <sec id="sec-3">
      <title>The LD-AppStore Prototype and Demonstration</title>
      <p>The current implementation of the LD-AppStore that will be demonstrated consists of
three parts: the backend infrastructure for registering applications/tools that implement
the APIs of the data operations; the graphical frontend through which data engineers can
access the data operations and the tools/services that implement them; and
a set of tools that have been modified to implement the above-mentioned APIs.
Figure 2 provides a screenshot of the LD-AppStore homepage.</p>
      <p>The platform offers the possibility to register new tools/services as implementations
of the various operations. Each registered tool has been given a programmatic Web
interface that follows the interface of its corresponding operation. This makes the platform
independent of any particular implementation, as long as each newly added
tool implements the operation’s interface. The following tools have been integrated in
the current prototype: DB2Triples<sup>2</sup> for the DB-RDFization operation; the Unstructured
Information Management Architecture (UIMA)<sup>3</sup> for the entity extraction operation;
LodLive<sup>4</sup> for the visualization operation; OpenRDF Sesame<sup>5</sup> for storage operations;
Continuous SPARQL (C-SPARQL)<sup>6</sup> for the streaming operation; the Silk framework<sup>7</sup>
for the link discovery operation; and LDSpider<sup>8</sup> for the crawling operation.</p>
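      <p>The implementation independence described above can be sketched with an abstract interface. The example is a hedged illustration only: StorageOperation, its two methods, and InMemoryStore are hypothetical names, not the Web API designed for the LD-AppStore and not the API of any integrated tool.</p>
      <preformat>
```python
from abc import ABC, abstractmethod


class StorageOperation(ABC):
    """Hypothetical interface for a storage operation (not the paper's actual API)."""

    @abstractmethod
    def store(self, triples):
        ...

    @abstractmethod
    def query(self, predicate):
        ...


class InMemoryStore(StorageOperation):
    """One possible implementation; any class with the same methods is interchangeable."""

    def __init__(self):
        self._triples = []

    def store(self, triples):
        self._triples.extend(triples)

    def query(self, predicate):
        # Return every stored triple whose predicate matches.
        return [t for t in self._triples if t[1] == predicate]


def count_matches(store, triples, predicate):
    # Client code depends only on the interface, not on a concrete tool.
    store.store(triples)
    return len(store.query(predicate))


triples = [("ex:a", "ex:name", "A"), ("ex:a", "ex:age", 3)]
matches = count_matches(InMemoryStore(), triples, "ex:name")
```
      </preformat>
      <p>A class that omits one of the abstract methods cannot be instantiated, which mirrors the requirement that each newly added tool implement the full interface of its operation.</p>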
      <p>The demonstration will show the current implementation, focusing on the overall
capabilities of the prototype, and will exemplify the registration and use of existing tools (e.g.
DB2Triples) in the LD-AppStore.</p>
      <p>Related Approaches. The LD-AppStore follows the research line of bundling
well-established technologies and tools for publishing and consuming Linked Data in order
to ease data integration on the Web. Notable approaches developed in this area include
toolchains such as the Linked Data Stack<sup>9</sup> and the LarKC platform<sup>10</sup>. Such approaches
do not provide a hosted, as-a-service solution where 3rd party tool developers can
plug in their implementations of different data operations and where data publishers can
configure and execute workflows of data operation implementations on their data,
which is what the LD-AppStore targets. DaPaaS<sup>11</sup>, COMSODE<sup>12</sup>, and LinDA<sup>13</sup> are
recent EU-funded research projects addressing the problem of simplifying access to, and
integration and usage of, open data based on Linked Data technologies, primarily
focusing on data publication and consumption aspects. These projects are in early stages of
development and their approaches are not yet entirely defined; however, ideas from the
LD-AppStore are finding traction in the DaPaaS project.</p>
      <p>Acknowledgements. This work was partly funded by the following projects: BigFut
(SINTEF internally funded project 102003299), DaPaaS (FP7 610988)<sup>14</sup>,
SmartOpenData (FP7 603824)<sup>15</sup>, and InfraRisk (FP7 603960)<sup>16</sup>.</p>
      <p><sup>2</sup> https://github.com/antidot/db2triples</p>
      <p><sup>3</sup> https://uima.apache.org/</p>
      <p><sup>4</sup> http://en.lodlive.it/</p>
      <p><sup>5</sup> http://www.openrdf.org/</p>
      <p><sup>6</sup> http://streamreasoning.org/download/</p>
      <p><sup>7</sup> http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/</p>
      <p><sup>8</sup> https://code.google.com/p/ldspider/</p>
      <p><sup>9</sup> http://stack.linkeddata.org/</p>
      <p><sup>10</sup> http://www.larkc.eu/</p>
      <p><sup>11</sup> http://dapaas.eu/</p>
      <p><sup>12</sup> http://www.comsode.eu/</p>
      <p><sup>13</sup> http://linda-project.eu/</p>
      <p><sup>14</sup> http://project.dapaas.eu/</p>
      <p><sup>15</sup> http://www.smartopendata.eu/</p>
      <p><sup>16</sup> https://www.infrarisk-fp7.eu/</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>