<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Information Workbench as a Self-Service Platform for Linked Data Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Haase</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Schmidt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Schwarte</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>fluid Operations AG</institution>
          ,
          <addr-line>Walldorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Pursuing the goal to lower the entry barrier into the world of Linked Data, we present the Information Workbench as a platform to support self-service Linked Data application development. Targeting the full life-cycle of Linked Data applications, it offers support in discovery and exploration of Linked Data sources, facilitates the integration and processing of Linked Data following a Data-as-a-Service paradigm (where remote data sources can be virtually integrated through a federation layer), and eases self-service UI development based on Semantic Wiki technologies, combined with a large set of widgets for interacting with the data. Coming with all these features, the Information Workbench can be used to rapidly build industrial-strength Linked Data applications.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, a large amount of Linked Data (LOD) has been published on the
Web [2]. Growing in size and domain coverage, this data becomes more and more
interesting for building innovative applications that integrate heterogeneous data
from different sources, in order to overcome the limitations of traditional data
management systems. Apart from a new era of Web applications that exploit
the LOD corpus, this development also offers opportunities in building novel
applications for the enterprise by bringing together company-internal data with
external data, to augment and contextualize internal knowledge bases [3].</p>
      <p>The development of specific applications that benefit from Linked Data,
though, comes with a variety of new challenges. First, on the data
management side, developers are faced with a variety of new data formats and query
languages (such as RDF, OWL, and SPARQL), but also struggle with
heterogeneity at the data level (facing Linked Data available via HTTP lookups, RDF
dumps, and SPARQL endpoints) and with new database systems and tools to store,
process, and access this data. Second, once the relevant data has been
identified and integrated into the system, Linked Data applications require new data
interaction paradigms to deal with the specific challenges (and opportunities)
of the underlying data formats, such as schema flexibility and data semantics.
In particular, to leverage the benefits of Linked Data, developers must address
aspects such as dynamic discovery of data sources, seamless integration of Linked
Data from multiple sources, provenance, application development environments, and,
last but not least, end-user interfaces that implement generic interaction paradigms
with Linked Data.</p>
      <p>
        In this paper, we present the Information Workbench as a platform to support
self-service Linked Data application development. We start with a discussion of
the Information Workbench architecture in Section 2. In Section 3, we discuss
how the self-service idea empowers all stages of the application development
process, in particular (1) the deployment of the Information Workbench as a
virtual appliance based on cloud technologies, (2) the interactive exploration of
public Linked Data, (3) the ad hoc virtual integration of data sets by means of a
federation layer, and (4) self-service User Interface customization.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The Information Workbench</title>
      <list list-type="bullet">
        <list-item>
          <p>For every data source, a data provider can be instantiated. Such providers
gather information from the source, convert it into the RDF data format, and
materialize the RDF output in the central triple store. Apart from built-in
mechanisms to integrate semantic data formats such as RDF dumps or data
behind SPARQL endpoints, there are several generic providers supporting
the integration of legacy and Web data sources.</p>
        </list-item>
        <list-item>
          <p>Data can be manually loaded into the platform using predefined UIs and
APIs for data import. Supporting the integration of tabular data and
spreadsheets, the Information Workbench also offers interfaces to Google Refine.</p>
        </list-item>
        <list-item>
          <p>Finally, our platform supports virtualized data integration, where local
or public Linked Data sources (such as SPARQL endpoints) can be connected
through a federation layer without materializing the data in the central store.</p>
        </list-item>
      </list>
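      <p>As a rough illustration of the provider idea described above, the following Python sketch converts tabular input into RDF triples in N-Triples syntax. It is not the Information Workbench provider API: the function name rows_to_ntriples, the namespace http://example.org/, and the sample data are assumptions made purely for illustration.</p>
      <preformat><![CDATA[
```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not part of the platform


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text, id_column):
    """Turn each non-empty CSV cell into one triple; the id column names the subject."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        subject = "<%sresource/%s>" % (BASE, quote(row[id_column], safe=""))
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            predicate = "<%sproperty/%s>" % (BASE, quote(column, safe=""))
            lines.append('%s %s "%s" .' % (subject, predicate, escape_literal(value)))
    return lines


# Hypothetical sample input
SAMPLE = "id,name,genre\n1,Artist A,rock\n2,Artist B,\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE, "id"):
        print(line)
```
]]></preformat>
      <p>In this sketch the resulting lines could be loaded into any triple store that accepts N-Triples; how the platform itself materializes provider output is not specified here.</p>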
      <p>Once the data has been integrated, every resource in the data graph is
automatically associated with a Semantic Wiki page, making it possible to bring
together the structured data of the resources with unstructured information.
Overcoming the limitations of traditional wikis, Semantic Wikis offer built-in
mechanisms to access the underlying data, namely to extract and display
information from the underlying data graph (e.g., by means of SPARQL queries) and
to write directly to the store (e.g., by semantic links that connect resources at
data level). Accounting for the coexistence of structured and unstructured data,
the platform implements advanced search and information access paradigms,
ranging from keyword search to complex graph pattern-based search, supporting
the user in constructing expressive search queries by use of forms and
state-of-the-art semantic query auto-completion techniques.</p>
      <p>All the core system functionality is extensible and exposed to the outside by
different APIs, including Java interfaces, an integrated SPARQL endpoint, and
an interactive CLI. The Information Workbench also comes with an extensible
AJAX-based Web frontend, which supports mashups on both data and widget
level, making it possible to interlink data from multiple sources using different
visualization and exploration widgets.</p>
    </sec>
    <sec id="sec-4">
      <title>Self-Service Linked Data Application Development</title>
      <p>
        The Information Workbench supports the whole Linked Data application
development process. This process is aligned with the self-service idea, thus hiding
the technical details behind data integration and data management, as well as the
complexity of UI building. In particular, the platform offers support for (a)
self-service data integration, (b) self-service analytics (comprising aspects like
user-defined, mashed-up interactive dashboards, e.g., in the form of charts, time-series
diagrams, geo-mappings, etc.), and (c) the ability to explore data and problems
collaboratively and interactively in real time. In the following, we describe how
the Information Workbench supports self-service application development along
the whole application development process. We illustrate the process using an
example application in the media domain that intends to provide an
end-user-oriented music portal built on top of different public Linked Data sources.
      </p>
      <p>(1) Provisioning the Platform as a Service</p>
      <p>
        As an alternative to the download from the Information Workbench website,
a fully equipped Information Workbench is available as a virtual appliance for
immediate deployment from our self-service portal. We employ virtualization
techniques to enable wizard-based self-service provisioning of both the
application and user-selected data sets. Through our self-service portal, users can choose
an available Information Workbench template and deploy an instance of the
system. We offer different templates that are pre-populated with data from different
domains such as life science, governmental data, or media data. This data is
either connected via public SPARQL endpoints (possibly through a federation) or
provided as local RDF databases. Upon completion of the wizard, the Information
Workbench instance is deployed in the hosting landscape of the service provider and
is henceforth publicly accessible from any computer with Internet access.
      </p>
      <p>
        In our example application, we select a template designed for the media
domain, which is configured to use the data from the LinkedBrainz project (a
Linked Data version of MusicBrainz.org data) as well as the DBpedia dataset.
The template-based provisioning of the Information Workbench (including the
local data repositories) is completed in about 5 to 10 minutes.
      </p>
      <p>(2) Data Source Discovery</p>
      <p>
        The prerequisite of a self-service Linked Data platform is a rigorous
implementation of the Data-as-a-Service (DaaS) paradigm, where users are able to discover,
integrate, and consume available Linked Data ad hoc and on demand. DaaS
relies on (1) the availability of individual data sets that can be deployed
independently (yet may be interlinked with each other) and (2) the availability of
meta information about the content of and access mechanisms to the sources.
      </p>
      <p>Building upon these prerequisites, Figure 2 shows how the Information
Workbench implements the DaaS paradigm. At the bottom is the data layer, which
includes public data sources from different publishers as well as local data
sources (typically in the form of SPARQL endpoints). An integral component
of the Information Workbench is a metadata registry, which provides a
unified view on metadata and statistics about these data sets by integrating
information from different data registries and markets such as http://ckan.org,
http://data.gov, and others. As standard vocabularies for representing the
metadata, we use DCAT and VoID [1]. The user can explore the metadata
catalog from within the Information Workbench along all meta information that is
available, including domain and size of the data sets, origin, licenses, information
about interlinked sources, etc. Figure 3(a) shows a screenshot of the PivotViewer
interface for visual exploration of available data sources. In our example, we are
interested in an additional dataset from the media domain: the BBC Music data
set. Selecting the data set takes us to the detail page for BBC Music (cf.
Figure 3(b)), with a description, statistical data, as well as information about the
distribution of the data set.</p>
      <p>Once a relevant data source has been identified, its data can be integrated into
the system at the click of a button. Depending on the access mechanisms that
are supported for the data set, the Information Workbench offers options to
either load data into the local repository (if an RDF dump is available) or to
connect the data virtually through a federation layer (whenever there exists an
open SPARQL endpoint). The integration approach is transparent to the end
user, which means that (i) within the deployment process the user does not
need to be concerned with aspects of physical distribution, access protocols and
interfaces, underlying data models, etc., and (ii) the details of the integration are
hidden at runtime, so both local data and virtually integrated data sources can
be queried and accessed in an integrated way.</p>
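      <p>The choice between loading and federating described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the platform's implementation: the class DatasetDescription, the function integration_options, and the example URLs are hypothetical, and the two fields merely mirror the VoID properties void:dataDump and void:sparqlEndpoint.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetDescription:
    """Hypothetical, simplified stand-in for one entry of a metadata catalog."""
    title: str
    data_dump: Optional[str] = None        # cf. void:dataDump
    sparql_endpoint: Optional[str] = None  # cf. void:sparqlEndpoint


def integration_options(dataset: DatasetDescription) -> List[str]:
    """Return the integration modes that the advertised access mechanisms allow."""
    options = []
    if dataset.data_dump:
        options.append("load")      # materialize the dump in the local repository
    if dataset.sparql_endpoint:
        options.append("federate")  # connect the endpoint through the federation layer
    return options


# Hypothetical catalog entries with made-up URLs
DUMP_ONLY = DatasetDescription("Example dump", data_dump="http://example.org/dump.nt")
BOTH = DatasetDescription(
    "Example music data",
    data_dump="http://example.org/music.nt",
    sparql_endpoint="http://example.org/sparql",
)

if __name__ == "__main__":
    for entry in (DUMP_ONLY, BOTH):
        print(entry.title, integration_options(entry))
```
]]></preformat>
      <p>A data set that advertises neither access mechanism yields an empty list in this sketch, which corresponds to a source that cannot be integrated by the click of a button.</p>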
      <p>
        Virtualized data integration is realized by the use of FedX, a federation layer
for Linked Data, which we developed specifically for the transparent access to
Linked Data [4]. FedX is a practical framework that incorporates novel
optimization techniques for efficient federated query processing in a distributed setting
and supports the ad hoc integration of data sources at runtime.
      </p>
      <p>(4) Customization of the User Interface</p>
      <p>
        The Information Workbench provides a rich UI out of the box, which enables
basic interactions with the data as soon as the data has been integrated into the
platform. The basic interaction components include tabular and graph-based
visualization and exploration widgets, Semantic Wiki pages for editing and
annotating the data, as well as components for semantic and faceted search.
      </p>
      <p>The user interface can be customized using a rich pool of widgets that are
shipped with the Information Workbench, targeting different data interaction
paradigms such as semantic search, data visualization (e.g., as tabular results,
graphs, charts, timelines, maps, heatmaps, etc.), navigation and exploration of
the data (e.g., a graph-based data browser or a PivotViewer), collaborative
editing, knowledge acquisition, as well as mashups with external data sources (e.g.,
YouTube, NY Times data, Facebook, and Twitter). All widgets can be easily
embedded into Semantic Wiki pages and are specified using a declarative wiki-based
syntax. With very little effort, the standard views can be customized to create
domain- and application-specific interfaces. Figure 4 shows a screenshot of the
page of the Red Hot Chili Peppers. It is based on a wiki-based template
definition that describes how resources of type musical artist are presented. The chart
at the bottom, for instance, is generated by the following widget declaration:</p>
      <preformat>{{#widget: Chart |
  query = 'SELECT (COUNT(?release) AS ?count) ?label WHERE {
    ?? foaf:made ?release .
    ?release rdfs:label ?label .
  } GROUP BY ?label ORDER BY DESC(?count) LIMIT 30' |
  chart = 'BAR_VERTICAL' | input = 'label' | output = 'count' }}</preformat>
      <p>In addition, the page includes a description of the artist taken from a remote
Web service (Last.FM) and an embedded live video from YouTube for this artist.</p>
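      <p>To make the aggregation behind the chart widget more concrete, the following Python sketch computes the same grouping over a small in-memory list of triples. It assumes that the ?? placeholder in the widget query stands for the resource whose wiki page is being rendered; this reading, the function name releases_per_label, and the toy data are assumptions for illustration and do not describe how the platform evaluates the query.</p>
      <preformat><![CDATA[
```python
from collections import Counter

FOAF_MADE = "http://xmlns.com/foaf/0.1/made"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def releases_per_label(triples, resource, limit=30):
    """Count the releases made by `resource`, grouped by their label."""
    # Matches the pattern: ?? foaf:made ?release
    releases = {o for s, p, o in triples if s == resource and p == FOAF_MADE}
    # Matches the pattern: ?release rdfs:label ?label, then GROUP BY ?label
    counts = Counter(o for s, p, o in triples if p == RDFS_LABEL and s in releases)
    # most_common sorts by descending count and truncates, like ORDER BY DESC ... LIMIT
    return counts.most_common(limit)


# Hypothetical toy data; the URIs are made up
ARTIST = "http://example.org/artist/1"
TRIPLES = [
    (ARTIST, FOAF_MADE, "http://example.org/release/a"),
    (ARTIST, FOAF_MADE, "http://example.org/release/b"),
    (ARTIST, FOAF_MADE, "http://example.org/release/c"),
    ("http://example.org/release/a", RDFS_LABEL, "Album One"),
    ("http://example.org/release/b", RDFS_LABEL, "Album One"),
    ("http://example.org/release/c", RDFS_LABEL, "Album Two"),
    ("http://example.org/release/d", RDFS_LABEL, "Unrelated"),
]

if __name__ == "__main__":
    for label, count in releases_per_label(TRIPLES, ARTIST):
        print(label, count)
```
]]></preformat>
      <p>The label and count pairs returned here play the role of the input and output parameters of the widget declaration, that is, the categories and bar heights of the chart.</p>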
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Keith</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Describing linked datasets - on the design and usage of voiD</article-title>
          .
          <source>In Linked Data on the Web Workshop (LDOW 09), in conjunction with WWW '09</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
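          ,
          <string-name>
            <given-names>Tom</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .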
          <article-title>Linked data - the story so far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Michael</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          .
          <article-title>Exploiting linked data to build web applications</article-title>
          .
          <source>IEEE Internet Computing</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ):
          <fpage>68</fpage>
          –
          <lpage>73</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Schwarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Haase</surname>
          </string-name>
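          ,
          <string-name>
            <given-names>Katja</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Schenkel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <article-title>FedX: Optimization Techniques for Federated Query Processing on Linked Data</article-title>
          .
          <source>In ISWC</source>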
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>