Towards an Interface for User-Friendly
        Linked Data Generation Administration?

                Anastasia Dimou, Pieter Heyvaert, Wouter Maroy,
              Laurens De Graeve, Ruben Verborgh, and Erik Mannens

         Ghent University - iMinds, Belgium {firstname.lastname}@ugent.be


         Abstract. Linked Data generation and publication remain challenging
         and complicated, in particular for data owners who are not Semantic
         Web experts or tech-savvy. The situation deteriorates when data from
         multiple heterogeneous sources, accessed via different interfaces, is inte-
         grated, and the Linked Data generation is a long-lasting activity repeated
         periodically, often adjusted and incrementally enriched with new data.
         Therefore, we propose the rmlworkbench, a graphical user interface to
         support data owners administrating their Linked Data generation and
         publication workflow. The rmlworkbench’s underlying language is rml,
         since it allows to declaratively describe the complete Linked Data gen-
         eration workflow. Thus, any Linked Data generation workflow specified
         by a user can be exported and reused by other tools interpreting rml.

         Keywords: Linked Data Generation, Linked Data Workbench, [R2]RML


1     Introduction
Administrating the integration of the ever-increasing amounts of data from mul-
tiple sources in different formats into a common knowledge domain remains
challenging and complicated, in particular for data owners who are not Seman-
tic Web experts or tech-savvy [4]. Generating Linked Data requires dealing with
data that can originally (i) reside on diverse, distributed locations, (ii) be ap-
proached using different access interfaces, and (iii) be expressed in heterogeneous
structures and formats [3]. As the Linked Data generation becomes a long-lasting
activity, which is repeated periodically and is incrementally adjusted with new
data, administrating the different components becomes difficult.
    To minimize the effort and knowledge that data owners need to administrate
their data and the overall Linked Data generation and publication workflow,
we developed a multi-user browser application, the rmlworkbench. This demo
shows how data owners can use the rmlworkbench. Depending on their assigned
roles, data owners can view and manage different sources for retrieving raw data
and the corresponding mappings to generate Linked Data, in contrast to our
earlier work on the rmleditor [4] which only focuses on editing mapping rules. A
screencast of the rmlworkbench, is available at https://youtu.be/8UkI01nQNxc.
?
    The research activities described in this paper were funded by Ghent University, iMinds, the
    Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the
    Fund for Scientific Research-Flanders (FWO-Flanders) and the European Union.
2       Anastasia Dimou et al.

2   State of the Art
The FluidOps Information Workbench1 , Ultrwrap2 and LinDa workbench3 are
gui tools supporting data owners to generate Linked Data. However, the latter
two only support tabular data, while the former, even though it supports more
data sources, does not allow specifying different access interfaces. Most impor-
tantly though, none of them allows users to export their specified Linked Data
generation workflow in a declarative, complete and interoperable way that allows
to replicate the same Linked Data generation by other tools.
    Linked Pipes4 , and its predecessor Unified Views5 , are general-purpose tools
that allow users to administrate, execute, debug, monitor and share Linked Data
processing tasks, for smooth and efficient management. However, they are not
focused on Linked Data generation. They perform direct mappings which are
afterwards processed via sparql construct queries. Moreover, they only allow to
export the different processes descriptions, using their own custom descriptions.
    The Silk Workbench6 follows a similar approach as the rmlworkbench. Even
though it is a gui supporting users to administrate rdf dataset linking, it also re-
quires corresponding aspects to be specified. Its function relies on projects which
consist of linkage rules associated with data sources (data dumps or sparql
endpoints) constituting altogether linkage tasks, as mapping rules are associated
with data sources constituting generation tasks in the case of the rmlworkbench.

3   The RML Workbench Interface
The rmlworkbench design principles are generic, following the classical multi-
tier client-server architecture. Its underlying language to declaratively define the
Linked Data generation workflow specified by the user is rml. rml [2] is a gener-
alization of the wc recommended rrml mapping language [1], which is defined
to specify rules to generate Linked Data from data in relational databases. rml
extends rrml to also specify rules from data in any semi-structured format,
e.g., csv, xml, or json. rml was furthermore aligned with different vocabu-
laries, e.g., dcat7 , csvw8 , or Hydra9 , to specify how to access data used to
generate the desired Linked Data. The rmlworkbench considers rml as its un-
derlying language, since it is the only one able to declaratively describe the
complete Linked Data generation workflow, independently of data sources and
formats [3]. Thus, all mapping rules, including the aligned data sources descrip-
tion may be exported and re-used by other tools beyond the rmlworkbench to
replicate the generation of same Linked Data.
1
  https://www.fluidops.com/en/portfolio/information_workbench/
2
  https://capsenta.com/
3
  https://github.com/LinDA-tools/LindaWorkbench
4
  http://etl.linkedpipes.com/
5
  http://unifiedviews.eu/
6
  https://github.com/silk-framework/silk/blob/master/doc/Workbench.md
7
  https://www.w3.org/TR/vocab-dcat/
8
  https://www.w3.org/TR/tabular-metadata/
9
  https://www.hydra-cg.com/spec/latest/core/
                                                          RML Workbench          3

   The rmlworkbench consists of five panels: Access, Retrieve, Generate, Publish
and Schedule. In the remaining of this section, each panel is briefly presented.

Access panel. Users can manage their own sources, which can be accessed through
interfaces for local files, databases or Web sources. The descriptions are anno-
tated using different vocabularies, e.g., dcat, csvw or Hydra. For instance, a
user specifies a database accessed via a certain jdbc, and labels it “DB Source”
and a dataset published on a dcat catalog, which he labels “Catalog Source”.


Retrieve panel. Not all data which appear in a data source are required to gen-
erate the desired Linked Data. Distinct subsets may be considered separately for
generating different Linked Data sets. The Retrieve panel allows users to specify
which exact data is retrieved for each selection. For instance, via the Retrieve
panel, the aforementioned user specifies the exact tables, which are eventually
considered to generate certain Linked Data. The user specifies the “Singers”
and “Albums” tables of the “DB source” and labels them as “Singer data”
and “Album data” respectively. Moreover, the same user specifies and labels as
“Performance data”, among the different datasets of the “Catalog source”, the
dataset about performances, and precisely its xml distribution.

Generate panel. To generate the desired Linked Data, the users need to specify
sets of mapping rules. The rmlworkbench allows users to (i) upload a mapping
document, (ii) specify a Web source with mapping rules, or (iii) directly edit
them via its interface. Different sets of mapping rules may be associated with the
same data, generating thus different Linked Data views. Once the set of mapping
rules is associated with some raw data, the users can execute the mapping and
generate the desired Linked Data (“Execute” button, as shown in the following
figure). The dataset is then listed among the datasets available for publishing, or
the users are notified if the generation was not successful. The users can specify
mapping rules, for instance the sets of “Singer mappings” and “Performance
mappings”. Once the mapping rules are listed among the available sets, the users
can associate them with the corresponding data (“Add Logical Source” button),
in our example the “Singer data” and “Performance data”. Furthermore, the
users may desire to generate another Linked Data set with the same data. In
that case, another set of mapping rules is added, e.g., the “Person mappings”,
and the user associates it with the “Singer data” as well.
4          Anastasia Dimou et al.


Publish panel. A frequent activity after generating Linked Data is its publica-
tion. The rmlworkbench supports users to easily accomplish this activity. In our
example, the Linked Data is published via an ldf server10 . Nevertheless, the ad-
ministrator can easily configure other interfaces, for instance sparql endpoints.
The users can then choose one or more of them to publish their Linked Data.

Schedule panel. In most cases, the Linked Data generation and publication is a
recurring activity. Data owners periodically regenerate their Linked Data set to
keep it up-to-date with the original data. The rmlworkbench allows data owners
to schedule the Linked Data generation and publication activities.


    To summarize, the rmlworkbench allows data owners to specify their com-
plete Linked Data generation workflow, without being restricted by the tool. In
the future, the rmleditor [4] will be integrated with the rmlworkbench and users
can then directly use the rmleditor to edit their mapping rules.

References
1. S. Das, S. Sundara, and R. Cyganiak. R2RML: RDB to RDF Mapping Language.
   Working Group Recommendation, W3C, Sept. 2012. http://www.w3.org/TR/r2rml/ .
2. A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de
   Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous
   Data. In Workshop on Linked Data on the Web, 2014.
3. A. Dimou, R. Verborgh, M. Vander Sande, E. Mannens, and R. Van de Walle.
   Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data
   Access and Retrieval. In SEMANTiCS 2015, 2015.
4. P. Heyvaert, A. Dimou, A.-L. Herregodts, V. Ruben, S. Dimitri, M. Erik, and
   V. de Walle Rik. RMLEditor: A Graph-Based Mapping Editor for Linked Data
   Mappings. In The Semantic Web: ESWC 2016. Springer, 2016.

10
     https://github.com/LinkedDataFragments/Server.js