<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Uniform User Interface for Editing Mapping Definitions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pieter Heyvaert</string-name>
          <email>pheyvaer.heyvaert@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
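        <contrib contrib-type="author">
          <string-name>Anastasia Dimou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Verborgh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>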
        <contrib contrib-type="author">
          <string-name>Erik Mannens</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rik Van de Walle</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ghent University - iMinds - Multimedia Lab</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Modeling domain knowledge as Linked Data is not straightforward for data publishers, because they are domain experts and not Semantic Web specialists. Most approaches that map data to its semantic representation still require users to have knowledge of the underlying implementations, as the mapping definitions have so far remained tied to their execution. Defining mapping languages makes it possible to decouple the mapping definitions from the implementation that executes them. However, user interfaces that enable domain experts to model knowledge and, thus, intuitively define such mapping definitions based on the available input sources have not yet been thoroughly investigated. This paper introduces a non-exhaustive list of desired features to be supported by such a mapping editor, independently of the underlying mapping language, and presents the rmleditor as a prototype interface that implements these features with rml as its underlying mapping language.</p>
      </abstract>
      <kwd-group>
        <kwd>Linked Data Mapping</kwd>
        <kwd>Linked Data Mapping Interface</kwd>
        <kwd>rml</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, the Web has been evolving from the Web of Documents to the Web
of Data, where data can be interlinked (Linked Data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]), according to the four
Linked Data principles [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Since the Resource Description Framework (rdf) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
allows adhering to these principles, different deployment approaches were
introduced to map data to rdf. Most approaches rely on either custom or
format-specific implementations [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In both cases, the mapping definitions are tied to
the corresponding implementation and, thus, new development cycles, executed
by developers, are required to adjust them. Additionally, data publishers, who
are domain experts, should be able to specify mapping definitions, and modify
them at any time, because they possess the domain knowledge to be modeled.
      </p>
      <p>
        Since data publishers are not developers or Semantic Web experts, the
authoring of mapping definitions should be decoupled from their execution. To this end,
research has been conducted to define mapping languages, e.g., r2rml [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
its extension rml [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ], which allow the definitions to be separated from the
execution, i.e., the implementation. r2rml is defined for mapping data in relational
databases, while rml generalizes its purpose to also support mappings for data
in different formats.
      </p>
      <p>
        Nevertheless, besides knowledge of the underlying mapping language,
manually editing and curating definitions requires a substantial amount of human
effort [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Therefore, the process of defining mappings should be facilitated.
Research efforts to improve the usability of editing mapping definitions have led to
two research topics: (i) (semi-)automatically generating mapping definitions [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
and (ii) abstracting the creation of mapping definitions from the definitions’
syntax [
        <xref ref-type="bibr" rid="ref10 ref8 ref9">8, 9, 10</xref>
        ]. The former uses the source data for the (semi-)automatic
generation of mapping definitions, by determining the data’s semantics. These
definitions can be edited afterwards by the user, if needed. The latter includes
offering a graphical user interface (gui), where the data, the mapping definitions
and the resulting rdf dataset are integrated.
      </p>
      <p>
        Most existing tools with a gui follow a step-by-step workflow [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. This way,
adjustments to the previous steps are not straightforward and, in general, data
publishers are distracted from the model overview, when they define a certain
mapping definition. Others only provide an interface for explicitly editing the
definitions, which requires knowledge of the underlying mapping language [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Moreover, some solutions cannot visualize the definitions
without loading the input data [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This makes it impossible to perform editing actions that do not
require the input data, e.g., updating a property that connects two
resources. Last but not least, most solutions refer only to a single data source
when modeling the rdf representation, while the data required to complete the
domain model might be derived from multiple input sources.
      </p>
      <p>In this paper we introduce a non-exhaustive list of desired features,
independent of any underlying mapping language. They are addressed by a uniform
mapping editor which aims to support data publishers in modeling domain-level
knowledge. We implement such an editor, the rmleditor, to offer these features
to data publishers. Currently, the rmleditor supports specifying mapping
definitions for tabular-structured data, e.g., databases or data in csv format. The
underlying mapping language is the rdf mapping language (rml).</p>
      <p>The remainder of the paper is structured as follows. We elaborate on related
work in Section 2. Next, the desired features are listed in Section 3. In Section 4,
we describe the rmleditor and we end with conclusions and future work in
Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Editing mapping definitions independently of their execution requires
abstracting them from their execution. Different mapping languages were defined in the
past for this purpose. However, in most cases they are tightly coupled to a single
format. For relational databases, besides the w3c-recommended r2rml [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
several other languages were defined [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Similarly, mapping languages were defined
to support mapping data in csv files and spreadsheets to the rdf data model.
Examples are the declarative owl-centric mapping language M<sup>2</sup>, which maps data
from spreadsheets into the Web Ontology Language (owl) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and the
sparql-based Tarql<sup>1</sup>, which maps data from csv files to rdf. Apart from rml [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ],
there are no uniform languages defined that cover different data structures and
formats.
      </p>
      <p>
        While guis are being built on top of mapping languages, such as the
fluidOps editor [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] on top of r2rml, no thorough research has been conducted
regarding the design of such guis. sheet2rdf [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] is a platform for mapping
spreadsheets to rdf. It allows data publishers to preview the source data, edit
the mapping definitions and view the resulting triples. To describe the mapping
rules, a pearl [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] document is used. Editing the mapping definitions requires
knowledge of the pearl syntax. Hence, the data publishers need to understand
the definitions’ syntax. However, they are not Semantic Web experts, who have
this understanding.
      </p>
      <p>
        Earlier efforts to provide a gui for editing mapping definitions led to tools
that allow users to include semantic annotations with the input data. An
example is OpenRefine<sup>2</sup>, which was originally focused on cleansing and transforming
data. It allows data to be interlinked with other data by using web services and
the cleaned data to be uploaded to a central database. The rdf Refine<sup>3</sup> [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] extension
is built to export data to the rdf data model. It offers a gui for specifying
the mapping definitions, using what is claimed to be an rdf graph. However, the graph is
forced into a hierarchical layout, which weakens the advantages of using a graph
representation. In the same context, Karma<sup>4</sup> [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] follows a semi-automatic approach
to map structured sources to ontologies in order to build semantic descriptions.
Like OpenRefine, its main focus is on the input data. Karma uses
Global-Local-As-View (glav) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] rules to perform the mappings, which can be exported as
r2rml or d2rq mapping definitions. Because the definitions can only
be visualized with the input data, a data-independent overview of the mapping
model is not possible.
      </p>
      <p>
        Next to visualizing the editing of mapping definitions, a number of tools
impose restrictions on the workflow followed by the user. Sengupta et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
introduced the fluidOps editor for creating and editing r2rml mappings. It
supports a step-by-step workflow that follows steps similar to creating a mapping
document, while reducing the use of r2rml’s vocabulary details. However, such
a step-by-step workflow (i) restricts data publishers’ editing options; (ii) makes
altering parameters in previous steps difficult; and (iii) detaches editing mapping
definitions from the model. Thus, data publishers might lose the global overview
of the model, since related information is separated in different steps.
      </p>
      <p>
        Pinkel et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] adapted the original fluidOps editor to overcome
flexibility limitations imposed by the step-by-step workflow to evaluate their
different hypotheses regarding ontology-driven and database-driven approaches
towards defining mappings. They concluded that an editor should support both
approaches.
      </p>
      <sec id="sec-2-1">
        <title>1 http://tarql.github.io/ 2 http://openrefine.org/ 3 http://refine.deri.ie/ 4 http://www.isi.edu/integration/karma/</title>
        <p>
          In another work of ours, we generalized these approaches to
schema-driven and data-driven, and introduced the model-driven and result-driven
approaches [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The ontology-driven approach can be applied to any schema,
namely any combination of ontologies and/or vocabularies. With the data-driven
approach, we consider any type of input and not only databases. In the case of
the model-driven approach, firstly the domain can be modeled by generating
abstract mappings, resulting in modeling the desired domain independently of the
input data. With the result-driven approach, mappings can be generated based
on the desired result of the mapping process.
        </p>
        <p>Another tool that uses a step-by-step workflow to set up the mapping process
is TopBraid Composer<sup>5</sup>. It is a Semantic Web modeling tool that facilitates the
automated conversion of spreadsheets or Unified Modeling Language (uml) files
into rdf. Similarly, a wizard application<sup>6</sup> to create mapping definitions for rml
uses a step-by-step workflow. However, in both cases, knowledge of the underlying
mapping language is required, just like with the fluidOps editor, and
the workflow follows steps similar to directly editing a mapping document.</p>
        <p>
          Last, Scharffe et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] introduced the Datalift<sup>7</sup> platform to map raw data
sources to semantically interlinked resources. The platform allows the addition
of custom modules, to enable customer-specific features. However, Datalift also
follows a step-by-step workflow, which first applies a direct-mapping approach and
then lets data publishers redefine the semantic annotations. As the modifications
are applied to the preliminary rdf model, generated after the direct mapping,
the mapping definitions are interpreted as SPARQL CONSTRUCT queries.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Interface for Modeling Domain-level Knowledge</title>
      <p>An rdf mapping editor aims to support data publishers in modeling domain-level
knowledge as Linked Data using the rdf data model, which is prevalent for this purpose.
To achieve this, an editor should support publishers in creating and editing
mapping definitions to acquire the desired rdf representation, independently of
the original state of the data. Below, we introduce a list of desired features to
be supported by such a mapping editor.</p>
      <p>Independence of mapping language. Domain experts acting as data
publishers are not Semantic Web experts. To model domain-level knowledge via
mapping definitions, an editor should allow publishers to edit them without being
confronted with the syntax of the editor’s underlying mapping language. For
instance, a visualization of the definitions can be provided for modeling
domain knowledge. The editor then interprets the visualization as statements in the
language’s syntax.</p>
      <p>5 http://www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition/
6 http://pebbie.org/mashup/rml
7 http://datalift.org/</p>
      <p>Multiple data sources. Data that is required to model domain-level knowledge
might be derived from multiple data sources. Hence, mapping editors should
support data publishers in defining mapping definitions that refer to multiple data
sources at the same time. For instance, two csv files contain information about
employees and projects. Connecting employees to their corresponding projects
is only possible when definitions can be specified across the two files.</p>
      <p>Heterogeneous data formats. Mapping editors should not restrict data
publishers from accurately modeling domain-level knowledge because of the original
data format. They should rather enable editing mapping definitions
independently of the source data format. For instance, information about the different
teams in a company might be available via an xml file. Connecting the
aforementioned employees to their corresponding team is only possible when definitions
can be specified across the xml and csv files.</p>
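      <p>In rml, the mapping language used later in this paper, definitions that span several sources and formats might look as follows. This is only a sketch under stated assumptions: the file names, the column and attribute names (id, project_id, team_id), the xml layout behind the iterator, and the ex:worksOn and ex:memberOf properties are all hypothetical and chosen for illustration.</p>
      <preformat><![CDATA[
```turtle
@base   <http://www.example.com/mapping> .
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://www.example.com/ns#> .   # hypothetical vocabulary

# Employees come from a csv file (hypothetical columns: id, project_id, team_id).
<#EmployeeMapping>
    rml:logicalSource [
        rml:source "employees.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [ rr:template "http://www.example.com/employee/{id}" ] ;
    # Connect each employee to a project described in a second csv file.
    rr:predicateObjectMap [
        rr:predicate ex:worksOn ;
        rr:objectMap [
            rr:parentTriplesMap <#ProjectMapping> ;
            rr:joinCondition [ rr:child "project_id" ; rr:parent "id" ]
        ]
    ] ;
    # Connect each employee to a team described in an xml file.
    rr:predicateObjectMap [
        rr:predicate ex:memberOf ;
        rr:objectMap [
            rr:parentTriplesMap <#TeamMapping> ;
            rr:joinCondition [ rr:child "team_id" ; rr:parent "@id" ]
        ]
    ] .

# Projects come from a second csv file (hypothetical column: id).
<#ProjectMapping>
    rml:logicalSource [
        rml:source "projects.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [ rr:template "http://www.example.com/project/{id}" ] .

# Teams come from an xml file; the iterator selects one team element per record.
<#TeamMapping>
    rml:logicalSource [
        rml:source "teams.xml" ;
        rml:referenceFormulation ql:XPath ;
        rml:iterator "/teams/team"
    ] ;
    rr:subjectMap [ rr:template "http://www.example.com/team/{@id}" ] .
```
]]></preformat>
      <p>Each Triples Map names its own Logical Source, so the join conditions are the only place where the definitions cross source boundaries.</p>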
      <p>Multiple ontologies and vocabularies. Different ontologies and/or
vocabularies, which model complementary or overlapping aspects of domain-level
knowledge, are available. Hence, mapping editors should support data publishers in
defining mapping definitions that annotate data with multiple ontologies and/or
vocabularies at the same time. For instance, publishers should be able to provide
information about people using the FOAF<sup>8</sup> ontology, and add their
corresponding bibliographic information using the Bibliographic Ontology<sup>9</sup>. The desired
result could be
<preformat>&lt;http://www.example.com/person/jd&gt; a foaf:Person;
    foaf:firstName "John";
    foaf:lastName "Doe".
&lt;http://www.example.com/journal/1&gt; a bibo:Journal;
    bibo:editor &lt;http://www.example.com/person/jd&gt;.</preformat></p>
      <p>
        Multiple alternative modeling approaches. Mapping editors should enable
and support multiple alternative modeling approaches [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and allow data
publishers to choose the most adequate one for their needs. To be more precise,
as defined by the data-driven approach, publishers might start with the input
data that is desired to be semantically annotated. For instance, with tabular
data, the columns to be mapped can be selected, followed by adding the correct
classes and properties from the used ontologies and/or vocabularies. However,
not all required columns need to be selected before the semantics can be added.
Alternatively, given a model, described by an ontology, they might want to
associate the input data with the different elements of the model, as defined by the
schema-driven approach. For instance, after modeling the domain knowledge
using an ontology, the user selects which columns, from the tabular data, represent
which parts of the knowledge that is being modeled.
      </p>
      <sec id="sec-3-1">
        <title>8 http://www.foaf-project.org/ 9 http://bibliontology.com/</title>
        <p>Non-linear workflows. Modeling domain-level knowledge involves multiple
factors – data, schema(s) and mapping definitions – that influence each
other. A linear workflow separates these factors and obscures their relationships,
by using ordered steps that need to be followed by the data publishers.
Non-linear editing allows publishers to keep an overview of the mapping model and
its relationships. For instance, when adding the semantic annotations to the
model, publishers might find that the data of a certain column of a csv file is
missing in the model. A non-linear workflow allows that data to be integrated into
the model without the need to redo (parts of) the mapping process.</p>
        <p>Independence of execution. Editing mapping definitions is a concern separate
from their execution. Thus, mapping editors should be able to export the set of
mapping definitions specified by the data publishers through the user
interface. As a result, the definitions can be further processed or executed outside the
mapping editor, which improves their interoperability and reusability. For instance, when the
definitions are created locally using an editor, they can be executed on a server
without further need of the editor.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>RMLEditor</title>
      <p>In this section, we introduce our rmleditor. First, we discuss its underlying
language rml. It is used because it is a mapping language that can express
uniform mapping definitions on top of heterogeneous data. Thus, rml can
express what data publishers define via the editor interface (see Section 4.1). Then,
we elaborate on the Graphical User Interface of the rmleditor (in Section 4.2),
based on the desired features. Next, we describe how data publishers interact
with the rmleditor (in Section 4.3), followed by how the mapping definitions
are interpreted as rml statements (in Section 4.4). Last, we elaborate on how
each desired feature is supported by the rmleditor (in Section 4.5). A screencast
demonstrating the rmleditor can be found at http://rml.io/RMLeditor.</p>
      <p>4.1 RDF Mapping Language (RML)</p>
      <p>
        rml [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ] is the rmleditor’s underlying language. It is a generalization of
r2rml [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to support mapping definitions referring to data in different formats.
rml is defined as a superset of r2rml, with the goal to extend its applicability
and broaden its scope. While r2rml is limited to homogeneous data sources (i.e.,
databases), rml supports multiple heterogeneous sources. Although it adopts
the same syntax, it can be enhanced with case-specific extensions. Next, we will
elaborate on an example. First, the data is available in the CSV file ‘person.csv’.
This results in the following Logical Source
      </p>
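      <p>A Logical Source for ‘person.csv’ can be sketched as follows. The @base and @prefix declarations, the relative path to the file, and the choice of ql:CSV as the value of rml:referenceFormulation are assumptions made to keep the fragment self-contained.</p>
      <preformat><![CDATA[
```turtle
@base   <http://www.example.com/mapping> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .

# Logical Source: where the input data lives and how values in it are referenced.
<#PersonMapping>
    rml:logicalSource [
        rml:source "person.csv" ;               # location of the input file (assumed relative path)
        rml:referenceFormulation ql:CSV         # references are csv column names (assumed value)
    ] .
```
]]></preformat>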
      <p>The location of the file is specified by rml:source. Because it is a CSV file, we
specify the corresponding reference formulation using rml:referenceFormulation. Next, the
subject of the rdf triples is determined by the Subject Map, where rr:template
specifies how the iri is constructed. In the example, the value of the
‘Identification’ column is used. The class of the subject is determined by rr:class. In the
example, the class is set to foaf:Person.
<preformat>&lt;#PersonMapping&gt;
    rr:subjectMap [
        rr:template "http://www.example.com/person/{Identification}";
        rr:class foaf:Person ].</preformat></p>
      <p>Using the column ‘Name’ (via rml:reference in an Object Map), the family
name of a person (foaf:familyName) can be added via rr:predicate to a
Predicate Object Map. This results in
<preformat>&lt;#PersonMapping&gt;
    rr:predicateObjectMap [
        rr:predicate foaf:familyName;
        rr:objectMap [
            rml:reference "Name" ]].</preformat></p>
      <p>The rmlprocessor<sup>10</sup> can be used to apply the mapping definitions to the source
data. It is integrated in the rmleditor to support the execution of definitions.</p>
      <p>rml was chosen because it can implement all desired features that are
related to the underlying mapping language. Currently, it is the only language
that offers uniform mapping definitions and natively supports multiple and/or
heterogeneous data sources. Using the interface with a mapping language that
does not support multiple and/or heterogeneous data sources would restrict data
publishers from using all their available data.</p>
      <sec id="sec-4-1">
        <title>4.2 RMLEditor’s Graphical User Interface</title>
        <p>There are four aspects that contribute when modeling knowledge: (i) the input
data, (ii) the schemas, (iii) the mapping definitions and (iv) the resulting
semantic representation. In the gui of the rmleditor, this translates into three
different panels: (i) the Input Panel, (ii) the Modeling Panel and (iii) the Results
Panel. The first panel shows a sample of each input data source, the second
panel allows data publishers to edit the mapping definitions, including the used
schemas, and the third panel displays the resulting dataset. These three panels
are aligned next to each other as shown in Figure 1. Additionally, (re)loading the
sources and editing the definitions can occur independently and interchangeably.</p>
        <p>10 https://github.com/RMLio/RML-Processor</p>
        <p>
          When creating a gui to provide a visualization for ontologies, a common
approach is the use of graphs [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The Visual Notation for owl Ontologies [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] (vowl)
defines a visual language, using graphs, for the representation of ontologies.
Nodes are used to represent the classes and the datatypes, while edges represent
the relationships between the nodes. For the visualization of the mapping
definitions in the Modeling Panel, we use graphs to represent how the rdf dataset will
be generated. Similar to ontology visualizations, the nodes represent the
mapping definitions that generate the rdf terms (uris, Blank Nodes and Literals),
while the edges represent the relationships between them, namely the properties
that associate them.
        </p>
        <p>In the Modeling Panel, the references to the data (i.e., the column names in
the case of databases, or header names in the case of csv files) are combined
with information about their class or datatype and represented by the nodes.
The edges represent the properties that connect two types of data. By doing so,
we adhere as much as possible to the vowl specification. For instance, as shown
in Figure 2, people (visualized as a node; represented by the column
‘Identification’; Figure 2a) of the class foaf:Person are connected to their family name
(visualized as a node; represented by the column ‘Name’; Figure 2b) by the
property foaf:familyName (visualized as an edge; Figure 2c). Similarly, people are
connected to their given name (using the property foaf:givenName; represented
by the column ‘Given Name’) and gender (using the property foaf:gender;
represented by the column ‘Gender’).</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.3 Interacting with the RMLEditor</title>
        <p>Having different panels in the rmleditor allows data publishers to follow different
approaches when generating mapping definitions. For instance, data publishers
can first use the Input Panel to begin the modeling. Nodes (Figures 2a and
2b) can be added by right-clicking on a column and selecting New Resource for
adding a resource or a blank node, or New Literal for adding a literal, with an
optional datatype and language annotation. The result of this action is a new
node in the Modeling Panel on which further mapping actions can be performed,
e.g., adding its class if it is a resource or its datatype if it is a literal. The node
contains a reference to the corresponding data fraction. As data publishers are
not restricted to a single source, nodes might be generated from any of the input
sources.</p>
        <p>List. 1. RML statements</p>
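        <p>The rml statements that Listing 1 refers to can be sketched by assembling the fragments of Section 4.1 into a single Triples Map. The @base and @prefix declarations, the relative path to ‘person.csv’ and the ql:CSV reference formulation are assumptions made to keep the sketch self-contained; the labels (a) to (d) follow the way Section 4.4 refers to the parts of the listing.</p>
        <preformat><![CDATA[
```turtle
@base   <http://www.example.com/mapping> .
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#PersonMapping>
    # (d) Logical Source: the input file and how it is referenced (assumed values).
    rml:logicalSource [
        rml:source "person.csv" ;
        rml:referenceFormulation ql:CSV
    ] ;
    # (a) Subject Map: one foaf:Person per row, identified by the 'Identification' column.
    rr:subjectMap [
        rr:template "http://www.example.com/person/{Identification}" ;
        rr:class foaf:Person
    ] ;
    # (c) Predicate Object Map: ties the predicate to the Object Map below.
    rr:predicateObjectMap [
        rr:predicate foaf:familyName ;
        # (b) Object Map: the literal value is taken from the 'Name' column.
        rr:objectMap [ rml:reference "Name" ]
    ] .
```
]]></preformat>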
        <p>Alternatively, data publishers can use the Modeling Panel directly to edit the
mapping definitions. Right-clicking on an empty location on that panel makes it possible
to create a new Resource Node (Figure 2a), Literal Node (Figure 2b) or Edge
(Figure 2c), by clicking on New Resource, New Literal or New Edge, respectively.
As soon as a Node is created, data publishers can associate it with the source
data and further edit its semantic annotations, namely, edit its class, datatype,
language and base iri. If an edge is created, publishers can set the predicate used
to define the relationship between two nodes. As most entities in the resulting
dataset have the same base iri, the rmleditor allows publishers to define the
base iri and, optionally, the corresponding prefix, in the Modeling Panel.</p>
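        <p>How a language or datatype annotation on a Literal Node might surface in the mapping definitions can be sketched with the standard R2RML properties rr:language and rr:datatype. That the rmleditor emits exactly these statements is an assumption; the ‘Birthdate’ column and the ex:birthDate property are hypothetical and only serve the illustration.</p>
        <preformat><![CDATA[
```turtle
@base   <http://www.example.com/mapping> .
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://www.example.com/ns#> .   # hypothetical vocabulary

<#PersonMapping>
    # Literal Node with a language annotation.
    rr:predicateObjectMap [
        rr:predicate foaf:familyName ;
        rr:objectMap [ rml:reference "Name" ; rr:language "en" ]
    ] ;
    # Literal Node with a datatype annotation (hypothetical column and property).
    rr:predicateObjectMap [
        rr:predicate ex:birthDate ;
        rr:objectMap [ rml:reference "Birthdate" ; rr:datatype xsd:date ]
    ] .
```
]]></preformat>
        <p>A single Object Map carries either a language tag or a datatype, not both, which matches the two annotations being alternatives on a literal.</p>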
        <p>The rmleditor allows data publishers to execute the mapping on either the
sample or the complete input data. For the sample input, this is done by clicking
on the Run Mapping button in the toolbar. The resulting rdf dataset is visible
in the Results Panel. A part of the mapping definitions is sufficient to have a
first idea of the resulting dataset. This makes it easy to detect errors early in
the mapping process. Because the panels are aligned next to
each other, errors can be directly addressed using the Modeling Panel. When data
publishers want to use their complete screen real estate to work on the modeling, they
can easily hide the other panels using the Hide Input Panel and Hide Results
Panel buttons in the toolbar. When the mapping definitions are complete, data
publishers are able to export the mapping definitions.</p>
        <p>Because the rmleditor can present all panels simultaneously,
the relationships between the different elements of
the panels can be visualized. To be more precise, when a node on the Modeling Panel is associated
with a data fraction, that fraction is highlighted in the Input Panel.</p>
        <p>4.4 Interpreting Graphical Mapping Definitions as RML Statements</p>
        <p>The mapping definitions modeled in the Modeling Panel are translated into
rml statements, which can then be further processed or executed using the
rmlprocessor. Below, we explain how the definitions in Figure 2 are interpreted
as the rml statements in Listing 1 (a detailed explanation of the statements can
be found in Section 4.1).</p>
        <p>First, the Resource Node (Figure 2a) is interpreted as a Subject Map
(Listing 1a) and based on this, a Triples Map (&lt;#PersonMapping&gt; in Listing 1) is
formed. The Subject Map will generate the subject of the triple. The class foaf:Person
is interpreted as a class statement, using rr:class. The iri of the resource is
created based on the value of the ‘Identification’ column of each person, as defined
by rr:template of the Subject Map.</p>
        <p>Next, the Literal Node representing the (family) name of the person
(Figure 2b) is translated to an Object Map (Listing 1b), which specifies how the
object of a triple is generated. Subsequently, the Edge that states that the
person and its name are connected with foaf:familyName (Figure 2c) is translated to
a Predicate Map. A Predicate Object Map (Listing 1c) associates each Object Map
to its Predicate Map. When a Resource Node is connected to another Resource
Node, a Referencing Object Map is formed for the current Triples Map and another
Triples Map is formed having the latter Resource Node as its Subject Map. An
independent Literal Node is not possible.</p>
        <p>Finally, as seen in Listing 1d, a Logical Source defines the source of the data
(via rml:source) and how the source is referenced (via rml:referenceFormulation)
based on the data format, which in the example’s case is csv. This information
is derived from the Input Panel.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.5 Implementing the Desired Features</title>
        <p>The gui of the rmleditor implements the desired features of Section 3. Below,
we elaborate on how each feature is facilitated. A summary of which
features are supported by which panels can be found in Table 1.</p>
        <p>Independence of mapping language. The challenge is to abstract the
mapping definitions from the syntax, namely the underlying mapping language. The
Modeling Panel achieves this by visualizing the definitions using graphs
(Section 4.2). When the definitions need to be executed on the input data, the
graphs are interpreted as rml statements (Section 4.4). Additionally, when
implementing the rmleditor, we used modular programming. Thus, a new module
can be added to use another mapping language to interpret the model.</p>
        <p>Multiple data sources. Multiple data sources are visualized in the Input Panel,
using a tab to represent each data source. A tab contains a sample of the
data. When data publishers add a new source, a tab is added to the panel.</p>
        <p>Heterogeneous data formats. Each tab of the Input Panel visualizes the
data source in a way that is best suited for the format of the data. Currently,
only tabular data, e.g., data in databases or in csv format, is supported by the
rmleditor. However, new modules can be added to offer new or other
visualizations for different formats.</p>
        <p>Multiple ontologies and vocabularies. Using the Modeling Panel, publishers
can set the class of a resource node or the datatype of a literal node. They are
not restricted in which ontologies or vocabularies they use, and these can be
changed at any time during the mapping process. The rmleditor aids publishers
in finding appropriate classes, datatypes and properties. To achieve that,
we integrated the use of Linked Open Vocabularies (lov)<sup>11</sup>. Prefix.cc<sup>12</sup> is used
to help publishers in looking up namespaces of the used schema(s).</p>
        <p>Multiple alternative modeling approaches. By using three panels, data
publishers can decide, based on their current need, which approach they prefer
to follow. By first creating nodes based on the data in the Input Panel, publishers
are able to apply the data-driven approach. By first defining a model, described
by an ontology, in the Modeling Panel, publishers can subsequently associate
the input data with the different elements of the model, as described by the
schema-driven approach.</p>
        <p>Non-linear workflows. By showing the three panels simultaneously, the editor gives data
publishers an overview of the mapping process. Thus, different elements
of the process can be edited in the corresponding panel without
retracing previously completed steps.
</p>
        <p>11 http://lov.okfn.org/dataset/lov</p>
        <p>12 http://prefix.cc/</p>
        <p>Independence of execution. At any point during the mapping process, data
publishers are able to export the mapping definitions as rml statements, as
facilitated by the Modeling Panel. Subsequently, the definitions can be edited
and/or executed without the need for the rmleditor.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we present a set of desired features to be considered during the design
and implementation of a uniform mapping editor. An editor that offers these features
enables data publishers, who are domain experts, to model knowledge derived from heterogeneous
data. The proposed editor, the rmleditor, offers these features, using rml as its
underlying mapping language. The rmleditor’s interface consists of
separate panels, which are aligned next to each other.</p>
      <p>Modeling domain-level knowledge with Linked Data is a task that should be
performed by data publishers, who are domain experts, rather than Semantic
Web specialists. Four aspects contribute to this modeling: (i) the
input data, (ii) the schemas, (iii) the mapping definitions and (iv) the resulting
semantic representation. Mapping editors and their guis should enable data
publishers to build their model on top of any of these aspects, while supporting
non-linear workflows and multiple approaches. These workflows and approaches give
publishers the freedom to use the editor as best fits the problem at hand. To
achieve that, designing an interface with multiple panels, such as the one
proposed and implemented in the rmleditor, seems to be an adequate solution.</p>
      <p>Moreover, the modeling should be detached from the syntax of the
underlying mapping language, as well as from the format and number of input sources. In the
rmleditor, the Modeling Panel, which uses a graph visualization for the mapping
definitions, enables this detachment.</p>
      <p>In the future, we will add support for non-tabular data to the
rmleditor. We will also conduct an analysis to determine which approaches for
generating mapping definitions need to be supported by the rmleditor. Finally, an evaluation
based on a user study is planned to validate the rmleditor.</p>
      <p>Acknowledgements. The described research activities were funded by Ghent
University, iMinds, the Institute for the Promotion of Innovation by Science and
Technology in Flanders (IWT), the Fund for Scientific Research Flanders (FWO
Flanders), and the European Union.</p>
    </sec>
  </body>
  <back>
  </back>
</article>