<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Semi-Automatic Example-Driven Linked Data Mapping Creation</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Department of Electronics and Information Systems, Ghent University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Linked Data can be generated by applying mapping rules on existing (semi-)structured data. The manual creation of these rules involves a costly process for users. Therefore, (semi-)automatic approaches have been developed to assist users. Although they provide promising results, in use cases where examples of the desired Linked Data are available they do not use the knowledge provided by these examples, resulting in Linked Data that might not be as desired. This in turn requires manual updates of the rules. These examples can in certain cases be easy to create and offer valuable knowledge relevant for the mapping process, such as which data corresponds to entities and attributes, how this data is annotated and modeled, and how different entities are linked to each other. In this paper, we introduce a semi-automatic approach to create rules based on examples for both the existing data and corresponding Linked Data. Furthermore, we made the approach available via the RMLEditor, making it readily accessible for users through a graphical user interface. The proposed approach provides a first attempt to generate a complete Linked Dataset based on user-provided examples, by creating an initial set of rules for the users.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In most cases, Linked Data is generated by applying mapping rules on existing
(semi-)structured data. The mapping rules state how Linked Data is generated
from (raw) data, for every graph pattern that is part of the resulting Linked
Data. However, before these rules can be applied, they need to be created. This
includes specifying (i) which data corresponds to entities and attributes, i.e., the
subjects and objects of an RDF triple; (ii) how the data is annotated and modeled,
i.e., which classes, properties, datatypes, and languages are used and how they
are related to each other; and (iii) how different entities are linked to each other.
For example, consider the raw data in Listings 1.1 and 1.2 about books and
authors, and a Linked Data example in Listing 1.3. The latter includes entities
and attributes that are constructed from data values which also appear
in the raw data, and it uses certain classes, properties, and datatypes. For every
graph pattern, at least three rules need to be defined to generate this Linked
Data, namely, for the subject, predicate, and object. For example, to generate
a triple with the subject http://www.example.com/book/0, a rule has to state
that the string http://www.example.com/book/ has to be combined with the
id of the book, which is in this case "0". This results in the creation of at least
21 mapping rules, because there are 7 triples, and each one requires at least 3
rules.</p>
      <preformat>id,title,author
0,Harry Potter and The Sorcerer's Stone,J.K. Rowling
1,Homo Deus,Yuval Noah Harari</preformat>
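      <p>As a minimal sketch of the subject rule and the rule count described above, the following Python snippet combines the base IRI with the id column of the CSV data from Listing 1.1 and computes the minimum number of rules for the seven example triples. The names subject_iri, BASE_IRI, and minimum_rules are assumptions made for illustration only; they are not part of any mapping language or tool.</p>
      <preformat><![CDATA[
```python
import csv
import io

# CSV data from Listing 1.1.
BOOKS_CSV = """id,title,author
0,Harry Potter and The Sorcerer's Stone,J.K. Rowling
1,Homo Deus,Yuval Noah Harari
"""

BASE_IRI = "http://www.example.com/book/"


def subject_iri(record):
    """Combine the base IRI with the id of a book record."""
    return BASE_IRI + record["id"]


records = list(csv.DictReader(io.StringIO(BOOKS_CSV)))
subjects = [subject_iri(record) for record in records]

# Every triple needs at least a subject, a predicate, and an object rule.
TRIPLES_IN_EXAMPLE = 7
RULES_PER_TRIPLE = 3
minimum_rules = TRIPLES_IN_EXAMPLE * RULES_PER_TRIPLE

print(subjects)
print(minimum_rules)
```
]]></preformat>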
    </sec>
    <sec id="sec-2">
      <title>Listing 1.1: CSV data about books. Listing 1.2: JSON data about authors.</title>
    </sec>
    <sec id="sec-3">
      <title>Listing 1.3: Linked Data example</title>
      <p>
        When the rules are created manually, they are prone to errors, especially
when dealing with large and complex data sources [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ], and/or multiple data
sources at the same time [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To ease this process, both semi-automatic and
automatic approaches have been the topic of research [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The former require
user interaction during the generation of the rules, while the latter do not.
Although they provide promising results, the generated Linked Data might not be
as desired in use cases where examples of the desired Linked Data are available.
This in turn requires users to manually update the rules, because these
approaches do not consider the knowledge embedded in the examples when
initially creating rules. Nevertheless, these examples offer
knowledge relevant for the mapping process.
      </p>
      <p>A Linked Data example can be used as a reference point for the creation
of rules that generate Linked Data from some other raw data. This is done
by (i) generating entities and attributes in the same way, (ii) applying the
same model and semantic annotations, and (iii) providing the same
relationships among entities. For example, users may consider existing Linked Data, the
complete set or just a sample of it, and want their data to be modeled and
annotated the same way. This is reflected in the rules that are created, as mentioned
before, based on this existing Linked Data.</p>
      <p>
        To support the aforementioned use cases, we propose a semi-automatic
approach for the example-driven creation of Linked Data mapping rules. The
example Linked Data is used to extract the necessary information to create the
corresponding mapping rules. Users provide two elements: a set of data sources
and a set of RDF triples. These two elements are used to create the rules through
the following steps: (i) the original data sources are aligned with the Linked
Data example and (ii) mapping rules are created based on this alignment and
the knowledge about the model and semantic annotations extracted from the
Linked Data example. In certain cases, manual additions might still be required,
such as data transformations on the original data [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>The remainder of the paper is structured as follows. In Section 2, we discuss
the related work. In Section 3, we introduce a running example. In Section 4, we
discuss our proposed approach. In Section 5, we elaborate on the implementation
of this approach. In Section 6, we discuss both the existing approaches and our
approach, and conclude the paper.</p>
      <sec id="sec-3-1">
        <title>2 Related Work</title>
        <p>In this section, we discuss the related work regarding mapping rules, and
mapping rule creation approaches and solutions.</p>
        <sec id="sec-3-1-1">
          <title>2.1 Background</title>
          <p>
            A mapping consists of one or more mapping rules that state how RDF terms and
triples should be generated. A mapping rule denotes how data from an original
data source is used in the RDF terms, how these terms are associated to each
other, and how these terms form RDF triples. Mapping languages, such as
R2RML [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] and RML [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], provide a specification of the syntax for
declaratively constructing mappings. This is preferred over the use of custom software and
scripts, because mapping languages provide a reusable solution, while custom
software and scripts are tied to a specific use case and/or implementation [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].
          </p>
          <p>
            RML is an extension of R2RML, the W3C-recommended language for mapping
relational databases to RDF. While R2RML is limited to relational databases,
RML applies a data-independent approach, which also allows mapping, e.g.,
JSON and CSV to RDF. For a detailed explanation of both mapping languages,
we refer to their corresponding specifications [
            <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
            ].
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>2.2 Mapping Generation Approaches</title>
          <p>
            In previous work [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ], we identified four approaches that data owners use when
they create mapping rules themselves: the data-driven, schema-driven,
model-driven, and result-driven approach. The data-driven approach is based on the
data: rules are created for the data fractions of the input data sources.
Subsequently, the rules are annotated with classes, properties, datatypes, and
languages from schemas (vocabularies and ontologies). The schema-driven
approach is based on the schemas: rules are created using the classes,
properties, and datatypes. Subsequently, the data fractions are associated with
the correct rules. The model-driven approach is based on the model of the
domain. More precisely, the entities, their attributes, and their relationships to
other entities are defined, without explicitly indicating either the schemas or
the data fractions to be used. This leads to abstract rules without data fractions
and schema elements. Subsequently, the model is instantiated by applying
adequate schema(s), and it is associated with data fractions by specifying which
fractions are associated with which parts of the model. The result-driven
approach creates rules based on the data sources and corresponding Linked Data.
The rules are created based on the complete Linked Dataset's model and used
schemas. Afterwards, the data fractions are associated with the correct rules.
          </p>
          <p>
            In other research fields, the example-driven approach has been applied
successfully [
            <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
            ]. Their application of this approach includes two high-level steps:
(i) for a sample of the input, the output, i.e. the example, is given by the user;
(ii) similar output is generated for the complete input based on the example,
with no or minimal user interaction. For example, users select an example of
the data they want on a Webpage and they get all desired data, based on that
example, from the page [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. In our case, for a sample of the existing data, i.e., the
input, a sample of the desired Linked Data may act as the
example. Linked Data for the complete existing dataset is obtained by (i)
creating the mapping rules which define how Linked Data should be generated, and
(ii) executing them and generating the Linked Data.
          </p>
          <p>The example-driven approach is related to the result-driven approach. Users
either provide the complete Linked Dataset or a sample, i.e., the example. The
result-driven approach deals with the case where the complete Linked
Dataset is available, while the example-driven approach deals with the case where only a sample is available.
Furthermore, neither the example-driven approach nor the result-driven approach
has been applied so far, to the best of our knowledge.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>2.3 (Semi-)Automatic Solutions</title>
          <p>
            As creating mappings can become a costly process when done
manually [
            <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
            ], both semi-automatic and automatic solutions have been proposed.
The former combine automatic steps with user interaction, whereas the latter
rely completely on automatic steps. For example, Jiménez-Ruiz et al. [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]
developed BootOX that creates mappings for relational databases based on the data
schema, and applies user feedback to improve the mappings. Taheriyan et al. [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]
propose an automatic solution that creates a new mapping based on previous
mappings, the raw data, and the preferred ontologies.
          </p>
          <p>
            Although these solutions show promising results, in use cases where
a Linked Data example is available, users still need to adjust the rules if the
resulting Linked Data does not match the example. This is because
these approaches do not take such use cases into account and, thus, do
not consider examples when creating mapping rules. Nevertheless, these
examples offer knowledge relevant for the mapping process, such as (i) which data
corresponds to entities and attributes, i.e., the subjects and objects of an RDF
triple; (ii) how the data is annotated and modeled, i.e., which classes, properties,
datatypes, and languages are used and how they are related to each other; and
(iii) how different entities are linked to each other. To the best of our
knowledge, no research has been conducted in applying the example-driven approach
in (semi-)automatic solutions for Linked Data generation [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. However, the
application of this approach in other fields has shown a decrease in the cost of the
process [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ].
          </p>
          <p>
            Kranzdorf et al. [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] developed an example-driven system that aids in the
creation of path expressions to extract the required text from Webpages. The
system works as follows: (i) users select an example text on the page, (ii) the
system creates an initial expression, (iii) all the text that is selected by the
expression is presented to the users, and (iv) all that text is extracted. Iteratively,
users can update or provide additional examples to improve the expressions.
Atzori and Zaniolo [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] introduce a method to query DBpedia based on an
example Wikipedia infobox: (i) users edit the information of an infobox, which
acts as an example of what the infobox of a desired Wikipedia page should look
like; (ii) the infobox is used to construct a SPARQL query; (iii) the query is
executed on DBpedia; and (iv) the resulting Wikipedia pages are presented. In
both systems, the example-driven approach results in minimal user interaction
with regard to the construction of the expressions and the queries. This
leads to a less costly process.
          </p>
          <p>As the generation of the mapping rules is similar to the aforementioned
construction, the (semi-)automatic example-driven approach is also applicable to
mapping generation. Mapping rules are created by the system based on both the
existing data and a Linked Data example, reducing the required user interaction.
This, in turn, can also lead to a less costly process.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3 Running Example</title>
        <p>We explain the different elements of the remainder of the paper through a
running example. The example contains two data sources in two different formats:
CSV (see Listing 1.1) and JSON (see Listing 1.2). The data source in the CSV
format provides records for books, including the id, title, and name of the
author. The data source in the JSON format provides records for authors, including
the id, name, country, and birth date. The Linked Data example is based on the
first record of each data source (see Listing 1.3). The data sources are interlinked
through the name of the author. This is also reflected in the reuse of the author's
IRI in the Linked Data example (see line 9).</p>
      </sec>
      <sec id="sec-3-3">
        <title>4 Approach</title>
        <p>In this section, we discuss our example-driven approach. It is based on the two
steps executed by users when manually creating mapping rules based on an
example: (i) the subjects, predicates, and objects of the RDF triples are aligned
with the data sources; and (ii) rules are created based on the alignment, and the
model, semantic annotations, and relationships between entities extracted from
the RDF triples.</p>
        <preformat>Algorithm 1 Alignment between RDF example and data sources
for entity ∈ example do
  objects ← example.getTripleObjects(entity)
  for object ∈ objects do
    for dataSource ∈ dataSources do
      align(object, dataSource)
    end for
  end for
  if isIRI(entity) then
    identifier ← getIdentifierIRI(entity)
    align(identifier, dataSource)
  end if
  selectBestDataSource(dataSources)
end for
for (entity1, entity2) ∈ example do
  if isTripleWith(entity1, entity2) then
    findCondition(entity1, entity2)
  end if
end for</preformat>
        <sec id="sec-3-3-1">
          <title>4.1 Data Source Alignment</title>
          <p>
            The triples in a Linked Data example contain data values that stem from
the existing data. To create mapping rules that refer to the correct data
values, our approach needs to determine which data values in the example align
with which data values in the data sources. The steps of this alignment can
be found in Algorithm 1. For each entity, triples with the entity as subject are
grouped together. Per triple in a group, the correct reference to a data fraction
in each data source is determined for the object, if possible. To this end, the example
value (as found in the RDF example) is compared with the data values of every
data fraction of the data source. For every unique entity there is a unique IRI. A
common practice is to construct IRIs by using a base IRI [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ] to which an
entity-specific value is appended. For example, http://www.example.com/book/0 and
http://www.example.com/book/1 are IRIs for books. Both IRIs start with
http://www.example.com/book/ and the id of the book, i.e., "0" and "1", is
appended. This knowledge needs to be added to the mapping rules to generate the
correct subjects for the triples. Therefore, we analyze the IRIs and extract the
document or fragment identifier, i.e., the string after the last # or /. Next, the
identifier is aligned with the data sources.
          </p>
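          <p>The alignment of literal values and IRI identifiers can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the approach: the helper names identifier_of, align, and align_with_sources are hypothetical, values are compared by exact string equality only, and the author records are an assumed flat representation of the JSON data, since Listing 1.2 is not reproduced here.</p>
          <preformat><![CDATA[
```python
import csv
import io
import re

# CSV data from Listing 1.1.
BOOKS_CSV = """id,title,author
0,Harry Potter and The Sorcerer's Stone,J.K. Rowling
1,Homo Deus,Yuval Noah Harari
"""

# Assumed flat representation of the author records of the JSON data source.
AUTHORS = [
    {"id": "jkr", "name": "J.K. Rowling", "country": "UK", "birthdate": "1965-07-21"},
    {"id": "ynh", "name": "Yuval Noah Harari", "country": "Israel", "birthdate": "1976-04-24"},
]


def identifier_of(iri):
    """Return the string after the last '#' or '/' of an IRI."""
    return re.split(r"[#/]", iri)[-1]


def align(value, records):
    """Return the sorted references (column or attribute names) that hold the value."""
    return sorted({ref for record in records for ref, cell in record.items() if cell == value})


def align_with_sources(value, sources):
    """Align one example value with every data source; sources without a match are left out."""
    result = {}
    for name, records in sources.items():
        references = align(value, records)
        if references:
            result[name] = references
    return result


SOURCES = {
    "csv": list(csv.DictReader(io.StringIO(BOOKS_CSV))),
    "json": AUTHORS,
}

book_id = identifier_of("http://www.example.com/book/0")
print(align_with_sources("J.K. Rowling", SOURCES))
print(align_with_sources(book_id, SOURCES))
```
]]></preformat>
          <p>In this sketch, the author name is found in both data sources, whereas the book identifier is only found in the id column of the CSV data. Choosing between competing data sources is a separate step.</p>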
          <p>[Figure 1: Graph visualization of the RDF triples of the Linked Data example. Recoverable labels for group (b): entity author:jkr of class foaf:Person; foaf:name "J.K. Rowling" (csv, json); foaf:country "UK" (json); foaf:birthdate "1965-07-21" with datatype xsd:date (json); incoming edge schema:author (csv).]</p>
          <p>For each group, the data source for which the most references could
be found is selected. If several data sources yield the same number of references, an arbitrary choice is
made. If two entities are connected, the conditions under which the entities are
related to each other can be determined. In our approach, we consider every value
from the selected data source for each entity and search for matches between
these values.</p>
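          <p>The selection of a data source per group and the search for a condition between two connected entities can be sketched as follows. This is a minimal illustration, assuming that the alignment result is available as a dictionary; the names ALIGNMENT_AUTHOR, select_best_data_source, and find_condition are hypothetical, and matches are again based on exact string equality.</p>
          <preformat><![CDATA[
```python
from collections import Counter

# Assumed alignment result for the author group of the running example:
# every aligned node is listed with the reference found per data source.
ALIGNMENT_AUTHOR = {
    "jkr": {"json": "id"},
    "J.K. Rowling": {"csv": "author", "json": "name"},
    "UK": {"json": "country"},
    "1965-07-21": {"json": "birthdate"},
}


def select_best_data_source(alignment):
    """Pick the data source with the most references; ties are broken arbitrarily."""
    counts = Counter(source for refs in alignment.values() for source in refs)
    return counts.most_common(1)[0][0]


def find_condition(child_record, parent_record):
    """Return (child reference, parent reference) pairs that share a value."""
    return [
        (child_ref, parent_ref)
        for child_ref, child_value in child_record.items()
        for parent_ref, parent_value in parent_record.items()
        if child_value == parent_value
    ]


# Records of the two connected entities, taken from their selected data sources.
book = {"id": "0", "title": "Harry Potter and The Sorcerer's Stone", "author": "J.K. Rowling"}
author = {"id": "jkr", "name": "J.K. Rowling", "country": "UK", "birthdate": "1965-07-21"}

best = select_best_data_source(ALIGNMENT_AUTHOR)
conditions = find_condition(book, author)
print(best)
print(conditions)
```
]]></preformat>
          <p>For the running example, the JSON data source is selected for the author, and the only matching pair of references is the CSV column author and the JSON attribute name.</p>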
          <p>We apply these steps to the Linked Data example (see Listing 1.3) and data
values extracted from the two data sources (see Listings 1.1 and 1.2):</p>
          <p>1. There are two groups, because there are two entities (see Figure 1). One
group includes the triples regarding the book book:0 (a) and the other
includes the triples regarding the author author:jkr (b). In Figure 1 a graph
visualization is used to represent the RDF triples from the example. IRIs and
blank nodes, i.e., entities, are represented as circular nodes and include the
class of an entity. Literals are represented as rectangular nodes and include
the datatype or the language of a literal value. Edges are used to represent
the relationships connecting the subjects and objects, and include the used
predicates.</p>
          <p>2. Within each group, we perform the alignment between the example data and
the data sources. For group (a), the literal values align with the CSV data
source, because "Harry Potter and The Sorcerer's Stone" can be found in
title. For group (b), the literal values "UK" and "1965-07-21" are aligned
with the JSON data source via country and birthdate. The literal value
"J.K. Rowling" is aligned with both the CSV and JSON data source, because
"J.K. Rowling" can be found in the column author and attribute name,
respectively.</p>
          <p>3. For the IRIs of the two entities, we consider the part after the last / to determine
the corresponding reference in the data sources. The value for book:0 is
"0", which aligns with id of the CSV source, and for author:jkr it is "jkr",
which aligns with id of the JSON source.</p>
          <p>4. We select the appropriate data source (and references) for each group. For
group (a), only data values from the CSV data source are used, so we choose
this data source. For group (b), both data sources are used; however,
the JSON data source aligns with four nodes, while the CSV data
source aligns with only one node. Therefore, we choose the JSON
data source over the CSV data source for group (b).</p>
          <p>5. The book and the author of the RDF example have the same value for the
column author and the attribute name. These references can be used to
determine whether a pair of authors and books are related.</p>
          <preformat>Algorithm 2 Creation of mapping rules based on alignment and extracted knowledge
for entity ∈ example do
  triples ← example.getTriples(entity)
  triplesMap ← generateTriplesMap()
  triplesMap.generateLogicalSource(entity)
  triplesMap.generateSubjectMap(entity)
  for triple ∈ triples do
    predicateObjectMap ← triplesMap.generatePredicateObjectMap(triple.predicate)
    predicateObjectMap.generateObjectMap(triple.object)
  end for
end for
for (entity1, entity2) ∈ example do
  if isTripleBetween(entity1, entity2) then
    triple ← example.getTriple(entity1, entity2)
    pom ← triplesMap.generatePredicateObjectMap(triple.predicate)
    pom.generateReferencingObjectMap(entity1, entity2)
  end if
end for</preformat>
        </sec>
        <sec id="sec-3-3-2">
          <title>4.2 Mapping Creation</title>
          <p>Once the Linked Data example is aligned with the data sources, a mapping is
created. During the creation of the rules, additional knowledge is required that can
be extracted from the Linked Data example. More specifically, this knowledge
includes (i) how the data is annotated and modeled, i.e., which classes, properties,
datatypes, and languages are used and how they are related to each other; and
(ii) how different entities are linked to each other.</p>
          <p>The previous step in our approach was independent of the mapping language.
However, to create an actual mapping that can be used to generate Linked Data, we
need to rely on a specific mapping language. In our approach we use RML, as
it allows extracting data values from multiple, heterogeneous data sources and
semantically annotating them (see Section 2).</p>
          <p>Again, triples are grouped by entity. The steps that are performed for each
group can be found in Algorithm 2. For every group, a Triples Map is
created. Each Triples Map contains the details about how subjects, predicates, and
objects are generated for a certain type of entity. Each Triples Map requires a
Logical Source, which states which data source is used to create the different
subjects, predicates, and objects. In the case of RML, an iterator is required if
multiple entities need to be mapped to Linked Data. In our approach, the iterator
can be determined by taking the common path of the references for each group.
Each Triples Map needs a Subject Map that explains how subjects of triples are
generated. A Subject Map includes the classes and the template to generate the
correct IRIs when IRIs are required instead of blank nodes. The classes can be
extracted from the triples and the template is available from the alignment. For
every combination of predicates and objects a Predicate Object Map is needed.
The required information is available in a triple's predicate and object. The
predicate of the triple is added to the Predicate Object Map. If an object of a triple is a
literal, it determines the details of an Object Map of the Predicate Object Map:
(i) the reference found via the alignment is added via rml:reference; (ii) if the
literal has a datatype, it is added via rr:datatype; and (iii) if the literal has a
language, it is added via rr:language. If the object refers to an entity, a
Referencing Object Map is used instead of an Object Map. This map refers to the
Triples Map of the other entity. Additionally, join conditions are added, if found
during the alignment. A join condition states the condition under which there is
a relationship between two entities.</p>
          <preformat> 1 &lt;#TM_B&gt; rml:logicalSource &lt;#LS_B&gt;;
 2   rr:subjectMap &lt;#SM_B&gt;;
 3   rr:predicateObjectMap &lt;#POM_B1&gt;, &lt;#POM_B2&gt;.
 4
 5 &lt;#LS_B&gt; rml:source "books.csv"; rml:referenceFormulation ql:CSV.
 6
 7 &lt;#SM_B&gt; rr:class schema:Book; rr:template "http://www.example.com/book/{id}".
 8
 9 &lt;#POM_B1&gt; rr:predicate schema:title;
10   rr:objectMap [ rml:reference "title"; rr:language "en" ].
11
12 &lt;#POM_B2&gt; rr:predicate schema:author;
13   rr:objectMap [
14     rr:parentTriplesMap &lt;#TM_A&gt;;
15     rr:joinCondition [rr:child "author"; rr:parent "name"]
16   ].
17
18 &lt;#TM_A&gt; rml:logicalSource &lt;#LS_A&gt;;
19   rr:subjectMap &lt;#SM_A&gt;;
20   rr:predicateObjectMap &lt;#POM_A1&gt;, &lt;#POM_A2&gt;, &lt;#POM_A3&gt;.
21
22 &lt;#LS_A&gt; rml:source "authors.json";
23   rml:iterator "$.authors[*]";
24   rml:referenceFormulation ql:JSONPath.
25
26 &lt;#SM_A&gt; rr:class foaf:Person;
27   rr:template "http://www.example.com/author/{id}".
28
29 &lt;#POM_A1&gt; rr:predicate foaf:name; rr:objectMap [ rml:reference "name" ].
30
31 &lt;#POM_A2&gt; rr:predicate foaf:country; rr:objectMap [ rml:reference "country" ].
32
33 &lt;#POM_A3&gt; rr:predicate foaf:birthdate;
34   rr:objectMap [ rml:reference "birthdate"; rr:datatype xsd:date ].</preformat>
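          <p>The creation of a Triples Map from the alignment can be sketched as follows. This is a simplified illustration under stated assumptions and not the implementation of the approach: the function names object_map, referencing_object_map, and triples_map are hypothetical, the maps are rendered as plain Turtle text with blank nodes instead of named resources, prefix declarations are omitted, and the class foaf:Person is taken from the Linked Data example.</p>
          <preformat><![CDATA[
```python
def object_map(reference, datatype=None, language=None):
    """Render an Object Map for a literal: reference, optional datatype, optional language."""
    parts = ['rml:reference "%s"' % reference]
    if datatype:
        parts.append("rr:datatype %s" % datatype)
    if language:
        parts.append('rr:language "%s"' % language)
    return "[ " + "; ".join(parts) + " ]"


def referencing_object_map(parent_map, child, parent):
    """Render a Referencing Object Map with one join condition."""
    return ('[ rr:parentTriplesMap <#%s>; '
            'rr:joinCondition [ rr:child "%s"; rr:parent "%s" ] ]'
            % (parent_map, child, parent))


def triples_map(name, source, formulation, rdf_class, template,
                literals, links=(), iterator=None):
    """Render one Triples Map as Turtle text."""
    logical_source = ['rml:source "%s"' % source]
    if iterator:
        logical_source.append('rml:iterator "%s"' % iterator)
    logical_source.append("rml:referenceFormulation %s" % formulation)

    lines = [
        "<#%s> rml:logicalSource [ %s ];" % (name, "; ".join(logical_source)),
        '  rr:subjectMap [ rr:class %s; rr:template "%s" ];' % (rdf_class, template),
    ]
    poms = []
    for predicate, reference, datatype, language in literals:
        poms.append("[ rr:predicate %s; rr:objectMap %s ]"
                    % (predicate, object_map(reference, datatype, language)))
    for predicate, parent_map, child, parent in links:
        poms.append("[ rr:predicate %s; rr:objectMap %s ]"
                    % (predicate, referencing_object_map(parent_map, child, parent)))
    lines.append("  rr:predicateObjectMap " + ",\n    ".join(poms) + ".")
    return "\n".join(lines)


books_map = triples_map(
    "TM_B", "books.csv", "ql:CSV", "schema:Book",
    "http://www.example.com/book/{id}",
    literals=[("schema:title", "title", None, "en")],
    links=[("schema:author", "TM_A", "author", "name")],
)
authors_map = triples_map(
    "TM_A", "authors.json", "ql:JSONPath", "foaf:Person",
    "http://www.example.com/author/{id}",
    literals=[
        ("foaf:name", "name", None, None),
        ("foaf:country", "country", None, None),
        ("foaf:birthdate", "birthdate", "xsd:date", None),
    ],
    iterator="$.authors[*]",
)
print(books_map)
print(authors_map)
```
]]></preformat>
          <p>The sketch keeps the three cases for literals (reference, datatype, language) and the Referencing Object Map with its join condition separate, mirroring the description of the Predicate Object Maps.</p>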
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Listing 1.4: RML mapping for the example</title>
      <p>In Listing 1.4 the resulting mapping rules for our example can be found. Two
Triples Maps, each with a Logical Source, a Subject Map, and one or more
Predicate Object Maps, were created (lines 1-16 and 18-34). The Logical Source for
the books (line 5) refers to the CSV data source (rml:source) and states that
we are using the reference formulation for CSV (rml:referenceFormulation).
The Subject Map (line 7) denotes that every book is of the class schema:Book
and that the IRI for each book is constructed based on the id, as found
during the alignment. The only attribute for the books is related to the entity
via schema:title and uses the data from the column title, which is in
English. This is reflected in a Predicate Object Map (line 9), with a connected
rr:objectMap pointing to the correct column and language. A second
Predicate Object Map (line 12) is used to state the relationship between the books
and the authors, which is only valid when the author of the book and the name
of the author are the same (line 15). Similar maps are created for the authors.
However, the Logical Source points to a different data source, a different
reference formulation is used, and an iterator is added, as we are dealing with
the JSON data source (lines 22-24). Additionally, the datatype of the birth date
is set to xsd:date (line 34).</p>
      <preformat> 1 book:0 a schema:Book;
 2   schema:title "Harry Potter and The Sorcerer's Stone"@en;
 3   schema:author author:jkr.
 4
 5 book:1 a schema:Book;
 6   schema:title "Homo Deus"@en;
 7   schema:author author:ynh.
 8
 9 author:jkr a foaf:Person;
10   foaf:name "J.K. Rowling";
11   foaf:country "UK";
12   foaf:birthdate "1965-07-21"^^xsd:date.
13
14 author:ynh a foaf:Person;
15   foaf:name "Yuval Noah Harari";
16   foaf:country "Israel";
17   foaf:birthdate "1976-04-24"^^xsd:date.</preformat>
      <p>Listing 1.5: Generated Linked Data based on the two data sources and the
RML mapping</p>
      <p>The generated Linked Data based on the two data sources and the RML
mapping can be found in Listing 1.5. The Linked Data contains the RDF triples
that were provided as example (lines 1-3 and 9-12), and the RDF triples for
the second book and author that are present in the data sources (lines 5-7 and
14-17).</p>
      <sec id="sec-4-1">
        <title>5 Implementation</title>
        <p>
          The approach is available via a JavaScript library (https://github.com/RMLio/example2rml). This library is available
for Node.js and the browser. The library supports both the CSV and JSON
format, showcasing the support for tabular and hierarchical data. Furthermore,
it is accessible through a command line interface and a graphical user interface
via the RMLEditor [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. The RMLEditor provides a graphical user interface for the
creation and editing of mappings, with RML as its underlying mapping language.
To apply the example-driven approach, users need to perform two steps: load
the different data sources and provide a Linked Data example through a set of
RDF triples. Subsequently, the mapping is created as described and visualized in
the interface. This process is shown in the screencast at
https://www.youtube.com/watch?v=IQVwlYQXwAo.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>6 Discussion and Conclusion</title>
        <p>Although existing approaches, such as data-driven and schema-driven, and their
corresponding (semi-)automatic solutions, have been the topic of multiple
research efforts, they offer limited benefits when dealing with use cases that
provide Linked Data examples. These approaches consider one or more different
elements as input, such as data, data schemas, ontologies, and existing mappings.
The example-driven approach considers as input the data and a Linked Data
example. The advantage of the example-driven approach is the use of knowledge
that can be extracted from the Linked Data example: (i) which data corresponds
to entities and attributes, i.e., the subjects and objects of an RDF triple; (ii) how
the data is annotated and modeled, i.e., which classes, properties, datatypes,
and languages are used and how they are related to each other; and (iii) how
different entities are linked to each other. By using this knowledge, the
example-driven approach creates mapping rules that generate Linked Data as desired
by the users. The other approaches do not consider this example, and, thus,
the created mapping rules might not generate the desired Linked Data.
Consequently, users need to manually update the mapping rules. Nonetheless, these
approaches are better suited when an example cannot be provided. In these use
cases, the example-driven approach cannot be applied due to the lack of an example.
Therefore, the use case at hand drives the choice for the appropriate approach.
Furthermore, the example-driven approach can be extended by applying
techniques introduced by the other approaches, such as the use of data schemas,
other ontologies, and existing mappings. This results in a hybrid approach that
uses an increased amount of knowledge when creating the mapping rules,
compared to the individual approaches. This increase of used knowledge leads to
an improvement of the created mapping rules, which reduces the cost of the
mapping process to generate the desired Linked Data.</p>
        <p>As future work, we envision the addition of an extra step to our approach
to create rules that apply data transformations on the raw data during the
Linked Data generation. This is needed for use cases where the data in the RDF
triples is not just a copy of the raw data, but instead the raw data needs to be
transformed before it can be used as (part of) a subject, predicate, or object.
Furthermore, we plan to explore how to combine the different approaches to
achieve the aforementioned hybrid approach.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Evgeny</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          , Dag Hovland, Ernesto Jimenez-Ruiz, Davide Lanti, Hallstein Lie, Christoph Pinkel, Martin Rezk, Martin G. Skjæveland, Evgenij Thorstensen, Guohui Xiao, Dmitriy Zheleznyakov, and
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          .
          <article-title>Ontology Based Access to Exploration Data at Statoil</article-title>
          .
          <source>In Proceedings of the 14th International Semantic Web Conference</source>
          , pages
          <fpage>93</fpage>
          –
          <lpage>112</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Evgeny</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nina</given-names>
            <surname>Solomakhina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Özgür Lütfü</given-names>
            <surname>Özçep</surname>
          </string-name>
          , Dmitriy Zheleznyakov, Thomas Hubauer, Steffen Lamparter, Mikhail Roshchin, Ahmet Soylu, and Stuart Watson.
          <article-title>How Semantic Technologies Can Enhance Data Access at Siemens Energy</article-title>
          .
          <source>In Proceedings of the 13th International Semantic Web Conference</source>
          , pages
          <fpage>601</fpage>
          –
          <lpage>619</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Bin</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mitesh</given-names>
            <surname>Patel</surname>
          </string-name>
          , Zhen Zhang, and
          <string-name>
            <given-names>Kevin Chen-Chuan</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <article-title>Accessing the Deep Web</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>50</volume>
          (
          <issue>5</issue>
          ):
          <fpage>94</fpage>
          –
          <lpage>101</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          .
          <article-title>Ontology-based data access mapping generation using data, schema, query, and mapping knowledge</article-title>
          .
          <source>In Proceedings of the 14th Extended Semantic Web Conference: PhD Symposium</source>
          , May
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>De Meester</surname>
          </string-name>
          , Maroy, Dimou, Verborgh, and
          <string-name>
            <surname>Mannens</surname>
          </string-name>
          .
          <article-title>Declarative Data Transformations for Linked Data Generation: the case of DBpedia</article-title>
          . In Eva Blomqvist,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          , Aldo Gangemi,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoekstra</surname>
          </string-name>
          , Pascal Hitzler, and Olaf Hartig, editors,
          <source>Proceedings of the 14th ESWC, LNCS</source>
          , pages
          <fpage>33</fpage>
          –
          <lpage>48</lpage>
          . Springer, Cham, May
          <year>2017</year>
          . ISBN 978-3-319-58450-8, 978-3-319-58451-5. doi: 10.1007/978-3-319-58451-5_3. URL https://link.springer.com/chapter/10.1007/978-3-319-58451-5_3.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Souripriya</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Seema</given-names>
            <surname>Sundara</surname>
          </string-name>
          , and Richard Cyganiak.
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          . Working group recommendation,
          <source>W3C</source>
          , September
          <year>2012</year>
          . URL http://www.w3.org/TR/r2rml/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Anastasia</given-names>
            <surname>Dimou</surname>
          </string-name>
          , Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle.
          <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>
          .
          <source>In Proceedings of the 7th Workshop on Linked Data on the Web</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          , Anastasia Dimou, Ruben Verborgh, Erik Mannens, and Rik Van de Walle.
          <article-title>Towards Approaches for Generating RDF Mapping Definitions</article-title>
          .
          <source>In Proceedings of the 14th International Semantic Web Conference: Posters and Demos</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jochen</given-names>
            <surname>Kranzdorf</surname>
          </string-name>
          , Andrew Sellers, Giovanni Grasso,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Schallhart</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tim</given-names>
            <surname>Furche</surname>
          </string-name>
          .
          <article-title>Visual OXPath: robust wrapping by example</article-title>
          .
          <source>In Proceedings of the 21st International Conference on World Wide Web</source>
          , pages
          <fpage>369</fpage>
          –
          <lpage>372</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Maurizio</given-names>
            <surname>Atzori</surname>
          </string-name>
          and
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Zaniolo</surname>
          </string-name>
          .
          <article-title>SWIPE: Searching Wikipedia by Example</article-title>
          .
          <source>In Proceedings of the 21st International Conference on World Wide Web</source>
          , pages
          <fpage>309</fpage>
          –
          <lpage>312</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Ernesto</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          , Evgeny Kharlamov, Dmitriy Zheleznyakov, Ian Horrocks, Christoph Pinkel, Martin G. Skjæveland, Evgenij Thorstensen, and Jose Mora.
          <article-title>BootOX: Practical Mapping of RDBs to OWL 2</article-title>
          .
          <source>In Proceedings of the 14th International Semantic Web Conference (Part II)</source>
          , pages
          <fpage>113</fpage>
          –
          <lpage>132</lpage>
          . Springer,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Mohsen</given-names>
            <surname>Taheriyan</surname>
          </string-name>
          , Craig A Knoblock,
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Szekely</surname>
          </string-name>
          , and Jose Luis Ambite.
          <article-title>Learning The Semantics of Structured Data Sources</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>37</volume>
          :
          <fpage>152</fpage>
          –
          <lpage>169</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Larry</given-names>
            <surname>Masinter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tim</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Roy T</given-names>
            <surname>Fielding</surname>
          </string-name>
          .
          <article-title>Uniform resource identifier (URI): Generic syntax</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          , Anastasia Dimou,
          <string-name>
            <given-names>Aron-Levi</given-names>
            <surname>Herregodts</surname>
          </string-name>
          , Ruben Verborgh, Dimitri Schuurman, Erik Mannens, and Rik Van de Walle.
          <article-title>RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings</article-title>
          .
          <source>In The Semantic Web – Latest Advances and New Domains (ESWC 2016)</source>
          , pages
          <fpage>709</fpage>
          –
          <lpage>723</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>