<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>High Quality Schema and Data Transformations for Linked Data Generation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ben De Meester</string-name>
          <email>ben.demeester@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Department of Electronics and Information Systems, Ghent University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>imec</institution>
          ,
          <addr-line>Ghent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>High quality Linked Data is an important factor for the success of the Semantic Web. However, the quality of generated Linked Data is typically assessed and refined after the dataset is generated, which is computationally intensive. Given that Linked Data is typically generated from (semi-)structured data, which highly influences the intrinsic dimensions of the resulting Linked Data quality, I investigate how a generation process can automatically be validated before RDF data is even generated. However, current generation processes are not easily validated: descriptions of the data transformations depend on the use case or are incomplete, and validation approaches would require manual (re-)definition of test cases aimed at the generated dataset. I propose (i) a generic approach to declaratively describe a generation process, and (ii) a validation approach for automatically assessing the quality of the generation process itself. By aligning declarative data and schema transformations, the generation process remains generic and independent of the implementation. The transformations can be automatically validated based on constraint rules that apply to the generated RDF data graph, using custom entailment regimes. Preliminary results show that the generation process of DBpedia can be described declaratively and (partially) validated.</p>
      </abstract>
      <kwd-group>
        <kwd>Generation</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Data Transformation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        High quality Linked Data is an important factor for the success of the envisaged Semantic Web. As machines are inherently intolerant when interpreting unexpected input, low quality data produces low quality results. Quality assessment, specifically for the intrinsic dimensions, i.e., those directly related to the RDF graph [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], can be automated by checking constraint violations [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. A Linked Data generation approach that eases validation lowers the threshold for Linked Data publishers to generate data of high quality. Having access to Linked Data of higher quality is beneficial for all Linked Data consumers. (Co-Promotor: prof. dr. ir. Ruben Verborgh; Promotor: prof. dr. ir. Erik Mannens.)
Assessing an entire dataset is computation- and memory-intensive [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However,
Linked Data is typically generated from (semi-)structured data [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and this generation process highly influences the intrinsic quality dimensions of the resulting Linked Data. E.g., wrongly defined schema transformations result in violations such as entities being members of disjoint classes, and incorrect data transformations result in inaccurate values, such as parsing "March 8, '17" into the date 08-03-0017 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Violations can be resolved in the dataset; however, the generation process that causes these violations is not improved. A new iteration of the generation process can re-introduce the same errors, and data validation needs to be re-executed. This results in duplicate work, wasted computation, and wasted time.
      </p>
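The date-parsing example above can be made concrete with a short sketch; the parser below is hypothetical, purely to illustrate how an unvalidated data transformation silently produces year 0017:

```python
from datetime import date

# Hypothetical naive transformation: it strips punctuation and takes the
# two-digit year "'17" literally, yielding year 0017 instead of 2017.
MONTHS = {"March": 3}  # only the month needed for this example

def naive_parse(text):
    month_name, day, year = text.replace(",", " ").replace("'", "").split()
    return date(int(year), MONTHS[month_name], int(day))

print(naive_parse("March 8, '17").isoformat())  # 0017-03-08
```

Validating the transformation itself, e.g., asserting that its declared output range covers plausible dates, would catch such an error before any RDF is generated.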
      <p>Problem Statement: Iteratively validating generated Linked Data is computationally intensive and makes it hard to determine the root causes of quality violations.</p>
      <p>
        The earlier a dataset's quality is assessed, the better [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. When validating an RDF data graph, it is not clear what the cause of a constraint violation is, e.g., whether the generation is badly modeled or the input data is inaccurate. When validating the RDF generation process instead, the assessment report points to the violating parts, which can be used to refine the transformations used in those parts. This allows identifying violations before they even occur and avoids the propagation of flawed transformations that lead to many faulty triples [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Solving constraint violations on the generation level is thus more efficient and allows for iterative refinement.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Linked Data is typically generated from (semi-)structured data, encompassing
both schema and data transformations [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. Schema transformations involve (re-)modeling the original data, describing how RDF terms are related, and deciding which vocabularies and ontologies to use [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Data transformations are
needed to support any change in structure, representation, or content of
data [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], e.g., performing string transformations or computations. However, current generation processes' support for data transformations is uncombinable, restricted, part of a use-case-specific system, or tightly coupled [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This results in generation processes that include preprocessing steps or custom implementations.
      </p>
      <p>
        In this section, I first give an overview of Linked Data generation approaches and how schema and data transformations are integrated. Then, I review existing validation approaches. Focus is given to declarative generation approaches, as existing work has shown that schema transformations can be validated and improved even before the generation is executed, when a declarative generation process is used [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The declarative schema transformations specify how the dataset will be formed; thus, the assessments of the schema transformations and of the generated RDF graph are correlated [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Instead of directly validating the RDF data graph, a new shape that corresponds to the declarative schema transformations currently needs to be manually (re-)defined for automatic validation.
      </p>
      <sec id="sec-2-1">
        <title>Generation Approaches</title>
        <p>At a high level, the following approaches for Linked Data generation can be identified:</p>
        <p>
          Hard-coded Custom tools and scripts were initially used to generate Linked Data from raw data. They incorporate both the schema and data transformations directly in their implementation, as in the case of, e.g., the DBpedia Extraction Framework (DBpedia EF) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Updating the semantic annotations resulted
in dedicated software development cycles to adjust the implementations [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>
          Case-specific Solutions such as XSLT- or XPath-based approaches were established for generating Linked Data from data originally in XML format, e.g., [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
These solutions are declarative: rules are detached from the implementation that executes them, so the implementation does not need to be updated when the rules are updated. However, only specific data sources are supported, and the range of possible schema or data transformations is limited by the expressiveness of the respective language or syntax.
        </p>
        <p>
          Generic The RDF Mapping Language (RML) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], based on the W3C Recommendation R2RML [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], is a declarative language, represented in RDF, that supports schema transformations for heterogeneous (semi-)structured data sources. This solution is no longer case-specific, and the Linked Data generation process is machine-processable.
        </p>
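As a rough illustration of the declarative principle behind such generic languages, the following sketch separates a mapping rule from the generic engine that executes it (the rule vocabulary here is invented for illustration and is not actual RML syntax):

```python
# Minimal sketch of a declarative, RML-inspired mapping rule interpreted
# by a generic engine. The rule dictionary is a hypothetical stand-in for
# a real mapping document; it is not real RML.
rule = {
    "subject": "http://example.org/person/{id}",  # subject template
    "predicate_objects": [
        ("rdf:type", "schema:Person"),
        ("schema:name", "{name}"),
    ],
}

def apply_rule(rule, records):
    """Generate (subject, predicate, object) triples from structured
    records. The engine is generic: changing the rule needs no code change."""
    triples = []
    for rec in records:
        subj = rule["subject"].format(**rec)
        for pred, obj in rule["predicate_objects"]:
            triples.append((subj, pred, obj.format(**rec)))
    return triples

records = [{"id": "0", "name": "Ada"}]
for t in apply_rule(rule, records):
    print(t)
```

Because the rule is data rather than code, it can itself be inspected and validated, which is what the remainder of this paper exploits.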
        <p>
          Generic solutions, however, only support schema transformations. Data transformations are either not supported, or the range of possible data transformations is determined by what can be defined when the data is retrieved from the data source (pre-processing) or after the Linked Data is generated (post-processing). More customization is enabled by solutions that allow embedding scripts inside declarative schema transformations, such as FunUL [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], or custom SPARQL binding functions, such as in SPARQL-Generate [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. These solutions, however, depend on their implementation and do not provide a declarative description.
        </p>
        <p>
          Generic data transformations Besides the aforementioned solutions that partially integrate schema and data transformations, there are Linked Data generation processes that rely on distinct systems to perform the schema and data transformations. These types of transformations cannot always be distinguished, as data transformations may affect the original schema. Their support for data transformations ranges from a fixed predefined set of transformations (e.g., LinkedPipes [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]) to an embedded scripting environment (e.g., OpenRefine, http://openrefine.org/).
        </p>
        <p>
          Declarative data transformations Different approaches emerged that define data transformations and other functionalities declaratively, e.g., Hydra [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] for Web services, or VOLT [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] for SPARQL. However, these declarative approaches focus on specific implementations and can thus only be used within their context, i.e., Hydra can only be used for Web service implementations, and VOLT only for SPARQL endpoint implementations. No implementation-independent declarative solution is available.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Data Validation</title>
        <p>Two approaches emerged for assessing constraint violations: (i) integrity constraints to detect violations, and (ii) query-based violation detection that depends on the RDF graph's shape.</p>
        <p>
          Integrity constraints Entailment regimes that are part of, e.g., RDFS and OWL, are used as integrity constraints to detect violations in an RDF graph [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. However, these entailment regimes use the Open World Assumption, whereas assessing constraints assumes a Closed World. As such, these approaches need to redefine existing semantics. Using one standard to express both validation and reasoning is a strong point of this approach; however, it leads to ambiguity: the same formula having different meanings endangers interoperability within the Semantic Web [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          Query-based Query-based validation approaches depend on the RDF graph's shape to detect violations. Apart from approaches that use SPARQL templates, e.g., RDFUnit [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], constraint description languages have been proposed, of which SHACL [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] is a W3C Recommendation. Integration with entailment is either limited or requires a separate inferencing process.
        </p>
        <p>
          However, the combination of inferencing with shape-based validation is
needed [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This allows integrating the assessment of shapes and ontological constructs. When no inferencing is provided, either too few or too many violations can be returned. On the one hand, a generated resource that is a member of disjoint classes might not be revealed without inferencing, i.e., too few violations. On the other hand, domain and range violations could be returned while inferred domains and ranges resolve the violation, i.e., too many violations.
        </p>
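The "too few violations" case can be sketched with a toy subclass hierarchy (the vocabulary below is assumed for illustration): without closing the asserted types under rdfs:subClassOf, the disjointness conflict stays hidden.

```python
# Toy illustration of why shape-based validation needs inferencing.
# The classes and axioms below are invented examples, not a real ontology.
subclass = {"ex:Student": "ex:Person"}          # ex:Student rdfs:subClassOf ex:Person
disjoint = {("ex:Person", "ex:Organization")}   # owl:disjointWith axiom

def closure(types):
    """Close a set of asserted types under the subclass relation."""
    closed = set(types)
    for t in types:
        while t in subclass:
            t = subclass[t]
            closed.add(t)
    return closed

asserted = {"ex:Student", "ex:Organization"}
# Without inferencing, no asserted pair is disjoint: the violation is missed.
naive = any((a, b) in disjoint for a in asserted for b in asserted)
# With inferencing, ex:Student entails ex:Person, revealing the conflict.
inferred = closure(asserted)
full = any((a, b) in disjoint for a in inferred for b in inferred)
print(naive, full)  # False True
```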
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Research Question</title>
      <p>Given the related work, we can conclude that the generation process itself cannot easily be validated: current descriptions of the data transformations of generation processes depend on the use case or are incomplete, and validation approaches would require manual (re-)definition of test cases aimed at the generated dataset in order to apply to the generation process. I thus investigate the following two research questions:</p>
      <p>Research Question 1 How can we provide a use-case-independent declarative Linked Data generation description that includes both schema and data transformations?</p>
      <p>Subquestion 1 How can we declaratively define data transformations?</p>
      <p>Subquestion 2 How can we align these schema and data transformations?</p>
      <p>Research Question 2 How can we automatically validate the generation description based on the constraint rules that apply to the RDF data graph, without needing to manually (re-)define them?</p>
    </sec>
    <sec id="sec-4">
      <title>Hypotheses</title>
      <p>The first research question concerns a declaratively described generation process. When the description is machine-processable, the validation can be automated. Existing solutions handle declarative schema transformations; however, when data transformations are supported, they are either not declarative or dependent on the implementation. Hypotheses related to Research Question 1 are:</p>
      <p>Hypothesis 1 Declarative data transformations, independent of the implementation, are reusable across use cases and generation processes.</p>
      <p>Hypothesis 2 Aligned declarative schema and data transformations provide a generic framework for describing Linked Data generation processes.</p>
      <p>
        The second research question concerns a validation approach capable of validating the generation description itself, based on the constraint descriptions of the RDF datasets. Existing works do not support data transformations and require manual redefinition of the validation shape of the generation process [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Hypotheses related to Research Question 2 are:
Hypothesis 3 A declarative generation process containing both schema and data transformations can be automatically validated based on the RDF data graph's constraint rules, without needing manual redefinition.
      </p>
      <p>Hypothesis 4 Validation of a declarative generation process is more computationally efficient using custom entailment regimes than using RDFUnit and SHACL. Root causes in both schema and data transformations can be found and refined before any RDF data is generated.</p>
    </sec>
    <sec id="sec-5">
      <title>Approach</title>
      <p>
        My approach consists of four steps, each related to one hypothesis.
1. Declarative description of functions. Describing functions declaratively makes them independent of the implementation. The descriptions of these functions can be reused in other use cases and technologies, such as for describing declarative data transformations during Linked Data generation. As these functions are described in RDF, their descriptions can be validated using existing Linked Data validation approaches; e.g., when describing a birth date, it can be assessed whether the output type of the used function is in fact a date type.
2. Alignment of declarative schema and data transformations. I create a declarative generation process by aligning declarative schema and data transformations, making them combinable. Thus, functions can be used as data transformations within the context of the schema transformations of the generation process. However, as the transformations are described separately, there are no interdependencies: the schema transformation descriptions do not necessarily depend on the data transformation descriptions, and vice versa. The generation process can now be described entirely declaratively in RDF: the generation graph. This generation graph can thus be validated using existing Linked Data validation approaches.
3. Creation of a validation approach handling custom entailment regimes. Custom entailment regimes can describe how to rewrite RDF data graph validation rules into RDF generation graph validation rules, without needing to manually (re-)define them. For example, when a validation rule defines that every resource of class schema:Person should have a schema:birthDate defined, it can be inferred that whenever a resource of class schema:Person is generated, a predicate schema:birthDate with an object of a valid date type should also be generated. Not only does my approach allow integrating the assessment of shapes and ontological constructs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], it also remains independent of the language that describes the constraint rules of RDF data graphs. Existing constraint languages can be reused, and constraint rules of RDF data graphs can be automatically interpreted for the generation process.
4. Evaluating the validation approach against the generation approach. I apply this RDF generation graph, containing both schema and data transformations, to real-world use cases. Validating this RDF generation graph using the constraint rules of the RDF data graph can then be computationally compared with validating the generated dataset. Given the example of the previous step: instead of validating every resource of class schema:Person in the data graph, only one part of the generation graph needs to be validated, namely, the part handling the generation of resources of class schema:Person. The latter is thus assumed to be more computationally efficient.
      </p>
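Step 3's rule rewriting can be sketched with plain dictionaries standing in for SHACL shapes and mapping rules (all structures are hypothetical simplifications): a data-graph constraint on schema:Person is lifted to the mapping rules that generate such resources, so violations surface before any triple exists.

```python
# Hedged sketch: lift an RDF data-graph constraint to the generation graph.
# The dictionaries are illustrative stand-ins for SHACL and mapping rules.
def lift_constraint(constraint, mapping_rules):
    """Return the mapping rules that generate the constrained class but
    omit the required predicate: violations found before generation."""
    return [rule for rule in mapping_rules
            if rule["class"] == constraint["targetClass"]
            and constraint["requiredProperty"] not in rule["predicates"]]

constraint = {"targetClass": "schema:Person",
              "requiredProperty": "schema:birthDate"}
rules = [
    {"id": "#PersonMapping", "class": "schema:Person",
     "predicates": ["schema:name"]},                      # birthDate missing
    {"id": "#AuthorMapping", "class": "schema:Person",
     "predicates": ["schema:name", "schema:birthDate"]},  # conforms
]
for violation in lift_constraint(constraint, rules):
    print(violation["id"])  # #PersonMapping
```

Only the two mapping rules are inspected, rather than every generated schema:Person resource, which is the intuition behind the efficiency claim of step 4.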
    </sec>
    <sec id="sec-6">
      <title>Evaluation Plan</title>
      <p>
        For evaluating my approach, apart from comparing to a gold standard, I investigate the generation process of a real-world use case: the DBpedia EF. This is a non-trivial generation process, taking into account both schema and data transformations. Furthermore, the DBpedia dataset is a valuable resource, and its quality issues have persisted over a long period of time [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Improving the generation process and the quality of the DBpedia dataset is beneficial for the Semantic Web community.
      </p>
      <p>To evaluate the first hypothesis, declarative functions are functionally evaluated by means of real-world use cases, to (a) apply the same descriptions to multiple implementations, and (b) describe existing implementations declaratively.</p>
      <p>The second hypothesis, the alignment of declarative schema and data transformations, is evaluated by providing complete, executable declarative descriptions of an existing Linked Data generation process, namely the DBpedia generation process. Completeness and correctness of the generation description with respect to the original generation process are measured by comparing the generated triples, and the performance of the declarative generation process is compared with the DBpedia EF in terms of processing speed.</p>
      <p>
        For the third hypothesis, applying the validation approach to the generation approach, I compare my approach to RDFUnit [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and SHACL processors [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. I compare functionalities, namely the inclusion of custom validation regimes, while performance should be at least comparable for small datasets. Then, I create a gold standard describing generation processes that are capable of generating the Linked Data used by the SHACL test suite (https://w3c.github.io/data-shapes/data-shapes-test-suite/). By successfully applying the SHACL test suite to this gold standard, I functionally prove the third hypothesis.
      </p>
      <p>For the final hypothesis, I evaluate my approach by applying it to the DBpedia EF to describe, use, and validate a generation process. Comparison metrics will be the completeness and correctness of the validation result, and the processing speed of the validation assessment, comparing the quality assessment of the declarative description of the generation process with the quality assessment of the generated dataset.</p>
    </sec>
    <sec id="sec-7">
      <title>Preliminary Results</title>
      <p>
        For the first hypothesis, I proposed the Function Ontology (FnO) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a way to declaratively describe functions without restricting them to programming-language-dependent implementations. The ontology allows for extensions and is proposed as a possible solution for semantic applications in various domains. As evaluation, FnO has been successfully applied to describe the actions of a Dockerfile [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], and the parsing functions of the original DBpedia EF, which were extracted into a separate, reusable module [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
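In plain terms, such a declarative function description is metadata kept apart from any implementation. The sketch below uses invented dictionary keys that are merely indicative of an expects/returns structure, not FnO's exact vocabulary; the implementation identifiers are likewise hypothetical.

```python
# Hedged sketch: an FnO-style function description separated from its
# implementations. Keys and identifiers are illustrative, not real FnO.
description = {
    "name": "parseDate",
    "expects": [{"predicate": "ex:dateString", "type": "xsd:string"}],
    "returns": [{"predicate": "ex:dateOutput", "type": "xsd:date"}],
}

# One description, several interchangeable implementations (hypothetical):
implementations = {
    "parseDate": {"python": "ex_lib.parse_date",
                  "java": "ex.lib.DateParser#parse"},
}

def output_type_is_date(desc):
    """Validation as described in the Approach: when a mapping produces a
    birth date, check the declared output type of the used function."""
    return any(r["type"] == "xsd:date" for r in desc["returns"])

print(output_type_is_date(description))  # True
```

Because the description is data, the same check works no matter which implementation eventually executes the function.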
      <p>
        For the second hypothesis, I aligned FnO with RML [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: a use-case-independent declarative generation process supporting both schema and data transformations, where the extraction, transformation, and mapping rule execution are decoupled [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. As evaluation, I successfully applied it to the DBpedia EF [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], covering 98% completeness with comparable performance. Any part of the Linked Data generation can be reused to generate other datasets, such as the mapping and transformation rules, or the previously extracted parsing functions [
        ].
      </p>
      <p>
        Validating my third hypothesis is ongoing. I have already shown that rule logic can cover both validation and custom entailment regimes if it is expressive enough [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Practical feasibility has been shown by providing a proof of concept in N3Logic that supports all RDFUnit constraint types.
      </p>
    </sec>
    <sec id="sec-8">
      <title>Reflections</title>
      <p>Validating declarative generation processes provides traceability of the root causes of violations, which allows improving the generation process instead of the generated RDF graph. This achieves a more scalable and efficient approach to generating high quality Linked Data. Preliminary results have shown that it is possible to declaratively describe generation processes such as the DBpedia EF, including both schema and data transformations.</p>
      <p>Validation of the schema transformations can already be performed by manually redefining constraint rules. By enabling validation approaches to validate the generation process based on the constraint rules of the RDF data graph, without needing manual adjustments, we enable a more qualitative generation process without additional effort.</p>
      <p>My approach allows declaring and validating the entire generation process. Each transformation can be validated, and each module is use-case and implementation independent. More granular control is given to the data modeler. Linked Data generation is made more precise and can be validated better, resulting in Linked Data of higher quality.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arndt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meester</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Using rule-based reasoning for RDF validation</article-title>
          . In: RuleML+RR (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>DBpedia: A nucleus for a Web of Open Data</article-title>
          .
          <source>In: The Semantic Web: 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference</source>
          , ISWC 2007 + ASWC 2007, Busan, Korea, November 11-15,
          <year>2007</year>
          . Proceedings. vol.
          <volume>4825</volume>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Acar</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nolle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eckert</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The role of reasoning for RDF validation</article-title>
          .
          <source>In: Proceedings of the 11th International Conference on Semantic Systems</source>
          . pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          . Working group recommendation,
          <source>World Wide Web Consortium (W3C)</source>
          (Sep
          <year>2012</year>
          ), http://www.w3.org/TR/r2rml/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>De Meester</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
          </string-name>
          , E., Van de Walle, R.:
          <article-title>An Ontology to Semantically Declare and Describe Functions</article-title>
          . In: The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers. vol.
          <volume>9989</volume>
          , pp.
          <fpage>46</fpage>
          -
          <lpage>49</lpage>
          (Oct
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. <string-name><surname>De Meester</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Maroy</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Dimou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Verborgh</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Mannens</surname>, <given-names>E.</given-names></string-name>: <article-title>Declarative data transformations for Linked Data generation: the case of DBpedia</article-title>. In: <source>Proceedings of the 14th ESWC</source>. pp. <fpage>33</fpage>–<lpage>48</lpage> (May <year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. <string-name><surname>Dimou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kontokostas</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Freudenberg</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Verborgh</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Mannens</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Hellmann</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Van de Walle</surname>, <given-names>R.</given-names></string-name>: <article-title>Assessing and refining mappings to RDF to improve dataset quality</article-title>. In: <source>The Semantic Web – ISWC 2015</source>. vol. <volume>9367</volume>, pp. <fpage>133</fpage>–<lpage>149</lpage>. Bethlehem, PA, USA (Oct <year>2015</year>)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. <string-name><surname>Dimou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Vander Sande</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Colpaert</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Verborgh</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Mannens</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Van de Walle</surname>, <given-names>R.</given-names></string-name>: <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>. In: <source>Proceedings of the 7th Workshop on Linked Data on the Web</source>. vol. <volume>1184</volume> (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. <string-name><surname>Hyland</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Atemezing</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Villazón-Terrazas</surname>, <given-names>B.</given-names></string-name>: <article-title>Best Practices for Publishing Linked Data</article-title>. <source>W3C Working Group Note</source>, W3C (Jan <year>2014</year>), https://www.w3.org/TR/ld-bp/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. <string-name><surname>Junior</surname>, <given-names>A.C.</given-names></string-name>, <string-name><surname>Debruyne</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Brennan</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>O'Sullivan</surname>, <given-names>D.</given-names></string-name>: <article-title>An evaluation of uplift mapping languages</article-title>. <source>International Journal of Web Information Systems</source> <volume>13</volume>(<issue>4</issue>), <fpage>405</fpage>–<lpage>424</lpage> (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. <string-name><surname>Klímek</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Škoda</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Nečaský</surname>, <given-names>M.</given-names></string-name>: <article-title>LinkedPipes ETL: Evolved Linked Data preparation</article-title>. In: <source>The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers</source>. vol. <volume>9989</volume> LNCS, pp. <fpage>95</fpage>–<lpage>100</lpage> (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. <string-name><surname>Knublauch</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Kontokostas</surname>, <given-names>D.</given-names></string-name>: <article-title>Shapes Constraint Language (SHACL)</article-title>. <source>W3C Recommendation</source>, W3C (Jul <year>2017</year>), https://www.w3.org/TR/shacl/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. <string-name><surname>Kontokostas</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Westphal</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Auer</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Hellmann</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Cornelissen</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Zaveri</surname>, <given-names>A.</given-names></string-name>: <article-title>Test-driven evaluation of linked data quality</article-title>. In: <source>Proceedings of the 23rd International Conference on World Wide Web</source>. pp. <fpage>747</fpage>–<lpage>757</lpage> (Mar <year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. <string-name><surname>Lange</surname>, <given-names>C.</given-names></string-name>: <article-title>Krextor – An Extensible Framework for Contributing Content Math to the Web of Data</article-title>. In: <source>Intelligent Computer Mathematics: 18th Symposium, Calculemus 2011, and 10th International Conference, MKM 2011, Bertinoro, Italy, July 18-23, 2011, Proceedings</source>. pp. <fpage>304</fpage>–<lpage>306</lpage> (<year>2011</year>)
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. <string-name><surname>Lanthaler</surname>, <given-names>M.</given-names></string-name>: <article-title>Hydra Core Vocabulary</article-title>. <source>Unofficial Draft</source>, Google (Mar <year>2018</year>), http://www.hydra-cg.com/spec/latest/core/
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. <string-name><surname>Lefrançois</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Zimmermann</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Bakerally</surname>, <given-names>N.</given-names></string-name>: <article-title>A SPARQL extension for generating RDF from heterogeneous formats</article-title>. In: <source>The Semantic Web: 14th International Conference, ESWC 2017, Portorož, Slovenia, May 28 – June 1, 2017, Proceedings</source>. pp. <fpage>35</fpage>–<lpage>50</lpage>. Portorož, Slovenia (May <year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17. <string-name><surname>Maroy</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Dimou</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Kontokostas</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>De Meester</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Verborgh</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Mannens</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Hellmann</surname>, <given-names>S.</given-names></string-name>: <article-title>Sustainable linked data generation: The case of DBpedia</article-title>. In: <source>Proceedings of the 16th International Semantic Web Conference: In-Use Track</source>. Vienna, Austria (Oct <year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. <string-name><surname>Rahm</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Do</surname>, <given-names>H.H.</given-names></string-name>: <article-title>Data cleaning: Problems and current approaches</article-title>. <source>IEEE Data Engineering Bulletin</source> <volume>23</volume>(<issue>4</issue>), <fpage>3</fpage>–<lpage>13</lpage> (<year>2000</year>)
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. <string-name><surname>Regalia</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Janowicz</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Gao</surname>, <given-names>S.</given-names></string-name>: <article-title>VOLT: A Provenance-Producing, Transparent SPARQL Proxy for the On-Demand Computation of Linked Data and its Application to Spatiotemporally Dependent Data</article-title>. In: <source>Proceedings of the 13th International Conference on The Semantic Web. Latest Advances and New Domains</source>. pp. <fpage>523</fpage>–<lpage>538</lpage> (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. <string-name><surname>Tao</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Sirin</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Bao</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>McGuinness</surname>, <given-names>D.L.</given-names></string-name>: <article-title>Integrity constraints in OWL</article-title>. In: <source>Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI 2010)</source>. Atlanta, Georgia, USA (Jul <year>2010</year>)
        </mixed-citation>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21. <string-name><surname>Tommasini</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>De Meester</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Heyvaert</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Verborgh</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Mannens</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Della Valle</surname>, <given-names>E.</given-names></string-name>: <article-title>Representing Dockerfiles in RDF</article-title>. In: <source>ISWC 2017 Posters &amp; Demonstrations and Industry Tracks</source>. vol. <volume>1963</volume>. Vienna, Austria (Oct <year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22. <string-name><surname>Zaveri</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Rula</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Maurino</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Pietrobon</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Auer</surname>, <given-names>S.</given-names></string-name>: <article-title>Quality assessment for linked data: A survey</article-title>. <source>Semantic Web Journal</source> <volume>7</volume>(<issue>1</issue>), <fpage>63</fpage>–<lpage>93</lpage> (Mar <year>2015</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>