=Paper= {{Paper |id=Vol-2073/article-08 |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-2073/article-08.pdf |volume=Vol-2073 |dblpUrl=https://dblp.org/rec/conf/www/DimouHMV18 }} ==None== https://ceur-ws.org/Vol-2073/article-08.pdf
                                   What Factors Influence the Design of
                                   a Linked Data Generation Algorithm?
                             Anastasia Dimou                                                   Pieter Heyvaert
                    anastasia.dimou@ugent.be                                               pieter.heyvaert@ugent.be
        IDLab, Dep. of Electronics and Information Systems,                   IDLab, Dep. of Electronics and Information Systems,
                     imec – Ghent University                                               imec – Ghent University

                              Ben De Meester                                                  Ruben Verborgh
                     ben.demeester@ugent.be                                            ruben.verborgh@ugent.be
        IDLab, Dep. of Electronics and Information Systems,                IDLab, Dep. of Electronics and Information Systems,
                     imec – Ghent University                                            imec – Ghent University

LDOW2018, April 2018, Lyon, France
© 2018 Copyright held by the owner/author(s).

ABSTRACT
Generating Linked Data remains a complicated and intensive engineering process. While different factors determine how a Linked Data generation algorithm is designed, potential alternatives for each factor are currently not considered when designing the tools' underlying algorithms. Certain design patterns are frequently applied across different tools, covering certain alternatives of a few of these factors, whereas other alternatives are never explored. Consequently, there are no adequate tools for Linked Data generation for certain occasions, or tools with inadequate and inefficient algorithms are chosen. In this position paper, we determine such factors, based on our experiences, and present a preliminary list. These factors could be considered when a Linked Data generation algorithm is designed or a tool is chosen. We investigated which factors are covered by widely known Linked Data generation tools and concluded that only certain design patterns are frequently encountered. By these means, we aim to point out that Linked Data generation is above and beyond bare implementations, and algorithms need to be thoroughly and systematically studied and exploited.

1 INTRODUCTION
Generating Linked Data remains a complicated and intensive engineering process, despite the significant number of existing tools. Most solutions primarily choose their own (often non-interoperable) approach. Format- and source-specific approaches were investigated as more generic alternatives [4]. In all cases, rules define how Linked Data should be generated. These rules often remain implicit and embedded in the implementation, e.g., the DBpedia Extraction Framework, but more generic solutions distinguish them and make them explicit and declarative, e.g., [R2]RML processors.

The rules cover a use case's context, whereas the execution algorithm covers its technical and functional requirements. On the one hand, each use case's context (how the Linked Data is modeled or which vocabularies are used to generate Linked Data) is described within the rules and does not influence the algorithm's design. For instance, the rules consider the adequate ontology terms to annotate the data. On the other hand, different factors related to technical and functional requirements of each use case determine how an algorithm should be designed as well as how third parties choose the most adequate tool. Potential alternatives for these factors affect how efficiently the rules are executed to generate Linked Data. For instance, “What is the purpose? Is the Linked Data consumed immediately or is it published for future use?” or “What triggers the generation? Is the Linked Data generated from a real-time data stream which needs to be immediately processed, or on demand?”.

Certain design patterns are frequently applied across different tools, covering particular alternatives of these factors, whereas other alternatives or factors were never explored. For instance, if we lack tools whose algorithms support real-time data streams, Linked Data can be generated by storing the data in a database and using corresponding tools, e.g., Morph-streams. Thus, there are often no adequate tools for Linked Data generation for certain occasions, or tools with inadequate and inefficient algorithms are chosen as best alternatives.

However, those factors were not thoroughly and systematically studied so far, nor were the algorithms' designs. Different solutions do not concretely describe the algorithms that drive their implementation, while optimizations are applied according to specific use cases, decreasing a tool's chances to be reused. The algorithms are not designed considering different factors, and the use case's technical and functional requirements are not matched to any factors.

In this work, we present a preliminary list of factors that could be considered. We do not aim for a complete list. This is a position paper whose goal is to provide insights and raise awareness, as Linked Data generation goes beyond and above bare implementations. We reviewed a few of the pioneering and broadly used tools that cover one or more of these factors and discuss the results of our observations. By these means, we aim to ensure that one can (i) choose or design the most adequate algorithm, and (ii) generate Linked Data without being restricted by tooling limitations.

The remainder of the paper is structured as follows: in Section 2, we discuss the preliminary list of factors that we identified; in Section 3, we investigate which factors each tool covers; and in Section 4, we outline our conclusions.

2 EXECUTION FACTORS
Different factors determine how an algorithm should be designed for generating Linked Data. A factor can be any fact or circumstance that contributes to the envisaged result, i.e., the Linked Data
generation. Those factors are related to (and often dependent on) the elements involved in a Linked Data generation activity:

      data: both raw data to generate the desired Linked Data from, as well as existing Linked Data.
      rules: mapping rules that define how Linked Data are generated relying on available data.
      tools: tools that apply mapping rules to data and generate Linked Data.

Multiple factors determine how Linked Data is generated, fulfilling different technical or functional requirements posed by different use cases. The different factors that we identified are outlined below and summarized in Table 1. For each factor, we outline the two furthest alternatives to shed light on the options, but intermediate or hybrid approaches may be adopted as well. The differences are determined depending on what the generation purpose is (Section 2.1), its direction (Section 2.2) and materialization (Section 2.3), where it occurs (location, Section 2.4), what drives (Section 2.5) and what triggers the execution (Section 2.6), how dynamic (Section 2.7) or diverse the data is (Section 2.8), and the data or rules complexity (Section 2.9).

Table 1: Factors affecting Linked Data generation, when they take effect, and which elements they are associated with.

                         element           generation's execution
      factor             data    rules     before    during
      purpose                              ✓
      direction          ✓                 ✓
      materialization                      ✓
      location           ✓                           ✓
      driving force      ✓       ✓         ✓
      trigger                              ✓
      dynamicity         ✓                 ✓         ✓
      diversity          ✓                 ✓         ✓
      complexity         ✓       ✓         ✓

Use case. Let us consider a use case that illustrates each factor with the help of an example. The use case is about an intelligent transportation search engine, which relies on Linked Data derived from heterogeneous data sources. The search engine obtains information about airports from an airline data source, about train stations from a train data source, and about the location of countries, cities, and addresses from a data source with spatial data.

2.1 Purpose
Different purposes can prompt the Linked Data generation. On a high level, we identify production and consumption. The purpose that drives the generation affects the design choices of the execution algorithm, but remains independent of the involved elements (data or rules). The fundamental difference lies in the extent of use cases that the Linked Data generation task aims to cover:

      Production: Linked Data generation can be driven by a production need, i.e., a data owner generates Linked Data to annotate the data and make it publicly available. Production-driven generation remains independent of the data's potential consumption, which should then be adjusted to the Linked Data as it becomes available.
      Consumption: Linked Data generation can occur due to certain consumption needs, namely a data consumer needs to process Linked Data which still needs to be generated from raw data. Thus, the generated Linked Data is the response to a particular consumption need.

Example. NMBS, the Belgian train provider, has a legal obligation to publish information about train stations. Different data consumers can profit from this Linked Data, which is already produced, to build intelligent applications adjusted to the already generated Linked Data. The Belgian Airlines, Belgium's national airline, wants to identify all airports that its airplanes fly to. This consumption need leads to generating Linked Data specifically for this purpose.

2.2 Direction
Linked Data generation might follow different directions [10], which are determined by the available data:

      Target-centric: The execution is focused on describing a set of views over the data source(s). The approach is the same as the Global-As-View (GAV) formalism for data integration [6, 11]. When mapping among different data models, it is possible to define one of the data models as a view onto the other data model [10]. The target might be (i) a certain graph pattern derived from existing Linked Data, whose schema is desired to be replicated; (ii) a given query (results-driven editing approach [8]); (iii) a given schema, i.e., a combination of ontologies and vocabularies (schema-driven editing approach [8]); or (iv) a set of mapping rules (model-driven editing approach [8]). For instance, a data owner has a data source, while other data is already described as Linked Data. The data owner then generates its own Linked Data, considering a certain target.
      Source-centric: The execution is focused on describing the entities of each data source, independently of other data sources (data-driven editing approach [8]). The approach is similar to the Local-as-View (LAV) formalism for data integration [6, 11], as a mapping occurs from the original data source(s) to the mediated schema (Linked Data or schema), making it easier to add and remove data sources. For instance, a data owner has two data sources. She defines rules to semantically annotate those data sources, without being concerned about similar or complementary data which is already available as Linked Data.

The direction is determined before the generation activity is triggered, and depends mainly on the available data and rules. Different execution algorithms may be designed that support either one of the directions, or both.

Example. Following our use case, the Belgian Airlines specifies a set of SPARQL queries which act as the target (target-centric). The rules are defined for each data source specifically, so the resulting Linked Data matches the SPARQL queries' graph patterns. NMBS specified a set of rules to generate its own Linked Data from its own available sources (source-centric).
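
Table 1 can also be read as a small lookup structure. The following Python sketch is only an illustration of that reading: the names `FactorProfile`, `TABLE_1`, and `factors_where` are hypothetical and do not come from any of the discussed tools, and the check marks are transcribed from Table 1 as far as its columns can be recovered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactorProfile:
    """One row of Table 1 (field names are illustrative assumptions)."""
    depends_on_data: bool
    depends_on_rules: bool
    before_execution: bool
    during_execution: bool


# Check marks transcribed from Table 1: (data, rules, before, during).
TABLE_1 = {
    "purpose":         FactorProfile(False, False, True, False),
    "direction":       FactorProfile(True, False, True, False),
    "materialization": FactorProfile(False, False, True, False),
    "location":        FactorProfile(True, False, False, True),
    "driving force":   FactorProfile(True, True, True, False),
    "trigger":         FactorProfile(False, False, True, False),
    "dynamicity":      FactorProfile(True, False, True, True),
    "diversity":       FactorProfile(True, False, True, True),
    "complexity":      FactorProfile(True, True, True, False),
}


def factors_where(**conditions: bool) -> list[str]:
    """Return the factors whose Table 1 row matches all given column values."""
    return [
        name
        for name, row in TABLE_1.items()
        if all(getattr(row, column) == value for column, value in conditions.items())
    ]


if __name__ == "__main__":
    # Factors that are independent of the data element.
    print(factors_where(depends_on_data=False))
    # Factors that still matter while the generation is running.
    print(factors_where(during_execution=True))
```

Such a structure makes it easy to ask, for example, which factors an algorithm designer can settle before any data is seen and which ones have to be handled while the generation is running.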

2.3 Materialization
In relational databases, views simplify a database's conceptual model with the definition of a virtual relation [1]. A materialized view is a database that contains results, while the process of setting up a materialized view is called materialization [1]. To achieve this, different materialization strategies exist [7]. In the same context, the Linked Data generation materialization differs in when the consumption occurs, i.e., dumping or on-the-fly [10], affecting the corresponding algorithms. In the former case, long-term consumption is expected, whereas in the latter, consumption is direct. Like the purpose, the materialization does not depend on the elements involved in the Linked Data generation, and it takes effect before the Linked Data is generated.

      Dumping: A data dump is generated into a volatile or persistent triplestore, aiming to provide a view of the data (similar to a materialized view in relational databases).
      On-the-fly: This occurs when the Linked Data generation takes place on-the-fly (as a non-materialized view).

Example. NMBS dumps the train station Linked Data in a triplestore, which is used for storing and retrieving Linked Data, whereas the Belgian Airlines generates the airports Linked Data on-the-fly when a query is executed, without storing it.

2.4 Location
The elements involved in Linked Data generation might reside on different sites. The fundamental difference lies in where the data and rules reside, and where the execution takes place. That is determined before the Linked Data generation is initiated and affects how the algorithms are designed. For instance, how the input data is retrieved or processed differs. We identify the following:

      In-situ: Linked Data generation is performed in-situ when it is addressed by the same site that holds both the tool and data. For instance, a data owner has the data and rules locally stored and in the same place as the tool that executes the rules to generate the Linked Data.
      Remote: Linked Data generation occurs remotely when the tool does not reside on the same site as the data and rules. For instance, the tool is a remote service, e.g., Software-as-a-Service (SaaS). Conversely, the tool may reside locally, while the data and rules do not.

Example. The train stations Linked Data is generated in-situ, as both the tool and data might be on the same site. In contrast, the data for airports might reside remotely from the site where the tool that generates the Linked Data is.

2.5 Driving force
The rules to generate Linked Data can be executed using alternative driving forces [4], namely rules and data, or any combination of the two (hybrid), and algorithms are affected depending on the element that drives the Linked Data generation. Which approach is followed depends on either the data or the rules.

      Mapping-driven: The processing is driven by rules which prompt the Linked Data generation, and adequate data is employed. For instance, a data consumer poses a query that is translated to rules, or directly provides rules, based on which Linked Data is generated.
      Data-driven: The execution is driven by data which prompts the Linked Data generation, and adequate rules are executed. Once this data reaches a Linked Data generation tool, a new execution is triggered to generate Linked Data according to rules associated with this data.

Example. Once an updated version of the train stations is available, the data might be sent to a Linked Data generation tool and prompt a new generation round (data-driven). The airports Linked Data generation is triggered by the rules which specify the corresponding data sources (mapping-driven).

2.6 Trigger
Linked Data generation can occur in real time or ad hoc (on demand) [9]. While the trigger, like the purpose and materialization, is independent of the elements involved in Linked Data generation, it affects the algorithm's design, e.g., real-time execution requires timely generation.

      Real-time: Real-time execution is related to the notions of event, i.e., “any occurrence that results in a change in the sequential flow of program execution” and response time, i.e., “the time between the presentation of a set of inputs and the appearance of all associated outputs”.
      On-demand: On-demand execution occurs if agents trigger the execution to generate Linked Data when desired.

Example. The train stations generation occurs in real time, as every time the data is updated, new Linked Data is generated. If the train stations Linked Data is not generated every time a new version is available, its generation occurs on-demand.

2.7 Dynamicity
A data source's dynamicity might differ, influencing how the Linked Data generation occurs. Thus, it affects how an algorithm is designed and a corresponding tool is implemented. For instance, the memory allocation is influenced. This factor depends on the data, but not on the rules, and affects the generation both before and while it is executed.

      Static data: A static data structure refers to a data collection that has a certain size.
      Dynamic data: A dynamic data structure refers to a data collection that has the flexibility to grow or shrink in size. For instance, it might not be possible to obtain all data, as the data can be infinite in size.

Example. The train stations original dataset is static: when the Linked Data generation is triggered, the original raw dataset's size is known. The airports dataset is dynamic: its returned size is not foreseen, as it depends on a query's answers.

2.8 Diversity
Linked Data generation may occur based on a single or multiple data sources. The different data sources might be homogeneous or heterogeneous with respect to their structure, e.g., tabular, hierarchical or attribute-value pairs, their format, e.g., CSV, XML, JSON, or their access interface, e.g., database connectivity, Web APIs or local
files [5]. The diversity factor influences the Linked Data generation both before, e.g., what is supported, and during the execution, e.g., how heterogeneous data sources are aligned.

      Homogeneity: Data with the same data structure and format.
      Heterogeneity: Data with different structures and formats.

2.9 Complexity
Data or rules complexity affects the algorithm's design.

      Data: The original dataset's size or, e.g., the depth of a hierarchically structured data source can influence how the Linked Data generation is accomplished. For instance, big datasets need to be treated differently than smaller ones, as parallelization or distribution might be preferred, which might be an overhead for smaller datasets.
      Rules: The rules complexity might be affected by, e.g., the desired transformations and (cross-source) joins.

All in all, the purpose, direction, materialization, driving force, and trigger affect the Linked Data generation before the execution occurs, whereas the location and complexity take effect during execution, while the dynamicity and diversity influence it both before and during execution. All these should be taken into consideration when designing the corresponding algorithms.

3 TOOLS
We outline the pioneering and broadly used open-source rule-based tools for Linked Data generation which support the W3C-recommended R2RML language [3] or its extension for heterogeneous data sources, RML [4]. We investigate which factors each tool covers and we discuss the results.

DB2triples. DB2Triples¹ is a tool for extracting data from relational databases, semantically annotating the data extracts according to R2RML rules and generating Linked Data. It implements the two W3C specifications for generating Linked Data from databases, i.e., R2RML [3] and Direct Mapping [2]. It is an open-source system released under the GNU Lesser General Public License, version 2.1².

DB2Triples is adequate for generating Linked Data for production, but not for consumption. It is a command-line tool that dumps the generated Linked Data to a file. It only considers local (in-situ), homogeneous and static databases. Its execution is prompted by the mapping and occurs on-demand, while it addresses neither data nor rules complexity.

Morph. Morph³ is a tool for Linked Data generation from data residing in relational databases. It supports (i) data upgrade, which generates Linked Data from a relational database, according to certain R2RML mapping rules; and (ii) query translation, which evaluates SPARQL queries over virtual Linked Data, by rewriting those queries into SQL. Morph employs a query translation algorithm from SPARQL to SQL with different optimizations during the query rewriting process, to generate more efficient SQL queries. It is an open-source system released under the Apache License, Version 2.0. Morph-streams⁴ is an extension of Morph for evaluating SPARQL-Stream queries over a range of data streams. It allows registering SPARQL-Stream continuous queries over an R2RML-wrapped data source, applying query rewriting, and receiving updated results as soon as the queries are evaluated.

Morph allows generating Linked Data for both production and consumption, by dumping the Linked Data when it is generated for production, and consuming it on-the-fly when it is generated for consumption. Similarly to DB2triples, Morph may only be used with in-situ (except for CSV files, which can be remote and accessed via HTTP) and homogeneous raw data, but it can support both dynamic and static data. Morph operates in a mapping-driven and on-demand manner. To a certain extent, Morph tries to address the query translation (SPARQL-to-SQL) complexity. Morph-streams generates Linked Data from both static and dynamic data, on-demand and in real time, from heterogeneous data sources which are, however, imported into a homogeneous database. Morph-streams tries to address complexity with respect to query rewriting.

Ontop. Ontop⁵ is another tool that allows querying relational databases as Virtual RDF Graphs using SPARQL, as Morph does too. It translates SPARQL queries into Datalog rules before transforming them into SQL queries (query translation). Ontop covers the same factors as Morph and tries to address complexity with respect to query translation.

R2RMLParser. The R2RMLParser⁶ is a tool that relies on R2RML mapping rules to generate Linked Data from relational databases. The R2RMLParser deals in principle with incremental Linked Data generation. In more detail, each time a Linked Data generation task is executed, not all of the input data should be used, but only the data that changed (so-called incremental transformation). The R2RMLParser is released under the Creative Commons Attribution-NonCommercial 4.0⁷ license.

The R2RMLParser can be characterized as mapping-driven and on-demand. The generation occurs for production reasons and the Linked Data is dumped when generated. Like the aforementioned tools which are focused on relational databases, the R2RMLParser focuses on local, homogeneous raw data. However, it seems to address, to a certain extent, the dynamicity (both static and dynamic) and complexity of the data with respect to time.

XSPARQL. XSPARQL⁸ performs dynamic query translation to generate Linked Data from different sources. XSPARQL primarily provides a query-driven approach that combines XQuery [17] and SPARQL [34, 47]. This way, it allows querying data in XML and RDF using the same framework, and supports both the generation of RDF from XML (lifting), and XML from RDF (lowering). XSPARQL was extended to also support Linked Data generation from databases combining SQL and SPARQL via R2RML rules, but Linked Data cannot be generated from both XML and databases.

XSPARQL does support heterogeneous data to a certain extent, but it is limited to data in XML format and relational databases.

1 DB2triples, https://github.com/antidot/db2triples
2 GNU LGPL, version 2.1, https://goo.gl/Wi7qbV
3 Morph, https://goo.gl/JtAyFL
4 Morph-streams, https://goo.gl/FYr9Lc
5 Ontop, https://github.com/ontop/ontop
6 R2RMLParser, http://github.com/nkons/r2rml-parser
7 Creative Commons Attribution-NonCommercial 4.0, http://creativecommons.org/licenses/by-nc/4.0/
8 XSPARQL, http://xsparql.deri.org/
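
The characterizations above can be turned into a simple selection aid for choosing a tool. The Python sketch below is a hypothetical illustration and not part of any of the tools: `TOOLS` and `candidates` are assumed names, and the profiles only restate what this section says about DB2Triples, Morph, and the R2RMLParser.

```python
# Each tool maps a factor to the alternatives that Section 3 attributes to it.
# The profiles are a hedged transcription of the prose, not an official feature list.
TOOLS: dict[str, dict[str, set[str]]] = {
    "DB2Triples": {
        "purpose": {"production"},
        "materialization": {"dumping"},
        "location": {"in-situ"},
        "driving force": {"mapping-driven"},
        "trigger": {"on-demand"},
        "dynamicity": {"static"},
        "diversity": {"homogeneous"},
    },
    "Morph": {
        "purpose": {"production", "consumption"},
        "materialization": {"dumping", "on-the-fly"},
        "location": {"in-situ"},  # remote CSV files over HTTP are the stated exception
        "driving force": {"mapping-driven"},
        "trigger": {"on-demand"},
        "dynamicity": {"static", "dynamic"},
        "diversity": {"homogeneous"},
    },
    "R2RMLParser": {
        "purpose": {"production"},
        "materialization": {"dumping"},
        "location": {"in-situ"},
        "driving force": {"mapping-driven"},
        "trigger": {"on-demand"},
        "dynamicity": {"static", "dynamic"},  # addressed "to a certain extent"
        "diversity": {"homogeneous"},
    },
}


def candidates(requirements: dict[str, str]) -> list[str]:
    """Return the tools whose profile covers every required factor alternative."""
    return [
        tool
        for tool, profile in TOOLS.items()
        if all(
            alternative in profile.get(factor, set())
            for factor, alternative in requirements.items()
        )
    ]


if __name__ == "__main__":
    print(candidates({"purpose": "consumption", "materialization": "on-the-fly"}))
    print(candidates({"purpose": "production", "dynamicity": "dynamic"}))
    print(candidates({"trigger": "real-time"}))
```

Among these three profiles, a real-time requirement yields no candidate; Morph-streams, which is not modelled in this sketch, is the tool described above as also covering real-time generation.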

Extending it to support other heterogeneous data requires new pipelines, as each format is
addressed separately and combining heterogeneous data is not feasible. Otherwise, XSPARQL is a
consumption-driven tool which generates Linked Data on-the-fly, relies on local data, and is
prompted on-demand by the rules.

    RMLMapper. The RMLMapper9 is an RML Engine, i.e., a rule-based Linked Data generator for
data sources accessed using different protocols and containing data in various structures,
formats, and serializations, e.g., CSV, XML and JSON. It is written in Java and can be used on
its own via a command-line interface, or its modules can be used separately in different
interfaces, e.g., as a library or remote service. It is released under the MIT license.
    In contrast to the tools mentioned above, the RMLMapper focuses on heterogeneous data, both
local and remote, to generate Linked Data. However, it still deals with static data and
optimizes neither data nor rules complexity. It follows an on-demand and mapping-driven
approach and dumps the data in a file or a triplestore.

   CARML. CARML10 is also an RML Engine. It is developed as a Java library that transforms
(semi-)structured sources to RDF based on rules declared in RML. More precisely, it supports
data in CSV, JSON and XML format. It is an open-source system released under MIT license11 that
takes a static input and streams it to generate the corresponding Linked Data.
   CARML follows the same principles as the RMLMapper. It also focuses on heterogeneous data,
follows the mapping-driven approach, and generates Linked Data on-demand. It functions with
static data that it streams to generate Linked Data. Nevertheless, as most of the other tools,
it does not optimize the data or rules complexity.

4    CONCLUSIONS

Overall, we observe patterns, i.e., correlations among different factors, or certain of their
alternatives, that repeat across different tools.
   Tools which are consumption-driven typically function both with static and dynamic data but
only with homogeneous data. Consumption-driven tools may be used for production purposes but
they are not optimized for that purpose and cannot handle heterogeneity.
   The dynamicity is only addressed by consumption-driven implementations in the form of
dynamic data that answer a certain query. Even though it is not obvious from the above, the
extent to which the consumption-driven tools address dynamicity does not align well with the
complexity, in particular of data, e.g., its size.
   Production-driven tools support heterogeneous data but typically do not support real-time
generation. Only the most recent, CARML, focuses on Linked Data generation from dynamic data.
However, even then, the dynamicity stems from otherwise static data.
   In general, there are no tools which support the data-driven approach, as there are no tools
which support real-time data.
   Moreover, none of the tools puts effort into optimizing the complexity of the data or rules,
nor do they optimize their generation algorithms. The consumption-driven tools, i.e., Morph and
Ontop, are the only tools which optimize the query translation when generating Linked Data,
while Morph-streams optimizes the query rewriting. Query rewriting and translation may be
considered as partially handling the rules and data complexity.
   Among the production-driven tools, none addresses complexity. The R2RMLparser is the only
one that aims to address data complexity to a certain extent, in its case with respect to time.
   Even though data complexity is studied in data mining, neither are results from these
studies applied to Linked Data generation algorithms, nor are such algorithms investigated in
this context.
   Overall, the lack of in-depth understanding of Linked Data generation complexity and the
many degrees of freedom in designing algorithms to generate Linked Data prevent human and
software agents from effortlessly generating and directly profiting from large amounts of
Linked Data for use with Semantic Web technologies.
   With this position paper, which is not meant to be complete with respect to the factors or
tools, we aim to show the diversity of factors that influence the Linked Data generation and
the limited spectrum that is covered by current tools. We intend to raise awareness that the
algorithms which drive the Linked Data generation should be studied more systematically, so
that human and software agents are able to effortlessly and efficiently generate Linked Data.
   In the future, we aim to study these factors and their alternatives more thoroughly. We hope
that the factors and their alternatives will be exploited, more diverse algorithms will be
designed and more efficient tools for Linked Data generation will be developed.

9 RMLMapper, http://github.com/RMLio/RML-Mapper
10 CARML, https://github.com/carml/carml
11 MIT license, https://opensource.org/licenses/MIT

REFERENCES
 [1] D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. Materialization strategies
     in a column-oriented DBMS. In Data Engineering, 2007. IEEE 23rd International
     Conference on, 2007.
 [2] M. Arenas, A. Bertails, E. Prud’hommeaux, and J. Sequeda. A Direct Mapping of
     Relational Data to RDF. W3C Recommendation, W3C, Sept. 2012.
 [3] S. Das, S. Sundara, and R. Cyganiak. R2RML: RDB to RDF Mapping Language.
     W3C Recommendation, W3C, Sept. 2012.
 [4] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de
     Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous
     Data. In Workshop on Linked Data on the Web, 2014.
 [5] A. Dimou, R. Verborgh, M. Vander Sande, E. Mannens, and R. Van de Walle.
     Machine-interpretable Dataset and Service Descriptions for Heterogeneous Data
     Access and Retrieval. In Proceedings of the 11th International Conference on
     Semantic Systems, 2015.
 [6] A. Doan, A. Halevy, and Z. Ives. Principles of Data Integration. 2012.
 [7] E. N. Hanson. A performance analysis of view materialization strategies. 1987.
 [8] P. Heyvaert, A. Dimou, R. Verborgh, E. Mannens, and R. Van de Walle. Towards
     Approaches for Generating RDF Mapping Definitions. In Proceedings of the 14th
     International Semantic Web Conference: Posters and Demos, volume 1486, 2015.
 [9] N. Konstantinou, D.-E. Spanos, D. Kouis, and N. Mitrou. An approach for the
     Incremental Export of Relational Databases into RDF Graphs. International
     Journal on Artificial Intelligence Tools, 24, 2015.
[10] A. Langegger and W. Wöß. XLWrap – querying and integrating arbitrary spread-
     sheets with SPARQL. In The Semantic Web - ISWC 2009: 8th International Semantic
     Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009, 2009.
[11] M. Lenzerini. Data Integration: A Theoretical Perspective. In Proceedings of the
     Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database
     Systems, pages 233–246, 2002.

APPENDIX

                   Table 2: Linked Data generation tools, mapping language and supported input formats


                                     carml    DB2triples   Morph     Ontop   R2RMLparser    RMLMapper     XSPARQL
                       language
                         R2RML         –          ✓          ✓         ✓          ✓              ✓            ✓
                           RML         ✓          –          –         –          –              ✓            –
                            input
               relational database     –          ✓          ✓         ✓          ✓              ✓            ✓
                              CSV      ✓          –          ✓         –          –              ✓            –
                             JSON      ✓          –          –         –          –              ✓            –
                              XML      ✓          –          –         –          –              ✓            ✓

                                          Table 3: Linked Data generation tools and factors


                          factor     carml    DB2triples   Morph    Ontop    R2RMLparser   RMLMapper     XSPARQL
                        purpose
                      production      ✓           ✓         ✓         ✓           ✓             ✓            –
                    consumption       –           –         ✓         ✓           –             –            ✓
                materialization
                       dumping        ✓           ✓         ✓         ✓           ✓             ✓             ✓
                      on-the-fly      –           –         ✓         ✓           –             –            n/a
                       location
                          in-situ     ✓           ✓         ✓         ✓           ✓             ✓             ✓
                         remote       ✓           –         –         –           –             ✓            n/a
                  driving force
                       mapping        ✓           ✓         ✓         ✓           ✓             ✓            ✓
                           data       –           –         –         –           –             –            –
                         trigger
                       real-time      –           –         –         ✓           –             –            –
                     on demand        ✓           ✓         ✓         ✓           ✓             ✓            ✓
                    dynamicity
                          static      ✓           ✓         ✓         ✓           ✓             ✓            ✓
                       dynamic        –           –         ✓         ✓           –             –            ✓
                       diversity
                    homogeneity       ✓           ✓         ✓         ✓           ✓             ✓
                   heterogeneity      ✓           –         –         –           –             ✓            ✓
                    complexity
                           data       –           –          –        –          (✓)             –            –
                          rules       –           –         (✓)       –           –              –            –
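
The patterns reported in the conclusions can be checked mechanically against Table 3. The following is a minimal sketch, assuming a hypothetical Python encoding of three rows of the table; the names `TOOLS`, `TABLE_3` and `tools_supporting` are illustrative only and are not part of any of the tools described.

```python
# Hypothetical encoding of three rows of Table 3 (illustrative names only,
# not an API of any of the surveyed tools).
TOOLS = ["carml", "DB2triples", "Morph", "Ontop",
         "R2RMLparser", "RMLMapper", "XSPARQL"]

# Each row follows the column order of TOOLS; True stands for a check mark
# in Table 3, False for a dash.
TABLE_3 = {
    ("driving force", "data"):
        [False, False, False, False, False, False, False],
    ("trigger", "real-time"):
        [False, False, False, True, False, False, False],
    ("diversity", "heterogeneity"):
        [True, False, False, False, False, True, True],
}


def tools_supporting(factor, alternative):
    """Return the tools that Table 3 marks as supporting the given alternative."""
    row = TABLE_3[(factor, alternative)]
    return [tool for tool, supported in zip(TOOLS, row) if supported]


if __name__ == "__main__":
    for factor, alternative in TABLE_3:
        supported = tools_supporting(factor, alternative)
        print(f"{factor} / {alternative}: {supported or 'none'}")
```

Under this encoding, an empty result for the data-driven alternative corresponds to the observation that no tool supports the data-driven approach; the remaining rows of Table 3 could be added in the same way.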