What Factors Influence the Design of a Linked Data Generation Algorithm?

Anastasia Dimou (anastasia.dimou@ugent.be), Pieter Heyvaert (pieter.heyvaert@ugent.be), Ben De Meester (ben.demeester@ugent.be), Ruben Verborgh (ruben.verborgh@ugent.be)
IDLab, Dep. of Electronics and Information Systems, imec – Ghent University

LDOW2018, April 2018, Lyon, France. © 2018 Copyright held by the owner/author(s).

ABSTRACT

Generating Linked Data remains a complicated and intensive engineering process. While different factors determine how a Linked Data generation algorithm is designed, potential alternatives for each factor are currently not considered when designing the tools' underlying algorithms. Certain design patterns are frequently applied across different tools, covering certain alternatives of a few of these factors, whereas other alternatives are never explored. Consequently, there are no adequate tools for Linked Data generation for certain occasions, or tools with inadequate and inefficient algorithms are chosen. In this position paper, we determine such factors, based on our experiences, and present a preliminary list. These factors could be considered when a Linked Data generation algorithm is designed or a tool is chosen. We investigated which factors are covered by widely known Linked Data generation tools and concluded that only certain design patterns are frequently encountered. By these means, we aim to point out that Linked Data generation goes above and beyond bare implementations, and that algorithms need to be thoroughly and systematically studied and exploited.

1 INTRODUCTION

Generating Linked Data remains a complicated and intensive engineering process, despite the significant number of existing tools. Most solutions primarily choose their own (often non-interoperable) approach. Format- and source-specific approaches were investigated as more generic alternatives [4]. In all cases, rules define how Linked Data should be generated. These rules often remain implicit and embedded in the implementation, e.g., the DBpedia Extraction Framework, but more generic solutions separate them and make them explicit and declarative, e.g., [R2]RML processors.

The rules cover a use case's context, whereas the execution algorithm covers its technical and functional requirements. On the one hand, each use case's context (how the Linked Data is modeled or which vocabularies are used to generate Linked Data) is described within the rules and does not influence the algorithm's design. For instance, the rules consider the adequate ontology terms to annotate the data. On the other hand, different factors related to the technical and functional requirements of each use case determine how an algorithm should be designed, as well as how third parties choose the most adequate tool. Potential alternatives for these factors affect how efficiently the rules are executed to generate Linked Data. For instance, "What is the purpose? Is the Linked Data consumed immediately or is it published for future use?" or "What triggers the generation? Is the Linked Data generated from a real-time data stream which needs to be immediately processed, or on demand?".

Certain design patterns are frequently applied across different tools, covering particular alternatives of these factors, whereas other alternatives or factors were never explored. For instance, if we lack tools whose algorithms support real-time data streams, Linked Data can be generated by storing the data in a database and using corresponding tools, e.g., Morph-streams. Thus, there are often no adequate tools for Linked Data generation for certain occasions, or tools with inadequate and inefficient algorithms are chosen as best alternatives.

However, those factors have not been thoroughly and systematically studied so far, nor have the algorithms' designs. Different solutions do not concretely describe the algorithms that drive their implementation, while optimizations are applied according to specific use cases, decreasing a tool's chances of being reused. The algorithms are not designed considering different factors, and the use case's technical and functional requirements are not matched to any factors.

In this work, we present a preliminary list of factors that could be considered. We do not aim for a complete list. This is a position paper whose goal is to provide insights and raise awareness, as Linked Data generation goes above and beyond bare implementations. We reviewed a few of the pioneering and broadly used tools that cover one or more of these factors and discuss the results of our observations. By these means, we aim to ensure that one can (i) choose or design the most adequate algorithm, and (ii) generate Linked Data without being restricted by tooling limitations.

The remainder of the paper is structured as follows: in Section 2, we discuss the preliminary list of factors that we identified; in Section 3, we investigate which factors each tool covers; and in Section 4, we outline our conclusions.

2 EXECUTION FACTORS

Different factors determine how an algorithm should be designed for generating Linked Data. A factor can be any fact or circumstance that contributes to the envisaged result, i.e., the Linked Data generation. Those factors are related to (and often dependent on) the elements involved in a Linked Data generation activity:

data: both raw data to generate the desired Linked Data from, as well as existing Linked Data.
rules: mapping rules that define how Linked Data are generated relying on available data.
tools: tools that apply mapping rules to data and generate Linked Data.

Multiple factors determine how Linked Data is generated, fulfilling different technical or functional requirements posed by different use cases. The different factors that we identified are outlined below and summarized in Table 1. For each factor, we outline the two furthest alternatives to shed light on the options, but intermediate or hybrid approaches may be adopted as well. The differences are determined depending on what the generation purpose is (Section 2.1), its direction (Section 2.2) and materialization (Section 2.3), where it occurs (location, Section 2.4), what drives (Section 2.5) and what triggers the execution (Section 2.6), how dynamic (Section 2.7) or diverse the data is (Section 2.8), and the data or rules complexity (Section 2.9).

Table 1: Factors affecting Linked Data generation, when they take effect, and which elements are associated with them.
Columns: factor; element (data, rules); generation's execution (before, during)
purpose          ✓
direction        ✓ ✓
materialization  ✓
location         ✓ ✓
driving force    ✓ ✓ ✓
trigger          ✓
dynamicity       ✓ ✓ ✓
diversity        ✓ ✓ ✓
complexity       ✓ ✓ ✓

Use case. Let us consider a use case that illustrates each factor with the help of an example. The use case is about an intelligent transportation search engine, which relies on Linked Data derived from heterogeneous data sources. The search engine obtains information about airports from an airline data source, about train stations from a train data source, and about the location of countries, cities, and addresses from a data source with spatial data.

2.1 Purpose

Different purposes can prompt the Linked Data generation. On a high level, we identify production and consumption. The purpose that drives the generation affects the design choices of the execution algorithm, but remains independent of the involved elements (data or rules). The fundamental difference lies in the extent of use cases that the Linked Data generation task aims to cover:

Production: Linked Data generation can be driven by a production need, i.e., a data owner generates Linked Data to annotate the data and make it publicly available. Production-driven generation remains independent of the data's potential consumption, which should then be adjusted to the Linked Data as it becomes available.

Consumption: Linked Data generation can occur due to certain consumption needs, namely a data consumer needs to process Linked Data which still has to be generated from raw data. Thus, the generated Linked Data is the response to a particular consumption need.

Example. NMBS, the Belgian train provider, has a legal obligation to publish information about train stations. Different data consumers can profit from this Linked Data, which is already produced, to build intelligent applications adjusted to the already generated Linked Data. The Belgian Airlines, Belgium's national airline, wants to identify all airports its airplanes fly to. This consumption need leads to generating Linked Data specifically for this purpose.

2.2 Direction

Linked Data generation might follow different directions [10], which are determined by the available data:

Target-centric: The execution is focused on describing a set of views over the data source(s). The approach is the same as the Global-As-View (GAV) formalism for data integration [6, 11]. When mapping among different data models, it is possible to define one of the data models as a view onto the other data model [10]. The target might be (i) a certain graph pattern derived from existing Linked Data, whose schema is desired to be replicated; (ii) a given query (results-driven editing approach [8]); (iii) a given schema (a combination of ontologies and vocabularies, i.e., the schema-driven editing approach [8]); or (iv) a set of mapping rules (model-driven editing approach [8]). For instance, a data owner has a data source, while other data is already described as Linked Data. The data owner then generates its own Linked Data, considering a certain target.

Source-centric: The execution is focused on describing the entities of each data source, independently of other data sources (data-driven editing approach [8]). The approach is similar to the Local-as-View (LAV) formalism for data integration [6, 11], as a mapping occurs from the original data source(s) to the mediated schema (Linked Data or schema), making it easier to add and remove data sources. For instance, a data owner has two data sources. She defines rules to semantically annotate those data sources, without being concerned about similar or complementary data which is already available as Linked Data.

The direction is determined before the generation activity is triggered, and depends mainly on the available data and rules. Different execution algorithms may be designed that support either one of the two directions, or both.

Example. Following our use case, the Belgian Airlines specifies a set of SPARQL queries which act as the target. The rules are defined for each data source specifically, so that the resulting Linked Data matches the SPARQL queries' graph patterns (target-centric). NMBS specified a set of rules to generate its own Linked Data from its own available sources (source-centric).
2.3 Materialization

In relational databases, views simplify a database's conceptual model with the definition of a virtual relation [1]. A materialized view is a database object that contains the results of a query, while the process of setting up a materialized view is called materialization [1]. To achieve this, different materialization strategies exist [7]. In the same context, the Linked Data generation materialization differs depending on when the consumption occurs, i.e., dumping or on-the-fly [10], affecting the corresponding algorithms. In the former case, long-term consumption is expected, whereas in the latter, direct consumption. The materialization, like the purpose of execution, does not depend on the elements involved in the Linked Data generation, and it takes effect before the Linked Data is generated.

Dumping: A data dump is generated into a volatile or persistent triplestore, aiming to provide a view of the data (similar to a materialized view in relational databases).

On-the-fly: The Linked Data generation takes place on-the-fly (as a non-materialized view).

Example. NMBS dumps the train station Linked Data in a triplestore, which is used for storing and retrieving Linked Data, whereas the Belgian Airlines generates the airports Linked Data on-the-fly when a query is executed, without storing it.

2.4 Location

The elements involved in Linked Data generation might reside on different sites. The fundamental difference lies in where the data and rules reside, and where the execution takes place. That is determined before the Linked Data generation is initiated and affects how the algorithms are designed. For instance, how the input data is retrieved or processed differs. We identify the following:

In-situ: Linked Data generation is performed in-situ when it is handled by the same site that holds both the tool and the data. For instance, a data owner has the data and rules stored locally, in the same place as the tool that executes the rules to generate the Linked Data.

Remote: Linked Data generation occurs remotely when the tool does not reside on the same site as the data and rules. For instance, the tool is a remote service, e.g., Software-as-a-Service (SaaS). Conversely, the tool may reside locally, while the data and rules do not.

Example. The train stations Linked Data is generated in-situ, as both the tool and data might be on the same site. In contrast, the data for airports might reside remotely from the site where the tool that generates the Linked Data is.

2.5 Driving force

The rules to generate Linked Data can be executed using alternative driving forces [4], namely rules and data, or any combination of the two (hybrid), and algorithms are affected depending on the element that drives the Linked Data generation. Which approach is followed depends either on the data or the rules.

Mapping-driven: The processing is driven by rules which prompt the Linked Data generation, and adequate data is employed. For instance, a data consumer poses a query that is translated to rules, or directly provides rules, based on which Linked Data is generated.

Data-driven: The execution is driven by data which prompts the Linked Data generation, and adequate rules are executed. Once this data reaches a Linked Data generation tool, a new execution is triggered to generate Linked Data according to the rules associated with this data.

Example. Once an updated version of the train stations is available, the data might be sent to a Linked Data generation tool and prompt a new generation round (data-driven). The airports Linked Data generation is triggered by the rules, which specify the corresponding data sources (mapping-driven).

2.6 Trigger

Linked Data generation can occur in real time or on demand (ad hoc) [9]. While the trigger is independent of the elements involved in Linked Data generation, as is the case for the purpose and materialization, it affects the algorithm's design, e.g., real-time execution requires timely generation.

Real-time: Real-time execution is related to the notions of event, i.e., "any occurrence that results in a change in the sequential flow of program execution", and response time, i.e., "the time between the presentation of a set of inputs and the appearance of all associated outputs".

On-demand: On-demand execution occurs if agents trigger the execution to generate Linked Data when desired.

Example. The train stations generation occurs in real time if, every time the data is updated, new Linked Data is generated. If the train stations Linked Data is not generated every time a new version is available, its generation occurs on demand.

2.7 Dynamicity

A data source's dynamicity might differ, influencing how the Linked Data generation occurs. Thus, it affects how an algorithm is designed and a corresponding tool is implemented. For instance, the memory allocation is influenced. This factor depends on the data, but not on the rules, and affects the generation both before and while it is executed.

Static data: A static data structure refers to a data collection that has a certain size.

Dynamic data: A dynamic data structure refers to a data collection that has the flexibility to grow or shrink in size. For instance, it might not be possible to obtain all data, as the data can be infinite in size.

Example. The train stations original dataset is static: when the Linked Data generation is triggered, the original raw dataset's size is known. The airports dataset is dynamic: its returned size cannot be foreseen, as it depends on a query's answers.

2.8 Diversity

Linked Data generation may occur based on a single or multiple data sources. The different data sources might be homogeneous or heterogeneous with respect to their structure, e.g., tabular, hierarchical or attribute-value pairs, their format, e.g., CSV, XML, JSON, or their access interface, e.g., database connectivity, Web APIs or local files [5]. The diversity factor influences the Linked Data generation both before the execution, e.g., what is supported, and during the execution, e.g., how heterogeneous data sources are aligned.

Homogeneity: Data with the same data structure and format.

Heterogeneity: Data with different structures and formats.

2.9 Complexity

Data or rules complexity affects the algorithm's design.

Data: The original dataset's size or, e.g., the depth of a hierarchically structured data source can influence how the Linked Data generation is accomplished. For instance, big datasets need to be treated differently than smaller ones, as parallelization or distribution might be preferred, which might be an overhead for smaller datasets.

Rules: The rules complexity might be affected by, e.g., the desired transformations and (cross-source) joins.

All in all, the purpose, direction, materialization, driving force, and trigger affect the Linked Data generation before the execution occurs, whereas the location and complexity affect it during execution, while the dynamicity and diversity influence it both before and during. All these should be taken into consideration when designing the corresponding algorithms.

3 TOOLS

We outline the pioneering and broadly used open source rule-based tools for Linked Data generation which support the W3C recommended R2RML language [3] or its extension for heterogeneous data sources, RML [4]. We investigate which factors each tool covers and we discuss the results.

DB2Triples. DB2Triples¹ is a tool for extracting data from relational databases, semantically annotating the data extracts according to R2RML rules, and generating Linked Data. It implements the two W3C specifications for generating Linked Data from databases, i.e., R2RML [3] and Direct Mapping [2]. It is an open-source system released under the GNU Lesser General Public License (version 2.1)².

DB2Triples is adequate for generating Linked Data for production, but not for consumption. It is a command-line tool that dumps the generated Linked Data to a file. It only considers local (in-situ), homogeneous and static databases. Its execution is prompted by the mapping and occurs on demand, while it addresses neither data nor rules complexity.

Morph. Morph³ is a tool for Linked Data generation from data residing in relational databases. It supports (i) data upgrade, which generates Linked Data from a relational database, according to certain R2RML mapping rules; and (ii) query translation, which evaluates SPARQL queries over virtual Linked Data, by rewriting those queries into SQL. Morph employs a query translation algorithm from SPARQL to SQL with different optimizations during the query rewriting process, to generate more efficient SQL queries. It is an open-source system released under the Apache License, Version 2.0. Morph-streams⁴ is an extension of Morph for evaluating SPARQL-Stream queries over a range of data streams. It allows one to register SPARQL-Stream continuous queries over an R2RML-wrapped data source, apply query rewriting, and receive updated results as soon as the queries are evaluated.

Morph can generate Linked Data for both production and consumption, by dumping the Linked Data when it is generated for production, and by consuming it on-the-fly when it is generated for consumption. Similarly to DB2Triples, Morph may only be used with in-situ (except for CSV files, which can be remote and accessed via HTTP) and homogeneous raw data, but it can support both dynamic and static data. Morph operates in a mapping-driven and on-demand fashion. To a certain extent, Morph tries to address the query translation (SPARQL-to-SQL) complexity. Morph-streams generates Linked Data from both static and dynamic data, on demand and in real time, from heterogeneous data sources, which are, however, imported into a homogeneous database. Morph-streams tries to address complexity with respect to query rewriting.

Ontop. Ontop⁵ is another tool that allows one to query relational databases as Virtual RDF Graphs using SPARQL, as Morph does too. It translates SPARQL queries into Datalog rules before transforming them into SQL queries (query translation). Ontop covers the same factors as Morph and tries to address complexity with respect to query translation.

R2RMLParser. The R2RMLParser⁶ is a tool that relies on R2RML mapping rules to generate Linked Data from relational databases. The R2RMLParser deals in principle with incremental Linked Data generation. In more detail, each time a Linked Data generation task is executed, not all of the input data should be used, but only the data that changed (so-called incremental transformation). The R2RMLParser is released under the Creative Commons Attribution-NonCommercial 4.0 license⁷.

The R2RMLParser can be characterized as mapping-driven and on-demand. The generation occurs for production reasons and the Linked Data is dumped when generated. Like the aforementioned tools which are focused on relational databases, the R2RMLParser focuses on local, homogeneous raw data. However, it seems to address, to a certain extent, the dynamicity (both static and dynamic) and complexity of the data with respect to time.

XSPARQL. XSPARQL⁸ performs dynamic query translation to generate Linked Data from different sources. XSPARQL primarily provides a query-driven approach that combines XQuery [17] and SPARQL [34, 47]. This way, it allows one to query data in XML and RDF using the same framework, and supports both the generation of RDF from XML (lifting), and XML from RDF (lowering). XSPARQL was extended to also support Linked Data generation from databases, combining SQL and SPARQL via R2RML rules, but Linked Data cannot be generated from both XML and databases.

XSPARQL does support heterogeneous data to a certain extent, but it is limited to data in XML format and relational databases. Extending it to support other heterogeneous data requires new pipelines, as each format is addressed separately and combining heterogeneous data is not feasible. Otherwise, XSPARQL is a consumption-driven tool which generates Linked Data on-the-fly, relies on local data, and is prompted on demand by the rules.

RMLMapper. The RMLMapper⁹ is an RML Engine, i.e., a rule-based Linked Data generator for data sources accessed using different protocols and containing data in various structures, formats, and serializations, e.g., CSV, XML and JSON. It is written in Java and can be used on its own via a command-line interface, or its modules can be used separately in different interfaces, e.g., as a library or remote service. It is released under the MIT license.

In contrast to the tools mentioned above, the RMLMapper focuses on heterogeneous and both local and remote data to generate Linked Data. However, it still deals with static data and it optimizes neither data nor rules complexity. It follows an on-demand and mapping-driven approach and dumps the data in a file or a triplestore.

CARML. CARML¹⁰ is also an RML Engine. It is developed as a Java library that transforms (semi-)structured sources to RDF based on rules declared in RML. More precisely, it supports data in CSV, JSON and XML format. It is an open-source system released under the MIT license¹¹ that takes a static input and streams it to generate the corresponding Linked Data.

CARML follows the same principles as the RMLMapper. It also focuses on heterogeneous data, follows the mapping-driven approach, and generates Linked Data on demand. It works with static data, which it streams to generate Linked Data. Nevertheless, like most of the other tools, it does not optimize the data or rules complexity.

¹ DB2Triples, https://github.com/antidot/db2triples
² GNU LGPL, version 2.1, https://goo.gl/Wi7qbV
³ Morph, https://goo.gl/JtAyFL
⁴ Morph-streams, https://goo.gl/FYr9Lc
⁵ Ontop, https://github.com/ontop/ontop
⁶ R2RMLParser, http://github.com/nkons/r2rml-parser
⁷ Creative Commons Attribution-NonCommercial 4.0, http://creativecommons.org/licenses/by-nc/4.0/
⁸ XSPARQL, http://xsparql.deri.org/
⁹ RMLMapper, http://github.com/RMLio/RML-Mapper
¹⁰ CARML, https://github.com/carml/carml
¹¹ MIT license, https://opensource.org/licenses/MIT

4 CONCLUSIONS

Overall, we observe patterns, i.e., correlations among different factors, or certain of their alternatives, repeat over different tools.

Tools which are consumption-driven typically function with both static and dynamic data, but only with homogeneous data. Consumption-driven tools may be used for production purposes, but they are not optimized for that purpose and cannot handle heterogeneity.

The dynamicity is only addressed by consumption-driven implementations, in the form of dynamic data that answers a certain query. Even though it is not obvious from the above, the extent to which the consumption-driven tools address dynamicity does not align well with the complexity, in particular of the data, e.g., its size.

Production-driven tools support heterogeneous data but typically do not support real-time generation. Only the most recent, CARML, focuses on Linked Data generation from dynamic data. However, even then, the dynamicity is derived from otherwise static data.

In general, there are no tools which support the data-driven approach, as there are no tools which support real-time data.

Moreover, none of the tools put effort into optimizing the complexity of the data or rules, nor do they optimize their generation algorithms. The consumption-driven tools, i.e., Morph and Ontop, are the only tools which optimize the query translation when generating Linked Data, while Morph-streams optimizes the query rewriting. Query rewriting and translation may be considered as partially handling the rules and data complexity.

Among the production-driven tools, almost none addresses complexity: the R2RMLParser is the only one that aims to address, to a certain extent, the data complexity, in its case with respect to time.

Even though data complexity is studied in data mining, neither are results from these studies applied to Linked Data generation algorithms, nor are such algorithms investigated in this context.

Overall, the lack of in-depth understanding of Linked Data generation complexity and the many degrees of freedom in designing algorithms to generate Linked Data prevent human and software agents from effortlessly generating and directly profiting from large amounts of Linked Data for use with Semantic Web technologies.

With this position paper, which is not meant to be complete with respect to the factors or tools, we aim to show the diversity of factors that influence the Linked Data generation and the limited spectrum that is covered by current tools. We intend to raise awareness that the algorithms which drive the Linked Data generation should be studied more systematically, so that human and software agents are able to generate Linked Data effortlessly and efficiently.

In the future, we aim to study these factors and their alternatives more thoroughly. We hope that the factors and their alternatives will be exploited, more diverse algorithms will be designed, and more efficient tools for Linked Data generation will be developed.

REFERENCES

[1] D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. R. Madden. Materialization strategies in a column-oriented DBMS. In Data Engineering, 2007. IEEE 23rd International Conference on, 2007.
[2] M. Arenas, A. Bertails, E. Prud'hommeaux, and J. Sequeda. A Direct Mapping of Relational Data to RDF. W3C Recommendation, W3C, Sept. 2012.
[3] S. Das, S. Sundara, and R. Cyganiak. R2RML: RDB to RDF Mapping Language. W3C Recommendation, W3C, Sept. 2012.
[4] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In Workshop on Linked Data on the Web, 2014.
[5] A. Dimou, R. Verborgh, M. Vander Sande, E. Mannens, and R. Van de Walle. Machine-interpretable Dataset and Service Descriptions for Heterogeneous Data Access and Retrieval. In Proceedings of the 11th International Conference on Semantic Systems, 2015.
[6] A. Doan, A. Halevy, and Z. Ives. Principles of Data Integration. 2012.
[7] E. N. Hanson. A performance analysis of view materialization strategies. 1987.
[8] P. Heyvaert, A. Dimou, R. Verborgh, E. Mannens, and R. Van de Walle. Towards Approaches for Generating RDF Mapping Definitions. In Proceedings of the 14th International Semantic Web Conference: Posters and Demos, volume 1486, 2015.
[9] N. Konstantinou, D.-E. Spanos, D. Kouis, and N. Mitrou. An approach for the Incremental Export of Relational Databases into RDF Graphs. International Journal on Artificial Intelligence Tools, 24, 2015.
[10] A. Langegger and W. Wöß. XLWrap – querying and integrating arbitrary spreadsheets with SPARQL. In The Semantic Web - ISWC 2009: 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009, 2009.
[11] M. Lenzerini. Data Integration: A Theoretical Perspective. In Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 233–246, 2002.

APPENDIX

Table 2: Linked Data generation tools, mapping language and supported input formats

group    | item                | CARML | DB2Triples | Morph | Ontop | R2RMLParser | RMLMapper | XSPARQL
language | R2RML               | –     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
language | RML                 | ✓     | –          | –     | –     | –           | ✓         | –
input    | relational database | –     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
input    | CSV                 | ✓     | –          | ✓     | –     | –           | ✓         | –
input    | JSON                | ✓     | –          | –     | –     | –           | ✓         | –
input    | XML                 | ✓     | –          | –     | –     | –           | ✓         | ✓

Table 3: Linked Data generation tools and factors

factor          | alternative   | CARML | DB2Triples | Morph | Ontop | R2RMLParser | RMLMapper | XSPARQL
purpose         | production    | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         | –
purpose         | consumption   | –     | –          | ✓     | ✓     | –           | –         | ✓
materialization | dumping       | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
materialization | on-the-fly    | –     | –          | ✓     | ✓     | –           | –         | n/a
location        | in-situ       | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
location        | remote        | ✓     | –          | –     | –     | –           | ✓         | n/a
driving force   | mapping       | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
driving force   | data          | –     | –          | –     | –     | –           | –         | –
trigger         | real-time     | –     | –          | –     | ✓     | –           | –         | –
trigger         | on demand     | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
dynamicity      | static        | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         | ✓
dynamicity      | dynamic       | –     | –          | ✓     | ✓     | –           | –         | ✓
diversity       | homogeneity   | ✓     | ✓          | ✓     | ✓     | ✓           | ✓         |
diversity       | heterogeneity | ✓     | –          | –     | –     | –           | ✓         | ✓
complexity      | data          | –     | –          | –     | –     | (✓)         | –         | –
complexity      | rules         | –     | –          | (✓)   | –     | –           | –         | –
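The factor coverage of Table 3 lends itself to a simple machine-readable encoding, which could help when matching a use case's requirements against existing tools. The sketch below transcribes only three factors of Table 3 (purpose, driving force, and dynamicity); the names `COVERAGE`, `uncovered`, and `candidates` are hypothetical and introduced for illustration, not part of any of the reviewed tools.

```python
TOOLS = ["CARML", "DB2Triples", "Morph", "Ontop",
         "R2RMLParser", "RMLMapper", "XSPARQL"]

# (factor, alternative) -> tools marked with a check in Table 3.
# Only three factors are transcribed here; the rest are omitted for brevity.
COVERAGE = {
    ("purpose", "production"): {"CARML", "DB2Triples", "Morph", "Ontop",
                                "R2RMLParser", "RMLMapper"},
    ("purpose", "consumption"): {"Morph", "Ontop", "XSPARQL"},
    ("driving force", "mapping"): set(TOOLS),
    ("driving force", "data"): set(),
    ("dynamicity", "static"): set(TOOLS),
    ("dynamicity", "dynamic"): {"Morph", "Ontop", "XSPARQL"},
}


def uncovered(coverage):
    """Return the (factor, alternative) pairs that no tool covers."""
    return sorted(key for key, tools in coverage.items() if not tools)


def candidates(coverage, requirements):
    """Return the tools that cover every required (factor, alternative) pair."""
    result = set(TOOLS)
    for requirement in requirements:
        result &= coverage[requirement]
    return sorted(result)


gaps = uncovered(COVERAGE)
consumers_of_dynamic_data = candidates(
    COVERAGE, [("purpose", "consumption"), ("dynamicity", "dynamic")])
```

Under this encoding, `gaps` contains the data-driven alternative of the driving force factor, which is the gap noted in the conclusions, and `candidates` narrows the tool choice once a use case's requirements are expressed as factor alternatives.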