Towards Approaches for Generating RDF Mapping Definitions Pieter Heyvaert, Anastasia Dimou, Ruben Verborgh, Erik Mannens, and Rik Van de Walle Ghent University - iMinds - Multimedia Lab, pheyvaer.heyvaert@ugent.be Abstract. Obtaining Linked Data by modeling domain-level knowledge derived from input data is not straightforward for data publishers, espe- cially if they are not Semantic Web experts. Developing user interfaces that support domain experts to semantically annotate their data became feasible, as the mapping rules were abstracted from their execution. How- ever, most existing approaches reflect how mappings are typically exe- cuted: they offer a single linear workflow, triggered by a particular data source. Alternative approaches were neither thoroughly investigated yet, nor incorporated in most existing user interfaces for mappings. In this paper, we generalize the two prevalent approaches for generating map- pings of data in databases: database-driven and ontology-driven, to be applicable for any other data structure; and introduce two approaches: model-driven and result-driven. 1 Introduction A substantial amount of Linked Data is generated from data that exists in het- erogeneous formats and comes from different sources. This generation process is facilitated by mapping languages, such as the wc recommended rrml [1] or its extended version rml [2], which separate the definition of mappings from their execution. While data publishers are domain experts—the intended cre- ators of mappings—manually creating and editing mapping definitions requires knowledge of the mapping language’s syntax, which is unpractical for most pub- lishers [3]. Therefore, user interfaces can facilitate domain experts to specify mappings much more conveniently. Pinkel et al. [3] introduced two types of approaches for editing mappings for data in relational databases to the Resource Description Framework (rdf), namely (i) the database-driven and (ii) the ontology-driven, both of which are implemented at fluidOps1 . The suitability of each approach depends on differ- ent factors. However, most existing mapping interfaces, which mainly refer to data in databases, support only one of the two approaches. By doing so, data publishers’ editing options are restricted. Alternative approaches beside the two 1 http://www.fluidops.com/ 2 Pieter Heyvaert et al. id name age a foaf:Person id name age a foaf:Person foaf:name “John” foaf:name “John” 1 John 25 a foaf:Person 1 John 25 a foaf:Person foaf:name “Jane” foaf:name “Jane” 2 Jane 24 2 Jane 24 foaf:Person foaf:name foaf:Person foaf:name foaf:Document foaf:gender foaf:Document foaf:gender classes properties properties foaf:Agent foaf:Age classes foaf:Agent foaf:Age dcterms:Agent dcterms:title dcterms:Agent dcterms:title (a) data-driven (b) schema-driven id name age a foaf:Person id name age a foaf:Person foaf:name “John” foaf:name “John” 1 John 25 a foaf:Person 1 John 25 a foaf:Person foaf:name “Jane” foaf:name “Jane” 2 Jane 24 2 Jane 24 name person foaf:Person foaf:name foaf:Person foaf:name foaf:Document foaf:gender foaf:Document foaf:gender properties classes properties classes foaf:Agent foaf:Age foaf:Agent foaf:Age dcterms:Agent dcterms:title dcterms:Agent dcterms:title (c) model-driven (d) result-driven Fig. 1: Conceptual comparison of approaches to generate mappings. Input data, vocabularies, mappings and rdf triples are involved in different combinations. aforementioned ones were not thoroughly investigated so far, even though they might be more adequate under different circumstances. Moreover, the identified approaches are limited to modeling data in databases. Thus, the implementations completely disregard heterogeneous formats. Addi- tionally, these implementations fail to take into account and combine multi- ple data sources [4]. In this paper, we therefore generalize the approaches to data-driven and schema-driven and introduce two alternatives based on obser- vation: the model-driven and the result-driven approaches. 2 Approaches Our goal is to introduce mapping generation approaches that cover more thor- oughly the different needs and alternative usage scenarios: ranging from seman- tically annotating a particular data source to modeling domain-level knowledge. Each of the approaches is described in detail below and visualized in Figure 1. Data-Driven Mapping Definitions Generation In the database-driven approach [3], data publishers have the data from the database available. The generation of the mappings is based on that data, namely data fractions are iteratively associated to a corresponding mapping rule. For Towards Approaches for Generating RDF Mapping Definitions 3 instance, the fluidOps and Karma2 interfaces follow this approach. However, from a more general point of view, this approach could be applicable to any type of input data, and any combination of them, besides relational databases. Thus, we introduce a more generic approach, so-called data-driven (Figure 1a). Instead of only considering data from a database, any number of input data sources in any format, such as csv, xml, json, is equally considered. Schema-Driven Mapping Definitions Generation An existing ontology can be used as the basis for generating the mappings. This is the ontology-driven approach [3], which is supported by the fluidOps editor. Hence, generating the mappings is driven in the first place by the schema, as it accrues from the ontology. Afterwards, data publishers edit the mappings by associating them to the applicable data from the input source(s). In contrast to the data-driven approach where the correct schema(s) is associated to the data, here the appropriate data is associated to the schema(s). While Pinkel et al. [3] consider a single ontology, the approach can be more generic and applied to any schema, namely any combination of ontologies and/or vocabularies. Thus, we consider the more generic notion of schema-driven approach (Figure 1b). Model-Driven Mapping Definitions Generation Alternatively, data publishers can firstly model the domain, by generating ab- stract mappings. More precisely, data publishers define the entities, their at- tributes and their relationships to other entities, without explicitly indicating neither the schema (ontologies and vocabularies) nor the input data fractions to be used. The model is subsequently instantiated by applying adequate schema(s) and it is associated with input data, by specifying which fractions of the input data sources are associated with which parts of the model. To this end, we introduce the model-driven approach (Figure 1c). While this approach is prac- tical and useful, for instance applied by De Vocht et al. [5], to the best of our knowledge, no user interface supports it. However, it enables data publishers to formally generate abstract definitions and instantiate them afterwards with the appropriate data and schema(s). Result-Driven Mapping Definitions Generation Last, we introduce the result-driven approach (Figure 1d), where mappings can be generated based on the desired results. To be more precise, from a desired rdf output, the mappings are generated based on the desired output’s model and schema(s). Afterwards, they are associated with data fractions from the input sources. In contrast to the data-driven approach, which is based on the input data and where the appropriate schema is subsequently chosen, this approach is based on the desired result and the model, together with the proper schema(s), is derived from it. A real-world example of this approach is transformy.io3 . 2 http://usc-isi-i2.github.io/karma/ 3 https://www.transformy.io 4 Pieter Heyvaert et al. 3 Discussion This paper lists approaches to generate mappings. Besides the ones mentioned, hybrid approaches might emerge when implementing them or as data publishers specify their mappings. Identifying the different approaches, together with their advantages, allows publishers to select the approach best suited for the task at hand. Starting with a particular approach does not necessarily mean that data publishers can/should not switch between approaches over the course of a map- pings’ editing time. Thus, a user interface should allow and support switching between multiple approaches as suggested by Pinkel et al. [3]. The user interface of our prototype mapping editor, the rmleditor4 , aims to validate the aforemen- tioned approaches by creating the conditions for data publishers to follow any of them. This is facilitated by simultaneously offering three different panels to data publishers: (i) Input Panel (i.e., the input data), (ii) Modeling Panel (i.e., the mappings); and (iii) Results Panel (i.e., the output rdf dataset). Acknowledgements. The described research activities were funded by Ghent University, iMinds, the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT), the Fund for Scientific Research Flanders (FWO Flanders), and the European Union. References [1] Souripriya Das, Seema Sundara, and Richard Cyganiak. R2RML: RDB to RDF Mapping Language. Working group recommendation, W3C, September 2012. URL http://www.w3.org/TR/r2rml/. [2] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In Workshop on Linked Data on the Web, 2014. [3] Christoph Pinkel, Carsten Binnig, Peter Haase, Clemens Martin, Kunal Sen- gupta, and Johannes Trame. How to best find a partner? An evaluation of editing approaches to construct R2RML mappings. In The Semantic Web: Trends and Challenges, pages 675–690. Springer, 2014. [4] Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Wolfgang May, Do- minique Ritze, Martin G Skjæveland, Alessandro Solimando, and Evgeny Kharlamov. RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration. In The Semantic Web. Latest Ad- vances and New Domains, pages 21–37. Springer, 2015. [5] Laurens De Vocht, Mathias Van Compernolle, Anastasia Dimou, Pieter Col- paert, Ruben Verborgh, Erik Mannens, Peter Mechant, and Rik Van de Walle. Converging on semantics to ensure local government data reuse. Pro- ceedings of the 5th workshop on Semantics for Smarter Cities (SSC14), 13th International Semantic Web Conference (ISWC), 2014. 4 http://rml.io/RML_editor.html