Semi-Automatic Example-Driven Linked Data Mapping Creation*

Pieter Heyvaert, Anastasia Dimou, Ruben Verborgh, and Erik Mannens

IDLab, Department of Electronics and Information Systems, Ghent University – imec
pheyvaer.heyvaert@ugent.be

Abstract. Linked Data can be generated by applying mapping rules to existing (semi-)structured data. The manual creation of these rules is a costly process for users. Therefore, (semi-)automatic approaches have been developed to assist them. Although these approaches provide promising results, in use cases where examples of the desired Linked Data are available they do not use the knowledge provided by these examples, resulting in Linked Data that might not be as desired. This in turn requires manual updates of the rules. Such examples can in certain cases be easy to create and offer valuable knowledge relevant for the mapping process, such as which data corresponds to entities and attributes, how this data is annotated and modeled, and how different entities are linked to each other. In this paper, we introduce a semi-automatic approach to create rules based on examples of both the existing data and the corresponding Linked Data. Furthermore, we made the approach available via the RMLEditor, making it readily accessible to users through a graphical user interface. The proposed approach provides a first attempt to generate a complete Linked Dataset based on user-provided examples, by creating an initial set of rules for the users.

1 Introduction

In most cases, Linked Data is generated by applying mapping rules to existing (semi-)structured data. The mapping rules state how Linked Data is generated from (raw) data, for every graph pattern that is part of the resulting Linked Data. However, before these rules can be applied, they need to be created.
This includes specifying (i) which data corresponds to entities and attributes, i.e., the subjects and objects of an RDF triple; (ii) how the data is annotated and modeled, i.e., which classes, properties, datatypes, and languages are used and how they are related to each other; and (iii) how different entities are linked to each other. For example, consider the raw data in Listings 1.1 and 1.2 about books and authors, and a Linked Data example in Listing 1.3. The latter includes entities and attributes which are constructed relying on data values that also appear in the raw data, and uses certain classes, properties, and datatypes. For every graph pattern, at least three rules need to be defined to generate this Linked Data, namely for the subject, predicate, and object. For example, to generate a triple with the subject http://www.example.com/book/0, a rule has to state that the string http://www.example.com/book/ has to be combined with the id of the book, which is in this case "0". This results in the creation of at least 21 mapping rules, because there are 7 triples, and each one requires at least 3 rules.

* The described research activities were funded by Ghent University, imec, Flanders Innovation & Entrepreneurship (AIO), the Research Foundation – Flanders (FWO), and the European Union.

id,title,author
0,Harry Potter and The Sorcerer's Stone,J.K. Rowling
1,Homo Deus,Yuval Noah Harari

Listing 1.1: CSV data about books

{
  "authors": [{
    "id": "jkr",
    "name": "J.K. Rowling",
    "country": "UK",
    "birthdate": "1965-07-31"
  },{
    "id": "ynh",
    "name": "Yuval Noah Harari",
    "country": "Israel",
    "birthdate": "1976-04-24"
  }]
}

Listing 1.2: JSON data about authors

1  @prefix book:
2    <http://www.example.com/book/>.
3  @prefix author:
4    <http://www.example.com/author/>.
5
6  book:0 a schema:Book;
7    schema:title
8      "Harry Potter and The Sorcerer's Stone"@en;
9    schema:author author:jkr.
10
11 author:jkr a foaf:Person;
12   foaf:name "J.K. Rowling";
13   foaf:country "UK";
14   schema:birthdate "1965-07-21"^^xsd:date.

Listing 1.3: Linked Data example

When the rules are created manually, they are prone to errors, especially when dealing with large and complex data sources [1, 2], and/or multiple data sources at the same time [3]. To ease this process, both semi-automatic and automatic approaches have been the topic of research [4]. The former require user interaction during the generation of the rules, while the latter do not. Although these approaches provide promising results, in use cases where examples of the desired Linked Data are available the generated Linked Data might not be as desired. This in turn requires users to manually update the rules. This is because these approaches do not consider the knowledge embedded in the examples when initially creating the rules.

Nevertheless, these examples offer knowledge relevant for the mapping process. A Linked Data example can be used as a reference point for the creation of rules for generating Linked Data from some other raw data. This is done by (i) generating entities and attributes in the same way, (ii) applying the same model and semantic annotations, and (iii) providing the same relationships among entities. For example, users may consider existing Linked Data, the complete set or just a sample of it, and want their data to be modeled and annotated the same way. This is reflected in the rules that are created, as mentioned before, based on this existing Linked Data.

To support the aforementioned use cases, we propose a semi-automatic approach for the example-driven creation of Linked Data mapping rules. The example Linked Data is used to extract the necessary information to create the corresponding mapping rules. Users provide two elements: a set of data sources and a set of RDF triples.
These two elements are used to create the rules through the following steps: (i) the original data sources are aligned with the Linked Data example, and (ii) mapping rules are created based on this alignment and the knowledge about the model and semantic annotations extracted from the Linked Data example. In certain cases, manual additions might still be required, such as data transformations on the original data [5].

The remainder of the paper is structured as follows. In Section 2, we discuss the related work. In Section 3, we introduce a running example. In Section 4, we discuss our proposed approach. In Section 5, we elaborate on the implementation of this approach. In Section 6, we discuss both the existing approaches and our approach, and conclude the paper.

2 Related Work

In this section, we discuss the related work regarding mapping rules, and mapping rule creation approaches and solutions.

2.1 Background

A mapping consists of one or more mapping rules that state how RDF terms and triples should be generated. A mapping rule denotes how data from an original data source is used in the RDF terms, how these terms are associated with each other, and how these terms form RDF triples. Mapping languages, such as R2RML [6] and RML [7], provide a specification of the syntax to declaratively construct mappings. This is preferred over the use of custom software and scripts, because mapping languages provide a reusable solution, while custom software and scripts are tied to a specific use case and/or implementation [7]. RML is an extension of R2RML, the W3C-recommended language for mapping relational databases to RDF. While R2RML is limited to relational databases, RML applies a data-independent approach, also allowing the mapping of, e.g., JSON and CSV to RDF. For a detailed explanation of both mapping languages, we refer to their corresponding specifications [6, 7].
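To make concrete what such a declarative mapping rule expresses, the following sketch applies a rule — a subject template, a class, and predicate/reference pairs — to one record of the running example. The dict layout and the apply_rule helper are our own illustration, not part of the R2RML or RML specifications.

```python
def apply_rule(rule, record):
    """Generate (subject, predicate, object) triples for one record,
    following what a rule declares: a subject template, a class, and
    which data fraction feeds each predicate."""
    subject = rule["subject_template"].format(**record)
    triples = [(subject, "rdf:type", rule["class"])]
    for predicate, reference in rule["predicate_object"].items():
        triples.append((subject, predicate, record[reference]))
    return triples

# Hypothetical rule mirroring the book example: the IRI combines the
# base string with the id, as described in the introduction.
rule = {
    "subject_template": "http://www.example.com/book/{id}",
    "class": "schema:Book",
    "predicate_object": {"schema:title": "title"},
}

record = {"id": "0", "title": "Harry Potter and The Sorcerer's Stone"}
triples = apply_rule(rule, record)
```

A mapping language serializes exactly this kind of information declaratively, so it can be reused across implementations instead of being buried in custom scripts.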
2.2 Mapping Generation Approaches

In previous work [8], we identified four approaches that data owners use when they create mapping rules themselves: the data-driven, schema-driven, model-driven, and result-driven approach. The data-driven approach is based on the data, namely, rules are created for the data fractions of the input data sources. Subsequently, the rules are annotated with classes, properties, datatypes, and languages from schemas (vocabularies and ontologies). The schema-driven approach is based on the schemas, namely, rules are created using the classes, properties, and datatypes. Subsequently, the data fractions are associated with the correct rules. The model-driven approach is based on the model of the domain. More precisely, the entities, their attributes, and their relationships to other entities are defined, without explicitly indicating either the schemas or the data fractions to be used. This leads to abstract rules without data fractions and schema elements. Subsequently, the model is instantiated by applying adequate schema(s) and it is associated with data fractions, by specifying which fractions are associated with which parts of the model. The result-driven approach creates rules based on the data sources and corresponding Linked Data. The rules are created based on the complete Linked Dataset's model and used schemas. Afterwards, the data fractions are associated with the correct rules.

In other research fields, the example-driven approach has been applied successfully [9, 10]. Its application includes two high-level steps: (i) for a sample of the input, the output, i.e., the example, is given by the user; (ii) similar output is generated for the complete input based on the example, with no or minimal user interaction.
For example, users select an example of the data they want on a Web page and they get all desired data, based on that example, from the page [9]. In our case, for a sample of the existing data, i.e., the input, a Linked Data sample, which is desired to be generated, may act as the example. Linked Data for the complete existing dataset is obtained by (i) creating the mapping rules which define how Linked Data should be generated, and (ii) executing them and generating the Linked Data.

The example-driven approach is related to the result-driven approach. Users either provide the complete Linked Dataset or a sample, i.e., the example. The result-driven approach deals with the case where the complete Linked Dataset is available, while the example-driven approach deals with the case where only a sample is. Furthermore, to the best of our knowledge, neither the example-driven nor the result-driven approach has so far been applied to mapping rule creation.

2.3 (Semi-)Automatic Solutions

As the process of creating mappings can become costly when done manually [1, 2, 3], both semi-automatic and automatic solutions have been proposed. The former combine automatic steps with user interaction, whereas the latter completely rely on automatic steps. For example, Jiménez-Ruiz et al. [11] developed BootOX, which creates mappings for relational databases based on the data schema, and applies user feedback to improve the mappings. Taheriyan et al. [12] propose an automatic solution that creates a new mapping based on previous mappings, the raw data, and the preferred ontologies.

Although these solutions show promising results, in use cases where a Linked Data example is available, users still need to adjust the rules if the resulting Linked Data does not match the example. This is because these approaches do not take such use cases into account and, thus, do not consider examples when creating mapping rules.
Nevertheless, these examples offer knowledge relevant for the mapping process, such as (i) which data corresponds to entities and attributes, i.e., the subjects and objects of an RDF triple; (ii) how the data is annotated and modeled, i.e., which classes, properties, datatypes, and languages are used and how they are related to each other; and (iii) how different entities are linked to each other. To the best of our knowledge, no research has been conducted on applying the example-driven approach in (semi-)automatic solutions for Linked Data generation [4]. However, the application of this approach in other fields has shown a decrease in the cost of the process [9].

Kranzdorf et al. [9] developed an example-driven system that aids in the creation of path expressions to extract the required text from Web pages. The system works as follows: (i) users select an example text on the page, (ii) the system creates an initial expression, (iii) all the text that is selected by the expression is presented to the users, and (iv) all that text is extracted. Iteratively, users can update or provide additional examples to improve the expressions. Atzori and Zaniolo [10] introduce a method to query DBpedia based on an example Wikipedia infobox: (i) users edit the information of an infobox, which acts as an example of how the infobox of a desired Wikipedia page should look; (ii) the infobox is used to construct a SPARQL query; (iii) the query is executed on DBpedia; and (iv) the resulting Wikipedia pages are presented. In both systems, the example-driven approach results in minimal user interaction with regard to the construction of the expressions and the queries. This leads to a less costly process.

As the generation of mapping rules is similar to the aforementioned construction, the (semi-)automatic example-driven approach is also applicable to mapping generation.
Mapping rules are created by the system based on both an example of the existing data and a Linked Data example, reducing the required user interaction. This, in turn, can also lead to a less costly process.

3 Running Example

We explain the different elements of the remainder of the paper through a running example. The example contains two data sources in two different formats: CSV (see Listing 1.1) and JSON (see Listing 1.2). The data source in the CSV format provides records for books, including the id, title, and name of the author. The data source in the JSON format provides records for authors, including the id, name, country, and birth date. The Linked Data example is based on the first record of each data source (see Listing 1.3). The data sources are interlinked through the name of the author. This is also reflected in the reuse of the author's IRI in the Linked Data example (see line 9).

4 Approach

In this section, we discuss our example-driven approach. It is based on the two steps executed by users when manually creating mapping rules based on an example: (i) the subjects, predicates, and objects of the RDF triples are aligned with the data sources; and (ii) rules are created based on the alignment, and the model, semantic annotations, and relationships between entities extracted from the RDF triples.

Algorithm 1 Alignment between RDF example and data sources

for entity ∈ example do
  objects ← example.getTripleObjects(entity)
  for object ∈ objects do
    for dataSource ∈ dataSources do
      align(object, dataSource)
    end for
  end for
  if isIRI(entity) then
    identifier ← getIdentifierIRI(entity)
    align(identifier, dataSource)
  end if
  selectBestDataSource(dataSources)
end for
for (entity1, entity2) ∈ example do
  if isTripleWith(entity1, entity2) then
    findCondition(entity1, entity2)
  end if
end for
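The alignment loop of Algorithm 1 can be sketched as follows for flat key/value records. The helper names, and scoring a data source by the number of matched references, are our own simplification of the paper's align and selectBestDataSource steps, not the actual implementation.

```python
def identifier_from_iri(iri):
    """Extract the document or fragment identifier:
    the string after the last # or /."""
    return iri.replace("#", "/").rsplit("/", 1)[-1]

def align_entity(example_values, data_sources):
    """For each data source, collect the references (keys) whose values
    match an example value, and select the source with the most matches."""
    scores = {}
    for name, records in data_sources.items():
        matched_references = set()
        for record in records:
            for key, value in record.items():
                if value in example_values:
                    matched_references.add(key)
        scores[name] = matched_references
    best = max(scores, key=lambda name: len(scores[name]))
    return best, scores[best]

# Records from the running example's two data sources.
books = [{"id": "0", "title": "Harry Potter and The Sorcerer's Stone",
          "author": "J.K. Rowling"}]
authors = [{"id": "jkr", "name": "J.K. Rowling", "country": "UK",
            "birthdate": "1965-07-31"}]

# Example values for the author entity, including the identifier
# extracted from its IRI.
values = {"J.K. Rowling", "UK", "1965-07-31",
          identifier_from_iri("http://www.example.com/author/jkr")}
source, refs = align_entity(values, {"csv": books, "json": authors})
```

Here "J.K. Rowling" matches both sources, but the JSON source matches four references against one for the CSV source, so it is selected for the author group, mirroring the worked example in Section 4.1.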
4.1 Data Source Alignment

The triples in a Linked Data example contain data values that stem from the existing data. To create mapping rules that refer to the correct data values, our approach needs to determine which data values in the example align with which data values in the data sources. The steps of this alignment can be found in Algorithm 1. For each entity, triples with the entity as subject are grouped together. Per triple in a group, for each object the correct reference to a data fraction in each data source is determined, if possible. The example value (as found in the RDF example) is compared with the data values of every data fraction of the data source.

For every unique entity there is a unique IRI. A common practice is to construct IRIs by using a base IRI [13] to which an entity-specific value is appended. For example, http://www.example.com/book/0 and http://www.example.com/book/1 are IRIs for books. Both IRIs start with http://www.example.com/book/, and the id of the book, i.e., "0" and "1", is appended. This knowledge needs to be added to the mapping rules to generate the correct subjects for the triples. Therefore, we analyze the IRIs and extract the document or fragment identifier, i.e., the string after the last # or /. Next, the identifier is aligned with the data sources. For each group, the data source is selected for which the most references could be found. In case the same number of references is found, an arbitrary choice is made.

If two entities are connected, the conditions under which the entities are related to each other can be determined. In our approach, we consider every value from the selected data source for each entity and search for matches between these values.

[Figure: graph visualization of the RDF example, with circular nodes for the entities book:0 and author:jkr, rectangular nodes for the literals, edges labeled with the predicates, and each node annotated with the data source (CSV or JSON) it aligns with.]
Fig. 1: The RDF example is aligned with the data sources.

We apply these steps to the Linked Data example (see Listing 1.3) and the data values extracted from the two data sources (see Listings 1.1 and 1.2):

1. There are two groups, because there are two entities (see Figure 1). One group includes the triples regarding the book book:0 (a) and the other includes the triples regarding the author author:jkr (b). In Figure 1, a graph visualization is used to represent the RDF triples from the example. IRIs and blank nodes, i.e., entities, are represented as circular nodes and include the class of an entity. Literals are represented as rectangular nodes and include the datatype or the language of a literal value. Edges are used to represent the relationships connecting the subjects and objects, and include the used predicates.
2. Within each group, we perform the alignment between the example data and the data sources. For group (a), the literal values align with the CSV data source, because "Harry Potter and The Sorcerer's Stone" can be found in title. For group (b), the literal values "UK" and "1965-07-31" are aligned with the JSON data source via country and birthdate. The literal value "J.K. Rowling" is aligned with both the CSV and JSON data sources, because "J.K. Rowling" can be found in the column author and the attribute name, respectively.
3. For the IRIs of the two entities, we consider the part after the / to determine the corresponding reference in the data sources. The value for book:0 is "0", which aligns with id of the CSV source, and for author:jkr it is "jkr", which aligns with id of the JSON source.
4. We select the appropriate data source (and references) for each group. For group (a), only data values from the CSV data source are used, so we choose this data source.
For group (b), both data sources are used; however, the JSON data source aligns with four nodes, while the CSV data source aligns with only one node. Therefore, we choose the JSON data source over the CSV data source for group (b).
5. The book and the author of the RDF example have the same value for the column author and the attribute name, respectively. These references can be used to determine whether a pair of an author and a book is related.

Algorithm 2 Creation of mapping rules based on alignment and extracted knowledge

for entity ∈ example do
  triples ← example.getTriples(entity)
  triplesMap ← generateTriplesMap()
  triplesMap.generateLogicalSource(entity)
  triplesMap.generateSubjectMap(entity)
  for triple ∈ triples do
    predicateObjectMap ← triplesMap.generatePredicateObjectMap(triple.predicate)
    predicateObjectMap.generateObjectMap(triple.object)
  end for
end for
for (entity1, entity2) ∈ example do
  if isTripleBetween(entity1, entity2) then
    triple ← example.getTriple(entity1, entity2)
    pom ← triplesMap.generatePredicateObjectMap(triple.predicate)
    pom.generateReferencingObjectMap(entity1, entity2)
  end if
end for

4.2 Mapping Creation

Once the Linked Data example is aligned with the data sources, a mapping is created. During the creation of the rules, additional knowledge is required that can be extracted from the Linked Data example. More specifically, this knowledge includes (i) how the data is annotated and modeled, i.e., which classes, properties, datatypes, and languages are used and how they are related to each other; and (ii) how different entities are linked to each other.

The previous step in our approach was mapping language independent. However, to create an actual mapping that can be used to generate Linked Data, we need to rely on a specific mapping language.
In our approach we use RML, as it allows extracting data values from multiple, heterogeneous data sources and semantically annotating them (see Section 2). Again, triples are grouped by entity. The steps that are performed for each group can be found in Algorithm 2.

For every group, a Triples Map is created. Each Triples Map contains the details about how subjects, predicates, and objects are generated for a certain type of entity. Each Triples Map requires a Logical Source, which states which data source is used to create the different subjects, predicates, and objects. In the case of RML, an iterator is required if multiple entities need to be mapped to Linked Data. In our approach, the iterator can be determined by taking the common path of the references of each group. Each Triples Map needs a Subject Map that explains how the subjects of the triples are generated. A Subject Map includes the classes and, when IRIs are required instead of blank nodes, the template to generate the correct IRIs. The classes can be extracted from the triples and the template is available from the alignment. For every combination of predicates and objects, a Predicate Object Map is needed. The required information is available in a triple's predicate and object. The predicate of the triple is added to the Predicate Object Map. If an object of a triple is a literal, it determines the details of an Object Map of the Predicate Object Map: (i) the reference found via the alignment is added via rml:reference; (ii) if the literal has a datatype, it is added via rr:datatype; and (iii) if the literal has a language, it is added via rr:language. If the object refers to an entity, a Referencing Object Map is used instead of an Object Map. This map refers to the Triples Map of the other entity. Additionally, join conditions are added, if found during the alignment.
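Two details of this step can be sketched in code: deriving the iterator as the common path of a group's references, and assembling a Triples Map per entity group. The function names and the dictionary layout are assumptions for illustration, not the actual library's API.

```python
def common_iterator(paths):
    """Derive the iterator as the common leading path of a group's
    JSONPath-style references, e.g.
    '$.authors[*].name' and '$.authors[*].country' -> '$.authors[*]'."""
    split_paths = [path.split(".") for path in paths]
    common = []
    for parts in zip(*split_paths):
        if len(set(parts)) != 1:  # stop at the first diverging segment
            break
        common.append(parts[0])
    return ".".join(common)

def triples_map(source, iterator, subject_template, rdf_class,
                predicate_object_maps):
    """Assemble a Triples Map as a plain dict: a Logical Source, a
    Subject Map (class and IRI template), and Predicate Object Maps."""
    return {"logicalSource": {"source": source, "iterator": iterator},
            "subjectMap": {"class": rdf_class,
                           "template": subject_template},
            "predicateObjectMaps": predicate_object_maps}

# References found for the author group during alignment.
it = common_iterator(["$.authors[*].id", "$.authors[*].name",
                      "$.authors[*].country", "$.authors[*].birthdate"])
tm = triples_map("authors.json", it,
                 "http://www.example.com/author/{id}", "foaf:Person",
                 [{"predicate": "foaf:name", "reference": "name"}])
```

Serializing such a structure in RML syntax yields a mapping like the one shown in Listing 1.4.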
A join condition states the condition under which there is a relationship between two entities.

1  <#TM_B> rml:logicalSource <#LS_B>;
2    rr:subjectMap <#SM_B>;
3    rr:predicateObjectMap <#POM_B1>, <#POM_B2>.
4
5  <#LS_B> rml:source "books.csv"; rml:referenceFormulation ql:CSV.
6
7  <#SM_B> rr:class schema:Book; rr:template "http://www.example.com/book/{id}".
8
9  <#POM_B1> rr:predicate schema:title;
10   rr:objectMap [ rml:reference "title"; rr:language "en" ].
11
12 <#POM_B2> rr:predicate schema:author;
13   rr:objectMap [
14     rr:parentTriplesMap <#TM_A>;
15     rr:joinCondition [ rr:child "author"; rr:parent "name" ]
16   ].
17
18 <#TM_A> rml:logicalSource <#LS_A>;
19   rr:subjectMap <#SM_A>;
20   rr:predicateObjectMap <#POM_A1>, <#POM_A2>, <#POM_A3>.
21
22 <#LS_A> rml:source "authors.json";
23   rml:iterator "$.authors[*]";
24   rml:referenceFormulation ql:JSONPath.
25
26 <#SM_A> rr:class foaf:Person;
27   rr:template "http://www.example.com/author/{id}".
28
29 <#POM_A1> rr:predicate foaf:name; rr:objectMap [ rml:reference "name" ].
30
31 <#POM_A2> rr:predicate foaf:country; rr:objectMap [ rml:reference "country" ].
32
33 <#POM_A3> rr:predicate foaf:birthdate;
34   rr:objectMap [ rml:reference "birthdate"; rr:datatype xsd:date ].

Listing 1.4: RML mapping for the example

In Listing 1.4, the resulting mapping rules for our example can be found. Two Triples Maps were created, each with a Logical Source, a Subject Map, and one or more Predicate Object Maps (lines 1-16 and 18-34). The Logical Source for the books (line 5) refers to the CSV data source (rml:source) and states that we are using the reference formulation for CSV (rml:referenceFormulation). The Subject Map (line 7) denotes that every book is of the class schema:Book and that the IRI for each book is constructed based on the id, as found during the alignment. The only attribute for the books is related to the entity via schema:title and uses the data from the column title, which is in English. This is reflected in a Predicate Object Map (line 9), with a connected rr:objectMap pointing to the correct column and language. A second Predicate Object Map (line 12) is used to state the relationship between the books and the authors, and is only valid when the author of the book and the name of the author are the same (line 15). Similar maps are created for the authors. However, the Logical Source points to a different data source, a different reference formulation is used, and an iterator is added, as we are dealing with the JSON data source (lines 22-24). Additionally, the datatype of the birth date is set to xsd:date (line 34).

1  book:0 a schema:Book;
2    schema:title "Harry Potter and The Sorcerer's Stone"@en;
3    schema:author author:jkr.
4
5  book:1 a schema:Book;
6    schema:title "Homo Deus"@en;
7    schema:author author:ynh.
8
9  author:jkr a foaf:Person;
10   foaf:name "J.K. Rowling";
11   foaf:country "UK";
12   schema:birthdate "1965-07-21"^^xsd:date.
13
14 author:ynh a foaf:Person;
15   foaf:name "Yuval Noah Harari";
16   foaf:country "Israel";
17   schema:birthdate "1976-04-24"^^xsd:date.

Listing 1.5: Generated Linked Data based on the two data sources and the RML mapping

The generated Linked Data based on the two data sources and the RML mapping can be found in Listing 1.5. The Linked Data contains the RDF triples that were provided as example (lines 1-3 and 9-12), and the RDF triples for the second book and author that are present in the data sources (lines 5-7 and 14-17).

5 Implementation

The approach is available via a JavaScript library¹. This library is available for Node.js and the browser. The library supports both the CSV and JSON formats, showcasing the support for tabular and hierarchical data. Furthermore, it is accessible through a command line interface and through a graphical user interface via the RMLEditor [14].
The RMLEditor provides a graphical user interface for the creation and editing of mappings, with RML as its underlying mapping language. To apply the example-driven approach, users need to perform two steps: load the different data sources and provide a Linked Data example through a set of RDF triples. Subsequently, the mapping is created as described and visualized in the interface. This process is shown in the screencast at https://www.youtube.com/watch?v=IQVwlYQXwAo.

¹ https://github.com/RMLio/example2rml

6 Discussion and Conclusion

Although existing approaches, such as the data-driven and schema-driven approaches, and their corresponding (semi-)automatic solutions, have been the topic of multiple research efforts, they offer limited benefits when dealing with use cases that provide Linked Data examples. These approaches consider one or more different elements as input, such as data, data schemas, ontologies, and existing mappings. The example-driven approach considers as input the data and a Linked Data example. The advantage of the example-driven approach is the use of knowledge that can be extracted from the Linked Data example: (i) which data corresponds to entities and attributes, i.e., the subjects and objects of an RDF triple; (ii) how the data is annotated and modeled, i.e., which classes, properties, datatypes, and languages are used and how they are related to each other; and (iii) how different entities are linked to each other. By using this knowledge, the example-driven approach creates mapping rules that generate Linked Data as desired by the users. The other approaches do not consider this example, and, thus, the created mapping rules might not generate the desired Linked Data. Consequently, users need to manually update the mapping rules. Nonetheless, these approaches are better suited when an example cannot be provided.
In such use cases, the example-driven approach cannot be applied due to the lack of an example. Therefore, the use case at hand drives the choice of the appropriate approach.

Furthermore, the example-driven approach can be extended by applying techniques introduced by the other approaches, such as the use of data schemas, other ontologies, and existing mappings. This results in a hybrid approach that uses more knowledge when creating the mapping rules than the individual approaches do. This increase in used knowledge leads to an improvement of the created mapping rules, which reduces the cost of the mapping process to generate the desired Linked Data.

As future work, we envision the addition of an extra step to our approach to create rules that apply data transformations on the raw data during the Linked Data generation. This is needed for use cases where the data in the RDF triples is not just a copy of the raw data, but instead the raw data needs to be transformed before it can be used as (part of) a subject, predicate, or object. Furthermore, we plan to explore how to combine the different approaches to achieve the aforementioned hybrid approach.

References

[1] Evgeny Kharlamov, Dag Hovland, Ernesto Jiménez-Ruiz, Davide Lanti, Hallstein Lie, Christoph Pinkel, Martin Rezk, Martin G. Skjæveland, Evgenij Thorstensen, Guohui Xiao, Dmitriy Zheleznyakov, and Ian Horrocks. Ontology Based Access to Exploration Data at Statoil. In Proceedings of the 14th International Semantic Web Conference, pages 93–112. Springer, 2015.
[2] Evgeny Kharlamov, Nina Solomakhina, Özgür Lütfü Özçep, Dmitriy Zheleznyakov, Thomas Hubauer, Steffen Lamparter, Mikhail Roshchin, Ahmet Soylu, and Stuart Watson. How Semantic Technologies Can Enhance Data Access at Siemens Energy. In Proceedings of the 13th International Semantic Web Conference, pages 601–619. Springer, 2014.
[3] Bin He, Mitesh Patel, Zhen Zhang, and Kevin Chen-Chuan Chang. Accessing the Deep Web. Communications of the ACM, 50(5):94–101, 2007.
[4] Pieter Heyvaert. Ontology-Based Data Access Mapping Generation using Data, Schema, Query, and Mapping Knowledge. In Proceedings of the 14th Extended Semantic Web Conference: PhD Symposium, May 2017.
[5] De Meester, Maroy, Dimou, Verborgh, and Mannens. Declarative Data Transformations for Linked Data Generation: the case of DBpedia. In Eva Blomqvist, D. Maynard, Aldo Gangemi, R. Hoekstra, Pascal Hitzler, and Olaf Hartig, editors, Proceedings of the 14th ESWC, LNCS, pages 33–48. Springer, Cham, May 2017. ISBN 978-3-319-58450-8, 978-3-319-58451-5. doi: 10.1007/978-3-319-58451-5_3. URL https://link.springer.com/chapter/10.1007/978-3-319-58451-5_3.
[6] Souripriya Das, Seema Sundara, and Richard Cyganiak. R2RML: RDB to RDF Mapping Language. Working Group Recommendation, W3C, September 2012. URL http://www.w3.org/TR/r2rml/.
[7] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In Proceedings of the 7th Workshop on Linked Data on the Web, 2014.
[8] Pieter Heyvaert, Anastasia Dimou, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. Towards Approaches for Generating RDF Mapping Definitions. In Proceedings of the 14th International Semantic Web Conference: Posters and Demos, 2015.
[9] Jochen Kranzdorf, Andrew Sellers, Giovanni Grasso, Christian Schallhart, and Tim Furche. Visual OXPath: Robust Wrapping by Example. In Proceedings of the 21st International Conference on World Wide Web, pages 369–372. ACM, 2012.
[10] Maurizio Atzori and Carlo Zaniolo. SWiPE: Searching Wikipedia by Example. In Proceedings of the 21st International Conference on World Wide Web, pages 309–312. ACM, 2012.
[11] Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Dmitriy Zheleznyakov, Ian Horrocks, Christoph Pinkel, Martin G.
Skjæveland, Evgenij Thorstensen, and Jose Mora. BootOX: Practical Mapping of RDBs to OWL 2. In Proceedings of the 14th International Semantic Web Conference (Part II), pages 113–132. Springer, 2015.
[12] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite. Learning the Semantics of Structured Data Sources. Web Semantics: Science, Services and Agents on the World Wide Web, 37:152–169, 2016.
[13] Larry Masinter, Tim Berners-Lee, and Roy T. Fielding. Uniform Resource Identifier (URI): Generic Syntax. Technical report, 2005.
[14] Pieter Heyvaert, Anastasia Dimou, Aron-Levi Herregodts, Ruben Verborgh, Dimitri Schuurman, Erik Mannens, and Rik Van de Walle. RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings. In The Semantic Web – Latest Advances and New Domains (ESWC 2016), pages 709–723. Springer, 2016.