D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs

Alexandros Chortaras, National Technical University of Athens, Athens, Greece, achort@cs.ntua.gr
Giorgos Stamou, National Technical University of Athens, Athens, Greece, gstam@cs.ntua.gr

ABSTRACT
In this paper we present the D2RML Data-to-RDF Mapping Language, an extension of the R2RML mapping language that significantly enhances its ability to collect data from diverse data sources and transform them into custom RDF graphs. The definition of D2RML is based on a simple formal abstract data model, which is needed to define its semantics clearly, given the diverse types of data representation standards used in practice. D2RML allows web service-based data transformations, simple data manipulation and filtering, and conditional maps, so as to improve the selectivity of RDF mapping rules and facilitate the generation of higher quality RDF data stores through a lightweight, easy to write and modify specification.

CCS CONCEPTS
• Information systems → Information integration; Web data description languages; Query languages; Web services;

KEYWORDS
RDF mapping language, Data integration, Web service integration

ACM Reference Format:
Alexandros Chortaras and Giorgos Stamou. 2018. D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs. In Proceedings of Linked Data on the Web 2018 (LDOW2018). ACM, New York, NY, USA, 10 pages.

LDOW2018, April 2018, Lyon, France. © 2018 Copyright held by the owner/author(s).

1 INTRODUCTION
In the past years, a considerable amount of work has been done on developing methodologies for mapping relational databases to RDF graphs. Several approaches, mapping languages and systems have been proposed, including two W3C recommendations [1, 8]. This work has mainly been motivated by the need to integrate the huge amount of information contained in existing relational databases with the emerging Semantic Web, and to make it part of the Linked Data cloud.

Following the growth of Linked Data, several research institutions and companies, such as DBpedia^1, WordNet^2 and OpenStreetmap^3, now offer access to their huge datastores through SPARQL endpoints or RESTful web services. Even more recently, the expansion of cloud computing, the exciting developments in the field of machine learning, and the subsequent revival of interest in artificial intelligence applications have resulted in the emergence of cloud platforms and marketplaces that offer intelligent data analysis web services, often representing their output using Linked Open Data vocabularies and resources; examples include DBpedia Spotlight^4, Google's Cloud Natural Language^5 and Microsoft's Computer Vision API^6. These services typically deliver data using some structured data exchange format (usually JSON or XML documents).

^1 http://dbpedia.org/sparql/
^2 http://wordnet-rdf.princeton.edu/
^3 http://api.openstreetmap.org/
^4 http://www.dbpedia-spotlight.org/api/
^5 https://cloud.google.com/natural-language/
^6 https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/

Thus, if until recently the question was how to integrate existing data with the Semantic Web, now part of the question is also how to use all these available data and diverse services in a coordinated and integrated manner, to selectively pick and aggregate data into custom data stores that power new intelligent applications. In this respect, aggregating data into custom RDF data stores is of particular interest, not only because it allows direct integration with the Linked Data cloud, but also because intelligence can be added on top of the data by including, for example, axiomatic knowledge in the form of OWL 2 [20] axioms. As a matter of fact, recent work on efficient algorithms and methods for reasoning with tractable fragments of ontologies (e.g. [3], [21]) has allowed the development of practical systems that provide inferencing over semantic data.

In this environment, we propose D2RML, a generic Data-to-RDF Mapping Language, whose aim is to facilitate the generation of custom RDF data stores by selectively collecting and integrating data from diverse data sources and web services into RDF data stores of as high quality as possible.
Our purpose is to provide a formal basis for defining transformation-oriented, general Data-to-RDF mappings and, while staying within the mapping language approach, to transfer as much as possible of the burden of generating such data stores in practice from writing code or using heavyweight data workflow solutions to writing easily understandable and modifiable specifications.

The rest of the paper is organized as follows: In Section 2 we briefly discuss related work, with emphasis on R2RML and RML, which are the starting points for our work. In Section 3 we define the simple theoretical data model that underlies D2RML. In Section 4 we describe how several widely used information sources can be cast onto our model, and in Section 5 we present the formal specification of D2RML. Section 6 presents an extensive realistic use case that showcases the expressivity and practical usefulness of the proposed language, and Section 7 concludes the paper.

2 RELATED WORK
Several languages and systems have been proposed to map relational databases to RDF (RDB-to-RDF mapping languages). A comparative analysis is presented in [14], which determines fifteen desirable features (e.g. support for transformation functions, named graphs, integrity constraints) that such languages should have, and discusses how they are or are not supported by the several languages. Existing RDB-to-RDF mapping languages vary considerably in the flexibility they allow in defining mappings, from the rigid Direct Mapping [1] approach, which automatically translates the data of a relational database into an RDF graph representation following the database schema, to the R2RML language [8], which allows the user to define custom views and mapping rules (expressed as RDF graphs) and satisfies most of the fifteen desirable features.
The development of mapping languages and practical systems for translating data sources other than relational databases to RDF graphs has also been attempted. Closer to the relational model are CSV/TSV documents and spreadsheets, which retain the tabular format. Tools for converting from these data sources include XLWrap [18], TaRQL^7, Vertere^8, and M2 [22]. In all such tools, for each table row one or more RDF resources are generated, and for each column one or more RDF triples about the respective resources are generated. Other formats, such as XML, diverge considerably from tabular data owing to their hierarchical structure, and the systems that have been proposed to translate XML to RDF graphs rely on XSLT transformations (e.g. XML2RDF^9), XPath (e.g. Tripliser^10), XQuery (e.g. XSPARQL [2]), or on embedding within the XML documents links to transformation algorithms, typically XSLT transformations (GRDDL [6]). All such tools rely on syntactical transformations of parts of the XML structure to RDF triples. Another framework to assist the transformation of XML and JSON data sources is xCurator [13], which focuses on delivering high-quality linked data. Apart from the above, there also exist tools, in the form of web services (e.g. The Datatank^11) or parts of other infrastructures (e.g. Virtuoso Sponger^12), that provide custom solutions to work with data from different formats and possibly construct RDF graphs out of them. These tools, however, are general data processing and transformation tools, not designed to directly support semantic mappings of general data to RDF triples.

^7 https://github.com/tarql/tarql/
^8 https://github.com/knudmoeller/Vertere-RDF/
^9 http://www.gac-grid.de/project-products/Software/XML2RDF.html
^10 http://daverog.github.io/tripliser/
^11 http://thedatatank.com/
^12 http://vos.openlinksw.com/owiki/wiki/VOS/VirtSponger

To resolve the polymorphy of tools and focus on the semantic aspects of the Data-to-RDF mapping process, several works extend the W3C recommended R2RML language to support other data formats. These include KR2RML [23], xR2RML [19] and RML [9]. These proposals are a considerable advance with respect to custom system solutions, because they are based on an existing, clean, mapping-oriented standard, and allow backward compatibility and, in most cases, extensibility. It should be noted, however, that simply extending the R2RML standard to support other data source types does not necessarily carry all its features over to the other data types. For example, select conditions and transformation functions are supported implicitly by R2RML by relying on the expressivity of the SQL query language, but this does not extend in a straightforward way to the case of XML or JSON documents.

2.1 R2RML and RML
R2RML works with logical tables (rr:LogicalTable), which may be either base tables or views (rr:BaseTableOrView), defined by specifying an appropriate table name (rr:tableName), or result sets (rr:R2RMLView), obtained by executing a query (rr:sqlQuery). Each logical table is mapped to RDF triples using one or more triples maps (rr:TriplesMap). A triples map is a complex rule that maps each row in the underlying logical table to several RDF triples. The rule has two parts: a subject map (rr:SubjectMap), which generates the subject of all RDF triples that will be generated from each row of the logical table, and several predicate-object maps (rr:PredicateObjectMap), which in turn consist of predicate maps (rr:PredicateMap) and object maps (rr:ObjectMap) or referencing object maps (rr:RefObjectMap). A predicate map determines predicates for the to-be-generated RDF triples for the given subject, and the object maps determine their objects. A subject map may include several IRIs (rr:class) that will be used as objects to generate triples with the predicate rdf:type for the particular subjects. A subject map or predicate-object map may also have one or more graph maps (rr:GraphMap) associated with it, which specify the target graph of the resulting RDF triples. Referencing object maps allow joining two different triples maps. A referencing object map specifies a parent triples map (rr:parentTriplesMap), the subjects of which will act as objects for the current triples map, and may contain (rr:joinCondition) a join condition (rr:Join), specified by a reference to a column name of the current and parent triples map (rr:child and rr:parent, respectively). The IRIs and literals that will be used as RDF triple subjects, predicates, objects, or RDF graph names may be either declared constants (rr:constant), or obtained from the underlying table, view or result set by specifying the desired column name (rr:column) that will act as value source, or generated through a string template (rr:template) that concatenates column values and custom strings.
String templates offer only very rudimentary options to manipulate actual database values and generate custom IRIs and literals.

RML extends R2RML by allowing other sources (e.g. JSON or XML files) apart from logical tables (rml:LogicalSource), which may be used in an interlinked manner, by defining data iterators (rml:iterator) to split the data obtained from such sources into base elements on which each mapping rule will be applied, and by allowing particular references (rml:reference), in the form of subelement selectors within the base element, to define the value sources to be used for the generation of IRIs and literals. Both the iterators and the references depend on the underlying data source, and may be XPath queries, JSONPath queries, CSV column names or SPARQL return variable names. Their type is declared using the rml:referenceFormulation predicate.

With respect to the specification of the actual access to the data sources, R2RML leaves the issue to the implementation. The assumption is that each R2RML document applies to data from a unique database. In contrast, RML, which allows multiple sources and cross-references between the retrieved data, must include the data source descriptions within the RML document.
a web API that paginates the results using next page access keys, We consider an information source to be any online software sys- knowledge on how to formulate each time the subsequent HTTP tem that can deliver structured data upon request. The information request is needed; this is not covered for example by Hydra. Sim- source may be a data repository (e.g. a relational database, an RDF ilarly, a SPARQL-SD specification provides information about the store, an XML file stored in some directory) or an implementation supported SPARQL version, the default entailment regime, the de- of a service or an algorithm (e.g. a RESTful web service) that may fault named graph, etc., which are not useful to a client, at the time process some input data and deliver some structured output. The of formulating a request. request, in the form of a query (e.g. an SQL or SPARQL SELECT query) or message (e.g. an HTTP GET or POST request) in a for- 3 DATA MODEL mat supported by the information source, includes all input data In this section, we extend the table-based model underlying the and parameters required by the information source to generate and R2RML language to support complex, non-tabular data, that can be deliver the output. The reply, or effective data source, is the output obtained from various information sources (such as JSON or XML produced by the information source, upon processing the request. document returning sources). To do this we consider that instead The reply may be delivered to the client in a native format (e.g. as of logical tables, RDF triples are generated from set tables. In the an SQL result set), or in a generic document format (e.g. as a JSON following we represent an RDF triple as a tuple ⟨s, p, o⟩, where s is or XML document). the subject, p the property or predicate and o the object. To accommodate the several possible information sources in our model, we consider, as in RML, that the effective data source Definition 3.1. A set row of arity k is a tuple ⟨D 1 , . . . , D k ⟩, where groups some set of autonomous elements (e.g. rows of an SQL re- D 1 , . . . , D k are sets of values over some domains. A name row of sult set, elements of a JSON array). The division of the reply in arity k is a tuple ⟨n 1 , . . . , nk ⟩, where n 1 , . . . , nk are names. A set these autonomous elements is achieved through an iterator. Hence, table of arity k with m rows is a tuple S = ⟨N , T ⟩, where N is a an effective data source together with an iterator specifies a logical name row and T = [D1 , . . . , Dm ] a list of set rows, all of arity k, array, through whose items the iterator eventually iterates. Each such as the i-th elements of D1 , . . . , Dm , for 1 ≤ i ≤ k, share all item of a logical array may itself be a complex data structure (a the same domain. new effective data source), so in order to extract from it lists of val- The names allow us to refer to particular elements of set rows ues to construct set rows and use them as subjects, predicates and and tables. We denote the set of values that corresponds to name objects of RDF triples, we need some selectors. Thus, the role of the ni (1 ≤ i ≤ k) in a set row D by D[nk ]. We also denote the list selectors is to transform a logical array into a set table. [D1 [nk ], . . . , Dm [nk ]] of value sets that are obtained from the sev- Definition 3.4. The triple A = ⟨I, t, L⟩, where I is a informa- eral set rows of S by S[nk ], which we call a column of S. 
Definition 3.2. A filter F over a set table S of arity k is a tuple ⟨n, f⟩, where n is a column name and f : dom(n) → dom(n) a function such that f(D[n]) ⊆ D[n] for all set rows D of S.

We denote the set value f(D[n]), obtained by applying F on a set row D, by F(D). Clearly, f may be the identity function.

Definition 3.3. A triples rule R over a set table S = ⟨N, T⟩ is a triple of filters ⟨Fs, Fp, Fo⟩ over S, called the subject, predicate and object filter, respectively. The implementation of R is the set of RDF triples

    {⟨s, p, o⟩ | s ∈ Fs(D), p ∈ Fp(D), o ∈ Fo(D), D ∈ T}.

A set of triples rules over one or more set tables defines a Data-to-RDF mapping. Using the above simple model, we can define Data-to-RDF mappings for any information sources that can give rise to one or more set tables. The triple store represented by a Data-to-RDF mapping is then the implementation of all its triples rules.
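As an illustration of Definition 3.3 (our own sketch, not normative), the implementation of a triples rule can be computed by taking, for every set row, the Cartesian product of the filtered subject, predicate and object value sets:

    # Illustrative implementation of a triples rule (Definition 3.3).
    # A filter is a (column name, function) pair; the function must return a
    # subset of the values it is given (the identity is the simplest choice).
    from itertools import product
    from typing import Callable, FrozenSet, Set, Tuple

    Filter = Tuple[str, Callable[[FrozenSet[str]], FrozenSet[str]]]

    def implement_rule(table, subj_f: Filter, pred_f: Filter, obj_f: Filter) -> Set[Tuple[str, str, str]]:
        names, rows = table
        def apply(flt, row):
            name, fn = flt
            return fn(row[names.index(name)])
        triples = set()
        for row in rows:
            # every combination of filtered subject, predicate and object values
            triples.update(product(apply(subj_f, row), apply(pred_f, row), apply(obj_f, row)))
        return triples

Constant predicates (rr:predicate in the language of Section 5) can be viewed, under this reading, as filters over a single-valued constant column.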
We consider an information source to be any online software system that can deliver structured data upon request. The information source may be a data repository (e.g. a relational database, an RDF store, an XML file stored in some directory) or an implementation of a service or an algorithm (e.g. a RESTful web service) that may process some input data and deliver some structured output. The request, in the form of a query (e.g. an SQL or SPARQL SELECT query) or message (e.g. an HTTP GET or POST request) in a format supported by the information source, includes all input data and parameters required by the information source to generate and deliver the output. The reply, or effective data source, is the output produced by the information source upon processing the request. The reply may be delivered to the client in a native format (e.g. as an SQL result set) or in a generic document format (e.g. as a JSON or XML document).

To accommodate the several possible information sources in our model, we consider, as in RML, that the effective data source groups some set of autonomous elements (e.g. rows of an SQL result set, elements of a JSON array). The division of the reply into these autonomous elements is achieved through an iterator. Hence, an effective data source together with an iterator specifies a logical array, through whose items the iterator eventually iterates. Each item of a logical array may itself be a complex data structure (a new effective data source), so in order to extract from it lists of values to construct set rows and use them as subjects, predicates and objects of RDF triples, we need some selectors. Thus, the role of the selectors is to transform a logical array into a set table.

Definition 3.4. The triple A = ⟨I, t, L⟩, where I is an information source and request specification, t an iterator specification, and L a set of selectors, is a data acquisition pipeline.

It follows that each data acquisition pipeline A gives rise to a unique set table SA. A data acquisition pipeline may be parametric, in the sense that the information source or request specification may contain parameters. Given a non-parametric data acquisition pipeline A, a parametric data acquisition pipeline A′ that depends on A is a data acquisition pipeline whose parameters take values from one or more columns of SA. We call such a parametric data acquisition pipeline a transformation of A.

Definition 3.5. A series of data acquisition pipelines A0, A1, ..., Al, where each Ai, for i ≥ 1, is a transformation that depends on one or more Aj with j < i, is a set table specification. A0 is the primary data acquisition pipeline.

A set table specification gives rise to a unique set table, which is SA0 extended by columns contributed by the transformations A1, ..., Al. A trivial set table specification consists only of the primary data acquisition pipeline A0. Each transformation in a set table specification is realized as a series of requests to the respective information source, after binding the parameters to all possible combinations of values obtained from the referred-to columns of the set table constructed from the preceding data acquisition pipelines. In particular, to evaluate a set table specification, we must evaluate the data acquisition pipelines serially, extending at each step the previously obtained set table: the primary data acquisition pipeline A0 gives rise to set table SA0. Then, for each set row D of SA0, evaluating A1 gives rise to a set table SA1(D). By flattening all rows of SA1(D) into a single row (by merging the respective column values of each row) we obtain a new set row that is appended to D. Doing this for all set rows D results in SA0A1. By applying this process iteratively, SA0 is eventually extended with additional columns to the set table SA0A1...Al.

More formally, let n1, ..., nk be the names, and [D1, ..., Dm] the rows, of Ŝ = SA0...Ai. Evaluating Ai+1 on each row of Ŝ produces set tables SAi+1(D1), ..., SAi+1(Dm). Since all these set tables are produced by the same data acquisition pipeline Ai+1, they share the same arity, say k′, and column names, say n̂1, ..., n̂k′. Thus SA0...Ai+1 = ⟨N, T⟩, where N = ⟨n1, ..., nk, n̂1, ..., n̂k′⟩, T = [D′1, ..., D′m], D′j = ⟨Dj[n1], ..., Dj[nk], D̂j1, ..., D̂jk′⟩ for 1 ≤ j ≤ m, and D̂jl = ⋃ SAi+1(Dj)[n̂l] for 1 ≤ l ≤ k′, the union being taken over the value sets of the respective column.

The row flattening step is intentional: SA0 provides the original data that we want to extend through transformations, i.e. by appending new columns containing new properties of that data. Since, as mentioned above, all values contained in a particular row and column of SA are equivalent with respect to the values in the sets of the other columns of the current row, the flattening behaviour maintains this relationship between values, without introducing undesired hierarchical dependencies. Finally, the primary data acquisition pipeline may itself be parametric. In this case, the evaluation is done exactly as described above, but the set rows generated by SA0 are not appended to the set table on which it depends; they initiate a new set table.
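The following sketch (ours; simplified, with requests and iterators abstracted away as a callback) illustrates the flattening step used when a set table is extended with the columns contributed by one transformation:

    # Illustrative evaluation of one transformation step (Section 3): for each
    # set row D of the current table, the dependent pipeline yields a small set
    # table S_A(D); its rows are flattened into one set row (column-wise union)
    # and appended to D.
    from typing import Callable, FrozenSet, List, Tuple

    SetRow = Tuple[FrozenSet[str], ...]
    SetTable = Tuple[Tuple[str, ...], List[SetRow]]

    def extend(table: SetTable, transformation: Callable[[SetRow], SetTable]) -> SetTable:
        names, rows = table
        new_names: Tuple[str, ...] = names
        new_rows: List[SetRow] = []
        for row in rows:
            t_names, t_rows = transformation(row)          # S_A(D) for this set row
            new_names = names + t_names
            # flatten: union the value sets found in each column of S_A(D)
            flattened = tuple(
                frozenset().union(*(t_row[i] for t_row in t_rows)) if t_rows else frozenset()
                for i in range(len(t_names))
            )
            new_rows.append(row + flattened)
        return (new_names, new_rows)

In an actual D2RML processor the transformation callback would issue one request per combination of bound parameter values, as described above.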
4 INFORMATION SOURCES AND REPLIES
We now study how several information and effective data sources used in real applications can be accommodated by our model. We discuss relational databases, RESTful web services, JSON, XML and CSV/TSV documents, and SPARQL endpoints. Tables 1 and 2 summarize the corresponding requests, replies, iterators and selectors.

Table 1: Information sources, requests and replies
  Information Source        | Request                                                   | Effective Data Source
  SQL RDBMS                 | SQL SELECT query                                          | SQL result set
  SPARQL Endpoint           | SPARQL SELECT query and RDF graph IRIs, via HTTP message  | SPARQL result set, via HTTP message
  RESTful Web Service       | HTTP GET/POST request                                     | JSON/XML/CSV/TSV document
  JSON/XML/CSV/TSV Document | HTTP GET request                                          | JSON/XML/CSV/TSV document

Table 2: Effective data sources, iterators and selectors
  Effective Data Source | Iterator       | Selector
  SQL result set        | Row iterator   | Column name
  SPARQL result set     | Row iterator   | Variable name
  JSON document         | JSONPath query | Flat JSONPath query
  XML document          | XPath query    | Flat XPath query
  CSV/TSV document      | Row iterator   | Column name

4.1 Relational Databases
In relational databases, data is organized into one or more tables (or relations) of columns (or attributes) and rows (or tuples). Each table column has a name. Data are retrieved by issuing an SQL SELECT query, and the results are packed as a result set, which is essentially a row-by-row iterable table along with its metadata. Because relational database management systems (RDBMSs) use native formats to implement the data stores and the result formats, communication with RDBMSs is done using special protocols (such as ODBC and JDBC) to implement clients for particular RDBMSs. Practical access requires several parameters to be specified (e.g. server location, database name, user name, password, access driver), which are usually grouped in the so-called connection string and are programming language and implementation dependent. There is no standard for representing connection strings in RDF form. The D2RQ Mapping Language [7] allows a JDBC-dependent RDF definition of connection strings and is used by RML to specify RDBMS connectivity.

An implementation provided with an RDBMS connection specification can connect to the particular RDBMS, pose an SQL SELECT query q that specifies attributes n1, ..., nk in the SELECT statement for the returned columns, and obtain as result a list of rows [⟨v11, ..., v1k⟩, ..., ⟨vn1, ..., vnk⟩]. Using a trivial row iterator and the column names n1, ..., nk as selectors, the results of q can be converted to the following set table:

    ⟨⟨n1, ..., nk⟩, [⟨{v11}, ..., {v1k}⟩, ..., ⟨{vn1}, ..., {vnk}⟩]⟩
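As a concrete, non-normative illustration of this conversion, the sketch below runs a SELECT query with Python's built-in sqlite3 module and wraps the result set into the set table structure, each cell becoming a singleton value set (the database file and table name in the usage comment are hypothetical):

    # Illustrative conversion of an SQL result set into a set table (Section 4.1).
    import sqlite3

    def sql_to_set_table(db_path: str, query: str):
        con = sqlite3.connect(db_path)
        try:
            cur = con.execute(query)
            names = tuple(desc[0] for desc in cur.description)   # the name row
            # trivial row iterator: each result row becomes a set row of singletons
            rows = [tuple(frozenset({value}) for value in record) for record in cur]
            return (names, rows)
        finally:
            con.close()

    # e.g. sql_to_set_table("items.db", "SELECT id, title FROM item")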
4.2 RESTful Web Services
RESTful web services are services based on the REST principles [11], and are usually implemented using the HTTP protocol. Typically, a data retrieving RESTful service accepts an HTTP request and delivers the result in a self-descriptive text message (e.g. HTML, XML, JSON or plain text). Here we are interested in structured reply services, i.e. services whose reply is in one of the XML, JSON or CSV/TSV formats. To access a RESTful web service, the elements of the appropriate HTTP request have to be specified. These include the method (GET or POST), the URI (including the query string in the case of a GET message), any headers, and the body (for passing parameters in the case of a POST message). All these can be specified in RDF using the W3C Working Group Notes 'HTTP Vocabulary in RDF 1.0' [16] and 'Representing Content in RDF 1.0' [17]. Thus, we can assume that an HTTP client that can consume an HTTP Vocabulary and Representing Content in RDF 1.0 description to create an HTTP request can use a RESTful web service and obtain as result a structured document. Although not strictly qualifying as RESTful web services, we include in this category also URIs that simply deliver structured documents (e.g. URIs to static JSON/XML files), since the communication is performed in exactly the same way, through HTTP messages.

A practical consideration related to some RESTful web services is that the APIs implementing them, in order to avoid extremely long replies, paginate the results and do not return the full set of results as one document, but as a series of smaller documents: in most cases, each returned document contains some keys that can be used by the client in the subsequent request to instruct the server to return the next set of results. The pagination schema may get non-trivial, as in the case of MediaWiki^18.

^18 https://www.mediawiki.org/wiki/API:Query
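A client for such a paginated API typically loops, extracting the next-page key from each reply and substituting it into the next request. The sketch below (ours, using the third-party requests library, with parameter and field names taken from the Europeana example of Section 6) shows the general pattern that a D2RML simple key request iterator captures declaratively:

    # Illustrative client-side pagination loop (Section 4.2): each reply carries
    # a key (here "nextCursor") that parameterizes the next request.
    import requests  # third-party; pip install requests

    def fetch_all_pages(base_url: str, params: dict, cursor_param: str = "cursor",
                        initial_value: str = "*", key_field: str = "nextCursor"):
        items, cursor = [], initial_value
        while cursor is not None:
            reply = requests.get(base_url, params={**params, cursor_param: cursor}).json()
            items.extend(reply.get("items", []))
            cursor = reply.get(key_field)     # None when there is no further page
        return items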
4.3 SPARQL Endpoints
SPARQL endpoints are URIs at which a SPARQL Protocol service listens [10]. The SPARQL Protocol is built on top of HTTP and, as such, it can be treated as a RESTful web service. However, since special SPARQL Protocol clients exist in the form of APIs (e.g. Apache Jena^19) that hide from the user the cumbersome details of building and decoding the necessary HTTP request and reply messages, it is useful to provide support also for this type of interaction. The situation is similar to the RDBMS case: the request is a SPARQL SELECT query (possibly along with some default and named RDF graph IRIs) instead of an SQL SELECT query, and the effective data source is a result set whose column names are the return variable names specified in the SPARQL query. Thus, the translation of the reply to a set table is done in exactly the same way. The only essential thing that changes is the specification of the access to the SPARQL endpoint, for which a single URI is enough.

^19 https://jena.apache.org/

4.4 JSON Documents
A JSON document [15] may be modeled as a JSON tree [4]. A JSON tree is an edge-labeled tree whose root represents the entire document. A node may have either string- or integer-labeled children, but not both. A node with string-labeled outgoing edges represents a set of JSON key-value pairs: the edge label is the key and the edge destination the corresponding value. A node with integer-labeled outgoing edges represents an array: the edge label is the array index and the edge destination the corresponding value. Value nodes are either leaf nodes having a string or integer label, or JSON trees.

In the absence of an official standard, to select values from a JSON document that meet specific conditions, in practice the JSONPath [12] specification is used, which is inspired by XPath. JSONPath queries select nodes of a JSON tree that meet a certain path condition, and group them into a JSON array, which is the result of the query. Since a JSON array is a JSON document, the result of a JSONPath query is always a JSON document. We will say that a JSONPath query is flat if the result JSON tree has depth 1, i.e. if it is an array of simple values.

Hence, an iterator for a JSON tree T is any relevant JSONPath query q, which splits T into a logical array of smaller JSON trees T1, ..., Tn, and the selectors are flat JSONPath queries q1, ..., qk that are executed over each T1, ..., Tn to deliver a set table from the underlying logical array. Thus T, after applying iterator q and selectors q1, ..., qk, yields the set table ⟨⟨q1, ..., qk⟩, [⟨C11, ..., C1k⟩, ..., ⟨Cn1, ..., Cnk⟩]⟩, where Cij is the set of values contained in the array that results from applying qj on Ti.

4.5 XML Documents
An XML document may also be modeled using a tree [5]; however, its structure differs from that of a JSON tree. The core part of an XML document is represented in the tree by element, attribute and text nodes. Each element node corresponds to an element of the XML document and has a name (the element name) and children that are all the enclosed elements. It may also have as child a text node that holds in its string value the characters in the CDATA section of the element. Each element node may also have associated with it a set of attribute nodes that represent the attributes of the element, which, however, are not considered to be children of the element node. Each attribute node has a name (the attribute name) and a string value that holds the respective attribute value. Relying on this model, the XPath language allows the selection of particular nodes from the tree that meet certain conditions. Unlike in the case of JSON, the result is not itself an XML document, but a set of the nodes that match the query criteria. We will say that an XPath query is flat if the result contains only text or attribute nodes.

Hence, we can consider as iterator for an XML document tree T any relevant non-flat XPath query q that splits T into a logical array of nodes N1, ..., Nn. Since the query is non-flat, these nodes are element nodes and can be treated as smaller XML document trees T1, ..., Tn. The selectors are then flat XPath queries q1, ..., qk that are executed over each one of these smaller XML documents. Thus, T after applying iterator q and selectors q1, ..., qk yields the set table ⟨⟨q1, ..., qk⟩, [⟨C11, ..., C1k⟩, ..., ⟨Cn1, ..., Cnk⟩]⟩, where Cij are the string values of the text or attribute nodes in the node set obtained by applying qj on Ti.
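The sketch below (ours) mirrors this iterator/selector mechanics with Python's standard xml.etree.ElementTree, whose findall supports only a subset of XPath; the document shape follows the DBpedia Spotlight reply used in Section 6, but the Resource URIs are invented illustrative values:

    # Illustrative XML iterator/selector step (Section 4.5): the iterator query
    # yields a logical array of Resource elements, and a flat attribute selector
    # (@URI) yields one value set per element.
    import xml.etree.ElementTree as ET

    doc = """<Annotation text="..." confidence="0.5" support="0">
               <Resources>
                 <Resource URI="http://dbpedia.org/resource/Scotland_Yard"/>
                 <Resource URI="http://dbpedia.org/resource/Buckingham_Palace"/>
               </Resources>
             </Annotation>"""

    root = ET.fromstring(doc)
    # D2RML would use the absolute XPath iterator "/Annotation/Resources/Resource";
    # ElementTree accepts the equivalent path relative to the root element.
    items = root.findall("Resources/Resource")
    selectors = ["URI"]                    # flat selectors (attribute names here)
    set_table = (tuple(selectors),
                 [tuple(frozenset({item.get(sel)}) for sel in selectors) for item in items])
    print(set_table)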
4.6 CSV/TSV Documents
CSV/TSV documents are textual representations of tabular data. Each line represents a data row, except possibly for the first row, which may contain the names of the columns. Hence, the situation is similar to the RDBMS case, with no need for a query to be specified. The name tuple consists of the names of the columns in the file (or of their numbering), and the row sets of the actual rows of the table. The only thing that needs to be specified are the formatting details (e.g. delimiter, escape character, quote character).

5 D2RML SPECIFICATION
D2RML draws significantly from R2RML and RML, and follows the same simple syntactical strategy for defining mappings: triples maps, which consist of a subject map and several predicate-object maps. From RML it adopts, and appropriately extends, the way to define the interaction with information sources through requests, iterators and selectors. Moreover, it significantly extends the expressive capabilities of R2RML and RML by allowing transformations, conditional statements, and custom IRI generation functions.

For its semantics, D2RML relies on the data model described in Section 3. Each triples map is essentially a set table specification (Def. 3.5) and a specification of a set of triples rules (Def. 3.3) with the same subject filter over the common underlying set table. The information source, request and iterator of the original data acquisition pipeline are directly provided in the triples map definition. Any transformations to be added to the set table specification are declared in the order of their application. The selectors are implicitly declared in the subject, predicate, object and graph maps. Several triples maps are allowed to coexist in a D2RML document, in which case several distinct set tables are generated.

We define D2RML using a BNF-like notation. Terminal symbols are written in monospace, and non-terminals in italics. Non-terminals within angle brackets represent RDF nodes. Parentheses specify the scope of alternatives (separated by |) and of the standard quantifiers ?, *, and +. Terminal symbols not explicitly defined in the specification are written in small caps. The namespaces are defined in Table 3. D2RML is compatible with R2RML, but not fully compatible with RML, so it does not directly extend its namespace.

Table 3: Namespaces used in D2RML documents
  Prefix | IRI
  rr     | http://www.w3.org/ns/r2rml#
  dr     | http://islab.ntua.gr/ns/d2rml#
  op     | http://islab.ntua.gr/ns/d2rml-op#
  is     | http://islab.ntua.gr/ns/d2rml-is#
  http   | http://www.w3.org/2011/http#
  cnt    | http://www.w3.org/2011/content#

5.1 Triples Maps
A triples map is defined as in R2RML and RML, but information sources providing tabular data are clearly distinguished from non-tabular ones: rr:logicalTable is used for the former and dr:logicalSource for the rest.

  TriplesMap ← a rr:TriplesMap
      rr:logicalTable ⟨LogicalTable⟩ | dr:logicalSource ⟨LogicalSource⟩
      (dr:transformations ( ⟨Transformation⟩+ ))?
      rr:subjectMap ⟨SubjectMap⟩ | rr:subject iri
      (rr:predicateObjectMap ⟨PredObjMap⟩)*

  PredObjMap ← a rr:PredicateObjectMap
      (rr:predicateMap ⟨PredicateMap⟩ | rr:predicate iri)+
      (rr:objectMap (⟨ObjectMap⟩ | ⟨RefObjectMap⟩) | rr:object (iri | literal))+
      (rr:graphMap ⟨GraphMap⟩ | rr:graph iri)*

5.2 Logical Tables and Logical Sources
The LogicalTable and LogicalSource nodes provide details about the primary information source used to generate the set table. In the case of query-supporting information sources (such as RDBMSs and SPARQL endpoints), for backward compatibility with R2RML, they also contain the query-relevant details of the request that should be sent to the information source. The is:parameters predicate may be used to declare parameter names in queries that participate in parametric data acquisition pipelines. For other information sources (such as RESTful web services), the request, and any parameters, are included in the InformationSource specification itself. For information sources providing non-tabular data, LogicalSource also contains the definition of the iterator (dr:iterator and dr:referenceFormulation) that will be used to split the effective data source into a logical array. Since the effective data source format is fixed, the object of dr:referenceFormulation also determines the form of all selectors that will be applied on the particular effective data source.

  LogicalTable ← a rr:LogicalTable
      dr:source ⟨InformationSource⟩
      SQLTable | SPARQLTable | CSVTable
      (is:parameters ( ⟨DataVariable⟩+ ))?

  LogicalSource ← a dr:LogicalSource
      dr:source ⟨InformationSource⟩
      dr:iterator literal
      dr:referenceFormulation iri

  SQLTable ← a rr:BaseTableOrView
        rr:tableName literal
    | a rr:R2RMLView
        rr:sqlQuery literal
        (rr:sqlVersion iri)?

  SPARQLTable ← a dr:SPARQLTable
      dr:sparqlQuery literal
      (dr:sparqlVersion iri)?
      (dr:defaultGraph iri)*
      (dr:namedGraph iri)*

  CSVTable ← a dr:TextTable
      dr:delimiter literal
      dr:headerline boolean
      (dr:quoteCharacter literal)?
      (dr:commentCharacter literal)?
      (dr:escapeCharacter literal)?
      (dr:recordSeparator literal)?
5.3 Information Sources
The version of D2RML presented here provides definitions for implementing data acquisition pipelines involving RDBMSs, RESTful web services and SPARQL endpoints. Extensions for additional sources are expected in subsequent versions.

  InformationSource ← RDBMSSource | SPARQLService | HTTPSource

  RDBMSSource ← a is:RDBMSSource
      is:rdbms iri
      is:location literal
      (is:username literal)?
      (is:password literal)?
      (is:database literal)?

  SPARQLService ← a is:SPARQLService
      is:uri uri

  HTTPSource ← a is:HTTPSource
      is:request ⟨HTTPRequest⟩ | is:uri uri
      (is:parameters ( ⟨Parameter⟩+ ))?

  Parameter ← DataVariable | SimpleKeyRequestIterator

  DataVariable ← a is:DataVariable
      is:name literal

  SimpleKeyRequestIterator ← a is:SimpleKeyRequestIterator
      is:name literal
      dr:reference literal
      dr:referenceFormulation literal
      is:initialValue literal

In an RDBMSSource, is:rdbms determines the specific RDBMS (e.g. MySQL, PostgreSQL). An HTTPSource is specified in terms of an HTTPRequest, which should be an http:Request and specify the details of the HTTP message to be sent. An HTTPSource may contain parameters in case the web service is part of a parametric data acquisition pipeline, or in case it paginates the results. Data parameters are identified by a name (is:name). For paginated results, the above specification allows, as an example, iterated requests through a request iterator that should be part, e.g., of the web service URI and whose values, apart from the initial value (is:initialValue), are extracted each time from the previous reply using a selector. Extensions are possible to support additional pagination policies.

5.4 Transformations
A triples map definition may include a list of transformations that should be applied, in the declared order, to the set table derived from the primary information source. Since a transformation is itself a parametric data acquisition pipeline, its definition includes the specification of an InformationSource, through a rr:logicalTable or dr:logicalSource, and one or more ParameterBindings. A ParameterBinding consists of a reference to a value (ValueRef) or a constant value, and the parameter name (dr:parameter) in the corresponding information source to which the value will be bound.

  Transformation ← a dr:Transformation
      rr:logicalTable ⟨LogicalTable⟩ | dr:logicalSource ⟨LogicalSource⟩
      (dr:parameterBinding ⟨ParameterBinding⟩)+

  ParameterBinding ← a dr:ParameterBinding
      dr:parameter literal
      rr:constant literal | ValueRef
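In the D2RML documents of Section 6, parameters appear inside request templates as {@@name@@} placeholders. A processor evaluating a parametric HTTPSource therefore has to substitute the bound (URL-encoded) values into the template, once per combination of parameter values. A minimal sketch of that substitution step, assuming this placeholder convention, is:

    # Illustrative binding of parameter values into an HTTP request template
    # (Sections 5.3 and 5.4); the {@@name@@} placeholders follow the D2RML
    # documents shown in Section 6.
    from itertools import product
    from urllib.parse import quote

    def bind(template: str, bindings: dict) -> list:
        """Return one concrete request URI per combination of parameter values."""
        names = list(bindings)
        uris = []
        for combo in product(*(sorted(bindings[n]) for n in names)):
            uri = template
            for name, value in zip(names, combo):
                uri = uri.replace("{@@" + name + "@@}", quote(str(value), safe=""))
            uris.append(uri)
        return uris

    # bind("http://model.dbpedia-spotlight.org/en/annotate?text={@@text@@}&confidence=0.5",
    #      {"text": {"Former chief inspector Berrett decorated by the king."}})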
5.5 Term Maps and Conditions
The definitions of term maps (i.e. of subject maps, graph maps, predicate maps and object maps) follow the R2RML specification, with the addition of filters.

  SubjectMap ← a rr:SubjectMap
      IRIRef | BlankNodeRef
      (SubjectBody CaseSubjectBody*) | CaseSubjectBody+

  PredicateMap ← a rr:PredicateMap
      (PredicateBody CasePredBody*) | CasePredBody+

  ObjectMap ← a rr:ObjectMap
      (ObjectBody CaseObjectBody*) | CaseObjectBody+

  GraphMap ← a rr:GraphMap
      (GraphBody CaseGraphBody*) | CaseGraphBody+

  SubjectBody ← (rr:class IRI)*
      (rr:graphMap ⟨GraphMap⟩ | rr:graph IRI)*
      (dr:condition ⟨Condition⟩)?

  PredicateBody ← IRIRef
      (dr:condition ⟨Condition⟩)?

  ObjectBody ← IRIRef | BlankNodeRef | LiteralRef
      (dr:condition ⟨Condition⟩)?

  GraphBody ← IRIRef
      (dr:condition ⟨Condition⟩)?

  CaseSubjectBody ← dr:cases ( ⟨SubjectBody⟩+ )
  CasePredBody ← dr:cases ( ⟨PredicateBody⟩+ )
  CaseObjectBody ← dr:cases ( ⟨ObjectBody⟩+ )
  CaseGraphBody ← dr:cases ( ⟨GraphBody⟩+ )

  Condition ← (ValueRef)?
      (dr:booleanOperator iri)?
      (operator literal | dr:operand ⟨Condition⟩)+

  RefObjectMap ← a rr:RefObjectMap
      rr:parentTriplesMap ⟨TriplesMap⟩
      ((rr:joinCondition ⟨JoinCondition⟩)+ | (dr:parameterBinding ⟨ParameterBinding⟩)+)?

  JoinCondition ← a rr:Join
      rr:child literal
      rr:parent literal

To support filters, a SubjectMap, GraphMap, PredicateMap or ObjectMap may contain a condition (dr:condition) and/or a case statement (dr:cases). If a term map contains a condition statement, the condition will be evaluated and the corresponding subject, graph, predicate or object value will be included in the respective value set only if the condition evaluates to true. Each condition statement should first specify the actual value on which it will operate (as a ValueRef), and may include several tests, which will be jointly evaluated using the boolean operator specified by dr:booleanOperator (op:and or op:or). Each test is specified either through an operator and a literal, which define a constant value with which the actual value will be compared using operator, or as a nested condition. An operator is a common operator such as op:eq, op:le, op:leq, op:ge, op:geq, op:matches, etc. The type of the operation (e.g. number or string comparison) depends on the XSD type of the literal provided as operand. If a nested condition does not specify a value reference, it inherits it from the enclosing condition.

The case statement offers alternatives for realizing a term map: it contains a list of alternative term maps, each along with a condition. If the condition evaluates to true, the term map is realized; otherwise control flows to the next case.

Finally, a referencing object map (RefObjectMap) may be defined by a ParameterBinding instead of an R2RML JoinCondition. This is how set table specifications with parametric primary data acquisition pipelines are defined: the parametric set table specification corresponds to the parent triples map of the RefObjectMap, and the ParameterBinding provides the parameter values.

5.6 IRIs, Literals and Blank Nodes
In R2RML, RDF terms are generated using the rr:constant, rr:column and rr:template predicates; to these, RML adds the rml:reference option. D2RML follows the same strategy but, to account for values coming from transformations, RDF terms are generated through value references (ValueRefs), specified by two distinct components: a compulsory rr:column, rr:template or dr:reference, and an optional dr:transformationReference that specifies the transformation providing the logical array for the respective rr:column, rr:template or dr:reference. If it is missing, the primary logical array is assumed.

Although rr:template allows some minimal flexibility in defining custom IRIs or literals, the overall mechanism is quite restrictive, since no simple transformations (e.g. replacing particular characters) can be applied on the values obtained from the underlying set tables. D2RML addresses this issue by allowing simple functions to be applied on the raw values obtained from effective data sources. Thus, a ValueRef may include definitions of one or more defined columns (dr:definedColumns), which are constructed by applying a series of functional transformations on particular set table column values and may be used in a rr:column or rr:template. A defined column should declare the new column name (dr:name) by which it will be referred, the function (dr:function) that will generate the custom values (e.g. op:regex, op:replace), and a list of arguments, in the form of one or more dr:parameterBindings. The parameter names should be provided by the function definition.

  IRIRef ← rr:constant iri | ValueRef
      (rr:termType rr:IRI)?

  LiteralRef ← rr:constant literal | ValueRef
      (rr:termType rr:Literal)?
      (rr:language literal | rr:datatype iri)?

  BlankNodeRef ← ValueRef
      (rr:termType rr:BlankNode)?

  ValueRef ← rr:column literal | rr:template literal | dr:reference literal
      (dr:transformationReference ⟨Transformation⟩)?
      (dr:definedColumns ( ⟨DefinedColumn⟩+ ))?

  DefinedColumn ← a dr:DefinedColumn
      dr:name literal
      dr:function iri
      (dr:parameterBinding ⟨ParameterBinding⟩)+
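To fix intuitions about how a processor might evaluate the condition statements of Section 5.5, the following sketch (ours, covering only a few of the operators named above, and assuming that op:matches requires the whole value to match) checks a single value against a flat list of tests combined with op:and or op:or:

    # Illustrative evaluation of a flat dr:condition (Section 5.5): a value is
    # tested against (operator, operand) pairs combined with op:and / op:or.
    import re

    OPERATORS = {
        "op:eq":      lambda value, operand: value == operand,
        "op:geq":     lambda value, operand: float(value) >= float(operand),
        "op:leq":     lambda value, operand: float(value) <= float(operand),
        "op:matches": lambda value, operand: re.fullmatch(operand, value) is not None,
    }

    def evaluate(value: str, tests, boolean_operator: str = "op:and") -> bool:
        results = (OPERATORS[op](value, operand) for op, operand in tests)
        return all(results) if boolean_operator == "op:and" else any(results)

    # Mirrors the score filter of Section 6:
    # evaluate("0.578125", [("op:geq", "0.4")])  ->  True, so the object value is kept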
6 USE CASE
In this section, we present a realistic use case for D2RML, involving real data and readily available web services and data repositories. The aim is to extract an extensive set of textual or URI features for a set of cultural items, in order to subsequently use them to perform several tasks, such as clustering and similarity ranking. We assume that we want to extract features in several ways (e.g. directly from the metadata, by applying named entity extraction, image analysis, etc.), and that we want to keep information about the source of each feature, so that we can use the features selectively to test how they affect the performance of the clustering or similarity algorithm.

As primary information source of cultural items we use Europeana Collections^20, in particular the collection provided by TopFoto^21, which consists of 60,882 black and white images of the 1930s, along with their metadata. This collection can be obtained through the Europeana API. The D2RML specification for getting the effective data source for this collection is the following:

  <#EuropeanaAPI>
     a is:HTTPSource ;
     is:request [
        http:absoluteURI "http://www.europeana.eu/api/v2/search.json?wskey=A*******W&rows=20&cursor={@@cursor@@}&profile=rich&query=europeana_collectionName%3A%222024904_Ag_EU_EuropeanaPhotography_TopFoto_1013%22" ;
        http:methodName "GET" ; ] ;
     is:parameters ( [
        a is:SimpleKeyRequestIterator ;
        is:name "cursor" ;
        is:initialValue "*" ;
        dr:reference "$.nextCursor" ;
        dr:referenceFormulation is:JSONPath ; ] ) .

^20 https://www.europeana.eu/portal/en
^21 http://www.topfoto.co.uk/

The specification includes an is:SimpleKeyRequestIterator as parameter, because the API returns the results in pages, and each page contains a key for accessing the next page (nextCursor). An extract from the response obtained by executing the above is the following JSON document, which contains a list of items modeled using the Europeana Data Model (EDM):

  {
    "nextCursor": "AoE/GC8yMDI0OTA0L3Bob3Rv****=",
    "items": [
      {
        "id": "/2024904/photography_ProvidedCHO_TopFoto_co_uk_EU061905",
        "dcDescription": [ "Former chief inspector Berrett decorated by the king.\n Former chief detective inspector James Berrett of Scotland Yard was decorated by the King at the royal invesititure at Buckingham Palace. " ],
        "edmIsShownBy": [ "http://www.topfoto.co.uk/imageflows/imagepreview/f=EU061905" ],
        "edmConcept": [ "http://bib.arts.kuleuven.be/photoVocabulary/12003",
                        "http://data.europeana.eu/concept/base/1711" ],
        "type": "IMAGE"
      }, ...
    ]
  }

Most fields are self-explanatory. edmConcept contains a list of Linked Open Data resources that have been associated with each item by the provider to characterize the respective item content. To generate RDF triples for this information, as well as for the type of each item, we define the following D2RML document:

  <#EuropeanaMapping>
     dr:logicalSource [
        dr:source <#EuropeanaAPI> ;
        dr:iterator "$.items" ;
        dr:referenceFormulation is:JSONPath ; ] ;
     rr:subjectMap [
        dr:definedColumns ( [
           dr:name "SID" ;
           dr:function op:extractMatch ;
           dr:parameterBinding [
              dr:parameter "input" ;
              dr:reference "$.id" ; ] ;
           dr:parameterBinding [
              dr:parameter "regex" ;
              rr:constant "^.*_(.*)$" ; ] ; ] ) ;
        rr:template "http://islab.ntua.gr/resources/tp/{SID}" ;
        dr:cases ( [
           rr:class <...> ;
           dr:condition [
              dr:reference "$.type" ;
              op:eq "IMAGE"^^xsd:string ; ] ;
        ] [
           rr:class <...> ;
        ] ) ; ] ;
     rr:predicateObjectMap [
        rr:predicate <...> ;
        rr:objectMap [
           dr:reference "$.edmConcept" ;
           rr:termType rr:IRI ; ] ;
     ] .

Note the use of a defined column to construct custom RDF subject IRIs. The particular defined column applies the regular expression ^.*_(.*)$ on the id field of each item and uses the value of the first capturing group, named SID. The above specification generates the following RDF triples for the first item:

  <http://islab.ntua.gr/resources/tp/EU061905> rdf:type <...> .
  <http://islab.ntua.gr/resources/tp/EU061905> <...> <http://bib.arts.kuleuven.be/photoVocabulary/12003> .
  <http://islab.ntua.gr/resources/tp/EU061905> <...> <http://data.europeana.eu/concept/base/1711> .
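For the example item above, the SID defined column behaves like the following snippet (ours; op:extractMatch is assumed here to return the first capturing group of the match):

    # Illustrative behaviour of the SID defined column (op:extractMatch) on the
    # id of the example item; the subject IRI is then built from the template.
    import re

    item_id = "/2024904/photography_ProvidedCHO_TopFoto_co_uk_EU061905"
    sid = re.match(r"^.*_(.*)$", item_id).group(1)       # -> "EU061905"
    subject = "http://islab.ntua.gr/resources/tp/{SID}".replace("{SID}", sid)
    print(subject)  # http://islab.ntua.gr/resources/tp/EU061905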
Since we want to extract several features, we can invoke services to analyze the metadata. One option is to use DBpedia Spotlight to extract named entities from the textual descriptions. To do this, we need a transformation that takes the description of each item (dcDescription) and invokes DBpedia Spotlight on it. We first define the relevant information source:

  <#DBpediaSpotlightAPI>
     a is:HTTPSource ;
     is:request [
        http:absoluteURI "http://model.dbpedia-spotlight.org/en/annotate?text={@@text@@}&confidence=0.5&support=0&spotter=Default&disambiguator=Default&policy=whitelist&types=&sparql=" ;
        http:methodName "GET" ;
        http:headers ( [
           http:fieldName "Accept" ;
           http:fieldValue "application/xml" ; ] ) ; ] ;
     is:parameters ( [
        a is:DataVariable ;
        is:name "text" ; ] ) .

The respective effective data source has the following XML format

  <Annotation text="Former chief inspector Berrett decorated by the king. ... Palace." confidence="0.5" support="0" types="" sparql="" policy="whitelist">
     <Resources>
        <Resource URI="..." ... />
        ...
     </Resources>
  </Annotation>

which includes all detected named entities (Resource) as DBpedia resources (URI). We next define the transformation

  <#SpotlightTransformation>
     dr:logicalSource [
        dr:source <#DBpediaSpotlightAPI> ;
        dr:iterator "/Annotation/Resources/Resource" ;
        dr:referenceFormulation is:XPath ; ] ;
     dr:parameterBinding [
        dr:parameter "text" ;
        dr:reference "$.dcDescription" ; ] .

and add the transformation and a new predicate object map to the <#EuropeanaMapping> triples map:

  <#EuropeanaMapping>
     ...
     dr:transformations ( <#SpotlightTransformation> ) ;
     rr:predicateObjectMap [
        rr:predicate <...> ;
        rr:objectMap [
           dr:reference "/Resource/@URI" ;
           dr:transformationReference <#SpotlightTransformation> ;
           rr:termType rr:IRI ; ] ;
     ] .

When executed, it generates the following additional triples:

  <http://islab.ntua.gr/resources/tp/EU061905> <...> <...> .
  <http://islab.ntua.gr/resources/tp/EU061905> <...> <...> .

We further extend the set of features by using the DBpedia ontology to get the types of the retrieved DBpedia resources. For this we need a second transformation, dependent on the first one, that consults a DBpedia endpoint. The information source definition is

  <#DBpediaSPARQLService>
     a is:SPARQLService ;
     is:uri "http://dbpedia.org/sparql" .

and the transformation

  <#DBpediaTransformation>
     dr:logicalSource [
        dr:source <#DBpediaSPARQLService> ;
        dr:query "SELECT ?dbpediatype WHERE { <{@@resource@@}> a ?dbpediatype }" ;
        is:parameters ( [
           a is:DataVariable ;
           is:name "resource" ; ] ) ; ] ;
     dr:parameterBinding [
        dr:parameter "resource" ;
        dr:reference "/Resource/@URI" ;
        dr:transformationReference <#SpotlightTransformation> ; ] .

Finally, we modify <#EuropeanaMapping> to add the new transformation and a new predicate object map:

  <#EuropeanaMapping>
     ...
     dr:transformations ( <#SpotlightTransformation> <#DBpediaTransformation> ) ;
     rr:predicateObjectMap [
        rr:predicate <...> ;
        rr:objectMap [
           ...
           rr:termType rr:IRI ;
           dr:condition [
              op:matches "http://dbpedia\\.org/ontology/.*" ; ] ; ] ;
     ] .

Note that the mapping includes a conditional statement. It has been included because the query returns not only DBpedia ontology concepts as types, but also FOAF, YAGO, Schema, Wikidata, and other resources, which we do not want to include in our results. Eventually, this map generates the following RDF triples:

  <http://islab.ntua.gr/resources/tp/EU061905> <...> <...> .
  ...

Finally, we can use computer vision technologies to analyze the image of each item (the URI is provided by the edmIsShownBy field in the document returned by the Europeana API) to detect objects that appear in it. To this end we use Microsoft's Computer Vision API, which is offered as a RESTful web service. Thus, we add a new information source, including the required request parameters:

  <#ComputerVisionAPI>
     a is:HTTPSource ;
     is:request [
        http:absoluteURI "https://westcentralus.api.cognitive.microsoft.com/vision/v1.0/analyze?visualFeatures=Categories&language=en" ;
        http:methodName "POST" ;
        http:headers ( [
           http:fieldName "Content-Type" ;
           http:fieldValue "application/json" ;
        ] [
           http:fieldName "Ocp-Apim-Subscription-Key" ;
           http:fieldValue "3*************************b" ; ] ) ;
        http:body [
           a cnt:ContentAsText ;
           cnt:chars "{\"url\" : \"{@@imageURL@@}\" }" ; ] ; ] ;
     is:parameters ( [
        a is:DataVariable ;
        is:name "imageURL" ; ] ) .

which produces the following JSON-formatted effective data source:

  {
    "categories": [ {
        "name": "people_group",
        "score": 0.578125
    } ],
    "requestId": "3b28df72-abf5-488c-86f4-b2c6a7eb9703"
  }
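An HTTP client realizing this POST request, as described by <#ComputerVisionAPI>, would look roughly as follows (our sketch, using the third-party requests library; the subscription key stays masked as in the listing above):

    # Illustrative realization of the <#ComputerVisionAPI> request (Section 6).
    import requests  # third-party; pip install requests

    def analyze_image(image_url: str, subscription_key: str) -> dict:
        endpoint = ("https://westcentralus.api.cognitive.microsoft.com/"
                    "vision/v1.0/analyze?visualFeatures=Categories&language=en")
        headers = {"Content-Type": "application/json",
                   "Ocp-Apim-Subscription-Key": subscription_key}
        # the JSON body corresponds to cnt:chars, with {@@imageURL@@} bound to
        # the edmIsShownBy value of the current item
        reply = requests.post(endpoint, headers=headers, json={"url": image_url})
        return reply.json()

    # analyze_image("http://www.topfoto.co.uk/imageflows/imagepreview/f=EU061905", "3...b")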
Based on this, we define the transformation

  <#ImageTransformation>
     dr:logicalSource [
        dr:source <#ComputerVisionAPI> ;
        dr:iterator "$.categories" ;
        dr:referenceFormulation is:JSONPath ; ] ;
     dr:parameterBinding [
        dr:parameter "imageURL" ;
        dr:reference "$.edmIsShownBy" ; ] .

to generate a logical array from categories that contains the names of the detected objects, and modify <#EuropeanaMapping> by adding the new transformation and a new predicate object map:

  <#EuropeanaMapping>
     ...
     dr:transformations ( <#SpotlightTransformation> <#DBpediaTransformation> <#ImageTransformation> ) ;
     rr:predicateObjectMap [
        rr:predicate <...> ;
        rr:objectMap [
           dr:reference "$.name" ;
           dr:transformationReference <#ImageTransformation> ;
           rr:termType rr:Literal ;
           dr:condition [
              dr:reference "$.score" ;
              dr:transformationReference <#ImageTransformation> ;
              op:geq "0.4"^^xsd:decimal ; ] ; ] ;
     ] .

The above object map applies a filter in order to keep only objects that have been detected with relatively high confidence (score). Eventually, the above map adds the following RDF triple:

  <http://islab.ntua.gr/resources/tp/EU061905> <...> "people_group" .

The RDF triples generated by all the above predicate-object maps make up the desired RDF graph. In terms of performance, for executing the above D2RML document, our implementation of a D2RML processor^22 took about 7 minutes per 100 Europeana items.

^22 Available as a web service at http://apps.islab.ntua.gr/d2rml/

7 CONCLUSIONS
We presented D2RML, a Data-to-RDF mapping language which, based on an abstract data model, allows the orchestrated retrieval of data from several information sources, their transformation and extension using relevant web services, their filtering and manipulation using simple operations, and finally their mapping to RDF graphs. It combines the mapping approach of R2RML and RML with workflow approaches, by allowing the definition of easy to write and understand, homogeneous views of the underlying data and services in a lightweight document. We developed D2RML on top of a formal abstract data model, so as to formally define its semantics and allow future extensions. We also presented a realistic use case, which demonstrates the capabilities of the proposed language in real settings, by delivering unified and coordinated access to Linked Data stores and other services in a clean specification, without the need for code writing or heavyweight solutions.

ACKNOWLEDGEMENTS
We acknowledge support of this work by the project 'APOLLONIS' (MIS 5002738), which is implemented under the Action 'Reinforcement of the Research and Innovation Infrastructure', funded by the Operational Programme 'Competitiveness, Entrepreneurship and Innovation' (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).
REFERENCES
[1] Marcelo Arenas, Alexandre Bertails, Eric Prud'hommeaux, and Juan Sequeda. 2012. A Direct Mapping of Relational Data to RDF. https://www.w3.org/TR/rdb-direct-mapping/
[2] Stefan Bischof, Stefan Decker, Thomas Krennwallner, Nuno Lopes, and Axel Polleres. 2012. Mapping between RDF and XML with XSPARQL. J. Data Semantics 1, 3 (2012), 147-185.
[3] Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, and Ruslan Velkov. 2011. OWLIM: A family of scalable semantic repositories. Semantic Web 2, 1 (2011), 33-42.
[4] Pierre Bourhis, Juan L. Reutter, Fernando Suárez, and Domagoj Vrgoc. 2017. JSON: Data model, Query languages and Schema specification. In PODS. ACM, 123-135.
[5] James Clark and Steve DeRose. 2016. XML Path Language (XPath) Version 1.0. https://www.w3.org/TR/xpath/
[6] Dan Connolly. 2007. Gleaning Resource Descriptions from Dialects of Languages (GRDDL). https://www.w3.org/TR/grddl/
[7] Richard Cyganiak, Chris Bizer, Jörg Garbers, Oliver Maresch, and Christian Becker. 2012. The D2RQ Mapping Language. http://d2rq.org/d2rq-language
[8] Souripriya Das, Seema Sundara, and Richard Cyganiak. 2012. R2RML: RDB to RDF Mapping Language. https://www.w3.org/TR/r2rml/
[9] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. 2014. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In LDOW (CEUR Workshop Proceedings), Vol. 1184. CEUR-WS.org.
[10] Lee Feigenbaum, Gregory Todd Williams, Kendall Grant Clark, and Elias Torres. 2013. SPARQL 1.1 Protocol. https://www.w3.org/TR/sparql11-protocol/
[11] Roy T. Fielding and Richard N. Taylor. 2000. Principled design of the modern Web architecture. In ICSE. ACM, 407-416.
[12] Stefan Gössner and Stephen Frank. 2007. JSONPath. http://goessner.net/articles/JsonPath/
[13] Oktie Hassanzadeh, Soheil Hassas Yeganeh, and Renée J. Miller. 2011. Linking Semistructured Data on the Web. In WebDB.
[14] Matthias Hert, Gerald Reif, and Harald C. Gall. 2011. A comparison of RDB-to-RDF mapping languages. In I-SEMANTICS (ACM International Conference Proceeding Series). ACM, 25-32.
[15] Internet Engineering Task Force (IETF). 2014. The JavaScript Object Notation (JSON) Data Interchange Format. https://tools.ietf.org/html/rfc7159
[16] Johannes Koch, Carlos A. Velasco, and Philip Ackermann. 2017. HTTP Vocabulary in RDF 1.0. https://www.w3.org/TR/HTTP-in-RDF10/
[17] Johannes Koch, Carlos A. Velasco, and Philip Ackermann. 2017. Representing Content in RDF 1.0. https://www.w3.org/TR/Content-in-RDF10/
[18] Andreas Langegger and Wolfram Wöß. 2009. XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL. In International Semantic Web Conference (Lecture Notes in Computer Science), Vol. 5823. Springer, 359-374.
[19] Franck Michel, Loïc Djimenou, Catherine Faron Zucker, and Johan Montagnat. 2014. xR2RML: Non-Relational Databases to RDF Mapping Language. https://hal.inria.fr/hal-01066663v1/document
[20] Boris Motik, Peter F. Patel-Schneider, and Bijan Parsia. 2012. OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax (Second Edition). https://www.w3.org/TR/owl2-syntax/
[21] Yavor Nenov, Robert Piro, Boris Motik, Ian Horrocks, Zhe Wu, and Jay Banerjee. 2015. RDFox: A Highly-Scalable RDF Store. In International Semantic Web Conference (2) (Lecture Notes in Computer Science), Vol. 9367. Springer, 3-20.
[22] Martin J. O'Connor, Christian Halaschek-Wiener, and Mark A. Musen. 2010. M2: A Language for Mapping Spreadsheets to OWL. In OWLED (CEUR Workshop Proceedings), Vol. 614. CEUR-WS.org.
[23] Jason Slepicka, Chengye Yin, Pedro A. Szekely, and Craig A. Knoblock. 2015. KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources. In COLD (CEUR Workshop Proceedings), Vol. 1426. CEUR-WS.org.