D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs

Alexandros Chortaras
National Technical University of Athens
Athens, Greece
achort@cs.ntua.gr

Giorgos Stamou
National Technical University of Athens
Athens, Greece
gstam@cs.ntua.gr

ABSTRACT
In this paper, we present the D2RML Data-to-RDF Mapping Language, an extension of the R2RML mapping language that significantly enhances its ability to collect data from diverse data sources and transform them into custom RDF graphs. The definition of D2RML is based on a simple formal abstract data model, which is needed to clearly define its semantics, given the diverse types of data representation standards used in practice. D2RML allows web service-based data transformations, simple data manipulation and filtering, and conditional maps, so as to improve the selectivity of RDF mapping rules and facilitate the generation of higher quality RDF data stores, through a lightweight, easy to write and modify specification.

CCS CONCEPTS
• Information systems → Information integration; Web data description languages; Query languages; Web services;

KEYWORDS
RDF mapping language, Data integration, Web service integration

ACM Reference Format:
Alexandros Chortaras and Giorgos Stamou. 2018. D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs. In Proceedings of Linked Data on the Web 2018 (LDOW2018). ACM, New York, NY, USA, 10 pages.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
LDOW2018, April 2018, Lyon, France
© 2018 Copyright held by the owner/author(s).

1 INTRODUCTION
In the past years, a considerable amount of work has been done on developing methodologies for mapping relational databases to RDF graphs. Several approaches, mapping languages and systems have been proposed, including two W3C recommendations [1, 8]. This work has mainly been motivated by the need to integrate the huge amount of information contained in existing relational databases with the emerging Semantic Web, and to make it part of the Linked Data cloud.

Following the growth of Linked Data, several research institutions and companies, such as DBpedia¹, WordNet² and OpenStreetMap³, now offer access to their huge datastores through SPARQL endpoints or RESTful web services. More recently, the expansion of cloud computing, the exciting developments in the field of machine learning and the subsequent revival of interest in artificial intelligence applications have resulted in the emergence of cloud platforms and marketplaces that offer intelligent data analysis web services, often representing their output using Linked Open Data vocabularies and resources, such as DBpedia Spotlight⁴, Google's Cloud Natural Language⁵ and Microsoft's Computer Vision API⁶. These services typically deliver data using some structured data exchange format (usually JSON or XML documents).

Thus, if until recently the question was how to integrate existing data with the Semantic Web, now part of the question is also how to use all these available data and diverse services in a coordinated and integrated manner, so as to selectively pick and aggregate data into custom data stores that power new intelligent applications. In this respect, aggregating data into custom RDF data stores is of particular interest, not only because they allow direct integration with the Linked Data cloud, but also because intelligence can be added on top of the data by including, e.g., axiomatic knowledge in the form of OWL 2 [20] axioms. As a matter of fact, recent work on efficient algorithms and methods for reasoning with tractable fragments of ontologies (e.g. [3], [21]) has allowed the development of practical systems that provide inferencing over semantic data.

In this environment, we propose D2RML, a generic Data-to-RDF Mapping Language, whose aim is to facilitate the generation of custom RDF data stores by selectively collecting and integrating data from diverse data sources and web services into RDF data stores of as high quality as possible. Our purpose is to provide a formal basis for defining transformation-oriented general Data-to-RDF mappings and, while staying within the mapping language approach, to shift as much as possible of the burden of generating such data stores in practice from writing code or using heavyweight data workflow solutions to writing easily understandable and modifiable specifications.

The rest of the paper is organized as follows: In Section 2 we briefly discuss related work, with emphasis on R2RML and RML, which are the starting points for our work. In Section 3 we define the simple theoretical data model that underlies D2RML. In Section 4 we describe how several widely used information sources can be cast onto our model, and in Section 5 we present the formal specification of D2RML. Section 6 presents an extensive realistic use case that showcases the expressivity and practical usefulness of the proposed language, and Section 7 concludes the paper.

1 http://dbpedia.org/sparql/
2 http://wordnet-rdf.princeton.edu/
3 http://api.openstreetmap.org/
4 http://www.dbpedia-spotlight.org/api/
5 https://cloud.google.com/natural-language/
6 https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/
2 RELATED WORK
Several languages and systems have been proposed to map relational databases to RDF (RDB-to-RDF mapping languages). A comparative analysis is presented in [14], which determines fifteen desirable features (e.g. support for transformation functions, named graphs, integrity constraints) that such languages should have, and discusses to what extent they are supported by the several languages. Existing RDB-to-RDF mapping languages vary considerably in the flexibility they allow in defining mappings, from the rigid Direct Mapping [1] approach, which automatically translates the data of a relational database into an RDF graph representation following the database schema, to the R2RML language [8], which allows the user to define custom views and mapping rules (expressed as RDF graphs) and satisfies most of the fifteen desirable features.

The development of mapping languages and practical systems for translating data sources other than relational databases to RDF graphs has also been attempted. Closer to the relational model are CSV/TSV documents and spreadsheets, which retain the tabular format. Tools for converting from these data sources include XLWrap [18], TaRQL⁷, Vertere⁸, and M2 [22]. In all such tools, for each table row one or more RDF resources are generated, and for each column one or more RDF triples about the respective resources are generated. Other formats, such as XML, diverge considerably from tabular data owing to their hierarchical structure, and the systems that have been proposed to translate XML to RDF graphs rely on XSLT transformations (e.g. XML2RDF⁹), XPath (e.g. Tripliser¹⁰), XQuery (e.g. XSPARQL [2]), or on embedding within the XML documents links to transformation algorithms, typically XSLT transformations (GRDDL [6]). All such tools rely on syntactical transformations of parts of the XML structure to RDF triples. Another framework that assists the transformation of XML and JSON data sources is xCurator [13], which focuses on delivering high-quality linked data. Apart from the above, there exist also tools, in the form of web services (e.g. The Datatank¹¹) or parts of other infrastructures (e.g. Virtuoso Sponger¹²), that provide custom solutions to work with data from different formats and possibly construct RDF graphs out of them. These tools, however, are general data processing and transformation tools and are not designed to directly support semantic mappings of general data to RDF triples.

To resolve the polymorphy of tools and focus on the semantic aspects of the Data-to-RDF mapping process, several works extend the W3C recommended R2RML language to support other data formats. These include KR2RML [23], xR2RML [19] and RML [9]. These proposals are a considerable advance with respect to custom system solutions, because they are based on an existing, clean, mapping-oriented standard, and allow backward compatibility and, in most cases, extensibility. It should be noted, however, that simply extending the R2RML standard to support other data source types does not necessarily carry all its features over to the other data types. E.g. select conditions and transformation functions are supported implicitly by R2RML by relying on the expressivity of the SQL query language, but this is not fully portable in a straightforward extension to the case of XML or JSON documents.

7 https://github.com/tarql/tarql/
8 https://github.com/knudmoeller/Vertere-RDF/
9 http://www.gac-grid.de/project-products/Software/XML2RDF.html
10 http://daverog.github.io/tripliser/
11 http://thedatatank.com/
12 http://vos.openlinksw.com/owiki/wiki/VOS/VirtSponger

2.1 R2RML and RML
R2RML works with logical tables (rr:LogicalTable), which may be either base tables or views (rr:BaseTableOrView), defined by specifying an appropriate table name (rr:tableName), or result sets (rr:R2RMLView), obtained by executing a query (rr:sqlQuery). Each logical table is mapped to RDF triples using one or more triples maps (rr:TriplesMap). A triples map is a complex rule that maps each row of the underlying logical table to several RDF triples. The rule has two parts: a subject map (rr:SubjectMap), which generates the subject of all RDF triples that will be generated from each row of the logical table, and several predicate-object maps (rr:PredicateObjectMap), which in turn consist of predicate maps (rr:PredicateMap) and object maps (rr:ObjectMap) or referencing object maps (rr:RefObjectMap). A predicate map determines predicates for the to-be-generated RDF triples for the given subject, and the object maps their objects. A subject map may include several IRIs (rr:class) that will be used as objects to generate triples with the predicate rdf:type for the particular subjects. A subject map or predicate-object map may also have one or more graph maps (rr:GraphMap) associated with it, which specify the target graph of the resulting RDF triples. Referencing object maps allow joining two different triples maps. A referencing object map specifies a parent triples map (rr:parentTriplesMap), the subjects of which will act as objects for the current triples map, and may contain (rr:joinCondition) a join condition (rr:Join), specified by a reference to a column name of the current and the parent triples map (rr:child and rr:parent, respectively). The IRIs and literals that will be used as RDF triple subjects, predicates, objects, or RDF graph names may be either declared constants (rr:constant), or obtained from the underlying table, view or result set by specifying the desired column name (rr:column) that will act as value source, or generated through a string template (rr:template) that concatenates column values and custom strings. String templates offer only very rudimentary options to manipulate actual database values and generate custom IRIs and literals.
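For illustration, the following is a minimal R2RML triples map over a hypothetical EMP table (the table, column and vocabulary names are invented):

    @prefix rr: <http://www.w3.org/ns/r2rml#> .
    @prefix ex: <http://example.com/ns#> .

    <#EmployeeMap> a rr:TriplesMap ;
        rr:logicalTable [ rr:tableName "EMP" ] ;
        rr:subjectMap [
            rr:template "http://example.com/employee/{EMPNO}" ;   # IRI built from the EMPNO column
            rr:class ex:Employee
        ] ;
        rr:predicateObjectMap [
            rr:predicate ex:name ;
            rr:objectMap [ rr:column "ENAME" ]                     # literal taken directly from the ENAME column
        ] .

Each row of EMP thus yields one ex:Employee resource with an ex:name property.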
RML extends R2RML by allowing other sources (e.g. JSON or XML files) apart from logical tables (rml:LogicalSource), which may be used in an interlinked manner, by defining data iterators (rml:iterator) that split the data obtained from such sources into base elements on which each mapping rule will be applied, and by allowing particular references (rml:reference), in the form of subelement selectors within the base element, to define the value sources to be used for the generation of IRIs and literals. Both the iterators and the references depend on the underlying data source, and may be XPath queries, JSONPath queries, CSV column names or SPARQL return variable names. Their type is declared using the rml:referenceFormulation predicate.

With respect to the specification of the actual access to the data sources, R2RML leaves the issue to the implementation. The assumption is that each R2RML document applies to data from a unique database. In contrast, RML, which allows multiple sources and cross-references between the retrieved data, must include the data source descriptions within the RML document. To describe them, it suggests the use of some recommended or widely-used vocabularies, such as DCAT¹³, D2RQ¹⁴, CSVW¹⁵, Hydra¹⁶ and SPARQL-SD¹⁷, to access files, relational databases, CSV/TSV files, web APIs and SPARQL endpoints, respectively. However, these vocabularies have been developed mainly for APIs and data sources to inform clients about their exact properties and the services they offer, and not as a means of formulating requests to them. E.g. to retrieve data from a web API that paginates the results using next page access keys, knowledge on how to formulate each subsequent HTTP request is needed; this is not covered, for example, by Hydra. Similarly, a SPARQL-SD specification provides information about the supported SPARQL version, the default entailment regime, the default named graph, etc., which are not useful to a client at the time of formulating a request.

13 https://www.w3.org/TR/vocab-dcat/
14 http://d2rq.org/d2rq-language
15 https://www.w3.org/TR/tabular-metadata/
16 https://www.hydra-cg.com/spec/latest/core/
17 https://www.w3.org/TR/sparql11-service-description/
3 DATA MODEL
In this section, we extend the table-based model underlying the R2RML language to support complex, non-tabular data that can be obtained from various information sources (such as sources returning JSON or XML documents). To do this, we consider that RDF triples are generated from set tables instead of logical tables. In the following we represent an RDF triple as a tuple ⟨s, p, o⟩, where s is the subject, p the property or predicate, and o the object.

Definition 3.1. A set row of arity k is a tuple ⟨D_1, ..., D_k⟩, where D_1, ..., D_k are sets of values over some domains. A name row of arity k is a tuple ⟨n_1, ..., n_k⟩, where n_1, ..., n_k are names. A set table of arity k with m rows is a tuple S = ⟨N, T⟩, where N is a name row and T = [D_1, ..., D_m] a list of set rows, all of arity k, such that the i-th elements of D_1, ..., D_m, for 1 ≤ i ≤ k, all share the same domain.

The names allow us to refer to particular elements of set rows and tables. We denote the set of values that corresponds to name n_i (1 ≤ i ≤ k) in a set row D by D[n_i]. We also denote the list [D_1[n_i], ..., D_m[n_i]] of value sets that are obtained from the several set rows of S by S[n_i], which we call a column of S. Let also dom(n) denote the domain of column n. It should be underlined that, for a particular set row D and the different possible names n_i, the several sets D[n_i] may have different numbers of values, there is no alignment between the individual values among the several sets, and all individual values are equivalent with respect to their relation to the values of the other sets in the same set row.

Definition 3.2. A filter F over a set table S of arity k is a tuple ⟨n, f⟩, where n is a column name and f : dom(n) → dom(n) a function, such that f(D[n]) ⊆ D[n] for all set rows D of S.

We denote the set value f(D[n]), obtained by applying F on a set row D, by F(D). Clearly, f may be the identity function.

Definition 3.3. A triples rule R over a set table S = ⟨N, T⟩ is a triple of filters ⟨F_s, F_p, F_o⟩ over S, called the subject, predicate and object filter, respectively. The implementation of R is the set of RDF triples

    {⟨s, p, o⟩ | s ∈ F_s(D), p ∈ F_p(D), o ∈ F_o(D), D ∈ T}.

A set of triples rules over one or more set tables defines a Data-to-RDF mapping. Using the above simple model we can define Data-to-RDF mappings for any information sources that can give rise to one or more set tables. The triple store represented by a Data-to-RDF mapping is then the implementation of all its triples rules.
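As a small illustration (with invented values), consider a set table S with name row N = ⟨person, prop, mbox⟩ and a single set row D = ⟨{ex:alice}, {foaf:mbox}, {"a@ex.org", "alice@ex.org"}⟩. Taking F_s, F_p and F_o to be the identity filters on the person, prop and mbox columns, respectively, the implementation of the triples rule ⟨F_s, F_p, F_o⟩ is

    {⟨ex:alice, foaf:mbox, "a@ex.org"⟩, ⟨ex:alice, foaf:mbox, "alice@ex.org"⟩},

i.e. the single subject is combined with every value of the object column, reflecting the absence of alignment between the value sets within a set row.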

We consider an information source to be any online software system that can deliver structured data upon request. The information source may be a data repository (e.g. a relational database, an RDF store, an XML file stored in some directory) or an implementation of a service or an algorithm (e.g. a RESTful web service) that may process some input data and deliver some structured output. The request, in the form of a query (e.g. an SQL or SPARQL SELECT query) or message (e.g. an HTTP GET or POST request) in a format supported by the information source, includes all input data and parameters required by the information source to generate and deliver the output. The reply, or effective data source, is the output produced by the information source upon processing the request. The reply may be delivered to the client in a native format (e.g. as an SQL result set), or in a generic document format (e.g. as a JSON or XML document).

To accommodate the several possible information sources in our model, we consider, as in RML, that the effective data source groups some set of autonomous elements (e.g. rows of an SQL result set, elements of a JSON array). The division of the reply into these autonomous elements is achieved through an iterator. Hence, an effective data source together with an iterator specifies a logical array, through whose items the iterator eventually iterates. Each item of a logical array may itself be a complex data structure (a new effective data source), so in order to extract from it lists of values to construct set rows and use them as subjects, predicates and objects of RDF triples, we need some selectors. Thus, the role of the selectors is to transform a logical array into a set table.

Definition 3.4. The triple A = ⟨I, t, L⟩, where I is an information source and request specification, t an iterator specification, and L a set of selectors, is a data acquisition pipeline.

It follows that each data acquisition pipeline A gives rise to a unique set table S_A. A data acquisition pipeline may be parametric, in the sense that the information source or request specification may contain parameters. Given a non-parametric data acquisition pipeline A, a parametric data acquisition pipeline A′ that depends on A is a data acquisition pipeline whose parameters take values from one or more columns of S_A. We call such a parametric data acquisition pipeline a transformation of A.

Definition 3.5. A series of data acquisition pipelines A_0, A_1, ..., A_l, where each A_i, for i ≥ 1, is a transformation that depends on one or more A_j with j < i, is a set table specification. A_0 is the primary data acquisition pipeline.

A set table specification gives rise to a unique set table, which is S_{A_0} extended by columns contributed by the transformations A_1, ..., A_l. A trivial set table specification consists only of the primary data acquisition pipeline A_0. Each transformation in a set table specification is realized as a series of requests to the respective information source, after binding the parameters to all possible combinations of values obtained from the referred-to columns of the set table constructed from the preceding data acquisition pipelines. In particular, to evaluate a set table specification, we must evaluate the data acquisition pipelines serially, extending at each step the previously obtained set table: The primary data acquisition pipeline A_0 gives rise to set table S_{A_0}. Then, for each set row D of S_{A_0}, evaluating A_1 gives rise to a set table S_{A_1}(D). By flattening all rows of S_{A_1}(D) into a single row (by merging the respective column values of each row) we obtain a new set row that is appended to D. Doing this for all set rows D results in S_{A_0 A_1}. By applying this process iteratively, S_{A_0} is eventually extended with additional columns to the set table S_{A_0 A_1 ... A_l}.

More formally, let n_1, ..., n_k be the names and [D_1, ..., D_m] the rows of Ŝ = S_{A_0 ... A_i}. Evaluating A_{i+1} on each row of Ŝ produces the set tables S_{A_{i+1}}(D_1), ..., S_{A_{i+1}}(D_m). Since all these set tables are produced by the same data acquisition pipeline A_{i+1}, they share the same arity, say k′, and column names, say n̂_1, ..., n̂_{k′}. Thus S_{A_0 ... A_{i+1}} = ⟨N, T⟩, where N = ⟨n_1, ..., n_k, n̂_1, ..., n̂_{k′}⟩, T = [D′_1, ..., D′_m], D′_j = ⟨D_j[n_1], ..., D_j[n_k], D̂_{j1}, ..., D̂_{jk′}⟩ for 1 ≤ j ≤ m, and D̂_{jl} = ⋃ S_{A_{i+1}}(D_j)[n̂_l] (the union of all value sets in the n̂_l column of S_{A_{i+1}}(D_j)) for 1 ≤ l ≤ k′.

The row flattening step is intentional: S_{A_0} provides the original data that we want to extend through transformations, i.e. by appending new columns containing new properties of that data. Since, as mentioned above, all values contained in a particular row and column of a set table are equivalent with respect to the values in the sets of the other columns of the current row, the flattening behaviour maintains this relationship between values, without introducing non-desired hierarchical dependencies. Finally, the primary data acquisition pipeline may itself be parametric. In this case, the evaluation is done exactly as described above, but the set rows generated by A_0 are not appended to the set table on which it depends, but initiate a new set table.
4 INFORMATION SOURCES AND REPLIES
We now study how several information and effective data sources used in real applications can be accommodated by our model. We discuss relational databases, RESTful web services, JSON, XML and CSV/TSV documents, and SPARQL endpoints. Tables 1 and 2 summarize the corresponding requests, effective data sources, iterators and selectors.

Table 1: Information sources, requests and replies

  Information Source         Request                             Effective Data Source
  RDBMS                      SQL SELECT Query                    SQL Result Set
  SPARQL Endpoint            SPARQL SELECT Query and RDF graph   SPARQL Result Set
                             IRIs, via HTTP Message              via HTTP Message
  RESTful Web Service        HTTP GET/POST Request               JSON/XML/CSV/TSV Document
  JSON/XML/CSV/TSV Document  HTTP GET Request                    JSON/XML/CSV/TSV Document

Table 2: Effective data sources, iterators and selectors

  Effective Data Source   Iterator         Selector
  SQL Result Set          Row Iterator     Column name
  SPARQL Result Set       Row Iterator     Variable name
  JSON Document           JSONPath query   Flat JSONPath query
  XML Document            XPath query      Flat XPath query
  CSV/TSV Document        Row Iterator     Column name

4.1 Relational Databases
In relational databases, data is organized into one or more tables (or relations) of columns (or attributes) and rows (or tuples). Each table column has a name. Data are retrieved by issuing an SQL SELECT query, and the results are packed as a result set, which is essentially a row-by-row iterable table along with its metadata. Because relational database management systems (RDBMSs) use native formats to implement the data stores and the result formats, communication with RDBMSs is done using special protocols (such as ODBC, JDBC) that implement clients for particular RDBMSs. Practical access requires several parameters to be specified (e.g. server location, database name, user name, password, access driver), which are usually grouped in the so-called connection string and are programming language and implementation dependent. There is no standard for representing connection strings in RDF form. The D2RQ Mapping Language [7] allows a JDBC-dependent RDF definition of connection strings and is used by RML to specify RDBMS connectivity.

An implementation provided with an RDBMS connection specification can connect to the particular RDBMS, pose an SQL SELECT query q that specifies attributes n_1, ..., n_k in the SELECT statement for the returned columns, and obtain as result a list of rows [⟨v_11, ..., v_1k⟩, ..., ⟨v_n1, ..., v_nk⟩]. Using a trivial row iterator and the column names n_1, ..., n_k as selectors, the results of q can be converted to the following set table:

    ⟨⟨n_1, ..., n_k⟩, [⟨{v_11}, ..., {v_1k}⟩, ..., ⟨{v_n1}, ..., {v_nk}⟩]⟩.
4.2 RESTful Web Services
RESTful web services are services based on the REST principles [11], and are usually implemented using the HTTP protocol. Typically, a data retrieving RESTful service accepts an HTTP request and delivers the result in a self-descriptive text message (e.g. HTML, XML, JSON, or plain text). Here we are interested in structured reply services, i.e. services whose reply is in one of the XML, JSON or CSV/TSV formats. To access a RESTful web service, the elements of the appropriate HTTP request have to be specified. These include the method (GET or POST), the URI (including the query string in the case of a GET message), any headers, and the body (for passing parameters in the case of a POST message). All these can be specified in RDF using the W3C Working Group Notes 'HTTP Vocabulary in RDF 1.0' [16] and 'Representing Content in RDF 1.0' [17]. Thus, we can assume that an HTTP client that can consume an HTTP Vocabulary and Representing Content in RDF 1.0 description to create an HTTP request can use a RESTful web service and obtain as result a structured document. Although not strictly qualifying as RESTful web services, we also include in this category URIs that simply deliver structured documents (e.g. URIs of static JSON/XML files), since the communication is performed in exactly the same way, through HTTP messages.
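A rough, indicative sketch of such a request description follows (the service URI and parameter are invented, and the exact terms of the two vocabularies expected by an implementation may differ):

    @prefix http: <http://www.w3.org/2011/http#> .
    @prefix cnt:  <http://www.w3.org/2011/content#> .

    [] a http:Request ;
        http:methodName "POST" ;
        http:requestURI "https://api.example.org/annotate" ;               # invented service URI
        http:headers ( [ a http:MessageHeader ;
                         http:fieldName "Content-Type" ;
                         http:fieldValue "application/x-www-form-urlencoded" ] ) ;
        http:body [ a cnt:ContentAsText ;
                    cnt:chars "text=Athens" ] .                             # request parameters carried in the body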
A practical consideration usually related to some RESTful web services is that the APIs that implement the services, in order to avoid extremely long replies, paginate the results and do not return the full set of results as one document, but as a series of smaller documents: in most cases, each returned document contains some keys that can be used by the client in the subsequent request to instruct the server to return the next set of results. The pagination schema may get non-trivial, as in the case of MediaWiki¹⁸.

18 https://www.mediawiki.org/wiki/API:Query

4.3 SPARQL Endpoints
SPARQL endpoints are URIs at which a SPARQL Protocol service listens [10]. The SPARQL Protocol is built on top of HTTP, and as such it can be treated as a RESTful web service. However, since special SPARQL Protocol clients exist in the form of APIs (e.g. Apache Jena¹⁹) that hide from the user the cumbersome details of building and decoding the necessary HTTP request and reply messages, it is useful to provide support also for this type of interaction. The situation is similar to the RDBMS case: the request is a SPARQL SELECT query (possibly along with some default and named RDF graph IRIs) instead of an SQL SELECT query, and the effective data source is a result set whose column names are the return variable names specified in the SPARQL query. Thus, the translation of the reply to a set table is done in exactly the same way. The only essential thing that changes is the specification of the access to the SPARQL endpoint, for which a single URI is enough.

19 https://jena.apache.org/
4.4 JSON Documents
A JSON document [15] may be modeled as a JSON tree [4]. A JSON tree is an edge-labeled tree whose root represents the entire document. A node may have either string- or integer-labeled children, but not both. A node with string-labeled outgoing edges represents a set of JSON key-value pairs: the edge label is the key and the edge destination the corresponding value. A node with integer-labeled outgoing edges represents an array: the edge label is the array index and the edge destination the corresponding value. Value nodes are either leaf nodes having a string or integer label, or JSON trees.

In the absence of an official standard, to select values from a JSON document that meet specific conditions, in practice the JSONPath [12] specification is used, which is inspired by XPath. JSONPath queries select nodes of a JSON tree that meet a certain path condition and group them into a JSON array, which is the result of the query. Since a JSON array is a JSON document, the result of a JSONPath query is always a JSON document. We will say that a JSONPath query is flat if the result JSON tree has depth 1, i.e. it is an array of simple values.

Hence, an iterator for a JSON tree T is any relevant JSONPath query q, which splits T into a logical array of smaller JSON trees T_1, ..., T_n, and the selectors are flat JSONPath queries q_1, ..., q_k that are executed over each T_1, ..., T_n to deliver a set table from the underlying logical array. Thus T, after applying iterator q and selectors q_1, ..., q_k, yields the set table ⟨⟨q_1, ..., q_k⟩, [⟨C_11, ..., C_1k⟩, ..., ⟨C_n1, ..., C_nk⟩]⟩, where C_ij is the set of values contained in the array that results from applying q_j on T_i.
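For illustration (the document and queries are invented), consider the JSON document

    { "people": [ { "name": "Alice", "emails": ["a@ex.org", "alice@ex.org"] },
                  { "name": "Bob",   "emails": ["b@ex.org"] } ] }

With the iterator q = $.people[*] and the selectors q_1 = $.name and q_2 = $.emails[*] (applied to each item delivered by the iterator), we obtain the set table ⟨⟨q_1, q_2⟩, [⟨{"Alice"}, {"a@ex.org", "alice@ex.org"}⟩, ⟨{"Bob"}, {"b@ex.org"}⟩]⟩.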
4.5 XML Documents
An XML document may also be modeled using a tree [5]; however, its structure differs from a JSON tree. The core part of an XML document is represented in the tree by element, attribute and text nodes. Each element node corresponds to an element of the XML document and has a name (the element name) and children that are all the enclosed elements. It may also have as child a text node that holds in its string value the characters in the CDATA section of the element. Each element node may also have associated with it a set of attribute nodes that represent the attributes of the element, which, however, are not considered to be children of the element node. Each attribute node has a name (the attribute name) and a string value that holds the respective attribute value. Relying on this model, the XPath language allows the selection of particular nodes from the tree that meet certain conditions. Unlike in the case of JSON, the result is not itself an XML document, but a set of the nodes that match the query criteria. We will say that an XPath query is flat if the result contains only text or attribute nodes.

Hence, we can consider as iterator for an XML document tree T any relevant non-flat XPath query q that splits T into a logical array of nodes N_1, ..., N_n. Since the query is non-flat, these nodes are element nodes, and can be treated as smaller XML document trees T_1, ..., T_n. The selectors are then flat XPath queries q_1, ..., q_k that are executed over each one of these smaller XML documents. Thus, T, after applying iterator q and selectors q_1, ..., q_k, yields the set table ⟨⟨q_1, ..., q_k⟩, [⟨C_11, ..., C_1k⟩, ..., ⟨C_n1, ..., C_nk⟩]⟩, where C_ij are the string values of the text or attribute nodes in the node set obtained by applying q_j on T_i.

4.6 CSV/TSV Documents
CSV/TSV documents are textual representations of tabular data. Each line represents a data row, except possibly for the first row, which may contain the names of the columns. Hence, the situation is similar to the RDBMS case, with no need for a query to be specified. The name row consists of the names of the columns in the file (or of their numbering), and the set rows of the actual rows of the table. The only things that need to be specified are the formatting details (e.g. delimiter, escape and quote characters).
5 D2RML SPECIFICATION
D2RML draws significantly from R2RML and RML, and follows the same simple syntactical strategy for defining mappings: triples maps, which consist of a subject map and several predicate-object maps. From RML it adopts and appropriately extends the way to define the interaction with information sources through requests, iterators and selectors. Moreover, it significantly extends the expressive capabilities of R2RML and RML by allowing transformations, conditional statements, and custom IRI generation functions.

For its semantics, D2RML relies on the data model described in Section 3. Each triples map is essentially a set table specification of Def. 3.5 and a specification of a set of triples rules of Def. 3.3 with the same subject filter over the common underlying set table. The information source, request and iterator of the primary data acquisition pipeline are directly provided in the triples map definition. Any transformations to be added to the set table specification are declared in the order of their application. The selectors are implicitly declared in the subject, predicate, object and graph maps. Several triples maps are allowed to coexist in a D2RML document, in which case several distinct set tables are generated.

We define D2RML using a BNF-like notation. Terminal symbols are written in monospace, and non-terminals in italics. Non-terminals within angle brackets represent RDF nodes. Parentheses specify the scope of alternatives (separated by |) and of the standard quantifiers ?, *, and +. Terminal symbols not explicitly defined in the specification are written in small caps. The namespaces are defined in Table 3. D2RML is compatible with R2RML, but not fully compatible with RML, so it does not directly extend its namespace.

Table 3: Namespaces used in D2RML documents

  Prefix   IRI
  rr       http://www.w3.org/ns/r2rml#
  dr       http://islab.ntua.gr/ns/d2rml#
  op       http://islab.ntua.gr/ns/d2rml-op#
  is       http://islab.ntua.gr/ns/d2rml-is#
  http     http://www.w3.org/2011/http#
  cnt      http://www.w3.org/2011/content#
5.1 Triples Maps
A triples map is defined as in R2RML and RML, but information sources that provide tabular data are clearly distinguished from non-tabular ones: rr:logicalTable is used for the former and dr:logicalSource for the rest.

 TriplesMap ← a rr:TriplesMap
              rr:logicalTable ⟨LogicalTable⟩ |
                         dr:logicalSource ⟨LogicalSource⟩
              (dr:transformations ( ⟨Transformation⟩+ ))?
              rr:subjectMap ⟨SubjectMap⟩ | rr:subject iri
              (rr:predicateObjectMap ⟨PredObjMap⟩)*

 PredObjMap ← a rr:PredicateObjectMap
              (rr:predicateMap ⟨PredicateMap⟩ | rr:predicate iri)+
              (rr:objectMap (⟨ObjectMap⟩ | ⟨RefObjectMap⟩) |
                            rr:object (iri | literal))+
              (rr:graphMap ⟨GraphMap⟩ | rr:graph iri)*

5.2 Logical Tables and Logical Sources
The LogicalTable and LogicalSource nodes provide details about the primary information source used to generate the set table. In the case of query supporting information sources (such as RDBMSs and SPARQL endpoints), for backward compatibility with R2RML, they also contain the query-relevant details of the request that should be sent to the information source. The is:parameters predicate may be used to declare parameter names in queries that participate in parametric data acquisition pipelines. For other information sources (such as RESTful web services), the request, and any parameters, are included in the InformationSource specification itself. For information sources providing non-tabular data, LogicalSource also contains the definition of the iterator (dr:iterator and dr:referenceFormulation) that will be used to split the effective data source into a logical array. Since the effective data source format is fixed, the object of dr:referenceFormulation determines also the form of all selectors that will be applied on the particular effective data source.

 LogicalTable ← a rr:LogicalTable
                dr:source ⟨InformationSource⟩
                SQLTable | SPARQLTable | CSVTable
                (is:parameters ( ⟨DataVariable⟩+ ))?

 LogicalSource ← a dr:LogicalSource
                 dr:source ⟨InformationSource⟩
                 dr:iterator literal
                 dr:referenceFormulation iri

 SQLTable ← (a rr:BaseTableOrView
             rr:tableName literal) |
            (a rr:R2RMLView
             rr:sqlQuery literal
             (rr:sqlVersion iri)?)

 SPARQLTable ← a dr:SPARQLTable
               dr:sparqlQuery literal
               (dr:sparqlVersion iri)?
               (dr:defaultGraph iri)*
               (dr:namedGraph iri)*

 CSVTable ← a dr:TextTable
            dr:delimiter literal
            dr:headerline boolean
            (dr:quoteCharacter literal)?
            (dr:commentCharacter literal)?
            (dr:escapeCharacter literal)?
            (dr:recordSeparator literal)?
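For illustration, a logical table over a relational source, using an is:RDBMSSource information source as defined in the next subsection, might be declared as follows (prefixes as in Table 3; the connection details and query are invented, and the IRI identifying the RDBMS type is indicative only):

    <#personsTable> a rr:LogicalTable , rr:R2RMLView ;
        dr:source <#myDB> ;
        rr:sqlQuery "SELECT id, name FROM persons" .

    <#myDB> a is:RDBMSSource ;
        is:rdbms <http://islab.ntua.gr/ns/d2rml-is#MySQL> ;   # indicative IRI for the RDBMS type
        is:location "localhost:3306" ;                        # invented connection details
        is:database "hr" ;
        is:username "demo" ;
        is:password "demo" .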
5.3 Information Sources
The version of D2RML presented here provides definitions for implementing data acquisition pipelines involving RDBMSs, RESTful web services and SPARQL endpoints. Extensions for additional sources are expected in subsequent versions.

 InformationSource ← RDBMSSource | SPARQLService | HTTPSource

 RDBMSSource ← a is:RDBMSSource
               is:rdbms iri
               is:location literal
               (is:username literal)?
               (is:password literal)?
               (is:database literal)?

 SPARQLService ← a is:SPARQLService
                 is:uri uri

 HTTPSource ← a is:HTTPSource
              is:request ⟨HTTPRequest⟩ | is:uri uri
              (is:parameters ( ⟨Parameter⟩+ ))?

 Parameter ← DataVariable | SimpleKeyRequestIterator

 DataVariable ← a is:DataVariable
                is:name literal

 SimpleKeyRequestIterator ← a is:SimpleKeyRequestIterator
                            is:name literal
                            dr:reference literal
                            dr:referenceFormulation literal
                            is:initialValue literal

In an RDBMSSource, is:rdbms determines the specific RDBMS (e.g. MySQL, PostgreSQL). An HTTPSource is specified in terms of an HTTPRequest, which should be an http:Request and specify the details of the HTTP message to be sent. An HTTPSource may contain parameters in case the web service is part of a parametric data acquisition pipeline, or in case it paginates the results. Data parameters are identified by a name (is:name). For paginated results, the above specification allows, as an example, iterated requests through a request iterator that should be part, e.g., of the web service URI, and whose values, apart from the initial value (is:initialValue), are extracted each time from the previous reply using a selector. Extensions are possible to support additional pagination policies.
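For illustration, an HTTP information source for a paginating JSON web service, together with a logical source over it, might be declared roughly as follows (prefixes as in Table 3; the service URI, the JSONPath expressions, the reference formulation values and the way the page key enters the request are all invented or indicative only):

    <#peopleAPI> a is:HTTPSource ;
        is:uri <https://api.example.org/people> ;               # invented paginating JSON service
        is:parameters ( [ a is:SimpleKeyRequestIterator ;
                          is:name "page" ;                      # request key carrying the page token
                          is:initialValue "1" ;
                          dr:reference "$.nextPage" ;           # selector extracting the next page token from each reply
                          dr:referenceFormulation "JSONPath" ] ) .

    <#peopleSource> a dr:LogicalSource ;
        dr:source <#peopleAPI> ;
        dr:iterator "$.results[*]" ;                            # splits each reply into person objects
        dr:referenceFormulation <http://islab.ntua.gr/ns/d2rml-is#JSONPath> .   # indicative IRI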
5.4 Transformations
A triples map definition may include a list of transformations that should be applied, in the declared order, to the set table derived from the primary information source. Since a transformation is itself a parametric data acquisition pipeline, its definition includes the specification of an InformationSource, through an rr:logicalTable or dr:logicalSource, and one or more ParameterBindings. A ParameterBinding consists of a reference to a value (ValueRef) or a constant value, and the name (dr:parameter) of the parameter of the corresponding information source to which the value will be bound.

 Transformation ← a dr:Transformation
                  rr:logicalTable ⟨LogicalTable⟩ |
                             dr:logicalSource ⟨LogicalSource⟩
                  (dr:parameterBinding ⟨ParameterBinding⟩)+

 ParameterBinding ← a dr:ParameterBinding
                    dr:parameter literal
                    rr:constant literal | ValueRef
 ParameterBinding ← a dr:ParameterBinding                               (eg. number or string comparison) depends on the XSD type of the
                    dr:parameter literal                                literal provided as operand. If a nested condition does not specify
                    rr:constant literal | ValueRef                      a value reference, it inherits it from the enclosing condition.
                                                                           The case statement offers alternatives for realizing a term map:
                                                                        It contains a list of alternative term maps, each along with a con-
5.5    Term Maps and Conditions                                         dition. If the condition evaluates to true the term map is realized,
The definitions of term maps (i.e. of subject maps, graph maps,         otherwise control flows to the next case.
predicate maps and object map) follow the R2RML specification              Finally, a referring object map (RefObjectMap) may be defined by
with the addition of filters.                                           a ParameterBinding, instead of by a R2RML JoinCondition. This
                                                                        is how set table specifications with parametric primary data acqui-
 SubjectMap ← a rr:SubjectMap
                                                                        sition pipelines are defined: the parametric set table specification
              IRIRef | BlankNodeRef
              (SubjectBody CaseSubjectBody*) | CaseSubjectBody+
                                                                        corresponds to the parent triples map of RefObjectMap, and the
                                                                        ParameterBinding provides the parameters values.
 PredicateMap ← a rr:PredicateMap
                (PredicateBody CasePredBody*) | CasePredBody+
 ObjectMap ← a rr:ObjectMap                                             5.6    IRIs, Literals and Blank Nodes
             (ObjectBody CaseObjectBody*) | CaseObjectBody+             In R2RML, RDF terms are generated using the rr:constant, the
 GraphMap ← a rr:GraphMap                                               rr:column and rr:template predicates; to these, RML adds the
            (GraphBody CaseGraphBody*) | CaseGraphBody+                 rml:reference option. D2RML follows the same strategy, but to
 SubjectBody ← (rr:class IRI)*                                          account for values coming from transformations, RDF terms are
               (rr:graphMap ⟨GraphMap⟩ | rr:graph IRI)*                 generated through value references (ValueRefs), specified by two
               (dr:condition ⟨Condition⟩)?                              distinct components: a compulsory rr:column, rr:template or
 PredicateBody ← IRIRef                                                 dr:reference, and an optional dr:transformationReference to
                 (dr:condition ⟨Condition⟩)?                            specify the transformation that provides the logical array for the
 ObjectBody ← IRIRef | BlankNodeRef | LiteralRef                        respective rr:column, rr:template or dr:reference. If missing,
              (dr:condition ⟨Condition⟩)?                               the primary logical array is assumed.
 GraphBody ← IRIRef                                                        Although rr:template allows some minimal flexibility in defin-
             (dr:condition ⟨Condition⟩)?                                ing custom IRIs or literals, the overall mechanism is quite restric-
 CaseSubjectBody ← dr:cases ( ⟨SubectBody ⟩+ )
                                                                        tive, since no simple transformations (e.g. replace particular char-
                                                                        acters etc.) can be applied on the values obtained from the underly-
 CasePredBody ← dr:cases ( ⟨PredicateBody ⟩+ )
                                                                        ing set tables. D2RML addresses this issue by allowing simple func-
 CaseObjectBody ← dr:cases ( ⟨ObjectBody ⟩+ )                           tions to be applied on the raw values obtained from effective data
 CaseGraphBody ← dr:cases ( ⟨GraphBody ⟩+ )                             sources. Thus, a ValueRef may include definitions of one or more
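To make the filter and case mechanisms concrete, the following sketch (our own, not taken from the use case below) attaches a two-branch case statement and a condition to an object map; the predicate IRI, the JSONPath fields $.category and $.score, and the constant values are hypothetical:

   rr:predicateObjectMap [
       rr:predicate <http://example.org/category> ;
       rr:objectMap [
           dr:cases ( [
              dr:reference "$.category" ;
              rr:termType rr:Literal ;
              dr:condition [ dr:reference "$.score" ;
                             op:geq "0.5"^^xsd:decimal ; ] ;
           ] [
              rr:constant "uncategorized" ;
           ] ) ;
       ] ;
   ] .

For each row of the set table, the first alternative is realized only if the row's score passes the threshold; otherwise control flows to the second alternative, which emits a constant literal.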
5.6 IRIs, Literals and Blank Nodes
In R2RML, RDF terms are generated using the rr:constant, the rr:column and rr:template predicates; to these, RML adds the rml:reference option. D2RML follows the same strategy, but to account for values coming from transformations, RDF terms are generated through value references (ValueRefs), specified by two distinct components: a compulsory rr:column, rr:template or dr:reference, and an optional dr:transformationReference to specify the transformation that provides the logical array for the respective rr:column, rr:template or dr:reference. If missing, the primary logical array is assumed.
   Although rr:template allows some minimal flexibility in defining custom IRIs or literals, the overall mechanism is quite restrictive, since no simple transformations (e.g. replacing particular characters) can be applied on the values obtained from the underlying set tables. D2RML addresses this issue by allowing simple functions to be applied on the raw values obtained from effective data sources. Thus, a ValueRef may include definitions of one or more defined columns (dr:definedColumns) that are constructed by applying a series of functional transformations on particular set table column values and may be used in a rr:column or rr:template. A defined column should declare the new column name (dr:name) by which it will be referred to, the function (dr:function) that will generate the custom values (e.g. op:regex, op:replace), and a list of arguments, in the form of one or more dr:parameterBindings. The parameter names should be provided by the function definition.

   IRIRef ← rr:constant iri | ValueRef
            (rr:termType rr:IRI)?

   LiteralRef ← rr:constant literal | ValueRef
                (rr:termType rr:Literal)?
                (rr:language literal | rr:datatype iri)?

   BlankNodeRef ← ValueRef
                  (rr:termType rr:BlankNode)?

   ValueRef ← rr:column literal | rr:template literal | dr:reference literal
              (dr:transformationReference ⟨Transformation⟩)?
              (dr:definedColumns ( ⟨DefinedColumn⟩+ ))?

   DefinedColumn ← a dr:DefinedColumn
                   dr:name literal
                   dr:function iri
                   (dr:parameterBinding ⟨ParameterBinding⟩)+

6 USE CASE
In this section, we present a realistic use case for D2RML, involving real data and readily available web services and data repositories. The aim is to extract an extensive set of textual or URI features for a set of cultural items, in order to subsequently use them to perform several tasks such as clustering and similarity ranking. We assume that we want to extract features in several ways (e.g. directly from the metadata, by applying named entity extraction, image analysis, etc.), and that we want to keep information about the source of each feature, so that we can use features selectively to test how they affect the clustering or similarity algorithm performance.
   As primary information source of cultural items we use Europeana Collections20, in particular the collection provided by TopFoto21, which consists of 60,882 black and white images of the 1930s, along with their metadata. This collection can be obtained through the Europeana API. The D2RML specification for getting the effective data source for this collection is the following:

<#EuropeanaAPI>
   a is:HTTPSource ;
   is:request [
       http:absoluteURI "http://www.europeana.eu/api/v2/search.json?
             wskey=A*******W&rows=20&cursor={@@cursor@@}&profile=rich&
             query=europeana_collectionName%3A%222024904_Ag_EU_
             EuropeanaPhotography_TopFoto_1013%22" ;
       http:methodName "GET" ;
   ] ;
   is:parameters ( [ a is:SimpleKeyRequestIterator ;
                     is:name "cursor" ;
                     is:initialValue "*" ;
                     dr:reference "$.nextCursor" ;
                     dr:referenceFormulation is:JSONPath ; ] ) .

The specification includes an is:SimpleKeyRequestIterator as a parameter, because the API returns the results in pages, and each page contains a key for accessing the next page (nextCursor). An extract from the response obtained from executing the above is the following JSON document, which contains a list of items modeled using the Europeana Data Model (EDM):

{
   "nextCursor": "AoE/GC8yMDI0OTA0L3Bob3Rv****=",
   "items": [
     {
       "id": "/2024904/photography_ProvidedCHO_TopFoto_co_uk_EU061905",
       "dcDescription": [
            "Former chief inspector Berrett decorated by the king.\n
              Former chief detective inspector James Berrett of
              Scotland Yard was decorated by the King at the royal
              invesititure at Buckingham Palace. "
       ],
       "edmIsShownBy": [
         "http://www.topfoto.co.uk/imageflows/imagepreview/f=EU061905"
       ],
       "edmConcept": [
          "http://bib.arts.kuleuven.be/photoVocabulary/12003",
          "http://data.europeana.eu/concept/base/1711"
       ],
       "type": "IMAGE"
     }, ...
   ]
}

Most fields are self-explanatory. edmConcept contains a list of Linked Open Data resources that have been associated with each item by the provider to characterize the respective item content. To generate RDF triples for this information, as well as for the type of each item, we define the following D2RML document:

<#EuropeanaMapping>
   dr:logicalSource [ dr:source <#EuropeanaAPI> ;
                      dr:iterator "$.items" ;
                      dr:referenceFormulation is:JSONPath ; ] ;
   rr:subjectMap [
       dr:definedColumns ( [
          dr:name "SID" ;
          dr:function op:extractMatch ;
          dr:parameterBinding [ dr:parameter "input" ;
                                dr:reference "$.id" ; ] ;
          dr:parameterBinding [ dr:parameter "regex" ;
                                rr:constant "^.*_(.*)$" ; ] ;
       ] ) ;
       rr:template "http://islab.ntua.gr/resources/tp/{SID}" ;
       dr:cases ( [
         rr:class <...> ;
         dr:condition [ dr:reference "$.type" ;
                        op:eq "IMAGE"^^xsd:string ; ] ;
       ] [
         rr:class <...> ;
       ] ) ;
   ] ;
   rr:predicateObjectMap [
       rr:predicate <...> ;
       rr:objectMap [ dr:reference "$.edmConcept" ;
                      rr:termType rr:IRI ; ] ;
   ] .

Note the use of a defined column to construct custom RDF subject IRIs. The particular defined column applies the regular expression ^.*_(.*)$ on the id field of each item and uses the value of the first capturing group, named SID. The above specification generates the following RDF triples for the first item:

<http://islab.ntua.gr/resources/tp/EU061905>
        a <...> .
<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://bib.arts.kuleuven.be/photoVocabulary/12003> .
<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://data.europeana.eu/concept/base/1711> .
20   https://www.europeana.eu/portal/en   21   http://www.topfoto.co.uk/            
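For concreteness, this is how the request iterator realizes pagination: the first request is issued with cursor=*, and each subsequent request is produced by substituting the nextCursor value selected from the previous reply for the {@@cursor@@} placeholder. Assuming the processor URL-encodes the substituted value, the second request would roughly correspond to:

   [ a http:Request ;
     http:methodName "GET" ;
     http:absoluteURI "http://www.europeana.eu/api/v2/search.json?
           wskey=A*******W&rows=20&cursor=AoE%2FGC8yMDI0OTA0L3Bob3Rv****%3D&
           profile=rich&query=europeana_collectionName%3A%222024904_Ag_EU_
           EuropeanaPhotography_TopFoto_1013%22" ] .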
Since we want to extract several features, we can invoke services to analyze the metadata. An option is to use DBpedia Spotlight to extract named entities from the textual descriptions. To do this, we need a transformation that takes the description of each item (dcDescription) and invokes DBpedia Spotlight on it. We first define the relevant information source:

<#DBpediaSpotlightAPI>
   a is:HTTPSource ;
   is:request [
       http:absoluteURI "http://model.dbpedia-spotlight.org/en/
             annotate?text={@@text@@}&confidence=0.5&support=0&
             spotter=Default&disambiguator=Default&policy=whitelist&
             types=&sparql=" ;
       http:methodName "GET" ;
       http:headers ( [ http:fieldName "Accept" ;
                        http:fieldValue "application/xml" ; ] ) ;
   ] ;
   is:parameters ( [ a is:DataVariable ;
                     is:name "text" ; ] ) .

The respective effective data source has the following XML format

<Annotation text="Former chief inspector Berrett ... Buckingham
      Palace." confidence="0.5" support="0"
      types="" sparql="" policy="whitelist">
   <Resources>
      <Resource URI="http://dbpedia.org/resource/..." ... />
      ...
   </Resources>
</Annotation>

which includes all detected named entities (Resource) as DBpedia resources (URI). We next define the transformation

<#SpotlightTransformation>
   dr:logicalSource [ dr:source <#DBpediaSpotlightAPI> ;
                      dr:iterator "/Annotation/Resources/Resource" ;
                      dr:referenceFormulation is:XPath ; ] ;
   dr:parameterBinding [ dr:parameter "text" ;
                         dr:reference "$.dcDescription" ; ] .

and add the transformation and a new predicate object map to the <#EuropeanaMapping> triples map:

<#EuropeanaMapping>
   ...
   dr:transformations ( <#SpotlightTransformation> ) ;
   rr:predicateObjectMap [
       rr:predicate <...> ;
       rr:objectMap [
           dr:reference "/Resource/@URI" ;
           dr:transformationReference <#SpotlightTransformation> ;
           rr:termType rr:IRI ;
       ] ;
   ] .

When executed, it generates the following additional triples:

<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://dbpedia.org/resource/...> .
<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://dbpedia.org/resource/...> .

We further extend the set of features by using the DBpedia ontology to get the types of the retrieved DBpedia resources. For this we need a second transformation, dependent on the first one, that consults a DBpedia endpoint. The information source definition is

<#DBpediaSPARQLService>
   a is:SPARQLService ;
   is:uri "http://dbpedia.org/sparql" .

and the transformation

<#DBpediaTransformation>
   dr:logicalSource [
       dr:source <#DBpediaSPARQLService> ;
       dr:query "SELECT ?dbpediatype WHERE
                   { <{@@resource@@}> a ?dbpediatype }" ;
       is:parameters ( [ a is:DataVariable ;
                         is:name "resource" ; ] ) ;
   ] ;
   dr:parameterBinding [
       dr:parameter "resource" ;
       dr:reference "/Resource/@URI" ;
       dr:transformationReference <#SpotlightTransformation> ;
   ] .

Finally, we modify <#EuropeanaMapping> to add the new transformation and a new predicate object map:

<#EuropeanaMapping>
   ...
   dr:transformations ( <#SpotlightTransformation>
                        <#DBpediaTransformation> ) ;
   rr:predicateObjectMap [
       rr:predicate <...> ;
       rr:objectMap [
           dr:reference "?dbpediatype" ;
           dr:transformationReference <#DBpediaTransformation> ;
           rr:termType rr:IRI ;
           dr:condition [
              op:matches "http://dbpedia\\.org/ontology/.*" ;
           ] ;
       ] ;
   ] .

Note that the mapping includes a conditional statement. It has been included because the query returns not only DBpedia ontology concepts as types, but also FOAF, YAGO, Schema, Wikidata, and other resources, which we do not want to include in our results. Eventually, this map generates the following RDF triples:

<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://dbpedia.org/ontology/...> .
<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://dbpedia.org/ontology/...> .
<http://islab.ntua.gr/resources/tp/EU061905>
        <...> <http://dbpedia.org/ontology/...> .

Finally, we can use computer vision technologies to analyze the image of each item (the URI is provided by the edmIsShownBy field in the document returned by the Europeana API) to detect objects that appear in it. To this end we use Microsoft's Computer Vision API, which is offered as a RESTful web service. Thus, we add a new information source including the required request parameters

<#ComputerVisionAPI>
   a is:HTTPSource ;
   is:request [
       http:absoluteURI "https://westcentralus.api.cognitive.microsoft.
             com/vision/v1.0/analyze?visualFeatures=Categories&
             language=en" ;
       http:methodName "POST" ;
       http:headers ( [ http:fieldName "Content-Type" ;
                        http:fieldValue "application/json" ; ]
                      [ http:fieldName "Ocp-Apim-Subscription-Key" ;
                        http:fieldValue "3*************************b" ; ] ) ;
       http:body [ a cnt:ContentAsText ;
                   cnt:chars "{\"url\" : \"{@@imageURL@@}\" }" ; ] ;
   ] ;
   is:parameters ( [ a is:DataVariable ;
                     is:name "imageURL" ; ] ) .

which produces the following JSON-formatted effective data source:
{
   "categories": [ {
      "name": "people_group",
      "score": 0.578125
   } ],
   "requestId": "3b28df72-abf5-488c-86f4-b2c6a7eb9703"
}

Based on this, we define the transformation

<#ImageTransformation>
   dr:logicalSource [ dr:source <#ComputerVisionAPI> ;
                      dr:iterator "$.categories" ;
                      dr:referenceFormulation is:JSONPath ; ] ;
   dr:parameterBinding [ dr:parameter "imageURL" ;
                         dr:reference "$.edmIsShownBy" ; ] .

to generate a logical array from categories that contains the names of the detected objects, and modify <#EuropeanaMapping> by adding the new transformation and a new predicate object map:

<#EuropeanaMapping>
   dr:transformations ( <#SpotlightTransformation>
                        <#DBpediaTransformation> <#ImageTransformation> ) ;
   rr:predicateObjectMap [
       rr:predicate <...> ;
       rr:objectMap [
           dr:reference "$.name" ;
           dr:transformationReference <#ImageTransformation> ;
           rr:termType rr:Literal ;
           dr:condition [
              dr:reference "$.score" ;
              dr:transformationReference <#ImageTransformation> ;
              op:geq "0.4"^^xsd:decimal ;
           ] ;
       ] ;
   ] .

The above object map applies a filter in order to keep only objects that have been detected with relatively high confidence (score). Eventually, the above map adds the following RDF triple:

<http://islab.ntua.gr/resources/tp/EU061905>
        <...> "people_group" .

   The RDF triples generated by all the above predicate-object maps make up the desired RDF graph. In terms of performance, for executing the above D2RML document, our implementation of the D2RML processor22 took about 7 minutes per 100 Europeana items.

7 CONCLUSIONS
We presented D2RML, a Data-to-RDF mapping language which, based on an abstract data model, allows the orchestrated retrieval of data from several information sources, their transformation and extension using relevant web services, their filtering and manipulation using simple operations, and finally their mapping to RDF graphs. It combines the mapping approach of R2RML and RML with workflow approaches, by allowing the definition of easy to write and understand, homogeneous views of the underlying data and services in a lightweight document. We developed D2RML on top of a formal abstract data model, so as to formally define its semantics and allow future extensions. We also presented a realistic use case, which demonstrates the capabilities of the proposed language in real settings, by delivering unified and coordinated access to Linked Data stores and other services in a clean specification, without the need for code writing or heavyweight solutions.

ACKNOWLEDGEMENTS
We acknowledge support of this work by the project ‘APOLLONIS’ (MIS 5002738) which is implemented under the Action ‘Reinforcement of the Research and Innovation Infrastructure’, funded by the Operational Programme ‘Competitiveness, Entrepreneurship and Innovation’ (NSRF 2014-2020) and co-financed by Greece and the European Union (European Regional Development Fund).

REFERENCES
 [1] Marcelo Arenas, Alexandre Bertails, Eric Prud’hommeaux, and Juan Sequeda. 2012. A Direct Mapping of Relational Data to RDF. (2012). https://www.w3.org/TR/rdb-direct-mapping/
 [2] Stefan Bischof, Stefan Decker, Thomas Krennwallner, Nuno Lopes, and Axel Polleres. 2012. Mapping between RDF and XML with XSPARQL. J. Data Semantics 1, 3 (2012), 147–185.
 [3] Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, and Ruslan Velkov. 2011. OWLIM: A family of scalable semantic repositories. Semantic Web 2, 1 (2011), 33–42.
 [4] Pierre Bourhis, Juan L. Reutter, Fernando Suárez, and Domagoj Vrgoc. 2017. JSON: Data model, Query languages and Schema specification. In PODS. ACM, 123–135.
 [5] James Clark and Steve DeRose. 2016. XML Path Language (XPath) Version 1.0. (2016). https://www.w3.org/TR/xpath/
 [6] Dan Connolly. 2007. Gleaning Resource Descriptions from Dialects of Languages (GRDDL). (2007). https://www.w3.org/TR/grddl/
 [7] Richard Cyganiak, Chris Bizer, Jörg Garbers, Oliver Maresch, and Christian Becker. 2012. The D2RQ Mapping Language. (2012). http://d2rq.org/d2rq-language
 [8] Souripriya Das, Seema Sundara, and Richard Cyganiak. 2012. R2RML: RDB to RDF Mapping Language. (2012). https://www.w3.org/TR/r2rml/
 [9] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. 2014. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In LDOW (CEUR Workshop Proceedings), Vol. 1184. CEUR-WS.org.
[10] Lee Feigenbaum, Gregory Todd Williams, Kendall Grant Clark, and Elias Torres. 2013. SPARQL 1.1 Protocol. (2013). https://www.w3.org/TR/sparql11-protocol/
[11] Roy T. Fielding and Richard N. Taylor. 2000. Principled design of the modern Web architecture. In ICSE. ACM, 407–416.
[12] Stefan Gössner and Stephen Frank. 2007. JSONPath. (2007). http://goessner.net/articles/JsonPath/
[13] Oktie Hassanzadeh, Soheil Hassas Yeganeh, and Renée J. Miller. 2011. Linking Semistructured Data on the Web. In WebDB.
[14] Matthias Hert, Gerald Reif, and Harald C. Gall. 2011. A comparison of RDB-to-RDF mapping languages. In I-SEMANTICS (ACM International Conference Proceeding Series). ACM, 25–32.
[15] Internet Engineering Task Force (IETF). 2014. The JavaScript Object Notation (JSON) Data Interchange Format. (2014). https://tools.ietf.org/html/rfc7159
[16] Johannes Koch, Carlos A Velasco, and Philip Ackermann. 2017. HTTP Vocabulary in RDF 1.0. (2017). https://www.w3.org/TR/HTTP-in-RDF10/
[17] Johannes Koch, Carlos A Velasco, and Philip Ackermann. 2017. Representing Content in RDF 1.0. (2017). https://www.w3.org/TR/Content-in-RDF10/
[18] Andreas Langegger and Wolfram Wöß. 2009. XLWrap - Querying and Integrating Arbitrary Spreadsheets with SPARQL. In International Semantic Web Conference (Lecture Notes in Computer Science), Vol. 5823. Springer, 359–374.
[19] Franck Michel, Loïc Djimenou, Catherine Faron Zucker, and Johan Montagnat. 2014. xR2RML: Non-Relational Databases to RDF Mapping Language. (2014). https://hal.inria.fr/hal-01066663v1/document
[20] Boris Motik, Peter F. Patel-Schneider, and Bijan Parsia. 2012. OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax (Second Edition). (2012). https://www.w3.org/TR/owl2-syntax/
[21] Yavor Nenov, Robert Piro, Boris Motik, Ian Horrocks, Zhe Wu, and Jay Banerjee. 2015. RDFox: A Highly-Scalable RDF Store. In International Semantic Web Conference (2) (Lecture Notes in Computer Science), Vol. 9367. Springer, 3–20.
[22] Martin J. O’Connor, Christian Halaschek-Wiener, and Mark A. Musen. 2010. M2: A Language for Mapping Spreadsheets to OWL. In OWLED (CEUR Workshop Proceedings), Vol. 614. CEUR-WS.org.
[23] Jason Slepicka, Chengye Yin, Pedro A. Szekely, and Craig A. Knoblock. 2015. KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources. In COLD (CEUR Workshop Proceedings), Vol. 1426. CEUR-WS.org.
22   Available as a web service at http://apps.islab.ntua.gr/d2rml/