<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>LDOW2018, April 2018, Lyon, France</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>D2RML: Integrating Heterogeneous Data and Web Services into Custom RDF Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandros Chortaras</string-name>
          <email>achort@cs.ntua.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Stamou</string-name>
          <email>gstam@cs.ntua.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Athens</institution>
          ,
          <addr-line>Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <volume>2</volume>
      <abstract>
        <p>In this paper, we present the D2RML Data-to-RDF Mapping Language, as an extension of the R2RML mapping language, which significantly enhances its abilities to collect data from diverse data sources and transform them into custom RDF graphs. The definition of D2RML is based on a simple formal abstract data model, which is needed to clearly define its semantics, given the diverse types of data representation standards used in practice. D2RML allows web service-based data transformations, simple data manipulation and filtering, and conditional maps, so as to improve the selectivity of RDF mapping rules and facilitate the generation of higher quality RDF data stores, through a lightweight, easy to write and modify specification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Information systems → Information integration; Web data
description languages; Query languages; Web services;</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        In the past years, a considerable amount of work has been done
on developing methodologies for mapping relational databases to
RDF graphs. Several approaches, mapping languages and systems
have been proposed, including two W3C recommendations [
        <xref ref-type="bibr" rid="ref1 ref8">1, 8</xref>
        ].
This work has mainly been motivated by the need to integrate
the huge amount of information contained in existing relational
databases with the emerging Semantic Web, and make them part
of the Linked Data cloud.
      </p>
      <p>Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).</p>
      <p>LDOW2018, April 2018, Lyon, France
© 2018 Copyright held by the owner/author(s).</p>
      <p>Following the Linked Data growth, several research institutions
and companies, such as DBpedia1, WordNet2, OpenStreetmap3,
now offer access to their huge datastores through SPARQL endpoints
or RESTful web services. Even more recently, the expansion of
cloud computing and the exciting developments in the field of
machine learning, and the subsequent revival of interest in artificial
intelligence applications, have resulted in the emergence of cloud
platforms and marketplaces that offer intelligent data analysis web
services, often representing their output using Linked Open Data
vocabularies and resources, such as DBpedia Spotlight4, Google’s
Cloud Natural Language5 and Microsoft’s Computer Vision API6.
These services typically deliver data using some structured data
exchange format (usually JSON or XML documents).</p>
      <p>
        Thus, if until recently the question was how to integrate
existing data with the Semantic Web, now part of the question is also
how to use all these available data and diverse services in a
coordinated and integrated manner to selectively pick and aggregate
data into custom data stores to power new intelligent applications.
In this respect, aggregating data into custom RDF data stores is of
particular interest not only because they allow direct integration
with the Linked Data cloud, but also because intelligence can be
added on top of the data by including e.g. axiomatic knowledge in
the form of OWL2 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] axioms. As a matter of fact, recent work on
efficient algorithms and methods for reasoning with tractable
fragments of ontologies (e.g. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) has allowed the development of
practical systems that provide inferencing over semantic data.
      </p>
      <p>In this environment, we propose D2RML, a generic Data-to-RDF
Mapping Language, whose aim is to facilitate the generation of
custom RDF data stores by selectively collecting and integrating
data from diverse data sources and web services into RDF data
stores of as high quality as possible. Our purpose is to provide
a formal basis for defining transformation-oriented general
Data-to-RDF mappings, as well as, while staying within the mapping
language approach, to transfer as much as possible of the burden
of generating such data stores in practice from writing code or
using heavyweight data workflow solutions to writing easily
understandable and modifiable specifications.</p>
      <p>The rest of the paper is organized as follows: In Section 2 we
briefly discuss related work with emphasis on R2RML and RML,
which are the starting points for our work. In Section 3 we define
the simple theoretical data model that underlies D2RML. In Section
4 we describe how several widely used information sources can
be cast onto our model, and in Section 5 we present the formal
specification of D2RML. Section 6 presents an extensive realistic
use case that showcases the expressivity and practical usefulness
of the proposed language, and Section 7 concludes the paper.
1 http://dbpedia.org/sparql/ 2 http://wordnet-rdf.princeton.edu/
3 http://api.openstreetmap.org/ 4 http://www.dbpedia-spotlight.org/api/
5 https://cloud.google.com/natural-language/
6 https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        Several languages and systems have been proposed to map
relational databases to RDF (RDB-to-RDF mapping languages). A
comparative analysis is presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which determines fifteen
desirable features (e.g. support for transformation functions, named
graphs, integrity constraints) that such languages should have, and
discusses how they are or are not supported by the several
languages. Existing RDB-to-RDF mapping languages vary
considerably in the flexibility they allow in defining mappings, from the
rigid Direct Mapping [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] approach that automatically translates
the data of a relational database into an RDF graph representation
following the database schema, to the R2RML language [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] that
allows the user to define custom views and mapping rules (expressed
as RDF graphs), and satisfies most of the fifteen desirable features.
      </p>
      <p>
        The development of mapping languages and practical systems
for translating data sources other than relational databases to RDF
graphs has also been attempted. Closer to the relational model are
CSV/TSV documents and spreadsheets, which retain the tabular
format. Tools for converting from these data sources include
XLWrap [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], TaRQL7, Vertere8, and M2 [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. In all such tools, for each
table row one or more RDF resources are generated, and for each
column one or more RDF triples about the respective resources
are generated. Other formats, such as XML, diverge considerably
from tabular data owing to their hierarchical structure, and the
systems that have been proposed to translate XML to RDF graphs rely
on XSLT transformations (e.g. XML2RDF9), XPath (e.g. Tripliser10),
XQuery (e.g. XSPARQL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) or on embedding within the XML
documents links to transformation algorithms, typically XSLT
transformations (GRDDL [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). All such tools rely on syntactical
transformations of parts of the XML structure to RDF triples. Another
framework to assist the transformation of XML and JSON data
sources is xCurator [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which focuses on delivering high-quality
linked data. Apart from the above, there exist also tools, in the
form of web services (e.g. The Datatank11) or parts of other
infrastructures (e.g. Virtuoso Sponger12), that provide custom solutions
to work with data from different formats and possibly construct
RDF graphs out of them. These tools, however, are general data
processing and transformation tools and are not designed to directly
support semantic mappings of general data to RDF triples.
      </p>
      <p>
        To resolve the polymorphy of tools and focus on the semantic
aspects of the Data-to-RDF mapping process, several works
extend the W3C recommended R2RML language to support other
data formats. These include KR2RML [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], xR2RML [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and RML
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These proposals are a considerable advance with respect to
custom system solutions, because they are based on an existing,
clean, mapping-oriented standard, and allow backward
compatibility, and in most cases extensibility. It should be noted, however,
that simply extending the R2RML standard to support other data
source types does not necessarily carry over all its features to the
other data types. E.g. select conditions and transformation
functions are supported implicitly by R2RML by relying on the
expressivity of the SQL query language, but this is not fully portable in a
straightforward extension to the case of XML or JSON documents.
7 https://github.com/tarql/tarql/ 8 https://github.com/knudmoeller/Vertere-RDF/
9 http://www.gac-grid.de/project-products/Software/XML2RDF.html
10 http://daverog.github.io/tripliser/ 11 http://thedatatank.com/
12 http://vos.openlinksw.com/owiki/wiki/VOS/VirtSponger
      </p>
      <p>
        <bold>2.1 R2RML and RML.</bold>
R2RML works with logical tables (rr:LogicalTable), which may
be either base tables or views (rr:BaseTableOrView) defined by
specifying an appropriate table name (rr:tableName), or result
sets (rr:R2RMLView) obtained by executing a query (rr:sqlQuery).
Each logical table is mapped to RDF triples using one or more
triples maps (rr:TriplesMap). A triples map is a complex rule
that maps each row in the underlying logical table to several RDF
triples. The rule has two parts: a subject map (rr:SubjectMap) that
generates the subject of all RDF triples that will be generated from
each row of the logical table, and several predicate-object maps
(rr:PredicateObjectMap) that in turn consist of predicate maps
(rr:PredicateMap) and object maps (rr:ObjectMap) or
referencing object maps (rr:RefObjectMap). A predicate map determines
the predicates of the RDF triples to be generated for the given
subject, and the object maps determine their objects. A subject map may include
several IRIs (rr:class) that will be used as objects to generate
triples with the predicate rdf:type for the particular subjects. A
subject map or predicate-object map may have also one or more
graph maps (rr:GraphMap) associated with it, which specify the
target graph of the resulting RDF triples. Referencing object maps
allow joining two different triples maps. A referencing object map
specifies a parent triples map (rr:parentTriplesMap), the subjects of
which will act as objects for the current triples map, and may
contain (rr:joinCondition) a join condition (rr:Join) specified by
a reference to a column name of the current and parent triples
map (rr:child and rr:parent, respectively). The IRIs and
literals that will be used as RDF triple subjects, predicates, objects, or
RDF graph names may be either declared constants (rr:constant),
or obtained from the underlying table, view or result set by
specifying the desired column name (rr:column) that will act as value
source, or generated through a string template (rr:template) to
concatenate column values and custom strings. String templates
offer only very rudimentary options to manipulate actual database
values and generate custom IRIs and literals.</p>
      <p>RML extends R2RML by allowing other sources (e.g. JSON or
XML files) apart from logical tables (rml:LogicalSource), which
may be used in an interlinked manner, by defining data iterators
(rml:iterator) to split the data obtained from such sources into
base elements on which each mapping rule will be applied, and
by allowing particular references (rml:reference), in the form of
subelement selectors within the base element, to define the value
sources to be used for the generation of IRIs and literals. Both the
iterators and the references depend on the underlying data source,
and may be XPath queries, JSONPath queries, CSV column names
or SPARQL return variable names. Their type is declared using the
rml:referenceFormulation predicate.</p>
      <p>With respect to the specification of the actual access to the data
sources, R2RML leaves the issue to the implementation. The
assumption is that each R2RML document applies to data from a
unique database. In contrast, RML, which allows multiple sources
and cross-references between the retrieved data, must include the
data source descriptions within the RML document. To describe
them, it suggests the use of some recommended or widely-used
vocabularies such as DCAT13, D2RQ14, CSVW15, Hydra16,
SPARQL-SD17 to access files, relational databases, CSV/TSV files, web APIs
and SPARQL endpoints, respectively. However, these vocabularies
have been developed mainly to let APIs and data sources inform
clients about their exact properties and the services they offer, not
as a means of formulating requests to them. E.g. to retrieve data from
a web API that paginates its results using next-page access keys,
knowledge of how to formulate each subsequent HTTP
request is needed; this is not covered, for example, by Hydra.
Similarly, a SPARQL-SD specification provides information about the
supported SPARQL version, the default entailment regime, the
default named graph, etc., which is not useful to a client at the time
of formulating a request.</p>
    </sec>
    <sec id="sec-4">
      <title>DATA MODEL</title>
      <p>In this section, we extend the table-based model underlying the
R2RML language to support complex, non-tabular data that can be
obtained from various information sources (such as sources
returning JSON or XML documents). To do this we consider that, instead
of logical tables, RDF triples are generated from set tables. In the
following we represent an RDF triple as a tuple ⟨s, p, o⟩, where s is
the subject, p the property or predicate and o the object.</p>
      <p>Definition 3.1. A set row of arity k is a tuple ⟨D1, . . . , Dk ⟩, where
D1, . . . , Dk are sets of values over some domains. A name row of
arity k is a tuple ⟨n1, . . . , nk ⟩, where n1, . . . , nk are names. A set
table of arity k with m rows is a tuple S = ⟨N , T ⟩, where N is a
name row and T = [D1, . . . , Dm ] a list of set rows, all of arity k,
such that the i-th elements of D1, . . . , Dm , for 1 ≤ i ≤ k, all share
the same domain.</p>
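      <p>To make Definition 3.1 concrete, a set table can be sketched in Python as a name row paired with a list of set rows, where every cell is a set of values rather than a single value; all names and values below are hypothetical:</p>

```python
# A set table (Definition 3.1): a name row plus a list of set rows.
names = ("name", "genre")  # name row of arity 2

# Two set rows; note that a cell may hold several equivalent values.
rows = [
    ({"Alice"}, {"jazz", "blues"}),
    ({"Bob"}, {"rock"}),
]

def column(names, rows, n):
    """Return the column S[n]: the list of value sets under name n."""
    i = names.index(n)
    return [row[i] for row in rows]
```

      <p>E.g. column(names, rows, "genre") yields the list of genre sets, one per set row, matching the notation S[n].</p>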
      <p>The names allow us to refer to particular elements of set rows
and tables. We denote the set of values that corresponds to name
ni (1 ≤ i ≤ k) in a set row D by D[ni ]. We also denote the list
[D1[ni ], . . . , Dm [ni ]] of value sets that are obtained from the
several set rows of S by S[ni ], which we call a column of S. Let also
dom(n) denote the domain of column n. It should be underlined
that for a particular set row D and the different possible names ni ,
the several sets D[ni ] may have different numbers of values; there
is no alignment between the individual values among the several
sets, and all individual values are equivalent with respect to their
relation to the values of the other sets in the same set row.
      <p>Definition 3.2. A filter F over a set table S of arity k is a tuple
⟨n, f ⟩, where n is a column name and f : dom(n) → dom(n) a
function, such that f (D[n]) ⊆ D[n] for all set rows D of S.</p>
      <p>We denote by F (D) the set value f (D[n]) obtained by applying F
to a set row D. Clearly, f may be the identity function.</p>
      <p>Definition 3.3. A triples rule R over a set table S = ⟨N , T ⟩ is
a triple of filters ⟨Fs , Fp , Fo ⟩ over S, called the subject, predicate
and object filter, respectively. The implementation of R is the set of
RDF triples
13 https://www.w3.org/TR/vocab-dcat/
14 http://d2rq.org/d2rq-language
15 https://www.w3.org/TR/tabular-metadata/
16 https://www.hydra-cg.com/spec/latest/core/
17 https://www.w3.org/TR/sparql11-service-description/</p>
      <p>{(s, p, o) | s ∈ Fs (D), p ∈ Fp (D), o ∈ Fo (D), D ∈ T }.</p>
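      <p>Row by row, this implementation is the Cartesian product of the three filtered value sets. A minimal Python sketch, using identity filters over a hypothetical one-row set table:</p>

```python
from itertools import product

# A filter (Definition 3.2) pairs a column name with a function f
# satisfying f(D[n]) ⊆ D[n]; here every f is the identity function.
def make_filter(names, n, f=lambda s: s):
    i = names.index(n)
    return lambda row: f(row[i])

names = ("s", "p", "o")
rows = [({"ex:a"}, {"ex:knows"}, {"ex:b", "ex:c"})]  # one set row

Fs = make_filter(names, "s")
Fp = make_filter(names, "p")
Fo = make_filter(names, "o")

# Implementation of the rule: all (s, p, o) with s in Fs(D), p in Fp(D),
# o in Fo(D), taken over every set row D of the table.
triples = {t for row in rows for t in product(Fs(row), Fp(row), Fo(row))}
```
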
      <p>A set of triples rules over one or more set tables defines a
Data-to-RDF mapping. Using the above simple model we can define
Data-to-RDF mappings for any information sources that can give rise to
one or more set tables. The triple store represented by a
Data-to-RDF mapping is then the implementation of all its triples rules.</p>
      <p>We consider an information source to be any online software
system that can deliver structured data upon request. The information
source may be a data repository (e.g. a relational database, an RDF
store, an XML file stored in some directory) or an implementation
of a service or an algorithm (e.g. a RESTful web service) that may
process some input data and deliver some structured output. The
request, in the form of a query (e.g. an SQL or SPARQL SELECT
query) or message (e.g. an HTTP GET or POST request) in a
format supported by the information source, includes all input data
and parameters required by the information source to generate and
deliver the output. The reply, or effective data source, is the output
produced by the information source upon processing the request.
The reply may be delivered to the client in a native format (e.g. as
an SQL result set), or in a generic document format (e.g. as a JSON
or XML document).</p>
      <p>To accommodate the several possible information sources in
our model, we consider, as in RML, that the effective data source
groups some set of autonomous elements (e.g. rows of an SQL
result set, elements of a JSON array). The division of the reply into
these autonomous elements is achieved through an iterator. Hence,
an effective data source together with an iterator specifies a logical
array, through whose items the iterator eventually iterates. Each
item of a logical array may itself be a complex data structure (a
new effective data source), so in order to extract from it lists of
values to construct set rows and use them as subjects, predicates and
objects of RDF triples, we need some selectors. Thus, the role of the
selectors is to transform a logical array into a set table.</p>
      <p>Definition 3.4. The triple A = ⟨I, t , L⟩, where I is an
information source and request specification, t an iterator specification,
and L a set of selectors, is a data acquisition pipeline.</p>
      <p>It follows that each data acquisition pipeline A gives rise to a
unique set table SA . A data acquisition pipeline may be
parametric, in the sense that the information source or request specification
may contain parameters. Given a non-parametric data acquisition
pipeline A, a parametric data acquisition pipeline A ′ that depends
on A is a data acquisition pipeline whose parameters take values
from one or more columns of SA . We call such a parametric data
acquisition pipeline a transformation of A.</p>
      <p>Definition 3.5. A series of data acquisition pipelines A0, A1, . . .,
Al , where each Ai , for i ≥ 1, is a transformation that depends on
one or more Aj for j &lt; i, is a set table specification. A0 is the
primary data acquisition pipeline.</p>
      <p>A set table specification gives rise to a unique set table, which is
SA0 extended by columns contributed by transformations A1, . . .,
Al . A trivial set table specification consists only of the primary
data acquisition pipeline A0. Each transformation in a set table
specification is realized as a series of requests to the respective
information source, after binding the parameters to all possible
combinations of values obtained from the referred to columns of the set
table constructed from the preceding data acquisition pipelines. In
particular, to evaluate a set table specification, we must evaluate
serially the data acquisition pipelines, extending at each step the
previously obtained set table: The primary data acquisition pipeline
A0 gives rise to set table SA0 . Then, for each set row D of SA0 ,
evaluating A1 gives rise to a set table SA1 (D). By flattening all
rows of SA1 (D) into a single row (by merging the respective
column values of each row) we obtain a new set row that is appended
to D. Doing this for all set rows D results in SA0 A1 . By applying
this process iteratively, eventually SA0 is extended with additional
columns to set table SA0 A1 ... Al .</p>
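      <p>The evaluation procedure above, including the row flattening step, can be sketched in Python, modeling each data acquisition pipeline as a plain function; this modeling, and all names and values, are assumptions made only for illustration:</p>

```python
# A "pipeline" is modeled as a function returning (names, set_rows).

def primary():
    # Primary pipeline A0: a set table with one column "id".
    return ("id",), [({"1"},), ({"2"},)]

def transform(row):
    # Hypothetical dependent pipeline A1: one request per value of
    # column "id", producing a small set table with column "label".
    labels = {f"item-{v}" for v in row[0]}
    return ("label",), [({label},) for label in labels]

def flatten(sub_rows, arity):
    # Row flattening: merge all rows of the sub-table into a single
    # set row by unioning the value sets of each column.
    return tuple(set().union(*(r[i] for r in sub_rows)) for i in range(arity))

names0, rows0 = primary()
new_names, _ = transform(rows0[0])
names = names0 + new_names          # extended name row
rows = []
for row in rows0:
    _, sub_rows = transform(row)    # evaluate A1 for this set row of S_A0
    rows.append(row + flatten(sub_rows, len(new_names)))
```
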
      <p>More formally, let n1, . . . , nk be the names, and [D1, . . . , Dm ]
the rows of Sˆ = SA0 ... Ai . Evaluating Ai+1 on each row of Sˆ
produces set tables SAi+1 (D1), . . ., SAi+1 (Dm ). Since all these set
tables are produced by the same data acquisition pipeline Ai+1, they
share the same arity, say k′, and column names, say nˆ1, . . . , nˆk′ .
Thus SA0 ... Ai+1 = ⟨N , T ⟩, where N = ⟨n1, . . . , nk , nˆ1, . . . , nˆk′ ⟩,
T = [D1′, . . . , Dm′ ], Dj′ = ⟨Dj [n1], . . . , Dj [nk ], Dˆ j1, . . . , Dˆ jk′ ⟩
for 1 ≤ j ≤ m, and Dˆ jl = ⋃ SAi+1 (Dj )[nˆl ] for 1 ≤ l ≤ k′.</p>
      <p>The row flattening step is intentional: SA0 provides the
original data that we want to extend through transformations, i.e. by
appending new columns containing new properties of that data.
Since, as mentioned above, all values contained in a particular row
and column of SA are equivalent with respect to the values in
the sets of the other columns of the current row, the flattening
behaviour maintains this relationship between values, without
introducing undesired hierarchical dependencies. Finally, the primary
data acquisition pipeline may itself be parametric. In this case, the
evaluation is done exactly as described above, but the set rows
generated by A0 are not appended to the set table on which it
depends, but initiate a new set table.</p>
    </sec>
    <sec id="sec-5">
      <title>INFORMATION SOURCES AND REPLIES</title>
      <p>We now study how several information and effective data sources
used in real applications can be accommodated by our model. We
discuss relational databases, RESTful web services, JSON, XML
and CSV/TSV documents, and SPARQL endpoints.</p>
    </sec>
    <sec id="sec-6">
      <title>Relational Databases</title>
      <p>
        In relational databases data is organized into one or more tables
(or relations) of columns (or attributes) and rows (or tuples). Each
table column has a name. Data are retrieved by issuing an SQL
SELECT query and the results are packed as a result set, which
is essentially a row-by-row iterable table along with its
metadata. Because relational database management systems (RDBMSs)
use native formats to implement the data stores and the result
formats, communication with RDBMSs is done using special
protocols (such as ODBC, JDBC) that implement clients for particular
RDBMSs. Practical access requires several parameters to be
specified (e.g. server location, database name, user name, password,
access driver), which are usually grouped in the so-called
connection string and are programming language and implementation
dependent. There is no standard for representing connection strings in
RDF form. D2RQ Mapping Language [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] allows a JDBC-dependent
RDF definition of connection strings and is used by RML to specify
RDBMS connectivity.
      </p>
      <p>An implementation provided with an RDBMS connection
specification can connect to the particular RDBMS, pose an SQL SELECT
query q that specifies attributes n1, . . . , nk in the SELECT
statement for the returned columns, and obtain as result a list of rows
[⟨v11, . . . , v1k ⟩, . . . , ⟨vn1, . . . , vnk ⟩]. Using a trivial row iterator
and column names n1, . . . , nk as selectors, the results of q can be
converted to the following set table: ⟨⟨n1, . . . , nk ⟩,
[⟨{v11}, . . . , {v1k }⟩, . . . , ⟨{vn1}, . . . , {vnk }⟩]⟩.</p>
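      <p>This conversion can be sketched with Python's built-in sqlite3 module: the SELECT attributes act as selectors, the trivial iterator walks the result rows, and every cell becomes a singleton set. The table and its contents are hypothetical:</p>

```python
import sqlite3

# A hypothetical in-memory database standing in for any RDBMS.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, city TEXT)")
con.executemany("INSERT INTO person VALUES (?, ?)",
                [("Alice", "Athens"), ("Bob", "Lyon")])

cur = con.execute("SELECT name, city FROM person")
names = tuple(d[0] for d in cur.description)     # column names as selectors
rows = [tuple({v} for v in r) for r in cur]      # each cell a singleton set
```
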
    </sec>
    <sec id="sec-7">
      <title>RESTful Web Services</title>
      <p>
        RESTful web services are services based on the REST principles
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and are usually implemented using the HTTP protocol.
Typically, a data retrieving RESTful service accepts an HTTP request
and delivers the result in a self-descriptive text message (e.g. an
HTML, XML, JSON or plain text document). Here we are interested in
structured reply services, i.e. services whose reply is in one of the XML,
JSON or CSV/TSV formats. To access a RESTful web service, the
elements of the appropriate HTTP request have to be specified.
These include the method (GET or POST), the URI (including the
query string in the case of a GET message), any headers, and the
body (for passing parameters in the case of a POST message). All
these can be specified in RDF using the W3C’s Working Group
Notes ‘HTTP Vocabulary in RDF 1.0’ [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and ‘Representing
Content in RDF 1.0’ [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Thus, we can assume that an HTTP client that
can consume an HTTP Vocabulary and Representing Content in
RDF 1.0 description to create an HTTP request, can use a RESTful
web service and obtain as result a structured document. Although
not strictly qualifying as RESTful web services, we include in this
category also URIs that simply deliver structured documents (e.g.
URIs to static JSON/XML files), since the communication is
performed in exactly the same way through HTTP messages.
      </p>
      <p>A practical consideration usually related to some RESTful
web services is that the APIs that implement the services, to avoid
extremely long replies, perform pagination of the results and do
not return the full set of results as one document, but as a series
of smaller documents: in most cases, each returned document
contains some keys that can be used by the client in the subsequent
request to instruct the server to return the next set of results. The
pagination scheme may be non-trivial, as in the case of MediaWiki18.</p>
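      <p>The continuation-key pattern can be sketched as a simple loop that re-issues the request with the key returned in the previous reply. The fetch function below stands in for an HTTP GET, and the "items"/"next" key names are hypothetical:</p>

```python
# Fake paginated API: each reply carries a page of items and,
# except on the last page, a continuation key under "next".
PAGES = {
    None: {"items": [1, 2], "next": "p2"},
    "p2": {"items": [3], "next": "p3"},
    "p3": {"items": [4, 5]},  # no "next" key: last page
}

def fetch(page_key=None):
    # Stand-in for an HTTP request parameterized by the access key.
    return PAGES[page_key]

def fetch_all():
    items, key = [], None
    while True:
        reply = fetch(key)
        items.extend(reply["items"])
        key = reply.get("next")
        if key is None:       # server sent no continuation key: done
            return items
```
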
    </sec>
    <sec id="sec-8">
      <title>SPARQL Endpoints</title>
      <p>
        SPARQL endpoints are URIs at which a SPARQL Protocol service
listens [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. SPARQL Protocol is built on top of HTTP and as such
it can be treated as a RESTful web service. However, since special
SPARQL Protocol clients, in the form of APIs, exist (e.g. Apache
Jena19) that hide from the user the cumbersome details of building
and decoding the necessary HTTP request and reply messages, it
is useful to provide support also for this type of interaction. The
situation is similar to the RDBMS case: The request is a SPARQL
SELECT query (possibly along with some default and named RDF graph
IRIs) instead of an SQL SELECT query, and the effective data source
is a result set, whose column names are the return variable names
specified in the SPARQL query. Thus, the translation of the reply
to a set table is done in exactly the same way. The only essential
thing that changes is the specification of the access to the SPARQL
endpoint, for which a single URI is enough.
      </p>
    </sec>
    <sec id="sec-9">
      <title>JSON Documents</title>
      <p>
        A JSON document [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] may be modeled as a JSON tree [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A JSON
tree is an edge-labeled tree, whose root represents the entire
document. A node may have either string- or integer-labeled children,
but not both. A node with string-labeled outgoing edges represents
a set of JSON key-value pairs: the edge label is the key and the edge
destination the corresponding value. A node with integer-labeled
outgoing edges represents an array: the edge label is the array
index and the edge destination the corresponding value. Value nodes
are either leaf nodes, having a string or integer label, or JSON trees.
      </p>
      <p>
        In the absence of an official standard, to select values from a
JSON document that meet specific conditions, in practice the
JSONPath [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] specification is used, which is inspired by XPath.
JSONPath queries select nodes of a JSON tree that meet a certain path
condition, and group them into a JSON array, which is the result
of the query. Since a JSON array is a JSON document, the result of
a JSONPath query is always a JSON document. We will say that a
JSONPath query is flat if the result JSON tree has depth 1, i.e. is an
array of simple values.
      </p>
      <p>Hence, an iterator for a JSON tree T is any relevant JSONPath
query q, which splits T into a logical array of smaller JSON trees
T1, . . . , Tn , and the selectors are flat JSONPath queries q1, . . . qk
that are executed over each T1, . . . , Tn to deliver a set table from the
underlying logical array. Thus T , after applying iterator q and
selectors q1, . . . qk , yields the set table ⟨⟨q1, . . . , qk ⟩,
[⟨C11, . . . , C1k ⟩, . . . , ⟨Cn1, . . . , Cnk ⟩]⟩, where Ci j is the set of
values contained in the array that results from applying qj on Ti .
18 https://www.mediawiki.org/wiki/API:Query 19 https://jena.apache.org/</p>
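      <p>This construction can be sketched without a JSONPath engine by using plain Python callables in place of the iterator q and the flat selectors qi; the document shape and the JSONPath expressions mentioned in the comments are hypothetical:</p>

```python
import json

doc = json.loads('{"records": [' 
                 '{"title": "A", "tags": ["x", "y"]},'
                 '{"title": "B", "tags": ["z"]}]}')

# Iterator: splits the tree into a logical array of smaller JSON trees
# (a stand-in for a JSONPath query such as $.records[*]).
iterate = lambda t: t["records"]

# Flat selectors: each returns an array of simple values per item
# (stand-ins for flat queries such as $.title and $.tags[*]).
selectors = {"title": lambda t: [t["title"]], "tags": lambda t: t["tags"]}

names = tuple(selectors)
rows = [tuple(set(sel(item)) for sel in selectors.values())
        for item in iterate(doc)]
```
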
    </sec>
    <sec id="sec-10">
      <title>XML Documents</title>
      <p>
        An XML document may also be modeled using a tree [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], however
its structure differs from a JSON tree. The core part of an XML
document is represented in the tree by element, attribute and text
nodes. Each element node corresponds to an element of the XML
document and has a name (the element name) and children that
are all the enclosed elements. It may also have as a child a text node
that holds in its string value the characters in the CDATA section
of the element. Each element node may have associated with it also
a set of attribute nodes that represent the attributes of the element,
which, however, are not considered to be children of the element
node. Each attribute node has a name (the attribute name) and a
string value that holds the respective attribute value. Relying on
this model, the XPath language allows to select particular nodes
from the tree that meet certain conditions. Unlike in the case of
JSON, the result is not itself an XML document, but a set of the
nodes that match the query criteria. We will say that an XPath
query is flat if the result contains only text or attribute nodes.
      </p>
      <p>Hence, we can consider as iterator for an XML document tree
T any relevant non-flat XPath query q that splits T into a logical
array of nodes N1, . . . , Nn. Since the query is non-flat, these nodes
are element nodes, and can be treated as smaller XML document
trees T1, . . . , Tn. The selectors are then flat XPath queries q1, . . . , qk
that are executed over each one of these smaller XML documents.
Thus, T after applying iterator q and selectors q1, . . . , qk yields
the set table ⟨⟨q1, . . . , qk⟩, [⟨C11, . . . , C1k⟩, . . . , ⟨Cn1, . . . , Cnk⟩]⟩,
where Cij are the string values of the text or attribute nodes in the
node set obtained by applying qj on Ti.</p>
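      <p>As an illustration, Python's xml.etree.ElementTree supports a small subset of XPath; the sketch below (the helper names are ours, and attribute selection is handled explicitly, since ElementTree returns element nodes rather than attribute nodes) derives the set table for a toy document:
```python
import xml.etree.ElementTree as ET

# build a toy tree: a catalog element with two item children,
# each carrying an id attribute and one or more tag children
doc = ET.Element("catalog")
for iid, tags in [("a", ["x", "y"]), ("b", ["z"])]:
    item = ET.SubElement(doc, "item", id=iid)
    for t in tags:
        ET.SubElement(item, "tag").text = t

def set_table(root, iterator, selectors):
    # iterator: XPath selecting element nodes (the logical array T1..Tn);
    # each selector is an (xpath, attribute) pair: attribute None takes
    # the text values of the matched elements, xpath "." matches Ti itself
    rows = []
    for t in root.findall(iterator):
        row = []
        for xpath, attr in selectors:
            nodes = [t] if xpath == "." else t.findall(xpath)
            row.append({n.get(attr) if attr else n.text for n in nodes})
        rows.append(tuple(row))
    return rows

rows = set_table(doc, "item", [(".", "id"), ("tag", None)])
```
Here the iterator item yields the two item elements, and the selectors collect the id attribute and the tag text values of each.
      </p>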
    </sec>
    <sec id="sec-11">
      <title>4.6 CSV/TSV Documents</title>
      <p>CSV/TSV documents are textual representations of tabular data.
Each line represents a data row, except possibly for the first row,
which contains the names of the columns. Hence, the situation is
similar to the RDBMS case, with no need for a query to be specified.
The name tuple consists of the names of the columns in the file (or
of their numbering) and the row sets of the actual rows of the table.
The only things that need to be specified are the formatting details
(e.g. delimiter, escape character, quote character).</p>
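      <p>A minimal sketch with Python's standard csv module (the sample text is ours) shows that the formatting details alone suffice to recover the name tuple and the rows:
```python
import csv
import io

# toy CSV text using ";" as delimiter and '"' as quote character
text = 'id;name\n1;"Smith; John"\n2;Doe\n'
reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
header = next(reader)               # the name tuple (first row)
rows = [tuple(r) for r in reader]   # the actual rows of the table
```
      </p>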
    </sec>
    <sec id="sec-12">
      <title>5 D2RML SPECIFICATION</title>
      <p>D2RML draws significantly from R2RML and RML, and follows
the same simple syntactical strategy for defining mappings: triples
maps, which consist of a subject map and several predicate object
maps. From RML it adopts, and appropriately extends, the way to
define the interaction with information sources through requests,
iterators and selectors. Moreover, it significantly extends the
expressive capabilities of R2RML and RML by allowing
transformations, conditional statements, and custom IRI generation functions.</p>
      <p>For its semantics, D2RML relies on the data model described in
Section 3. Each triples map is essentially a set table specification
of Def. 3.3 and a specification of a set of triple rules of Def. 3.5
with the same subject filter over the common underlying set table.
The information source, request and iterator of the original data
acquisition pipeline are directly provided in the triples map
definition. Any transformations to be added to the set table specification
are declared in the order of their application.</p>
      <sec id="sec-12-1">
        <title>LogicalTable</title>
        <p>LogicalTable ← a rr:LogicalTable
dr:source ⟨InformationSource⟩
SQLTable | SPARQLTable | CSVTable
(is:parameters ( ⟨DataVariable⟩+ ))?
SQLTable ← ( a rr:BaseTableOrView
rr:tableName literal ) |
( a rr:R2RMLView
rr:sqlQuery literal
(rr:sqlVersion iri)? )</p>
      </sec>
      <sec id="sec-12-2">
        <title>LogicalSource</title>
        <p>LogicalSource ← a dr:LogicalSource
dr:source ⟨InformationSource⟩
dr:iterator literal
dr:referenceFormulation iri</p>
      </sec>
      <sec id="sec-12-3">
        <title>SPARQLTable</title>
        <p>SPARQLTable ← a dr:SPARQLTable
dr:sparqlQuery literal
(dr:sparqlVersion iri)?
(dr:defaultGraph iri)*
(dr:namedGraph iri)*</p>
      </sec>
      <sec id="sec-12-4">
        <title>CSVTable</title>
        <p>CSVTable ← a dr:TextTable
dr:delimiter literal
dr:headerline boolean
(dr:quoteCharacter literal)?
(dr:commentCharacter literal)?
(dr:escapeCharacter literal)?
(dr:recordSeparator literal)?</p>
        <p>The selectors are
implicitly declared in the subject, predicate, object and graph maps.
Several triples maps are allowed to coexist in a D2RML
document, in which case several distinct set tables are generated.</p>
        <p>We define D2RML using a BNF-like notation. Terminal
symbols are written in monospace, and non-terminals in italics.
Nonterminals within angle brackets represent RDF nodes. Parentheses
specify the scope of alternatives (separated by |) and of the
standard quantifiers ?, *, and +. Terminal symbols not explicitly defined
in the specification are written in small caps. The namespaces are
defined in Table 3. D2RML is compatible with R2RML, but not fully
compatible with RML, so it does not directly extend its namespace.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>5.1 Triples Maps</title>
      <p>A triples map is defined as in R2RML and RML, but information
sources that provide tabular data are clearly distinguished from
non-tabular ones: rr:logicalTable is used for tabular data providing
information sources, and dr:logicalSource for the rest.</p>
      <sec id="sec-13-1">
        <title>TriplesMap</title>
        <p>TriplesMap ← a rr:TriplesMap
rr:logicalTable ⟨LogicalTable⟩ |
dr:logicalSource ⟨LogicalSource⟩
(dr:transformations ( ⟨Transformation⟩+ ))?
rr:subjectMap ⟨SubjectMap⟩ | rr:subject iri
(rr:predicateObjectMap ⟨PredObjMap⟩)*
PredObjMap ← a rr:PredicateObjectMap
(rr:predicateMap ⟨PredicateMap⟩ |
rr:predicate iri)+
(rr:objectMap ( ⟨ObjectMap⟩ | ⟨RefObjectMap⟩) |
rr:object (iri | literal))+
(rr:graphMap ⟨GraphMap⟩ | rr:graph iri)*</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>5.2 Logical Tables and Logical Sources</title>
      <p>The LogicalTable and LogicalSource nodes provide details about the
primary information source used to generate the set table. In the
case of query-supporting information sources (such as RDBMSs
and SPARQL endpoints), for backward compatibility with R2RML,
they also contain the query-relevant details of the request that
should be sent to the information source. The is:parameters
predicate may be used to declare parameter names in queries that
participate in parametric data acquisition pipelines. For other
information sources (such as RESTful web services), the request, and any
parameters, are included in the InformationSource specification
itself. For non-tabular data providing information sources,
LogicalSource contains also the definition of the iterator (dr:iterator
and dr:referenceFormulation) that will be used to split the
effective data source into a logical array. Since the effective data
source format is fixed, the object of dr:referenceFormulation
determines also the form of all selectors that will be applied on the
particular effective data source.</p>
    </sec>
    <sec id="sec-15">
      <title>5.3 Information Sources</title>
      <p>The version of D2RML presented here provides definitions for
implementing data acquisition pipelines involving RDBMSs,
RESTful web services and SPARQL endpoints. Extensions for additional
sources are expected in subsequent versions.</p>
      <sec id="sec-15-1">
        <title>InformationSource</title>
        <p>InformationSource ← RDBMSSource | SPARQLService | HTTPSource
RDBMSSource ← a is:RDBMSSource
is:rdbms iri
is:location literal
(is:username literal)?
(is:password literal)?
(is:database literal)?
SPARQLService ← a is:SPARQLService
is:uri uri
HTTPSource ← a is:HTTPSource
is:request ⟨HTTPRequest⟩ | is:uri uri
(is:parameters ( ⟨Parameter⟩+ ))?
Parameter ← DataVariable | SimpleKeyRequestIterator</p>
        <p>In an RDBMSSource, is:rdbms determines the specific RDBMS
(e.g. MySQL, PostgreSQL). An HTTPSource is specified in terms of an
HTTPRequest, which should be a http:Request and specify the
details of the HTTP message to be sent.</p>
      </sec>
      <sec id="sec-15-2">
        <title>DataVariable | SimpleKeyRequestIterator</title>
        <p>DataVariable ← a is:DataVariable
is:name literal
SimpleKeyRequestIterator ← a is:SimpleKeyRequestIterator
is:name literal
dr:reference literal
dr:referenceFormulation literal
is:initialValue literal</p>
        <p>An HTTPSource may contain
parameters in case the web service is part of a parametric data
acquisition pipeline, or it paginates the results. Data parameters are
identified by a name (is:name). For paginated results, the above
specification allows, as an example, iterated requests through a
request iterator that should be part, e.g., of the web service URI, and
whose values, apart from the initial value (is:initialValue), are
extracted each time from the previous reply using a selector.
Extensions are possible to support additional pagination policies.</p>
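        <p>The request-iterator pagination scheme can be sketched as follows (plain Python against a hypothetical two-page service; fetch stands in for the HTTP request in which the iterator value is substituted):
```python
# hypothetical paginated service: each reply carries the key ("nextCursor")
# needed for the next request; None marks the last page
PAGES = {"*":  {"items": [1, 2], "nextCursor": "c1"},
         "c1": {"items": [3],    "nextCursor": None}}

def fetch(cursor):
    # stands in for an HTTP request whose URI contains the cursor value
    return PAGES[cursor]

def fetch_all(initial="*"):
    items, cursor = [], initial
    while cursor is not None:
        reply = fetch(cursor)
        items.extend(reply["items"])
        cursor = reply["nextCursor"]  # selector applied to the reply
    return items
```
Iteration starts from the declared initial value and stops when the selector extracts no further key from the reply.
        </p>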
      </sec>
    </sec>
    <sec id="sec-16">
      <title>5.4 Transformations</title>
      <p>A triples map definition may include a list of transformations that
should be applied in the declared order to the set table derived from
the primary information source. Since a transformation is itself
a parametric data acquisition pipeline, its definition includes the
specification of an InformationSource through a rr:logicalTable
or dr:logicalSource and one or more ParameterBindings. A
ParameterBinding consists of a reference to a value (ValueRef ) or a
constant value, and the parameter name (dr:parameter) in the
corresponding information source to which the value will be bound.
Transformation ← a dr:Transformation
rr:logicalTable ⟨LogicalTable⟩ |
dr:logicalSource ⟨LogicalSource⟩
(dr:parameterBinding ⟨ParameterBinding⟩)+
ParameterBinding ← a dr:ParameterBinding
dr:parameter literal
rr:constant literal | ValueRef</p>
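      <p>The role of a parameter binding can be sketched in plain Python (spotlight is a stand-in for an external annotation service; all names here are ours, not part of D2RML):
```python
def spotlight(text):
    # stand-in for an external service invoked once per row
    return [w for w in text.split() if w.istitle()]

# primary set table rows, one column, represented as dictionaries
primary = [{"dcDescription": "Berrett decorated by the King"}]

def apply_transformation(rows, parameter, service, new_column):
    # bind the value of `parameter` from each row, call the service,
    # and extend the set table with the reply as a new column
    return [{**row, new_column: service(row[parameter])} for row in rows]

rows = apply_transformation(primary, "dcDescription", spotlight, "entities")
```
Each row of the primary set table supplies one value for the bound parameter, and the service reply extends the set table with new columns.
      </p>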
    </sec>
    <sec id="sec-17">
      <title>5.5 Term Maps and Conditions</title>
      <p>The definitions of term maps (i.e. of subject maps, graph maps,
predicate maps and object maps) follow the R2RML specification
with the addition of filters.</p>
      <p>SubjectMap ← a rr:SubjectMap
IRIRef | BlankNodeRef
(SubjectBody CaseSubjectBody*) | CaseSubjectBody+
PredicateMap ← a rr:PredicateMap
(PredicateBody CasePredBody*) | CasePredBody+
ObjectMap ← a rr:ObjectMap
(ObjectBody CaseObjectBody*) | CaseObjectBody+
GraphMap ← a rr:GraphMap
(GraphBody CaseGraphBody*) | CaseGraphBody+
SubjectBody ← (rr:class iri)*
(rr:graphMap ⟨GraphMap⟩ | rr:graph iri)*
(dr:condition ⟨Condition⟩)?
PredicateBody ← IRIRef
(dr:condition ⟨Condition⟩)?
ObjectBody ← IRIRef | BlankNodeRef | LiteralRef
(dr:condition ⟨Condition⟩)?
GraphBody ← IRIRef
(dr:condition ⟨Condition⟩)?
CaseSubjectBody ← dr:cases ( ⟨SubjectBody⟩+ )
CasePredBody ← dr:cases ( ⟨PredicateBody⟩+ )
CaseObjectBody ← dr:cases ( ⟨ObjectBody⟩+ )</p>
      <sec id="sec-17-9">
        <title>CaseGraphBody ← dr:cases ( ⟨GraphBody ⟩+ )</title>
        <p>To support filters, a SubjectMap, GraphMap, PredicateMap or
ObjectMap may contain a condition (dr:condition) and/or a case
statement (dr:cases). If a term map contains a condition
statement, this will be evaluated and the corresponding subject, graph,
predicate or object value will be included in the respective value
set only if the condition evaluates to true. Each condition statement
should first specify the actual value on which it will operate (as a
ValueRef ), and may include several tests, which will be jointly
evaluated using the boolean operator specified by dr:booleanOperator
(op:and or op:or). Each test is specified either through an
operator and a literal, which define a constant value with which the
actual value will be compared using the operator, or as a nested
condition. An operator is a common operator such as op:eq, op:le,
op:leq, op:ge, op:geq, op:matches, etc. The type of the operation
(e.g. number or string comparison) depends on the XSD type of the
literal provided as operand. If a nested condition does not specify
a value reference, it inherits it from the enclosing condition.</p>
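        <p>The evaluation of such a condition can be sketched as follows (a simplified stand-in for the D2RML semantics, using Python callables from the operator module in place of the op: operators; the dictionary encoding is ours):
```python
import operator

def evaluate(value, cond):
    # a condition carries a boolean operator and a list of tests; a test
    # is either an (operator, constant) pair or a nested condition, which
    # inherits the actual value unless it names its own
    bool_op = cond.get("booleanOperator", "and")
    results = []
    for test in cond["tests"]:
        if isinstance(test, dict):  # nested condition
            results.append(evaluate(test.get("value", value), test))
        else:
            op_fn, constant = test
            results.append(op_fn(value, constant))
    return all(results) if bool_op == "and" else any(results)

# keep values in the interval [0.4, 1.0]: two tests joined with "and"
cond = {"booleanOperator": "and",
        "tests": [(operator.ge, 0.4),
                  (operator.le, 1.0)]}
```
A value passes this condition only if both tests hold, mirroring how a term map value is admitted to its value set.
        </p>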
        <p>The case statement offers alternatives for realizing a term map:
it contains a list of alternative term maps, each along with a
condition. If the condition evaluates to true, the term map is realized;
otherwise control flows to the next case.</p>
        <p>Finally, a referring object map (RefObjectMap) may be defined by
a ParameterBinding, instead of by an R2RML JoinCondition. This
is how set table specifications with parametric primary data
acquisition pipelines are defined: the parametric set table specification
corresponds to the parent triples map of the RefObjectMap, and the
ParameterBinding provides the parameter values.</p>
      </sec>
    </sec>
    <sec id="sec-18">
      <title>5.6 IRIs, Literals and Blank Nodes</title>
      <p>In R2RML, RDF terms are generated using the rr:constant, the
rr:column and rr:template predicates; to these, RML adds the
rml:reference option. D2RML follows the same strategy, but to
account for values coming from transformations, RDF terms are
generated through value references (ValueRefs), specified by two
distinct components: a compulsory rr:column, rr:template or
dr:reference, and an optional dr:transformationReference to
specify the transformation that provides the logical array for the
respective rr:column, rr:template or dr:reference. If missing,
the primary logical array is assumed.</p>
      <p>Although rr:template allows some minimal flexibility in
defining custom IRIs or literals, the overall mechanism is quite
restrictive, since no simple transformations (e.g. replacing particular
characters) can be applied on the values obtained from the
underlying set tables. D2RML addresses this issue by allowing simple
functions to be applied on the raw values obtained from effective data
sources. Thus, a ValueRef may include definitions of one or more
defined columns (dr:definedColumns) that are constructed by
applying a series of functional transformations on particular set table
column values, and may be used in a rr:column or rr:template.
A defined column should declare the new column name (dr:name) by
which it will be referred, the function (dr:function) that will generate
the custom values (e.g. op:regex, op:replace), and a list of
arguments, in the form of one or more dr:parameterBindings. The
parameter names should be provided by the function definition.</p>
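      <p>The effect of a defined column can be sketched in plain Python (re.sub stands in for a function such as op:replace; the column and template names are ours, for illustration only):
```python
import re

def defined_column(rows, name, func, source):
    # construct a new column `name` by applying `func` to the values of
    # an existing column `source` in every set-table row
    return [{**row, name: func(row[source])} for row in rows]

rows = [{"title": "Mona Lisa"}, {"title": "The Scream"}]
rows = defined_column(rows, "slug",
                      lambda v: re.sub(r"\s+", "_", v.lower()), "title")

# the defined column can then be referenced in an IRI template
template = "http://example.org/item/{slug}"
iris = [template.format(**row) for row in rows]
```
The raw title values are lowercased and whitespace-normalized before being substituted into the template, which plain rr:template alone cannot express.
      </p>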
      <sec id="sec-18-1">
        <title>IRIRef</title>
        <p>IRIRef ← rr:constant iri | ValueRef
(rr:termType rr:IRI)?</p>
      </sec>
      <sec id="sec-18-2">
        <title>LiteralRef</title>
        <p>LiteralRef ← rr:constant literal | ValueRef
(rr:termType rr:Literal)?
(rr:language literal | rr:datatype iri)?</p>
      </sec>
      <sec id="sec-18-3">
        <title>BlankNodeRef</title>
        <p>BlankNodeRef ← ValueRef
(rr:termType rr:BlankNode)?</p>
      </sec>
      <sec id="sec-18-4">
        <title>ValueRef and DefinedColumn</title>
        <p>ValueRef ← rr:column literal | rr:template literal |
dr:reference literal
(dr:transformationReference ⟨Transformation⟩)?
(dr:definedColumns ( ⟨DefinedColumn⟩+ ))?
DefinedColumn ← a dr:DefinedColumn
dr:name literal
dr:function iri
(dr:parameterBinding ⟨ParameterBinding⟩)+</p>
      </sec>
    </sec>
    <sec id="sec-19">
      <title>6 USE CASE</title>
      <p>In this section, we present a realistic use case for D2RML, involving
real data and readily available web services and data repositories.
The aim is to extract an extensive set of textual or URI features for a
set of cultural items, in order to subsequently use them to perform
several tasks such as clustering and similarity ranking. We assume
that we want to extract features in several ways (e.g. directly from
the metadata, by applying named entity extraction, image
analysis, etc.), and that we want to keep information about the source of
each feature, so that we can use them selectively to test how they
affect the clustering or similarity algorithm performance.</p>
      <p>As the primary information source of cultural items we use
Europeana Collections (https://www.europeana.eu/portal/en), in
particular the collection provided by TopFoto (http://www.topfoto.co.uk/),
which consists of 60,882 black and white images of the
1930s, along with their metadata. This collection can be obtained
through the Europeana API. The D2RML specification for getting
the effective data source for this collection is the following:
&lt;#EuropeanaAPI&gt;
a is:HTTPSource ;
is:request [
http:absoluteURI "http://www.europeana.eu/api/v2/search.json?
wskey=A*******W&amp;rows=20&amp;cursor={@@cursor@@}&amp;profile=rich&amp;
query=europeana_collectionName%3A%222024904_Ag_EU_EuropeanaPhotography_TopFoto_1013%22" ;
http:methodName "GET" ;
] ;
is:parameters ( [ a is:SimpleKeyRequestIterator ;
is:name "cursor" ;
is:initialValue "*" ;
dr:reference "$.nextCursor" ;
dr:referenceFormulation is:JSONPath ; ] ) .</p>
      <p>The specification includes a is:SimpleKeyRequestIterator as
parameter, because the API returns the results in pages, and each
page contains a key for accessing the next page (nextCursor). An
extract from the response obtained by executing the above request is the
following JSON document, which contains a list of items modeled
using the Europeana Data Model (EDM):
{
"nextCursor": "AoE/GC8yMDI0OTA0L3Bob3Rv****=",
"items": [
{
"id": "/2024904/photography_ProvidedCHO_TopFoto_co_uk_EU061905",
"dcDescription": [
"Former chief inspector Berrett decorated by the king.\nFormer
chief detective inspector James Berrett of Scotland Yard was
decorated by the King at the royal invesititure at Buckingham Palace. "
],
"edmIsShownBy": [
"http://www.topfoto.co.uk/imageflows/imagepreview/f=EU061905"
],
"edmConcept": [
"http://bib.arts.kuleuven.be/photoVocabulary/12003",
"http://data.europeana.eu/concept/base/1711"
]
}
]
}</p>
      <p>Since we want to extract several features, we can invoke
services to analyze the metadata. One option is to use DBpedia
Spotlight to extract named entities from the textual descriptions. To do
this, we need a transformation that takes the description of each
item (dcDescription) and invokes DBpedia Spotlight on it. We
first define the relevant information source:
&lt;#DBpediaSpotlightAPI&gt;
a is:HTTPSource ;
is:request [
http:absoluteURI "http://model.dbpedia-spotlight.org/en/
annotate?text={@@text@@}&amp;confidence=0.5&amp;support=0&amp;
spotter=Default&amp;disambiguator=Default&amp;policy=whitelist&amp;
types=&amp;sparql=" ;
http:methodName "GET" ;
http:headers ( [ http:fieldName "Accept" ;
http:fieldValue "application/xml" ; ] ) ;
] ;
is:parameters ( [ a is:DataVariable ;
is:name "text" ; ] ) .</p>
      <p>The respective effective data source has the following XML format:
&lt;Annotation text="Former chief inspector Berrett decorated by the king.
\nFormer chief detective inspector James Berrett of Scotland Yard
was decorated by the King at the royal invesititure at Buckingham
Palace." confidence="0.5" support="0"
types="" sparql="" policy="whitelist"&gt;
&lt;Resources&gt;
&lt;Resource URI="http://dbpedia.org/resource/Inspector"
support="972" types="" surfaceForm="detective inspector"
offset="69" similarityScore="1.0"
percentageOfSecondRank="0.0"/&gt;
...</p>
      <p>&lt;/Resources&gt;
&lt;/Annotation&gt;
which includes all detected named entities (Resource) as DBpedia
resources (URI). We next define the transformation
&lt;#SpotlightTransformation&gt;
dr:logicalSource [ dr:source &lt;#DBpediaSpotlightAPI&gt; ;
dr:iterator "/Annotation/Resources/Resource" ;
dr:referenceFormulation is:XPath ; ] ;
dr:parameterBinding [ dr:parameter "text" ;</p>
      <p>dr:reference "$.dcDescription" ; ] .
and add the transformation and a new predicate object map to the
&lt;#EuropeanaMapping&gt; triples map:
&lt;#EuropeanaMapping&gt;
...
dr:transformations ( &lt;#SpotlightTransformation&gt; ) ;
rr:predicateObjectMap [
rr:predicate &lt;http://islab.ntua.gr/ml/DBpediaResource&gt; ;
rr:objectMap [
dr:reference "/Resource/@URI" ;
dr:transformationReference &lt;#SpotlightTransformation&gt; ;
rr:termType rr:IRI ;
] ; ] .
When executed, it generates the following additional triples:
&lt;http://islab.ntua.gr/resources/tp/EU061905&gt;
&lt;http://islab.ntua.gr/ml/DBpediaResource&gt;</p>
      <p>&lt;http://dbpedia.org/resource/Inspector&gt; .
&lt;http://islab.ntua.gr/resources/tp/EU061905&gt;
&lt;http://islab.ntua.gr/ml/DBpediaResource&gt;</p>
      <p>&lt;http://dbpedia.org/resource/James_Berrett&gt; .</p>
      <p>We further extend the set of features by using the DBpedia
ontology to get the types of the retrieved DBpedia resources. For this
we need a second transformation, dependent on the first one, that
consults a DBpedia endpoint. The information source definition is
&lt;#DBpediaSPARQLService&gt;
a is:SPARQLService ;
is:uri "http://dbpedia.org/sparql" .
and the transformation
&lt;#DBpediaTransformation&gt;
dr:logicalSource [
dr:source &lt;#DBpediaSPARQLService&gt; ;
dr:query "SELECT ?dbpediatype WHERE</p>
      <p>{ &lt;{@@resource@@}&gt; a ?dbpediatype }" ;
is:parameters ( [ a is:DataVariable;</p>
      <p>is:name "resource" ; ] ) ;
] ;
dr:parameterBinding [
dr:parameter "resource" ;
dr:reference "/Resource/@URI" ;
dr:transformationReference &lt;#SpotlightTransformation&gt; ; ] .</p>
      <p>Finally, we modify &lt;#EuropeanaMapping&gt; to add the new
transformation and a new predicate object map:
&lt;#EuropeanaMapping&gt;
...
dr:transformations ( &lt;#SpotlightTransformation&gt;</p>
      <p>&lt;#DBpediaTransformation&gt; ) ;
rr:predicateObjectMap [
rr:predicate &lt;http://islab.ntua.gr/ml/DBpediaConcept&gt; ;
rr:objectMap [
rr:column "dbpediatype" ;
dr:transformationReference &lt;#DBpediaTransformation&gt; ;
rr:termType rr:IRI ;
dr:condition [
op:matches "http://dbpedia\\.org/ontology/.*" ;
] ; ] ; ] .
The condition has been included because the query returns not only DBpedia
ontology concepts as types, but also FOAF, YAGO, Schema, Wikidata,
and other resources, which we do not want to include in our
results. Eventually, this map generates the following RDF triples:
&lt;http://islab.ntua.gr/resources/tp/EU061905&gt;
&lt;http://islab.ntua.gr/ml/DBpediaConcept&gt;</p>
      <p>&lt;http://dbpedia.org/ontology/Athlete&gt; .
&lt;http://islab.ntua.gr/resources/tp/EU061905&gt;
&lt;http://islab.ntua.gr/ml/DBpediaConcept&gt;</p>
      <p>&lt;http://dbpedia.org/ontology/Person&gt; .
&lt;http://islab.ntua.gr/resources/tp/EU061905&gt;
&lt;http://islab.ntua.gr/ml/DBpediaConcept&gt;</p>
      <p>&lt;http://dbpedia.org/ontology/Agent&gt; .</p>
      <p>Finally, we can use computer vision technologies to analyze the
image of each item (the URI is provided by the edmIsShownBy field
in the document returned by the Europeana API) to detect objects
that appear in it. To this end we use Microsoft’s Computer Vision
API, which is offered as a RESTful web service. Thus, we add a new
information source, including the required request parameters,
which produces a JSON-formatted effective data source, and define
the transformation
&lt;#ImageTransformation&gt;
dr:logicalSource [ dr:source &lt;#ComputerVisionAPI&gt; ;
dr:iterator "$.categories" ;
dr:referenceFormulation is:JSONPath ; ] ;
dr:parameterBinding [ dr:parameter "imageURL" ;</p>
      <p>dr:reference "$.edmIsShownBy" ; ] .
to generate a logical array from categories, which contains the names
of the detected objects, and modify &lt;#EuropeanaMapping&gt; by adding
the new transformation and a new predicate object map:
&lt;#EuropeanaMapping&gt;
dr:transformations ( &lt;#SpotlightTransformation&gt;</p>
      <p>&lt;#DBpediaTransformation&gt; &lt;#ImageTransformation&gt; ) ;
rr:predicateObjectMap [
rr:predicate &lt;http://islab.ntua.gr/ml/ComputerVisionTerm&gt; ;
rr:objectMap [
dr:reference "$.name" ;
dr:transformationReference &lt;#ImageTransformation&gt; ;
rr:termType rr:Literal ;
dr:condition [
dr:reference "$.score" ;
dr:transformationReference &lt;#ImageTransformation&gt; ;
op:geq "0.4"^^xsd:decimal ;
] ; ] ; ] .
The above object map applies a filter in order to keep only objects
that have been detected with relatively high confidence (score).
Eventually, the above map adds the following RDF triple:
&lt;http://islab.ntua.gr/resources/tp/EU061905&gt;
&lt;http://islab.ntua.gr/ml/ComputerVisionTerm&gt;</p>
      <p>"people_group" .</p>
      <p>The RDF triples generated by all the above predicate-object maps
make up the desired RDF graph. In terms of performance, when
executing the above D2RML document, our D2RML processor
implementation22 took about 7 minutes per 100 Europeana items.</p>
    </sec>
    <sec id="sec-19a">
      <title>7 CONCLUSIONS</title>
      <p>We presented D2RML, a Data-to-RDF mapping language which,
based on an abstract data model, allows the orchestrated retrieval
of data from several information sources, their transformation and
extension using relevant web services, their filtering and
manipulation using simple operations, and finally their mapping to RDF
graphs. It combines the mapping approach of R2RML and RML
with workflow approaches, by allowing the definition of easy to
write and understand, homogeneous views of the underlying data
and services in a lightweight document. We developed D2RML on
top of a formal abstract data model, so as to formally define its
semantics and allow future extensions. We also presented a realistic
use case, which demonstrates the capabilities of the proposed
language in real settings, by delivering unified and coordinated
access to Linked Data stores and other services in a clean
specification without the need of code writing or heavyweight solutions.</p>
    </sec>
    <sec id="sec-20">
      <title>ACKNOWLEDGEMENTS</title>
      <p>We acknowledge support of this work by the project ‘APOLLONIS’
(MIS 5002738) which is implemented under the Action
‘Reinforcement of the Research and Innovation Infrastructure’, funded by the
Operational Programme ‘Competitiveness, Entrepreneurship and
Innovation’ (NSRF 2014-2020) and co-financed by Greece and the
European Union (European Regional Development Fund).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          , Alexandre Bertails, Eric Prud'hommeaux, and Juan Sequeda
          .
          <year>2012</year>
          .
          <article-title>A Direct Mapping of Relational Data to RDF</article-title>
          . (
          <year>2012</year>
          ). https://www.w3. org/TR/rdb-direct-mapping/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Bischof</surname>
          </string-name>
          , Stefan Decker, Thomas Krennwallner, Nuno Lopes, and
          <string-name>
            <given-names>Axel</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Mapping between RDF and XML with XSPARQL</article-title>
          .
          <source>J. Data Semantics</source>
          <volume>1</volume>
          ,
          <issue>3</issue>
          (
          <year>2012</year>
          ),
          <fpage>147</fpage>
          -
          <lpage>185</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Barry</given-names>
            <surname>Bishop</surname>
          </string-name>
          , Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Velkov</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>OWLIM: A family of scalable semantic repositories</article-title>
          .
          <source>Semantic Web</source>
          <volume>2</volume>
          ,
          <issue>1</issue>
          (
          <year>2011</year>
          ),
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Bourhis</surname>
          </string-name>
          , Juan L. Reutter, Fernando Suárez, and
          <string-name>
            <given-names>Domagoj</given-names>
            <surname>Vrgoc</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>JSON: Data model, Query languages and Schema specification</article-title>
          .
          <source>In PODS. ACM</source>
          ,
          <volume>123</volume>
          -
          <fpage>135</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>James</given-names>
            <surname>Clark</surname>
          </string-name>
          and Steve DeRose
          .
          <year>2016</year>
          .
          <article-title>XML Path Language (XPath) Version 1.0</article-title>
          . (
          <year>2016</year>
          ). https://www.w3.org/TR/xpath/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Connolly</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</article-title>
          . (
          <year>2007</year>
          ). https://www.w3.org/TR/grddl/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Richard</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , Chris Bizer, Jörg Garbers, Oliver Maresch, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Becker</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>The D2RQ Mapping Language</article-title>
          . (
          <year>2012</year>
          ). http://d2rq.org/ d2rq-language
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Souripriya</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Seema</given-names>
            <surname>Sundara</surname>
          </string-name>
          , and Richard Cyganiak.
          <year>2012</year>
          .
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          . (
          <year>2012</year>
          ). https://www.w3.org/TR/r2rml/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Anastasia</given-names>
            <surname>Dimou</surname>
          </string-name>
          , Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle.
          <year>2014</year>
          .
          <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>
          .
          <source>In LDOW (CEUR Workshop Proceedings)</source>
          , Vol.
          <volume>1184</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Lee</given-names>
            <surname>Feigenbaum</surname>
          </string-name>
          , Gregory Todd Williams, Kendall Grant Clark, and
          <string-name>
            <given-names>Elias</given-names>
            <surname>Torres</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>SPARQL 1.1 Protocol</article-title>
          . (
          <year>2013</year>
          ). https://www.w3.org/TR/sparql11-protocol/
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Roy T.</given-names>
            <surname>Fielding</surname>
          </string-name>
          and
          <string-name>
            <given-names>Richard N.</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Principled design of the modern Web architecture</article-title>
          .
          <source>In ICSE. ACM</source>
          ,
          <fpage>407</fpage>
          -
          <lpage>416</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Gössner</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stephen</given-names>
            <surname>Frank</surname>
          </string-name>
          .
          <year>2007</year>
          . JSONPath. (
          <year>2007</year>
          ). http://goessner.net/articles/JsonPath/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Oktie</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          , Soheil Hassas Yeganeh, and
          <string-name>
            <given-names>Renée J.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Linking Semistructured Data on the Web</article-title>
          . In WebDB.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Hert</surname>
          </string-name>
          , Gerald Reif, and
          <string-name>
            <given-names>Harald C.</given-names>
            <surname>Gall</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>A comparison of RDB-to-RDF mapping languages</article-title>
          .
          <source>In I-SEMANTICS (ACM International Conference Proceeding Series)</source>
          . ACM,
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          Internet Engineering Task Force (IETF).
          <year>2014</year>
          .
          <article-title>The JavaScript Object Notation (JSON) Data Interchange Format</article-title>
          . (
          <year>2014</year>
          ). https://tools.ietf.org/html/rfc7159
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carlos A.</given-names>
            <surname>Velasco</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Philip</given-names>
            <surname>Ackermann</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>HTTP Vocabulary in RDF 1.0</article-title>
          . (
          <year>2017</year>
          ). https://www.w3.org/TR/HTTP-in-RDF10/
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Carlos A.</given-names>
            <surname>Velasco</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Philip</given-names>
            <surname>Ackermann</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Representing Content in RDF 1.0</article-title>
          . (
          <year>2017</year>
          ). https://www.w3.org/TR/Content-in-RDF10/
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Langegger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wolfram</given-names>
            <surname>Wöß</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>XLWrap - Querying and Integrating Arbitrary Spreadsheets with SPARQL</article-title>
          .
          <source>In International Semantic Web Conference (Lecture Notes in Computer Science)</source>
          , Vol.
          <volume>5823</volume>
          . Springer,
          <fpage>359</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Franck</given-names>
            <surname>Michel</surname>
          </string-name>
          , Loïc Djimenou, Catherine Faron Zucker, and
          <string-name>
            <given-names>Johan</given-names>
            <surname>Montagnat</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>xR2RML: Non-Relational Databases to RDF Mapping Language</article-title>
          . (
          <year>2014</year>
          ). https://hal.inria.fr/hal-01066663v1/document
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Boris</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bijan</given-names>
            <surname>Parsia</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax (Second Edition)</article-title>
          .
          (
          <year>2012</year>
          ). https://www.w3.org/TR/owl2-syntax/
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Yavor</given-names>
            <surname>Nenov</surname>
          </string-name>
          , Robert Piro, Boris Motik, Ian Horrocks,
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Wu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jay</given-names>
            <surname>Banerjee</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>RDFox: A Highly-Scalable RDF Store</article-title>
          .
          <source>In International Semantic Web Conference (2) (Lecture Notes in Computer Science)</source>
          , Vol.
          <volume>9367</volume>
          . Springer,
          <fpage>3</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Martin J.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Halaschek-Wiener</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark A.</given-names>
            <surname>Musen</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>M2: A Language for Mapping Spreadsheets to OWL</article-title>
          .
          <source>In OWLED (CEUR Workshop Proceedings)</source>
          , Vol.
          <volume>614</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Jason</given-names>
            <surname>Slepicka</surname>
          </string-name>
          , Chengye Yin,
          <string-name>
            <given-names>Pedro A.</given-names>
            <surname>Szekely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Craig A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>KR2RML: An Alternative Interpretation of R2RML for Heterogenous Sources</article-title>
          .
          <source>In COLD (CEUR Workshop Proceedings)</source>
          , Vol.
          <volume>1426</volume>
          . CEUR-WS.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>