<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Semantics of R2RML and its Relationship with the Direct Mapping</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan F. Sequeda</string-name>
          <email>jsequeda@cs.utexas.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Texas at Austin</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The W3C Relational Database to RDF (RDB2RDF) standards are positioned to bridge the gap between Relational Databases and the Semantic Web. The standards consist of two interrelated and complementary specifications: Direct Mapping of Relational Data to RDF and R2RML: RDB to RDF Mapping Language. In this paper we present initial results on the formal study of the R2RML mapping language by defining its semantics using Datalog. We prove that there are a total of 57 distinct Datalog rules which can be used to generate RDF triples from a relational table. Additionally, we provide insights on the relationship between R2RML and Direct Mapping.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        To live up to the promise of web-scale data integration, the Semantic Web will have to
include the content of existing relational databases. In September 2012, two interrelated
and complementary W3C standards, Direct Mapping [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and R2RML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], were
standardized. R2RML is a mapping language which allows users to manually define
mappings. Direct Mapping is the default and automatic way to translate relational databases
into RDF without any input from a user; the resulting mapping can itself be represented in R2RML. The
Direct Mapping has been well studied. The W3C specifications present the denotational
and Datalog-based semantics of the Direct Mapping [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Additionally, the W3C Direct
Mapping has been augmented to include a direct mapping from the relational schema
to OWL in order to prove fundamental correctness properties [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, to the best
of our knowledge, there has not been a thorough study of the R2RML language and
its relationship with the Direct Mapping. As a matter of fact, the semantics of R2RML
have not even been formally defined.
      </p>
      <p>Our methodology is to study the problem of mapping relational databases to RDF
from two different perspectives: logical and physical. Inspired by the relational query
processing workflow, we propose the following workflow. Mappings are expressed in
a declarative language: R2RML. A parser can translate the mapping into a logical
expression: the logical mapping. A logical mapping can be rewritten into other equivalent
logical mappings which may be better for performance. A logical mapping is
translated into an executable program, the physical mapping. A physical mapping can then
be implemented and optimized in different ways. This paper addresses the first part of
the proposed workflow and presents initial results on the formal study of R2RML by
defining its semantics using Datalog. We prove that there are a total of 57 distinct Datalog
rules which can be used to generate RDF triples from a relational table. Additionally,
we provide insight on the relationship between R2RML and Direct Mapping. It is our
hypothesis that a core subset of R2RML has the same expressive power as the Direct
Mapping, if views are allowed as input.</p>
    </sec>
    <sec id="sec-1b">
      <title>2 R2RML</title>
      <p>R2RML is a language for expressing mappings from a relational database to RDF.
The input of an R2RML mapping M is a relational schema R and an instance I
of R. The output is an RDF graph. Consider a database schema consisting of the tables
EMP(EMPNO, ENAME, DID) and DEPT(DEPTNO, DNAME, LOC). Moreover, we
have the following constraints on the schema: EMPNO is the
primary key of EMP, DEPTNO is the primary key of DEPT and DID is a foreign key in
EMP that references attribute DEPTNO in DEPT. An R2RML mapping is itself represented as
an RDF graph and has an associated RDFS schema<sup>1</sup>. For readability,
RDF Turtle is the recommended syntax for writing R2RML mappings.</p>
      <p>Example 1. An R2RML Mapping for the example database.</p>
      <preformat>@prefix rr: &lt;http://www.w3.org/ns/r2rml#&gt;.
@prefix ex: &lt;http://example.com/ns#&gt;.

&lt;#TriplesMap1&gt;
    rr:logicalTable [ rr:tableName "emp" ];
    rr:subjectMap [
        rr:template "http://ex.com/employee/{empno}";
        rr:class ex:Employee;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:name;
        rr:objectMap [ rr:column "ename" ];
    ];
    rr:predicateObjectMap [
        rr:predicate ex:department;
        rr:objectMap [
            rr:parentTriplesMap &lt;#TriplesMap2&gt;;
            rr:joinCondition [
                rr:child "did";
                rr:parent "deptno";
            ];
        ];
    ].

&lt;#TriplesMap2&gt;
    rr:logicalTable [ rr:tableName "dept" ];
    rr:subjectMap [
        rr:template "http://ex.com/dept/{deptno}";
        rr:class ex:Department;
    ];
    rr:predicateObjectMap [
        rr:predicate ex:name;
        rr:objectMap [ rr:column "DNAME" ];
    ].</preformat>
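      <p>To make the intended output of Example 1 concrete, the following minimal Python sketch generates the triples that &lt;#TriplesMap1&gt; describes. It is an illustration under stated assumptions, not the formal semantics: the sample rows, the function name employee_triples and the representation of RDF terms as plain strings are all hypothetical, and we assume that the join condition compares EMP.DID with DEPT.DEPTNO, as the foreign key of the example schema suggests.</p>
      <preformat><![CDATA[
```python
# Hypothetical sample rows; the paper does not give a database instance.
EMP = [{"empno": 1, "ename": "Alice", "did": 10}]
DEPT = [{"deptno": 10, "dname": "Sales", "loc": "Austin"}]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/ns#"


def employee_triples(emp_rows, dept_rows):
    """Return the triples that <#TriplesMap1> would produce for the given rows."""
    triples = []
    for emp in emp_rows:
        # rr:subjectMap with rr:template and rr:class
        subject = "http://ex.com/employee/{empno}".format(**emp)
        triples.append((subject, RDF_TYPE, EX + "Employee"))
        # rr:predicateObjectMap with a constant predicate and a column object
        triples.append((subject, EX + "name", emp["ename"]))
        # Referencing object map: the object is the subject of <#TriplesMap2>
        # for every DEPT row satisfying the assumed join EMP.DID = DEPT.DEPTNO.
        for dept in dept_rows:
            if emp["did"] == dept["deptno"]:
                obj = "http://ex.com/dept/{deptno}".format(**dept)
                triples.append((subject, EX + "department", obj))
    return triples


triples = employee_triples(EMP, DEPT)
for triple in triples:
    print(triple)
```
]]></preformat>
      <p>The sketch does not distinguish IRIs from literals; a faithful implementation would have to track the term type of every generated RDF term.</p>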
      <p>The Datalog rules defining the R2RML semantics can be found at
http://www.cs.utexas.edu/~jsequeda/r2rml. We briefly explain the components of R2RML
using the running example. An R2RML mapping consists of a set of Triples Maps. In the
example, there are two Triples Maps: &lt;#TriplesMap1&gt; and &lt;#TriplesMap2&gt;.
Each Triples Map consists of exactly one LogicalTable, exactly one SubjectMap and a
(possibly empty) set of Predicate-Object Maps. A LogicalTable is either an
existing table or view in the database or a SQL query (also known as an R2RML view).
In &lt;#TriplesMap1&gt;, the LogicalTable is the table named "emp". A SubjectMap
generates an RDF term for the subject and optionally an rdf:type statement. A
PredicateObjectMap is a pair of a PredicateMap and an ObjectMap, which generate the RDF terms
for the predicate and object, respectively, of a triple whose subject is
generated by the SubjectMap. An ObjectMap can also be a Referencing Object Map,
which allows the subjects of another Triples Map to be used as objects. Since the two Triples
Maps may be based on different logical tables, this may require a join between the logical
tables.</p>
      <p><sup>1</sup> http://www.w3.org/ns/r2rml</p>
      <p>SubjectMap, PredicateMap and ObjectMap are all Term Maps. A Term Map is a function
that generates an RDF term from the database. There are three ways of creating an
RDF term, hence three types of Term Maps: 1) Constant-valued Term Map, 2)
Column-valued Term Map, and 3) Template-valued Term Map. In &lt;#TriplesMap1&gt;, the
SubjectMap is a Template-valued Term Map because it generates an IRI from a template and
the value of the "empno" column. The PredicateMap of the first PredicateObjectMap is a Constant-valued Term Map
because it generates a constant IRI: ex:name. The corresponding ObjectMap is a Column-valued
Term Map because it generates a literal from the value of the "ename" column.</p>
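      <p>The view of a Term Map as a function from a row to an RDF term can be sketched in a few lines of Python. The three constructor names below are hypothetical, and RDF terms are simplified to strings. We also rely on the coincidence that R2RML's {column} template placeholders match Python's str.format syntax, and we omit the IRI-safe encoding of column values that the R2RML specification requires for templates that produce IRIs.</p>
      <preformat><![CDATA[
```python
def constant_term_map(term):
    """Constant-valued Term Map: ignores the row and returns a fixed term."""
    return lambda row: term


def column_term_map(column):
    """Column-valued Term Map: returns the value of one column of the row."""
    return lambda row: row[column]


def template_term_map(template):
    """Template-valued Term Map: fills {column} placeholders from the row."""
    return lambda row: template.format(**row)


# The three Term Maps used for the ex:name triple of <#TriplesMap1>.
subject_map = template_term_map("http://ex.com/employee/{empno}")
predicate_map = constant_term_map("http://example.com/ns#name")
object_map = column_term_map("ename")

row = {"empno": 7, "ename": "Bob", "did": 10}  # hypothetical EMP row
triple = (subject_map(row), predicate_map(row), object_map(row))
print(triple)
```
]]></preformat>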
      <p>There are three ways of generating RDF triples. Table Triples are triples describing
an instance of a given class; they are generated if a SubjectMap has an rr:class associated with it. In this
case, the subject is generated by one of the three possible Term Maps, the predicate is rdf:type
and the object is the given class. There are therefore three possible rules to generate Table Triples.
Local Triples are triples that are generated exclusively from a single logical table. Given
that there are three ways of generating an RDF term for each of the subject, predicate
and object, there are 3 × 3 × 3 = 27 different possible cases for generating triples, hence 27 different
rules. Reference Triples are triples that are generated for Referencing Object Maps
through a join condition to another table. Similar to Local Triples, there are also 27
distinct rules to generate these triples. Figure 1 depicts all 27 possible combinations.
Given that there are 3 rules to generate Table Triples, 27 rules to generate Local
Triples and 27 rules to generate Reference Triples, the following theorem holds.</p>
      <p>Theorem 1. The total number of distinct Datalog rules which can be used to generate
RDF triples from a table name is 57.</p>
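      <p>The counting argument behind Theorem 1 can be checked mechanically. The following Python sketch enumerates the rule shapes by the type of Term Map used in each position. The labels are hypothetical, and for Reference Triples we assume that the third position records the Term Map type of the parent Triples Map's SubjectMap.</p>
      <preformat><![CDATA[
```python
from itertools import product

TERM_MAP_KINDS = ("constant", "column", "template")

# Table Triples: only the subject varies; predicate and object are fixed.
table_rules = [(subject, "rdf:type", "class") for subject in TERM_MAP_KINDS]

# Local Triples: subject, predicate and object each use one of three kinds.
local_rules = list(product(TERM_MAP_KINDS, repeat=3))

# Reference Triples: same shape, with the object taken from the parent
# Triples Map's SubjectMap (an assumption of this sketch).
reference_rules = list(product(TERM_MAP_KINDS, repeat=3))

total = len(table_rules) + len(local_rules) + len(reference_rules)
print(len(table_rules), len(local_rules), len(reference_rules), total)
```
]]></preformat>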
    </sec>
    <sec id="sec-2">
      <title>3 Relationship with Direct Mapping</title>
      <p>We now focus on a simple yet important subset of R2RML, which we call R2RML<sub>core</sub>.
It consists of only three rules (out of the 57), one rule each for generating Table,
Local and Reference Triples. In these rules, subjects are template-valued term maps,
predicates are constant-valued term maps and objects are column-valued term maps.
The following is an example of a Datalog rule to generate Local Triples:</p>
      <preformat>TRIPLE(S, P, O) ←
    SUBJECTMAP(TM, SID), PREDICATEOBJECTMAP(TM, POID),
    SUBJECTTEMPLATEVALUETERMMAP(SID, T, S),
    PREDICATECONSTANTVALUETERMMAP(POID, P),
    OBJECTCOLUMNVALUETERMMAP(POID, T, O)</preformat>
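      <p>One way to read the rule above is as a join over five relations. The Python sketch below evaluates it naively over a handful of hypothetical facts. The fact values are made up, and we assume that the variable T identifies a row of the logical table, so that the subject and the object of a triple come from the same row.</p>
      <preformat><![CDATA[
```python
NAME = "http://example.com/ns#name"

# Hypothetical facts for one Triples Map ("tm1") over two rows ("t1", "t2").
SUBJECTMAP = {("tm1", "s1")}
PREDICATEOBJECTMAP = {("tm1", "po1")}
SUBJECTTEMPLATEVALUETERMMAP = {
    ("s1", "t1", "http://ex.com/employee/1"),
    ("s1", "t2", "http://ex.com/employee/2"),
}
PREDICATECONSTANTVALUETERMMAP = {("po1", NAME)}
OBJECTCOLUMNVALUETERMMAP = {("po1", "t1", "Alice"), ("po1", "t2", "Bob")}


def local_triples():
    """Evaluate the Local Triples rule: join the body atoms on shared variables."""
    return {
        (s, p, o)
        for (tm, sid) in SUBJECTMAP
        for (tm2, poid) in PREDICATEOBJECTMAP if tm2 == tm
        for (sid2, t, s) in SUBJECTTEMPLATEVALUETERMMAP if sid2 == sid
        for (poid2, p) in PREDICATECONSTANTVALUETERMMAP if poid2 == poid
        for (poid3, t2, o) in OBJECTCOLUMNVALUETERMMAP
        if poid3 == poid and t2 == t
    }


for triple in sorted(local_triples()):
    print(triple)
```
]]></preformat>
      <p>The nested loops are chosen for readability only; a Datalog engine would pick its own join order.</p>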
      <p>
        The motivation of R2RML<sub>core</sub> is two-fold. First, the W3C RDB2RDF Test Cases
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] consist mainly of subject template-valued term maps and predicate constant-valued
term maps. Second, the anecdotal experience of the author, who has implemented
Ultrawrap [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], a system that supports R2RML, shows that subjects are usually
template-valued term maps and predicates are constant-valued term maps.
      </p>
      <p>Our hypothesis is that R2RML<sub>core</sub> is as expressive as the Direct Mapping if views
are allowed as input, a setting we denote DM<sub>views</sub>. We would first need to show how any mapping in R2RML<sub>core</sub>
can be expressed in DM<sub>views</sub>. This means that views and constraints would need to
be created from the R2RML mapping. Additionally, there would need to be a
mechanism for defining templates for IRIs. Subsequently, we would need to show how any
DM<sub>views</sub> mapping can be expressed in R2RML<sub>core</sub>.</p>
    </sec>
    <sec id="sec-3">
      <title>4 Conclusions</title>
      <p>To the best of our knowledge, this is the first work presenting initial results on the formal
study of the R2RML mapping language by defining its semantics using Datalog and
studying the relationship between R2RML and Direct Mapping. This is ongoing work
in which we are now considering additional features of R2RML such as SQL queries,
languages and datatypes. We believe it is important to understand R2RML in order to
know how to better support users and build tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bertails</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          .
          <article-title>Direct mapping of relational data to RDF</article-title>
          .
          <source>W3C Recommendation 27 September</source>
          <year>2012</year>
          , http://www.w3.org/TR/rdb-direct-mapping/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          .
          <article-title>R2RML: RDB to RDF mapping language</article-title>
          .
          <source>W3C Recommendation 27 September</source>
          <year>2012</year>
          , http://www.w3.org/TR/r2rml/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miranker</surname>
          </string-name>
          .
          <article-title>On directly mapping relational databases to RDF and OWL</article-title>
          .
          In <source>WWW</source>
          , pages
          <fpage>649</fpage>
          -
          <lpage>658</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miranker</surname>
          </string-name>
          .
          <article-title>Ultrawrap: SPARQL execution on relational data</article-title>
          . To appear in
          <source>Journal of Web Semantics</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>B.</given-names>
            <surname>Villazon-Terrazas</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          .
          <article-title>R2RML and direct mapping test cases</article-title>
          .
          <source>W3C Working Group Note 14 August</source>
          <year>2012</year>
          , http://www.w3.org/TR/rdb2rdf-test-cases/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>