=Paper= {{Paper |id=Vol-1409/paper-09 |storemode=property |title=Simplified RDB2RDF Mapping |pdfUrl=https://ceur-ws.org/Vol-1409/paper-09.pdf |volume=Vol-1409 |dblpUrl=https://dblp.org/rec/conf/www/StadlerUWSL15 }} ==Simplified RDB2RDF Mapping== https://ceur-ws.org/Vol-1409/paper-09.pdf
                                 Simplified RDB2RDF Mapping

Claus Stadler∗, Jörg Unbehauen, Patrick Westphal, Mohamed Sherif, Jens Lehmann
Department of Computer Science, University of Leipzig, Germany
cstadler@informatik.uni-leipzig.de, unbehauen@informatik.uni-leipzig.de,
pwestphal@informatik.uni-leipzig.de, sherif@informatik.uni-leipzig.de,
lehmann@informatik.uni-leipzig.de

ABSTRACT
The combination of the advantages of widely used relational databases and semantic technologies has attracted significant research over the past decade. In particular, mapping languages for the conversion of databases to RDF knowledge bases have been developed and standardized in the form of R2RML. In this article, we first review those mapping languages and then work towards a unified formal model for them. Based on this, we present the Sparqlification Mapping Language (SML), which provides an intuitive way to declare mappings based on SQL views and SPARQL CONSTRUCT queries. We show that SML has the same expressivity as R2RML by enumerating the language features and showing the correspondences, and we outline how one syntax can be converted into the other. A user study conducted for this paper, juxtaposing SML and R2RML, provides evidence that SML is a more compact syntax that is easier to read and understand, and thus lowers the barrier to offering SPARQL access to relational databases.

1.     INTRODUCTION
   Despite the success of semantic technologies, a large share of structured knowledge still resides in relational databases. For this reason, the Semantic Web community has invested significant research effort into making relational databases available as RDF.
   Due to the strong interest in this area, several approaches and languages for mapping relational data to triples have been devised, in particular the W3C standard R2RML¹.
   While having a standard is itself of high importance, we argue that R2RML has some drawbacks on the syntactical level: Writing RDF views in R2RML is very verbose and arguably not as intuitive as it could be. The choice of RDF as the base syntax for R2RML has the advantage that people writing mappings can be expected to know RDF. However, there is a significant gap between the relational database structure and the structure of the R2RML mapping specifications. While graphical editors, such as [11][7], partially mitigate the problem of having to write those mapping definitions, they also have their limitations. In particular, they would have to support the full feature set of the mapping language while still being efficient to work with and producing human-readable output. Moreover, in some environments, Web-based editors in the spirit of phpMyAdmin² may pose security risks or be inconvenient, since many database and RDF experts are simply used to working with text files and textual representations of data and queries. While they appreciate unobtrusive help, like syntax checking or code completion, a graphical user interface might impose an unfitting workflow, for example when an administrator is used to performing small database-related tasks via a command line interface. In this work, we introduce the Sparqlification Mapping Language (SML) as a human-friendly alternative to R2RML. It is noteworthy that non-RDF syntaxes for which RDF-based versions exist are commonly used in the Semantic Web. For example, while OWL ontologies can be written directly in RDF, Manchester OWL Syntax³ is a popular and concise alternative used in the primer of the OWL 2 specification itself. As another example, while SPARQL queries can in principle be written in RDF using the SPIN SPARQL Syntax⁴, it is uncommon to do so unless special use cases demand this.
   SML is based on work towards a unified formal model for RDB2RDF mappings. While it has expressiveness equal to R2RML, it uses a different syntactical approach: It blends the traditional SQL CREATE VIEW statements with SPARQL CONSTRUCT queries. Both features can be expected to be familiar to persons working on RDB2RDF data integration and, combined, provide a more concise syntax than R2RML. In fact, we believe that for RDF itself, history has shown that the seemingly obvious choice of syntactically building on XML has had several drawbacks, while the special-purpose language Turtle meanwhile enjoys high popularity for manually crafting and editing RDF documents; Turtle 1.1 became a W3C Proposed Recommendation in 2014. Similarly, we believe a more intuitive special-purpose RDB2RDF mapping language can provide similar benefits.
   The research on the syntax of SML builds on a comparison of RDB2RDF mapping languages and a subsequently defined formal model of those languages. Languages like R2RML and SML are syntactic instances of this formal model. We use this model to highlight the equivalence between the languages and to derive approaches for converting between them. In particular, this implies that any processor that works with the W3C R2RML standard can also process SML input via conversion, so no further implementation effort is required to use SML in combination with a number of RDB2RDF engines. Our main argument is that SML, despite its simplicity, provides equal expressiveness and is therefore a viable alternative to R2RML. The contributions of the article are as follows:

   • Definition of the compact SML mapping language with expressiveness equal to R2RML.
   • Comparison of RDB2RDF mapping languages.
   • A unified formal model of RDB2RDF mapping languages.
   • Converters from R2RML to SML and vice versa.
   • Syntax highlighting definition for the editor vim and an online SML editor with syntax and error highlighting as a demonstrator. Although this component is an engineering effort, it contributed to the fairness of the user study by providing comparable tool support for both R2RML and SML.

   All tools, demos and the specification of the SML syntax are available at http://sml.aksw.org.
   The remainder of the article is structured as follows: In Section 2 we review existing RDB2RDF mapping languages. Subsequently, in Section 3 we present a corresponding formal model. The SML syntax is introduced in Section 4, whereas Section 5 compares it to R2RML. In Section 6 the conversion approach from SML to R2RML is described. In Section 7, we describe a user study via a public survey with 46 participants amounting to almost 16 hours of survey completion time. Finally, we conclude this paper in Section 8.

∗ Corresponding Author
¹ http://www.w3.org/TR/r2rml/
² http://www.phpmyadmin.net
³ http://www.w3.org/TR/owl2-manchester-syntax/
⁴ http://spinrdf.org/sp.html
WWW2015 Workshop: Linked Data on the Web (LDOW2015).
Copyright is held by the author/owner(s).

2.    RDB2RDF SYSTEMS AND MAPPING LANGUAGES REVIEW
   The mapping of relational databases (RDB) to the Resource Description Framework (RDF) has been of keen interest since the inception of the Semantic Web, as exemplified in [6]. The exposition of such previously constrained data allows integration and interlinking with other data on the Web. Based on this need, multiple tools and approaches emerged. In an approach for fostering interoperability between those tools, the standardization of the RDB2RDF Mapping Language (R2RML) was initiated by the W3C RDB2RDF working group⁵.
   R2RML is defined in [4] as a mapping language for describing customized mappings of relational data into RDF. The R2RML specification is accompanied by the Direct Mapping (DM) specification [2], describing a standard way of translating a relational database into RDF without the use of a customized mapping definition. An R2RML mapping definition is represented in RDF using the R2RML vocabulary and serialized in the Terse RDF Triple Language (Turtle). It can be used to store converted relational data in an RDF dump file, to expose the data as Linked Data, or to allow querying it via a SPARQL endpoint. A more general overview of mapping tools for structured sources is given in [14]. Recent efforts, such as [5], propose extensions of R2RML for non-relational sources by adding support for, e.g., XPath⁶ and JSONPath expressions in the mappings. In this work we focus on relational data.
   With the advent of R2RML, vendors took up the standard and either modified their existing tools to additionally support R2RML or created tools fully based on the standard. In general, these tools can be categorized along different dimensions, with the type of data exposition and the mode of querying the underlying database being the most distinctive. A list of existing R2RML tools is given in Table 1. These tools have in common that they all allow exposition as a SPARQL endpoint and all employ SPARQL-to-SQL translation.
   The R2RML tools use the mapping definition expressed in R2RML to connect the relational data with a domain ontology. The domain ontology describes the actual RDF data exposed and consists of standard vocabularies and custom-created terms depending on the use case. Listing 1 provides an example of an R2RML mapping of a simple employee table containing only IDs (EMPNO) and names (ENAME).

    <#TriplesMap1>
        rr:logicalTable [ rr:tableName "EMP" ];
        rr:subjectMap [
            rr:template "http://data.example.com/employee/{EMPNO}";
            rr:class ex:Employee;
        ];
        rr:predicateObjectMap [
            rr:predicate ex:name;
            rr:objectMap [ rr:column "ENAME" ];
        ].

                    Listing 1: Example of an R2RML mapping.

   D2RQ-ML⁷ is another declarative language for mapping RDB to RDF, supported by the D2R server. As D2R is one of the most popular RDB2RDF solutions, its mapping language is also supported by other tools like Ultrawrap. The D2RQ mapping itself is an RDF document as well, usually written in Turtle syntax. The mapping defines a virtual RDF graph that contains information from the database. This is similar to the concept of views in SQL, except that the virtual data structure is an RDF graph instead of a virtual relational table. The D2RQ Platform provides SPARQL access, a Linked Data server, an RDF dump generator, a simple HTML interface, and Jena API access to D2RQ-mapped databases. Listing 2 shows an example of a D2RQ mapping from a conferences table in a database to the conference class in an ontology.

    map:Database1 a d2rq:Database;
        d2rq:jdbcDSN "jdbc:mysql://localhost/iswc";
        d2rq:jdbcDriver "com.mysql.jdbc.Driver";
        d2rq:username "user";
        d2rq:password "password";
        .
    map:Conference a d2rq:ClassMap;
        d2rq:dataStorage map:Database1;
        d2rq:class :Conference;
        d2rq:uriPattern "http://conferences.org/comp/confno@@Conferences.ConfID@@";
        .
    map:eventTitle a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:Conference;
        d2rq:property :eventTitle;
        d2rq:column "Conferences.Name";
        d2rq:datatype xsd:string;
        .

                    Listing 2: D2RQ map for conferences.

   Generally speaking, D2RQ-ML is close to R2RML, with some notable distinctions: D2RQ-ML includes the database connection information in the mapping file and uses different constructs to express joins between tables.
   Another notable approach is used in the ontop [9] platform by the Knowledge Representation meets Databases (KRDB)⁸ research group. Ontop supports mapping definitions in its own language and in R2RML. Quest, the SPARQL engine/reasoner in ontop, implements query rewriting techniques that translate SPARQL into SQL. Listing 3 shows an example from the ontop documentation⁹.

    [MappingDeclaration] @collection [[
    mappingId   Book collection
    target      :BID_{id} a :Book .
    source      SELECT id FROM books
    ]]

                    Listing 3: Example of the Ontop mapping language.

   Virtuoso RDF Views [1] is another tool-specific mapping language. It is part of OpenLink's Virtuoso Universal Server¹⁰. Virtuoso RDF Views provide a declarative Meta Schema Language for mapping SQL data to RDF ontologies and preceded Virtuoso's R2RML support. The corresponding mappings are dynamic, such that changes to the underlying data are reflected immediately in the RDF views. OpenLink Virtuoso Universal Server includes SPARQL support and an RDF data store tightly integrated with its relational storage engine. An example of a Virtuoso RDF View definition is given in Listing 4.

    graph <http://localhost/testdata/products#>
    subject prd:product_iri (PRODUCT.PRODUCT_ID)
    predicate rdf:type
    object prd:Product

                    Listing 4: Virtuoso RDF views example.

   Besides the textual mapping languages, there are also tools providing a graphical representation of the mapping. The Asio Semantic Web bridge SBRD¹¹ and the more recent R2RML editor presented in [12] fall into this category. SBRD uses Snoggle¹² for mapping from RDB to RDF. Snoggle is a graphical ontology mapper based on the Semantic Web Rule Language (SWRL)¹³. It allows users to draw ontologies and then create mappings between them on a graphical canvas. This mapping is then translated into SWRL/RDF or SWRL/XML.
   An overview of the introduced RDB2RDF solutions is given in Table 1.

Tool                      Mapping language         SPARQL version   License      Support
Ontop [10]                Own language & R2RML     1.0              Free         Free
Revelytix Spyder          R2RML                    1.1              Free         With fees
Asio SBRD                 Graphical                n/a              Commercial   Commercial
Virtuoso RDF Views [1]    Own language & R2RML     1.1              Free         Free
D2RQ Platform [3]         D2RQ-ML                  1.1              Free         Free
Morph [8]                 R2RML                    1.0              Free         Free
Ultrawrap [13]            R2RML                    1.0              Commercial   Commercial
SparqlMap [15]            R2RML                    1.0              Free         Commercial
Sparqlify                 R2RML & SML              1.0              Free         Free
                Table 1: Comparison between different mapping tools and languages.

⁵ http://www.w3.org/2001/sw/rdb2rdf/
⁶ http://www.w3.org/TR/xpath-30/
⁷ http://d2rq.org/d2rq-language
⁸ http://www.inf.unibz.it/krdb/
⁹ https://github.com/ontop/ontop/wiki/ontopOBDAModel
¹⁰ http://virtuoso.openlinksw.com/
¹¹ http://bbn.com/technology/knowledge/asio_sbrd
¹² http://bbn.com/technology/knowledge/snoggle
¹³ http://www.w3.org/Submission/SWRL/

3.       TOWARDS A UNIFIED FORMAL MODEL FOR RDB2RDF MAPPINGS
   In this section, we outline a formal approach for mapping tabular data to RDF. For this purpose, we first briefly summarize fundamental concepts of the RDF data model. It should be noted that we assume that RDF is generated by row-wise processing of the underlying relational data. Both R2RML and SML build on this assumption. Also, without loss of generality, we only consider quads rather than triples in the formalization. The reason is that we can view any generated triple as being labeled with the URI of the graph it belongs to.

Preliminaries.
Let the RDF primitives be: U the set of URIs, B the set of blank nodes, ℒ the set of literals and V the set of variables. Further:

   • T is the set of all RDF terms, defined as U ∪ B ∪ ℒ.

Furthermore, we make use of the following notions:

   • J is the joint set of RDF terms and variables, defined as T ∪ V.
   • 𝒬 is the set of all quads, defined as J × J × J × J.
   • A quad pattern Q is a finite, possibly empty, set of quads, defined as Q ⊂ 𝒬.
   • ℛ is the set of all quad patterns, thus the powerset of 𝒬, denoted by 𝒫(𝒬).
   • A quad q is defined as q ∈ 𝒬.
   • vars(Q) is the set of variables appearing in Q.
   • A ground quad (pattern) is a variable-free quad (pattern).

   Finally, we introduce our notion of a relation instance (short: relation) L, which, for convenience, we define as a set of partial functions that map attribute names to attribute values. It is noteworthy that R2RML defines an entity referred to as logical table. Instances of this entity possess an effective SQL query¹⁴ which can be evaluated over an instance of a database schema in order to obtain a relation.

¹⁴ http://www.w3.org/TR/r2rml/#dfn-effective-sql-query

Generating RDF from relations.
   Based on the previously introduced primitives, we are now able to formally capture the nature of RDF mapping approaches for relational data.
   A relational data to RDF (R2R) mapping m is a four-tuple (N, P, L, f):


   • N is the name of a view.
   • P is a quad pattern which acts as the template for constructing triples and relating them to named graphs. The template is instantiated once for each row of the relation.
   • L is a relation to be converted to RDF.
   • f is a mapping with signature L → (V → T): f yields for each element of the relation L a partial function that binds the variables of the template P to RDF terms in T. Note that variables of P are not required to be bound, which enables support for NULL values in the source data.

An R2R mapping is valid if its evaluation yields an RDF dataset, as defined in the SPARQL specification¹⁵.
   Given a quad pattern Q ⊂ 𝒬 and a partial function a: V → T, we define the substitution operator

$$\rho_{[a]} : \mathcal{R} \to \mathcal{R}$$

ρ[a](Q) yields a new ground quad pattern Q′ with all variables replaced in accordance with a. Any quads of Q containing variables that are unbound in a are omitted from Q′.
   An evaluation of a mapping m proceeds by passing each row of L as an argument to f, thereby obtaining the bindings for vars(P), which are used to instantiate the template P and finally create ground quads. Let M be the set of all mappings; then a function eval: M → ℛ can be defined as:

$$\mathrm{eval}(m) = \bigcup_{l \in L} \rho_{[f(l)]}(P) \qquad \text{with } m = (N, P, L, f)$$

   What remains is to define a representation of the function f in terms of expressions. We refer to such a set of expressions as a variable definition. An analysis of the mapping languages revealed that there is a small set of essential operations for RDB-to-RDF mappings, for which we devised an Extended Backus–Naur Form (EBNF) of an expression grammar, shown in Listing 5 and explained as follows.
   In a first step, we need to be able to construct RDF terms from the underlying relation; hence we introduce the rdf-term-ctor-expr¹⁶ production. Note that our plainLiteral and typedLiteral functions roughly correspond to the functions STRLANG and STRDT of the SPARQL standard, respectively, although in SML the arguments may be of types other than string, such as when mapping a column of type real to a corresponding typed literal. In the future, aliases may be introduced to SML for better alignment with existing SPARQL features.

¹⁵ http://www.w3.org/TR/sparql11-query/#rdfDataset
¹⁶ We use ctor as abbreviation for constructor.
¹⁷ http://tools.ietf.org/html/rfc3986

   Namely, the essential operations are concat, str, urlEncode and percentEncode¹⁷. These function symbols are usually used for the construction of URIs and IRIs from values of the underlying relation: The function symbol concat may be used to prepend a prefix IRI to one or more ID columns. The function symbol str corresponds to an implicit conversion and therefore usually does not have to be stated explicitly, as it can be implied. It is needed to preserve type consistency: For instance, concat is only defined for string arguments. Therefore, concat('http://ex.org/', 1) would yield a type error without the prior conversion of the second argument to string.
   Note that although these functions could be applied in the underlying RDBMS, support at the mapping level opens possibilities for basic optimizations without the need to parse the involved SQL.

    varDefinition = ( var '=' rdf-term-ctor-expr )* ;

    rdf-term-ctor-expr
        = bNode '(' expr ')'
        | uri '(' expr ')'
        | plainLiteral '(' expr ( ',' expr )? ')'
        | typedLiteral '(' expr ',' expr ')'
        ;

    expr-list
        = ( expr ( ',' expr )* )?
        ;

    expr
        = var                      // Denotes a reference to a column
        | str '(' expr ')'
        | concat '(' expr-list ')'
        | urlEncode '(' expr ')'
        ;

                    Listing 5: EBNF for variable definition expressions.

   Example: Assume a given relation holding the label of a product:

        {{(id, 1), (label, "Coke")}, ...}

Assume that we aim to obtain the following assignment from variables to RDF terms:

        {{(?s, <http://ex.org/1>), (?l, "Coke"@en)}, ...}

Then a definition of f as

        f: [{?s = uri(concat('http://ex.org/', str(?id))),
             ?l = plainLiteral(?label, 'en')}]                    (1)

would yield the desired output.
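The definitions of f, ρ and eval can be made concrete with a small program. The following Python sketch is purely illustrative and is not the Sparqlification implementation. It makes several assumptions that do not come from the paper. Expressions and RDF terms use a hypothetical tuple encoding. Only the var, str and concat expressions and the uri and plainLiteral constructors from Listing 5 are covered. A hypothetical const node holds string constants such as 'http://ex.org/'. The graph IRI is invented, and rdfs:label is chosen as the predicate. All function names (eval_expr, eval_ctor, apply_var_def, rho, eval_mapping) are hypothetical.

```python
from typing import Any, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical encodings (not from the paper): ("var", name) is a column
# reference inside expressions and a template variable inside quad patterns;
# ("const", value) is a constant; ("uri", iri) an IRI term;
# ("lit", lexical, lang) a plain literal. A quad is (graph, s, p, o).
Expr = Tuple[Any, ...]
Term = Tuple[Any, ...]
Quad = Tuple[Term, Term, Term, Term]
Row = Dict[str, Any]  # partial function: attribute name -> value (absent = NULL)


def eval_expr(expr: Expr, row: Row) -> Optional[Any]:
    """Evaluate the expr production (var, str, concat) over one row."""
    op = expr[0]
    if op == "var":  # reference to a column; None if the value is NULL
        return row.get(expr[1])
    if op == "const":  # assumed constant node; Listing 5 leaves constants implicit
        return expr[1]
    if op == "str":  # explicit conversion to string
        value = eval_expr(expr[1], row)
        return None if value is None else str(value)
    if op == "concat":  # only defined for string arguments
        parts = [eval_expr(e, row) for e in expr[1:]]
        if any(p is None for p in parts):
            return None
        if not all(isinstance(p, str) for p in parts):
            raise TypeError("concat is only defined for string arguments")
        return "".join(parts)
    raise ValueError(f"unsupported expression: {op}")


def eval_ctor(ctor: Expr, row: Row) -> Optional[Term]:
    """Evaluate an rdf-term-ctor-expr (only uri and plainLiteral here)."""
    kind = ctor[0]
    if kind not in ("uri", "plainLiteral"):
        raise ValueError(f"unsupported term constructor: {kind}")
    value = eval_expr(ctor[1], row)
    if value is None:
        return None  # the variable will stay unbound
    if kind == "uri":
        return ("uri", str(value))
    lang = eval_expr(ctor[2], row) if len(ctor) > 2 else None
    return ("lit", str(value), lang)


def apply_var_def(var_def: Dict[str, Expr], row: Row) -> Dict[str, Term]:
    """The function f: map one row to a partial binding of template variables."""
    binding: Dict[str, Term] = {}
    for var, ctor in var_def.items():
        term = eval_ctor(ctor, row)
        if term is not None:  # NULL-derived variables are left out
            binding[var] = term
    return binding


def rho(a: Dict[str, Term], pattern: List[Quad]) -> FrozenSet[Quad]:
    """Substitution operator rho[a]: ground each quad, omitting quads
    that mention a variable left unbound by a."""
    result = set()
    for quad in pattern:
        ground: List[Term] = []
        for term in quad:
            if term[0] == "var":
                if term[1] not in a:
                    break
                ground.append(a[term[1]])
            else:
                ground.append(term)
        else:  # runs only if no unbound variable caused a break
            result.add(tuple(ground))
    return frozenset(result)


def eval_mapping(pattern: List[Quad], relation: List[Row],
                 var_def: Dict[str, Expr]) -> FrozenSet[Quad]:
    """eval(m): union of rho[f(l)](P) over all rows l of the relation L."""
    quads: set = set()
    for row in relation:
        quads |= rho(apply_var_def(var_def, row), pattern)
    return frozenset(quads)


# Example mapping (1); graph IRI and predicate are illustrative choices.
GRAPH = ("uri", "urn:x-example:graph")  # hypothetical graph label
LABEL = ("uri", "http://www.w3.org/2000/01/rdf-schema#label")

var_def = {
    "s": ("uri", ("concat", ("const", "http://ex.org/"), ("str", ("var", "id")))),
    "l": ("plainLiteral", ("var", "label"), ("const", "en")),
}
pattern = [(GRAPH, ("var", "s"), LABEL, ("var", "l"))]
relation = [{"id": 1, "label": "Coke"}, {"id": 2}]  # row 2 has a NULL label

for quad in sorted(eval_mapping(pattern, relation, var_def)):
    print(quad)
```

When run, the script prints a single quad for the first row. The second row still binds ?s to http://ex.org/2. Its label is NULL, however, so ?l stays unbound and ρ omits the quad. This mirrors the NULL handling described for f. Passing the raw integer id to concat without str raises a TypeError, which corresponds to the type error discussed for concat('http://ex.org/', 1).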
4. SML SYNTAX
In this section, we give an introduction to the SML syntax. The left-hand side of Figure 1 shows an example of SML, whose syntactic constituents are explained as follows.
Recall that in Section 3 we formally defined an R2R view as a four-tuple (N, P, L, f). The core syntax of an SML view definition comprises four parts that correspond directly to this formal definition. Additionally, SML features an optional constraint component for improving query performance. An SML view definition is composed of the following parts:

  • The name of the view. This corresponds to an element of the set N.

  • A CONSTRUCT clause, which consists of triple patterns that can optionally be associated with a specific named graph by surrounding them with GRAPH G { ... }, where G can be a variable name or an IRI. Hence, the syntax is equivalent to the quads production rule of the SPARQL 1.1 standard¹⁸. This corresponds to an element of P.

  • A FROM clause, where a reference to a logical table can be specified. As in R2RML, this can be either an SQL SELECT statement, the name of a physical table or the name of a view. An SQL SELECT statement needs to be enclosed in triple double-quotes, i.e. """SELECT ...""". Executing a logical table's effective SQL query over an SQL connection yields a result set, which formally corresponds to L.

  • The variable definition clause acts as the bridge between the RDF and SQL data models and is used to specify the creation of RDF terms from rows of the relation. It consists of a set of variable definition statements of the form ?var = rdf-term-ctor(expr0, ..., exprn), and should at least support the grammar defined in 5.

  • Finally, there is a CONSTRAINT clause for specifying constraints about variables on the RDF level. As such, it has no direct influence on the virtual RDF relation, but rather on query performance. The example in Figure 1 shows that, based solely on the definition ?s = uri(?website), we have no information about the set of URIs being created. Specifying such constraints enables, for instance, SPARQL-to-SQL rewriters to prune joins whose join condition equates variables with disjoint sets of prefixes. Syntactically, only prefix constraints are supported so far.

5. COMPARISON OF SML WITH R2RML
In this section, we summarize essential features of R2RML and explain how they relate to those of SML. R2RML mappings are expressed as RDF graphs, which by convention are serialized in Turtle. The fundamental class is rr:TriplesMap, whose instances specify the triples to generate from an underlying logical table. As such, an instance of a TriplesMap corresponds to an SML view definition. In the following, we explain the most important attributes that TriplesMaps may have and compare them to SML. Figure 1 shows a side-by-side comparison of the mapping languages for a specific example. Both syntactic formats can be converted to each other.

Defining Logical Tables.
R2RML defines the predicate rr:logicalTable to relate a TriplesMap to a logical table. The object of this predicate must be a resource that is further described using rr:tableName or rr:sqlQuery. A TriplesMap must have exactly one logical table. In SML, the FROM clause serves the same purpose. Table 2 compares how to state a logical table in R2RML and SML.

R2RML                                 SML
rr:tableName "person"                 ... From person
rr:tableName "SCOTT.DEPT"             ... From "SCOTT.DEPT"
rr:sqlQuery """SELECT ..."""          ... From """SELECT ..."""
Table 2: Comparison of attributes of rr:logicalTable with SML's FROM clause.

Creating RDF terms from logical tables.
Both SML and R2RML allow expressing how RDF terms are created from the rows of the underlying logical table. In SML, the term constructor expressions of the WITH clause serve this purpose, whereas R2RML introduces the notion of TermMaps. SML uses an expression syntax to specify the RDF term creation, which corresponds to a combination of the properties rr:template, rr:termType and rr:datatype in R2RML. The template syntax for values of rr:template corresponds to the SML expression symbols concat and urlEncode. Table 3 shows examples of RDF term construction in both mapping languages. The function asTemplate(expr) is assumed to yield the R2RML template for a given SML expression. Note that SML is slightly more expressive in this regard, as it allows, e.g., nested urlEncode calls.

SML RDF term constructor          R2RML term map
bNode(?COL)                       ... [ rr:column "COL" ; rr:termType rr:BlankNode ]
bNode(expr)                       ... [ rr:template "asTemplate(expr)" ; rr:termType rr:BlankNode ]
uri(expr)                         ... [ rr:(constant|column|template) "asTemplate(expr)" ; rr:termType rr:IRI ]
plainLiteral(?COL)                ... [ rr:column "COL" ]
plainLiteral(expr)                ... [ rr:template "asTemplate(expr)" ; rr:termType rr:Literal ]
typedLiteral(?COL, xsd:int)       ... [ rr:column "COL" ; rr:datatype xsd:int ]
typedLiteral(expr, xsd:int)       ... [ rr:template "asTemplate(expr)" ; rr:datatype xsd:int ]
Table 3: Transformation of SML term constructors to R2RML term maps.

Forming Quads from RDF Terms.
Once there exists a specification of how to create RDF terms from the rows of a logical table, these RDF terms need to be grouped to form quads. In SML, this is done using the CONSTRUCT clause, which re-uses the quads production rule of the SPARQL 1.1 standard. Hence, anyone familiar

18 http://www.w3.org/TR/sparql11-query/#rQuads
SML:

Prefix rdfs:
Prefix xsd:
Prefix ex:

Create View hotels As
  Construct {
    ?s a ex:Hotel ;
      rdfs:label ?l ;
      ex:vacancy ?v
  }
  With
    ?s = uri(?website)
    ?l = plainLiteral(?name, 'en')
    ?v = typedLiteral(?vacancy, xsd:boolean)
  Constrain
    ?s prefix "http://ex.org/"
  From
    """SELECT website, name,
      vacancy FROM hotels"""

R2RML:

@prefix rdfs:  .
@prefix xsd:  .
@prefix ex:  .

  rr:logicalTable [ rr:sqlQuery
    """SELECT website, name,
      vacancy FROM hotels""" ];
  rr:subjectMap [
    rr:column "website";
    rr:class ex:Hotel
  ];
  rr:predicateObjectMap [
    rr:predicate rdfs:label;
    rr:objectMap
      [ rr:column "name";
        rr:language "en" ];
  ];
  rr:predicateObjectMap [
    rr:predicate ex:vacancy;
    rr:objectMap
      [ rr:column "vacancy";
        rr:datatype xsd:boolean ];
  ].

Figure 1: A simple view in SML and R2RML.


with SPARQL should already be familiar with SML in this regard.
In R2RML, the TriplesMap serves this purpose; however, its specification is more verbose: For relating a TriplesMap to TermMaps for the subject, predicate, object and graph components, there exist the general properties rr:subjectMap, rr:predicateMap, rr:objectMap and rr:graphMap, respectively. Note that for each of these properties there exists a syntactic shortcut without Map in the name (rr:subject, rr:predicate, rr:object, rr:graph), which can be used for constants.
These properties are used to form the following structure: Every TriplesMap carries exactly one specification for its subjects, and zero or more specifications for pairs of predicates and objects. Subjects are specified by relating the TriplesMap to a TermMap using rr:subjectMap. A SubjectMap may carry zero or more rr:graphMap and rr:class attributes. The former specifies in which graphs the generated triples reside. The latter is a syntactic shortcut for typing the subjects (via rdf:type) with the given IRIs. Zero or more rr:predicateObjectMap attributes denote the predicate-object pairs to associate with each subject. Each PredicateObjectMap carries rr:predicateMap and rr:objectMap attributes, and optionally rr:graphMap attributes, which again are TermMaps.

Foreign Key Relations among Logical Tables.
R2RML offers a model for expressing joins. This model is primarily intended for generating IRIs that require joining tables along one or more foreign key constraints. The R2RML vocabulary distinguishes the roles of the parent and the child table, where the child references the parent via a certain join condition.
SML offers a limited SQL syntax for this purpose, which is shown and compared to R2RML in Figure 2. Note that, in contrast to SML's triple double-quotes syntax for stating SQL queries, this syntax allows the SML processor to understand the SQL natively (i.e. without requiring a full SQL parser) and thus to consider it for optimizations such as self-join elimination.

Assigning Triples to Named Graphs.
To assign the triples to be generated to a certain named graph, term maps are again utilized. Accordingly, an additional term map, called a graph map, is defined inside a subject map or a predicate-object map. These nested term map expressions introduce further complexity to the actual triples map.

6. CONVERTING SML TO R2RML
In this section, we briefly outline the approach for converting SML to R2RML. Converting prefix definitions is straightforward, since the only difference is that SML uses the Prefix keyword, whereas R2RML's Turtle notation uses @prefix. The name of an SML view definition serves as a base for crafting the IRIs that name the triples maps. However, it has to be taken into account that one SML view definition can correspond to many R2RML triples maps. Logical tables defined as table names, view names or queries have direct counterparts in R2RML, namely rr:tableName (for tables and views) and rr:sqlQuery.
The SML CONSTRAINT clause has no equivalent in R2RML and is thus omitted in the conversion. Note that constraints only act as hints that may be considered for improving performance.
The Construct and With sections of an SML view definition do not directly translate to R2RML. In general, a new triples map can be created for each quad of the Construct section. However, if an SML view defines multiple quads with the same variable in the subject position, multiple instances of rr:predicateObjectMap can be created on the same triples map.
Regarding the constructor arguments, one has to differentiate between an atomic value and a compound expression. An atomic value can either be a column reference, resulting in an rr:column term map, or a constant expression, requiring rr:constant. In all other cases, a compound expression is assumed, resulting in an rr:template. Such an expression has to be evaluated from the innermost to the outermost term constructor, which yields the templates denoted by asTemplate(expr) in Table 3.
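To make the distinction between atomic values and compound expressions concrete, the following is a minimal Python sketch of translating a single SML term-constructor call into R2RML term-map properties. It is not the SML tooling's implementation: the expression types Col, Const, UrlEncode and Concat, and the functions as_template, to_term_map and render, are hypothetical. It further assumes that multiple constructor arguments denote a concatenation and that prefixed names such as ex:emp have already been expanded to IRI strings.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical AST for SML constructor arguments (not the SML tooling's API).
@dataclass(frozen=True)
class Col:
    name: str  # column reference, e.g. ?website -> Col("website")


@dataclass(frozen=True)
class Const:
    value: str  # constant, e.g. an already expanded prefix "http://ex.org/emp/"


@dataclass(frozen=True)
class UrlEncode:
    arg: "Expr"


@dataclass(frozen=True)
class Concat:
    parts: tuple  # Expr values, concatenated left to right


Expr = Union[Col, Const, UrlEncode, Concat]

TERM_TYPES = {"uri": "rr:IRI", "bNode": "rr:BlankNode",
              "plainLiteral": "rr:Literal", "typedLiteral": "rr:Literal"}


def turtle_str(s: str) -> str:
    """Quote s as a Turtle string literal (escapes backslash and double quote)."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def as_template(expr: Expr) -> str:
    """Render a compound expression as an R2RML template, innermost terms first."""
    if isinstance(expr, Const):
        # Literal template text: '\', '{' and '}' must be backslash-escaped.
        return expr.value.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
    if isinstance(expr, Col):
        return "{" + expr.name + "}"
    if isinstance(expr, UrlEncode):
        # Assumption: R2RML already percent-encodes {col} values in IRI templates,
        # so only urlEncode over a plain column maps onto a placeholder.
        if not isinstance(expr.arg, Col):
            raise ValueError("nested urlEncode has no R2RML template equivalent")
        return as_template(expr.arg)
    if isinstance(expr, Concat):
        return "".join(as_template(part) for part in expr.parts)
    raise TypeError(f"unsupported expression: {expr!r}")


def to_term_map(ctor: str, args: tuple, datatype: Optional[str] = None,
                lang: Optional[str] = None) -> dict:
    """Translate one SML term-constructor call into R2RML term-map properties."""
    term_type = TERM_TYPES[ctor]
    # Several arguments, as in uri(ex:emp, ?e.id), are read as a concatenation.
    expr = args[0] if len(args) == 1 else Concat(tuple(args))
    props = {}
    if isinstance(expr, Col):  # atomic: column reference
        props["rr:column"] = turtle_str(expr.name)
    elif isinstance(expr, Const):  # atomic: constant term
        if ctor == "bNode":
            raise ValueError("R2RML constants cannot be blank nodes")
        if ctor == "uri":
            props["rr:constant"] = "<" + expr.value + ">"
        else:
            suffix = "@" + lang if lang else ("^^" + datatype if datatype else "")
            props["rr:constant"] = turtle_str(expr.value) + suffix
        return props  # the constant fixes the whole term
    else:  # compound expression
        props["rr:template"] = turtle_str(as_template(expr))
    props["rr:termType"] = term_type  # explicit, to avoid R2RML's defaults
    if lang:
        props["rr:language"] = turtle_str(lang)
    if datatype:
        props["rr:datatype"] = datatype
    return props


def render(props: dict) -> str:
    """Format term-map properties as a Turtle blank-node property list."""
    return "[ " + " ; ".join(f"{k} {v}" for k, v in props.items()) + " ]"


if __name__ == "__main__":
    print(render(to_term_map("uri", (Col("website"),))))
    # [ rr:column "website" ; rr:termType rr:IRI ]
    print(render(to_term_map("uri", (Const("http://ex.org/emp/"), Col("id")))))
    # [ rr:template "http://ex.org/emp/{id}" ; rr:termType rr:IRI ]
    print(render(to_term_map("typedLiteral", (Col("vacancy"),), datatype="xsd:boolean")))
    # [ rr:column "vacancy" ; rr:termType rr:Literal ; rr:datatype xsd:boolean ]
```

Emitting rr:termType explicitly is a deliberate choice in this sketch: under R2RML's default rule, a template-valued object map without rr:language or rr:datatype would produce IRIs rather than literals.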
SML:

Prefix ex:

Create View departments As
  Construct {
    ?d a ex:Department
  }
  With
    ?d = uri(ex:dept, ?name)
  From
    departments

Create View employees As
  Construct {
    ?e a ex:Employee ;
      ex:worksIn ?d
  }
  With
    ?e = uri(ex:emp, ?e.id)
    ?d = uri(ex:dept, ?d.name)
  From
    employees e Join departments d
      On (d.id = e.dept_id)

R2RML:

@prefix ex:  .

<#Departments>
  rr:logicalTable [ rr:tableName "departments" ];
  rr:subjectMap [
      rr:template "http://ex.org/dept/{id}" ;
      rr:class ex:Department ]
  .

<#Employees>
  rr:logicalTable [ rr:tableName "employees" ];
  rr:subjectMap [
      rr:template "http://ex.org/emp/{id}" ;
      rr:class ex:Employee ];
  rr:predicateObjectMap [
    rr:predicate ex:worksIn;
    rr:objectMap [
      rr:parentTriplesMap <#Departments>;
      rr:joinCondition [
        rr:child "dept_id";
        rr:parent "id";
      ];
    ];
  ].

Figure 2: A view in SML and R2RML with a referencing object map.
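The grouping rule stated earlier in this section (one triples map per subject variable, with further quads on the same subject becoming additional rr:predicateObjectMap entries) can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not the authors' converter: the types Quad and TriplesMapSketch, the function group_quads and the <#view_i> naming scheme are hypothetical, graphs are ignored, and subjects are assumed to always be variables. Object variables that also occur in subject position are only marked as candidates for an rr:parentTriplesMap reference; producing the term maps themselves would follow Table 3.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Quad:
    s: str  # subject variable, e.g. "?e" (graphs are ignored in this sketch)
    p: str  # predicate, e.g. "ex:worksIn", or "a" for rdf:type
    o: str  # object: a variable such as "?d" or a constant such as "ex:Employee"


@dataclass
class TriplesMapSketch:
    name: str     # hypothetical naming scheme derived from the view name
    subject: str  # variable whose term map becomes the rr:subjectMap
    classes: List[str] = field(default_factory=list)  # rr:class shortcuts
    po_maps: List[Tuple[str, str, str]] = field(default_factory=list)  # (pred, obj, kind)


def group_quads(view_name: str, quads: List[Quad]) -> List[TriplesMapSketch]:
    """Create one triples map per distinct subject variable; the quads on that
    subject become rr:class entries or predicate-object maps."""
    maps: Dict[str, TriplesMapSketch] = {}  # insertion order = first occurrence
    for q in quads:
        if q.s not in maps:
            maps[q.s] = TriplesMapSketch(f"<#{view_name}_{len(maps)}>", q.s)
        tm = maps[q.s]
        if q.p == "a" and not q.o.startswith("?"):
            tm.classes.append(q.o)  # constant class -> rr:class
        else:
            kind = "termMap" if q.o.startswith("?") else "constant"
            tm.po_maps.append((q.p, q.o, kind))
    # Object variables that also occur in subject position are the case the text
    # singles out; mark them as candidates for an rr:parentTriplesMap reference.
    for tm in maps.values():
        tm.po_maps = [(p, o, "parent " + maps[o].name if o in maps else kind)
                      for p, o, kind in tm.po_maps]
    return list(maps.values())


if __name__ == "__main__":
    quads = [Quad("?e", "a", "ex:Employee"),
             Quad("?e", "ex:worksIn", "?d"),
             Quad("?d", "a", "ex:Department")]
    for tm in group_quads("staff", quads):
        print(tm.name, tm.subject, tm.classes, tm.po_maps)
    # <#staff_0> ?e ['ex:Employee'] [('ex:worksIn', '?d', 'parent <#staff_1>')]
    # <#staff_1> ?d ['ex:Department'] []
```

Keying the dictionary by subject variable is what merges quads such as ?e a ex:Employee and ?e ex:worksIn ?d into a single triples map, mirroring the rule that multiple quads on one subject share one rr:subjectMap.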


An even more complex R2RML mapping has to be created if an SML variable already used in subject position is also referred to in the object position of another Construct statement. In this case, a referencing object map has to be used in R2RML. An example and the corresponding triples maps are shown in Figure 2. In R2RML, the rr:joinCondition can only be omitted if both triples maps refer to the same logical table; in Figure 2 the logical tables differ, so the join condition relating the child column dept_id to the parent column id is required. Note that R2RML provides referencing term maps only for objects, not for, e.g., predicates or graphs. A conversion from R2RML to SML proceeds in a similar fashion as described in this section; details are omitted for brevity.

7. EVALUATION
We evaluated the SML mapping language to clarify the following questions:
  1. Is SML easier to read than R2RML, and does SML have a lower entry barrier than R2RML?
  2. Can people understand SML mappings or R2RML mappings faster?
  3. If given the choice, would people prefer SML or R2RML?

7.1 Experimental Setup
We set up a survey, which consists of three parts:
  1. Questions about prior expertise of the participant.
  2. Test questions for SML and R2RML.
  3. An assessment of the characteristics of SML and R2RML by the participants.
We used a standard star rating for most questions, ranging from 1 star (lowest value) to 5 stars (highest value). The comparison with R2RML was performed as it is the current W3C standard for RDB2RDF mapping.
In the first part, we asked participants to state their familiarity with the SML and R2RML languages as well as with related concepts such as the Turtle syntax and the SQL and SPARQL query languages. In the second part, we had 5 different tasks for participants. Each task was formulated for SML and R2RML with renamed classes and properties. In the first 3 tasks, participants had to select, out of 4 shown triples, the subset that was actually generated from a given mapping. The tasks were ordered by the complexity of the mapping specification. In the 4th and 5th task, the inverse needed to be performed: Given a target RDF output, participants had to select, out of 4 presented mappings, those that generate the target output. Finally, in the third part of the survey, participants had to assess a) the difficulty of the presented tasks, b) whether they could make sense of the SML and R2RML mappings, c) whether they found SML and R2RML easy to read, d) whether they would consider using SML for RDB2RDF tasks, and e) whether they have a preference between SML and R2RML.
The survey was distributed to the Semantic Web and Linked Data mailing lists and announced on Twitter.

7.2 Results
Overall, a total of 102 participants took part in the survey, of which 73 completed it. We removed entries with a completion time below 500 seconds in order to filter out bot entries, and carefully assessed the removed entries. 46 participants remained, of which 28 answered all test questions correctly. The 46 participants required an average time of 1243.1 seconds to complete the survey. As a result, the overall time valid participants spent on the survey was 953 minutes (46 × 1243.1 s ≈ 57,183 s). All results of the survey can also be directly obtained and analysed at the SML project website.
The averaged results of the survey are shown in Table 4. The self-assessment scores in Table 4 illustrate that the audience is interested in RDB2RDF conversions and that the participants are familiar with Turtle (TTL), SPARQL and SQL. R2RML familiarity is considerably lower, with an average of 3, and SML is relatively unknown (1.74).

Criterion                                   Value
Relevance                                   4.15
Familiarity TTL                             4.11
Familiarity SPARQL                          4.30
Familiarity SQL                             4.33
Familiarity R2RML                           3
Familiarity SML                             1.74
Average Understandability R2RML             3.85
Average Understandability SML               3.88
Average Readability R2RML                   3.26
Average Readability SML                     3.72
Average Considering SML                     3.59
Average Preference R2RML - SML              3.26
Table 4: Averaged results of survey questions (1 = lowest rating, 5 = highest rating).

We discuss each of our evaluation questions in turn:
Readability and entry barrier: Figure 3 shows that SML appears to have a lower entry barrier than R2RML. Participants who were already familiar with R2RML judged both languages to have similar readability. However, participants who were less familiar with R2RML judged SML to be much more readable than R2RML and gave it high readability scores.

Figure 3: R2RML familiarity plotted against the readability assessment of R2RML and SML. Overall, SML was judged to be significantly more readable, although this effect is reduced for participants already familiar with R2RML.

Figure 4: The figure on the left assesses the time needed to solve an RDB2RDF mapping task. The figure on the right shows the preference for a mapping language. Overall, SML tasks required less time and the language was preferred, although this does not hold when considering only R2RML experts.

Time needed to solve tasks: To assess this evaluation question, we randomly started the survey with either an SML or an R2RML task. The median time required for completing the task was 67.08 seconds in R2RML and 69.23 seconds in SML, giving R2RML a slight edge. In a more thorough analysis, Figure 4 shows the time required to solve the first RDB2RDF mapping task in the survey for R2RML experts (self-assessment of 4 or higher) and R2RML novices (self-assessment below 4). R2RML experts require less time for solving the task in R2RML, whereas R2RML novices appear to be able to
complete the task faster using SML. It is further clear that R2RML experts require less time for solving the task regardless of the language used. It should be noted that understanding the task takes an unknown amount of time irrespective of the mapping language, i.e. the relative difference between the mapping languages is larger than depicted.
Overall preference: Figure 4 shows that there is an overall preference for SML. This preference does not hold when considering only the group of R2RML experts, but is very pronounced for people not familiar with either of the two mapping languages.

8. CONCLUSIONS AND FUTURE WORK
In this article, we presented work towards a unified model for RDB2RDF mappings and the lightweight mapping language SML, which reuses familiar elements of SPARQL and SQL in order to lower the learning curve and ease the manual writing and maintenance of view definitions. A public survey indicates that this is the case, in particular for users not yet familiar with R2RML. We provided an in-depth comparison of how SML relates to the R2RML standard and detailed how the former can be automatically converted to the latter.
SML has been successfully deployed in several scenarios: We created SML mappings for the BSBM and SP2Bench benchmarks, two popular SPARQL benchmarks that are often used for evaluating RDB2RDF mappers. Most prominently, we created SML mappings for transforming the OpenStreetMap (OSM) database to RDF. These efforts are carried out as part of the LinkedGeoData¹⁹ (LGD) project, where we give access to more than 25 billion OSM RDF triples created through SPARQL-to-SQL rewriting over about 3 billion relational rows via more than 40 SML view definitions.
Furthermore, SML mappings have been created for two large-scale linguistic resources: One is the Wortschatz database²⁰, which contains statistics, such as frequencies and co-occurrences, about words in more than 240 languages. The other resource is PanLex²¹, a database holding translations of about 19 million expressions extracted from over 2,000 sources. Links to the corresponding SML mappings are published together with our other SML-related resources²².
In general, we believe that mapping relational structures to RDF will remain a highly important topic in research and practice, as it provides an unobtrusive transition towards the use of semantic technologies. Providing engineers with an intuitive yet powerful language is a crucial step to ease this transition. Future work will continue on extending the formalizations as well as on sorting out details based on community feedback, such as whether an explicit FROM QUERY syntax for specifying SQL queries is preferred over the current approach, where this is implied by the use of triple quotes.

Acknowledgment
This work was supported by grants from the EU's 7th Framework Programme provided for the projects LOD2 (GA no. 257943) and GeoKnow (GA no. 318159).

19 http://linkedgeodata.org/
20 http://www.wortschatz.uni-leipzig.de/
21 http://ld.panlex.org
22 http://sml.aksw.org

9. REFERENCES
[1] Mapping relational data to RDF with Virtuoso's RDF views. http://virtuoso.openlinksw.com/whitepapers/relational%20rdf%20views%20mapping.html.
[2] M. Arenas, A. Bertails, E. Prud'hommeaux, and J. Sequeda. R2RML: RDB to RDF mapping language (W3C recommendation). Technical report, 2012.
[3] C. Bizer and R. Cyganiak. D2R Server – publishing relational databases on the Semantic Web. Poster at the 5th Int. Semantic Web Conf. (ISWC2006), 2006.
[4] S. Das, S. Sundara, and R. Cyganiak. R2RML: RDB to RDF mapping language (W3C recommendation). Technical report, 2012.
[5] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle. RML: a generic language for integrated RDF mappings of heterogeneous data. In Proceedings of the 7th Workshop on Linked Data on the Web (LDOW2014), Seoul, Korea, 2014.
[6] T. Berners-Lee. Relational databases on the Semantic Web, September 1998. Design Issues (published on the Web).
[7] C. Pinkel, C. Binnig, P. Haase, C. Martin, K. Sengupta, and J. Trame. How to best find a partner? An evaluation of editing approaches to construct R2RML mapping. In ESWC, 2014.
[8] F. Priyatna, O. Corcho, and J. Sequeda. Formalisation and experiences of R2RML-based SPARQL to SQL query translation using Morph. In Proceedings of the 23rd International Conference on World Wide Web, pages 479–490. International World Wide Web Conferences Steering Committee, 2014.
[9] M. Rodriguez-Muro, M. Rezk, J. Hardi, M. Slusnys, T. Bagosi, and D. Calvanese. Evaluating SPARQL-to-SQL translation in Ontop. In OWL Reasoner Evaluation Workshop, volume 1015 of CEUR Workshop Proceedings, pages 94–100. CEUR-WS.org, 2013.
[10] M. Rodriguez-Muro, M. Rezk, J. Hardi, M. Slusnys, T. Bagosi, and D. Calvanese. Evaluating SPARQL-to-SQL translation in Ontop. 2013.
[11] K. Sengupta, P. Haase, M. Schmidt, and P. Hitzler. Editing R2RML mappings made easy. 2013.
[12] K. Sengupta, P. Haase, M. Schmidt, and P. Hitzler. Editing R2RML mappings made easy. In International Semantic Web Conference (Posters & Demos), volume 1035 of CEUR Workshop Proceedings, pages 101–104. CEUR-WS.org, 2013.
[13] J. F. Sequeda and D. P. Miranker. Ultrawrap: SPARQL execution on relational data. Web Semantics: Science, Services and Agents on the World Wide Web, 22:19–39, 2013.
[14] J. Unbehauen, S. Hellmann, S. Auer, and C. Stadler. Knowledge extraction from structured sources. In S. Ceri and M. Brambilla, editors, Search Computing - Broadening Web Search, volume 7538 of Lecture Notes in Computer Science, pages 34–52. Springer, 2012.
[15] J. Unbehauen, C. Stadler, and S. Auer. Accessing relational data on the web with SparqlMap. In JIST, 2012.