RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data

Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, Rik Van de Walle
{anastasia.dimou, miel.vandersande, pieter.colpaert, ruben.verborgh, erik.mannens, rik.vandewalle}@ugent.be
Ghent University – iMinds – Multimedia Lab, Ghent, Belgium

Copyright is held by the author/owner(s). LDOW2014, April 8, 2014, Seoul, Korea.

ABSTRACT
Despite the significant number of existing tools, incorporating data from multiple sources and different formats into the Linked Open Data cloud remains complicated. No mapping formalisation exists to define how to map such heterogeneous sources into RDF in an integrated and interoperable fashion. This paper introduces the RML mapping language, a generic language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF. Broadening R2RML's scope, the language becomes source-agnostic and extensible, while facilitating the definition of mappings of multiple heterogeneous sources. This leads to higher integrity within datasets and richer interlinking among resources.

1. INTRODUCTION
Deploying the five stars of the Linked Open Data schema (http://5stardata.info/) is the de-facto way of mapping data. In real-world situations, multiple sources of different formats are part of multiple domains, which in their turn are formed by multiple sources and the relations between them. Approaching the stars as a set of consecutive steps and applying them to a single source every time—as most solutions tend to do—is not always optimal. When mapping heterogeneous data into RDF, such approaches often fail to reach the final goal of publishing interlinked data. The semantic representation of each mapped resource is defined independently, disregarding its possible prior definitions and its links to other resources. Manual alignment to their prior appearances is performed by redefining their semantic representations, while links to other resources are defined after the data are mapped and published. Nonetheless, as datasets are often shaped gradually, a demand emerges for a well-considered policy regarding mapping and primary interlinking of data in the context of a certain knowledge domain.

For instance, governments publish their data as Open Data and turn them into Linked Open Data afterwards. Much of this data, as expected when dealing with many sources, complements each other in the description of different knowledge domains. Therefore, the same concepts appear in multiple datasets—problematically, often with different identifiers or even in different formats. Furthermore, data is mapped progressively, thus it is important that data publishers incorporate their data in what is already published. Reusing the same unique identifiers for concepts is necessary to achieve this, but it is only possible if prior existing definitions in the same dataset are discovered and if they can be replicated. Otherwise, duplicates will inevitably appear—even within a publisher's own datasets. Identifying, replicating, and keeping those definitions aligned is complicated, and the situation worsens the more data is mapped and published.

Solving this problem requires a uniform, modular, interoperable and extensible technology that supports this need for gradually growing datasets. Such a solution can deal with the mapping and primary interlinking of the data, which should take place in a tightly coordinated way instead of as two separate, consecutive actions. This ensures semantic representations of higher quality and datasets with better integrity. To this end, we propose RML, a generic mapping language defined as an extension of R2RML (http://www.w3.org/TR/r2rml), the W3C recommendation for mapping data in relational databases into RDF.

The remainder of the paper is organized as follows: Section 2 discusses related solutions existing today. Section 3 analyzes the requirements of a mapping language, and Section 4 introduces the proposed approach. Next, Section 5 addresses the challenges of implementing an RML processor. Finally, Section 6 outlines our conclusions and future work.

2. RELATED WORK
Several solutions exist to execute mappings from different file structures and serialisations to RDF. For relational databases, different mapping languages beyond R2RML have been defined [3] and several implementations already exist (http://www.w3.org/2001/sw/rdb2rdf/wiki/Implementations). Similarly, mapping languages were defined to support conversion from data in CSV files and spreadsheets to the RDF data model. They include XLWrap's mapping language [5] that converts data in various spreadsheets to RDF, the declarative OWL-centric Mapping Master language M2 [6] that converts data from spreadsheets into the Web Ontology Language (OWL), Tarql (https://github.com/cygri/tarql) that follows a querying approach, and Vertere (https://github.com/knudmoeller/Vertere-RDF). The main drawback of most CSV/spreadsheet-to-RDF mapping solutions is the assumption that each row describes an entity (the entity-per-row assumption) and that each column represents a property.
A larger variety of solutions exist to map from XML to RDF but, to the best of our knowledge, no specific languages were defined for this, apart from GRDDL (http://www.w3.org/TR/grddl/), which essentially provides the links to the algorithms (typically represented in XSLT) that map the data to RDF. Instead, tools mostly rely on existing XML solutions, such as XSLT (e.g., Krextor [4] and AstroGrid-D, http://www.gac-grid.de/project-products/Software/XML2RDF.html), XPath (e.g., Tripliser, http://daverog.github.io/tripliser/), and XQuery (e.g., XSPARQL [1]). In general, most existing tools deploy mappings from a certain source format to RDF (per-source approaches). Few tools provide mappings from different source formats to RDF, and those tools actually employ separate source-centric approaches for each of the formats they support. Datalift [7], The DataTank (http://thedatatank.com), OpenRefine (http://openrefine.org/), RDFizers (http://simile.mit.edu/wiki/RDFizers) and Virtuoso Sponger (http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSponger) are the most well-known.

3. MAPPINGS METHODOLOGY
After outlining the limitations of existing solutions, we present the factors that can improve the mappings to produce better integrated datasets and early interlinked resources.

3.1 Limitations of current mapping methods
We identified the following limitations that prevent current practices from achieving well-integrated datasets.

Mapping of data on a per-source basis. Most of the current solutions work on a per-source basis: only one source is mapped at a time, as opposed to mapping different related sources together, despite covering the same domains or sharing the same formats. As a result, data publishers can only generate resources and links between data appearing within a single source. Their mapping definitions need to be aligned manually when the same resources already appear in the target dataset. Thus, data publishers need to redefine and replicate the patterns for the resources' URI definitions every time they appear in a new mapping rule. Furthermore, this is not always possible, as the data included in one source may not be sufficient to replicate the same URIs. This results in distinct URIs for identical resources, which leads to duplicates within a publisher's own dataset. In addition, the interlinking of the resources generated from different sources has to be performed afterwards.

Mapping data on a per-format basis. Besides the per-source approach, most of the current solutions follow a per-format approach: only mappings from a certain source format (e.g., XML) are supported. In practice, data publishers need to map various source formats to RDF. Therefore, they need to install, learn, use and maintain different tools for each case separately, which hampers their effort to ensure the integrity of their datasets even more. Alternatively, some end up implementing their own case-specific solutions.

Mapping definitions' reusability. The mapping definitions of current solutions are not reusable, as there is no standard formalisation for any source format apart from relational databases, i.e., R2RML. In most cases, the mapping rules are not interoperable as they are tied to the implementation, which prevents their extraction and reuse across different implementations. Moreover, this prohibits reuse of the same mapping rules to map data that describe the same model but are serialized in different initial formats.

3.2 Requirements for generic mappings
To achieve datasets with better integrated and richer interlinked resources, the aforementioned issues should be addressed during the mapping phase, rather than later. A set of factors that contribute to this are outlined below.

Uniform and interoperable mapping definitions. Since we require a uniform way of dealing with different source serializations, the mapping definitions should be defined independently of the references to the input data. The same mappings may then be reused across different sources—as long as they capture the same context (i.e., the same RDF representations)—only by changing the reference to the input source that holds the information. For example, a performance described in a JSON file and an exhibition described in an XML file may take place at the same location, indicated by an identical longitude/latitude pair. We only need a single mapping definition to describe their location, adjusted to point to the JSON objects and the XML elements, respectively, that hold the corresponding values. Therefore, we require a modular language in which the references to the data extracts and the mapping definitions are distinct and not interdependent. Thereby, the mapping definitions can be reused across different implementations for different source formats, reducing the implementation and learning costs.

Robust cross-references and interlinking. Redefining and replicating patterns every time a new input source is integrated should be avoided. Publishers should be able to uniquely define the pattern that generates a resource and refer to its definition every other time this resource is mapped (and in this way enriched), which has the following three advantages. First, possible modifications to the patterns, or to data values appearing in the patterns that generate the URIs, are propagated to every other reference of the resource, making the interlinking more robust. Second, taking advantage of this integrated solution, cross-references among sources become possible; links between resources in different input sources are defined already on the mapping level. Third, and most significant, when data publishers want to map a new source, their new mappings are defined taking advantage of, and automatically aligning to, the existing ones.
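The benefit of defining a URI pattern once and referring to it from every source can be made concrete with a short sketch. The following Python fragment is our own illustration, not part of RML; the field names and base URI are hypothetical:

```python
# Illustrative sketch: one URI pattern shared by two heterogeneous sources.
# The pattern is defined in a single place; each source only changes the lookup.
def venue_uri(name: str) -> str:
    """The uniquely defined pattern that generates a venue resource URI."""
    return "http://ex.com/venue/" + name.replace(" ", "_")

# A record as it might appear in a JSON-based source (hypothetical structure).
performance = {"Venue": {"Name": "STAM"}}
# The same venue as it might appear after parsing an XML-based source.
exhibition = {"venue_name": "STAM"}

uri_from_json = venue_uri(performance["Venue"]["Name"])
uri_from_xml = venue_uri(exhibition["venue_name"])

# Both sources yield the same identifier, so their triples interlink
# instead of producing duplicate resources for the same venue.
assert uri_from_json == uri_from_xml
```

A change to `venue_uri` would propagate to every source that refers to it, which is exactly the robustness argument made above.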
Extending the aforementioned example, the venue where the performance and the exhibition take place is the same. When the input source for the performances was mapped, the mappings for the possible venues were defined considering certain identifiers to define their URIs. Once the exhibitions are about to be mapped, the data publisher might not be able to reuse the existing mapping definition for the venues, as the identifiers are not included in the dataset to replicate the same patterns. However, the venue name might be considered to determine the binding. Then, the existing mapping definition can be referred to in order to generate the same URIs and thus enrich the existing resource with new attributes and interlink data from the newly mapped dataset to the existing one. As the original input source is an Open Data set that can be referenced, it is always available to support the mapping of the new data. Summarizing, the definition of the links between resources in different sources—even if they are in different formats—happens on the mapping level instead of during a subsequent interlinking step.

Scalable mapping language. As the references to the data extracts and the mapping definitions are distinct and not interdependent, the pointer to the input source's data can be adjusted to each case. Such a modular solution leads to correspondingly modular implementations that perform the mappings in a uniform way, independent of the input source. They only adjust the respective extraction mechanism depending on the input source. Case-specific solutions exist because complete generic solutions fail, as it is impossible to predict every potential input. A scalable solution addresses what can be defined in a generic way for all possible different input sources and scales over what cannot. In order to support emerging needs, it should allow extensions with source-specific references, addressed on a case-specific level.

4. RML MAPPING LANGUAGE
The RDF Mapping Language (RML) is a generic mapping language defined to express customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML is defined as a superset of the W3C-standardized mapping language R2RML, aiming to extend its applicability and broaden its scope.

4.1 R2RML
R2RML is defined to express customized mappings only from data in relational databases to datasets represented using the RDF data model. In R2RML, the mapping to the RDF data model is based on one or more Triples Maps and occurs over a Logical Table, iterating on a per-row basis. A Triples Map consists of three main parts: the Logical Table (rr:LogicalTable), the Subject Map and zero or more Predicate-Object Maps. The Subject Map (rr:SubjectMap) defines the rule that generates unique identifiers (URIs) for the resources which are mapped, and is used as the subject of all the RDF triples that are generated from this Triples Map. A Predicate-Object Map consists of Predicate Maps, which define the rule that generates the triple's predicate, and Object Maps or Referencing Object Maps, which define the rule that generates the triple's object. The Subject Map, the Predicate Map and the Object Map are Term Maps, namely rules that generate an RDF term (an IRI, a blank node or a literal). A Term Map can be a constant-valued term map (rr:constant) that always generates the same RDF term, a column-valued term map (rr:column) whose value is the data value of a referenced column in a given Logical Table row, or a template-valued term map (rr:template) that is a valid string template that can contain referenced columns.

Furthermore, R2RML supports cross-references between Triples Maps, when the subject of a Triples Map is the same as the object generated by a Predicate-Object Map. A Referencing Object Map (rr:RefObjectMap) is then used to point to the Triples Map that generates, in its Subject Map, the corresponding resource—the so-called Referencing Object Map's Parent Triples Map. If the Triples Maps refer to different Logical Tables, a join between the Logical Tables is required. The join condition (rr:joinCondition) performs the join exactly as a join is executed in SQL. The join condition consists of a reference to a column name that exists in the Logical Table of the Triples Map that contains the Referencing Object Map (rr:child) and a reference to a column name that exists in the Logical Table of the Referencing Object Map's Parent Triples Map (rr:parent).

4.2 RML
RML keeps the mapping definitions as in R2RML but excludes its database-specific references from the core model. The potentially broad concepts of R2RML, which were explained previously [2], are formally designated in the frame of the RML mapping language and are elaborated upon here. The primary difference is the potential input, which is limited to a certain database in the case of R2RML, while it can be a broad set of (one or more) input sources in the case of RML. Table 1 summarizes RML's extensions over R2RML entailed by this broader set of possible input sources. RML provides a generic way of defining the mappings that is easily transferable to cover references to other data structures, combined with case-specific extensions, but it always remains backward compatible with R2RML, as relational databases form such a specific case. RML considers that the mappings to RDF of sets of sources that all together describe a certain domain can be defined in a combined and uniform way, while the mapping definitions may be reused across different sources that describe the same domain to incrementally form well-integrated datasets, as displayed in Figure 1.

An RML mapping definition follows the same syntax as R2RML. The RML vocabulary namespace is http://semweb.mmlab.be/ns/rml# and the preferred prefix is rml. More details about the RML mapping language can be found at http://rml.io. Defining and executing a mapping with RML requires the user to provide a valid and well-formatted input dataset to be mapped, and the mapping definition (mapping document) according to which the mapping will be executed to generate the data's representation using the RDF data model (output dataset). Data cleansing is out of the scope of the language's definition and, if necessary, should be performed in advance. An extract of two heterogeneous input sources is displayed in Listing 1, an example of a corresponding mapping definition is displayed in Listing 3, and the produced output in Listing 2.
Logical Source. A Logical Source (rml:LogicalSource) extends R2RML's Logical Table and is used to determine the input source with the data to be mapped. The R2RML Logical Table definition determines a database's table, using the Table Name (rr:tableName). In the case of RML, a broader reference to any input source is required. Thus, the Logical Source and the source (rml:source) are introduced, respectively, to specify the input.

Table 1: R2RML vs RML.

                        R2RML                 RML
    Input Reference     Table Name            Source
    Value Reference     Column                Reference
    Iteration model     per row (implicit)    defined
    Source Expression   SQL (implicit)        Reference Formulation

Figure 1: Mapping sources without and with RML. [figure omitted]

Reference Formulation. RML needs to deal with different data serialisations, which use different ways to refer to their elements/objects. But, as RML aims to be generic, no uniform way of referring to the data's elements/objects is defined. R2RML uses column names for this purpose. In the same context, RML considers that any reference to the Logical Source should be defined in a form relevant to the input data, e.g., XPath for XML files or JSONPath for JSON files. To this end, the Reference Formulation (rml:referenceFormulation) declaration is introduced, indicating the formulation (for instance, a standard or a query language) used to refer to its data. In the current version of RML, the ql:CSV, ql:XPath and ql:JSONPath Reference Formulations are predefined.

Iterator. While in R2RML it is already known that a per-row iteration occurs, in RML, which remains generic, the iteration pattern, if any, cannot always be implicitly assumed and needs to be determined. Therefore, the iterator (rml:iterator) is introduced. The iterator determines the iteration pattern over the input source and specifies the extract of the data mapped during each iteration. For example, "$.[*]" determines an iteration over a JSON file that occurs over the object's outer level. The iterator is not required in the case of tabular sources, as the default per-row iteration is implied, or if there is no need to iterate over the input data.

Logical Reference. A column-valued term map, according to R2RML, is defined using the property rr:column, which determines a column's name. In the case of RML, a more generic property is introduced: rml:reference. Its value must be a valid reference to the data of the input dataset. Therefore, the reference's value should be a valid expression according to the Reference Formulation defined at the Logical Source, as should the string template used in the definition of a template-valued term map and the iterator's value. For instance, the iterator, the subject's template-valued term map and the object's reference-valued term map are all valid JSONPath expressions.

Listing 1: performances.json and exhibitions.xml

    { ...
      "Performance":
        { "Perf_ID": "567",
          "Venue": { "Name": "STAM",
                     "Venue_ID": "78" },
          "Location": { "long": "3.717222",
                        "lat": "51.043611" } },
      ... }

    <Events>
      ...
      <Exhibition id="398">
        <Venue>STAM</Venue>
        <lat>51.043611</lat>
        <long>3.717222</long>
      </Exhibition>
      ...
    </Events>

Listing 2: The expected output.

    ex:567 ex:venue ex:78 ;
           ex:location ex:51.043611,3.717222 .
    ex:398 ex:venue ex:78 ;
           ex:location ex:51.043611,3.717222 .
    ex:51.043611,3.717222 ex:lat ex:51.043611 ;
           ex:long ex:3.717222 .

Listing 3: An RML mapping definition.

    <#PerformancesMapping>
      rml:logicalSource [
        rml:source "http://ex.com/performances.json";
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$.Performance.[*]" ];
      rr:subjectMap [ rr:template "http://ex.com/{Perf_ID}" ];
      rr:predicateObjectMap [ rr:predicate ex:venue;
        rr:objectMap [ rr:parentTriplesMap <#VenueMapping> ] ];
      rr:predicateObjectMap [ rr:predicate ex:location;
        rr:objectMap [ rr:parentTriplesMap <#LocationMapping> ] ].

    <#VenueMapping>
      rml:logicalSource [
        rml:source "http://ex.com/performances.json";
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$.Performance.Venue.[*]" ];
      rr:subjectMap [ rr:template "http://ex.com/{Venue_ID}" ].

    <#LocationMapping>
      rml:logicalSource [ ......... ];
      rr:subjectMap [ rr:template "http://ex.com/{lat},{long}" ];
      rr:predicateObjectMap [ rr:predicate ex:long;
        rr:objectMap [ rml:reference "long" ] ];
      rr:predicateObjectMap [ rr:predicate ex:lat;
        rr:objectMap [ rml:reference "lat" ] ] .

    <#ExhibitionMapping>
      rml:logicalSource [
        rml:source "http://ex.com/exhibitions.xml";
        rml:referenceFormulation ql:XPath;
        rml:iterator "/Events/Exhibition" ];
      rr:subjectMap [ rr:template "http://ex.com/{@id}" ];
      rr:predicateObjectMap [ rr:predicate ex:location;
        rr:objectMap [ rr:parentTriplesMap <#LocationMapping> ] ];
      rr:predicateObjectMap [ rr:predicate ex:venue;
        rr:objectMap [ rr:parentTriplesMap <#VenueMapping>;
          rr:joinCondition [
            rr:child "$.Performance.Venue.Name";
            rr:parent "/Events/Exhibition/Venue" ] ] ] .
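The cross-format join expressed by the rr:joinCondition in Listing 3 can be emulated with plain standard-library parsing. The following Python sketch is our own approximation—a real processor would evaluate the JSONPath child and XPath parent expressions with proper engines, and the data fragments are abbreviated from Listing 1:

```python
import json
import xml.etree.ElementTree as ET

# Child side of the join: "$.Performance.Venue.Name" over the JSON source.
perf = json.loads(
    '{"Performance": {"Venue": {"Name": "STAM", "Venue_ID": "78"}}}')
child_value = perf["Performance"]["Venue"]["Name"]

# Parent side of the join: "/Events/Exhibition/Venue" over the XML source.
events = ET.fromstring(
    '<Events><Exhibition id="398"><Venue>STAM</Venue></Exhibition></Events>')

# Join: exhibitions whose Venue element matches the performance's venue name.
matches = [ex.get("id") for ex in events.findall("Exhibition")
           if ex.findtext("Venue") == child_value]
# A matching exhibition can now reuse the venue resource generated from the
# JSON source instead of minting a duplicate URI.
```

Because each side of the join is resolved with the Reference Formulation of its own Logical Source, the two expressions may legitimately belong to different languages, as the Referencing Object Map section below explains.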
Referencing Object Map. The last aspect of R2RML that is extended in RML is the Referencing Object Map. The join condition's child reference (rr:child) indicates the reference to the data value (using an rml:reference) of the Logical Source that contains the Referencing Object Map; this reference is specified using the Reference Formulation defined at the current Logical Source. The join condition's parent reference (rr:parent) indicates the reference to the data extract (rml:reference) of the Referencing Object Map's Parent Triples Map; this reference is specified using the Reference Formulation defined at the Parent Triples Map's Logical Source definition. Therefore, the child reference and the parent reference of a join condition may be defined using different Reference Formulations, if the Triples Maps refer to sources of different formats.

5. RML PROCESSING
RML is highly extensible towards new source formats, allowing different levels of support. On the processing level, this adds some complexity, as it demands that the processor be scalable to support different input sources in a uniform way. To deal with these caveats, RML relies on expressions in a target expression language relevant to the source format to refer to the values of the sources, while it uses the RML syntax for the rest of the mapping definition. This target expression language needs to be tied to its format and should act as a point of reference to the values in a source.

Expressions can be located wherever values need to be extracted from the source (Term Maps and rml:iterator) and have to be valid according to the formulation specified in the Triples Map (rml:referenceFormulation). In order to deal with these embedded expressions, an RML processor is required to have a modular architecture where the extraction and mapping modules are executed independently of each other. When the RML mappings are processed, the mapping module deals with the mappings' execution as defined in the mapping document in RML syntax, while the extraction module deals with the target language's expressions.

Mapping Models
An RML processor can be implemented using two alternative models—mapping-driven or data-driven—or in a hybrid fashion following any combination of the two that makes the processor perform better.

Mapping-driven. In this model, the processing is driven by the mapping module. The processor processes each Triples Map in consecutive order. Based on the defined expression language, each Triples Map is delegated to a language-specific sub-extractor. For each Triples Map, its delegated sub-extractor iterates over the source data as the Triples Map's Iterator specifies. For each iteration, the mapping module requests an extract of data from the extraction module. The defined Subject Map and Predicate-Object Maps are applied and the corresponding triples are generated. The execution of dependent Triples Maps, because of joins, is triggered by the Parent Triples Map, and a nested mapping process occurs.

Data-driven. In this model, the processing is driven by the extraction module, namely the data sources. The processor extracts beforehand the iteration patterns, if any, from the Triples Maps. Each defined dataset is integrated by its language-specific sub-extractor. Based on the defined expression language and the iterator, each Triples Map is delegated to a specific sub-mapper. For each iteration, a data extract is passed to the processor, which, in turn, delegates the extract of data to the corresponding sub-mapper. The defined Subject Map and Predicate-Object Maps are applied and the corresponding triples are generated. The execution of dependent Triples Maps, because of joins, is triggered by the Parent Triples Map, and a nested mapping-driven process occurs.

The efficiency of the processor can be increased by scheduling the execution of the present expressions in an intelligent way. The mapping-driven model allows the most straightforward implementation, since Triples Maps are processed independently from each other. However, because of this, avoiding multiple passes over the same dataset is difficult. With execution planning, the number of file passes can be reduced to the bare minimum, but cannot be one for all cases. The data-driven model does not have this problem, since one element of a single dataset can activate all related mappings. The execution planning does become more complex, though, since all dependencies have to be resolved beforehand. Note that we deliberately ignore storing files into memory, which would solve the multiple passes for the mapping-driven approach. We only consider a streaming solution, since RML can be used to process datasets too big for the processor's memory. We accept a longer mapping time in trade for lower memory usage. A side-effect of a streaming approach is the inability to support some features of expression languages. For instance, XPath has look-ahead functionality that requires access to data which is not yet known. Thus, we can only support a subset. Nevertheless, in practice, most expressions only require functionality within this subset.

We created a prototype RML processor implementation in Java, based on the mapping-driven model, which is available at https://github.com/mmlab/RMLProcessor.

6. CONCLUSIONS AND FUTURE WORK
In this paper, we presented a novel approach for mapping heterogeneous sources into RDF using RML, an easily extendable mapping language that significantly reduces the effort for integrated mapping of heterogeneous resources. Our proposed solution overcomes the limitations outlined in Section 3.1 by addressing the factors presented in Section 3.2 that improve the datasets' integrity and their resources' interlinking, and it incorporates the data publisher's URI policy in a well-considered mapping policy. The per-format and per-file mapping models followed so far are surpassed, enabling data integration and interlinking at a primary stage. The language's extensibility is self-evident, as the whole solution relies on the extension of the R2RML mapping language and arose in a progressive way: it was initially defined to accommodate mappings from the XML format to the RDF data model and was later reused as such for mappings of data appearing in JSON.

In the future, a thorough evaluation of RML's efficiency and effectiveness will be performed. Furthermore, RML can be extended to support views on sources, built by queries; this captures, to an extent, the issue of data cleaning and transformation, enhancing its applicability. Next, the efficiency of RML processing can be improved. A possible optimization is the use of execution plans that efficiently arrange the execution order of the Triples Maps depending on their dependencies. Finally, RML could be used to specify the triples' provenance, by taking advantage of the RDF nature of the mapping documents.

7. REFERENCES
[1] S. Bischof, S. Decker, T. Krennwallner, N. Lopes, and A. Polleres. Mapping between RDF and XML with XSPARQL. Journal on Data Semantics, 1(3):147–185, 2012.
[2] A. Dimou, M. Vander Sande, P. Colpaert, E. Mannens, and R. Van de Walle. Extending R2RML to a Source-independent Mapping Language for RDF. In International Semantic Web Conference (Posters and Demos), 2013.
[3] M. Hert, G. Reif, and H. C. Gall. A comparison of RDB-to-RDF mapping languages. In Proceedings of the 7th International Conference on Semantic Systems, I-Semantics '11, pages 25–32. ACM, 2011.
[4] C. Lange. Krextor – an extensible framework for contributing content math to the Web of Data. In Proceedings of the 18th Calculemus and 10th International Conference on Intelligent Computer Mathematics, MKM'11, pages 304–306. Springer-Verlag, 2011.
[5] A. Langegger and W. Wöß. XLWrap – Querying and Integrating Arbitrary Spreadsheets with SPARQL. In Proceedings of the 8th International Semantic Web Conference, ISWC '09, pages 359–374. Springer-Verlag, 2009.
[6] M. J. O'Connor, C. Halaschek-Wiener, and M. A. Musen. Mapping Master: a flexible approach for mapping spreadsheets to OWL. In Proceedings of the 9th International Semantic Web Conference, ISWC '10, pages 194–208. Springer-Verlag, 2010.
[7] F. Scharffe, G. Atemezing, R. Troncy, F. Gandon, S. Villata, B. Bucher, F. Hamdi, L. Bihanic, G. Képéklian, F. Cotton, J. Euzenat, Z. Fan, P.-Y. Vandenbussche, and B. Vatant. Enabling Linked Data publication with the Datalift platform. In Proc. AAAI Workshop on Semantic Cities, 2012.