RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data

Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, Rik Van de Walle
{anastasia.dimou, miel.vandersande, pieter.colpaert, ruben.verborgh, erik.mannens, rik.vandewalle}@ugent.be
Ghent University – iMinds – Multimedia Lab, Ghent, Belgium

Copyright is held by the author/owner(s). LDOW2014, April 8, 2014, Seoul, Korea.

ABSTRACT
Despite the significant number of existing tools, incorporating data from multiple sources and different formats into the Linked Open Data cloud remains complicated. No mapping formalisation exists to define how to map such heterogeneous sources into RDF in an integrated and interoperable fashion. This paper introduces the RML mapping language, a generic language based on an extension over R2RML, the W3C standard for mapping relational databases into RDF. Broadening R2RML's scope, the language becomes source-agnostic and extensible, while facilitating the definition of mappings of multiple heterogeneous sources. This leads to higher integrity within datasets and richer interlinking among resources.

1. INTRODUCTION
Deploying the five stars of the Linked Open Data schema (http://5stardata.info/) is the de-facto way of mapping data. In real-world situations, multiple sources of different formats are part of multiple domains, which in their turn are formed by multiple sources and the relations between them. Approaching the stars as a set of consecutive steps and applying them to a single source every time—as most solutions tend to do—is not always optimal. When mapping heterogeneous data into RDF, such approaches often fail to reach the final goal of publishing interlinked data. The semantic representation of each mapped resource is defined independently, disregarding its possible prior definitions and its links to other resources. Manual alignment to their prior appearances is performed by redefining their semantic representations, while links to other resources are defined after the data are mapped and published. Nonetheless, as datasets are often shaped gradually, a demand emerges for a well-considered policy regarding mapping and primary interlinking of data in the context of a certain knowledge domain.

For instance, governments publish their data as Open Data and turn them into Linked Open Data afterwards. Much of this data, as expected when dealing with many sources, complements each other in the description of different knowledge domains. Therefore, the same concepts appear in multiple datasets—problematically, often with different identifiers or even in different formats. Furthermore, data is mapped progressively, thus it is important that data publishers incorporate their data in what is already published. Reusing the same unique identifiers for concepts is necessary to achieve this, but it is only possible if prior existing definitions in the same dataset are discovered and if they can be replicated. Otherwise, duplicates will inevitably appear—even within a publisher's own datasets. Identifying, replicating, and keeping those definitions aligned is complicated, and the situation worsens the more data is mapped and published.

Solving this problem requires a uniform, modular, interoperable and extensible technology that supports this need for gradually growing datasets. Such a solution can deal with the mapping and primary interlinking of the data, which should take place in a tightly coordinated way instead of as two separate, consecutive actions. This ensures semantic representations of higher quality and datasets with better integrity. To this end, we propose RML, a generic mapping language defined as an extension of R2RML (http://www.w3.org/TR/r2rml), the W3C recommendation for mapping data in relational databases into RDF.

The remainder of the paper is organized as follows: Section 2 discusses related solutions existing today. Section 3 analyzes the requirements of a mapping language, and Section 4 introduces the proposed approach. Next, Section 5 addresses the challenges of implementing an RML processor. Finally, Section 6 outlines our conclusions and future work.

2. RELATED WORK
Several solutions exist to execute mappings from different file structures and serialisations to RDF. For relational databases, different mapping languages beyond R2RML have been defined [3] and several implementations already exist (http://www.w3.org/2001/sw/rdb2rdf/wiki/Implementations). Similarly, mapping languages were defined to support conversion from data in CSV files and spreadsheets to the RDF data model. They include XLWrap's mapping language [5] that converts data in various spreadsheets to RDF, the declarative OWL-centric Mapping Master language M2 [6] that converts data from spreadsheets into the Web Ontology Language (OWL), Tarql (https://github.com/cygri/tarql) that follows a querying approach, and Vertere (https://github.com/knudmoeller/Vertere-RDF). The main drawback of most CSV/spreadsheet-to-RDF mapping solutions is the assumption that each row describes an entity (the entity-per-row assumption) and that each column represents a property.
A larger variety of solutions exist to map from XML to RDF but, to the best of our knowledge, no specific languages were defined for this, apart from GRDDL (http://www.w3.org/TR/grddl/), which essentially provides the links to the algorithms (typically represented in XSLT) that map the data to RDF. Instead, tools mostly rely on existing XML solutions, such as XSLT (e.g., Krextor [4] and AstroGrid-D, http://www.gac-grid.de/project-products/Software/XML2RDF.html), XPath (e.g., Tripliser, http://daverog.github.io/tripliser/), and XQuery (e.g., XSPARQL [1]). In general, most existing tools deploy mappings from a certain source format to RDF (per-source approaches). Few tools provide mappings from different source formats to RDF, and those tools actually employ separate source-centric approaches for each of the formats they support. Datalift [7], The DataTank (http://thedatatank.com), OpenRefine (http://openrefine.org/), RDFizers (http://simile.mit.edu/wiki/RDFizers) and Virtuoso Sponger (http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtSponger) are the most well-known.

3. MAPPINGS METHODOLOGY
After outlining the limitations of existing solutions, we present the factors that can improve the mappings to produce better integrated datasets and early interlinked resources.

3.1 Limitations of current mapping methods
We identified the following limitations that prevent current practices from achieving well-integrated datasets.

Mapping of data on a per-source basis. Most of the current solutions work on a per-source basis: only one source is mapped at a time, as opposed to mapping different related sources together, despite covering the same domains or sharing the same formats. As a result, data publishers can only generate resources and links between data appearing within a single source. Their mapping definitions need to be aligned manually when the same resources already appear in the target dataset. Thus, data publishers need to redefine and replicate the patterns for the resources' URI definitions every time they appear in a new mapping rule. Furthermore, this is not always possible, as the data included in one source may not be sufficient to replicate the same URIs. This results in distinct URIs for identical resources, which leads to duplicates within a publisher's own dataset. In addition, the interlinking of the resources generated from different sources has to be performed afterwards.

Mapping data on a per-format basis. Besides the per-source approach, most of the current solutions follow a per-format approach: only mappings from a certain source format (e.g., XML) are supported. In practice, data publishers need to map various source formats to RDF. Therefore, they need to install, learn, use and maintain different tools for each case separately, which hampers their effort to ensure the integrity of their datasets even more. Alternatively, some end up implementing their own case-specific solutions.

Mapping definitions' reusability. The mapping definitions of current solutions are not reusable, as there is no standard formalisation for any source format apart from relational databases, i.e., R2RML. In most cases, the mapping rules are not interoperable as they are tied to the implementation, which prevents their extraction and reuse across different implementations. Moreover, this prohibits reuse of the same mapping rules to map data that describe the same model but are serialized in different initial formats.

3.2 Requirements for generic mappings
To achieve datasets with better integrated and richer interlinked resources, the aforementioned issues should be addressed during the mapping phase, rather than later. A set of factors that contribute to this are outlined below.

Uniform and interoperable mapping definitions. Since we require a uniform way of dealing with different source serializations, the mapping definitions should be defined independently of the references to the input data. The same mappings may then be reused across different sources—as long as they capture the same context (i.e., the same RDF representations)—only by changing the reference to the input source that holds the information. For example, a performance described in a JSON file and an exhibition described in an XML file may take place at the same location, indicated by an identical longitude/latitude pair. We only need a single mapping definition to describe their location, adjusted to point to the JSON objects and the XML elements, respectively, that hold the corresponding values. Therefore, we require a modular language in which the references to the data extracts and the mapping definitions are distinct and not interdependent. Thereby, the mapping definitions can be reused across different implementations for different source formats, reducing the implementation and learning costs.

Robust cross-references and interlinking. Redefining and replicating patterns every time a new input source is integrated should be avoided. Publishers should be able to uniquely define the pattern that generates a resource and refer to its definition every other time this resource is mapped (and in this way enriched), which has the following three advantages. First, possible modifications to the patterns, or to data values appearing in the patterns that generate the URIs, are propagated to every other reference of the resource, making the interlinking more robust. Second, taking advantage of this integrated solution, cross-references among sources become possible; links between resources in different input sources are defined already on the mapping level. Third, and most significant, when data publishers want to map a new source, their new mappings are defined taking advantage of, and automatically aligning to, the existing ones.
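The benefit of defining a URI pattern once and referring to it from every source can be made concrete with a short sketch. The following Python fragment is our own illustration, not part of RML; the field names and base URI are hypothetical:

```python
# Illustrative sketch: one URI pattern shared by two heterogeneous sources.
# The pattern is defined in a single place; each source only changes the lookup.
def venue_uri(name: str) -> str:
    """The uniquely defined pattern that generates a venue resource URI."""
    return "http://ex.com/venue/" + name.replace(" ", "_")

# A record as it might appear in a JSON-based source (hypothetical structure).
performance = {"Venue": {"Name": "STAM"}}
# The same venue as it might appear after parsing an XML-based source.
exhibition = {"venue_name": "STAM"}

uri_from_json = venue_uri(performance["Venue"]["Name"])
uri_from_xml = venue_uri(exhibition["venue_name"])

# Both sources yield the same identifier, so their triples interlink
# instead of producing duplicate resources for the same venue.
assert uri_from_json == uri_from_xml
```

A change to `venue_uri` would propagate to every source that refers to it, which is exactly the robustness argument made above.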
Extending the aforementioned example, the venue where the performance and the exhibition take place is the same. When the input source for the performances was mapped, the mappings for the possible venues were defined considering certain identifiers to define their URIs. Once the exhibitions are about to be mapped, the data publisher might not be able to reuse the existing mapping definition for the venues, as the identifiers are not included in the dataset to replicate the same patterns. However, the venue name might be considered to determine the binding. Then, the existing mapping definition can be referred to in order to generate the same URIs and thus enrich the existing resource with new attributes and interlink data from the newly mapped dataset to the existing one. As the original input source is an Open Data set that can be referenced, it is always available to support the mapping of the new data. Summarizing, the definition of the links between resources in different sources—even if they are in different formats—happens on the mapping level instead of during a subsequent interlinking step.

Scalable mapping language. As the references to the data extracts and the mapping definitions are distinct and not interdependent, the pointer to the input source's data can be adjusted to each case. Such a modular solution leads to correspondingly modular implementations that perform the mappings in a uniform way, independent of the input source. They only adjust the respective extraction mechanism depending on the input source. Case-specific solutions exist because complete generic solutions fail, as it is impossible to predict every potential input. A scalable solution addresses what can be defined in a generic way for all possible different input sources and scales over what cannot. In order to support emerging needs, it should allow extensions with source-specific references, addressed on a case-specific level.

4. RML MAPPING LANGUAGE
The RDF Mapping Language (RML) is a generic mapping language defined to express customized mapping rules from heterogeneous data structures and serializations to the RDF data model. RML is defined as a superset of the W3C-standardized mapping language R2RML, aiming to extend its applicability and broaden its scope.

4.1 R2RML
R2RML is defined to express customized mappings only from data in relational databases to datasets represented using the RDF data model. In R2RML, the mapping to the RDF data model is based on one or more Triples Maps and occurs over a Logical Table, iterating on a per-row basis. A Triples Map consists of three main parts: the Logical Table (rr:LogicalTable), the Subject Map and zero or more Predicate-Object Maps. The Subject Map (rr:SubjectMap) defines the rule that generates unique identifiers (URIs) for the resources which are mapped, and is used as the subject of all the RDF triples that are generated from this Triples Map. A Predicate-Object Map consists of Predicate Maps, which define the rule that generates the triple's predicate, and Object Maps or Referencing Object Maps, which define the rule that generates the triple's object. The Subject Map, the Predicate Map and the Object Map are Term Maps, namely rules that generate an RDF term (an IRI, a blank node or a literal). A Term Map can be a constant-valued term map (rr:constant) that always generates the same RDF term, a column-valued term map (rr:column) whose value is the data value of a referenced column in a given Logical Table row, or a template-valued term map (rr:template) that is a valid string template that can contain referenced columns.

Furthermore, R2RML supports cross-references between Triples Maps, when the subject of a Triples Map is the same as the object generated by a Predicate-Object Map. A Referencing Object Map (rr:RefObjectMap) is then used to point to the Triples Map that generates, in its Subject Map, the corresponding resource—the so-called Referencing Object Map's Parent Triples Map. If the Triples Maps refer to different Logical Tables, a join between the Logical Tables is required. The join condition (rr:joinCondition) performs the join exactly as a join is executed in SQL. The join condition consists of a reference to a column name that exists in the Logical Table of the Triples Map that contains the Referencing Object Map (rr:child) and a reference to a column name that exists in the Logical Table of the Referencing Object Map's Parent Triples Map (rr:parent).

4.2 RML
RML keeps the mapping definitions as in R2RML but excludes its database-specific references from the core model. The potentially broad concepts of R2RML, which were explained previously [2], are formally designated in the frame of the RML mapping language and are elaborated upon here. The primary difference is the potential input, which is limited to a certain database in the case of R2RML, while it can be a broad set of (one or more) input sources in the case of RML. Table 1 summarizes RML's extensions over R2RML entailed by this broader set of possible input sources. RML provides a generic way of defining the mappings that is easily transferable to cover references to other data structures, combined with case-specific extensions, but it always remains backward compatible with R2RML, as relational databases form such a specific case. RML considers that the mappings to RDF of sets of sources that all together describe a certain domain can be defined in a combined and uniform way, while the mapping definitions may be reused across different sources that describe the same domain to incrementally form well-integrated datasets, as displayed in Figure 1.

An RML mapping definition follows the same syntax as R2RML. The RML vocabulary namespace is http://semweb.mmlab.be/ns/rml# and the preferred prefix is rml. More details about the RML mapping language can be found at http://rml.io. Defining and executing a mapping with RML requires the user to provide a valid and well-formatted input dataset to be mapped, and the mapping definition (mapping document) according to which the mapping will be executed to generate the data's representation using the RDF data model (output dataset). Data cleansing is out of the scope of the language's definition and, if necessary, should be performed in advance. An extract of two heterogeneous input sources is displayed in Listing 1, an example of a corresponding mapping definition is displayed in Listing 3, and the produced output in Listing 2.
Logical Source. A Logical Source (rml:LogicalSource) extends R2RML's Logical Table and is used to determine the input source with the data to be mapped. The R2RML Logical Table definition determines a database's table, using the Table Name (rr:tableName). In the case of RML, a broader reference to any input source is required. Thus, the Logical Source and the source (rml:source) are introduced, respectively, to specify the input.

Table 1: R2RML vs RML.

                        R2RML                 RML
    Input Reference     Table Name            Source
    Value Reference     Column                Reference
    Iteration model     per row (implicit)    defined
    Source Expression   SQL (implicit)        Reference Formulation

Figure 1: Mapping sources without and with RML. [figure omitted]

Reference Formulation. RML needs to deal with different data serialisations, which use different ways to refer to their elements/objects. But, as RML aims to be generic, no uniform way of referring to the data's elements/objects is defined. R2RML uses column names for this purpose. In the same context, RML considers that any reference to the Logical Source should be defined in a form relevant to the input data, e.g., XPath for XML files or JSONPath for JSON files. To this end, the Reference Formulation (rml:referenceFormulation) declaration is introduced, indicating the formulation (for instance, a standard or a query language) used to refer to its data. In the current version of RML, the ql:CSV, ql:XPath and ql:JSONPath Reference Formulations are predefined.

Iterator. While in R2RML it is already known that a per-row iteration occurs, in RML, which remains generic, the iteration pattern, if any, cannot always be implicitly assumed and needs to be determined. Therefore, the iterator (rml:iterator) is introduced. The iterator determines the iteration pattern over the input source and specifies the extract of the data mapped during each iteration. For example, "$.[*]" determines an iteration over a JSON file that occurs over the object's outer level. The iterator is not required in the case of tabular sources, as the default per-row iteration is implied, or if there is no need to iterate over the input data.

Logical Reference. A column-valued term map, according to R2RML, is defined using the property rr:column, which determines a column's name. In the case of RML, a more generic property is introduced: rml:reference. Its value must be a valid reference to the data of the input dataset. Therefore, the reference's value should be a valid expression according to the Reference Formulation defined at the Logical Source, as should the string template used in the definition of a template-valued term map and the iterator's value. For instance, the iterator, the subject's template-valued term map and the object's reference-valued term map are all valid JSONPath expressions.

Listing 1: performances.json and exhibitions.xml

    { ...
      "Performance":
        { "Perf_ID": "567",
          "Venue": { "Name": "STAM",
                     "Venue_ID": "78" },
          "Location": { "long": "3.717222",
                        "lat": "51.043611" } },
      ... }

    <Events>
      ...
      <Exhibition id="398">
        <Venue>STAM</Venue>
        <lat>51.043611</lat>
        <long>3.717222</long>
      </Exhibition>
      ...
    </Events>

Listing 2: The expected output.

    ex:567 ex:venue ex:78 ;
           ex:location ex:51.043611,3.717222 .
    ex:398 ex:venue ex:78 ;
           ex:location ex:51.043611,3.717222 .
    ex:51.043611,3.717222 ex:lat ex:51.043611 ;
           ex:long ex:3.717222 .

Listing 3: An RML mapping definition.

    <#PerformancesMapping>
      rml:logicalSource [
        rml:source "http://ex.com/performances.json";
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$.Performance.[*]" ];
      rr:subjectMap [ rr:template "http://ex.com/{Perf_ID}" ];
      rr:predicateObjectMap [ rr:predicate ex:venue;
        rr:objectMap [ rr:parentTriplesMap <#VenueMapping> ] ];
      rr:predicateObjectMap [ rr:predicate ex:location;
        rr:objectMap [ rr:parentTriplesMap <#LocationMapping> ] ].

    <#VenueMapping>
      rml:logicalSource [
        rml:source "http://ex.com/performances.json";
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$.Performance.Venue.[*]" ];
      rr:subjectMap [ rr:template "http://ex.com/{Venue_ID}" ].

    <#LocationMapping>
      rml:logicalSource [ ......... ];
      rr:subjectMap [ rr:template "http://ex.com/{lat},{long}" ];
      rr:predicateObjectMap [ rr:predicate ex:long;
        rr:objectMap [ rml:reference "long" ] ];
      rr:predicateObjectMap [ rr:predicate ex:lat;
        rr:objectMap [ rml:reference "lat" ] ] .

    <#ExhibitionMapping>
      rml:logicalSource [
        rml:source "http://ex.com/exhibitions.xml";
        rml:referenceFormulation ql:XPath;
        rml:iterator "/Events/Exhibition" ];
      rr:subjectMap [ rr:template "http://ex.com/{@id}" ];
      rr:predicateObjectMap [ rr:predicate ex:location;
        rr:objectMap [ rr:parentTriplesMap <#LocationMapping> ] ];
      rr:predicateObjectMap [ rr:predicate ex:venue;
        rr:objectMap [ rr:parentTriplesMap <#VenueMapping>;
          rr:joinCondition [
            rr:child "$.Performance.Venue.Name";
            rr:parent "/Events/Exhibition/Venue" ] ] ] .
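The cross-format join expressed by the rr:joinCondition in Listing 3 can be emulated with plain standard-library parsing. The following Python sketch is our own approximation—a real processor would evaluate the JSONPath child and XPath parent expressions with proper engines, and the data fragments are abbreviated from Listing 1:

```python
import json
import xml.etree.ElementTree as ET

# Child side of the join: "$.Performance.Venue.Name" over the JSON source.
perf = json.loads(
    '{"Performance": {"Venue": {"Name": "STAM", "Venue_ID": "78"}}}')
child_value = perf["Performance"]["Venue"]["Name"]

# Parent side of the join: "/Events/Exhibition/Venue" over the XML source.
events = ET.fromstring(
    '<Events><Exhibition id="398"><Venue>STAM</Venue></Exhibition></Events>')

# Join: exhibitions whose Venue element matches the performance's venue name.
matches = [ex.get("id") for ex in events.findall("Exhibition")
           if ex.findtext("Venue") == child_value]
# A matching exhibition can now reuse the venue resource generated from the
# JSON source instead of minting a duplicate URI.
```

Because each side of the join is resolved with the Reference Formulation of its own Logical Source, the two expressions may legitimately belong to different languages, as the Referencing Object Map section below explains.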
Referencing Object Map. The last aspect of R2RML that is extended in RML is the Referencing Object Map. The join condition's child reference (rr:child) indicates the reference to the data value (using an rml:reference) of the Logical Source that contains the Referencing Object Map; this reference is specified using the Reference Formulation defined at the current Logical Source. The join condition's parent reference (rr:parent) indicates the reference to the data extract (rml:reference) of the Referencing Object Map's Parent Triples Map; this reference is specified using the Reference Formulation defined at the Parent Triples Map's Logical Source definition. Therefore, the child reference and the parent reference of a join condition may be defined using different Reference Formulations, if the Triples Maps refer to sources of different formats.

5. RML PROCESSING
RML is highly extensible towards new source formats, allowing different levels of support. On the processing level, this adds some complexity, as it demands that the processor be scalable to support different input sources in a uniform way. To deal with these caveats, RML relies on expressions in a target expression language relevant to the source format to refer to the values of the sources, while it uses the RML syntax for the rest of the mapping definition. This target expression language needs to be tied to its format and should act as a point of reference to the values in a source.

Expressions can be located wherever values need to be extracted from the source (Term Maps and rml:iterator) and have to be valid according to the formulation specified in the Triples Map (rml:referenceFormulation). In order to deal with these embedded expressions, an RML processor is required to have a modular architecture where the extraction and mapping modules are executed independently of each other. When the RML mappings are processed, the mapping module deals with the mappings' execution as defined in the mapping document in RML syntax, while the extraction module deals with the target language's expressions.

Mapping Models
An RML processor can be implemented using two alternative models—mapping-driven or data-driven—or in a hybrid fashion following any combination of the two that makes the processor perform better.

Mapping-driven. In this model, the processing is driven by the mapping module. The processor processes each Triples Map in consecutive order. Based on the defined expression language, each Triples Map is delegated to a language-specific sub-extractor. For each Triples Map, its delegated sub-extractor iterates over the source data as the Triples Map's Iterator specifies. For each iteration, the mapping module requests an extract of data from the extraction module. The defined Subject Map and Predicate-Object Maps are applied and the corresponding triples are generated. The execution of dependent Triples Maps, because of joins, is triggered by the Parent Triples Map, and a nested mapping process occurs.

Data-driven. In this model, the processing is driven by the extraction module, namely the data sources. The processor extracts beforehand the iteration patterns, if any, from the Triples Maps. Each defined dataset is integrated by its language-specific sub-extractor. Based on the defined expression language and the iterator, each Triples Map is delegated to a specific sub-mapper. For each iteration, a data extract is passed to the processor, which, in turn, delegates the extract of data to the corresponding sub-mapper. The defined Subject Map and Predicate-Object Maps are applied and the corresponding triples are generated. The execution of dependent Triples Maps, because of joins, is triggered by the Parent Triples Map, and a nested mapping-driven process occurs.

The efficiency of the processor can be increased by scheduling the execution of the present expressions in an intelligent way. The mapping-driven model allows the most straightforward implementation, since Triples Maps are processed independently from each other. However, because of this, avoiding multiple passes over the same dataset is difficult. With execution planning, the number of file passes can be reduced to the bare minimum, but cannot be one for all cases. The data-driven model does not have this problem, since one element of a single dataset can activate all related mappings. The execution planning does become more complex, though, since all dependencies have to be resolved beforehand. Note that we deliberately ignore storing files into memory, which would solve the multiple passes for the mapping-driven approach. We only consider a streaming solution, since RML can be used to process datasets too big for the processor's memory. We accept a longer mapping time in trade for lower memory usage. A side-effect of a streaming approach is the inability to support some features of expression languages. For instance, XPath has look-ahead functionality that requires access to data which is not yet known. Thus, we can only support a subset. Nevertheless, in practice, most expressions only require functionality within this subset.

We created a prototype RML processor implementation in Java, based on the mapping-driven model, which is available at https://github.com/mmlab/RMLProcessor.

6. CONCLUSIONS AND FUTURE WORK
In this paper, we presented a novel approach for mapping heterogeneous sources into RDF using RML, an easily extendable mapping language that significantly reduces the effort for integrated mapping of heterogeneous resources. Our proposed solution overcomes the limitations outlined in Section 3.1 by addressing the factors presented in Section 3.2 that improve the datasets' integrity and their resources' interlinking, and it incorporates the data publisher's URI policy in a well-considered mapping policy. The per-format and per-file mapping models followed so far are surpassed, enabling data integration and interlinking at a primary stage. The language's extensibility is self-evident, as the whole solution relies on the extension of the R2RML mapping language and arose in a progressive way: it was initially defined to accommodate mappings from the XML format to the RDF data model and was later reused as such for mappings of data appearing in JSON.

In the future, a thorough evaluation of RML's efficiency and effectiveness will be performed. Furthermore, RML can be extended to support views on sources, built by queries; this captures, to an extent, the issue of data cleaning and transformation, enhancing its applicability. Next, the efficiency of RML processing can be improved. A possible optimization is the use of execution plans that efficiently arrange the execution order of the Triples Maps depending on their dependencies. Finally, RML could be used to specify the triples' provenance, by taking advantage of the RDF nature of the mapping documents.

7. REFERENCES
[1] S. Bischof, S. Decker, T. Krennwallner, N. Lopes, and A. Polleres. Mapping between RDF and XML with XSPARQL. Journal on Data Semantics, 1(3):147–185, 2012.
[2] A. Dimou, M. Vander Sande, P. Colpaert, E. Mannens, and R. Van de Walle. Extending R2RML to a Source-independent Mapping Language for RDF. In International Semantic Web Conference (Posters and Demos), 2013.
[3] M. Hert, G. Reif, and H. C. Gall. A comparison of RDB-to-RDF mapping languages. In Proceedings of the 7th International Conference on Semantic Systems, I-Semantics '11, pages 25–32. ACM, 2011.
[4] C. Lange. Krextor – an extensible framework for contributing content math to the Web of Data. In Proceedings of the 18th Calculemus and 10th International Conference on Intelligent Computer Mathematics, MKM'11, pages 304–306. Springer-Verlag, 2011.
[5] A. Langegger and W. Wöß. XLWrap – Querying and Integrating Arbitrary Spreadsheets with SPARQL. In Proceedings of the 8th International Semantic Web Conference, ISWC '09, pages 359–374. Springer-Verlag, 2009.
[6] M. J. O'Connor, C. Halaschek-Wiener, and M. A. Musen. Mapping Master: a flexible approach for mapping spreadsheets to OWL. In Proceedings of the 9th International Semantic Web Conference, ISWC '10, pages 194–208. Springer-Verlag, 2010.
[7] F. Scharffe, G. Atemezing, R. Troncy, F. Gandon, S. Villata, B. Bucher, F. Hamdi, L. Bihanic, G. Képéklian, F. Cotton, J. Euzenat, Z. Fan, P.-Y. Vandenbussche, and B. Vatant. Enabling Linked Data publication with the Datalift platform. In Proc. AAAI Workshop on Semantic Cities, 2012.