Why to Tie to a Single Data Mapping Language? Enabling a Transformation from ShExML to RML Herminio García-González1,∗ , Anastasia Dimou2 1 Kazerne Dossin: Memorial, Museum and Research Centre on Holocaust and Human Rights, Mechelen, Belgium 2 KU Leuven, Department of Computer Science, Sint-Katelijne-Waver, Belgium Abstract Different mapping languages (e.g., RML, SPARQL-Generate and ShExML) have appeared in the last years covering different use cases, scenarios and functionalities. However, users cannot seamlessly interchange between these mapping languages. In this paper, we propose a translation from ShExML to RML letting users benefit from the usability of ShExML and the wide-support and functionalities of RML. Keywords declarative mapping rules, translation, data mapping languages, ShExML, RML 1. Introduction Declarative mapping languages have been increasingly adopted during the last years in different fields (e.g., DBpedia [1], digital heritage [2] or the railway domain [3])1 . These mapping languages can be used to specify declarative mapping rules which allow for a modifiable, flexible, repeatable and shareable workflow when integrating heterogeneous data sources in a Knowledge Graph (KG) [4] superseding the imperative solutions. However, not all mapping languages cover the same functionalities, leading to diverse mapping languages. The consequent lack of interoperability between the different languages ties users to a certain specification, preventing them to switch between mapping languages. Among the different declarative mapping languages, RML [5] is the one that has been the most adopted by the community. More than 20 associated systems implement RML allowing different approaches and optimisations2 which make RML a very reliable and long-term solution. However, its syntax, based on RDF, tends to be very verbose and thus not so friendly for humans. In this context, ShExML appeared with a clear vocation on usability letting users be more productive when developing mapping rules [6]. However, its specification only counts with a conformant engine3 , which does not cover all the functionalities that users could face nor can users take advantage of all the optimisations offered by the different RML implementations. SEMANTICS 2022 EU: 18th International Conference on Semantic Systems, September 13-15, 2022, Vienna, Austria ∗ Corresponding author. Envelope-Open herminio.garciagonzalez@kazernedossin.eu (H. García-González); anastasia.dimou@kuleuven.be (A. Dimou) GLOBE https://herminiogarcia.com/ (H. García-González); https://natadimou.com (A. Dimou) Orcid 0000-0001-5590-4857 (H. García-González); 0000-0003-2138-7972 (A. Dimou) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop CEUR Workshop Proceedings (CEUR-WS.org) Proceedings http://ceur-ws.org ISSN 1613-0073 1 See more on: https://github.com/kg-construct/use-cases 2 https://github.com/kg-construct/resources/blob/master/tools.md 3 https://github.com/herminiogg/ShExML Thus, it is important to build bridges to foster adoption and trust in the declarative KG construction ecosystem while supporting users to effectively and seamlessly cover their needs. With this objective in mind, in this paper, we propose a conversion from ShExML to RML, letting users rapidly sketch their mapping rules in ShExML and then switch to RML to take advantage of its plethora of implementations. A live demo is available at the ShExML playground4 . 2. Related Work Different mapping languages were proposed in the past [7] based on dedicated languages, e.g., RML extending R2RML to support heterogeneous data sources; repurposed languages, e.g., SPARQL-Generate [8] extending the SPARQL query language or ShExML using Shape Expressions (ShEx)-based syntax [9]; or serialisation languages like YARRRML [10] extending YAML. As YARRRML is a tightly coupled serialisation for RML aiming for better readability by humans, translating YARRRML to RML is straightforward5 . Besides the YARRRML to RML translation, an RML to SPARQL-Generate6 translation exists enabling the use of RML mapping rules with the SPARQL-Generate implementation, but the inverse translation is not tackled. Recently, an ontology was presented as a meta-language to represent the functionalities of existing mapping languages [11]. Future implementations will offer translations from and to the targeted mapping languages. Despite covering all mapping rules, being a meta-language involves more verbose constructions than existing languages, surpassing even a verbose language like RML. In addition, covering the expressiveness from all potential mapping languages could be a difficult—if ever approachable—task. Thus, specific translations are still necessary. In this paper we propose a specific translation from ShExML to RML to combine the usability of ShExML and the sustainability and wide-support of RML implementations. 3. Brief introduction of RML and ShExML The two languages offer different syntaxes for integrating heterogeneous data into a KG. The example in Figure 1 generates a triple for each entity, showing their correspondences. On the one hand, RML7 offers a syntax based in RDF which ties the representation of the language to the RDF serialisation formats (i.e., Turtle, NTriples, RDF/XML, etc.). For describing the mapping process it relies on classes like: r r : S u b j e c t M a p , r r : P r e d i c a t e M a p and r r : O b j e c t M a p that define how the subject, predicate and object of a triple will be generated; r r : P r e d i c a t e O b j e c t M a p which relates a predicate with an object; r m l : L o g i c a l S o u r c e which describes an input data source; and r r : T r i p l e s M a p which defines how a set of triples will be generated for a certain data source. Each class offers different fields to modify the aspects of the generation. On the other hand, ShExML8 separates the constructions intended to extract values (declara- tions) from those to generate triples (shapes). ShExML offers: S O U R C E which points to a data 4 http://shexml.herminiogarcia.com/editor/ 5 https://github.com/RMLio/yarrrml-parser 6 https://github.com/sparql-generate/rml-to-sparql-generate 7 https://rml.io/specs/rml/ 8 http://shexml.herminiogarcia.com/spec/ source, I T E R A T O R which defines a query whose results need to be iterated, F I E L D that holds the query to a value that will be outputted in a triple, and E X P R E S S I O N which relates the sources with the iterator and field queries. Then, the shapes part allows to format the output in triples using already defined expressions for subjects, predicates and objects. 4. Translating ShExML constructions to RML rules Figure 1: A set of ShExML mapping rules (left side) and an equivalent set of RML mapping rules (right side). Each color identifies a correspondence between an RML construction and a ShExML construction. @prefix : . @prefix schema: . @prefix rml: . @prefix rr: . @prefix d2rq: . @prefix ql: . @prefix map: . PREFIX : map:550785378 a rml:LogicalSource ; PREFIX schema: rml:source "http://shexml.herminiogarcia.com/files/films.xml" ; rml:iterator "//film" ; SOURCE films_xml_file rml:referenceFormulation ql:XPath . < http://shexml.herminiogarcia.com/files/films.xml > map:s_1 a rr:SubjectMap ; ITERATOR film_xml < xpath: //film > { rr:template "http://example.com/{@id}" . FIELD id < @id > map:p_1 a rr:PredicateMap ; FIELD name < name > rr:constant schema:name . } map:o_1 a rr:ObjectMap ; EXPRESSION films rr:template "{name}" ; rr:termType rr:Literal . :Film :[films.id] { map:m_1 a rr:TriplesMap ; schema:name [films.name] ; rml:logicalSource map:550785378 ; } rr:predicateObjectMap map:po_1 ; rr:subjectMap map:s_1 . map:po_1 a rr:PredicateObjectMap ; rr:objectMap map:o_1 ; rr:predicateMap map:p_1 . As ShExML follows a ShEx-based syntax, a dedicated parser fitting the ShExML grammar is required. For this purpose, the translator is embedded in the ShExML engine, taking advantage of the already developed modules. Once the abstract tree is generated the translator traverses it to generate the triples that conform an RML rules set. This process is similar to how the ShExML engine outputs the mapped RDF but this time it generates RML. The translator performs a one to one translation of the different directives, taking into account that in many cases information in ShExML rules is dispersed across different directives in comparison with their RML counterparts (see Figure 1 for a diagram on how ShExML constructions are translated to RML). A live demo of this algorithm can be tested on the ShExML playground9 . To verify the validity of the performed translations we developed a set of test cases10 that cover different aspects of ShExML functionality. The test cases convert the ShExML rules to RML rules. The latter are executed in the rml-mapper processor11 and the output is compared to the output of the ShExML engine. 9 http://shexml.herminiogarcia.com/editor/ 10 https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/com/herminiogarcia/shexml/rml 11 https://github.com/RMLio/rmlmapper-java Non translatable directives Even though many declarative mapping languages share the same foundations and a common iteration model [11], they have evolved in different directions covering different functionali- ties. Therefore, unlike YARRRML, not all ShExML features can be directly translated to RML statements, or it would involve too complex processes to make them possible. Certain features are included in ShExML but they are not directly supported in RML. This includes Matchers12 (substitute one string occurrence for other string), String operations13 (concatenate two strings), and Auto-increment ids14 . All these functionalities could be covered in RML using the FnO [12] extension but that is beyond the scope of this translation. Last year, ShExML introduced a set of modifications to cope with the mapping challenges15 [13]. It is now possible to define dynamic data types16 and language tags17 . This has been tackled by RML specification for dynamic language tags generation but not for dynamic datatypes. ShExML allows accessing fields outside the iteration scope. This is useful when accessing parent nodes in JSON as JSONPath specification does not support it18 . It has been described how to tackle this with RML [14] but it is not yet implemented in the specification. Finally, external functions execution has been recently supported in ShExML. However, ShExML uses Scala classes that can be directly executed while FnO relies on a function hub where functions can be implemented in JavaScript or Java. This could be resolved by translating from Scala to Java code and then pushing the Java code to the Function Hub [12]. 5. Conclusions In this work, we described a translation from ShExML to RML, letting users sketch mapping rules rapidly in ShExML and then switching to RML. In addition, we discussed the limitations of the current translation due to different coverage of features between the two languages. In the future, we will tackle the inverse translation, i.e., from RML to ShExML. This will let more users adopt ShExML using their already created RML rules as well as allowing them to switch back and forth between the two languages depending on their needs. Acknowledgments This work was carried out in the context of the EHRI-3 project funded by the European Com- mission under the call H2020-INFRAIA-2018-2020, with grant agreement ID 871111 and DOI 10.3030/871111. 12 http://shexml.herminiogarcia.com/spec/#matcher 13 http://shexml.herminiogarcia.com/spec/#string-operation 14 http://shexml.herminiogarcia.com/spec/#autoincrement-ids 15 https://kg-construct.github.io/workshop/2021/challenges.html 16 http://shexml.herminiogarcia.com/spec/#data-types-dynamic-version 17 http://shexml.herminiogarcia.com/spec/#lang-tags-dynamic-version 18 https://goessner.net/articles/JsonPath/ References [1] B. De Meester, W. Maroy, A. Dimou, R. Verborgh, E. Mannens, Declarative data transfor- mations for Linked Data generation: the case of DBpedia, in: The Semantic Web – ISWC 2017, Springer, 2017. doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 3 1 9 - 6 8 2 0 4 - 4 _ 2 8 . [2] H. García-González, E. Albarrán-Fernández, J. E. L. Gayo, M. Calleja-Puerta, Converting Asturian Notaries Public deeds to Linked Data Using TEI and ShExML, in: Proceedings of the Third Workshop on Humanities in the Semantic Web, CEUR, 2020. [3] J. A. Rojas, M. Aguado, P. Vasilopoulou, I. Velitchkov, D. V. Assche, P. Colpaert, R. Verborgh, Leveraging Semantic Technologies for Digital Interoperability in the European Railway Domain, in: The Semantic Web–ISWC, Springer, 2021. doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 0 3 0 - 8 8 3 6 1 - 4 _ 3 8 . [4] B. D. Meester, P. Heyvaert, R. Verborgh, A. Dimou, Mapping Languages: Analysis of Comparative Characteristics, in: Joint Proceedings of the 1st Workshop on Knowledge Graph Building and 1st Workshop on Large Scale RDF Analytics, CEUR, 2019. [5] A. Dimou, M. V. Sande, P. Colpaert, R. Verborgh, E. Mannens, R. V. de Walle, RML: A generic language for integrated RDF mappings of heterogeneous data, in: Proceedings of the Workshop on Linked Data on the Web, CEUR, 2014. [6] H. García-González, I. Boneva, S. Staworko, J. E. Labra-Gayo, J. M. C. Lovelle, ShExML: improving the usability of heterogeneous data mapping languages for first-time users, PeerJ Computer Science (2020) e318. doi:1 0 . 7 7 1 7 / p e e r j - c s . 3 1 8 . [7] D. V. Assche, T. Delva, G. Haesendonck, P. Heyvaert, B. D. Meester, A. Dimou, Declarative RDF graph generation from heterogeneous (semi-)structured data: a Systematic Literature Review, Journal of Web Semantics (2022). (In press). [8] M. Lefrançois, A. Zimmermann, N. Bakerally, A SPARQL extension for generating RDF from heterogeneous formats, in: The Semantic Web–ESWC, Springer, 2017. doi:1 0 . 1 0 0 7 / 978- 3- 319- 58068- 5_3. [9] E. Prud’hommeaux, J. E. Labra Gayo, H. Solbrig, Shape Expressions: An RDF Validation and Transformation Language, in: Proceedings of the 10th International Conference on Semantic Systems, ACM, 2014. doi:1 0 . 1 1 4 5 / 2 6 6 0 5 1 7 . 2 6 6 0 5 2 3 . [10] P. Heyvaert, B. De Meester, A. Dimou, R. Verborgh, Declarative rules for linked data generation at your fingertips!, in: The Semantic Web: ESWC Satellite Events, Springer, 2018. doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 3 1 9 - 9 8 1 9 2 - 5 . [11] A. Iglesias-Molina, A. Cimmino, E. Ruckhaus1, D. Chaves-Fraga, R. García-Castro, O. Cor- cho, An Ontological Approach for Integrating Declarative Mapping Languages, Semantic Web (2022). (In press). [12] B. De Meester, T. Seymoens, A. Dimou, R. Verborgh, Implementation-independent function reuse, Future Generation Computer Systems (2020). doi:1 0 . 1 0 1 6 / j . f u t u r e . 2 0 1 9 . 1 0 . 0 0 6 . [13] H. García-González, A ShExML Perspective on Mapping Challenges: Already Solved Ones, Language Modifications and Future Required Actions, in: Proceedings of the 2nd International Workshop on Knowledge Graph Construction, CEUR, 2021. [14] T. Delva, D. V. Assche, P. Heyvaert, B. D. Meester, A. Dimou, Integrating Nested Data into Knowledge Graphs with RML Fields, in: Proceedings of 2nd Workshop on Knowledge Graph Construction, CEUR, 2021.