Backwards or Forwards? [R2]RML Backwards Compatibility in RMLMapper Dylan Van Assche1,* , Jozef Jankaj1 and Ben De Meester1,* 1 IDLab, Dept. Electronics & Information Systems, Ghent University – imec, Belgium Abstract During the past decade, RML was proposed as an extension to the W3C’s R2RML Recommendation for supporting heterogeneous data sources. Although RML (RMLio flavour) was not a W3C Recommendation, it gained a lot of traction, and has been extended by the KG-Construct W3C Community Group as RMLKGC . Currently, this results in three main flavours (i.e. R2RML, RMLio , and KG-Construct’s RMLKGC ) used among users of these mapping languages. Therefore, many existing mappings cannot be used among all existing [R2]RML engines, since they only implement one [R2]RML flavour. In this paper, we implement a translation of all flavours into the latest RML flavour (i.e. RMLKGC ) within RMLMapper. This way, any mapping – no matter which flavour of [R2]RML was used – can be executed by RMLMapper. We discuss our translation approach and evaluate it in the KGCW Challenge 2024 Track 1 and all available RMLio and R2RML test cases to verify our translation into RMLKGC . We were able to translate R2RML and RMLio to RMLKGC Core (98,7%) and some parts of RMLKGC IO (50,75%) modules without changing the [R2]RML mappings. We reach a total coverage of 73,70% among all RMLKGC test cases and 100% coverage for RMLio and R2RML test cases. Thanks to our translation approach, we can re-use the same RMLMapper for all flavours without requiring the user to change their mappings. In the future, we aim to support all RMLKGC modules, while keeping support for the other flavours. Keywords RML, Knowledge Graph Construction, RMLMapper, Challenge 1. Introduction During the past decade, RML was proposed as an extension to the W3C’s R2RML Recommen- dation for supporting heterogeneous data sources. On its own, RML has been revised by the KG-Construct W3C Community Group. Nowadays, multiple flavours of [R2]RML exist: W3C’s Recommended R2RML specification [1] (R2RML), the RML specification initiated by Dimou [2] (RMLio ) and maintained throughout the years on https://rml.io (RMLio , v1.1.21 ), and a new major revision [3] (RMLKGC ), maintained by the W3C Community Group on Knowledge Graph Construction (RMLKGC , by KG-Construct). Not all flavours are supported by existing RML engines [4] such as SDM-RDFizer [5], Morph- KGC [6], RMLMapper [7], RMLStreamer [8]. KGCW’24: 5th International Workshop on Knowledge Graph Construction, May 27, 2024, Crete, GRE * Corresponding author. $ dylan.vanassche@ugent.be (D. Van Assche); jozef.jankaj@ugent.be (J. Jankaj); ben.demeester@ugent.be (B. De Meester) € https://dylanvanassche.be (D. Van Assche); https://ben.de-meester.org/#me (B. De Meester)  0000-0002-7195-9935 (D. Van Assche); 0000-0003-0248-0987 (B. De Meester) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 1 https://rml.io/specs/rml/v/1.1.2/ CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings Users with existing RML mappings are limited in the RML engines they can use, since no engine supports all [R2]RML flavours. RMLKGC can represent all elements present in R2RML and RMLio through translation because each specification is backwards compatible with each other. However, no engine takes advantage of this backwards compatibility to represent the other flavours as RMLKGC . Therefore, users must translate all their existing mappings first if they want to use a different engine which supports a newer flavour. Moreover, we do not want to deprecate support for R2RML and RMLio flavours in our existing mapping engine RMLMapper2 when adding support for RMLKGC . We overcome this problem in RMLMapper by translating all three [R2]RML flavours into RMLKGC [3]. By applying our translation, RMLMapper can now read any RML mapping without requiring the user to change them, written in R2RML, RMLio , and RMLKGC . Thanks to our translation in RMLMapper, users can still run their existing [R2]RML mappings while the Knowledge Graph Construction community can work towards the standardization of RML as a W3C Recommendation in the future. 2. Approach In this Section, we show our translation for R2RML and RMLio into RMLKGC . This translation is implemented in RMLMapper and automatically applied without any intervention of the user. Therefore, the user does not have to migrate existing [R2]RML mappings into RMLKGC immediately. 2.1. Translation We compare the different [R2]RML flavours among each other to establish a translation path towards RMLKGC . Tables 1 & 2 list all required translations to translate R2RML and RMLio into RMLKGC . The biggest translations rely on the removal of R2RML Logical Table in favor of RMLKGC Logical Source, the ontology prefixes which are different between flavours, and the access descriptions for data sources. In this work, we cover the RMLKGC Core which focus on RDF generation with RML Triples Maps and parts of the RMLKGC IO modules used for accessing data sources and targets in RML because they overlap with the R2RML and RMLio flavours. Translation is required to avoid implementing all flavours separately in engines such as RMLMapper. In the future, we will expand our work to the other RMLKGC modules. Prefixes Every [R2]RML flavour has its own prefix which allows us to recognize which flavour of [R2]RML was used to create the mapping. Since most of the terms in the ontologies are similar to each other, we translate the prefixes of R2RML and RMLio into RMLKGC by replacing them with the new prefix. However, some changes in RMLKGC require additional transformations which we describe in the next paragraphs. Literals in rml:source RMLio mappings consistently use Literals in the RMLio Logical Source’s rml:source to describe the path to a file which is used in the mapping as data source. 2 https://github.com/RMLio/rmlmapper-java Table 1 Translations from RMLio to RMLKGC . Queries used when accessing SQL databases and SPARQL endpoints need to be transformed into an iterator and corresponding reference formulation. File paths as string Literals in a RML Source must be transformed into a DCAT access description or RMLKGC Relative Path Source for relative file paths. RMLio RMLKGC Classes ql:XPath rml:XPath ql:CSV rml:CSV ql:JSONPath rml:JSONPath rml:LogicalSource rml:LogicalSource rml:BaseSource rml:LogicalSource rml:LanguageMap rml:LanguageMap Properties rml:iterator rml:iterator rml:logicalSource rml:logicalSource rml:reference rml:reference rml:referenceFormulation rml:referenceFormulation rml:languageMap rml:languageMap Transformations rml:query rml:iterator + rml:referenceFormulation Literals in rml:source DCAT or RMLKGC Relative Path Source This approach was deprecated in 2015 [9] and replaced by access descriptions such as DCAT [10], SD [11], or D2RQ [12] to access heterogeneous data sources, e.g. files, SPARQL services, or databases. RMLKGC drops this deprecated option which requires a transformation when a mapping still uses Literals for rml:source. If we encounter such a case, we replace it by a DCAT access description. If the path to the file is a relative path, we cannot use DCAT since there is no base IRI available to resolve the relative path against. To overcome this problem, an RMLKGC Relative Path Source was introduced to handle this case3 . rml:query The RMLio specification4 does not indicate how queries must be specified in the case of relational databases or SPARQL services. Over the past decade, an unofficial predicate rml:query was used to address this problem. This way, queries could be specified in the RMLio mapping to access such sources. The W3C Community Group on Knowledge Graph Construction incorporated the query property in the iterator, but there is still discussion around this approach5 . In this work, we do perform this transformation and add the necessary access descriptions for the relational database or SPARQL service if needed. rr:tableName & rr:LogicalTable R2RML Logical Table and rr:tableName shortcut must be translated completely into RMLKGC Logical Source since a Logical Source is an expansion of an R2RML Logical Table. We perform this transformation by moving the query from R2RML 3 https://github.com/kg-construct/rml-io/issues/36 4 https://rml.io/specs/rml/v/1.1.2/ 5 https://github.com/kg-construct/rml-io/issues/28 Logical Table into the iterator and adding the access description of the database using the D2RQ ontology6 . The reference formulation is set to rml:SQL2008Table. For rr:tableName, we also add the access description using the D2RQ ontology, and place the table name as well in the iterator. However, the reference formulation is set to rml:SQL2008Table, allowing RML engines to detect that they receive a table name instead of a SQL query. 2.2. Implementation in RMLMapper Our translation is applied when RMLMapper parses the [R2]RML mappings. Each part of the [R2]RML mapping which is not using RMLKGC , is translated internally. This way, the RMLMapper operates on [R2]RML mappings based on the latest RMLKGC version. Only for R2RML, the database details needs to be supplied by the user as R2RML does not include the database access information in its mappings. Our implementation in the RMLMapper is written in Java, released as v7.0.0, and available on GitHub7 under the MIT license. 3. Evaluation In this Section, we evaluate our translation approach on the RMLKGC test cases of the Knowledge Graph Construction Workshop (KGCW) Challenge 2024 Track 18 to verify our implementation and identify which parts of the RMLKGC modules are (not) supported. Moreover, we have validated all R2RML and RMLio test cases’ RDF output on the RMLMapper for correctness to avoid that our translation approach breaks the other [R2]RML flavors when translating into RMLKGC . The KGCW Challenge 2024 Track 1 consists of 365 test cases from 5 different RMLKGC modules: RMLKGC Core (238 test cases), RMLKGC IO (67 test cases), RMLKGC FNML (13 test cases), RMLKGC CC (29 test cases), and RMLKGC Star (18 test cases). Each module provides a set of test cases to evaluate the compliance of engines with the specification provided by the module. RMLKGC Core has the most test cases because it contains the core functionality for generating RDF using RMLKGC mappings, followed by RMLKGC IO which focus on accessing various data sources and targets used in RMLKGC mappings. The other modules have lower number of test cases, thus engines supporting RMLKGC Core and RMLKGC IO already have a high coverage of the new RMLKGC flavour. We calculate the coverage of each module by dividing the number of passing test cases by the number of test cases per module. The Knowledge Graph Construction W3C Community Group does not provide a detailed description for each test case yet, but it is planned for the future. Table 3 shows the coverage of RMLMapper with all [R2]RML test cases with and without our approach. Without our translation approach, RMLMapper achieves 0% coverage on the RMLKGC test cases of the KGCW Challenge. RMLMapper passes 100% of the R2RML and RMLio test cases. We achieve 98,70% coverage for RMLKGC Core and 50,75% coverage for RMLKGC IO. RMLKGC Core has a few test cases where RMLMapper fails to provide the correct output: For 6 http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1# 7 https://doi.org/10.5281/zenodo.11518178 8 https://doi.org/10.5281/zenodo.10721874 Table 2 Translations from W3C Recommended R2RML into RMLKGC . R2RML’s specific access descriptions for SQL tables (Logical Table, BaseTableOrView, R2RMLView, tableName) need to be transformed into a RMLKGC Logical Source with a D2RQ Database access description for accessing SQL tables. R2RML RMLKGC Classes rr:Literal rml:Literal rr:BlankNode rml:BlankNode rr:IRI rml:IRI rr:SQL2008 rml:SQL2008Table or rml:SQL2008Query rr:TriplesMap rml:TriplesMap rr:SubjectMap rml:SubjectMap rr:PredicateObjectMap rml:PredicateObjectMap rr:PredicateMap rml:PredicateMap rr:ObjectMap rml:ObjectMap rr:TermMap rml:TermMap rr:GraphMap rml:GraphMap rr:Join rml:Join rr:RefObjectMap rml:RefObjectMap rr:defaultGraph rml:defaultGraph Properties rr:joinCondition rml:joinCondition rr:parent rml:parent rr:child rml:child rr:parentTriplesMap rml:parentTriplesMap rr:column rml:reference rr:class rml:class rr:constant rml:constant rr:datatype rml:datatype rr:graph rml:graph rr:graphMap rml:graphMap rr:language rml:language rr:object rml:object rr:objectMap rml:objectMap rr:predicate rml:predicate rr:predicateMap rml:predicateMap rr:predicateObjectMap rml:predicateObjectMap rr:subject rml:subject rr:subjectMap rml:subjectMap rr:termType rml:termType rr:template rml:template rr:logicalTable rml:logicalSource Transformations rr:BaseTableOrView RMLKGC Logical Source + D2RQ Database rr:R2RMLView RMLKGC Logical Source + D2RQ Database rr:Logical Table RMLKGC Logical Source + D2RQ Database rr:tableName RMLKGC Logical Source + D2RQ Database Table 3 Coverage results of the RMLKGC test cases with and without our translation approach by the RMLMapper. Without translation, RMLMapper cannot execute any of the RMLKGC test cases. RMLKGC FNML, CC, and Star modules are currently unsupported by RMLMapper. Total coverage of R2RML and RMLio test cases is 100% and coverage of all RMLKGC test cases is 73,70%. Test cases Without translation With translation RMLKGC Core 0% 98,70% RMLKGC IO 0% 50,75% RMLio 100% 100% R2RML 100% 100% RMLKGC Core, the test case RMLTC0010{a,b,c}-JSON fails for RMLMapper as it uses the latest IETF JSONPath expressions which are not supported yet by the RMLMapper. For RMLKGC IO, new serialization and compression formats are not implemented in RMLMapper. Moreover, compressed data sources cannot be accessed yet by RMLMapper. We did not implement specific translations yet for FnO functions, provided by the RMLKGC FNML and other modules such as RMLKGC Star for RDF-Star support or RMLKGC CC to generate RDFS Collections & Containers. Therefore, we expected only a small set of test cases would succeed for these modules. The total coverage of RMLKGC test cases we reach for RMLMapper is 73,70%. RMLMapper still passes 100% of the R2RML and RMLio test cases with our translation approach. 4. Conclusion In this paper, we showed our approach for translating R2RML and RMLio into the latest RMLKGC and evaluated it on the RMLKGC test cases. Thanks to our work, users can still execute their existing RML mappings while the community works towards a standardization of RMLKGC as a W3C Recommendation. In the future, we aim to support more RMLKGC modules besides RMLKGC Core and IO, and perform an evaluation of the translation itself since we focus in this work on participating in the KGCW Challenge which only evaluate the generated RDF of each engine. Acknowledgments The described research activities were supported by SolidLab Vlaanderen (Flemish Government, EWI and RRF project VV023/10). Dylan Van Assche is supported by the Special Research Fund of Ghent University9 under grant BOF20/DOC/132. References [1] S. Das, S. Sundara, R. Cyganiak, R2RML: RDB to RDF Mapping Language, Working Group Recommendation, World Wide Web Consortium (W3C), 2012. URL: http://www.w3.org/ 9 https://www.ugent.be/en/research/funding/bof/overview.htm TR/r2rml/. [2] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle, RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data, in: C. Bizer, T. Heath, S. Auer, T. Berners-Lee (Eds.), Proceedings of the 7th Workshop on Linked Data on the Web, volume 1184 of CEUR Workshop Proceedings, CEUR, 2014. URL: http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf. [3] A. Iglesias-Molina, D. Van Assche, J. Arenas-Guerrero, B. De Meester, C. Debruyne, S. Joza- shoori, P. Maria, F. Michel, D. Chaves-Fraga, A. Dimou, The RML Ontology: A Community- Driven Modular Redesign After a Decade of Experience in Mapping Heterogeneous Data to RDF, in: Submited to ISWC2023, 2023. [4] D. Van Assche, T. Delva, G. Haesendonck, P. Heyvaert, B. De Meester, A. Dimou, Declarative RDF graph generation from heterogeneous (semi-)structured data: A systematic literature review, Journal of Web Semantics (2022). doi:10.1016/j.websem.2022.100753. [5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal, SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs, in: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, ACM, 2020. doi:10.1145/3340531.3412881. [6] J. Arenas-Guerrero, D. Chaves-Fraga, J. Toledo, M. S. Pérez, O. Corcho, Morph-KGC: Scalable knowledge graph materialization with mapping partitions, Semantic Web (2022) 1–20. doi:10.3233/sw-223135. [7] P. Heyvaert, B. De Meester, D. Van Assche, et al., Rmlmapper, 2024. URL: https://github. com/RMLio/rmlmapper-java. [8] G. Haesendonck, W. Maroy, P. Heyvaert, R. Verborgh, A. Dimou, Parallel RDF generation from heterogeneous big data, in: S. Groppe, L. Gruenwald (Eds.), Proceedings of the International Workshop on Semantic Big Data - SBD '19, number 1 in SBD ’19, ACM Press, Amsterdam, Netherlands, 2019. URL: https://biblio.ugent.be/publication/8619808/ file/8659668.pdf. doi:10.1145/3323878.3325802. [9] A. Dimou, R. Verborgh, M. V. Sande, E. Mannens, R. V. de Walle, Machine-interpretable dataset and service descriptions for heterogeneous data access and retrieval, in: Proceed- ings of the 11th International Conference on Semantic Systems - SEMANTICS '15, ACM Press, 2015. doi:10.1145/2814864.2814873. [10] R. Albertoni, D. Browning, S. Cox, A. Gonzalez Beltran, A. Perego, P. Winstanley, Data Catalog Vocabulary (DCAT) - Version 2, Recommendation, World Wide Web Consortium (W3C), 2020. URL: https://www.w3.org/TR/vocab-dcat/. [11] G. Williams, SPARQL 1.1 Service Description, Recommendation, World Wide Web Consor- tium (W3C), 2013. URL: https://www.w3.org/TR/sparql11-service-description/. [12] R. Cyganiak, C. Bizer, J. Garbers, O. Maresch, C. Becker, The D2RQ Mapping Language, Technical Report, FU Berlin, DERI, UCB, JP Morgan Chase, AGFA Healthcare, HP Labs, Johannes Kepler Universität Linz, 2012. URL: http://d2rq.org/d2rq-language.