=Paper=
{{Paper
|id=Vol-3718/paper10
|storemode=property
|title=Backwards or Forwards? [R2]RML backwards compatibility in RMLMapper
|pdfUrl=https://ceur-ws.org/Vol-3718/paper10.pdf
|volume=Vol-3718
|authors=Dylan Van Assche,Jozef Jankaj,Ben De Meester
|dblpUrl=https://dblp.org/rec/conf/kgcw/AsscheJM24
}}
==Backwards or Forwards? [R2]RML backwards compatibility in RMLMapper==
<pdf width="1500px">https://ceur-ws.org/Vol-3718/paper10.pdf</pdf>
<pre>
                                Backwards or Forwards? [R2]RML Backwards
                                Compatibility in RMLMapper
                                Dylan Van Assche1,* , Jozef Jankaj1 and Ben De Meester1,*
                                1
                                    IDLab, Dept. Electronics & Information Systems, Ghent University – imec, Belgium


                                               Abstract
                                               During the past decade, RML was proposed as an extension to the W3C’s R2RML Recommendation for
                                               supporting heterogeneous data sources. Although RML (RMLio flavour) was not a W3C Recommendation,
                                               it gained a lot of traction, and has been extended by the KG-Construct W3C Community Group as
                                               RMLKGC. Currently, this results in three main flavours (i.e. R2RML, RMLio, and KG-Construct’s RMLKGC)
                                               in use among users of these mapping languages. As a result, many existing mappings cannot be used
                                               across all existing [R2]RML engines, since these engines implement only one [R2]RML flavour. In this
                                               paper, we implement a translation of all flavours into the latest RML flavour (i.e. RMLKGC) within
                                               RMLMapper. This way, any mapping, no matter which flavour of [R2]RML was used, can be executed by
                                               RMLMapper. We discuss our translation approach and evaluate it on the KGCW Challenge 2024 Track 1 and
                                               on all available RMLio and R2RML test cases to verify our translation into RMLKGC. We were able to
                                               translate R2RML and RMLio to the RMLKGC Core (98,70%) and some parts of the RMLKGC IO (50,75%) modules
                                               without changing the [R2]RML mappings. We reach a total coverage of 73,70% among all RMLKGC test cases
                                               and 100% coverage for RMLio and R2RML test cases. Thanks to our translation approach, we can re-use the
                                               same RMLMapper for all flavours without requiring users to change their mappings. In the future, we aim
                                               to support all RMLKGC modules, while keeping support for the other flavours.

                                               Keywords
                                               RML, Knowledge Graph Construction, RMLMapper, Challenge


                                1. Introduction
                                During the past decade, RML was proposed as an extension to the W3C’s R2RML Recommendation
                                for supporting heterogeneous data sources. RML itself has since been revised by the
                                KG-Construct W3C Community Group.
                                  Nowadays, multiple flavours of [R2]RML exist: W3C’s Recommended R2RML specification [1]
                                (R2RML), the RML specification initiated by Dimou [2] and maintained throughout the
                                years on https://rml.io (RMLio, v1.1.2, see footnote 1), and a new major revision [3], maintained by
                                the W3C Community Group on Knowledge Graph Construction (RMLKGC, by KG-Construct).
                                Not all flavours are supported by existing RML engines [4] such as SDM-RDFizer [5],
                                Morph-KGC [6], RMLMapper [7], and RMLStreamer [8].

                                KGCW’24: 5th International Workshop on Knowledge Graph Construction, May 27, 2024, Crete, GRE
                                * Corresponding author.
                                Email: dylan.vanassche@ugent.be (D. Van Assche); jozef.jankaj@ugent.be (J. Jankaj); ben.demeester@ugent.be (B. De Meester)
                                Web: https://dylanvanassche.be (D. Van Assche); https://ben.de-meester.org/#me (B. De Meester)
                                ORCID: 0000-0002-7195-9935 (D. Van Assche); 0000-0003-0248-0987 (B. De Meester)
                                © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                1
                                    https://rml.io/specs/rml/v/1.1.2/


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
   Users with existing RML mappings are limited in the RML engines they can use, since no
engine supports all [R2]RML flavours. RMLKGC can represent all elements present in R2RML and
RMLio through translation because each newer specification is backwards compatible with its
predecessors. However, no engine takes advantage of this backwards compatibility to represent
the other flavours as RMLKGC. Therefore, users must first translate all their existing mappings
if they want to use a different engine which supports a newer flavour. Moreover, we do not want to
deprecate support for the R2RML and RMLio flavours in our existing mapping engine RMLMapper
(footnote 2) when adding support for RMLKGC. We overcome this problem in RMLMapper by translating
all three [R2]RML flavours into RMLKGC [3]. By applying our translation, RMLMapper can now
read any mapping written in R2RML, RMLio, or RMLKGC without requiring the user to change
it.
   Thanks to our translation in RMLMapper, users can still run their existing [R2]RML mappings
while the Knowledge Graph Construction community can work towards the standardization of
RML as a W3C Recommendation in the future.


2. Approach
In this Section, we show our translation for R2RML and RMLio into RMLKGC . This translation
is implemented in RMLMapper and automatically applied without any intervention of the
user. Therefore, the user does not have to migrate existing [R2]RML mappings into RMLKGC
immediately.

2.1. Translation
We compare the different [R2]RML flavours with each other to establish a translation path
towards RMLKGC. Tables 1 & 2 list all translations required to translate R2RML and RMLio
into RMLKGC. The largest changes are the replacement of the R2RML Logical Table by the
RMLKGC Logical Source, the ontology prefixes, which differ between flavours, and
the access descriptions for data sources. In this work, we cover RMLKGC Core, which focuses
on RDF generation with RML Triples Maps, and the parts of the RMLKGC IO module used for
accessing data sources and targets in RML, because they overlap with the R2RML and RMLio
flavours. Translation avoids having to implement all flavours separately in engines such
as RMLMapper. In the future, we will expand our work to the other RMLKGC modules.

Prefixes Every [R2]RML flavour has its own prefix which allows us to recognize which flavour
of [R2]RML was used to create the mapping. Since most of the terms in the ontologies are similar
to each other, we translate the prefixes of R2RML and RMLio into RMLKGC by replacing them
with the new prefix. However, some changes in RMLKGC require additional transformations
which we describe in the next paragraphs.
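
The prefix replacement described above can be sketched as follows. This is a minimal
illustration, not the RMLMapper implementation: terms are plain IRI strings, the namespace
IRIs are assumptions that do not appear in this paper, and the helper names translate_iri
and translate_triple are hypothetical. The two renamed terms are taken from Table 2.

```python
# Assumed namespace IRIs for each flavour (not stated in the paper).
R2RML_NS = "http://www.w3.org/ns/r2rml#"
RMLIO_NS = "http://semweb.mmlab.be/ns/rml#"
QL_NS = "http://semweb.mmlab.be/ns/ql#"
RMLKGC_NS = "http://w3id.org/rml/"

# Terms whose local name changes as well (rows of Table 2).
RENAMED_TERMS = {
    R2RML_NS + "column": RMLKGC_NS + "reference",
    R2RML_NS + "logicalTable": RMLKGC_NS + "logicalSource",
}


def translate_iri(iri: str) -> str:
    """Return the RMLKGC counterpart of an R2RML or RMLio term."""
    if iri in RENAMED_TERMS:
        return RENAMED_TERMS[iri]
    for old_ns in (R2RML_NS, RMLIO_NS, QL_NS):
        if iri.startswith(old_ns):
            # Same local name, new namespace.
            return RMLKGC_NS + iri[len(old_ns):]
    # Anything outside the mapping vocabularies is left untouched.
    return iri


def translate_triple(triple: tuple) -> tuple:
    """Apply the prefix translation to subject, predicate and object."""
    subject, predicate, obj = triple
    return (translate_iri(subject), translate_iri(predicate), translate_iri(obj))
```

Terms that need more than a prefix change, such as rml:query or rr:tableName, are not
covered by this sketch and require the transformations described in the next paragraphs.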

Literals in rml:source RMLio mappings consistently use Literals in the RMLio Logical
Source’s rml:source to describe the path to a file which is used in the mapping as data source.
2
    https://github.com/RMLio/rmlmapper-java
Table 1
Translations from RMLio to RMLKGC. Queries used when accessing SQL databases and SPARQL endpoints
need to be transformed into an iterator and corresponding reference formulation. File paths given as
string Literals in an RML Source must be transformed into a DCAT access description or, for relative
file paths, into an RMLKGC Relative Path Source.

         RMLio                                  RMLKGC
         ------------------------------------------------------------------------------
         Classes
         ql:XPath                               rml:XPath
         ql:CSV                                 rml:CSV
         ql:JSONPath                            rml:JSONPath
         rml:LogicalSource                      rml:LogicalSource
         rml:BaseSource                         rml:LogicalSource
         rml:LanguageMap                        rml:LanguageMap
         ------------------------------------------------------------------------------
         Properties
         rml:iterator                           rml:iterator
         rml:logicalSource                      rml:logicalSource
         rml:reference                          rml:reference
         rml:referenceFormulation               rml:referenceFormulation
         rml:languageMap                        rml:languageMap
         ------------------------------------------------------------------------------
         Transformations
         rml:query                              rml:iterator + rml:referenceFormulation
         Literals in rml:source                 DCAT or RMLKGC Relative Path Source


This approach was deprecated in 2015 [9] and replaced by access descriptions such as DCAT [10],
SD [11], or D2RQ [12] to access heterogeneous data sources, e.g. files, SPARQL services, or
databases. RMLKGC drops this deprecated option, so a transformation is required when a
mapping still uses Literals for rml:source. If we encounter such a case, we replace the Literal
by a DCAT access description. If the path to the file is a relative path, we cannot use DCAT since
there is no base IRI available to resolve the relative path against. To overcome this problem, an
RMLKGC Relative Path Source was introduced to handle this case (footnote 3).
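
The choice between the two targets can be sketched as below. This is only an illustration
under stated assumptions: the source description is modelled as a plain dictionary, POSIX
path rules decide what counts as absolute, and the terms dcat:Distribution, dcat:downloadURL,
rml:RelativePathSource and rml:path are assumptions about how such a description could
look, not a statement of what RMLMapper emits. The function name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def translate_source_literal(path: str) -> dict:
    """Replace a string Literal in rml:source by a structured description."""
    if PurePosixPath(path).is_absolute():
        # An absolute path can be turned into a file IRI for a DCAT description.
        return {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": "file://" + quote(path),
        }
    # A relative path has no base IRI to resolve against, so it is kept as-is.
    return {
        "@type": "rml:RelativePathSource",
        "rml:path": path,
    }
```

For example, translate_source_literal("people.csv") yields a Relative Path Source that
still carries the original string, so the engine can resolve it at execution time.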

rml:query The RMLio specification (footnote 4) does not indicate how queries must be specified in
the case of relational databases or SPARQL services. Over the past decade, an unofficial predicate
rml:query was used to address this problem. This way, queries could be specified in the
RMLio mapping to access such sources. The W3C Community Group on Knowledge Graph
Construction incorporated the query property into the iterator, but there is still discussion around
this approach (footnote 5). In this work, we perform this transformation and add the necessary
access descriptions for the relational database or SPARQL service if needed.

rr:tableName & rr:LogicalTable The R2RML Logical Table and its rr:tableName shortcut must
be translated completely into an RMLKGC Logical Source since a Logical Source is an expansion of
an R2RML Logical Table. We perform this transformation by moving the query from the R2RML
3
  https://github.com/kg-construct/rml-io/issues/36
4
  https://rml.io/specs/rml/v/1.1.2/
5
  https://github.com/kg-construct/rml-io/issues/28
Logical Table into the iterator and adding the access description of the database using the D2RQ
ontology (footnote 6). The reference formulation is set to rml:SQL2008Query. For rr:tableName, we
also add the access description using the D2RQ ontology and place the table name in
the iterator. In that case, however, the reference formulation is set to rml:SQL2008Table, allowing
RML engines to detect that they receive a table name instead of a SQL query.
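
A minimal sketch of this transformation is given below. It assumes the Logical Table and the
resulting Logical Source are modelled as plain dictionaries keyed by prefixed names, and
that the database details are passed in separately because R2RML mappings do not contain
them. The function name and the example values are hypothetical; d2rq:jdbcDSN and
d2rq:username are D2RQ properties.

```python
def translate_logical_table(logical_table: dict, database: dict) -> dict:
    """Turn an R2RML Logical Table into an RMLKGC-style Logical Source."""
    if "rr:sqlQuery" in logical_table:
        # A SQL query is moved into the iterator.
        iterator = logical_table["rr:sqlQuery"]
        formulation = "rml:SQL2008Query"
    elif "rr:tableName" in logical_table:
        # A table name is moved into the iterator as well, but flagged
        # differently so an engine can tell it is not a query.
        iterator = logical_table["rr:tableName"]
        formulation = "rml:SQL2008Table"
    else:
        raise ValueError("Logical Table needs rr:sqlQuery or rr:tableName")

    return {
        "@type": "rml:LogicalSource",
        "rml:source": {"@type": "d2rq:Database", **database},
        "rml:iterator": iterator,
        "rml:referenceFormulation": formulation,
    }


# Hypothetical database details supplied by the user.
example_db = {"d2rq:jdbcDSN": "jdbc:postgresql://localhost/example", "d2rq:username": "user"}
example = translate_logical_table({"rr:tableName": "PERSON"}, example_db)
```

Keeping the table name and the query in the same property, distinguished only by the
reference formulation, means the rest of the engine needs a single code path for SQL sources.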

2.2. Implementation in RMLMapper
Our translation is applied when RMLMapper parses the [R2]RML mappings. Each part of
the [R2]RML mapping which does not use RMLKGC is translated internally. This way,
RMLMapper internally operates only on the latest RMLKGC version of each mapping. Only for
R2RML do the database details need to be supplied by the user, as R2RML does not include the
database access information in its mappings. Our implementation in RMLMapper is written
in Java, released as v7.0.0, and available on GitHub (footnote 7) under the MIT license.
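
Deciding at parse time whether a mapping needs translation can be sketched as follows. The
sketch is written in Python for brevity, although RMLMapper itself is written in Java, and
it is not taken from the RMLMapper code base. The namespace IRIs are assumptions that do not
appear in this paper, and the function names are hypothetical.

```python
# Assumed namespace IRI per flavour (not stated in the paper).
FLAVOUR_NAMESPACES = {
    "http://www.w3.org/ns/r2rml#": "R2RML",
    "http://semweb.mmlab.be/ns/rml#": "RMLio",
    "http://semweb.mmlab.be/ns/ql#": "RMLio",
    "http://w3id.org/rml/": "RMLKGC",
}


def detect_flavours(iris) -> set:
    """Collect every flavour whose namespace occurs among the given IRIs."""
    found = set()
    for iri in iris:
        for namespace, flavour in FLAVOUR_NAMESPACES.items():
            if iri.startswith(namespace):
                found.add(flavour)
    return found


def needs_translation(iris) -> bool:
    """A mapping needs translation if it uses any flavour besides RMLKGC."""
    return bool(detect_flavours(iris) - {"RMLKGC"})
```

A set is returned rather than a single flavour because one mapping can combine terms from
more than one namespace.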


3. Evaluation
In this Section, we evaluate our translation approach on the RMLKGC test cases of the Knowledge
Graph Construction Workshop (KGCW) Challenge 2024 Track 1 (footnote 8) to verify our implementation
and identify which parts of the RMLKGC modules are (not) supported. Moreover, we have
validated the RDF output of all R2RML and RMLio test cases on RMLMapper for correctness to
ensure that our translation approach does not break the other [R2]RML flavours when translating
into RMLKGC.
   The KGCW Challenge 2024 Track 1 consists of 365 test cases from 5 different RMLKGC
modules: RMLKGC Core (238 test cases), RMLKGC IO (67 test cases), RMLKGC FNML (13 test
cases), RMLKGC CC (29 test cases), and RMLKGC Star (18 test cases). Each module provides a
set of test cases to evaluate the compliance of engines with the specification provided by the
module. RMLKGC Core has the most test cases because it contains the core functionality for
generating RDF using RMLKGC mappings, followed by RMLKGC IO, which focuses on accessing
the various data sources and targets used in RMLKGC mappings. The other modules have fewer
test cases; thus, engines supporting RMLKGC Core and RMLKGC IO already have a high
coverage of the new RMLKGC flavour. We calculate the coverage of each module by dividing the
number of passing test cases by the number of test cases per module. The Knowledge Graph
Construction W3C Community Group does not yet provide a detailed description of each test
case, but this is planned for the future.
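
The coverage calculation can be sketched as below. The module sizes are the ones listed
above; the function names are hypothetical, and the sketch contains no passing counts from
the evaluation.

```python
# Number of test cases per module, as listed for the KGCW Challenge 2024 Track 1.
MODULE_SIZES = {"Core": 238, "IO": 67, "FNML": 13, "CC": 29, "Star": 18}


def coverage(passing: int, total: int) -> float:
    """Coverage in percent: passing test cases divided by all test cases."""
    if total <= 0:
        raise ValueError("a module needs at least one test case")
    if not 0 <= passing <= total:
        raise ValueError("passing must lie between 0 and total")
    return 100.0 * passing / total


def total_coverage(passing_per_module: dict, sizes: dict) -> float:
    """Coverage over all modules; modules without an entry count as 0 passing."""
    passing = sum(passing_per_module.get(module, 0) for module in sizes)
    return coverage(passing, sum(sizes.values()))


# Toy numbers for illustration only: 3 passing out of 4 test cases.
print(coverage(3, 4))  # 75.0
```

Because the total is computed over all test cases rather than as a mean of the module
percentages, the larger modules weigh more heavily in the overall figure.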
   Table 3 shows the coverage of RMLMapper with all [R2]RML test cases with and without
our approach. Without our translation approach, RMLMapper achieves 0% coverage on the
RMLKGC test cases of the KGCW Challenge. RMLMapper passes 100% of the R2RML and RMLio
test cases. We achieve 98,70% coverage for RMLKGC Core and 50,75% coverage for RMLKGC IO.
RMLKGC Core has a few test cases where RMLMapper fails to provide the correct output: For

6
  http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#
7
  https://doi.org/10.5281/zenodo.11518178
8
  https://doi.org/10.5281/zenodo.10721874
Table 2
Translations from W3C Recommended R2RML into RMLKGC. R2RML’s specific access descriptions for
SQL tables (Logical Table, BaseTableOrView, R2RMLView, tableName) need to be transformed into an
RMLKGC Logical Source with a D2RQ Database access description for accessing SQL tables.
                     R2RML                               RMLKGC
                                           Classes
           rr:Literal                   rml:Literal
           rr:BlankNode                 rml:BlankNode
           rr:IRI                       rml:IRI
           rr:SQL2008                   rml:SQL2008Table or rml:SQL2008Query
           rr:TriplesMap                rml:TriplesMap
           rr:SubjectMap                rml:SubjectMap
           rr:PredicateObjectMap        rml:PredicateObjectMap
           rr:PredicateMap              rml:PredicateMap
           rr:ObjectMap                 rml:ObjectMap
           rr:TermMap                   rml:TermMap
           rr:GraphMap                  rml:GraphMap
           rr:Join                      rml:Join
           rr:RefObjectMap              rml:RefObjectMap
           rr:defaultGraph              rml:defaultGraph
                                          Properties
           rr:joinCondition             rml:joinCondition
           rr:parent                    rml:parent
           rr:child                     rml:child
           rr:parentTriplesMap          rml:parentTriplesMap
           rr:column                    rml:reference
           rr:class                     rml:class
           rr:constant                  rml:constant
           rr:datatype                  rml:datatype
           rr:graph                     rml:graph
           rr:graphMap                  rml:graphMap
           rr:language                  rml:language
           rr:object                    rml:object
           rr:objectMap                 rml:objectMap
           rr:predicate                 rml:predicate
           rr:predicateMap              rml:predicateMap
           rr:predicateObjectMap        rml:predicateObjectMap
           rr:subject                   rml:subject
           rr:subjectMap                rml:subjectMap
           rr:termType                  rml:termType
           rr:template                  rml:template
           rr:logicalTable              rml:logicalSource
                                      Transformations
           rr:BaseTableOrView          RMLKGC Logical Source + D2RQ Database
           rr:R2RMLView                RMLKGC Logical Source + D2RQ Database
           rr:LogicalTable             RMLKGC Logical Source + D2RQ Database
           rr:tableName                RMLKGC Logical Source + D2RQ Database
Table 3
Coverage results of the RMLKGC test cases with and without our translation approach by the RMLMapper.
Without translation, RMLMapper cannot execute any of the RMLKGC test cases. RMLKGC FNML, CC,
and Star modules are currently unsupported by RMLMapper. Total coverage of R2RML and RMLio test
cases is 100% and coverage of all RMLKGC test cases is 73,70%.
                         Test cases       Without translation   With translation
                        RMLKGC Core                        0%             98,70%
                        RMLKGC IO                          0%             50,75%
                        RMLio                            100%               100%
                        R2RML                            100%               100%


RMLKGC Core, the test cases RMLTC0010{a,b,c}-JSON fail for RMLMapper as they use the latest
IETF JSONPath expressions, which are not supported yet by RMLMapper. For RMLKGC IO,
new serialization and compression formats are not implemented in RMLMapper. Moreover,
compressed data sources cannot be accessed yet by RMLMapper. We did not yet implement specific
translations for FnO functions, provided by RMLKGC FNML, or for other modules such as
RMLKGC Star for RDF-Star support or RMLKGC CC to generate RDFS Collections & Containers.
Therefore, we expected that only a small set of test cases would succeed for these modules. The
total coverage of RMLKGC test cases we reach for RMLMapper is 73,70%. RMLMapper still passes
100% of the R2RML and RMLio test cases with our translation approach.


4. Conclusion
In this paper, we showed our approach for translating R2RML and RMLio into the latest RMLKGC
and evaluated it on the RMLKGC test cases. Thanks to our work, users can still execute their
existing RML mappings while the community works towards a standardization of RMLKGC
as a W3C Recommendation. In the future, we aim to support more RMLKGC modules besides
RMLKGC Core and IO, and to perform an evaluation of the translation itself, since this work
focuses on participating in the KGCW Challenge, which only evaluates the generated RDF of each
engine.


Acknowledgments
The described research activities were supported by SolidLab Vlaanderen (Flemish Government,
EWI and RRF project VV023/10). Dylan Van Assche is supported by the Special Research Fund
of Ghent University (footnote 9) under grant BOF20/DOC/132.


References
 [1] S. Das, S. Sundara, R. Cyganiak, R2RML: RDB to RDF Mapping Language, Working Group
     Recommendation, World Wide Web Consortium (W3C), 2012. URL: http://www.w3.org/TR/r2rml/.

9
    https://www.ugent.be/en/research/funding/bof/overview.htm

 [2] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle,
     RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data, in:
     C. Bizer, T. Heath, S. Auer, T. Berners-Lee (Eds.), Proceedings of the 7th Workshop on
     Linked Data on the Web, volume 1184 of CEUR Workshop Proceedings, CEUR, 2014. URL:
     http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf.
 [3] A. Iglesias-Molina, D. Van Assche, J. Arenas-Guerrero, B. De Meester, C. Debruyne,
     S. Jozashoori, P. Maria, F. Michel, D. Chaves-Fraga, A. Dimou, The RML Ontology: A
     Community-Driven Modular Redesign After a Decade of Experience in Mapping Heterogeneous
     Data to RDF, in: Submitted to ISWC2023, 2023.
 [4] D. Van Assche, T. Delva, G. Haesendonck, P. Heyvaert, B. De Meester, A. Dimou, Declarative
     RDF graph generation from heterogeneous (semi-)structured data: A systematic literature
     review, Journal of Web Semantics (2022). doi:10.1016/j.websem.2022.100753.
 [5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal, SDM-RDFizer: An
     RML Interpreter for the Efficient Creation of RDF Knowledge Graphs, in: Proceedings of
     the 29th ACM International Conference on Information & Knowledge Management, ACM,
     2020. doi:10.1145/3340531.3412881.
 [6] J. Arenas-Guerrero, D. Chaves-Fraga, J. Toledo, M. S. Pérez, O. Corcho, Morph-KGC:
     Scalable knowledge graph materialization with mapping partitions, Semantic Web (2022)
     1–20. doi:10.3233/sw-223135.
 [7] P. Heyvaert, B. De Meester, D. Van Assche, et al., RMLMapper, 2024. URL:
     https://github.com/RMLio/rmlmapper-java.
 [8] G. Haesendonck, W. Maroy, P. Heyvaert, R. Verborgh, A. Dimou, Parallel RDF generation
     from heterogeneous big data, in: S. Groppe, L. Gruenwald (Eds.), Proceedings of the
     International Workshop on Semantic Big Data - SBD '19, number 1 in SBD '19, ACM
     Press, Amsterdam, Netherlands, 2019. URL:
     https://biblio.ugent.be/publication/8619808/file/8659668.pdf. doi:10.1145/3323878.3325802.
 [9] A. Dimou, R. Verborgh, M. V. Sande, E. Mannens, R. V. de Walle, Machine-interpretable
     dataset and service descriptions for heterogeneous data access and retrieval, in:
     Proceedings of the 11th International Conference on Semantic Systems - SEMANTICS '15, ACM
     Press, 2015. doi:10.1145/2814864.2814873.
[10] R. Albertoni, D. Browning, S. Cox, A. Gonzalez Beltran, A. Perego, P. Winstanley, Data
     Catalog Vocabulary (DCAT) - Version 2, Recommendation, World Wide Web Consortium
     (W3C), 2020. URL: https://www.w3.org/TR/vocab-dcat/.
[11] G. Williams, SPARQL 1.1 Service Description, Recommendation, World Wide Web
     Consortium (W3C), 2013. URL: https://www.w3.org/TR/sparql11-service-description/.
[12] R. Cyganiak, C. Bizer, J. Garbers, O. Maresch, C. Becker, The D2RQ Mapping Language,
     Technical Report, FU Berlin, DERI, UCB, JP Morgan Chase, AGFA Healthcare, HP Labs,
     Johannes Kepler Universität Linz, 2012. URL: http://d2rq.org/d2rq-language.

</pre>