<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Backwards or Forwards? [R2]RML Backwards Compatibility in RMLMapper</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dylan Van Assche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jozef Jankaj</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ben De Meester</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Dept. Electronics &amp; Information Systems, Ghent University - imec</institution>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>During the past decade, RML was proposed as an extension of the W3C's R2RML Recommendation to support heterogeneous data sources. Although RML (RMLio flavour) is not a W3C Recommendation, it gained a lot of traction and has been extended by the KG-Construct W3C Community Group as RMLKGC. This currently results in three main flavours (i.e., R2RML, RMLio, and KG-Construct's RMLKGC) in use among users of these mapping languages. Consequently, many existing mappings cannot be executed by all existing [R2]RML engines, since the engines implement only one [R2]RML flavour. In this paper, we implement within RMLMapper a translation of all flavours into the latest RML flavour, RMLKGC. This way, any mapping, no matter which [R2]RML flavour it was written in, can be executed by RMLMapper. We discuss our translation approach and evaluate it on the KGCW Challenge 2024 Track 1 test cases and on all available RMLio and R2RML test cases to verify our translation into RMLKGC. Without changing the [R2]RML mappings, our translation of R2RML and RMLio achieves 98.7% coverage of the RMLKGC Core module and 50.75% coverage of the RMLKGC IO module. We reach a total coverage of 73.70% over all RMLKGC test cases and 100% coverage of the RMLio and R2RML test cases. Thanks to our translation approach, we can reuse the same RMLMapper for all flavours without requiring users to change their mappings. In the future, we aim to support all RMLKGC modules while keeping support for the other flavours.</p>
      </abstract>
      <kwd-group>
        <kwd>RML</kwd>
        <kwd>Knowledge Graph Construction</kwd>
        <kwd>RMLMapper</kwd>
        <kwd>Challenge</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Users with existing RML mappings are limited in the RML engines they can use, since no
engine supports all [R2]RML flavours. RMLKGC can represent all elements present in R2RML and
RMLio through translation, because each newer specification is backwards compatible with its predecessor.
However, no engine takes advantage of this backwards compatibility to represent the other
flavours as RMLKGC. Therefore, users must first translate all their existing mappings if they
want to use a different engine that supports a newer flavour. Moreover, we do not want to
deprecate support for the R2RML and RMLio flavours in our existing mapping engine RMLMapper [7]
when adding support for RMLKGC. We overcome this problem in RMLMapper by translating the
R2RML and RMLio flavours into RMLKGC [3]. By applying our translation, RMLMapper can now
read any mapping written in R2RML, RMLio, or RMLKGC without requiring the user to change it.</p>
      <p>Thanks to our translation in RMLMapper, users can still run their existing [R2]RML mappings
while the Knowledge Graph Construction community can work towards the standardization of
RML as a W3C Recommendation in the future.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Approach</title>
      <sec id="sec-2-1">
        <title>2.1. Translation</title>
        <p>In this section, we present our translation of R2RML and RMLio into RMLKGC. This translation
is implemented in RMLMapper and applied automatically without any user intervention.
Therefore, users do not have to migrate their existing [R2]RML mappings to RMLKGC
immediately.</p>
        <p>We compare the different [R2]RML flavours with each other to establish a translation path
towards RMLKGC. Tables 1 &amp; 2 list all translations required to translate R2RML and RMLio
into RMLKGC. The largest translations concern the removal of the R2RML Logical Table in favour
of the RMLKGC Logical Source, the ontology prefixes, which differ between flavours, and
the access descriptions for data sources. In this work, we cover RMLKGC Core, which focuses
on RDF generation with RML Triples Maps, and the parts of the RMLKGC IO module used for
accessing data sources and targets, because these overlap with the R2RML and RMLio
flavours. Translating avoids having to implement every flavour separately in engines such
as RMLMapper. In the future, we will extend our work to the other RMLKGC modules.</p>
        <p>Prefixes. Every [R2]RML flavour has its own prefix, which allows us to recognize which flavour
of [R2]RML was used to create the mapping. Since most terms in the ontologies are similar
to each other, we translate the prefixes of R2RML and RMLio into RMLKGC by replacing them
with the new prefix. However, some changes in RMLKGC require additional transformations,
which we describe in the next paragraphs.</p>
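        <p>A minimal sketch in Python of how flavour detection and prefix rewriting could work, assuming a mapping is held as a list of (subject, predicate, object) tuples of full IRI strings. The namespace IRIs are those of R2RML, RMLio (including its ql: reference formulations), and RMLKGC; the function names and the set of terms excluded from plain renaming are hypothetical and do not reflect RMLMapper's Java implementation.</p>
        <preformat>
```python
# Sketch only: flavour detection and prefix rewriting for [R2]RML terms.
# Assumption: a mapping is a list of (subject, predicate, object) tuples of
# full IRI strings; this is not RMLMapper's internal (Java) data model.

R2RML_NS = "http://www.w3.org/ns/r2rml#"
RMLIO_NS = "http://semweb.mmlab.be/ns/rml#"
QL_NS = "http://semweb.mmlab.be/ns/ql#"  # RMLio reference formulations
RMLKGC_NS = "http://w3id.org/rml/"

FLAVOUR_OF_NS = {
    R2RML_NS: "R2RML",
    RMLIO_NS: "RMLio",
    QL_NS: "RMLio",
    RMLKGC_NS: "RMLKGC",
}

# Terms that are not simply renamed but need a dedicated transformation
# (hypothetical, illustrative selection; not an exhaustive list).
NEEDS_EXTRA_STEP = {
    R2RML_NS + "logicalTable",
    R2RML_NS + "tableName",
    R2RML_NS + "sqlQuery",
    RMLIO_NS + "query",
}


def detect_flavours(triples):
    """Return the flavours whose namespace occurs in any term of the mapping."""
    found = set()
    for triple in triples:
        for term in triple:
            for ns, flavour in FLAVOUR_OF_NS.items():
                if term.startswith(ns):
                    found.add(flavour)
    return found


def newest_flavour(triples):
    """RMLio reuses R2RML terms, so the newest flavour present decides."""
    found = detect_flavours(triples)
    for flavour in ("RMLKGC", "RMLio", "R2RML"):
        if flavour in found:
            return flavour
    return None


def to_rmlkgc(term):
    """Swap an R2RML/RMLio namespace for the RMLKGC one, keeping the local name."""
    if term in NEEDS_EXTRA_STEP:
        return term  # left as-is here; handled by a dedicated transformation
    for old_ns in (R2RML_NS, RMLIO_NS, QL_NS):
        if term.startswith(old_ns):
            return RMLKGC_NS + term[len(old_ns):]
    return term
```
        </preformat>
        <p>Because RMLio reuses the R2RML vocabulary for Triples Maps, one mapping can contain terms from several namespaces, which is why the sketch lets the newest flavour present decide. For example, to_rmlkgc applied to the ql:JSONPath IRI returns http://w3id.org/rml/JSONPath.</p>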
        <p>Literals in rml:source. RMLio mappings commonly use a Literal as the RMLio Logical
Source’s rml:source to describe the path to the file that the mapping uses as its data source.
This approach was deprecated in 2015 [9] and replaced by access descriptions such as DCAT [10],
SD [11], or D2RQ [12] to access heterogeneous data sources, e.g., files, SPARQL services, or
databases. RMLKGC drops this deprecated option, which requires a transformation when a
mapping still uses a Literal for rml:source. If we encounter such a case, we replace it with a
DCAT access description. If the path to the file is relative, we cannot use DCAT, since
there is no base IRI available to resolve the relative path against. To overcome this problem,
RMLKGC introduced the Relative Path Source to handle this case.
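A minimal sketch in Python of this decision, assuming POSIX-style paths and a dict-based representation of the access description. The choice of dcat:Distribution with dcat:downloadURL for absolute locations, and of rml:RelativePathSource with rml:root rml:MappingDirectory and rml:path for relative paths, is an illustrative assumption; the function name is hypothetical and this is not RMLMapper's implementation.
        <preformat>
```python
# Sketch only: replace a deprecated Literal rml:source by an access description.
# Assumptions: POSIX-style paths; descriptions are dicts keyed by prefixed names;
# the DCAT and Relative Path Source terms below are illustrative choices.
from pathlib import PurePosixPath
from urllib.parse import urlparse


def translate_literal_source(path):
    """Return an RMLKGC-style source description for a Literal file path."""
    if urlparse(path).scheme in ("http", "https", "file"):
        # Already an absolute URL: usable directly in a DCAT distribution.
        return {"@type": "dcat:Distribution", "dcat:downloadURL": path}
    posix_path = PurePosixPath(path)
    if posix_path.is_absolute():
        # Absolute local path: express it as a file: URL for DCAT.
        return {"@type": "dcat:Distribution",
                "dcat:downloadURL": posix_path.as_uri()}
    # Relative path: no base IRI to resolve it against, so keep it
    # relative to the directory that contains the mapping.
    return {"@type": "rml:RelativePathSource",
            "rml:root": "rml:MappingDirectory",
            "rml:path": path}
```
        </preformat>
For example, translate_literal_source("people.csv") yields a relative path source, while "/data/people.csv" becomes a DCAT distribution with download URL file:///data/people.csv.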
        </p>
        <p>rml:query. The RMLio specification does not indicate how queries must be specified for
relational databases or SPARQL services. Over the past decade, the unofficial predicate
rml:query was used to address this problem, so that queries could be specified in the
RMLio mapping to access such sources. The W3C Community Group on Knowledge Graph
Construction incorporated the query into the iterator, although this approach is still under
discussion. In this work, we perform this transformation and add the necessary access
descriptions for the relational database or SPARQL service if needed.
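A minimal sketch in Python of moving rml:query into the iterator for the relational-database case, assuming a Logical Source held as a dict of prefixed names. The helper name, the placeholder d2rq:Database description, and the use of rml:SQL2008Query as reference formulation are assumptions for illustration; SPARQL sources, where an RMLio mapping may already declare an iterator over the query results, are deliberately not covered.
        <preformat>
```python
# Sketch only: move an RMLio rml:query into the RMLKGC iterator.
# Assumptions: a Logical Source is a dict keyed by prefixed names; only the
# relational-database case is handled; rml:SQL2008Query is assumed as the
# reference formulation for SQL queries.


def translate_rml_query(logical_source):
    """Return a translated copy; the input dict is left unchanged."""
    translated = dict(logical_source)
    query = translated.pop("rml:query", None)
    if query is None:
        return translated  # no rml:query: nothing to move
    if "rml:iterator" in translated:
        # e.g. SPARQL results with their own iterator: out of scope here
        raise ValueError("rml:query together with rml:iterator is not handled")
    translated["rml:iterator"] = query
    translated["rml:referenceFormulation"] = "rml:SQL2008Query"
    # Add a placeholder D2RQ access description if none is present.
    translated.setdefault("rml:source", {"@type": "d2rq:Database"})
    return translated
```
        </preformat>
Returning a copy keeps the parsed mapping intact, so such a step can run at parse time without side effects on the user's mapping.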
        </p>
        <p>rr:tableName &amp; rr:LogicalTable. The R2RML Logical Table and its rr:tableName shortcut must
be translated completely into an RMLKGC Logical Source, since a Logical Source is an extension of
an R2RML Logical Table. We perform this transformation by moving the query from the R2RML
Logical Table into the iterator and adding the access description of the database using the D2RQ
ontology [12]. The reference formulation is set to rml:SQL2008Query. For rr:tableName, we
also add the access description using the D2RQ ontology and place the table name in
the iterator as well. However, the reference formulation is then set to rml:SQL2008Table, allowing RML
engines to detect that they receive a table name instead of a SQL query.</p>
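        <p>The following Python sketch illustrates this Logical Table translation under stated assumptions: terms are dicts of prefixed names, the D2RQ properties of the database description (e.g., d2rq:jdbcDSN) come from user-supplied connection details because R2RML mappings do not carry them, and rml:SQL2008Query is assumed as the reference formulation for R2RML views. The function name is hypothetical.</p>
        <preformat>
```python
# Sketch only: translate an R2RML Logical Table into an RMLKGC Logical Source.
# Assumptions: terms are dicts keyed by prefixed names; the database argument
# holds user-supplied D2RQ connection properties, since R2RML mappings lack them.


def translate_logical_table(logical_table, database):
    """Return an RMLKGC-style Logical Source for an R2RML Logical Table."""
    source = {"@type": "d2rq:Database", **database}
    if "rr:sqlQuery" in logical_table:
        # R2RML view: the SQL query moves into the iterator.
        return {
            "rml:source": source,
            "rml:iterator": logical_table["rr:sqlQuery"],
            "rml:referenceFormulation": "rml:SQL2008Query",
        }
    if "rr:tableName" in logical_table:
        # Base table: the table name moves into the iterator, and the
        # reference formulation tells engines it is a table, not a query.
        return {
            "rml:source": source,
            "rml:iterator": logical_table["rr:tableName"],
            "rml:referenceFormulation": "rml:SQL2008Table",
        }
    raise ValueError("a Logical Table needs rr:sqlQuery or rr:tableName")
```
        </preformat>
        <p>For example, a Logical Table with rr:tableName "EMP" becomes a Logical Source with rml:iterator "EMP" and reference formulation rml:SQL2008Table, matching the table-name case described above.</p>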
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Implementation in RMLMapper</title>
        <p>Our translation is applied when RMLMapper parses the [R2]RML mappings. Each part of
an [R2]RML mapping that does not use RMLKGC is translated internally. This way,
RMLMapper internally operates only on RMLKGC, regardless of the [R2]RML flavour a mapping is
written in. Only for R2RML must the user supply the database details, as R2RML mappings do not
include database access information. Our implementation in RMLMapper is written
in Java, released as v7.0.0, and available on GitHub [7] under the MIT license.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <p>In this section, we evaluate our translation approach on the RMLKGC test cases of the Knowledge
Graph Construction Workshop (KGCW) Challenge 2024 Track 1 to verify our implementation
and identify which parts of the RMLKGC modules are (not) supported. Moreover, we
validated the RDF output of RMLMapper for all R2RML and RMLio test cases to ensure that
our translation approach does not break the other [R2]RML flavours when translating into
RMLKGC.</p>
      <p>The KGCW Challenge 2024 Track 1 consists of 365 test cases from 5 different RMLKGC
modules: RMLKGC Core (238 test cases), RMLKGC IO (67 test cases), RMLKGC FNML (13 test
cases), RMLKGC CC (29 test cases), and RMLKGC Star (18 test cases). Each module provides a
set of test cases to evaluate the compliance of engines with the specification of that
module. RMLKGC Core has the most test cases because it contains the core functionality for
generating RDF with RMLKGC mappings, followed by RMLKGC IO, which focuses on accessing
the various data sources and targets used in RMLKGC mappings. The other modules have fewer
test cases; thus, engines supporting RMLKGC Core and RMLKGC IO already achieve a high
coverage of the new RMLKGC flavour. We calculate the coverage of each module by dividing the
number of passing test cases by the number of test cases of that module. The Knowledge Graph
Construction W3C Community Group does not yet provide a detailed description of each test case,
but this is planned for the future.</p>
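      <p>As a worked check of this metric, the Python sketch below computes coverage as passing test cases divided by test cases per module, and total coverage as all passing test cases divided by all 365 test cases, which is our reading (an assumption) of how the total is aggregated. The module sizes are those listed above; the function names are hypothetical. The passing counts are not stated in the text, but the nearest whole numbers can be recovered from the reported percentages.</p>
      <preformat>
```python
# Worked check of the coverage metric used in the evaluation.
# Module sizes come from the text; the function names are hypothetical.

MODULE_SIZES = {"Core": 238, "IO": 67, "FNML": 13, "CC": 29, "Star": 18}


def coverage(passing, total):
    """Coverage in percent: passing test cases / test cases, two decimals."""
    return round(100 * passing / total, 2)


def total_coverage(passing_per_module):
    """Assumed aggregation: all passing test cases / all test cases."""
    passing = sum(passing_per_module.get(module, 0) for module in MODULE_SIZES)
    return coverage(passing, sum(MODULE_SIZES.values()))


def implied_passing(percent, total):
    """Whole number of passing test cases closest to a reported percentage."""
    return round(percent * total / 100)
```
      </preformat>
      <p>For example, implied_passing(50.75, 67) returns 34, and coverage(34, 67) returns 50.75 again; likewise, implied_passing(73.70, 365) returns 269 passing RMLKGC test cases overall.</p>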
      <p>Table 3 shows the coverage of RMLMapper on all [R2]RML test cases with and without
our approach. Without our translation approach, RMLMapper achieves 0% coverage on the
RMLKGC test cases of the KGCW Challenge, while it passes 100% of the R2RML and RMLio
test cases. With our translation approach, we achieve 98.70% coverage for RMLKGC Core and
50.75% coverage for RMLKGC IO.</p>
      <p>RMLKGC Core has a few test cases where RMLMapper fails to provide the correct output:
test cases RMLTC0010{a,b,c}-JSON fail because they use the latest IETF JSONPath
expressions, which RMLMapper does not support yet. For RMLKGC IO, new serialization and
compression formats are not implemented in RMLMapper, and compressed data sources cannot
be accessed by RMLMapper yet. We have not yet implemented specific translations for FnO
functions, provided by the RMLKGC FNML module, nor for other modules such as RMLKGC Star
for RDF-star support or RMLKGC CC for generating RDF Collections &amp; Containers.
Therefore, we expected only a small number of test cases of these modules to succeed. The total
coverage of RMLKGC test cases we reach with RMLMapper is 73.70%. RMLMapper still passes
100% of the R2RML and RMLio test cases with our translation approach.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we showed our approach for translating R2RML and RMLio into the latest RMLKGC
and evaluated it on the RMLKGC test cases. Thanks to our work, users can still execute their
existing [R2]RML mappings while the community works towards the standardization of RMLKGC
as a W3C Recommendation. In the future, we aim to support more RMLKGC modules besides
RMLKGC Core and IO, and to evaluate the translation itself, since in this work we focused on
participating in the KGCW Challenge, which only evaluates the RDF generated by each
engine.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The described research activities were supported by SolidLab Vlaanderen (Flemish Government,
EWI and RRF project VV023/10). Dylan Van Assche is supported by the Special Research Fund
of Ghent University (https://www.ugent.be/en/research/funding/bof/overview.htm) under grant
BOF20/DOC/132.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          , Working Group Recommendation,
          <source>World Wide Web Consortium (W3C)</source>
          ,
          <year>2012</year>
          . URL: http://www.w3.org/TR/r2rml/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van de Walle, RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data, in: C. Bizer, T. Heath, S. Auer, T. Berners-Lee (Eds.), Proceedings of the 7th Workshop on Linked Data on the Web, volume 1184 of CEUR Workshop Proceedings, CEUR, 2014. URL: http://ceur-ws.org/Vol-1184/ldow2014_paper_01.pdf.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] A. Iglesias-Molina, D. Van Assche, J. Arenas-Guerrero, B. De Meester, C. Debruyne, S. Jozashoori, P. Maria, F. Michel, D. Chaves-Fraga, A. Dimou, The RML Ontology: A Community-Driven Modular Redesign After a Decade of Experience in Mapping Heterogeneous Data to RDF, in: Submitted to ISWC 2023, 2023.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] D. Van Assche, T. Delva, G. Haesendonck, P. Heyvaert, B. De Meester, A. Dimou, Declarative RDF graph generation from heterogeneous (semi-)structured data: A systematic literature review, Journal of Web Semantics (2022). doi:10.1016/j.websem.2022.100753.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] E. Iglesias, S. Jozashoori, D. Chaves-Fraga, D. Collarana, M.-E. Vidal, SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs, in: Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management, ACM, 2020. doi:10.1145/3340531.3412881.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] J. Arenas-Guerrero, D. Chaves-Fraga, J. Toledo, M. S. Pérez, O. Corcho, Morph-KGC: Scalable knowledge graph materialization with mapping partitions, Semantic Web (2022) 1–20. doi:10.3233/sw-223135.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] P. Heyvaert, B. De Meester, D. Van Assche, et al., RMLMapper, 2024. URL: https://github.com/RMLio/rmlmapper-java.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] G. Haesendonck, W. Maroy, P. Heyvaert, R. Verborgh, A. Dimou, Parallel RDF generation from heterogeneous big data, in: S. Groppe, L. Gruenwald (Eds.), Proceedings of the International Workshop on Semantic Big Data - SBD '19, number 1 in SBD '19, ACM Press, Amsterdam, Netherlands, 2019. URL: https://biblio.ugent.be/publication/8619808/file/8659668.pdf. doi:10.1145/3323878.3325802.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] A. Dimou, R. Verborgh, M. V. Sande, E. Mannens, R. V. de Walle, Machine-interpretable dataset and service descriptions for heterogeneous data access and retrieval, in: Proceedings of the 11th International Conference on Semantic Systems - SEMANTICS '15, ACM Press, 2015. doi:10.1145/2814864.2814873.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] R. Albertoni, D. Browning, S. Cox, A. Gonzalez Beltran, A. Perego, P. Winstanley, Data Catalog Vocabulary (DCAT) - Version 2, Recommendation, World Wide Web Consortium (W3C), 2020. URL: https://www.w3.org/TR/vocab-dcat/.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] G. Williams, SPARQL 1.1 Service Description, Recommendation, World Wide Web Consortium (W3C), 2013. URL: https://www.w3.org/TR/sparql11-service-description/.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] R. Cyganiak, C. Bizer, J. Garbers, O. Maresch, C. Becker, The D2RQ Mapping Language, Technical Report, FU Berlin, DERI, UCB, JP Morgan Chase, AGFA Healthcare, HP Labs, Johannes Kepler Universität Linz, 2012. URL: http://d2rq.org/d2rq-language.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>