<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Why to Tie to a Single Data Mapping Language? Enabling a Transformation from ShExML to RML</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Herminio García-González</string-name>
          <email>herminio.garciagonzalez@kazernedossin.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anastasia Dimou</string-name>
          <email>anastasia.dimou@kuleuven.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>KU Leuven, Department of Computer Science</institution>
          ,
          <addr-line>Sint-Katelijne-Waver</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kazerne Dossin: Memorial, Museum and Research Centre on Holocaust and Human Rights</institution>
          ,
          <addr-line>Mechelen</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Different mapping languages (e.g., RML, SPARQL-Generate and ShExML) have appeared in recent years, covering different use cases, scenarios and functionalities. However, users cannot seamlessly switch between these mapping languages. In this paper, we propose a translation from ShExML to RML, letting users benefit from the usability of ShExML and from the wide support and functionalities of RML.</p>
        <p>Keywords: declarative mapping rules, translation, data mapping languages, ShExML, RML</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Declarative mapping languages have been increasingly adopted in recent years in different
fields (e.g., DBpedia [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], digital heritage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or the railway domain [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). These mapping
languages can be used to specify declarative mapping rules which allow for a modifiable,
flexible, repeatable and shareable workflow when integrating heterogeneous data sources in a
Knowledge Graph (KG) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], superseding imperative solutions. However, not all mapping
languages cover the same functionalities, leading to diverse mapping languages. The consequent
lack of interoperability between the different languages ties users to a certain specification,
preventing them from switching between mapping languages.
      </p>
      <p>
        Among the different declarative mapping languages, RML [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is the one that has been the
most adopted by the community. More than 20 associated systems implement RML, allowing
different approaches and optimisations, which makes RML a very reliable and long-term solution.
However, its syntax, based on RDF, tends to be very verbose and thus not so friendly for humans.
In this context, ShExML appeared with a clear focus on usability, letting users be more
productive when developing mapping rules [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, its specification has only
one conformant engine, which does not cover all the functionalities that users could need, nor can
users take advantage of all the optimisations offered by the different RML implementations.
      </p>
      <p>Author homepages: https://herminiogarcia.com/ (H. García-González); https://natadimou.com (A. Dimou)</p>
      <p>© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings.</p>
      <p>Thus, it is important to build bridges to foster adoption and trust in the declarative KG
construction ecosystem while supporting users to effectively and seamlessly cover their needs.
With this objective in mind, in this paper, we propose a conversion from ShExML to RML, letting
users rapidly sketch their mapping rules in ShExML and then switch to RML to take advantage
of its plethora of implementations. A live demo is available at the ShExML playground<sup>4</sup>.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Different mapping languages have been proposed in the past [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] based on dedicated languages,
e.g., RML extending R2RML to support heterogeneous data sources; repurposed languages,
e.g., SPARQL-Generate [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] extending the SPARQL query language or ShExML using Shape
Expressions (ShEx)-based syntax [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]; or serialisation languages like YARRRML [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] extending
YAML. As YARRRML is a tightly coupled serialisation for RML aiming for better readability
by humans, translating YARRRML to RML is straightforward<sup>5</sup>. Besides the YARRRML to RML
translation, an RML to SPARQL-Generate<sup>6</sup> translation exists, enabling the use of RML mapping
rules with the SPARQL-Generate implementation, but the inverse translation is not tackled.
      </p>
      <p>
        Recently, an ontology was presented as a meta-language to represent the functionalities of
existing mapping languages [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Future implementations will offer translations from and to the
targeted mapping languages. Despite covering all mapping rules, being a meta-language involves
more verbose constructions than existing languages, surpassing even a verbose language like
RML. In addition, covering the expressiveness of all potential mapping languages could be a
difficult task, if it is approachable at all. Thus, specific translations are still necessary.
      </p>
      <p>In this paper, we propose a specific translation from ShExML to RML to combine the usability
of ShExML with the sustainability and wide support of RML implementations.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Brief introduction of RML and ShExML</title>
      <p>The two languages offer different syntaxes for integrating heterogeneous data into a KG. The
example in Figure 1 generates a triple for each entity, showing their correspondences.</p>
      <p>On the one hand, RML<sup>7</sup> offers a syntax based on RDF, which ties the representation of
the language to the RDF serialisation formats (e.g., Turtle, N-Triples, RDF/XML). For
describing the mapping process, it relies on classes such as rr:SubjectMap, rr:PredicateMap and
rr:ObjectMap, which define how the subject, predicate and object of a triple will be generated;
rr:PredicateObjectMap, which relates a predicate with an object; rml:LogicalSource, which describes
an input data source; and rr:TriplesMap, which defines how a set of triples will be generated for
a certain data source. Each class offers different fields to modify aspects of the generation.</p>
      <p>On the other hand, ShExML<sup>8</sup> separates the constructions intended to extract values
(declarations) from those intended to generate triples (shapes). ShExML offers: SOURCE, which points to a data
source; ITERATOR, which defines a query whose results need to be iterated; FIELD, which holds the
query to a value that will be output in a triple; and EXPRESSION, which relates the sources
with the iterator and field queries. Then, the shapes part allows formatting the output as triples
using already defined expressions for subjects, predicates and objects.</p>
      <p><sup>4</sup> http://shexml.herminiogarcia.com/editor/; <sup>5</sup> https://github.com/RMLio/yarrrml-parser; <sup>6</sup> https://github.com/sparql-generate/rml-to-sparql-generate; <sup>7</sup> https://rml.io/specs/rml/; <sup>8</sup> http://shexml.herminiogarcia.com/spec/</p>
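      <p>The relationship between the declarations described above can be sketched with a small data model. This is only an illustrative sketch: the class names, the resolve helper and the example.com URL are assumptions made for this example and are not part of the ShExML specification or engine. It shows how a reference such as films.name, used in a shape, can be followed through an EXPRESSION to the SOURCE, ITERATOR and FIELD it depends on.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str   # name used in shapes, e.g. "id"
    query: str  # query relative to the iterator, e.g. "@id"


@dataclass(frozen=True)
class Iterator:
    name: str
    query: str     # query whose results are iterated, e.g. "xpath: //film"
    fields: tuple  # tuple of Field objects


@dataclass(frozen=True)
class Source:
    name: str
    url: str


@dataclass(frozen=True)
class Expression:
    name: str
    source: str    # name of a declared Source
    iterator: str  # name of a declared Iterator


def resolve(reference, sources, iterators, expressions):
    """Resolve 'expression.field' into (source URL, iterator query, field query)."""
    expr_name, field_name = reference.split(".")
    expr = expressions[expr_name]
    iterator = iterators[expr.iterator]
    field = next(f for f in iterator.fields if f.name == field_name)
    return sources[expr.source].url, iterator.query, field.query


# Hypothetical declarations, kept in dictionaries keyed by name.
sources = {"films_file": Source("films_file", "https://example.com/films.xml")}
iterators = {
    "film": Iterator("film", "xpath: //film",
                     (Field("id", "@id"), Field("name", "name")))
}
expressions = {"films": Expression("films", "films_file", "film")}

print(resolve("films.name", sources, iterators, expressions))
```
      </preformat>
      <p>Keeping the declarations separate from the shapes, as in this sketch, means that a shape only needs the expression name, while the source and query details stay in one place.</p>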
    </sec>
    <sec id="sec-4">
      <title>4. Translating ShExML constructions to RML rules</title>
      <p>
        As ShExML follows a ShEx-based syntax, a dedicated parser fitting the ShExML grammar is
required. For this purpose, the translator is embedded in the ShExML engine, taking advantage
of the already developed modules. Once the abstract tree is generated the translator traverses it
to generate the triples that conform an RML rules set. This process is similar to how the ShExML
engine outputs the mapped RDF but this time it generates RML. The translator performs a one to
one translation of the diferent directives, taking into account that in many cases information in
ShExML rules is dispersed across diferent directives in comparison with their RML counterparts
(see Figure 1 for a diagram on how ShExML constructions are translated to RML). A live demo
of this algorithm can be tested on the ShExML playground9. To verify the validity of the
performed translations we developed a set of test cases10 that cover diferent aspects of ShExML
functionality. The test cases convert the ShExML rules to RML rules. The latter are executed in
the rml-mapper processor11 and the output is compared to the output of the ShExML engine.
      </p>
      <p><sup>9</sup> http://shexml.herminiogarcia.com/editor/; <sup>10</sup> https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/com/herminiogarcia/shexml/rml; <sup>11</sup> https://github.com/RMLio/rmlmapper-java</p>
      <p><bold>Non-translatable directives</bold></p>
      <p>
Even though many declarative mapping languages share the same foundations and a common
iteration model [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], they have evolved in different directions, covering different
functionalities. Therefore, unlike YARRRML, not all ShExML features can be directly translated to RML
statements, or translating them would require overly complex processes.
      </p>
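      <p>The translation and verification steps described in this section can be sketched as follows. This is a simplified illustration, not the translator embedded in the ShExML engine: the ShapeRule class, the to_rml and same_output functions, the ex: prefix and the generated node names are all assumptions made for this example. Only the RML and R2RML terms (rr:TriplesMap, rml:logicalSource, rml:source, rml:iterator, rml:referenceFormulation, rr:subjectMap, rr:template, rr:predicateObjectMap, rr:predicate, rr:objectMap, rml:reference) come from the respective vocabularies. The sketch gathers the information that ShExML spreads over several directives and emits it as the triples of one triples map.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeRule:
    """Flattened view of one ShExML shape with its expressions resolved."""
    source_url: str           # taken from SOURCE
    iterator_query: str       # taken from ITERATOR
    formulation: str          # e.g. "ql:XPath" or "ql:JSONPath"
    subject_prefix: str       # IRI prefix used to build the subject
    subject_field: str        # FIELD query that yields the subject identifier
    predicate_objects: tuple  # pairs of (predicate, FIELD query)


def to_rml(rule, map_id="ex:FilmsMap"):
    """Emit RML triples, as tuples of strings, for a single shape."""
    ls, sm = f"{map_id}_ls", f"{map_id}_sm"
    triples = [
        (map_id, "rdf:type", "rr:TriplesMap"),
        (map_id, "rml:logicalSource", ls),
        (ls, "rml:source", f'"{rule.source_url}"'),
        (ls, "rml:iterator", f'"{rule.iterator_query}"'),
        (ls, "rml:referenceFormulation", rule.formulation),
        (map_id, "rr:subjectMap", sm),
        (sm, "rr:template", f'"{rule.subject_prefix}{{{rule.subject_field}}}"'),
    ]
    for i, (predicate, field_query) in enumerate(rule.predicate_objects):
        pom, om = f"{map_id}_pom{i}", f"{map_id}_om{i}"
        triples += [
            (map_id, "rr:predicateObjectMap", pom),
            (pom, "rr:predicate", predicate),
            (pom, "rr:objectMap", om),
            (om, "rml:reference", f'"{field_query}"'),
        ]
    return triples


def same_output(triples_a, triples_b):
    """Order-insensitive comparison; assumes neither output has blank nodes."""
    return set(triples_a) == set(triples_b)


# Hypothetical rule: one subject per film and one predicate-object pair.
rule = ShapeRule("https://example.com/films.xml", "//film", "ql:XPath",
                 "http://example.com/", "@id", (("ex:name", "name"),))
rml = to_rml(rule)
for triple in rml:
    print(" ".join(triple), ".")
```
      </preformat>
      <p>In the spirit of the test cases described above, same_output could compare the RDF produced by an RML processor with the RDF produced by the ShExML engine, once both are loaded as sets of triples. A real comparison would need graph isomorphism if blank nodes are present.</p>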
      <p>
        Certain features are included in ShExML but are not directly supported in RML. These
include Matchers<sup>12</sup> (which substitute one string occurrence with another), String operations<sup>13</sup>
(which concatenate two strings), and Auto-increment ids<sup>14</sup>. All these functionalities could be covered
in RML using the FnO [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] extension, but that is beyond the scope of this translation. In 2021,
ShExML introduced a set of modifications to cope with the mapping challenges<sup>15</sup> [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It is now
possible to define dynamic datatypes<sup>16</sup> and language tags<sup>17</sup>. The RML
specification covers the generation of dynamic language tags but not dynamic datatypes. ShExML
also allows accessing fields outside the iteration scope. This is useful when accessing parent nodes
in JSON, as the JSONPath specification does not support it<sup>18</sup>. How to tackle
this with RML has been described [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], but it is not yet included in the specification.
      </p>
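      <p>To make the first three features concrete, the sketch below gives plain functions with the behaviour described above. It is an assumption-laden illustration: the function names and signatures are invented for this example, the matcher is assumed to replace a whole value when it equals a listed string, and none of this is the ShExML engine's code or an FnO definition. It only indicates the kind of behaviour that function-based extensions would have to reproduce on the RML side.</p>
      <preformat>
```python
from itertools import count


def matcher(replacements):
    """Return a function that maps each listed value to its replacement."""
    return lambda value: replacements.get(value, value)


def concat(left, right):
    """String operation: join two extracted values."""
    return left + right


def autoincrement(prefix="", start=1, step=1):
    """Yield identifiers such as 'film1', 'film2', and so on."""
    return (f"{prefix}{n}" for n in count(start, step))


# Hypothetical usage of the three helpers.
normalise = matcher({"Asturies": "Asturias"})
print(normalise("Asturies"))
print(concat("Jane", " Doe"))
film_ids = autoincrement("film")
print(next(film_ids), next(film_ids))
```
      </preformat>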
      <p>
        Finally, support for executing external functions was recently added to ShExML. However,
ShExML uses Scala classes that can be directly executed, while FnO relies on a function hub
where functions can be implemented in JavaScript or Java. This could be resolved by translating
the Scala code to Java and then pushing the Java code to the Function Hub [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In this work, we described a translation from ShExML to RML, letting users sketch mapping
rules rapidly in ShExML and then switch to RML. In addition, we discussed the limitations
of the current translation due to the different coverage of features between the two languages. In
the future, we will tackle the inverse translation, i.e., from RML to ShExML. This will let more
users adopt ShExML using their already created RML rules, as well as allowing them to switch
back and forth between the two languages depending on their needs.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was carried out in the context of the EHRI-3 project funded by the European
Commission under the call H2020-INFRAIA-2018-2020, with grant agreement ID 871111 and DOI
10.3030/871111.</p>
      <p><sup>12</sup> http://shexml.herminiogarcia.com/spec/#matcher; <sup>13</sup> http://shexml.herminiogarcia.com/spec/#string-operation; <sup>14</sup> http://shexml.herminiogarcia.com/spec/#autoincrement-ids; <sup>15</sup> https://kg-construct.github.io/workshop/2021/challenges.html; <sup>16</sup> http://shexml.herminiogarcia.com/spec/#data-types-dynamic-version; <sup>17</sup> http://shexml.herminiogarcia.com/spec/#lang-tags-dynamic-version; <sup>18</sup> https://goessner.net/articles/JsonPath/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Maroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          , E. Mannens,
          <article-title>Declarative data transformations for Linked Data generation: the case of DBpedia</article-title>
          ,
          <source>in: The Semantic Web - ISWC 2017</source>
          , Springer,
          <year>2017</year>
          .
          doi:<pub-id pub-id-type="doi">10.1007/978-3-319-68204-4_28</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>García-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Albarrán-Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Calleja-Puerta</surname>
          </string-name>
          ,
          <article-title>Converting Asturian Notaries Public deeds to Linked Data Using TEI and ShExML</article-title>
          ,
          <source>in: Proceedings of the Third Workshop on Humanities in the Semantic Web, CEUR</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Rojas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aguado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vasilopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Velitchkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Leveraging Semantic Technologies for Digital Interoperability in the European Railway Domain</article-title>
          ,
          <source>in: The Semantic Web-ISWC</source>
          , Springer,
          <year>2021</year>
          .
          doi:<pub-id pub-id-type="doi">10.1007/978-3-030-88361-4_38</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>Mapping Languages: Analysis of Comparative Characteristics</article-title>
          ,
          <source>in: Joint Proceedings of the 1st Workshop on Knowledge Graph Building and 1st Workshop on Large Scale RDF Analytics, CEUR</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          , E. Mannens, R. V. de Walle,
          <article-title>RML: A generic language for integrated RDF mappings of heterogeneous data</article-title>
          ,
          <source>in: Proceedings of the Workshop on Linked Data on the Web, CEUR</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>García-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Boneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staworko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra-Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M. C.</given-names>
            <surname>Lovelle</surname>
          </string-name>
          ,
          <article-title>ShExML: improving the usability of heterogeneous data mapping languages for first-time users</article-title>
          ,
          <source>PeerJ Computer Science</source>
          (
          <year>2020</year>
          )
          <elocation-id>e318</elocation-id>
          .
          doi:<pub-id pub-id-type="doi">10.7717/peerj-cs.318</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Delva</surname>
          </string-name>
          , G. Haesendonck,
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>Declarative RDF graph generation from heterogeneous (semi-)structured data: a Systematic Literature Review</article-title>
          ,
          <source>Journal of Web Semantics</source>
          (
          <year>2022</year>
          ). (In press).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lefrançois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bakerally</surname>
          </string-name>
          ,
          <article-title>A SPARQL extension for generating RDF from heterogeneous formats</article-title>
          ,
          <source>in: The Semantic Web-ESWC</source>
          , Springer,
          <year>2017</year>
          .
          doi:<pub-id pub-id-type="doi">10.1007/978-3-319-58068-5_3</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Labra Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Solbrig</surname>
          </string-name>
          ,
          <article-title>Shape Expressions: An RDF Validation and Transformation Language</article-title>
          ,
          <source>in: Proceedings of the 10th International Conference on Semantic Systems</source>
          , ACM,
          <year>2014</year>
          .
          doi:<pub-id pub-id-type="doi">10.1145/2660517.2660523</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Declarative rules for linked data generation at your fingertips!</article-title>
          ,
          <source>in: The Semantic Web: ESWC Satellite Events</source>
          , Springer,
          <year>2018</year>
          .
          doi:<pub-id pub-id-type="doi">10.1007/978-3-319-98192-5</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Iglesias-Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cimmino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ruckhaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chaves-Fraga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>García-Castro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <article-title>An Ontological Approach for Integrating Declarative Mapping Languages</article-title>
          ,
          <source>Semantic Web</source>
          (
          <year>2022</year>
          ). (In press).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>B.</given-names>
            <surname>De Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Seymoens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <article-title>Implementation-independent function reuse</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          (
          <year>2020</year>
          ).
          doi:<pub-id pub-id-type="doi">10.1016/j.future.2019.10.006</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>García-González</surname>
          </string-name>
          ,
          <article-title>A ShExML Perspective on Mapping Challenges: Already Solved Ones, Language Modifications and Future Required Actions</article-title>
          ,
          <source>in: Proceedings of the 2nd International Workshop on Knowledge Graph Construction, CEUR</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Delva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Assche</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Heyvaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Meester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <article-title>Integrating Nested Data into Knowledge Graphs with RML Fields</article-title>
          ,
          <source>in: Proceedings of 2nd Workshop on Knowledge Graph Construction, CEUR</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>