=Paper= {{Paper |id=Vol-3141/paper6 |storemode=property |title= |pdfUrl=https://ceur-ws.org/Vol-3141/paper6.pdf |volume=Vol-3141 |authors=Ana Iglesias-Molina,Andrea Cimmino,Oscar Corcho |dblpUrl=https://dblp.org/rec/conf/esws/Iglesias-Molina22 }} ==== https://ceur-ws.org/Vol-3141/paper6.pdf
Devising Mapping Interoperability With Mapping Translation
Ana Iglesias-Molina, Andrea Cimmino and Oscar Corcho
Ontology Engineering Group, Universidad Politécnica de Madrid, Spain


Abstract
Nowadays, Knowledge Graphs are created using a wide range of techniques, mapping languages
among them. The wide variety of use cases, data peculiarities, and potential uses has had a
substantial impact on how mapping languages have been created, extended, and applied, resulting
in a large number of diverse languages. The number of languages and compliant tools, together
with the usual lack of information about both, leads users to construct Knowledge Graphs with
other techniques, such as ad hoc programming scripts that suit their needs. This choice is
normally less reproducible and maintainable, which ultimately affects the quality of the
generated RDF data. We propose mapping translation as a means to enhance the interoperability
of existing mapping languages. Being able to translate mappings into other languages can make
mapping technologies more accessible, lowering their learning curve and widening the use of
mapping tools. This position paper analyses the possible language translation approaches,
presents the scenarios in which mapping translation is being applied, and discusses how it can
be implemented.

Keywords
Mapping languages, Ontology Description, Mapping Translation




1. Introduction
Knowledge Graphs (KGs) are increasingly used in academia and industry to represent and manage
the growing amount of data on the Web [1]. A large number of techniques to create KGs have
been proposed: tools such as OpenRefine, ad hoc programming scripts, or mapping languages.
   Mapping languages represent the relationships between heterogeneous data and an RDF
version following the schema provided by an ontology, i.e., the rules on how to translate
non-RDF data into RDF. This data can be expressed in a wide variety of formats, such as tabular,
JSON, or XML, among many others. Due to the heterogeneous nature of data, the extensive range
of techniques, and the specific requirements that some scenarios may impose, an increasing
number of mapping languages have been proposed [2, 3]. The differences among them stem
mainly from three aspects: (a) they focus on a particular feature, whether to describe a
specific data format (e.g., RML [2], an extension of R2RML [4] for more data sources than RDBs) or
to implement a new capability (e.g., R2RML-F [5] for incorporating functions into R2RML); (b)
they are designed for a particular technique or scenario that has special requirements (e.g., the

Third International Workshop On Knowledge Graph Construction, Co-located with the ESWC 2022, Crete - 30th May
2022
Email: ana.iglesiasm@upm.es (A. Iglesias-Molina); andreajesus.cimmino@upm.es (A. Cimmino); oscar.corcho@upm.es (O. Corcho)
ORCID: 0000-0001-5375-8024 (A. Iglesias-Molina); 0000-0002-1823-4484 (A. Cimmino); 0000-0002-9260-0753 (O. Corcho)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
WoT-mappings [6], which were designed as an extension of the Web of Things standard [7]); or
(c) they are based on a schema/language that has some inherent capabilities (e.g., SPARQL-based
languages, such as SPARQL-Generate [8], which extends SPARQL 1.1).
   As a result, the diversity of mapping languages allows for the construction of KGs from
heterogeneous data sources in many different scenarios. Current mapping languages may be
categorized by their schema: RDF-based (e.g., R2RML [4] and extensions, CSVW [9]), SPARQL-based
(e.g., SPARQL-Generate [8], SPARQL-Anything [10]), or based on other schemas (e.g.,
ShExML [11], Helio mappings [12]). Nevertheless, existing tools usually implement
just one mapping language, and sometimes not even the whole language specification [13, 14].
Deciding which language and tool should be used in each scenario becomes a costly task,
since a single language may not cover all the requirements [15]. Some scenarios
require a combination of mapping languages because of their distinguishing features, which entails
using different tools. In many cases, this diversity leads to ad hoc solutions that reduce
reproducibility, maintainability, and reusability [16].
   The increasing and heterogeneous emergence of new use cases still motivates the community
to keep developing solutions that are, more often than desired, not compatible with existing
ones. This position paper develops the concept of mapping translation, proposed by Corcho
et al. [17] and defined as “a function that transforms a set of mappings described in one
language into a set of mappings described in another language”. Enabling mapping translation
can enhance the interoperability among existing mapping languages in several respects, such
as reducing (a) the need to learn many different languages and (b) the effort of manually
translating mappings in order to use different tools. As a result, mapping technologies can become
more accessible to users. By learning just one language, users could use any tool that best suits
their needs without necessarily learning its compliant mapping language, which is usually a
challenge and a barrier to adoption [18]. This paper presents some approaches for
language translation, shows the current situations in which mapping translation is being applied
and their benefits, and outlines different techniques with which this issue may be addressed.
   The remainder of this article is structured as follows: Section 2 provides some insights into
language translation and the situations in which it is being applied. Section 3 proposes three
different techniques to address mapping translation at a larger scale. Finally, Section 4 draws
some conclusions from the concepts presented in the paper.


2. Mapping translation: Context
Corcho et al. [17] define mapping translation as “a function that transforms a set of mappings
described in one language (we call them original mappings) into a set of mappings described in
another language (we call them target mappings)”. In addition, they attach to that function two
desirable properties, the information preservation property (IPP) and the query result preservation
property (QRPP). IPP states that the RDF produced with the original mappings and the RDF
produced with the target mappings must contain the same information. QRPP states that the
data transformed with the original and the target mappings must return the same information
when queried. In other words, mapping translation must preserve the information expressed in
the original mappings so that, when both sets of mappings are evaluated, the transformed data
and the query results are the same.

Figure 1: Types of language translations (Adapted from [20]). Panels: (a) Peer-to-peer translation approach; (b) Common interchange language; (c) Family of languages approach. Each panel depicts translations among six languages, L1 to L6.

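As a rough illustration of the two preservation properties (IPP and QRPP), the following Python sketch checks them on a toy example. Everything in it is an assumption made for illustration: the two mapping "languages" A and B, their engines `run_a` and `run_b`, the translator `translate_a_to_b`, and the simplification of "same information" to equality of triple sets. It is not taken from [17] and does not correspond to any real mapping language or tool.

```python
def run_a(mapping, rows):
    """Engine for toy language A: a subject template plus a predicate -> column dict."""
    return {
        (mapping["subject"].format(**row), predicate, str(row[column]))
        for row in rows
        for predicate, column in mapping["columns"].items()
    }


def run_b(mapping, rows):
    """Engine for toy language B: a flat list of (subject template, predicate, reference)."""
    return {
        (subject.format(**row), predicate, str(row[reference]))
        for row in rows
        for subject, predicate, reference in mapping
    }


def translate_a_to_b(mapping):
    """Mapping translation: rewrite a language-A mapping as a language-B mapping."""
    return [(mapping["subject"], p, c) for p, c in mapping["columns"].items()]


def ipp_holds(original_graph, target_graph):
    """IPP (simplified): both mappings generate the same set of triples."""
    return original_graph == target_graph


def qrpp_holds(original_graph, target_graph, query):
    """QRPP (simplified): a query returns the same answers over both graphs."""
    return query(original_graph) == query(target_graph)


# Hypothetical input data and original mapping.
rows = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Oscar"}]
original = {"subject": "ex:person/{id}", "columns": {"foaf:name": "name"}}
target = translate_a_to_b(original)

g_original = run_a(original, rows)
g_target = run_b(target, rows)

# A stand-in for a query: all foaf:name values, sorted.
names = lambda g: sorted(o for _, p, o in g if p == "foaf:name")

print(ipp_holds(g_original, g_target))
print(qrpp_holds(g_original, g_target, names))
```

In a realistic setting, graph equality would have to account for blank nodes, and the query would be a SPARQL query evaluated by an engine; the sketch only shows where each property is checked.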
  With these concepts in mind, we now introduce some approaches to language translation
and present a set of scenarios in which mapping translation is being applied. We assume
the reader is familiar with current mapping languages and their general characteristics (e.g.,
R2RML [19], RML [2], SPARQL-Generate [8], CSVW [9] and ShExML [11]).

2.1. Approaches to language translation
In the context of language translation, there are several approaches that carry out translations
among a set of languages. Depending on the situation at hand, an approach can be advantageous
with respect to the other ones. We highlight the following [21]:
   Peer-to-peer translation (Fig. 1a) supports ad hoc translation solutions between pairs of
languages. This may seem the most straightforward approach, since it requires developing
only the translator services needed for the situation at hand and allows each of them to be
adjusted to that situation. However, it becomes less feasible as the number
of required translations increases.
   Common interchange language (Fig. 1b) uses a language that serves as an intermediary
among several languages. This approach reduces the number of translator services that need
to be developed, and it is the one of the three that scales best as the number of languages grows.
It involves creating (or, with luck, already having) a language able to represent the
expressiveness of all languages, to avoid information loss. Additionally, this implies that there
are common patterns shared by the languages independently of their representation, and that
they can be gathered in an abstract manner, which may not be the case for highly heterogeneous
languages.
   Family of languages (Fig. 1c) considers sets of languages and translations between the
representatives of each set. This approach stands out in situations where there are clear
subgroups of languages that are similar to each other but dissimilar to languages in other groups.
In addition, this scenario could imply having “chains of mappings”. That is, instead of
a direct translation or a two-step translation (as in the two previous approaches,
respectively), a mapping may need to be translated into intermediate languages until reaching
the target one if the differences between the original and the target are too many.
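To make the scaling argument behind the three approaches concrete, the following Python sketch counts the unidirectional translators each one would need for full coverage. The counting model is an assumption for illustration only: the interchange language is taken to be an extra language, and in the family-of-languages case each family is assumed to have one member acting as representative, with the other members translating only to and from it.

```python
def peer_to_peer(n_languages):
    """One unidirectional translator per ordered pair of languages."""
    return n_languages * (n_languages - 1)


def common_interchange(n_languages):
    """One translator into and one out of the interchange language, per language."""
    return 2 * n_languages


def family_of_languages(family_sizes):
    """Assumed layout: members translate to/from their family representative,
    and representatives translate pairwise among themselves."""
    inside = sum(2 * (size - 1) for size in family_sizes)
    between = len(family_sizes) * (len(family_sizes) - 1)
    return inside + between


for n in (3, 6, 12):
    print(n, peer_to_peer(n), common_interchange(n))

# Six languages split into two hypothetical families of three.
print(family_of_languages([3, 3]))
```

Under these assumptions the peer-to-peer count grows quadratically while the interchange count grows linearly, which is the sense in which the common interchange language scales best; the count says nothing about how hard each translator is to write.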

2.2. Mapping translation scenarios
Mapping translation is a concept that is already being applied to specific use cases and in
different ways. For instance, there are currently some implementations that unidirectionally
translate between pairs of mapping languages. The online editors of ShExML and YARRRML¹,²
enable translation to RML. Another case is when tools implement the translation of RML/R2RML
mappings into the language they are designed to parse, such as Helio³ and SPARQL-Generate⁴,
which translate from RML to their respective languages, and Ontop [22], which translates R2RML
into its compliant language, OBDA mappings [23]. These translations make it possible to
extend the outreach of the tools, since they can be used without the
need to learn their specific language, using instead one that is widely adopted, in this case the
W3C Recommendation R2RML and its extension RML. Another similar case is Morph-KGC⁵, a
materializer that translates from R2RML to RML, and more recently, to RML-star [24], to be
able to process the three languages.
   Another case we want to present is Mapeathor [18], a tool that takes mapping rules specified
in spreadsheets and transforms them into a mapping in either R2RML, RML or YARRRML. It
aims to lower the learning curve of those languages for new users and to ease the mapping writing
process. Similarly, XRM⁶ (Expressive RDF Mapper) provides a user-friendly intermediate syntax
to create mappings in CSVW, R2RML and RML.
   Finally, we highlight tools that optimize the construction of RDF
graphs by exploiting the translation of mapping rules. This is the case of Morph-CSV [25] and
FunMap [26]. Morph-CSV performs a transformation over the tabular data with RML+FnO
mappings and CSVW annotations, and outputs a database and R2RML mappings. FunMap takes
an RML+FnO mapping, performs the transformation functions indicated, outputs the parsed
data and generates a function-free RML mapping.
   The approaches presented are mainly examples of peer-to-peer translation for specific uses.
The exceptions are Mapeathor and XRM, which translate from an intermediate user-friendly
representation into several languages and thus align with the approach of a common interchange
language. Even though most of these translation examples involve R2RML or RML, there is no
holistic approach towards a general translation framework.
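The unidirectional translations listed in this section can be read as edges of a directed graph, which gives a feel for what a "chain of mappings" would look like in practice. The sketch below encodes a subset of them and searches for the shortest chain with a breadth-first search. The graph encoding, the node label "Mapeathor spreadsheet" and the function `find_chain` are illustrative assumptions; whether chaining the actual tools preserves information is not something this section establishes.

```python
from collections import deque

# Subset of the unidirectional translations mentioned above (source -> targets).
TRANSLATORS = {
    "ShExML": ["RML"],
    "YARRRML": ["RML"],
    "RML": ["Helio mappings", "SPARQL-Generate"],
    "R2RML": ["RML", "OBDA mappings"],
    "Mapeathor spreadsheet": ["R2RML", "RML", "YARRRML"],
}


def find_chain(source, target, translators=TRANSLATORS):
    """Breadth-first search for the shortest chain of translations, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in translators.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_chain("ShExML", "SPARQL-Generate"))
print(find_chain("RML", "R2RML"))
```

The second call finds no chain because none of the listed translators targets R2RML from RML, which reflects the observation that the existing translations are unidirectional.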


3. Mapping translation: Techniques
This section presents three proposals to implement a mapping translator service general enough
to enable translation among several languages taking into account the information preservation

    1 http://shexml.herminiogarcia.com/editor/
    2 https://rml.io/yarrrml/matey/#
    3 https://github.com/oeg-upm/helio/wiki/Streamlined-use-cases#materialising-rdf-from-csv--xml-and-json-files-using-rml
    4 https://github.com/sparql-generate/rml-to-sparql-generate
    5 https://morph-kgc.readthedocs.io/en/latest/
    6 https://zazuko.com/products/expressive-rdf-mapper/
and query result preservation properties described in Section 2. These proposals are
(1) Software-based, (2) Construct Query-based, and (3) Executable Mapping-based. These
techniques can be applied to any of the language translation approaches of Section 2.1.
   Software-based translation. It consists of an ad hoc software implementation for each pair
of languages to perform bidirectional translations between them. Like any ad hoc solution, it
benefits from adjusting specifically to any situation with the (almost) unlimited possibilities
that programming languages provide. This is the approach that all situations presented in
Section 2.2 have applied, although with unidirectional translations.
   Construct query-based translation. This approach takes advantage of the SPARQL query
language and its CONSTRUCT queries, which return an RDF graph. These queries extract
the data by matching the graph patterns of the query (WHERE clause) and build the output graph
based on a template (CONSTRUCT clause). Since many languages are defined by the schema of
an ontology and are usually written in Turtle syntax (e.g., R2RML and its extensions), this approach
is applicable to them. It benefits from relying on a well-established standard,
as SPARQL is nowadays, and on its compliant engines. However, it would leave out languages
with other schemas (e.g., SPARQL-based ones), unless it is complemented with software-based solutions.
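The WHERE/CONSTRUCT mechanics described above can be sketched without a SPARQL engine by mimicking them over plain Python tuples. The helper functions (`match`, `solutions`, `construct`, `translate`), the input mapping and the two rewrite rules are assumptions for illustration. The rules replace `rr:logicalTable` with `rml:logicalSource` and `rr:column` with `rml:reference`; they are deliberately incomplete (a full RML logical source would need more than this), and an actual implementation would hand the equivalent CONSTRUCT queries to a SPARQL engine.

```python
RR = "http://www.w3.org/ns/r2rml#"
RML = "http://semweb.mmlab.be/ns/rml#"


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None if impossible."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):            # variable: bind it or check consistency
            if new.setdefault(p, t) != t:
                return None
        elif p != t:                     # constant: must be identical
            return None
    return new


def solutions(where, triples):
    """All variable bindings that satisfy every pattern of the WHERE clause."""
    bindings = [{}]
    for pattern in where:
        extended = []
        for binding in bindings:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        bindings = extended
    return bindings


def construct(template, where, triples):
    """Instantiate the CONSTRUCT template once per solution of the WHERE clause."""
    return {tuple(b.get(term, term) for term in pattern)
            for b in solutions(where, triples) for pattern in template}


# (template, where) pairs; illustrative and incomplete R2RML -> RML rules.
RULES = [
    ([("?tm", RML + "logicalSource", "?src")], [("?tm", RR + "logicalTable", "?src")]),
    ([("?om", RML + "reference", "?col")], [("?om", RR + "column", "?col")]),
]


def translate(triples):
    rewritten = {pattern[1] for _, where in RULES for pattern in where}
    out = {t for t in triples if t[1] not in rewritten}  # copy untouched triples
    for template, where in RULES:
        out |= construct(template, where, triples)
    return out


# Hypothetical R2RML fragment written as triples (prefixed names kept as strings).
r2rml = {
    ("ex:TriplesMap1", RR + "logicalTable", "_:lt"),
    ("_:lt", RR + "tableName", "PERSON"),
    ("ex:TriplesMap1", RR + "predicateObjectMap", "_:pom"),
    ("_:pom", RR + "predicate", "foaf:name"),
    ("_:pom", RR + "objectMap", "_:om"),
    ("_:om", RR + "column", "NAME"),
}

rml = translate(r2rml)
for triple in sorted(rml):
    print(triple)
```

Each rule here touches a single property, which is why this kind of translation is cheap for closely related RDF-based languages and much harder when the target language structures the same information differently.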
   Executable mapping-based translation. This last approach makes use of executable
mappings automatically generated from ontology alignments to perform data translation between
the two ontologies [27]. Similarly to the previous approach, this one also makes use of SPARQL
CONSTRUCT queries in the executable mappings. While the previous one relies on manual
effort to build the queries, this one takes advantage of the ontologies that define RDF-based mapping
languages. In addition to the benefits and drawbacks of the previous approach, this approach
may be hindered by the language constructs used to build mappings. That is to say, simple one-to-one
correspondences between ontology entities may not be enough to capture and translate their
expressiveness and capabilities, especially for considerably different languages.
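A minimal sketch of the idea, assuming a hand-written alignment of one-to-one property correspondences between the R2RML and RML vocabularies: each correspondence is turned mechanically into a SPARQL CONSTRUCT query. The alignment and the function `executable_mapping` are illustrative assumptions; this is not the generation algorithm of [27], which is considerably more elaborate.

```python
RR = "http://www.w3.org/ns/r2rml#"
RML = "http://semweb.mmlab.be/ns/rml#"

# Hypothetical alignment: (source property, target property) pairs.
ALIGNMENT = [
    (RR + "logicalTable", RML + "logicalSource"),
    (RR + "column", RML + "reference"),
]


def executable_mapping(source_property, target_property):
    """Turn one property correspondence into a SPARQL CONSTRUCT query string."""
    return (
        f"CONSTRUCT {{ ?s <{target_property}> ?o }}\n"
        f"WHERE     {{ ?s <{source_property}> ?o }}"
    )


queries = [executable_mapping(source, target) for source, target in ALIGNMENT]
for query in queries:
    print(query, end="\n\n")
```

Because every generated query rewrites exactly one property, the sketch also shows the limitation noted above: correspondences that involve several entities or a different structure on each side cannot be expressed this way.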
   The techniques proposed are presented in decreasing order of manual effort required. The
first one is completely ad hoc, and even though it could reuse some modules of the
solutions presented in Section 2.2, many more would be needed to provide a complete set
of bidirectional translations covering a good number of languages. The second one requires
considerable effort to build queries for RDF-based languages, assuming no extra help from
software implementations is needed. The third one could ideally be fully automated, from the
creation of ontology alignments to the generation of executable mappings. However, the success rate
of this approach without manual intervention is not expected to be high, especially for the
ontology alignment part when the input ontologies differ considerably from one another or
present different constructs (with a different number of elements or a different structure).


4. Conclusions
This paper develops the concept of mapping translation, proposed by Corcho et al. [17]. It
analyses the possible language translation approaches, updates the scenarios in which it is
being applied, and proposes some implementation techniques to perform it.
   There are several possibilities for developing a complete solution for mapping
translation that ensures information preservation, as described in previous sections. It not only
requires choosing the technical implementation according to the available effort and resources,
but, more importantly, it involves wisely choosing the language translation approach that best
suits this particular case of mapping languages.
   All mapping languages, independently of their design, have been built for the same purpose:
describing non-RDF data in terms of an ontology. It may be that the rules that the different
mappings express can be represented in an abstract, language-independent manner. However,
the sometimes large differences among these languages may call this assumption into question.
Some languages within the same category are similar to each other, for instance R2RML and its
extensions. Languages from different groups can be related, such as ShExML and RML, despite
some inevitable differences in their features. Others are more unique, such
as CSVW. Lastly, the SPARQL-based group is more isolated from the others due to the wide range of
possibilities that SPARQL already provides.
   This scenario poses challenges for every language translation approach. Peer-to-peer
translation would require a substantial amount of effort for divergent languages. Using families of
languages could work better than peer-to-peer translation, but it would still have to face
several challenges in language representation and in the number of translator services required.
Meanwhile, using a common interchange language would reduce the effort the most,
but there is no certainty that a common interchange language would be able to represent
them all. Still, the authors advocate for this approach, and some steps have already been taken to
draft this language⁷, with the base idea that the mapping rules can be abstracted and represented
in an ontology-based language.
   Even though it is not an easy task, mapping translation is a concept that
can only benefit the current landscape of heterogeneous mapping languages. After years of
Knowledge Graph construction, the increasing and heterogeneous emergence of new use cases
still motivates the community to keep developing solutions, whether ad hoc or as extensions
of widely used languages. Mapping translation has the potential to build bridges between
past (but still used) and new solutions to improve interoperability.


Acknowledgments
The work presented in this paper is partially funded by the Knowledge Spaces project (Grant
PID2020-118274RB-I00 funded by MCIN/AEI/10.13039/501100011033); and partially funded
by the European Union’s Horizon 2020 Research and Innovation Programme through the
AURORAL project, Grant Agreement No. 101016854.


References
 [1] A. Hogan, E. Blomqvist, M. Cochez, C. d’Amato, G. D. Melo, C. Gutierrez, S. Kirrane, J. E. L.
     Gayo, R. Navigli, S. Neumaier, et al., Knowledge graphs, ACM Computing Surveys (CSUR)
     54 (2021) 1–37.
 [2] A. Dimou, M. V. Sande, P. Colpaert, R. Verborgh, E. Mannens, R. Van De Walle, RML: A
     generic language for integrated RDF mappings of heterogeneous data, in: LDOW, 2014.
    7 w3id.org/conceptual-mapping/portal
 [3] H. García-González, I. Boneva, S. Staworko, J. E. Labra-Gayo, J. M. Cueva-Lovelle, ShExML:
     improving the usability of heterogeneous data mapping languages for first-time users,
     PeerJ Computer Science 6 (2020) e318. URL: https://peerj.com/articles/cs-318.
 [4] S. Das, S. Sundara, R. Cyganiak, R2RML: RDB to RDF Mapping Language, W3C
     Recommendation 27 September 2012, www.w3.org/TR/r2rml (2012).
 [5] C. Debruyne, D. O’Sullivan, R2RML-F: towards sharing and executing domain logic in R2RML
     mappings, in: LDOW@WWW, 2016.
 [6] A. Cimmino, M. Poveda-Villalón, R. García-Castro, eWoT: A semantic interoperability
     approach for heterogeneous IoT ecosystems based on the Web of Things, Sensors 20 (2020)
     822.
 [7] M. Kovatsch, R. Matsukura, M. Lagally, T. Kawaguchi, K. Kajimoto, Web of Things
     (WoT) Architecture, W3C Recommendation 9 April 2020,
     https://www.w3.org/TR/wot-architecture/ (2020).
 [8] M. Lefrançois, A. Zimmermann, N. Bakerally, A SPARQL extension for generating RDF
     from heterogeneous formats, Lecture Notes in Computer Science (including subseries
     Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 10249 LNCS
     (2017) 35–50.
 [9] J. Tennison, G. Kellogg, I. Herman, Model for tabular data and metadata on the web, W3C
     Recommendation (2015).
[10] E. Daga, L. Asprino, P. Mulholland, A. Gangemi, Facade-X: an opinionated approach to
     SPARQL Anything, arXiv preprint arXiv:2106.02361 (2021).
[11] H. García-González, A ShExML perspective on mapping challenges: already solved ones,
     language modifications and future required actions, in: Proceedings of the 2nd International
     Workshop on Knowledge Graph Construction, 2021.
[12] A. Cimmino, R. García-Castro, Helio: a framework for implementing the life cycle of
     knowledge graphs, Semantic Web (2022).
[13] D. Chaves-Fraga, F. Priyatna, A. Cimmino, J. Toledo, E. Ruckhaus, O. Corcho,
     GTFS-Madrid-Bench: A benchmark for virtual knowledge graph access in the transport domain,
     Journal of Web Semantics 65 (2020) 100596.
[14] J. Arenas-Guerrero, M. Scrocca, A. Iglesias-Molina, J. Toledo, L. P. Gilo, D. Dona, O. Corcho,
     D. Chaves-Fraga, Knowledge graph construction with R2RML and RML: An ETL system-based
     overview, in: Proceedings of the 2nd International Workshop on Knowledge Graph
     Construction, 2021.
[15] B. De Meester, W. Maroy, A. Dimou, R. Verborgh, E. Mannens, Declarative data
     transformations for linked data generation: The case of DBpedia, Lecture Notes in Computer
     Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
     Bioinformatics) 10250 LNCS (2017) 33–48.
[16] A. Iglesias-Molina, D. Chaves-Fraga, F. Priyatna, O. Corcho, Enhancing the maintainability
     of the Bio2RDF project using declarative mappings, in: The 12th International Conference
     on Semantic Web Applications and Tools for Health Care and Life Sciences, 2019.
[17] O. Corcho, F. Priyatna, D. Chaves-Fraga, Towards a new generation of ontology based
     data access, Semantic Web 11 (2020) 153–160.
[18] A. Iglesias-Molina, L. Pozo-Gilo, D. Doña, E. Ruckhaus, D. Chaves-Fraga, Ó. Corcho,
     Mapeathor: Simplifying the specification of declarative rules for knowledge graph
     construction, in: ISWC (Demos/Industry), 2020.
[19] B. Villazón-Terrazas, M. Hausenblas, R2RML and Direct Mapping Test Cases, W3C Note,
     W3C, 2012. http://www.w3.org/TR/rdb2rdf-test-cases/.
[20] O. Corcho, A. Gómez-Pérez, A layered approach to ontology translation with knowledge
     representation, Ph.D. thesis, UPM, 2004.
[21] J. Euzenat, H. Stuckenschmidt, The ‘family of languages’ approach to semantic
     interoperability, Knowledge transformation for the semantic web 95 (2003) 49.
[22] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk,
     M. Rodriguez-Muro, G. Xiao, Ontop: Answering SPARQL queries over relational databases,
     Semantic Web 8 (2017) 471–487.
[23] M. Rodriguez-Muro, M. Rezk, Efficient SPARQL-to-SQL with R2RML mappings, Journal of Web
     Semantics 33 (2015) 141–169.
[24] T. Delva, J. Arenas-Guerrero, A. Iglesias-Molina, O. Corcho, D. Chaves-Fraga, A. Dimou,
     RML-star: A declarative mapping language for RDF-star generation, in: International
     Semantic Web Conference - Demos&Industry, 2021.
[25] D. Chaves-Fraga, E. Ruckhaus, F. Priyatna, M.-E. Vidal, O. Corcho, Enhancing virtual
     ontology based access over tabular data with Morph-CSV, Semantic Web (2021) 1–34.
[26] S. Jozashoori, D. Chaves-Fraga, E. Iglesias, M.-E. Vidal, O. Corcho, FunMap: Efficient
     execution of functional mappings for knowledge graph creation, in: International Semantic
     Web Conference, Springer, 2020, pp. 276–293.
[27] C. R. Rivero, I. Hernández, D. Ruiz, R. Corchuelo, Generating SPARQL executable mappings
     to integrate ontologies, in: International Conference on Conceptual Modeling, Springer,
     2011, pp. 118–131.