=Paper= {{Paper |id=Vol-2873/paper2 |storemode=property |title=A ShExML Perspective on Mapping Challenges: Already Solved Ones, Language Modifications and Future Required Actions |pdfUrl=https://ceur-ws.org/Vol-2873/paper2.pdf |volume=Vol-2873 |authors=Herminio García-González |dblpUrl=https://dblp.org/rec/conf/esws/Garcia-Gonzalez21a }} ==A ShExML Perspective on Mapping Challenges: Already Solved Ones, Language Modifications and Future Required Actions== https://ceur-ws.org/Vol-2873/paper2.pdf
A ShExML Perspective on Mapping Challenges:
 Already Solved Ones, Language Modifications
         and Future Required Actions

                            Herminio Garcı́a-González

       IT and Communications Service, University of Oviedo, Asturias, Spain
                         garciaherminio@uniovi.es



      Abstract. Data mapping languages allow users to create knowledge
      graphs with lower cost and time. Some challenges cannot be solved with
      state-of-the-art languages and tools, though. Thus, in this paper we use
      and modify ShExML to deal with some of them. We see how some chal-
      lenges were already solved, which modifications we had to perform to
      cover others, and how the rest of them could be covered in future ver-
      sions. Then, we establish a demonstration on language integrity after
      changes and a discussion on performed and upcoming changes. These
      solutions, alongside the discussion and joint analysis of other languages
      and tools solutions, will drive us to effective techniques to solve all pro-
      posed challenges.

      Keywords: mapping challenges, ShExML, data mapping languages, knowl-
      edge graph construction




1   Introduction
Mapping heterogeneous datasources using a single representation is an active
field which has been getting traction in the past years. For this purpose a set
of languages and tools has been proposed [8] which lower the cost and time
employed in these tasks. This trajectory ended up in the celebration of the 1st
International Workshop on Knowledge Graph Building [1] and, one year after,
the beginning of the Knowledge Graph Construction W3C Community Group1
where academia, industry and practitioners are gathered to envision new steps,
find unsolved problems, and face new challenges in this field2 . One of the commu-
nity outputs was a set of mapping challenges3 which are nowadays complicated
to solve with the current state-of-the-art languages, tools and techniques.
  Copyright © 2021 for this paper by its authors. Use permitted under Creative Com-
  mons License Attribution 4.0 International (CC BY 4.0).
1
  https://w3id.org/kg-construct/tpac/
2
  https://w3id.org/kg-construct/tpac/#report
3
  https://w3id.org/kg-construct/workshop/challenges.html
   Therefore, in this paper we tackle some of these mapping challenges with
ShExML [5] and try to solve the following questions:

 – Q1: How can mapping challenges be solved with ShExML?
 – Q2: How can unaddressed challenges be solved and implemented in ShExML?
 – Q3: Have modifications in ShExML affected the functioning of already present
   features?

   The rest of the paper is structured as follows: In Section 2 we summarise the
mapping challenges proposed by the community; additionally, we offer a set of
suplementary material to support following explanations. In Section 3 we see how
the current language specification and engine can solve some of the challenges.
Then, we explain how the solutions for other challenges have been implemented
and included in ShExML in Section 4. In Section 5 we propose some further
language modifications and a discussion on how the rest of the challenges could
be addressed. We demonstrate the old features integrity after including the new
ones and we establish a discussion on mapping challenges results in Section 6.
And, finally, in Section 7 we draw some conclusions.


2   Mapping challenges summary

During consecutive meetings in the Knowledge Graph Construction W3C Com-
munity Group, several mapping challenges and problems were arisen which are
collected in the workshop website4 . Thus, in this paper we deal with this selec-
tion of mapping challenges, we examine them and propose solutions within the
ShExML language and our engine5 . To categorise these solutions we link them
to ShExML versions so we can trace when these solutions were achieved, i.e., if
they were possible to solve before the mapping challenges were defined (ShExML
v0.2.3), if they were solved after the mapping challenges were defined (ShExML
v0.2.4 & v0.2.5) or if they are not yet solved (future versions).
    Thus, in Table 1 a summary table is offered with the addressed challenges
and with which version of ShExML engine the expected output is achieved.
Besides, we offer a webpage6 with links to the working solutions as supplemen-
tary material for the sake of demonstration and reproducibility. The inputs are
taken from the Knowledge Graph Construction W3C Community Group map-
ping challenges repository7 which contains a set of inputs and expected outputs
that are agreed to represent the community mapping problems.
    In the following sections we explain how solutions are achieved, which ShExML
constructions and techniques were used, we establish a discussion on reached so-
lutions and how unsolved challenges could be addressed in ShExML.
4
  https://w3id.org/kg-construct/workshop/challenges.html
5
  https://github.com/herminiogg/ShExML
6
  http://herminiogg.github.io/mapping-challenges/challenges/solutions.html
7
  https://github.com/kg-construct/mapping-challenges
                                      Already solved With language modifications
                                         (v0.2.3)         (v0.2.4 & v0.2.5)
Access fields         input 1               x                     X
outside iterators     input 2               x                     x
                      input 1               x                     X
                      input 2               x                     X
Datatype map          input 3               x                     X
                      input 4               x                     X
                      input 5               X                     X
                      input 1
Excel style                                 x                    x
                      (unaddressed)
Generate multiple     input 1               x                    X
values                input 2               x                    X
Join on literal       input 1               X                    X
                      input 1               x                    X
Language map          input 2               x                    X
                      input 3               x                    X
Multivalue references input 1            x (bug)                 X
Process multivalue    input 1
                                            x                    x
references            (unaddressed)
                      input 1               x                    X
RDF Collections       input 2               x                    x
                      input 3               x                    x
Table 1. Coverage table of mapping challenges in ShExML language and engine by
input files. Xmeans covered and x means not covered. Unaddressed means that this
challenge was not tried to solve in this work.



3     Already solved mapping challenges
In this section, we deal with the mapping challenges that can be solved without
any modification in the ShExML language and engine. Therefore, these solutions
are those that are reachable with ShExML v0.2.3 (released on 29th October
2020)8 , that is, before the mapping challenges were defined.

3.1    Datatype map (input 5)
Datatype map refers to the possibility to generate datatype tags from the input
content. Therefore, instead of defining them statically in the mapping rules, this
challenge aims to support the dynamic generation of datatype tags from input
content. In the case of input 5, it is intended that mapping languages would be
able to generate datatype tags according to the most probable value according to
values formats. For example, in input 5 it is expected that the number 3 would
have an xsd:integer datatype and 3.14 would have an xsd:decimal one.
    This inference mechanism was already implemented in ShExML engine which
in case that the user does not specifically define a datatype for an object value
it will infer the most probable one (see Listing 1.1). Although the current imple-
mentation solves this specific mapping challenge, it is a naı̈ve implementation.
8
    https://github.com/herminiogg/ShExML/releases/tag/v0.2.3
However, it can lead to a more complex inference system, for example, aligning
existing input data sources datatypes with RDF ones.

                   Listing 1.1. ShExML datatype inference function.
protected def searchForXSDType ( o: String ) : RDFDatatype = {
  if ( Try ( o . toInt ). isSuccess )
     XSDDatatype . XSDinteger
  else if ( Try ( o . toDouble ). isSuccess )
     XSDDatatype . XSDdecimal
  else if ( Try ( o . toBoolean ). isSuccess )
     XSDDatatype . XSDboolean
  else
     XSDDatatype . XSDstring
}



3.2     Join on literal

This challenge refers to the possibility to generate literals from a join condition
(i.e., from another source) where R2RML9 and RML [3] output a resource by
default.
    In ShExML, join conditions10 generate values without any specific form, so it
is not determined in this step if it is a literal or a resource. It is, then, defined by
the user in the shapes part, where the user decides the form of the output. This
is a design decision on ShExML that was driven by the separation of concerns
main principle. In Listing 1.2 we can see how the join condition is defined in
familyName expression, and how then this expression is used in :Author shape
without any prefix, indicating that a literal must be generated.

                   Listing 1.2. ShExML solution for join on literals.
PREFIX : < http: // example . com / >
PREFIX experson: < http: // example . com / person / >
PREFIX dbr: < http: // dbpedia . org / resource / >
PREFIX schema: < http: // schema . org / >
SOURCE jsonfile < https: // raw . githubusercontent . com /
kg - construct / mapping - challenges /
2 aac9680cd731fd647abd33d44a7f400e4278cf3 /
challenges / join - on - literal / input -1/ input . json >
ITERATOR author < jsonpath: $ . author [*] > {
   FIELD id 
   FIELD firstname < firstname >
   FIELD affiliation < affiliation >
}
ITERATOR people < jsonpath: $ . people [*] > {
9
     https://www.w3.org/TR/r2rml/
10
     See ShExML specification for a full explanation on how join construction works:
     http://shexml.herminiogarcia.com/spec/#join
     FIELD firstname < firstname >
     FIELD familyname < familyName >
}
EXPRESSION authors < jsonfile . author UNION jsonfile . people >
EXPRESSION familyName < jsonfile . people . familyname UNION
  jsonfile . author . firstname JOIN jsonfile . people . firstname >

:Author experson: [ authors . id ] {
  :affiliation [ authors . affiliation ] ;
  :lastName [ familyName ] ;
}



3.3     Multivalue references
This challenge deals with the expected output of a hierarchical document (e.g.,
XML or JSON files) where multiple iterators are used. The discussion in this
challenge is whether we produce the cartesian product and provide a join con-
dition to correlate values or if we just translate the hierarchical information as
it is, without the need to provide any join condition11 . This case becomes more
complicated if a join condition needs to be provided over a JSON file because of
the impossibility to access parent nodes (see Section 4.1 for the specific challenge
on this topic). Therefore, it seems that in hierarchical data the expected output
should be a verbatim translation (so to say, not creating the cartesian product
as it is not how it is originally represented in the input file).
     In ShExML, this was the default behaviour from its inception as in ShExML
first versions it only supported XML and JSON files. Besides, we saw it as a
more usable manner to define these mappings as usability is the main goal of
the language [4]. Therefore, in Listing 1.3 we can see how using iterators and
nested iterators we can cover these hierarchical data models. In Table 1 this
challenge is marked as not solved in ShExML v0.2.3 due to a bug when using
only the root node ($) in the top iterator query. However, we include it here
as the coverage of this challenge did not require syntax modifications nor new
features in ShExML engine, only a bug fix.
                 Listing 1.3. ShExML solution multivalue references.
PREFIX ex: < http: // example . com / >
PREFIX exLab: < http: // example . com / lab / >
PREFIX exArticle: < http: // example . com / article / >
PREFIX exAuthor: < http: // example . com / author / >
PREFIX exAff: < http: // example . com / aff / >
SOURCE lab_file < https: // raw . githubusercontent . com /
kg - construct / mapping - challenges / main
/ challenges / multivalue - references
/ input -1/ input . json >
11
     See discussion about this topic in RMLMapper reference implementation: https:
     //github.com/RMLio/rmlmapper-java/issues/28
ITERATOR lab < jsonpath: $ > {
  FIELD labName < labName >
  ITERATOR articles < articles [*] > {
    FIELD title < title >
    ITERATOR authors < authors [*] > {
      FIELD name < name >
      ITERATOR affiliation < affiliation [*] > {
        FIELD label < label >
      }
    }
  }
}

EXPRESSION labValues < lab_file . lab >

ex:Lab exLab: [ labValues . labName ] {
  a ex:Lab ;
  ex:hasArticles @ ex:Article ;
  ex:hasMembers @ ex:Author ;
}

ex:Article exArticle: [ labValues . articles . title ] {
  ex:hasAuthor @ ex:Author ;
}

ex:Author exAuthor: [ labValues . articles . authors . name ] {
  ex:hasAffiliation
   exAff: [ labValues . articles . authors . affiliation . label ] ;
}



4      Language modifications

In this section, we deal with the mapping challenges that needed some modifica-
tions in the ShExML language and engine. Therefore, these solutions are those
which are reachable with ShExML v0.2.4 (released on 18th January 2021)12
or with ShExML v0.2.5 (released on 27th January 2021)13 , that is, after the
mapping challenges were defined.


4.1     Access fields outside iterators

Sometimes, in hierarchical data models, there is the need to access values outside
the iteration pattern. For example, we may need to obtain the values that are
parents of the current iterated node. When dealing with XML files it does not
involve any modification in ShExML, as using XPath queries we are able to
12
     https://github.com/herminiogg/ShExML/releases/tag/v0.2.4
13
     https://github.com/herminiogg/ShExML/releases/tag/v0.2.5
access upper nodes with the double dot and slash notation (i.e., ../). However,
when dealing with JSON files, this is not possible because of JSONPath not
supporting the parent access notation14 .
    This is a well-known problem in data mapping languages as they use Json-
Path to define values accesses. Indeed, in xR2RML [11], the authors defined a
property called xrr:pushDown that takes a value in the hierarchy and pushes it
down into their offsprings iterators so it can be available further [12].
    Following this experience with xR2RML, we implemented a similar solution
in ShExML using the PUSHED_FIELD and POPPED_FIELD keywords. When using the
PUSHED_FIELD keyword the ShExML engine saves the value using the name as
the identifier for further uses. Then, when the POPPED_FIELD is used the ShExML
engine searches for the saved value with an identifier which is equal to that given
in the query part (i.e., inside < and >). Therefore, in Listing 1.4, the id field is
saved and then used in the cars iterator, so we can establish a relation from the
car to the owner.
    An interesting discussion here is if it is better to make these accesses implicit
or explicit. A recent RML syntax modification proposal [2] presented an algo-
rithm that could implicitly access values from upper hierarchical levels by saving
iteration information, indexes and values. Users can directly access upper ele-
ments from its current hierarchical level. Indeed, the solution is clean and avoids
the user’s explicit declaration of values to be saved. However, it has two possible
main drawbacks. Firstly, the use of a bigger amount of memory and time by
saving a lot information that could be used (or not) in further mapping rules.
This technique, could be, in the end, a bottleneck in performance if it is not
carefully implemented. Second one, it could complicate the engine implementa-
tion as it allows to go up and down in the hierarchy, while actual behaviour only
expects to go down. This could also lead to a performance issue. Either way,
this dichotomy should be quantified in further experiments to establish the best
solution in terms of usability and performance.

          Listing 1.4. ShExML solution for accessing fields outside iterations.
ITERATOR records < jsonpath: $ . records [*] > {
  PUSHED_FIELD id 
  FIELD enteredBy < enteredBy >
  ITERATOR cars < cars [*] > {
    FIELD make < make >
    POPPED_FIELD carOwner 
  }
}



4.2     Datatype map

As we mentioned in Section 3.1, this challenge aims to generate datatype tags
dynamically from data content. Therefore, the datatype inputs can appear in
14
     https://goessner.net/articles/JsonPath/
multiple ways: full URI, prefix plus datatype, or simply datatype name without
prefix.
    ShExML v0.2.3 supports the creation of static datatype tags with prefix
plus datatype syntax (see Listing 1.5). Therefore, we should derive this syntax
and maintain its proven usability [4] but giving dynamic datatype generation
possibilities. The natural expansion of this syntax is to include the same object
generation expression but also for datatypes and language tags (see Section 4.3).
So, the final syntax is prefix plus generation expression (inside square brackets)
as we can see in Listing 1.6. The prefix can be optional if the data value already
contains it (e.g., input 1 and 2) and values can be transformed using Matcher
feature15 to expected XML Schema valid datatypes (e.g., input 4).

                  Listing 1.5. ShExML static datatypes syntax.
ex:Person exPerson: [ person . firstname ] {
  ex:num [ person . num ] xsd:integer ;
}


                 Listing 1.6. ShExML dynamic datatypes syntax.
ex:Person exPerson: [ person . firstname ] {
  ex:num [ person . num ] xsd: [ person . dt ] ;
}



4.3   Language map

As with datatype maps in Section 4.2, the language map challenge want to
address the problem of generating language tags dynamically from input data.
In ShExML v0.2.3 language tags were supported statically, that is, it was possible
to tag an object expression with a specific language but it would be applied to
all values (see Listing 1.7).
    We performed a syntax and engine modification, like in datatype maps, to
be able to generate language tags with expressions. The final syntax is @ plus the
generation expression (between square brackets) as it can be seen in Listing 1.8.
Here, again, the idea was to preserve the usability as the main goal and to make
it as simple as possible. Input 1 tests the generation with a valid tag following
BCP4716 , input 2 tests the transformation of a language value to a valid tag (in
ShExML this is done using Matchers functionality17 ), and in input 3 it is shown
how two different sources can be joined to provide language information.
    As a side note, it is not possible to specify a language map and a datatype
map in the same triple, as it is forbidden by the ShExML grammar. This was
made intentional to follow the RDF specification rules as it can be seen in its
15
   http://shexml.herminiogarcia.com/spec/#matcher
16
   https://tools.ietf.org/html/bcp47
17
   http://shexml.herminiogarcia.com/spec/#matcher
grammar18 . Therefore, whenever a langtag generation expression (either static
or dynamic) is provided the implicit datatype is rdf:langString.

               Listing 1.7. ShExML static generation of language tags.
ex:Person exPerson: [ person . firstname ] {
  ex:lastName [ person . lastname ] @ en ;
}


              Listing 1.8. ShExML dynamic generation of language tags
ex:Person exPerson: [ person . firstname ] {
  ex:lastName [ person . lastname ] @ [ person . lang ] ;
}



4.4     Generate multiple values

This challenge wants to address the problem of generating various datatypes or
language tags for the same subject (e.g., a multi-language value). Once datatype
maps (see Section 4.2) and language maps (see Section 4.3) are supported in
ShExML, it is straightforward as ShExML will generate a triple per value re-
turned from the object expression. Therefore, to generate multi-language values
the syntax is like in Listing 1.9 and to generate multi-language values with a
default language the syntax is like in Listing 1.10.

              Listing 1.9. ShExML multiple values with language tags.
ex:Person exPerson: [ person . lastname ] {
  ex:name [ person . firstname . label ] @ [ person . firstname . lang ] ;
}


Listing 1.10. ShExML multiple values with language tags and with a default language.
ex:Person exPerson: [ person . firstname ] {
  ex:name [ person . firstname ] @ en ;
  ex:name [ person . firstname ] @ [ person . lang ] ;
}



4.5     RDF Collections

This challenge puts on the table the necessity for a mechanism to create RDF
Collections from some values. Normally, in ShExML, and in other data map-
ping languages, when an object generation expression returns multiple values,
multiple triples are generated (see Section 3.3). However, in certain cases it is
necessary to encapsulate these values inside a collection (e.g., to preserve order).
18
     https://www.w3.org/TR/turtle/#h3 sec-grammar-grammar
    This was already explored by some languages (e.g., SPARQL-Generate [7]
and xR2RML [11]) which provide some directives to create collections. Therefore,
we applied this experience in ShExML to cover RDF Collections and Containers
(i.e., Lists, Seqs, Bags and Alts.). Now, it is possible to indicate to the engine
that a collection or container should be generated using the keyword AS plus
the desired collection or container (i.e., RDFList, RDFBag, RDFSeq or RDFAlt). The
proposed syntax follows the same design principles from already existing features
syntax (e.g., Matchers19 ). See Listing 1.11 for an example.
          Listing 1.11. ShExML support for RDF collections and containers.
ex:Article exArticle: [ labValues . articles . title ] {
  a ex:Article ;
  ex:hasAuthors
    exAuthor: [ labValues . articles . authors . name AS RDFList ] ;
}



5      Future required actions
In this section we discuss further challenges that are not solved with the previ-
ously mentioned modifications. These are the challenges that would require to
rethink some functionality, or to include new ones, but that would need from a
well planned inclusion, due to their possible interference with other features.

5.1     Access fields outside iterators (input 2)
Although this challenge was already addressed in Section 4.1, only input 1 was
completely solved. In the case of input 2, where data is in the same hierarchi-
cal level (like it would come from two different files), using join conditions in
ShExML, only one car is linked to each owner when the expected result was two
cars per person. To solve this problem we think of two possible solutions.
     First one is to review the join condition functionality to check whether some-
thing is failing (a bug) or if the join condition need to be rethought and reim-
plemented to cover further challenges.
     Another possibility, which is already present in other languages like YARRRML
[6], is to provide conditional generation. With conditional content generation we
are able to test a condition (e.g., in input 2 for value equality) and generate or
not the resulting triple depending on its result.

5.2     Excel style
A classic solution when dealing with Excel sheets was to convert them to CSV,
and then treat them as tables to be processed by data mapping languages. How-
ever, this challenge found this solution not appropriate when the style of the
Excel sheet is wanted to be preserved. Two solutions could cover this challenge.
19
     http://shexml.herminiogarcia.com/spec/#matcher-0
    First one is to preprocess the Excel sheet and convert it to CSV but adding
columns with style information so it can be processed by state-of-the-art tools.
However, it would require some preprocessing work which would weaken the goal
of low cost and time invested when using data mapping tools.
    Second one is to include a specific Excel processor, with its own query lan-
guage, which can express not only access to cells, but also to the cell and text
style. Thus, in Java based implementations it can be considered to use Apache
POI to process sheets, and include some simple query support to retrieve styles.
Therefore, in ShExML this would require to support the mentioned query lan-
guage for Excel, and then, integrate it into the ShExML engine to retrieve Excel
sheets values.


5.3   Process multivalue references

This challenge is very close to multivalue references (see Section 3.3), but in this
case multivalues are included all within a string value and separated by commas.
    Therefore, here the challenge is not about how to output multivalues, or cre-
ate RDF collections, but how to effectively handle these multivalues which need
some processing. Therefore, this would require some sort of data transformations
functions that can be applied to the extracted values. Therefore, the most effec-
tive way to extend ShExML and enable users to transform data is to provide the
possibility to execute transformation functions which can be defined by users.
    Data transformation functions have been already explored in RML through
the FnO library [9] which provides a set of implementation independent reusable
functions [10]. So, one possibility is to support FnO functions inside ShExML.
The advantage of this proposal is that it moves all function infrastructure outside
the ShExML language and engine. Conversely, we add more dependencies to
users (which can find it hard to learn), we force them to use a third party
environment and we lose control of this part.
    Another possibility is to provide an environment to define inline functions
like the semantic actions in Shape Expressions (ShEx) [13]. Therefore, we can
provide a restricted environment where higher order functions could be executed
(see Listing 1.12 for an example). The advantages are that there is no need for
third party dependencies, it provides a higher flexibility and users do not need
to learn another tool. However, it can increase complexity due to the necessity
to know about functional programming.

        Listing 1.12. ShExML support for RDF collections and containers.
PREFIX ex: < http: // example . com / >
SOURCE lab_file < https: // raw . githubusercontent . com /
kg - construct / mapping - challenges / main / challenges /
process - multivalue - references / input -1/ input . json >
FUNCTION splitFunction  n . split ( ’ , ’) >
ITERATOR lab < jsonpath: $ > {
   FIELD labName < labName >
   ITERATOR articles < article > {
         FIELD title < title >
         FIELD tags < tags >
     }
}

EXPRESSION labValues < lab_file . lab >

ex:Tag ex: [ lab . articles . tag WITH splitFunction ] {
  ex:label [ lab . articles . tag WITH splitFunction ] ;
}


5.4      RDF Collections (input 2 & 3)
Although RDF collections and containers were included in ShExML (see Section
4.5), input 2 and 3 present some particularities. In the case of input 2 the use
of different keys would require a more complex query or some sort of parametri-
sation in executed queries. In input 3 the per row iteration model for CSV
files implemented in ShExML does not create collections effectively. Therefore,
it would imply a reimplementation of per row iteration model for these cases.
However, it could affect the overall functionality for CSV files.

6        Evaluation and Discussion
In Q3 we have posed a question about the possible effects that the modifications
in ShExML could have in already working features. The idea of this question is
to demonstrate the validity of Q1 solutions alongside old features that should
still work as expected. This type of testing, known as regression testing, have
been included in ShExML from the very beginning20 so we are able to add new
features in ShExML knowing that old features are still working as expected.
Thus, every time a new version is released these tests must be executed to
validate the language and engine integrity. Continuous integration is the perfect
tool for this task, as every time that a change is submitted to ShExML repository
all tests are executed to verify the integrity. In the ShExML repository we have
configured Travis CI21 for this task. Therefore, these regression tests in v0.2.422
and v0.2.523 are telling us that all features are still working as expected, and
equally, giving a negative answer to Q3. So, we can conclude that integrity is
held.
     In Sections 3 and 4 we have seen how some mapping challenges were already
solved in ShExML and how we have made some modifications in ShExML lan-
guage and engine to deal with the others. These two sections give an answer
20
   To see all tests that are executed over ShExML engine
   https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/es/
   weso/shexml
21
   https://travis-ci.org/
22
   https://travis-ci.org/github/herminiogg/ShExML/builds/755033209
23
   https://travis-ci.org/github/herminiogg/ShExML/builds/756419674
for Q1. These solutions were designed to maintain ShExML usability [4] using
a similar syntax, so that users can use these new features with the minimum
learning curve possible; in other words, making the smallest modifications in the
ShExML syntax. In addition, in Section 3 we have highlighted how the ShExML
design have already given an answer to some challenges, emphasising how the
ShExML separation of concerns principle can give answers to some of them (e.g.,
Join on literal).
    In Section 5 we have given some intuition on how remaining challenges could
be solved, answering to Q2. They would require harder and more complex modi-
fications; in some cases the modification of an already working mechanism (e.g.,
inputs 2 and 3 in RDF Collections), the inclusion of a new iteration model and
the design of a new query language (e.g., Excel style) or the choice between two
different systems (e.g., data transformation functions in process multivalue ref-
erences). All these inclusions will require a careful study and implementation in
the language so they do not affect other features and, also, to select the better
option from a usability perspective.


7   Conclusions
In this paper we have explored how ShExML can deal with some of the chal-
lenges defined in the Knowledge Graph Construction W3C Community Group.
We have divided them into challenges already solved by ShExML before their
definition, challenges solved by latest versions of ShExML, and challenges that
are not yet solved, for which we have given some notions and intuitions on how
ShExML can be modified to cover them. Furthermore, we have demonstrated
that the modification of ShExML to cover new challenges has not affected other
language and engine features. Therefore, we see this work as a first step on
demonstrating how the challenges can be solved and, together with solutions
from other languages and the joint discussion, we will be able to offer unified
solutions to the posed mapping challenges.


References
 1. Chaves-Fraga, D., Heyvaert, P., Priyatna, F., Sequeda, J.F., Dimou, A., Jabeen,
    H., Graux, D., Sejdiu, G., Saleem, M., Lehmann, J. (eds.): Joint Proceedings of the
    1st International Workshop on Knowledge Graph Building and 1st International
    Workshop on Large Scale RDF Analytics co-located with 16th Extended Semantic
    Web Conference (ESWC 2019), Portorož, Slovenia, June 3, 2019, CEUR Workshop
    Proceedings, vol. 2489. CEUR-WS.org (2019)
 2. Delva, T., Van Assche, D., Heyvaert, P., De Meester, B., Dimou, A.: Integrating
    nested data into knowledge graphs with RML fields. In: To appear on: Proceedings
    of the 2nd International Workshop on Knowledge Graph Building co-located with
    18th Extended Semantic Web Conference (ESWC 2021), Hersonissos, Greece, June
    6, 2021. CEUR Workshop Proceedings (2021)
 3. Dimou, A., Sande, M.V., Colpaert, P., Verborgh, R., Mannens, E., de Walle, R.V.:
    RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data.
    In: Proceedings of the Workshop on Linked Data on the Web co-located with the
    23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April
    8, 2014. (2014)
 4. Garcı́a-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.:
    ShExML: improving the usability of heterogeneous data mapping languages for
    first-time users. PeerJ Computer Science 6, e318 (2020)
 5. Garcı́a-González, H., Fernández-Álvarez, D., Gayo, J.E.L.: ShExML: An Heteroge-
    neous Data Mapping Language based on ShEx. In: Proceedings of the EKAW 2018
    Posters and Demonstrations Session co-located with 21st International Conference
    on Knowledge Engineering and Knowledge Management (EKAW 2018), Nancy,
    France, November 12-16, 2018. pp. 9–12 (2018)
 6. Heyvaert, P., Meester, B.D., Dimou, A., Verborgh, R.: Declarative Rules for Linked
    Data Generation at Your Fingertips! In: The Semantic Web: ESWC 2018 Satellite
    Events - ESWC 2018 Satellite Events, Heraklion, Crete, Greece, June 3-7, 2018,
    Revised Selected Papers. pp. 213–217 (2018)
 7. Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL Extension for Generat-
    ing RDF from Heterogeneous Formats. In: Blomqvist, E., Maynard, D., Gangemi,
    A., Hoekstra, R., Hitzler, P., Hartig, O. (eds.) The Semantic Web - 14th Inter-
    national Conference, ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017,
    Proceedings, Part I. Lecture Notes in Computer Science, vol. 10249, pp. 35–50
    (2017)
 8. Meester, B.D., Heyvaert, P., Verborgh, R., Dimou, A.: Mapping Languages: Anal-
    ysis of Comparative Characteristics. In: Chaves-Fraga, D., Heyvaert, P., Priy-
    atna, F., Sequeda, J.F., Dimou, A., Jabeen, H., Graux, D., Sejdiu, G., Saleem,
    M., Lehmann, J. (eds.) Joint Proceedings of the 1st International Workshop on
    Knowledge Graph Building and 1st International Workshop on Large Scale RDF
    Analytics co-located with 16th Extended Semantic Web Conference (ESWC 2019),
    Portorož, Slovenia, June 3, 2019. CEUR Workshop Proceedings, vol. 2489, pp. 37–
    45. CEUR-WS.org (2019)
 9. Meester, B.D., Maroy, W., Dimou, A., Verborgh, R., Mannens, E.: RML and
    FnO: Shaping DBpedia Declaratively. In: Blomqvist, E., Hose, K., Paulheim, H.,
    Lawrynowicz, A., Ciravegna, F., Hartig, O. (eds.) The Semantic Web: ESWC 2017
    Satellite Events - ESWC 2017 Satellite Events, Portorož, Slovenia, May 28 - June
    1, 2017, Revised Selected Papers. Lecture Notes in Computer Science, vol. 10577,
    pp. 172–177. Springer (2017)
10. Meester, B.D., Seymoens, T., Dimou, A., Verborgh, R.: Implementation-
    independent function reuse. Future Gener. Comput. Syst. 110, 946–959 (2020)
11. Michel, F., Djimenou, L., Faron-Zucker, C., Montagnat, J.: Translation of Rela-
    tional and Non-relational Databases into RDF with xR2RML. In: Monfort, V.,
    Krempels, K., Majchrzak, T.A., Turk, Z. (eds.) WEBIST 2015 - Proceedings of
    the 11th International Conference on Web Information Systems and Technologies,
    Lisbon, Portugal, 20-22 May, 2015. pp. 443–454. SciTePress (2015)
12. Michel, F., Djimenou, L., Zucker, C.F., Montagnat, J.: xR2RML: Relational and
    non-relational databases to RDF mapping language. Tech. rep. (2017)
13. Prud’hommeaux, E., Gayo, J.E.L., Solbrig, H.R.: Shape expressions: an RDF val-
    idation and transformation language. In: Sack, H., Filipowska, A., Lehmann, J.,
    Hellmann, S. (eds.) Proceedings of the 10th International Conference on Semantic
    Systems, SEMANTICS 2014, Leipzig, Germany, September 4-5, 2014. pp. 32–40.
    ACM (2014)