<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A ShExML Perspective on Mapping Challenges: Already Solved Ones, Language Modifications and Future Required Actions</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>IT and Communications Service, University of Oviedo</institution>
          ,
          <addr-line>Asturias</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Data mapping languages allow users to create knowledge graphs at a lower cost and in less time. Some challenges, though, cannot be solved with state-of-the-art languages and tools. Thus, in this paper we use and modify ShExML to deal with some of them. We show which challenges were already solved, which modifications we had to make to cover others, and how the remaining ones could be covered in future versions. We then demonstrate that the language keeps its integrity after the changes and discuss the changes already made and those still to come. These solutions, alongside the discussion and joint analysis of the solutions offered by other languages and tools, will lead us to effective techniques for solving all the proposed challenges.</p>
      </abstract>
      <kwd-group>
        <kwd>mapping challenges</kwd>
        <kwd>ShExML</kwd>
        <kwd>data mapping languages</kwd>
        <kwd>knowledge graph construction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Mapping heterogeneous data sources into a single representation is an active
field that has been gaining traction in recent years. For this purpose, a set
of languages and tools has been proposed [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] which lower the cost and time
required by these tasks. This trajectory led to the 1st
International Workshop on Knowledge Graph Building [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and, one year later, to
the start of the Knowledge Graph Construction W3C Community Group<sup>1</sup>,
where academia, industry and practitioners gather to envision new steps,
find unsolved problems, and face new challenges in this field<sup>2</sup>. One of the
community outputs was a set of mapping challenges<sup>3</sup> that are currently difficult
to solve with state-of-the-art languages, tools and techniques.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p><sup>1</sup> https://w3id.org/kg-construct/tpac/</p>
      <p><sup>2</sup> https://w3id.org/kg-construct/tpac/#report</p>
      <p><sup>3</sup> https://w3id.org/kg-construct/workshop/challenges.html</p>
      <p>
        Therefore, in this paper we tackle some of these mapping challenges with
ShExML [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and try to answer the following questions:
        <list list-type="simple">
          <list-item><p>Q1: How can mapping challenges be solved with ShExML?</p></list-item>
          <list-item><p>Q2: How can unaddressed challenges be solved and implemented in ShExML?</p></list-item>
          <list-item><p>Q3: Have the modifications to ShExML affected the functioning of existing features?</p></list-item>
        </list>
      </p>
      <p>The rest of the paper is structured as follows. In Section 2 we summarise the
mapping challenges proposed by the community and offer a set of
supplementary material to support the explanations that follow. In Section 3 we show how
the current language specification and engine can solve some of the challenges.
In Section 4 we explain how the solutions for other challenges have been implemented
and included in ShExML. In Section 5 we propose some further
language modifications and discuss how the rest of the challenges could
be addressed. In Section 6 we demonstrate that existing features keep their integrity after the new
ones are included and we discuss the results obtained for the mapping challenges.
Finally, in Section 7 we draw some conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Mapping challenges summary</title>
      <p>During consecutive meetings of the Knowledge Graph Construction W3C
Community Group, several mapping challenges and problems were raised, which are
collected on the workshop website<sup>4</sup>. In this paper we deal with this
selection of mapping challenges: we examine them and propose solutions within the
ShExML language and our engine<sup>5</sup>. To categorise these solutions we link them
to ShExML versions so we can trace when each solution was achieved, i.e., whether
the challenge could be solved before the mapping challenges were defined (ShExML
v0.2.3), was solved after the mapping challenges were defined (ShExML
v0.2.4 &amp; v0.2.5), or is not yet solved (future versions).</p>
      <p>Table 1 summarises the addressed challenges
and the version of the ShExML engine with which the expected output is achieved.
In addition, we offer a webpage<sup>6</sup> with links to the working solutions as
supplementary material for the sake of demonstration and reproducibility. The inputs are
taken from the Knowledge Graph Construction W3C Community Group
mapping challenges repository<sup>7</sup>, which contains a set of inputs and expected outputs
that are agreed to represent the community mapping problems.</p>
      <p>In the following sections we explain how the solutions are achieved and which ShExML
constructions and techniques were used, discuss the solutions reached,
and consider how unsolved challenges could be addressed in ShExML.</p>
      <sec id="sec-2-1">
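        <title>Table 1</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Already solved (v0.2.3)</th><th>With language modifications (v0.2.4 &amp; v0.2.5)</th></tr>
            </thead>
            <tbody>
              <tr><td>x</td><td>X</td></tr>
              <tr><td>x</td><td>x</td></tr>
              <tr><td>x</td><td>X</td></tr>
              <tr><td>x</td><td>X</td></tr>
              <tr><td>x</td><td>X</td></tr>
              <tr><td>x</td><td>X</td></tr>
              <tr><td>X</td><td>X</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p><sup>4</sup> https://w3id.org/kg-construct/workshop/challenges.html</p>
        <p><sup>5</sup> https://github.com/herminiogg/ShExML</p>
        <p><sup>6</sup> http://herminiogg.github.io/mapping-challenges/challenges/solutions.html</p>
        <p><sup>7</sup> https://github.com/kg-construct/mapping-challenges</p>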
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Already solved mapping challenges</title>
      <p>In this section, we deal with the mapping challenges that can be solved without
any modification to the ShExML language and engine. These solutions
are therefore reachable with ShExML v0.2.3 (released on 29th October
2020)<sup>8</sup>, that is, before the mapping challenges were defined.</p>
      <sec id="sec-3-1">
        <title>3.1 Datatype map (input 5)</title>
        <p>Datatype map refers to the possibility of generating datatype tags from the input
content. Instead of defining them statically in the mapping rules, this
challenge aims to support the dynamic generation of datatype tags from input
content. In the case of input 5, mapping languages are expected to
assign each value the most probable datatype according to
its format. For example, in input 5 the number 3 is expected to
have an xsd:integer datatype and 3.14 an xsd:decimal one.</p>
        <p>This inference mechanism was already implemented in the ShExML engine: when
the user does not explicitly define a datatype for an object value,
the engine infers the most probable one (see Listing 1.1). Although the current
implementation solves this specific mapping challenge, it is a naive one.
It could, however, evolve into a more complex inference system, for example, one that aligns
the datatypes of the input data sources with RDF ones.</p>
        <p><sup>8</sup> https://github.com/herminiogg/ShExML/releases/tag/v0.2.3</p>
        <p>Listing 1.1. ShExML datatype inference function.</p>
        <preformat>
import scala.util.Try
import org.apache.jena.datatypes.RDFDatatype
import org.apache.jena.datatypes.xsd.XSDDatatype

// Tries each lexical form in turn and falls back to xsd:string.
protected def searchForXSDType(o: String): RDFDatatype = {
  if (Try(o.toInt).isSuccess)
    XSDDatatype.XSDinteger
  else if (Try(o.toDouble).isSuccess)
    XSDDatatype.XSDdecimal
  else if (Try(o.toBoolean).isSuccess)
    XSDDatatype.XSDboolean
  else
    XSDDatatype.XSDstring
}
        </preformat>
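        <p>To make the order of the checks easier to follow, the same idea can be sketched in Python. This is only an illustration under our own assumptions, not the engine code: the function name infer_xsd_datatype is hypothetical, datatypes are returned as prefixed names rather than Jena objects, and Python's int and float parsers do not accept exactly the same lexical forms as their Scala counterparts.</p>
        <preformat>
```python
def infer_xsd_datatype(value: str) -> str:
    """Guess a prefixed XSD datatype name from the lexical form of a value.

    Hypothetical helper: it mirrors the integer, decimal, boolean, string
    order of the inference described in the text, nothing more.
    """
    # Try the narrowest numeric type first, then the wider one.
    for parser, datatype in ((int, "xsd:integer"), (float, "xsd:decimal")):
        try:
            parser(value)
            return datatype
        except ValueError:
            continue
    # Booleans are recognised by their two literal spellings.
    if value.lower() in ("true", "false"):
        return "xsd:boolean"
    # Anything else is kept as a plain string.
    return "xsd:string"
```
        </preformat>
        <p>Because the integer check runs first, a value such as 3 never reaches the decimal branch, which is what input 5 expects. A naive check of this kind also accepts forms such as exponent notation as decimals, which is one reason a more careful inference system may be needed.</p>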
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Join on literal</title>
        <p>
          This challenge refers to the possibility of generating literals from a join condition
(i.e., from another source), where R2RML<sup>9</sup> and RML [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] output a resource by
default.
        </p>
        <p>In ShExML, join conditions<sup>10</sup> generate values without any specific form, so
at this step it is not determined whether the value is a literal or a resource. This is defined by
the user in the shapes part, where the user decides the form of the output. This
is a ShExML design decision driven by the separation of concerns
principle. In Listing 1.2 we can see how the join condition is defined in the
familyName expression, and how this expression is then used in the :Author shape
without any prefix, indicating that a literal must be generated.</p>
        <p>Listing 1.2. ShExML solution for join on literals.</p>
        <preformat>
PREFIX : &lt;http://example.com/&gt;
PREFIX experson: &lt;http://example.com/person/&gt;
PREFIX dbr: &lt;http://dbpedia.org/resource/&gt;
PREFIX schema: &lt;http://schema.org/&gt;
SOURCE jsonfile &lt;https://raw.githubusercontent.com/kg-construct/mapping-challenges/2aac9680cd731fd647abd33d44a7f400e4278cf3/challenges/join-on-literal/input-1/input.json&gt;
ITERATOR author &lt;jsonpath: $.author[*]&gt; {
    FIELD id &lt;id&gt;
    FIELD firstname &lt;firstname&gt;
    FIELD affiliation &lt;affiliation&gt;
}
ITERATOR people &lt;jsonpath: $.people[*]&gt; {
    FIELD firstname &lt;firstname&gt;
    FIELD familyname &lt;familyName&gt;
}
EXPRESSION authors &lt;jsonfile.author UNION jsonfile.people&gt;
EXPRESSION familyName &lt;jsonfile.people.familyname UNION jsonfile.author.firstname JOIN jsonfile.people.firstname&gt;
:Author experson:[authors.id] {
    :affiliation [authors.affiliation] ;
    :lastName [familyName] ;
}
        </preformat>
        <p><sup>9</sup> https://www.w3.org/TR/r2rml/</p>
        <p><sup>10</sup> See the ShExML specification for a full explanation of how the join construction works: http://shexml.herminiogarcia.com/spec/#join</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Multivalue references</title>
        <p>This challenge deals with the expected output of a hierarchical document (e.g.,
XML or JSON files) where multiple iterators are used. The discussion in this
challenge is whether we produce the cartesian product and provide a join
condition to correlate values, or whether we just translate the hierarchical information as
it is, without the need to provide any join condition<sup>11</sup>. This case becomes more
complicated if a join condition needs to be provided over a JSON file, because
parent nodes cannot be accessed (see Section 4.1 for the specific challenge
on this topic). Therefore, it seems that for hierarchical data the expected output
should be a verbatim translation (that is, without creating the cartesian product,
as that is not how the data is represented in the input file).</p>
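        <p>The difference between the two readings can be shown with a small Python sketch. The document, the field names and both helper functions are hypothetical and chosen only for illustration; they are not taken from the challenge inputs or from any engine.</p>
        <preformat>
```python
from itertools import product

# Hypothetical hierarchical input: two articles with their own authors.
lab = {
    "articles": [
        {"title": "A1", "authors": [{"name": "Ann"}, {"name": "Bob"}]},
        {"title": "A2", "authors": [{"name": "Cid"}]},
    ]
}


def verbatim_pairs(doc):
    """Follow the nesting: each author stays attached to its own article."""
    return [
        (article["title"], author["name"])
        for article in doc["articles"]
        for author in article["authors"]
    ]


def cartesian_pairs(doc):
    """Flatten both levels independently and combine every title with every name."""
    titles = [article["title"] for article in doc["articles"]]
    names = [
        author["name"]
        for article in doc["articles"]
        for author in article["authors"]
    ]
    return list(product(titles, names))
```
        </preformat>
        <p>With this input the verbatim reading yields three article and author pairs, while the cartesian product yields six, including pairs such as A2 with Ann that do not exist in the source and would have to be filtered out again with a join condition.</p>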
        <p>
          In ShExML, this has been the default behaviour from its inception, as the
first versions of ShExML only supported XML and JSON files. Besides, we saw it as a
more usable way to define these mappings, as usability is the main goal of
the language [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In Listing 1.3 we can see how iterators and
nested iterators cover these hierarchical data models. In Table 1 this
challenge is marked as not solved in ShExML v0.2.3 due to a bug when using
only the root node ($) in the top iterator query. However, we include it here
because covering this challenge required neither syntax modifications nor new
features in the ShExML engine, only a bug fix.
        </p>
        <p>Listing 1.3. ShExML solution for multivalue references.</p>
        <preformat>
PREFIX ex: &lt;http://example.com/&gt;
PREFIX exLab: &lt;http://example.com/lab/&gt;
PREFIX exArticle: &lt;http://example.com/article/&gt;
PREFIX exAuthor: &lt;http://example.com/author/&gt;
PREFIX exAff: &lt;http://example.com/aff/&gt;
SOURCE lab_file &lt;https://raw.githubusercontent.com/kg-construct/mapping-challenges/main/challenges/multivalue-references/input-1/input.json&gt;
ITERATOR lab &lt;jsonpath: $&gt; {
    FIELD labName &lt;labName&gt;
    ITERATOR articles &lt;articles[*]&gt; {
        FIELD title &lt;title&gt;
        ITERATOR authors &lt;authors[*]&gt; {
            FIELD name &lt;name&gt;
            ITERATOR affiliation &lt;affiliation[*]&gt; {
                FIELD label &lt;label&gt;
            }
        }
    }
}
EXPRESSION labValues &lt;lab_file.lab&gt;
ex:Lab exLab:[labValues.labName] {
    a ex:Lab ;
    ex:hasArticles @ex:Article ;
    ex:hasMembers @ex:Author ;
}
ex:Article exArticle:[labValues.articles.title] {
    ex:hasAuthor @ex:Author ;
}
ex:Author exAuthor:[labValues.articles.authors.name] {
    ex:hasAffiliation exAff:[labValues.articles.authors.affiliation.label] ;
}
        </preformat>
        <p><sup>11</sup> See the discussion about this topic in the RMLMapper reference implementation: https://github.com/RMLio/rmlmapper-java/issues/28</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Language modifications</title>
      <p>In this section, we deal with the mapping challenges that needed some modifications
to the ShExML language and engine. These solutions are therefore
reachable with ShExML v0.2.4 (released on 18th January 2021)<sup>12</sup>
or with ShExML v0.2.5 (released on 27th January 2021)<sup>13</sup>, that is, after the
mapping challenges were defined.</p>
      <sec id="sec-4-1">
        <title>4.1 Access fields outside iterators</title>
        <p>Sometimes, in hierarchical data models, there is a need to access values outside
the iteration pattern. For example, we may need to obtain values that belong to
parents of the node currently being iterated. With XML files this does not
require any modification to ShExML, as XPath queries let us
access upper nodes with the double dot and slash notation (i.e., ../). However,
with JSON files this is not possible because JSONPath does not
support the parent access notation<sup>14</sup>.</p>
        <p><sup>12</sup> https://github.com/herminiogg/ShExML/releases/tag/v0.2.4</p>
        <p><sup>13</sup> https://github.com/herminiogg/ShExML/releases/tag/v0.2.5</p>
        <p>
          This is a well-known problem in data mapping languages, as they use
JSONPath to define value accesses. Indeed, in xR2RML [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], the authors defined a
property called xrr:pushDown that takes a value in the hierarchy and pushes it
down into the iterators of its descendants so that it is available further down [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>Following this experience with xR2RML, we implemented a similar solution
in ShExML using the PUSHED_FIELD and POPPED_FIELD keywords. When the
PUSHED_FIELD keyword is used, the ShExML engine saves the value, using the field name as
the identifier for later use. Then, when POPPED_FIELD is used, the ShExML
engine searches for the saved value whose identifier is equal to the one given
in the query part (i.e., inside &lt; and &gt;). Therefore, in Listing 1.4, the id field is
saved and then used in the cars iterator, so we can establish a relation from the
car to the owner.</p>
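        <p>The push and pop mechanism can be pictured with the following Python sketch. It is a simplified illustration and not the ShExML engine implementation: the function iterate_cars, the pushed dictionary and the sample document are all hypothetical.</p>
        <preformat>
```python
# Hypothetical input shaped like the records and cars structure in the text.
sample = {
    "records": [
        {"id": "p1", "enteredBy": "ann", "cars": [{"make": "Seat"}, {"make": "Fiat"}]},
        {"id": "p2", "enteredBy": "bob", "cars": [{"make": "Audi"}]},
    ]
}


def iterate_cars(document):
    """Return one row per car, carrying the owner's id down from the parent record."""
    rows = []
    for record in document["records"]:
        # Analogous to PUSHED_FIELD id: remember the value under its name.
        pushed = {"id": record["id"]}
        for car in record.get("cars", []):
            rows.append({
                "make": car["make"],
                # Analogous to POPPED_FIELD carOwner: look the saved value up by name.
                "carOwner": pushed["id"],
            })
    return rows
```
        </preformat>
        <p>Only the values that the mapping explicitly pushes are kept while the nested iterator runs, which is the main difference from approaches that implicitly keep the whole iteration context.</p>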
        <p>
          An interesting discussion here is whether it is better to make these accesses implicit
or explicit. A recent RML syntax modification proposal [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] presented an
algorithm that can implicitly access values from upper hierarchical levels by saving
iteration information, indexes and values, so that users can directly access upper
elements from their current hierarchical level. The solution is clean and spares
the user from explicitly declaring the values to be saved. However, it has two possible
main drawbacks. First, it uses more memory and time, since it
saves a lot of information that may or may not be used in later mapping rules;
if it is not carefully implemented, this technique could end up being a
performance bottleneck. Second, it could complicate the engine
implementation, as it allows moving up and down the hierarchy, whereas the current behaviour only
expects to go down. This could also lead to performance issues. Either way,
this trade-off should be quantified in further experiments to establish the best
solution in terms of usability and performance.
        </p>
        <p>Listing 1.4. ShExML solution for accessing fields outside iterations.</p>
        <preformat>
ITERATOR records &lt;jsonpath: $.records[*]&gt; {
    PUSHED_FIELD id &lt;id&gt;
    FIELD enteredBy &lt;enteredBy&gt;
    ITERATOR cars &lt;cars[*]&gt; {
        FIELD make &lt;make&gt;
        POPPED_FIELD carOwner &lt;id&gt;
    }
}
        </preformat>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Datatype map</title>
        <p>As we mentioned in Section 3.1, this challenge aims to generate datatype tags
dynamically from data content. The datatype inputs can appear in
multiple ways: as a full URI, as a prefix plus datatype, or simply as a datatype name without
prefix.</p>
        <p><sup>14</sup> https://goessner.net/articles/JsonPath/</p>
        <p>
          ShExML v0.2.3 supports the creation of static datatype tags with the prefix
plus datatype syntax (see Listing 1.5). Therefore, the new syntax should derive from this one
and maintain its proven usability [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] while adding dynamic datatype generation.
The natural extension is to allow the same generation expression used for objects
to be used for datatypes and language tags as well (see Section 4.3).
So, the final syntax is a prefix plus a generation expression (inside square brackets),
as we can see in Listing 1.6. The prefix can be omitted if the data value already
contains it (e.g., inputs 1 and 2), and values can be transformed using the Matcher
feature<sup>15</sup> into valid XML Schema datatypes (e.g., input 4).
        </p>
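        <p>The three input forms mentioned in this section (full URI, prefix plus datatype, and bare datatype name) can be normalised as in the Python sketch below. It is an assumption-laden illustration rather than the engine logic: the function resolve_datatype, the PREFIXES table and the choice of xsd as default prefix are hypothetical.</p>
        <preformat>
```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical prefix table; a real mapping would take it from its PREFIX declarations.
PREFIXES = {"xsd": XSD}


def resolve_datatype(raw: str, default_prefix: str = "xsd") -> str:
    """Expand a datatype given as a full IRI, a prefixed name, or a bare name."""
    # Full IRI: nothing to expand.
    if raw.startswith(("http://", "https://")):
        return raw
    # Prefixed name such as xsd:integer: expand the declared prefix.
    if ":" in raw:
        prefix, local_name = raw.split(":", 1)
        return PREFIXES[prefix] + local_name
    # Bare name such as integer: fall back to the prefix written in the mapping.
    return PREFIXES[default_prefix] + raw
```
        </preformat>
        <p>All three spellings of the integer datatype resolve to the same IRI, which corresponds to the case where the prefix in the mapping rule is optional because the data already carries it.</p>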
        <p>Listing 1.5. ShExML static datatypes syntax.</p>
        <preformat>
ex:Person exPerson:[person.firstname] {
    ex:num [person.num] xsd:integer ;
}
        </preformat>
        <p>Listing 1.6. ShExML dynamic datatypes syntax.</p>
        <preformat>
ex:Person exPerson:[person.firstname] {
    ex:num [person.num] xsd:[person.dt] ;
}
        </preformat>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Language map</title>
        <p>As with datatype maps in Section 4.2, the language map challenge aims to
address the problem of generating language tags dynamically from input data.
In ShExML v0.2.3 language tags were supported statically, that is, it was possible
to tag an object expression with a specific language, but the tag was applied to
all values (see Listing 1.7).</p>
        <p>As with datatype maps, we modified the syntax and the engine to
be able to generate language tags with expressions. The final syntax is @ plus the
generation expression (between square brackets), as can be seen in Listing 1.8.
Here, again, the idea was to preserve usability as the main goal and to make
the syntax as simple as possible. Input 1 tests the generation of a valid tag following
BCP 47<sup>16</sup>, input 2 tests the transformation of a language value into a valid tag (in
ShExML this is done using the Matchers functionality<sup>17</sup>), and input 3 shows
how two different sources can be joined to provide language information.</p>
        <p>As a side note, it is not possible to specify a language map and a datatype
map in the same triple, as the ShExML grammar forbids it. This is
intentional and follows the RDF specification rules, as can be seen in its
grammar<sup>18</sup>. Therefore, whenever a langtag generation expression (either static
or dynamic) is provided, the implicit datatype is rdf:langString.</p>
        <p><sup>15</sup> http://shexml.herminiogarcia.com/spec/#matcher</p>
        <p><sup>16</sup> https://tools.ietf.org/html/bcp47</p>
        <p><sup>17</sup> http://shexml.herminiogarcia.com/spec/#matcher</p>
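        <p>The rule that a literal carries either a language tag or a datatype, and never both, can be expressed as a small Python sketch. This is a hypothetical model written for illustration only; the Literal class and its method are not part of ShExML, where the restriction is enforced by the grammar itself.</p>
        <preformat>
```python
from dataclasses import dataclass
from typing import Optional

RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical RDF literal holding a datatype or a language tag, not both."""
    value: str
    datatype: Optional[str] = None
    language: Optional[str] = None

    def __post_init__(self):
        # Reject the combination that the grammar forbids.
        if self.datatype is not None and self.language is not None:
            raise ValueError("a literal takes a datatype or a language tag, not both")

    def effective_datatype(self) -> str:
        # A language-tagged string always has the implicit datatype rdf:langString.
        if self.language is not None:
            return RDF_LANGSTRING
        # Without a tag, use the explicit datatype or default to xsd:string.
        return self.datatype or XSD_STRING
```
        </preformat>
        <p>In this sketch the check happens when the literal is built, whereas in ShExML the equivalent mapping rule would already be rejected when the mapping is parsed.</p>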
        <p>Listing 1.7. ShExML static generation of language tags.</p>
        <preformat>
ex:Person exPerson:[person.firstname] {
    ex:lastName [person.lastname] @en ;
}
        </preformat>
        <p>Listing 1.8. ShExML dynamic generation of language tags.</p>
        <preformat>
ex:Person exPerson:[person.firstname] {
    ex:lastName [person.lastname] @[person.lang] ;
}
        </preformat>
      </sec>
      <sec id="sec-4-4">
        <title>4.4 Generate multiple values</title>
        <p>This challenge aims to address the problem of generating various datatypes or
language tags for the same subject (e.g., a multi-language value). Once datatype
maps (see Section 4.2) and language maps (see Section 4.3) are supported in
ShExML, this is straightforward, as ShExML generates one triple per value
returned by the object expression. Therefore, multi-language values are generated
with the syntax in Listing 1.9, and multi-language values with a
default language with the syntax in Listing 1.10.</p>
        <p>Listing 1.9. ShExML multiple values with language tags.</p>
        <preformat>
ex:Person exPerson:[person.lastname] {
    ex:name [person.firstname.label] @[person.firstname.lang] ;
}
        </preformat>
        <p>Listing 1.10. ShExML multiple values with language tags and with a default language.</p>
        <preformat>
ex:Person exPerson:[person.firstname] {
    ex:name [person.firstname] @en ;
    ex:name [person.firstname] @[person.lang] ;
}
        </preformat>
      </sec>
      <sec id="sec-4-5">
        <title>4.5 RDF Collections</title>
        <p>This challenge raises the need for a mechanism to create RDF
Collections from a set of values. Normally, in ShExML and in other data
mapping languages, when an object generation expression returns multiple values,
multiple triples are generated (see Section 3.3). However, in certain cases it is
necessary to encapsulate these values inside a collection (e.g., to preserve order).</p>
        <p><sup>18</sup> https://www.w3.org/TR/turtle/#h3 sec-grammar-grammar</p>
        <p>
          This was already explored by some languages (e.g., SPARQL-Generate [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
and xR2RML [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]) which provide directives to create collections. Therefore,
we applied this experience in ShExML to cover RDF Collections and Containers
(i.e., Lists, Seqs, Bags and Alts). It is now possible to tell the engine
that a collection or container should be generated by using the keyword AS followed by
the desired collection or container (i.e., RDFList, RDFBag, RDFSeq or RDFAlt). The
proposed syntax follows the same design principles as the syntax of existing features
(e.g., Matchers<sup>19</sup>). See Listing 1.11 for an example.
        </p>
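        <p>To illustrate what generating an RDF list involves, the Python sketch below expands an ordered sequence of values into the usual rdf:first and rdf:rest triples. It is a minimal sketch under our own assumptions: the function build_rdf_list and the blank node labels are hypothetical, and triples are modelled as plain tuples instead of using an RDF library.</p>
        <preformat>
```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_rdf_list(values):
    """Return (head, triples) encoding the given values as an ordered rdf:List."""
    # The empty list is represented by rdf:nil and needs no triples.
    if not values:
        return RDF + "nil", []
    # One blank node per element; the labels are arbitrary.
    nodes = ["_:b%d" % index for index in range(len(values))]
    triples = []
    for index, value in enumerate(values):
        is_last = index + 1 == len(values)
        rest = RDF + "nil" if is_last else nodes[index + 1]
        triples.append((nodes[index], RDF + "first", value))
        triples.append((nodes[index], RDF + "rest", rest))
    return nodes[0], triples
```
        </preformat>
        <p>The head node returned by the function is what would be used as the object of the property (ex:hasAuthors in Listing 1.11), so a single triple points to the whole ordered collection instead of one triple per value.</p>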
        <p>Listing 1.11. ShExML support for RDF collections and containers.</p>
        <preformat>
ex:Article exArticle:[labValues.articles.title] {
    a ex:Article ;
    ex:hasAuthors exAuthor:[labValues.articles.authors.name AS RDFList] ;
}
        </preformat>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Future required actions</title>
      <p>In this section we discuss the challenges that are not solved by the
modifications described above. These challenges would require
rethinking some functionality or including new features, and any such change would need to be
carefully planned because of its possible interference with other features.</p>
      <sec id="sec-5-1">
        <title>5.1 Access fields outside iterators (input 2)</title>
        <p>Although this challenge was already addressed in Section 4.1, only input 1 was
completely solved. In the case of input 2, where the data is at the same
hierarchical level (as if it came from two different files), using join conditions in
ShExML links only one car to each owner, when the expected result is two
cars per person. We see two possible solutions to this problem.</p>
        <p>The first is to review the join condition functionality to check whether
something is failing (a bug) or whether the join condition needs to be rethought and
reimplemented to cover further challenges.</p>
        <p>
          Another possibility, which is already present in other languages like YARRRML
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], is to provide conditional generation. With conditional content generation we
are able to test a condition (e.g., value equality in input 2) and generate the
resulting triple, or not, depending on the outcome.
        </p>
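        <p>The idea behind conditional generation can be sketched in Python as follows. The data, the field names and the function link_cars are hypothetical and only loosely inspired by the owners and cars scenario; the sketch does not reproduce input 2 or the behaviour of any existing engine.</p>
        <preformat>
```python
# Hypothetical flat inputs, as if owners and cars came from two different files.
persons = [{"id": "p1"}, {"id": "p2"}]
cars = [
    {"make": "Seat", "owner": "p1"},
    {"make": "Fiat", "owner": "p1"},
    {"make": "Audi", "owner": "p2"},
    {"make": "Opel", "owner": "p2"},
]


def link_cars(person_rows, car_rows):
    """Emit one triple for every person and car pair that passes the condition."""
    triples = []
    for person in person_rows:
        for car in car_rows:
            # The tested condition: value equality between the two sources.
            if car["owner"] == person["id"]:
                triples.append((person["id"], "ex:ownsCar", car["make"]))
    return triples
```
        </preformat>
        <p>Because the condition is evaluated for every candidate pair, every matching car produces its own triple, giving the two cars per person that the challenge expects.</p>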
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Excel style</title>
        <p>A classic solution for dealing with Excel sheets is to convert them to CSV
and then treat them as tables to be processed by data mapping languages.
However, this challenge considers that solution inappropriate when the style of the
Excel sheet must be preserved. Two solutions could cover this challenge.</p>
        <p><sup>19</sup> http://shexml.herminiogarcia.com/spec/#matcher-0</p>
        <p>The first is to preprocess the Excel sheet and convert it to CSV, adding
columns with style information so that it can be processed by state-of-the-art tools.
However, this preprocessing work would undermine the goal
of investing little cost and time when using data mapping tools.</p>
        <p>The second is to include a specific Excel processor, with its own query
language, which can express not only access to cells but also access to the cell and text
style. In Java-based implementations, Apache
POI could be used to process the sheets, together with some simple query support to retrieve styles.
In ShExML this would require supporting the mentioned query
language for Excel and then integrating it into the ShExML engine to retrieve the values of Excel
sheets.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Process multivalue references</title>
        <p>This challenge is very close to multivalue references (see Section 3.3), but in this
case the multiple values are all contained within a single string value, separated by commas.</p>
        <p>Here the challenge is not about how to output multivalues or
create RDF collections, but about how to effectively handle multivalues that need
some processing. This requires some sort of data transformation
functions that can be applied to the extracted values. The most effective
way to extend ShExML and enable users to transform data is therefore to provide the
possibility to execute transformation functions that can be defined by users.</p>
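        <p>The kind of user-defined transformation discussed above can be illustrated with a short Python sketch. Both functions are hypothetical and written only for this illustration: split_multivalue plays the role of a function supplied by the user, and apply_transformation plays the role an engine would take when applying it to every extracted value.</p>
        <preformat>
```python
def split_multivalue(raw, separator=","):
    """Split a delimited string into trimmed, non-empty values."""
    return [part.strip() for part in raw.split(separator) if part.strip()]


def apply_transformation(values, function):
    """Apply a user-defined function to each extracted value and flatten the results."""
    transformed = []
    for value in values:
        # Each input value may expand into several output values.
        transformed.extend(function(value))
    return transformed
```
        </preformat>
        <p>For instance, applying split_multivalue to the extracted values "a,b" and "c" yields the three values a, b and c, each of which would then produce its own triple in the usual way.</p>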
        <p>
          Data transformation functions have already been explored in RML through
the FnO library [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which provides a set of implementation-independent reusable
functions [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. So, one possibility is to support FnO functions inside ShExML.
The advantage of this proposal is that it moves all the function infrastructure outside
the ShExML language and engine. On the other hand, it adds more dependencies for
users (who may find them hard to learn), forces users into a third-party
environment, and means we lose control of this part.
        </p>
        <p>
          Another possibility is to provide an environment to define inline functions,
like the semantic actions in Shape Expressions (ShEx) [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. We could thus
provide a restricted environment where higher-order functions could be executed
(see Listing 1.12 for an example). The advantages are that there is no need for
third-party dependencies, it provides higher flexibility, and users do not need
to learn another tool. However, it can increase complexity because users need
to know about functional programming.
        </p>
        <p>Listing 1.12. Possible ShExML syntax for inline transformation functions.</p>
        <preformat>
PREFIX ex: &lt;http://example.com/&gt;
SOURCE lab_file &lt;https://raw.githubusercontent.com/kg-construct/mapping-challenges/main/challenges/process-multivalue-references/input-1/input.json&gt;
FUNCTION splitFunction &lt;n =&gt; n.split(',')&gt;
ITERATOR lab &lt;jsonpath: $&gt; {
    FIELD labName &lt;labName&gt;
    ITERATOR articles &lt;article&gt; {
        FIELD title &lt;title&gt;
        FIELD tags &lt;tags&gt;
    }
}
EXPRESSION labValues &lt;lab_file.lab&gt;
ex:Tag ex:[lab.articles.tag WITH splitFunction] {
    ex:label [lab.articles.tag WITH splitFunction] ;
}
        </preformat>
        <p>Although RDF collections and containers were included in ShExML (see Section
4.5), inputs 2 and 3 of that challenge present some particularities. In the case of input 2, the use
of different keys would require a more complex query or some sort of
parametrisation of the executed queries. In input 3, the per-row iteration model for CSV
files implemented in ShExML does not create collections effectively. Solving it
would therefore imply reimplementing the per-row iteration model for these cases,
which could affect the overall functionality for CSV files.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Evaluation and Discussion</title>
      <p>In Q3 we posed a question about the possible effects that the modifications
to ShExML could have on already working features. The idea of this question is
to demonstrate the validity of the Q1 solutions alongside old features that should
still work as expected. This type of testing, known as regression testing, has
been included in ShExML from the very beginning<sup>20</sup>, so we are able to add new
features to ShExML knowing that old features still work as expected.
Thus, every time a new version is released these tests must be executed to
validate the integrity of the language and engine. Continuous integration is well suited
to this task, as every time a change is submitted to the ShExML repository
all tests are executed to verify integrity. In the ShExML repository we have
configured Travis CI<sup>21</sup> for this task. The regression tests in v0.2.4<sup>22</sup>
and v0.2.5<sup>23</sup> tell us that all features still work as expected,
which gives a negative answer to Q3. So, we can conclude that integrity is
preserved.</p>
      <p>
        In Sections 3 and 4 we have seen how some mapping challenges were already
solved in ShExML and how we have made some modifications to the ShExML
language and engine to deal with the others. These two sections give an answer
to Q1. These solutions were designed to maintain ShExML usability [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by using
a similar syntax, so that users can adopt the new features with the smallest
possible learning curve; in other words, by making the smallest possible modifications to the
ShExML syntax. In addition, in Section 3 we have highlighted how the ShExML
design already gave an answer to some challenges, emphasising how the
ShExML separation of concerns principle can address some of them (e.g.,
Join on literal).
      </p>
      <p><sup>20</sup> To see all the tests that are executed over the ShExML engine: https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/es/weso/shexml</p>
      <p><sup>21</sup> https://travis-ci.org/</p>
      <p><sup>22</sup> https://travis-ci.org/github/herminiogg/ShExML/builds/755033209</p>
      <p><sup>23</sup> https://travis-ci.org/github/herminiogg/ShExML/builds/756419674</p>
      <p>In Section 5 we have given some intuition on how the remaining challenges could
be solved, thereby answering Q2. They would require harder and more complex
modifications: in some cases the modification of an already working mechanism (e.g.,
inputs 2 and 3 in RDF Collections), in others the inclusion of a new iteration model and
the design of a new query language (e.g., Excel style), or the choice between two
different systems (e.g., data transformation functions in process multivalue
references). All these additions will require careful study and implementation in
the language so that they do not affect other features and so that the better
option from a usability perspective is selected.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>In this paper we have explored how ShExML can deal with some of the
challenges defined by the Knowledge Graph Construction W3C Community Group.
We have divided them into challenges already solved by ShExML before their
definition, challenges solved by the latest versions of ShExML, and challenges that
are not yet solved, for which we have given some notions and intuitions on how
ShExML can be modified to cover them. Furthermore, we have demonstrated
that the modification of ShExML to cover new challenges has not affected other
language and engine features. Therefore, we see this work as a first step towards
demonstrating how the challenges can be solved; together with solutions
from other languages and a joint discussion, it should allow us to offer unified
solutions to the posed mapping challenges.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priyatna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jabeen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graux</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sejdiu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saleem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          . (eds.):
          <source>Joint Proceedings of the 1st International Workshop on Knowledge Graph Building and 1st International Workshop on Large Scale RDF Analytics co-located with 16th Extended Semantic Web Conference (ESWC 2019)</source>
          , Portorož, Slovenia, June 3,
          <year>2019</year>
          .
          <series>CEUR Workshop Proceedings</series>
          , vol.
          <volume>2489</volume>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Delva</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Assche</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meester</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Integrating nested data into knowledge graphs with RML fields</article-title>
          . To appear in:
          <source>Proceedings of the 2nd International Workshop on Knowledge Graph Building co-located with 18th Extended Semantic Web Conference (ESWC 2021)</source>
          , Hersonissos, Greece, June 6,
          <year>2021</year>
          . CEUR Workshop Proceedings (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sande</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Walle</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          :
          <article-title>RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data</article-title>
          .
          In:
          <source>Proceedings of the Workshop on Linked Data on the Web co-located with the 23rd International World Wide Web Conference (WWW 2014)</source>
          , Seoul, Korea, April 8,
          <year>2014</year>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>García-González</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boneva</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Staworko</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Labra-Gayo</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lovelle</surname>
            ,
            <given-names>J.M.C.</given-names>
          </string-name>
          :
          <article-title>ShExML: improving the usability of heterogeneous data mapping languages for first-time users</article-title>
          .
          <source>PeerJ Computer Science</source>
          <volume>6</volume>
          ,
          <elocation-id>e318</elocation-id>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>García-González</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fernández-Álvarez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>J.E.L.</given-names>
          </string-name>
          :
          <article-title>ShExML: An Heterogeneous Data Mapping Language based on ShEx</article-title>
          .
          In:
          <source>Proceedings of the EKAW 2018 Posters and Demonstrations Session co-located with 21st International Conference on Knowledge Engineering and Knowledge Management (EKAW 2018)</source>
          , Nancy, France, November 12-16, 2018. pp.
          <fpage>9</fpage>
          –
          <lpage>12</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meester</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Declarative Rules for Linked Data Generation at Your Fingertips!</article-title>
          In:
          <source>The Semantic Web: ESWC 2018 Satellite Events - ESWC 2018 Satellite Events</source>
          , Heraklion, Crete, Greece, June 3-7,
          <year>2018</year>
          , Revised Selected Papers. pp.
          <fpage>213</fpage>
          –
          <lpage>217</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lefrançois</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakerally</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>A SPARQL Extension for Generating RDF from Heterogeneous Formats</article-title>
          . In:
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maynard</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gangemi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web - 14th International Conference, ESWC 2017</source>
          , Portorož, Slovenia, May 28 - June 1,
          <year>2017</year>
          , Proceedings, Part I.
          <series>Lecture Notes in Computer Science</series>
          , vol.
          <volume>10249</volume>
          , pp.
          <fpage>35</fpage>
          –
          <lpage>50</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Meester</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Mapping Languages: Analysis of Comparative Characteristics</article-title>
          . In:
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priyatna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jabeen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Graux</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sejdiu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saleem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          . (eds.)
          <source>Joint Proceedings of the 1st International Workshop on Knowledge Graph Building and 1st International Workshop on Large Scale RDF Analytics co-located with 16th Extended Semantic Web Conference (ESWC 2019)</source>
          , Portorož, Slovenia, June 3,
          <year>2019</year>
          .
          <series>CEUR Workshop Proceedings</series>
          , vol.
          <volume>2489</volume>
          , pp.
          <fpage>37</fpage>
          –
          <lpage>45</lpage>
          .
          <publisher-name>CEUR-WS.org</publisher-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Meester</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maroy</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>RML and FnO: Shaping DBpedia Declaratively</article-title>
          . In:
          <string-name>
            <surname>Blomqvist</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ławrynowicz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web: ESWC 2017 Satellite Events - ESWC 2017 Satellite Events</source>
          , Portorož, Slovenia, May 28 - June 1,
          <year>2017</year>
          , Revised Selected Papers.
          <series>Lecture Notes in Computer Science</series>
          , vol.
          <volume>10577</volume>
          , pp.
          <fpage>172</fpage>
          –
          <lpage>177</lpage>
          .
          <publisher-name>Springer</publisher-name>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Meester</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seymoens</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Implementation-independent function reuse</article-title>
          .
          <source>Future Gener. Comput. Syst.</source>
          <volume>110</volume>
          ,
          <fpage>946</fpage>
          –
          <lpage>959</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Djimenou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Faron-Zucker</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montagnat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Translation of Relational and Non-relational Databases into RDF with xR2RML</article-title>
          . In:
          <string-name>
            <surname>Monfort</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krempels</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majchrzak</surname>
            ,
            <given-names>T.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turk</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (eds.)
          <source>WEBIST 2015 - Proceedings of the 11th International Conference on Web Information Systems and Technologies</source>
          , Lisbon, Portugal, 20-22 May,
          <year>2015</year>
          . pp.
          <fpage>443</fpage>
          –
          <lpage>454</lpage>
          .
          <publisher-name>SciTePress</publisher-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Djimenou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zucker</surname>
            ,
            <given-names>C.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montagnat</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>xR2RML: Relational and non-relational databases to RDF mapping language</article-title>
          . Tech. rep. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Prud'hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gayo</surname>
            ,
            <given-names>J.E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solbrig</surname>
            ,
            <given-names>H.R.</given-names>
          </string-name>
          :
          <article-title>Shape expressions: an RDF validation and transformation language</article-title>
          . In:
          <string-name>
            <surname>Sack</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Filipowska</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 10th International Conference on Semantic Systems, SEMANTICS 2014</source>
          , Leipzig, Germany, September 4-5,
          <year>2014</year>
          . pp.
          <fpage>32</fpage>
          –
          <lpage>40</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>