<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mapeathor: Simplifying the Specification of Declarative Rules for Knowledge Graph Construction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ana Iglesias-Molina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Pozo-Gilo</string-name>
          <email>luis.pozog@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Doña</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edna Ruckhaus</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Chaves-Fraga</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oscar Corcho</string-name>
          <email>ocorchog@fi.upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years we have observed an increasing interest by the scientific community, from social sciences to biomedicine, in the generation and publication of RDF-based knowledge graphs. One possibility for creating knowledge graphs consists in using declarative mappings together with their associated parsers. These mappings describe the relationship between the source data and a reference ontology. However, the learning curve to create these mapping files is steep, hindering their use by a wider community. In this paper we present a user-friendly, mapping-language-independent tool, Mapeathor, to declare transformation rules based on spreadsheets and translate them into two different mapping languages with the purpose of easing the mapping creation process.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph</kwd>
        <kwd>Declarative mapping</kwd>
        <kwd>Spreadsheet</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In the last few decades, we have seen a significant increase in the publication
of data in a machine-understandable manner following Linked Data principles<sup>1</sup>
(e.g., DBpedia<sup>2</sup>, Wikidata<sup>3</sup>). Knowledge graph construction requires integrating
different data sources in a structured way, usually following the schema of an
ontology or group of ontologies. This facilitates the subsequent task of mining the
knowledge graph with several applications, such as searching recommendations
and learning implicit data patterns.</p>
      <p>
        Knowledge graphs can be built in diverse ways. One option is creating ad-hoc
scripts to transform data, which requires the user to repeat the process of script
writing in every specific use case. Another option is using tools like
OpenRefine<sup>4</sup> to perform data transformation through the creation of an RDF skeleton,
which includes proprietary transformation rules and a functionality for
knowledge graph construction. Lastly, there is an option to keep the transformation
rules in specific files that can be later processed by engines that either transform
the data to RDF or create a virtual knowledge graph that can be queried
without transforming the source data. These rules can be written in a wide variety of
languages (e.g., R2RML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], RML [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) that cover different user needs (e.g., the
source data format or the engine that will be used). Although the use of these
mapping files is more flexible and independent, since they can be processed by
a wide variety of engines, their creation is still not easy for new users. Experts
are usually needed to carry out these tasks, hindering the use of semantic web
technologies across the scientific community. That is why it is necessary to lower
the learning curve and improve mapping reuse and reproducibility.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p><sup>1</sup> https://5stardata.info/en/
<sup>2</sup> https://wiki.dbpedia.org/
<sup>3</sup> https://www.wikidata.org/</p>
      <p>
        Since mapping languages started to be used by the community, there have
been multiple approaches for the development of editors to ease their
specification. Most of them enable editing through graphical visualization [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ], while others
provide a writing environment (e.g., the Protégé extension OntopPro). These
editors are language-oriented: they help to create one kind of mapping, not
taking into account the wide variety of mapping languages that currently exist.
Moreover, when managing a considerable number of mapping rules, a graphical
approach may not be easy to handle.
      </p>
      <p>
        Our work focuses on providing a straightforward way to create these
mappings by specifying the transformation rules in spreadsheets, so they are later
translated into one of the implemented mapping languages. The purpose of this
proposal is to increase the interoperability between these languages [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as well as
to ease the creation process. To perform the mapping rules translation we
developed Mapeathor<sup>5</sup>, a tool able to parse the spreadsheets and generate the
corresponding mappings in two different languages. This work is an extension
and improvement on the work previously presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where the first
version of the spreadsheet design and the tool were presented. The spreadsheet
now includes more options to maintain the language's expressiveness, and the
implementation has become simpler to use and more accurate in the translation.
      </p>
      <p>This paper is structured as follows: Section 2 describes the design of the
spreadsheet. Section 3 explains the functionalities of the tool and a real-world
use case. Finally, Section 4 presents the main conclusions and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Spreadsheet design</title>
      <p>The rules required to generate a knowledge graph can be specified in multiple
languages. The language is chosen by the user depending on the specific use case.
However, the rules themselves are equivalent across languages, so they can be
written in a language-independent way. In this case, we chose a spreadsheet for
rule specification. The spreadsheet template is devised to contain the rules in a
compact and understandable way, in a format widely used by the scientific
community. The design is aimed to be language-independent and to ease the writing
process so the user does not have to learn a mapping language. In addition,
the functionalities of a spreadsheet editor can be used to speed up the writing
process. Reusing mappings for similar use cases is also easier in this specification
format. The spreadsheet contains the essential mapping elements structured in
five different sheets: Prefix, Source, Subject, Predicate_Object and Function.</p>
      <sec id="sec-2-1">
        <table-wrap>
          <label>Figure 1a</label>
          <caption><title>Prefix sheet</title></caption>
          <table>
            <thead>
              <tr><th>Prefix</th><th>URI</th></tr>
            </thead>
            <tbody>
              <tr><td>noise</td><td>http://v.ciudadesabiertas.es/cont-acustica#</td></tr>
              <tr><td>noise-res</td><td>http://v.ciudadesabiertas.es/res/cont-acustica#</td></tr>
              <tr><td>sosa</td><td>http://www.w3.org/ns/sosa/</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Figure 1b</label>
          <caption><title>Subject sheet</title></caption>
          <table>
            <thead>
              <tr><th>ID</th><th>Class</th><th>URI</th></tr>
            </thead>
            <tbody>
              <tr><td>Station</td><td>noise:EstacionMedida</td><td>noise-res:estacionmedida/{id}</td></tr>
              <tr><td>Observation</td><td>noise:Observacion</td><td>noise-res:observacion/{idx}</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Figure 1c</label>
          <caption><title>Source sheet</title></caption>
          <table>
            <thead>
              <tr><th>ID</th><th>Feature</th><th>Value</th></tr>
            </thead>
            <tbody>
              <tr><td>Station</td><td>query</td><td>SELECT id,name FROM station</td></tr>
              <tr><td>Observation</td><td>source</td><td>data/station.json</td></tr>
              <tr><td>Observation</td><td>format</td><td>JSON</td></tr>
              <tr><td>Observation</td><td>iterator</td><td>$</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Figure 1d</label>
          <caption><title>Predicate_Object sheet</title></caption>
          <table>
            <thead>
              <tr><th>ID</th><th>Predicate</th><th>Object</th><th>DataType</th><th>ReferenceID</th><th>InnerRef</th><th>OuterRef</th></tr>
            </thead>
            <tbody>
              <tr><td>Station</td><td>dcterms:identifier</td><td>{id}</td><td>string</td><td></td><td></td><td></td></tr>
              <tr><td>Station</td><td>schema:name</td><td>{name}</td><td>string</td><td></td><td></td><td></td></tr>
              <tr><td>Station</td><td>geosparql:hasGeometry</td><td>noise-res:punto/{id}</td><td>iri</td><td></td><td></td><td></td></tr>
              <tr><td>Observation</td><td>sosa:resultTime</td><td>{resTime}</td><td>Time</td><td></td><td></td><td></td></tr>
              <tr><td>Observation</td><td>sosa:madeBySensor</td><td></td><td></td><td>Station</td><td>{madeBySensor}</td><td>{id}</td></tr>
              <tr><td>Observation</td><td>sosa:observedProperty</td><td>&lt;Fun1&gt;</td><td></td><td></td><td></td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Figure 1e</label>
          <caption><title>Function sheet</title></caption>
          <table>
            <thead>
              <tr><th>FunctionID</th><th>Feature</th><th>Value</th></tr>
            </thead>
            <tbody>
              <tr><td>&lt;Fun1&gt;</td><td>fno:executes</td><td>grel:replace</td></tr>
              <tr><td>&lt;Fun1&gt;</td><td>ex:param1</td><td>{obsProperty}</td></tr>
              <tr><td>&lt;Fun1&gt;</td><td>ex:param2</td><td>“ ”</td></tr>
              <tr><td>&lt;Fun1&gt;</td><td>ex:param3</td><td>“-”</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p><sup>4</sup> http://openrefine.org/
<sup>5</sup> https://morph.oeg.fi.upm.es/demo/mapeathor</p>
        <sec id="sec-2-1-11">
          <p>Prefix sheet: This sheet contains the namespaces and corresponding
prefixes used when declaring the transformation rules (Figure 1a). It is composed
of two columns: Prefix for the prefix and URI for the corresponding namespace.</p>
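          <p>As an illustration of how these two columns could be consumed, the following minimal Python sketch renders Prefix sheet rows as Turtle prefix declarations. It is not Mapeathor's implementation: the function name prefixes_to_turtle and the ex namespace are hypothetical, and only the sosa row is taken from Figure 1a.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def prefixes_to_turtle(rows):
    """Render (prefix, namespace) rows as Turtle @prefix declarations."""
    return "\n".join(
        "@prefix {}: <{}> .".format(prefix.strip(), namespace.strip())
        for prefix, namespace in rows
    )


# Rows of a Prefix sheet as (Prefix, URI) pairs.
# "sosa" comes from Figure 1a; "ex" is a hypothetical extra row.
PREFIX_ROWS = [
    ("sosa", "http://www.w3.org/ns/sosa/"),
    ("ex", "http://example.org/"),
]

if __name__ == "__main__":
    print(prefixes_to_turtle(PREFIX_ROWS))
```
]]></preformat>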
          <p>Subject sheet: This sheet defines the subjects to be generated and the key
ID that links the information in the sheets (Figure 1b). It is organized in three
columns: ID, Class and URI. URI defines the template URI for the subject, and its
class is specified in Class. ID contains a unique identifier for each subject's set
of rules, which relates these rules to the information in the remaining sheets.</p>
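          <p>To make the role of the URI template concrete, the sketch below expands a template such as noise-res:estacionmedida/{id} against one source record. This is an assumption about how a template can be interpreted, not the behaviour of any particular mapping engine; expand_template and the sample record are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import re


def expand_template(template, record):
    """Replace each {field} placeholder with that field's value in record."""
    def lookup(match):
        # A missing field raises KeyError instead of silently producing a bad URI.
        return str(record[match.group(1)])

    return re.sub(r"\{(\w+)\}", lookup, template)


if __name__ == "__main__":
    # Hypothetical record; the template is the Station row of the Subject sheet.
    print(expand_template("noise-res:estacionmedida/{id}", {"id": 7}))
```
]]></preformat>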
          <p>Source sheet: Here we specify where the data is retrieved from (Figure
1c). The information is organized in three columns: ID, Feature and Value.
Feature declares the type of information provided in Value. Value can hold
the path to the source data (with the feature source), the format
(format), the iterator (iterator, the loop used to map the data from JSON and XML
files), the database table (table), the SQL query (query) and the SQL version (SQLVersion).
Any language option may be included. Finally, ID indicates the rule it refers to.</p>
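          <p>Because the Source sheet stores one Feature/Value pair per row, a consumer would typically regroup the rows by ID before use. The following sketch shows one way to do that with the Observation rows of Figure 1c; group_source_rows is a hypothetical helper, not part of the tool.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Observation rows of the Source sheet as (ID, Feature, Value) tuples.
SOURCE_ROWS = [
    ("Observation", "source", "data/station.json"),
    ("Observation", "format", "JSON"),
    ("Observation", "iterator", "$"),
]


def group_source_rows(rows):
    """Collect the Feature/Value pairs of each rule ID into one dictionary."""
    sources = {}
    for rule_id, feature, value in rows:
        sources.setdefault(rule_id, {})[feature] = value
    return sources


if __name__ == "__main__":
    print(group_source_rows(SOURCE_ROWS))
```
]]></preformat>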
          <p>Predicate_Object sheet: This sheet defines the triples through the
predicates and their corresponding objects (Figure 1d). The columns Predicate and
Object specify the predicate and object in a rule. The XSD datatype of Object
is defined in DataType. When the object refers to a subject defined in another
rule, the rule is written differently. There are three fields that allow the
specification of the linking condition between the object of the triple and the referenced
subject. They specify the ID of the target subject (ReferenceID) and
the "join" fields in the source data (InnerRef for the field of the object of the
current triple, and OuterRef for the field of the referred subject). Lastly, the
column ID indicates the rule it belongs to.</p>
          <p>Figure 2. (a) Triple map for Station. (b) Triple map for Observation. (c) Triple map for Fun1.</p>
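          <p>The linking condition can be read as an equality join between two sets of records. The sketch below illustrates this reading for the sosa:madeBySensor row of Figure 1d (ReferenceID Station, InnerRef {madeBySensor}, OuterRef {id}). The function link_subjects and the sample records are hypothetical; actual engines may evaluate joins differently.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def link_subjects(children, parents, inner_ref, outer_ref,
                  child_template, predicate, parent_template):
    """Emit (subject, predicate, object) triples for records whose join fields match."""
    # Index the referenced records by the OuterRef field for direct lookup.
    parents_by_key = {}
    for parent in parents:
        parents_by_key.setdefault(parent[outer_ref], []).append(parent)

    triples = []
    for child in children:
        # InnerRef is the field of the current rule that must equal OuterRef.
        for parent in parents_by_key.get(child[inner_ref], []):
            triples.append((
                child_template.format(**child),
                predicate,
                parent_template.format(**parent),
            ))
    return triples


# Hypothetical source records; only the field names come from Figure 1.
stations = [{"id": "S1"}, {"id": "S2"}]
observations = [
    {"idx": "o1", "madeBySensor": "S2"},
    {"idx": "o2", "madeBySensor": "S9"},  # no matching station, so no triple
]

triples = link_subjects(
    observations, stations, "madeBySensor", "id",
    "noise-res:observacion/{idx}", "sosa:madeBySensor",
    "noise-res:estacionmedida/{id}",
)

if __name__ == "__main__":
    for triple in triples:
        print(triple)
```
]]></preformat>
          <p>Indexing the referenced records first keeps the join linear in the number of records, which matters when a sheet references large sources.</p>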
          <p>Function sheet: Some languages are able to process transformation
functions over the data (e.g., FnO+RML), which can be detailed in this sheet (Figure
1e). Some well-known options are the SQL and GREL functions, but any option
can be used. The functions are referred to in the Predicate_Object sheet or in other
function rows with the identifier specified in FunctionID. The column Feature
is used to specify the type of information provided in Value, where the name of
the function and the values of the parameters are written.</p>
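          <p>As a rough illustration of what the rows of Figure 1e express, the sketch below evaluates the Fun1 rows against one record. It assumes that ex:param1, ex:param2 and ex:param3 are passed in row order as the input value, the text to find and its replacement, and it stands in for grel:replace with Python's str.replace; both choices are assumptions, and apply_function and the sample record are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Rows of the Function sheet as (FunctionID, Feature, Value) tuples.
FUNCTION_ROWS = [
    ("<Fun1>", "fno:executes", "grel:replace"),
    ("<Fun1>", "ex:param1", "{obsProperty}"),
    ("<Fun1>", "ex:param2", " "),
    ("<Fun1>", "ex:param3", "-"),
]

# Stand-in implementations keyed by function name (assumption: plain substring replace).
IMPLEMENTATIONS = {
    "grel:replace": lambda value, find, new: value.replace(find, new),
}


def apply_function(function_id, rows, record):
    """Evaluate the function described by the rows of function_id on one record."""
    name = None
    params = []
    for row_id, feature, value in rows:
        if row_id != function_id:
            continue
        if feature == "fno:executes":
            name = value
        elif value.startswith("{") and value.endswith("}"):
            # {field} is a reference to a field of the source record.
            params.append(record[value[1:-1]])
        else:
            # Anything else is treated as a constant parameter.
            params.append(value)
    return IMPLEMENTATIONS[name](*params)


if __name__ == "__main__":
    # Hypothetical record for illustration.
    print(apply_function("<Fun1>", FUNCTION_ROWS, {"obsProperty": "noise level"}))
```
]]></preformat>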
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Demonstration</title>
      <p>
        The spreadsheet containing the transformation rules is processed by the tool
Mapeathor to create a mapping file. For example, Figure 2 depicts the mapping
file written in the RML language that results when translating the rules in
Figure 1. Currently, this tool translates Google spreadsheets and XLSX files to
the following languages: the W3C recommendation R2RML [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], RML [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
its serialization, YARRRML<sup>6</sup>. It can be used as a web service<sup>7</sup> and as a CLI<sup>8</sup>.
      </p>
      <sec id="sec-3-1">
        <p>Currently, Mapeathor is being used to generate mappings for city open data
publication related to traffic, public bus transport, budget and noise pollution
in the context of the Ciudades Abiertas project. Six spreadsheets have been
completed, containing 31 subjects and 104 predicate-object rules. The process
of spreadsheet completion and mapping creation for the languages implemented
will be shown in the demo with data from this real-world use case.</p>
        <p><sup>6</sup> https://rml.io/yarrrml/
<sup>7</sup> https://morph.oeg.fi.upm.es/tool/mapeathor/swagger/
<sup>8</sup> https://github.com/oeg-upm/Mapeathor</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and future work</title>
      <p>This paper presents Mapeathor, a tool able to translate transformation rules
specified in spreadsheets to three different mapping languages. The key part
of the work is the spreadsheets containing the mapping rules, since they are
designed to facilitate the specification process for the user. Currently, the tool
is being tested in several use cases from the Ciudades Abiertas project.</p>
      <p>The purpose of this work is to create a framework to declare the transformation
rules in a user-friendly and language-independent way and to be able
to generate these rules in any mapping language. Future work includes a user
study to test the usefulness of this tool and find guidelines for improvement,
extending the tool to cover more languages, and implementing changes that make rule
specification more user-friendly.</p>
      <p>Acknowledgements. The work presented in this paper is supported by the
Spanish Ministerio de Economía, Industria y Competitividad and EU FEDER
funds under the DATOS 4.0: RETOS Y SOLUCIONES - UPM Spanish national
project (TIN2016-78011-C4-4-R).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priyatna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Towards a New Generation of Ontology Based Data Access</article-title>
          .
          <source>Semantic Web</source>
          <volume>11</volume>
          ,
          <fpage>153</fpage>
          –
          <lpage>160</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.:</given-names>
          </string-name>
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          ,
          <source>W3C Recommendation 27 September</source>
          <year>2012</year>
          , https://www.w3.org/TR/r2rml/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vander Sande</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van de Walle</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>RML: a generic language for integrated RDF mappings of heterogeneous data</article-title>
          .
          <source>In: LDOW</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Heyvaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herregodts</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuurman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannens</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van de Walle</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>RMLEditor: a graph-based mapping editor for linked data mappings</article-title>
          .
          <source>In: European Semantic Web Conference</source>
          . pp.
          <fpage>709</fpage>
          –
          <lpage>723</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Iglesias-Molina</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaves-Fraga</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Priyatna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Towards the definition of a language-independent mapping template for knowledge graph creation</article-title>
          .
          <source>In: Proceedings of the Third International Workshop on Capturing Scientific Knowledge</source>
          . pp.
          <fpage>33</fpage>
          –
          <lpage>36</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Sicilia</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nemirovski</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nolle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Map-On: A web-based editor for visual ontology mapping</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <fpage>969</fpage>
          –
          <lpage>980</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>