<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>J2RM: an Ontology-based JSON-to-RDF Mapping Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergio J. Rodríguez Méndez</string-name>
          <email>Sergio.RodriguezMendez@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Armin Haller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pouya G. Omran</string-name>
          <email>P.G.Omran@anu.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jesse Wright</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kerry Taylor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Australian National University</institution>
          ,
          <addr-line>Canberra ACT 2601, AU</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This manuscript introduces J2RM: a tool to process mappings from JSON data to RDF triples guided by an OWL2 ontology structure. The mappings are defined as annotation properties associated with each ontology entity of interest. They are embedded in an ontology file so that they can be readily deployed and shared to automate RDF-graph creation. In this paper, we motivate the need for such mappings, describe some of their definitions in a use case example, present the formal grammar of the mapping language, and explain how these mappings work. We conclude with a discussion of the key aspects, main contributions, and future improvements.</p>
      </abstract>
      <kwd-group>
        <kwd>JSON</kwd>
        <kwd>RDF</kwd>
        <kwd>Mappings</kwd>
        <kwd>Information Architect Tool</kwd>
        <kwd>Ontology</kwd>
        <kwd>Automated Graph Creation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Quite often, data transformation tasks consume a lot of engineering effort when
dealing with heterogeneous data models and formats. Specifically, creating an
RDF-graph based on data extracted from a closed and proprietary information
system can be a daunting task. A simple approach to extracting the required and
curated data from these systems is to expose the data in an "easy-to-process"
format, usually JSON, as an intermediary representation. JSON has been used
extensively as a serialization format in a variety of processing tasks, becoming the
universal format for data interchange on the Web [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Frequently, software
engineering teams do not have a deep understanding of Semantic Web technologies.
In such cases, a tool that abstracts away the time-consuming complexities
of creating and storing RDF triples (on the fly) from any JSON data set could
help these Web developers. Moreover, embedding the mappings in the
ontology file itself makes them shareable. This paper introduces J2RM, a tool
that gives a versatile solution for these use cases<sup>5</sup>.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p><sup>5</sup> The source code is available at https://github.com/srodriguez142857/J2RM. A
series of demo videos can be found at https://bit.ly/3h5iE5M</p>
      <p>
        Its main goal is to automate
RDF-graph creation from JSON data following an OWL2 ontology structure.
The mappings are declared as annotation properties associated with each
ontology entity of interest<sup>6</sup>. They are embedded in an ontology file so they
can be readily deployed to automate the graph creation from a "standardized"
JSON structure, tailored from any information system's data (see Figure 1).
With J2RM, one could work with different JSON structures where all mappings
are embedded in a specific ontology file. Several transformation and mapping
languages have been proposed to generate RDF from non-RDF data, including
SPARQL-Generate [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], XSPARQL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], SAURON [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Elda [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the direct mapping of relational data to RDF [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], R2RML [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
and RML<sup>7</sup> [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. While most of these methods consider a given mapping, in this
paper we consider the use of an OWL2 ontology for extracting the schema of the
target RDF data. To the best of our knowledge, while there are many tools that
follow different approaches to map JSON data to RDF, none of them embeds
the mappings in ontology definition files: J2RM mappings are not defined in a
separate input file.
      </p>
      <p>[Figure 1. Diagram labels: Information System (closed and proprietary); JSON (as an intermediary format); J2RM Processor; J2RM Mappings.]</p>
      <p><sup>6</sup> The IRI used to identify the mappings is defined in the J2RM configuration file.</p>
      <p><sup>7</sup> Although the RML mappings may be connected to an ontology, these are defined in
a separate definition file.</p>
      <p>[JSON listing fragment:] ... } ], "team": [ { "ind_id": "401636", "role": "CIA", "name": "Dr Susan Storm", ...</p>
      <p>[...] generated<sup>8</sup>. In this case, the meta-character "#" indicates that the mapped value
is used "as is" in the IRI<sup>9</sup>. The meta-character "!" in #3 indicates that the
string value (with blank spaces) is used to generate an IRI (with replacements):
d:Area#Clinical-Medicine a m:Area. #4 maps to an array formed of
composite values based on the tree structure: ["A19453-401636", "A19453-401443"],
which are used to generate two instances of m:ChiefAnalyst.</p>
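      <p>The IRI-related behaviours described here ("as is" values, replacement of blank spaces, and composite identifiers) can be sketched in a few lines of Python. This is a minimal illustration and not J2RM's implementation: the function names iri_fragment and composite_ids are hypothetical, and the rule of replacing each run of whitespace with a hyphen is an assumption inferred from the Clinical-Medicine example. Only the meta-characters, the sample values, and the rule that empty values are skipped come from the text.</p>
      <preformat><![CDATA[
```python
import re
from typing import Iterable, List, Optional


def iri_fragment(value: Optional[str], mode: str = "#") -> Optional[str]:
    """Return the string that would be placed in the IRI for a mapped value.

    mode "#": the value is used as is.
    mode "!": runs of whitespace are replaced by "-" (assumed replacement rule).
    Empty or missing values return None, since they are not processed.
    """
    if value is None or value.strip() == "":
        return None
    if mode == "#":
        return value
    if mode == "!":
        return re.sub(r"\s+", "-", value.strip())
    raise ValueError(f"unsupported meta-character: {mode!r}")


def composite_ids(parent_id: str, children: Iterable[dict], key: str) -> List[str]:
    """Join a parent identifier with each child's identifier (hypothetical helper)."""
    return [f"{parent_id}-{child[key]}" for child in children if child.get(key)]


team = [{"ind_id": "401636"}, {"ind_id": "401443"}]
print(iri_fragment("Clinical Medicine", "!"))    # Clinical-Medicine
print(composite_ids("A19453", team, "ind_id"))   # ['A19453-401636', 'A19453-401443']
```
]]></preformat>
      <p>Keeping the value transformation separate from the prefix handling means the same helper covers both values that already are full IRIs and values that only supply the part after "#".</p>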
      <p>Datatype (dp) and annotation (ap) property mappings: create a triple for
each mapped value with the structure &lt;data#ID&gt; &lt;dp|ap&gt; "value"^^&lt;xsd:type&gt;.
J2RM analyzes the ontology; for each class (and its sub-classes) that has &lt;dp|ap&gt;
as a class restriction, it creates a triple for each mapped instance. One
example of #5 is d:Analyst#401636 m:fullName "Dr Susan Storm"^^xsd:string,
considering that m:Analyst has m:fullName in its class restrictions. In this case,
the meta-character "~" indicates that the mapped value is also used to automatically
create an rdfs:label triple (#13 presents similar examples)<sup>10</sup>. #6 creates
a triple for each value found when splitting the mapped values using the delimiter
" | " and, thus, it generates three keywords. #7 defines a "conditional path":
in this case, it creates a triple with the mapped value of "4.91" because the
restriction (the expression after the meta-character "%") evaluates to true: the scheme
and year values are mapped and evaluated correctly. In #8, the meta-character
"&lt;" defines a mapping to a common JSONObject ancestor: for the m:Grant class
with instances mapped as doc/publicData/id, the ancestor is publicData<sup>11</sup>.</p>
      <p><sup>8</sup> Empty mapped values ("", null, or non-existent) are not processed.</p>
      <p><sup>9</sup> The created triple is &lt;https://orcid.org/0X30-01X1-68X0-083X&gt; a m:ORCID</p>
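      <p>As a rough illustration of the datatype property behaviour (one typed-literal triple per mapped value, optional splitting on a delimiter, and an optional rdfs:label), consider the following Python sketch. It is not taken from the J2RM code base: the function datatype_triples, its parameters, the property name m:keyword, and the sample keyword string are hypothetical. The output uses the paper's shorthand notation for triples rather than a strict RDF serialization.</p>
      <preformat><![CDATA[
```python
from typing import List, Optional


def datatype_triples(subject: str, prop: str, raw: Optional[str],
                     xsd_type: str = "xsd:string",
                     delimiter: Optional[str] = None,
                     with_label: bool = False) -> List[str]:
    """Create one shorthand triple per mapped value.

    delimiter: if given, the raw value is split on it (e.g. " | ").
    with_label: also emit an rdfs:label triple, mimicking the "~" meta-character.
    Empty or missing values produce no triples.
    """
    if raw is None or raw == "":
        return []
    parts = raw.split(delimiter) if delimiter else [raw]
    triples = []
    for part in parts:
        value = part.strip()
        if not value:
            continue  # skip empty pieces left over after splitting
        triples.append(f'{subject} {prop} "{value}"^^{xsd_type}')
        if with_label:
            triples.append(f'{subject} rdfs:label "{value}"')
    return triples


# Sample value from the text, plus a made-up keyword string with three entries.
print(datatype_triples("d:Analyst#401636", "m:fullName", "Dr Susan Storm", with_label=True))
print(datatype_triples("d:GrantApp#A19453", "m:keyword",
                       "oncology | imaging | genomics", delimiter=" | "))
```
]]></preformat>
      <p>The second call yields three triples, one per keyword, which mirrors the splitting behaviour described for #6.</p>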
      <p>Object property mappings: create triples between sets of mapped values for
each identified class that is applicable in the analyzed context (class
hierarchies, sub-properties, etc.). The structure generated is &lt;domainData#ID&gt; &lt;op&gt;
&lt;rangeData#ID&gt;, where &lt;domainData#ID&gt; corresponds to the mapped instances
of each &lt;op&gt; domain class, and &lt;rangeData#ID&gt; corresponds to the mapped
instances of each &lt;op&gt; range class. The mappings are paths that define the
connection between &lt;domainData#ID&gt; and &lt;rangeData#ID&gt;. Simple cases, such as
#9<sup>12</sup> and #10, find the connection between the instances in a single path: in #9,
/doc connects the domain instances /doc/id="A19453" with the range instances
/doc/FoR code="12908", creating the triple d:GrantApp#A19453 m:hasFoR d:FoR#12908.
The meta-character "@" is used (#9, #12, #13) to indicate the entity (domain
class) attached to the path (useful for entity disambiguation). In #10, when
applied to the domain class m:Analyst, the mapping results in an array of
values for both the domain (["401636", "401443"]) and the range (same as
#2). Internally, the tool keeps track of the context for each mapped JSONObject
that could result in a valid connection. #11 illustrates a mapping based on two
different paths: one for the domain (D=, which indicates the usage of the already known
instances from the domain classes) and one for the range (R=..., which indicates the mapping to
the values that are equal to "CIA"). #12 illustrates two mappings: one that
explicitly disambiguates the domain and range classes to use (CV-&gt;FoR), and another,
&lt;path 1&gt;|=|&lt;path 2&gt;, which maps to values of &lt;path 1&gt; only if those
values are equal to values of &lt;path 2&gt;.</p>
      <p><sup>10</sup> rdfs:label creation might be useful in some graph search and visualization tools.</p>
      <p><sup>11</sup> The created triple is d:Grant#19453 m:link "http://test.com/key/19453"^^xsd:anyURI</p>
      <p><sup>12</sup> #9 defines two mappings that apply to distinct domain classes.</p>
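      <p>The simple single-path case (a shared JSON object connects the domain and range instances) can be sketched as follows. This is an illustrative approximation under stated assumptions, not the tool's algorithm: the helpers resolve and object_triples are hypothetical, the pairing of every domain value with every range value found under the same context object is an assumed simplification of the context tracking described in the text, and the key "FoR code" is copied as printed. The output is in the paper's shorthand triple notation.</p>
      <preformat><![CDATA[
```python
from itertools import product
from typing import Any, List


def resolve(node: Any, path: str) -> List[Any]:
    """Follow a '/'-separated key path through nested dicts; lists fan out."""
    nodes = [node]
    for key in (part for part in path.split("/") if part):
        found = []
        for current in nodes:
            for item in (current if isinstance(current, list) else [current]):
                if isinstance(item, dict) and key in item:
                    found.append(item[key])
        nodes = found
    flat = []
    for value in nodes:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def object_triples(data: dict, context_path: str, domain_path: str, range_path: str,
                   domain_prefix: str, range_prefix: str, prop: str) -> List[str]:
    """Link domain and range values that share the same context object."""
    triples = []
    for context in resolve(data, context_path):
        if not isinstance(context, dict):
            continue
        domains = resolve(context, domain_path)
        ranges = resolve(context, range_path)
        for d, r in product(domains, ranges):
            if d in ("", None) or r in ("", None):
                continue  # empty mapped values are not processed
            triples.append(f"{domain_prefix}#{d} {prop} {range_prefix}#{r}")
    return triples


data = {"doc": {"id": "A19453", "FoR code": "12908"}}
print(object_triples(data, "/doc", "id", "FoR code", "d:GrantApp", "d:FoR", "m:hasFoR"))
# ['d:GrantApp#A19453 m:hasFoR d:FoR#12908']
```
]]></preformat>
      <p>Resolving both sides relative to the same context object is what keeps unrelated instances from being linked when the JSON contains several documents.</p>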
      <p>Along with each mapping, one can specify the target endpoint and graph.
The target endpoint is a label that identifies the SPARQL endpoint where
the triples will be created. Examples: test, prod. The target graph is the named
graph where the triples will be created. It is defined as a namespace prefix in the
ontology file. Examples: g0-testing, g0-prod. The namespace prefix IRI is
used as the named graph for the triple creation for that specific mapping.</p>
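      <p>To make the roles of the target endpoint label and the target graph prefix concrete, the sketch below resolves both and builds a standard SPARQL 1.1 Update request (INSERT DATA with a GRAPH block). How J2RM actually sends triples is not described here, so this is only one plausible shape. The endpoint URL, the graph IRI, the example triple IRIs, and the function build_update are all hypothetical; only the labels test and g0-testing come from the text.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Iterable, Tuple

# Hypothetical lookup tables: label -> endpoint URL, namespace prefix -> IRI.
ENDPOINTS: Dict[str, str] = {"test": "http://localhost:3030/test/update"}
GRAPHS: Dict[str, str] = {"g0-testing": "http://example.org/graphs/testing/"}


def build_update(triples: Iterable[str], endpoint_label: str, graph_prefix: str,
                 endpoints: Dict[str, str] = ENDPOINTS,
                 graphs: Dict[str, str] = GRAPHS) -> Tuple[str, str]:
    """Return (endpoint URL, SPARQL Update text) for the given triples.

    Triples are expected as complete statements with full IRIs, each ending in " .".
    """
    endpoint_url = endpoints[endpoint_label]
    graph_iri = graphs[graph_prefix]  # the prefix IRI doubles as the named graph
    body = "\n".join(f"    {triple}" for triple in triples)
    update = f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"
    return endpoint_url, update


url, update = build_update(
    ["<http://example.org/data/Area#Clinical-Medicine> a <http://example.org/model/Area> ."],
    endpoint_label="test",
    graph_prefix="g0-testing",
)
print(url)
print(update)
```
]]></preformat>
      <p>The sketch only builds the request text. Sending it to the endpoint is left out because the access details depend on the deployment.</p>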
    </sec>
    <sec id="sec-2">
      <title>Conclusions and Ongoing Work</title>
      <p>J2RM gives information architects a simple mechanism to define the necessary
mapping rules for an automated RDF-graph creation task guided by an OWL2
ontology structure from any JSON data. The key aspect is that the mappings
are embedded in an ontology file: this does not imply that the JSON structure is
intrinsically tied to the OWL2 model. For different JSON structures, one could
define each type of mapping in a different ontology file. J2RM is in its early
development stages. It has been tested on three different domain ontologies. We will
increase the support for more complex JSON mappings and more OWL2 axioms.
The major contributions of this tool are: the ability to selectively extract data
and perform basic operations on the source JSON structure, the "portability" of
the mappings embedded in the OWL2 ontology file as annotation properties
attached to classes and properties, and its ease of use while hiding the complexity
of creating RDF triples following OWL2 axioms.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>JavaScript Object Notation (JSON) Pointer</article-title>
          . Request for Comments, Internet Engineering Task Force (IETF)
          (
          <year>April 2013</year>
          ), https://tools.ietf.org/html/rfc6901
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. ECMA-404:
          <article-title>The JSON Data Interchange Syntax</article-title>
          . Standard, ECMA International (
          <year>December 2017</year>
          ), https://www.json.org/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <article-title>RDF Mapping Language (RML)</article-title>
          . Unofficial draft, Ghent University (
          <year>July 2020</year>
          ), https://rml.io/specs/rml/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Akhtar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopecký</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krennwallner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>XSPARQL: Traveling between the XML and RDF worlds – and avoiding the XSLT pilgrimage</article-title>
          .
          <source>In: The Semantic Web: Research and Applications</source>
          . pp.
          <fpage>432</fpage>
          –
          <lpage>447</lpage>
          . Springer Berlin Heidelberg (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bertails</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prud'hommeaux</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sequeda</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A direct mapping of relational data to RDF</article-title>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bareau</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blache</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolle</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ecrepont</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Folz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernandez</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monteil</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Privat</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramparany</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Semi-automatic RDFization using automatically generated mappings</article-title>
          .
          <source>In: ESWC Posters and Demos Track</source>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundara</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>R2RML: RDB to RDF mapping language</article-title>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Elda,
          <article-title>a Linked Data API implementation</article-title>
          . https://github.com/epimorphics/elda
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lefrançois</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakerally</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>SPARQL-Generate: RDF generation from heterogeneous data sources</article-title>
          .
          <source>In: EKAW Satellite Events</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>