=Paper= {{Paper |id=Vol-2721/paper593 |storemode=property |title=J2RM: An ontology-based JSON-to-RDF Mapping tool |pdfUrl=https://ceur-ws.org/Vol-2721/paper593.pdf |volume=Vol-2721 |authors=Sergio José Rodríguez Méndez,Armin Haller,Pouya Ghiasnezhad Omran,Jesse Wright,Kerry Taylor |dblpUrl=https://dblp.org/rec/conf/semweb/MendezHOWT20 }} ==J2RM: An ontology-based JSON-to-RDF Mapping tool== https://ceur-ws.org/Vol-2721/paper593.pdf
       J2RM : an Ontology-based JSON-to-RDF
                   Mapping Tool

    Sergio J. Rodríguez Méndez (1,2), Armin Haller (1,4), Pouya G. Omran (1,3),
              Jesse Wright (1,4), and Kerry Taylor (1,4)

        (1) Australian National University, Canberra ACT 2601, AU
        (2) Sergio.RodriguezMendez@anu.edu.au
        (3) P.G.Omran@anu.edu.au
        (4) {firstname.lastname}@anu.edu.au

        Abstract. This manuscript introduces J2RM: a tool to process mappings
        from JSON data to RDF triples guided by an OWL2 ontology
        structure. The mappings are defined as annotation properties associated
        with each ontology entity of interest. They are embedded in an ontology
        file so that they can be readily deployed and shared to automate
        RDF-graph creation. In this paper, we motivate the need for such mappings,
        describe some of their definitions on a use case example, present
        the formal grammar of the mapping language, and explain how these
        mappings work. We conclude with a discussion of the key aspects, main
        contributions, and future improvements.

        Keywords: JSON · RDF · Mappings · Ontology · Automated Graph
        Creation · Information Architect Tool


1     Introduction
Data transformation tasks often consume a lot of engineering effort when
dealing with heterogeneous data models and formats. Specifically, creating an
RDF-graph based on data extracted from a closed and proprietary information
system can be a daunting task. A simple approach to extract the required and
curated data from these systems is to expose the data in an “easy-to-process”
format, usually JSON, as an intermediary representation. JSON has been used
extensively as a serialization format in a variety of processing tasks, becoming the
universal format for data interchange on the Web [2]. Frequently, software engineering
teams do not have a deep understanding of Semantic Web technologies.
In such cases, a tool that could abstract all the time-consuming complexities
of creating and storing RDF triples on the fly from any JSON data set could
help these Web developers. Moreover, by embedding the mappings in the ontology
file itself, they become shareable. This paper introduces J2RM, a tool
that gives a versatile solution for these use cases (footnote 5). Its main goal is to automate
RDF-graph creation from JSON data following an OWL2 ontology structure.

  Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
  License Attribution 4.0 International (CC BY 4.0).
5 The source code is available at https://github.com/srodriguez142857/J2RM. A
  series of demo videos can be found at https://bit.ly/3h5iE5M

The mappings are declared as annotation properties associated with each ontology
entity of interest (footnote 6). The mappings are embedded in an ontology file so they
can be readily deployed to automate the graph creation from a “standardized”
JSON structure, tailored from any information systems’ data (see Figure 1).
With J2RM, one could work with different JSON structures where all mappings
are embedded in a specific ontology file. Some transformation and mapping languages
have been proposed to generate RDF from non-RDF data, including
SPARQL-Generate [9], XSPARQL [4], SAURON [6], Elda [8], the direct mapping of
relational data to RDF [5], R2RML [7], and RML (footnote 7) [3]. While most of
these methods consider a given mapping, in this
paper we consider the use of an OWL2 ontology for extracting the schema of the
target RDF data. To the best of our knowledge, while there are many tools that
follow different approaches to map JSON data to RDF, none of them embed
the mappings in ontology definition files: J2RM mappings are not defined in a
separate input file.



[Figure 1 (diagram): Information System (closed and proprietary) -> JSON (as an
intermediary format) -> J2RM Processor, which also takes the J2RM Mappings as
input and outputs RDF triples]

                            Fig. 1: J2RM general functional architecture


2     Mapping Definitions
Figure 2 presents an excerpt of a modified JSON document about medical research
grants, while Table 1 presents an excerpt of a grants ontology definition
along with the J2RM mappings. J2RM mappings follow the path-based syntax
presented in Figure 3. The mappings are designed as JSON-Pointer [1] extensions
with their own primitives that define basic transformations and operations
applied to the JSON data. Below, we describe how these mappings work.
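
As a rough illustration of the path-based idea (not the tool's actual implementation), the following Python sketch resolves a slash-separated path against a JSON structure. The helper name `resolve_path` and the implicit fan-out over arrays are assumptions made for this example; numeric steps, which the grammar also allows, are not handled here.

```python
from typing import Any, List


def resolve_path(data: Any, path: str) -> List[Any]:
    """Return every value reached by a slash-separated path (hypothetical helper).

    A step applied to a list is applied to each of its elements, so a single
    path may map to several values.
    """
    steps = [s for s in path.split("/") if s]  # tolerate a leading "/"
    current = [data]
    for step in steps:
        reached = []
        for node in current:
            # Treat a list as "each of its elements" for this step.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and step in item:
                    reached.append(item[step])
        current = reached
    # Flatten a final list level so array leaves yield individual values.
    result: List[Any] = []
    for value in current:
        result.extend(value if isinstance(value, list) else [value])
    return result


# Reduced version of the document in Fig. 2.
sample = {"doc": {"id": "A19453",
                  "team": [{"ind_id": "401636", "ORCID": ""},
                           {"ind_id": "401443",
                            "ORCID": "https://orcid.org/0X30-01X1-68X0-083X"}]}}

print(resolve_path(sample, "/doc/id"))
print(resolve_path(sample, "doc/team/ind_id"))
```

Under these assumptions, a path that crosses the `team` array yields one value per team member, which is the behaviour the class and property mappings below rely on.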
    Class mappings: create an instance for each mapped value with the structure
<instance> a <class>. In Table 1, #1 maps to the value "A19453"; its generated
triple is d:Eval#A19453 a m:Eval. #2 maps to an array of values (["",
6   The IRI used to identify the mappings is defined in the J2RM configuration file.
7   Although the RML mappings may be connected to an ontology, these are defined in
    a separate definition file.
"doc": {
    "id": "A19453",
    "year": "2011",
    "scheme": "ABC-Grant",
    "app_AdminInstName": "Bright Institute of Neuroscience",
    "finalScore": "5.1125",
    "title": "Correlation between strokes and dementia: Cognition changes following stroke",
    "Keywords": "stroke | neuropsychology | cognitive disorders",
    "Broad_Research_Area": "Clinical Medicine",
    "FoR_code": "12908", "FoR": "Central Nervous System",
    "scoreCriteria1": "4.91",
    "publicData": [
    {    "title": "Cognition following stroke: is it neurodegenerative?",
         "id": "19453",
         "researchers": [ "Dr Peter Parker", "Dr Susan Storm" ],
         "startDate": 2012, "endDate": 2016,
         "funder": [ "Medical Research Council" ],
         "adminInst": null,
         "principalAnalyst": [ "Dr Susan Storm" ],
         "links": [ { "href": "http://test.com/key/19453" } ]
    } ],
    "team": [
      { "ind_id": "401636", "role": "CIA",
        "name": "Dr Susan Storm",
        "org": "Bright Institute of Neuroscience",
        "ORCID": "",
        "FoR_Data": [
          { "FoR_code": "12908", "FoR": "Central Nervous System",
            "FOR_Date_Start": "2001-02-01"
          },
          { "FoR_code": "72099", "FoR": "General Cognitive Science",
            "FOR_Date_Start": "2001-02-01"
          } ]
      },
      { "ind_id": "401443", "role": "CIB",
        "name": "Dr Peter Parker",
        "org": "Bright Institute of Neuroscience",
        "ORCID": "https://orcid.org/0X30-01X1-68X0-083X",
        "FoR_Data": [
          { "FoR_code": "60939", "FoR": "Psychology and Cognitive Sciences",
            "FOR_Date_Start": "2006-03-01"
          },
          { "FoR_code": "10791", "FoR": "General Neurosciences",
            "FOR_Date_Start": "2006-03-01"
          } ]
      } ]
}

          Fig. 2: Excerpt of a JSON document about research medical grants
<path>                ::= <JSON-pointer><OWL-res>? | <map>
<JSON-pointer>        ::= / <step><JSON-pointer> | / <step>
<map>                 ::= <pointer> | <multi-pointer> | <OP-map>
<pointer>             ::= <simple-p> <OWL-res>? | <complex-p>
<multi-pointer>       ::= <simple-p> (\n+<simple-p>)+
<OP-map>              ::= D=<simple-p>? \nR=<simple-p>
<simple-p>            ::= <step>/<simple-p> | <step>
<complex-p>           ::= <simple-p><operation>
<operation>           ::= <single-path-op> <OWL-res>?
<operation>           ::= <multiple-path-op>
<single-path-op>      ::= <meta-char> | <split> | <rel-value>
<multiple-path-op>    ::= <condition> | <equal-values>
<condition>           ::= %<and-expression>
<equal-values>        ::= |=|<simple-p>
<split>               ::= ("string-delimiter")
<rel-value>           ::= <relational-operator><value>
<OWL-res>             ::= @entity-name<range>?
<range>               ::= ->entity-name
<step>                ::= name | number
<relational-operator> ::= = | > | < | >= | <=
<meta-char>           ::= # | ! | ~ | <

Fig. 3: Excerpt of the J2RM mappings formal grammar expressed in simple
Extended Backus-Naur Form (EBNF) notation


"https://orcid.org/0X30-01X1-68X0-083X"]), however, only one triple is generated
(footnote 8). In this case, the meta-character “#” indicates that the mapped value
is used “as is” in the IRI (footnote 9). The meta-character “!” in #3 indicates that the
string value (with blank spaces) is used to generate an IRI (with replacements):
d:Area#Clinical-Medicine a m:Area. #4 maps to an array formed of composite
values based on the tree structure: ["A19453-401636", "A19453-401443"],
which are used to generate two instances of m:ChiefAnalyst.
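
The class-mapping behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `class_triples` and `composite_values` are hypothetical, the `d:<Class>#<value>` pattern is inferred from the examples in the text, and the output is a list of strings in the paper's notation rather than a real RDF serialization.

```python
from typing import Iterable, List, Optional


def class_triples(values: Iterable[Optional[str]], class_qname: str,
                  meta: str = "") -> List[str]:
    """Build one `instance a class` statement per non-empty mapped value."""
    local = class_qname.split(":", 1)[1]       # "m:Eval" -> "Eval"
    triples = []
    for value in values:
        if value is None or value == "":       # empty values are not processed
            continue
        if meta == "#":                        # value used "as is" as the IRI
            subject = f"<{value}>"
        elif meta == "!":                      # blanks replaced to build the IRI
            subject = f"d:{local}#{value.replace(' ', '-')}"
        else:                                  # assumed default pattern
            subject = f"d:{local}#{value}"
        triples.append(f"{subject} a {class_qname}")
    return triples


def composite_values(parent: str, children: Iterable[str]) -> List[str]:
    """Combine a parent value with each child value, as in mapping #4."""
    return [f"{parent}-{child}" for child in children]


eval_triples = class_triples(["A19453"], "m:Eval")
orcid_triples = class_triples(
    ["", "https://orcid.org/0X30-01X1-68X0-083X"], "m:ORCID", meta="#")
area_triples = class_triples(["Clinical Medicine"], "m:Area", meta="!")
chief_ids = composite_values("A19453", ["401636", "401443"])

for line in eval_triples + orcid_triples + area_triples:
    print(line)
print(chief_ids)
```

The empty ORCID of the first team member is skipped, so mapping #2 yields a single statement even though two values were mapped.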
    Datatype (dp) and annotation (ap) prop. mappings: create a triple for
each mapped value with the structure <instance> <dp|ap> "value"^^<datatype>.
J2RM analyzes the ontology; for each class (and sub-classes) that has <dp|ap>
as a class restriction, it will create a triple for each mapped instance. One example
of #5 is d:Analyst#401636 m:fullName "Dr Susan Storm"^^xsd:string
considering that m:Analyst has m:fullName in its class restrictions. In this case,
the meta-character “~” indicates that the mapped value is used to automatically
 8   Empty mapped values (“”, null, or non-existent) are not processed.
 9   The created triple is <https://orcid.org/0X30-01X1-68X0-083X> a m:ORCID

Table 1: Excerpt of a grant ontology definition along with the J2RM mappings

 #   OWL2 Entity       QName                                      J2RM Mappings (defined as annotation properties)
 1   Class             m:Eval                                     /doc/id
 2   Class             m:ORCID                                    doc/team/ORCID#
 3   Class             m:Area                                     doc/Broad_Research_Area!
 4   Class             m:ChiefAnalyst                             doc\n+/id\n+/team/ind_id
 5   Datatype Prop.    m:fullName (xsd:string)                    doc/team/name~
 6   Datatype Prop.    m:keyword (xsd:string)                     doc/Keywords("|")
 7   Datatype Prop.    m:CGcriterion (xsd:float)                  /doc/scoreCriteria1%/doc/scheme="ABC-Grant"&&/doc/year<2019
 8   Datatype Prop.    m:link (xsd:anyURI)                        doc/publicData/links/href<
 9   Object Prop.      m:hasFoR rdfs:range(m:FoR)                 /doc@GrantApp
                                                                  doc/team/FoR_Data@FoR_cat
10   Object Prop.      m:hasORCID rdfs:range(m:ORCID)             doc/team
11   Object Prop.      m:hasResearcher-1 (rdfs:subPropertyOf)     D=\nR=doc/team/role="CIA"
12   Object Prop.      m:about rdfs:range(m:FoR, m:Organization)  doc/team/FoR_Data@CV->FoR
                                                                  doc/team/org|=|doc/publicData/adminInst
13   Annotation Prop.  dc:title                                   doc/title~@GrantApp
                                                                  doc/publicData/title~@Grant
“m” (model) and “d” (data) are namespace prefixes defined in the ontology (“m”) and in the config. file (“d”)




create an rdfs:label triple as well (#13 presents similar examples; see footnote 10).
#6 creates a triple for each value found when splitting the mapped values using the
delimiter “|” and, thus, it will generate three keywords. #7 defines a “conditional path”:
in this case, it will create a triple with the mapped value of “4.91” because the
restriction (expression after meta-character “%”) evaluates to true: the scheme
is "ABC-Grant" and the year (2011) is earlier than 2019. In #8, the meta-character
“<” defines a mapping to a common JSONObject ancestor: for the m:Grant class
with instances mapped as doc/publicData/id, the ancestor is publicData (footnote 11).
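
To make the split (#6) and conditional (#7) operations more concrete, here is a small Python sketch. It is not the tool's code: the helper names are hypothetical, trimming of whitespace around each split part is an assumption, and the subject `d:GrantApp#A19453` is assumed only for illustration, since the text does not state which class carries m:keyword or m:CGcriterion.

```python
from typing import List


def split_values(raw: str, delimiter: str) -> List[str]:
    """Split a mapped string on the delimiter, trimming blanks (assumed)."""
    return [part.strip() for part in raw.split(delimiter) if part.strip()]


def datatype_triples(subject: str, prop: str, values: List[str],
                     datatype: str) -> List[str]:
    """Build one typed-literal statement per value, in the paper's notation."""
    return [f'{subject} {prop} "{value}"^^{datatype}' for value in values]


# Values taken from the document in Fig. 2.
grant = {"scheme": "ABC-Grant", "year": "2011", "scoreCriteria1": "4.91",
         "Keywords": "stroke | neuropsychology | cognitive disorders"}

# Mapping #6: doc/Keywords("|")
keywords = split_values(grant["Keywords"], "|")
keyword_triples = datatype_triples(
    "d:GrantApp#A19453", "m:keyword", keywords, "xsd:string")

# Mapping #7: the value is mapped only if the condition after "%" holds.
condition_holds = grant["scheme"] == "ABC-Grant" and int(grant["year"]) < 2019
scores = [grant["scoreCriteria1"]] if condition_holds else []
score_triples = datatype_triples(
    "d:GrantApp#A19453", "m:CGcriterion", scores, "xsd:float")

for line in keyword_triples + score_triples:
    print(line)
```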
    Object prop. mappings: create triples between sets of mapped values for
each identified class that is applicable in the analyzed context (class hierarchies,
sub-properties, etc.). The structure generated is <domain-instance> <op>
<range-instance>, where <domain-instance> corresponds to the mapped instances
of each <op> domain class, and <range-instance> corresponds to the mapped
instances of each <op> range class. The mappings are paths that define the
connection between <domain-instance> and <range-instance>. Simple cases, such as
#9 (footnote 12) and #10, find the connection between the instances in a single path: in #9,
/doc connects the domain instances /doc/id="A19453" with the range instances
/doc/FoR_code="12908", creating the triple d:GrantApp#A19453 m:hasFoR d:FoR#12908.
The meta-character “@” is used (#9, #12, #13) to indicate the entity (domain
class) attached to the path (useful for entity disambiguation). In #10, when
applied to the domain class m:Analyst, the mapping results in an array of
values for both the domain (["401636", "401443"]) and the range (same as
#2). Internally, the tool keeps track of the context for each mapped JSONObject
that could result in a valid connection. #11 illustrates a mapping based on two
different paths: one for the domain (D=, which indicates the usage of the already
known instances from the domain classes) and one for the range (R=..., which
indicates the mapping to the values that are equal to "CIA"). #12 illustrates two
mappings: one that explicitly disambiguates the domain and range classes to use
(CV->FoR), and another,

10 rdfs:label creation might be useful in some graph search and visualization tools.
11 The created triple is d:Grant#19453 m:link "http://test.com/key/19453"^^xsd:anyURI
12 #9 defines two mappings that apply to distinct domain classes.

|=|, where it will map to values of the first path only if those
values are equal to values of the second path.
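
The idea of connecting domain and range instances within the same JSON object (#10), and the equal-values operator (#12), might be sketched as follows. Both helper names are hypothetical, and the pairing rule (both values must be non-empty and come from the same object) is an assumption about how the tracked context is used.

```python
from typing import Any, Dict, List, Tuple


def connect_in_context(objects: List[Dict[str, Any]], domain_key: str,
                       range_key: str) -> List[Tuple[str, str]]:
    """Pair domain and range values found in the same JSON object."""
    pairs = []
    for obj in objects:
        domain_value = obj.get(domain_key)
        range_value = obj.get(range_key)
        # Empty or missing values on either side produce no connection.
        if domain_value not in (None, "") and range_value not in (None, ""):
            pairs.append((domain_value, range_value))
    return pairs


def equal_values(left: List[Any], right: List[Any]) -> List[Any]:
    """Keep left-hand values that also occur among the right-hand values."""
    allowed = {value for value in right if value not in (None, "")}
    return [value for value in left if value in allowed]


# Reduced version of the "team" array in Fig. 2.
team = [{"ind_id": "401636", "ORCID": ""},
        {"ind_id": "401443",
         "ORCID": "https://orcid.org/0X30-01X1-68X0-083X"}]

pairs = connect_in_context(team, "ind_id", "ORCID")
orcid_links = [f"d:Analyst#{d} m:hasORCID <{r}>" for d, r in pairs]
print(orcid_links)

# doc/team/org |=| doc/publicData/adminInst, where adminInst is null in Fig. 2.
matching_orgs = equal_values(["Bright Institute of Neuroscience"], [None])
print(matching_orgs)
```

With the Fig. 2 data, only the second team member has an ORCID, and the null adminInst leaves nothing for the equal-values mapping to match.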
   Along with each mapping, one can specify the target endpoint and graph.
The target endpoint is a label that identifies a SPARQL endpoint access (footnote 13)
where the triples will be created. Examples: test, prod. The target graph is the named
graph where the triples will be created. It is defined as a namespace prefix in the
ontology file. Examples: g0-testing, g0-prod. The namespace prefix IRI will
be used as the named graph for the triple creation for that specific mapping.
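
One plausible way to create triples in a named graph through a SPARQL endpoint is a SPARQL 1.1 Update `INSERT DATA` request with a `GRAPH` clause. The sketch below only builds the request text; whether J2RM issues this kind of request is an assumption, and the function name and all example.org IRIs are hypothetical.

```python
from typing import List


def insert_data_update(graph_iri: str, triples: List[str]) -> str:
    """Wrap statements in an INSERT DATA request targeting a named graph."""
    body = "\n".join(f"    {triple} ." for triple in triples)
    return f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"


# Hypothetical IRIs: the graph IRI stands for the namespace behind "g0-testing".
statements = [
    "<http://example.org/data/Eval#A19453> a <http://example.org/model/Eval>",
]
update = insert_data_update("http://example.org/graphs/g0-testing", statements)
print(update)
```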


3      Conclusions and Ongoing Work
J2RM gives information architects a simple mechanism to define the necessary
mapping rules for an automated RDF-graph creation task guided by an OWL2
ontology structure from any JSON data. The key aspect is that the mappings
are embedded in an ontology file: this does not imply that the JSON structure is
intrinsically tied to the OWL2 model. For different JSON structures, one could
define each type of mappings in different ontology files. J2RM is in its early de-
velopment stages. It has been tested on three different domain ontologies. We will
increase the support for more complex JSON mappings and more OWL2 axioms.
The major contributions of this tool are: the ability to selectively extract data
and perform basic operations on the source JSON structure, the “portability” of
the mappings embedded in the OWL2 ontology file as annotation properties at-
tached to classes and properties, and its ease of use while hiding the complexity
of creating RDF triples following OWL2 axioms.


References
1. JavaScript Object Notation (JSON) Pointer. Request for Comments, Internet
   Engineering Task Force (IETF) (April 2013), https://tools.ietf.org/html/rfc6901
2. ECMA-404: The JSON Data Interchange Syntax. Standard, ECMA International
   (December 2017), https://www.json.org/
3. RDF Mapping Language (RML). Unofficial draft, Ghent University (July 2020),
   https://rml.io/specs/rml/
4. Akhtar, W., Kopecký, J., Krennwallner, T., Polleres, A.: XSPARQL: Traveling between
   the XML and RDF worlds – and avoiding the XSLT pilgrimage. In: The Semantic Web:
   Research and Applications. pp. 432–447. Springer Berlin Heidelberg (2008)
5. Arenas, M., Bertails, A., Prud’hommeaux, E., Sequeda, J.: A direct mapping of
   relational data to RDF (2012)
6. Bareau, C., Blache, F., Bolle, S., Ecrepont, C., Folz, P., Hernandez, N., Monteil, T.,
   Privat, G., Ramparany, F.: Semi-automatic RDFization using automatically generated
   mappings. In: ESWC Posters and Demos Track (2020)
7. Das, S., Sundara, S., Cyganiak, R.: R2RML: RDB to RDF mapping language (2012)
8. Elda, a Linked Data API implementation. https://github.com/epimorphics/elda
9. Lefrançois, M., Zimmermann, A., Bakerally, N.: SPARQL-Generate: RDF generation
   from heterogeneous data sources. In: EKAW Satellite Events (2016)

13 Defined in the J2RM configuration file.