<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>RDF2JSON-OM: Dynamic ontology serialization using ontology mapping paths</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrik Kompuš</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Prague University of Economics and Business</institution>
          ,
          <addr-line>nám. W. Churchilla 1938/4, Prague, 130 67</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>Processing semantically enriched data in RDF/XML or Turtle format is not trivial for traditional development teams, who usually consume data in XML or JSON format. To improve the developers' experience of using Semantic Web serialization formats, e.g., RDF/XML or Turtle, an intermediate solution might be beneficial. This work presents a way to dynamically serialize ontology metadata graphs (the possible structure) restricted by ontology mapping paths (the required structure), so that the result can later be used by software development teams. The presented idea is implemented as a tool, RDF2JSON-OM, and is demonstrated as a web service endpoint.</p>
      </abstract>
      <kwd-group>
        <kwd>Dynamic ontology structures</kwd>
        <kwd>Semantic Web technologies adoption</kwd>
        <kwd>Ontology serialization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As the amount of data processed and stored by companies has been rising over the last decades [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], these
companies are embarking on a data transformation journey to become more aware of their own datasets
and offer better products [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A major task in this journey is to ensure fast and precise data exchange
and interoperability, on which stakeholders need to spend a lot of resources [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Semantic Web
technologies have the means to tackle this task by design, yet adopting these technologies in a
business environment is hard, especially when business operations and sales results depend heavily
on traditional ways of delivering data to internal or external systems. It requires the whole chain of
command to be on board with this change, backed by evaluation and reassurance that the solution will
work out and improve current processes. Also, the developer toolbox must become comparable to the
one that mainstream full-stack app developers enjoy, and convincing enough for them to adopt it for the next project
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. To help businesses overcome this in the long run, an intermediate solution can be introduced, so that
Semantic Web technologies are incorporated gradually.
      </p>
      <p>
        JSON-LD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduces a JSON tree structure with a context key, which holds the semantics of the
information delivered. However, for heavily JSON-dependent platforms, it can be verbose and costly to
handle context discovery for every possible variation, particularly in event-based data delivery. Detailed
performance tests were already carried out by the JSON-LD creators in 2016, with results showing that JSON-LD
parsing can be 7551 times slower than plain JSON [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The advised solution to overcome this, at least
partially, is to cache the context. Unfortunately, for the many use cases that cannot implement any caching
mechanism, this is a show stopper. The specification also states: “The syntax is designed to not disturb
already deployed systems running on JSON, but provide a smooth upgrade path from JSON to JSON-LD.”
In practice, this is not happening. Lanthaler et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] noted in 2012 that RDF/XML had been around for over a
decade with very little uptake, and therefore introduced RESTful services powered by JSON-LD. The same is
now happening to JSON-LD. Although JSON-LD was introduced in 2010, its libraries for various programming languages
have reached version 1.1 at most [7], and their GitHub repositories have only about 40 contributors and around 180
users on average. This does not reach even the innovators level of adoption mentioned by Pavlov et al.
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The vast majority of users use JSON-LD solely for SEO purposes. Big companies will not start
using Semantic Web technologies for their core activities, because the masses of software developers
are not willing to adopt them [8]. This is where it makes sense to use the presented hybrid solution: linked
data, but in a JSON tree structure with a simple context, as can be seen in Figure 3 (right). For external
interoperability, generating JSON-LD out of consumed JSON with context is easier than consuming it.
      </p>
      <p>
        RDF/JSON [9] is also a root-oriented structure. The algorithm of the presented solution extends and
improves the one described in the W3C Note. It does not require branches of the tree to be arrays; that
is optional, depending on the modeled reality. Also, the cited algorithm does not offer any solution for cyclic
structures and does not even mention them. RDF/JSON did not become a W3C Recommendation; JSON-LD was
chosen instead. The reasoning included opinions about the purpose and the final target group of such a format
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which at that time was decided to be primarily Web developers [10][11]. Unfortunately, nowadays
we need to process data not only on the Web, but also in large amounts in data pipelines, message
queues, etc., so that it can be quickly used in other consuming components hidden from Web developers.
      </p>
      <p>Ekaputra et al. proposed a modeling framework called SOyA [8] for modeling more interoperable
structures with JSON-LD in scope. The work concentrates on modeling activities, mostly of a metadata
graph, but does not go on to discuss the usage of JSON-LD structures by software developers,
and therefore does not help to bridge the adoption gap.</p>
      <p>The presented solution (see Figure 1) shows a way of delivering semantically enriched data to
consuming components in a more developer-friendly format: JSON [12]. The data/domain expert
is responsible for modeling the ontology and the desired structures to be serialized. The software
developer then receives that structure in JSON format and uses it to implement further business logic.
Although there are already several standardized JSON serialization formats [13], their main purpose
is to publish data via services [12]. They deliver information in a graph-like structure, mimicking the
original graph. Structures representing a graph, however, do not share one key limitation of a
JSON tree structure: a tree cannot represent cycles (circular structures). The presented solution addresses this issue with the newly introduced
mapping paths over the ontology metadata graph. It also separates the concerns of
data modeling, since data structures are currently often modeled by software
developers rather than by data/domain experts. This can speed up the adoption of the aforementioned SOyA framework among the consumers of
data, not only the creators, which is a crucial step towards clean data delivery and interoperability.</p>
      <p>The presented tool, which we call RDF2JSON-OM (where OM stands for ontology mapping), together
with the demo service implementation, can be found on GitHub [14].</p>
    </sec>
    <sec id="sec-2">
      <title>2. RDF to JSON serialization through mapping paths</title>
      <p>A very simple Person ontology can be seen in Figure 2. The section highlighted in blue represents a
cycle. When attempting to serialize such an ontology graph into a tree structure, an infinite loop of
operations is created, since no end of the cycle is specified. Also, consider the use case where the
Spouse object should not have the isIdentifiedBy property, because that is not part of the requirements
(e.g., due to a lack of datasets, legislation constraints, etc.).</p>
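      <p>The non-terminating expansion can be illustrated with a minimal Python sketch. The dictionary-based ontology, the ex: prefix, the hasSpouse property that closes the cycle, and the helper names are assumptions made for illustration; they are not taken from Figure 2 or from the RDF2JSON-OM code base.</p>
      <preformat><![CDATA[
```python
# Hypothetical Person ontology: class -> {property: range}.
# A range that is itself a key of ONTOLOGY denotes an object property.
ONTOLOGY = {
    "ex:Person": {
        "ex:hasFamilyName": "xsd:string",
        "ex:isIdentifiedBy": "ex:Identifier",
        "ex:hasSpouse": "ex:Person",  # closes the cycle Person -> Person
    },
    "ex:Identifier": {"ex:hasValue": "xsd:string"},
}


def naive_tree(cls):
    """Expand every object property, with no stop condition for cycles."""
    tree = {}
    for prop, rng in ONTOLOGY[cls].items():
        tree[prop] = naive_tree(rng) if rng in ONTOLOGY else rng
    return tree


def try_serialize(cls):
    """Return the tree, or None when the expansion never bottoms out."""
    try:
        return naive_tree(cls)
    except RecursionError:
        return None


acyclic_result = try_serialize("ex:Identifier")
cyclic_result = try_serialize("ex:Person")
```
]]></preformat>
      <p>In this sketch the acyclic Identifier class serializes normally, whereas the Person class exhausts the interpreter's recursion limit, which is the tree-serialization counterpart of the infinite loop described above.</p>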
      <sec id="sec-2-1">
        <title>2.1. Traditional JSON structure</title>
        <p>A JSON tree is a 2-dimensional key-value structure that is easy to consume, parse, and extract content
from. The keys have to be unique on the same level of the tree, and a value can be a simple literal,
an object, or an array; objects and arrays can expand further internally. An example of a possible JSON tree
structure can be seen in Figure 3 (left).</p>
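        <p>As a small illustration of these value kinds, the following Python sketch parses a JSON document and classifies each top-level value. The document content and the key names are invented for the example and do not reproduce Figure 3.</p>
        <preformat><![CDATA[
```python
import json

# Invented sample document: one literal, one nested object, one array.
doc = json.loads("""
{
  "familyName": "Novak",
  "identifier": {"value": "123"},
  "children": [{"familyName": "Novak"}]
}
""")


def kind(value):
    """Classify a parsed JSON value as object, array or literal."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "literal"


kinds = {key: kind(value) for key, value in doc.items()}

# Python's json module does not reject duplicate keys on one level;
# it keeps the last value, so the earlier one is silently lost.
last_wins = json.loads('{"a": 1, "a": 2}')
```
]]></preformat>
        <p>The last line shows why key uniqueness matters in practice: a parser may accept duplicate keys without complaint, yet only one of the values survives.</p>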
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Structure generated by RDF2JSON-OM</title>
        <p>In the presented solution, the generated structure is a tree whose root is the serialized object; see Figure 3
(right). The serialization process is driven by the ontology it is meant to serialize. There is no hard-coded
pattern, so the whole process adapts dynamically to the modeled reality, preferably created by
data/domain experts. Software developers then get a JSON structure they can work with. It provides
them with keys that are IRIs pointing to the ontology for further metadata discovery, if needed.
The populated datatype properties (e.g., hasFamilyName) can carry any relevant information, e.g., the
datatype of the value, which the developers should use.</p>
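        <p>How a consumer might resolve such keys can be sketched in Python as follows. The layout below, including the "context", "datatype" and "value" key names, the ex: namespace, and the sample value, is an assumption made for illustration; it is not the documented RDF2JSON-OM output shown in Figure 3.</p>
        <preformat><![CDATA[
```python
# Hypothetical generated structure; the key names "context", "datatype"
# and "value" are assumptions, not the tool's documented output format.
structure = {
    "context": {
        "ex": "http://example.org/person#",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
    },
    "ex:Person": {
        "ex:hasFamilyName": {"datatype": "xsd:string", "value": "Novak"},
    },
}


def expand(name, context):
    """Turn a prefixed name such as ex:Person into a full IRI."""
    prefix, local = name.split(":", 1)
    return context[prefix] + local


def iris(node, context):
    """Collect the full IRIs of all prefixed keys found in the tree."""
    found = []
    for key, child in node.items():
        if ":" in key:
            found.append(expand(key, context))
        if isinstance(child, dict):
            found.extend(iris(child, context))
    return found


tree = {k: v for k, v in structure.items() if k != "context"}
key_iris = iris(tree, structure["context"])
datatype_iri = expand("xsd:string", structure["context"])
```
]]></preformat>
        <p>With the full IRIs at hand, a developer could look up labels, comments, or other metadata in the ontology without leaving the JSON tooling they already use.</p>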
        <p>Ontologies can be provided as a parameter, but can also be stored in a triple store. That is a big
benefit, since multiple ontologies complementing each other, e.g., an ontology network, can be stored in
the same graph, and the structure generator will create a JSON structure consisting of classes and properties
from all ontologies touching any of the elements in the desired tree of the serialized object. The storage
can be either persistent or virtual, built during the generation.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Implementation</title>
        <p>The solution implements the RDF2JSON-OM structure generator using the Jena API library, which
loads the ontology model as RDF triples and populates a generic Map&lt;Object,Object&gt; object based on
the selected root. Afterwards, it creates the corresponding branches based on the modeled datatype and
object properties. A datatype property ends with a leaf that stores only the final value, whereas the
creation step above is repeated recursively for object properties, until the reached ontology class has no more object
properties. The generator takes subclasses, subproperties and restrictions into consideration as well. A required
part of the generated structure is the context, which carries all prefixes used in the structure. The
implemented demo service is a REST endpoint that accepts an ontology in RDF/XML format and a root
object IRI, and returns the generated structure [14].</p>
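        <p>The recursive step can be approximated by the following Python sketch. It is a simplification under stated assumptions and not the Jena-based implementation: triples are plain tuples, the ontology and its ex: namespace are invented, subclasses, subproperties and restrictions are ignored, the ontology is assumed to be acyclic, and the "context", "datatype" and "value" key names are hypothetical.</p>
        <preformat><![CDATA[
```python
RDF_TYPE, DOMAIN, RANGE = "rdf:type", "rdfs:domain", "rdfs:range"

# Hypothetical acyclic ontology as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:hasFamilyName", RDF_TYPE, "owl:DatatypeProperty"),
    ("ex:hasFamilyName", DOMAIN, "ex:Person"),
    ("ex:hasFamilyName", RANGE, "xsd:string"),
    ("ex:isIdentifiedBy", RDF_TYPE, "owl:ObjectProperty"),
    ("ex:isIdentifiedBy", DOMAIN, "ex:Person"),
    ("ex:isIdentifiedBy", RANGE, "ex:Identifier"),
    ("ex:hasValue", RDF_TYPE, "owl:DatatypeProperty"),
    ("ex:hasValue", DOMAIN, "ex:Identifier"),
    ("ex:hasValue", RANGE, "xsd:string"),
]
PREFIXES = {
    "ex": "http://example.org/person#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def properties_of(cls):
    return [s for s, p, o in TRIPLES if p == DOMAIN and o == cls]


def prefix_of(name):
    return name.split(":", 1)[0]


def build_branches(cls, used):
    """Create one branch per property whose domain is cls."""
    used.add(prefix_of(cls))
    branches = {}
    for prop in properties_of(cls):
        used.add(prefix_of(prop))
        rng = objects(prop, RANGE)[0]
        if "owl:ObjectProperty" in objects(prop, RDF_TYPE):
            # Object property: repeat the step for the range class.
            branches[prop] = build_branches(rng, used)
        else:
            # Datatype property: a leaf that only holds the final value.
            used.add(prefix_of(rng))
            branches[prop] = {"datatype": rng, "value": None}
    return branches


def generate(root):
    """Return the tree for root plus a context with the prefixes used."""
    used = set()
    tree = build_branches(root, used)
    context = {p: PREFIXES[p] for p in sorted(used)}
    return {"context": context, root: tree}


result = generate("ex:Person")
```
]]></preformat>
        <p>Only the prefixes that actually occur in the tree end up in the context, so the owl prefix is left out here even though it is known to the generator.</p>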
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Mapping paths</title>
        <p>One of the biggest challenges to overcome was the inability to convert circular structures to JSON,
although an IETF draft exists [15]. When there is a cycle in the ontology, e.g., objects pointing to
each other, the JSON generator would enter an endless loop. There are several non-semantic techniques
to overcome this, e.g., content replacements or substitutions, but they lead to the need for content
restoration and add more complexity.</p>
        <p>Instead, the presented solution involves more descriptive techniques and also benefits the content
creators, as they now have the power to specify what the final structure is supposed to look like. The
creator of the model uses restricted paths to draw only the relevant branches of the final tree structure.
For an infinite loop to occur, the creator would need to create an infinite path, which is impossible in
the real world and also impossible to draw. Generic JSON-LD serialization would include the property
isIdentifiedBy for the Spouse object, since it is part of the ontology. With mapping paths, we generate only
what we draw to be generated, thus fulfilling the requirements set by stakeholders while still keeping the
model as designed by data experts. Moreover, the modeled paths can be reused in various structures.
They are interoperable, as long as they are logically correct.</p>
        <p>An example of mapping paths is highlighted in Figure 4.</p>
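        <p>The effect of mapping paths on termination can be sketched in Python as follows. This is an illustration under assumptions, not the encoding used by RDF2JSON-OM: a mapping path is represented here as a tuple of property names starting at the root, the ontology and the hasSpouse property are invented, and leaves are reduced to the datatype name for brevity.</p>
        <preformat><![CDATA[
```python
# Hypothetical cyclic ontology: class -> {property: range}.
ONTOLOGY = {
    "ex:Person": {
        "ex:hasFamilyName": "xsd:string",
        "ex:isIdentifiedBy": "ex:Identifier",
        "ex:hasSpouse": "ex:Person",  # cycle Person -> Person
    },
    "ex:Identifier": {"ex:hasValue": "xsd:string"},
}

# Hypothetical mapping paths drawn by the data expert, each one a finite
# sequence of properties starting at the root object.
MAPPING_PATHS = [
    ("ex:hasFamilyName",),
    ("ex:isIdentifiedBy", "ex:hasValue"),
    ("ex:hasSpouse", "ex:hasFamilyName"),
]


def allowed(path):
    """A step is allowed if it is a prefix of at least one mapping path."""
    return any(mp[:len(path)] == path for mp in MAPPING_PATHS)


def build(cls, path=()):
    """Expand only the branches that the mapping paths describe."""
    tree = {}
    for prop, rng in ONTOLOGY[cls].items():
        step = path + (prop,)
        if not allowed(step):
            continue  # not drawn by the data expert, so not generated
        tree[prop] = build(rng, step) if rng in ONTOLOGY else rng
    return tree


person_tree = build("ex:Person")
```
]]></preformat>
        <p>Because every mapping path is finite, the recursion depth is bounded by the longest path, so the cycle is entered exactly as many times as the paths request. In this sketch the spouse carries a family name but no isIdentifiedBy branch, matching the use case described above.</p>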
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusions and future work</title>
      <p>This work describes the idea and implementation of ontology graph serialization into a JSON tree
structure, including a way to overcome the problem of circular structures. It allows the content creators to
explicitly exclude parts of the graph from being serialized and gives them the power to specify how deep
the circular nesting should go. The structure is generated automatically and dynamically, without any
further configuration. These structures are to be sent out as serialized data structures to consumers, who
can either read them with semantics in mind and populate them with content for further event-based message
processing, or decompose them and store them in any arbitrary structure. This raises the possibility
of adopting Semantic Web technologies in a broader community. The final generated structure also
conforms to most of the best practices listed by the JSON-LD creators [12] at the time of writing this
paper, thus anticipating the pitfalls of data consumption and confirming the maturity of the
solution.</p>
      <p>We plan to include restriction validation functionality in our tool. OWL axioms themselves can
assure some level of validation, e.g., Person can be declared as disjoint from Vehicle, so that
a reasoner reports an inconsistency when a data expert mistakenly tries to model a
mapping path that way. For more extensive validation, this can then be complemented with SHACL
[16] shapes modeled by data experts.</p>
      <p>Another benefit and usage of the proposed solution is that the same algorithm can be used to generate
a JSON schema in parallel. This schema can later be used to validate the structures during the future
data transfer process, where they will be validated before they reach the software developers, effectively
reducing the number of tests that need to be prepared.</p>
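      <p>A parallel schema generation of this kind could look like the following Python sketch. It assumes that the generated tree is available as a nested dictionary whose leaves are datatype names; the XSD-to-JSON type table, the sample tree, and the choice to forbid additional properties are illustrative assumptions rather than features of the presented tool.</p>
      <preformat><![CDATA[
```python
# Assumed mapping from XSD datatypes to JSON Schema primitive types.
XSD_TO_JSON = {
    "xsd:string": "string",
    "xsd:integer": "integer",
    "xsd:boolean": "boolean",
}


def to_schema(node):
    """Walk the tree and emit a JSON Schema fragment for each node."""
    if isinstance(node, dict):
        return {
            "type": "object",
            "properties": {key: to_schema(child) for key, child in node.items()},
            "additionalProperties": False,
        }
    # Unknown datatypes fall back to string in this sketch.
    return {"type": XSD_TO_JSON.get(node, "string")}


# Hypothetical tree, e.g. the output of a path-restricted generator.
tree = {
    "ex:hasFamilyName": "xsd:string",
    "ex:hasSpouse": {"ex:hasFamilyName": "xsd:string"},
}
schema = to_schema(tree)
```
]]></preformat>
      <p>Since the schema is derived from the same traversal as the structure itself, the two cannot drift apart, which is what would make validation before delivery practical.</p>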
      <p>To improve developers’ experience with consuming and understanding the delivered structures, we
plan to include the generation of Swagger documentation out of RDF, solely based on data experts’
models. This is another step towards Semantic Web technologies adoption.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>This work has been supported by the EU’s Horizon Europe grant no. 101058682 (Onto-DESIDE).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Provost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fawcett</surname>
          </string-name>
          ,
          <article-title>Data Science and its Relationship to Big Data and Data-Driven Decision Making</article-title>
          ,
          <year>2013</year>
          . URL: https://doi.org/10.1089/big.2013.1508.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Natuvion</surname>
          </string-name>
          ,
          <source>Transformation 2022, The Study</source>
          ,
          <year>2023</year>
          . URL: https://www.natuvion.com/newsroom/challenges-2023.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pavlov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Genevski</surname>
          </string-name>
          ,
          <article-title>Semantic technologies for the masses</article-title>
          ,
          <source>in: Proceedings of 2015 Big Data, Knowledge and Control Systems Engineering</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kellogg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Longley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Champin</surname>
          </string-name>
          ,
          <source>JSON-LD 1.1</source>
          ,
          <year>2020</year>
          . URL: https://www.w3.org/TR/json-ld11/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sporny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Longley</surname>
          </string-name>
          ,
          <article-title>JSON-LD Best Practice: Context Caching</article-title>
          ,
          <year>2016</year>
          . URL: https://web.archive.org/web/20230131235929/https://manu.sporny.org/2016/json-ld-context-caching/.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lanthaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gütl</surname>
          </string-name>
          ,
          <article-title>On Using JSON-LD to Create Evolvable RESTful Services</article-title>
          ,
          <source>in: Proceedings of the 3rd International Workshop on RESTful Design (WS-REST 2012) at WWW2012</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>