Versioned Objects
Dag Hovland, Fredrik Chrislock
Bouvet Norway, Bergen, Norway


    In our work at Equinor, an oil and gas operator company, we are involved in the exchange of
information between contractors and operator during design and planning of new facilities,
or capital projects. We are part of a larger effort to get away from transferring documents and
rather transfer smaller pieces of information. The goal is that information about the facility
becomes less disconnected and that more of it can be processed by machines. It is also a goal
that the interaction between operator and contractor can be faster and more small-grained.
That is, smaller pieces of information can be transferred, not only complete documents.
    RDF is a strong candidate for the transfer and storage of this information. However, it does
not fulfill our needs for maintaining the same dataset in different organizations. Specifically,
parts of the model, in different stages of development, will be exchanged between organizations
(e.g. operator and contractor) and between units, the parts will be extended, changed and
commented on simultaneously in these different organizations. It is not an option to maintain
a single RDF dataset that is directly edited by all partners. There are two reasons for this: i)
The contractor and operator do not wish to share their complete graph with each other and
ii) Multiple, differing, versions of the same model/graph must be possible to maintain, to be
able to explore diffferent design choices. A consequence of this requirement is that decorating
properties with identifiers of the dataset version, like in [3], is not sufficient.
    In relational data warehousing this type of data is called slowly changing dimensions[2] and
there are 7 named patterns for maintaining them. These patterns can also be implemented in
RDF, but since we are no longer restricted to fixed, tabular formats, some patterns therefore are
probably not useful in RDF.
    The options we have considered for transferring the model as RDF between contracator and
operator
          1. Send the whole RDF graph every time
          2. Send instructions about what triples to delete and what to insert
          3. Send the RDF graph for each object that has been changed
  Before explaining what we mean with objects, let us explain why the first two options do
not cover our needs: Transferring the whole RDF graph means that concurrent changes to the
graph are complicated and would need a strong locking mechanism on the RDF graph at the
operator, preventing changes to the parts of the graph that are currently under work by the
contractor. This is a too strict requirement.

ISWC 2022 Industry Proceedings, October 23 – 27, 2022, Hangzhou, China
$ dag.hovland@bouvet.no (D. Hovland); fredrik.chrislock@bouvet.no (F. Chrislock)
 0000-0002-3569-8838 (D. Hovland)
                                    © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
 CEUR
 Workshop
 Proceedings
               http://ceur-ws.org
               ISSN 1613-0073
                                    CEUR Workshop Proceedings (CEUR-WS.org)
   The second option is not allowed by the users: The engineers want to make sure that
whoever makes a change to an object in the model, is aware of the state of the object they are
changing. This is not possible in a distributed setting if single triples can be inserted or removed
independently. This also means that Auer and Herre’s atomic changes [1] does not solve our
problem, although they could supplement the solution by being used to describe the changes
on objects.
   This leaves us with our suggested solution, which is to agree upon a fragmentation of the data
into objects. Every triple in the RDF graph is a member of exactly one such object, and there must
be agreed between the collaborators rules about the sizes of these objects. It must further be
possible to easily distinguish between different versions of an object, and the provenance history
of that specific version. Note that a version of an object is a RDF graph. These requirements
appear intuitive to the engineers and have led us to the following design decisions:
   1. Data is transferred as JSON-LD, constructed using an agreed JSON-LD-Frame. We found
      JSON-LD Framing convenient for expressing and agreeing upon the sizes of the objects.
   2. We distinguish between a persistent IRI, which identifies a domain object, and a versioned
      IRI, which identifies an RDF graph describing a specific version of an object. The latter
      are immutable.
   3. To identify and distinguish versioned IRIs and objects we use a custom scheme where the
      versioned IRIs consist of the persistent IRIs suffixed with the string "/version/", a hash of
      the object, and then the date
   4. Edges between objects use the versioned IRIs, and are reified, using rdf:subject, rdf:predicate
      and rdf:object
   5. PROV-Os prov:wasDerivedFrom is used to link an object to the previous version of an
      object
The representation of versions of objects as separate entities with their own IRIs is necessary to
be able to support the transfer of changes of data between organizations. When a system receives
an object, it can compare with the previously stored version of the object, and, importantly, see
both what triples should be removed and which should be added. Our approach has no impact
on how data is read from the RDF Graph, only on how data is written into it.
  An implementation of this versioning scheme is available at https://github.com/equinor/
versioned-object.


References
[1] S. Auer and H. Herre. A versioning and evolution framework for RDF knowledge bases. In I. B.
    Virbitskaite and A. Voronkov, editors, Perspectives of Systems Informatics, 6th International
    Andrei Ershov Memorial Conference, PSI 2006, Novosibirsk, Russia, June 27-30, 2006. Revised
    Papers, volume 4378 of Lecture Notes in Computer Science, pages 55–69. Springer, 2006.
[2] R. Kimball. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data
    Warehouses. John Wiley, 1996.
[3] X. Ma, C. Ma, and C. Wang. A new structure for representing and tracking version informa-
    tion in a deep time knowledge graph. Computers & Geosciences, 145:104620, 2020.