-

Hangzhou, China $ dag.hovland@bouvet.no (D. Hovland); fredrik.chrislock@bouvet.no (F. Chrislock)

Dag Hovland

Fredrik Chrislock

Bouvet Norway

Bergen

Norway

2022

000 0 0002

In our work at Equinor, an oil and gas operator company, we are involved in the exchange of information between contractors and operator during design and planning of new facilities, or capital projects. We are part of a larger efort to get away from transferring documents and rather transfer smaller pieces of information. The goal is that information about the facility becomes less disconnected and that more of it can be processed by machines. It is also a goal that the interaction between operator and contractor can be faster and more small-grained. That is, smaller pieces of information can be transferred, not only complete documents. RDF is a strong candidate for the transfer and storage of this information. However, it does not fulfill our needs for maintaining the same dataset in diferent organizations. Specifically, parts of the model, in diferent stages of development, will be exchanged between organizations (e.g. operator and contractor) and between units, the parts will be extended, changed and commented on simultaneously in these diferent organizations. It is not an option to maintain a single RDF dataset that is directly edited by all partners. There are two reasons for this: i) The contractor and operator do not wish to share their complete graph with each other and ii) Multiple, difering, versions of the same model/graph must be possible to maintain, to be able to explore different design choices. A consequence of this requirement is that decorating properties with identifiers of the dataset version, like in [3], is not suficient. In relational data warehousing this type of data is called slowly changing dimensions[2] and there are 7 named patterns for maintaining them. These patterns can also be implemented in RDF, but since we are no longer restricted to fixed, tabular formats, some patterns therefore are probably not useful in RDF. The options we have considered for transferring the model as RDF between contracator and operator 1. Send the whole RDF graph every time 2. Send instructions about what triples to delete and what to insert 3. Send the RDF graph for each object that has been changed Before explaining what we mean with objects, let us explain why the first two options do not cover our needs: Transferring the whole RDF graph means that concurrent changes to the graph are complicated and would need a strong locking mechanism on the RDF graph at the operator, preventing changes to the parts of the graph that are currently under work by the contractor. This is a too strict requirement.

The second option is not allowed by the users: The engineers want to make sure that whoever makes a change to an object in the model, is aware of the state of the object they are changing. This is not possible in a distributed setting if single triples can be inserted or removed independently. This also means that Auer and Herre’s atomic changes [ 1 ] does not solve our problem, although they could supplement the solution by being used to describe the changes on objects.

This leaves us with our suggested solution, which is to agree upon a fragmentation of the data into objects. Every triple in the RDF graph is a member of exactly one such object, and there must be agreed between the collaborators rules about the sizes of these objects. It must further be possible to easily distinguish between diferent versions of an object, and the provenance history of that specific version. Note that a version of an object is a RDF graph. These requirements appear intuitive to the engineers and have led us to the following design decisions: The representation of versions of objects as separate entities with their own IRIs is necessary to be able to support the transfer of changes of data between organizations. When a system receives an object, it can compare with the previously stored version of the object, and, importantly, see both what triples should be removed and which should be added. Our approach has no impact on how data is read from the RDF Graph, only on how data is written into it.

An implementation of this versioning scheme is available at https://github.com/equinor/ versioned-object.

[1]

Auer and

Herre . A versioning and evolution framework for RDF knowledge bases . In I. B. Virbitskaite and A . Voronkov, editors, Perspectives of Systems Informatics, 6th International Andrei Ershov Memorial Conference, PSI 2006 , Novosibirsk, Russia, June 27-30, 2006 . Revised Papers, volume 4378 of Lecture Notes in Computer Science, pages 55 - 69 . Springer, 2006 .

[2]

Kimball . The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses . John Wiley, 1996 .

[3]

Ma , C. Ma, and

Wang . A new structure for representing and tracking version information in a deep time knowledge graph . Computers & Geosciences , 145 : 104620 , 2020 .