<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Data Integrity Verification for More Sustainable Petroleum Industry</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuanwei Qu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhuoxun Zheng</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Baifan Zhou</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yan Zhou</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolau Santos</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ognjen Savkovic</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arild Waaler</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Cameron</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Bosch Center for AI</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Oslo Metropolitan University</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Informatics, University of Oslo</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Federal University of Rio Grande do Sul</institution>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>As a conventional energy industry, the petroleum industry is responsible for supplying over half of the world's energy. Facilitating sustainable development for petroleum energy production remains crucial. Data-driven methods have emerged as powerful tools to advance sustainability by enabling efficient resource management and risk mitigation. However, the reliable implementation of data-driven methods relies on high-quality data, necessitating the verification of data integrity over substantial data volumes. To this end, this poster paper presents our ongoing research, leveraging ontologies and knowledge graphs as a shared knowledge representation, and provides preliminary results on data integrity verification. Based on the ontologies, we formulate domain knowledge integrity constraints and test three technologies for integrity verification (Python, PySpark, and SPARQL) to explore their potential for future industrial adoption.</p>
      </abstract>
      <kwd-group>
        <kwd>integrity verification</kwd>
        <kwd>petroleum industry</kwd>
        <kwd>sustainability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Figure 1: (a) simplified depiction of oil production wells; (b) the petroleum ontology. Labels recovered from the figure: O3PO:choke valve, O3PO:Sensor, O3PO:well, Valve Manifold, Well, Well Section, Wellbore, choke valve 3 (V3), S1, subClassOf, BFO:roleOf, RO:partOf, FSO:exchangesFluidWith, FSO:suppliesFluidTo, upstreamTo, downstreamTo.]</p>
      <sec id="sec-6-2">
        <p>[...] to sustainability by improving energy production efficiency and reducing accidents with severe
environmental damage, such as oil leaks.</p>
        <p>Challenge. The performance of data-driven approaches relies heavily on data quality.
Ensuring data integrity is therefore an important challenge, since it is
crucial for data-driven solutions to deliver reliable predictions. In petroleum production,
data are collected from various sensors so that domain experts can analyse them and
make informed decisions based on their knowledge and experience. In this context, we discuss
one aspect of data integrity: the data should satisfy constraints imposed by physical laws.
This is in addition to other issues, such as missing values and sensor precision errors. For
verifying such constraints, domain knowledge plays an important role and
should be incorporated into the integrity checking. Semantic technology is well suited here because of its
transparency: domain experts are more likely to trust it because they can
observe that their domain knowledge is used and how it is used.</p>
        <p>
          Our approach. In this poster paper, we present our ongoing research on a semantic solution
for data integrity verification in the petroleum industry. We develop a draft of a petroleum ontology
aligned with upper ontologies as a shared representation for data integration and knowledge
representation; we construct knowledge graphs for transparent and unified human
understanding; and we experiment with technologies for data integrity verification, including Python, PySpark, and
SPARQL. We provide preliminary experimental results and a discussion of adoption.
        </p>
        <p>2. Data and Knowledge Representation</p>
        <p>
          Ontology for petroleum production. In the petroleum domain, ontologies exist for petroleum
exploration [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], reservoirs [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], offshore production plants [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], subsurface faults [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], and petroleum
risk assessment [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. To meet our needs of verifying data integrity for the petroleum industry,
we develop a draft of an ontology for petroleum production wells. Fig. 1a presents a simplified
visual depiction of oil production wells. The fluid flow depicted in the figure originates from
subsurface sub-wells, then merges towards the main well, and finally reaches the offshore
production platform. Fig. 1b depicts our petroleum ontology written in OWL 2, which includes
15 classes, 11 object properties, and 2 datatype properties. It contains several classes that are
essential to the industry, including well section and subwell role. The ontology further includes
core relations such as upstreamTo and downstreamTo, which represent the
spatial location of each well section and indicate the flow direction of the fluid supplied from the
production zone. To ensure compatibility and interoperability, our ontology has been aligned
with a domain ontology: the Offshore Petroleum Production Plant Ontology (O3PO) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which
is built upon the Basic Formal Ontology (BFO) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], the Relation Ontology (RO) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and the Industrial
Ontologies Foundry Core [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], as well as the Flow System Ontology (FSO) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. By using these classes
and relations, our ontology provides a structured framework for integrating data, renaming
variables, capturing and organising relevant knowledge and data, and supporting
integrity verification.
        </p>
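        <p>The role of upstreamTo and downstreamTo in encoding flow direction can be illustrated with a minimal sketch. It is not the authors' implementation: the section names are hypothetical, "A upstreamTo B" is read here as "fluid flows from A to B", and downstreamTo is assumed to be the inverse of upstreamTo, which the text does not state explicitly.</p>
        <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
# Hypothetical well sections. "A upstreamTo B" is read as "fluid flows from A to B";
# this reading is an assumption made for illustration.
UPSTREAM_TO = {
    "subwell_1": "main_well",
    "subwell_2": "main_well",
    "main_well": "platform",
}


def downstream_of(upstream_to):
    """Derive downstreamTo pairs, assuming it is the inverse of upstreamTo."""
    inverse = {}
    for source, target in upstream_to.items():
        inverse.setdefault(target, []).append(source)
    return inverse


def flow_path(start, upstream_to):
    """Follow upstreamTo links from a section until no further link exists."""
    path = [start]
    seen = {start}
    while path[-1] in upstream_to:
        nxt = upstream_to[path[-1]]
        if nxt in seen:
            raise ValueError(f"cycle detected at {nxt}")
        path.append(nxt)
        seen.add(nxt)
    return path


DOWNSTREAM_TO = downstream_of(UPSTREAM_TO)
PATH = flow_path("subwell_1", UPSTREAM_TO)
print(" -> ".join(PATH))
```
]]></preformat>
        <p>In this toy topology the two sub-wells merge into the main well, which in turn feeds the platform, mirroring the flow described for Fig. 1a.</p>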
        <p>Data from petroleum production wells. The data collected from the sensors in the production
wells are typically presented in relational tables, including various sensor measurements across
the range of well sections from the bottom-hole to the wellhead. These measurements include
parameters such as pressure, temperature, and flow rate of each well section. Additionally, the
collected data can contain other important equipment information, such as the ratio between
the choke opening rate and flow coefficients. These relational tables provide a structured format
for data-driven approaches to make predictions.</p>
        <p>Knowledge graphs for petroleum wells. Based on the proposed ontology, we construct
knowledge graphs (KGs) (Fig. 2) with domain experts to represent the production wells, well
sections, well sensors, and their relations. These KGs serve as a flexible foundation to formalise
the domain knowledge and to support a transparent shared understanding among
domain experts, semantic experts, data scientists, and others. The KGs also have the potential for
sophisticated reasoning, for example, using domain-knowledge-based constraints to detect subtle
anomalies, identify potentially erroneous data, and ensure the consistency of the delivered data.</p>
        <p>3. Integrity Verification with Preliminary Evaluation</p>
        <p>Integrity constraints. Integrity constraints play a crucial role in ensuring the quality
of petroleum data. After renaming the features, we can formulate these constraints based
on domain knowledge to verify data integrity. This allows validation of the sensor
measurements, ensuring that they align with physical laws and empirical expectations. Here
we give three examples (Fig. 2b):</p>
        <p>Example 1: the flow rate at any position within a well must equal the flow rate at
its upstream or downstream locations.</p>
        <p>Example 2: the total flow rate of the main well should precisely match the sum of the flow rates
of all merged (or split) wells.</p>
        <p>Example 3: for each well, the flow rate and pressure must be consistent with
the Bernoulli equation.</p>
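        <p>Examples 1 and 2 can be sketched as checks over a small table of readings. This is a hedged illustration only: the paper's Python implementation uses Pandas and NumPy, whereas the sketch below uses only the standard library to stay self-contained. The topology, the readings, and the 1% relative tolerance are all assumptions (the examples themselves ask for exact equality, which real sensor data would rarely satisfy).</p>
        <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math

# Hypothetical topology: each section maps to the sections feeding fluid into it.
FEEDS_INTO = {
    "main_well": ["subwell_1", "subwell_2"],  # merge point (Example 2)
    "wellhead": ["main_well"],                # single upstream section (Example 1)
}

# Hypothetical flow-rate readings per section, all in the same (unspecified) unit.
FLOW_RATE = {
    "subwell_1": 40.0,
    "subwell_2": 60.0,
    "main_well": 100.0,
    "wellhead": 97.0,  # deliberately inconsistent with main_well
}


def check_flow_constraints(feeds_into, flow_rate, rel_tol=0.01):
    """Return (kind, section, measured, expected) for every violated constraint.

    rel_tol is an assumed tolerance for sensor noise, not a value from the paper.
    """
    violations = []
    for section, feeders in feeds_into.items():
        expected = sum(flow_rate[feeder] for feeder in feeders)
        measured = flow_rate[section]
        if not math.isclose(measured, expected, rel_tol=rel_tol):
            kind = "continuity" if len(feeders) == 1 else "merge_sum"
            violations.append((kind, section, measured, expected))
    return violations


VIOLATIONS = check_flow_constraints(FEEDS_INTO, FLOW_RATE)
for violation in VIOLATIONS:
    print(violation)
```
]]></preformat>
        <p>With these made-up readings, the merge constraint at the main well holds, while the wellhead reading is flagged as a continuity violation. Example 3 is left out of the sketch because it needs fluid properties and geometry that are not given here.</p>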
        <p>[Figure 2: (a) knowledge graph of the production wells; (b) examples of integrity constraints. Labels recovered from the figure: Well A, Well 1, Well 2, BFO:hasRole, BFO:hasContinuantPartAtSomeTime, FSO:feedsFluidTo.]</p>
        <sec id="sec-6-2-2">
          <p>[...] few violations) based on real production data, provided by two world-leading energy companies.
In total, we generated five such tables with sizes ranging from 145 MB to 1.45 GB, containing from
1 million to 10 million records. In addition, we generated corresponding KGs (Fig. 2a) following
our ontology. These KGs are saved as Turtle files ranging in size from 440 MB to 4.4 GB.</p>
          <p>Implementation. We implement the constraints in Examples 1-3 with (a) Python, because it
is relatively easy to learn and is popular among petroleum domain experts; (b) PySpark,
for its similarity to Python and because it unlocks the potential of parallelizable computation;
and (c) SPARQL, for its popularity in the semantic community. The Python implementation
uses common libraries such as Pandas and NumPy. PySpark is the Python API for Apache Spark,
a distributed computing framework that enables parallel computation for dealing with
large-scale datasets. The SPARQL implementation uses Jena and Fuseki. Jena is an open-source
Java framework for semantic applications, while Fuseki is used to set up the SPARQL endpoint.</p>
          <p>Results and discussion. The results (Fig. 3) show that the Python running time
increases significantly as the data size grows, while the running times for PySpark and
SPARQL change only slightly. We postulate that the data size is below a
threshold at which most of the time consumed by PySpark and SPARQL is spent loading
the environment rather than querying. The results indicate that both PySpark and SPARQL have
the potential for verifying large datasets. Note, however, that generating the Turtle (.ttl) files for SPARQL
to query takes a large amount of time (from some minutes to some hours). Besides, many domain
experts are familiar with Python but unfamiliar with Jena Fuseki and SPARQL. We expect that they
would more readily learn to write constraint queries in PySpark than in SPARQL. All these factors need to
be taken into account when considering industrial adoption.</p>
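          <p>The conversion step whose cost is noted above, turning relational rows into Turtle for SPARQL to query, can be sketched as follows. The paper does not describe how its Turtle files were produced, so everything here is an assumption for illustration: the ex: namespace, the property names ex:flowRate and ex:upstreamTo, and the row layout are hypothetical.</p>
          <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
# Hypothetical namespace and property names; the ontology's real IRIs are not given.
PREFIXES = (
    "@prefix ex: <http://example.org/well#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)

# Hypothetical relational rows: one record per well section.
ROWS = [
    {"section": "subwell_1", "upstream_to": "main_well", "flow_rate": 40.0},
    {"section": "subwell_2", "upstream_to": "main_well", "flow_rate": 60.0},
    {"section": "main_well", "upstream_to": None, "flow_rate": 100.0},
]


def rows_to_turtle(rows):
    """Serialise each row as Turtle triples, one statement per line."""
    lines = [PREFIXES.rstrip("\n")]
    for row in rows:
        subject = "ex:" + row["section"]
        rate = float(row["flow_rate"])
        lines.append(f'{subject} ex:flowRate "{rate}"^^xsd:double .')
        target = row["upstream_to"]
        if target is not None:
            lines.append(f"{subject} ex:upstreamTo ex:{target} .")
    return "\n".join(lines) + "\n"


TTL = rows_to_turtle(ROWS)
print(TTL)
```
]]></preformat>
          <p>Each record expands into several triples and repeated prefixed names, which is in line with the Turtle files reported above being about three times the size of the source tables.</p>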
          <p>Acknowledgements. This work is supported by the Norwegian Research Council via PeTWIN
(294600), DigiWell (308817) and SIRIUS (237898).</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ritchie</surname>
          </string-name>
          , et al.,
          <article-title>Energy</article-title>
          , Our World in Data (
          <year>2022</year>
          ). https://ourworldindata.org/energy.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kuang</surname>
          </string-name>
          , et al.,
          <article-title>Application and development trend of artificial intelligence in petroleum exploration and development</article-title>
          ,
          <source>Petroleum Exploration and Development</source>
          <volume>48</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>A novel Chinese domain ontology construction method for petroleum exploration information</article-title>
          ,
          <source>J. Comput.</source>
          <volume>7</volume>
          (
          <year>2012</year>
          )
          <fpage>1445</fpage>
          -
          <lpage>1452</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Cicconeto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V.</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>dos Santos Alvarenga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Carbonera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <article-title>GeoReservoir: An ontology for deep-marine depositional system geometry description</article-title>
          ,
          <source>Computers &amp; Geosciences</source>
          <volume>159</volume>
          (
          <year>2022</year>
          )
          <fpage>105005</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Santos</surname>
          </string-name>
          , et al.,
          <article-title>O3PO: A domain ontology for offshore petroleum production plants</article-title>
          ,
          <source>SSRN</source>
          <volume>4280151</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Giese</surname>
          </string-name>
          ,
          <article-title>Geofault: A well-founded fault ontology for interoperability in geological modeling</article-title>
          ,
          <source>arXiv preprint arXiv:2302.07059</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Garcia</surname>
          </string-name>
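          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Figueiredo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>de Moraes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Romeu</surname>
          </string-name>
          ,
          <article-title>How do specialists express risks: an applied ontology for the oil &amp; gas domain</article-title>
          ,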
          <source>in: ONTOBRAS</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>114</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. H.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <article-title>Towards an ontology of offshore petroleum production equipment</article-title>
          ,
          <source>CEUR-WS</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Arp</surname>
          </string-name>
          , et al.,
          <article-title>Building ontologies with basic formal ontology</article-title>
          , MIT Press,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          , et al.,
          <article-title>Relations in biomedical ontologies</article-title>
          ,
          <source>Genome Biology</source>
          <volume>6</volume>
          (
          <year>2005</year>
          )
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Smith</surname>
          </string-name>
          , et al.,
          <article-title>A first-order logic formalization of the industrial ontologies foundry signature using basic formal ontology</article-title>
          ,
          <source>in: JOWO</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Kukkonen</surname>
          </string-name>
          , et al.,
          <article-title>An ontology to support flow system descriptions from design to operation of buildings</article-title>
          ,
          <source>Automation in Construction</source>
          <volume>134</volume>
          (
          <year>2022</year>
          )
          <fpage>104067</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>