<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology-enhanced deep learning framework for anomaly detection in oil and gas production plants</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gustavo Alexsandro de Lima</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mara Abel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS), 91501-970, Porto Alegre, RS</institution>
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This proposal presents a framework that combines deep learning-based anomaly detection with ontology-driven knowledge representation to improve fault diagnosis in oil and gas production plants. The framework aims to leverage the strengths of both techniques to reduce false alarm rates and to provide operators with more comprehensive information for decision-making.</p>
      </abstract>
      <kwd-group>
        <kwd>anomaly detection</kwd>
        <kwd>ontology</kwd>
        <kwd>oil and gas</kwd>
        <kwd>time-series</kwd>
        <kwd>framework</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Technological advancements pave the way for increased productivity and security in industrial process
plants. Smart factories, brought about by Industry 4.0, are characterized by their extensive use of
cutting-edge technology, with automation, monitoring, and artificial intelligence playing a significant role in
operational efficiency [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>These technological advancements apply not only to traditional manufacturing industries but also to
various industrial processes, including the oil and gas sector, which is the focus of this proposal. An
important improvement resulting from these advancements is the installation of sensor devices for
constant information monitoring.</p>
      <p>
        Despite the benefits, the vast amount of data generated by these sensors can be challenging to
analyze, creating the need for automated processes that inspect this continuous stream of information
in search of anomalies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] that can indicate equipment failures, safety hazards, or inefficiencies in
production.
      </p>
      <p>The detection of these failures is of utmost importance to the sector. Plant shutdowns caused by
failures can bring significant economic losses for companies. Moreover, given the hazardous nature of
the industry, safety incidents can have catastrophic consequences, posing severe
risks to worker safety and environmental integrity.</p>
      <p>While traditional anomaly detection models can achieve good results in specific areas, they still fail to
capture the semantic characteristics of an oil and gas production plant, producing false alarms that
can make it harder for an operator to address potential issues.</p>
      <p>This work aims to create a framework that combines machine learning anomaly detection methods with an
ontology layer for the semantic analysis of anomalies in the oil and gas industry.</p>
      <p>The paper is structured as follows. First, the current state of the art is reviewed, focusing on
work on anomaly detection and ontologies. Then the research proposal is specified, together with its
expected contributions and potential challenges. Finally, the next steps of the research are
presented.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Anomaly detection is a critical field in data analysis, with applications in various industries and domains.
This section will explore state-of-the-art research in the area, focusing on machine-learning techniques
and the use of ontologies.</p>
      <sec id="sec-2-1">
        <title>2.1. Anomaly Detection</title>
        <p>
          Anomaly detection is used to expose data points that differ from typical values. It can be used in network
intrusion detection, fraud detection, fault diagnosis, and many other areas that require data mining
applications [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>
          Traditional anomaly detection relies on rules and statistical methods and often struggles to keep up
with large, dynamic, and heterogeneous data [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. To cope with that, advanced techniques leverage machine learning
algorithms for efficient and scalable detection.
        </p>
        <p>In fault diagnosis, especially in the oil and gas sector, anomaly detection is widely
used and studied for its role in maintaining operational efficiency and safety.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], the authors propose a methodology considering seven fault types in production wells and
lines, using a classifier based on random forest and tuning its hyperparameters with a Bayesian
non-convex optimizer. The authors achieved an accuracy above 94% on the data used.
        </p>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] propose a methodology that uses dynamic time warping and k-means clustering of
the time series data to improve the performance of one-class classifiers, reporting improved performance
metrics with the proposed method.
        </p>
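        <p>As a minimal sketch of the dynamic time warping distance that this kind of time series clustering relies on, the Python function below aligns two series using an absolute-difference cost. It is not the implementation from [6]; the function name and the cost choice are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses the classic O(len(a) * len(b)) dynamic program with an
    absolute-difference local cost (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```
]]></preformat>
        <p>Because the alignment may repeat points, two series with the same shape but different lengths can have a distance of zero, which is why the measure suits sensor signals that evolve at different speeds.</p>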
        <p>
          Turning to neural network approaches, the authors of [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] propose a methodology that uses a
generative adversarial network driven by a digital twin to detect anomalies in multivariate time series
data, increasing the detection of anomalous data by 2.6% compared to other
methods.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ontologies</title>
        <p>
          Computer science uses ontologies to model a system’s relevant entities and relations. They can be used
to infer new information from the explicitly represented knowledge [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>The oil and gas industry employs ontologies to provide domain-specific knowledge and to model
concepts such as, but not limited to, wells, reservoirs, and production facilities. This representation can
enable data integration and interoperability between different systems.</p>
        <p>In the context of anomaly detection in the oil and gas sector, a noticeable research gap presents an
opportunity for further exploration. Existing studies in this field have predominantly used ontologies
to enhance the visualization and integration of time series data.</p>
        <p>
          Other areas use the rules and logic framework of ontologies to represent and specify anomaly information
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], the authors introduce NORIA-O, an ontology representing network infrastructures,
incidents, and maintenance, which can be used to model complex situations in Information and Communications
Technology systems and can serve as the basis for anomaly detection.
        </p>
        <p>
          The authors of [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] represent expert knowledge of the maritime domain using an ontology expressed
in description logic and apply automated reasoning tools to anomaly detection. While the
proposed approach was validated, the authors believe that further research is needed to meet the high
processing demands of real maritime environments.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Anomaly Detection with Ontologies and Machine Learning</title>
        <p>Combining ontologies and machine learning algorithms for anomaly detection is also an area with
opportunities for further research, with very few works exploring the combination of both.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the authors explore this possibility by combining a long short-term memory network for the
mathematical search of anomalies with a fuzzy web ontology for a second stage that filters the results
for anomalies that only affect a specific subject area. Their experiments focused on facies logs of nine
drilling wells and achieved good efficiency.
        </p>
        <p>The authors of [13] propose a new methodology, FLAGS, that combines data-driven and knowledge-driven
techniques, using semantic filters to classify anomalies as known behavior, reducing the load on
operators who verify real alerts. This methodology was tested in the railway domain for
predictive maintenance.</p>
        <p>In [14], the authors introduce a method that uses a semantic approach to reduce the number of
features used in the anomaly detection process, then integrates the proposed model into IBM’s MAPE-K
loop, which combines inference rules and a Hierarchical Temporal Memory algorithm. The authors
applied their methodology to cellular vehicular communication systems and achieved encouraging
results.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Research Proposal</title>
      <p>The main goal of this proposal is to develop a modular framework that combines a deep learning
anomaly detection model layer with an ontology knowledge representation layer to enhance the
identification and interpretation of anomalies in oil and gas production plants. The framework aims to
leverage the strengths of both machine learning and semantic technologies to provide a more accurate
fault diagnosis system.</p>
      <p>The framework’s modularity comes from the first layer, which is designed to be agnostic as to which
deep learning model is implemented, making it possible to plug in different trained models as needed.</p>
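      <p>One possible way to picture this two-layer design is the Python sketch below. It is only an illustration under stated assumptions: the names Anomaly, Detector, ThresholdDetector, and Framework, the sensor tag "P-101", and the start-up rule are all hypothetical, and the threshold detector merely stands in for a trained deep learning model.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence


@dataclass
class Anomaly:
    sensor: str   # hypothetical sensor tag
    index: int    # position of the reading in the time series
    value: float


class Detector(Protocol):
    """Interface of the first layer: any model exposing detect() can be plugged in."""

    def detect(self, sensor: str, series: Sequence[float]) -> List[Anomaly]: ...


class ThresholdDetector:
    """Stand-in for a trained model: flags readings above a fixed limit."""

    def __init__(self, limit: float) -> None:
        self.limit = limit

    def detect(self, sensor: str, series: Sequence[float]) -> List[Anomaly]:
        return [Anomaly(sensor, i, v) for i, v in enumerate(series) if v > self.limit]


# A semantic rule returns True when domain knowledge explains the reading,
# in which case the candidate is not reported to the operator.
Rule = Callable[[Anomaly], bool]


class Framework:
    def __init__(self, detector: Detector, rules: List[Rule]) -> None:
        self.detector = detector
        self.rules = rules

    def run(self, sensor: str, series: Sequence[float]) -> List[Anomaly]:
        candidates = self.detector.detect(sensor, series)
        return [a for a in candidates if not any(rule(a) for rule in self.rules)]


# Hypothetical rule: the first two readings belong to a planned start-up.
startup_rule = lambda a: a.index < 2
framework = Framework(ThresholdDetector(limit=100.0), [startup_rule])
alarms = framework.run("P-101", [120.0, 90.0, 95.0, 130.0])
```
]]></preformat>
      <p>In this sketch, swapping the detector only requires another object with the same detect method, while the semantic layer stays unchanged.</p>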
      <p>One of the side goals of this research is the creation of the ontology itself, which will
represent a specific area of an oil and gas production plant, together with semantic
rules that bring domain knowledge into the framework. The creation of the ontology will follow
the NeOn methodology [15].</p>
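      <p>To give a rough idea of how represented knowledge and a semantic rule could work together, the Python sketch below stores a few facts as subject-predicate-object triples and infers which systems a sensor reading may affect. The triples, the predicate names, and the entity names are invented for illustration and do not come from a real plant model or from the ontology to be developed.</p>
      <preformat><![CDATA[
```python
# Hypothetical triples (subject, predicate, object); not a real plant model.
TRIPLES = {
    ("PT-101", "monitors", "Pump-A"),
    ("Pump-A", "part_of", "InjectionSystem"),
    ("InjectionSystem", "part_of", "ProductionPlant"),
}


def objects(subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def affected_systems(sensor):
    """Follow 'monitors' once, then 'part_of' transitively, nearest system first."""
    frontier = sorted(objects(sensor, "monitors"))
    seen = []
    while frontier:
        node = frontier.pop(0)
        if node not in seen:
            seen.append(node)
            frontier.extend(sorted(objects(node, "part_of")))
    return seen
```
]]></preformat>
      <p>With these example facts, an anomaly on the sensor is linked first to the equipment it monitors and then to each enclosing system, which is the kind of context an operator could use to verify affected systems.</p>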
      <p>The other side goal is the choice and training of machine learning models, focusing on novel and
recent advancements in the area, such as graph neural networks or the use of transformer-based
architectures.</p>
      <p>The proposed framework is shown in Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Expected Contributions</title>
        <p>The proposed framework can offer the following advantages over existing methods:</p>
        <list list-type="bullet">
          <list-item><p>Increased accuracy in anomaly detection by combining data-driven insights and domain expertise.</p></list-item>
          <list-item><p>Easier understanding of anomalous data by operators, who can quickly verify affected systems.</p></list-item>
          <list-item><p>Reduced false alarm rates.</p></list-item>
          <list-item><p>Deeper research on the combination of ontologies and machine learning applications.</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Potential Challenges of the Proposal</title>
        <p>Combining two different methods can increase the number of challenges the research faces.</p>
        <p>On the side of anomaly detection, the chosen technique must be adequate for the process
and the data provided. It is also worth noting the difficulty of gathering data for training, which might
require extra work on data collection and creation.</p>
        <p>Creating an ontology is also a complex task, requiring the assistance of domain experts and knowledge
of industry standards to ensure the ontology’s validity and completeness.</p>
        <p>After these challenges are overcome, the framework must still be validated in real-world
settings, checking whether the system is scalable enough to handle the large amount of data a real oil and gas
production plant can generate.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Research Benchmarks</title>
        <p>To evaluate the framework and compare it with other applications, the anomaly detection
benchmark introduced in [16] will be used. This benchmark involves performing multiple training
and testing rounds and computing precision, recall, and F1 scores for each round. The average F1 score
will be used to assess the framework’s validity.</p>
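        <p>The per-round metrics and their average can be sketched in Python as follows. This is a simplified illustration for binary labels (1 for anomalous, 0 for normal), not the evaluation code of [16]; the function names and the example rounds are assumptions.</p>
        <preformat><![CDATA[
```python
from statistics import mean


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = anomaly, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Metrics are defined as 0.0 when their denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_f1(rounds):
    """Mean F1 over rounds, each given as a (y_true, y_pred) pair."""
    return mean(precision_recall_f1(t, p)[2] for t, p in rounds)


# Two made-up rounds used only to exercise the functions.
example_rounds = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([1, 0, 1, 0], [1, 0, 1, 0]),
]
score = average_f1(example_rounds)
```
]]></preformat>
        <p>Averaging over several rounds reduces the influence of any single train and test split on the reported score.</p>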
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Research Methodology</title>
      <p>The methodology used to guide this research will be the Design Science Research Methodology, which
has roots in engineering and is fundamentally a problem-solving paradigm seeking innovative solutions
to real-world problems [17].</p>
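      <p>The steps of this methodology are as follows:</p>
      <list list-type="bullet">
        <list-item><p>Step 1: Problem identification and motivation.</p></list-item>
        <list-item><p>Step 2: Define the objectives for a solution.</p></list-item>
        <list-item><p>Step 3: Design and development.</p></list-item>
        <list-item><p>Step 4: Demonstration.</p></list-item>
        <list-item><p>Step 5: Evaluation.</p></list-item>
        <list-item><p>Step 6: Communication.</p></list-item>
      </list>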
      <sec id="sec-4-1">
        <title>4.1. Current Steps</title>
        <p>The research is still in its first step. The problem was identified through interviews with oil and gas
operators, which revealed the persistent anomaly detection problem in specific areas and the burden it
places on operators, who must inspect large amounts of time series data to verify the validity of
alarms in a production plant.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Next Steps</title>
        <p>More in-depth research will be done in the next steps to define more specific objectives. This includes
defining a particular area of interest in which to apply the framework, following a case study approach.</p>
        <p>With a specific area defined, the development of the ontology will be planned, which will
include interviews with domain experts in the area to gather expert knowledge.</p>
        <p>Data must also be gathered to train and evaluate the machine learning models used in
the modular layer of the framework, in order to check the models’ effectiveness and the accuracy gained by
including a semantic layer.</p>
        <p>After the ontology has been planned and the data for the machine learning models has been gathered,
the bulk of the research will start, which includes the design and then the implementation of the
framework, with benchmarks of the validity of the solution happening in parallel with the development.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This research proposal aims to develop a framework that combines the strengths of deep learning-based
anomaly detection models with ontology-driven knowledge to improve fault diagnosis in the oil and
gas industry, refining the accuracy of detection and interpretability of anomalies.</p>
      <p>The research will follow the Design Science Research Methodology to find a solution that reduces false
alarm rates and provides operators with more information to enable better decisions, addressing the
research gap in the area and contributing to the continued advancement of Industry 4.0 technologies in
the oil and gas sector.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The authors acknowledge CAPES-Brazil Finance Code 001, the Brazilian Agency CNPq, and the Petwin
Project, supported by FINEP and Libra Consortium (Petrobras, Shell Brasil, Total Energies, CNOOC,
CNPC).</p>
      <p>[12] V. Moshkin, D. Kurilo, N. Yarushkina, Integration of fuzzy ontologies and neural networks in the
detection of time series anomalies, Mathematics 11 (2023). URL: https://www.mdpi.com/2227-7390/11/5/1204. doi:10.3390/math11051204.</p>
      <p>[13] B. Steenwinckel, D. De Paepe, S. V. Hautte, P. Heyvaert, M. Bentefrit, P. Moens, A. Dimou, B. Van
Den Bossche, F. De Turck, S. Van Hoecke, et al., Flags: A methodology for adaptive anomaly
detection and root cause analysis on sensor data streams by fusing expert knowledge with machine
learning, Future Generation Computer Systems 116 (2021) 30–48.</p>
      <p>[14] Q. Ricard, P. Owezarski, Ontology based anomaly detection for cellular vehicular communications,
in: 10th European Congress on Embedded Real Time Software and Systems (ERTS 2020), 2020.</p>
      <p>[15] M. C. Suárez-Figueroa, Neon methodology for building ontology networks: Specification,
scheduling and reuse, 2010. URL: https://oa.upm.es/3879/. doi:10.20868/UPM.thesis.3879, Ontology
Engineering Group.</p>
      <p>[16] R. E. V. Vargas, C. J. Munaro, P. M. Ciarelli, A. G. Medeiros, B. G. do Amaral, D. C. Barrionuevo,
J. C. D. de Araújo, J. L. Ribeiro, L. P. Magalhães, A realistic and public dataset with rare undesirable
real events in oil wells, Journal of Petroleum Science and Engineering 181 (2019) 106223.</p>
      <p>[17] J. Vom Brocke, A. Hevner, A. Maedche, Introduction to design science research, Design science
research. Cases (2020) 1–13.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Soori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Arezoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dastres</surname>
          </string-name>
          ,
          <article-title>Internet of things for smart factories in industry 4.0, a review</article-title>
          ,
          <source>Internet of Things and Cyber-Physical Systems</source>
          <volume>3</volume>
          (
          <year>2023</year>
          )
          <fpage>192</fpage>
          -
          <lpage>204</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S2667345223000275. doi:10.1016/j.iotcps.2023.04.006.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Miodutzki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tacla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gomes-Jr</surname>
          </string-name>
          ,
          <article-title>Outlier detection with ontology-driven fault contextualization in the industry 4.0</article-title>
          , in: Anais do XXXVII Simpósio Brasileiro de Bancos de Dados, SBC, Porto Alegre, RS, Brasil,
          <year>2022</year>
          , pp.
          <fpage>267</fpage>
          -
          <lpage>278</lpage>
          . URL: https://sol.sbc.org.br/index.php/sbbd/article/view/21812. doi:10.5753/sbbd.2022.224309.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Samariya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <article-title>A comprehensive survey of anomaly detection algorithms</article-title>
          ,
          <source>Annals of Data Science</source>
          <volume>10</volume>
          (
          <year>2023</year>
          )
          <fpage>829</fpage>
          -
          <lpage>850</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Dhamodharan</surname>
          </string-name>
          ,
          <article-title>Beyond traditional methods: A novel approach to anomaly detection and classification using ai techniques</article-title>
          ,
          <source>Transactions on Latest Trends in Artificial Intelligence</source>
          <volume>3</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Marins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Barros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Barrionuevo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Vargas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. d. M.</given-names>
            <surname>Prego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>de Lima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>de Campos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>da Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Netto</surname>
          </string-name>
          ,
          <article-title>Fault detection and classification in oil wells and production/service lines using random forest</article-title>
          ,
          <source>Journal of Petroleum Science and Engineering</source>
          <volume>197</volume>
          (
          <year>2021</year>
          )
          <fpage>107879</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A. P. F.</given-names>
            <surname>Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Munaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Ciarelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E. V.</given-names>
            <surname>Vargas</surname>
          </string-name>
          ,
          <article-title>Time series clustering to improve one-class classifier performance</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>243</volume>
          (
          <year>2024</year>
          )
          <fpage>122895</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Geng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <article-title>Anomaly detection method for multivariate time series data of oil and gas stations based on digital twin and mtad-gan</article-title>
          ,
          <source>Applied Sciences</source>
          <volume>13</volume>
          (
          <year>2023</year>
          ). URL: https://www.mdpi.com/2076-3417/13/3/1891. doi:10.3390/app13031891.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oberle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          ,
          <source>What Is an Ontology?</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2009</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          . URL: https://doi.org/10.1007/978-3-540-92673-3_0. doi:10.1007/978-3-540-92673-3_0.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Baumeister</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Seipel</surname>
          </string-name>
          ,
          <article-title>Anomalies in ontologies with rules</article-title>
          ,
          <source>Journal of Web Semantics</source>
          <volume>8</volume>
          (
          <year>2010</year>
          )
          <fpage>55</fpage>
          -
          <lpage>68</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1570826809000778. doi:10.1016/j.websem.2009.12.003.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Tailhardat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chabot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <article-title>Noria-o: an ontology for anomaly detection and incident management in ict systems</article-title>
          , in:
          <source>European Semantic Web Conference</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Davenport</surname>
          </string-name>
          ,
          <article-title>Exploitation of maritime domain ontologies for anomaly detection and threat analysis</article-title>
          , in: 2010 International WaterSide Security Conference, IEEE,
          <year>2010</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Moshkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kurilo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yarushkina</surname>
          </string-name>
          ,
          <article-title>Integration of fuzzy ontologies and neural networks in the</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>