
ISSN 1613-0073

Approach: AI for Digitalised Carbon Storage Analysis

Yuanwei Qu

Arild Waaler

Anita Torabi

anita.torabi@geo.uio.no 1

Baifan Zhou

baifan.zhou@oslomet.no 0 2

0 Department of Computer Science, Oslo Metropolitan University, Norway
1 Department of Geosciences, University of Oslo, Norway
2 Department of Informatics, University of Oslo, Norway

2025


Carbon Capture and Storage (CCS) plays an essential role in mitigating climate change. A comprehensive CO2 storage analysis is key to CCS success; however, its advancement is hindered by interdisciplinary complexities, data fragmentation, and the need for greater transparency. This paper presents ongoing work on an integrated digitalisation framework for CO2 storage analysis that addresses three major challenges: knowledge complexity across geosciences, reservoir engineering, and computer science; the heterogeneous, multi-scale nature of geological data; and the requirement for explainable decision-making in high-risk subsurface storage scenarios. Our approach leverages both symbolic AI and generative AI. On the symbolic AI side, it uses ontology-driven knowledge engineering to harmonise domain-specific terminologies and a layered information modelling and data integration system to standardise diverse datasets. On the generative AI side, a query and explanation system provides context-aware, transparent analyses. This framework aims to streamline multidisciplinary collaboration and enhance the accountability and reliability of long-term CO2 storage, thereby supporting the transition to a carbon-neutral future.

Keywords: carbon capture and storage, knowledge engineering, information modelling, explainable AI

CEUR Workshop Proceedings (ceur-ws.org)

1. Introduction

Carbon Capture and Storage (CCS) captures CO2 from industrial sources or the atmosphere and securely stores it in geological formations. This technology mitigates greenhouse gas emissions and supports the transition to net-zero emissions [1]. To achieve its full potential in global CO2 mitigation and meet predicted storage capacities, CCS requires advanced, cost-effective analysis methods that ensure long-term storage safety and integrity [2, 3]. A key advancement in this area is digital transformation: transitioning from traditional workflows to data- and knowledge-driven methods that improve the efficiency and accuracy of storage analysis.

However, digitalising carbon storage analysis remains challenging despite its recognised importance in the energy sector [4]. This challenge arises from the need for CO2 storage analysts to integrate vast amounts of domain-specific knowledge and to analyse data in various formats, from structured datasets to images and diagrams. While advances in machine learning offer promise for specific tasks [5], their effectiveness is often constrained by the unstructured nature of CO2 storage data and by issues of accessibility, quality, and heterogeneity.

Compounding these technical difficulties, the interdisciplinary nature of CO2 storage analysis requires collaboration across domains such as geoscience, reservoir engineering, and computer science. Each domain has its own methodologies and terminologies, and these differing approaches and tools hinder effective collaboration and complicate data processing and interpretation. In this context, we characterise three major challenges for the design and operation of digitalised carbon storage analysis:

• C1, Knowledge complexity and diversity, reflects the need to reconcile disparate terminologies and conceptual frameworks across geoscience, reservoir engineering, and computer science. Experts may struggle to interpret concepts or data presented in another discipline's terminology.
• C2, Data fragmentation and heterogeneity, is a critical issue, as CO2 storage analysis relies on heterogeneous data types such as well logs, seismic surveys, core samples, and analogues from onshore fields. These data sources are highly fragmented across data silos, making manual interpretation and integration by human experts inefficient.
• C3, Accountability, is crucial because CO2 storage processes are high-stakes. All aspects of analysis and system operation, including data queries and knowledge retrieval, must be transparent and readily explainable to ensure trust, accountability, and reliability.

[Figure 1: Examples of the three challenges. Panel titles: Knowledge complexity and diversity; Data fragmentation and heterogeneity; Accountability. Other labels: Leakage or Stable?, Fractures, Seismic survey, Petrophysical plot, Microscopic image, Outcrop, Fieldwork data, Well log, Data Integrity, Compliance, Validation, Explainability, Traceability, Responsibility, Geologists, Computer Scientist, Reservoir Engineer.]

To address these challenges, we propose an integrated approach. For C1, knowledge engineering bridges semantic gaps across disciplines to create a unified conceptual framework. C2 is addressed through information modelling and data integration techniques that standardise and process diverse datasets, providing unified data access. Finally, C3 is tackled with an AI-enhanced system for data and knowledge query that provides transparent, context-aware answers and explanations for storage analyses and knowledge queries. Together, these strategies aim to enable effective interdisciplinary collaboration, supported by high-quality data and knowledge, positioning digitalised carbon storage analysis as a key element in decarbonisation efforts.

2. Challenges

Digitalising carbon storage analysis demands tight collaboration among geoscientists, reservoir engineers, and computer scientists. This section elaborates on the three major challenges, with examples illustrated in Figure 1.

2.1. Challenge C1: Knowledge Complexity and Diversity

CO2 storage analysis spans multiple disciplines, each with its own specialised knowledge and conceptual models. These frameworks are not only complex individually but also diverse in terminology and perspective. For instance, the term fracture can vary in meaning: reservoir engineers may describe it as lines or 2D planes in models, while geoscientists may define it as a 3D volume network with a core and a damage zone. These semantic differences affect the interpretation of caprock integrity analysis, potentially leading to inconsistent leakage predictions. Computer scientists, unfamiliar with these domain nuances, find such contextual variations difficult to understand. Furthermore, existing analysis methods are largely adapted from petroleum engineering, which prioritises extraction. In contrast, CO2 storage demands a focus on long-term containment, requiring a shift in both objectives and conceptual understanding.

2.2. Challenge C2: Data Fragmentation and Heterogeneity

CO2 storage analysis depends on diverse geological datasets that are often fragmented across isolated systems. Data may originate from well logs, which provide detailed measurements of subsurface properties; seismic surveys, which map geological structures; core samples, which offer physical insights into rock composition; and onshore outcrop data, which provide analogue scenarios that geologists reference to build geomechanical models predicting the behaviour of reservoir and caprock. These datasets differ not only in format and resolution but also in the type of information they convey.
For example, seismic data are often high-dimensional and require sophisticated signal processing, while well log data are typically depth-indexed series of measurements with varying scales. This fragmentation and heterogeneity hinder effective data access and analysis, complicating interpretation and limiting the efficiency of both human analysis and data science methods.

2.3. Challenge C3: Accountability

CO2 storage involves high-stakes infrastructure where decisions directly affect safety, environmental impact, and operational efficiency. As AI-supported methods become increasingly integral to storage analysis [8], ensuring transparency, explainability, and traceability in decision-making is essential. For instance, during the CO2 injection phase, when an anomaly such as a sudden pressure change in a storage reservoir is detected, geologists and reservoir engineers need to understand the reasoning behind any automated response or control adjustment. Accountability also demands that data queries and knowledge retrieval be traceable and interpretable, supporting regulatory compliance, verification of data integrity, and clear responsibility in operations.

Each of these challenges highlights the complexity inherent in creating a robust digitalised CO2 storage analysis method. In the following section, we describe our approach that targets these challenges through a combination of knowledge engineering, information modelling with data integration, and large language model-based query and explanation systems.

3. Approach

To overcome the challenges in digitalising carbon storage analysis, our approach is structured around three key strategies, each targeting a specific challenge (Figure 2).

3.1. Knowledge Engineering

To address knowledge complexity and diversity (C1), we establish a unified conceptual framework that aims to bridge the semantic gaps across geoscience, reservoir engineering, and computer science. Our approach employs knowledge engineering techniques to develop a unified ontology that harmonises domain-specific terminologies and concepts. For example, the ontology should define terms such as cap rock, sealing stable time, and permeability with machine-readable definitions agreed upon by domain experts. In the current knowledge model (Unified Conceptual Framework in Figure 2), we separate the concept of cap rock from rock as a material, and we distinguish conceptually between the sealing stable time, understood as a temporal region, and the predicted value of storage time.

To avoid reinventing the wheel, this framework builds on existing ontological resources: the top-level Basic Formal Ontology [9], the core GeoCore Ontology [10], and the domain-specific GeoFault Ontology [11]. By mapping and clarifying concepts from these resources into a unified conceptual model, the framework enables experts from different domains to share and interpret information accurately, supporting effective multidisciplinary collaboration.
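To make the separation described above concrete, the following minimal sketch (Python, standard library only) encodes a toy is-a hierarchy and a term map that sends discipline-specific labels to one shared class. Everything here is an assumption for illustration: the class names, the parent classes loosely echoing BFO categories such as material entity and temporal region, and the discipline labels are hypothetical and are not the authors' ontology, which would be expressed in a formal ontology language rather than in Python dictionaries.

```python
# Toy, hypothetical slice of a unified ontology: class -> parent class.
# Names and parents are illustrative assumptions, not the authors' model.
ONTOLOGY = {
    "Entity": None,
    "MaterialEntity": "Entity",
    "TemporalRegion": "Entity",
    "InformationEntity": "Entity",
    "Rock": "MaterialEntity",                     # rock considered as a material
    "CapRock": "MaterialEntity",                  # a sealing rock body; deliberately NOT a subclass of Rock
    "SealingStableTime": "TemporalRegion",        # the time span during which the seal holds
    "PredictedStorageTime": "InformationEntity",  # a model-derived estimate, not a time span
}

# Hypothetical discipline-specific labels mapped onto one shared class.
TERM_MAP = {
    ("geoscience", "cap rock"): "CapRock",
    ("reservoir engineering", "caprock"): "CapRock",
    ("reservoir engineering", "storage time"): "PredictedStorageTime",
}


def ancestors(cls: str) -> list[str]:
    """Walk the is-a chain from a class up to the root."""
    chain = []
    parent = ONTOLOGY[cls]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent]
    return chain


def resolve(discipline: str, term: str) -> tuple[str, list[str]]:
    """Translate a discipline's label into the shared class and its lineage."""
    cls = TERM_MAP[(discipline, term.strip().lower())]
    return cls, ancestors(cls)
```

For instance, resolve("reservoir engineering", "Caprock") returns ("CapRock", ["MaterialEntity", "Entity"]), the same class that a geoscientist's "cap rock" resolves to, while CapRock never inherits from Rock and SealingStableTime never inherits from InformationEntity.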

[Figure 2: Overview of the approach. Recoverable labels: Unified Conceptual Framework; Ontology layer; Information model layer; Data layer; Facility; Reservoir; Digital System; Text Database; Image Database.]

3.2. Information Modelling and Data Integration

The second challenge (C2) relates to fragmented, multi-scale data sources such as petrographic thin sections, well logs, core samples, seismic surveys, and onshore outcrops. Our approach aims at a unified data access framework that offers a consistent, structured way to access diverse data through a common semantic and modelling framework comprising three layers:
• The ontology layer, which relies on the unified conceptual framework to ensure clarity and consistent terminology.
• The information model layer, which mediates between the ontology layer and the data layer by constructing information models from the perspectives of geoscience, reservoir engineering, and computer science, and by implementing data integrity and constraint checks.
• The data layer, which is mapped to and accessed via the information model layer and comprises diverse data, such as text data (e.g., documents, reports), image data (e.g., tomography, seismic surveys), and tabular data (e.g., sensor data).
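The three-layer arrangement can be sketched as follows, assuming plain Python with only the standard library. The ontology terms, the raw field names (DEPT, PHIT), the porosity bound, and the helper names InformationModel, lift, and unified_access are hypothetical; the paper's information models are to be built following DNV-RP-0670 and the Information Modelling Framework language, not as Python classes. The sketch only illustrates the mediation idea: records from differently formatted sources are renamed onto shared terms and must pass integrity checks before a single query can serve them.

```python
from dataclasses import dataclass

# Ontology layer (hypothetical terms): the shared vocabulary every model must use.
ONTOLOGY_TERMS = {"WellLogSample", "CoreSample", "SeismicTrace"}


@dataclass
class InformationModel:
    """Information model layer: lifts raw data-layer fields onto an ontology term."""
    concept: str     # ontology term this model describes
    field_map: dict  # raw field name -> harmonised field name
    bounds: dict     # harmonised field name -> (min, max) constraint

    def __post_init__(self):
        # Refuse models whose concept is not part of the shared vocabulary.
        if self.concept not in ONTOLOGY_TERMS:
            raise ValueError(f"unknown ontology term: {self.concept}")

    def lift(self, raw: dict) -> tuple[dict, list[str]]:
        """Rename fields and run constraint checks; return (record, violations)."""
        record = {"concept": self.concept}
        violations = []
        for raw_name, name in self.field_map.items():
            if raw_name in raw:
                record[name] = raw[raw_name]
            else:
                violations.append(f"missing {raw_name}")
        for name, (lo, hi) in self.bounds.items():
            value = record.get(name)
            if isinstance(value, (int, float)) and not lo <= value <= hi:
                violations.append(f"{name}={value} outside [{lo}, {hi}]")
        return record, violations


def unified_access(models: dict, sources: dict, concept: str) -> list[dict]:
    """Query every source that describes `concept`; keep only records that pass checks."""
    results = []
    for source_name, rows in sources.items():
        model = models[source_name]
        if model.concept != concept:
            continue
        for raw in rows:
            record, violations = model.lift(raw)
            if not violations:
                results.append({**record, "source": source_name})
    return results


# Two hypothetical exports that name the same quantities differently (illustrative data).
models = {
    "log_export_a": InformationModel(
        "WellLogSample", {"DEPT": "depth_m", "PHIT": "porosity"}, {"porosity": (0.0, 1.0)}),
    "log_export_b": InformationModel(
        "WellLogSample", {"depth": "depth_m", "phi": "porosity"}, {"porosity": (0.0, 1.0)}),
}
sources = {
    "log_export_a": [{"DEPT": 1200.0, "PHIT": 0.21}, {"DEPT": 1201.0, "PHIT": 1.7}],
    "log_export_b": [{"depth": 1350.5, "phi": 0.18}],
}
```

Here unified_access(models, sources, "WellLogSample") returns the valid samples from both exports under the same harmonised field names, while the row with a porosity of 1.7 is rejected by the constraint check in the information model layer.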

The unified data access approach organises geological and operational data into a structured, queryable format that can be efficiently accessed and interpreted by both humans and automated systems. To align the results with industrial standards, the construction of the information models will follow DNV's recommended practice for an asset information modelling framework [12] and use the Information Modelling Framework language [13].

3.3. AI-enhanced System for Data and Knowledge Query

Given the risks associated with CO2 storage analysis and the subsequent operating processes, accountability, including transparency and explainability (C3), is critical. To meet this need, we aim to develop an AI-enhanced system for data and knowledge query that provides context-aware, human-understandable answers and explanations. The system builds on the unified data access framework and provides semantically enriched responses. It employs retrieval-augmented generation (RAG) techniques to draw on up-to-date domain knowledge and operational data, ensuring that each decision or output is accompanied by a clear and traceable explanation. For example, when an operator queries the caprock sealing capability, which is crucial for ensuring that the subsurface carbon storage system does not leak, the system will not only assess the sealing capability based on relevant retrieved sensor data, but also explain the knowledge and reasoning behind the assessment in plain language. This approach enhances operators' trust and supports informed decision-making in complex subsurface environments.
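The retrieval-augmented pattern with traceable output might be sketched as below. This is a hedged illustration, not the planned system: retrieval is a simple word-overlap ranking swapped in for the embedding-based search a RAG pipeline would typically use, the Passage records and source IDs are hypothetical, and generate stands for whatever language-model client the caller supplies. What the sketch shows is that the answer is returned together with the IDs of the evidence it was grounded on, so each explanation can be traced back to its sources.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Passage:
    source_id: str  # hypothetical identifier of a report, document, or sensor table
    text: str


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric word set used for overlap scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the query (a stand-in for vector search)."""
    q = tokens(query)
    scored = [(len(q & tokens(p.text)), p) for p in corpus]
    hits = [(s, p) for s, p in scored if s > 0]
    hits.sort(key=lambda sp: sp[0], reverse=True)  # stable: ties keep corpus order
    return [p for _, p in hits[:k]]


def build_prompt(query: str, passages: list[Passage]) -> str:
    """Ground the model in retrieved evidence and require bracketed citations."""
    context = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return (
        "Answer using only the sources below. Cite the source ID in brackets for "
        "every claim, and explain the reasoning in plain language.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


def answer(query: str, corpus: list[Passage], generate: Callable[[str], str]) -> dict:
    """Retrieve, generate, and return the answer together with its evidence trail."""
    passages = retrieve(query, corpus)
    prompt = build_prompt(query, passages)
    return {
        "answer": generate(prompt),  # `generate` is any LLM client the caller supplies
        "sources": [p.source_id for p in passages],
        "prompt": prompt,
    }
```

In the intended setting, the corpus would come from the unified data access framework rather than an in-memory list, and the returned source IDs would let an operator check which reports or sensor records supported a sealing-capability assessment.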

4. Summary

This paper presents our ongoing effort to develop a data- and knowledge-driven approach, enhanced by AI, for CO2 storage analysis. We identify three core challenges: the complexity and diversity of domain-specific knowledge, the heterogeneous and multi-scale nature of data from different sources, and the critical need for transparency and explainability that accountability in carbon storage analysis and operations demands. To address these challenges, our approach integrates (1) knowledge engineering to create a unified conceptual framework, (2) information modelling and data integration to standardise and fuse diverse datasets, enabling unified data access aligned with industry best practices, and (3) a generative AI-enhanced system for data and knowledge query to support accountable, transparent, and traceable decision-making. Together, these components form a foundation for a more efficient, reliable, collaborative, and digitalised CO2 storage analysis framework, contributing to a carbon-neutral future.

Declaration on Generative AI

The authors have used ChatGPT to assist with the polishing of human-authored text. The authors take full responsibility for the publication's content.

References

[1] IEA, Net zero by 2050, 2021. URL: https://www.iea.org/reports/net-zero-by-2050, accessed 10 April 2025.

[2] Kazlou, Cherp, Jewell, Feasible deployment of carbon capture and storage and the requirements of climate targets, Nature Climate Change 14 (2024) 1047-1055.

[3] Gholami, Raza, Iglauer, Leakage risk assessment of a CO2 storage site: A review, Earth-Science Reviews 223 (2021) 103849.

[4] IEA, Digitalisation and energy, 2017. URL: https://www.iea.org/reports/digitalisation-and-energy, accessed 9 April 2025.

[5] Hussin, S. A. N. M. Rahim, N. S. M. Hatta, M. K. Aroua, S. A. Mazari, A systematic review of machine learning approaches in carbon capture applications, Journal of CO2 Utilization 71 (2023).

[6] Fawad, M. J. Rahman, N. H. Mondol, Seismic reservoir characterization of potential CO2 storage reservoir sandstones in Smeaheia area, northern North Sea, Journal of Petroleum Science and Engineering 205 (2021).

[7] Torabi, Alaei, Smith, Fault characteristics in exhumed basement rocks; implications for understanding subsurface basement faults, Tectonophysics 887 (2024) 230445.

[8] Nassabeh, You, Keshavarz, Iglauer, Sub-surface geospatial intelligence in carbon capture, utilization and storage: a machine learning approach for offshore storage site selection, Energy 305 (2024).

[9] Arp, Smith, A. D. Spear, Building ontologies with basic formal ontology, MIT Press, 2015.

[10] L. F. Garcia, Abel, Perrin, R. dos Santos Alvarenga, The GeoCore ontology: a core ontology for general use in geology, Computers & Geosciences 135 (2020) 104387.

[11] Qu, Perrin, Torabi, Abel, Giese, GeoFault: A well-founded fault ontology for interoperability in geological modeling, Computers & Geosciences 182 (2024) 105478.

[12] DNV, DNV-RP-0670, Recommended practice for asset information modelling framework, 2024. URL: https://www.dnv.com/digital-trust/recommended-practices/asset-information-modelling-dnv-rp-0670/, accessed 10 April 2025.

[13] Waaler, M. Skjaeveland (editors), Information modelling framework manual version 0.3.0, 2024. URL: https://www.imfid.org/, accessed 10 April 2025.