<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Modern methods of visualization and preprocessing of seismic data for deep learning: a review of Python-based approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bekzat Bukhatov</string-name>
          <email>bukhatovbekzat@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aizhan Altaibek</string-name>
          <email>a.altaibek@iitu.edu.kz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marat Nurtas</string-name>
          <email>m.nurtas@iitu.edu.kz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Ionosphere</institution>
          ,
          <addr-line>Gardening community IONOSPHERE 117, Almaty, 050020</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>International Information Technology University</institution>
          ,
          <addr-line>34/1 Manas St., Almaty, 050000</addr-line>
          ,
          <country country="KZ">Kazakhstan</country>
        </aff>
      </contrib-group>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In seismic data interpretation, neural networks have significantly advanced tasks such as fault detection and subsurface analysis. However, the quality of input data remains a critical factor in the performance of these models. Seismic data is often noisy, incomplete, or inconsistent, so robust preprocessing is needed before neural network or machine learning algorithms can interpret it effectively. This paper presents an overview of modern preprocessing and visualization techniques tailored for seismic data, with a focus on Python-based implementations. We explore methods such as stratal slicing [1], attribute co-rendering [2], and data interpolation [3], which are crucial for improving both 2D and 3D seismic datasets before they are fed into neural networks. Our focus is not to directly prove an improvement in model performance but to examine how these techniques enhance the overall quality and interpretability of the seismic data. This review aims to provide geophysicists and data scientists with the tools needed to improve data quality and optimize neural network input. It does not cover the training of neural networks; it focuses on how to better prepare the input data for such tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Seismic data</kwd>
        <kwd>Preprocessing</kwd>
        <kwd>Neural networks</kwd>
        <kwd>Deep learning</kwd>
        <kwd>Fault detection</kwd>
        <kwd>Seismic visualization</kwd>
        <kwd>Python</kwd>
        <kwd>Noise reduction</kwd>
        <kwd>Stratal slicing</kwd>
        <kwd>Attribute co-rendering</kwd>
        <kwd>Data interpolation</kwd>
        <kwd>2D seismic data</kwd>
        <kwd>3D seismic data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Seismic data analysis is a fundamental aspect of industries like oil and gas exploration, earthquake
hazard mitigation, and geotechnical engineering. One of the critical tasks in seismic interpretation is
the detection of faults and other geological structures, which offer valuable insights into subsurface
characteristics. Traditionally, these tasks were performed using manual methods or computationally
expensive models [4, 5]. While effective, these traditional approaches struggle with the increasing
complexity and volume of seismic data [6-8], particularly as modern surveys generate larger datasets
with more intricate structures.</p>
      <p>In recent years, neural networks have emerged as powerful tools for automating seismic
interpretation. Unlike traditional methods, which often require domain expertise and manual feature
extraction, neural networks—especially convolutional neural networks (CNNs)—can automatically
learn complex patterns from raw seismic data. This makes them highly effective in dealing with large
2D and 3D datasets, where traditional methods might falter due to the complexity and sheer scale of
the data. While neural networks offer immense potential, their effectiveness is contingent upon the
quality of the input data [9-11].</p>
      <p>Traditional methods for seismic interpretation, such as manual fault picking or applying
predefined algorithms, often face limitations in handling large-scale datasets and complex geological
structures. For example, methods like those described by Wu et al. [10] and Moreno et al. [9] rely on
manual interaction or heuristic algorithms to unfault 3D seismic images and assist interpretation.
These methods, while useful in specific cases, can struggle with maintaining consistency across
larger datasets or detecting subtle features.</p>
      <p>In contrast, neural networks, as demonstrated in works like Wu and Hale's automatic fault
detection [11], offer a data-driven approach that learns from raw seismic images without manual
intervention. By capturing complex spatial relationships and features across 3D datasets, CNNs have
shown superior performance in accurately identifying faults, horizons, and other geological
structures. This not only reduces the time required for interpretation but also improves accuracy,
particularly in regions where traditional methods may miss or misinterpret critical features.</p>
      <p>Seismic data, by its nature, is prone to noise and inconsistencies due to variations in acquisition
methods and environmental factors. This noise can obscure key geological features, such as faults
and horizons, leading to inaccurate results if not properly addressed. Therefore, preprocessing
seismic data is essential to ensure that only relevant and clean data is fed into neural network models.
Preprocessing includes techniques like noise reduction, data smoothing, and feature enhancement,
which can greatly improve the interpretability of the data.</p>
      <p>The importance of data quality cannot be overstated, particularly in fields like geophysics, where
high accuracy is paramount. Poor-quality data can lead to suboptimal model performance, even with
the most advanced neural network architectures. This makes robust preprocessing not just a
recommendation but a necessity for successful seismic data analysis.</p>
      <p>In this paper, we review modern preprocessing and visualization techniques specifically designed
to handle the challenges of seismic data. Focusing on Python-based tools, we will explore methods
like stratal slicing, co-rendering of seismic attributes, and data interpolation. These techniques not
only prepare the data for neural network models but also enhance human interpretation, offering a
more intuitive understanding of complex subsurface structures. By improving data quality and
visualization, we aim to facilitate better decision-making processes in seismic data analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem statement</title>
      <p>Seismic data analysis, particularly in the context of fault detection and subsurface feature
identification, is challenged by the inherent complexity and variability of the datasets. Both 2D and
3D seismic data can suffer from significant noise, inconsistencies, and artifacts, making it difficult to
obtain accurate geological interpretations. While neural networks offer powerful solutions for
seismic interpretation, their effectiveness depends heavily on the quality of the input data.</p>
      <p>A key issue is the lack of standardized tools and methods that can consistently improve data
quality before it is fed into neural network models. Seismic data often requires multiple layers of
preprocessing, including noise reduction, attribute co-rendering, and interpolation, to ensure that
geological features are clearly visible and interpretable. The challenge lies not only in removing noise
but also in preserving the integrity of key features such as faults and horizons, which are critical for
accurate interpretations.</p>
      <p>Another critical problem is the limited ability of existing tools to visualize and explore data before
neural network training. Effective visualization methods are essential for geophysicists and data
scientists to assess the quality of preprocessing, ensuring that the data is suitable for both human
interpretation and neural network tasks. This paper seeks to address these gaps by exploring modern
preprocessing techniques and their role in improving the quality and interpretability of seismic
datasets.</p>
      <p>The main research question is: How can modern preprocessing and visualization techniques
enhance the clarity and structure of seismic data, making it more suitable for subsequent analysis and
interpretation? By evaluating methods such as noise reduction, stratal slicing, and attribute
co-rendering, we aim to provide a clearer understanding of how these techniques can improve both data
quality and the overall seismic analysis workflow. The goal is not to prove improvements in neural
network model performance but rather to explore how better data preparation can lead to more
robust datasets that are ready for neural network applications or manual interpretation.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Data description</title>
      </sec>
      <sec id="sec-3-2">
        <title>3.1.1. Seismic data overview</title>
        <p>In this study, we work with both 2D and 3D seismic datasets, which are commonly used for exploring
subsurface geological structures, particularly in fault detection and subsurface mapping [12]. Seismic
data is collected through surveys that involve generating and recording energy waves reflected from
subsurface layers, creating an image of the Earth’s subsurface. These images are then used to identify
key features like faults, horizons, and fractures.</p>
        <p>2D seismic data provides cross-sectional views of the subsurface and is often used in early
exploration phases. It offers a single vertical slice of data, which is easier to analyze but may
miss important details in complex structures.</p>
        <p>3D seismic data, on the other hand, provides a volumetric representation of the subsurface. It
offers a much more detailed and accurate depiction of geological structures, enabling
geophysicists to visualize and interpret complex formations. 3D data is essential for detailed
fault analysis and exploration of more intricate subsurface features, but it comes with
challenges, including increased noise levels and larger data volumes that require more
extensive processing.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.1.2. Seismic data formats</title>
        <p>Different datasets are provided in specific formats that cater to large volumes of data and ensure the
integrity of the recorded information.</p>
        <p>SEG-Y Format [13]: One of the most common formats for seismic data is SEG-Y (Society of
Exploration Geophysicists Y). This format is widely used in the industry and serves as a
standard for storing seismic data collected during surveys. SEG-Y files contain a mixture of
binary and ASCII data, including the recorded seismic signals (often referred to as traces) and
additional metadata, such as location coordinates, acquisition parameters, and recording
times.</p>
        <p>.vol Format: The .vol format is another format commonly used for storing 3D seismic
volumes. Unlike SEG-Y, which stores individual traces, .vol files represent entire 3D seismic
volumes, allowing for faster retrieval and manipulation of data when working with large,
complex datasets.</p>
        <p>.npy Format [14]: The .npy format is a data format used for storing large, multi-dimensional
arrays, which is supported by the Python library NumPy. This format is especially useful in
deep learning workflows where seismic data needs to be efficiently loaded into memory for
model training and analysis.</p>
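        <p>As a minimal sketch of why the .npy format suits this kind of workflow, the example below writes a small volume and reads back single sections through a memory map. The array shape, the axis order (inline, crossline, time sample), and the file name are hypothetical and chosen only for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import os
import tempfile

import numpy as np

# Hypothetical toy volume with axes (inline, crossline, time sample).
rng = np.random.default_rng(0)
volume = rng.normal(size=(8, 6, 50)).astype(np.float32)

# Write the volume once in .npy format.
path = os.path.join(tempfile.mkdtemp(), "toy_volume.npy")
np.save(path, volume)

# mmap_mode="r" memory-maps the file, so only the requested slices are read.
lazy = np.load(path, mmap_mode="r")
inline_section = np.array(lazy[3])       # shape (crossline, sample)
time_slice = np.array(lazy[:, :, 25])    # shape (inline, crossline)
```
]]></preformat>
        <p>Memory mapping keeps the full volume on disk, which can matter when a 3D survey is larger than the available RAM.</p>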
      </sec>
      <sec id="sec-3-4">
        <title>3.1.3. Open seismic data sources</title>
        <p>For this study, several open-source seismic datasets were used, allowing for comprehensive analysis
and comparison of different preprocessing techniques. Among these are:</p>
        <p>Opunake-3D Dataset [15]: This dataset includes detailed 3D seismic
images stored in the .vol format. This data is ideal for testing 3D seismic analysis and fault
detection methods, particularly in regions with complex subsurface structures.</p>
        <p>FORCE 2020 Machine Learning Competition Dataset [16]: Hosted on the Harvard Dataverse,
this dataset includes both 2D and 3D seismic images in the .npy format, specifically designed
for machine learning tasks. It provides a diverse set of seismic features, making it a valuable
resource for deep learning experiments in fault detection.</p>
        <p>Netherlands F3 Dataset [17]: The F3 dataset is widely used in seismic research and is available
through platforms like TerraNubis. It includes 3D seismic data in SEG-Y format, offering a
rich dataset for structural and stratigraphic interpretation.</p>
        <p>These datasets were selected for their diversity in format and content, allowing for the
testing of various preprocessing techniques.</p>
      </sec>
      <sec id="sec-3-5">
        <title>3.2. Data preprocessing</title>
        <p>In seismic data processing, proper preprocessing is crucial for improving data quality, enhancing key
features, and preparing the data for further interpretation or machine learning and deep learning
tasks. Seismic datasets, particularly those with complex subsurface structures, often contain noise
and require interpolation or specialized visualization techniques. The following preprocessing
methods were applied to ensure the data is both clean and informative, allowing for clearer
visualization and more accurate interpretation.</p>
        <p>One of the central techniques used was stratal slicing, which proved invaluable in examining
subsurface structures across multiple layers. Traditional vertical cross-sections often obscure
important lateral variations, making it difficult to track features such as faults. Stratal slicing
addresses this by cutting through seismic volumes horizontally along stratigraphic layers, enabling a
more intuitive exploration of how geological formations extend across different parts of the
subsurface. This method provided a clear advantage when working with 3D volumes, allowing us to
isolate and study specific depositional patterns that would otherwise remain hidden.</p>
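        <p>One common way to build such slices is proportional slicing between two interpreted horizons. The sketch below assumes a volume indexed as (inline, crossline, sample) and two horizons given in sample units; the function name and the linear interpolation between neighboring samples are illustrative choices, not a description of any particular tool.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def stratal_slice(volume, top, base, fraction):
    """Sample `volume` on a surface lying `fraction` of the way from `top` to `base`.

    volume: array of shape (nx, ny, nz)
    top, base: horizons of shape (nx, ny), in (possibly fractional) sample units
    fraction: 0.0 gives the top horizon, 1.0 gives the base horizon
    """
    nx, ny, nz = volume.shape
    z = np.clip(top + fraction * (base - top), 0, nz - 1)
    z0 = np.floor(z).astype(int)          # sample just above the surface
    z1 = np.minimum(z0 + 1, nz - 1)       # sample just below the surface
    w = z - z0                            # linear weight of the lower sample
    ii, jj = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    return (1 - w) * volume[ii, jj, z0] + w * volume[ii, jj, z1]


# Toy volume whose amplitude equals the sample index, so results are easy to check.
volume = np.zeros((4, 5, 30)) + np.arange(30.0)
top = np.full((4, 5), 10.0)
base = np.full((4, 5), 20.0)
quarter = stratal_slice(volume, top, base, 0.25)
```
]]></preformat>
        <p>Sweeping the fraction from 0 to 1 yields a stack of slices that follow the stratigraphy rather than constant time or depth.</p>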
        <p>In conjunction with stratal slicing, volume flattening [18] was applied to simplify the visualization
of complex geological formations. By flattening the volume along a key horizon, this approach removed
distortions caused by folding or faulting, offering a clearer view of continuous layers. With the data
"flattened," even subtle stratigraphic details became more apparent, aiding in both manual
interpretation and deep learning-based feature extraction.</p>
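        <p>The idea behind flattening can be sketched as a per-trace vertical shift that moves the chosen horizon to a constant sample index. The example assumes an integer horizon in sample units and fills the vacated samples with NaN; both are simplifications made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def flatten_on_horizon(volume, horizon):
    """Shift every trace down so that `horizon` ends up at one common sample index.

    volume: array of shape (nx, ny, nz)
    horizon: array of shape (nx, ny) with sample indices inside [0, nz - 1]
    Returns the flattened volume and the common index of the horizon.
    """
    nx, ny, nz = volume.shape
    horizon = np.rint(horizon).astype(int)
    ref = int(horizon.max())                    # deepest pick becomes the datum
    flat = np.full((nx, ny, nz), np.nan)
    for i in range(nx):
        for j in range(ny):
            shift = ref - horizon[i, j]         # always >= 0
            flat[i, j, shift:] = volume[i, j, :nz - shift]
    return flat, ref


# Toy volume: a single reflector (value 1) on a dipping horizon.
hz = np.array([[5, 6, 7, 8], [6, 7, 8, 9], [7, 8, 9, 10]])
vol = np.zeros((3, 4, 20))
for i in range(3):
    for j in range(4):
        vol[i, j, hz[i, j]] = 1.0

flat, ref = flatten_on_horizon(vol, hz)
```
]]></preformat>
        <p>After flattening, the dipping reflector lies on a single horizontal slice, which is the property that makes subtle stratigraphic detail easier to follow.</p>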
        <p>A key addition to our preprocessing toolkit was Crude Spectral Decomposition [19], a method that
allows the breakdown of seismic data into its component frequencies. This technique helps in
isolating specific frequency ranges that highlight different geological features, making it easier to
identify stratigraphic traps and thin beds. Spectral decomposition enriches the dataset by providing
more detailed insights into subsurface structures, particularly when combined with other seismic
attributes like amplitude.</p>
        <p>To handle incomplete or irregular data, we employed various interpolation techniques. While
nearest-neighbor interpolation [20] preserved sharp boundaries in regions with missing data, more
refined methods such as bilinear interpolation and cubic interpolation were used to create smoother
transitions in areas where the data was less complex. Bilinear interpolation proved useful for filling
gaps in 2D data, whereas cubic interpolation was more effective for creating smooth, continuous
surfaces in 3D seismic volumes.</p>
        <p>Filtering and denoising were applied to mitigate the noise inherent in seismic data, ensuring that
critical features like faults and horizons remained clear. This process involved several techniques:</p>
        <p>Contrast enhancement with CLAHE [21] (Contrast Limited Adaptive Histogram
Equalization), which improved contrast in regions with poor signal-to-noise ratios, making
features easier to detect.</p>
        <p>Gaussian smoothing [22], which was used to reduce high-frequency noise while maintaining
the overall structural integrity of the data.</p>
        <p>Outlier removal [23], which eliminated anomalous data points that could distort the seismic image.</p>
        <p>Spatial transformations, which corrected geometrical distortions introduced during data
acquisition.</p>
        <p>These filtering and noise-reduction techniques worked together to enhance the quality of seismic
data, ensuring that deep learning models and human interpreters had access to clean, high-fidelity
data.</p>
        <p>Finally, co-rendering of multiple seismic attributes was applied to enhance the interpretability of
the data. This technique involved overlaying various attributes such as amplitude and spectral
decomposition to create a composite view that emphasized subtle geological features. By combining
these attributes, we were able to detect features like fault zones and stratigraphic traps with greater
clarity, ensuring that no critical details were overlooked during the interpretation process.</p>
        <p>Together, these preprocessing techniques form a comprehensive approach to seismic data
preparation, ensuring that the datasets are not only clean and continuous but also rich in geological
detail. These steps are critical for both manual interpretation and for providing high-quality input to
deep learning models.</p>
      </sec>
      <sec id="sec-3-6">
        <title>3.3. Tools and libraries</title>
        <p>Handling and visualizing seismic data at this scale requires a combination of specialized tools and
libraries, each playing a distinct role in the preprocessing and analysis pipeline. The integration of
industry-standard tools with Python-based solutions allowed for a flexible yet robust workflow,
particularly suited for the challenges of seismic data.</p>
        <p>Geoprobe, for example, served as a cornerstone for interpreting seismic volumes. This tool
allowed for efficient horizon and fault picking, automating many of the tasks that would otherwise
require manual intervention. The ability to quickly extract key structural information from the
seismic volumes significantly accelerated the interpretation process, laying the groundwork for more
in-depth analysis.</p>
        <p>For 3D visualization, Mayavi [24] played an essential role. With its ability to render large 3D
volumes interactively, Mayavi allowed us to explore seismic data in real time, adjusting parameters
on the fly to better understand complex subsurface features. Its strength lies in the detailed, dynamic
visualizations it produces, offering geophysicists the ability to inspect subsurface structures layer by
layer and identify anomalies or points of interest with precision.</p>
        <p>Complementing Mayavi was PyVista [25], a more Pythonic interface built on top of the VTK
framework. Where Mayavi excelled in real-time visualization, PyVista provided a smoother
experience for creating high-quality, static visualizations, perfect for documenting the effects of
different preprocessing techniques. PyVista’s integration with Python allowed seamless transitions
between data manipulation and visualization, making it an indispensable part of our workflow.</p>
        <p>When it came to handling the data itself, particularly in SEG-Y format, ObsPy [26] provided the
necessary tools for reading, writing, and manipulating seismic traces. This library was pivotal in
extracting the necessary metadata from seismic files, preparing the data for further processing steps
like volume flattening or stratal slicing. Its broad compatibility with seismic data formats made it a
go-to tool for data preprocessing.</p>
        <p>By using this suite of tools and libraries, we were able to streamline the preprocessing pipeline,
enabling efficient handling of large seismic datasets. The integration of these tools into a cohesive
Python-based workflow ensured flexibility and scalability, essential for both seismic interpretation
and deep learning applications.</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.4. Research</title>
      </sec>
      <sec id="sec-3-8">
        <title>3.4.1. Metrics for evaluation</title>
        <p>In this study, several key metrics are used to assess the quality and effectiveness of our preprocessing
techniques. These metrics provide a quantitative evaluation of how well each method contributes to
improving the seismic data before it is fed into neural networks. The following metrics were chosen
for their ability to measure both the information content and the structural integrity of the data.</p>
        <p>Correlation: This metric measures the relationship between the original seismic data and the
processed data. A lower correlation after processing can indicate that new, meaningful information
has been introduced, particularly after methods like spectral decomposition. This suggests that the
processed data contains additional, independent features not present in the original dataset.</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y},</tex-math>
          </disp-formula>
        </p>
        <p>where cov(X, Y) is the covariance between variables X and Y, and σ<sub>X</sub> and σ<sub>Y</sub> are the standard
deviations of X and Y, respectively.</p>
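        <p>The correlation in equation (1) can be computed directly from two arrays of equal shape. The sketch below is a plain NumPy transcription of the formula; the function name is hypothetical, and flattening both arrays before comparison is an assumption about how 2D or 3D data would be compared.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def pearson_correlation(x, y):
    """Pearson correlation between two arrays of equal shape (flattened first)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    yc = y - y.mean()
    # cov(X, Y) / (sigma_X * sigma_Y); the 1/N factors cancel.
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


rng = np.random.default_rng(0)
original = rng.normal(size=(16, 16))
scaled = 2.0 * original + 1.0     # linear rescaling keeps the correlation at 1
flipped = -original               # sign flip gives a correlation of -1
```
]]></preformat>
        <p>A value near 1 or -1 means the processed data is close to a linear rescaling of the original, while a value near 0 means the two share little linear structure.</p>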
        <p>Information Entropy: Entropy is a measure of uncertainty or information content in the data.
Higher entropy values indicate that the processed data contains more distinguishable information,
which can be useful for further interpretation or deep learning tasks. An increase in entropy after
preprocessing suggests that new features or patterns have been extracted from the data.</p>
        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math>H(X) = -\sum_{i=1}^{n} P(x_i) \log_b P(x_i),</tex-math>
          </disp-formula>
        </p>
        <p>where P(x<sub>i</sub>) is the probability of occurrence of event x<sub>i</sub>, and log<sub>b</sub> is the logarithm to the base b
(commonly base 2).</p>
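        <p>For continuous amplitudes, the probabilities in equation (2) have to be estimated, for example from a histogram. The sketch below assumes a fixed number of equal-width bins; the bin count is a free parameter introduced here for illustration, and it caps the attainable entropy at the logarithm of the number of bins.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def shannon_entropy(data, bins=256, base=2.0):
    """Histogram-based Shannon entropy of an array, in units set by `base`."""
    counts, _ = np.histogram(np.asarray(data, dtype=float).ravel(), bins=bins)
    p = counts[counts > 0] / counts.sum()     # empty bins contribute nothing
    return float(-(p * np.log(p)).sum() / np.log(base))


# Four equally frequent values carry exactly 2 bits per sample.
four_levels = np.repeat([0.0, 1.0, 2.0, 3.0], 10)
bits = shannon_entropy(four_levels, bins=4)
```
]]></preformat>
        <p>Because the estimate depends on the binning, entropy values are only comparable between datasets evaluated with the same bin settings.</p>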
        <p>PSNR (Peak Signal-to-Noise Ratio): PSNR quantifies the similarity between the original and
processed data by measuring the ratio between the maximum possible signal and the noise
introduced during processing. Higher PSNR values indicate that the processed data closely resembles
the original, meaning that while noise has been reduced, the integrity of the original signal has been
preserved.</p>
        <p>
          <disp-formula>
            <label>(3)</label>
            <tex-math>\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right),</tex-math>
          </disp-formula>
        </p>
        <p>where MAX<sub>I</sub> is the maximum possible pixel value of the image, and MSE is the mean squared error
between the original and processed image.</p>
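        <p>Equation (3) translates into a few lines of NumPy. Seismic amplitudes have no fixed maximum pixel value, so the sketch below falls back to the value range of the reference data when no maximum is supplied; that fallback is an assumption of this example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def psnr(reference, processed, max_value=None):
    """Peak signal-to-noise ratio in dB between a reference and a processed array."""
    ref = np.asarray(reference, dtype=float)
    proc = np.asarray(processed, dtype=float)
    mse = np.mean((ref - proc) ** 2)
    if mse == 0:
        return float("inf")                   # identical arrays
    if max_value is None:
        max_value = ref.max() - ref.min()     # assumed stand-in for MAX_I
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Worked example: an 8-bit style image that is off by one everywhere (MSE = 1).
image = np.zeros((4, 4))
value = psnr(image, image + 1.0, max_value=255.0)
```
]]></preformat>
        <p>With a mean squared error of 1 and a maximum of 255, the result equals 20 log<sub>10</sub>(255), roughly 48.13 dB.</p>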
        <p>SSIM (Structural Similarity Index): SSIM evaluates the structural similarity between the original
and processed data. It compares features such as luminance, contrast, and structure to determine how
much the key features of the data have been preserved. A high SSIM score suggests that important
geological features remain intact after preprocessing, even after noise reduction or smoothing
techniques have been applied.</p>
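        <p>
          <disp-formula>
            <tex-math>\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},</tex-math>
          </disp-formula>
        </p>
        <p>where μ<sub>x</sub> and μ<sub>y</sub> are the mean values of x and y, σ<sub>x</sub><sup>2</sup> and σ<sub>y</sub><sup>2</sup> are the variances of x
and y, σ<sub>xy</sub> is the covariance between x and y, and C<sub>1</sub> and C<sub>2</sub> are constants that stabilize the division.</p>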
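        <p>The SSIM expression above can be evaluated once over a whole image, as sketched below. Library implementations typically apply the same formula in local windows and average the result, so this single-window version is a simplification. The constants follow the widely used choice C<sub>1</sub> = (0.01 L)<sup>2</sup> and C<sub>2</sub> = (0.03 L)<sup>2</sup>, with L the data range; the function name is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM computed from whole-image statistics (one window covering the image)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(0)
clean = rng.normal(size=(16, 16))
noisy = clean + 0.5 * rng.normal(size=(16, 16))
span = clean.max() - clean.min()
same = global_ssim(clean, clean, span)
degraded = global_ssim(clean, noisy, span)
```
]]></preformat>
        <p>Identical inputs give a score of 1, and any difference in mean, variance, or structure pulls the score below 1.</p>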
      </sec>
      <sec id="sec-3-9">
        <title>3.4.2. Spectral decomposition</title>
        <p>Spectral decomposition is a powerful technique used to break down seismic signals into their
frequency components. This method allows for a more detailed analysis of subsurface features,
revealing subtle geological structures that may not be visible in the original amplitude data. By
decomposing the signal into its constituent frequencies, we can isolate specific frequency bands that
highlight different geological characteristics, such as thin beds or stratigraphic traps.</p>
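        <p>To illustrate the idea of isolating frequency bands, the sketch below splits a trace into three bands with a simple FFT mask and stacks the band magnitudes as R, G, and B channels. This is a stand-in chosen for clarity and is not the Gaussian-filter procedure used in our experiments; the band limits, sampling interval, and function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def bandpass(traces, dt, f_lo, f_hi):
    """Keep only frequencies in [f_lo, f_hi) along the last (time) axis."""
    n = traces.shape[-1]
    freqs = np.fft.rfftfreq(n, d=dt)
    spectrum = np.fft.rfft(traces, axis=-1)
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return np.fft.irfft(spectrum * mask, n=n, axis=-1)


def rgb_blend(traces, dt, bands):
    """Stack the magnitude of three band-limited copies as R, G, B channels."""
    channels = [np.abs(bandpass(traces, dt, lo, hi)) for lo, hi in bands]
    rgb = np.stack(channels, axis=-1)
    peak = rgb.max()
    return rgb / peak if peak > 0 else rgb


# Synthetic trace: 1 s at 4 ms sampling with 10 Hz and 40 Hz components.
dt = 0.004
t = np.arange(250) * dt
low = np.sin(2 * np.pi * 10 * t)
high = 0.5 * np.sin(2 * np.pi * 40 * t)
trace = low + high

low_band = bandpass(trace, dt, 5.0, 20.0)
rgb = rgb_blend(trace[np.newaxis, :], dt, [(5, 20), (20, 35), (35, 60)])
```
]]></preformat>
        <p>In this toy case the first band recovers the 10 Hz component and the middle band stays empty, which shows how each channel responds to a different part of the spectrum.</p>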
        <p>Amplitude and Coherence Extraction: We begin by extracting amplitude and coherence attributes
from the seismic data. These attributes form the basis for spectral decomposition, as they provide
insight into the seismic signal’s strength and continuity across different layers.</p>
        <p>Combining Spectral Components: After extracting the spectral components, we combine them
with the amplitude and coherence data to form a more comprehensive dataset. This allows us to
create new features that enhance the interpretability of the data, providing additional insights into
subsurface structures.</p>
        <p>Correlation and Entropy Evaluation: Once the spectral decomposition is complete, we evaluate
the processed data using correlation and information entropy metrics. A lower correlation between
the original and decomposed data indicates that new features have been introduced. An increase in
entropy suggests that the decomposed data contains more distinguishable information, enhancing its
value for subsequent analysis or deep learning models.</p>
      </sec>
      <sec id="sec-3-10">
        <title>3.4.3. Interpolation</title>
        <p>Interpolation is critical for filling in gaps in seismic data, ensuring continuity and completeness
across the dataset. Seismic surveys often produce incomplete data due to technical limitations or
environmental factors, and interpolation helps to mitigate these issues by reconstructing missing
values.</p>
        <p>Bilinear Interpolation: This method is used for simple regions of the seismic dataset, where a
smooth transition between data points is sufficient. Bilinear interpolation calculates the value of a
missing data point as the weighted average of its neighboring points, resulting in a seamless
integration of the interpolated data into the original dataset.</p>
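        <p>The weighted average behind bilinear interpolation can be written out explicitly. The sketch below samples a regular 2D grid at fractional row and column positions; the function name and the clamping of positions to the grid edges are assumptions made for this example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def bilinear_sample(grid, rows, cols):
    """Bilinearly interpolate `grid` at fractional (row, col) positions."""
    grid = np.asarray(grid, dtype=float)
    n_rows, n_cols = grid.shape
    rows = np.clip(np.asarray(rows, dtype=float), 0, n_rows - 1)
    cols = np.clip(np.asarray(cols, dtype=float), 0, n_cols - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, n_rows - 1)
    c1 = np.minimum(c0 + 1, n_cols - 1)
    wr = rows - r0                      # weight of the lower row
    wc = cols - c0                      # weight of the right column
    upper = (1 - wc) * grid[r0, c0] + wc * grid[r0, c1]
    lower = (1 - wc) * grid[r1, c0] + wc * grid[r1, c1]
    return (1 - wr) * upper + wr * lower


# The center of a 2 x 2 cell is the mean of its four corners.
cell = np.array([[0.0, 1.0], [2.0, 3.0]])
center = bilinear_sample(cell, 0.5, 0.5)

# A plane f(r, c) = 2r + 3c is reproduced exactly.
r, c = np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij")
plane_value = bilinear_sample(2 * r + 3 * c, 1.25, 2.75)
```
]]></preformat>
        <p>Bilinear interpolation is exact for planar data, which is why it works well in smooth regions and less well across sharp structural boundaries.</p>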
        <p>Cubic Interpolation: For more complex regions, cubic interpolation is applied. This method
provides smoother transitions between data points, making it ideal for areas with intricate geological
structures. Cubic interpolation ensures that the reconstructed data points are more accurate,
preserving the continuity of the geological features.</p>
        <p>PSNR and SSIM Evaluation: After interpolation, we evaluate the data using PSNR and SSIM. High
PSNR values indicate that the interpolated data closely matches the original dataset, while high SSIM
values suggest that the structural integrity of the geological features has been preserved. These
metrics ensure that the interpolation methods have effectively reconstructed the missing data
without introducing significant distortions.</p>
      </sec>
      <sec id="sec-3-11">
        <title>3.4.4. Filtering and noise reduction</title>
        <p>To further enhance the quality of the seismic data, a combination of filtering and noise reduction
techniques is applied. These methods are essential for removing high-frequency noise, outliers, and
other artifacts that can obscure critical geological features.</p>
        <p>Gaussian Smoothing: This technique is used to reduce high-frequency noise while maintaining
the overall structure of the seismic reflections. Gaussian smoothing applies a weighted average to the
data, effectively blurring out noise while preserving the key features of the subsurface.</p>
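        <p>The weighted average used in Gaussian smoothing can be sketched as a separable convolution with a normalized Gaussian kernel. The version below uses NumPy only; the kernel truncation at four standard deviations and the reflective padding at the edges are assumptions of this example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def gaussian_kernel(sigma, truncate=4.0):
    """Normalized 1D Gaussian kernel, cut off at `truncate` standard deviations."""
    radius = int(truncate * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_smooth(image, sigma):
    """Smooth a 2D array by convolving its rows and columns with a Gaussian."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="reflect")

    def smooth_1d(values):
        return np.convolve(values, kernel, mode="valid")

    along_columns = np.apply_along_axis(smooth_1d, 0, padded)
    return np.apply_along_axis(smooth_1d, 1, along_columns)


rng = np.random.default_rng(0)
noisy = rng.normal(size=(32, 32))
smoothed = gaussian_smooth(noisy, sigma=1.5)
```
]]></preformat>
        <p>A larger sigma removes more noise but also blurs sharp discontinuities such as faults, so the value is a trade-off rather than a fixed setting.</p>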
        <p>CLAHE (Contrast Limited Adaptive Histogram Equalization): CLAHE is applied to enhance the
contrast of the seismic data, particularly in regions with low signal-to-noise ratios. By improving
contrast, CLAHE makes subtle features more visible, aiding in the interpretation of geological
structures.</p>
        <p>Outlier Removal: Anomalous data points, which could distort the interpretation of the seismic
data, are removed to ensure that the dataset remains clean and interpretable. Outlier removal is
particularly important in regions where data acquisition may have been less accurate.</p>
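        <p>One simple way to handle anomalous amplitudes is a robust threshold based on the median and the median absolute deviation (MAD). The sketch below clips values beyond the threshold instead of deleting them, so the array keeps its shape; the clipping choice, the threshold of five scaled MADs, and the function name are assumptions made for illustration.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def clip_outliers(data, n_mads=5.0):
    """Clip values lying more than `n_mads` scaled MADs away from the median."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    scale = 1.4826 * mad          # makes the MAD comparable to a standard deviation
    if scale == 0:
        return data.copy()        # no spread to measure against
    return np.clip(data, median - n_mads * scale, median + n_mads * scale)


rng = np.random.default_rng(0)
amplitudes = rng.normal(size=1000)
amplitudes[0] = 1000.0            # a single spike, e.g. from a bad recording
cleaned = clip_outliers(amplitudes)
```
]]></preformat>
        <p>Median-based statistics are used because a few extreme spikes barely affect them, unlike the mean and standard deviation.</p>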
        <p>Spatial Transformations: Geometrical distortions in the data, introduced during acquisition, are
corrected through spatial transformations. These transformations align the data with the expected
subsurface geometry, ensuring that the features are accurately represented.</p>
        <p>PSNR and SSIM Evaluation: Following the application of these noise reduction and filtering
techniques, the data is evaluated using PSNR and SSIM to ensure that the integrity of the original
seismic signal has been preserved. High PSNR and SSIM scores confirm that the noise has been
effectively reduced without compromising the structural features of the data.</p>
        <p>This approach ensures that each preprocessing step contributes to the overall goal of improving
data quality and interpretability. By combining spectral decomposition, interpolation, and noise
reduction techniques, we can significantly enhance the seismic dataset, making it more suitable for
deep learning models and geological interpretation.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussion</title>
      <p>In our study, all experiments were conducted using real seismic data, ensuring the practical
validation of the proposed methods. Specifically, we utilized both 2D and 3D seismic datasets. The
Opunake-3D Dataset [15], stored in the .vol format, provided high-resolution 3D seismic images ideal
for testing fault detection and subsurface analysis techniques, particularly in regions with complex
geological structures. Additionally, the dataset includes detailed subsurface information, allowing for
thorough evaluation of our preprocessing and interpretation methods on real-world data.</p>
      <sec id="sec-4-1">
        <title>4.1. Spectral decomposition</title>
        <p>In the spectral decomposition, we applied Gaussian filters with different levels (np.linspace(1, 9, 4)
and np.linspace(1, 15, 6)). The correlation between color channels (R, G, B) and the average
correlation were moderate, while entropy slightly increased with finer decomposition.</p>
        <p>Correlation: The moderate correlation values (avg_corr = 0.2990 and avg_corr = 0.2943) indicate
that spectral decomposition captures useful but noisy features.</p>
        <p>Entropy: A slight increase in entropy (from 11.01 to 11.35) reflects added complexity in the data,
but this complexity may also introduce noise.</p>
        <p>Impact on Neural Networks: Moderate correlation and increased entropy suggest that while
spectral decomposition introduces more detail, it may also make it harder for a neural network to
distinguish meaningful patterns. Neural networks may benefit from this decomposition if the right
balance between signal and noise is maintained, but too much complexity could overwhelm the
model, making it harder to learn key features.</p>
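        <p>The two metrics discussed above can be sketched as follows. The average correlation is taken
here as the mean Pearson coefficient over the three channel pairs, and entropy as the Shannon entropy
of a histogram. The helper names, the bin count of 4096, and the random test image are assumptions;
note that a histogram entropy cannot exceed log2 of the number of bins, so the reported values depend
on the binning that is used.</p>
        <preformat><![CDATA[
```python
import numpy as np


def average_channel_correlation(rgb):
    """Mean Pearson correlation over the R-G, R-B and G-B channel pairs."""
    flat = rgb.reshape(-1, 3).astype(float)
    corr = np.corrcoef(flat, rowvar=False)  # 3x3 correlation matrix
    return float((corr[0, 1] + corr[0, 2] + corr[1, 2]) / 3.0)


def shannon_entropy(values, bins=4096):
    """Shannon entropy in bits of the histogram of `values` (bin count assumed)."""
    hist, _ = np.histogram(np.ravel(values), bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(1)
rgb = rng.random((128, 128, 3))  # synthetic RGB blend with independent channels
print(average_channel_correlation(rgb), shannon_entropy(rgb))
```
]]></preformat>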
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Co-rendered Seismic Amplitude and Coherence</title>
        <p>The combination of seismic amplitude and coherence showed a very low correlation (corr =
-0.0041), indicating a minimal linear relationship between the two attributes. However, their entropy
values were close, suggesting that they contain comparable levels of complexity.</p>
        <p>Impact on Neural Networks: The low correlation suggests that seismic amplitude and coherence
provide complementary information, which could enhance the model's ability to learn diverse
features. However, the lack of linear correlation may require the neural network to learn more
complex, non-linear patterns between the datasets. The close entropy values imply that both datasets
contribute a similar amount of information, which could help the model generalize better.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Interpolated seismic amplitude and coherence (bilinear and cubic)</title>
      </sec>
      <sec id="sec-4-4">
        <title>4.3.1. Bilinear interpolation</title>
        <p>These values indicate that bilinear interpolation introduces considerable noise and poorly
preserves the structure of the data.</p>
        <p>Effect on Neural Networks: A low PSNR and SSIM suggest that bilinear interpolation could distort
critical patterns, making it difficult for a neural network to extract meaningful features. The model
may struggle to learn from this data, as the interpolation distorts the spatial relationships that are
important for effective learning.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.3.2. Cubic interpolation</title>
        <p>Cubic interpolation marginally improves structural similarity over bilinear interpolation, but the
overall noise remains high.</p>
        <p>Effect on Neural Networks: Cubic interpolation is slightly better at preserving structure, but the
network may still face challenges learning from this data due to the introduced noise. Although it
retains more information than bilinear interpolation, the distortion could negatively impact model
performance, especially for tasks requiring fine detail recognition.</p>
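        <p>One way to compare the two interpolation schemes is to decimate a section, reconstruct it with
each scheme, and score the result against the original. The sketch below does this with
scipy.ndimage.map_coordinates, using order=1 for bilinear and order=3 for cubic spline interpolation
(which may differ from a bicubic convolution kernel). The decimation factor, the synthetic section,
the PSNR peak definition and the helper names are assumptions, and the printed values are not the
results of our experiments.</p>
        <preformat><![CDATA[
```python
import numpy as np
from scipy.ndimage import map_coordinates


def psnr(reference, test):
    """Peak signal-to-noise ratio in dB, with the reference data range as peak."""
    reference = np.asarray(reference, dtype=float)
    test = np.asarray(test, dtype=float)
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    peak = reference.max() - reference.min()
    return float(10.0 * np.log10(peak ** 2 / mse))


def upsample(low, factor, order):
    """Resample `low` on a grid `factor` times denser (order 1: bilinear, 3: cubic)."""
    rows = np.arange(low.shape[0] * factor) / factor
    cols = np.arange(low.shape[1] * factor) / factor
    grid = np.meshgrid(rows, cols, indexing="ij")
    # Coordinates beyond the last sample reuse the edge value (mode="nearest").
    return map_coordinates(low, grid, order=order, mode="nearest")


# Smooth synthetic section: keep every second sample, then reconstruct both ways.
y, x = np.mgrid[0:64, 0:64]
section = np.sin(x / 6.0) * np.cos(y / 9.0)
low = section[::2, ::2]
for name, order in [("bilinear", 1), ("cubic", 3)]:
    restored = upsample(low, 2, order)
    print(name, round(psnr(section, restored), 2))
```
]]></preformat>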
      </sec>
      <sec id="sec-4-6">
        <title>4.4. Denoising and Filtering</title>
        <p>Among the different methods, CLAHE is the most balanced for enhancing image quality, with a
PSNR of 19.32 and an SSIM of 0.8832 at a clip limit of 2.0 and a tile grid of (8, 8). It enhances
contrast while preserving structure, which makes features more distinguishable for neural network
training without introducing visible artifacts. Outlier removal produces the highest PSNR (44.80) and
SSIM (0.9998); these scores indicate that it removes extreme values while leaving the rest of the
signal nearly unchanged, so the neural network learns from cleaner data. For denoising, a small
median filter (size 3) performs well, removing noise while retaining important details (PSNR = 19.13,
SSIM = 0.7096). Larger filters, however, lead to over-smoothing, reducing the network’s ability to
capture key features. Gaussian smoothing with a small sigma also reduces noise without losing too
much detail, but higher sigma values risk blurring important patterns.</p>
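        <p>The outlier removal and median filtering steps can be sketched with NumPy alone, as shown
below. Percentile clipping at the 1st and 99th percentiles, the impulsive noise model, the PSNR peak
definition and the helper names are assumptions for illustration; CLAHE and SSIM are not covered by
this sketch, and the printed values are not the results reported above.</p>
        <preformat><![CDATA[
```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def clip_outliers(data, lower=1.0, upper=99.0):
    """Clip samples outside the given percentiles (thresholds are assumed)."""
    lo, hi = np.percentile(data, [lower, upper])
    return np.clip(data, lo, hi)


def median_filter(data, size=3):
    """2D median filter with edge padding; `size` must be odd."""
    pad = size // 2
    padded = np.pad(data, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))


def psnr(reference, test):
    """Peak signal-to-noise ratio in dB, with the reference data range as peak."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")
    peak = np.max(reference) - np.min(reference)
    return float(10.0 * np.log10(peak ** 2 / mse))


rng = np.random.default_rng(3)
y, x = np.mgrid[0:64, 0:64]
clean = np.sin(x / 5.0 + y / 11.0)               # synthetic noise-free section
noisy = clean.copy()
spikes = rng.random(clean.shape) < 0.02          # about 2% impulsive samples
noisy[spikes] += rng.choice([-8.0, 8.0], size=spikes.sum())

clipped = clip_outliers(noisy)
filtered = median_filter(clipped, size=3)
for name, result in [("noisy", noisy), ("clipped", clipped), ("median", filtered)]:
    print(name, round(psnr(clean, result), 2))
```
]]></preformat>
        <p>Increasing size in median_filter makes the smoothing stronger, which is the over-smoothing
trade-off described above.</p>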
        <p>CLAHE proved especially effective in improving contrast and detail visibility, allowing for better
feature recognition by the model. However, excessive filtering or contrast enhancement could lead to
over-smoothing or artifacts, reducing model performance.</p>
        <p>In summary, CLAHE offers the best balance for feature enhancement, while outlier removal is
ideal for improving signal clarity, making both methods excellent for neural network preprocessing.</p>
      </sec>
      <sec id="sec-4-7">
        <title>4.5. Summary of results and limitations</title>
        <p>The preprocessing techniques we applied, such as spectral decomposition, co-rendering, and
denoising, significantly improved the quality of the input data, making it more suitable for neural
network training. For example, spectral decomposition results showed moderate correlation values
(average 0.2990 and 0.2943), indicating useful but noisy features, while entropy increased slightly,
from 11.01 to 11.35, reflecting added complexity. Co-rendering of seismic amplitude and coherence
also revealed complementary information, with a very low correlation (-0.0041) but similar entropy
levels (around 11), which could enhance the neural network's ability to learn diverse features.</p>
        <p>One of the main benefits of these methods is the enhancement of important geological features,
making it easier for neural networks to identify patterns and detect anomalies. Techniques like
spectral decomposition and co-rendering allowed us to extract complementary information from the
data, improving the richness of the input. Denoising and filtering steps, such as median filtering with
a PSNR of 19.13 and SSIM of 0.7096, further reduced noise, ensuring that the network learns from
cleaner, more consistent data.</p>
        <p>However, these improvements come with some challenges. The increased complexity of the data,
especially after adding new features through spectral decomposition, means that training neural
networks will likely take longer. More computational resources are needed, and the network might
require a more sophisticated architecture to handle the complexity. There is also the risk that with
too much added detail, the model could struggle to generalize and may overfit to the noise or
irrelevant features.</p>
        <p>In summary, while our preprocessing methods greatly enhance data quality and can improve
neural network performance, they also introduce challenges like longer training times and increased
complexity, which require careful consideration when applying these techniques in practice.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study has demonstrated the effectiveness of various preprocessing techniques applied to
seismic data in improving data quality and preparing it for neural network-based analysis. By
utilizing methods such as spectral decomposition, interpolation, denoising, and filtering, we were
able to enhance the clarity and structure of the data while preserving critical geological features.</p>
      <p>Spectral decomposition added new features by breaking down the seismic signal into its
frequency components, although it also introduced some noise. Balancing the added detail against
this noise is crucial: the decomposition provides more information, but it requires careful
management to avoid overwhelming neural networks.</p>
      <p>Interpolation methods, including bilinear and cubic interpolation, provided continuity in areas
with missing data, with cubic interpolation slightly better at preserving structural integrity.
Nonetheless, noise introduced during interpolation remains a challenge that could impact neural
network performance, particularly in tasks that depend on fine detail.</p>
      <p>Denoising and filtering techniques such as CLAHE and median filtering played a critical role in
improving the quality of the data by reducing noise and enhancing contrast. These methods
preserved key features necessary for both human interpretation and deep learning tasks, ensuring
that seismic data retains its structural integrity throughout the preprocessing pipeline.</p>
      <p>Finally, spatial transformations and outlier removal helped further refine the data, removing
artifacts and extreme values that could have negatively impacted the learning process. These
techniques not only improved data quality but also ensured that neural networks receive clean,
well-structured input.</p>
      <p>In conclusion, the preprocessing techniques discussed in this paper are integral to enhancing
seismic data, making it more interpretable and suitable for deep learning applications. These methods
offer a robust foundation for improving the quality of seismic datasets and facilitating more accurate
subsurface geological interpretations.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work was supported by grant funding from the Science Committee of the Ministry of
Science and Higher Education of the Republic of Kazakhstan (grant number AP23489938).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[6] Nurtas, M., &amp; Baishemirov, Z. (2020). Numerical simulation of the acoustic wave equation by the
method of reverse time migration. International Journal of Advanced Trends in Computer
Science and Engineering, 9(5), 8223-8227. https://doi.org/10.30534/ijatcse/2020/189952020.</p>
      <p>[7] Nurtas, M., Ydyrys, A., &amp; Altaibek, A. (2020). Using of Machine Learning algorithm and Spectral
method for simulation of Nonlinear Wave Equation. ICEMIS'20: Proceedings of the 6th
International Conference on Engineering &amp; MIS 2020, 43, 1–6.
https://doi.org/10.1145/3410352.3410778.</p>
      <p>[8] Nurtas, M., Baishemirov, Zh., Ydyrys, A., &amp; Altaibek, A. (2020). 2-D Finite Element method using
"eScript" for acoustic wave propagation. ICEMIS'20: Proceedings of the 6th International
Conference on Engineering &amp; MIS 2020, 40, 1–7. https://doi.org/10.1145/3410352.3410774.</p>
      <p>[9] Moreno, M., Santos, R., Mozart, R., Santos, W., &amp; Cerqueira, R. (2018). Assisting seismic image
interpretations with Hyperknowledge. In 2018 First International Conference on Artificial
Intelligence for Industries (AI4I) (pp. 48-51). IEEE. https://doi.org/10.1109/AI4I.2018.8665691.</p>
      <p>[10] Wu, X., Luo, S., &amp; Hale, D. (2016). Moving faults while unfaulting 3D seismic images. Geophysics,
81(2), MA1-MA17. https://doi.org/10.1190/geo2015-0381.1.</p>
      <p>[11] Wu, X., &amp; Hale, D. (2016). Automatically interpreting all faults, unconformities, and horizons
from 3D seismic images. Interpretation, 4(2), T260-T277. https://doi.org/10.1190/INT-2015-0160.1.</p>
      <p>[12] Yilmaz, Ö. (2001). Seismic data analysis: Processing, inversion, and interpretation of seismic data
(Vol. 1). Society of Exploration Geophysicists. https://doi.org/10.1190/1.9781560801580.</p>
      <p>[13] Aziz, I. A., Mazelan, N. A., Samiha, N., &amp; Mehat, M. (2008). 3-D seismic visualization using SEG-Y
data format. In 2008 International Symposium on Information Technology (pp. 1-7). IEEE.
https://doi.org/10.1109/ITSIM.2008.4631705.</p>
      <p>[14] NumPy Developers. (n.d.). numpy.lib.format — NumPy v1.22.dev0 Manual. NumPy. Retrieved
September 8, 2024, from
https://numpy.org/devdocs/reference/generated/numpy.lib.format.html.</p>
      <p>[15] Society of Exploration Geophysicists. (n.d.). Opunake-3D Dataset. SEG Wiki. Retrieved from
https://wiki.seg.org/wiki/Opunake-3D.</p>
      <p>[16] FORCE. (2020). FORCE 2020 Machine Learning Competition Dataset. Harvard Dataverse.
https://doi.org/10.7910/DVN/2020.</p>
      <p>[17] dGB Earth Sciences. (n.d.). Netherlands Offshore F3 Block. Open Seismic Repository. Retrieved
from https://opendtect.org/osr/.</p>
      <p>[18] Wu, X., Li, Y., &amp; Sawasdee, P. (2022). Toward accurate seismic flattening: Methods and
applications. Geophysics, 87(5), 1SO-V558. https://doi.org/10.1190/geo2021-0662.1.</p>
      <p>[19] Mahadik, R., Singh, G., &amp; Routray, A. (2022). Multispectral coherence analysis for better fault
visualization in seismic data. IEEE Geoscience and Remote Sensing Letters, 19, 1-5, Art no. 5000905.
https://doi.org/10.1109/LGRS.2021.3076213.</p>
      <p>[20] Shengda, C. (2012). The comparison and analysis of several magnification of image
magnification. Jinhua Polytechnic Journal, 12(3), 66-70.</p>
      <p>[21] Shome, S., &amp; Mandal, B. (2019). Multidimensional Contrast Limited Adaptive Histogram
Equalization. arXiv. https://arxiv.org/abs/1906.11355.</p>
      <p>[22] Zhao, T., &amp; Huang, J. (2009). Structure-oriented Gaussian filter for seismic detail preserving
smoothing. Proceedings of the IEEE International Conference on Image Processing (ICIP), 601-604.
https://doi.org/10.1109/ICIP.2009.5654.</p>
      <p>[23] Zhao, T., Zhang, G., &amp; Chen, Y. (2018). An efficient outlier removal method for scattered point
cloud data. PLOS ONE, 13(7), e0201280. https://doi.org/10.1371/journal.pone.0201280.</p>
      <p>[24] Ramachandran, P., &amp; Varoquaux, G. (2011). Mayavi: 3D Visualization of Scientific Data.
Computing in Science &amp; Engineering, 13(2), 40-51. https://doi.org/10.1109/MCSE.2011.35.</p>
      <p>[25] Sullivan, C., &amp; Kaszynski, A. (2019). PyVista: 3D plotting and mesh analysis through a
streamlined interface for the Visualization Toolkit (VTK). Journal of Open Source Software, 4(37),
1450. https://doi.org/10.21105/joss.01450.</p>
      <p>[26] Benesty, J., Chen, J., Huang, Y., &amp; Cohen, I. (2009). Pearson correlation coefficient. In Noise
reduction in speech processing (pp. 1-4). Springer. https://doi.org/10.1007/978-3-642-00296-0_5.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Stratal slicing: Benefits and challenges</article-title>
          .
          <source>The Leading Edge</source>
          ,
          <volume>29</volume>
          (
          <issue>9</issue>
          ),
          <fpage>1040</fpage>
          -
          <lpage>1047</lpage>
          . https://doi.org/10.1190/1.3485764.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Chopra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Marfurt</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Volume co-rendering of seismic attributes - A great aid to seismic interpretation</article-title>
          .
          <source>SEG Technical Program Expanded Abstracts</source>
          <year>2011</year>
          ,
          <volume>30</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1150</fpage>
          -
          <lpage>1154</lpage>
          . https://doi.org/10.1190/1.3627406.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y. A.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>An improved digital image interpolation algorithm</article-title>
          .
          <source>2010 Second International Conference on Multimedia and Information Technology</source>
          ,
          <fpage>183</fpage>
          -
          <lpage>186</lpage>
          . https://doi.org/10.1109/MMIT.2010.141.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Meirmanov</surname>
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nurtas</surname>
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ). 
          <article-title>Mathematical models of seismic in composite media: elastic and poroelastic components</article-title>
          .
          <source>Electronic Journal of Differential Equations</source>
          .
          <volume>184</volume>
          ,
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          . EID: 2-s2.0-84978512564, ISSN: 1072-6691.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Meirmanov</surname>
            <given-names>A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mukhambethzanov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , 
          <string-name>
            <surname>Nurtas</surname>
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Seismic in composite media: elastic and poroelastic components</article-title>
          .
          <source>Siberian Electronic Mathematical Reports</source>
          .
          <volume>13</volume>
          . 
          <fpage>75</fpage>
          -
          <lpage>88</lpage>
          . https://doi.org/10.17377/semi.2016.13.006.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>