<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generative Adversarial Network-Based Approach to Scientific Time Series Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Akanksha Vijayvergiya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alsayed Algergawy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Data and Knowledge Engineering, University of Passau</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Acquiring time-series data in the life sciences, such as soil, hydrological, and climatic measurements, presents significant challenges because of the susceptibility of real sensors to damage and malfunction. These issues often lead to incomplete, inconsistent, or missing data, which traditional machine learning and deep learning models struggle to handle effectively. To address this, the paper introduces a robust approach leveraging generative adversarial networks (GANs) to improve data reliability. GANs are used to generate synthetic data that fill in gaps and correct inconsistencies, resulting in a more complete and accurate dataset. The method involves training the GAN on the existing dataset to learn its fundamental patterns and subsequently producing new data that align with these patterns. The effectiveness of the proposed pipeline is validated through an extensive set of experiments across various life-science datasets. The results demonstrate significant improvements in error metrics, including reduced mean absolute error (MAE) and root mean square error (RMSE), alongside increased R² scores. These findings highlight the enhanced accuracy and reliability of the pipeline compared to conventional approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>Time series</kwd>
        <kwd>GAN</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Data Modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Scientific data is crucial in advancing our understanding of
natural phenomena and driving innovations across various
fields. It encompasses a range of data types, from categorical
data requiring straightforward statistical methods to
complex time-series data necessitating sophisticated analytical
approaches and advanced artificial intelligence (AI)
techniques. A significant subset of scientific data is life science
data, which often involves time series measurements such
as soil moisture levels and temperature fluctuations. These
measurements are highly sensitive to climatic variations and
external factors. Accurate monitoring, analysis, and
prediction of these parameters are essential for environmental
preservation, agricultural management, and climate change
mitigation. However, collecting and analyzing time series
data in life sciences presents challenges due to sensor issues
such as noise, errors, and sensor drift, which complicate
data collection. Enhancing data quality involves addressing
these issues to improve reliability [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In addition, deploying
physical sensors can be cost-prohibitive, logistically
challenging, and often produce limited data volumes. Uneven
sensor distribution and environmental variability further
exacerbate these challenges, leading to incomplete datasets
that affect the performance of traditional machine learning
and deep learning models.
      </p>
      <p>
        To overcome these data collection and analysis challenges,
advanced techniques such as Generative Adversarial
Networks (GANs) can be utilized. GANs create synthetic
datasets that simulate real-world conditions, reducing the need
for extensive physical data collection. Data augmentation
using GANs improves the spatial and temporal resolution of
environmental research data, providing a more
comprehensive view of the monitored environment and enhancing the
performance of ML models with more diverse training
samples [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Moreover, these techniques can help in anomaly
detection and enhance the robustness of predictive models
by addressing data imbalance [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>To sum up, the following are the main contributions of
the paper:
1. Introducing an effective pipeline for the analysis and
processing of sparse temporal life science data.
2. Investigating the performance of traditional
machine learning and deep learning models in
understanding climate-soil interactions, and
3. Applying GANs in life science data analysis.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>The field of soil-climate interactions and life science data
analysis encompasses many complex concepts and
methodologies. To establish a robust foundation for this work,
this section presents a thorough examination of the basic
principles, ideas, and approaches that are pertinent to the
argument.</p>
      <sec id="sec-2-1">
        <title>2.1. Time Series</title>
        <p>
          Time series analysis, as described in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], is a statistical
technique used to examine data points that are organized
sequentially. The primary objective of time series analysis
is to understand the underlying structure and process that
produced the data. This approach is widely used for many
reasons, such as economic forecasting, stock market
analysis, weather prediction, and many more [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The selected
dataset for our investigation comprises soil-climate data,
which exemplifies the scientific data that may greatly
benefit from time series analysis. Long-term collection of
soil-climate data offers essential knowledge on the relationships
between soil characteristics and climatic elements, including
temperature, precipitation, and humidity.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Traditional ML &amp; DL Methods for Time Series</title>
        <p>
          Imputation techniques are crucial for estimating and
substituting missing values to facilitate comprehensive data
analysis. There are several ways to deal with missing values,
such as cubic and linear imputation, Gaussian imputation,
and K-Nearest Neighbors (KNN) imputation. Cubic and
linear imputation methods introduced by [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] are widely
used for their simplicity and effectiveness in time series and
continuous data. Gaussian imputation requires that the data
conforms to a Gaussian (normal) distribution, as stated by
Little in 1987 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. K-Nearest Neighbors (KNN)
imputation is a non-parametric technique that utilizes the k closest
neighbors to approximate the missing values [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
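        <p>The three imputation strategies above can be sketched in a few lines. This is a minimal illustration with an invented toy series, not the paper's pipeline: the linear variant uses straight-line interpolation, the Gaussian variant draws replacements from a normal distribution fitted to the observed values, and the KNN-style helper (a simplified, hypothetical variant keyed on the time index) averages the k nearest observed points.</p>

```python
import numpy as np
import pandas as pd

# Toy soil-moisture series with gaps; the values are purely illustrative.
s = pd.Series([0.30, 0.31, np.nan, 0.34, np.nan, 0.37])

# Linear imputation: straight-line interpolation between observed neighbours.
linear = s.interpolate(method="linear")

# Gaussian imputation: draw replacements from N(mean, std) of observed values.
rng = np.random.default_rng(0)
gauss = s.copy()
missing = gauss.isna()
gauss[missing] = rng.normal(s.mean(), s.std(), missing.sum())

# Simplified KNN-style imputation on the time index: average the k nearest
# observed points (full KNN imputation would use feature-space distances).
def knn_impute(series, k=2):
    out = series.copy()
    observed = series.dropna()
    for i in series.index[series.isna()]:
        nearest = (observed.index.to_series() - i).abs().nsmallest(k).index
        out[i] = observed[nearest].mean()
    return out

knn = knn_impute(s)
```

        <p>Cubic interpolation follows the same pattern as the linear call, fitting a cubic spline through the observed points instead of straight segments.</p>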
        <p>
          Machine learning utilizes a range of models to examine
data and make predictions about future events. This section
presents two commonly used models: Linear Regression and
Random Forest [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Both models were selected based on their
efficacy in managing time-varying data, which is essential
for precise forecasting and analysis in dynamic settings.
        </p>
        <p>
          Deep Neural Network (DNN) [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] is an extension of a
simple neural network with multiple hidden layers between the
input and output layers. The addition of these hidden layers
allows the network to model more complex relationships
in the data. The working of each layer in a DNN follows
the same principles as in a simple neural network, but with
repeated layers, the depth of the network increases.
        </p>
        <p>
          Generative Adversarial Networks (GANs) are a class of
machine learning frameworks introduced by Ian Goodfellow
and colleagues in 2014 [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. GANs consist of two competing neural networks, the
Generator (G) and the Discriminator (D), which are trained
simultaneously through a process known as adversarial
training. A GAN is designed to generate synthetic data that
resembles real data. It achieves this through the interaction
of its two networks: the Generator (G) takes an input noise
vector and produces generated data, while the Discriminator
(D) evaluates whether data samples are real or generated.
        </p>
        <p>The training of GANs involves a minimax game between
the generator and the discriminator:
1. Discriminator Training: The discriminator is
updated to maximize the probability of correctly
classifying real and fake data. The loss function for the
discriminator is given by:</p>
        <p>
          L_D = −( E_{x∼p_data(x)}[log D(x)]
+ E_{z∼p_z(z)}[log(1 − D(G(z)))] ) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
        </p>
        <p>
          2. Generator Training: The generator is updated to
minimize the probability that the discriminator
correctly classifies the generated data as fake. The loss
function for the generator is given by:
        </p>
        <p>
          L_G = − E_{z∼p_z(z)}[log D(G(z))] [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
        </p>
        <sec id="sec-2-3-1">
          <title>The overall objective function of the GAN is:</title>
          <p>
            min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)]
+ E_{z∼p_z(z)}[log(1 − D(G(z)))] [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]
          </p>
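        <p>The two losses above can be evaluated numerically to build intuition. The following is a small illustrative sketch, not a full adversarial training loop: the discriminator scores are invented arrays standing in for D(x) and D(G(z)).</p>

```python
import numpy as np

# Numeric sketch of the GAN losses, not a full training loop. d_real and
# d_fake stand in for discriminator scores D(x) and D(G(z)) in (0, 1).
def discriminator_loss(d_real, d_fake):
    # L_D = -( E[log D(x)] + E[log(1 - D(G(z)))] )
    return -(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake):
    # L_G = -E[log D(G(z))]
    return -np.mean(np.log(d_fake))

# When D separates real from fake well, L_D is small; when G fools D
# (scores on fakes approach 1), L_G is small.
d_real = np.array([0.9, 0.95])   # illustrative scores on real samples
d_fake = np.array([0.1, 0.05])   # illustrative scores on generated samples
```

        <p>Adversarial training alternates gradient steps on these two objectives: one update of D to decrease L_D, then one update of G to decrease L_G, repeated until the generated samples are statistically hard to distinguish from the real ones.</p>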
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Related Work</title>
      <p>
        Key advancements include the optimization of deep learning
models using GANs and the Sailfish Optimization Algorithm
(SOA) for soil moisture prediction [
        <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
        ]. This method
enhances the quality of synthetic data generation, addressing
the challenge of incomplete and inconsistent soil moisture
readings. Machine learning models such as Random Forest
(RF) and Support Vector Machines (SVM) have been utilized
for soil moisture and temperature prediction, demonstrating
robustness and improved generalization capabilities.
However, these models also faced limitations in handling missing
data and variability in data quality [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Recent studies have
leveraged GANs for data augmentation, which significantly
enhances classifier performance by increasing the volume
and diversity of the training data [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Additionally, the
integration of GANs with Long Short-Term Memory (LSTM)
networks (GAN-LSTM) has been explored to improve the
accuracy of soil moisture predictions by generating
high-quality synthetic time series data [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. Moreover, GANs
have been applied to refine seasonal weather predictions,
demonstrating significant potential for high-resolution
forecasting and capturing intricate spatial patterns among
climate variables [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>This paper builds on these advancements by
incorporating advanced imputation techniques to preprocess datasets
before applying GANs, ensuring the generation of
high-quality synthetic data. By integrating various deep
learning and machine learning models, including GANs, DNNs,
SNNs, and CNNs, this work further aims to develop highly
accurate and reliable prediction models for soil moisture
and temperature. This comprehensive strategy results in
enhanced prediction models that exhibit high levels of
accuracy and reliability.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Methodology</title>
      <p>In this section, we outline the proposed approach, which
consists of the following main tasks, as shown in Figure 1:
Exploratory Data Analysis (EDA), imputing missing data,
training models on the completed datasets, generating
synthetic data using GANs, and evaluating the effectiveness of
these models. In the following, we provide more details for
each task.</p>
      <sec id="sec-5-1">
        <title>4.1. Performing Exploratory Data Analysis (EDA)</title>
        <p>EDA is an essential process for understanding the
organization, integrity, and attributes of the data. The first
step involves identifying any missing values present in the
dataset. This stage is crucial as it establishes the magnitude
of the missing data and guides the approach for addressing
it. The number of missing values in each row and column
is computed to assess the degree of data incompleteness.
Comprehending the dispersion and quantity of
missing data aids in determining the appropriate imputation
methods and helps guarantee the integrity and usability of the dataset.</p>
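        <p>The per-row and per-column missing-value counts described above amount to a few pandas calls. The frame below is a hypothetical stand-in whose column names merely echo the paper's features; the values are invented.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame standing in for one of the sensor datasets.
df = pd.DataFrame({
    "Ta_200": [12.1, np.nan, 13.4, 12.8],    # air temperature
    "SM_10":  [np.nan, np.nan, 0.31, 0.29],  # soil moisture at 10 cm
    "WD":     [np.nan] * 4,                  # a fully empty column
})

missing_per_column = df.isna().sum()         # guides dropping empty columns
missing_per_row = df.isna().sum(axis=1)      # flags rows that are mostly gaps
fully_empty = missing_per_column[missing_per_column == len(df)].index.tolist()
```

        <p>Columns that turn up in <code>fully_empty</code> (such as WD here) carry no information and can be dropped before imputation, while rows with a high missing count may warrant exclusion rather than imputation.</p>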
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Imputing Missing Data</title>
        <p>After identifying missing data, the subsequent action is
to impute these absent values. Three different algorithms,
KNN, Cubic, and Gaussian, which are explained in Section
2.2, are employed and later compared to pick the best
method for further model training.</p>
        <p>
          A statistical study is conducted to evaluate the efficacy
of each imputation strategy after imputing the missing
data. By comparing the imputed data with the
original data, the Mean Absolute Error (MAE) and Root Mean
Squared Error (RMSE) offer insights into both the precision
and the fluctuation of the imputed values. Whereas RMSE is
susceptible to outliers and assigns greater weight to larger errors,
MAE calculates the average magnitude of prediction errors. These
measures help assess how effectively the imputation
techniques maintain the distribution and structure of the
underlying data [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. This comparison facilitates the
assessment of the efficacy of each imputation technique, as
indicated by the results in Section 5.3.
        </p>
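        <p>Both metrics reduce to one line of numpy each. This toy check compares an imputed vector against held-out ground truth; the numbers are illustrative, not drawn from the paper's datasets.</p>

```python
import numpy as np

# Held-out ground truth vs an imputation of the same points (toy numbers).
original = np.array([0.30, 0.32, 0.35, 0.37])
imputed  = np.array([0.30, 0.33, 0.34, 0.37])

mae = np.mean(np.abs(original - imputed))           # average error magnitude
rmse = np.sqrt(np.mean((original - imputed) ** 2))  # penalises larger errors more
```

        <p>Since RMSE squares the residuals before averaging, it is always at least as large as MAE on the same errors, and the gap between the two widens when a few imputed points are far off.</p>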
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Model Selection</title>
        <p>When analyzing the relationship between soil and the
environment over time, it is essential to choose suitable models
that can accurately represent the intricate dynamics and
inter-dependencies contained in the data. The model
selection procedure was guided by the need to balance
predictive accuracy, interpretability, and the capability
to handle different types of data patterns. The following
models were chosen because of their suitability for modeling
time-series data and are described in Section 2.2: Random
Forest, Linear Regression, Simple Neural Network, Deep
Neural Network, and Convolutional Neural Network.</p>
        <p>The pre-processed data obtained through imputation in
Section 4.2 and EDA in Section 4.1 is used as the
training and validation dataset for training these models.</p>
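        <p>The train-then-validate step can be sketched with the simplest of the selected models, linear regression. The inputs and target below are synthetic stand-ins for the climatic features and soil variables, and the closed-form least-squares fit is an illustrative substitute for the paper's actual training setup.</p>

```python
import numpy as np

# Synthetic stand-ins: 3 "climatic" inputs driving one "soil" target.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                 # e.g. temperature, precipitation, humidity
true_w = np.array([0.5, -0.2, 0.1])           # hypothetical underlying weights
y = X @ true_w + 0.01 * rng.normal(size=200)  # e.g. a soil-temperature target

# Hold out the last 40 samples for validation.
X_tr, X_va, y_tr, y_va = X[:160], X[160:], y[:160], y[160:]

# Ordinary least squares fit, then R-squared on the validation split.
w = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
resid = y_va - X_va @ w
r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_va - y_va.mean()) ** 2)
```

        <p>The same split and R² evaluation carries over unchanged to the Random Forest and neural network models; only the fitting step differs.</p>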
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Machine and Deep Learning Model Evaluation on Real Data</title>
        <p>The outcomes shown in Section 5.4 analyze the efficacy
of ML and DL models when applied to real-world datasets.
The findings suggest that both ML and DL models exhibit
subpar performance.</p>
        <p>This inadequate performance highlights the need for other
methods to improve the accuracy and resilience of the models.
An encouraging strategy is the use of Generative Adversarial
Networks, as explained in Section 2.2.</p>
      </sec>
      <sec id="sec-5-6">
        <title>4.5. GAN Model Implementation and Generating Synthetic Data</title>
        <p>The procedures for data processing, GAN model training,
and assessment are outlined in Section 2.2. The primary
objective is to produce artificial data samples using GANs,
with diverse input characteristics obtained from several
sources as training data.
After generating the synthetic data, ML and DL models
are trained and assessed using this data to verify
its fidelity to the original dataset; the results are described
in Section 5.6. The technique entails a meticulous selection
of synthetic data, guided by statistical measurements, to
guarantee its resemblance to the original data.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results</title>
      <p>This section summarizes the results obtained from the
analysis and modeling conducted in this research. The sections
are arranged in a systematic manner to comprehensively
address the following topics: exploration of the dataset,
comparison of imputation methods, application and
evaluation of machine learning and deep learning models on
real data, generation and assessment of synthetic data using
GANs, and comparative analysis of model performance on
real versus synthetic data.</p>
      <sec id="sec-6-1">
        <title>5.1. Dataset Description</title>
        <p>
          To validate the performance of the proposed approach,
we use three different datasets from the open data
provided by the Biodiversity Exploratory Information System
(BExIS)1 [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], which serves as the data portal for
biodiversity datasets collected within the framework of the
Biodiversity Exploratories project2. The collection includes a
variety of climate and soil factors that are consistently recorded
and classified. The characteristics of the selected datasets are
illustrated in Table 1. As shown in the table, we selected
datasets that represent two different domains: soil and
climatic.
        </p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Characteristics of the selected datasets.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Dataset</th><th>Start Date</th><th>End Date</th><th>No. Features</th><th>No. tuples</th></tr>
            </thead>
            <tbody>
              <tr><td>ds_set1.csv</td><td>05-01-2008</td><td>06-09-2010</td><td>26</td><td>801</td></tr>
              <tr><td>ds_set2.csv</td><td>03-01-2009</td><td>04-09-2011</td><td>26</td><td>801</td></tr>
              <tr><td>ds_set3.csv</td><td>09-02-2020</td><td>04-09-2024</td><td>26</td><td>801</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-6-1-1">
          <p>1: https://www.bexis.uni-jena.de/ 2: https://www.biodiversity-exploratories.de/en/</p>
          <p>The climatic features specified in Table 2 serve as
input variables for machine learning (ML) and deep learning
(DL) models. These features encompass a range of
meteorological parameters, including temperature, precipitation,
wind attributes, and duration of sunshine. They offer a
thorough comprehension of the climate dynamics across various
altitudes and temporal dimensions.</p>
          <table-wrap id="tab3">
            <label>Table 3</label>
            <caption>
              <p>Soil parameters used as output features.</p>
            </caption>
            <table>
              <thead>
                <tr><th>Soil parameter</th><th>Description</th></tr>
              </thead>
              <tbody>
                <tr><td>Ts_05</td><td>Soil temperature in 5 cm</td></tr>
                <tr><td>Ts_10</td><td>Soil temperature in 10 cm</td></tr>
                <tr><td>Ts_20</td><td>Soil temperature in 20 cm</td></tr>
                <tr><td>Ts_50</td><td>Soil temperature in 50 cm</td></tr>
                <tr><td>SM_10</td><td>Soil moisture in 10 cm</td></tr>
                <tr><td>SM_20</td><td>Soil moisture in 20 cm</td></tr>
              </tbody>
            </table>
          </table-wrap>
          <p>The soil features listed in Table 3 are used as output
features for the ML and DL models. These features include
soil temperature and soil moisture at various depths. By
understanding the relationship between climate inputs and
soil outputs, the models can predict soil conditions based
on climatic variations, which is essential for applications in
agriculture, environmental monitoring, and land
management.</p>
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Analysis Using Exploratory Data Analysis</title>
        <p>In the domain of real-world data, sparsity is a prevalent problem,
as shown by the abundance of NaN (Not a Number) values.
The presence of these gaps in the data may often be ascribed
to sensor malfunctions or other difficulties related to data
collection. The analysis, shown in Figure 2, indicates that
the columns SD Olivier, WD, WV, and WV gust do not have
data and should be excluded. Figure 3 demonstrates that
a considerable proportion of rows have over 50% missing
data.</p>
      </sec>
      <sec id="sec-6-4">
        <title>5.3. Analysis of Different Imputation Methods</title>
        <p>Various imputation approaches may be used to address
missing data. To assess the efficacy of these techniques, we
compare the mean absolute error (MAE) and the root mean
square error (RMSE) values obtained from the original and
imputed datasets. The performance of three imputation
approaches, namely Cubic, K-nearest neighbors (KNN), and
Gaussian, was evaluated on three datasets (set1, set2, set3),
as shown in Figure 4.</p>
      </sec>
      <sec id="sec-6-6">
        <title>5.4. Analysis of Machine Learning and Deep Learning Models on Real Data</title>
        <p>The examination of various machine learning models on
the dataset (Figure 5) uncovers a consistent trend where the
output feature SM_10 displays significantly higher MSE and
MAE, as well as notably lower R² scores, in comparison to
the other target variables (Ts_05, Ts_10, Ts_20, Ts_50). This
suggests that SM_10 is the most difficult target to predict,
leading to poor prediction performance and poor model
fit. In contrast, the other target variables exhibit
lower error rates and higher R² scores, indicating
superior model performance and predictive accuracy. Despite
experimenting with several hyperparameters in the
Random Forest model, no substantial improvements were seen.
Similar trends were observed in all three datasets, and similar
results were also visible with the deep learning models on
all three datasets, as shown in Figure 6.</p>
      </sec>
      <sec id="sec-6-8">
        <title>5.5. Analysis of Real vs GAN Synthetic Data</title>
        <p>The figures provide a comparative examination
of performance metrics between the original data and the
data produced by a GAN for three datasets (ds_set1, ds_set2,
ds_set3). Each figure consists of two subplots: one
displaying the mean values and the other the standard
deviations. Examining the subplots that compare the
mean values, it becomes evident that the generated data
closely resemble the original data in virtually all respects.
This suggests that the GAN successfully captures the
underlying distribution of the original dataset. The strong
agreement seen across several hyperparameter
configurations, as shown in Figures 7, 8, and 9, highlights the GAN
model's ability to faithfully reproduce the mean values
of the original dataset. Similarly, the analysis of the
subplots comparing the standard deviations shows that the
generated data closely resemble the variability of the
original data. However, there are some deviations in certain
features, indicating that while the GAN performs well
overall, reproducing specific features accurately may be more
challenging.</p>
        <p>The evaluation also considers the effects of several
hyperparameters, which are represented by various markers and
colors. These combinations consist of learning rates (0.0002,
0.003, 0.001) and loss functions (mean squared error, binary
cross-entropy).</p>
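        <p>The per-feature mean and standard-deviation comparison behind these figures can be sketched as follows. Both arrays here are simulated stand-ins for the real and GAN-generated data, and the tolerance is an arbitrary illustrative threshold, not one used in the paper.</p>

```python
import numpy as np

# Simulated stand-ins for real and GAN-generated data (two features).
rng = np.random.default_rng(1)
real = rng.normal(loc=[10.0, 0.30], scale=[2.0, 0.05], size=(500, 2))
synthetic = rng.normal(loc=[10.1, 0.29], scale=[2.1, 0.06], size=(500, 2))

# Per-feature gaps between the two datasets' summary statistics.
mean_gap = np.abs(real.mean(axis=0) - synthetic.mean(axis=0))
std_gap = np.abs(real.std(axis=0) - synthetic.std(axis=0))

# Small gaps on both statistics indicate the generator reproduces the
# feature-wise distribution of the real data (0.75 is an arbitrary cutoff).
close = bool(0.75 > float(max(mean_gap.max(), std_gap.max())))
```

        <p>Matching means and standard deviations is a necessary but not sufficient check; the residual per-feature deviations discussed above are exactly what this gap vector surfaces.</p>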
      </sec>
      <sec id="sec-6-9">
        <title>5.6. Analysis of ML and DL Models on Real vs Synthetic Data</title>
        <p>The assessment of deep learning models on three datasets
(Figures 10, 11, and 12) reveals significant improvements
when using GAN-generated synthetic data in comparison
to the original data. Notable observations include:
• Mean Squared Error: Models trained on synthetic
data consistently yield decreased MSE values for
all target variables, with significant improvements
seen for SM_10.
• Mean Absolute Error: The use of GAN-generated
data leads to a notable decrease in MAE, especially
for the SM_10 target variable.
• R²: Higher R² scores show greater explanatory power
for models trained on synthetic data, particularly
improving the prediction performance for SM_10.</p>
        <p>The results emphasize the effectiveness of using synthetic
data produced by GANs to improve the accuracy and
reliability of models, especially in predicting the SM_10 variable.</p>
        <p>Similar improvements were observed with ML models
trained on the synthetic data. The assessment of machine
learning models on three datasets (Figures 13, 14, and 15)
reveals substantial improvements when using synthetic data
generated by GANs compared to the original data. Notable
observations include:
• Mean Squared Error: Models trained on synthetic
data consistently exhibit decreased MSE values for</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>In this paper, we investigated the analysis of time series
datasets collected within the life science domain. We
demonstrated the effect of KNN-based imputation techniques and
showed that KNN imputation consistently outperforms other
methods, making it the optimal choice for addressing
missing data in this scenario. The data generated by GANs
exhibits a high degree of similarity to the original data,
demonstrating GANs' ability to accurately replicate the
underlying distribution of the actual dataset (Section 5.5). Models
trained on GAN-generated data show superior performance
compared to those trained on real data, as evidenced by
significantly improved evaluation metrics, such as lower MSE
and higher R² scores. These improvements are observed
in both machine learning and deep learning models across
the various datasets described in Section 5.6. The findings of
this paper have broad applicability in the biological sciences
and environmental research. This study enhances the
resilience and precision of models predicting soil properties
under different climatic conditions, facilitating more reliable
agricultural planning and environmental monitoring.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used
X-GPT-4 and Gramby in order to: Grammar and spelling
check. Further, the author(s) used X-AI-IMG for figures
3 and 4 in order to: Generate images. After using these
tool(s)/service(s), the author(s) reviewed and edited the
content as needed and take(s) full responsibility for the
publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Iglesias</surname>
          </string-name>
          , E. Talavera,
          <string-name>
            <given-names>Á.</given-names>
            <surname>González-Prieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mozo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gómez-Canaval</surname>
          </string-name>
          ,
          <article-title>Data augmentation techniques in time series domain: a survey and taxonomy</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>10123</fpage>
          -
          <lpage>10145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cekić</surname>
          </string-name>
          ,
          <article-title>Anomaly detection in medical time series with generative adversarial networks: a selective review, Anomaly Detection-Recent Advances, AI and ML Perspectives and Applications (</article-title>
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U. R.</given-names>
            <surname>KS</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Panja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Manvitha</surname>
          </string-name>
          ,
          <article-title>Ten years of generative adversarial nets (GANs): a survey of the state-of-the-art</article-title>
          ,
          <source>Machine Learning: Science and Technology</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>011001</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B. K.</given-names>
            <surname>Iwana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Uchida</surname>
          </string-name>
          ,
          <article-title>An empirical survey of data augmentation for time series classification with neural networks</article-title>
          ,
          <source>Plos one 16</source>
          (
          <year>2021</year>
          )
          <article-title>e0254841</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jeon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A new data augmentation method for time series wearable sensor data using a learning mode switching-based dcgan</article-title>
          ,
          <source>IEEE Robotics and Automation Letters</source>
          <volume>6</volume>
          (
          <year>2021</year>
          )
          <fpage>8671</fpage>
          -
          <lpage>8677</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Box</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Jenkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Reinsel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Ljung</surname>
          </string-name>
          ,
          <source>Time series analysis: forecasting and control</source>
          , John Wiley &amp; Sons,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Diggle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giorgi</surname>
          </string-name>
          ,
          <source>Time series: a biostatistical introduction</source>
          , Oxford University Press,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Kendall</surname>
          </string-name>
          ,
          <source>The advanced theory of statistics</source>
          (
          <year>1946</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kornelsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Coulibaly</surname>
          </string-name>
          ,
          <article-title>Comparison of interpolation, statistical, and data-driven methods for imputation of missing values in a distributed soil moisture dataset</article-title>
          ,
          <source>Journal of Hydrologic Engineering</source>
          <volume>19</volume>
          (
          <year>2014</year>
          )
          <fpage>26</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Little</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Rubin</surname>
          </string-name>
          ,
          <source>Statistical analysis with missing data</source>
          , New York: Wiley (
          <year>1987</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>An improved gaussian process for filling the missing data in gnss position time series considering the influence of adjacent stations</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>14</volume>
          (
          <year>2024</year>
          )
          <fpage>19268</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>O.</given-names>
            <surname>Troyanskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cantor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sherlock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Botstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <article-title>Missing value estimation methods for DNA microarrays</article-title>
          ,
          <source>Bioinformatics</source>
          <volume>17</volume>
          (
          <year>2001</year>
          )
          <fpage>520</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dixneuf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Errico</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Glaus</surname>
          </string-name>
          ,
          <article-title>A computational study on imputation methods for missing environmental data</article-title>
          ,
          <source>arXiv preprint arXiv:2108.09500</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Zainuri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Jemain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muda</surname>
          </string-name>
          ,
          <article-title>A comparison of various imputation methods for missing values in air quality data</article-title>
          ,
          <source>Sains Malaysiana</source>
          <volume>44</volume>
          (
          <year>2015</year>
          )
          <fpage>449</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          ,
          <article-title>Random forests</article-title>
          ,
          <source>Machine Learning</source>
          <volume>45</volume>
          (
          <year>2001</year>
          )
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <article-title>Learning representations by back-propagating errors</article-title>
          ,
          <source>Nature</source>
          <volume>323</volume>
          (
          <year>1986</year>
          )
          <fpage>533</fpage>
          -
          <lpage>536</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Generative adversarial nets</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>27</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. M. N.</given-names>
            <surname>Sivasankaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jagan Mohan</surname>
          </string-name>
          ,
          <article-title>Soil moisture quantity prediction using optimized deep learning model with sailfish optimization algorithm</article-title>
          ,
          <source>Journal of Neural Networks</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>4268</fpage>
          -
          <lpage>3515</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brophy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>She</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <article-title>Generative adversarial networks in time series: A systematic literature review</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Siddharth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Abhishek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Karthik</surname>
          </string-name>
          ,
          <article-title>Machine learning applications for predicting soil moisture</article-title>
          ,
          <source>Environmental Modelling &amp; Software</source>
          <volume>134</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>V.</given-names>
            <surname>Venkatesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Nithya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Karthikeyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Adilakshmi</surname>
          </string-name>
          ,
          <article-title>A deep learning data augmentation experiment to classify agricultural soil moisture to conserve plants</article-title>
          ,
          <source>International Journal of Intelligent Systems and Applications in Engineering</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>114</fpage>
          -
          <lpage>119</lpage>
          . URL: https://www.ijisae.org/index.php/IJISAE/article/view/2834.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>A comprehensive study of deep learning for soil moisture prediction</article-title>
          ,
          <source>Hydrology and Earth System Sciences Discussions</source>
          <volume>2023</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gousios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mamouka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vourlioti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotsopoulos</surname>
          </string-name>
          ,
          <article-title>Downscaling seasonal weather forecasting with generative adversarial networks</article-title>
          , preprint (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>N. M.</given-names>
            <surname>Noor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Al Bakri Abdullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Yahaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Ramli</surname>
          </string-name>
          ,
          <article-title>Comparison of linear interpolation method and mean method to replace the missing values in environmental data set</article-title>
          , in:
          <source>Materials Science Forum</source>
          , volume
          <volume>803</volume>
          , Trans Tech Publications,
          <year>2015</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>281</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <article-title>BEXIS 2, Tropical climate data: Exported sensor data</article-title>
          ,
          <year>2024</year>
          . URL: https://www.biodiversity-exploratories.de/en/klimatool/.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weithoener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Weisser</surname>
          </string-name>
          ,
          <article-title>TubeDB: An on-demand processing database system for climate station data</article-title>
          ,
          <source>Computers &amp; Geosciences</source>
          <volume>145</volume>
          (
          <year>2020</year>
          )
          <fpage>104641</fpage>
          . doi:10.1016/j.cageo.2020.104641.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>