<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>V. Alekseiko);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Sea ice extent data analysis using statistical and unsupervised learning methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vitalii Alekseiko</string-name>
          <email>vitalii.alekseiko@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vitaly Levashenko</string-name>
          <email>vitaly.levashenko@fri.uniza.sk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yurii Voichur</string-name>
          <email>voichury@khmnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmytro Medzatyi</string-name>
          <email>medza@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Khmelnytskyi National University</institution>
          ,
          <addr-line>Institutska str., 11, Khmelnytskyi, 29016</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Zilina University</institution>
          ,
          <addr-line>Univerzitná 8215, 010 26 Žilina</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1932</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>The article analyzes the main trends in Sea Ice extent changes in the Arctic and Antarctic regions, as well as the impact of Sea Ice extent on climate change and ecosystems both in the polar regions and in more remote regions of the Earth. A statistical analysis of historical data is performed, and the time series are decomposed into their main components: trend, seasonality, and residuals. The Dickey-Fuller test is applied to test the stationarity of the series. Unsupervised machine learning methods are used to cluster the data for a better understanding of Sea Ice extent patterns and anomalies. Directions for further research are outlined based on the results obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>Sea Ice extent</kwd>
        <kwd>time series</kwd>
        <kwd>data analysis</kwd>
        <kwd>climate change</kwd>
        <kwd>trends</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Sea Ice extent in the Arctic and Antarctic is one of the seven main climate indicators identified by
the World Meteorological Organization (WMO) [1]. This indicator has a significant impact on the
ecosystems of the Arctic and Antarctic regions, as well as on other regions of the globe, and plays a
significant role in climate change. Determining the main trends and patterns of changes in sea ice
extent is therefore highly relevant for understanding potential risks not only for the polar regions
but also for the entire biosphere. Studying the time series of sea ice extent provides valuable
information for choosing techniques and models to forecast future values.</p>
      <p>Arctic and Antarctic sea ice extent have shown stable trends for decades, but recent observations
indicate abnormally low levels [2].</p>
      <p>Although many forecasting approaches exist today, data preparation is an extremely important
aspect, as it can significantly increase forecast accuracy [3]. Thus, the data need to be statistically
analyzed, pre-processed, and prepared for forecasting.</p>
      <p>This approach is relevant in machine learning for various forecasting tasks in the fields of
economics [4], finance, climate, etc. Previous works have examined the impact of surface temperature
on climate change, approaches to forecasting temperature time series [5, 6], the impact of such
forecasting in the context of developing sustainable cities and communities [7], and the specifics of
developing predictive information systems [8, 9].</p>
      <p>This study focuses on time series of sea ice extent. Although these series share some common
patterns with temperature series, such as seasonality, the specific features of the polar regions
necessitate a comprehensive analysis that takes into account the climatic characteristics of the
regions under study.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Literature review</title>
      <p>The literature review confirms the relevance of the chosen research topic. Current research
focuses on the Arctic and Antarctic regions and on the impact of the changes occurring there on the
climate of different regions of the world.</p>
      <p>Changes in the ice cover of the Arctic and Antarctica affect sea level [2], precipitation [10, 11,
12, 13], coastal wave heights [14], and many other indicators, turning entire regions into areas
vulnerable to climate change [15]. Sea ice also has a significant impact on flora and fauna [16, 17,
18].</p>
      <p>The article [19] analyzes historical trends in sea ice area in different seas of the Arctic Circle and
correlates them with temperature changes in the Arctic Circle and with the average annual global
CO<sub>2</sub> concentration. Another important aspect of Arctic research is the greening of the
region, which shows growing trends [20].</p>
      <p>Studies of climate data also pay much attention to time series analysis, in particular to seasonal
patterns.</p>
      <p>The work [21] considers seasonal trends of the early twentieth century and conducts a detailed
statistical analysis of the data. The study [22] proposes a model that uses the theory of time series
decomposition to predict temperature in Chinese cities.</p>
      <p>A number of studies use the Dickey-Fuller test to check the stationarity of climate data, in
particular studies of climate trends in a tropical river basin of India [23], of non-stationary extreme
precipitation under climate change in East Malaysia [24], and of the incidence of dengue fever under
climate change in Singapore [25].</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset structure</title>
        <p>The study uses the Daily Sea Ice Extent Data dataset from Kaggle [26]. All information is provided
by the Data and Image Archive of the National Snow and Ice Data Center (NSIDC) [27]. Sea Ice extent
is calculated using the accepted 15% threshold: a grid cell counts toward the extent if at least 15% of
its area is covered by ice [28]. The data are collected using NASA satellite image analysis
technologies [29]. The dataset includes 7 variables:</p>
        <list list-type="bullet">
          <list-item>
            <p>Year;</p>
          </list-item>
          <list-item>
            <p>Month;</p>
          </list-item>
          <list-item>
            <p>Day;</p>
          </list-item>
          <list-item>
            <p>Extent (10<sup>6</sup> sq km);</p>
          </list-item>
          <list-item>
            <p>Missing (10<sup>6</sup> sq km);</p>
          </list-item>
          <list-item>
            <p>Source (source data product web site [27]);</p>
          </list-item>
          <list-item>
            <p>Hemisphere.</p>
          </list-item>
        </list>
        <p>The dataset covers the period from October 26, 1978 to May 31, 2019; more recent data are
available from the National Snow and Ice Data Center [27].</p>
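        <p>The monthly aggregation used in the analysis can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that the daily records of one hemisphere (already filtered by the <monospace>Hemisphere</monospace> variable) have been loaded into arrays of years, months, and extents; the helper name <monospace>monthly_means</monospace> and the sample values are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def monthly_means(year, month, extent):
    """Hypothetical helper: mean daily extent (10^6 sq km) per (year, month) pair."""
    year = np.asarray(year, dtype=int)
    month = np.asarray(month, dtype=int)
    extent = np.asarray(extent, dtype=float)
    # Encode each (year, month) pair as one integer key: year * 12 + (month - 1)
    keys = year * 12 + (month - 1)
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    # Sum and count the daily values belonging to each key, then divide
    sums = np.bincount(inverse, weights=extent)
    counts = np.bincount(inverse)
    return unique_keys // 12, unique_keys % 12 + 1, sums / counts


# Illustrative records only (not real NSIDC values)
years, months, means = monthly_means(
    [1979, 1979, 1979, 1979], [1, 1, 2, 2], [14.0, 15.0, 16.0, 16.5]
)
print(list(zip(years.tolist(), months.tolist(), means.tolist())))
# [(1979, 1, 14.5), (1979, 2, 16.25)]
```
]]></preformat>
        <p>Encoding the (year, month) pair as a single integer keeps the grouping in plain NumPy; a data-frame library would serve equally well.</p>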
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Time series data analysis</title>
        <p>Several analyses of the time series data are carried out. To determine general trends, the data
are aggregated by calculating the average Sea Ice extent for each month of each year. Then graphs are
plotted, a linear approximation is performed, and the angle between the resulting line and the time
axis is calculated. This angle indicates the general trend in Sea Ice extent: a negative angle indicates
a decreasing tendency, a positive angle an increase, and an angle close to zero no significant change.
Since the angle is the arctangent of the slope, its sign gives the direction of the trend, while its
magnitude depends on the units chosen for both axes. For a more comprehensive understanding of the
changes and for the selection of forecasting techniques [30], the time series is also decomposed into
three main components: Trend, Seasonality, and Residuals.</p>
      </sec>
      <sec id="sec-3-2-1">
        <title>3.2.1. Trend</title>
        <p>Trend represents a long-term movement in the data: it may increase, decrease, or remain constant
over time. A trend captures a systematic change in the data that is not caused by short-term
fluctuations or periodic patterns.</p>
        <p>In a time series, the trend component is often modeled as a function of time (e.g., linear, quadratic,
exponential, etc.).</p>
        <p>A linear trend can be represented as:</p>
        <disp-formula>
          <label>(1)</label>
          <tex-math>T(t) = \beta_0 + \beta_1 t,</tex-math>
        </disp-formula>
        <p>where: t – time; T(t) – the value of the trend at time t; β<sub>0</sub> – the intercept (the value of
the trend at t = 0); β<sub>1</sub> – the slope (indicating the rate of change of the trend over time).</p>
        <p>Trends can also be non-linear, for example polynomial (2) or exponential (3). The exponential form
captures rapid growth or decay.</p>
        <disp-formula>
          <label>(2)</label>
          <tex-math>T(t) = \beta_0 + \beta_1 t + \beta_2 t^2,</tex-math>
        </disp-formula>
        <p>where: β<sub>2</sub> – the quadratic coefficient, which captures the curvature (acceleration) of the
trend over time.</p>
        <disp-formula>
          <label>(3)</label>
          <tex-math>T(t) = \alpha e^{\beta t},</tex-math>
        </disp-formula>
        <p>where: α and β – constants; α is the value of the trend at t = 0, and β is the growth rate (β &gt; 0)
or decay rate (β &lt; 0).</p>
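        <p>The trend forms (1)–(3) and the slope angle used to summarize each monthly series can be estimated by least squares. The sketch below illustrates this under stated assumptions and is not the authors' implementation: the helper name <monospace>fit_trends</monospace> is hypothetical, the exponential form is fitted as a straight line on log T(t) (which requires positive values and minimizes the error on the log scale rather than the original scale), and time is measured from the first year to keep the fits well conditioned.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fit_trends(t, y):
    """Hypothetical helper: least-squares fits of trend forms (1), (2) and (3).

    Returns the coefficients of each form and the angle, in degrees, between the
    fitted straight line and the time axis.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # (1) T(t) = b0 + b1*t; np.polyfit returns the highest power first
    b1, b0 = np.polyfit(t, y, 1)
    # (2) T(t) = b0 + b1*t + b2*t^2
    q2, q1, q0 = np.polyfit(t, y, 2)
    # (3) T(t) = alpha * exp(beta*t), fitted as log T = log(alpha) + beta*t (needs y > 0)
    beta, log_alpha = np.polyfit(t, np.log(y), 1)
    # The sign of the angle gives the trend direction; its size depends on the axis units
    angle_deg = float(np.degrees(np.arctan(b1)))
    return {
        "linear": (b0, b1),
        "quadratic": (q0, q1, q2),
        "exponential": (float(np.exp(log_alpha)), beta),
        "angle_deg": angle_deg,
    }


# Illustrative series: t = years since the first year, y in 10^6 sq km (not real data)
t = np.arange(41.0)
result = fit_trends(t, 7.0 - 0.08 * t)
print(round(float(result["linear"][1]), 4), round(result["angle_deg"], 2))
# -0.08 -4.57
```
]]></preformat>
        <p>Comparing the residual errors of the three fits is one way to judge whether a straight line adequately describes a given month.</p>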
      </sec>
      <sec id="sec-3-3">
        <title>3.2.2. Seasonality</title>
        <p>The seasonality component refers to periodic fluctuations in a time series. In most cases, such
intervals are fixed. Thus, cycles with regularly repeating intervals can be modeled using periodic
patterns, in particular sinusoidal functions.</p>
        <p>A simple sinusoidal seasonal model can be represented as:</p>
        <disp-formula>
          <label>(4)</label>
          <tex-math>S(t) = \gamma_1 \sin(2\pi f t + \phi_1) + \gamma_2 \cos(2\pi f t + \phi_2),</tex-math>
        </disp-formula>
        <p>where: S(t) – the seasonal component at time t; γ<sub>1</sub> – the amplitude of the sine term;
γ<sub>2</sub> – the amplitude of the cosine term; f – the frequency of the seasonality; ϕ<sub>1</sub> – the
phase shift of the sine component; ϕ<sub>2</sub> – the phase shift of the cosine component.</p>
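        <p>Because γ sin(x + ϕ) = γ cos ϕ · sin x + γ sin ϕ · cos x, model (4) with a known frequency f is equivalent to a·sin(2πft) + b·cos(2πft), which is linear in a and b and can therefore be fitted by ordinary least squares. The sketch below assumes monthly data with an annual cycle (f = 1/12 per month) and adds a constant term c to absorb the series mean; the helper name <monospace>fit_seasonal</monospace> is hypothetical, and this is not presented as the authors' procedure.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def fit_seasonal(t, y, f=1.0 / 12.0):
    """Hypothetical helper: least-squares fit of a sinusoidal seasonal component.

    Fits y = a*sin(2*pi*f*t) + b*cos(2*pi*f*t) + c, the linear form of model (4).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi * f * t
    design = np.column_stack([np.sin(w), np.cos(w), np.ones_like(t)])
    (a, b, c), *_ = np.linalg.lstsq(design, y, rcond=None)
    amplitude = np.hypot(a, b)     # amplitude of the combined sinusoid
    phase = np.arctan2(b, a)       # a*sin(w) + b*cos(w) = amplitude * sin(w + phase)
    return a, b, c, amplitude, phase


# Illustrative monthly series with an annual cycle (not real data)
t = np.arange(120.0)
y = 10.0 + 3.0 * np.sin(2 * np.pi * t / 12 + 0.5)
a, b, c, amplitude, phase = fit_seasonal(t, y)
print(round(float(amplitude), 3), round(float(phase), 3), round(float(c), 3))
# 3.0 0.5 10.0
```
]]></preformat>
        <p>Fitting a single combined sinusoid avoids estimating two redundant phase shifts, since ϕ<sub>1</sub> and ϕ<sub>2</sub> cannot be identified separately for the same frequency.</p>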
      </sec>
      <sec id="sec-3-4">
        <title>3.2.3. Residuals</title>
        <p>Residuals, also known as “noise”, represent the remainder of the time series after trend and
seasonality have been subtracted. In effect, they represent the random component of the time series
that cannot be explained by trend or seasonal patterns.</p>
        <disp-formula>
          <label>(5)</label>
          <tex-math>R(t) = Y(t) - T(t) - S(t),</tex-math>
        </disp-formula>
        <p>where: R(t) – the residual (noise) component at time t; Y(t) – the observed value of the time series
at time t; T(t) and S(t) – the trend and seasonal components at time t.</p>
        <p>Residuals are usually assumed to be white noise, which means that they should have the following
properties:</p>
        <list list-type="bullet">
          <list-item>
            <p>Zero mean: the mean of the residuals is zero.</p>
          </list-item>
          <list-item>
            <p>Constant variance: the variance of the residuals is constant over time (homoscedasticity).</p>
          </list-item>
          <list-item>
            <p>No autocorrelation: the residuals are independent, meaning that there is no significant
correlation between them at different time lags.</p>
          </list-item>
        </list>
        <p>If the residuals are additionally assumed to be normally distributed (Gaussian white noise), R(t)
can be described as:</p>
        <disp-formula>
          <label>(6)</label>
          <tex-math>R(t) \sim \mathcal{N}(0, \sigma^2),</tex-math>
        </disp-formula>
        <p>where: N(0, σ<sup>2</sup>) – a normal distribution with zero mean and variance σ<sup>2</sup>. If the
residuals exhibit autocorrelation, the model has missed some structure in the data and may need to be
improved.</p>
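        <p>A classical additive decomposition Y(t) = T(t) + S(t) + R(t), consistent with (5), and informal checks of the white-noise properties listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the trend is estimated with a centred 2 × 12 moving average (so it is undefined for the first and last six months), the seasonal component is the average detrended value at each position in the cycle, the helper names are hypothetical, and the checks are summaries rather than formal tests. Library implementations exist, e.g. <monospace>seasonal_decompose</monospace> in <monospace>statsmodels.tsa.seasonal</monospace> and the Ljung-Box test <monospace>acorr_ljungbox</monospace> in <monospace>statsmodels.stats.diagnostic</monospace>.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def additive_decompose(y, period=12):
    """Hypothetical helper: classical additive decomposition for an even period."""
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    y = np.asarray(y, dtype=float)
    n, half = len(y), period // 2
    # Centred 2 x period moving average: weight 1/(2*period) at both ends, 1/period inside
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend
    # Seasonal component: mean detrended value at each cycle position, centred on zero
    cycle = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
    seasonal = np.resize(cycle - cycle.mean(), n)
    resid = y - trend - seasonal          # R(t) = Y(t) - T(t) - S(t)
    return trend, seasonal, resid


def white_noise_summary(resid):
    """Hypothetical helper: informal checks of zero mean, constant variance, no autocorrelation."""
    r = resid[~np.isnan(resid)]
    c = r - r.mean()
    half = len(r) // 2
    return {
        "mean": float(r.mean()),
        "var_first_half": float(r[:half].var()),
        "var_second_half": float(r[half:].var()),
        "lag1_autocorr": float(np.sum(c[1:] * c[:-1]) / np.sum(c ** 2)),
    }


# Illustrative monthly series: linear trend + annual cycle + noise (not real data)
rng = np.random.default_rng(42)
t = np.arange(240.0)
y = 12.0 - 0.005 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 0.1, t.size)
trend, seasonal, resid = additive_decompose(y)
print(white_noise_summary(resid))
```
]]></preformat>
        <p>A mean near zero, similar variances in both halves, and a lag-1 autocorrelation near zero are consistent with white noise; a clearly non-zero autocorrelation suggests that structure remains in the residuals.</p>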
      </sec>
      <sec id="sec-3-5">
        <title>3.3. Dickey-Fuller test</title>
        <p>The Dickey-Fuller test is a statistical test that is used to determine the stationarity of a time series
by testing the null hypothesis of the presence of a unit root in an autoregressive time series model.</p>
        <p>The Dickey-Fuller test examines the time series y<sub>t</sub> using the following autoregressive (AR)
model of order 1:</p>
        <disp-formula>
          <label>(7)</label>
          <tex-math>y_t = \phi y_{t-1} + \epsilon_t,</tex-math>
        </disp-formula>
        <p>where: y<sub>t</sub> – the value of the time series at time t; y<sub>t−1</sub> – the value of the time
series at time t − 1; ϕ – the coefficient of the lagged variable; ϵ<sub>t</sub> – a white noise error
term (independently and identically distributed with zero mean).</p>
        <p>The Dickey-Fuller test defines the following null (H<sub>0</sub>) and alternative (H<sub>A</sub>)
hypotheses:</p>
        <list list-type="bullet">
          <list-item>
            <p>H<sub>0</sub>: ϕ = 1 (the series has a unit root and is non-stationary);</p>
          </list-item>
          <list-item>
            <p>H<sub>A</sub>: ϕ &lt; 1 (the series does not have a unit root and is stationary).</p>
          </list-item>
        </list>
        <p>The Dickey-Fuller test is based on a transformed version of the AR(1) model [31], obtained by
subtracting y<sub>t−1</sub> from both sides of (7) and optionally adding deterministic terms:</p>
        <disp-formula>
          <label>(8)</label>
          <tex-math>\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \epsilon_t,</tex-math>
        </disp-formula>
        <p>where: Δy<sub>t</sub> = y<sub>t</sub> − y<sub>t−1</sub> – the first difference of the series; α – a
constant (optional) to account for a non-zero mean; βt – a trend term (optional) to account for a
deterministic trend; γ = ϕ − 1 – the coefficient that measures the presence of a unit root.</p>
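        <p>Thus, the null and alternative hypotheses are expressed in terms of γ:</p>
        <list list-type="bullet">
          <list-item>
            <p>H<sub>0</sub>: γ = 0 (ϕ = 1);</p>
          </list-item>
          <list-item>
            <p>H<sub>A</sub>: γ &lt; 0 (ϕ &lt; 1).</p>
          </list-item>
        </list>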
        <p>The test statistic is calculated as:</p>
        <disp-formula>
          <label>(9)</label>
          <tex-math>\tau = \frac{\hat{\gamma}}{\mathrm{SE}(\hat{\gamma})},</tex-math>
        </disp-formula>
        <p>where: γ̂ – the estimated value of γ; SE(γ̂) – the standard error of γ̂.</p>
        <p>This statistic is compared with critical values of the Dickey-Fuller distribution; the standard
t-distribution cannot be used, because τ does not follow it under H<sub>0</sub>. If τ is less than the
critical value, H<sub>0</sub> is rejected and the series is considered stationary.</p>
        <p>If τ is greater than the critical value, H<sub>0</sub> cannot be rejected and the series is considered
non-stationary.</p>
        <p>The augmented Dickey-Fuller (ADF) test is an extension of this test that includes higher-order
lagged differences of y<sub>t</sub> to account for autocorrelation in the error terms.</p>
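        <p>The regression (8) and the statistic (9) can be computed directly with ordinary least squares, as in the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: it omits the lagged differences of the augmented test, the helper names are hypothetical, and it does not compute p-values. Critical values must be taken from Dickey-Fuller tables and depend on which deterministic terms are included; the value −2.86 used in the example is the approximate large-sample 5% critical value for a regression with a constant but no trend. In practice, <monospace>adfuller</monospace> from <monospace>statsmodels.tsa.stattools</monospace> implements the augmented test with p-values and critical values, and its <monospace>regression</monospace> argument selects the deterministic terms.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def dickey_fuller_tau(y, constant=True, trend=False):
    """Hypothetical helper: tau = gamma_hat / SE(gamma_hat) from regression (8),
    Delta y_t = alpha + beta*t + gamma*y_{t-1} + eps_t, without lagged differences."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                          # Delta y_t for t = 1..n-1
    columns = []
    if constant:
        columns.append(np.ones(len(dy)))                     # alpha
    if trend:
        columns.append(np.arange(1, len(y), dtype=float))    # beta * t
    columns.append(y[:-1])                                   # y_{t-1}; coefficient is gamma
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - X.shape[1])          # residual variance
    se_gamma = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return float(coef[-1] / se_gamma)


def reject_unit_root(tau, critical_value):
    """H0 (unit root) is rejected when tau is below the negative critical value."""
    return tau < critical_value


# Illustrative simulated series (not sea ice data)
rng = np.random.default_rng(0)
eps = rng.normal(size=500)
random_walk = np.cumsum(eps)                 # phi = 1: unit root
ar1 = np.zeros(500)
for i in range(1, 500):
    ar1[i] = 0.2 * ar1[i - 1] + eps[i]       # phi = 0.2: stationary
for name, series in [("random walk", random_walk), ("AR(1), phi=0.2", ar1)]:
    tau = dickey_fuller_tau(series)
    print(name, round(tau, 2), "reject H0:", reject_unit_root(tau, -2.86))
```
]]></preformat>
        <p>The simulated random walk typically yields a τ above the critical value (H<sub>0</sub> not rejected), whereas the stationary AR(1) series yields a strongly negative τ.</p>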
      </sec>
      <sec id="sec-3-6">
        <title>3.4. Data analysis by machine learning techniques</title>
        <p>Because statistical analysis has limitations, in particular the inability to capture hidden patterns,
it does not provide a holistic understanding of the processes. Therefore, the data were also analyzed
using unsupervised machine learning methods [32, 33], namely K-Means clustering, Density-Based
Spatial Clustering of Applications with Noise (DBSCAN), and hierarchical clustering. These methods
provide more information on trends in Sea Ice extent and on anomalies in the data [34].</p>
      </sec>
      <sec id="sec-3-7">
        <title>3.4.1. K-Means clustering</title>
        <p>The K-Means algorithm divides n data points into k clusters, minimizing the variance within each
cluster [35].</p>
        <p>The first step of this method is initialization: the initial cluster centroids μ<sub>j</sub> are selected
randomly. After that, each data point x<sub>i</sub> is assigned to the nearest centroid based on the
Euclidean distance:</p>
        <disp-formula>
          <label>(10)</label>
          <tex-math>d(x_i, x_j) = \lVert x_i - x_j \rVert,</tex-math>
        </disp-formula>
        <p>where: x<sub>i</sub> and x<sub>j</sub> – points. The assignment step forms the clusters:</p>
        <disp-formula>
          <label>(11)</label>
          <tex-math>C_j = \{\, x_i : \lVert x_i - \mu_j \rVert^2 \le \lVert x_i - \mu_m \rVert^2 \ \forall m \,\},</tex-math>
        </disp-formula>
        <p>where: C<sub>j</sub> – the set of points assigned to cluster j; μ<sub>m</sub> – the centroid of
cluster m.</p>
        <p>The update step calculates a new centroid for each cluster:</p>
        <disp-formula>
          <label>(12)</label>
          <tex-math>\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i,</tex-math>
        </disp-formula>
        <p>where: |C<sub>j</sub>| – the number of points in cluster j.</p>
        <p>The assignment and update steps are repeated until the centroids no longer change significantly or
a fixed number of iterations is reached.</p>
      </sec>
      <sec id="sec-3-8">
        <title>3.4.2. Density-based spatial clustering of applications with noise</title>
        <p>DBSCAN clusters points based on density, so the method is robust for data that form irregular
shapes [36].</p>
        <p>The ε-neighborhood N<sub>ε</sub>(p) of a point p in the dataset D is determined as:</p>
        <disp-formula>
          <label>(13)</label>
          <tex-math>N_\varepsilon(p) = \{\, q \in D : \lVert q - p \rVert \le \varepsilon \,\},</tex-math>
        </disp-formula>
        <p>where: q and p – points; ε – the maximum neighborhood radius.</p>
        <p>A point p is a core point if at least MinPts points (including p itself), the minimum number of points
required to form a dense region, lie within its ε-neighborhood:</p>
        <disp-formula>
          <label>(14)</label>
          <tex-math>|N_\varepsilon(p)| \ge \mathit{MinPts}.</tex-math>
        </disp-formula>
        <p>A point q is directly density-reachable from p if p is a core point and</p>
        <disp-formula>
          <label>(15)</label>
          <tex-math>q \in N_\varepsilon(p).</tex-math>
        </disp-formula>
        <p>A point r is density-reachable from p if there exists a chain of points from p to r in which each point
is directly density-reachable from the previous one.</p>
        <p>The algorithm chooses an unvisited point and checks whether it is a core point. If it is, a new cluster
is created and expanded with all points that are density-reachable from it. If it is not, the point is
marked as noise (it may later be added to a cluster as a border point if it is reached from a core
point). The procedure is repeated until all points have been visited.</p>
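        <p>The K-Means steps (initialization, assignment, and update) and the DBSCAN expansion can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, the pairwise distance matrix makes memory grow quadratically with the number of points (acceptable for a few hundred monthly values), and the parameters k, ε, and MinPts must be chosen by the analyst. Production code would normally use library implementations such as <monospace>KMeans</monospace> and <monospace>DBSCAN</monospace> in <monospace>sklearn.cluster</monospace>.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def _nearest(X, centroids):
    # Assignment step: index of the nearest centroid (squared Euclidean distance)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=100, tol=1e-9, seed=0):
    """Hypothetical helper: Lloyd's K-Means with random initialization."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization: k distinct data points serve as the first centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _nearest(X, centroids)
        # Update step: mean of each cluster (an empty cluster keeps its old centroid)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        shift = np.abs(new - centroids).max()
        centroids = new
        if shift <= tol:                  # centroids no longer change significantly
            break
    return _nearest(X, centroids), centroids


def dbscan(X, eps, min_pts):
    """Hypothetical helper: DBSCAN labels; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]  # N_eps(p), includes p
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(n, -1)               # every point starts as noise
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for p in range(n):
        if visited[p] or not is_core[p]:
            continue
        # New cluster from an unvisited core point, expanded via density-reachable points
        visited[p] = True
        labels[p] = cluster
        stack = list(neighbors[p])
        while stack:
            q = stack.pop()
            if labels[q] == -1:
                labels[q] = cluster       # border or core point joins the cluster
            if is_core[q] and not visited[q]:
                visited[q] = True
                stack.extend(neighbors[q])  # only core points extend the cluster
        cluster += 1
    return labels


# Illustrative 2-D points: two tight groups and one isolated point (not real data)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [10.0, -10.0]])
print(kmeans(X[:6], k=2)[0])
print(dbscan(X, eps=0.5, min_pts=3))
# DBSCAN: label 0 for the first group, 1 for the second, -1 (noise) for the isolated point
```
]]></preformat>
        <p>If ε is large relative to the spread of the data, every point becomes a core point and DBSCAN returns a single cluster with no noise, so the result depends strongly on the chosen parameters.</p>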
      </sec>
      <sec id="sec-3-9">
        <title>3.4.3. Hierarchical clustering</title>
        <p>Hierarchical clustering creates a hierarchy of clusters using either an agglomerative (bottom-up)
or a divisive (top-down) approach [37].</p>
        <p>Agglomerative hierarchical clustering (bottom-up) is based on a distance matrix, which is formed by
calculating the pairwise Euclidean distances between all points. Initially, each point is defined as a
separate cluster. Then, the two closest clusters are iteratively merged according to one of the linkage
criteria: single linkage (16), complete linkage (17), or average linkage (18):</p>
        <disp-formula>
          <label>(16)</label>
          <tex-math>d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y),</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(17)</label>
          <tex-math>d(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x, y),</tex-math>
        </disp-formula>
        <disp-formula>
          <label>(18)</label>
          <tex-math>d(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y),</tex-math>
        </disp-formula>
        <p>where: x and y – points of different clusters. The merging continues until one cluster remains, and
the sequence of merges is represented as a dendrogram.</p>
        <p>Divisive hierarchical clustering (top-down) starts with a single cluster that contains all points. The
clusters are split recursively until each point becomes a separate cluster.</p>
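        <p>A naive agglomerative procedure with the single, complete, and average linkages (16)–(18) can be sketched as below. It is an illustration under stated assumptions, not the authors' implementation: the helper name is hypothetical, it stops once the requested number of clusters remains instead of building the full dendrogram, and its cubic running time suits only small datasets. Efficient implementations that record the full merge sequence are available, e.g. <monospace>scipy.cluster.hierarchy.linkage</monospace> with <monospace>method="single"</monospace>, <monospace>"complete"</monospace>, or <monospace>"average"</monospace>, combined with <monospace>fcluster</monospace> to cut the dendrogram.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np

# Linkage criteria (16)-(18): reduce the cross-cluster distances to one number
LINKAGES = {"single": np.min, "complete": np.max, "average": np.mean}


def agglomerative(X, n_clusters, linkage="single"):
    """Hypothetical helper: bottom-up clustering until n_clusters clusters remain."""
    X = np.asarray(X, dtype=float)
    # Distance matrix of pairwise Euclidean distances between all points
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    reduce = LINKAGES[linkage]
    clusters = [[i] for i in range(len(X))]   # each point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = reduce(D[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])        # merge the two closest clusters
        del clusters[b]                        # b > a, so index a stays valid
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


# Illustrative 1-D values (not real data)
X = np.array([[0.0], [0.2], [0.4], [5.0], [5.3], [10.0]])
for method in LINKAGES:
    print(method, agglomerative(X, n_clusters=3, linkage=method))
# each method prints [0 0 0 1 1 2]
```
]]></preformat>
        <p>The three linkages agree on this well-separated example; on real data, single linkage tends to form elongated chains, whereas complete and average linkage favour more compact clusters.</p>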
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and discussions</title>
      <p>To assess the fluctuations in Sea Ice Extent, the data were aggregated by calculating monthly
averages (Fig. 1, 2). It is clear from the chart that for the Northern Hemisphere (Fig. 1) the minimum
is observed in September, and the maximum in March, or sometimes in February. At the same time,
for the Southern Hemisphere (Fig. 2), the sea ice extent reaches its maximum values in September,
and its minimum values in February.</p>
      <p>To consider the problem of sea ice extent changes in more detail, we will perform an
approximation and construct straight lines that will demonstrate the main trends (Fig. 3, 4). The
calculated angles between the resulting straight lines and the time axis are given in Table 2.</p>
      <p>Table 1 shows the maximum, minimum and average values of sea ice extent for each month, with the
years in which the highest and lowest values were recorded during the period from October 1978 to
May 2019. In the Northern Hemisphere, the maxima were all recorded in the 20th century, while the
minima were recorded between 2016 and 2019. For the Southern Hemisphere, the situation is less
critical.
The calculated angles (Table 2) show a significant decrease in sea ice extent in the Northern
Hemisphere, especially in September and October. No such trends are recorded in the Southern
Hemisphere, where the slope is always positive and close to zero, indicating a gradual, slight increase
in sea ice extent.</p>
      <p>To examine regional trends by month, the time series data were decomposed into trends,
seasonality, and residuals. This decomposition plays a key role, as it helps to break down the
underlying patterns and fluctuations in the data, as well as provide a clearer understanding of the
process being analyzed.</p>
      <p>Trends are used to predict future values based on previous data, which is extremely important
when predicting climate indicators based on historical data.</p>
      <p>Trends provide insight into future developments, allowing governments and organizations to
plan and develop adaptation and mitigation strategies. Thus, if the trend indicates a rise in sea level
due to melting ice, cities can prepare for potential flooding or inundation.</p>
      <p>Residual analysis allows us to assess how well the model has captured the data. If the residuals
show patterns (e.g., autocorrelation), the model may need adjustment or additional features to make
more accurate predictions. Large residuals at a particular time may indicate an anomaly or event that
was not captured by the trend or seasonal patterns; for example, a sudden spike in sea ice extent due
to extreme weather would appear as a residual anomaly.</p>
      <p>Linear trends of Sea Ice Extent for each month over the years are shown in Figure 3 for the
Northern Hemisphere and in Figure 4 for the Southern Hemisphere. Figures 3 and 4 show that Sea Ice
extent in the Arctic region has a decreasing trend, while in the Antarctic region the indicators remained
stable from 1979 to 2019.</p>
      <p>To determine the rate of change in sea ice extent, a linear approximation was performed and the
angles between the fitted lines and the abscissa (time) axis were determined. The results of the
calculations are given in Table 2.</p>
      <p>Figure 5 shows the decomposition of the time series into its main components (trend, seasonality,
and residuals) for September in the Northern Hemisphere, when the annual minimum of Sea Ice extent
is observed. The residuals appear random, indicating that the decomposition has successfully isolated
the trend and seasonality.</p>
      <p>The results of the Dickey-Fuller test are presented in Table 3. For the Northern Hemisphere, the test
statistic (−1.0269) is greater than all critical values (−3.4442, −2.8676, −2.5700), and the p-value
(0.7433) is much greater than the typical significance level (0.05). Therefore, the null hypothesis
(H<sub>0</sub>) that the time series has a unit root cannot be rejected, and the time series of Northern
Hemisphere Sea Ice extent is non-stationary. This means that its statistical properties (mean,
variance, autocorrelation) may change over time, so further transformation (e.g., differencing) may be
necessary to make the time series stationary before forecasting.</p>
      <p>For the Southern Hemisphere, the test statistic (−3.4721) is less than the critical value at the 1%
significance level (−3.4443), and the p-value (0.0087) is less than 0.05. Thus, the null hypothesis
(H<sub>0</sub>) is rejected: the time series does not have a unit root, and the Southern Hemisphere Sea
Ice extent series is stationary. Its statistical properties can therefore be considered constant over
time.</p>
      <p>To better understand patterns and detect anomalies in the data, a clustering analysis was performed
using unsupervised learning techniques.</p>
      <p>The result of K-Means clustering for the Northern Hemisphere shows three distinct clusters (Figure
6). Cluster 0, highlighted in blue, contains high values of Sea Ice extent, from 12 to 16 million km²;
these values were dominant before 2000. Cluster 1, highlighted in green, covers values between 6 and
11 million km²; this intermediate cluster indicates a gradual transition from higher to lower extents.
Cluster 2, highlighted in orange, contains low values of Sea Ice extent, which become dominant after
the early 2000s. The transition from Cluster 0 to Cluster 1 and then to Cluster 2 shows a steady decline
in Sea Ice extent over time.</p>
      <p>For the Southern Hemisphere, K-Means clustering reveals different patterns (Figure 7) than for the
Northern Hemisphere.</p>
      <p>There are also three distinct clusters. Cluster 0 highlighted in blue demonstrates low Sea Ice
extent. Cluster 1 highlighted in orange shows intermediate values, which become dominant after
2000. Cluster 2 highlighted in green represents high values, mostly before the 2000s, then declining
over time.</p>
      <p>Unlike the Arctic, where a tendency toward ice loss is observed, the situation in the Antarctic is more
stable, with slightly increasing trends in some periods.</p>
      <p>The transition from Cluster 2 to Cluster 1 suggests seasonal fluctuations rather than a clear long-term
decline as in the Northern Hemisphere.</p>
      <p>The clustering suggests that Sea Ice extent fluctuations in the Southern Hemisphere are more
complex and may contain hidden patterns.</p>
      <p>Clustering with the DBSCAN method did not yield significant results: with the parameters used, the
method did not distinguish any groups and assigned all values to a single cluster for both the Northern
(Figure 8) and Southern (Figure 9) Hemispheres. This result indicates that no anomalous values (noise
points) were detected in either hemisphere.</p>
      <p>Hierarchical clustering for the Northern Hemisphere (Figure 9) detects three clusters. Cluster 0,
highlighted in blue, contains low values, i.e., the months with low Sea Ice extent across all years.
Cluster 1 contains intermediate values; the months with the highest extent in each year show a
decreasing tendency. Cluster 2, highlighted in green, contains high values of Sea Ice extent up to the
early 1990s. The transition from Cluster 2 to Cluster 1 reflects the main trend: the annual maximum of
Sea Ice extent is decreasing.</p>
      <p>The results of hierarchical clustering for the Southern Hemisphere are presented in Figure 10. Three
clusters are detected. Cluster 0, highlighted in blue, contains low values, i.e., the months with the
yearly minimum of Sea Ice extent. Cluster 1, highlighted in orange, contains values with high
amplitude, which were mostly observed after the mid-1990s. Cluster 2 contains values with lower
amplitude from earlier years. The transition from Cluster 2 to Cluster 1 shows an increasing amplitude
and a growing difference between the months with high values of Sea Ice extent: maximum values are
increasing, while intermediate values are decreasing. Thus, the Southern Hemisphere shows more
complex patterns than the Northern Hemisphere.</p>
      <p>The clustering analysis shows that Antarctic Sea Ice extent has complex and potentially hidden
patterns, so this region requires more detailed study. Arctic Sea Ice extent is decreasing rapidly, and
urgent action is needed to avoid negative consequences.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Future work</title>
      <p>Based on the obtained results, transformations such as differencing, logarithmic transformation, or
seasonal decomposition should be applied to the Northern Hemisphere Sea Ice extent series, followed
by a repeated test to confirm stationarity. The Southern Hemisphere series needs no further
transformation and is ready for time series modeling and forecasting. Stationarity is a fundamental
assumption of most time series models (e.g., ARIMA, which requires it of the differenced series, and
Vector AutoRegression). Although machine learning techniques can work with both stationary and
non-stationary data, removing trends and seasonality can improve their forecasts. Thus, future work
should include a comparative analysis of the forecasts of traditional and machine learning models to
determine which are more accurate.</p>
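        <p>The differencing mentioned above can be sketched as follows, together with its inverse, which is needed to convert forecasts of the differenced series back to the original scale. This is a minimal illustration, not the authors' procedure: the helper names are hypothetical, a lag of 1 gives the first difference and a lag of 12 the seasonal difference of monthly data, and a logarithmic transformation would be applied with <monospace>np.log</monospace> before differencing (positive values only). After the transformation, stationarity would be re-tested, for example with <monospace>adfuller</monospace> from <monospace>statsmodels.tsa.stattools</monospace>.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def difference(y, lag=1):
    """Hypothetical helper: y_t - y_{t-lag} (lag=1: first difference, lag=12: seasonal)."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]


def undifference(d, initial, lag=1):
    """Hypothetical helper: rebuild the series from its differences and first `lag` values."""
    out = list(np.asarray(initial, dtype=float))
    for i, step in enumerate(d):
        out.append(out[i] + step)        # y_t = y_{t-lag} + (y_t - y_{t-lag})
    return np.array(out)


y = np.array([3.0, 5.0, 4.0, 8.0, 7.0, 9.0])    # illustrative values, not real data
d1 = difference(y)
print(d1)                                        # [ 2. -1.  4. -1.  2.]
print(undifference(d1, y[:1]))                   # [3. 5. 4. 8. 7. 9.]
```
]]></preformat>
        <p>Keeping the first <monospace>lag</monospace> original values is what makes the transformation reversible, which matters when forecasts produced on the differenced scale must be reported as Sea Ice extent.</p>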
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Comprehensive statistical analysis of time series makes it possible to identify the main trends in
indicators over time. Decomposition into the main components (trend, seasonality, residuals) reveals
long-term trends, seasonal fluctuations, and potential anomalies in the data. Checking the data for
stationarity using the Dickey-Fuller test shows whether further transformation is needed in
preparation for forecasting. The study analyzes fluctuations in Sea Ice extent by identifying monthly
trends and identifies the main problems in the Arctic and Antarctic regions. The calculations provide a
basis for further research and forecasting of Sea Ice extent. The clustering analysis shows rapid changes
in Sea Ice extent in the Northern Hemisphere; thus, the Arctic region is vulnerable to climate change and
needs urgent climate action. Sea Ice extent in the Southern Hemisphere is more stable. Longer-term
observation and more complex analysis are needed to reveal hidden patterns in Antarctic Sea Ice
extent.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>
        During the preparation of this work, the authors used Grammarly for grammar and spelling checks and
DeepL Translate to translate some phrases into English. After using these tools/services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.
      </p>
      <p>[19] H. Cao, Y. Zhou, X. Jia, Y. Li. (2024). Analysis of Sea Ice Area Fluctuation in the Arctic Circle Based on Big Data and SARIMA Model. 2024 International Conference on Electrical Drives, Power Electronics &amp; Engineering (EDPEE), Athens, Greece, 376-380.</p>
      <p>[20] M. Seo, H.-C. Kim. (2024). Arctic Greening Trends: Change Points in Satellite-Derived Normalized Difference Vegetation Indexes and Their Correlation with Climate Variables over the Last Two Decades. Remote Sensing, 16(7), 1160. https://doi.org/10.3390/rs16071160.</p>
      <p>[21] T. Kocsis, R. Pongrácz, I. G. Hatvani, N. Magyar, A. Anda, I. Kovács-Székely. (2024). Seasonal trends in the Early Twentieth Century Warming (ETCW) in a centennial instrumental temperature record from Central Europe. Hungarian Geographical Bulletin, 73(1), 3–16. https://doi.org/10.15201/hungeobull.73.1.1.</p>
      <p>[22] X. Huo, N. Sun, L. Ma. (2024). MA-BLTSI model for Land Surface Temperature prediction based on multi-dimensional data. Theoretical and Applied Climatology, 155(7), 6119–6136. https://doi.org/10.1007/s00704-024-05009-2.</p>
      <p>[23] S. Dixit, K. K. Pandey. (2024). Spatiotemporal variability identification and analysis for nonstationary climatic trends for a tropical river basin of India. Journal of Environmental Management, 365, 121692. https://doi.org/10.1016/j.jenvman.2024.121692.</p>
      <p>[24] J. L. Ng, Y. F. Huang, S. L. S. Yong, J. C. Lee, A. N. Ahmed, M. Mirzaei. (2024). Analysing the variability of non-stationary extreme rainfall events amidst climate change in East Malaysia. AQUA – Water Infrastructure Ecosystems and Society, 73(7), 1494–1509. https://doi.org/10.2166/aqua.2024.132.</p>
      <p>[25] M. T. Islam, A. S. M. M. Kamal, M. M. Islam, S. Hossain. (2024). Impact of climate change on dengue incidence in Singapore: time-series seasonal analysis. International Journal of Environmental Health Research, 34(12), 3988–3998. https://doi.org/10.1080/09603123.2024.2337827.</p>
      <p>[26] Daily Sea Ice extent Data. (2019, June 10). Kaggle. https://www.kaggle.com/datasets/nsidcorg/daily-sea-ice-extent-data.</p>
      <p>[27] Data and image archive. (2025). National Snow and Ice Data Center. https://nsidc.org/data/seaice_index/data-and-image-archive.</p>
      <p>[28] Climate change indicators: Arctic Sea Ice | US EPA. (2025, January 17). US EPA. https://www.epa.gov/climate-indicators/climate-change-indicators-arctic-sea-ice.</p>
      <p>[29] Current State of Sea Ice Cover | Earth. (2025, January 23). https://earth.gsfc.nasa.gov/cryo/data/current-state-sea-ice-cover.</p>
      <p>[30] J. Korstanje. (2023, July 31). How to select a model for your Time Series Prediction Task [Guide]. neptune.ai. https://neptune.ai/blog/select-model-for-time-series-prediction-task.</p>
      <p>[31] The Pennsylvania State University. (2024). Applied Time Series Analysis. Sample ACF and Properties of AR(1) Model. PennState Eberly College of Science.</p>
      <p>[32] H. Duan, Q. Li, L. He, J. Zhang, H. An, R. Ali, M. Vazifedoust. (2024). Climate classification for major cities in China using cluster analysis. Atmosphere, 15(7), 741.</p>
      <p>[33] K. Sun, T. Lan, Y. M. Goh, S. Safiena, Y. Huang, B. Lytle, Y. He. (2023). An interpretable clustering approach to safety climate analysis: Examining driver group distinctions. Accident Analysis &amp; Prevention, 196, 107420. https://doi.org/10.1016/j.aap.2023.107420.</p>
      <p>[34] M. Ali, P. Scandurra, F. Moretti, H. H. R. Sherazi. (2024). Anomaly Detection in Public Street Lighting Data Using Unsupervised Clustering. IEEE Transactions on Consumer Electronics, 70(1), 4524-4535. https://doi.org/10.1109/TCE.2024.3354189.</p>
      <p>[35] M. Suyal, S. Sharma. (2024). A Review on Analysis of K-Means Clustering Machine Learning Algorithm based on Unsupervised Learning. Journal of Artificial Intelligence and Systems, 6(1), 85–95. https://doi.org/10.33969/ais.2024060106.</p>
      <p>[36] K. Aurangzeb. (2024). DBSCAN-based energy users clustering for performance enhancement of deep learning model. Journal of Intelligent &amp; Fuzzy Systems, 46(3), 5555–5573.</p>
      <p>[37] D. Mavaluru, R. S. Malar, S. M. Dharmarajlu, J. P. L. Auguskani, A. Chellathurai. (2024). Deep hierarchical cluster analysis for assessing the water quality indicators for sustainable groundwater. Groundwater for Sustainable Development, 25, 101119.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>World Meteorological Organization</collab>
          .
          (
          <year>2025</year>
          ). https://wmo.int.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Turner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Harrison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Reeves-Francois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fogt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Thomas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Bajish</surname>
          </string-name>
          .
          (
          <year>2022</year>
          ).
          <article-title>Record low Antarctic Sea ice cover in February 2022</article-title>
          . Geophysical Research Letters,
          <volume>49</volume>
          (
          <issue>12</issue>
          ). https://doi.org/10.1029/2022gl098904.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N. R.</given-names>
            <surname>Njeri</surname>
          </string-name>
          . (
          <year>2022</year>
          ).
          <article-title>Data preparation for machine learning modelling</article-title>
          .
          <source>International Journal of Computer Applications Technology and Research</source>
          ,
          <volume>11</volume>
          (
          <issue>06</issue>
          ),
          <fpage>231</fpage>
          -
          <lpage>235</lpage>
          . https://doi.org/10.7753/ijcatr1106.1008.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Hryhoruk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Grygoruk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Khrushch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hovorushchenko</surname>
          </string-name>
          .
          (
          <year>2020</year>
          ).
          <article-title>Using non-metric multidimensional scaling for assessment of regions' economy in the context of their sustainable development</article-title>
          .
          <source>CEUR-WS</source>
          , Vol.
          <volume>2713</volume>
          ,
          <fpage>315</fpage>
          -
          <lpage>333</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pavlova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Alekseiko</surname>
          </string-name>
          .
          (
          <year>2024</year>
          ).
          <article-title>The concept of an information system for forecasting the temperature regime of the Earth's surface based on machine learning</article-title>
          .
          <source>Computer Systems and Information Technologies</source>
          ,
          <volume>2</volume>
          ,
          <fpage>6</fpage>
          -
          <lpage>13</lpage>
          . https://doi.org/10.31891/csit-2024-2-1.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hovorushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Alekseiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Levashenko</surname>
          </string-name>
          .
          (
          <year>2025</year>
          ).
          <article-title>Machine learning methods' comparison for land surface temperatures forecasting due to climate classification</article-title>
          .
          <source>CEUR-WS</source>
          , Vol.
          <volume>3899</volume>
          ,
          <fpage>55</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hovorushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Alekseiko</surname>
          </string-name>
          .
          (
          <year>2024</year>
          ).
          <article-title>Land surface temperature forecasting in the context of the development of sustainable cities and communities</article-title>
          .
          <source>Computer Systems and Information Technologies</source>
          ,
          <volume>3</volume>
          ,
          <fpage>6</fpage>
          -
          <lpage>13</lpage>
          . https://doi.org/10.31891/csit-2024-3-1.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Alekseiko</surname>
          </string-name>
          . (
          <year>2024</year>
          ).
          <article-title>Web-based information system for land surface temperature forecasting using machine learning methods</article-title>
          .
          <source>Science and Technology Today</source>
          ,
          <volume>10</volume>
          (
          <issue>38</issue>
          ),
          <fpage>17</fpage>
          -
          <lpage>27</lpage>
          . https://doi.org/10.52058/2786-6025-2024-10(38)-17-27.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hovorushchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pavlova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Alekseiko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuzmin</surname>
          </string-name>
          .
          (
          <year>2024</year>
          ).
          <article-title>Climate parameters monitoring in the context of natural disasters forecasting</article-title>
          .
          In
          <source>2024 IEEE 14th International Conference on Dependable Systems, Services and Technologies</source>
          , Athens, Greece, October 11-13, 2024.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          (
          <year>2022</year>
          ).
          <article-title>Causes of extreme 2020 Meiyu-Baiu rainfall: a study of combined effect of Indian Ocean and Arctic</article-title>
          .
          <source>Climate Dynamics</source>
          ,
          <volume>59</volume>
          (
          <issue>11-12</issue>
          ),
          <fpage>3485</fpage>
          -
          <lpage>3501</lpage>
          . https://doi.org/10.1007/s00382-022-06279-0.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kelder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Palerme</surname>
          </string-name>
          .
          (
          <year>2022</year>
          ).
          <article-title>Decline of sea-ice in the Greenland Sea intensifies extreme precipitation over Svalbard</article-title>
          .
          <source>Weather and Climate Extremes</source>
          ,
          <volume>36</volume>
          , 100437. https://doi.org/10.1016/j.wace.2022.100437.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundaram</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Holland</surname>
          </string-name>
          .
          (
          <year>2022</year>
          ).
          <article-title>A physical mechanism for the Indian summer Monsoon-Arctic Sea-Ice teleconnection</article-title>
          .
          <source>Atmosphere</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ), 566. https://doi.org/10.3390/atmos13040566.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sun</surname>
          </string-name>
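          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cheng</surname>
          </string-name>
          .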
          (
          <year>2020</year>
          ).
          <article-title>Possible role of Southern Hemispheric sea ice in the variability of West China autumn rain</article-title>
          .
          <source>Atmospheric Research</source>
          ,
          <volume>249</volume>
          , 105329. https://doi.org/10.1016/j.atmosres.2020.105329.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Henke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Souza De Lima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Ravens</surname>
          </string-name>
          .
          (
          <year>2024</year>
          ).
          <article-title>Increasing coastal exposure to extreme wave events in the Alaskan Arctic as the open water season expands</article-title>
          .
          <source>Communications Earth &amp; Environment</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ). https://doi.org/10.1038/s43247-024-01323-9.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>The cryosphere is the frozen water part of the Earth system</article-title>
          . (
          <year>2025</year>
          ). https://oceanservice.noaa.gov/facts/sea-ice-climate.html.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Fretwell</surname>
          </string-name>
          .
          (
          <year>2024</year>
          ).
          <article-title>A 6 year assessment of low sea-ice impacts on emperor penguins</article-title>
          .
          <source>Antarctic Science</source>
          ,
          <volume>36</volume>
          (
          <issue>1</issue>
          ),
          <fpage>3</fpage>
          -
          <lpage>5</lpage>
          . https://doi.org/10.1017/S0954102024000130.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Waterman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Shaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Bergstrom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Lynch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Wall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Robinson</surname>
          </string-name>
          .
          (
          <year>2022</year>
          ).
          <article-title>Islands in the ice: Potential impacts of habitat transformation on Antarctic biodiversity</article-title>
          .
          <source>Global Change Biology</source>
          ,
          <volume>28</volume>
          (
          <issue>20</issue>
          ),
          <fpage>5865</fpage>
          -
          <lpage>5880</lpage>
          . https://doi.org/10.1111/gcb.16331.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Revell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mackenzie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ossola</surname>
          </string-name>
          .
          (
          <year>2024</year>
          ).
          <article-title>Extended ozone depletion and reduced snow and ice cover - Consequences for Antarctic biota</article-title>
          .
          <source>Global Change Biology</source>
          ,
          <volume>30</volume>
          (
          <issue>4</issue>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>