<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Geo-referenced Time Series: A Copula-Based Approach for Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessia Benevento</string-name>
          <email>alessia.benevento@unisalento.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Durante</string-name>
          <email>fabrizio.durante@unisalento.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberta Pappadà</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dipartimento di Matematica e Fisica “Ennio De Giorgi”, Università del Salento</institution>
          ,
          <addr-line>Lecce</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dipartimento di Scienze Economiche, Aziendali, Matematiche e Statistiche “B. de Finetti”, Università degli Studi di Trieste</institution>
          ,
          <addr-line>Trieste</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Time series clustering plays a crucial role in managing and extracting knowledge from the vast and complex Earth Observation (EO) datasets such as satellite-derived temperatures, precipitation levels, or soil-related variables. This study explores copula-based clustering techniques that focus on temporal dependence structures among time series, rather than their marginal behavior, to detect patterns of comovement in environmental variables. Applied to summer maximum temperatures in Italy, the approach reveals spatially coherent clusters that reflect underlying climatic regimes. However, when applied to monthly maximum precipitation data, clustering based solely on temporal dependence yields fragmented and geographically inconsistent results. To address this, we introduce a method that incorporates spatial proximity via soft constraints, combining temporal and spatial-based dissimilarities through a tunable mixing parameter. Our results demonstrate that including spatial information can significantly improve cluster coherence and interpretability, particularly for variables with strong geographic variability. Applications are based on EO data from the Copernicus Climate Data Store.</p>
      </abstract>
      <kwd-group>
        <kwd>Time Series Clustering</kwd>
        <kwd>Spatio-Temporal Data</kwd>
        <kwd>Dependence Modeling</kwd>
        <kwd>Copula Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Time series clustering plays a crucial role in managing and extracting knowledge from the vast and
complex Earth Observation (EO) datasets such as satellite-derived temperatures, precipitation levels, or
soil-related variables. These data are often collected almost continuously over time and across thousands
of spatial locations. The growing availability of high-resolution EO data provides unique opportunities
for understanding complex environmental processes. However, these data are often high-dimensional or
spatially heterogeneous, presenting significant challenges for automated analysis and reuse.</p>
      <p>In this context, time series clustering emerges as a key unsupervised learning strategy to manage
complexity and extract meaningful structure from large-scale temporal datasets. By grouping time
series with similar temporal behavior, clustering helps uncover regional patterns, reduce dimensionality,
and support the design of interpretable models. This is particularly valuable in EO applications, where
thousands of gridded time series must be analyzed jointly, such as temperature or precipitation over
different locations.</p>
      <p>
        One increasingly relevant direction in this area involves clustering based on cross-sectional
dependence: identifying sets of time series that exhibit comovement, meaning they tend to increase or
decrease together over time, even if their individual marginal behaviors differ. This type of dependence
is particularly important for capturing joint climate dynamics, especially those related to joint extremes, e.g.,
maxima of precipitation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and temperature [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or for modeling flood risks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Copula-based clustering
methods for time series (see, e.g., [4] and references therein) have naturally appeared in this context to
focus exactly on the dependence among different time series regardless of the marginal behavior, rather
than comparing raw values or global features.
      </p>
      <p>Workshop on AI-driven Data Engineering and Reusability for Earth and Space Sciences (DARES’25), co-located with the 28th</p>
      <p>CEUR Workshop, ISSN 1613-0073</p>
      <p>These methods rely on a rank-invariant dissimilarity measure that quantifies how close the underlying
copula is to the comonotonic case, representing perfect positive dependence.</p>
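      <p>To make the comonotonic reference case concrete, the display below recalls the comonotonic copula M (the Fréchet-Hoeffding upper bound) and writes down one possible distance from it. The L2 form of the distance is an assumption made here for illustration; the source does not state which distance is used.</p>
      <preformat>
```latex
% M is the comonotonic copula; every bivariate copula C lies below it pointwise.
M(u, v) = \min(u, v), \qquad C(u, v) \le M(u, v) \quad \text{for all } (u, v) \in [0, 1]^2 .

% Assumed illustrative dissimilarity: the L^2 gap between C and M.
d(C, M) = \left( \int_0^1 \int_0^1 \bigl( M(u, v) - C(u, v) \bigr)^2 \, du \, dv \right)^{1/2}
```
      </preformat>
      <p>Because a copula is unchanged by strictly increasing transformations of the marginals, any dissimilarity that depends on the data only through C is rank-invariant, and d(C, M) = 0 exactly when C = M.</p>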
      <p>Additionally, beyond time-dependent features, EO data come with deterministic (spatial)
information such as latitude, longitude, or altitude. When general-purpose clustering methods are used
for clustering geo-referenced time series, the resulting clusters tend to be scattered over the spatial domain of
the study. Thus, numerous studies have emphasized the importance of including spatial information for
geographically referenced data since, in these scenarios, forming clusters that also reflect geographical
proximity can significantly improve the interpretability of the results [5, 6].</p>
      <p>In the copula framework, clustering methods with deterministic constraints have been proposed in
[7] to incorporate non-temporal proximity information into the clustering process. Importantly, these
methods do not enforce strict adherence to proximity constraints: the existing algorithms may cluster
time series that are geographically distant if their dependence structure justifies it [8]. This flexibility
arises from the use of soft proximity constraints [9], which contrast with hard constraints that require
strict spatial coherence, as explored, e.g., in [10, 11].</p>
      <p>Here, we present two applications of clustering with and without spatial information on climatological
data downloaded from the Copernicus Climate Data Store.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Copula-based Clustering for Comovement Detection</title>
      <p>We first show how a copula-based clustering approach, which captures the dependence structure
among variables, can identify patterns of comovement in environmental time series.</p>
      <p>We begin by considering a set of monthly maximum temperatures of the summer months (JJA) in
Italy, derived from ERA5 reanalysis data, covering multiple spatial points across the Italian territory
and spanning the period from 1960 to 2024. Although ERA5 provides a globally homogeneous grid, we
restrict our attention to land areas by excluding sea points, selecting a subset of 105 grid points.</p>
      <p>The aim is to cluster the corresponding time series not based on absolute temperature levels, but on how
strongly their fluctuations are statistically dependent over time: for example, locations that tend to heat
up simultaneously, even if the temperature magnitudes differ.</p>
      <p>The approach proceeds in three main steps. First, the data are filtered by suitable univariate time
series models, such as seasonal ARIMA models, in order to remove seasonality and autocorrelation.
The filtered temperature series are then transformed into pseudo-observations, using their empirical marginal
distributions. This operation removes the effect of differing scales or distributions and retains only the
information relevant to the dependence structure. From these pseudo-observations, we compute pairwise
copulas for each pair of locations. The full collection of these bivariate copulas defines the Copula
Matrix, a compact representation of the dependencies among all time series. An alternative strategy
would be to rely on multivariate copulas to construct the Copula Matrix; however, this approach may
substantially increase computational complexity when dealing with large datasets.</p>
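      <p>The transformation to pseudo-observations and the evaluation of a pairwise empirical copula can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the input series are already the filtered residuals, it averages tied ranks, and the function names pseudo_observations and empirical_copula are hypothetical.</p>
      <preformat>
```python
def pseudo_observations(series):
    """Map a series to rescaled ranks in (0, 1): rank / (n + 1).

    Tied values receive their average rank (an assumption of this sketch).
    """
    n = len(series)
    order = sorted(range(n), key=lambda i: series[i])
    ranks = [0.0] * n
    start = 0
    while n > start:
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while n > end + 1 and series[order[end + 1]] == series[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return [r / (n + 1) for r in ranks]


def empirical_copula(u, v, s, t):
    """Fraction of pairs (u_i, v_i) with u_i not above s and v_i not above t."""
    count = sum(1 for ui, vi in zip(u, v) if s >= ui and t >= vi)
    return count / len(u)
```
      </preformat>
      <p>Two series that differ in scale but move together, such as [10, 30, 20] and [1, 9, 4], yield identical pseudo-observations, which is the sense in which the marginal behavior is removed.</p>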
      <p>Next, we define a dissimilarity measure over this matrix, which evaluates how far each pairwise
copula deviates from the comonotonic case, i.e., perfect positive dependence. This dissimilarity is
rank-invariant, meaning that it is invariant under strictly increasing transformations of the series and unaffected by outliers or
marginal variability. The resulting Dissimilarity Matrix is then used to perform Partitioning Around
Medoids (PAM) clustering, yielding groups of locations whose time series exhibit similar comovement
patterns. We present the results using 8 clusters, as this value represents a reasonable compromise
between the optimal Average Silhouette Index [12] and the need for a clear and interpretable spatial
visualization. Choosing 8 clusters avoids overly complex maps with too many colors (which would hinder
interpretation) while still offering more informative structure than overly simplistic solutions with only
two or three regions. Although we have a finite set of representative stations, the maps are displayed
using continuous colors to provide a full spatial representation across Italy. Each station is associated
with a rectangular area (pixel), allowing the entire map to be fully covered.</p>
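      <p>The dissimilarity and clustering steps can be sketched as follows, under stated assumptions. The distance used here (root-mean-square gap between the empirical copula and min(s, t) on a regular grid) is one possible choice and is not confirmed by the source, and the SWAP phase is a simplified first-improvement variant of PAM. All function names are hypothetical.</p>
      <preformat>
```python
import math


def comonotonic_distance(u, v, grid_size=20):
    """Root-mean-square gap, on a regular grid, between the empirical copula
    of the pseudo-observations (u, v) and M(s, t) = min(s, t)."""
    n = len(u)
    total = 0.0
    for a in range(1, grid_size + 1):
        for b in range(1, grid_size + 1):
            s, t = a / grid_size, b / grid_size
            c_hat = sum(1 for ui, vi in zip(u, v) if s >= ui and t >= vi) / n
            total += (c_hat - min(s, t)) ** 2
    return math.sqrt(total / grid_size ** 2)


def dissimilarity_matrix(pseudo_obs):
    """Symmetric matrix of pairwise distances; the diagonal is set to zero."""
    d = len(pseudo_obs)
    dist = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            dist[i][j] = dist[j][i] = comonotonic_distance(pseudo_obs[i], pseudo_obs[j])
    return dist


def total_cost(dist, medoids):
    """Sum over all objects of the dissimilarity to the closest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(dist, k):
    """Partition the objects into k clusters from a dissimilarity matrix."""
    n = len(dist)
    medoids = []
    # BUILD: greedily add the medoid that lowers the total cost the most.
    while k > len(medoids):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(dist, medoids + [c])))
    # SWAP: accept any medoid/non-medoid exchange that lowers the cost,
    # and stop when a full pass brings no improvement.
    current = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for position in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:position] + [candidate] + medoids[position + 1:]
                cost = total_cost(dist, trial)
                if current > cost:
                    medoids, current, improved = trial, cost, True
    # Assign every object to its closest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels
```
      </preformat>
      <p>In this sketch, pam(dissimilarity_matrix(pseudo_obs), 8) would return the indices of the medoid locations and one cluster label per location. Because the empirical copula is a step function, the distance between a series and itself is small but not exactly zero, which is why the diagonal is set to zero explicitly.</p>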
      <p>Applied to the Italian dataset, this method reveals several meaningful clusters, as shown in Fig. 1. For
instance, stations located in the Alps tend to cluster together, reflecting a shared temporal behavior
likely driven by mountain air masses. Coastal and southern regions, influenced by different climatic
regimes, form separate groups.</p>
      <p>While the copula-based clustering algorithm provides coherent and geographically interpretable
groups when applied to summer maximum temperature time series, the same methodology proves less
effective when used with other variables, such as precipitation extremes. In particular, when clustering
monthly maximum precipitation time series from January 2011 to November 2023 across the same
105 locations in Italy, the resulting clusters appear highly fragmented and spatially inconsistent, as
shown in Fig. 2: some stations in northwestern Italy are grouped together with others located in
the southeastern part of the country, despite their clear geographic and climatic differences. Moreover,
the locations of the cluster medoids further indicate that the clustering fails to capture the spatial
heterogeneity of precipitation patterns across the country.</p>
      <p>These results suggest that, for certain environmental variables, temporal dependence alone is
insufficient to meaningfully characterize comovement structures. In such cases, it becomes necessary to
explicitly incorporate spatial information into the clustering process. By doing so, we aim to balance
dependence-driven similarity with geographic proximity, producing clusters that are both statistically
coherent and spatially interpretable. This motivates the introduction of frameworks for clustering with spatial
constraints, where soft proximity constraints encourage spatial cohesion among time series.
Given one matrix capturing temporal dependence among the time series and another representing
spatial proximity, several approaches exist to merge these into a single dissimilarity matrix suitable
for clustering [7, 13, 14]. Specifically, introducing a parameter α to balance the two sources of information
ensures that the spatial constraint is imposed in a soft, rather than hard, form. This α plays the role of a
regularization parameter, calibrating the trade-off between temporal dependence and spatial proximity,
and thereby influencing the number and shape of the resulting clusters. A common approach to merging
the dissimilarities that does not rely on copulas is described, e.g., in [9]. In the current copula-based framework,
we adopt a convex combination controlled by the parameter α, where α = 0 corresponds to purely
temporal dependence and α = 1 to purely spatial proximity. The clustering procedure relies on a
dissimilarity measure defined as the distance between the copula-based dependence structure in the data
and a target matrix representing perfect comonotonicity. This dissimilarity is itself constructed
as a convex combination of temporal and spatial components, in line with the strategy proposed in
[7]. The tuning parameter α thus acts as a weight that regulates the relative contribution of the two
sources of information. In Figure 3,
we illustrate how gradually incorporating the spatial component into the dependence model leads to
increasingly compact clusters.</p>
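      <p>A convex combination of a temporal and a spatial dissimilarity matrix can be sketched as follows. The sketch follows the convention stated above (α = 0 purely temporal, α = 1 purely spatial), but the rescaling of both matrices to [0, 1], the use of great-circle distance as spatial dissimilarity, and the function names are assumptions made for illustration. The exact construction in [7] may differ.</p>
      <preformat>
```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Guard against rounding pushing the argument of asin slightly above 1.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def rescale(matrix):
    """Divide all entries by the largest one so that values lie in [0, 1]."""
    largest = max(max(row) for row in matrix)
    if largest == 0:
        return [row[:] for row in matrix]
    return [[value / largest for value in row] for row in matrix]


def combine_dissimilarities(temporal, spatial, alpha):
    """Entrywise (1 - alpha) * temporal + alpha * spatial after rescaling."""
    if alpha > 1.0 or 0.0 > alpha:
        raise ValueError("alpha must lie in [0, 1]")
    t, s = rescale(temporal), rescale(spatial)
    n = len(t)
    return [[(1 - alpha) * t[i][j] + alpha * s[i][j] for j in range(n)] for i in range(n)]
```
      </preformat>
      <p>Rescaling is one way to keep the two components comparable, since copula-based distances and distances in kilometres live on very different scales. The combined matrix can then be passed to any clustering method that accepts a dissimilarity matrix, such as PAM.</p>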
      <p>The selection of the optimal parameter α is far from straightforward. Both its value and the
number of clusters can significantly influence the resulting partition, and
therefore deserve careful consideration.</p>
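      <p>One diagnostic that could support this choice is the average silhouette width, the index already used above to guide the number of clusters. The sketch below computes it from a precomputed dissimilarity matrix and a vector of cluster labels, so that it could be evaluated over a grid of candidate values of α and of the number of clusters. This is an illustrative assumption, not a selection procedure stated in the source, and the function name is hypothetical.</p>
      <preformat>
```python
def average_silhouette(dist, labels):
    """Average silhouette width from a dissimilarity matrix and cluster labels."""
    n = len(dist)
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if 2 > len(clusters):
        raise ValueError("at least two clusters are required")
    widths = []
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean dissimilarity to the other members of the same cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean dissimilarity to the members of another cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denominator = max(a, b)
        widths.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(widths) / n
```
      </preformat>
      <p>Values close to 1 indicate compact, well-separated clusters, and negative values indicate objects that sit closer to another cluster than to their own. Since the silhouette is computed on the combined dissimilarity, its values for different α are not directly comparable without care, which is one reason the choice of α remains delicate.</p>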
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>AB and FD have been supported by MUR-PRIN 2022 PNRR, Project “Stochastic Modeling of Compound
Events” (No. P2022KZJTZ) funded by European Union – Next Generation EU. The work of FD has
been carried out with partial financial support from ICSC – Centro Nazionale di Ricerca in High
Performance Computing, Big Data and Quantum Computing, funded by EU – Next Generation EU
(CUP F83C22000740001). RP has been supported by MUR-PRIN 2022, Project “Modelling Non-standard
data and Extremes in Multivariate Environmental Time series” (No. 20223CEZSR) funded by European
Union – Next Generation EU.</p>
    </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <p>[Figure panels: cluster maps for α = 0.95 and α = 0.75; axes: longitude (8°E to 18°E) and latitude (38°N to 46°N); legend: cluster.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Naveau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vrac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Mestre</surname>
          </string-name>
          ,
          <article-title>Clustering of maxima: Spatial dependencies among heavy rainfall in France</article-title>
          ,
          <source>J. Clim.</source>
          <volume>26</volume>
          (
          <year>2013</year>
          )
          <fpage>7929</fpage>
          -
          <lpage>7937</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Naveau</surname>
          </string-name>
          , E. Gilleland,
          <string-name>
            <given-names>M.</given-names>
            <surname>Castellà</surname>
          </string-name>
          , T. Arivelo,
          <article-title>Spatial clustering of summer temperature maxima from the CNRM-CM5 climate model ensembles &amp; E-OBS over Europe</article-title>
          ,
          <source>Weather Clim. Extremes</source>
          <volume>9</volume>
          (
          <year>2015</year>
          )
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pappadà</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Durante</surname>
          </string-name>
          , G. Salvadori, C. De Michele,
          <article-title>Clustering of concurrent flood risks via hazard scenarios</article-title>
          ,
          <source>Spat. Stat.</source>
          <volume>23</volume>
          (
          <year>2018</year>
          )
          <fpage>124</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] F. M. L. Di Lascio, A. Menapace, R. Pappadà, A spatially-weighted AMH copula-based dissimilarity measure for clustering variables: An application to urban thermal efficiency, Environ. 35 (2024) e2828.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] F. Fouedjio, Clustering of multivariate geostatistical data, Wiley Interdiscip. Rev.: Comput. Stat. 12 (2020) e1510.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] K. Kopczewska, Spatial machine learning: new opportunities for regional science, Ann. Reg. Sci. 68 (2022) 713–755.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] M. Disegna, P. D’Urso, F. Durante, Copula-based fuzzy clustering of spatial time series, Spat. Stat. 21 (2017) 209–225.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] T. Romary, F. Ors, J. Rivoirard, J. Deraisme, Unsupervised classification of multivariate geostatistical data: Two algorithms, Comput. &amp; Geosci. 85 (2015) 96–103.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] M. Chavent, V. Kuentz-Simonet, A. Labenne, J. Saracco, ClustGeo: an R package for hierarchical clustering with spatial constraints, Comput. Stat. 33 (2018) 1799–1822.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Y. Pawitan, J. Huang, Constrained clustering of irregularly sampled spatial data, J. Stat. Comput. Simul. 73 (2003) 853–865.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] G. Guénard, P. Legendre, Hierarchical clustering with contiguity constraint in R, J. Stat. Softw. 103 (2022) 1–26.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] C. Hennig, M. Meila, F. Murtagh, R. Rocci, Handbook of cluster analysis, Chapman Hall/CRC Handb. Mod. Stat. Methods, Boca Raton, FL: CRC Press, 2016.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] A. Benevento, F. Durante, R. Pappadà, Tail-dependence clustering of time series with spatial constraints, Environ. Ecol. Stat. 31 (2024) 801–817.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. de Carvalho, R. Huser, R. Rubio, Similarity-based clustering for patterns of extreme values, Stat 12 (2023) e560.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>