Proceedings 1st International Workshop on Advanced Analytics and Learning on Temporal Data
                                         AALTD 2015


       Fuzzy Clustering of Series Using Quantile
                   Autocovariances
                        Borja Lafuente-Rego and Jose A. Vilar
    Research Group on Modeling, Optimization and Statistical Inference (MODES),
    Department of Mathematics, Computer Science Faculty, University of A Coruña


        Abstract. Unlike conventional clustering, fuzzy cluster analysis allows
        data elements to belong to more than one cluster by assigning member-
        ship degrees of each data to clusters. This work proposes a fuzzy C–
        medoids algorithm to cluster time series based on comparing their esti-
        mated quantile autocovariance functions. The behaviour of the proposed
        algorithm is studied on different simulated scenarios and its effectiveness
        is concluded by comparison with alternative approaches.


1     Introduction
In classical cluster analysis each datum is assigned to exactly one cluster, thus
producing a “hard” partition of the data set into several disjoint subsets. This
approach can be inadequate in the presence of data objects that are equally dis-
tant to two ore more clusters. Fuzzy cluster analysis allows gradual memberships
of data objects to clusters, thus providing versatility to reflect the certainty with
which each data is assigned to the different clusters. An interesting overview of
present fuzzy clustering methods is provided by [3]. Interest in this approach
has increased in recent years. Proof of this is the large amount of publications
in this field (e.g. [6] and [5]).
    In this paper, a fuzzy C–medoids algorithm to cluster time series using the
quantile autocovariance functions is proposed. Motivation behind this approach
is twofold. First, quantile autocovariances have shown a high capability to clus-
ter time series generated from a broad range of dependence models [10]. On the
other hand, the use of a fuzzy approach for clustering time series is justified in
order to gain adaptivity for constructing the centroids and to obtain a better
characterization of the temporal pattern of the series (see discussion in [7]). To
illustrate the merits of the proposed algorithm, an extensive simulation study
comparing our fuzzy approach with other fuzzy procedures has been carried
out. Specifically, we have focused on the classification of heteroskedastic models,
which are of great importance in many applications (e.g. to model many finan-
cial time series) and have received relatively little attention in the clustering
literature.


2     A dissimilarity based on quantile autocovariances
                                                                     (j)          (j)
Consider a set of p series S = X (1) , . . . , X (p) , with X (j) = (X1 , . . . , XT )
                                                                        (j)
being a T -length partial realization from a real valued process {Xt , t ∈ Z}.


Copyright c 2015 for this paper by its authors. Copying permitted for private and academic
purposes.
2     B. Lafuente-Rego, J.A. Vilar

We wish to perform cluster analysis on S in such a way that series with similar
generating processes are grouped together. To achieve this goal, we propose to
measure dissimilarity between two series by comparing the estimators of their
quantile autocovariance functions (QAF), which are formally defined below.
   Let X1 , . . . , XT an observed stretch of a strictly stationary process {Xt ; t ∈
Z}. Denote by F the marginal distribution of Xt and by qτ = F −1 (τ ), τ ∈ [0, 1],
the corresponding quantile function. Fixed l ∈ Z and an arbitrary couple of
quantile levels (τ, τ 0 ) ∈ [0, 1]2 , consider the cross-covariance of the indicator
functions I (Xt ≤ qτ ) and I (Xt+l ≤ qτ 0 ) given by

   γl (τ, τ 0 ) = cov {I (Xt ≤ qτ ) , I (Xt+l ≤ qτ 0 )} = P (Xt ≤ qτ , Xt+l ≤ qτ 0 ) − τ τ 0 .
                                                                                            (1)
      Function γl (τ, τ 0 ), with (τ, τ 0 ) ∈ [0, 1]2 , is called quantile autocovariance func-
tion of lag l. Replacing in (1) the theoretical quantiles of the marginal distribu-
tion F , qτ and qτ 0 , by the corresponding empirical quantiles based on X1 , . . . , XT ,
q̂τ and q̂τ 0 , we obtain the estimated quantile autocovariance function given by
                                       T −l
                                   1 X
               γ̂l (τ, τ 0 ) =              I (Xt ≤ q̂τ ) I (Xt+l ≤ q̂τ 0 ) − τ τ 0 .         (2)
                                 T − l t=1

    As the quantile autocovariances are able to account for high level dynamic
                                                                       (1)       (2)
features, a simple dissimilarity criterion between two series Xt and Xt con-
sists in comparing their estimated quantile autocovariances on a common range
of selected quantiles. Thus, for L prefixed lags, l1 , . . . , lL , and r quantile levels,
0 < τ1 < . . . < τr < 1, we construct the vectors Γ (u) , u = 1, 2, given by
                                                                              
               (u)          (u)          (u)      (u)
   Γ (u) = Γ l1 , . . . , Γ lL , with Γ li = γ̂li (τj , τk ); j, k = 1 . . . , r , (3)

                                                                                        (1)   (2)
for i = 1, . . . , L, and γ̂ given in (2). Then, the distance between Xt and Xt
is defined as the squared Euclidean distance between their representations Γ (1)
and Γ (2) , i.e.
                                              
                                     (1)   (2)
                          dQAF Xt , Xt               Γ (1) − Γ (2) ||22
                                                 = ||Γ                        (4)

    Computing dQAF for all pairs of series in S allows us to set a pairwise dissim-
ilarity matrix, which can be taken as starting point of a conventional hierarchical
clustering algorithm. Alternatively, a partitioning clustering, such as the k-means
algorithm, can be performed averaging the Γ representations to determine the
centroids. Then, dQAF is also used to calculate the distances between series and
centroids involved in the iterative refinement of the cluster solution.


3    Fuzzy clustering based on quantile autocovariances
Time series are dynamic objects and therefore different temporal patterns may
be necessary to characterize the serial behaviour in different periods of time. In
                 Fuzzy Clustering of Series Using Quantile Autocovariances                        3

other words, the series are not distributed accurately within a given number of
clusters, but they can belong to two or even more clusters. This problem can be
adequately treated using a fuzzy clustering procedure, which associates a fuzzy
label vector to each element stating its memberships to the set of clusters. In
this section we propose a fuzzy C-medoids clustering algorithm for time series
by plugging the QAF-dissimilarity introduced in Section 2.                  
    Let S = X (1) , . . . , X (p) be a set of p time series and Γ = Γ (1) , . . . , Γ (p)
a set of quantile autocovariances selected to perform n        clustering.o The fuzzy C-
                                                          (1)
medoids clustering finds the subset of ,    Γ   Γ̃ =   Γ̃     , . . . , Γ̃ (C) , and the p ×
C matrix of fuzzy coefficients Ω = (ui,c ) that lead to solve the minimization
problem:
               p X
                 C                                                 C
               X                             2                     X
       min               um
                          ic Γ
                               (i)
                                   − Γ (c)       , subject to             uic = 1 and uic ≥ 0.   (5)
       Γ̃ ,Ω                                 2
               i=1 c=1                                             c=1

Each uic ∈ [0, 1] represents the membership degree of the i-th series to the c-th
cluster and the parameter m > 1 controls the fuzziness of the partition. As the
value of m increases, the boundaries between clusters become softer and therefore
the classification is fuzzier. If m = 1, the hard version of the clustering procedure
is obtained, i.e. uic ∈ {0, 1}, that leads to a classical K-means partition of S.
                  PC
The constraints c=1 uic = 1 and uic ≥ 0 ensure that no cluster is empty and
that all series are included in the cluster partition.
    The objective function (5) cannot be minimized directly, and an iterative
algorithm that alternately optimizes the membership degrees and the medoids
must be used. The update formula for the membership degrees is given by
                                                  1 −1
                       C                      2 ! m−1
                     X        Γ (i) − Γ (c) 2
             uic =                                    , for i = 1, . . . , p.   (6)
                      0      Γ  (i) − Γ (c0 ) 2
                     c =1                     2

    Then, the QAF-based fuzzy C–medoids clustering algorithm is implemented
as follows.                            n                       o
 i. Pick an initial set of medoids Γe = Γe (1) , . . . , Γe (C) and the fuzzifier m.
 ii. Set Γe OLD = Γe .
iii. Compute uic using (6).                 n                 o
iv. Update the medoids, let’s say Γb = Γb (1) , . . . , Γb (C) , by minimizing the
     objective function with the new uic . Denote by
                                                  p
                                                  X                  00        0    2
                           q = argmin1≤i0 <p               um
                                                            i00 c Γ
                                                                    (i )
                                                                         − Γ (i )
                                                                                    2
                                                  i00 =1

    If the value of q is lower than the one obtained with Γe , then Γe = Γb .
 v. If Γe OLD = Γe or a maximum number of iterations is achieved then end
    algorithm. Otherwise, return to step ii.
   The total number of clusters C has to be preset. For this task classical indexes
such as silhouette width or Krazanowski-Lai index can be used.
4       B. Lafuente-Rego, J.A. Vilar

4      Simulation Study

The proposed fuzzy algorithm was tested against two other fuzzy clustering
algorithms via simulation. In particular, the classification of heteroskedastic
time series was considered by simulating two different scenarios formed by (i)
GARCH(1,1) models and (ii) different structures of conditional heteroskedastic-
ity. The selected generating models at each case are detailed below.

    – Scenario 1: Consider Xt = µt + at , with µt ∼ AR(1) and at = σt t ,
      t ∼ N (0, 1). Then, the following GARCH(1,1) structures for the varying
      conditional variance are considered:
      M1: σt2 = 0.1 + 0.01a2t−1 + 0.9σt−1
                                      2
                                                      M3: σt2 = 0.1 + 0.1a2t−1 + 0.1σt−1
                                                                                     2
            2             2           2                     2             2          2
      M2: σt = 0.1 + 0.9at−1 + 0.01σt−1               M4: σt = 0.1 + 0.4at−1 + 0.5σt−1
    – Scenario 2: Consider Xt = µt + at , with µt ∼ MA(1) and at = σt t ,
      t ∼ N (0, 1). Then, the following ARCH(1), GARCH(1,1), GJR-GARCH
      and EGARCH structures are considered for the varying conditional variance:
      M1: σt2 = 0.1 + 0.8a2t−1
      M2: σt2 = 0.1 + 0.1a2t−1 + 0.8σt−1
                                      2

      M3: σt = 0.1 + (0.25 + 0.3Nt−1 )a2t−1 + 0.5σt−1
            2                                        2
                                                          ; Nt−1 = I(at−1 < 0)
               2                                                     2
      M4: ln(σt ) = 0.1 + t−1 + 0.3 [|t−1 | − E(|t−1 |)] + 0.4ln(σt−1 )

    In all cases t consisted of independent zero-mean Gaussian variables with
unit variance. For each scenario, five series of length T = 200 were generated
from each model over N = 100 trials.
    Two fuzzy clustering algorithms specifically designed to deal with GARCH
models were used and compared with our proposal. Both algorithms rely on
different dissimilarity measures constructed using the AR representation of a
GARCH(p,q) process given by
                                      p
                                      X                  q
                                                         X
                          σt2 = γ +         αi a2t−i +             2
                                                               βj σt−j                 (7)
                                      i=1                j=1


with      0 ≤ αi < 1 and 0 ≤ βj < 1, for i = 1, . . . , p and j = 1, . . . , q, and
 Pp γ > 0,P q
( i=1 αi + j=1 βj ) < 1. Then, the dissimilarities are defined as follows.

 1. Dissimilarity based on the autoregressive representation of the GARCH mod-
                                      0
    els [12, 11]. Given X (k) and X (k ) in S, we define

                                                     R
                                            0        X                   2
                          d2AR (X
                                X (k) , X (k ) ) =          πrk − π
                                                           (b     brk0 ) ,
                                                     r=1

                                                                    Pmin(q,r)
      with πbrz an estimator of the r-th coefficient πr = (αr +βr )+ j=1      βj πr−j ,
      for the series z, z = k, k 0 . Parameter R determines the maximum number of
                Fuzzy Clustering of Series Using Quantile Autocovariances                     5

   autoregressive coefficients πr . A GARCH–based fuzzy C-medoids clustering
   is proposed in [4] by considering the optimization problem:
             p X
             X C             R
                             X                                 C
                                                               X
                                              2
      min              um
                        ic          πri − π
                                   (b     brc ) , subject to         uic = 1 and uic ≥ 0.    (8)
      Π̂,Ω
             i=1 c=1         r=1                               c=1

2. GARCH-based distance measure [1] given by
                                       0                 0               −1
         dGARCH (XX (k) , X (k ) ) = (Lk − Lk0 ) (Vk + Vk0 ) (Lk − Lk0 ) (9)
                      
   with Lj = αb j , βbj the vector of estimated parameters and Vj the esti-
   mated covariance matrix for Lj , for j = k, k 0 . An alternative GARCH-based
   fuzzy C-medoids clustering is proposed in [4] by minimizing:
                       p X
                         C            h                                   i
                                                 0           −1
                       X
                                   um
                                    ic (Li − Lc ) (Vi + Vc )    (Li − Lc ) ,                (10)
                       i=1 c=1
                PC
   subject to      c=1 uic = 1 and uic ≥ 0.

    The three fuzzy clustering algorithms were performed using a fuzziness pa-
rameter m = 1.5 on N = 100 trials for each scenario. At each trial, the quality
of the clustering procedure was evaluated comparing the experimental cluster
solution with the true cluster partition. Two different agreement measures were
used, namely the Gavrilov index [8] and the adjusted Rand index [9]. The mean
values and standard deviations of these indexes based on the 100 trials using
both hard and fuzzy cluster analysis are provided in Table 1.


Table 1. Averages and standard deviations (in brackets) of two cluster similarity
indexes obtained from 100 trials.

                                           Scenario 1                     Scenario 2
                                      Gavrilov Adj. Rand             Gavrilov Adj. Rand
    Hard cluster       dAR    0.859 (.109) 0.685 (.198)         0.712 (.146) 0.469 (.215)
                       dGARCH 0.574 (.059) 0.286 (.072)         0.504 (.078) 0.137 (.116)
                       dQAF   0.843 (.109) 0.726 (.152)         0.918 (.081) 0.825 (.135)
    Fuzzy cluster dAR    0.541 (.056) 0.271 (.080)              0.486 (.076) 0.128 (.100)
                  dGARCH 0.553 (.088) 0.241 (.132)              0.535 (.076) 0.188 (.107)
                  dQAF   0.842 (.116) 0.704 (.181)              0.925 (.072) 0.833 (.125)


    Results from Table 1 show that the metrics based on quantile autocovariances
and on the AR representation led to the best scores in Scenario 1 when the hard
cluster is carried out. When the fuzzy approaches were considered, the behaviour
of the dAR substantially worsened, while the very similar (even somewhat higher)
6      B. Lafuente-Rego, J.A. Vilar

results were obtained with dQAF . The worst results were obtained with the
GARCH-based dissimilarity both for the hard and the fuzzy versions.
    The metric based on quantile autocovariances also obtained the best results
in Scenario 2, with indexes of agreement above 0.8 and a slight improvement by
using the fuzzy clustering. The GARCH-based metrics, dAR and dGARCH where
strongly affected by the model misspecification and produced the worst results
for both the hard and the fuzzy versions of the cluster analysis.
    To assess the effect of the fuzziness parameter in the partitions the algorithm
was implemented for several values of m. However this results were here omitted
due to the limitation of space.


5     A case study
In this section, the proposed fuzzy C–medoids clustering algorithm is used to per-
form clustering on a set of series of electricity demand. Specifically, our database
consists of hourly electricity demand in the Spanish market from 1st January
2011 to 31th December 2012. All data are sourced from the official website of
Operador del Mercado Iberico de Energia1 . Records corresponding to Saturdays
and Sundays have been removed from the database because electricity demand
is lower in the weekends. Thus we have 24 time series (one for each hour of the
day) of length T = 731. Since all series are non–stationary in mean, the original
series are transformed taking one regular difference.
    Table 2 presents the membership degrees for the case with two and three
clusters. The results obtained for the two–cluster partition formed by C1 =
{H24, H1, H2, H3, H4, H5, H6, H7} and C2 grouping the remaining series. The
cluster C1 corresponds with the hours of the day where the electricity demand
is low, while the C2 identifies the time of the day when the power consumption
is greater. In the case of the three–cluster partition, the cluster C1 is divided in
two subclusters. One formed with the hours of the day with the lowest demand
of electricity, and a second cluster with an intermediate electricity consumption.


6     Concluding remarks
In this paper, we focus on the classification of time series featuring a fuzzy
clustering algorithm in the framework of a partitioning around medoids. A
dissimilarity–based approach is considered. In particular, we propose a C–medoids
fuzzy clustering algorithm using an innovative dissimilarity measure based on the
quantile autocovariances (dQAF ).
    The simulation study shows that the proposed dissimilarity produces satisfac-
tory results by performing fuzzy cluster analysis. The proposed clustering algo-
rithm was tested against two GARCH–based fuzzy clustering algorithm present
in the literature in two different heteroskedastic scenarios. The fuzzy clustering
algorithm based on dQAF led to the best results. In fact, apart from dQAF , none
1
    http://http://www.omel.es/files/flash/ResultadosMercado.swf
               Fuzzy Clustering of Series Using Quantile Autocovariances                7

Table 2. Membership degrees obtained with QAF–based FCM with m = 1.5 consid-
ering 2 and 3 clusters.

                2 Clusters                     3 Clusters
                Membership degrees     Crisp   Membership degrees               Crisp
                  C1           C2                C1           C2        C3
        H1      0.63044      0.36956    1      1.00000      0.00000   0.00000    1
        H2      1.00000      0.00000    1      0.99878      0.00098   0.00024    1
        H3      0.98282      0.01718    1      0.99484      0.00395   0.00121    1
        H4      0.94118      0.05882    1      0.99229      0.00574   0.00197    1
        H5      1.00000      0.00000    1      0.92388      0.06811   0.00801    1
        H6      0.99923      0.00077    1      0.30793      0.66185   0.03021    2
        H7      0.98282      0.01718    1      0.00000      1.00000   0.00000    2
        H8      0.00003      0.99997    2      0.05793      0.09475   0.84733    3
        H9      0.00003      0.99997    2      0.00097      0.00294   0.99610    3
        H10     0.00077      0.99923    2      0.00045      0.00149   0.99806    3
        H11     0.00077      0.99923    2      0.00171      0.00507   0.99323    3
        H12     0.00002      0.99998    2      0.00011      0.00049   0.99940    3
        H13     0.00002      0.99998    2      0.00027      0.00178   0.99794    3
        H14     0.00002      0.99998    2      0.00032      0.00107   0.99861    3
        H15     0.00002      0.99998    2      0.00134      0.00940   0.98927    3
        H16     0.00002      0.99998    2      0.00088      0.00636   0.99277    3
        H17     0.00002      0.99998    2      0.00000      0.00000   1.00000    3
        H18     0.00056      0.99944    2      0.00051      0.00498   0.99451    3
        H19     0.00000      1.00000    2      0.00002      0.00014   0.99984    3
        H20     0.00003      0.99997    2      0.00020      0.00135   0.99846    3
        H21     0.00056      0.99944    2      0.00047      0.00384   0.99569    3
        H22     0.01718      0.98282    2      0.00136      0.01488   0.98377    3
        H23     0.00003      0.99997    2      0.01539      0.13091   0.85370    3
        H24     0.99998      0.00002    1      0.00206      0.98054   0.01740    2


of the remaining examined dissimilarities shown acceptable results by cluster-
ing heteroskedastic processes, thus emphasizing the usefulness of dQAF in this
framework.
    Note that a limitation of our procedure is that series are assumed to be
strictly stationary and hence further research must be carried out. Although
we have followed a dissimilarity-based approach, it is worthy to emphasize that
model-based techniques can be also an interesting alternative. Likewise the fuzzy
approach, the use of probabilistic models such as mixture models (see e.g. [2])
allows us to assign each datum to one single cluster although this assignment
relies on a probabilistic approach since the mixing proportions are estimated
from the data. Unlike the fuzzy approach, no fuzziness parameter is required by
using mixture models, although the model selection problem must be solved in
the latter case.


References
 1. Caiado, J., Crato, N.: A garch-based method for clustering of financial time series:
    International stock markets evidence. Mpra paper, University Library of Munich,
    Germany (2007), http://EconPapers.repec.org/RePEc:pra:mprapa:2074
 2. Chen, W.C., Maitra, R.: Model-based clustering of regression time series data via
    apecman aecm algorithm sung to an even faster beat. Statistical Analysis and Data
    Mining 4(6), 567–578 (2011)
8     B. Lafuente-Rego, J.A. Vilar

 3. Döring, C., Lesot, M.J., Kruse, R.: Data analysis with fuzzy clustering methods.
    Computational Statistics & Data Analysis 51(1), 192 – 214 (2006)
 4. D’Urso, P., Cappelli, C., Lallo, D.D., Massari, R.: Clustering of financial time
    series. Physica A 392(9), 2114–2129 (2013)
 5. D’Urso, P., Giovanni, L.D., Massari, R.: Time series clustering by a robust autore-
    gressive metric with application to air pollution 141(15), 107–124 (2015)
 6. D’Urso, P., Giovanni, L.D., Massari, R., Lallo, D.D.: Noise fuzzy clustering of time
    series by the autoregressive metric 71(3), 217–243 (2013)
 7. D’Urso, P., Maharaj, E.A.: Autocorrelation-based fuzzy clustering of time series.
    Fuzzy Sets Syst. 160(24), 3565–3589 (2009)
 8. Gavrilov, M., Anguelov, D., Indyk, P., Motwani, R.: Mining the stock market
    (extended abstract): Which measure is best? In: Proceedings of the sixth ACM
    SIGKDD International Conference on Knowledge Discovery and Data Mining. pp.
    487–496. KDD’00, ACM, New York, USA (2000)
 9. Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
10. Lafuente-Rego, B., Vilar, J.A.: Clustering of time series using quantile autocovari-
    ances. Advances in Data Analysis and Classification pp. 1–25 (2015)
11. Maharaj, E.A.: Clusters of time series. J. Classification 17(2), 297–314 (2000)
12. Piccolo, D.: A distance measure for classifying arima models. J. Time Series Anal.
    11(2), 153–164 (1990)