<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Clustering Trend Data Time-Series through Segmentation of FFT-decomposed Signal Constituents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniyal Kazempour</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Beer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oliver Schrüfer</string-name>
          <email>oliver.schruefer@campus.lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Seidl</string-name>
          <email>seidlg@dbs.ifi.lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ludwig-Maximilians-Universität München</institution>
          ,
          <addr-line>Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Given trend data for different keywords, scientists may want to cluster them in order to detect specific terms which exhibit a similar trending behavior. For this purpose, a periodic regression can be performed on each of the time-series. In this work we ask: what if we cluster not simply the regression models of each time-series, but the periodic signal constituents? The impact of such an approach is twofold: first, we would see at a regression level how similar or dissimilar two time-series are regarding their periodic models, and secondly, we would be able to see similarities between different time-series based on single signal constituents. This carries the semantic that although time-series may be different on a regression level, they may be similar on a constituent level, reflecting other periodic influences. The results of this approach reveal commonalities between time-series on a constituent level that are not visible at first glance from their plain regression models.</p>
      </abstract>
      <kwd-group>
        <kwd>Clustering</kwd>
        <kwd>Time-series</kwd>
        <kwd>FFT</kwd>
        <kwd>Signal decomposition</kwd>
        <kwd>Constituents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Whenever data is collected over a period of time, we have time-series. Be it
time-series in the life sciences, economy, health, social sciences or many other
research fields, we are faced with time-series containing information on attributes
changing over time. In the context of time-series, the detection of periodicity is of
special interest, such as the periodicity in the predator-prey behavior among
animals, power-consumption cycles, or trends of certain terms in search engines.
As a motivation we introduce in this work the following use-case: Suppose
there is a journal which publishes four times a year. Its content encompasses
topics ranging from health over society to education and many more. The editor
in chief now decides to include in each issue the topics which are most trending
among the readers. The questions which concern the editor in chief at this point
are: What is most trending? And how can such information be obtained? Or, in
other terms: How can the editorial staff decide which of the most trending topics
belong together in an issue? For this purpose one of the editors comes up with
the idea to mine the Google Trends platform (https://trends.google.de/trends/). From an editor-compiled list of
topics, the trend data over the past years is collected from the trends platform.
The journal's data scientists first obtain the time-series for the keyword "turkey" (the
poultry) and perform a periodic regression on it. We elaborate
on periodic regression in Section 3 in more detail. The data scientists, however,
recognize that the regression approach neglects much information of the original
time-series, as can be seen in Figure 1 (left). Then one of the data scientists
comes up with the idea to perform a Fast Fourier Transformation (FFT), by
which the time-series for "turkey" is decomposed through the frequency domain
into its so-called constituents, as seen in Figure 1 (right). In fact, the sum over
all the constituents resembles the original signal. What the data scientists can
do now is to cluster the keywords by using the constituent information instead
of clustering the regressions of the time-series. We shall see in the experiments
section of this work that clustering on the basis of signal constituents yields
more meaningful clusters compared to the case where we just use the regression of
the time-series. By the constituent-based approach the editors can identify the topics
for each calendar quarter of the year, providing articles in their issues which
satisfy the (seasonal) interests of readers.</p>
      <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>Apart from our journal example, in [1] the authors list 25 applications of
clustering on time-series data from 11 different domains (aviation, energy, finance
etc.). Each of the clustering applications comes with references to at least one
paper, emphasizing that the topic of time-series clustering is undoubtedly of
significance and broad in range regarding its applications. Concluding this
introductory part, our contributions in this work are as follows: (1) Providing a
representation of decomposed signal constituents from a periodic signal as
parameter-space vectors. (2) A clustering method on the regression and on the
constituent level of time-series, applied in the periodic parameter space.</p>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-2">
        <title>Related Work</title>
        <p>Most related work can be found in the context of gene expression patterns, since
many of them are associated with "circannual (yearly), circadian (daily),
cell-cycle and other periodic biological processes [that] are known to be rhythmic"
[7]. Early methods like, e.g., [3] use the standard correlation coefficient between
the normalized vectors representing the correlations, since it captures similarity
in the shape of periodically correlated datasets well. However, it does not fulfill
the triangle inequality (and is thus no metric); furthermore, magnitude and shift of
time-series are neglected here, which can lead to undesired results. [3] apply a
pairwise average-linkage cluster analysis and focus on organizing and graphically
displaying data in a way that allows users to explore it in an intuitive manner.
RAGE [7] is phase-independent and uses a true distance metric based on the
undirected Hausdorff distance. Nevertheless, it is based on finding the periodic
representation of each time-series by fitting it to an ideal synthetic sinusoid or
another already known (gene expression) profile. CorrCos [4] first generates over
100000 synthetic models and then matches each time-series to one of those models
using cross-correlation. In a more recent work, "a periodic covariance function
based on a projection of the Matérn covariance" [5] is used. The authors regard noise
and allow time-series to share only a periodic component, structuring them in a
hierarchical manner. However, they need some information, such as the phase-length,
beforehand, which may be common in this field, but is not generally accessible
for new datasets. The representation of time-series in the form of an FFT and other
approaches is quite common in signal processing, as discussed in [9], which also
gives a brief overview of some of the most common techniques. With regard to
time-series distance measures there is a wealth of literature, such
as the more recent work of [8]. To the best of our knowledge, there is no
algorithm regarding the singular components of each periodically correlated
time-series, but only such working on simple periodical correlations.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Clustering via Signal Constituents</title>
        <p>In this section we elaborate on our approach of clustering periodic time-series
by clustering signal constituents. For this purpose we first describe the periodic
feature vectors and the parameter space, and proceed with periodic regression. We
contrast the regression approach by describing how we perform signal
decomposition using the FFT. Having obtained the signal constituents, we cluster them,
which requires specific handling, elaborated on in the last subsection.</p>
        <sec id="sec-2-3-1">
          <title>Periodic Feature Vectors and Parameter Space</title>
          <p>A periodic function can be represented as follows:</p>
          <p>Definition 1 (Periodic Function). Given an object (x, y) ∈ R², a periodic
sinusoidal function is defined as
φ(x) = a · sin(f · x + p) + v,
where a is the amplitude, f the frequency, p the phase-shift and v the vertical
shift. As such, a periodic feature vector φ is defined as φ = (a, f, p, v)^T.</p>
          <p>The periodic feature vector φ can be regarded as a model which describes a
periodicity within a given time-series. At this point we recapitulate that we do
not aim at comparing multiple time-series with each other residue-by-residue,
but their respective models. For this purpose we define the periodic parameter
space P as a feature space spanned by the set of parameters of a periodic
function {a, f, p, v}, with φ ∈ P.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>The FFT and Periodic Regression</title>
          <p>The most common usage of the Fourier transform is to convert a given signal from
the so-called time domain to the frequency domain. One core aspect on which
the Fourier transform relies is that every periodic function can be represented
as a sum of sine functions. A Fourier transform decomposes the time signal and
reveals information about the frequency of all involved sine waves that generate
the original signal. For time-series of evenly spaced values, such as time-series
having data for every second, hour, week etc., the so-called Discrete Fourier
Transform (DFT) is defined as follows:</p>
        </sec>
        <sec id="sec-2-3-3">
          <title>Definition 2 (Discrete Fourier Transformation)</title>
          <p>X_k = Σ_{n=0}^{N−1} x_n · e^{−2πi·k·n/N},
where N denotes the number of samples, n the current sample, x_n the value of
the signal at the sample point n, k the current frequency and X_k the result of the
DFT encapsulating amplitude and phase.</p>
          <p>Since the computation of the DFT as shown in Definition 2 has a quadratic
runtime complexity of O(N²), we use in this work the so-called Fast Fourier
Transform (FFT) [2], which reduces the runtime to O(N log N) by recursively
dividing a given DFT into smaller DFTs.</p>
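<p>To make the decomposition concrete, the following sketch (an illustration, not the authors' code; it assumes numpy, which the experiments section mentions) recovers amplitude, frequency and phase-shift of each constituent from the complex FFT coefficients:</p>

```python
import numpy as np

def fft_constituents(y, dt=1.0):
    """Decompose an evenly spaced signal into its sinusoidal constituents.

    Returns per-frequency amplitudes, frequencies and phase-shifts,
    recovered from the complex FFT coefficients.
    """
    n = len(y)
    coeffs = np.fft.rfft(y)           # complex spectrum of a real input
    freqs = np.fft.rfftfreq(n, d=dt)  # the corresponding frequencies
    amps = 2.0 * np.abs(coeffs) / n   # amplitude = scaled magnitude
    amps[0] /= 2.0                    # the DC term (vertical shift) is not doubled
    phases = np.angle(coeffs)         # phase-shift = angle of the coefficient
    return freqs, amps, phases

# A noiseless test signal: 3*sin(2*pi*0.1*t) with vertical shift 5.
t = np.arange(100)
y = 3.0 * np.sin(2 * np.pi * 0.1 * t) + 5.0
freqs, amps, phases = fft_constituents(y)
peak = np.argmax(amps[1:]) + 1        # strongest non-DC constituent
```

<p>For this signal the peak lands at frequency 0.1 with amplitude 3, while the DC bin carries the vertical shift 5.</p>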
          <p>A periodic regression is defined as:</p>
        </sec>
        <sec id="sec-2-3-4">
          <title>Definition 3 (Periodic Regression)</title>
          <p>Given a set DB of objects from e.g. a
time-series (x_i, y_i) of (independent and dependent) variables, the Damped Least
Squares (DLS, also known as the Levenberg-Marquardt algorithm) aims to find the
parameters β of a model curve φ(x, β) s.t. the sum of squared deviations
S(β) is minimized:
S(β) = Σ_{i=1}^{m} (y_i − φ(x_i, β))²,
with an initial guess β₀ = (a₀, f₀, p₀, v₀) which is obtained from an FFT by taking
the frequency, phase-shift and vertical shift with the highest amplitude.</p>
          <p>For a periodic regression we first perform an FFT on a given time-series. We
take the peak frequency in the frequency domain from the FFT and discard all
other frequencies. The peak refers to that particular frequency with the highest
energy (= amplitude). Intuitively, this can be compared to a principal component
analysis (PCA), where for dimensionality-reduction purposes we keep, for example,
only the principal component with the largest eigenvalue. The amplitude is
determined by computing the absolute value of the complex FFT coefficient. Having
the frequency with the highest amplitude (= peak), we then determine the
phase-shift, which is easily obtained by computing the angle of the complex
FFT coefficient. With the amplitude, frequency and phase we can then
compute the missing vertical shift. The FFT-obtained frequency, phase-shift,
vertical shift and amplitude are taken as an initial guess for a DLS as stated
in Definition 3. We now have the periodic feature vector φ which represents the
regression of a given periodic time-series.</p>
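<p>The regression procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: scipy's Levenberg-Marquardt fitter (curve_fit with method="lm") stands in for the DLS, and the FFT peak supplies the initial guess (a₀, f₀, p₀, v₀):</p>

```python
import numpy as np
from scipy.optimize import curve_fit

def periodic_model(x, a, f, p, v):
    # phi(x) = a * sin(f*x + p) + v, the model of Definition 1
    return a * np.sin(f * x + p) + v

def periodic_regression(x, y):
    """Fit one sinusoid by damped least squares (Levenberg-Marquardt),
    seeded with the FFT peak as the initial guess (a0, f0, p0, v0)."""
    n = len(y)
    coeffs = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(n, d=x[1] - x[0])
    k = np.argmax(np.abs(coeffs[1:])) + 1        # peak (non-DC) frequency bin
    a0 = 2.0 * np.abs(coeffs[k]) / n             # amplitude at the peak
    f0 = 2.0 * np.pi * freqs[k]                  # angular frequency
    p0 = np.angle(coeffs[k]) + np.pi / 2.0       # sine (not cosine) phase-shift
    v0 = np.abs(coeffs[0]) / n                   # vertical shift from the DC term
    params, _ = curve_fit(periodic_model, x, y, p0=(a0, f0, p0, v0), method="lm")
    return params                                # the feature vector (a, f, p, v)

x = np.arange(200) * 0.1                         # evenly spaced samples
y = 2.0 * np.sin(2 * np.pi * 0.25 * x + 0.4) + 3.0
a, f, p, v = periodic_regression(x, y)
```

<p>With a signal that falls exactly on an FFT bin, the initial guess is already near-exact and the DLS converges immediately; on real trend data the fit refines the FFT estimate.</p>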
          <p>However, the regression approach comes with the drawback that we may
lose information on the constituents which actually contribute to the detailed
"shape" of a time-series. This is an issue which we shall approach in the upcoming
subsection.</p>
        </sec>
        <sec id="sec-2-3-5">
          <title>Signal Constituent Tensors through Signal Decomposition</title>
          <p>What are the most determinant constituents of an original time-series signal?
Or, to express it rather in terms of a principal component analysis: what are
the most determinant principal components? For this purpose we perform an
FFT as described in the previous section and keep the k strongest signals, where
the signal strength is determined by the amplitude. The other constituents are
discarded. A possible approach by which k can be chosen is, e.g., the
elbow method [6]. As a result we obtain a tensor of the following structure:</p>
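<p>A sketch of extracting the top-k constituents into such a tensor (our naming; a numpy-based illustration, not the authors' code):</p>

```python
import numpy as np

def constituent_tensor(y, k, dt=1.0):
    """Top-k signal constituents (a_i, f_i, p_i, v_i), ordered by amplitude."""
    n = len(y)
    coeffs = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(n, d=dt)
    amps = 2.0 * np.abs(coeffs) / n
    phases = np.angle(coeffs)
    v = np.abs(coeffs[0]) / n                    # vertical shift from the DC term
    order = np.argsort(amps[1:])[::-1][:k] + 1   # k strongest non-DC constituents
    # One row per constituent: (amplitude, frequency, phase-shift, vertical shift).
    return np.column_stack([amps[order], freqs[order], phases[order],
                            np.full(k, v)])

# Two superimposed sinusoids; the tensor rows come out ordered by amplitude.
t = np.arange(300)
y = 4.0 * np.sin(2 * np.pi * 0.05 * t) + 1.5 * np.sin(2 * np.pi * 0.2 * t) + 2.0
gamma = constituent_tensor(y, k=2)
```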
        </sec>
        <sec id="sec-2-3-6">
          <title>Definition 4 (Signal Constituent Tensor)</title>
          <p>Given the top k signal constituents φ_i = (a_i, f_i, p_i, v_i)^T from an FFT on a
time-series τ, the resulting signal constituent tensor is defined as
Γ(τ) = (φ_0^T, ..., φ_{k−1}^T)^T,
where the single constituents are ordered in a way such that it holds:
a_0 &gt; a_1 &gt; ... &gt; a_{k−1}.</p>
          <p>The signal constituent tensor from Definition 4 now provides us with a
representation of the top k strongest signal constituents. They pose the very basis
for computing similarities between different time-series among their respective
constituent signals.</p>
        </sec>
        <sec id="sec-2-3-7">
          <title>Similarity and Clustering of Signal Constituents</title>
          <p>Having obtained the signal constituent tensors, we use them as a basis for
performing a clustering. As a design decision we use hierarchical clustering
with average linkage here. In order to perform hierarchical clustering we need a
distance matrix. For such a distance matrix, the question arises of how to actually
compute the distance between the signal constituent tensors of two time-series.
For this purpose we introduce the MinCT-dist distance in this work.</p>
        </sec>
        <sec id="sec-2-3-8">
          <title>Definition 5 (Minimum Constituent Tensor Distance (MinCT-dist))</title>
          <p>Given two signal constituent tensors Γ(τ_1) and Γ(τ_2), each of them from one
time-series τ_1 and τ_2, respectively. The minimum constituent tensor distance
is defined as
dMinCT(Γ(τ_1), Γ(τ_2)) = min{ δ(φ_0(τ_1), φ_0(τ_2)), ..., δ(φ_{k−1}(τ_1), φ_{k−1}(τ_2)) },
where δ(φ_i(τ_1), φ_j(τ_2)) denotes the distance between two constituents
φ_i(τ_1) ∈ Γ(τ_1) and φ_j(τ_2) ∈ Γ(τ_2).</p>
          <p>The intuition behind Definition 5 is that we take the minimum over the
constituent distances between two constituent tensors as the distance. Through this
approach we link those time-series which have a small distance on a specific
constituent. By that definition we also obtain some sort of lower bound, since the
distance between two time-series cannot be smaller than any of the constituent
distances.</p>
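<p>The paper leaves the constituent distance δ unspecified; as a minimal sketch we assume a Euclidean distance between the (a, f, p, v) rows of two constituent tensors and take the minimum over all constituent pairs, following the membership clause of Definition 5. The function name is ours:</p>

```python
import numpy as np

def minct_dist(gamma1, gamma2):
    """Minimum Constituent Tensor Distance: the smallest pairwise
    (here: Euclidean) distance between any two constituents."""
    # All pairwise row distances between the two tensors via broadcasting.
    diffs = gamma1[:, None, :] - gamma2[None, :, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    return d.min()

# Two toy tensors sharing one identical constituent -> distance 0.
g1 = np.array([[3.0, 0.1, 0.0, 1.0], [1.0, 0.5, 0.2, 1.0]])
g2 = np.array([[3.0, 0.1, 0.0, 1.0], [0.5, 0.9, 0.1, 2.0]])
```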
          <p>Such an approach as provided by MinCT-dist works well on coarse
resolutions (such as one measurement per week), but fails on finer temporal
resolutions, since MinCT-dist neglects repetitive patterns on a daily or weekly
level. To mitigate this effect, we provide a different distance measure taking
the different phase-shifts of each time-series into account. This leads to a first
definition of our next distance measure:</p>
          <p>Definition 6 (Phase-shift based distance). Given two time-series τ_i, τ_j,
where each of the time-series has a common series of different frequency intervals
f = [f_1, f_2, ..., f_n], where e.g. f_1 = [23h, 35h]. For each time-series an FFT
is computed, obtaining the respective amplitudes a and phase-shifts p for the
frequency intervals f. The distance between two time-series τ_i and τ_j is defined as
dphase(τ_i, τ_j) = Σ_{k=0}^{n−1} |p(τ_i, f_k) − p(τ_j, f_k)|.</p>
          <p>Definition 6 also relies on taking the top-k most impactful frequencies for each of
the frequency intervals. However, this phase-shift-centric approach comes with
the major drawback that all frequency intervals have the same impact, being
considered as 'equally important'. Looking at a keyword like e.g. 'beer garden'
(Biergarten in German), the daily cycles may be more important, since people
query their search engine of choice for the nearest beer garden in the afternoon.
Looking at an annual resolution, the summer may be of more relevance, since
people would look for a beer garden to spend their time in summer rather than
in winter. To counteract this effect, we multiply each of the absolute phase-shift
distances with the associated amplitudes, yielding the next stage of our distance
definition:</p>
        </sec>
        <sec id="sec-2-3-9">
          <title>Definition 7 (Amplitude-weighted phase-shift based distance)</title>
          <p>Given two time-series τ_i, τ_j. The amplitude-weighted variant of the phase-shift
based distance is defined as
daw-phase(τ_i, τ_j) = Σ_{k=0}^{n−1} a(τ_i, f_k) · a(τ_j, f_k) · |p(τ_i, f_k) − p(τ_j, f_k)|.</p>
          <p>By including the amplitudes of both time-series as weighting factors as
in Definition 7, frequency intervals with small impact (small amplitude) have a
smaller impact on the overall distance. In the scenario that both time-series
have high amplitudes in the same frequency intervals, the phase-shift distance is
also meaningful for the overall distance. However, this approach will fail in the
case that both time-series do not share impactful frequencies. This would result
in small distances, rendering it impossible to tell whether they have similar
phase-shifts or whether the distance itself is just insignificant. This effect is,
however, mitigated by adding up the multiplied amplitudes without the phase-shifts
and normalizing Definition 7 by this sum.</p>
        </sec>
        <sec id="sec-2-3-10">
          <title>Definition 8 (Normalized amplitude-weighted phase-shift based distance)</title>
          <p>Given two time-series τ_i, τ_j. The normalized amplitude-weighted variant of the
phase-shift based distance is defined as
dnaw-phase(τ_i, τ_j) = (1 / Σ_{k=0}^{n−1} a(τ_i, f_k) · a(τ_j, f_k)) · Σ_{k=0}^{n−1} a(τ_i, f_k) · a(τ_j, f_k) · |p(τ_i, f_k) − p(τ_j, f_k)|.</p>
          <p>With Definition 8 we now have a distance function which computes the
distance with respect to (a) the phase-shift, (b) the amplitudes and thus the impact,
and (c) different frequency intervals. We shall see in the experiments section the
effects of our defined distance functions.</p>
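<p>Definition 8 reduces to a few lines of code. The following sketch (our naming; numpy assumed) takes the per-interval amplitudes and phase-shifts of two time-series as arrays:</p>

```python
import numpy as np

def naw_phase_dist(a_i, p_i, a_j, p_j):
    """Normalized amplitude-weighted phase-shift distance (Definition 8).

    a_i, p_i: amplitudes and phase-shifts of one time-series, one entry per
    frequency interval; a_j, p_j likewise for the other time-series.
    """
    w = a_i * a_j                          # joint amplitude weights per interval
    d = np.sum(w * np.abs(p_i - p_j))      # amplitude-weighted phase distance
    return d / np.sum(w)                   # normalize by the total weight

# Two toy series: aligned in the dominant interval, shifted in the weak one.
a1 = np.array([5.0, 1.0]); p1 = np.array([0.0, 2.0])
a2 = np.array([4.0, 1.0]); p2 = np.array([0.0, 1.0])
```

<p>Here the dominant interval (weight 20) agrees in phase, so the weak interval's phase-shift of 1 contributes only 1/21 to the normalized distance, illustrating the weighting of Definition 8.</p>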
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In order to re-connect to the initial journal use-case from the introduction,
we take for the conducted experiments 10 time-series from Google Trends (http://trends.google.com),
namely: college dorm, fireworks, fitness, flu, healthy eating, holidays, i have a
dream, sunburn, superbowl and turkey, from a time frame between 2004 and
today in the region of the United States. We have the relevance for each of these key
terms, which denotes the vertical axis, and each sample represents one month. This
yields a total of 183 samples per key term. We used the FFT from the numpy
library, on top of which we wrote routines to extract the amplitude, phase-shift and
vertical shift of each constituent, ranking the constituents by amplitude and
encapsulating them into a constituent tensor. We then performed a hierarchical
clustering using first a periodic regression and then all three of our distance measures.
Computing first the clustering of the regression models for each of the keywords, we
obtain the result seen in Figure 2.</p>
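<p>The clustering step can be sketched with scipy's hierarchical clustering on a precomputed distance matrix (average linkage, as chosen above). scipy, the helper name and the toy keyword distances here are our illustrative assumptions, not the experimental data:</p>

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_by_distance(names, dist, n_clusters):
    """Average-linkage hierarchical clustering on a precomputed distance matrix."""
    condensed = squareform(dist, checks=False)   # square matrix -> condensed form
    Z = linkage(condensed, method="average")     # average linkage, as in Sec. 3
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return dict(zip(names, labels))

names = ["fitness", "healthy eating", "turkey", "fireworks"]
dist = np.array([[0.0, 0.1, 0.9, 0.8],
                 [0.1, 0.0, 0.8, 0.9],
                 [0.9, 0.8, 0.0, 0.2],
                 [0.8, 0.9, 0.2, 0.0]])
clusters = cluster_by_distance(names, dist, n_clusters=2)
```

<p>Any of the three distance measures can fill the matrix; only the way the matrix is computed changes between the experiments.</p>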
      <p>In the periodic regression model clustering we can observe one cluster block
on the diagonal with low intra-cluster distance (center) and two with high
intra-cluster distance. The clustering, however, does not seem to make much
sense, since Martin Luther King day (i have a dream) and turkey, as well
as flu or super bowl, do not share any temporal closeness. They all occur annually
(except flu, maybe) but do not fall on the same time of the year (phase).</p>
      <p>Comparing the regression result against the minimum constituent tensor
distance, we obtain the cluster map shown in Figure 3. It shows that our
approach yields a more meaningful result. We have two clusters with very low
intra-cluster distances. The bottom-right cluster encompasses the keywords healthy
eating, i have a dream, fitness and holidays. Taking a closer look at the
keywords reveals that healthy eating and fitness are terms of high relevance during
the beginning of the year (mostly new year's resolutions). Also, Martin Luther
King day is in January (20.01). In the top-left cluster we can see fireworks, flu,
turkey, college dorm, sunburn and super bowl. Fireworks has its peak around
independence day (04.07), and sunburn also matches well, since this is in the
summer time. Flu takes some kind of special role, since people seem to look
for this keyword in the search engine mostly during October, which is also close
to thanksgiving (turkey). College dorm is close to the summer vacations and as
such also suitable in the cluster. Just the term super bowl does not seem to fit
the pattern, since it is in February. Compared to the simple regression model,
a clustering based on decomposed signal constituents seems to provide a much
more intuitive result.</p>
      <p>However, in this first experiment the time-series data was obtained on a
weekly basis. What happens if we take different keywords at different temporal
resolutions? The frequency intervals encompass 8, 12, 24, 84, 168, 420, 735, 802,
981, 1103, 1471, 2207, 2943, 3924, 4415, 7064, 8830 and 11773 hours, representing
working days, weeks, months, seasons etc. For each of the keywords, for all of the
frequency intervals, the respective amplitudes and phase-shifts are computed
through an FFT. As keywords for this second data set we chose weather (Wetter),
outdoor pool (Freibad), winter tires (Winterreifen), bakery (Bäcker), MVV
(Munich local public transport), beer garden (Biergarten), sunburn (Sonnenbrand),
tv program (Fernsehprogramm), sunrise (Sonnenaufgang), Twitter, Club, Ikea,
Christmas (Weihnachten), Ski and Firework (Feuerwerk). Computing clusters
on this trend data with MinCT-dist yields the clustermap seen in Figure 4.
While club and twitter are together in one cluster, most of the rest of the data
set is put into one massive cluster in which Christmas and outdoor pool are put
together. The result seems arbitrary.</p>
      <p>If we look at the clustermap yielded by the normalized amplitude-weighted
phase-shift based distance, one can see that more fine-grained clusters are
revealed, as seen in Figure 5. Weather and outdoor pool are put together into one
cluster, which makes sense, since people would go to an outdoor pool only if the
weather is suitable. Further, sunburn, beer garden, outdoor pool and MVV fall
into the same cluster.</p>
      <p>Taking a look at the time-series on a 30-days interval reveals no common
periodic pattern as seen in Figure 6.</p>
      <p>However, if we change from the 30-days interval to a 7-days interval, one can
observe that all keywords have a high(er) number of queries between 8am and
10pm, as seen in Figure 7. Individually, MVV peaks at around 8am and
9pm, which corresponds roughly to the morning time when people get to work
and the evening time when they return from evening events. Outdoor pool peaks
at around lunch time, which reflects people seeking to go swimming in
the afternoon. In the late afternoon, beer garden peaks, which is due to the effect
that people look for beer gardens to spend their evening at. Lastly, as an
interesting insight, we observe an increased number of queries for sunburn around
9pm-10pm; it may be speculated that people have got a sunburn and look for
ways to heal or alleviate the pain from it.</p>
      <p>In this work-in-progress we have presented two distance functions for clustering
time-series based on their signal constituents which emerged from an FFT. The
first experiments on real-world data showed that our method reveals similarities
between time-series on the constituent models which would not have been visible
by simply clustering the regression models. While our first approach goes beyond
a simple periodic regression, it fails when considering different frequency intervals
with their different phase-shifts. For this purpose we developed the normalized
amplitude-weighted phase-shift based distance, which provides a better reflection
of distances by taking phase-shift and amplitudes on different frequency
intervals into account. While this work may seem like 'yet another distance
measure for temporal data', it aims to explore the computation of distances based on
single constituents, respecting the influences of phase-shift, amplitude and
frequency intervals. We hope that this work leads to new discoveries on time-series
data, and the team in our fictional journal to more time-specific topics for their
readers.</p>
      <sec id="sec-3-1">
        <title>Acknowledgement</title>
        <p>This work has been funded by the German Federal Ministry of Education and
Research (BMBF) under Grant No. 01IS18036A. The authors of this work take
full responsibility for its content.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aghabozorgi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shirkhorshidi</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wah</surname>
          </string-name>
          , T.Y.:
          <article-title>Time-series clustering - a decade review</article-title>
          .
          <source>Information Systems</source>
          <volume>53</volume>
          ,
          <fpage>16</fpage>
          -
          <fpage>38</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cooley</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tukey</surname>
            ,
            <given-names>J.W.:</given-names>
          </string-name>
          <article-title>An algorithm for the machine calculation of complex Fourier series</article-title>
          .
          <source>Mathematics of computation 19(90)</source>
          ,
          <volume>297</volume>
          -
          <fpage>301</fpage>
          (
          <year>1965</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Eisen</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spellman</surname>
            ,
            <given-names>P.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>P.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botstein</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Cluster analysis and display of genome-wide expression patterns</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>95</volume>
          (
          <issue>25</issue>
          ),
          <volume>14863</volume>
          -
          <fpage>14868</fpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Harmer</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogenesch</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Straume</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            , H.S., Han,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kreps</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kay</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>Orchestrated transcription of key pathways in arabidopsis by the circadian clock</article-title>
          .
          <source>Science</source>
          <volume>290</volume>
          (
          <issue>5499</issue>
          ),
          <volume>2110</volume>
          -
          <fpage>2113</fpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hensman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rattray</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
            ,
            <given-names>N.D.</given-names>
          </string-name>
          :
          <article-title>Fast nonparametric clustering of structured time-series</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>37</volume>
          (
          <issue>2</issue>
          ),
          <fpage>383</fpage>
          -
          <lpage>393</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ketchen</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shook</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>The application of cluster analysis in strategic management research: an analysis and critique</article-title>
          .
          <source>Strategic Management Journal</source>
          <volume>17</volume>
          (
          <issue>6</issue>
          ),
          <fpage>441</fpage>
          -
          <lpage>458</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Langmead</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McClung</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Donald</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          :
          <article-title>Phase-independent rhythmic analysis of genome-wide expression patterns</article-title>
          .
          <source>Journal of Computational Biology</source>
          <volume>10</volume>
          (
          <issue>3-4</issue>
          ),
          <fpage>521</fpage>
          -
          <lpage>536</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lucas</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shifaz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelletier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Neill</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaidi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goethals</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petitjean</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Webb</surname>
            ,
            <given-names>G.I.</given-names>
          </string-name>
          :
          <article-title>Proximity forest: an effective and scalable distance-based classifier for time series</article-title>
          .
          <source>Data Mining and Knowledge Discovery</source>
          <volume>33</volume>
          (
          <issue>3</issue>
          ),
          <fpage>607</fpage>
          -
          <lpage>635</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Popivanov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          :
          <article-title>Similarity search over time-series data using wavelets</article-title>
          .
          In:
          <source>Proceedings 18th International Conference on Data Engineering</source>
          . pp.
          <fpage>212</fpage>
          -
          <lpage>221</lpage>
          .
          <publisher-name>IEEE</publisher-name>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>