<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Cities</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>An Exploratory Analysis of Multiple Multivariate Time Series</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lynne Billard</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ahlame Douzal-Chouakria</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seyed Yaser Samadi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics, Southern Illinois University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Statistics, University of Georgia</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universite Grenoble Alpes</institution>
          ,
          <addr-line>CNRS - LIG/AMA</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1</volume>
      <fpage>2</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>Our aim is to extend standard principal component analysis for non-time series data to explore and highlight the main structure of multiple sets of multivariate time series. To this end, standard variancecovariance matrices are generalized to lagged cross-autocorrelation matrices. The methodology produces principal component time series, which can be analysed in the usual way on a principal component plot, except that the plot also includes time as an additional dimension.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>dependencies can be used in a principal component analysis. This produces
principal component time series, which in turn allows the projection of the original
time series observations onto three dimensional principal component by time
space. The basic methodology is outlined in Section2, and illustrated in Section
3.
2
2.1</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>Cross-Autocorrelation functions for S &gt; 1 series and p &gt; 1
dimensions
Let Xst = {(Xstj ); j = 1; : : : ; p}, t = 1; : : : ; Ns, s = 1; : : : ; S, be a p-dimensional
time series of length Ns, for each series s. For notational simplicity, assume
Ns = N for all s. Let us also assume the observations have been suitably di
erenced/transformed so that the data are stationary.</p>
      <p>For a standard single univariate series time series where S = 1 and p = 1, it
is well-known that the sample autocovariance function at lag k is (dropping the
s = S = 1 and j = p = 1 subscripts)
^(k) =
1 N−k</p>
      <p>∑ (Xt − X)(Xt+k − X); k = 0; 1; : : : ;
N t=1
X = 1 ∑N Xt;</p>
      <p>N t=1
(2.1)
and the sample autocorrelation function at lag k is ^(k) = ^(k)=^(0), k =
0; 1; : : :.</p>
      <p>These autocorrelation functions provide a measure of the time dependence
between observations changes as their distance apart, lag k. They are used to
identify the type of model and also to estimate model parameters. See, many
of the basic texts on time series, e.g., Box et al. (2011); Brockwell and Davis
(1991); Cryer and Chan (2008). Note that the divisor in Eq.(2:1) is N , rather
than N − k. This ensures that the sample autocovariance matrix is non-negative
de nite.</p>
      <p>For a single multivariate time series where S = 1 and p ≥ 1, the
crossautocovariance function between variables (j; j′) at lag k is the p × p matrix
(k) with elements estimated by
and the cross-autocorrelation function between variables (j; j′) at lag k is the
p × p matrix, (k), with elements { jj′ (k); j; j′ = 1; : : : ; p} estimated by
^jj′ (k) = ^jj′ (k)={^jj (0)^j′j′ (0)}1=2; k = 0; 1; : : : :
(2.3)</p>
      <p>Unlike the autocorrelation function obtained from Eq.(2:1) with its single
value at each lag k, Eq.(2:3) produces a p × p matrix at each lag k. The function
Eq.(2:2) was rst given by Whittle (1963) and shown to be nonsymmetric by
Jones (1964). In general, jj′ (k) ̸= j′j (k) for variables j ̸= j′, except for k = 0,
but (k) = ′(−k); see, e.g., Brockwell and Davis (1991).</p>
      <p>When there are S ≥ 1 series and p ≥ 1 variables, the de nition of
Eqs.(2:2)(2:3) can be extended to give a p×p sample cross-autocovariance function matrix
between variables (j; j′) at lag k, ^ (k), with elements given by, for j; j′ =
1; : : : ; p;</p>
      <p>
        N1S ∑s=S1 N∑t=−1k(Xstj − Xj )(Xs;t+k;j′ − Xj′ ); k = 0; 1;
(2.4)
with Xj = N1S ∑s=S1 ∑t=N1 Xstj ;
and the p × p sample cross-autocorrelation matrix at lag k, ^(
        <xref ref-type="bibr" rid="ref2">1</xref>
        )(k), has
elements ^jj′ (k), j; j′ = 1; : : : ; p, obtained by substituting Eq.(2:4) into Eq.(2:3).
This cross-autocovariance function in Eq.(2:4) is a measure of time dependence
between observations k units apart for a given variable pair (j; j′), calculated
across all S series. Notice, the sample means Xj in Eq.(2:4) are calculated across
all N S observations.
      </p>
      <p>An alternative approach is to calculate these sample means by series. In
this case, the cross-autocovariance matrix ^ (k) has elements estimated by, for
j; j′ = 1; : : : ; p, s = 1; : : : ; S,</p>
      <p>N1S ∑s=S1 N∑t=−1k(Xstj − Xsj )(Xs;t+k;j − Xsj′ ); k = 0; 1;
(2.5)
with Xsj =
1 ∑N Xstj ;
N t=1
and the corresponding p × p cross-autocorrelation function matrix ^(2)(k) has
elements ^jj′ (k) found by substituting Eq.(2:5) into Eq.(2:3).</p>
      <p>Other model structures can be considered, which would provide other options
for obtaining the relevant sample means. These include class structures, lag k
structures, weighted series and/or weighted variable structures, and the like; see
Billard et al. (2015).
2.2</p>
      <p>Principal Components for Time Series
In a standard classical principal component analysis on a set of p-dimensional
multivariate observations X = {Xij ; i = 1; : : : n; j = 1; : : : ; p}, each observation
is projected into a corresponding th order principal component, P C (i), through
the linear combination of the observation's variables,</p>
      <p>P C (i) = w 1Xi1 + · · · + w pXip;
= 1; : : : ; p;
(2.6)
where w = (w 1; : : : ; w p) is the th eigenvector of the correlation matrix (or,
equivalently for non-standardized observations, the variance-covariance matrix
). The eigenvalues satisfy 1 ≥ 2 ≥ : : : ≥ p ≥ 0, and ∑p=1 = p (or, 2 for
non-standardized data). A detailed description of this methodology for standard
data can be found in any of the numerous texts on multivariate analysis, e.g.,
Joli e (1986) and Johnson and Wichern (2007) for an applied approach, and
Anderson (1984) for theoretical details.</p>
      <p>For time series data, the correlation matrix is replaced by the
crossautocorrelation matrix (k), for a speci c lag k = 1; 2; : : : , and the th order
principal component of Eq.(2:6) becomes
P C (s; t) = w 1Xs1t + · · · + w pXspt; = 1; : : : ; p; t = 1; : : : ; N; s = 1; : : : ; S:
(2.7)
The elements of (k) can be estimated by ^jj′ (k) from Eq.(2:4) or from Eq.(2:5)
(or from other choices of model structure). The problem of non-positive de
niteness, for lag k &gt; 0, for the cross-autocorrelation matrix has been studied by
Rousseeuw and Molenberghs (1993) and Jackel (2002), with the recommendation
that negative eigenvalues be re-set at zero.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Illustration</title>
      <p>To illustrate, take a data set (&lt;http://dss.ucar.edu/datasets/ds578.5&gt;) where
the observations are time series of monthly temperatures at S = 14 cities
(weather stations) in China over the years 1923-88. In the present analysis,
each month is taken to be a single variable corresponding to the twelve months
(January, : : : , December, respectively); hence, p = 12. Clearly, these variables
are dependent as re ected in the cross-autocovariances when j ̸= j′.</p>
      <p>Let us limit the discussion to using the cross-autocorrelation functions at lag
k = 1, evaluated from Eq.(2:4) and Eq.(2:3), and shown in Table 1. We obtain the
corresponding eigenvalues and eigenvectors, and hence we calculate the principal
components P C , = 1; : : : ; p, from Eq.(2:6). A plot of P C1 × P C2× time is
displayed in Figure 1, and that for P C1 × P C3× time is given in Figure 2. An
interesting feature of these data highlighted by the methodology is that it is
the P C1 × P C3 pair that distinguishes more readily the city groupings. Figure
3 displays the P C1 × P C3 values for all series and all times without tracking
time (i.e., the 3-dimensional P C1 × P C3 × time values are projected onto the
P C1 × P C3 plane). Hence, we are able to discriminate between cities.
Thus, we observe that cities 1-4 (Hailaer, HaErBin, MuDanJiang and ChangChun,
respectively), color coded in black (and indicated by the symbol black ◦ and full
lines (`lty=1')) have similar temperatures and are located in the north-eastern
region of China. Cities 5-7 (TaiYuan, BeiJing, TianJin), identi ed by red (△ and
lines − · − (`lty=4')), are in the north, and have similar but di erent
temperature trends than do those in the north-eastern region. Two (BeiJing and TianJin)
are located close to sea-level, while the third (TaiYuan) is further south (and so
might be expected to have higher temperatures) but its elevation is very high so
decreasing its temperature patterns to be more in line with BeiJing and TianJin.
Cities 8-11 (ChengDu, WuHan, ChangSha, HangZhou), green (∗) with lines · · ·
(`lty=3'), are located in central regions with ChengDu further west but elevated.
Finally, cities 12-14 (FuZhou, XiaMen, GuangZhou), blue ( ) with lines − − −
(`lty=8'), are in the southeast part of the country.</p>
      <p>Pearson correlations between the variables Xj , j = 1; : : : ; 12, and the
principal components P C , = 1; : : : ; 12, sand correlation circles (not shown) show
that all months have an impact on P C1 with the months of June, July and
August having a slightly negative in uence on P C2. Plots for other k ̸= 1 values
give comparable results. Likewise, analyses using the cross-autocorrelations of
Eq.(2:5) also produce similar conclusions.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>The methodology has successfully identi ed cities with similar temperature trends,
which trends a priori could not have been foreshadowed, but which do conform
with other geophysical information thus con rming the usefulness of the
methodology. The cross-autocorrelation functions for a p-dimensional multivariate time
series have been extended to the case where there are S ≥ 1 multivariate time
series. These replaced the standard variance-covariance matrices for use in a
principal component analysis, thus retaining measures of the time dependencies
inherent to time series data. The methodology produces principal component
time series, which can be compared in the usual way on a principal component
plot, except that the plot also includes time as an additional plot dimension.</p>
      <p>Figure 1 - Temperature Data: P C1 × P C2 over Time { All Cities, k = 1</p>
      <p>40 50 60 70
0 200 400 600 800 0 10 20 30 Time</p>
      <p>PC1
Figure 2 - Temperature Data: P C1 × P C3 over Time { All Cities, k = 1</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>Cities 8−11 Cities 12−14</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>Cities 1−4 PC1 −400 −200 0 200 400 600</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Figure</surname>
            3 -
            <given-names>Temperature</given-names>
          </string-name>
          <string-name>
            <surname>Data: P C1 × P C3 { All</surname>
            <given-names>Cities</given-names>
          </string-name>
          , All Times,
          <source>k = 1</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>