<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Anomaly Detection in Temporal Graph Data: An Iterative Tensor Decomposition and Masking Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anna Sapienza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andre Panisson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joseph Wu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laetitia Gauvin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ciro Cattuto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data Science Laboratory, ISI Foundation</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Polytechnic University of Turin</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Public Health, University of Hong Kong</institution>
          ,
          <country country="HK">Hong Kong</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>Sensors and Internet-of-Things scenarios promise a wealth of interaction data that can be naturally represented by means of timevarying graphs. This brings forth new challenges for the identi cation and removal of temporal graph anomalies that entail complex correlations of topological features and activity patterns. Here we present an anomaly detection approach for temporal graph data based on an iterative tensor decomposition and masking procedure. We test this approach using highresolution social network data from wearable sensors and show that it successfully detects anomalies due to sensor wearing time protocols.</p>
      </abstract>
      <kwd-group>
        <kwd>Data cleaning</kwd>
        <kwd>anomaly detection</kwd>
        <kwd>non-negative tensor factorization</kwd>
        <kwd>high-resolution social networks</kwd>
        <kwd>sensors</kwd>
        <kwd>temporal networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Emerging applications in the big data and Internet-of-Things domains pose new
problems for data cleaning. Time-resolved interaction data, in particular, are
especially challenging because the relational nature of the data yields anomalies
that entangle temporal and topological aspects. Several studies have focused on
identifying anomalous behaviors in graph-based datasets [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and time-varying
networks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, mesoscale anomalies that mimic normal behaviors are
observed in empirical data and call for further research.
      </p>
      <p>
        Here we focus on time-varying graphs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] represented as three-mode tensors
and we present an semi-supervised anomaly detection method based iterative
tensor decomposition and masking. We report on the performance of this method
in detecting and removing anomalies in an empirical social network dataset
gathered by using wearable proximity sensors in a school.
A static graph can be represented by an adjacency matrix M 2 RN N , where
Mij = 1 if a contact between i and j occurred and Mij = 0 otherwise. This
description can be generalized to the case of a time-varying graph, by using a
sequence of S consecutive adjacency matrices, that can be easily arranged as a
tensor T 2 RN N S .
      </p>
      <p>Copyright c 2015 for this paper by its authors. Copying permitted for private and academic
purposes.</p>
      <p>The extraction of latent structures can then be performed by following the
iterative approach described below. This framework allows to carry out the data
cleaning by unearthing at each iteration group behaviours of nodes having
correlated activities and classifying these patterns of activities as meaningul or
anomalous.</p>
      <p>
        Step 1. The Non-negative Tensor Factorization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is used as a powerful tool
to approximate the tensor T as a sum of R rank-one tensors ar br cr, called
components. In the speci c case of temporal networks, ar and br provide the
membership of nodes to the component r, whereas cr is the temporal activity
pattern of the component. Moreover, it is possible to consider ar br if the
graph is undirected. The components can be recovered by solving an optimization
problem with non-negative constraints. The minimization problem
min tijk
      </p>
      <p>
        R R R
X X X aimajnckl
m=1 n=1 l=1
2
F
s.t. aim; ajn; ckl
0
(1)
is computed by the alternating non-negative least squares method [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], solved
by using the block principal pivoting algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The selection of a suitable
number R of components is guided at each iteration by the Core Consistency
Diagnostic [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], and performed in order to prevent over tting.
      </p>
      <p>Step 2. The extracted components are analysed in order to discriminate
between those dominated by anomalous activities or meaningful behaviours. To
this end, a classi er working on the temporal activity patterns of each component
cr was developed.</p>
      <p>
        Step 3. Spurious contact patterns highlighted by the anomalous components
are combined into a mask, used to clean the original tensor. The nodes involved
in each of these contacts are detected by analysing the level of membership
given by ar. The occurrence times of these contacts are given by the anomalous
windows found in the temporal patterns cr. These windows are recovered by
using a step detection algorithm based on the Otsu threshold [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>Step 4. The mask is applied to the tensor T in order to erase the invalid
entries. The cleaned tensor T 0 becomes then the input of the consecutive iteration
in the iterative framework.</p>
      <p>Step 5. The procedure is repeated until no component is classi ed as
anomalous in step 2.
3</p>
    </sec>
    <sec id="sec-2">
      <title>Results and Validation</title>
      <p>
        The current investigation involves the analysis of a high-resolution dataset which
describes the interactions of people in a primary school in Hong Kong. The school
population consisted in 709 children and 65 teachers divided into 30 classes. Data
were collected by using wearable proximity sensors [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] over 10 consecutive days
in March 2013, from Monday 18th to Thursday 27th. These sensors record spatial
proximity with a resolution of 20s. As a result, a time-varying network with
N = 774 nodes was created. The data were then aggregated over a time-window
of 5min, leading to a division of the overall network in S = 2680 snapshots.
      </p>
      <p>The protocol was as follows: the proximity of the sensors was recorded during
the whole experiment duration, and the sensors were grouped in each class at
the end of the school day. Hence, activity patterns composed by strong steady
contacts withinh each class were observed during the school closing time. In
order to clean the data, these anomalous patterns must be retrieved. A
general methodology is thus developed here to deal with the anomaly detection of
temporal graph-based data, and then used to perform the data cleaning of the
present problem.</p>
      <p>The results after 23 iterations of the iterative framework presented in Section
2 are summarised in Fig. 1, that shows the total contact number evolving in
time measured in the original tensor and in the cleaned tensor generated by
the iterative process. The total amount of contacts during the school closure
is extremely reduced as a result of the cleaning process. Normal interactions
belonging to the classes emerge and meaningful patterns are recovered.</p>
      <p>In order to validate the method, a reference tensor was created and used as
a ground truth. To this end, anomalous behaviours were identi ed and removed
from the original dataset by applying the step detection on the temporal contact
densities of each class. The ensuing tensor was compared with the cleaned tensor
T 0 at each iteration, by computing the relative error with the L1-norm. This
quantity, shown in Fig. 2, monotonically decreases with the number of steps of
the iterative approach and stabilizes around the low value of 0:2.</p>
      <p>The reference tensor was then used to label the entries of the original tensor as
anomalous or meaningful, in order to compute the confusion matrix summarizing
the performances of the iterative approach. The resulting recall and precision
values at each entry level reported in Tab. 1 and the overall accuracy of 0:94
highlight the high performance of the iterative approach.
4</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>Time-varying graphs can expose both topology and temporal correlations, which
make the anomaly detection a major challenge. The iterative approach
introduced here captures such correlations and enables to discriminate between
meaningful and anomalous patterns. The evaluation measurements of the anomaly
detection, achieved on the primary school dataset, highlights the high performance
of the implemented method.</p>
      <p>This iterative method is a principled approach, which provides an
unsupervised way to identify and select meso-scale data anomalies. However, some
limitations are worth noting. The method relies on the temporal activity pro le
of a latent component to be able to classify it as anomalous or not. Moreover,
the implication of the NTF makes the iterative approach computationally costly,
in terms of memory and time. The latter problem is being tackled as a lot of
research is devoted to improve the e ciency of the implementation.</p>
      <p>Finally, the iterative framework could be extended to the case of a greater
number of dimensions, e.g. with sensors localized in space.
5</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This research was supported by the Lagrange Project of the ISI Foundation funded by
the CRT Foundation, by the Q-ARACNE project funded by the Fondazione Compagnia
di San Paolo, by the FET Multiplex Project (EU-FET-317532) funded by the European
Commission, and by the Harvard Center for Communicable Disease Dynamics from
the National Institute of General Medical Sciences (grant no. U54 GM088558). The
funding bodies had no role in study design, data collection and analysis, preparation
of the manuscript, or the decision to publish.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>W.</given-names>
            <surname>Eberle</surname>
          </string-name>
          , L. Holder,
          <article-title>Discovering structural anomalies in graph-based data</article-title>
          .
          <source>In ICDM Workshops</source>
          , pp.
          <fpage>393</fpage>
          -
          <lpage>398</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Mongiovi</surname>
          </string-name>
          , et al.,
          <article-title>Netspot: Spotting signi cant anomalous regions on dynamic networks</article-title>
          .
          <source>SIAM International Conference on Data Mining</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>P.</given-names>
            <surname>Holme</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Saramaki. Temporal networks</article-title>
          .
          <source>Phys. Reps</source>
          .
          <volume>519</volume>
          :
          <fpage>97</fpage>
          -
          <lpage>125</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Stehle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Voirin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cattuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Isella</surname>
          </string-name>
          , et al.
          <article-title>High-resolution measurements of face-to-face contact patterns in a primary school</article-title>
          .
          <source>PLoS One</source>
          <volume>6</volume>
          :
          <fpage>e23176</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Cattuto</surname>
          </string-name>
          , et al.
          <article-title>Dynamics of person-to-person interactions from distributed RFID sensor networks</article-title>
          .
          <source>PLoS ONE</source>
          <volume>5</volume>
          :
          <issue>e11596</issue>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.G.</given-names>
            <surname>Kolda</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.W.</given-names>
            <surname>Bader</surname>
          </string-name>
          .
          <article-title>Tensor decompositions and applications</article-title>
          .
          <source>SIAM Review</source>
          ,
          <volume>51</volume>
          (
          <issue>3</issue>
          ):
          <fpage>455</fpage>
          -
          <lpage>500</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <article-title>Elde n, and</article-title>
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Non-negative tensor factorization based on alternating large-scale nonnegativity-constrained least squares</article-title>
          .
          <source>In Proceedings of IEEE 7th International Conference on Bioinformatics and Bioengineering (BIBE07)</source>
          , volume II, pages
          <fpage>1147</fpage>
          -
          <lpage>1151</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Park</surname>
          </string-name>
          .
          <article-title>Fast nonnegative tensor factorization with an activeset-like method</article-title>
          .
          <source>In High-Performance Scienti c Computing</source>
          , Springer, London,
          <year>2012</year>
          , pp.
          <fpage>311</fpage>
          -
          <lpage>326</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>R.</given-names>
            <surname>Bro</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. A. L.</given-names>
            <surname>Kiers</surname>
          </string-name>
          ,
          <article-title>A new e cient method for determining the number of components in PARAFAC models</article-title>
          ,
          <source>Journal of Chemometrics</source>
          ,
          <volume>17</volume>
          (
          <year>2003</year>
          ), pp.
          <fpage>274</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>E.E.</given-names>
            <surname>Papalexakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Faloutsos</surname>
          </string-name>
          ,
          <article-title>Fast e cient and scalable core consistency diagnostic for the parafac decomposition for big sparse tensors</article-title>
          ,
          <source>in Acoustics, Speech and Signal Processing (ICASSP)</source>
          ,
          <source>2015 IEEE International Conference on. IEEE</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>M. Fang</surname>
            , G. Yue, and
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
          </string-name>
          ,
          <source>The Study on An Application of Ostu Method in Canny Operator, Proceeding of the 2009 International Symposium on Information Processing</source>
          , Huangshan,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>China</surname>
          </string-name>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>112</lpage>
          ,
          <year>August 2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>