<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Event2Graph: Event-driven Bipartite Graph for Multivariate Time Series Forecasting and Anomaly Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuhang Wu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mengting Gu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lan Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu-san Lin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fei Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hao Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Visa Research</institution>
          ,
          <addr-line>385 Sherman Ave., Palo Alto, CA</addr-line>
          ,
          <country country="US">U.S.A</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Modeling inter-dependencies between time series is the key to achieve high performance in anomaly detection for multivariate time series data. The de-facto solution to model the dependencies is to feed the data into a recurrent neural network (RNN). However, the fully connected network structure underneath the RNN (either GRU or LSTM) assumes a static and complete dependency graph between time series, which may not hold in many real-world applications. To alleviate this assumption, we propose a dynamic bipartite graph structure to encode the inter-dependencies between time series. More concretely, we model time series as one type of nodes, and the time series segments (regarded as event) as another type of nodes, where the edge between two types of nodes describe a temporal pattern occurred on a specific time series at a certain time. Based on this design, relations between time series can be explicitly modeled via dynamic connections to event nodes, and the multivariate time series anomaly detection problem can be formulated as a self-supervised, edge stream prediction problem in dynamic graphs. We conducted extensive experiments to demonstrate the efectiveness of the design.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Time series</kwd>
        <kwd>Anomaly detection</kwd>
        <kwd>Dynamic graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        derlying inter-dependencies are diferent. Our work is
built upon the recent wisdom of dynamic graph neural
Detecting anomalies in time series data has been an im- network. We allow the connectivity of the graph changes
portant problem in the research community of data min- dynamically in diferent time stamps based on the
pating as well as the finance industry. In many circum- terns on the time series.
stances, anomaly patterns in multiple time series need In order to construct a dynamic graph on-the-fly, one
to be taken into account together to disclose the full important question is how to determine the
connectivpicture of the system. Previous works in multivariate ity of the graph in each timestamp. Our work is
inanomaly detection mainly rely on recurrent neural net- spired by the recent progress of evolutionary event graph
works (RNNs). Malhotra et al.[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Hundman et al.[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] where the nodes in the graph represent the
timeemployed LSTM models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to capture the temporal de- sequences segments (events) and directed links represent
pendencies of multivariate signals and adopted predic- the transition of the segments (events). Compare to
prevition and reconstruction errors, respectively, to identify ous works, this line of research naturally models the
timethe anomaly patterns. Su et al.[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] improved the classical varying relations among time series states via dynamic
RNN model by modeling the probability distribution of connections, and each state carries a physical meaning
time series via incorporating stochastic variables. To fur- that is understandable by human. However, one major
ther model the correlations of time series explicitly, Zhao limitation of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is that the event nodes employed in this
et al.[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a graph attention network to propa- work capture the information across all the time series.
gate information from diferent time series and aggregate Assume there are  segment patterns in each time series,
the information together before feeding into a GRU [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. and the number of time series is . The model would
However, because of the underlying RNN structure, pre- need  number of event nodes to represent the
multivious works assume a static, complete dependency graph variate signals. This exponential number of event nodes
among time series. These approaches may not perform strongly limits the information processing capability of
well under regime change of time series where the un- the evolutionary state graph and therefore allows only a
very small number of time series () to be analyzed in
practice (four in the dataset used in the paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
      <p>
        To address the problem of exponential number of
combinations, we disentangle the time series nodes and the
event nodes in our design, and model them as two types
of nodes in a dynamic bipartite graph (as depicted in
Fig. 1). Each event node only represents a time
segment on one individual time series, instead of integrating
patterns across all time series. The undirectional
connection between two types of nodes indicates event  sors, Su et al.[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed a stochastic LSTM framework
happens on the ℎ time series at time . So the maxi- to model the data distribution of time series in a global
mum number of edges in the graph is (), which perspective. As RNN was originally proposed to model
is much smaller than (2). To further improve the the temporal dependencies between diferent timestamps,
eficiency and generalizability of the algorithm, we built and all the dimensions of the input of RNN are used to
deupon the framework based on the recent advances in scribe a single concept (e.g.word) all together, it was not
edge streams [9, 10, 11], where connections between tailored to model the inter-dependencies among variables
nodes are modeled as incoming attributed edges instead by design. Recently, Zhao et al.[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a graph
atof constructing adjacency matrices. The complete sys- tention network-based (GAT) approach [12] where each
tem, with the name Event2Graph (Event-driven bipartite time series is regarded as an individual node, and
informaGraph), outperforms previous strong baselines on two tion are aggregated based on the underlying similarity
public data-sets, and we summarize our main contribu- of the signals. While this solution partially mitigates
tions as follows: the problem by taking into account of dynamic pairwise
(i) We propose a bipartite event-graph-based system to similarities between time series, the design assumes a
analyze the multivariate time series and model the inter- complete and static inter-dependency graph. Also, the
actions between time series and event segments via edge processed information after GAT is simply aggregated
streams. (ii) Events employed in the system is highly all together and fed into a single GRU. Hence the module
interpretable and easy for human to interact with. (iii) still sufers from similar problems as previous designs.
The system achieves competitive performance on chal- Dynamic inter-dependency relation: A few recent
lenging anomaly detection datasets through graph edge works started to explore the partial inter-dependency
forecasting. relations among time series. Deng et al.[13] constructed
a neighborhood graph on-the-fly based on sensor
embeddings to describe the dependencies between diferent
2. Related work sensors. The sparsity level of this incomplete graph can
be customized by users, but the connectivity of the graph
is fixed once constructed. Hu et al.[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed the first
dynamic dependency design in multivariate anomaly
detection. In this design, nodes are interpretable time series
states, and the transition of these states are explicitly
modeled by dynamic graph neural network. This design
allows the system to represent the time-varying relations
among time series. However, one major limitation of
Static inter-dependency relation: In this category
of approach, relations (e.g., correlation) between
multivariate time series are fixed once learned, and the model
assumes all the time series can influence each other
(complete inter-dependency). LSTM-based framework has
been widely employed in this category of work. For
instance, Malhotra et al.[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] proposed a LSTM-based
autoencoder network to detect anomalies from multiple
senthe method is it uses a single event node to represent
segments across all time series. This design brings the
combination explosion problem in pattern representation
(mentioned in Section I) and makes it hard to tackle a
moderate scale problem. Our model is built upon the
advances in dynamic graph neural networks. Most of the
literatures in dynamic graph networks assume
discretetime graph dynamics where the graphs are represented as
a sequence of snapshots [14, 15, 16]. Recent works start
exploring more flexible design and assuming edges can
appear at any time [10, 17, 9, 18, 11]. Our solution is
derived from a recent solution named Temporal Graph
Networks (TGN) proposed by Rossi et al.[11]. The model is
built upon the temporal graph attention network (TGAT)
[10] but extended with an unique node-wise memory
which attempt to model the temporal dependency on the
node level instead of the system level as [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It allows us
to model the dynamic inter-dependency in our bi-partite
event graph accurately, and provide us the flexibility to
incorporate new nodes that have not seen in training.
      </p>
    </sec>
    <sec id="sec-2">
      <title>3. Problem definition</title>
      <p>
        A multivariate time series is defined as X ∈ R ×  where
 is the maximum number of time stamps in the time
series and  is the dimension. The anomaly detection
problem is to detect specific time stamps ̇* ∈ O in a
multivariate time series where the time series
behavior deviating from the normal patterns of the time series.
Note that O contains timestamps that marked as anomaly
by domain expert. In case  is large, a common way to
handle long time series is to use a sliding window
approach. Let  denote the length of the sliding window.
We hereby formulate the problem as a binary
classification problem with the objective to identify time windows
with X[˙−  :˙]×  that contain anomaly time-stamps. One
solution of solving the multivariate time series anomaly
detection problem is to convert time series into a
homogeneous graph [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], where G is defined on the
representative patterns of a multivariate time series. The node of
the graph v corresponds to a human interpretable time
series patterns p ∈ R × . The pattern should be
representative enough so that the whole time series can be
approximated by transitions of symbolic events (states)
in the graph. The transitional relation between two
sequential nodes (end with stamp  and  + 1) is defined
as an edge in the graph. Many existing works (e.g., Bag
of Patterns, Shaples, sequence clustering) can be used to
distill representative patterns (event) from time series,
but most of them only efective on univariate time series
where  = 1.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Bipartite Event Stream</title>
      <sec id="sec-3-1">
        <title>The core intuition behind the dynamic bipartite design</title>
        <p>is to decouple three key concepts in the traditional event
graph, namely: where (which time series), when (at what
time), and which (the event category). This practice helps
us avoid the problem of exponential pattern
combinations.</p>
        <p>Specifically, the proposed algorithm is built upon a
dynamic bipartite graph. We define it as a sequence of
undirected bipartite event graph  = {(v, v,

(v, v))}. Here  represents when the time window
that the event graph is formulated. If the window size is
set to be 1, then the time window  is essentially the same
as the time step ̇. A time sequence index node v
indicates “where" an event v is happening. An attributed
edge (v, v) that connect event node v and sequence
index node v indicates an event v happened on time
series  at time window . For simplicity, we also denote
the edge as , in the following sections. We employ
an edge stream representation so an edge (v, v) is
only constructed to represent the relation that actually
existed. A major benefit of using edge stream
representation over adjacency matrix is that it allows the graph
structure to be more scalable and flexible, which
provides us the generality of incorporating new events that
have not appeared in training. Based on this bipartite
graph structure, we convert the multivariate anomaly
detection problem into a self-supervised edge prediction
problem in dynamic graph. Given the historical sequence
of G(1:) and event nodes v+1 in time window  + 1,
we are predicting edges ˆ(v+1, v+1)) in G+1. The
anomaly is derived as a mismatching score from the
predicted edge set ˆ(v+1, v+1)) and the observed edge
set (v+1, v+1)) with a read-out function (· ).</p>
        <p>So the procedure of solving the anomaly detection
problem becomes:
1. Given a normal sequence X, identify
representative patterns on each of the time sequence
respectively in multivariate time series.
2. With identified representative patterns events,
merge similar events across time series to indicate
afinity relation between time series.
3. Build a sequence of bipartite event graph for
multivariate time series (with a stride  ) based on
event matching.
4. Analyze the sequence of bi-partite event graphs
and derive a model Ψ to describe the intra- and
inter-dependency relations.
5. Given a testing sequence X, repeat (3), apply the
model learned in (4) to predict ˆ at time window
* .
6. Derive anomaly scores based on the predicted ˆ
and original time series with a proposed readout
function (· ).</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Event Node Detection And</title>
    </sec>
    <sec id="sec-5">
      <title>Matching</title>
      <p>place. We employed a highly eficient C language-based
Dynamic Time Wrapping algorithm [23] to conduct the
event matching, which provides a ( −  )∖ ×  × 
We defined the event nodes on each time series separately similarity tensor ℒ to indicate the similarity of each
and employed an unsupervised algorithm to identify the event to all the sliding window with size  and stride 
events. As we have no prior knowledge about what an in the input.
anomaly pattern would look like in most real world appli- Event nodes created by pattern matching: for a single
cations, the system is designed to learn the representative time stamp , given ℒ the event graph is created by
patterns and their correlations from observed data. selecting the best matched events for each time series.</p>
      <p>Representative segment detection: We adopted Matrix The event node is then connected with the
correspondProfile [ 19], a state-of-the-art unsupervised algorithm to ing time series node v in the event graph G with an
identify representative patterns from time sequences.f attribute edge (v, v). By finding and linking the best
Matrix profile is able to identify the top-  repeated pat- matched event for each time series, we added  number
terns on time series with high accuracy and low com- of event nodes into the graph.
putation time. We employed a recent implementation Event nodes created by residual error: we noticed that
of matrix profile called SCRIMP++ [ 20] on each of the even for the best matched time series, the pattern
matchtime series in X, and the algorithm yields a list of single ing still left with small residual errors. We hereby define
dimensional representative patterns p, ( ∈ {1 : }, two general residual nodes which indicate whether the
 ∈ {1 : }), each with size  . residual error in a time series is larger than a threshold</p>
      <p>Event generation: many time series share similar rep-  . One residual event node denoted as v+ indicates the
resentative patterns with each other. It is reasonable to residual error is larger than  , another residual event
merge similar segment patterns together across diferent node denoted as v− indicates the residual error is equal
time series to create the event nodes. To measure the or smaller than  . The parameter  is automatically
gensimilarity between time series, we employed dynamic erated using SPOT [24] algorithm, where we employ the
time wrapping [21], which provides us with a robust dis- whole training data-set for initialization and testing for
tance matrix that is insensitive to small misalignments on-going adaptation. All time series shared these two
between two time series segments. After that, we em- residual nodes as shown in Fig. 3.
ployed the H-DBSCAN [22] to cluster the patterns p, After generating a sequence of bipartite event graph
into clusters. We select the centroid of each cluster to G(1:) for time series X, a model Ψ is trained on G(1:)
be an event, which is a representative segment across all so that given a testing sequence X, it is able to predict
time series within the cluster. Finally, we obtain an event the connectivity of an event graph in any specific time
set E which contains all events each with length  . stamp ̇ by observing historical sequence observed on</p>
      <p>Event matching: after extracting  events from time se- X. In Fig. 2, we visualize the events forecasted/detected
ries segments, we match each event with the original time on ten time series. The diference between the top and
series to identify where and when each event is taking the middle group result in the bottom group where we</p>
      <sec id="sec-5-1">
        <title>5.1. Network Design With Node-wise Memory</title>
        <p>time  as following:
() =</p>
        <p>∑︁
∈([0,])
 ((), (), ,, (), ())
(3)
For each pair of v and v in graph G(), the node fea- TGA represents the temporal graph attention module
tures of them are defined as  and , respectively. We [11], where  graph attention layers compute ’s
embedalso define  , as the edge features between v and v. ding by aggregating information from its L-hop temporal
In order to support the dynamic inter-dependency graph, neighbors.
we avoid explicitly defining a static adjacency matrix to
describe the relation between nodes. Instead, we adopt a 5.2. Optimization and inference
state-message framework to model the node interactions.</p>
        <p>For each node v at time stamp  (here we use time se- The aforementioned model is trained in a self-supervised
ries node  as an example; the same rule applies to the fashion. As our goal is to predict the events that might
event node ), we define a state vector () to repre- happen in the time sequences in the next time step, which
sent its interaction history with other v nodes before corresponds to the edges linking the time sequence nodes
 in a compressed format. By initiating (0) as an all and event nodes. Therefore, we train a simple MLP model
zero vector, the interaction at time  is encoded with a with the edge prediction task based on ().
message vector (): We use the cross entropy to supervise the forecasting
of which original and residual events would happen in a
() = [ ,()||∆ ||(− )||(− )] (1) given time stamp . It is a two hot classification problem
where the two activated nodes corresponding to an event
where ∆  is the time elapse between the previous time node and a residual event node.
stamp − and , symbol || means concatenating operation. Event Forecasting Score: To convert the predicted event
After aggregating all the messages from neighbors, the edges ˆ into an anomaly score, for each time series node
state vector of v is updated as: v, with window size  , the event v, that has the
() = ({(1), ..., ()}, (− )) highest probability connect to v, is retrieved and its
(2) pattern in the original signal space is denoted as s, .</p>
        <p>We project the event s, ’s pattern back to its original
signal space, we then compute anomaly score based on
the dynamic time wrapping distance as follows:
Here agg(· ) is a general aggregation operation (support
learnable parameters). For the sake of simplicity, we
only compute the mean of the most recent messages 1, =   (X, , s, ) (4)
in aggregation. We use mem(· ) to represent a trainable
update function (e.g., GRU). Residual Score: For a positive residual event v+ at</p>
        <p>Build upon the updated state vector () and (), time stamp  where the forecasted results is not a
posia time-aware node embedding can be generated at any tive residual vˆ− , we calculate a changing point score to
Number of Time series nodes</p>
        <p>Number of event nodes
Time window length ( )
Time window stride ( )</p>
        <p>X−, ||)</p>
        <p>(5)
where   is a standard function that maps a scalar
into the negative log likelihood, which indicates the
sparsity of this changing point in the training data. A
frequent changing signal may results in small 2, after
the mapping. The function is learned in a data-driven
manner based on the training data of time series . The
ifnal anomaly score at time stamp  is calculated as the
following read out function (· ):
Dataset
SMD
SMAP</p>
        <p>Method
DAGMM
LSTM-VAE</p>
        <p>LSTM-NDT
OmniAnomaly
MTAD-GAT
Event2Graph</p>
        <p>DAGMM
LSTM-VAE</p>
        <p>LSTM-NDT
OmniAnomaly
MTAD-GAT
Event2Graph</p>
        <sec id="sec-5-1-1">
          <title>The proposed bipartite event graph contains  +</title>
          <p>number of nodes, where  is the number of time series
and  is the number of events ( equals to 25 and 38
in SMAP and SMD datasets,  equal to 130 and 64 on
SMAP and SMD datasets). The detailed number of nodes
is shown in Table 2. The length ( ) and stride ( ) of the
time window of each dataset are also displayed in Table
2. For each time series, we set the maximum number of
motifs detected to be three, and the minimum cluster size
of H-DBSCAN to be three. For the temporal attention
model, we set the number of multi-head to be 2. The GRU
is employed to model the time encoding. The dimension
of node embedding and message vector are both set to 64
respectively. Each model is trained after 10 epochs with
the learning rate 0.0001. The node features of both time
series and event nodes are randomly initialized, and edge
features are the one-hot embedding of the numerical ID
of the nodes on both sides of the edge.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>The objective of this experiment is to provide detailed analysis over the efectiveness of each proposed module.</title>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>6.3. Ablation study</title>
        <p>By removing each of the critical module of our model, Table 3
the performances are reported in Table 2. Ablation study on SMD dataset.
6.3.1. Efectiveness of temporal graph attention</p>
        <sec id="sec-5-2-1">
          <title>Through replacing the temporal graph attention with</title>
          <p>a simple MLP module, the model’s F1 score is reduced
by 2.42%. It demonstrated that the temporal attention
plays an important role in aggregating information from
neighborhood nodes.
6.3.2. Efectiveness of event forecasting score 1:</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>We remove the event matching score (Eq. 4) and just</title>
          <p>use a residual score (Eq. 5) for anomaly detection. We
observe that the model performance drop by 5.48%. The
experiment indicates the event forecasting module in
Event2Graph is essential for accurate anomaly detection.
From the experiment, we also observe that TGA plays a
critical role in guarantee the quality of the forecasting
score 1. By removing TGA, the forecasting-based model
performance (w/o score 2) degrade from 73.81% F1 score
to 64.51%.
6.3.3. Efectiveness of residual score 2
We remove the residual score (Eq. 5) and only use
forecasting score (Eq. 4) to detect anomaly. The model sufers
more than 10% performance degradation. This results
shows that modeling the residual score is essential to
allow the system to identify the anomaly regions that may
not well characterized by the event matching scores. We
also observe that the TGA may not contribute much to
residual event modeling, a 2 only model perform
competitively well with or without TGA. This phenomenon
tells us the residual event is something unexpected and an
accurate changing point detection is better than learning
simple, repeatable patterns.
6.3.4. Efectiveness of log likelihood function
 
To demonstrate our assumption, we compare the results
with a baseline that using a naive changing point
detection score  , without adopting any forecasting or
event modeling model. Surprisingly, the changing-point
only model is able to achieve 78.48% F1-score.
Outperforms DAGMM, LSTM-VAE, as well as LSTM-NDT. This
results further confirm that it is hard to learn any static
pattern by using LSTM or autoencoder for anomaly
detection on a challenge dataset, a dynamic model that is
able to adapt to the changing of time series patterns is
preferred.</p>
          <p>Model
Event2Graph</p>
          <p>w/o TGA
w/o score 1
w/o score 2
w/o TGA and w/o 1
w/o TGA and w/o 2
  only</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Conclusion</title>
      <sec id="sec-6-1">
        <title>We proposed an event-driven bipartite graph solution</title>
        <p>for multivariate time series anomaly detection. The
solution does not assume any inter-dependency on time
series, and all the relations are learned in a dynamic,
datadriven manner. Our design is based on edge-stream so
no adjacency matrix of the graph is required as input.</p>
        <p>As the system’s memory is defined on the node level,
our design left plenty space for future extensions such as
inductive learning and parallel computation. Our
solution achieved very competitive results on two anomaly
detection datasets, and we encourage future works to
explore further using bipartite event graph for multivariate
anomaly detection.
series event prediction with evolutionary state theory, KDD’ 2017.</p>
        <p>graph, WSDM ’21,. [25] B. Zong, Q. Song, M. R. Min, W. Cheng,
[9] S. Kumar, X. Zhang, J. Leskovec, Predicting dy- C. Lumezanu, D. Cho, H. Chen, Deep autoencoding
namic embedding trajectory in temporal interac- gaussian mixture model for unsupervised anomaly
tion networks, KDD’, 2019. detection, ICLR’ 2018.
[10] D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, K. Achan, [26] D. Park, Y. Hoshi, C. C. Kemp, A multimodal
Inductive representation learning on temporal anomaly detector for robot-assisted feeding
usgraphs, ICLR’ 2020. ing an lstm-based variational autoencoder, IEEE
[11] E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, Robotics and Automation Letters (2018).</p>
        <p>F. Monti, M. M. Bronstein, Temporal graph net- [27] L. Shen, Z. Li, J. T. Kwok, Timeseries anomaly
detecworks for deep learning on dynamic graphs, CoRR, tion using temporal hierarchical one-class network,
abs/2006.10637 (2020). NIPS’ 2020.
[12] P. Velickovic, G. Cucurull, A. Casanova, A. Romero,</p>
        <p>P. Liò, Y. Bengio, Graph attention networks, ICLR’
2018.
[13] A. Deng, B. Hooi, Graph neural network-based
anomaly detection in multivariate time series,</p>
        <p>AAAI’ 2021.
[14] W. Yu, W. Cheng, C. C. Aggarwal, H. Chen,</p>
        <p>W. Wang, Link prediction with spatial and temporal
consistency in dynamic networks, IJCAI’ 2017.
[15] W. Yu, W. Cheng, C. C. Aggarwal, K. Zhang,</p>
        <p>H. Chen, W. Wang, Netwalk: A flexible deep
embedding approach for anomaly detection in dynamic
networks, KDD’ 2018.
[16] A. Sankar, Y. Wu, L. Gou, W. Zhang, H. Yang, Dysat:</p>
        <p>Deep neural representation learning on dynamic
graphs via self-attention networks, WSDM’, 2020.
[17] G. H. Nguyen, J. B. Lee, R. A. Rossi, N. K. Ahmed,</p>
        <p>E. Koh, S. Kim, Dynamic network embeddings:
From random walks to temporal random walks,</p>
        <p>IEEE International Conference on Big Data, 2018.
[18] R. Trivedi, M. Farajtabar, P. Biswal, H. Zha, Dyrep:</p>
        <p>Learning representations over dynamic graphs,</p>
        <p>ICLR’ 2019.
[19] C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding,</p>
        <p>H. A. Dau, D. F. Silva, A. Mueen, E. J. Keogh, Matrix
profile I: all pairs similarity joins for time series:
A unifying view that includes motifs, discords and
shapelets, ICDM’ 2016.
[20] Y. Zhu, C. M. Yeh, Z. Zimmerman, K. Kamgar, E. J.</p>
        <p>Keogh, Matrix profile XI: SCRIMP++: time series
motif discovery at interactive speeds, ICDM’ 2018.
[21] D. J. Berndt, J. Cliford, Using dynamic time
warping to find patterns in time series, AAAI’
(Workshop) 1994.
[22] L. McInnes, J. Healy, S. Astels, hdbscan:
Hierarchical density based clustering, J. Open Source Softw,
2017 (????).
[23] A. Mueen, Y. Zhu, M. Yeh, K. Kamgar,</p>
        <p>K. Viswanathan, C. Gupta, E. Keogh, The
fastest similarity search algorithm for time series
subsequences under euclidean distance, 2017.
[24] A. Sifer, P.-A. Fouque, A. Termier, C. Largouet,</p>
        <p>Anomaly detection in streams with extreme value</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Malhotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Vig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          , G. Shrof,
          <article-title>Lstm-based encoderdecoder for multi-sensor anomaly detection</article-title>
          ,
          <source>CoRR abs/1607</source>
          .00148 (????).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Hundman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Constantinou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Laporte</surname>
          </string-name>
          , I. Colwell, T. Söderström,
          <article-title>Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding</article-title>
          , KDD'
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory, Neural Comput</article-title>
          .
          <article-title>(</article-title>
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <article-title>Robust anomaly detection for multivariate time series through stochastic recurrent neural network</article-title>
          , KDD'
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Duan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Multivariate time-series anomaly detection via graph attention network</article-title>
          , ICDM'
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chung</surname>
          </string-name>
          , Ç. Gülçehre,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Empirical evaluation of gated recurrent neural networks on sequence modeling</article-title>
          , CoRR (????).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuang</surname>
          </string-name>
          , G. Song,
          <article-title>Time2graph: Revisiting time series modeling with dynamic shapelets</article-title>
          , AAAI'
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          , Z. Cheng,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Ren</surname>
          </string-name>
          , Time-
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>