<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhanced Cyber-Physical Security through Deep Learning Techniques</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mayra Macas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wu Chunming</string-name>
          <email>wuchunmingg@zju.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Zhejiang University</institution>
          ,
          <addr-line>No.38, Zheda Rd, Zhejiang 310000</addr-line>
          ,
          <country country="CN">PR China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Now that various aspects of our lives depend on complex cyber-physical systems, automated anomaly detection, as well as attack prevention and reaction, have become of paramount importance: they directly affect our security and, ultimately, our quality of life. Recent catastrophic events have demonstrated that manual, human-based management of anomalies in complex systems is not efficient enough, underlining the importance of automatic detection and intelligent response as the recommended approach to defence. We propose an anomaly detection framework for complex systems that combines monitored data storage, Statistical Correlation Analysis of different pairs of constituent time series of a multivariate time series segment, and unsupervised deep learning to intelligently distinguish between normal and abnormal system behavior. Experimental results demonstrate that the proposed model outperforms the baseline methods and can effectively model the (inter)correlations and temporal patterns of multivariate time series.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly detection</kwd>
        <kwd>Critical Infrastructures</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Cyber-Physical Systems (CPS) comprise a new generation of sophisticated
systems whose normal operation depends on robust communications between their
physical and cyber components. Such systems have become vital for several
industrial sectors, including water treatment and distribution plants, electrical
power grids, public transportation systems, oil refineries, and many more. As the
deployment of the Internet of Things (IoT) grows exponentially, CPS
applications for a large variety of tasks are also on the rise,
resulting in many systems and devices communicating and working autonomously
over networks. At the same time, CPS and IoT also increase the likelihood of
cyber-security vulnerabilities and incidents, as pointed out in the annual
statements issued by the European Union Agency for Network and Information Security
(ENISA) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and the Industrial Control Systems Cyber Emergency Response
Team (ICS-CERT) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where exploits of the heterogeneous communication
systems in charge of managing and controlling complex environments are presented
and discussed. From the cybercriminals' perspective, the use of CPS constitutes
a unique opportunity to cause maximum damage with minimum effort [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In
recent years, many attempts to stealthily exploit CPS in important sectors have
occurred, such as the attack on the power grid in Ukraine [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the Maroochy
water breach [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the Stuxnet worm in an Iranian nuclear plant [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], the Triton
malware targeting a Saudi oil company [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and a growing number of attacks on energy
networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Considering that the services provided by such systems are important for
the well-being of the community, CPS can be classified as Critical
Infrastructures (CI) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Consequently, their flexibility and resilience against cyber-attacks
have been a primary concern. Indeed, attacks that corrupt or disrupt the
services they render would harm public safety and order and cause
financial losses and environmental damage. Therefore, the ability to
detect sophisticated cyber-attacks on CPS, whose heterogeneity is
amplified by the arrival of IoT, has become a crucial task. In
this paper, we focus on an unsupervised, machine-learning-based anomaly
detection approach that aims to detect anomalous behavior of the
system at the physical level.
      </p>
      <p>
        Popular solutions for anomaly detection such as Statistical Process
Control (SPC) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] methods like cumulative sum (CUSUM), Exponentially Weighted
Moving Average (EWMA), and Shewhart charts cannot cope with the
increasingly heterogeneous nature of CPS brought about by the arrival of IoT. As a
result, researchers have moved beyond specification- or signature-based techniques
and have begun to leverage both supervised and unsupervised machine
learning techniques to develop more intelligent and adaptive methods for big data,
in order to identify anomalies or intrusions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, even with the use of
machine learning techniques, the detection of anomalies in time series remains
a demanding task. First, supervised techniques require a sufficient
amount of labeled normal data and anomaly classes to learn from, whereas anomalies
are typically scarce in real environments. Second, most of the existing
unsupervised methods, such as distance/clustering methods [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and temporal prediction
methods [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], may still fail to recognize anomalies effectively, for the
following reasons: (i) Multivariate time series exhibit temporal dependencies.
Distance/clustering methods (e.g., k-Nearest Neighbor (kNN) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ])
and classification methods (e.g., One-Class SVM [
        <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
        ]) cannot capture
temporal dependencies across different time series. (ii) Multivariate time series data
often contain noise in real-world applications. When the noise becomes
moderately severe, it can affect the generalization ability of temporal prediction
models (e.g., LSTM-RNN [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], causing the false positive rate to
increase.
      </p>
      <p>
        Fig. 1 illustrates a high-level overview of the proposed anomaly detection
framework for complex systems, which aims to address the aforementioned
challenges. Based on monitored data storage and Statistical Correlation Analysis of
different pairs of constituent time series of a multivariate time series segment,
we employ unsupervised deep learning to intelligently distinguish between
"normal" and "abnormal" behavior of the system. More precisely, we propose an
unsupervised deep learning approach based on a Spatial-Temporal Encoder-Decoder
scheme for anomaly detection in a complex multi-process CPS, which builds upon
trained Convolutional Neural Network Autoencoder (CNN-AE) and
Convolutional LSTM Encoder-Decoder (ConvLSTM-ED) models. In greater detail, we
first construct correlation matrices to characterize the system status. Next, a
convolutional encoder is employed to encode the patterns of the correlation
matrices, whereas a ConvLSTM-ED model captures the underlying temporal
dependencies. Finally, the convolutional decoder reconstructs the correlation
matrices, and the reconstructions are used to detect anomalies. The central idea is that
the model is trained only on normal data and therefore learns to accurately
reconstruct the respective matrices. An anomalous instance is not
expected to be reconstructed equally well, which results in higher
reconstruction errors than those of normal instances. Therefore, these errors can
be used to separate normal from abnormal behavior. Our primary contributions
are the following: (i) We design an intelligent system, which is trained to detect
anomalies in complex multi-process cyber-physical systems and is built upon a
Convolutional Neural Network Autoencoder and a Convolutional LSTM
Encoder-Decoder. (ii) We conduct an extensive performance evaluation on the Secure Water
Treatment (SWaT) testbed [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Our preliminary results demonstrate the
superior performance of the proposed model compared to state-of-the-art baseline
methods.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related works</title>
      <p>
        Unsupervised learning techniques aim to identify the hidden structure of
unlabeled data. Owing to their simplicity and their ability to handle large datasets,
they have been extensively employed in recent studies
on CPS intrusion detection [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The SVM-based one-class (OCSVM)
classifier [
        <xref ref-type="bibr" rid="ref11 ref12">11,12</xref>
        ] and k-means clustering algorithms are used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Nevertheless,
distance/clustering methods and the OCSVM classifier ignore the temporal
dependencies that exist between anomalous data points [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and are prone
to false alarms. Goh et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] used a deep LSTM-RNN and Cumulative Sum
(CUSUM) to detect anomalies in the first stage of the SWaT dataset. However,
this approach is not suitable for time series affected by external factors that are not
captured by sensors, which makes them unpredictable [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Inoue et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] conducted a
study using OCSVM and an LSTM-based deep neural network. The research was carried out on all six stages of the
SWaT dataset. The authors used a complex structure that treats sensors and
actuators separately, in which the outputs of the LSTM layer are used to predict
the state of the actuators. The predictions are combined with the actual values
and fed into a fully connected hidden layer to predict the mean and
variance of the first sensor. This process is repeated for the remaining
sensors, and the sum of the log-likelihoods of the actuator positions and
sensor values then gives the outlier factor used for anomaly detection. The proposed
architecture is complex, difficult to interpret, and resource-demanding.
      </p>
      <p>
        Recently, Kravchik et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] used two deep neural network models,
a 1D convolutional neural network (CNN) and a recurrent neural network (LSTM), to detect
cyber-attacks on all six stages of the SWaT dataset. The authors report that
their ensemble model achieves 86.7%, 85.4%, and 86.0% for
precision, recall, and F1 score, respectively. However, attack detection
was performed at each stage separately; therefore, ways to learn inter-stage
dependencies (including temporal dependencies) were not examined.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Secure Water Treatment (SWaT) testbed dataset</title>
      <p>
        The Secure Water Treatment (SWaT) testbed [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] was designed to provide
researchers with data collected from a realistic, complex CPS environment. The
SWaT testbed is an operational, small-scale water treatment plant that supplies
purified water. The water purification process in the testbed is divided into six
stages, denoted P1 through P6. Each stage has a series of sensors and actuators.
The P1 stage is for raw water supply and storage, and P2 is for pre-treatment,
where the water condition is evaluated. Undesired substances are then eliminated
by ultrafiltration (UF) backwash in P3. Residual chlorine is eliminated
during the dechlorination process (P4). Subsequently, the water from P4 is pumped
into the Reverse Osmosis (RO) system (P5) to decrease inorganic impurities.
Finally, P6 stores the water that is suitable for distribution and consumption. The
sensors and actuators at each stage are connected to the corresponding PLC
(programmable logic controller), and the PLCs are connected to the SCADA
(Supervisory Control and Data Acquisition) workstation.
      </p>
      <p>The data from 51 sensors and actuators were recorded every second by the
Historian Server. The SWaT dataset contains seven days of data captured under
normal operating conditions and four days of data recorded while 36 attacks
were carried out. The attack model used in the experiment simulated a system
that was already compromised by attackers, who proceed to interfere with normal
system operation and spoof the system state to the PLCs, thus causing
incorrect commands, by modifying the network traffic in the level 1 network, raising
the sensors' values, and issuing fake SCADA commands. The dataset includes
attacks that target a single stage of the system, as well as attacks that target
multiple stages simultaneously. Furthermore, similar types of sensors (or
actuators) tend to react to attacks in similar ways. These observations
suggest adopting a multivariate approach during model formulation,
instead of treating each sensor or actuator in the CPS as an independent
data source (univariate approach). The underlying correlations between the
sensors and actuators can be exploited to accurately recognize irregularities in the
system.</p>
    </sec>
    <sec id="sec-4">
      <title>Proposed Framework</title>
      <p>In this section, we first introduce the problem we aim to study, and then we
describe the proposed model in detail.</p>
      <sec id="sec-4-1">
        <title>Problem statement</title>
        <p>Suppose we have the historical data of $n$ time series, i.e., $X = (\mathbf{x}^1, \mathbf{x}^2, \dots, \mathbf{x}^n)^\top = (\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_L) \in \mathbb{R}^{n \times L}$, where $L$ is the length of the time series. Under
the assumption that the historical data contain no anomalies, the model
aims to detect anomalous events at time steps after $L$.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Statistical Correlation Analysis</title>
        <p>
          Following the suggestions of many recent studies [
          <xref ref-type="bibr" rid="ref20 ref21">20,21</xref>
          ], we apply statistical
correlation analysis between different pairs of time series in a multivariate time
series segment to characterize the system. In particular, we construct an $n \times n$
correlation matrix $M = (m_{ij})$ based on Pearson's correlation coefficient. Given two time
series $\mathbf{x}^i = (x^i_1, x^i_2, \dots, x^i_L)^\top \in \mathbb{R}^L$ and $\mathbf{x}^j = (x^j_1, x^j_2, \dots, x^j_L)^\top \in \mathbb{R}^L$, the
Pearson's correlation coefficient can be calculated as follows:
        </p>
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>m_{ij} = \frac{\sum_{g=1}^{L} (x^i_g - \bar{x}^i)(x^j_g - \bar{x}^j)}{\sqrt{\sum_{g=1}^{L} (x^i_g - \bar{x}^i)^2 \sum_{g=1}^{L} (x^j_g - \bar{x}^j)^2}}</tex-math>
        </disp-formula>
        <p>where $\bar{x}^i$ and $\bar{x}^j$ represent the sample means of the two time series. To
investigate the effect of characterizing the system status at different scales (i.e., different
sequence lengths) on anomaly detection, different sequence lengths were
tested in the experimental phase. Since the SWaT data were recorded every
second, we built the correlation matrices with window lengths $\ell \in \{90, 120, 150\}$
(i.e., data collected within 1.5, 2, and 2.5 minutes) at each time step. In this study,
the interval between the starting times of two consecutive segments is $s = 10$ (i.e., 10 seconds).</p>
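        <p>The windowed correlation step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name correlation_matrices is hypothetical, the input is assumed to be an array of shape (n, T) with one row per sensor or actuator, and mapping zero-variance (constant) channels to a correlation of 0 is our own choice, since the text does not say how such channels are treated.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def correlation_matrices(X, window=120, step=10):
    """Pearson correlation matrix (Eq. 1) for each sliding segment.

    X: array of shape (n, T) holding n time series of T samples each.
    Segment k covers columns k*step ... k*step + window - 1, so the
    result has shape (num_segments, n, n).
    """
    T = X.shape[1]
    mats = []
    for start in range(0, T - window + 1, step):
        seg = X[:, start:start + window]
        centred = seg - seg.mean(axis=1, keepdims=True)  # x_ig minus its sample mean
        num = centred @ centred.T                        # numerator of Eq. (1)
        norms = np.sqrt(np.sum(centred ** 2, axis=1))    # per-series root sum of squares
        denom = np.outer(norms, norms)                   # denominator of Eq. (1)
        # Assumption (not from the paper): a constant series, e.g. an idle
        # actuator, has zero variance, so its correlations are set to 0
        # instead of NaN.
        m = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
        mats.append(m)
    return np.stack(mats)
```
        </preformat>
        <p>For T samples the sketch returns ⌊(T − ℓ)/s⌋ + 1 matrices of size n × n; with ℓ = 120 and s = 10, a new matrix starts every 10 seconds of SWaT data. NumPy's corrcoef computes the same coefficient for a single segment, but it yields NaN for constant rows, which is why the explicit zero-variance guard is used here.</p>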
      </sec>
      <sec id="sec-4-3">
        <title>Spatial-Temporal Encoder-Decoder scheme</title>
        <p>
          The Spatial-Temporal Encoder-Decoder (ST-ED) scheme is adapted from the
architecture proposed in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. The model combines a convolutional autoencoder
[
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], which learns the spatial structure of each correlation matrix, with a
ConvLSTM Encoder-Decoder [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] that captures the temporal dependencies from the
learned spatial feature maps of every time step. Fig. 2 (left) depicts the core of
our approach.
        </p>
        <p>
Convolutional LSTM Units: the regular LSTM applies vector multiplications
to the input elements; that is, it treats the input as a vector and flattens the
input feature map. The statistical correlation matrices, however, are composed
of both spatial and temporal components. Because the LSTM does not consider
spatial information, the results of such an application could be
suboptimal. To preserve the spatiotemporal information, ConvLSTM [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] replaces the fully connected
multiplicative operations of the input-to-state and state-to-state transitions
with convolutions, namely:
        </p>
        <disp-formula id="eq2">
          <label>(2)</label>
          <tex-math>i_t = \sigma\left(W_{xi} * \mathcal{X}_t + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \circ \mathcal{C}_{t-1} + b_i\right)</tex-math>
        </disp-formula>
        <disp-formula id="eq3">
          <label>(3)</label>
          <tex-math>f_t = \sigma\left(W_{xf} * \mathcal{X}_t + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \circ \mathcal{C}_{t-1} + b_f\right)</tex-math>
        </disp-formula>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>\mathcal{C}_t = f_t \circ \mathcal{C}_{t-1} + i_t \circ \tanh\left(W_{xc} * \mathcal{X}_t + W_{hc} * \mathcal{H}_{t-1} + b_c\right)</tex-math>
        </disp-formula>
        <disp-formula id="eq5">
          <label>(5)</label>
          <tex-math>o_t = \sigma\left(W_{xo} * \mathcal{X}_t + W_{ho} * \mathcal{H}_{t-1} + W_{co} \circ \mathcal{C}_t + b_o\right)</tex-math>
        </disp-formula>
        <disp-formula id="eq6">
          <label>(6)</label>
          <tex-math>\mathcal{H}_t = o_t \circ \tanh\left(\mathcal{C}_t\right)</tex-math>
        </disp-formula>
        <p>where $i_t$, $f_t$, and $o_t$ represent the input, forget, and output gates at time $t$,
respectively; $\mathcal{C}_t$ and $\mathcal{H}_t$ denote the cell outputs and the hidden states at time $t$; $\sigma(\cdot)$
and $\tanh(\cdot)$ are the sigmoid and hyperbolic tangent non-linearities; $\circ$ denotes the
Hadamard product; $*$ denotes the convolution operation; the $W$ terms are the filter
matrices connecting the different gates, and the $b$ terms are the corresponding biases. All
the inputs $\mathcal{X}_1, \dots, \mathcal{X}_t$, cell outputs $\mathcal{C}_1, \dots, \mathcal{C}_t$, hidden states $\mathcal{H}_1, \dots, \mathcal{H}_t$, and
gates $i_t$, $f_t$, $o_t$ are 3D tensors whose last two dimensions are spatial dimensions
(rows and columns). Furthermore, this convolutional version also includes optional
peephole connections (the $W_{ci}$, $W_{cf}$, and $W_{co}$ terms) that enable the units to better exploit past information.
The advantages of ConvLSTM over a regular LSTM mirror the advantages
of convolutional layers over linear layers: they are well suited to learning
filters, are valuable for spatially invariant inputs, and require less memory for
the parameters, since the size of each convolutional filter is independent of the size of the input.</p>
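        <p>To make Eqs. (2)-(6) concrete, the following NumPy sketch computes one forward step of a ConvLSTM cell with peephole connections for a single sample. It is illustrative only, not the authors' implementation: the helper names (conv2d_same, init_params, convlstm_step) and the random initialisation are hypothetical, batching and training are omitted, and the explicit loops are far slower than the ConvLSTM layers offered by deep-learning frameworks.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2-D convolution as used in deep-learning layers
    (computed as cross-correlation). x: (C_in, H, W);
    w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((c_out, height, width))
    for r in range(height):
        for c in range(width):
            patch = xp[:, r:r + k, c:c + k]                  # (C_in, k, k)
            out[:, r, c] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(c_in, c_hid, height, width, k=3, seed=0):
    """Small random weights: W_x*, W_h* are conv filters, W_c* are peephole
    weights applied element-wise, b_* are per-channel biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "ifco":                      # input, forget, cell, output
        p["W_x" + g] = rng.normal(0.0, 0.1, (c_hid, c_in, k, k))
        p["W_h" + g] = rng.normal(0.0, 0.1, (c_hid, c_hid, k, k))
        p["b_" + g] = np.zeros((c_hid, 1, 1))
    for g in "ifo":                       # peepholes exist only for the gates
        p["W_c" + g] = rng.normal(0.0, 0.1, (c_hid, height, width))
    return p


def convlstm_step(x_t, h_prev, c_prev, p):
    """One ConvLSTM update following Eqs. (2)-(6): '*' in the equations is
    conv2d_same, the Hadamard product is NumPy's element-wise '*'."""
    i_t = sigmoid(conv2d_same(x_t, p["W_xi"]) + conv2d_same(h_prev, p["W_hi"])
                  + p["W_ci"] * c_prev + p["b_i"])
    f_t = sigmoid(conv2d_same(x_t, p["W_xf"]) + conv2d_same(h_prev, p["W_hf"])
                  + p["W_cf"] * c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(conv2d_same(x_t, p["W_xc"])
                                       + conv2d_same(h_prev, p["W_hc"]) + p["b_c"])
    o_t = sigmoid(conv2d_same(x_t, p["W_xo"]) + conv2d_same(h_prev, p["W_ho"])
                  + p["W_co"] * c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```
        </preformat>
        <p>Each W_x and W_h filter holds C_out × C_in × k × k weights regardless of the spatial size H × W of the correlation matrix, which is the parameter saving referred to above. Feeding a sequence of correlation matrices means calling convlstm_step once per time step, carrying h_t and c_t forward.</p>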
        <p>
          As shown in Fig. 2 (right), our model consists of three ConvLSTM
layers, and the convolutional autoencoder is used to reconstruct the correlation
matrices. The loss function is the mean squared error (MSE) between
the prediction result and the ground truth over $N$ time steps:
        </p>
        <disp-formula>
          <tex-math>\mathcal{L} = \frac{1}{N} \sum_{t=1}^{N} \left\lVert X_t - \hat{X}_t \right\rVert^2</tex-math>
        </disp-formula>
        <p>
We use mini-batch stochastic gradient descent together with Adaptive Moment
Estimation (Adam) [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] to minimize this loss.
After training, the neural network is used to infer the reconstructed
correlation matrices of the validation and test data. Finally, anomaly detection is
performed based on the residual correlation matrices, as described in the
next section.
        </p>
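        <p>The loss above and a residual-based anomaly score can be written in a few lines of NumPy. This sketch assumes the correlation matrices are stacked into arrays of shape (N, n, n) and reads the squared term as a squared Frobenius norm; the per-time-step score (squared Frobenius norm of each residual matrix) is a hypothetical choice, since the exact scoring rule on the residual matrices is not spelled out here.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def reconstruction_residuals(x, x_hat):
    """Residual correlation matrices X_t - X_hat_t; x, x_hat: (N, n, n)."""
    return np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float)


def mse_loss(x, x_hat):
    """Training loss (1/N) * sum_t ||X_t - X_hat_t||^2 over N time steps,
    with the squared term read as a squared Frobenius norm (assumption)."""
    r = reconstruction_residuals(x, x_hat)
    return float(np.mean(np.sum(r ** 2, axis=(1, 2))))


def anomaly_scores(x, x_hat):
    """Hypothetical per-time-step score: squared Frobenius norm of each
    residual matrix. Larger values mean a worse reconstruction."""
    r = reconstruction_residuals(x, x_hat)
    return np.sum(r ** 2, axis=(1, 2))
```
        </preformat>
        <p>A per-element mean, as computed by most framework MSE losses, differs from this loss only by the constant factor n², so it has the same minimiser.</p>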
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experiments and Analysis</title>
      <p>In order to evaluate the performance of the proposed framework, we carried out
a comprehensive empirical study using the infrastructure and the datasets
of the SWaT testbed.</p>
      <sec id="sec-5-1">
        <title>Experimental Setup</title>
        <p>
          Dataset: The SWaT dataset contains data captured every second
for 51 variables corresponding to sensors and actuators. Within the raw data,
496,800 records were collected under normal conditions, and 449,919 records
were collected while various cyber-attacks were performed on the system. The first
16,000 records of the training dataset were trimmed because, according to [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the system took around 5 hours to stabilize after it was first turned on.
During our analysis, we divided the dataset captured under normal conditions
into three parts: $S_N$, which comprises 80% of the original normal dataset and
is used to train the model; $V_{N1}$, which comprises 10% and is used
for early stopping to avoid over-fitting; and $V_{N2}$, which comprises the
remaining 10% and is used, together with 10% of the dataset that contains anomalies
(denoted $V_{AB1}$), to determine the threshold. The remaining 90% of
the anomalous dataset, $S_{AB}$, is used for testing.
        </p>
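        <p>The split can be expressed as follows. The helper split_swat is hypothetical, and taking each part as a contiguous, chronological block is an assumption; the text gives only the proportions and the 16,000-record trim.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def split_swat(normal, attack, trim=16000):
    """Hypothetical helper reproducing the proportions described above.

    Assumes each split is a contiguous, chronological block (the text does
    not say whether the data are shuffled). Rows are time steps.
    Returns (S_N, V_N1, V_N2, V_AB1, S_AB).
    """
    normal = normal[trim:]                 # drop the start-up transient
    n = len(normal)
    a, b = n * 8 // 10, n * 9 // 10        # integer cut points avoid float rounding
    s_n, v_n1, v_n2 = normal[:a], normal[a:b], normal[b:]
    k = len(attack) // 10
    v_ab1, s_ab = attack[:k], attack[k:]
    return s_n, v_n1, v_n2, v_ab1, s_ab
```
        </preformat>
        <p>With the record counts given above (496,800 normal and 449,919 attack records), this sketch yields 384,640, 48,080, and 48,080 normal records for S_N, V_N1, and V_N2, and 44,991 and 404,928 attack records for V_AB1 and S_AB.</p>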
        <p>
          Baseline methods: We compare the proposed model with the following
baseline methods: One-Class SVM [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] learns a decision function and classifies the
test dataset as similar or dissimilar to the training dataset. LSTM-ED [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]
models the temporal dependencies of the training dataset and predicts the values
of the test dataset. In the LSTM-ED model, the average prediction error over
all time series is used as the anomaly score. For the proposed ST-ED model, the anomaly score of each time
point is equal to the reconstruction error of the respective correlation matrix.
If the anomaly score is larger than a given threshold, which is determined empirically
over different datasets, we consider that an attack is taking place;
otherwise, we assume normal behavior. The above baseline methods are state-of-the-art
anomaly detection algorithms that can be applied to raw time series data. The
proposed method is implemented in Python 3.5.6 using the TensorFlow
framework, version 1.11 [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The baseline methods, i.e., One-Class SVM and the LSTM
encoder-decoder, are likewise implemented in Python 3.5 using the scikit-learn
library [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] and TensorFlow 1.11, respectively. Experiments
are performed on a Linux server with 6 vCPUs and 15 GB of memory.
        </p>
        <p>Evaluation metrics: In order to evaluate the anomaly detection performance
of each method, we use the Precision, Recall, and F1 scores, defined as
        </p>
        <disp-formula>
          <tex-math>\mathrm{Prec} = \frac{tp}{tp + fp}, \quad \mathrm{Rec} = \frac{tp}{tp + fn}, \quad F_1 = \frac{2 \cdot \mathrm{Prec} \cdot \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}},</tex-math>
        </disp-formula>
        <p>where $tp$, $fp$, and $fn$ denote true
positives, false positives, and false negatives, respectively. To detect anomalies,
we determine a threshold $\tau = \alpha \cdot \max\{\mathrm{Val}(t)\}$, where $\mathrm{Val}(t)$ are the anomaly
scores over the union of the sets $V_{N2}$ and $V_{AB1}$, and $\alpha \in [1, 2]$ is a constant tuned
to maximize the F1 score over the validation period. Recall and Precision scores over
the testing period are computed based on this threshold.</p>
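        <p>The metrics and the threshold search can be sketched as follows. The function names and the 21-point grid over α ∈ [1, 2] are hypothetical; the set whose maximum score sets the base level is passed in as ref_scores (per the text, the scores over V_N2 ∪ V_AB1), and the scores and labels used to tune α must come from labelled validation data.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def precision_recall_f1(pred, label):
    """Point-wise Prec, Rec and F1 from boolean predictions and ground truth."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = int(np.sum(np.logical_and(pred, label)))
    fp = int(np.sum(np.logical_and(pred, np.logical_not(label))))
    fn = int(np.sum(np.logical_and(np.logical_not(pred), label)))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1


def select_threshold(ref_scores, val_scores, val_labels, alphas=None):
    """Grid-search alpha in [1, 2] so that tau = alpha * max(ref_scores)
    maximises F1 on a labelled validation set; returns (tau, alpha).
    A time step is flagged as anomalous when its score exceeds tau."""
    if alphas is None:
        alphas = np.linspace(1.0, 2.0, 21)   # hypothetical grid resolution
    peak = float(np.max(ref_scores))
    val_scores = np.asarray(val_scores, dtype=float)
    best_f1, best_tau, best_alpha = -1.0, None, None
    for alpha in alphas:
        tau = alpha * peak
        f1 = precision_recall_f1(val_scores > tau, val_labels)[2]
        if f1 > best_f1:                      # keep the first (smallest) best alpha
            best_f1, best_tau, best_alpha = f1, tau, float(alpha)
    return best_tau, best_alpha
```
        </preformat>
        <p>Once τ is fixed, the test-period Precision and Recall follow from precision_recall_f1(test_scores > tau, test_labels).</p>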
        <p>
          Other settings: In order to avoid over-fitting, we used early stopping while
training the model. Furthermore, Dropout [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] is employed with probability 0.4
in the recurrent layers. We fix the batch size at 128. The learning
rate and the number of epochs are set to 0.01 and 1000, respectively. The model uses the hyperbolic
tangent non-linearity (tanh) as its activation function. The proposed model uses an
input and output sequence length of four.
        </p>
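        <p>A minimal sketch of the training settings and early-stopping rule described above. The reported values are collected in a dictionary with illustrative key names; the patience parameter is hypothetical because the text does not report it, and treating the reported 1000 epochs as an upper bound is our reading of how it interacts with early stopping.</p>
        <preformat preformat-type="code">
```python
# Reported training settings; key names are illustrative, not the authors' code.
SETTINGS = {
    "batch_size": 128,
    "learning_rate": 0.01,
    "max_epochs": 1000,      # reported epoch count, treated as an upper bound
    "dropout": 0.4,          # recurrent layers only
    "activation": "tanh",
    "seq_len": 4,            # input and output sequence length
}


class EarlyStopping:
    """Stop once the loss on V_N1 has not improved for `patience`
    consecutive epochs. patience and min_delta are hypothetical values;
    the paper does not report them."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop."""
        if self.best - self.min_delta > val_loss:   # strict improvement
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def train(run_epoch, validate, settings=SETTINGS, patience=10):
    """Hypothetical loop: run_epoch() trains one epoch on S_N and
    validate() returns the loss on V_N1. Returns the epochs actually run."""
    stopper = EarlyStopping(patience=patience)
    for epoch in range(settings["max_epochs"]):
        run_epoch()
        if stopper.step(validate()):
            return epoch + 1
    return settings["max_epochs"]
```
        </preformat>
        <p>In TensorFlow, the tf.keras.callbacks.EarlyStopping callback with monitor="val_loss" and a patience value implements the same rule without a hand-written loop.</p>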
      </sec>
      <sec id="sec-5-2">
        <title>Experimental Results</title>
        <p>We first demonstrate that both the ST-ED scheme and the baseline method
LSTM-ED can learn the system features and predict them with high precision. We
note that the respective parameter settings and configurations are described
in Section 4.3 for the ST-ED architecture and Section 5.1 for LSTM-ED. The
classification baseline method OC-SVM is omitted here, since traditional models
behave differently from deep learning methods. As mentioned in Section
4.2, we used different scales, or window lengths ($\ell$), to characterize the system
status. In other words, ST-ED and its variants were trained, validated, and tested
on correlation matrix samples built with different window lengths. For
LSTM-ED, we used the raw time series data from the SWaT dataset. We found that,
given adequate computational capacity, the deep learning models were able to
achieve an RMSE ranging from 0.00323 (ST-ED) to 0.09873 (LSTM-ED),
as summarized in Table I. Fig. 3(a) illustrates how the test error of the
ST-ED architecture changes as the window length varies. In particular,
the lowest error is achieved when the window length is equal to 120 (i.e., two
minutes). Fig. 3(a) and Table I show that the ST-ED scheme achieves
the best convergence, generating the lowest error.</p>
        <p>In the following, we compare the training and test times, as well as the
model sizes, which are presented in Fig. 3(b), Fig. 3(c), and Fig. 3(d),
respectively. In particular, Figs. 3(b) and 3(c) show the average duration
per epoch during training and testing, as measured on the Linux server described in Section 5.1.
We found that the LSTM-ED model exhibits the
shortest training and testing times because it operates on raw data,
while also being the smallest model in terms of size. Another observation is that
the ST-ED scheme learns faster with $\ell = 120$ than with the other window lengths. The results are summarized
in Table I.</p>
        <p>
          Subsequently, we evaluate the models' performance on the six stages of the
SWaT dataset in terms of precision (Pre), recall (Rec), and F1 score.
Experiments on the dataset are repeated five times, and the average results are reported
in Table 2. We observe that the classification method
(OC-SVM) performs worse than the prediction model (LSTM-ED), indicating that
traditional methods cannot adequately handle the temporal dependencies
that exist in the dataset. The LSTM-ED and ST-ED ($\ell = 120$)
architectures yield the highest precision. However, the ST-ED model achieves the highest
recall and F1 score for all the employed window sizes. This verifies that
the proposed spatial-temporal encoder-decoder is effective at identifying
anomalies or outliers. Next, we demonstrate how the performance of the ST-ED scheme
varies with the employed window lengths $\ell \in \{90, 120, 150\}$.
In particular, ST-ED with $\ell \in \{90, 120\}$ has better precision than ST-ED with
$\ell = 150$, whereas ST-ED with $\ell = 120$ has better recall and F1 score than
with the other window lengths. Fig. 4 provides a visual representation of the ability
of the ST-ED and LSTM-ED methods to detect anomalies.
        </p>
        <p>
In this paper, we have presented an anomaly detection model for
complex CPS networks based on the combination of a convolutional autoencoder,
which learns the spatial structure of each correlation matrix, with a ConvLSTM
Encoder-Decoder that captures the temporal dependencies from the learned
spatial feature maps of every time step. We have demonstrated its improved
performance compared to two baseline models in terms of the Recall and F1 metrics.
Moreover, unlike the approach in [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], the proposed model is able to model both
inter-sensor correlations and temporal dependencies of multivariate time series.
        </p>
        <p>One limitation of the present work is that the experiments were
performed on a single dataset from one type of industrial process. Moreover, various adversarial
attacks can be carried out against the proposed model. One such attack can
alter the training process by influencing and corrupting the training data. On the
other hand, an exploratory attack can employ probing to discover information
about the training set; in this case, the adversary cannot modify or manipulate
the training data but can craft new instances based on the underlying data
distribution. Therefore, it is necessary to explore reactive and proactive defense
strategies in order to counter such adversarial attacks. Apart from
addressing the aforementioned issues, this research can be extended in several
directions: i) investigating the application of recent adversarial autoencoders, as
well as adversarial variational autoencoders, to anomaly detection; ii) introducing
an input attention mechanism to adaptively select the most significant input
features; iii) exploring other methods of performing the correlation analysis that
are robust to non-normality of the data; iv) extending the scope of the proposed
model to anomaly diagnosis, i.e., identifying the most likely cause of an anomaly; and
v) applying the proposed anomaly detection method to streaming data.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. ENISA, "<article-title>European Union Agency for Network and Information Security</article-title>", https://www.enisa.europa.eu/topics/threat-risk-management/threats-and-trends. Last accessed 7 Jan <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. ICS-CERT, "<article-title>The Industrial Control Systems Cyber Emergency Response Team</article-title>", https://ics-cert.us-cert.gov. Last accessed 7 Jan <year>2019</year>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. STELLIOS, Ioannis, et al. <article-title>A survey of IoT-enabled cyberattacks: Assessing attack paths to critical infrastructures and services</article-title>. <source>IEEE Communications Surveys &amp; Tutorials</source>, <year>2018</year>, vol. <volume>20</volume>, no. 4, p. <fpage>3453</fpage>-<lpage>3495</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. KHATOUN, Rida; ZEADALLY, Sherali. <article-title>Cybersecurity and privacy solutions in smart cities</article-title>. <source>IEEE Communications Magazine</source>, <year>2017</year>, vol. <volume>55</volume>, no. 3, p. <fpage>51</fpage>-<lpage>59</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. ABRAMS, Marshall; WEISS, Joe. <source>Malicious control system cyber security attack case study - Maroochy Water Services, Australia</source>. McLean, VA: The MITRE Corporation, <year>2008</year>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. FALLIERE, Nicolas; MURCHU, Liam O.; CHIEN, Eric.
          <article-title>W32.Stuxnet dossier</article-title>
          .
          <source>White paper</source>
          , Symantec Corp., Security Response,
          <year>2011</year>
          , vol.
          <volume>5</volume>
          , no. 6, p.
          <fpage>29</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. JOHNSON, Blake; CABAN, Dan; KROTOFIL, Marina; SCALI, Dan; BRUBAKER, Nathan; GLYER, Christopher.
          <article-title>Attackers Deploy New ICS Attack Framework "TRITON" and Cause Operational Disruption to Critical Infrastructure</article-title>
          ,
          <year>2017</year>
          . https://www.fireeye.com/blog/threat-research/2017/12/attackers-deploy-new-ics-attack-framework-triton.html. Last accessed 7 Jan 2019.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. MURGUIA, Carlos; RUTHS, Justin.
          <article-title>Characterization of a CUSUM model-based sensor attack detector</article-title>
          .
          <source>In 2016 IEEE 55th Conference on Decision and Control (CDC)</source>
          . IEEE,
          <year>2016</year>
          . p.
          <fpage>1303</fpage>
          -
          <lpage>1309</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. KRAVCHIK, Moshe; SHABTAI, Asaf.
          <article-title>Detecting cyber attacks in industrial control systems using convolutional neural networks</article-title>
          .
          <source>In Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy</source>
          . ACM,
          <year>2018</year>
          . p.
          <fpage>72</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>MAGLARAS</surname>
            ,
            <given-names>Leandros</given-names>
          </string-name>
          , et al.
          <article-title>Novel Intrusion Detection Mechanism with Low Overhead for SCADA Systems</article-title>
          .
          <source>In Security Solutions and Applied Cryptography in Smart Grid Communications</source>
          . IGI Global,
          <year>2017</year>
          . p.
          <fpage>160</fpage>
          -
          <lpage>178</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>INOUE</surname>
            ,
            <given-names>Jun</given-names>
          </string-name>
          , et al.
          <article-title>Anomaly detection for a water treatment system using unsupervised machine learning</article-title>
          .
          <source>In 2017 IEEE International Conference on Data Mining Workshops (ICDMW)</source>
          . IEEE,
          <year>2017</year>
          . p.
          <fpage>1058</fpage>
          -
          <lpage>1065</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>LIN</surname>
            ,
            <given-names>Qin</given-names>
          </string-name>
          , et al.
          <article-title>TABOR: a graphical model-based approach for anomaly detection in industrial control systems</article-title>
          .
          <source>In Proceedings of the 2018 on Asia Conference on Computer and Communications Security</source>
          . ACM,
          <year>2018</year>
          . p.
          <fpage>525</fpage>
          -
          <lpage>536</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>GOH</surname>
            ,
            <given-names>Jonathan</given-names>
          </string-name>
          , et al.
          <article-title>Anomaly detection in cyber physical systems using recurrent neural networks</article-title>
          .
          <source>In 2017 IEEE 18th International Symposium on High Assurance Systems Engineering (HASE)</source>
          . IEEE,
          <year>2017</year>
          . p.
          <fpage>140</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>GOH</surname>
            ,
            <given-names>Jonathan</given-names>
          </string-name>
          , et al.
          <article-title>A dataset to support research in the design of secure water treatment systems</article-title>
          .
          <source>In International Conference on Critical Information Infrastructures Security</source>
          . Springer, Cham,
          <year>2016</year>
          . p.
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. CHANDOLA, Varun; BANERJEE, Arindam; KUMAR, Vipin.
          <article-title>Anomaly detection: A survey</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <year>2009</year>
          , vol.
          <volume>41</volume>
          , no. 3, p.
          <fpage>15</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>ANTON</surname>
            ,
            <given-names>Simon Duque</given-names>
          </string-name>
          , et al.
          <article-title>Two decades of SCADA exploitation: A brief history</article-title>
          .
          <source>In 2017 IEEE Conference on Application, Information and Network Security (AINS)</source>
          . IEEE,
          <year>2017</year>
          . p.
          <fpage>98</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>MALHOTRA</surname>
            ,
            <given-names>Pankaj</given-names>
          </string-name>
          , et al.
          <article-title>LSTM-based encoder-decoder for multi-sensor anomaly detection</article-title>
          .
          <source>arXiv preprint arXiv:1607.00148</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. KRAVCHIK, Moshe; SHABTAI, Asaf.
          <article-title>Detecting cyber attacks in industrial control systems using convolutional neural networks</article-title>
          .
          <source>In Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy</source>
          . ACM,
          <year>2018</year>
          . p.
          <fpage>72</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>WANG</surname>
            ,
            <given-names>Lin</given-names>
          </string-name>
          , et al.
          <article-title>Abnormal Event Detection in Videos Using Hybrid Spatio-Temporal Autoencoder</article-title>
          .
          <source>In 2018 25th IEEE International Conference on Image Processing (ICIP)</source>
          . IEEE,
          <year>2018</year>
          . p.
          <fpage>2276</fpage>
          -
          <lpage>2280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>SONG</surname>
            ,
            <given-names>Dongjin</given-names>
          </string-name>
          , et al.
          <article-title>Deep r-th root of rank supervised joint binary embedding for multivariate time series retrieval</article-title>
          .
          <source>In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          . ACM,
          <year>2018</year>
          . p.
          <fpage>2229</fpage>
          -
          <lpage>2238</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>HALLAC</surname>
            ,
            <given-names>David</given-names>
          </string-name>
          , et al.
          <article-title>Toeplitz inverse covariance-based clustering of multivariate time series data</article-title>
          .
          <source>In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          . ACM,
          <year>2017</year>
          . p.
          <fpage>215</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22. LONG, Jonathan; SHELHAMER, Evan; DARRELL, Trevor.
          <article-title>Fully convolutional networks for semantic segmentation</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <year>2015</year>
          . p.
          <fpage>3431</fpage>
          -
          <lpage>3440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>SHI</surname>
            ,
            <given-names>Xingjian</given-names>
          </string-name>
          , et al.
          <article-title>Convolutional LSTM network: A machine learning approach for precipitation nowcasting</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          .
          <year>2015</year>
          . p.
          <fpage>802</fpage>
          -
          <lpage>810</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>ABADI</surname>
            ,
            <given-names>Martín</given-names>
          </string-name>
          , et al.
          <article-title>TensorFlow: A system for large-scale machine learning</article-title>
          .
          <source>In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)</source>
          .
          <year>2016</year>
          . p.
          <fpage>265</fpage>
          -
          <lpage>283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>PEDREGOSA</surname>
            ,
            <given-names>Fabian</given-names>
          </string-name>
          , et al.
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <year>2011</year>
          , vol.
          <volume>12</volume>
          , Oct, p.
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>SRIVASTAVA</surname>
            ,
            <given-names>Nitish</given-names>
          </string-name>
          , et al.
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          ,
          <year>2014</year>
          , vol.
          <volume>15</volume>
          , no. 1, p.
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27. KINGMA, Diederik P.; BA, Jimmy.
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>