<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Approach for Unsupervised Failure Detection in Smart Industry (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Salvatore Iiritano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Angelica Liguori</string-name>
          <email>angelica.liguori@dimes.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Manco</string-name>
          <email>giuseppe.manco@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ettore Ritacco</string-name>
          <email>ettore.ritacco@icar.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Massimiliano Ruffolo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Revelis S.r.l.</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Engineering, Modeling, Electronics and Systems, University of Calabria</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute for High Performance Computing and Networking of the Italian National Research Council</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>5</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>We propose an unsupervised anomaly detection model that is able to identify abnormal behavior by analysing streaming data coming from IoT sensors installed on critical devices. The proposed model is based on a Siamese neural network which embeds time series windows in a latent space, thus generating distance-based clusters of normal behavior. We evaluate the proposed model on a case study aimed at the predictive maintenance of elevators, where specific sensors measure the oscillations of the lift during its daily use. The experiments show that the proposed model successfully isolates anomalous oscillations, which can then be correlated with prospective malfunctions in order to prevent possible faults.</p>
      </abstract>
      <kwd-group>
        <kwd>Anomaly detection</kwd>
        <kwd>Failure detection</kwd>
        <kwd>Fault detection</kwd>
        <kwd>Time-series analysis</kwd>
        <kwd>Embeddings</kwd>
        <kwd>Siamese networks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Maintenance is one of the most important activities supporting industrial production
systems. It is both an opportunity and a bottleneck. On the one
hand, it makes it possible to avoid machinery breakdowns, service interruptions
and possibly monetary penalties. On the other hand, it risks slowing down industrial processes,
since it requires activities that are not strictly related to the core business and ties up
production machinery and numerous resources in terms of technicians, tools, time and
costs.</p>
      <p>
        Fault detection and prevention [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is one of the most critical parts of predictive maintenance,
as it aims at identifying anomalous behavior of the system and avoiding sudden interruptions
and catastrophic machine failures. Typically, the focus is on developing solutions that produce
warnings when an anomalous behavior is detected. Fault detection methods based on machine
learning analyze historical data in order to devise models of failure or malfunction,
usually by means of supervised learning. In this respect, the predictive maintenance process has largely
benefited from the large-scale adoption of sensor devices. With the advent of IoT technology,
numerous sensors can be installed on production devices, producing significant amounts
of data whose volume depends on their sampling frequency.
      </p>
      <p>Despite their intrinsic predictive value, these enormous streams of data represent a challenge
in several respects. First of all, sensor streams are very noisy temporal sequences, with excessive
dimensionality and affected by specific issues (e.g. burst effects, seasonality). It is necessary to
devise automatic methods capable of filtering out noise and producing effective representations
that can be fruitfully exploited in predictive models. Moreover, streaming
data often require the capability to meet real-time requirements: maintenance solutions must
be able to provide timely warnings to maintenance experts.</p>
      <p>However, in the vast majority of industrial situations there are no samples that represent the
presence of failure; hence, the adoption of unsupervised methods seems better suited. The
choice between supervised and unsupervised approaches does not depend only on the presence
or absence of a fault indicator. There are numerous scenarios in which, despite the presence of
such a flag, a classifier cannot produce reliable predictions. Failures are rare events, and
the available examples are often not sufficient to define the prediction patterns that any classification
technique needs.</p>
      <p>In this paper, we study a specific scenario where the above-mentioned issues take place. The
paper focuses on a study where the adoption of sensing technology, combined with machine
learning, can effectively characterize the behavior of lifting systems and thus devise effective
predictive maintenance strategies aimed at ensuring the stability and efficiency of the elevator,
as well as preventing future breakdowns. The intuition is that, in their daily routine, elevators
produce oscillations which can be registered and analyzed. A methodology for devising profiles
of normal oscillations can hence help us detect any deviation from typical signatures and, in turn,
prospective efficiency issues.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        In the maintenance field, traditional unsupervised approaches use one-class classifiers, e.g.
One-class Support Vector Machines [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], or distance metrics, e.g. Isolation Forest [
        <xref ref-type="bibr" rid="ref4">4, 5</xref>
        ].
Autoencoders [6] are the core of unsupervised anomaly detection methods based on deep
learning, and most of the recent literature on unsupervised failure detection builds on them. Examples
are [7, 8, 9], in which the reconstruction error is used as the anomaly score, and variants
such as [10, 11], which propose unsupervised fault detection based on a stacked autoencoder
and on a sparse autoencoder, respectively. Autoencoders can also be adapted to cope with
streaming data. Lindemann et al. [12] propose an anomaly detection system that combines an
autoencoder with Long Short-Term Memory (LSTM) [13]. By contrast, Jiang et al. [14] propose
unsupervised fault detection based on a denoising autoencoder (DAE). Xiang et al. [15] propose a
framework based on a Variational Autoencoder (VAE) [16] in which the Gated Recurrent Unit
(GRU) [17] is introduced into the VAE to replace the traditional neural unit
in the encoder and decoder. Alfeo et al. [18] combine an autoencoder with a heuristic-based
discriminator in order to improve the interpretability of the detection. Jian and Zhiyan [19]
propose an unsupervised fault detection method based on adversarial autoencoders in which
a discriminator is used to refine the reconstruction errors. In [20], one-class fault
detection based on unsupervised training of Generative Adversarial Networks [21] is proposed,
where the problem of distinguishing real from fake data is converted into that of distinguishing normal from
anomalous data, by forcing the generator to produce normal-like data.
      </p>
      <p>Most of the unsupervised failure detection systems in the literature exploit the reconstruction
error to discriminate between normal data and anomalies. Usually, when the reconstruction error
is used as a measure of outlierness, a threshold is defined such that any data whose reconstruction
error is above the threshold are marked as outliers. Defining such a threshold is very hard, especially
when no background knowledge is available.</p>
      <p>Unlike these works, our proposal exploits the philosophy of Siamese networks [22] to
map the data into a latent space so that data belonging to the same category lie in
the same region, away from data belonging to different categories. This idea mitigates the
overfitting problem of the cited approaches: sequences are analyzed according to a collective
approach instead of comparing each rebuilt sequence only with its original version.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Setting</title>
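      <p>Let <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula> be a set of devices characterized by functionality, structure, purpose and/or working
environment. Each device <inline-formula><tex-math>d \in \mathcal{D}</tex-math></inline-formula> is equipped with a set of sensors <inline-formula><tex-math>\mathcal{M}_d</tex-math></inline-formula> emitting temporal sequences
of events <inline-formula><tex-math>S_d = \{e_d^{(1)}, e_d^{(2)}, e_d^{(3)}, \dots\}</tex-math></inline-formula>, where the superscript indicates the time step. Each event <inline-formula><tex-math>e_d^{(t)}</tex-math></inline-formula> is
a real vector of size <inline-formula><tex-math>|\mathcal{M}_d|</tex-math></inline-formula> containing the values observed by the sensors. We further assume that
<inline-formula><tex-math>S_d</tex-math></inline-formula> is labeled by a specific category which characterizes the process being monitored. Categories
can also refer to the underlying device (i.e., each device can be seen as a distinct category);
however, they can also describe the situations that the sensors are measuring, such as an elevator
moving up with a load of two persons.</p>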
      <p>The objective of our research is to detect anomalous situations within a sequence <inline-formula><tex-math>S_d</tex-math></inline-formula> associated
with a device. The basic approach consists in building a machine learning model which is capable
of characterizing the profile of each sequence and of marking anomalies whenever a given sequence
does not fit that profile. However, in order to apply such a methodology, we need to face
two main problems. First of all, there are several different profiles that can characterize each
sequence. The categories encode different situations, and we can expect that events marked
with a specific category differ from events labeled with another one. We can hence
assume that the expected number of categories is high. Furthermore, we assume that <inline-formula><tex-math>|\mathcal{M}_d|</tex-math></inline-formula> is
large but the overall number of sequences is small compared to the number of categories. That
is, we can expect that almost every sequence is labeled with a different category. This clearly
poses a problem in the learning stage, since it is not possible to build specific profiles due to the
lack of sufficient training data for each category.</p>
      <p>The solution we propose is a methodology divided into two parts. The first part consists in a
data transformation approach that allows the definition of a classification problem. The second
part consists in the exploitation of a modular Siamese network able to map the input (sequence
fragments) into data points lying on a latent space, by solving the aforementioned classification
problem.</p>
      <p>The latent space has a geometric interpretation: points that are close in the space correspond
to devices that exhibit, in some time interval of their working process, similar behavior. The
core concept is that clusters of these latent data points represent the different working modes
of the target devices. Any element that is on the edge of a cluster or outside all of them can be
highlighted to maintenance experts for further investigation. This makes it possible to handle situations,
typical of many industrial processes, where critical anomalies (e.g. failures) are extremely rare.</p>
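      <p>One possible way to operationalize this idea is sketched here, under stated assumptions: each cluster of normal embeddings is summarized by its centroid and by the largest distance of its members from that centroid, and an embedding is flagged when it falls outside every such ball. The centroid-and-radius rule, the <monospace>slack</monospace> parameter and the names <monospace>fit_clusters</monospace> and <monospace>is_anomalous</monospace> are illustrative choices and are not prescribed by the methodology.</p>
      <preformat>
```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def distance(u: Point, v: Point) -> float:
    """Euclidean distance between two latent points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def fit_clusters(clusters: Dict[str, List[Point]]) -> Dict[str, Tuple[List[float], float]]:
    """Summarize each cluster of normal embeddings as (centroid, radius).

    The radius is the largest member-to-centroid distance (an assumption
    of this sketch; other cluster summaries are equally possible).
    """
    model = {}
    for name, pts in clusters.items():
        c = centroid(pts)
        model[name] = (c, max(distance(p, c) for p in pts))
    return model


def is_anomalous(z: Point, model: Dict[str, Tuple[List[float], float]], slack: float = 1.0) -> bool:
    """True when z lies outside every cluster ball (radius scaled by slack)."""
    return all(distance(z, c) > slack * r for c, r in model.values())
```
      </preformat>
      <p>Choosing a <monospace>slack</monospace> value below 1 would also flag points on the edge of a cluster, at the price of more warnings for the maintenance expert to review.</p>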
    </sec>
    <sec id="sec-4">
      <title>4. Methodology</title>
      <p>Since sequences may have different sizes, our methodology applies a sliding window extraction
procedure to generate, for each sequence, a set of fixed-size observation windows. Notice that
if the window shift is lower than the window size, contiguous windows partially overlap. Each
window is a time frame that partially describes the behavior of the target devices during the
interval of observation. For the rest of the paper we will name <inline-formula><tex-math>\mathcal{W} = \{w_1, w_2, \dots\}</tex-math></inline-formula> the set of all
the time windows, and we will use the function <inline-formula><tex-math>c(w_i)</tex-math></inline-formula> to denote the category of the
device that generated the <inline-formula><tex-math>i</tex-math></inline-formula>-th time window.</p>
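      <p>The window extraction step can be illustrated with a minimal sketch. The function name <monospace>sliding_windows</monospace> and its parameters are hypothetical, and dropping the trailing events that do not fill a whole window is an assumption made here for simplicity.</p>
      <preformat>
```python
from typing import List, Sequence

Event = Sequence[float]  # one value per sensor at a single time step


def sliding_windows(sequence: Sequence[Event], size: int, shift: int) -> List[List[Event]]:
    """Split one device sequence into fixed-size observation windows.

    Consecutive windows start `shift` steps apart, so they overlap
    whenever shift is smaller than size. Trailing events that do not
    fill a complete window are dropped (assumption of this sketch).
    """
    if not (size >= 1 and shift >= 1):
        raise ValueError("size and shift must be positive")
    windows = []
    for start in range(0, len(sequence) - size + 1, shift):
        windows.append(list(sequence[start:start + size]))
    return windows
```
      </preformat>
      <p>For a sequence of 10 events, a window size of 4 and a shift of 2, this sketch yields four windows starting at time steps 0, 2, 4 and 6, each sharing two events with its predecessor.</p>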
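      <p>The set <inline-formula><tex-math>\mathcal{W}</tex-math></inline-formula> and the function <inline-formula><tex-math>c</tex-math></inline-formula> enable the definition of the following classification problem:
given two arbitrary time windows <inline-formula><tex-math>w_i</tex-math></inline-formula> and <inline-formula><tex-math>w_j</tex-math></inline-formula>, the goal is to predict whether they belong to the same
category, i.e. whether <inline-formula><tex-math>c(w_i) = c(w_j)</tex-math></inline-formula>.</p>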
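      <p>As a rough illustration of how such a pairwise classification dataset could be assembled, the sketch below draws random pairs of window indices and labels each pair with 1 when the two categories coincide and 0 otherwise. The name <monospace>sample_pairs</monospace>, the uniform sampling and the fixed seed are assumptions; uniform sampling does not by itself guarantee a balanced number of positive and negative pairs.</p>
      <preformat>
```python
import random
from typing import Hashable, List, Sequence, Tuple


def sample_pairs(categories: Sequence[Hashable], n_pairs: int, seed: int = 0) -> List[Tuple[int, int, int]]:
    """Draw random window index pairs (i, j) with a same-category label.

    categories[i] plays the role of c(w_i); the label is 1 when the two
    windows share a category and 0 otherwise.
    """
    n = len(categories)
    if n in (0, 1):
        raise ValueError("need at least two windows")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(n), 2)  # two distinct windows
        label = 1 if categories[i] == categories[j] else 0
        pairs.append((i, j, label))
    return pairs
```
      </preformat>
      <p>When categories are many and sequences are few, positive pairs become rare under uniform sampling, so a stratified draw of positive and negative pairs may be preferable in practice.</p>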
      <p>The solution we propose to address this problem is a Siamese neural network, shown in
Figure 1, composed of two modules. The first one is the Embedding subnet, which maps a time
window into the high-dimensional latent space. The Embedding subnet is a sequential model
composed of a recurrent neural network (we used an LSTM layer [13]), which captures the time
dependencies within each window, and a feed-forward neural network (a dense embedding
layer), which generates the data points in the latent space. The second module is the Distance
subnet, which outputs the Euclidean distance between two embeddings.</p>
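      <p>The workflow of the whole architecture is as follows. The input of the network
is a pair of time windows, <inline-formula><tex-math>w_i</tex-math></inline-formula> and <inline-formula><tex-math>w_j</tex-math></inline-formula>, randomly sampled from <inline-formula><tex-math>\mathcal{W}</tex-math></inline-formula>. The set of all the
sampled pairs, called <inline-formula><tex-math>\mathcal{P}</tex-math></inline-formula>, should suitably represent each category, providing a sufficient number of
positive (windows belonging to the same category) and negative (windows belonging to different
categories) comparisons. Both <inline-formula><tex-math>w_i</tex-math></inline-formula> and <inline-formula><tex-math>w_j</tex-math></inline-formula> pass through the Embedding subnet, the Siamese part
of the network, which produces two embeddings, <inline-formula><tex-math>z_i</tex-math></inline-formula> and <inline-formula><tex-math>z_j</tex-math></inline-formula> respectively. These feed the
Distance subnet, which computes and returns their Euclidean distance. The loss function we
chose is the following:
<disp-formula><label>(1)</label><tex-math>\mathcal{L} = \frac{1}{|\mathcal{P}|} \sum_{(w_i, w_j) \in \mathcal{P}} \Big[ y_{i,j} \cdot \delta(z_i, z_j) - (1 - y_{i,j}) \cdot \log\big(1 - e^{-\delta(z_i, z_j)}\big) \Big]</tex-math></disp-formula>
where <inline-formula><tex-math>y_{i,j}</tex-math></inline-formula> is equal to 1 if <inline-formula><tex-math>c(w_i) = c(w_j)</tex-math></inline-formula> and 0 otherwise, and <inline-formula><tex-math>\delta(z_i, z_j)</tex-math></inline-formula> is a function that computes
the Euclidean distance between the embeddings <inline-formula><tex-math>z_i</tex-math></inline-formula> and <inline-formula><tex-math>z_j</tex-math></inline-formula> of the time windows <inline-formula><tex-math>w_i</tex-math></inline-formula> and <inline-formula><tex-math>w_j</tex-math></inline-formula>,
respectively.</p>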
      <p>The proposed loss function encourages the network to generate pairs of embeddings that are
close in the latent space if <inline-formula><tex-math>y_{i,j} = 1</tex-math></inline-formula>; on the other hand, if the two categories are different, the
network will produce pairs of embeddings that are distant.</p>
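      <p>To make the behavior of the loss concrete, the following sketch evaluates it on already computed embeddings in plain Python. It is only an illustration: the names <monospace>euclidean</monospace> and <monospace>siamese_loss</monospace> are hypothetical, and the small constant <monospace>eps</monospace>, which guards against taking the logarithm of zero when two embeddings of different categories coincide, is an assumption that does not appear in the formula.</p>
      <preformat>
```python
import math
from typing import List, Sequence, Tuple

Embedding = Sequence[float]


def euclidean(u: Embedding, v: Embedding) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def siamese_loss(pairs: List[Tuple[Embedding, Embedding, int]], eps: float = 1e-12) -> float:
    """Average pairwise loss over (z_i, z_j, y) triples.

    y = 1 (same category): the term is the distance itself, so the
    embeddings are pulled together.
    y = 0 (different categories): the term is -log(1 - exp(-distance)),
    which is large for nearby embeddings and tends to 0 as they move apart.
    """
    total = 0.0
    for z_i, z_j, y in pairs:
        d = euclidean(z_i, z_j)
        if y == 1:
            total += d
        else:
            # eps keeps the logarithm finite when d == 0 (sketch assumption)
            total -= math.log(max(1.0 - math.exp(-d), eps))
    return total / len(pairs)
```
      </preformat>
      <p>Under this sketch, a negative pair at distance 0.1 contributes roughly 2.35 to the sum, while a negative pair at distance 10 contributes almost nothing, which is the separating effect described above.</p>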
    </sec>
    <sec id="sec-5">
      <title>5. Predicting Elevator Anomalous Oscillations</title>
      <p>We applied our methodology to a real case study whose objective is to monitor the health status
and the working process of an elevator in an office building. The sensor system placed in the
elevator (see Figure 2 for details) records, along the <inline-formula><tex-math>x</tex-math></inline-formula>, <inline-formula><tex-math>y</tex-math></inline-formula> and <inline-formula><tex-math>z</tex-math></inline-formula> axes, the oscillations
of the elevator guides and cabin, the inclination of the cabin and the magnetic field intensity in a
relational data structure. Each record is a time series collecting sensor emissions in an interval of
time. Moreover, records provide further information about the operation status of the elevator,
highlighting, for each time step, its position, movement and door status.</p>
      <p>In this case study, we applied our methodology to a limited set of similar elevators in an office
building. Their sequences were split into fixed-size time windows that were randomly
paired up to be processed by the Siamese network. The category labels we used were the
operational modalities: (i) Stationary; (ii) Moving up; (iii) Moving down; (iv) Opening doors; (v)
Closing doors. Each of these modalities is further labeled by contextual conditions representing
the load (number of people) in the lift.</p>
      <p>Figure 3: (a) Normal embeddings; (b) Normal and anomalous embeddings.</p>
      <p>As a result of the training phase, for each elevator the network learned the
different models of normal behavior, which map into clusters of the latent space. To allow a friendly
visualization of the embeddings, we used t-distributed stochastic neighbor embedding
(t-SNE) [23]. A 2D t-SNE plot of the learnt behavior clusters is shown in Figure 3a, where
each point is the embedding of a time window, while colors indicate the categories
the windows belong to. As can be seen, the network identified 7 different behavioral clusters,
each dominated by one color. The partial color overlap is due to two factors.
On the one hand, category labels were noisy, since they were produced by humans with external
chronometers, making the initial and final time windows of each category session
imprecise. On the other hand, the sensors we used were not able to detect appreciable differences
in vibration between doors opening and closing.</p>
      <p>In order to observe the capability of the model to isolate anomalies, we performed new
experiments in which passengers stopped and restarted the elevator several times or
produced (weak) unexpected vibrations. As shown in Figure 3b, the 2D t-SNE transformation
of the embeddings produced by the network when fed with these anomalous sequences generated
points, labeled as Noise, that lie outside the clusters.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>We proposed a new methodology for the early detection of faults in critical
devices equipped with sensors that generate time sequences of observations. In particular, the
methodology is designed to work effectively in settings where explicit information about
previous failures is missing, a condition that prevents the use of supervised detection
approaches. Assuming that failures are rare events during the lifetime of a device, the proposed
methodology supports a maintenance expert in easily identifying them as anomalous elements
that are distant from all the clusters of normal behavior.</p>
      <p>Experiments on a real case study showed the capability of the proposal to effectively isolate
anomalous time frames, suggesting that it can be applied to many different and
more complex scenarios.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been partially supported by the Calabria Region (Italy) under the project “RAISE
- Revelis Artificial Intelligence Smart Environment” - POR CALABRIA FESR-FSE 2014-2020,
ASSE I – PROMOZIONE DELLA RICERCA E DELL’INNOVAZIONE Obiettivo Specifico 1.4
“Aumento dell’incidenza di specializzazioni innovative in perimetri applicativi ad alta intensità
di conoscenza” Azione 1.4.1 “Sostegno alla creazione e al consolidamento di startup innovative
ad alta intensità di applicazione di conoscenza e alle iniziative di spin-off della ricerca”.</p>
      <p>[5] F. T. Liu, K. M. Ting, Z.-H. Zhou, Isolation-based anomaly detection, ACM Trans. Knowl. Discov. Data 6 (2012).</p>
      <p>[6] D. Bank, N. Koenigstein, R. Giryes, Autoencoders, 2020. arXiv:2003.05991.</p>
      <p>[7] D. F. Oliveira, L. F. Vismari, J. R. de Almeida, P. S. Cugnasca, J. B. Camargo, E. Marreto, D. R. Doimo, L. P. F. de Almeida, R. Gripp, M. M. Neves, Evaluating unsupervised anomaly detection models to detect faults in heavy haul railway operations, in: 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), 2019, pp. 1016–1022.</p>
      <p>[8] D. F. N. Oliveira, L. F. Vismari, A. M. Nascimento, J. R. de Almeida Jr., P. S. Cugnasca, J. B. Camargo Jr., L. Almeida, R. Gripp, M. Neves, A new interpretable unsupervised anomaly detection method based on residual explanation, 2021. arXiv:2103.07953.</p>
      <p>[9] E. Principi, D. Rossetti, S. Squartini, F. Piazza, Unsupervised electric motor fault detection by using deep autoencoders, IEEE/CAA Journal of Automatica Sinica 6 (2019) 441–451.</p>
      <p>[10] K. H. Park, E. Park, H. K. Kim, Unsupervised fault detection on unmanned aerial vehicles: Encoding and thresholding approach, Sensors 21 (2021).</p>
      <p>[11] X. Liang, F. Duan, I. Bennett, D. Mba, A sparse autoencoder-based unsupervised scheme for pump fault detection and isolation, Applied Sciences 10 (2020).</p>
      <p>[12] B. Lindemann, F. Fesenmayr, N. Jazdi, M. Weyrich, Anomaly detection in discrete manufacturing using self-learning approaches, Procedia CIRP 79 (2019) 313–318.</p>
      <p>[13] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 (1997) 1735–1780.</p>
      <p>[14] G. Jiang, P. Xie, H. He, J. Yan, Wind turbine fault detection using a denoising autoencoder with temporal information, IEEE/ASME Transactions on Mechatronics 23 (2018) 89–100.</p>
      <p>[15] G. Xiang, R. Tao, Y. Peng, K. Tian, C. Qu, Unsupervised deep learning for fault detection on spacecraft using improved variational autoencoder, in: 2020 Chinese Automation Congress (CAC), 2020, pp. 5527–5531.</p>
      <p>[16] D. P. Kingma, M. Welling, Auto-encoding variational bayes, 2014. arXiv:1312.6114.</p>
      <p>[17] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Empirical evaluation of gated recurrent neural networks on sequence modeling, 2014. arXiv:1412.3555.</p>
      <p>[18] A. L. Alfeo, M. G. Cimino, G. Manco, E. Ritacco, G. Vaglini, Using an autoencoder in the design of an anomaly detector for smart manufacturing, Pattern Recognition Letters 136 (2020) 272–278.</p>
      <p>[19] W. Jian, H. Zhiyan, A novel fault detection method based on adversarial auto-encoder, in: 2020 39th Chinese Control Conference (CCC), 2020, pp. 4166–4170.</p>
      <p>[20] P. Spyridon, Y. S. Boutalis, Generative adversarial networks for unsupervised fault detection, in: 2018 European Control Conference (ECC), 2018, pp. 691–696.</p>
      <p>[21] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, MIT Press, Cambridge, MA, USA, 2014, pp. 2672–2680.</p>
      <p>[22] G. Koch, R. Zemel, R. Salakhutdinov, Siamese neural networks for one-shot image recognition, 2015.</p>
      <p>[23] L. van der Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Amruthnath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <article-title>A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance</article-title>
          ,
          <source>in: 2018 5th International Conference on Industrial Engineering and Applications (ICIEA)</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>355</fpage>
          -
          <lpage>361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Williamson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Smola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shawe-Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Platt</surname>
          </string-name>
          , et al.,
          <article-title>Support vector method for novelty detection</article-title>
          ,
          <source>in: NIPS</source>
          , volume
          <volume>12</volume>
          ,
          <publisher-name>Citeseer</publisher-name>
          ,
          <year>1999</year>
          , pp.
          <fpage>582</fpage>
          -
          <lpage>588</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Amer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abdennadher</surname>
          </string-name>
          ,
          <article-title>Enhancing one-class support vector machines for unsupervised anomaly detection</article-title>
          ,
          <source>in: Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description</source>
          , ODD '13,
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <year>2013</year>
          , pp.
          <fpage>8</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F. T.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Ting</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.-H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Isolation forest</article-title>
          ,
          <source>in: Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, ICDM '08</source>
          , IEEE Computer Society, USA,
          <year>2008</year>
          , pp.
          <fpage>413</fpage>
          -
          <lpage>422</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>