<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Data Management and Anomaly Detection Solution for the Entertainment Industry</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michele Berno</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Canil</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Chiarello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Piazzon</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Berti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesca Ferrari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Zaupa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicola Ferro</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Rossi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gian Antonio Susto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Antonio Zamperla S.p.A.</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Information Engineering, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present a smart monitoring system that combines a data management architecture with an unsupervised anomaly detection technique, targeting the automated equipment in the entertainment industry. Anomaly detection uses state-of-the-art univariate and multivariate algorithms, as well as recently proposed techniques in the field of explainable artificial intelligence, to achieve enhanced monitoring capabilities and optimize service operations. The monitoring system is here presented and tested on a real-world case study, i.e., an amusement park ride.</p>
      </abstract>
      <kwd-group>
        <kwd>anomaly detection</kwd>
        <kwd>predictive maintenance</kwd>
        <kwd>data management</kwd>
        <kwd>entertainment industry</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Antonio Zamperla S.p.A. designs and develops entertainment rides, continuously striving to
make them safer, greener and more efficient. Smart monitoring systems are expected
to bring a number of key features to the entertainment industry. For example, faults could
be predicted in advance, and data analysis during the testing of ride prototypes could
give deeper insight into various design choices. Furthermore, maintenance operations are
nowadays performed manually, following the ride manuals or government directives, and are
usually scheduled on a periodic basis, regardless of the actual condition of the machines. As a
result, maintenance is often performed without a real need, wasting
time, human resources and material. This practice, although cautious and robust, is quite
inefficient. Smart monitoring systems would allow a change of paradigm from the current
conservative approach to a greener and more efficient one. Lastly, automated supervision
techniques would improve the safety of the rides by detecting subtle anomalies that
a human supervisor could hardly identify.</p>
      <p>
        In the context of advanced monitoring of complex systems, two main Machine Learning
(ML)-based technologies have emerged in recent years: unsupervised anomaly detection (AD) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
that aims at providing enhanced diagnostic capabilities, and predictive maintenance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], with
the purpose of predicting failures/degradation to enable the early intervention of maintenance
operators.
      </p>
      <p>In this work, we present a data management architecture that combines state-of-the-art
univariate and multivariate approaches for AD to reach enhanced monitoring capabilities and
optimize service operation. We use a database to store the streaming data acquired by sensors,
feed them to the AD algorithms, store back the features extracted by the AD algorithms, relate
them to alerts and maintenance events, and support an overall Web application.</p>
      <p>The paper is organized as follows: Section 2 briefly summarizes related work; Section 3
presents the considered use case; Section 4 describes the Entity-Relationship schema of the
datastore; Section 5 introduces the proposed approach for AD; Section 6 discusses the current
AD prototype and the related Web application. Finally, concluding remarks and future work
are reported in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Unsupervised AD tools adopt (i) multivariate approaches based on tabular data and (ii)
univariate approaches working with time series [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. (i) Multivariate approaches [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have the
advantage of capturing multivariate anomalous behaviour that typically goes undetected by
classic chart-based monitoring tools; however, when applied to time-series data, they require
feature extraction procedures that are typically time-consuming to develop and may
lead to loss of information. (ii) Univariate approaches typically work on residuals,
i.e., they compare measured and forecast time-series data and raise an alarm when their difference
exceeds a threshold. While deep learning techniques are available for both (i) and (ii), they typically
need to be adapted to cope with discrete production data, where time series are usually split
into batches representing the machine cycles [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Another relevant issue in the monitoring field
is the so-called concept drift, i.e., the statistical properties of the target variables
change over time in an unforeseen way [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which poses the additional challenge of tracking or
estimating it.
      </p>
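      <p>As a minimal sketch of the residual-based scheme described above for univariate approaches, the following Python fragment flags the time instants at which the absolute difference between measured and forecast samples exceeds a threshold. The function name <monospace>residual_alarms</monospace>, the constant forecast and the threshold value are illustrative assumptions; they are not taken from the system presented in this paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Sequence


def residual_alarms(
    measured: Sequence[float],
    forecast: Sequence[float],
    threshold: float,
) -> List[int]:
    """Return the indices where |measured - forecast| exceeds the threshold."""
    if len(measured) != len(forecast):
        raise ValueError("measured and forecast must have the same length")
    return [
        t
        for t, (m, f) in enumerate(zip(measured, forecast))
        if abs(m - f) > threshold
    ]


# Toy data (made up): a flat forecast and one outlying measurement.
measured = [10.0, 10.2, 9.9, 14.5, 10.1]
forecast = [10.0, 10.0, 10.0, 10.0, 10.0]
alarms = residual_alarms(measured, forecast, threshold=1.0)
print(alarms)  # [3]
```
]]></preformat>
      <p>In practice the forecast would come from a model fitted on normal data, and the threshold would be tuned on the residuals observed under normal operation.</p>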
    </sec>
    <sec id="sec-3">
      <title>3. Use Case</title>
      <p>The case study is a coaster (Zamperla’s DangleZ) whose seats can move freely
during the ride, independently of the underlying chassis (see Figure 1). In this case study, the
coaster track is 133 meters long.</p>
      <p>The data acquisition process provides, by means of an on-board PLC, a set of <inline-formula><tex-math>N = 55</tex-math></inline-formula> different
time series acquired with an irregular sampling rate. Each time series represents a signal that can
be analyzed to monitor the behavior of the machine: 9 signals are generated by the equipment
sensors that report the transit of the coaster over different locations of the rail; 15 signals
describe voltages, currents, frequencies and other physical quantities related to the consumption
and the movement of the machine; 5 signals are acquired by a weather station that monitors
the conditions of the surrounding environment; the remaining signals are needed to check the
correct working conditions and the safety of the machine (for instance, the state of the safety
button).</p>
      <p>Data are divided into sessions (or tests), and each session is composed of a number of cycles.
Each cycle collects the data generated during one run of the coaster, from the moment it leaves
the station to its return to the station. Each session contains data coming from a homogeneous
set of measurements, typically measurements taken on the same day or under the same weather
conditions.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conceptual Design</title>
      <p>• Analysis-related data: the dictionaries (i.e., the obtained prototypes) for the univariate AD and
the corresponding drift value of each packet are stored (orange entities), together with the features for the
multivariate AD and their values for each packet (purple entities).</p>
    </sec>
    <sec id="sec-5">
      <title>5. Anomaly Detection</title>
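      <p>The proposed approach integrates univariate and multivariate methods, as illustrated in
Figure 3:</p>
      <p>• At a given machine cycle, a set of raw signals <inline-formula><tex-math>S = \{s_1, \dots, s_N\}</tex-math></inline-formula> coming from the
equipment sensors (and additional context information) is fed to a feature extraction
block that computes features <inline-formula><tex-math>f_1, \dots, f_M</tex-math></inline-formula>, where typically <inline-formula><tex-math>M \neq N</tex-math></inline-formula>. While the signals
<inline-formula><tex-math>s_i</tex-math></inline-formula> are usually time series, the features <inline-formula><tex-math>f_j</tex-math></inline-formula> are scalars. Here, we consider the case where each
feature <inline-formula><tex-math>f_j</tex-math></inline-formula> is computed from a single raw signal <inline-formula><tex-math>s_i</tex-math></inline-formula>, and we denote the set of features
computed from <inline-formula><tex-math>s_i</tex-math></inline-formula> by <inline-formula><tex-math>\mathcal{F}_i</tex-math></inline-formula>.</p>
      <p>• These features are used within a multivariate AD block with two objectives: (i) obtaining
an anomaly score <inline-formula><tex-math>a(\cdot)</tex-math></inline-formula>, a quantitative indicator that summarizes the degree of “outlierness”
of the machine cycle <inline-formula><tex-math>x</tex-math></inline-formula> under exam; (ii) using an XAI approach to obtain a feature
importance score <inline-formula><tex-math>I_j(x)</tex-math></inline-formula>, <inline-formula><tex-math>\forall j \in \{1, \dots, M\}</tex-math></inline-formula>. <inline-formula><tex-math>I_j(x)</tex-math></inline-formula> is a quantitative index that summarizes
the impact of feature <inline-formula><tex-math>f_j</tex-math></inline-formula> in identifying machine cycle <inline-formula><tex-math>x</tex-math></inline-formula> as anomalous.</p>
      <p>• Raw signals that deserve to be monitored separately with a time-series-based (univariate)
approach are identified based on (i) expert knowledge and (ii) the feature importance
<inline-formula><tex-math>I_j(x)</tex-math></inline-formula> coming from the AD module. The rationale is that some anomaly types are better
identified by analyzing a specific time series. To be reliably detected, the respective signal
<inline-formula><tex-math>s_i</tex-math></inline-formula> should be assessed independently, thus avoiding the loss of information caused
by a multivariate feature extraction procedure. To identify such signals, all data points
<inline-formula><tex-math>x</tex-math></inline-formula> are considered. Raw signals to be processed by a univariate approach are identified by
applying the following conditions:</p>
      <p>1. A cycle <inline-formula><tex-math>x</tex-math></inline-formula> is tagged as “anomalous” if
        <disp-formula id="eq-1"><label>(1)</label><tex-math>a(x) &gt; \tau,</tex-math></disp-formula>
where <inline-formula><tex-math>\tau</tex-math></inline-formula> is a pre-defined threshold on the anomaly score (under the assumption
that “high” values of the anomaly score indicate a high degree of outlierness). For
the problem at hand, we used <inline-formula><tex-math>\tau = 0.55</tex-math></inline-formula> w.r.t. the anomaly score generated by an
Isolation Forest.</p>
      <p>2. If, for an anomalous cycle <inline-formula><tex-math>x</tex-math></inline-formula>, we detect that the features obtained from a single
time series <inline-formula><tex-math>s_i</tex-math></inline-formula> are more important than all the others in explaining the anomaly,
then the corresponding time series <inline-formula><tex-math>s_i</tex-math></inline-formula> is sent to the univariate monitoring block.
Formally, <inline-formula><tex-math>s_i</tex-math></inline-formula> is selected if there exists <inline-formula><tex-math>j \in \mathcal{F}_i</tex-math></inline-formula> such that, for every <inline-formula><tex-math>k \notin \mathcal{F}_i</tex-math></inline-formula>,
        <disp-formula id="eq-2"><label>(2)</label><tex-math>I_j(x) &gt; \gamma \, I_k(x),</tex-math></disp-formula>
with <inline-formula><tex-math>j, k \in \{1, \dots, M\}</tex-math></inline-formula>, where <inline-formula><tex-math>\gamma &gt; 1</tex-math></inline-formula> is a pre-defined quantity (for our results, we have set <inline-formula><tex-math>\gamma = 2</tex-math></inline-formula>).</p>
      <p>3. If, besides the two previous steps, expert knowledge indicates that some raw
signals are particularly relevant by themselves, they should be monitored
univariately as well.</p>
      <p>[Figure 3: block diagram of the proposed approach. Recoverable labels: Feature Extraction; Features; A Priori Knowledge; Multivariate Anomaly Detection; List of signals to be monitored univariately; Ranking of Variables; Anomaly Scores; Univariate; Concept Drift Measure.]</p>
      <p>[Figure 4: current traces in amperes ([A], vertical axis from 0 to 40). Left: input data. Right: GWR prototypes (prototype 0 to prototype 6).]</p>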
      <p>The user is finally provided with complementary information coming from the two module
types, with anomaly scores, system monitoring indications, ranking of variables and concept
drift estimations, allowing for a guided Root Cause Analysis.</p>
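      <p>The selection conditions of this section can be sketched as follows. The fragment assumes that the anomaly score of a cycle and the importance of each feature have already been computed by the multivariate block; the function name, the dictionary-based data layout, and the signal and feature names in the toy example are hypothetical. The defaults 0.55 and 2 are the threshold and the ratio reported above. Condition 2 is read here as "some feature of the signal outweighs every feature of all other signals by the given factor", which is our interpretation of the text rather than a confirmed implementation.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List, Set

TAU = 0.55   # anomaly-score threshold reported in the text
GAMMA = 2.0  # importance ratio reported in the text


def signals_for_univariate_monitoring(
    score: float,
    importance: Dict[str, float],
    features_of_signal: Dict[str, Set[str]],
    tau: float = TAU,
    gamma: float = GAMMA,
) -> List[str]:
    """Apply conditions 1 and 2 to a single machine cycle."""
    # Condition 1: only anomalous cycles are inspected further.
    if score <= tau:
        return []
    selected = []
    for signal, own in features_of_signal.items():
        own_vals = [importance[f] for f in own if f in importance]
        other_vals = [v for f, v in importance.items() if f not in own]
        if not own_vals or not other_vals:
            continue
        # Condition 2: one feature of this signal must exceed gamma times
        # the importance of every feature computed from the other signals.
        if max(own_vals) > gamma * max(other_vals):
            selected.append(signal)
    return selected


# Toy example with made-up signal names and importance values.
importance = {"cur_peak": 0.50, "cur_rise": 0.10, "volt_mean": 0.20, "temp_mean": 0.05}
features_of_signal = {
    "current": {"cur_peak", "cur_rise"},
    "voltage": {"volt_mean"},
    "temperature": {"temp_mean"},
}
picked = signals_for_univariate_monitoring(0.62, importance, features_of_signal)
print(picked)  # ['current']
```
]]></preformat>
      <p>Signals indicated by expert knowledge (condition 3) would simply be added to the returned list.</p>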
    </sec>
    <sec id="sec-6">
      <title>6. Prototype</title>
      <p>We remark that features are related not only to the physical signals acquired from the machine, but
also to parameters that characterize its working condition and the surrounding environment,
such as the machine load (number of customers carried), the lubrication, the ambient temperature
and the presence of rain. As an example, Fig. 4 shows a particular electric current consumption signal
during a machine cycle: this signal was processed by extracting features related to the value of
the first current peak, the rise time (the time needed to reach 90% of the peak’s amplitude), the maximum
current value, the standard deviation in its neighborhood, and the value of the last peak before the
signal drops to zero. Once all the features were computed, a correlation analysis was performed
to remove redundancy, obtaining vectors composed of <inline-formula><tex-math>M = 23</tex-math></inline-formula> independent features ready
to be fed to the Isolation Forest. For the concept drift estimation, we focused on one
specific signal, namely the current consumption of a particular engine, since it correlates strongly
with the lubrication that is periodically performed on the machine. As a reference, we used
measurements collected on a day when the gears had been lubricated and during which the machine
was tested in all its possible load configurations (since four seats are available, as shown in Fig. 1,
the possible configurations correspond to 1, 2, 3 or 4 occupied seats). The
left part of Fig. 4 illustrates these data. Some of these configurations generate traces that are
clearly identifiable even by visual inspection, as a higher number of occupied seats implies more
weight and, therefore, a higher current consumption. However, the proposed Growing When Required (GWR) approach
does this automatically, extracting the typical patterns (the prototypes) and storing them in
the dictionary depicted in the right part of Fig. 4.</p>
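      <p>To make the feature extraction step more concrete, the following sketch computes the five features listed above from a sampled current trace. It is only an illustration under several assumptions: peaks are taken as simple local maxima, the rise time is measured from the first sample of the cycle, the neighborhood of the maximum is a fixed window of samples, and all names (<monospace>current_features</monospace>, <monospace>half_window</monospace>, the dictionary keys) as well as the toy data are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import pstdev
from typing import Dict, List, Sequence


def local_peaks(y: Sequence[float]) -> List[int]:
    """Indices of interior local maxima (a deliberately simple peak definition)."""
    return [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]


def current_features(
    t: Sequence[float], y: Sequence[float], half_window: int = 2
) -> Dict[str, float]:
    """Scalar features of one current trace y sampled at (possibly irregular) times t."""
    peaks = local_peaks(y)
    if not peaks:
        raise ValueError("no local peak found in the trace")
    first, last = peaks[0], peaks[-1]
    # Rise time: first instant at which the trace reaches 90% of the first peak.
    level = 0.9 * y[first]
    i90 = next((i for i in range(first + 1) if y[i] >= level), first)
    # Maximum value and standard deviation in a window centred on it.
    i_max = max(range(len(y)), key=lambda i: y[i])
    lo = max(0, i_max - half_window)
    hi = min(len(y), i_max + half_window + 1)
    return {
        "first_peak": y[first],
        "rise_time": t[i90] - t[0],
        "max_current": y[i_max],
        "std_near_max": pstdev(y[lo:hi]),
        "last_peak": y[last],
    }


# Toy trace (made up) with irregular sampling instants.
t = [0.0, 0.9, 2.1, 3.0, 4.2, 5.0, 5.8, 7.1, 8.0, 9.2]
y = [0.0, 5.0, 20.0, 12.0, 15.0, 30.0, 28.0, 10.0, 14.0, 0.0]
feats = current_features(t, y)
print(feats)
```
]]></preformat>
      <p>One such dictionary per signal and per cycle, concatenated over all signals, would give the feature vector that is then pruned by the correlation analysis.</p>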
      <p>Data acquired from the sensors are temporarily stored in the file system, where each Ride has a
folder containing its metadata and a set of CSV files, one for each Session of that Ride.
These files are parsed by Python scripts and stored in a
relational database, namely PostgreSQL<sup>2</sup>. A data access layer, written in Python, then provides
an API for querying the data to both the anomaly detection algorithms and the Web application. The
server side of the Web application is implemented with the Django REST framework<sup>3</sup>. On
the client side, we rely on AJAX calls implemented with jQuery<sup>4</sup> and on the Plotly<sup>5</sup> library
for delivering interactive plots.</p>
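      <p>The ingestion path described above (CSV file per Session, parsing script, relational store, data access layer) can be sketched as follows. To keep the example self-contained, an in-memory SQLite database stands in for PostgreSQL; the table layout, the column names, the CSV header and the function names are all hypothetical and do not reproduce the Entity-Relationship schema of Section 4.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
import sqlite3

# Hypothetical two-table layout; SQLite is used here only as a stand-in.
SCHEMA = """
CREATE TABLE session (
    id INTEGER PRIMARY KEY,
    ride TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE sample (
    session_id INTEGER NOT NULL REFERENCES session(id),
    cycle INTEGER NOT NULL,
    signal TEXT NOT NULL,
    ts REAL NOT NULL,
    value REAL NOT NULL
);
"""


def load_session(conn, ride, name, csv_file):
    """Parse one session CSV and store its samples; return the session id."""
    cur = conn.execute("INSERT INTO session (ride, name) VALUES (?, ?)", (ride, name))
    session_id = cur.lastrowid
    rows = [
        (session_id, int(r["cycle"]), r["signal"], float(r["ts"]), float(r["value"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return session_id


def cycle_signal(conn, session_id, cycle, signal):
    """Data-access helper: time-ordered samples of one signal in one cycle."""
    query = (
        "SELECT ts, value FROM sample "
        "WHERE session_id = ? AND cycle = ? AND signal = ? ORDER BY ts"
    )
    return conn.execute(query, (session_id, cycle, signal)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
demo_csv = io.StringIO(
    "cycle,signal,ts,value\n"
    "1,motor_current,0.0,0.0\n"
    "1,motor_current,0.4,12.5\n"
    "2,motor_current,0.0,0.1\n"
)
sid = load_session(conn, "DangleZ", "demo-session", demo_csv)
print(cycle_signal(conn, sid, 1, "motor_current"))  # [(0.0, 0.0), (0.4, 12.5)]
```
]]></preformat>
      <p>Keeping all queries behind helpers such as <monospace>cycle_signal</monospace> lets the AD algorithms and the Web application share the same access path to the data.</p>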
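      <p>Figure 5 shows an example screenshot of the Web application we have developed. It allows
designers to explore, visualize, and analyze the acquired data, according to the approaches
described above. In particular, Figure 5 shows the side-by-side comparison of two different
acquisition sessions and the related analysis.</p>
      <p><sup>2</sup> <ext-link ext-link-type="uri" xlink:href="https://www.postgresql.org/">https://www.postgresql.org/</ext-link>;
<sup>3</sup> <ext-link ext-link-type="uri" xlink:href="https://www.django-rest-framework.org/">https://www.django-rest-framework.org/</ext-link>;
<sup>4</sup> <ext-link ext-link-type="uri" xlink:href="https://jquery.com/">https://jquery.com/</ext-link>;
<sup>5</sup> <ext-link ext-link-type="uri" xlink:href="https://plotly.com/">https://plotly.com/</ext-link></p>
    </sec>
    <sec id="sec-conclusions">
      <title>7. Conclusions and Future Work</title>
      <p>In this work, an AD system to monitor amusement rides has been presented. The system
combines a database management system and a multi-faceted AD engine performing univariate
and multivariate analyses, exposed through a Web application. The approach is unsupervised,
making it appealing for scenarios where tagged information is unavailable or unreliable. In addition,
features are ranked according to their importance in explaining an alarm, using an explainable
artificial intelligence method.</p>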
      <p>
        Future work may address different areas: improvements of the current framework by (i) using
other norms (e.g., <inline-formula><tex-math>\|\cdot\|_\infty</tex-math></inline-formula> or dynamic time warping, DTW [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) to account for different types of anomalies, and (ii) adopting
semi-supervised learning techniques, including tagged data from either manual labeling or new
sensors; and extensions of the system beyond AD, using it in the ride design phase to maximize its
efficiency and the service life of wear components.
      </p>
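      <p>For reference, the two alternatives named above can be written in a few lines. The sketch below gives the infinity norm of the difference between two equal-length traces and the classic dynamic-programming DTW distance with absolute difference as local cost and no warping-window constraint. Function names and toy traces are illustrative assumptions, and nothing here reflects how these measures would actually be integrated into the framework.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def linf_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest pointwise absolute difference between two equal-length traces."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW distance with |x - y| local cost, computed row by row."""
    inf = float("inf")
    m = len(b)
    # prev[j] holds the cost of aligning the processed prefix of a with b[:j].
    prev = [0.0] + [inf] * m
    for x in a:
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(x - b[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


# Toy traces (made up): `shifted` repeats the first sample of `base`,
# `spiky` differs from `base` in a single sample.
base = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
shifted = [0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0]
spiky = [0.0, 1.0, 2.0, 5.0, 2.0, 1.0]
print(dtw_distance(base, shifted))  # 0.0
print(linf_distance(base, spiky))   # 2.0
print(dtw_distance(base, spiky))    # 2.0
```
]]></preformat>
      <p>The first pair shows why DTW tolerates small time shifts that a pointwise norm would penalize, while the infinity norm reacts to a single large deviation.</p>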
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work has been supported by the Regione Veneto POR FESR 2014-2020. Asse 1. Azione 1.1.4
(Bando per il sostegno a progetti sviluppati da Aggregazioni di imprese) initiative for the project
“Trasformazione digitale innovativa nell’industria dell’entertainment” and by MIUR (Italian
Ministry for Education) under the initiative “Departments of Excellence” (Law 232/2016).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Stojanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dinic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stojanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stojadinovic</surname>
          </string-name>
          ,
          <article-title>Big-data-driven anomaly detection in industry (4.0): An approach and a case study</article-title>
          ,
          <source>in: 2016 IEEE International Conference on Big Data (Big Data)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>1647</fpage>
          -
          <lpage>1652</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          , et al.,
          <article-title>An adaptive machine learning decision system for flexible predictive maintenance</article-title>
          ,
          <source>in: 2014 IEEE International Conference on Automation Science and Engineering (CASE)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>806</fpage>
          -
          <lpage>811</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Bah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hammad</surname>
          </string-name>
          ,
          <article-title>Progress in outlier detection techniques: A survey</article-title>
          ,
          <source>IEEE Access 7</source>
          (
          <year>2019</year>
          )
          <fpage>107964</fpage>
          -
          <lpage>108000</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Nasrullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>PyOD: A Python toolbox for scalable outlier detection</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>20</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Carletti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Masiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beghi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Susto</surname>
          </string-name>
          ,
          <article-title>A deep learning approach for anomaly detection with industrial time series data: a refrigerators manufacturing case study</article-title>
          ,
          <source>Procedia Manufacturing</source>
          <volume>38</volume>
          (
          <year>2019</year>
          )
          <fpage>233</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Tsymbal</surname>
          </string-name>
          ,
          <article-title>The problem of concept drift: definitions and related work</article-title>
          , Computer Science Department, Trinity College Dublin 106 (
          <year>2004</year>
          )
          <fpage>58</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Sakoe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chiba</surname>
          </string-name>
          ,
          <article-title>Dynamic programming algorithm optimization for spoken word recognition</article-title>
          ,
          <source>IEEE Transactions on Acoustics, Speech, and Signal Processing 26</source>
          (
          <year>1978</year>
          )
          <fpage>43</fpage>
          -
          <lpage>49</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>