A cloud-to-edge architecture for predictive analytics

    David Bowden∗ , Angelo Marguglio† , Lucrezia Morabito‡ , Chiara Napione‡ , Simone Panicucci‡ ,

      Nikolaos Nikolakis§ , Sotiris Makris§ , Guido Coppo∗∗ , Salvatore Andolina∗∗ , Alberto Macii†† ,

                              Enrico Macii‡‡ , Niamh O’Mahony∗ , Paul Becker§§ , Sven Jung§§
                                           ∗ DELL EMC, Cork, IRELAND, name.surname@dell.com
                        † Engineering Ingegneria Informatica S.p.A., Palermo, ITALY, name.surname@eng.it
                                        ‡ COMAU S.p.A., Turin, ITALY, name.surname@comau.com
    § Laboratory for Manufacturing Systems & Automation, Department of Mechanical Engineering and Aeronautics,

                        University of Patras, Patras, GREECE, name.surname@lms.mech.upatras.gr
                         ∗∗ SynArea Consultants S.r.l., Turin, ITALY, name.surname@synarea.com
    †† Department of Control and Computer engineering, Politecnico di Torino, Turin, ITALY, alberto.macii@polito.it
     ‡‡ Interuniversity Department of Regional and Urban Studies and Planning, Politecnico di Torino, Turin, ITALY,

                                                   enrico.macii@polito.it
                §§ Fraunhofer Gesellschaft zur Förderung der angewandten Forschung, Aachen, Germany,

                                              name.surname@ipt.fraunhofer.de

ABSTRACT
Data management and processing to enable predictive analytics in
cyber-physical systems holds the promise of creating insight into the
underlying processes, discovering criticalities and predicting imminent
problems. Hence, proactive strategies can be adopted on the basis of
predictive analytics. This paper discusses the design and prototype
implementation of a plug-n-play end-to-end cloud architecture, enabling
predictive maintenance of industrial equipment. This is enabled by
integrating edge gateways, data stores at both the edge and the cloud, and
various applications, such as predictive analytics, visualization and
scheduling, integrated as services in the cloud system. The proposed
approach has been implemented into a prototype and tested in an industrial
use case related to the maintenance of a robotic arm.

© 2019 Copyright held by the author(s). Published in the Workshop Proceedings
of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on
CEUR-WS.org.

1    INTRODUCTION
The advent of the Industry 4.0 trend in automation and data exchange leads
to a constant evolution towards smart environments, including an intensive
utilization of Cyber-Physical Systems (CPS). This promotes a full
integration of manufacturing IT and control systems with physical objects
embedded with electronics, software and sensors. This new industrial model
leads to a pervasive integration of information and communication
technology into productive components, generating massive amounts of data.
Powerful and reliable cyber-physical architectures are becoming prominent
to effectively analyze such large amounts of data, creating insight into
the production process, and, thus, enabling its improvement, as well as
competitive business advantages.
   This paper presents a cloud architecture designed for the Industry 4.0
vision, bridging the gap between the physical world, which provides raw
data, and the cyber space, which processes those data and creates insight,
aiming to enable predictive analytics at the edge. With the goal of
anticipating failures and estimating the remaining useful life (RUL) of
physical equipment, a two-tier data analytics architecture has been
developed. This architecture, the "SERENA" system, can identify the
symptoms of imminent machine failure, through the characterization of the
current dynamics of the process/machine (at any given time) using on-line
data collected in the factory. A scalable and modular approach has been
taken in the design of the architecture, decoupling the overall design from
any specific set of technologies. For testing and validating the proposed
approach, a prototype has been implemented, integrating services such as
visualization, scheduling and predictive analytics. This prototype was
validated in a real-world scenario involving anomaly detection on a robotic
axis and concerning the maintenance requirements caused by the backlash
effect. The visualization service enables a real-time data stream and
machine visualization, while the predictive analytics services generate the
estimated RUL value, which is consumed by the scheduling service to
proactively schedule the maintenance activities.
   This paper is organized as follows. Section 2 discusses the
state-of-the-art analytics approaches targeting Industry 4.0. Section 3
describes the proposed cloud-to-edge architecture for predictive analytics,
while Section 4 introduces the industrial use case. Section 5 presents the
preliminary results achieved by exploiting the proposed architecture in a
real use case. Finally, Section 6 draws conclusions and discusses future
developments of the proposed cloud-to-edge architecture for predictive
analytics in Industry 4.0.

2   RELATED WORK
A large variety of studies have been carried out to develop efficient data
management systems, data analytics engines, business processes and risk
assessment in Industry 4.0. The authors in [15] presented a case study
exploiting Big Data analytics to improve production processes. It exploited
a methodology, called Cross-Industry Standard Process for Data Mining, to
present and organize results for better understanding businesses. The
work in [7] presented an integrated self-tuning engine for pre-
dictive maintenance in Industry 4.0. Specifically, a distributed
architecture, based on Apache Kafka, Spark Streaming, MLlib,
and Cassandra was proposed and discussed. The proposed ap-
proach integrated the monitoring and prediction tasks, along
with a self-tuning approach for the dynamic selection of the best
predictive algorithm, and specific attention to providing inter-
pretable knowledge to end users. Manufacturing computerization
is another crucial issue to be addressed in the Industry 4.0 ecosys-
tem. The study presented in [11] proposed a semantic reduction
of heterogeneous sources, based on Semantic Web approaches,
to foster better analytics implementations.
    Another interesting issue to address is the increasing amount
of data to be managed by machine learning techniques. In this
context, an interesting comparison between multi-class classifiers
and deep learning techniques is discussed in [12]. Furthermore, a
comparative experimental analysis of exploratory techniques for                      Figure 1: SERENA Architecture
Big Data is provided in [3]. The authors in [2] present a Big-Data
scalable predictive approach in the energy domain. The
                                                                       3     THE SERENA APPROACH
study presented in [9] proposed a framework for on-demand
remote sensing data analysis to speed up the execution of models       The SERENA system comprises a number of services, which
by reducing data transfers through the network. This allows            collectively provide predictive analytics functionality, enabling
for classical remote data service systems to evolve into remote        predictive maintenance policies to be applied. It is implemented
sensing data processing infrastructures.                               using a light-weight micro-services architecture, which utilizes
    Advanced Internet of Things (IoT) and ICT technologies al-         Docker containers to wrap the services into deployable units.
low linking physical manufacturing facilities and machines in          The services can then be distributed across the SERENA hybrid
integrated applications. The authors in [5] provided a review of       cloud, extending their functionality out to edge gateways on the
virtualized and cloud-based services in the context of manufac-        factory floor. The distribution of services and dynamic commu-
turing systems. A predictive maintenance approach involving            nications channels is implemented using a Docker orchestration
cyber-physical systems with wide IoT capabilities along with           manager. Wrapping services in containers abstracts them from
complex event processing features was discussed.                       the underlying host infrastructure. As Docker is a commonly
    Among the most widespread maintenance approaches, con-             supported open source technology, the SERENA system can be
dition based maintenance (CBM) is usually considered the most          deployed on a wide variety of infrastructures, from hardware
effective. Efficiently determining the health status of a monitored    servers and gateways, through virtual machines, to hosted envi-
device, in such context, is of major importance. Prognostics and       ronments on public and hybrid clouds. Using the same Docker
diagnostics applied to raw data collected from sensors aim to de-      solution across the SERENA hybrid cloud and gateways, gives
termine the health of the monitored system or equipment. To this       the system a unified architecture, which can be operated and
end, detecting and analyzing underlying data trends allow anom-        managed as a single unit. The services represent logical elements
alies to be discovered. An overview of data analytics techniques       that provide defined functionality in the SERENA system. Whilst
for anomaly detection is provided in [13]. The authors exploited       the SERENA reference implementation uses specific technology
artificial neural networks in large systems to effectively predict     to realize each service, the common interface allows technology
their health. Prognostics or predictive analytics are usually as-      to be swapped, depending on the specific implementation require-
sociated with the computation of a key performance indicator,          ments. This technology transparency is an important concept in
such as the RUL. The authors in [17] presented a deep-belief net-      SERENA’s plug-n-play architecture. Figure 1 illustrates the main
work ensemble method with multiple objectives to estimate RUL.         components of the SERENA system and their interactions, which
Similarly, the authors in [4] exploited a neural-network prognos-      are further described in the following subsections.
tics model to support industrial maintenance scheduling. The
failure probabilities were computed from real-world equipment          3.1    Services
measurements through a logistic regression approach. Such mea-         The SERENA system is designed to integrate external applications
surements were then routed to a prognostics model to forecast          using a service-oriented approach. In this context, the following
failure conditions and, finally, to estimate RUL. In this scenario,    services have been designed, implemented and integrated in the
predictive analytics are affected by the quality of data used for      above mentioned system:
prediction. The authors in [6] proposed a method for improving            A predictive analytics service, based on machine learning
data quality in diagnosing the health of devices and production        techniques, to forecast future failures of machinery/equipment.
equipment. First, a visualization-based grouping, based on the         The aim of this service, whose functional building blocks are
dissimilarity spectrum, was performed on critical measurements,        shown in Figure 2, is two-fold: (i) Building a prediction model,
which were then clustered and evaluated, in terms of their fit-        based on historical data, by means of machine learning algo-
ness and separation with each other. An outlier-detection visual       rithms; and (ii) applying such a model in real time to new in-
assessment was also presented to identify outliers in the data.        coming data streams, to identify possible failures. A two-tier
                                                                       architecture exploiting both edge and cloud computing has been
                                                                       proposed to address phase (i) in the cloud, exploiting
                                                                       (theoretically) unlimited resources, while phase (ii), which requires fewer
                                                                          operations to the local technicians or remote experts in an effec-
                                                                          tive and intuitive way.

                                                                          3.2    Edge gateway
                                                                          As illustrated on the left of Figure 1, the edge gateways are located
                                                                          on the factory floor and collect sensor data from industrial
                                                                          equipment and channel it, through the data flow engine, to the
                                                                          communications broker running in the SERENA hybrid cloud.
                                                                          The gateways also host analytics models, which are used to pro-
Figure 2: The predictive analytics service: main building                 cess data at the edge, converting raw data into smart data. Both
blocks                                                                    the data flow engine and the analytics model are deployed to the
                                                                          gateway as Docker containers, under the control of the Docker
                                                                          orchestration manager. The gateway can host multiple types of
computation resources, runs at the edge due to the limited                data flow engines and analytics models, depending on the types
resources and to increase responsiveness.                                 of equipment that are being monitored. The majority of the raw
   The smart data block derives relevant static features from the         sensor data is transformed into smart data, by the analytics model
raw data (in many cases raw data are time series), supporting             running on the gateway, but when specific criteria are met, a
the predictive maintenance goal. Smart data represents the key            sample of the raw sensor data is sent to the SERENA cloud for
characteristics of the raw data, as well as context information           more in-depth analysis and to train the analytical models. Typi-
about how the data was collected and the operating conditions             cally, gateways have enough computing power to run analytical
of the equipment it was collected from. In the current implemen-          models, but not to train them. The training is handled by the
tation, the block computes a large variety of statistical indices,        predictive analytics service running in the SERENA cloud. The
including maximum, minimum, mean, peak to peak distance,                  service uses the raw data to train the prediction algorithms and
variance, inter-quartile ranges, standard deviation, root mean            package the resultant models in Docker containers, which
square, kurtosis and skewness.                                            are deployed to the edge gateways.
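The statistical indices listed for the smart data block can be sketched as a single function that reduces one raw time series to a small feature dictionary. This is a minimal illustration using only the Python standard library, not the SERENA implementation: the function name `smart_data_features` is hypothetical, and the choice of population (biased) moments for variance, skewness and kurtosis is an assumption, since the text does not state which estimators are used.

```python
import math
import statistics


def smart_data_features(samples):
    """Summarise one raw time series as a dict of statistical indices."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean = statistics.fmean(samples)
    # Population central moments (assumption: the paper does not name the estimator).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    # Quartiles with the statistics module's default ("exclusive") method.
    q1, _, q3 = statistics.quantiles(samples, n=4)

    return {
        "max": max(samples),
        "min": min(samples),
        "mean": mean,
        "peak_to_peak": max(samples) - min(samples),
        "variance": m2,
        "std": math.sqrt(m2),
        "rms": math.sqrt(sum(x * x for x in samples) / n),
        "iqr": q3 - q1,
        # Moment-based skewness and (non-excess) kurtosis; 0.0 for a constant signal.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }
```

Because each cycle is reduced to a fixed-length record of this kind, classifiers that do not handle time series natively can still be trained on the result.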
   The model building block is executed on a batch schedule
on historical data. These data include the smart data computed            3.3    Broker
over the original time series and their corresponding class labels
                                                                          As shown in the middle of Figure 1, the communications broker
(e.g., failure presence or absence, category of failures). All data
                                                                          acts as the central communication hub between the Data Stores,
are related to an industrial device/robot/piece of equipment of
                                                                          SERENA services and the edge gateways. The broker primarily
interest that can fail and for which a predictive maintenance
                                                                          handles HTTPS traffic by exposing a number of REST endpoints.
strategy should be addressed.
                                                                          The broker also supports a number of other protocols, such as
   Many classifiers do not manage time series data by design but,
                                                                          MQTT and Web Services, for real-time data transfer. In addition
since the original time-series of measurements are not considered
                                                                          to receiving sensor data directly from the factory floor, the broker
for training the model, a wide range of classifiers could be used.
                                                                          acts as the access point for external facilities, such as enterprise
In the current implementation to train a predictive model, the
                                                                          resource planning (ERP) systems. Security is a critical part of
proposed methodology exploits one of the following machine
                                                                          the SERENA system, and the broker, as the communications
learning algorithms: Neural networks (NN) [16], Random Forest
                                                                          hub, provides secure channels to and from the gateways and
(RF) [10], Logistic Regression (LR) [10], Support Vector Machines
                                                                          other SERENA services. It will also validate the authenticity of
(SVM) [10], and Gradient-Boosted Tree (GBT) [8].
                                                                          incoming messages, and whether the requester is authorized to
   In the validation block, the performance of the prediction block is
                                                                          use the requested service.
evaluated by exploiting either a k-fold stratified cross-validation
or a hold-out strategy based on the cardinality of the training
                                                                          3.4    Cloud data storage
set. In addition, the training dataset defaults to the available
historical data, even if shorter and more specific periods can be         SERENA supports a number of different data stores, depending
selected to address ad-hoc predictive maintenance issues. The             on the type of data and its function within the system, including
prediction performance is evaluated through quality indices, such         raw sensor data, smart data, metadata, equipment manuals, 3D
as accuracy, to evaluate the overall efficacy of the classifier, whilst   objects for virtual reality applications, etc. The data stores and
f-measure, precision, and recall offer important insights on the          data repositories are also implemented as containerized SERENA
performance of the classifier with respect to a given class.              services, which gives them the same flexibility as other services
   A forecasted failure time horizon is generated as the output of        on the system.
the predictive analytics service and consumed by a scheduling
service [14]. The aim of the service is to prevent the predicted          3.5    Docker orchestration
failure, by assigning the required maintenance activities to oper-        The orchestration manager is responsible for deploying the ser-
ators within the given timeframe. This service can be extended            vice containers to the host infrastructure and managing their life-
to consider the current production plan, hence fitting the mainte-        cycle. It also defines and manages the communications channels
nance activities within a given time slot to optimise production          between services. The core SERENA cloud services are deployed
outputs.                                                                  as resilient clusters of Docker containers. If a container fails, the
   The visualization block provides a 3D view of the relevant             orchestrator automatically starts a new container to replace it.
machinery/equipment, using the data collected in the field, along         Additionally, the orchestrator can be used to increase or decrease
with the results of the predictive maintenance algorithms. This           the number of containers in a service, thus scaling the operation
service allows for the presentation of the data and maintenance           of the service.
   As the SERENA cloud servers and gateways are registered
within the same Docker domain, the orchestrator can manage the
deployment of new services (e.g. data flow engines and analytics
models) from the SERENA cloud, all the way to the edge gateways.
Docker uses labels to specify which containers are deployed to
which hosts. If a new data flow engine is required to support a
piece of equipment, the appropriate Docker label is defined on the
host gateway; the Docker orchestrator will then ensure that the
appropriate data flow container is automatically deployed to the
gateway. In this way, thousands of containers can be deployed to
hundreds of gateways, simply by defining the appropriate Docker
labels.
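The label-driven placement described above can be illustrated with a toy model of the matching step. This is a sketch only and does not call any Docker API: the function `plan_deployments`, the host names, the service names and the label keys (`role`, `equipment`) are all hypothetical, chosen to mirror the text's example of a data flow engine that should land only on gateways attached to a given type of equipment.

```python
def plan_deployments(host_labels, service_constraints):
    """Map each host to the services whose required labels it satisfies.

    host_labels:         {host name: {label key: label value}}
    service_constraints: {service name: {label key: required value}}
    """
    plan = {}
    for host, labels in host_labels.items():
        plan[host] = sorted(
            service
            for service, required in service_constraints.items()
            # A service is placed on a host only if every required label matches.
            if all(labels.get(key) == value for key, value in required.items())
        )
    return plan


# Hypothetical hosts: two edge gateways and one cloud server.
hosts = {
    "gateway-1": {"role": "gateway", "equipment": "robotbox"},
    "gateway-2": {"role": "gateway", "equipment": "press"},
    "cloud-1": {"role": "cloud"},
}

# Hypothetical services and the labels they require on a host.
services = {
    "robotbox-dataflow": {"role": "gateway", "equipment": "robotbox"},
    "broker": {"role": "cloud"},
}

plan = plan_deployments(hosts, services)
```

In this model, adding the `equipment: robotbox` label to a further gateway is enough for `robotbox-dataflow` to appear in that gateway's plan, which is the behaviour the text attributes to the orchestrator.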

3.6    Docker registry
The SERENA system also implements its own local Docker image
registry. Docker containers are deployed from images in the local                      Figure 3: Experimental setting
registry, rather than using a public registry, which ensures that
the required images are always available locally, and from a
trusted source.                                                        level is different; which slightly complicates the machine learning
                                                                       approach but allowed us to collect more data about the levels
4     USE CASE AND EXPERIMENTAL SETTING                                most complicated to analyze. Each datapoint consists of all the
COMAU (https://www.comau.com/) deploys industrial robots               information provided from the RobotBox controller and from the
around the world and it has an increasing requirement to collect       user setting connected to the choice of the belt tensioning degree:
data that monitors the health status of all its machines, in order         • header information: machine id, program number, cycle
to avoid sudden failure. To cope with this complexity, further               start time, cycle time;
studies on predictive maintenance approaches are needed. For               • time series data: position and current data, collected with
this reason a test-bed has been built, which is called RobotBox              a sampling time of 2 milliseconds for a duration of 24
and consists of a motor from a Comau medium size robot, with                 seconds;
its associated controller. It also includes an adaptor, a                  • label: level of belt tensioning.
belt and a 5 kg weight, which simulates an end effector. The
                                                                           Smart data. From the current raw data, 12 statistical features
choice of using a single axis rather than an entire robot is due to
                                                                       have been calculated and used to classify each cycle indepen-
the fact that manipulating a robot is very expensive. In addition,
                                                                       dently. Smart data include: maximum, minimum, mean, peak to
in a complete robot there are many factors having impact on
                                                                       peak distance, variance, standard deviation, root mean square
the physical conditions of the robot behaviour (e.g. frictions,
                                                                       (rms) of raw data, kurtosis, skewness and rms of three types of
temperature, vibrations, humidity, etc.). It is difficult to isolate
                                                                       filter on current data (low pass, band-pass and high pass filter).
single effects and decouple environment phenomena, especially
                                                                       In previous internal studies these features have proved to be
because the only two monitored signals are the axis position and
                                                                       effective to model a current cycle.
the current required from the motor to perform the expected
                                                                           Experimental setting. In order to implement the first pro-
cycle. So even noise signals impact on these two time series.
                                                                       totype of the proposed architecture, position and current data
Nevertheless, it is possible to extend the knowledge acquired
                                                                       were acquired by the RobotBox controller and transmitted to the
from the single axis to robots with more axes, in order to derive
                                                                       Gateway in a JSON format (the .log file), as presented in Figure 3.
a comprehensive knowledge of the asset health status.
                                                                       The Gateway then calculates some statistical features (i.e., smart
   This initial experiment only takes into account position and
                                                                       data) of the current time series. Then, it communicates with two
current, but in the future more parameters will be collected and
                                                                       other services, listed below, to obtain classification information
analyzed.
                                                                       about the RobotBox cycle:
   In a predictive maintenance perspective two possible motor
failures have been defined, namely backlash and incorrect belt             • a neural network classifier able to recognize the belt ten-
tensioning. In this study we focus on the belt tensioning issue.             sioning level;
   Real data. COMAU collects real data by monitoring a motor               • a classifier capable of giving a qualitative backlash status
from a Comau medium size robot, with its associated controller.              and an estimation of the remaining useful life expressed
The collection phase ran from September 2018 to December                     in days.
2018, during which a cycle has been collected every 120 seconds.       At the end, all the raw data, the current features and the classifiers’
The sampling rate to collect raw data is 2 milliseconds, i.e. the      outputs are sent to the broker ingestion service running on the
sampling frequency is 500 Hz. Therefore, the total number of           cloud.
monitored cycles is 87,840. A cycle is the sequence of moves              The use of Node-RED (https://nodered.org/) makes it possible
which the motor has been coded to perform in loop; in this case        to program each block which implements the required function-
the cycle lasts for 24 seconds.                                        alities remotely, since the operator has only to connect or launch
   In order to study the belt tensioning phenomenon with a             the flow. The Node-RED service in the RobotBox controller, the
machine learning approach, six levels of tensioning have been          two classifiers and the Node-RED flow in the Gateway are all
defined with the domain expert’s help. The dataset collected is not    located in Docker containers and the relative images have been
balanced, which means the number of samples for each tensioning
   Big data framework. As a first attempt, the proposed architecture
exploits a NoSQL database as a cloud storage layer. However, the current
solution is planned to be replaced by a Big Data framework, exploiting the
Cloudera stack. A MongoDB solution has been adopted in the experiment, in
order to ease the data management services delivered upon flexible message
formats, guaranteeing fast performance on both write and read directions.
MongoDB collections and contents have been exposed through HTTP REST
endpoints to the SERENA Ingestion Service, which runs on the cloud and is
implemented using Apache NiFi.

                              Predicted
   Actual        0       1       2       3       4       5
        0     1373       0       0       0       0       0
        1        0    2145       0       5       0       0
        2        0       0    6673      97      67      32
        3        0       0      36    3718      40     219
        4        0       0      51     230    3302     293
        5        0       0       0     119      30    3530

                      Table 1: Confusion matrix
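The counts in Table 1 can be turned into the quality indices named for the validation block (accuracy, precision, recall and f-measure). The sketch below copies the matrix from Table 1, reading rows as actual classes and columns as predicted classes; the function names are hypothetical, and the resulting values are arithmetic derived here from the table rather than figures reported by the authors.

```python
# Rows are actual classes 0-5, columns are predicted classes 0-5 (from Table 1).
CONFUSION = [
    [1373, 0, 0, 0, 0, 0],
    [0, 2145, 0, 5, 0, 0],
    [0, 0, 6673, 97, 67, 32],
    [0, 0, 36, 3718, 40, 219],
    [0, 0, 51, 230, 3302, 293],
    [0, 0, 0, 119, 30, 3530],
]


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_metrics(matrix):
    """Precision, recall and f-measure for each class index."""
    metrics = {}
    for k in range(len(matrix)):
        true_positive = matrix[k][k]
        actual = sum(matrix[k])                   # row sum: samples truly in class k
        predicted = sum(row[k] for row in matrix)  # column sum: samples labelled k
        recall = true_positive / actual if actual else 0.0
        precision = true_positive / predicted if predicted else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = {
            "precision": precision,
            "recall": recall,
            "f_measure": f_measure,
        }
    return metrics


overall = accuracy(CONFUSION)
by_class = per_class_metrics(CONFUSION)
```

With these counts, 20,741 of the 21,960 samples fall on the diagonal, an overall accuracy of about 94.4%; classes 0 and 1 are almost never confused with the others, while most errors occur among classes 2 to 5.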
   HTTP and real time feed (MQTT). A service in a Docker con-
tainer for almost real time data streaming has been deployed:
this is situated in the RobotBox controller and it publishes data to
a MQTT broker in the cloud, so as to make data available to the
                                                                       or removing a washer; which is why our model has difficulty
visualization application which subscribes to the same topic. The
                                                                       in identifying the correct class. Future work might consider the
data stream position was used to update a virtual representation
                                                                       temperature as another feature to be considered by the classifier.
of the RobotBox and a sample period of 50 milliseconds was chosen
as a good trade-off between the visualization quality and the
bandwidth requested.                                                   5.2        Visualization application
                                                                       In order to set up the first experiment in the Comau test-bed,
5     PRELIMINARY RESULTS                                              SynArea (http://www.synarea.com/) has developed an HTML5
In this section, some preliminary results obtained through the ex-     Unity 3D interactive prototype application, to show in a web
ploitation of the proposed architecture and its provided services      browser the 3D model of the RobotBox. Furthermore, some inter-
are discussed.                                                         face methods have been implemented to manage the information
                                                                       coming from the SERENA platform. In particular:
5.1     The predictive analytics service                                     • display, in near real-time warnings, errors and RUL with
The predictive analytics has been tailored to forecast the belt                different colors highlighting the involved part, to imme-
tensioning level. To this end, a machine learning algorithm has                diately capture the operator’s attention, and provide an
been applied to smart data, in order to recognize a tensioning                 intuitive indication of the main information to check;
level, given a new cycle of data. In the current implementation,             • preventive and predictive textual information to be dis-
we exploited the TensorFlow library [1] to implement a Neural                  played by selecting the involved part of the RobotBox;
Network algorithm. After an in-depth sensitivity analysis, the               • 3D virtual procedure to guide the operator while perform-
specific algorithm parameters were set to the following:                       ing the replacement of the involved part (i.e. an example
                                                                               of operator support);
      • two hidden layers with, respectively, 50 and 25 neurons;
                                                                             • subscribing to the defined topic of the MQTT broker in
      • Adam optimiser with default values;
                                                                               the cloud, used for the data stream, visualize the real-time
      • cross-entropy loss;
                                                                               position on the RobotBox 3D model, to enable a remote
      • 100 thousand of epochs.
                                                                               monitoring of the physical behaviour observed.
   Given the amount of data available, an hold-out approach was
used to divide the dataset into train and test sets, with 75% of the      Figure 4 shows a screenshot of the HTML5 Unity 3D inter-
data used for training and the remainder used for test. In both        active application showing the Comau RobotBox without the
datasets, shuffling has been performed and a batch of size 300         associated controller. The central (yellow) element is a 5 kilos
samples for the training set and 100 for the test set have been        weight simulating an end effector, and the highlighted element
chosen.                                                                is a medium size motor of a Comau robot, connected with an
   Since the belt tensioning is changed by moving the motor            adaptor and a belt.
with respect to the adaptor and in order to make experiments              The application is connected to the SERENA cloud platform
reproducible, six washers have been used to discretize the six         in order to provide intuitive and real-time information to the
levels of belt tensionsing taken into account (each washer is 0.2      maintenance operator, as a result of the analytics and predictive
mm thick). The lower the number of washers, the higher both            algorithms, and to enable remote monitoring using the position.
the belt tensioning and the current cunsumption.                          The highlighted color on the motor shows its failure status
   The accuracy of the final model was found to be approximately       (green = correct; yellow = warning; red = failure) and, by clicking
94%. Table 1 shows the confusion matrix; both 0 and 1 washers          on it, an information box (on the left side) is displayed with some
are almost perfectly recognized by the model and it is due to          important prognostic or predictive values, such as the label (level
the fact that those classes are easily divisible since the belt is     of belt tensioning) and RUL (Remaining Useful Life).
extremely tense and thus the current consumption is different             By clicking on the ”Maintenance Procedure” button, a virtual
from the other classes. Regarding the other classes, even though       procedure of the belt replacement and tensioning is also displayed.
the model has good performance, there are more incorrect classi-
fications due to an environment factor: the temperature. In fact
we have noticed that the higher the environment temperature,           5.3        Scheduling application
the lower the current consumption of the motor due to lower            The scheduling service has been implemented in Java, follow-
friction between the motor components. The shift introduced by         ing a client-server architecture. The service inputs include the
this phenomenon is quite similar to the one caused by adding           monitored equipment, RUL value, maintenance tasks, including
                                                                     applications can be enabled with various applications, consider-
                                                                     ing underlying CPS features and under the vision of Industry 4.0
                                                                     and connected factories. In this regard, the proposed architecture
                                                                     has been designed with the goal of addressing some common
                                                                     needs of industrial enterprises such as:

                                                                         • compatibility with both the on-premise and the in-the-
                                                                           cloud environments;
                                                                         • exploitation of reliable and largely supported Big Data
                                                                           platforms;
                                                                         • virtually unlimited horizontal scalability;
                                                                         • easy deployment through containerized software modules.
        Figure 4: 3D visualization of the RobotBox
                                                                     To test the proposed approach, a prototype has been created and
    Resource          Task         Duration           Cost           validated in an industrial use case on the predictive maintenance
     Name                          (minutes)     (Euros/minute)      of a robotic manipulator, in particular the RobotBox device. To
                                                                     enable the evaluation on the basis of predictive analytics, visual-
    Newcomer,       machine
                                                                     ization and consequent maintenance planning, three applications
     Middle        inspection         20              0.25
                                                                     have been integrated as services. As a result of the validation of
    Newcomer,       machine
                                                                     the early prototype, the integrated solution achieved to bridge
      Expert       inspection         120             0.25
                                                                     the gap between machine data acquisition and generation of pre-
                    machine                                          dictive maintenance policies based on the analysis of the acquired
      Expert       inspection         15              0.4            data. Additionally, dynamic allocation of docker containers at the
    Newcomer,   replacement of                                       edge was achieved, enabling a dynamic way of allocating func-
     Middle       the gearbox         100             0.4            tionalities to shop floor equipment, as long as they are connected
    Newcomer,   replacement of                                       to the cloud platform and properly labelled. Existing frameworks,
      Expert      the gearbox         10              0.5            such as Arrowhead (http://www.arrowhead.eu/), provide a high-
                replacement of                                       level representation of the underlying architecture without any
     Expert       the gearbox         80              0.5            specification on an end-to-end implementation with a certain
                                                                     set of components addressing some application. This paper pro-
Table 2: Information used by the scheduling service for              vides a reference implementation for a predictive maintenance
the experiment                                                       system using a certain set of components and a specific interac-
                                                                     tion mechanism. Moreover, the presented implementation is not
                                                                     coupled to any specific technology or technique, thus making
                                                                     it suitable for overlaying other reference architectures, such as
precedence relations and default duration per operation expe-        Fiware (https://www.fiware.org/). The architecture presented in
rience, and a number of potential operators with their charac-       this work has been focused on flexibility and ease of implemen-
teristics, such as experience level. The server side includes a      tation, extension and deployment. A set of technologies have
multi-criterion decision making framework, evaluating the alter-     been used without restricting any user to adopt the same set of
native scheduling configurations, ranking them and selecting the     technologies, for example, for data storage, or final user services,
highest ranked one. The client side communicates with the server     such as scheduling. As a result, they can be easily substituted,
side via restful APIs, supporting the following functionalities:     following the proposed integration approach. This will allow the
                                                                     proposed architecture to fit a variety of applications and domains.
     • editing of tasks, resources, equipment;
                                                                     Hence, the main contribution of this work is providing a set of
     • time series visualisation;
                                                                     required end-to-end functionalities for creating a cloud platform
     • process plan Gantt visualisation.
                                                                     for Industry 4.0, not limited to the maintenance domain.
The process time required to create a new schedule depends               Future activities will focus on integrating additional function-
on the complexity of the schedule, referring to the number of        alities to the overall architecture, such as data security features,
tasks, resources, and their dependencies, along with the evaluated   increasing the robustness of the integrated solution, and eval-
criteria. In the current experiment, the schedule was generated in   uating it in versatile use cases with the aim of improving its
approximately 11 msec, and included the execution of two tasks;      efficiency and user-friendliness. Moreover, with regards to the
machine inspection and replacement of the gearbox, along with        data analytics, further investigation and research is required to
three potential resources; (1) a team of one newcomer and one        identify the most appropriate algorithms for enabling data driven
of middle experience, (2) one newcomer and one expert and (3)        predictive analytics and validating their outcome.
a team of one expert. The difference in task completion time as
well as cost is presented in the Table 2, per task.
                                                                     ACKNOWLEDGMENT
6    CONCLUSIONS AND FUTURE                                          The research leading to these results has received funding from
     APPLICATIONS                                                    European Commission under the H2020-IND-CE-2016-17 pro-
This work presents a flexible and scalable architecture merg-        gram, FOF-09-2017, Grant agreement no. 767561 "SERENA" project,
ing cloud based and edge deployed components. Through the            VerSatilE plug-and-play platform enabling REmote predictive
proposed unified integration and deployment concept, different       mainteNAnce.
REFERENCES
 [1] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen,
     Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin,
     Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael
     Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh
     Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray,
     Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever,
     Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda
     Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan
     Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning
     on Heterogeneous Systems. (2015). https://www.tensorflow.org/ Software
     available from tensorflow.org.
 [2] A. Acquaviva, D. Apiletti, A. Attanasio, E. Baralis, L. Bottaccioli, F. B. Castagnetti,
     T. Cerquitelli, S. Chiusano, E. Macii, D. Martellacci, and E. Patti. 2015.
     Energy Signature Analysis: Knowledge at Your Fingertips. In 2015 IEEE International
     Congress on Big Data. 543–550. https://doi.org/10.1109/BigDataCongress.2015.85
 [3] Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Fabio Pulvirenti,
     and Luca Venturini. 2017. Frequent itemsets mining for Big Data: a comparative
     analysis. Big Data Research 9 (2017), 67–83.
 [4] S. A. Asmai, A. S. H. Basari, A. S. Shibghatullah, N. K. Ibrahim, and B. Hussin.
     2011. Neural network prognostics model for industrial equipment maintenance.
     In 2011 11th International Conference on Hybrid Intelligent Systems (HIS).
     635–640. https://doi.org/10.1109/HIS.2011.6122180
 [5] Radu F. Babiceanu and Remzi Seker. 2016. Big Data and virtualization for
     manufacturing cyber-physical systems: A survey of the current status and
     future outlook. Computers in Industry 81 (2016), 128–137.
     https://doi.org/10.1016/j.compind.2016.02.004 Emerging ICT concepts for smart,
     safe and sustainable industrial systems.
 [6] Yan Chen, Feibai Zhu, and Jay Lee. 2013. Data quality evaluation and improvement
     for prognostic modeling using visual assessment based data partitioning
     method. Computers in Industry 64, 3 (2013), 214–225.
     https://doi.org/10.1016/j.compind.2012.10.005
 [7] Daniele Apiletti, Claudia Barberis, Tania Cerquitelli, Alberto Macii, Enrico Macii,
     Massimo Poncino, and Francesco Ventura. 2018. iSTEP: an integrated Self-Tuning
     Engine for Predictive maintenance in Industry 4.0. In 16th IEEE International
     Symposium on Parallel and Distributed Processing with Applications, ISPA-18
     Melbourne, Australia, December 11-13, 2018. 8.
 [8] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. [n. d.]. The elements
     of statistical learning: data mining, inference and prediction (2 ed.). Springer.
 [9] Z. Huang, A. Zhong, and G. Li. 2017. On-Demand Processing for Remote
     Sensing Big Data Analysis. In 2017 IEEE International Symposium on Parallel
     and Distributed Processing with Applications and 2017 IEEE International Conference
     on Ubiquitous Computing and Communications (ISPA/IUCC). 1241–1245.
     https://doi.org/10.1109/ISPA/IUCC.2017.00187
[10] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An
     introduction to statistical learning. Vol. 112. Springer.
[11] V. Jirkovsky, M. Obitko, and V. Marik. 2016. Understanding Data Heterogeneity
     in the Context of Cyber-Physical Systems Integration. IEEE Transactions
     on Industrial Informatics PP, 99 (2016), 1–1.
     https://doi.org/10.1109/TII.2016.2596101
[12] M. Miškuf and I. Zolotová. 2016. Comparison between multi-class classifiers
     and deep learning with focus on industry 4.0. In 2016 Cybernetics Informatics
     (K I). 1–5. https://doi.org/10.1109/CYBERI.2016.7438633
[13] J. Murphree. 2016. Machine learning anomaly detection in large systems. In
     2016 IEEE AUTOTESTCON. 1–9. https://doi.org/10.1109/AUTEST.2016.7589589
[14] Nikolaos Nikolakis, Apostolos Papavasileiou, Konstantinos Dimoulas, Kiriakos
     Bourmpouchakis, and Sotirios Makris. 2018. On a versatile scheduling concept
     of maintenance activities for increased availability of production resources.
     Procedia CIRP 78 (2018), 172–177. https://doi.org/10.1016/j.procir.2018.09.065
     6th CIRP Global Web Conference – Envisaging the future manufacturing,
     design, technologies and systems in innovation era (CIRPe 2018).
[15] M. Niño, J. M. Blanco, and A. Illarramendi. 2015. Business understanding,
     challenges and issues of Big Data Analytics for the servitization of a capital
     equipment manufacturer. In 2015 IEEE International Conference on Big Data
     (Big Data). 1368–1377. https://doi.org/10.1109/BigData.2015.7363897
[16] Jürgen Schmidhuber. 2015. Deep learning in neural networks: An overview.
     Neural Networks 61 (2015), 85–117.
[17] C. Zhang, P. Lim, A. K. Qin, and K. C. Tan. 2016. Multiobjective Deep Belief
     Networks Ensemble for Remaining Useful Life Estimation in Prognostics. IEEE
     Transactions on Neural Networks and Learning Systems PP, 99 (2016), 1–13.
     https://doi.org/10.1109/TNNLS.2016.2582798