A data-driven context-based approach for modelling
Resilient Cyber Physical Production Systems
(Discussion Paper)

Ada Bagozi, Devis Bianchini and Valeria De Antonellis
University of Brescia, Dept. of Information Engineering, Via Branze 38, 25123 - Brescia (Italy)


Abstract
In modern Cyber Physical Production Systems (CPPS) workers interact with hybrid networked cyber
and engineered physical elements that record data (e.g., using sensors), analyse them using connected
services and support decision making, according to the Human-In-the-Loop paradigm. In this paper we
present an approach for modelling Resilient Cyber Physical Production Systems (R-CPPS). The approach
is conceived as: (i) data-driven, because recovery actions, modelled in a service-oriented architecture,
are activated by sensor data measures collected on the CPPS subsystems and the surrounding production
environment; (ii) context-based, since recovery services are associated with the steps of the production
process as well as with the hierarchical organisation of the CPPS components involved in the recovery
actions. The approach provides runtime selection of services, where the sensor data measures are used
as service inputs and service outputs are displayed to the operators who supervise the CPPS subsystem
on which recovery actions must be performed, enabling fast and effective resilience also in a
Human-In-the-Loop scenario. The feasibility of the approach is demonstrated in a food industry case study.

Keywords
Resilient cyber physical production system, context-aware resilience, service-oriented architecture,
Human-In-the-Loop


1. Introduction
Cyber Physical Systems (CPS) are hybrid networked cyber and engineered physical elements
that record data (e.g., using sensors), analyse them using connected services, influence physical
processes and interact with human actors using multi-channel interfaces. Examples of CPS
interacting with humans in industrial production are Cyber Physical Production Systems (CPPS),
where workers supervise the operations of industrial work centers, according to the Human-
In-the-Loop paradigm [1]. In this paper, the design of Resilient CPPS (R-CPPS) is addressed.
Resilience in CPPS is even more challenging, since it must be ensured both on single work
centers and at the shop floor level across connected components. For example, consider the
production line shown in Figure 1, which produces biscuits starting from the recipe and ingredients.

SEBD 2021: The 29th Italian Symposium on Advanced Database Systems, September 5-9, 2021, Pizzo Calabro (VV),
Italy
a.bagozi@unibs.it (A. Bagozi); devis.bianchini@unibs.it (D. Bianchini); valeria.deantonellis@unibs.it
(V. De Antonellis)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

[Figure 1 diagram: the production line takes Recipe and Ingredients as input and runs four steps:
Dough preparation (kneading machine, producing Dough), Leavening (leavening chamber and shaping,
producing Biscuits), Baking (conveyor belt + cooking chamber, producing Cooked biscuits) and
Packaging (packaging machine, producing Packages). Measures: leavening chamber (duration, …),
conveyor belt (rpm, …), cooking chamber (temperature, humidity, cooking_time, …), production line
environment (env_temp, env_humidity, …).]


Figure 1: Production process for the food industry case study.


In the process, the dough is prepared by a kneading machine and left to rise in a leavening chamber.
Once the biscuits are ready to be baked, they are placed in the oven. The oven is composed of a
conveyor belt and a cooking chamber. By regulating the speed of the belt through the rpm
of the rotating engine, it is possible to set the cooking time of the biscuits. The temperature
and the humidity of the cooking chamber can be regulated as well. Finally, other measures can
be gathered at the shop floor level, such as the temperature and the humidity of the production
line environment.
   In modern digital factories, work centers are fully connected, therefore changes in one of
them may require recovery actions on the others. For example, an anomaly on the rotating
engine of the conveyor belt might cause the biscuits to burn, and a possible recovery action
can be triggered to modify the temperature of the cooking chamber to compensate for a longer
cooking time. Similarly, the effects of environment humidity on the dough entering the oven must be
considered. Finally, recovery actions sometimes require an interaction with operators (e.g., the
substitution of some parts in the production line should be authorised). Therefore, the effects of
recovery services should be visualised near the involved work centers only, in order to give
useful insights to on-field operators who supervise the involved components.
   To address the above-mentioned complexity of the domain, we propose a service-oriented
approach for modelling R-CPPS, in line with recent approaches for the design of resilient
CPS [2]. The approach is conceived as: (i) data-driven, because recovery actions, modelled in
a service-oriented architecture, are activated by sensor data measures collected on the CPPS
and the surrounding production environment; (ii) context-based, since recovery services are
associated with the steps of the production process as well as with the hierarchical organisation
of the CPPS components. The approach provides runtime selection of services, where the sensor
data measures are used as service inputs and service outputs are displayed to the operators
supervising the CPPS subsystem on which recovery actions must be performed, enabling fast
and effective resilience also in a Human-In-the-Loop scenario. The approach described in this
paper has been presented in detail in [3].
  The paper is organised as follows: in Section 2 related work is discussed; Section 3 describes
the context model; in Section 4 the recovery services selection procedure is presented; in
Section 5 we describe implementation and experimental validation; finally, Section 6 closes the
paper.


2. Related Work
Growing interest has recently been devoted to resilience challenges in industrial plants,
often related to the notion of self-adaptation, as witnessed by an increasing number of
surveys [4, 5].
   Resilience/self-adaptation specifically designed for CPPS has been addressed in [6, 7, 8, 9, 10].
The authors of [6] propose a failure prediction tool based on dynamic principal component
analysis (DPCA) and gradient boosting decision trees (GBDT), a supervised machine learning
algorithm. The authors of [7] address resilience in a micro factory by adopting a digital twin (DT)
to perform simulation on the monitored system and a reinforcement learning (RL)-based production
control method. In [8, 9] ad-hoc resilience solutions are provided, focusing on single systems to
monitor, without considering the effects of resilience across connected components. Compared
to these approaches, our solution introduces a context model, apt to relate recovery services with
work centers organised in the fully connected hierarchy of smart machines (from connected
devices up to the whole production line at shop floor level). The adoption of context-awareness
to implement resilience on single CPS has been investigated in the Context-Aware Resilience for
Cyber Physical Systems (CAR) project (http://www.msca-car.eu), where resilience patterns
have been implemented by combining recovery actions. The authors of [10] propose an approach for
resilient CPPS that uses a simulation-based decision support system to automatically select the
best recovery action based on KPIs (e.g., Overall Equipment Effectiveness) measured on the whole
production process. With respect to [10], our approach uses the context model to explicitly
relate different kinds of recovery services to the product that is being created, the involved
work centers and production process phases. Moreover, a continuous evolution of the service
ecosystem is realised through the design of new services in case of unsuccessful or missing
recovery actions. The authors of [2] share with us the service-oriented viewpoint. With respect to
them, we add here the context model and we propose a set of context-driven phases to identify
critical conditions and improve the selection of recovery services.


3. Context Model
In Figure 2 we report a simplified schema of the context model to support resilience in CPPS. In
the proposed model, the Context is described by the Product that is being produced (e.g., a certain
type of biscuits), the production Process (e.g., biscuits baking process) and the Environment
Parameters that may influence the production (e.g., the environment temperature and humidity).
A Process is associated with Services and is performed by one or more components, which cooperate
to successfully complete the production. For example, the biscuits baking process involves the
kneading machine to prepare the dough, the leavening chamber to prepare the biscuits and the oven
to bake them. Components can be organised hierarchically (according to the hierarchy levels
dimension of the RAMI 4.0 reference architectural model, IEC 62264/IEC 61512 standards): for
example, the oven is composed of the conveyor belt and the cooking chamber. Services represent
recovery actions to be executed on a component or the whole CPPS to ensure resilience. A
component is supervised by at least one Operator and can be monitored and controlled through
a set of Component Parameters (e.g., the oven temperature). Both Environment Parameters and
Component Parameters are used to monitor the behaviour of a CPPS in a given Context.

[Figure 2 diagram: entities Context, Product, Process, Environment Parameter, CPPS, Components,
Operator, Component Parameter, Service, Input and Output, linked by associations with
multiplicities.]

Figure 2: Resilient CPPS context model.

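The entities of the context model can be sketched as plain data structures. The following is a minimal illustration, not the schema used by the authors: the class and field names (`Component`, `Context`, `walk`, `monitored_parameters`) and the operator identifier are assumptions introduced here, while the component and parameter names come from the case study.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Component:
    """A CPPS component; nested children model the component hierarchy."""
    name: str
    parameters: List[str] = field(default_factory=list)   # Component Parameters
    operators: List[str] = field(default_factory=list)    # supervising Operators
    children: List["Component"] = field(default_factory=list)

    def walk(self) -> Iterator["Component"]:
        """Yield this component and all nested sub-components, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Context:
    """Product, Process and Environment Parameters, plus involved components."""
    product: str
    process: str
    environment_parameters: List[str]
    components: List[Component]

    def monitored_parameters(self) -> List[str]:
        """Environment parameters followed by qualified component parameters."""
        names = list(self.environment_parameters)
        for root in self.components:
            for comp in root.walk():
                names.extend(f"{comp.name}.{p}" for p in comp.parameters)
        return names


# The oven of the case study, composed of conveyor belt and cooking chamber.
oven = Component(
    "Oven",
    operators=["operator_1"],  # hypothetical operator identifier
    children=[
        Component("ConveyorBelt", parameters=["rpm"]),
        Component("CookingChamber",
                  parameters=["temperature", "humidity", "cooking_time"]),
    ],
)
ctx = Context(
    product="biscuits",
    process="biscuits baking process",
    environment_parameters=["env_temp", "env_humidity"],
    components=[oven],
)
print(ctx.monitored_parameters())
```

Qualified names such as `ConveyorBelt.rpm` follow the notation the paper uses for service inputs and outputs, so the same strings can identify parameters when services are selected.
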
   A recovery service $S_j$ is associated with a CPPS (or one of its components) and is described
as a tuple
                          $S_j = \langle n_{S_j}, IN_{S_j}, out_{S_j}, type_{S_j}, CPPS_{S_j} \rangle$                    (1)
where: (i) $n_{S_j}$ is the service name; (ii) $IN_{S_j}$ is the set of input parameters; (iii) $out_{S_j}$ is an
optional service output; (iv) $type_{S_j}$ is the service type; (v) $CPPS_{S_j}$ is the component or the
whole CPPS associated with the service. Service I/O can be either Component or Environment
Parameters.
  The flexibility of service-oriented architectures makes it possible to include and dynamically add
different types of services. For instance, a recovery service may implement the function that relates
one or more input parameters with the output one. We refer to this type of service as “re-configuration”.
The following service
   setOvenTemperature(ConveyorBelt.rpm) → CookingChamber.temperature

represents a re-configuration service that sets the cooking chamber temperature when the conveyor
belt rpm changes, to avoid overheating the biscuits. When a re-configuration is not an applicable
solution (e.g., if the service returns a cooking chamber temperature outside the acceptable range
of values), other recovery actions must be applied, such as replacing or repairing the conveyor
belt.

[Figure 3 diagram: parameter measures are classified as OK, warning or error. Along the parameter
axis the zones are, in order: error, lower error bound, warning, lower warning bound, OK, upper
warning bound, warning, upper error bound, error. The same zones are also drawn around a synthesis
centroid with its radius.]

Figure 3: Warning and error parameter bounds to detect the system status.

An example of a “component substitution” service is the following:
   replaceConveyorBeltRotatingEngine(ConveyorBelt.rpm) → void

which has no output parameter to modify. This service is associated with the conveyor belt. Other
service information (e.g., execution cost, time) can be used to guide the automatic selection of
the proper recovery actions. The recovery service types considered here are not
exhaustive and may be extended [11]. Recovery services can be exposed in different ways, for
example as web services, invoked from a local library, or integrated in the administration shell of
work centers according to the RAMI 4.0 specification.
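The tuple in (1) and the two example services of this section can be written down as a small record type. This is only a sketch: the field names are chosen here to mirror the tuple, and the association of setOvenTemperature with the oven is an assumption, since the text states the associated component only for replaceConveyorBeltRotatingEngine.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class RecoveryService:
    """One recovery service, mirroring the five elements of tuple (1)."""
    name: str                # service name
    inputs: FrozenSet[str]   # set of input parameters
    output: Optional[str]    # optional output parameter (None if absent)
    type: str                # service type
    cpps: str                # associated component or whole CPPS


set_oven_temperature = RecoveryService(
    name="setOvenTemperature",
    inputs=frozenset({"ConveyorBelt.rpm"}),
    output="CookingChamber.temperature",
    type="re-configuration",
    cpps="Oven",  # assumption: the text does not state this association
)

replace_engine = RecoveryService(
    name="replaceConveyorBeltRotatingEngine",
    inputs=frozenset({"ConveyorBelt.rpm"}),
    output=None,  # "void": no output parameter to modify
    type="component substitution",
    cpps="ConveyorBelt",
)


def services_reading(parameter: str,
                     services: List[RecoveryService]) -> List[str]:
    """Names of the services that take the given parameter as input."""
    return [s.name for s in services if parameter in s.inputs]


print(services_reading("ConveyorBelt.rpm",
                       [set_oven_temperature, replace_engine]))
```

Additional service information mentioned in the text, such as execution cost or time, could be added as further optional fields without changing the five core elements.
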


4. Selection of relevant recovery services
Once an anomalous event (corresponding to a critical condition) is detected on one of the
CPPS components, the event is used to identify recovery services to be applied on the involved
component or connected ones. Recovery services are automatically identified by inspecting
their inputs. In particular, a recovery service is relevant if one of its input parameters has
been classified in the error (reactive resilience) or warning status (proactive resilience), as
summarised in Figure 3. In order to cope with the volume of data streams collected from monitored
CPPS and to avoid misleading anomaly detection due to noise and false outliers, which may affect
single measures, anomaly detection is performed by applying the IDEAaS approach described
in [12]. Roughly speaking, summarised representations of the collected measures, called syntheses,
are incrementally built. Each synthesis contains measures collected when the observed system
is operating in the same working conditions. Moreover, the following conditions must hold
on the relevant recovery service: (a) if the service type is “re-configuration”, the value of its
output parameters must not exceed any parameter bound; (b) if the service type is “component
substitution”, an alternative machinery or component ready to be used in substitution must be
available and associated with the service.
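The selection rule above can be sketched as follows. This is a simplified illustration under several assumptions: statuses are computed on single measures rather than on IDEAaS syntheses, values lying exactly on a bound are treated as belonging to the milder status, condition (a) is read as "the computed output must fall in the OK range", and all bounds and the linear rpm-to-temperature relation are invented numbers used only to make the example run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Bounds:
    """Warning and error bounds of one parameter (see Figure 3)."""
    lower_error: float
    lower_warning: float
    upper_warning: float
    upper_error: float

    def status(self, value: float) -> str:
        if value < self.lower_error or value > self.upper_error:
            return "error"
        if value < self.lower_warning or value > self.upper_warning:
            return "warning"
        return "ok"


@dataclass(frozen=True)
class Service:
    name: str
    inputs: Tuple[str, ...]
    type: str
    output: Optional[str] = None
    compute: Optional[Callable[[Dict[str, float]], float]] = None


def select_relevant(services: List[Service], measures: Dict[str, float],
                    bounds: Dict[str, Bounds], spare_ready: Set[str]) -> List[str]:
    """Return the names of the services that are relevant and applicable."""
    critical = {p for p, v in measures.items() if bounds[p].status(v) != "ok"}
    selected = []
    for s in services:
        if not critical.intersection(s.inputs):
            continue  # no input parameter in warning or error status
        if s.type == "re-configuration":
            if s.compute is None or s.output is None:
                continue
            # Condition (a): the computed output must respect the bounds.
            if bounds[s.output].status(s.compute(measures)) != "ok":
                continue
        elif s.type == "component substitution":
            # Condition (b): a replacement part must be available.
            if s.name not in spare_ready:
                continue
        selected.append(s.name)
    return selected


# Illustrative bounds and a made-up linear relation between rpm and temperature.
BOUNDS = {
    "ConveyorBelt.rpm": Bounds(5.0, 8.0, 12.0, 15.0),
    "CookingChamber.temperature": Bounds(150.0, 160.0, 200.0, 210.0),
}
SERVICES = [
    Service("setOvenTemperature", ("ConveyorBelt.rpm",), "re-configuration",
            output="CookingChamber.temperature",
            compute=lambda m: 180.0 - 5.0 * (10.0 - m["ConveyorBelt.rpm"])),
    Service("replaceConveyorBeltRotatingEngine", ("ConveyorBelt.rpm",),
            "component substitution"),
]

print(select_relevant(SERVICES, {"ConveyorBelt.rpm": 7.0}, BOUNDS, set()))
```

With these invented numbers, an rpm of 7.0 is in warning status and the re-configuration yields a temperature inside the OK range, so only setOvenTemperature is selected. An rpm of 4.0 would push the computed temperature out of the OK range, leaving the substitution service as the only candidate, provided a spare engine is registered as available.
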
   As an example, the setOvenTemperature service is relevant if an anomaly has been detected
on the values of rotating engine rpm in the conveyor belt. Since the service type in this case
is “re-configuration”, the value of the service output resulting from its automatic execution
must be compliant with the parameter bounds of the cooking chamber temperature. The relevant
service is then automatically executed to operate on the component or the whole CPPS (for
re-configuration services) or to proceed with a physical substitution of the affected part (for
component substitution services). The system will present all the necessary information to
guide the maintenance operator during the substitution. In fact, we remark that another feature
of the approach is that the information about the recovery actions to undertake, resulting from
the execution of recovery services, is visualised near the involved work center, providing insights to
the operators who supervise those components and avoiding information flooding towards operators
that may hamper their working efficiency. A prototype operator interface is detailed in [3].

Figure 4: Approach architecture.


5. Implementation and preliminary evaluation
The approach described in this paper has been integrated with the IDEAaS anomaly detection
module and the resulting architecture is sketched in Figure 4. During anomaly detection, the
Context Manager is invoked in order to contextualise the incoming data. To this end, the
Context Manager provides the following information: (i) an identifier for the context; (ii) a
set of parameters, either Environment or Component parameters, to be analysed; (iii) the observed
CPPS; (iv) the product that is being produced; (v) the running process. Such information is
extracted from the Context Model database.
   Collected measures in the context are summarised as syntheses by applying IDEAaS
data summarisation techniques. Furthermore, syntheses are processed in order to detect anomalies
and are stored in the Data Syntheses database. Summarised data are visualised: (a) on
the Designer GUI, to let the designer monitor the overall evolution of the CPPS; (b) on the
Edge Computing Device of the involved component (operator interface), to let the on-field
operator better understand the behaviour of the component. Moreover, when anomalous
conditions are detected, the Context Manager is notified with the identifier of the context and
the list of critical parameters on which the anomaly occurred, together with their measures. The
Context Manager then searches for relevant recovery services, associated with the component
in the context. Once relevant recovery services have been identified, the Context Manager
launches the execution of the services by interacting with the Service Manager, which is
responsible for the registration of services in the Service Repository and for their execution. The
result of the services execution is sent to the operator supervising the component.
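The interaction between the Context Manager and the Service Manager described above can be sketched as two small classes. The module names come from the architecture, but every method name, the in-memory repository, the context dictionary layout and the `notify` callback are hypothetical choices made here for illustration; the actual interfaces are not given in the text.

```python
from typing import Callable, Dict, List, Tuple

Measures = Dict[str, float]
ServiceEntry = Tuple[str, Tuple[str, ...], Callable[[Measures], str]]


class ServiceManager:
    """Registers services in an in-memory repository and executes them."""

    def __init__(self) -> None:
        self._repository: Dict[str, ServiceEntry] = {}

    def register(self, name: str, component: str, inputs: List[str],
                 run: Callable[[Measures], str]) -> None:
        self._repository[name] = (component, tuple(inputs), run)

    def services_for(self, component: str) -> List[Tuple[str, Tuple[str, ...]]]:
        return [(name, inputs)
                for name, (comp, inputs, _) in self._repository.items()
                if comp == component]

    def execute(self, name: str, measures: Measures) -> str:
        return self._repository[name][2](measures)


class ContextManager:
    """Reacts to anomaly notifications for a known context."""

    def __init__(self, services: ServiceManager,
                 contexts: Dict[str, Dict[str, str]],
                 notify: Callable[[str, str], None]) -> None:
        self._services = services
        self._contexts = contexts
        self._notify = notify

    def on_anomaly(self, context_id: str, critical: Measures) -> List[str]:
        """Run the services whose inputs include a critical parameter."""
        ctx = self._contexts[context_id]
        executed = []
        for name, inputs in self._services.services_for(ctx["component"]):
            if set(inputs) & set(critical):
                result = self._services.execute(name, critical)
                # The result goes only to the operator of this component.
                self._notify(ctx["operator"], f"{name}: {result}")
                executed.append(name)
        return executed


inbox: List[Tuple[str, str]] = []
manager = ServiceManager()
manager.register(
    "setOvenTemperature", "Oven", ["ConveyorBelt.rpm"],
    lambda m: f"adjust CookingChamber.temperature for rpm={m['ConveyorBelt.rpm']}",
)
contexts = {"ctx-1": {"component": "Oven", "operator": "operator_1"}}
context_manager = ContextManager(manager, contexts,
                                 lambda op, msg: inbox.append((op, msg)))
context_manager.on_anomaly("ctx-1", {"ConveyorBelt.rpm": 7.0})
print(inbox)
```

Routing the result through a per-operator callback reflects the requirement that recovery information reaches only the operator supervising the affected component, rather than being broadcast to everyone.
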
   A proof-of-concept validation of the approach to demonstrate its applicability is being performed.
In particular, the processing time required to promptly detect anomalies and activate
recovery services (a potential bottleneck for the whole approach) is investigated. We
ran experiments on a MacBook Pro Retina, with an Intel Core i7-6700HQ processor at 2.60
GHz, 4 cores and 16 GB of RAM. For measures of parameters collected every 200 ms, the average
response time per measure necessary to apply data summarisation and anomaly detection is within 0.12
ms. We also quantified the capability to detect anomalies on the collected measures using the
Pearson Correlation Coefficient (PCC) ∈ [−1, +1], which estimates the correlation between the
real variations and the detected ones. In the experiment, the best PCC value is higher than 0.85,
which represents an acceptable correlation.
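For reference, the PCC between two equally long series can be computed as below. This is the standard sample formula, not the evaluation code used in the experiments, and the two series in the example are invented values standing in for real and detected variations.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson Correlation Coefficient of two series, in [-1, +1]."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("PCC is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Invented example: 1 marks a variation, 0 marks normal behaviour.
real_variations = [0, 0, 1, 1, 0]
detected_variations = [0, 0, 1, 0, 0]
print(pearson(real_variations, detected_variations))
```

A value close to +1 means that detected variations follow the real ones closely, a value near 0 means no linear relation, and a negative value means the detector tends to react when no real variation occurs.
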


6. Conclusions
In this paper, we addressed resilience in Cyber Physical Production Systems, where recovery
services are activated by sensor data measures (data-driven) and are selected according to a
context model, which relates services with the steps of the production process as well as with
the hierarchical organisation of the involved CPPS components. A validation of the approach
is being performed on a real dataset, which must be appropriately pre-processed (e.g., sensitive
information must be removed or anonymised) before making it available to the community as a
benchmark for future work comparison. Future efforts will be devoted to the improvement of
service selection criteria, for example defining a cost model and using simulation-based modules
to predict the effects of recovery actions on the production process, like the one described
in [10]. Finally, modelling of other kinds of recovery services is being considered.


References
 [1] D. Nunes, J. Silva, F. Boavida, A Practical Introduction to Human-in-the-Loop Cyber-Physical
     Systems, Wiley IEEE Press, 2018.
 [2] N. Bicocchi, G. Cabri, F. Mandreoli, M. Mecella, Dynamic digital factories for agile supply
     chains: An architectural approach, Journal of Industrial Information Integration 15 (2019)
     111–121.
 [3] A. Bagozi, D. Bianchini, V. De Antonellis, Designing Context-Based Services for Resilient
     Cyber Physical Production Systems, in: 21st Int. Conf. on Web Information Systems
     Engineering (WISE), 2020, pp. 474–488.
 [4] D. Ratasich, F. Khalid, F. Geissler, R. Grosu, M. Shafique, E. Bartocci, A Roadmap Toward the
     Resilient Internet of Things for Cyber-Physical Systems, IEEE Access 7 (2019) 13260–13283.
 [5] J. Moura, D. Hutchison, Game Theory for Multi-Access Edge Computing: Survey, Use
     Cases, and Future Trends, IEEE Communications Surveys & Tutorials 21 (2019) 260–288.
 [6] Y. Zhang, X. Beudaert, J. Argandoña, S. Ratchev, J. Munoa, A CPPS based on GBDT for
     predicting failure events in milling, The International Journal of Advanced Manufacturing
     Technology 111 (2020) 341–357.
 [7] K. T. Park, Y. H. Son, S. W. Ko, S. D. Noh, Digital twin and reinforcement learning-based
     resilient production control for micro smart factory, Applied Sciences 11 (2021).
 [8] R. Barenji, A. Barenji, M. Hashemipour, A multi-agent RFID-enabled distributed control
     system for a flexible manufacturing shop, The International Journal of Advanced
     Manufacturing Technology 71 (2014) 1773–1791.
 [9] B. Vogel-Heuser, C. Diedrich, D. Pantförder, P. Göhner, Coupling heterogeneous production
     systems by a multi-agent based cyber-physical production system, in: Proc. of 12th
     IEEE Int. Conf. on Industrial Informatics (INDIN), 2014, pp. 713–719.
[10] N. Galaske, A. R, Disruption Management for Resilient Processes in Cyber-Physical
     Production Systems, Procedia CIRP 50 (2016) 442–447.
[11] G. Pumpuni-Lenss, T. Blackburn, A. Garstenauer, Resilience in Complex Systems: An
     Agent-Based Approach, Systems Engineering 20 (2017) 158–172.
[12] A. Bagozi, D. Bianchini, V. De Antonellis, M. Garda, A. Marini, A Relevance-based approach
     for Big Data Exploration, Future Generation Computer Systems 101 (2019) 51–69.