A data-driven context-based approach for modelling Resilient Cyber Physical Production Systems (Discussion Paper)

Ada Bagozi¹, Devis Bianchini¹ and Valeria De Antonellis¹

¹ University of Brescia, Dept. of Information Engineering, Via Branze 38, 25123 - Brescia (Italy)

Abstract

In modern Cyber Physical Production Systems (CPPS) workers interact with hybrid networked cyber and engineered physical elements that record data (e.g., using sensors), analyse them using connected services and support decision making, according to the Human-In-the-Loop paradigm. In this paper we present an approach for modelling Resilient Cyber Physical Production Systems (R-CPPS). The approach is conceived as: (i) data-driven, because recovery actions, modelled in a service-oriented architecture, are activated by sensor data measures collected on the CPPS subsystems and the surrounding production environment; (ii) context-based, since recovery services are associated with the steps of the production process as well as with the hierarchical organisation of the CPPS components involved in the recovery actions. The approach provides runtime selection of services, where the sensor data measures are used as service inputs and service outputs are displayed to the operators who supervise the CPPS subsystem on which recovery actions must be performed, enabling fast and effective resilience also in a Human-In-the-Loop scenario. The feasibility of the approach is demonstrated in a food industry case study.

Keywords

Resilient cyber physical production system, context-aware resilience, service-oriented architecture, Human-In-the-Loop

1. Introduction

Cyber Physical Systems (CPS) are hybrid networked cyber and engineered physical elements that record data (e.g., using sensors), analyse them using connected services, influence physical processes and interact with human actors using multi-channel interfaces.
Examples of CPS interacting with humans in industrial production are Cyber Physical Production Systems (CPPS), where workers supervise the operations of industrial work centers, according to the Human-In-the-Loop paradigm [1]. In this paper, the design of Resilient CPPS (R-CPPS) is addressed. Resilience in CPPS is particularly challenging, since it must be ensured both on single work centers and at the shop floor level across connected components. For example, consider the production line shown in Figure 1, which produces biscuits starting from the recipe and ingredients.

SEBD 2021: The 29th Italian Symposium on Advanced Database Systems, September 5-9, 2021, Pizzo Calabro (VV), Italy
a.bagozi@unibs.it (A. Bagozi); devis.bianchini@unibs.it (D. Bianchini); valeria.deantonellis@unibs.it (V. De Antonellis)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

[Figure 1 depicts the process phases (dough preparation, leavening, baking, packaging), the flow from recipe and ingredients to dough, biscuits, cooked biscuits and packages, the machines involved (kneading machine, leavening chamber and shaping, conveyor belt + cooking chamber, packaging machine) and the collected measures (e.g., duration, rpm, temperature, humidity, cooking_time for the machines; env_temp, env_humidity for the production line environment).]
Figure 1: Production process for the food industry case study.

In the process, the dough is prepared by a kneading machine and left to rise in a leavening chamber. Once the biscuits are ready to be baked, they are placed in the oven. The oven is composed of a conveyor belt and the cooking chamber. By regulating the velocity of the belt through the rpm of the rotating engine, it is possible to set the cooking time of the biscuits. The temperature and the humidity of the cooking chamber can be regulated as well.
Finally, other measures can be gathered at the shop floor level, such as the temperature and the humidity of the production line environment.

In modern digital factories work centers are fully connected, therefore changes in one of them may require recovery actions on the others. For example, an anomaly in the rotating engine of the conveyor belt might cause the cookies to burn, and a recovery action can be triggered to modify the temperature of the cooking chamber to compensate for a longer cooking time. Similarly, the effects of environment humidity on the dough entering the oven must be considered. Finally, recovery actions sometimes require an interaction with operators (e.g., the substitution of some parts in the production line should be authorised). Therefore, the effects of recovery services should be visualised only near the involved work centers, in order to give useful insights to the on-field operators who supervise the involved components.

To address the above-mentioned complexity of the domain, we propose a service-oriented approach for modelling R-CPPS, in line with recent approaches for the design of resilient CPS [2]. The approach is conceived as: (i) data-driven, because recovery actions, modelled in a service-oriented architecture, are activated by sensor data measures collected on the CPPS and the surrounding production environment; (ii) context-based, since recovery services are associated with the steps of the production process as well as with the hierarchical organisation of the CPPS components. The approach provides runtime selection of services, where the sensor data measures are used as service inputs and service outputs are displayed to the operators supervising the CPPS subsystem on which recovery actions must be performed, enabling fast and effective resilience also in a Human-In-the-Loop scenario. The approach described in this paper has been presented in detail in [3].

The paper is organised as follows: in Section 2 related work is discussed; Section 3 describes the context model; in Section 4 the recovery services selection procedure is presented; in Section 5 we describe implementation and experimental validation; finally, Section 6 closes the paper.

2. Related Work

Growing interest has recently been devoted to resilience challenges in industrial plants, often related to the notion of self-adaptation, as witnessed by an increasing number of surveys [4, 5]. Resilience/self-adaptation specifically designed for CPPS has been addressed in [6, 7, 8, 9, 10]. Authors in [6] propose a failure prediction tool based on dynamic principal component analysis (DPCA) and gradient boosting decision trees (GBDT), a supervised machine learning algorithm. Authors in [7] address resilience in a micro factory by adopting a digital twin (DT) to perform simulation on the monitored system and a reinforcement learning (RL)-based production control method. In [8, 9] ad-hoc resilience solutions are provided, focusing on single systems to monitor, without considering the effects of resilience across connected components. Compared to these approaches, our solution introduces a context model, apt to relate recovery services with work centers organised in the fully connected hierarchy of smart machines (from connected devices up to the whole production line at shop floor level). The adoption of context-awareness to implement resilience on single CPS has been investigated in the Context-Aware Resilience for Cyber Physical Systems (CAR) project (http://www.msca-car.eu), where resilience patterns have been implemented by combining recovery actions. Authors in [10] propose an approach for resilient CPPS that uses a simulation-based decision support system to automatically select the best recovery action based on KPIs (e.g., Overall Equipment Efficiency) measured on the whole production process.
With respect to [10], our approach uses the context model to explicitly relate different kinds of recovery services to the product that is being created, the involved work centers and the production process phases. Moreover, a continuous evolution of the service ecosystem is realised through the design of new services in case of unsuccessful or missing recovery actions. Authors in [2] share with us the service-oriented viewpoint. With respect to them, we add here the context model and we propose a set of context-driven phases to identify critical conditions and improve the selection of recovery services.

3. Context Model

In Figure 2 we report a simplified schema of the context model to support resilience in CPPS.

[Diagram: entities Context, Product, Process, Environment Parameter, CPPS Component, Service (with Input and Output), Operator and Component Parameter, linked by associations with cardinalities.]
Figure 2: Resilient CPPS context model.

In the proposed model, the Context is described by the Product that is being produced (e.g., a certain type of biscuits), the production Process (e.g., the biscuits baking process) and the Environment Parameters that may influence the production (e.g., the environment temperature and humidity). A Process is associated with Services and is performed by one or more components, which cooperate to successfully complete the production. For example, the biscuits baking process involves the kneading machine to prepare the dough, the leavening chamber to prepare the biscuits and the oven to bake them. Components can be organised hierarchically (according to the hierarchy levels dimension of the RAMI 4.0 reference architectural model, IEC 62264/IEC 61512 standards): for example, the oven is composed of the conveyor belt and the cooking chamber. Services represent recovery actions to be executed on a component or the whole CPPS to ensure resilience.
A component is supervised by at least one Operator and can be monitored and controlled through a set of Component Parameters (e.g., the oven temperature). Both Environment Parameters and Component Parameters are used to monitor the behaviour of a CPPS in a given Context.

A recovery service $S_j$ is associated with a CPPS (or one of its components) and is described as a tuple

$$S_j = \langle n_{S_j}, IN_{S_j}, out_{S_j}, type_{S_j}, CPPS_{S_j} \rangle \qquad (1)$$

where: (i) $n_{S_j}$ is the service name; (ii) $IN_{S_j}$ is the set of input parameters; (iii) $out_{S_j}$ is an optional service output; (iv) $type_{S_j}$ is the service type; (v) $CPPS_{S_j}$ is the component or the whole CPPS associated with the service. Service I/O can be either Component or Environment Parameters.

The flexibility of service-oriented architectures makes it possible to include and dynamically add different types of services. For instance, a recovery service may implement the function that relates one or more input parameters with the output one. We refer to this type of service as “re-configuration”. The following service

setOvenTemperature(ConveyorBelt.rpm) → CookingChamber.temperature

represents a re-configuration service to set the cooking chamber temperature when the conveyor belt rpm changes, to avoid overheating the cookies. When a re-configuration is not an applicable solution (e.g., if the service returns a cooking chamber temperature out of an acceptable range of values), other recovery actions must be applied, such as replacing or repairing the conveyor belt. An example of a “component substitution” service would be the following:

replaceConveyorBeltRotatingEngine(ConveyorBelt.rpm) → void

which has no output parameter to modify. This service is associated with the conveyor belt.

[Diagram: lower and upper warning and error bounds on the parameter measures, delimiting the OK, warning and error ranges, together with a synthesis centroid and its warning radius.]
Figure 3: Warning and error parameter bounds to detect the system status.
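
As an illustration, the service tuple of Eq. (1) and the two example services can be written down as a small data structure. This is a minimal sketch and not the authors' implementation: the class and field names, the choice of "Oven" as the component associated with setOvenTemperature, and the linear rule mapping rpm to temperature are all hypothetical placeholders introduced only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class ServiceType(Enum):
    RECONFIGURATION = "re-configuration"
    COMPONENT_SUBSTITUTION = "component substitution"


@dataclass(frozen=True)
class RecoveryService:
    """Mirrors the tuple <n, IN, out, type, CPPS> of Eq. (1)."""
    name: str                          # n: service name
    inputs: Tuple[str, ...]            # IN: input parameters
    output: Optional[str]              # out: optional output parameter
    service_type: ServiceType          # type: kind of recovery action
    cpps: str                          # component (or whole CPPS) the service acts on
    # Assumption: a re-configuration service carries the function
    # relating its inputs to its output.
    function: Optional[Callable[[Dict[str, float]], float]] = None


def set_oven_temperature(measures: Dict[str, float]) -> float:
    # Hypothetical rule, NOT taken from the paper: a slower belt means a
    # longer cooking time, so the chamber temperature is lowered linearly.
    nominal_rpm, nominal_temp = 60.0, 180.0
    rpm = measures["ConveyorBelt.rpm"]
    return nominal_temp * (0.5 + 0.5 * rpm / nominal_rpm)


set_oven = RecoveryService(
    name="setOvenTemperature",
    inputs=("ConveyorBelt.rpm",),
    output="CookingChamber.temperature",
    service_type=ServiceType.RECONFIGURATION,
    cpps="Oven",  # assumed association
    function=set_oven_temperature,
)

replace_engine = RecoveryService(
    name="replaceConveyorBeltRotatingEngine",
    inputs=("ConveyorBelt.rpm",),
    output=None,  # "void": no output parameter to modify
    service_type=ServiceType.COMPONENT_SUBSTITUTION,
    cpps="ConveyorBelt",
)
```

Keeping the output optional lets both service types share one structure; further attributes such as execution cost or time could be added as extra fields.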
Other service information (e.g., execution cost, time) can be used to guide the automatic selection of the proper recovery actions. The recovery service types considered here are not exhaustive and may be extended [11]. Recovery services can be exposed in different ways, for example as web services, invoked from a local library, or integrated in the administrative shell of work centers according to the RAMI 4.0 specification.

4. Selection of relevant recovery services

Once an anomalous event (corresponding to a critical condition) is detected on one of the CPPS components, the event is used to identify recovery services to be applied on the involved component or on connected ones. Recovery services are automatically identified by inspecting their inputs. In particular, a recovery service is relevant if one of its input parameters has been classified in the error status (reactive resilience) or in the warning status (proactive resilience), as summarised in Figure 3. To cope with the volume of data streams collected from the monitored CPPS, and to avoid misleading anomaly detection due to noise and false outliers that may affect single measures, anomaly detection is performed by applying the IDEAaS approach described in [12]. Roughly speaking, a summarised representation of the collected measures, organised in so-called syntheses, is incrementally built. Each synthesis contains measures collected when the observed system is operating in the same working conditions.

Moreover, the following conditions must hold on the relevant recovery service: (a) if the service type is “re-configuration”, the value of its output parameter must not exceed any parameter bound; (b) if the service type is “component substitution”, an alternative machinery or component ready to be used in substitution must be available and associated with the service. As an example, the setOvenTemperature service is relevant if an anomaly has been detected on the values of the rotating engine rpm in the conveyor belt.
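
The relevance check described in this section can be sketched as follows. This is only an illustrative sketch under stated assumptions: fixed bounds applied to a single measure stand in for the synthesis-based detection of IDEAaS [12], which is not reproduced here; condition (a) is interpreted as requiring the computed output to fall in the OK range; and all names, numeric bounds and the rpm-to-temperature rule are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class Status(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Bounds:
    """Lower/upper warning and error bounds of one parameter (cf. Figure 3)."""
    lower_error: float
    lower_warning: float
    upper_warning: float
    upper_error: float

    def classify(self, value: float) -> Status:
        if value < self.lower_error or value > self.upper_error:
            return Status.ERROR
        if value < self.lower_warning or value > self.upper_warning:
            return Status.WARNING
        return Status.OK


@dataclass(frozen=True)
class Service:
    name: str
    inputs: Tuple[str, ...]
    output: Optional[str]
    service_type: str  # "re-configuration" or "component substitution"
    function: Optional[Callable[[Dict[str, float]], float]] = None
    spare_available: bool = False  # assumed flag for condition (b)


def is_relevant(service: Service, measures: Dict[str, float],
                bounds: Dict[str, Bounds]) -> bool:
    # At least one input parameter must be in warning or error status.
    critical = any(
        bounds[p].classify(measures[p]) is not Status.OK
        for p in service.inputs
        if p in measures and p in bounds
    )
    if not critical:
        return False
    if service.service_type == "re-configuration":
        # Condition (a): the computed output must stay within its bounds.
        value = service.function(measures)
        return bounds[service.output].classify(value) is Status.OK
    if service.service_type == "component substitution":
        # Condition (b): a replacement component must be available.
        return service.spare_available
    return False


# Hypothetical bounds and services for the oven example.
bounds = {
    "ConveyorBelt.rpm": Bounds(40.0, 50.0, 70.0, 80.0),
    "CookingChamber.temperature": Bounds(150.0, 160.0, 200.0, 210.0),
}
set_oven = Service(
    "setOvenTemperature", ("ConveyorBelt.rpm",), "CookingChamber.temperature",
    "re-configuration",
    function=lambda m: 180.0 * (0.5 + 0.5 * m["ConveyorBelt.rpm"] / 60.0),
)
replace_engine = Service(
    "replaceConveyorBeltRotatingEngine", ("ConveyorBelt.rpm",), None,
    "component substitution", spare_available=True,
)
```

With these placeholder values, a slightly low rpm (warning status) makes the re-configuration service relevant, whereas a very low rpm (error status) yields an out-of-range temperature, so only the substitution service remains applicable.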
Since the type of the setOvenTemperature service is “re-configuration”, the value of the service output resulting from its automatic execution must be compliant with the parameter bounds of the cooking chamber temperature.

The relevant service is then automatically executed to operate on the component or the whole CPPS (for re-configuration services) or to proceed with a physical substitution of the affected part (for component substitution services). In the latter case, the system presents all the necessary information to guide the maintenance operator during the substitution. Another feature of the approach is that the information about the recovery actions to undertake, resulting from the execution of recovery services, is visualised near the involved work center. This provides insights to the operators who supervise those components and avoids flooding other operators with information that may hamper their working efficiency. A prototype operator interface is detailed in [3].

5. Implementation and preliminary evaluation

The approach described in this paper has been integrated with the IDEAaS anomaly detection module and the resulting architecture is sketched in Figure 4.

Figure 4: Approach architecture.

During anomaly detection, the Context Manager is invoked in order to contextualise the incoming data. To this purpose, the Context Manager provides the following information: (i) an identifier for the context; (ii) a set of parameters, either Environment or Component parameters, to be analysed; (iii) the observed CPPS; (iv) the product that is being produced; (v) the running process. Such information is extracted from the Context Model database. The measures collected in the context are summarised as syntheses by applying IDEAaS data summarisation techniques. Furthermore, syntheses are processed in order to detect anomalies and are stored in the Data Syntheses database.
Summarised data are visualised: (a) on the Designer GUI, to let the designer monitor the overall evolution of the CPPS; (b) on the Edge Computing Device of the involved component (operator interface), to let the on-field operator better understand the behaviour of the component. Moreover, when anomalous conditions are detected, the Context Manager is notified with the identifier of the context and the list of critical parameters on which the anomaly occurred, together with their measures. The Context Manager searches for relevant recovery services associated with the component in the context. Once relevant recovery services have been identified, the Context Manager launches their execution by interacting with the Service Manager, which is responsible for registering services in the Service Repository and for executing them. The result of the service execution is sent to the operator supervising the component.

A proof-of-concept validation of the approach to demonstrate its applicability is being performed. In particular, the processing time required to promptly detect anomalies and activate recovery services (a potential bottleneck for the whole approach) is investigated. We ran experiments on a MacBook Pro Retina, with an Intel Core i7-6700HQ processor at 2.60 GHz, 4 cores and 16 GB of RAM. For measures of parameters collected every 200 ms, the average response time per measure necessary to apply data summarisation and anomaly detection is within 0.12 ms. We also quantified the capability to detect anomalies on the collected measures using the Pearson Correlation Coefficient (PCC) ∈ [−1, +1], which estimates the correlation between the real variations and the detected ones. In the experiment, the best PCC value is higher than 0.85, which represents an acceptable correlation.

6. Conclusions

In this paper, we addressed resilience in Cyber Physical Production Systems, where recovery services are activated by sensor data measures (data-driven) and are selected according to a context model, which relates services with the steps of the production process as well as with the hierarchical organisation of the involved CPPS components. A validation of the approach is being performed on a real dataset, which must be appropriately pre-processed (e.g., sensitive information must be removed or anonymised) before making it available to the community as a benchmark for future work comparison. Future efforts will be devoted to the improvement of service selection criteria, for example by defining a cost model and using simulation-based modules to predict the effects of recovery actions on the production process, like the one described in [10]. Finally, modelling of other kinds of recovery services is being considered.

References

[1] D. Nunes, J. Silva, F. Boavida, A Practical Introduction to Human-in-the-Loop Cyber-Physical Systems, Wiley IEEE Press, 2018.
[2] N. Bicocchi, G. Cabri, F. Mandreoli, M. Mecella, Dynamic digital factories for agile supply chains: An architectural approach, Industrial Information Integration 15 (2019) 111–121.
[3] A. Bagozi, D. Bianchini, V. De Antonellis, Designing Context-Based Services for Resilient Cyber Physical Production Systems, in: 21st Int. Conf. on Web Information Systems Engineering (WISE), 2020, pp. 474–488.
[4] D. Ratasich, F. Khalid, F. Geissler, R. Grosu, M. Shafique, E. Bartocci, A Roadmap Toward the Resilient Internet of Things for Cyber-Physical Systems, IEEE Access 7 (2019) 13260–13283.
[5] J. Moura, D. Hutchison, Game Theory for Multi-Access Edge Computing: Survey, Use Cases, and Future Trends, IEEE Communication Surveys and Tutorials 21 (2019) 260–288.
[6] Y. Zhang, X. Beudaert, J. Argandoña, S. Ratchev, J. Munoa, A CPPS based on GBDT for predicting failure events in milling, The International Journal of Advanced Manufacturing Technology 111 (2020) 341–357.
[7] K. T. Park, Y. H. Son, S. W. Ko, S. D. Noh, Digital twin and reinforcement learning-based resilient production control for micro smart factory, Applied Sciences 11 (2021).
[8] R. Barenji, A. Barenji, M. Hashemipour, A multi-agent RFID-enabled distributed control system for a flexible manufacturing shop, Advanced Manufacturing Technology 71 (2014) 1773–1791.
[9] B. Vogel-Heuser, C. Diedrich, D. Pantförder, P. Göhner, Coupling heterogeneous production systems by a multi-agent based cyber-physical production system, in: Proc. of 12th IEEE Int. Conf. on Industrial Informatics (INDIN), 2014, pp. 713–719.
[10] N. Galaske, A. R, Disruption Management for Resilient Processes in Cyber-Physical Production Systems, Procedia CIRP 50 (2016) 442–447.
[11] G. Pumpuni-Lenss, T. Blackburn, A. Garstenauer, Resilience in Complex Systems: An Agent-Based Approach, Systems Engineering 20 (2017) 158–172.
[12] A. Bagozi, D. Bianchini, V. De Antonellis, M. Garda, A. Marini, A Relevance-based approach for Big Data Exploration, Future Generation Computer Systems 101 (2019) 51–69.