iPRODICT – Intelligent Process Prediction based on Big Data Analytics

Nijat Mehdiyev1*, Andreas Emrich1, Björn Stahmer2, Peter Fettke1, Peter Loos1

1 Institute for Information Systems (IWi) at German Research Center for Artificial Intelligence (DFKI) and Saarland University, Saarbruecken, Germany
{nijat.mehdiyev;andreas.emrich;peter.fettke;peter.loos}@iwi.dfki.de
2 Saarstahl AG, Völklingen, Germany
bjoern.stahmer@saarstahl.com

Abstract. The major purpose of the iPRODICT research project is to operationalize predictive and prescriptive analytics driven by the industrial internet of things by embedding them into operational processes. In particular, within an interdisciplinary team of researchers and industry experts, we investigate the integration of diverse technologies to enable decision making driven by real-time sensor data for process improvement and optimization in the process industry. The case study concentrates on the adaptation and optimization of both manufacturing and business processes by proactively analyzing the quality of semi-finished steel products based on sensor data obtained from the continuous casting process and on the chemical properties of the steel. In this paper, we discuss three use cases specific to business process management in the sensor-driven process industry, namely (i) business process instance adaptation, (ii) business process instance-to-instance adaptation and optimization and (iii) business process instance-to-model adaptation. Furthermore, we briefly discuss the components of the proposed predictive enterprise solution and their dependencies, and provide insight into the challenges and lessons learnt during the various stages of the case study.
Keywords: Predictive Analytics · Process Adaptation and Optimization · Process Industry · Sensor-driven Business Process Management

M. Brambilla, T. Hildebrandt (Eds.): BPMN 2017 Industrial Track Proceedings, CEUR-WS.org, 2017. Copyright © 2017 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1 Introduction

1.1 Operationalizing and Embedding Analytics to Business Processes

Since firms offer similar products and use identical technologies, high-performance business processes are one of the last points of differentiation [1]. The dynamic capability of managing business processes proactively requires embedding the insights gained from descriptive, predictive and prescriptive analysis into those processes. The recent proliferation of the industrial internet of things, also referred to as Industry 4.0, creates enormous opportunities, especially for manufacturing firms, to advance their analytical capabilities. Industry 4.0 enables the digitalization of the horizontal and vertical integration of value chains, both within the corporation and across the whole supply chain. Successful horizontal integration between diverse in-house functional areas such as production management, quality management, inventory management and maintenance management requires a robust vertical integration of the operational/production processes (shop floor) with the related business processes. To enable such an integration, manufacturing firms need capabilities and platforms to collect, distribute, share and analyze data from diverse levels of the automation pyramid (both business and production levels) in order to make strategic decisions in real time. These capabilities should enable transparency, interoperability and communication over the whole value chain.
Within the frame of the iPRODICT research project, we explored the possibilities of integrating novel technologies and approaches to develop predictive enterprise software for the process industry that manages and controls business and operational processes in real time. For this purpose, we developed a prototype capable of supporting both unilateral and bilateral integration of (i) Machine Learning, (ii) Complex Event Processing, (iii) Business Process Management, (iv) Image Recognition, (v) Mathematical Optimization and (vi) Data Visualization technologies and methods. Furthermore, we explore the opportunities offered by the industrial internet of things that enable the digital transformation of both manufacturing and business processes.

1.2 A Case Study from Steel Industry

The underlying case study conducts initial investigations and preliminary attempts at proactive management, adaptation and optimization of business and operational processes at one of the biggest German steelmaking companies, Saarstahl AG. The core focus of the research lies in the efficient exploitation of the real-time data obtained in the diverse stages of steel bar production for making strategically critical decisions. The key challenge when handling such voluminous, high-velocity data is assuring reliability, timeliness and scalability. Furthermore, since we deal with the semi-automation of business processes, data visualization capabilities also play a central role in supporting domain experts in making the relevant decisions.

In particular, the iPRODICT research project aims to vertically integrate the shop floor data obtained from the continuous casting process and the data describing the chemical properties of the steel with the business process data.
Irregularities in the chemical properties of the steel and abnormalities in continuous casting parameters such as tundish mass, air ingress, mold level fluctuations, oscillation frequency, mold heat flux, mold water flow, casting speed and casting speed change influence the quality of the (semi-)finished steel bars. Such irregular parameter values may lead to steel surface defects (surface decarburization, cracks, etc.), which are defined as deviations from the normative appearance, form, size or macrostructure [2]. Depending on the grade of the surface defect, various additional production processes such as steel pickling or surface grinding have to be performed in order to attain the desired level of product quality. This in turn requires agile capabilities to adapt the business processes, such as reallocating both human and machine resources, dynamically optimizing the production and scheduling plans, and matching demand and supply in real time. Currently, Saarstahl AG assesses the quality of semi-finished products by performing a multi-stage visual inspection, which comes at a high expense. The proposed predictive enterprise analytics solution aims to semi-automate the inspection process by providing real-time situational awareness about the product quality based on the industrial internet of things.

The remainder of this paper is structured as follows: Section 2 provides an overview of the related work in the predictive analytics, business process management, complex event processing and optimization domains. Section 3 introduces three BPM use cases in the sensor-driven process industry, namely business process instance adaptation, process instance-to-instance adaptation and process instance-to-model adaptation. Section 4 provides a brief overview of the proposed system architecture.
Finally, Section 5 concludes the paper by discussing the lessons learnt.

2 Related Work

Predictive Analytics and Business Process Management. Recently, many attempts have been made to apply machine learning algorithms in the business process management domain. Scholars have examined the applicability of diverse machine learning and artificial intelligence approaches to (i) regression problems such as the estimation of the remaining process completion time [3], [4], [5] and (ii) classification problems such as next process event prediction, business process outcome prediction and the prediction of service level agreement violations [6], [7], [8], [9], [10], [11]. Deep learning algorithms have also been gaining popularity for both regression and classification problems [12], [13]. A thorough analysis of these studies reveals that they mainly use process log data provided by Process Aware Information Systems. Control flow data are especially preferred due to their easy accessibility and simplicity. The main advantage of the process prediction approaches proposed within the iPRODICT research project is the exploitation of big data obtained from sensors, which provide a comprehensive overview of the process parameters. Business process management driven by the industrial internet of things has recently been gaining great attention, but applications and case studies are currently very limited. By providing the relevant use cases, we aim to address this gap.

Complex Event Processing and Business Process Management. The integration of Complex Event Processing and Business Process Management is often referred to in the literature as Event Driven Business Process Management [14], [15]. An overview of the recent literature reveals that scholars mainly concentrate on the modelling aspects of such an integration [16].
There are also studies which examine the role of Complex Event Processing as an active Business Activity Monitoring tool and provide a proof of concept [17]. However, the integration of predictive analytics into Complex Event Processing in the Business Process Management domain, in forms such as data-driven event pattern detection from process data or streaming the prediction results as primitive events to CEP, has not been addressed in detail. Within the frame of the iPRODICT research project, we made relevant contributions to address this research gap.

Mathematical Optimization and Business Process Management. The application of mathematical optimization in the business process domain has also been investigated by researchers [18], [19]. A number of studies have addressed single- and multi-objective business process optimization with both conventional and meta-heuristic optimization approaches [20]. However, an analysis of these papers suggests that the optimization input parameters and constraint values were mainly provided by experts based on their domain knowledge or solely on assumptions. In the iPRODICT research project, we investigated data-driven real-time optimization by leveraging the information obtained from the industrial internet of things.

Complex Event Processing and Machine Learning. The integration of machine learning approaches into CEP systems and their application in process monitoring have also been investigated recently. Turchin et al. [21] proposed a Kalman Filter based approach for rule parameter prediction in CEP systems. Akbar et al. [22] applied adaptive moving window regression to predict IoT data and integrated it into CEP systems. Paschke and Kozlenkov [23] investigated rule-based event processing systems and languages. Mehdiyev et al. [24] examined event pattern identification through machine learning approaches.
To the best of our knowledge, the iPRODICT research project is one of the first to apply machine learning algorithms on top of a big data platform to infer complex event patterns for managing both business and operational processes in real time.

3 BPM Use Cases in Sensor-driven Process Industry

Traditional process mining scenarios do not carry over directly to sensor-based settings, since sensor data do not constitute atomic logs of business events. Capabilities such as data fusion and complex event processing have to be applied in order to achieve similar results for sensor data. The iPRODICT research project closes this gap by providing a system approach to capturing business process events from sensor data.

3.1 Use Case I: Process Instance Adaptation & Process Step Recommendation

Fig. 1. Process instance adaptation (instance A.4711 with activities A, B, C and the additional step B2)

Rationale. The case of “process instance adaptation” denotes the run-time adaptation of a given process instance. Usually, this is seen as a recommendation of the next process step or activity based on the execution log of the current process instance and the execution log histories of prior instances of the same process model. In our scenario, we focus on situations where sensor data have a crucial impact on the process outcome, i.e. the step chosen, while the log data give little to no indication of this outcome.

Use Case Description. The case at Saarstahl AG deals with the quality control of steel slabs. Regardless of the final products into which the slabs are later transformed, the steel production process up to that point is quite linear, with the exception of the chemical mix that constitutes a batch for steel casting. The quality of the steel slabs is assessed according to an error model, and certain post-processing steps are triggered.
This can be done according to standard work plans for materials, according to individual customer requirements, or as a countermeasure for errors that may occur.

Methods. The two predictions, for steel surface failures and for post-processing steps, can be formulated as time series classification problems. The error prediction is a multi-label classification, which finds the surface failures for a given steel slab. The prediction of post-processing steps is a prediction of the next process step and relies on a multi-class classification, in which each class represents one of the available activity types for post-processing. The input for both scenarios is the sensor data from the steel casting plant and the chemical properties of the steel for the given batches. The dataset delivered by Saarstahl had a size of about 30 terabytes and comprised the process parameters obtained from about 450 sensors positioned in the various stages of the continuous casting process, the chemical properties of the steel for each individual charge, the pre-defined post-processing activities in the standard work plans for the materials, the results of the quality inspection procedures for almost 9000 steel slabs, the error types that occurred, and the keys for matching the sensor data and the chemical analysis data with the individual steel slabs. The results of the error predictions are used as input for the post-processing step predictions, along with the standard work plan and the order information for the given steel brand or current order.

3.2 Use Case II: Process Instance-to-Instance Adaptation & Optimization

Fig. 2. Process instance-to-instance adaptation and optimization (the adaptation of instance A.4711 by step B2 causes a delay of instance A.4712)

Rationale. An “instance-to-instance” adaptation means the run-time adaptation or coordination of several running process instances.
In business processes, activities are often explicitly performed by either human or system resources such as machines or computers. Moreover, goods and/or information are transformed in the business process. Especially for the latter, one important characteristic of the process industry comes into play: the synthetic and analytic production stages blur the link between product and order, i.e. the product is heavily transformed in the process, and the final allocation to the constituting order is often made at the very end of the production process.

Use Case Description. At Saarstahl AG, the orders constitute important information for the selection of a steel brand for the next batches to cast. However, there is no direct association of a given steel slab with its respective order throughout the casting process itself. Therefore, the allocation of slabs, or of the respective end products, to the orders is carried out at the end of the production process. Different criteria such as timeliness, priority and storage availability have to be considered in order to make this decision.

Methods. Overall, the instances are optimized with regard to the given criteria in terms of a multi-criteria optimization. An underlying cost function helps to identify the best possible allocations. In the underlying study, we examined the applicability of meta-heuristic optimization approaches, particularly optimization methods based on genetic algorithms.

3.3 Use Case III: Process Instance-to-Model Adaptation

Fig. 3. Process instance-to-model adaptation (the adaptation of several instances of model A by step B2 causes a model change to model A')

Rationale. Process discovery and model enhancement are core fields of process mining.
They are usually performed on process logs, where each log entry represents an atomic denotation of a business event, i.e. an executed process activity. In sensor-based scenarios, this is much harder to achieve, as time series of sensor data have to be pre-processed, aggregated, segmented and condensed into such log information. For that reason, process discovery in the internet of things must rely on different ground data to derive process models from process instances.

Use Case Description. At Saarstahl AG, almost 2000 steel types exist. Each of them has its own quality characteristics, recipes for its chemical mixture and associated standard work plans. For example, a certain steel type may require mandatory post-processing steps in order to fulfill the formulated quality requirements. In our case, it is of interest to analyze whether the insights gathered from quality control (cf. use case I) can be utilized to adapt the business process model accordingly. In particular, it should be analyzed whether a formerly optional post-processing step should be made obligatory or vice versa.

Methods. In the initial stage of the iPRODICT research project, the global reference process model was created by conducting interviews with experts from Saarstahl AG. The process modelling was carried out in Software AG’s ARIS using the event-driven process chain approach. The resulting business process model incorporates the sequence of different activities, ranging from order processing through the production processes of the steel slabs. Subsequently, the variants of the process were identified in terms of the individual steel types. For this purpose, the related work plans, which provide information about the pre-defined activities (either upon request from the customers or due to internal production requirements), and the quality inspection results as described in use case I were used to induce the process models at the instance level.
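The question raised in the use case description, whether an optional post-processing step should become obligatory or vice versa, can be illustrated with a simple frequency-based check. The following Python sketch is not the project's implementation: the slab records, the work-plan dictionary, the function names and both thresholds are hypothetical and were invented purely for illustration. It assumes that, for each slab, the set of post-processing steps that quality inspection found necessary is already known.

```python
from collections import Counter, defaultdict

# Hypothetical records: (steel type, post-processing steps that quality
# inspection found necessary for one slab). Values are invented toy data.
SLABS = [
    ("S1", {"grinding"}),
    ("S1", {"grinding", "pickling"}),
    ("S1", {"grinding"}),
    ("S1", {"grinding"}),
    ("S2", {"pickling"}),
    ("S2", set()),
    ("S2", set()),
    ("S2", set()),
]

# Hypothetical standard work plans: steps currently mandatory per steel type.
MANDATORY = {"S1": set(), "S2": {"pickling"}}


def step_frequencies(slabs):
    """Return the relative frequency of each step per steel type."""
    totals = Counter()
    counts = defaultdict(Counter)
    for steel_type, steps in slabs:
        totals[steel_type] += 1
        for step in steps:
            counts[steel_type][step] += 1
    return {t: {s: c / totals[t] for s, c in counts[t].items()} for t in totals}


def suggest_changes(slabs, mandatory, promote_at=0.9, demote_at=0.3):
    """Suggest work-plan changes; both thresholds are illustrative assumptions."""
    suggestions = []
    for steel_type, freq in sorted(step_frequencies(slabs).items()):
        required = mandatory.get(steel_type, set())
        for step in sorted(set(freq) | required):
            share = freq.get(step, 0.0)
            if step not in required and share >= promote_at:
                # Optional step that was needed for almost every slab.
                suggestions.append((steel_type, step, "make mandatory"))
            elif step in required and share <= demote_at:
                # Mandatory step that was rarely needed.
                suggestions.append((steel_type, step, "make optional"))
    return suggestions


if __name__ == "__main__":
    for suggestion in suggest_changes(SLABS, MANDATORY):
        print(suggestion)
```

With the toy data above, `suggest_changes(SLABS, MANDATORY)` proposes making grinding mandatory for type S1 and pickling optional for type S2. Such output is meant as a suggestion for review by domain experts, not as an automatic change to the reference model.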
Along with the application of well-known algorithms from process discovery and model enhancement, the sensor data were segmented by slab and steel type. We computed different information retrieval metrics (particularly based on the a-priori probability distributions) to measure different aspects of model similarity and extracted data-driven implications and suggestions for enhancing the global reference model, which had initially been derived from expert knowledge.

4 System Design

A system capable of performing the aforementioned use cases is depicted in Figure 4.

Fig. 4. iPRODICT system architecture

The data acquisition encompasses sensor data, mobility data, video data for assessing the surface quality of steel slabs, and data inventories from enterprise systems such as ERP systems, order allocation systems, etc. A dedicated data preparation component aligns the data and performs the necessary pre-processing, cleansing, aggregation and segmentation. For the different analyses carried out in iPRODICT, the system must provide different mechanisms for real-time analysis, namely both machine learning and complex event processing components.

In the iPRODICT research project, we analyzed the applicability of different Machine Learning approaches, particularly to the multi-class time series classification problem. The initial approach was classification with well-established algorithms such as decision trees, random forests, logistic regression and rule-induction techniques, based on features extracted from the time series sensor data using feature templates provided by domain experts. Since the domain knowledge in the process industry about the impact of individual process parameter values (measured by sensors) on the product quality is limited, which makes supervised feature extraction almost infeasible, we also investigated unsupervised approaches. To achieve satisfactory results, we applied deep learning techniques, particularly stacked LSTM (long short-term memory) autoencoders, to extract features from the time series data in an unsupervised manner. The extracted features, which can also model the non-linear interdependencies among the individual sensor variables, are then fed into a deep feed-forward neural network to carry out the classification.

Since the underlying data are quite imbalanced, i.e. certain error constellations occur rarely or not at all, the Machine Learning results have to be combined with rule induction mechanisms that enhance the existing expert rules. This both admits insights from the gathered data and, by leveraging expert knowledge, counteracts rules induced from random data correlations or sensor faults. In order to tackle the imbalanced nature of the data, we also examined various approaches such as over- and undersampling and cost-sensitive learning techniques to achieve more reliable results.

The communication among these components takes place via an asynchronous message bus. The dashboards visualize the analytic results and provide action recommendations to the various stakeholders in the end-to-end process: at the steel casting plant, in quality control and in production planning.

5 Lessons Learnt

Both practitioners and scholars suggest that the increasing availability of data facilitates systematic analysis based on data mining and artificial intelligence approaches that outperforms intuition-based predictions [29].
The tremendous volume of high-velocity data, the changing nature of both input and output data distributions over time, the uncertainty related to the data and the prediction environment, and other factors were the main motivators for developing a data-driven decision support solution within the frame of the iPRODICT research project. However, the experience gained during the different stages of the project suggests that it is also very important to incorporate process knowledge obtained from domain experts into the machine learning analysis for process adaptation, optimization and monitoring. The ability of human experts to process unstructured information makes judgmental analysis crucial. Recent evidence from the literature suggests that human judgment and machine learning techniques must be combined. Integration is effective when judgments are collected in a systematic manner and then used as inputs to the quantitative models, rather than simply used as adjustments to the output [25].

Furthermore, the gap between the theoretical development of predictive analytics approaches and their application in industrial and business environments can also be observed in the underlying case study. This phenomenon, described by [26] as “companies using quantitative forecasting methods does not appear to have changed over time, despite enormous advances in the use of computer technology”, can be explained by the results of the survey conducted by [27], in which almost half of the respondents from multi-national companies consider the “lack of understanding of how to use the analytics to improve the business” to be the main obstacle to the adoption of analytics in their organizations. During both the requirements analysis and the implementation phase of the project, it repeatedly became apparent that the production managers do not favor machine learning based solutions that fully automate the decision making processes.
A transition phase is needed to build trust, during which the solution acts as a decision support system with high explanatory capabilities and an easily understandable structure. Once it has been ensured that the system provides the desired level of robustness and accuracy, the integration of the analytics into the business process can be automated.

Acknowledgment

This research was funded in part by the German Federal Ministry of Education and Research under grant number 01IS14004A (project iPRODICT). The iPRODICT research project consortium consists of Software AG, Saarstahl AG, Pattern Recognition Company GmbH, Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Blue Yonder GmbH and German Research Center for Artificial Intelligence (DFKI).

References

1. Davenport, T.H., Harris, J.G.: Competing on analytics: The new science of winning. Harvard Business School Press, Boston, MA, USA (2007)
2. Popa, E.M., Kiss, I.: Assessment of surface defects in the continuously cast steel. Acta Technica Corviniensis - Bulletin of Engineering 4(4), 109 (2011)
3. van der Aalst, W.M.P., Schonenberg, M.H., Song, M.: Time prediction based on process mining. Information Systems 36(2), 450-475 (2011)
4. Rogge-Solti, A., Weske, M.: Prediction of remaining service execution time using stochastic petri nets with arbitrary firing delays. In: Basu, S., Pautasso, C., Zhang, L., Fu, X. (eds.) ICSOC 2013. LNCS, vol. 8274, pp. 389-403. Springer, Heidelberg (2013)
5. Polato, M., Sperduti, A., Burattin, A., de Leoni, M.: Time and activity sequence prediction of business process instances. arXiv preprint arXiv:1602.07566 (2016)
6. Kang, B., Kim, D., Kang, S.-H.: Real-time business process monitoring method for prediction of abnormal termination using KNNI-based LOF prediction. Expert Systems with Applications 39(5), 6061-6068 (2012)
7. Kang, B., Kim, D., Kang, S.-H.: Periodic performance prediction for real-time business process monitoring. Industrial Management & Data Systems 112(1), 4-23 (2012)
8. Leontjeva, A., Conforti, R., Di Francescomarino, C., Dumas, M., Maggi, F.M.: Complex symbolic sequence encodings for predictive monitoring of business processes. In: Motahari-Nezhad, H.R., Recker, J., Weidlich, M. (eds.) BPM 2015. LNCS, vol. 9253, pp. 297-313. Springer, Switzerland (2015)
9. Le, M., Nauck, D., Gabrys, B., Martin, T.: Sequential Clustering for Event Sequences and Its Impact on Next Process Step Prediction. In: International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 168-178. Springer, Cham (2014)
10. Le, M., Gabrys, B., Nauck, D.: A hybrid model for business process event and outcome prediction. Expert Systems (2014)
11. Unuvar, M., Lakshmanan, G.T., Doganata, Y.N.: Leveraging path information to generate predictions for parallel business processes. Knowledge and Information Systems 47, 433-461 (2016)
12. Evermann, J., Rehse, J.R., Fettke, P.: Predicting process behaviour using deep learning. Decision Support Systems 100, 129-140 (2017)
13. Tax, N., Verenich, I., La Rosa, M., Dumas, M.: Predictive business process monitoring with LSTM neural networks. In: International Conference on Advanced Information Systems Engineering, pp. 477-492. Springer, Cham (2017)
14. Krumeich, J., Weis, B., Werth, D., Loos, P.: Event-driven business process management: where are we now? A comprehensive synthesis and analysis of literature. Business Process Management Journal 20, 615-633 (2014)
15. von Ammon, R.: Event-driven business process management. Encyclopedia of Database Systems, pp. 1068-1071. Springer US (2009)
16. Krumeich, J., Mehdiyev, N., Werth, D., Loos, P.: Towards an extended metamodel of event-driven process chains to model complex event patterns. In: International Conference on Conceptual Modeling, pp. 119-130. Springer, Cham (2015)
17. Janiesch, C., Matzner, M., Müller, O.: Beyond process monitoring: a proof-of-concept of event-driven business activity management. Business Process Management Journal 18, 625-643 (2012)
18. Vergidis, K., Tiwari, A., Majeed, B.: Business process analysis and optimization: Beyond reengineering. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 38, 69-82 (2008)
19. Vergidis, K., Tiwari, A., Majeed, B.: Business process improvement using multi-objective optimisation. BT Technology Journal 24, 229-235 (2006)
20. Tiwari, A., Vergidis, K., Majeed, B.: Evolutionary Multi-objective Optimization of Business Processes. In: IEEE Congress on Evolutionary Computation, pp. 3091-3097. IEEE (2006)
21. Turchin, Y., Gal, A., Wasserkrug, S.: Tuning complex event processing rules using the prediction-correction paradigm. In: Proceedings of the Third ACM International Conference on Distributed Event-Based Systems, pp. 10. ACM (2009)
22. Akbar, A., Carrez, F., Moessner, K., Zoha, A.: Predicting complex events for pro-active IoT applications. In: IEEE 2nd World Forum on Internet of Things (WF-IoT), pp. 327-332. IEEE (2015)
23. Paschke, A., Kozlenkov, A.: Rule-Based Event Processing and Reaction Rules. In: Governatori, G., Hall, J., Paschke, A. (eds.) Rule Interchange and Applications: International Symposium, RuleML 2009, Las Vegas, Nevada, USA, pp. 53-66. Springer, Berlin, Heidelberg (2009)
24. Mehdiyev, N., Krumeich, J., Enke, D., Werth, D., Loos, P.: Determination of rule patterns in complex event processing using machine learning techniques. Procedia Computer Science 61, 395-401 (2015)
25. Armstrong, J.S.: The forecasting canon: nine generalizations to improve forecast accuracy. International Journal of Applied Forecasting 1, 29-35 (2005)
26. Sanders, N.R., Manrodt, K.B.: The efficacy of using judgmental versus quantitative forecasting methods in practice. Omega 31, 511-522 (2003)
27. LaValle, S., Lesser, E., Shockley, R., Hopkins, M.S., Kruschwitz, N.: Big data, analytics and the path from insights to value. MIT Sloan Management Review 52, 21 (2011)