Interactive Data-Driven Business Process Simulation (Extended Abstract)

Gerhardus van Hulzen
Research group Business Informatics
Hasselt University
Hasselt, Belgium
0000-0001-8962-9515

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

I. INTRODUCTION

Today, healthcare systems worldwide are under constant pressure. On the one hand, increasing population numbers, ageing populations, lifestyle factors, and new technologies are increasing the yearly expenses on healthcare. On the other hand, budgets are under pressure due to economic austerity [1]. In order to provide high-quality care to all patients, healthcare managers are forced to improve their care processes. Efficient Capacity Management (CM) is one of the key aspects to ensure this. This involves, among other things, determining suitable resource levels, i.e. staff size, equipment, and facilities [2].

Business Process Simulation (BPS) can be used to support managers during CM decisions. BPS uses a (computer) model to imitate the behaviour of a business process. This approach allows evaluating the effects of changes before implementing them [3]. For instance, BPS can be used to determine suitable equipment levels, e.g. by simulating the effect of an additional X-ray scanner on patient waiting times, throughput rates, and staff workload.

Within Process Mining (PM), the emerging field of data-driven process simulation has produced promising first results in generating simulation models from information captured in event logs [4]. These "discovered" models can form the basis to compare the operational effects of various capacity levels. The main advantage of data-driven process simulation over "traditional" simulation model development is the availability and objectivity of event logs compared to other information sources, such as interviews, process documentation, and observations [5]. However, some challenges remain in the field of automated BPS discovery. Most importantly, the lack of domain knowledge makes it challenging to extract a reliable and usable simulation model. In addition, event logs often suffer from data quality issues, which strongly affect the reliability of the simulation results [6]. Therefore, it is imperative to take these problems seriously.

II. RESEARCH OBJECTIVES

Given the context outlined above, this PhD research pursues the following two objectives:

1) Extended support for key BPS modelling tasks: While the field of automated BPS discovery shows promising results, challenges remain in discovering the individual BPS model components needed to make the resulting model usable for supporting CM decisions.
2) Enabling interactive data-driven process simulation: Domain knowledge should be closely integrated during the discovery of BPS models to ensure the reliability and usability of the discovered simulation models.

III. PLANNED RESEARCH ACTIVITIES

The following subsections give an overview of the planned research activities for the two research objectives.

A. Extended Support for Key BPS Modelling Tasks

Based on a systematic literature review, we concluded that defining the control-flow, entity arrival rates, activity execution times, gateway routing logic, entity types, queueing disciplines, resource schedules, resource requirements, and resource roles are the most important modelling tasks to support CM decisions via simulation. These tasks correspond to a subset of the modelling tasks given by [7]. Most PM research attention has been dedicated to control-flow definition [7]. However, for creating a simulation model for supporting CM decisions, we believe that all aforementioned tasks are required, albeit some tasks are more important than others.

In PM, only a limited amount of work has been devoted to integrating the various tasks needed to build a simulation model. The authors in [8] were the first to generate an initial simulation model from data. They included the process-flow, gateway routing logic, and resource pools. Later, the authors extended their work with activity durations and entity inter-arrival times [5]. Nevertheless, the authors emphasise that the derived initial model still has to be verified and, if required, augmented by domain experts to ensure validity.

In [9], a PM approach is proposed to generate BPS models for short-term KPI prediction. An approach similar to that of [5] is used. However, the resource perspective is left aside, assuming that an infinite number of resources is available [9].

Control-flow, resources, activity durations, and gateway routing logic are supported by the approach in [10]. In addition, it supports inter-arrival times and resource schedules. However, the latter have to be defined manually by the domain expert.

None of the aforementioned studies tried to integrate all elements into a single, simulation-ready model. This is where Simod [11] extends the work on data-driven process simulation. Simod is a tool which automatically discovers BPS models from event logs. In addition, Simod is capable of measuring the accuracy of the obtained simulation model and allows this accuracy to be optimised using hyper-parameters [11].

While the initial results of data-driven BPS algorithms are promising, there are still challenges to automatically derive a simulation model for supporting CM decisions from event logs. The resource perspective is especially crucial for CM decisions. Incorrect resource requirements, pools, and schedules make the results of the model unreliable, resulting in inaccurate capacity requirement estimations. The state-of-the-art still has limitations when it comes to defining the resource perspective. Part of this PhD research will be dedicated to improving the support of the resource perspective in data-driven BPS.

B. Enabling Interactive Data-Driven Process Simulation

As mentioned earlier, data quality issues should be taken seriously to ensure the reliability of the data-driven simulation model. Detecting these issues often requires domain knowledge. Therefore, it would be beneficial to involve the domain experts as early as possible to detect and handle data quality issues before integrating everything into a single simulation model. Especially in stochastic models, such as simulation models, a problem in one part of the model may have a profound impact on other parts. It is much easier to solve issues at the root than to trace the problem back through a full simulation model.

Ideally, domain experts would conduct simulation studies themselves. After all, they know the process best. However, conducting simulation studies requires specific knowledge which domain experts often do not possess. Of course, they could learn more about constructing simulation models, but usually, they are very busy and do not have the time to master the required skills.

Against this background, we propose a framework to interactively involve domain experts during the development of data-driven simulation models. The framework consists of three cycles. The first cycle is the initial model construction. In this step, for each required modelling task (e.g. determining the inter-arrival rates, activity durations, resource requirements, the control-flow, etc.) the data requirements are established. If these requirements are fulfilled, the quality of the data is assessed, and a discovery algorithm is applied. The results of this algorithm, together with the detected data quality issues (e.g. missing values, outliers, inconsistencies, etc.), are presented to the domain expert for validation. If needed, the expert can correct these issues and alter the discovery parameters until he or she is satisfied with the results.

In the second cycle, all the initial model components from the first cycle are integrated into a single simulation-ready model. The entire model is run for the first time, and the domain expert validates the preliminary results. By altering parameters, the domain expert can "calibrate" the model until he or she is satisfied with the preliminary results. During this calibration, the domain expert should immediately obtain an estimation of the impact of the changed parameter, instead of having to wait until the simulation has finished running, which could take quite a while depending on the complexity of the model.

The third cycle of the framework involves the actual model validation. The calibrated model is simulated extensively, and the domain expert validates the simulation results. If needed, the parameters of the simulation model can be altered again to obtain more realistic results. The validated model can be used for further analyses and to evaluate different scenarios.

The goal of this part of the PhD research is to develop a prototype which supports the interactive development of data-driven simulation models.

IV. CONCLUDING REMARKS

This PhD will mainly focus on the resource aspect of data-driven BPS and how domain experts can be interactively involved in the discovery of simulation models. This should culminate in the development of a prototype tool which allows interactive data-driven generation of BPS models based on event logs and domain knowledge. The derived simulation model will form the basis for supporting CM decisions in healthcare. Nevertheless, the prototype would also be usable in many other applications in different fields besides healthcare, such as production planning in manufacturing, supply chain logistics, and transportation.

REFERENCES

[1] C. Hicks, T. McGovern, G. Prior, and I. Smith, "Applying Lean Principles to the Design of Healthcare Facilities," International Journal of Production Economics, vol. 170, pp. 677–686, 2015.
[2] F. R. Jacobs and R. B. Chase, "Strategic Capacity Management," in Operations and Supply Management: The Core, ser. Operations and Decision Sciences. New York, NY, USA: McGraw Hill/Irwin, 2008, pp. 51–79.
[3] N. Melão and M. Pidd, "Use of Business Process Simulation: A Survey of Practitioners," Journal of the Operational Research Society, vol. 54, no. 1, pp. 2–10, 2003.
[4] B. Depaire and N. Martin, "Data-Driven Process Simulation," Encyclopedia of Big Data Technologies, 2018.
[5] A. Rozinat, R. S. Mans, M. Song, and W. M. P. van der Aalst, "Discovering Simulation Models," Information Systems, vol. 34, no. 3, pp. 305–327, 2009.
[6] L. Vanbrabant, N. Martin, K. Ramaekers, and K. Braekers, "Quality of Input Data in Emergency Department Simulations: Framework and Assessment Techniques," Simulation Modelling Practice and Theory, vol. 91, pp. 83–101, 2019.
[7] N. Martin, B. Depaire, and A. Caris, "The Use of Process Mining in Business Process Simulation Model Construction," Business & Information Systems Engineering, vol. 58, no. 1, pp. 73–87, 2016.
[8] A. Rozinat, R. S. Mans, and W. M. P. van der Aalst, "Mining CPN Models: Discovering Process Models with Data from Event Logs," in Workshop and Tutorial on Practical Use of Coloured Petri Nets and the CPN Tools, K. Jensen, Ed., Aarhus, Denmark, 2006, pp. 57–76.
[9] I. Khodyrev and S. Popova, "Discrete Modeling and Simulation of Business Processes Using Event Logs," in Proceedings of the 14th International Conference on Computational Science, ser. Procedia Computer Science, D. Abramson, M. Lees, V. Krzhizhanovskaya, J. Dongarra, and P. M. A. Sloot, Eds., vol. 29. Cairns, QLD, Australia: Elsevier, 2014, pp. 322–331.
[10] B. Gawin and B. Marcinkowski, "How Close to Reality is the "as-is" Business Process Simulation Model?" Organizacija, vol. 48, no. 3, pp. 155–175, 2015.
[11] M. Camargo, M. Dumas, and O. González-Rojas, "Automated Discovery of Business Process Simulation Models from Event Logs," Decision Support Systems, vol. 134, 2020.
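
Illustrative sketch. To make the first cycle of the framework in Section III-B more concrete, the following minimal Python sketch walks through one modelling task, the estimation of inter-arrival times: it checks a data requirement, flags data quality issues, applies a simple discovery step, and returns a report that could be shown to the domain expert. It is an illustration under stated assumptions, not the prototype this research aims to build. The names discover_interarrival and ArrivalReport, the tuple-based event log layout, and the outlier_factor parameter are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from statistics import mean, median
from typing import Optional


@dataclass
class ArrivalReport:
    """Hypothetical summary presented to the domain expert for validation."""
    mean_gap_minutes: Optional[float]
    issues: list = field(default_factory=list)


def discover_interarrival(events, outlier_factor=5.0):
    """Estimate the mean inter-arrival time from a simplified event log.

    events: list of (case_id, timestamp-or-None) tuples, one per event
            (an assumed layout, not a standard log format).
    outlier_factor: assumed expert-adjustable discovery parameter; gaps
            longer than outlier_factor * median gap are flagged and excluded.
    """
    issues = []
    first_seen = {}
    for case_id, ts in events:
        if ts is None:
            # Data quality issue: event without a timestamp.
            issues.append(f"missing timestamp in case {case_id}")
            continue
        # The earliest event of a case is taken as its arrival.
        if case_id not in first_seen or ts < first_seen[case_id]:
            first_seen[case_id] = ts

    # Data requirement: at least two cases with a known arrival time.
    if len(first_seen) < 2:
        issues.append("data requirement not met: fewer than two arrivals")
        return ArrivalReport(None, issues)

    arrivals = sorted(first_seen.values())
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(arrivals, arrivals[1:])]

    # Simple outlier rule; a real tool would likely offer several options.
    limit = outlier_factor * median(gaps)
    kept = []
    for gap in gaps:
        if gap > limit:
            issues.append(f"outlier gap of {gap:.0f} minutes excluded")
        else:
            kept.append(gap)

    return ArrivalReport(mean(kept) if kept else None, issues)
```

In an interactive setting, the expert would inspect the reported issues, correct the log or adjust outlier_factor, and rerun the function until satisfied, which mirrors the loop described for the first cycle.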