=Paper= {{Paper |id=Vol-2673/paperDR03 |storemode=property |title=PMSD: Data-Driven Simulation Using System Dynamics and Process Mining |pdfUrl=https://ceur-ws.org/Vol-2673/paperDR03.pdf |volume=Vol-2673 |authors=Mahsa Pourbafrani,Wil M. P. van der Aalst |dblpUrl=https://dblp.org/rec/conf/bpm/PourbafraniA20 }} ==PMSD: Data-Driven Simulation Using System Dynamics and Process Mining== https://ceur-ws.org/Vol-2673/paperDR03.pdf
     PMSD: Data-Driven Simulation Using System
          Dynamics and Process Mining *

                  Mahsa Pourbafrani and Wil M. P. van der Aalst

       Chair of Process and Data Science, RWTH Aachen University, Germany
                 {mahsa.bafrani,wvdaalst}@pads.rwth-aachen.de



       Abstract. Process mining extends far beyond process discovery and
       conformance checking, and also provides techniques for bottleneck anal-
       ysis and organizational mining. However, these techniques are mostly
       backward-looking. PMSD is a web application tool that supports forward-
       looking simulation techniques. It transforms the event data and process
       mining results into a simulation model which can be executed and vali-
       dated. PMSD includes log transformation, time window selection, rela-
       tion detection, interactive model generation, simulating and validating
       the models in the form of system dynamics, i.e., a technique for an ag-
       gregated simulation. The results of the modules are visualized in the tool
       for a better interpretation.

       Keywords: Process mining · Simulation · System Dynamics · What-if
       analysis


1     Introduction
Process mining uses stored event data of organizations, i.e., event logs, to provide
actionable insights for organizations [1]. Different tools address process discovery,
performance analysis, bottleneck analysis, and deviation detection. Yet, the gap
between the backward-looking and the forward-looking process mining techniques
remains. Traditional forward-looking techniques, as mentioned in [2], use the
events in the process as the basis of simulation. They aim to mimic the process
at a detailed level and simulate it. More recent simulation tools such as [3]
capture different levels of detail for the simulation, e.g., the durations of
activities and the flow of activities. Moreover, the pm4py tool¹ uses the Monte
Carlo technique to simulate discovered Petri nets.
    In PMSD, we use the idea that a simulation model can be learned from the
event data at an aggregated level. The traditional connections between process
mining and simulation mainly use a descriptive model discovered in the discov-
ery step to enrich the simulation models at the level of the process instances,
e.g., Discrete Event Simulation (DES). The presented tool is the result of our
 * Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under
   Germany’s Excellence Strategy – EXC 2023 Internet of Production - Project ID: 390621612. We
   also thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.
 1 http://pm4py.pads.rwth-aachen.de




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).

[Figure 1 (diagram): Event Log → Preprocessing (Event Log Preparation, Time
Window Selection, SD-Log Generation, SD-Log Analysis) → SD-Log → Model
Generation (Relation Detection, CLD Model Generation, SFD Model Generation) →
Simulation → Validation, followed by a Yes/No decision, Model Refinement, and
Further Prediction. A "Tool Scope" box marks the steps covered by the tool.]




Fig. 1. Our proposed framework for using process mining and system dynamics to-
gether in order to design valid models to support scenario-based prediction of business
processes in [4]. This paper focuses on the developed tool, i.e., the highlighted step.



approach to generating simulation results for business processes at an aggregated
level, with the option to add external factors to the simulation [4]. Figure 1
shows an overview of the approach, starting from an event log and ending with
a scenario-based simulation model. The steps in the highlighted part are
supported by the tool. Instead of taking the individual events into account for
the simulation, we extract possible variables from the process over different
time steps, as shown in Fig. 2.
Fig. 2. Traditional Simulation vs. PMSD. We extract possible variables (m) over
time steps (k).

    The model generation module is introduced in [6] and the preprocessing step
is presented in [5]. The event log is transformed into a set of variables over
time, and the values of these variables form the System Dynamics logs (SD-Logs).
To generate more stable SD-Logs, we use time series analysis over the values.
The relations between the variables over time in the SD-Log are used to create
the system dynamics models. We support both causal loop diagrams (CLD) and
stock-flow diagrams (SFD). System dynamics models systems and their relations
with the environment [8]. CLDs represent these conceptual relationships, and
SFDs model the underlying equations using stock, flow, and variable notations.
Flows add to or remove from the values of stocks, and variables affect, or are
affected by, the flows and other variables. PMSD provides insights into the
processes over time that may otherwise be hidden from the user, e.g., a
nonlinear relation between the workload of resources and the speed of
performing tasks.
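The stock and flow notation can be illustrated with a minimal sketch. It is not taken from PMSD: the function name `simulate_stock`, the choice of "cases in process" as the stock, and the daily values are all assumptions made for illustration. The sketch uses simple Euler integration, where the stock at the next time step is the current stock plus the inflow minus the outflow.

```python
def simulate_stock(initial_stock, inflows, outflows, dt=1.0):
    """Euler integration of a single stock.

    stock(t + dt) = stock(t) + dt * (inflow(t) - outflow(t))
    The stock is clipped at zero because a negative number of cases
    in process has no meaning (an assumption of this sketch).
    """
    stock = float(initial_stock)
    trajectory = [stock]
    for inflow, outflow in zip(inflows, outflows):
        stock = max(0.0, stock + dt * (inflow - outflow))
        trajectory.append(stock)
    return trajectory


# Hypothetical daily values: arriving cases (inflow) and completed cases (outflow).
arrivals = [10, 12, 15, 9]
completions = [8, 10, 11, 12]

cases_in_process = simulate_stock(5, arrivals, completions)
print(cases_in_process)
```

In an SFD, the inflow and outflow would themselves be defined by equations over other variables; here they are given as fixed series to keep the example short.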


2    Description of Functionalities

In our approach, the possible process variables are extracted over time, e.g.,
the arrival rate per day and the average service time per day. The newly
generated log (SD-Log) is the cornerstone of the simulation. The preprocessing
step and the extraction of the best parameters in the framework by means of
time series analysis are

[Figure 3 (diagram): the User interacts with the modules Event Log Preparation,
Time Window Selection, SD-Log Generation, Relation Detection, CLD Generation
(Conceptual Model), SFD Generation, and Simulation Validation. Data exchanged:
Event Log, Prepared Event Log, Time Window, Different Time Windows, Best Time
Window, Time Window & Prepared Event Log & Level, SD-Log, Discovered Relations
in SD-Log, Selected Relations, Equations & Mapping Elements, Structured Data of
Model (.mdl), SD-Log & Enriched SFD (.mdl file), Visualized Validation Results.]




Fig. 3. Data flow diagram of the PMSD including data flow between the user and the
main modules as well as the background flow of data between the modules.



proposed in [5]. To form a valid system dynamics model, we have to discover
all the relations, i.e., linear and nonlinear correlations, between the generated
process variables over time as introduced in [6]. Analyzing a process and creat-
ing aggregated features of the process over time (process variables) for further
analyses is the main focus of the tool.
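As a rough illustration of how an event log can be turned into process variables over time, the following sketch computes two of the variables named above, the arrival rate per day and the average service time per day. The toy event log, the function name `build_sd_log`, the column names, and the fixed one-day time window are assumptions for this example and do not reflect the actual PMSD implementation.

```python
from collections import defaultdict
from datetime import datetime

# Toy event log: (case id, activity, start timestamp, complete timestamp).
event_log = [
    ("c1", "register", "2020-01-01 09:00", "2020-01-01 09:30"),
    ("c1", "handle", "2020-01-01 10:00", "2020-01-01 11:00"),
    ("c2", "register", "2020-01-01 13:00", "2020-01-01 13:30"),
    ("c3", "register", "2020-01-02 08:00", "2020-01-02 09:00"),
]


def build_sd_log(events, fmt="%Y-%m-%d %H:%M"):
    """Aggregate events into one row of process variables per day."""
    first_seen = {}                      # case id -> timestamp of its first event
    service_minutes = defaultdict(list)  # day -> service times of events started that day
    for case_id, _activity, start, complete in events:
        start_ts = datetime.strptime(start, fmt)
        complete_ts = datetime.strptime(complete, fmt)
        if case_id not in first_seen or start_ts < first_seen[case_id]:
            first_seen[case_id] = start_ts
        day = start_ts.date().isoformat()
        service_minutes[day].append((complete_ts - start_ts).total_seconds() / 60)

    # A case "arrives" on the day of its first event.
    arrivals = defaultdict(int)
    for ts in first_seen.values():
        arrivals[ts.date().isoformat()] += 1

    return [
        {
            "day": day,
            "arrival_rate": arrivals[day],
            "avg_service_time_min": sum(times) / len(times),
        }
        for day, times in sorted(service_minutes.items())
    ]


sd_log = build_sd_log(event_log)
for row in sd_log:
    print(row)
```

Each row corresponds to one time step, so a different time window (e.g., hourly) would only change how the timestamps are bucketed.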
    PMSD is designed in such a way that the outputs of all the steps are
accessible to the user. Figure 3 depicts the data flow diagram of the
application. It shows the inputs and generated outputs of each module and the
interactions with the user. The generated SD-Logs, including the active steps
in the processes as well as all the steps for the different selected time
windows, are saved as .csv files. Also, all the designed CLDs and SFDs are
stored locally for the user in the .mdl format. When the tool runs locally, the
home page can be accessed via any browser at http://127.0.0.1:5000. All the
modules are designed as separate tabs and are visually accessible. PMSD is a
fully interactive tool with a user interface based on Python and Flask. The
results of the steps are shown graphically to make their interpretation easier.
The tool contains 8 tabs, and each tab can be run separately using the inputs
and outputs of the other modules/tabs. Currently, the following components are
available:

 – Event log transformation indicates the main attributes of the event log, dis-
   covers the directly follows graph, and presents the event log’s information.
 – Time window selection assesses the quality of the user’s preference for se-
   lecting a time window for generating simulation data.
 – Simulation log generator uses the transformed event log and the selected time
   window to generate simulation data (SD-Log). It generates an SD-Log for
   different aspects and levels, i.e., general process, organizational, and activity
   aspects. For instance, an SD-Log of the general aspect of a process includes
   the arrival rate of the process, the average service time in the process, and
   other measurable variables per day.
 – Relation detection investigates whether there is any strong relationship be-
   tween the variables in the extracted SD-Log. Furthermore, the user can look
   for the relations between variables in different steps of time.
 – Detailed relations presents the existing relations between every two variables
   in the SD-Log for further investigation of the types of relations.

    – Interactive conceptual model generation lets the user choose among all the
      strong relations discovered in the relation detection module and creates a
      CLD, i.e., the effects and relations between process variables. It
      generates both the graphical model in the tool and the .mdl (text format)
      file, which can be used in most system dynamics tools, e.g., Vensim².
    – Interactive stock-flow diagram generates SFDs both graphically in PMSD and
      as an .mdl file. The relations are directly transformed from the CLD
      (previous step), and the user can map the process variables to the SFD
      elements.
    – Simulation and validation simulates the SFD model using the values in the
      SD-Log and validates the results using a pairwise comparison of the values
      in the SD-Log and in the simulation results, as well as their
      distributions.
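The relation detection step in the list above can be sketched as follows. This is only an illustration under stated assumptions: it uses the Pearson correlation, which captures linear relations only, whereas the tool also considers nonlinear relations as described in [6]. The names `pearson` and `lagged_relations`, the threshold of 0.8, and the toy SD-Log values are hypothetical. The `lag` parameter mirrors the idea of looking for relations between variables in different steps of time.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear relation
    return cov / sqrt(var_x * var_y)


def lagged_relations(sd_log, max_lag=1, threshold=0.8):
    """Return (source, target, lag, r) for every strong linear relation.

    The source variable at time step t is compared with the target
    variable at time step t + lag.
    """
    relations = []
    for source, xs in sd_log.items():
        for target, ys in sd_log.items():
            if source == target:
                continue
            for lag in range(max_lag + 1):
                r = pearson(xs[: len(xs) - lag], ys[lag:])
                if abs(r) >= threshold:
                    relations.append((source, target, lag, round(r, 3)))
    return relations


# Hypothetical SD-Log with one value per day for each variable.
toy_sd_log = {
    "arrival_rate": [10, 12, 15, 9, 14, 11],
    "waiting_time": [5, 6, 7.5, 4.5, 7, 5.5],
}

relations = lagged_relations(toy_sd_log)
for relation in relations:
    print(relation)
```

In the interactive step, the user would then select which of the reported relations should appear in the CLD.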


3      Maturity of the Tool
The evaluation results of our proposed forward-looking approaches in process
mining are presented using different modules of the tool. PMSD, along with a
tutorial and a screencast, is available on GitHub.³ It has also been used in
industrial projects, e.g., in the Internet of Production project in the context
of Industry 4.0. In [7], part of the results of using PMSD for a production line
is presented. Here, we show similar results using an example, i.e., an event log
of a call center generated with CPN Tools.
Fig. 4. Stability test showing the error of training models for the time windows.

    We use different suggested time windows to extract values over time for the
possible process variables using the time window test. The result in Fig. 4
shows the time windows selected by the user and the errors of the trained models
for each time window. Figure 5 shows the user interface for selecting the strong
detected relations between the variables. Finally, by uploading the generated
SFD and SD-Log (both are automatically generated), the simulation is performed
automatically and the validation results are shown in the validation module.
The results include a comparison between the real values and the simulated ones
and their distributions for the selected variables.
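A comparison between real and simulated values can be sketched as below. The error measures chosen here (mean absolute error, mean absolute percentage error, and a comparison of means and standard deviations as a crude stand-in for comparing distributions) are assumptions made for illustration. The function name `validate` and the sample values are hypothetical and are not results produced by PMSD.

```python
from statistics import mean, pstdev


def validate(real, simulated):
    """Pairwise comparison of real SD-Log values and simulated values."""
    if len(real) != len(simulated):
        raise ValueError("real and simulated series must have the same length")
    abs_errors = [abs(r - s) for r, s in zip(real, simulated)]
    # Percentage errors are only defined where the real value is non-zero.
    pct_errors = [abs(r - s) / abs(r) for r, s in zip(real, simulated) if r != 0]
    return {
        "mean_absolute_error": mean(abs_errors),
        "mape_percent": 100 * mean(pct_errors) if pct_errors else None,
        "mean_real": mean(real),
        "mean_simulated": mean(simulated),
        "std_real": pstdev(real),
        "std_simulated": pstdev(simulated),
    }


# Hypothetical values of one variable (e.g., arrival rate per day).
real_values = [10, 12, 15, 9]
simulated_values = [11, 12, 14, 9]

report = validate(real_values, simulated_values)
for name, value in report.items():
    print(f"{name}: {value}")
```

A small pairwise error together with similar summary statistics suggests that the simulated series reproduces the SD-Log reasonably well for that variable.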


4      Conclusion
In this paper, we introduced PMSD to support designing system dynamics mod-
els for simulation in the context of business processes. Using PMSD, we look into
2 www.vensim.com
3 https://github.com/mbafrani/PMSD




Fig. 5. The conceptual modeling section showing the detected relations between
the variables and their strength. The user is able to select among the detected
relations.


the processes at different aggregation levels, e.g., hourly or daily, as well as dif-
ferent aspects, e.g., overall process or organizational aspects. The provided user
interface and the graphical outputs make the interpretation of the results easy.
With PMSD, the underlying effects and relations at the instance level can be
detected and modeled in an aggregated manner. Besides the option to simulate
and validate the models directly in the tool, the models can be simulated or
refined by adding external variables using simulation software like Vensim.

References
1. van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition.
   Springer (2016)
2. van der Aalst, W.M.P.: Process Mining and Simulation: A Match Made in Heaven!
   In: Computer Simulation Conference. pp. 1–12. ACM Press (2018)
3. Camargo, M., Dumas, M., Rojas, O.G.: Simod: A tool for automated discovery
   of business process simulation models. In: Proceedings of Demonstration Track at
   BPM 2019. pp. 139–143 (2019)
4. Pourbafrani, M., van Zelst, S.J., van der Aalst, W.M.P.: Scenario-based predic-
   tion of business processes using system dynamics. In: On the Move to Meaning-
   ful Internet Systems: OTM 2019 Conferences - Confederated International Con-
   ferences: CoopIS, ODBASE, C&TC 2019, Rhodes, Greece, October 21-25, 2019,
   Proceedings. pp. 422–439 (2019). https://doi.org/10.1007/978-3-030-33246-4_27
5. Pourbafrani, M., van Zelst, S.J., van der Aalst, W.M.P.: Semi-automated time-
   granularity detection for data-driven simulation using process mining and system
   dynamics. In: Conceptual Modeling - 39th International Conference, ER 2020, Vi-
   enna, Austria, November 3-6, 2020, Proceedings (2020)
6. Pourbafrani, M., van Zelst, S.J., van der Aalst, W.M.P.: Supporting automatic sys-
   tem dynamics model generation for simulation in the context of process mining. In:
   Business Information Systems - 23rd International Conference, BIS 2020, Colorado
   Springs, USA, 8-10 June 2020, Proceedings (2020)
7. Pourbafrani, M., van Zelst, S.J., van der Aalst, W.M.P.: Supporting decisions in
   production line processes by combining process mining and system dynamics. In:
   Proceedings of the 3rd International Conference on Intelligent Human Systems In-
   tegration. pp. 461–467 (2020). https://doi.org/10.1007/978-3-030-39512-4_72
8. Sterman, J.: System Dynamics: Systems Thinking and Modeling for a Complex
   World (2002)