ProcessProfiler3D: A Tool for Visualising
       Performance Differences Between Process
            Cohorts and Process Instances

    E. Poppe¹, M.T. Wynn¹, A.H.M. ter Hofstede¹,², R. Brown¹, A. Pini³, and
                             W.M.P. van der Aalst²,¹

          ¹ Queensland University of Technology, Queensland, Australia
          ² Eindhoven University of Technology, Eindhoven, The Netherlands
          ³ DensityDesign Research Lab, Politecnico di Milano, Milan, Italy


        Abstract. Comparing different process cohorts in an organisation’s
        event logs can give great insight into factors that affect the execution
        of its business processes. We have recently presented ProcessProfiler3D,
        a novel tool for such comparisons that supports interactive data explo-
        ration, automatic calculation of performance data and visual comparison
        of multiple cohorts. The approach enables the intuitive discovery of
        differences and trends in cohort performance. To better support the
        interpretation of these differences in the context of process execution,
        we have now extended the tool with a novel visualisation technique that
        shows case execution and timing in a way that provides context
        to such a performance analysis.


1     Introduction

Analysing process data in event logs to identify problems and opportunities
with existing processes can be of great value for improving the processes of an
organisation. Process mining [1], a specialised field of research in business pro-
cess management, develops tools and techniques to support this. By splitting
an event log into process cohorts, i.e. groups of process instances that have one
or more shared characteristics, one can analyse how different case character-
istics (often called context factors) affect the execution of a process. We have
recently identified that despite continued industry interest [6,4,2], there is a lack
of tools to support such analyses effectively [7]. None of the existing academic or
commercial tools supported both interactive data exploration, through
interactive splitting of the event log, and an integrated comparison of multiple
process cohorts, through the visualisation of performance data for more than
two cohorts in one view. Consequently, we presented ProcessProfiler3D, a
framework to address this gap [7]. We now present a complementary, novel
visualisation technique that covers further performance analysis scenarios by
adding context to the presented performance data.

Fig. 1: Example of comparative process performance visualisation for four
cohorts on a hierarchical process model (across two dimensions: time and
frequency) using ProcessProfiler3D


2   Overview of the tool
ProcessProfiler3D enables comparing the performance of multiple process co-
horts by

 – aligning an event log with a process model
 – calculating common node-level process performance indicators such as
   activity duration, activity throughput time and waiting times between activities
 – storing performance data in a data cube
 – interactively splitting the event log by defining cohorts
 – visualising performance data in a third dimension on top of the process model
   at multiple levels of process abstraction
 – visualising data related to activities using either one of three different types
   of bar charts or a triangle chart (see [5])
 – visualising data related to activity pairs using coloured arcs between the
   two activities (see [7])
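
   The indicator calculation and cohort split listed above can be illustrated
with a minimal sketch. It is not the ProM implementation: the event tuples, the
cohort labels and the function name node_indicators are hypothetical, no model
alignment is performed, and waiting time is simplified to the gap between the
end of the previous activity in a case and the start of the current one.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical event log: (case_id, cohort, activity, start, end), in minutes.
EVENTS = [
    ("c1", "gold", "A", 0, 5),
    ("c1", "gold", "B", 8, 12),
    ("c2", "basic", "A", 0, 10),
    ("c2", "basic", "B", 30, 36),
]


def node_indicators(events):
    """Return mean duration and mean waiting time per (cohort, activity)."""
    by_case = defaultdict(list)
    for case_id, cohort, activity, start, end in events:
        by_case[case_id].append((start, end, activity, cohort))

    durations = defaultdict(list)
    waits = defaultdict(list)
    for rows in by_case.values():
        rows.sort()  # order the events of a case by start time
        prev_end = None
        for start, end, activity, cohort in rows:
            durations[(cohort, activity)].append(end - start)
            if prev_end is not None:
                # Assumed definition: idle time since the previous activity ended.
                waits[(cohort, activity)].append(start - prev_end)
            prev_end = end

    return {
        key: {
            "duration": mean(values),
            "waiting": mean(waits[key]) if waits[key] else None,
        }
        for key, values in durations.items()
    }


indicators = node_indicators(EVENTS)
```

   Keying the result by cohort and activity mirrors, in a very reduced form,
the idea of storing performance data in a cube that can be sliced per cohort.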

   The framework was implemented in two plugins for the process mining frame-
work ProM. Figure 1 shows an example of comparative performance analysis
using this tool.
   However, some scenarios are still not well covered by existing performance
analysis techniques. In the remainder of this paper we discuss one of these
scenarios and present a novel visualisation technique that we have added to
ProcessProfiler3D to address it.


3   Problem statement
One issue with existing techniques for process performance analysis is the loss of
context that occurs when performance data are a) localised and b) summarised
as is usually the case with activity duration, throughput time and waiting time
calculations. Both problems have the potential to affect our understanding of
performance analysis results and can complicate finding root causes.
    Firstly, the analysis results are currently localised to one point in the process
model. For example, an activity C may be preceded by either activity A or
B. By looking at performance indicators of these activities we cannot tell if
cases that first executed A on average take longer to execute C than cases that
executed B. So by localising the analysis results per activity we lose the context
of how preceding activities affected the case and how subsequent activities were
impacted.
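
    The A/B/C example can be made concrete with a small sketch. The traces,
durations and the function name duration_by_predecessor are hypothetical and
serve only to show how a single per-activity mean hides the effect of the
preceding activity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical traces: each case is a list of (activity, duration) pairs
# in execution order.
TRACES = {
    "c1": [("A", 2), ("C", 10)],
    "c2": [("A", 3), ("C", 12)],
    "c3": [("B", 2), ("C", 4)],
    "c4": [("B", 1), ("C", 6)],
}


def duration_by_predecessor(traces, target):
    """Mean duration of `target`, grouped by the directly preceding activity."""
    grouped = defaultdict(list)
    for steps in traces.values():
        for (prev_activity, _), (activity, duration) in zip(steps, steps[1:]):
            if activity == target:
                grouped[prev_activity].append(duration)
    return {prev: mean(values) for prev, values in grouped.items()}


# Localised indicator: one number for C, regardless of how a case got there.
overall_c = mean(d for steps in TRACES.values() for a, d in steps if a == "C")

# Context-aware view: the same executions, split by predecessor.
split_c = duration_by_predecessor(TRACES, "C")
```

    In this made-up log the localised mean for C lies between the two
predecessor-specific means, so the difference between the A and B paths is
invisible when only C is inspected.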
    Secondly, the statistical summary of performance indicators by minimum,
median, mean and maximum also means that we are losing context in the re-
sults. It is, for example, hard to tell whether a few extreme cases skewed the
results or what the general distribution of cases is. Furthermore, if the same
case executes an activity multiple times, it is impossible to identify differences
between the individual execution times (e.g. the activity took a long time on the
first execution, but finished much faster on every following execution). Some
absolute indicators, such as the average case runtime at an activity, also get
distorted by loops. Consequently, while existing process performance analysis
techniques already provide valuable insights into the execution of a process, ad-
ditional analysis techniques are required to add context to the results of existing
techniques.
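
    The effect of summarisation on repeated executions can be sketched as
follows. The durations and the helper names summarise and by_execution_index
are hypothetical. The sketch compares a pooled summary of one activity with a
summary per execution index within a case.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical durations (minutes) of one activity. Each list holds the
# successive executions of that activity within a single case (a loop).
CASE_EXECUTIONS = {
    "c1": [30, 2, 2],
    "c2": [28, 3],
    "c3": [32],
}


def summarise(values):
    """The usual four-number summary used for performance indicators."""
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


def by_execution_index(case_executions):
    """Summarise durations separately for the 1st, 2nd, ... execution."""
    grouped = defaultdict(list)
    for durations in case_executions.values():
        for index, duration in enumerate(durations, start=1):
            grouped[index].append(duration)
    return {index: summarise(values) for index, values in sorted(grouped.items())}


pooled = summarise([d for ds in CASE_EXECUTIONS.values() for d in ds])
per_index = by_execution_index(CASE_EXECUTIONS)
```

    In this example the pooled median describes none of the actual executions,
whereas the per-index summaries show that first executions are slow and
repetitions are fast.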


4   Trajectory Visualisation


Fig. 2: Example of the proposed visualisation for process performance compar-
isons based on case trajectories (variant 3)


    We therefore propose a novel visualisation technique inspired by geo-spatial
data visualisations (e.g. [3]) to present performance data in the context of both
history and future execution of a process instance. This visualisation presents
the path of individual cases through a process model, while showing timing
information in a third dimension, orthogonal to the process model. An example
of this technique can be seen in Figure 2.
    We construct this visualisation by replaying a token-game on a given Petri
net and recording each token move as a line in two dimensions. We then use the
time of each event that triggered the token move to calculate the height of the
start and end point of each line. Our implementation provides three different
configurations of the trajectory visualisation. The first variant visualises token
paths from one activity straight to the next activity. The second variant visualises
the token paths from the activity through the place to the next activity. The third
variant visualises the token path from the activity along the edge connecting it to
the place and then along the edge to the next activity. Each successive variant
increases the complexity of the visualisation, but lines that follow the model
layout more precisely are often easier to relate back to the underlying process
model and therefore easier to understand. To further facilitate this, vertical
support lines can be displayed by selecting nodes in the process model, as shown
in Figure 2.
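
    A minimal sketch of the first variant (activity straight to the next
activity) is given below. It is an illustration under simplifying assumptions,
not the tool's code: the layout coordinates, the function name case_trajectory
and the time_scale parameter are hypothetical, and the token game is reduced to
walking a sequential trace rather than replaying a Petri net.

```python
# Hypothetical 2D layout coordinates of the activities in the process model.
LAYOUT = {"A": (0.0, 0.0), "B": (2.0, 1.0), "C": (4.0, 0.0)}


def case_trajectory(trace, layout, t0, time_scale=1.0):
    """Turn one case into 3D line segments (activity straight to activity).

    trace:      list of (activity, timestamp) pairs ordered by timestamp
    t0:         reference time that is mapped to height 0
    time_scale: height units per time unit (assumed vertical scaling factor)
    """
    segments = []
    for (src, t_src), (dst, t_dst) in zip(trace, trace[1:]):
        x1, y1 = layout[src]
        x2, y2 = layout[dst]
        # The x/y position comes from the model layout; the height of each
        # end point comes from the time of the event at that activity.
        segments.append(
            ((x1, y1, (t_src - t0) * time_scale),
             (x2, y2, (t_dst - t0) * time_scale))
        )
    return segments


segments = case_trajectory(
    [("A", 10), ("B", 25), ("C", 40)], LAYOUT, t0=10, time_scale=0.5
)
```

    Steep segments in such a trajectory correspond to long elapsed times
between two activities, while flat segments correspond to quick transitions.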
    In addition to the shape of case trajectories, colours can encode additional
information in the visualisation. By default, case trajectories are coloured to
indicate the cohort a case belongs to (see Figure 3). However, our implemen-
tation can also colour the trajectory to display relative completion of the case
as a colour gradient. This can facilitate finding bottlenecks in large event logs.
Furthermore, the cohort classification can be used to filter the visualisation, by
hiding trajectories belonging to a particular cohort. Lastly, the vertical scale of
the visualisation can be changed by clicking on the white frame surrounding the
trajectories and pulling it upwards or downwards. This can make it easier to see
differences between otherwise densely packed trajectories.
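
    The colour gradient and the cohort filter can be sketched as follows. The
function names, the blue-to-red gradient and the definition of relative
completion as elapsed time divided by total case duration are assumptions made
for illustration; the tool may define these differently.

```python
def completion_colour(t, case_start, case_end,
                      start_rgb=(0, 0, 255), end_rgb=(255, 0, 0)):
    """Map a point in time to an RGB colour by relative case completion.

    Assumption: completion is the elapsed fraction of the case duration,
    clamped to [0, 1] and interpolated linearly between two colours.
    """
    span = case_end - case_start
    fraction = 0.0 if span == 0 else (t - case_start) / span
    fraction = min(1.0, max(0.0, fraction))
    return tuple(
        round(a + (b - a) * fraction) for a, b in zip(start_rgb, end_rgb)
    )


def visible_trajectories(trajectories, hidden_cohorts):
    """Filter out trajectories that belong to a hidden cohort."""
    return [t for t in trajectories if t["cohort"] not in hidden_cohorts]


# Hypothetical trajectories, reduced to the fields needed for filtering.
TRAJECTORIES = [
    {"case": "c1", "cohort": "gold"},
    {"case": "c2", "cohort": "basic"},
]

shown = visible_trajectories(TRAJECTORIES, hidden_cohorts={"basic"})
```

    With such a gradient, a region of the model where many trajectories still
show early-completion colours at a large height hints at cases that spend a
long time there.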
    Seeing both control-flow and time perspective in one view enables users to
identify interactions between control-flow constructs such as loops and process
execution times. Using this technique together with the previously presented
techniques for comparative performance visualisation (see [7]) therefore facili-
tates the understanding of performance analysis results.


Fig. 3: Trajectory view of a complex event log with colour encoding the cohort
a trajectory belongs to


5     Conclusion

We have presented ProcessProfiler3D, a framework that can be used to analyse
and compare the performance of multiple process cohorts. The usefulness of this
framework has previously been demonstrated by analysing two industry data
sets and evaluating the tool with two industry partners [7]. In this paper we
have added a novel visualisation technique, the trajectory visualisation, to this
framework, to address the loss of context in the existing performance analysis
approaches.
    The framework is available as a package (called “ProcessProfiler3D”) for the
process mining framework ProM. In addition, the complete source code for the
tool including the trajectory visualisation is available in the ProM repository⁴.
    A screencast of the tool including the new technique is available at:
    https://www.youtube.com/watch?v=CkgBTFk6MXY


6     References

1. van der Aalst, W.M.P.: Process Mining: Data Science in Action. Springer (2016)
2. Bolt, A., de Leoni, M., van der Aalst, W.M.P., Gorissen, P.: Exploiting process
   cubes, analytic workflows and process mining for business process reporting: A case
   study in education. In: International Symposium on Data-driven Process Discovery
   and Analysis. pp. 33–47. CEUR-WS.org (2015)
3. Kraak, M.J.: The space-time cube revisited from a geovisualization perspective. In:
   Proc. 21st International Cartographic Conference. pp. 1988–1996 (2003)
4. Partington, A., Wynn, M., Suriadi, S., Ouyang, C., Karnon, J.: Process mining
   for clinical processes: A comparative analysis of four Australian hospitals. ACM
   Transactions on Management Information Systems 5(4), 19:1–19:18 (Jan 2015)
5. Pini, A., Brown, R., Wynn, M.T.: Process visualization techniques for multi-
   perspective process comparisons. In: Bae, J., Suriadi, S., Wen, L. (eds.) Asia Pacific
   Business Process Management. Lecture Notes in Business Information Processing,
   vol. 219, pp. 183–197. Springer, Busan, Korea (March 2015)
6. Suriadi, S., Wynn, M.T., Ouyang, C., ter Hofstede, A.H.M., van Dijk, N.J.: Un-
   derstanding process behaviours in a large insurance company in Australia: A case
   study. In: Salinesi, C., Norrie, M.C., Pastor, O. (eds.) Advanced Information Systems
   Engineering, Lecture Notes in Computer Science, vol. 7908, pp. 449–464. Springer
   (2013)
7. Wynn, M.T., Poppe, E., Xu, J., ter Hofstede, A.H.M., Brown, R.A., Pini, A., van der
   Aalst, W.M.P.: ProcessProfiler3D: A visualisation framework for log-based process
   performance comparison. Decision Support Systems (2017, in press),
   https://doi.org/10.1016/j.dss.2017.04.004


⁴ https://svn.win.tue.nl/repos/prom/Packages/ProcessProfiler3D/