=Paper=
{{Paper
|id=Vol-2469/ERDemo01
|storemode=property
|title=Comprehensive Process Drift Analysis with the Visual Drift Detection Tool
|pdfUrl=https://ceur-ws.org/Vol-2469/ERDemo01.pdf
|volume=Vol-2469
|authors=Anton Yeshchenko,Claudio Di Ciccio,Jan Mendling,Artem Polyvyanyy
|dblpUrl=https://dblp.org/rec/conf/er/YeshchenkoCMP19
}}
==Comprehensive Process Drift Analysis with the Visual Drift Detection Tool==
<pdf width="1500px">https://ceur-ws.org/Vol-2469/ERDemo01.pdf</pdf>
<pre>
               Comprehensive Process Drift Analysis
                with the Visual Drift Detection Tool

 Anton Yeshchenko¹ [0000-0002-5346-8358], Claudio Di Ciccio¹ [0000-0001-5570-0475],
 Jan Mendling¹ [0000-0002-7260-524X], and Artem Polyvyanyy² [0000-0002-7672-1643]
               ¹ Vienna University of Economics and Business, Vienna, Austria
          {anton.yeshchenko,claudio.di.ciccio,jan.mendling}@wu.ac.at
               ² The University of Melbourne, Parkville, VIC, 3010, Australia
                         artem.polyvyanyy@unimelb.edu.au

       Abstract. Recent research has introduced ideas from concept drift into process
       mining to enable the analysis of changes in business processes over time. This
       stream of research, however, has not yet addressed the challenges of drift categoriza-
       tion, drilling-down, and quantification. In this tool demonstration paper, we present
       a novel software tool to analyze process drifts, called Visual Drift Detection (VDD),
       which fulfills these requirements. The tool is of benefit to the researchers and prac-
       titioners in the business intelligence and process analytics area, and can constitute
       a valuable aid to those who are involved in business process redesign endeavors.

       Keywords: Process mining · Time series analysis · Change point detection · Declar-
       ative process models


1   Introduction
The availability of data has extended conceptual modeling, a research field concerned
with manually created models, with automatic techniques for generating models from data.
Process mining is one of these recent extensions; it is concerned with providing transparency
into how businesses operate based on real-world event data. Process discovery is a
branch of process mining that takes as input event logs, i.e., collections of event sequences
(traces) wherein every event corresponds to an activity execution, and returns the model
that best describes the process generating the event log. However, process mining analyzes
aggregated snapshots of sequentially stored process executions. Therefore, it can overlook
the behavioral changes that occur in the time lapse during which those data were gathered.
In data mining, such a change over time is called a drift. A drift is a concept that process
mining has addressed only to a limited extent so far.
    Recent works such as that of Maaradji et al. [7] and Ostovar et al. [8] have focused on
the identification of specific drift types, based on the tracking of behavioral relations over
time through statistical tests. In this paper, we present a novel technique for process drift
detection, called Visual Drift Detection (VDD), which extends existing techniques by not
only finding drifts but also helping the user recognize their type. Furthermore, it facilitates
assessment of drifts through visual interpretation [10]. Our technique is founded on the
formal rigor of the temporal logic of DECLARE constraints [1,4] and on time series analysis [9].
Key strengths of our technique are the clustering of declarative behavioral constraints that
exhibit similar trends of change over time and the automatic detection of drift points. These
features allow us to detect and explain drifts that would otherwise go undetected
by other techniques. In this paper, we outline our technique and illustrate the usage of

Copyright © 2019 for this paper by its authors. Use permitted under Creative Com-
mons License Attribution 4.0 International (CC BY 4.0).


           Fig. 1: Drift types, cf. [5, Fig. 2]; notice that an outlier is not a drift.


the tool on a real-world data set publicly available on the 4TU Data Centre.1 The event
log contains events from a ticketing management process of the help desk of an Italian
software company [8]. We will henceforth refer to that data set as the Italian help desk log.
    This is a tool demonstration paper illustrating the software implementation of the
VDD approach, which is detailed in [11] and shown in a dedicated video.2


2     Preliminaries

Process drifts. A process drift is a notion in process mining for analyzing behavioral
changes of business processes over time. The specific challenge is to not only spot a drift but
also to classify it. Figure 1 shows established drift classes from data mining. A sudden drift
is typically caused by an intervention, such as a new law exerting a change in the control
flow. An incremental drift might result from a stepwise introduction of new routines. A
gradual drift may result from a new policy to be adopted in the enactment of the process.
Finally, a reoccurring drift might result from specific measures taken at every occurrence
of seasonal events, e.g., during holidays or festive days, weekends, night shifts, and so
on. An outlier corresponds to an isolated episode and, as such, it does not qualify as a drift.
Existing process mining techniques support these types of drifts only to a limited extent.
Declarative process constraints. In our approach, diagrams like those in Fig. 1 are
depicted considering the confidence level of process behavioral rules over time. In
particular, we resort to the repertoire of rules provided by the declarative specification
language DECLARE [1,4]. Examples of DECLARE constraints are RESPONSE(a,b),
ALTERNATE RESPONSE(a,b), and CHAIN RESPONSE(a,b). The first constraint applies
the RESPONSE template to tasks a and b, and states that if a occurs then b must occur
later on within the same trace. In this case, a is named activation because it is mentioned
in the “if” clause, thus triggering the constraint, whereas b is named target because
it is in the “then” clause. RESPONSE(a,b) holds true in a trace like ⟨a,c,a,c,b⟩.
ALTERNATE RESPONSE(a,b) asserts that RESPONSE(a,b) holds true and a does not recur
before b, as in ⟨a,b,c,a,c,b⟩. CHAIN RESPONSE(a,b) imposes that RESPONSE(a,b) holds
true and no other task occurs between a and b, as in ⟨a,b,c,a,b⟩. Declarative process mining
tools can measure to what degree constraints hold true in a given event log. One such measure
is confidence [4]. It is computed as the number of activations that lead to a satisfaction
of the constraint (e.g., the number of a’s eventually followed by an occurrence of b for
RESPONSE(a,b)) scaled by the percentage of traces in which the activation (a) occurs, to
penalize constraints that are triggered sporadically. The confidence value ranges from 0 to 1.
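
The three templates and the confidence measure can be illustrated with a minimal
Python sketch. It is not the MINERful implementation: the function names
(satisfied_activations, confidence) and the sample log are hypothetical, and
confidence is read here as the share of satisfied activations multiplied by the
share of traces that contain the activation, an assumption that keeps the value
between 0 and 1.

```python
def satisfied_activations(trace, a, b, template):
    """Count activations of `a` in one trace and how many of them are satisfied."""
    satisfied = total = 0
    for i, event in enumerate(trace):
        if event != a:
            continue
        total += 1
        rest = trace[i + 1:]  # events after this activation
        if template == "response":
            ok = b in rest
        elif template == "alternate_response":
            # b must follow, and a must not recur before that b
            ok = b in rest and a not in rest[:rest.index(b)]
        elif template == "chain_response":
            ok = bool(rest) and rest[0] == b
        else:
            raise ValueError(f"unknown template: {template}")
        satisfied += ok
    return satisfied, total


def confidence(log, a, b, template):
    """Assumed reading: share of satisfied activations x share of traces with `a`."""
    satisfied = total = traces_with_a = 0
    for trace in log:
        s, t = satisfied_activations(trace, a, b, template)
        satisfied += s
        total += t
        traces_with_a += t > 0
    if total == 0:
        return 0.0
    return (satisfied / total) * (traces_with_a / len(log))


# Hypothetical log: the three example traces from the text plus one without `a`.
log = [list("acacb"), list("abcacb"), list("abcab"), list("cc")]
for template in ("response", "alternate_response", "chain_response"):
    print(template, confidence(log, "a", "b", template))
```

On this toy log the confidence decreases from RESPONSE to ALTERNATE RESPONSE to
CHAIN RESPONSE, reflecting that each template is stricter than the previous one.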

 1
     https://doi.org/10.4121/uuid:0c60edf1-6f83-4e75-9367-4c63b3e9d5bb
 2
     https://youtu.be/_AZpI_YTjO8

      Table 1: Italian help desk log constraints, including min, max, and mean confidence (in %).

 Cluster  Constraint            Activity 1             Activity 2               Min     Max    Mean
 9        CHAIN PRECEDENCE      Take in charge ticket  Create SW anomaly        0.0   100.0    42.8
 9        ALTERNATE PRECEDENCE  Assign seriousness     Create SW anomaly        0.0   100.0    49.0
 11       CHAIN PRECEDENCE      Take in charge ticket  Schedule intervention    0.0   100.0     9.9
 11       ALTERNATE PRECEDENCE  Assign seriousness     Schedule intervention    0.0   100.0     9.9
 4        CHAIN RESPONSE        Take in charge ticket  Wait                     9.4    69.6    23.2
 4        NOT SUCCESSION        Resolve ticket         Wait                    10.0    77.2    26.0
 4        NOT SUCCESSION        Wait                   Assign seriousness      10.0    78.0    26.6
 4        NOT SUCCESSION        Wait                   Take in charge ticket    9.8    73.3    22.1
 4        ALTERNATE RESPONSE    Assign seriousness     Wait                     9.0    72.3    23.8
 4        ALTERNATE RESPONSE    Wait                   Closed                   8.3    61.4    22.5
 4        ALTERNATE RESPONSE    Wait                   Resolve ticket           8.3    61.4    22.8
 4        AT MOST ONE           Wait                                            9.8    68.6    25.1


3     The VDD Approach

Our multi-staged technique takes as input an event log (henceforth, log for short) and
returns visual diagnostics on process drifts. It consists of three main steps, which we
explain by applying them to the case study of the Italian help desk log.
     In the first step, we sort the traces in the log by the timestamp of their respective first
events. Thereupon, we extract a sub-log of a given window size from the first traces. We
let the window slide over the log at a given step. From each sub-log we mine the set of
DECLARE constraints and compute their confidence. In our case study, we set the window
size to 100 traces and the sliding step to 50 traces. For the sample log, we mine DECLARE
constraints out of 90 sub-logs. For each sub-log, we compute the confidence of 2604
constraints, including those reported in Table 1.
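
The windowing of the first step can be sketched in a few lines of Python. The
function name sliding_sublogs, the representation of a trace as a list of
(timestamp, activity) pairs, and the decision to drop an incomplete trailing
window are assumptions of this sketch, not details taken from the tool. With the
4580 traces of the case study (see Section 4), a window of 100 and a step of 50
yield the 90 sub-logs mentioned above.

```python
def sliding_sublogs(traces, window_size=100, step=50):
    """Sort traces by their first event's timestamp and cut overlapping sub-logs.

    Each trace is a list of (timestamp, activity) pairs. Dropping an incomplete
    trailing window is an assumption of this sketch.
    """
    ordered = sorted(traces, key=lambda trace: trace[0][0])
    last_start = len(ordered) - window_size
    return [ordered[start:start + window_size]
            for start in range(0, last_start + 1, step)]


# Synthetic stand-in for the case-study log: 4580 single-event traces,
# deliberately given in reverse chronological order.
traces = [[(timestamp, "Assign seriousness")] for timestamp in reversed(range(4580))]
sublogs = sliding_sublogs(traces)
print(len(sublogs))  # 90
```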
     In the second step, we extract multi-variate time series that represent the trends of
the constraints’ confidence. Then, we cluster those time series with hierarchical cluster-
ing [2] to find groups of constraints that exhibit similar confidence trends (henceforth,
behavior clusters). We resort to the Pruned Exact Linear Time (PELT) algorithm [6] to
detect change points in the whole multi-variate time series as well as within the behavior
clusters. The change points denote process drifts. In our case study, we identify 16 clusters,
including those in Table 1. We detect 3 process drifts for the overall multi-variate time
series, and 12 within-cluster change points. We observe that the overall process drifts
that our technique discovers include those that were found by ProDrift [8]. They occur
approximately in the first half and towards the end of the time span.
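
To make the change point detection more concrete, the following is a minimal,
self-contained sketch of PELT [6] for a one-dimensional series with a squared-error
segment cost. It is only an illustration: the function name pelt_change_points, the
cost function, the penalty value, and the toy series are assumptions, whereas the
tool itself relies on the ruptures library (see Section 4) and works on multi-variate
series.

```python
def pelt_change_points(signal, penalty):
    """Return the indices at which a new segment starts (1-D PELT, squared-error cost)."""
    n = len(signal)
    # Prefix sums give the cost of any segment in constant time.
    sums, squares = [0.0], [0.0]
    for value in signal:
        sums.append(sums[-1] + value)
        squares.append(squares[-1] + value * value)

    def cost(start, end):
        """Sum of squared deviations from the mean over signal[start:end]."""
        total = sums[end] - sums[start]
        return (squares[end] - squares[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty
    previous = [0] * (n + 1)  # previous[t]: last change point before t
    candidates = [0]
    for t in range(1, n + 1):
        values = [(best[tau] + cost(tau, t) + penalty, tau) for tau in candidates]
        best[t], previous[t] = min(values)
        # Pruning step: discard candidates that can never become optimal again.
        candidates = [tau for value, tau in values if value - penalty <= best[t]]
        candidates.append(t)

    change_points = []
    t = n
    while previous[t] > 0:
        t = previous[t]
        change_points.append(t)
    return sorted(change_points)


# Toy confidence series with one sudden drift after 30 windows (made-up values).
series = [0.1] * 30 + [0.9] * 30
print(pelt_change_points(series, penalty=0.5))  # [30]
```

The penalty controls how many change points are reported: larger values demand a
bigger cost reduction before a new segment is opened.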
     In the third step, we plot graphical representations to visually identify and character-
ize the detected drifts. We create two categories of visual aids: Drift Maps and Drift Charts.
Drift Maps display all drifts data on a two-dimensional plane. Figure 2(a) illustrates the
Drift Map for overall drifts and Fig. 2(b) shows the drifts within behavior clusters. The
x-axis is the time axis, while every constraint corresponds to a point along the y-axis. We
add vertical lines to mark the identified change points, i.e., drift points, and horizontal
lines to demarcate clusters. Constraints are sorted by the similarity of their confidence trends.
The values of the time series are represented through the color-blind-friendly plasma color
map [10], from blue (low peak) to yellow (high peak). Drift Charts (e.g., those in Fig. 3)
have time on the x-axis and average confidence of the constraints in a cluster on the y-axis.
We add vertical lines to denote change points as in Drift Maps. In order to find and pinpoint


     Fig. 2: Italian help desk log VDD visualizations: (a) overall change points,
     (b) drifts by cluster, (c) most erratic cluster.


the most interesting (erratic) behavior clusters, we define a measure inspired by the idea
of finding the length of a poly-line in a plot. The rationale is that straight lines denote a
regular trend and have the shortest length, whilst more irregular, wavy curves evidence more
behavior changes and are therefore longer.
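
One possible reading of this measure is sketched below. The paper does not give the
exact formula, so the function name polyline_length, the unit spacing between
consecutive windows on the x-axis, and the toy series are assumptions made for
illustration only.

```python
import math


def polyline_length(values, x_step=1.0):
    """Length of the poly-line through the points (i * x_step, values[i])."""
    return sum(math.hypot(x_step, later - earlier)
               for earlier, later in zip(values, values[1:]))


# Made-up average confidence series of two hypothetical behavior clusters.
clusters = {
    "steady": [0.5, 0.5, 0.5, 0.5, 0.5],
    "wavy": [0.2, 0.8, 0.2, 0.8, 0.2],
}
most_erratic = max(clusters, key=lambda name: polyline_length(clusters[name]))
print(most_erratic)  # wavy
```

Note that the result depends on how the two axes are scaled relative to each other,
so the ranking of clusters is only meaningful for a fixed scaling.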
Detecting and explaining drifts. As illustrated by the VDD visualization in Fig. 2(a), we
detect a sudden change in the first quarter, in addition to the two identified by ProDrift.
Following on that, we analyze the within-cluster changes (Fig. 2(b)) and notice that the
most erratic cluster contains an outlier, as shown by the spike in Fig. 2(c).
     In Fig. 3, we illustrate the most erratic examples of behavior, and, in Table 1, we
present the constraints that describe that specific behavior after applying the constraint
minimization algorithm of [3]. Figure 3(a) (cluster 9) shows an erratic behavior, which
visually corresponds to the reoccurring concept classification from Fig. 1. By
examining the constraints that constitute this behavior, we can conclude that on the dates
of the peak in Fig. 3(a) the activity Create SW anomaly always had Take in charge ticket
executed immediately beforehand (CHAIN PRECEDENCE). Also, we can conclude
that before Create SW anomaly, the Assign seriousness activity was executed and no
other Create SW anomaly occurred in between (ALTERNATE PRECEDENCE). Figure 3(b)
(cluster 11) has four spikes, where Schedule intervention activities occurred. Immediately
before Schedule intervention, Take in charge ticket occurred. Also, Assign seriousness had
to occur before Schedule intervention recurred. We notice, however, that this cluster shows
outlier behavior, due to its rare changes. Finally, Fig. 3(c) (cluster 4) depicts a gradual drift
until June 2012, and an incremental drift afterward. We notice that all constraints in the
cluster have Wait either as an activation (e.g., with ALTERNATE RESPONSE(Wait, Closed))
or as a target (e.g., with CHAIN RESPONSE(Take in charge ticket, Wait)).


4      Maturity, Documentation and Screencast

We implemented the VDD tool in Python 3, resorting to the scipy library for time-series
clustering and the ruptures library for change point identification. We used the MINERful3
Java package for constraints discovery. We ran our experiments on a laptop equipped
with an Intel Core i5 at 2.40 GHz × 4 with 16 GB of RAM. With this modest hardware, the
 3
     https://github.com/cdc08x/MINERful


     Fig. 3: Italian help desk log detailed clusters: (a) cluster 9, (b) cluster 11,
     (c) cluster 4.


tool was able to process data and produce the analysis outcome in about 27 seconds using
a real-size event log with 21438 events from 14 activities over 4580 traces. This indicates
that the VDD tool has reached a fairly large degree of maturity as it performs well in terms
of scalability. In future work, we will focus on the prediction of drifts in running processes
and study how to improve the interpretability of depicted results.
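
As an illustration of the scipy-based clustering mentioned above, the following sketch
groups a handful of confidence time series with hierarchical clustering. The linkage
method ("ward"), the requested number of clusters, and the toy values are assumptions
of this sketch; the settings actually used by the tool are not stated here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy confidence series (rows = constraints, columns = windows); values are made up.
series = np.array([
    [0.1, 0.1, 0.9, 0.9, 0.9],
    [0.2, 0.1, 0.8, 0.9, 0.8],
    [0.9, 0.9, 0.1, 0.1, 0.1],
    [0.8, 0.9, 0.2, 0.1, 0.2],
])

# Build the cluster tree, then cut it so that at most two clusters remain.
tree = linkage(series, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Constraints whose confidence rises over time end up in one behavior cluster, and
those whose confidence falls end up in the other.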
    We have created a project website for the VDD tool, from which it can be downloaded
together with its sources.4 It is free for academic and non-commercial use under the MIT
license. On the project website, we provide documentation on its installation and first run.
A screencast documenting its usage is available at https://youtu.be/_AZpI_YTjO8.
Acknowledgements. This work is partially funded by the EU H2020 program under
MSCA-RISE agreement 645751 (RISE BPM). Artem Polyvyanyy was partly supported
by the Australian Research Council Discovery Project DP180102839.
References
 1. van der Aalst, W.M.P., Pesic, M.: DecSerFlow: Towards a truly declarative service flow
    language. In: WS-FM. Lecture Notes in Computer Science, vol. 4184, pp. 1–23. Springer (2006)
 2. Aghabozorgi, S., Seyed Shirkhorshidi, A., Ying Wah, T.: Time-series clustering - a decade
    review. IS 53(C), 16–38 (Oct 2015)
 3. Di Ciccio, C., Maggi, F.M., Montali, M., Mendling, J.: Resolving inconsistencies and
    redundancies in declarative process models. IS 64, 425–446 (Mar 2017)
 4. Di Ciccio, C., Mecella, M.: On the discovery of declarative control flows for artful processes.
    ACM TMIS 5(4), 24:1–24:37 (2015)
 5. Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.: A survey on concept drift
    adaptation. ACM Comput. Surv. 46(4), 44:1–44:37 (2014)
 6. Killick, R., Fearnhead, P., Eckley, I.A.: Optimal detection of changepoints with a linear com-
    putational cost. Journal of the American Statistical Association 107(500), 1590–1598 (2012)
 7. Maaradji, A., Dumas, M., La Rosa, M., Ostovar, A.: Detecting sudden and gradual drifts in
    business processes from execution traces. IEEE TKDE 29(10), 2140–2154 (2017)
 8. Ostovar, A., Leemans, S.J., La Rosa, M.: Robust drift characterization from event streams of
    business processes (2018), https://eprints.qut.edu.au/121158/
 9. Reinsel, G.C.: Elements of multivariate time series analysis. Springer (1993)
10. Ware, C.: Information visualization: perception for design. Elsevier (2012)
11. Yeshchenko, A., Di Ciccio, C., Mendling, J., Polyvyanyy, A.: Comprehensive process drift
    detection with visual analytics. In: ER. Springer (2019), in print

 4
     https://github.com/yesanton/Process-Drift-Visualization-With-Declare

</pre>