=Paper=
{{Paper
|id=Vol-2703/paperDC6
|storemode=property
|title=Behavioural Clustering by Extensive Declarative Specifications Measurements (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-2703/paperDC6.pdf
|volume=Vol-2703
|authors=Alessio Cecconi
|dblpUrl=https://dblp.org/rec/conf/icpm/Cecconi20
}}
==Behavioural Clustering by Extensive Declarative Specifications Measurements (Extended Abstract)==
Alessio Cecconi
Vienna University of Economics and Business, Vienna, Austria
alessio.cecconi@wu.ac.at
I. INTRODUCTION

The recognition, classification and grouping of distinct process behaviours in an event log is a key aspect of process analysis. In unstructured and flexible process contexts this is not straightforward, and the literature devises different techniques to tackle the problem. An effective one has been found in trace clustering, namely a set of techniques which automatically group similar traces according to specified criteria, allowing for better understandability and decreased complexity of the analysis. However, all available clustering techniques are designed exclusively with procedural process models in mind. For those techniques the key aspect for trace similarity is the precise sequence of execution of events, as they consider only events that immediately follow or precede one another. Yet, the properties and relations of events in a process may fall outside such a narrow scope.

In our research, we want to explore the opportunity of employing declarative process mining for trace clustering. We believe that the characteristics of declarative specifications can lead to novel results given the focus on different relations of the events in the event log. Indeed, a declarative rule describes a desired property of the process, not a specific execution. Thus, grouping around them suggests clusters centred on flexible, complex, and yet specific behaviours of the process instead of strict event sequence similarity.

Any clustering technique is based on similarity (or distance) concepts describing how close or distant objects are. Nevertheless, the current evaluation methods for declarative rules are too limited to devise a comprehensive similarity concept for traces based on rules. To fill this gap, an extensive measurement system for declarative specifications is required.

II. BACKGROUND

Trace Clustering. The goal of trace clustering is to find traces of similar behaviour and group them into clusters. The guiding rule is to maximise the similarity within a cluster while maximising the distance from the other clusters. Three main classes of approaches exist: (i) Vector-based, where the traces are transformed into feature vectors and distance metrics are used in the vector space (e.g. [1], [2]); (ii) Context-aware, where string distance metrics are applied directly on the whole traces (e.g. [4], [9]); (iii) Model-based, where traces are clustered around fitting process models (e.g. [5], [6]). Trace clustering has been employed in process mining to assist the discovery of procedural process models. Once the event log is divided into different clusters, the discovery techniques can be applied to each cluster separately to discover its model, resulting in a set of simpler and more understandable models of particular behaviours of the process. Table I summarizes the current applications of trace clustering in process mining.

{| class="wikitable"
|+ Table I: Process mining techniques using trace clustering.
! Technique !! Clustering approach !! Control-flow perspective !! Data perspective !! Clustering algorithms
|-
| Greco et al. [1] || Vector-based || procedural || no || K-means, Hierarchical Clustering
|-
| Song et al. [2] || Vector-based || procedural || yes || K-means, Quality Threshold, Agglomerative Clustering, Self-Organizing Maps
|-
| Jablonski et al. [3] || Vector-based || procedural || yes || Hierarchical clustering
|-
| Bose and van der Aalst [4] || Vector-based || procedural || no || Hierarchical clustering
|-
| Ferreira et al. [5] || Model-based || procedural || no || 1st order Markov chain Expectation-Maximization
|-
| De Koninck and De Weerdt [6] || Model-based || procedural || no || Active learning
|-
| Wang et al. [7] || Model-based || procedural || no || Constrained clustering, agglomerative hierarchical clustering, spectral clustering
|-
| Bose and van der Aalst [8] || Context-aware || procedural || no || edit-distance, agglomerative clustering
|-
| Evermann et al. [9] || Context-aware || procedural || no || K-means
|-
| Nguyen et al. [10] || Mixed || procedural || yes || Graph path similarity
|-
| De Koninck and De Weerdt [11] || Mixed || procedural || yes || K-means, active learning
|}

It can be noticed that different approaches, perspectives, and algorithms have been tried, yet all the current trace clustering techniques in process mining share, not really a limit, but rather a common trait: only procedural models are considered. Accordingly, the control-flow perspective is inspected only for its contiguous subsequences, i.e., only directly-follows relations, thus local proximity of activities is preferred in the clustering composition. This is not a limit of the clustering techniques per se, but of the object used to derive the characteristics on which the clustering is based. For example, consider two traces ⟨a, b, c, d, e, f⟩ and ⟨b, a, d, c, f, e⟩ where the events are pairwise swapped, but a transitivity property between tasks a, c, and e is preserved (i.e., a → c → e). If this transitivity property is of interest, both traces should be grouped in the same cluster, but the directly-follows relations of the two traces differ, thus they may appear too different to be placed in the same cluster. As a result, similar traces may be separated or different ones may be grouped.

Evaluation of declarative specifications. Declarative process mining mostly resorts to quality measures from association rule mining [12] to qualify single rules with respect to event logs. Support and confidence are the most adopted measures in that regard, yet they are reportedly not sufficient to avoid a great amount of spurious results [13], which threatens the statistical soundness of the results. Also, there are different
definitions for support [14], [15], [16] and confidence [14], [15], [16]. For example, the support measure of [16] cannot be compared to the support of [14] because of the different definitions. Furthermore, these techniques define the measures only for a limited set of rules (i.e., the standard DECLARE rule set). Thus, the comparison of techniques is hampered by their customized definitions of the same measures, and the transferability of the measures themselves between techniques is limited. The result is a scattered adoption of a small set of measures dependent either on a specific language or on a specific set of rules. Other measures have been studied to go beyond this limit [17], yet they have not been fully exploited in the process mining area. Thus, a more advanced and extensive evaluation system for declarative specifications is required to base trace clustering on them efficiently.

III. CONTRIBUTION

With this research we aim to explore the integration of declarative process mining and trace clustering. The expressiveness of declarative rules can allow for a new clustering based on clear desired properties of the process, and not on strict event sequences. In order to do so, an extension of the current evaluation techniques for declarative specifications is required.

A declarative specification allows for complex relations among activities regardless of their distance in the execution flow. That is because each specification models a desired property of the process, not a specific execution. To the best of our knowledge, the combination of declarative process mining with trace clustering is still unexplored. We believe that this novel intuition can lead to distinct and interesting results, beyond the reach of procedural processes. Also, clustering around rules makes the clustering semantics explicit, easing supervised techniques and the injection of expert knowledge.

To make this clustering possible, it is mandatory to devise a similarity concept between traces and rules. Indeed, a declarative injection can be used for both model-based and vector-based techniques. For both, it is paramount to devise an informative evaluation of the rules on the trace. The validity or violation of a rule in a trace can be a possible direction, but the boolean evaluation may be too limited to clearly differentiate the clusters. Furthermore, it would be a single perspective, not enough to build a feature vector. A more flexible and broad means of rule evaluation would be desirable, but the current declarative techniques are limited in that regard. For this reason we will devise an extensive measurement framework for declarative specifications going beyond these limits.

The goal of our measurement framework is to provide a sound ground on which to define, compute, and verify measures for generic temporal logic formulae. The similarity function for the clustering of traces will be based on top of it. In order to validate these results, we are going to implement the measurement framework first and the overall behavioural clustering afterwards in a proof-of-concept software with which experimental evaluations will be conducted. The empirical evaluation of the techniques will be carried out both on simulated artificial data and on publicly available real-life data like the BPI Challenge datasets, e.g. [18]. The controlled environment of a simulation is required to check the validity of the results in the absence of a ground truth, while real-life data allows us to assess the feasibility of the technique in realistic settings.

IV. CONCLUSION

Trace clustering is a relevant topic and the employment of declarative process mining in that regard is promising and, notably, still unexplored. Yet, the current evaluation systems for declarative specifications are not sufficient for a truly effective trace clustering based on them. Given these open points, there is a call for: (i) an extended evaluation system for declarative specifications; (ii) a novel application of declarative process mining for trace clustering. Notably, we recently achieved the first point in [19], based on our previous work [20].

REFERENCES

[1] G. Greco, A. Guzzo, L. Pontieri, and D. Saccà, “Discovering expressive process models by clustering log traces,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 8, pp. 1010–1027, 2006.
[2] M. Song, C. W. Günther, and W. M. P. van der Aalst, “Trace clustering in process mining,” in BPM Workshops, 2008, pp. 109–120.
[3] S. Jablonski, M. Röglinger, S. Schönig, and K. M. Wyrtki, “Multi-perspective clustering of process execution traces,” Enterp. Model. Inf. Syst. Archit. Int. J. Concept. Model., vol. 14, pp. 2:1–2:22, 2018.
[4] R. P. J. C. Bose and W. M. P. van der Aalst, “Trace clustering based on conserved patterns: Towards achieving better process models,” in BPM Workshops, 2009, pp. 170–181.
[5] D. R. Ferreira, M. Zacarias, M. Malheiros, and P. Ferreira, “Approaching process mining with sequence clustering: Experiments and findings,” in BPM, 2007, pp. 360–374.
[6] P. De Koninck and J. De Weerdt, “Multi-objective trace clustering: Finding more balanced solutions,” in BPM Workshops, 2016, pp. 49–60.
[7] P. Wang, W. Tan, A. Tang, and K. Hu, “A novel trace clustering technique based on constrained trace alignment,” in HCC, 2017, pp. 53–63.
[8] R. P. J. C. Bose and W. M. P. van der Aalst, “Context aware trace clustering: Towards improving process mining results,” in SIAM International Conference on Data Mining, 2009, pp. 401–412.
[9] J. Evermann, T. Thaler, and P. Fettke, “Clustering traces using sequence alignment,” in BPM Workshops, 2015, pp. 179–190.
[10] P. Nguyen, A. Slominski, V. Muthusamy, V. Ishakian, and K. Nahrstedt, “Process trace clustering: A heterogeneous information network approach,” in SIAM International Conference on Data Mining, 2016, pp. 279–287.
[11] P. De Koninck and J. De Weerdt, “Scalable mixed-paradigm trace clustering using super-instances,” in ICPM, 2019, pp. 17–24.
[12] L. Geng and H. J. Hamilton, “Interestingness measures for data mining: A survey,” ACM Comput. Surv., vol. 38, no. 3, p. 9, 2006.
[13] W. Hämäläinen and G. I. Webb, “A tutorial on statistically sound pattern discovery,” Data Min. Knowl. Discov., vol. 33, no. 2, pp. 325–377, 2019.
[14] F. M. Maggi, R. P. J. C. Bose, and W. M. P. van der Aalst, “Efficient discovery of understandable declarative process models from event logs,” in CAiSE, 2012, pp. 270–285.
[15] S. Schönig, A. Rogge-Solti, C. Cabanillas, S. Jablonski, and J. Mendling, “Efficient and customisable declarative process mining with SQL,” in CAiSE, 2016, pp. 290–305.
[16] C. Di Ciccio and M. Mecella, “On the discovery of declarative control flows for artful processes,” ACM Trans. Management Inf. Syst., vol. 5, no. 4, pp. 24:1–24:37, 2015.
[17] T. B. Le and D. Lo, “Beyond support and confidence: Exploring interestingness measures for rule-based specification mining,” in SANER, 2015, pp. 331–340.
[18] B. F. van Dongen, “BPI challenge 2012,” Eindhoven University of Technology, 2012.
[19] A. Cecconi, G. De Giacomo, C. Di Ciccio, F. M. Maggi, and J. Mendling, “A temporal logic-based measurement framework for process mining,” in ICPM, 2020.
[20] A. Cecconi, C. Di Ciccio, G. De Giacomo, and J. Mendling, “Interestingness of traces in declarative process mining: The Janus LTLpf approach,” in BPM, 2018, pp. 121–138.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).