=Paper=
{{Paper
|id=Vol-2703/paperDC6
|storemode=property
|title=Behavioural Clustering by Extensive Declarative Specifications Measurements (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-2703/paperDC6.pdf
|volume=Vol-2703
|authors=Alessio Cecconi
|dblpUrl=https://dblp.org/rec/conf/icpm/Cecconi20
}}
==Behavioural Clustering by Extensive Declarative Specifications Measurements (Extended Abstract)==
<pdf width="1500px">https://ceur-ws.org/Vol-2703/paperDC6.pdf</pdf>
<pre>
        Behavioural Clustering by Extensive Declarative
        Specifications Measurements (Extended Abstract)

                        Alessio Cecconi
  Vienna University of Economics and Business, Vienna, Austria
                    alessio.cecconi@wu.ac.at


                          I. INTRODUCTION

   The recognition, classification and grouping of distinct process behaviours in an
event log is a key aspect of process analysis. For unstructured and flexible processes
this is not straightforward, and the literature proposes different techniques to tackle
the problem. An effective one is trace clustering, namely a set of techniques which
automatically group similar traces according to specified criteria, allowing for better
understandability and decreased complexity of the analysis. However, all available
clustering techniques are designed exclusively around procedural process models. For
those techniques the key aspect for trace similarity is the precise sequence of execution
of events, as they consider only events that immediately follow or precede one another.
Yet, the properties and relations of events in a process may fall outside such a narrow
scope.

   In our research, we want to explore the opportunity of employing declarative process
mining for trace clustering. We believe that the characteristics of declarative
specifications can lead to novel results, given their focus on different relations among
the events in the event log. Indeed, a declarative rule describes a desired property of
the process, not a specific execution. Thus, grouping around such rules suggests clusters
centred on flexible, complex, and yet specific behaviours of the process instead of
strict event-sequence similarity.

   Any clustering technique is based on similarity (or distance) concepts describing how
close or distant objects are. Nevertheless, the current evaluation methods for
declarative rules are too limited to devise a comprehensive similarity concept for traces
based on rules. To fill this gap, an extensive measurement system for declarative
specifications is required.

                          II. BACKGROUND

   Trace Clustering. The goal of trace clustering is to find traces of similar behaviour
and group them into clusters. The guiding rule is to maximise the similarity within a
cluster while maximising the distance from the other clusters. Three main classes of
approaches exist: (i) Vector-based, where the traces are transformed into feature vectors
and distance metrics are used in the vector space (e.g. [1], [2]); (ii) Context-aware,
where string distance metrics are applied directly on the whole traces (e.g. [4], [9]);
(iii) Model-based, where traces are clustered around fitting process models (e.g. [5],
[6]). Trace clustering has been employed in process mining to assist the discovery of
procedural process models. By dividing the event log into different clusters, the
discovery techniques can be applied to each cluster separately, resulting in a set of
simpler and more understandable models of particular behaviours of the process. Table I
summarizes the current applications of trace clustering in process mining.

Table I: Process mining techniques using trace clustering.

Technique                       Clustering      Control-flow   Data          Clustering
                                approach        perspective    perspective   algorithms
Greco et al. [1]                Vector-based    procedural     no            K-means, hierarchical clustering
Song et al. [2]                 Vector-based    procedural     yes           K-means, Quality Threshold, agglomerative
                                                                             clustering, Self-Organizing Maps
Jablonski et al. [3]            Vector-based    procedural     yes           Hierarchical clustering
Bose and van der Aalst [4]      Vector-based    procedural     no            Hierarchical clustering
Ferreira et al. [5]             Model-based     procedural     no            1st order Markov chain
                                                                             Expectation-Maximization
De Koninck and De Weerdt [6]    Model-based     procedural     no            Active learning
Wang et al. [7]                 Model-based     procedural     no            Constrained clustering, agglomerative
                                                                             hierarchical clustering, spectral clustering
Bose and van der Aalst [8]      Context-aware   procedural     no            Edit distance, agglomerative clustering
Evermann et al. [9]             Context-aware   procedural     no            K-means
Nguyen et al. [10]              Mixed           procedural     yes           Graph path similarity
De Koninck and De Weerdt [11]   Mixed           procedural     yes           K-means, active learning

   It can be noticed that different approaches, perspectives, and algorithms have been
tried, yet all the current trace clustering techniques in process mining share what is
not really a limit but rather a common trait: only procedural models are considered.
Accordingly, the control-flow perspective is inspected only for its contiguous
subsequences, i.e., only directly-follows relations, thus local proximity of activities
is favoured in the composition of clusters. This is not a limit of the clustering
techniques per se, but of the object used to derive the characteristics on which the
clustering is based. For example, consider two traces ⟨a, b, c, d, e, f⟩ and
⟨b, a, d, c, f, e⟩ where the events are pairwise swapped, but a transitivity property
between tasks a, c, and e is preserved (i.e., a → c → e). If this transitivity property
is of interest, both traces should be grouped in the same cluster, but the
directly-follows relations of the two traces differ, thus the traces may appear too
different to end up in the same cluster. As a result, similar traces may be separated or
different ones may be grouped.

   Evaluation of declarative specifications. Declarative process mining mostly resorts to
quality measures from association rule mining [12] to qualify single rules with respect
to event logs. Support and confidence are the most adopted measures in that regard, yet
they are reportedly not sufficient to avoid a large number of spurious results [13],
which threatens the statistical soundness of the results. Also, there are different
definitions of support [14], [15], [16] and confidence [14], [15], [16]. For example, the
support measure of [16] cannot be compared to the support of [14] because of the
different definitions. Furthermore, these techniques define the measures only for a
limited set of rules (i.e., the standard DECLARE rule set). Thus, the comparison of
techniques is hampered by their customized definitions of the same measures, and the
transferability of the measures themselves between techniques is limited. The result is a
scattered adoption of a small set of measures dependent either on a specific language or
on a specific set of rules. Various other measures have been studied to go beyond this
limit [17], yet they have not been fully exploited in the process mining area. Thus, a
more advanced and extensive evaluation system for declarative specifications is required
to efficiently base trace clustering on them.

                          III. CONTRIBUTION

   With this research we aim to explore the integration of declarative process mining and
trace clustering. The expressiveness of declarative rules can allow for a new clustering
based on clear desired properties of the process, and not on strict event sequences. In
order to do so, an extension of the current evaluation techniques for declarative
specifications is required.

   A declarative specification allows for complex relations among activities regardless
of their distance in the execution flow. That is because each specification models a
desired property of the process, not a specific execution. To the best of our knowledge,
the combination of declarative process mining with trace clustering is still unexplored.
We believe that this novel intuition can lead to distinct and interesting results, beyond
the reach of procedural processes. Also, clustering around rules makes the clustering
semantics explicit, easing supervised techniques and the injection of expert knowledge.

   To make this clustering possible, it is mandatory to devise a similarity concept
between traces and rules. Indeed, a declarative injection can be used for both
model-based and vector-based techniques. For both, it is paramount to devise an
informative evaluation of the rules on the trace. The validity or violation of a rule in
a trace can be a possible direction, but the boolean evaluation may be too limited to
clearly differentiate the clusters. Furthermore, it would be a single perspective, not
enough to build a feature vector. A more flexible and broad means of rule evaluation
would be desirable, but the current declarative techniques are limited in that regard.
For this reason we will devise an extensive measurement framework for declarative
specifications going beyond these limits.

   The goal of our measurement framework is to provide a sound ground on which to define,
compute, and verify measures for generic temporal logic formulae. The similarity function
for the clustering of traces will be based on top of it. In order to validate these
results, we are going to implement the measurement framework first and the overall
behavioural clustering afterwards in a proof-of-concept software with which experimental
evaluations will be conducted. The empirical evaluation of the techniques will be carried
out both on simulated artificial data and on publicly available real-life data such as
the BPI Challenge datasets, e.g. [18]. The controlled environment of a simulation is
required to check the validity of the results in the absence of a ground truth, while
real-life data allows assessing the feasibility of the technique in realistic settings.

                          IV. CONCLUSION

   Trace clustering is a relevant topic and the employment of declarative process mining
in that regard is promising and, notably, still unexplored. Yet, the current evaluation
systems for declarative specifications are not enough for a truly effective trace
clustering based on them. Given these open points, there is a call for: (i) an extended
evaluation system for declarative specifications; (ii) a novel application of declarative
process mining for trace clustering. Notably, we recently achieved the first point in
[19], based on our previous work [20].

                          REFERENCES

 [1] G. Greco, A. Guzzo, L. Pontieri, and D. Saccà, “Discovering expressive process
     models by clustering log traces,” IEEE Trans. Knowl. Data Eng., vol. 18, no. 8,
     pp. 1010–1027, 2006.
 [2] M. Song, C. W. Günther, and W. M. P. van der Aalst, “Trace clustering in process
     mining,” in BPM Workshops, 2008, pp. 109–120.
 [3] S. Jablonski, M. Röglinger, S. Schönig, and K. M. Wyrtki, “Multi-perspective
     clustering of process execution traces,” Enterp. Model. Inf. Syst. Archit. Int. J.
     Concept. Model., vol. 14, pp. 2:1–2:22, 2018.
 [4] R. P. J. C. Bose and W. M. P. van der Aalst, “Trace clustering based on conserved
     patterns: Towards achieving better process models,” in BPM Workshops, 2009,
     pp. 170–181.
 [5] D. R. Ferreira, M. Zacarias, M. Malheiros, and P. Ferreira, “Approaching process
     mining with sequence clustering: Experiments and findings,” in BPM, 2007,
     pp. 360–374.
 [6] P. De Koninck and J. De Weerdt, “Multi-objective trace clustering: Finding more
     balanced solutions,” in BPM Workshops, 2016, pp. 49–60.
 [7] P. Wang, W. Tan, A. Tang, and K. Hu, “A novel trace clustering technique based on
     constrained trace alignment,” in HCC, 2017, pp. 53–63.
 [8] R. P. J. C. Bose and W. M. P. van der Aalst, “Context aware trace clustering:
     Towards improving process mining results,” in SIAM International Conference on Data
     Mining, 2009, pp. 401–412.
 [9] J. Evermann, T. Thaler, and P. Fettke, “Clustering traces using sequence
     alignment,” in BPM Workshops, 2015, pp. 179–190.
[10] P. Nguyen, A. Slominski, V. Muthusamy, V. Ishakian, and K. Nahrstedt, “Process
     trace clustering: A heterogeneous information network approach,” in SIAM
     International Conference on Data Mining, 2016, pp. 279–287.
[11] P. De Koninck and J. De Weerdt, “Scalable mixed-paradigm trace clustering using
     super-instances,” in ICPM, 2019, pp. 17–24.
[12] L. Geng and H. J. Hamilton, “Interestingness measures for data mining: A survey,”
     ACM Comput. Surv., vol. 38, no. 3, p. 9, 2006.
[13] W. Hämäläinen and G. I. Webb, “A tutorial on statistically sound pattern
     discovery,” Data Min. Knowl. Discov., vol. 33, no. 2, pp. 325–377, 2019.
[14] F. M. Maggi, R. P. J. C. Bose, and W. M. P. van der Aalst, “Efficient discovery of
     understandable declarative process models from event logs,” in CAiSE, 2012,
     pp. 270–285.
[15] S. Schönig, A. Rogge-Solti, C. Cabanillas, S. Jablonski, and J. Mendling,
     “Efficient and customisable declarative process mining with SQL,” in CAiSE, 2016,
     pp. 290–305.
[16] C. Di Ciccio and M. Mecella, “On the discovery of declarative control flows for
     artful processes,” ACM Trans. Management Inf. Syst., vol. 5, no. 4, pp. 24:1–24:37,
     2015.
[17] T. B. Le and D. Lo, “Beyond support and confidence: Exploring interestingness
     measures for rule-based specification mining,” in SANER, 2015, pp. 331–340.
[18] B. F. van Dongen, “BPI challenge 2012,” Eindhoven University of Technology, 2012.
[19] A. Cecconi, G. De Giacomo, C. Di Ciccio, F. M. Maggi, and J. Mendling, “A temporal
     logic-based measurement framework for process mining,” in ICPM, 2020.
[20] A. Cecconi, C. Di Ciccio, G. De Giacomo, and J. Mendling, “Interestingness of
     traces in declarative process mining: The Janus LTLpf approach,” in BPM, 2018,
     pp. 121–138.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

</pre>
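The two-trace example in Section II can be sketched in a few lines of Python. This is only a minimal illustration, not the measurement framework of the paper: the function names are hypothetical, Jaccard similarity over directly-follows pairs is an assumed stand-in for a procedural notion of similarity, and the in-order check is a simplified reading of a → c → e rather than a formal DECLARE or temporal logic semantics.

```python
"""Directly-follows similarity vs. an ordering property on two traces.

All names are hypothetical; this is a sketch of the example in Section II.
"""


def directly_follows(trace):
    """Return the set of (x, y) pairs where y immediately follows x."""
    return set(zip(trace, trace[1:]))


def jaccard(left, right):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    union = left | right
    return len(left & right) / len(union) if union else 1.0


def occurs_in_order(trace, activities):
    """True if the activities appear in the trace in the given order,
    with any number of other events in between (simplified rule check)."""
    remaining = iter(trace)
    # `act in remaining` consumes the iterator up to the first match,
    # so each activity is searched only after the previous one was found.
    return all(act in remaining for act in activities)


TRACE_1 = ["a", "b", "c", "d", "e", "f"]
TRACE_2 = ["b", "a", "d", "c", "f", "e"]

df_similarity = jaccard(directly_follows(TRACE_1), directly_follows(TRACE_2))
rule_holds = [occurs_in_order(t, ["a", "c", "e"]) for t in (TRACE_1, TRACE_2)]

print(df_similarity)  # 0.0: no directly-follows pair is shared
print(rule_holds)     # [True, True]: a ... c ... e holds in both traces
```

Under these assumptions the two traces share no directly-follows pair, while the ordering property holds in both, which is the mismatch the example points at.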