<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Behavioural Clustering by Extensive Declarative Specifications Measurements (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
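        <contrib contrib-type="author">
          <name>
            <surname>Cecconi</surname>
            <given-names>Alessio</given-names>
          </name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vienna University of Economics and Business</institution>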
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>II. BACKGROUND</title>
      <p>
        Trace Clustering. The goal of trace clustering is to find
traces of similar behaviour and group them into clusters. The
guiding rule is to maximise the similarity within a cluster while
maximising the distance from the other clusters. Three main
classes of approaches exist: (i) Vector-based, where the traces are
transformed into feature vectors and distance metrics are used
in the vector space (e.g. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]); (ii) Context-aware, where
string distance metrics are applied directly on the whole traces
(e.g. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]); (iii) Model-based, where traces are clustered
around fitting process models (e.g. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). Trace clustering
has been employed in process mining to assist the discovery
of procedural process models. By dividing the event log into
different clusters, the discovery techniques can be applied
to each cluster separately, resulting in a
set of simpler and more understandable models of particular
behaviours of the process. Table I summarizes the current
applications of trace clustering in process mining.
      </p>
      <table-wrap>
        <label>Table I</label>
        <table>
          <thead>
            <tr>
              <th>Technique</th>
              <th>Clustering approach</th>
              <th>Control-flow perspective</th>
              <th>Data perspective</th>
              <th>Clustering algorithms</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Greco et al. [<xref ref-type="bibr" rid="ref1">1</xref>]</td>
              <td>Vector-based</td>
              <td>procedural</td>
              <td>no</td>
              <td>K-means, hierarchical clustering</td>
            </tr>
            <tr>
              <td>Song et al. [<xref ref-type="bibr" rid="ref2">2</xref>]</td>
              <td>Vector-based</td>
              <td>procedural</td>
              <td>yes</td>
              <td>K-means, quality threshold, agglomerative clustering, self-organizing maps</td>
            </tr>
            <tr>
              <td>Jablonski et al. [<xref ref-type="bibr" rid="ref3">3</xref>]</td>
              <td>Vector-based</td>
              <td>procedural</td>
              <td>yes</td>
              <td>Hierarchical clustering</td>
            </tr>
            <tr>
              <td>Bose and van der Aalst [<xref ref-type="bibr" rid="ref4">4</xref>]</td>
              <td>Vector-based</td>
              <td>procedural</td>
              <td>no</td>
              <td>Hierarchical clustering</td>
            </tr>
            <tr>
              <td>Ferreira et al. [<xref ref-type="bibr" rid="ref5">5</xref>]</td>
              <td>Model-based</td>
              <td>procedural</td>
              <td>no</td>
              <td>1st order Markov chain, expectation-maximization</td>
            </tr>
            <tr>
              <td>De Koninck and De Weerdt [<xref ref-type="bibr" rid="ref6">6</xref>]</td>
              <td>Model-based</td>
              <td>procedural</td>
              <td>no</td>
              <td>Active learning</td>
            </tr>
            <tr>
              <td>Wang et al. [<xref ref-type="bibr" rid="ref7">7</xref>]</td>
              <td>Model-based</td>
              <td>procedural</td>
              <td>no</td>
              <td>Constrained clustering, agglomerative hierarchical clustering, spectral clustering</td>
            </tr>
            <tr>
              <td>Bose and van der Aalst [<xref ref-type="bibr" rid="ref8">8</xref>]</td>
              <td>Context-aware</td>
              <td>procedural</td>
              <td>no</td>
              <td>Edit distance, agglomerative clustering</td>
            </tr>
            <tr>
              <td>Evermann et al. [<xref ref-type="bibr" rid="ref9">9</xref>]</td>
              <td>Context-aware</td>
              <td>procedural</td>
              <td>no</td>
              <td>K-means</td>
            </tr>
            <tr>
              <td>Nguyen et al. [<xref ref-type="bibr" rid="ref10">10</xref>]</td>
              <td>Mixed</td>
              <td>procedural</td>
              <td>yes</td>
              <td>Graph path similarity</td>
            </tr>
            <tr>
              <td>De Koninck and De Weerdt [<xref ref-type="bibr" rid="ref11">11</xref>]</td>
              <td>Mixed</td>
              <td>procedural</td>
              <td>yes</td>
              <td>K-means, active learning</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-2">
      <p>Different approaches, perspectives,
and algorithms have been tried, yet all the current trace
clustering techniques in process mining share what is not really a
limitation but rather a common trait: only procedural models
are considered. Accordingly, the control-flow perspective is
inspected only through its contiguous subsequences, i.e., only
directly-follows relations, so local proximity of activities is
favoured when the clusters are formed. This is not a limitation of the
clustering techniques per se, but of the object used to derive the
characteristics on which the clustering is based. For example,
consider two traces ⟨a, b, c, d, e, f⟩ and ⟨b, a, d, c, f, e⟩, where
the events are swapped pairwise but a transitivity property
among tasks a, c, and e is preserved (i.e., a → c → e). If this
transitivity property is of interest, both traces should be
grouped in the same cluster; however, the directly-follows relations
of the two traces do not match, so the traces may appear too
different to end up in the same cluster. As a result, similar
traces may be separated, or dissimilar ones may be grouped together.</p>
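      <p>As a rough illustration of the example above, the following Python sketch compares the two traces under two encodings: directly-follows pairs and eventually-follows pairs. The function names and the use of Jaccard similarity are assumptions made for illustration only; they are not taken from any of the cited techniques.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def directly_follows(trace):
    """Pairs (x, y) where y occurs immediately after x."""
    return set(zip(trace, trace[1:]))


def eventually_follows(trace):
    """Pairs (x, y) where y occurs at any later position than x."""
    return set(combinations(trace, 2))


def jaccard(left, right):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = left | right
    return len(left & right) / len(union) if union else 1.0


# The two traces from the running example.
t1 = ["a", "b", "c", "d", "e", "f"]
t2 = ["b", "a", "d", "c", "f", "e"]

df_similarity = jaccard(directly_follows(t1), directly_follows(t2))
ef_similarity = jaccard(eventually_follows(t1), eventually_follows(t2))

# The ordering a -> c -> e is visible only in the eventually-follows encoding.
shared_ordering = eventually_follows(t1) & eventually_follows(t2)
ordering_preserved = {("a", "c"), ("c", "e")} <= shared_ordering

print(df_similarity, ef_similarity, ordering_preserved)
```
]]></preformat>
      <p>Under these assumptions the two traces share no directly-follows pair, whereas most of their eventually-follows pairs coincide, including (a, c) and (c, e).</p>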
      <p>
        Evaluation of declarative specifications. Declarative process
mining mostly resorts to quality measures from association
rule mining [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] to qualify single rules with respect to event
logs. Support and confidence are the most widely adopted measures
in that regard, yet they are reportedly not sufficient to avoid
a large number of spurious results [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which threatens the
statistical soundness of the results. Also, there are different
definitions of support [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and confidence [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
[
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. For example, the support measure of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] cannot
be compared to the support of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] because of the different
definitions. Furthermore, these techniques define the measures
only for a limited set of rules (i.e., the standard DECLARE
rule set). Thus, the comparison of techniques is hampered
by their customized definitions of the same measures, and
the transferability of the measures themselves between techniques
is limited. The result is a scattered adoption of a small set
of measures that depend on either a specific language or a specific set
of rules. Several other measures have been studied to go
beyond this limit [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], yet they have not been fully exploited
in the process mining area. Thus, a more advanced and extensive
evaluation system for declarative specifications is required to
effectively base trace clustering on them.
      </p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
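      <p>To make the comparability problem concrete, the sketch below computes two simplified, hypothetical notions of support for a Response-like rule ("every occurrence of a is eventually followed by b") on a toy log. Both definitions, the function names, and the log are assumptions introduced here for illustration; they are not the definitions used in the cited works.</p>
      <preformat><![CDATA[
```python
def trace_based_support(log, activation, target):
    """Share of traces in which every activation is eventually followed by target.

    Assumption: a trace without any activation counts as satisfying the rule.
    """
    satisfied = sum(
        all(target in trace[i + 1:]
            for i, event in enumerate(trace) if event == activation)
        for trace in log
    )
    return satisfied / len(log)


def event_based_support(log, activation, target):
    """Share of activation events that are eventually followed by target."""
    outcomes = [
        target in trace[i + 1:]
        for trace in log
        for i, event in enumerate(trace) if event == activation
    ]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical toy log with three traces.
log = [
    ["a", "b", "a", "b"],
    ["a", "a", "a", "c"],
    ["c", "d"],
]

per_trace = trace_based_support(log, "a", "b")
per_event = event_based_support(log, "a", "b")
print(per_trace, per_event)
```
]]></preformat>
      <p>On this toy log the trace-based variant yields 2/3 and the event-based variant yields 2/5 for the same rule, so the two numbers cannot be compared directly even though both are called support.</p>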
    </sec>
    <sec id="sec-3">
      <title>III. CONTRIBUTION</title>
      <p>With this research we aim to explore the integration of
declarative process mining and trace clustering. The
expressiveness of declarative rules allows for a new kind of clustering
based on clearly stated desired properties of the process rather than on strict
event sequences. To this end, an extension of the current
evaluation techniques for declarative specifications is required.</p>
      <p>A declarative specification allows for complex relations
among activities regardless of their distance in the execution
flow, because each specification models desired
properties of the process, not specific executions. To the
best of our knowledge, the combination of declarative process
mining with trace clustering is still unexplored. We believe that
this novel intuition can lead to distinct and interesting results,
beyond the reach of procedural models. Also, clustering
around rules makes the clustering semantics explicit, easing
supervised techniques and the injection of expert knowledge.</p>
      <p>To make this clustering possible, it is necessary to devise a
notion of similarity between traces and rules. Indeed, declarative
rules can be injected into both model-based and vector-based
techniques. For both, it is paramount to devise an informative
evaluation of the rules on the trace. Whether a rule is satisfied or violated
in a trace is one possible direction, but such a
Boolean evaluation may be too limited to clearly differentiate
the clusters. Furthermore, it would offer a single perspective,
which is not enough to build a feature vector. A more flexible and
broader means of rule evaluation would be desirable, but the
current declarative techniques are limited in that regard. For
this reason, we will devise an extensive measurement framework
for declarative specifications that goes beyond these limits.</p>
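      <p>The limitation of a Boolean evaluation can be sketched as follows. The Python fragment builds one feature per rule for each trace, once as a Boolean satisfied/violated flag and once as a graded value. The graded value used here (the share of fulfilled activations of a Response-like rule), the rule encoding as (activation, target) pairs, and all names are hypothetical choices made for illustration; they do not represent the measurement framework itself.</p>
      <preformat><![CDATA[
```python
from math import dist


def fulfilment_ratio(trace, activation, target):
    """Share of activation events that are eventually followed by target."""
    positions = [i for i, event in enumerate(trace) if event == activation]
    if not positions:
        return 1.0  # assumption: a rule that is never activated counts as satisfied
    fulfilled = sum(target in trace[i + 1:] for i in positions)
    return fulfilled / len(positions)


def graded_vector(trace, rules):
    """One graded feature per rule."""
    return [fulfilment_ratio(trace, a, t) for a, t in rules]


def boolean_vector(trace, rules):
    """One satisfied (1.0) / violated (0.0) feature per rule."""
    return [float(fulfilment_ratio(trace, a, t) == 1.0) for a, t in rules]


# Hypothetical rules: "a is eventually followed by b" and "a is eventually followed by c".
rules = [("a", "b"), ("a", "c")]
t1 = ["a", "b", "a", "b", "a"]
t2 = ["a", "a", "a"]

boolean_distance = dist(boolean_vector(t1, rules), boolean_vector(t2, rules))
graded_distance = dist(graded_vector(t1, rules), graded_vector(t2, rules))
print(boolean_distance, graded_distance)
```
]]></preformat>
      <p>Both traces violate both rules, so their Boolean vectors coincide and their distance is zero, while the graded vectors still separate them because two of the three activations in the first trace are fulfilled.</p>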
      <p>
        The goal of our measurement framework is to provide a
sound basis on which to define, compute, and verify measures for
generic temporal logic formulae. The similarity function for
clustering traces will be built on top of it. In order to validate
these results, we are going to implement first the measurement
framework and then the overall behavioural clustering
in a proof-of-concept software tool, with which experimental
evaluations will be conducted. The empirical evaluation of the
techniques will be carried out on both simulated artificial
data and publicly available real-life data such as the BPI Challenge
datasets, e.g. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The controlled environment of a simulation
is required to check the validity of the results in the absence of a
ground truth, while real-life data allow us to assess the feasibility
of the technique in realistic settings.
      </p>
    </sec>
    <sec id="sec-4">
      <title>IV. CONCLUSION</title>
      <p>
        Trace clustering is a relevant topic, and the employment of
declarative process mining in that regard is promising and,
above all, still unexplored. Yet, the current evaluation systems
for declarative specifications are not enough for a truly effective
trace clustering based on them. Given these open points, there
is a call for (i) an extended evaluation system for declarative
specifications, and (ii) a novel application of declarative process
mining to trace clustering. Notably, we recently achieved
the first point in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], based on our previous work [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Greco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Guzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pontieri</surname>
          </string-name>
          , and D. Saccà, “
          <article-title>Discovering expressive process models by clustering log traces</article-title>
          ,
          <source>” IEEE Trans. Knowl. Data Eng.</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>8</issue>
          , pp.
          <fpage>1010</fpage>
          -
          <lpage>1027</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          , C. W. Günther, and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>Trace clustering in process mining,”</article-title>
          <source>in BPM Workshops</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>109</fpage>
          -
          <lpage>120</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jablonski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Röglinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schönig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K. M.</given-names>
            <surname>Wyrtki</surname>
          </string-name>
          , “
          <article-title>Multiperspective clustering of process execution traces,”</article-title>
          <source>Enterp. Model. Inf. Syst. Archit. Int. J. Concept. Model.</source>
          , vol.
          <volume>14</volume>
          , pp.
          <fpage>2:1</fpage>
          -
          <lpage>2:22</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. P. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>Trace clustering based on conserved patterns: Towards achieving better process models,”</article-title>
          <source>in BPM Workshops</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>170</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zacarias</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Malheiros</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferreira</surname>
          </string-name>
          , “
          <article-title>Approaching process mining with sequence clustering: Experiments and findings</article-title>
          ,” in BPM,
          <year>2007</year>
          , pp.
          <fpage>360</fpage>
          -
          <lpage>374</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>De Koninck and J. De Weerdt</surname>
          </string-name>
          ,
          <article-title>“Multi-objective trace clustering: Finding more balanced solutions,”</article-title>
          <source>in BPM Workshops</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hu</surname>
          </string-name>
          , “
          <article-title>A novel trace clustering technique based on constrained trace alignment</article-title>
          ,” in HCC,
          <year>2017</year>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R. P. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>Context aware trace clustering: Towards improving process mining results,”</article-title>
          <source>in SIAM International Conference on Data Mining</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>401</fpage>
          -
          <lpage>412</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Evermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Thaler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Fettke</surname>
          </string-name>
          , “
          <article-title>Clustering traces using sequence alignment,”</article-title>
          <source>in BPM Workshops</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Slominski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Muthusamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ishakian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Nahrstedt</surname>
          </string-name>
          , “
          <article-title>Process trace clustering: A heterogeneous information network approach,”</article-title>
          <source>in SIAM International Conference on Data Mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>279</fpage>
          -
          <lpage>287</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>P. De Koninck and J. De Weerdt</surname>
          </string-name>
          , “
          <article-title>Scalable mixed-paradigm trace clustering using super-instances,”</article-title>
          <source>in ICPM</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Geng</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          , “
          <article-title>Interestingness measures for data mining: A survey,”</article-title>
          <source>ACM Comput. Surv.</source>
          , vol.
          <volume>38</volume>
          , no.
          <issue>3</issue>
          , p.
          <fpage>9</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hämäläinen</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          , “
          <article-title>A tutorial on statistically sound pattern discovery,”</article-title>
          <source>Data Min. Knowl. Discov.</source>
          , vol.
          <volume>33</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>377</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P. J. C.</given-names>
            <surname>Bose</surname>
          </string-name>
          , and
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          , “
          <article-title>Efficient discovery of understandable declarative process models from event logs</article-title>
          ,” in CAiSE,
          <year>2012</year>
          , pp.
          <fpage>270</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schönig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rogge-Solti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cabanillas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jablonski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          , “
          <article-title>Efficient and customisable declarative process mining with SQL</article-title>
          ,” in CAiSE,
          <year>2016</year>
          , pp.
          <fpage>290</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mecella</surname>
          </string-name>
          , “
          <article-title>On the discovery of declarative control flows for artful processes</article-title>
          ,
          <source>” ACM Trans. Management Inf. Syst.</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>24:1</fpage>
          -
          <lpage>24:37</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Le</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          , “
          <article-title>Beyond support and confidence: Exploring interestingness measures for rule-based specification mining</article-title>
          ,” in SANER,
          <year>2015</year>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>B. F. van Dongen</surname>
          </string-name>
          ,
          <source>“BPI challenge</source>
          <year>2012</year>
          ,” Eindhoven University of Technology,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cecconi</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          , “
          <article-title>A temporal logic-based measurement framework for process mining</article-title>
          ,” in ICPM,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cecconi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          , G. De Giacomo, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          , “
          <article-title>Interestingness of traces in declarative process mining: The janus LTLpf approach</article-title>
          ,” in BPM,
          <year>2018</year>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>138</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>