<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Process Instance Clustering Based on Conformance Checking Artefacts (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mathilde Boltenhagen</string-name>
          <email>mathilde.boltenhagen@outlook.com</email>
        </contrib>
        <aff>ENS Paris-Saclay, Inria, Université Paris-Saclay, Gif-sur-Yvette, France</aff>
      </contrib-group>
      <fpage>6</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>As event data becomes a ubiquitous source of information, data science techniques represent an unprecedented opportunity to analyze and react to the processes that generate this data. Process Mining is an emerging field that bridges the gap between traditional data analysis techniques, like Data Mining, and Business Process Management analysis. One core value of Process Mining is the discovery of formal process models, like Petri nets and BPMN models, which attempt to make sense of the events recorded in logs. As decision makers increasingly rely on these models, it is crucial to ensure that they model the targeted systems reliably. The quest for a good process model relies on quality criteria, which leads to Conformance Checking, a subfield of Process Mining.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Conformance Checking</kwd>
        <kwd>Clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. Process Instance Clustering</title>
      <p>Process instance clustering partitions the instances of a log into sublogs such that each cluster
groups similar process instances. The topic has attracted considerable interest in process mining over
the last two decades, with 103 relevant works [14]. The similarity of process instances has been
approached from several perspectives:</p>
      <p>
        • On the one hand, studying the control flow given by the log sequences allows grouping
process instances according to the behavior they describe. In other words, the activities that appear
in the system are assessed. These clustering methods range from the study of activity frequencies
[10] to the study of patterns [
        <xref ref-type="bibr" rid="ref3">6,5,3,7</xref>
        ].
      </p>
      <p>• On the other hand, context-perspective approaches cluster process instances based on their data
attributes. These techniques are closer to classical data mining [13].</p>
      <p>• Some works combine the two approaches [10,8].</p>
      <p>2022 Copyright for this paper by its authors.</p>
      <p>The results of those works show that process instance clustering is of real value to process discovery.
Instead of learning one model that represents the entire log, the idea is to mine one process model per
cluster. The resulting models then offer a better compromise between the quality criteria, thanks to the
homogeneity of the clusters.</p>
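      <p>As an illustration of the control-flow perspective, the following minimal Python sketch groups traces by their activity frequencies. It is not the method of [10]: the names activity_profile and cluster_by_profile, the Euclidean distance, the greedy leader clustering and the threshold value are all assumptions chosen only to keep the example short.</p>
      <preformat>
```python
from collections import Counter
from math import sqrt


def activity_profile(trace, activities):
    """Count how often each activity occurs in one trace."""
    counts = Counter(trace)
    return [counts[a] for a in activities]


def euclidean(p, q):
    """Euclidean distance between two profiles of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def cluster_by_profile(log, threshold):
    """Greedy leader clustering (an assumption of this sketch): a trace joins
    the first cluster whose leader profile lies within `threshold`,
    otherwise it starts a new cluster."""
    activities = sorted({a for trace in log for a in trace})
    clusters = []  # list of (leader_profile, member_traces)
    for trace in log:
        profile = activity_profile(trace, activities)
        for leader, members in clusters:
            if threshold >= euclidean(leader, profile):
                members.append(trace)
                break
        else:
            clusters.append((profile, [trace]))
    return [members for _, members in clusters]


# Toy event log: each trace is a sequence of activity names.
log = [
    ("a", "b", "c"),
    ("a", "c", "b"),
    ("a", "d", "d", "e"),
    ("a", "d", "e"),
]
sublogs = cluster_by_profile(log, threshold=1.0)
```
      </preformat>
      <p>With this toy log, the two traces that execute b and c end up in one sublog and the two traces that execute d and e in another; a process model could then be mined per sublog.</p>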
      <p>A perspective missing from the approaches above is the existence of a process model: there, trace
variants and clusters of process instances are learned and extracted from the event log only.</p>
    </sec>
    <sec id="sec-2">
      <title>3. Research Motivation</title>
      <p>Once a process model has been validated by its process owner, the practitioner can benefit from the
knowledge of this model by using it as a baseline for log analysis. Hence, trace variant extraction and
process instance clustering can use this reliable process model as input. This idea contrasts with the
aforementioned situation, where the motivation is to learn simpler models from sublogs. Here, the
process model can be complex and the objective is to extract simpler artefacts from it. This perspective
is motivated by the complexity of the process models produced by discovery algorithms that mainly
prioritize fitness [12]. Since the learned model contains the behavioral information, together with a
visualization of it that is known to the process owner, a log analysis based on it gives a novel view for
decision making.</p>
      <p>
        We propose to fill this gap by presenting approaches that use conformance checking techniques to
represent sublogs based on a reliable process model. Beyond quality measures, conformance checking
provides key artefacts such as alignments, multi-alignments and anti-alignments. These artefacts formally
describe the relationship between real cases and the model and, therefore, play an important role in
process model explainability [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ]. The thesis proposes to exploit the conformance checking artefacts
for clustering the process instances contained in event logs. In this way, we partition event logs and
extract modeled artefacts that we use as model-based trace variants.
      </p>
    </sec>
    <sec id="sec-3">
      <title>4. Contributions</title>
      <p>Based on this motivation, we have elaborated a set of methods for computing conformance checking
artefacts. The thesis gives definitions, algorithms and applications of these artefacts for finding good
model-based trace variants, i.e., process instance representatives based on a reliable process model,
through a clustering approach.</p>
      <p>The first contribution, schematized in Fig. 1, is the development of two algorithms for computing
multi-alignments. A multi-alignment is a conformance checking artefact that relates many log sequences
to a single modeled sequence. This artefact gives an overview of a log or a sublog and thus serves as a
model-based trace variant. The proposed algorithms for computing multi-alignments carry over to
classical alignments. Consequently, this chapter provides a novel optimal encoding and several
heuristics for computing both alignments and multi-alignments.</p>
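      <p>To make the notion of multi-alignment concrete, the sketch below selects, among a finite list of modeled sequences, the one with the smallest total distance to a set of log sequences. It is an illustrative brute-force search under stated assumptions, not the optimal encoding or the heuristics of the thesis: the model is assumed to be given as an explicit list of runs (model_runs), the distance counts unit-cost log moves and model moves, and the distances are aggregated by their sum (other aggregations, such as the maximum, are possible).</p>
      <preformat>
```python
def indel_distance(u, v):
    """Alignment cost with unit-cost log moves and model moves only:
    len(u) + len(v) - 2 * LCS(u, v)."""
    prev = [0] * (len(v) + 1)
    for x in u:
        cur = [0]
        for j, y in enumerate(v, start=1):
            # Extend the common subsequence on a match, else keep the best so far.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(u) + len(v) - 2 * prev[-1]


def multi_alignment(model_runs, traces):
    """Return the model run with the smallest total distance to all traces
    (sum aggregation is an assumption of this sketch)."""
    return min(
        model_runs,
        key=lambda run: sum(indel_distance(run, t) for t in traces),
    )


# Hypothetical model language, listed explicitly for the example.
model_runs = [("a", "b", "c", "d"), ("a", "c", "b", "d"), ("a", "e", "d")]
traces = [("a", "b", "d"), ("a", "b", "c"), ("a", "b", "c", "d", "d")]
variant = multi_alignment(model_runs, traces)
```
      </preformat>
      <p>Here the run a, b, c, d is at distance 1 from each of the three traces, so it is returned as the single model-based trace variant for the whole set.</p>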
      <p>The disadvantage of a multi-alignment is that it is a single artefact that represents all the sequences
given as input. Thus, multi-alignment extraction fits well when the log is homogeneous but becomes
less appropriate when the log contains several types of behavior. In the latter situation, one wants to
separate the behaviors into different groups such that the modeled variant is accurate for each group.
We address this problem with a set of three clustering methods based on alignments. Given a model
and a log, the algorithms partition the log sequences into clusters and provide one variant per
cluster.</p>
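      <p>One simple way to read alignment-based clustering is sketched below: each log sequence is assigned to the modeled sequence it aligns to at the lowest cost, and sequences that share the same modeled sequence form a cluster. This is a hedged illustration only and does not reproduce any of the three methods of the thesis; the explicit list model_runs, the function names and the unit-cost distance are assumptions of the sketch.</p>
      <preformat>
```python
from collections import defaultdict


def indel_distance(u, v):
    """Alignment cost with unit-cost log moves and model moves only:
    len(u) + len(v) - 2 * LCS(u, v)."""
    prev = [0] * (len(v) + 1)
    for x in u:
        cur = [0]
        for j, y in enumerate(v, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(u) + len(v) - 2 * prev[-1]


def cluster_by_alignment(model_runs, traces):
    """Map each model run to the traces whose closest run it is.
    Ties go to the run listed first (an arbitrary choice of this sketch)."""
    clusters = defaultdict(list)
    for trace in traces:
        best_run = min(model_runs, key=lambda run: indel_distance(run, trace))
        clusters[best_run].append(trace)
    return dict(clusters)


# Hypothetical model language and log for the example.
model_runs = [("a", "b", "c", "d"), ("a", "e", "d")]
traces = [("a", "b", "d"), ("a", "e", "e", "d"), ("a", "c", "d")]
clusters = cluster_by_alignment(model_runs, traces)
```
      </preformat>
      <p>Each key of the returned dictionary plays the role of the modeled variant of its cluster, so the result is both a partition of the log and one variant per cluster.</p>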
      <p>Both previous methods assume a process model and extract model-based trace variants of a set of
log sequences based on this model. However, the quality of the input model affects the results of these
methods. For this reason, we present another conformance checking artefact, called anti-alignment,
which aims at measuring the precision of process models. As shown in Fig. 3, the algorithm takes a
model and a log as input and extracts one of the most deviant modeled sequences with respect to
the log.</p>
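      <p>The idea of a most deviant modeled sequence can be sketched as a max-min search: among the modeled sequences up to a given length, pick one whose distance to its closest log sequence is as large as possible. The code below is a brute-force illustration under assumptions, not the algorithm of the thesis: the model is given as an explicit list model_runs, the length bound max_length and the unit-cost distance are choices made for the example.</p>
      <preformat>
```python
def indel_distance(u, v):
    """Alignment cost with unit-cost log moves and model moves only:
    len(u) + len(v) - 2 * LCS(u, v)."""
    prev = [0] * (len(v) + 1)
    for x in u:
        cur = [0]
        for j, y in enumerate(v, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return len(u) + len(v) - 2 * prev[-1]


def anti_alignment(model_runs, traces, max_length):
    """Return a model run of bounded length that maximizes the distance
    to its closest log trace."""
    candidates = [run for run in model_runs if max_length >= len(run)]
    return max(
        candidates,
        key=lambda run: min(indel_distance(run, t) for t in traces),
    )


# Hypothetical model language and log for the example.
model_runs = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "e", "d"),
    ("a", "e", "e", "e", "e", "e", "d"),
]
traces = [("a", "b", "c", "d"), ("a", "b", "d")]
deviant = anti_alignment(model_runs, traces, max_length=4)
```
      </preformat>
      <p>In this example the run a, e, d is returned: its closest log sequence is still at distance 2, which hints that the model allows behavior not observed in the log and is therefore less precise.</p>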
      <p>All the developed methods are formally presented and given as SAT encodings. Heuristic
algorithms are then added to cope with the limited computing capacity of today’s computers, at the
expense of optimality.</p>
    </sec>
    <sec id="sec-4">
      <title>5. References</title>
      <p>[5] Pieter De Koninck and Jochen De Weerdt. Scalable mixed-paradigm trace clustering using
superinstances. In 2019 International Conference on Process Mining (ICPM), pages 17–24. IEEE, 2019.</p>
      <p>[6] Gianluigi Greco, Antonella Guzzo, Luigi Pontieri, and Domenico Saccà. Discovering expressive
process models by clustering log traces. IEEE Trans. Knowl. Data Eng., 18(8):1010–1027, 2006.</p>
      <p>[7] Xixi Lu, Seyed Amin Tabatabaei, Mark Hoogendoorn, and Hajo A Reijers. Trace clustering on
very large event data in healthcare using frequent sequence patterns. In International Conference
on Business Process Management, pages 198–215. Springer, 2019.</p>
      <p>[8] Daniela Luengo and Marcos Sepúlveda. Applying clustering in process mining
to find different versions of a business process that changes over time. In
International Conference on Business Process Management, pages 153–158. Springer, 2011.</p>
      <p>[9] Andrey Mokhov, Jordi Cortadella, and Alessandro de Gennaro. Process windows. In 17th
International Conference on Application of Concurrency to System Design, ACSD 2017, pages
86–95, 2017.</p>
      <p>[10] Minseok Song, Christian W. Günther, and Wil M. P. van der Aalst. Trace clustering in process
mining. In Business Process Management Workshops, BPM 2008 International Workshops,
Milano, Italy, September 1-4, 2008. Revised Papers, pages 109–120, 2008.</p>
      <p>[11] Niek Tax, Natalia Sidorova, Reinder Haakma, and Wil MP van der Aalst. Mining local process
models. Journal of Innovation in Digital Ecosystems, 3(2):183–196, 2016.</p>
      <p>[12] Wil Van Der Aalst, Joos Buijs, and Boudewijn Van Dongen. Towards improving the
representational bias of process mining. In International Symposium on Data-Driven Process
Discovery and Analysis, pages 39–54. Springer, 2011.</p>
      <p>[13] Sebastiaan J van Zelst and Yukun Cao. A generic framework for attribute-driven hierarchical trace
clustering. In International Conference on Business Process Management, pages 308–320. Springer, 2020.</p>
      <p>[14] Fareed Zandkarimi, Jana-Rebecca Rehse, Pouya Soudmand, and Hartmut Hoehle. A generic
framework for trace clustering in process mining. In 2020 2nd International Conference on Process
Mining (ICPM), pages 177–184. IEEE, 2020.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Arya</given-names>
            <surname>Adriansyah</surname>
          </string-name>
          , Jorge Munoz-Gama, Josep Carmona,
          <string-name>
            <given-names>Boudewijn F</given-names>
            <surname>van Dongen</surname>
          </string-name>
          , and Wil MP van der Aalst.
          <article-title>Alignment based precision checking</article-title>
          .
          <source>In International Conference on Business Process Management</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>149</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Mathilde</given-names>
            <surname>Boltenhagen</surname>
          </string-name>
          , Thomas Chatain, and
          <string-name>
            <given-names>Josep</given-names>
            <surname>Carmona</surname>
          </string-name>
          .
          <article-title>Encoding conformance checking artefacts in SAT</article-title>
          .
          <source>In International Conference on Business Process Management</source>
          , pages
          <fpage>160</fpage>
          -
          <lpage>171</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. P. Jagadeesh Chandra</given-names>
            <surname>Bose</surname>
          </string-name>
          and Wil M. P. van der Aalst
          .
          <article-title>Trace clustering based on conserved patterns: Towards achieving better process models</article-title>
          .
          <source>In Business Process Management Workshops, BPM 2009 International Workshops, Revised Papers</source>
          , pages
          <fpage>170</fpage>
          -
          <lpage>181</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Josep</given-names>
            <surname>Carmona</surname>
          </string-name>
          , Boudewijn van Dongen,
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Solti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Matthias</given-names>
            <surname>Weidlich</surname>
          </string-name>
          . Conformance checking. Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>