<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ASP-Based Log Generation with Purposes in Declare4Py</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ivan Donadello</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Maria Maggi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Riva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manpreet Singh</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Datalane SRL</institution>
          ,
          <addr-line>Verona</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Wuerth Italia Srl</institution>
          ,
          <addr-line>Egna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process mining techniques are meant to extract non-trivial information from complex data. Controlled experiments on the algorithms underlying these techniques often require logs of process executions that fit the purpose of each specific test. Therefore, many tools have been developed for generating logs from both procedural models (e.g., Petri nets or BPMN models) and declarative models (e.g., based on LTL or Declare). However, log generation from declarative models still lacks tools that address specific purposes such as the specification of trace length distributions, the setting of the number of variants that should appear in the log, or the specification of the number of activations of a constraint that should be contained in a trace. We address this research gap by proposing an extension of the Declare4Py Python library that generates synthetic event logs using an Answer Set Programming-based solution whose flexibility supports the encoding of specific purposes.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Declarative Models</kwd>
        <kwd>Log Generation</kwd>
        <kwd>Answer Set Programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Process Mining (PM) is a research area that analyzes the execution data of a business process
(an event log) to extract useful information for process improvement. This is not a trivial task as
event logs are sets of process instances (a.k.a. traces) that are a complex type of data. A trace is
composed of a sequence of events arranged in chronological order and each event contains a set
of attributes that can be both symbolic (e.g., the name of the activity executed or the involved
resource) and numeric (e.g., a timestamp). In addition, process instances are samples of a process
model, that is, background knowledge that constrains the execution order of the events. This
background knowledge can be expressed with procedural models (e.g., Petri nets or BPMN
models), which specify the exact control flow of the activities in a trace, or with declarative
models (that is, constraints expressed in Linear Temporal Logic on finite traces or Declare [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ])
that specify the constraints over process activities that should be satisfied during the process
execution. The former models are more fine-grained, the latter more coarse-grained but more
flexible.
      </p>
      <p>
        The controlled evaluation of PM algorithms requires event logs that fit the purposes for
which each specific experiment has been designed. For example, one purpose can be to test
how the performance of an algorithm is affected when the distribution of trace lengths in the
event log varies. However, although several tools have been developed for generating event
logs by simulating (declarative and procedural) process models [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ], only PURPLE (a tool
presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) produces event logs that fulfill a given property/purpose. Since PURPLE uses
procedural process models to generate logs, a purpose-guided log generator for declarative
models is still missing. To fill this gap, we extend the Declare4Py [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] Python library, which
implements classical PM tasks starting from the MP-Declare [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] language, with an Answer
Set Programming (ASP) functionality for generating purpose-guided event logs starting from
declarative models [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This ASP-based method performs the simulation of an input declarative
model by converting its associated deterministic finite automaton (DFA) into a logical program
whose solution is given by an ASP solver. This solution is extremely flexible as it supports the
encoding of the desired purposes as a logical program on top of the DFA encoding.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Log Generation with Declare4Py</title>
      <p>The ASP-based solution adopted in Declare4Py takes as inputs the number of traces to
generate, the minimum and the maximum length of a trace (i.e., the number of
events in the trace), and a process model containing MP-Declare constraints. It encodes these
inputs in an ASP program to be solved by the Clingo solver (https://potassco.org/). The solution
is a set of traces, as many as requested, that satisfy the input model and the minimum/maximum length specification.
We leverage this solution to generate event logs with the following four purposes as additional
inputs.</p>
      <p>Users can specify a trace length distribution, that is, an input probability distribution on
the lengths of the traces in the log. Declare4Py supports three probability distributions: i) the
custom distribution, where the user has to specify the probability of a generated trace having
each length between the minimum and the maximum length; ii) the uniform distribution, where all the trace lengths
have the same probability of appearing, i.e., 1/(maximum length − minimum length + 1); iii) the Gaussian distribution, where
the user has to specify a mean and a variance for sampling the trace lengths from a Gaussian
distribution. Once the trace lengths have been sampled, the Clingo solver is called once for each
trace length, that is, (maximum length − minimum length + 1) times.</p>
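      <p>As an illustration of the three options, the following Python sketch samples how many traces of each admissible length should be requested. It is not the Declare4Py implementation: the function name sample_length_counts, its parameters, and the choice to discretize the Gaussian by rounding and discarding out-of-range draws are assumptions made for this example.</p>
      <preformat><![CDATA[
```python
import random
from collections import Counter


def sample_length_counts(num_traces, min_len, max_len, dist="uniform",
                         probs=None, mean=None, variance=None, seed=None):
    """Return {length: number of traces} for every length in [min_len, max_len].

    Hypothetical helper, not part of the Declare4Py API.
    """
    rng = random.Random(seed)
    lengths = list(range(min_len, max_len + 1))

    if dist == "uniform":
        # Every length has probability 1 / (max_len - min_len + 1).
        draws = rng.choices(lengths, k=num_traces)
    elif dist == "custom":
        # One user-given probability per admissible length.
        if probs is None or len(probs) != len(lengths):
            raise ValueError("probs must contain one probability per length")
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probs must sum to 1")
        draws = rng.choices(lengths, weights=probs, k=num_traces)
    elif dist == "gaussian":
        # Assumption: round each draw and reject values outside the range.
        # The mean should lie within (or close to) the range, otherwise
        # this loop may take very long.
        std = variance ** 0.5
        draws = []
        while len(draws) < num_traces:
            value = round(rng.gauss(mean, std))
            if min_len <= value <= max_len:
                draws.append(value)
    else:
        raise ValueError("unknown distribution: " + str(dist))

    counts = Counter(draws)
    return {length: counts.get(length, 0) for length in lengths}
```
]]></preformat>
      <p>Under these assumptions, each non-zero entry of the returned dictionary would correspond to one solver call asking for that many traces of the given length.</p>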
      <p>Users can specify the number of variants, so that the generated event log can be segmented
into that many groups of traces having the same control-flow (i.e., the same sequence of activity names)
but possibly differing in other attributes, such as the timestamps or the resources. Here, Clingo is
called several times: in the first call, as many traces as the requested number of variants are generated to represent the variants; then,
for each variant, Clingo is called to generate traces with the control-flow of the variant. In these
subsequent calls, the input ASP program also encodes the control-flow of the variant.</p>
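      <p>The notion of variant used here can be made concrete with a small Python sketch. It groups the traces of a log by their sequence of activity names and shows one possible way to split the requested number of traces across the variants. The event representation (dictionaries with the keys "activity" and "resource"), the function names, and the even split are assumptions for illustration only; the text above does not state how Declare4Py distributes traces among variants.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def split_evenly(num_traces, num_variants):
    """Traces per variant, differing by at most one (an assumed policy)."""
    base, extra = divmod(num_traces, num_variants)
    return [base + (1 if i < extra else 0) for i in range(num_variants)]


def group_by_variant(log):
    """Group traces by control-flow, i.e. by their sequence of activity names.

    Traces in the same group may still differ in other attributes,
    such as resources or timestamps.
    """
    groups = defaultdict(list)
    for trace in log:
        variant = tuple(event["activity"] for event in trace)
        groups[variant].append(trace)
    return dict(groups)


# Three traces, two variants: the first two share the control-flow (A, B)
# but use different resources.
example_log = [
    [{"activity": "A", "resource": "r1"}, {"activity": "B", "resource": "r2"}],
    [{"activity": "A", "resource": "r3"}, {"activity": "B", "resource": "r1"}],
    [{"activity": "B", "resource": "r1"}, {"activity": "A", "resource": "r2"}],
]
variants = group_by_variant(example_log)
```
]]></preformat>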
      <p>Users can specify a subset of constraints in the input model to generate negative traces,
that is, traces that do not satisfy (at least one of/all) the constraints in the specified subset. This
allows users to obtain event logs with labelings defined by MP-Declare constraints. Such
labeled traces can be used to train and test Machine Learning-based process mining algorithms.
The positive traces satisfy all constraints in the input model, whereas the negative traces do
not satisfy the specified subset of constraints. Also in this case, the subset of constraints to be
violated is encoded in an ASP program.</p>
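      <p>To illustrate the resulting labeling, the sketch below checks Response constraints on the control-flow of a trace and labels the trace as negative when it violates at least one of, or all, the constraints in a chosen subset. This is a plain Python check written for illustration, not the ASP encoding used by the tool; it ignores data conditions, and the function names and the mode parameter are hypothetical.</p>
      <preformat><![CDATA[
```python
def satisfies_response(activities, activation, target):
    """True if every occurrence of `activation` is eventually followed by `target`."""
    pending = False  # is there an activation still waiting for its target?
    for name in activities:
        if name == activation:
            pending = True
        elif name == target:
            pending = False
    return not pending


def is_negative(activities, constraints, mode="any"):
    """Label a trace as negative with respect to a subset of Response constraints.

    constraints: list of (activation, target) pairs.
    mode="any": negative if at least one constraint is violated.
    mode="all": negative only if all constraints are violated.
    """
    if not constraints:
        return False
    violated = [not satisfies_response(activities, a, b) for a, b in constraints]
    return any(violated) if mode == "any" else all(violated)
```
]]></preformat>
      <p>For instance, with the subset Response(A, B) and Response(C, D), the control-flow A, C, D violates only the first constraint, so it is negative under mode="any" but not under mode="all".</p>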
      <p>Users can specify the number of activations for a subset of constraints in the input model.
For example, the user can specify for the constraint Response[(CRP, J), ReleaseB] an activation
number of 3, meaning that the event with activity name CRP and payload J should occur
three times. The user provides as input a range of possible values for the number of activations
for a specific constraint and then Declare4Py randomly selects a value in that range for each
generated trace. The subset of the constraints for which the number of activations is specified
and the user-defined value range for the number of activations are encoded in ASP.</p>
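      <p>The activation count in the example above can be sketched in Python as follows. The code counts the events that match the activation of a constraint (activity name plus an optional condition on the payload) and draws a target number of activations from a user-defined range. The attribute name "group" used to carry the payload value J, as well as the function names, are assumptions made for this example and not taken from the tool.</p>
      <preformat><![CDATA[
```python
import random


def count_activations(trace, activity, condition=lambda event: True):
    """Count events with the given activity name whose payload satisfies `condition`."""
    return sum(1 for event in trace
               if event["activity"] == activity and condition(event))


def pick_activation_target(low, high, rng=random):
    """Draw the number of activations for one trace from the range [low, high]."""
    return rng.randint(low, high)


# A trace in which the activation (CRP with payload J) occurs three times.
# The key "group" is a hypothetical attribute name.
example_trace = [
    {"activity": "CRP", "group": "J"},
    {"activity": "ReleaseB", "group": "E"},
    {"activity": "CRP", "group": "K"},
    {"activity": "CRP", "group": "J"},
    {"activity": "CRP", "group": "J"},
    {"activity": "ReleaseB", "group": "E"},
]
activations = count_activations(
    example_trace, "CRP", lambda event: event["group"] == "J")
```
]]></preformat>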
    </sec>
    <sec id="sec-3">
      <title>3. Tool Maturity</title>
      <p>
        The adopted ASP solution has already been shown to achieve better computational times than
state-of-the-art (declarative) log generators in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Therefore, we focused on the diversity of the generated traces, comparing our tool with the
Alloy-based tool presented in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. We chose this tool as it is the only one available that generates
synthetic logs starting from MP-Declare models. The tool generates the traces by using a
SAT solver (instead of an ASP solver as done in Declare4Py) starting from a representation of
the input MP-Declare model in Alloy (https://alloytools.org/). The diversity is measured as
the average of the syntactic distances among the traces in the generated event log. A higher
average distance indicates a higher diversification of the generated traces, making the event
log a more interesting benchmark for testing purposes. As syntactic distance, we considered
the normalized Damerau-Levenshtein Distance (DLD). To define three MP-Declare models, we considered the activity names and
resources available in three logs: the log with DOI 10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460 (13 activity names, 26 resources),
the 15_4 log with DOI 10.4121/uuid:31a308ef-c844-48da-948c-305d167a0ec1 (87 activity names, 10 resources),
and the Road Traffic Fine Management log with DOI 10.4121/uuid:270fd440-1057-4fb9-89a9-b699b47990f5 (10
activity names, 143 resources). In each model, 5
constraints were defined, and the model was used to generate a log with 100 traces of lengths
ranging from 10 to 20 events. For each synthetic log, we measured the DLD between all pairs of
synthetic traces and computed the average. The DLD was measured on both the control-flow
and the resources (see Table 1). Declare4Py performs better than
the Alloy-based tool, that is, it guarantees a higher variability in both control-flow
and resources. Finally, the log generator in Declare4Py, being purpose-guided, represents an
improvement of the existing tools for log generation also in terms of provided functionalities.
      </p>
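      <p>The diversity measure can be sketched in Python as follows. The code computes the average normalized distance over all pairs of sequences (activity names or resources). Two details are assumptions, since they are not specified in the text: the sketch uses the optimal string alignment variant of the Damerau-Levenshtein distance (adjacent transpositions only), and it normalizes by the length of the longer sequence. All function names are hypothetical.</p>
      <preformat><![CDATA[
```python
from itertools import combinations


def osa_distance(a, b):
    """Optimal string alignment distance between two sequences.

    Allowed edits: insertion, deletion, substitution, and transposition
    of two adjacent items (restricted Damerau-Levenshtein).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def normalized_distance(a, b):
    """Distance divided by the length of the longer sequence (assumed normalization)."""
    longest = max(len(a), len(b))
    return osa_distance(a, b) / longest if longest else 0.0


def average_pairwise_distance(sequences):
    """Average normalized distance over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(normalized_distance(a, b) for a, b in pairs) / len(pairs)
```
]]></preformat>
      <p>To reproduce the two perspectives considered above, the function would be applied once to the sequences of activity names and once to the sequences of resources of the generated traces.</p>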
    </sec>
    <sec id="sec-4">
      <title>4. Screencast and Website</title>
      <p>The GitHub repository of Declare4Py (https://github.com/ivanDonadello/Declare4Py) contains the source code of the tool and all
the tutorials. A specific tutorial for the ASP-based log generator (https://github.com/ivanDonadello/Declare4Py/blob/main/docs/source/tutorials/9.Log_Generation.ipynb) shows how to
run the log generator using all the available options. The video presentation of
this paper can be accessed at https://www.dropbox.com/scl/fi/cbihgbw34smkisb7u1sry/Screen-Recording-2023-09-12-at-19.17.17.mov?rlkey=pvzt6cuj5yk98azi611zd79xy&amp;dl=0.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pesic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schonenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>DECLARE: Full support for loosely-structured processes</article-title>
          , in: EDOC, IEEE Computer Society,
          <year>2007</year>
          , pp.
          <fpage>287</fpage>
          -
          <lpage>300</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <article-title>PLG2: Multiperspective process randomization with online and offline simulations</article-title>
          ,
          <source>in: BPM (Demos)</source>
          , volume
          <volume>1789</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Ciccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Bernardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cimitile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <article-title>Generating event logs through the simulation of Declare models</article-title>
          ,
          <source>in: EOMAS@CAiSE</source>
          , volume
          <volume>231</volume>
          <source>of Lecture Notes in Business Information Processing</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>20</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Re</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tiezzi</surname>
          </string-name>
          ,
          <article-title>PURPLE: A PURPose-guided Log GEnerator (Extended Abstract)</article-title>
          ,
          <source>in: ICPM Doctoral Consortium / Demo</source>
          , volume
          <volume>3299</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>90</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Skydanienko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Di Francescomarino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <article-title>A tool for generating event logs from multi-perspective Declare models</article-title>
          , in: BPM (Dissertation/Demos/Industry),
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Donadello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Riva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shikhizada</surname>
          </string-name>
          ,
          <article-title>Declare4Py: A Python library for declarative process mining</article-title>
          , in: BPM (Demos),
          <source>CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burattin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sperduti</surname>
          </string-name>
          ,
          <article-title>Conformance checking based on multi-perspective declarative process models</article-title>
          ,
          <source>Expert Syst. Appl</source>
          .
          <volume>65</volume>
          (
          <year>2016</year>
          )
          <fpage>194</fpage>
          -
          <lpage>211</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chiariello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Maggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Patrizi</surname>
          </string-name>
          ,
          <article-title>ASP-based declarative process mining</article-title>
          , in: AAAI, AAAI Press,
          <year>2022</year>
          , pp.
          <fpage>5539</fpage>
          -
          <lpage>5547</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>