<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>B. Wuyts)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>The DyLoPro Library: Comprehensively Profiling the Dynamics of Event Logs by Means of Visual Analytics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Brecht Wuyts</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hans Weytjens</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Seppe vanden Broucke</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jochen De Weerdt</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Business Informatics and Operations Management, Ghent University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIRIS, Faculty of Economics and Business, KU Leuven</institution>
          ,
          <addr-line>Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In Process Mining (PM), a notable issue arises from the discrepancy between prevailing techniques, which assume constancy in business processes, and the reality of modern business processes, which are characterized by frequent changes. As this discrepancy can lead to biased results, it is crucial that such drifts are detected and understood prior to applying other PM techniques on the corresponding event logs. However, such drifts in event logs can manifest in different forms (sudden, gradual, recurring, incremental) and occur in many different process perspectives (control-flow, data, performance). This necessitates approaches that holistically and efficiently delve into the temporal dynamics present in event logs. Therefore, in this paper, we present the DyLoPro Python library, a visual analytics tool that enables PM practitioners to efficiently and comprehensively explore event log dynamics over time, and which caters to all kinds of event logs. This demo paper aims to familiarize Process Mining practitioners with the DyLoPro library, showcasing its main capabilities, and encouraging the Process Mining field to take the time dimension into greater consideration in all stages of their projects.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Event logs violating Process Mining [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] techniques’ ubiquitous assumption of stationary
processes induce a significant, yet often unnoticed, bias in their results, potentially
leading to flawed conclusions. Therefore, it is imperative that the dynamics are analyzed
and potential drifts are identified prior to the application of PM techniques. However, many
different forms of drift exist, and drifts can occur in different process perspectives (control-flow,
resource, data, performance). This multi-perspective property, together with the sequential
nature of event logs, complicates the study of event log dynamics over time. One
intuitive and powerful way to analyze event log dynamics over time is by
means of visualizations. Dotted charts [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] allow for such a multi-dimensional exploration of
the dynamics over time, and are offered by several open-source PM tools such as ProM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
PM4PY [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, dotted charts visualize every single event, which makes it cumbersome
to visualize the complete dynamics of large real-life event logs. Another interesting tool that
provides functionality to visualize dynamics over time is the Performance Spectrum Miner
(PSM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The PSM focuses solely on visualizing the evolution of the time elapsed between a
pair of consecutive activities. To the best of our knowledge, however, no comprehensive
tool has been developed to efficiently explore the dynamics in an event log over time.
This gap is addressed by the recently introduced Dynamic Log Profiling (DyLoPro) framework [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
and its complementary Python library, which we present in this paper. On the one hand, the
visual analytics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] framework enables PM practitioners to conduct a comprehensive analysis
of the dynamics of event logs over time, from various process perspectives, both individually
and combined with the performance perspective. The complementary DyLoPro library, on the
other hand, provides users with a ready-to-use and user-friendly Python implementation of this
framework, thereby empowering users to leverage the framework’s comprehensive visualization
capabilities efficiently.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. DyLoPro - the Python Library</title>
      <sec id="sec-2-1">
        <title>2.1. Architecture and Intended Use</title>
        <p>
          The DyLoPro library implements and extends the eponymous framework [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], whose
comprehensiveness is achieved by incorporating the main process perspectives (control-flow,
data, including resources, and performance) along two orthogonal dimensions: log concepts and
representation types. Accordingly, the DyLoPro library provides functionality to construct and
visualize time series, i.e. the log dynamics, for a variety of log concepts, while for each log concept
the dynamics can be represented using several different representation types. All these
visualization capabilities can be accessed and customized by invoking the appropriate methods on an
initialized DynamicLogPlots instance and specifying the desired values for their parameters.
The first step is therefore to initialize a DynamicLogPlots instance. The
DynamicLogPlots class provides a single point of access to the library’s functionality, and
thereby serves as the interface between the user’s Python environment and DyLoPro’s underlying
computational logic. Initializing a DynamicLogPlots instance involves specifying a number of
required arguments, including the event log, which must be loaded into a pandas [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] DataFrame
object, and a number of optional arguments. Once the arguments are specified, the software
first validates them. If a problem is found, the software raises an appropriate error
with a dedicated message describing the problem and how to resolve it. If no errors are found, the
log is preprocessed into an internal format that allows DyLoPro to efficiently compute and visualize
all aggregations on demand, as illustrated in Fig. 1 (data used:
https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1). For a more elaborate explanation
of how the framework is implemented in the library, and consequently how
to access DyLoPro’s extensive capabilities, please refer to the ‘Get Started’ section of the project
description on PyPI or GitHub (see Section 3). A brief demonstration video can be found at https://youtu.be/Z9hqpkGbta0.
        </p>
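        <p>The validate-then-preprocess step described above can be illustrated with a minimal sketch. It is not DyLoPro's actual validation code: the helper validate_event_log is hypothetical, and the toy log is a plain list of dictionaries rather than a pandas DataFrame so that the example stays dependency-free. It only shows the idea of raising an error whose message states both the problem and a possible remedy.</p>
        <preformat>
```python
from datetime import datetime


def validate_event_log(events, case_id_key, activity_key, timestamp_key):
    """Check a toy event log (a list of dicts) before any further processing.

    Hypothetical helper for illustration only; not part of DyLoPro's API.
    """
    if not events:
        raise ValueError("The event log is empty; pass at least one event.")
    required = (case_id_key, activity_key, timestamp_key)
    for position, event in enumerate(events):
        # Report every missing column at once, together with a remedy.
        missing = [key for key in required if key not in event]
        if missing:
            raise ValueError(
                f"Event {position} lacks column(s) {missing}; "
                "check the key arguments or add the column(s) to the log."
            )
        if not isinstance(event[timestamp_key], datetime):
            raise TypeError(
                f"Event {position} has a non-datetime timestamp; "
                "convert the timestamp column to datetime first."
            )
    return True


log = [
    {"case": "c1", "activity": "Create PO", "time": datetime(2023, 1, 2, 9, 0)},
    {"case": "c1", "activity": "Pay", "time": datetime(2023, 1, 5, 9, 0)},
]
print(validate_event_log(log, "case", "activity", "time"))
```
        </preformat>
        <p>Calling the helper with a key that does not occur in the log, for instance "timestamp" instead of "time", raises a ValueError that names the missing column.</p>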
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Functionality</title>
        <p>
          Once a DynamicLogPlots instance has been successfully initialized, all functionality can
be accessed by invoking the corresponding methods on this instance (see, e.g., Fig. 1). As already
mentioned, the DyLoPro library implements the identically named framework proposed in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ],
which consists of three stages: (1) log discretization, (2) domain definition, and (3) time series
construction &amp; visualization.
        </p>
        <p>Log discretization boils down to subdividing the event log into equal-length sublogs. Domain
definition amounts to defining how to capture and represent event log dynamics over time. This
involves defining log concepts, which refer to the primary dimensions used to capture the dynamics
of event logs, and representation types, which determine how the dynamics
should be represented and analyzed for each log concept. The domain definition, i.e. the resulting
combination of log concept and representation type, translates into a unique domain-specific mapping
function. Finally, time series construction &amp; visualization comes down to applying that
mapping function to each of the (chronologically ordered) sublogs of the log discretization, thereby
yielding several time series.</p>
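        <p>The three stages can be made concrete with a minimal, dependency-free sketch. It is an illustration under stated assumptions, not DyLoPro's actual implementation: the toy log and the helper names week_start, discretize and mean_throughput_days are hypothetical, weekly periods are assumed to start on Monday, each case is assigned to the week of its first event, and the mean throughput time stands in for the domain-specific mapping function.</p>
        <preformat>
```python
from collections import defaultdict
from datetime import datetime, timedelta

# Toy log: case id -> chronologically ordered (activity, timestamp) events.
cases = {
    "c1": [("A", datetime(2023, 1, 2)), ("B", datetime(2023, 1, 4))],
    "c2": [("A", datetime(2023, 1, 3)), ("B", datetime(2023, 1, 10))],
    "c3": [("A", datetime(2023, 1, 9)), ("B", datetime(2023, 1, 12))],
}


def week_start(moment):
    """Return the Monday (at midnight) of the week containing `moment`."""
    day = datetime(moment.year, moment.month, moment.day)
    return day - timedelta(days=day.weekday())


def discretize(cases):
    """Stage 1: assign each case to the week of its first event."""
    sublogs = defaultdict(list)
    for events in cases.values():
        sublogs[week_start(events[0][1])].append(events)
    # Return the sublogs in chronological order.
    return dict(sorted(sublogs.items()))


def mean_throughput_days(sublog):
    """Stage 2: one possible mapping function (mean case duration in days)."""
    durations = [
        (events[-1][1] - events[0][1]).total_seconds() / 86400 for events in sublog
    ]
    return sum(durations) / len(durations)


# Stage 3: apply the mapping function to each chronologically ordered sublog.
series = {
    week: mean_throughput_days(sublog) for week, sublog in discretize(cases).items()
}
for week, value in series.items():
    print(week.date(), value)
```
        </preformat>
        <p>On this toy log, the sketch yields a mean throughput time of 4.5 days for the week starting on 2 January 2023 and 3.0 days for the following week. Swapping in a different mapping function changes the representation without touching the discretization.</p>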
        <p>
          The functionality provided by the framework is implemented as follows. Each of the six log
concepts defined in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is assigned one or two dedicated plotting methods (see Table 1). Given
a certain log concept and an associated plotting method, both the log discretization and the
concept’s representation type can be configured through the method’s arguments.
More specifically, the log discretization is configured by the frequency
parameter (e.g. 'weekly'), which determines the frequency by which cases are grouped together, and
the case_assignment parameter (e.g. 'first_event'), which determines the condition upon
which each case is assigned to one of the time periods. Each sublog of the discretization covers
an equal-length time period. The representation type is chosen
via each method’s plt_type parameter. To illustrate, the third command in Fig. 1
generates, for the 10 most frequently occurring variants (max_k=10), time series according
to the 'throughput time' representation type (plt_type='type_tt'), where each consecutive
value in each of the time series is calculated by aggregating over a sublog that covers one week
(frequency='weekly') and contains all cases whose first event occurred during that
specific week (case_assignment='first_event'). In this case, the aggregation corresponds to
computing the mean for each sublog (numeric_agg='mean'). The third and final stage, time series
construction &amp; visualization, simply corresponds to the execution of the specified plotting method.
Table 1 lists all six concepts introduced in [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], together with their visualization methods. In
addition to the common parameters just discussed, each plotting method is also equipped
with additional configuration parameters, providing users with even more flexibility.
For more information about DyLoPro’s methods and their parameters, the reader is referred to the
documentation page (see Section 3). The DyLoPro library not only implements the framework (see
Table 1), but also extends it, with new functionality added over time when needed. Several
auxiliary methods are also provided, e.g., to adjust the DynamicLogPlots
instance after initialization, or to query DataFrames containing the string
representations of the most frequently occurring variants or directly-follows relations. Again, please refer
to the extensive documentation page for an up-to-date overview of all of DyLoPro’s functionality.
        </p>
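        <p>To make the roles of the discretization parameters tangible, the following sketch mimics, in plain Python, what the example above computes: the mean throughput time per period for the most frequent variants. It is a rough illustration and does not call DyLoPro or reproduce its implementation. The function variant_throughput and the toy log are hypothetical; the argument names frequency, case_assignment and max_k merely mirror the parameters named in the text, and the values 'daily' and 'last_event' are assumptions of this sketch.</p>
        <preformat>
```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Toy log: case id -> chronologically ordered (activity, timestamp) events.
cases = {
    "c1": [("A", datetime(2023, 1, 2)), ("B", datetime(2023, 1, 4))],
    "c2": [("A", datetime(2023, 1, 3)), ("B", datetime(2023, 1, 9))],
    "c3": [("A", datetime(2023, 1, 9)), ("C", datetime(2023, 1, 10))],
    "c4": [("A", datetime(2023, 1, 10)), ("B", datetime(2023, 1, 14))],
}


def period_start(moment, frequency):
    """Start of the period containing `moment`; only two toy frequencies."""
    day = datetime(moment.year, moment.month, moment.day)
    if frequency == "weekly":
        return day - timedelta(days=day.weekday())  # weeks start on Monday
    if frequency == "daily":
        return day
    raise ValueError("frequency must be 'weekly' or 'daily' in this sketch")


def variant_throughput(cases, frequency="weekly", case_assignment="first_event", max_k=10):
    """Mean throughput time (days) per period for the max_k most frequent variants."""
    # A variant is the sequence of activities of a case.
    variant_of = {cid: tuple(act for act, _ in ev) for cid, ev in cases.items()}
    top = [v for v, _ in Counter(variant_of.values()).most_common(max_k)]
    durations = defaultdict(lambda: defaultdict(list))
    for cid, events in cases.items():
        if variant_of[cid] not in top:
            continue
        # case_assignment picks the event whose timestamp places the case.
        if case_assignment == "first_event":
            anchor = events[0][1]
        elif case_assignment == "last_event":  # hypothetical option of this sketch
            anchor = events[-1][1]
        else:
            raise ValueError("case_assignment must be 'first_event' or 'last_event'")
        days = (events[-1][1] - events[0][1]).total_seconds() / 86400
        durations[variant_of[cid]][period_start(anchor, frequency)].append(days)
    return {
        variant: {period: mean(vals) for period, vals in sorted(by_period.items())}
        for variant, by_period in durations.items()
    }


result = variant_throughput(cases, max_k=1)
print(result)
```
        </preformat>
        <p>With max_k=1 only the variant (A, B) is retained. Switching case_assignment from 'first_event' to 'last_event' moves case c2 from the first week to the second, which changes both weekly means and shows why the assignment rule matters when interpreting the resulting time series.</p>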
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. DyLoPro’s Maturity</title>
      <p>DyLoPro was first released on PyPI on 18 June 2023. As we move forward, it will be continuously
monitored, and new releases will be launched whenever necessary to ensure the best possible
performance and features for its users. Moreover, DyLoPro is accompanied by a
GitHub repository, which serves as a centralized hub for collaborative development
and houses essential elements to foster a thriving open-source PM community. Additionally,
DyLoPro offers an extensive Read the Docs page, which documents, among other things, how to initialize a
DynamicLogPlots instance and how to use its visualization methods.</p>
      <p>Furthermore, in a separate repository, we present case studies that leveraged DyLoPro to
examine the dynamics of various public real-life event logs frequently referenced in PM literature.
These case studies identify and further examine various remarkable patterns and problems in
a number of these event logs. By increasing transparency and understanding of these intricate
datasets, our case studies aim to improve the validity and accuracy of research relying on these
event logs. As part of an ongoing effort, the repository will continue to grow with
case studies of additional public event logs over time. Moreover, these case studies also contribute to
an improved understanding of the use and accessibility of DyLoPro’s extensive array of plotting
methods.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining: Data Science in Action</source>
          , 2 ed., Springer, Heidelberg,
          <year>2016</year>
          . doi:
          <pub-id pub-id-type="doi">10.1007/978-3-662-49851-4</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          , W. Aalst,
          <article-title>Supporting process mining by showing events at a glance</article-title>
          ,
          <source>Proceedings of 17th Annual Workshop on Information Technologies and Systems (WITS 2007)</source>
          (
          <year>2007</year>
          )
          <fpage>139</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B. F.</given-names>
            <surname>van Dongen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K. A.</given-names>
            <surname>de Medeiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. M. W.</given-names>
            <surname>Verbeek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. M. M.</given-names>
            <surname>Weijters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>The ProM framework: A new era in process mining tool support</article-title>
          , in: G. Ciardo, P. Darondeau (Eds.),
          <source>Applications and Theory of Petri Nets 2005</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2005</year>
          , pp.
          <fpage>444</fpage>
          -
          <lpage>454</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          , S. van Zelst,
          <string-name>
            <given-names>W.</given-names>
            <surname>Aalst</surname>
          </string-name>
          ,
          <article-title>Process mining for Python (PM4Py): Bridging the gap between process- and data science</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Denisov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Belkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Aalst</surname>
          </string-name>
          ,
          <article-title>The performance spectrum miner: Visual analytics for fine-grained performance analysis of processes</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wuyts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Weytjens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>vanden Broucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>De Weerdt</surname>
          </string-name>
          ,
          <article-title>DyLoPro: Profiling the dynamics of event logs</article-title>
          , in: C. Di Francescomarino, A. Burattin, C. Janiesch, S. Sadiq (Eds.),
          <source>Business Process Management</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <article-title>Data Structures for Statistical Computing in Python</article-title>
          , in: Stéfan van der Walt, Jarrod Millman (Eds.),
          <source>Proceedings of the 9th Python in Science Conference</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.25080/Majora-92bf1922-00a</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>