<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Doctoral Consortium and Tool Demonstration Track
* Corresponding author.
Email: gerard.vanhulzen@uhasselt.be (G. A. W. M. van Hulzen); gert.janssenswillen@uhasselt.be (G. Janssenswillen);
niels.martin@uhasselt.be (N. Martin); benoit.depaire@uhasselt.be (B. Depaire)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Process Analysis with bupaR 0.5.0: What's New? (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gerhardus A. W. M. van Hulzen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gert Janssenswillen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Niels Martin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benoît Depaire</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Research Foundation Flanders (FWO)</institution>
          ,
          <addr-line>Egmontstraat 5, 1000 Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Research group Business Informatics, Hasselt University</institution>
          ,
          <addr-line>Martelarenlaan 42, 3500 Hasselt</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>bupaR and the bupaverse are a collection of open-source R-packages designed for process data analysis in R. Due to its focus on interactivity, reproducibility, and extensibility, combined with its open-source nature, bupaR has seen a significant increase in usage over the past few years, both by academics and professional process analysts. In this demonstration, we highlight the new features of bupaR 0.5.0, which can assist practitioners when analysing their process data.</p>
      </abstract>
      <kwd-group>
        <kwd>bupaR</kwd>
        <kwd>R</kwd>
        <kwd>Process analytics</kwd>
        <kwd>Process mining</kwd>
        <kwd>Event data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Several open-source software solutions are available for process mining analyses, such as
ProM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], PM4Py [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Apromore CE [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and bupaR [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The availability of these tools allows
professionals to experiment with process mining and experience its value easily and free of
charge.
      </p>
      <p>
        For process and data analysts familiar with the statistical software environment R [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the
bupaverse collection of R-packages provides a starting point for the analysis of process data.
bupaverse is built on three key principles: (i) extensibility, (ii) reproducibility,
and (iii) interactivity [
        <xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>
        ]. These fundamental principles, together with its open-source nature,
have contributed to its widespread use.
      </p>
      <p>
        We continuously improve existing features and add new ones to enhance the functionality offered by
bupaverse. This paper presents the release highlights of bupaR 0.5.0 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], discusses its maturity
and how one can start using it, and briefly looks forward to future development and releases.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. New Features</title>
      <sec id="sec-2-1">
        <title>2.1. Activity Log</title>
        <p>
          In bupaR 0.5.0, a new kind of log format has been introduced: the activity log. In an activity
log, each row represents a single activity instance. This means that, as opposed to an event
log in which each row represents an event occurring at a particular point in time, an activity
log can have multiple timestamps per row (e.g. schedule, start, complete, etc.) [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. These are
stored across multiple columns, in contrast to the single timestamp column of an event log. An
example of the conversion from an event log to an activity log and vice versa is shown in Fig. 1.
        </p>
        <p>The activity log has been implemented as a new S3 class object (activitylog) alongside the
existing eventlog object. The main advantages of the new activitylog object are a reduced
memory footprint and increased analysis performance. Especially for analyses at the activity
instance level, e.g. the durations of activities, the new activitylog is more convenient and
efficient because all events belonging to the same activity instance are stored in the same row
of the log. Moreover, activity attributes are recorded only once per activity instance, instead of
repeatedly for each event of the same instance.</p>
        <p>Nevertheless, this does not imply that eventlog is completely superseded. In fact, the
eventlog provides more flexibility because attributes can be stored at the event level, allowing
events of the same activity instance to have different attributes. For example, different resources
could be responsible for the start and completion of an activity instance. In addition, in an
eventlog, the same lifecycle (e.g. schedule, start, complete, etc.) can be repeated multiple times,
which is useful when the activity instance was suspended and later resumed. Therefore,
depending on the use case, either eventlog or activitylog is the more appropriate format. Currently,
bupaR, edeaR, processmapR, and processcheckR fully support activitylog objects, and
other bupaverse packages will follow in subsequent releases. Moreover, logs can be
conveniently transformed from one format into the other using the to_eventlog() and to_activitylog()
functions.</p>
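        <p>The difference between the two layouts can be illustrated with a minimal base R sketch. It deliberately does not use the bupaR classes: the toy data frame, its column names, and the merge-based pivot are assumptions made purely for illustration, whereas in practice to_activitylog() performs the conversion.</p>
        <preformat><![CDATA[
```r
# Toy event log: one row per event (hypothetical columns and values).
events <- data.frame(
  case_id           = c("c1", "c1", "c1", "c1"),
  activity          = c("Triage", "Triage", "Treatment", "Treatment"),
  activity_instance = c(1, 1, 2, 2),
  lifecycle         = c("start", "complete", "start", "complete"),
  timestamp         = as.POSIXct(
    c("2022-01-01 09:00:00", "2022-01-01 09:10:00",
      "2022-01-01 09:15:00", "2022-01-01 10:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Split the events by lifecycle and rename the timestamp column.
starts <- events[events$lifecycle == "start",
                 c("case_id", "activity", "activity_instance", "timestamp")]
names(starts)[names(starts) == "timestamp"] <- "start"

completes <- events[events$lifecycle == "complete",
                    c("activity_instance", "timestamp")]
names(completes)[names(completes) == "timestamp"] <- "complete"

# Activity-log layout: one row per activity instance, one column per timestamp.
activities <- merge(starts, completes, by = "activity_instance")

# A duration now only needs columns of the same row.
activities$duration_mins <- as.numeric(
  difftime(activities$complete, activities$start, units = "mins")
)
print(activities)
```
]]></preformat>
        <p>Because each activity instance occupies a single row in this layout, its attributes are stored once, which is also why a different resource for the start and the completion of the same instance can only be recorded in the event-level layout.</p>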
        <p>[Figure 2: S3 class inheritance schema of log objects. Recoverable labels: tibble (tbl_df); bupaR (log, grouped_log, eventlog, grouped_eventlog).]</p>
        <p>
          In order to implement activitylog and facilitate the extensibility of the bupaR ecosystem,
we have revised the S3 class inheritance of log objects. Fig. 2 visualises the new class inheritance
schema. Both eventlog and activitylog inherit from the new base log class, which in
turn uses a tbl_df from the tibble package [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] as back-end data storage. When grouping
is applied to a log object using one of the group_by() functions, it becomes a grouped_log to signify
the presence of grouping variable(s).
        </p>
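        <p>The dispatch behaviour that such a hierarchy enables can be sketched in base R. This is a toy illustration only: as_toy_log() and describe() are hypothetical names that are not part of bupaR, and a plain data.frame stands in for the tbl_df back end.</p>
        <preformat><![CDATA[
```r
# Hypothetical constructor: prepend a subclass and the shared base class "log".
as_toy_log <- function(data, subclass) {
  class(data) <- c(subclass, "log", class(data))
  data
}

# Hypothetical generic with a method for the base class ...
describe <- function(x, ...) UseMethod("describe")
describe.log <- function(x, ...) "log"

# ... and an override for one subclass that delegates to the base method.
describe.activitylog <- function(x, ...) {
  base_description <- NextMethod()
  paste("activitylog, which is a", base_description)
}

ev  <- as_toy_log(data.frame(case = "c1"), "eventlog")
act <- as_toy_log(data.frame(case = "c1"), "activitylog")

class(ev)             # "eventlog" "log" "data.frame"
describe(ev)          # no eventlog method exists, so describe.log() is used
describe(act)         # describe.activitylog() runs, then describe.log()
inherits(act, "log")  # both log types share the base class
```
]]></preformat>
        <p>In a design like this, a method written once for the base class is available to every log type, and a new log class only needs to override what differs.</p>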
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Augmenting Logs</title>
        <p>
          As of edeaR 0.9.0, our package for exploratory and descriptive event data analysis, all
append and append_column arguments of descriptive metrics (e.g. activity_frequency(),
processing_time(), etc.) have been deprecated in favour of a new augment() method, which
is consistent with the broom package [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] for adding outputs of predictions and estimations to
data. The new workflow is visualised in Fig. 3, and a code example is provided in Listing 1. For
instance, we can calculate the throughput time of each case in the sepsis log and add these
times back to the sepsis log as a new column "case_throughput_time".
        </p>
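        <p>[Figure 3: workflow for augmenting a log: log, metric(), augment(), augmented_log.]</p>
        <preformat>library(bupaR)
library(edeaR)
library(eventdataR)  # assumed here as the source of the sepsis log
sepsis %&gt;%
  throughput_time(level = "case") %&gt;%
  augment(log = sepsis, columns = "throughput_time", prefix = "case")</preformat>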
        <sec id="sec-2-2-1">
          <title>Listing 1: R example of augmenting a log.</title>
          <p>This new workflow ensures consistent separation between the outputs of descriptive metrics
and log objects. Furthermore, the augment() method provides a standardised, flexible, and
transparent way to enrich logs with descriptive metrics.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Improved Data Manipulation</title>
        <p>
          Significant changes have been made to the supported dplyr [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] methods for data manipulation
in bupaR (e.g. filter(), mutate(), and slice()), most notably to group_by(), which groups
event data for descriptive analyses. For example, the number of cases in which each activity
was executed can be calculated using the code shown on line 1 in Listing 2.
        </p>
        <preformat>1 sepsis %&gt;% group_by(activity) %&gt;% n_cases()
2 sepsis %&gt;% group_by_ids(activity_id) %&gt;% n_cases()
3 sepsis %&gt;% group_by_activity() %&gt;% n_cases()</preformat>
        <sec id="sec-2-3-1">
          <title>Listing 2: R example of group_by.</title>
          <p>As of bupaR 0.5.0, a more convenient way of grouping log objects is to call the
group_by_ids() method with the desired bupaR attribute function(s) (e.g.
activity_id or case_id), or to use group_by_activity() directly, as shown on lines
2 and 3, respectively. These new grouping methods make it possible to conduct grouped descriptive
analyses without knowing the underlying column names. Moreover, the
handling of grouped logs has been improved so that any metric can now be computed for any (set of)
grouping variable(s).</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Maturity &amp; Usage</title>
      <p>Since its conception, bupaR has received over 800K downloads from over 160 countries. Users
come from various sectors, e.g. healthcare, governance, automotive, and academia.
Stable versions of bupaR and other bupaverse packages can be installed from CRAN
using install.packages("bupaverse") or, for the version with the latest patches and
bugfixes, directly from GitHub<sup>1</sup> using devtools::install_github("bupaverse/bupaverse"). A
demonstration of the release can be found here.<sup>2</sup> Furthermore, the bupar.net website contains
ample documentation and examples on bupaR and the bupaverse packages.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion &amp; Future Work</title>
      <p>This paper presented the release highlights of bupaR 0.5.0, most notably the introduction of the
activity log, a new standardised way to augment logs, and improved data manipulation.</p>
      <p>
        Future releases will focus on extending the bupaverse ecosystem with new functionalities
for process analysis and on maintaining existing code. New functionalities, such as the Performance
Spectrum [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], trace and activity clustering, social network mining, and process discovery, are
currently on the roadmap. Other functionalities can be requested using GitHub Issues.<sup>1</sup>
      </p>
      <sec id="sec-4-1">
        <title>Acknowledgments</title>
        <p><sup>1</sup> https://github.com/bupaverse/ <sup>2</sup> https://tinyurl.com/icpmdemobupar</p>
        <p>The authors would like to warmly thank all users who are actively contributing to the
bupaR framework by submitting issues and pull requests on the GitHub<sup>1</sup> repositories.</p>
        <p>This study was supported by the Special Research Fund (BOF) of Hasselt University under
Grant No. BOF19OWB20.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>B. F.</given-names> <surname>van Dongen</surname></string-name>,
          <string-name><given-names>A. K. A.</given-names> <surname>de Medeiros</surname></string-name>,
          <string-name><given-names>E. H. M. W.</given-names> <surname>Verbeek</surname></string-name>,
          <string-name><given-names>A. J. M. M.</given-names> <surname>Weijters</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>The ProM Framework: A New Era in Process Mining Tool Support</article-title>,
          volume <volume>3536</volume> of <source>LNCS</source>, Springer, <year>2005</year>,
          pp. <fpage>444</fpage>-<lpage>454</lpage>.
          doi:<pub-id pub-id-type="doi">10.1007/11494744_25</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>A.</given-names> <surname>Berti</surname></string-name>,
          <string-name><given-names>S. J.</given-names> <surname>van Zelst</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science</article-title>,
          volume <volume>2374</volume> of <source>CEUR Workshop Proceedings</source>, <year>2019</year>,
          pp. <fpage>13</fpage>-<lpage>16</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>M.</given-names> <surname>La Rosa</surname></string-name>,
          <string-name><given-names>H. A.</given-names> <surname>Reijers</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <string-name><given-names>R. M.</given-names> <surname>Dijkman</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Mendling</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Dumas</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>García-Bañuelos</surname></string-name>,
          <article-title>APROMORE: An Advanced Process Model Repository</article-title>,
          <source>Expert Syst. Appl.</source> <volume>38</volume> (<year>2011</year>)
          <fpage>7029</fpage>-<lpage>7040</lpage>.
          doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2010.12.012</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>G.</given-names> <surname>Janssenswillen</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Depaire</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Swennen</surname></string-name>,
          <string-name><given-names>M. J.</given-names> <surname>Jans</surname></string-name>,
          <string-name><given-names>K.</given-names> <surname>Vanhoof</surname></string-name>,
          <article-title>bupaR: Enabling Reproducible Business Process Analysis</article-title>,
          <source>Knowl. Based Syst.</source> <volume>163</volume> (<year>2019</year>)
          <fpage>927</fpage>-<lpage>930</lpage>.
          doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2018.10.018</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <collab>R Core Team</collab>,
          <source>R: A Language and Environment for Statistical Computing</source>,
          R Foundation for Statistical Computing, <year>2022</year>.
          URL: <uri>https://www.R-project.org</uri>.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Janssenswillen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Creemers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Depaire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jooken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Van Houdt</surname>
          </string-name>
          ,
          <article-title>Extensions to the bupaR Ecosystem: An Overview</article-title>
          , volume
          <volume>2703</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>G.</given-names> <surname>Janssenswillen</surname></string-name>,
          <source>bupaR 0.5.0: What's new?</source>, <year>2022</year>.
          URL: <uri>https://bupar.net/2022/07/27/bupar-0-5-0-whats-new/</uri>.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>N.</given-names> <surname>Martin</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Van Houdt</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Janssenswillen</surname></string-name>,
          <article-title>DaQAPO: Supporting Flexible and Fine-Grained Event Log Quality Assessment</article-title>,
          <source>Expert Syst. Appl.</source> <volume>191</volume> (<year>2022</year>)
          <elocation-id>116274</elocation-id>.
          doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2021.116274</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name><given-names>L.</given-names> <surname>Bouarfa</surname></string-name>,
          <string-name><given-names>J.</given-names> <surname>Dankelman</surname></string-name>,
          <article-title>Workflow Mining and Outlier Detection from Clinical Activity Logs</article-title>,
          <source>J. Biomed. Inform.</source> <volume>45</volume> (<year>2012</year>)
          <fpage>1185</fpage>-<lpage>1190</lpage>.
          doi:<pub-id pub-id-type="doi">10.1016/j.jbi.2012.08.003</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wickham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Averick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bryan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>McGowan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>François</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Grolemund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hayes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Henry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hester</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Bache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ooms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Robinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Seidel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Spinu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Takahashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vaughan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wilke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Woo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yutani</surname>
          </string-name>
          , <article-title>Welcome to the Tidyverse</article-title>,
          <source>J. Open Source Softw.</source> <volume>4</volume> (<year>2019</year>)
          <elocation-id>1686</elocation-id>.
          doi:<pub-id pub-id-type="doi">10.21105/joss.01686</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Denisov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Belkina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fahland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>The Performance Spectrum Miner: Visual Analytics for Fine-Grained Performance Analysis of Processes</article-title>
          , volume
          <volume>2196</volume>
          <source>of CEUR Workshop Proceedings</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>100</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>