<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>bupaR: Business Process Analysis in R</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gert Janssenswillen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benoît Depaire</string-name>
          <email>benoit.depaire@uhasselt.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>UHasselt - Hasselt University, Faculty of Business Economics</institution>
          ,
          <addr-line>Agoralaan, 3590 Diepenbeek</addr-line>
          ,
          <country country="BE">Belgium</country>
          ;
          <institution>Research Foundation Flanders (FWO)</institution>
          ,
          <addr-line>Egmontstraat 5, 1060 Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper introduces bupaR, a collection of R-packages for exploring and visualizing event data and monitoring processes. It creates a bridge between the BPM community and the broader R and data science communities. Furthermore, it stimulates the use of process mining in corporations, as R is a very widely used analysis tool in business, with its popularity still strongly increasing.</p>
      </abstract>
      <kwd-group>
        <kwd>R</kwd>
        <kwd>process analysis</kwd>
        <kwd>process monitoring</kwd>
        <kwd>data analysis</kwd>
        <kwd>statistics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Over the past decades, the open source statistical language R [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has seen an
enormous increase in popularity, not only among data science researchers, but
also within companies. One of the reasons for this rising popularity is the
R-package ecosystem on CRAN and GitHub, to which everyone can contribute.
Recently, the number of packages available on CRAN has exceeded 10,000. These
provide a huge range of functionalities, covering a diverse set of techniques and
applications.
      </p>
      <p>This paper introduces bupaR, which is a set of R-packages for business
process analysis in R. The importance of these for the BPM community is twofold.</p>
      <p>Firstly, it allows researchers and data analysts to tap into the wide range of
functionalities in the R-ecosystem. This includes not only traditional data
mining techniques, such as clustering and classification, but also newer
developments, such as text mining, Spark, and deep learning. Furthermore, since R
originated as a statistical language, a vast number of statistical tests become
available, allowing straightforward confirmatory analyses on event data.</p>
      <p>Secondly, the introduced packages help bridge the gap between
business and academia, as R is increasingly used as a tool for data analysis in
corporations. As such, the packages provide a low threshold for companies to
discover the benefits of business process analysis and to become more acquainted
with the BPM community.</p>
      <p>The next section will provide an overview of the packages and their
functionalities, while Section 3 discusses the maturity of the tools and some use
cases. Section 4 then provides more practical information on how to get started,
including a link to a screencast and to the website with further documentation.</p>
    </sec>
    <sec>
      <title>Overview of functionalities</title>
      <p>bupaR consists of different R-packages, each with its own purpose. The package
bupaR itself acts as the core of the suite, providing the basic
functionality for handling event data which is used by the other packages. The other
packages are listed below and are briefly introduced in the next paragraphs.</p>
      <list list-type="bullet">
        <list-item>
          <p>edeaR: for exploratory and descriptive analysis of event data</p>
        </list-item>
        <list-item>
          <p>xesreadR: for reading and writing XES files</p>
        </list-item>
        <list-item>
          <p>processmapR: for creating process visualizations</p>
        </list-item>
        <list-item>
          <p>processmonitoR: for creating process dashboards for monitoring</p>
        </list-item>
        <list-item>
          <p>eventdataR: a data repository containing both real-life and artificial event logs</p>
        </list-item>
      </list>
      <sec>
        <title>bupaR</title>
        <p>The central package includes basic functionality for creating eventlog objects
in R. An event log in this context is a dataset combined with a mapping. The
mapping contains the characteristics of the specific event log, i.e. the case
identifier, activity identifier, timestamp, etc. Each element of this mapping refers to
a variable in the dataset. Figure 1 shows how an event log is created using the
user interface of bupaR.<fn>
            <label>1</label>
            <p>The function used in Figure 1 is called ieventlog, in which the first letter i indicates
that it opens a user interface, which is useful when doing interactive analysis. However,
each of these interface functions has a non-interface alternative (in this case the
function eventlog, which can be used both interactively and for scripting).</p>
          </fn></p>
        <p>The package further contains several functions to get information about an
event log. Furthermore, bupaR also provides specific event log versions for the
generic R functions print and summary. For a complete list of all the functions,
see http://bupar.net/bupar.html.</p>
      </sec>
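      <p>To make the idea of a dataset combined with a mapping concrete, the following
minimal sketch uses only base R. It illustrates the concept and is not bupaR's own
implementation: the column names (patient, step, time), the toy values and the plain
named vector used as a mapping are all assumptions made for this example. In bupaR
itself, the eventlog function mentioned above takes care of this step.</p>
      <preformat><![CDATA[
# Toy event data: one row per event (all values are made up for illustration).
events <- data.frame(
  patient = c("p1", "p1", "p2", "p2", "p2"),
  step    = c("register", "triage", "register", "triage", "discharge"),
  time    = as.POSIXct(c("2017-01-02 09:00:00", "2017-01-02 09:30:00",
                         "2017-01-02 10:00:00", "2017-01-02 10:45:00",
                         "2017-01-02 12:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# The mapping names the column that plays each role in the event log
# (hypothetical role names chosen for this sketch).
mapping <- c(case_id = "patient", activity_id = "step", timestamp = "time")

# Every element of the mapping must refer to a variable in the dataset.
stopifnot(all(mapping %in% names(events)))

# Look up columns through the mapping instead of hard-coding their names.
case_col     <- mapping[["case_id"]]
activity_col <- mapping[["activity_id"]]
n_cases      <- length(unique(events[[case_col]]))
n_activities <- length(unique(events[[activity_col]]))
]]></preformat>
      <p>Because the analysis goes through the mapping rather than through hard-coded
column names, the same code can be reused on another dataset once its mapping is given.</p>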
      <sec>
        <title>edeaR</title>
        <p>
          The name of the package edeaR stands for Exploratory and Descriptive
Eventdata Analysis in R. It was first introduced at SIMPDA 2016 [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and is the oldest
member of the bupaR family, although it has recently gone through important
improvements and extensions. As its name implies, the purpose of this package
is to perform more in-depth analyses of event logs. It contains two sets
of functions: metrics for the analysis of data and filters for data subsetting.
        </p>
        <p>
          The metric functions are largely built around the concepts of Lean Six
Sigma and Operational Excellence [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], i.e. metrics concerning time (processing
time, throughput time, idle time), variance (trace length, trace coverage, start
activities, etc.), rework (repetitions of work, self-loops of activities), and finally
also organizational aspects (e.g. involvement and specialization of
resources) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Furthermore, each of the available metrics can be computed at
different levels of granularity, e.g. for the complete log, for specific cases, for
specific resources, etc.
        </p>
        <p>
          While each of the metrics returns numeric results (an output table or a set
of numbers), the generic plot function of R has been customized to provide
tailored plots for each of the metrics at each level of granularity [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>The subsetting functions can be found easily, as each of them starts with
the prefix filter_. Different filter functions are available: filtering specific case
identifiers, specific attributes (event or case attributes), filtering on throughput
time, filtering activities based on their frequencies, etc. Note that for each of
the filter functions, an interface is also provided for straightforward interactive
use. A complete list of functions and a guide for edeaR can be found here:
http://bupar.net/edeaR.html.</p>
      </sec>
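      <p>As an illustration of what a time metric and a filter compute, the sketch below
derives the throughput time per case, summarizes it for the complete log, and then keeps
only the faster cases. It uses base R only and does not call edeaR: the toy data, the
column names and the five-hour threshold are assumptions for this example, and edeaR's
own functions may define or report the metric differently.</p>
      <preformat><![CDATA[
# Toy event data: one row per event (values invented for illustration).
events <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3", "c3"),
  activity = c("a", "b", "c", "a", "c", "a", "b"),
  time     = as.POSIXct(c("2017-03-01 08:00:00", "2017-03-01 09:00:00",
                          "2017-03-01 12:00:00", "2017-03-01 08:30:00",
                          "2017-03-01 09:00:00", "2017-03-02 10:00:00",
                          "2017-03-02 16:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Case level: throughput time = last timestamp minus first timestamp.
first_seen <- tapply(as.numeric(events$time), events$case, min)
last_seen  <- tapply(as.numeric(events$time), events$case, max)
throughput_hours <- (last_seen - first_seen) / 3600

# Log level: summary statistics of the same metric over all cases.
log_level <- summary(as.numeric(throughput_hours))

# Filter: keep only the events of cases that took at most 5 hours.
fast_cases <- names(throughput_hours)[throughput_hours <= 5]
filtered   <- events[events$case %in% fast_cases, ]
]]></preformat>
      <p>The same metric is thus available at two levels of granularity (per case and for
the whole log), and the filter returns a subset of the original events that can be
analyzed further.</p>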
      <sec>
        <title>processmapR</title>
        <p>The package processmapR provides a straightforward way to create
visualizations of processes, using customizable map profiles. The default profile is the
frequency profile with absolute frequencies on both arcs and nodes, as can
be seen in the example in Figure 2. The return object is a dgr_graph
object from the DiagrammeR package, and can be further customized by the user
if needed. Complex graphs can be simplified by combining the processmapR
functions with subsetting functions of edeaR. More information can be found at
http://bupar.net/processmapR.html.</p>
      </sec>
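      <p>The absolute frequencies shown on the nodes and arcs of a frequency map can be
derived by counting activities and directly-follows pairs. The base R sketch below shows
this counting under stated assumptions: the toy data and column names are hypothetical,
and it ignores where cases start and end. It is not processmapR's implementation and
does not produce a DiagrammeR graph.</p>
      <preformat><![CDATA[
# Toy event data with an explicit position of each event within its case.
events <- data.frame(
  case     = c("c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"),
  activity = c("a", "b", "c", "a", "c", "a", "b", "c"),
  position = c(1, 2, 3, 1, 2, 1, 2, 3),
  stringsAsFactors = FALSE
)

# Node labels: absolute frequency of each activity.
node_freq <- table(events$activity)

# Sort events within each case, then pair every event with its successor.
events <- events[order(events$case, events$position), ]
n <- nrow(events)
same_case <- events$case[-n] == events$case[-1]
arcs <- data.frame(
  from = events$activity[-n][same_case],
  to   = events$activity[-1][same_case],
  stringsAsFactors = FALSE
)

# Arc labels: absolute frequency of each directly-follows pair.
arc_freq <- as.data.frame(table(arcs$from, arcs$to), stringsAsFactors = FALSE)
names(arc_freq) <- c("from", "to", "n")
arc_freq <- arc_freq[arc_freq$n > 0, ]
]]></preformat>
      <p>Filtering the events before counting, for instance by dropping infrequent
activities, reduces the number of nodes and arcs, which is the idea behind simplifying
complex graphs with subsetting functions.</p>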
      <sec>
        <title>xesreadR</title>
        <p>As is clear from the name, the purpose of xesreadR is to read and write XES files.
This makes bupaR compatible with the IEEE standard for sharing and storing
event data, and thus with other process mining tools. For example, this is useful
if you want to combine bupaR with RapidProM, as RapidMiner also supports
the use of R-scripts.</p>
      </sec>
      <sec>
        <title>eventdataR</title>
        <p>With the purpose of testing new techniques and algorithms, the package
eventdataR contains a data repository of both artificial and real-life event logs.
Each of them can be loaded directly with the data function in R.
The available datasets are listed here: http://bupar.net/eventdataR.html.</p>
      </sec>
      <sec>
        <title>processmonitoR</title>
        <p>
          While the other packages can be used both interactively and in production,
processmonitoR is mostly intended for the latter case, as it provides
building blocks for online process monitoring dashboards. The available dashboards,
which build upon the Shiny framework [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], are largely organized along the
metrics of edeaR.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Maturity and Use Cases</title>
      <p>
        As explained, the packages can be used for several purposes, ranging from a
one-time interactive analysis of event data to building a full-fledged online process
monitoring dashboard. edeaR, the oldest package included in bupaR, was first
published on CRAN in January 2016 and has since been downloaded more than
5000 times worldwide. Its use is also illustrated in several submissions for the
BPI Challenge [
        <xref ref-type="bibr" rid="ref3 ref5">3,5</xref>
        ].
      </p>
      <p>Although the other packages have been shared publicly more recently, the
underlying functionality has already proved useful in several projects. A
process monitoring dashboard was developed to analyze the process of connecting
new customers at a Belgian utility network operator, while the process
visualizations have been used to visualize train deviations at a European railway
infrastructure manager.</p>
    </sec>
    <sec id="sec-3">
      <title>Getting started</title>
      <p>Screencast. A screencast discussing the installation and use of bupaR and the
related packages can be found here: https://goo.gl/huHTGE.</p>
      <p>Prerequisites. In order to use bupaR, R and RStudio (or another IDE) need
to be installed. More information can be found on cran.rstudio.com and
www.rstudio.com.</p>
      <p>Installing bupaR. Installing bupaR can be done easily by executing
install.packages("bupaR") in the R-console. You can then load it using
library(bupaR). The first time, you will be asked to also install the related
packages, to which you should answer Yes (Y).</p>
      <p>Further guidance. The website www.bupar.net contains ample documentation
and examples of the tools discussed.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>Special thanks and appreciation go to Mieke Jans and Marijke Swennen for
catalyzing the development of the initial version of edeaR and for their suggestions
and feedback during the further evolution of bupaR.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cheng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allaire</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McPherson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>shiny: Web application framework for R</article-title>
          ,
          <year>2015</year>
          . URL http://CRAN.R-project.org/package=shiny.
          <source>R package version 0.11</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ihaka</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gentleman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>R: a language for data analysis and graphics</article-title>
          .
          <source>Journal of Computational and Graphical Statistics</source>
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>299</fpage>
          -
          <lpage>314</lpage>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Janssenswillen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Creemers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jouck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swennen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Does werk.nl work?</article-title>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Janssenswillen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swennen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Depaire</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jans</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoof</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Enabling event-data analysis in R: Demonstration</article-title>
          . RWTH Aachen University (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssenswillen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jouck</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swennen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hosseinpour</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Masoumigoudarzi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>An exploration and analysis of the building permit application process in five Dutch municipalities</article-title>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Swennen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssenswillen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jans</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Depaire</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoof</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Capturing process behavior with log-based process metrics</article-title>
          .
          In:
          <source>SIMPDA</source>
          . pp.
          <fpage>141</fpage>
          -
          <lpage>144</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Swennen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janssenswillen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jans</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Depaire</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caris</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanhoof</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Capturing resource behaviour from event logs</article-title>
          . RWTH Aachen University (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Wickham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>ggplot2</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Computational Statistics</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>180</fpage>
          -
          <lpage>185</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>