=Paper=
{{Paper
|id=Vol-3299/Paper20
|storemode=property
|title=Process Analysis with bupaR 0.5.0: What’s New? (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-3299/Paper20.pdf
|volume=Vol-3299
|authors=Gerhardus A. W. M. van Hulzen,Gert Janssenswillen,Niels Martin,Benoît Depaire
|dblpUrl=https://dblp.org/rec/conf/icpm/HulzenJMD22
}}
==Process Analysis with bupaR 0.5.0: What’s New? (Extended Abstract)==
Process Analysis with bupaR 0.5.0: What’s New? (Extended Abstract) Gerhardus A. W. M. van Hulzen1,* , Gert Janssenswillen1 , Niels Martin1,2 and Benoît Depaire1 1 Research group Business Informatics, Hasselt University, Martelarenlaan 42, 3500 Hasselt, Belgium 2 Research Foundation Flanders (FWO), Egmontstraat 5, 1000 Brussels, Belgium Abstract bupaR and the bupaverse are a collection of open-source R-packages designed for process data analysis in R. Due to its focus on interactivity, reproducibility, and extensibility, combined with its open-source nature, bupaR has seen a significant increase in usage over the past few years, both by academics and professional process analysts. In this demonstration, we highlight the new features of bupaR 0.5.0, which can assist practitioners when analysing their process data. Keywords bupaR, R, Process analytics, Process mining, Event data 1. Introduction Several open-source software solutions are available for process mining analyses, such as ProM [1], PM4Py [2], Apromore CE [3], and bupaR [4]. The availability of these tools allows professionals to experiment and experience the value of process mining easily and free of charge. For process and data analysts familiar with the statistical software environment R [5], the bupaverse collection of R-packages provide a starting point for the analysis of process data. The core focus of bupaverse is based on three key principles: (i) extensibility, (ii) reproducibility, and (iii) interactivity [4, 6]. These fundamental principles, together with its open-source nature, have contributed to its widespread use. We continuously improve and add new features to enhance the functionalities offered by bupaverse. This paper presents the release highlights of bupaR 0.5.0 [7], discusses its maturity and how one can start using it, and briefly looks forward to future development and releases. ICPM 2022 Doctoral Consortium and Tool Demonstration Track * Corresponding author. $ gerard.vanhulzen@uhasselt.be (G. A. W. M. van Hulzen); gert.janssenswillen@uhasselt.be (G. Janssenswillen); niels.martin@uhasselt.be (N. Martin); benoit.depaire@uhasselt.be (B. Depaire) 0000-0001-8962-9515 (G. A. W. M. van Hulzen); 0000-0002-7474-2088 (G. Janssenswillen); 0000-0003-3279-3853 (N. Martin); 0000-0003-4735-0609 (B. Depaire) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) 95 2. New Features 2.1. Activity Log In bupaR 0.5.0, a new kind of log format has been introduced: the activity log. In an activity log, each row represents a single activity instance. This means that, as opposed to an event log in which each row represents an event occurring at a particular point in time, an activity log can have multiple timestamps per row (e.g. schedule, start, complete, etc.) [8, 9]. These are stored across multiple columns, in contrast to the single timestamp column of an event log. An example of conversion between event log to activity log and vice versa is shown in Fig. 1. eventlog Event 1 Activity instance a activitylog Event 2 Activity instance b Activity instance a Event 1 Event 3 Activity instance b Event 2 Event 4 Activity instance c Event 5 Activity instance c Event 3 Event 4 Event 5 Event 6 Activity instance d Activity instance d Event 6 Event 7 Activity instance e Event 7 Event 8 Activity instance e Event 8 Activity instance f Event 9 Event 10 Event 9 Activity instance f Activity instance g Event 11 Event 10 Event 11 Activity instance g Figure 1: Conversion from eventlog to activitylog, and vice versa [7]. The activity log has been implemented as a new S3 class object (activitylog) alongside the existing eventlog object. The main advantages of the new activitylog object are a reduced memory footprint and increased analysis performance. Especially for analyses on activity instance level, e.g. the durations of activities, the new activitylog is more convenient and efficient because all events belonging to the same activity instance are stored on the same entry in the log. Moreover, activity attributes are recorded only once per activity instance, instead of repeatedly for each event of the same instance. Nevertheless, this does not imply that eventlog is completely superseded. In fact, the eventlog provides more flexibility because attributes can be stored at the event level, allowing events of the same activity instance to have different attributes. For example, different resources could be responsible for the start and completion of an activity instance. In addition, in an eventlog, the same lifecycle (e.g. schedule, start, complete, etc.) can be repeated multiple times, which is useful when the activity instance was suspended and later resumed. Therefore, depend- ing on the use case, either eventlog or activitylog is the most appropriate format. Currently, bupaR, edeaR, processmapR, and processcheckR fully support activitylog objects, and 96 other bupaverse packages will follow in subsequent releases. Moreover, logs can be conveni- ently transformed from one into the other using the to_eventlog() and to_activitylog() functions. tibble tbl_df bupaR log eventlog activitylog grouped_eventlog grouped_activitylog grouped_log Figure 2: bupaR S3 class inheritance schema. In order to implement activitylog and facilitate the extendibility of the bupaR ecosystem, we have revised the S3 class inheritance of log objects. Fig. 2 visualises the new class inheritance schema. Both eventlog and activitylog are inherited from the new base log class, which in turn uses a tbl_df from the tibble package [10] as back-end data storage. When grouping is applied to a log class using the group_by() functions, it becomes a grouped_log to signify the presence of grouping variable(s). 2.2. Augmenting Logs As of edeaR 0.9.0, our package for exploratory and descriptive event data analysis, all append and append_column arguments of descriptive metrics (e.g. activity_frequency(), processing_time(), etc.) have been deprecated in favour of a new augment() method, which is consistent with the broom package [10] for adding outputs of predictions and estimations to data. The new workflow is visualised in Fig. 3, and a code example is provided in Listing 1. For instance, we can calculate the throughput times for each case on the sepsis log and add these times back to the sepsis log as a new column "case_throughput_time". log metric() augment() augmented_log Figure 3: Augmenting a log [7]. 1 sepsis %>% 2 throughput_time(level = "case") %>% 3 augment(log = sepsis, columns = "throughput_time", prefix = "case") Listing 1: R example of augmenting a log. 97 This new workflow ensures consistent separation between the outputs of descriptive metrics and log objects. Furthermore, the augment() method provides a standardised, flexible, and transparent way to enrich logs with descriptive metrics. 2.3. Improved Data Manipulation Significant changes have been made to the supported dplyr [10] methods for data manipulation in bupaR (e.g. filter, mutate, slice, etc.), most significantly to group_by(), for grouping event data for descriptive analyses. For example, the number of cases in which each activity was executed can be calculated using the code shown on line 1 in Listing 2. 1 sepsis %>% group_by(activity) %>% n_cases() 2 sepsis %>% group_by_ids(activity_id) %>% n_cases() 3 sepsis %>% group_by_activity() %>% n_cases() Listing 2: R example of group_by. A more convenient way of grouping log objects as of bupaR 0.5.0 is by using the group_by_ids() method, completed with the desired bupaR attribute function(s) (e.g. activity_id, case_id, etc.), or by directly using group_by_activity(), as shown on lines 2 and 3, respectively. These new grouping methods allow conducting grouped descriptive analyses more conveniently without knowing the underlying column names. Moreover, the handling of grouped logs is improved so that any metric can now be computed for any (set of) grouping variable(s). 3. Maturity & Usage Since its conception, bupaR has received over 800K downloads in over 160 countries. Users come from various industries, e.g., healthcare, governance, automotive, and academics. Stable versions of bupaR and other bupaverse packages can be installed from CRAN us- ing install.packages("bupaverse") or, for the version with the latest patches and bug- fixes, directly from GitHub1 using devtools::install_github("bupaverse/bupaverse"). A demonstration of the release can be found here.2 Furthermore, the bupar.net website contains ample documentation and examples on bupaR and the bupaverse packages. 4. Conclusion & Future Work This paper presented the release highlights of bupaR 0.5.0, most notably the introduction of the activity log, a new standardised way to augment logs, and improved data manipulation. Future releases will focus on extending the bupaverse ecosystem with new functionalities for process analysis and maintenance of existing code. New functionalities, such as Performance Spectrum [11], trace and activity clustering, social network mining and process discovery, are currently on the roadmap. Other functionalities can be requested using GitHub Issues.1 1 https://github.com/bupaverse/ 2 https://tinyurl.com/icpmdemobupar 98 Acknowledgments The authors would like to warmly thank all users who are actively contributing to the bupaR- framework by submitting issues and pull requests on the GitHub1 repositories. This study was supported by the Special Research Fund (BOF) of Hasselt University under Grant No. BOF19OWB20. References [1] B. F. van Dongen, A. K. A. de Medeiros, E. H. M. W. Verbeek, A. J. M. M. Weijters, W. M. P. van der Aalst, The ProM Framework: A New Era in Process Mining Tool Support, volume 3536 of LNCS, Springer, 2005, pp. 444–454. doi:10.1007/11494744_25. [2] A. Berti, S. J. van Zelst, W. M. P. van der Aalst, Process Mining for Python (PM4Py): Bridging the Gap Between Process- and Data Science, volume 2374 of CEUR Workshop Proceedings, 2019, pp. 13–16. [3] M. La Rosa, H. A. Reijers, W. M. P. van der Aalst, R. M. Dijkman, J. Mendling, M. Dumas, L. García-Bañuelos, APROMORE: An Advanced Process Model Repository, Expert Syst. Appl. 38 (2011) 7029–7040. doi:10.1016/j.eswa.2010.12.012. [4] G. Janssenswillen, B. Depaire, M. Swennen, M. J. Jans, K. Vanhoof, bupaR: Enabling Reproducible Business Process Analysis, Knowl. Based Syst. 163 (2019) 927–930. doi:10. 1016/j.knosys.2018.10.018. [5] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2022. URL: https://www.R-project.org. [6] G. Janssenswillen, F. Mannhardt, M. Creemers, B. Depaire, L. Jooken, N. Martin, G. Van Houdt, Extensions to the bupaR Ecosystem: An Overview, volume 2703 of CEUR Workshop Proceedings, 2020, pp. 43–46. [7] G. Janssenswillen, bupaR 0.5.0: What’s new?, 2022. URL: https://bupar.net/2022/07/27/ bupar-0-5-0-whats-new/. [8] N. Martin, G. Van Houdt, G. Janssenswillen, DaQAPO: Supporting Flexible and Fine- Grained Event Log Quality Assessment, Expert Syst. Appl. 191 (2022) 116274. doi:10. 1016/j.eswa.2021.116274. [9] L. Bouarfa, J. Dankelman, Workflow Mining and Outlier Detection from Clinical Activity Logs, J. Biomed. Inform. 45 (2012) 1185–1190. doi:10.1016/j.jbi.2012.08.003. [10] H. Wickham, M. Averick, J. Bryan, W. Chang, L. D. McGowan, R. François, G. Grolemund, A. Hayes, L. Henry, J. Hester, M. Kuhn, T. L. Pedersen, E. Miller, S. M. Bache, K. Müller, J. Ooms, D. Robinson, D. P. Seidel, V. Spinu, K. Takahashi, D. Vaughan, C. Wilke, K. Woo, H. Yutani, Welcome to the Tidyverse, J. Open Source Softw. 4 (2019) 1686. doi:10.21105/ joss.01686. [11] V. Denisov, E. Belkina, D. Fahland, W. M. P. van der Aalst, The Performance Spectrum Miner: Visual Analytics for Fine-Grained Performance Analysis of Processes, volume 2196 of CEUR Workshop Proceedings, 2018, pp. 96–100. 99