=Paper=
{{Paper
|id=Vol-3299/Paper12
|storemode=property
|title=Amun: A tool for Differentially Private Release of Event Logs for Process Mining (Extended Abstract)
|pdfUrl=https://ceur-ws.org/Vol-3299/Paper12.pdf
|volume=Vol-3299
|authors=Gamal Elkoumy,Alisa Pankova,Marlon Dumas
|dblpUrl=https://dblp.org/rec/conf/icpm/ElkoumyPD22
}}
==Amun: A tool for Differentially Private Release of Event Logs for Process Mining (Extended Abstract)==
Gamal Elkoumy (1, corresponding author), Alisa Pankova (2) and Marlon Dumas (1)

1 University of Tartu, 18 Narva mnt, Tartu, 51009, Estonia

2 Cybernetica, 20 Narva mnt, Tartu, 51009, Estonia

Abstract. Event logs capture the execution of business processes inside organizations. Event logs may contain private information about individuals, such as customers in customer-facing business processes, which can prevent organizations from analyzing the logs under data protection regulations. To address this issue, this paper introduces Amun, a web-based application for releasing event logs using differential privacy. The tool enables users to obtain a differentially private event log whose re-identification risk does not exceed the maximum acceptable threshold given by the user. Therefore, the customers’ privacy is guaranteed, and the organization can release its logs to be analyzed.

Keywords: Process Mining, Event Log, Differential Privacy

Process mining is a family of techniques that analyze the performance, quality, and conformance of business processes inside organizations [1]. The input to most process mining techniques is an event log that captures an organization’s process execution. An event log may contain sensitive information about the customers being served in a customer-facing business process. Thus, the analysis of such logs is subject to data privacy regulations such as GDPR (footnote 1). Privacy-preserving process mining [2] aims to ensure that privacy regulations are met by providing guarantees such as k-anonymity and differential privacy [3]. Some tools, such as ELPaaS [4] and PC4PM [5], enable the user to apply k-anonymity mechanisms to event logs. Other tools enable privacy-preserving process mining across distributed event logs [6]. Among privacy-enhancing technologies, differential privacy stands out due to its proven privacy guarantees and composability.
Several approaches in the literature have addressed the problem of releasing differentially private event logs for process mining [2]. However, most of these approaches have stayed in academia and have not been widely adopted in real-world scenarios where organizations need to release their event logs so that process mining analysts can find enhancement opportunities. This paper presents Amun, an open-source tool for releasing differentially private event logs. The tool anonymizes the user traces in the log so that an individual cannot be singled out using a sub-trace. Furthermore, Amun anonymizes the execution timestamps and masks the case IDs. As a non-functional requirement, Amun can process large event logs with hundreds of thousands of events. Moreover, the tool lets the users know each event’s re-identification risk in the original log.

The rest of the paper is structured as follows. Sect. 1 describes Amun’s functionality and components. Sect. 2 discusses the availability and maturity of the tool. Sect. 3 presents the conclusions.

Published in: ICPM 2022 Doctoral Consortium and Tool Demonstration Track, October 23–28, 2022, Bolzano, Italy. CEUR Workshop Proceedings (CEUR-WS.org, http://ceur-ws.org), ISSN 1613-0073. © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Contact: gamal.elkoumy@ut.ee (G. Elkoumy, corresponding author); alisa.pankova@cyber.ee (A. Pankova); marlon.dumas@ut.ee (M. Dumas)

Footnote 1: http://data.europa.eu/eli/reg/2016/679/oj

===1. Functionality===

Figure 1 gives an overview of Amun’s components. Below, we summarize the functionality of each component of Amun. Amun’s detailed explanation and evaluation are presented in [7] and [8].

Figure 1: Overview of Amun. Components shown: uploading the event log, preprocessing, risk quantification, anonymization approaches (sampling, filtering with sampling, oversampling), noise quantification, and the output anonymized event log.
Input. Figure 2 presents the upload page of the web application. The event log publisher uploads their event log to Amun as either an XES (eXtensible Event Stream) or CSV (Comma Separated Values) file. Amun requires the event log to have at least a column representing the case ID, a column representing the activity instance, and a column recording the timestamp at which each activity was executed. Then, the user sets the maximum acceptable risk probability (𝛿) using the slider, selects the anonymization method (sampling, oversampling, or filtering), and clicks Anonymize. The maximum acceptable risk probability (𝛿) bounds the increase in the probability of singling out an individual after releasing the log. For example, suppose the attacker has prior information that makes the presence probability of an individual 20%. In that case, 𝛿 is the maximum increase of that presence probability after releasing the log; with 𝛿 = 0.1, the probability would rise to at most 30%.

Preprocessing and risk quantification. Once the user clicks Anonymize, Amun starts processing the file. The first step is to establish a representation that helps to quantify the re-identification risk attached to releasing each event in the log. To this end, Amun converts the input event log into a lossless representation, namely a Deterministic Acyclic Finite State Automaton (DAFSA) [9]. Next, Amun annotates each event in the log with its DAFSA transition, as explained in [8]. Then, for each event, Amun estimates the prior knowledge 𝑃𝑘, which represents the re-identification risk before publishing the log, and the posterior knowledge 𝑃𝑘′, which represents the re-identification risk after publishing the log. A detailed explanation of this risk quantification is presented in [8].

Anonymization Methods. Amun offers the user three different anonymization approaches. All the approaches guarantee that the customers in the anonymized log cannot be singled out using a subset of their trace variants or the timestamps of their activities.
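
To make the notion of trace variants concrete, the following sketch (Python, standard library only) groups the events of a small CSV log by case ID, orders them by timestamp, and counts how many cases share each variant. The column names, the sample data, and the function `trace_variants` are hypothetical. This is not Amun's risk quantification, which relies on the DAFSA representation and on the prior and posterior knowledge described in [8]; a variant followed by a single case is only a rough indication of a case that could be singled out.

```python
import csv
import io
from collections import Counter, defaultdict


def trace_variants(rows, case_col="case_id", activity_col="activity",
                   time_col="timestamp"):
    """Count how many cases follow each trace variant.

    A trace variant is the sequence of activities of one case,
    ordered by timestamp. Column names are illustrative assumptions.
    """
    events_by_case = defaultdict(list)
    for row in rows:
        events_by_case[row[case_col]].append((row[time_col], row[activity_col]))

    variants = Counter()
    for events in events_by_case.values():
        events.sort()  # ISO 8601 timestamps sort correctly as strings
        variants[tuple(activity for _, activity in events)] += 1
    return variants


# Hypothetical three-column log: case ID, activity, timestamp.
SAMPLE = """case_id,activity,timestamp
c1,Register,2022-01-01T09:00:00
c1,Approve,2022-01-01T10:00:00
c2,Register,2022-01-02T09:00:00
c2,Approve,2022-01-02T11:00:00
c3,Register,2022-01-03T09:00:00
c3,Reject,2022-01-03T09:30:00
"""

variants = trace_variants(csv.DictReader(io.StringIO(SAMPLE)))
unique = [variant for variant, count in variants.items() if count == 1]
print(unique)  # [('Register', 'Reject')]
```

In this toy log, two cases share the variant Register, Approve, while case c3 is the only one following Register, Reject. Logs in which most cases share variants correspond to what the text calls structured logs; logs dominated by unique variants correspond to unstructured ones.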
All the approaches provide differential privacy guarantees [3] by injecting noise, quantified by the differential privacy parameter 𝜖, into two perspectives: the control-flow perspective, which represents the user traces in the log, and the timestamp perspective. Amun offers the following anonymization approaches:
* Oversampling [7]. In some settings, the user needs the anonymized event log to contain the same set of trace variants as the original log. Therefore, the oversampling approach preserves the same set of trace variants while preventing singling out traces in the log. To this aim, Amun applies the approach presented by Elkoumy et al. [7]. This approach fits structured event logs where the cases of the log share trace variants.
* Sampling. In some settings, the user may accept the deletion of some trace variants in order to release an anonymized event log that is close to the original log. To this end, the sampling approach anonymizes the event log so that the anonymization does not add new trace variants to the log, and the difference between the real and the anonymized timestamps is minimal. Amun applies the sampling approach presented in [8]. This approach works with semi-structured event logs.
* Filtering with Sampling. Some event logs may contain highly unique user traces, resulting in large noise injection to achieve differential privacy guarantees. Therefore, Amun applies the filtering with sampling approach presented by Elkoumy et al. [8] to enable the anonymization of unstructured event logs, i.e., event logs with unique traces. The filtering approach filters out very risky traces that require large noise injection. Thus, the anonymized logs preserve more utility.

Noise Quantification and Injection. At this step, given the estimated re-identification risk per event, Amun estimates a suitable 𝜖 value. Amun then draws noise from a Laplace distribution and injects it into both the control-flow and time perspectives. This step is performed for each event independently.
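
The noise injection step can be illustrated with a minimal sketch, assuming that a per-event 𝜖 value and a sensitivity are already known. The function names `laplace_noise` and `perturb`, the sensitivity of 1, and the sample values are illustrative assumptions; how Amun derives 𝜖 from the per-event risk is described in [7] and [8] and is not reproduced here.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution with location 0.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(values, epsilons, sensitivity=1.0, seed=None):
    """Add Laplace noise to each value independently.

    Each value gets its own epsilon; the noise scale is
    sensitivity / epsilon, so a smaller epsilon means more noise.
    The sensitivity default of 1.0 is an assumption for illustration.
    """
    rng = random.Random(seed)
    noisy = []
    for value, epsilon in zip(values, epsilons):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        noisy.append(value + laplace_noise(sensitivity / epsilon, rng))
    return noisy


# Hypothetical relative times (minutes since case start) with per-event epsilon.
relative_times = [0.0, 45.0, 120.0]
per_event_epsilon = [0.5, 1.0, 2.0]
print(perturb(relative_times, per_event_epsilon, seed=42))
```

The sketch perturbs only numeric values, such as relative timestamps. Noise on the control-flow perspective changes how many cases follow each trace variant and is handled differently by each of the three approaches above.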
Output. Once the event log anonymization is finished, the anonymized event log becomes available for download. Amun provides the anonymized log in the same format as the original log. Amun also offers to download the risk quantification of each activity instance in the log as a CSV file. The risk of each activity instance is reported in a column called original risk, which represents the re-identification risk of releasing the event log before the anonymization. Amun anonymizes only three columns: case ID, activity label, and timestamp. Amun drops the other attributes from the anonymized log.

Figure 2: Upload an event log and anonymize it using a selected approach

===2. Maturity and Availability===

Amun has been empirically evaluated with real-life event logs as reported in [7, 8]. The empirical evaluation shows that Amun outperforms the state of the art in terms of Jaccard distance and earth mover’s distance. The empirical evaluation also validates the non-functional requirement stated in the introduction, namely that Amun can process large event logs. Amun is developed as a React web application and an API for ease of use. To enable quick trials by the users, Amun is available as a cloud service that can be found at http://amun.cloud.ut.ee. The current server deployment accepts event logs with sizes up to 5 MB. Amun is also available as a Docker image. The image and its installation steps can be found at https://github.com/Elkoumy/amun/tree/amun-flask-app. In addition, Amun is available as a Python package and can be integrated into other process mining tools. The source code and the installation steps can be found at https://github.com/Elkoumy/amun. A screencast that describes the tool is available on YouTube at https://youtu.be/1dxaCNE9WHk.

===3. Conclusion===

In this paper, we introduced Amun, a tool that provides differential privacy guarantees for releasing event logs for process mining. Amun offers several approaches for event log anonymization, which suit the different requirements of event log publishers.
The tool also quantifies the re-identification risk of releasing every activity instance in the log.

===Acknowledgments===

Work funded by European Research Council (PIX project) and by EU H2020-SU-ICT-03-2018 Project No. 830929 CyberSec4Europe.

===References===

[1] M. Dumas, M. La Rosa, J. Mendling, H. A. Reijers, et al., Fundamentals of business process management, volume 1, Springer, 2013.

[2] G. Elkoumy, S. A. Fahrenkrog-Petersen, M. F. Sani, A. Koschmider, F. Mannhardt, S. N. von Voigt, M. Rafiei, L. von Waldthausen, Privacy and confidentiality in process mining: Threats and research challenges, ACM Trans. Manag. Inf. Syst. 13 (2022) 11:1–11:17.

[3] C. Dwork, A. Roth, et al., The algorithmic foundations of differential privacy, Found. Trends Theor. Comput. Sci. 9 (2014) 211–407.

[4] M. Bauer, S. A. Fahrenkrog-Petersen, A. Koschmider, F. Mannhardt, H. van der Aa, M. Weidlich, ELPaaS: Event log privacy as a service, in: BPM (PhD/Demos), volume 2420 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 159–163.

[5] M. Rafiei, A. Schnitzler, W. M. P. van der Aalst, PC4PM: A tool for privacy/confidentiality preservation in process mining, in: BPM (PhD/Demos), volume 2973 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 106–110.

[6] G. Elkoumy, S. A. Fahrenkrog-Petersen, M. Dumas, P. Laud, A. Pankova, M. Weidlich, Shareprom: A tool for privacy-preserving inter-organizational process mining, in: BPM (PhD/Demos), volume 2673 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 72–76.

[7] G. Elkoumy, A. Pankova, M. Dumas, Mine me but don’t single me out: Differentially private event logs for process mining, in: ICPM, IEEE, 2021, pp. 80–87.

[8] G. Elkoumy, A. Pankova, M. Dumas, Differentially private release of event logs for process mining, CoRR abs/2201.03010 (2022).

[9] J. Daciuk, S. Mihov, B. W. Watson, R. E. Watson, Incremental construction of minimal acyclic finite-state automata, Comput. Linguistics 26 (2000) 3–16.