=Paper= {{Paper |id=Vol-2673/paperDR06 |storemode=property |title=Practical Aspect of Privacy-Preserving Data Publishing in Process Mining |pdfUrl=https://ceur-ws.org/Vol-2673/paperDR06.pdf |volume=Vol-2673 |authors=Majid Rafiei,Wil M. P. van der Aalst |dblpUrl=https://dblp.org/rec/conf/bpm/RafieiA20a }} ==Practical Aspect of Privacy-Preserving Data Publishing in Process Mining== https://ceur-ws.org/Vol-2673/paperDR06.pdf
     Practical Aspect of Privacy-Preserving Data
            Publishing in Process Mining*

              Majid Rafiei [0000-0001-7161-6927] and
              Wil M.P. van der Aalst [0000-0002-0955-6940]

    Chair of Process and Data Science, RWTH Aachen University, Aachen, Germany



        Abstract. Process mining techniques such as process discovery and con-
        formance checking provide insights into actual processes by analyzing
        event data that are widely available in information systems. These data
        are very valuable, but often contain sensitive information, and process
        analysts need to balance confidentiality and utility. Privacy issues in
        process mining have recently received more attention from researchers;
        this attention should be complemented by a tool that integrates the
        solutions and makes them available in the real world. In this paper, we
        introduce a Python-based infrastructure implementing state-of-the-art
        privacy preservation
        techniques in process mining. The infrastructure provides a hierarchy of
        usages from single techniques to the collection of techniques, integrated
        as web-based tools. Our infrastructure manages both standard and non-
        standard event data resulting from privacy preservation techniques. It
        also stores explicit privacy metadata to track the modifications applied
        to protect sensitive data.

        Keywords: Responsible process mining · Privacy preservation · Process
        mining · Event data


1     Introduction
Process mining provides fact-based insights into actual business processes using
event data, which are often stored in the form of event logs. The three basic
types of process mining are process discovery, conformance checking, and pro-
cess enhancement [1]. An event log is a collection of events, and each event is
described by its attributes. The main attributes required for process mining are
case id, activity, timestamp, and resource. Some of the event attributes may refer
to individuals, e.g., in the health-care context, the case id attribute may refer
to the patients whose data are recorded, and the resource attribute may refer to
the employees performing activities for the patients, e.g., nurses or surgeons.
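
As a minimal illustration of the event log structure described above, the following Python sketch groups events into traces by case id. The attribute keys, the toy events, and the helper name `to_traces` are assumptions made for illustration; they are not taken from the tool.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical toy event log: one dict per event with the four main attributes.
events = [
    {"case_id": "p1", "activity": "register", "timestamp": "2020-01-01T09:00", "resource": "nurse_a"},
    {"case_id": "p2", "activity": "register", "timestamp": "2020-01-01T09:05", "resource": "nurse_b"},
    {"case_id": "p1", "activity": "surgery", "timestamp": "2020-01-02T10:00", "resource": "surgeon_c"},
    {"case_id": "p2", "activity": "discharge", "timestamp": "2020-01-02T12:00", "resource": "nurse_a"},
    {"case_id": "p1", "activity": "discharge", "timestamp": "2020-01-03T08:00", "resource": "nurse_b"},
]


def to_traces(events):
    """Group events by case id and order the events of each case by timestamp."""
    by_case = defaultdict(list)
    for event in events:
        by_case[event["case_id"]].append(event)
    traces = {}
    for case_id, case_events in by_case.items():
        case_events.sort(key=lambda e: datetime.fromisoformat(e["timestamp"]))
        traces[case_id] = [e["activity"] for e in case_events]
    return traces


traces = to_traces(events)
```

In this toy log, both the case id (a patient) and the resource (an employee) refer to individuals, which is why both attributes are relevant for privacy.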
    Privacy issues in process mining are highlighted when individuals' data
are included in the event logs. According to regulations such as the European
General Data Protection Regulation (GDPR) [12], organizations are compelled
to take the privacy of individuals into account while analyzing their data. The
necessity of responsibly analyzing private data has recently drawn more
attention to privacy issues in process mining [10,6,4,5]. In [2], the authors
introduce a web-based tool, ELPaaS, implementing the privacy preservation
techniques introduced in [4] and [5]. ELPaaS gets the required parameters from
users and delivers the results, as CSV files, to the users' email addresses.

    * Funded under the Excellence Strategy of the Federal Government and the Länder. We also
      thank the Alexander von Humboldt (AvH) Stiftung for supporting our research.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).

    Figure 1 shows the general approach of privacy in process mining including
two main activities: Privacy-Preserving Data Publishing (PPDP) and Privacy-
Preserving Process Mining (PPPM). PPDP aims to hide the identity and the
sensitive data of record owners in event data to protect their privacy. PPPM
aims to extend traditional process mining algorithms to work with the
non-standard data resulting from some PPDP techniques. Note that PPPM
algorithms are tightly coupled with the corresponding PPDP techniques.

Fig. 1: The general approach of privacy in process mining. [Diagram: PPDP
transforms an event log (EL) into a privacy-aware event log (EL') or an event
log abstraction (ELA). Process mining (PM) on EL yields result R; PM on EL' and
PPPM on ELA yield result R', with R ≈ R'.]

    In this paper, we introduce a tool which mainly focuses on PPDP and
offers state-of-the-art privacy preserva-
tion techniques including the connector
method for securely discovering processes [9,10], the decomposition method for
privacy-aware role mining [6], and the TLKC-privacy model for process mining [8].
The privacy metadata proposed in [7] are also embedded in the offered privacy
preservation techniques. Moreover, the same techniques are integrated into
PM4Py-WS (PMTK) [3], a process mining tool with a web-based interface. This
integration is a particular example showing that the provided privacy
preservation techniques can be added to existing process mining tools for
supporting PPPM.
    The remainder of the paper is organized as follows. In Section 2, we demon-
strate the functionality and characteristics of the tool. Section 3 outlines the
maturity and availability of the tool, and Section 4 concludes the paper.


2       Functionality and Characteristics
In this section, we demonstrate the main functionalities and characteristics of
our stand-alone web-based tool, PPDP-PM, which is written in Python using the
Django framework¹. Our tool has four main modules: event data management,
privacy-aware role mining, connector method, and TLKC-privacy. The event data
management module has two tabs to upload and manage the event data, which
can be standard XES event logs² or non-standard event data, called Event
Log Abstraction (ELA) [7]. In this module, an event log can be set as the input
for the privacy preservation techniques. The privacy-aware role mining module
(Figure 2) implements the decomposition method supporting three different
techniques: fixed-value, selective, and frequency-based [6]. After applying a
technique, the privacy-aware event log in the XES format is provided in the
corresponding “Outputs” section. The generated event log preserves the data
utility for mining roles from resources without exposing who performs what.

Fig. 2: The privacy-aware role mining page in PPDP-PM. [Screenshot annotations:
event data management module; the privacy-aware role mining module; the
connector method; the TLKC-privacy method; link to the GitHub project; outputs
are temporarily stored here; add an output to the event data management module.]

    ¹ https://www.djangoproject.com/
    ² http://www.xes-standard.org/

    The connector method implements an encryption-based method for discovering
directly-follows graphs [9,10]. It breaks the traces down into a collection of
directly-follows relations, which are securely stored in a data structure. After
applying the method, the privacy-aware event data are provided in the cor-
responding “Outputs” section as an XML file with the ELA format [7]. The
TLKC-privacy module implements the TLKC-privacy model for process mining [8],
which provides group-based privacy guarantees assuming four types of
background knowledge: set, multiset, sequence, and relative. T refers to the ac-
curacy of timestamps in the privacy-aware event log, L refers to the power of
background knowledge, K refers to the k in the k-anonymity definition [11], and
C refers to the bound of confidence regarding the sensitive attribute values in
an equivalence class. Applying this method results in a privacy-aware event log
in the XES format that preserves data utility for process discovery and perfor-
mance analysis. We also provide the same privacy preservation techniques in the
context of an open-source process mining tool. Figure 3 shows a snippet of the
home page of the privacy integration in PMTK where process mining algorithms
can directly be applied to the privacy-aware event data.
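
The two ideas in this paragraph can be sketched in Python under strong simplifications. The first function counts directly-follows relations, the structure the connector method works with; the encryption step is omitted. The second function is a simplified check in the spirit of the K and C parameters for background knowledge of type set; it ignores T, applies the confidence bound to every sensitive value, and only detects violations rather than removing them. All names and the toy data are hypothetical, and neither function reproduces the published packages.

```python
from collections import Counter
from itertools import combinations


def directly_follows(traces):
    """Count the directly-follows pairs (a, b) over all traces."""
    dfg = Counter()
    for trace in traces:
        dfg.update(zip(trace, trace[1:]))
    return dfg


def violations(cases, L, K, C):
    """Return the activity sets of size <= L that single out too few cases.

    cases: list of (trace, sensitive_value) pairs.
    A set is a violation if it matches fewer than K cases, or if one
    sensitive value occurs among the matching cases with confidence > C.
    """
    activities = sorted({a for trace, _ in cases for a in trace})
    bad = []
    for size in range(1, L + 1):
        for known in combinations(activities, size):
            matching = [s for trace, s in cases if set(known) <= set(trace)]
            if not matching:
                continue  # knowledge that matches no case reveals nothing
            top = Counter(matching).most_common(1)[0][1]
            if len(matching) < K or top / len(matching) > C:
                bad.append(known)
    return bad


# Hypothetical toy data: each case has a trace and a sensitive attribute value.
cases = [
    (("register", "test", "discharge"), "flu"),
    (("register", "test", "discharge"), "cold"),
    (("register", "surgery", "discharge"), "cancer"),
]
dfg = directly_follows(trace for trace, _ in cases)
bad = violations(cases, L=1, K=2, C=0.6)
```

Here, knowing that a case contains "surgery" matches only one case, so that piece of background knowledge violates K = 2.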
    Each privacy preservation technique in the tool is implemented as a Django
application that enables the simultaneous running of different techniques on an
event log. This architecture makes the whole project easy to maintain, and new
techniques can simply be integrated as independent applications. The outputs
for the privacy preservation techniques are provided independently for each tech-
nique and can be downloaded or stored in the event data repository. PPDP-PM
is designed in a way that provides a cycle of privacy preservation techniques, i.e.,
the privacy-aware event data, added to the event data repository, can be set as
the input for the techniques again as long as they are in the form of standard
XES event logs. To keep the process analysts aware of the modifications applied
to the privacy-aware event logs, the privacy metadata [7] specify the order of
the applied privacy preservation techniques. Moreover, the tool follows a
naming approach to uniquely identify the privacy-aware event data based on the
name of the technique, the creation time, and the name of the event log.

Fig. 3: The home page of the privacy integration in PM4Py-WS (PMTK).
[Screenshot annotations: process mining algorithms that can be applied to the
results from the privacy preservation techniques; privacy preservation
techniques integrated in PMTK.]
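
A naming approach and an ordered metadata record of this kind could look as follows in Python. This is only a sketch: the name layout, the timestamp format, the metadata keys, and both function names are assumptions, since the text states only which three components the name is based on and that the metadata track the order of the applied techniques.

```python
from datetime import datetime


def output_name(technique, log_name, created=None):
    """Build a name from the technique, the creation time, and the log name."""
    created = created or datetime.now()
    stamp = created.strftime("%Y-%m-%d_%H-%M-%S")
    return f"{technique}_{stamp}_{log_name}"


def record_technique(metadata, technique):
    """Return new metadata with the technique appended in application order."""
    applied = list(metadata.get("applied_techniques", []))
    applied.append({"order": len(applied) + 1, "technique": technique})
    return {**metadata, "applied_techniques": applied}


# Hypothetical usage: role mining first, then TLKC-privacy on its XES output.
name = output_name("tlkc", "sepsis.xes", datetime(2020, 9, 13, 10, 30, 0))
meta = record_technique(record_technique({}, "role_mining"), "tlkc")
```

Including the creation time keeps names unique when the same technique is applied to the same event log more than once.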


3       Availability and Maturity
As mentioned, PPDP-PM is a web-based application written in Python. The
source code, a screencast, and other information are available in a GitHub
repository: https://github.com/m4jidRafiei/PPDP-PM. The privacy preser-
vation techniques, explained in Section 2, and the integration into PMTK are
also available as separate GitHub repositories³. To facilitate the usage and
integration of the privacy preservation techniques, they are also published as
standard Python packages (https://pypi.org/): pp-role-mining, p-connector-dfg,
p-tlkc-privacy, and p-privacy-metadata. Our infrastructure provides a hierarchy
of usages: users can use each technique independently, they can use PPDP-PM,
which integrates a set of privacy preservation techniques as a stand-alone
web-based application, or they can use the provided techniques in a process
mining tool where the privacy preservation techniques are integrated.
The scalability of the tool varies w.r.t. the privacy preservation technique and
the size of the input event log. Based on our experiments, our tool can handle
real-world event logs, e.g., the BPI challenge datasets⁴. However, it can still be
improved for industry-scale usage. PPDP-PM and its integration in PMTK are
also provided as Docker containers which can simply be hosted by the users:
https://hub.docker.com/u/m4jid.


4       Conclusion
Event data often include highly sensitive information that needs to be consid-
ered by process analysts w.r.t. the regulations. In this paper, we introduced a
Python-based infrastructure for dealing with privacy issues in process mining.
A web-based application was introduced implementing privacy-preserving data
publishing techniques in process mining. We also showed the privacy integration
in PMTK as an open-source web-based process mining tool. The infrastructure
was designed in such a way that other privacy preservation techniques can be
integrated. We plan to cover different perspectives of privacy and
confidentiality issues in process mining, and novel techniques will be
integrated into the introduced framework. We also invite other researchers to
integrate their solutions as independent applications in the provided framework.

    ³ https://github.com/m4jidRafiei/
    ⁴ https://data.4tu.nl/repository/collection:event_logs_real


References
 1. van der Aalst, W.M.P.: Process Mining - Data Science in Action, Second Edition.
    Springer (2016). https://doi.org/10.1007/978-3-662-49851-4
 2. Bauer, M., Fahrenkrog-Petersen, S.A., Koschmider, A., Mannhardt, F., van der
    Aa, H., Weidlich, M.: ELPaaS: Event log privacy as a service. In: Proceedings of
    the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM
    2019 (2019)
 3. Berti, A., van Zelst, S.J., van der Aalst, W.M.P.: PM4Py web services: Easy
    development, integration and deployment of process mining features in any
    application stack. In: Proceedings of the Dissertation Award, Doctoral
    Consortium, and Demonstration Track at BPM 2019 (2019)
 4. Fahrenkrog-Petersen, S.A., van der Aa, H., Weidlich, M.: PRETSA: event log san-
    itization for privacy-aware process discovery. In: International Conference on Pro-
    cess Mining, ICPM 2019, Aachen, Germany (2019)
 5. Mannhardt, F., Koschmider, A., Baracaldo, N., Weidlich, M., Michael, J.: Privacy-
    preserving process mining - differential privacy for event logs. Business & Informa-
    tion Systems Engineering 61(5), 595–614 (2019)
 6. Rafiei, M., van der Aalst, W.M.P.: Mining roles from event logs while preserving
    privacy. In: Business Process Management Workshops - BPM 2019 International
    Workshops, Vienna, Austria. pp. 676–689 (2019)
 7. Rafiei, M., van der Aalst, W.M.P.: Privacy-preserving data publishing in process
    mining. In: Business Process Management Forum - BPM Forum 2020, Sevilla,
    Spain, September 13-18, 2020, Proceedings (2020)
 8. Rafiei, M., Wagner, M., van der Aalst, W.M.P.: TLKC-privacy model for process
    mining. In: 14th International Conference on Research Challenges in Information
    Science, RCIS 2020 (2020)
 9. Rafiei, M., von Waldthausen, L., van der Aalst, W.M.P.: Ensuring confidentiality
    in process mining. In: Proceedings of the 8th International Symposium on Data-
    driven Process Discovery and Analysis (SIMPDA 2018), Seville, Spain (2018)
10. Rafiei, M., von Waldthausen, L., van der Aalst, W.M.P.: Supporting confidentiality
    in process mining using abstraction and encryption. In: Data-Driven Process Dis-
    covery and Analysis - 8th IFIP WG 2.6 International Symposium, SIMPDA 2018,
    and 9th International Symposium, SIMPDA 2019, Revised Selected Papers (2019)
11. Sweeney, L.: k-anonymity: A model for protecting privacy. International Journal
    of Uncertainty, Fuzziness and Knowledge-Based Systems 10(05), 557–570 (2002)
12. Voss, W.G.: European union data privacy law reform: General data protection
    regulation, privacy shield, and the right to delisting. Business Lawyer 72(1) (2016)