<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Practical Aspect of Privacy-Preserving Data Publishing in Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Majid Rafiei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil M.P. van der Aalst</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Process and Data Science, RWTH Aachen University</institution>
          ,
          <addr-line>Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Process mining techniques such as process discovery and conformance checking provide insights into actual processes by analyzing event data that are widely available in information systems. These data are very valuable, but often contain sensitive information, and process analysts need to balance confidentiality and utility. Privacy issues in process mining have recently received more attention from researchers, and this research should be complemented by a tool that integrates the solutions and makes them available in the real world. In this paper, we introduce a Python-based infrastructure implementing state-of-the-art privacy preservation techniques in process mining. The infrastructure provides a hierarchy of usages from single techniques to the collection of techniques, integrated as web-based tools. Our infrastructure manages both standard and non-standard event data resulting from privacy preservation techniques. It also stores explicit privacy metadata to track the modifications applied to protect sensitive data.</p>
      </abstract>
      <kwd-group>
        <kwd>Responsible process mining</kwd>
        <kwd>Privacy preservation</kwd>
        <kwd>Process mining</kwd>
        <kwd>Event data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Process mining provides fact-based insights into actual business processes using
event data, which are often stored in the form of event logs. The three basic
types of process mining are process discovery, conformance checking, and
process enhancement [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. An event log is a collection of events, and each event is
described by its attributes. The main attributes required for process mining are
case id, activity, timestamp, and resource. Some of the event attributes may refer
to individuals, e.g., in the health-care context, the case id attribute may refer
to the patients whose data are recorded, and the resource attribute may refer to
the employees performing activities for the patients, e.g., nurses or surgeons.
      </p>
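      <p>The event structure described above can be sketched in Python as follows. This is a minimal illustration, not the tool's data model: the class name Event, the helper group_into_traces, and the sample health-care records are all hypothetical and chosen only to mirror the four main attributes named in the text.</p>
      <preformat>
```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One event with the four main attributes (hypothetical class)."""
    case_id: str
    activity: str
    timestamp: datetime
    resource: str


def group_into_traces(events):
    """Group events by case id and order each case by timestamp."""
    cases = defaultdict(list)
    for event in events:
        cases[event.case_id].append(event)
    return {
        case_id: [e.activity for e in sorted(case, key=lambda e: e.timestamp)]
        for case_id, case in cases.items()
    }


# Made-up example: case ids stand for patients, resources for staff.
example_log = [
    Event("patient-1", "surgery", datetime(2020, 1, 2, 8, 0), "surgeon-B"),
    Event("patient-1", "register", datetime(2020, 1, 1, 9, 0), "nurse-A"),
    Event("patient-2", "register", datetime(2020, 1, 1, 10, 0), "nurse-A"),
]
traces = group_into_traces(example_log)
```
</preformat>
      <p>In this sketch, both the case id and the resource attribute point to individuals, which is why such attributes are the typical targets of privacy preservation techniques.</p>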
      <p>
        Privacy issues in process mining arise when individuals' data
are included in event logs. Under regulations such as the European
General Data Protection Regulation (GDPR) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], organizations are compelled
to take the privacy of individuals into account while analyzing their data. The
necessity of responsibly analyzing private data has recently resulted in more
attention for privacy issues in process mining [
        <xref ref-type="bibr" rid="ref10 ref4 ref5 ref6">10,6,4,5</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors
introduce a web-based tool, ELPaaS, implementing the privacy preservation
techniques introduced in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. ELPaaS takes the required parameters from users
and sends the results, as CSV files, to the users' email addresses.
      </p>
      <p>Figure 1 shows the general approach of privacy in process mining, including
two main activities: Privacy-Preserving Data Publishing (PPDP) and Privacy-Preserving
Process Mining (PPPM). PPDP aims to hide the identity and the sensitive data of
record owners in event data to protect their privacy. PPPM extends traditional
process mining algorithms to work with the non-standard event data that may result
from applying PPDP techniques. Note that PPPM algorithms are tightly coupled with the
corresponding PPDP techniques.</p>
      <p>Fig. 1: The general approach of privacy in process mining.</p>
      <p>
        In this paper, we introduce a tool which mainly focuses on PPDP and
offers state-of-the-art privacy
preservation techniques including the connector
method for securely discovering processes [
        <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
        ], the decomposition method for
privacy-aware role mining [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and the TLKC-privacy model for process mining [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
The privacy metadata proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] are also embedded in the offered privacy
preservation techniques. Moreover, privacy in the context of process mining is
presented through PM4Py-WS (PMTK) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with a web-based interface, which serves as
a concrete example showing that the provided privacy preservation techniques
can be added to existing process mining tools to support PPPM.
      </p>
      <p>The remainder of the paper is organized as follows. In Section 2, we
demonstrate the functionality and characteristics of the tool. Section 3 outlines the
maturity and availability of the tool, and Section 4 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Functionality and Characteristics</title>
      <p>
        In this section, we demonstrate the main functionalities and characteristics of
our stand-alone web-based tool, PPDP-PM, which is written in Python using the
Django framework<sup>1</sup>. Our tool has four main modules: event data management,
privacy-aware role mining, connector method, and TLKC-privacy. The event data
management module has two tabs to upload and manage the event data that
could be standard XES event logs<sup>2</sup> or non-standard event data, called Event
Log Abstraction (ELA) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In this module, an event log can be set as the input
for the privacy preservation techniques.
      </p>
      <sec id="sec-2-1">
        <title><sup>1</sup> https://www.djangoproject.com/ <sup>2</sup> http://www.xes-standard.org/</title>
        <p>Figure 2 annotations: "Add an output to the event data management module"; "Link to the GitHub project"; "Outputs are temporarily stored here".</p>
        <p>
          The privacy-aware role mining module
(Figure 2) implements the decomposition method supporting three different
techniques: fixed-value, selective, and frequency-based [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. After applying a technique,
the privacy-aware event log in the XES format is provided in the corresponding
"Outputs" section. The generated event log preserves the data utility for mining
roles from resources without exposing who performs what.
        </p>
        <p>
          The connector method implements an encryption-based method for
discovering directly-follows graphs [
          <xref ref-type="bibr" rid="ref10 ref9">9,10</xref>
          ]. It breaks the traces down into a collection
of directly-follows relations, which are securely stored in a data structure. After
applying the method, the privacy-aware event data are provided in the
corresponding "Outputs" section as an XML file in the ELA format [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The
TLKC-privacy module implements the TLKC-privacy model for process
mining [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] that provides group-based privacy guarantees assuming four types of
background knowledge: set, multiset, sequence, and relative. T refers to the
accuracy of timestamps in the privacy-aware event log, L refers to the power of
background knowledge, K refers to the k in the k-anonymity definition [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and
C refers to the bound of confidence regarding the sensitive attribute values in
an equivalence class. Applying this method results in a privacy-aware event log
in the XES format that preserves data utility for process discovery and
performance analysis. We also provide the same privacy preservation techniques in the
context of an open-source process mining tool. Figure 3 shows a snippet of the
home page of the privacy integration in PMTK where process mining algorithms
can directly be applied to the privacy-aware event data.
        </p>
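        <p>The decomposition of traces into directly-follows relations, on which the connector method builds, can be sketched as follows. This is only an illustration of the counting step under simplifying assumptions: the encryption and secure storage described in the text are deliberately omitted, and the function name directly_follows_relations and the sample traces are hypothetical rather than taken from the tool.</p>
        <preformat>
```python
from collections import Counter


def directly_follows_relations(traces):
    """Count how often one activity is directly followed by another."""
    relations = Counter()
    for trace in traces:
        # zip pairs every activity with its direct successor in the same trace.
        for source, target in zip(trace, trace[1:]):
            relations[(source, target)] += 1
    return relations


# Made-up traces: each trace is the ordered list of activities of one case.
example_traces = [
    ["register", "triage", "surgery"],
    ["register", "triage", "discharge"],
]
dfg = directly_follows_relations(example_traces)
```
</preformat>
        <p>Once only the relations and their frequencies are kept, the complete sequence of activities of a single case is no longer stored in one place, which is the intuition behind working with an abstraction instead of the full traces.</p>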
        <p>Each privacy preservation technique in the tool is implemented as a Django
application that enables the simultaneous running of different techniques on an
event log. This architecture makes the whole project easy to maintain, and new
techniques can simply be integrated as independent applications. The outputs
for the privacy preservation techniques are provided independently for each
technique and can be downloaded or stored in the event data repository. PPDP-PM
is designed in a way that provides a cycle of privacy preservation techniques, i.e.,
the privacy-aware event data, added to the event data repository, can be set as
the input for the techniques again as long as they are in the form of standard
XES event logs.</p>
        <p>Figure 3 annotations: "Process mining algorithms that can be applied to the results from the privacy preservation techniques"; "Privacy preservation techniques integrated in PMTK".</p>
        <p>
          To keep the process analysts aware of the modifications applied
to the privacy-aware event logs, the privacy metadata [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] specify the order of
the applied privacy preservation techniques. Moreover, the tool follows a
naming approach to uniquely identify the privacy-aware event data based on the name
of the technique, the creation time, and the name of the event log.
        </p>
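        <p>The naming approach and the ordered privacy metadata can be illustrated with a small sketch. The exact name layout and metadata schema used by PPDP-PM are not given in the text, so the separator, the timestamp format, the key applied_techniques, and both function names below are assumptions made purely for illustration.</p>
        <preformat>
```python
from datetime import datetime


def privacy_aware_name(technique, log_name, created_at):
    """Build a name from technique, creation time, and event log name."""
    stamp = created_at.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    return f"{technique}_{stamp}_{log_name}"


def record_technique(metadata, technique):
    """Return new metadata with the technique appended in application order."""
    applied = list(metadata.get("applied_techniques", []))
    applied.append(technique)
    return {**metadata, "applied_techniques": applied}


# Hypothetical usage: two techniques applied one after the other.
name = privacy_aware_name("TLKC", "hospital.xes", datetime(2020, 9, 13, 12, 30, 0))
metadata = record_technique(record_technique({}, "role_mining"), "TLKC")
```
</preformat>
        <p>Keeping the applied techniques in a list preserves their order, so an analyst receiving the output of a second technique can still see which modification came first.</p>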
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Availability and Maturity</title>
      <p>As mentioned, PPDP-PM is a web-based application written in Python. The
source code, a screencast, and other information are available in a GitHub
repository: https://github.com/m4jidRafiei/PPDP-PM. The privacy
preservation techniques, explained in Section 2, and the integration into PMTK are
also available as separate GitHub repositories.<sup>3</sup> To facilitate the usage and
integration of the privacy preservation techniques, they are also published as
standard Python packages (https://pypi.org/): pp-role-mining, p-connector-dfg,
ptlkc-privacy, and p-privacy-metadata. Our infrastructure provides a hierarchy
of usages: users can use each technique independently, they can use
PPDP-PM, which integrates a set of privacy preservation techniques as a
stand-alone web-based application, or they can use the provided techniques in
a process mining tool where the privacy preservation techniques are integrated.
The scalability of the tool varies with the privacy preservation technique and
the size of the input event log. Based on our experiments, our tool can handle
real-world event logs, e.g., the BPI challenge datasets<sup>4</sup>. However, it can still be
improved for industry-scale usage. PPDP-PM and its integration in PMTK are
also provided as Docker containers which can simply be hosted by the users:
https://hub.docker.com/u/m4jid.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>Event data often include highly sensitive information that process
analysts must handle in compliance with regulations. In this paper, we introduced a
Python-based infrastructure for dealing with privacy issues in process mining.
A web-based application was introduced implementing privacy-preserving data
publishing techniques in process mining.</p>
      <sec id="sec-4-1">
        <title><sup>3</sup> https://github.com/m4jidRafiei/ <sup>4</sup> https://data.4tu.nl/repository/collection:event_logs_real</title>
        <p>We also showed the privacy integration
in PMTK as an open-source web-based process mining tool. The infrastructure
was designed in such a way that other privacy preservation techniques can be
integrated. We plan to cover different perspectives of privacy and
confidentiality issues in process mining, and novel techniques will be integrated
into the introduced framework. We also invite other researchers to integrate their
solutions as independent applications in the provided framework.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining - Data Science in Action,
          <source>Second Edition</source>
          . Springer (
          <year>2016</year>
          ). https://doi.org/10.1007/978-3-662-49851-4
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bauer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fahrenkrog-Petersen</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koschmider</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mannhardt</surname>
          </string-name>
          , F.,
          <string-name>
            <surname>van der Aa</surname>
          </string-name>
          , H.,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : ELPaaS:
          <article-title>Event log privacy as a service</article-title>
          .
          <source>In: Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Berti</surname>
            , A., van Zelst, S.J., van der Aalst,
            <given-names>W.M.P.:</given-names>
          </string-name>
          <article-title>PM4Py web services: Easy development, integration and deployment of process mining features in any application stack</article-title>
          .
          <source>In: Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM</source>
          <year>2019</year>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fahrenkrog-Petersen</surname>
            ,
            <given-names>S.A</given-names>
          </string-name>
          ., van der Aa, H.,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>PRETSA: event log sanitization for privacy-aware process discovery</article-title>
          .
          <source>In: International Conference on Process Mining, ICPM</source>
          <year>2019</year>
          , Aachen, Germany (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mannhardt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koschmider</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baracaldo</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michael</surname>
          </string-name>
          , J.:
          <article-title>Privacy-preserving process mining - differential privacy for event logs</article-title>
          .
          <source>Business &amp; Information Systems Engineering</source>
          <volume>61</volume>
          (
          <issue>5</issue>
          ),
          <fpage>595</fpage>
          –
          <lpage>614</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Rafiei</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Mining roles from event logs while preserving privacy</article-title>
          .
          <source>In: Business Process Management Workshops - BPM 2019 International Workshops</source>
          , Vienna, Austria. pp.
          <fpage>676</fpage>
          –
          <lpage>689</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Rafiei</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Privacy-preserving data publishing in process mining</article-title>
          .
          <source>In: Business Process Management Forum - BPM Forum</source>
          <year>2020</year>
          , Sevilla, Spain,
          <source>September 13-18</source>
          ,
          <year>2020</year>
          ,
          <string-name>
            <surname>Proceedings</surname>
          </string-name>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Rafiei</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>Wagner</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>TLKC-privacy model for process mining</article-title>
          .
          <source>In: 14th International Conference on Research Challenges in Information Science</source>
          ,
          <string-name>
            <surname>RCIS</surname>
          </string-name>
          <year>2020</year>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Rafiei</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>von Waldthausen</surname>
          </string-name>
          , L.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Ensuring confidentiality in process mining</article-title>
          .
          <source>In: Proceedings of the 8th International Symposium on Data-driven Process Discovery and Analysis (SIMPDA</source>
          <year>2018</year>
          ), Seville, Spain (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Rafiei</surname>
          </string-name>
          , M.,
          <string-name>
            <surname>von Waldthausen</surname>
          </string-name>
          , L.,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Supporting confidentiality in process mining using abstraction and encryption</article-title>
          .
          <source>In: Data-Driven Process Discovery and Analysis - 8th IFIP WG 2</source>
          .6 International Symposium,
          <string-name>
            <surname>SIMPDA</surname>
          </string-name>
          <year>2018</year>
          ,
          <article-title>and</article-title>
          9th International Symposium,
          <string-name>
            <surname>SIMPDA</surname>
          </string-name>
          <year>2019</year>
          ,
          <article-title>Revised Selected Papers (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Sweeney</surname>
          </string-name>
          , L.:
          <article-title>k-anonymity: A model for protecting privacy</article-title>
          .
          <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems</source>
          <volume>10</volume>
          (
          <issue>05</issue>
          ),
          <fpage>557</fpage>
          –
          <lpage>570</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Voss</surname>
          </string-name>
          , W.G.:
          <article-title>European union data privacy law reform: General data protection regulation, privacy shield, and the right to delisting</article-title>
          .
          <source>Business Lawyer</source>
          <volume>72</volume>
          (
          <issue>1</issue>
          ) (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>