<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>M. Rafiei); Alexander.Schnitzler@outlook.com (A. Schnitzler);
wvdaalst@pads.rwth-aachen.de (W. M.P. v. d. Aalst)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>PC4PM: A Tool for Privacy/Confidentiality Preservation in Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Majid Rafiei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexander Schnitzler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wil M.P. van der Aalst</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair of Process and Data Science, RWTH Aachen University</institution>
          ,
          <addr-line>Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Process mining enables business owners to discover and analyze their actual processes using event data that are widely available in information systems. Event data contain detailed information which is incredibly valuable for providing insights. However, such detailed data often include highly confidential and private information. Thus, concerns of privacy and confidentiality in process mining are becoming increasingly relevant and new techniques are being introduced. To make the techniques easily accessible, new tools need to be developed to integrate the introduced techniques and direct users to appropriate solutions based on their needs. In this paper, we present a Python-based infrastructure implementing and integrating state-of-the-art privacy/confidentiality preservation techniques in process mining. Our tool provides an easy-to-use web-based user interface for privacy-preserving data publishing, risk analysis, and data utility analysis. The tool also provides a set of anonymization operations that can be utilized to support privacy/confidentiality preservation. The tool manages both standard XES event logs and non-standard event data. We also store and manage privacy metadata to track the changes made by privacy/confidentiality preservation techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>process mining</kwd>
        <kwd>privacy preservation</kwd>
        <kwd>confidentiality</kwd>
        <kwd>event data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        the activity attribute may contain a confidential activity name that must not be exposed. Respect
for privacy when analyzing personal data is also dictated by regulations, e.g., the European
General Data Protection Regulation (GDPR)<sup>1</sup>. Such legitimate and ethical requirements have
recently resulted in more attention to privacy and confidentiality issues in process mining
[
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2, 3, 4, 5</xref>
        ]. Some tools have also been introduced to provide specific privacy/confidentiality
requirements [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ].
      </p>
      <p>Figure 1: The general overview of privacy-related activities in process mining.</p>
      <p>
        Figure 1 shows the general overview of privacy-related activities in process mining,
including Privacy-Preserving Data Publishing (PPDP), Privacy-Preserving Process Mining
(PPPM), and Privacy Analysis (PrAn). PPDP tries to obscure the identity and/or
sensitive data of individuals to preserve their privacy. PPDP techniques often apply one or
more anonymization operations, e.g., suppression and generalization, to provide the
desired privacy requirements. PPPM intends to expand existing process mining algorithms
to cope with intermediate results, so-called abstractions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], generated by some PPDP techniques. Note that PPPM algorithms are closely linked
with the corresponding PPDP approaches, and PPPM may refer to the entire
privatization process, starting with an event log and finishing with process mining findings.
PrAn, indicated with dashed lines in Figure 1,
includes two types of activities: risk analysis and utility analysis. Both PrAn activities could be
done for data and results. In this paper, we introduce a tool, named PC4PM, mainly focusing on
the activities indicated by the check-boxes in Figure 1. PC4PM is the successor of the privacy
tool introduced in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and it offers new privacy preservation techniques, privacy analysis, a
set of anonymization operations, and user guidance that directs users to the right techniques
based on their requirements. In the rest of the paper, we demonstrate the functionality and
characteristics of PC4PM. We also describe the maturity and availability of the tool.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Functionality and Characteristics</title>
      <p>PC4PM is implemented in Python using the Django framework<sup>2</sup>. Figure 2 shows a high-level view
of the architecture. PC4PM includes eight main Django applications, and each application
provides at least one main privacy-related activity implemented as a Python package. The Django
templates, accessible from any web browser, provide a web interface for the applications.
Implementing each technique as an independent Django application enables users to simultaneously
run different techniques on event logs. Such an architecture simplifies maintenance and
integration. To integrate a new technique, one can create a Python package and integrate
it as an independent application. Moreover, the Python packages can be imported independently
into other Python-based tools.</p>
      <p><sup>1</sup> http://data.europa.eu/eli/reg/2016/679/oj</p>
      <p><sup>2</sup> https://www.djangoproject.com/</p>
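      <p>The plug-in idea behind this architecture can be sketched in plain Python. The sketch below is only an illustration under stated assumptions: the registry, the decorator, and the two toy techniques are hypothetical names that do not come from PC4PM, and a dictionary stands in for the set of installed Django applications.</p>
      <preformat><![CDATA[
```python
from typing import Callable, Dict, List

# An event log is modelled as a list of traces; a trace is a list of activity names.
EventLog = List[List[str]]
Technique = Callable[[EventLog], EventLog]

# Hypothetical registry standing in for the installed, independent applications.
REGISTRY: Dict[str, Technique] = {}


def register(name: str) -> Callable[[Technique], Technique]:
    """Register a technique under a unique application name."""
    def decorator(func: Technique) -> Technique:
        if name in REGISTRY:
            raise ValueError(f"application {name!r} is already registered")
        REGISTRY[name] = func
        return func
    return decorator


@register("drop_short_traces")
def drop_short_traces(log: EventLog) -> EventLog:
    """Toy technique: keep only traces with at least two events."""
    return [trace for trace in log if len(trace) >= 2]


@register("mask_confidential")
def mask_confidential(log: EventLog) -> EventLog:
    """Toy technique: hide activities whose name starts with 'conf_'."""
    return [
        ["HIDDEN" if act.startswith("conf_") else act for act in trace]
        for trace in log
    ]


def run(name: str, log: EventLog) -> EventLog:
    """Run one registered technique; raises KeyError for unknown names."""
    return REGISTRY[name](log)
```
]]></preformat>
      <p>Under these assumptions, adding a technique only requires one new registered function, and each function can still be imported and called on its own, which mirrors the independence of the packages described above.</p>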
      <p>Figure 2: The architecture of PC4PM.</p>
      <p>
        Figure 3 shows the home page of PC4PM. The left
menu shows the main Django applications, namely event data management, privacy-aware role
mining, connector method, TLKC-privacy, TLKC-privacy
extended, anonymization operations, PRIPEL, and privacy
analysis. The event data management application
manages both standard XES event logs and non-standard
event data, called event log abstraction [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The privacy-aware role mining application implements the
decomposition method, proposed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], to discover roles
from event logs while preserving privacy. This method
perturbs the frequency of activities in an event log to
eliminate frequency-based attacks. The connector method is an encryption-based method for
securely discovering directly follows graphs from event logs. This method breaks down traces
into a collection of directly follows relations to prevent linkage attacks.
      </p>
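      <p>To illustrate what breaking traces into directly follows relations means, the following minimal sketch counts the pairs of consecutive activities in a toy log. It deliberately leaves out the encryption step of the connector method, and the function name and the sample log are assumptions made for this example only.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import Iterable, List


def directly_follows(traces: Iterable[List[str]]) -> Counter:
    """Count each directly follows pair (a, b) over all traces."""
    relations: Counter = Counter()
    for trace in traces:
        # zip pairs every activity with its immediate successor in the trace
        relations.update(zip(trace, trace[1:]))
    return relations


# Toy event log: three cases, two distinct trace variants
log = [
    ["register", "check", "pay"],
    ["register", "pay"],
    ["register", "check", "pay"],
]
dfg = directly_follows(log)
```
]]></preformat>
      <p>The resulting counter no longer records which pairs belonged to the same case, which is the property that makes linking a complete trace back to an individual harder.</p>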
      <p>
        The TLKC-privacy application
implements the TLKC-privacy model
providing group-based privacy guarantees for
process discovery and performance
analysis. The TLKC-privacy extended
application extends the TLKC-privacy model
and considers all the main perspectives
of process mining [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
anonymization operations application implements
all the main anonymization operations
proposed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] including suppression,
addition, substitution, condensation,
swapping, generalization, and cryptography.
      </p>
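      <p>Two of the listed operations, suppression and generalization, can be sketched on a toy list of events as follows. This is a simplified illustration and not the implementation used in the tool; the event structure, attribute names, and function names are assumptions.</p>
      <preformat><![CDATA[
```python
from datetime import datetime
from typing import Dict, List, Set

Event = Dict[str, object]


def suppress(events: List[Event], attribute: str, values: Set[object]) -> List[Event]:
    """Suppression: replace the listed values of one attribute with None."""
    return [
        {**event, attribute: None} if event.get(attribute) in values else dict(event)
        for event in events
    ]


def generalize_to_day(events: List[Event], attribute: str = "timestamp") -> List[Event]:
    """Generalization: coarsen a datetime attribute to day granularity."""
    return [
        {**event, attribute: event[attribute].replace(hour=0, minute=0, second=0, microsecond=0)}
        for event in events
    ]


events = [
    {"case": "c1", "activity": "register", "resource": "alice",
     "timestamp": datetime(2021, 5, 3, 9, 15, 42)},
    {"case": "c1", "activity": "pay", "resource": "bob",
     "timestamp": datetime(2021, 5, 3, 17, 2, 5)},
]

# Operations return new lists, so they can be chained without touching the input.
anonymized = generalize_to_day(suppress(events, "resource", {"alice"}))
```
]]></preformat>
      <p>Both functions leave the input untouched and return copies, so the order in which operations were applied can be recorded separately.</p>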
      <p>Figure 3: The home page of PC4PM.</p>
      <p>
        The PRIPEL application presents the
PRIPEL method [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which applies the notion of differential privacy to provide
privacy guarantees for event logs. The privacy analysis application includes three components for
analyzing disclosure risks, data utility [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and FCB-anonymity [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
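      <p>The notion of differential privacy can be illustrated with the Laplace mechanism applied to trace-variant counts. The sketch below is not PRIPEL itself; it only shows how calibrated noise is added to counts, and it perturbs only the variants that occur in the log, whereas a complete mechanism would also have to account for variants that do not occur. All names and the privacy parameter value are assumptions.</p>
      <preformat><![CDATA[
```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_variant_counts(
    traces: List[List[str]], epsilon: float, rng: random.Random
) -> Dict[Tuple[str, ...], float]:
    """Add Laplace noise with scale 1/epsilon to the count of each observed variant."""
    if not epsilon > 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(tuple(trace) for trace in traces)
    # Adding or removing one case changes exactly one count by 1 (sensitivity 1).
    scale = 1.0 / epsilon
    # Caveat: only observed variants are perturbed in this simplified sketch.
    return {variant: count + laplace_noise(scale, rng) for variant, count in counts.items()}


log = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"]]
noisy = noisy_variant_counts(log, epsilon=1.0, rng=random.Random(7))
```
]]></preformat>
      <p>A smaller epsilon yields a larger noise scale and therefore stronger privacy at the cost of less accurate counts.</p>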
      <p>
        PC4PM supports users with a four-step user guide to help them choose the right technique(s)
based on their needs. The user guidance works based on a four-dimension signature assigned to
each technique. The signature reflects the following aspects: process mining perspective (PMPS),
process mining activity (PMAC), privacy perspective (PRPS), and privacy activity (PRAC). PMPS
indicates the process mining perspective that a privacy technique focuses on, e.g., control-flow.
PMAC shows the process mining activity, e.g., process discovery, for which the utility of event
data is preserved. PRPS shows the privacy perspective of a privacy technique, i.e., resource
or case. PRAC indicates the main privacy-related activity of a privacy preservation technique,
i.e., PPDP, PPPM, or PrAn. Moreover, PC4PM provides help tooltips for the
parameters used by the techniques. PC4PM also inherits all the characteristics of its predecessor [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Some of those are as follows: (1) Each Django application provides the results in an independent
output section, (2) It enables a cycle of privacy/confidentiality preservation techniques such
that the results from one technique can be added to the event data repository and used as an
input for other techniques, (3) The privacy metadata [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] which specify the order and type of
the main anonymization operations are added to anonymized event logs.
      </p>
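      <p>The signature-based guidance can be pictured as a filter over a catalog of techniques. In the following sketch the four fields follow the dimensions named above, while the catalog entries, their values, and the function name are hypothetical and do not reflect the signatures actually assigned in PC4PM.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Signature:
    name: str
    pmps: str  # process mining perspective, e.g., control-flow
    pmac: str  # process mining activity, e.g., process discovery
    prps: str  # privacy perspective: resource or case
    prac: str  # privacy activity: PPDP, PPPM, or PrAn


def recommend(
    catalog: List[Signature],
    pmps: Optional[str] = None,
    pmac: Optional[str] = None,
    prps: Optional[str] = None,
    prac: Optional[str] = None,
) -> List[str]:
    """Return the names of techniques matching every dimension the user specified."""
    wanted = {"pmps": pmps, "pmac": pmac, "prps": prps, "prac": prac}
    return [
        sig.name
        for sig in catalog
        if all(value is None or getattr(sig, key) == value for key, value in wanted.items())
    ]


# Hypothetical catalog; the entries are placeholders, not PC4PM's real signatures.
catalog = [
    Signature("technique_a", "control-flow", "process discovery", "case", "PPDP"),
    Signature("technique_b", "organizational", "role mining", "resource", "PPDP"),
    Signature("technique_c", "control-flow", "process discovery", "case", "PrAn"),
]
```
]]></preformat>
      <p>For instance, <monospace>recommend(catalog, pmps="control-flow", prac="PPDP")</monospace> narrows this hypothetical catalog to <monospace>technique_a</monospace>, and leaving a dimension unspecified keeps it unconstrained.</p>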
    </sec>
    <sec id="sec-3">
      <title>3. Availability and Maturity</title>
      <p>The source code, a screencast, a user manual, and all other resources are available in our
GitHub repository: https://github.com/m4jidRafiei/PC4PM. Each privacy/confidentiality Python
package is linked to a separate GitHub project. The main GitHub project contains links to all
those projects. In the corresponding GitHub project of each privacy/confidentiality Python
package, one can find the name of the Python package, the link to the main paper, and sample
source code that shows the usage. In terms of performance and time complexity, each privacy
preservation technique that is linked to a Django application behaves differently w.r.t. the size
of the input event log. Based on our experiments, the applications are able to handle real-world
event logs, e.g., BPI challenge datasets: https://data.4tu.nl/. Moreover, all the complicated and
time-consuming functions developed in the Python packages have a parameter for running them
with multi-processing, which is enabled by default. In this case, the input event log is divided
into smaller pieces w.r.t. the cores of the processor hosting PC4PM. PC4PM is provided as a
Docker container that can simply be hosted by users: https://hub.docker.com/r/m4jid/pc4pm.
The Docker usage is also explained in the GitHub repository.</p>
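      <p>The splitting of an event log across processor cores can be sketched as follows. This is a minimal illustration assuming a simple per-chunk computation (counting events); the function names and the <monospace>multiprocessing</monospace> flag are hypothetical and are not taken from the PC4PM packages.</p>
      <preformat><![CDATA[
```python
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional

Trace = List[str]


def split_log(traces: List[Trace], n_chunks: int) -> List[List[Trace]]:
    """Split a log into at most n_chunks contiguous chunks of near-equal size."""
    if not n_chunks >= 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(traces), n_chunks)
    chunks: List[List[Trace]] = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional trace.
        end = start + size + (1 if extra > i else 0)
        if end > start:
            chunks.append(traces[start:end])
        start = end
    return chunks


def count_events(chunk: List[Trace]) -> int:
    """Stand-in for a time-consuming per-chunk computation."""
    return sum(len(trace) for trace in chunk)


def total_events(
    traces: List[Trace], multiprocessing: bool = True, workers: Optional[int] = None
) -> int:
    """Process the log chunk-wise, in parallel unless multiprocessing is disabled."""
    workers = workers or os.cpu_count() or 1
    if not multiprocessing or workers == 1:
        return count_events(traces)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_events, split_log(traces, workers)))
```
]]></preformat>
      <p>With this layout, the number of chunks follows the number of available cores, and passing <monospace>multiprocessing=False</monospace> falls back to a single-process run, which is convenient for small logs or debugging.</p>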
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this paper, we introduced a tool for publishing event data w.r.t. privacy concerns. Our
web-based tool is mainly focused on privacy-/confidentiality-preserving data publishing and privacy
analysis considering both data utility and disclosure risk analyses. PC4PM can be considered as
a sanitizer that provides sanitized event logs that can be used by any process mining tool. The
architecture has been designed in such a way that other privacy preservation techniques can
easily be integrated, e.g., we integrated PRIPEL as an external library. The goal of PC4PM is
to provide a comprehensive set of techniques that can cover all the aspects of privacy-related
activities for different perspectives of process mining. We invite other researchers to integrate
their solutions as independent applications into the provided framework.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Funded under the Excellence Strategy of the Federal Government and the Länder. We also thank
the Alexander von Humboldt (AvH) Stiftung for supporting our research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <source>Process Mining - Data Science in Action, Second Edition</source>
          , Springer,
          <year>2016</year>
          . doi:10.1007/978-3-662-49851-4.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          , H. van der Aa, M. Weidlich,
          <article-title>PRIPEL: privacy-preserving event log publishing including contextual information</article-title>
          , in: Business Process Management - 18th International Conference, BPM, volume
          <volume>12168</volume>
          of Lecture Notes in Computer Science,
          <year>2020</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Group-based privacy preservation techniques for process mining</article-title>
          ,
          <source>Data &amp; Knowledge Engineering</source>
          <volume>134</volume>
          (
          <year>2021</year>
          )
          101908. doi:10.1016/j.datak.2021.101908.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koschmider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          ,
          <string-name>
            <surname>S. N. von Voigt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          , L. von Waldthausen,
          <article-title>Privacy and confidentiality in process mining - threats and research challenges</article-title>
          ,
          <source>CoRR</source>
          abs/2106.00388 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2106.00388. arXiv:2106.00388.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <article-title>Secure multi-party computation for inter-organizational process mining</article-title>
          , in:
          <source>Enterprise, Business-Process and Information Systems Modeling - 21st International Conference, BPMDS</source>
          , Springer,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <article-title>Shareprom: A tool for privacy-preserving inter-organizational process mining</article-title>
          ,
          in:
          <source>Proceedings of the Best Dissertation Award, Doctoral Consortium, and Demonstration &amp; Resources Track at BPM 2020</source>
          , volume
          <volume>2673</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Practical aspect of privacy-preserving data publishing in process mining</article-title>
          ,
          in:
          <source>Proceedings of the Best Dissertation Award, Doctoral Consortium, and Demonstration &amp; Resources Track at BPM 2020 co-located with the 18th International Conference on Business Process Management (BPM 2020)</source>
          , CEUR-WS.org,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koschmider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          , H. van der Aa, M. Weidlich,
          <article-title>Elpaas: Event log privacy as a service</article-title>
          , in:
          <source>Proceedings of the Dissertation Award, Doctoral Consortium, and Demonstration Track at BPM 2019</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving data publishing in process mining</article-title>
          ,
          in:
          <source>Business Process Management Forum - BPM Forum 2020, Seville, Spain, September 13-18, 2020, Proceedings</source>
          , volume
          <volume>392</volume>
          of Lecture Notes in Business Information Processing, Springer,
          <year>2020</year>
          , pp.
          <fpage>122</fpage>
          -
          <lpage>138</lpage>
          . doi:10.1007/978-3-030-58638-6_8.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Mining roles from event logs while preserving privacy</article-title>
          , in: Business Process Management Workshops - BPM 2019 International Workshops, Vienna, Austria,
          <year>2019</year>
          , pp.
          <fpage>676</fpage>
          -
          <lpage>689</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Towards quantifying privacy in process mining</article-title>
          , in: International Conference on Process Mining - ICPM 2020 International Workshops, Padua, Italy, October 4-9, 2020,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <surname>W. M. P. van der Aalst</surname>
          </string-name>
          ,
          <article-title>Privacy-preserving continuous event data publishing</article-title>
          ,
          <source>CoRR</source>
          abs/2105.11991 (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2105.11991. arXiv:2105.11991.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>