<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ELPaaS: Event Log Privacy as a Service</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Bauer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephan A. Fahrenkrog-Petersen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agnes Koschmider</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felix Mannhardt</string-name>
          <email>felix.mannhardt@sintef.no</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Han van der Aa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Weidlich</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Humboldt-Universität zu Berlin</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Kiel University</institution>
          ,
          <addr-line>Kiel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>SINTEF Digital</institution>
          ,
          <addr-line>Trondheim</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The privacy of an organization's workers represents a crucial concern in process mining settings, where data on an individual's performance is recorded and possibly shared for analysis. To enable users to appropriately deal with privacy concerns in process mining, this paper introduces ELPaaS (Event Log Privacy as a Service), a web application that offers state-of-the-art techniques for event log sanitization and privacy-preserving process mining queries. By employing our techniques, users obtain event logs and process mining results that provide privacy guarantees such as differential privacy and k-anonymity. Hence, the privacy of an organization's workers is protected.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Process mining represents a family of techniques for the data-driven analysis of
business processes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. These techniques utilize event data recorded by
information systems during the execution of a business process, stored in the form of
event logs. Event logs are employed for a variety of use cases, such as process
discovery, in which a process model is constructed on the basis of the recorded
event data, conformance checking, in which event data is compared to a process
model, and model enhancement, in which, for example, performance information
is added to an obtained process model.
      </p>
      <p>
        Recognizing the potential of process mining, organizations strive to record
event data in an accurate and fine-granular manner. While this enables
organizations to ensure the efficient and correct execution of their processes, it can
also result in the disclosure of sensitive information regarding an organization's
employees. Event logs may breach an individual's privacy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], violating their
ability to control who has access to their personal data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Therefore, disclosure
of recorded event data in the form of event logs should be assessed in light of
ethical considerations, as well as in the context of privacy regulations, such as
the European General Data Protection Regulation (GDPR) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and the
California Consumer Privacy Act. For instance, the GDPR permits the processing of
personal data only on a lawful basis, such as the consent of the individual concerned.
      </p>
      <p>[Figure 1: diagram with the elements Individual, Data Contribution, Information System, Data Extraction, Event Data, Event Log Sanitization, Sanitized Event Data, Process Mining, Privatized Process Mining, and Process Mining Artifact.]</p>
      <p>
        To enable the appropriate handling of recorded event data in process mining,
we introduce the ELPaaS (Event Log Privacy as a Service) web application. As
visualized in Figure 1, the application supports two fundamental ways in which
privacy guarantees in process mining can be provided: (1) event log sanitization
and (2) privatized process mining. Event log sanitization involves the
transformation of an extracted event log into one that satisfies established privacy
metrics. An event log obtained in this manner can subsequently reduce the
disclosure of personally identifiable records in the log. Privatized process mining, by
contrast, involves process mining techniques that have been specifically designed
such that the obtained process mining artifacts, e.g., derived process models or
query results, meet desired privacy guarantees. ELPaaS offers state-of-the-art
techniques for both directions. For event log sanitization, the application offers
the PRETSA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (PREfix-Tree based event log SAnitisation for t-closeness)
algorithm, which sanitizes event logs to guarantee k-anonymity and t-closeness.
For privatized process mining techniques, differential privacy mechanisms for
common queries on event logs, developed by Mannhardt et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], are offered.
      </p>
      <p>The remainder of this paper is structured as follows. Section 2 describes and
visualizes the functionality of ELPaaS, Section 3 discusses the maturity and
availability of the application, and Section 4 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Functionality</title>
      <p>
        This section describes the main input, functions, and output of ELPaaS.
      </p>
      <p>
        Input. Figure 2 presents a snippet of the opening page of the web application.
The first step for a user is to upload an event log. Event logs should be provided
as XES (eXtensible Event Stream) or CSV (Comma-Separated Values) files. CSV
files should contain at least a case ID column and an activity column, and
should be sorted according to the execution order of the events.
      </p>
      <p>
        Event Log Sanitization. Users that want to sanitize an event log can directly
do so by selecting the PRETSA algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] in the upload screen, as shown in Figure 3a.
      </p>
      <p>
        PRETSA then transforms an event log into one that satisfies k-anonymity and
t-closeness requirements, while striving to preserve maximum utility for process
discovery and process enhancement. When an event log satisfies k-anonymity,
any event execution can be related to at least k different actors, mitigating an
attacker's ability to confidently associate events with specific workers. An event log
that satisfies t-closeness furthermore ensures that the performance of individual
workers (e.g., in terms of throughput time) cannot be derived from a sanitized
event log. We kindly refer the reader to Fahrenkrog-Petersen et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] for insights
on the impact of different k and t values on process mining utility.
      </p>
      <p>
        Privatized Process Mining Queries. Users are currently able to execute
two distinct privacy-preserving process mining queries: the derivation of the
frequency of directly-follows relations (i.e., how often one activity is executed directly after
another one) and the derivation of the frequency of trace variants (i.e., how often a
particular sequence of activity executions is contained in the log). Both queries follow
the privacy-preserving techniques developed by Mannhardt et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Here, the
general idea is to add so-called Laplace noise to the query result, calibrated to a user-defined
ε value. By doing so, the obtained results are guaranteed to satisfy ε-differential
privacy. This means that the query result will not allow attackers to accurately
determine the sequencing of activity executions, which could identify a
particular worker involved in the execution of the process (e.g., through a particular
pattern of activity executions).
      </p>
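      <p>As a minimal sketch of the Laplace-noise idea described above, and not the implementation used in ELPaaS or the exact mechanism of Mannhardt et al., the following Python fragment counts directly-follows relations and perturbs each count with Laplace noise of scale sensitivity/ε. The function names, the representation of an event log as a list of activity sequences, and the default sensitivity of 1 are assumptions made for illustration; a sound sensitivity depends on how strongly a single individual can influence a count.</p>
      <preformat><![CDATA[
```python
import random
from collections import Counter


def directly_follows_counts(traces):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent exponential variates with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_directly_follows(traces, epsilon, sensitivity=1.0, seed=None):
    """Return directly-follows counts perturbed with Laplace noise.

    `sensitivity` is an assumed bound on how much one individual can
    change a single count; it is a placeholder, not a value from the paper.
    `seed` exists only to make demonstrations reproducible.
    """
    if not epsilon > 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    exact = directly_follows_counts(traces)
    activities = sorted({activity for trace in traces for activity in trace})
    # Perturb every pair of observed activities, including pairs whose
    # exact count is zero (a Counter returns 0 for missing keys).
    return {
        (a, b): exact[(a, b)] + laplace_noise(scale, rng)
        for a in activities
        for b in activities
    }
```
]]></preformat>
      <p>A smaller ε yields a larger noise scale and hence stronger privacy at the cost of less accurate counts. In this sketch, noise is added to every pair of observed activities, including pairs with an exact count of zero, so that the presence of a pair in the output does not by itself reveal that the pair occurred.</p>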
      <p>
        To execute the directly-follows query, i.e., the Laplacian df-based algorithm,
solely the parameter ε needs to be specified. For the trace variant query, shown
in Figure 3b, a user needs to select ε, as well as a maximum sequence length
and a pruning parameter. To avoid exploring a possibly infinite number of trace
variants, the technique only explores trace variants up to a certain maximum
length. The pruning parameter is used to further limit the search space by
avoiding the consideration of infrequent variants. For a detailed explanation of the
queries and their privacy-preserving mechanisms, the reader is referred to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>[Figure 3(b): Differentially private query]</p>
      <p>
        Output. When an event log and algorithm have been selected, the event log
will be uploaded and the application of the algorithm will be started as a batch
job. The user will be notified at a provided e-mail address when the execution
has been completed. The user can retrieve the obtained process mining artifact
(either a sanitized event log or a query result) through the token from the e-mail.
      </p>
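      <p>To illustrate the CSV input expectations and the notion of a trace variant used in this section, the following sketch groups rows by case ID, relying on the row order as the execution order, and counts how many cases share each activity sequence. The column names case_id and activity, the function names, and the example data are hypothetical; ELPaaS itself may name and parse columns differently. The counts computed here are exact, i.e., they are the unprotected values to which a privatized query would add noise.</p>
      <preformat><![CDATA[
```python
import csv
import io
from collections import Counter

# Hypothetical example log: three cases, rows sorted by execution order.
EXAMPLE_CSV = """case_id,activity
1,register
1,check
2,register
1,pay
2,check
2,pay
3,register
3,pay
"""


def read_traces(csv_text, case_column="case_id", activity_column="activity"):
    """Group CSV rows into one activity sequence per case.

    Rows are assumed to be sorted by execution order already, so the order
    of rows within a case is taken as the order of its events.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {case_column, activity_column} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    traces = {}
    for row in reader:
        traces.setdefault(row[case_column], []).append(row[activity_column])
    return traces


def trace_variant_counts(traces):
    """Count how many cases share each distinct activity sequence."""
    return Counter(tuple(events) for events in traces.values())


if __name__ == "__main__":
    for variant, count in trace_variant_counts(read_traces(EXAMPLE_CSV)).items():
        print(count, " -> ".join(variant))
```
]]></preformat>
      <p>In the example data, cases 1 and 2 share the variant register, check, pay, whereas case 3 is the only case with the variant register, pay. Such rare variants are the ones most likely to single out an individual.</p>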
    </sec>
    <sec id="sec-3">
      <title>Maturity and Availability</title>
      <p>ELPaaS, its source code, a tutorial, and all other information are available at
github.com/samadeusfp/elpaas and accessible without registration. The source
code of ELPaaS is available under the MIT licence. The project was
implemented in Python, using the Django framework (https://www.djangoproject.com) as a basis. A screencast
of our tool is available at https://youtu.be/XLq124VpZ6Q.</p>
      <p>Our application is an online service that has been developed to provide
privacy-preserving process mining techniques to other researchers. As such, the
web application in its current form is not optimized for industry-scale usage.
Nevertheless, the application is suitable to handle real-world event logs, such as
those of the BPI challenges (https://data.4tu.nl/repository/collection:event_logs_real). Given its computational complexity, the event log
sanitization algorithm requires up to several hours to complete, whereas
privacy-preserving queries can be executed in a matter of seconds. Our web deployment
is available through a secure connection and does not store the original event
log permanently. With these features we provide security to the users of our
application. Alternatively, it is possible for users to host the application
themselves. We provide our application as an isolated container, so it can be run
on a Docker (https://www.docker.com) instance. Given the ongoing developments in the area
of privacy-preserving process mining, the ELPaaS architecture is designed for
simple integration of novel techniques in the future.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this paper, we introduced ELPaaS, a web application that supports
privacy-preserving process mining. The application bundles approaches for both event log
sanitization, which transforms an event log into one that meets privacy criteria,
and privatized process mining, which ensures that obtained
process mining results adhere to privacy requirements. As such, the application
enables users to choose a technique that best suits their purposes.</p>
      <p>As research into privacy-preserving process mining is ongoing, we intend to
continuously expand the techniques offered by the application in the future.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was in part supported by Bane NOR and the Alexander von Humboldt
Foundation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.M.:
          <source>Process Mining - Data Science in Action</source>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Asikis</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pournaras</surname>
          </string-name>
          , E.:
          <article-title>Optimization of privacy-utility trade-offs under informational self-determination</article-title>
          . FGCS, in press (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fahrenkrog-Petersen</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>Providing privacy guarantees in process mining</article-title>
          .
          <source>Proceedings of the CAiSE Doctoral Consortium</source>
          . pp.
          <fpage>23</fpage>
          –
          <lpage>30</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fahrenkrog-Petersen</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          , van der Aa, H.,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>PRETSA: Event log sanitization for privacy-aware process discovery</article-title>
          . ICPM, in press (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mannhardt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petersen</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          :
          <article-title>Privacy challenges for process mining in human-centered industrial environments</article-title>
          .
          <source>In: 14th Int'l Conf. on Intelligent Environments</source>
          . pp.
          <fpage>64</fpage>
          –
          <lpage>71</lpage>
          . IEEE
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mannhardt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koschmider</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baracaldo</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Racz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michael</surname>
          </string-name>
          , J.:
          <article-title>Privacy-preserving Process Mining: Differential Privacy for Event Logs</article-title>
          . BISE, accepted (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Voss</surname>
          </string-name>
          , W.G.:
          <article-title>European union data privacy law reform: General data protection regulation, privacy shield, and the right to delisting</article-title>
          .
          <source>Business Lawyer</source>
          <volume>72</volume>
          (
          <issue>1</issue>
          ),
          <fpage>221</fpage>
          –
          <lpage>233</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>