<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Amun: A Tool for Differentially Private Release of Event Logs for Process Mining (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gamal Elkoumy</string-name>
          <email>gamal.elkoumy@ut.ee</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alisa Pankova</string-name>
          <email>alisa.pankova@cyber.ee</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marlon Dumas</string-name>
          <email>marlon.dumas@ut.ee</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cybernetica</institution>
          ,
          <addr-line>20 Narva mnt, Tartu, 51009</addr-line>
          ,
          <country country="EE">Estonia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Tartu</institution>
          ,
          <addr-line>18 Narva mnt, Tartu, 51009</addr-line>
          ,
          <country country="EE">Estonia</country>
        </aff>
      </contrib-group>
      <fpage>56</fpage>
      <lpage>60</lpage>
      <abstract>
        <p>Event logs capture the execution of business processes inside organizations. Event logs may contain private information about individuals, such as customers in customer-facing business processes, which can be a roadblock to analyzing the logs due to data regulations. To address this, this paper introduces Amun, a web-based application for releasing event logs using differential privacy. The tool enables users to obtain a differentially private event log that limits the re-identification risk to the maximum acceptable threshold given by the user. Therefore, the customers' privacy is protected, and the organization can release its logs to be analyzed.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Mining</kwd>
        <kwd>Event Log</kwd>
        <kwd>Differential Privacy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>[Figure: overview of Amun. Recoverable labels: Filtering with Sampling, Oversampling, Noise Quantification.]</p>
      <p>[...] a sub-trace. Furthermore, Amun anonymizes the execution timestamps and masks the case
IDs. As a non-functional requirement, Amun can process large event logs with hundreds of
thousands of events. Moreover, the tool reports each event’s re-identification risk
in the original log.</p>
      <p>The rest of the paper is structured as follows. Sect. 1 describes Amun’s functionality and
components. Sect. 2 discusses the availability and maturity of the tool. Sect. 3 presents the
conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>1. Functionality</title>
      <p>Input. Figure 2 presents the upload page of the web application. The event log publisher
uploads their event log to Amun as either an XES (eXtensible Event Stream) or a CSV (Comma-Separated
Values) file. Amun requires the event log to have at least a column representing the
case ID, a column representing the activity instance, and a column that records the timestamp
of executing each activity. Then, the user sets the maximum acceptable risk probability (δ) using
the slider, selects the anonymization method (sampling, oversampling, or filtering), and clicks
Anonymize.</p>
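      <p>As an illustration of the input requirement above, the following minimal Python sketch checks that a CSV event log exposes the three required columns. The column names case_id, activity, and timestamp and the helper missing_columns are assumptions made for this example; they are not taken from Amun's actual schema or API.</p>
      <preformat preformat-type="code">
```python
import csv
import io

# Hypothetical column names; Amun's real column naming may differ.
REQUIRED_COLUMNS = ("case_id", "activity", "timestamp")


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


sample = "case_id,activity,timestamp\nc1,Register,2021-01-01T10:00:00\n"
print(missing_columns(sample))  # []
```
      </preformat>
      <p>An empty result means that the log carries the minimum information listed above; any additional attributes would be dropped from the anonymized log anyway.</p>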
      <p>The maximum acceptable risk probability (δ) represents the increase in the probability of
singling out an individual after releasing the log. For example, suppose the attacker has prior
information about an individual that makes the presence probability of that individual 20%. In
that case, δ is the maximum acceptable increase of that presence probability after releasing the log.</p>
      <p>Preprocessing and risk quantification. Once the user clicks Anonymize, Amun starts
processing the file. The first step is to establish a representation that helps to quantify the
re-identification risk attached to releasing each event in the log. To this end, Amun converts
the input event log into a lossless representation, namely a Deterministic Acyclic Finite State
Automaton (DAFSA) [<xref ref-type="bibr" rid="ref10">9</xref>]. Next, Amun annotates each event in the log with its DAFSA transition, as
explained in [<xref ref-type="bibr" rid="ref9">8</xref>]. Then, for each event, Amun estimates the prior knowledge, which represents
the re-identification risk before publishing the log, and the posterior knowledge, which
represents the re-identification risk after publishing the log. A detailed explanation of this risk
quantification is presented in [<xref ref-type="bibr" rid="ref9">8</xref>].</p>
      <sec id="sec-2-1">
        <title>Anonymization Methods</title>
        <sec id="sec-2-1-1">
          <p>
            Amun offers the user three different anonymization approaches. All the approaches guarantee that the customers in the anonymized log will not be singled
out using a subset of their trace variants or the timestamps of executing their activities. All the
approaches provide differential privacy guarantees [<xref ref-type="bibr" rid="ref4">3</xref>] by injecting noise, quantified by the
differential privacy parameter ε, into both the control-flow perspective, which represents the user traces in
the log, and the timestamp perspective. The approaches are as follows:
          </p>
          <list list-type="bullet">
            <list-item>
              <p>Oversampling [<xref ref-type="bibr" rid="ref8">7</xref>]. In some settings, the user requires the anonymized event log to contain the same set of trace
variants as the original log. The oversampling
approach preserves this set of trace variants while preventing singling out traces in
the log. To this aim, Amun applies the approach presented by Elkoumy et al. [<xref ref-type="bibr" rid="ref8">7</xref>]. This
approach fits structured event logs where the cases of the log share trace variants.</p>
            </list-item>
            <list-item>
              <p>Sampling. In some settings, the user may accept the deletion of some trace variants in
order to release an anonymized event log that is close to the original log. To this end, the
sampling approach anonymizes the event log so that the anonymization does not add
new trace variants to the log, and the difference between the real and the anonymized
timestamps is minimal. Amun applies the sampling approach presented in [<xref ref-type="bibr" rid="ref9">8</xref>]. This
approach works with semi-structured event logs.</p>
            </list-item>
            <list-item>
              <p>Filtering with Sampling. Some event logs contain highly unique user traces,
which require large noise injection to achieve differential privacy guarantees. Therefore, Amun
applies the filtering with sampling approach presented by Elkoumy et al. [<xref ref-type="bibr" rid="ref9">8</xref>] to enable
the anonymization of unstructured event logs, i.e., event logs with unique traces. The
filtering step removes high-risk traces that would require large noise injection. Thus,
the anonymized logs preserve more utility.</p>
            </list-item>
          </list>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Noise Quantification and Injection</title>
        <sec id="sec-2-2-1">
          <p>At this step, given the estimated re-identification risk
per event, Amun estimates a suitable ε value. We draw noise from a Laplace distribution and
inject it into both the control-flow and time perspectives. This step is performed for each
event independently.</p>
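          <p>The noise-drawing step can be sketched with the textbook Laplace mechanism, in which the noise scale equals the sensitivity divided by the privacy parameter. The sketch below is not Amun's code: the function names, the sensitivity of 1, the example values, and the per-event privacy parameters are assumptions, and the way Amun derives the privacy parameter from the estimated risk is described in [8] and is not reproduced here.</p>
          <preformat preformat-type="code">
```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent Exp(1) variables, multiplied by
    `scale`, follows Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb(value, epsilon, sensitivity, rng):
    """Textbook Laplace mechanism: noise scale is sensitivity / epsilon."""
    if not epsilon > 0:
        raise ValueError("epsilon must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)

# Hypothetical per-event values: a count on the control-flow side and a
# relative time in minutes on the time side, each with its own epsilon.
noisy_count = perturb(12, epsilon=0.5, sensitivity=1.0, rng=rng)
noisy_minutes = perturb(30.0, epsilon=2.0, sensitivity=1.0, rng=rng)
print(noisy_count, noisy_minutes)
```
          </preformat>
          <p>A smaller privacy parameter yields a larger noise scale, which is why events with a higher re-identification risk end up with more heavily perturbed values.</p>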
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>Output</title>
        <p>Once the event log anonymization is finished, the anonymized event log becomes
available for download. Amun provides the anonymized log in the same format as the original
log. Amun also offers the risk quantification of each activity instance in the log as
a downloadable CSV file. The risk quantification of each activity instance is stored in a column called original risk,
which represents the re-identification risk of releasing the event log before the anonymization.
Amun anonymizes only three columns: case ID, activity label, and timestamp. Amun drops
the other attributes from the anonymized log.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Maturity and Availability</title>
      <p>
        Amun has been empirically evaluated with real-life event logs, as reported in [<xref ref-type="bibr" rid="ref8 ref9">7, 8</xref>]. The empirical
evaluation shows that Amun outperforms the state of the art in terms of Jaccard distance and
earth mover’s distance. The empirical evaluation also validates the non-functional requirement
stated in the introduction.
      </p>
      <p>Amun is developed as a React web application and an API for ease of use. To enable
quick trials, Amun is available as a cloud service at
http://amun.cloud.ut.ee. The current server deployment accepts event logs of up to 5 MB.
Amun is also available as a Docker image. The image and its installation steps can be found at
https://github.com/Elkoumy/amun/tree/amun-flask-app. In addition, Amun is available as a Python
package that can be integrated into other process mining tools. The source code and the
installation steps can be found at https://github.com/Elkoumy/amun. A screencast that describes the
tool is available on YouTube at https://youtu.be/1dxaCNE9WHk.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Conclusion</title>
      <p>In this paper, we introduced Amun, a tool that provides differential privacy guarantees for releasing
event logs for process mining. Amun offers three approaches to event log anonymization, which
suit the different requirements of event log publishers. The tool also quantifies the
re-identification risk of releasing every activity instance in the log.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>Work funded by the European Research Council (PIX project) and by the EU H2020-SU-ICT-03-2018 Project No. 830929 CyberSec4Europe.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref2">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>La Rosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          , et al.,
          <source>Fundamentals of business process management</source>
          , volume
          <volume>1</volume>
          , Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koschmider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
          ,
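          <string-name>
            <given-names>S. N.</given-names>
            <surname>von Voigt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>von Waldthausen</surname>
          </string-name>
          ,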
          <article-title>Privacy and confidentiality in process mining: Threats and research challenges</article-title>
          ,
          <source>ACM Trans. Manag. Inf. Syst.</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>11:1</fpage>
          -
          <lpage>11:17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dwork</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Roth</surname>
          </string-name>
          , et al.,
          <article-title>The algorithmic foundations of differential privacy</article-title>
          ,
          <source>Found. Trends Theor. Comput. Sci.</source>
          <volume>9</volume>
          (
          <year>2014</year>
          )
          <fpage>211</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Koschmider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Mannhardt</surname>
          </string-name>
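          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>van der Aa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <article-title>ELPaaS: Event log privacy as a service</article-title>
          , in: BPM (PhD/Demos), volume
          <volume>2420</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org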
          ,
          <year>2019</year>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rafiei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schnitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          ,
          <article-title>PC4PM: A tool for privacy/confidentiality preservation in process mining</article-title>
          , in: BPM (PhD/Demos), volume
          <volume>2973</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org
          ,
          <year>2021</year>
          , pp.
          <fpage>106</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Fahrenkrog-Petersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weidlich</surname>
          </string-name>
          ,
          <article-title>Shareprom: A tool for privacy-preserving inter-organizational process mining</article-title>
          , in: BPM (PhD/Demos), volume
          <volume>2673</volume>
          of
          <source>CEUR Workshop Proceedings</source>
          , CEUR-WS.org
          ,
          <year>2020</year>
          , pp.
          <fpage>72</fpage>
          -
          <lpage>76</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <article-title>Mine me but don't single me out: Differentially private event logs for process mining</article-title>
          , in: ICPM, IEEE,
          <year>2021</year>
          , pp.
          <fpage>80</fpage>
          -
          <lpage>87</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Elkoumy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pankova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumas</surname>
          </string-name>
          ,
          <article-title>Differentially private release of event logs for process mining</article-title>
          ,
          <source>CoRR</source>
          abs/2201.03010 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Daciuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mihov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <article-title>Incremental construction of minimal acyclic finite-state automata</article-title>
          ,
          <source>Comput. Linguistics</source>
          <volume>26</volume>
          (
          <year>2000</year>
          )
          <fpage>3</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>