<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ensuring Confidentiality in Process Mining</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Chair of Process</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Data Science</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>RWTH Aachen University</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aachen</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Yale University</institution>
          ,
          <addr-line>New Haven</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>To gain novel and valuable insights into the actual processes executed within a company, process mining provides a variety of powerful data-driven analysis techniques ranging from automatically discovering process models to detecting and predicting bottlenecks and process deviations. On the one hand, recent breakthroughs in process mining resulted in powerful techniques, encouraging organizations and business owners to improve their processes through process mining. On the other hand, there are great concerns about the use of highly sensitive event data. Within an organization, it often suffices that analysts only see the aggregated process mining results without being able to inspect individual cases, events, and persons. When analysis is outsourced, the results also need to be encrypted to avoid confidentiality problems. Surprisingly, little research has been done toward security methods and encryption techniques for process mining. Therefore, in this paper, we introduce a novel approach that allows us to hide confidential information in a controlled manner while ensuring that the desired process mining results can still be obtained. We provide a sample solution for process discovery and evaluate it by applying a case study on a real-life event log.</p>
      </abstract>
      <kwd-group>
        <kwd>Responsible process mining</kwd>
        <kwd>Confidentiality</kwd>
        <kwd>Process discovery</kwd>
        <kwd>Directly follows graph</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Data science is changing the way we do business, socialize, conduct research, and
govern society. Data are collected on anything, at any time, and in any place.
Therefore, it is not surprising that many are concerned about the usage of such
data. The Responsible Data Science (RDS) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] initiative focuses on four main
questions: (1) Data science without prejudice (How to avoid unfair conclusions
even if they are true?), (2) Data science without guesswork (How to answer
questions with a guaranteed level of accuracy?), (3) Data science that ensures
confidentiality (How to answer questions without revealing secrets?), and (4)
Data science that provides transparency (How to clarify answers such that they
become indisputable?). This paper focuses on the confidentiality problem (third
question) when applying process mining to event data.
      </p>
      <p>
        In recent years, process mining has emerged as a new field which bridges the
gap between data science and process science. Process mining uses event data
to provide novel insights [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The breakthroughs in process mining are truly
remarkable. Currently, over 25 commercial tools supporting process mining are
available (e.g., Celonis, Disco, Magnaview, QPR, etc.) illustrating the value of
event data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. However, existing tools and the corresponding research rarely
consider confidentiality issues. Since the event logs used as a basis for process
mining often contain highly sensitive data, confidentiality is a major problem.
      </p>
      <p>As we show in this paper, confidentiality in process mining cannot be achieved
by simply encrypting all data. Since people need to use and see process mining
results, the challenge is to retain as little information as possible while still being
able to have the same desired result. Here, the desired result is a process model
that can be used to check compliance and spot bottlenecks. The discovered
models based on encrypted event logs should be identical to the results obtained
for the original event data (assuming proper authorizations).</p>
      <p>In this paper, we present a new approach to deal with confidentiality in
process mining. Selected parts of the data are encrypted or anonymized, while
other parts of the original event logs are kept. For example, activity names remain
unchanged, but one cannot correlate events into end-to-end cases. The new
approach is explained through a sample solution for process discovery based on a
framework for confidentiality. The framework allows us to derive the same results
from secure event logs as from the original event logs,
while unauthorized persons cannot access confidential information. In addition,
this framework provides a secure solution for process mining when processes are
cross-organizational.</p>
      <p>The remainder of this paper is organized as follows. Section 2 outlines related
work and the problem background. In Section 3, we clarify process mining and
cryptography as preliminaries. In Section 4, the problem is explained in detail.
The new approach is introduced in Section 5. In Section 6 our evaluation is
described, and Section 7 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        In both data science and information systems, confidentiality has been a topic of
interest in the last decade. In computer science, privacy-preserving algorithms
and methods in differential privacy have the closest similarity to confidentiality
in process mining. In sequential pattern mining, the field of data science most
closely related to process mining, there has been work on preserving privacy in
settings with distributed databases [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or in cross-organizational settings [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        The Process Mining Manifesto [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] also points out that privacy concerns
should be addressed. Although there have been a lot of breakthroughs in the
field of process mining ranging from data preprocessing [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and process
discovery [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] to performance analysis [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], the research field of confidentiality and
privacy has received relatively little attention.
      </p>
      <p>
        The topic of Responsible Process Mining (RPM) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] has been put forward
by several authors, thereby raising concerns related to fairness, accuracy,
confidentiality, and transparency. In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] a method for securing event logs so that process discovery with the
Alpha algorithm can still be performed has been proposed. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] a possible
approach toward a solution, allowing the outsourcing of process mining while
ensuring the confidentiality of datasets and processes, has been presented. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
the authors use a cross-organizational process discovery setting, where
public process model fragments are shared as safe intermediates. There are also a
few online guidelines [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>Background</title>
      <p>In this section, we briefly present the main concepts and refer the readers to
relevant literature.</p>
      <sec id="sec-3-1">
        <title>Process Mining</title>
        <p>The four basic types of process mining are: (1) process discovery, which is used
to learn a process model based on event data, (2) conformance checking, which
compares observed behavior and modeled behavior, (3) process reengineering,
used for improving or extending the process model, and (4) operational support,
providing warnings, predictions, and/or recommendations. In this paper, we focus
on process discovery.</p>
        <p>Events are the smallest data unit in process mining and occur when an
activity in a process is executed. In Table 1 each row indicates an event with
different attributes.</p>
        <p>A trace is a sequence of events and represents how one instance of a process
is executed. E.g., candidate George (case 3) is first registered, then admitted.</p>
        <p>
          An event log is a collection of sequences of events. There are process mining
algorithms that can use them as input. Event data are widely available in current
information systems [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>As Table 1 shows, "Timestamp" identifies the moment in time at
which an event has taken place, and "Case ID" is what all events in a trace have
in common so that they can be identified as part of that process instance. Event
logs can also include additional attributes for the events they record. There are
two main attribute types that fall under this category: "Event Attributes", which
are specific to an event, and "Case Attributes", which stay the same
throughout an entire trace.</p>
        <p>
          A Directly Follows Graph (DFG) is a graph where the nodes represent
activities and the arcs represent causalities. Activities "a" and "b" are connected
when "a" is frequently followed by "b". The weights of the arrows denote the
frequency of the relation [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Most commercial process mining tools use DFGs.
Unlike more advanced process discovery techniques (e.g., implemented in ProM),
DFGs cannot express concurrency. The DFGs used in this paper also include
times, i.e., besides the frequencies, the average time that it takes to go from
one activity to another is included.
        </p>
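        <p>The following minimal sketch illustrates how a DFG with frequencies and average times could be derived from a plain event log. It is not the implementation used in this paper; the function name build_dfg, the tuple layout, and the three-case log are assumptions made for illustration only.</p>
        <preformat>
```python
from collections import defaultdict
from datetime import datetime


def build_dfg(events):
    """Build a directly follows graph from (case_id, activity, timestamp) tuples.

    Returns a dict that maps each arc (a, b) to (frequency, average_seconds).
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    totals = defaultdict(lambda: [0, 0.0])  # arc maps to [count, summed seconds]
    for trace in traces.values():
        trace.sort()  # order the events of one case by time
        for (t1, a), (t2, b) in zip(trace, trace[1:]):
            totals[(a, b)][0] += 1
            totals[(a, b)][1] += (t2 - t1).total_seconds()

    return {arc: (count, total / count) for arc, (count, total) in totals.items()}


def at(hour, minute):
    """Helper for the hypothetical log below: a time on one illustrative day."""
    return datetime(2018, 9, 1, hour, minute)


# Hypothetical three-case log; activity names and times are illustrative only.
log = [
    (1, "Registration", at(8, 0)), (1, "Triage", at(8, 10)), (1, "Admission", at(9, 0)),
    (2, "Registration", at(8, 5)), (2, "Triage", at(8, 25)),
    (3, "Registration", at(9, 0)), (3, "Admission", at(9, 30)),
]
dfg = build_dfg(log)
print(dfg[("Registration", "Triage")])  # frequency and average seconds of one arc
```
        </preformat>
        <p>Note that the sketch needs the case identifier to group events into traces, which is exactly the information that has to be protected.</p>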
      </sec>
      <sec id="sec-3-2">
        <title>Cryptography</title>
        <p>
          Cryptography or cryptology is about constructing and analyzing protocols that
prevent third parties or the public from reading private messages [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
A cryptosystem is a suite of cryptographic algorithms needed to implement
a particular security service, most commonly for achieving confidentiality [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
There are different kinds of cryptosystems. In this paper, we use the following
ones.
        </p>
        <list list-type="bullet">
          <list-item><p>Symmetric Cryptosystem: In symmetric systems, the same secret key is used
to encrypt and decrypt a message. Data manipulation in symmetric systems
is faster than in asymmetric systems as they generally use shorter key lengths.
Advanced Encryption Standard (AES) is a symmetric encryption algorithm.</p></list-item>
          <list-item><p>Asymmetric Cryptosystem: Asymmetric systems use a public key to encrypt
a message and a private key to decrypt it or vice versa. Use of asymmetric
systems enhances the security of communication. Rivest-Shamir-Adleman
(RSA) is an asymmetric encryption algorithm.</p></list-item>
          <list-item><p>Deterministic Cryptosystem: A deterministic cryptosystem is a cryptosystem
which always produces the same ciphertext for a given plaintext and key,
even over separate executions of the encryption algorithm.</p></list-item>
          <list-item><p>Probabilistic Cryptosystem: A probabilistic cryptosystem, as opposed to a
deterministic cryptosystem, is a cryptosystem which uses randomness in the
encryption algorithm so that encrypting the same plaintext several
times produces different ciphertexts.</p></list-item>
          <list-item><p>Homomorphic Cryptosystem: A homomorphic cryptosystem allows
computation on ciphertext. E.g., Paillier is a partially homomorphic cryptosystem.</p></list-item>
        </list>
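        <p>To make the homomorphic property concrete, the sketch below implements a toy version of the Paillier scheme with deliberately tiny primes and shows that multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The function names, the primes 61 and 53, and the two cost values are assumptions for illustration; such parameters offer no security, and real deployments rely on a vetted library with large keys.</p>
        <preformat>
```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)  # modular inverse; this shortcut is valid for g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Probabilistic encryption: a fresh random r is drawn for every call."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


pub, priv = keygen(61, 53)
n = pub[0]
c1 = encrypt(pub, 120)  # hypothetical cost of one event
c2 = encrypt(pub, 250)  # hypothetical cost of another event
c_sum = (c1 * c2) % (n * n)  # multiplying ciphertexts adds the plaintexts
total = decrypt(pub, priv, c_sum)
print(total)
```
        </preformat>
        <p>The result is only correct as long as the sum stays below the modulus n, which is why the toy values are kept small.</p>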
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Problem Definition</title>
      <p>To illustrate the challenge of confidentiality in process mining, we start this
section with an example. Consider Table 2 describing a totally encrypted event
log, belonging to surgeries in a hospital. Since differences between values need to be
preserved to find the sequence of activities for each case, to discover a process model,
and to perform other analyses like social network discovery, "Case ID", "Activity", and "Resource"
are encrypted with a deterministic encryption method. Numerical data (i.e.,
"Timestamp" and "Cost") are encrypted with a homomorphic encryption method
so that basic mathematical computations remain possible. Now suppose that we have
background knowledge about surgeons and the approximate cost of different
types of surgeries; the question is whether this log is secure or not.</p>
      <p>Owing to the fact that the "Cost" is encrypted by a homomorphic
encryption method, the maximum value for the "Cost" is the real maximum cost.
Based on the background knowledge we know, e.g., that the most expensive event
in the hospital was the brain surgery by "Dr. Jone" on "01/09/2018 at 12:00",
and that the patient name is "Judy". Since "Case ID", "Activity", and "Resource"
are encrypted by a deterministic encryption method, we can replace all occurrences of
these encrypted values with the corresponding plain values. Consequently, some part
of the encrypted data can be made visible without decryption. This example
clearly demonstrates that even when event logs are totally encrypted, given a
small fraction of contextual knowledge, data leakage is possible.</p>
      <p>
        There are also some other techniques, which can be used to extract knowledge
from an encrypted event log, exploiting background knowledge and some specific
characteristics of the event log. In the following, we describe some of them.
      </p>
      <list list-type="bullet">
        <list-item><p>Exploring Order of Activities: in large processes, most cases follow a unique
path, which can cause data leakage by focusing on the order of activities [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].</p></list-item>
        <list-item><p>Frequency Mining: one can find the most or the least frequent activities and
simply replace the encrypted values with the real values based on
knowledge about the frequency of activities.</p></list-item>
        <list-item><p>Exploring Position of Activities: limited information about the position of
activities in traces can lead to data leakage. E.g., in a hospital, one can easily
know that the first activity is registration.</p></list-item>
      </list>
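      <p>The frequency mining attack can be sketched in a few lines. In the example below, a keyed HMAC serves as a stand-in for a deterministic encryption method (it is not AES and cannot be decrypted); the key, the activity names, and their counts are hypothetical. The attacker never sees the key and only matches frequency ranks against background knowledge.</p>
      <preformat>
```python
import hashlib
import hmac
from collections import Counter


def det_encrypt(key, value):
    """Deterministic stand-in for encryption: equal inputs give equal tokens."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]


# Data owner side (hypothetical key and activity column).
key = b"secret known only to the data owner"
activities = ["Registration"] * 5 + ["Triage"] * 3 + ["Surgery"] * 1
encrypted_column = [det_encrypt(key, a) for a in activities]

# Attacker side: no key, only the ciphertext column plus background knowledge
# about which activity is most frequent, second most frequent, and so on.
known_ranking = ["Registration", "Triage", "Surgery"]
cipher_ranking = [token for token, _ in Counter(encrypted_column).most_common()]
guess = dict(zip(cipher_ranking, known_ranking))
recovered = [guess[token] for token in encrypted_column]
print(recovered == activities)
```
      </preformat>
      <p>The attack works here because the frequencies are distinct; ties between activities would leave the attacker with ambiguity rather than protection.</p>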
      <p>
        These are just some examples to demonstrate that encryption alone is not a
solution. For example, in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] it is shown that mobility traces are easily identifiable
after encryption. Any approach which is based on just encrypting the whole
event log will have the following additional weaknesses:
      </p>
      <list list-type="bullet">
        <list-item><p>Encrypted Results: since results are encrypted, the data analyst is not able to
interpret the results. E.g., as data analysts we want to know which paths are
the most frequent after the "Registration" activity; how can we do this analysis
when we do not know which activity is "Registration"? The only solution is
decrypting the results.</p></list-item>
        <list-item><p>Impossibility of Accuracy Evaluation: how can we make sure that a result of
the encrypted event log is the same as the result of the plain event log? The
only solution is decrypting the result of the encrypted event log.</p></list-item>
      </list>
      <p>
        Generally and as explored by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], using cryptography is a resource-consuming
activity, and decryption is even more resource-consuming than encryption.
These weaknesses demonstrate that it would be better if we could keep some
parts of the data as plain text even in the secure event log. However, the challenge
is to decide what should be kept in plain format and what should not (encrypted or
removed), and how we should address the data leakage that may arise from the
plain data. In the next section, an approach is introduced, where we provide
some answers to these questions.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Approach</title>
      <p>As mentioned, the approach is described based on a sample solution for process
discovery. The aim is to convert an event log to a secure event log such
that (1) only authorized persons can access confidential data, (2) the process model
for the secure event log is the same as the process model for the plain event log, and
(3) current process discovery techniques can be used with the secure event log.</p>
      <p>
        Fig. 1 shows the scheme which has been depicted as a framework to provide
a solution for the above-mentioned purpose. This framework has been inspired
by [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], where abstractions are introduced as intermediate results for relating
models and logs. As can be seen in Fig. 1, three different environments and two
confidentiality solutions are presented.
      </p>
      <list list-type="bullet">
        <list-item><p>Forbidden Environment: In this environment, the actual information system
runs that needs to use the real data. The real event logs (EL) produced
by this environment contain a lot of valuable confidential information, and
except for some authorized persons no one can access this data.</p></list-item>
        <list-item><p>Internal Environment: This environment is accessible only by the authorized
stakeholders. A data analyst can be considered an authorized stakeholder
and can access the internal event logs. Event logs in this environment are
partially secure, selected results produced in this environment (e.g., a process
model) are the same as the results produced in the forbidden environment,
and the data analyst is able to interpret the results without decryption.</p></list-item>
        <list-item><p>External Environment: In this environment, unauthorized external persons
can access the data. Such environments may be used to provide the
computing infrastructure dealing with large data sets (e.g., a cloud solution). Event
logs in this environment are entirely secure, and the results are encrypted.
Whenever the data analyst wants to interpret the results, these results have to
be decrypted and converted to the internal version. Also, results from the
external environment do not need to be exactly the same as the results from
the internal environment.</p></list-item>
      </list>
      <p>For the Internal Confidentiality Solution (ICS), we combine several methods and
introduce the connector method, where several techniques are utilized to create a new
level of security. Fig. 2 gives an overview of the anonymization steps.</p>
      <sec id="sec-5-1">
        <title>Filtering and Modifying The Input</title>
        <p>The first step to effective anonymization is preparing the data input. To filter the input, simple limits for frequencies
can be set, and during loading of an event log all traces that do not reach the
minimal frequencies are not transferred to EL′. Attributes which are irrelevant
for analysis should be removed regardless of their sensitivity.</p>
        <p>
          Choosing The Plain Data. As mentioned, we need to produce interpretable
results. Hence, some parts of the event log remain as plain text in the internal version
of the secure event log (EL′). To make a process model based on EL′, we should
take a look at what information and/or structure is strictly necessary for
discovering a process model. Here there are different choices; however, we consider
the DFG, used by many discovery approaches, as an abstraction which relates
logs and models [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Therefore, the abstractions (i.e., AEL, AEL′, and AEL″) are
DFGs.
        </p>
        <p>If we have a DFG, then the process model can be made based on it. Therefore,
the next step is taking a look at what information and/or structure is necessary
to make a DFG. Since a DFG is a graph which shows the directly follows relation
between activities, we need activities as information to be plain, and we also need
a structure which can be used for extracting directly follows relations. Such a
structure should be embedded into EL′.</p>
        <p>Encryption. Here there are two important choices. The first choice is which
columns of the event log should be encrypted. Second, we need to decide which
algorithms should be used. As can be seen in Fig. 3, for the internal environment,
we use Paillier as a good choice for numeric attributes (i.e., "Cost") and AES for
other attributes (i.e., "Activity").</p>
        <p>Making Times Relative. Times need to be modified because keeping the
exact epoch time of an event can allow one to identify it. The naive approach of
setting the starting time of every trace to 0 would make it impossible to replay
events and reconstruct the original log. Thus, we select another time that all
events are made relative to. This time can be kept secure along with the keys for
decryption. Fig. 3 shows a sample log after encrypting and making times relative
to "30.12.2010:00.00".</p>
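        <p>As a small illustration of this step, the sketch below converts timestamps into offsets from a secret reference time and back. The reference "30.12.2010:00.00" is taken from the text; the timestamp format string, the choice of minutes as the unit, and the function names are assumptions.</p>
        <preformat>
```python
from datetime import datetime, timedelta

FMT = "%d.%m.%Y:%H.%M"  # assumed layout: day.month.year:hour.minute
# The reference time is kept secret together with the decryption keys.
REFERENCE = datetime.strptime("30.12.2010:00.00", FMT)


def to_relative(timestamp, ref=REFERENCE):
    """Return the offset of a timestamp from the reference, in whole minutes."""
    delta = datetime.strptime(timestamp, FMT) - ref
    return int(delta.total_seconds() // 60)


def to_absolute(minutes, ref=REFERENCE):
    """Invert to_relative; only possible when the reference time is known."""
    return (ref + timedelta(minutes=minutes)).strftime(FMT)


offset = to_relative("01.01.2011:10.30")  # hypothetical event time
restored = to_absolute(offset)
print(offset, restored)
```
        </preformat>
        <p>Because every event is shifted by the same amount, the differences between timestamps, and hence all durations, are unchanged.</p>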
        <p>The Connector Method. Using the connector method we embed the
structure, which can be used for extracting directly follows relations, into EL′. Also,
the connector method helps us to reconstruct the full original event logs when
keys and relative values are given. In the first step, the previous activity ("Prev.
Activity") column is added in order to identify which arcs can be directly
added to the directly follows graph later.</p>
        <p>In the second step, we find a way to securely save the information contained
in the "Case ID", without allowing it to link the events. This can be done
by giving each row a random ID ("ID") and a previous ID ("PrevID"). These
uniquely identify the following event in a trace because the IDs are not generic
like activity names. The ID for start activities is always a number of zeros. Fig. 4
shows the log after adding "Prev. Activity" and "PrevID".</p>
        <p>Fig. 3. (a) The sample event log. (b) Encrypting resources and costs and
making times relative.</p>
        <p>In the third step, since these columns contain the same
information previously found in the "Case ID", they have to be hidden and
secured. This can be done by concatenating the "ID" and "PrevID" of each row
and encrypting the result using AES. Due to the nature of AES, neither the order nor
the size of the IDs can be inferred. The concatenation can be done in any style;
in this example, we simply concatenate the "PrevID" behind the "ID".
To retrieve the "ID" and "PrevID", one simply needs to decrypt the "Connector"
column and cut the resulting number into two equal parts. This method requires
that whenever the two IDs differ in their number of digits, leading zeros are added to the
shorter one to guarantee equal length. Fig. 5 shows the log after concatenating the ID columns
and encrypting them as a connector.</p>
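        <p>The concatenation and splitting of the two IDs can be sketched as follows. The function names and the sample IDs are hypothetical, and the AES encryption of the resulting string is deliberately left out, since it would require a cryptographic library; the sketch only shows why zero padding makes the split unambiguous.</p>
        <preformat>
```python
def make_connector(event_id, prev_id):
    """Concatenate ID and PrevID after padding both to the same number of digits."""
    width = max(len(str(event_id)), len(str(prev_id)))
    return str(event_id).zfill(width) + str(prev_id).zfill(width)


def split_connector(connector):
    """Cut the (decrypted) connector into two equal parts: ID and PrevID."""
    half = len(connector) // 2
    return int(connector[:half]), int(connector[half:])


connector = make_connector(4821, 37)   # hypothetical random IDs
start_connector = make_connector(4821, 0)  # a start event has a PrevID of zeros
print(connector, split_connector(connector))
print(start_connector, split_connector(start_connector))
```
        </preformat>
        <p>In the approach described here, the connector string would be encrypted before it is stored, so that the IDs cannot be read or compared without the key.</p>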
        <p>In the final step, we use the "Case ID" to anonymize the "Timestamp". The
"Timestamp" attribute of events which have the same "Case ID" is made relative
to the preceding one. The exception is the first event of each trace, which remains
unchanged. This allows the complete calculation of all durations of the arcs in a
directly follows graph but makes it complicated to identify events based on the
epoch times they occurred at. After creating the relative times, we are free to
delete the "Case ID" and randomize the order of all rows, ending up with the
unconnected log shown in Fig. 6.</p>
        <p>Fig. 5. (a) Concatenating ID and previous ID. (b) Encrypting the connector.</p>
        <p>Fig. 6 shows the internally secure event log (EL′), which can be used by a data
analyst to make the DFG (AEL′) and the process model PM′. If process discovery
had been done on the plain event log (EL), AEL would be identical to
AEL′ (i.e., both of them are the same DFG) and PM would be identical to
PM′.</p>
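        <p>The sketch below indicates why a DFG can still be derived once the case identifier is gone: each row already carries its previous activity and the time elapsed since the preceding event. The row layout, the "START" marker for first events, and the sample values are assumptions for illustration and do not reproduce the plugin described later.</p>
        <preformat>
```python
import random
from collections import defaultdict


def dfg_from_secure_log(rows):
    """rows: (prev_activity, activity, minutes_since_previous_event) triples."""
    stats = defaultdict(lambda: [0, 0.0])  # arc maps to [count, summed minutes]
    for prev_activity, activity, minutes in rows:
        if prev_activity == "START":
            continue  # first events carry a shifted timestamp, not a duration
        stats[(prev_activity, activity)][0] += 1
        stats[(prev_activity, activity)][1] += minutes
    return {arc: (count, total / count) for arc, (count, total) in stats.items()}


# Hypothetical unconnected log: no case identifier, rows in arbitrary order.
rows = [
    ("START", "Registration", 0),
    ("Registration", "Triage", 10),
    ("Triage", "Admission", 50),
    ("START", "Registration", 0),
    ("Registration", "Triage", 20),
    ("START", "Registration", 0),
    ("Registration", "Admission", 30),
]
dfg = dfg_from_secure_log(rows)

shuffled = rows[:]
random.shuffle(shuffled)  # the row order carries no information
same_after_shuffle = dfg_from_secure_log(shuffled) == dfg
print(dfg, same_after_shuffle)
```
        </preformat>
        <p>Frequencies and average durations are obtained per arc, while the rows of one case can no longer be linked without decrypting the connector.</p>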
        <p>Comparing Fig. 6 and the original log (Fig. 3a), one can see that there is
no answer for the following questions in EL′ anymore: (1) What is the name of
a resource? (2) Who was responsible for doing an activity at exact time t? (3)
What is the sequence of activities which has been done for case c? (4) How long
did it take to process case c? (5) What is the cost of activity a which has been
done by resource r for case c?</p>
        <p>However, it is still possible to answer the following question: Who is
responsible for activity a? In fact, EL′ is a partially secure version of the event log
that contains the minimum level of information which the data analyst needs
to reach the result. Although ICS does not preserve the standard format of the
event log which is used by the current process discovery techniques, it provides
an intermediate input (i.e., a DFG), which can be used by the current tools.</p>
        <p>In the External Confidentiality Solution (ECS), we need to avoid any form of data
leakage, i.e., the results do not need to be interpreted by the external party.
In the external environment, the plain part of the event log may cause data
leakage. E.g., based on background knowledge, one can with little effort infer
who is responsible for "Registration". Therefore, in ECS, we convert EL′ to
the externally secure event log (EL″) in such a way that it prevents an adversary
from extracting valuable information even by inference. In the following, our
two-step ECS is explained.</p>
        <p>Encrypting The Plain Part. In this step, activities are encrypted by a
deterministic encryption method like AES. A deterministic encryption method has
to be used because, for discovering a DFG or a process model, differences between
activities must be preserved. Fig. 7 shows the result after encrypting activities.</p>
        <p>However, after encryption, detecting the "START" activities seems to be
impossible, and without detecting them, extracting the relations is not possible. To
identify the "START" activities, we can go through the "Activity" and "Prev.
Activity" columns: the activities which appear in the "Prev. Activity"
column but do not appear in the "Activity" column are the "START" activities.</p>
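        <p>This rule amounts to a set difference between the two encrypted columns. A minimal sketch, assuming rows of (encrypted previous activity, encrypted activity) pairs with hypothetical opaque tokens:</p>
        <preformat>
```python
def find_start_tokens(rows):
    """Return the tokens that occur as previous activity but never as activity."""
    prev_values = {prev for prev, _ in rows}
    activity_values = {activity for _, activity in rows}
    return prev_values - activity_values


# Hypothetical ciphertext tokens: "c9" stands for the encrypted START marker.
encrypted_rows = [
    ("c9", "a1"),
    ("a1", "b7"),
    ("c9", "a1"),
    ("b7", "a1"),
]
start_tokens = find_start_tokens(encrypted_rows)
print(start_tokens)
```
        </preformat>
        <p>No decryption is needed for this check, because deterministic encryption maps equal plaintexts to equal tokens in both columns.</p>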
      </sec>
      <sec id="sec-5-2">
        <title>Fortifying Encryption and/or Projecting Event Logs</title>
        <p>In our sample, resources are encrypted by a deterministic encryption method (AES-ECB), and
costs are encrypted by homomorphic encryption, which preserves differences.
Consequently, by comparison, one can find the minimum and maximum cost,
which can be used as knowledge for extracting confidential information (e.g.,
the name of a resource). In order to decrease the effect of such analyses, fortifying
encryption and/or projecting event logs could be done. E.g., resources can be
encrypted by a probabilistic encryption (e.g., AES-CTR), and costs can be
removed. In fact, all attributes not needed for process discovery can be removed.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Evaluation</title>
      <p>We consider three evaluation criteria for the proposed approach, while
performance is also taken into account.</p>
      <list list-type="bullet">
        <list-item><p>Ensuring Confidentiality: as explained in Section 5, we can increase the
level of confidentiality by defining different environments and indicating the level
of information which is accessible in each environment. In addition, using
multiple encryption methods and our connector method for disassociating
events from their cases improves confidentiality.</p></list-item>
        <list-item><p>Providing Reversibility: when the keys and the value used for making times
relative are given, both ICS and ECS are reversible, which means that
transparency is addressed by the proposed approach.</p></list-item>
        <list-item><p>Proving Accuracy: to prove the accuracy of our approach, we show by a case study
that the DFG of the original event log (AEL) and the DFGs of the secure event
logs (i.e., AEL′ and AEL″) are the same, and consequently the corresponding
process models are similar.</p></list-item>
      </list>
      <sec id="sec-6-1">
        <title>Proving Accuracy</title>
        <p>As can be seen in Fig. 1, to prove accuracy, we need to show that the abstraction
of the original event log is the same as the abstraction of the internal event
log (AEL = AEL′) (rule (1)), and also that the abstraction of the internal event
log is the same as the abstraction of the external event log, which is encrypted
(AEL′ = ECS⁻¹(AEL″)) (rule (2)). For this purpose, we have implemented four
plugins for ProM: "ICS", "ECS", "DFG from secure logs", and "DFG
from regular logs". "ICS" is used for converting an event log in regular XES
format to the internal version of the secure event log, "ECS" is used for converting
the internal version of the secure event log to the external version,
"DFG from secure logs" makes a DFG based on the secure
version of an event log, and "DFG from regular logs" makes a
DFG from a regular XES log. These plugins have been used along with a case
study of real-life logs to prove the accuracy. In summary:</p>
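        <disp-formula id="eq-1"><label>(1)</label><tex-math>AEL = AEL' \Rightarrow PM = PM'</tex-math></disp-formula>
        <disp-formula id="eq-2"><label>(2)</label><tex-math>AEL' = ECS^{-1}(AEL'') \Rightarrow PM' \sim ECS^{-1}(PM'')</tex-math></disp-formula>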
      </sec>
      <sec id="sec-6-2">
        <title>Case Study: Real Life Log of Sepsis Patients</title>
        <p>
          The real-life event log for a group of sepsis patients in a hospital [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], containing
1050 cases, 15214 events, and 16 event classes, is used to prove the accuracy.
        </p>
        <p>In the rst step, EL0, and EL00 have been created by \ICS", and \ECS"
plugins respectively. Then, to verify that AEL is identical to AEL0, \DFG from
Regular Logs" and \DFG from Secure Logs" have been used to produce
corresponding DFGs. The resulting DFGs were exactly the same. Because of the
space limitations, we are not able to show them. Finally, to prove that AEL0
is the same as AEL00, where activities are encrypted, we have used \DFG from
Secure Logs" plugin. To be able to take a closer look at the AEL0 and AEL00,
in Fig. 8, we have zoomed in both of them and highlighted a speci c path from
(a) A part of the plain DF G
(b) A part of the encrypted DF G
\ER Registration" to \ER Sepsis Triage". As can be seen in Fig. 8, both AEL0
and AEL00 show the same relation between these two activities. The frequency
of this relation is 11 and the average time is 0.06 (f=11, t=0.06). In addition,
this gure shows that \ER Registration" has no real input link, and \ER Sepsis
Triage" has ten input links and eight output links.
This paper presented a novel approach to ensure con dentiality in process
mining. We demonstrated that con dentiality in process mining can not be achieved
by only encrypting the whole event log. We discussed the few related works,
most of which use just encryption, and explained their weaknesses. Moreover,
(a) Exec. time for choice loop events</p>
        <p>(b) Exec. time for sequence loop events
we elaborated on the open challenges in this research area. The new approach
is introduced based on the fact that there always exist a trade-o between
condentiality and data utility. Therefore, we reasoned backwards from the desired
results and how they can be obtained with as little data as possible.</p>
        <p>Here, the desired result was a process model, and the solution was presented by
introducing a framework for confidentiality that can be extended to include
other forms of process mining, e.g., conformance checking, performance analysis,
and social network analysis (i.e., different ICS and ECS could be explored
for different process mining activities). A new method named "Connector" has
been introduced, which can be employed in any situation in which we need to
store some associations securely. For evaluating the proposed approach, four
plugins have been implemented and a real-life log was used as a case study. The
approach is tailored towards the discovery of the directly follows graph. Also,
the framework could be utilized in a cross-organizational context such that each
environment could cover the specific constraints and authorizations of a party.</p>
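        <p>As an illustration of the equivalence check used in the evaluation, the following minimal Python sketch builds a directly follows graph (frequency and mean time per relation) from a toy log and from a copy whose activity names are replaced by keyed pseudonyms, and then compares the two graphs. It is not the implementation of the plugins: the function names, the toy log, the key, and the use of HMAC-SHA256 as a deterministic stand-in for activity encryption are all assumptions made for this example.</p>
        <preformat>
```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(activity: str, key: bytes) -> str:
    """Deterministic keyed pseudonym (HMAC-SHA256).

    Assumption: a stand-in for activity encryption, not the paper's cipher.
    """
    return hmac.new(key, activity.encode("utf-8"), hashlib.sha256).hexdigest()


def build_dfg(traces):
    """Return {(a, b): (frequency, mean_gap)} for all directly-follows pairs.

    Each trace is a list of (activity, timestamp) tuples ordered by time.
    """
    gaps = defaultdict(list)
    for trace in traces:
        for (a, t_a), (b, t_b) in zip(trace, trace[1:]):
            gaps[(a, b)].append(t_b - t_a)
    return {pair: (len(g), sum(g) / len(g)) for pair, g in gaps.items()}


def encrypt_log(traces, key):
    """Replace every activity name by its pseudonym; timestamps stay as they are."""
    return [[(pseudonymize(a, key), t) for a, t in trace] for trace in traces]


def same_dfg(plain_dfg, secure_dfg, key):
    """Check that the secure DFG equals the plain DFG after renaming its nodes."""
    mapped = {
        (pseudonymize(a, key), pseudonymize(b, key)): stats
        for (a, b), stats in plain_dfg.items()
    }
    return mapped == secure_dfg


KEY = b"demo-key"  # hypothetical key, for illustration only
toy_log = [  # made-up traces with made-up integer timestamps
    [("ER Registration", 0), ("ER Triage", 4), ("ER Sepsis Triage", 10)],
    [("ER Registration", 0), ("ER Triage", 6)],
]

plain = build_dfg(toy_log)
secure = build_dfg(encrypt_log(toy_log, KEY))

print(plain[("ER Registration", "ER Triage")])  # (2, 5.0)
print(same_dfg(plain, secure, KEY))             # True
```
        </preformat>
        <p>Because the pseudonym is deterministic, equal activities map to equal tokens, so the structure, frequencies, and mean times of the graph are preserved while the activity labels are hidden.</p>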
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.:
          <article-title>Process mining: data science in action</article-title>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.:
          <article-title>Responsible data science: using event data in a "people friendly" manner</article-title>
          .
          <source>In: International Conference on Enterprise Information Systems</source>
          . pp.
          <fpage>3</fpage>–<lpage>28</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.:
          <article-title>Benchmarking logs to test scalability of process discovery algorithms</article-title>
          . Eindhoven University of Technology. https://data.4tu.nl/repository/uuid:1cc41f8a-3557-499a-8b34-880c1251bd6e
          (
          <year>2017</year>
          ), [Online; accessed 17-September-2018]
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.:
          <article-title>Process discovery from event data: Relating models and logs through abstractions</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <elocation-id>e1244</elocation-id>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.,
          <string-name>
            <surname>Adriansyah</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Medeiros</surname>
            ,
            <given-names>A.K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arcieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baier</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blickle</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Den Brand</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandtjen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buijs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Process mining manifesto</article-title>
          .
          <source>In: International Conference on Business Process Management</source>
          . pp.
          <fpage>169</fpage>–<lpage>194</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.,
          <string-name>
            <surname>Bichler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinzl</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Responsible data science</article-title>
          .
          <source>Business &amp; Information Systems Engineering</source>
          <volume>59</volume>
          (
          <issue>5</issue>
          ),
          <fpage>311</fpage>–<lpage>313</lpage> (Oct
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bellare</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rogaway</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Introduction to modern cryptography</article-title>
          .
          <source>UCSD CSE</source>
          <volume>207</volume>
          ,
          <issue>207</issue>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Burattin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conti</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Turato</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Toward an anonymous process mining</article-title>
          .
          <source>In: Future Internet of Things and Cloud (FiCloud)</source>
          ,
          <year>2015</year>
          3rd International Conference on. pp.
          <fpage>58</fpage>–<lpage>63</lpage>
          . IEEE
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kapoor</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poncelet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trousset</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teisseire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Privacy preserving sequential pattern mining in distributed databases</article-title>
          .
          <source>In: Proceedings of the 15th ACM international conference on Information and knowledge management</source>
          . pp.
          <fpage>758</fpage>–<lpage>767</lpage>
          . ACM
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menezes</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Oorschot</surname>
            ,
            <given-names>P.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanstone</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>Handbook of applied cryptography</article-title>
          . CRC press (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W., van den Brand, M.G.:
          <article-title>Hierarchical performance analysis for process mining</article-title>
          .
          <source>In: Proceedings of the 2018 International Conference on Software and System Process</source>
          . pp.
          <fpage>96</fpage>–<lpage>105</lpage>
          . ACM
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Leemans</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fahland</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.:
          <article-title>Scalable process discovery and conformance checking</article-title>
          .
          <source>Software &amp; Systems Modeling</source>
          <volume>17</volume>
          (
          <issue>2</issue>
          ),
          <fpage>599</fpage>–<lpage>631</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duan</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qingtian</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
          </string-name>
          , F., Cheng, J.:
          <article-title>Towards comprehensive support for privacy preservation cross-organization business process mining</article-title>
          .
          <source>IEEE Transactions on Services Computing</source>
          (<issue>1</issue>), <fpage>1</fpage>–<lpage>1</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ma</surname>
          </string-name>
          , C.Y.,
          <string-name>
            <surname>Yau</surname>
            ,
            <given-names>D.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yip</surname>
            ,
            <given-names>N.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          :
          <article-title>Privacy vulnerability of published anonymous mobility traces</article-title>
          .
          <source>IEEE/ACM Transactions on Networking (TON)</source>
          <volume>21</volume>(<issue>3</issue>),
          <fpage>720</fpage>–<lpage>733</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Mannhardt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Sepsis Cases - Event Log</article-title>
          . Eindhoven University of Technology. https://doi.org/10.4121/uuid:915d2bfb-7e84-49ad-a286-dc35f063a460
          (
          <year>2016</year>
          ), [Online; accessed 17-September-2018]
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mannhardt</surname>
          </string-name>
          , F.,
          <string-name>
            <surname>de Leoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reijers</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.,
          <string-name>
            <surname>Toussaint</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          :
          <article-title>Guided process discovery – a pattern-based approach</article-title>
          .
          <source>Information Systems</source>
          <volume>76</volume>
          , <fpage>1</fpage>–<lpage>18</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Rozinat</surname>
          </string-name>
          , Gunther, C.W.:
          <article-title>Privacy, security and ethics in process mining</article-title>
          . http://coda.fluxicon.com/assets/downloads/Articles/PMNews/Privacy-Security-and-Ethics-In-Process-Mining.pdf (
          <year>2016</year>
          ), [Online; accessed 17-September-2018]
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Sani</surname>
          </string-name>
          , M.F.,
          <string-name>
            <surname>van Zelst</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van der Aalst</surname>
          </string-name>
          , W.:
          <article-title>Repairing outlier behaviour in event logs</article-title>
          .
          <source>In: International Conference on Business Information Systems</source>
          . pp.
          <fpage>115</fpage>–<lpage>131</lpage>
          . Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tillem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erkin</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lagendijk</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          :
          <article-title>Privacy-preserving alpha algorithm for software analysis</article-title>
          .
          <source>In: 37th WIC Symposium on Information Theory in the Benelux/6th WIC/IEEE SP Symposium on Information Theory and Signal Processing in the Benelux</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zhan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matwin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Privacy-preserving collaborative sequential pattern mining</article-title>
          .
          <source>Tech. rep.</source>
          , Ottawa Univ (Ontario) School of Information Technology (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>