=Paper= {{Paper |id=Vol-2673/paperDR02 |storemode=property |title=Shareprom: A Tool for Privacy-Preserving Inter-Organizational Process Mining |pdfUrl=https://ceur-ws.org/Vol-2673/paperDR02.pdf |volume=Vol-2673 |authors=Gamal Elkoumy,Stephan A. Fahrenkrog-Petersen,Marlon Dumas,Peeter Laud,Alisa Pankova,Matthias Weidlich |dblpUrl=https://dblp.org/rec/conf/bpm/ElkoumyFDLPW20 }} ==Shareprom: A Tool for Privacy-Preserving Inter-Organizational Process Mining== https://ceur-ws.org/Vol-2673/paperDR02.pdf
      Shareprom: A Tool for Privacy-Preserving
        Inter-Organizational Process Mining

     Gamal Elkoumy1 , Stephan A. Fahrenkrog-Petersen2 , Marlon Dumas1 ,
           Peeter Laud3 , Alisa Pankova3 , and Matthias Weidlich2
                         1
                         University of Tartu, Tartu, Estonia
                      {gamal.elkoumy,marlon.dumas}@ut.ee
                 2
                   Humboldt-Universität zu Berlin, Berlin, Germany
                       {fahrenks,weidlima}@hu-berlin.de
                          3
                            Cybernetica, Tartu, Estonia
                     {peeter.laud,alisa.pankova}@cyber.ee



       Abstract. Process mining is a set of techniques to analyze business
       processes based on event logs extracted from information systems. Existing
       process mining techniques are designed for intra-organizational settings,
       as they assume that the entire event log of a process is available for
       analysis at once. In an intra-organizational process, each party only has
       access to its own private event log, which gives only a partial picture of the
       whole process. Moreover, the involved parties may be unwilling or unable
       to share their private logs due to confidentiality or privacy imperatives.
       In this context, this paper presents a tool, namely Shareprom, that
       enables independent parties to execute process mining operations without
       revealing any data other than the output of the analysis. Specifically,
       Shareprom uses secure multi-party computation techniques to compute
       the frequency or time-annotated Directly-Follows Graph (DFG) of event
       logs held by multiple parties, without these parties having to share any
       information other than the resulting DFG. The tool applies a differentially
       private release mechanism before revealing the output DFG in order to
       provide an additional layer of privacy protection.


1    Introduction

Process mining techniques enable organizations to analyze business processes
from event logs extracted from information systems. These techniques allow us,
for example, to identify performance bottlenecks and to recommend changes to
a process in order to increase its efficiency. Existing process mining techniques
assume that the event log to be analyzed can be accessed in its entirety. This
assumption is reasonable in an intra-organizational setting, where all the data
about a given process is located within the same trust domain. However, in
an inter-organizational process, the event log is distributed across multiple
independent organizations. For example, in a supply chain process, a manufacturer
interacts with multiple suppliers. Each of them holds its own private event log,
which only provides a partial view of the whole process.




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
2       G. Elkoumy et al.

    Due to confidentiality concerns and privacy regulations, such as GDPR4
and HIPAA5 , the involved organizations are unwilling to, or sometimes legally
prevented from, sharing their event logs. They are prevented from sharing the
logs either with each other or with a third-party, such as an independent analysis
firm, to jointly perform the analysis.
    In this paper, we present Shareprom, a secure multi-party computation
system for inter-organizational process mining. Shareprom allows independent
cooperating organizations to control what is disclosed from their event logs. We
assume that the parties have agreed to reveal the time-annotated process map.
In other words, the analysis firm will not learn anything about the independent
organizations other than this output. Shareprom applies differential privacy to
the resulting DFG so as to provide an extra layer of privacy, as presented in
Section 2. The analysis firm will neither have access to specific traces, nor to any
information associated to these traces. And naturally, the data providers will not
be able to learn anything about each other’s private data.


2     Overview of the Tool
Shareprom uses secure multi-party computation (MPC) to process event logs and
to build the DFG matrix. Before revealing the DFG matrix, Shareprom applies
Laplacian differential privacy by the injection of noise into the matrix. Shareprom
relies on a platform for secret-sharing based MPC (specifically Sharemind [2]).
    Shareprom uses the three-party MPC protocol set of Sharemind, which is
secure against honest-but-curious adversaries. This means that as long as the
parties are following the protocols honestly and do not collude, none of them will
learn more than the size of the data. Parties in a typical secure MPC deployment
can be: input parties, computation parties, or output parties [4]. The sets of
input and computing parties may intersect. In this paper, we assume a simplified
scenario, where the three computing parties themselves provide the inputs and
receive the outputs. Two of the parties contribute the input data, and the third
party is the analysis firm that receives the final output.
    We assume that input parties share with each other the number of activities
and the maximum trace length in their event logs. Also, we assume that the
case identifiers are the same across the input parties. This is needed to conduct
preprocessing that reduces the amount of information that might be learned from
the size of data. Even with encrypted data, contextual knowledge might lead to
leakage of some information [5]. An adversarial party might learn the shortest or
the longest trace, and using the domain knowledge, they could reveal the actual
activities. For such a case, we apply padding to the logs on the client side of
each input party, so the resulting log has all the traces with the same length,
which is set to the maximum trace length. Parties can hide the maximum trace
length by adding extra padding. The logs are uploaded to computation servers
in a secret-shared manner, and the MPC protocol performs computation without
4
    https://eur-lex.europa.eu/eli/reg/2016/679/oj
5
    https://www.hhs.gov/hipaa/
                     Privacy-Preserving Inter-Organizational Process Mining       3

any intermediate declassifications. As such, during the computation, the parties
do not learn anything in addition to the sizes of padded logs. This also excludes
any attacks related to access patterns, like frequent pattern mining, which would
be possible, if the events that are equal to each other, are leaked.
   Figure 1 gives an overview of the system components. Below we summarize
the functionality of each component of Shareprom. More information about the
system components can be found in [3].

Preprocessing. Each party of the cooperating organization uses Shareprom clients
to import the XES file of their event log. Shareprom then performs preprocessing
of the event log at its organization site. The parties exchange the maximum trace
length and the number of unique activities of their entire log. All the traces need
to be padded to have the same length as the longest trace. The activities are
mapped to a one-hot encoding format, independently on each input party. Also,
a sort step of the logs by traces is performed. The two latter steps are needed as
a preparation for the subsequent secure MPC protocol, which requires the data
to be available in a specific format.

Mapping Event Log to Secret-shares. Each party uses their Shareprom client to
map their event logs into secret-shares. Each client pushes its secret-shares to the
Sharemind servers. Secret-shares do not reveal any information about the event
log. Until this point, the analysis firm, as one of the computation parties, has
only received a single share of each input, which alone looks like a random value.

DFG Matrix Calculation. Shareprom runs the Sharemind MPC protocol to
construct the DFG matrix from uploaded secret-shared data. An algorithm based
on a one-hot encoding technique allows us to ensure that the computation does
not reveal any information to the parties, including the access pattern (e.g. which
log entries follow each other). The details of this algorithm can be found in [3].
The resulting DFG takes the form of a matrix with secret-shared entries, where
each entry corresponds to a transition between two activities, containing the
frequency count and the total execution time for that transition.

Differential Privacy. The DFG itself may reveal some information about the
parties’ inputs. We consider the frequency and time difference between each pair
of activities in the DFG as a separate query. Hence, before disclosing the final
result to the analyst party, we enhance it with differential privacy by injecting
Laplacian noise to the frequency and time difference of each pair of activities.
The added noise conceals the order of activities and their execution times. We
consider understanding the trade-offs between the amount of added noise and
the utility of the output as a direction for future work.


3   Tool Packaging and Maturity
Shareprom is developed in Python 3.7 and SecreC – a dialect of the C language
supported by the Sharemind plartform [1]. For ease of use, Shareprom comes with
4            G. Elkoumy et al.


     Party
      A                                      Mapping Event
                            Preprocessing    Log to Secret
                                                Shares
                  Import
                   XES


                                                             Combine
                                                                                   Parallel Sort

                                             Mapping Event
                            Preprocessing    Log to Secret                         DFG Matrix
                                                Shares                             Calculation
                  Import
                   XES
    Party
     B                                                                             Differential
                                                              Secure Processing     Privacy        Analyzing the
              Cooperating Parties
                                                             Using Secret Shares                      model



                                            Fig. 1: Overview of Shareprom


a desktop client application. The desktop client allows a user to import an event
log (in XES format) as shown in Figure 2a. The desktop client connects with its
respective Shareprom server. In a typical configuration, three Sharemind servers
are inter-connected, e.g. two input providers and a computation node – the latter
representing for example an analysis firm, as shown in Figure 2b. The analysis
firm can view the output (with differential privacy noise) as shown in Figure 2c.
An analyst at this firm can then analyze the output and share the findings (or
the whole output) with the input providers depending on the intended use case.
    Shareprom has been tested with event logs of different characteristics, as
reported in [3]. The tool can handle event logs with several thousands of events
with up to a couple of dozens distinct activity labels. Further optimizations
targeted at specific use cases should allow it to scale to large logs in future.


4       Screencast and Source Code

A screencast is available at https://youtu.be/uz2mrYz-y-w . This screencast
illustrates a scenario where two parties (a manufacturer and a supplier) delegate
an analysis task to a data analysis firm. The manufacturer produces goods based
on purchase orders. Parts of the production pipeline need intermediate parts
from the supplier. The manufacturer orders the intermediate parts from the
supplier. The supplier produces the intermediate parts and transports them to
the manufacturer. Each party has its own separate event log. The analysis firm
analyzes the joint DFG constructed from the event logs.
    The source code, installation steps, dependencies, and example event logs can
be found at https://github.com/Elkoumy/shareprom/releases/tag/v0.2.


5       Conclusion

Shareprom leverages the secure multi-party computation capabilities of Sharemind
in order to enable multiple parties to perform inter-organizational process mining
without revealing any data other than the analysis output. A differential privacy
                     Privacy-Preserving Inter-Organizational Process Mining         5




             (a) Shareprom interface for
             Organizations




             (b) Shareprom interface for      (c) The Joint DFG After
             Analysis Firm                    Differential Privacy

                           Fig. 2: Shareprom Interface.

mechanism allows this output to be further protected against possible privacy
leakages. At present, Shareprom focuses on the computation of DFGs, which is
a basic graphical process representation used in process mining tools. Future
work aims at expanding the capabilities of Shareprom to other process mining
operations, while at the same time tackling associated scalability challenges.
Acknowledgments. This research is partly funded by ERDF via the Estonian
Centre of Excellence in ICT (EXCITE).

References
1. Bogdanov, D., Laud, P., Randmets, J.: Domain-polymorphic programming of privacy-
   preserving applications. In: Proceedings of the Ninth Workshop on Programming
   Languages and Analysis for Security. pp. 53–65 (2014)
2. Bogdanov, D., Laur, S., Willemson, J.: Sharemind: A framework for fast privacy-
   preserving computations. In: European Symposium on Research in Computer Security.
   pp. 192–206. Springer (2008)
3. Elkoumy, G., Fahrenkrog-Petersen, S.A., Dumas, M., Laud, P., Pankova, A., Weidlich,
   M.: Secure multi-party computation for inter-organizational process mining. In:
   Proceedings of the BPMDS and EMMSAD Working Conferences. Springer (2020)
4. Kamm, L.: Privacy-preserving statistical analysis using secure multi-party computa-
   tion. Ph.D. thesis, University of Tartu (2015)
5. Rafiei, M., von Waldthausen, L., van der Aalst, W.M.: Supporting confidentiality in
   process mining using abstraction and encryption. In: Data-Driven Process Discovery
   and Analysis, pp. 101–123. Springer (2018)