<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>“BPELanon”: Anonymizing BPEL Processes</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marigianna Skouradaki</string-name>
          <email>skouradaki@iaas.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dieter Roller</string-name>
          <email>dieter.h.roller@iaas.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cesare Pautasso</string-name>
          <email>c.pautasso@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Leymann</string-name>
          <email>leymann@iaas.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Informatics, University of Lugano</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Architecture of Application Systems, University of Stuttgart</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We are currently developing a performance benchmark for Workflow Management Systems. As a first activity we are collecting real-world processes. However, to protect their competitive advantage, some companies are not willing to share their corporate assets. The objective of this work is to propose a method (“BPELanon”) for anonymizing BPEL processes in order to deal with this problem. The method transforms a process so that its original structure and runtime behavior are preserved, while its business semantics are completely anonymized. Anonymization is a complicated task that must meet the requirements we outline in this paper. Namely, we need to preserve the structural and executional information while anonymizing information such as namespaces, names (activity names, variable names, partner link names, etc.), and XPath expressions that may reveal proprietary information. Furthermore, the names contained in the anonymized process should be chosen carefully in order to avoid conflicts, preserve privacy, and keep the files readable. The many dependency relations among process artifacts make it challenging to fulfill these requirements, as a single change in one file potentially leads to a cascade of changes in other related process artifacts.</p>
      </abstract>
      <kwd-group>
        <kwd>Anonymization</kwd>
        <kwd>BPEL</kwd>
        <kwd>Workflows</kwd>
        <kwd>Business Processes</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Given that “process equals product” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], most companies and business organizations are unwilling to share their process models with academic researchers, because they need to protect their intellectual property and their competitive position. Since the first goal of the “BenchFlow” project is to collect real-world business process models that can later be used to synthesize a benchmark, we want to encourage the sharing of models that suit our purposes without revealing critical company information. The contributions of this work are as follows:
        <list list-type="order">
          <list-item><p>We identify the requirements of an anonymization methodology.</p></list-item>
          <list-item><p>We propose a method (“BPELanon”) that exports an anonymized process model containing the original BPEL process without its business semantics, retaining solely its executable structure.</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-2">
      <title>Approaching the Problem</title>
      <sec id="sec-2-1">
        <title>Requirements</title>
        <p>
          The design of “BPELanon” must address the following initial list of requirements,
identified during our work in various research projects and especially during
our collaboration with industry partners. The main requirement and purpose of
the methodology is to:
R1: Support both pseudonymization and anonymization of data, at the user’s
choice. Pseudonymization is the technique of masking the data while
maintaining a way to trace back to the original data [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In contrast, anonymization changes
the critical data and makes it impossible to trace back to the original version of
the data [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Providing the option of pseudonymization makes it possible for the
originator to trace bugs or inconsistencies found in the anonymized file and to
apply changes to the original process.
        </p>
        <p>Satisfying [R1] gives rise to a number of further requirements. The first
group stems from the XML nature of BPEL:
R2: Scramble the company’s sensitive information that can be revealed in activity
names, variable names, partner link names, partner link type names, port type
names, message names, operation names, role names, XSD element names,
namespaces, and XPath expressions. The names chosen for these attributes are
usually descriptive and reflect the actual actions to which they correspond,
so they can reveal a lot of the process semantics.</p>
        <p>R3: The exported process model must not contain namespace information
that links back to external web sites revealing business information
(backlinks).
R4: The exported process model must not contain names (including activity
names, variable names, partner link names, partner link type names, message
names, operation names, role names, and XSD element names) with backlinks
to business information.
R5: The exported process model must not contain XPath expressions with
backlinks to business information. If no custom XPath functions are used, [R5] is
a consequence of [R4].</p>
        <p>R6: Remove description containers (comments and documentation) that reveal
critical information and semantics.</p>
        <p>BPEL-specific requirements:
R7: The exported process model must keep its structural information and
remain executable.
R8: The exported process must maintain equivalent run-time behavior.
R9: The exported process must maintain equivalent timing behavior.
The following requirements relate to the renaming methodology that will
be applied:
R10: The scrambled names must prevent reverse engineering of the original
names. For example, if names are transformed with a known function
(e.g. RSA encryption or an MD5 hash) and the key is known or the candidate
inputs can be enumerated, then it is easy to obtain the original data.</p>
        <p>R11: Names must be chosen so that conflicts between the original and the
exported file are avoided. For example, an easy choice would be to replace
each name with its type followed by an ascending ID, so that the activity
name “Payment” becomes “Activity1”. This approach is not considered safe,
however. “Activity1” could also have been chosen as a name in the original
process model, since it is a term frequently encountered in Business Process
Management. This would lead to a series of conflicts: which elements named
“Activity1” correspond to the anonymized element, and which to the one
contained in the original process?
R12: The names must lead to a human-readable exported file. For example,
assume that we use UUIDs as names. That would lead to activity names such
as f81d4fae-7dec-11d0-a765-00a0c91e6bf6, and the exported file would not be
easy for humans to read.</p>
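        <p>One way to meet [R10] to [R12] together is sketched below in Python. It is a minimal illustration under assumptions that are not part of the method description: the helper name make_name_generator, the tiny in-line word list, and the case-insensitive collision check are all hypothetical. The sketch draws replacement names at random from a word list, skips every word that already occurs in the original files, and never issues the same word twice.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import secrets


def make_name_generator(wordlist, reserved):
    """Return a function that hands out unused, human-readable names."""
    # Names that occur in the original files must never be issued (R11).
    taken = {name.lower() for name in reserved}
    candidates = [word for word in wordlist if word.lower() not in taken]

    def next_name():
        if not candidates:
            raise RuntimeError("word list exhausted")
        # secrets uses the operating system's random source, so the choice
        # cannot be replayed from a known seed (R10).
        index = secrets.randbelow(len(candidates))
        # Removing the word ensures that no two elements share a new name.
        return candidates.pop(index)

    return next_name


# Hypothetical input: names found in the original process files.
original_names = {"Payment", "Activity1", "CreditCheck"}
# Hypothetical stand-in for a full dictionary file (R12: readable words).
words = ["harbor", "payment", "lantern", "meadow", "activity1", "quartz"]

next_name = make_name_generator(words, original_names)
mapping = {name: next_name() for name in sorted(original_names)}
print(mapping)
```
]]></preformat>
        <p>Because the words are drawn at random, the printed mapping differs from run to run. In this sketch “payment” and “activity1” are never issued, since they collide with names of the original process.</p>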
      </sec>
      <sec id="sec-2-2">
        <title>Challenges</title>
        <p>This section analyzes the challenges that stem from the need to satisfy the
requirements described in the Requirements section above.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Requirements</title>
        <p>Each process specification is wrapped in a package, which is a directory
containing all deployment artifacts. At a minimum the directory should contain
a deployment descriptor and one or more process definitions (BPEL), WSDL,
and XSD files<sup>1</sup>. The many dependency relations among files, as shown in Fig. 1,
increase the complexity of anonymization, as small changes in one file may lead
to numerous subsequent changes in other process artifacts [Challenge 1]. The
complexity is increased further by the need to meet Requirement 2 [Challenge 2]. The
renaming methodology also needs to be examined carefully in order to satisfy
Requirements 10–12 [Challenge 3].</p>
        <p>The BPEL-specific requirements reveal a new set of challenges that will be
more complex to fulfill. How do data and data-specific decisions affect the
runtime behavior of the anonymized model? [Challenge 4]. How is the BPEL lifecycle
affected by anonymization? [Challenge 5]. To what extent will timing behavior
be maintained? [Challenge 6]. These challenges will be addressed in future work.</p>
        <p>Following a “divide and conquer” approach, the anonymization
methodology for each artifact should be examined separately and carefully.
In this paper we focus on BPEL-WSDL anonymization, aiming to address
[Challenges 1, 2, 3].</p>
        <sec id="sec-2-3-1">
          <title>1 http://ode.apache.org/creating-a-process.html</title>
          <p>[Fig. 1: dependency relations among the deployment descriptor, the BPEL process, and the WSDL and XSD files.]</p>
          <p>Fig. 2 shows a more detailed analysis of the dependencies that occur between
the BPEL and WSDL artifacts. The grey entities represent BPEL elements,
while the green entities represent WSDL elements. The directed associations that
connect the entities show the dependencies between them, and the arrow shows
the “direction” of each dependency: the entity to which an arrow leads is one
on which the connected entities depend. Therefore, when this entity is changed,
the entities connected to it must be accessed and changed correspondingly.</p>
          <p>This section describes the methodology that is used for developing “BPELanon”.
Elements in a BPEL file can be divided into three groups:
            <list list-type="bullet">
              <list-item><p>Free Elements Group: Elements that need to be anonymized, but are not bound to changes that occur in other files.</p></list-item>
              <list-item><p>WSDL Bounded Group: Elements that need to be changed because they are bound to elements that are changed in the WSDL file.</p></list-item>
              <list-item><p>Internally Bounded Group: Elements that need to be changed because they are bound to other changed elements within the same file. Internally Bounded Groups can be found in both BPEL and WSDL files.</p></list-item>
            </list>
          </p>
          <p>The anonymization of the “Free Elements Group” is trivial. However, the
anonymization of the “WSDL Bounded Group” and the “Internally Bounded Group” are more
complex tasks. For their implementation we need a “Registry of Alterations”. This
is a registry of metadata that is created during the anonymization of a file and logs
the changes that occur. At a minimum it must contain the following information:
the element’s type, and the old and new data of the corresponding attributes.</p>
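          <p>The minimum content of the “Registry of Alterations” could be represented as sketched below. The class and method names (Alteration, RegistryOfAlterations, record, new_value, original_value, clear) are hypothetical and chosen only for illustration; the text prescribes the stored information, not a concrete data structure.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Alteration:
    element_type: str  # e.g. "portType"
    attribute: str     # e.g. "name"
    old_value: str     # value found in the original file
    new_value: str     # value written to the exported file


class RegistryOfAlterations:
    """Logs every renaming so that later references can be updated."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str, str], Alteration] = {}

    def record(self, element_type: str, attribute: str,
               old_value: str, new_value: str) -> None:
        key = (element_type, attribute, old_value)
        self._entries[key] = Alteration(element_type, attribute,
                                        old_value, new_value)

    def new_value(self, element_type: str, attribute: str,
                  old_value: str) -> Optional[str]:
        """Forward lookup, used when updating references."""
        entry = self._entries.get((element_type, attribute, old_value))
        return entry.new_value if entry else None

    def original_value(self, element_type: str, attribute: str,
                       new_value: str) -> Optional[str]:
        """Reverse lookup, only available while pseudonymizing."""
        for entry in self._entries.values():
            if (entry.element_type, entry.attribute,
                    entry.new_value) == (element_type, attribute, new_value):
                return entry.old_value
        return None

    def clear(self) -> None:
        """Drop all entries, as required for true anonymization."""
        self._entries.clear()


registry = RegistryOfAlterations()
registry.record("portType", "name", "PaymentPT", "lantern")
forward = registry.new_value("portType", "name", "PaymentPT")
backward = registry.original_value("portType", "name", "lantern")
registry.clear()
after_clear = registry.new_value("portType", "name", "PaymentPT")
```
]]></preformat>
          <p>Keeping the registry allows the originator to map a pseudonym back to the original name, whereas clearing it removes that link.</p>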
          <p>The main idea of the anonymization is to scan the documents (whether WSDL
or BPEL) looking for element attributes that might carry semantics
(critical attributes) and need to be scrambled, and to add their old and new
values to the “Registry of Alterations”. The information on which attributes
are critical can be stored as metadata. Next we scan the documents looking
for references to the scrambled elements and update their values. The
anonymization method for the “WSDL Bounded Group” is described below.</p>
          <p>Anonymization starts with the creation of a metadata schema that reflects
the interconnections shown in Fig. 2. Next we construct a “Table of References”
that records the correlation between a BPEL process and its WSDL files. This is done by
parsing the &lt;bpel:import&gt; elements of the BPEL file. We then process the
WSDL files, which contain the definitions of the artifacts that are referenced
in the BPEL. We run through each of the WSDL files in the “Table of References”
and anonymize the attributes of its elements step by step. In order to
fulfill [R12], the anonymization function picks random words from an English
dictionary<sup>1</sup>. As argued before, a word of a well-known human language leads
to more readable results than a UUID. We focus only on the anonymization of
critical attributes, as not every attribute needs to be anonymized. By maintaining
a “Registry of Alterations”, we can apply the subsequent changes to the BPEL file. We
have created an outer loop that repeats this process for each WSDL file. Another
option would have been to parse all WSDL files first and then apply the changes
to the BPEL file in a single pass. However, different WSDL files might contain the
same names, which would lead to a more complex solution. We have therefore chosen
this safer, although most likely more time-consuming, method.</p>
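          <p>Building the “Table of References” from the &lt;bpel:import&gt; elements might look roughly as follows. This is a sketch that assumes WS-BPEL 2.0 executable processes and WSDL 1.1 imports (the two namespace constants below); the function name build_table_of_references and the sample process are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Assumed namespaces: WS-BPEL 2.0 executable processes and WSDL 1.1.
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"
WSDL_IMPORT_TYPE = "http://schemas.xmlsoap.org/wsdl/"


def build_table_of_references(bpel_xml):
    """List the WSDL files that a BPEL process imports."""
    root = ET.fromstring(bpel_xml)
    table = []
    # <import> elements are direct children of <process>.
    for imp in root.findall(f"{{{BPEL_NS}}}import"):
        # Skip XSD and other non-WSDL imports.
        if imp.get("importType") == WSDL_IMPORT_TYPE:
            table.append({
                "namespace": imp.get("namespace"),
                "location": imp.get("location"),
            })
    return table


# Hypothetical sample process with one WSDL and one XSD import.
SAMPLE_BPEL = """<bpel:process name="Order"
    targetNamespace="http://example.com/order"
    xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable">
  <bpel:import importType="http://schemas.xmlsoap.org/wsdl/"
      namespace="http://example.com/payment" location="Payment.wsdl"/>
  <bpel:import importType="http://www.w3.org/2001/XMLSchema"
      namespace="http://example.com/types" location="Types.xsd"/>
</bpel:process>"""

table_of_references = build_table_of_references(SAMPLE_BPEL)
print(table_of_references)
```
]]></preformat>
          <p>For the sample above the table contains a single entry for Payment.wsdl; the XSD import is ignored because this step covers only the BPEL-WSDL dependencies.</p>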
          <p>At the end of the process the “Table of References” and the “Registry of Alterations”
are destroyed if the tool is set to anonymize rather than pseudonymize. The above
procedure is described by Algorithm 1. For the anonymization of the “Internally
Bounded Group” a similar process needs to be followed.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        Attempts at anonymization can be found in various fields of computer science,
such as network security (filtering, replacement, reduction of accuracy, etc. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) and
      </p>
      <sec id="sec-3-1">
        <title>1 http://www.winedt.org/Dict/</title>
        <p>
          database systems (data generation, encryption, etc. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], k-anonymity, l-diversity,
and t-closeness<sup>1</sup>). These approaches cannot be applied to BPEL, as they are
tightly tailored to the architecture and principles of different technologies.
        </p>
        <p>Algorithm 1 Anonymization process of BPEL-WSDL for the “WSDL Bounded
Group”</p>
        <preformat>create tableOfReferences by parsing &lt;bpel:import&gt; elements of BPEL
for all WSDL files W in tableOfReferences do
  for all elements E in W do
    A ← getCriticalAttributes(E)
    for all a in A do
      updateRegistryOfAlterations(E.type, a.type, a.data, “old”)
      applyAnonymizationPattern(a.data)
      updateRegistryOfAlterations(E.type, a.type, a.data, “new”)
    end for
  end for
  for all elements E in BPEL file do
    A ← getCriticalAttributes(E)
    for all a in A do
      resultType ← findTypeOfInterconnection(E.type, a.type)
      a.data ← getNewValueOfAttribute(resultType, a.data) {from registryOfAlterations}
    end for
  end for
end for
if anonymization then
  delete tableOfReferences
  delete registryOfAlterations
end if</preformat>
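        <p>The core of Algorithm 1 for a single WSDL file can be illustrated with the following Python sketch. It is deliberately simplified and rests on assumptions: only the name attributes of WSDL portType and operation elements are treated as critical, BPEL references are matched on their local name alone, and the function names, the sample documents, and the fixed word sequence are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
BPEL_NS = "http://docs.oasis-open.org/wsbpel/2.0/process/executable"

# Hypothetical metadata: WSDL elements whose "name" is critical, and the
# BPEL attribute that refers to each kind of WSDL element.
CRITICAL_WSDL = ("portType", "operation")
BPEL_REFERENCES = {"portType": "portType", "operation": "operation"}


def anonymize_wsdl(wsdl_root, next_name):
    """Rename critical WSDL attributes and return the registry of changes."""
    registry = {}
    for kind in CRITICAL_WSDL:
        for element in wsdl_root.iter(f"{{{WSDL_NS}}}{kind}"):
            old = element.get("name")
            if old is None:
                continue
            # Reuse the replacement if the same name was seen before.
            if (kind, old) not in registry:
                registry[(kind, old)] = next_name()
            element.set("name", registry[(kind, old)])
    return registry


def apply_to_bpel(bpel_root, registry):
    """Update BPEL attributes that refer to renamed WSDL elements."""
    for element in bpel_root.iter():
        for attribute, kind in BPEL_REFERENCES.items():
            value = element.get(attribute)
            if value is None:
                continue
            # References may be QNames such as "pay:PaymentPT".
            prefix, sep, local = value.rpartition(":")
            replacement = registry.get((kind, local))
            if replacement is not None:
                element.set(attribute, prefix + sep + replacement)


WSDL = """<wsdl:definitions xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/"
    targetNamespace="http://example.com/payment">
  <wsdl:portType name="PaymentPT">
    <wsdl:operation name="chargeCard"/>
  </wsdl:portType>
</wsdl:definitions>"""

BPEL = """<bpel:process name="Order"
    targetNamespace="http://example.com/order"
    xmlns:bpel="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
    xmlns:pay="http://example.com/payment">
  <bpel:invoke partnerLink="bank" portType="pay:PaymentPT"
      operation="chargeCard"/>
</bpel:process>"""

# Fixed words keep the example reproducible; a real run would draw
# random dictionary words instead.
words = iter(["lantern", "meadow"])
wsdl_root = ET.fromstring(WSDL)
bpel_root = ET.fromstring(BPEL)
registry = anonymize_wsdl(wsdl_root, lambda: next(words))
apply_to_bpel(bpel_root, registry)
invoke = bpel_root.find(f"{{{BPEL_NS}}}invoke")
print(invoke.get("portType"), invoke.get("operation"))
```
]]></preformat>
        <p>With the fixed word sequence the invoke activity ends up referring to pay:lantern and meadow. In anonymization mode the registry would be discarded afterwards, while in pseudonymization mode it would be kept. Names of the “Free Elements Group”, such as the process name, are not touched by this sketch.</p>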
        <p>We found two related tools, XMLAnonymizer<sup>2</sup> and XMLAnonymizerBean<sup>3</sup>.
XMLAnonymizer is an elementary approach to anonymization that focuses on changing
the attribute values of an XML file ([R4] partially covered).
XMLAnonymizerBean anonymizes elements and attributes by removing the namespaces of an
XML file ([R3] partially covered). Overall, these utilities only partially satisfy the
requirements of “BPELanon”. The “BPELanon” method is a more complex
approach, since it deals with all the requirements and challenges described in
Sect. 2.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
      <p>
        In this paper we have proposed a method for the anonymization of BPEL
processes. We focus on BPEL processes without extensions, as experience shows
that BPEL is used widely in industry to implement workflows. There are more
than 60 BPEL extensions available [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], but the processes we have collected so far
indicate that none of these extensions is used in real-world settings. We have
analyzed a set of requirements and challenges that make process anonymization
difficult. To address them, we suggest an algorithm
that is a first approach to a methodology for business process anonymization.
The main contribution of this paper is the design of a methodology with a focus
on BPEL anonymization.
        <fn><label>1</label><p>http://arx.deidentifier.org/</p></fn>
        <fn><label>2</label><p>https://code.google.com/p/xmlanonymizer/</p></fn>
        <fn><label>3</label><p>http://help.sap.com/saphelp_nw04/helpdata/en/45/d169186a29570ae10000000a114a6b/content.htm</p></fn>
      </p>
      <p>In future work we will investigate the impact of anonymization on
the BPEL process lifecycle and the ways in which data and data-dependent decisions
are influenced by anonymization, and we will include timing behavior information in
the BPELanon methodology. The implementation of “BPELanon” has started, and
it will be tested with a set of workflows with various characteristics. The first release
will then be distributed to companies for evaluation and usage. We intend to
extend “BPELanon” in order to provide various anonymization options and
to support anonymization for other languages. After collecting a sizable sample of
anonymous process models, we will work on a method for “Statistical Analysis”
that aims to calculate useful statistical information from the BPEL process
collection.</p>
      <p>Acknowledgments. This work is funded by the “BenchFlow” (LE 2275/7-1)
project supported by the German Research Foundation (DFG).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <collab>Federal Ministry of Justice</collab>
          :
          <article-title>German Federal Data Protection Law</article-title>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kopp</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Görlach</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karastoyanova</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leymann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reiter</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schumm</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sonntag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strauch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wieland</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khalaf</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A Classification of BPEL Extensions</article-title>
          .
          <source>JSI</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <fpage>2</fpage>
          -
          <lpage>28</lpage>
          (
          <year>November 2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Leymann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Managing business processes via workflow technology</article-title>
          .
          <source>Tutorial at VLDB 2001</source>
          (11-14 September
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Strauch</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Breitenbücher</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kopp</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leymann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unger</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Cloud Data Patterns for Confidentiality</article-title>
          .
          <source>In: Proceedings of the 2nd International Conference on Cloud Computing and Service Science, CLOSER 2012</source>
          . pp.
          <fpage>387</fpage>
          -
          <lpage>394</lpage>
          . SciTePress (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vinogradov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pastsyak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Evaluation of data anonymization tools</article-title>
          .
          <source>In: DBKDA</source>
          <year>2012</year>
          ,
          <source>The Fourth International Conference on Advances in Databases, Knowledge, and Data Applications</source>
          . pp.
          <fpage>163</fpage>
          -
          <lpage>168</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Yurcik</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Woolam</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellings</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thuraisingham</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          :
          <article-title>Toward trusted sharing of network packet traces using anonymization: Single-field privacy/analysis tradeoffs</article-title>
          .
          <source>CoRR</source>
          abs/0710.3979 (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>