<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Conference</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Privacy-Preserving Data Analysis Workflows for eScience</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khalid Belhajjame</string-name>
          <email>khalid.belhajjame@dauphine.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vanilson Burégio</string-name>
          <email>vanilson.buregio@ufrpe.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Noura Faci</string-name>
          <email>noura.faci@univ-lyon1.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edvan Soares</string-name>
          <email>edvan.soares@ufrpe.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zakaria Maamar</string-name>
          <email>zakaria.maamar@zu.ac.ae</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mahmoud Barhamgi</string-name>
          <email>mahmoud.barhamgi@univ-lyon1.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Claude Bernard University</institution>
          ,
          <addr-line>Lyon</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Federal Rural University of Pernambuco</institution>
          ,
          <addr-line>Recife</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>PSL, Université Paris-Dauphine, LAMSADE</institution>
          ,
          <addr-line>Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Zayed University</institution>
          ,
          <addr-line>Dubai</addr-line>
          ,
          <country country="AE">United Arab Emirates</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>26</volume>
      <issue>2019</issue>
      <abstract>
        <p>Computing-intensive experiments in modern science have become increasingly data-driven, illustrating perfectly the Big-Data era's challenges. These experiments are usually specified and enacted in the form of workflows that need to manage (i.e., read, write, store, and retrieve) sensitive data like persons' past diseases and treatments. While there is an active body of research on how to protect sensitive data by, for instance, anonymizing datasets, there is a limited number of approaches that assist scientists in identifying the datasets, generated by the workflows, that need to be anonymized, along with setting the anonymization degree that must be met. We present in this paper a preliminary approach for setting and inferring anonymization requirements of datasets used and generated by a workflow execution. The approach was implemented and showcased using a concrete example, and its efficiency assessed through validation exercises.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Data-driven transformation and analysis (e.g., re-formatting data
and computing statistics) are omnipresent in science and have
become attractive for verifying scientists’ hypotheses. This
verification is dependent on dataset availability that third parties
(e.g., government bodies and independent organizations) supply
for re-formatting, combination, and scrutiny using what the
community refers to as complex Data analysis Workflows (DWf) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
A DWf is a process that has an objective (e.g., discover
prognostic molecular biomarkers) and a set of operations packaged (at
design time) into stages (e.g., pre-process and analyze) and
orchestrated (at run-time) according to data and other dependencies
that the workflow designer specifies. Despite the availability of
free datasets for the scientific community (e.g., Figshare
(figshare.com), Dataverse (dataverse.org), OpenAIRE (openaire.eu),
and DataONE (dataone.org)), data providers, in certain
disciplines, are still reluctant to share their datasets with the
community. Indeed, there is a serious concern about inappropriate
dataset manipulation/misuse during experiments that could
lead to sensitive-data leaks and/or misuse. Although this could
happen inadvertently, the consequences remain the same. As a
result, some scientists/DWfs are deprived of valuable and necessary
datasets due to restrictions (e.g., access control policies)
that the data providers impose. Moreover, data analysis may yield
sensitive and private data about individuals (e.g., health
conditions) that were not anticipated during the experiment design.
      </p>
      <p>
        Various research works (e.g., [
        <xref ref-type="bibr" rid="ref18 ref26 ref29 ref30 ref31 ref4">4, 18, 26, 29–31</xref>
        ]) have examined
data outsourcing and/or sharing from a privacy perspective. We
note, however, that in the context of data analysis workflows the
techniques/tools that assist the designer in the specification and
enforcement of data protection policies are limited. In particular,
scientists need to identify the parameters in the workflows that
carry sensitive datasets during their execution, and determine
which anonymization method should be applied to those datasets
prior to their publication. This task can be tedious, especially for
large workflows.
      </p>
      <p>
        In this preliminary work, we overcome the above issue by
providing scientists with the means to automatically (i) identify
the workflow parameters that are bound to sensitive data during
the workflow execution, and (ii) infer the anonymity degree
that needs to be applied to such datasets before releasing them
publicly. We will define what we exactly mean by anonymity
degree later on in Section 3.1 when introducing k-anonymity
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <p>Our contributions are as follows: (i) an architecture of a
privacy-preserving workflow system that preserves the privacy of
the datasets used and generated when enacting workflows, (ii) a
method for automatically detecting sensitive datasets and setting
their anonymity degree, and (iii) a system that implements the
proposed method and experiments that showcase its efficiency
using real-world scientific workflows.</p>
      <p>The paper is organized as follows. Section 2 presents a
scientific workflow from the health-care domain that we use as a
running example. Section 3 presents an architecture for a
privacy-preserving workflow environment, and then discusses certain
necessary requirements that this environment should satisfy.
Section 4 presents a new method for automatically detecting
sensitive workflow parameters, and for inferring the anonymity
degree that should be enforced when publishing the datasets
used or generated by such parameters as a result of the workflow
execution. This method is implemented and validated in Section 5
and Section 6, respectively. Section 7 presents a literature review.
Conclusions are drawn in Section 8.</p>
    </sec>
    <sec id="sec-2">
      <title>RUNNING SCENARIO</title>
      <p>
        Fig. 1 exemplifies a DWf that consists of five operations (opi, i=1..5)
connected through dataflow dependencies. Input/output
parameters are omitted for the sake of readability. This workflow’s
operations are as follows:
• op1 queries a dataset to get nutrition data. Table 1 is an
example of this operation’s output, listing for each patient her
average daily intake of fruits &amp; vegetables, dairy products,
meat, and dessert.
• op2 retrieves oncology data about patients in terms of type
of cancer and age (Table 2).
• op3 combines Table 1 and Table 2’s data. Specifically, it
performs a natural join on the nutrition and oncology
information. The combination’s outcome is presented in Table 3.
Note that, in the general case, not all nutrition patients will
be oncology patients, and vice-versa. We have the same
patients in Tables 1 and 2 for the sake of illustration only.
• op4 implements a machine learning model that helps
predict the likelihood of a patient suffering from a particular
type of cancer given his/her nutrition habits. Examples
of models that can be produced are decision trees,
neural networks, and Bayesian networks, to mention just
a few.
• Finally, op5 generates a final report that the scientist will
examine. Such a report contains various information, such
as the nutrition attributes that are prevalent in identifying
the type of cancer the patients may suffer from, as well
as information about the performance of the prediction
model, e.g., accuracy, ROC curve, etc. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>We assume that the dietetics &amp; nutrition and oncology departments,
willing to share their datasets, should receive the necessary
guarantees that safeguard private data from being leaked, misused, or
tampered with, for example. In particular, they should be able to state
that their datasets are sensitive and to set the anonymity degree
that should be respected when anonymizing their datasets.</p>
    </sec>
    <sec id="sec-3">
      <title>3 PRIVACY-PRESERVING WORKFLOW MANAGEMENT SYSTEM</title>
      <p>This section presents the architecture of our privacy-preserving
WfMS and defines the requirements that would preserve this
privacy.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Overview</title>
      <p>In Fig. 2, providers make their datasets available to a (trusted)
workflow management system, which will be able to manipulate
such datasets without them being anonymized. The datasets
supplied can be sensitive or non-sensitive. Sensitive datasets
carry personal details on individuals and therefore, should be
anonymized before making them publicly available.</p>
      <p>Initially, the datasets are transferred to a data repository that
is private to the workflow system in preparation for their
“cleansing" (Step 1). Once the DWf starts (Step 2), the execution
engine loads the “cleansed" datasets from the private data
repository (Step 3). The obtained intermediate and final datasets are
stored again in this repository (Step 4). If the DWf execution
reveals new insights, the scientist may choose, at her discretion, to
publish (some of) the datasets used and/or generated by the
workflow in a public data repository (Step 6) for the benefit of the
community, who could explore, reuse, or even review such datasets.
Prior to their release, these datasets are anonymized (Step 5).</p>
      <p>[Fig. 2: Trusted workflow environment. Data owners share sensitive and non-sensitive data (1); the scientist launches a workflow execution from the workflow workbench (2); the workflow execution engine gets its inputs from the private data repository (3) and stores its outputs there (4); data anonymization is launched (5); the data anonymizer gets the data (6) and publishes it to public data repositories (7).]</p>
      <p>
        Different techniques can be used for data anonymization,
e.g., generalization [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], perturbation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], suppression [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
encryption, k-anonymization [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] and diferential privacy [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Differential privacy is perhaps the most sophisticated method, with
better privacy guarantees. That said, it is not suitable for our
purpose. Indeed, differential privacy is used to protect individual
privacy in the context of statistical queries. In our case, we are
interested in providing users with the means to explore data
produced by the executions of a workflow, as opposed to
generating statistics, which is what differential privacy is mainly
targeted at. Because of this, we use in the context of this
paper k-anonymity. k-anonymity has been extensively studied in
the database and data mining communities [
        <xref ref-type="bibr" rid="ref12 ref25">12, 25</xref>
        ]. However,
its use in data analysis workflows is still limited. To illustrate
k-anonymity, let us consider a dataset (d) of records, each referring
to an individual through attributes, e.g., age, address, and gender, that
could be used to reveal the individual's identity. Such attributes are
known as quasi-identifiers. (d) is k-anonymized, where (k) is an integer, if each
quasi-identifier tuple occurs in at least (k) records in (d). For
example, the dataset illustrated in Table 4 is 2-anonymized: each
tuple occurs at least twice in the dataset. Therefore, each patient
contained in the anonymized version of (d) is indistinguishable
from at least one other individual. In the remainder of the paper,
we use the term anonymity degree to refer to (k).
      </p>
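      <p>As an illustration only (this is our sketch, not the paper's implementation, and the attribute names are hypothetical), the k-anonymity condition above can be checked in a few lines of Python:</p>

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    # Count how often each quasi-identifier tuple occurs in the dataset (d).
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    # (d) is k-anonymized if every quasi-identifier tuple occurs >= k times.
    return all(c >= k for c in counts.values())

# Toy dataset in the spirit of Table 4: ages generalized into ranges.
patients = [
    {"age": "20-30", "gender": "F", "cancer": "lung"},
    {"age": "20-30", "gender": "F", "cancer": "skin"},
    {"age": "40-50", "gender": "M", "cancer": "lung"},
    {"age": "40-50", "gender": "M", "cancer": "colon"},
]
print(is_k_anonymous(patients, ["age", "gender"], 2))  # True
print(is_k_anonymous(patients, ["age", "gender"], 3))  # False
```

      <p>Note that each distinct (age, gender) pair above occurs in exactly two records, which is why the 2-anonymity test passes while the 3-anonymity test fails.</p>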
    </sec>
    <sec id="sec-6">
      <title>3.2 How to achieve a privacy-preserving WfMS?</title>
      <p>Datasets that a workflow uses or generates are not independent
of each other. In particular, the workflow operations derive,
during the workflow execution, new datasets from an initial set of
datasets that may be sensitive. Dependencies between
the datasets should, therefore, be considered when setting the
anonymity degree of the derived datasets based on the anonymity
degree of the initial sensitive datasets. With this in mind, we
present hereafter the requirements that should be met by a
workflow environment to preserve the privacy of the datasets it uses
and generates during the execution of workflows.</p>
      <p>(1) The scientist should be able to specify the DWf’s inputs
that are bound to sensitive datasets during the execution
of DWf.
(2) Datasets’ providers that submit sensitive inputs to a
worklfow should establish their privacy requirements in terms
of degree of anonymization. This degree will then be used
to anonymize such datasets prior to their publication by
the WfMS.
(3) The dependencies between the parameters of the
operations that compose the workflow should be extracted.
Such dependencies allow identifying the sensitive datasets
that were used to derive a given dataset, with the view
to calculate the anonymity degree of the later based on
the anonymity degrees of the former. Indeed, protecting a
workflow’s input datasets may not be suficient to protect
private information. Intermediate and final datasets that
result from a workflow execution can contain sensitive
data, too.
(4) A WfMS should assist scientists in identifying workflow
parameters that are bound to sensitive datasets, and
calculating the anonymity degree that needs to be enforced
when publishing such datasets.</p>
      <p>The next section illustrates how the aforementioned
requirements are taken into account in the design of a privacy-preserving
data workflow.</p>
    </sec>
    <sec id="sec-8">
      <title>4 PRIVACY-PRESERVING DATA ANALYSIS WORKFLOWS</title>
      <p>We begin by presenting a formal model for a DWf, and then specify
which of the workflow's inputs are sensitive together with their anonymity
degrees. Finally, we present a solution that automatically infers
the sensitivity and anonymity degree of the remaining
parameters of the DWf.</p>
    </sec>
    <sec id="sec-10">
      <title>4.1 Workflow model definition</title>
      <p>Workflow model. We formally define a DWf as a tuple
⟨DWfid, OP, DL⟩ where DWfid is a unique identifier of the
workflow, OP is a set of data manipulation operations (opi) that
constitute the workflow, and DL is the set of data links between these
operations.</p>
      <p>An operation opi is defined by ⟨name, in, out⟩ where name
is self-descriptive, and in and out represent input and output
parameters, respectively. As some output parameters could be
other operations’ inputs, a parameter has a unique name (pname).</p>
      <p>
        Let IN = ∪op∈OP(op.in) and OUT = ∪op∈OP(op.out) be the sets
of all operations’ inputs and outputs in a DWf, respectively. The
set of data links connecting the workflow operations must then
satisfy the following: DL ⊆ (OP × OUT) × (OP × IN). A data link
relating op1’s output ⟨o, op1⟩ to op2’s input ⟨i, op2⟩ is therefore
denoted by the pair ⟨⟨o, op1⟩, ⟨i, op2⟩⟩. We use INDWf and OUTDWf
to denote the DWf’s inputs and outputs, respectively. In this work,
we consider acyclic workflows. It is worth
noting that most existing scientific workflow languages do not
support loops [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
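      <p>The formal model above can be sketched as follows (a minimal illustration of our own; the class and operation names are hypothetical, not part of the paper):</p>

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Operation:          # op = <name, in, out>
    name: str
    inputs: tuple         # input parameter names (pname)
    outputs: tuple        # output parameter names (pname)

@dataclass
class DWf:                # DWf = <DWfid, OP, DL>
    dwf_id: str
    ops: list                                   # OP
    links: list = field(default_factory=list)   # DL, pairs <<o, op>, <i, op'>>

    def add_link(self, o, src, i, dst):
        # A data link must connect an actual output of src to an actual input of dst,
        # mirroring DL being a subset of (OP x OUT) x (OP x IN).
        assert o in src.outputs and i in dst.inputs
        self.links.append(((o, src), (i, dst)))

# A fragment of the running example: op1's nutrition output feeds op3's join.
op1 = Operation("query_nutrition", ("q",), ("nutrition",))
op3 = Operation("join", ("nutrition_in", "oncology_in"), ("joined",))
wf = DWf("DWf-example", [op1, op3])
wf.add_link("nutrition", op1, "nutrition_in", op3)
```

      <p>The assertion in add_link enforces well-formedness of data links at construction time rather than at execution time.</p>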
      <p>Sensitive parameters. To specify that a (DWf)’s given input or
output parameter carries sensitive data, we use the following
boolean function:</p>
      <p>isSensitive(⟨op, p⟩)
that is true if the data bound to ⟨op, p⟩ during the DWf’s execution
are sensitive; otherwise, false. For example, in the running
example (Section 2), the two initial parameters of the workflow are
sensitive in that their instances are collections of records about
patients along with their nutrition and cancer histories.</p>
      <p>Parameter anonymity degree. The execution of a DWf
corresponds to a DWf instance denoted by (insWf). The anonymity
degree of a DWf’s parameter (⟨p, op⟩) is defined with respect to a
given DWf instance (insWf). Indeed, different instances of DWf may
have input datasets with different anonymity degree requirements.
For example, the owner of an input dataset used for a given
workflow instance (insWf1) may impose a more stringent anonymity
degree than the owner of an input dataset used for a different
workflow instance (insWf2). As a result, the same workflow
parameter may have different anonymity degrees depending on the
workflow instance in question. Due to this difference in
requirements, we use the following function to specify the anonymity
degree of a given parameter ⟨p, op⟩ with respect to a workflow
instance insWf:</p>
      <p>
        anonymity(⟨p, op⟩, insWf)
For example, anonymity(⟨p, op1⟩, w1) = 3 specifies that the
parameter ⟨p, op1⟩ has an anonymity degree of 3 within the
workflow instance w1. Consider that the dataset (d) is bound
to the parameter ⟨p, op1⟩ within the workflow instance (w1).
Given that anonymity(⟨p, op1⟩, w1) = 3, (d) must be anonymized
before its publication. Specifically, each record (individual) in the
anonymized (d) must be indistinguishable from at least (2) other
individuals [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
    </sec>
    <sec id="sec-11">
      <title>4.2 Detecting sensitive parameters and inferring their anonymity degrees</title>
      <p>Manual identification of a workflow’s parameters that are
sensitive and setting their anonymity degrees can be tedious. This
becomes a serious concern when the workflow includes a large
number of operations. To address this issue, we propose in this
section an approach that takes as input the sensitivity of the input
parameters of the workflow (DWf) together with their anonymity
degrees. It then detects the list of (intermediate and final)
parameters in (DWf) that may be sensitive, and infers the anonymity
degree that should be applied to the datasets bound to those
parameters during the execution of the (DWf).</p>
      <p>Parameter dependencies. Dependencies between a
workflow (DWf)’s parameters are a key element of our approach. A
parameter ⟨op, p⟩ depends on a parameter ⟨op′, p′⟩ in a
workflow (DWf) if, during the execution of (DWf), the data bound to the
parameter ⟨op′, p′⟩ contribute to or influence the data bound to
the parameter ⟨op, p⟩5.</p>
      <p>Parameter dependencies can be specified by examining the
workflow specification (DWf)6. Given a workflow (DWf), the
dependencies between its parameters are inferred as follows:
• Given an operation (op) that belongs to (DWf), we can infer
that the outputs of (op) depend on its inputs. Consider for
example that ⟨i, op⟩ and ⟨o, op⟩ are an input and an output of
(op). We can infer that ⟨o, op⟩ depends on ⟨i, op⟩, which
we write:</p>
      <p>dependsOn(⟨o, op⟩, ⟨i, op⟩)
• If the workflow (DWf) contains a data link connecting an
output ⟨o, op⟩ to an input ⟨i, op′⟩, then we infer that ⟨i, op′⟩
depends on ⟨o, op⟩, i.e., dependsOn(⟨i, op′⟩, ⟨o, op⟩). This
is because the data bound to ⟨i, op′⟩ during the workflow
execution is a copy of the data bound to ⟨o, op⟩.</p>
      <p>We also transitively derive dependencies between the
operation parameters of a workflow based on the following rules:
R1: dependsOn∗(⟨p, op⟩, ⟨p′, op′⟩) :− dependsOn(⟨p, op⟩, ⟨p′, op′⟩)
R2: dependsOn∗(⟨p, op⟩, ⟨p′, op′⟩) :− dependsOn∗(⟨p, op⟩, ⟨p″, op″⟩),
dependsOn∗(⟨p″, op″⟩, ⟨p′, op′⟩)
Applying the above rules to our example workflow, we conclude,
for instance, that dependsOn∗(⟨o, op3⟩, ⟨i, op2⟩), where i and o are
parameter names.</p>
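      <p>Rules R1 and R2 amount to a transitive closure over the direct dependsOn pairs. A minimal sketch of this closure in Python (the parameter names below are hypothetical stand-ins for those of the running example):</p>

```python
def depends_on_star(direct):
    """Compute dependsOn* from the direct dependsOn pairs.
    Each pair (x, y) reads: parameter x depends on parameter y."""
    closure = set(direct)   # R1: every direct dependency is in dependsOn*
    while True:             # R2: chain dependencies until a fixpoint is reached
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

# Direct dependencies: each operation's outputs depend on its inputs, and
# each data link copies an output to the consuming operation's input.
direct = {
    ("op1.out", "op1.in"), ("op2.out", "op2.in"),
    ("op3.in1", "op1.out"), ("op3.in2", "op2.out"),
    ("op3.out", "op3.in1"), ("op3.out", "op3.in2"),
}
star = depends_on_star(direct)
print(("op3.out", "op2.in") in star)  # True: derived transitively via op2.out
```

      <p>The fixpoint loop terminates because the closure is bounded by the finite set of parameter pairs.</p>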
      <p>Detecting sensitive parameters. We use parameter
dependencies to assist the workflow designer in identifying the
intermediate and final parameters that may be sensitive. Specifically, a
parameter ⟨p′, op′⟩ that is not an input to the workflow, i.e.,
⟨p′, op′⟩ ∉ INDWf, may be sensitive if it depends on a workflow
input that is known to be sensitive, i.e.,
∃⟨i, op⟩ ∈ INDWf s.t. sensitive(⟨i, op⟩) ∧ dependsOn∗(⟨p′, op′⟩, ⟨i, op⟩)</p>
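      <p>This detection criterion is a one-line check over the dependsOn* relation; a self-contained sketch (ours, with hypothetical parameter names):</p>

```python
def may_be_sensitive(param, sensitive_inputs, star):
    """param may be sensitive if it depends, via dependsOn*, on a
    workflow input that is known to be sensitive."""
    return any((param, i) in star for i in sensitive_inputs)

# Illustrative dependsOn* pairs (x depends on y) and sensitive inputs.
star = {("op3.out", "op1.in"), ("op3.out", "op2.in"), ("op5.out", "op1.in")}
sensitive_inputs = {"op1.in", "op2.in"}
print(may_be_sensitive("op3.out", sensitive_inputs, star))  # True
print(may_be_sensitive("op4.out", sensitive_inputs, star))  # False
```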
      <p>
        Note that we say that ⟨p′, op′⟩ may be sensitive. This is
because an operation that consumes sensitive datasets may produce
5The notions of contribution and influence are in line with the derivation and
influence relationships defined by the W3C PROV recommendation [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
6Parameter dependencies correspond to what is referred to in the scientific workflow
community as prospective provenance. This is because such dependencies can be
inferred from the workflow specification, as opposed to other kinds of information,
e.g., execution logs, which can only be obtained retrospectively once the workflow
execution terminates.
non-sensitive datasets. For example, op5 in Fig. 1 generates
non-sensitive information although it depends on sensitive inputs of
the workflow. The output of such an operation is a report that is
free from information about individual patients.
      </p>
      <p>Inferring anonymity degree. In addition to assisting the
designer identify sensitive intermediate and final output
parameters, we also infer details about the anonymity degree
that should be applied to dataset instances of those sensitive
parameters. To illustrate this, consider that ⟨p′, op′⟩ is a sensitive
intermediate or final output parameter. The anonymity degree
of such a parameter given a workflow execution insWf can be
defined as the maximum anonymity degree of the sensitive datasets that are
used as input to the workflow and that contribute to the dataset
instances of ⟨p′, op′⟩. Taking the maximum anonymity degree
of the contributing inputs ensures that the anonymity degrees
imposed on such inputs are honored by the dependent parameter
in question. That is:
anonymity(⟨p′, op′⟩, insWf) =
max({anonymity(⟨i, op⟩, insWf) s.t. sensitive(⟨i, op⟩)
∧ dependsOn∗(⟨p′, op′⟩, ⟨i, op⟩)})</p>
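      <p>The max rule above can be sketched directly (again an illustration of ours, with hypothetical parameter names and degrees, not the paper's code):</p>

```python
def infer_anonymity(param, input_degrees, star):
    """Anonymity degree of a derived parameter: the maximum of the degrees
    of the sensitive workflow inputs it depends on via dependsOn*."""
    degrees = [d for i, d in input_degrees.items() if (param, i) in star]
    return max(degrees) if degrees else None  # None: no sensitive ancestor

# op1's input owner requires 2-anonymity; op2's input owner requires 3.
input_degrees = {"op1.in": 2, "op2.in": 3}
star = {("op3.out", "op1.in"), ("op3.out", "op2.in"), ("op4.out", "op1.in")}
print(infer_anonymity("op3.out", input_degrees, star))  # 3
print(infer_anonymity("op4.out", input_degrees, star))  # 2
```

      <p>Taking the maximum means op3's output, which mixes both inputs, inherits the more stringent requirement (3) rather than the weaker one (2).</p>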
      <p>
        Once the anonymity degree is computed, the WfMS uses an
anonymization algorithm proposed in the literature, such as
Mondrian [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] before publishing the datasets used and generated as a
result of the workflow execution.
      </p>
    </sec>
    <sec id="sec-12">
      <title>5 IMPLEMENTATION</title>
      <p>
        Fig. 3 depicts the system architecture implementing our
privacy-aware workflow approach. Not all the components reported in
Fig. 2 have been implemented. Indeed, instead of reinventing the
wheel, we make use of some existing popular scientific workflow
systems [
        <xref ref-type="bibr" rid="ref14 ref28 ref6">6, 14, 28</xref>
        ]. We have, therefore, focused on
implementing the Anonymizer component which consists of the following
modules.
      </p>
      <p>[Fig. 3: Anonymizer architecture. The DWf designer supplies a .cwl file; the Workflow Loader converts it to a JSON file, which feeds the Workflow Dependency Extractor together with the sensitive I/O annotations.]</p>
      <sec id="sec-12-1">
        <title>Workflow Dependency Extractor</title>
        <p>This module is used to
identify the dependencies between workflow parameters. It takes
as input a workflow specification and produces as output a list of
pairs of parameters ⟨p1, p2⟩ where p2 depends on p1. Let us
consider our running example of Section 2. Applying the Workflow
Dependency Extractor to this workflow reveals, for instance, that
the input of op3 depends on the inputs of op1 and op2, among
other dependencies.</p>
      </sec>
      <sec id="sec-12-2">
        <title>Sensitive Parameter Detector</title>
        <p>This module identifies
workflow parameters that may be sensitive. It takes as input the
workflow input that is indicated (by the user or workflow’s
author) as sensitive, and the parameter dependencies produced by
Workflow Dependency Extractor. It produces as output a list
of parameters that may be sensitive. Let us consider our
running example along with the inputs of operations op1 and op2,
which the scientist sets as sensitive because they handle personal
information. The Sensitive Parameter Detector concludes that
the remaining parameters of the workflow may be sensitive.
Indeed, all of the workflow’s intermediate and final parameters
depend on op1 and op2's inputs. It is worth underlining that the
Sensitive Parameter Detector identifies the parameters
that may be sensitive. In other words, not all the parameters that
are returned by this module will be flagged as sensitive. This
is the case for the outputs of op4: establish correlations
and op5: generate report, which, respectively, deliver a
machine learning model and a report that are free of any
personal detail, and as such do not need to be anonymized.
Note, however, that if a parameter is not returned by the
Sensitive Parameter Detector, then that means that such a
parameter is definitely not sensitive.</p>
        <p>Anonymity Degree Calculator. This module computes the
anonymity degree of a workflow’s sensitive parameters. To this
end, it establishes the anonymity degree that must be met by a
sensitive parameter that is not a workflow’s initial input. Indeed,
the anonymity degree of the initial parameters of the workflow
as a whole is specified by the user. It takes as input the anonymity
degree of each input of the workflow that is known to be
sensitive, the list of parameter dependencies that are produced by the
Workflow Dependency Extractor, and the list of workflow
parameters that are identified as sensitive by the Sensitive Parameter
Detector. It then produces the anonymity degree of each
sensitive parameter of the workflow (other than the initial workflow
inputs). Let us consider the nutrition and oncology departments,
which state that their data should be 2-anonymized before
publication. By using the Anonymity Degree Calculator, we
establish that the anonymity degree of op1,2,3’s outputs should be
equal to 2.</p>
        <p>
          k-Anonymizer. Once the anonymity degrees of the
parameters are produced, the k-Anonymizer is enabled to anonymize the
dataset instances of these parameters during a workflow
execution. The anonymization operation is out of the scope of this
paper. Instead, existing k-anonymization algorithms (e.g., ARX [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
an open source data anonymization tool) can be used. For
instance, Tables 4, 5, and 6 show the data obtained by anonymizing
the data of Tables 1, 2, and 3, respectively, with the anonymity
degree k = 2.
        </p>
        <p>Workflow Loader. To ensure our system’s interoperability
with existing workflow systems, we decided to handle
workflows specified in the Common Workflow Language (CWL,
https://github.com/common-workflow-language/common-workflow-language).
CWL has recently gained momentum and is currently supported
by major scientific workflow systems. The Workflow Loader
module converts a CWL workflow into an equivalent JSON format,
which is used internally by our system.</p>
        <p>For validation purposes, different experiments were carried out
upon the system described in Section 5. 20 different CWL
workflows (500 executions per workflow) have been used so that
metrics like loading time, the time to identify parameter
dependencies and sensitive parameters, and the time to compute anonymity
degrees could be assessed. The number of operations, sensitive inputs, and
anonymity degrees highlight the differences between these
workflows.</p>
        <p>For each workflow, we compute the minimum, maximum,
and average overhead due to workflow loading, parameter
dependency extraction, sensitive parameter identification, and
anonymity degree computation, across the 10K executions. On
the one hand, Fig. 4 is for workflow loading. The minimum time
is nearly 0ms in most cases, which can hardly be seen on the
chart. The average time is almost the same for all workflows;
i.e., approximately equal to 0.1ms. Regarding the maximum time,
it varies between 1ms and 3ms, which are small numbers. On
the other hand, Fig. 5 is for parameter dependency extraction.
The required minimum and average times can hardly be seen on the
chart; in fact, the extraction of dependencies is instantaneous in
most cases. The required maximum time is less than 0.2ms
for most workflows. However, 3 outliers have been identified,
Workflows 2, 13, and 20, which take almost 15ms in the worst case.
This can be explained by the fact that dependency extraction is
influenced by the number of input and output parameters a
workflow has. The examination of Workflows 2, 13, and 20
revealed that they have a larger number of outputs compared with
the rest of the workflows.</p>
      <p>Regarding the overhead due to sensitive parameter detection
and anonymity degree calculation, it is almost instantaneous for
all workflows, and therefore there was no need to show the charts
for them (also due to limited space). In summary, the results of the
experiments we ran are encouraging and show that the overhead
due to the solution can barely be noticed.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>7 RELATED WORK</title>
      <p>Privacy concerns in the context of workflows have been
examined by a number of proposals. We present these proposals
in this section and conclude by discussing how our work
advances the state of the art.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Gil et al. address the issue of data privacy in the
context of DWfs. To this end, they propose an ontology that
preserves privacy while enforcing access control over data with
respect to a given set of access permissions. The ontology
specifies eligible privacy-preserving policies (e.g., generalization and
anonymization) per DWf input/output parameter. To support
privacy policy enforcement in DWfs, a framework was developed
to represent policies as a set of elements that include applicable
context, data usage requirements, privacy protection requirements,
and corrective actions if the policy is violated.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Chebbi and Tata propose a workflow reduction-based
abstraction approach for workflow advertisement purposes. The
approach reduces a workflow’s inter-visibility using 13 rules that
depend on the dependencies between operations in the workflows
along with the operation types (i.e., internal versus external).
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], Teepe et al. analyze a business workflow specification
to determine the properties that would achieve privacy protection
of a company’s partners and customers. To this end, they
represent workflows as Color-X diagrams and then translate them into
Prolog so that privacy-relevant properties over data can be analyzed,
e.g., the need-to-know principle. This analysis inspects the messages
sent by all employees involved in the business workflow to detect
“gossipy” employees, i.e., those who exchange more information
than they are asked for.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], Sharif et al. introduce MPHC (Multiterminal Cut for
Privacy in Hybrid Clouds), a framework that minimizes the
cost of executing workflows while satisfying both task/data
privacy and deadline/budget constraints. In [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], Sharif et al.
extend MPHC with Bell-LaPadula rules so that all data and tasks
are deployed over hybrid cloud instances with greater or equal
privacy levels.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Alhaqbani et al. propose a privacy-enforcement
approach for business workflows based on 4 requirements: (i)
capture the subject’s (i.e., data owner’s) privacy policy during
workflow specification, on top of the privacy policies defined by
the workflow administrator, (ii) define data properties (i.e., hide
and generalize) linked to private data so that these properties
lead the workflow engine to protect data as per the
subject’s privacy policy, (iii) allocate work while preserving privacy,
i.e., assign a task referring to some manipulation of data to the
employee who has the lowest restriction level according to the
subject’s privacy policy, and (iv) keep the subject informed about
any attempt to access his/her data.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Barth et al. present a privacy-policy violation detection
approach based on execution logs of business processes. The
aim is to identify the set of employees potentially responsible for
a privacy breach. The authors introduce two types of compliance:
strong and weak. An action is strongly compliant with a privacy
policy given a trace if there exists an extension of the trace that
contains the action and satisfies the policy. An action is weakly
compliant with a policy given a trace if the trace augmented
with the action satisfies the present requirements of the privacy
policy.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Davidson et al. discuss privacy-preserving management
of provenance-aware workflow systems. The authors first
formalize the privacy concerns: (i) data privacy, which requires that the
outputs of the workflow’s modules (aka operations) not be revealed
to users without an access privilege, (ii) module privacy, which
requires that the functionality of a module not be revealed, and (iii)
structural privacy, which refers to hiding the structure of the data
flow in a given execution.
      </p>
      <p>
        The aforementioned proposals can be classified into two
categories: those that preserve the privacy of the tasks (operations) of
workflows, exemplified by the works of Barth et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
and Davidson et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and those that preserve the privacy
of the data that workflows manipulate at run-time,
exemplified by the works of Gil et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Teepe et al. [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], and
Alhaqbani et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In contrast, the work of Sharif et al. [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
addresses the privacy of both tasks and data. Our work is
concerned with the privacy of workflow data and hence falls in
line with the second category of proposals. However, in those
proposals, achieving this privacy requires that the workflow designer
manually identify sensitive workflow parameters and set the
degree to which the datasets bound to those parameters need to
be anonymized. We have taken care of both aspects in our work.
      </p>
    </sec>
    <sec id="sec-14">
      <title>CONCLUSION</title>
      <p>
        We presented an approach for preserving privacy in the context
of scientific workflows that heavily rely on large datasets. We
have shown how data plays a role in (i) identifying sensitive
operation parameters in the workflow and (ii) deriving the anonymity
degree that needs to be enforced when publishing the dataset
instances of these parameters. To the best of our knowledge, this
is the first work to look into items (i)
and (ii). We have also implemented a system that showcases our
solution and conducted experiments to assess its efficiency.
This work opens up opportunities for more research in the field
of anonymization of workflow data. In this respect, our
ongoing work includes investigating the applicability of our solution
to anonymization techniques other than k-anonymity, e.g.,
l-diversity and t-closeness [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
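<p>As a concrete reference for the anonymity degree discussed above, k-anonymity [23] holds when every combination of quasi-identifier values occurs at least k times in the published dataset. A minimal check (our illustration, not the system’s implementation):</p>

```python
from collections import Counter

def anonymity_degree(rows, quasi_ids):
    # The dataset is k-anonymous for k equal to the size of the
    # smallest group of rows sharing the same quasi-identifier values.
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return min(groups.values())

# Hypothetical generalized records (zip codes masked, ages bucketed).
rows = [
    {"zip": "690**", "age": "30-39"},
    {"zip": "690**", "age": "30-39"},
    {"zip": "691**", "age": "40-49"},
    {"zip": "691**", "age": "40-49"},
    {"zip": "691**", "age": "40-49"},
]
k = anonymity_degree(rows, ["zip", "age"])  # smallest group has 2 rows
```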
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] [n. d.].
          <article-title>A critique of k-anonymity and some of its enhancements</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Alhaqbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fidge</surname>
          </string-name>
          , and
          <string-name>
            <surname>A. H. M. ter Hofstede</surname>
          </string-name>
          .
          <year>2013</year>
          . Privacy-Aware Workflow Management. Springer, Dortmund, Germany,
          <fpage>111</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alpaydin</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Introduction to Machine Learning</article-title>
          (2nd ed.). The MIT Press, Cambridge, Massachusetts, USA.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Antoniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Baldoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bonatti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Olmedilla</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <source>In Secure Data Management in Decentralized Systems</source>
          . Springer,
          <fpage>169</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Datta</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundaram</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Privacy and Utility in Business Processes</article-title>
          .
          <source>In Computer Security Foundations Symposium - CSF, 6-8 July</source>
          . IEEE, Venice, Italy,
          <fpage>279</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Callahan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Santos</surname>
          </string-name>
          , et al.
          <year>2006</year>
          .
          <article-title>Vistrails: Visualization meets data management</article-title>
          .
          <source>In SIGMOD</source>
          . ACM Press, Chicago, IL, USA,
          <fpage>745</fpage>
          -
          <lpage>747</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chebbi</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Tata</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Workflow Abstraction for Privacy Preservation</article-title>
          .
          <source>In International Conference on Web Information Systems Engineering - WISE, December 3</source>
          . Springer Link, Nancy, France,
          <fpage>166</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Davidson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Tannen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Milo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Enabling Privacy in Provenance-Aware Workflow Systems</article-title>
          .
          <source>In Biennial Conference on Innovative Data Systems Research, January 9-12</source>
          . CIDR Conference, Asilomar, CA, USA,
          <fpage>215</fpage>
          -
          <lpage>218</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>E.</given-names>
            <surname>Deelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gannon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shields</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Taylor</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Workflows and eScience: An Overview of Workflow System Features and Capabilities</article-title>
          .
          <source>Future Generation Computer Systems</source>
          <volume>25</volume>
          ,
          <issue>5</issue>
          (
          <year>2009</year>
          ),
          <fpage>528</fpage>
          -
          <lpage>540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Dolby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Harvey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. P.</given-names>
            <surname>Jenkins</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Raviraj</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Data suppression and regeneration</article-title>
          . (
          <year>2000</year>
          ).
          <source>US Patent 6,038,231</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Cynthia</given-names>
            <surname>Dwork</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Differential Privacy</article-title>
          .
          <source>In Automata, Languages and Programming</source>
          , 33rd International Colloquium, ICALP
          <year>2006</year>
          , Venice, Italy,
          <source>July 10-14</source>
          , Proceedings, Part II. Springer,
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . https://doi.org/10.1007/11787006_1
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Wolf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schuster</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Providing k-anonymity in data mining</article-title>
          .
          <source>The VLDB Journal 17</source>
          ,
          <issue>4</issue>
          (
          <year>2008</year>
          ),
          <fpage>789</fpage>
          -
          <lpage>804</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.K.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ratnakar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K-K.</given-names>
            <surname>Chan</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Privacy Enforcement in Data Analysis Workflows</article-title>
          .
          <source>In AAAI Workshop on Privacy Enforcement and Accountability with Semantics (PEAS)</source>
          . AAAI, Busan, Korea,
          <fpage>41</fpage>
          -
          <lpage>48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ratnakar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          , et al.
          <year>2011</year>
          .
          <article-title>Wings: Intelligent Workflow-Based Design of Computational Experiments</article-title>
          .
          <source>Intelligent Systems</source>
          <volume>26</volume>
          ,
          <issue>1</issue>
          (
          <year>2011</year>
          ),
          <fpage>62</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kargupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Datta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sivakumar</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>On the privacy preserving properties of random data perturbation techniques</article-title>
          .
          <source>In International Conference on Data Mining - ICDM'03</source>
          . IEEE, Melbourne, Florida, USA,
          <fpage>99</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>LeFevre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>DeWitt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramakrishnan</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Mondrian Multidimensional K-Anonymity</article-title>
          .
          <source>In International Conference on Data Engineering, ICDE 2006, 3-8 April</source>
          . IEEE, Atlanta, GA, USA,
          <volume>25</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pacitti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Valduriez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mattoso</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A Survey of Data-Intensive Scientific Workflow Management</article-title>
          .
          <source>J. Grid Comput</source>
          .
          <volume>13</volume>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ),
          <fpage>457</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Machanavajjhala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Venkitasubramaniam</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>l-diversity: Privacy beyond k-anonymity</article-title>
          .
          <source>Transactions on Knowledge Discovery from Data (TKDD) 1</source>
          ,
          <issue>1</issue>
          (
          <year>2007</year>
          ),
          <fpage>3</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Belhajjame</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Cheney</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>The W3C PROV family of specifications for modelling provenance metadata</article-title>
          .
          <source>In Joint 2013 EDBT/ICDT Conferences, EDBT '13 Proceedings, Genoa, Italy, March 18-22, 2013</source>
          . ACM Press,
          <fpage>773</fpage>
          -
          <lpage>776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>F.</given-names>
            <surname>Prasser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Kohlmayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lautenschläger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>ARX - A Comprehensive Tool for Anonymizing Biomedical Data</article-title>
          . In American Medical Informatics Association Annual Symposium. AMIA.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taheri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Zomaya</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nepal</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>MPHC: Preserving Privacy for Workflow Execution in Hybrid Clouds</article-title>
          .
          <source>In International Conference on Parallel and Distributed Computing, Applications and Technologies - PDCAT, December 16-18</source>
          . IEEE, Taipei, Taiwan,
          <fpage>272</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharif</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Watson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taheri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nepal</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Zomaya</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Privacy-Aware Scheduling SaaS in High Performance Computing Environments</article-title>
          .
          <source>IEEE Trans. Parallel Distrib. Syst</source>
          .
          <volume>28</volume>
          ,
          <issue>4</issue>
          (
          <year>2017</year>
          ),
          <fpage>1176</fpage>
          -
          <lpage>1188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sweeney</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>k-anonymity: A model for protecting privacy</article-title>
          .
          <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10</source>
          ,
          <issue>05</issue>
          (
          <year>2002</year>
          ),
          <fpage>557</fpage>
          -
          <lpage>570</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>W.</given-names>
            <surname>Teepe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.P.</given-names>
            <surname>van de Riet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.S.</given-names>
            <surname>Olivier</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>WorkFlow Analyzed for Security and Privacy in using Databases</article-title>
          .
          <source>Journal of Computer Security</source>
          <volume>11</volume>
          ,
          <issue>3</issue>
          (
          <year>2003</year>
          ),
          <fpage>271</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>M.</given-names>
            <surname>Terrovitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mamoulis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalnis</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Privacy-preserving anonymization of set-valued data</article-title>
          .
          <source>VLDB Endowment 1</source>
          ,
          <issue>1</issue>
          (
          <year>2008</year>
          ),
          <fpage>115</fpage>
          -
          <lpage>125</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Oruta: Privacy-Preserving Public Auditing for Shared Data in the Cloud</article-title>
          .
          <source>In Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing. IEEE Computer Society</source>
          ,
          <fpage>295</fpage>
          -
          <lpage>302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakraborty</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Bottom-up generalization: A data mining solution to privacy protection</article-title>
          .
          <source>In International Conference on Data Mining - ICDM'04</source>
          . IEEE, Brighton, UK,
          <fpage>249</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>K.</given-names>
            <surname>Wolstencroft</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Fellows</surname>
          </string-name>
          , et al.
          <year>2013</year>
          .
          <article-title>The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud</article-title>
          .
          <source>Nucleic acids research</source>
          (
          <year>2013</year>
          ),
          <fpage>W557</fpage>
          -
          <lpage>W561</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Worku</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Secure and Efficient Privacy-preserving Public Auditing Scheme for Cloud Storage</article-title>
          .
          <source>Comput. Electr. Eng</source>
          .
          <volume>40</volume>
          ,
          <issue>5</issue>
          (
          <year>2014</year>
          ),
          <fpage>1703</fpage>
          -
          <lpage>1713</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xiao</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Anatomy: Simple and effective privacy preservation</article-title>
          .
          <source>In Proceedings of the 32nd international conference on Very large data bases. VLDB Endowment</source>
          ,
          <fpage>139</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Yiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ghinita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jensen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kalnis</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Outsourcing Search Services on Private Spatial Data</article-title>
          .
          <source>In Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 - April 2, 2009, Shanghai, China</source>
          . IEEE,
          <fpage>1140</fpage>
          -
          <lpage>1143</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>