<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Interactive Anonymization for Privacy aware Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bernd Malle</string-name>
          <email>b.malle@hci-kdd.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Kieseberg</string-name>
          <email>PKieseberg@sba-research.org</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Holzinger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Holzinger Group HCI-KDD, Institute for Medical Informatics, Statistics &amp; Documentation, Medical University Graz</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SBA Research gGmbH</institution>
          ,
          <addr-line>Favoritenstraße 16, 1040 Wien</addr-line>
        </aff>
      </contrib-group>
      <fpage>15</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>Privacy aware Machine Learning is the discipline of applying Machine Learning techniques in such a way as to protect personal identities during the process. This is most easily achieved by first anonymizing a dataset before releasing it for the purpose of data mining or knowledge extraction. Starting in May 2018, this will also be the only legally permitted way within the EU to release data without granting the people involved the right to be forgotten, i.e. the right to have their data deleted on request. For governments, organizations and corporations, this represents a serious impediment to research operations, since any anonymization reduces data utility to a certain degree. In this paper we propose applying human background knowledge via interactive Machine Learning to the process of anonymization; this is done by eliciting human preferences for preserving some attribute values over others in the light of specific tasks. Our experiments show that human knowledge can yield measurably better classification results than a rigid automatic approach. However, the impact of interactive learning in the field of anonymization will largely depend on the experimental setup, such as an appropriate choice of application domain as well as suitable test subjects.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Privacy aware ML</kwd>
        <kwd>interactive ML</kwd>
        <kwd>Knowledge Bases</kwd>
        <kwd>Anonymization</kwd>
        <kwd>k-Anonymity</kwd>
        <kwd>SaNGreeA</kwd>
        <kwd>Information Loss</kwd>
        <kwd>Weight Vectors</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction and Motivation</title>
      <p>In many sectors of today’s data-driven economies, technical progress depends
on data mining, knowledge extraction from diverse sources, and the
analysis of personal information. The latter in particular constitutes a vital
building block for business intelligence and the provision of personalized services, which
modern society has come to expect. Often, the insights organizations need in order
to provide these goods require publication, linkage, and
systematic analysis of personal data sets from heterogeneous sources, exposing
those data to the risk of leakage, with repercussions ranging from mild
inconvenience (exposure of a social profile) to potentially catastrophic ramifications
(leakage of health information to an employer).</p>
      <p>
        Responding to these challenges, governments around the world are
contemplating or already enacting new laws concerning the handling of personal data.
For instance, under the new European General Data Protection Regulation
(GDPR), which takes effect on May 25th, 2018, customers are given a right to be
forgotten, meaning that an organization is obligated to remove a customer’s personal
data upon request. An exception to this rule is only granted to organizations
which anonymize data before analyzing them in any wholesale, automated
fashion. This brings us to the field of Privacy aware Machine Learning (PaML), i.e.
the application of ML algorithms only to previously anonymized data. Such
anonymization can be achieved by perturbing data (e.g. introducing noise into
numerical values, or differential privacy [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) or by k-anonymity [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] (clustering of
data into equivalence groups), which has since become the industry standard.
      </p>
      <p>
        The original requirement of k-anonymity has since been extended by the
concepts of l-diversity [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (every cluster must contain at least l diverse
sensitive values), t-closeness [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (the distribution of sensitive values within a cluster must not diverge
from their global distribution by more than a threshold t) as well as delta-presence [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] (which incorporates the background knowledge
of a potential attacker). Although all of these concepts are interesting in their
own right, we only consider k-anonymity here, for the sake of comparing interactive ML
algorithms to their fully automatic counterparts.
      </p>
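      <p>The following minimal Python sketch illustrates the two cluster-level criteria above on a table whose quasi-identifiers have already been generalized. It assumes that records are plain dictionaries, a single categorical sensitive attribute, distinct l-diversity (the simplest variant), and the t-closeness variant in which all sensitive values are equally distant, so that the Earth Mover’s Distance reduces to half the L1 distance between the local and global distributions. All function names are illustrative; delta-presence is omitted because it additionally requires a public background table.</p>
```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def equivalence_groups(rows: Sequence[Row], qis: Sequence[str]) -> Dict[Tuple[str, ...], List[Row]]:
    """Group records whose (already generalized) quasi-identifiers are identical."""
    groups: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in qis)].append(row)
    return groups


def is_l_diverse(rows: Sequence[Row], qis: Sequence[str], sensitive: str, l: int) -> bool:
    """Distinct l-diversity: every group holds at least l distinct sensitive values."""
    return all(len({r[sensitive] for r in group}) >= l
               for group in equivalence_groups(rows, qis).values())


def distribution(values: List[str]) -> Dict[str, float]:
    """Relative frequency of each value."""
    return {v: c / len(values) for v, c in Counter(values).items()}


def is_t_close(rows: Sequence[Row], qis: Sequence[str], sensitive: str, t: float) -> bool:
    """t-closeness with equal ground distance between all sensitive values,
    where the Earth Mover's Distance equals half the L1 distance."""
    overall = distribution([r[sensitive] for r in rows])
    for group in equivalence_groups(rows, qis).values():
        local = distribution([r[sensitive] for r in group])
        distance = 0.5 * sum(abs(local.get(v, 0.0) - p) for v, p in overall.items())
        if distance > t:
            return False
    return True


table = [
    {"zip": "80**", "age": "20-29", "disease": "flu"},
    {"zip": "80**", "age": "20-29", "disease": "cancer"},
    {"zip": "85**", "age": "30-39", "disease": "flu"},
    {"zip": "85**", "age": "30-39", "disease": "flu"},
]
print(is_l_diverse(table, ["zip", "age"], "disease", 2))  # False: second group only has 'flu'
print(is_t_close(table, ["zip", "age"], "disease", 0.3))  # True: both groups are 0.25 away
```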
      <p>
        Building on our previous work on this topic [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], in which we conducted
a comparison study of binary classification performance on perturbed (selective
deletion) vs. wholesale anonymized data, in this paper we introduce the notion
of interactive Machine Learning for (k-)anonymization.
      </p>
    </sec>
    <sec>
      <title>k-Anonymity</title>
      <p>
        Given the original tabular concept of anonymization, we usually encounter
three different categories of attributes within a given dataset:
        <list list-type="bullet">
          <list-item>
            <p>Personal identifiers are data items which directly identify a person
without having to cross-reference or further analyze them. Examples are an email
address or a social security number (SSN). As personal identifiers are
immediately dangerous, this category of data is usually removed.</p>
          </list-item>
          <list-item>
            <p>Sensitive data, also called ’payload’, represents information that is crucial
for further data mining or research purposes. Examples of this category
are disease classification, drug intake or personal income. This data
must be preserved in the anonymized dataset and can therefore be neither deleted
nor generalized.</p>
          </list-item>
          <list-item>
            <p>Quasi-identifiers (QIs) are data which do not in themselves directly
reveal the identity of a person, but might be used in aggregate to reconstruct
it. For instance, [
              <xref ref-type="bibr" rid="ref18">18</xref>
              ] reported in 2002 that the identity of 87% of U.S. citizens
could be uncovered via just the 3 attributes ZIP code, gender and date of
birth. Despite this danger, QIs may contain information vital to research
applications (like ZIP code in a disease spread study); they are therefore
generalized to an acceptable compromise between privacy (achieved through data loss) and
information content (data utility).</p>
          </list-item>
        </list>
      </p>
      <p>
        Based on this categorization, k-anonymity [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] was introduced as a formal
concept of privacy, in which a record is released only if its quasi-identifiers are
indistinguishable from those of at least k − 1 other entities in the dataset. This can be
pictured as clustering the data into so-called equivalence groups of size at least
k, with all QIs within a group generalized to exactly the same level.
      </p>
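      <p>The definition translates directly into a check on the released table: group records by their (generalized) quasi-identifier values and require every group to contain at least k records. The sketch below is a minimal illustration that assumes records are dictionaries of strings; the function name <monospace>is_k_anonymous</monospace> and the example table are our own.</p>
```python
from collections import Counter
from typing import Dict, Sequence


def is_k_anonymous(rows: Sequence[Dict[str, str]], qis: Sequence[str], k: int) -> bool:
    """True if every combination of QI values occurs in at least k records,
    i.e. each record is indistinguishable from at least k - 1 others."""
    group_sizes = Counter(tuple(row[q] for q in qis) for row in rows)
    return all(size >= k for size in group_sizes.values())


released = [
    {"zip": "80**", "gender": "*", "disease": "flu"},
    {"zip": "80**", "gender": "*", "disease": "cancer"},
    {"zip": "8***", "gender": "f", "disease": "flu"},
    {"zip": "8***", "gender": "f", "disease": "asthma"},
]
print(is_k_anonymous(released, ["zip", "gender"], 2))  # True
print(is_k_anonymous(released, ["zip", "gender"], 3))  # False
```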
      <p>Generalization in this setting means an abstraction of attribute values: e.g.
given two ZIP codes ’8010’ and ’8045’, we could first generalize to ’80**’, then
incorporate another data point with ZIP ’8500’ by generalizing the cluster to
’8***’, and finally merge with any other ZIP code at the highest level, ’all’,
also denoted as ’*’.</p>
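      <p>The ZIP code example can be reproduced by masking trailing digits. The hypothetical helpers below assume a simple digit-masking hierarchy and ZIP codes of equal length; they return the lowest level at which all codes of a cluster coincide, collapsing to ’*’ once every digit is masked.</p>
```python
from typing import Sequence


def generalize_zip(zip_code: str, level: int) -> str:
    """Mask the last `level` digits with '*'; masking all digits yields '*' ('all')."""
    if level >= len(zip_code):
        return "*"
    return zip_code[: len(zip_code) - level] + "*" * level


def common_generalization(zips: Sequence[str]) -> str:
    """Lowest generalization at which all (equal-length) ZIP codes coincide."""
    for level in range(len(zips[0]) + 1):
        generalized = {generalize_zip(z, level) for z in zips}
        if len(generalized) == 1:
            return generalized.pop()
    return "*"


print(common_generalization(["8010", "8045"]))          # 80**
print(common_generalization(["8010", "8045", "8500"]))  # 8***
print(common_generalization(["8010", "1040"]))          # *
```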
    </sec>
    <sec>
      <title>interactive Machine Learning</title>
      <p>
        Interactive ML algorithms adjust their inner workings by continuously
interacting with an outside oracle, drawing positive / negative reinforcement from
this interaction [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Such systems are especially useful for highly personalized
predictions or decision support [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Moreover, for many real-world problems the known algorithms have
(super)exponential runtime; in such cases humans can vastly outperform
machines at approximating solutions and at learning from very small samples, which
enables us to ’intuit’ solutions efficiently [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        By incorporating humans as oracles into this process, we can elicit
background knowledge about specific use cases that is unavailable to automatic algorithms
[
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. However, this depends heavily on the users’ experience in a certain field
as well as on data / classification complexity: domain experts can of course be
expected to contribute more valuable decision points than laymen; likewise, a
low-dimensional dataset and simple classification tasks will result in higher-quality
human responses than convoluted problem sets.
      </p>
      <p>
        While the authors of [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] propose a system that interacts with a user in order
to set a certain k-factor and subsequently provides a report on information loss
and kurtosis of QI distributions, their system is not interactive by our definition,
since the user does not influence the inner workings of the algorithm during the
learning phase. The same is true of the Cornell Anonymization Toolkit
(CAT) [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], which conducts a complete anonymization run and only afterwards
lets the user decide whether they are satisfied with the results. In contrast, our approach
alters algorithmic parameters after every human decision (or batch of decisions), letting the
algorithm adapt in real time.
      </p>
      <p>
        Loh and Then [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] describe an approach incorporating humans into the anonymization
process by allowing them to set constraints on attribute generalization; moreover,
they construct generalization hierarchies involving domain-specific ontologies.
Although this technique marks a departure from wholesale automatic
anonymization, it still lacks the dynamic human-computer interaction of our approach.
      </p>
      <p>
        Apart from the field of privacy, interactive ML is present in a wide spectrum
of applications, ranging from bordering medical fields like protein interactions /
clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] via on-demand group creation in social networks [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to teaching
algorithms suitable mappings from gestures to music-generating parameters [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Experiments</title>
      <p>The following sections describe our experiment in detail, covering the
general iML setting, the chosen dataset, the anonymization algorithm used, and
the overall processing pipeline employed to obtain the results presented.</p>
      <sec id="sec-2-1">
        <title>General setting</title>
        <p>The basic idea of our experiment was to compare different weight vectors
representing attribute (quasi-identifier) importance during anonymization. Say
that a doctor needs to release a dataset for the purpose of studying disease
spread; in this case ’ZIP code’ information is probably (but not necessarily) of
much greater importance than ’occupation’ or ’race’. However, if a skin cancer
study is to be performed, ’race’ information might be of utmost importance,
whereas ’ZIP code’ might be negligible.</p>
        <p>In our experiment, the task was to classify a dataset of people on the target
attributes income, education level and marital status. To this end, we tested an
equal weight vector setting against two others obtained from human experiments:
1) bias, in which users simply specified, by moving sliders, which attributes they
considered important for a specific classification, and 2) iML, in which
users were asked to make a series of clustering decisions by moving a
data row to one of two partially anonymized clusters, thereby conveying
which attributes were more important to preserve than others (Figure 1). Only
the last method constitutes an interactive learning approach, as it introduces an
oracle into the process.</p>
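        <p>The two human-derived settings can be made concrete as follows. In this sketch, bias weights are simply the normalized slider positions, whereas the iML update is a hypothetical rule chosen for illustration (the exact update used in the experiments is not specified here): after each decision, weight moves towards the attributes that the chosen cluster generalizes less than the rejected one. The per-attribute losses (between 0 and 1), the learning rate <monospace>rate</monospace> and the lower bound <monospace>floor</monospace> are all assumptions.</p>
```python
from typing import Dict

Weights = Dict[str, float]


def normalize(weights: Weights) -> Weights:
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {attr: w / total for attr, w in weights.items()}


def bias_weights(sliders: Weights) -> Weights:
    """'bias' setting: slider positions set up front become relative weights."""
    return normalize(sliders)


def iml_update(weights: Weights, loss_chosen: Weights, loss_rejected: Weights,
               rate: float = 0.1, floor: float = 1e-3) -> Weights:
    """Hypothetical 'iML' step after the user assigns a row to one of two clusters.

    loss_* hold per-attribute generalization losses in [0, 1] that merging the
    row into the chosen / rejected cluster would cause."""
    updated = {}
    for attr, w in weights.items():
        # Positive if the user's choice preserved this attribute better.
        preference = loss_rejected[attr] - loss_chosen[attr]
        updated[attr] = max(floor, w + rate * preference)
    return normalize(updated)


equal = {"age": 1 / 3, "zip": 1 / 3, "race": 1 / 3}
new = iml_update(equal,
                 loss_chosen={"age": 0.5, "zip": 0.0, "race": 1.0},
                 loss_rejected={"age": 0.5, "zip": 1.0, "race": 0.0})
print({attr: round(w, 3) for attr, w in new.items()})  # {'age': 0.333, 'zip': 0.433, 'race': 0.233}
```
        <p>Normalizing after every step keeps the weights comparable to the equal weight vector, and the lower bound prevents any attribute from being ignored entirely.</p>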
      </sec>
      <sec id="sec-2-2">
        <title>Data</title>
        <p>We chose the Adult dataset from the UCI Machine Learning repository, which
was derived from 1994 US census data and contains approximately 50k
entries in its original form; this dataset is used by many anonymization researchers
and therefore constitutes a quasi-standard. After initial preprocessing we chose
the first 500 complete data rows as the iML experimental data to be presented to
users. After obtaining bias / iML weights from the experiment, we chose the first
3k entries of the original data as the basis for producing 774 new, anonymized
datasets (775 including the original). Although 3k rows might seem overly frugal, we
verified by randomly deleting original data points that classifier performance
remains stable with as few as 1.5k rows. Of the original attributes (data columns),
4 were deleted: ’capital-gain’ and ’capital-loss’ (both were too skewed to
be useful for humans), ’fnlwgt’ (a mere weighting factor) as well as ’education’,
which is also represented by ’education-num’.</p>
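        <p>A minimal sketch of the row selection and column removal described above, assuming the data has already been parsed into dictionaries keyed by the UCI column names. The UCI Adult files mark missing values with ’?’ (usually preceded by a space), which is what the completeness test looks for. The function name and the <monospace>limit</monospace> parameter (500 for the iML experiment, 3000 for the batch runs) are illustrative.</p>
```python
from typing import Dict, Iterable, List

Row = Dict[str, str]

# Columns removed before anonymization, as listed in the text.
DROPPED_COLUMNS = frozenset({"capital-gain", "capital-loss", "fnlwgt", "education"})


def first_complete_rows(rows: Iterable[Row], limit: int) -> List[Row]:
    """Return the first `limit` rows without missing values ('?'),
    with the dropped columns removed."""
    selected: List[Row] = []
    for row in rows:
        if len(selected) == limit:
            break
        if any(value.strip() == "?" for value in row.values()):
            continue  # incomplete row
        selected.append({col: val for col, val in row.items() if col not in DROPPED_COLUMNS})
    return selected
```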
      </sec>
      <sec id="sec-2-3">
        <title>Anonymization Algorithm</title>
        <p>
          In order to conduct our experiments, we needed an algorithm
that would let us easily hook into its internal logic; we therefore chose a
greedy clustering algorithm called SaNGreeA (Social Network Greedy Anonymization),
introduced by Campan and Truta [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and implemented it in JavaScript. This enabled us
to execute it within a browser environment during our iML experiments as well
as server-side for batch execution on all derived datasets afterwards. As a greedy
clustering algorithm, SaNGreeA has a runtime of O(n<sup>2</sup>), which we were willing
to accept in exchange for its white-box internals.
        </p>
        <p>Apart from its capacity to anonymize graph structures (which we did not use
in this work), it is a relatively simple algorithm that minimizes the generalization
information loss (GIL) during anonymization. GIL is defined as the
sum of the information loss incurred by generalizing continuous (range) as
well as hierarchical (categorical) attributes:</p>
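        <disp-formula>
          <tex-math>\mathrm{GIL}(cl) = |cl| \cdot \left( \sum_{j=1}^{s} \frac{\mathrm{size}(gen(cl)[N_j])}{\mathrm{size}\left(\left[\min_{X \in \mathcal{N}} X[N_j],\ \max_{X \in \mathcal{N}} X[N_j]\right]\right)} + \sum_{j=1}^{t} \frac{\mathrm{height}(\Lambda(gen(cl)[C_j]))}{\mathrm{height}(H_{C_j})} \right)</tex-math>
        </disp-formula>
        <p>where:</p>
        <list list-type="bullet">
          <list-item>
            <p><inline-formula><tex-math>N_1, \ldots, N_s</tex-math></inline-formula> are the numerical (range) and <inline-formula><tex-math>C_1, \ldots, C_t</tex-math></inline-formula> the categorical (hierarchical) quasi-identifiers, and <inline-formula><tex-math>\mathcal{N}</tex-math></inline-formula> is the set of all records <inline-formula><tex-math>X</tex-math></inline-formula>;</p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>gen(cl)</tex-math></inline-formula> is the generalization of cluster <inline-formula><tex-math>cl</tex-math></inline-formula>, i.e. the interval <inline-formula><tex-math>gen(cl)[N_j]</tex-math></inline-formula> and the most specific hierarchy node <inline-formula><tex-math>gen(cl)[C_j]</tex-math></inline-formula> covering all of its values;</p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>|cl|</tex-math></inline-formula> denotes the cardinality of cluster <inline-formula><tex-math>cl</tex-math></inline-formula>;</p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>\mathrm{size}([i_1, i_2])</tex-math></inline-formula> is the size of the interval <inline-formula><tex-math>[i_1, i_2]</tex-math></inline-formula>, i.e. <inline-formula><tex-math>(i_2 - i_1)</tex-math></inline-formula>;</p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>\Lambda(w)</tex-math></inline-formula>, <inline-formula><tex-math>w \in H_{C_j}</tex-math></inline-formula>, is the sub-hierarchy of <inline-formula><tex-math>H_{C_j}</tex-math></inline-formula> rooted in <inline-formula><tex-math>w</tex-math></inline-formula>;</p>
          </list-item>
          <list-item>
            <p><inline-formula><tex-math>\mathrm{height}(H_{C_j})</tex-math></inline-formula> denotes the height of the tree hierarchy <inline-formula><tex-math>H_{C_j}</tex-math></inline-formula>.</p>
          </list-item>
        </list>
        <p>For a partition <inline-formula><tex-math>S = \{cl_1, \ldots, cl_v\}</tex-math></inline-formula> of the <inline-formula><tex-math>n</tex-math></inline-formula> records of a dataset <inline-formula><tex-math>G</tex-math></inline-formula>, the total and the normalized GIL, respectively, are then given by:</p>
        <disp-formula>
          <tex-math>\mathrm{GIL}(G, S) = \sum_{j=1}^{v} \mathrm{GIL}(cl_j)</tex-math>
        </disp-formula>
        <disp-formula>
          <tex-math>\mathrm{NGIL}(G, S) = \frac{\mathrm{GIL}(G, S)}{n \cdot (s + t)}</tex-math>
        </disp-formula>
        <p>Since each fraction in the inner sums is at most 1, <inline-formula><tex-math>\mathrm{GIL}(cl) \le |cl| \cdot (s + t)</tex-math></inline-formula>, so NGIL always lies between 0 and 1.</p>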
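        <p>To make the formulas concrete, the following Python sketch computes GIL for one cluster and NGIL for a partition. It assumes that numeric attributes are given with their global (min, max) range and categorical attributes with a hierarchy encoded as a child-to-parent map. The optional <monospace>weights</monospace> argument, which scales each attribute’s term, is our assumption about how attribute weight vectors could enter the cost; it is not part of the GIL definition above. All names are illustrative.</p>
```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Any]
Hierarchy = Dict[str, str]  # child value -> parent value; the root has no parent


def path_to_root(parent: Hierarchy, value: str) -> List[str]:
    """The value followed by all of its ancestors up to the root."""
    path = [value]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def subtree_height(children: Dict[str, List[str]], node: str) -> int:
    """height(Lambda(node)): height of the sub-hierarchy rooted at node (leaf = 0)."""
    kids = children.get(node, [])
    return 1 + max(subtree_height(children, kid) for kid in kids) if kids else 0


def generalized_value(parent: Hierarchy, values: Sequence[str]) -> str:
    """gen(cl)[C_j]: the most specific hierarchy node covering all values."""
    common = set(path_to_root(parent, values[0]))
    for value in values[1:]:
        common = common.intersection(path_to_root(parent, value))
    return next(node for node in path_to_root(parent, values[0]) if node in common)


def gil(cluster: Sequence[Row], ranges: Dict[str, Tuple[float, float]],
        hierarchies: Dict[str, Hierarchy],
        weights: Optional[Dict[str, float]] = None) -> float:
    """GIL(cl); numeric attributes come with their global (min, max) range."""
    weights = weights or {}
    loss = 0.0
    for attr, (low, high) in ranges.items():
        values = [row[attr] for row in cluster]
        loss += weights.get(attr, 1.0) * (max(values) - min(values)) / (high - low)
    for attr, parent in hierarchies.items():
        children: Dict[str, List[str]] = {}
        for child, par in parent.items():
            children.setdefault(par, []).append(child)
        root = path_to_root(parent, next(iter(parent)))[-1]
        node = generalized_value(parent, [row[attr] for row in cluster])
        loss += weights.get(attr, 1.0) * subtree_height(children, node) / subtree_height(children, root)
    return len(cluster) * loss


def ngil(clusters: Sequence[Sequence[Row]], ranges: Dict[str, Tuple[float, float]],
         hierarchies: Dict[str, Hierarchy]) -> float:
    """NGIL(G, S) = GIL(G, S) / (n * (s + t))."""
    n = sum(len(cl) for cl in clusters)
    total = sum(gil(cl, ranges, hierarchies) for cl in clusters)
    return total / (n * (len(ranges) + len(hierarchies)))


MARITAL = {"Married-civ-spouse": "Married", "Married-AF-spouse": "Married",
           "Never-married": "Not-married", "Divorced": "Not-married",
           "Married": "*", "Not-married": "*"}
cluster = [{"age": 30, "marital": "Married-civ-spouse"},
           {"age": 40, "marital": "Married-AF-spouse"}]
print(round(gil(cluster, {"age": (17, 90)}, {"marital": MARITAL}), 3))    # 1.274
print(round(ngil([cluster], {"age": (17, 90)}, {"marital": MARITAL}), 3))  # 0.318
```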
        <p>The algorithm starts by picking a (random or pre-defined) data row as its
first cluster, then iteratively adds the candidate row that minimizes the GIL of
the cluster until it reaches size k, at which point a new data point is chosen as
the seed of the next cluster; this process continues until all data points are
merged into clusters, satisfying the k-anonymity criterion for the given dataset.</p>
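        <p>The greedy procedure can be sketched as follows, with the cost function (for instance GIL) passed in as a parameter. This is a simplified illustration rather than a faithful reimplementation of SaNGreeA: it seeds each cluster with the first remaining row, ignores the graph-structure part of the algorithm, and, as an assumption, distributes the final leftover rows (fewer than k) to the clusters whose cost grows least. The names <monospace>greedy_k_anonymize</monospace> and <monospace>span_cost</monospace> are hypothetical.</p>
```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Cost = Callable[[List[Row]], float]


def greedy_k_anonymize(rows: Sequence[Row], k: int, cost: Cost) -> List[List[Row]]:
    """Grow each cluster to size k by repeatedly adding the remaining row
    that minimizes `cost` (e.g. GIL) of the enlarged cluster."""
    if k > len(rows):
        raise ValueError("need at least k rows")
    remaining = list(rows)
    clusters: List[List[Row]] = []
    while len(remaining) >= k:
        cluster = [remaining.pop(0)]  # seed: first remaining row (could be random)
        while len(cluster) != k:
            best = min(range(len(remaining)),
                       key=lambda i: cost(cluster + [remaining[i]]))
            cluster.append(remaining.pop(best))
        clusters.append(cluster)
    # Fewer than k rows are left: add each to the cluster whose cost grows least.
    for row in remaining:
        target = min(clusters, key=lambda cl: cost(cl + [row]) - cost(cl))
        target.append(row)
    return clusters


def span_cost(ranges: Dict[str, float]) -> Cost:
    """Numeric-only GIL: |cl| times the sum of normalized value spans."""
    def cost(cluster: List[Row]) -> float:
        return len(cluster) * sum(
            (max(r[a] for r in cluster) - min(r[a] for r in cluster)) / width
            for a, width in ranges.items())
    return cost


rows = [{"age": a} for a in (20, 21, 50, 52, 35)]
clusters = greedy_k_anonymize(rows, 2, span_cost({"age": 32.0}))
print([[r["age"] for r in cl] for cl in clusters])  # [[20, 21, 35], [50, 52]]
```
        <p>Each cluster scan costs O(n) candidate evaluations and is repeated for every row added, which is where the quadratic runtime mentioned above comes from.</p>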
      </sec>
      <sec id="sec-2-4">
        <title>Processing pipeline for obtaining results</title>
        <p>
          Once our iML experiments had yielded enough weight vectors, we had to generate
a whole new set of anonymized datasets, on which we subsequently applied 4
classifiers for each of the 3 target attributes (columns) described. We therefore
designed the following processing pipeline:
          <list list-type="order">
            <list-item>
              <p>Taking the first 5k rows of the original, preprocessed dataset as input and
applying k-anonymization with k-factors of 5, 10, 20, 50, 100 and 200
and 129 different weight vectors (equal, bias, iml) from our experiments,
we produced 774 anonymized datasets (775 including the original).</p>
            </list-item>
            <list-item>
              <p>We executed 4 classifiers on all of the datasets and compared their F1 scores;
the reason for selecting multiple algorithms was to explore whether
anonymization would affect different mathematical approaches to classification
differently. The four algorithms used were linear SVC (as a
representative of Support Vector Machines), logistic regression (gradient descent),
gradient boosting (ensemble, boosting) and random forest (ensemble,
bagging). While reading the datasets pertaining to the classification target
education, we grouped the 14 different education levels present within the Adult dataset
into 4 categories: ’pre-high-school’, ’high school’, ’&lt;=bachelors’
and ’advanced studies’.</p>
            </list-item>
            <list-item>
              <p>For each combination of classification target (income, marital status,
education) and weight category (equal, bias, iml) we averaged the respective
results. Results were plotted per target, as this allows better comparison
between different classifiers. The leftmost point in all plots designates the
original, un-anonymized dataset.</p>
            </list-item>
          </list>
        </p>
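        <p>The bookkeeping behind steps 1 and 3 can be sketched in a few lines: enumerate one anonymization run per (k, weight vector) pair, then average F1 scores over all weight vectors of a category. The placeholder vector identifiers are hypothetical, the grouping key (target, weight category, classifier, k) is our assumption based on the per-target plots, and the anonymization and classification steps themselves are omitted.</p>
```python
from collections import defaultdict
from itertools import product
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple

K_FACTORS = (5, 10, 20, 50, 100, 200)

Key = Tuple[str, str, str, int]             # target, weight category, classifier, k
Result = Tuple[str, str, str, int, float]   # the same fields plus the F1 score


def anonymization_jobs(weight_vectors: Sequence[str]) -> List[Tuple[int, str]]:
    """One anonymization run per (k, weight vector) pair."""
    return list(product(K_FACTORS, weight_vectors))


def average_per_category(results: Iterable[Result]) -> Dict[Key, float]:
    """Mean F1 over all weight vectors of one category, per target, classifier and k."""
    buckets: Dict[Key, List[float]] = defaultdict(list)
    for target, category, classifier, k, f1 in results:
        buckets[(target, category, classifier, k)].append(f1)
    return {key: mean(scores) for key, scores in buckets.items()}


jobs = anonymization_jobs([f"vector-{i}" for i in range(129)])
print(len(jobs))  # 774, i.e. 775 datasets including the original
```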
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results &amp; Discussion</title>
      <p>
        Based on the results of our previous work on PaML [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], we generally expected
classifier performance to follow 1/x-shaped curves as k increases. These
expectations held only to a small degree; moreover, for the targets education and
income there was no clear winner among the weight categories, with each
performing better or worse depending on the specific value of k.
      </p>
      <p>We obtained the smoothest results for the marital status target, with human bias
consistently winning over both equal weights and human interaction (Figure 2).
We attribute this to the significant correlation
between the attributes ’marital-status’ and ’relationship’ in the dataset, which
led users to consciously overvalue the latter when prompted for their bias. It is
not entirely clear why the iML results could not keep up in this case,
but since this seems to be a general phenomenon throughout our results, we
discuss it in a later paragraph.</p>
      <p>On the classification target education, bias still mostly outperforms iML-obtained
attribute weights, with equal weights slightly winning out at very high values of
k (Figure 3). Although we suspect that seemingly important clues towards
education (like income or working hours) might be misleading, this cannot explain
the difference between bias- and iML-based results. It has to be noted, however,
that results on this target are distinctly inferior to those of the other scenarios,
which might diminish the significance of the gap.</p>
      <p>Only on the target income did we observe a partly reversed order between human
bias and iML; however, both were usually inferior to a simple
setting with equal attribute weights (Figure 4). This is especially surprising
because income was the only binary classification task in our experiments, which
should have given humans a slight advantage over the algorithm. On the other
hand, human bias seems most susceptible to certain stereotypes
in the area of money (w.r.t. gender, race, marital status, ...), which would explain
the reversal of results.</p>
      <p>As for the failure of iML to significantly outperform both the equal weight
setting and especially human bias, we conjecture that our experimental setup
produced these effects: since we wanted our users to conduct their experiment in
real time but needed a simple implementation of an anonymization algorithm to
enable this interaction (resulting in an O(n<sup>2</sup>) algorithmic runtime), we had
to limit ourselves to a tiny subset of the data (500 rows, merely 1% of the original
dataset). This choice apparently caused generalizations to proceed far too
quickly, reaching suppression (’all’) levels prematurely and thereby denying our users
sensible clustering choices. On the other hand, the effect could also stem from
users not really trying to contribute to the experiments in a meaningful way;
this could be mitigated by selecting more committed users or choosing
a less serious (perhaps more social) application domain.</p>
      <p>Overall, we were also surprised that a seemingly absurd k-factor of 200
still yielded comparably good results (and in some cases even improved
performance).</p>
    </sec>
    <sec id="sec-4">
      <title>Open problems &amp; future challenges</title>
      <p>As iML for anonymization is still a fledgling sub-area within the larger fields of
privacy and Machine Learning, there are certainly innumerable possibilities
for even basic progress and development. The following list is only a tiny subset
of possible research avenues we deem suitable for our own future work:</p>
      <list list-type="bullet">
        <list-item>
          <p>Explain the unexpected behavior of linear SVC on the income target at
high levels of k, possibly by performing comparison studies on synthetically
generated datasets.</p>
        </list-item>
        <list-item>
          <p>Faster algorithm. Repeat the experiments with a faster algorithmic
implementation so that thousands of data points can be used in real time even
within a browser: this would lead to more relaxed generalizations,
allowing users to make better interactive choices, thus presumably improving
results by a considerable margin.</p>
        </list-item>
        <list-item>
          <p>Expert domain, domain experts. By choosing an expert domain like
cancer studies in combination with proper experts like medical professionals,
we would expect both human bias and iML results to significantly
outperform a pre-defined weight vector.</p>
        </list-item>
        <list-item>
          <p>Different setting. On the other hand, a more ’gamified’ setting such as
recommendations within a social network could motivate amateur users to become
more immersed in the experiment, yielding better results even for mundane
application tasks.</p>
        </list-item>
        <list-item>
          <p>Different data formats. As Artificial Intelligence is slowly reaching
maturity, it is now also applied to non- and semi-structured data like audio/video
or even *omics data. Since images are clearly relevant for medical research,
and humans are extremely efficient at processing them, studying interactive ML
on visual data promises substantial scientific returns.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Motivated by the emerging necessity of Privacy aware data processing, in this work
we presented a fundamental approach to bringing human knowledge to bear on
the task of anonymization via interactive Machine Learning. We devised an
experiment involving the clustering of data points with respect to human preferences
for attribute preservation and tested the resulting parameters on the classification of
anonymized people data into classes of marital status, education and income. Our
preliminary results indicate that human bias can contribute even in
mundane application areas, whereas more complex or convoluted tasks may require
trained professionals or better data preparation (dimensionality reduction etc.).
We also described our insights regarding technical details of iML experiments
and closed by outlining promising avenues for future research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Saleema</given-names>
            <surname>Amershi</surname>
          </string-name>
          , Maya Cakmak, William Bradley Knox, and
          <string-name>
            <given-names>Todd</given-names>
            <surname>Kulesza</surname>
          </string-name>
          .
          <article-title>Power to the People: The Role of Humans in Interactive Machine Learning</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>35</volume>
          (
          <issue>4</issue>
          ):
          <fpage>105</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Saleema</given-names>
            <surname>Amershi</surname>
          </string-name>
          , James Fogarty, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Weld</surname>
          </string-name>
          .
          <article-title>ReGroup: interactive machine learning for on-demand group creation in social networks</article-title>
          .
          <source>Proceedings of the 2012 ACM annual conference on Human Factors in Computing Systems - CHI '12, page 21</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Alina</given-names>
            <surname>Campan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Traian Marius</given-names>
            <surname>Truta</surname>
          </string-name>
          .
          <article-title>Data and structural k-anonymity in social networks</article-title>
          . In
          <source>Privacy, Security, and Trust in KDD</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>54</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Cynthia</given-names>
            <surname>Dwork</surname>
          </string-name>
          .
          <article-title>Differential privacy: A survey of results</article-title>
          . In
          <source>International Conference on Theory and Applications of Models of Computation</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fiebrink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Trueman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.R.</given-names>
            <surname>Cook</surname>
          </string-name>
          .
          <article-title>A meta-instrument for interactive, on-the-fly machine learning</article-title>
          .
          <source>Proc. NIME</source>
          ,
          <volume>2</volume>
          :
          <fpage>3</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M</given-names>
            <surname>Plass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K</given-names>
            <surname>Holzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>GC</given-names>
            <surname>Crisan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>CM</given-names>
            <surname>Pintea</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V</given-names>
            <surname>Palade</surname>
          </string-name>
          .
          <article-title>Towards interactive machine learning (iML): Applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach</article-title>
          . In
          <source>IFIP International Cross Domain Conference and Workshop (CD-ARES)</source>
          , pages
          <fpage>81</fpage>
          -
          <lpage>95</lpage>
          . Springer, Heidelberg, Berlin, New York,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Holzinger</surname>
          </string-name>
          .
          <article-title>Interactive machine learning for health informatics: When do we need the human-in-the-loop?</article-title>
          <source>Springer Brain Informatics (BRIN)</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>119</fpage>
          -
          <lpage>131</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Peter</given-names>
            <surname>Kieseberg</surname>
          </string-name>
          , Bernd Malle,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Frühwirt</surname>
          </string-name>
          , Edgar Weippl, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Holzinger</surname>
          </string-name>
          .
          <article-title>A tamper-proof audit and control system for the doctor in the loop</article-title>
          .
          <source>Brain Informatics</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Ninghui</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tiancheng</given-names>
            <surname>Li</surname>
          </string-name>
          , and Suresh Venkatasubramanian.
          <article-title>t-closeness: Privacy beyond k-anonymity and l-diversity</article-title>
          .
          In
          <source>IEEE 23rd International Conference on Data Engineering, ICDE 2007</source>
          , pages
          <fpage>106</fpage>
          -
          <lpage>115</lpage>
          . IEEE,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Brian C.S.</given-names>
            <surname>Loh</surname>
          </string-name>
          and
          <string-name>
            <given-names>Patrick H.H.</given-names>
            <surname>Then</surname>
          </string-name>
          .
          <article-title>Ontology-enhanced interactive anonymization in domain-driven data mining outsourcing</article-title>
          .
          <source>Proceedings of the 2nd International Symposium on Data, Privacy, and E-Commerce (ISDPE 2010)</source>
          , pages
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Ashwin</given-names>
            <surname>Machanavajjhala</surname>
          </string-name>
          , Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam.
          <article-title>l-diversity: Privacy beyond k-anonymity</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data (TKDD)</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Bernd</given-names>
            <surname>Malle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Kieseberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Holzinger</surname>
          </string-name>
          .
          <article-title>Do not disturb? Classifier behavior on perturbed datasets</article-title>
          .
          <source>Machine Learning and Knowledge Extraction (IFIP CD-MAKE 2017)</source>
          , Lecture Notes in Computer Science, vol.
          <volume>10410</volume>
          , pages
          <fpage>155</fpage>
          -
          <lpage>173</lpage>
          . Springer, Cham,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Bernd</given-names>
            <surname>Malle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Kieseberg</surname>
          </string-name>
          , Edgar Weippl, and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Holzinger</surname>
          </string-name>
          .
          <article-title>The right to be forgotten: towards machine learning on perturbed knowledge bases</article-title>
          .
          <source>International Conference on Availability, Reliability, and Security</source>
          , pages
          <fpage>251</fpage>
          -
          <lpage>266</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Moque</surname>
          </string-name>
          , Alexandra Pomares, and Rafael Gonzalez.
          <article-title>AnonymousData.co: A Proposal for Interactive Anonymization of Electronic Medical Records</article-title>
          .
          <source>Procedia Technology</source>
          ,
          <volume>5</volume>
          :
          <fpage>743</fpage>
          -
          <lpage>752</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Nergiz</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Clifton</surname>
          </string-name>
          .
          <article-title>δ-presence without complete world knowledge</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>22</volume>
          (
          <issue>6</issue>
          ):
          <fpage>868</fpage>
          -
          <lpage>883</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Pierangela</given-names>
            <surname>Samarati</surname>
          </string-name>
          .
          <article-title>Protecting respondents' identities in microdata release</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>13</volume>
          (
          <issue>6</issue>
          ):
          <fpage>1010</fpage>
          -
          <lpage>1027</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Latanya</given-names>
            <surname>Sweeney</surname>
          </string-name>
          .
          <article-title>Achieving k-anonymity privacy protection using generalization and suppression</article-title>
          .
          <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems</source>
          ,
          <volume>10</volume>
          (
          <issue>5</issue>
          ):
          <fpage>571</fpage>
          -
          <lpage>588</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Latanya</given-names>
            <surname>Sweeney</surname>
          </string-name>
          .
          <article-title>k-anonymity: A model for protecting privacy</article-title>
          .
          <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems</source>
          ,
          <volume>10</volume>
          (
          <issue>5</issue>
          ):
          <fpage>557</fpage>
          -
          <lpage>570</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Malcolm</given-names>
            <surname>Ware</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Eibe</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Holmes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Hall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ian H.</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <article-title>Interactive machine learning: letting users build classifiers</article-title>
          .
          <source>International Journal of Human-Computer Studies</source>
          ,
          <volume>55</volume>
          (
          <issue>3</issue>
          ):
          <fpage>281</fpage>
          -
          <lpage>292</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Xiaokui</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Guozhang</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Gehrke</surname>
          </string-name>
          .
          <article-title>Interactive anonymization of sensitive data</article-title>
          .
          <source>Proceedings of the 35th SIGMOD International Conference on Management of Data (SIGMOD '09)</source>
          , page
          <fpage>1051</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>