<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy-preserving Active Learning on Sensitive Data for User Intent Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Borja Balle</string-name>
          <email>pigem@amazon.co.uk</email>
          <aff>Amazon Research</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oluwaseyi Feyisetan</string-name>
          <email>sey@amazon.com</email>
          <aff>Amazon Research</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Drake</string-name>
          <email>draket@amazon.com</email>
          <aff>Amazon Research</aff>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Diethe</string-name>
          <email>tdiethe@amazon.co.uk</email>
          <aff>Amazon Research</aff>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Active learning holds the promise of significantly reducing data annotation costs while maintaining reasonable model performance. However, it requires sending data to annotators for labeling. This presents a possible privacy leak when the training set includes sensitive user data. In this paper, we describe an approach for carrying out privacy-preserving active learning with quantifiable guarantees. We evaluate our approach by showing the tradeoff between privacy, utility and annotation budget on a binary classification task in an active learning setting.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Preserving data privacy is an essential tenet required to
maintain the bond of trust between consumers and corporations.
Consumers expect their data to remain secure while being
used to design better services for them without
compromising their identities – especially while carrying out sensitive
transactions and interactions. We define these potentially
compromising and personally identifiable data as sensitive
data. Annotated data drives the machine learning economy
and sensitive data holds the key to building richer experiences
for users interacting with modern AI interfaces. However, in
a bid to get annotations, sensitive data in the wrong hands
could lead to irreparable damage in terms of reputation and
trust between data holders and their users.</p>
      <p>
        This potential data transfer deserves greater scrutiny in
the era of human-powered crowdsourcing and active learning.
As niche classification tasks arise to power new applications,
they often lack an abundance of pre-annotated datasets. With
active learning, the learner can select a subset of the available
data points to be annotated. This can, in some cases, reduce the
number of training queries required exponentially
        <xref ref-type="bibr" rid="ref36">(Settles 2010)</xref>
        . However, the cost
        <xref ref-type="bibr" rid="ref2 ref36 ref7">(Dasgupta 2011; Arora, Nyberg,
and Rosé 2009; Settles 2010)</xref>
        of labelling machine learning
datasets is traditionally viewed as a function of expertise,
time, or price.
      </p>
      <p>In this paper, we argue that, for non-public datasets, the
cost of learning the true labels should also factor in the
privacy of the information contributed by the data owners to
the data custodians. As a result, the active learning
condition (beyond simply selecting the best examples) becomes one of
selecting the most informative examples at the lowest privacy cost.</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>In this section, we present an introduction to active learning
and the privacy challenge of outsourcing queries to the crowd.
We then describe k-anonymity, its shortcoming in providing
an adequate privacy model for active learning and how this
can be improved with differential privacy.</p>
      <sec id="sec-2-1">
        <title>Active Learning</title>
        <p>
          The central premise of active learning is that a model can
perform as well with less data if the learner can select the
training examples that provide the most information
          <xref ref-type="bibr" rid="ref36">(Settles 2010)</xref>
          . Formally described, using a classification task:
let D be a distribution over X × Y, where the goal is to
output a label y from the label space Y = {±1} given an input
from the feature space X. The learner receives a batch of
i.i.d. draws (x1, y1), ..., (xn, yn) from the unknown
underlying distribution D. The value of yi is unknown unless an
annotation request is made by the learner. The objective is
to select a hypothesis function h : X → Y, where err(h) =
Pr_(x,y)∼D[h(x) ≠ y] is small. Given that H is the space of
all hypotheses, and h* = argmin{err(h) : h ∈ H} is the
hypothesis with minimum error, the aim of active learning is to
select a hypothesis h ∈ H with error err(h) within reasonable
bounds of err(h*) by using few annotation requests (i.e., few
compared to a passive learner).
        </p>
        <p>
          Various strategies have been proposed to implement an
active learner. One is uncertainty sampling
          <xref ref-type="bibr" rid="ref26">(Lewis and Gale
1994)</xref>
          , which attempts to select the query x′ that the model is
least convinced about; i.e., x′ = argmax_x 1 − P_θ(ŷ|x), where
ŷ is the label with the highest posterior for model θ and x
ranges over all the unlabeled examples in the
training pool. Other approaches to uncertainty sampling use
either the margin between the two most probable classes ŷ1
and ŷ2, i.e., x′ = argmin_x P_θ(ŷ1|x) − P_θ(ŷ2|x), or a general
entropy-based uncertainty over all the possible ŷi classes, i.e.,
x′ = argmax_x −Σ_i P_θ(ŷi|x) log P_θ(ŷi|x).
        </p>
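        <p>The three acquisition scores above can be sketched in a few lines of Python (a minimal illustration, not the paper's implementation; the pool and probabilities are made up):</p>

```python
import math

def least_confidence(probs):
    # uncertainty = 1 - P(y_hat | x); higher means the model is less sure
    return 1.0 - max(probs)

def margin_uncertainty(probs):
    # negated margin between the two most probable classes, so that a
    # *smaller* margin yields a *larger* uncertainty score
    top = sorted(probs, reverse=True)
    return -(top[0] - top[1])

def entropy_uncertainty(probs):
    # Shannon entropy over all classes
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_query(pool_probs, score=least_confidence):
    # x' = argmax over the unlabeled pool of the chosen uncertainty score
    return max(range(len(pool_probs)), key=lambda i: score(pool_probs[i]))
```

        <p>For a pool of posteriors [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]], all three strategies select the last (most uncertain) example.</p>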
        <p>
          The main privacy issue with active learning stems from the
need to scale the annotation process by crowdsourcing the
labels via an open call
          <xref ref-type="bibr" rid="ref20">(Howe 2006)</xref>
          . Whenever a request is made
to an external resource, a privacy cost is paid by
transmitting the information to be annotated. This problem
is compounded when there is only one oracle
          <xref ref-type="bibr" rid="ref28 ref3 ref34">(Avidan and
Butman 2007)</xref>
          or collusion among crowd workers. In this
paper, we describe privacy notions that can be used to address
these concerns along the privacy-utility tradeoff spectrum.
        </p>
        <p>Privacy-preserving machine learning</p>
        <p>
          k-Anonymity At first glance, a straightforward approach
for addressing the privacy concerns of active learning could
be through k-anonymity
          <xref ref-type="bibr" rid="ref35 ref8">(Sweeney 2002; Di Castro et al.
2016)</xref>
          ; i.e., ensuring each query that is sent out for
crowdsourcing occurs at least k times. In deploying k-anonymity,
the first step involves identifying a set of quasi-identifiers. In
our context, these are user queries which can potentially be
combined with an externally available dataset to uniquely
identify a user. The frequency set of these quasi-identifiers
represents the number of occurrences in the dataset. We
therefore say that a dataset satisfies k-anonymity relative to the
quasi-identifiers if, when it is projected onto an external dataset,
each value in the frequency set is greater than or equal to k.
        </p>
        <p>To achieve k-anonymity when the size of the frequency
set is less than a desired k, the attributes are anonymized by
either generalizing or suppressing the information. For
example, marital status attributes listed as married, divorced or
widowed are generalized as once married, while the ethnicity
is redacted as *****.</p>
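        <p>In the frequency-based form used later in this paper, where the entire query is the quasi-identifier, checking k-anonymity reduces to a frequency filter; a minimal sketch (illustrative names, not the production pipeline):</p>

```python
from collections import Counter

def k_anonymous(queries, k):
    # treat the entire query string as the quasi-identifier and keep
    # only those whose frequency in the dataset is at least k
    freq = Counter(queries)
    return {q for q, c in freq.items() if c >= k}
```

        <p>For example, k_anonymous(["play music", "play music", "call mum"], 2) keeps only "play music", since "call mum" occurs fewer than k times.</p>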
        <p>
          Despite its promise, k-anonymity has fundamental
challenges, some of which are exacerbated by our unstructured
data domain. First,
          <xref ref-type="bibr" rid="ref1">(Aggarwal 2005)</xref>
          demonstrated that
k-anonymity suffers from the curse of dimensionality:
generalization (such as with traditional database columns)
requires co-occurrence of words across different examples,
but unstructured data such as text phrases tend to follow a
heavy-tailed distribution that has low co-occurrence of
words. Secondly, the choice of quasi-identifiers might
exclude the selection of some useful sensitive attributes which
could then be used for re-identification attacks. This led to
other approaches such as l-diversity
          <xref ref-type="bibr" rid="ref10 ref32">(Machanavajjhala et al.
2006)</xref>
          and t-closeness
          <xref ref-type="bibr" rid="ref28 ref3 ref34">(Li, Li, and Venkatasubramanian 2007)</xref>
          to handle sensitive attributes. In our implementation, we
subsume the quasi-identifiers to include the entire user query.
        </p>
        <p>Therefore, by ‘hiding in the crowd’ of k, a user has
received some assurance from k-anonymity that their sensitive
query will not be outsourced from the active learning model
unless it passes a meaningful threshold. However, stronger
formal privacy guarantees are required to demonstrate that
given the user’s query, an attacker cannot decide where it
came from with certainty. With k-anonymity, we are unable
to directly quantify a privacy loss value, nor state the bounds
of the guarantee of this loss. These two quantities are
obtainable from a differential privacy model which we now
describe.</p>
        <p>
          Differential Privacy To motivate our discourse on why
we need stronger privacy guarantees than what k-anonymity
provides, we consider a hypothetical scenario: would a user
be comfortable asking an AI agent a sensitive question, with
the knowledge that the question will possibly be used to
further train the agent's learning model? We denote the training
data available to the model before the user submission as D,
and the data after the user question as D′. These are adjacent
datasets differing in only one record. We posit that a user c
will be comfortable if (1) Q(D) = Q(D′), where Q is a query
over the dataset; and (2) P(S(c)|D′) = P(S(c)), where S
is a user secret. These two points are articulated in Dalenius's
desideratum
          <xref ref-type="bibr" rid="ref12">(Dwork 2011)</xref>
          that:
        </p>
        <p>“Anything that can be learned about a respondent from
the statistical database should be learnable without
access to the database.”
However, we cannot make these exact guarantees: datasets
are meant to convey information, and they would have no
utility if these conditions held.</p>
        <p>
          What Differential Privacy
          <xref ref-type="bibr" rid="ref12 ref22 ref9">(Dwork 2011; Dwork and Roth
2014)</xref>
          offers is a strong privacy guarantee on adjacent datasets
(returning to our AI agent example): the example selected for
active learning will be very similar whether or not the user
added their sensitive question. This means an adversarial
annotator receiving a random training query cannot guess
with certainty whether the query came from dataset D (which
does not include the user's query) or D′ (which does).
        </p>
        <p>With this, we state that a randomized algorithm M :
N^X → 𝒞 that receives as input a dataset D with records
from a universe X and outputs an element from 𝒞 is
(ε, δ)-differentially private if for every pair of databases D and
D′ differing in one record and every possible set of outputs
C ⊆ 𝒞 we have</p>
        <p>Pr[M(D) ∈ C] ≤ e^ε · Pr[M(D′) ∈ C] + δ   (1)</p>
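        <p>As a concrete (if standard, and not from this paper) instance of inequality (1), randomized response reports a sensitive bit truthfully with probability 3/4 and flips it otherwise; the likelihood ratio of any output under adjacent inputs is then at most 3, i.e., ε = ln 3 with δ = 0:</p>

```python
import math
import random

def randomized_response(bit, rng=random):
    # report the true bit with probability 3/4, otherwise flip it
    return bit if rng.random() < 0.75 else 1 - bit

# worst-case likelihood ratio Pr[out=1 | bit=1] / Pr[out=1 | bit=0]
# is 0.75 / 0.25 = 3, so the mechanism is (ln 3)-differentially private
EPSILON = math.log(0.75 / 0.25)
```

        <p>Any single reported bit is therefore plausibly deniable, yet aggregate statistics over many users remain recoverable.</p>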
        <p>The parameter δ accounts for a δ &lt; 1/‖D‖₁ relaxed chance
of the ε guarantee not holding – otherwise, it would be
equivalent to just selecting a random sample on the order of the size
of the dataset. One benefit of the differential privacy model
is that it has a quantifiable, non-binary value for privacy loss,
which helps in comparatively selecting one
algorithm over another. If we observe an output C ∼ M(D) of the
random algorithm and believe that C was more
likely produced by D and not D′, then the privacy loss from
the query that yields C on an auxiliary input x is:</p>
        <p>L(C; M, x, D, D′) := ln( Pr[M(D) = C] / Pr[M(D′) = C] )   (2)</p>
        <p>So we surmise that differential privacy promises to prevent
a user from sustaining additional damage by including their
data in a dataset; the privacy loss incurred is ε with
probability 1 − δ.</p>
        <p>
          A common method for making the results of a statistical
query differentially private involves adding Laplacian noise
proportional to either the query's global sensitivity
          <xref ref-type="bibr" rid="ref10 ref11 ref32">(Dwork
2008; Dwork et al. 2006)</xref>
          or the smooth bound of the local
sensitivity
          <xref ref-type="bibr" rid="ref28 ref3 ref34">(Nissim, Raskhodnikova, and Smith 2007)</xref>
          (where the
sensitivity is Δf = max ‖f(D) − f(D′)‖). However, for
non-continuous domains, adding noise can have unintended
consequences that completely wipe out the utility of the
results; e.g.,
          <xref ref-type="bibr" rid="ref22 ref9">(Dwork and Roth 2014)</xref>
          describe how attempting
to add noise to the query for the optimal price for an auction
could drive the revenue to zero.
        </p>
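        <p>A minimal sketch of the Laplace mechanism for a numeric query, via inverse-CDF sampling (illustrative only; it assumes a known global sensitivity and is not the mechanism used in this paper):</p>

```python
import math
import random

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=random):
    # add Laplace noise with scale b = sensitivity / epsilon
    scale = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_answer + noise
```

        <p>With a large ε the noise is negligible; as ε shrinks, the noise scale (and with it the privacy protection) grows.</p>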
        <p>
          Research has however shown that apart from providing
reasonable and well understood protection from inadvertent
exposure
          <xref ref-type="bibr" rid="ref35 ref8">(Di Castro et al. 2016)</xref>
          , k-anonymity can also be
used as a launchpad for achieving quantifiable differential
privacy without the utility loss that comes from applying
noise
          <xref ref-type="bibr" rid="ref17 ref30 ref31">(Li, Qardaji, and Su 2012; Soria-Comas et al. 2014)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Privacy Preserving Active Learning</title>
    </sec>
    <sec id="sec-4">
      <title>Framework</title>
      <p>This section introduces our proposed framework for carrying
out active learning with privacy guarantees on queries that
are sent to an external oracle. It presents the task we try
our approach on, highlights the considerations that drive
our choices and lays out a high level pseudo-code of our
approach.</p>
      <sec id="sec-4-1">
        <title>Task model</title>
        <p>
          Our task consists of a very large dataset of user queries
U = {u1, ..., un} that represent user intents (we map
the queries to the intent and do not extract specific
quasi-identifiers, in order to prevent leakages from un-captured
sensitive attributes). Our pipeline consists of an active
learning model which learns a binary classifier, predicting whether a
user intent belongs to a specified class or not. The model
is bootstrapped with a golden set of user queries and their
associated intents. Subsequent queries from a fixed pool are
added to a RANKEDEXAMPLEPOOL, where they are ordered
by the confidence/uncertainty
          <xref ref-type="bibr" rid="ref14 ref18">(Gal and Ghahramani 2016)</xref>
          of
our deep learning model.
        </p>
        <p>To train the model, it first draws on the golden set; then we
make a next-example call to draw an uncertain query from
the pool, with the criterion that knowing the accurate intent of
this query gives the best performance increase to the model
while preserving privacy. The query is then outsourced to
external annotators and the annotated labels are re-incorporated
into the model training process.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Considerations</title>
        <p>Given the size and projected scale of our dataset (≈ 10^9
queries), we employ randomized probabilistic
algorithms to estimate whether a query satisfies k-anonymity.
Compute and memory resources are thus freed up for training and
retraining the model rather than maintaining the frequency
and cardinality of incoming queries. Each algorithm (detailed
below) is adjusted to prevent over-estimations, which could
erode the privacy guarantees. Furthermore, after a query is
presumed to satisfy k-anonymity, only 1 of the k queries is
sent to the n external annotators, to prevent an aggregation of
privacy losses.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Approach</title>
        <p>
          In this paper, we adopt the differential privacy algorithm
from
          <xref ref-type="bibr" rid="ref17 ref30 ref31">(Li, Qardaji, and Su 2012)</xref>
          but we utilize it in an active
learning setting to select a subset of training examples to
send for crowdsourcing. We also note that other DP methods
that have been designed for search logs and include a form
of k parameter aggregation such as:
          <xref ref-type="bibr" rid="ref23">(Korolova et al. 2009)</xref>
          ,
ZEALOUS
          <xref ref-type="bibr" rid="ref16">(Gotz et al. 2012)</xref>
          and SafeLog
          <xref ref-type="bibr" rid="ref35 ref8">(Zhang et al. 2016)</xref>
          can be implemented to obtain similar results.
        </p>
        <p>
          We take a two-step approach to extend k-anonymity
to yield a quantifiable differentially private active learning
model, taking a cue from how
          <xref ref-type="bibr" rid="ref17 ref30 ref31">(Li, Qardaji, and Su 2012)</xref>
          demonstrated the use of pre-sampling to achieve differential
privacy with k-anonymity. This is predicated on THEOREM 1
from
          <xref ref-type="bibr" rid="ref17 ref30 ref31">(Li, Qardaji, and Su 2012)</xref>
          , which states that: given an
algorithm M which satisfies (β1, ε1, δ1)-differential privacy
under sampling, then M also satisfies (β2, ε2, δ2)-differential
privacy under sampling for any β2 &lt; β1, where
        </p>
        <p>ε2 = ln(1 + (β2/β1)(e^ε1 − 1));   δ2 = (β2/β1) δ1   (3)</p>
        <p>
          Therefore, k-anonymity on our full dataset (i.e., β1 = 1)
can instead be preceded by a mechanism that samples each
row of its input with probability β2, with k-anonymity then
applied to the resulting sub-sample to yield (ε2, δ2)-differential
privacy for ε2 = ln(1 + β2(e^ε1 − 1)) within the bounds
δ2 = β2 δ1. Thus the effect of sampling serves to amplify
pre-existing privacy guarantees
          <xref ref-type="bibr" rid="ref5">(Balle, Barthe, and Gaboardi
2018)</xref>
          .
        </p>
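        <p>Theorem 1 is easy to evaluate numerically; a small helper sketch (using the β1 = 1 specialization from this section by default):</p>

```python
import math

def amplified_epsilon(beta2, eps1, beta1=1.0):
    # epsilon after subsampling: ln(1 + (beta2/beta1) * (e^eps1 - 1))
    return math.log(1.0 + (beta2 / beta1) * (math.exp(eps1) - 1.0))

def amplified_delta(beta2, delta1, beta1=1.0):
    # delta shrinks linearly with the sampling rate
    return (beta2 / beta1) * delta1
```

        <p>For example, with ε1 = 1 and β2 = 0.1, the amplified ε2 ≈ 0.159, a substantially stronger guarantee than the un-sampled ε1.</p>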
        <p>
          Furthermore, we harden our k-anonymity to offer ‘safe’
k-anonymization by aggregating the queries by frequency
rather than using a distance based measure
          <xref ref-type="bibr" rid="ref24">(LeFevre, DeWitt,
and Ramakrishnan 2006)</xref>
          . The benefit we get from this is that
no query within our set of k contains any extraneous sensitive
text which could be used as a source of re-identification or to
carry out reconstruction attacks.
        </p>
        <p>The next sections describe: how we carry out our sampling
to ensure we select useful candidates in an efficient manner,
and how we estimate k-anonymity using the queries.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Efficient sub-sampling for active learning</title>
      <p>Given a multiset of query sets M = {U1, ..., Us} with
repetitions, where a given Ui is a tuple ⟨u1, ..., uk⟩, and a sampling
rate β, our objective is to return a sub-sample from which to
carry out k-anonymization before training our active learner.
Let n be the number of distinct query sets {U1, ..., Us}
with elements {e1, ..., en}. For a very large dataset size s, we
seek to estimate n̂ using only m registers, where m &lt;&lt; n.
The number of distinct queries in our sample set therefore
becomes n̂.</p>
      <p>Algorithm 1: Privacy-preserving Active Learning
1: Let β be the sampling rate // β = β2 from (3)
2: Let k be the anonymity parameter
3: Let l be the number of samples for the variance calculation
Data: input multiset of samples x with unknown labels y,
as U = {x1, y1}, ..., {xn, yn}
Result: U′, the filtered private multiset
4: Bootstrap the AL probabilistic model P_θ with labeled
utterances L
5: Retrieve a random sub-sample U′ of size βn
6: Pool creation: add queries to the RANKEDEXAMPLEPOOL
7: for {x, y} ∈ U′ do
8:   retrieve freq(x);
9:   if freq(x) ≥ k then
10:    retrieve the variance σx = var(P_θ(ŷ|x) : 1..l); // computed using l draws
11:    add ⟨x, σx⟩ to the RANKEDEXAMPLEPOOL;
12:  else
13:    remove {x, y} from U′;
14:  end
15: end
16: Acquire labels for the top examples sorted by σx
descending, as {ŷ1, ..., ŷn̂}
17: Set U′ to be (x1, ŷ1), ..., (xn̂, ŷn̂)
18: Update model: draw the next training example from U′
over σx
19:   get x̂ = argmax_x 1 − P_θ(ŷ|x); // the sample we are least confident of
20:   retrain the learning model using {x̂, ŷ}
21: return U′</p>
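      <p>The core filtering loop of Algorithm 1 can be sketched as follows (hypothetical helper signatures; the frequency oracle and uncertainty score stand in for the sketch-based and model-based estimates described in the surrounding sections):</p>

```python
import random

def private_filter(pool, freq, k, beta, uncertainty, rng=random):
    """Pre-sample with rate beta, enforce k-anonymity by frequency,
    then rank the survivors by model uncertainty (most uncertain first)."""
    sub = [x for x in pool if rng.random() < beta]   # privacy amplification
    survivors = [x for x in sub if freq(x) >= k]     # k-anonymity filter
    return sorted(survivors, key=uncertainty, reverse=True)
```

      <p>With beta = 1.0, a pool ["a", "b", "c"], frequencies {a: 5, b: 1, c: 3} and k = 2, the rare query "b" is dropped and the survivors are returned in decreasing uncertainty order.</p>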
      <p>
        To estimate the cardinality n̂, we utilize the
HYPERLOGLOG algorithm by
        <xref ref-type="bibr" rid="ref13">(Flajolet et al. 2007)</xref>
        .
HYPERLOGLOG is a probabilistic cardinality estimator that uses
a very small memory footprint (≈ 12 KB per key) for a low
standard error (≈ 0.81%) while scaling up to dataset sizes as
large as 2^64 items (values taken from the Redis implementation
of HYPERLOGLOG: http://antirez.com/news/75).
      </p>
      <p>For each incoming Ui : ⟨u1, ..., uk⟩, a hash h(Ui) is
computed and converted to base 2. The b least significant bits
are used to identify the register location to modify, where
2^b = m (i.e., b = log2 m). With the remaining bits w, a count ρ(w) is
made of the number of running 0s up to the leftmost 1. For a
very large, uniformly distributed multiset of random numbers,
2 raised to the maximum value of ρ(w) gives a wide
approximate of the cardinality. To correct this, HYPERLOGLOG
breaks the multiset into subsets and uses the harmonic mean
of the subsets.</p>
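      <p>The register mechanics just described can be condensed into a small, self-contained Python sketch (illustrative only; a production system would use a bias-corrected implementation such as the Redis one cited above):</p>

```python
import hashlib
import math

class HyperLogLog:
    """Simplified HyperLogLog cardinality estimator (illustrative sketch)."""

    def __init__(self, b=10):
        self.b = b            # use m = 2^b registers
        self.m = 1 << b
        self.registers = [0] * self.m

    def add(self, item):
        h = int(hashlib.sha1(item.encode()).hexdigest(), 16) & ((1 << 64) - 1)
        j = h & (self.m - 1)  # b least significant bits pick a register
        w = h >> self.b       # remaining 64 - b bits
        # rank = position of the leftmost 1-bit in w (1-indexed)
        rank = (64 - self.b) - w.bit_length() + 1
        self.registers[j] = max(self.registers[j], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias correction for m >= 128
        z = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / z
```

      <p>The harmonic mean across registers (the z term) is what tames the otherwise wide single-register approximation.</p>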
      <p>After determining our sample size, the next step is to draw
a random set of unique samples without replacement up to
n̂. We keep each element in the dataset with probability β.
The ensuing sub-sample represents the new dataset in our
RANKEDEXAMPLEPOOL from which we will carry out our
k-anonymization.</p>
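      <p>The keep-with-probability-β step itself is a one-liner; a sketch of this Bernoulli sampling (illustrative, with a seed added for reproducibility):</p>

```python
import random

def bernoulli_subsample(rows, beta, seed=0):
    # keep each row independently with probability beta
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < beta]
```
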
    </sec>
    <sec id="sec-6">
      <title>Estimating k-anonymity using query frequency</title>
      <p>
        Given a multiset of query sets M = {U1, ..., Us} with
repetitions, such that the frequency of Ui is f_Ui and Ui is a
tuple ⟨u1, ..., uk⟩. For a very large dataset size s, we seek to
estimate f̂_Ui using sub-linear space. To estimate the query
frequency, we use the COUNT-MEAN-MIN sketch algorithm
with conservative update
        <xref ref-type="bibr" rid="ref17 ref30 ref31">(Goyal, Daumé III, and Cormode 2012)</xref>
        , which is an improvement on the COUNT-MIN
sketch algorithm proposed by
        <xref ref-type="bibr" rid="ref6">(Cormode and Muthukrishnan 2005)</xref>
        .
For each incoming Ui : ⟨q1, ..., qk⟩, d different hashes of the
queries are computed and a counter indexed by each hashed
result is incremented. To return the frequency, the minimum
over all d index locations for Ui is returned. To further reduce
the potential for error from over-estimation, conservative
updates are employed to increment only the minimum counter
among the d indexes, and an estimated noise term is further deducted
from the result.
      </p>
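      <p>A compact sketch of COUNT-MIN with conservative update (the extra noise-deduction step of COUNT-MEAN-MIN is omitted for brevity; the hash salting and table sizes are illustrative choices, not those of the paper):</p>

```python
import hashlib

class ConservativeCountMin:
    """Count-Min sketch with conservative update (illustrative sketch)."""

    def __init__(self, d=4, w=4096):
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]

    def _indices(self, item):
        # derive d hash functions by salting a single digest
        return [int(hashlib.md5(f"{i}:{item}".encode()).hexdigest(), 16) % self.w
                for i in range(self.d)]

    def add(self, item):
        idx = self._indices(item)
        est = min(self.table[i][j] for i, j in enumerate(idx))
        # conservative update: raise only the counters at the current minimum
        for i, j in enumerate(idx):
            self.table[i][j] = max(self.table[i][j], est + 1)

    def query(self, item):
        idx = self._indices(item)
        return min(self.table[i][j] for i, j in enumerate(idx))
```

      <p>Like plain COUNT-MIN, this sketch never under-estimates a frequency; the conservative update simply tightens the over-estimation.</p>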
      <p>Therefore, after the initial pre-sampling step, we select only
queries which occur at least k times. These queries are
then added to the RANKEDEXAMPLEPOOL, where the
next example is drawn based on the element with the highest
uncertainty measure. The benefit of using the frequency to
satisfy k-anonymity, rather than using partitioning,
clustering and recoding, or distance-based algorithms, is to prevent
attacks that arise from an attacker's a priori knowledge of a
dataset. For example, a cluster of k with one sensitive or
extreme outlier (e.g., a cluster of incomes within a zip code
with one ultra-high-net-worth (UHNW) outlier) becomes easily
identifiable by an attacker even though the aggregation was
based on nearest neighbors.</p>
    </sec>
    <sec id="sec-7">
      <title>Experiments</title>
      <p>Our work seeks to demonstrate quantifiable privacy-preserving
guarantees in an active learning setting by taking a
pre-sampling approach before carrying out k-anonymization. We
evaluate our approach on an internal dataset used for intent
classification on voice devices.</p>
      <sec id="sec-7-1">
        <title>Datasets</title>
        <p>The Intent Classifier dataset consists of a subset of queries
from February 2018. The dataset is used to train a model
which determines a binary intent for a user. The dataset
consists of 2.5M queries comprising 58K distinct data points.
Each record contains a user query and a label indicating if
it is categorized as a POSITIVE or NEGATIVE intent query.
Part of the dataset has also been previously discussed and
described by (Yang et al. 2018). Figures 1c and 1d show the
nature of the dataset with a histogram and plot of the
frequency distribution of the queries. As expected with textual
data, there is a long tail of queries which were observed just
once (making up ≈ 60% of the dataset). The dataset consists
of 63% of queries labelled as POSITIVE intents vs 27% being
NEGATIVE.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Experiment setup</title>
        <p>The experiment task was binary intent classification in an
active learning setting. We created a new baseline model
which predicts POSITIVE and NEGATIVE intents. For the
experiments, the model was initially bootstrapped with 1,000
labeled examples. The active learner then queries a data pool
to get a batch of additional training examples to improve the
model. The active learning strategy was uncertainty sampling
based on confidence scores.</p>
        <p>The confidence and uncertainty scores for the active
learning model were obtained from a Bayesian deep
learning model described in (Yang et al. 2018), where model
uncertainty, quantified by Shannon entropy, is</p>
        <p>U(x) = −Σ_c p̄_c log p̄_c, where p̄_c = (1/T) Σ_t p̂_c is the averaged
predicted probability of class c for x, sampled T times by
Monte Carlo dropout. A histogram of the confidence and
uncertainty scores can be seen in Figures 1a and 1b.</p>
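        <p>The entropy computation above can be sketched directly (illustrative; the Bayesian model and the dropout sampling itself are from (Yang et al. 2018) and not reproduced here):</p>

```python
import math

def predictive_entropy(mc_probs):
    """Shannon entropy of the MC-dropout-averaged class probabilities.

    mc_probs: T probability vectors, one per stochastic forward pass."""
    T = len(mc_probs)
    num_classes = len(mc_probs[0])
    avg = [sum(run[c] for run in mc_probs) / T for c in range(num_classes)]
    return -sum(p * math.log(p) for p in avg if p > 0)
```

        <p>Three passes that all report a uniform binary distribution yield the maximum entropy ln 2, while unanimous one-hot predictions yield zero.</p>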
        <p>We simulated the probability of the crowd annotators
returning the correct answers to the requested queries by
drawing from a normal distribution with mean centered at 0.65
and standard deviation 0.01 (see (Yang et al. 2018)’s Figure
2(a) for more).</p>
      </sec>
      <sec id="sec-7-3">
        <title>Evaluation metrics</title>
        <p>To evaluate our results, we compared the annotation accuracy
between the baseline model and the models trained with
active learning and our privacy-preserving model. We vary
the sub-sampling parameter β and the anonymization factor
k while training our model and recording its accuracy. We
set the evaluation data at 5,000 samples (i.e., about 10%
of the dataset). We also provide privacy guarantee values
from numerical computations of ε and δ, and highlight in the
appendix what values of k and β provide those levels of
guarantees.</p>
        <p>Baseline condition: train a standard classification model with
sub-sampling parameter β = 1 and anonymization factor k = 1,
i.e., using the entire dataset.
Experiment conditions: train the classification model using
privacy-preserving active learning, with the sub-sampling
parameter varied at β = {0.1, 0.3, 0.6, 0.9} and the anonymization
factor varied at k = {1, 20, 100, 200, 500}.</p>
      </sec>
      <sec id="sec-7-4">
        <title>Results</title>
        <p>The results of our experiments are presented in Figures 2, 3
and 4. Our findings provide insight into the tradeoffs between
privacy, utility and our annotation budget.
Figure 1: (a) distribution of confidence scores for each unique
query; (b) distribution of uncertainty scores for each unique
query; (c) histogram of the frequency distribution; (d) log
frequency distribution of the queries.</p>
      </sec>
      <sec id="sec-7-5">
        <title>Privacy vs Utility Tradeoff</title>
        <p>Figure 2 highlights the privacy-utility tradeoff which occurs
as a result of varying β and k. As expected, as the value
of k gets smaller, i.e., by selecting more items in the tail
of the dataset, we are able to improve the accuracy of our
model. This, however, has the effect of degrading our privacy
guarantees. Similarly, by providing privacy amplification by
subsampling, the utility of our model suffers. Figure 2 paints a
holistic picture of this by showing how, by tuning the values
of β and k, we can arrive at the same values of accuracy.
Figure 3 describes how our annotation budget changes for
different privacy settings. With a stronger privacy model,
we incur less cost as a function of fewer annotation requests.
By reading across the graph, we also discover that the same
budget can be realized from different privacy configurations:
e.g., by subsampling with β = 0.1 and selecting k = 20, we
incur the same budget as β = 0.3 and k = 100 and, therefore,
the same accuracy (from Figure 2 above).
We established from Figure 2 and Figure 3 the relationship
between privacy and accuracy, and between privacy and our
annotation budget. Since we can obtain the same level of
accuracy and budget requirements from different parameter
values, Figure 4 highlights how an increase in budget affects
our overall model accuracy. Increasing the budget initially
accelerates the improvement of our model; however, the utility
gains quickly slow down. For example, after 30,000 labels,
we do not see any significant increase in model accuracy.</p>
        <p>These results can serve as a guideline for selecting
appropriate privacy parameters for different annotation budgets
in a way that is more representative of the dataset. For
example, for a fixed annotation budget, you can reduce β to
select more data points from the tail of the dataset (i.e., a
smaller k). This variation can also be done by starting with a
target accuracy score and varying β and k. The results also
demonstrate that by sacrificing some utility gains, we can
make stronger privacy guarantees and reduce our annotation
budget when carrying out active learning.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>We now briefly revisit our results in the light of our
hypothesis. We also discuss the limitations of our process, its
implication to the broader discourse on privacy and machine
learning and conclude with future work.</p>
      <p>
        We apply the approach from
        <xref ref-type="bibr" rid="ref17 ref30 ref31">(Li, Qardaji, and Su 2012)</xref>
        to
offer privacy guarantees when training models with active
learning which requires sending unlabelled examples to an
external oracle. Our results join the conversation on
differential privacy and machine learning
        <xref ref-type="bibr" rid="ref22 ref9">(Ji, Lipton, and Elkan
2014)</xref>
        with particular reference to preserving the privacy of
users.
      </p>
      <p>Our results show that by taking a small performance hit,
we can achieve similar accuracy scores with a smaller
annotation budget and a stronger privacy guarantee. One limitation,
however, is that we have only reported results on a binary
classification task. We are currently expanding our approach
by designing a new algorithm for differential privacy on text.
We expect the accuracy loss to increase as task complexity
increases: if we were to apply the approach in this
work to other NLP tasks, e.g. multi-class classification or
question answering, we would expect the accuracy loss to be
greater.</p>
      <p>
        Another limitation of our approach, and potentially of other
k-parameter-based approaches to differential privacy for text
        <xref ref-type="bibr" rid="ref23">(Korolova et al. 2009)</xref>
        ;
        <xref ref-type="bibr" rid="ref16">(Gotz et al. 2012)</xref>
        ;
        <xref ref-type="bibr" rid="ref35 ref8">(Zhang et al. 2016)</xref>
        , is that it will not work for tasks where almost all the data
is unique, i.e., k is essentially 1 (e.g., datasets of emails or
movie reviews). Therefore, a different approach is needed to
provide quantifiable privacy guarantees without resorting to
k-anonymity.
      </p>
      <p>We believe that this is an area worthy of further research,
in order to quantify the true cost of privacy in crowdsourcing
and machine learning. We have already begun further work to
address two of the limitations reported in this paper.</p>
    </sec>
    <sec id="sec-9">
      <title>Appendix</title>
      <p>Table 1 lays out a grid of ε and δ scores and the
corresponding sampling parameter β and anonymization factor k
required to satisfy that level of (ε, δ)-differential privacy.
The shaded region presents a β × k high-level view of how to
achieve a desired level of privacy.</p>
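      <p>To illustrate how a grid like Table 1 can be generated, the bound of (Li, Qardaji, and Su 2012) can be evaluated numerically: by our reading, β-sampling followed by a strongly-safe k-anonymization satisfies (ε, δ)-DP with γ = (e<sup>ε</sup> − 1 + β)/e<sup>ε</sup> and δ the largest binomial tail P[Binomial(n, β) &gt; γn] over n ≥ ⌈k/γ⌉ − 1. The sketch below scans a finite window of n, which is an approximation we assume is adequate since the tail probability decays once β &lt; γ:</p>

```python
from math import ceil, comb, exp

def delta_bound(k: int, beta: float, eps: float, window: int = 400) -> float:
    """delta for (eps, delta)-DP of beta-sampling followed by a safe
    k-anonymization, per our reading of Li, Qardaji, and Su (2012):
        gamma = (e^eps - 1 + beta) / e^eps
        delta = max_{n >= ceil(k/gamma) - 1} P[Binomial(n, beta) > gamma*n]
    Requires eps > -ln(1 - beta), i.e. beta < gamma.
    """
    gamma = (exp(eps) - 1.0 + beta) / exp(eps)
    if beta >= gamma:
        raise ValueError("need eps > -ln(1 - beta)")
    n_min = max(ceil(k / gamma) - 1, 1)
    best = 0.0
    # Since beta < gamma, the binomial tail shrinks as n grows, so scanning
    # a finite window past n_min is a practical approximation of the max.
    for n in range(n_min, n_min + window):
        j0 = int(gamma * n) + 1  # smallest j with j > gamma * n
        tail = sum(comb(n, j) * beta**j * (1.0 - beta) ** (n - j)
                   for j in range(j0, n + 1))
        best = max(best, tail)
    return best
```

Holding ε fixed, increasing k or decreasing β drives δ down, which is exactly the trade-off the table makes explicit.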
      <p>A few insights can be gleaned from the results, the most
obvious being that strong privacy requirements (indicated
by small ε and δ scores as we traverse the table towards the
bottom left corner) require a higher anonymization factor
k and a smaller sampling rate β. This is also observed in the
fact that lowering the k factor only preserves privacy at the
highest displayed ε value of 1.0 and δ = 1 × 10<sup>-6</sup> (at the top
right corner of the table).</p>
      <p>Observing δ individually, we also note that, when k and
ε are at fixed values, as δ decreases, i.e., to obtain stronger
privacy guarantees, we need to lower the sampling rate β.
This indicates that as β decreases, privacy guarantees increase.
Similarly, observing k by fixing the values of β and ε
demonstrates that increasing k improves our privacy guarantees.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Aggarwal 2005] Aggarwal,
          <string-name>
            <surname>C. C.</surname>
          </string-name>
          <year>2005</year>
          .
          <article-title>On k-anonymity and the curse of dimensionality</article-title>
          .
          <source>In Proceedings of the 31st international conference on Very large data bases</source>
          ,
          <fpage>901</fpage>
          -
          <lpage>909</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Arora, Nyberg, and Rosé 2009]
          <string-name>
            <surname>Arora</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nyberg</surname>
            , E.; and Rosé,
            <given-names>C. P.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Estimating annotation cost for active learning in a multi-annotator environment</article-title>
          .
          <source>In Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing</source>
          ,
          <fpage>18</fpage>
          -
          <lpage>26</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <source>[Avidan and Butman</source>
          <year>2007</year>
          ] Avidan,
          <string-name>
            <given-names>S.</given-names>
            , and
            <surname>Butman</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>Efficient methods for privacy preserving face detection</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <volume>57</volume>
          -
          <fpage>64</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Balle, Barthe, and Gaboardi 2018]
          <string-name>
            <surname>Balle</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Barthe</surname>
            , G.; and Gaboardi,
            <given-names>M.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Privacy amplification by subsampling: Tight analyses via couplings and divergences</article-title>
          .
          <source>arXiv preprint arXiv:1807.01647</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[Cormode and Muthukrishnan</source>
          <year>2005</year>
          ] Cormode,
          <string-name>
            <given-names>G.</given-names>
            , and
            <surname>Muthukrishnan</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <year>2005</year>
          .
          <article-title>An improved data stream summary: the count-min sketch and its applications</article-title>
          .
          <source>Journal of Algorithms</source>
          <volume>55</volume>
          (
          <issue>1</issue>
          ):
          <fpage>58</fpage>
          -
          <lpage>75</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Dasgupta 2011] Dasgupta,
          <string-name>
            <surname>S.</surname>
          </string-name>
          <year>2011</year>
          .
          <article-title>Two faces of active learning</article-title>
          .
          <source>Theoretical computer science</source>
          <volume>412</volume>
          (
          <issue>19</issue>
          ):
          <fpage>1767</fpage>
          -
          <lpage>1781</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Di Castro et al. 2016]
          <string-name>
            <surname>Di Castro</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lewin-Eytan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Maarek</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wolff</surname>
          </string-name>
          , R.; and
          <string-name>
            <surname>Zohar</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Enforcing k-anonymity in web mail auditing</article-title>
          .
          <source>In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining</source>
          ,
          <fpage>327</fpage>
          -
          <lpage>336</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <source>[Dwork and Roth</source>
          <year>2014</year>
          ] Dwork,
          <string-name>
            <given-names>C.</given-names>
            , and
            <surname>Roth</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>The algorithmic foundations of differential privacy</article-title>
          .
          <source>Foundations and Trends R in Theoretical Computer Science</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          -4):
          <fpage>211</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Dwork et al. 2006]
          <string-name>
            <surname>Dwork</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>McSherry</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2006</year>
          .
          <article-title>Calibrating noise to sensitivity in private data analysis</article-title>
          .
          <source>In Theory of Cryptography Conference</source>
          ,
          <volume>265</volume>
          -
          <fpage>284</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Dwork 2008] Dwork,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <year>2008</year>
          .
          <article-title>Differential privacy: A survey of results</article-title>
          .
          <source>In International Conference on Theory and Applications of Models of Computation</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Dwork 2011] Dwork,
          <string-name>
            <surname>C.</surname>
          </string-name>
          <year>2011</year>
          .
          <article-title>A firm foundation for private data analysis</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>54</volume>
          (
          <issue>1</issue>
          ):
          <fpage>86</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Flajolet et al. 2007] Flajolet,
          <string-name>
            <given-names>P.</given-names>
            ;
            <surname>Fusy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>É.</given-names>
            ;
            <surname>Gandouet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ; and
            <surname>Meunier</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <year>2007</year>
          .
          <article-title>Hyperloglog: the analysis of a nearoptimal cardinality estimation algorithm</article-title>
          .
          <source>In AofA: Analysis of Algorithms</source>
          ,
          <fpage>137</fpage>
          -
          <lpage>156</lpage>
          . Discrete Mathematics and Theoretical Computer Science.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>[Gal and Ghahramani</source>
          <year>2016</year>
          ] Gal,
          <string-name>
            <given-names>Y.</given-names>
            , and
            <surname>Ghahramani</surname>
          </string-name>
          ,
          <string-name>
            <surname>Z.</surname>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          2016.
          <article-title>Dropout as a Bayesian approximation: Representing model uncertainty in deep learning</article-title>
          .
          <source>In international conference on machine learning</source>
          ,
          <fpage>1050</fpage>
          -
          <lpage>1059</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Gotz et al. 2012] Gotz,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Machanavajjhala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ;
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ;
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            ; and
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2012</year>
          .
          <article-title>Publishing search logs - a comparative study of privacy guarantees</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>24</volume>
          (
          <issue>3</issue>
          ):
          <fpage>520</fpage>
          -
          <lpage>532</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Goyal, Daumé III, and Cormode 2012]
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; Daumé III, H.; and Cormode,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <year>2012</year>
          .
          <article-title>Sketch algorithms for estimating point queries in NLP</article-title>
          .
          <source>In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</source>
          ,
          <fpage>1093</fpage>
          -
          <lpage>1103</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Hamm, Cao, and Belkin 2016] Hamm,
          <string-name>
            <surname>J.</surname>
          </string-name>
          ; Cao,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          ; and Belkin,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Learning privately from multiparty data</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <source>In International Conference on Machine Learning</source>
          ,
          <fpage>555</fpage>
          -
          <lpage>563</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Howe 2006] Howe,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2006</year>
          .
          <article-title>The rise of crowdsourcing.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <source>Wired magazine 14(6)</source>
          :
          <fpage>1</fpage>
          -
          <lpage>4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Ji, Lipton, and Elkan 2014]
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z. C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Elkan</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Differential privacy and machine learning: a survey and review</article-title>
          .
          <source>arXiv preprint arXiv:1412.7584</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Korolova et al. 2009]
          <string-name>
            <surname>Korolova</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kenthapadi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mishra</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Ntoulas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Releasing search queries and clicks privately</article-title>
          .
          <source>In Proceedings of the 18th international conference on World wide web</source>
          ,
          <fpage>171</fpage>
          -
          <lpage>180</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [LeFevre, DeWitt, and Ramakrishnan 2006] LeFevre,
          <string-name>
            <given-names>K.</given-names>
            ;
            <surname>DeWitt</surname>
          </string-name>
          , D. J.; and Ramakrishnan,
          <string-name>
            <surname>R.</surname>
          </string-name>
          <year>2006</year>
          .
          <article-title>Mondrian multidimensional k-anonymity</article-title>
          .
          <source>In Data Engineering</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <source>ICDE'06. Proceedings of the 22nd International Conference on</source>
          ,
          <fpage>25</fpage>
          -
          <lpage>25</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <source>[Lewis and Gale</source>
          <year>1994</year>
          ]
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gale</surname>
            ,
            <given-names>W. A.</given-names>
          </string-name>
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <article-title>A sequential algorithm for training text classifiers</article-title>
          .
          <source>In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          ,
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          . Springer-Verlag New York, Inc.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Li, Li, and Venkatasubramanian
          <year>2007</year>
          ]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Venkatasubramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>t-closeness: Privacy beyond k-anonymity and l-diversity</article-title>
          .
          <source>In Data Engineering</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          ICDE
          <year>2007</year>
          . IEEE 23rd International Conference on,
          <fpage>106</fpage>
          -
          <lpage>115</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Li, Qardaji, and Su
          <year>2012</year>
          ]
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Qardaji</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Su</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          2012.
          <article-title>On sampling, anonymization, and differential privacy or, k-anonymization meets differential privacy</article-title>
          .
          <source>In Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security</source>
          ,
          <fpage>32</fpage>
          -
          <lpage>33</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Machanavajjhala et al. 2006]
          <string-name>
            <surname>Machanavajjhala</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gehrke</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kifer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; and Venkitasubramaniam,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <year>2006</year>
          .
          <article-title>l-diversity: Privacy beyond k-anonymity</article-title>
          .
          <source>In Data Engineering</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <source>ICDE'06. Proceedings of the 22nd International Conference on</source>
          ,
          <fpage>24</fpage>
          -
          <lpage>24</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [Nissim, Raskhodnikova, and Smith 2007]
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Raskhodnikova</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>Smooth sensitivity and sampling in private data analysis</article-title>
          .
          <source>In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing</source>
          ,
          <fpage>75</fpage>
          -
          <lpage>84</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [Papernot et al. 2016]
          <string-name>
            <surname>Papernot</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Abadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Erlingsson</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Goodfellow</surname>
            ,
            <given-names>I.;</given-names>
          </string-name>
          and
          <string-name>
            <surname>Talwar</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Semi-supervised knowledge transfer for deep learning from private training data</article-title>
          .
          <source>arXiv preprint arXiv:1610.05755</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [Settles 2010]
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Active learning literature survey</article-title>
          .
          <source>Computer Sciences Technical Report 1648.</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>[Soria-Comas et al. 2014] Soria-Comas, J.; Domingo-Ferrer, J.; Sánchez, D.; and Martínez, S. 2014. Enhancing data utility in differential privacy via microaggregation-based k-anonymity. <source>The VLDB Journal</source> 23(5):771-794.</mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>[Sweeney 2002] Sweeney, L. 2002. k-anonymity: A model for protecting privacy. <source>International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems</source> 10(05):557-570.</mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>[Yang et al. 2018] Yang, J.; Drake, T.; Damianou, A.; and Maarek, Y. 2018. Leveraging crowdsourcing data for deep active learning - an application: Learning intents in Alexa. <source>In Proceedings of the 2018 World Wide Web Conference on World Wide Web</source>, 23-32. International World Wide Web Conferences Steering Committee.</mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>[Zhang et al. 2016] Zhang, S.; Yang, G. H.; Singh, L.; and Xiong, L. 2016. Safelog: Supporting web search and mining by differentially-private query logs. <source>In 2016 AAAI Fall Symposium Series</source>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>