<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy-Preserving Textual Analysis via Calibrated Perturbations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Drake Amazon draket@amazon.com</string-name>
          <email>draket@amazon.com</email>
          <email>pigem@amazon.co.uk</email>
          <email>sey@amazon.com</email>
          <email>tdiethe@amazon.co.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Borja Balle Amazon</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oluwaseyi Feyisetan Amazon</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tom Diethe Amazon</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy-preserving text perturbation using the notion of d-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to vector representations of words in a high-dimensional space as defined by word embedding models. We present a privacy proof that satisfies d-privacy, where the privacy parameter ε provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how ε can be selected by analyzing plausible deniability statistics, backed up by a large-scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against two baseline models and utility experiments on three datasets to demonstrate the tradeoff between privacy and utility for varying values of ε on different task types. Our results demonstrate practical utility (&lt;2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Differential Privacy</title>
      <p>Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Presented at the PrivateNLP 2020 Workshop on Privacy in Natural Language Processing, co-located with the 13th ACM International WSDM Conference, 2020, in Houston, Texas, USA.</p>
      <p>PrivateNLP ’20, February 7, 2020, Houston, TX, USA.</p>
      <p>A viable solution is Differential Privacy: ε-Differential Privacy (DP) bounds the influence of any single input on the output of a computation.</p>
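      <p>As a concrete illustration of this bound, consider the classic Laplace mechanism for a counting query. The sketch below is illustrative only (the names laplace_noise and private_count are not from the paper): a counting query has sensitivity 1, so adding Laplace noise with scale 1/ε limits any single record’s influence on the released output.</p>

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a zero-mean Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing one record
    # changes the count by at most 1, so Laplace noise with scale
    # 1/epsilon suffices for epsilon-differential privacy.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

      <p>Smaller ε means larger noise and stronger privacy; larger ε means the released count is closer to the true count.</p>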
    </sec>
    <sec id="sec-2">
      <title>Differential Privacy Mechanism: Summary</title>
      <p>• User’s goal: meet some specific need with respect to an issued query x.
• Agent’s goal: satisfy the user’s request.
• Question: what occurs when x is used to make other inferences about the user?
• Mechanism: modify the query to protect privacy whilst preserving semantics.
• Our approach: Generalized Metric Differential Privacy.</p>
    </sec>
    <sec id="sec-3">
      <title>Introduction</title>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>