<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Privacy-Preserving Textual Analysis via Calibrated Perturbations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Drake Amazon draket@amazon.com</string-name>
          <email>draket@amazon.com</email>
          <email>pigem@amazon.co.uk</email>
          <email>sey@amazon.com</email>
          <email>tdiethe@amazon.co.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Borja Balle Amazon</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Oluwaseyi Feyisetan Amazon</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Tom Diethe Amazon</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <abstract>
        <p>Accurately learning from user data while providing quantifiable privacy guarantees provides an opportunity to build better ML models while maintaining user trust. This paper presents a formal approach to carrying out privacy-preserving text perturbation using the notion of d-privacy designed to achieve geo-indistinguishability in location data. Our approach applies carefully calibrated noise to vector representations of words in a high-dimensional space as defined by word embedding models. We present a privacy proof that satisfies d-privacy, where the privacy parameter ε provides guarantees with respect to a distance metric defined by the word embedding space. We demonstrate how ε can be selected by analyzing plausible deniability statistics, backed up by a large-scale analysis on GloVe and fastText embeddings. We conduct privacy audit experiments against two baseline models and utility experiments on three datasets to demonstrate the tradeoff between privacy and utility for varying values of ε on different task types. Our results demonstrate practical utility (&lt;2% utility loss for training binary classifiers) while providing better privacy guarantees than baseline models.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Differential Privacy</title>
      <p>Copyright ©2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Presented at the PrivateNLP 2020 Workshop on Privacy in Natural Language Processing, co-located with the 13th ACM International WSDM Conference, 2020, in Houston, Texas, USA.</p>
      <p>PrivateNLP ’20, February 7, 2020, Houston, TX, USA.</p>
      <p>A viable solution is Differential Privacy: ε-Differential Privacy (DP) bounds the influence of any single input on the output of a computation.</p>
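      <p>As a concrete illustration of this bound, consider the classic Laplace mechanism for a counting query. The sketch below is illustrative only (the names laplace_noise and private_count are not from the paper): a counting query has sensitivity 1, so adding Laplace noise with scale 1/ε limits any single record’s influence on the released output.</p>

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a zero-mean Laplace distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing one record
    # changes the count by at most 1, so Laplace noise with scale
    # 1/epsilon suffices for epsilon-differential privacy.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

      <p>Smaller ε means larger noise and stronger privacy; larger ε means the released count is closer to the true count.</p>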
    </sec>
    <sec id="sec-2">
      <title>Differential Privacy Mechanism: Summary</title>
      <p>• User’s goal: meet some specific need with respect to an issued query x.
• Agent’s goal: satisfy the user’s request.
• Question: what occurs when x is used to make other inferences about the user?
• Mechanism: modify the query to protect privacy whilst preserving semantics.
• Our approach: Generalized Metric Differential Privacy.</p>
    </sec>
    <sec id="sec-3">
      <title>Introduction</title>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>