Hyperbolic Embeddings for Preserving Privacy and Utility in Text

Oluwaseyi Feyisetan, Amazon, sey@amazon.com
Tom Diethe, Amazon, tdiethe@amazon.co.uk
Thomas Drake, Amazon, draket@amazon.com

ABSTRACT
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dχ-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dχ-privacy, then we define a probability distribution in Hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high-dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protections against an authorship attribution algorithm, while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe > 20x greater guarantees on expected privacy against comparable worst-case statistics.

ACM Reference Format:
Oluwaseyi Feyisetan, Tom Diethe, and Thomas Drake. 2020. Hyperbolic Embeddings for Preserving Privacy and Utility in Text. In Proceedings of the Workshop on Privacy and Natural Language Processing (PrivateNLP '20), Houston, TX, USA, 1 page. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Presented at the PrivateNLP 2020 Workshop on Privacy in Natural Language Processing, colocated with the 13th ACM International WSDM Conference, February 7, 2020, in Houston, Texas, USA.

HYPERBOLIC METRIC DIFFERENTIAL PRIVACY
Oluwaseyi Feyisetan, Tom Diethe, Thomas Drake
{sey,tdiethe,draket}@amazon.com

SUMMARY
• User's goal: meet some specific need with respect to a query x
• Agent's goal: satisfy the user's request
• Question: what occurs when x is used to make other inferences
• Mechanism: modify the query to protect privacy whilst preserving semantics
• Our approach: Hyperbolic Metric Differential Privacy

METRIC DIFFERENTIAL PRIVACY
A randomised mechanism M : X → Y is (ε, δ)-differentially private if for all neighbouring inputs x ≃ x' (i.e. d_h(x, x') = 1, where d_h is the Hamming distance) and for all sets of outputs E ⊆ Y, for δ ∈ [0, 1]:

    P[M(x) ∈ E] ≤ e^(ε d_h(x, x')) P[M(x') ∈ E] + δ.

Metric DP generalizes this to use any valid metric d(x, x') (i.e. one satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality).

Mechanism: for Euclidean metric DP, we use multivariate Laplacian noise to achieve ε-mDP, i.e. ξ ∼ Lap_n(1/ε).

• Robust to post-processing: if M is (ε, δ)-DP, then f(M) is at least (ε, δ)-DP.
• Composition: if M_1, ..., M_n are (ε_i, δ_i)-DP, then g(M_1, ..., M_n) is (Σ_{i=1..n} ε_i, Σ_{i=1..n} δ_i)-DP.
• Protects against side knowledge: if an attacker has prior P_prior^{x_i} and computes P_posterior^{x_i} after observing M(x) from an ε-DP mechanism, then dist(P_prior^{x_i}, P_posterior^{x_i}) = O(ε).

Metric DP Mechanism for word embeddings
Inputs:
• w ∈ W: word to be "privatised" from the dictionary
• φ : W → Z: embedding function
• d : Z × Z → R: distance function in the embedding space
• Ω(ε): DP noise distribution

1. Project the word: v = φ(w)
2. Perturb the word vector: v' = v + ξ, where ξ ∼ Ω(ε)
3. The new vector v' will not be a word (a.s.)
4. Project back to W: w' = argmin_{w ∈ W} d(v', φ(w))
5. Return w'

What do we need?
• d satisfies the axioms of a metric
• A way to sample using Ω in the metric space that respects d and gives us ε-metric DP
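To make steps 1-5 concrete, the following is a minimal runnable sketch of the mechanism with the Euclidean baseline, where Ω(ε) is the multivariate Laplacian Lap_n(1/ε) with density proportional to exp(-ε‖ξ‖); a standard way to sample it is a uniformly random direction scaled by a Gamma(n, 1/ε) radius. The function names, variable names and toy vocabulary here are illustrative choices of ours, not part of the paper; the hyperbolic variant described below swaps in the Poincaré distance and the hyperbolic noise distribution.

# Sketch of the generic metric-DP mechanism with Euclidean Laplacian noise.
# Names (privatise_word, vocab_vectors, words) are hypothetical placeholders.
import numpy as np

def sample_multivariate_laplacian(n, eps, rng):
    """Draw xi with density proportional to exp(-eps * ||xi||_2) in R^n."""
    direction = rng.normal(size=n)
    direction /= np.linalg.norm(direction)        # uniform direction on the sphere
    radius = rng.gamma(shape=n, scale=1.0 / eps)  # radial part of exp(-eps r) in R^n
    return radius * direction

def privatise_word(w, words, vocab_vectors, eps, rng):
    """Steps 1-5: embed, perturb, project back to the nearest vocabulary word."""
    v = vocab_vectors[words.index(w)]                               # 1. v = phi(w)
    v_prime = v + sample_multivariate_laplacian(len(v), eps, rng)   # 2. perturb
    dists = np.linalg.norm(vocab_vectors - v_prime, axis=1)         # 4. Euclidean d
    return words[int(np.argmin(dists))]                             # 5. return w'

# Toy usage with a random 50-dimensional "embedding"
rng = np.random.default_rng(0)
words = ["london", "paris", "city", "river"]
vocab_vectors = rng.normal(size=(len(words), 50))
print(privatise_word("london", words, vocab_vectors, eps=1.0, rng=rng))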
HYPERBOLIC WORD EMBEDDINGS
Traditional embeddings map words into a vector space W → R^n, such as neural-network-based models (e.g. Word2Vec, GloVe, fastText). In this space, nearest neighbours preserve semantics.

Hyperbolic space can be thought of as the continuous analogue of a tree structure. In natural language, this captures hypernymy and hyponymy, so the embeddings require fewer dimensions. Nearest neighbours are often hypernyms!

[Fig. 1: Projection from the Lorentz model H^n to the Poincaré model B^2. Fig. 2: WebIsADb IS-A relationships in GloVe on the Poincaré disk B^2.]

Distances in the n-dimensional Poincaré ball model are given by:

    d_{B^n}(u, v) = arcosh(1 + 2 ‖u − v‖^2 / ((1 − ‖u‖^2)(1 − ‖v‖^2)))

We prove that d_{B^n}(u, v) is a valid metric (via the Lorentzian model).

HYPERBOLIC DIFFERENTIAL PRIVACY
We derive the Hyperbolic Laplace distribution that satisfies ε-DP:

    p(x | μ = 0, ε) = (1 + ε) / (2 · 2F1(1, ε; 2 + ε; −1)) · ((1 − ‖x‖) / (1 + ‖x‖))^ε,

where 2F1(a, b; c; z) is the hypergeometric function.

We develop a Metropolis-Hastings sampler that operates in Hyperbolic space (a toy sampling sketch is included after the references).

EXPERIMENTS
To help choose ε, we define:
• Uncertainty statistics for the adversary over the outputs
• Indistinguishability statistics: plausible deniability
• Find a radius of high protection: a guarantee on the likelihood of changing any word in the embedding vocabulary

Privacy Experiments 1
• Task: obfuscation vs. Koppel's authorship attribution algorithm
• Datasets: PAN@Clef; correct author predictions (lower = better)

    ε     PAN-11           PAN-12
          small   large    set-A   set-C   set-D   set-I
    0.5   36      72       4       3       2       5
    1     35      73       3       3       2       5
    2     40      78       4       3       2       5
    8     65      116      4       5       4       5
    ∞     147     259      6       6       6       12

Privacy Experiments 2
• Task: expected privacy vs. the Euclidean baseline (lower Nw is better)
• Datasets: 100/200/300d GloVe embeddings

    ε       worst-case Nw   expected Nw
                            HYP-100   EUC-100   EUC-200   EUC-300
    0.125   134             1.25      38.54     39.66     39.88
    0.5     148             1.62      42.48     43.62     43.44
    1       172             2.07      48.80     50.26     53.82
    2       297             3.92      92.42     93.75     90.90
    8       960             140.67    602.21    613.11    587.68

Utility Experiments
• Tasks: 5x classification (sentiment x2, product reviews, opinion polarity, question-type), 3x natural language tasks (NL inference, paraphrase detection, semantic textual similarity)
• Baselines: SentEval vs. random replacement

    Dataset   rand.   HYP-100d                  original
                      ε=1/8   ε=1     ε=8       InferSent   SkipThought   fastText
    MR        58.19   58.38   63.56   74.52     81.10       79.40         78.20
    CR        77.48   83.21   83.92   85.19     86.30       83.10         80.20
    MPQA      84.27   88.53   88.62   88.98     90.20       89.30         88.00
    SST-5     30.81   41.76   42.40   42.53     46.30       -             45.10
    TREC-6    75.20   82.40   82.40   84.20     88.20       88.40         83.40
    SICK-E    79.20   81.00   82.38   82.34     86.10       79.50         78.90
    MRPC      69.86   74.78   75.07   75.01     76.20       -             74.40
    STS14     0.17    0.44    0.45    0.52      0.68        0.44          0.65

SELECTED REFERENCES
[1] K. Chatzikokolakis, et al. Broadening the scope of differential privacy using metrics. International Symposium on Privacy Enhancing Technologies, 2013.
[2] M. Nickel and D. Kiela. Poincaré embeddings for learning hierarchical representations. NeurIPS, 2017.
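The Metropolis-Hastings sampler referenced in the Hyperbolic Differential Privacy section is not spelled out on the poster. The sketch below is a minimal illustration rather than the authors' sampler: it targets an unnormalised density proportional to exp(-ε · d_B(0, x)) on the Poincaré disk B^2 using a symmetric Gaussian random-walk proposal; the step size, the two-dimensional setting and the base measure on the disk are assumptions made for this example.

# Toy Metropolis sampler for hyperbolic-Laplace-style noise on the Poincare disk.
# Illustrative only; not the paper's exact construction.
import numpy as np

def poincare_distance(u, v):
    """d_B(u, v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den)

def log_density(x, eps):
    """Unnormalised log-density of noise centred at the origin (mu = 0)."""
    if np.sum(x ** 2) >= 1.0:          # outside the open unit disk: zero density
        return -np.inf
    return -eps * poincare_distance(np.zeros_like(x), x)

def metropolis_sample(eps, n_steps=5000, step=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    x = np.zeros(2)                    # start at the origin
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=2)      # symmetric random walk
        log_alpha = log_density(proposal, eps) - log_density(x, eps)
        if np.log(rng.uniform()) < log_alpha:         # Metropolis accept/reject
            x = proposal
    return x

xi = metropolis_sample(eps=2.0)
print(xi, np.linalg.norm(xi))          # a noise vector inside the unit disk

Proposals that fall outside the open unit disk receive zero density and are always rejected, so the chain never leaves the ball.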