Hyperbolic Embeddings for Preserving Privacy and Utility in Text

Oluwaseyi Feyisetan, Amazon, sey@amazon.com
Tom Diethe, Amazon, tdiethe@amazon.co.uk
Thomas Drake, Amazon, draket@amazon.com

ABSTRACT
Guaranteeing a certain level of user privacy in an arbitrary piece of text is a challenging issue. However, with this challenge comes the potential of unlocking access to vast data stores for training machine learning models and supporting data-driven decisions. We address this problem through the lens of dχ-privacy, a generalization of Differential Privacy to non-Hamming distance metrics. In this work, we explore word representations in Hyperbolic space as a means of preserving privacy in text. We provide a proof satisfying dχ-privacy, then we define a probability distribution in Hyperbolic space and describe a way to sample from it in high dimensions. Privacy is provided by perturbing vector representations of words in high-dimensional Hyperbolic space to obtain a semantic generalization. We conduct a series of experiments to demonstrate the tradeoff between privacy and utility. Our privacy experiments illustrate protections against an authorship attribution algorithm, while our utility experiments highlight the minimal impact of our perturbations on several downstream machine learning models. Compared to the Euclidean baseline, we observe > 20x greater guarantees on expected privacy against comparable worst-case statistics.

ACM Reference Format:
Oluwaseyi Feyisetan, Tom Diethe, and Thomas Drake. 2020. Hyperbolic Embeddings for Preserving Privacy and Utility in Text. In Proceedings of the Workshop on Privacy and Natural Language Processing (PrivateNLP '20), Houston, TX, USA, 1 page. https://doi.org/10.1145/nnnnnnn.nnnnnnn

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Presented at the PrivateNLP 2020 Workshop on Privacy in Natural Language Processing, colocated with the 13th ACM International WSDM Conference, February 7, 2020, in Houston, Texas, USA.

HYPERBOLIC METRIC DIFFERENTIAL PRIVACY
Oluwaseyi Feyisetan, Tom Diethe, Thomas Drake
{sey,tdiethe,draket}@amazon.com

SUMMARY
• User's goal: meet some specific need with respect to a query x
• Agent's goal: satisfy the user's request
• Question: what occurs when x is used to make other inferences
• Mechanism: modify the query to protect privacy whilst preserving semantics
• Our approach: Hyperbolic Metric Differential Privacy

METRIC DIFFERENTIAL PRIVACY
A randomised mechanism M : X → Y is (ε, δ)-differentially private if for all neighbouring inputs x ≃ x' (i.e. d_h(x, x') = 1, where d_h is the Hamming distance) and for all sets of outputs E ⊆ Y, for δ ∈ [0, 1]:

    P[M(x) ∈ E] ≤ e^(ε d_h(x, x')) P[M(x') ∈ E] + δ.

Metric DP generalizes this to use any valid metric d(x, x') (i.e. one satisfying non-negativity, identity of indiscernibles, symmetry, and the triangle inequality).

Mechanism: for Euclidean metric DP, we use multivariate Laplacian noise to achieve ε-mDP, i.e. ξ ∼ Lap_n(1/ε).

• Robust to post-processing: if M is (ε, δ)-DP, then f(M) is at least (ε, δ)-DP.
• Composition: if M_1, ..., M_n are (ε_i, δ_i)-DP, then g(M_1, ..., M_n) is (Σ_{i=1..n} ε_i, Σ_{i=1..n} δ_i)-DP.
• Protects against side knowledge: if an attacker has prior P_prior^{x_i} and computes P_posterior^{x_i} after observing M(x) from an ε-DP mechanism, then dist(P_prior^{x_i}, P_posterior^{x_i}) = O(ε).

Metric DP Mechanism for word embeddings
Inputs:
• w ∈ W: word to be "privatised" from the dictionary
• φ : W → Z: embedding function
• d : Z × Z → R: distance function in the embedding space
• Ω(ε): DP noise distribution

1. Project the word: v = φ(w)
2. Perturb the word vector: v' = v + ξ, where ξ ∼ Ω(ε)
3. The new vector v' will not be a word (a.s.)
4. Project back to W: w' = argmin_{w ∈ W} d(v', φ(w))
5. Return w'

What do we need?
• d satisfies the axioms of a metric
• A way to sample using Ω in the metric space that respects d and gives us ε-metric DP
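To make steps 1-5 concrete, the following is a minimal runnable sketch of the mechanism with the Euclidean baseline, where Ω(ε) is the multivariate Laplacian Lap_n(1/ε) with density proportional to exp(-ε‖ξ‖); a standard way to sample it is a uniformly random direction scaled by a Gamma(n, 1/ε) radius. The function names, variable names and toy vocabulary here are illustrative choices of ours, not part of the paper; the hyperbolic variant described below swaps in the Poincaré distance and the hyperbolic noise distribution.

# Sketch of the generic metric-DP mechanism with Euclidean Laplacian noise.
# Names (privatise_word, vocab_vectors, words) are hypothetical placeholders.
import numpy as np

def sample_multivariate_laplacian(n, eps, rng):
    """Draw xi with density proportional to exp(-eps * ||xi||_2) in R^n."""
    direction = rng.normal(size=n)
    direction /= np.linalg.norm(direction)        # uniform direction on the sphere
    radius = rng.gamma(shape=n, scale=1.0 / eps)  # radial part of exp(-eps r) in R^n
    return radius * direction

def privatise_word(w, words, vocab_vectors, eps, rng):
    """Steps 1-5: embed, perturb, project back to the nearest vocabulary word."""
    v = vocab_vectors[words.index(w)]                               # 1. v = phi(w)
    v_prime = v + sample_multivariate_laplacian(len(v), eps, rng)   # 2. perturb
    dists = np.linalg.norm(vocab_vectors - v_prime, axis=1)         # 4. Euclidean d
    return words[int(np.argmin(dists))]                             # 5. return w'

# Toy usage with a random 50-dimensional "embedding"
rng = np.random.default_rng(0)
words = ["london", "paris", "city", "river"]
vocab_vectors = rng.normal(size=(len(words), 50))
print(privatise_word("london", words, vocab_vectors, eps=1.0, rng=rng))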
HYPERBOLIC WORD EMBEDDINGS
Traditional embeddings map words into a vector space W → R^n, such as neural-network-based models (e.g. Word2Vec, GloVe, fastText). In this space, nearest neighbours preserve semantics.

Hyperbolic space can be thought of as the continuous analogue of a tree structure. In natural language, this captures hypernymy and hyponymy, so the embeddings require fewer dimensions. Nearest neighbours are often hypernyms!

[Fig. 1: Projection from the Lorentz model H^n to the Poincaré model B^2. Fig. 2: WebIsADb IS-A relationships in GloVe on the Poincaré disk B^2.]

Distances in the n-dimensional Poincaré ball model are given by:

    d_{B^n}(u, v) = arcosh(1 + 2 ‖u − v‖^2 / ((1 − ‖u‖^2)(1 − ‖v‖^2)))

We prove that d_{B^n}(u, v) is a valid metric (via the Lorentzian model).

HYPERBOLIC DIFFERENTIAL PRIVACY
We derive the Hyperbolic Laplace distribution that satisfies ε-DP:

    p(x | μ = 0, ε) = (1 + ε) / (2 · 2F1(1, ε; 2 + ε; −1)) · ((1 − ‖x‖) / (1 + ‖x‖))^ε,

where 2F1(a, b; c; z) is the hypergeometric function.

We develop a Metropolis-Hastings sampler that operates in Hyperbolic space (a toy sampling sketch is included after the references).

EXPERIMENTS
To help choose ε, we define:
• Uncertainty statistics for the adversary over the outputs
• Indistinguishability statistics: plausible deniability
• Find a radius of high protection: a guarantee on the likelihood of changing any word in the embedding vocabulary

Privacy Experiments 1
• Task: obfuscation vs. Koppel's authorship attribution algorithm
• Datasets: PAN@Clef; correct author predictions (lower = better)

    ε     PAN-11           PAN-12
          small   large    set-A   set-C   set-D   set-I
    0.5   36      72       4       3       2       5
    1     35      73       3       3       2       5
    2     40      78       4       3       2       5
    8     65      116      4       5       4       5
    ∞     147     259      6       6       6       12

Privacy Experiments 2
• Task: expected privacy vs. the Euclidean baseline (lower Nw is better)
• Datasets: 100/200/300d GloVe embeddings

    ε       worst-case Nw   expected Nw
                            HYP-100   EUC-100   EUC-200   EUC-300
    0.125   134             1.25      38.54     39.66     39.88
    0.5     148             1.62      42.48     43.62     43.44
    1       172             2.07      48.80     50.26     53.82
    2       297             3.92      92.42     93.75     90.90
    8       960             140.67    602.21    613.11    587.68

Utility Experiments
• Tasks: 5x classification (sentiment x2, product reviews, opinion polarity, question-type), 3x natural language tasks (NL inference, paraphrase detection, semantic textual similarity)
• Baselines: SentEval vs. random replacement

    Dataset   rand.   HYP-100d                  original
                      ε=1/8   ε=1     ε=8       InferSent   SkipThought   fastText
    MR        58.19   58.38   63.56   74.52     81.10       79.40         78.20
    CR        77.48   83.21   83.92   85.19     86.30       83.10         80.20
    MPQA      84.27   88.53   88.62   88.98     90.20       89.30         88.00
    SST-5     30.81   41.76   42.40   42.53     46.30       -             45.10
    TREC-6    75.20   82.40   82.40   84.20     88.20       88.40         83.40
    SICK-E    79.20   81.00   82.38   82.34     86.10       79.50         78.90
    MRPC      69.86   74.78   75.07   75.01     76.20       -             74.40
    STS14     0.17    0.44    0.45    0.52      0.68        0.44          0.65

SELECTED REFERENCES
[1] K. Chatzikokolakis, et al. Broadening the scope of differential privacy using metrics. International Symposium on Privacy Enhancing Technologies, 2013.
[2] M. Nickel and D. Kiela. Poincaré embeddings for learning hierarchical representations. NeurIPS, 2017.
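The Metropolis-Hastings sampler referenced in the Hyperbolic Differential Privacy section is not spelled out on the poster. The sketch below is a minimal illustration rather than the authors' sampler: it targets an unnormalised density proportional to exp(-ε · d_B(0, x)) on the Poincaré disk B^2 using a symmetric Gaussian random-walk proposal; the step size, the two-dimensional setting and the base measure on the disk are assumptions made for this example.

# Toy Metropolis sampler for hyperbolic-Laplace-style noise on the Poincare disk.
# Illustrative only; not the paper's exact construction.
import numpy as np

def poincare_distance(u, v):
    """d_B(u, v) = arcosh(1 + 2||u-v||^2 / ((1-||u||^2)(1-||v||^2)))."""
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / den)

def log_density(x, eps):
    """Unnormalised log-density of noise centred at the origin (mu = 0)."""
    if np.sum(x ** 2) >= 1.0:          # outside the open unit disk: zero density
        return -np.inf
    return -eps * poincare_distance(np.zeros_like(x), x)

def metropolis_sample(eps, n_steps=5000, step=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    x = np.zeros(2)                    # start at the origin
    for _ in range(n_steps):
        proposal = x + step * rng.normal(size=2)      # symmetric random walk
        log_alpha = log_density(proposal, eps) - log_density(x, eps)
        if np.log(rng.uniform()) < log_alpha:         # Metropolis accept/reject
            x = proposal
    return x

xi = metropolis_sample(eps=2.0)
print(xi, np.linalg.norm(xi))          # a noise vector inside the unit disk

Proposals that fall outside the open unit disk receive zero density and are always rejected, so the chain never leaves the ball.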