Perfectly Privacy-Preserving AI: What is it and how do we achieve it?

Patricia Thaine, Gerald Penn
University of Toronto
{pthaine,gpenn}@cs.toronto.edu

ABSTRACT
Many AI applications need to process huge amounts of sensitive information for model training, evaluation, and real-world integration. These tasks include facial recognition, speaker recognition, text processing, and genomic data analysis. Unfortunately, one of the following two scenarios occurs when training models to perform the aforementioned tasks: either the models end up being trained on sensitive user information, making them vulnerable to malicious actors, or their evaluations are not representative of their abilities, since the scope of the test set is limited. In some cases, the models never get created in the first place. A number of approaches can be integrated into AI algorithms in order to maintain various levels of privacy, namely: differential privacy, secure multi-party computation, homomorphic encryption, federated learning, secure enclaves, and automatic data de-identification. We will briefly explain each of these methods and describe the scenarios in which they are most appropriate. Recently, several of these methods have been applied to machine learning models. We will cover some of the most interesting examples of privacy-preserving ML, including the integration of differential privacy with neural networks to prevent unwanted inferences from being made about a network's training data. We will also discuss the work we have done on privacy-preserving language modeling and on training neural networks on obfuscated data. Finally, we will discuss how the privacy-preserving machine learning approaches that have been proposed so far would need to be combined in order to achieve perfectly privacy-preserving ML.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PrivateNLP ’20, February 7, 2020, Houston, TX, USA

[Figure: The Four Pillars of Perfectly Privacy-Preserving AI]

Motivation
Consumer and provider privacy are put at risk by:
• criminal hackers,
• incompetent employees,
• autocratic governments,
• manipulative companies.
But privacy protection does not have to be about preventing access to sensitive data. Privacy-preserving methods allow scientists and engineers to use data that would otherwise be inaccessible because of privacy concerns (e.g., for genomic data analysis (Jagadeesh et al., 2017)). Data privacy and data utility are positive-sum features of effective ML models.

Training Data Vulnerabilities
Neural networks can memorize their training data: a secret inserted into the training set can later be extracted from the trained model, and its exposure can be estimated (Carlini et al., 2018).
[Figure: Left: results for 5% of PTB; loss is at a minimum after 10 epochs, when estimated exposure peaks. Right: estimated exposure of the inserted secret. 620K (±5K) parameters per model. (Carlini et al., 2018)]

Differential Privacy
Definition. A randomized algorithm A is (ε, δ)-differentially private if
    Pr[A(D) ∈ S] ≤ exp(ε) × Pr[A(D′) ∈ S] + δ
for any set S of possible outputs of A, and any two datasets D, D′ that differ in at most one element.

Resources:
• TensorFlow Privacy: https://github.com/tensorflow/privacy
• Blog post explaining DP + ML: http://www.cleverhans.io/privacy/2018/04/29/privacy-and-machine-learning.html
• “The Promise of Differential Privacy: A Tutorial on Algorithmic Techniques” by Cynthia Dwork (2011)
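To make the (ε, δ)-DP definition above concrete, here is a minimal sketch of the Gaussian mechanism, one standard way to satisfy (ε, δ)-differential privacy by adding calibrated noise to a numeric query. The example query (a bounded mean), the toy data, and all parameter values are illustrative assumptions and are not taken from the poster.

```python
# Minimal sketch of the Gaussian mechanism for (epsilon, delta)-DP.
# The query, data, and parameter values are illustrative assumptions.
import numpy as np

def gaussian_mechanism(true_answer, l2_sensitivity, epsilon, delta):
    """Release `true_answer` with (epsilon, delta)-DP by adding Gaussian noise.

    Uses the standard calibration sigma = l2_sensitivity * sqrt(2 * ln(1.25/delta)) / epsilon,
    which satisfies the definition above for 0 < epsilon < 1 (Dwork & Roth, 2014).
    """
    sigma = l2_sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return true_answer + np.random.normal(loc=0.0, scale=sigma)

# Example: privately release the mean of n records, each clipped to [0, 1].
# Changing one record then shifts the mean by at most 1/n, so the query's
# L2 sensitivity is 1/n.
records = np.clip(np.array([0.2, 0.9, 0.4, 0.7, 0.1]), 0.0, 1.0)
n = len(records)
private_mean = gaussian_mechanism(records.mean(), l2_sensitivity=1.0 / n,
                                  epsilon=0.5, delta=1e-5)
print(f"non-private mean: {records.mean():.3f}  private mean: {private_mean:.3f}")
```

Smaller ε and δ mean more noise and stronger privacy. In practice, libraries such as TensorFlow Privacy (linked above) perform this kind of noise calibration inside training algorithms like DP-SGD rather than on single query answers.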
Homomorphic Encryption
Homomorphic encryption allows computation to be carried out directly on encrypted data, so that, for example, neural network inference can be run on encrypted inputs (Gilad-Bachrach et al., 2016).

Resources:
• PALISADE Library: https://git.njit.edu/palisade/PALISADE
• Microsoft SEAL Library: https://github.com/Microsoft/SEAL
• Intro to HE: “Homomorphic Encryption for Beginners: A Practical Guide”
• “CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy” by Gilad-Bachrach, Ran, et al. (2016)

Federated Learning and Secure Multi-Party Computation
In federated learning, models are trained across many devices that keep their raw data local and share only model updates; secure aggregation protocols combine these updates without revealing any individual contribution (Bonawitz et al., 2017).
[Figure: overview of federated learning. Image source: https://www.slideshare.net/MindosCheng/federated-learning]

Resources:
• Code: https://github.com/OpenMined
• Florian Hartmann’s blog: https://florian.github.io/
• “Practical Secure Aggregation for Privacy-Preserving Machine Learning” by Bonawitz et al. (2017)

Some Alternative Solutions
• Automatic data de-identification (e.g., El Emam et al. (2009))
• Data synthesis (e.g., Triastcyn and Faltings (2018))
• Secure enclaves (Intel SGX, Keystone Project, AMD-SP)

Creating Perfectly Privacy-Preserving AI
We can achieve perfectly privacy-preserving AI by combining different privacy-preserving methods. For example:
• Differential Privacy + Homomorphic Encryption or Secure Enclaves
• Secure Multi-Party Computation + Federated Learning + Differential Privacy + Secure Enclaves or Homomorphic Encryption

References
Carlini, Nicholas, et al. “The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets.” arXiv preprint arXiv:1802.08232 (2018).
El Emam, Khaled, et al. “A globally optimal k-anonymity method for the de-identification of health data.” Journal of the American Medical Informatics Association 16.5 (2009): 670–682.
Jagadeesh, Karthik A., et al. “Deriving genomic diagnoses without revealing patient genomes.” Science 357.6352 (2017): 692–695.
Thaine, P., Gorbunov, S., and Penn, G. “Efficient Evaluation of Activation Functions over Encrypted Data.” In Proceedings of the 2nd Deep Learning and Security Workshop, 40th IEEE Symposium on Security and Privacy, San Francisco, USA (2019).
Thaine, P., and Penn, G. “Privacy-Preserving Character Language Model.” In Proceedings of the Privacy-Enhancing Artificial Intelligence and Language Technologies AAAI Spring Symposium (PAL 2019), Stanford University, Palo Alto, USA (2019).
Triastcyn, Aleksei, and Boi Faltings. “Generating differentially private datasets using GANs.” arXiv preprint arXiv:1803.03148 (2018).