Attacks on Machine Learning: Lurking Danger for Accountability

Katja Auernhammer, Ramin Tavakoli Kolagari, Markus Zoppelt
Nuremberg Institute of Technology, Faculty of Computer Science
Hohfederstrasse 40, Nuremberg, 90489, Germany
{katja.auernhammer, ramin.tavakolikolagari, markus.zoppelt}@th-nuernberg.de
Copyright held by authors.

Abstract

It is well-known that there is no safety without security. That being said, a sound investigation of security breaches on Machine Learning (ML) is a prerequisite for any safety concerns. Since attacks on ML systems and their impact on the security goals threaten the safety of an ML system, we discuss the impact attacks have on the ML models' security goals, which are rarely considered in published scientific papers.

The contribution of this paper is a non-exhaustive list of published attacks on ML models and a categorization of attacks according to their phase (training, after-training) and their impact on security goals. Based on our categorization we show that not all security goals have yet been considered in the literature, either because they were ignored or because there are no publications on attacks targeting those goals specifically, and that some, such as accountability, are difficult to assess. This is probably due to some ML models being a black box.

Introduction

During the last few years scientists and researchers have published a variety of different attacks on Machine Learning (ML) systems. However, the papers only rarely mention security goals—such as integrity, availability, confidentiality, reliability, authenticity, and accountability—that are endangered by these attacks. Even if a paper explicitly mentions the violation of a security goal, it is not clear whether the breach refers to the whole system in which the ML model is embedded or rather to the ML model itself or parts of it.

The contribution of this paper is a non-exhaustive list of published attacks on ML and a derivation of different groups of attacks. We further elaborate on the breaches of known security goals (integrity, availability, confidentiality, etc.) caused by the listed attacks to justify our categorization and show the security goals mentioned in published papers about attacks on ML. Our categorization clarifies that there are some security goals, such as accountability, which are yet difficult to evaluate due to the complex operations within ML models.

Security Goals

The six main security goals as described in [21] are summarized as follows:

• Confidentiality ensures that private or confidential information is not made available or disclosed to unauthorized users, and that users can control (or influence) what information related to them may be collected and used, and to whom it is disclosed. Confidentiality is often implemented through cryptography / encryption.

• Integrity ensures that information is not changed (modified) or destroyed in an unauthorized way. Integrity can be compromised even if the information or system produces the correct output.

• Availability ensures that a system works promptly, service is not denied to authorized users, and access to and use of information is timely and reliable.

• Authenticity is the characteristic of being genuine, verifiable, and trustworthy. Authenticity is ensured through authentication processes that verify whether users are who they say they are (entity authenticity). Authenticity is often enabled through cryptography / cryptographic signatures.

• Reliability is the property of a system such that reliance can be justifiably placed on the service it delivers, i.e., the system adheres to the specification it was engineered to address.

• Accountability refers to the requirement that actions of an entity can be traced uniquely to that entity (e.g., non-repudiation of a communication that took place). Accountability allows a certain degree of transparency as to what happened when and what was performed by whom.
Attacks on Machine Learning Algorithms

Important criteria that influence the applicability of certain attacks on ML models at this level of detail are the learning type (supervised, unsupervised, reinforcement learning) and whether the algorithm undergoes lifelong learning. Different attacks are designed to target combinations of different criteria. The implications for the security goals of the ML model are equivalent to the security goals corresponding to the categorization of the attack.

In Table 1 the first column names the ML algorithm in alphabetical order, followed by the learning type and whether the model is capable of lifelong learning or not. Lifelong learning is a criterion that is often ignored by researchers or at least not explicitly mentioned in papers. We complemented this information wherever necessary according to the definitions in common textbooks. There are four possible values for lifelong learning: Yes, No, Yes/No (when both can be the case), and Unclear (when we simply do not know). In the last column we list the attacks with the corresponding literature.

Table 1: Published attacks on ML categorized by ML algorithms. The listed ML algorithms are derived from the publications of the attacks; therefore, there might be attacks aimed at, e.g., neural networks in general but also attacks on specific sub-types of neural networks, e.g., convolutional neural networks. The columns "Learning Type" and "Lifelong Learning" do not solely refer to what the algorithm is capable of but to the premises the ML algorithm must meet to render the attack effective.

| ML Algorithm | Learning Type | Lifelong L. | Attack |
|---|---|---|---|
| Complete-linkage Hierarchical Clustering | Unsupervised | No | Poisoning Attack [9] |
| Single-linkage Hierarchical Clustering | Unsupervised | No | Poisoning Attack [13]; Obfuscation Attack [13, 14] |
| Decision Tree / Random Forest | Supervised | Yes/No | Poisoning Attack [46] |
| | | No | Path-finding Attack [72]; Model Inversion [26]; Ateniese et al. Attack [4]; Adversarial Examples [31, 52, 66] |
| Hidden Markov Model | Supervised | No | Ateniese et al. Attack [4] |
| k-Nearest Neighbors | Supervised | Yes/No | Poisoning Attack [46] |
| | | No | Adversarial Examples [31] |
| k-Means Clustering | Unsupervised | No | Ateniese et al. Attack [4] |
| Linear Regression | Supervised | Yes/No | Poisoning Attack [8, 35, 41] |
| | | No | Model Inversion [27]; Lowd-Meek Attack [44, 72] |
| Logistic Regression | Supervised | No | Equation-solving Attack [49]; Hyperparameter Stealing [73]; Adversarial Examples [52, 70, 71] |
| Multi-class Logistic Regression | Supervised | No | Equation-solving Attack [49] |
| Maximum Entropy Models | Supervised | No | Lowd-Meek Attack [44] |
| Naive Bayes | Supervised | No | Classifier Evasion [3, 22]; Lowd-Meek Attack [44] |
| Neural Network | Reinforcement Learning | Unclear | Strategically-timed Attack [40]; Enchanting Attack [40]; Adversarial Examples [33, 40] |
| Neural Network | Supervised | No | Model Inversion [26]; Membership Inference [63]; Hyperparameter Stealing Attack [73]; Ateniese et al. Attack [4]; Adversarial Examples [29, 31, 45, 52, 62, 70]; Trojan Trigger [43] |
| Multi-layer Perceptron | Supervised | Yes/No | Poisoning Attack [46] |
| | | No | Equation-solving Attack [49]; Ateniese et al. Attack [4] |
| Convolutional Neural Network | Supervised | No | Side-channel Attack [74]; Training Data Extraction [18]; Adversarial Examples [50, 52, 70] |
| Recurrent Neural Network | Supervised | No | Training Data Extraction [18]; Classifier Evasion [3]; Adversarial Examples [57] |
| Support Vector Machine | Supervised | Yes/No | Poisoning Attack [12, 46]; Adversarial Label Flips [76, 77] |
| | | No | Hyperparameter Stealing [73]; Lowd-Meek Attack [44, 72]; Ateniese et al. Attack [4]; Evasion Attack [3, 24, 30, 61, 66]; Feature Deletion [28]; Adversarial Examples [31, 52, 66, 71] |

We also identified attacks that are employable against several ML algorithms. Attacks we consider applicable to systems regardless of the ML algorithm, learning type, and lifelong learning capability are, for example, poisoning attacks [8, 46], as these attacks do not focus on the model but on the training data; therefore, poisoning attacks are considered independent of the ML algorithm.
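To make this concrete, the following minimal sketch (our illustration, not taken from the cited publications, and assuming scikit-learn is available) shows an untargeted, label-flipping style of data poisoning against a logistic regression model. The published attacks optimize the poison points far more carefully; the point here is only that the training data is touched, not the learning algorithm.

```python
# Minimal sketch of an untargeted, label-flipping style poisoning attack.
# Only the training labels are manipulated; the learner itself is untouched.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

poison_rate = 0.3  # fraction of training labels the attacker controls
flip_idx = rng.choice(len(y_train), size=int(poison_rate * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]  # flip binary labels

poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```

Because only the training labels are manipulated, the same sketch works unchanged for any estimator exposing a fit/score interface, which is why poisoning is treated as independent of the ML algorithm.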
Another group of attacks that tamper with the data fed into the ML model, and thus are applicable to a wide range of different ML algorithms, are adversarial examples [5, 6, 17, 34, 51, 59], evasion attacks [23, 78], and feature deletion attacks. These attacks exploit weaknesses in the ML model without changing the model itself, by simply perturbing the input to falsify the output.
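As an illustration of such input perturbations, the sketch below (our simplification, assuming scikit-learn; the cited attacks target deep networks and use more elaborate optimization) applies a fast-gradient-sign style step to a logistic regression model, whose input gradient can be written in closed form. The model parameters are never modified; only the queried input is perturbed.

```python
# Minimal evasion / adversarial-example sketch against a linear model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

x, label = X[0], y[0]
p = model.predict_proba(x.reshape(1, -1))[0, 1]
# Gradient of the cross-entropy loss w.r.t. the input of a logistic model:
# dL/dx = (p - y) * w
grad = (p - label) * model.coef_[0]

epsilon = 0.5
x_adv = x + epsilon * np.sign(grad)  # fast gradient sign step on the input

print("original prediction: ", model.predict(x.reshape(1, -1))[0])
print("perturbed prediction:", model.predict(x_adv.reshape(1, -1))[0])
```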
Shokri et al. [63] claim their attack, membership inference, to be generic, although they only apply it to classification algorithms. We also think the attack is only applicable to ML algorithms that are not capable of lifelong learning, as membership inference relies on computing multiple inputs via the ML model to extract information about the training data. If the model adapts with every given input, this approach is hampered.
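A heavily simplified membership-inference sketch follows (our illustration, assuming scikit-learn; Shokri et al. [63] instead train shadow models and a dedicated attack classifier). It only assumes that the attacker can repeatedly query a fixed, overfitted target model for confidence scores, which is exactly the premise that lifelong learning would undermine.

```python
# Simplified membership inference: guess "member" when the fixed target model
# is unusually confident on a record (overfitted models are more confident on
# their own training data than on unseen data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=2)
X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=2)

# Train (and overfit) the target model on the "member" half only.
target = RandomForestClassifier(n_estimators=50, random_state=2).fit(X_in, y_in)

def confidence(model, X):
    # Confidence assigned to the predicted class of each record.
    return model.predict_proba(X).max(axis=1)

threshold = 0.9
hit_rate = (confidence(target, X_in) >= threshold).mean()    # members flagged
false_rate = (confidence(target, X_out) >= threshold).mean() # non-members flagged

print(f"member hit rate {hit_rate:.2f} vs. non-member false alarm rate {false_rate:.2f}")
```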
Categorization of ML Attacks with Regard to Security Goals

In software security it is well-established to distinguish between attacks with regard to their effects on security goals (see Section "Security Goals"). The attacks described in Table 2 affect one or more security goals of a system (here: an ML component). A categorization of the published attacks according to security goals compiles an overview of clusters of similar attack scenarios as well as of missing but expected attack clusters. These gaps in the categorization of attacks may result from publications about attacks on ML components that are unknown to us, from unpublished attacks, or from attacks that have not yet been executed but which are all conceivable and therefore executable in principle. Therefore, these gaps in the categorization are particularly revealing.

Of particular relevance for the categorization of attacks developed here are violations of security goals that affect the ML component as a whole. Thus, a violation of integrity for an ML component means that the ML component itself is changed (in some form). In the publications on the attacks on ML components analyzed here (and also listed in Table 2), statements are partly made on the violations of security goals, but these sometimes refer (only) to partial aspects of an attack. Thus, the adversarial examples attack [69], which manipulates data fed into the model, targets—according to the authors—integrity, namely the integrity of the input data; as the integrity of the ML model itself is not attacked, because the model has not been changed, the attack is not categorized in Table 2 under integrity.

Table 2 shows our mapping of the analyzed attacks listed in Table 1 to the six security goals described in the Security Goals section. While Table 1 focused on the ML algorithms, Table 2 brings the attacks into focus. The assignment in Table 2 is based on the description of the attacks in the respective publications. In the table, an "X" indicates which security goal (related to the ML component as a whole) is affected by which attack.

In addition, many attacks have been published that relate to the pre- and post-processing units of ML components (their environment). These attacks do not differ from those on traditional software; therefore, they are not described in this paper.

An obvious peculiarity of ML components compared to traditional software is their training, so there are two essential phases in their life cycle: the training phase (T) and the deployment phase, which we prefer to call the after-training phase (A), as this also covers lifelong learning ML algorithms, which are trained with every input even after deployment. This continuous learning process makes attacks possible at deployment time that are otherwise applicable at training time (such as poisoning attacks [60]) and, on the other hand, rules out attacks that require a fixed target model (e.g., model inversion [26]).

Unlike previous research (e.g., [7, 55]), we do not consider whether an attack is targeted, i.e., whether the opponent causes a certain wrong output or merely causes some wrong output to be generated, or whether the opponent has white-box or black-box knowledge. At this point we also do not distinguish between different types of learning (supervised, unsupervised, reinforcement learning). Considering all these kinds of criteria would create a blurred categorization that contradicts a clear distinction between attacks. Instead, we propose considering the above criteria within each of our main groups in order to add further dimensions and form sub-groups. This is not within the scope of this paper, although we consider the learning type in Table 1, which can be used as a starting point for further investigations.

By analyzing the security goals that are breached by the attacks and the time the attack takes place, we can create different categories of attacks. The names of the categories are derived from whether the attack takes place during training time (T) or after-training time (A), followed by a dash (-) and the first one or two letters of the main security goals that are breached by the attacks (a small sketch after Table 2 illustrates this naming scheme). Grey "X"s indicate the main assignments of attacks to security goals.

Table 2: Mapping of published attacks on ML to the security goals violated. The attacks are categorized according to the security goals they breach. The first column "Att. Cat." (Attack Category) labels the categories. The names are derived from the time in the ML algorithm's life cycle (Training, After-training) at which the attacks take place and the security goals the attack breaks that are most relevant for the specified category. Citations in a cell refer to publications in which the corresponding security goal is discussed for that attack.

| Att. Cat. | Published Attacks | Confidentiality | Availability | Integrity | Reliability | Authenticity | Accountability |
|---|---|---|---|---|---|---|---|
| T-IR | Poisoning Attack [60] | X [14, 47] | [10, 13, 14, 35, 39, 47] | X [10, 14, 35, 36, 39, 47, 55, 67] | X | | |
| | Adversarial Label Flips [76] | X | | X [56] | X | | |
| | Strategically-timed Attack [40] | X | | X | X | | |
| | Enchanting Attack [40] | X | | X | X | | |
| | Obfuscation Attack [13] | | | X [13] | X | | |
| A-IR | Trojan Trigger [43] | X | | X | X | | |
| A-C | Model Inversion [26] | X [26, 27, 32, 56, 72, 75] | X | | | | |
| | Membership Inference [63] | X [63, 65] | X | | | | |
| | Side-channel Attack [74] | X [74] | X | | | | |
| | Lowd-Meek Attack [44] | X [55] | | | | | |
| | Training Data Extraction [18] | X [18] | X | | | | |
| | Ateniese et al. Attack [4] | X [4] | X | | | | |
| | Path-finding Attack [49] | X | X | | | | |
| | Equation-solving Attack [49] | X | | | | | |
| | Hyperparameter Stealing [73] | X [73] | | | | | |
| A-R | Classifier Evasion [11] | X [7, 48] | | [20, 48, 55, 68] | X | | |
| | Adversarial Examples [69] | X [15] | | [15, 17, 52, 53, 54, 55] | X [25, 64] | | |
| | Feature Deletion [28] | X | | | X | | |
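For illustration, the category names and the grouping of Table 2 can be written down as a small data structure (ours, purely illustrative): the life-cycle phase ("T" or "A") is combined with the initials of the main security goals breached.

```python
# Illustrative only: the grouping of Table 2 as a plain data structure.
def category_name(phase: str, main_goals: str) -> str:
    """phase: 'T' (training) or 'A' (after-training); main_goals: goal initials."""
    return f"{phase}-{main_goals}"

categories = {
    category_name("T", "IR"): [  # training time, integrity + reliability
        "Poisoning Attack [60]", "Adversarial Label Flips [76]",
        "Strategically-timed Attack [40]", "Enchanting Attack [40]",
        "Obfuscation Attack [13]"],
    category_name("A", "IR"): ["Trojan Trigger [43]"],
    category_name("A", "C"): [  # after-training time, confidentiality
        "Model Inversion [26]", "Membership Inference [63]",
        "Side-channel Attack [74]", "Lowd-Meek Attack [44]",
        "Training Data Extraction [18]", "Ateniese et al. Attack [4]",
        "Path-finding Attack [49]", "Equation-solving Attack [49]",
        "Hyperparameter Stealing [73]"],
    category_name("A", "R"): [  # after-training time, reliability
        "Classifier Evasion [11]", "Adversarial Examples [69]",
        "Feature Deletion [28]"],
}

print(category_name("A", "C"), "->", categories["A-C"])
```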
First of all, it is noticeable that all attacks at training time affect both integrity and reliability. This also makes sense immediately: if only the integrity were corrupted during training time, the system could be corrected to conform to the specification via the existing reliability. If only the reliability were corrupted, the unchanged behavior would result in a difference to the specification, which would result in a correction of the specification. Only a simultaneous attack on both security goals can therefore be successful during the training phase. Confidentiality is not a main security goal for attacks during the training phase, but most of the identified attacks have attacked confidentiality as well. However, successful attacks during the training phase that relate exclusively to integrity and reliability would also be conceivable. Attacking the security goal availability makes no sense during the training phase.

Attacks on integrity and reliability during the deployment phase are theoretically meaningful and have been published pertinently. They represent the mirroring of attacks on integrity and reliability from the training phase. An essential group with a particularly large number of published attacks in the deployment phase refers to confidentiality. The fact that these attacks are often accompanied by restrictions in availability is rather a side effect than a main aspect. A category of attacks on ML components that mainly refers to availability (think of DoS attacks on traditional software) makes little sense in theory and has not been published. The frequently cited adversarial examples attack group is, among others, in the category of reliability attacks during the deployment phase; typically, integrity is not corrupted because the ML components themselves are not modified.

The lack of assignments to the security goals authenticity and accountability is also particularly informative. In our research we could not find any attacks on these security goals of the ML components. Authenticity is usually implemented in the environmental components surrounding an ML component. This will probably change in the future, however, when comprehensive tasks are implemented in a network of ML components and it becomes necessary to establish the ML components as mission-critical communication partners. Accountability of ML is considered—even in the community of ML experts—to be mostly inaccessible (especially with so-called black box ML components such as deep neural networks), because these components cannot be read like traditional software and cannot be semantically deduced from their structure. Nevertheless, we believe that a new field of attacks on ML components will open up here in the future, because initiatives such as eXplainable AI (layer-wise relevance propagation [16], Black Box Explanations through Transparent Approximations (BETA) [37], LIME [58], Generalized Additive Models (GAM) [19], etc.) and the political demand for comprehensible AI decisions will ensure greater comprehensibility in the area of black box ML, which will ultimately also help the attackers.

The Peculiarity of Accountability

It is yet unclear how the concept of accountability applies to ML. Accountability in traditional software engineering means an action can always be retraced to the entity performing the action. An entity is usually a human or a digital agent; however, the definition of an entity is not clear in the field of ML. An entity could be an input feature which leads to a certain output of the ML model (this meets the definition made by Papernot et al. [56]). An entity could also be an element within the ML model, e.g., each single neuron within a neural network, which makes its own decision that influences the final output of the model. From a different point of view, even the software developer could be considered the entity.

The entity, which cannot deny an action, is ultimately relevant in a legal context, namely in case of finding the party liable for a specific action. It is not relevant, however, how a single element of an algorithm contributed to the system's decision, but whether the wrong decision was caused by faulty training, biases in the training data, or malicious attacks.

We find that there is no clear definition of accountability and that it is difficult to transfer existing definitions to the field of ML. In order to guarantee accountability at all, changes in the system must be recorded; in traditional software this could be, e.g., changes in the database. Without a form of audit that promises some form of tracing, accountability cannot be broken, because the goal was not even reached in the first place. With an ML system, the changes within the system do not necessarily have to be recorded. Rather, the decisions of the system or of parts of the system should be made assignable to a distinct entity.

In the context of ML, a distinction between accountability and liability should be considered. Both focus on retracing an action to an entity. Liability, however, concentrates on the assignment of blame or debt relief of individual entities and is also possible without an audit of the actions and decisions made by inner components within the ML algorithm. For liability it is sufficient to record solely the final decision of the ML system.

Accountability, on the other hand, is only possible by logging the internal processes. The definition of an "entity", however, is still unclear. Furthermore, logging requires a certain understanding of the model, which is difficult up until now. However, if ML algorithms become comprehensible in the future, accountability could be achievable, and this also means that accountability—as a security goal—can be broken by attackers.

Assume it will be possible to identify which nodes in a neural network are responsible for a particular decision, e.g., we know which nodes in an image recognition system are responsible for detecting certain objects, such as stop signs. If these nodes are regarded as entities, they can be made accountable for their decisions. Accountability allows ML algorithms to be developed and validated more efficiently, maybe even to the point where they become similar to the code of traditional software development. This is desirable in any case, as it greatly simplifies development and troubleshooting. If this knowledge about accountability is leaked, adversaries can also take advantage of it and launch more targeted attacks, which might ultimately also target accountability. A breach in accountability will most likely be the first step to sophisticated attacks that violate other security goals as well.
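As a purely hypothetical sketch of this idea (ours, not a method from the paper or from the cited XAI tools), the snippet below attributes a single decision of a tiny, fixed two-layer network to its hidden units and input features with simple contribution and gradient-times-input scores. Real explainability methods such as LRP [16] or LIME [58] are considerably more sophisticated; all weights and names here are made up for illustration.

```python
# Toy attribution: which hidden unit / input feature "answers" for one decision?
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 4 units, 3 inputs
w2, b2 = rng.normal(size=4), 0.0                # scalar output ("stop sign score")

x = np.array([0.8, -0.2, 0.5])
h = np.tanh(W1 @ x + b1)        # hidden activations
score = w2 @ h + b2             # decision score for this input

# Responsibility of each hidden unit: its direct contribution to the score.
unit_contrib = w2 * h

# Responsibility of each input feature: gradient of the score times the input.
grad_x = W1.T @ (w2 * (1.0 - h**2))   # d(score)/dx through the tanh layer
feature_contrib = grad_x * x

print("score:", round(float(score), 3))
print("most responsible hidden unit:", int(np.argmax(np.abs(unit_contrib))))
print("most responsible input feature:", int(np.argmax(np.abs(feature_contrib))))
```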
It is unclear, though, what types of attacks might be possible once ML models can be fully explained to humans.

Related Work

Barreno et al. [7] give relevant properties they consider important when conducting attacks on ML. The properties are grouped into three categories: the influence of the attack on the target system, the specificity (targeted or untargeted), and the security violation (integrity, availability). Their paper focuses mostly on countermeasures against attacks. Papernot et al. [55] also review attacks and distinguish them into black-box and white-box attacks. They focus on attacks on classification algorithms and list theoretical countermeasures. Liu et al. [42] also discuss different attacks and propose interesting points to consider in future research. Biggio et al. [15] take a different view on attacks on ML. They focus on how the field has developed during the years since its first mention in 2004. They also review published countermeasures. Alabdulmohsin et al. [2] sort attacks into causative or exploratory attacks. A survey of attacks against deep learning in computer vision was conducted by Akhtar and Mian [1]. They list several published countermeasures against adversarial examples. Laskov and Kloft [38] propose a "framework for quantitative security analysis of ML models".

Conclusion

In this paper we give an overview of current state-of-the-art ML algorithms and their respective attacks. This list is especially interesting when considering some of the more critical fields ML is used in, such as autonomous driving. Autonomous driving uses ML models in safety-critical applications. Ignoring known attacks on pertinent ML algorithms is hazardous, as human life is at stake. As in regular software development, security by design has to be applied to the development of ML algorithms as well.

We also propose a classification of published attacks on ML models based on security goals and life cycle phase. Our research shows that accountability is not covered by the literature, as no attacks on it have been published yet. This is probably due to the fact that accountability for ML is difficult to attack, as ML models are yet beyond human understanding and, therefore, the security goal is not compulsory.

Although there are already some papers working on solutions to improve the comprehensibility of ML models, we think there is still a long way to go until humans are able to completely understand ML models. If accountability can be guaranteed for all kinds of ML models, this will enable a wide range of new, yet unknown attacks.

Further research will elaborate on the implications of vulnerable ML models. It will also discuss whether and how the security goal accountability can be transferred to the field of ML and whether proper accountability of ML models has to be considered in liability claims.

Acknowledgement

Katja Auernhammer and Markus Zoppelt were supported by the BayWISS Consortium Digitization.

References

[1] Naveed Akhtar and Ajmal Mian. "Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey". In: IEEE Access 6 (2018), pp. 14410–14430. DOI: 10.1109/ACCESS.2018.2807385. arXiv: 1801.00553.
[2] Ibrahim M. Alabdulmohsin, Xin Gao, and Xiangliang Zhang. "Adding Robustness to Support Vector Machines Against Adversarial Reverse Engineering". In: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management (CIKM '14). ACM Press, 2014, pp. 231–240. DOI: 10.1145/2661829.2662047.
[3] Mark Anderson, Andrew Bartolo, and Pulkit Tandon. "Crafting Adversarial Attacks on Recurrent Neural Networks". 2017.
[4] Giuseppe Ateniese, Giovanni Felici, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, and Domenico Vitali. "Hacking Smart Machines with Smarter Ones: How to Extract Meaningful Data from Machine Learning Classifiers". 2013. DOI: 10.1504/IJSN.2015.071829. arXiv: 1306.4447.
[5] Shumeet Baluja and Ian Fischer. "Adversarial Transformation Networks: Learning to Generate Adversarial Examples". 2017. arXiv: 1703.09387.
[6] Shumeet Baluja and Ian Fischer. "Learning to Attack: Adversarial Transformation Networks". In: Association for the Advancement of Artificial Intelligence (AAAI '18). 2018.
[7] Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. "Can machine learning be secure?" In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security (ASIACCS '06). ACM Press, 2006. DOI: 10.1145/1128817.1128824.
[8] Alex Beatson, Zhaoran Wang, and Han Liu. "Blind Attacks on Machine Learners". In: 30th Conference on Neural Information Processing Systems (NIPS 2016), pp. 2397–2405.
[9] Battista Biggio, Samuel Rota Bulò, Ignazio Pillai, Michele Mura, Eyasu Zemene Mequanint, Marcello Pelillo, and Fabio Roli. "Poisoning Complete-Linkage Hierarchical Clustering". In: ed. by Ana Fred, Terry M. Caelli, Robert P. W. Duin, Aurélio C. Campilho, and Dick de Ridder. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg. DOI: 10.1007/b98738.
[10] Battista Biggio, Igino Corona, Giorgio Fumera, Giorgio Giacinto, and Fabio Roli. "Bagging classifiers for fighting poisoning attacks in adversarial classification tasks". In: Lecture Notes in Computer Science 6713 (2011), pp. 350–359. DOI: 10.1007/978-3-642-21557-5_37.
[11] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. "Evasion Attacks against Machine Learning at Test Time". In: ECML PKDD (2013), pp. 387–402. DOI: 10.1007/978-3-642-40994-3_25.
[12] Battista Biggio, Blaine Nelson, and Pavel Laskov. "Poisoning Attacks against Support Vector Machines". In: Proceedings of the 29th International Conference on Machine Learning (2012). arXiv: 1206.6389.
[13] Battista Biggio, Ignazio Pillai, Samuel Rota Bulò, Davide Ariu, Marcello Pelillo, and Fabio Roli. "Is data clustering in adversarial settings secure?" In: Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security (AISec '13), pp. 87–98. DOI: 10.1145/2517312.2517321.
[14] Battista Biggio, Konrad Rieck, Davide Ariu, Christian Wressnegger, Igino Corona, Giorgio Giacinto, and Fabio Roli. "Poisoning behavioral malware clustering". In: Proceedings of the 2014 Workshop on Artificial Intelligent and Security (AISec '14). ACM Press, 2014, pp. 27–36. DOI: 10.1145/2666652.2666666.
[15] Battista Biggio and Fabio Roli. "Wild Patterns: Ten Years After the Rise of Adversarial Machine Learning". In: Pattern Recognition 84 (Dec. 2017), pp. 317–331. DOI: 10.1016/j.patcog.2018.07.023. arXiv: 1712.03141.
[16] Alexander Binder, Sebastian Bach, Gregoire Montavon, Klaus-Robert Müller, and Wojciech Samek. "Layer-wise relevance propagation for deep neural network architectures". In: Lecture Notes in Electrical Engineering 376 (2016), pp. 913–922. DOI: 10.1007/978-981-10-0557-2_87.
[17] Wieland Brendel, Jonas Rauber, and Matthias Bethge. "Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models". Dec. 2017. arXiv: 1712.04248.
[18] Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, and Dawn Song. "The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets". Feb. 2018. arXiv: 1802.08232.
[19] Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. "Intelligible Models for HealthCare". In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '15), pp. 1721–1730. DOI: 10.1145/2783258.2788613.
[20] Lingwei Chen, Yanfang Ye, and Thirimachos Bourlai. "Adversarial machine learning in malware detection: Arms race between evasion attack and defense". In: Proceedings of the 2017 European Intelligence and Security Informatics Conference (EISIC 2017), pp. 99–106. DOI: 10.1109/EISIC.2017.21.
[21] Fabiano Dalpiaz, Elda Paja, and Paolo Giorgini. Security Requirements Engineering: Designing Secure Socio-Technical Systems. MIT Press, 2016.
[22] Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, and Deepak Verma. "Adversarial classification". In: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '04). ACM Press, 2004. DOI: 10.1145/1014052.1014066.
[23] Hung Dang, Yue Huang, and Ee-Chien Chang. "Evading Classifiers by Morphing in the Dark". In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17), pp. 119–133. DOI: 10.1145/3133956.3133978. arXiv: 1705.07535.
[24] Ambra Demontis, Paolo Russu, Battista Biggio, Giorgio Fumera, and Fabio Roli. "On security and sparsity of linear classifiers for adversarial settings". In: Lecture Notes in Computer Science 10029 (2016), pp. 322–332. DOI: 10.1007/978-3-319-49055-7_29. arXiv: 1709.00045.
[25] Alhussein Fawzi, Seyed Mohsen Moosavi-Dezfooli, and Pascal Frossard. "The Robustness of Deep Networks: A Geometrical Perspective". In: IEEE Signal Processing Magazine 34.6 (2017), pp. 50–62. DOI: 10.1109/MSP.2017.2740965.
[26] Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. "Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures". In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS '15), pp. 1322–1333. DOI: 10.1145/2810103.2813677.
[27] Matt Fredrikson, Eric Lantz, Somesh Jha, Simon Lin, David Page, and Thomas Ristenpart. "Privacy in Pharmacogenetics: An End-to-End Case Study of Personalized Warfarin Dosing". In: Proceedings of the 23rd USENIX Security Symposium (2014), pp. 17–32.
[28] Amir Globerson and Sam Roweis. "Nightmare at test time: robust learning by feature deletion". In: Proceedings of the 23rd International Conference on Machine Learning (2006), pp. 353–360. DOI: 10.1145/1143844.1143889.
[29] Abigail Graese, Andras Rozsa, and Terrance E. Boult. "Assessing threat of adversarial examples on deep neural networks". In: Proceedings of the 15th IEEE International Conference on Machine Learning and Applications (ICMLA 2016), pp. 69–74. DOI: 10.1109/ICMLA.2016.44. arXiv: 1610.04256.
[30] Yi Han and Benjamin I. P. Rubinstein. "Adequacy of the Gradient-Descent Method for Classifier Evasion Attacks". 2017. arXiv: 1704.01704.
[31] Jamie Hayes and George Danezis. "Machine Learning as an Adversarial Service: Learning Black-Box Adversarial Examples". 2017. arXiv: 1708.05207.
[32] Briland Hitaj, Giuseppe Ateniese, and Fernando Perez-Cruz. "Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning". In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (CCS '17), pp. 603–618. DOI: 10.1145/3133956.3134012. arXiv: 1702.07464.
[33] Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel. "Adversarial Attacks on Neural Network Policies". Feb. 2017. arXiv: 1702.02284.
[34] Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. "Query-Efficient Black-box Adversarial Examples". Dec. 2017. arXiv: 1712.07113.
[35] Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, and Bo Li. "Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning". In: 2018 IEEE Symposium on Security and Privacy (SP), pp. 19–35. DOI: 10.1109/SP.2018.00057. arXiv: 1804.00308.
[36] Ricky Laishram and Vir Virander Phoha. "Curie: A method for protecting SVM Classifier from Poisoning Attack". June 2016. arXiv: 1606.01584.
[37] Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. "Interpretable & Explorable Approximations of Black Box Models". 2017. arXiv: 1707.01154.
[38] Pavel Laskov and Marius Kloft. "A framework for quantitative security analysis of machine learning". In: Proceedings of the 2nd ACM Workshop on Security and Artificial Intelligence (AISec '09). ACM Press, 2009. DOI: 10.1145/1654988.1654990.
[39] Bo Li, Yining Wang, Aarti Singh, and Yevgeniy Vorobeychik. "Data Poisoning Attacks on Factorization-Based Collaborative Filtering". In: 29th Conference on Neural Information Processing Systems (NIPS 2016). arXiv: 1608.08182.
[40] Yen Chen Lin, Zhang Wei Hong, Yuan Hong Liao, Meng Li Shih, Ming Yu Liu, and Min Sun. "Tactics of adversarial attack on deep reinforcement learning agents". In: IJCAI International Joint Conference on Artificial Intelligence (2017), pp. 3756–3762. arXiv: 1703.06748.
[41] Chang Liu, Bo Li, Yevgeniy Vorobeychik, and Alina Oprea. "Robust Linear Regression Against Training Data Poisoning". In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec '17), pp. 91–102. DOI: 10.1145/3128572.3140447.
[42] Qiang Liu, Pan Li, Wentao Zhao, Wei Cai, Shui Yu, and Victor C. M. Leung. "A survey on security threats and defensive techniques of machine learning: A data driven view". In: IEEE Access 6 (2018), pp. 12103–12117. DOI: 10.1109/ACCESS.2018.2805680.
[43] Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang. "Trojaning Attack on Neural Networks". In: NDSS 2018 (Network and Distributed System Security Symposium), Feb. 2018. DOI: 10.14722/ndss.2018.23291.
[44] Daniel Lowd and Christopher Meek. "Adversarial learning". In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD '05). DOI: 10.1145/1081870.1081950.
[45] Seyed Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. "Universal adversarial perturbations". In: Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), pp. 86–94. DOI: 10.1109/CVPR.2017.17. arXiv: 1705.09554.
[46] Mehran Mozaffari-Kermani, Susmita Sur-Kolay, Anand Raghunathan, and Niraj K. Jha. "Systematic poisoning attacks on and defenses for machine learning in healthcare". In: IEEE Journal of Biomedical and Health Informatics 19.6 (2015), pp. 1893–1905. DOI: 10.1109/JBHI.2014.2344095.
[47] Luis Muñoz-González, Battista Biggio, Ambra Demontis, Andrea Paudice, Vasin Wongrassamee, Emil C. Lupu, and Fabio Roli. "Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization". In: Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security (AISec '17), pp. 27–38. DOI: 10.1145/3128572.3140451. arXiv: 1708.08689.
[48] Blaine Nelson, Marco Barreno, Fuching Jack Chi, Anthony D. Joseph, Benjamin I. P. Rubinstein, Udam Saini, Charles Sutton, J. D. Tygar, and Kai Xia. "Exploiting machine learning to subvert your spam filter". In: Proceedings of the First Workshop on Large-scale Exploits and Emerging Threats (LEET 2008), Article 7.
[49] Tam N. Nguyen. "Attacking Machine Learning models as part of a cyber kill chain". 2017. arXiv: 1705.00564.
[50] Andrew P. Norton and Yanjun Qi. "Adversarial-Playground: A visualization suite showing how adversarial examples fool deep learning". In: 2017 IEEE Symposium on Visualization for Cyber Security (VizSec). DOI: 10.1109/VIZSEC.2017.8062202. arXiv: 1708.00807.
[51] Nicolas Papernot. "Characterizing the Limits and Defenses of Machine Learning in Adversarial Settings". 2018.
[52] Nicolas Papernot, Patrick McDaniel, and Ian Goodfellow. "Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples". 2016. arXiv: 1605.07277.
[53] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. "Practical Black-Box Attacks against Machine Learning". Feb. 2016. DOI: 10.1145/3052973.3053009. arXiv: 1602.02697.
[54] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. "The limitations of deep learning in adversarial settings". In: Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P 2016), pp. 372–387. DOI: 10.1109/EuroSP.2016.36. arXiv: 1511.07528.
[55] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. "SoK: Towards the Science of Security and Privacy in Machine Learning". Nov. 2016. arXiv: 1611.03814.
[56] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael P. Wellman. "SoK: Security and Privacy in Machine Learning". In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 399–414. DOI: 10.1109/EuroSP.2018.00035. arXiv: 1611.03814.
[57] Nicolas Papernot, Patrick McDaniel, Ananthram Swami, and Richard Harang. "Crafting adversarial input sequences for recurrent neural networks". In: MILCOM 2016 - 2016 IEEE Military Communications Conference, pp. 49–54. DOI: 10.1109/MILCOM.2016.7795300. arXiv: 1604.08275.
[58] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ""Why Should I Trust You?": Explaining the Predictions of Any Classifier". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16), pp. 1135–1144. DOI: 10.1145/2939672.2939778. arXiv: 1602.04938.
[59] Amir Rosenfeld, Richard Zemel, and John K. Tsotsos. "The Elephant in the Room". 2018. arXiv: 1808.03305.
[60] Benjamin I. P. Rubinstein, Blaine Nelson, Ling Huang, Anthony D. Joseph, Shing-hon Lau, Satish Rao, Nina Taft, and J. D. Tygar. "ANTIDOTE: Understanding and Defending against Poisoning of Anomaly Detectors". In: Proceedings of the 9th ACM SIGCOMM Internet Measurement Conference (IMC '09). ACM Press, 2009. DOI: 10.1145/1644893.1644895.
[61] Paolo Russu, Ambra Demontis, Battista Biggio, Giorgio Fumera, and Fabio Roli. "Secure Kernel Machines against Evasion Attacks". In: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec '16), pp. 59–69. DOI: 10.1145/2996758.2996771.
[62] Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, and Tom Goldstein. "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks". Apr. 2018. arXiv: 1804.00792.
[63] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. "Membership Inference Attacks Against Machine Learning Models". In: Proceedings of the 2017 IEEE Symposium on Security and Privacy, pp. 3–18. DOI: 10.1109/SP.2017.41. arXiv: 1610.05820.
[64] D. B. Skillicorn. "Adversarial Knowledge Discovery". In: IEEE Intelligent Systems 24.6 (2009), pp. 1–13. DOI: 10.1109/MIS.2009.108.
[65] Congzheng Song, Thomas Ristenpart, and Vitaly Shmatikov. "Machine Learning Models that Remember Too Much". 2017. DOI: 10.1145/3133956.3134077. arXiv: 1709.07886.
[66] Nedim Šrndić and Pavel Laskov. "Practical evasion of a learning-based classifier: A case study". In: Proceedings of the 2014 IEEE Symposium on Security and Privacy, pp. 197–211. DOI: 10.1109/SP.2014.20.
[67] Jacob Steinhardt, Pang Wei Koh, and Percy Liang. "Certified Defenses for Data Poisoning Attacks". June 2017. arXiv: 1706.03691.
[68] Rock Stevens, Octavian Suciu, Andrew Ruef, Sanghyun Hong, Michael Hicks, and Tudor Dumitraş. "Summoning Demons: The Pursuit of Exploitable Bugs in Machine Learning". Jan. 2017. arXiv: 1701.04739.
[69] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. "Intriguing properties of neural networks". Dec. 2013, pp. 1–10. arXiv: 1312.6199.
[70] Pedro Tabacof and Eduardo Valle. "Exploring the space of adversarial images". In: Proceedings of the International Joint Conference on Neural Networks (IJCNN 2016), pp. 426–433. DOI: 10.1109/IJCNN.2016.7727230. arXiv: 1510.05328.
[71] Florian Tramèr, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. "The Space of Transferable Adversarial Examples". 2017. arXiv: 1704.03453.
[72] Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. "Stealing Machine Learning Models via Prediction APIs". In: Proceedings of the 25th USENIX Security Symposium (2016), pp. 601–618. arXiv: 1609.02943.
[73] Binghui Wang and Neil Zhenqiang Gong. "Stealing Hyperparameters in Machine Learning". In: Proceedings of the 2018 IEEE Symposium on Security and Privacy, pp. 36–52. DOI: 10.1109/SP.2018.00038. arXiv: 1802.05351.
[74] Lingxiao Wei, Yannan Liu, Bo Luo, Yu Li, and Qiang Xu. "I Know What You See: Power Side-Channel Attack on Convolutional Neural Network Accelerators". 2018. arXiv: 1803.05847.
[75] Xi Wu, Matthew Fredrikson, Somesh Jha, and Jeffrey F. Naughton. "A methodology for formalizing model-inversion attacks". In: Proceedings of the 2016 IEEE Computer Security Foundations Symposium. DOI: 10.1109/CSF.2016.32.
[76] Han Xiao, Huang Xiao, and Claudia Eckert. "Adversarial label flips attack on support vector machines". In: Frontiers in Artificial Intelligence and Applications 242 (2012), pp. 870–875. DOI: 10.3233/978-1-61499-098-7-870.
[77] Huang Xiao, Battista Biggio, Blaine Nelson, Han Xiao, Claudia Eckert, and Fabio Roli. "Support vector machines under adversarial label contamination". In: Neurocomputing 160 (2015), pp. 53–62. DOI: 10.1016/j.neucom.2014.08.081.
[78] Zhizhou Yin, Fei Wang, Wei Liu, and Sanjay Chawla. "Sparse Feature Attacks in Adversarial Learning". In: IEEE Transactions on Knowledge and Data Engineering 30.6 (2018), pp. 1164–1177. DOI: 10.1109/TKDE.2018.2790928.