Understanding the One-pixel Attack: Propagation Maps and Locality Analysis

Danilo Vasconcellos Vargas¹, Jiawei Su²
¹Kyushu University, ²KDDI
vargas@kyushu-u.ac.jp

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Deep neural networks were shown to be vulnerable to single-pixel modifications. However, the reason behind this phenomenon has never been elucidated. Here, we propose Propagation Maps, which show the influence of the perturbation in each layer of the network. Propagation Maps reveal that even in extremely deep networks such as Resnet, a modification of one pixel easily propagates up to the last layer. In fact, this initially local perturbation is also shown to spread and become a global one, reaching absolute difference values that are close to the maximum value of the original feature maps in a given layer. Moreover, we conduct a locality analysis in which we demonstrate that pixels near the perturbed one in the one-pixel attack tend to share the same vulnerability, revealing that the main vulnerability lies in neither neurons nor pixels but receptive fields. Hopefully, the analysis conducted in this work, together with the new technique called propagation maps, will shed light on the inner workings of other adversarial samples and be the basis of new defense systems to come.

Figure 1: a) Propagation Maps (PMmax) of a successful one-pixel attack on Resnet show how the influence of a one-pixel perturbation grows and spreads (bright colors show differences in the feature map that are close to the maximum original layer output). b) The average Propagation Map over the entire set of propagation maps shows the overall distribution of attacks and their propagation. c) Illustration of the locality analysis.

1 Introduction

Recently, a series of papers have shown that deep neural networks (DNN) are vulnerable to various types of attacks [Szegedy, 2014], [Nguyen et al., 2015], [Moosavi-Dezfooli et al., 2017], [Brown et al., 2017], [Su et al., 2019], [Moosavi-Dezfooli et al., 2016], [Carlini and Wagner, 2017], [Kurakin et al., 2016], [Sharif et al., 2016], [Athalye and Sutskever, 2018]. However, the reasons underlying these vulnerabilities are still largely unknown. We argue that one of the most important facets of adversarial machine learning resides in its investigative nature. In other words, adversarial machine learning provides us with key tools to understand current DNNs. As much as attacks tell us about the security of DNNs, they also tell us to what extent DNNs can reason over data and what they understand to be, for example, the concept of a "car" or a "horse".

In this paper, inspired by recent attacks and defenses, we propose a technique called propagation maps that could help explain most of them. Here, given the constrained space, we focus on one attack which is puzzling and largely unexplained, the one-pixel attack. Propagation maps enable us to show how a one-pixel perturbation may grow in influence over the layers and spread over many pixels to cause a final change in class. Moreover, statistical properties of the propagation reveal many properties of the attacks as well as their distribution (Figure 1).

Additionally, to further understand the one-pixel attack, a locality analysis is performed. The locality analysis consists of executing the attack on pixels near a successful one-pixel attack, i.e., using the same pixel perturbation but a different pixel position.
Indeed, attacks on nearby pixels succeed at a substantial and nearly identical rate across different neural networks (Figure 1), showing that rather than pixels or neurons, the vulnerability lies in some of the receptive fields. This reveals an interesting property shared among DNNs which is independent of the model or the attack success rate.

2 Adversarial Samples and Different Types of Attacks

The samples that can make machine learning algorithms misclassify received the name of adversarial samples. Let f(x) ∈ R^k be the output of a machine learning algorithm, where x ∈ R^(m×n) is its input, with input and output sizes m × n and k, respectively. It is possible to define adversarial samples x' explicitly as follows:

    x' = x + ε_x,
    { x' ∈ R^(m×n) | argmax_j f(x')_j ≠ argmax_i f(x)_i },        (1)

in which ε_x ∈ R^(m×n) is a small perturbation added to the input.

In adversarial machine learning one wants to search for adversarial samples. For that, it is possible to use knowledge of the DNN in question to craft samples, such as using back-propagation to obtain gradient information and subsequently applying gradient descent, as done by the "fast gradient sign" method proposed by I. J. Goodfellow et al. [Goodfellow et al., 2014a]. There is also the greedy perturbation searching method proposed by S. M. Moosavi-Dezfooli et al. [Moosavi-Dezfooli et al., 2016], and N. Papernot et al. utilize the Jacobian matrix of the function learned during training to create a saliency map which guides the search for adversarial samples [Papernot et al., 2016].
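To make the gradient-based crafting described above concrete, the following is a minimal FGSM-style sketch in PyTorch. It is an illustration only, not the method analyzed in this paper; the model interface, the value of eps, and the [0, 1] input range are assumptions.

```python
# Minimal sketch of gradient-based crafting in the spirit of the fast gradient
# sign method: one sign step along the input gradient of the loss.
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.03):
    """Return x' = clamp(x + eps * sign(grad_x L(f(x), label)))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)   # model returns logits (assumption)
    loss.backward()
    x_adv = x + eps * x.grad.sign()           # single gradient-sign step
    return x_adv.clamp(0.0, 1.0).detach()     # assumes pixel values live in [0, 1]
```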
However, it is possible to search for adversarial samples without taking into account the internal characteristics of DNNs. This type of model-agnostic search is also called a black-box attack. There are many black-box attacks in the literature, to cite some: [Narodytska and Kasiviswanathan, 2017], [Papernot et al., 2017], [Dang et al., 2017].

Regarding untargeted attacks, the objective function can be defined, for example, as the minimization of the soft label of the output class f(x)_i. See the complete equation below:

    maximize_(ε_x)  −f(x + ε_x)_i
    subject to  ‖ε_x‖ ≤ L.        (2)

The minimization of the difference between the highest soft label and the second highest one is another possible objective function for untargeted attacks.

It is important to note that ε should be small enough not to let the image actually become a different class. Such a transformation would invalidate the adversarial sample creation because it would not be a misclassification. Since most attacks use perturbations which span the whole image, ‖ε_x‖ ≤ L is a good optimization constraint. However, it is also possible to look at it the other way around and deal with only a few perturbed dimensions. In this case, the constraint changes to an L0 norm, which counts the perturbed dimensions, i.e., the total number of non-zero elements in the perturbation vector. The complete formulation is as follows:

    maximize_(ε_x)  −f(x + ε_x)_i
    subject to  ‖ε_x‖_0 ≤ d,        (3)

where d is a small number of dimensions (d = 1 for the one-pixel attack).

3 Recent Advances in Attacks and Defenses

The question of whether machine learning is secure was asked some time ago [Barreno et al., 2006], [Barreno et al., 2010]. However, it was only in 2013 that the security of Deep Neural Networks (DNN) was completely put into question [Szegedy, 2014]. C. Szegedy et al. demonstrated that by adding noise to an image it is possible to produce a visually identical image which can make DNNs misclassify. This was counter-intuitive, since the DNNs that misclassified had a very high test accuracy, rivaling even the accuracy of human beings.

Recently, the vulnerabilities of neural networks were shown to be even more aggravating. Universal adversarial perturbations, in which a single crafted perturbation is able to make a DNN misclassify multiple samples, were also shown to be possible [Moosavi-Dezfooli et al., 2017]. Moreover, in [Su et al., 2019] it was shown that even one pixel could make DNNs misclassify, indicating that although DNNs have a high accuracy in recognition tasks, their "understanding" of what is a "dog" or a "cat" is still very different from that of human beings. In fact, adversarial samples can be used to evaluate the robustness of a DNN [Moosavi-Dezfooli et al., 2016], [Carlini and Wagner, 2017].

Although much of the research in adversarial machine learning is conducted under ideal laboratory conditions, the same techniques are not difficult to apply to real-world scenarios, because printed-out adversarial samples still work, i.e., many adversarial samples are robust against different light conditions [Kurakin et al., 2016]. In fact, in [Athalye and Sutskever, 2018] the authors go a step further and verify the existence of 3D adversarial objects which can fool DNNs even when viewpoint, noise, and different light conditions are taken into consideration.

There are many works on attacks and defenses, but the reason behind such a lack of robustness in accurate classifiers is still largely unknown. In [Goodfellow et al., 2014b] it is argued that DNNs' linearity is one of the main reasons. If this is the case, perhaps hybrid systems that can leverage the non-linearity that arises from complex models, by using evolutionary optimization techniques such as self-organizing classifiers [Vargas et al., 2013] and neuroevolution with unified neuron models [Vargas and Murata, 2017], would make for a promising investigation.

4 One-Pixel Attack

The one-pixel attack investigated the opposite extreme of most attacks to date. Instead of searching for a small, spread-out perturbation, it focuses on perturbing just one pixel. This vulnerability to one pixel is a totally different scenario, i.e., neural networks that are vulnerable to usual attacks may not be vulnerable to the one-pixel attack and vice versa.

To achieve such an attack in a black-box scenario, the authors used differential evolution (DE), which is a simple yet effective evolutionary algorithm [Storn and Price, 1997]. A candidate solution is coded as a pixel position and its related perturbation. DE searches for promising candidate solutions by minimizing the output label of the correct class (Equation 2). In this paper, we use the same differential evolution settings as the original paper [Su et al., 2019]. However, here we define a successful attack as an adversarial attack made over a correctly classified sample. As a consequence, adversarial attacks over already misclassified samples will be ignored.
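As an illustration of the formulation in Equation 3 with d = 1, the sketch below runs a one-pixel search with SciPy's differential evolution. It follows the spirit of [Su et al., 2019], but the candidate encoding, the predict interface, and the DE settings shown here are assumptions for illustration, not the exact configuration used in this paper.

```python
# Sketch of a one-pixel attack: a candidate is (row, col, r, g, b); the fitness
# is the confidence of the true class, which the search minimizes (Equation 2
# objective under the L0 constraint of Equation 3 with d = 1).
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_attack(predict, image, true_label):
    """predict: callable mapping an HxWx3 float image in [0, 1] to class probabilities."""
    h, w, _ = image.shape
    bounds = [(0, h - 1), (0, w - 1), (0, 1), (0, 1), (0, 1)]

    def apply(candidate):
        row, col, r, g, b = candidate
        perturbed = image.copy()
        perturbed[int(row), int(col)] = (r, g, b)   # overwrite exactly one pixel
        return perturbed

    def fitness(candidate):
        return predict(apply(candidate))[true_label]  # true-class confidence to minimize

    result = differential_evolution(fitness, bounds, maxiter=100, popsize=10,
                                    recombination=1.0, tol=1e-5, seed=0, polish=False)
    adv = apply(result.x)
    success = np.argmax(predict(adv)) != true_label
    return adv, success
```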
5 Propagation Maps

In adversarial samples, the perturbation of the input image propagates throughout the neural network and changes its class. However, much of this process is unknown. In other words, how does this perturbation cause a change in the class label? What are the internal differences between adversarial attacks and failed attacks, if any?

Here, to walk towards an answer to the questions above, we propose a technique called propagation maps which can reveal the perturbation throughout the layers. Propagation maps consist of comparing the feature maps of the adversarial and original samples. Specifically, by calculating the difference between the feature maps and averaging them (or taking their maximum value) for each convolutional layer, the perturbation's influence can be estimated. Consider the element-wise maximum of a three-dimensional array O for indices a, b and k described as:

    M_(a,b) = max_k ( O_(a,b,0), O_(a,b,1), ..., O_(a,b,k) ),        (4)

where M is the resulting two-dimensional array. Therefore, for a layer i, its respective propagation map PM_i can be obtained by:

    PM_i = max_k | FM^adv_(i,k) − FM_(i,k) |,        (5)

where FM^adv_(i,k) and FM_(i,k) are respectively the feature maps of layer i and kernel k for the adversarial and the natural (original) samples.

Alternatively, one may wish to see the average over the filters, which exposes a slightly different influence, diluted over the kernels in the same layer. It can be computed as follows:

    PM_i = (1 / n_k) Σ_k | FM^adv_(i,k) − FM_(i,k) |,        (6)

where n_k is the number of filters.

PMmax and PMavg will be used to distinguish between propagation maps computed with Equations 5 and 6, respectively. Notice that, in order to put the perturbation's influence on the same scale as the original feature map, the maximum scale value is set to that of the original feature map when plotting.
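A minimal sketch of how PMmax and PMavg (Equations 5 and 6) could be computed for a PyTorch model is given below. Capturing feature maps with forward hooks on convolutional layers is an implementation assumption for illustration, not a prescription taken from the original experiments.

```python
# Sketch of propagation maps: feature maps are captured with forward hooks on
# every convolutional layer, then compared between original and perturbed input.
import torch
import torch.nn as nn

def capture_feature_maps(model, x):
    maps, hooks = [], []
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            hooks.append(module.register_forward_hook(
                lambda m, inp, out, store=maps: store.append(out.detach())))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return maps  # one tensor of shape (1, n_k, H, W) per convolutional layer

def propagation_maps(model, x_orig, x_adv):
    fm_orig = capture_feature_maps(model, x_orig)
    fm_adv = capture_feature_maps(model, x_adv)
    pm_max = [(a - o).abs().amax(dim=1)[0] for o, a in zip(fm_orig, fm_adv)]   # Eq. 5
    pm_avg = [(a - o).abs().mean(dim=1)[0] for o, a in zip(fm_orig, fm_adv)]   # Eq. 6
    return pm_max, pm_avg
```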
6 Propagation Maps for the One-Pixel Attack

To investigate how a single-pixel perturbation can cause a change in class, we will make use of the proposed propagation maps. This allows us to visualize the perturbation in each layer of the neural network. For the experiments, Resnet, which is one of the most accurate types of neural networks, is used. Each of the subsections below investigates a specific scenario.

6.1 Single Pixel Perturbations that Change Class

In Figure 2, the propagation map (PMmax) of a successful one-pixel attack is shown. The perturbation is shown to start small and localized and then spread in deeper layers. In the last layer, the perturbation spread enough to strongly influence more than a quarter of the feature map. This is the element-wise maximum behavior, which allows us to identify how strong the maximum difference in feature maps is.

The propagation map based on averaged differences (PMavg) shows that the difference is concentrated in some feature maps (Figure 3). Moreover, this average difference is kept more or less the same throughout the layers. In the case of PMmax, the difference had a sort of wave behavior, sometimes growing in strength, sometimes slightly fading away. All observed adversarial samples shared similar features in their propagation maps. This is to be expected, since they need enough influence in order to change the class.

Surprisingly, a one-pixel change can cause influences that spread over the entire feature map, especially in deeper layers. This also contradicts, to some extent, the expectation that deeper layers process only high-level features.

6.2 Single Pixel Perturbations that Do Not Change Class

Successful one-pixel attacks were shown to grow their influence throughout the layers, culminating in a strong and spread-out influence in the last layers. Here we change the position of the pixel so that the attack no longer succeeds. Figure 4 shows that in such a case the influence's intensity decreases; in fact, in the last layer the influence is almost imperceptible. However, this is not the rule. A counterexample is shown in Figure 5, in which a pixel is changed without changing the class label. This time, however, the perturbation propagates strongly, being as strong if not stronger than the successful one-pixel attack observed in Figure 2. One might argue that the influence has caused the confidence to decrease, but not enough to cause the change; for this case, the confidence indeed decreases from 99% to 52%. However, Figure 6 shows a similar behavior although the confidence decreases by only one percentage point (from 100% to 99%).

Thus, qualitatively speaking, unsuccessful one-pixel attacks do not necessarily fail to achieve a high influence in the last layer. It depends strongly on the pixel position and the sample. This is in accordance with saliency maps, which show that different parts of the image have different importance in the recognition process [Simonyan et al., 2013].

Figure 2: Propagation Map (PMmax) for Resnet using a sample from CIFAR. For this experiment, Equation 5 is used. The sample above is incorrectly classified as automobile after one pixel is changed in the image. Values are scaled such that the maximum value of the feature maps in each layer corresponds to the maximum value achievable in the color map. Therefore, bright values show that the difference in activation is close to the maximum value in the feature map, demonstrating the strength of the former.

Figure 3: Propagation Map (PMavg) for Resnet using a sample from CIFAR. For this experiment, Equation 6 is used. Values are scaled such that the maximum value of the feature maps in each layer corresponds to the maximum value achievable in the color map.

Figure 4: Propagation Map (PMmax) for a perturbation that failed to change the class for Resnet using a sample from CIFAR. For this experiment, Equation 5 is used. The sample above is correctly classified as cat even after one pixel is changed in the image. Values are scaled such that the maximum value of the feature maps in each layer corresponds to the maximum value achievable in the color map.

Figure 5: Propagation Map (PMmax) for a perturbation that failed to change the class for Resnet using a sample from CIFAR. For this experiment, Equation 5 is used. The sample above is correctly classified as horse even after one pixel is changed in the image. Values are scaled such that the maximum value of the feature maps in each layer corresponds to the maximum value achievable in the color map.

Figure 6: Propagation Map (PMmax) for a perturbation that failed to change the class for Resnet using a sample from CIFAR. For this experiment, Equation 5 is used. The sample above is correctly classified as airplane even after one pixel is changed in the image. Values are scaled such that the maximum value of the feature maps in each layer corresponds to the maximum value achievable in the color map.

7 Statistical Evaluation of Propagation Maps

In previous sections, single attacks were analyzed in the light of examples and counterexamples. These experiments were important to investigate in detail what happens in each of these attacks. However, they do not tell much about the distribution of attacks. This section aims to fill this gap by investigating the attacks' distribution and other statistically relevant data.

By evaluating the average of PMmean over successful attacks, we can observe the attack distribution over the entire feature map as well as its average spread over the layers (Figure 7). The successful attacks seem to concentrate mostly close to the center of the image. In deeper layers, the influence expands and increases in intensity, especially at its center. This reveals that the behavior observed in Figure 2 is usual for most of the attacks.

Given this distribution for successful attacks, it would be interesting to contrast them with failed attempts. This would enable us to further clarify the characteristics of a successful attack. To check whether there is any explicit difference between successful and failed attacks, we explicitly calculated the mean over all the feature maps for the previous successful and unsuccessful attacks. The plot is shown in Figure 8. The average difference is also unable to distinguish between successful and failed attacks, with both having very similar behavior.

Figure 7: Average of PMmean over 318 successful attacks on Resnet from the CIFAR dataset, i.e., all successful attacks from 1000 trials.

Figure 8: Average difference for all layers when the attack is successful, when it fails, and the average value of the feature map without any modification.
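The aggregation used in this section can be sketched as follows, assuming per-layer propagation maps (e.g., from the earlier sketch) have already been collected for each attack; the function names are illustrative.

```python
# Sketch of the statistical evaluation: average propagation maps over many
# attacks, and reduce each layer to a single mean difference so that successful
# and failed attacks can be compared layer by layer (as in Figure 8).
import torch

def average_propagation_maps(pm_list):
    """pm_list: one entry per attack; each entry is a list of per-layer 2-D maps."""
    n_layers = len(pm_list[0])
    return [torch.stack([pm[i] for pm in pm_list]).mean(dim=0) for i in range(n_layers)]

def per_layer_mean_difference(pm_list):
    """Return one scalar per layer: the mean absolute difference averaged over attacks."""
    return [layer.mean().item() for layer in average_propagation_maps(pm_list)]
```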
8 Position Sensitivity and Locality

The one-pixel attack works by searching for a one-pixel perturbation for which the class is modified (misclassified). This search process is costly, but to what extent does the success of the attack depend on the pixel's position?

Here the position sensitivity of the attack is analyzed. First, we consider an attack in which a pixel is randomly chosen to be perturbed by the same amount that could create an adversarial sample. The results, shown in Table 1, demonstrate that random-pixel attacks have a very low success rate. This suggests that position is important and that by disregarding it, it is almost impossible to achieve a successful attack. Having said that, the attack on nearby pixels (i.e., the eight adjacent pixels) shows a positive result. In fact, considering that this attack does not conduct any search at all but merely takes one nearby pixel, the results may be considered extremely positive.

The extremely positive results in Table 1, however, are in accordance with the receptive fields of convolutional layers. In other words, every neuron in a convolutional layer computes a convolution of the kernel with a part of the input image of the same size. The convolution itself is a linear function in which a change in one input affects the whole result. Thus, the result of the convolution will be affected in a similar way for nearby neurons whose receptive fields contain the perturbed pixel. Consequently, this shows that the vulnerable part of DNNs is neither neurons nor pixels but some receptive fields.

Interestingly, completely different networks have a very similar success rate for nearby attacks. This further demonstrates that the receptive field is the vulnerable part. Since neural networks with similar architectures share similar receptive-field relationships, nearby attacks on similar networks should have similar success rates.

Table 1: Success rate of the one-pixel attack on both nearby pixels and a single randomly chosen pixel. This experiment is conducted in the following manner. First, the one-pixel attack is executed. Afterwards, the same perturbation is used to modify one random pixel of the original image and the success of the method is evaluated. To obtain a statistically relevant result, both the random and the nearby pixel attacks are repeated once per image for each successful attack in 5000 samples of the CIFAR-10 dataset (in which there are 1638 successful attacks for Resnet and 2934 successful attacks for Lenet).

                                        Lenet    Resnet
Original one-pixel attack               59%      33%
One-pixel attack on random pixels       4.9%     3.1%
One-pixel attack on nearby pixels       33.1%    31.3%
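A sketch of the locality analysis is given below: the color of a successful one-pixel perturbation is re-applied at the eight adjacent positions and at one random position, and the misclassification rate is measured. The predict interface and helper names are assumptions for illustration (the same interface assumed in the earlier one-pixel attack sketch).

```python
# Sketch of the locality analysis: re-apply a successful perturbation at the
# eight neighbouring positions (and at a random position) and check whether the
# class still flips.
import numpy as np

def reapply_at(image, row, col, rgb):
    out = image.copy()
    out[row, col] = rgb
    return out

def nearby_success_rate(predict, image, true_label, row, col, rgb):
    h, w, _ = image.shape
    hits, total = 0, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < h and 0 <= c < w:
                total += 1
                hits += np.argmax(predict(reapply_at(image, r, c, rgb))) != true_label
    return hits / total

def random_pixel_success(predict, image, true_label, rgb, rng=np.random):
    h, w, _ = image.shape
    r, c = rng.randint(0, h), rng.randint(0, w)
    return np.argmax(predict(reapply_at(image, r, c, rgb))) != true_label
```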
9 The Conflicting Salience Hypothesis

Propagation maps demonstrated that some pixels' influence fails to reach the last layer (Figure 4) while others influence the last layer enough to cause a change in class (Figure 2). This analysis bears a close resemblance to saliency maps, in which one wishes to discover which pixels are responsible for a class. In fact, since propagation maps measure the amount of influence from perturbations, it would be reasonable to assume that they may have a close relationship with disturbances in saliency maps. Consequently, adversarial samples would cause enough disturbance in saliency maps to cause a change in class. Thus, we raise here the hypothesis of a conflicting saliency from adversarial samples.

If this is true, then what adversarial machine learning is doing is not fooling DNNs but rather taking away their attention. It is like a magician who calls attention to his right hand while his left hand pushes the magic ball, or like a blinking light on the street that draws the attention of a driver who then suddenly drives through a red traffic light.

Having said that, propagation maps are a feedforward-based technique to visualize and measure the influence of a perturbation, while saliency maps aim to investigate the salient pixels for a given class using backpropagated gradients. Therefore, the methods differ in many ways and their relationship, which might be more complex than what is stated here, goes beyond the scope of this paper. We leave this as an open question that should be worthwhile to investigate.

10 Conclusions

This paper proposed a novel technique called propagation maps and used it to analyze one of the most puzzling attacks, the one-pixel attack. The analysis showed how a pixel modification causes an influence throughout the layers, culminating in the change of the class. Moreover, a locality analysis revealed that receptive fields are the vulnerable parts of DNNs and that, therefore, attacks on pixels near a successful one-pixel attack have a high success rate. Lastly, a new hypothesis was proposed that, together with the proposed propagation maps, could help explain the reason behind adversarial attacks on DNNs.

Acknowledgements

This work was supported by JST, ACT-I Grant Number JP-50243 and JSPS KAKENHI Grant Number JP20241216.

References

[Athalye and Sutskever, 2018] Anish Athalye and Ilya Sutskever. Synthesizing robust adversarial examples. In ICML, 2018.

[Barreno et al., 2006] Marco Barreno, Blaine Nelson, Russell Sears, Anthony D Joseph, and J Doug Tygar. Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pages 16–25. ACM, 2006.

[Barreno et al., 2010] Marco Barreno, Blaine Nelson, Anthony D Joseph, and J Doug Tygar. The security of machine learning. Machine Learning, 81(2):121–148, 2010.

[Brown et al., 2017] Tom B Brown, Dandelion Mané, Aurko Roy, Martín Abadi, and Justin Gilmer. Adversarial patch. arXiv preprint arXiv:1712.09665, 2017.

[Carlini and Wagner, 2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.

[Dang et al., 2017] Hung Dang, Yue Huang, and Ee-Chien Chang. Evading classifiers by morphing in the dark. 2017.

[Goodfellow et al., 2014a] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[Goodfellow et al., 2014b] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.

[Kurakin et al., 2016] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.

[Moosavi-Dezfooli et al., 2016] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2574–2582, 2016.

[Moosavi-Dezfooli et al., 2017] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94. IEEE, 2017.

[Narodytska and Kasiviswanathan, 2017] Nina Narodytska and Shiva Kasiviswanathan. Simple black-box adversarial attacks on deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1310–1318. IEEE, 2017.

[Nguyen et al., 2015] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.

[Papernot et al., 2016] Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387. IEEE, 2016.

[Papernot et al., 2017] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pages 506–519. ACM, 2017.

[Sharif et al., 2016] Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, and Michael K Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1528–1540. ACM, 2016.

[Simonyan et al., 2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[Storn and Price, 1997] Rainer Storn and Kenneth Price. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, 1997.

[Su et al., 2019] Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, 2019.

[Szegedy, 2014] Christian Szegedy et al. Intriguing properties of neural networks. In ICLR, 2014.

[Vargas and Murata, 2017] Danilo Vasconcellos Vargas and Junichi Murata. Spectrum-diverse neuroevolution with unified neural models. IEEE Transactions on Neural Networks and Learning Systems, 28(8):1759–1773, 2017.

[Vargas et al., 2013] Danilo Vasconcellos Vargas, Hirotaka Takano, and Junichi Murata. Self organizing classifiers: first steps in structured evolutionary machine learning. Evolutionary Intelligence, 6(2):57–72, 2013.