=Paper=
{{Paper
|id=Vol-2646/47-paper
|storemode=property
|title=kNN-guided Adversarial Attacks
|pdfUrl=https://ceur-ws.org/Vol-2646/47-paper.pdf
|volume=Vol-2646
|authors=Fabio Valerio Massoli,Fabrizio Falchi,Giuseppe Amato
|dblpUrl=https://dblp.org/rec/conf/sebd/MassoliFA20
}}
==kNN-guided Adversarial Attacks==
kNN-guided Adversarial Attacks (DISCUSSION PAPER)

Fabio Valerio Massoli[0000-0001-6447-1301], Fabrizio Falchi[0000-0001-6258-5313], and Giuseppe Amato[0000-0003-0171-4315]

ISTI-CNR, via G. Moruzzi 1, 56124 Pisa, Italy
{fabio.massoli, fabrizio.falchi, giuseppe.amato}@isti.cnr.it

Abstract. In the last decade, we have witnessed a renaissance of Deep Learning models. Nowadays, they are widely used in industrial as well as scientific fields, and, noticeably, these models have reached super-human performance on specific tasks such as image classification. Unfortunately, despite their great success, it has been shown that they are vulnerable to adversarial attacks - images to which a specific amount of noise, imperceptible to human eyes, has been added to lead the model to a wrong decision. Typically, these malicious images are forged pursuing a misclassification goal. However, when considering the task of Face Recognition (FR), this principle might not be enough to fool the system. Indeed, in the context of FR, deep models are generally used merely as feature extractors, while the final task of recognition is accomplished, for example, by similarity measurements. Thus, crafting adversarials to fool the classifier might not be sufficient to fool the overall FR pipeline. Starting from this observation, we proposed to use a k-Nearest Neighbour algorithm as guidance to craft adversarial attacks against an FR system. In our study, we showed how this kind of attack can be more threatening for an FR system than misclassification-based ones, considering both the targeted and untargeted attack strategies.

Keywords: Adversarial Attacks · Face Recognition · Convolutional Neural Networks.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.

1 Introduction

Since the publication of AlexNet [6] in 2012, Deep Learning (DL) models have started to play a central role in several scientific and industrial fields. Moreover, due to the extremely high computational power reached by modern GPUs, the use of DL techniques has become state-of-the-art for solving problems such as vision (e.g., image classification [6], object detection [4]), natural language processing [14], and sentiment analysis [9]. Despite this rebirth, there is a severe threat that poses a strong limit to the use of Deep Neural Networks (DNNs) in real-world scenarios, especially in sensitive contexts such as surveillance systems [11]. It was recently shown that DNNs are vulnerable to adversarial samples [12,1] - images to which a precise amount of noise, imperceptible to human eyes, is added to fool a model. As shown in the next section of the paper, it is common for adversaries to craft malicious samples following a misclassification principle, i.e., with the goal of leading a model, such as a classifier, to output a wrong prediction with very high confidence. However, if we consider the case of Face Recognition (FR) systems, the paradigm deeply changes. Indeed, differently from classical classification tasks, in the context of FR a DNN is usually used merely as a feature extractor [8,13], while the final task of recognition is realized by performing, for example, similarity measurements among the extracted representations. For instance, we can consider the case of Face Identification (FI), in which the features extracted from a query image have to be compared with a database of known identities to identify a person. Thus, adversarials crafted employing a misclassification principle might not succeed in fooling a similarity-based scheme.
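To make the similarity-based recognition step concrete, the short sketch below shows how such an identification step could look. It is only an illustrative example under assumed names (identify, gallery_centroids, labels, threshold) and is not the actual pipeline used in the paper.

```python
import numpy as np

def identify(query_feat, gallery_centroids, labels, threshold):
    """Assign the query to the closest identity centroid in feature space,
    or reject it as unknown if the distance exceeds the decision threshold.
    gallery_centroids: (N, d) array of per-identity mean deep features."""
    dists = np.linalg.norm(gallery_centroids - query_feat, axis=1)
    best = int(np.argmin(dists))
    return labels[best] if dists[best] < threshold else None  # None = not recognized
```

Note that, in such a pipeline, an adversarial only has to cross a distance threshold in feature space rather than a classifier's decision boundary, which is the observation the rest of the paper builds on.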
Generally speaking, we can divide the attacks against FR systems into two categories, namely impersonation and evasion attacks. In the former case, the goal of the attacker is to lead the system to recognize two faces as belonging to the same person when that is not true, while in the latter case the attacker wants the system not to recognize a person. In this context, starting from the idea of deep representation attacks [10], we proposed a variant based on the use of a k-Nearest Neighbour (k-NN) algorithm as guidance to fool an FR system that relies on similarity measurements to assess the identity of queries. Moreover, we showed how such an attack can pose a greater threat than misclassification-based attacks.

The rest of the paper is organized as follows: in Section 2, we briefly describe some known algorithms to craft adversarial samples. In Section 3 and Section 4, we describe our approach and present the results of our study, respectively. Finally, in Section 5, we report the conclusions and future perspectives of our work.

2 Adversarial Attacks

Usually, the guiding principle followed while designing adversarial attack algorithms is to lead the model to assign a wrong label with high probability to an image. One of the first studies in this direction was proposed by [12], which designed an optimization procedure based on the L-BFGS algorithm. Specifically, the authors solved the following optimization problem:

\[ \min_{\delta} \; L(x + \delta, y_{gt}) + \lambda \cdot \| \delta \|_2 \,, \quad \text{with } x + \delta \in [l, u]^m \,, \tag{1} \]

where L is the loss function, λ is a parameter found by line search, y_gt is the ground truth label for the input x, δ is the adversarial perturbation, l and u represent the lower and upper bounds for the pixel values, respectively, and x + δ is the current adversarial sample crafted from the given input x. A faster way to produce malicious images was proposed by [5], in which the authors linearize the loss function around the current value of the parameters, thus crafting the adversarial as:

\[ x_{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y_{true})) \,. \tag{2} \]

The method shown in Equation 2 is known as the Fast Gradient Sign Method (FGSM) and allows adversarial samples to be crafted with only a single gradient step. In Equation 2, ∇_x J(θ, x, y) is the gradient of the loss function J evaluated at the current neural network parameters θ, x is the input, y_true is its label, and ε is the maximum distortion allowed on the input such that ‖x − x_adv‖_∞ < ε. In [7], the authors proposed a unified view on adversarial attacks and defenses. Specifically, they reformulated the adversarial training of deep models as a "saddle-point" optimization problem

\[ \min_{\theta} \rho(\theta) = \min_{\theta} \; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\delta \in S} L(\theta, x + \delta, y) \Big] \,, \tag{3} \]

in which the inner maximization aims at finding an adversarial of a given input x, while the outer minimization problem has the goal of reducing the loss after the attack. Finally, they proposed an iterative procedure named Projected Gradient Descent (PGD):

\[ x^{adv}_{N+1} = \Pi_{x+S}\big( x^{adv}_N + \alpha \cdot \mathrm{sign}(\nabla_x L(\theta, x, y)) \big) \,. \tag{4} \]
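As an illustration of Equation 4, the following is a minimal, untargeted L-infinity PGD sketch in PyTorch; the step size, budget, and number of iterations are arbitrary assumptions, and with a single step of size ε it reduces to the FGSM of Equation 2.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Untargeted L-inf PGD (Eq. 4): repeatedly ascend the loss and project
    the perturbation back onto the eps-ball around the clean input x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient-sign step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # projection onto x + S
            x_adv = x_adv.clamp(0, 1)                  # keep a valid image
    return x_adv.detach()
```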
An improved version of the iterative attacks, named Momentum Iterative FGSM (MI-FGSM), was proposed in [3]. The authors integrated a momentum term that helps to escape from poor local maxima during the iterations, thus yielding stronger attacks. The main idea is first to evaluate the velocity vector in the gradient direction and then use it to craft the adversarial. The velocity vector is given by

\[ g_{N+1} = \mu \cdot g_N + \frac{\nabla_x J(x^{adv}_N, y)}{\| \nabla_x J(x^{adv}_N, y) \|_1} \,, \tag{5} \]

where x^adv_0 = x, g_0 = 0, μ is the decay factor of the running average, and y is the ground truth label. Finally, the adversarial example in the ε-vicinity, measured by the L2 distance, is given by

\[ x^{adv}_{N+1} = x^{adv}_N + \alpha \cdot \frac{g_{N+1}}{\| g_{N+1} \|_2} \,, \tag{6} \]

where α = ε/T, with T being the total number of iterations.

All the attacks discussed hitherto focus on fooling a model by letting it predict a wrong class, with high confidence, for the given input. Nevertheless, the situation changes in the context of FR systems. Indeed, in such cases, the neural network is usually not used as a classifier, but as a feature extractor. For each query image, the model extracts a deep representation, which in turn is compared with a database of known identities. Thus, in those cases, the previous attack techniques might not be as effective as one could expect. Following this concern, interesting suggestions on how to design attacks better suited to FR systems came from [10]. In [10], the authors formulated the attack against a DNN by looking at the internal representation of the model; in other words, they focused their attack on making the deep features of an adversarial as close as possible to those of a target query. More formally, given a pair (I_s, I_t), where the former is the source image and the latter the target one, the goal of the crafting process is to find a new image, I_α, such that its internal representation at a specific layer l of the neural network under attack, φ_l(I_α), is as close as possible to that of the target image, φ_l(I_t). Meanwhile, I_α has to be as close as possible to the source image, I_s, in the input space. The optimization problem can be formulated as follows:

\[ I_{\alpha} = \arg\min_{I} \; \| \phi_l(I) - \phi_l(I_t) \|_2^2 \,, \quad \text{subject to } \| I - I_s \|_{\infty} < \delta \,, \tag{7} \]

where δ is the maximum perturbation allowed on each pixel.
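A possible reading of Equation 7 in code is sketched below. The original work [10] solves the constrained problem with a box-constrained L-BFGS solver; here, for brevity, plain projected gradient descent is used, and phi, the step size, and the iteration count are assumptions.

```python
import torch

def feature_space_attack(phi, x_src, x_tgt, delta=7/255, lr=0.01, steps=200):
    """Deep-representation attack (Eq. 7): pull phi(x_adv) towards phi(x_tgt)
    while keeping x_adv within an L-inf ball of radius delta around x_src."""
    with torch.no_grad():
        target_feat = phi(x_tgt)
    x_adv = x_src.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = ((phi(x_adv) - target_feat) ** 2).sum()  # squared L2 distance in feature space
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - lr * grad                             # move features towards the target
            x_adv = x_src + (x_adv - x_src).clamp(-delta, delta)  # enforce |I - I_s|_inf < delta
            x_adv = x_adv.clamp(0, 1)                             # stay a valid image
    return x_adv.detach()
```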
3 Proposed Approach

In our experiments, we considered adversarial samples crafted using two guiding principles. On the one hand, we used a typical misclassification-based approach in which the goal of the attacks was to fool a DNN acting as a classifier. On the other hand, we proposed an attack that explicitly takes into account a more realistic FR setup in which the DNN is used as a feature extractor, and the final FR task is accomplished by employing similarity measurements among deep representations. In this case, we used our proposed approach, in which a k-NN algorithm acts as guidance to craft adversarial samples. Finally, we compared the effectiveness of both approaches in fooling an FR system.

Regarding the threatened model, we considered the state-of-the-art SeNet-50 [2], from which we extracted the deep features at the penultimate layer. As source dataset, we used the test set of VGGFace2 [2], which comprises 500 identities, that we split into training and experimental sets. To be able to craft misclassification-based attacks, we attached a fully connected (FC) layer on top of the feature extractor; to train it, we used the training split cited above and the SGD optimizer with a learning rate of 1e-3, a momentum of 0.9, and a batch size of 128. Then, we used the PGD [7] and the MI-FGSM [3] algorithms to craft misclassification-based adversarials.

Concerning our proposed approach, instead, we started by training the k-NN using the class centroids of the VGGFace2 [2] training split images, and then we set k = 1. Subsequently, we cast the adversarial crafting procedure as an optimization problem, and we used the L-BFGS algorithm to solve it. Specifically, the main parameters to provide to the algorithm were the following: a starting point from which to begin the optimization procedure, a target, the function to minimize, and a set of parameters that are directly passed to it. The starting point was always a query image from which we wanted to craft an adversarial, while the target was the centroid of the class we wanted to move the adversarial features close to. As a minimization criterion, we used the regression loss. Specifically, the loss was evaluated between the target centroid and the deep features of the current state of the adversarial sample. At each step of the optimization procedure, we used the k-NN to assess whether the deep representation of the adversarial was able to fool the FR system, i.e., whether it was close enough to the target centroid. If so, we stopped the attack; otherwise, we continued the optimization. Moreover, to be sure that the adversarial remained close enough to the query image so that a human eye could not distinguish between them, we empirically bounded their distance, in the pixel space, to be δ ≤ 7 (Equation 7). A schematic view of the overall algorithm is given in Figure 1, and a code sketch of the attack loop is shown below.

Figure 1. Schematic view of the adversarial generation process. Left: initial feature representation of the source image (green circle) and the target class centroid (red rhombus). Right: features of the crafted adversarial sample (red solid circle). The blue rhombuses represent the centroids of the other classes.

In Figure 1, we reported a face image next to the target centroid (red rhombus) only to show an example of the identity represented by the target point.
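A minimal sketch of the k-NN-guided loop described above is given next, assuming pixel values in [0, 255], a feature extractor phi that returns a 1-D feature vector, and a matrix of class centroids; the paper solves the minimization with L-BFGS, whereas plain gradient steps are used here to keep the example short.

```python
import torch
import torch.nn.functional as F

def knn_guided_attack(phi, x_query, centroids, target_id, delta=7.0, lr=0.5, max_steps=500):
    """k-NN-guided attack (k = 1): minimise the regression (MSE) loss between the
    adversarial deep features and the target class centroid, stop as soon as the
    1-NN over the centroids assigns the adversarial to target_id, and keep every
    pixel within delta of the original query (pixels assumed in [0, 255])."""
    target_centroid = centroids[target_id]
    x_adv = x_query.clone().detach()
    for _ in range(max_steps):
        x_adv.requires_grad_(True)
        loss = F.mse_loss(phi(x_adv), target_centroid)  # regression loss to the centroid
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - lr * grad
            x_adv = x_query + (x_adv - x_query).clamp(-delta, delta)  # pixel-space bound, delta <= 7
            x_adv = x_adv.clamp(0, 255)
            dists = torch.norm(centroids - phi(x_adv), dim=1)  # 1-NN check over class centroids
            if int(torch.argmin(dists)) == target_id:
                break  # the FR pipeline is fooled; stop the attack
    return x_adv.detach()
```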
4 Experimental Results

In this section, we compare the adversarials obtained through our method with the ones obtained with misclassification-based attacks. Specifically, we compared our k-NN-guided attack against the PGD [7] and MI-FGSM [3] attacks. In Figure 2, we report a few examples of the manipulated images produced by our algorithm.

Figure 2. Examples of adversarial attacks crafted by using a k-NN as guidance. From left to right, each column corresponds to: source image (I_s), target class image (I_t), adversarial sample (I_α), and adversarial noise, respectively.

In Figure 2, we can notice how similar the images in the first and third columns appear to a human eye. Thus, even though we surfed the feature space to craft the adversarials, we were able to keep the malicious inputs very close to the original images in the pixel space. Subsequently, we compared the adversarials we generated with the k-NN with the ones crafted employing other attacks. Specifically, we considered attacks focused on fooling the classification task. Since our goal was to show that misclassification-based attacks might not be suitable to produce samples able to fool an FR system, a fundamental property to analyze was the Euclidean distance between the deep features of the adversarials and the centroids of the respective adversarial class. Such a scenario emulates the case in which the features extracted from a query image have to be compared against a database of known identities. The results are shown in Figure 3 for both targeted and untargeted attacks.

Figure 3. Euclidean distance of the adversarial features from the centroids of the adversarial class. Left: targeted attacks. Right: untargeted attacks. The dash-dotted line represents a hypothetical threshold used by an FR system to assess the similarity of deep features.

As we can see from Figure 3, the samples generated using the k-NN-guided algorithm have a lower expected Euclidean distance from the (adversarial) class centroids than the ones generated with the other attacks. Such a result holds for targeted as well as untargeted attacks. As a gedankenexperiment, suppose that the dash-dotted black line in Figure 3 represents the threshold applied by the FR system when it has to decide on an FI task, for instance. In this case, to identify a person, the FR system evaluates the distance of the deep features of the query image from the centroids of the various identities available in the database. Thus, if we consider the threshold shown in Figure 3, we observe that the attack success rate of misclassification-based attacks dramatically decreases compared to what happens for the k-NN-guided ones. The results are reported in Table 1.

Table 1. Attack success rates after applying the distance threshold shown in Figure 3. The 100% reference value is given by the total number of crafted adversarials.

Attack Algorithm | Targeted success rate with thr (%) | Untargeted success rate with thr (%)
PGD              | 2.7                                | 16.1
MI-FGSM          | 4.5                                | 17.2
k-NN             | 97.3                               | 88.7

To sum up, we can assert that, when attacking an FR system in which a DNN is used as a backbone feature extractor, misclassification-based attacks might not be as successful at fooling the recognition system as they are in classification contexts. This happens because, in the FR scenario, the final decision does not usually rely on the output of a deep classifier but rather on a similarity measurement among deep features. Thus, it is mandatory to consider such aspects while designing attacks.
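The evaluation summarised in Table 1 can be sketched as follows (the array names are illustrative assumptions): an attack counts as successful only if the adversarial's deep features fall within the FR distance threshold of the centroid of the class it was pushed towards (targeted) or ended up in (untargeted).

```python
import numpy as np

def success_rate_with_threshold(adv_feats, assigned_ids, centroids, thr):
    """Percentage of adversarials whose deep features lie within the FR
    distance threshold of their (adversarial) class centroid, as in Table 1.
    adv_feats: (M, d) features of the crafted adversarials,
    assigned_ids: (M,) indices of the adversarial classes,
    centroids: (C, d) per-identity centroids."""
    dists = np.linalg.norm(adv_feats - centroids[assigned_ids], axis=1)
    return 100.0 * float(np.mean(dists < thr))
```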
5 Conclusion and Future Perspectives

The threat of adversarial attacks poses severe limits on the use of deep learning techniques in sensitive real-world applications such as surveillance systems. Several algorithms have been proposed to fool a DNN into outputting a wrong prediction with high confidence. As shown, even though a classical misclassification-based attack can fool a neural network used as a classifier, the very same attack might not be strong enough to fool an FR system in which the output is based on similarity measurements carried out on deep features. Thus, different strategies have to be considered. We demonstrated that it is possible to fool an FR system through a deep-feature attack in which a k-NN is used as a guide in finding the proper perturbation to apply to the input image. In this way, it is possible to bring an adversarial closer to the features of the aimed identity than a misclassification-based attack does. These results hold for both the targeted and untargeted attacks. There are several potential extensions of this work. On the one hand, different techniques to craft adversarials against an FR system might be considered, for example, with k > 1 or based on other guiding principles. On the other hand, it would be interesting to try to detect those kinds of attacks, in which the adversary tries to emulate the deep representation of natural images within an attack.

References

1. Biggio, B., Corona, I., Maiorca, D., Nelson, B., Šrndić, N., Laskov, P., Giacinto, G., Roli, F.: Evasion attacks against machine learning at test time. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 387-402. Springer (2013)
2. Cao, Q., Shen, L., Xie, W., Parkhi, O.M., Zisserman, A.: VGGFace2: A dataset for recognising faces across pose and age. In: 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018). pp. 67-74. IEEE (2018)
3. Dong, Y., Liao, F., Pang, T., Su, H., Zhu, J., Hu, X., Li, J.: Boosting adversarial attacks with momentum. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 9185-9193 (2018)
4. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision. pp. 1440-1448 (2015)
5. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. pp. 1097-1105 (2012)
7. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
8. Massoli, F.V., Amato, G., Falchi, F.: Cross-resolution learning for face recognition. arXiv preprint arXiv:1912.02851 (2019)
9. Ortis, A., Farinella, G.M., Battiato, S.: An overview on image sentiment analysis: Methods, datasets and current challenges. In: Proc. 16th Int. Joint Conf. E-Bus. Telecommun. vol. 1, pp. 290-300 (2019)
10. Sabour, S., Cao, Y., Faghri, F., Fleet, D.J.: Adversarial manipulation of deep representations. arXiv preprint arXiv:1511.05122 (2015)
11. Sreenu, G., Durai, M.S.: Intelligent video surveillance: a review through deep learning techniques for crowd analysis. Journal of Big Data 6(1), 48 (2019)
12. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
13. Wang, F., Cheng, J., Liu, W., Liu, H.: Additive margin softmax for face verification. IEEE Signal Processing Letters 25(7), 926-930 (2018)
14. Young, T., Hazarika, D., Poria, S., Cambria, E.: Recent trends in deep learning based natural language processing. IEEE Computational Intelligence Magazine 13(3), 55-75 (2018)