Projecting Trouble: Light Based Adversarial Attacks on Deep Learning Classifiers

Nicole Nichols 1,2 (nicole.nichols@pnnl.gov) and Robert Jasper 1 (robert.jasper@pnnl.gov)
1 Pacific Northwest National Laboratory, Seattle, Washington
2 Western Washington University, Bellingham, Washington

Copyright © by the papers' authors. Copying permitted for private and academic purposes. In: Joseph Collins, Prithviraj Dasgupta, Ranjeev Mittu (eds.): Proceedings of the AAAI Fall 2018 Symposium on Adversary-Aware Learning Techniques and Trends in Cybersecurity, Arlington, VA, USA, 18-19 October, 2018, published at http://ceur-ws.org

Abstract

This work demonstrates a physical attack on a deep learning image classification system using light projected onto a physical scene. Prior work is dominated by techniques for creating adversarial examples that directly manipulate the digital input of the classifier. Such an attack is limited to scenarios where the adversary can directly update the inputs to the classifier, for example by intercepting and modifying the inputs to an online API such as Clarifai or Cloud Vision. Such limitations have led to a vein of research on physical attacks, where objects are constructed to be inherently adversarial or adversarial modifications are added to cause misclassification. Our work differs from other physical attacks in that we can cause misclassification dynamically, without altering physical objects in a permanent way.

We construct an experimental setup that includes a light projection source, an object for classification, and a camera to capture the scene. Experiments are conducted against 2D and 3D objects from CIFAR-10. Initial tests show that projected light patterns selected via differential evolution could degrade classification from 98% to 22% probability for the 2D target and from 89% to 43% for the 3D target. Subsequent experiments explore sensitivity to the physical setup and compare two additional baseline conditions for all 10 CIFAR classes. Some physical targets are more susceptible to perturbation, simple attacks show nearly equivalent success, and 6 of the 10 classes were disrupted by light.

Introduction

Machine learning models are vulnerable to adversarial attacks that make small but targeted modifications to inputs to cause misclassification. The research around adversarial attacks on deep learning systems has grown significantly since (Szegedy et al. 2013) demonstrated their intriguing properties. The scope and limitations of such attacks are an active area of research in the academic community. Most of the research has focused on purely digital manipulation. Recently, researchers have developed techniques that alter or manipulate physical objects to fool classifiers, which could pose a much greater real world threat.
Related Research

Researchers have proposed many theories about the cause of model vulnerabilities. Evidence suggests that adversarial samples lie close to the decision boundary in the low dimensional manifold representing high dimensional data. Adversarial manipulation in the high dimension is often imperceptible to humans and can shift the low dimensional representation to cross the decision boundary (Feinman et al. 2017). Many approaches are available to perform this manipulation if the attacker has access to the defender's classifier. Furthermore, adversarial examples have empirically been shown to transfer between different classifier types (Papernot, McDaniel, and Goodfellow 2016; Szegedy et al. 2013). This enhances the attacker's potential capability when there is no access to the defender's classifier.

It is difficult for defenses to keep pace with attacks, and the advantage lies with the adversary. This was highlighted when seven of the eight white box defenses announced at the prestigious ICLR 2018 were defeated within a week of publication (Athalye, Carlini, and Wagner 2018).

Researchers have successfully demonstrated physical world attacks against deep learning classifiers. Some of the first physical attacks were demonstrated by printing an adversarial example, photographing the printed image, and verifying that the adversarial attack remained (Kurakin, Goodfellow, and Bengio 2016). (Sharif et al. 2016) demonstrated printed eyeglass frames that thwart facial recognition systems and fully avoid face detection by the Viola-Jones object detection algorithm. It has also been noted that near infrared light can be used to evade face detection (Yamada, Gohshi, and Echizen 2013). Our work is different because we leverage dynamic generation methods that use real world feedback when learning the patterns of light to project.

Putting aside adversarial attacks, most image classifiers are not inherently invariant to object scale, translation, or rotation. Notable exceptions are (Cohen and Welling 2014), which attempts to learn object recognition by construction of parts, and (Qi et al. 2017), which uses a 3D point cloud representation for object classification. To some degree, this invariance can be learned from training data if it has intentionally been designed to address this gap. For example, the early work by (LeCun, Huang, and Bottou 2004) was evaluated with the NORB dataset, which was systematically collected to assess pose, lighting, and rotation of 3D objects.

Simulating scale, translation, and rotation of 2D images is conducive to experiment automation, and many recent advances in rotational invariance, such as Spatial Transformer Networks (Jaderberg et al. 2015), use this framework to evaluate robustness to these properties. However, further research is needed to validate the ability of this simulated rotational invariance to transfer to real world rotation of 3D figures. We emphasize the need for invariant models because it is impossible to disambiguate the success of an attack when it can only be validated with a weak model.

Maintaining an adversarial attack under a range of pose or lighting conditions may prove to be the most difficult aspect of this task. Some preliminary research suggests this is possible and has demonstrated two toy examples in the physical world (Athalye and Sutskever 2017). They introduce an Expectation over Transformation (EoT) method for differentiating texture patterns through a 3D renderer to produce an adversarial object. An additional demonstration of physical attack is to introduce an adversarial patch to the physical scene, which is invariant to location, rotation, and scale, and causes a specific misclassification (Brown et al. 2017).

Experimental Setup and Results

We constructed a test environment to perform light based adversarial attacks and collect data in an office environment with minimal lighting control. Our attacks were conducted against 2D and 3D target objects placed in the scene. We used a projector to project light onto the target and a common web camera to capture the scene. For the 2D and initial 3D experiments, the projector was a Casio XJ-A257 and the camera was a Logitech C930e. During the second phase of 3D experiments, we used an Epson VS250 projector, a Logitech C615 HD camera, and an Altura HD-ND8 neutral density filter to control the light intensity of the projector.
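As an illustrative sketch (not the exact pipeline used in these experiments), the capture-and-classify step at the heart of this setup could look like the following. Here `model` is a stand-in for a CIFAR-10 classifier such as the ResNet38 used in this work, OpenCV is assumed for webcam capture, and the captured frame is simply downsampled to the 32x32 CIFAR resolution; any input normalization the classifier expects is omitted.

```python
# Minimal sketch of the capture-and-classify loop (assumptions: a CIFAR-style
# classifier `model`, OpenCV webcam capture, 32x32 downsampling as in CIFAR-10).
import cv2
import torch
import torch.nn.functional as F

CIFAR_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                 "dog", "frog", "horse", "ship", "truck"]

def capture_frame(cam_index=0):
    """Grab one frame from the web camera as an RGB array."""
    cap = cv2.VideoCapture(cam_index)
    ok, frame_bgr = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera capture failed")
    return cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)

def classify_scene(model, frame_rgb):
    """Downsample the captured scene to 32x32 and return class probabilities.
    (Any mean/std normalization the trained model expects is omitted here.)"""
    small = cv2.resize(frame_rgb, (32, 32), interpolation=cv2.INTER_AREA)
    x = torch.from_numpy(small).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1).squeeze(0)
    return {c: float(p) for c, p in zip(CIFAR_CLASSES, probs)}
```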
2D Presentation

For the 2D scene, we chose a random image (horse) from the CIFAR-10 dataset to be attacked. The image was printed and secured to the wall in front of the camera and projector. Following a methodology similar to earlier work on single pixel attacks (Su, Vargas, and Kouichi 2017), we use differential evolution (DE) to optimize a light based attack that causes misclassification. Differential evolution is a heuristic global optimization strategy similar to genetic algorithms, where the algorithm maintains a population of candidate solutions and selects a small number (potentially one) for further rounds of modification and refinement. We projected a digital black 32x32 square containing a single pixel with a variable location and RGB value. Because projectors cannot project black (the absence of light), the projector adjusted the black pixels to present the illusion of a black background. This adjustment is impacted somewhat by the RGB value of the single pixel being projected. Each iteration of the differential evolution was projected, captured, and input to a standard ResNet38 for classification of the image captured by the camera. Though only one pixel was modified in the digital attack pattern, because of the distance between the projector and object, a larger area in the captured scene and many input pixels to the camera are modified. The original and attacked scenes are shown in Figure 1.

Figure 1: Images demonstrating light based attack on 2D physical presentation. (a) The 2D scene without adversarial attack. (b) The 2D scene with adversarial attack.

Through this attack, the probability of horse was decreased from 98% to 22%.
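The differential evolution loop described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in these experiments: `project_fullscreen` is a hypothetical helper for driving the projector output, `capture_frame` and `classify_scene` are the helpers sketched earlier, and SciPy's `differential_evolution` stands in for the DE optimizer (its `popsize` is a per-parameter multiplier, so `popsize=10` over five parameters corresponds to roughly a 50-candidate population).

```python
# Illustrative sketch of the DE-driven single pixel light attack (not the
# authors' implementation). The search variables are the pixel location in
# the projected 32x32 grid and its RGB color.
import cv2
import numpy as np
from scipy.optimize import differential_evolution

def make_pattern(x, y, r, g, b, background=0):
    """A 32x32 background with one pixel of variable location and color."""
    pattern = np.full((32, 32, 3), background, dtype=np.uint8)
    pattern[int(y), int(x)] = (int(r), int(g), int(b))
    return pattern

def project_fullscreen(pattern_rgb, window="projector", pause_ms=200):
    """Hypothetical helper: upscale the pattern, show it full screen on the
    projector output, then pause so the camera sees the new projection."""
    big = cv2.resize(pattern_rgb, (1024, 768), interpolation=cv2.INTER_NEAREST)
    cv2.namedWindow(window, cv2.WND_PROP_FULLSCREEN)
    cv2.setWindowProperty(window, cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)
    cv2.imshow(window, cv2.cvtColor(big, cv2.COLOR_RGB2BGR))
    cv2.waitKey(pause_ms)

def attack_objective(params, model, true_class):
    """Project a candidate pixel, capture the scene, and return the true-class
    probability; DE minimizes this, i.e. it seeks the most damaging projection."""
    project_fullscreen(make_pattern(*params))
    return classify_scene(model, capture_frame())[true_class]

def run_attack(model, true_class="horse"):
    # Bounds: pixel row/column in the 32x32 grid plus its RGB color.
    bounds = [(0, 31), (0, 31), (0, 255), (0, 255), (0, 255)]
    # popsize=10 gives 10 * 5 parameters = 50 candidates per generation, and
    # maxiter=4 loosely echoes the 4 evolution phases reported later.
    return differential_evolution(attack_objective, bounds,
                                  args=(model, true_class),
                                  popsize=10, maxiter=4, polish=False)
```

Because the objective returns the true-class probability, minimizing it is a non-targeted attack; a targeted variant could instead return the negative probability of a chosen target class.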
3D Presentation

To demonstrate the potential for light based attacks, we extended the 2D methodology to a 3D scene in two experimental phases. First, we placed a toy car in the field of view of the web camera to capture the scene. To perform the attack, the projector iteratively applies the same adversarial noise procedure to the 3D physical scene, and the same ResNet38 model is used for evaluation. The object probabilities for the original scene were 89% automobile and 11% truck. The attacked scene probabilities were 43% automobile and 57% truck.

Figure 2: Images demonstrating light based attack on 3D physical presentation. (a) The 3D scene without any adversarial attack. (b) The 3D scene with adversarial attack.

Figure 3: Downsampled images demonstrating light based attack on 3D physical representation. (a) Downsampled image without any adversarial attack. (b) Downsampled image with adversarial attack.

The second phase of experiments was designed to improve the repeatability and confidence of the initial demonstration. Results are expanded to evaluate all 10 CIFAR classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. The figurines used for each of these classes are shown in Figure 4a. The yellow car in phase 1 was not available and was replaced with a red car in phase 2.

Figure 4: Experiment setup and figurines for the second phase experiments with the 3D presentation. (a) The toy figurines used to represent the CIFAR classes. (b) Physical setup demonstrating the relative position of projector, camera, object, and lighting control.

Rotation invariance is important for interpreting the presented experimental setup. This impacts our data collection because we observed that, in a baseline condition with no added light, the distance to the camera and the object orientation yielded highly variable classification results. We tested four experimental conditions: ambient light, white light from the projector, white light with a randomly located pixel in the 32x32 grid, and a differential evolution process controlling the color and location of one pixel in a 32x32 white grid (the projected patterns are sketched below).

We observed classification variability in the physical scene when no modifications were applied. For this reason we introduced some lighting controls, which observationally provided a significantly more stable baseline classification. Three physical modifications were made. The projected background color was changed from black to white to provide more uniformity to the scene. We used a foam block to minimize stray reflections caused by the projector. Additionally, we used a neutral density filter to scale the light intensity. To verify stability, we collected twenty image captures of each test condition, and 200 for differential evolution (a 50-sample population and 4 evolution phases).
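The white light and random pixel conditions listed above are straightforward to generate; the sketch below shows one plausible construction (illustrative only; beyond the 32x32 grid, the exact resolution and randomization used in the experiments are not specified here).

```python
# Sketch of the projected test patterns for the second phase conditions
# (the ambient light condition projects nothing). Illustrative only.
import numpy as np

def white_pattern():
    """Uniform white 32x32 grid: the plain white light condition."""
    return np.full((32, 32, 3), 255, dtype=np.uint8)

def random_pixel_pattern(rng=None):
    """White 32x32 grid with one randomly located, randomly colored pixel."""
    if rng is None:
        rng = np.random.default_rng()
    pattern = white_pattern()
    col, row = rng.integers(0, 32, size=2)
    pattern[row, col] = rng.integers(0, 256, size=3)
    return pattern

# The differential evolution condition reuses make_pattern() from the 2D attack
# sketch, but over a white background: make_pattern(x, y, r, g, b, background=255),
# with (x, y, r, g, b) chosen by the DE search.
```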
Reproducibility of the physical placement of each object in the scene is imprecise, so each test condition was collected in sequence without any disturbance (besides light). An unrecorded calibration phase was used to reposition the object for a maximum baseline classification score before the recorded baseline and light projected data were collected. For each class and test condition, we report the mean, median, standard deviation, variance, minimum, maximum, ∆mean, and ∆median. The ∆mean and ∆median are the reduction in probability score for the given attack type relative to baseline; larger ∆ values represent a more powerful decrease in the true class probability.
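As a concrete reading of these statistics, the summary reported in Table 1 could be reproduced from per-capture scores roughly as follows. This is a minimal sketch assuming a hypothetical DataFrame `df` with one row per captured image and columns `class`, `condition`, and `true_class_prob`; the column and condition names are placeholders rather than the actual data format used.

```python
# Sketch of the per-class, per-condition summary behind Table 1. Assumes a
# DataFrame `df` with one row per captured image and columns:
#   class, condition, true_class_prob
import pandas as pd

def summarize(df):
    stats = (df.groupby(["class", "condition"])["true_class_prob"]
               .agg(["mean", "median", "std", "var", "min", "max"]))
    # Baseline mean/median per class, used for the delta columns.
    base = (stats.xs("Baseline", level="condition")[["mean", "median"]]
                 .rename(columns={"mean": "base_mean", "median": "base_median"}))
    stats = stats.join(base, on="class")
    # Delta = reduction in true-class probability relative to baseline;
    # larger values indicate a more powerful attack.
    stats["delta_mean"] = stats["base_mean"] - stats["mean"]
    stats["delta_median"] = stats["base_median"] - stats["median"]
    return stats.drop(columns=["base_mean", "base_median"]).round(3)
```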
All scores are reported in Table 1. Interpreting the table yields one immediate observation: some examples (Automobile, Bird, Horse, Ship) are invariant to the light attack, consistently being identified as the true class at 100% (within rounding error), while other classes (Airplane, Cat, Deer, Dog, Frog, and Truck) have varying degrees of susceptibility. It is unclear whether these differences are inherent in the classes themselves or specific to the particular figurines we chose. As one might expect with a research classifier, there is a high degree of variability based on the particular example. We increased the complexity of the light attack from pure white light, to a random square, to differential evolution, to assess whether there was something unique in the more sophisticated attack, or whether it was merely the addition of light, or a pattern, that was causing the observed decrease in classification. In many cases, the simple addition of white light is as effective as the other attacks. For example, the mean airplane score was decreased from 1.000 to 0.151 with only the addition of white light. The corresponding trials with random and differential evolution light patterns yielded only slightly stronger attacks, with mean scores of 0.113 and 0.133 respectively. However, the decline is noteworthy, independent of sophistication.

Discussion

Physical attacks on machine learning systems could be applied in a wide range of security domains. The literature has primarily discussed the safety of road signs and autonomous driving (Eykholt et al. 2017; Chen et al. 2018); however, other security applications may also be impacted. An adversary may be trying to hide themselves or physical ties to illegal activities to evade law enforcement (e.g. knives/weapons, contraband, narcotics manufacturing, etc.). Any AI deployed for law-enforcement applications needs to be robust in an adversarial environment where physical obfuscation could be employed. Light based attacks:
• Can perform targeted and non-targeted attacks.
• Do not modify the physical object in a permanent way.
• Can be a transient effect occurring at specified times.
This work aims to be a first step towards understanding the abilities and limitations of such physical attacks. We picked a relatively easy first target to verify the possibility and plan to extend this to more complex physical scenarios and classification models.

We chose to attack the CIFAR-10 framework in a manner similar to what was demonstrated in the original single pixel attack (Su, Vargas, and Kouichi 2017). This framework is an easier target because it is a low resolution, low parameter model. To assess the robustness of stronger models, a ResNet50 classifier trained on ImageNet was also used to evaluate all of the collected images. Because of a lack of corresponding true class identification, scores are not reported, but it was observed that the top-1 class prediction was shifted by the addition of light based attacks.

There is also a closed world assumption of 10 relatively dissimilar classes, where the probability of all classes sums to one. When a misclassification occurs, it tends to be more outlandish than it could otherwise be. For example, rose and tulip might be a more forgiving mistake than frog and airplane, but in the CIFAR closed world framework, the model is limited to the 10 known classes.

In our attack on the 3D presentation, the true class was correctly identified as car when no attack was present. By applying the adversarial light attack, we were able to decrease the confidence of car from 89% to 43%, and instead predict truck with 57% probability. We would not identify this as a 3D attack because we had a fixed orientation between the camera, projector, and object. In this example, the single square attack is visually perceptible but transient. However, the notion of human perception is not as simple as an L∞ distance in pixel space. This is highlighted by the fact that consecutive video frames can be significantly misclassified by top performing image classification systems (Zheng et al. 2016). Images that are imperceptibly different can have a large distance in pixel or feature space, and images that are perceptually different can be close.

A key topic that needs further understanding is the cause of the extreme variability in class identification. One potential explanation is the degree of self similarity within a class, and training data bias. For example, the horse images in the training data are potentially all self similar and also closely match the example figurine. The variation between different types of horses is likely smaller than the visual difference between different breeds of dogs.

Another possible explanation is the scale or percentage of the scene that the object occupies. Most of the classes which were susceptible to attack were relatively small. The notable exception was the truck, which was actually the largest figure used for data collection, yet was still susceptible to misclassification errors with the addition of light.

There are several important constraints present when crafting a light based physical attack that are unconstrained in a digital attack. Specifically, light is always additive noise, and turning a dark color to white with the addition of light is impossible. The angle of projection and the texture of the scene may impact the colors reflected to the camera. The camera itself will introduce color balance changes as it adjusts to the adversarial addition of light. Even a fully manual camera will always have CCD shot noise, a function of shutter speed and temperature, that could influence the success or failure of a light based attack. The projected pixel was not constrained to overlap the target object, and would appear in the background. Empirically, these single pixel projections onto the background of an image could significantly change classifier predictions.
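A toy reflectance model, not part of this work, can make the additive constraint concrete: if the camera sees the ambient appearance plus the projected light scaled by the surface reflectance (clipped to the sensor range), then a dark, low-reflectance surface absorbs most of the projection and cannot be driven to white, and added light can never darken a region.

```python
# Toy model of the additive-light constraint (illustrative, not from the paper).
import numpy as np

def simulate_capture(ambient, reflectance, projected):
    """ambient, reflectance, projected: float arrays in [0, 1].
    The projector can only add light, scaled by how reflective the surface is."""
    return np.clip(ambient + reflectance * projected, 0.0, 1.0)

white_light = np.ones((4, 4, 3))
dark_surface = simulate_capture(ambient=np.full((4, 4, 3), 0.05),
                                reflectance=np.full((4, 4, 3), 0.1),
                                projected=white_light)
print(dark_surface.max())  # ~0.15: slightly brighter, but nowhere near white
```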
tion and more robust classifiers and more subtle manip- Qi, C. R.; Su, H.; Mo, K.; and Guibas, L. J. 2017. Pointnet: ulations. We believe that more targeted optimization ap- Deep learning on point sets for 3d classification and seg- proaches that initially focus on sensitive image areas will mentation. Proc. Computer Vision and Pattern Recognition likely lead to faster identification of successful attacks. We (CVPR), IEEE 1(2):4. expect light based attacks could use more complex projected Sharif, M.; Bhagavatula, S.; Bauer, L.; and Reiter, M. K. textures and take advantage of 3D geometry. Presented re- 2016. Accessorize to a crime: Real and stealthy attacks on sults clearly show light has the potential to be another av- state-of-the-art face recognition. In Proceedings of the 2016 enue of adversarial attack in the physical domain. ACM SIGSAC Conference on Computer and Communica- tions Security, 1528–1540. ACM. Acknowledgments Su, J.; Vargas, D. V.; and Kouichi, S. 2017. One pixel The research described in this paper is part of the Analysis in attack for fooling deep neural networks. arXiv preprint Motion Initiative at Pacific Northwest National Laboratory; arXiv:1710.08864. conducted under the Laboratory Directed Research and Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, Development Program at PNNL, a multi-program national D.; Goodfellow, I.; and Fergus, R. 2013. Intriguing proper- laboratory operated by Battelle for the U.S. Department ties of neural networks. arXiv preprint arXiv:1312.6199. of Energy. The authors are especially grateful to Mark Greaves, Artem Yankov, Sean Zabriskie, Michael Henry, Yamada, T.; Gohshi, S.; and Echizen, I. 2013. Privacy visor: Jeremiah Rounds, Court Corley, Nathan Hodas, Will Koella Method for preventing face image detection by using differ- and our Quickstarter supporters. ences in human and device sensitivity. In IFIP International Conference on Communications and Multimedia Security, 152–161. Springer. Zheng, S.; Song, Y.; Leung, T.; and Goodfellow, I. 2016. References Improving the Robustness of Deep Neural Networks via Sta- bility Training. Athalye, A., and Sutskever, I. 2017. Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397. 
CIFAR Class  Experiment Condition  Mean   Median  SD    Var   Min    Max    ∆Mean  ∆Median
Airplane     Baseline              1.000  1.000   .000  .000  1.000  1.000  .000   .000
             White Light           .151   .101    .198  .039  .017   .997   .849   .899
             Random                .114   .105    .088  .008  .022   .445   .886   .895
             Diff Evolution        .133   .112    .087  .007  .014   .459   .867   .888
Automobile   Baseline              1.000  1.000   .000  .000  1.000  1.000  .000   .000
             White Light           1.000  1.000   .000  .000  .999   1.000  .000   .000
             Random                1.000  1.000   .000  .000  .999   1.000  .000   .000
             Diff Evolution        1.000  1.000   .000  .000  1.000  1.000  .000   .000
Bird         Baseline              1.000  1.000   .000  .000  1.000  1.000  .000   .000
             White Light           1.000  1.000   .002  .000  .993   1.000  .000   .000
             Random                1.000  1.000   .000  .000  1.000  1.000  .000   .000
             Diff Evolution        1.000  1.000   .000  .000  .999   1.000  .000   .000
Cat          Baseline              .990   .991    .004  .000  .979   .996   .000   .000
             White Light           .009   .008    .005  .000  .000   .020   .981   .983
             Random                .011   .007    .012  .000  .001   .047   .979   .984
             Diff Evolution        .023   .017    .019  .000  .002   .124   .967   .974
Deer         Baseline              .999   .999    .000  .000  .999   1.000  .000   .000
             White Light           .516   .516    .145  .021  .242   .997   .483   .483
             Random                .545   .507    .155  .024  .327   .871   .454   .492
             Diff Evolution        .473   .467    .130  .017  .144   .829   .526   .532
Dog          Baseline              .993   .993    .003  .000  .986   .996   .000   .000
             White Light           .512   .499    .088  .008  .390   .695   .481   .494
             Random                .482   .497    .123  .015  .136   .753   .511   .496
             Diff Evolution        .386   .388    .088  .008  .123   .601   .606   .605
Frog         Baseline              .888   .888    .025  .001  .842   .933   .000   .000
             White Light           .008   .008    .003  .000  .000   .015   .881   .880
             Random                .030   .011    .076  .006  .004   .360   .858   .877
             Diff Evolution        .071   .038    .093  .009  .005   .576   .817   .849
Horse        Baseline              1.000  1.000   .000  .000  1.000  1.000  .000   .000
             White Light           .999   1.000   .001  .000  .993   1.000  .000   .000
             Random                1.000  1.000   .000  .000  1.000  1.000  .000   .000
             Diff Evolution        1.000  1.000   .000  .000  1.000  1.000  .000   .000
Ship         Baseline              1.000  1.000   .000  .000  1.000  1.000  .000   .000
             White Light           1.000  1.000   .000  .000  1.000  1.000  .000   .000
             Random                1.000  1.000   .000  .000  1.000  1.000  .000   .000
             Diff Evolution        1.000  1.000   .000  .000  1.000  1.000  .000   .000
Truck        Baseline              1.000  1.000   .000  .000  1.000  1.000  .000   .000
             White Light           .832   .832    .052  .003  .729   1.000  .168   .168
             Random                .818   .819    .072  .005  .634   .970   .182   .180
             Diff Evolution        .826   .839    .088  .008  .507   .949   .174   .161

Table 1: Classification statistics (probability of the true class) for baseline and attacked CIFAR figures.