Robustness as Inherent Property of Datapoints∗

Andrei Ilie, Marius Popescu, Alin Stefanescu
University of Bucharest
{cilie, marius.popescu, alin}@fmi.unibuc.ro

∗ Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Characterizing how effective a machine learning algorithm is while being trained and tested on slightly different data is a widespread concern. The property of models which perform well under this general framework is commonly known as robustness. We propose a class of model-agnostic empirical robustness measures for image classification tasks. To any random image perturbation scheme, we attach a robustness measure that empirically checks how easy it is to perturb a labelled image and cause the model to misclassify it. We also introduce a methodology for training more robust models using the information gained about the empirical robustness measure of the training set. We keep only the fraction of datapoints that are robust according to our robustness measure and retrain the model on them. We validate that the robustness of the model increases by measuring its empirical robustness on test data.

1 Introduction

During the last decade, the field of machine learning has made considerable advances in many tasks, such as image classification, object detection, machine translation, or question answering, with deep neural networks easily becoming the state-of-the-art approaches [Touvron et al., 2020; Zhang et al., 2020; Edunov et al., 2018]. The main priority has been on the capacity of the models to perform well on the test set of some well-known datasets (MNIST, CIFAR, SQuAD) [LeCun and Cortes, 2010; Krizhevsky, 2009; Rajpurkar et al., 2016]. However, the training and the test sets are usually generated from the same underlying distribution, leaving the model's performance under distribution shifts unknown. Given that machine learning techniques are being employed in sensitive tasks, such as self-driving cars and healthcare, robustness should become a crucial metric, taken into consideration together with accuracy when evaluating the performance of models.

We will mainly focus on safety and robustness for image classification tasks, but the work can be easily extended to other topics.

Distribution shifts, which affect the performance of machine learning systems, can occur mainly for two reasons. The first reason, adversarial attacks [Wiyatno et al., 2019; Szegedy et al., 2014], has been receiving growing attention over the past years. Adversarial attacks are "hidden messages" [Wiyatno et al., 2019] added on top of images which are nearly imperceptible to the human eye, but which cause the model to fail, in other words creating "machine illusions".

The second reason, covariate shift [Shimodaira, 2000], is a natural change in the data distribution. For example, imagine an autonomous car model trained solely on rainy and sunny conditions in a city where it has not snowed over the past five years. However, one day it starts snowing, and the image recognition system of the autonomous car could have serious issues in identifying objects and road signs because of completely different lighting conditions.

While improving models to be less exposed to known adversarial attacks is very important, one has to keep in mind that this is, after all, an adversarial game, where the attacker and the security researcher keep alternately coming up with better strategies. For example, the adversarial attack strategy Fast Gradient Sign Method [Szegedy et al., 2014] can be mitigated by adversarial training [Szegedy et al., 2014], which can in turn be bypassed by R+FGSM [Tramèr et al., 2018]. The defense methods against adversarial attacks seek to make the model robust with respect to certain adversarial points in the neighbourhood of unaltered images.

Therefore, one is prompted to consider a more general robustness framework, in which the interest lies in the model not making a mistake anywhere in the neighbourhood of an image¹. There exist various tools that can obtain robustness guarantees for deep neural networks [Ruan et al., 2018; Tjeng et al., 2019], but most of them are very dependent on the model's architecture, either not being able to scale to deeper networks, or only working with certain kinds of layers.

¹ For example, the neighbourhood could be specified by some metric ball around the image.
Figure 1: Images deemed as robust by our simple CNN on the first row against images deemed as not robust on the second row. The images on the second row were classified correctly by the model M before applying the random perturbation process.

We propose a model-agnostic² empirical method for estimating the robustness of a model. This estimation of a model near an image X is done by iteratively sampling datapoints close to it, according to a specified random scheme³. It feeds each of the sampled datapoints to the model and stops either when the model classifies them incorrectly, or when a maximum number of steps has been reached. The number of such sampling steps serves as a proxy for the local robustness around image X. Intuitively, the easier it is to perturb the label of X by sampling around it, the less robust the model is around it. We use this method for estimating the robustness of the model on entire datasets, by locally checking the model's robustness around each datapoint and combining the results.

We also claim that the robustness of the model is correlated with the inherent robustness of the images with respect to the classification task. Therefore, the robustness of a model depends both on the robustness of the architecture itself and on the inherent robustness of the datapoints it has been trained on.

We believe that training a model on certain correctly labelled images can lead towards highly unnatural borders between classes. These might be datapoints that we would rather misclassify than include in the model at an additional high cost in robustness. We test this hypothesis and indeed obtain a more robust model by discarding the not-robust images from the training process.

Our main technical contributions are the model-agnostic empirical robustness measure and the training methodology based on robust images.

An important general direction we want to shed light on is that images from classification tasks should be seen as carrying an inherent level of robustness, which could be estimated and exploited.

The model-agnosticism makes it an easy plug-in method in any classification task, and it can easily be introduced as a baseline check for machine learning systems.

² The method does not need any knowledge about the architecture of the model. Note that the model does not necessarily have to be a deep neural network.
³ The random scheme should not alter the underlying true class of the image that we sampled around. Intuitively, the samples should be classified by a human in the same way as the original image is.

2 Randomized Perturbation Robustness

2.1 Definition

We propose a class of empirical robustness measures RPR (Randomized Perturbation Robustness) for image classification tasks, which are model-agnostic. Let R be a random image perturbation scheme.⁴ The empirical robustness RPR(R) of a model M with respect to a datapoint x belonging to class y is the minimum between MAX_STEPS and the expected number of retrying steps of applying R to the original x such that M does not classify R(x) as y.

If the empirical robustness of M with respect to (x, y) is MAX_STEPS, we stop and deem x as robust; otherwise as not-robust.

Note that the random perturbations of an image are not applied on top of previous perturbation attempts, but rather on the original image. This perturbation process is repeated until the conditions above are fulfilled.

⁴ For example, Gaussian noise, replacing at most k pixels of an image, blurring, etc.
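As a concrete illustration of this definition, the sketch below (our own, not code from the paper) implements the per-datapoint check as a simple rejection loop; `predict` is assumed to map an image to a predicted class label and `perturb` stands for one draw from the random scheme R, both names being hypothetical.

```python
from typing import Callable
import numpy as np


def rpr_steps(predict: Callable[[np.ndarray], int],
              perturb: Callable[[np.ndarray], np.ndarray],
              x: np.ndarray, y: int, max_steps: int) -> int:
    """One Monte Carlo run of the RPR check for a labelled image (x, y).

    Each attempt applies the random scheme to the ORIGINAL image x
    (perturbations are never stacked) and queries the model; the loop
    stops at the first misclassification or after max_steps attempts.
    """
    for step in range(1, max_steps + 1):
        if predict(perturb(x)) != y:
            return step        # the model was fooled after `step` attempts
    return max_steps           # x survived every attempt


def is_robust(predict, perturb, x, y, max_steps: int) -> bool:
    """A datapoint is deemed robust iff it survives all max_steps attempts."""
    return rpr_steps(predict, perturb, x, y, max_steps) == max_steps
```

Averaging `rpr_steps` over several runs would estimate the expected number of retrying steps in the definition; a single run already suffices for the robust / not-robust decision used in the rest of the paper.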
2.2 Empirical robustness on datapoints and on entire datasets

The introduced framework is a simple empirical way of assessing a model's robustness near an image. It is suitable under various setups, such as a random image perturbation scheme that adds weather conditions⁵ in the autonomous car situation.

⁵ Applying snow, fog, rain effects, etc.

We propose two use cases based on the empirical robustness measure introduced above: one estimating the model robustness on an entire (test) dataset, and another one training a model only on the images that are deemed as robust, in order to obtain a more robust model.

The first use case, estimating the robustness of the model on an entire dataset, is done by applying the Randomized Perturbation Robustness method described above to each datapoint and computing the percentage of images that are deemed as robust.

The second use case is based on our claim that the robustness of the model with respect to a datapoint can be seen, to some extent, as the inherent robustness of the datapoint with respect to the classification task. This allows us to retrain the model using only images from the train set that are deemed as robust by our empirical measure, giving us a more robust model. This happens because the model learns only from the robust images, which encourages it to infer simpler, more natural class separators. We claim that the images that are deemed as not robust by our method can generally be seen as edge cases, causing the model to infer irregular separators.
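For the first use case, the dataset-level estimate is just the fraction of datapoints deemed robust. A minimal sketch, reusing the hypothetical `is_robust` helper from the previous snippet:

```python
def dataset_robustness(predict, perturb, images, labels, max_steps: int) -> float:
    """Fraction of (image, label) pairs deemed robust under the given scheme."""
    robust_count = sum(
        is_robust(predict, perturb, x, y, max_steps)   # True counts as 1
        for x, y in zip(images, labels)
    )
    return robust_count / len(labels)
```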
2.3 Methodology and experiments

We experiment using a CNN architecture for classifying images from MNIST. As this classification task is not complex, we use a very simple model⁶, which achieves a test accuracy of only 98.85%, to showcase the main ideas we introduce. The randomized image perturbation scheme we use randomly alters at most as many pixels as the square root of the number of image pixels (28 in our case). We use MAX_STEPS = 250 in our experiments.

⁶ We use two small convolutional layers, one max pooling layer, and a fully connected layer with softmax activation. We train the model with ADAM using the default hyperparameters.
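The sketch below shows one possible reading of this perturbation scheme for 28x28 MNIST images: pick at most sqrt(784) = 28 pixel positions at random and overwrite them with fresh random intensities. The replacement distribution is our assumption; the paper only specifies the pixel budget.

```python
import numpy as np


def perturb_pixels(x: np.ndarray, rng=None) -> np.ndarray:
    """Randomly alter at most sqrt(#pixels) pixels of an image (28 for MNIST).

    Assumes pixel intensities in [0, 1]; the new values are drawn uniformly
    at random, which is an illustrative choice, not the authors' stated one.
    """
    rng = rng or np.random.default_rng()
    out = x.copy()
    budget = int(np.sqrt(x.size))                 # 28 pixels for a 28x28 image
    k = int(rng.integers(1, budget + 1))          # alter at most `budget` pixels
    idx = rng.choice(x.size, size=k, replace=False)
    flat = out.reshape(-1)                        # view on the copied image
    flat[idx] = rng.random(k)                     # fresh random intensities
    return out
```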
We show in Figure 2 an image that is classified correctly by M against its random perturbation under the scheme described above, which is incorrectly classified by M.

Figure 2: The image on the left is labelled as 6 by M. The image on the right is obtained by perturbing at most 28 pixels of the left one, and it is labelled as 2 by M. The perturbed image was obtained after 47 random perturbation steps of altering at most 28 pixels. None of the previous 46 random perturbations were able to confuse the model.

We compare in Figure 1 robust and not robust images which, without any perturbation, are correctly classified by M. These were randomly chosen and give some intuition about what a robust image looks like compared to one that is not robust.

The process we described for determining the empirical robustness is very similar, when seen as a function of MAX_STEPS, to a learning curve. The rate at which not-robust images are discovered eventually flattens, which allows us to use it together with some early-stopping mechanism.

In Figure 3 we can see how the ratio of test images that are still robust flattens as a function of MAX_STEPS. We obtain a ratio of 0.2957 images from the test set which can withstand 250 random perturbations, which is a surprisingly small fraction considering the simple noising we apply. This stands as straightforward empirical evidence that the simple CNN architecture we used is not robust.

Figure 3: The ratio of images from the test set that are still robust as a function of the number of perturbation iterations that have been applied. The initial model M is used.

Figure 4: Distribution of training images that are deemed as robust under model M. Images labelled as 7 seem to be inherently more robust, while images labelled as 1, 8, and 9 can easily be corrupted by random perturbations.

In order to achieve a more robust network, we apply the same procedure of deeming an image as robust or not robust on the MNIST train set, using the model M, which was trained on exactly this data. 71.28% of the training images are deemed as robust; however, the distribution across classes is far from uniform, as seen in Figure 4. Therefore, we randomly sample 1500 datapoints from each class of the robust training images, such that the training set does not have a class bias, and proceed to retrain the simple CNN architecture solely on this data. Let MR be the model trained on this data, which amounts to only 25% of the MNIST training set. We encounter a drop of approximately 2% in test accuracy, obtaining a score of 96.92%, which is to be expected considering the relatively small amount of training data.

The model MR is much more robust on the test set, obtaining a ratio of 0.5101 robust images, cf. Figure 5, as compared to the robustness of only 0.2957 for the original M. This stands as evidence that the robust nature of the selected training images led to a more robust model.

Figure 5: The ratio of images from the test set that are still robust as a function of the number of perturbation iterations that have been applied. Here, the model MR is used. There is a clear improvement in robustness when compared to the model M.
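A sketch of the retraining recipe above, under stated assumptions: `is_robust` and `perturb_pixels` are the hypothetical helpers from the earlier snippets, `train_model` stands for whatever routine trains the simple CNN, and only the robustness filter and the class-balanced sampling of 1500 images per class follow the text.

```python
import numpy as np


def select_balanced_robust_subset(predict, perturb, images, labels,
                                  per_class=1500, max_steps=250, seed=0):
    """Keep only robust training images and sample `per_class` of them per class.

    Balancing removes the class bias visible in the distribution of robust
    images; raises if a class has fewer than `per_class` robust examples.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        robust_idx = [i for i in np.flatnonzero(labels == cls)
                      if is_robust(predict, perturb, images[i], labels[i], max_steps)]
        chosen.extend(rng.choice(robust_idx, size=per_class, replace=False))
    chosen = np.asarray(chosen)
    return images[chosen], labels[chosen]


# Hypothetical usage: retrain the same architecture on the robust subset only.
# x_r, y_r = select_balanced_robust_subset(model.predict, perturb_pixels,
#                                          x_train, y_train)
# model_r = train_model(x_r, y_r)
```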
3 Conclusions and future work

The simple empirical robustness checking method we introduce opens the way towards building fast, model-agnostic tools to estimate the robustness of machine learning models. This method can be easily embedded as a base check in machine learning systems.

One of the main takeaways is that robustness can be seen as an inherent property of the images with respect to the classification task. The robustness of models depends both on their architecture and on the robustness of the data they are trained on. This can be exploited in various ways, such as the training methodology we proposed, which significantly improves the robustness of the model.

Other interesting applications could include using Generative Adversarial Networks (GANs) to augment the robust training data from the training methodology we proposed. Data augmentation with GANs has successfully been used to improve the quality of data and the accuracy of models [Antoniou et al., 2017], and we believe that it could be used to generate diverse robust images as well. These could contribute to increasing the accuracy of robust models trained under our methodology.

Another area of further investigation is checking how our empirical robustness measure relates to the formal verification tools that obtain exact robustness guarantees. Note that this kind of experiment is not possible for every model, as existing formal verification tools are limited to specific machine learning architectures or do not scale well with complex models.

References

[Antoniou et al., 2017] Antreas Antoniou, Amos J. Storkey, and Harrison Edwards. Data augmentation generative adversarial networks. CoRR, abs/1711.04340, 2017.

[Edunov et al., 2018] Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. Understanding back-translation at scale. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun'ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 489-500. Association for Computational Linguistics, 2018.

[Krizhevsky, 2009] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, chapter 3. Technical Report TR-2009, University of Toronto, 2009.

[LeCun and Cortes, 2010] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/, 2010.

[Rajpurkar et al., 2016] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In Jian Su, Xavier Carreras, and Kevin Duh, editors, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2383-2392. The Association for Computational Linguistics, 2016.

[Ruan et al., 2018] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. Reachability analysis of deep neural networks with provable guarantees. In Jérôme Lang, editor, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 2651-2659. ijcai.org, 2018.

[Shimodaira, 2000] Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90:227-244, October 2000.

[Szegedy et al., 2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, April 2014.

[Tjeng et al., 2019] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[Touvron et al., 2020] Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Hervé Jégou. Fixing the train-test resolution discrepancy: FixEfficientNet. arXiv:2003.08237v4, April 2020.

[Tramèr et al., 2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.

[Wiyatno et al., 2019] Rey Reza Wiyatno, Anqi Xu, Ousmane Dia, and Archy de Berker. Adversarial examples in modern machine learning: A review. CoRR, abs/1911.05268, 2019.

[Zhang et al., 2020] Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, and Alexander Smola. ResNeSt: Split-attention networks. arXiv:2004.08955v1, April 2020.