Robustness as Inherent Property of Datapoints∗

Andrei Ilie, Marius Popescu, Alin Stefanescu
University of Bucharest
{cilie, marius.popescu, alin}@fmi.unibuc.ro

∗ Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Characterizing how effective a machine learning algorithm is while being trained and tested on slightly different data is a widespread concern. The property of models which perform well under this general framework is commonly known as robustness. We propose a class of model-agnostic empirical robustness measures for image classification tasks. To any random image perturbation scheme, we attach a robustness measure that empirically checks how easy it is to perturb a labelled image and cause the model to misclassify it. We also introduce a methodology for training more robust models using the information gained about the empirical robustness measure of the training set. We keep only the fraction of datapoints that are robust according to our robustness measure and retrain the model on them. We validate that the robustness of the model increases by measuring its empirical robustness on test data.

1 Introduction

During the last decade, the field of machine learning has made considerable advances in many tasks, such as image classification, object detection, machine translation, or question answering, with deep neural networks easily becoming the state-of-the-art approaches [Touvron et al., 2020; Zhang et al., 2020; Edunov et al., 2018]. The main priority has been on the capacity of the models to perform well on the test set of some well-known datasets (MNIST, CIFAR, SQuAD) [LeCun and Cortes, 2010; Krizhevsky, 2009; Rajpurkar et al., 2016]. However, the training and the test sets are usually generated from the same underlying distribution, leaving the model's performance under distribution shifts unknown. Given that machine learning techniques are being employed in sensitive tasks, such as self-driving cars and healthcare, robustness should become a crucial metric, taken into consideration together with accuracy when evaluating the performance of models.

We will mainly focus on safety and robustness for image classification tasks, but the work can be easily extended to other topics.

Distribution shifts, which affect the performance of machine learning systems, can occur mainly for two reasons. The first reason, adversarial attacks [Wiyatno et al., 2019; Szegedy et al., 2014], has been receiving growing attention over the past years. Adversarial attacks are "hidden messages" [Wiyatno et al., 2019] added on top of images which are nearly imperceptible to the human eye, but which cause the model to fail, in other words creating "machine illusions".

The second reason, covariate shift [Shimodaira, 2000], is a natural change in the data distribution. For example, imagine an autonomous car model trained solely on rainy and sunny conditions in a city where it has not snowed over the past five years. However, one day it starts snowing, and the image recognition system of the autonomous car could have serious issues in identifying objects and road signs because of completely different lighting conditions.

While improving models to be less exposed to known adversarial attacks is very important, one has to keep in mind that this is, after all, an adversarial game, where the attacker and the security researcher keep alternately coming up with better strategies. For example, the adversarial attack strategy Fast Gradient Sign Method [Szegedy et al., 2014] can be mitigated by adversarial training [Szegedy et al., 2014], which can in turn be bypassed by R+FGSM [Tramèr et al., 2018]. The defense methods against adversarial attacks seek to make the model robust with respect to certain adversarial points in the neighbourhood of unaltered images.

Therefore, one is prompted to consider a more general robustness framework, in which the interest lies in the model not making a mistake anywhere in the neighbourhood of an image¹. There exist various tools that can obtain robustness guarantees for deep neural networks [Ruan et al., 2018; Tjeng et al., 2019], but most of them are very dependent on the model's architecture, either not being able to scale to deeper networks, or only working with certain kinds of layers.

¹ For example, the neighbourhood could be specified by some metric ball around the image.
Figure 1: Images deemed as robust by our simple CNN on the first row against images deemed as not robust on the second row. The images on the second row were classified correctly by the model M before applying the random perturbation process.

We propose a model-agnostic² empirical method for estimating the robustness of a model. This estimation of a model near an image X is done by iteratively sampling datapoints close to it, according to a specified random scheme³. It feeds each of the sampled datapoints to the model and stops either when the model classifies them incorrectly, or when a maximum number of steps has been reached. The number of such sampling steps serves as a proxy for the local robustness around image X. Intuitively, the easier it is to perturb the label of X by sampling around it, the less robust the model is around it. We use this method for estimating the robustness of the model on entire datasets, by locally checking the model's robustness around each datapoint and combining the results.

We also claim that the robustness of the model is correlated with the inherent robustness of the images with respect to the classification task. Therefore, the robustness of a model depends both on the robustness of the architecture itself and on the inherent robustness of the datapoints it has been trained on.

We believe that training a model on certain correctly labelled images can lead towards highly unnatural borders between classes. These might be datapoints that we would rather misclassify than include in the model at an additional high cost in robustness. We test this hypothesis and indeed obtain a more robust model by discarding the not-robust images from the training process.

Our main technical contributions are the model-agnostic empirical robustness measure and the training methodology based on robust images.

An important general direction we want to shed light on is that images from classification tasks should be seen as carrying an inherent level of robustness, which could be estimated and exploited.

The model-agnosticism makes it an easy plug-in method in any classification task, and it can easily be introduced as a baseline check for machine learning systems.

² The method does not need any knowledge about the architecture of the model. Note that the model does not necessarily have to be a deep neural network.
³ The random scheme should not alter the underlying true class of the image that we sampled around. Intuitively, the samples should be classified by a human in the same way as the original image is.

2 Randomized Perturbation Robustness

2.1 Definition

We propose a class of empirical robustness measures RPR (Randomized Perturbation Robustness) for image classification tasks, which are model-agnostic. Let R be a random image perturbation scheme.⁴ The empirical robustness RPR(R) of a model M with respect to a datapoint x belonging to class y is the minimum between MAX_STEPS and the expected number of retrying steps of applying R to the original x such that M does not classify R(x) as y.

If the empirical robustness of M with respect to (x, y) is MAX_STEPS, we stop and deem x as robust; otherwise as not-robust.

Note that the random perturbations of an image are not applied on top of previous perturbation attempts, but rather on the original image. This perturbation process is repeated until the conditions above are fulfilled.

⁴ For example, Gaussian noise, replacing at most k pixels of an image, blurring, etc.
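As a concrete illustration of this definition, the sketch below (our own, not code from the paper) implements the per-datapoint check as a simple rejection loop; `predict` is assumed to map an image to a predicted class label and `perturb` stands for one draw from the random scheme R, both names being hypothetical.

```python
from typing import Callable
import numpy as np


def rpr_steps(predict: Callable[[np.ndarray], int],
              perturb: Callable[[np.ndarray], np.ndarray],
              x: np.ndarray, y: int, max_steps: int) -> int:
    """One Monte Carlo run of the RPR check for a labelled image (x, y).

    Each attempt applies the random scheme to the ORIGINAL image x
    (perturbations are never stacked) and queries the model; the loop
    stops at the first misclassification or after max_steps attempts.
    """
    for step in range(1, max_steps + 1):
        if predict(perturb(x)) != y:
            return step        # the model was fooled after `step` attempts
    return max_steps           # x survived every attempt


def is_robust(predict, perturb, x, y, max_steps: int) -> bool:
    """A datapoint is deemed robust iff it survives all max_steps attempts."""
    return rpr_steps(predict, perturb, x, y, max_steps) == max_steps
```

Averaging `rpr_steps` over several runs would estimate the expected number of retrying steps in the definition; a single run already suffices for the robust / not-robust decision used in the rest of the paper.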
2.2 Empirical robustness on datapoints and on entire datasets

The introduced framework is a simple empirical way of assessing a model's robustness near an image. It is suitable under various setups, such as a random image perturbation scheme that adds weather conditions⁵ in the autonomous car situation.

⁵ Applying snow, fog, rain effects, etc.

We propose two use cases based on the empirical robustness measure introduced above: one estimating the model robustness on an entire (test) dataset, and another one training a model only on the images that are deemed as robust, in order to obtain a more robust model.

The first use case, estimating the robustness of the model on an entire dataset, is done by applying the Randomized Perturbation Robustness method described above to each datapoint and computing the percentage of images that are deemed as robust.

The second use case is based on our claim that the robustness of the model with respect to a datapoint can be seen, to some extent, as the inherent robustness of the datapoint with respect to the classification task. This allows us to retrain the model using only images from the train set that are deemed as robust by our empirical measure, giving us a more robust model. This happens because the model learns only from the robust images, which encourages it to infer simpler, more natural class separators. We claim that the images that are deemed as not robust by our method can generally be seen as edge cases, causing the model to infer irregular separators.
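For the first use case, the dataset-level estimate is just the fraction of datapoints deemed robust. A minimal sketch, reusing the hypothetical `is_robust` helper from the previous snippet:

```python
def dataset_robustness(predict, perturb, images, labels, max_steps: int) -> float:
    """Fraction of (image, label) pairs deemed robust under the given scheme."""
    robust_count = sum(
        is_robust(predict, perturb, x, y, max_steps)   # True counts as 1
        for x, y in zip(images, labels)
    )
    return robust_count / len(labels)
```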
2.3 Methodology and experiments

We experiment using a CNN architecture for classifying images from MNIST. As this classification task is not complex, we use a very simple model⁶, which achieves a test accuracy of only 98.85%, to showcase the main ideas we introduce. The randomized image perturbation scheme we use randomly alters at most as many pixels as the square root of the number of image pixels (28 in our case). We use MAX_STEPS = 250 in our experiments.

⁶ We use two small convolutional layers, one max pooling layer, and a fully connected layer with softmax activation. We train the model with ADAM using the default hyperparameters.
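The sketch below shows one possible reading of this perturbation scheme for 28x28 MNIST images: pick at most sqrt(784) = 28 pixel positions at random and overwrite them with fresh random intensities. The replacement distribution is our assumption; the paper only specifies the pixel budget.

```python
import numpy as np


def perturb_pixels(x: np.ndarray, rng=None) -> np.ndarray:
    """Randomly alter at most sqrt(#pixels) pixels of an image (28 for MNIST).

    Assumes pixel intensities in [0, 1]; the new values are drawn uniformly
    at random, which is an illustrative choice, not the authors' stated one.
    """
    rng = rng or np.random.default_rng()
    out = x.copy()
    budget = int(np.sqrt(x.size))                 # 28 pixels for a 28x28 image
    k = int(rng.integers(1, budget + 1))          # alter at most `budget` pixels
    idx = rng.choice(x.size, size=k, replace=False)
    flat = out.reshape(-1)                        # view on the copied image
    flat[idx] = rng.random(k)                     # fresh random intensities
    return out
```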
We show in Figure 2 an image that is classified correctly by M against its random perturbation under the scheme described above, which is incorrectly classified by M.

Figure 2: The image on the left is labelled as 6 by M. The image on the right is obtained by perturbing at most 28 pixels of the left one, and it is labelled as 2 by M. The perturbed image was obtained after 47 random perturbation steps of altering at most 28 pixels. None of the previous 46 random perturbations were able to confuse the model.

We compare in Figure 1 robust and not robust images which, without any perturbation, are correctly classified by M. These were randomly chosen and give some intuition about what a robust image looks like compared to one that is not robust.

The process we described for determining the empirical robustness is very similar, when seen as a function of MAX_STEPS, to a learning curve. The rate at which not-robust images are discovered eventually flattens, which allows us to use it together with some early-stopping mechanism.

In Figure 3 we can see how the ratio of test images that are still robust flattens as a function of MAX_STEPS. We obtain a ratio of 0.2957 images from the test set which can withstand 250 random perturbations, which is a surprisingly small fraction considering the simple noising we apply. This stands as straightforward empirical evidence that the simple CNN architecture we used is not robust.

Figure 3: The ratio of images from the test set that are still robust as a function of the number of perturbation iterations that have been applied. The initial model M is used.

Figure 4: Distribution of training images that are deemed as robust under model M. Images labelled as 7 seem to be inherently more robust, while images labelled as 1, 8, and 9 can easily be corrupted by random perturbations.

In order to achieve a more robust network, we apply the same procedure of deeming an image as robust or not robust on the MNIST train set, using the model M, which was trained on exactly this data. 71.28% of the training images are deemed as robust; however, the distribution across classes is far from uniform, as seen in Figure 4. Therefore, we randomly sample 1500 datapoints from each class of the robust training images, such that the training set does not have a class bias, and proceed to retrain the simple CNN architecture solely on this data. Let MR be the model trained on this data, which amounts to only 25% of the MNIST training set. We encounter a drop of approximately 2% in test accuracy, obtaining a score of 96.92%, which is to be expected considering the relatively small amount of training data.

The model MR is much more robust on the test set, obtaining a ratio of 0.5101 robust images, cf. Figure 5, as compared to the robustness of only 0.2957 for the original M. This stands as evidence that the robust nature of the selected training images led to a more robust model.

Figure 5: The ratio of images from the test set that are still robust as a function of the number of perturbation iterations that have been applied. Here, the model MR is used. There is a clear improvement in robustness when compared to the model M.
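A sketch of the retraining recipe above, under stated assumptions: `is_robust` and `perturb_pixels` are the hypothetical helpers from the earlier snippets, `train_model` stands for whatever routine trains the simple CNN, and only the robustness filter and the class-balanced sampling of 1500 images per class follow the text.

```python
import numpy as np


def select_balanced_robust_subset(predict, perturb, images, labels,
                                  per_class=1500, max_steps=250, seed=0):
    """Keep only robust training images and sample `per_class` of them per class.

    Balancing removes the class bias visible in the distribution of robust
    images; raises if a class has fewer than `per_class` robust examples.
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        robust_idx = [i for i in np.flatnonzero(labels == cls)
                      if is_robust(predict, perturb, images[i], labels[i], max_steps)]
        chosen.extend(rng.choice(robust_idx, size=per_class, replace=False))
    chosen = np.asarray(chosen)
    return images[chosen], labels[chosen]


# Hypothetical usage: retrain the same architecture on the robust subset only.
# x_r, y_r = select_balanced_robust_subset(model.predict, perturb_pixels,
#                                          x_train, y_train)
# model_r = train_model(x_r, y_r)
```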
3 Conclusions and future work

The simple empirical robustness checking method we introduce opens the way towards building fast, model-agnostic tools to estimate the robustness of machine learning models. This method can be easily embedded as a base check in machine learning systems.

One of the main takeaways is that robustness can be seen as an inherent property of the images with respect to the classification task. The robustness of models depends both on their architecture and on the robustness of the data they are trained on. This can be exploited in various ways, such as the training methodology we proposed, which significantly improves the robustness of the model.

Other interesting applications could include using Generative Adversarial Networks (GANs) to augment the robust training data from the training methodology we proposed. Data augmentation with GANs has successfully been used to improve the quality of data and the accuracy of models [Antoniou et al., 2017], and we believe that it could be used to generate diverse robust images as well. These could contribute to increasing the accuracy of robust models trained under our methodology.

Another area of further investigation is checking how our empirical robustness measure relates to the formal verification tools that obtain exact robustness guarantees. Note that this kind of experiment is not possible for every model, as existing formal verification tools are limited to specific machine learning architectures or do not scale well with complex models.

References

[Antoniou et al., 2017] Antreas Antoniou, Amos J. Storkey, and Harrison Edwards. Data augmentation generative adversarial networks. CoRR, abs/1711.04340, 2017.

[Edunov et al., 2018] Sergey Edunov, Myle Ott, Michael Auli, and David Grangier. Understanding back-translation at scale. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun'ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 489-500. Association for Computational Linguistics, 2018.

[Krizhevsky, 2009] Alex Krizhevsky. Learning Multiple Layers of Features from Tiny Images, chapter 3. Technical Report TR-2009, University of Toronto, 2009.

[LeCun and Cortes, 2010] Yann LeCun and Corinna Cortes. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/, 2010.

[Rajpurkar et al., 2016] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. In Jian Su, Xavier Carreras, and Kevin Duh, editors, Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 2383-2392. The Association for Computational Linguistics, 2016.

[Ruan et al., 2018] Wenjie Ruan, Xiaowei Huang, and Marta Kwiatkowska. Reachability analysis of deep neural networks with provable guarantees. In Jérôme Lang, editor, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, pages 2651-2659. ijcai.org, 2018.

[Shimodaira, 2000] Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90:227-244, October 2000.

[Szegedy et al., 2014] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, ICLR 2014, April 2014.

[Tjeng et al., 2019] Vincent Tjeng, Kai Y. Xiao, and Russ Tedrake. Evaluating robustness of neural networks with mixed integer programming. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.

[Touvron et al., 2020] Hugo Touvron, Andrea Vedaldi, Matthijs Douze, and Hervé Jégou. Fixing the train-test resolution discrepancy: FixEfficientNet. arXiv:2003.08237v4, April 2020.

[Tramèr et al., 2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian J. Goodfellow, Dan Boneh, and Patrick D. McDaniel. Ensemble adversarial training: Attacks and defenses. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018.

[Wiyatno et al., 2019] Rey Reza Wiyatno, Anqi Xu, Ousmane Dia, and Archy de Berker. Adversarial examples in modern machine learning: A review. CoRR, abs/1911.05268, 2019.

[Zhang et al., 2020] Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Haibin Lin, Zhi Zhang, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, and Alexander Smola. ResNeSt: Split-attention networks. arXiv:2004.08955v1, April 2020.