<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evolving Robust Neural Architectures to Defend from Adversarial Attacks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shashank Kotyan</string-name>
          <email>shashankkotyan@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danilo Vasconcellos Vargas</string-name>
          <email>vargas@inf.kyushu-u.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Kyushu University</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Neural networks are prone to misclassify slightly modified input images. Recently, many defences have been proposed, but none have improved the robustness of neural networks consistently. Here, we propose to use adversarial attacks as a function evaluation to automatically search for neural architectures that can resist such attacks. Experiments on neural architecture search algorithms from the literature show that, although accurate, they are not able to find robust architectures. A significant reason for this lies in their limited search space. By creating a novel neural architecture search with options for dense layers to connect with convolution layers and vice-versa, as well as the addition of concatenation layers in the search, we were able to evolve an architecture that is inherently accurate on adversarial samples. Interestingly, this inherent robustness of the evolved architecture rivals state-of-the-art defences such as adversarial training while being trained only on the non-adversarial samples. Moreover, the evolved architecture makes use of some peculiar traits which might be useful for developing even more robust ones. Thus, the results here confirm that more robust architectures exist and open up a new realm of possibilities for the development and exploration of neural networks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Neural Architecture Search (NAS) and adversarial samples
have rarely appeared together. Regarding adversarial
samples, they were discovered in 2013 when neural networks
were shown to behave strangely for nearly the same
images [Szegedy, 2014]. Afterwards, a series of
vulnerabilities was found [Moosavi-Dezfooli et al., 2017; Su et al.,
2019]. Such adversarial attacks can also be easily applied to
real-world scenarios, which poses a serious problem for current
deep neural networks' applications. Currently, there is no
known learning algorithm or procedure that can defend
against adversarial attacks consistently.</p>
      <p>Regarding NAS, the automatic design of architectures has
been of broad interest for many years. The aim is to develop
methods that do not need specialists in order to be applied to
new applications. This would confer not only
generality but also ease of use. Most of the algorithms for NAS are
either based on reinforcement learning [Pham et al., 2018;
Zoph et al., 2018] or evolutionary computation [Real et al.,
2017; Miikkulainen et al., 2019]. On the one hand, in
reinforcement learning approaches, architectures are created
from a sequence of actions which are afterwards rewarded
proportionally to the crafted architecture’s accuracy. On
the other hand, in evolutionary computation based methods,
small changes in the architecture (mutations) and
recombinations (crossover) are used to create new architectures. All
architectures evolved are evaluated based on their accuracy.
Some of the best architectures based on this accuracy are
chosen to continue to the next generation.</p>
      <p>Here we propose the use of NAS to tackle the robustness
issues exposed by adversarial samples. In other words,
architecture search will be employed not only to find accurate
neural networks but also robust ones. This is based on the
principle that robustness of neural networks can be assessed
by using accuracy on adversarial samples as an evaluation
function. We hypothesise that if there is a solution in a given
architecture search space, the search algorithm would be able
to find it. This is not merely a blind search for a cure: the best
architectures found should also hint at which structures and
procedures provide robustness for neural networks. Therefore, it
would be possible to use the results of the search to
further understand how to improve the representation of models as
well as to design yet more robust ones.</p>
    </sec>
    <sec id="sec-2">
      <title>Adversarial Machine Learning</title>
      <p>Adversarial machine learning is a constrained optimisation
problem. Let f(x) ∈ [[1..N]] be the output of a machine
learning algorithm in the multi-label classification setting. Here,
x ∈ R^k is the input of the algorithm for an input of size k,
and N is the number of classes in which x can be classified.
In the image classification problem, k = m × n × 3, where
m × n is the size of the image. Adversarial samples x′ can thus
be defined as follows:

x′ = x + εx,  (1)

in which εx ∈ R^k is a small perturbation added to the input.
Therefore, adversarial machine learning can be defined as the
following optimisation problem¹:

minimise_εx  g(x + εx)_c   subject to  ‖εx‖ ≤ th,  (2)

where th is a pre-defined threshold value and g()_c is the
soft label or confidence for the correct class c, such that
f(x) = argmax g(x).</p>
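As a concrete illustration of the optimisation problem above, the following sketch evaluates the attack objective for a toy classifier. The linear model g, the input size k, and the threshold th are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy stand-in for g(x): class confidences from a linear model + softmax.
rng = np.random.default_rng(0)
k, N = 12, 3                      # input size k, number of classes N
W = rng.normal(size=(N, k))       # illustrative weights, not from the paper

def g(x):
    z = W @ x
    e = np.exp(z - z.max())
    return e / e.sum()            # confidence vector over the N classes

def f(x):
    return int(np.argmax(g(x)))   # predicted label: argmax of g(x)

x = rng.normal(size=k)
c = f(x)                          # correct class of the clean input
th = 0.5                          # perturbation budget (threshold)

def attack_objective(eps):
    """Objective of Eq. (2): confidence of the correct class after
    perturbing x, valid only while the norm constraint holds."""
    assert np.linalg.norm(eps) <= th, "constraint ||eps|| <= th violated"
    return g(x + eps)[c]

eps = rng.normal(size=k)
eps = 0.9 * th * eps / np.linalg.norm(eps)   # scale inside the constraint
score = attack_objective(eps)     # lower score => closer to fooling f
```

Minimising `score` over admissible perturbations is exactly the untargeted attack of Eq. (2).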
      <p>Moreover, attacks can be divided according to the
function optimised. In this way, there are L0 (limited number of
pixels attacked), L1, L2 and L∞ (limited amount of
variation in each pixel) types of attacks. There are many types
of attacks as well as improvements upon them. Universal
perturbation types of attacks were shown possible, in which a
single perturbation added to most of the samples is capable
of fooling a neural network in most cases
[Moosavi-Dezfooli et al., 2017]. Moreover, extreme attacks such as
modifying only one pixel (L0 = 1), called the one-pixel attack,
are also shown to be surprisingly effective [Su et al., 2019;
Vargas and Kotyan, 2019]. Most of these attacks can be
easily transferred to real scenarios by using printed-out versions
of them [Kurakin et al., 2016]. Moreover, carefully crafted
glasses [Sharif et al., 2016] or even general 3D adversarial
objects are also capable of causing misclassification
[Athalye and Sutskever, 2018]. Regarding understanding the
phenomenon, it is argued in [Goodfellow et al., 2014] that neural
networks' linearity is one of the main reasons. Another
recent investigation proposes the conflicting saliency added by
adversarial samples as the reason for misclassification
[Vargas and Su, 2019].</p>
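The norm families that distinguish these attack types can be computed directly. A minimal sketch with a made-up one-pixel perturbation (L0 = 1):

```python
import numpy as np

# Distances used to categorise attacks: L0 counts changed pixels,
# L2 measures the overall energy, L-infinity the largest per-pixel change.
# The perturbation below is a made-up example in the one-pixel-attack style.
clean = np.zeros((4, 4))
adv = clean.copy()
adv[2, 1] = 0.9                      # a single modified pixel, so L0 = 1

delta = (adv - clean).ravel()
l0   = np.count_nonzero(delta)       # number of attacked pixels
l2   = np.linalg.norm(delta, ord=2)  # Euclidean size of the perturbation
linf = np.abs(delta).max()           # worst-case per-pixel variation
```

For this one-pixel change, the L0 norm is 1 while L2 and L∞ both equal the pixel's variation, showing how the same perturbation scores differently under each constraint.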
      <p>Many defensive systems have been proposed to mitigate some
of the problems. However, current solutions are still far
from solving them. Defensive distillation
[Papernot et al., 2016] uses a smaller neural network to learn the
content from the original one; however, it was shown not to
be robust enough [Carlini and Wagner, 2017]. The addition
of adversarial samples to the training dataset, called
adversarial training, was also proposed [Goodfellow et al., 2014;
Huang et al., 2015; Madry et al., 2018]. However,
adversarial training has a strong bias towards the type of adversarial
samples used and is still vulnerable to attacks. Many recent
variations of defences have been proposed; they are carefully analysed,
and many of their shortcomings explained, in [Athalye et al.,
2018; Uesato et al., 2018]. In this article, differently from
previous approaches, we aim to tackle the robustness problems
of neural networks by automatically searching for inherently
robust architectures.</p>
    </sec>
    <sec id="sec-3">
      <title>Neural Architecture Search</title>
      <p>There are three components to a neural architecture search:
the search space, the search strategy and the performance
estimation. The search space substantially limits the representation
of the architecture in a given space. A search strategy must be
employed to search for architectures in the defined search space.
Some widely used search strategies for NAS are: Random Search,
Bayesian Optimisation, Evolutionary Methods, Reinforcement
Learning, and Gradient-Based Methods. Finally, a performance
estimation (usually the error rate) is required to evaluate the
explored architectures.</p>
      <p>¹Here the definition only concerns untargeted attacks, but a
similar optimisation problem can be defined for targeted
attacks.</p>
      <p>Currently, most NAS methods suffer from a high
computational cost while searching in a relatively small search
space [Lee et al., 2018; Lima and Pozo, 2019]. It was already
shown in [Yu and Kim, 2018] that, if fitness approximation is
possible at small search spaces, we could evolve
algorithms in an ample search space. Moreover, many
architecture searches focus primarily on hyper-parameter
search while using search spaces built around
previously hand-crafted architectures [Lee et al., 2018; He et al.,
2019], such as DenseNet, which are proved to be vulnerable to
adversarial attacks [Vargas and Kotyan, 2019]. Therefore, for
finding robust architectures, it is crucial to expand the search
space beyond that of current NAS.</p>
      <p>SMASH [Brock et al., 2017] uses a neural network to
generate the weights of the primary model. The main strength
of this approach lies in avoiding the high computational cost
incurred by other searches. However, this comes at
the cost of not being able to tweak hyperparameters that
affect the weights, such as initialisers and regularisers. Deep
Architect [Negrinho and Gordon, 2017] follows a hierarchical
approach using various search algorithms such as Monte Carlo
Tree Search (MCTS) and Sequential Model-Based Global
Optimisation (SMBO).</p>
    </sec>
    <sec id="sec-4">
      <title>Searching for Robust Architectures</title>
      <p>A robustness evaluation (defined in Section 4.1) and a search
algorithm must be defined to search for robust architectures. The
search algorithm may be an existing NAS, provided that some
modifications are made (Section 4.2). However, to allow for a more
extensive search space, which is better suited to the problem,
we also propose the Robust Architecture Search (Section 5).</p>
      <sec id="sec-4-1">
        <title>Robustness Evaluation</title>
        <p>Adversarial accuracy may seem like a natural evaluation
function for assessing neural networks' robustness. However,
there are many possible types of perturbation, and each will
result in a different type of robustness assessment and
evolution. For example, suppose an evaluation with th = 5 is
chosen; networks robust against th = 5 might be developed,
while nothing can be said for other values of th and other
attack types (different L norms). Therefore, th plays a role, but the
different norms L0, L1, L2 and L∞ completely change the
type of robustness, such as wide perturbations (L∞),
punctual perturbations (L0) and a mix of both (L1 and L2). To
avoid creating neural networks that are robust against only
one type of perturbation, and at the same time to allow
robustness to slowly build up from any partial robustness, a set of
adversarial attacks with varying th and L are necessary.</p>
        <p>To evaluate the robustness of architectures in varying th
and L while at the same time keeping computational cost low,
we use here a transferable type of attack. In other words,
adversarial samples previously found by attacking other
methods are stored and used as possible adversarial samples to the
current model under evaluation.</p>
        <sec id="sec-4-1-5">
          <p>This solves the problem that most of the attacks are usually
too slow to be put inside a loop, which can make the search for
architectures too expensive.</p>
          <p>Table 1 shows a summary of the number of images used
from each type of attack, totalling 2812 adversarial samples.
The samples were generated using the model-agnostic dual
quality assessment [Vargas and Kotyan, 2019]. Specifically, we
use the adversarial samples from two types of attacks (L0 and
L∞ attacks) with two optimisation algorithms: Covariance
Matrix Adaptation Evolution Strategy (CMA-ES) [Hansen et
al., 2003] and Differential Evolution (DE) [Storn and Price,
1997]. We use the CIFAR-10 dataset [Krizhevsky et al., 2009]
to generate the adversarial samples. We attacked traditional
architectures such as ResNet [He et al., 2016] and CapsNet
[Sabour et al., 2017]. We also attacked some state-of-the-art
defences, such as Adversarial Training (AT) [Madry et al.,
2018] and Feature Squeezing (FS) [Xu et al., 2017], defending
ResNet. The evaluation procedure consists of calculating the
number of successful adversarial samples divided by the total
number of possible adversarial samples. This also avoids problems
with the different amounts of perturbation necessary for attacks to
succeed, which could cause incomparable results.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>Robust Search Conversion of existing NAS</title>
        <p>By changing the fitness function (in the case of
evolutionary computation based NAS) or the reward function (in the
case of reinforcement learning based NAS), it is possible to
create robust-search versions of NAS algorithms. In other
words, it is possible to convert the search for accuracy into
a search for both robustness and accuracy. Here we use SMASH
and DeepArchitect for the tests. The reason for this choice
lies in the difference between the methods and the availability of
their code. Both methods have their evaluation function
modified to contain not only accuracy but also robustness (Section
4.1).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Robust Architecture Search (RAS)</title>
      <p>Here, we propose an evolutionary algorithm to search for
robust architectures, called Robust Architecture Search (RAS).
(Code is available at http://bit.ly/RobustArchitectureSearch.)
It makes sense to focus here on search spaces that allow unusual
layer types and combinations of them, a space vaster than
current traditional search spaces. The motivation for considering
this vast search space is that some of the most robust
architectures might contain such unusual combinations, which
have not yet been found or deeply explored.</p>
      <p>RAS Overview RAS works by creating three initial
populations (layer, block and model populations). Every
generation, each member of the model population is modified
five times by mutations, and the modified members are added to
the population as new members. Here we propose a utility
evaluation in which the layer and block populations are
evaluated by the number of models (architectures) using them,
while models are evaluated by their accuracy and attack
resilience (accuracy on adversarial samples). All blocks and
layers which are not used by any current member of the model
population are removed at the end of each generation.
Moreover, architectures compete with similar ones in their
subpopulation, such that only the fittest of each subpopulation
survives.</p>
      <sec id="sec-5-1">
        <title>Description of Population</title>
          <p>For such a vast search space to be searched more efficiently,
we propose to use three subpopulations, allowing for the
reuse of blocks and layers. Specifically, the populations consist of:
Layer Population: raw layers (convolutional and fully
connected) which make up the blocks; Block Population: blocks,
which are combinations of layers; Model Population: a
population of architectures, each consisting of interconnected
blocks. Figure 1 illustrates the architecture.</p>
          <p>The initial population consists of 25 random architectures
which contain U(2, 5) blocks made up of U(2, 5) layers, in
which U(a, b) is a uniform random distribution with
minimum a and maximum b values. The possible parameters
for the layers are as follows: for convolutional layers, the
filter size may be 8, 16, 32 or 64, the stride size may be 1
or 2, and the kernel size is either 1, 3 or 5; for fully connected
layers, the unit size may assume the values 64, 128, 256
or 512. All layers use the Rectified Linear Unit (ReLU) as
activation function and are followed by a batch-normalisation
layer.</p>
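The initialisation described above can be sketched as follows; the dictionary field names are illustrative, but the parameter choices mirror the listed options (Python's `random.randint(a, b)` is inclusive, matching U(a, b)):

```python
import random

# Sketch of the initial population: 25 random models, each with U(2,5)
# blocks of U(2,5) layers, drawing layer parameters from the listed
# options. ReLU activation and batch-norm are implied per layer.
random.seed(0)

def random_layer():
    if random.random() < 0.5:
        return {"type": "conv",
                "filters": random.choice([8, 16, 32, 64]),
                "stride": random.choice([1, 2]),
                "kernel": random.choice([1, 3, 5])}
    return {"type": "dense",
            "units": random.choice([64, 128, 256, 512])}

def random_model():
    # A model is a list of blocks; a block is a list of layers.
    return [[random_layer() for _ in range(random.randint(2, 5))]
            for _ in range(random.randint(2, 5))]

population = [random_model() for _ in range(25)]
```

Representing models as nested lists of parameter dictionaries keeps blocks and layers reusable across the three subpopulations.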
      </sec>
      <sec id="sec-5-2">
        <title>Mutation Operators</title>
        <p>Regarding the mutation operators used to evolve the
architectures, they can be divided into layer, block and model
mutations, which can only be applied to individuals of the
respective layer, block and model populations. The following
paragraphs define the possible mutations.</p>
        <p>Layer Mutation Layer mutations are of the following
types: a) Change Kernel: Changes the kernel size of the
convolution layer, b) Change Filter: Changes the filter size of
the convolution layer, c) Change Units: Changes the unit size
of the fully connected layer, d) Swap Layer: Chosen layer is
swapped with a random layer from the layer population.
Block Mutation Block mutations change a single block in
the block population. The possibilities are: e) Add Layer: A
random layer is added to a chosen random block, f) Remove
Layer: A random layer is removed from a chosen random
block, g) Add Layer Connection: A random connection
between two layers from the chosen random block is added, h)
Remove Layer Connection: A random connection between
two layers from the chosen random block is removed, i)
Swap Block: Chosen block is swapped with a random block
from the population.</p>
        <p>Model Mutation Model mutations modify a given
architecture. The possible model mutations are: j) Add Block: A
random block is added to the model, k) Remove Block: A random
block is removed from the model, l) Add Block Connection: A
random connection between two blocks is added, m) Remove
Block Connection: A random connection between two blocks
is removed.</p>
        <p>All mutations add a new member to the population instead
of substituting the previous one. If nothing were done, the
populations of layers and blocks would explode, increasing the
number of lower-quality layers and blocks and thereby
decreasing the probability of choosing functional ones. To
avoid this, when the layer or block population exceeds 100
individuals, the only layer/block mutation available is the
swap layer/block mutation.</p>
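A minimal sketch of this grow-then-cap behaviour for the layer population; the swap below exchanges two population members, a simplification of the paper's swap mutation:

```python
import random

# Sketch of how mutations grow a population, with the cap described
# above: past 100 individuals, only swap mutations remain, so the
# population cannot keep accumulating low-quality members.
random.seed(1)

def mutate_layer_population(layer_population):
    """Apply one layer mutation; mutants are ADDED, never substituted."""
    if len(layer_population) >= 100:
        # Only the swap mutation is available: no new member is created.
        i, j = random.sample(range(len(layer_population)), 2)
        layer_population[i], layer_population[j] = (
            layer_population[j], layer_population[i])
        return layer_population
    parent = random.choice(layer_population)
    mutant = dict(parent)
    mutant["kernel"] = random.choice([1, 3, 5])   # e.g. "change kernel"
    layer_population.append(mutant)
    return layer_population

pop = [{"type": "conv", "filters": 16, "stride": 1, "kernel": 3}]
for _ in range(150):
    pop = mutate_layer_population(pop)
```

After enough generations the population saturates exactly at the cap, while swaps keep reshuffling which layers sit where.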
      </sec>
      <sec id="sec-5-3">
        <title>Objective (Fitness) Function</title>
        <p>The fitness of an individual of the model population is measured
using the final validation accuracy of the model after training
for a maximum of 200 epochs, with early stopping if the
accuracy or validation accuracy does not change by more than 0.001
over the span of 15 epochs. The fitness is calculated as the
accuracy of the model plus the robustness of the model
(Fitness = Accuracy + Robustness).</p>
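The fitness combination is a plain sum, so a network trading a little clean accuracy for much higher adversarial accuracy can outrank a fragile one. A sketch with illustrative numbers:

```python
# Fitness as defined above: clean accuracy plus robustness (accuracy on
# the stored adversarial samples). Both terms lie in [0, 1], so fitness
# lies in [0, 2]. The candidate values below are purely illustrative.
def fitness(clean_accuracy, adversarial_accuracy):
    return clean_accuracy + adversarial_accuracy

candidates = {
    "accurate_but_fragile": fitness(0.96, 0.18),   # 1.14
    "robust_and_accurate":  fitness(0.82, 0.58),   # 1.40
}
best = max(candidates, key=candidates.get)
```

The unweighted sum is a design choice: neither term dominates, so selection pressure pushes on accuracy and robustness simultaneously.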
        <p>The Accuracy of the architecture is calculated after the
model is trained for 50 epochs over the whole set of
samples (50000 samples) of the CIFAR-10 training dataset every
10 generations, or over 1000 random samples of the CIFAR-10
training dataset in all other generations. This allows an
efficient evolution in which blocks and layers evolve at a
faster rate without interfering with the architecture's accuracy:
using the entire dataset steers the evolution towards
architectures with better accuracy, while using a subset evolves
the layers and blocks of the architecture at a faster rate. The
Robustness of the architecture is calculated using accuracy on
adversarial samples, as described in Section 4.1.</p>
        <p>To keep a high amount of diversity while searching in a vast
search space, we use a novel niching algorithm, described below
and shown in Figure 2. This niching scheme uses the idea of
Spectrum-based niching from [Vargas and Murata, 2017] but
explores a different approach to it. First, the whole initial
population is converted into a cluster population, such that each
individual in the initial population is a cluster representative.
Then, we create two child individuals for each cluster
representative by randomly applying five mutation operators to
it. We then find the closest cluster representative to each child
individual using the spectrum described below. If the fitness of
the child individual is better than that of the closest cluster
representative, the child becomes the new cluster representative,
and the old cluster representative is removed from the
population. This process is repeated for all individuals in the
cluster population, completing one generation of the
evolution.</p>
        <p>Here, we use as the spectrum a histogram containing the
following features: Number of Blocks, Number of Total Layers,
Number of Block Connections, Number of Total Layer
Connections, Number of Dense Layers, Number of Convolution
Layers, Number of Dense to Dense Connections, Number of
Dense to Convolution Connections, Number of Convolution
to Dense Connections, and Number of Convolution to
Convolution Connections. By using this Spectrum-based niching
scheme, we aim to achieve an open-ended evolution,
preventing the evolution from converging to a single robust
architecture. Preserving diversity in the population ensures that the
exploration rate remains relatively high, allowing us to find
different architectures even after many evolution steps. For
the vast search space of architectures, this property is
especially important, allowing the algorithm to traverse the
search space efficiently.</p>
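The niching step and the spectrum can be sketched together; the feature extraction below covers only a subset of the ten listed features, and all models and fitness values are toy examples:

```python
# Sketch of Spectrum-based niching: each model is summarised by a
# histogram ("spectrum") of structural counts, and a mutated child
# replaces the CLOSEST cluster representative only if it is fitter.
def spectrum(model):
    """model: list of blocks, each a list of layer-type strings.
    Returns a subset of the ten features listed in the text."""
    layers = [t for block in model for t in block]
    return (
        len(model),               # Number of Blocks
        len(layers),              # Number of Total Layers
        layers.count("dense"),    # Number of Dense Layers
        layers.count("conv"),     # Number of Convolution Layers
    )

def distance(a, b):
    # L1 distance between two spectra (histograms).
    return sum(abs(x - y) for x, y in zip(a, b))

def niching_step(representatives, fitnesses, child, child_fitness):
    """Replace the closest representative iff the child is fitter."""
    idx = min(range(len(representatives)),
              key=lambda i: distance(spectrum(representatives[i]),
                                     spectrum(child)))
    if child_fitness > fitnesses[idx]:
        representatives[idx], fitnesses[idx] = child, child_fitness
    return representatives, fitnesses

reps = [[["conv", "conv"]], [["dense"], ["dense", "conv"]]]
fits = [1.1, 0.9]
reps, fits = niching_step(reps, fits, [["dense"], ["dense"]], 1.0)
```

Because a child only ever displaces the representative structurally closest to it, distant niches survive, which is what keeps the evolution open-ended.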
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Experiments on RAS and Converted NAS</title>
      <p>Table 2: Error rate (ER) on the test set and on adversarial
samples for each architecture search: DeepArchitect* (25%, 75%),
SMASH* (23%, 82%), Ours (18%, 42%).</p>
      <p>Here, experiments are conducted on both the proposed
RAS and the converted versions of DeepArchitect and SMASH.
The objective is to achieve the highest robustness possible
using different types of architecture search algorithms and
to compare their results and effectiveness. Initially,
DeepArchitect and SMASH found architectures with error rates
of 11% and 4% respectively when the fitness is based only
on the neural network's testing accuracy. However, when the
accuracy on adversarial samples is included in the evaluation
function, the final error rate increases to 25% and 23%
respectively (Table 2). This may also indicate that poisoning
the dataset might cause a substantial decrease in accuracy for
the architectures found by SMASH and DeepArchitect. In
the case of RAS, even with a more extensive search space, an
error rate of 18% is achieved.</p>
      <p>Regarding the robustness of the architectures found, Table
2 shows that the final architectures found by DeepArchitect
and SMASH were very susceptible to attacks, with error rates
on adversarial samples of 75% and 82% respectively.
Despite the inclusion of the robustness measure (accuracy on
adversarial samples) in the evaluation function, the searches
were still unable to find a robust architecture. This might be
a consequence of the relatively small search space used and
the more focused initialisation procedures. The proposed
method (RAS), in contrast, finds an architecture with an error
rate of only 42% on adversarial samples. Note, however,
that in the case of the evolved architecture, this is an
inherent property of the architecture found. The architecture is
inherently robust without any kind of specialised training or
defence such as adversarial training (i.e., the architecture was
only trained on the training dataset). The addition of defences
should increase its robustness further.</p>
    </sec>
    <sec id="sec-7">
      <title>Analyzing RAS</title>
      <p>In this section, we evaluate the proposed method
regarding its evolution quality and how the subpopulations behave
throughout the process. Figure 3 shows how the mean
accuracy of the evolved architectures increases over time. This
pattern of behaviour is typical of evolutionary algorithms,
showing that the evolution is proceeding as expected.</p>
      <p>In Figure 4, the overall characteristics of the evolved
architectures throughout the generations are shown. The average
number of blocks and the connections between them increase
over the generations. However, the average number of layers
never reaches the same complexity as in the initial models: the
number of layers decreases steeply initially while slowly
increasing afterwards. Therefore, the overall behaviour is that
blocks become smaller and more numerous. A consequence of this
is that the number of connections becomes proportional to the
number of connections between blocks and therefore exhibits
similar behaviour. The average number of layers per block
and the average number of connections show little change,
varying only by 0.5 and 0.16 respectively.</p>
      <p>Notice that the average number of layers increases, but the
average number of layers per block continues to decrease,
albeit slowly. Consequently, blocks tend to degenerate into a
few layers, settling at around three layers per block from
the initial average of 3.2 layers per block. Lastly, the
average number of connections in a block stays more or less
the same, with the mean varying throughout only from 2.1
to 2.26. The behaviour described above might suggest that it
is hard to create big reusable blocks. This seems to be
supported both by the observed decrease in complexity and
by the increase in the number of blocks.</p>
    </sec>
    <sec id="sec-8">
      <title>Analyzing the Final Architecture: Searching for the Key to Inherent Robustness</title>
      <p>RAS found an architecture that possesses inherent robustness
capable of rivalling current defences. To investigate the reason
behind this robustness, we take a more in-depth look
at the architecture found. Figure 5 shows some
peculiarities of the evolved architecture: multiple bottlenecks,
projections into high-dimensional space, and paths with different
constraints.</p>
      <sec id="sec-10-1">
        <title>Multiple Bottlenecks and Projections into High-Dimensional Space</title>
        <p>The first peculiarity is the use of
Dense layers in-between Convolutional ones. This might
seem like a bottleneck similar to the ones used in variational
autoencoders. However, it is the opposite of a bottleneck
(Figure 5); it is a projection into high-dimensional space. The
evolved architecture mostly uses a low number of filters
while, in some parts of it, high-dimensional projections exist.
In the whole architecture, four Dense layers in-between
Convolutional ones were used, and all of them project into
higher-dimensional space. This follows directly from Cover's
Theorem, which states that projecting into high-dimensional
space can make a training set linearly separable [Cover, 1965].</p>
        <p>Paths with Different Constraints The second peculiarity
is the use of multiple paths with different numbers of
filters and output sizes after the high-dimensional projections.
Notice how the number of filters differs in each of the
Convolutional layers in these paths. This means there are different
constraints on the learning in each of these paths, which
should foster different types of features. Therefore, this is
a multi-bottleneck structure forcing the learning of different
sets of features, which are now easily constructed from the
previous high-dimensional projection.</p>
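The high-dimensional projection trait can be illustrated with a plain matrix product standing in for a dense layer between convolutional stages; all shapes below are illustrative, not taken from the evolved architecture:

```python
import numpy as np

# Sketch of the "projection into high-dimensional space" trait: a dense
# layer placed between convolutional stages whose output dimension is
# LARGER than its input, the opposite of a bottleneck.
rng = np.random.default_rng(0)

conv_features = rng.normal(size=(8, 8, 16))   # output of a conv stage
flat = conv_features.reshape(-1)              # 1024 values

in_dim, out_dim = flat.size, 4096             # project 1024 -> 4096
W = rng.normal(size=(out_dim, in_dim)) * 0.01 # dense-layer weights
projected = np.maximum(W @ flat, 0.0)         # dense + ReLU

# Per Cover's theorem, patterns cast into a higher-dimensional space are
# more likely to be linearly separable; subsequent paths can reshape the
# projected vector back into feature maps for further convolutions.
```

The key check is simply `out_dim > in_dim`: the dense layer widens rather than compresses the representation.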
      </sec>
    </sec>
    <sec id="sec-11">
      <title>Conclusions</title>
      <p>Automatic search for robust architectures is proposed as a
paradigm for developing and researching robust models. This
paradigm is based on using adversarial attacks together with the
error rate as evaluation functions in NAS. Experiments
using this paradigm with some current NAS methods had poor
results, which we attribute to the small search space used
by current methods. Here, we propose the RAS method,
which has a broader search space, including concatenation
layers and connections from dense to convolutional layers and
vice-versa. Results with RAS show that inherently robust
architectures do indeed exist. In fact, the evolved architecture
achieved robustness comparable with state-of-the-art
defences while not undergoing any specialised training or defence.
In other words, the evolved architecture is inherently robust.
Such inherent robustness could increase further if adversarial
training, other types of defence, or a combination of them were
employed together with it.</p>
      <p>Moreover, investigating the reasons behind such robustness
has shown that some peculiar traits are present. The evolved
architecture has an overall low number of filters and many
bottlenecks. Multiple projections into high-dimensional space
are also present, possibly facilitating the separation of features
(Cover's Theorem). It also uses multiple paths with different
constraints after the high-dimensional projections, which
should, consequently, cause a diverse set of features to be
learned by the network. Thus, in the search space of neural
networks, more robust architectures do exist, and more
research is required to find and fully document them as well as
their features.</p>
    </sec>
    <sec id="sec-12">
      <title>Acknowledgments</title>
      <p>This work was supported by JST, ACT-I Grant Number
JP50243 and JSPS KAKENHI Grant Number JP20241216.
Additionally, we would like to thank Prof. Junichi Murata for his
kind support, without which it would not have been possible to
conduct this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[Athalye and Sutskever, 2018] Anish Athalye and Ilya Sutskever. Synthesizing robust adversarial examples. In ICML, 2018.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[Athalye et al., 2018] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Brock et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Brock</surname>
          </string-name>
          , Theodore Lim, James M Ritchie, and
          <string-name>
            <given-names>Nick</given-names>
            <surname>Weston</surname>
          </string-name>
          .
          <article-title>SMASH: one-shot model architecture search through hypernetworks</article-title>
          .
          <source>arXiv preprint arXiv:1708.05344</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Carlini and Wagner,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Nicholas</given-names>
            <surname>Carlini</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <article-title>Towards evaluating the robustness of neural networks</article-title>
          .
          <source>In 2017 IEEE Symposium on Security and Privacy (SP)</source>
          , pages
          <fpage>39</fpage>
          -
          <lpage>57</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Cover,
          <year>1965</year>
          ]
          <string-name>
            <given-names>Thomas M</given-names>
            <surname>Cover</surname>
          </string-name>
          .
          <article-title>Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition</article-title>
          .
          <source>IEEE Transactions on Electronic Computers</source>
          , (
          <volume>3</volume>
          ):
          <fpage>326</fpage>
          -
          <lpage>334</lpage>
          ,
          <year>1965</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Goodfellow et al.,
          <year>2014</year>
          ]
          <string-name>
            <given-names>Ian J</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Jonathon Shlens, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          .
          <article-title>Explaining and harnessing adversarial examples</article-title>
          .
          <source>arXiv preprint arXiv:1412.6572</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Hansen et al.,
          <year>2003</year>
          ]
          <string-name>
            <given-names>Nikolaus</given-names>
            <surname>Hansen</surname>
          </string-name>
          , Sibylle D Müller, and Petros Koumoutsakos
          .
          <article-title>Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES)</article-title>
          .
          <source>Evolutionary Computation</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [He et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pages
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [He et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Xin</given-names>
            <surname>He</surname>
          </string-name>
          , Kaiyong Zhao, and
          <string-name>
            <given-names>Xiaowen</given-names>
            <surname>Chu</surname>
          </string-name>
          .
          <article-title>AutoML: A survey of the state-of-the-art</article-title>
          .
          <source>arXiv preprint arXiv:1908.00709</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Huang et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Ruitong</given-names>
            <surname>Huang</surname>
          </string-name>
          , Bing Xu,
          <string-name>
            <given-names>Dale</given-names>
            <surname>Schuurmans</surname>
          </string-name>
          , and Csaba Szepesvári
          .
          <article-title>Learning with a strong adversary</article-title>
          .
          <source>arXiv preprint arXiv:1511.03034</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Krizhevsky et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          , et al.
          <article-title>Learning multiple layers of features from tiny images</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Kurakin et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Alexey</given-names>
            <surname>Kurakin</surname>
          </string-name>
          , Ian Goodfellow, and
          <string-name>
            <given-names>Samy</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Adversarial examples in the physical world</article-title>
          .
          <source>arXiv preprint arXiv:1607.02533</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Lee et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Hyeon-Chang</given-names>
            <surname>Lee</surname>
          </string-name>
          , Dong-Pil Yu, and
          <string-name>
            <given-names>Yong-Hyuk</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>On the hardness of parameter optimization of convolution neural networks using genetic algorithm and machine learning</article-title>
          .
          <source>In Proceedings of the Genetic and Evolutionary Computation Conference Companion</source>
          , pages
          <fpage>51</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Lima and Pozo,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Ricardo HR</given-names>
            <surname>Lima</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aurora TR</given-names>
            <surname>Pozo</surname>
          </string-name>
          .
          <article-title>Evolving convolutional neural networks through grammatical evolution</article-title>
          .
          <source>In Proceedings of the Genetic and Evolutionary Computation Conference Companion</source>
          , pages
          <fpage>179</fpage>
          -
          <lpage>180</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Madry et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Aleksander</given-names>
            <surname>Madry</surname>
          </string-name>
          , Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and
          <string-name>
            <given-names>Adrian</given-names>
            <surname>Vladu</surname>
          </string-name>
          .
          <article-title>Towards deep learning models resistant to adversarial attacks</article-title>
          .
          <source>In ICLR</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Miikkulainen et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Risto</given-names>
            <surname>Miikkulainen</surname>
          </string-name>
          , Jason Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan,
          <string-name>
            <given-names>Nigel</given-names>
            <surname>Duffy</surname>
          </string-name>
          , et al.
          <article-title>Evolving deep neural networks</article-title>
          .
          <source>In Artificial Intelligence in the Age of Neural Networks and Brain Computing</source>
          , pages
          <fpage>293</fpage>
          -
          <lpage>312</lpage>
          . Elsevier,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Moosavi-Dezfooli et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Seyed-Mohsen</given-names>
            <surname>Moosavi-Dezfooli</surname>
          </string-name>
          , Alhussein Fawzi, Omar Fawzi, and
          <string-name>
            <given-names>Pascal</given-names>
            <surname>Frossard</surname>
          </string-name>
          .
          <article-title>Universal adversarial perturbations</article-title>
          .
          <source>In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          , pages
          <fpage>86</fpage>
          -
          <lpage>94</lpage>
          . IEEE,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Negrinho and Gordon,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Renato</given-names>
            <surname>Negrinho</surname>
          </string-name>
          and
          <string-name>
            <given-names>Geoff</given-names>
            <surname>Gordon</surname>
          </string-name>
          .
          <article-title>DeepArchitect: Automatically designing and training deep architectures</article-title>
          .
          <source>arXiv preprint arXiv:1704.08792</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Papernot et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Papernot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Patrick</given-names>
            <surname>McDaniel</surname>
          </string-name>
          , Xi Wu, Somesh Jha, and
          <string-name>
            <given-names>Ananthram</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <article-title>Distillation as a defense to adversarial perturbations against deep neural networks</article-title>
          .
          <source>In 2016 IEEE Symposium on Security and Privacy (SP)</source>
          , pages
          <fpage>582</fpage>
          -
          <lpage>597</lpage>
          . IEEE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Pham et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Hieu</given-names>
            <surname>Pham</surname>
          </string-name>
          , Melody Guan, Barret Zoph, Quoc Le, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Efficient neural architecture search via parameter sharing</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          , pages
          <fpage>4092</fpage>
          -
          <lpage>4101</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Real et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Esteban</given-names>
            <surname>Real</surname>
          </string-name>
          , Sherry Moore, Andrew Selle,
          <string-name>
            <given-names>Saurabh</given-names>
            <surname>Saxena</surname>
          </string-name>
          , Yutaka Leon Suematsu, Jie Tan, Quoc V Le, and
          <string-name>
            <given-names>Alexey</given-names>
            <surname>Kurakin</surname>
          </string-name>
          .
          <article-title>Large-scale evolution of image classifiers</article-title>
          .
          <source>In Proceedings of the 34th International Conference on Machine Learning</source>
          , Volume
          <volume>70</volume>
          , pages
          <fpage>2902</fpage>
          -
          <lpage>2911</lpage>
          . JMLR.org,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Sabour et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Sara</given-names>
            <surname>Sabour</surname>
          </string-name>
          , Nicholas Frosst, and
          <string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Dynamic routing between capsules</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3856</fpage>
          -
          <lpage>3866</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [Sharif et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Mahmood</given-names>
            <surname>Sharif</surname>
          </string-name>
          , Sruti Bhagavatula, Lujo Bauer, and Michael K Reiter
          .
          <article-title>Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition</article-title>
          .
          <source>In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security</source>
          , pages
          <fpage>1528</fpage>
          -
          <lpage>1540</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [Storn and Price,
          <year>1997</year>
          ]
          <string-name>
            <given-names>R.</given-names>
            <surname>Storn</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Price</surname>
          </string-name>
          .
          <article-title>Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces</article-title>
          .
          <source>Journal of global optimization</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ):
          <fpage>341</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [Su et al.,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Su</surname>
          </string-name>
          , Danilo Vasconcellos Vargas, and
          <string-name>
            <given-names>Kouichi</given-names>
            <surname>Sakurai</surname>
          </string-name>
          .
          <article-title>One pixel attack for fooling deep neural networks</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          ,
          <volume>23</volume>
          (
          <issue>5</issue>
          ):
          <fpage>828</fpage>
          -
          <lpage>841</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [Szegedy,
          <year>2014</year>
          ]
          Christian Szegedy et al.
          <article-title>Intriguing properties of neural networks</article-title>
          .
          <source>In ICLR</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [Uesato et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Jonathan</given-names>
            <surname>Uesato</surname>
          </string-name>
          , Brendan O'Donoghue, Pushmeet Kohli, and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Oord</surname>
          </string-name>
          .
          <article-title>Adversarial risk and the dangers of evaluating against weak attacks</article-title>
          .
          <source>In International Conference on Machine Learning</source>
          , pages
          <fpage>5032</fpage>
          -
          <lpage>5041</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [Vargas and Kotyan,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Danilo Vasconcellos</given-names>
            <surname>Vargas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Shashank</given-names>
            <surname>Kotyan</surname>
          </string-name>
          .
          <article-title>Robustness assessment for adversarial machine learning: Problems, solutions and a survey of current neural networks and defenses</article-title>
          .
          <source>arXiv preprint arXiv:1906.06026</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [Vargas and Murata,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Danilo Vasconcellos</given-names>
            <surname>Vargas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Junichi</given-names>
            <surname>Murata</surname>
          </string-name>
          .
          <article-title>Spectrum-diverse neuroevolution with unified neural models</article-title>
          .
          <source>IEEE Transactions on Neural Networks and Learning Systems</source>
          ,
          <volume>28</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1759</fpage>
          -
          <lpage>1773</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [Vargas and Su,
          <year>2019</year>
          ]
          <string-name>
            <given-names>Danilo Vasconcellos</given-names>
            <surname>Vargas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jiawei</given-names>
            <surname>Su</surname>
          </string-name>
          .
          <article-title>Understanding the one-pixel attack: Propagation maps and locality analysis</article-title>
          .
          <source>arXiv preprint arXiv:1902.02947</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [Xu et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Weilin</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David</given-names>
            <surname>Evans</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yanjun</given-names>
            <surname>Qi</surname>
          </string-name>
          .
          <article-title>Feature squeezing: Detecting adversarial examples in deep neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1704.01155</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [Yu and Kim,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Dong-Pil</given-names>
            <surname>Yu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yong-Hyuk</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Is it worth to approximate fitness by machine learning? Investigation on the extensibility according to problem size</article-title>
          .
          <source>In Proceedings of the Genetic and Evolutionary Computation Conference Companion</source>
          , pages
          <fpage>77</fpage>
          -
          <lpage>78</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [Zoph et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Barret</given-names>
            <surname>Zoph</surname>
          </string-name>
          , Vijay Vasudevan, Jonathon Shlens, and
          <string-name>
            <given-names>Quoc V</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <article-title>Learning transferable architectures for scalable image recognition</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          , pages
          <fpage>8697</fpage>
          -
          <lpage>8710</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>