<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Vulnerability of machine learning models to adversarial examples</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petra Vidnerová</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Neruda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Institute of Computer Science, Academy of Sciences of the Czech Republic</institution>
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>187</fpage>
      <lpage>194</lpage>
      <abstract>
        <p>We propose a genetic algorithm for generating adversarial examples for machine learning models. Such an approach can find adversarial examples without access to the model's parameters. Different models are tested, including both deep and shallow neural network architectures. We show that RBF networks and SVMs with Gaussian kernels tend to be rather robust and not prone to misclassification of adversarial examples.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Deep networks and convolutional neural networks enjoy
high interest nowadays. They have become the
state-of-the-art methods in many fields of machine learning and have
been applied to various problems, including image
recognition, speech recognition, and natural language
processing [
        <xref ref-type="bibr" rid="ref3">5</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">12</xref>
        ] a counter-intuitive property of deep networks is
described. It relates to the stability of a neural network
with respect to small perturbations of its inputs. The
paper shows that, by applying an imperceptible non-random
perturbation to an input image, it is possible to
arbitrarily change the network prediction. These perturbations are
found by optimizing the input to maximize the prediction
error. Such perturbed examples are known as adversarial
examples. On some datasets, such as ImageNet, the
adversarial examples were so close to the original examples that
the differences were indistinguishable to the human eye.
      </p>
      <p>
        Paper [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ] suggests that it is the linear behaviour in
high-dimensional spaces that is sufficient to cause adversarial
examples (for example, a linear classifier exhibits this
behaviour too). The authors designed a fast method of generating
adversarial examples (adding a small vector in the direction
of the sign of the gradient) and showed that adding these
examples to the training set further improves the
generalization of the model. In [
        <xref ref-type="bibr" rid="ref2">4</xref>
        ], in addition, the authors state
that adversarial examples are relatively robust and that they
generalize between neural networks with varied numbers
of layers or activations, or trained on different subsets of the
training data. In other words, if we use one neural
network to generate a set of adversarial examples, these
examples are also misclassified by another neural network,
even when it was trained with different hyper-parameters
or on a different set of examples.
Further results on fooling deep and convolutional networks
can be found in [
        <xref ref-type="bibr" rid="ref8">10</xref>
        ].
      </p>
      <p>
        This paper examines the vulnerability to adversarial
examples across a variety of machine learning methods.
We propose a genetic algorithm for generating adversarial
examples. Though the evolution is slower than the techniques
described in [
        <xref ref-type="bibr" rid="ref10 ref2">12, 4</xref>
        ], it enables us to obtain adversarial
examples even without access to the model's weights. The
only thing we need is the ability to query the model to
classify a given example. From this point of view, the
misclassification of adversarial examples represents a security
flaw.
      </p>
      <p>This paper is organized as follows. Section 2 gives a
brief overview of the machine learning models considered in
this paper. Section 3 describes the proposed genetic
algorithm. Section 4 presents the results of our experiments.
Finally, Section 5 concludes the paper.</p>
      <p>2 Deep and Shallow Architectures</p>
      <p>2.1 Deep and Convolutional Networks</p>
      <p>Deep neural networks are feedforward neural networks
with multiple hidden layers between the input and output
layers. The layers typically have different types of units
depending on the task at hand. Among the units, there are
traditional perceptrons, where each unit (neuron) realizes a
nonlinear function, such as the sigmoid function
y(z) = tanh(z) or y(z) = 1/(1 + e^(−z)). Another alternative to the
perceptron is the rectified linear unit (ReLU): y(z) = max(0, z).
Like the sigmoid neurons, rectified linear units can be used
to compute any function, and they can be trained using
algorithms such as back-propagation and stochastic gradient
descent.</p>
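      <p>For illustration, the unit functions above can be sketched in NumPy (an illustrative sketch only; the experiments in this paper rely on the libraries cited later):</p>

```python
import numpy as np

def sigmoid(z):
    # logistic sigmoid: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # rectified linear unit: y(z) = max(0, z)
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))  # values in (0, 1)
print(np.tanh(z))  # values in (-1, 1)
print(relu(z))     # negative inputs clipped to zero
```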
      <p>Convolutional layers contain the so-called convolutional
units that take advantage of the grid-like structure of the
inputs, such as in the case of 2-D bitmap images, time series,
etc. Convolutional units perform a simple discrete
convolution operation, which – for 2-D data – can be represented
by a matrix multiplication. Usually, to deal with large data
(such as large images), the convolution is applied multiple
times by sliding a small window over the data. The
convolutional units are typically used to extract some features
from the data, and they are often used together with the
so-called max pooling layers that perform an input reduction
by selecting one of many inputs, typically the one with
maximal value. Thus, the overall architecture of a deep
network for image classification tasks resembles a
pyramid with smaller number of units in higher layers of the
networks.</p>
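      <p>The two operations described above can be sketched in NumPy as follows (a didactic sketch: real convolutional layers learn their filters during training, and the filter below is an arbitrary example):</p>

```python
import numpy as np

def conv2d(image, kernel):
    # discrete 2-D convolution, "valid" mode, stride 1:
    # slide the kernel window over the image and sum the products
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out

def max_pool(x, size=2):
    # keep the maximal value in each non-overlapping size-by-size block
    H, W = x.shape
    return x[:H - H % size, :W - W % size].reshape(
        H // size, size, W // size, size).max(axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
filt = np.array([[1.0, -1.0]])  # example horizontal-difference filter
print(conv2d(img, filt))        # shape (4, 3)
print(max_pool(img))            # shape (2, 2)
```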
      <p>For the output layer, mainly for classification tasks, the
softmax function y(z)_j = e^(z_j) / Σ_{k=1}^{K} e^(z_k) is often used. It has
the advantage that the output values can be interpreted as
probabilities of the individual classes.</p>
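      <p>The softmax above can be sketched in NumPy as follows (shifting by the maximum is a standard numerical-stability trick that leaves the result unchanged):</p>

```python
import numpy as np

def softmax(z):
    # y(z)_j = e^(z_j) / sum_k e^(z_k), computed stably
    e = np.exp(z - np.max(z))
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0]))
print(y)        # non-negative values ordered like the inputs
print(y.sum())  # sums to 1 (up to rounding)
```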
      <p>Networks with at least one convolutional layer are
called convolutional neural networks (CNN), while
networks with all hidden layers consisting of perceptrons are
called multi-layer perceptrons (MLP).
</p>
      <p>2.2 RBF Networks and Kernel Methods</p>
      <p>
The history of radial basis function (RBF) networks can be
traced back to the 1980s, particularly to the study of
interpolation problems in numerical analysis [
        <xref ref-type="bibr" rid="ref6">8</xref>
        ]. The RBF
network [3] is a feedforward network with one hidden layer
realizing the basis functions and a linear output layer. It
represents an alternative to classical models, such as
multi-layer perceptrons. There is a variety of learning methods for
RBF networks [
        <xref ref-type="bibr" rid="ref7">9</xref>
        ].
      </p>
      <p>
        In the 1990s, a family of machine learning algorithms
known as kernel methods became very popular. They
have been applied to a number of real-world problems, and
they are still considered to be state-of-the-art methods in
various domains [
        <xref ref-type="bibr" rid="ref12">14</xref>
        ].
      </p>
      <p>
        Based on theoretical results on kernel approximation,
the popular support vector machine (SVM) [
        <xref ref-type="bibr" rid="ref11">2, 13</xref>
        ]
algorithm was developed. Its architecture is similar to RBF –
one hidden layer of kernel units and a linear output layer.
The learning algorithm is different, based on the search for a
separating hyperplane with the largest margin. Common
kernel functions used for SVM learning are linear ⟨x, x′⟩,
polynomial (γ⟨x, x′⟩ + r)^d, Gaussian exp(−γ‖x − x′‖²), and
sigmoid tanh(γ⟨x, x′⟩ + r).
      </p>
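      <p>The four kernels can be written directly in NumPy; the values of γ, r, and d below are example hyper-parameters for illustration, not values used in this paper:</p>

```python
import numpy as np

gamma, r, d = 0.5, 1.0, 2  # example hyper-parameters

def k_linear(x, y):
    return np.dot(x, y)

def k_poly(x, y):
    return (gamma * np.dot(x, y) + r) ** d

def k_gauss(x, y):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def k_sigmoid(x, y):
    return np.tanh(gamma * np.dot(x, y) + r)

x, y = np.array([1.0, 0.0]), np.array([0.5, 0.5])
for k in (k_linear, k_poly, k_gauss, k_sigmoid):
    print(k.__name__, k(x, y))
```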
      <p>Recently, due to the popularity of deep architectures, such
models with only one hidden layer are often referred to as
shallow models.
</p>
    </sec>
    <sec id="sec-2">
      <title>3 Genetic Algorithms</title>
      <p>
        To obtain an adversarial example for the trained machine
learning model (such as a neural network), we need to
optimize the input image with respect to network
output. For this task we employ genetic algorithms (GA).
GA represent a robust optimization method working with
the whole population of feasible solutions [
        <xref ref-type="bibr" rid="ref5">7</xref>
        ]. The
population evolves using operators of selection, mutation, and
crossover. Both the machine learning model and the target
output are fixed during the optimization.
      </p>
      <p>Each individual represents one possible input vector, i.e.
one image encoded as a vector of pixel values:</p>
      <p>I = (i_1, i_2, . . . , i_N),
where i_k ∈ [0, 1] are grey levels and N is the size of
the flattened image. (For the sake of simplicity, we consider
only greyscale images in this paper, but the same principle
can be used for RGB images as well.)</p>
      <p>The crossover operator performs a classical two-point
crossover. The mutation introduces a small change to
some of the pixels: with probability p_mutate_pixel, each pixel
is changed as</p>
      <p>i_k ← i_k + r,
where r is drawn from a Gaussian distribution. As a
selection, tournament selection with tournament size 3 is
used.</p>
      <p>The fitness function should reflect the following two
criteria:
1. the individual should resemble the target image
2. if we evaluate the individual by our machine learning
model, we aim to obtain a prescribed target output
(i.e., misclassify it).</p>
      <p>Thus, in our case, the fitness function is defined as
f(I) = −(0.5 · cdist(I, target_image) + 0.5 ·
cdist(model(I), target_answer)), where cdist denotes the
Euclidean distance.</p>
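      <p>The whole evolutionary loop can be sketched as follows; model stands for any black-box classifier that answers queries with an output vector, and the population size and number of generations are reduced here from the values used in the experiments so that the sketch runs quickly:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(ind, model, target_image, target_answer):
    # f(I) = -(0.5 * cdist(I, target_image) + 0.5 * cdist(model(I), target_answer))
    return -(0.5 * np.linalg.norm(ind - target_image)
             + 0.5 * np.linalg.norm(model(ind) - target_answer))

def mutate(ind, p_mutate_pixel=0.1, sigma=0.05):
    # each pixel is perturbed with probability p_mutate_pixel by Gaussian noise r
    mask = rng.random(ind.size) > 1.0 - p_mutate_pixel
    return np.clip(ind + mask * rng.normal(0.0, sigma, ind.size), 0.0, 1.0)

def crossover(a, b):
    # classical two-point crossover
    i, j = sorted(rng.choice(a.size, size=2, replace=False))
    child = a.copy()
    child[i:j] = b[i:j]
    return child

def tournament(pop, fits, size=3):
    # tournament selection with tournament size 3
    idx = rng.choice(len(pop), size=size, replace=False)
    return pop[max(idx, key=lambda k: fits[k])]

def evolve(model, target_image, target_answer, pop_size=20, generations=50):
    # population of candidate images, initialized near the target image
    pop = [np.clip(target_image + rng.normal(0.0, 0.05, target_image.size), 0.0, 1.0)
           for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(p, model, target_image, target_answer) for p in pop]
        pop = [mutate(crossover(tournament(pop, fits), tournament(pop, fits)))
               for _ in range(pop_size)]
    fits = [fitness(p, model, target_image, target_answer) for p in pop]
    return pop[int(np.argmax(fits))]
```

      <p>Note that only model(ind) queries are needed, which is what makes the attack applicable without access to the model's weights.</p>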
    </sec>
    <sec id="sec-3">
      <title>4 Experimental Results</title>
      <p>The goal of our experiments is to test various machine
learning models and their vulnerability to adversarial
examples.
</p>
      <sec id="sec-3-1">
        <title>4.1 Overview of models</title>
        <p>
          As a representative of deep models we use two deep
architectures – an MLP network with rectified linear units
(ReLU), and a CNN. The MLP used in our experiments
consists of three fully connected layers. Two hidden layers
have 512 ReLUs each, using dropout; the output layer has
10 softmax units. The CNN has two convolutional
layers with 32 filters and ReLUs each, a max pooling layer,
a fully connected layer of 128 ReLUs, and a fully
connected output softmax layer. In addition to these two
models, we also used an ensemble of 10 MLPs. All models
were trained using the KERAS library [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          Shallow networks in our experiments are represented by
an RBF network with 1000 Gaussian units, and SVM
models with a Gaussian kernel (SVM-gauss), polynomial kernels
of degree 2 and 4 (SVM-poly2 and SVM-poly4), sigmoid
kernel (SVM-sigmoid), and linear kernel (SVM-linear).
SVMs were trained using the scikit-learn library [
          <xref ref-type="bibr" rid="ref9">11</xref>
          ]. Grid
search and cross-validation were used to tune the
hyper-parameters. For RBF networks, we used our own
implementation. An overview of train and test accuracies can
be found in Tab. 1.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2 Experimental setup</title>
        <p>
          The well-known MNIST data set [
          <xref ref-type="bibr" rid="ref4">6</xref>
          ] was used. It contains
70 000 images of handwritten digits, 28 × 28 pixels each;
60 000 are used for training and 10 000 for testing. The
genetic algorithm was run with 50 individuals for 10 000
generations, with the crossover probability set to 0.6 and
the mutation probability set to 0.1. The GA was run 9 times
for each model to find adversarial examples that resemble 9
different images (training samples from the beginning of the
training set). All images were optimized to be classified
as zero.
        </p>
        <p>Tab. 1: Train and test accuracies of the models.
Model: RBF MLP CNN SVM-gauss SVM-poly2 SVM-poly4 SVM-sigmoid SVM-linear
Train: 0.96 1.00 1.00 0.99 1.00 0.99 0.87 0.95
Test: 0.96 0.98 0.99 0.98 0.98 0.98 0.88 0.94</p>
        <p>The results can be summarized as follows:
• the RBF network, SVM-gauss, and SVM-linear were never
misclassified, i.e. the genetic algorithm was not able
to find adversarial examples for these models;
• SVM-poly2 and SVM-poly4 resisted the search for
adversarial examples in 2 and 5 cases, respectively.</p>
        <p>Fig. 3 and 4 deal with the generalization of
adversarial examples over different models. For each adversarial
example, the figure lists the output vectors of all models.
In the case of the digit 3, the adversarial example evolved
for the MLP is also misclassified by the ensemble of MLPs,
and vice versa. Both examples are misclassified by
SVM-sigmoid. However, the adversarial example for the
SVM-sigmoid is misclassified only by the SVM-linear model.
The adversarial example for SVM-poly2 is misclassified also
by the other SVMs, except the SVM-gauss model.</p>
        <p>In general, it often happens that an adversarial example
evolved for one model is misclassified by some of the other
models (see Tab. 6 and 7). There are some general trends:
• the adversarial example evolved for the CNN was never
misclassified by the other models, and the CNN never
misclassified adversarial examples other than those evolved
for the CNN;
• adversarial examples evolved for the MLP are
misclassified also by the ensemble of MLPs (all cases except
two), and adversarial examples evolved for the ensemble
of MLPs are misclassified by the MLP (all cases);
• adversarial examples evolved for the SVM-sigmoid
model are misclassified by SVM-linear (all cases
except two);
• adversarial examples for the SVM-poly2 model are
often (6 cases) misclassified by the other SVMs
(SVM-poly4, SVM-sigmoid, SVM-linear), and in 4 cases
also by SVM-gauss. In three cases they were also
misclassified by the MLP and the ensemble of MLPs; in one
case, the adversarial example for SVM-poly2 is
misclassified by all models but the CNN (however, this
example is quite noisy);
• adversarial examples for the SVM-poly4 model are in
two cases misclassified by all models but the CNN, in
another case by all but the CNN and RBF models, and in
one case by all but the CNN, RBF, and SVM-gauss models;
• the RBF network, SVM-gauss, and SVM-linear were
resistant to the adversarial examples produced by the genetic
algorithm; however, they sometimes misclassify adversarial
examples evolved for other models. Such examples are
already quite noisy, yet a human would still classify
them correctly.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion</title>
      <p>We proposed a genetic algorithm for generating
adversarial examples for machine learning models. Our
experiments show that many machine learning models suffer from
vulnerability to adversarial examples, i.e. examples designed to be
misclassified. Some models are quite resistant to such
behaviour, namely models with local units: RBF networks
and SVMs with Gaussian kernels. It seems that it is the
local behaviour of the units that prevents these models from being
fooled.</p>
      <p>Adversarial examples evolved for one model are often
misclassified also by some of the other models, as was
elaborated in the experiments.</p>
      <sec id="sec-4-1">
        <title>Acknowledgements</title>
        <p>This work was partially supported by the Czech Grant
Agency grant 15-18108S, and institutional support of the
Institute of Computer Science, RVO 67985807.</p>
        <p>[Figure: adversarial examples evolved against RBF, MLP, CNN, and ENS.]</p>
        <p>Output vectors of all models for the adversarial example evolved against RBF:
(class) 0 1 2 3 4 5 6 7 8 9
RBF 0.16 0.06 0.12 0.79 0.01 -0.02 -0.06 -0.00 0.02 0.03
MLP 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
CNN 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
ENS 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-gauss 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-poly2 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-poly4 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-sigmoid 0.04 0.00 0.02 0.79 0.00 0.06 0.00 0.01 0.05 0.02
SVM-linear 0.00 0.00 0.00 0.96 0.00 0.00 0.00 0.00 0.03 0.00</p>
        <p>Output vectors of all models for the adversarial example evolved against MLP:
(class) 0 1 2 3 4 5 6 7 8 9
RBF 0.30 0.04 0.17 0.75 0.02 -0.03 -0.04 -0.01 -0.07 -0.00
MLP 0.96 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.01 0.01
CNN 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
ENS 0.86 0.00 0.01 0.04 0.00 0.00 0.00 0.00 0.01 0.08
SVM-gauss 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.01
SVM-poly2 0.04 0.00 0.01 0.91 0.00 0.00 0.00 0.00 0.02 0.02
SVM-poly4 0.03 0.00 0.01 0.93 0.00 0.00 0.00 0.00 0.01 0.01
SVM-sigmoid 0.49 0.00 0.03 0.30 0.00 0.04 0.00 0.01 0.10 0.02
SVM-linear 0.25 0.02 0.10 0.30 0.02 0.05 0.02 0.03 0.18 0.06</p>
        <p>Output vectors of all models for the adversarial example evolved against CNN:
(class) 0 1 2 3 4 5 6 7 8 9
RBF 0.12 0.05 0.15 0.89 0.01 -0.18 -0.02 0.02 0.10 -0.03
MLP 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
CNN 0.94 0.00 0.02 0.01 0.00 0.00 0.00 0.00 0.03 0.00
ENS 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-gauss 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-poly2 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-poly4 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
SVM-sigmoid 0.04 0.00 0.02 0.86 0.00 0.03 0.00 0.00 0.04 0.01
SVM-linear 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00</p>
        <p>Output vectors of all models for the adversarial example evolved against ENS:
(class) 0 1 2 3 4 5 6 7 8 9
RBF 0.30 0.05 0.18 0.76 -0.01 -0.06 -0.04 -0.03 -0.05 -0.00
MLP 0.83 0.00 0.05 0.06 0.00 0.00 0.00 0.00 0.05 0.00
CNN 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
ENS 0.96 0.00 0.01 0.02 0.00 0.00 0.00 0.00 0.01 0.00
SVM-gauss 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.01
SVM-poly2 0.02 0.00 0.01 0.94 0.00 0.00 0.00 0.00 0.01 0.02
SVM-poly4 0.01 0.00 0.00 0.96 0.00 0.00 0.00 0.00 0.01 0.01
SVM-sigmoid 0.40 0.00 0.03 0.35 0.01 0.06 0.00 0.01 0.11 0.02
SVM-linear 0.19 0.01 0.06 0.50 0.01 0.05 0.01 0.02 0.11 0.04</p>
        <p>[Figure: adversarial examples evolved against SVM-gauss, SVM-poly2, SVM-poly4, and SVM-sigmoid.]</p>
        <p>0 1 2 3 4 5 6 7 8 9
RBF 0.06 0.01 0.15 0.74 -0.00 -0.03 -0.04 -0.01 0.26 0.05
MLP 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
CNN 0.00 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
ENS 0.00 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.01 0.00
SVM-gauss 0.00 0.00 0.00 0.90 0.00 0.00 0.00 0.00 0.10 0.00
SVM-poly2 0.00 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.50 0.00
SVM-poly4 0.00 0.00 0.00 0.60 0.00 0.00 0.00 0.00 0.40 0.00
SVM-sigmoid 0.03 0.00 0.03 0.63 0.00 0.09 0.00 0.01 0.19 0.02
SVM-linear 0.00 0.00 0.00 0.36 0.00 0.00 0.00 0.00 0.63 0.00</p>
        <sec id="sec-4-1-1">
          <title>Evolved for</title>
          <p>Example 1:
MLP
ensemble
CNN
SVM-poly2</p>
          <p>Also misclassified by
—
MLP
—
SVM-poly4, SVM-sigmoid,
SVM-linear
—
ensemble
MLP
—
ensemble, MLP, SVM-gauss, SVM-poly4,
SVM-sigmoid, SVM-linear
RBF, ensemble, MLP, SVM-gauss,
SVM-poly4, SVM-sigmoid, SVM-linear
SVM-linear</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>SVM-sigmoid</title>
          <p>Example 3:
MLP
ensemble
CNN
SVM-poly2</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>SVM-poly4</title>
        </sec>
        <sec id="sec-4-1-4">
          <title>SVM-sigmoid</title>
          <p>Example 4:
MLP
ensemble
CNN
SVM-poly2</p>
        </sec>
        <sec id="sec-4-1-5">
          <title>SVM-sigmoid</title>
          <p>Example 5:
MLP
ensemble
CNN</p>
          <p>SVM-poly2</p>
          <p>ensemble
MLP
—
ensemble, MLP, SVM-poly2
SVM-poly4, SVM-sigmoid, SVM-linear
all except CNN
SVM-linear
ensemble
MLP
—
SVM-gauss, SVM-poly4, SVM-sigmoid,
SVM-linear
SVM-linear
—
MLP
—
MLP, ensemble, SVM-poly2,
SVM-sigmoid, SVM-linear
SVM-linear
ensemble
MLP
—
—</p>
        </sec>
        <sec id="sec-4-1-6">
          <title>SVM-poly4 SVM-sigmoid</title>
          <p>Example 6:
MLP
ensemble
CNN
SVM-poly4</p>
        </sec>
        <sec id="sec-4-1-7">
          <title>SVM-sigmoid</title>
          <p>Example 7:
MLP
ensemble
CNN</p>
          <p>SVM-sigmoid</p>
        </sec>
        <sec id="sec-4-1-8">
          <title>Evolved for</title>
          <p>Example 8:
MLP
ensemble
CNN
SVM-poly2
SVM-sigmoid
Example 9:
MLP
ensemble
CNN
SVM-poly2
SVM-sigmoid
Example 10:
MLP
ensemble
CNN
SVM-poly2
SVM-poly4</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          .
          <source>Keras</source>
          . https://github.com/fchollet/keras,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2b">
        <mixed-citation>
          [2] C. Cortes and V. Vapnik.
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine Learning</source>
          , 20(3):273-297,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3b">
        <mixed-citation>
          [3] F. Girosi, M. Jones, and T. Poggio.
          <article-title>Regularization theory and Neural Networks architectures</article-title>
          .
          <source>Neural Computation</source>
          , 7(2):219-269,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ian J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Jonathon Shlens, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          .
          <source>Explaining and harnessing adversarial examples</source>
          ,
          <year>2014</year>
          . arXiv:1412.6572.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Courville</surname>
          </string-name>
          .
          <source>Deep Learning</source>
          . Book in preparation for MIT Press,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          and
          <string-name>
            <given-names>Corinna</given-names>
            <surname>Cortes</surname>
          </string-name>
          .
          <article-title>The MNIST database of handwritten digits</article-title>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          .
          <article-title>An Introduction to Genetic Algorithms</article-title>
          . MIT Press, Cambridge, MA,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Moody</surname>
          </string-name>
          and C. Darken.
          <article-title>Fast learning in networks of locally-tuned processing units</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>1</volume>
          :
          <fpage>289</fpage>
          -
          <lpage>303</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R.</given-names>
            <surname>Neruda</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Kudová</surname>
          </string-name>
          .
          <article-title>Learning methods for radial basis functions networks</article-title>
          .
          <source>Future Generation Computer Systems</source>
          ,
          <volume>21</volume>
          :
          <fpage>1131</fpage>
          -
          <lpage>1142</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Anh Mai</given-names>
            <surname>Nguyen</surname>
          </string-name>
          , Jason Yosinski, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Clune</surname>
          </string-name>
          .
          <article-title>Deep neural networks are easily fooled: High confidence predictions for unrecognizable images</article-title>
          .
          <source>CoRR, abs/1412.1897</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          et al.
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and
          <string-name>
            <given-names>Rob</given-names>
            <surname>Fergus</surname>
          </string-name>
          .
          <source>Intriguing properties of neural networks</source>
          ,
          <year>2013</year>
          . arXiv:1312.6199.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>V. N.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Statistical Learning Theory</article-title>
          . Wiley, New York,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Vert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tsuda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Scholkopf</surname>
          </string-name>
          .
          <article-title>A primer on kernel methods</article-title>
          .
          <source>Kernel Methods in Computational Biology</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>70</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>