<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Evolution Strategies for Deep Neural Network Models Design</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Petra Vidnerová</string-name>
          <email>petra@cs.cas.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roman Neruda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science, The Czech Academy of Sciences</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <volume>1885</volume>
      <fpage>159</fpage>
      <lpage>166</lpage>
      <abstract>
        <p>Deep neural networks have recently become the state-of-the-art methods in many fields of machine learning. Still, there is no easy way to choose a network architecture, although this choice can significantly influence the network performance. This work is a step towards automatic architecture design. We propose an algorithm for the optimization of a network architecture based on evolution strategies. The algorithm is inspired by and designed directly for the Keras library [3], which is one of the most common implementations of deep neural networks. The proposed algorithm is tested on the MNIST data set and on the prediction of air pollution based on sensor measurements, and it is compared to several fixed architectures and to support vector regression.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Deep neural networks (DNN) have become the state-of-the-art
methods in many fields of machine learning in recent years. They have
been applied to various problems, including image recognition, speech
recognition, and natural language processing [
        <xref ref-type="bibr" rid="ref10 ref8">8, 10</xref>
        ].
      </p>
      <p>Deep neural networks are feed-forward neural networks with
multiple hidden layers between the input and output layers. The layers
typically contain different types of units depending on the task at
hand. Among them are traditional perceptrons, where each unit (neuron)
realizes a nonlinear function, such as the sigmoid function or the
rectified linear unit (ReLU).</p>
      <p>While the weights of a deep neural network are learned by
algorithms based on stochastic gradient descent, the choice of
architecture, including the number and sizes of layers and the type of
activation function, is made manually by the user. However, this
choice has an important impact on the performance of the DNN. Some
expertise is needed, and in practice a trial and error method is
usually used.</p>
      <p>
        In this work we explore a fully automatic design of deep
neural networks. We investigate the use of evolution strategies for
the evolution of a DNN architecture. There are not many studies on the
evolution of DNNs, since such an approach has very high computational
requirements. To keep the search space as small as possible, we
simplify our model, focusing on the implementation of DNNs in the
Keras library [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is a widely used tool for practical
applications of DNNs.
      </p>
      <p>
        The proposed algorithm is evaluated both on benchmark and
real-life data sets. As the benchmark we use the MNIST data set, a
classification task on handwritten digits. The real data set is from
the area of sensor networks for air pollution monitoring. The data
come from De Vito et al. [
        <xref ref-type="bibr" rid="ref21 ref5">21, 5</xref>
        ] and are described in detail in Section 5.1.
      </p>
      <p>The paper is organized as follows. Section 2 gives an
overview of related work. Section 3 briefly describes the main ideas
of our approach. In Section 4 our algorithm based on evolution
strategies is described. Section 5 summarizes the results of our
experiments. Finally, Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Neuroevolution techniques have been applied successfully to
various machine learning problems [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In classical neuroevolution, no gradient descent is
involved; both the architecture and the weights undergo the
evolutionary process. However, because of the large computational
requirements, applications are limited to small networks.
      </p>
      <p>
        In previous decades there were many attempts at architecture
optimization via an evolutionary process (e.g. [
        <xref ref-type="bibr" rid="ref1 ref19">19, 1</xref>
        ]). Successful evolutionary techniques evolving the
structure of feed-forward and recurrent neural networks
include the NEAT [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], HyperNEAT [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], and CoSyNE [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
algorithms.
      </p>
      <p>On the other hand, studies dealing with the evolution of deep
neural networks and convolutional networks started to emerge only very
recently. The training of one DNN usually requires hours or days of
computing time, quite often utilizing GPUs for speedup. Naturally,
evolutionary techniques requiring thousands of training trials were
not considered a feasible choice. Nevertheless, there are several
approaches to reduce the overall complexity of neuroevolution for
DNNs. Still, due to limited computational resources, the studies
usually focus only on parts of the network design.</p>
      <p>
        For example, in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] CMA-ES is used to optimize the hyperparameters of DNNs. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] unsupervised convolutional networks for vision-based
reinforcement learning are studied; the structure of the CNN is held
fixed and only a small recurrent controller is evolved. However, the
recent paper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] presents a simple distributed evolution strategy that is
used to train a relatively large recurrent network with competitive
results on reinforcement learning tasks.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] an automated method for optimizing deep learning
architectures through evolution is proposed, extending existing
neuroevolution methods. The authors of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] sketch a genetic approach for evolving a deep autoencoder
network, enhancing the sparsity of the synapses by means of special
operators. Finally, the paper [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] presents two versions of an evolutionary and a
co-evolutionary algorithm for the design of DNNs with various transfer
functions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Our Approach</title>
      <p>In our approach we use evolution strategies to search for an
optimal architecture of a DNN, while the weights are learned by a
gradient-based technique.</p>
      <p>The main idea of our approach is to keep the search space as
small as possible; therefore the architecture specification is
simplified. It directly follows the implementation of DNNs in the
Keras library, where networks are defined layer by layer, each layer
fully connected to the next. A layer is specified by the number of
neurons, the type of activation function (all neurons in one layer
have the same activation function), and the type of regularization
(such as dropout).</p>
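      <p>As a minimal sketch, such a layer-by-layer specification can
be represented as plain data mirroring how a Keras Sequential model is
built; the field names below are our own illustration, not the paper's
actual code.</p>

```python
# Illustrative sketch of the simplified architecture specification:
# each hidden layer is one block with a size, an activation, and a
# dropout rate (field names are our own, not the paper's code).
ACTIVATIONS = ("relu", "tanh", "sigmoid", "hard_sigmoid", "linear")

def make_block(size, activation, dropout):
    """One dense-layer block of the architecture specification."""
    assert size > 0 and activation in ACTIVATIONS
    assert 1.0 > dropout >= 0.0   # zero means no dropout
    return {"size": size, "activation": activation, "dropout": dropout}

# A network is an ordered list of blocks, built layer by layer, each
# layer fully connected to the next, like a Keras Sequential model.
architecture = [
    make_block(128, "relu", 0.2),
    make_block(64, "tanh", 0.0),
]

def describe(arch):
    """Human-readable summary of a specification."""
    return " -> ".join(
        "{size}x{activation}(drop={dropout})".format(**b) for b in arch
    )
```

      <p>Keeping the specification this small is what makes the
evolutionary search space tractable.</p>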
      <p>In this paper we work only with fully connected feed-forward
neural networks, but the approach can be modified to include
convolutional layers as well. The architecture specification would
then also contain the type of layer (dense or convolutional) and, in
the case of a convolutional layer, the size of the filter.</p>
    </sec>
    <sec id="sec-4">
      <title>4 Evolution Strategies for DNN Design</title>
      <p>
        Evolution strategies (ES) were proposed for working with
real-valued vectors representing parameters of complex optimization
problems [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the illustrative algorithm below we can see a simple ES
working with n individuals in a population and generating m offspring
by means of Gaussian mutation. The environmental selection has two
traditional forms for evolution strategies. The so-called (n + m)-ES
generates the new generation by deterministically choosing the n best
individuals from the set of n + m parents and offspring. The so-called
(n, m)-ES generates the new generation by selecting only from the m
new offspring (typically, m &gt; n). The latter approach is considered
more robust against premature convergence to local optima.
      </p>
      <p>
        Currently used evolution strategies may carry more
meta-parameters of the problem in the individual than just a vector of
mutation variances. A successful version of evolution strategies, the
so-called covariance matrix adaptation ES (CMA-ES) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], uses a clever strategy to approximate the full N × N
covariance matrix, thus representing a general N-dimensional normal
distribution. A crossover operator is also commonly used within
evolution strategies.
      </p>
      <p>In our implementation the (n, m)-ES (see Algorithm 1) is
used. Offspring are generated using both mutation and crossover
operators. Since our individuals describe a network topology, they are
not vectors of real numbers, so our operators differ slightly from the
classical ES. A more detailed description follows.</p>
      <p>Algorithm 1: (n, m)-evolution strategy optimizing a
real-valued vector and utilizing an adaptive variance for each
parameter.</p>
      <p>
procedure (n, m)-ES
  t ← 0
  Initialize population P_t by n randomly generated
    vectors x_t = (x_t1, ..., x_tN, σ_t1, ..., σ_tN)
  Evaluate individuals in P_t
  while not terminating criterion do
    for i ← 1, ..., m do
      choose randomly a parent x_ti
      generate an offspring y_ti by Gaussian mutation:
      for j ← 1, ..., N do
        σ'_j ← σ_j · (1 + α · N(0, 1))
        x'_j ← x_j + σ'_j · N(0, 1)
      end for
      insert y_ti into the offspring candidate population P'_t
    end for
    Deterministically choose P_{t+1} as the n best individuals from P'_t
    Discard P_t and P'_t
    t ← t + 1
  end while
end procedure
      </p>
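      <p>The steps of Algorithm 1 can be sketched as follows on a toy
real-valued problem; the paper itself evolves network architectures
rather than plain vectors, so this is only an illustration of the
(n, m) scheme with per-parameter adaptive step sizes.</p>

```python
import random

def es_n_m(fitness, dim, n=10, m=30, alpha=0.1, generations=50, seed=0):
    """Minimal (n, m)-ES following Algorithm 1: n parents, m offspring,
    Gaussian mutation with adaptive per-parameter step sizes."""
    rng = random.Random(seed)
    # an individual is (x, sigma): the solution vector and its step sizes
    pop = [([rng.uniform(-5, 5) for _ in range(dim)], [1.0] * dim)
           for _ in range(n)]
    for _ in range(generations):
        offspring = []
        for _ in range(m):
            x, sigma = rng.choice(pop)               # random parent
            new_sigma = [abs(s * (1 + alpha * rng.gauss(0, 1)))
                         for s in sigma]             # adapt step sizes
            new_x = [xi + si * rng.gauss(0, 1)
                     for xi, si in zip(x, new_sigma)]
            offspring.append((new_x, new_sigma))
        # (n, m) selection: the next generation comes from offspring only
        offspring.sort(key=lambda ind: fitness(ind[0]))
        pop = offspring[:n]
    return min(pop, key=lambda ind: fitness(ind[0]))[0]

# usage: minimize the sphere function
best = es_n_m(lambda x: sum(v * v for v in x), dim=3)
```

      <p>Note that the next generation is chosen from the offspring
only, which matches the (n, m) scheme described above.</p>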
      <sec id="sec-4-1">
        <title>4.1 Individuals</title>
        <p>Individuals encode feed-forward neural networks implemented
as the Keras Sequential model. A Sequential model is built layer by
layer; similarly, an individual consists of blocks representing the
individual layers.</p>
        <p>I = ( [size_1, drop_1, act_1, σ_1^size, σ_1^drop], ...,
[size_H, drop_H, act_H, σ_H^size, σ_H^drop] ),
where H is the number of hidden layers, size_i is the number of
neurons in the corresponding dense (fully connected) layer, drop_i is
the dropout rate (a zero value represents no dropout), act_i ∈ {relu,
tanh, sigmoid, hard sigmoid, linear} stands for the activation
function, and σ_i^size and σ_i^drop are strategy coefficients
corresponding to the size and dropout.</p>
        <p>So far we work only with dense layers, but the individual
can be further generalized to work with convolutional layers as well.
Other types of regularization could also be considered; we are limited
to dropout for the first experiments.</p>
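        <p>A hypothetical Python rendering of this encoding is shown
below; the initial value ranges and field names are our own choices
for illustration, not taken from the paper's implementation.</p>

```python
import random

# Hypothetical encoding of one individual from Section 4.1: each
# hidden layer is one block [size, drop, act, sigma_size, sigma_drop].
ACTIVATIONS = ["relu", "tanh", "sigmoid", "hard_sigmoid", "linear"]

def random_block(rng):
    """One randomly generated block (value ranges are our own choice)."""
    return {"size": rng.randint(10, 200),
            "drop": rng.choice([0.0, 0.1, 0.2, 0.3]),
            "act": rng.choice(ACTIVATIONS),
            "sigma_size": 10.0,    # initial strategy coefficient for size
            "sigma_drop": 0.05}    # initial strategy coefficient for dropout

def random_individual(rng, max_layers=3):
    """An individual: one block per hidden layer."""
    return [random_block(rng) for _ in range(rng.randint(1, max_layers))]

ind = random_individual(random.Random(1))
```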
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Crossover</title>
        <p>The crossover operator combines two parent individuals and
produces two offspring individuals. It is implemented as a one-point
crossover, where the cross-point lies on the border of a block.</p>
        <p>Let the two parents be
I_p1 = (B_p1^1, B_p1^2, ..., B_p1^k),
I_p2 = (B_p2^1, B_p2^2, ..., B_p2^l);
then the crossover produces the offspring
I_o1 = (B_p1^1, ..., B_p1^{c_p1}, B_p2^{c_p2 + 1}, ..., B_p2^l),
I_o2 = (B_p2^1, ..., B_p2^{c_p2}, B_p1^{c_p1 + 1}, ..., B_p1^k),
where c_p1 ∈ {1, ..., k − 1} and c_p2 ∈ {1, ..., l − 1}.</p>
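        <p>A minimal sketch of this one-point crossover on block lists
(assuming, as the cut-point ranges require, that each parent has at
least two blocks):</p>

```python
import random

def one_point_crossover(p1, p2, rng):
    """One-point crossover on block lists: cut points lie on block
    borders, so each parent needs at least two blocks for a valid cut."""
    assert len(p1) > 1 and len(p2) > 1
    c1 = rng.randint(1, len(p1) - 1)   # c_p1 in {1, ..., k - 1}
    c2 = rng.randint(1, len(p2) - 1)   # c_p2 in {1, ..., l - 1}
    o1 = p1[:c1] + p2[c2:]             # head of parent 1, tail of parent 2
    o2 = p2[:c2] + p1[c1:]             # head of parent 2, tail of parent 1
    return o1, o2

# usage with symbolic blocks standing in for layer descriptions
parents = (["A1", "A2", "A3"], ["B1", "B2"])
child1, child2 = one_point_crossover(*parents, random.Random(0))
```

        <p>Because cuts fall on block borders, both offspring are
again well-formed lists of layer blocks, possibly of different lengths
than their parents.</p>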
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Mutation</title>
        <p>The mutation operator introduces random changes into an
individual. Each time an individual is mutated, one of the following
mutation operators is randomly chosen:
• mutateLayer - introduces random changes to one randomly selected
layer. One of the following operators is randomly chosen:
  - changeLayerSize - the number of neurons is changed. Gaussian
mutation is used, adapting the strategy parameter σ_size; the final
number is rounded (since the size has to be an integer).
  - changeDropOut - the dropout rate is changed using Gaussian
mutation, adapting the strategy parameter σ_drop.
  - changeActivation - the activation function is changed, chosen
randomly from the list of available activations.
• addLayer - one randomly generated block is inserted at a random
position.
• delLayer - one randomly selected block is deleted.</p>
        <p>Note that the ES-like mutation comes into play only when
the layer size or the dropout parameter is changed; otherwise the
strategy parameters are ignored.</p>
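        <p>The operators above can be sketched as follows; the block
layout, value ranges, and clamping bounds are our own illustrative
assumptions, not the paper's code.</p>

```python
import random

ACTIVATIONS = ["relu", "tanh", "sigmoid", "hard_sigmoid", "linear"]

def random_block(rng):
    # hypothetical block layout from Section 4.1 (values our own choice)
    return {"size": rng.randint(10, 200), "drop": 0.2,
            "act": rng.choice(ACTIVATIONS),
            "sigma_size": 10.0, "sigma_drop": 0.05}

def mutate(ind, rng, alpha=0.1):
    """Apply one randomly chosen mutation operator of Section 4.3;
    returns a mutated copy of the individual."""
    ind = [dict(b) for b in ind]
    op = rng.choice(["mutateLayer", "addLayer", "delLayer"])
    if op == "addLayer":                      # insert a random block
        ind.insert(rng.randrange(len(ind) + 1), random_block(rng))
    elif op == "delLayer" and len(ind) > 1:   # delete a random block
        del ind[rng.randrange(len(ind))]
    else:                                     # mutateLayer
        b = ind[rng.randrange(len(ind))]
        sub = rng.choice(["changeLayerSize", "changeDropOut",
                          "changeActivation"])
        if sub == "changeLayerSize":          # adapt sigma, mutate size
            b["sigma_size"] = abs(b["sigma_size"] * (1 + alpha * rng.gauss(0, 1)))
            b["size"] = max(1, round(b["size"] + b["sigma_size"] * rng.gauss(0, 1)))
        elif sub == "changeDropOut":          # adapt sigma, mutate dropout
            b["sigma_drop"] = abs(b["sigma_drop"] * (1 + alpha * rng.gauss(0, 1)))
            b["drop"] = min(0.9, max(0.0, b["drop"] + b["sigma_drop"] * rng.gauss(0, 1)))
        else:                                 # changeActivation
            b["act"] = rng.choice(ACTIVATIONS)
    return ind
```

        <p>Only changeLayerSize and changeDropOut touch the strategy
parameters, mirroring the note above.</p>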
      </sec>
      <sec id="sec-4-4">
        <title>4.4 Fitness</title>
        <p>The fitness function should reflect the quality of the
network represented by an individual. To assess the generalization
ability of this network we use the crossvalidation error: the lower
the crossvalidation error, the higher the fitness of the
individual.</p>
        <p>Classical k-fold crossvalidation is used, i.e. the training
set is split into k folds, and each time one fold is used for testing
and the rest for training. The mean error on the testing set over the
k runs is evaluated.</p>
        <p>The mean squared error is used as the error function:
E = 100 · (1/N) · Σ_{t=1}^{N} ( f(x_t) − y_t )²,
where T = ((x_1, y_1), ..., (x_N, y_N)) is the actual testing set and
f is the function represented by the learned network.</p>
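        <p>A minimal sketch of the scaled error function and a
contiguous k-fold index split (the fold layout is our own assumption;
the paper does not specify how folds are formed):</p>

```python
def mse_error(f, test_set):
    """Scaled mean squared error E = 100 * (1/N) * sum_t (f(x_t) - y_t)^2,
    the error function of Section 4.4."""
    n = len(test_set)
    return 100.0 / n * sum((f(x) - y) ** 2 for x, y in test_set)

def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; each fold serves
    once as the testing part of the crossvalidation."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if n % k > i else 0)  # spread the remainder
        folds.append(list(range(start, start + size)))
        start += size
    return folds

# usage: an imperfect predictor on a two-point testing set
error = mse_error(lambda x: x, [(1.0, 1.0), (2.0, 3.0)])
```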
      </sec>
      <sec id="sec-4-5">
        <title>4.5 Selection</title>
        <p>Tournament selection is used, i.e. in each turn of the
tournament k individuals are selected at random, and the one with the
highest fitness, in our case the one with the lowest crossvalidation
error, is selected.</p>
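        <p>Tournament selection can be sketched as follows, with
individuals ranked by their crossvalidation errors (the network names
below are purely illustrative):</p>

```python
import random

def tournament_select(population, errors, k, rng):
    """Tournament selection of Section 4.5: pick k individuals at random
    and return the one with the lowest crossvalidation error, i.e. the
    highest fitness."""
    contestants = rng.sample(range(len(population)), k)
    winner = min(contestants, key=lambda i: errors[i])
    return population[winner]

# usage with hypothetical networks and their crossvalidation errors
pop = ["net-a", "net-b", "net-c", "net-d"]
errs = [0.9, 0.1, 0.5, 0.3]
chosen = tournament_select(pop, errs, k=2, rng=random.Random(0))
```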
        <p>
          Our implementation of the proposed algorithm is
available at [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experiments</title>
      <sec id="sec-5-1">
        <title>5.1 Data Set</title>
        <p>
          For the first experiment we used real-world data from the
application area of sensor networks for air pollution monitoring [
          <xref ref-type="bibr" rid="ref21 ref5">21, 5</xref>
          ]; for the second experiment we used the well-known
MNIST data set [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
        <p>The sensor data contain tens of thousands of measurements
from gas multi-sensor MOX array devices recording concentrations of
several gas pollutants, collocated with a conventional air pollution
monitoring station that provides labels for the data. The data are
recorded in 1-hour intervals, and there is quite a large number of
gaps due to sensor malfunctions. For our experiments we have chosen
data from the interval of March 10, 2004 to April 4, 2005; records
with missing values were omitted. There are altogether 5 sensors as
inputs and 5 target output values representing the concentrations of
CO, NO2, NOx, C6H6, and NMHC.</p>
        <p>The whole time period is divided into five intervals. Then
only one interval is used for training, and the rest is used for
testing. We considered five different choices of the training part.
This task may be quite difficult, since the prediction is also
performed in different parts of the year than the learning; e.g., a
model trained on data obtained during winter may perform worse during
summer (as was suggested by experts in the application area).</p>
        <p>Table 1 gives an overview of the data set sizes. All tasks
have 8 input values (five sensors, temperature, absolute and relative
humidity) and 1 output (the predicted value). All values are
normalized to [0, 1].</p>
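        <p>The normalization mentioned above can be sketched as a
generic min-max rescaling into [0, 1]; the paper does not state the
exact scheme used, so this is an assumption.</p>

```python
def minmax_normalize(values):
    """Linearly rescale a list of values into [0, 1] (a generic min-max
    sketch, not the paper's code)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0.0 for _ in values]   # constant input: map to 0
    return [(v - lo) / (hi - lo) for v in values]
```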
        <p>The MNIST data set contains 70 000 images of handwritten
digits, 28 × 28 pixels each (see Fig. 1); 60 000 are used for training
and 10 000 for testing.</p>
        <p>For the sensor data the proposed algorithm was run for
100 generations for each data set, with n = 10 and m = 30. During the
fitness function evaluation the network weights are trained by RMSprop
(one of the standard algorithms) for 500 epochs. Besides the ES, a
classical GA was implemented and run on the sensor data with the same
fitness function.</p>
        <p>For the MNIST data set the algorithm was run for 30
generations, with n = 5 and m = 10; for the fitness evaluation RMSprop
was run for 20 epochs.</p>
        <p>When the best individual is obtained, the corresponding
network is built, trained on the whole training set, and evaluated on
the test set.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.3 Results</title>
        <p>The resulting testing errors obtained by the GA and ES in
the first experiment are listed in Table 3, showing the average,
standard deviation, minimum, and maximum errors over 10 runs. The ES
performs slightly better than the GA: the ES achieved lower errors in
15 cases, the GA in 11 cases.</p>
        <p>
          Table 4 compares the ES testing errors to the results
obtained by support vector regression (SVR) with linear, RBF,
polynomial, and sigmoid kernel functions. The SVR was trained using
the Scikit-learn library [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]; its hyperparameters were found
using grid search and crossvalidation.
        </p>
        <p>The ES outperforms the SVR; it found the best result in 17
cases.</p>
        <p>Finally, Table 5 compares the testing error of the evolved
network to the errors of three fixed architectures (for example,
30-10-1 stands for 2 hidden layers of 30 and 10 neurons and one neuron
in the output layer; ReLU activation and dropout 0.2 are used). The
evolved network achieved the most (10) best results.</p>
        <p>Since this task does not have many training samples, the
evolved networks are also quite small. The typical evolved network had
one hidden layer of about 70 neurons, a dropout rate of 0.3, and the
ReLU activation function.</p>
        <p>The second experiment was the classification of MNIST
digits. As the baseline architecture we took the one from the Keras
examples, i.e. a network with two hidden layers of 512 ReLU units
each, both with dropout 0.2. This network has a fairly good
performance. It was trained 10 times, and the results are listed in
Table 2, together with the results obtained by the evolved
network.</p>
        <p>The evolved network also had two hidden layers, the first
with 736 ReLU units and a dropout parameter of 0.09, the second with
471 hard sigmoid units and dropout 0.2. The ES found a competitive
result; the evolved network achieved better accuracy than the baseline
model.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>We have proposed an algorithm for the automatic design of
DNNs based on evolution strategies. The algorithm was tested in
experiments on a real-life sensor data set and on the MNIST data set
of handwritten digits. On the sensor data set, the solutions found by
our algorithm outperform SVR and selected fixed architectures. The
activation function dominating in the solutions is the ReLU function.
For the MNIST data set, a network with ReLU and hard sigmoid units was
found, outperforming the baseline solution. We have shown that our
algorithm is able to find competitive solutions.</p>
      <p>The main limitation of the algorithm is its time complexity.
One direction of our future work is to try to lower the number of
fitness evaluations using surrogate modeling or to use asynchronous
evolution.</p>
      <p>We also plan to extend the algorithm to work with
convolutional networks and to include more parameters, such as other
types of regularization, the type of optimization algorithm, etc.</p>
      <p>The gradient-based optimization algorithm depends
significantly on the random initialization of the weights. One way to
overcome this is to combine the evolution of weights with
gradient-based local search, which is another possible direction of
future work.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>This work was partially supported by the Czech Grant
Agency grant 15-18108S and institutional support of the
Institute of Computer Science RVO 67985807.</p>
      <p>Access to computing and storage facilities owned by
parties and projects contributing to the National Grid
Infrastructure MetaCentrum provided under the programme
"Projects of Large Research, Development, and
Innovations Infrastructures" (CESNET LM2015042), is greatly
appreciated.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Jasmina</given-names>
            <surname>Arifovic</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ramazan</given-names>
            <surname>Gençay</surname>
          </string-name>
          .
          <article-title>Using genetic algorithms to select architecture of a feedforward artificial neural network</article-title>
          .
          <source>Physica A: Statistical Mechanics and its Applications</source>
          ,
          <volume>289</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>574</fpage>
          -
          <lpage>594</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.-G.</given-names>
            <surname>Beyer</surname>
          </string-name>
          and
          <string-name>
            <given-names>H. P.</given-names>
            <surname>Schwefel</surname>
          </string-name>
          .
          <article-title>Evolutionary strategies: A comprehensive introduction</article-title>
          .
          <source>Natural Computing</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          . Keras. https://github.com/fchollet/keras ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Omid E.</given-names>
            <surname>David</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iddo</given-names>
            <surname>Greental</surname>
          </string-name>
          .
          <article-title>Genetic algorithms for evolving deep neural networks</article-title>
          .
          <source>In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation</source>
          , GECCO Comp '
          <volume>14</volume>
          , pages
          <fpage>1451</fpage>
          -
          <lpage>1452</lpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>De Vito</surname>
          </string-name>
          , G. Fattoruso,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tortorella</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. Di</given-names>
            <surname>Francia</surname>
          </string-name>
          .
          <article-title>Semi-supervised learning techniques in artificial olfaction: A novel approach to classification problems and drift counteraction</article-title>
          .
          <source>Sensors Journal, IEEE</source>
          ,
          <volume>12</volume>
          (
          <issue>11</issue>
          ):
          <fpage>3215</fpage>
          -
          <lpage>3224</lpage>
          ,
          <year>Nov 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Dario</given-names>
            <surname>Floreano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Dürr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Mattiussi</surname>
          </string-name>
          .
          <article-title>Neuroevolution: from architectures to learning</article-title>
          .
          <source>Evolutionary Intelligence</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>47</fpage>
          -
          <lpage>62</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Faustino</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Juergen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Risto</given-names>
            <surname>Miikkulainen</surname>
          </string-name>
          .
          <article-title>Accelerated neural evolution through cooperatively coevolved synapses</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          , pages
          <fpage>937</fpage>
          -
          <lpage>965</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Ian</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , Yoshua Bengio, and
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Courville</surname>
          </string-name>
          .
          <source>Deep Learning</source>
          . MIT Press,
          <year>2016</year>
          . http://www.deeplearningbook.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jan</given-names>
            <surname>Koutník</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Juergen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Faustino</given-names>
            <surname>Gomez</surname>
          </string-name>
          .
          <article-title>Evolving deep unsupervised convolutional networks for vision-based reinforcement learning</article-title>
          .
          <source>In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation</source>
          ,
          <source>GECCO '14</source>
          , pages
          <fpage>541</fpage>
          -
          <lpage>548</lpage>
          , New York, NY, USA,
          <year>2014</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          , Yoshua Bengio, and
          <string-name>
            <given-names>Geoffrey</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>521</volume>
          (
          <issue>7553</issue>
          ):
          <fpage>436</fpage>
          -
          <lpage>444</lpage>
          , 5
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Yann</given-names>
            <surname>LeCun</surname>
          </string-name>
          and
          <string-name>
            <given-names>Corinna</given-names>
            <surname>Cortes</surname>
          </string-name>
          .
          <article-title>The MNIST database of handwritten digits</article-title>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          and
          <string-name>
            <given-names>Frank</given-names>
            <surname>Hutter</surname>
          </string-name>
          .
          <article-title>CMA-ES for hyperparameter optimization of deep neural networks</article-title>
          .
          <source>CoRR, abs/1604.07269</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Tomas H.</given-names>
            <surname>Maul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrzej</given-names>
            <surname>Bargiela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Siang-Yew</given-names>
            <surname>Chong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Abdullahi S.</given-names>
            <surname>Adamu</surname>
          </string-name>
          .
          <article-title>Towards evolutionary deep neural networks</article-title>
          .
          In Flaminio Squazzoni, Fabio Baronio, Claudia Archetti, and Marco Castellani, editors,
          <source>ECMS 2014 Proceedings, European Council for Modeling and Simulation</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Risto</given-names>
            <surname>Miikkulainen</surname>
          </string-name>
          , Jason Zhi Liang, Elliot Meyerson, Aditya Rawal, Dan Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, and
          <string-name>
            <given-names>Babak</given-names>
            <surname>Hodjat</surname>
          </string-name>
          .
          <article-title>Evolving deep neural networks</article-title>
          .
          <source>CoRR, abs/1703.00548</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          et al.
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Salimans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <article-title>Evolution Strategies as a Scalable Alternative to Reinforcement Learning</article-title>
          .
          <source>arXiv e-prints</source>
          ,
          <year>March 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Kenneth O.</given-names>
            <surname>Stanley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David B.</given-names>
            <surname>D'Ambrosio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Gauci</surname>
          </string-name>
          .
          <article-title>A hypercube-based encoding for evolving large-scale neural networks</article-title>
          .
          <source>Artif. Life</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>185</fpage>
          -
          <lpage>212</lpage>
          ,
          <year>April 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Kenneth O.</given-names>
            <surname>Stanley</surname>
          </string-name>
          and
          <string-name>
            <given-names>Risto</given-names>
            <surname>Miikkulainen</surname>
          </string-name>
          </string-name>
          .
          <article-title>Evolving neural networks through augmenting topologies</article-title>
          .
          <source>Evolutionary Computation</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>99</fpage>
          -
          <lpage>127</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>B. u.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Baharudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Q.</given-names>
            <surname>Raza</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Nallagownden</surname>
          </string-name>
          .
          <article-title>Optimization of neural network architecture using genetic algorithm for load forecasting</article-title>
          .
          In
          <source>2014 5th International Conference on Intelligent and Advanced Systems (ICIAS)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>June 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Petra</given-names>
            <surname>Vidnerová</surname>
          </string-name>
          . GAKeras. github.com/PetraVidnerova/GAKeras,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>S.</given-names>
            <surname>De Vito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Massera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Martinotto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Di Francia</surname>
          </string-name>
          .
          <article-title>On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario</article-title>
          .
          <source>Sensors and Actuators B: Chemical</source>
          ,
          <volume>129</volume>
          (
          <issue>2</issue>
          ):
          <fpage>750</fpage>
          -
          <lpage>757</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>