<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evolution Strategies for Deep Neural Network Models Design</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Petra</forename><surname>Vidnerová</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">The Czech Academy of Sciences</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Roman</forename><surname>Neruda</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Institute of Computer Science</orgName>
								<orgName type="institution">The Czech Academy of Sciences</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Evolution Strategies for Deep Neural Network Models Design</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3DA45FB1F53833BA5A76070FB0103C0E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Deep neural networks have become the state-ofart methods in many fields of machine learning recently. Still, there is no easy way how to choose a network architecture which can significantly influence the network performance.</p><p>This work is a step towards an automatic architecture design. We propose an algorithm for an optimization of a network architecture based on evolution strategies. The algorithm is inspired by and designed directly for the Keras library [3] which is one of the most common implementations of deep neural networks.</p><p>The proposed algorithm is tested on MNIST data set and the prediction of air pollution based on sensor measurements, and it is compared to several fixed architectures and support vector regression.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Deep neural networks (DNN) have become the state-ofart methods in many fields of machine learning in recent years. They have been applied to various problems, including image recognition, speech recognition, and natural language processing <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b9">10]</ref>.</p><p>Deep neural networks are feed-forward neural networks with multiple hidden layers between the input and output layer. The layers typically have different units depending on the task at hand. Among the units, there are traditional perceptrons, where each unit (neuron) realizes a nonlinear function, such as the sigmoid function, or the rectified linear unit (ReLU).</p><p>While the learning of weights of the deep neural network is done by algorithms based on the stochastic gradient descent, the choice of architecture, including a number and sizes of layers, and a type of activation function, is done manually by the user. However, the choice of architecture has an important impact on the performance of the DNN. Some kind of expertise is needed, and usually a trial and error method is used in practice.</p><p>In this work we exploit a fully automatic design of deep neural networks. We investigate the use of evolution strategies for evolution of a DNN architecture. There are not many studies on evolution of DNN since such approach has very high computational requirements. To keep the search space as small as possible, we simplify our model focusing on implementation of DNN in the Keras library <ref type="bibr" target="#b2">[3]</ref> that is a widely used tool for practical applications of DNNs.</p><p>The proposed algorithm is evaluated both on benchmark and real-life data sets. As the benchmark data we use the MNIST data set that is classification of handwritten digits. The real data set is from the area of sensor networks for air pollution monitoring. The data came from De Vito et al <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b4">5]</ref> and are described in detail in Section 5.1.</p><p>The paper is organized as follows. Section 2 brings an overview of related work. Section 3 briefly describes the main ideas of our approach. In Section 4 our algorithm based on evolution strategies is described. Section 5 summarizes the results of our experiments. Finally, Section 6 brings conclusion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Neuroevolution techniques have been applied successfully for various machine learning problems <ref type="bibr" target="#b5">[6]</ref>. In classical neuroevolution, no gradient descent is involved, both architecture and weights undergo the evolutionary process. However, because of large computational requirements the applications are limited to small networks.</p><p>There were quite many attempts on architecture optimization via evolutionary process (e.g. <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b0">1]</ref>) in previous decades. Successful evolutionary techniques evolving the structure of feed-forward and recurrent neural networks include NEAT <ref type="bibr" target="#b17">[18]</ref>, HyperNEAT <ref type="bibr" target="#b16">[17]</ref> and CoSyNE <ref type="bibr" target="#b6">[7]</ref> algorithms.</p><p>On the other hand, studies dealing with evolution of deep neural networks and convolutional networks started to emerge only very recently. The training of one DNN usually requires hours or days of computing time, quite often utilizing GPU processors for speedup. Naturally, the evolutionary techniques requiring thousands of training trials were not considered a feasible choice. Nevertheless, there are several approaches to reduce the overall complexity of neuroevolution for DNN. Still due to limited computational resources, the studies usually focus only on parts of network design.</p><p>For example, in <ref type="bibr" target="#b11">[12]</ref> CMA-ES is used to optimize hyperparameters of DNNs. In <ref type="bibr" target="#b8">[9]</ref> the unsupervised convolutional networks for vision-based reinforcement learning are studied, the structure of CNN is held fixed and only a small recurrent controller is evolved. However, the recent paper <ref type="bibr" target="#b15">[16]</ref> presents a simple distributed evolutionary strategy that is used to train relatively large recurrent network with competitive results on reinforcement learning tasks.</p><p>In <ref type="bibr" target="#b13">[14]</ref> automated method for optimizing deep learning architectures through evolution is proposed, extending ex-isting neuroevolution methods. Authors of <ref type="bibr" target="#b3">[4]</ref> sketch a genetic approach for evolving a deep autoencoder network enhancing the sparsity of the synapses by means of special operators. Finally, the paper <ref type="bibr" target="#b12">[13]</ref> presents two version of an evolutionary and co-evolutionary algorithm for design of DNN with various transfer functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Our Approach</head><p>In our approach we use evolution strategies to search for optimal architecture of DNN, while the weights are learned by gradient based technique.</p><p>The main idea of our approach is to keep the search space as small as possible, therefore the architecture specification is simplified. It directly follows the implementation of DNN in Keras library, where networks are defined layer by layer, each layer fully connected with the next layer. A layer is specified by number of neurons, type of an activation function (all neurons in one layer have the same type of an activation function), and type of regularization (such as dropout).</p><p>In this paper, we work only with fully connected feedforward neural networks, but the approach can be further modified to include also convolutional layers. Then the architecture specification would also contain type of layer (dense or convolutional) and in case of convolutional layer size of the filter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Evolution Strategies for DNN Design</head><p>Evolution strategies (ES) were proposed for work with real-valued vectors representing parameters of complex optimization problems <ref type="bibr" target="#b1">[2]</ref>. In the illustration algorithm bellow we can see a simple ES working with n individuals in a population and generating m offspring by means of Gaussian mutation. The environmental selection has two traditional forms for evolution strategies. The so called (n + m)-ES generates new generation by deterministically choosing n best individuals from the set of (n + m) parents and offspring. The so called (n, m)-ES generates new generation by selecting from m new offspring (typically, m &gt; n). The latter approach is considered more robust against local optima premature convergence.</p><p>Currently used evolution strategies may carry more meta-parameters of the problem in the individual than just a vector of mutation variances. A successful version of evolution strategies, the so-called covariance matrix adaptation ES (CMA-ES) <ref type="bibr" target="#b11">[12]</ref> uses a clever strategy to approximate the full N × N covariance matrix, thus representing a general N-dimensional normal distribution. Crossover operator is usually used within evolution strategies.</p><p>In our implementation (n, m)-ES (see Alg. 1) is used. Offspring are generated using both mutation and crossover operators. Since our individuals are describing network topology, they are not vectors of real numbers. So our operators slightly differ from classical ES. The more detail description follows.  </p><formula xml:id="formula_0">for j ← 1, . . . , N do σ ′ j ← σ j • (1 + α • N(0, 1)) x ′ j ← x j + σ ′ j • N(0,</formula><formula xml:id="formula_1">I = ( [size 1 , drop 1 , act 1 , σ size 1 , σ drop 1 ] 1 , . . . , [size H , drop H , act H , σ size H , σ drop H ] H ),</formula><p>where H is the number of hidden layers, size i is the number of neurons in corresponding layer that is dense (fully connected) layer, drop i is the dropout rate (zero value represents no dropout), act i ∈ {relu, tanh, sigmoid, hardsigmoid, linear} stands for activation function, and σ size i and σ drop i are strategy coefficients corresponding to size and dropout.</p><p>So far, we work only with dense layers, but the individual can be further generalized to work with convolutional layers as well. Also other types of regularization can be considered, we are limited to dropout for the first experiments.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Crossover</head><p>The operator crossover combines two parent individuals and produces two offspring individuals. It is implemented as one-point crossover, where the cross-point is on the border of a block.</p><p>Let two parents be</p><formula xml:id="formula_2">I p1 = (B p1 1 , B p1 2 , . . . , B p1 k ) I p2 = (B p2</formula><p>1 , B p2 2 , . . . , B p2 l ), then the crossover produces offspring</p><formula xml:id="formula_3">I o1 = (B p1 1 , . . . , B p1 cp1 , B p2 cp2+1 , . . . , B p2 l ) I o1 = (B p2 1 , . . . , B p2 cp2 , B p1 cp1+1 , . . . , B p1 k )</formula><p>, where cp 1 ∈ {1, . . . , k − 1} and cp 2 ∈ {1, . . . , l − 1}.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Mutation</head><p>The operator mutation brings random changes to an individual. Each time an individual is mutated, one of the following mutation operators is randomly chosen:</p><p>• mutateLayer -introduces random changes to one randomly selected layer. One of the following operators is randomly chosen:</p><p>-changeLayerSize -the number of neurons is changed. Gaussian mutation is used, adapting strategy parameters σ size , the final number is rounded (since size has to be integer). -changeDropOut -the dropout rate is changed using Gaussian mutation adapting strategy parameters σ drop . -changeActivation -the activation function is changed, randomly chosen from the list of available activations.</p><p>• addLayer -one randomly generated block is inserted at random position.</p><p>• delLayer -one randomly selected block is deleted.</p><p>Note, that the ES like mutation comes in play only when size of layer or dropout parameter is changed. Otherwise the strategy parameters are ignored.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Fitness</head><p>Fitness function should reflect a quality of the network represented by an individual. To assess the generalization ability of the network represented by the individual we use a crossvalidation error. The lower the crossvalidation error, the higher the fitness of the individual.</p><p>Classical k-fold crossvalidation is used, i.e. the training set is split into k-folds and each time one fold is used for testing and the rest for training. The mean error on the testing set over k run is evaluated.</p><p>The mean squared error is used as an error function:</p><formula xml:id="formula_4">E = 100 1 N N ∑ t=1 ( f (x t ) − y t ) 2 ,</formula><p>where T = (x 1 , y 1 ), . . . , (x N , y N ) is the actual testing set and f is the function represented by the learned network.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Selection</head><p>The tournament selection is used, i.e. each turn of the tournament k individuals are selected at random and the one with the highest fitness, in our case the one with the lowest crossvalidation error, is selected. Our implementation of the proposed algorithm is available at <ref type="bibr" target="#b19">[20]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Experiments</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Data Set</head><p>For the first experiment we used real-world data from the application area of sensor networks for air pollution monitoring <ref type="bibr" target="#b20">[21,</ref><ref type="bibr" target="#b4">5]</ref>, for the second experiment the well known MNIST data set <ref type="bibr" target="#b10">[11]</ref>.</p><p>The sensor data contain tens of thousands measurements of gas multi-sensor MOX array devices recording concentrations of several gas pollutants collocated with a conventional air pollution monitoring station that provides labels for the data. The data are recorded in 1 hour intervals, and there is quite a large number of gaps due to sensor malfunctions. For our experiments we have chosen data from the interval of <ref type="bibr">March 10, 2004</ref>  The whole time period is divided into five intervals. Then, only one interval is used for training, the rest is utilized for testing. We considered five different choices of the training part selection. This task may be quite difficult, since the prediction is performed also in different parts of the year than the learning, e.g. the model trained on data obtained during winter may perform worse during summer (as was suggested by experts in the application area).</p><p>Table <ref type="table" target="#tab_2">1</ref> brings overview of data sets sizes. All tasks have 8 input values (five sensors, temperature, absolute and relative humidity) and 1 output (predicted value). All values are normalized between 0, 1 . The MNIST data set contains 70 000 images of hand written digits, 28 × 28 pixel each (see Fig. <ref type="figure" target="#fig_1">1</ref>). 60 000 are used for training, 10 000 for testing. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Setup</head><p>For the sensor data the proposed algorithm was run for 100 generations for each data set, with n = 10 and m = 30.</p><p>During fitness function evaluation the network weights are trained by RMSprop (one of the standard algorithms) for 500 epochs. Besides the ES classical GA was implemented and run on sensor data with same fitness function.</p><p>For the MNIST data set, the algorithm was run for 30 generations, with n = 5 and m = 10, for fitness evaluation the RMSprop was run for 20 epochs.</p><p>When the best individual is obtained, the corresponding network is built and trained on the whole training set and evaluated on the test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Results</head><p>The resulting testing errors obtained by GA and ES in the first experiment are listed in Table <ref type="table" target="#tab_4">3</ref>. There are average, standard deviation, minimum and maximum errors over 10 computations. The performance of ES over GA is slightly better, the ES achieved lower errors in 15 cases, GA in 11 cases.</p><p>Table <ref type="table" target="#tab_5">4</ref> compares ES testing errors to results obtained by support vector regression (SVR) with linear, RBF, polynomial, and sigmoid kernel function. SVR was trained using Scikit-learn library <ref type="bibr" target="#b14">[15]</ref>, hyperparameters were found using grid search and crossvalidation.</p><p>The ES outperforms the SVR, it found best results in 17 cases.</p><p>Finally, Table <ref type="table">5</ref> compares the testing error of evolved network to error of three fixed architectures (for example 30-10-1 stands for 2 hidden layers of 30 and 10 neurons, one neuron in output layers, ReLU activation is used and dropout 0.2). The evolved network achieved the most <ref type="bibr" target="#b9">(10)</ref> best results.</p><p>Since this task does not have much training samples, also the networks evolved are quite small. The typical evolved network had one hidden layer of about 70 neurons, dropout rate 0.3 and ReLU activation function.</p><p>The second experiment was the classification of MNIST letters. As a baseline architecture was taken the one from Keras examples, i.e. network with two hidden layers of 512 ReLU units each, both with dropout 0.2. This network has a fairly good performance. It was trained 10 times The evolved network had also two hidden layers, first with 736 ReLU units and dropout parameter 0.09, the second with 471 hard sigmoid units and dropout 0.2. The ES found a competitive result, the evolved network achieved better accuracy than the baseline model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusion</head><p>We have proposed an algorithm for automatic design of DNNs based on evolution strategies. The algorithm was tested in experiments on the real-life sensor data set and MNIST dataset of handwritten digits. On sensor data set, the solutions found by our algorithm outperforms SVR and selected fixed architectures. The activation function dominating in solutions is the ReLU function. For the MNIST data set, the network with ReLU and hard sigmoid units was found, outperforming the baseline solution. We have shown that our algorithm is able to found competitive solutions.</p><p>The main limitation of the algorithm is the time complexity. One direction of our future work is to try to lower the number of fitness evaluations using surrogate modeling or to use asynchronous evolution.</p><p>Also we plan to extend the algorithm to work also with convolutional networks and to include more parameters, such as other types of regularization, the type of optimization algorithm, etc.</p><p>The gradient based optimization algorithm depends significantly on the random initialization of weights. One way to overcome this is to combine the evolution of weights and gradient based local search that is another possibility of future work. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>to April 4, 2005, taking into account each hour where records with missing values were omitted. There are altogether 5 sensors as inputs and 5 target output values representing concentrations of CO, NO 2 , NOx, C6H6, and NMHC.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example of MNIST data set samples.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Algorithm 1 (n, m)-Evolution strategy optimizing realvalued vector and utilizing adaptive variance for each pa-</figDesc><table><row><cell>rameter</cell></row><row><cell>procedure (n, m)-ES</cell></row><row><cell>t ← 0 Initialize population P t n by randomly generated vectors x t = (x t 1 , . . . , x t N , σ t 1 , . . . , σ t N )</cell></row><row><cell>Evaluate individuals in P t</cell></row><row><cell>while not terminating criterion do</cell></row><row><cell>for i ← 1, . . . , m do choose randomly a parent x t i ,</cell></row><row><cell>generate an offspring y t i</cell></row><row><cell>by Gaussian mutation:</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 1 :</head><label>1</label><figDesc>Overview of data sets sizes.</figDesc><table><row><cell>Task</cell><cell cols="2">train set test set</cell></row><row><cell>CO</cell><cell>1469</cell><cell>5875</cell></row><row><cell>NO2</cell><cell>1479</cell><cell>5914</cell></row><row><cell>NOx</cell><cell>1480</cell><cell>5916</cell></row><row><cell>C6H6</cell><cell>1799</cell><cell>7192</cell></row><row><cell>NMHC</cell><cell>178</cell><cell>709</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 :</head><label>2</label><figDesc>Test accuracies on the MNIST data set. .13 98.18 98.55 evolved by ES 98.64 0.05 98.55 98.73 and the results are listed in Table 2, together with results obtained by the evolved network.</figDesc><table><row><cell>model</cell><cell>avg</cell><cell>std</cell><cell>min</cell><cell>max</cell></row><row><cell>baseline</cell><cell>98.34 0</cell><cell></cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 3 :</head><label>3</label><figDesc>Errors on test set for networks found by GA and ES. The average, standard deviation, minimum and maximum of 10 evaluations of the learning algorithm are listed.</figDesc><table><row><cell></cell><cell></cell><cell>GA</cell><cell></cell><cell></cell><cell></cell><cell>ES</cell><cell></cell></row><row><cell></cell><cell>avg</cell><cell>std</cell><cell>min</cell><cell>max</cell><cell>avg</cell><cell>std</cell><cell>min</cell><cell>max</cell></row><row><cell>CO part1</cell><cell cols="4">0.209 0.014 0.188 0.236</cell><cell cols="4">0.229 0.026 0.195 0.267</cell></row><row><cell>CO part2</cell><cell cols="4">0.801 0.135 0.600 1.048</cell><cell cols="4">0.657 0.024 0.631 0.694</cell></row><row><cell>CO part3</cell><cell cols="4">0.266 0.029 0.222 0.309</cell><cell cols="4">0.256 0.045 0.199 0.349</cell></row><row><cell>CO part4</cell><cell cols="4">0.404 0.226 0.186 0.865</cell><cell cols="4">0.526 0.108 0.308 0.701</cell></row><row><cell>CO part5</cell><cell cols="4">0.246 0.024 0.207 0.286</cell><cell cols="4">0.235 0.025 0.199 0.277</cell></row><row><cell>NOx part1</cell><cell cols="4">2.201 0.131 1.994 2.506</cell><cell cols="4">2.132 0.086 2.021 2.284</cell></row><row><cell>NOx part2</cell><cell cols="4">1.705 0.284 1.239 2.282</cell><cell cols="4">1.599 0.077 1.444 1.685</cell></row><row><cell>NOx part3</cell><cell cols="4">1.238 0.163 0.982 1.533</cell><cell cols="4">1.339 0.242 1.106 1.955</cell></row><row><cell>NOx part4</cell><cell cols="4">1.490 0.173 1.174 1.835</cell><cell cols="4">1.610 0.164 1.435 2.041</cell></row><row><cell>NOx part5</cell><cell cols="4">0.551 0.052 0.456 0.642</cell><cell cols="4">0.622 0.075 0.521 0.726</cell></row><row><cell>NO2 part1</cell><cell cols="4">1.697 0.266 1.202 2.210</cell><cell cols="4">1.506 0.217 1.132 1.823</cell></row><row><cell>NO2 part2</cell><cell cols="4">2.009 0.415 1.326 2.944</cell><cell cols="4">1.371 0.048 1.242 1.415</cell></row><row><cell>NO2 part3</cell><cell cols="4">0.593 0.082 0.532 0.815</cell><cell cols="4">0.660 0.078 0.599 0.863</cell></row><row><cell>NO2 part4</cell><cell cols="4">0.737 0.023 0.706 0.776</cell><cell cols="4">0.782 0.043 0.711 0.856</cell></row><row><cell>NO2 part5</cell><cell cols="4">1.265 0.158 1.054 1.580</cell><cell cols="4">0.730 0.111 0.520 0.905</cell></row><row><cell>C6H6 part1</cell><cell cols="4">0.013 0.005 0.006 0.024</cell><cell cols="4">0.013 0.004 0.007 0.018</cell></row><row><cell>C6H6 part2</cell><cell cols="4">0.039 0.015 0.025 0.079</cell><cell cols="4">0.034 0.010 0.020 0.050</cell></row><row><cell>C6H6 part3</cell><cell cols="4">0.019 0.011 0.009 0.041</cell><cell cols="4">0.048 0.015 0.016 0.075</cell></row><row><cell>C6H6 part4</cell><cell cols="4">0.030 0.015 0.014 0.061</cell><cell cols="4">0.020 0.010 0.010 0.042</cell></row><row><cell>C6H6 part5</cell><cell cols="4">0.017 0.015 0.004 0.051</cell><cell cols="4">0.027 0.011 0.014 0.051</cell></row><row><cell>NMHC part1</cell><cell cols="4">1.719 0.168 1.412 2.000</cell><cell cols="4">1.685 0.256 1.448 2.378</cell></row><row><cell>NMHC part2</cell><cell cols="4">0.623 0.164 0.446 1.047</cell><cell cols="4">0.713 0.097 0.566 0.865</cell></row><row><cell>NMHC part3</cell><cell cols="4">1.144 0.181 0.912 1.472</cell><cell cols="4">1.097 0.270 0.775 1.560</cell></row><row><cell>NMHC part4</cell><cell cols="4">1.220 0.206 0.994 1.563</cell><cell cols="4">1.099 0.166 0.898 1.443</cell></row><row><cell>NMHC 
part5</cell><cell cols="4">1.222 0.126 1.055 1.447</cell><cell cols="4">1.023 0.050 0.963 1.116</cell></row><row><cell></cell><cell>11</cell><cell></cell><cell></cell><cell></cell><cell>15</cell><cell></cell><cell></cell></row><row><cell></cell><cell>44%</cell><cell></cell><cell></cell><cell></cell><cell>60%</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 4 :</head><label>4</label><figDesc>Test errors for evolved network and SVR with different kernel functions. For the evolved network the average, standard deviation, minimum and maximum of 10 evaluations of learning algorithm are listed.</figDesc><table><row><cell>Task</cell><cell></cell><cell cols="2">Evolved network</cell><cell></cell><cell></cell><cell>SVR</cell></row><row><cell></cell><cell>avg</cell><cell>std</cell><cell>min</cell><cell>max</cell><cell>linear</cell><cell>RBF Poly. Sigmoid</cell></row><row><cell>CO_part1</cell><cell cols="4">0.229 0.026 0.195 0.267</cell><cell cols="2">0.340 0.280 0.285</cell><cell>1.533</cell></row><row><cell>CO_part2</cell><cell cols="4">0.657 0.024 0.631 0.694</cell><cell cols="2">0.614 0.412 0.621</cell><cell>1.753</cell></row><row><cell>CO_part3</cell><cell cols="4">0.256 0.045 0.199 0.349</cell><cell cols="2">0.314 0.408 0.377</cell><cell>1.427</cell></row><row><cell>CO_part4</cell><cell cols="4">0.526 0.108 0.308 0.701</cell><cell cols="2">1.127 0.692 0.535</cell><cell>1.375</cell></row><row><cell>CO_part5</cell><cell cols="4">0.235 0.025 0.199 0.277</cell><cell cols="2">0.348 0.207 0.198</cell><cell>1.568</cell></row><row><cell>NOx_part1</cell><cell cols="4">2.132 0.086 2.021 2.284</cell><cell cols="2">1.062 1.447 1.202</cell><cell>2.537</cell></row><row><cell>NOx_part2</cell><cell cols="4">1.599 0.077 1.444 1.685</cell><cell cols="2">2.162 1.838 1.387</cell><cell>2.428</cell></row><row><cell>NOx_part3</cell><cell cols="4">1.339 0.242 1.106 1.955</cell><cell cols="2">0.594 0.674 0.665</cell><cell>2.705</cell></row><row><cell>NOx_part4</cell><cell cols="4">1.610 0.164 1.435 2.041</cell><cell cols="2">0.864 0.903 0.778</cell><cell>2.462</cell></row><row><cell>NOx_part5</cell><cell cols="4">0.622 0.075 0.521 0.726</cell><cell cols="2">1.632 0.730 1.446</cell><cell>2.761</cell></row><row><cell>NO2_part1</cell><cell cols="4">1.506 0.217 1.132 1.823</cell><cell cols="2">2.464 2.404 2.401</cell><cell>2.636</cell></row><row><cell>NO2_part2</cell><cell cols="4">1.371 0.048 1.242 1.415</cell><cell cols="2">2.118 2.250 2.409</cell><cell>2.648</cell></row><row><cell>NO2_part3</cell><cell cols="4">0.660 0.078 0.599 0.863</cell><cell cols="2">1.308 1.195 1.213</cell><cell>1.984</cell></row><row><cell>NO2_part4</cell><cell cols="4">0.782 0.043 0.711 0.856</cell><cell cols="2">1.978 2.565 1.912</cell><cell>2.531</cell></row><row><cell>NO2_part5</cell><cell cols="4">0.730 0.111 0.520 0.905</cell><cell cols="2">1.0773 1.047 0.967</cell><cell>2.129</cell></row><row><cell>C6H6_part1</cell><cell cols="4">0.013 0.004 0.007 0.018</cell><cell cols="2">0.300 0.511 0.219</cell><cell>1.398</cell></row><row><cell>C6H6_part2</cell><cell cols="4">0.034 0.010 0.020 0.050</cell><cell cols="2">0.378 0.489 0.369</cell><cell>1.478</cell></row><row><cell>C6H6_part3</cell><cell cols="4">0.048 0.015 0.016 0.075</cell><cell cols="2">0.520 0.663 0.538</cell><cell>1.317</cell></row><row><cell>C6H6_part4</cell><cell cols="4">0.020 0.010 0.010 0.042</cell><cell cols="2">0.217 0.459 0.123</cell><cell>1.279</cell></row><row><cell>C6H6_part5</cell><cell cols="4">0.027 0.011 0.014 0.051</cell><cell cols="2">0.215 0.297 0.188</cell><cell>1.526</cell></row><row><cell cols="5">NMHC_part1 1.685 0.256 1.448 2.378</cell><cell cols="2">1.718 1.666 1.621</cell><cell>3.861</cell></row><row><cell cols="5">NMHC_part2 0.713 0.097 0.566 0.865</cell><cell cols="2">0.934 0.978 0.839</cell><cell>3.651</cell></row><row><cell 
cols="5">NMHC_part3 1.097 0.270 0.775 1.560</cell><cell cols="2">1.580 1.280 1.438</cell><cell>2.830</cell></row><row><cell cols="5">NMHC_part4 1.099 0.166 0.898 1.443</cell><cell cols="2">1.720 1.565 1.917</cell><cell>2.715</cell></row><row><cell cols="5">NMHC_part5 1.023 0.050 0.963 1.116</cell><cell cols="2">1.238 0.944 1.407</cell><cell>2.960</cell></row><row><cell></cell><cell>17</cell><cell></cell><cell></cell><cell></cell><cell>2</cell><cell>2</cell><cell>4</cell></row><row><cell></cell><cell>68%</cell><cell></cell><cell></cell><cell></cell><cell>8%</cell><cell>8%</cell><cell>16%</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>This work was partially supported by the Czech Grant Agency grant 15-18108S and institutional support of the Institute of Computer Science RVO 67985807.</p><p>Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the programme "Projects of Large Research, Development, and Innovations Infrastructures" (CESNET LM2015042), is greatly appreciated.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Using genetic algorithms to select architecture of a feedforward artificial neural network</title>
		<author>
			<persName><forename type="first">Jasmina</forename><surname>Arifovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ramazan</forename><surname>Gençay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Physica A: Statistical Mechanics and its Applications</title>
		<imprint>
			<biblScope unit="volume">289</biblScope>
			<biblScope unit="issue">3-4</biblScope>
			<biblScope unit="page" from="574" to="594" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Evolutionary strategies: A comprehensive introduction</title>
		<author>
			<persName><forename type="first">H.-G</forename><surname>Beyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">P</forename><surname>Schwefel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Natural Computing</title>
		<imprint>
			<biblScope unit="page" from="3" to="52" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Keras</title>
		<author>
			<persName><forename type="first">François</forename><surname>Chollet</surname></persName>
		</author>
		<ptr target="https://github.com/fchollet/keras" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Genetic algorithms for evolving deep neural networks</title>
		<author>
			<persName><forename type="first">Omid</forename><forename type="middle">E</forename><surname>David</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Iddo</forename><surname>Greental</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO Comp &apos;14</title>
				<meeting>the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO Comp &apos;14<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1451" to="1452" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Semi-supervised learning techniques in artificial olfaction: A novel approach to classification problems and drift counteraction</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">De</forename><surname>Vito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Fattoruso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pardo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Tortorella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Di Francia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sensors Journal</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">11</biblScope>
			<biblScope unit="page" from="3215" to="3224" />
			<date type="published" when="2012-11">Nov 2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Neuroevolution: from architectures to learning</title>
		<author>
			<persName><forename type="first">Dario</forename><surname>Floreano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Dürr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Claudio</forename><surname>Mattiussi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Evolutionary Intelligence</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="47" to="62" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Accelerated neural evolution through cooperatively coevolved synapses</title>
		<author>
			<persName><forename type="first">Faustino</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Juergen</forename><surname>Schmidhuber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Risto</forename><surname>Miikkulainen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="page" from="937" to="965" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Deep Learning</title>
		<author>
			<persName><forename type="first">Ian</forename><surname>Goodfellow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aaron</forename><surname>Courville</surname></persName>
		</author>
		<ptr target="http://www.deeplearningbook.org" />
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>MIT Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Evolving deep unsupervised convolutional networks for vision-based reinforcement learning</title>
		<author>
			<persName><forename type="first">Jan</forename><surname>Koutník</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Juergen</forename><surname>Schmidhuber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Faustino</forename><surname>Gomez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO &apos;14</title>
				<meeting>the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO &apos;14<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="541" to="548" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Deep learning</title>
		<author>
			<persName><forename type="first">Yann</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoffrey</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">521</biblScope>
			<biblScope unit="issue">7553</biblScope>
			<biblScope unit="page">5</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">The mnist database of handwritten digits</title>
		<author>
			<persName><forename type="first">Yann</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Corinna</forename><surname>Cortes</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">CMA-ES for hyperparameter optimization of deep neural networks</title>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Loshchilov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Hutter</surname></persName>
		</author>
		<idno>CoRR, abs/1604.07269</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Towards evolutionary deep neural networks</title>
		<author>
			<persName><forename type="first">Tomas</forename><forename type="middle">H</forename><surname>Maul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrzej</forename><surname>Bargiela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Siang-Yew</forename><surname>Chong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Abdullahi</forename><forename type="middle">S</forename><surname>Adamu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ECMS 2014 Proceedings. European Council for Modeling and Simulation</title>
				<editor>
			<persName><forename type="first">Flaminio</forename><surname>Squazzoni</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Fabio</forename><surname>Baronio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Claudia</forename><surname>Archetti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Marco</forename><surname>Castellani</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Evolving deep neural networks</title>
		<author>
			<persName><forename type="first">Risto</forename><surname>Miikkulainen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><forename type="middle">Zhi</forename><surname>Liang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Elliot</forename><surname>Meyerson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aditya</forename><surname>Rawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Fink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Olivier</forename><surname>Francon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bala</forename><surname>Raju</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hormoz</forename><surname>Shahrzad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Arshak</forename><surname>Navruzyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nigel</forename><surname>Duffy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Babak</forename><surname>Hodjat</surname></persName>
		</author>
		<idno>CoRR, abs/1703.00548</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Evolution Strategies as a Scalable Alternative to Reinforcement Learning</title>
		<author>
			<persName><forename type="first">T</forename><surname>Salimans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017-03">March 2017</date>
		</imprint>
	</monogr>
	<note>ArXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">A hypercube-based encoding for evolving largescale neural networks</title>
		<author>
			<persName><forename type="first">Kenneth</forename><forename type="middle">O</forename><surname>Stanley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><forename type="middle">B</forename><surname>D'ambrosio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Gauci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Artif. Life</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="185" to="212" />
			<date type="published" when="2009-04">April 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Evolving neural networks through augmenting topologies</title>
		<author>
			<persName><forename type="first">Kenneth</forename><forename type="middle">O</forename><surname>Stanley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Risto</forename><surname>Miikkulainen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="99" to="127" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Optimization of neural network architecture using genetic algorithm for load forecasting</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">U</forename><surname>Islam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Baharudin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Q</forename><surname>Raza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Nallagownden</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2014 5th International Conference on Intelligent and Advanced Systems (ICIAS)</title>
				<imprint>
			<date type="published" when="2014-06">June 2014</date>
			<biblScope unit="page" from="1" to="6" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<title level="m" type="main">GAKeras</title>
		<author>
			<persName><forename type="first">Petra</forename><surname>Vidnerová</surname></persName>
		</author>
		<ptr target="github.com/PetraVidnerova/GAKeras" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">De</forename><surname>Vito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Massera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Piga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martinotto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Di Francia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Sensors and Actuators B: Chemical</title>
		<imprint>
			<biblScope unit="volume">129</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="750" to="757" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
