<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Neural Architecture Search using Particle Swarm and Ant Colony Optimization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Seamus Lankford</string-name>
          <email>seamus.lankford@adaptcentre.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diarmuid Grimes</string-name>
          <email>diarmuid.grimes@cit.ie</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Adapt Centre, Dublin City University</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Cork Institute of Technology</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Neural network models have a number of hyperparameters that must be chosen along with their architecture. This can be a heavy burden on a novice user, who must choose which architecture to use and what values to assign to its parameters. In most cases, default hyperparameters and architectures are used. Significant improvements to model accuracy can be achieved through the evaluation of multiple architectures. A process known as Neural Architecture Search (NAS) may be applied to automatically evaluate a large number of such architectures. A system integrating open source tools for Neural Architecture Search (OpenNAS), in the classification of images, has been developed as part of this research. OpenNAS takes any dataset of grayscale or RGB images and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics using either an AutoKeras, a transfer learning or a Swarm Intelligence (SI) approach. Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are used as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In this paper, we focus on training and optimizing CNNs using the SI components of OpenNAS. Two major types of SI algorithms, namely PSO and ACO, are compared to see which is more effective in generating higher model accuracies. It is shown, with our experimental design, that the PSO algorithm performs better than ACO. The performance improvement of PSO is most notable with a more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.</p>
      </abstract>
      <kwd-group>
        <kwd>AutoML</kwd>
        <kwd>NAS</kwd>
        <kwd>Swarm Intelligence</kwd>
        <kwd>PSO</kwd>
        <kwd>ACO</kwd>
        <kwd>CNN</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
Open source tools for generating more complex architectures, such as CNNs, are at early stages of development. Consequently, they are poorly documented and often unreliable
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In addition, the alternative of using commercial platforms is expensive and
therefore users are left with few practical or viable options.
      </p>
      <p>The development of OpenNAS integrates several metaheuristic approaches
in a single application used for the neural architecture search of more complex
neural architectures such as convolutional neural networks. Furthermore, the
effectiveness of NAS in generating good neural architectures for image classification is evaluated. Standard approaches to NAS, using the AutoKeras framework,
are also incorporated into the system design.</p>
      <p>
        A key aspect of the study is to contrast Swarm Intelligence (SI) algorithms
for NAS. Consequently, Particle Swarm Optimization (PSO) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Ant Colony
Optimization (ACO) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have been chosen as metaheuristics for creating high
performing CNN architectures for grayscale and RGB image datasets.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <sec id="sec-2-1">
        <title>Convolutional Neural Networks</title>
        <p>
          CNNs are feed-forward Deep Neural Networks (DNNs) used for image
recognition. The original CNN architecture was proposed by LeCun [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and consisted
of two convolution layers, two pooling layers, two fully connected (FC) layers
and an output layer. Subsequently, numerous models were developed including
popular ones such as ResNet [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] and VGG [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In this study, custom CNN
architectures are created by using SI heuristics to find better combinations of
convolutional, pooling and FC layers.
        </p>
      </sec>
      <sec id="sec-2-2">
<title>AutoML</title>
        <p>
          AutoML involves the automation of the entire machine learning pipeline
including data augmentation, feature engineering, model selection, choice of hyperparameters and finally neural architecture selection and creation. By contrast,
NAS has a narrower focus in that it concentrates on neural architecture
selection and creation [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
Tree-based Pipeline Optimization Tool (TPOT) is an open source Python
package that uses genetic programming in optimizing the machine learning
pipeline [
          <xref ref-type="bibr" rid="ref7">7</xref>
]. The library performs well on simple NAS tasks involving the
scikit-learn API. Given this study involves generating more complex CNNs, rather
than developing optimal pipelines, it was decided not to use TPOT as part of
the initial solution architecture. However, as part of future work, it may have a
role in optimizing hyperparameter selection.
        </p>
        <p>
          AutoKeras [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is an open source AutoML system using Bayesian optimization
and network morphism for efficient neural architecture search.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Neural Architecture Search</title>
        <p>
Neural architecture search is the process of automatically finding and tuning
DNNs. It has been shown that DNNs have made remarkable progress in solving
many real world problems such as image recognition, speech recognition and
machine translation [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. In general, NAS systems consist of three main
components: a search space, a search strategy and an evaluation strategy. The
search space sets out which architectures can be used in principle whereas the
search strategy outlines how the search space is explored. Finally the evaluation
strategy determines which architectures yield the best results on unseen data.
        </p>
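        <p>As a minimal illustration of how these three components fit together, the following Python sketch (hypothetical names; not the OpenNAS implementation) uses random search as the simplest possible search strategy over a toy search space:</p>
        <preformat>
import random

# Search space: which architectures are possible in principle.
SEARCH_SPACE = {
    "num_conv_layers": [1, 2, 3],
    "filters": [16, 32, 64],
    "dense_units": [64, 128, 256],
}

def sample_architecture():
    # Search strategy (here: plain random search).
    return {name: random.choice(options)
            for name, options in SEARCH_SPACE.items()}

def evaluate(architecture):
    # Evaluation strategy: a real system would build and train a CNN and
    # return validation accuracy; stubbed with a toy score here.
    return random.random()

def search(budget=10):
    best_arch, best_score = None, 0.0
    for _ in range(budget):
        arch = sample_architecture()
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
        </preformat>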
        <p>A basic approach to NAS is the brute force training and evaluation of all
possible model combinations. On completion, the best performing model is
selected. However, this is impractical due to the combinatorics of the problem.
Using metaheuristics, such as swarm intelligence, is an alternative which seeks
the best model within reasonable time constraints.
</p>
      </sec>
      <sec id="sec-2-4">
        <title>Swarm Intelligence</title>
        <p>
          Swarm Intelligence, a category of Evolutionary Computing, has been used for
classification problems in the following forms: Particle Swarm Optimization
(PSO) [
          <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
          ] and Ant Colony Optimization (ACO) [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
        </p>
        <p>
Particle Swarm Optimization. PSO belongs to the class of swarm
intelligence techniques and is a population-based stochastic method for solving
optimization problems, developed in 1995. An open source Python library for CNN
optimization using the PSO algorithm was developed by Fernandes et al. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
The results demonstrate that their approach, psoCNN, quickly finds CNN
architectures which offer competitive performance for any given dataset.</p>
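        <p>psoCNN adapts particle positions and velocities to discrete sequences of CNN layers; for reference, a sketch of the canonical real-valued PSO update (standard inertia and acceleration coefficients assumed) is:</p>
        <preformat>
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    # x, v, pbest: arrays of shape (n_particles, n_dims); gbest: (n_dims,).
    r1 = np.random.rand(*x.shape)  # stochastic pull towards personal bests
    r2 = np.random.rand(*x.shape)  # stochastic pull towards the global best
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v  # new positions and velocities
        </preformat>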
<p>Ant Colony Optimization. ACO, modelled on the activities of real ant colonies,
involves moving through a parameter space of all potential solutions to find the
optimal weights for a neural network.
        </p>
        <p>
          Using ACO, a system known as DeepSwarm was developed by Byla and Pang
[
          <xref ref-type="bibr" rid="ref19">19</xref>
] to find high performing neural architectures for CNNs. They showed that it
offers competitive performance when tested on well-known image datasets.
        </p>
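        <p>A hedged sketch of the pheromone mechanics underlying this style of search (illustrative only; DeepSwarm's internals differ in detail) is:</p>
        <preformat>
import random

def choose_component(candidates, pheromone, alpha=1.0):
    # An ant picks the next layer/component with probability
    # proportional to its pheromone level.
    weights = [pheromone[c] ** alpha for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def update_pheromone(pheromone, best_path, evaporation=0.1, deposit=1.0):
    # Evaporate everywhere, then reinforce the best path found this round.
    for c in pheromone:
        pheromone[c] *= 1.0 - evaporation
    for c in best_path:
        pheromone[c] += deposit
    return pheromone
        </preformat>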
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
<p>With artificial neural networks, there are many parameters to choose from, such
as the number of hidden network layers, number of neurons per layer, type of
activation function, choice of optimizer and so on. The final network design often
depends on the problem domain and is typically arrived at in a time-consuming
trial-and-error fashion.</p>
      <p>Similar problems exist with CNNs but these problems are exacerbated by
the length of time, and amount of computational resources required to train
such networks. Clearly, a core objective of NAS is to find good network
performance within acceptable time limits through the reduction of both the number
of networks tested and the length of time required for their evaluation. The
implementation of NAS can be achieved through a variety of approaches including
transfer learning using pre-trained networks, network morphism or swarm
intelligence. Using these approaches as its pillars, a NAS system (OpenNAS) has
been built which tackles such problems (source code available at
https://github.com/seamusl/OpenNAS-v1). OpenNAS does not enforce a
particular architecture but rather allows novel and interesting architectures to be
discovered.</p>
      <p>In this work we focus on the swarm intelligence component of the OpenNAS
system. The swarm optimization techniques currently used are Particle Swarm
Optimization and Ant Colony Optimization.</p>
      <p>The PSO algorithm determines how the principal CNN layer types, and their
associated hyperparameters, are connected together. The generated models
consist of architectures using a mix of convolutional, average pooling, max pooling
and fully connected layers. In addition, dropout layers and batch normalization
layers are also added to alleviate overfitting. The hyperparameters associated
with each layer type are indicated in Table 1.</p>
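      <p>One plausible decoding of such a particle into a Keras model, assuming a simple list-of-dicts encoding (hypothetical; the psoCNN encoding differs in detail), is sketched below:</p>
      <preformat>
from tensorflow import keras
from tensorflow.keras import layers

def build_model(architecture, input_shape=(32, 32, 3), num_classes=10):
    # Decode a particle (a list of layer descriptions) into a Keras CNN.
    model = keras.Sequential([keras.layers.InputLayer(input_shape=input_shape)])
    for layer in architecture:
        if layer["type"] == "conv":
            model.add(layers.Conv2D(layer["filters"], layer["kernel"],
                                    padding="same", activation="relu"))
            model.add(layers.BatchNormalization())
        elif layer["type"] == "max_pool":
            model.add(layers.MaxPooling2D())
        elif layer["type"] == "avg_pool":
            model.add(layers.AveragePooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))  # fully connected block
    model.add(layers.Dropout(0.5))                   # alleviates overfitting
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
      </preformat>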
      <p>Particle architectures, i.e. model architectures, are compiled for a number of
epochs and evaluation is carried out using the standard cross-entropy loss
function. Particle architectures with the smallest loss are selected by the
algorithm. The number of epochs parameter for pBest must be carefully chosen
since it is the main driver of both run time and model accuracy.</p>
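      <p>A sketch of how each particle might be scored, assuming Keras models and one-hot encoded labels, is:</p>
      <preformat>
def evaluate_particle(model, x_train, y_train, x_val, y_val, epochs=5):
    # Compile and briefly train; the validation loss drives pBest selection.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=epochs, batch_size=64, verbose=0)
    val_loss, val_acc = model.evaluate(x_val, y_val, verbose=0)
    return val_loss  # the particle with the smallest loss becomes pBest
      </preformat>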
      <p>Using an ACO approach, the parameters used for model training in the
exploration process are highlighted in Table 2. Two test configurations are considered.
In the first case, 8 ants are used with 30 epochs and in the second case, 16 ants
are used with 15 epochs. The depth parameter was fixed at 20.</p>
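      <p>For reference, a typical DeepSwarm invocation looks roughly as follows (a sketch based on the library's documented usage; ant count, depth and epochs are set in DeepSwarm's YAML settings file rather than in code):</p>
      <preformat>
from deepswarm.backends import Dataset, TFKerasBackend
from deepswarm.deepswarm import DeepSwarm

# Wrap the (already loaded) numpy arrays for the backend.
dataset = Dataset(training_examples=x_train, training_labels=y_train,
                  testing_examples=x_test, testing_labels=y_test)
backend = TFKerasBackend(dataset=dataset)
deepswarm = DeepSwarm(backend=backend)

topology = deepswarm.find_topology()              # ACO architecture search
trained = deepswarm.train_topology(topology, 50)  # retrain best topology
      </preformat>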
<p>Fine-tuning was implemented by initially removing the fully connected layers
from the top of the model. Two blocks are then added, each of which has a fully
connected layer, a batch normalization layer and a dropout layer. The hybrid
structure is then trained with the new dataset. Fine-tuning of a VGG16 network
is illustrated in Figure 1.</p>
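      <p>A minimal Keras sketch of this fine-tuning procedure, assuming CIFAR10-sized inputs and illustrative block sizes, is:</p>
      <preformat>
from tensorflow import keras
from tensorflow.keras import layers

# Load VGG16 without its fully connected top, keeping the ImageNet weights.
base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                input_shape=(32, 32, 3))

# Two new blocks, each: fully connected + batch normalization + dropout.
x = layers.Flatten()(base.output)
for units in (256, 128):  # illustrative sizes
    x = layers.Dense(units, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation="softmax")(x)

# The hybrid structure is then trained on the new dataset.
model = keras.Model(base.input, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
      </preformat>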
<p>The high-level view of the system architecture is presented in Figure 2. The
system is organized into the following Python modules: OpenNAS, pre-processor,
trainer, ensemble, super stacker, sysconfig and loader. The pre-train function
uses transfer learning either as a feature extractor or to fine-tune the pre-trained
networks of VGG16, VGG19, MobileNet or ResNet50.</p>
      <p>
        With the swarm function, PSO or ACO can be used to search for the best
neural architecture. Existing open source Python libraries were customized for
both PSO and ACO functionality. Particle swarms were implemented using a
psoCNN library [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] whereas ant colonies used the DeepSwarm library [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The
environment required Python 3.7, TensorFlow 1.14, Keras 2.2.4, NumPy 1.16.4
and Matplotlib 3.1.0.
      </p>
      <p>Existing NAS tools, such as AutoKeras, were also integrated into the
OpenNAS system. AutoKeras is a powerful open source library which provides
functions to automatically search for optimal architectures for deep learning models.
However, this library is still in beta development and the associated
documentation is quite poor.</p>
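      <p>For illustration, a search with the beta-era AutoKeras API (assumed here to be the 0.4.x interface matching the environment above) looks roughly like:</p>
      <preformat>
from autokeras import ImageClassifier  # beta-era (0.4.x) API assumed

clf = ImageClassifier(verbose=True)
clf.fit(x_train, y_train, time_limit=60 * 60)  # search budget in seconds
clf.final_fit(x_train, y_train, x_test, y_test, retrain=True)
accuracy = clf.evaluate(x_test, y_test)
      </preformat>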
      <p>With the ensemble module, there are options to build stacked ensembles using
either homogeneous or heterogeneous base learners. These learner outputs are
subsequently passed to a suite of meta learner algorithms. The system generates
the optimal neural architecture model using the chosen heuristic.
</p>
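      <p>A hedged sketch of the stacking step, assuming Keras base learners whose predict method returns class probabilities and a scikit-learn meta learner, is:</p>
      <preformat>
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_meta_learner(base_models, x_val, y_val):
    # Each base learner contributes its class-probability outputs as
    # features for the meta learner.
    stacked = np.hstack([m.predict(x_val) for m in base_models])
    meta = LogisticRegression(max_iter=1000)
    meta.fit(stacked, y_val)  # y_val: integer class labels
    return meta
      </preformat>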
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        Two datasets were chosen for the experimental design, namely CIFAR10 [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]
and Fashion Mnist [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. A primary research objective is the development of a
Neural Architecture Search tool which generates high performing architectures
for generic datasets of either grayscale (one channel) or colour (triple channel)
images. The CIFAR10 dataset meets this requirement in that it is a challenging
dataset of colour images. The Fashion Mnist dataset is also suitable since it is a
well-tested and well-understood dataset of grayscale images. For reference,
the state of the art (SOA) accuracy achieved on CIFAR10 is 98.5% [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] whereas
with Fashion Mnist, the SOA accuracy is 94.6% [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>Particle Swarm Optimization</title>
<p>In order to test variance and reproducibility, each configuration was run 5 times
on both CIFAR10 and Fashion Mnist, which resulted in the evaluation of 4000
CNN architectures for this phase of the study.</p>
      </sec>
      <sec id="sec-4-2">
<title>Evaluation of models trained on CIFAR10 dataset</title>
        <p>Validation accuracy was used to evaluate the performance of both PSO configurations. It is clear from
Table 3 that the PSO model trained on swarm settings of a lower population
and higher number of iterations (population of 10 and 20 iterations) performed
significantly better. In terms of accuracy, the mean performance was 3.5% better.</p>
<p>[Table 3: PSO swarm settings on CIFAR10 (Max Acc, Mean Acc, Std Dev, Time (min), Layers).
Population 10, Iterations 20: max accuracy 0.900, mean accuracy 0.853.
Population 20, Iterations 10: max accuracy 0.883, mean accuracy 0.818.]</p>
<p>Both configurations have a very low standard deviation for model accuracy,
indicating a high level of reproducibility between test runs. At a mean run time
of 21.9 hours for the first configuration and 18.6 hours for the second
configuration, the PSO search for CNN architectures is a slow process, considering that
high performance workstations with NVIDIA GeForce GTX 1080 Ti graphics cards
were used.</p>
      </sec>
      <sec id="sec-4-3">
<title>Evaluation of models trained on Fashion Mnist dataset</title>
        <p>PSO models trained on the Fashion Mnist dataset (Table 4) achieved much higher accuracy
compared with models developed using CIFAR10 data. Similar to CIFAR10, the
low standard deviation associated with both implementations of Fashion Mnist
models indicates that the PSO approach produces consistent results between different
test runs.</p>
        <p>The stochastic nature of metaheuristics impacts the run times associated
with PSO for both Fashion Mnist and CIFAR10. In all tests, no clear pattern
emerged with regard to run times: CIFAR10 was faster using a population of
10 with 20 iterations whereas Fashion Mnist was faster with a population of 20
with 10 iterations. Therefore, in terms of run time, no clear conclusion could be
drawn by doubling the population and halving the iterations.</p>
        <p>With regard to the impact of swarm settings on model accuracy for
Fashion Mnist, again there is little to separate the configurations. With a mean
accuracy of 93.5% for a population of 20 with 10 iterations and a corresponding
mean model accuracy of 93.2% using a population of 10 with 20 iterations, no
clear conclusion can be drawn.</p>
        <p>
Therefore, unlike CIFAR10, changing the swarm settings by doubling the
population and halving the iterations does not impact model accuracy in the case of
Fashion Mnist. Both configurations for the PSO algorithm perform well on this
dataset.</p>
<p>Similar to other metaheuristics, there are several parameters which can be tuned
for optimal neural architecture search using Ant Colony Optimization [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. With
OpenNAS, users may select the options of depth, number of ants and number
of epochs in directing how the neural architecture search is conducted.
        </p>
      </sec>
      <sec id="sec-4-4">
<title>Evaluation of models trained on CIFAR10 dataset</title>
        <p>With CIFAR10 data, the results from Table 5 indicate that a greater number of ants leads to higher
model accuracy. The improvement in max model accuracy achieved, through
doubling the number of ants and halving the number of epochs, was modest at
just 1.2%. The impact on run time for a small increase in accuracy was severe.
Doubling the number of ants effectively doubled the run time (even though the
number of epochs was halved). The standard deviation for accuracy is very low
indicating good reproducibility between the various test runs.</p>
      </sec>
      <sec id="sec-4-5">
<title>Evaluation of models trained on Fashion Mnist dataset</title>
        <p>The performance of ACO models using Fashion Mnist data is highlighted in Table 6. It can be seen
that both configurations perform well, resulting in accuracies greater than 93%.
The difference in mean model accuracy between configuration A (8 ants and 30
epochs) and configuration B (16 ants and 15 epochs) is trivial.
However, similar to ACO on CIFAR10, the difference in run time is very
significant for configuration B. Effectively, it took over 7 hours longer to achieve an
accuracy improvement of 0.1%.</p>
        <p>
Clearly, in the case of a simpler dataset such as Fashion Mnist, using more than
8 ants is not worthwhile. This finding is similar to that seen with
the more complex CIFAR10 dataset, above. Therefore, choosing the number of
ants used for this ACO implementation is an important consideration impacting
run time performance. As anticipated, the standard deviation for accuracy is also
very low, indicating good reproducibility between the various test runs.</p>
        <p>The OpenNAS performance of all models across both datasets is illustrated in
Figure 3. The results demonstrate performance comparable to that achieved by
the pso-CNN [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] approach and better than that of DeepSwarm [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ].
        </p>
<p>The highest accuracy of OpenNAS in CIFAR10 classification was 90.0%.
This was achieved using a PSO-derived model. By comparison, DeepSwarm
achieved a top accuracy of 88.7%.</p>
        <p>With Fashion Mnist data, the highest performing model for OpenNAS is
again a PSO-derived model with an accuracy of 94.3%. This result compares
very favourably with the SOA accuracy of 94.6%. The highest performing model
for DeepSwarm achieved an accuracy of 93.56%.</p>
<p>In the case of pso-CNN, experiments were conducted on Fashion Mnist but
not on CIFAR10. The best performing pso-CNN model on Fashion Mnist was
91.9% without dropout and 94.5% with dropout.</p>
<p>The findings clearly show that a PSO approach leads to higher model
accuracies, given that DeepSwarm is exclusively based on an ACO approach.</p>
<p>The pre-trained networks of MobileNet and ResNet50 delivered the
poorest performance with CIFAR10. The other pre-trained networks, using VGG
architectures, performed very well on the same dataset.</p>
        <p>With a more complex dataset, such as CIFAR10, the mean performance
improvement of the PSO algorithm is significant when compared with ACO.
With the configurations used in this study, PSO achieved a mean accuracy of 85.3%
on CIFAR10 compared with an ACO mean accuracy of 82.2%.</p>
<p>The approach taken by ACO in determining the best architecture is very
different to the PSO approach. With ACO, simpler models are initially evaluated
at lower depths, with progressively more complex models being evaluated at
deeper search levels. Therefore, at search depth 1, there is essentially just a
single hidden layer being evaluated. The number of ants specified creates new
architectures which simply vary the hyperparameters used for that layer. With
each new depth being explored, an additional layer is added to the architecture
being explored.</p>
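        <p>Schematically, this depth-progressive behaviour can be written as follows (a sketch with hypothetical evaluate and sample_layer callables, not DeepSwarm's actual code):</p>
        <preformat>
def depth_progressive_search(max_depth, n_ants, evaluate, sample_layer):
    best_prefix = []
    for depth in range(1, max_depth + 1):
        # Ants keep the best architecture found so far and vary only the
        # hyperparameters of the newly appended layer.
        candidates = [best_prefix + [sample_layer()] for _ in range(n_ants)]
        best_prefix = max(candidates, key=evaluate)
    return best_prefix
        </preformat>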
<p>Furthermore, the ACO approach enables the targeting of hyperparameter
optimization within a given layer type, rather than optimizing at the overall
architecture level. Specifying a large number of ants, with a reduced depth,
ensures the search space is restricted to studying the effects of layer
hyperparameters rather than model depth and the constituent layers. By comparison,
the number of layers in PSO generated models is entirely stochastic.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
<p>The OpenNAS approach identifies the hyperparameters within each layer of
networks used for image classification of grayscale and colour datasets. In addition,
the number and type of layers for the neural architecture are also identified.
This combined approach generates model architectures which achieve competitive
accuracies when classifying the CIFAR10 and Fashion Mnist datasets.</p>
<p>The swarm intelligence algorithms, in the context of this study, have
generated impressive results. However, in many cases, their performance is
only marginally better than that of fine-tuned pre-trained VGG models. The accuracies
of PSO-derived models have been shown to exceed those of ACO-derived models
in the image classification of grayscale and colour datasets.</p>
      <p>In addition, the OpenNAS integrated approach, using both PSO and ACO
algorithms, yields higher accuracies when compared with DeepSwarm which relies
on a single metaheuristic.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
<surname>Kotthoff</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanschoren</surname>
          </string-name>
          ,
          <source>Automated machine learning: methods, systems, challenges. Springer Nature</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>L.</given-names>
<surname>Kotthoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Thornton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Hoos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Leyton-Brown</surname>
          </string-name>
,
          <article-title>Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA</article-title>,
          <source>The Journal of Machine Learning Research</source>
          , vol.
          <volume>18</volume>
          , no.
          <issue>1</issue>
          , pp.
<fpage>826</fpage>-<lpage>830</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>B.</given-names>
            <surname>Komer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Eliasmith</surname>
          </string-name>
,
          <article-title>Hyperopt-sklearn</article-title>,
          <source>in Automated Machine Learning</source>
          . Springer, Cham,
          <year>2019</year>
          , pp.
<fpage>97</fpage>-<lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Song</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
,
          <article-title>Auto-Keras: An efficient neural architecture search system</article-title>,
          <source>in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2019</year>
          , pp.
<fpage>1946</fpage>-<lpage>1956</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Feurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eggensperger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
,
          <article-title>Auto-sklearn: efficient and robust automated machine learning</article-title>,
          <source>in Automated Machine Learning</source>
          . Springer, Cham,
          <year>2019</year>
          , pp.
<fpage>113</fpage>-<lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M.</given-names>
            <surname>Feurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eggensperger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Falkner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lindauer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
,
          <article-title>Auto-sklearn 2.0: The next generation</article-title>,
          <source>arXiv preprint arXiv:2007.04074</source>,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R. S.</given-names>
            <surname>Olson</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Moore</surname>
          </string-name>
,
          <article-title>TPOT: A tree-based pipeline optimization tool for automating machine learning</article-title>,
          <source>Automated Machine Learning: Methods, Systems, Challenges</source>, p.
          <fpage>151</fpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Garro</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Vazquez</surname>
          </string-name>
,
          <article-title>Designing artificial neural networks using particle swarm optimization algorithms</article-title>,
          <source>Computational Intelligence and Neuroscience</source>, vol.
          <volume>2015</volume>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Mavrovouniotis</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Yang</surname>
          </string-name>
,
          <article-title>Training neural networks with ant colony optimization algorithms for pattern classification</article-title>,
          <source>Soft Computing</source>
          , vol.
          <volume>19</volume>
          , no.
          <issue>6</issue>
          , pp.
<fpage>1511</fpage>-<lpage>1522</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Y. LeCun, L. Bottou,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
, and P. Haffner,
          <article-title>Gradient-based learning applied to document recognition</article-title>,
          <source>Proceedings of the IEEE</source>
          , vol.
          <volume>86</volume>
          , no.
          <issue>11</issue>
          , pp.
<fpage>2278</fpage>-<lpage>2324</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Ren, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
,
          <article-title>Deep residual learning for image recognition</article-title>,
          <source>in Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2016</year>
          , pp.
<fpage>770</fpage>-<lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>K.</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Zisserman</surname>
          </string-name>
,
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>,
          <source>arXiv preprint arXiv:1409.1556</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>T.</given-names>
            <surname>Elsken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Metzen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          et al.,
<source>Neural architecture search</source>,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>V.</given-names>
            <surname>Sze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
<string-name>
            <given-names>T.-J.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Emer</surname>
          </string-name>
          ,
          <article-title>Efficient processing of deep neural networks: A tutorial and survey</article-title>,
          <source>Proceedings of the IEEE</source>
          , vol.
          <volume>105</volume>
          , no.
          <issue>12</issue>
          , pp.
<fpage>2295</fpage>-<lpage>2329</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>J.</given-names>
            <surname>Kennedy</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Eberhart</surname>
          </string-name>
,
          <article-title>Particle swarm optimization</article-title>,
          <source>in Proceedings of ICNN'95-International Conference on Neural Networks</source>
          , vol.
          <volume>4</volume>
          . IEEE,
          <year>1995</year>
          , pp.
<fpage>1942</fpage>-<lpage>1948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Eberhart</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shi</surname>
          </string-name>
,
          <article-title>Comparison between genetic algorithms and particle swarm optimization</article-title>,
          <source>in International conference on evolutionary programming</source>
          . Springer,
          <year>1998</year>
          , pp.
<fpage>611</fpage>-<lpage>616</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>M.</given-names>
            <surname>Dorigo</surname>
          </string-name>
          and
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Gambardella</surname>
          </string-name>
,
          <article-title>Ant colony system: a cooperative learning approach to the traveling salesman problem</article-title>,
          <source>IEEE Transactions on evolutionary computation</source>
          , vol.
          <volume>1</volume>
          , no.
          <issue>1</issue>
          , pp.
<fpage>53</fpage>-<lpage>66</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>F. E. F.</given-names>
            <surname>Junior</surname>
          </string-name>
          and
          <string-name>
            <given-names>G. G.</given-names>
            <surname>Yen</surname>
          </string-name>
,
          <article-title>Particle swarm optimization of deep neural networks architectures for image classification</article-title>,
          <source>Swarm and Evolutionary Computation</source>, vol.
          <volume>49</volume>
          , pp.
<fpage>62</fpage>-<lpage>74</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. E. Byla and
          <string-name>
            <given-names>W.</given-names>
            <surname>Pang</surname>
          </string-name>
,
          <article-title>DeepSwarm: Optimising convolutional neural networks using swarm intelligence</article-title>,
          <source>in UK Workshop on Computational Intelligence</source>
          . Springer,
          <year>2019</year>
          , pp.
<fpage>119</fpage>-<lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nair</surname>
          </string-name>
, and G. Hinton,
          <article-title>The CIFAR-10 dataset</article-title>
          , online: http://www.cs.toronto.edu/kriz/cifar.html, vol.
          <volume>55</volume>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
21.
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
,
          <article-title>Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms</article-title>,
          <source>arXiv preprint arXiv:1708.07747</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Cubuk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zoph</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
,
          <article-title>AutoAugment: Learning augmentation strategies from data</article-title>,
          <source>in Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          ,
          <year>2019</year>
          , pp.
<fpage>113</fpage>-<lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>B.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
,
          <article-title>Autonomous deep learning: A genetic DCNN designer for image classification</article-title>,
          <source>Neurocomputing</source>
          , vol.
          <volume>379</volume>
          , pp.
<fpage>152</fpage>-<lpage>161</lpage>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>