<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Neural Architecture Search using Particle Swarm and Ant Colony Optimization</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Séamus</forename><surname>Lankford</surname></persName>
							<email>seamus.lankford@adaptcentre.ie</email>
							<affiliation key="aff0">
								<orgName type="department">Adapt Centre</orgName>
								<orgName type="institution">Dublin City University</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Diarmuid</forename><surname>Grimes</surname></persName>
							<email>diarmuid.grimes@cit.ie</email>
							<affiliation key="aff1">
								<orgName type="institution">Cork Institute of Technology</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Neural Architecture Search using Particle Swarm and Ant Colony Optimization</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">31DDF8E7C3E9D37F09BAAC679C5AE528</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T02:39+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>AutoML</term>
					<term>NAS</term>
					<term>Swarm Intelligence</term>
					<term>PSO</term>
					<term>ACO</term>
					<term>CNN</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Neural network models have a number of hyperparameters that must be chosen along with their architecture. Choosing an architecture and assigning values to its hyperparameters can be a heavy burden for a novice user. In most cases, default hyperparameters and architectures are used. Significant improvements in model accuracy can be achieved through the evaluation of multiple architectures. A process known as Neural Architecture Search (NAS) may be applied to automatically evaluate a large number of such architectures.</p><p>A system integrating open source tools for Neural Architecture Search (OpenNAS), in the classification of images, has been developed as part of this research. OpenNAS takes any dataset of grayscale or RGB images and generates Convolutional Neural Network (CNN) architectures based on a range of metaheuristics, using an AutoKeras, a transfer learning or a Swarm Intelligence (SI) approach. Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are used as the SI algorithms. Furthermore, models developed through such metaheuristics may be combined using stacking ensembles. In this paper, we focus on training and optimizing CNNs using the SI components of OpenNAS. Two major types of SI algorithms, namely PSO and ACO, are compared to see which is more effective in generating higher model accuracies. It is shown, with our experimental design, that the PSO algorithm performs better than ACO. The performance improvement of PSO is most notable with a more complex dataset. As a baseline, the performance of fine-tuned pre-trained models is also evaluated.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Auto Machine Learning (AutoML) <ref type="bibr" target="#b0">[1]</ref> has attracted growing interest in recent years. This is reflected in the development of several open source AutoML libraries, including Auto-WEKA <ref type="bibr" target="#b1">[2]</ref>, Hyperopt-Sklearn <ref type="bibr" target="#b2">[3]</ref>, AutoKeras <ref type="bibr" target="#b3">[4]</ref>, Auto-Sklearn <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref> and TPOT <ref type="bibr" target="#b6">[7]</ref>.</p><p>Despite this renewal of interest in AutoML, many of these open source solutions focus on creating simpler neural architectures. Libraries which concentrate on generating more complex architectures, such as CNNs, are at early stages of development. Consequently, they are poorly documented and often unreliable <ref type="bibr" target="#b3">[4]</ref>. In addition, the alternative of using commercial platforms is expensive, so users are left with few practical or viable options.</p><p>OpenNAS integrates several metaheuristic approaches in a single application for the neural architecture search of more complex neural architectures such as convolutional neural networks. Furthermore, the effectiveness of NAS in generating good neural architectures for image classification is evaluated. Standard approaches to NAS, using the AutoKeras framework, are also incorporated into the system design.</p><p>A key aspect of the study is to contrast Swarm Intelligence (SI) algorithms for NAS. Consequently, Particle Swarm Optimization (PSO) <ref type="bibr" target="#b7">[8]</ref> and Ant Colony Optimization (ACO) <ref type="bibr" target="#b8">[9]</ref> have been chosen as metaheuristics for creating high performing CNN architectures for grayscale and RGB image datasets.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Background</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Convolutional Neural Networks</head><p>CNNs are feed-forward Deep Neural Networks (DNNs) used for image recognition. The original CNN architecture was proposed by LeCun <ref type="bibr" target="#b9">[10]</ref> and consisted of two convolution layers, two pooling layers, two fully connected (FC) layers and an output layer. Subsequently, numerous models were developed including popular ones such as ResNet <ref type="bibr" target="#b10">[11]</ref> and VGG <ref type="bibr" target="#b11">[12]</ref>. In this study, custom CNN architectures are created by using SI heuristics to find better combinations of convolutional, pooling and FC layers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Auto ML</head><p>AutoML involves the automation of the entire machine learning pipeline, including data augmentation, feature engineering, model selection, choice of hyperparameters and, finally, neural architecture selection and creation. By contrast, NAS has a narrower focus in that it concentrates on neural architecture selection and creation <ref type="bibr" target="#b12">[13]</ref>.</p><p>Tree-based Pipeline Optimization Tool (TPOT) is an open source Python package that uses genetic programming to optimize the machine learning pipeline <ref type="bibr" target="#b6">[7]</ref>. The library performs well on simple NAS tasks involving the scikit-learn API. Given that this study involves generating more complex CNNs, rather than developing optimal pipelines, it was decided not to use TPOT as part of the initial solution architecture. However, as part of future work, it may have a role in optimizing hyperparameter selection.</p><p>AutoKeras <ref type="bibr" target="#b3">[4]</ref> is an open source AutoML system using Bayesian optimization and network morphism for efficient neural architecture search.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Neural Architecture Search</head><p>Neural architecture search is the process of automatically finding and tuning DNNs. DNNs have made remarkable progress in solving many real world problems such as image recognition, speech recognition and machine translation <ref type="bibr" target="#b13">[14]</ref>. In general, NAS systems consist of three main components: a search space, a search strategy and an evaluation strategy. The search space sets out which architectures can be used in principle, whereas the search strategy outlines how the search space is explored. Finally, the evaluation strategy determines which architectures yield the best results on unseen data.</p><p>A basic approach to NAS is the brute force training and evaluation of all possible model combinations. On completion, the best performing model is selected. However, this is impractical due to the combinatorics of the problem. Using metaheuristics, such as swarm intelligence, is an alternative which seeks the best model within reasonable time constraints.</p></div>
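The interplay of the three components can be sketched with a toy random-search loop. This is an illustration only: the layer vocabulary, the `sample_architecture` strategy and the stub `evaluate` function are hypothetical stand-ins for a real search space, search strategy and (training-based) evaluation strategy, not OpenNAS code.

```python
import random

# Hypothetical toy search space: an architecture is a sequence of layer types.
LAYER_TYPES = ["conv", "pool", "fc"]

def sample_architecture(rng, max_depth=4):
    """Search strategy (here: plain random sampling) draws one candidate."""
    depth = rng.randint(1, max_depth)
    return tuple(rng.choice(LAYER_TYPES) for _ in range(depth))

def evaluate(arch):
    """Stand-in evaluation strategy: a real system would briefly train
    the network and return validation accuracy on unseen data."""
    score = sum(1 for layer in arch if layer == "conv")  # dummy heuristic
    if arch[-1] == "fc":
        score += 1
    return score / (len(arch) + 1)

def search(n_candidates=50, seed=0):
    """Keep the best-scoring architecture seen during the search."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        arch = sample_architecture(rng)
        s = evaluate(arch)
        if s > best_score:
            best, best_score = arch, s
    return best, best_score
```

Brute force would enumerate every sequence (exponential in depth); swapping the sampler for a metaheuristic such as PSO or ACO is what keeps the search tractable.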
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Swarm Intelligence</head><p>Swarm Intelligence, a category of Evolutionary Computing, has been used for classification problems in the following forms: Particle Swarm Optimization (PSO) <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref> and Ant Colony Optimization (ACO) <ref type="bibr" target="#b16">[17]</ref>.</p><p>Particle Swarm Optimization PSO, developed in 1995, is a population-based stochastic technique for solving optimization problems. An open source Python library for CNN optimization using the PSO algorithm was developed by Fernandes et al. <ref type="bibr" target="#b17">[18]</ref>. The results demonstrate that their approach, psoCNN, quickly finds CNN architectures which offer competitive performance for any given dataset.</p><p>Ant Colony Optimization ACO, modelled on the activities of real ant colonies, involves moving through a parameter space of all potential solutions to find the optimal weights for a neural network.</p><p>Using ACO, a system known as DeepSwarm was developed by Byla and Pang <ref type="bibr" target="#b18">[19]</ref> to find high performing neural architectures for CNNs. They showed that it offers competitive performance when tested on well-known image datasets.</p></div>
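To make the PSO mechanics concrete, the sketch below applies the canonical continuous velocity/position update (inertia plus attraction toward the particle's own best and the global best) to a one-dimensional function. psoCNN adapts this idea to discrete layer sequences, so this is only an illustration of the underlying metaheuristic; the parameter values are illustrative, not those used by OpenNAS.

```python
import random

def pso_minimize(f, n_particles=20, iters=100, lo=-10.0, hi=10.0, seed=1):
    """Minimize f over [lo, hi] with a basic one-dimensional PSO."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]  # positions
    vs = [0.0] * n_particles                                # velocities
    pbest = xs[:]                       # best position seen by each particle
    pbest_f = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5           # inertia and acceleration weights
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Canonical update: inertia + cognitive pull + social pull.
            vs[i] = (w * vs[i]
                     + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] += vs[i]
            fx = f(xs[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i], fx
                if fx < gbest_f:
                    gbest, gbest_f = xs[i], fx
    return gbest, gbest_f
```

In psoCNN the "position" is an architecture and "distance" between particles is defined over layer sequences, but the pbest/gbest bookkeeping is the same.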
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Approach</head><p>With artificial neural networks, there are many parameters to choose from, such as the number of hidden network layers, the number of neurons per layer, the type of activation function, the choice of optimizer and so on. The final network design often depends on the problem domain and is typically achieved in a time-consuming trial and error fashion.</p><p>Similar problems exist with CNNs, but they are exacerbated by the length of time and the amount of computational resources required to train such networks. Clearly, a core objective of NAS is to find good network performance within acceptable time limits by reducing both the number of networks tested and the length of time required for their evaluation. The implementation of NAS can be achieved through a variety of approaches including transfer learning using pre-trained networks, network morphism or swarm intelligence. Using these approaches as its pillars, a NAS system (OpenNAS) has been built which tackles such problems<ref type="foot" target="#foot_0">3</ref>. OpenNAS does not enforce a particular architecture but rather allows novel and interesting architectures to be discovered.</p><p>In this work we focus on the swarm intelligence component of the OpenNAS system. The swarm optimization techniques currently used are Particle Swarm Optimization and Ant Colony Optimization. The PSO algorithm determines how the principal CNN layer types, and their associated hyperparameters, are connected together. The generated models consist of architectures using a mix of convolutional, average pooling, max pooling and fully connected layers. In addition, dropout layers and batch normalization layers are added to alleviate overfitting. The hyperparameters associated with each layer type are indicated in Table <ref type="table" target="#tab_0">1</ref>.</p><p>Particle architectures, i.e. 
model architectures, are compiled for a number of epochs and evaluated using the standard cross-entropy loss function. Particle architectures with the smallest loss are selected by the algorithm. The number of epochs used for pBest training must be carefully chosen since it is the main driver of both run time and model accuracy.</p><p>Using an ACO approach, the parameters used for model training in the exploration process are highlighted in Table <ref type="table" target="#tab_1">2</ref>. Two test configurations are considered. In the first case, 8 ants are used with 30 epochs and, in the second case, 16 ants are used with 15 epochs. The depth parameter was fixed at 20. Fine tuning was implemented by initially removing the fully connected layers from the top of the model. Two blocks are then added, each of which has a fully connected layer, a batch normalization layer and a dropout layer. The hybrid structure is then trained with the new dataset. Fine tuning of a VGG16 network is illustrated in Figure <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Design</head><p>The high level view of the system architecture is presented in Figure <ref type="figure">2</ref>. The system is organized into the following Python modules: OpenNAS, pre-processor, trainer, ensemble, super stacker, sysconfig and loader. The pre-train function uses transfer learning either as a feature extractor or to fine tune the pre-trained networks of VGG16, VGG19, MobileNet or ResNet50.</p><p>With the swarm function, PSO or ACO can be used to search for the best neural architecture. Existing open source Python libraries were customized for both PSO and ACO functionality. Particle swarms were implemented using the psoCNN library <ref type="bibr" target="#b17">[18]</ref> whereas ant colonies used the DeepSwarm library <ref type="bibr" target="#b18">[19]</ref>. Existing NAS tools, such as AutoKeras, were also integrated into the OpenNAS system. AutoKeras is a powerful open source library which provides functions to automatically search for optimal architectures for deep learning models. However, this library is still in beta development and the associated documentation is quite poor.</p><p>With the ensemble module, there are options to build stacked ensembles using either homogeneous or heterogeneous base learners. These learner outputs are subsequently passed to a suite of meta learner algorithms. The system generates the optimal neural architecture model using the chosen heuristic.</p></div>
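The data flow through the ensemble module can be illustrated abstractly. Everything below is hypothetical, it shows only the general stacking pattern (base-learner class probabilities concatenated as meta-learner input), not the actual OpenNAS ensemble API.

```python
def stacked_predict(base_models, meta_model, x):
    """Level-0: collect each base learner's class-probability vector for x.
    Level-1: the meta learner maps the stacked vector to a final class."""
    meta_input = [p for model in base_models for p in model(x)]
    return meta_model(meta_input)

# Toy demo: two 3-class base learners with fixed (fake) outputs.
base_a = lambda x: [0.7, 0.2, 0.1]
base_b = lambda x: [0.3, 0.6, 0.1]

def averaging_meta(stacked, n_classes=3):
    """A trivial meta learner: average per-class probabilities across
    base models, then pick the argmax class. A real meta learner would
    be a trained model (e.g. logistic regression) over these features."""
    n_models = len(stacked) // n_classes
    avg = [
        sum(stacked[m * n_classes + c] for m in range(n_models)) / n_models
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c])
```

Homogeneous ensembles would fill `base_models` with variants of one architecture family (e.g. several PSO-derived CNNs), heterogeneous ensembles with mixed families.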
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Evaluation</head><p>Two datasets were chosen for the experimental design, namely CIFAR10 <ref type="bibr" target="#b19">[20]</ref> and Fashion Mnist <ref type="bibr" target="#b20">[21]</ref>. A primary research objective is the development of a Neural Architecture Search tool which generates high performing architectures for generic datasets of either grayscale (one channel) or colour (triple channel) images. The CIFAR10 dataset meets this requirement in that it is a challenging dataset of colour images. The Fashion Mnist dataset is also suitable since it is a well-tested and well understood dataset of grayscale images. For reference, the state of the art (SOA) accuracy achieved on CIFAR10 is 98.5% <ref type="bibr" target="#b21">[22]</ref> whereas with Fashion Mnist, the SOA accuracy is 94.6% <ref type="bibr" target="#b22">[23]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Particle Swarm Optimization</head><p>In order to test variance and reproducibility, each configuration was run 5 times on both CIFAR10 and Fashion Mnist, resulting in the evaluation of 4000 CNN architectures for this phase of the study.</p><p>Evaluation of models trained on CIFAR10 dataset Validation accuracy was used to evaluate the performance of both PSO configurations. It is clear from Table <ref type="table" target="#tab_2">3</ref> that the PSO model trained on swarm settings of a lower population and a higher number of iterations (population of 10 and 20 iterations) performed significantly better. In terms of accuracy, the mean performance was 3.5% better. Both configurations have a very low standard deviation for model accuracy, indicating a high level of reproducibility between test runs. At a mean run time of 21.9 hours for the first configuration and 18.6 hours for the second, the PSO search for CNN architectures is a slow process, considering that high performance workstations with NVIDIA GeForce GTX 1080 Ti graphics cards were used.</p><p>Evaluation of models trained on Fashion Mnist dataset PSO models trained on the Fashion Mnist dataset (Table <ref type="table" target="#tab_3">4</ref>) achieved much higher accuracy compared with models developed using CIFAR10 data. Similar to CIFAR10, the low standard deviation associated with both implementations of Fashion Mnist models indicates the PSO approach produces consistent results between different test runs. The stochastic nature of metaheuristics impacts the run times associated with PSO for both Fashion Mnist and CIFAR10. In all tests, no clear pattern emerged with regard to run times: CIFAR10 was faster using a population of 10 with 20 iterations whereas Fashion Mnist was faster with a population of 20 with 10 iterations. 
Therefore, in terms of run time, no clear conclusion could be drawn by doubling the population and halving the iterations.</p><p>With regard to the impact of swarm settings on model accuracy for Fashion Mnist, again there is little to separate the configurations. With a mean accuracy of 93.5% for a population of 20 with 10 iterations and a corresponding mean accuracy of 93.2% using a population of 10 with 20 iterations, no clear conclusion can be drawn.</p><p>Therefore, unlike CIFAR10, changing the swarm settings by doubling the population and halving the iterations does not impact model accuracy in the case of Fashion Mnist. Both configurations of the PSO algorithm perform well on this dataset.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">Ant Colony Optimization</head><p>Similar to other metaheuristics, there are several parameters which can be tuned for optimal neural architecture search using Ant Colony Optimization <ref type="bibr" target="#b16">[17]</ref>. With OpenNAS, users may select the depth, the number of ants and the number of epochs in directing how the neural architecture search is conducted.</p><p>Evaluation of models trained on CIFAR10 dataset With CIFAR10 data, the results from Table <ref type="table" target="#tab_4">5</ref> indicate that a greater number of ants leads to higher model accuracy. The improvement in maximum model accuracy achieved through doubling the number of ants and halving the number of epochs was modest at just 1.2%. The impact on run time for this small increase in accuracy was severe: doubling the number of ants effectively doubled the run time (even though the number of epochs was halved). The standard deviation for accuracy is very low, indicating good reproducibility between the various test runs.</p><p>Evaluation of models trained on Fashion Mnist dataset The performance of ACO models using Fashion Mnist data is highlighted in Table <ref type="table" target="#tab_5">6</ref>. It can be seen that both configurations perform well, resulting in accuracies greater than 93%.</p><p>The difference in mean model accuracy between configuration A (8 ants and 30 epochs) and configuration B (16 ants and 15 epochs) is trivial. However, similar to ACO on CIFAR10, the difference in run time is very significant for configuration B. Effectively, it took over 7 hours longer to achieve an accuracy improvement of 0.1%. Clearly, in the case of a simpler dataset such as Fashion Mnist, using more than 8 ants is not worthwhile. This finding is similar to that seen with the more complex CIFAR10 dataset, above. 
Therefore, choosing the number of ants is an important consideration for this ACO implementation, with a significant impact on run time. As anticipated, the standard deviation for accuracy is also very low, indicating good reproducibility between the various test runs. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion</head><p>The OpenNAS performance of all models across both datasets is illustrated in Figure <ref type="figure">3</ref>. The results demonstrate performance comparable to that achieved by the psoCNN <ref type="bibr" target="#b17">[18]</ref> approach and better than that of DeepSwarm <ref type="bibr" target="#b18">[19]</ref>.</p><p>The highest accuracy of OpenNAS in CIFAR10 classification was 90.0%, achieved using a PSO-derived model. By comparison, DeepSwarm achieved a top accuracy of 88.7%.</p><p>With Fashion Mnist data, the highest performing model for OpenNAS is again a PSO-derived model, with an accuracy of 94.3%. This result compares very favourably with the SOA accuracy of 94.6%. The highest performing model for DeepSwarm achieved an accuracy of 93.56%.</p><p>In the case of psoCNN, experiments were conducted on Fashion Mnist but not on CIFAR10. The best performing psoCNN model on Fashion Mnist achieved 91.9% without dropout and 94.5% with dropout.</p><p>The findings clearly show that a PSO approach leads to higher model accuracies, given that DeepSwarm is exclusively based on an ACO approach.</p><p>The pre-trained networks of MobileNet and ResNet50 delivered the poorest performance with CIFAR10. The other pre-trained networks, using VGG architectures, performed very well on the same dataset.</p><p>With a more complex dataset, such as CIFAR10, the mean performance improvement of the PSO algorithm is significant when compared with ACO. With the configurations used in this study, PSO achieved a mean accuracy of 85.3% on CIFAR10 compared with an ACO mean accuracy of 82.2%.</p><p>The approach taken by ACO in determining the best architecture is very different to the PSO approach. With ACO, simpler models are initially evaluated at lower depths, with progressively more complex models being evaluated at deeper search levels. 
Therefore, at search depth 1, there is essentially just a single hidden layer being evaluated. The ants create new architectures which simply vary the hyperparameters used for that layer. As each new depth is explored, an additional layer is added to the architecture.</p><p>Furthermore, the ACO approach enables the targeting of hyperparameter optimization within a given layer type rather than optimizing at the overall architecture level. Specifying a large number of ants, with a reduced depth, ensures the search space is restricted to studying the effects of layer hyperparameters rather than model depth and the constituent layers. By comparison, the number of layers in PSO generated models is entirely stochastic.</p></div>
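The depth-wise construction described above can be sketched as a toy ACO loop: each ant extends an architecture one layer at a time, and pheromone deposited along good paths biases later ants toward those components. This is a simplified illustration in the spirit of DeepSwarm, not its implementation; the layer options, scoring function and update rule are hypothetical.

```python
import random

def aco_search(layer_options, max_depth, evaluate, n_ants=8, iters=10,
               evaporation=0.1, greediness=0.5, seed=0):
    """Toy depth-wise ACO: pheromone on (depth, layer-choice) pairs
    guides ants; the best architecture found so far reinforces its path."""
    rng = random.Random(seed)
    # Uniform starting pheromone for every choice at every depth.
    pher = [{opt: 0.1 for opt in layer_options} for _ in range(max_depth)]
    best, best_score = None, float("-inf")
    for _ in range(iters):
        for _ in range(n_ants):
            arch = []
            for d in range(max_depth):
                if rng.random() < greediness:   # exploit the best-marked edge
                    choice = max(pher[d], key=pher[d].get)
                else:                           # explore, pheromone-weighted
                    opts = list(pher[d])
                    weights = [pher[d][o] for o in opts]
                    choice = rng.choices(opts, weights=weights)[0]
                arch.append(choice)
            score = evaluate(arch)
            if score > best_score:
                best, best_score = arch, score
        # Evaporate everywhere, then deposit along the best path so far.
        for d in range(max_depth):
            for o in pher[d]:
                pher[d][o] *= (1.0 - evaporation)
            pher[d][best[d]] += best_score
    return best, best_score
```

Note how restricting `max_depth` while raising `n_ants` concentrates the search on per-layer choices, mirroring the hyperparameter-targeting behaviour described in the text.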
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion</head><p>The OpenNAS approach identifies the hyperparameters within each layer of networks used for the image classification of grayscale and color datasets. In addition, the number and type of layers for the neural architecture are also identified. This combined approach generates model architectures which achieve competitive accuracies when classifying the CIFAR10 and Fashion Mnist datasets.</p><p>In the context of this study, the swarm intelligence algorithms have generated impressive results. However, in many cases, their performance is only marginally better than that of fine-tuned pre-trained VGG models. The accuracies of PSO-derived models have been shown to exceed those of ACO-derived models in the image classification of grayscale and color datasets.</p><p>In addition, the OpenNAS integrated approach, using both PSO and ACO algorithms, yields higher accuracies when compared with DeepSwarm, which relies on a single metaheuristic.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .Fig. 2 .</head><label>12</label><figDesc>Fig. 1. Tuned VGG16 model</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>The environment required Python 3.7, Tensorflow 1.14, Keras 2.2.4, Numpy 1.16.4 and Matplotlib 3.1.0.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="10,134.77,115.84,345.84,171.34" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Parameters for Particle Swarm Optimization</figDesc><table><row><cell cols="3">Config A Config B</cell></row><row><cell>Swarm</cell><cell></cell><cell></cell></row><row><cell>Number of iterations</cell><cell>10</cell><cell>20</cell></row><row><cell>Swarm size</cell><cell>20</cell><cell>10</cell></row><row><cell>Cg</cell><cell>0.5</cell><cell>0.5</cell></row><row><cell>CNN architecture</cell><cell></cell><cell></cell></row><row><cell>Minimum outputs from a Conv layer</cell><cell>3</cell><cell>3</cell></row><row><cell>Maximum outputs from a Conv layer</cell><cell>256</cell><cell>256</cell></row><row><cell>Maximum neurons in a FC layer</cell><cell>300</cell><cell>300</cell></row><row><cell>Minimum size of a Conv kernel</cell><cell>3 x 3</cell><cell>3 x 3</cell></row><row><cell>Maximum size of a Conv kernel</cell><cell>7 x 7</cell><cell>7 x 7</cell></row><row><cell>Minimum layers</cell><cell>3</cell><cell>3</cell></row><row><cell>Maximum layers</cell><cell>20</cell><cell>20</cell></row><row><cell>CNN Training</cell><cell></cell><cell></cell></row><row><cell># epochs for particle evaluation</cell><cell>5</cell><cell>5</cell></row><row><cell># epochs for global best</cell><cell>100</cell><cell>100</cell></row><row><cell>Dropout rate</cell><cell>0.5</cell><cell>0.5</cell></row><row><cell>Batch normalize layer outputs</cell><cell>Yes</cell><cell>Yes</cell></row><row><cell>Probability Settings</cell><cell></cell><cell></cell></row><row><cell>probability convolution</cell><cell>0.6</cell><cell>0.6</cell></row><row><cell>probability pooling</cell><cell>0.3</cell><cell>0.3</cell></row><row><cell>probability fully connected</cell><cell>0.1</cell><cell>0.1</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Parameters for Ant Colony Optimization</figDesc><table><row><cell></cell><cell cols="2">Config A Config B</cell></row><row><cell>Ant Colony</cell><cell></cell><cell></cell></row><row><cell>Number of Ants</cell><cell>8</cell><cell>16</cell></row><row><cell>Number of Epochs</cell><cell>30</cell><cell>15</cell></row><row><cell>Search Depth</cell><cell>20</cell><cell>20</cell></row><row><cell cols="2">CNN architecture</cell><cell></cell></row><row><cell>Kernel Sizes</cell><cell>1, 3, 5</cell><cell>1, 3, 5</cell></row><row><cell>Minimum layers</cell><cell>1</cell><cell>1</cell></row><row><cell>Maximum layers</cell><cell>20</cell><cell>20</cell></row><row><cell cols="2">CNN Training</cell><cell></cell></row><row><cell>Dropout rate</cell><cell cols="2">0.1, 0.3, 0.5 0.1, 0.3, 0.5</cell></row><row><cell>Batch normalize layer outputs</cell><cell>Yes</cell><cell>Yes</cell></row><row><cell cols="2">Probability Settings</cell><cell></cell></row><row><cell>pheromone start, decay, evaporation</cell><cell>0.1</cell><cell>0.1</cell></row><row><cell>greediness</cell><cell>0.5</cell><cell>0.5</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 .</head><label>3</label><figDesc>Performance of PSO models on CIFAR10</figDesc><table><row><cell>Model</cell><cell>Acc Max</cell><cell>Acc Mean</cell><cell>Acc StDev</cell><cell>Time (min)</cell><cell>Layers</cell></row><row><cell>Population 10 Iterations 20</cell><cell cols="4">0.900 0.853 0.044 1316</cell><cell>30</cell></row><row><cell>Population 20 Iterations 10</cell><cell cols="4">0.883 0.818 0.053 1119</cell><cell>38</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 .</head><label>4</label><figDesc>Performance of PSO models on Fashion Mnist</figDesc><table><row><cell>Model</cell><cell>Acc Max</cell><cell>Acc Mean</cell><cell>Acc StDev</cell><cell>Time (min)</cell><cell>Layers</cell></row><row><cell>Population: 10 Iterations: 20</cell><cell cols="4">0.943 0.932 0.009 994</cell><cell>30</cell></row><row><cell>Population: 20 Iterations: 10</cell><cell cols="4">0.943 0.935 0.008 1319</cell><cell>38</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 .</head><label>5</label><figDesc>Performance of ACO models on CIFAR10</figDesc><table><row><cell>Model</cell><cell>Acc Max</cell><cell>Acc Mean</cell><cell>Acc StDev</cell><cell>Time (min)</cell><cell>Layers</cell></row><row><cell>Ants: Epochs: 30 8</cell><cell cols="4">0.848 0.822 0.025 541</cell><cell>18</cell></row><row><cell>Ants: Epochs: 15 16</cell><cell cols="4">0.836 0.821 0.014 1004</cell><cell>16</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 .</head><label>6</label><figDesc>Performance of ACO models on Fashion MnistFig. 3. OpenNAS Performance of all Models on CIFAR10 and Fashion Mnist</figDesc><table><row><cell>Model</cell><cell>Acc Max</cell><cell>Acc Mean</cell><cell>Acc StDev</cell><cell>Time (min)</cell><cell>Layers</cell></row><row><cell>Ants: Epochs: 30 8</cell><cell cols="4">0.934 0.931 0.002 375</cell><cell>7</cell></row><row><cell>Ants: Epochs: 15 16</cell><cell cols="4">0.934 0.932 0.004 837</cell><cell>19</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://github.com/seamusl/OpenNAS-v1</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Automated machine learning: methods, systems, challenges</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kotthoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanschoren</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>Springer Nature</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Autoweka 2.0: Automatic model selection and hyperparameter optimization in weka</title>
		<author>
			<persName><forename type="first">L</forename><surname>Kotthoff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Thornton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">H</forename><surname>Hoos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Leyton-Brown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="826" to="830" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Hyperopt-sklearn</title>
		<author>
			<persName><forename type="first">B</forename><surname>Komer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bergstra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Eliasmith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Automated Machine Learning</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="97" to="111" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Auto-Keras: An efficient neural architecture search system</title>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Hu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</title>
				<meeting>the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1946" to="1956" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Auto-sklearn: efficient and robust automated machine learning</title>
		<author>
			<persName><forename type="first">M</forename><surname>Feurer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Eggensperger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Springenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Automated Machine Learning</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="113" to="134" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Feurer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Eggensperger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Falkner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lindauer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2007.04074</idno>
		<title level="m">Auto-sklearn 2.0: The next generation</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">TPOT: A tree-based pipeline optimization tool for automating machine learning</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Olson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Moore</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Automated Machine Learning: Methods, Systems, Challenges</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page">151</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Designing artificial neural networks using particle swarm optimization algorithms</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">A</forename><surname>Garro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Vázquez</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Intelligence and Neuroscience</title>
		<imprint>
			<biblScope unit="volume">2015</biblScope>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Training neural networks with ant colony optimization algorithms for pattern classification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Mavrovouniotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Soft Computing</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1511" to="1522" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Gradient-based learning applied to document recognition</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Haffner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE</title>
				<meeting>the IEEE</meeting>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">86</biblScope>
			<biblScope unit="page" from="2278" to="2324" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Deep residual learning for image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sun</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="770" to="778" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Very deep convolutional networks for large-scale image recognition</title>
		<author>
			<persName><forename type="first">K</forename><surname>Simonyan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zisserman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1409.1556</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Neural architecture search</title>
		<author>
			<persName><forename type="first">T</forename><surname>Elsken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Metzen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hutter</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Efficient processing of deep neural networks: A tutorial and survey</title>
		<author>
			<persName><forename type="first">V</forename><surname>Sze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Emer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE</title>
				<meeting>the IEEE</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">105</biblScope>
			<biblScope unit="page" from="2295" to="2329" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Particle swarm optimization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Kennedy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Eberhart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of ICNN&apos;95-International Conference on Neural Networks</title>
				<meeting>ICNN&apos;95-International Conference on Neural Networks</meeting>
		<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="1995">1995</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1942" to="1948" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Comparison between genetic algorithms and particle swarm optimization</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">C</forename><surname>Eberhart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International conference on evolutionary programming</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1998">1998</date>
			<biblScope unit="page" from="611" to="616" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Ant colony system: a cooperative learning approach to the traveling salesman problem</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dorigo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Gambardella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="53" to="66" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Particle swarm optimization of deep neural networks architectures for image classification</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">E F</forename><surname>Junior</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">G</forename><surname>Yen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Swarm and Evolutionary Computation</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="62" to="74" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">DeepSwarm: Optimising convolutional neural networks using swarm intelligence</title>
		<author>
			<persName><forename type="first">E</forename><surname>Byla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Pang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">UK Workshop on Computational Intelligence</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="119" to="130" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
	<title level="m" type="main">The CIFAR-10 dataset</title>
		<author>
			<persName><forename type="first">A</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Nair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
		<ptr target="http://www.cs.toronto.edu/~kriz/cifar.html" />
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">55</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms</title>
		<author>
			<persName><forename type="first">H</forename><surname>Xiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Rasul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Vollgraf</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1708.07747</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">AutoAugment: Learning augmentation strategies from data</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">D</forename><surname>Cubuk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zoph</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Vasudevan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the IEEE conference on computer vision and pattern recognition</title>
				<meeting>the IEEE conference on computer vision and pattern recognition</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="113" to="123" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Autonomous deep learning: A genetic dcnn designer for image classification</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">379</biblScope>
			<biblScope unit="page" from="152" to="161" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
