=Paper=
{{Paper
|id=Vol-1853/p03
|storemode=property
|title=Optimum size of feed forward neural network for Iris data set
|pdfUrl=https://ceur-ws.org/Vol-1853/p03.pdf
|volume=Vol-1853
|authors=Wojciech Masarczyk
|dblpUrl=https://dblp.org/rec/conf/system/Masarczyk17
}}
==Optimum size of feed forward neural network for Iris data set==
Wojciech Masarczyk
Faculty of Applied Mathematics
Silesian University of Technology
Gliwice, Poland
Email: wojcmas042@polsl.pl
Abstract—This paper illustrates the process of finding the optimum structure of a neural network used to solve the Iris data set classification problem. This work presents the dependencies between the number of layers, the number of neurons and the efficiency of the network, providing the best configuration for the given data set.

Index Terms—feed forward neural network, Iris data set, backpropagation algorithm

I. INTRODUCTION

Computers can help through automated models implemented to co-work with other devices in service. We can find various applications of intelligent systems in medicine, technology, transport, etc. We expect computers to control possible dangers and advise the best options while assisting humans [3], [4]. However, in these aspects it is necessary to process data of various origins [13]. We have many possible approaches to data analysis. There are solutions devoted to big data [8], where sophisticated methods implemented for knowledge engineering are used.

Computational intelligence mainly assists data processing in discovering knowledge from input data; there are mathematical models implemented to find incomplete information and work with such issues [1], [2]. Similarly, we can find reports on the efficiency of neural networks in processing incoming information [10]. Neural networks are efficient in processing input data of various types, from voice samples [15] to handwriting [11]. However, to improve the efficiency of processing it is important to discuss the optimum size of these architectures, which will be done in this article using the Iris data set as an example.

II. NEURAL NETWORK MODEL

A. Biological inspiration

A neural network is a biologically inspired model of mathematical computations whose structure is based on the architecture of the human brain. In the simplest approach the brain consists of approximately 8.6 · 10^10 neurons [5] that are connected with each other, creating a network of about 10^15 connections through which impulses are sent simultaneously, resulting in a brain processing speed close to 10^18 operations per second.

Why should one even bother applying a neural network to any field of computation? Mostly because artificial neural networks are able to generalize obtained knowledge, which means that after proper training such a network should be able to correctly predict the value of a given example not included in the training data. A second advantage of these models is robustness to random fluctuations or to missing values in the data set. Generally, neural networks are used to solve problems that seem to be incomputable or too complicated to solve by classical algorithms.

B. Artificial neuron

One can observe that an artificial neuron is a strongly simplified version of a human neuron; however, it still keeps the three most important features of a neuron:
- taking input from other neurons (dendrites),
- computing and processing the impulses taken as input,
- sending the computed impulse on to further neurons (axon).

Figure 1. Biological and artificial neuron

Copyright © 2017 held by the authors.
One cycle of computations in a neuron may be described as follows: the input values are multiplied by the appropriate weights and summed together in the cell of the neuron, after which the sum is taken as the argument of the activation function, whose value is the impulse sent as the output of the neuron. The equation describing this process is presented below:

y = f\left(\sum_{i=1}^{n} w_{(x_i)j} \cdot x_i\right),   (1)

where
w_{(x_i)j} − weight between neuron i and neuron j in consecutive layers,
x_i − value of neuron i,
f − activation function.

In (1) it is assumed that the previous layer consists of n neurons.
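As an illustration of (1), a minimal Python sketch of a single artificial neuron; the example weights, inputs and the choice of the logistic activation are arbitrary assumptions, not values taken from the paper:

```python
import math

def logistic(x, beta=1.0):
    """Logistic (sigmoid) activation function."""
    return 1.0 / (1.0 + math.exp(-beta * x))

def neuron_output(inputs, weights, activation=logistic):
    """Single artificial neuron, equation (1): the weighted sum of the
    inputs is passed through the activation function."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return activation(s)

# Example usage with arbitrary values.
print(neuron_output([0.5, 0.1, 0.9], [0.2, -0.4, 0.7]))
```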
C. Activation function

The activation function is an abstract indicator that describes the action taken in a neuron. In the simplest case, once the information is important the neuron outputs 1, otherwise it outputs 0. This model is called a binary step function; despite its simplicity it is able to solve a few problems, however it is insufficient for more complex problems due to the lack of the desired features detailed in the following list:
- A finite range results in stability while gradient-based methods are used for learning.
- Continuous differentiability is necessary for every gradient-based method; because of this feature the binary step function cannot be used in models with gradient-based learning algorithms.
- Behaving like the identity near the origin speeds up the learning process once the initial weights are small numbers.

In this paper results will be obtained using only four different activation functions.

Logistic function:

f(x) = \frac{1}{1 + e^{-\beta x}},   (2)

where
\beta − determines the steepness of the function near the origin.

Hyperbolic tangent:

f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1.   (3)

Arctangent:

f(x) = \tan^{-1}(x).   (4)

Modified hyperbolic tangent (modification proposed by Yann LeCun [6]):

f(x) = 1.7159 \tanh\left(\tfrac{2}{3} x\right).   (5)

Figure 2. Graph of different activation functions tested in this paper
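For reference, the four functions written as a minimal Python sketch following equations (2)-(5); the default β = 1 is an assumption of the example, not a value fixed by the paper:

```python
import math

def logistic(x, beta=1.0):
    # Equation (2): 1 / (1 + exp(-beta * x))
    return 1.0 / (1.0 + math.exp(-beta * x))

def hyperbolic_tangent(x):
    # Equation (3): tanh(x) = 2 / (1 + exp(-2x)) - 1
    return math.tanh(x)

def arctangent(x):
    # Equation (4): arctan(x)
    return math.atan(x)

def modified_tanh(x):
    # Equation (5): LeCun's scaled tanh, 1.7159 * tanh(2/3 * x)
    return 1.7159 * math.tanh(2.0 / 3.0 * x)
```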
D. Neural network

A feed forward neural network is constructed from neurons stacked in rows called layers, each fully connected with the previous and the subsequent layer. The first and last layers are respectively called the input and output layer; each layer between these two is called a hidden layer. The role of the neurons in the input layer is to store the initial data and send it further. The flow of information takes place from left to right, so once the initial data is provided the result will appear at the output layer, which ends one full cycle of computations for the network.

Figure 3. Example of feed forward neural network with 2 hidden layers
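A minimal sketch of one such forward pass in Python; the layer sizes, the random initial weights and the use of the logistic activation are arbitrary assumptions of the example:

```python
import math
import random

def forward_pass(inputs, layers):
    """Propagate an input vector through fully connected layers.
    `layers` is a list of weight matrices; layers[k][j] holds the incoming
    weights of neuron j in layer k. Each neuron applies equation (1)
    with the logistic activation (2)."""
    values = inputs
    for weight_matrix in layers:
        values = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(neuron_weights, values))))
                  for neuron_weights in weight_matrix]
    return values

# Example: 4 inputs, one hidden layer of 5 neurons, 3 outputs,
# initialized with small random weights.
random.seed(0)
hidden = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(5)]
output = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
print(forward_pass([5.1, 3.5, 1.4, 0.2], [hidden, output]))
```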
According to (1), the output of an untrained neural network is just a sum of random weights multiplied by the initial values. The only parameters that can be modified in a neural network are the weights, so a learning algorithm is a process of changing the weights in such a way that the output matches the values presented in the data set. In this paper the backpropagation algorithm will be discussed.

E. Backpropagation algorithm

To train the network it is necessary to provide a data set that consists of vectors of input signals (x_1, x_2, ..., x_n) and the corresponding desired output z. Thanks to this it is possible to compute the difference between the output signal y and the desired value z. Let \delta = z - y. The next step is to propagate the \delta error back through every neuron according to the equation:

\delta_j = \sum_{i=1}^{n} w_{(j)i} \delta_i,   (6)

where
w_{(j)i} − weight between neuron i and neuron j in consecutive layers,
\delta_i − value of the error on neuron i in the consecutive layer.
Once the \delta error is computed for each neuron, the following step is to update the weights using the equation below:

w_{(j)i} = w_{(j)i} + \eta \, \delta_i \, \frac{d f_i(x)}{dx} \, y_j,   (7)

where
w_{(j)i} − weight between neuron i and neuron j in consecutive layers,
y_j − value of neuron j,
f_i − activation function of neuron i,
\delta_i − value of the error on neuron i in the consecutive layer,
\eta − coefficient that affects the speed of learning.

Learning is an iterative process that takes place until \delta for the output layer is smaller than the desired precision.
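A minimal sketch of these two steps for a single layer in Python; the weight layout (one list of incoming weights per neuron, as in the forward-pass sketch above) is an assumption of the example, and the default η = 0.02 follows the value used throughout the paper:

```python
def propagate_error(next_layer_deltas, next_layer_weights):
    """Equation (6): delta_j = sum_i w_(j)i * delta_i.
    next_layer_weights[i][j] is the weight from neuron j of this layer
    to neuron i of the next layer."""
    return [sum(row[j] * delta for row, delta in zip(next_layer_weights, next_layer_deltas))
            for j in range(len(next_layer_weights[0]))]

def update_weights(weights, deltas, derivatives, prev_outputs, eta=0.02):
    """Equation (7): w_(j)i += eta * delta_i * f_i'(x) * y_j.
    weights[i][j] connects neuron j of the previous layer to neuron i here;
    derivatives[i] is the activation derivative at neuron i's summed input."""
    for i in range(len(weights)):
        for j in range(len(prev_outputs)):
            weights[i][j] += eta * deltas[i] * derivatives[i] * prev_outputs[j]
    return weights
```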
III. CONVERGENCE OF THE GRADIENT DESCENT ALGORITHM WITH RESPECT TO η

While analyzing equation (7) it is clear that η has a direct impact on the pace of learning of the neural network, since the derivative determines the direction in which the error function descends. It is crucial to select a balanced value of η: too small a value may result in a slow learning process, while a value bigger than necessary will end with divergence of the algorithm. Four different scenarios of η values are presented in the graphical interpretation (Figure 4). Every test in this paper was carried out with η = 0.02, as it appears to be the most suitable value of this parameter.
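The effect can be illustrated with a small self-contained sketch of gradient descent on the simple error function E(w) = w^2; the function and the η values below are illustrative assumptions, not the paper's experiments:

```python
def gradient_descent_steps(eta, w=1.0, steps=10):
    """Minimize E(w) = w**2 using the gradient dE/dw = 2*w.
    Returns the sequence of w values for the chosen learning rate."""
    trajectory = [w]
    for _ in range(steps):
        w = w - eta * 2.0 * w
        trajectory.append(w)
    return trajectory

# Too small: slow progress; balanced: quick convergence; too large: divergence.
for eta in (0.01, 0.3, 1.2):
    print(eta, gradient_descent_steps(eta)[-1])
```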
IV. IRIS DATA SET

The Iris data set is a classic and one of the best known sets for pattern recognition, described in [7]. It consists of 150 instances that are equally divided into three different classes. Each class refers to a different type of iris plant. Every sample is a vector of length four, whose components describe:
1. sepal length [cm]
2. sepal width [cm]
3. petal length [cm]
4. petal width [cm]

In order to create the training data set, 35 samples were chosen from each class, so that 30% of the whole set is left for testing the accuracy of the trained model. The aim for the neural network is to classify an iris, described by these four dimensions, into one of the three given classes.
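A sketch of this split in Python; loading the data through scikit-learn's load_iris and taking the first 35 samples of each class are assumptions of the example, since the paper does not state how the samples were chosen:

```python
from collections import defaultdict
from sklearn.datasets import load_iris

iris = load_iris()                       # 150 samples, 4 features, 3 classes
per_class = defaultdict(list)
for features, label in zip(iris.data, iris.target):
    per_class[label].append(features)

# 35 samples per class for training, the remaining 15 per class for testing.
train, test = [], []
for label, samples in per_class.items():
    train += [(x, label) for x in samples[:35]]
    test += [(x, label) for x in samples[35:]]

print(len(train), len(test))             # 105 45
```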
V. TESTS

For further tests η is equal to 0.02; it will be pointed out once η changes. In every case the desired error is set to be smaller than 10^-5. The maximum number of epochs is set to 20000.

In order to compare the obtained results it is necessary to come up with a general formula determining the number of weights in a single network:

q = 4 \cdot 3 \cdot \prod_{i=1}^{n} l_i,   (8)

where
l_i − number of neurons on the i-th hidden layer,
n − number of hidden layers.

As the batch learning approach is used in this case, every iteration is equal to updating every weight in the whole neural network structure. It therefore seems natural to come up with an equation describing the number of updates done throughout the whole process of learning each network:

Q = q \cdot r,   (9)

where
r − number of iterations,
q − number of weights in the network, equation (8).

Q from equation (9) will be used in the next section to compare the efficiency of different network structures.
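A small helper implementing (8) and (9); the configuration and iteration count used in the usage example are taken from the results table below:

```python
from functools import reduce

def total_weights(hidden_layers):
    """Equation (8): q = 4 * 3 * product of the hidden layer sizes
    (4 input neurons, 3 output neurons for the Iris problem)."""
    return 4 * 3 * reduce(lambda a, b: a * b, hidden_layers, 1)

def total_updates(hidden_layers, iterations):
    """Equation (9): Q = q * r, the total number of weight updates."""
    return total_weights(hidden_layers) * iterations

print(total_weights([10]))           # 120
print(total_updates([10], 10811))    # 1297320, cf. the results table
```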
A. Results

It is clearly visible in the table below that in case the network consists of only the input and output layers (0 hidden layers) it fails to classify properly each of the three types, usually being able to classify correctly two of them while having issues with the third one. This is caused by the fact that one class is linearly separable from the other two, while the latter are not linearly separable from each other.

One can observe that the hyperbolic tangent is the most uniform function while also being the most effective, taking into consideration the number of iterations and the number of neurons needed to obtain the desired precision. Similar behaviour can be seen with the arctangent function, however it fails to work for some networks that were properly trained with the hyperbolic tangent. On the other side, the modified tangent performed below expectations; however it is important to remember that all tests were carried out for a fixed η. Since the steepness of this function is modified and the value of η should be set with respect to the derivatives [6], its underperformance seems understandable.

After analyzing the table it is clear that adding additional hidden layers is not worth the effort, since the optimum is obtained for 1 or 2 hidden layers. Note that each added hidden layer with 10 neurons would have to decrease the number of iterations tenfold to be computationally profitable; unfortunately this does not happen with any function. Moreover, as shown, networks that are too big (5 hidden layers and more) tend to fail this classification task at all. This might be caused by the fact that the Iris data set is relatively small (150 examples), which is not enough to properly adjust so many weights.

Three graphs presented in this paper represent the error function over consecutive iterations. Figure 6 presents a nearly perfect plot that is smooth and monotonic most of the time, in contrast to Figure 5, which presents heavy oscillations of the error function caused by a wrong η coefficient. Exactly the same situation appears in Figure 7 at around iteration 400; luckily it is a situation where η > η_opt and after a few iterations of oscillations it finally converges. The values presented in the table show that the network failed to obtain the desired precision in 20000 iterations.
Figure 4. Graphical interpretation of the η coefficient.

Figure 5. Error function over iterations in the 0 hidden layer network

Figure 6. Error function over iterations in the 1 hidden layer network

Figure 7. Error function over iterations in the 2 hidden layer network
Results for all tested network configurations (F − the network failed to reach the desired precision within the 20000-iteration limit):

Activation function | Number of hidden layers | Neurons on hidden layers | Total weights | Iterations | Weights · iterations
Logistic Function | 0 | 0 | 12 | F | F
Logistic Function | 1 | 5 | 60 | F | F
Logistic Function | 1 | 10 | 120 | 10811 | 1297320
Logistic Function | 2 | 10 - 10 | 1200 | 8467 | 10160400
Logistic Function | 2 | 10 - 5 | 900 | F | F
Logistic Function | 2 | 5 - 10 | 900 | 5909 | 5318100
Logistic Function | 3 | 8 - 8 - 8 | 6144 | F | F
Logistic Function | 3 | 10 - 8 - 6 | 10800 | 3867 | 41763600
Logistic Function | 3 | 6 - 8 - 10 | 10800 | 553 | 5972400
Logistic Function | 4 | 6 - 6 - 6 - 6 | 15552 | F | F
Logistic Function | 4 | 10 - 9 - 8 - 6 | 51840 | F | F
Logistic Function | 4 | 6 - 8 - 9 - 10 | 51840 | F | F
Logistic Function | 5 | 7 - 7 - 7 - 7 - 7 | 201684 | F | F
Logistic Function | 5 | 12 - 10 - 9 - 8 - 7 | 725760 | F | F
Logistic Function | 5 | 7 - 8 - 9 - 10 - 12 | 725760 | F | F
Hyperbolic Tangent | 0 | 0 | 12 | F | F
Hyperbolic Tangent | 1 | 5 | 60 | 259 | 15540
Hyperbolic Tangent | 1 | 10 | 120 | 2047 | 245640
Hyperbolic Tangent | 2 | 10 - 10 | 1200 | 1124 | 1348800
Hyperbolic Tangent | 2 | 10 - 5 | 900 | 1266 | 1139400
Hyperbolic Tangent | 2 | 5 - 10 | 900 | F | F
Hyperbolic Tangent | 3 | 8 - 8 - 8 | 6144 | F | F
Hyperbolic Tangent | 3 | 10 - 8 - 6 | 10800 | 313 | 3380400
Hyperbolic Tangent | 3 | 6 - 8 - 10 | 10800 | 1359 | 14677200
Hyperbolic Tangent | 4 | 6 - 6 - 6 - 6 | 15552 | 1512 | 23514624
Hyperbolic Tangent | 4 | 10 - 9 - 8 - 6 | 51840 | 1566 | 81181440
Hyperbolic Tangent | 4 | 6 - 8 - 9 - 10 | 51840 | 874 | 45308160
Hyperbolic Tangent | 5 | 7 - 7 - 7 - 7 - 7 | 201684 | 1072 | 216205248
Hyperbolic Tangent | 5 | 12 - 10 - 9 - 8 - 7 | 725760 | F | F
Hyperbolic Tangent | 5 | 7 - 8 - 9 - 10 - 12 | 725760 | F | F
Arctangent | 0 | 0 | 12 | F | F
Arctangent | 1 | 5 | 60 | 699 | 41940
Arctangent | 1 | 10 | 120 | 311 | 37320
Arctangent | 2 | 10 - 10 | 1200 | 2824 | 3388800
Arctangent | 2 | 10 - 5 | 900 | 2567 | 2310300
Arctangent | 2 | 5 - 10 | 900 | 4026 | 3623400
Arctangent | 3 | 8 - 8 - 8 | 6144 | F | F
Arctangent | 3 | 10 - 8 - 6 | 10800 | 777 | 8391600
Arctangent | 3 | 6 - 8 - 10 | 10800 | 831 | 8974800
Arctangent | 4 | 6 - 6 - 6 - 6 | 15552 | 1072 | 16671744
Arctangent | 4 | 10 - 9 - 8 - 6 | 51840 | 235 | 12182400
Arctangent | 4 | 6 - 8 - 9 - 10 | 51840 | F | F
Arctangent | 5 | 7 - 7 - 7 - 7 - 7 | 201684 | F | F
Arctangent | 5 | 12 - 10 - 9 - 8 - 7 | 725760 | F | F
Arctangent | 5 | 7 - 8 - 9 - 10 - 12 | 725760 | F | F
Modified Tangent | 0 | 0 | 12 | F | F
Modified Tangent | 1 | 5 | 60 | 248 | 14880
Modified Tangent | 1 | 10 | 120 | 1245 | 1297320
Modified Tangent | 2 | 10 - 10 | 1200 | F | F
Modified Tangent | 2 | 10 - 5 | 900 | F | F
Modified Tangent | 2 | 5 - 10 | 900 | F | F
Modified Tangent | 3 | 8 - 8 - 8 | 6144 | F | F
Modified Tangent | 3 | 10 - 8 - 6 | 10800 | F | F
Modified Tangent | 3 | 6 - 8 - 10 | 10800 | F | F
Modified Tangent | 4 | 6 - 6 - 6 - 6 | 15552 | F | F
Modified Tangent | 4 | 10 - 9 - 8 - 6 | 51840 | F | F
Modified Tangent | 4 | 6 - 8 - 9 - 10 | 51840 | F | F
Modified Tangent | 5 | 7 - 7 - 7 - 7 - 7 | 201684 | F | F
Modified Tangent | 5 | 12 - 10 - 9 - 8 - 7 | 725760 | F | F
Modified Tangent | 5 | 7 - 8 - 9 - 10 - 12 | 725760 | F | F
VI. REMARKS

When designing an artificial neural network there is a list of crucial parameters that need to be set precisely in order to achieve a properly working model. The most important ones, discussed in this paper, are:

• Activation function − as it turned out, it has a huge impact on the capabilities of the network, not only those provided by differentiability, but it can also significantly accelerate the process of learning.

• Learning coefficient − it highly depends on the activation function and is a crucial factor in the convergence of gradient-based methods of learning.

• Number of hidden layers and neurons − in simple problems like the one discussed in this paper, 2 hidden layers are enough.
• Number of iterations and desired precision − again, this strongly depends on the problem one tries to solve; there is no general rule that describes how to set these values, however it is important to set an upper limit on the number of iterations in order to stop the computations at some point once the learning algorithm is unable to achieve the desired precision.
VII. SUMMARY

Among various architectures, neural networks are one of the most efficient processors of information about controlled objects; moreover, these architectures can cooperate with information presented in the form of images, voice samples, network statistics, etc.

Each implemented solution needs an adjusted architecture to exactly fit the model of decision support. Therefore research on the performance of various types of networks can give valuable information that can improve implementations of neural networks.
REFERENCES

[1] P. Artiemjew, "Stability of Optimal Parameters for Classifier Based on Simple Granules of Knowledge," Technical Sciences, vol. 14, no. 1, pp. 57-69, UWM Publisher, Olsztyn, 2011.
[2] P. Artiemjew, P. Gorecki, K. Sopyła, "Categorization of Similar Objects Using Bag of Visual Words and k-Nearest Neighbour Classifier," Technical Sciences, vol. 15, no. 2, pp. 293-305, UWM Publisher, Olsztyn, 2012.
[3] R. Damasevicius, M. Vasiljevas, J. Salkevicius, M. Woźniak, "Human Activity Recognition in AAL Environments Using Random Projections," Computational and Mathematical Methods in Medicine, vol. 2016, pp. 4073584:1-4073584:17, 2016, DOI: 10.1155/2016/4073584.
[4] R. Damasevicius, R. Maskeliunas, A. Venckauskas, M. Woźniak, "Smartphone User Identity Verification Using Gait Characteristics," Symmetry, vol. 8, no. 10, pp. 100:1-100:20, 2016, DOI: 10.3390/sym8100100.
[5] S. Herculano-Houzel, "The human brain in numbers: a linearly scaled-up primate brain," Frontiers in Human Neuroscience, vol. 3, pp. 31:1-31:11, 2009, DOI: 10.3389/neuro.09.031.2009.
[6] Y. LeCun, L. Bottou, G. Orr, K. Muller, "Efficient BackProp," Neural Networks: Tricks of the Trade, 1998.
[7] R. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, pp. 179-188, 1936.
[8] Z. Marszałek, "Novel Recursive Fast Sort Algorithm," Communications in Computer and Information Science, vol. 639, pp. 344-355, 2016, DOI: 10.1007/978-3-319-46254-7_27.
[9] C. Napoli, E. Tramontana, "An object-oriented neural network toolbox based on design patterns," International Conference on Information and Software Technologies, pp. 388-399, 2015, DOI: 10.1007/978-3-319-24770-0_34.
[10] C. Napoli, G. Pappalardo, E. Tramontana, "A mathematical model for file fragment diffusion and a neural predictor to manage priority queues over BitTorrent," Applied Mathematics and Computer Science, vol. 26, no. 1, pp. 147-160, 2016, DOI: 10.1515/amcs-2016-0010.
[11] D. Połap, M. Woźniak, "Flexible Neural Network Architecture for Handwritten Signatures Recognition," International Journal of Electronics and Telecommunications, vol. 62, no. 2, pp. 197-202, 2016, DOI: 10.1515/eletel-2016-0027.
[12] C. Napoli, G. Pappalardo, G. M. Tina, E. Tramontana, "Cooperative strategy for optimal management of smart grids by wavelet RNNs and cloud computing," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 6, pp. 1672-1685, 2016, DOI: 10.1109/TNNLS.2015.2480709.
[13] D. Połap, M. Woźniak, "Introduction to the Model of the Active Assistance System for Elder and Disabled People," Communications in Computer and Information Science, vol. 639, pp. 392-403, 2016, DOI: 10.1007/978-3-319-46254-7_31.
[14] C. Napoli, G. Pappalardo, E. Tramontana, R. K. Nowicki, J. T. Starczewski, M. Woźniak, "Toward work groups classification based on probabilistic neural network approach," International Conference on Artificial Intelligence and Soft Computing, pp. 79-89, 2016, DOI: 10.1007/978-3-319-19324-3_8.
[15] D. Połap, "Neuro-heuristic voice recognition," 2016 Federated Conference on Computer Science and Information Systems, FedCSIS 2016, Proceedings, 11-14 September, Gdańsk, Poland, IEEE, 2016, pp. 487-490, DOI: 10.15439/2016F128.