<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Analysis of the Modular Topology of Hybrid Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>O.I. Chumachenko</string-name>
          <email>chumachenko@tk.kpi.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K.D. Riazanovskiy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A.T. Kot</string-name>
          <email>anatoly.kot@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Technical Cybernetic Department, National Technical University of Ukraine “Ihor Sikorsky Kyiv Polytechnic Institute”, Kyiv</institution>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <addr-line>ORCID 0000-0003-3006-7460</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The report discusses the structures of module composition and the problems associated with their learning. An optimal algorithm for module learning for the classification problem is considered. Examples of specific structures are given. The structural-parametric synthesis of an ensemble of neural network modules is described. The results of training the modules and ensembles are presented, as well as a comparison with the results of training individual neural networks.</p>
      </abstract>
      <kwd-group>
        <kwd>neural network</kwd>
        <kwd>module</kwd>
        <kwd>ensemble</kwd>
        <kwd>topology</kwd>
        <kwd>classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The classification problem is one of the most frequent
tasks arising in the field of machine learning, and it is common
in many areas of life. Researchers around the world are
developing tools and algorithms to solve this problem
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Because of its diversity, the data classification problem
cannot always be solved with the same tools and algorithms;
among the most successful of these are neural networks (NN) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Due
to their flexibility and scalability, they can solve the most
diverse and complex problems that are beyond the power of
classical machine learning algorithms.
      </p>
      <p>For more difficult tasks, a new branch of NN development
has been to combine several networks into one module: one large
network that consists of several base networks. To increase
accuracy further, such modules can in turn be combined into
ensembles. The shortcomings of each module are then compensated
by the others, which has a positive impact on the final result
of learning.</p>
    </sec>
    <sec id="sec-2">
      <title>PROBLEM STATEMENT</title>
      <p>The goal of this report is to investigate the possible
structures of the module, the topologies of the networks
included in the module, and the training of the module for
solving the classification problem. A further task is to
combine the NN modules into an ensemble.</p>
    </sec>
    <sec id="sec-3">
      <title>PROBLEM SOLUTION</title>
      <sec id="sec-3-1">
        <title>A. Modules</title>
        <p>The module topology involves the sequential combination
of several different neural network architectures. In general,
the module operates in the same way as an individual NN. Its
advantage is the combination of various data transformations,
which makes it possible to obtain more accurate results. The
module topology is presented in Fig. 1.</p>
        <p>Fig. 1. Module topology</p>
        <p>Modern technologies make it possible to operate with huge
networks as with simple elements for building something larger,
like LEGO bricks. The abstraction level of modern software is
very high and everything works at an intuitive, understandable
level, so the practical implementation of NN modules is a
fairly easy task.</p>
        <p>The main problem of module composition is the
learning algorithm. Inside the module there are several
networks, which can therefore be trained in different ways:
• all networks are trained together;
• some networks are trained together, some are trained
separately;
• each network is trained separately on the training sets and
then the networks are combined into a module.</p>
        <p>
          The simplest way to train the networks in a module is to use a
genetic algorithm [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Each network has a certain number of
parameters. For each parameter a certain number of bits is
allocated, and the parameters of all networks are combined
into one chromosome in their bit representation. After this,
the genetic algorithm proper is performed:
1) generate the initial population;
2) compute fitness;
3) selection;
4) crossover;
5) mutation;
6) compute fitness;
7) if the population has converged, stop; otherwise go
to 3.
        </p>
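        <p>A minimal sketch of this bit-chromosome scheme is given below,
in Python; the population size, mutation rate, generation budget, and the toy
fitness function are illustrative stand-ins, since the report does not fix them.</p>
        <preformat>
import random

N_PARAMS = 300   # e.g. 3 networks with 100 parameters each
BITS = 4         # bits per parameter, giving a 1200-gene chromosome
CHROM_LEN = N_PARAMS * BITS

def decode(chrom):
    # Map each 4-bit gene to a weight in [-1, 1].
    step = 2.0 / (2 ** BITS - 1)
    return [int("".join(map(str, chrom[i:i + BITS])), 2) * step - 1.0
            for i in range(0, CHROM_LEN, BITS)]

def fitness(chrom):
    # Stand-in for "module accuracy on the training set".
    return -sum(p ** 2 for p in decode(chrom))

def evolve(pop_size=50, generations=100, p_mut=0.01):
    # 1) generate the initial population
    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # 2, 6) compute fitness; 3) selection of the better half
        parents = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while pop_size > len(children):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, CHROM_LEN)       # 4) one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (p_mut > random.random())     # 5) bit-flip mutation
                     for g in child]
            children.append(child)
        pop = children   # 7) here a fixed budget stands in for a convergence test
    return max(pop, key=fitness)
</preformat>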
        <p>As you can see, this training algorithm belongs to the first
type: all networks are trained together. It has several
disadvantages. Consider, for example, a module of three
networks, each of which has 100 parameters. If we allocate 4 bits
for each parameter, the chromosome will
contain 3 * 100 * 4 = 1200 genes. The convergence of this
algorithm would require a tremendous amount of time and
resources, so in this case it is inefficient.</p>
        <p>To maintain a balance between learning speed and accuracy, this
report proposes using a network based on
unsupervised learning as the first network. Its output goes to
the input of the base network, which is trained separately
under supervised learning. After the base network, another
network can be placed to refine the result.</p>
        <p>The advantage of the structure presented above is that the
first network, trained without a teacher, learns very quickly
compared to the large base networks. It performs
preprocessing (clustering, dimensionality reduction) of the
input data, which ultimately has a positive effect on the
subsequent base networks. This preprocessing reduces the
number of layers and of neurons in the layers of the base
network, so that it is trained much faster and more
accurately.</p>
        <p>In some simple cases, a network based on
unsupervised learning will already produce fairly accurate
results, so the subsequent small base network will only refine
them. In total, the training time of two such small networks is
much shorter than that of one large base network.</p>
        <p>
          In this report, the use of the Kohonen network [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as the
first network is proposed for solving the classification
problem. It separates the input data into groups,
and the subsequent base network can determine the correct
class label using the “hint” of the first network. The Kohonen
network is trained very quickly. It reduces the dimension
of the input data and also determines the group of each input
sample. In this case, samples that belong to the same class fall
into one group or into neighboring groups. For more accurate work
of the Kohonen network, the use of an interpolation algorithm is
required during training.
        </p>
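        <p>A compact sketch of such a Kohonen layer, written from scratch in
Python with NumPy, is shown below. The winner-take-all update is the core of
the method; the learning-rate schedule is an assumption, and the
neighborhood/interpolation refinement mentioned above is omitted for brevity.</p>
        <preformat>
import numpy as np

def train_kohonen(X, n_units=3, epochs=100, lr0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.random((n_units, X.shape[1]))       # one weight vector per unit
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)       # decaying learning rate
        for x in X:
            winner = np.argmin(((W - x) ** 2).sum(axis=1))
            W[winner] += lr * (x - W[winner])   # pull the winner toward the sample
    return W

def group_of(W, x):
    # The "hint" passed on to the base network: index of the nearest unit.
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))
</preformat>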
        <p>As the base network, various networks can be used (e.g.
perceptron, radial basis function network, NEFClassM, etc.).</p>
        <p>The full topology of the proposed module is presented in
Fig. 2.</p>
        <p>
          To refine the classification result, bidirectional associative
memory [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] can be used after the base network.
        </p>
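        <p>Assuming the third-party minisom package and scikit-learn are
available, a module of this kind can be composed roughly as follows; the
helper name make_module is illustrative, and the refining network after the
base network is omitted:</p>
        <preformat>
import numpy as np
from minisom import MiniSom
from sklearn.neural_network import MLPClassifier

def make_module(X_train, y_train):
    # Stage 1: unsupervised front end -- a small SOM trained quickly.
    som = MiniSom(3, 1, X_train.shape[1], sigma=0.5, learning_rate=0.5)
    som.train_random(X_train, 500)
    # Replace each sample with the coordinates of its winning unit.
    Z = np.array([som.winner(x) for x in X_train], dtype=float)
    # Stage 2: a small supervised base network on the reduced features.
    base = MLPClassifier(hidden_layer_sizes=(6,), activation="logistic",
                         solver="adam", max_iter=2000)
    base.fit(Z, y_train)
    return som, base

def module_predict(som, base, X):
    Z = np.array([som.winner(x) for x in X], dtype=float)
    return base.predict(Z)
</preformat>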
      </sec>
      <sec id="sec-3-2">
        <title>B. Ensembles</title>
        <p>In simple cases, the construction of a single module may
be sufficient to achieve the required accuracy, but to solve
complex problems it is necessary to use several modules
combined into one ensemble.</p>
        <p>The construction of an ensemble allows you to look at the
problem from the points of view of different modules. Using
modules of the kind presented in Fig. 2, instead of simple
neural networks, as elements of the ensemble has a set of
advantages: it requires less memory, takes less time to train
and, as will be shown in the next part, yields greater accuracy
than an ensemble consisting of individual neural networks.</p>
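        <p>The report forms the ensemble using estimates of each module's
individual contribution; as a simpler stand-in, the sketch below combines
the class predictions of several trained modules by majority voting (an
assumption, not the paper's exact combination rule):</p>
        <preformat>
import numpy as np

def ensemble_predict(modules, X):
    # Each module exposes a predict(X) returning integer class labels.
    votes = np.stack([m.predict(X) for m in modules])  # (n_modules, n_samples)
    return np.array([np.bincount(votes[:, i]).argmax()
                     for i in range(votes.shape[1])])
</preformat>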
      </sec>
    </sec>
    <sec id="sec-4">
      <title>EXAMPLE ON A REAL DATA SET</title>
      <p>
        For the experiment, modules consisting of
two networks were used: the first is the Kohonen network and the
second is the base network. The following base networks were used:
perceptron, radial basis function network, counter-propagation
network, probabilistic neural network,
NEFClassM, and Naïve Bayes classifier. In order to show a clear
advantage of the modular topology over individual NNs, a
comparison was made with the results obtained in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. For the
experiment, the same Wine data set was used: 178 samples,
13 features represented by real numbers greater than zero, and 3
classes; 80% of the data set was taken for the training sample and
20% for the test sample. In this experiment, no data
standardization was performed.
      </p>
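      <p>The same Wine data set ships with scikit-learn, so the described
split can be reproduced as follows (the random seed is an assumption):</p>
      <preformat>
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)    # 178 samples, 13 features, 3 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)   # 80% train / 20% test, no scaling
</preformat>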
      <p>1. Kohonen network. At the beginning of the study, the
Kohonen network was trained. The number of output
neurons in the network is 3; in this way, after preprocessing
by this network, the data dimension decreased by more than
4 times. The network training time is 3.6 ms. On large data sets
the time will obviously be longer, but it is not comparable with
the dozens of minutes, hours, or days of training large networks
with complex architectures.</p>
      <p>
        2. Perceptron. Reducing the number of input neurons
to 3 reduced the number of neurons in the hidden layer from
48, as in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], to 6, that is, by 8 times. The activation function of
the hidden-layer neurons is the logistic sigmoid and that of the
output layer is the softmax function. The optimization algorithm is
Adam; the loss function is cross-entropy.
      </p>
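      <p>As a hedged illustration (the report does not name a framework),
this configuration corresponds to the following Keras model:</p>
      <preformat>
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),               # 3 inputs from the Kohonen stage
    tf.keras.layers.Dense(6, activation="sigmoid"),  # logistic sigmoid hidden layer
    tf.keras.layers.Dense(3, activation="softmax"),  # one output per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # cross-entropy loss
              metrics=["accuracy"])
</preformat>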
      <p>
        3. Radial basis function network (RBFN). The number
of neurons in the hidden layer is 3, halved
compared with [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A Gaussian
function was selected as the radial basis function. The Gaussian
centers of the hidden-layer neurons are initialized with the centers of 3
clusters found by the k-means algorithm on the training
sample. The optimization algorithm is Adam; the loss function
is cross-entropy.
      </p>
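      <p>A sketch of this RBFN in Python follows; for brevity the output
weights are fitted here with scikit-learn's logistic regression rather than
Adam, and the width parameter gamma and the toy data are assumptions:</p>
      <preformat>
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def rbf_features(X, centers, gamma=1.0):
    # Gaussian activations of the 3 hidden neurons for each sample.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X_train = rng.random((142, 3))       # stand-in for the SOM-reduced features
y_train = rng.integers(0, 3, 142)    # stand-in class labels

km = KMeans(n_clusters=3, n_init=10).fit(X_train)   # centers as in the paper
Phi = rbf_features(X_train, km.cluster_centers_)
clf = LogisticRegression(max_iter=1000).fit(Phi, y_train)
</preformat>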
      <p>4. Counter-propagation network (CPN). The number
of neurons in the input layer is 3 and in the hidden layer is 3.
Before the start of training, the input vectors were normalized.
The weights of the Kohonen layer were initialized with
random values from the interval (0, 1) and normalized.</p>
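      <p>A hedged sketch of one CPN training step under this configuration
(the learning rates a and b are assumptions):</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)
W_k = rng.random((3, 3))                           # Kohonen layer, weights in (0, 1)
W_k /= np.linalg.norm(W_k, axis=1, keepdims=True)  # ...and normalized
W_g = np.zeros((3, 3))                             # Grossberg (output) layer

def cpn_step(x, target, a=0.1, b=0.1):
    x = x / np.linalg.norm(x)          # input vectors normalized before training
    j = int(np.argmax(W_k @ x))        # winner: largest dot product
    W_k[j] += a * (x - W_k[j])         # Kohonen update toward the input
    W_g[j] += b * (target - W_g[j])    # Grossberg update toward the target
</preformat>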
      <p>5. Probabilistic neural network (PNN). The number of
neurons in the input layer is 3, in the first hidden layer 142
(one pattern neuron per training sample), and in the second
hidden layer 3.</p>
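      <p>The PNN needs no iterative training: the pattern layer stores the
training samples and the summation layer pools one Gaussian kernel per sample
into a score per class. A minimal sketch (the smoothing parameter sigma is an
assumption):</p>
      <preformat>
import numpy as np

def pnn_predict(X_train, y_train, X, sigma=0.1):
    preds = []
    classes = np.unique(y_train)        # 3 summation-layer neurons
    for x in X:
        # Pattern layer: one kernel per training sample (142 here).
        k = np.exp(-((X_train - x) ** 2).sum(axis=1) / (2 * sigma ** 2))
        scores = [k[y_train == c].sum() for c in classes]
        preds.append(int(classes[np.argmax(scores)]))
    return np.array(preds)
</preformat>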
      <p>
        6. NEFClassM. The number of input neurons is 3. For
each feature, three initial fuzzy sets were defined with the
names “small”, “medium”, and “large”. The rule layer contains
3 neurons. From the trained rule base, one best rule was
obtained for each class. The number of output neurons is 3. The
maximum number of generated rules is 50 instead of 40 as in
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The parameters of the fuzzy sets were trained by the
gradient method.
      </p>
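      <p>As an illustration of the fuzzification stage only (not a full
NEFClassM), each feature value can be mapped to the three named fuzzy sets
with triangular membership functions; the set boundaries are assumptions:</p>
      <preformat>
import numpy as np

def tri(x, a, b, c):
    # Triangular membership; a degenerate side becomes a shoulder.
    left = 1.0 if a == b else (x - a) / (b - a)
    right = 1.0 if b == c else (c - x) / (c - b)
    return float(np.clip(min(left, right), 0.0, 1.0))

def fuzzify(x, lo, hi):
    mid = (lo + hi) / 2.0
    return {"small":  tri(x, lo, lo, mid),
            "medium": tri(x, lo, mid, hi),
            "large":  tri(x, mid, hi, hi)}
</preformat>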
      <p>7. Naïve Bayes classifier. The distribution functions are
normal distributions. The prior of each class is the ratio of the
number of samples in the class to the total number of samples.</p>
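      <p>This matches the defaults of scikit-learn's GaussianNB, where the
priors are taken as the class frequencies of the training sample:</p>
      <preformat>
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X_train = rng.random((142, 3))       # stand-in for the SOM-reduced features
y_train = rng.integers(0, 3, 142)    # stand-in class labels

clf = GaussianNB().fit(X_train, y_train)  # normal likelihoods, frequency priors
</preformat>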
      <p>The learning results of the modules and all the networks
described above are presented in Table 1.</p>
      <p>
        As you can see, the simplification of the architecture and
the lack of preliminary standardization significantly affected
the accuracy of the base networks. At the same time,
the preprocessing by the Kohonen network gave very good
results that exceed those presented in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Subsequently, the individual contribution of each network
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was estimated and the ensemble pruning operation [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] was
performed. The results obtained are shown in Table 2.
      </p>
      <p>TABLE 2. NUMBER OF MISCLASSIFIED SAMPLES WITH PRUNED
ENSEMBLE OF INDIVIDUAL NN AND MODULES (pruned NN ensemble:
Naïve Bayes, NEFClassM, PNN)</p>
      <p>
        As the results showed, the accuracy of the pruned
ensemble of modules is higher than the accuracy of an
ensemble of individual networks without preprocessing. It
also exceeds the accuracy of the ensemble from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
At the same time, thanks to the simplified architecture of the
base networks, the learning time was significantly lower.
      </p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION</title>
      <p>The results showed that the use of a neural network
module, the first element of which is the Kohonen network
and the second a base network, makes it possible to obtain
accuracy indicators that exceed the corresponding indicators
of individual networks. At the same time, to achieve a given
accuracy, the total training time of the module is much lower
than that of a separate network with a complex
architecture. Simplification of the topology of the base
networks in the module also reduced the memory they
occupy.</p>
      <p>
        Due to these advantages, the construction of an ensemble
of neural network modules is a more efficient and faster
solution. As the results of the study showed, the pruned
ensemble of modules presented in this report has
accuracy indicators that exceed those of the individual
networks from [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], while requiring less memory and training
time.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wozniak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Graña</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Corchado</surname>
          </string-name>
          , “
          <article-title>A survey of multiple classifier systems as hybrid systems</article-title>
          ,” in Information Fusion, vol.
          <volume>16</volume>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>17</lpage>
          ,
          <year>March 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Tirumala</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          , “
          <article-title>Hierarchical data classification using deep neural networks</article-title>
          ,”
          <source>in Neural Information Processing</source>
          , Springer International Publishing,
          <year>2015</year>
          , pp.
          <fpage>492</fpage>
          -
          <lpage>500</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>X-F.</given-names>
            <surname>Gu</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J-P.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y-Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lin</surname>
          </string-name>
          , “
          <article-title>Data classification based on artificial neural networks</article-title>
          ,”
          <source>International Conference on Apperceiving Computing and Intelligence Analysis</source>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>226</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fernandez-Delgado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cernadas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amorim</surname>
          </string-name>
          , “
          <article-title>Do we need hundreds of classifiers to solve real world classification problems?</article-title>
          ”
          <source>in Journal of Machine Learning Research</source>
          ,
          <volume>15</volume>
          ,
          <year>2014</year>
          , pp.
          <fpage>3133</fpage>
          -
          <lpage>3181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mitchell</surname>
          </string-name>
          ,
          <source>Genetic Algorithms: An Overview. Complexity</source>
          , vol.
          <volume>1</volume>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>39</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Kohonen</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Honkela</surname>
          </string-name>
          , “Kohonen network,”
          <year>2007</year>
          , accessed
          <year>March 2012</year>
          . [Online]. Available: http://www.scholarpedia.org/article/Kohonen_network.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O. I.</given-names>
            <surname>Chumachenko</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. D.</given-names>
            <surname>Riazanovskiy</surname>
          </string-name>
          , “
          <article-title>Structural-parametric synthesis of neural network ensemble based on the estimation of individual contribution</article-title>
          ,”
          <source>Electronics and Control Systems</source>
          , No 59, pp.
          <fpage>66</fpage>
          -
          <lpage>77</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kosko</surname>
          </string-name>
          , “
          <article-title>Bidirectional associative memories</article-title>
          ,”
          <source>IEEE Transactions on Systems, Man, and Cybernetics</source>
          , vol.
          <volume>18</volume>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          , January/
          <year>February 1988</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>