<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Construction Features and Data Analysis by BP-SOM Modular Neural Network</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vladimir Gridin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vladimir Solodovnikov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Design information technologies Center Russian Academy of Sciences</institution>
          ,
          <addr-line>Odintsovo, Moscow region</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>114</fpage>
      <lpage>125</lpage>
      <abstract>
        <p>Data are a valuable resource with great potential for the recovery of useful analytical information. One of the most promising toolkits for solving data mining problems is neural network technology. The problem of choosing initial parameter values and ways of constructing a neural network is considered using the multilayer perceptron as an example, taking into account information about the task and the available raw data. The modular BP-SOM network, which combines a multilayer feed-forward network trained with the back-propagation (BP) algorithm and Kohonen's self-organizing maps (SOM), is suggested for visualizing the internal information representation and assessing the resulting architecture. The features of BP-SOM functioning, methods of rule extraction from trained neural networks and ways of interpreting the results are presented.</p>
      </abstract>
      <kwd-group>
        <kwd>neural network</kwd>
        <kwd>multilayer feedforward network</kwd>
        <kwd>Kohonen self-organizing maps</kwd>
        <kwd>modular network BP-SOM</kwd>
        <kwd>rules extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The architecture is assessed on the basis of knowledge of the problem and the available
source data. After that, the training and testing processes take place, and their
results are used to decide whether the network meets all the
requirements.</p>
      <p>
        Another complication in using the neural network approach is related to
the interpretation of the results and their preconditions. This problem appears
especially clearly for the multilayer perceptron (MLP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In effect, the neural network acts as
a "black box": the source data are sent to the input neurons and the result
is obtained from the output, but no explanation of the reasons for such a solution
is provided. The rules are contained in the weight coefficients, activation
functions and connections between neurons, but their structure is usually too
complex to understand. Moreover, in a multilayer network these
parameters may represent a non-linear, non-monotonic relationship between input and
target values. So, in general, it is not possible to isolate the influence of a
particular characteristic on the target value, because the effect is mediated by the
values of other parameters. Difficulties may also arise in using the
back-propagation (BP) learning algorithm, both with local extrema of the
error function and with the solution of certain classes of problems.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The architecture and initial values choice for the neural network</title>
      <p>
        The choice of a neural network architecture can be based on knowledge
of the problem being solved and the available source data, their dimension and
the scope of the samples. There are different approaches to choosing the initial
values of the neural network characteristics. For example, the "Network Advisor" of
the ST Neural Networks package by default offers one intermediate layer with
the number of elements equal to half the sum of the number of inputs
and outputs for the multilayer perceptron. In general, the problem of choosing
the number of hidden elements for the multilayer perceptron should account for
two opposing requirements: on the one hand, the number of elements should be
adequate for the task; on the other, it should not be too large, so as to provide
the necessary generalization capability and avoid overfitting. In addition, the
number of hidden units depends on the complexity of the function that
should be reproduced by the neural network, but this function is not known in
advance. It should be noted that as the number of elements increases, the
required number of observations also increases. As an estimate, it is possible to
use the principle of joint optimization of the empirical error and the complexity
of the model, which takes the following form [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
min{ L_err(h) + L_model(h) },   (1)
where h is the number of hidden elements and L_err(h) is the description length
of the training errors: for a network that reproduces the training set exactly, the
length of the error description equals zero. The second part, L_model(h), makes
sense as the amount of information needed to select a specific model from the set
of all possible ones. Accounting for it imposes the necessary constraints on the
complexity of the model by suppressing an excessive amount of tuning parameters.
      </p>
      <p>The accuracy of the neural network function approximation increases with
the number of neurons in the hidden layer.</p>
      <p>When there are h neurons in the hidden layer, the error can be estimated
as O(1/h). Since the number of outputs in the network does not exceed, and is
typically much smaller than, the number of inputs, the bulk of the weights in a
two-layer network is concentrated in the first layer, i.e. W ≈ n · h, where n is
the input dimension. In this case, the average approximation error expressed
through the total number of weights in the network is O(n/W).</p>
      <p>The network description is associated with the model complexity and
basically comes down to considering the amount of information transmitted
when its weight values are sent through some communication channel. If we
accept a hypothesis H about the network settings, its weights and the number
of neurons, the amount of information (in the absence of noise) while transferring
the weights will be −log P(H), where P(H) is the probability of this
event before the message arrives at the receiver input. For a given accuracy this
description requires about −log P(H) ∼ W bits. Therefore, the specific error for
one pattern associated with the complexity of the model can be estimated as
∼ W/P, where P is the number of patterns in the training set. The error decreases
monotonically with an increasing number of patterns. Haykin, using the
results from the work of Baum and Haussler, gives recommendations about the
volume of a training sample relative to the number of weighting coefficients,
taking into account the proportion of errors allowed during the test, which can
be expressed by the inequality P ≥ W/ε, where ε is the proportion of errors
allowed during testing. Thus, when 10% of errors are acceptable, the number of
training patterns must be 10 times greater than the number of available
weighting coefficients in the network.</p>
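      <p>The Baum-Haussler style bound above reduces to simple arithmetic; a minimal sketch (the function name is an assumption):</p>
      <preformat>
```python
def required_patterns(n_weights, error_fraction):
    # P >= W / eps: with eps = 0.1 (10% test errors allowed), the training
    # set must contain 10 times more patterns than the network has weights.
    return int(round(n_weights / error_fraction))

print(required_patterns(500, 0.1))  # 5000
```
      </preformat>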
      <p>Thus, both components of the network generalization error from expression
(1) have been considered. Importantly, these components depend differently
on the network size (number of weights), which implies the possibility of choosing
an optimal size that minimizes the total error:
E ∼ n/W + W/P ≥ 2 · √(n/P),   (2)
where n and h are the numbers of neurons in the input and hidden layers, W is the
number of weights and P is the number of patterns in the training sample. The
minimum error (the equality sign) is achieved at the optimal number of weights
W ∼ √(n · P),   (3)
which corresponds to h ∼ W/n ∼ √(P/n) neurons in the hidden layer.</p>
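      <p>The optimum above can be checked numerically; a small sketch under the stated scaling assumptions (the function name is illustrative):</p>
      <preformat>
```python
import math

def optimal_size(n_inputs, n_patterns):
    # W ~ sqrt(n * P) minimizes the bound n/W + W/P, whose minimum value
    # is 2 * sqrt(n / P); hidden units follow from W ~ n * h.
    w_opt = math.sqrt(n_inputs * n_patterns)
    h_opt = max(1, round(w_opt / n_inputs))
    bound = n_inputs / w_opt + w_opt / n_patterns
    return w_opt, h_opt, bound

w, h, e = optimal_size(10, 1000)
print(round(w), h, round(e, 3))  # 100 10 0.2
```
      </preformat>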
    </sec>
    <sec id="sec-3">
      <title>Evaluation of the resulting architecture and parameters</title>
      <p>
        After the network architecture has been selected, the learning process is carried
out; in particular, for a multilayer perceptron this may be the error back-propagation
(BP) algorithm. One of the most serious problems is that the network is trained
to minimize the error on the training set rather than the error that can be
expected when the network processes completely new patterns. Thus, in the
absence of an ideal, infinitely large training sample, the training error will differ
from the generalization error for the previously unknown model of the
phenomenon [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Since the generalization error is defined for data not included in the
training set, the solution may consist in separating all the available data into
two sets: a training set, used to fit specific values of the weights, and a
validation set, used to evaluate the predictive ability of the network and to
select a model of optimal complexity. The training process is commonly stopped
by examining "learning curves", which track the dependence of the learning and
generalization errors on the neural network size [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. The optimum
corresponds to local minima and to points where the graphs approach their
asymptotes. Figure 1 shows the stop point, which corresponds to the minimum of
the validation error (dash-dotted line), while the training error (solid line)
keeps going down.
      </p>
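      <p>The stopping rule read off the learning curves can be sketched as follows; this is an illustrative fragment, with the patience window being an assumption rather than something prescribed in the text:</p>
      <preformat>
```python
def early_stopping_epoch(val_errors, patience=3):
    # Return the epoch of the validation-error minimum: scanning stops once
    # the hold-out error has not improved for `patience` epochs, even though
    # the training error may still be decreasing.
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if best_err > err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited == patience:
                break
    return best_epoch

curve = [0.9, 0.6, 0.4, 0.35, 0.37, 0.40, 0.45]
print(early_stopping_epoch(curve))  # 3
```
      </preformat>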
      <p>
        Another class of learning curves uses the dependence of internal properties
of the neural network on its size, which is then mapped to the generalization
error. For example, in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] the analysis considers the internal representation of the problem
being solved and the relationship between the training error and the maximum
sum of the absolute synapse weights per network neuron.
There are also variants of generalized curves based on the dependence of
the wave criterion on the neural network size [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or on a comparison of
the average absolute values of the synapse weights [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>In simplified form, the following criteria can be formulated to assess an
already constructed neural network model:
– if the training error is small and the testing error is large, the network
contains too many synapses;
– if both the training and testing errors are large, the number of synapses
is too small;
– if all the synapse weights are too large, there are too few
synapses.</p>
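      <p>The three criteria can be encoded directly; the thresholds in this sketch are illustrative assumptions, not values given in the text:</p>
      <preformat>
```python
def diagnose(train_error, test_error, max_weight,
             err_threshold=0.2, weight_threshold=10.0):
    # Encode the simplified assessment criteria listed above.
    small_train = err_threshold > train_error
    small_test = err_threshold > test_error
    if small_train and not small_test:
        return "too many synapses"
    if not small_train and not small_test:
        return "too few synapses"
    if max_weight > weight_threshold:
        return "too few synapses (weights too large)"
    return "model looks adequate"

print(diagnose(0.05, 0.5, 1.0))  # too many synapses
```
      </preformat>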
      <p>After the neural network model has been evaluated, a decision is made
about the necessity of changing the number of hidden elements
in one direction or the other, and the learning process is repeated. It is
worth mentioning that modified decision trees based on first-order
predicate logic can be applied as a means of decision-making support
and of automating the construction of the neural network structure.</p>
    </sec>
    <sec id="sec-4">
      <title>Rules extraction and results interpretation</title>
      <p>Generally speaking, there are two approaches to extracting rules from multilayer
neural networks. The first approach is based on the extraction of global rules that
characterize the output classes directly through the input parameter values. An
alternative is the extraction of local rules, separating the multilayer network into a set of
single-layer networks. Each extracted local rule characterizes a separate hidden
or output neuron based on its weighted connections with other elements. The rules
are then combined into a set that determines the behavior of the whole network.</p>
      <p>
        The NeuroRule algorithm is applicable to rule extraction from trained
multilayer neural networks such as the perceptron. This algorithm performs
network pruning and identifies the most important features. However, it places
quite strict limitations on the architecture, the number of elements, the connections
and the type of activation functions. TREPAN-type algorithms can be highlighted
as an alternative approach: they extract structured knowledge, in the form of
a decision tree, not only from extremely simplified neural networks but also
from arbitrary classifiers [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, this approach does not take into account structural
features that could introduce additional information.
      </p>
      <p>
        The solution of such problems can be based on the use of the
modular neural network BP-SOM [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
    </sec>
    <sec id="sec-5">
      <title>Modular neural network BP-SOM</title>
      <sec id="sec-5-1">
        <title>Network architecture</title>
        <p>
          The main idea is to increase the similarity of the hidden elements' reactions
while processing patterns from the sample that belong to the same class.
The traditional feed-forward network architecture [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], in particular
the multilayer perceptron with the back-propagation (BP) learning algorithm,
is combined with Kohonen self-organizing maps (SOM) [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], where each hidden
layer of the perceptron network is associated with a certain self-organizing map.
The structure of such a network is shown in Figure 2. The training of such a
network combines features which are specific to the learning rules of its component parts [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. First, the
initial vector from the training sample is fed to the input of the network and a
forward pass is carried out. At the same time, the activation result of the neurons
in each hidden layer is used as the vector of input values for the corresponding
SOM network. Training of the SOM components is carried out in the usual way
and ensures their self-organization. This self-organization is then used to account
for classes, whose tags are assigned to each element of the Kohonen maps.
For this purpose, a count is kept of the number of times each SOM neuron became
the winner and of the class the corresponding initial training vector belongs to.
The winner is the SOM neuron whose weight vector is the closest, in terms of the
Euclidean distance, to the vector of output values of the hidden layer neurons.
The most common class is taken as the mark. Reliability is calculated as the ratio
of the number of occurrences of the class mark to the total number of victories of
the neuron; for example, if a SOM neuron won 4 times for class A and 2 times for
class B, the label of class A is selected with certainty 4/6. The total accuracy of the
self-organizing map is equal to the average reliability of all elements of the map.
SOM also allows data visualization and displays the areas of the various classes
(Fig. 2). The multilayer perceptron component is trained by an algorithm similar
to back-propagation (BP), minimizing the aggregate square error
E = 1/2 · Σ_o (d_o − y_o)^2,   (4)
where the index o runs through all the outputs of the multilayer network, d_o is
the desired output of neuron o, and y_o is the current output of neuron o in the
last layer. This error is transferred through the network in the opposite direction,
from the output to the hidden layers. An additional error component E_SOM is
also introduced for the neurons of the hidden layers; it is based on information
about the class of the input vector and takes into account the self-organizing map
data. Thus, in the SOM corresponding to the current hidden layer, a search takes
place for a special element v. This element should be the closest, in terms of the
Euclidean distance, to the output vector a of the hidden layer, and should carry
the same class label as the input vector. The distance between the detected vector
v and the vector a is taken as the error value E_SOM and accounted for all of the
hidden layer neurons. If the element v is not found, the error value E_SOM is
assumed to be 0. Thus, the total error for the neurons of the hidden layer takes
the form:
E_hidden = (1 − α) · E_BP + α · r · E_SOM,   (5)
where E_BP is the error of the perceptron (from the back-propagation algorithm);
E_SOM is the error of the winner neuron of the Kohonen network; r is the
reliability factor of the winner neuron of the Kohonen network; α is the influence
coefficient of the Kohonen network errors (if its value equals 0, the rule reduces
to the original BP).
        </p>
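        <p>The combined error of expression (5) is a one-line computation; a minimal sketch with assumed argument names:</p>
        <preformat>
```python
def hidden_layer_error(e_bp, e_som, reliability, alpha):
    # Expression (5): (1 - alpha) * E_BP + alpha * r * E_SOM.
    # With alpha = 0 the rule reduces to the original back-propagation.
    return (1.0 - alpha) * e_bp + alpha * reliability * e_som

print(hidden_layer_error(0.4, 0.2, 4 / 6, 0.25))
print(hidden_layer_error(0.4, 0.2, 4 / 6, 0.0))  # plain BP: 0.4
```
        </preformat>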
        <p>
          The results of the Kohonen maps' self-organization are used to change
the weight coefficients in the process of network training. This provides an effect
in which the activations of the hidden layer neurons become more similar
across all other cases of processing vectors of the same class [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. A SOM of
dimension 7 × 7 is shown in Figure 3.
        </p>
        <p>It characterizes the reaction of the hidden layer neurons of a BP-SOM
network trained to solve a two-class classification problem. The map on the left
corresponds to the basic back-propagation (BP) algorithm, the one on the right
to training under the influence of the Kohonen map, i.e. BP-SOM training. Here
white cells correspond to class A and black cells to class B. In turn, the size of
the shaded region of a cell determines the accuracy of the result: a completely
white or completely black cell is characterized by an accuracy of 100%.</p>
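        <p>The labeling and reliability computation for a single SOM cell, as described above (4 wins for class A and 2 for class B yield label A with certainty 4/6), can be sketched as:</p>
        <preformat>
```python
from collections import Counter

def label_cell(win_classes):
    # Assign a class label to a SOM cell from its win statistics: the most
    # frequent class wins; reliability is its share of the total victories.
    counts = Counter(win_classes)
    label, wins = counts.most_common(1)[0]
    return label, wins / sum(counts.values())

label, reliability = label_cell(["A", "A", "A", "A", "B", "B"])
print(label, round(reliability, 3))  # A 0.667
```
        </preformat>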
        <p>This can ensure the structuring and visualization of the information extracted
from data, improve the perception of the phenomenon under study and help in the
process of selecting the network architecture. For example, if it is impossible to
isolate areas on the SOM for the individual classes, then there are not enough
neurons and their number should be increased. Moreover, this approach can
simplify the extraction of rules from an already trained neural network and present
the result as a hierarchical structure of consistent "if-then" rules.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Rules extraction</title>
        <p>
          As an example, let's consider a small test BP-SOM network trained
to solve the classification problem defined by the following logical function
[
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]:
        </p>
        <p>f(x0, x1, x2) = (x0 ∧ ¬x1 ∧ ¬x2) ∨ (¬x0 ∧ x1 ∧ ¬x2) ∨ (¬x0 ∧ ¬x1 ∧ x2)</p>
        <p>This function evaluates to True (class 1) only when exactly one of the
arguments is True; otherwise the function value is False (class 0). A two-layer
neural network can be used for the implementation, consisting of three input
elements, three neurons in the hidden layer, and two neurons in the resulting
output layer. The dimension of the Kohonen map for the intermediate layer is
3 × 3 (Figure 4).</p>
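        <p>The target concept can be verified exhaustively; a short sketch (the name f mirrors the formula above):</p>
        <preformat>
```python
def f(x0, x1, x2):
    # True (class 1) iff exactly one of the three arguments is true.
    return (x0 and not x1 and not x2) or \
           (not x0 and x1 and not x2) or \
           (not x0 and not x1 and x2)

for x0 in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            assert bool(f(x0, x1, x2)) == (x0 + x1 + x2 == 1)
print("exactly-one-true concept verified")
```
        </preformat>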
        <p>After training, four elements of the Kohonen map acquired class labels
with certainty 1, while five elements were left without a label and their
reliability equals 0 (Figure 4).</p>
        <p>
          One of the methods for extracting rules from such a neural network is an
algorithm designed for classification problems with digital inputs.
It consists of the following two steps [
          <xref ref-type="bibr" rid="ref10 ref11">10,11</xref>
          ]:
1. Search the training set for groups of patterns that could potentially be
combined into individual subsets, each of which is connected
to one element of the Kohonen map (Table 1).
2. Examine each of the subsets to identify the inputs that have a constant
value within the subset. For example, in the subgroup associated
with the element v1 all the attributes x0, x1 and x2 have the constant value
0, and for the element v3 the value 1, respectively.
Table 1 columns: SOM element, Class, x0, x1, x2.
        </p>
        <p>Thus, it is possible to obtain the following two rules from Table 1:
– (x0 = 0 ∧ x1 = 0 ∧ x2 = 0) → v1;
– (x0 = 1 ∧ x1 = 1 ∧ x2 = 1) → v3.</p>
        <p>However, it is rather problematic to use this method to derive rules for
the elements v7 and v9. Of course, it is possible to compose a disjunction of all
the possible options for each SOM element, but such rules would be hard to
comprehend.</p>
        <p>Additional available information is the sum of the attributes for each
pattern from the training sample. It is easy to notice that each SOM element is
responsible for a certain value of this sum (Table 2).</p>
        <p>Table 2. SOM element v1: sum 0, class 0; v3: sum 3, class 0; v7: sum 1,
class 1; v9: sum 2, class 0.</p>
        <p>
          Another approach is to use the
attribute values of the input vectors and their weight coefficients to extract
rules corresponding to the Kohonen map elements. This can be done by
propagating the minimum and maximum values of the neuron activation back
to the previous layer, i.e. by applying the function f^-1 (the inverse of the
activation function) to the output neuron value [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]:
        </p>
        <p>f^-1(y_i) = f^-1(f(Σ_j w_ij · x_j + b_i)) = Σ_j w_ij · x_j + b_i,   (6)
where y_i is the output of the i-th neuron of the current layer and x_j is the
output of the j-th neuron of the previous layer. Assuming that the sigmoid was
used as the neuron activation function, we get:
f^-1(y_i) = −ln(1/y_i − 1).   (7)</p>
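        <p>Expression (7) is the logit function; a quick round-trip check (an illustrative sketch, not code from the paper):</p>
        <preformat>
```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_inverse(y):
    # Expression (7): f^-1(y) = -ln(1/y - 1) recovers the pre-activation
    # weighted sum plus bias from a sigmoid output.
    return -math.log(1.0 / y - 1.0)

print(round(sigmoid_inverse(sigmoid(0.7)), 6))  # 0.7
```
        </preformat>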
        <p>Additionally, it is known that the self-organizing map elements connected
to the elements of the first hidden layer of the perceptron respond to the
proximity of their weight vectors to the outputs of the hidden layer neurons.
Therefore, during the construction of rules for each SOM element, it is proposed
to replace the back-propagated neuron activations of the hidden layer with the
values of the weight vector of the self-organizing map element. For example, if
the hidden layer contains neuron j and the weight between this neuron and
SOM element v is denoted by m_vj, then the restriction takes the form:
1 ≈ (Σ_i w_ji · x_i) / (f^-1(m_vj) − b_j).   (8)</p>
        <p>Such restrictions, obtained for all the neurons of the hidden layer, can be
used to construct rules of the following form:
(∧_j (f^-1(m_vj) − b_j ≈ Σ_i w_ji · x_i)) → (class = class(v)).   (9)
Similar rules are required for all SOM elements whose reliability exceeds a
certain threshold.</p>
        <p>If we apply the considered method to the initial example, four sets of
restrictions are obtained. Each set includes three restrictions, corresponding
to the number of neurons in the hidden layer of the perceptron.</p>
        <p>For SOM element v1:
– 1 ≈ 687 * x0 + 687 * x1 + 687 * x2;
– 1 ≈ 738 * x0 + 738 * x1 + 738 * x2;
– 1 ≈ 1062 * x0 + 1062 * x1 + 1062 * x2.
Taking into account that x0, x1, x2 ∈ {0, 1}, the best match is achieved when
x0 = 0, x1 = 0, x2 = 0.</p>
        <p>For element v3 all restrictions coincide and have the form:
– 1 ≈ 0.33 * x0 + 0.33 * x1 + 0.33 * x2.
Thus, the values are as follows: x0 = 1, x1 = 1, x2 = 1.</p>
        <p>For element v7 the restrictions coincide and take the form:
– 1 ≈ x0 + x1 + x2.
This corresponds to the case when exactly one of the attributes is equal to 1.</p>
        <p>Restrictions for element v9:
– 1 ≈ 0.5 * x0 + 0.5 * x1 + 0.5 * x2.
This condition characterizes the case when two of the three attributes are 1.</p>
        <p>Thus, if all the restrictions are generalized, the following set of rules is
obtained:
– (x0 + x1 + x2 ≈ 0) → (Class = 0);
– (x0 + x1 + x2 ≈ 3) → (Class = 0);
– (x0 + x1 + x2 ≈ 1) → (Class = 1);
– (x0 + x1 + x2 ≈ 2) → (Class = 0).
It is easy to notice that this set correctly describes all the elements of the
self-organizing network.</p>
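        <p>The generalized sum-based rules can be checked against the target function over the whole truth table; a small sketch assuming the class assignments read off Table 2:</p>
        <preformat>
```python
def rule_class(x0, x1, x2):
    # Extracted rules: class 1 iff the attribute sum is (approximately) 1.
    return 1 if x0 + x1 + x2 == 1 else 0

def target(x0, x1, x2):
    return int((x0 and not x1 and not x2) or
               (not x0 and x1 and not x2) or
               (not x0 and not x1 and x2))

mismatches = sum(rule_class(a, b, c) != target(a, b, c)
                 for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(mismatches)  # 0
```
        </preformat>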
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>Approaches for selecting the initial values of the neural network parameters
were considered, along with the analysis of the training process, its stopping
criteria and the evaluation of the resulting architecture. Combining different
neural network architectures, such as a multilayer perceptron with the
back-propagation learning algorithm and Kohonen's self-organizing maps, can bring
additional possibilities to the learning process and to the extraction of rules from
a trained neural network. Self-organizing maps are used both for information
visualization and for influencing the weight changes during the network training
process. This provides an effect in which the activations of the hidden layer
neurons become increasingly similar across all the cases of processing vectors of
the same class. It ensures the structuring of the extracted information, with the
main purpose of improving the perception of the studied phenomena, assisting
in the process of selecting the network architecture and simplifying rule
extraction. The results could be used for data processing and the identification of
hidden patterns in information storages, which could become the basis for
prognostic and design solutions.</p>
      <p>Acknowledgements. This work was carried out with the financial support of the RFBR
(grant 15-07-01117-a).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ezhov</surname>
            <given-names>A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shumskiy</surname>
            <given-names>S. A.</given-names>
          </string-name>
          <article-title>Neyrokomp'yuting i ego primeneniya v ekonomike i biznese</article-title>
          . Moscow, MEPhI Publ.,
          <year>1998</year>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bishop</surname>
            <given-names>C. M.</given-names>
          </string-name>
          <article-title>Neural Networks for Pattern Recognition</article-title>
          . Oxford University Press.
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Watanabe</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shimizu</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Relationships between internal representation and generalization ability in multi layered neural network for binary pattern classification problem /</article-title>
          <source>Proc. IJCNN</source>
          <year>1993</year>
          , Nagoya, Japan,
          <year>1993</year>
          . Vol.
          <volume>2</volume>
          . pp.
          <fpage>1736</fpage>
          -
          <lpage>1739</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cortes</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jackel L. D.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Solla</surname>
            <given-names>S. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denker</surname>
            <given-names>J. S.</given-names>
          </string-name>
          <article-title>Learning curves: asymptotic values and rate of convergence /</article-title>
          <source>Advances in Neural Information Processing Systems</source>
          <volume>7</volume>
          (
          <year>1994</year>
          ). MIT Press,
          <year>1995</year>
          . pp.
          <fpage>327</fpage>
          -
          <lpage>334</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lar'ko</surname>
            <given-names>A. A.</given-names>
          </string-name>
          <article-title>Optimizaciya razmera nejroseti obratnogo rasprostraneniya</article-title>
          . [Electronic resource]. http://www.sciteclibrary.ru/rus/catalog/pages/8621.html.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Caregorodcev</surname>
            <given-names>V.G.</given-names>
          </string-name>
          <article-title>Opredelenie optimal'nogo razmera nejroseti obratnogo rasprostraneniya cherez sopostavlenie srednih znachenij modulej vesov sinapsov</article-title>
          . Materialy 14 mezhdunarodnoj konferencii po nejrokibernetike, Rostov-na-Donu,
          <year>2005</year>
          . T.2. S.
          <volume>60</volume>
          -
          <fpage>64</fpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gridin</surname>
            <given-names>V.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solodovnikov</surname>
            <given-names>V.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evdokimov</surname>
            <given-names>I.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Filippkov</surname>
            <given-names>S. V.</given-names>
          </string-name>
          <article-title>Postroenie derev'ev reshenij i izvlechenie pravil iz obuchennyh nejronnyh setej / Iskusstvennyj intellekt i prinyatie reshenij</article-title>
          <year>2013</year>
          . 4 Str.
          <fpage>26</fpage>
          -
          <lpage>33</lpage>
          . (in Russian)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Weijters</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <article-title>The BP-SOM architecture and learning rule</article-title>
          .
          <source>Neural Processing Letters</source>
          ,
          <volume>2</volume>
          ,
          <fpage>13</fpage>
          -
          <lpage>16</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>T.</given-names>
            <surname>Kohonen</surname>
          </string-name>
          .
          <article-title>Self-Organization and Associative Memory</article-title>
          , Berlin: Springer Verlag,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Weijters</surname>
            <given-names>Ton</given-names>
          </string-name>
          , Antal van den Bosch, Jaap van den Herik.
          <article-title>Interpretable neural networks with BP-SOM</article-title>
          . Machine Learning:
          <source>ECML-98, Lecture Notes in Computer Science</source>
          Volume
          <volume>1398</volume>
          ,
          <year>1998</year>
          , pp
          <fpage>406</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>J.</given-names>
            <surname>Eggermont</surname>
          </string-name>
          ,
          <article-title>Rule-extraction and learning in the BP-SOM architecture</article-title>
          . Thesis,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Sebastian B.</given-names>
            <surname>Thrun</surname>
          </string-name>
          .
          <article-title>Extracting provably correct rules from artificial neural networks</article-title>
          .
          <source>Technical Report IAI-TR-93-5</source>
          , University of Bonn, Department of Computer Science,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>