Some approaches to improving the quality of artificial neural network training

Yefim Rozenberg, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, greatfime@gmail.com
Alexey Olshansky, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, lexolshans@gmail.com
Ignat Dovgerd, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, ignatikus@bk.ru
Gleb Dovgerd, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, christmas1409@yandex.ru
Alexander Ignatenkov, JSC VTB Capital AM, Moscow, Russia, a.ignatenkov@gmail.com
Paul Ignatenkov, Smolensk State University, Smolensk, Russia, beat.pi@gmail.com

Abstract: The paper deals with improving the quality of artificial neural network (ANN) training. The research covers a complex neural network consisting of a 2-dimensional Kohonen network and a Willshaw and von der Malsburg network, capable of solving scheduling problems in transport. Existing results of using optimal control theory for ANN training are analysed; the authors suggest a new technique based on direct neural control. Comparative error values during the training process using both the traditional methods and the new approach are presented. The new technique proves to be better than the traditional one for the considered neural networks.

Keywords: artificial neural network, Kohonen network, multilayered neural network, control

I. INTRODUCTION

Issues related to scheduling have always been of great significance for the railway industry. The most common scheduling tasks include routing, timetabling, volume planning, and combined timetabling and volume planning. Solving these tasks with strict methods runs into combinatorial complexity, exhaustive searches, insufficient computer memory, and time-consuming computations. In such cases a number of heuristic algorithms are used (the Monte Carlo algorithm, evolutionary algorithms, neural networks, etc.). The present paper aims to illustrate how neural networks of a special category solve timetabling tasks and to create methods for controlling the quality of ANN training. The ANN under investigation [1] is a 2-dimensional modification of the Kohonen network and the Willshaw and von der Malsburg network.

When seeking a neural network solution of any task we should answer the following questions:

1. How to translate the task into a language "understandable" for neural networks; how to find the correspondence between the states of neurons and the values of the optimized parameters?
2. How to construct a network energy function with given constraints and a given target function?

Immediately we run into two difficulties:

1. How to establish a correspondence between the terms of a network energy function and the terms of the general form of network energy?
2. How to calculate weighting factors for penalty functions?

One of the first attempts to overcome these shortcomings with regard to railway transport dates back to 2015 [1] and is connected with the development of a multilayer artificial neural network with variable signal conductivity (abbreviated as MANN VSC) to be applied for scheduling. Currently, this subject is considered to be the main source for research in the field of improving the quality of training.

An ANN VSC is a hybrid neural network combining the characteristic features of a multilayer perceptron, the Willshaw and von der Malsburg network, and the Hopfield network.

II. ABOUT OPTIMAL CONTROL IN NEURAL NETWORK TASKS

Recently, the scope of application of neural networks has expanded considerably. The most popular tasks are synthesis of control systems, identification tasks, data processing, information recovery tasks, scheduling problems and other original activities (e.g. creating new pictures and art).

Despite routine modifications of the structure and topology of ANNs and of training methods, an ANN is a system controllable only by sets of recommendations based on heuristic approaches [2], numerical experiments, etc. Most authors emphasize that the quality of ANN training and the development of neural network solutions is a complicated scientific problem. Sometimes there are attempts to combine ANNs with optimal control theory as a rigorous mathematical method applicable to any task.

Paper [10] contains an attempt to create an algorithm for the development of deep convolutional neural networks using manifold compactification. This approach is suitable for computer vision ANNs but is inconvenient for the MANN with variable signal conductivity because their structures are dissimilar.

The ideas in [9] are more relevant for the ANN under consideration, but the general idea of [9] cannot be applied because the MANN follows its own rules of output calculation. Traditionally an artificial neural network implements an epoch as a full sequence of "input-output" pairs, but the MANN under consideration does not work with a set of different examples [8].

We should focus on paper [6], where the author suggests a genetic algorithm to optimize the vector of hyperparameters for convolutional neural networks. The closest result is in [7], where an asynchronous motor is the control object and two neural networks are suggested. The first network creates a control signal; the second one captures the difference between the desired output and the measured output.
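The questions raised in the Introduction, about building an energy function from a target function and constraints and about matching its terms to the general form of network energy, can be illustrated with the standard Hopfield-type formulation. This is only a generic sketch and not the authors' MANN VSC formulation: the symbols $v_i$ (neuron states), $w_{ij}$ (weights), $I_i$ (external inputs), $f$ (target function), $g_k$ (constraint residuals) and $A$, $B_k$ (penalty weights) are assumptions introduced here for illustration.

$$E_{\text{general}}(v) = -\frac{1}{2}\sum_{i}\sum_{j} w_{ij}\, v_i v_j - \sum_{i} I_i v_i$$

$$E_{\text{task}}(v) = A\, f(v) + \sum_{k} B_k\, g_k(v)^2$$

If $f$ is quadratic and every $g_k$ is linear in $v$, expanding $E_{\text{task}}$ and equating the coefficients of $v_i v_j$ and of $v_i$ with those of $E_{\text{general}}$ (constant terms are dropped) yields $w_{ij}$ and $I_i$. Choosing $A$ and $B_k$ is the weighting problem mentioned in the Introduction: penalties that are too small leave the constraints violated, while penalties that are too large make the network ignore the target function.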


Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0)
Data Science

Paper [3] deals with constructing optimal time sequences of the weights between neurons of a dynamic ANN. In [3] a nonlinear two-point boundary value problem is solved, which yields optimal rules of ANN training. The weight matrix of the ANN at every time step (epoch) is set as an optimal time sequence. The authors note that at best the weight matrix at the final time step relates to the symmetric matrix constructed by J.J. Hopfield for associative memory [4].

Initial conditions are set as an input vector concatenated from several training samples.

The quality functional (the criterion) minimizes the negative of the correlation between the output of the neuron and the desired output of the neuron at the final time step of control. During the time interval between the first and the last step of control the functional penalizes the level of miscorrelation between the desired output and the response of the activation function of each neuron.

In this case an optimal control strategy is found as the solution of a Lagrange problem for the task of optimal program control of a multilayered perceptron with a sigmoid activation function.

Another way of control is to apply PID-controllers.

A few more papers concerning ANN application to scheduling tasks should be mentioned. These solutions refer to scheduling, too; nevertheless, they touch upon modification of the ANN activation function or the ANN structure. Thus, paper [11] analyzes the selection of empirical coefficients for multilayer perceptrons and describes the transition to stochastic methods of weight modification in Hopfield models, etc.

Papers [12-16] address NP-hard problems (timetabling tasks, path searching in graphs) and their neural network solutions with different types of ANN (MLP, LSTM, CNN, etc.) and with various key algorithms (genetic algorithms, adjusting ANN parameters, error back propagation, standard searching).

However, these papers, like the other articles analyzed above, do not consider an artificial neural network as a controllable object using optimal control theory.

Paper [16] is a meta-study of various approaches to solving scheduling problems with different recommendations, from project management techniques to neural expert systems, but without any neurocontrol and adjustments.

Examination of articles [11-16] leads to the conclusion that the problem of improving the quality of neural network solutions is being analyzed in many countries. However, the problem statement of neurocontrol as a control task with two ANNs has not yet received the attention it deserves. In the field of neurocontrol this problem is rightfully considered novel. It refers both to optimal control theory and to hybrid neurocontrol.

III. ABOUT PID-CONTROL IN NEURAL NETWORKS

PID control of the ANN error signal is found with the following classical formula [5]:

$$G(s) = K_1 + \frac{K_2}{s} + K_3 s$$

where s is the argument of the transfer function, K_1 is the coefficient of proportional regulation, K_2 is the coefficient of integral regulation, K_3 is the coefficient of derivative regulation. The PID-controller is implemented in the programming language R in the RStudio environment and after that it is incorporated into the code of the multilayered ANN. The novelty of this approach is in the controllable object (the MANN as a kind of ANN) and in the universal algorithm that transforms a concrete PID-control curve into a strict indicator which sets the direction of the MANN signal trajectory.

Fig. 1 shows the dynamics of the error signal without control for the MANN consisting of 27 layers with 1920 neurons in each layer and with 185 schedules as a computational load.

Fig. 1. The MANN error signal (a typical mode with a traditional algorithm [8], no control).

Fig. 2 shows the desired error change signal.

Fig. 2. Setup change (the principal view of the desired signal).

The authors organized and conducted about 1200 starts of the ANN with different parameters of the proportional (ranging from 0.1 to 1), the integral (from 10 to 40) and the differential (from 0.1 to 4.1) error components and with the disturbance value ranging from 5 to 60 points per time step. It is not a very efficient method of control because it provides only 10% stable trajectories. Stability is considered in the Lyapunov sense [5]. Computational experiments illustrate that the critical value of the disturbances fed to the ANN is no more than 10-15% of the average error in the stable mode (Table I). This result cannot be regarded as practical.

IV. DIRECT NEUROCONTROL FOR MULTILAYER ARTIFICIAL NEURAL NETWORKS AND ITS ADVANTAGES

Along with the traditional training algorithm the authors suggest a direct neurocontrol mode for training. The object to be controlled is a multilayered ANN with variable signal conductivity [1]; a three-layer perceptron with sigmoid activation functions is taken as a controller.

The main scheme of control is shown in Fig. 3.

The ANN-controller is trained by the aggregation of triple sets “The level of error per epoch” – “The level of


VI International Conference on "Information Technology and Nanotechnology" (ITNT-2020)                                                       28

error at the previous moment” – “The control signal from the previous time step to the present time step” or “The previous level of error” – “The current level of error” – “The control signal”.

[Fig. 3 block diagram. Blocks and signals: Input; Multilayered Artificial Neural Network with variable signal conductivity; Output; Time delay; ANN-controller; Output forecast; Discrepancy summation; Control signal; the Executive mechanism (the algorithm of signal transmission); Training technique.]

Fig. 3. The scheme of a direct neurocontrol mode.

The current error signal of the ANN and the previous one are gathered and fed into the trained multilayer perceptron. The response signal of the ANN-controller enters the discrepancy summation and the actuating mechanism (an algorithm). Hereinafter the value of the summed discrepancy is also fed by the ANN-controller.

The control scheme described above was tested on a concrete scheduling problem (the railway branch Arkhara – Volochaevka, 27 railway stations). The task included 185 trains per 24 hours.

The results of testing are given in Table I.

TABLE I. A COMPARISON OF DIFFERENT TRAINING METHODS

Training error (points)    Traditional algorithms    PID-controller (the best configuration    Direct neurocontrol
                                                     with K1/K2/K3 = 0.1/40/2.1)
Min                        75                        362                                       193
Max                        134795                    211585                                    57895
Median                     5469                      471                                       210
Average                    16548                     1830                                      384
SD                         6687                      4485                                      1180
Rate of error overshoot    50                        15                                        0.4

V. CONCLUSIONS

Thus, this work shows the principal possibility of controlling the multilayered artificial neural network with variable signal conductivity. A three-layer perceptron with the sigmoidal activation function is used as a controller. The solutions achieved using multilayered artificial neural networks with direct neurocontrol are of much better quality compared with those obtained with traditional algorithms.

ACKNOWLEDGMENT

This research was supported by the Russian Foundation for Basic Research (the project #17-20-01065 “A theory of railway transport system neural network control”).

REFERENCES

[1] A. Olshansky and A. Ignatenkov, “One approach to control of a neural network with variable signal conductivity,” Information Technologies and Nanotechnologies (ITNT), pp. 984-987, 2017.
[2] A.V. Nazarov and A.I. Loskutov, “Neural networks algorithms of forecast and system optimization,” Saint Petersburg: Science and technic, 2003, 384 p.
[3] O. Farotimi, A. Dembo and T. Kailath, “Neural network weight matrix synthesis using optimal control techniques,” Advances in Neural Information Processing Systems-2 (NIPS-2), Denver, Colorado, USA, Stanford, 1989.
[4] J.J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. Biophysics, USA, vol. 79, pp. 2554-2558, 1982.
[5] R.C. Dorf and R.H. Bishop, “Modern control systems,” Pearson, 2011.
[6] Y.R. Tsoy, “Neuroevolution algorithm and software for image processing: The dissertation on competition of a scientific degree of Candidate of technical sciences,” Tomsk Polytechnic University, 2007, 209 p.
[7] A.M. Sagdatullin, “A neural network controller for the velocity value of an asynchronic motor,” Theses of Russian Congress for Control Problems, Russia, Moscow, Institute of control sciences of Russian Academy of Sciences, pp. 4485-4498, 2014.
[8] A.M. Olshansky and A.V. Ignatenkov, “Development of an artificial neural network for constructing a train schedule,” Bulletin of the Ryazan State Radio Engineering University, vol. 55, pp. 73-80, 2016.
[9] I.M. Kulikovskikh, “Reducing computational costs in deep learning on almost linearly separable training data,” Computer Optics, vol. 44, no. 2, pp. 282-289, 2020. DOI: 10.18287/2412-6179-CO-645.
[10] Yu.V. Vizilter, V.S. Gorbatsevich and S.Y. Zheltov, “Structure-functional analysis and synthesis of deep convolutional neural networks,” Computer Optics, vol. 43, no. 5, pp. 886-900, 2019. DOI: 10.18287/2412-6179-2019-43-5-886-900.
[11] A.S. Jain and S. Meeran, “Job-shop scheduling using neural networks,” International Journal of Production Research, vol. 36, no. 5, pp. 1249-1272, 1998.
[12] Z. Li, Q. Chen and V. Koltun, “Combinatorial optimization with graph convolutional networks and guided tree search,” Advances in Neural Information Processing Systems, pp. 539-548, 2018.
[13] A. Milan, “Data-driven approximations to NP-hard problems,” Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[14] J. Bruck and J.W. Goodman, “On the power of neural networks for solving hard problems,” Neural Information Processing Systems, pp. 137-143, 1988.
[15] A. Chaudhuri and K. De, “Job Scheduling Problem Using Rough Fuzzy Multilayer Perception Neural Networks,” Journal of Artificial Intelligence, vol. 1, no. 1, 2010.
[16] S.J. Noronha and V.V.S. Sarma, “Knowledge-based approaches for scheduling problems: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 2, pp. 160-171, 1991.
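The PID law of Section III, with proportional, integral and derivative coefficients K1, K2 and K3, can be sketched in discrete time as follows. The paper reports an implementation in R that is not reproduced in the text, so the Python below is only an illustrative stand-in. The class name `DiscretePID`, the step length `dt`, the rectangular integration rule and the toy plant in `simulate` are all assumptions made for this sketch, and the gains used in the usage note are arbitrary rather than the configuration from Table I.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscretePID:
    """Discrete-time counterpart of G(s) = K1 + K2/s + K3*s (assumed form)."""

    k1: float  # proportional coefficient
    k2: float  # integral coefficient
    k3: float  # derivative coefficient
    dt: float = 1.0  # assumed step length, e.g. one training epoch
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def step(self, error: float) -> float:
        """Return the control signal for the current error sample."""
        # Integral term: running rectangular sum of the error.
        self._integral += error * self.dt
        # Derivative term: backward difference, zero on the first sample.
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.k1 * error + self.k2 * self._integral + self.k3 * derivative


def simulate(pid: DiscretePID, setpoint: float, steps: int,
             y0: float = 0.0, gain: float = 0.1) -> List[float]:
    """Drive a hypothetical plant y[t+1] = y[t] + gain * u[t] towards setpoint."""
    y = y0
    history = []
    for _ in range(steps):
        u = pid.step(setpoint - y)
        y += gain * u
        history.append(y)
    return history
```

For example, `simulate(DiscretePID(1.0, 0.0, 0.0), setpoint=10.0, steps=200)` returns a trajectory that approaches the setpoint. In the paper the controlled quantity is the MANN error signal, whose dynamics are far less regular than this toy plant.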


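The ANN-controller of Section IV, a three-layer perceptron with sigmoid activations trained on triples of previous error, current error and control signal, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the hidden-layer size, the learning rate, plain stochastic backpropagation and, above all, the synthetic teacher rule in `make_triples` are hypothetical. In the paper the triples come from MANN training runs, and the executive mechanism that applies the control signal to the MANN is not reproduced here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerPerceptron:
    """Two inputs -> sigmoid hidden layer -> one sigmoid output."""

    def __init__(self, n_hidden=4, seed=0):
        rng = random.Random(seed)
        # Each hidden row holds [weight for previous error, weight for current error, bias].
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
        # One output weight per hidden unit; the last entry is the output bias.
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, e_prev, e_cur):
        hidden = [sigmoid(w[0] * e_prev + w[1] * e_cur + w[2]) for w in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1])
        return hidden, out

    def predict(self, e_prev, e_cur):
        return self.forward(e_prev, e_cur)[1]

    def train_step(self, e_prev, e_cur, target, lr=0.5):
        """One stochastic gradient step on the squared error (backpropagation)."""
        hidden, out = self.forward(e_prev, e_cur)
        delta_out = (out - target) * out * (1.0 - out)
        for j, h in enumerate(hidden):
            # Hidden delta uses the output weight before it is updated.
            delta_h = delta_out * self.w_out[j] * h * (1.0 - h)
            self.w_out[j] -= lr * delta_out * h
            self.w_hidden[j][0] -= lr * delta_h * e_prev
            self.w_hidden[j][1] -= lr * delta_h * e_cur
            self.w_hidden[j][2] -= lr * delta_h
        self.w_out[-1] -= lr * delta_out


def make_triples(n, seed=1):
    """Synthetic (previous error, current error, control signal) triples.

    Errors are assumed to be normalised to [0, 1]. The teacher rule is
    hypothetical: the control signal rises when the error is growing.
    """
    rng = random.Random(seed)
    triples = []
    for _ in range(n):
        e_prev, e_cur = rng.random(), rng.random()
        control = 0.5 + 0.4 * (e_cur - e_prev)  # stays inside (0, 1)
        triples.append((e_prev, e_cur, control))
    return triples


def mean_squared_error(net, triples):
    return sum((net.predict(p, c) - u) ** 2 for p, c, u in triples) / len(triples)


def train_controller(triples, epochs=300, seed=0):
    """Train the controller on the triples and return the trained network."""
    net = ThreeLayerPerceptron(seed=seed)
    rng = random.Random(seed)
    data = list(triples)  # shuffle a copy, leave the caller's list intact
    for _ in range(epochs):
        rng.shuffle(data)
        for e_prev, e_cur, control in data:
            net.train_step(e_prev, e_cur, control)
    return net
```

After `net = train_controller(make_triples(200))`, calling `net.predict(e_prev, e_cur)` with the two most recent normalised error levels gives the control signal that would be passed on to the discrepancy summation block of Fig. 3.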