Some approaches to improving the quality of artificial neural network training

Yefim Rozenberg, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, greatfime@gmail.com
Alexey Olshansky, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, lexolshans@gmail.com
Ignat Dovgerd, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, ignatikus@bk.ru
Gleb Dovgerd, JSC Railway Signalling Institute (JSC NIIAS), Moscow, Russia, christmas1409@yandex.ru
Alexander Ignatenkov, JSC VTB Capital AM, Moscow, Russia, a.ignatenkov@gmail.com
Paul Ignatenkov, Smolensk State University, Smolensk, Russia, beat.pi@gmail.com

Abstract—The paper deals with improving the quality of artificial neural network (ANN) training. The research covers a complex neural network, built from a two-dimensional Kohonen network and a Willshaw–von der Malsburg network, that is capable of solving scheduling problems in transport. Existing results on applying optimal control theory to ANN training are analysed, and the authors propose a new technique based on direct neural control. Error values during training obtained with the traditional methods and with the new approach are compared. For the neural networks considered, the new technique proves to be better than the traditional one.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
VI International Conference on “Information Technology and Nanotechnology” (ITNT-2020), Data Science
Keywords—artificial neural network, Kohonen network, multilayered neural network, control

I. INTRODUCTION

Issues related to scheduling have always been of great significance for the railway industry. The most common scheduling tasks include routing, timetabling, volume planning, and combined timetabling and volume planning. Solving these tasks with exact methods runs into combinatorial complexity, exhaustive search, insufficient computer memory, and time-consuming computations. For this reason, a number of heuristic algorithms are used (the Monte Carlo method, evolutionary algorithms, neural networks, etc.). The present paper aims to illustrate how a special category of neural networks solves timetabling tasks and to develop methods for controlling the quality of ANN training. The ANN under investigation [1] is a two-dimensional modification of the Kohonen network and the Willshaw–von der Malsburg network.

When seeking a neural network solution for any task, we should answer the following questions:
1. How can the task be translated into a language “understandable” to the neural network, i.e., how can the states of neurons be matched to the values of the optimized parameters?
2. How can a network energy function be constructed from the given constraints and the given target function?
Two difficulties arise immediately:
1. How can the terms of the task-specific energy function be matched to the terms of the general form of the network energy?
2. How should the weighting factors of the penalty functions be calculated?

One of the first attempts to overcome these difficulties for railway transport dates back to 2015 [1] and is connected with the development of a multilayer artificial neural network with variable signal conductivity (abbreviated as MANN VSC) for scheduling. Currently, this network is considered the main object of research in the field of improving the quality of training. MANN VSC is a hybrid neural network combining the characteristic features of a multilayer perceptron, the Willshaw–von der Malsburg network, and the Hopfield network.
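When a scheduling task is mapped onto neuron states, the constraints and the target function are merged into one energy function, and the penalty weights decide which states become energy minima. The sketch below uses the classical Hopfield-style penalty formulation on a hypothetical toy problem (two trains, two time slots); it is not the MANN VSC energy of [1], and the names `energy`, `brute_force_min`, the cost matrix, and the weights `a`, `b`, `c` are illustrative assumptions. It shows why the choice of penalty weights matters: if the constraint weight `a` is too small, the lowest-energy state can be infeasible.

```python
from itertools import combinations, product


def energy(v, cost, a=1.0, b=1.0, c=1.0):
    """Penalty-form energy for a toy train-to-slot assignment (hypothetical).

    v[i][s] = 1 if train i is placed in time slot s (the neuron states).
    a weights the 'exactly one slot per train' constraint,
    b weights the 'at most one train per slot' constraint,
    c weights the target function (total placement cost).
    """
    n_trains, n_slots = len(v), len(v[0])
    # Constraint 1: each row (train) should contain exactly one active neuron.
    e_assign = sum((sum(row) - 1) ** 2 for row in v)
    # Constraint 2: penalize every pair of trains sharing the same slot.
    e_conflict = sum(v[i][s] * v[j][s]
                     for s in range(n_slots)
                     for i, j in combinations(range(n_trains), 2))
    # Target function: cost of the chosen placements.
    e_cost = sum(cost[i][s] * v[i][s]
                 for i in range(n_trains) for s in range(n_slots))
    return a * e_assign + b * e_conflict + c * e_cost


def brute_force_min(cost, a=1.0, b=1.0, c=1.0):
    """Exhaustively find the lowest-energy state (only feasible for tiny problems)."""
    n, m = len(cost), len(cost[0])
    best, best_e = None, float("inf")
    for bits in product((0, 1), repeat=n * m):
        v = [list(bits[i * m:(i + 1) * m]) for i in range(n)]
        e = energy(v, cost, a, b, c)
        if e < best_e:
            best, best_e = v, e
    return best
```

With `cost = [[1, 2], [2, 1]]`, `brute_force_min(cost, a=2.0)` returns the feasible assignment `[[1, 0], [0, 1]]`, whereas `brute_force_min(cost, a=0.5)` returns the empty state `[[0, 0], [0, 0]]`, because the constraint penalty is too weak to outweigh the placement cost.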
II. ABOUT OPTIMAL CONTROL IN NEURAL NETWORK TASKS

Recently, the scope of application of neural networks has expanded considerably. The most popular tasks are the synthesis of control systems, identification, data processing, information recovery, scheduling, and other original activities (e.g., creating new pictures and artworks). Despite routine modifications of ANN structures, topologies, and training methods, an ANN remains a system that can be controlled only through sets of recommendations based on heuristic approaches [2], numerical experiments, etc. Most authors emphasize that the quality of ANN training, and the development of neural network solutions in general, is a complicated scientific problem. There are occasional attempts to combine ANNs with optimal control theory, a rigorous mathematical method of general applicability.

Paper [10] proposes an algorithm for developing deep convolutional neural networks using manifold compactification. This approach suits computer vision ANNs but is inconvenient for the MANN with variable signal conductivity because their structures differ. Paper [9] is more relevant to the ANN under consideration, but its general idea cannot be applied because the MANN follows its own rules of output calculation. Traditionally, an ANN implements an epoch as a full pass over a set of “input–output” pairs, whereas the MANN under consideration does not work with a set of different examples [8].

Paper [6] also deserves attention: its author proposes a genetic algorithm to optimize the vector of hyperparameters of convolutional neural networks. The closest result is in [7], where an asynchronous motor is the control object and two neural networks are used. The first network generates a control signal; the second one estimates the difference between the desired output and the measured output.
Paper [3] deals with constructing optimal time sequences of the weights between neurons of a dynamic ANN. In [3], a nonlinear two-point boundary value problem is solved, which yields optimal rules for ANN training: the weight matrix of the ANN at every time step (epoch) is set as an element of an optimal time sequence. The authors note that, in the best case, the weight matrix at the final time step corresponds to the symmetric matrix constructed by J.J. Hopfield for associative memory [4]. The initial conditions are set as an input vector formed by concatenating several training samples. The quality functional (criterion) minimizes the negative of the correlation between the output of each neuron and its desired output at the final control step. Between the first and the last control steps, the functional penalizes the miscorrelation between the desired output and the activation function response of each neuron. The optimal control strategy is then found by solving a Lagrange problem of optimal program control for a multilayer perceptron with a sigmoid activation function.

Another way of control is to apply PID controllers.
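The criterion attributed to [3], a negative terminal correlation plus a running miscorrelation penalty, can be evaluated for a given weight sequence with a short sketch. This is a hedged illustration, not the formulation of [3]: the discrete-time dynamics x(t+1) = tanh(W(t) x(t)) (tanh as a bipolar sigmoid), the normalized correlation, the unit penalty weight, and all function names are assumptions. The sketch only scores a trajectory; it does not solve the two-point boundary value problem.

```python
import math


def correlation(y, d):
    """Normalized correlation (1/N) * sum_i y_i * d_i between outputs and targets."""
    return sum(yi * di for yi, di in zip(y, d)) / len(d)


def hopfield_matrix(pattern, gain=1.0):
    """Symmetric outer-product (Hebbian) matrix with zero diagonal, as in [4]."""
    n = len(pattern)
    return [[0.0 if i == j else gain * pattern[i] * pattern[j] / n
             for j in range(n)] for i in range(n)]


def simulate(x0, weight_sequence):
    """x(t+1) = tanh(W(t) x(t)); the sequence W(0..T-1) plays the role of the control."""
    x = list(x0)
    trajectory = [x]
    for W in weight_sequence:
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in W]
        trajectory.append(x)
    return trajectory


def quality_functional(trajectory, d, penalty=1.0):
    """Terminal term: minus the correlation at the final step T.
    Running term: miscorrelation (1 - correlation) at the intermediate steps 1..T-1."""
    terminal = -correlation(trajectory[-1], d)
    running = sum(1.0 - correlation(y, d) for y in trajectory[1:-1])
    return terminal + penalty * running
```

For example, applying `hopfield_matrix([1, -1, 1], gain=3.0)` at three consecutive steps from `x0 = [0.5, -0.2, 0.1]` drives the outputs toward the sign pattern of the target and gives a lower functional value than an all-zero weight sequence, which leaves every intermediate output uncorrelated with the target.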
A few more papers on ANN applications to scheduling tasks should be mentioned. These solutions also concern scheduling, but they deal with modifying the ANN activation function or the ANN structure. Thus, paper [11] analyzes the selection of empirical coefficients for multilayer perceptrons and describes the transition to stochastic methods of weight modification in Hopfield models, etc. Papers [12-16] address NP-hard problems (timetabling tasks, path searching in graphs) and their neural network solutions with different types of ANN (MLP, LSTM, CNN, etc.) and various key algorithms (genetic algorithms, ANN parameter tuning, error backpropagation, standard search). However, these papers, like the other articles analyzed above, do not treat an artificial neural network as a controllable object in the sense of optimal control theory. Paper [16] is a meta-study of approaches to solving scheduling problems, with recommendations ranging from project management techniques to neural expert systems, but without any neurocontrol or adjustment.

Examination of papers [11-16] leads to the conclusion that the problem of improving the quality of neural network solutions is being studied in many countries. However, the formulation of neurocontrol as a control task involving two ANNs has not yet received the attention it deserves. In the field of neurocontrol, this problem can rightfully be considered novel. It relates both to optimal control theory and to hybrid neurocontrol.
III. ABOUT PID-CONTROL IN NEURAL NETWORKS

PID control of the ANN error signal is described by the following classical transfer function [5]:

$$G(s) = K_1 + \frac{K_2}{s} + K_3 s,$$

where s is the argument of the transfer function, K1 is the coefficient of proportional regulation, K2 is the coefficient of integral regulation, and K3 is the coefficient of derivative regulation. The PID controller is implemented in the R programming language in the RStudio environment and is then incorporated into the code of the multilayered ANN. The novelty of this approach lies in the controlled object (the MANN as a kind of ANN) and in a universal algorithm that transforms a specific PID control curve into a strict indicator setting the direction of the MANN signal trajectory.

Fig. 1 shows the dynamics of the error signal without control for a MANN consisting of 27 layers with 1920 neurons in each layer and with 185 schedules as the computational load.

Fig. 1. The MANN error signal (a typical mode with a traditional algorithm [8], no control).

Fig. 2 shows the desired change of the error signal.

Fig. 2. Setup change (the principal view of the desired signal).

The authors conducted about 1200 runs of the ANN with different proportional (from 0.1 to 1), integral (from 10 to 40), and differential (from 0.1 to 4.1) error components and with disturbance values from 5 to 60 points per time step. This is not a very efficient control method, because only 10% of the trajectories are stable, with stability understood in the Lyapunov sense [5]. Computational experiments show that the critical disturbance fed to the ANN is no more than 10-15% of the average error in the stable mode (Table I). This result cannot be considered practical.
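The PID control of an error signal, G(s) = K1 + K2/s + K3 s, and a sweep over gain combinations can be sketched as follows. This is an assumption-laden illustration, not the authors' R code: the rectangular-integration and backward-difference discretization, the first-order toy plant standing in for the MANN error signal, the uniform disturbance model, and the boundedness test (an empirical proxy, not a Lyapunov proof) are all hypothetical, as are the names `PID`, `simulate_pid` and `stable_fraction`.

```python
import random


class PID:
    """Discrete-time PID controller approximating G(s) = K1 + K2/s + K3*s."""

    def __init__(self, k1, k2, k3, dt=1.0):
        self.k1, self.k2, self.k3, self.dt = k1, k2, k3, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        # Rectangular integration of the error (the K2/s term).
        self.integral += error * self.dt
        # Backward difference for the K3*s term; zero on the first call.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.k1 * error + self.k2 * self.integral + self.k3 * derivative


def simulate_pid(k1, k2, k3, steps=200, disturbance=0.0, seed=0,
                 a=0.95, b=0.5, reference=1.0, bound=1e3):
    """Run a HYPOTHETICAL first-order plant y[k+1] = a*y[k] + b*u[k] + w[k].

    The plant only stands in for the MANN error signal; w[k] is a uniform
    disturbance in [-disturbance, disturbance]. Returns (trajectory, bounded),
    where 'bounded' (|y| never exceeds `bound`) is an empirical proxy for
    stability, not a Lyapunov proof.
    """
    rng = random.Random(seed)
    pid = PID(k1, k2, k3)
    y = 0.0
    trajectory = [y]
    for _ in range(steps):
        u = pid.step(reference - y)
        y = a * y + b * u + rng.uniform(-disturbance, disturbance)
        trajectory.append(y)
        if abs(y) > bound:
            return trajectory, False
    return trajectory, True


def stable_fraction(k1_values, k2_values, k3_values, **kwargs):
    """Share of gain combinations whose simulated trajectory stays bounded."""
    results = [simulate_pid(k1, k2, k3, **kwargs)[1]
               for k1 in k1_values for k2 in k2_values for k3 in k3_values]
    return sum(results) / len(results)
```

For instance, `stable_fraction([0.1, 0.55, 1.0], [10, 25, 40], [0.1, 2.1, 4.1], disturbance=5.0)` sweeps the gain ranges quoted above over the toy plant; the resulting share describes only that toy plant and is not comparable with the 10% reported for the MANN.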
IV. DIRECT NEUROCONTROL FOR MULTILAYER ARTIFICIAL NEURAL NETWORKS AND ITS ADVANTAGES

Along with the traditional training algorithm, the authors propose a direct neurocontrol mode for training. The controlled object is a multilayered ANN with variable signal conductivity [1]; a three-layer perceptron with sigmoid activation functions serves as the controller. The main control scheme is shown in Fig. 3.

The ANN controller is trained on a collection of triples “the level of error per epoch” – “the level of error at the previous moment” – “the control signal from the previous time step to the present time step”, i.e., “the previous level of error” – “the current level of error” – “the control signal”.

[Fig. 3 diagram: multilayered artificial neural network with variable signal conductivity (input, output); ANN-controller; time delay; discrepancy summation; output forecast; executive mechanism (the algorithm of signal transmission); training technique; control signal.]
Fig. 3. The scheme of a direct neurocontrol mode.

The current and the previous error signals of the ANN are collected and fed into the trained multilayer perceptron. The output signal of the ANN controller is passed to the discrepancy summation and to the actuating mechanism (an algorithm). The summed discrepancy value is then also fed to the ANN controller.
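The direct neurocontrol mode, a three-layer sigmoid perceptron that maps the previous and current error levels to a control signal, can be sketched in Python. This is a hypothetical illustration, not the authors' implementation: the scaling of errors and controls to [0, 1], the hidden-layer size, the learning rate, plain per-sample backpropagation, and the synthetic “teacher” rule in `synthetic_log` are all assumptions, because the paper does not report them. Only the (previous error, current error) → control signal mapping and the three-layer sigmoid structure come from the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class PerceptronController:
    """Three-layer perceptron: 2 inputs -> n_hidden sigmoid units -> 1 sigmoid output."""

    def __init__(self, n_hidden=6, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One backpropagation step on the loss 0.5 * (y - target) ** 2."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights *before* they are updated.
        delta_h = [delta_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
        for j, hj in enumerate(h):
            self.w2[j] -= lr * delta_out * hj
        self.b2 -= lr * delta_out
        for j, dj in enumerate(delta_h):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def make_triples(errors, controls):
    """Triples (previous error, current error, control signal) from a logged run."""
    return [(errors[k - 1], errors[k], controls[k]) for k in range(1, len(errors))]


def synthetic_log(n=50, seed=1):
    """HYPOTHETICAL logged run: errors scaled to [0, 1] and a made-up teacher rule
    u = 0.5 + 0.4 * (e_cur - e_prev) standing in for recorded control signals."""
    rng = random.Random(seed)
    errors = [rng.random() for _ in range(n + 1)]
    controls = [0.5] + [0.5 + 0.4 * (errors[k] - errors[k - 1]) for k in range(1, n + 1)]
    return errors, controls


def mse(ctrl, triples):
    return sum((ctrl.forward([ep, ec])[1] - u) ** 2 for ep, ec, u in triples) / len(triples)


def train(ctrl, triples, epochs=300, lr=0.5):
    for _ in range(epochs):
        for ep, ec, u in triples:
            ctrl.train_step([ep, ec], u, lr)
    return ctrl
```

Once trained, the controller is queried at every time step with the pair of error levels, e.g. `ctrl.forward([e_prev, e_cur])[1]`, and the returned value (in (0, 1) because of the sigmoid output) would be rescaled into the control signal passed on to the executive mechanism.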
The control scheme described above was tested on a specific scheduling problem (the Arkhara – Volochaevka railway branch, 27 railway stations). The task included 185 trains per 24 hours.

The test results are given in Table I.

TABLE I. A COMPARISON OF DIFFERENT TRAINING METHODS

Training error (points) | Traditional algorithms | PID-controller (the best configuration, K1/K2/K3 = 0.1/40/2.1) | Direct neurocontrol
Min | 75 | 362 | 193
Max | 134795 | 211585 | 57895
Median | 5469 | 471 | 210
Average | 16548 | 1830 | 384
SD | 6687 | 4485 | 1180
Rate of error overshoot | 50 | 15 | 0.4
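The summary rows of Table I (Min, Max, Median, Average, SD, rate of error overshoot) can be computed from a per-epoch error log with a short sketch. The paper does not state whether SD is the sample or the population standard deviation, nor how the rate of error overshoot is defined; here SD is assumed to be the sample deviation and the overshoot rate is assumed to be the percentage of epochs whose error exceeds a user-chosen threshold. The function name `summarize` and the `overshoot_threshold` parameter are hypothetical.

```python
import statistics


def summarize(errors, overshoot_threshold):
    """Table-style summary of per-epoch training errors (in points).

    ASSUMPTION: 'rate of error overshoot' = percentage of epochs whose error
    exceeds overshoot_threshold; SD = sample standard deviation.
    """
    over = sum(1 for e in errors if e > overshoot_threshold)
    return {
        "min": min(errors),
        "max": max(errors),
        "median": statistics.median(errors),
        "average": statistics.mean(errors),
        "sd": statistics.stdev(errors),  # use statistics.pstdev for the population SD
        "overshoot_rate_pct": 100.0 * over / len(errors),
    }
```

Applying the same function, with the same threshold, to the error logs of each training mode yields comparable columns; the threshold itself has to be fixed in advance, because the overshoot rate depends on it directly.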
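V. CONCLUSIONS

Thus, this work shows that a multilayered artificial neural network with variable signal conductivity can, in principle, be controlled. A three-layer perceptron with a sigmoid activation function is used as the controller. The solutions obtained with multilayered artificial neural networks under direct neurocontrol are of much better quality than those obtained with traditional algorithms (Table I).

ACKNOWLEDGMENT

This research was supported by the Russian Foundation for Basic Research (project #17-20-01065 “A theory of railway transport system neural network control”).

REFERENCES

[1] A. Olshansky and A. Ignatenkov, “One approach to control of a neural network with variable signal conductivity,” Information Technology and Nanotechnology (ITNT), pp. 984-987, 2017.
[2] A.V. Nazarov and A.I. Loskutov, “Neural networks algorithms of forecast and system optimization,” Saint Petersburg: Science and Technics, 2003, 384 p.
[3] O. Farotimi, A. Dembo and T. Kailath, “Neural network weight matrix synthesis using optimal control techniques,” Advances in Neural Information Processing Systems 2 (NIPS-2), Denver, Colorado, USA, 1989.
[4] J.J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proc. Natl. Acad. Sci. USA, vol. 79, pp. 2554-2558, 1982.
[5] R.C. Dorf and R.H. Bishop, “Modern control systems,” Pearson, 2011.
[6] Y.R. Tsoy, “Neuroevolution algorithm and software for image processing,” Candidate of Technical Sciences dissertation, Tomsk Polytechnic University, 2007, 209 p.
[7] A.M. Sagdatullin, “A neural network controller for the velocity of an asynchronous motor,” Theses of the Russian Congress on Control Problems, Moscow, Russia: Institute of Control Sciences of the Russian Academy of Sciences, pp. 4485-4498, 2014.
[8] A.M. Olshansky and A.V. Ignatenkov, “Development of an artificial neural network for constructing a train schedule,” Bulletin of the Ryazan State Radio Engineering University, vol. 55, pp. 73-80, 2016.
[9] I.M. Kulikovskikh, “Reducing computational costs in deep learning on almost linearly separable training data,” Computer Optics, vol. 44, no. 2, pp. 282-289, 2020. DOI: 10.18287/2412-6179-CO-645.
[10] Yu.V. Vizilter, V.S. Gorbatsevich and S.Y. Zheltov, “Structure-functional analysis and synthesis of deep convolutional neural networks,” Computer Optics, vol. 43, no. 5, pp. 886-900, 2019. DOI: 10.18287/2412-6179-2019-43-5-886-900.
[11] A.S. Jain and S. Meeran, “Job-shop scheduling using neural networks,” International Journal of Production Research, vol. 36, no. 5, pp. 1249-1272, 1998.
[12] Z. Li, Q. Chen and V. Koltun, “Combinatorial optimization with graph convolutional networks and guided tree search,” Advances in Neural Information Processing Systems, pp. 539-548, 2018.
[13] A. Milan, “Data-driven approximations to NP-hard problems,” Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[14] J. Bruck and J.W. Goodman, “On the power of neural networks for solving hard problems,” Neural Information Processing Systems, pp. 137-143, 1988.
[15] A. Chaudhuri and K. De, “Job Scheduling Problem Using Rough Fuzzy Multilayer Perception Neural Networks,” Journal of Artificial Intelligence, vol. 1, no. 1, 2010.
[16] S.J. Noronha and V.V.S. Sarma, “Knowledge-based approaches for scheduling problems: A survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 3, no. 2, pp. 160-171, 1991.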