Parallel genetic method for the synthesis of recurrent neural networks for use in medicine

Serhii Leoshchenko1[0000-0001-5099-5518], Andrii Oliinyk2[0000-0002-6740-6078], Stepan Skrupsky3[0000-0002-9437-9095], Sergey Subbotin4[0000-0001-5814-8268] and Viktor Lytvyn5[0000-0003-4061-4755]

1,2,4,5 Dept. of Software Tools, Zaporizhzhia National Technical University, Zaporizhzhia 69063, Ukraine
3 Dept. of Computer Systems and Networks, Zaporizhzhia National Technical University, Zaporizhzhia 69063, Ukraine

sergleo.zntu@gmail.com, olejnikaa@gmail.com, sskrupsky@gmail.com, subbotin@zntu.edu.ua, lytvynviktor.a@gmail.com

Abstract. In modern medicine, information technologies are widely used in the diagnosis and treatment of various diseases. The main task of creating such systems is to improve the quality of diagnosis and treatment. Therefore, work aimed at finding new solutions for creating such systems is relevant. Despite all the advantages of neural networks, there are many difficulties in implementing them in medicine. This paper presents methods for solving the problem of recurrent neural network synthesis; such networks can be used as models in medical diagnostics.

Keywords: medical diagnosis, neural networks, synthesis, parallel, genetic method.

1 Introduction

The intensive development of medical science, the deepening understanding of the etiology and pathogenesis of diseases, and the growing volume of data on markers of various pathological conditions dictate the need to search for new approaches to processing the results. Today it is important to quickly analyze large amounts of data and make the right decision, which can affect the prognosis, course and outcome of a disease.

In this context, more and more attention is paid to information technologies (IT), and in medicine one can speak of electronic medicine [1]. IT is implemented in the form of special medical systems for various purposes and individual automated diagnostic and treatment devices.

The use of IT makes it possible to solve various tasks, including predicting the risks of diseases, complications and treatment effectiveness, early diagnosis, treatment planning, monitoring the patient's health, and automated analysis and statistical processing of clinical material. Medical systems significantly simplify work in situations where it is impossible to present the problem in numerical form, there is no certainty or accuracy in the studied parameters, or there is no unambiguous algorithm for solving the problem [2]. These characteristics are typical of medical problems, which involve large amounts of multidimensional, complex and sometimes contradictory clinical data obtained from censored observations.

Currently, statistical methods of data processing prevail in medical research. The most common descriptive methods used in traditional statistical studies are survival analysis and multivariate analysis (discriminant, cluster, factor and correlation analysis).

The fact that artificial neural networks (ANNs) are successfully used in various fields where problems of forecasting, classification and control must be solved explains the sustained interest in ANN methods observed recently. ANNs combine the ability to model nonlinear dependencies with a relatively simple implementation, which makes them indispensable for solving complex multidimensional problems, including medical ones [3].
Today there are many ANN architectures, which differ in computational complexity, in their degree of similarity to the living neurons of the brain, and in the unique approaches used in their creation. Therefore, ANNs are not subject to any classification standards, in contrast to traditional statistical methods.

Existing ANNs are able to work both with numerical data lying in a certain limited range and with non-numerical parameters, for example, graphic images of various configurations. However, non-standard scales of quantitative characteristics, missing values, the variability of nominal variables, and the conversion of qualitative parameters into numerical features (or declaring them insignificant) create additional problems in the operation of an ANN and distort the output result.

From both a scientific and a practical point of view, one of the main advantages of ANNs is their ability to learn by analyzing data, discovering complex and hidden relationships, and subsequently producing results independently [4], [5]. If a large number of errors appear during training, it is possible both to revise the configuration of the network itself and to change the parameters of its training [1]. Thus, the advantages of using ANNs are:

─ the ability to learn from many qualitative and quantitative examples with unknown relationships between input and output data, without fragmentation of the data sample; a more accurate description of the parameters and the ability to reflect the dynamics of the statistical properties of various indicators;
─ effective data compression due to the construction of nonlinear mappings, and the ability to visualize data in the space of a smaller number of nonlinear principal components constructed by the network;
─ the ability to make decisions with high robustness to noise in the input data and adaptation to environmental changes;
─ the modelling of real situations based on knowledge the ANN extracts independently from its own experience, with minimal or no influence of subjective factors on the final result; at the same time, the values of individual parameters and properties of an ANN can be edited manually, and there are other ways to include expert knowledge in the network;
─ potential fault tolerance in hardware implementations of ANNs;
─ the possibility of use in situations that require an immediate decision.

However, the use of ANN technologies for solving practical problems is associated with many difficulties. One of the dominant problems in the application of ANN models is that the architecture of the designed neural network, and the degree of complexity sufficient for a reliable result, are unknown in advance.

2 Review of the literature

A number of works [6–16] present different algorithms for the ANN training stage. The most common is the backpropagation (BP) method, which makes it possible to adjust the weights of complex multilayer ANNs using training sets. According to the recommendation of E. Baum and D. Haussler [7, 8], the size of the training set is directly proportional to the number of ANN weights and inversely proportional to the proportion of erroneous decisions in the operation of the trained network [9, 10].

It should be noted that BP was one of the first methods for ANN training. Its biggest problem is the indefinitely long learning process.
In complex tasks, it can take days or even weeks to train a network, and the network may not train at all. The cause may be one of the following [6, 11, 12].

1. Network paralysis. During training, the weights can become very large as a result of correction. This can cause all or most neurons to operate at very large OUT values, in a region where the derivative of the squashing function is very small. Since the error sent back during learning is proportional to this derivative, the learning process can practically freeze.
2. Local minimum. The network can fall into a local minimum when much deeper minima are nearby. At a local minimum all directions lead upward, and the network is unable to escape. Statistical training techniques can help avoid this trap, but they are slow.
3. Step size. The step size must be finite. If it is fixed and very small, convergence is too slow; if it is fixed and too large, paralysis or permanent instability may occur.

One should also note the possibility of overfitting the network, which is rather the result of erroneous design of its topology. With too many neurons, the network loses its ability to generalize information. The training set will be learned by the network, but any other sets, even very similar ones, may be misclassified.

The backpropagation through time (BPTT) method became a continuation of BP and is faster; moreover, it solves some of the problems of its predecessor. However, BPTT still has difficulties with local optima. In recurrent neural networks (RNNs), local optima are a much more significant problem than in feedforward neural networks. Recurrent connections in such ANNs tend to create chaotic reliefs on the error surface, so local optima appear frequently. Also, in the blocks of an RNN, when the error value propagates back from the output, the error is trapped within part of the block. This is referred to as the "error carousel", which constantly feeds the error back to each of the gates until they become trained to cut off this value. Thus, regular backpropagation is effective when training an RNN unit to memorize values over very long durations [13, 14].

The main difference between genetic programming and genetic algorithms is that each individual in the population encodes not the numerical characteristics that provide the optimality of the problem, but a solution to the problem itself. The term "solution" here refers to the configuration of the neural network.

At the moment, there are several criticisms of the use of genetic algorithms and genetic programming. The main drawbacks of this approach are listed below.

─ Repeated evaluation of the fitness function for complex problems is often a factor limiting the use of artificial evolution algorithms. Finding the optimal solution for a complex high-dimensional problem often requires a very costly evaluation of the fitness function.
─ Genetic algorithms scale poorly with the complexity of the problem being solved.
─ Despite attempts to formalize genetic algorithms, and the neuroevolutionary approach in particular [17], the theoretical basis remains scant.
─ A solution is only "more suitable" than other solutions. As a result, the stopping criterion of the algorithm is unclear for each problem.
─ For many problems, genetic algorithms tend to converge to a local optimum, or even to arbitrary points, instead of the global optimum of the given problem [18].
However, many of these shortcomings can be corrected. For example, to prevent premature convergence, it is necessary to correctly select such parameters of the genetic algorithm as the population size and the percentage of individuals subjected to mutation. In addition, new variants of genetic operators are constantly being developed. The additional cost of recalculating the fitness function value can be avoided by using flags for cases where the fitness function value does not change over time.

In addition to the above drawbacks, however, genetic algorithms have quite a significant list of advantages.

─ Scalability. Genetic algorithms can easily be adapted for parallel and multicore programming, so that, due to the peculiarities of this approach, the corresponding overhead costs are significantly reduced.
─ Universality. Genetic algorithms do not require any information about the response surface; they work with almost any task.
─ Genetic algorithms can be used for tasks in which the value of the fitness function changes over time or depends on various changing factors.
─ Even in cases where existing techniques work well, interesting results can be achieved by combining them with genetic algorithms, using the latter as a complement to proven methods.
─ Gaps in the response surface have little effect on the overall efficiency of the optimization, which further broadens their applicability.

Thus, taking into account all the advantages and disadvantages of genetic algorithms, it is possible to obtain a sufficiently universal system for solving the necessary problems [19] and, in particular, for optimizing a neural network.

3 Sequential modified genetic method of recurrent neural networks synthesis

The proposed method searches for a solution using a population of neural networks $P = \{NN_1, NN_2, \ldots, NN_n\}$; that is, each individual is a separate ANN, $Ind_i = NN_i$ [18–20]. During initialization, the population is divided into two halves: the genes $g_{Ind_i} = (g_1, g_2, \ldots, g_n)$ of the first half of the individuals are assigned randomly, $g_{Ind_i} = (g_1 = Rand, g_2 = Rand, \ldots, g_n = Rand)$, while the genes of the second half of the population are defined as the bitwise inversion of the genes of the first half, $g_{Ind_i} = (\overline{g_1}, \overline{g_2}, \ldots, \overline{g_n})$. This provides a uniform distribution of one and zero bits in the population, minimizing the probability of premature convergence of the method ($p \to \min$).

After initialization, all individuals have networks encoded in their genes without hidden neurons ($N_h$), and all input neurons ($N_i$) are connected to each output neuron ($N_o$). That is, at first, all the ANNs differ only in the weights of the interneuron connections $w_i$. During evaluation, a neural network is first built from the genetic information of the individual under consideration, and then its performance is checked, which determines the fitness function $f_{fitness}$ of the individual. After evaluation, all individuals are sorted in order of decreasing fitness, and the more successful half of the sorted population is allowed to cross, with the best individual immediately moving to the next generation. During reproduction, each individual is crossed with a randomly chosen individual from among those selected for crossing. The resulting two descendants are added to the new generation $G = P' = \{Ind_1, Ind_2, \ldots, Ind_n\}$. Once a new generation is formed, the mutation operator starts working.
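As an illustration of the scheme just described, below is a minimal Python sketch of the initialization (random first half, mirrored second half), truncation selection with elitism, one-point crossover and bit-flip mutation. The genome length, the `fitness` callback and the helper names are illustrative assumptions, not the authors' implementation.

```python
import random

def init_population(n, genome_len):
    """First half is random; second half is the bitwise inversion of the
    first, giving a uniform distribution of 0/1 bits in the population."""
    half = [[random.randint(0, 1) for _ in range(genome_len)]
            for _ in range(n // 2)]
    mirrored = [[1 - g for g in ind] for ind in half]
    return half + mirrored

def next_generation(pop, fitness, p_mut=0.20):
    """Truncation selection: the fitter half breeds; the best individual
    passes to the next generation unchanged (elitism)."""
    ranked = sorted(pop, key=fitness, reverse=True)
    parents = ranked[: len(pop) // 2]
    new_gen = [ranked[0][:]]                   # elitism: keep the best as-is
    while len(new_gen) < len(pop):
        a, b = random.choice(parents), random.choice(parents)
        cut = random.randrange(1, len(a))      # one-point crossover
        for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
            child = [1 - g if random.random() < p_mut else g for g in child]
            new_gen.append(child)
    return new_gen[: len(pop)]
```

The mutation rate of 0.20 in the sketch is simply a value from the 15–25% range recommended below.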
However, it is important to note that truncation selection significantly reduces the diversity within the population, leading to early convergence of the algorithm, so the mutation probability is chosen to be rather large ($p_{mut} = 15$–$25\%$) [20]. If the best individual in the population does not change for a certain number of generations (by default, it is proposed to set this number to eight), this individual is forcibly removed, and a new best individual is randomly selected from the queue. This makes it possible to escape areas of local minima caused by the relief of the objective function, as well as by a high degree of convergence of the individuals within one generation. The general scheme of the method is shown in Fig. 1.

Fig. 1. The general scheme of the method

3.1 The calculation of the output layer of ANN

When a support vector machine is used, the optimality criterion for calculating the output weights need not be specified explicitly. If the mean square error is replaced by the criterion of maximum separation of the support vectors, then the optimal linear output weights can be estimated using, for example, quadratic programming, as in the traditional support vector method; for this it is advisable to use the EVOKE operator [21], according to the formula:

$$y(t) = w_0 + \sum_{i=1}^{k} \sum_{j=0}^{l_i} w_{i,j} K\big(\phi(t), \phi_i(j)\big), \qquad (1)$$

where $\phi(t) \in R^n$ is the output of the recurrent neural network $f(\cdot)$ at time $t$; $K(\cdot,\cdot)$ is a predefined kernel function; and $w_{i,j}$ are the weights corresponding to the $k$ training sequences $\phi_i$, each of length $l_i$, calculated using the support vector machine.

One of the problems in implementing a neuroevolutionary method is the algorithm for calculating the output of an ANN with arbitrary topology.

An ANN can be represented as a directed planar graph. Since the network structure can be arbitrary, loops and cycles containing any nodes are allowed in the graph, except for the nodes corresponding to input neurons. Let us denote the set of nodes of the graph by $V = \{v_i \mid i = 0, \ldots, N_v - 1\}$ and the set of arcs by $E = \{e_j \mid j = 0, \ldots, N_e - 1\}$, where $N_v$ and $N_e$ are, respectively, the numbers of nodes and arcs in the graph, with $N_v = N_s$ (the number of neurons) and $N_e = N_c$ (the number of connections). The arc that goes from node $k$ to node $l$ is denoted by the ordered pair $c_{k,l} = (v_k, v_l)$, and the weight of the corresponding connection by $w_{k,l}$. We index the nodes of the graph in the same way as the neurons: the nodes that are input neurons (called input nodes) have indices in the range $[0; N_i - 1]$. By analogy, the indices of the output nodes belong to the interval $[N_i; N_i + N_o - 1]$, and the indices of the hidden nodes lie in the interval $[N_i + N_o; N_v - 1]$.
Let us introduce, for every node of the graph, an additional characteristic equal to the minimum length of a chain from that node to any of the input nodes, and denote it by $ch_i$. We call $ch_i$ the layer to which the $i$th node belongs. Thus, all input nodes belong to layer 0; all non-input nodes that have incoming arcs from input nodes belong to layer 1; all other nodes with incoming arcs from nodes of layer 1 belong to the layer with index 2, and so on. There may be situations in which a node has no incoming arcs; we call such a node a hanging node and assign it the layer number $ch_i = -1$.

For arcs we also introduce an additional characteristic $b_{k,l}$ of the arc $c_{k,l}$, which is needed to determine whether the arc is a forward or a backward one. It is calculated as follows:

$$b_{k,l} = \begin{cases} 1, & ch_l - ch_k > 0, \\ -1, & ch_l - ch_k \le 0, \end{cases} \qquad (2)$$

That is, if the layer index of the end node of the arc is greater than the layer index of its start node, we consider the arc a forward arc; otherwise we consider it a backward arc.

Since each node of the graph represents a neuron, we denote by $sum_i$ the value of the weighted sum of its inputs and by $o_i$ the value of its output (the value of the activation function of the $i$th neuron-node). Then $o_i = f(sum_i)$, where $f$ is the neuron activation function.

Let us divide the whole process of signal propagation from the input nodes into stages, such that during one stage the signals pass along only one arc. The number of the stage is denoted by $s$; for the very first stage, $s = 1$. For simplicity, we assume that all arcs have the same length and that signals travel along them instantaneously. We denote by $a_i$ the flag indicating that the output of node $i$ has been updated at the current stage: if $a_i = 1$, the output of the node has been calculated at stage $s$; if $a_i = -1$, it has not. Let us introduce one more designation, $X = \{x_i \mid i = 0, \ldots, N_i - 1\}$: the vector of input signals. Then the algorithm for calculating the ANN output is as follows:

1. $o_i = x_i$, $a_i = 1$, for all $i \in [0; N_i - 1]$;
2. $o_i = 0$, for all $i \in [N_i; N_s - 1]$;
3. $s = 1$;
4. $sum_i = 0$, $a_i = -1$, for all $i \in [N_i; N_s - 1]$;
5. if $s = 1$, go to step 7;
6. calculation of the feedback connections: for every backward arc $c_{j,k}$ entering node $v_k$, where $k \in [N_i; N_s - 1]$: $sum_k = sum_k + w_{j,k} o_j$, if $ch_j < s$;
7. if $a_i < 0$, call $fn(i)$, for all $i \in [N_i; N_s - 1]$;
8. if the stopping criterion is not met, set $s = s + 1$ and go to step 4.

Fig. 2. The general scheme of the calculation of the output layer of ANN

Here $fn(i)$ is a recursive function that calculates the output of the $i$th node taking into account all forward arcs. It works according to the following algorithm:

1. if $ch_i = 0$, go to step 3;
2. for every incoming forward arc $c_{k,i}$ of node $v_i$: if $a_k = 1$, then $sum_i = sum_i + w_{k,i} o_k$; otherwise call $fn(k)$ first, after which $o_k$ is available and is added in the same way;
3. $o_i = f(sum_i)$, $a_i = 1$;
4. exit.

The stopping criterion of the output calculation algorithm can be one of the following:

─ stabilization of the values at the output of the ANN;
─ $s$ exceeds a set value.

It is more reliable to calculate the output until the values at the output of the ANN stop changing, but when the network contains cycles and/or loops, its output may never become stable. Therefore, an additional stopping criterion limiting the maximum number of stages of output calculation is required.
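To make the staged calculation concrete, the following minimal Python sketch implements it under stated assumptions: weighted input sums, a sigmoid activation, the graph stored as per-node lists of forward and backward predecessors, and both stopping criteria (output stabilization and a stage limit). The representation and the names `fwd`, `bwd` and `weights` are illustrative choices, not the authors' implementation; the $ch_j < s$ condition of step 6 is omitted for brevity.

```python
import math

def ann_output(x, n_in, n_out, n_total, fwd, bwd, weights,
               f=lambda z: 1.0 / (1.0 + math.exp(-z)),
               max_stages=50, eps=1e-6):
    """Staged output calculation for an ANN of arbitrary topology.
    fwd[i] / bwd[i] list the source nodes of forward / backward arcs into i;
    weights[(k, i)] is the weight of the arc from node k to node i.
    Nodes 0..n_in-1 are inputs; n_in..n_in+n_out-1 are outputs."""
    o = [0.0] * n_total
    for i in range(n_in):                       # step 1: load the input signals
        o[i] = x[i]
    prev = None
    for s in range(1, max_stages + 1):          # stages (steps 3-8)
        sums = [0.0] * n_total
        done = [i < n_in for i in range(n_total)]   # the a_i flags for stage s
        if s > 1:                               # step 6: backward (recurrent) arcs
            for i in range(n_in, n_total):      # use outputs of the previous stage
                for k in bwd[i]:
                    sums[i] += weights[(k, i)] * o[k]

        def fn(i):                              # recursive forward pass (step 7)
            done[i] = True                      # mark first, to guard cycles
            for k in fwd[i]:
                if not done[k]:
                    fn(k)                       # compute the predecessor first
                sums[i] += weights[(k, i)] * o[k]
            o[i] = f(sums[i])

        for i in range(n_in, n_total):
            if not done[i]:
                fn(i)
        out = o[n_in:n_in + n_out]
        if prev is not None and all(abs(u - v) < eps   # stopping: stabilization
                                    for u, v in zip(out, prev)):
            break
        prev = out
    return o[n_in:n_in + n_out]
```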
4 Parallel genetic modified method for the synthesis of recurrent neural networks

Considering the features of the proposed modified genetic method for RNN synthesis, its parallel form can be represented as in Fig. 3. The method can be divided into three stages, separated by points of barrier synchronization. At the first stage, the main core initializes the population P and adjusts the initial parameters of the method, namely: the stopping criterion, the population size and the criterion for the adaptive selection of mutations. Next, equal parts of the population (subpopulations) and the initial parameters are distributed to the cores of the computer system. Initialization of the initial population cannot be carried out in parallel on the cores of the system, because the independently generated populations would intersect, thus increasing the time needed to search for solutions.

The second stage of the proposed method is performed in parallel by the cores of the system; all cores perform the same sequence of operations on their subpopulations. After the barrier synchronization, the main core receives the best solutions from the other cores and checks the stopping criterion. If it is met, the next generation (G) is formed. Otherwise, after changing the initial parameters so that the cores of the system can obtain other solutions, the method returns to distributing the initial parameters to the cores of the system, and the cores again perform the parallel calculations of the second stage.

Fig. 3. Parallel genetic method for RNN synthesis

The proposed parallel method for RNN synthesis can be applied both on MIMD systems [22] (clusters and supercomputers) and on SIMD systems (for example, graphics processors programmed with CUDA technology).
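The following minimal Python sketch illustrates, under stated assumptions, the three-stage master-worker scheme described above: the main process initializes and splits the population, the worker cores evolve their subpopulations in parallel, a barrier synchronization collects the best solutions, and the main process checks the stopping criterion and, if necessary, adjusts the parameters and repeats the second stage. The names (`evolve_subpopulation`, `target_fitness`) and the parameter-adjustment rule are illustrative assumptions, not the authors' implementation.

```python
from multiprocessing import Pool

def evolve_subpopulation(args):
    """Stage 2: one core runs the sequential genetic method (Section 3)
    on its subpopulation; here only a placeholder selection is shown."""
    subpop, params = args
    # ... run the sequential modified genetic method here ...
    return max(subpop, key=lambda ind: ind["fitness"])

def parallel_rnn_synthesis(population, n_cores, params, max_rounds=10):
    # Stage 1: the main core sets the parameters and splits the population.
    chunk = len(population) // n_cores
    subpops = [population[i * chunk:(i + 1) * chunk] for i in range(n_cores)]
    champion = None
    with Pool(n_cores) as pool:
        for _ in range(max_rounds):
            # Barrier synchronization: map() returns when every core is done.
            bests = pool.map(evolve_subpopulation,
                             [(sp, params) for sp in subpops])
            champion = max(bests, key=lambda ind: ind["fitness"])
            # Stage 3: the main core checks the stopping criterion.
            if champion["fitness"] >= params["target_fitness"]:
                break
            # Otherwise adjust the parameters so the cores can reach other
            # solutions, redistribute them, and repeat stage 2.
            params["mutation_rate"] = min(0.25, params["mutation_rate"] * 1.1)
    return champion
```

In a real implementation, the workers would run the full sequential method of Section 3; here the `Pool.map` call plays the role of the barrier synchronization point.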
5 Experiments

The following hardware and software were used for the experimental verification of the proposed parallel genetic method for RNN synthesis [23]:

1. the cluster of the Pukhov Institute for Modeling in Energy Engineering of the National Academy of Sciences of Ukraine (IPME), Kyiv: Intel Xeon 5405 processors, RAM 4×2 GB DDR2 per node, InfiniBand 20 Gb/s communication environment, Torque and OMPI middleware, MPI and Java threads programming models;
2. the computing system of the Department of Software Tools of Zaporizhzhia National Technical University (ZNTU), Zaporizhzhia: a Xeon E5-2660 v4 processor (14 cores), RAM 4×16 GB DDR4, Java threads programming model;
3. an Nvidia GTX 960 graphics processor (GPU) with 1024 cores, programmed using CUDA technology.

During testing, the main task is to assess the speed, quality and stability of the proposed method. Since the synthesized RNNs can later be used as diagnostic models in medical diagnosis, testing should be carried out on relevant test data.

The data for testing were taken from an open repository, the UC Irvine Machine Learning Repository; the Parkinson's Disease Classification Data Set was used [24]. The data were gathered from 188 patients with Parkinson's disease (107 men and 81 women) aged from 33 to 87 (65.1±10.9) at the Department of Neurology of the Cerrahpasa Faculty of Medicine, Istanbul University. The control group consists of 64 healthy individuals (23 men and 41 women) aged between 41 and 82 (61.1±8.9). During data collection, the microphone was set to 44.1 kHz and, following the physician's examination, the sustained phonation of the vowel was collected from each subject with three repetitions. Table 1 shows the main characteristics of the data sample.

Table 1. Main characteristics of the Parkinson's Disease Classification Data Set

Data Set Characteristics: Multivariate
Number of Instances: 756
Attribute Characteristics: Integer, Real
Number of Attributes: 754

6 The results analysis

Figs. 4 and 5 show graphs of the execution time (in minutes) of the proposed method on the computer systems, depending on the number of cores involved. The graphs show that the proposed method has an acceptable degree of parallelism and is performed effectively on both MIMD and SIMD systems. Thus, the IPME cluster reduced the method execution time from 1565 minutes (on one core) to an acceptable 147 minutes on 16 cores. On the ZNTU computing system, the execution time was reduced from 1268 minutes on a single core to 110 minutes on 16 cores. The differences in the performance of the systems are due to their architectural features: in the cluster, the cores are connected through the InfiniBand interconnect, while in the multi-core computer they are located on a single chip, which explains the smaller impact of overhead (transfers and synchronizations). In addition, the processor of the multi-core computer supports Turbo Boost technology [25], making the execution time of the method on a single core much smaller than on a core of the cluster, which does not support this technology. On the GPU with 960 cores involved, the execution time was 326.4 minutes, which is comparable to four cores of the IPME cluster or the ZNTU computing system.

Fig. 4. Dependence of the execution time of the proposed method on the number of involved cores of the IPME cluster and the ZNTU computing system

Fig. 5. Dependence of the execution time of the proposed method on the number of GPU cores involved

The speedup graphs of the calculations on the IPME cluster, the ZNTU computing system and the GPU are shown in Figs. 6 and 7.

Fig. 6. The speedup graphs of the calculations on the IPME cluster and the ZNTU computing system

Fig. 7. The speedup graph of the calculations on the GPU

The figures show that the speedup, though not linear, approaches linear. This is explained by the fact that the communication overhead of executing the proposed method on the computer systems is relatively small (Figs. 8, 9), and the number of parallel operations significantly exceeds the number of serial operations and synchronizations. By communication overhead we mean the ratio of the time spent by the system on transfers and synchronization among the cores to the time of the target calculations on a given number of cores. The efficiency graph of the IPME and ZNTU computer systems is presented in Fig. 10.
It shows that even when 16 cores of the computer systems are used to execute the proposed method, the efficiency remains at a relatively acceptable level, indicating the potential, if necessary, to use even more cores.

Fig. 8. Communication overhead of the proposed method versus the number of involved cores of the IPME cluster and the ZNTU computing system

Fig. 9. Communication overhead of the proposed method versus the number of GPU cores involved

Fig. 10. The efficiency graph of the IPME and ZNTU computing systems when executing the proposed method

Thus, the proposed method parallelizes well on modern computer architectures, which can significantly reduce the execution time of the task: generating models for future medical diagnosis.

7 Conclusion

The problem of finding an optimal method for ANN synthesis requires a comprehensive approach. Existing methods of ANN training are well tested, but they have a number of nuances and disadvantages. The paper proposes a mechanism for using a modified genetic algorithm for subsequent application in the synthesis of ANNs.

A model of a parallel genetic method of RNN synthesis is proposed which, in comparison with the sequential implementation, significantly speeds up the synthesis process. In the developed model, it is proposed to parallelize the most resource-intensive operations: the generation of RNN populations and the calculation of the genetic information of individuals, which can significantly accelerate the search for the best solution during network synthesis.

Based on the analysis of the experimental results, the proposed method can be said to work well. However, to reduce the number of iterations and improve accuracy, work on the parallelization of the calculations should be continued.

Acknowledgment

The work was performed as part of the project "Methods and means of decision-making for data processing in intellectual recognition systems" (state registration number 0117U003920) of Zaporizhzhia National Technical University.

References

1. Volchek, Y.A., Shyshko, V.M., Spiridonova, O.S., Mokhort, T.V.: Position of the model of the artificial neural network in medical expert systems. Juvenis Scientia (9), pp. 4–9. Scientia, Saint Petersburg (2017).
2. Shkarupylo, V., Skrupsky, S., Oliinyk, A., Kolpakova, T.: Development of stratified approach to software defined networks simulation. Eastern-European Journal of Enterprise Technologies, vol. 89, issue 5/9, pp. 67–73 (2017). doi: 10.15587/1729-4061.2017.110142
3. Leoshchenko, S., Oliinyk, A., Subbotin, S., Gorobii, N., Zaiko, T.: Synthesis of artificial neural networks using a modified genetic algorithm. In: Proceedings of the 1st International Workshop on Informatics & Data-Driven Medicine (IDDM 2018), pp. 1–13 (2018).
4. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning (ICML '06).
ACM, New York, pp. 369–376 (2006). doi: 10.1145/1143844.1143891
5. Kotsur, M., Yarymbash, D., Kotsur, I., Bezverkhnia, Yu.: Speed Synchronization Methods of the Energy-Efficient Electric Drive System for Induction Motors. In: IEEE 14th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET 2018), pp. 304–307, Lviv-Slavske, Ukraine (2018). doi: 10.1109/TCSET.2018.8336208
6. Van Tuc, N.: Approximation contexts in addressing graph data structures. University of Wollongong Thesis Collection, pp. 30–55 (2015).
7. Barkoulas, J.T., Baum, Ch.F.: Long Term Dependence in Stock Returns. Economics Letters, vol. 53, no. 3, pp. 253–259 (1996).
8. Barkoulas, J.T., Baum, Ch.F., Travlos, N.: Long Memory in the Greek Stock Market. Applied Financial Economics, vol. 10, no. 2, pp. 177–184 (2000).
9. Kolpakova, T., Oliinyk, A., Lovkin, V.: Improved method of group decision making in expert systems based on competitive agents selection. In: IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), pp. 939–943, Kyiv (2017). doi: 10.1109/UKRCON.2017.8100388
10. Stepanenko, O., Oliinyk, A., Deineha, L., Zaiko, T.: Development of the method for decomposition of superpositions of unknown pulsed signals using the second-order adaptive spectral analysis. Eastern-European Journal of Enterprise Technologies, vol. 2, no. 9, pp. 48–54 (2018). doi: 10.15587/1729-4061.2018.126578
11. Handa, A., Patraucean, V.: Backpropagation in convolutional LSTMs. University of Cambridge, Cambridge, pp. 1–5 (2015).
12. Boden, M.: A guide to recurrent neural networks and backpropagation. Halmstad University, pp. 1–10 (2001).
13. Guo, J.: Backpropagation Through Time, pp. 1–6 (2013).
14. Yue, B., Fu, J., Liang, J.: Residual Recurrent Neural Networks for Learning Sequential Representations. Information, 9, 56 (2018).
15. Erofeeva, V.A.: Review of the theory of data mining based on neural networks [Obzor teorii intellektualnogo analiza dannyih na baze neyronnyih setey]. Stohasticheskaya optimizatsiya v informatike, 11(3), pp. 3–17 (2015).
16. Yarymbash, D., Kotsur, M., Subbotin, S., Oliinyk, A.: A New Simulation Approach of the Electromagnetic Fields in Electrical Machines. In: IEEE International Conference on Information and Digital Technologies, Zilina, Slovakia, pp. 452–457 (2017). doi: 10.1109/DT.2017.8024332
17. Balakrishan, K., Honavar, V.: Properties of Genetic Representation of Neural Architectures. Iowa State University (1995).
18. Whitley, D.: Genetic Algorithms and Neural Networks. In: Genetic Algorithms in Engineering and Computer Science, pp. 203–216 (1995).
19. Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press (1996).
20. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation, vol. 9, issue 8, pp. 1735–1780 (1997).
21. Schmidhuber, J., Wierstra, D., Gagliolo, M., Gomez, F.: Training Recurrent Networks by Evolino. Neural Computation, vol. 19, no. 3, pp. 757–779 (2007). doi: 10.1162/neco.2007.19.3.757
22. Skillicorn, D.: A taxonomy for computer architectures. Computer, vol. 21, pp. 46–57 (1988). doi: 10.1109/2.86786
23. Alsayaydeh, J.A., Shkarupylo, V., Hamid, M.S., Skrupsky, S., Oliinyk, A.: Stratified Model of the Internet of Things Infrastructure. Journal of Engineering and Applied Science, vol. 13, issue 20, pp. 8634–8638 (2018). doi: 10.3923/jeasci.2018.8634.8638
24. Parkinson's Disease Classification Data Set, https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification
25. Intel Turbo Boost Technology 2.0, https://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-technology.html