Neural Networks in Intelligent Analysis of Medical Data for Decision Support

Vasyl Sheketa a, Mykola Pasieka a, Nelly Lysenko b, Oleksandra Lysenko b, Nadia Pasieka b and Yulia Romanyshyn a
a National Tech. University of Oil & Gas, Ivano-Frankivsk, 76068, Ukraine
b Vasyl Stefanyk Precarpathian National University, Ivano-Frankivsk, 76000, Ukraine

Abstract
The main purpose of this work was to consider the problem of neural networks and their application, especially for data management and control in the medical industry. A software product that implements a neural network and analyzes the reliability of processing unstructured and poorly structured medical data to support decision-making was developed and trained on sets of user-defined information flows. On the basis of the scientific task, a training algorithm for the program was developed, so that after training the program provides comprehensive decision-making support based on what it has learned. The developed software application is cross-platform, and its graphical interface is implemented using JavaFX. The software product provides an error backpropagation network (BackPropagation) and a directed random search network (Directed Random Search). The designed neural network is trained and then recognizes the type of distribution (uniform or normal) from the specified characteristics; the "3 sigma" rule is used to generate synthetic data. According to the study, we can conclude that the Directed Random Search learning algorithm, although more complex to implement for the search of relevant medical documents, works much faster than classical backpropagation.

Keywords: Neural network, mathematical models, systems architecture, software applications, CEUR-WS

IDDM'2020: 3rd International Conference on Informatics & Data-Driven Medicine, November 19–21, 2020, Växjö, Sweden
MAIL: vasylsheketa@gmail.com (A. 1); pms.mykola@gmail.com (A. 2); lysenkowa@gmail.com (A. 3); leuro@list.ru (A. 4); pasyekanm@gmail.com (A. 5); yulromanyshyn@gmail.com (A. 6)
ORCID: 0000-0002-1318-4895 (A. 1); 0000-0002-3058-6650 (A. 2); 0000-0002-1029-7843 (A. 3); 0000-0002-1029-7843 (A. 4); 0000-0002-4824-2370 (A. 5); 0000-0001-7231-8040 (A. 6)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Introduction

1.1. Basic concepts of neural networks

Artificial neural networks are mathematical models, as well as their software and hardware implementations, built on the principle of biological neural networks, that is, the networks of nerve cells of a living organism. Their organization, architecture and principle of operation are based on an analogy with the brain of living beings. The key element of such systems is the artificial neuron, an imitation model of a nerve cell of the brain, the biological neuron. The term arose in the course of studying the processes occurring in the brain and attempting to simulate those processes. The first such attempt was the neural network of McCulloch and Pitts [1, 4, 6, 11, 12, 18, 23, 27, 34, 37]. Subsequently, after training algorithms had been developed, the obtained models were used for practical purposes: in forecasting tasks, for pattern recognition, in control tasks, and others [21].

Neural networks can be classified as follows.
By the type of input information:
• analog neural networks (use information in the form of real numbers);
• binary neural networks (operate with information presented in binary form).
By the character of learning:
• learning with a teacher (supervised learning): the desired network output is known;
• learning without a teacher (unsupervised learning): the neural network processes only the input of unstructured and poorly structured medical data and forms the output results itself; such networks are called self-organizing;
• teacher-supported learning (reinforcement learning): a system of penalties and rewards received from the environment.
By the nature of synapse setting:
• networks with fixed links: the weights W of the neural network are chosen immediately from the conditions of the task, so that dW/dt = 0, where W are the network weights;
• networks with dynamic links: the synaptic links are adjusted in the course of training, i.e. dW/dt ≠ 0, where W are the weights of the network.

1.2. Back propagation network

Backpropagation is a method of training a multilayer perceptron. The method was first described in 1974 by A. I. Galushkin and, independently and simultaneously, by Paul J. Werbos. It was further developed in 1986 by David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams and, independently and simultaneously, by S. I. Bartsev and V. A. Okhonin (the Krasnoyarsk group). It is an iterative gradient algorithm that is used to minimize the operating error of a multilayer perceptron and to obtain the desired output. The main idea of the method is to propagate error signals from the network outputs to its inputs, in the direction opposite to the direct propagation of signals in normal operation. Bartsev and Okhonin proposed a general method (the «duality principle») applicable to a wider class of systems, including systems with delay, distributed systems, etc. [17]. To apply the error backpropagation method, the neuron transfer function must be differentiable [24]. The method is a modification of the classical gradient descent method.

The error backpropagation algorithm is one of the methods for training multilayer feed-forward neural networks (multilayer perceptrons) [2, 7, 25, 30, 36, 40]. Training by the error backpropagation method involves two passes through all layers of the network: forward and reverse. In the forward pass, the input vector is fed to the input layer of the neural network and then propagated through the network from layer to layer. As a result, a set of output signals is generated, which is the actual reaction of the network to this input image. All synaptic weights of the network are fixed during the forward pass. During the reverse pass, all synaptic weights are adjusted according to the error correction rule: the actual network output is subtracted from the desired one, resulting in an error signal. This signal is then propagated through the network in the direction opposite to that of the synaptic links; hence the name of the method. The synaptic weights are adjusted to bring the network output as close to the desired one as possible [9, 10, 13].

The appearance of the backpropagation algorithm was a landmark event in the field of neural network development, as it implements a computationally efficient method of multilayer perceptron training. It would be wrong to say that the error backpropagation algorithm offers a truly optimal solution to all potential problems, but it has dispelled the pessimism about multilayer machine learning [26, 29, 33].
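Since the only requirement the method places on the transfer function is that it be differentiable everywhere, it is convenient to illustrate this with the logistic function used below. The short Java sketch that follows (class and method names are ours and are not taken from the described software) shows the function and its derivative, which is expressed through the function's own output:

```java
/** Minimal sketch of the logistic (sigmoid) activation function and its derivative.
 *  Hypothetical helper shown only to illustrate the differentiability requirement. */
public final class SigmoidSketch {

    // f(Y) = 1 / (1 + exp(-a*Y)), where a is the slope parameter of the sigmoid.
    static double value(double y, double a) {
        return 1.0 / (1.0 + Math.exp(-a * y));
    }

    // The derivative can be written through the output itself: f'(Y) = a * OUT * (1 - OUT);
    // this is exactly the factor that appears in the weight-correction formulas below.
    static double derivative(double out, double a) {
        return a * out * (1.0 - out);
    }

    public static void main(String[] args) {
        double out = value(0.5, 1.0);
        System.out.printf("OUT = %.4f, derivative = %.4f%n", out, derivative(out, 1.0));
    }
}
```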
Let us consider the work of the algorithm in more detail. Assume that the neural network shown in Fig. 1 has to be trained by applying the error backpropagation algorithm. The following notation is used in this figure:
• each layer of the neural network has its own letter, e.g. the input layer has the letter a and the output layer the letter c;
• all neurons of each layer are numbered with Arabic numerals;
• w(a1-b1) is the synaptic weight between neurons a1 and b1.

Figure 1: Example of a two-layer perceptron

As an activation function in multilayer perceptrons, a sigmoid activation function, in particular the logistic one, is usually used:

OUT = \frac{1}{1 + \exp(-aY)}                (1)

where a is the slope parameter of the sigmoid function. By changing this parameter, functions of different steepness can be built. Let us agree that in all subsequent considerations exactly the logistic activation function (Fig. 2), given by formula (1), will be used.

Figure 2: Sigmoid

The sigmoid compresses the range of change so that the OUT value lies between zero and one. Multilayer neural networks have greater representational power than single-layer networks only when non-linearity is present, and the compression function provides the necessary non-linearity. In fact, there are many functions that could be used: the error backpropagation algorithm only requires that the function be differentiable everywhere, and the sigmoid meets this requirement. Its additional advantage is automatic gain control. For weak signals (i.e. when OUT is close to zero) the input/output curve has a steep slope, which gives a high gain. As the signal becomes larger, the gain drops. Thus, large signals are perceived by the network without saturation, while weak signals pass through the network without excessive attenuation.

The purpose of backpropagation training is to adjust the weights of the network so that a certain set of inputs leads to the required set of outputs. For brevity, these sets of inputs and outputs are called vectors. During training it is assumed that for each input vector there is a paired target vector, which specifies the required output. Together they are called a training pair. The network is trained on many such pairs.

The error backpropagation algorithm is as follows:
1. Initialize the synaptic weights with small random values.
2. Select the next training pair from the training set; feed the input vector to the network input.
3. Calculate the network output.
4. Calculate the difference between the network output and the required output (the target vector of the training pair).
5. Adjust the network weights so as to minimize the error.
6. Repeat steps 2 to 5 for each vector of the training set until the error over the whole set reaches an acceptable level.

The operations performed in steps 2 and 3 are the same as those performed when the already trained network operates: the input vector is supplied and the output is calculated. The calculations are performed layer by layer. In Fig. 1, the outputs of the neurons of layer B are calculated first (layer A is the input layer, so no calculations take place in it); they are then used as the inputs of layer C, and the outputs OUT of the neurons of layer C are calculated, which form the output vector OUT of the network.
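Steps 2 and 3, the forward pass, amount to a layer-by-layer computation. The following minimal Java sketch illustrates it for a toy network in the spirit of Fig. 1; the layer sizes, weight values and names are illustrative assumptions, not the paper's actual configuration:

```java
import java.util.Arrays;

/** Sketch of the forward pass (steps 2 and 3): the input vector is propagated
 *  layer by layer while all weights stay fixed. Hypothetical illustration only. */
public final class ForwardPassSketch {

    static double sigmoid(double y) {
        return 1.0 / (1.0 + Math.exp(-y));
    }

    /** Computes the outputs of one layer: OUT_q = f(sum_p w[p][q] * in_p). */
    static double[] forward(double[] in, double[][] weights) {
        int outSize = weights[0].length;
        double[] out = new double[outSize];
        for (int q = 0; q < outSize; q++) {
            double sum = 0.0;
            for (int p = 0; p < in.length; p++) {
                sum += weights[p][q] * in[p];
            }
            out[q] = sigmoid(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy network: layer A (2 neurons) -> layer B (2 neurons) -> layer C (1 neuron).
        double[][] wAB = { {0.1, -0.2}, {0.3, 0.15} };   // w(A_p - B_q)
        double[][] wBC = { {0.25}, {-0.4} };             // w(B_p - C_1)
        double[] a = {1.0, 0.0};                         // input vector
        double[] b = forward(a, wAB);                    // outputs of layer B
        double[] c = forward(b, wBC);                    // output vector OUT of the network
        System.out.println("OUT(B) = " + Arrays.toString(b) + ", OUT(C) = " + Arrays.toString(c));
    }
}
```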
Steps 2 and 3 form the so-called «forward pass», since the signal propagates through the network from input to output. Steps 4 and 5 make up the «reverse pass»: here the calculated error signal propagates back through the network and is used to adjust the weights. Let us consider step 5, the correction of the network weights, in more detail. Two cases should be distinguished here.

Case 1. Correction of the synaptic weights of the output layer. For the neural network model in Fig. 1, these are the weights denoted w(B1-C1) and w(B2-C1). Let the index p denote the neuron from which the synaptic weight originates and q the neuron it enters [8, 19]. Introduce the value Δ, equal to the difference between the required output T_q and the actual output OUT_q multiplied by the derivative of the logistic activation function (formula (1)):

\Delta_q = OUT_q (1 - OUT_q)(T_q - OUT_q)                (2)

Then the weights of the output layer after the correction will be equal to:

W_{p-q}(i + 1) = W_{p-q}(i) + n \Delta_q OUT_p                (3)

where i is the number of the current learning iteration; W_{p-q} is the value of the synaptic weight connecting neuron p with neuron q; n is the «learning rate» coefficient, which allows the average value of the weight change to be controlled; OUT_p is the output of neuron p. Here is an example of the calculations for the synaptic weight W_{B1-C1}:

\Delta_{C1} = OUT_{C1}(1 - OUT_{C1})(T - OUT_{C1})                (4)

W_{B1-C1}(i + 1) = W_{B1-C1}(i) + n \Delta_{C1} OUT_{B1}                (5)

Case 2. Correction of the synaptic weights of the hidden layer. For the neural network model in Fig. 1, these are the weights between layers A and B. As before, let the index p denote the neuron from which the synaptic weight originates and q the neuron it enters (note the appearance of the new index k, which runs over the m neurons of the next layer):

\Delta_q = OUT_q (1 - OUT_q) \sum_{k=1}^{m} \Delta_k w_{q-k}                (6)

Then the weights of the hidden layer after the correction will be equal to:

W_{p-q}(i + 1) = W_{p-q}(i) + n \Delta_q OUT_p                (7)

Here is an example of the calculation for the synaptic weight W_{A1-B1} (with a single output neuron C1, the sum in (6) reduces to one term):

\Delta_{B1} = OUT_{B1}(1 - OUT_{B1}) \Delta_{C1} w_{B1-C1}                (8)

W_{A1-B1}(i + 1) = W_{A1-B1}(i) + n \Delta_{B1} OUT_{A1}                (9)

For each neuron in the hidden layer, Δ must be calculated and all weights associated with this layer must be adjusted. This process is repeated layer by layer until all weights have been corrected.
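For a network with one hidden layer, this weight-correction step can be written compactly. The Java sketch below follows formulas (2)-(7); it is a hypothetical illustration rather than the authors' BackPropagationNeuralNet class:

```java
/** Sketch of step 5 (weight correction) for a network with one hidden layer,
 *  following formulas (2)-(7). Hypothetical illustration, not the authors' code. */
public final class BackwardPassSketch {

    /**
     * @param in      outputs of the input layer (OUT_a)
     * @param hidden  outputs of the hidden layer (OUT_b)
     * @param out     outputs of the output layer (OUT_c)
     * @param target  desired outputs (T)
     * @param wIH     weights between input and hidden layer, wIH[p][q] = w(a_p - b_q)
     * @param wHO     weights between hidden and output layer, wHO[p][q] = w(b_p - c_q)
     * @param n       learning rate
     */
    static void adjustWeights(double[] in, double[] hidden, double[] out, double[] target,
                              double[][] wIH, double[][] wHO, double n) {
        // Case 1: deltas of the output layer, formula (2).
        double[] deltaOut = new double[out.length];
        for (int q = 0; q < out.length; q++) {
            deltaOut[q] = out[q] * (1.0 - out[q]) * (target[q] - out[q]);
        }
        // Case 2: deltas of the hidden layer, formula (6); the output-layer deltas
        // are propagated back through the (still unchanged) output weights.
        double[] deltaHidden = new double[hidden.length];
        for (int q = 0; q < hidden.length; q++) {
            double sum = 0.0;
            for (int k = 0; k < out.length; k++) {
                sum += deltaOut[k] * wHO[q][k];
            }
            deltaHidden[q] = hidden[q] * (1.0 - hidden[q]) * sum;
        }
        // Weight corrections, formulas (3) and (7): W(i+1) = W(i) + n * delta_q * OUT_p.
        for (int p = 0; p < hidden.length; p++) {
            for (int q = 0; q < out.length; q++) {
                wHO[p][q] += n * deltaOut[q] * hidden[p];
            }
        }
        for (int p = 0; p < in.length; p++) {
            for (int q = 0; q < hidden.length; q++) {
                wIH[p][q] += n * deltaHidden[q] * in[p];
            }
        }
    }
}
```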
Despite the many successful applications of backpropagation, it is not a universal solution. What causes the most trouble is the indefinitely long learning process: in complex tasks it may take days or even weeks to train a network, and the network may fail to learn at all. The reason may be one of the following.

Network paralysis (fixation). In the process of training, the weights may become very large as a result of correction. This can cause all or most neurons to operate at very large OUT values, in a region where the derivative of the compression function is very small. Since the error sent back during learning is proportional to this derivative, the learning process can almost freeze. Theoretically this problem is poorly understood. It is usually avoided by reducing the step size η, but this increases the training time. Different heuristics have been used to prevent paralysis or to recover from it, but so far they can only be regarded as experimental.

Local minima. Backpropagation uses a kind of gradient descent, i.e. a descent down the error surface with continuous adjustment of the weights towards the minimum. The error surface of a complex network is strongly indented and consists of hills, valleys and folds in a space of high dimensionality. The network can fall into a local minimum (a shallow valley) when a much deeper minimum is nearby. At the point of a local minimum all directions lead upwards, and the network is unable to get out of it. The main difficulty in training neural networks lies precisely in the methods of escaping local minima: every time a local minimum is left, the next local minimum is again searched for by the same error backpropagation method, until it is no longer possible to find a way out of it.

Step size. A careful analysis of the convergence proof shows that the weight corrections are assumed to be infinitesimally small. Clearly this is not feasible in practice, since it leads to infinite training time. The step size should therefore be finite. If the step size is fixed and very small, convergence is too slow; if it is fixed and too large, paralysis or permanent instability may occur. An effective approach is to increase the step as long as the evaluation keeps improving in the given anti-gradient direction and to decrease it if no such improvement occurs. P. D. Wasserman described an adaptive step selection algorithm that automatically corrects the step size during training. The book by A. N. Gorban offers a branched technology of learning optimization.
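A minimal sketch of such an adaptive step heuristic (grow the step while the error keeps improving, shrink it otherwise) is shown below; this is not Wasserman's algorithm, and the growth and shrink factors are arbitrary assumptions:

```java
/** Hypothetical illustration of an adaptive learning-rate heuristic:
 *  increase the step while the error keeps improving, decrease it otherwise. */
public final class AdaptiveStepSketch {
    private double eta;                                   // current learning rate (step size)
    private double previousError = Double.MAX_VALUE;
    private final double growFactor = 1.05;               // assumed values, not from the paper
    private final double shrinkFactor = 0.5;

    AdaptiveStepSketch(double initialEta) {
        this.eta = initialEta;
    }

    /** Call once per training iteration with the freshly computed error. */
    double nextStep(double currentError) {
        if (currentError < previousError) {
            eta *= growFactor;     // improvement: keep growing the step
        } else {
            eta *= shrinkFactor;   // no improvement: back off to avoid paralysis or instability
        }
        previousError = currentError;
        return eta;
    }
}
```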
It should also be noted that overtraining of a network is rather the result of an erroneous design of its topology [5]. If there are too many neurons, the network loses its ability to generalize information. The whole set of images presented for training will be learned by the network, but any other images, even very similar ones, may be classified incorrectly [3, 14, 15].

1.2.1. Extracting knowledge from a dataset to determine the distribution type (normal, uniform)

The results of any measurement or observation, presented as numbers, can be considered random values obeying probabilistic laws. Their probabilistic nature means that it is fundamentally impossible to obtain the exact value of the parameter of interest: too many factors affect it and the process of its change. We can only approach the actual value and estimate the interval into which it falls. All conclusions made when working with random variables are therefore not deterministic but probabilistic.

A random variable is most fully described by its distribution function, which determines the probability that, as a result of a single experiment, the variable will take a value less than or equal to a given one. If the random variable is continuous, the derivative of the distribution function is the probability density function; it cannot be calculated explicitly for every natural object because of the enormous variety of such objects. It is known from long experience with the apparatus of applied mathematical statistics that the absolute majority of random phenomena in nature are described with high accuracy by functions of only a few types [31]. These functions are well known and studied in detail; they are called the basic laws of distribution of random variables. Among them one stands out: the normal distribution law. It is one of the most common models, and the specific features of the function that describes it make this law the main one in the methods of applied statistics.

The graph of the normal distribution density is characterized by symmetry. This means that deviations of a random value from its most probable value are equally likely towards larger and smaller values, a property that simplifies calculations. Most of the described methods assume that the random variable under study is distributed according to a normal law. Therefore, at the beginning of any statistical processing of unstructured and poorly structured medical data, the distribution law should be determined at least approximately and the degree of its deviation from the normal law estimated [20, 38, 39].

There are many methods that solve this problem. The simplest one is based on visual evaluation of the histogram and on the values of the asymmetry and excess coefficients. The histogram is a simplified model of the density curve of the random value distribution. By constructing it and comparing it with reference plots of the basic laws, one can roughly judge the degree of similarity between them. To build a histogram, the range of the random value is broken down into a certain number of bins (grouping intervals) and the number of observations falling into each bin is counted. The bin boundaries are then placed on the abscissa axis and the corresponding frequencies on the ordinate axis. There are no absolutely strict methods for determining the number of bins; mostly 8-12 bins are used. There is never an ideal match for histograms of real random variables, but by the shape of the constructed histogram one can judge the degree of its deviation from the normal distribution. If the histogram is symmetric with respect to the vertical axis passing through its apex, one can speak of a possible approximation to the normal law. The mode of a discrete random variable is its most probable value, while for a continuous variable it is the value at which the distribution density is maximal. If the curve of the distribution law has more than one maximum, the distribution is called bimodal or polymodal, respectively. The median of a random variable is the value relative to which the random variable is equally likely to be observed above or below it.

1.2.2. Rule "3 sigma"

A normal distribution, also called a Gaussian distribution, is a probability distribution defined by the probability density function, which coincides with the Gauss function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}                (10)

where µ is the mathematical expectation and σ² is the variance of the random variable. The central limit theorem states that the normal distribution arises when a given random variable is the sum of a large number of independent random variables, each of which plays an insignificant role in the formation of the whole sum. For example, the deviation of a projectile's point of impact from the target over a large number of shots is characterized by the normal distribution. The standard normal distribution is the normal distribution with mathematical expectation µ = 0 and standard deviation σ = 1.

The rule of 3 sigma (3σ) states that almost all values of a normally distributed random variable lie in the interval [x̄ - 3σ; x̄ + 3σ] (Fig. 3).

Figure 3: Rule of 3 sigma

More precisely, with at least 99.7% reliability the value of a normally distributed random variable lies within the specified interval (provided that the value of σ is known exactly and has not been obtained as a result of sample processing). If the true value of σ is unknown, one should use not σ but its sample estimate s; the rule of 3σ then turns into the rule of 3s.
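The described software uses the "3 sigma" rule to generate synthetic test data (see Section 2.3), but the generator itself is not shown in the paper. A plausible Java sketch, under the assumption that normal samples are kept inside [µ - 3σ; µ + 3σ] and uniform samples are drawn over the same interval, is given below:

```java
import java.util.Random;

/** Hypothetical sketch of synthetic data generation with the "3 sigma" rule:
 *  normal samples are kept inside [mu - 3*sigma, mu + 3*sigma], and uniform
 *  samples are drawn over the same interval for the second distribution type. */
public final class SyntheticDataSketch {
    private static final Random RND = new Random();

    /** Draws a normally distributed value and redraws it until it falls within mu +/- 3*sigma. */
    static double normalWithin3Sigma(double mu, double sigma) {
        double x;
        do {
            x = mu + sigma * RND.nextGaussian();
        } while (Math.abs(x - mu) > 3.0 * sigma);  // by the 3-sigma rule this triggers only ~0.3% of the time
        return x;
    }

    /** Draws a uniformly distributed value over the same [mu - 3*sigma, mu + 3*sigma] interval. */
    static double uniformWithin3Sigma(double mu, double sigma) {
        return mu - 3.0 * sigma + 6.0 * sigma * RND.nextDouble();
    }

    public static void main(String[] args) {
        double mu = 10.0, sigma = 2.0;
        double[] normalColumn = new double[100];
        double[] uniformColumn = new double[100];
        for (int i = 0; i < normalColumn.length; i++) {
            normalColumn[i] = normalWithin3Sigma(mu, sigma);
            uniformColumn[i] = uniformWithin3Sigma(mu, sigma);
        }
        System.out.printf("first normal sample: %.3f, first uniform sample: %.3f%n",
                normalColumn[0], uniformColumn[0]);
    }
}
```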
2. Practical Implementation of the Method of Searching for Weakly Structured Information

2.1. UML class diagrams

The GUI package (classes that implement the interface) includes two classes: MainController and Main (Fig. 4). MainController is the class responsible for user interaction; it contains the event handlers and the main interface elements (buttons, tables, combo boxes), as well as a reference to the Main class. The Main class is responsible for initializing the main window of the program and loading the interface structure from an FXML file [16, 32]. The neuralNets package contains the NeuralNet interface and the DirectedRandomSearchNet and BackPropagationNeuralNet classes (Fig. 5). The NeuralNet interface is used (in accordance with the SOLID principles) to provide flexibility of the program architecture. It contains the signatures of the following methods: train (training of the neural network), solve (calculation of results by an already trained neural network), loadWeights (loading of weights from a file), saveWeights (saving of weights to a file), getWeights (returning the weights), and others.

Figure 4: GUI package class diagram

Figure 5: Neural nets class diagram

2.2. Description of neural network structure

To accomplish this task, a neural network (multilayer perceptron) with 3 layers was designed. The input layer consists of 4 neurons (due to the size of the input image), the hidden layer consists of 10 neurons (the size was chosen empirically) and the output layer contains one neuron (Fig. 6) [22].

Figure 6: The structure of the neural network

For the hidden layer the logistic unipolar function, the sigmoid (11), is used:

f(z) = \frac{1}{1 + e^{-z}}                (11)

The outputs of the bias neurons are fixed to one. The initial values of the weights are small random numbers within [-0.3; 0.3].
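The paper lists only the method names of the NeuralNet interface and the 4-10-1 topology with bias neurons and initial weights in [-0.3; 0.3]. A rough Java reconstruction is sketched below; all parameter lists, return types and the extra bias row are our assumptions, not the authors' actual API:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Random;

/** Assumed shape of the NeuralNet interface from Section 2.1 (signatures are hypothetical). */
interface NeuralNet {
    void train(double[][] inputs, double[][] targets, double learningRate, int maxIterations, double maxError);
    double[] solve(double[] input);                 // calculation by an already trained network
    void loadWeights(Path file) throws IOException; // loading of weights from a file
    void saveWeights(Path file) throws IOException; // saving of weights to a file
    double[][][] getWeights();                      // returning the weights, layer by layer
}

/** Sketch of the 4-10-1 multilayer perceptron from Section 2.2: 4 input neurons,
 *  10 hidden neurons with a unipolar sigmoid, 1 output neuron, bias outputs fixed
 *  to one, initial weights drawn from [-0.3, 0.3]. */
final class PerceptronTopologySketch {
    static double[][] randomWeights(int rows, int cols, Random rnd) {
        double[][] w = new double[rows][cols];
        for (int p = 0; p < rows; p++)
            for (int q = 0; q < cols; q++)
                w[p][q] = -0.3 + 0.6 * rnd.nextDouble();   // small random values within [-0.3; 0.3]
        return w;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        // The "+1" row accounts for the bias neuron of each layer, whose output is fixed to 1.
        double[][] inputToHidden = randomWeights(4 + 1, 10, rnd);
        double[][] hiddenToOutput = randomWeights(10 + 1, 1, rnd);
        System.out.println("input->hidden weights: " + inputToHidden.length + " x " + inputToHidden[0].length);
        System.out.println("hidden->output weights: " + hiddenToOutput.length + " x " + hiddenToOutput[0].length);
    }
}
```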
2.3. GUI Description

The application is designed to be cross-platform, so it was developed with Java SE 14 and JavaFX (note that JRE 10.0.2 must be installed to run the application). To get acquainted with the functionality of the application, consider the main window of the program (Fig. 7).

Figure 7: Main program window
1 – File menu
2 – Selection of the neural network type
3 – Button to start the training of the neural network
4 – Training data table
5 – Table of forecasted data
6 – Button for generation of synthetic data
7 – Button to start the process of classifying objects by the neural network
8 – Text field with execution results

Figure 8: File menu

The file menu includes the following sub-items:
1 – (New training data) clear the "Training data" table;
2 – (Download training data) upload data to the "Training data" table from a CSV file;
3 – (Save training data) save data from the "Training data" table to a CSV file;
4 – (New data) clear the "Real data" table;
5 – (Download data) upload data to the "Real data" table from a CSV file;
6 – (Save data) save data from the "Real data" table to a CSV file;
7 – (New weights) clear the weights of the neural network;
8 – (Download Weights) load weights from a file;
9 – (Save Weights) save weights to a file;
10 – (Clear Results) clear the results text field;
11 – (Exit) finish the program.

To improve the functionality of the application, dynamic tables were implemented: new records can be added to a table, existing ones can be edited and deleted, and unstructured and poorly structured medical data can be uploaded from and saved to a file [28, 35].

To load data into a table, go to the menu "File" -> ("Download training data" or "Download data") and select the file to be uploaded (only CSV files are supported). Data and weights are saved in the same way: "File" -> ("Save Training Data", "Save Data" or "Save Weights", respectively). To train a neural network, first select the algorithm by which it will learn (Fig. 9).

Figure 9: Setting of neural network training parameters

After that, press the "Train" button and set the neural network training parameters, such as the training speed, the maximum number of iterations and the maximum permissible error (Fig. 10).

Figure 10: Setting of neural network training parameters

After setting the training parameters for the neural network, press the "Yes" key to start training. When the neural network finishes training, the user sees the graph of the error change (Fig. 11, Fig. 12), as well as information about the training results (Fig. 13) and the final weights, which can be saved to a file.

Figure 11: Error change graph for the Back Propagation network

Figure 12: Error change graph for the Directed Random Search network (directed random search)

Figure 13: Learning outcomes

From the error graphs (Fig. 11, Fig. 12) and the number of iterations, it can be concluded that the Directed Random Search learning algorithm, although more complex to implement, works much faster than the classic Back Propagation. With the same error and learning rate, training the network with Directed Random Search was completed at the 20th iteration, whereas Back Propagation required 940 iterations. Synthetic unstructured and poorly structured medical data generated by the "3 sigma" rule are used to test the operation of the neural network. To generate artificial medical data, press the "Generate" key, specify the number of columns to be filled with artificial medical data and press the "Yes" key (Fig. 14).

Figure 14: Options for generation of synthetic data

After the generation process is complete, the table is filled with the corresponding number of columns of artificial data (Fig. 15, Fig. 16).

Figure 15: The result of generating synthetic data

Figure 16: Calculation result

Conclusions

The main objective of this work was to review the problem of neural networks and their applications, especially for management and control. A software product implementing a neural network was developed and trained on user-defined medical data. After training, the program provides support for decision-making based on what it has learned. The program is cross-platform, so it is implemented in Java SE 14, and the graphical interface is built with JavaFX. The software product implements both an error backpropagation network (BackPropagation) and a directed random search network (Directed Random Search). The neural network is trained and then recognizes the type of distribution (uniform or normal) from the specified characteristics. The "3 Sigma" rule is used for the generation of synthetic unstructured and poorly structured medical data. According to the research, we can conclude that the Directed Random Search learning algorithm, although more difficult to implement, works much faster than the classic Back Propagation. With the same error and learning rate, training the network with Directed Random Search can be several times faster than with Back Propagation.

References
[1] A. Lisovskaya and T. Skripnik, "Processing of Neural System Information with the Use of Artificial Spiking Neural Networks," 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus), Saint Petersburg and Moscow, Russia, 2019, pp. 1183-1186, doi: 10.1109/EIConRus.2019.8656651.
[2] A. Liu, Y. Yang, Q. Sun and Q. Xu, "A Deep Fully Convolution Neural Network for Semantic Segmentation Based on Adaptive Feature Fusion," 2018 5th International Conference on Information Science and Control Engineering (ICISCE), Zhengzhou, 2018, pp. 16-20, doi: 10.1109/ICISCE.2018.00013.
[3] Andrunyk, V., Vasevych, A., Chyrun, L., Chernovol, N., Antonyuk, N., Gozhyj, A., Gozhyj, V., Kalinina, I. and Korobchynskyi, M. (2020). Development of information system for aggregation and ranking of news taking into account the user needs. Paper presented at the CEUR Workshop Proceedings, 2604, 1127-1171.
[4] B. J. Isaac, H. Kinjo, K. Nakazono and N. Oshiro, "Suitable Activity Function of Neural Networks for Data Enlargement," 2018 18th International Conference on Control, Automation and Systems (ICCAS), Daegwallyeong, 2018, pp. 392-397.
[5] D. Ageyev, A. Mohsin, T. Radivilova and L. Kirichenko, "Infocommunication Networks Design with Self-Similar Traffic," IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), Polyana, Ukraine, 2019, pp. 24-27, doi: 10.1109/CADSM.2019.8779314.
[6] F. Lotfi, V. Ajallooeian and H. D. Taghirad, "Robust Object Tracking Based on Recurrent Neural Networks," 2018 6th RSI International Conference on Robotics and Mechatronics (IcRoM), Tehran, Iran, 2018, pp. 507-511, doi: 10.1109/ICRoM.2018.8657608.
[7] G. Zhou, L. Lv, X. Qiao and L. Jin, "Hierarchical Attention-based Fuzzy Neural Network for Subject Classification of Power Customer Service Work Orders," 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, USA, 2019, pp. 1-6, doi: 10.1109/FUZZ-IEEE.2019.8858852.
[8] Hengliang Tang, Yuan Mi, Fei Xue, Yang Cao, "An Integration Model Based on Graph Convolutional Network for Text Classification", Access IEEE, vol. 8, pp. 148865-148876, 2020.
[9] I. Dronyuk, O. Fedevych and N. Kryvinska, "High Quality Video Traffic Ateb-Forecasting and Fuzzy Logic Management," 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud), Istanbul, Turkey, 2019, pp. 308-311, doi: 10.1109/FiCloud.2019.00051.
[10] I. Dronyuk, Y. Klishch and S. Chupakhina, "Developing Fuzzy Traffic Management for Telecommunication Network Services," 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), Polyana, Ukraine, 2019, pp. 1-4, doi: 10.1109/CADSM.2019.8779323.
[11] J. C. Heck and F. M. Salem, "Simplified minimal gated unit variations for recurrent neural networks," 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, 2017, pp. 1593-1596, doi: 10.1109/MWSCAS.2017.8053242.
[12] K. Vulinović, L. Ivković, J. Petrović, K. Skračić and P. Pale, "Neural Networks for File Fragment Classification," 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2019, pp. 1194-1198, doi: 10.23919/MIPRO.2019.8756878.
[13] M. Benyamini, S. R. Nason, C. A. Chestek and M. Zacksenhouse, "Neural Correlates of error processing during grasping with invasive brain-machine interfaces*," 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 2019, pp. 215-218, doi: 10.1109/NER.2019.8717020.
[14] M. Pasyeka, V. Sheketa, N. Pasieka, S. Chupakhina and I. Dronyuk, "System Analysis of Caching Requests on Network Computing Nodes," 2019 3rd International Conference on Advanced Information and Communications Technologies (AICT), Lviv, Ukraine, 2019, pp. 216-222, doi: 10.1109/AIACT.2019.8847909.
[15] Medykovskyy, M., Pasyeka, M., Pasyeka, N. & Turchyn, O. (2017). Scientific research of life cycle performance of information technology. Paper presented at the Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017, 1, 425-428. doi:10.1109/STC-CSIT.2017.809882
[16] Mishchuk, O., & Tkachenko, R. (2019). One-step prediction of air pollution control parameters using neural-like structure based on geometric data transformations. Paper presented at the 2019 11th International Scientific and Practical Conference on Electronics and Information Technologies, ELIT 2019 - Proceedings, 192-196. doi:10.1109/ELIT.2019.8892333
[17] Nazarkevych, M., Lotoshynska, N., Brytkovskyi, V., Dmytruk, S., Dordiak, V., & Pikh, I. (2019). Biometric identification system with ateb-gabor filtering. Paper presented at the 2019 11th International Scientific and Practical Conference on Electronics and Information Technologies, ELIT 2019 - Proceedings, 15-18. doi:10.1109/ELIT.2019.8892282
[18] Ö. F. Ertuğrul, R. Tekin and Y. Kaya, "Randomized feed-forward artificial neural networks in estimating short-term power load of a small house: A case study," 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, 2017, pp. 1-5, doi: 10.1109/IDAP.2017.8090344.
[19] Pasieka, N., Sheketa, V., Romanyshyn, Y., Pasieka, M., Domska, U., & Struk, A. (2019). Models, methods and algorithms of web system architecture optimization. Paper presented at the 2019 IEEE International Scientific-Practical Conference: Problems of Infocommunications Science and Technology, PIC S and T 2019 - Proceedings, 147-152. doi:10.1109/PICST47496.2019.9061539
[20] Pasyeka, M., Sheketa, V., Pasieka, N., Chupakhina, S., & Dronyuk, I. (2019). System analysis of caching requests on network computing nodes. Paper presented at the 2019 3rd International Conference on Advanced Information and Communications Technologies, AICT 2019 - Proceedings, 216-222. doi:10.1109/AIACT.2019.8847909
[21] Q. Wang and M. Iwaihara, "Deep Neural Architectures for Joint Named Entity Recognition and Disambiguation," 2019 IEEE International Conference on Big Data and Smart Computing (BigComp), Kyoto, Japan, 2019, pp. 1-4, doi: 10.1109/BIGCOMP.2019.8679233.
[22] R. Jozefowicz, W. Zaremba and I. Sutskever, "An empirical exploration of recurrent network architectures", Proc. Int'l Conf. on Machine Learning, pp. 2342-2350, 2015.
[23] R. Wang, Z. Li, J. Cao, T. Chen and L. Wang, "Convolutional Recurrent Neural Networks for Text Classification," 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 2019, pp. 1-6, doi: 10.1109/IJCNN.2019.8852406.
[24] Romanyshyn, Y., Sheketa, V., Pikh, V., Poteriailo, L., Kalambet, Y. & Pasieka, N. (2019). Social-communication web technologies in the higher education as means of knowledge transfer. Paper presented at the IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, 3, 35-38. doi:10.1109/STC-CSIT.2019.8929753
[25] S. Ying and Q. Jianguo, "A Method of Arc Priority Determination Based on Back-Propagation Neural Network," 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, 2017, pp. 38-41, doi: 10.1109/ICISCE.2017.18.
[26] Seungwan Seo, Czangyeob Kim, Haedong Kim, Kyounghyun Mo, Pilsung Kang, "Comparative Study of Deep Learning-Based Sentiment Classification", Access IEEE, vol. 8, pp. 6861-6875, 2020.
[27] T. Dong and T. Huang, "Neural Cryptography Based on Complex-Valued Neural Network," in 2019 IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2019.2955165.
[28] Tao Dong, Qinqin Zhang, "Dynamics of a Hybrid Circuit System With Lossless Transmission Line", Access IEEE, vol. 8, pp. 92969-92976, 2020.
[29] Tianyu Gao, Jin Yang, Wenjun Peng, Luyu Jiang, Yihao Sun, Fangchuan Li, "A Content-Based Method for Sybil Detection in Online Social Networks via Deep Learning", Access IEEE, vol. 8, pp. 38753-38766, 2020.
[30] Ting He, Ying Liu, Chengyi Xu, Xiaolin Zhou, Zhongkang Hu, Jianan Fan, "A Fully Convolutional Neural Network for Wood Defect Location and Identification", Access IEEE, vol. 7, pp. 123453-123462, 2019.
[31] Tkachenko, R., Izonin, I., Kryvinska, N., Dronyuk, I., & Zub, K. (2020). An approach towards increasing prediction accuracy for the recovery of missing iot data based on the grnn-sgtm ensemble. Sensors (Switzerland), 20(9) doi:10.3390/s20092625
[32] Tkachenko, R., Izonin, I., Vitynskyi, P., Lotoshynska, N., & Pavlyuk, O. (2018). Development of the non-iterative supervised learning predictor based on the ito decomposition and sgtm neural-like structure for managing medical insurance costs. Data, 3(4) doi:10.3390/data3040046
[33] V. Sheketa, L. Poteriailo, Y. Romanyshyn, V. Pikh, M. Pasyeka and M. Chesanovskyy, "Case-Based Notations for Technological Problems Solving in the Knowledge-Based Environment," 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT), Lviv, Ukraine, 2019, pp. 10-14, doi: 10.1109/STC-CSIT.2019.8929784.
[34] W. Huang, S. Oh and W. Pedrycz, "Hybrid Fuzzy Wavelet Neural Networks Architecture Based on Polynomial Neural Networks and Fuzzy Set/Relation Inference-Based Wavelet Neurons," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 8, pp. 3452-3462, Aug. 2018, doi: 10.1109/TNNLS.2017.2729589.
[35] Xiangyu Bu, Tao Dong, "Differential Privacy Optimal Consensus for Multiagent System by Using Functional Perturbation", Information Cybernetics and Computational Social Systems (ICCSS) 2019 6th International Conference on, pp. 157-162, 2019.
[36] Y. Huang, L. F. Capretz and D. Ho, "Neural Network Models for Stock Selection Based on Fundamental Analysis," 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE), Edmonton, AB, Canada, 2019, pp. 1-4, doi: 10.1109/CCECE.2019.8861550.
[37] Y. Lin, C. Chou, S. Yang, H. Lai, Y. Lo and Y. Chen, "Neural Decoding Forelimb Trajectory Using Evolutionary Neural Networks with Feedback-Error-Learning Schemes," 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, 2018, pp. 2539-2542, doi: 10.1109/EMBC.2018.8512775.
[38] Yan Cheng, Leibo Yao, Guoxiong Xiang, Guanghe Zhang, Tianwei Tang, Linhui Zhong, "Text Sentiment Orientation Analysis Based on Multi-Channel CNN and Bidirectional GRU With Attention Mechanism", Access IEEE, vol. 8, pp. 134964-134975, 2020. [39] Z. Mohammadi, A. Klug, C. Liu and T. C. Lei, "Data reduction for real-time enhanced growing neural gas spike sorting with multiple recording channels," 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 2019, pp. 1084-1087, doi: 10.1109/NER.2019.8717062. [40] Z. Ying, Z. Xing, C. Jian and S. Hui, "Processor Free Time Forecasting Based on Convolutional Neural Network," 2018 37th Chinese Control Conference (CCC), Wuhan, 2018, pp. 9331-9336, doi: 10.23919/ChiCC.2018.8483132.