Hybrid GMDH Deep Learning Networks: State of the Art and New Prospective Trends

Yuriy Zaychenko (a) and Galib Hamidov (b)

(a) Institute for Applied System Analysis, Igor Sikorsky Kyiv Polytechnic Institute, Peremogy Avenue 37, Kyiv, 03056, Ukraine
(b) Information Technologies Department, Azershig, K. Kazim-zade Str. 20, Baku, AZ1008, Azerbaijan

Information Technology and Implementation (IT&I-2021), December 01-03, 2021, Kyiv, Ukraine
EMAIL: zaychenkoyuri@ukr.net (A. 1); galib.hamidov@gmail.com (A. 2)
ORCID: 0000-0001-9662-3269 (A. 1); 0000-0002-9942-1950 (A. 2)

Abstract
In this paper a new class of deep learning (DL) neural networks is considered and investigated: so-called hybrid DL networks based on the self-organization method GMDH (Group Method of Data Handling). The application of GMDH makes it possible not only to train the neuron weights but also to construct the network structure itself. Different elementary neurons with two inputs may be used as nodes of this structure, so its advantage is a small number of tunable parameters. The following node types are considered in the paper: Wang-Mendel networks with two inputs and neo-fuzzy neurons. The advantage of neo-fuzzy neurons over general fuzzy neurons is that their membership functions need no training, which cuts the computational time of training. GMDH trains the neuron weights sequentially, layer after layer, while the network structure is being constructed, until the stopping criterion holds. This approach eliminates the known drawbacks of DL training algorithms: vanishing or exploding gradients. The process of structure construction and optimization using the GMDH algorithm is presented. Numerous applications of the suggested hybrid DL networks to AI problems such as forecasting share prices and market indices at various stock exchanges are considered and analyzed. A comparison with conventional DL networks is performed, which makes it possible to estimate their efficiency and advantages.

Keywords
Hybrid deep learning networks, self-organization, structure optimization, forecasting

1. Introduction

In recent years deep learning (DL) networks have been widely used in various problems of artificial intelligence: forecasting, pattern recognition, medical diagnostics, etc. [1-4]. Various training algorithms, usually based on the back-propagation method, have been developed for them. With many layers, gradient algorithms typically suffer from vanishing or exploding gradients. An approach was therefore suggested to avoid this drawback by training layer after layer using stacked encoder-decoders or stacked restricted Boltzmann machines [1, 2]. However, the problem of how to choose the number of layers of a DL network remains: existing DL methods cannot generate the structure of a DL network, although training would be more efficient if not only the neuron weights but the network structure were adapted as well. For this goal the application of the GMDH method looks very promising. GMDH is based on the principle of self-organization and constructs the network structure automatically as the algorithm runs [5-7].

In previous years GMDH neural networks with active neurons [5-7], R-neurons [19] and Q-neurons [3] as nodes were developed; in the area integrating fuzzy GMDH and neural networks, GMDH-neuro-fuzzy and GMDH-neo-fuzzy systems [13] were developed; GMDH-wavelet-neuro-fuzzy systems [14, 15] were also elaborated.

A very important property of GMDH is that elementary models with only two inputs, so-called partial descriptions, are used as building blocks for constructing the structure of a DL network. This allows the training time of a hybrid DL network to be cut substantially compared with conventional DL networks. On this basis a GMDH hybrid neuro-fuzzy system was developed in [16] that combines the advantages of traditional GMDH and DL fuzzy networks and may be trained with simple learning procedures. The nodes of this network are Wang-Mendel elementary neural networks with only two inputs. Experimental investigations of this class of hybrid DL networks have shown their efficiency and superiority over conventional DL networks. The drawback of using Wang-Mendel networks as nodes of a hybrid DL network is that it is necessary to train not only the neuron weights but the membership functions as well. Therefore, another class of hybrid networks, GMDH-neo-fuzzy networks, was later developed, in which neo-fuzzy neurons with two inputs are used as nodes [17]. For their training it is necessary to adapt only the neuron weights, which demands fewer computational resources and cuts training time; this is very important for DL networks with a large number of hidden layers. Experimental investigations of hybrid neo-fuzzy networks and their comparison with conventional DL networks have shown their efficiency and lower computational cost of training.
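To make the node model concrete, below is a minimal sketch of a two-input neo-fuzzy neuron (an illustration, not code from the paper; triangular membership functions on a fixed grid of h functions per input and simple gradient training of the weights are our assumptions):

```python
import numpy as np

def tri_mf(x, centers):
    """Triangular membership degrees of a scalar x for equidistant centers."""
    step = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / step)

class NeoFuzzyNeuron:
    """Two-input neo-fuzzy neuron: y = sum_i sum_j w[i, j] * mu_j(x_i).
    Only the weights are trained; the membership functions stay fixed,
    which is exactly the property emphasized in the text."""
    def __init__(self, h=8, lr=0.1):
        self.centers = np.linspace(0.0, 1.0, h)  # h membership functions per input
        self.w = np.zeros((2, h))
        self.lr = lr

    def forward(self, x):
        self.mu = np.vstack([tri_mf(x[0], self.centers),
                             tri_mf(x[1], self.centers)])
        return float(np.sum(self.w * self.mu))

    def train_step(self, x, y_true):
        err = y_true - self.forward(x)      # the output is linear in w,
        self.w += self.lr * err * self.mu   # so one cheap gradient step per sample
        return err
```

Because the output is linear in the weights, the weights can also be found in a single pass by least squares, which is what makes such a node so cheap to train.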
The goal of this paper is to investigate different hybrid GMDH-neo-fuzzy networks with a small number of adjustable parameters and to estimate their efficiency for structure optimization and forecasting.

2. Hybrid network structure optimization based on the GMDH method

The GMDH method was used to synthesize the structure of the hybrid network based on the principle of self-organization. The number of layers is increased successively until the value of the external optimality criterion (MSE) begins to increase for the best model of the current layer. At that point it is necessary to return to the previous layer and find there the best model with the minimum value of the criterion. Then, moving backward, we go through its connections and find the corresponding neurons of the previous layer. This process continues until the first layer is reached, and the corresponding structure is thereby determined automatically. The synthesis of the network structure in the forward direction is shown in Fig. 1, where the outputs that passed through the selection block (SB) are shown in green, while the outputs dropped (excluded) by the SB are shown in red. The process of restoring the desired structure in the backward direction is shown in Fig. 2, where the nodes and connections selected by this process are indicated in yellow.

Figure 1. Hybrid network structure construction using the GMDH method

Figure 2. Process of restoring the found optimal structure in the backward direction

The corresponding optimal structure of the hybrid network constructed for this forecasting problem is shown in Fig. 3. It consists of 3 layers: the first layer has 3 neo-fuzzy neurons, the second layer has two neurons, and the last one neuron.

Figure 3. Optimal structure of the hybrid network for the COVID forecast constructed by GMDH
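The forward pass of this procedure can be summarized in a short sketch (a simplified illustration, not the authors' implementation; node_factory is a hypothetical constructor returning a two-input node with fit/predict methods, e.g. the neo-fuzzy neuron above, and F is the width of the selection block):

```python
import numpy as np
from itertools import combinations

def gmdh_synthesis(X_tr, y_tr, X_val, y_val, node_factory, F=6):
    """Grow layers of two-input partial descriptions until the external
    criterion (MSE on the validation subsample) of the best node rises."""
    layers, best_mse = [], np.inf
    Z_tr, Z_val = X_tr, X_val
    while True:
        candidates = []
        for i, j in combinations(range(Z_tr.shape[1]), 2):
            node = node_factory()
            node.fit(Z_tr[:, [i, j]], y_tr)            # weights: training subsample
            err = node.predict(Z_val[:, [i, j]]) - y_val
            candidates.append((float(np.mean(err ** 2)), i, j, node))
        candidates.sort(key=lambda c: c[0])            # rank by external criterion
        if candidates[0][0] >= best_mse:               # criterion started to grow:
            return layers, best_mse                    # keep the previous layers
        best_mse = candidates[0][0]
        layer = candidates[:F]                         # selection block keeps F best
        layers.append(layer)
        Z_tr = np.column_stack([n.predict(Z_tr[:, [i, j]]) for _, i, j, n in layer])
        Z_val = np.column_stack([n.predict(Z_val[:, [i, j]]) for _, i, j, n in layer])
```

The backward pass then simply starts from the best node of the last stored layer and follows its two input connections down to the first layer, discarding every node that is never reached.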
3. Experimental investigations of hybrid GMDH fuzzy networks in forecasting problems

To estimate the efficiency of hybrid GMDH DL networks, the problems of forecasting share prices and market indices at stock exchanges were considered, and experimental investigations of stock price forecasting were carried out. In the first experiment the RTS index in 2013 with a time step of one week was chosen as the forecast variable. Stock prices of the leading companies were used as external regressors (inputs). The total sample had 55 points, which were used while searching for the optimal partial description in GMDH. MAPE and RMSE were used as the accuracy criteria of the obtained models.

In the first experiment the dependence of MAPE on the number of inputs was explored. The forecasting results for the hybrid neuro-fuzzy network are presented in Table 1 together with the corresponding results for the full cascade neo-fuzzy network (NFN). As the table shows, the hybrid GMDH-neuro-fuzzy network has higher accuracy than the cascade neo-fuzzy network, owing to the properties of hybrid networks.

Table 1. Accuracy for the hybrid GMDH network and the cascade neo-fuzzy network
Number of inputs | MAPE, hybrid GMDH network | MAPE, cascade NFN
2 | 0.04038 | 0.06031
4 | 0.03950 | 0.05141
6 | 0.03998 | 0.04425
8 | 0.04248 | 0.04396
10 | 0.04935 | 0.05171
12 | 0.04084 | 0.04465

In the next experiment the problem of forecasting share prices of Microsoft Corp. was considered. The stock prices of Microsoft Corp. from 01.11.14 to 29.12.14 were used as the input sample. The sample size was 64 points; the training sample included 62 points and the test sample 4 points. The forecasting interval was 4 steps ahead, and the first two steps were checked against the available data. The constructed GMDH-neuro-fuzzy network had 6 fuzzy inputs. The experimental results are presented in Table 2 and Table 3. As they show, the GMDH-neuro-fuzzy network gave better forecasting accuracy than the cascade neuro-fuzzy network; its MAPE value does not exceed 0.4%.

Table 2. Forecasting results for the hybrid GMDH network
Date | Real value | Predicted value | Absolute error | Relative error, %
26.12.14 | 18030.2 | 17971.63 | 58.577 | 0.325
24.12.14 | 18053.7 | 17991.94 | 61.772 | 0.342

Table 3. Forecasting results (MAPE) for different neuro-fuzzy networks and GMDH
Real value | GMDH-neuro-fuzzy system | Cascade neuro-fuzzy network | GMDH
48.14 | 0.623 | 1.20 | 3.40
47.88 | 2.13 | 1.94 | 2.54
average | 1.377 | 1.57 | 2.97

In the next experiment the training times of the GMDH-neuro-fuzzy network and the cascade fuzzy network were compared. Table 4 presents the training time in seconds for the hybrid GMDH-neuro-fuzzy network and the full cascade neuro-fuzzy network. Microsoft stock prices over the same period, 01.11.14 to 29.12.14, were used as the initial sample.

Table 4. Training time for the hybrid GMDH network and the cascade network
Inputs number | GMDH hybrid network, s | Cascade network, s
2 | 0.004 | 0.015
4 | 0.009 | 0.021
6 | 0.013 | 0.037
8 | 0.021 | 0.048
10 | 0.030 | 0.053
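For reference, the two accuracy criteria used throughout these experiments can be computed as follows (standard definitions, not code from the paper):

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# First row of Table 2: |18030.2 - 17971.63| / 18030.2 * 100 ≈ 0.325 %
print(mape([18030.2], [17971.63]))
```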
In the next experiments the efficiency of the hybrid neo-fuzzy network in forecasting the NASDAQ index was explored. The data covered the period from 13.11.17 to 29.11.19; the sample size was 510 points. The closing price of the NASDAQ index on the next day was taken as the output variable. First, the dependence of accuracy on the number of inputs of the hybrid neo-fuzzy network was investigated. Table 5 presents the forecasting results for different numbers of inputs with 8 membership functions per variable (parameter h) and a training/test ratio of 70/30.

Table 5. Forecasting MAPE versus the number of inputs for the hybrid neo-fuzzy network
Inputs number | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
MAPE | 5.2 | 4.7 | 4.33 | 3.91 | 4.22 | 4.72 | 5.24 | 5.53 | 5.85

In the next experiment the dependence of the error on the number of membership functions per variable (parameter h) was investigated, with the number of inputs n = 5 and a training/test ratio of 70/30. The results are presented in Fig. 4. Analyzing them, one may conclude that as the number of MFs grows, MAPE first falls, attains a minimum and then begins to rise, which fully matches the self-organization principle of the GMDH method [3]. The best value was obtained with the following parameter values: number of inputs n = 5, h = 8, number of layers 4; the corresponding MAPE value is 3.91.

Figure 4. MAPE versus the number of membership functions h per variable

To estimate the forecasting efficiency of the hybrid network, it was compared with a cascade neo-fuzzy network [11] and with GMDH on the same data. In the cascade neo-fuzzy network the following parameter values were used: number of inputs n = 9, number of rules 9, number of cascades 3. The comparative forecasting results are presented in Table 6 (training sample 70%). Analyzing these results, one can easily conclude that the suggested hybrid neo-fuzzy network and the hybrid neuro-fuzzy network have the best accuracy, the GMDH method comes second, and the cascade neo-fuzzy network is the worst. The forecasting accuracy of the two hybrid networks differs insignificantly.

Table 6. MAPE values for different forecasting methods
Inputs number / method | Hybrid neuro-fuzzy network | Hybrid GMDH-neo-fuzzy network | GMDH | Cascade neo-fuzzy neural network
4 inputs | 4.30 | 4.31 | 4.19 | 6.04
5 inputs | 3.93 | 3.91 | 4.11 | 6.09
6 inputs | 4.35 | 4.36 | 5.53 | 8.01
7 inputs | 4.80 | 4.77 | 6.26 | 8.68

In the next experiments the training times of the different hybrid networks and of an alternative NN were investigated and compared. Table 7 presents the training time in seconds for the GMDH-neuro-fuzzy network, the GMDH-neo-fuzzy network and the full cascade neuro-fuzzy network. Microsoft stock prices over the period 01.11.14 to 29.12.14 (sample size 64 points) were used as the initial sample. As the results show, the hybrid neo-fuzzy network has the shortest training time, the hybrid neuro-fuzzy network takes second place, and the full cascade network is last.

Table 7. Training time for different fuzzy neural models
Inputs number | GMDH-neuro-fuzzy network, s | GMDH-neo-fuzzy network, s | Full cascade network, s
2 | 0.004 | 0.003 | 0.015
4 | 0.009 | 0.007 | 0.021
6 | 0.013 | 0.012 | 0.037
8 | 0.021 | 0.018 | 0.048
10 | 0.030 | 0.025 | 0.053

4. Optimization of the hybrid GMDH-neo-fuzzy network in the forecasting problem

In the next experiments the hybrid GMDH-neo-fuzzy network was investigated in the problem of Dow Jones index forecasting and compared with the FNN ANFIS. The Dow Jones is the stock index of the 30 largest American companies, founded in 1896. The initial data were taken from Yahoo! Finance, a leading financial information provider. To prepare the initial data, data were downloaded at various intervals, namely the value of the stock index by days, weeks and months. Each of the sets contains the following fields:
• Date - data period;
• Open - opening price;
• High - the highest price for the period;
• Low - the lowest price for the period;
• Close - the price at the end of the period;
• Adj Close - adjusted closing price;
• Volume - sales volume for the period.

The data set for the one-day interval contains 4867 records, of which 4788 are non-zero. The data set for the one-week interval contains 1001 records, of which 1000 are non-zero. The data set for the one-month interval contains 195 records, all of which are non-zero.

Data normalizing. Reduction to a single scale is provided by normalizing each variable to the range of its values. In the simplest case it is the linear transformation
x̄_i = (x_i − x_i,min) / (x_i,max − x_i,min),
which maps x̄_i into the interval [0, 1].
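A minimal sketch of this normalization (column-wise min-max scaling; the function name is ours):

```python
import numpy as np

def min_max_normalize(X):
    """Linear scaling of each column to [0, 1]:
    x_bar = (x - x_min) / (x_max - x_min)."""
    X = np.asarray(X, dtype=float)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

# Example: normalize the Open/High/Low/Close/Volume columns loaded as the matrix X
```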
To find the most informative features for the input vector, the network was trained alternately on data sets containing only the following feature subsets: ('Open', 'High', 'Low', 'Volume', 'Close'); ('Open', 'High', 'Low', 'Volume'); ('Open', 'High', 'Low', 'Close'); ('Open', 'High', 'Low'); ('Open', 'High', 'Close'); ('Open', 'High', 'Volume'); ('Open', 'Close', 'Low'); ('Open', 'Volume', 'Low'); ('High', 'Low', 'Close'); ('Open', 'High'); ('High', 'Close'); ('Low', 'Close'); ('Open', 'Volume').

The main configurable network parameters include the size of the input vector, the number of rules and the function that defines them, and the number of outputs transferred to the next layer. The size of the input vector is determined by the number of informative features transmitted for training and by the number of days on the basis of which the network produces the predicted value. The configurable settings also include the number of membership functions and their form, as well as the freedom-of-choice parameter of the system. To select these parameters it is necessary to conduct an experiment: train the system with the parameters varied over their intervals and keep the values that give the best results on the test sample. The following parameters were investigated (a sketch of the membership function follows this list):
• n - the number of preceding days on which the forecast is based (sliding window size), n ∈ [1; 6];
• h - the number of membership functions in each node, h ∈ [2; 9];
• s - the membership function parameter in exp[−((x − c_i) / (2σ))²], where σ = ((b − a) / h)(s(h − 1)), with b the interval end, a the interval beginning and h the number of membership functions covering the interval; s ∈ [0.01; 1.5];
• f - the number of outputs transferred to the next layer of the network (freedom of choice).
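A small sketch of this Gaussian-type membership function and its width σ (the grouping in the σ expression is our reading of the printed formula and should be treated as an assumption):

```python
import numpy as np

def sigma_width(a, b, h, s):
    """Width parameter as read from the text: sigma = ((b - a) / h) * (s * (h - 1))."""
    return (b - a) / h * (s * (h - 1))

def gaussian_mf(x, c, sigma):
    """Membership degree exp[-((x - c) / (2 * sigma)) ** 2]."""
    return np.exp(-((x - c) / (2.0 * sigma)) ** 2)

# h equidistant centers covering the interval [a, b]
a, b, h, s = 0.0, 1.0, 8, 0.7
centers = np.linspace(a, b, h)
mu = gaussian_mf(0.35, centers, sigma_width(a, b, h, s))  # degrees in all h MFs
```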
The set of initial data was divided into a training sample and a test sample in the ratio of 70% to 30%, respectively. Running the GMDH-neo-fuzzy system for training, values of the MAE and MAPE criteria were obtained for different combinations of these parameters. For the Dow Jones stock index with different forecast intervals, the best parameters for the different sets of informative features obtained as a result of training and testing are shown in Table 8.

Table 8. Optimal parameters of the GMDH-neo-fuzzy system for the Dow Jones index with different prediction intervals
Sets of informative features | 1 month (n, h, f, s, MAE, MAPE) | 1 week (n, h, f, s, MAE, MAPE)
'Open', 'High', 'Low', 'Volume', 'Close' | 1, 2, 2, 1.0, 0.0147, 0.0452 | 2, 4, 2, 0.7, 0.0077, 0.0295
'Open', 'High', 'Low', 'Volume' | 1, 2, 3, 1.3, 0.0156, 0.0476 | 2, 4, 3, 0.9, 0.0086, 0.0332
'Open', 'High', 'Low', 'Close' | 1, 2, 2, 1.0, 0.0147, 0.0453 | 2, 4, 2, 0.7, 0.0077, 0.0295
'Open', 'High', 'Low' | 1, 2, 3, 1.3, 0.0156, 0.0476 | 2, 4, 3, 0.9, 0.0086, 0.0332
'Open', 'High', 'Close' | 1, 2, 3, 1.2, 0.0153, 0.0467 | 2, 4, 3, 0.9, 0.0079, 0.0309
'Open', 'High', 'Volume' | 5, 2, 5, 0.1, 0.0177, 0.0654 | 2, 4, 3, 1.0, 0.0098, 0.0380
'Open', 'Low', 'Close' | 1, 2, 3, 1.2, 0.0147, 0.0456 | 2, 4, 3, 0.7, 0.0081, 0.0308
'Open', 'Volume', 'Low' | 5, 3, 7, 0.1, 0.0171, 0.0644 | 4, 2, 6, 0.1, 0.0095, 0.0348
'High', 'Low', 'Close' | 1, 2, 2, 1.0, 0.0147, 0.0453 | 2, 4, 2, 0.7, 0.0077, 0.0295
'Open', 'High' | 5, 2, 5, 0.1, 0.0177, 0.0654 | 2, 4, 3, 1.0, 0.0098, 0.0380
'Open', 'Close' | 1, 2, 2, 1.3, 0.0165, 0.0498 | 2, 4, 3, 0.6, 0.0085, 0.0331
'High', 'Close' | 1, 2, 2, 1.2, 0.0154, 0.0467 | 2, 4, 3, 0.9, 0.0079, 0.0309
'Low', 'Close' | 1, 2, 2, 1.2, 0.0147, 0.0456 | 2, 4, 2, 0.7, 0.0081, 0.0306
'Open', 'Volume' | 5, 2, 2, 0.8, 0.0189, 0.0689 | 3, 4, 2, 0.1, 0.0112, 0.0445

Thus, analyzing these results, one may conclude that the most informative feature sets for the GMDH-neo-fuzzy system are the following: ['Open', 'High', 'Close'], ['Open', 'Low', 'Close'], ['High', 'Low', 'Close'], ['High', 'Close'], ['Low', 'Close'].

For the Dow Jones stock index with a one-month forecast period, the following optimal configuration of the GMDH-neo-fuzzy network was obtained:
• number of informative features - 3;
• number of periods on which the forecast is based - 1;
• number of membership functions in each node - 2;
• number of layers - 2;
• number of nodes in the first layer - 3;
• number of nodes in the second layer - 1.

For the one-week forecast period, the following optimal configuration of the GMDH-neo-fuzzy system was obtained:
• number of informative features - 3;
• number of periods on which the forecast is based - 2;
• number of membership functions in each node - 4;
• number of layers - 2;
• number of nodes in the first layer - 15;
• number of nodes in the second layer - 1.

The form of the membership functions for the one-week forecasting interval is shown in Figure 5.

Figure 5. Forms of the membership functions of the Dow Jones index for the forecast period of 1 week

For the one-day forecast period, the following optimal configuration of the GMDH-neo-fuzzy network was obtained:
• number of informative features - 3;
• number of periods on which the forecast is based - 5;
• number of membership functions in each node - 2;
• number of layers - 2;
• number of nodes in the first layer - 105;
• number of nodes in the second layer - 1.

Next, experiments were performed to find the optimal parameter values of the FNN ANFIS. The size of its input vector is determined by the number of informative features transmitted for training and by the number of days of prehistory on which forecasting is based. To select these parameters an experiment was performed, including training the network with the parameters varied over their intervals and choosing the values that give the best results on the test sample. The following intervals were set (a grid-search sketch is given after this list):
• n - the number of previous days on which the forecast is based (sliding window size), n ∈ [1; 6];
• h - the number of membership functions in each node, h ∈ [2; 9].
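For both networks the selection procedure amounts to an exhaustive grid search that keeps the parameters with the best test-sample error; a minimal sketch (the callback train_and_eval is a hypothetical stand-in for training either network and returning its test-sample MAE and MAPE):

```python
from itertools import product

def grid_search(train_and_eval, n_values, h_values):
    """Try every (n, h) combination and keep the one with the lowest test MAPE."""
    best_params, best_scores = None, (float("inf"), float("inf"))
    for n, h in product(n_values, h_values):
        mae, mape = train_and_eval(n, h)   # train, then evaluate on the test sample
        if mape < best_scores[1]:
            best_params, best_scores = (n, h), (mae, mape)
    return best_params, best_scores

# e.g. grid_search(train_and_eval, n_values=range(1, 7), h_values=range(2, 10))
```

For the GMDH-neo-fuzzy system the same loop additionally runs over f and over sampled values of s ∈ [0.01, 1.5].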
The set of initial data was divided into a training sample and test data in the proportion of 70% and 30%, respectively. Running the ANFIS network with different combinations of these parameters, values of the MAE and MAPE criteria were obtained. For the Dow Jones stock index with a one-month forecast period, the following optimal ANFIS network configuration was obtained:
• number of informative features - 3;
• number of nodes - 6;
• number of periods on which the forecast is based - 2;
• number of membership functions in each node - 6.

After finding all the optimal parameters of the GMDH-neo-fuzzy system and its training parameters, the system was trained and then supplied with the data for prediction. Training and testing of the system took place on data up to 01.01.2021 for monthly intervals, and up to 01.06.2021 for weekly and daily intervals. Forecasting was based on data after 01.01.2021 for monthly intervals and after 01.06.2021 for daily and weekly intervals. For the Dow Jones index with a forecast period of one month, the following forecasting results were obtained: MAE 0.02952; MAPE 0.0335; forecasting time 0.00025 s. The learning and forecasting results are shown in Figure 6.

Figure 6. Results of training and forecasting of the Dow Jones index with a one-month interval by the hybrid GMDH-neo-fuzzy system

5. Comparison of the forecasting results of the GMDH-neo-fuzzy system and the ANFIS network

Experimental investigations of the accuracy of Dow Jones index forecasting with forecasting intervals of one month, one week and one day were performed using the hybrid GMDH-neo-fuzzy network. For each prediction interval the optimal parameters found in the previous experiments were used. A comparative analysis with the forecasting results obtained by the FNN ANFIS was performed. From the forecasting results, values of MAE, MAPE and training time for each type of neural network were obtained. The optimal ANFIS characteristics are listed in Table 9, and all comparison results are summarized in Tables 10-12.

Table 9. Optimal characteristics of the ANFIS network for the Dow Jones index with different forecast intervals
Sets of informative features | 1 month (n, h, MAE, MAPE) | 1 week (n, h, MAE, MAPE) | 1 day (n, h, MAE, MAPE)
'Open', 'High', 'Low' | 2, 6, 0.222, 0.0710 | 1, 9, 0.0091, 0.0334 | 1, 10, 0.0037, 0.0142
'Open', 'High', 'Close' | 2, 3, 0.0223, 0.0727 | 2, 8, 0.0080, 0.0303 | 1, 11, 0.0034, 0.0129
'Open', 'Low', 'Close' | 2, 6, 0.0192, 0.0680 | 2, 10, 0.0804, 0.0307 | 1, 5, 0.0045, 0.0154
'High', 'Low', 'Close' | 2, 8, 0.0209, 0.0720 | 2, 9, 0.0903, 0.0325 | 2, 10, 0.0036, 0.0134
'High', 'Close' | 2, 9, 0.0223, 0.0750 | 1, 3, 0.0077, 0.0282 | 1, 7, 0.0035, 0.0135
'Low', 'Close' | 2, 7, 0.0201, 0.0691 | 1, 5, 0.0094, 0.0338 | 1, 5, 0.0035, 0.0136

As one can see, for all forecasting intervals the best forecasting results were obtained by the hybrid GMDH-neo-fuzzy system. The worst ANFIS result was obtained for the one-month forecasting period. The largest difference in forecasting accuracy by both criteria was obtained for the one-month forecasting period (over 200%). As the forecasting period decreases, the accuracy gap between the networks also decreases. In addition, both the training time and the direct prediction time were significantly smaller for the hybrid GMDH-neo-fuzzy system. In Tables 10-12 the Difference column gives the relative difference with respect to the GMDH-neo-fuzzy value, so negative values mean the hybrid network is more accurate, and the time rows report the ANFIS-to-GMDH ratio.

Table 10. Comparison of the forecasting results of the GMDH-neo-fuzzy neural network and the FNN ANFIS for the Dow Jones index with forecasting interval 1 month
Criterion | GMDH-neo-fuzzy neural network | FNN ANFIS | Difference
MAE at training sample | 0.016938 | 0.016135 | 4.70%
MAPE at training sample | 0.061866 | 0.052607 | 14.97%
MAE at test sample | 0.02952 | 0.096734 | -227.68%
MAPE at test sample | 0.03350 | 0.107397 | -220.59%
Training time, s | 0.0023246 | 75.258 | 32375x
Forecasting time, s | 0.0003123 | 0.02652 | 84.92x

Table 11. Comparison of the forecasting results of the GMDH-neo-fuzzy neural network and the FNN ANFIS for the Dow Jones index with forecasting interval 1 week
Criterion | GMDH-neo-fuzzy neural network | FNN ANFIS | Difference
MAE at training sample | 0.007949 | 0.008564 | -7.74%
MAPE at training sample | 0.029890 | 0.029291 | 2.00%
MAE at test sample | 0.011476 | 0.019279 | -67.99%
MAPE at test sample | 0.012468 | 0.020923 | -67.82%
Training time, s | 0.012840 | 194.3520 | 14980x
Forecasting time, s | 0.00027132 | 0.028604 | 105.42x

Table 12. Comparison of the forecasting results of the GMDH-neo-fuzzy neural network and the FNN ANFIS for the Dow Jones index with forecasting interval 1 day
Criterion | GMDH-neo-fuzzy neural network | FNN ANFIS | Difference
MAE at training sample | 0.003618 | 0.004234 | -17.03%
MAPE at training sample | 0.013981 | 0.014067 | -0.615%
MAE at test sample | 0.005348 | 0.005822 | -8.86%
MAPE at test sample | 0.005812 | 0.005822 | -0.172%
Training time, s | 0.19944 | 876.3658 | 4394.13x
Forecasting time, s | 0.00040317 | 0.038055 | 94.39x
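As a quick check of how these comparison columns are computed, the convention can be reproduced in a couple of lines (our reading of the tables, not code from the paper):

```python
def difference_pct(gmdh_value, anfis_value):
    """Relative difference with respect to the GMDH-neo-fuzzy value, in percent;
    negative values mean the hybrid network was more accurate."""
    return (gmdh_value - anfis_value) / gmdh_value * 100.0

def time_ratio(gmdh_time, anfis_time):
    """Ratio reported as 'Nx' in the time rows."""
    return anfis_time / gmdh_time

# Table 10: difference_pct(0.061866, 0.052607) ≈ 14.97 (MAPE at training sample)
#           time_ratio(0.0023246, 75.258)     ≈ 32375 (training time)
```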
6. Conclusion

In this paper hybrid GMDH-neuro-fuzzy and GMDH-neo-fuzzy networks were considered and investigated. The algorithm of hybrid network structure synthesis was presented and demonstrated on a forecasting problem. Experimental investigations of the hybrid networks were carried out, with comparison against conventional DL networks. The experiments have shown that the forecasting accuracies of the hybrid neuro-fuzzy and neo-fuzzy networks on the considered problems are approximately equal, and both are better than the alternative DL cascade neo-fuzzy networks and GMDH.

The problem of forecasting the Dow Jones index with hybrid neo-fuzzy networks was considered, investigated and compared with the FNN ANFIS for different forecasting intervals. The optimal parameters of the hybrid neo-fuzzy networks were found. The experimental results have shown that the forecasting accuracy of hybrid neo-fuzzy networks is much better than that of the FNN ANFIS, and their training time is the shortest among all the considered alternative DL networks.

On the whole, the hybrid DL networks based on GMDH are free from the drawbacks of conventional DL networks, namely vanishing or exploding gradients. Besides, they construct the optimal network structure automatically in the course of the GMDH algorithm run, and they demand lower computational costs for training due to the small number of tunable parameters in every hidden node (each node has only two inputs) compared with DL networks of general structure. This is especially significant for DL networks with a large number of layers.

7. References

[1] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[2] G. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation 18(7) (2006): 1527-1554.
[3] Y. Bengio, Y. LeCun, G. Hinton, Deep learning, Nature 521 (2015): 436-444.
[4] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Networks 61 (2015): 85-117.
[5] A.G. Ivakhnenko, G.A. Ivakhnenko, J.A. Mueller, Self-organization of the neural networks with active neurons, Pattern Recognition and Image Analysis 4(2) (1994): 177-188.
[6] A.G. Ivakhnenko, D. Wuensch, G.A. Ivakhnenko, Inductive sorting-out GMDH algorithms with polynomial complexity for active neurons of neural networks, Neural Networks 2 (1999): 1169-1173.
[7] G.A. Ivakhnenko, Self-organization of neuronet with active neurons for effects of nuclear test explosions forecasting, System Analysis Modeling Simulation 20 (1995): 107-116.
[8] M. Zgurovsky, Yu. Zaychenko, Fundamentals of Computational Intelligence: System Approach, Springer, 2016.
[9] L.-X. Wang, J.M. Mendel, Fuzzy basis functions, universal approximation, and orthogonal least-squares learning, IEEE Trans. on Neural Networks 3(5) (1992): 807-814.
[10] J.-S. Jang, ANFIS: Adaptive-network-based fuzzy inference systems, IEEE Trans. on Systems, Man, and Cybernetics 23 (1993): 665-685.
[11] T. Yamakawa, E. Uchino, T. Miki, H. Kusanagi, A neo-fuzzy neuron and its applications to system identification and prediction of the system behavior, in: Proc. 2nd Intern. Conf. on Fuzzy Logic and Neural Networks "IIZUKA-92", Iizuka, 1992, pp. 477-483.
[12] Ye. Bodyanskiy, N. Teslenko, P. Grimm, Hybrid evolving neural network using kernel activation functions, in: Proc. 17th Zittau East-West Fuzzy Colloquium, Zittau/Goerlitz, HS, 2010, pp. 39-46.
[13] Ye. Bodyanskiy, Yu. Zaychenko, E. Pavlikovskaya, M. Samarina, Ye. Viktorov, The neo-fuzzy neural network structure optimization using the GMDH for the solving forecasting and classification problems, in: Proc. Int. Workshop on Inductive Modeling, Krynica, Poland, 2009, pp. 77-89.
[14] Ye. Bodyanskiy, O. Vynokurova, A. Dolotov, O. Kharchenko, Wavelet-neuro-fuzzy network structure optimization using GMDH for the solving forecasting tasks, in: Proc. 4th Int. Conf. on Inductive Modelling ICIM 2013, Kyiv, 2013, pp. 61-67.
[15] Ye. Bodyanskiy, O. Vynokurova, N. Teslenko, Cascade GMDH-wavelet-neuro-fuzzy network, in: Proc. 4th Int. Workshop on Inductive Modeling IWIM 2011, Kyiv, Ukraine, 2011, pp. 22-30.
[16] Ye. Bodyanskiy, O. Boiko, Yu. Zaychenko, G. Hamidov, Evolving hybrid GMDH-neuro-fuzzy network and its applications, in: Proc. of the International Conference SAIC 2018, Kyiv, Ukraine, 2018.
[17] Ye. Bodyanskiy, Yu. Zaychenko, O. Boiko, G. Hamidov, A. Zelikman, The hybrid GMDH-neo-fuzzy neural network in forecasting problems in financial sphere, in: Proc. of the International Conference IEEE SAIC 2020, Kyiv, Ukraine, 2020.
[18] D.T. Pham, X. Liu, Neural Networks for Identification, Prediction and Control, Springer-Verlag, London, 1995.
[19] T. Ohtani, Automatic variable selection in RBF network and its application to neuro-fuzzy GMDH, in: Proc. 4th Int. Conf. on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, 2000, vol. 2, pp. 840-843.