The Neo-Fuzzy Autoencoder for Adaptive Deep Neural Systems and its Learning

Yevgeniy Bodyanskiy, Iryna Pliss, Olena Vynokurova
Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, Ukraine
email: yevgeniy.bodyanskiy@nure.ua, iryna.pliss@nure.ua, vynokurova@gmail.com

Abstract: In this paper an autoencoder based on the Kolmogorov's neuro-fuzzy network [10-14] and built from generalized neo-fuzzy neurons is proposed, together with a fast algorithm for its learning. Such a system can be used as part of deep learning neural networks. The proposed neo-fuzzy autoencoder is characterized by high learning speed and a smaller number of tuned parameters in comparison with well-known approaches. The efficiency of the proposed approach has been examined on benchmark and real data sets.

Keywords: neo-fuzzy autoencoder, deep learning network, neo-fuzzy neuron, fast learning algorithm, data compression.

I. INTRODUCTION

Compressing information that has to be processed further is one of the main problems solved in Data Mining. Many approaches [1-5] have been proposed for this task, but most of them process the information in batch mode, where a fixed-size data set is passed over many times. It is very important that the compression process loses as little information as possible.

Nowadays approaches based on deep neural networks (DNNs) [6-9] are widely used for many tasks connected with the analysis of Big Data. As many studies show, DNNs provide significantly better results than conventional shallow neural networks. An inherent part of a DNN is the so-called autoencoder, which compresses the input data and forms the input layers of the neural network. Multilayer autoassociative "bottle-neck" perceptrons or restricted Boltzmann machines, whose nodes are elementary Rosenblatt perceptrons with sigmoidal activation functions, are often used as such autoencoders. Unfortunately, the learning process of these autoencoders is very time-consuming and cannot be implemented in online mode.

In connection with the intensive development of Data Mining, Data Stream Mining and Web Mining in recent years, the development of high-speed information compression systems has become an important problem. Such systems have to process data in sequential (possibly online) mode, as real information processing systems require.

II. THE ARCHITECTURE OF THE NEO-FUZZY AUTOENCODER

Fig. 1 shows the architecture of the proposed autoencoder, which is an autoassociative "bottle-neck" modification of the Kolmogorov's neuro-fuzzy network [10-14]; that network implements the multiresolution approach and is a universal approximator according to the Kolmogorov-Arnold and Yam-Nguyen-Kreinovich theorems.

It should be noted that in [15, 16] an architecture whose nodes are neo-fuzzy neurons (NFNs) [17] is considered. Despite the simplicity of its learning algorithm for the synaptic weights, such a system is redundant in terms of the number of membership functions. Using generalized neo-fuzzy neurons [18] instead of conventional NFNs makes it possible to reduce the number of membership functions significantly and to introduce a stacked NN [19]. Such a stacked NN simplifies the architecture of the autoencoder and thereby speeds up the learning process. The autoencoder therefore consists of two sequentially connected layers implemented with the generalized neo-fuzzy neurons GNFN[1] and GNFN[2].

The sequence of input signals to be compressed, $x(k) = (x_1(k), x_2(k), \ldots, x_n(k))^T \in R^n$ (where $k = 1, 2, \ldots$ is the number of the observation or the current instant of time), is fed to GNFN[1]. GNFN[1] consists of $n$ multidimensional nonlinear synapses $MNS_i^{[1]}$, $i = 1, 2, \ldots, n$, each of which has one input, $m$ outputs, $h$ membership functions $\mu_{li}^{[1]}(x_i(k))$, $l = 1, 2, \ldots, h$, and $mh$ tuned synaptic weights $w_{jli}^{[1]}$, $j = 1, 2, \ldots, m$.

The output of GNFN[1] is the compressed vector of signals $y(k) = (y_1(k), \ldots, y_j(k), \ldots, y_m(k))^T \in R^m$, $m < n$, which is at the same time the compressed output of the autoencoder. The signal $y(k)$ is fed to the inputs of GNFN[2], which contains $m$ inputs and $m$ multidimensional nonlinear synapses $MNS_j^{[2]}$, each of which has one input, $n$ outputs, $h$ membership functions $\mu_{lj}^{[2]}(y_j(k))$, $l = 1, 2, \ldots, h$, and $nh$ synaptic weights $w_{ilj}^{[2]}$. Thus, the considered autoencoder contains $2nmh$ tuned synaptic weights and $(n + m)h$ membership functions, which is significantly fewer than in the architecture of [20]. At the outputs of GNFN[2] the recovered signal $\hat{x}(k) = (\hat{x}_1(k), \ldots, \hat{x}_i(k), \ldots, \hat{x}_n(k))^T$ is formed. In this manner the autoencoder is an autoassociative hybrid neo-fuzzy system of computational intelligence.

Fig. 1. The architecture of the proposed neo-fuzzy autoencoder.

The proposed system implements a nonlinear mapping of the form

$$\hat{x}_i(k) = \sum_{j=1}^{m} \sum_{l=1}^{h} w_{ilj}^{[2]} \mu_{lj}^{[2]}\left( \sum_{i=1}^{n} \sum_{l=1}^{h} w_{jli}^{[1]} \mu_{li}^{[1]}(x_i(k)) \right),$$

or, in matrix form,

$$\hat{x}(k) = W^{[2]} \mu^{[2]}(W^{[1]} \mu^{[1]}(x(k))),$$

where

$$W^{[1]} = \begin{pmatrix} w_{111}^{[1]} & w_{121}^{[1]} & \cdots & w_{1hn}^{[1]} \\ w_{211}^{[1]} & w_{221}^{[1]} & \cdots & w_{2hn}^{[1]} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m11}^{[1]} & w_{m21}^{[1]} & \cdots & w_{mhn}^{[1]} \end{pmatrix}, \quad W^{[2]} = \begin{pmatrix} w_{111}^{[2]} & w_{121}^{[2]} & \cdots & w_{1hm}^{[2]} \\ w_{211}^{[2]} & w_{221}^{[2]} & \cdots & w_{2hm}^{[2]} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n11}^{[2]} & w_{n21}^{[2]} & \cdots & w_{nhm}^{[2]} \end{pmatrix},$$

$$\mu^{[1]}(x(k)) = \left( \mu_{11}^{[1]}(x_1(k)), \mu_{21}^{[1]}(x_1(k)), \ldots, \mu_{h1}^{[1]}(x_1(k)), \mu_{12}^{[1]}(x_2(k)), \ldots, \mu_{li}^{[1]}(x_i(k)), \ldots, \mu_{hn}^{[1]}(x_n(k)) \right)^T,$$

$$\mu^{[2]}(y(k)) = \left( \mu_{11}^{[2]}(y_1(k)), \mu_{21}^{[2]}(y_1(k)), \ldots, \mu_{h1}^{[2]}(y_1(k)), \mu_{12}^{[2]}(y_2(k)), \ldots, \mu_{lj}^{[2]}(y_j(k)), \ldots, \mu_{hm}^{[2]}(y_m(k)) \right)^T.$$
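To make the mapping above concrete, the following is a minimal sketch of the forward pass in Python. It assumes triangular membership functions placed on a uniform grid over [0, 1] and small random weight initialization; the paper specifies triangular membership functions but not this particular grid or initialization, and the names tri_memberships and gnfn_forward are illustrative.

```python
import numpy as np

def tri_memberships(z, h):
    """Values of h triangular membership functions at the scalar input z.
    Centers are assumed uniformly spaced on [0, 1] (adjacent functions
    overlap so that their values sum to 1 inside the grid)."""
    centers = np.linspace(0.0, 1.0, h)
    step = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(z - centers) / step)

def gnfn_forward(z, W):
    """Forward pass of one generalized neo-fuzzy neuron layer.
    z: input vector; W: (outputs x h*len(z)) synaptic weight matrix,
    columns ordered as (input index, membership index)."""
    h = W.shape[1] // z.size
    mu = np.concatenate([tri_memberships(zi, h) for zi in z])
    return W @ mu, mu

# Encoder-decoder pass: x_hat(k) = W2 mu2(W1 mu1(x(k)))
n, m, h = 4, 2, 5                 # inputs, bottleneck size, MFs per synapse
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(m, h * n))
W2 = rng.normal(scale=0.1, size=(n, h * m))

x = rng.random(n)                 # one observation x(k), scaled to [0, 1]
y, mu1 = gnfn_forward(x, W1)      # compressed vector y(k)
x_hat, mu2 = gnfn_forward(y, W2)  # recovered signal x_hat(k); assumes the
                                  # second-layer grid covers the range of y
```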
III. THE LEARNING ALGORITHM FOR THE SYNAPTIC WEIGHTS OF THE NEO-FUZZY AUTOENCODER

For tuning the synaptic weights of GNFN[2] we can use the gradient procedure minimizing the quadratic criterion $e_i^2(k)$, where $e_i(k) = x_i(k) - \hat{x}_i(k)$ is the recovery error:

$$w_{ilj}^{[2]}(k) = w_{ilj}^{[2]}(k-1) - \eta^{[2]}(k) \frac{\partial e_i^2(k)}{\partial w_{ilj}^{[2]}} = w_{ilj}^{[2]}(k-1) + \eta^{[2]}(k)\, e_i(k)\, \mu_{lj}^{[2]}(y_j(k)),$$

where $\eta^{[2]}(k)$ is the learning rate parameter of the output layer, chosen according to the condition from [20, 21]:

$$\eta^{[2]}(k) = (r^{[2]}(k))^{-1}, \quad r^{[2]}(k) = \alpha\, r^{[2]}(k-1) + \left\| \mu^{[2]}(y(k)) \right\|^2,$$

where $0 \le \alpha \le 1$ is a forgetting factor.

For tuning the synaptic weights of GNFN[1], an optimized error backpropagation procedure is used, which for triangular membership functions uniformly distributed along the abscissa axis with centers $x_{li}^{[1]}$, $y_{lj}^{[2]}$ can be written in the form

$$w_{jli}^{[1]}(k) = w_{jli}^{[1]}(k-1) + \eta^{[1]}(k)\, e_i(k)\, \tilde{w}_{ij}^{[2]}(k)\, \mu_{li}^{[1]}(x_i(k)),$$

where

$$\eta^{[1]}(k) = (r^{[1]}(k))^{-1}, \quad r^{[1]}(k) = \alpha\, r^{[1]}(k-1) + \left\| \mu^{[1]}(x(k)) \right\|^2,$$

$$\tilde{w}_{ij}^{[2]}(k) = \sum_{l=1}^{h} w_{ilj}^{[2]}(k) \begin{cases} \left( y_{l,j}^{[2]} - y_{l-1,j}^{[2]} \right)^{-1}, & \text{if } y_j(k) \in [y_{l-1,j}^{[2]}, y_{l,j}^{[2]}], \\ \left( y_{l,j}^{[2]} - y_{l+1,j}^{[2]} \right)^{-1}, & \text{if } y_j(k) \in [y_{l,j}^{[2]}, y_{l+1,j}^{[2]}], \\ 0 & \text{otherwise}. \end{cases}$$

The proposed learning algorithm for the synaptic weights of the autoencoder is characterized by high speed and combines tracking and filtering properties.
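Under the same assumptions as in the previous sketch (uniform triangular grids, so each slope of an active membership function contributes plus or minus 1/step), one online learning step might be sketched as follows. It reuses tri_memberships and gnfn_forward from above; the hidden-layer update accumulates the error over all outputs, which corresponds to the gradient of the total squared recovery error, and the values alpha = 0.95 and r1 = r2 = 1 at the start are assumptions, not values from the paper.

```python
import numpy as np

def learning_step(x, W1, W2, r1, r2, h, alpha=0.95):
    """One online update of both GNFN layers for observation x(k).
    Sketch only: assumes triangular MFs on a uniform grid over [0, 1]."""
    y, mu1 = gnfn_forward(x, W1)       # encoder output y(k)
    x_hat, mu2 = gnfn_forward(y, W2)   # recovered signal x_hat(k)
    e = x - x_hat                      # recovery error e(k)

    # Output layer: w[2] update with adaptive rate eta2 = 1/r2
    r2 = alpha * r2 + mu2 @ mu2
    W2 += (1.0 / r2) * np.outer(e, mu2)

    # Derivative of each second-layer triangular MF with respect to y_j:
    # +1/step on the rising slope, -1/step on the falling one, 0 outside
    step = 1.0 / (h - 1)
    centers = np.linspace(0.0, 1.0, h)
    dmu2 = np.concatenate([
        np.where(np.abs(yj - centers) < step,
                 -np.sign(yj - centers) / step, 0.0)
        for yj in y
    ])
    # w_tilde[i, j] = sum over l of W2[i, (j, l)] * dmu2[(j, l)]
    w_tilde = (W2 * dmu2).reshape(W2.shape[0], y.size, h).sum(axis=2)

    # Hidden layer: backpropagated update with adaptive rate eta1 = 1/r1
    r1 = alpha * r1 + mu1 @ mu1
    grad = e @ w_tilde                 # error routed through w_tilde
    W1 += (1.0 / r1) * np.outer(grad, mu1)
    return W1, W2, r1, r2
```

Because the learning rates 1/r1 and 1/r2 shrink as the accumulated squared membership activations grow and recover under the forgetting factor, the step behaves like the adaptive rule of [20, 21]: fast on fresh data, smoothing on stationary data.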
IV. EXPERIMENTS

To verify the effectiveness of the proposed neo-fuzzy autoencoder, data sets were taken from the UCI Repository [22]: Iris, Wine, and Hayes-Roth. The "Iris" data set contains 150 observations (4 attributes) in 3 classes, "Wine" contains 178 observations (13 attributes) in 3 classes, and "Hayes-Roth" contains 160 observations (5 attributes) in 3 classes.

As can be seen from Fig. 2, the data compressed by the neo-fuzzy autoencoder form more compact clusters than the data compressed by the autoassociative multilayer "Bottle Neck" neural network.

Fig. 2. The Hayes-Roth data set after compression by the autoassociative multilayer "Bottle Neck" neural network (a) and by the neo-fuzzy autoencoder (b).

The results obtained with the proposed neo-fuzzy autoencoder were compared with those of the autoassociative multilayer "Bottle Neck" neural network (Table I). The dimension of the compressed data was 2 components. The simulation was performed 20 times with different initial conditions, and the results were averaged.

TABLE I. RESULTS OF SIMULATION

Autoencoder                                           Data set     MSE
Neo-fuzzy autoencoder                                 Iris         0.199
                                                      Wine         0.499
                                                      Hayes-Roth   0.312
Autoassociative three-layer "Bottle Neck" network     Iris         0.486
                                                      Wine         0.903
                                                      Hayes-Roth   0.593
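A hedged sketch of the evaluation protocol described above (20 runs with different random initializations, a 2-component bottleneck, averaged reconstruction MSE): it reuses gnfn_forward and learning_step from the previous sketches, and the epoch count, the number of membership functions, and the normalization are assumptions, since the paper does not state them.

```python
import numpy as np

def evaluate(X, n_runs=20, h=5, epochs=50):
    """Average reconstruction MSE over n_runs runs with different random
    initializations, using a 2-component bottleneck as in Table I."""
    n, m = X.shape[1], 2
    mses = []
    for run in range(n_runs):
        rng = np.random.default_rng(run)
        W1 = rng.normal(scale=0.1, size=(m, h * n))
        W2 = rng.normal(scale=0.1, size=(n, h * m))
        r1 = r2 = 1.0
        for _ in range(epochs):        # sequential passes over the data
            for x in X:
                W1, W2, r1, r2 = learning_step(x, W1, W2, r1, r2, h)
        x_hats = np.array([gnfn_forward(gnfn_forward(x, W1)[0], W2)[0]
                           for x in X])
        mses.append(np.mean((X - x_hats) ** 2))
    return float(np.mean(mses))

# Usage (features scaled to [0, 1] so the membership grids cover the data):
# X = (X_raw - X_raw.min(0)) / X_raw.ptp(0)
# print(evaluate(X))
```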
V. CONCLUSIONS

The architecture of a "bottle-neck" two-layer autoencoder and its learning algorithm have been proposed. The system is based on generalized neo-fuzzy neurons and is an autoassociative "bottle-neck" modification of the Kolmogorov's neuro-fuzzy network. The proposed hybrid neo-fuzzy system of computational intelligence provides high-quality compression of information that is fed sequentially for processing. It is characterized by computational simplicity and a high speed of the learning process.

REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Amsterdam: Morgan Kaufmann Publ., 2006.
[2] C.C. Aggarwal, Data Mining. N.Y.: Springer, 2015.
[3] A. Bifet, R. Gavaldà, G. Holmes, and B. Pfahringer, Machine Learning for Data Streams with Practical Examples in MOA. The MIT Press, 2018.
[4] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. Amsterdam: IOS Press, 2010.
[5] A. Menshawy, Deep Learning By Example: A hands-on guide to implementing advanced machine learning algorithms and neural networks. Packt Publishing Limited, 2018.
[6] M. Fullan, J. Quinn, and J. McEachen, Deep Learning: Engage the World Change the World. Corwin, 2017.
[7] A.L. Caterini and D.E. Chang, Deep Neural Networks in a Mathematical Framework. Springer, 2018.
[8] Y. LeCun, Y. Bengio, and G.E. Hinton, "Deep Learning," Nature, vol. 521, pp. 436-444, 2015.
[9] D. Graupe, Deep Learning Neural Networks: Design and Case Studies. World Scientific Publishing Company, 2016.
[10] V. Kolodyazhniy and Ye. Bodyanskiy, "Fuzzy Kolmogorov's network," in Lecture Notes in Computer Science, vol. 3214, M.G. Negoita et al., Eds., Springer-Verlag, 2004, pp. 764-771.
[11] Ye. Bodyanskiy, V. Kolodyazhniy, and P. Otto, "Neuro-fuzzy Kolmogorov's network for time-series prediction and pattern classification," in Lecture Notes in Artificial Intelligence, vol. 3698, U. Furbach, Ed., Heidelberg: Springer-Verlag, 2005, pp. 191-202.
[12] V. Kolodyazhniy, Ye. Bodyanskiy, and P. Otto, "Universal approximator employing neo-fuzzy neurons," in Computational Intelligence Theory and Applications, B. Reusch, Ed., Berlin-Heidelberg: Springer, 2005, pp. 631-640.
[13] V. Kolodyazhniy, Ye. Bodyanskiy, V. Poyedyntseva, and A. Stephan, "Neuro-fuzzy Kolmogorov's network with a modified perceptron learning rule for classification problems," in Advances in Soft Computing, vol. 38, B. Reusch, Ed., Berlin-Heidelberg: Springer-Verlag, 2006, pp. 41-49.
[14] Ye. Bodyanskiy, Ye. Gorshkov, V. Kolodyazhniy, and V. Poyedyntseva, "Neuro-fuzzy Kolmogorov's network," in Lecture Notes in Computer Science, vol. 3697, W. Duch, J. Kacprzyk, E. Oja, and S. Zadrozny, Eds., Berlin-Heidelberg: Springer-Verlag, 2005, pp. 1-6.
[15] V. Kolodyazhniy, F. Klawonn, and K. Tschumitschew, "A neuro-fuzzy model for dimensionality reduction and its application," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 15, no. 5, pp. 571-593, October 2007.
[16] O. Vynokurova, Ye. Bodyanskiy, I. Pliss, D. Peleshko, and Yu. Rashkevych, "Neo-fuzzy encoder and its adaptive learning for Big Data processing," Scientific Journal of RTU, Series "Computer Science", Volume "Information Technology and Management Science", vol. 20, pp. 6-11, 2017.
[17] T. Yamakawa, E. Uchino, T. Miki, and H. Kusanagi, "A neo-fuzzy neuron and its applications to system identification and prediction of the system behavior," in Proc. 2nd Int. Conf. on Fuzzy Logic and Neural Networks "IIZUKA-92", Iizuka, Japan, 1992, pp. 477-483.
[18] R.P. Landim, B. Rodrigues, S.R. Silva, and W.M. Caminhas, "A neo-fuzzy-neuron with real time training applied to flux observer for an induction motor," in Proc. IEEE Vth Brazilian Symposium on Neural Networks, Belo Horizonte, 9-11 Dec. 1998, pp. 67-72.
[19] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, Jan. 2015. doi: 10.1016/j.neunet.2014.09.003.
[20] Ye. Bodyanskiy, I. Kokshenev, and V. Kolodyazhniy, "An adaptive learning algorithm for a neo-fuzzy neuron," in Proc. 3rd Int. Conf. of European Union Society for Fuzzy Logic and Technology (EUSFLAT 2003), Zittau, 2003, pp. 375-379.
[21] P. Otto, Ye. Bodyanskiy, and V. Kolodyazhniy, "A new learning algorithm for a forecasting neuro-fuzzy network," Integrated Computer-Aided Engineering, vol. 10, pp. 399-409, Dec. 2003.
[22] UCI Repository of Machine Learning Databases. CA: University of California, Department of Information and Computer Science. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html