The Neo-Fuzzy Autoencoder for Adaptive Deep Neural Systems and its Learning

Yevgeniy Bodyanskiy, Iryna Pliss, Olena Vynokurova
Control Systems Research Laboratory, Kharkiv National University of Radio Electronics, 14 Nauky Ave., Kharkiv, Ukraine
email: yevgeniy.bodyanskiy@nure.ua, iryna.pliss@nure.ua, vynokurova@gmail.com

Abstract: In this paper an autoencoder based on the Kolmogorov's neuro-fuzzy network [10-14] and built from generalized neo-fuzzy neurons is proposed, together with a fast algorithm for its learning. Such a system can be used as part of deep learning neural networks. The proposed neo-fuzzy autoencoder is characterized by high learning speed and a smaller number of tuned parameters in comparison with well-known approaches. The efficiency of the proposed approach has been examined on benchmark and real data sets.

Keywords: neo-fuzzy autoencoder, deep learning network, neo-fuzzy neuron, fast learning algorithm, data compression.

I. INTRODUCTION

Compressing information that has to be processed further is one of the main problems solved in Data Mining. Many approaches [1-5] have been proposed for this task, but most of them process the information in batch mode, where a fixed-size data set is passed over many times. It is very important that the compression process loses as little information as possible.

Nowadays approaches based on deep neural networks (DNNs) [6-9] are widely used for many tasks connected with the analysis of Big Data. As many studies show, DNNs provide significantly better results than conventional shallow neural networks. An inherent part of a DNN is the so-called autoencoder, which compresses the input data and forms the input layers of the neural network. Multilayer autoassociative "bottle-neck" perceptrons or restricted Boltzmann machines, whose nodes are elementary Rosenblatt perceptrons with sigmoidal activation functions, are often used as such autoencoders. Unfortunately, the learning process of these autoencoders is very time-consuming and cannot be implemented in online mode.

In connection with the intensive development of Data Mining, Data Stream Mining and Web Mining in recent years, the development of high-speed information compression systems has become an important problem. Such systems have to process data in sequential (possibly online) mode, as real information processing systems require.

II. THE ARCHITECTURE OF THE NEO-FUZZY AUTOENCODER

Fig. 1 shows the architecture of the proposed autoencoder, which is an autoassociative "bottle-neck" modification of the Kolmogorov's neuro-fuzzy network [10-14]; that network implements the multiresolution approach and is a universal approximator according to the Kolmogorov-Arnold and Yam-Nguyen-Kreinovich theorems.

It should be noted that in [15, 16] an architecture whose nodes are neo-fuzzy neurons (NFNs) [17] is considered. Despite the simplicity of its learning algorithm for the synaptic weights, such a system is redundant in terms of the number of membership functions. Using generalized neo-fuzzy neurons [18] instead of conventional NFNs makes it possible to reduce the number of membership functions significantly and to introduce a stacked NN [19]. Such a stacked NN simplifies the architecture of the autoencoder and thereby speeds up the learning process. The autoencoder therefore consists of two sequentially connected layers implemented with the generalized neo-fuzzy neurons GNFN[1] and GNFN[2].

The sequence of input signals to be compressed, $x(k) = (x_1(k), x_2(k), \ldots, x_n(k))^T \in R^n$ (where $k = 1, 2, \ldots$ is the number of the observation or the current instant of time), is fed to GNFN[1]. GNFN[1] consists of $n$ multidimensional nonlinear synapses $MNS_i^{[1]}$, $i = 1, 2, \ldots, n$, each of which has one input, $m$ outputs, $h$ membership functions $\mu_{li}^{[1]}(x_i(k))$, $l = 1, 2, \ldots, h$, and $mh$ tuned synaptic weights $w_{jli}^{[1]}$, $j = 1, 2, \ldots, m$.

The output of GNFN[1] is the compressed vector of signals $y(k) = (y_1(k), \ldots, y_j(k), \ldots, y_m(k))^T \in R^m$, $m < n$, which is at the same time the compressed output of the autoencoder. The signal $y(k)$ is fed to the inputs of GNFN[2], which contains $m$ inputs and $m$ multidimensional nonlinear synapses $MNS_j^{[2]}$, each of which has one input, $n$ outputs, $h$ membership functions $\mu_{lj}^{[2]}(y_j(k))$, $l = 1, 2, \ldots, h$, and $nh$ synaptic weights $w_{ilj}^{[2]}$. Thus, the considered autoencoder contains $2nmh$ tuned synaptic weights and $(n + m)h$ membership functions, which is significantly fewer than in the architecture of [20]. At the outputs of GNFN[2] the recovered signal $\hat{x}(k) = (\hat{x}_1(k), \ldots, \hat{x}_i(k), \ldots, \hat{x}_n(k))^T$ is formed. In this manner the autoencoder is an autoassociative hybrid neo-fuzzy system of computational intelligence.

Fig. 1. The architecture of the proposed neo-fuzzy autoencoder.

The proposed system implements a nonlinear mapping of the form

$$\hat{x}_i(k) = \sum_{j=1}^{m} \sum_{l=1}^{h} w_{ilj}^{[2]} \mu_{lj}^{[2]}\left( \sum_{i=1}^{n} \sum_{l=1}^{h} w_{jli}^{[1]} \mu_{li}^{[1]}(x_i(k)) \right),$$

or, in matrix form,

$$\hat{x}(k) = W^{[2]} \mu^{[2]}(W^{[1]} \mu^{[1]}(x(k))),$$

where

$$W^{[1]} = \begin{pmatrix} w_{111}^{[1]} & w_{121}^{[1]} & \cdots & w_{1hn}^{[1]} \\ w_{211}^{[1]} & w_{221}^{[1]} & \cdots & w_{2hn}^{[1]} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m11}^{[1]} & w_{m21}^{[1]} & \cdots & w_{mhn}^{[1]} \end{pmatrix}, \quad W^{[2]} = \begin{pmatrix} w_{111}^{[2]} & w_{121}^{[2]} & \cdots & w_{1hm}^{[2]} \\ w_{211}^{[2]} & w_{221}^{[2]} & \cdots & w_{2hm}^{[2]} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n11}^{[2]} & w_{n21}^{[2]} & \cdots & w_{nhm}^{[2]} \end{pmatrix},$$

$$\mu^{[1]}(x(k)) = \left( \mu_{11}^{[1]}(x_1(k)), \mu_{21}^{[1]}(x_1(k)), \ldots, \mu_{h1}^{[1]}(x_1(k)), \mu_{12}^{[1]}(x_2(k)), \ldots, \mu_{li}^{[1]}(x_i(k)), \ldots, \mu_{hn}^{[1]}(x_n(k)) \right)^T,$$

$$\mu^{[2]}(y(k)) = \left( \mu_{11}^{[2]}(y_1(k)), \mu_{21}^{[2]}(y_1(k)), \ldots, \mu_{h1}^{[2]}(y_1(k)), \mu_{12}^{[2]}(y_2(k)), \ldots, \mu_{lj}^{[2]}(y_j(k)), \ldots, \mu_{hm}^{[2]}(y_m(k)) \right)^T.$$
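To make the mapping above concrete, the following is a minimal sketch of the forward pass in Python. It assumes triangular membership functions placed on a uniform grid over [0, 1] and small random weight initialization; the paper specifies triangular membership functions but not this particular grid or initialization, and the names tri_memberships and gnfn_forward are illustrative.

```python
import numpy as np

def tri_memberships(z, h):
    """Values of h triangular membership functions at the scalar input z.
    Centers are assumed uniformly spaced on [0, 1] (adjacent functions
    overlap so that their values sum to 1 inside the grid)."""
    centers = np.linspace(0.0, 1.0, h)
    step = centers[1] - centers[0]
    return np.maximum(0.0, 1.0 - np.abs(z - centers) / step)

def gnfn_forward(z, W):
    """Forward pass of one generalized neo-fuzzy neuron layer.
    z: input vector; W: (outputs x h*len(z)) synaptic weight matrix,
    columns ordered as (input index, membership index)."""
    h = W.shape[1] // z.size
    mu = np.concatenate([tri_memberships(zi, h) for zi in z])
    return W @ mu, mu

# Encoder-decoder pass: x_hat(k) = W2 mu2(W1 mu1(x(k)))
n, m, h = 4, 2, 5                 # inputs, bottleneck size, MFs per synapse
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(m, h * n))
W2 = rng.normal(scale=0.1, size=(n, h * m))

x = rng.random(n)                 # one observation x(k), scaled to [0, 1]
y, mu1 = gnfn_forward(x, W1)      # compressed vector y(k)
x_hat, mu2 = gnfn_forward(y, W2)  # recovered signal x_hat(k); assumes the
                                  # second-layer grid covers the range of y
```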
III. THE LEARNING ALGORITHM FOR THE SYNAPTIC WEIGHTS OF THE NEO-FUZZY AUTOENCODER

For tuning the synaptic weights of GNFN[2] we can use the gradient procedure minimizing the quadratic criterion $e_i^2(k)$, where $e_i(k) = x_i(k) - \hat{x}_i(k)$ is the recovery error:

$$w_{ilj}^{[2]}(k) = w_{ilj}^{[2]}(k-1) - \eta^{[2]}(k) \frac{\partial e_i^2(k)}{\partial w_{ilj}^{[2]}} = w_{ilj}^{[2]}(k-1) + \eta^{[2]}(k)\, e_i(k)\, \mu_{lj}^{[2]}(y_j(k)),$$

where $\eta^{[2]}(k)$ is the learning rate parameter of the output layer, chosen according to the condition from [20, 21]:

$$\eta^{[2]}(k) = (r^{[2]}(k))^{-1}, \quad r^{[2]}(k) = \alpha\, r^{[2]}(k-1) + \left\| \mu^{[2]}(y(k)) \right\|^2,$$

where $0 \le \alpha \le 1$ is a forgetting factor.

For tuning the synaptic weights of GNFN[1], an optimized error backpropagation procedure is used, which for triangular membership functions uniformly distributed along the abscissa axis with centers $x_{li}^{[1]}$, $y_{lj}^{[2]}$ can be written in the form

$$w_{jli}^{[1]}(k) = w_{jli}^{[1]}(k-1) + \eta^{[1]}(k)\, e_i(k)\, \tilde{w}_{ij}^{[2]}(k)\, \mu_{li}^{[1]}(x_i(k)),$$

where

$$\eta^{[1]}(k) = (r^{[1]}(k))^{-1}, \quad r^{[1]}(k) = \alpha\, r^{[1]}(k-1) + \left\| \mu^{[1]}(x(k)) \right\|^2,$$

$$\tilde{w}_{ij}^{[2]}(k) = \sum_{l=1}^{h} w_{ilj}^{[2]}(k) \begin{cases} \left( y_{l,j}^{[2]} - y_{l-1,j}^{[2]} \right)^{-1}, & \text{if } y_j(k) \in [y_{l-1,j}^{[2]}, y_{l,j}^{[2]}], \\ \left( y_{l,j}^{[2]} - y_{l+1,j}^{[2]} \right)^{-1}, & \text{if } y_j(k) \in [y_{l,j}^{[2]}, y_{l+1,j}^{[2]}], \\ 0 & \text{otherwise}. \end{cases}$$

The proposed learning algorithm for the synaptic weights of the autoencoder is characterized by high speed and combines tracking and filtering properties.
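Under the same assumptions as in the previous sketch (uniform triangular grids, so each slope of an active membership function contributes plus or minus 1/step), one online learning step might be sketched as follows. It reuses tri_memberships and gnfn_forward from above; the hidden-layer update accumulates the error over all outputs, which corresponds to the gradient of the total squared recovery error, and the values alpha = 0.95 and r1 = r2 = 1 at the start are assumptions, not values from the paper.

```python
import numpy as np

def learning_step(x, W1, W2, r1, r2, h, alpha=0.95):
    """One online update of both GNFN layers for observation x(k).
    Sketch only: assumes triangular MFs on a uniform grid over [0, 1]."""
    y, mu1 = gnfn_forward(x, W1)       # encoder output y(k)
    x_hat, mu2 = gnfn_forward(y, W2)   # recovered signal x_hat(k)
    e = x - x_hat                      # recovery error e(k)

    # Output layer: w[2] update with adaptive rate eta2 = 1/r2
    r2 = alpha * r2 + mu2 @ mu2
    W2 += (1.0 / r2) * np.outer(e, mu2)

    # Derivative of each second-layer triangular MF with respect to y_j:
    # +1/step on the rising slope, -1/step on the falling one, 0 outside
    step = 1.0 / (h - 1)
    centers = np.linspace(0.0, 1.0, h)
    dmu2 = np.concatenate([
        np.where(np.abs(yj - centers) < step,
                 -np.sign(yj - centers) / step, 0.0)
        for yj in y
    ])
    # w_tilde[i, j] = sum over l of W2[i, (j, l)] * dmu2[(j, l)]
    w_tilde = (W2 * dmu2).reshape(W2.shape[0], y.size, h).sum(axis=2)

    # Hidden layer: backpropagated update with adaptive rate eta1 = 1/r1
    r1 = alpha * r1 + mu1 @ mu1
    grad = e @ w_tilde                 # error routed through w_tilde
    W1 += (1.0 / r1) * np.outer(grad, mu1)
    return W1, W2, r1, r2
```

Because the learning rates 1/r1 and 1/r2 shrink as the accumulated squared membership activations grow and recover under the forgetting factor, the step behaves like the adaptive rule of [20, 21]: fast on fresh data, smoothing on stationary data.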
IV. EXPERIMENTS

To verify the effectiveness of the proposed neo-fuzzy autoencoder, data sets were taken from the UCI Repository [22]: Iris, Wine, and Hayes-Roth. The "Iris" data set contains 150 observations (4 attributes) in 3 classes, "Wine" contains 178 observations (13 attributes) in 3 classes, and "Hayes-Roth" contains 160 observations (5 attributes) in 3 classes.

As can be seen from Fig. 2, the data compressed by the neo-fuzzy autoencoder form more compact clusters than the data compressed by the autoassociative multilayer "Bottle Neck" neural network.

Fig. 2. The Hayes-Roth data set after compression by the autoassociative multilayer "Bottle Neck" neural network (a) and by the neo-fuzzy autoencoder (b).

The results obtained with the proposed neo-fuzzy autoencoder were compared with those of the autoassociative multilayer "Bottle Neck" neural network (Table I). The dimension of the compressed data was 2 components. The simulation was performed 20 times with different initial conditions, and the results were averaged.

TABLE I. RESULTS OF SIMULATION

Autoencoder                                           Data set     MSE
Neo-fuzzy autoencoder                                 Iris         0.199
                                                      Wine         0.499
                                                      Hayes-Roth   0.312
Autoassociative three-layer "Bottle Neck" network     Iris         0.486
                                                      Wine         0.903
                                                      Hayes-Roth   0.593
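A hedged sketch of the evaluation protocol described above (20 runs with different random initializations, a 2-component bottleneck, averaged reconstruction MSE): it reuses gnfn_forward and learning_step from the previous sketches, and the epoch count, the number of membership functions, and the normalization are assumptions, since the paper does not state them.

```python
import numpy as np

def evaluate(X, n_runs=20, h=5, epochs=50):
    """Average reconstruction MSE over n_runs runs with different random
    initializations, using a 2-component bottleneck as in Table I."""
    n, m = X.shape[1], 2
    mses = []
    for run in range(n_runs):
        rng = np.random.default_rng(run)
        W1 = rng.normal(scale=0.1, size=(m, h * n))
        W2 = rng.normal(scale=0.1, size=(n, h * m))
        r1 = r2 = 1.0
        for _ in range(epochs):        # sequential passes over the data
            for x in X:
                W1, W2, r1, r2 = learning_step(x, W1, W2, r1, r2, h)
        x_hats = np.array([gnfn_forward(gnfn_forward(x, W1)[0], W2)[0]
                           for x in X])
        mses.append(np.mean((X - x_hats) ** 2))
    return float(np.mean(mses))

# Usage (features scaled to [0, 1] so the membership grids cover the data):
# X = (X_raw - X_raw.min(0)) / X_raw.ptp(0)
# print(evaluate(X))
```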
V. CONCLUSIONS

The architecture of a "bottle-neck" two-layer autoencoder and its learning algorithm have been proposed. The system is based on generalized neo-fuzzy neurons and is an autoassociative "bottle-neck" modification of the Kolmogorov's neuro-fuzzy network. The proposed hybrid neo-fuzzy system of computational intelligence provides high-quality compression of information that is fed sequentially for processing. It is characterized by computational simplicity and a high speed of the learning process.

REFERENCES

[1] J. Han and M. Kamber, Data Mining: Concepts and Techniques. Amsterdam: Morgan Kaufmann Publ., 2006.
[2] C.C. Aggarwal, Data Mining. N.Y.: Springer, 2015.
[3] A. Bifet, R. Gavaldà, G. Holmes, and B. Pfahringer, Machine Learning for Data Streams with Practical Examples in MOA. The MIT Press, 2018.
[4] A. Bifet, Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams. Amsterdam: IOS Press, 2010.
[5] A. Menshawy, Deep Learning By Example: A hands-on guide to implementing advanced machine learning algorithms and neural networks. Packt Publishing Limited, 2018.
[6] M. Fullan, J. Quinn, and J. McEachen, Deep Learning: Engage the World Change the World. Corwin, 2017.
[7] A.L. Caterini and D.E. Chang, Deep Neural Networks in a Mathematical Framework. Springer, 2018.
[8] Y. LeCun, Y. Bengio, and G.E. Hinton, "Deep Learning," Nature, vol. 521, pp. 436-444, 2015.
[9] D. Graupe, Deep Learning Neural Networks: Design and Case Studies. World Scientific Publishing Company, 2016.
[10] V. Kolodyazhniy and Ye. Bodyanskiy, "Fuzzy Kolmogorov's network," in Lecture Notes in Computer Science, vol. 3214, M.G. Negoita et al., Eds., Springer-Verlag, 2004, pp. 764-771.
[11] Ye. Bodyanskiy, V. Kolodyazhniy, and P. Otto, "Neuro-fuzzy Kolmogorov's network for time-series prediction and pattern classification," in Lecture Notes in Artificial Intelligence, vol. 3698, U. Furbach, Ed., Heidelberg: Springer-Verlag, 2005, pp. 191-202.
[12] V. Kolodyazhniy, Ye. Bodyanskiy, and P. Otto, "Universal approximator employing neo-fuzzy neurons," in Computational Intelligence Theory and Applications, B. Reusch, Ed., Berlin-Heidelberg: Springer, 2005, pp. 631-640.
[13] V. Kolodyazhniy, Ye. Bodyanskiy, V. Poyedyntseva, and A. Stephan, "Neuro-fuzzy Kolmogorov's network with a modified perceptron learning rule for classification problems," in Advances in Soft Computing, vol. 38, B. Reusch, Ed., Berlin-Heidelberg: Springer-Verlag, 2006, pp. 41-49.
[14] Ye. Bodyanskiy, Ye. Gorshkov, V. Kolodyazhniy, and V. Poyedyntseva, "Neuro-fuzzy Kolmogorov's network," in Lecture Notes in Computer Science, vol. 3697, W. Duch, J. Kacprzyk, E. Oja, and S. Zadrozny, Eds., Berlin-Heidelberg: Springer-Verlag, 2005, pp. 1-6.
[15] V. Kolodyazhniy, F. Klawonn, and K. Tschumitschew, "A neuro-fuzzy model for dimensionality reduction and its application," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 15, no. 5, pp. 571-593, October 2007.
[16] O. Vynokurova, Ye. Bodyanskiy, I. Pliss, D. Peleshko, and Yu. Rashkevych, "Neo-fuzzy encoder and its adaptive learning for Big Data processing," Scientific Journal of RTU, Series "Computer Science", Volume "Information Technology and Management Science", vol. 20, pp. 6-11, 2017.
[17] T. Yamakawa, E. Uchino, T. Miki, and H. Kusanagi, "A neo-fuzzy neuron and its applications to system identification and prediction of the system behavior," in Proc. 2nd Int. Conf. on Fuzzy Logic and Neural Networks "IIZUKA-92", Iizuka, Japan, 1992, pp. 477-483.
[18] R.P. Landim, B. Rodrigues, S.R. Silva, and W.M. Caminhas, "A neo-fuzzy-neuron with real time training applied to flux observer for an induction motor," in Proc. IEEE Vth Brazilian Symposium on Neural Networks, Belo Horizonte, 9-11 Dec. 1998, pp. 67-72.
[19] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, Jan. 2015. doi: 10.1016/j.neunet.2014.09.003.
[20] Ye. Bodyanskiy, I. Kokshenev, and V. Kolodyazhniy, "An adaptive learning algorithm for a neo-fuzzy neuron," in Proc. 3rd Int. Conf. of European Union Society for Fuzzy Logic and Technology (EUSFLAT 2003), Zittau, 2003, pp. 375-379.
[21] P. Otto, Ye. Bodyanskiy, and V. Kolodyazhniy, "A new learning algorithm for a forecasting neuro-fuzzy network," Integrated Computer-Aided Engineering, vol. 10, pp. 399-409, Dec. 2003.
[22] UCI Repository of Machine Learning Databases. CA: University of California, Department of Information and Computer Science. [Online]. Available: http://www.ics.uci.edu/~mlearn/MLRepository.html