Online Neuro-Fuzzy Clustering of Data with Omissions and Outliers Based on Completion Strategy

Yevgeniy Bodyanskiy1[0000-0001-5418-2143], Alina Shafronenko1[0000-0002-8040-0279], and Diana Rudenko1[0000-0002-1792-5080]

1 Kharkiv National University of Radio Electronics, Nauky Ave., 14, Kharkiv, Ukraine
yevgeniy.bodyanskiy@nure.ua, alina.shafronenko@nure.ua, diana.rudenko@nure.ua

Abstract. In this paper, new recurrent adaptive algorithms for fuzzy clustering of data with missing values are proposed. These algorithms combine fuzzy clustering procedures with Kohonen's self-learning rule based on the "Winner-Takes-More" principle with a Cauchy neighborhood function. The proposed approach makes it possible to solve the clustering task in on-line mode even when the proportion of missing values in the data is large.

Keywords: fuzzy clustering, Kohonen self-organizing network, learning rule, incomplete data with missing values.

1 Introduction

The problem of clustering data sets described by vector-images arises in many applications associated with Data Mining. Recently, the focus has shifted to fuzzy clustering [1-3], where the processed vector-image can belong to more than one class with different levels of probability, possibility, or membership. However, there are situations when the data sets contain omissions and outliers, i.e., part of the information is lost. In this situation it is more effective to use the mathematical apparatus of Computational Intelligence [4] and, first of all, artificial neural networks [5], which solve the task of restoring the lost observations, and modifications of the popular fuzzy c-means method [6], which solve the clustering problem without recovering the data.

Existing approaches [7, 8] to processing data with omissions and outliers are efficient when the array of original observations is given in batch form and does not change during processing. At the same time, there is a wide class of problems in which the data arrive for processing as a sequence fed in real time, as happens in the training of Kohonen self-organizing maps [9] or their modifications [10]. In this regard, we introduced [11] the adaptive neuro-fuzzy Kohonen network that solves the problem of clustering data with gaps based on the partial distance strategy (PDS FCM). However, in situations where the number of such omissions and outliers is large, the partial distance strategy [12] may not be effective, and it may be necessary to estimate the missing observations simultaneously with solving the fuzzy clustering task. In this situation, the approach based on the optimal completion strategy (OCS FCM) [6] is more efficient. This work is devoted to the task of on-line data clustering using the optimal completion strategy, adapted to the case when information is processed in a sequential mode and its volume is not determined in advance.

2 Probabilistic adaptive fuzzy clustering with missing data based on the optimal completion strategy

The baseline information for solving the clustering task in batch mode is a sample of observations formed from N n-dimensional feature vectors $X = \{x_1, x_2, \ldots, x_N\} \subset R^n$, $x_k \in X$, $k = 1, 2, \ldots, N$. The result of clustering is a partition of the original data set into m classes ($1 < m < N$) with some level of membership $U_q(k)$ of the k-th feature vector to the q-th cluster ($1 \le q \le m$). The incoming data are first centered and standardized feature-wise, so that all observations belong to the hypercube $[-1, 1]^n$. The data for clustering thus form the array $X = \{x_1, \ldots, x_k, \ldots, x_N\} \subset R^n$, $x_k = (x_{k1}, \ldots, x_{ki}, \ldots, x_{kn})^T$, $-1 \le x_{ki} \le 1$, $1 < m < N$, $1 \le q \le m$, $1 \le i \le n$, $1 \le k \le N$; that is, in the complete-data case all observations $x_{ki}$ are available for processing.
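A minimal sketch of this preprocessing step, assuming NumPy with gaps encoded as NaN (the function name is illustrative and not part of the original method):

```python
import numpy as np

def standardize_to_hypercube(X):
    """Center each feature and scale it so all observations lie in [-1, 1]^n.
    NaN entries (missing values) are skipped by the statistics and preserved."""
    center = np.nanmean(X, axis=0)              # per-feature mean over observed values
    Xc = X - center
    half_range = np.nanmax(np.abs(Xc), axis=0)  # largest per-feature deviation
    half_range[half_range == 0] = 1.0           # guard against constant features
    return Xc / half_range
```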
Introducing the objective function of clustering [1]

$$E(U_q(k), w_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(x_k, w_q)$$

with constraints $\sum_{q=1}^{m} U_q(k) = 1$, $0 < \sum_{k=1}^{N} U_q(k) < N$, and solving the standard nonlinear programming problem, we get the probabilistic fuzzy clustering algorithm [2, 3]

$$\begin{cases}
U_q^{(\tau+1)}(k) = \dfrac{\bigl(D^2(x_k, w_q^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(x_k, w_l^{(\tau)})\bigr)^{\frac{1}{1-\beta}}},\\[6pt]
w_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} x_k}{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}},
\end{cases} \tag{1}$$

where $w_q$ is the prototype (centroid) of the q-th cluster, $\beta > 1$ is the parameter called the fuzzifier, which defines the "vagueness" of the boundaries between classes, $D^2(x_k, w_q)$ is the distance between $x_k$ and $w_q$ in the adopted metric, and $\tau = 0, 1, 2, \ldots$ is the index of the epoch of information processing, organized as the sequence $w_q^{(0)} \to U_q^{(1)} \to w_q^{(1)} \to U_q^{(2)} \to \ldots$. The calculation process continues until the condition $\|w_q^{(\tau+1)} - w_q^{(\tau)}\| \le \varepsilon$ $\forall q$, $1 \le q \le m$, is satisfied (here $\varepsilon$ defines the accuracy threshold) or until a specified maximum number of epochs Q is reached ($\tau = 0, 1, 2, \ldots, Q$). Note also that with $\beta = 2$ and $D^2(x_k, w_q) = \|x_k - w_q\|^2$ we obtain the popular Bezdek's fuzzy c-means (FCM) algorithm [1].

The process of fuzzy clustering can also be organized in on-line mode as sequential processing. In this situation the batch algorithm (1) can be rewritten in the recurrent form [12]

$$\begin{cases}
U_q(k+1) = \dfrac{\bigl(D^2(x_{k+1}, w_q(k))\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(x_{k+1}, w_l(k))\bigr)^{\frac{1}{1-\beta}}},\\[6pt]
w_q(k+1) = w_q(k) + \eta(k+1)\, U_q^{\beta}(k+1)\,(x_{k+1} - w_q(k)),
\end{cases} \tag{2}$$

where $\eta(k+1)$ is the learning rate parameter and $U_q(k+1)$ acts as the bell-shaped (Cauchy) neighborhood function of the neuro-fuzzy Kohonen network designed for fuzzy clustering problems [10, 11], based on the "Winner-Takes-More" (WTM) principle [9].
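A minimal sketch of one step of the recurrent rule (2), assuming NumPy, the squared Euclidean metric, and a simple decaying learning rate $\eta(k+1) = 1/(k+1)$ (the paper does not fix a particular schedule; the function name is illustrative):

```python
import numpy as np

def online_fcm_step(x, W, k, beta=2.0):
    """One step of recurrence (2): x is the newly arrived observation,
    W is the (m, n) array of cluster prototypes, k is the step index."""
    d2 = np.sum((W - x) ** 2, axis=1)                # squared Euclidean distances to prototypes
    u = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - beta))
    u /= u.sum()                                     # membership levels sum to one
    eta = 1.0 / (k + 1)                              # decaying learning rate (one common choice)
    W = W + eta * (u ** beta)[:, None] * (x - W)     # WTM update of all prototypes
    return u, W
```

Since the membership $U_q(k+1)$ enters the update of every prototype, all prototypes move toward each new observation to a degree set by the Cauchy-type neighborhood, which is exactly the "Winner-Takes-More" behavior.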
In the presence of an unknown number of missing values in the vector-images $x_k$ that form the array X, following [6], we introduce the sub-arrays:

$X_F = \{x_k \in X \mid x_k$ contains all components$\}$;
$X_P = \{x_{ki},\ 1 \le i \le n,\ 1 \le k \le N \mid$ values of $x_k$ available in $X\}$;
$X_G = \{x_{ki} = ?,\ 1 \le i \le n,\ 1 \le k \le N \mid$ values of $x_k$ absent from $X\}$.

The optimal completion strategy treats the elements of the sub-array $X_G$ as additional variables, which are estimated by minimizing the objective function E. Thus, in parallel with clustering (optimization of E over $U_q(k)$ and $w_q$), estimation of the missing observations is performed (optimization of E over $x_{ki} \in X_G$). In this case, the fuzzy c-means algorithm based on the optimal completion strategy can be written as the following sequence of steps [6]:

1. Set the initial conditions of the algorithm: $\tau = 0$; the number of clusters $1 < m < N$; the accuracy threshold $\varepsilon > 0$; initial prototypes $w_q^{(0)}$, $1 \le q \le m$; the maximum number of epochs Q ($\tau = 0, 1, 2, \ldots, Q$); and $X_G^{(0)} = \{-1 \le \hat x_{ki}^{(0)} \le 1\}$, containing $N_G$ ($1 \le N_G \le (n-1)N$) arbitrary initial estimates $\hat x_{ki}^{(0)}$ of the missing values $x_{ki} \in X_G$.

2. Calculate the membership levels by solving the optimization problem
$$U_q^{(\tau+1)}(k) = \arg\min_{U_q(k)} E(U_q(k), w_q^{(\tau)}, X_G^{(\tau)}) = \frac{\bigl(D^2(\hat x_k^{(\tau)}, w_q^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(\hat x_k^{(\tau)}, w_l^{(\tau)})\bigr)^{\frac{1}{1-\beta}}} = \frac{\|\hat x_k^{(\tau)} - w_q^{(\tau)}\|^{\frac{2}{1-\beta}}}{\sum_{l=1}^{m} \|\hat x_k^{(\tau)} - w_l^{(\tau)}\|^{\frac{2}{1-\beta}}}$$
(here the vector $\hat x_k^{(\tau)}$ differs from $x_k$ in that the missing values $x_{ki} \in X_G$ are replaced by the estimates $\hat x_{ki}^{(\tau)}$ calculated at the $\tau$-th epoch of data processing).

3. Calculate the centroids of the clusters:
$$w_q^{(\tau+1)} = \arg\min_{w_q} E(U_q^{(\tau+1)}(k), w_q, X_G^{(\tau)}) = \frac{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} \hat x_k^{(\tau)}}{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}}.$$

4. Check the stop conditions: if $\|w_q^{(\tau+1)} - w_q^{(\tau)}\| \le \varepsilon$ $\forall q$, $1 \le q \le m$, or $\tau = Q$, the algorithm terminates; otherwise go to step 5.

5. Estimate the missing observations by solving the optimization problem
$$X_G^{(\tau+1)} = \arg\min_{X_G} E(U_q^{(\tau+1)}(k), w_q^{(\tau+1)}, X_G)$$
or, equivalently,
$$\frac{\partial E(U_q^{(\tau+1)}(k), w_q^{(\tau+1)}, X_G)}{\partial \hat x_{ki}} = 0,$$
which leads to
$$\hat x_{ki}^{(\tau+1)} = \sum_{q=1}^{m} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} w_{qi}^{(\tau+1)} \Bigm/ \sum_{q=1}^{m} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}.$$

Information processing with this algorithm is organized as the sequence $w_q^{(0)} \to U_q^{(1)} \to \hat x_{ki}^{(1)} \to w_q^{(1)} \to U_q^{(2)} \to \ldots \to w_q^{(\tau)} \to U_q^{(\tau+1)} \to \hat x_{ki}^{(\tau+1)} \to w_q^{(\tau+1)} \to \ldots \to w_q^{(Q)}$.
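A compact sketch of steps 1-5, assuming NumPy, the Euclidean metric, and gaps encoded as NaN (the function name, the random prototype initialization, and the numerical guards are illustrative assumptions):

```python
import numpy as np

def ocs_fcm(X, m, beta=2.0, eps=1e-4, max_epochs=100, seed=0):
    """Batch fuzzy c-means with the optimal completion strategy: missing
    entries of X (NaN) are treated as extra variables and re-estimated."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    missing = np.isnan(X)
    Xh = X.copy()
    Xh[missing] = rng.uniform(-1.0, 1.0, missing.sum())  # step 1: arbitrary initial estimates
    W = Xh[rng.choice(N, size=m, replace=False)]         # initial prototypes (illustrative choice)
    for _ in range(max_epochs):
        d2 = ((Xh[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # (N, m) squared distances
        U = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - beta))
        U /= U.sum(axis=1, keepdims=True)                # step 2: membership levels
        Ub = U ** beta
        W_new = (Ub.T @ Xh) / Ub.sum(axis=0)[:, None]    # step 3: cluster centroids
        converged = np.abs(W_new - W).max() <= eps       # step 4: stop condition
        W = W_new
        if converged:
            break
        est = (Ub @ W) / Ub.sum(axis=1, keepdims=True)   # step 5: membership-weighted centroids
        Xh[missing] = est[missing]                       # replace only the missing entries
    return U, W, Xh
```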
3 Possibilistic adaptive fuzzy clustering with missing data based on the optimal completion strategy

The main disadvantage of conventional probabilistic algorithms is connected with the constraint that the membership levels must sum to unity. This has led to the creation of possibilistic fuzzy clustering algorithms [14]. In possibilistic clustering algorithms the objective function has the form

$$E(U_q(k), w_q, \mu_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(x_k, w_q) + \sum_{q=1}^{m} \mu_q \sum_{k=1}^{N} (1 - U_q(k))^{\beta}$$

(where the scalar parameter $\mu_q > 0$ determines the distance at which the membership level equals 0.5, i.e., if $D^2(x_k, w_q) = \mu_q$, then $U_q(k) = 0.5$), and its minimization over $U_q(k)$, $w_q$, $\mu_q$ leads to the result

$$\begin{cases}
U_q^{(\tau+1)}(k) = \left(1 + \left(\dfrac{D^2(x_k, w_q^{(\tau)})}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} x_k}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} D^2(x_k, w_q^{(\tau+1)})}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}}.
\end{cases} \tag{3}$$

For on-line processing, the recurrent form of algorithm (3) can be written as [10, 13]

$$\begin{cases}
U_q(k+1) = \left(1 + \left(\dfrac{D^2(x_{k+1}, w_q(k))}{\mu_q(k)}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q(k+1) = w_q(k) + \eta(k+1)\, U_q^{\beta}(k+1)\,(x_{k+1} - w_q(k)),\\[6pt]
\mu_q(k+1) = \dfrac{\sum_{p=1}^{k+1} U_q^{\beta}(p)\, D^2(x_p, w_q(k+1))}{\sum_{p=1}^{k+1} U_q^{\beta}(p)},
\end{cases} \tag{4}$$

where the second relation in (4) differs from (2) only in the form of the neighborhood function $U_q(k+1)$. Thus, the recurrent possibilistic fuzzy clustering is also based on Kohonen's competitive self-learning.

Considering the situation with missing observations and using the optimal completion strategy, we take as the objective function of possibilistic fuzzy clustering the expression

$$E(U_q(k), w_q, \mu_q, X_G) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(\hat x_k, w_q) + \sum_{q=1}^{m} \mu_q \sum_{k=1}^{N} (1 - U_q(k))^{\beta},$$

whose minimization leads to the obvious system of relations

$$\frac{\partial E}{\partial U_q(k)} = 0, \qquad \nabla_{w_q} E = 0, \qquad \frac{\partial E}{\partial \hat x_{ki}} = 0, \qquad \frac{\partial E}{\partial \mu_q} = 0,$$

or

$$\begin{cases}
U_q^{(\tau+1)}(k) = \left(1 + \left(\dfrac{D^2(\hat x_k^{(\tau)}, w_q^{(\tau)})}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} \hat x_k^{(\tau)}}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}},\\[6pt]
\hat x_{ki}^{(\tau+1)} = \dfrac{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k))^{\beta} w_{qi}^{(\tau+1)}}{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} D^2(\hat x_k^{(\tau+1)}, w_q^{(\tau+1)})}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}}.
\end{cases} \tag{5}$$

Similarly to the probabilistic fuzzy clustering based on the optimal completion strategy, we can organize the possibilistic clustering process with missing observations:

$$\begin{cases}
U_q^{(\tau+1)}(k+1) = \left(1 + \left(\dfrac{\|\hat x_{k+1}^{(\tau)} - w_q(k)\|^2}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
\hat x_{k+1,i}^{(\tau+1)} = \dfrac{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta} w_{qi}(k)}{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)}(k+1) = \dfrac{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta} \|\hat x_p^{(\tau+1)} - w_q(k)\|^2}{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta}},\\[6pt]
w_q(k+1) = w_q(k) + \eta(k+1)\,(U_q^{(Q)}(k+1))^{\beta}\,(\hat x_{k+1}^{(Q)} - w_q(k)),
\end{cases} \tag{6}$$

or, in accelerated time,

$$\begin{cases}
U_q^{(\tau+1)}(k+1) = \left(1 + \left(\dfrac{\|\hat x_{k+1}^{(\tau)} - w_q^{(\tau)}(k+1)\|^2}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q^{(0)}(k+1) = w_q^{(Q)}(k),\\[6pt]
w_q^{(\tau+1)}(k+1) = w_q^{(\tau)}(k+1) + \eta(k+1)\,(U_q^{(\tau+1)}(k+1))^{\beta}\,(\hat x_{k+1}^{(\tau)} - w_q^{(\tau)}(k+1)),\\[6pt]
\hat x_{k+1,i}^{(\tau+1)} = \dfrac{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta} w_{qi}^{(\tau+1)}(k+1)}{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)}(k+1) = \dfrac{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta} \|\hat x_p^{(\tau+1)} - w_q^{(\tau+1)}(k+1)\|^2}{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta}}.
\end{cases} \tag{7}$$

It is clear that algorithm (7) is the most unwieldy from the computational point of view; its advantage, however, is that it can be used in on-line mode to detect the emergence of new clusters. Procedures (6) and (7) do not require accumulating the processed sample, which is important in problems (for example, Web Mining) where large amounts of information have to be processed.
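A minimal sketch of one step of procedure (6), assuming NumPy, squared Euclidean distances, a few inner refinements per observation, and a 1/(k+1) learning rate; `state` keeps the running sums used for a recurrent approximation of the $\mu_q$ update, and all names and the initial completion scheme are illustrative assumptions:

```python
import numpy as np

def online_possibilistic_step(x, W, mu, state, k, beta=2.0):
    """One step of the on-line possibilistic rule with completion: NaN
    components of x are replaced by membership-weighted prototype components."""
    miss = np.isnan(x)
    xh = np.where(miss, W.mean(axis=0), x)        # crude initial completion (assumption)
    for _ in range(3):                            # a few inner refinements of U and x-hat
        d2 = ((W - xh) ** 2).sum(axis=1)
        U = 1.0 / (1.0 + (d2 / mu) ** (1.0 / (beta - 1.0)))
        Ub = U ** beta
        xh[miss] = (Ub @ W)[miss] / Ub.sum()      # re-estimate the missing components
    state['num'] += Ub * d2                       # running numerator of the mu update
    state['den'] += Ub                            # running denominator of the mu update
    mu = state['num'] / np.maximum(state['den'], 1e-12)
    eta = 1.0 / (k + 1)
    W = W + eta * Ub[:, None] * (xh - W)          # Kohonen-type WTM prototype update
    return U, W, mu, xh
```

Before the first observation arrives, `state['num']` and `state['den']` can be initialized to small positive constants of shape (m,) so that the $\mu_q$ estimate is defined from the start.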
4 Experiments

Experimental research was conducted on two standard data samples from the UCI repository, Wine and Iris. To estimate the quality of the algorithm we used the following cluster validity criteria: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), and Dunn's Index (DI). We also compared the results of the proposed algorithm with better-known algorithms such as the fuzzy c-means (FCM) clustering algorithm and the Gustafson-Kessel clustering algorithm. As seen from the experimental results (Table 1, Table 2 and Table 3), the proposed algorithm showed better results than the FCM and Gustafson-Kessel clustering algorithms.

Table 1. Results of experiments with 10 missing values (PC, CE, SC, S, XB and DI for the adaptive fuzzy possibilistic clustering of data with missing values, FCM and Gustafson-Kessel algorithms on the Iris and Wine UCI data sets).

Table 2. Results of experiments with 50 missing values (the same criteria, algorithms and data sets).

Table 3. Results of experiments with 100 missing values (the same criteria, algorithms and data sets).

5 Conclusions

The problem of probabilistic and possibilistic fuzzy adaptive clustering of data containing an a priori unknown number of gaps, based on the optimal completion strategy, is considered. The proposed algorithms are based on recurrent optimization of goal functions of a special type. Missing observations are replaced by estimates that are also obtained by solving an optimization problem. The centroids of the recovered clusters are tuned using a procedure close to T. Kohonen's WTM rule with a neighborhood (membership) function of Cauchy form.

References
1. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
2. Hoeppner, F., Klawonn, F., Kruse, R., Runkler, T.: Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition. John Wiley & Sons, Chichester (1999)
3. Xu, R., Wunsch, D.C.: Clustering. John Wiley & Sons, Inc., Hoboken, N.J. (2009)
4. Rutkowski, L.: Computational Intelligence. Methods and Techniques. Springer-Verlag, Berlin-Heidelberg (2008)
5. Marwala, T.: Computational Intelligence for Missing Data Imputation, Estimation, and Management: Knowledge Optimization Techniques. Information Science Reference, Hershey-New York (2009)
6. Hathaway, R.J., Bezdek, J.C.: Fuzzy c-means clustering of incomplete data. IEEE Trans. on Systems, Man, and Cybernetics, 31(5), pp. 735-744 (2001)
7. Zagoruyko, N.G.: Empirical Predictions. Nauka, Novosibirsk (1979)
8. Zagoruyko, N.G.: Applied Data and Knowledge Analysis. Novosibirsk (1999)
9. Kohonen, T.: Self-Organizing Maps. Springer-Verlag, Berlin (1995)
10. Gorshkov, Ye., Kolodyazhniy, V., Bodyanskiy, Ye.: New recursive learning algorithms for fuzzy Kohonen clustering network. Proc. 17th Int. Workshop on Nonlinear Dynamics of Electronic Systems (Rapperswil, Switzerland, June 21-24, 2009), pp. 58-61 (2009)
11. Bodyanskiy, Ye., Shafronenko, A., Volkova, V.: Adaptive clustering of incomplete data using neuro-fuzzy Kohonen network. In: Artificial Intelligence Methods and Techniques for Business and Engineering Applications, pp. 287-296. ITHEA, Rzeszow-Sofia (2012)
12. Shafronenko, A., Dolotov, A., Bodyanskiy, Y., Setlak, G.: Fuzzy clustering of distorted observations based on optimal expansion using partial distances. In: 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), pp. 327-330 (2018)
13. Bodyanskiy, Ye.: Computational intelligence techniques for data analysis. Lecture Notes in Informatics, GI, Bonn (2005)
14. Krishnapuram, R., Keller, J.M.: A possibilistic approach to clustering. IEEE Trans. on Fuzzy Systems, 1(2), pp. 98-110 (1993)