Online Neuro-Fuzzy Clustering of Data with Omissions and Outliers Based on Completion Strategy

Yevgeniy Bodyanskiy1[0000-0001-5418-2143], Alina Shafronenko1[0000-0002-8040-0279], and Diana Rudenko1[0000-0002-1792-5080]

1 Kharkiv National University of Radio Electronics, Nauky Ave., 14, Kharkiv, Ukraine
yevgeniy.bodyanskiy@nure.ua, alina.shafronenko@nure.ua, diana.rudenko@nure.ua

Abstract. In this paper, new recurrent adaptive algorithms for fuzzy clustering of data with missing values are proposed. These algorithms combine fuzzy clustering procedures with Kohonen's self-learning rule based on the "Winner-Takes-More" principle with a Cauchy neighborhood function. The proposed approach makes it possible to solve the clustering task in on-line mode even when the proportion of missing values in the data is large.

Keywords: fuzzy clustering, Kohonen self-organizing network, learning rule, incomplete data with missing values.

1 Introduction

The problem of clustering data sets described by vector-images arises in many applications associated with Data Mining. Recently, the focus has shifted to fuzzy clustering [1-3], where the processed vector-image can belong to more than one class with different levels of probability, possibility, or membership. However, there are situations when the data sets contain omissions and outliers, i.e., part of the information is lost. In this situation it is more effective to use the mathematical apparatus of Computational Intelligence [4] and, first of all, artificial neural networks [5], which solve the task of restoring the lost observations, and modifications of the popular fuzzy c-means method [6], which solve the clustering problem without recovering the data.

Existing approaches [7, 8] to processing data with omissions and outliers are efficient when the array of original observations is given in batch form and does not change during processing. At the same time, there is a wide class of problems in which the data arrive for processing as a sequence fed in real time, as happens in the training of Kohonen self-organizing maps [9] or their modifications [10]. In this regard, we introduced [11] the adaptive neuro-fuzzy Kohonen network that solves the problem of clustering data with gaps based on the partial distance strategy (PDS FCM). However, in situations where the number of such omissions and outliers is large, the partial distance strategy [12] may not be effective, and it may be necessary to estimate the missing observations simultaneously with solving the fuzzy clustering task. In this situation, the approach based on the optimal completion strategy (OCS FCM) [6] is more efficient. This work is devoted to the task of on-line data clustering using the optimal completion strategy, adapted to the case when information is processed in a sequential mode and its volume is not determined in advance.

2 Probabilistic adaptive fuzzy clustering with missing data based on the optimal completion strategy

The baseline information for solving the clustering task in batch mode is a sample of observations formed from N n-dimensional feature vectors $X = \{x_1, x_2, \ldots, x_N\} \subset R^n$, $x_k \in X$, $k = 1, 2, \ldots, N$. The result of clustering is a partition of the original data set into m classes ($1 < m < N$) with some level of membership $U_q(k)$ of the k-th feature vector to the q-th cluster ($1 \le q \le m$). The incoming data are first centered and standardized feature-wise, so that all observations belong to the hypercube $[-1, 1]^n$. The data for clustering thus form the array $X = \{x_1, \ldots, x_k, \ldots, x_N\} \subset R^n$, $x_k = (x_{k1}, \ldots, x_{ki}, \ldots, x_{kn})^T$, $-1 \le x_{ki} \le 1$, $1 < m < N$, $1 \le q \le m$, $1 \le i \le n$, $1 \le k \le N$; that is, in the complete-data case all observations $x_{ki}$ are available for processing.
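A minimal sketch of this preprocessing step, assuming NumPy with gaps encoded as NaN (the function name is illustrative and not part of the original method):

```python
import numpy as np

def standardize_to_hypercube(X):
    """Center each feature and scale it so all observations lie in [-1, 1]^n.
    NaN entries (missing values) are skipped by the statistics and preserved."""
    center = np.nanmean(X, axis=0)              # per-feature mean over observed values
    Xc = X - center
    half_range = np.nanmax(np.abs(Xc), axis=0)  # largest per-feature deviation
    half_range[half_range == 0] = 1.0           # guard against constant features
    return Xc / half_range
```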
Introducing the objective function of clustering [1]

$$E(U_q(k), w_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(x_k, w_q)$$

with constraints $\sum_{q=1}^{m} U_q(k) = 1$, $0 < \sum_{k=1}^{N} U_q(k) < N$, and solving the standard nonlinear programming problem, we get the probabilistic fuzzy clustering algorithm [2, 3]

$$\begin{cases}
U_q^{(\tau+1)}(k) = \dfrac{\bigl(D^2(x_k, w_q^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(x_k, w_l^{(\tau)})\bigr)^{\frac{1}{1-\beta}}},\\[6pt]
w_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} x_k}{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}},
\end{cases} \tag{1}$$

where $w_q$ is the prototype (centroid) of the q-th cluster, $\beta > 1$ is the parameter called the fuzzifier, which defines the "vagueness" of the boundaries between classes, $D^2(x_k, w_q)$ is the distance between $x_k$ and $w_q$ in the adopted metric, and $\tau = 0, 1, 2, \ldots$ is the index of the epoch of information processing, organized as the sequence $w_q^{(0)} \to U_q^{(1)} \to w_q^{(1)} \to U_q^{(2)} \to \ldots$. The calculation process continues until the condition $\|w_q^{(\tau+1)} - w_q^{(\tau)}\| \le \varepsilon$ $\forall q$, $1 \le q \le m$, is satisfied (here $\varepsilon$ defines the accuracy threshold) or until a specified maximum number of epochs Q is reached ($\tau = 0, 1, 2, \ldots, Q$). Note also that with $\beta = 2$ and $D^2(x_k, w_q) = \|x_k - w_q\|^2$ we obtain the popular Bezdek's fuzzy c-means (FCM) algorithm [1].

The process of fuzzy clustering can also be organized in on-line mode as sequential processing. In this situation the batch algorithm (1) can be rewritten in the recurrent form [12]

$$\begin{cases}
U_q(k+1) = \dfrac{\bigl(D^2(x_{k+1}, w_q(k))\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(x_{k+1}, w_l(k))\bigr)^{\frac{1}{1-\beta}}},\\[6pt]
w_q(k+1) = w_q(k) + \eta(k+1)\, U_q^{\beta}(k+1)\,(x_{k+1} - w_q(k)),
\end{cases} \tag{2}$$

where $\eta(k+1)$ is the learning rate parameter and $U_q(k+1)$ acts as the bell-shaped (Cauchy) neighborhood function of the neuro-fuzzy Kohonen network designed for fuzzy clustering problems [10, 11], based on the "Winner-Takes-More" (WTM) principle [9].
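A minimal sketch of one step of the recurrent rule (2), assuming NumPy, the squared Euclidean metric, and a simple decaying learning rate $\eta(k+1) = 1/(k+1)$ (the paper does not fix a particular schedule; the function name is illustrative):

```python
import numpy as np

def online_fcm_step(x, W, k, beta=2.0):
    """One step of recurrence (2): x is the newly arrived observation,
    W is the (m, n) array of cluster prototypes, k is the step index."""
    d2 = np.sum((W - x) ** 2, axis=1)                # squared Euclidean distances to prototypes
    u = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - beta))
    u /= u.sum()                                     # membership levels sum to one
    eta = 1.0 / (k + 1)                              # decaying learning rate (one common choice)
    W = W + eta * (u ** beta)[:, None] * (x - W)     # WTM update of all prototypes
    return u, W
```

Since the membership $U_q(k+1)$ enters the update of every prototype, all prototypes move toward each new observation to a degree set by the Cauchy-type neighborhood, which is exactly the "Winner-Takes-More" behavior.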
In the presence of an unknown number of missing values in the vector-images $x_k$ that form the array X, following [6], we introduce the sub-arrays:

$X_F = \{x_k \in X \mid x_k$ contains all components$\}$;
$X_P = \{x_{ki},\ 1 \le i \le n,\ 1 \le k \le N \mid$ values of $x_k$ available in $X\}$;
$X_G = \{x_{ki} = ?,\ 1 \le i \le n,\ 1 \le k \le N \mid$ values of $x_k$ absent from $X\}$.

The optimal completion strategy treats the elements of the sub-array $X_G$ as additional variables, which are estimated by minimizing the objective function E. Thus, in parallel with clustering (optimization of E over $U_q(k)$ and $w_q$), estimation of the missing observations is performed (optimization of E over $x_{ki} \in X_G$). In this case, the fuzzy c-means algorithm based on the optimal completion strategy can be written as the following sequence of steps [6]:

1. Set the initial conditions of the algorithm: $\tau = 0$; the number of clusters $1 < m < N$; the accuracy threshold $\varepsilon > 0$; initial prototypes $w_q^{(0)}$, $1 \le q \le m$; the maximum number of epochs Q ($\tau = 0, 1, 2, \ldots, Q$); and $X_G^{(0)} = \{-1 \le \hat x_{ki}^{(0)} \le 1\}$, containing $N_G$ ($1 \le N_G \le (n-1)N$) arbitrary initial estimates $\hat x_{ki}^{(0)}$ of the missing values $x_{ki} \in X_G$.

2. Calculate the membership levels by solving the optimization problem
$$U_q^{(\tau+1)}(k) = \arg\min_{U_q(k)} E(U_q(k), w_q^{(\tau)}, X_G^{(\tau)}) = \frac{\bigl(D^2(\hat x_k^{(\tau)}, w_q^{(\tau)})\bigr)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \bigl(D^2(\hat x_k^{(\tau)}, w_l^{(\tau)})\bigr)^{\frac{1}{1-\beta}}} = \frac{\|\hat x_k^{(\tau)} - w_q^{(\tau)}\|^{\frac{2}{1-\beta}}}{\sum_{l=1}^{m} \|\hat x_k^{(\tau)} - w_l^{(\tau)}\|^{\frac{2}{1-\beta}}}$$
(here the vector $\hat x_k^{(\tau)}$ differs from $x_k$ in that the missing values $x_{ki} \in X_G$ are replaced by the estimates $\hat x_{ki}^{(\tau)}$ calculated at the $\tau$-th epoch of data processing).

3. Calculate the centroids of the clusters:
$$w_q^{(\tau+1)} = \arg\min_{w_q} E(U_q^{(\tau+1)}(k), w_q, X_G^{(\tau)}) = \frac{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} \hat x_k^{(\tau)}}{\sum_{k=1}^{N} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}}.$$

4. Check the stop conditions: if $\|w_q^{(\tau+1)} - w_q^{(\tau)}\| \le \varepsilon$ $\forall q$, $1 \le q \le m$, or $\tau = Q$, the algorithm terminates; otherwise go to step 5.

5. Estimate the missing observations by solving the optimization problem
$$X_G^{(\tau+1)} = \arg\min_{X_G} E(U_q^{(\tau+1)}(k), w_q^{(\tau+1)}, X_G)$$
or, equivalently,
$$\frac{\partial E(U_q^{(\tau+1)}(k), w_q^{(\tau+1)}, X_G)}{\partial \hat x_{ki}} = 0,$$
which leads to
$$\hat x_{ki}^{(\tau+1)} = \sum_{q=1}^{m} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta} w_{qi}^{(\tau+1)} \Bigm/ \sum_{q=1}^{m} \bigl(U_q^{(\tau+1)}(k)\bigr)^{\beta}.$$

Information processing with this algorithm is organized as the sequence $w_q^{(0)} \to U_q^{(1)} \to \hat x_{ki}^{(1)} \to w_q^{(1)} \to U_q^{(2)} \to \ldots \to w_q^{(\tau)} \to U_q^{(\tau+1)} \to \hat x_{ki}^{(\tau+1)} \to w_q^{(\tau+1)} \to \ldots \to w_q^{(Q)}$.
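A compact sketch of steps 1-5, assuming NumPy, the Euclidean metric, and gaps encoded as NaN (the function name, the random prototype initialization, and the numerical guards are illustrative assumptions):

```python
import numpy as np

def ocs_fcm(X, m, beta=2.0, eps=1e-4, max_epochs=100, seed=0):
    """Batch fuzzy c-means with the optimal completion strategy: missing
    entries of X (NaN) are treated as extra variables and re-estimated."""
    rng = np.random.default_rng(seed)
    N, n = X.shape
    missing = np.isnan(X)
    Xh = X.copy()
    Xh[missing] = rng.uniform(-1.0, 1.0, missing.sum())  # step 1: arbitrary initial estimates
    W = Xh[rng.choice(N, size=m, replace=False)]         # initial prototypes (illustrative choice)
    for _ in range(max_epochs):
        d2 = ((Xh[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # (N, m) squared distances
        U = np.maximum(d2, 1e-12) ** (1.0 / (1.0 - beta))
        U /= U.sum(axis=1, keepdims=True)                # step 2: membership levels
        Ub = U ** beta
        W_new = (Ub.T @ Xh) / Ub.sum(axis=0)[:, None]    # step 3: cluster centroids
        converged = np.abs(W_new - W).max() <= eps       # step 4: stop condition
        W = W_new
        if converged:
            break
        est = (Ub @ W) / Ub.sum(axis=1, keepdims=True)   # step 5: membership-weighted centroids
        Xh[missing] = est[missing]                       # replace only the missing entries
    return U, W, Xh
```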
3 Possibilistic adaptive fuzzy clustering with missing data based on the optimal completion strategy

The main disadvantage of conventional probabilistic algorithms is connected with the constraint that the membership levels must sum to unity. This has led to the creation of possibilistic fuzzy clustering algorithms [14]. In possibilistic clustering algorithms the objective function has the form

$$E(U_q(k), w_q, \mu_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(x_k, w_q) + \sum_{q=1}^{m} \mu_q \sum_{k=1}^{N} (1 - U_q(k))^{\beta}$$

(where the scalar parameter $\mu_q > 0$ determines the distance at which the membership level equals 0.5, i.e., if $D^2(x_k, w_q) = \mu_q$, then $U_q(k) = 0.5$), and its minimization over $U_q(k)$, $w_q$, $\mu_q$ leads to the result

$$\begin{cases}
U_q^{(\tau+1)}(k) = \left(1 + \left(\dfrac{D^2(x_k, w_q^{(\tau)})}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} x_k}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} D^2(x_k, w_q^{(\tau+1)})}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}}.
\end{cases} \tag{3}$$

For on-line processing, the recurrent form of algorithm (3) can be written as [10, 13]

$$\begin{cases}
U_q(k+1) = \left(1 + \left(\dfrac{D^2(x_{k+1}, w_q(k))}{\mu_q(k)}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q(k+1) = w_q(k) + \eta(k+1)\, U_q^{\beta}(k+1)\,(x_{k+1} - w_q(k)),\\[6pt]
\mu_q(k+1) = \dfrac{\sum_{p=1}^{k+1} U_q^{\beta}(p)\, D^2(x_p, w_q(k+1))}{\sum_{p=1}^{k+1} U_q^{\beta}(p)},
\end{cases} \tag{4}$$

where the second relation in (4) differs from (2) only in the form of the neighborhood function $U_q(k+1)$. Thus, the recurrent possibilistic fuzzy clustering is also based on Kohonen's competitive self-learning.

Considering the situation with missing observations and using the optimal completion strategy, we take as the objective function of possibilistic fuzzy clustering the expression

$$E(U_q(k), w_q, \mu_q, X_G) = \sum_{k=1}^{N} \sum_{q=1}^{m} U_q^{\beta}(k)\, D^2(\hat x_k, w_q) + \sum_{q=1}^{m} \mu_q \sum_{k=1}^{N} (1 - U_q(k))^{\beta},$$

whose minimization leads to the obvious system of relations

$$\frac{\partial E}{\partial U_q(k)} = 0, \qquad \nabla_{w_q} E = 0, \qquad \frac{\partial E}{\partial \hat x_{ki}} = 0, \qquad \frac{\partial E}{\partial \mu_q} = 0,$$

or

$$\begin{cases}
U_q^{(\tau+1)}(k) = \left(1 + \left(\dfrac{D^2(\hat x_k^{(\tau)}, w_q^{(\tau)})}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} \hat x_k^{(\tau)}}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}},\\[6pt]
\hat x_{ki}^{(\tau+1)} = \dfrac{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k))^{\beta} w_{qi}^{(\tau+1)}}{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)} = \dfrac{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta} D^2(\hat x_k^{(\tau+1)}, w_q^{(\tau+1)})}{\sum_{k=1}^{N} (U_q^{(\tau+1)}(k))^{\beta}}.
\end{cases} \tag{5}$$

Similarly to the probabilistic fuzzy clustering based on the optimal completion strategy, we can organize the possibilistic clustering process with missing observations:

$$\begin{cases}
U_q^{(\tau+1)}(k+1) = \left(1 + \left(\dfrac{\|\hat x_{k+1}^{(\tau)} - w_q(k)\|^2}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
\hat x_{k+1,i}^{(\tau+1)} = \dfrac{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta} w_{qi}(k)}{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)}(k+1) = \dfrac{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta} \|\hat x_p^{(\tau+1)} - w_q(k)\|^2}{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta}},\\[6pt]
w_q(k+1) = w_q(k) + \eta(k+1)\,(U_q^{(Q)}(k+1))^{\beta}\,(\hat x_{k+1}^{(Q)} - w_q(k)),
\end{cases} \tag{6}$$

or, in accelerated time,

$$\begin{cases}
U_q^{(\tau+1)}(k+1) = \left(1 + \left(\dfrac{\|\hat x_{k+1}^{(\tau)} - w_q^{(\tau)}(k+1)\|^2}{\mu_q^{(\tau)}}\right)^{\frac{1}{\beta-1}}\right)^{-1},\\[6pt]
w_q^{(0)}(k+1) = w_q^{(Q)}(k),\\[6pt]
w_q^{(\tau+1)}(k+1) = w_q^{(\tau)}(k+1) + \eta(k+1)\,(U_q^{(\tau+1)}(k+1))^{\beta}\,(\hat x_{k+1}^{(\tau)} - w_q^{(\tau)}(k+1)),\\[6pt]
\hat x_{k+1,i}^{(\tau+1)} = \dfrac{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta} w_{qi}^{(\tau+1)}(k+1)}{\sum_{q=1}^{m} (U_q^{(\tau+1)}(k+1))^{\beta}},\\[6pt]
\mu_q^{(\tau+1)}(k+1) = \dfrac{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta} \|\hat x_p^{(\tau+1)} - w_q^{(\tau+1)}(k+1)\|^2}{\sum_{p=1}^{k+1} (U_q^{(\tau+1)}(p))^{\beta}}.
\end{cases} \tag{7}$$

It is clear that algorithm (7) is the most unwieldy from the computational point of view; its advantage, however, is that it can be used in on-line mode to detect the emergence of new clusters. Procedures (6) and (7) do not require accumulating the processed sample, which is important in problems (for example, Web Mining) where large amounts of information have to be processed.
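A minimal sketch of one step of procedure (6), assuming NumPy, squared Euclidean distances, a few inner refinements per observation, and a 1/(k+1) learning rate; `state` keeps the running sums used for a recurrent approximation of the $\mu_q$ update, and all names and the initial completion scheme are illustrative assumptions:

```python
import numpy as np

def online_possibilistic_step(x, W, mu, state, k, beta=2.0):
    """One step of the on-line possibilistic rule with completion: NaN
    components of x are replaced by membership-weighted prototype components."""
    miss = np.isnan(x)
    xh = np.where(miss, W.mean(axis=0), x)        # crude initial completion (assumption)
    for _ in range(3):                            # a few inner refinements of U and x-hat
        d2 = ((W - xh) ** 2).sum(axis=1)
        U = 1.0 / (1.0 + (d2 / mu) ** (1.0 / (beta - 1.0)))
        Ub = U ** beta
        xh[miss] = (Ub @ W)[miss] / Ub.sum()      # re-estimate the missing components
    state['num'] += Ub * d2                       # running numerator of the mu update
    state['den'] += Ub                            # running denominator of the mu update
    mu = state['num'] / np.maximum(state['den'], 1e-12)
    eta = 1.0 / (k + 1)
    W = W + eta * Ub[:, None] * (xh - W)          # Kohonen-type WTM prototype update
    return U, W, mu, xh
```

Before the first observation arrives, `state['num']` and `state['den']` can be initialized to small positive constants of shape (m,) so that the $\mu_q$ estimate is defined from the start.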
4 Experiments

Experimental research was conducted on two standard data samples from the UCI repository, Wine and Iris. To estimate the quality of the algorithm we used the following cluster validity criteria: Partition Coefficient (PC), Classification Entropy (CE), Partition Index (SC), Separation Index (S), Xie and Beni's Index (XB), and Dunn's Index (DI). We also compared the results of the proposed algorithm with better-known algorithms such as the fuzzy c-means (FCM) clustering algorithm and the Gustafson-Kessel clustering algorithm. As seen from the experimental results (Table 1, Table 2 and Table 3), the proposed algorithm showed better results than the FCM and Gustafson-Kessel clustering algorithms.

Table 1. Results of experiments with 10 missing values (PC, CE, SC, S, XB and DI for the adaptive fuzzy possibilistic clustering of data with missing values, FCM and Gustafson-Kessel algorithms on the Iris and Wine UCI data sets).

Table 2. Results of experiments with 50 missing values (the same criteria, algorithms and data sets).

Table 3. Results of experiments with 100 missing values (the same criteria, algorithms and data sets).

5 Conclusions

The problem of probabilistic and possibilistic fuzzy adaptive clustering of data containing an a priori unknown number of gaps, based on the optimal completion strategy, is considered. The proposed algorithms are based on recurrent optimization of goal functions of a special type. Missing observations are replaced by estimates that are also obtained by solving an optimization problem. The centroids of the recovered clusters are tuned using a procedure close to T. Kohonen's WTM rule with a neighborhood (membership) function of Cauchy form.

References
1. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
2. Hoeppner, F., Klawonn, F., Kruse, R., Runkler, T.: Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition. John Wiley & Sons, Chichester (1999)
3. Xu, R., Wunsch, D.C.: Clustering. John Wiley & Sons, Inc., Hoboken, N.J. (2009)
4. Rutkowski, L.: Computational Intelligence. Methods and Techniques. Springer-Verlag, Berlin-Heidelberg (2008)
5. Marwala, T.: Computational Intelligence for Missing Data Imputation, Estimation, and Management: Knowledge Optimization Techniques. Information Science Reference, Hershey-New York (2009)
6. Hathaway, R.J., Bezdek, J.C.: Fuzzy c-means clustering of incomplete data. IEEE Trans. on Systems, Man, and Cybernetics, 31(5), pp. 735-744 (2001)
7. Zagoruyko, N.G.: Empirical Predictions. Nauka, Novosibirsk (1979)
8. Zagoruyko, N.G.: Applied Data and Knowledge Analysis. Novosibirsk (1999)
9. Kohonen, T.: Self-Organizing Maps. Springer-Verlag, Berlin (1995)
10. Gorshkov, Ye., Kolodyazhniy, V., Bodyanskiy, Ye.: New recursive learning algorithms for fuzzy Kohonen clustering network. Proc. 17th Int. Workshop on Nonlinear Dynamics of Electronic Systems (Rapperswil, Switzerland, June 21-24, 2009), pp. 58-61 (2009)
11. Bodyanskiy, Ye., Shafronenko, A., Volkova, V.: Adaptive clustering of incomplete data using neuro-fuzzy Kohonen network. In: Artificial Intelligence Methods and Techniques for Business and Engineering Applications, pp. 287-296. ITHEA, Rzeszow-Sofia (2012)
12. Shafronenko, A., Dolotov, A., Bodyanskiy, Y., Setlak, G.: Fuzzy clustering of distorted observations based on optimal expansion using partial distances. In: 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP), pp. 327-330 (2018)
13. Bodyanskiy, Ye.: Computational intelligence techniques for data analysis. Lecture Notes in Informatics, GI, Bonn (2005)
14. Krishnapuram, R., Keller, J.M.: A possibilistic approach to clustering. IEEE Trans. on Fuzzy Systems, 1(2), pp. 98-110 (1993)