<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Online Hybrid Probabilistic-Fuzzy Clustering in Medical Data Mining Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yevgeniy Bodyanskiy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Control systems research laboratory, Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>In this paper, an online recurrent fuzzy clustering procedure is introduced that allows forming hyperellipsoidal clusters with an arbitrary orientation of the axes. The proposed clustering system is a generalization of a number of known algorithms; it is intended to solve tasks within the general problems of Medical Data Mining, when information is fed for processing sequentially in online mode.</p>
      </abstract>
      <kwd-group>
        <kwd>Medical Data Mining</kwd>
        <kwd>Big Data</kwd>
        <kwd>Computational Intelligence</kwd>
        <kwd>Fuzzy Clustering</kwd>
        <kwd>EM-Algorithm</kwd>
        <kwd>Kohonen's Self-Learning</kwd>
        <kwd>Soft Clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The clustering task has a special place in the general problem of Data Mining [1,2], since its solution is implemented in self-learning mode (unsupervised learning), when the researcher does not have a marked-up training dataset in advance. It is clear that here the level of a priori uncertainty is much higher compared to other problems of data analysis, which has led to the emergence of a variety of approaches, methods, and algorithms for solving this problem [3-6], differing both in initial premises and in mathematical procedures, and therefore often leading to different results in the end. This distinguishes the clustering task from other traditional Data Mining tasks such as classification, forecasting, identification of hidden dependencies contained in data, etc.</p>
      <p>The clustering task is complicated if the data for processing are received sequentially in online mode, forming a data stream [7], such that the frequency of data arrival makes it impossible to process the accumulated information in the time between two neighboring observations. The situation is even more complex if the amount of data is so large that processing it as a single array is impossible (the Big Data concept [8]). Hence, the idea of online clustering of a data flow seems very attractive, especially in Medical Data Mining tasks connected with the mass examination of patients.</p>
      <p>Artificial neural networks, fuzzy reasoning systems, and hybrid neuro-fuzzy systems [3, 9, 10] can be successfully used to solve the problems of processing data in online mode, where performance issues, especially of the learning processes, come to the fore. Obviously, multi-epoch learning in this situation is ineffective. Incremental learning is more promising, when the parameters of the clustering system are refined sequentially, synchronously with data arrival. Here, first of all, it is necessary to note the clustering Kohonen neural networks [9] – self-organizing maps (SOM) – which have shown their effectiveness in solving many real-world problems. It should be remembered, however, that SOMs solve clustering problems under the assumption of convex (linearly separable) non-overlapping classes.</p>
      <p>In real-world problems, data usually form overlapping classes, wherein each observation simultaneously belongs to two or more classes. Clearly, in this case, in the process of clustering it is necessary to estimate both the possible classes and the probability-membership levels of each vector-pattern to each of the possible classes. Obviously, in this situation the traditional Kohonen neural network is ineffective, and the so-called soft computing methods come to the fore, among which the most popular is the Expectation-Maximization approach (EM-algorithm) [9-15], based on probabilistic assumptions. The powerful Fuzzy C-Means (FCM) method [10-12] proposed by J. C. Bezdek should also be noted. Both of these methods were combined in the hybrid approach proposed in [15].</p>
      <p>However, it must be noted that EM and FCM are algorithms that process information in batch form in multi-epoch mode, which makes it impossible to use them in Data Stream Mining tasks. In this regard, recurrent adaptive FCM versions operating in online sequential mode were introduced in [16, 17]. The disadvantages of this approach include the use of the conventional Euclidean metric, which allows the formation of clusters of only spherical shape. Obviously, when the classes have a complex nonconvex form, there may be too many such spheroid clusters, which reduces the speed of the processing.</p>
      <p>In this regard, it is advisable to develop recurrent online algorithms of sequential fuzzy clustering that allow forming data clusters of a more complex form than hyperspheres and, in particular, hyperellipsoidal ones arbitrarily oriented in the feature space of overlapping classes.</p>
      <p>2 Batch Clustering Using Probabilistic and Fuzzy Approaches</p>
      <p>Let the initial data array be given in the form of a dataset of $n$-dimensional observation vectors $x(k) = (x_1(k), \dots, x_i(k), \dots, x_n(k))^T \in \mathbb{R}^n$, where $k = 1, 2, \dots, N$ is the number of the observation in this dataset. As a result of clustering, this dataset should be divided into $m$ ($1 &lt; m &lt; N$) overlapping ellipsoidal classes.</p>
      <p>In the framework of the statistical approach (EM-algorithm), it is assumed that the data are of a random nature and have a Gaussian density distribution:</p>
      <p>$$p_j(x) = \left( (2\pi)^{\frac{n}{2}} \sqrt{\det \Sigma_j} \right)^{-1} \exp\left( -\frac{1}{2}(x - w_j)^T \Sigma_j^{-1} (x - w_j) \right) \qquad (1)$$</p>
      <p>where $w_j$ is the $n$-dimensional centroid vector of the $j$-th cluster, and $\Sigma_j$ is the correlation matrix of the $j$-th cluster of dimension $(n \times n)$:</p>
      <p>$$\Sigma_j = \frac{1}{N} \sum_{k=1}^{N} (x(k) - w_j)(x(k) - w_j)^T. \qquad (2)$$</p>
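As an illustration, the Gaussian cluster density above can be sketched in NumPy (a minimal sketch; the function name is illustrative, not from the paper):

```python
import numpy as np

def gaussian_cluster_density(x, w_j, sigma_j):
    """Gaussian density of the j-th cluster with centroid w_j and
    correlation matrix sigma_j, as in the batch EM formulation."""
    n = x.shape[0]
    diff = x - w_j
    # squared Mahalanobis distance (x - w_j)^T Sigma_j^{-1} (x - w_j)
    d2 = diff @ np.linalg.solve(sigma_j, diff)
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma_j))
    return np.exp(-0.5 * d2) / norm
```

At the centroid itself the density reduces to the normalizing constant, which gives a quick sanity check of the implementation.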
      <p>Obviously, using the Mahalanobis metric instead of the traditional Euclidean one in the FCM algorithm allows restoring classes of hyperellipsoidal form with an arbitrary orientation of the axes in the initial feature space.</p>
      <p>Basing on (1), it is easy to write down the joint distribution density of the observations of the initial dataset in the form</p>
      <p>$$p(x) = \sum_{j=1}^{m} p_j p_j(x) = \sum_{j=1}^{m} p_j \left( (2\pi)^{\frac{n}{2}} \sqrt{\det \Sigma_j} \right)^{-1} \exp\left( -\frac{1}{2} d_M^2(x, w_j) \right) \qquad (3)$$</p>
      <p>where $p_j$ are a priori probabilities that satisfy the standard conditions</p>
      <p>$$\sum_{j=1}^{m} p_j = 1. \qquad (4)$$</p>
      <p>It is easy to see that equation (4) coincides with the constraint on the sum of the memberships in the Fuzzy C-Means algorithm</p>
      <p>$$\sum_{j=1}^{m} \mu_j(k) = 1 \qquad (5)$$</p>
      <p>where $0 &lt; \mu_j(k) &lt; 1$ is the membership level of the $k$-th observation in the $j$-th cluster. Due to the constraints (5), procedures of the FCM type are called fuzzy probabilistic algorithms [10].</p>
      <p>Let us introduce the Mahalanobis distance between the centroids $w_j$ and the vector-patterns $x(k)$ in the form</p>
      <p>$$d_M^2(x(k), w_j) = (x(k) - w_j)^T \Sigma_j^{-1} (x(k) - w_j). \qquad (6)$$</p>
      <p>In the process of clustering using the EM-approach, the maximization of the log-likelihood function</p>
      <p>$$E(x(k), w_j, \Sigma_j, p_j) = \sum_{k=1}^{N} \log\left( \sum_{j=1}^{m} p_j p_j(x(k)) \right) \qquad (7)$$</p>
      <p>is implemented, wherein the final result can be written in the form [13]</p>
      <p>$$p_j(x(k)) = \frac{\exp\left( -\frac{1}{2} d_M^2(x(k), w_j) \right)}{\sum_{l=1}^{m} \exp\left( -\frac{1}{2} d_M^2(x(k), w_l) \right)}, \qquad w_j = \frac{\sum_{k=1}^{N} p_j(x(k)) x(k)}{\sum_{k=1}^{N} p_j(x(k))}. \qquad (8)$$</p>
      <p>It should be noted that when $p_j = m^{-1}$ and the matrices $\Sigma_j^{-1}$ are unity matrices, the EM algorithm coincides with the standard K-means clustering procedure.</p>
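The EM-style responsibilities and centroid re-estimation described above can be sketched as follows (a minimal NumPy illustration; function names are not from the paper):

```python
import numpy as np

def em_responsibilities(X, W, Sigmas):
    """E-step sketch: p_j(x(k)) is proportional to exp(-0.5 * d_M^2(x(k), w_j)),
    normalized over the m clusters."""
    N, m = X.shape[0], W.shape[0]
    D2 = np.empty((N, m))
    for j in range(m):
        diff = X - W[j]                                  # (N, n)
        D2[:, j] = np.sum(diff @ np.linalg.inv(Sigmas[j]) * diff, axis=1)
    P = np.exp(-0.5 * D2)
    P /= P.sum(axis=1, keepdims=True)                    # rows sum to one
    return P

def em_centroids(X, P):
    """M-step sketch: w_j = sum_k p_j(x(k)) x(k) / sum_k p_j(x(k))."""
    return (P.T @ X) / P.sum(axis=0)[:, None]
```

Iterating these two steps over a batch dataset is the multi-epoch processing mode that the online procedures in the next section are designed to avoid.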
      <p>As is known [4,5], the K-means clustering procedure is related to the minimization of the goal function</p>
      <p>$$E(x(k), w_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} \mu_j(k) \left\| x(k) - w_j \right\|^2 = \sum_{k=1}^{N} \sum_{j=1}^{m} \mu_j(k) \, d_E^2(x(k), w_j) \qquad (9)$$</p>
      <p>where</p>
      <p>$$\mu_j(k) = \begin{cases} 1, &amp; \text{if } x(k) \text{ belongs to the } j\text{-th cluster}, \\ 0, &amp; \text{otherwise}. \end{cases} \qquad (10)$$</p>
      <p>In doing so, the use of the Euclidean metric leads to the fact that the emerging clusters have a spherical shape. A modification of the standard K-means is the Mahalanobis K-means procedure [6], associated with the goal function</p>
      <p>$$E(x(k), w_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} \mu_j(k)(x(k) - w_j)^T \Sigma_j^{-1}(x(k) - w_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} \mu_j(k) \, d_M^2(x(k), w_j), \qquad (11)$$</p>
      <p>the minimization of which leads to an estimate of the centroids' position of the form</p>
      <p>$$w_j = \sum_{k=1}^{N} \mu_j(k) x(k) \Big/ \sum_{k=1}^{N} \mu_j(k) = \frac{1}{N_j} \sum_{x(k) \in Cl_j} x(k) \qquad (12)$$</p>
      <p>where $N_j$ is the number of observations of the initial dataset associated with the $j$-th cluster.</p>
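A single batch iteration of the Mahalanobis K-means described above can be sketched as follows (a minimal illustration under the assumption that the per-cluster matrices are given; the function name is not from the paper):

```python
import numpy as np

def mahalanobis_kmeans_step(X, W, Sigmas):
    """One batch iteration: assign each x(k) to the cluster with the smallest
    squared Mahalanobis distance, then recompute each centroid as the mean
    of its assigned observations."""
    N, m = X.shape[0], W.shape[0]
    D2 = np.empty((N, m))
    for j in range(m):
        diff = X - W[j]
        D2[:, j] = np.sum(diff @ np.linalg.inv(Sigmas[j]) * diff, axis=1)
    labels = D2.argmin(axis=1)
    W_new = W.copy()
    for j in range(m):
        members = X[labels == j]
        if len(members):               # keep old centroid for empty clusters
            W_new[j] = members.mean(axis=0)
    return labels, W_new
```

With identity matrices the step reduces to the ordinary (Euclidean) K-means iteration.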
      <p>Crisp goal functions (9), (11) are a partial case of the fuzzy clustering criterion [18]</p>
      <p>$$E(x(k), w_j, \mu_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} \mu_j^{\beta}(k) \, d^2(x(k), w_j) \qquad (13)$$</p>
      <p>where the positive fuzzifier $\beta &gt; 0$ describes the blurring of the clusters; most often (FCM) the value of this parameter is set equal to two, and the Euclidean metric is taken as the distance $d^2(x(k), w_j)$.</p>
      <p>The solution of the problem of minimization of the goal function (13) under the constraints (5) leads to the fuzzy probabilistic clustering algorithm [3]</p>
      <p>$$\begin{cases} \mu_j(k) = d^{\frac{2}{1-\beta}}(x(k), w_j) \Big/ \sum_{l=1}^{m} d^{\frac{2}{1-\beta}}(x(k), w_l), \\ w_j = \sum_{k=1}^{N} \mu_j^{\beta}(k) x(k) \Big/ \sum_{k=1}^{N} \mu_j^{\beta}(k), \end{cases} \qquad (14)$$</p>
      <p>which when $\beta = 2$ turns into the standard FCM procedure:</p>
      <p>$$\begin{cases} \mu_j(k) = d_E^{-2}(x(k), w_j) \Big/ \sum_{l=1}^{m} d_E^{-2}(x(k), w_l), \\ w_j = \sum_{k=1}^{N} \mu_j^{2}(k) x(k) \Big/ \sum_{k=1}^{N} \mu_j^{2}(k). \end{cases} \qquad (15)$$</p>
      <p>The first relation of (15) can be rewritten in the form</p>
      <p>$$\mu_j(k) = \left\| x(k) - w_j \right\|^{-2} \Big/ \sum_{l=1}^{m} \left\| x(k) - w_l \right\|^{-2}. \qquad (16)$$</p>
      <p>The first relation of (14) corresponds to the description of a generalized Gaussian [19], which when $\beta = 2$ turns into the Cauchy probability density function, leading to the expression</p>
      <p>$$\mu_j(k) = \frac{1}{1 + d^2(x(k), w_j)/\gamma_j}, \qquad \gamma_j = \left( \sum_{l=1,\, l \neq j}^{m} d^{-2}(x(k), w_l) \right)^{-1}. \qquad (17)$$</p>
      <p>It is interesting to note here that while the EM algorithm is based on the Gaussian distribution, the fuzzy procedures are connected with the Cauchy distribution.</p>
      <p>It is also interesting to note that the popular Gath-Geva clustering algorithm [20], which minimizes the goal function (13), occupies an intermediate position between the EM and FCM approaches, since the estimate uses as the distance</p>
      <p>$$d_{GG}^2(x(k), w_j) = q_j (\det \Sigma_j)^{-1} \exp\left( -\frac{1}{2}(x - w_j)^T \Sigma_j^{-1}(x - w_j) \right) = q_j (\det \Sigma_j)^{-1} \exp\left( -\frac{1}{2} d_M^2(x, w_j) \right) \qquad (18)$$</p>
      <p>where</p>
      <p>$$q_j = \sum_{k=1}^{N} \mu_j^{\beta}(k) \Big/ \sum_{k=1}^{N} \sum_{l=1}^{m} \mu_l^{\beta}(k). \qquad (19)$$</p>
      <p>As a result of the minimization of (18) under the constraints (5), (19), we come to the algorithm</p>
      <p>$$\begin{cases} \mu_j(k) = d_{GG}^{\frac{2}{1-\beta}}(x(k), w_j) \Big/ \sum_{l=1}^{m} d_{GG}^{\frac{2}{1-\beta}}(x(k), w_l), \\ w_j = \sum_{k=1}^{N} \mu_j^{\beta}(k) x(k) \Big/ \sum_{k=1}^{N} \mu_j^{\beta}(k), \\ \Sigma_j = \sum_{k=1}^{N} \mu_j^{\beta}(k)(x(k) - w_j)(x(k) - w_j)^T \Big/ \sum_{k=1}^{N} \mu_j^{\beta}(k), \end{cases} \qquad (20)$$</p>
      <p>which when $\beta = 2$ takes the form</p>
      <p>$$\begin{cases} \mu_j(k) = d_{GG}^{-2}(x(k), w_j) \Big/ \sum_{l=1}^{m} d_{GG}^{-2}(x(k), w_l), \\ w_j = \sum_{k=1}^{N} \mu_j^{2}(k) x(k) \Big/ \sum_{k=1}^{N} \mu_j^{2}(k), \\ \Sigma_j = \sum_{k=1}^{N} \mu_j^{2}(k)(x(k) - w_j)(x(k) - w_j)^T \Big/ \sum_{k=1}^{N} \mu_j^{2}(k), \end{cases} \qquad (21)$$</p>
      <p>which is the FCM modification for the case of hyperellipsoidal clusters.</p>
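The per-cluster weights used in the Gath-Geva distance above have a simple form, sketched here for clarity (the function name is illustrative):

```python
import numpy as np

def cluster_weights(U, beta=2.0):
    """q_j = sum_k mu_j^beta(k) / sum_k sum_l mu_l^beta(k): the share of the
    total powered membership mass captured by each cluster."""
    Ub = U ** beta
    return Ub.sum(axis=0) / Ub.sum()
```

Since every observation's memberships sum to one, the weights themselves form a discrete distribution over the m clusters.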
      <p>3 Online Fuzzy Probabilistic Clustering in the Case of Hyperellipsoidal Classes</p>
      <p>The procedures discussed above assume that the initial dataset is specified in the form of a batch of data, which is processed several times in multi-epoch learning mode. It is clear that if the information is fed for processing in the form of a data stream $x(1), x(2), \dots, x(k), x(k+1), \dots$ (here $k$ is the index of the current discrete time), the clustering methods discussed above are ineffective.</p>
      <p>As is known, T. Kohonen's self-organizing map solves the clustering problem in sequential mode by minimizing the goal function (9), i.e., in fact, it implements the K-means method for a data flow. The popular WTA self-learning rule, which looks like [21]</p>
      <p>$$w_j(k+1) = \begin{cases} w_j(k) + \eta(k+1)(x(k+1) - w_j(k)) &amp; \text{if } w_j(k) \text{ is the winner}, \\ w_j(k) &amp; \text{otherwise} \end{cases} \qquad (22)$$</p>
      <p>(here $0 &lt; \eta(k+1) &lt; 1$ is the learning rate parameter, chosen according to the stochastic approximation conditions), actually computes the usual arithmetic mean in recurrent form.</p>
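The WTA self-learning step can be sketched in a few lines (a minimal illustration; the function name is not from the paper):

```python
import numpy as np

def wta_update(W, x, eta):
    """Kohonen WTA self-learning step: only the winner (the centroid closest
    to x in the Euclidean sense) is moved toward x by the learning rate eta."""
    j = np.argmin(((W - x) ** 2).sum(axis=1))   # competition (E-step analogue)
    W = W.copy()
    W[j] += eta * (x - W[j])                    # adaptation (M-step analogue)
    return W, j
```

If `eta` is chosen as the reciprocal of the winner's number of "wins", the rule reproduces the recurrent arithmetic mean of the inputs assigned to that neuron.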
      <p>The self-learning rule (22) is closely related to the EM algorithm, since the E-step of expectation implements Kohonen's competition process, and the M-step of maximization implements the synaptic adaptation process. With the distance</p>
      <p>$$d_E^2(x(k+1), w_j(k)) = \left\| x(k+1) - w_j(k) \right\|^2 \qquad (23)$$</p>
      <p>its minimization by the gradient procedure (24) in fact coincides with (22):</p>
      <p>$$w_j(k+1) = \begin{cases} w_j(k) - \eta(k+1) \nabla_{w_j} d_E^2(x(k+1), w_j(k)) &amp; \text{if } w_j(k) \text{ is the winner}, \\ w_j(k) &amp; \text{otherwise}. \end{cases} \qquad (24)$$</p>
      <p>Similarly to (24), the Mahalanobis metric (6) can be minimized using the recurrent procedure [22]:</p>
      <p>$$w_j(k+1) = \begin{cases} w_j(k) - \eta(k+1) \nabla_{w_j} d_M^2(x(k+1), w_j(k)) &amp; \text{if } w_j(k) \text{ is the winner}, \\ w_j(k) &amp; \text{otherwise}, \end{cases} \qquad (25)$$</p>
      <p>or, which is the same, with the learning rate chosen as $\eta(k+1) = k_j^{-1}$, where $k_j$ is the number of "wins" of the $j$-th neuron of Kohonen's map.</p>
      <p>For fuzzy situations, where the clusters being formed mutually overlap, in addition to the procedure (26) the membership level can be estimated similarly to (15):</p>
      <p>$$\mu_j(k) = d_M^{-2}(x(k), w_j(k)) \Big/ \sum_{l=1}^{m} d_M^{-2}(x(k), w_l(k)) = \left( (x(k) - w_j(k))^T \Sigma_j^{-1}(k)(x(k) - w_j(k)) \right)^{-1} \Big/ \sum_{l=1}^{m} \left( (x(k) - w_l(k))^T \Sigma_l^{-1}(k)(x(k) - w_l(k)) \right)^{-1}. \qquad (27)$$</p>
      <p>Next, solving the nonlinear programming problem (the goal function (13) with the constraints (5)) using the Arrow-Hurwicz-Uzawa algorithm, it is easy to write down the relations</p>
      <p>$$\begin{cases} w_j(k+1) = w_j(k) + \eta(k+1)\mu_j^{\beta}(k+1)(x(k+1) - w_j(k)), \\ \mu_j(k+1) = d_E^{\frac{2}{1-\beta}}(x(k+1), w_j(k)) \Big/ \sum_{l=1}^{m} d_E^{\frac{2}{1-\beta}}(x(k+1), w_l(k)), \end{cases} \qquad (28)$$</p>
      <p>which when $\beta = 2$ take the simple form [16]:</p>
      <p>$$\begin{cases} w_j(k+1) = w_j(k) + \eta(k+1)\mu_j^{2}(k+1)(x(k+1) - w_j(k)), \\ \mu_j(k+1) = \left\| x(k+1) - w_j(k) \right\|^{-2} \Big/ \sum_{l=1}^{m} \left\| x(k+1) - w_l(k) \right\|^{-2}. \end{cases} \qquad (29)$$</p>
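The online fuzzy update with fuzzifier two can be sketched as follows (a minimal illustration; the small `eps` guard is an implementation detail, not part of the paper):

```python
import numpy as np

def online_fcm_update(W, x, eta, eps=1e-12):
    """Online fuzzy clustering step for beta = 2:
    mu_j = d^-2(x, w_j) / sum_l d^-2(x, w_l),
    w_j(k+1) = w_j(k) + eta * mu_j^2 * (x - w_j(k)),
    i.e. every centroid is adapted, weighted by its squared membership."""
    d2 = ((W - x) ** 2).sum(axis=1) + eps
    inv = 1.0 / d2
    mu = inv / inv.sum()
    W_new = W + eta * (mu ** 2)[:, None] * (x - W)
    return W_new, mu
```

Unlike the crisp WTA rule, here the nearest centroid receives the largest correction but the remaining centroids are also pulled slightly toward the observation.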
      <p>Here, the multipliers $\mu_j^{\beta}(k+1)$, $\mu_j^{2}(k+1)$ are in fact neighbourhood functions in Kohonen's "Winner Takes More" self-learning rule, wherein generalized Gaussian and Cauchy functions are used instead of the usual Gaussian. It is interesting to note that the parameters of the receptive fields of these functions are evaluated here automatically.</p>
      <p>It should be noted that a recurrent modification of the Gath-Geva algorithm [20] was introduced in [23], but this modification is not related to optimization procedures. It can be written in the form of the recurrence relations</p>
      <p>$$\begin{cases} w_j(k+1) = w_j(k) + \eta(k+1)(x(k+1) - w_j(k)), &amp; \text{if } w_j(k) \text{ is the winner}, \\ \Sigma_j(k+1) = (1 - \eta(k+1))\Sigma_j(k) + \eta(k+1)(x(k+1) - w_j(k))(x(k+1) - w_j(k))^T. \end{cases} \qquad (30)$$</p>
      <p>It can be noticed that relations (30) are T. Kohonen's WTA self-learning rule together with a procedure for correcting the correlation matrix. It is interesting to note that this matrix does not influence the process of centroid tuning.</p>
      <p>A more effective algorithm was also proposed in [23], where a correction is made to the membership levels of the form</p>
      <p>$$U_j(k+1) = \sum_{\tau=1}^{k+1} \mu_j^{\beta}(\tau) = U_j(k) + \mu_j^{\beta}(k+1). \qquad (31)$$</p>
      <p>In this case, the algorithm has the form</p>
      <p>$$\begin{cases} w_j(k+1) = w_j(k) + \dfrac{\mu_j^{\beta}(k+1)}{U_j(k+1)}(x(k+1) - w_j(k)), \\ \Sigma_j(k+1) = \dfrac{U_j(k)}{U_j(k+1)} \left( \Sigma_j(k) + \dfrac{\mu_j^{\beta}(k+1)}{U_j(k+1)}(x(k+1) - w_j(k))(x(k+1) - w_j(k))^T \right), \\ \mu_j(k+1) = d_{GG}^{\frac{2}{1-\beta}}(x(k+1), w_j(k)) \Big/ \sum_{l=1}^{m} d_{GG}^{\frac{2}{1-\beta}}(x(k+1), w_l(k)). \end{cases} \qquad (32)$$</p>
      <p>Algorithm (32) is close to procedure (28) and coincides with it when $\eta(k+1) = U_j^{-1}(k+1)$; however, the metric $d_{GG}^2(x(k+1), w_j(k))$ used here differs significantly from the previously used $d_E^2(x(k+1), w_j(k))$. Once again, in this algorithm the correlation matrix $\Sigma_j(k)$ does not affect the process of centroid tuning.</p>
      <p>Combining procedure (25), which uses the gradient of the Mahalanobis metric, with the standard Gath-Geva algorithm, we obtain the relations describing the fuzzy clustering method with hyperellipsoidal classes:</p>
      <p>$$\begin{cases} U_j(k+1) = U_j(k) + \mu_j^{\beta}(k+1), \\ w_j(k+1) = w_j(k) + \dfrac{\mu_j^{\beta}(k+1)}{U_j(k+1)} \Sigma_j^{-1}(x(k+1) - w_j(k)), \\ \Sigma_j(k+1) = \dfrac{U_j(k)}{U_j(k+1)} \left( \Sigma_j(k) + \dfrac{\mu_j^{\beta}(k+1)}{U_j(k+1)}(x(k+1) - w_j(k))(x(k+1) - w_j(k))^T \right), \\ \mu_j(k+1) = d_{GG}^{\frac{2}{1-\beta}}(x(k+1), w_j(k)) \Big/ \sum_{l=1}^{m} d_{GG}^{\frac{2}{1-\beta}}(x(k+1), w_l(k)). \end{cases} \qquad (33)$$</p>
      <p>Using the fuzzifier $\beta = 2$, we get the adaptive recurrent procedure [24]:</p>
      <p>$$\begin{cases} U_j(k+1) = U_j(k) + \mu_j^{2}(k+1), \\ w_j(k+1) = w_j(k) + \dfrac{\mu_j^{2}(k+1)}{U_j(k+1)} \Sigma_j^{-1}(x(k+1) - w_j(k)), \\ \Sigma_j(k+1) = \dfrac{U_j(k)}{U_j(k+1)} \left( \Sigma_j(k) + \dfrac{\mu_j^{2}(k+1)}{U_j(k+1)}(x(k+1) - w_j(k))(x(k+1) - w_j(k))^T \right), \\ \mu_j(k+1) = d_{GG}^{-2}(x(k+1), w_j(k)) \Big/ \sum_{l=1}^{m} d_{GG}^{-2}(x(k+1), w_l(k)). \end{cases} \qquad (34)$$</p>
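The recurrent structure of the adaptive procedure above can be sketched as follows. This is a minimal illustration, not the paper's exact method: for simplicity the membership is computed from inverse squared Mahalanobis distances (as in the earlier online membership estimate) rather than from the Gath-Geva metric, and the function and variable names are illustrative:

```python
import numpy as np

def online_hyperellipsoidal_update(x, W, Sigmas, U_acc, eps=1e-12):
    """One step of online fuzzy clustering with hyperellipsoidal clusters:
    mu_j from inverse squared Mahalanobis distances (simplification),
    then the recurrent updates
      U_j(k+1)     = U_j(k) + mu_j^2,
      w_j(k+1)     = w_j(k) + (mu_j^2 / U_j(k+1)) * Sigma_j^{-1} (x - w_j(k)),
      Sigma_j(k+1) = (U_j(k)/U_j(k+1)) * (Sigma_j(k)
                     + (mu_j^2 / U_j(k+1)) * (x - w_j)(x - w_j)^T).
    W, Sigmas and U_acc are modified in place; mu is returned."""
    m = W.shape[0]
    d2 = np.empty(m)
    for j in range(m):
        diff = x - W[j]
        d2[j] = diff @ np.linalg.solve(Sigmas[j], diff) + eps
    inv = 1.0 / d2
    mu = inv / inv.sum()
    for j in range(m):
        diff = x - W[j]
        U_prev = U_acc[j]
        U_acc[j] = U_prev + mu[j] ** 2
        gain = mu[j] ** 2 / U_acc[j]
        W[j] = W[j] + gain * np.linalg.solve(Sigmas[j], diff)
        Sigmas[j] = (U_prev / U_acc[j]) * (Sigmas[j] + gain * np.outer(diff, diff))
    return mu
```

Each incoming observation is processed once, so the procedure suits data-stream settings where re-scanning the dataset between observations is impossible.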
      <p>4 Computer Experiments with Medical Data</p>
      <p>As a demonstration of the developed Online Fuzzy Probabilistic Clustering method, two medical samples from the UCI repository were used. The first series of experiments was performed on the Breast Cancer Wisconsin dataset, which is related to an applied problem in the medical field. The data describe two types of tumors, malignant and benign. Clustering results are represented as a 3D model.</p>
      <p>The structure of the Breast Cancer Wisconsin dataset is complex and non-linear. The maximum classification accuracy reached 97.5 percent. Figs. 1, 2, and 3 show the visualization of clustering results in three different projections. For visualization, the dataset was previously compressed into three principal components with the Principal Component Analysis (PCA) method.</p>
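The PCA compression used for the 3D visualization can be sketched via the singular value decomposition (a minimal illustration; the function name is not from the paper):

```python
import numpy as np

def pca_compress(X, n_components=3):
    """Project the dataset onto its first principal components via SVD,
    as used here only to visualize the clustering results in 3D."""
    Xc = X - X.mean(axis=0)            # center the features
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T    # scores in the principal subspace
```

The returned scores have non-increasing variances along the component axes, so the first three components retain as much variance as any three-dimensional linear projection can.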
      <p>The received results were compared with the standard EM-algorithm and J. Bezdek's Fuzzy C-means method. The results of the experiment can be seen in Table 1. The first row of the table presents the outputs of the EM procedure, the second row the FCM one, and the third the proposed approach. It can be seen that the proposed procedure surpasses both the EM and FCM algorithms in the quality of results.</p>
      <p>For each cell nucleus, ten real-valued features are calculated that form the columns of the sample: radius (average distance from the center to the points along the perimeter); texture (standard deviation of gray values); perimeter; area; smoothness; compactness; concavity (the severity of the concave parts of the contour); the number of concave points (the number of concave parts on the contour of the tumor); symmetry; fractal dimension.</p>
      <p>The obtained accuracy ranged from 89% to 97.5%, which means that from 490 to 547 of the 569 examples were classified correctly.</p>
      <p>The second series of experiments was carried out on the Dermatology sample, which contains data obtained from a study of the histopathological features of patients with dermatological diseases. The goal is to determine the type of dermatological disease.</p>
      <p>The dataset has 33 attributes, one of which is the target attribute showing the presence of a specific dermatological disease; there are 6 classes: 1st class (psoriasis) – 112 examples, 2nd class (seborrheic dermatitis) – 61 examples, 3rd class (lichen) – 72 examples, 4th class (pink lichen) – 49 examples, 5th class (chronic dermatitis) – 52 examples, 6th class (lichen planus) – 20 examples.</p>
      <p>This sample is quite popular for testing various algorithms for medical data analytics, since it has several linearly inseparable clusters that are difficult to process with standard clustering algorithms.</p>
      <p>The final results of the clustering by the developed method are presented in Figs. 4 and 5. As can be seen there, some examples of one cluster lie very close to another cluster, which suggests that some clusters are not linearly separable.</p>
      <p>The series of experiments indicates that the proposed Online Fuzzy Probabilistic Clustering method works quite quickly and shows high quality of clustering of data arrays under conditions of non-linear data. The experiments also show that the proposed method can solve practical problems in the field of medical data processing.</p>
      <p>The problem of online fuzzy clustering of data that are fed for processing sequentially in the form of a data stream was considered. A feature of the approach under consideration is that the formed classes have a hyperellipsoidal shape with an arbitrary orientation of the axes in feature space. The proposed procedure uses the Mahalanobis metric, is a generalization of a number of well-known clustering algorithms, and has high speed and simple computational implementation. The obtained results allow solving a number of problems arising in Data Mining, and especially in Medical Data Mining, connected with the mass examination of patients.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Aggarwal, C.C.
          <article-title>: Data Mining</article-title>
          . Cham, Switzerland: Springer Int. Publ.
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bramer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <source>Principles of Data Mining</source>
          . Springer-Verlag London (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Höppner</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klawonn</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kruse</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Runkler</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition</article-title>
          . John Wiley &amp; Sons. Chichester (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Gan</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , Ma, Ch.,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Data Clustering: Theory, Algorithms</article-title>
          and Applications, Philadelphia: SIAM (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wunsch</surname>
            ,
            <given-names>D. C.</given-names>
          </string-name>
          :
          <source>Clustering</source>
          . IEEE Press Series on Computational Intelligence. Hoboken, NJ: John Wiley &amp; Sons, Inc. (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reddy</surname>
            ,
            <given-names>C. K.</given-names>
          </string-name>
          :
          <article-title>Data Clustering. Algorithms and Application</article-title>
          . Boca Raton: CRC Press (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bifet</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams</article-title>
          , IOS Press (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kacprzyk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedrycz</surname>
          </string-name>
          , W.: Springer Handbook of Computational Intelligence, Berlin Heidelberg: Springer, Verlag (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Du</surname>
          </string-name>
          , K.-L.,
          <string-name>
            <surname>Swamy</surname>
            ,
            <given-names>M. N. S.</given-names>
          </string-name>
          :
          <source>Neural Networks and Statistical Learning. London: SpringerVerlag</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bezdek</surname>
          </string-name>
          , J.-C.:
          <article-title>Pattern Recognition with Fuzzy Objective Function Algorithms</article-title>
          , N.Y.: Plenum Press (
          <year>1981</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Bodyanskiy</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ye</surname>
          </string-name>
          . V.,
          <string-name>
            <surname>Deineko</surname>
            ,
            <given-names>A. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutsenko</surname>
            ,
            <given-names>Y. V.</given-names>
          </string-name>
          :
          <article-title>On-line kernel clustering based on the general regression neural network and T. Kohonen's self-organizing map</article-title>
          ,
          <source>Automatic Control and Computer Sciences</source>
          <volume>51</volume>
          (
          <issue>1</issue>
          ),
          <fpage>55</fpage>
          -
          <lpage>62</lpage>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Bezdek</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          , Keller, J.,
          <string-name>
            <surname>Krishnapuram</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pal</surname>
          </string-name>
          , N.:
          <article-title>Fuzzy Models and Algorithms for Pattern Recognition</article-title>
          and Image Processing.
          <source>The Handbook of Fuzzy Sets</source>
          . Kluwer, Dordrecht, Netherlands: Springer, vol.
          <volume>4</volume>
          (
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Dempster</surname>
            ,
            <given-names>A. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laird</surname>
            ,
            <given-names>N. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>D. B.</given-names>
          </string-name>
          :
          <article-title>Maximum likelihood from incomplete data via the EM algorithm</article-title>
          ,
          <source>J. of the Royal Statistical Society, Ser.B</source>
          <volume>39</volume>
          (
          <issue>1</issue>
), pp.
          <fpage>1</fpage>
          -
          <lpage>38</lpage>
          (
          <year>1977</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
<string-name>
            <surname>Hathaway</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Another interpretation of the EM algorithm for mixture distributions</article-title>
          ,
          <source>J. of Statistics &amp; Probability Letters</source>
          , vol.
          <volume>4</volume>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>56</lpage>
          (
          <year>1986</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Meng</surname>
            ,
            <given-names>X. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rubin</surname>
            ,
            <given-names>D. B.</given-names>
          </string-name>
          :
          <article-title>Maximum likelihood estimation via the ECM algorithm:a general framework</article-title>
          ,
<source>Biometrika</source>
          , vol.
          <volume>80</volume>
, pp.
          <fpage>267</fpage>
          -
          <lpage>278</lpage>
          (
          <year>1993</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
<string-name>
            <surname>Bodyanskiy</surname>
            ,
            <given-names>Ye.</given-names>
          </string-name>
          :
          <article-title>Computational intelligence techniques for data analysis</article-title>
          ,
          <source>Lecture Notes in Informatics, Bonn: GI</source>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>36</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
<string-name>
            <surname>Gorshkov</surname>
            ,
            <given-names>Ye.</given-names>
          </string-name>
          ,
          <string-name>
<surname>Kolodyazhniy</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
<string-name>
            <surname>Bodyanskiy</surname>
            ,
            <given-names>Ye.</given-names>
          </string-name>
          :
          <article-title>New recursive learning algorithms for fuzzy Kohonen clustering network</article-title>
          ,
          <source>Proc. 17th Int. Workshop on Nonlinear Dynamics of Electronic Systems</source>
          , Rapperswil, Switzerland, pp.
          <fpage>58</fpage>
          -
          <lpage>61</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mumford</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jain</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
:
          <source>Computational Intelligence: Collaboration, Fusion and Emergence</source>
          . Berlin: Springer-Verlag (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Osowski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Sieci neuronowe do przetwarzania informacji</article-title>
, Warszawa: Oficyna Wydawnicza Politechniki Warszawskiej (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Gath</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>A. B.</given-names>
          </string-name>
          :
          <article-title>Unsupervised optimal fuzzy clustering</article-title>
          ,
<source>IEEE Trans. on Pattern Analysis and Machine Intelligence</source>
<volume>11</volume>
          (
          <issue>7</issue>
          ), pp.
          <fpage>773</fpage>
          -
          <lpage>787</lpage>
          (
          <year>1989</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Kohonen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
:
          <source>Self-Organizing Maps</source>
          . Berlin: Springer-Verlag (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
<string-name>
            <surname>Bodyanskiy</surname>
            ,
            <given-names>Ye.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deineko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kutsenko</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zayika</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
<article-title>Data streams fast EM-fuzzy clustering based on Kohonen's self-learning</article-title>
          ,
          <source>The 1st IEEE International Conference on Data Stream Mining &amp; Processing (DSMP 2016): Proc. of Int. Conf., Lviv</source>
          , pp.
          <fpage>309</fpage>
          -
          <lpage>313</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Geva</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          :
          <article-title>Clustering as a basis for evolving neuro-fuzzy modeling</article-title>
          ,
          <source>Evolving Systems</source>
          , pp.
          <fpage>59</fpage>
          -
          <lpage>71</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Deineko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhernova</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordon</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zayika</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pliss</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pabyrivska</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Data stream online clustering based on fuzzy expectation-maximization approach</article-title>
          .
          <source>The 2nd IEEE International Conference on Data Stream Mining and Processing (DSMP 2018): Proc. of Int. Conf., Lviv</source>
, pp.
          <fpage>171</fpage>
          -
          <lpage>176</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>