Online Image Segmentation Using Credibilistic Fuzzy Clustering

Yevgeniy Bodyanskiy1, Alina Shafronenko1, Diana Rudenko1, Anton Polubiekhin1, Dmytro Frolov1
1 Kharkiv National University of Radio Electronics, Nauky Ave. 14, Kharkiv, 61166, Ukraine

Abstract
Computational intelligence methods are widely used to solve many complex problems, including, of course, the traditional tasks of Data Mining as well as such newer directions as Dynamic Data Mining, Data Stream Mining, Big Data Mining, Web Mining, Text Mining, etc. This paper proposes new adaptive online methods of robust fuzzy clustering-segmentation of data streams based on probabilistic, possibilistic, and credibilistic approaches. The proposed approach makes it possible to solve the clustering task in online mode, when data are fed for processing sequentially, possibly in real time.

Keywords
Segmentation, clustering, data stream, credibilistic approach, fuzzy image segmentation

1. Introduction

The current state of technological development is inextricably linked with the development of computerized tools, which, in turn, depend on the underlying mathematical apparatus and the practical algorithms that use it. The development of computing tools, in particular hardware, acts as a catalyst for the development of existing scientific fields and the emergence of new ones, such as Data Science. Modern computing environments allow the implementation of algorithmically complex methods that form the basis of intelligent data analysis, and this, in turn, should spur the development of new hardware and software systems based on the theoretical principles of artificial intelligence. Recently, in tasks of analyzing and processing non-stationary signals of arbitrary nature under conditions of uncertainty, computational intelligence methods are increasingly being used, among which hybrid neural networks can be singled out.
By the task of data segmentation we understand the division of a data sample into homogeneous, homomorphic segments based on the analysis of changes in the internal properties of the data. Currently, several segmentation methods are known, namely those using wavelet analysis [1], fractal-wavelet technologies [2], neuro-fuzzy technologies [3-5], etc. Depending on the specifics of the problem being solved, two main types of forecasting and segmentation methods can be applied: real-time and batch. Many neural network architectures, including hybrid structures, are used to solve this kind of problem, but these systems are either cumbersome in their architecture or insufficiently adapted for real-time learning. In most cases, the activation functions of such networks are sigmoidal functions, splines, polynomials, and radial basis functions.

COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024, Lviv, Ukraine
yevgeniy.bodyanskiy@nure.ua (Ye. Bodyanskiy); alina.shafronenko@nure.ua (A. Shafronenko); diana.rudenko@nure.ua (D. Rudenko); anton.polubiekhin@nure.ua (A. Polubiekhin); dmytro.frolov@nure.ua (D. Frolov)
0000-0001-5418-2143 (Ye. Bodyanskiy); 0000-0002-8040-0279 (A. Shafronenko); 0000-0002-1792-5080 (D. Rudenko); 0009-0002-7492-3900 (A. Polubiekhin); 0009-0009-3291-3561 (D. Frolov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. Credibilistic fuzzy clustering

Traditionally, the initial information for the clustering problem is a sample of observations consisting of N n-dimensional feature vectors

$X = \{x(1), x(2), \dots, x(k), \dots, x(N)\}$, $x(k) = (x_1(k), \dots, x_n(k))^T \in R^n$, $k = 1, 2, \dots, N$,

and the result of the algorithm is the partition of the initial data set into m classes with a certain membership level $w_j(k)$ of the k-th feature vector in the j-th cluster. At the same time, there is a wide class of problems in which the initial information arrives not in vector but in matrix form, i.e. $x(k) = \{x_{i_1 i_2}(k)\}$, where $i_1 = 1, 2, \dots, n_1$; $i_2 = 1, 2, \dots, n_2$; $k = 1, 2, \dots, N$. Such a situation is characteristic, for example, of image processing [6], when the initial $(N_1 \times N_2)$-matrix is divided into $N = N_1 N_2 (n_1 n_2)^{-1}$ fragment matrices of size $(n_1 \times n_2)$ that are subject to clustering, as a result of which segments of the image that are homogeneous in some sense are formed. Traditionally, this problem is solved by preliminary vectorization of the fragments and the use of already known procedures, the most popular of which is the fuzzy C-means (FCM) clustering method [6, 7]. To process matrix data directly, it is necessary to introduce matrix methods of clustering-segmentation, for which it is advisable to consider the matrix fuzzy C-means method, which is a generalization of FCM. This method avoids unnecessary vectorization-devectorization operations when processing data given in the form of two-dimensional arrays and provides information processing in online mode.

So, let the sample of observations be given as

$x(k) = \{x_{i_1 i_2}(k)\} \in R^{n_1 \times n_2}$, $k = 1, 2, \dots, N$;

for the convenience of further processing, these data are pre-centered relative to the average

$\bar{x} = \frac{1}{N} \sum_{k=1}^{N} x(k)$   (1)

and normalized by the spherical (Frobenius) norm

$\|x(k)\| = \sqrt{Tr(x(k) x^T(k))}$.
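For illustration, the pre-processing steps (1)-(2) can be sketched in NumPy as follows; the function name `preprocess` and the `(N, n1, n2)` array layout are our own conventions, not part of the paper:

```python
import numpy as np

def preprocess(X):
    """Center matrix observations by the sample mean (1) and scale
    each centered matrix by its Frobenius norm (2).

    X : array of shape (N, n1, n2) -- N matrix-valued observations.
    """
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)                              # (1): sample mean matrix
    Xc = X - x_bar                                      # centering
    norms = np.sqrt(np.einsum('kab,kab->k', Xc, Xc))    # (2): sqrt(Tr(x x^T))
    norms[norms == 0] = 1.0                             # guard for zero matrices
    return Xc / norms[:, None, None]
```

After this step every observation is a unit-Frobenius-norm matrix, which keeps the trace-based distances used below on a common scale.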
The matrix probabilistic criterion is used as the objective function of clustering:

$E(w_j(k), c_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} w_j^{\beta}(k) D^2(x(k), c_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} w_j^{\beta}(k) \, Tr((x(k) - c_j)(x(k) - c_j)^T)$   (3)

in the presence of the constraints

$\sum_{j=1}^{m} w_j(k) = 1$, i.e. $\sum_{j=1}^{m} w_j(k) - 1 = 0$, $k = 1, 2, \dots, N$; $0 < \sum_{k=1}^{N} w_j(k) < N$, $j = 1, 2, \dots, m$.

Introducing the Lagrange function

$L(w_j(k), c_j, \lambda(k)) = \sum_{k=1}^{N} \sum_{j=1}^{m} w_j^{\beta}(k) D^2(x(k), c_j) + \sum_{k=1}^{N} \lambda(k) \Big( \sum_{j=1}^{m} w_j(k) - 1 \Big) = \sum_{k=1}^{N} \Big( \sum_{j=1}^{m} w_j^{\beta}(k) D^2(x(k), c_j) + \lambda(k) \Big( \sum_{j=1}^{m} w_j(k) - 1 \Big) \Big)$,   (4)

where $\lambda(k)$ is the undetermined Lagrange multiplier, and solving the system of Kuhn-Tucker equations

$\frac{\partial L(w_j(k), c_j, \lambda(k))}{\partial w_j(k)} = \beta w_j^{\beta-1}(k) D^2(x(k), c_j) + \lambda(k) = 0$;

$\frac{\partial L(w_j(k), c_j, \lambda(k))}{\partial \lambda(k)} = \sum_{j=1}^{m} w_j(k) - 1 = 0$;

$\frac{\partial L(w_j(k), c_j, \lambda(k))}{\partial c_j} = -2 \sum_{k=1}^{N} w_j^{\beta}(k)(x(k) - c_j) = O$,

where $\partial L / \partial c_j$ is the $(n_1 \times n_2)$-matrix formed from the partial derivatives $\partial L / \partial c_{j i_1 i_2}$ and O is the matrix of the same dimension formed by zeros, we arrive at the final form of the algorithm:

$w_j(k) = \frac{(D^2(x(k), c_j))^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} (D^2(x(k), c_l))^{\frac{1}{1-\beta}}}$;

$\lambda(k) = -\Big( \sum_{l=1}^{m} (\beta D^2(x(k), c_l))^{\frac{1}{1-\beta}} \Big)^{1-\beta}$;   (5)

$c_j = \frac{\sum_{k=1}^{N} w_j^{\beta}(k) x(k)}{\sum_{k=1}^{N} w_j^{\beta}(k)}$.

The resulting system gives rise to a wide class of clustering procedures. Thus, if we set β = 2, we obtain a simple and effective matrix clustering algorithm [8], which is a generalization of the popular procedure of J. Bezdek [6]:

$w_j(k) = \frac{(Tr((x(k) - c_j)(x(k) - c_j)^T))^{-1}}{\sum_{l=1}^{m} (Tr((x(k) - c_l)(x(k) - c_l)^T))^{-1}}$;   (6)

$c_j = \frac{\sum_{k=1}^{N} w_j^2(k) x(k)}{\sum_{k=1}^{N} w_j^2(k)}$,

where Tr is the matrix trace symbol.
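A minimal NumPy sketch of the batch matrix FCM procedure (6) for β = 2 might look as follows (function name, initialization, and stopping rule are illustrative assumptions):

```python
import numpy as np

def matrix_fcm(X, m, n_iter=100, tol=1e-6, C0=None, seed=0):
    """Batch matrix fuzzy C-means with fuzzifier beta = 2 (Eq. (6)).
    Distances are squared Frobenius norms Tr((x - c)(x - c)^T).

    X : (N, n1, n2) matrix observations; m : number of clusters.
    Returns the membership matrix W (N, m) and prototypes C (m, n1, n2).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    C = (np.asarray(C0, dtype=float) if C0 is not None
         else X[rng.choice(len(X), m, replace=False)])    # init from data
    for _ in range(n_iter):
        diff = X[:, None] - C[None]                       # (N, m, n1, n2)
        D2 = np.einsum('kjab,kjab->kj', diff, diff) + 1e-12
        W = 1.0 / D2
        W /= W.sum(axis=1, keepdims=True)                 # memberships sum to 1
        C_new = (np.einsum('kj,kab->jab', W**2, X)
                 / (W**2).sum(axis=0)[:, None, None])     # prototype update
        if np.max(np.abs(C_new - C)) < tol:
            C = C_new
            break
        C = C_new
    return W, C
```

Note that no vectorization of the fragments is needed: the trace distance is computed directly on the matrix observations.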
The main difference between the probabilistic and possibilistic approaches is that probabilistic algorithms use relative similarities between objects and clusters, while possibilistic algorithms use absolute similarities. Instead of the fuzzy partition matrix of the fuzzy C-means algorithm, the possibilistic C-means algorithm uses an $(N \times m)$-matrix of possibilities, or typicality matrix, $T = \{t_j(k)\}$, where $t_j(k) \in [0, 1]$ is the possibility that the object x(k) belongs to cluster j. The possibilistic matrix has only the following limitation:

$0 < \sum_{j=1}^{m} t_j(k) \le m$, $k = 1, 2, \dots, N$.   (7)

This means that an object can have a typicality vector that contains only values close to zero (usually such objects are considered noise) or only ones. Krishnapuram, Keller et al. proposed the possibilistic C-means (PCM) algorithm and two algorithms that combine the probabilistic and possibilistic approaches: the fuzzy-possibilistic C-means algorithm (FPCM) and the possibilistic-fuzzy C-means algorithm (PFCM) [9-11]. In the PCM algorithm, formula (6) was replaced by the expression

$t_j(k) = \Big( 1 + \Big( \frac{Tr((x(k) - c_j)(x(k) - c_j)^T)}{\gamma_j} \Big)^{\frac{1}{\beta-1}} \Big)^{-1}$;   (8)

$c_j = \frac{\sum_{k=1}^{N} t_j^{\beta}(k) x(k)}{\sum_{k=1}^{N} t_j^{\beta}(k)}$,

where $\gamma_j > 0$ is a constant determined empirically. It can be seen that the calculation of the cluster prototype in formulas (6) and (8) is identical, with the only difference that the fuzzy partition matrix is replaced by the possibility matrix. The calculation of the possibility of an object belonging to a cluster in formula (8) can be interpreted as a bell-shaped function of the distance, presented in Figure 1.

Figure 1: A bell-shaped function showing the dependence of membership on distance

Keller and Krishnapuram suggested choosing the parameter $\gamma_j$ in the form

$\gamma_j = K \frac{\sum_{k=1}^{N} w_j^{\beta}(k) \, Tr((x(k) - c_j)(x(k) - c_j)^T)}{\sum_{k=1}^{N} w_j^{\beta}(k)}$,   (9)

where K > 0 (most often K = 1).
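The PCM updates (8) and the scale estimate (9) can be sketched as follows (a non-authoritative illustration; function names and the array layout are assumptions):

```python
import numpy as np

def pcm_step(X, C, gamma, beta=2.0):
    """One PCM update (Eq. (8)): typicalities t_j(k) from *absolute*
    distances, then prototypes from the typicality-weighted mean."""
    diff = X[:, None] - C[None]                       # (N, m, n1, n2)
    D2 = np.einsum('kjab,kjab->kj', diff, diff)       # Tr((x - c)(x - c)^T)
    T = 1.0 / (1.0 + (D2 / gamma) ** (1.0 / (beta - 1.0)))
    C_new = (np.einsum('kj,kab->jab', T**beta, X)
             / (T**beta).sum(axis=0)[:, None, None])
    return T, C_new

def estimate_gamma(X, C, W, beta=2.0, K=1.0):
    """Eq. (9): per-cluster scale gamma_j from a fuzzy partition W."""
    diff = X[:, None] - C[None]
    D2 = np.einsum('kjab,kjab->kj', diff, diff)
    return K * (W**beta * D2).sum(axis=0) / (W**beta).sum(axis=0)
```

Because each typicality depends only on the distance to its own prototype, the columns of T need not sum to one, which is exactly the weak constraint (7).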
However, calculating $\gamma_j$ by formula (9) requires memory to store the fuzzy partition matrix, as well as time for its use. The PCM algorithm does a good job of suppressing interference and can usually be applied when it is necessary to improve the results obtained with the help of other algorithms. This algorithm can also merge close clusters into one, from which it follows that the initially specified number of clusters was too large (at the same time, the PCM algorithm can merge clusters that should be kept separate). The FPCM and PFCM algorithms use both a fuzzy partition matrix and a typicality matrix, trying to take advantage of both approaches. The FPCM algorithm uses the following formulas:

$w_j(k) = \frac{(Tr((x(k) - c_j)(x(k) - c_j)^T))^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} (Tr((x(k) - c_l)(x(k) - c_l)^T))^{\frac{1}{1-\beta}}}$;

$t_j(k) = \frac{(Tr((x(k) - c_j)(x(k) - c_j)^T))^{\frac{1}{1-\eta}}}{\sum_{l=1}^{N} (Tr((x(l) - c_j)(x(l) - c_j)^T))^{\frac{1}{1-\eta}}}$;   (10)

$c_j = \frac{\sum_{k=1}^{N} (w_j^{\beta}(k) + t_j^{\eta}(k)) x(k)}{\sum_{k=1}^{N} (w_j^{\beta}(k) + t_j^{\eta}(k))}$,

where η > 0 (in most cases η = 2). The FPCM algorithm uses the standard procedure for calculating the fuzzy partition matrix, but the possibility matrix is calculated using a new formula, normalized over observations rather than clusters. Cluster prototypes are calculated using the sum of both matrices. The PFCM method uses the standard procedure for calculating the fuzzy partition matrix (as in formula (6)). The procedure for calculating the possibility matrix was taken from PCM (8) and slightly modified. Centroids are calculated as in the FPCM algorithm, but both matrices have their own weights:

$w_j(k) = \frac{(Tr((x(k) - c_j)(x(k) - c_j)^T))^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} (Tr((x(k) - c_l)(x(k) - c_l)^T))^{\frac{1}{1-\beta}}}$;

$t_j(k) = \Big( 1 + \Big( \frac{b \, Tr((x(k) - c_j)(x(k) - c_j)^T)}{\gamma_j} \Big)^{\frac{1}{\beta-1}} \Big)^{-1}$;   (11)

$c_j = \frac{\sum_{k=1}^{N} (a w_j^{\beta}(k) + b t_j^{\eta}(k)) x(k)}{\sum_{k=1}^{N} (a w_j^{\beta}(k) + b t_j^{\eta}(k))}$,

where a > 0, b > 0.
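The PFCM centroid update from (11), with its weighted combination of the two matrices, can be illustrated as follows (a sketch; the function name and default parameters are assumptions):

```python
import numpy as np

def pfcm_prototypes(X, W, T, a=1.0, b=1.0, beta=2.0, eta=2.0):
    """PFCM centroid update (Eq. (11)): prototypes from the weighted
    combination a*w^beta + b*t^eta of the fuzzy partition matrix W
    and the typicality matrix T.

    X : (N, n1, n2); W, T : (N, m). Returns C : (m, n1, n2)."""
    M = a * W**beta + b * T**eta                      # combined weights
    return np.einsum('kj,kab->jab', M, X) / M.sum(axis=0)[:, None, None]
```

Setting b = 0 discards the typicality matrix (the FCM limit), while a = 0 discards the fuzzy partition (the PCM limit), mirroring the remark below about the roles of a and b.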
The constants a and b determine the relative importance of the fuzzy partition matrix and the typicality matrix in the centroid calculation. By setting a = 0, algorithm (11) reduces to PCM, and by setting b = 0, algorithm (11) reduces to FCM. Analyzing all the presented methods, several conclusions can be drawn. First, the membership constraint of the FCM algorithm is too "strong": it forces outlier objects to be assigned to one or more clusters, which, in turn, can greatly distort the recovered structure of the data set. On the other hand, the PCM constraint on the typicalities is too weak: it allows an object to be assigned to a cluster independently of the rest of the data. PCM is also very sensitive to the initialization of the typicality matrix. The PFCM method is an efficient combination of the two approaches, and the clustering results depend on the settings of the parameters a, b, β, η.

Algorithm (6) can be extended to the case when the data for processing are received sequentially in online mode. To do this, by applying the Arrow-Hurwicz-Uzawa saddle point search procedure to the Lagrangian (4), when the (k+1)-th observation is received, the estimates of the membership levels and centroids can be refined using the recurrence relations [12]

$w_j(k+1) = \frac{(D^2(x(k+1), c_j(k)))^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} (D^2(x(k+1), c_l(k)))^{\frac{1}{1-\beta}}}$;   (12)

$c_j(k+1) = c_j(k) - \eta(k) \frac{\partial L(w_j(k+1), c_j, \lambda(k+1))}{\partial c_j} = c_j(k) + \eta(k) w_j^{\beta}(k+1)(x(k+1) - c_j(k))$

for an arbitrary value of the fuzzifier β, and

$w_j(k+1) = \frac{(Tr((x(k+1) - c_j(k))(x(k+1) - c_j(k))^T))^{-1}}{\sum_{l=1}^{m} (Tr((x(k+1) - c_l(k))(x(k+1) - c_l(k))^T))^{-1}}$;   (13)

$c_j(k+1) = c_j(k) + \eta(k) w_j^2(k+1)(x(k+1) - c_j(k))$

for β = 2. It is easy to see that expression (12) is an adaptive version of procedure (5), and (13), accordingly, of (6).
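One step of the adaptive procedure (13) for β = 2 can be sketched as follows (the argument `lr` stands for the learning rate η(k); names are illustrative):

```python
import numpy as np

def online_fcm_step(x, C, lr):
    """Online matrix FCM update (Eq. (13), beta = 2): compute memberships
    for the new observation x, then nudge each prototype toward it.

    x : (n1, n2) new observation; C : (m, n1, n2) prototypes; lr : eta(k)."""
    diff = x[None] - C                                # (m, n1, n2)
    D2 = np.einsum('jab,jab->j', diff, diff) + 1e-12  # trace distances
    w = 1.0 / D2
    w /= w.sum()                                      # memberships over m clusters
    C = C + lr * (w**2)[:, None, None] * (x[None] - C)
    return w, C
```

Each observation is processed once and discarded, so the procedure runs with constant memory, which is what makes the method suitable for data streams.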
The matrix credibilistic criterion is used as the objective function of clustering:

$E(Cred_j(k), c_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} Cred_j^{\beta}(k) D^2(x(k), c_j) = \sum_{k=1}^{N} \sum_{j=1}^{m} Cred_j^{\beta}(k) \, Tr((x(k) - c_j)(x(k) - c_j)^T)$   (14)

in the presence of the constraints

$0 \le Cred_j(k) \le 1 \;\; \forall j, k$;
$\sup_j Cred_j(k) \ge 0.5 \;\; \forall k$;   (15)
$Cred_j(k) + \sup_{l \ne j} Cred_l(k) = 1$,

where $Cred_j(k)$ is the credibility level of the observation x(k). In the procedures of credibilistic fuzzy clustering, the membership level is determined by membership functions of the form [13]

$w_j(k) = \varphi_j(D(x(k), c_j)) = \varphi_j(Tr((x(k) - c_j)(x(k) - c_j)^T))$,   (16)

where $\varphi_j$ decreases monotonically on the interval [0, ∞) with $\varphi_j(0) = 1$, $\varphi_j(\infty) \to 0$. It is easy to see that the membership level (16), based on the distance, is essentially a similarity measure [14]. As such a measure, in [15] it was proposed to use the function

$w_j(k) = \frac{1}{1 + Tr((x(k) - c_j)(x(k) - c_j)^T)}$.   (17)

Thus, the credibilistic fuzzy clustering algorithm in batch form can be written as [16]:

$w_j(k) = \frac{1}{1 + Tr((x(k) - c_j)(x(k) - c_j)^T)}$;

$w_j^*(k) = \frac{w_j(k)}{\sup_l w_l(k)}$;

$Cred_j(k) = \frac{1}{2}\Big( w_j^*(k) + 1 - \sup_{l \ne j} w_l^*(k) \Big)$;   (18)

$c_j = \frac{\sum_{k=1}^{N} Cred_j^{\beta}(k) x(k)}{\sum_{k=1}^{N} Cred_j^{\beta}(k)}$;

in online mode this procedure (18) takes the form

$\sigma_j^2(k+1) = \Big( \sum_{l=1, l \ne j}^{m} \big(Tr((x(k+1) - c_l)(x(k+1) - c_l)^T)\big)^{\frac{1}{1-\beta}} \Big)^{1-\beta}$;

$w_j(k+1) = \Big( 1 + \Big( \frac{Tr((x(k+1) - c_j)(x(k+1) - c_j)^T)}{\sigma_j^2(k+1)} \Big)^{\frac{1}{\beta-1}} \Big)^{-1}$;

$w_j^*(k+1) = \frac{w_j(k+1)}{\sup_l w_l(k+1)}$;   (19)

$Cred_j(k+1) = \frac{1}{2}\Big( w_j^*(k+1) + 1 - \sup_{l \ne j} w_l^*(k+1) \Big)$;

$c_j(k+1) = c_j(k) + \eta(k+1) Cred_j^{\beta}(k+1)(x(k+1) - c_j(k))$,

or, in the case when β = 2,

$\sigma_j^{-2}(k+1) = \sum_{l=1, l \ne j}^{m} \big(Tr((x(k+1) - c_l)(x(k+1) - c_l)^T)\big)^{-1}$;

$w_j(k+1) = \Big( 1 + \frac{Tr((x(k+1) - c_j)(x(k+1) - c_j)^T)}{\sigma_j^2(k+1)} \Big)^{-1}$;

$w_j^*(k+1) = \frac{w_j(k+1)}{\sup_l w_l(k+1)}$;   (20)

$Cred_j(k+1) = \frac{1}{2}\Big( w_j^*(k+1) + 1 - \sup_{l \ne j} w_l^*(k+1) \Big)$;

$c_j(k+1) = c_j(k) + \eta(k+1) Cred_j^2(k+1)(x(k+1) - c_j(k))$.

It is easy to see that the recurrent credibilistic fuzzy clustering algorithm is no more complex than the online modifications of the probabilistic, possibilistic, and robust procedures [17, 18].

3. Experimental research

Digital images, including satellite images of the city of Kharkiv, were used to test the implemented matrix credibilistic modifications of the clustering algorithm. The samples have no missing attributes and are numeric. The result of the algorithm is the final fuzzy partition matrix for all sample objects and the class prototypes. When processing digital images, objects (matrices or vectors of the same dimensions) are formed from fragments of the image, and each pixel is converted from the RGB (Red-Green-Blue) color model to the Grayscale model, where the brightness of a pixel is expressed as a scalar value from the interval [0, 1]. The conversion from the RGB model to the Grayscale model is performed according to the formula

$Y = (0.299 R + 0.587 G + 0.114 B) / 255$,

where Y is the pixel brightness and R, G, B are the brightnesses of the red, green, and blue channels, respectively, whose values lie in the interval [0, 255]. Observation sets formed from digital images are processed according to the same principle as standard quantitative samples. After image processing, each cluster is assigned a color of the Grayscale model, and each object is colored in the color of the nearest cluster. To evaluate the quality of the algorithms, the following criteria were used: Partition Coefficient (PC), Classification Entropy (CE), and Partition Index (PI), with the same initialized fuzzy partition matrix U0.
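One online credibilistic update step for β = 2 can be sketched as follows (a sketch under our reading of the sup in the credibility formula as taken over the remaining clusters, as in the credibility measure Cr = (Pos + Nec)/2; names are illustrative):

```python
import numpy as np

def credibilistic_step(x, C, lr):
    """One online credibilistic update, beta = 2.

    x : (n1, n2) new observation; C : (m, n1, n2) prototypes; lr : eta(k+1).
    Returns the credibility levels and the updated prototypes."""
    m = C.shape[0]
    diff = x[None] - C
    D2 = np.einsum('jab,jab->j', diff, diff) + 1e-12  # trace distances
    inv = 1.0 / D2
    sigma2 = 1.0 / (inv.sum() - inv)                  # scale from the *other* clusters
    w = 1.0 / (1.0 + D2 / sigma2)                     # similarity-based membership
    w_star = w / w.max()                              # normalized (possibility) levels
    # credibility = (possibility + necessity) / 2, necessity from the others
    others = np.array([np.max(np.delete(w_star, j)) for j in range(m)])
    cred = 0.5 * (w_star + 1.0 - others)
    C = C + lr * (cred**2)[:, None, None] * (x[None] - C)
    return cred, C
```

By construction the winning cluster receives a credibility of at least 0.5, while outliers that are far from all prototypes produce uniformly small credibilities and therefore barely move the prototypes.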
Table 1 shows the accuracy and speed of the clustering algorithms on the Iris sample, and Table 2 shows the results on a satellite digital image of the city of Kharkiv. The time given is the average for one iteration, taking into account the vectorization-devectorization operation.

Table 1
Results of cluster analysis on the Iris sample

Methods                                   | PC    | CE    | PI    | Time (s)
FCM                                       | 0.531 | 0.811 | 12.19 | 0.003
Matrix method of FCM                      | 0.531 | 0.811 | 12.19 | 0.0025
Matrix method of credibilistic clustering | 0.530 | 0.811 | 12.18 | 0.0022

Table 2
Results of cluster analysis on the digital image

Methods                                   | PC    | CE    | PI   | Time (s)
FCM                                       | 0.697 | 0.419 | 8.23 | 1.9
Matrix method of FCM                      | 0.697 | 0.419 | 8.23 | 1.8
Matrix method of credibilistic clustering | 0.695 | 0.419 | 8.22 | 1.8

Figure 2 shows the initial image, the refined sample (20% of objects), and the result of the cluster analysis.

Figure 2: The result of clustering: (a) initial digital image for clustering; (b) refined sample (20% of objects) for clustering; (c) output image of the cluster analysis

Figure 3 shows the result of digital image clustering by the adaptive matrix method of fuzzy credibilistic clustering.

Figure 3: Output image after adaptive cluster analysis

Acknowledgements

The work is supported by the state budget scientific research project of Kharkiv National University of Radio Electronics "Deep hybrid systems of computational intelligence for data stream mining and their fast learning" (state registration number 0119U001403).

References

[1] K. Chan, Efficient time series matching by wavelets, in: Proceedings of the 15th IEEE Int. Conf. on Data Engineering, 1999, pp. 126-133.
[2] E. H. Ruspini, J. C. Bezdek, J. M. Keller, Fuzzy clustering: A historical perspective, IEEE Computational Intelligence Magazine 14.1 (2019) 45-55.
[3] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, N.Y., 1981.
[4] G. J. Oyewole, G. A. Thopil, Data clustering: application and trends, Artificial Intelligence Review 56.7 (2023) 6439-6475.
[5] A. V. Agrawal, et al., A probability-based fuzzy algorithm for multi-attribute decision-analysis with application to aviation disaster decision-making, Decision Analytics Journal 8 (2023) 100310.
[6] R. Wongkhuenkaew, et al., Fuzzy K-nearest neighbor based dental fluorosis classification using multi-prototype unsupervised possibilistic fuzzy clustering via cuckoo search algorithm, International Journal of Environmental Research and Public Health 20.4 (2023) 3394.
[7] Ye. Bodyanskiy, V. Volkova, M. Skuratov, Matrix neuro-fuzzy self-organizing clustering network, Computer Science (1407-7493) 50 (2011).
[8] Y. Zhang, et al., Possibilistic c-means clustering based on the nearest-neighbour isolation similarity, Journal of Intelligent & Fuzzy Systems 44.2 (2023) 1781-1792.
[9] S. E. Hashemi, F. Gholian-Jouybari, M. Hajiaghaei-Keshteli, A fuzzy C-means algorithm for optimizing data clustering, Expert Systems with Applications 227 (2023) 120377.
[10] Y. Zhang, et al., Possibilistic c-means clustering based on the nearest-neighbour isolation similarity, Journal of Intelligent & Fuzzy Systems 44.2 (2023) 1781-1792.
[11] I. Hussain, K. P. Sinaga, M.-S. Yang, Unsupervised multiview fuzzy c-means clustering algorithm, Electronics 12.21 (2023) 4467.
[12] M.-S. Yang, J. B. M. Benjamin, Sparse possibilistic c-means clustering with Lasso, Pattern Recognition 138 (2023) 109348.
[13] Ye. Bodyanskiy, A. Shafronenko, S. Mashtalir, Online robust fuzzy clustering of data with omissions using similarity measure of special type, in: Lecture Notes in Computational Intelligence and Decision Making, Springer, Cham, 2020, pp. 637-646.
[14] F. W. Young, R. M. Hamer,
Theory and Applications of Multidimensional Scaling, Erlbaum, Hillsdale, N.J., 1994.
[15] S. Dahiya, A. Gosain, DOIFCM: An outlier efficient IFCM, in: Computational Intelligence in Analytics and Information Systems, Apple Academic Press, 2023, pp. 135-149.
[16] O. Kyrychenko, S. Ostapov, I. Malyk, Cluster analysis of information in complex networks, International Journal of Computing 22(4) (2023) 515-523. https://doi.org/10.47839/ijc.22.4.3360
[17] M. N. Linan, B. Gerardo, R. Medina, Modified weight initialization in the self-organizing map using Nguyen-Widrow initialization algorithm, Journal of Physics: Conference Series 1235.1 (2019). https://doi.org/10.47839/ijc.19.1.1694
[18] M. Sirola, J. E. Hulsund, Machine-learning methods in prognosis of ageing phenomena in nuclear power plant components, International Scientific Journal of Computing 20.1 (2021) 11-21. https://doi.org/10.47839/ijc.20.1.2086