Online Stacking Credibilistic Fuzzy Clustering for Data Stream Mining

Yevgeniy Bodyanskiy1, Alina Shafronenko1, Diana Rudenko1 and Oleksii Tanianskyi1

1 Kharkiv National University of Radio Electronics, Nauky ave 14, Kharkiv, 61166, Ukraine

Abstract
An important problem that arises when processing large amounts of observations is data compression: highlighting the most essential information and identifying certain latent factors that implicitly determine the nature of the phenomenon being studied. One of the most effective approaches to solving this problem is the apparatus of factor analysis, which has found wide application in problems of processing empirical data in various fields. Although fuzzy clustering is a popular approach to soft data partitioning, its use encounters difficulties when processing high-dimensional real data with complex hidden distributions. This paper proposes a stacked fuzzy clustering method in which the data are represented in a new feature space created by a stacking neural network. This approach aims to overcome the challenges associated with processing complex data and can bring significant improvements in the quality of clustering.

Keywords
Stack learning, data compression, fuzzy clustering, credibilistic fuzzy clustering, data stream mining

1. Introduction

Clustering is a technique in machine learning and data analysis that involves grouping a set of data points into subsets, or clusters, based on the similarity between them. Fuzzy clustering is a variation of traditional clustering methods that allows for more flexible and nuanced assignments of data points to clusters [1-5]. Whereas traditional (crisp) clustering assigns each point to exactly one cluster, fuzzy clustering allows data points to belong to multiple clusters simultaneously with varying degrees of membership. This reflects the inherent uncertainty or ambiguity present in real-world data. The Fuzzy C-Means (FCM) algorithm, popularized by James Bezdek, is a prominent method in fuzzy clustering [6].
FCM assigns membership degrees to data points, indicating the likelihood of each point belonging to different clusters. This flexibility makes fuzzy clustering particularly useful in scenarios where data points may exhibit overlapping characteristics or uncertainty in their categorization. Over the years, fuzzy clustering has found applications in diverse fields, including pattern recognition, image processing, and Data Mining. Researchers have developed various extensions and enhancements to the original FCM algorithm, addressing specific challenges and improving its adaptability to different data patterns. The validity of fuzzy clustering solutions became a key focus, leading to the introduction of indices to assess the quality of clustering results. These indices help researchers and practitioners evaluate the effectiveness of fuzzy clustering algorithms in capturing meaningful patterns within datasets. The evolution of fuzzy clustering has seen ongoing advancements, with researchers exploring sophisticated membership functions and integrating fuzzy clustering with other machine learning techniques. This integration has expanded the capabilities of fuzzy clustering, making it applicable to complex problems in large-scale data analysis.

CMIS-2024: Seventh International Workshop on Computer Modeling and Intelligent Systems, May 3, 2024, Zaporizhzhia, Ukraine
yevgeniy.bodyanskiy@nure.ua (Ye. Bodyanskiy); alina.shafronenko@nure.ua (A. Shafronenko); diana.rudenko@nure.ua (D. Rudenko); oleksii.tanianskyi@nure.ua (O. Tanianskyi)
ORCID: 0000-0001-5418-2143 (Ye. Bodyanskiy); 0000-0002-8040-0279 (A. Shafronenko); 0000-0002-1792-5080 (D. Rudenko); 0009-0005-3491-4470 (O. Tanianskyi)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The era of big data has significantly influenced the field of clustering, including traditional clustering methods and the development of fuzzy clustering techniques. Deep learning on big data represents a powerful combination that has transformed various fields by enabling more sophisticated analysis, pattern recognition, and decision-making capabilities. Recently, there has been significant research into leveraging deep learning to uncover meaningful data representations through neural networks. A notable area of exploration involves the integration of unsupervised clustering algorithms with stack neural networks. This synergy has become a vibrant field of research, aiming to jointly optimize the performance of deep learning models and clustering algorithms.

The goal of this work is to propose a stack neuro-fuzzy system for Data Stream Mining based on the credibilistic approach and designed to work both in batch mode and in a recurrent online version.

2. Neural Network Data Compression

An important problem that arises when processing large amounts of observations is data compression: highlighting the most essential information and identifying certain latent factors that implicitly determine the nature of the phenomenon being studied. One of the most effective approaches to solving this problem is the apparatus of factor analysis [7], which has found wide application in problems of processing empirical data in various fields: psychology, sociology, technology, economics, medicine, criminology, etc.
The basic idea of factor analysis, which allows for the presence of a priori unknown hidden factors, leads to the following informal task: by observing a large number of measured parameters (indicators), identify a small number of factor-parameters that mainly determine the behavior of the measured parameters; or, in other words, knowing the values of a large number of measured parameter-functions, determine the corresponding values of the factor-arguments common to all functions and restore the form of these functions.

The initial information for factor analysis is the (N × n) observation matrix

X(N) =
\begin{pmatrix}
x_1(1) & x_2(1) & \cdots & x_n(1) \\
x_1(2) & x_2(2) & \cdots & x_n(2) \\
\vdots & \vdots & & \vdots \\
x_1(N) & x_2(N) & \cdots & x_n(N)
\end{pmatrix}
=
\begin{pmatrix}
x^T(1) \\ x^T(2) \\ \vdots \\ x^T(N)
\end{pmatrix},    (1)

formed by an array of N n-dimensional vectors x(k) = (x_1(k), x_2(k), \ldots, x_n(k))^T, and the (n × n) autocorrelation matrix

R(N) = \frac{1}{N} \sum_{k=1}^{N} \left( x(k) - \bar{x}(N) \right) \left( x(k) - \bar{x}(N) \right)^T = \frac{1}{N} \sum_{k=1}^{N} \tilde{x}(k) \tilde{x}^T(k),    (2)

where

\bar{x}(N) = \frac{1}{N} \sum_{k=1}^{N} x(k), \qquad \tilde{x}(k) = x(k) - \bar{x}(N)    (3)

are the sample mean and the vectors of measured indicators centered relative to the average of the data array.

One of the most common and effective methods for finding factors is the principal component method (component analysis), which is widely used in problems of data compression, pattern recognition, coding, image processing, spectral analysis, etc., and is known in pattern recognition theory as the Karhunen-Loève transform. The task of component analysis is to project the data vectors from the original n-dimensional space into an m-dimensional (m < n) space of principal components; it reduces to searching for a system w_1, w_2, \ldots, w_m of orthonormal eigenvectors of the matrix R(N) such that w_1 = (w_{11}, w_{12}, \ldots, w_{1n})^T corresponds to the largest eigenvalue \lambda_1 of R(N), w_2 to the second largest eigenvalue \lambda_2, etc.
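As an illustration, the centering (3) and the autocorrelation matrix (2) can be sketched in a few lines, assuming the observations are stored row-wise in an (N × n) NumPy array; the function name is ours, not part of the paper:

```python
import numpy as np

def correlation_matrix(X):
    """Compute the (n x n) matrix R(N) of eqs. (2)-(3) from an (N x n) data
    matrix X: center the observations on the sample mean, then average the
    outer products of the centered vectors."""
    x_mean = X.mean(axis=0)            # sample mean, eq. (3)
    X_c = X - x_mean                   # centered observations x~(k)
    return (X_c.T @ X_c) / X.shape[0]  # R(N) = (1/N) sum x~(k) x~^T(k), eq. (2)
```

The result coincides with the biased (divide-by-N) sample covariance matrix of the data.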
In other words, the problem comes down to finding solutions of the matrix equation

\left( R(N) - \lambda_j I_n \right) w_j = 0

such that \lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_m \geq \varepsilon \geq 0 and \| w_j \| = 1. The dimension m of the space of principal components is determined, as a rule, from empirical considerations and the required degree of compression of the data array.

Thus, in algebraic terms, solving the factor problem is closely related to the eigenvalue problem and to finding the rank of the correlation matrix; in a geometric sense, it is the problem of moving to a lower-dimensional space with minimal loss of information; in a statistical sense, it is the problem of finding a set of orthonormal vectors in the input space that "accept" the maximum possible variation of the data; and, finally, in an algorithmic sense, it is the problem of sequentially determining the set of eigenvectors w_1, w_2, \ldots, w_m by optimizing a set of local criteria that form the global objective function

E^k = \sum_{j=1}^{m} E_j^k = \frac{1}{k} \sum_{j=1}^{m} \sum_{p=1}^{k} \left( w_j^T \tilde{x}(p) \right)^2

with constraints w_j^T w_l = 0 for j \neq l and w_j^T w_j = 1.

The first principal component w_1 can be found by maximizing the criterion

E_1^k = \frac{1}{k} \sum_{p=1}^{k} \left( w_1^T \tilde{x}(p) \right)^2    (4)

by solving a nonlinear programming problem using indefinite Lagrange multipliers. However, if data processing must be carried out in real time, neural network technologies come to the fore, among which E. Oja's self-learning rule and neuron should be noted. It is with the help of Oja's rule in the form

\begin{cases}
w_1(k+1) = w_1(k) + \eta(k) y_1(k) \left( \tilde{x}(k) - w_1(k) y_1(k) \right), \\
y_1(k) = w_1^T(k) \tilde{x}(k), \quad w_1(0) \neq 0
\end{cases}    (5)

that the first principal component can be isolated. Next, following the procedure of standard principal component analysis, from each vector \tilde{x}(k), k = 1, 2, \ldots, N, its projection onto the first principal component is subtracted, and the first principal component of the differences is calculated; it is the second principal component of the original data and is orthonormal to the first.
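Oja's rule (5) can be sketched as a short online loop; the fixed learning rate, the epoch count, and the function name below are illustrative choices of ours, not values prescribed by the paper:

```python
import numpy as np

def oja_first_pc(X, eta=0.002, epochs=100):
    """Extract the first principal component with Oja's rule (5).
    X is an (N x n) array of already-centered observations x~(k)."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(X.shape[1])  # w1(0) != 0
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x                      # y1(k) = w1^T(k) x~(k)
            w = w + eta * y * (x - w * y)  # rule (5)
    return w
```

At convergence w is (up to sign) the unit eigenvector of the sample correlation matrix belonging to its largest eigenvalue.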
The third principal component is calculated by projecting each original vector \tilde{x}(k) onto the first two components, subtracting this projection from \tilde{x}(k), and finding the first principal component of the differences, which is the third principal component of the original data array. The remaining principal components are calculated recursively according to the described strategy. It is this idea of recursive calculation of the principal components that forms the basis of the algorithm proposed by T. Sanger [8], which in a modified form can be written as [9]

\begin{cases}
w_j(k+1) = w_j(k) + \eta(k) e_j(k) y_j(k), \\
e_j(k) = e_{j-1}(k) - w_j(k) y_j(k), \\
y_j(k) = w_j^T(k) \tilde{x}(k), \quad w_j(0) \neq 0, \\
e_0(k) = \tilde{x}(k), \quad j = 1, 2, \ldots, m, \\
\eta(k) = r^{-1}(k), \quad r(k) = \alpha r(k-1) + \| \tilde{x}(k) \|^2, \quad 0 \leq \alpha \leq 1.
\end{cases}    (6)

It is easy to see that the first principal component is calculated using the Oja algorithm; then the projections of the input vectors onto w_1(k) are subtracted from the inputs and the differences are processed by the next neuron, etc.

Figure 1: T. Sanger's neural network

Fig. 1 shows a diagram of the modified artificial neural network of T. Sanger, composed of E. Oja's neurons and implementing algorithm (6). The first layer of the network is formed by encoder neurons that pre-process the signals by centering and normalizing them. The signals x_1(k), x_2(k), \ldots, x_n(k) are then processed in the second hidden layer, formed by E. Oja's neurons, after which they are sent to the output layer, formed by elements with rectifier activation functions with a dead zone

\psi(u) = \begin{cases} u, & \text{if } u \geq \theta, \\ 0, & \text{otherwise}, \end{cases}    (7)

which allows informative signals y_j(k) to be highlighted and the noise to be filtered out.
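A minimal sketch of the recursive Sanger-type procedure (6) follows; for simplicity it uses a fixed learning rate instead of the adaptive schedule η(k) = r⁻¹(k), and the function name and parameter values are illustrative assumptions of ours:

```python
import numpy as np

def sanger_pcs(X, m, eta=0.002, epochs=200):
    """Extract the first m principal components with the recursive scheme (6):
    each neuron applies an Oja-type update to the residual e_j(k) left after
    subtracting the projections onto the previous components.
    X is an (N x n) array of already-centered observations x~(k)."""
    rng = np.random.default_rng(1)
    W = rng.standard_normal((m, X.shape[1])) * 0.1  # w_j(0) != 0
    for _ in range(epochs):
        for x in X:
            e = x.copy()                     # e_0(k) = x~(k)
            for j in range(m):
                y = W[j] @ x                 # y_j(k) = w_j^T(k) x~(k)
                e = e - W[j] * y             # e_j(k) = e_{j-1}(k) - w_j(k) y_j(k)
                W[j] = W[j] + eta * e * y    # w_j(k+1) = w_j(k) + eta e_j(k) y_j(k)
    return W
```

The first row of W converges to the first principal component (pure Oja rule), the second row to the first principal component of the residuals, i.e. the second principal component of the data, and so on.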
The Sanger neural network is an effective means of compressing information with minimal loss of accuracy, but its capabilities are limited by the fact that, implementing essentially the standard technique of factor analysis, it solves a linear problem, while the main advantage of neural network technologies is the ability to work in essentially nonlinear situations. The problem of nonlinear factor analysis can be effectively solved using credibility theory and cluster analysis.

3. Fuzzy credibilistic clustering

As an alternative to probabilistic and possibilistic procedures [10], a credibilistic fuzzy clustering approach was introduced; it uses credibility theory [11] as its basis and is largely devoid of the drawbacks of the known methods. The most common approach within the framework of probabilistic fuzzy clustering is associated with minimizing the goal function [12-14]

E(u_q(k), c_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} u_q^{\beta}(k) \, d^2(x(k), c_q)    (8)

with constraints

\sum_{q=1}^{m} u_q(k) = 1, \qquad 0 < \sum_{k=1}^{N} u_q(k) < N.    (9)

Solving this nonlinear programming problem using the method of indefinite Lagrange multipliers leads to the well-known result [9, 11, 15]

\begin{cases}
u_q(k) = \dfrac{\left( d^2(x(k), c_q) \right)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \left( d^2(x(k), c_l) \right)^{\frac{1}{1-\beta}}}, \\[2mm]
c_q = \dfrac{\sum_{k=1}^{N} u_q^{\beta}(k) \, x(k)}{\sum_{k=1}^{N} u_q^{\beta}(k)},
\end{cases}    (10)

which for \beta = 2 coincides with the popular Fuzzy C-Means (FCM) method of J. Bezdek [6]. If the data are fed for processing sequentially, the solution of the nonlinear programming problem (8), (9) using the Arrow-Hurwicz-Uzawa algorithm leads to the online procedure

\begin{cases}
u_q(k+1) = \dfrac{\left( d^2(x(k+1), c_q(k)) \right)^{\frac{1}{1-\beta}}}{\sum_{l=1}^{m} \left( d^2(x(k+1), c_l(k)) \right)^{\frac{1}{1-\beta}}}, \\[2mm]
c_q(k+1) = c_q(k) + \eta(k+1) \, u_q^{\beta}(k+1) \left( x(k+1) - c_q(k) \right).
\end{cases}    (11)
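One step of the online procedure (11) might look as follows, assuming squared Euclidean distances and a fixed learning rate; the function name and the small guard against zero distances are our illustrative additions:

```python
import numpy as np

def online_fcm_step(x, C, beta=2.0, eta=0.05):
    """One step of the online fuzzy c-means procedure (11): compute the
    memberships of the new observation x in each of the m clusters, then
    shift every centroid toward x proportionally to its membership.
    C is an (m x n) array of current centroids."""
    d2 = ((C - x) ** 2).sum(axis=1)        # squared distances d^2(x, c_q)
    d2 = np.maximum(d2, 1e-12)             # guard against a zero distance
    p = d2 ** (1.0 / (1.0 - beta))         # (d^2)^(1/(1-beta))
    u = p / p.sum()                        # memberships, first line of (11)
    C = C + eta * (u ** beta)[:, None] * (x - C)  # centroid update, second line
    return u, C
```

The memberships sum to one by construction, and every centroid moves toward the new observation by an amount weighted by its (powered) membership.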
The goal function of credibilistic fuzzy clustering has the form [6, 11], close to (8),

E(Cred_q(k), c_q) = \sum_{k=1}^{N} \sum_{q=1}^{m} Cred_q^{\beta}(k) \, d^2(x(k), c_q)    (12)

with "softer" constraints than (9):

\begin{cases}
0 \leq Cred_q(k) \leq 1 \text{ for all } q \text{ and } k, \\
\sup_q Cred_q(k) \geq 0.5 \text{ for all } k, \\
Cred_q(k) + \sup_{l \neq q} Cred_l(k) = 1 \text{ for any } q \text{ and } k \text{ for which } Cred_q(k) \geq 0.5.
\end{cases}    (13)

It should be noted that the goal functions (8) and (12) are similar, and that (13) contains no rigid probabilistic constraint on the sum of the memberships like the first condition in (9). In credibilistic clustering procedures there is also the concept of fuzzy membership, which is calculated using a neighborhood function of the form

u_q(k) = \varphi_q \left( d(x(k), c_q) \right)    (14)

monotonically decreasing on the interval [0, \infty) so that \varphi_q(0) = 1, \varphi_q(\infty) \to 0. Such a function is essentially an empirical similarity measure [13, 15, 16] related to the distance by

u_q(k) = \frac{1}{1 + d^2(x(k), c_q)}.    (15)

Note also that it was shown earlier in [14] that the first relation in (10) for \beta = 2 can be rewritten as

u_q(k) = \left( 1 + \frac{d^2(x(k), c_q)}{\sigma_q^2} \right)^{-1},    (16)

where

\sigma_q^2 = \left( \sum_{\substack{l=1 \\ l \neq q}}^{m} d^2(x(k), c_l) \right)^{-1},    (17)

which is a generalization of function (15) (for \sigma_q^2 = 1, (16) coincides with (15)) and satisfies all the conditions imposed on (14).
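A sketch of how the memberships (16) with the adaptive spreads (17) can be normalized and turned into credibility measures satisfying the constraints (13), assuming squared Euclidean distances and at least two centroids; the function name and array layout are ours:

```python
import numpy as np

def credibility_measures(x, C):
    """For one observation x and an (m x n) array of centroids C (m >= 2),
    compute memberships via (16)-(17), normalize them by their supremum,
    and form credibility measures consistent with constraints (13)."""
    d2 = ((C - x) ** 2).sum(axis=1)     # squared distances d^2(x, c_q)
    sigma2 = 1.0 / (d2.sum() - d2)      # adaptive spreads, eq. (17)
    u = 1.0 / (1.0 + d2 / sigma2)       # memberships, eq. (16)
    u_star = u / u.max()                # normalization by the supremum
    cred = np.array([0.5 * (u_star[q] + 1.0 - np.delete(u_star, q).max())
                     for q in range(len(u))])  # credibility measure
    return cred
```

One can verify that the winning cluster always receives a credibility of at least 0.5, and that its credibility plus the largest credibility among the remaining clusters equals one, exactly as required by (13).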
In batch form, the algorithm of credibilistic fuzzy clustering in the accepted notation can be written as

\begin{cases}
u_q(k) = \left( 1 + d^2(x(k), c_q) \right)^{-1}, \\
u_q^{*}(k) = u_q(k) \left( \sup_l u_l(k) \right)^{-1}, \\
Cred_q(k) = \frac{1}{2} \left( u_q^{*}(k) + 1 - \sup_{l \neq q} u_l^{*}(k) \right), \\
c_q = \dfrac{\sum_{k=1}^{N} Cred_q^{\beta}(k) \, x(k)}{\sum_{k=1}^{N} Cred_q^{\beta}(k)},
\end{cases}    (18)

and in online mode, considering (16), (17):

\begin{cases}
\sigma_q^2(k+1) = \left( \sum_{\substack{l=1 \\ l \neq q}}^{m} d^2(x(k+1), c_l(k)) \right)^{-1}, \\
u_q(k+1) = \left( 1 + \dfrac{d^2(x(k+1), c_q(k))}{\sigma_q^2(k+1)} \right)^{-1}, \\
u_q^{*}(k+1) = \dfrac{u_q(k+1)}{\sup_l u_l(k+1)}, \\
Cred_q(k+1) = \frac{1}{2} \left( u_q^{*}(k+1) + 1 - \sup_{l \neq q} u_l^{*}(k+1) \right), \\
c_q(k+1) = c_q(k) + \eta(k+1) \, Cred_q^{\beta}(k+1) \left( x(k+1) - c_q(k) \right).
\end{cases}    (19)

From the point of view of computational implementation, algorithm (19) is no more complicated than procedure (11) and is, in the general case, its generalization to the credibilistic approach to fuzzy clustering.

4. Experimental research

Conducting experimental studies and a comparative analysis of the quality of data clustering using various metrics allows one to objectively assess the effectiveness of the developed method against its analogues. To estimate the quality of the method, we used the following cluster-partition quality criteria [3, 6]:
− Partition Coefficient (PC);
− Classification Entropy (CE);
− Partition Index (SC);
− Separation Index (S);
− Xie and Beni's Index (XB);
− Dunn's Index (DI).

Partition Coefficient (PC): measures the amount of "overlapping" between clusters. Classification Entropy (CE): measures the fuzziness of the cluster partition only, similarly to the Partition Coefficient. Partition Index (SC): the ratio of the sum of compactness and separation of the clusters; it is a sum of individual cluster validity measures normalized through division by the fuzzy cardinality of each cluster. SC is useful when comparing different partitions having an equal number of clusters.
A lower value of SC indicates a better partition. Separation Index (S): in contrast to the Partition Index (SC), the Separation Index uses a minimum-distance separation for partition validity. Xie and Beni's Index (XB): aims to quantify the ratio of the total variation within clusters to the separation of clusters; the optimal number of clusters should minimize the value of the index. Dunn's Index (DI): this index was originally proposed for the identification of "compact and well separated clusters", so the result of the clustering has to be recalculated as if it were a hard partition.

The specific information on the data sets is shown in Table 1.

Table 1
Information on the data sets

Data set | Observations | Clusters
Parkinson's telemonitoring | 5875 | 21
Superconductor temperature prediction | 10000 | 81

Table 2
Results of experiments for the Parkinson's telemonitoring data set

Methods | PC | CE | SC | S | XB | DI | PC
Online Stack Fuzzy Credibilistic Clustering for Data Stream Mining | -2.3308e-04 | 9.1249e-07 | 0.582e-13 | 24.4028 | 48.7067 | 0.2005 | 0.3733
SOM based on possibilistic fuzzy clustering | 0.7473e-04 | 0.21415 | 0.00715 | 1.92845 | 0.01375 | 0.3808 | 0.3954
SOM based on probabilistic fuzzy clustering | 0.05725 | 0.27535 | 0.2395 | 0.4731 | 0.0032 | 1.7309 | 0.1699

Table 3
Results of experiments for the Superconductor data set

Methods | CE | SC | S | XB | DI | PC | CE
Online Stack Fuzzy Credibilistic Clustering for Data Stream Mining | 4.56245e-07 | 271 150 000 | 135 900 000 | -0.000228 | -0.00022 | 369360 | 0.0109
SOM based on possibilistic fuzzy clustering | 0.0000034 | 0.000366 | 0.00585 | 0.38085 | 0.1903 | 2.8555 | 0.1903
SOM based on probabilistic fuzzy clustering | 0.31965 | 4.29665 | 0.02415 | 0.05075 | 0.31965 | 0.5375 | 0.4731

Table 4
A comparison of accuracy

Data | Methods | Highest accuracy | Mean accuracy
Parkinson's telemonitoring | Online Stack Fuzzy Credibilistic Clustering for Data Stream Mining | 61.68 | 60.98
Parkinson's telemonitoring | SOM based on possibilistic fuzzy clustering | 61.75 | 61.30
Parkinson's telemonitoring | SOM based on probabilistic fuzzy clustering | 62.68 | 62.98
Superconductor temperature prediction | Online Stack Fuzzy Credibilistic Clustering for Data Stream Mining | 63.27 | 58.45
Superconductor temperature prediction | SOM based on possibilistic fuzzy clustering | 64.68 | 55.55
Superconductor temperature prediction | SOM based on probabilistic fuzzy clustering | 65.45 | 57.33

Table 5
Comparative characteristics of the average error with different numbers of observations (%)

Algorithm | 50 t | 100 t | 150 t
FCM | 1.62 / 1.19 | 1.35 / 2.55 | 0.98 / 3.03
SOM based on probabilistic fuzzy clustering | 1.66 / 1.62 | 1.32 / 2.72 | 0.99 / 3.12
SOM based on possibilistic fuzzy clustering | 1.22 / 1.15 | 1.02 / 2.02 | 0.75 / 2.10
Online Stack Fuzzy Credibilistic Clustering for Data Stream Mining | 0.68 / 1.00 | 0.45 / 1.25 | 0.12 / 1.33

5. Discussions

Upon analyzing the results obtained, it can be inferred that, irrespective of the volume of the initial data, processing with the proposed method exhibits comparable speed and clustering quality when contrasted with established clustering algorithms and methodologies. The obtained results confirm that the performance of the stack neuro-fuzzy system is better than that of the other network structures, and that it can be a viable structure for Data Stream Mining. The accuracy results presented in Table 4 confirm that the proposed online stack fuzzy credibilistic clustering method for Data Stream Mining retains its performance regardless of the number of observations fed into the clustering process. Based on the experimental findings, it is advisable to endorse the proposed system for practical application in addressing the challenges associated with automatic clustering of large datasets.

6. Conclusion

The problem of fuzzy clustering of data streams by a stack neuro-fuzzy system is considered. The paper proposes a stack neuro-fuzzy system for Data Stream Mining based on the credibilistic approach, designed to work both in batch mode and in a recurrent online version. The network shows that stack structures based on fuzzy models can be applicable in data clustering.
The proposed stack neuro-fuzzy system is quite simple in numerical implementation and can use the well-known online fuzzy clustering methods intended for solving Data Stream Mining problems. Future research could explore the potential of employing stack neuro-fuzzy systems for fuzzy clustering of data streams, aiming to address the complexities inherent in automatic clustering of big data.

Acknowledgements
The work is supported by the state budget scientific research project of Kharkiv National University of Radio Electronics "Deep hybrid systems of computational intelligence for data stream mining and their fast learning" (state registration number 0119U001403).

References
[1] GAs, Genetic Algorithms. "On Genetic-Fuzzy Data-Mining Techniques." (2023).
[2] Liu, Hongfu, et al. "Infinite ensemble for image clustering." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.
[3] Xu, Jindong, et al. "A fuzzy C-means clustering algorithm based on spatial context model for image segmentation." International Journal of Fuzzy Systems 23 (2021): 816-832.
[4] Gong, Maoguo, et al. "Fuzzy clustering with a modified MRF energy function for change detection in synthetic aperture radar images." IEEE Transactions on Fuzzy Systems 22.1 (2013): 98-109.
[5] Rani, V., and S. Kumar. "Dissimilarity measure between intuitionistic fuzzy sets and its applications in pattern recognition and clustering analysis." Journal of Applied Mathematics, Statistics and Informatics 19.1 (2023): 61-77.
[6] Bezdek, James C., Robert Ehrlich, and William Full. "FCM: The fuzzy c-means clustering algorithm." Computers & Geosciences 10.2-3 (1984): 191-203.
[7] Cureton, Edward E., and Ralph B. D'Agostino. Factor Analysis: An Applied Approach. Psychology Press, 2013.
[8] Sanger, Terence D. "Optimal unsupervised learning in a single-layer linear feedforward neural network." Neural Networks 2.6 (1989): 459-473.
[9] Specht, Donald F.
"A general regression neural network." IEEE Transactions on Neural Networks 2.6 (1991): 568-576.
[10] Höppner, Frank, et al. Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition. John Wiley & Sons, 1999.
[11] Zhou, Jian, et al. "Credibilistic clustering: the model and algorithms." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 23.04 (2015): 545-564.
[12] Zhang, Yong, et al. "Possibilistic c-means clustering based on the nearest-neighbour isolation similarity." Journal of Intelligent & Fuzzy Systems 44.2 (2023): 1781-1792.
[13] Hu, Zhengbing, et al. "Fuzzy clustering of incomplete data by means of similarity measures." 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON). IEEE, 2019. doi: 10.1109/UKRCON.2019.8879844.
[14] Shafronenko, Alina, et al. "Online credibilistic fuzzy clustering of data using membership functions of special type." CMIS. 2020.
[15] Bodyanskiy, Yevgeniy, Alina Shafronenko, and Sergii Mashtalir. "Online robust fuzzy clustering of data with omissions using similarity measure of special type." Lecture Notes in Computational Intelligence and Decision Making: Proceedings of the XV International Scientific Conference "Intellectual Systems of Decision Making and Problems of Computational Intelligence" (ISDMCI'2019), Ukraine, May 21-25, 2019. Springer International Publishing, 2020.
[16] Bodyanskiy, Ye. V., A. Yu. Shafronenko, and I. N. Klymova. "Online fuzzy clustering of incomplete data using credibilistic approach and similarity measure of special type." Radio Electronics, Computer Science, Control 1 (2021): 97-104.