Novel Methodology for Detecting Effective Features in Patients with Multiple Myeloma

Małgorzata Szatkowska1, Waldemar Wójcik1, Wojciech Legieć2, Iryna Lurie3, Serge Olszewski4, Volodymyr Lytvynenko5, Mariia Voronenko5

1 Lublin University of Technology, ul. Nadbystrzycka 38D, 20-618 Lublin, Poland
2 Centrum Onkologii Ziemi Lubelskiej im. sw. Jana z Dukli, ul. Doktora Kazimierza Jaczewskiego 7, 20-090 Lublin, Poland
3 Ben-Gurion University of the Negev, David Ben Gurion Blvd 1, Beer Sheva, 8410501, Israel
4 Taras Shevchenko National University of Kyiv, 64/13 Volodymyrska Street, Kyiv, 01601, Ukraine
5 Kherson National Technical University, Beryslavske Shose 24, Kherson, 73008, Ukraine

Abstract
Multiple myeloma (MM) is a malignant condition characterized by the uncontrolled growth of abnormal plasma cells and extensive destruction of bone tissue, leading to symptoms such as pain and bone fractures. The disease is caused by chromosomal abnormalities and pathology of the surrounding stromal microenvironment. In this study, we present a novel comprehensive methodology for selecting effective features in a collected dataset of patients with MM and removing irrelevant features from these data. The research presents classical and inductive methods based on the K-means, C-means, and Bayesian hierarchical clustering (BHC) algorithms. The main method used in this study was BHC, and the impact of four internal criteria (silhouette, Dunn index, Calinski-Harabasz index, entropy) on clustering effectiveness was investigated. The proposed noise-elimination technique, used in conjunction with the inductive approach, significantly improves the quality of clustering complex objects. The proposed clustering methodology can be beneficial in several respects for extracting relevant features from the laboratory test results of patients with multiple myeloma.

Keywords
Multiple myeloma, feature selection, K-means, C-means, inductive clustering, Bayesian hierarchical clustering, denoising, data imputation, silhouette, Dunn index, Calinski-Harabasz index, entropy.

1. Introduction

With the development of computer technology, its use in medical diagnostics continues to grow. Although physical examination by a physician is still a valuable diagnostic technique, it is now standard practice to use a variety of modern diagnostic instruments and devices, especially when analyzing the results of laboratory tests such as haematological, cytological, biochemical, and immunological tests. This is especially true for extensive imaging studies such as CT and MRI, the

MoMLeT+DS 2023: 5th International Workshop on Modern Machine Learning Technologies and Data Science, June 3, 2023, Lviv, Ukraine
EMAIL: m.szatkowska@pollub.pl (M. Szatkowska); waldemar.wojcik@pollub.pl (W. Wójcik); Legiec.wojciech@gmail.com (W. Legieć); lurieira@gmail.com (I. Lurie); olszewski.serge@gmail.com (S. Olszewski); immun56@gmail.com (V. Lytvynenko); mary.voronenko@gmail.com (M. Voronenko)
ORCID: 0009-0004-5549-8464 (M. Szatkowska); 0000-0002-0843-8053 (W. Wójcik); 0000-0003-3093-5248 (W. Legieć); 0000-0001-8100-1846 (I. Lurie); 0000-0003-4499-8485 (S. Olszewski); 0000-0002-1536-5542 (V. Lytvynenko); 0000-0002-5392-5125 (M. Voronenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
evaluation of which can vary depending on the radiologist's experience and working conditions (number of tests performed, stress level, fatigue). In many patients the diagnosis may be too radical (overdiagnosis), leading them to believe they have a disease they do not actually have; conversely, some lesions may be missed (underdiagnosis). In this context, it becomes natural to look for solutions that can provide a more objective interpretation of research results. Thanks to specialized algorithms and advanced digital data processing techniques, the diagnosis can be established more quickly and objectively, and the role of the oncologist becomes to control and verify this process.

Bone marrow cancer is a malignant disease that develops as a result of cell mutation. The pathological process is also called myeloma disease or sarcoma. In this pathology, the tissue mutates and stops functioning; as a result, other organs also stop working properly, which has a negative impact on the entire human body. In this study, the authors decided to focus on multiple myeloma [1]. Multiple myeloma (MM) is a tumor disease with uncontrolled proliferation of clonal plasma cells and extensive skeletal bone damage, accompanied by pain and bone fractures, caused by chromosomal abnormalities and stromal microenvironment pathology. The disease is also characterized by the presence of monoclonal protein in the blood and/or urine.

MM accounts for 1% of all cancers and 10-13% of hematological tumors, and for 2% of deaths from all malignant tumors. The disease occurs in all countries of the world in people of all races. In Western countries and the United States, the incidence of MM is 5-10 cases per 100,000 people per year. For example, about 4,000 new cases of MM are diagnosed each year in Italy and 20,000 in the United States. The mortality rate is 4.1 cases per 100,000 people per year. The incidence among the Japanese and Chinese is much lower, at 1 per 100,000 people. About 2,000 people fall ill each year, and a similar number die [1, 2].

MM is a disease of the elderly. The median age at diagnosis is approximately 70 years; only 37% of patients are less than 65 years old by the time the disease is diagnosed. At the age of 65-74 years, 26% of patients fall ill; at the age of 75 years and older, 37%. People aged 80 have the disease 10 times more often than 50-year-olds. The proportion of patients under 40 years old does not exceed 2-3%, and under 30 years old, 0.3%. Men fall ill more often (about 60%) than women. The annual incidence of MM in people aged 65-74 years is about 31 cases per 100,000, and at the age of 75 years and older, up to 46 cases [1, 2]. In the future, the number of elderly MM patients is likely to increase, owing to improved survival rates from new drugs and hematopoietic stem cell transplantation, as well as the increase in life expectancy of the global population as a whole.

Geographically, the incidence varies widely across regions of the world and is highest in industrialized areas of Australia, New Zealand, Europe, and North America [3]. The incidence of MM in the United States averages 4-5 new cases per 100,000 population per year and can reach 9-10 cases among the African American population [4, 5]. In contrast, in East Asian countries, particularly in Japan, this rate is lower and does not exceed 1.2 cases per 100,000 people per year [6].
Countries with a low incidence of MM include South Korea (1.4) [7], China (1.3) [8], and Taiwan (1.8 cases per 100,000 population per year) [9]. So far, MM remains an incurable pathology, and therefore the main goal of treatment is to prolong overall survival (OS). This figure varies from country to country and depends on the quality of care provided. The incidence of myeloma shows a strong correlation with mortality rates in countries with very low incidence rates (less than 1/100,000; ρ = 0.95, p < 0.0001), indicating a significantly shorter survival time in these countries. However, as incidence rates increase, the correlation between incidence and mortality gradually weakens. In countries with incidence rates ranging from 1 to 3 per 100,000, the correlation coefficient drops to 0.58, and in countries with incidence rates greater than 3 per 100,000, it decreases further to 0.36.

In the field of medicine, especially in oncology, modern data collection technologies have enabled the generation of massive datasets containing thousands of features or more. However, the high dimensionality of these datasets makes selecting discriminatory features challenging because of the curse of dimensionality. While several population-based approaches to feature selection have been proposed, few studies address the fact that there can be multiple optimal subsets of features for the task of selecting relevant ones. We propose a feature selection technique that utilizes cluster analysis of features. This technique leverages knowledge about correlation to cluster features, incorporating this knowledge into the coding scheme and search process. The objective is to identify different subsets of features that exhibit very similar or identical classification performance. In addition, we propose the use of both traditional iterative clustering techniques, such as K-means, C-means, and Bayesian hierarchical clustering, and their inductive counterparts, to further enhance the feature selection process.

Figure 1: Relationship between mortality and morbidity in countries with a population of more than 1 million (n = 150) [9].

The main obstacle in multiple myeloma is the difficulty of early detection and diagnosis. It is therefore essential that people with suspected multiple myeloma be evaluated as soon as possible so that a treatment regimen can be established quickly; otherwise, the disease progresses rapidly, leading to death. Feature selection is a crucial preprocessing technique for reducing data dimensionality. In the field of medical diagnostics, it is essential to identify the most significant risk factors associated with a disease. By identifying the most influential features, unnecessary and redundant features can be eliminated from the disease dataset, leading to faster and more accurate results. Prior to applying any algorithm to the data, it is advisable to remove noisy and inconsistent data, both to improve the accuracy of the results and to save time. While reducing the dimensionality of the dataset matters in real-world applications, selecting the most significant features can decrease complexity dramatically [10]. In recent years, intelligent feature selection techniques have been widely applied to healthcare datasets to extract valuable information.
Clinical databases use feature selection techniques for the analysis and prediction of various chronic diseases, including diabetes, cancer, heart disease, strokes, hypertension, thalassemia, and more [11]. Given the abundance of redundant and irrelevant features in medical databases, an efficient feature selection technique is necessary to identify relevant features associated with the disease. However, it is worth noting that non-hierarchical clustering algorithms also have limitations when applied to medical data features.

Sensitivity to initial conditions. The results of non-hierarchical algorithms can heavily depend on the chosen initial conditions or random initialization. Different runs of the algorithm may lead to different clusters and interpretations of results. This can hamper repeatability and reproducibility, especially when working with large datasets or complex structures.

Dependency on hyperparameter selection. Non-hierarchical algorithms require the selection of various hyperparameters, such as the number of clusters or the distance metrics used. An incorrect choice of these hyperparameters can lead to misinterpretation or distortion of results. Finding optimal hyperparameter values can be a challenging task, especially with medical data, where explicit knowledge of the true cluster structure may be lacking.

Scalability. Non-hierarchical algorithms can face scalability issues when processing large medical datasets. Their computational complexity can be high, especially with a large number of features or records. This can lead to performance limitations and longer execution times.

Lack of hierarchical information. Unlike hierarchical algorithms, non-hierarchical algorithms do not preserve a hierarchical structure of clusters, so information about the relationships and hierarchy between clusters may be lost. This is a drawback if hierarchical information matters for data interpretation or further analysis.

These drawbacks need to be considered when choosing a feature clustering technique for medical data, and attention should be paid to its adequacy. Feature selection techniques are of the following types:
• Filter techniques: These analyze the intrinsic properties of the data, disregarding the classifier.
• Wrapper techniques: These employ classifiers to assess the performance of a given feature subset.
• Embedded techniques: These integrate the feature selection process directly into the training of the classifier.

Most of these techniques can perform two primary operations: ranking and subset selection. In the ranking operation, the significance of each individual feature is evaluated, typically without considering potential interactions between elements of the overall feature set. In the subset selection operation, a final subset of relevant features is generated. In some cases, these two operations are executed sequentially, while in others only the selection operation is performed. Generally, subset selection is always supervised, while the degree of supervision in ranking techniques may vary. In our case, the investigated data lacks class labels. Conducting feature clustering in medical data is reasonable even in the absence of class labels and well-defined target features. The reasons and necessity for such an approach are as follows.

Exploring data structure.
Clustering allows for the exploration of data structure and the identification of internal patterns and relationships between features. Even without specific target labels, clustering can help identify groups of similar observations and uncover common characteristics in the data. This can be useful for generating hypotheses, understanding relationships between variables, and informing further research.

Detecting new patient subgroups. Clustering can help discover new subgroups of patients with multiple myeloma who share common medical characteristics, even without explicit target features. This may lead to the discovery of new disease subtypes or conditions with clinical or prognostic significance. Such an approach can be particularly valuable in phenotype classification studies or personalized medicine research.

Data preprocessing for subsequent analysis. Feature clustering can serve as a data preprocessing stage, especially in the absence of class labels. It can help decrease dimensionality, identify the significant features, and prepare the data for further analysis or modeling. This contributes to improving analysis efficiency, enhancing result interpretation, and reducing the influence of noise or uninformative features.

Risk group identification and risk factors. Clustering can help identify patient groups at increased risk of developing specific diseases or conditions. This provides a foundation for further investigation of risk factors and the development of personalized approaches to prevention and treatment.

Overall, feature clustering in medical data has several significant advantages:
• Discovery of new medical subgroups: Cluster analysis can identify patient subgroups with common medical characteristics that may indicate new medical subtypes of diseases or conditions. This can be useful for more accurate diagnosis, personalized treatment, and more effective healthcare.
• Revealing relationships and significant factors: Cluster analysis enables the identification of hidden relationships and significant factors that may be associated with the development or prediction of specific diseases. This contributes to a deeper understanding of diseases, the identification of new risk factors or predictors, and the development of novel approaches to prevention and treatment.
• Support for decision-making in medicine: Feature clustering can be valuable for decision-making in clinical practice. By identifying characteristics associated with specific outcomes or predictions, algorithms or tools can be developed to assist doctors in making informed decisions about diagnosis, treatment, and patient management.
• Simplification and interpretation of data: Cluster analysis can decrease data dimensionality by identifying the most informative and distinguishing features. This significantly simplifies data analysis, enhances understanding and interpretation of results, and facilitates data visualization.
• Advancement of personalized medicine: Feature clustering can serve as a foundation for personalized medicine, where treatment and care are tailored to the unique characteristics and needs of each patient. This involves identifying subgroups and determining optimal treatment strategies.

Both hierarchical and non-hierarchical clustering can be used for feature clustering.
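To make the notion of feature clustering concrete, the sketch below groups the columns (features) of a patients-by-tests matrix rather than its rows, using one non-hierarchical and one hierarchical method. It is a minimal illustration under stated assumptions (scikit-learn is available, and a random matrix stands in for real laboratory data), not the pipeline used in this study.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

# Toy stand-in for a patients-by-tests matrix (rows: patients, columns: features).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 12))

# Clustering *features* means clustering the transposed matrix: each row of X.T
# is one feature, described by its values across all patients.
features = X.T

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
hc_labels = AgglomerativeClustering(n_clusters=3).fit_predict(features)

for j in range(features.shape[0]):
    print(f"feature x{j + 1}: k-means cluster {km_labels[j]}, hierarchical cluster {hc_labels[j]}")
```

Features that land in the same cluster behave similarly across patients, so one representative per cluster can stand in for the rest, which is the intuition behind the feature selection scheme developed below.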
However, non-hierarchical clustering algorithms have several advantages over hierarchical ones when addressing feature selection tasks on large medical datasets.

High performance: Non-hierarchical algorithms typically run faster than hierarchical ones, especially on large datasets. This is particularly important for medical data, which often contain a large number of features and records.

Flexibility and control: Non-hierarchical algorithms offer more flexibility in controlling the clustering process and choosing settings for specific needs and tasks. One can choose the number of clusters, the distance metrics used, cluster merging and splitting procedures, and more. This allows more precise customization of the feature selection process and adaptation to the specific requirements of medical research.

Better scalability: Non-hierarchical clustering algorithms usually scale better on large datasets. Hierarchical algorithms may incur high computational complexity and require large amounts of memory when analyzing large medical datasets, leading to limitations and performance issues.

Better robustness to outliers: Non-hierarchical clustering algorithms are typically more robust to outliers and noise in the data, as they do not construct a complete hierarchical structure. This is particularly important in the analysis of medical data, where anomalies or incorrect values may be present.

It should nevertheless be noted that the choice between non-hierarchical and hierarchical clustering depends on the specific context and task. Hierarchical algorithms are linked to the creation of dendrograms. In agglomerative algorithms, each object is initially treated as a separate cluster, and these clusters are progressively merged as the algorithm proceeds. In practice, hierarchical cluster analysis is used for a limited number of features and is not applicable to large datasets, because of the complexity of the agglomerative algorithm and the excessively large dendrograms it produces. In iterative algorithms, the data is divided into several clusters at once, the number of which is determined by certain criteria. Elements are then moved between clusters to optimize a specific criterion, such as minimizing variability within clusters [11]. Nevertheless, iterative clustering algorithms, such as k-means, exhibit a number of limitations:
1. Reaching the global minimum of the overall sum of squared deviations is not guaranteed, only one of the local minima.
2. The outcome is influenced by the initial selection of cluster centers, and the optimal choice is unknown.
3. Prior knowledge of the number of clusters is required.

One of the main drawbacks of existing iterative algorithms is their high subjectivity. To enhance the objectivity of clustering, inductive techniques based on the group method of data handling [11] can be employed. These techniques process the data through two equally sized subsets and make the final decision on partitioning the objects into clusters based on the combined use of external relevance criteria and internal criteria for assessing clustering quality. Therefore, developing algorithms and clustering techniques based on inductive modeling to solve the problem of cancer subtype identification is an important task.
As noted above, non-hierarchical clustering algorithms also have drawbacks when clustering features in medical data: sensitivity to initial conditions and random initialization, dependency on the choice of hyperparameters (such as the number of clusters or the distance metrics used), scalability problems on large datasets, and the loss of hierarchical information about relationships between clusters. These drawbacks should be taken into account when choosing a technique for clustering medical data features, and attention should be paid to its adequacy.

The main objective of this study is to present a novel comprehensive method for selecting effective features in a collected dataset of multiple myeloma patient test results and removing irrelevant features from these data. This research presents classical and inductive methods based on the K-means, C-means, and Bayesian hierarchical clustering algorithms. Comparative studies of the presented methods have been conducted. This approach can help identify influential features in the dataset. Variables that change the cluster structure during the algorithm's execution are identified and selected as significant variables in the dataset. Additionally, by identifying effective variables in clustering, it is possible to select cluster labels based on the identified variables.

2. Related Works

Currently, there are numerous approaches to feature selection, some of which are outlined below. In [12], the authors presented a new feature selection algorithm for symbolic attributes based on measuring the distance between feature values. In [13], the author evaluated the effectiveness of feature selection techniques based on inter-class and probability distances in the preprocessing stage for constructing decision trees; this research showed that, overall, the proposed technique outperforms the use of probability measures. Researchers in [14] proposed a new feature selection algorithm that enhances the accuracy of classification methods by using fuzzy entropy measures to select relevant features. In [15], the authors applied fuzzy approximation operators to feature selection.
The authors of [16] used a hybrid approach combining genetic algorithms and generalized regression neural networks to select a subset of features. In [17], a novel approach to feature selection was introduced, utilizing ℓ2,1-norm minimization and noise elimination. [18] presented a hybrid feature selection algorithm combining mutual information and rough sets. An ant colony optimization (ACO) algorithm was applied to feature selection and elimination in electromyography signal classification in [19]. The application of genetic algorithms and particle swarm optimization in a hybrid feature selection technique was demonstrated in [20]. [21] proposed a feature selection method based on Case-Based Reasoning (CBR) that incorporated feature reduction techniques and cluster analysis. [22] proposed an unsupervised feature selection algorithm based on the Regularized Self-Representation (RSR) technique, selecting the most significant features for clustering and classification tasks. A hybrid feature selection algorithm combining a binary quantum-inspired gravitational search algorithm with the k-nearest neighbor classifier was presented in [23], showing favorable results compared to other techniques. [24] proposed a hybrid feature selection algorithm using particle swarm optimization and correlation information. [25] introduced a combined method based on Ant Colony Optimization (ACO) and Bee Colony Optimization (BCO) for selecting significant features in a dataset, demonstrating high effectiveness. Feature selection methods based on random projections, singular value decomposition, and K-means clustering were proposed in [26, 27], where features were clustered using the K-means algorithm.

As this compilation of research shows, various approaches using evolutionary algorithms have been developed for feature selection, such as [12, 13, 14, 20, 24], but the K-means technique was not utilized in them. Whereas evolutionary techniques incur high computational complexity and long execution times, the method proposed in this work can be effective due to its short running time. In [28, 29], an integrated method for selecting effective features in a dataset and removing irrelevant ones was proposed, based on the K-means clustering algorithm. In this technique, variables that alter the cluster structure during the algorithm's execution are identified and selected as significant variables in the dataset. Additionally, to determine the effective variables, cluster labels were selected based on the identified variables. The results showed that the proposed method achieved higher classification efficiency and could eliminate irrelevant and redundant features more effectively than other techniques. A drawback of evolutionary algorithms is that they must explore numerous solution domains and spaces to reach an optimal answer, which can be time-consuming. In contrast, the exploration of solution domains in this approach is selective, and there is no need to search the entire space.

The remainder of this work is organized as follows. Section 3 describes the problem statement. The proposed technique is presented in Section 4. Section 5 presents the experimental results, Section 6 provides a discussion, and the final section concludes the work.

3. Problem Statement

A flowchart of the identification of experimental data obtained in the examination of patients with multiple myeloma is presented in Figure 2.
The study incorporates the principles of inductive modeling into the process of inductive clustering, which encompasses the following steps [30]:
1. Missing data recovery.
2. Normalization of the features of the objects under study, i.e., bringing them to the same range with a common feature median.
3. Noise removal from the data.
4. Division of the original dataset into two equally sized subsets.
5. Determination of an external criterion, or a set of significance criteria, for selecting the optimal clustering for the two equally sized subsets.
6. Selection or development of a base clustering algorithm used as a component of the inductive algorithm for objective clustering.

The process of extracting relevant features consists of the following stages: data preprocessing and feature selection, identification of relevant features, verification of results, visualization, and description of clusters.

Figure 2: Procedure for identification of features obtained in the examination of patients with multiple myeloma (recover missing data, normalize data, remove noise from data, clustering, validation of clustering, analysis and description of the obtained clusters, visualization of results).

In most cases, laboratory datasets are multidimensional and contain noise and missing values. This work employs six clustering techniques:
a) Classical clustering techniques: k-means, its fuzzy version c-means, and Bayesian hierarchical clustering.
b) Inductive clustering techniques: inductive k-means, its fuzzy version inductive c-means, and inductive Bayesian hierarchical clustering.

The evaluation of results is done using the Dunn index, the Calinski-Harabasz index, entropy, and graphical visualization using the silhouette index. The goal of the work is to develop inductive algorithms for feature clustering based on the k-means, c-means, and hierarchical Bayesian clustering algorithms, and to evaluate the quality of the obtained results.

Hierarchical algorithms are associated with constructing dendrograms. In agglomerative algorithms, all objects are initially considered separate clusters and are merged during the algorithm's execution. Hierarchical clustering is suitable for a small number of objects and is not suitable for large datasets, owing to the complexity of the agglomerative algorithm and the resulting oversized dendrograms. In iterative clustering algorithms, the data is immediately divided into multiple clusters, and the number of clusters is determined based on certain conditions. Elements are then moved between clusters to optimize a specific criterion, such as minimizing within-cluster variability [30]. As noted in the Introduction, iterative clustering algorithms, specifically k-means, possess certain limitations: the global minimum of the overall sum of squares is not guaranteed, only one of the local minima; the results depend on the initial selection of cluster centers, and the optimal selection is unknown; and the number of clusters must be known in advance. High subjectivity is one of the key drawbacks of existing iterative algorithms. The objectivity of clustering can be increased by using inductive modelling techniques for complex systems based on the inductive data processing approach [31].
In this approach, data processing is performed on two equally sized subsets, and the final decision regarding the partition of objects into clusters is made based on the combined use of external relevance criteria and internal criteria for evaluating clustering quality. Each technique has its own advantages and limitations and is oriented towards specific types of data. High subjectivity is a significant obstacle in existing procedures, meaning that high-quality processing of one sample collection does not yield an equivalent level of results on another comparable dataset. The objectivity of clustering can be enhanced by applying inductive modelling techniques for complex systems based on the Group Method of Data Handling (GMDH) [32, 33], in which two subsets of equal size drive the data processing, and the final judgment on the partition of objects into clusters follows external relevance rules and internal clustering quality assessment recommendations.

The technology of inductive clustering [31, 33, 34] obeys the following principles:
1. The heuristic self-organization strategy, which involves sequentially generating increasingly complex candidate models in order to select the best ones based on a specific external criterion or a group of criteria for evaluating the quality of the data grouping.
2. The postulate of external addition, which repeatedly requests "fresh knowledge" for objective model verification.
3. The policy of non-final decisions, which involves generating not just a single result but a set of intermediate results, with subsequent selection of the best ones.

Implementing these principles in a modified form is a prerequisite for creating an inductive framework for the objective clustering of complex data. Therefore, the development of hybrid object clustering systems based on inductive modelling techniques for complex systems is a relevant task both theoretically and practically. There are numerous clustering algorithms available: some divide a dataset into a known number of groups, while others determine the number of clusters automatically. One of the objectives of this work is to conduct a comparative study of the effectiveness of inductive clustering based on the K-means, C-means, and Bayesian hierarchical clustering (BHC) algorithms. Thus, the development of algorithms and clustering techniques for feature selection based on inductive modelling, to solve the problem of extracting effective features and removing irrelevant ones from laboratory test results in multiple myeloma, is a relevant task.

4. Materials and Techniques

4.1. Data

The data consists of a collection from 21 patients with varying stages of multiple myeloma severity. Data was gathered from 213 cytological, haematological, immunological, and biochemical tests, with a total of 525 examinations conducted during the screening process.

4.2. Missing Data Imputation

To address the issue of missing data, several imputation techniques can be used [36, 37]:

Mean imputation: This technique replaces missing values with the mean value of the column. It is suitable when the data is normally distributed and there are no significant outliers. Mean imputation can be performed for the entire column or only for specific rows with missing values.

Median imputation: Similar to mean imputation, the median can be used to fill in missing values instead of the mean.
The median is more robust to outliers and may be preferred if the data contains outliers or is not normally distributed.

Interpolation: Interpolation fills in missing values based on neighboring values. Various interpolation techniques are available, such as linear, cubic, or nearest-neighbor interpolation. The choice depends on the data characteristics and context.

Regression models: If other features can be used to predict the missing values, a regression model can be built in which the missing values are the dependent variable and the other features are independent variables; this model is then used to predict and fill in the missing values.

Multiple imputation: Multiple imputation is a statistical technique that generates several possible values for each missing value; these values can then be used for data analysis or modelling. Multiple imputation is based on modelling and random generation of values, taking into account the relationships between variables.

Machine learning techniques: Machine learning techniques such as random forests or gradient boosting can be used to impute missing values, training a model on the other features in the data to predict the missing values.

Each approach has its advantages and limitations, and the choice of technique depends on the specific dataset and the nature of the missing data. The choice of a specific technique for filling in missing numerical data depends on several factors:

Data characteristics: Examine the distribution and properties of the data. For example, if the data is heavily skewed or contains outliers, techniques that are robust to outliers may be preferable. If the data exhibits temporal dependence, techniques that account for this dependence may be more suitable.

Context of analysis: Consider the purpose of the analysis and the specific nature of the data. Certain techniques may be more appropriate for particular types of analysis or modelling.

Proportion of missing values: If the missing values constitute a significant portion of the data, removing them or imputing them with constants may distort the results. In such cases, machine learning techniques or multiple imputation may be preferable.

Available resources: Some techniques may require greater computational resources or expert knowledge. Ensure that the chosen technique is feasible given the available resources.

Results verification: It is important to assess how the chosen imputation technique affects the final data analysis or model. Different techniques can yield different results, and it is crucial to ensure that the imputation does not distort the final conclusions.

In general, it is recommended to conduct several experiments with different techniques and compare their results. This helps in selecting the most suitable technique for filling in missing numerical data for the specific dataset and analysis task. In this study, the k-nearest neighbor (k-NN) technique was used. k-NN is an automatic object classification technique whose main principle is that an object is assigned to the class most common among its neighbors [38]. The neighbors are selected from a set of objects whose classes are already known and, using the key parameter k of the technique, the most common class among them is determined.
The k-nearest neighbor technique is based on the assumption that if objects are close in terms of n-1 properties, they are also close in terms of the n-th property. Filling in missing values in a data table with k-NN works as follows: first, among all rows of the table, the k rows most "similar" to the row containing the missing value are identified. The measure of "similarity" between rows (objects) is the Euclidean distance between rows in the column (property) space: the smaller the Euclidean distance between objects in the property space, the more "similar" they are to each other. The column containing the predicted value is referred to as the target column. To obtain a prediction for the unknown value of the target property, the values of the target property from the k nearest neighbors are averaged, weighted by the inverse of the Euclidean distance to the row containing the missing value:

a_ij = ( Σ_{l=1}^{k} a_lj · C_l ) / ( Σ_{l=1}^{k} C_l ),   (1)

where C_l is the weight (competence) of the l-th nearest neighbor, inversely proportional to the Euclidean distance r_li between the l-th row and the i-th row containing the missing value,

r_li = sqrt( Σ_{p≠j} (a_lp − a_ip)² ),   (2)

where the sum runs over all columns p except the target column j. In the particular case k = 1, the k-nearest neighbor technique substitutes the missing value with the target property value of the object most similar to the predicted object. The main feature that distinguishes k-NN from other techniques is the absence of a training stage. One of the main advantages of this approach is the ability to update the training dataset without retraining the classifier. This property can be useful, for example, when the training dataset is frequently augmented with new data and retraining takes too much time. The main drawback of the k-nearest neighbor technique is the time-consuming nature of the classification stage.

4.3. Removing Noise from Data

For many arrays of experimental data aimed at uncovering the relationship between diverse characteristics of the studied phenomena, the relationship between quantitative measures of similarity of two characteristics and the category number to which a selected characteristic belongs has a fundamentally stochastic nature [40, 41]. If the category numbers are represented as an ordered set, then the relationship between the quantitative similarity measure and the category number S[n] can be viewed as a stochastic process with an uncertain probability distribution. The presence of numerous undetermined parameters when describing such arrays with probability-theoretic techniques gives rise to various challenges in constructing continuous predictive models for these processes [42]. If we treat these ordered arrays as generalized signals with noise,

S[n] = x[n] + ξ[n],   (3)

where x[n] are the discrete values of a smooth defining function that describes the shape of the signal and ξ[n] are the discrete values of a symmetric random process, then applying standard discrete signal processing techniques to them allows the extraction of the defining component and the construction of a predictive model on the array x[n]. One of the commonly used techniques for processing a noisy signal is transforming it with a moving average. A moving average is a discrete sequence of data constructed by averaging several consecutive values of another discrete sequence; in our case, this is the investigated signal S[n]. It can be seen as a type of mathematical convolution.
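Before formalizing the moving average, the weighted k-NN imputation of Eqs. (1)-(2) can be sketched in code. This is a minimal sketch under stated assumptions: complete rows serve as donors, a single column j is imputed at a time, and the competence weights C_l are taken as inverse Euclidean distances computed over the remaining columns.

```python
import numpy as np

def knn_impute(data, row, col, k=3, eps=1e-12):
    """Fill data[row, col] with the distance-weighted mean of the target column
    over the k nearest complete rows (Eqs. 1-2)."""
    complete = ~np.isnan(data).any(axis=1)       # candidate donor rows
    donors = np.where(complete)[0]
    other = [p for p in range(data.shape[1]) if p != col]
    # Euclidean distances r_li in the space of all columns except the target one
    d = np.sqrt(((data[donors][:, other] - data[row, other]) ** 2).sum(axis=1))
    nearest = donors[np.argsort(d)[:k]]
    weights = 1.0 / (np.sort(d)[:k] + eps)       # competence C_l ~ 1 / r_li
    return float((data[nearest, col] * weights).sum() / weights.sum())

# Usage: a small table with one missing value in column 1 of row 3.
tbl = np.array([[1.0, 2.0, 3.0],
                [1.1, 2.1, 3.2],
                [5.0, 6.0, 9.0],
                [1.2, np.nan, 3.1],
                [5.1, 6.2, 9.1]])
tbl[3, 1] = knn_impute(tbl, row=3, col=1, k=2)
print(tbl[3])   # the imputed value is dominated by the two similar rows 0 and 1
```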
If we represent the original sequence as y_1, …, y_n, then its two-sided moving average is given by

z_t = (1 / (2k + 1)) Σ_{j=−k}^{k} y_{t+j},   t = k + 1, k + 2, …, n − k.   (4)

Thus z_{k+1}, …, z_{n−k} forms a new sequence based on the average values of the original time series {y_t}. Similarly, the one-sided moving average of {y_t} is given by

z_t = (1 / (k + 1)) Σ_{j=0}^{k} y_{t−j},   t = k + 1, k + 2, …, n.   (5)

Moving averages are used in two main ways. Two-sided (weighted) moving averages are applied to filter a discrete sequence, suppressing additive noise, in order to estimate or extract the underlying trend [43, 44]. One-sided (weighted) moving averages are used as simple forecasting techniques for time series. Typically, the noisy discrete sequence consists of a smooth underlying trend and additive symmetric noise: y_t = f(t) + ε_t, where f(t) is a smooth continuous function of t and {ε_t} is additive noise with zero mean. In our case, the power of the additive noise significantly exceeds the power of the smooth trend. Suppressing the additive noise and estimating f(t) is referred to as filtering, and the two-sided moving average is one way to accomplish this:

f̂(t) = (1 / (2k + 1)) Σ_{j=−k}^{k} y_{t+j},   t = k + 1, k + 2, …, n − k.   (6)

The idea behind using moving averages for filtering is that experimental data or observations, presented as an ordered sequence, are likely to be close in value, so averaging the points located near an observation provides a reasonable estimate of the trend at that observation. The moving average eliminates the stochastic component of the data, leaving the smooth trend component. Moving averages do not allow estimating f(t) near the ends of the series (in the first and last k periods), which can pose difficulties when the trend estimate is used for forecasting or for analyzing the most recent data. Each average consists of 2k + 1 observations, and it is sometimes referred to as a (2k + 1)-point moving average filter or smoother. The larger the value of k, the flatter and smoother the estimate of f(t) will be. A smooth estimate is usually desirable, but a flat estimate is biased, especially near peaks and troughs of f(t). When {ε_t} is a white noise sequence (i.e., independent and identically distributed with zero mean and variance σ²), the bias is given by E[f̂(x)] − f(x) ≈ (f″(x) / 6) · k(k + 1) and the variance by V[f̂] ≈ σ² / (2k + 1). Thus, there is a trade-off between increasing bias (with larger k) and increasing variance (with smaller k).

4.4. Inductive Clustering Technologies

4.4.1. Normalization

The data was normalized based on its characteristics using the following formula:

x′_ij = (x_ij − med_j) / max_i |x_ij − med_j|,   (7)

where x_ij is the value of attribute i in column j, x′_ij is the normalized value of this attribute, and med_j is the median of column j. This normalization was chosen because, as a result, the attributes in all columns have the same median, with a maximum range of variation from -1 to 1, while the share of each column's data falling within the interquartile range (50%) is the largest compared with other normalization techniques.
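The two preprocessing steps just described, two-sided moving-average filtering (Eq. 6) and median normalization (Eq. 7), can be sketched together as follows. This is a minimal numpy sketch, not tied to the exact edge handling of the study's implementation; the window sizes 3, 9, and 27 referenced in Section 5 correspond to k = 1, 4, and 13.

```python
import numpy as np

def moving_average(y, k):
    """Two-sided (2k+1)-point moving average, Eq. (6); the first and last k
    points are left as NaN because the filter is undefined there."""
    out = np.full(len(y), np.nan)
    for t in range(k, len(y) - k):
        out[t] = y[t - k : t + k + 1].mean()
    return out

def median_normalize(X):
    """Column-wise median normalization, Eq. (7): common median 0, range [-1, 1]."""
    med = np.median(X, axis=0)
    denom = np.abs(X - med).max(axis=0)
    denom[denom == 0] = 1.0                       # guard against constant columns
    return (X - med) / denom

rng = np.random.default_rng(1)
t = np.arange(200)
noisy = np.sin(t / 20.0) + rng.normal(scale=1.5, size=t.size)  # noise-dominated signal
smoothed = moving_average(noisy, k=4)             # 9-point window (2k + 1 = 9)

X = rng.normal(size=(30, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
Xn = median_normalize(X)
print(np.median(Xn, axis=0), Xn.min(), Xn.max())  # medians ~0, values within [-1, 1]
```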
4.4.2. Division into Two Equally Sized Subsets

The procedure for dividing the original set of objects Ω into two equally sized disjoint subsets Ω_A and Ω_B consists of the following steps [45]:
1. Calculation of the pairwise distances between objects in the original data sample.
2. Selection of the pair of objects with the minimal distance between them:

d(X_s, X_p) = min_{i,j} d(X_i, X_j).   (8)

3. Assignment of the object X_s to the subset Ω_A and of the object X_p to the subset Ω_B.
4. Repetition of steps 2-3 for the remaining objects. If the number of objects is odd, the last object is assigned to both subsets.

4.4.3. Inductive k-Means Algorithm

The k-means algorithm is a machine learning method designed to solve the clustering problem. It is a non-hierarchical, iterative clustering technique that has gained popularity for its simplicity, ease of implementation, and high-quality results. The algorithm was first developed independently by the mathematicians Hugo Steinhaus [45] and Stuart Lloyd [46] in the 1950s, and it gained further attention with the publication of MacQueen's work [47] in 1967. The k-means algorithm is based on the expectation-maximization (EM) algorithm, which is also used for Gaussian mixture models. The main idea behind k-means is to randomly assign data points to clusters and then iteratively update the cluster centroids based on the mean of the points assigned to each cluster. In each iteration, the data points are reassigned to the cluster with the closest centroid according to a chosen distance metric. The objective of the k-means algorithm is to divide a set of n observations into k clusters, assigning each observation to the cluster with the closest centroid, so as to minimize the distance between each observation and its assigned cluster centroid. The inductive version proceeds as follows.

Step 1. Start.

Step 2. Formation of the initial set Ω of studied objects. The data is presented in the form of a matrix Ω = {x_ij}, i = 1, …, n; j = 1, …, m, where n is the number of rows (objects under investigation) and m is the number of columns (features characterizing the objects).

Step 3. Data preprocessing (data normalization):
• Median normalization (feature median) is obtained by centering and scaling all data attributes with the median:

z_ij = (x_ij − med_j) / mad_j,

where x_ij (z_ij) is the i-th observation (the i-th normalized observation) in the j-th variable, med_j = med_i(x_ij) is the median of the j-th variable, and mad_j = mad_i(x_ij) is the mean absolute deviation of the j-th variable.
• Normalization using a standardized score (z-score) is a measure of the relative spread of an observed or measured value, showing how many standard deviations it lies from the mean. It is a dimensionless statistic used to compare values of different dimensions or measurement scales:

z_ij = (x_ij − X̄_j) / S_xj,   (9)

where X̄_j is the mean and S_xj is the standard deviation of the j-th variable. The best normalization technique depends on the data to be normalized.
In practice, the z-score is the most commonly used normalization [48].

Step 4. Division of Ω into two equally sized subsets according to the procedure above. The resulting subsets Ω_A and Ω_B can be formally represented as:

Ω_A = {x_ij^A}; Ω_B = {x_ij^B}; i = 1, …, n_A = n_B; n_A + n_B = n; j = 1, …, m.   (10)

Step 5. Choice of the initial number of clusters k = k_min.

Step 6. Configuration of the k-means clustering algorithm. For each of the two subsets:

Step 7. Sequential clustering and fixing of the clusters.

Step 8. Calculation of the internal clustering quality criteria:
• Silhouette: SWC = (1/K) Σ_{j=1}^{K} S_xj → max;
• Dunn index: DI(k) = min_{i≠j} δ(C_i, C_j) / max_l Δ(C_l) → max;
• Calinski-Harabasz index: QC_CH = (Q_CB · (N − K)) / (Q_CW · (K − 1)) → max;
• Entropy: PE = −(1/Q) Σ_{q=1}^{Q} Σ_{k=1}^{K} u_qk · ln(u_qk), PE ∈ [0, ln K] → min.

Step 9. Calculation of the external balance criterion:

ECB = (IC_A − IC_B)² / (IC_A + IC_B)² → opt.

Step 10. If the value of the balance criterion reaches the optimum, then:

Step 11. The resulting clustering is fixed; otherwise the number of clusters is increased by 1 and steps 5-9 are repeated.

Step 12. Determination of the optimal number of clusters k_opt.

Step 13. Clustering of the full set Ω of objects under study, fixing the clusters.

Step 14. Validation of the clustering results.

Step 15. Visualization of the clustering results.

Step 16. End.

A compact sketch of this inductive loop is given below, after the description of the remaining two base algorithms.

4.4.4. Inductive Fuzzy c-Means Algorithm

The fuzzy c-means clustering technique (also known as fuzzy clustering, soft k-means, or c-means) is used to partition a given set of elements into a specified number of fuzzy sets. It can be considered an enhanced version of the k-means algorithm in which the degree of membership (or responsibility) of each element in each cluster is calculated. The original c-means algorithm was developed in 1973 [49] and further improved in 1981 [50]. The pseudocode of the fuzzy c-means clustering algorithm is presented in Figure 3.

Figure 3: Pseudocode of the c-means algorithm for solving feature clustering problems.

4.4.5. Inductive Bayesian Hierarchical Algorithm

The Bayesian hierarchical clustering (BHC) algorithm, proposed in [51], differs from other hierarchical clustering algorithms that use fixed distance measures such as the Euclidean or Manhattan distance. Instead, BHC uses a probabilistic distance measure, where the distance represents the probability of data elements belonging to a particular cluster. This probabilistic approach is crucial during the iterative merging of clusters to form new clusters within the hierarchical structure [50]. BHC is an algorithm for hierarchical agglomerative clustering that employs a Bayesian probabilistic distance measure. It follows a bottom-up approach, starting with all data elements in separate clusters and iteratively merging them until a full merge occurs. The merging process is guided by probabilities pre-computed using Bayes' theorem. The output is a dendrogram illustrating the hierarchical structure derived from the input dataset.

The current focus lies in addressing the challenges of clustering complex high-dimensional data in the presence of high levels of noise. In this study, high-dimensional data refers to data in which the dimensionality of the feature space is equal to or significantly greater than the number of objects being analyzed. Along with high dimensionality, the data exhibit specific characteristics such as the level and specificity of the noise component, arising from biological processes or imperfections in the data generation system.
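Here is the compact sketch, promised above, of the inductive loop shared by all three base algorithms (Steps 4-12 of Section 4.4.3). It is a minimal sketch under stated assumptions: scikit-learn's KMeans stands in for the base clusterer, the silhouette coefficient serves as the internal criterion IC, the split of Ω simply alternates rows instead of following the pairwise-distance procedure of Section 4.4.2, and the optimum of the balance criterion is taken as its minimum.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def inductive_kmeans(X, k_min=2, k_max=10, seed=0):
    """Choose the number of clusters that balances the internal criterion
    across two equally sized subsets (external balance criterion ECB)."""
    A, B = X[0::2], X[1::2]              # simplified stand-in for the 4.4.2 split
    best_k, best_ecb = None, np.inf
    for k in range(k_min, k_max + 1):
        ic = []
        for part in (A, B):
            labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(part)
            ic.append(silhouette_score(part, labels))
        ecb = (ic[0] - ic[1]) ** 2 / (ic[0] + ic[1]) ** 2   # Step 9, taken to -> min
        if ecb < best_ecb:
            best_k, best_ecb = k, ecb
    # Step 13: cluster the full dataset with the selected number of clusters
    final = KMeans(n_clusters=best_k, n_init=10, random_state=seed).fit_predict(X)
    return best_k, best_ecb, final

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, size=(50, 5)) for c in (0.0, 2.0, 4.0)])
k_opt, ecb, labels = inductive_kmeans(X)
print(f"k_opt = {k_opt}, ECB = {ecb:.4g}")
```

The same loop applies to the fuzzy c-means and BHC variants by swapping the base clusterer and, if several internal criteria are combined, by summing the squared differences and sums over all criteria, as in Step 6.3 below.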
The increasing demand for accurate detection and identification systems across various conditions has led to a growing interest in extracting information from complex high-dimensional data. While numerous clustering algorithms exist, each with its own advantages and shortcomings, their subjectivity poses a significant drawback: achieving high-quality clustering on one dataset does not guarantee similar results on another. To enhance the objectivity of clustering, inductive techniques based on the group method of data handling [51] can be employed. These techniques process the data through two equally sized subsets and make the final decision on partitioning objects into clusters based on the combined use of external relevance criteria and internal clustering quality assessment. The development of hybrid algorithms and techniques for clustering objects based on inductive modeling of complex systems therefore remains a pressing issue in both theory and practice. The implementation of the algorithm includes the following steps.

Step 1. Start.

Step 2. Data preprocessing to decrease the dimensionality of the feature space using the Shannon entropy:

H = −K · Σ_{i=1}^{n} p_i · ln(p_i).

Step 3. Formation of the initial set Ω of studied objects.

Step 4. Division of Ω into two equally sized subsets according to the procedure above. The resulting subsets Ω_A and Ω_B can be formally represented as:

Ω_A = {x_ij^A}; Ω_B = {x_ij^B}; i = 1, …, n_A = n_B; n_A + n_B = n; j = 1, …, m.

Step 5. Configuration of the BHC clustering algorithm.

Step 6. For each number of clusters k ∈ [k_min, k_max]:

Step 6.1. Sequential clustering and fixing of clusters for Ω_A = {x_ij^A} and Ω_B = {x_ij^B}.

Step 6.2. Calculation of the internal clustering quality criteria for Ω_A and Ω_B:
• Silhouette: SWC = (1/K) Σ_{j=1}^{K} S_xj → max;
• Dunn index: DI(k) = min_{i≠j} δ(C_i, C_j) / max_l Δ(C_l) → max;
• Calinski-Harabasz index: QC_CH = (Q_CB · (N − K)) / (Q_CW · (K − 1)) → max;
• Entropy: PE = −(1/Q) Σ_{q=1}^{Q} Σ_{k=1}^{K} u_qk · ln(u_qk), PE ∈ [0, ln K] → min.

Step 6.3. Calculation of the external balance criterion over the p internal criteria:

ECB = ( Σ_{i=1}^{p} (IC_A^i − IC_B^i)² ) / ( Σ_{i=1}^{p} (IC_A^i + IC_B^i)² ) → min.

Step 6.4. If the value of the balance criterion is not minimal, steps 6.1-6.3 are repeated; otherwise:

Step 7. The minimum value of the external balance criterion is fixed.

Step 8. Determination of the optimal number of clusters k_opt.

Step 9. Clustering of the full set Ω of objects under study, fixing the clusters.

Step 10. End.

4.5. Clustering Quality Assessment

The following criteria were used to assess clustering quality.

Silhouette index [52]:

SWC = (1/K) Σ_{j=1}^{K} S_xj → max,   (11)

where K represents the number of clusters and S_xj denotes the optimal membership of the element x_j in cluster p. The silhouette is a technique for interpreting and checking consistency within data clusters. The silhouette value measures how similar an object is to its own cluster (cohesion) compared with other clusters (separation). It ranges from -1 to +1, where a high value indicates that the object agrees well with its own cluster and poorly with neighboring clusters. If most objects have high values, the clustering configuration is appropriate; if many points have low or negative values, the configuration may have too many or too few clusters.
The best partition is characterized by the maximum SWC, which is achieved when the distance within a cluster is small and the distance between elements of neighboring clusters is large.

Dunn index [53]. This is a metric for evaluating clustering algorithms that compares inter-cluster separation with cluster diameter; the higher the index value, the better the clustering. The purpose of this index is to identify clusters that are compact, with small differences between cluster members, and well separated, so that objects of different clusters lie farther apart from each other than the dispersion within a cluster. A higher Dunn index thus indicates better clustering. One shortcoming of this index is its high computational cost as the number of clusters and the dimensionality of the data increase:

DI(k) = min_{i≠j} δ(C_i, C_j) / max_l Δ(C_l).   (12)

Calinski-Harabasz index [54]:

QC_CH = (Q_CB · (N − K)) / (Q_CW · (K − 1)) → max,   (13)

where N represents the total number of objects in the dataset and K denotes the number of clusters being considered. The Calinski-Harabasz index, also referred to as the variance ratio criterion, calculates the ratio of the between-cluster dispersion to the within-cluster dispersion over all clusters. The highest index value indicates the most optimal cluster structure.

Entropy [55]:

PE = −(1/Q) Σ_{q=1}^{Q} Σ_{k=1}^{K} u_qk · ln(u_qk), PE ∈ [0, ln K].   (14)

Entropy is a quantitative measure of the organization or disorder within a system. The entropy of a partition reaches its minimum value when the system is highly organized (for a perfect partition, the entropy is zero). In other words, the higher the degree of membership of an element in a specific cluster (and the lower its membership in other clusters), the lower the entropy value, indicating a more accurate clustering.

5. Experiment and Results

5.1. Characterization of Pre-processing Results

As a result of applying the missing-data imputation, normalization, and moving-average filtering described in Section 4, three datasets were obtained, as shown in Figure 4.

Figure 4: Data surface before and after moving-average preprocessing: a) the original set of descriptor vectors; b) moving-average noise suppression with a 3-point window; c) with a 9-point window; d) with a 27-point window.

Figure 5: Results of suppressing symmetric additive noise by the moving-average technique at different window values. Graph a) corresponds to window value 3, graph b) to 9, and graph c) to 27. Curve 1 in all graphs corresponds to the original dataset; curve 2 corresponds to the dataset with noise suppressed at the corresponding moving-average window value.

5.2. Clustering Results

After data normalization and data reduction combined with the moving-average techniques, the non-inductive k-means, c-means, and hierarchical Bayesian clustering algorithms were applied to the three 213 × 525 datasets. The results are presented in Table 1.
Table 1
Results of determining the number of clusters by the k-means, c-means and hierarchical Bayesian clustering technologies, with clustering quality assessment

                                        K-means     C-means      Bayesian Hierarchical Clustering
Size                                    213x525     213x525      213x525
Technique for determining
the number of clusters                  Elbow       Silhouette   Elbow
Number of clusters                      3           8            3
Silhouette                              0.139       0.082        0.115
Dunn Index                              0.124       0.126        0.127
Calinski-Harabasz Index                 62.263      25.150       50.590
Entropy                                 1.042077    1.82242      1.011493

After data normalization, data reduction, and the combined moving-average filtering, the inductive k-means, c-means, and hierarchical Bayesian clustering technologies were applied to the three datasets, each divided into sets A and B as matrices of size 107x525. The results are presented in Table 2.

Table 2
Results of determining the number of clusters by the inductive k-means, c-means, and hierarchical Bayesian clustering technologies, with clustering quality assessment

                          K-means               C-means               Bayesian Hierarchical Clustering
Data                      Set A      Set B      Set A      Set B      Set A      Set B
Size                      107x525    107x525    107x525    107x525    107x525    107x525
Technique for determining
the number of clusters    Inductive  Inductive  Inductive  Inductive  Inductive  Inductive
Number of clusters        3          3          2          2          4          4
Silhouette                0.150      0.082      0.240      0.234      0.155      0.079
Dunn Index                0.240      0.232      0.213      0.170      0.240      0.232
Calinski-Harabasz Index   30.956     30.889     45.897     43.931     22.723     22.7902
Entropy                   1.085      1.064      0.688      0.688      1.152      1.348
ECB                       0.001                 0.022                 0.003

Table 3
Placement of features in clusters, using non-inductive clustering techniques

K-means, Elbow Technique
Cluster 1: x1 x2 x3 x4 x14 x15 x16 x17 x18 x20 x21 x28 x30 x31 x33 x38 x39 x46 x55 x60 x61 x62 x63 x70 x71 x72 x73 x74 x75 x76 x79 x92 x93 x94 x98 x99 x100 x107 x109 x111 x122 x124 x125 x145 x146 x149 x151 x152 x155 x160 x161 x169 x170 x171 x172 x173 x174 x179 x180 x181 x182 x183 x184 x185 x190 x194 x195 x196 x198 x241 x245 x249 x250 x252 x253 x255 x258 x259 x260 x261 x262 x266 x267 x268 x271
Cluster 2: x19 x34 x35 x36 x37 x40 x41 x42 x44 x47 x48 x54 x64 x65 x66 x68 x80 x81 x82 x83 x84 x86 x87 x88 x90 x91 x95 x96 x97 x101 x103 x104 x105 x113 x115 x116 x117 x118 x119 x120 x121 x126 x127 x144 x150 x156 x157 x158 x159 x162 x163 x164 x165 x166 x167 x168 x186 x187 x188 x200 x201 x203 x204 x205 x207 x208 x209 x210 x211 x212 x213 x215 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x251 x263 x264 x265
Cluster 3: x5 x6 x7 x8 x9 x10 x11 x12 x13 x22 x23 x24 x25 x26 x27 x57 x58 x59 x77 x78 x108 x147 x148 x153 x154 x191 x192 x193 x197 x246 x247 x248 x254 x273 x275 x277 x281 x282 x283
K-means, Silhouette Technique
Cluster 1: x1 x4 x14 x21 x28 x60 x61 x62 x71 x72 x73 x74 x75 x76 x78 x79 x107 x108 x109 x146 x147 x148 x152 x170 x171 x182 x183 x184 x185 x193 x194 x195 x196 x197 x198 x245 x249 x253 x254 x255 x258 x260 x261 x262 x267 x268 x271 x273
Cluster 2: x2 x3 x15 x16 x17 x18 x19 x20 x30 x31 x33 x34 x38 x39 x55 x63 x68 x70 x84 x86 x91 x92 x93 x94 x95 x98 x99 x100 x101 x105 x111 x113 x115 x116 x122 x124 x125 x126 x127 x144 x145 x149 x150 x151 x155 x160 x161 x162 x163 x164 x168 x169 x172 x173 x186 x190 x205 x241 x250 x251 x252 x259 x263 x265 x266
Cluster 3: x5 x6 x7 x8 x9 x10 x11 x12 x13 x22 x23 x24 x25 x26 x27 x57 x58 x59 x77 x153 x154 x191 x192 x246 x247 x248 x275 x277 x281 x282 x283
Cluster 4: x35 x36 x37 x40 x41 x42 x48 x54 x64 x65 x66 x80 x81 x82 x83 x87 x88 x90 x96 x97 x103 x104 x117 x118 x119 x120 x121 x156 x157 x158 x159 x165 x166 x167 x187 x188 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x264
Cluster 5: x174 x179 x180 x181
Cluster 6: x44 x46 x47
Cluster 7: x200 x201 x203 x204
Cluster 8: x207 x208 x209 x210 x211 x212 x213 x215
C-means, Elbow Technique
Cluster 1: x2 x15 x16 x17 x20 x30 x31 x38 x39 x63 x79 x94 x98 x111 x113 x124 x144 x149 x155 x160 x161 x172 x173 x179 x185 x200 x201 x203 x204 x205 x209 x220
Cluster 2: x3 x18 x19 x33 x34 x35 x36 x37 x40 x41 x42 x44 x46 x47 x48 x54 x55 x64 x65 x66 x68 x80 x81 x82 x83 x84 x86 x87 x88 x90 x91 x92 x95 x96 x97 x101 x103 x104 x105 x115 x116 x117 x118 x119 x120 x121 x122 x125 x126 x127 x150 x151 x156 x157 x158 x159 x162 x163 x164 x165 x166 x167 x168 x174 x186 x187 x188 x207 x208 x210 x211 x212 x213 x215 x216 x217 x218 x221 x222 x223 x224 x225 x227 x228 x229 x231 x241 x250 x251 x252 x263 x264 x265 x266
Cluster 3: x1 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x21 x22 x23 x24 x25 x26 x27 x28 x57 x58 x59 x60 x61 x62 x70 x71 x72 x73 x74 x75 x76 x77 x78 x93 x99 x100 x107 x108 x109 x145 x146 x147 x148 x152 x153 x154 x169 x170 x171 x180 x181 x182 x183 x184 x190 x191 x192 x193 x194 x195 x196 x197 x198 x245 x246 x247 x248 x249 x253 x254 x255 x258 x259 x260 x261 x262 x267 x268 x271 x273 x275 x277 x281 x282 x283
C-means, Silhouette Technique
Cluster 1: x1 x4 x5 x8 x9 x13 x14 x21 x22 x23 x26 x27 x28 x30 x57 x58 x59 x60 x61 x62 x71 x72 x73 x74 x75 x76 x77 x78 x79 x108 x109 x111 x152 x154 x171 x180 x181 x182 x183 x184 x193 x194 x195 x196 x197 x198 x245 x246 x247 x248 x249 x253 x254 x255 x260 x261 x267 x268 x271 x273 x275 x283
Cluster 2: x2 x3 x15 x16 x17 x18 x19 x20 x31 x33 x34 x38 x39 x63 x68 x70 x91 x92 x93 x94 x95 x98 x99 x100 x101 x107 x122 x124 x125 x126 x144 x145 x149 x150 x151 x155 x161 x162 x163 x185 x204 x205 x241 x250 x251 x252 x258 x259 x262 x263 x266
Cluster 3: x6 x7 x10 x11 x12 x190 x191 x192 x277 x281 x282
Cluster 4: x35 x36 x37 x40 x41 x42 x47 x48 x54 x64 x65 x66 x80 x81 x82 x83 x87 x88 x90 x96 x97 x103 x104 x105 x113 x115 x116 x117 x118 x119 x120 x121 x127 x156 x157 x158 x159 x164 x165 x166 x167 x186 x187 x188 x200 x201 x203 x207 x208 x212 x213 x215 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x264
Cluster 5: x24 x25 x146 x147 x148 x153
Cluster 6: x44 x46 x55 x84 x86 x160 x172 x173 x174 x179 x265
Cluster 7: x168 x169 x170
Cluster 8: x209 x210 x211
Bayesian Hierarchical Clustering, Elbow Technique
Cluster 1: x34 x35 x36 x37 x40 x41 x42 x47 x48 x54 x64 x65 x66 x80 x81 x82 x83 x86 x87 x88 x90 x96 x97 x103 x104 x105 x117 x118 x119 x120 x121 x127 x156 x157 x158 x159 x163 x164 x165 x166 x167 x168 x186 x187 x188 x201 x203 x207 x208 x209 x210 x211 x212 x213 x215 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x251 x263 x264 x265
Cluster 2: x1 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x21 x22 x23 x24 x25 x26 x27 x28 x57 x58 x59 x60 x76 x77 x78 x108 x109 x146 x147 x148 x152 x153 x154 x170 x183 x184 x191 x192 x193 x196 x197 x246 x247 x248 x249 x254 x255 x261 x268 x271 x273 x275 x277 x281 x282 x283
Cluster 3: x2 x3 x4 x15 x16 x17 x18 x19 x20 x30 x31 x33 x38 x39 x44 x46 x55 x61 x62 x63 x68 x70 x71 x72 x73 x74 x75 x79 x84 x91 x92 x93 x94 x95 x98 x99 x100 x101 x107 x111 x113 x115 x116 x122 x124 x125 x126 x144 x145 x149 x150 x151 x155 x160 x161 x162 x169 x171 x172 x173 x174 x179 x180 x181 x182 x185 x190 x194 x195 x198 x200 x204 x205 x241 x245 x250 x252 x253 x258 x259 x260 x262 x266 x267
Bayesian Hierarchical Clustering, Silhouette Technique
Cluster 1: x5 x6 x7 x8 x9 x10 x11 x12 x13 x22 x23 x24 x25 x26 x27 x57 x58 x59 x77 x147 x153 x154 x191 x192 x246 x247 x248 x275 x277 x281 x282 x283
Cluster 2: x167 x168 x169
Cluster 3: x86 x87 x88 x90 x163 x164 x165 x218 x220 x221 x222
Cluster 4: x93 x94 x98 x99 x100 x122 x124 x125 x145 x146
Cluster 5: x2 x3 x18 x19 x20 x31 x33 x34 x37 x38 x39 x44 x46 x63 x68 x80 x83 x84 x91 x92 x95 x101 x105 x111 x113 x115 x116 x117 x121 x126 x127 x144 x150 x151 x160 x161 x162 x173 x186 x190 x200 x203 x204 x205 x207 x241 x250 x251 x252 x263 x265 x266
Cluster 6: x35 x36 x40 x41 x42 x47 x48 x54 x64 x65 x66 x81 x82 x96 x97 x103 x104 x118 x119 x120 x156 x157 x158 x159 x166 x187 x188 x201 x208 x209 x210 x211 x212 x213 x215 x216 x217 x223 x224 x225 x227 x228 x229 x231 x264
Cluster 7: x1 x4 x14 x15 x16 x17 x21 x28 x30 x55 x60 x61 x62 x70 x71 x72 x73 x74 x75 x76 x78 x79 x107 x108 x109 x148 x149 x152 x155 x170 x171 x172 x182 x183 x184 x185 x193 x194 x195 x196 x197 x198 x245 x249 x253 x254 x255 x258 x259 x260 x261 x262 x267 x268 x271 x273
Cluster 8: x174 x179 x180 x181

Table 4
Placement of features in clusters, using inductive clustering techniques

K-means
Cluster 1: x2 x3 x4 x15 x16 x17 x18 x19 x20 x30 x31 x33 x38 x39 x44 x46 x55 x61 x62 x63 x68 x70 x71 x72 x73 x74 x75 x79 x84 x91 x92 x93 x94 x95 x98 x99 x100 x101 x107 x111 x113 x115 x116 x122 x124 x125 x126 x144 x145 x149 x150 x151 x155 x160 x161 x162 x169 x171 x172 x173 x174 x179 x180 x181 x182 x185 x190 x194 x195 x198 x200 x204 x205 x241 x245 x250 x252 x253 x258 x259 x260 x262 x266 x267
Cluster 2: x34 x35 x36 x37 x40 x41 x42 x47 x48 x54 x64 x65 x66 x80 x81 x82 x83 x86 x87 x88 x90 x96 x97 x103 x104 x105 x117 x118 x119 x120 x121 x127 x156 x157 x158 x159 x163 x164 x165 x166 x167 x168 x186 x187 x188 x201 x203 x207 x208 x209 x210 x211 x212 x213 x215 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x251 x263 x264 x265
Cluster 3: x1 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x21 x22 x23 x24 x25 x26 x27 x28 x57 x58 x59 x60 x76 x77 x78 x108 x109 x146 x147 x148 x152 x153 x154 x170 x183 x184 x191 x192 x193 x196 x197 x246 x247 x248 x249 x254 x255 x261 x268 x271 x273 x275 x277 x281 x282 x283
C-means
Cluster 1: x1 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x17 x21 x22 x23 x24 x25 x26 x27 x28 x30 x57 x58 x59 x60 x61 x62 x70 x71 x72 x73 x74 x75 x76 x77 x78 x79 x92 x93 x94 x99 x100 x107 x108 x109 x145 x146 x147 x148 x149 x152 x153 x154 x169 x170 x171 x172 x179 x180 x181 x182 x183 x184 x190 x191 x192 x193 x194 x195 x196 x197 x198 x245 x246 x247 x248 x249 x253 x254 x255 x258 x259 x260 x261 x262 x266 x267 x268 x271 x273 x275 x277 x281 x282 x283
Cluster 2: x2 x3 x15 x16 x18 x19 x20 x31 x33 x34 x35 x36 x37 x38 x39 x40 x41 x42 x44 x46 x47 x48 x54 x55 x63 x64 x65 x66 x68 x80 x81 x82 x83 x84 x86 x87 x88 x90 x91 x95 x96 x97 x98 x101 x103 x104 x105 x111 x113 x115 x116 x117 x118 x119 x120 x121 x122 x124 x125 x126 x127 x144 x150 x151 x155 x156 x157 x158 x159 x160 x161 x162 x163 x164 x165 x166 x167 x168 x173 x174 x185 x186 x187 x188 x200 x201 x203 x204 x205 x207 x208 x209 x210 x211 x212 x213 x215 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x241 x250 x251 x252 x263 x264 x265
Cluster 3: x5 x6 x7 x8 x9 x10 x11 x12 x13 x22 x23 x24 x25 x26 x27 x57 x58 x59 x77 x153 x154 x191 x192 x246 x247 x248 x275 x277 x281 x282 x283
Cluster 4: x35 x36 x37 x40 x41 x42 x47 x48 x54 x64 x65 x66 x80 x81 x82 x83 x87 x88 x90 x96 x97 x103 x104 x117 x118 x119 x120 x121 x156 x157 x158 x159 x165 x166 x167 x187 x188 x201 x203 x207 x208 x209 x210 x211 x212 x213 x215 x216 x217 x218 x220 x221 x222 x223 x224 x225 x227 x228 x229 x231 x264
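The silhouette and Calinski-Harabasz values reported in Tables 1 and 2 correspond to scikit-learn's silhouette_score and calinski_harabasz_score. The Dunn index (Eq. 12) and partition entropy (Eq. 14) are not shipped with scikit-learn; minimal hand-rolled sketches are given below, where the pairwise-distance variant of the Dunn index is an assumption about the exact variant used.

    import numpy as np
    from scipy.spatial.distance import cdist

    def dunn_index(X, labels):
        clusters = [X[labels == c] for c in np.unique(labels)]
        # diameter: largest within-cluster pairwise distance
        diam = max(cdist(c, c).max() for c in clusters)
        # separation: smallest between-cluster pairwise distance
        sep = min(cdist(a, b).min()
                  for i, a in enumerate(clusters)
                  for b in clusters[i + 1:])
        return sep / diam

    def partition_entropy(U):
        # U: Q x K fuzzy membership matrix with rows summing to 1; PE lies in [0, ln K]
        U = np.clip(np.asarray(U, float), 1e-12, 1.0)  # guard against log(0)
        return -np.sum(U * np.log(U)) / U.shape[0]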
5.3. Visualization of Results

Figure 6: Silhouette diagrams obtained using the classical k-means, c-means and BHC clustering techniques: a) k-means silhouette, k=3, number of clusters determined by the Elbow Technique; b) k-means silhouette, k=8, number of clusters determined by the Silhouette Technique; c) c-means silhouette, k=3, Elbow Technique; d) c-means silhouette, k=8, Silhouette Technique; e) Bayesian Hierarchical Clustering silhouette, k=3, Elbow Technique; f) Bayesian Hierarchical Clustering silhouette, k=8, Silhouette Technique.

Figure 7: Silhouette diagrams obtained using the inductive k-means, c-means and BHC clustering technologies (in the inductive technologies the number of clusters is determined automatically): a1 and a2) k-means silhouette, k=3; b1 and b2) c-means silhouette, k=2; c1 and c2) BHC silhouette, k=4.

6. Discussion

Comparative studies of the non-inductive clustering technologies and their evaluation with the clustering quality measures (Table 1) showed that the highest-quality partitioning is obtained at k=3; this applies to all three clustering technologies. For the inductive analogues of the technologies under study (Table 2), the evaluation was performed for each technology simultaneously on two equal and relevant sets (A and B), which does not contradict the subsequent association of their selected relevant clusters (Table 4). It is worth noting that, owing to the nature of inductive technologies, the number of clusters was determined automatically. Each technology relied on stochastic indicators or membership-function values; the inductive probabilistic hierarchical clustering technology employed marginal probabilities to identify which clusters should be merged and thereby avoid overfitting. Essentially, it estimated the probability that all the data in a potential merge originated from the same mixture component and compared this probability with the large number of alternative hypotheses at the lower levels of the clustering hierarchy.

As shown in Table 2, when partitioned into 3 clusters, the Silhouette index yielded values of 0.150 and 0.082 on the respective sets A and B. Using the c-means technology, 2 clusters were obtained; the values of the Dunn Index, Calinski-Harabasz Index, and Entropy indicate that the boundaries between clusters are highly fuzzy. The inductive BHC technology identified 4 clusters, which is confirmed by the Silhouette index values (0.155 and 0.079) on the relevant sets A and B. Regarding the silhouette diagrams in Figures 6 and 7, it should be noted that "questionable" clusters are characterized by negative values, which calls for additional research.

7. Conclusions

The proposed clustering technology can be useful for identifying relevant features in the results of laboratory tests of persons with multiple myeloma. It demonstrates the high performance of the developed inductive technologies, namely k-means, c-means, and Bayesian hierarchical clustering based on the inductive modeling of complex systems.
The main technology used in this study was Bayesian hierarchical clustering, and the impact of four internal measures (silhouette, Dunn index, Calinski-Harabasz index, entropy) on clustering effectiveness was investigated. Additionally, the application of the moving-average technique for noise elimination in the data was proposed for the first time. The overall use of the proposed noise-elimination technique in conjunction with the inductive approach significantly improves the quality of clustering complex objects. The advantage of the proposed technologies lies in their stability, achieved by using an external balance measure on two identical samples.

The proposed clustering technology can be beneficial for extracting relevant features from the results of laboratory tests of persons with multiple myeloma in several aspects:

Identification of influential features: the proposed clustering techniques allow groups of similar objects to be identified based on their characteristics. Features that significantly alter the cluster structure or separate objects into different groups can be considered significant and relevant. This helps identify features that may play a crucial role in the diagnosis, prognosis, or classification of multiple myeloma.

Removal of irrelevant features: the proposed clustering techniques can help identify features that do not contribute significantly to the cluster structure or fail to separate objects into distinct groups. Such features can be deemed irrelevant and excluded from further analysis. This decreases the dimensionality of the data and simplifies result interpretation.

Selection of cluster labels: clustering can aid in identifying clusters that exhibit distinct characteristics or behaviors. Extracting relevant features can assist in choosing appropriate labels for these clusters, facilitating more accurate result interpretation with potential clinical implications.

Overall, the proposed clustering technology allows for the systematic analysis of laboratory test results of persons with multiple myeloma, the identification of significant features, and the simplification of data interpretation. This can enhance the understanding of the disease, support the development of diagnostic and prognostic technologies, and aid decision-making in clinical practice.

8. References

[1] J.-L. Harousseau, M. Dreyling, on behalf of the ESMO Guidelines Working Group, Multiple myeloma: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up, Ann. Oncology 2010, 21(Suppl. 5), pp. v155–v157.
[2] S.S. Bessmeltsev, K.M. Abdukadyrov, Mnozhestvennaya miyeloma. Sovremennyy vzglyad na problemu [Multiple myeloma. A current view of the problem], Almaty, 2007. [In Russian]
[3] A. Jemal, et al., Cancer statistics, CA Cancer J. Clin. 2007, 57(1), pp. 43–66. doi: 10.3322/canjclin.57.1.43. PMID: 17237035.
[4] J. Cid Ruzafa, et al., Patient population with multiple myeloma and transitions across different lines of therapy in the USA: an epidemiologic model, Pharmacoepidemiol. Drug Saf. 2016, 25(8), pp. 871–879. doi: 10.1002/pds.3927. PMID: 27476979.
[5] P.S. Rosenberg, et al., Future distribution of multiple myeloma in the United States by sex, age, and race/ethnicity, Blood 2015, 125(2), pp. 410–412. doi: 10.1182/blood-2014-10-609461. PMID: 25573972.
[6] K. Yamabe, et al., Epidemiology and burden of multiple myeloma in Japan: a systematic review, Value Health 2015, 18(7), A449. doi: 10.1016/j.jval.2015.09.1129. PMID: 26532529.
[7] J. Hong, et al., Recent advances in multiple myeloma: a Korean perspective, Korean J. Intern. Med. 2016, 31(5), pp. 820–834. doi: 10.3904/kjim.2015.408. PMID: 27604794.
[8] X.C. Chen, et al., Epidemiological differences in haematological malignancies between Europe and China, Lancet Oncol. 2014, 15(11), pp. 471–472. doi: 10.1016/S1470-2045(14)70441-3. PMID: 25281463.
[9] J.H. Chen, et al., Prevalence and mortality-related factors of multiple myeloma in Taiwan, PLoS One 2016, 11(12), e0167227. doi: 10.1371/journal.pone.0167227. PMID: 27907052.
[10] H. Ludwig, et al., Multiple myeloma incidence and mortality around the globe; interrelations between health access and quality, economic resources, and patient empowerment, Oncologist 2020, 25(9), pp. e1406–e1413. doi: 10.1634/theoncologist.2020-0141. PMID: 32335971; PMCID: PMC7485361.
[11] Kh. Tadist, et al., Feature selection methods and genomic big data: a systematic review, Journal of Big Data, 2019.
[12] M. Alweshah, et al., Coronavirus herd immunity optimizer with greedy crossover for feature selection in medical diagnosis, Knowledge-Based Systems 2022, 235, 107629.
[13] C. Stanfill, D. Waltz, Toward memory-based reasoning, Commun. ACM 1986, 29(12), pp. 1213–1228.
[14] S. Piramuthu, Evaluating feature selection methods for learning in data mining applications, Eur. J. Oper. Res. 2004, 156(2), pp. 483–494. doi: 10.1016/S0377-2217(02)00911-6.
[15] J.D. Shie, S.M. Chen, Feature subset selection based on fuzzy entropy measures for handling classification problems, Appl. Intell. 2008, 28(1), pp. 69–82.
[16] S. Zhao, E.C. Tsang, On fuzzy approximation operators in attribute reduction with fuzzy rough sets, J. Inf. Sci. 2008, 178(16), pp. 3163–3176.
[17] I.A. Gheyas, L.S. Smith, Feature subset selection in large dimensionality domains, Pattern Recognit. 2010, 43(1), pp. 5–13.
[18] F. Nie, et al., Efficient and robust feature selection via joint l2,1-norms minimization, Adv. Neural Inf. Process. Syst. 2010, pp. 1813–1821.
[19] S. Foithong, O. Pingern, B. Atachoo, Feature subset selection wrapper based on mutual information and rough sets, Expert Syst. Appl. 2011, 39(1), pp. 574–584.
[20] H. Huang, et al., Ant colony optimization-based feature selection for surface electromyography signals classification, Comput. Biol. Med. 2011, 42(1), pp. 30–38.
[21] P. Ghamisi, et al., Feature selection based on hybridization of genetic algorithm and particle swarm optimization, IEEE Geosci. Remote Sens. Lett. 2015, 12(2), pp. 309–313.
[22] G.N. Zhu, et al., An integrated feature selection and cluster analysis techniques for case-based reasoning, Eng. Appl. Artif. Intell. 2015, 39, pp. 14–22. doi: 10.1016/j.engappai.2014.11.006.
[23] P. Zhu, et al., Unsupervised feature selection by regularized self-representation, Pattern Recognit. 2015, 48(2), pp. 438–446.
[24] F. Barani, et al., Application of binary quantum-inspired gravitational search algorithm in feature subset selection, Appl. Intell. 2017, 47(2), pp. 304–318.
[25] P. Moradi, et al., A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy, Appl. Soft Comput. 2016, 43, pp. 117–130.
[26] P. Shunmugapriya, et al., A hybrid algorithm using ant and bee colony optimization for feature selection and classification (AC-ABC Hybrid), Swarm Evol. Comput. 2017, 36, pp. 27–36.
[27] C. Boutsidis, et al., Unsupervised feature selection for the k-means clustering problem, NIPS 2009, pp. 153–161.
[28] C. Boutsidis, et al., Randomized dimensionality reduction for k-means clustering, IEEE Trans. Inf. Theory 2015, 61(2), pp. 1045–1062.
[29] F. Moslehi, A. Haeri, A novel feature selection approach based on clustering algorithm, Journal of Statistical Computation and Simulation 2021, 91(3), pp. 581–604. doi: 10.1080/00949655.2020.1822358.
[30] S. Babichev, M.A. Taif, V. Lytvynenko, Estimation of the inductive model of objects clustering stability based on the k-means algorithm for different levels of data noise, Radio Electronics, Computer Science, Control 2016, no. 4, pp. 54–60.
[31] M.E. Celebi, et al., A comparative study of efficient initialization methods for the k-means clustering algorithm, Expert Systems with Applications 2013, 40(1), pp. 200–210. arXiv:1209.1960.
[32] H.R. Madala, A.G. Ivakhnenko, Inductive Learning Algorithms for Complex Systems Modeling, CRC Press, 1994, 365 p.
[33] A.G. Ivakhnenko, Objective clusterization on the basis of the theory of self-organization of models, Soviet J. Automat. Inform. Sci. 1987, 20(5), pp. 1–9.
[34] V. Stepashko, Inductive modeling from historical perspective, in: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT 2017), vol. 1, 2017, pp. 537–542.
[35] V.S. Stepashko, Theoretical aspects of GMDH as a method of inductive modelling, Managing Systems and Machines 2003, no. 2, pp. 31–38. [In Russian]
[36] Zh. Hu, et al., An evolving cascade system based on a set of neo-fuzzy nodes, International Journal of Intelligent Systems and Applications (IJISA) 2016, 8(9), pp. 1–7.
[37] F.D. Mwale, et al., Infilling of missing rainfall and streamflow data in the Shire River basin, Malawi: a self-organizing map approach, Physics and Chemistry of the Earth, Parts A/B/C 2012, 50–52, pp. 34–43. doi: 10.1016/j.pce.2012.09.006.
[38] F.B. Hamzah, et al., Imputation methods for recovering streamflow observation: a methodological review, Cogent Environmental Science 2020, 6(1), 1745133. doi: 10.1080/23311843.2020.1745133.
[39] H. Lee, K. Kwangmin, Interpolation of missing precipitation data using kernel estimations for hydrologic modeling, Advances in Meteorology 2015, pp. 1–12. doi: 10.1155/2015/935868.
[40] J. Chen, et al., Jackknife variance estimation for nearest-neighbor imputation, Journal of the American Statistical Association 2001, 96(453), pp. 260–269. doi: 10.1198/016214501750332839.
[41] M.G. Kendall, A. Stuart, J.K. Ord, Kendall's Advanced Theory of Statistics, vol. 3, Hodder Arnold, London, 1983.
[42] D. Ladiray, et al., Seasonal Adjustment with the X-11 Method, Lecture Notes in Statistics, vol. 158, Springer-Verlag, 2001.
[43] S. Makridakis, et al., Forecasting: Methods and Applications, 3rd edn., John Wiley & Sons, New York, 1998.
[44] J. Spencer, On the graduation of the rates of sickness and mortality, Journal of the Institute of Actuaries 1904, 38, pp. 334–343.
[45] R.J. Hyndman, in: International Encyclopedia of Statistical Science, ed. Miodrag Lovric, Springer, pp. 866–869.
[46] L.V. Sarycheva, Objective cluster analysis of the data on the basis of the Group Method of Data Handling, Problems of Management and Informatics 2008, no. 2, pp. 86–104. [In Russian]
[47] H. Steinhaus, Sur la division des corps matériels en parties, Bull. Acad. Polon. Sci. 1956, 4, pp. 801–804.
[48] S.P. Lloyd, Least squares quantization in PCM, Bell Telephone Laboratories Paper, 1957; published later in IEEE Trans. Inform. Theory, 1982.
[49] J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, no. 14, 1967.
[50] M. Melnik, Fundamentals of Applied Statistics, Energoatomizdat, Moscow, 1983, 416 p. [In Russian]
[51] J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics 1973, 3(3), pp. 32–57. ISSN 0022-0280. doi: 10.1080/01969727308546046.
[52] K.A. Heller, Z. Ghahramani, Bayesian hierarchical clustering, in: Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 297–304. doi: 10.1145/1102351.1102389.
[53] N. Lowing, R. Bomalaski, D. Mitra, Bayesian Hierarchical Clustering, 2017.
[54] A. Ivakhnenko, The group method of data handling as a competitor to the method of stochastic approximation, Soviet Automatic Control 1968, vol. 3, pp. 64–68.
[55] L. Kaufman, P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley, 2005. doi: 10.1002/9780470316801.
[56] J.C. Bezdek, et al., Optimal fuzzy partitions: a heuristic for estimating the parameters in a mixture of normal distributions, IEEE Transactions on Computers 1975, pp. 835–838. doi: 10.1109/T-C.1975.224317.
[57] T. Calinski, et al., A dendrite method for cluster analysis, Communications in Statistics 1974, no. 3, pp. 1–27. doi: 10.1080/03610927408827101.
[58] S.Ch. Sripada, M.S. Rao, Comparison of purity and entropy of k-means clustering and fuzzy c-means clustering, Indian Journal of Computer Science and Engineering 2011, 2(3). ISSN: 0976-5166.