Evolutionary Approach to Multimodal Clustering on Formal Contexts

Mikhail Bogatyrev[0000−0001−8477−6006], Sergey Dvoenko[0000−0001−8673−7617], Dmitry Orlov, and Tatyana Shestaka

Tula State University, 92 Lenin ave., Tula, Russia
okkambo@mail.ru

Abstract. An evolutionary approach to multimodal clustering on multidimensional formal contexts is proposed. Its main advantage is that it finds a well-grounded number of clusters corresponding to the global extremum, or a point close to the global extremum, of the function that characterizes the quality of the clustering solution. The approach is effective when the proximity measure for the data being clustered is not Euclidean. A formal context is the data model of Formal Concept Analysis, the area of data analysis where mathematically rigorous methods from lattice theory are applied to discover relationships in heterogeneous data. The effect of data heterogeneity in cluster analysis can be effectively taken into account with multimodal clustering methods. The paper contains the main definitions of Formal Concept Analysis, a description of the principle of evolutionary computation, and the evolutionary approach to multimodal clustering. An experimental study of the proposed approach is performed on the task of phenotyping the disease of myocardial infarction.

Keywords: Evolutionary Computation, Formal Context, Multimodal Clustering.

1 Introduction

One of the features of the data used in data-intensive processing is heterogeneity. Accordingly, methods of intensive data processing need to take heterogeneity into account, which makes it possible to preserve knowledge presented in data sets obtained from domains of different nature. There are many works, for example [1-3], devoted to clustering heterogeneous data and interpreting clusters.
Taking into account the effect of data heterogeneity in cluster analysis can be effectively implemented using multimodal clustering methods. Classical cluster analysis is based on dividing a single set of objects into disjoint subsets, the clusters. Despite the wide variety of proximity measures used here, the problem of interpreting the obtained clusters remains urgent in cluster analysis. Cluster interpretation is always performed in the context of the applied proximity measure, and for heterogeneous data from different domains an interpretation based on a single proximity measure will obviously not be correct.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Multimodal clustering involves not one but several sets of objects to be clustered simultaneously. Such sets can be formed by heterogeneous data. A multimodal cluster is a subset in the form of combinations of objects from different sets. The very fact that certain objects are combined in a multimodal cluster can carry important information and serve as a basis for cluster interpretation. The simplest variant of multimodal clustering is biclustering, which was first proposed in [4]. Biclustering algorithms work on data in the form of matrices whose rows contain instances of objects, with attributes (features) located in columns. Biclustering methods have been applied in various fields of data analysis, especially in bioinformatics, in the task of studying gene expression [5]. There, it is precisely biclusters that make it possible to objectively assess the mutual influence of genes on biological processes. Biclustering of data in object-attribute representation is also used in Formal Concept Analysis (FCA) [6].
FCA is a paradigm of conceptual modeling which studies how objects can be hierarchically grouped together according to their common attributes. Such grouping of objects is in fact biclustering. The output of standard FCA algorithms is a concept lattice which contains hierarchically linked formal concepts, which are biclusters [7]. FCA methods for constructing concept lattices differ from the biclustering methods used, for example, in the analysis of gene expression, and FCA methods expand the possibilities of interpreting clusters. The hierarchy of concepts in the lattice reflects a hierarchy in the data that is not obvious in the original view. The use of concept lattices also makes it possible to build association rules and functional dependencies on clustered data.

Triclustering is a generalization of biclustering [8]. However, the appearance of the third set in the data representation fundamentally changes the situation, and triclustering algorithms cannot be built by simply scaling biclustering ones. An overview of triclustering algorithms can be found in [9]. In [10], an analysis of triclustering algorithms implemented using FCA is performed.

The generalization of the approach based on the construction of concept lattices to the n-dimensional case of multimodal clustering is presented in [11]. The most well-known algorithm for n-dimensional multimodal clustering is the Data-Peeler algorithm [12], which is often used as a benchmark for comparison with other algorithms. The novelty of this work lies in the development of an approach to multimodal data clustering based on evolutionary computation. The fact that this approach operates on a data model in the form of formal contexts allows us to naturally take the heterogeneity of data into account. Evolutionary computation has been applied to clustering [13, 15].
Their main advantage is that evolutionary algorithms allow one to find a reasonable number of clusters corresponding to the global extremum, or a point close to the global extremum, of the function that characterizes the quality of the clustering solution. This approach is also effective when there is no Euclidean proximity measure for the clustered data, which is relevant for heterogeneous data.

The rest of the paper is organized as follows. Section 2 gives a brief description of multimodal clustering on formal contexts. It contains the main definitions of FCA and an explanation of why triclustering is not scaled-up biclustering. Section 3 describes the main contribution of this paper, an evolutionary algorithm of multimodal clustering on formal contexts. Section 4 presents the results of an experimental study of the proposed approach, illustrated on the task of phenotyping the disease of myocardial infarction.

2 Multimodal Clustering on Formal Contexts

We consider multimodal clustering on a formal context, which is the data model of FCA. That is why we briefly recall the main notions of FCA [6, 10, 12]. Classical FCA deals with two basic notions: formal context and concept lattice.

A formal context is a triple K = (G, M, I), where G is a set of objects, M is a set of their attributes, and I ⊆ G × M is a binary relation which represents the facts of attributes belonging to objects. A formal context may be represented by a {0, 1}-matrix K = {k_ij} in which units mark the correspondence between objects g_i ∈ G and attributes m_j ∈ M.

The concepts in a formal context are determined in the following way. For subsets of objects A ⊆ G and attributes B ⊆ M, the prime operators A → A′ and B → B′ are defined with the following property of completeness:

A′ := {m ∈ M | (g, m) ∈ I for all g ∈ A},
B′ := {g ∈ G | (g, m) ∈ I for all m ∈ B}.   (1)

A pair (A, B) such that A′ = B and B′ = A is called a formal concept.
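The prime operators (1) can be computed directly from the incidence relation. Below is a minimal sketch in Python; the context (objects g1–g3, attributes m1–m3) is an illustrative assumption, not data from the paper.

```python
# Toy formal context: incidence relation I as a set of (object, attribute) pairs.
G = {"g1", "g2", "g3"}
M = {"m1", "m2", "m3"}
I = {("g1", "m1"), ("g1", "m2"), ("g2", "m1"), ("g2", "m2"), ("g3", "m3")}

def prime_objects(A):
    """A' from (1): attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def prime_attrs(B):
    """B' from (1): objects having every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def is_formal_concept(A, B):
    """(A, B) is a formal concept iff A' = B and B' = A."""
    return prime_objects(A) == B and prime_attrs(B) == A

A = {"g1", "g2"}
print(prime_objects(A))                     # {'m1', 'm2'}
print(is_formal_concept(A, {"m1", "m2"}))   # True
```

Note that ({"g1"}, {"m1"}) is not a concept here: g1 also has m2, so the pair is not closed.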
The composition of these mappings yields the following properties: for a formal concept (A, B), A′′ = A and B′′ = B. A and B are called the extent and the intent of the formal concept, respectively. In other words, a formal concept is a pair (A, B) of subsets of objects and attributes connected so that every object in A has every attribute in B; for every object in G that is not in A, there is an attribute in B that the object does not have; and for every attribute in M that is not in B, there is an object in A that does not have that attribute.

If for formal concepts (A1, B1) and (A2, B2) we have A1 ⊆ A2 and B2 ⊆ B1, then (A1, B1) ≤ (A2, B2), and the formal concept (A1, B1) is less general than (A2, B2). This order is represented by the concept lattice. A lattice consists of a partially ordered set in which every two elements have a unique supremum (also called a least upper bound or join) and a unique infimum (also called a greatest lower bound or meet). In the concept lattice, formal concepts are hierarchically grouped together according to their common attributes. Such grouping of objects is really biclustering, i.e., clustering on two sets simultaneously: the set of objects and the set of attributes.

Example 1. Consider an example of a formal context and the concept lattice built on it. Fig. 1 a) shows a fragment of the data contained in the myocardial infarction database in the form of a two-dimensional binary context. The context has 7 objects and 4 binary attributes. The objects are patient IDs, and the meaning of the attributes is as follows:
• gb is the presence of essential hypertension in a patient;
• ant_im is the presence of an anterior myocardial infarction (left ventricular);
• im_pg_p is the presence of a right ventricular myocardial infarction;
• let_is is the lethal outcome of a patient.

Fig. 1: Example of a formal context and the concept lattice.
The concept lattice built on the considered formal context is shown in Fig. 1 b). It has seven formal concepts, including the top and bottom ones, which are the unit and zero elements of an abstract lattice, respectively. Information about patients is contained in five formal concepts, shown as filled circles. So-called reduced labeling [21] is used in order to succinctly represent information about the objects and attributes of the formal context. If the label of an attribute A is attached to some concept, this means that the attribute occurs in the objects of all concepts reachable by descending paths from this concept to the zero concept (bottom element) of the lattice. If the label of an object O is attached to some concept, this means that the object O lies in all concepts reachable by ascending paths in the lattice graph from this concept to the unit concept (top element) of the lattice. If the drawing of a node contains a blue filled upper semicircle, an attribute is attached to this concept. If the drawing of a node contains a black filled lower semicircle, an object is attached to this concept.

In this example, all the patients have an anterior myocardial infarction. In accordance with reduced labeling, we can see in the lattice that the ant_im attribute is present in all five filled concepts. Among the patients, those with IDs 1500, 1503 and 1655 have the lethal outcome. On the left in the lattice there is a separate path with the following formal concepts as nodes: ({1500, 1503, 1655}, {ant_im, let_is}) and ({1655}, {ant_im, let_is, gb}). The first concept is more general than the second, since it is located above it. The first concept reflects two facts: the presence of a myocardial infarction and a lethal outcome for all three patients. The second concept reflects the peculiarity of patient 1655: he is the only one of the three who has essential hypertension. Thus, the formal context and the concept lattice are a visual tool for representing knowledge.
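For a small context, all formal concepts can be enumerated by closing every attribute subset, since each concept has the form (B′, B′′). The sketch below does this on an illustrative context in the spirit of Example 1; the incidence data is assumed (only the facts stated above are reproduced: all patients have ant_im, patients 1500, 1503 and 1655 have a lethal outcome, and 1655 additionally has gb), so it is a smaller context than the one in Fig. 1.

```python
# Naive enumeration of all formal concepts by closing attribute subsets.
from itertools import combinations

objects = ["1500", "1503", "1655", "1640", "1712"]
attributes = ["gb", "ant_im", "im_pg_p", "let_is"]
I = {("1500", "ant_im"), ("1500", "let_is"),
     ("1503", "ant_im"), ("1503", "let_is"),
     ("1655", "ant_im"), ("1655", "let_is"), ("1655", "gb"),
     ("1640", "ant_im"), ("1640", "im_pg_p"),
     ("1712", "ant_im")}

def extent(B):
    """B': objects having all attributes in B."""
    return frozenset(g for g in objects if all((g, m) in I for m in B))

def intent(A):
    """A': attributes common to all objects in A."""
    return frozenset(m for m in attributes if all((g, m) in I for g in A))

concepts = set()
for r in range(len(attributes) + 1):
    for B in combinations(attributes, r):
        A = extent(frozenset(B))
        concepts.add((A, intent(A)))   # the closure (B'', B') is a concept

for A, B in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(A), sorted(B))
```

On this assumed context the enumeration recovers, among others, the two concepts discussed above: ({1500, 1503, 1655}, {ant_im, let_is}) and ({1655}, {ant_im, let_is, gb}).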
2.1 Multidimensional Formal Contexts and Multimodal Clustering

Multidimensional, or polyadic, formal contexts arise as a generalization of usual formal contexts. A multidimensional, n-ary formal context is defined by a relation R ⊆ D1 × D2 × ... × Dn on data domains D1, D2, ..., Dn. The context is an (n+1)-tuple:

K = <K1, K2, ..., Kn, R>,   (2)

where Ki ⊆ Di. Every n-ary context begets k-ary contexts, whose number is given by the Stirling number of the second kind [12]:

S(n, k) = (1/k!) Σ_{i=0}^{k} (−1)^i C(k, i) (k − i)^n.

As shown in [11], a multidimensional n-ary context also contains formal concepts, which also form a lattice. Clustering on multidimensional formal contexts is called multimodal clustering [10]. According to multimodal clustering, for any dimension of the formal context, the purpose of its processing is to find n-sets H = <X1, X2, ..., Xn> which have the closure property [12]:

∀u = (x1, x2, ..., xn) ∈ X1 × X2 × ... × Xn : u ∈ R,   (3)
∀j = 1, 2, ..., n, ∀xj ∈ Dj \ Xj : <X1, ..., Xj ∪ {xj}, ..., Xn> does not satisfy (3).

The sets H = <X1, X2, ..., Xn> constitute multimodal clusters.

In Formal Concept Analysis, biclustering algorithms have been developed in sufficient detail [16]. The same cannot be said about triclustering algorithms, and especially about multimodal clustering algorithms.

The simplest multidimensional formal context is a triadic context (tricontext) of the form T = (G, M, B, I), where B is a set specifying the conditions under which attributes belong to objects, and I ⊆ G × M × B is a ternary relation. Accordingly, ternary concepts (triconcepts) are defined as triples of the form

(C1, C2, C3), C1 ⊆ G, C2 ⊆ M, C3 ⊆ B,   (4)

with corresponding closure conditions for the prime operators.
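The Stirling number of the second kind from the formula above can be computed directly by inclusion–exclusion; a small sketch (function name is ours):

```python
# Stirling number of the second kind via the inclusion-exclusion formula.
from math import comb, factorial

def stirling2(n, k):
    """S(n, k) = (1/k!) * sum_{i=0}^{k} (-1)^i * C(k, i) * (k - i)^n."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n
               for i in range(k + 1)) // factorial(k)

# e.g. a 3-ary context begets S(3, 2) = 3 distinct 2-ary contexts
print(stirling2(3, 2))   # 3
```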
The prime operators have several different implementations here:

m′ = {(g, b) | (g, m, b) ∈ I},
g′ = {(m, b) | (g, m, b) ∈ I},   (5)
b′ = {(g, m) | (g, m, b) ∈ I},

as well as the corresponding double prime operators:

m′′ = {m̃ | ∀(g, b) ∈ m′ : (g, m̃, b) ∈ I},
g′′ = {g̃ | ∀(m, b) ∈ g′ : (g̃, m, b) ∈ I},   (6)
b′′ = {b̃ | ∀(g, m) ∈ b′ : (g, m, b̃) ∈ I}.

If the formal context T is represented by a three-dimensional tensor, then a triconcept is a 3-dimensional rectangle full of crosses.

Although there are several recognized algorithms for constructing three-dimensional formal concepts, for example the Data-Peeler algorithm [12], the problem of constructing three-dimensional clusters of a given density, which are insufficiently dense concepts, is of practical interest. If C = (X, Y, Z), X ⊆ K1, Y ⊆ K2, Z ⊆ K3 is a cluster, then its density is defined as

d(C) = |I ∩ (X × Y × Z)| / v(C),   (7)

where the cluster volume is

v(C) = |X| × |Y| × |Z|.   (8)

Example 2. A three-dimensional formal context can be built on the data from the infarction database presented in Example 1. The use of drugs on certain days of hospitalization may be a third dimension in the context. The standard maximum treatment time for myocardial infarction is 21 days (at least in Russia), which defines the scale of the third dimension. Fig. 2 shows a regular three-dimensional cluster built on the context from Example 1 with additional attributes. It was built with our evolutionary modeling framework [20] but may be discovered by other tools. The second subset of the cluster contains the attributes lat_im, inf_im, post_im, which detail variants of myocardial infarction. The attribute tikl_s_n means the use of the drug Ticlid during therapy. It has values on the third dimension. In Fig. 2 it is seen that Ticlid was used immediately (day 0) and on the 21st day of therapy. The cluster in Fig. 2 is not dense. This means that not all combinations of elements from the three subsets of the cluster take place.
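The density (7) and volume (8) of a tricluster can be computed by counting the triples of the relation that fall inside the cluster's box. A minimal sketch on an assumed toy ternary relation (all names are illustrative):

```python
# Density (7) and volume (8) of a tricluster over a toy ternary relation I
# of (object, attribute, condition) triples.
from itertools import product

I = {("g1", "m1", "b1"), ("g1", "m1", "b2"),
     ("g2", "m1", "b1"), ("g2", "m2", "b2")}

def volume(X, Y, Z):
    """v(C) = |X| * |Y| * |Z|, formula (8)."""
    return len(X) * len(Y) * len(Z)

def density(X, Y, Z):
    """d(C) = |I ∩ (X x Y x Z)| / v(C), formula (7)."""
    hits = sum(1 for t in product(X, Y, Z) if t in I)
    return hits / volume(X, Y, Z)

X, Y, Z = {"g1", "g2"}, {"m1", "m2"}, {"b1", "b2"}
print(volume(X, Y, Z))    # 8
print(density(X, Y, Z))   # 4 of 8 cells contain crosses -> 0.5
```

A cluster of density 1.0 is a box full of crosses, i.e., a formal triconcept.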
The "informativeness" of multidimensional clusters depends on their density, but not absolutely.

Fig. 2: Three-dimensional cluster.

In this example, we are not sure that Ticlid was applied to these three patients only on these days, but the fact contained in this cluster may be of interest to cardiologists. "Absolutely reliable" facts are contained in absolutely dense clusters, which are formal concepts. However, special facts that fall outside the general regularities can also be found in loose clusters. As for this example, this subset of patients may not form absolutely dense clusters, and the facts should be searched for in low-density clusters by additional analysis. Evolutionary methods make it possible to simulate the clustering process in such a way as to investigate the density distribution across clusters and thereby obtain a detailed picture of clustering.

3 Evolutionary Approach to Multimodal Clustering

The evolutionary approach to multimodal clustering is based on evolutionary computation [15]. Evolutionary computation is a term referring to several methods of global optimization, united by the fact that they all use the concept of the evolution of a set of solutions to an optimization problem, leading to solutions corresponding to the extreme value of some function that sets the optimization quality criterion. Evolutionary computation is effective when working with multimodal functions. If such a function has a global extremum, the evolutionary algorithm finds solutions corresponding to a range of values of the quality function sufficiently close to that global extremum.

3.1 Principle of Evolutionary Computation

Let X be a set of solutions of a problem. Every solution x ∈ X can be characterized by a quality measure called a fitness function f(x). This measure, in general, is a mapping f : X → Y, where Y ⊆ R is a subset of the set of real numbers.
This is the basis for the existence of many variations of fitness functions which, for example, characterize the configuration of clusters.

Let the solutions of a problem depend on a set of parameters P. Most problems solved using evolutionary computation can be formulated as the following optimization problem: it is required to find optimal values of the parameters p∗ which deliver the maximum fitness value y∗ ∈ Y, so that the following is true:

p∗ = argmax_{p∈P} f(x(p)).   (9)

The evolutionary approach to solving this problem consists in the following.

Building an encoding scheme. An encoding scheme is a mapping φ : P → S, where the set S contains objects which encode parameters from P. Genetic algorithms [13], which are widely used in evolutionary computation, often use binary encoding, and every value of p ∈ P is represented as a binary string. The encoding scheme is not necessarily binary (as it is not binary in Nature): every string position contains a symbol (gene) from an encoding alphabet, and there are various alphabets applied in encoding schemata [14]. But necessarily there exists an inverse mapping φ−1 : S → P, so for every s ∈ S there exists p ∈ P.

Evolutionary algorithm. For a given encoding scheme, the following algorithm solves the problem (9).

A. Randomly generate an initial set (population) S0 of objects from S.
B. Start the evolution of the populations by applying a set of operators A to the population S0 and further iteratively, so that for every Sk+1 = A(Sk) there exists at least one pair sk ∈ Sk, sk+1 ∈ Sk+1 such that

f[φ−1(sk+1)] ≥ f[φ−1(sk)].   (10)

C. Finish the evolution of the population in accordance with the stopping criterion. Most often, the criterion for stopping is the immutability of the fitness function values over several steps of evolution.

If the set of operators A consists of the genetic operators of selection, mutation and recombination (crossover), then the evolutionary algorithm is called a genetic algorithm [14].
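A minimal sketch of steps A.–C. realized as a genetic algorithm with tournament selection, one-point crossover and pointwise mutation. The toy fitness (count of ones) and all parameter values are illustrative assumptions, not the paper's settings; keeping the best chromosome (elitism) is one simple way to enforce condition (10).

```python
import random

random.seed(0)

def fitness(s):
    return sum(s)  # toy fitness; the paper scores cluster density instead

def select(pop):
    a, b = random.sample(pop, 2)        # binary tournament selection
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    cut = random.randrange(1, len(p1))  # one randomly selected point
    return p1[:cut] + p2[cut:]

def mutate(s, p=0.02):
    return [g ^ (random.random() < p) for g in s]

def evolve(length=20, pop_size=30, stall_limit=15):
    # A. randomly generate the initial population S0
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best, stall = max(map(fitness, pop)), 0
    # C. stop when the best fitness is unchanged for several steps
    while stall < stall_limit:
        # B. S_{k+1} = A(S_k); carrying the elite over satisfies (10)
        elite = max(pop, key=fitness)
        pop = [elite] + [mutate(crossover(select(pop), select(pop)))
                         for _ in range(pop_size - 1)]
        new_best = max(map(fitness, pop))
        stall = 0 if new_best > best else stall + 1
        best = max(best, new_best)
    return best

best = evolve()
print(best)
```

With this fitness the loop climbs toward the all-ones string; swapping in the tricluster density (7) as `fitness` turns the same skeleton into the clustering algorithm of Section 3.2.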
Selection works so that condition (10) is supported by the following "biological" principle: good parents produce good offspring (which is not always true in Nature). So chromosomes with higher fitness have more opportunity to be selected than those with lower fitness, and a good solution always survives into the next generation.

Crossover is the genetic operator that mixes two chromosomes together to form a new offspring. It does the mixing by exchanging fragments of the chromosomes' code, divided at one or several randomly selected points.

Mutation involves modification of gene values by randomly selecting a new value from the alphabet at a random point in the string of genes.

When realized, the algorithm (A.–C.) provides a fairly accurate solution of the problem (9). "Fairly accurate" means that the evolutionary algorithm stops in a neighbourhood of the global extremum of the fitness function f. The size of the neighbourhood around the extremum depends on the fitness function and the parameters of the genetic operators. When the evolutionary algorithm works too fast, it may stop at a local extremum. This feature is traditionally considered a shortcoming of the algorithm, but it may be useful for clustering, since a local extremum of the quality measure may be "semantically better" than the global extremum: it may correspond to a cluster containing an interesting fact. In our experiments we have observed exactly such situations.

3.2 Evolutionary Computation in Multimodal Clustering

Here we outline our solution for applying evolutionary computation to clustering in multidimensional contexts. There are two crucial parameters of a clustering problem: the measure of similarity of the clustered objects (proximity measure), and whether the number of clusters is given before clustering. Evolutionary algorithms have an advantage over traditional clustering methods when:
1. the measure of similarity of the clustered objects is not traditional (the Euclidean norm) [22];
2. the number of clusters is not given; and
3. the number of clusters is large.
Consider this in more detail.

1. Many clustering algorithms use proximity measures for objects based on calculating the distances between them. Evolutionary and genetic algorithms work on the black box principle. The black box inputs are the values of the parameters P (see the previous Section), and at the outputs we get the corresponding solutions, for which we calculate the values of the fitness function. The dependence between input and output may be very complex and not expressible analytically, as traditional proximity measures are. In our experiments, there is no analytical dependence between the cluster configurations specified by chromosomes and, for example, the cluster densities. Therefore, the fitness function used in this case does not rely on a traditional proximity measure.
2. In evolutionary and genetic algorithms for clustering [15], the number of clusters which will be obtained depends on the chromosome encoding. After analyzing the existing variants of chromosome encodings, we settled on two of them. The first variant is our chained integer encoding scheme [20]. The second encoding scheme is a binary scheme organized according to the principle of "one chromosome – one cluster". In this scheme, a chromosome gene is an object number. In this encoding, the maximal number of clusters is equal to the number of chromosomes. The number of clusters is not set initially. As a result of the evolution of n chromosomes, k different chromosomes actually remain in the population.
3. Models in the form of formal contexts give rise to a large number of formal concepts and an even greater number of clusters. For n-dimensional contexts, the upper bound on the number of clusters is estimated as 2^|K1| × ... × 2^|Kn| [11]. In studies of gene expression with evolutionary algorithms, the number of genes in an experiment may exceed tens of thousands, and the length of the chromosomes which represent clusters may be giant.
Nevertheless, the computational problem of processing very long chromosomes (usually binary) has been solved [9, 18].

We performed evolutionary clustering with a genetic algorithm realizing the evolutionary algorithm (A.–C.) with various parameters.

Chromosome encoding. After analyzing the existing variants of chromosome encoding [15], we settled on two of them. The first variant is our chained integer encoding scheme [20]. The second encoding scheme is a binary scheme organized according to the principle of "one chromosome – one cluster". A chromosome has one, two or three sections according to the variant of encoding (see Section 3.1) and the dimension of the context. Chromosomes for three-dimensional contexts have the sections "patients", "attributes" and "days". In each section, a gene number is the number of a patient, an attribute from the context, or a day, according to the order of objects in the corresponding subsets of the formal tricontext. Different chromosomes form different clusters. As a result of the evolution of many such chromosomes, k different chromosomes out of n members of the population actually remain.

Fitness function. As in FCA, we control cluster density (7), cluster volume (8), and a special kind of interestingness. There is a trade-off problem between the density and the volume of triclusters [10, 17]. Depending on the data, density and volume may be contradictory characteristics of clusters. The data that we use are sparse, and if we collect enough units in a cluster, it will simultaneously be voluminous. Therefore, we do not use the volume of clusters in the fitness function but only their density. Nevertheless, we calculate cluster volumes during evolution.
For the binary encoding scheme, the fitness function has the form

f(d) = (1/N) Σ_{i=1}^{N} α_i d(C_i),   (11)

where α_i is a user-defined coefficient, which in general depends on cluster density, and N is the number of chromosomes in the population, which is equal to the maximal number of clusters. For the chained integer encoding scheme, the fitness function is the following:

f(d) = (1/N) Σ_{j=1}^{N} (1/K_j) Σ_{i=1}^{K_j} α_i d(C_i),   (12)

where K_j is the number of clusters in the j-th chromosome.

Interestingness of clusters. In a genetic algorithm, the overall fitness of the population hides the features of individual chromosomes. But if selection keeps the chromosomes with maximum fitness, there is a chance that they will lead evolution to good solutions. Patient ID values found in clusters, and the attributes corresponding to them from the "treatment" and "treatment outcomes" domains, are evaluated for the presence of information that can be treated as facts. The formal criteria for selecting such "interesting" clusters are the following.

A single cluster. The presence of a single cluster at the end of evolution means that the algorithm most likely stopped at the global extremum of the fitness function [14, 22]. That cluster may be interesting, and it will probably be dense. But, as mentioned above, the diversity of clustering results is important, and the presence of a single cluster is a reason to change the parameters of the algorithm so as to obtain several extrema of the fitness function.

Dense clusters. The densest clusters among those obtained are very important, since the information they contain may certainly be treated as facts. Clusters of maximum volume are interesting if they are dense enough. A large number of objects from different domains in a cluster is a sign that some pattern occurs in the data. Clusters with given values of density and volume are interesting when density or volume values are specified in relation to some other characteristics of the clustering task.
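The binary "one chromosome – one cluster" scheme and the population fitness (11) can be sketched together: each three-section chromosome selects subsets of patients, attributes and days, and is scored by the density of the resulting tricluster. The relation, the subset sizes and the coefficients α_i below are illustrative assumptions.

```python
from itertools import product

patients = ["p1", "p2", "p3"]
attrs    = ["a1", "a2"]
days     = [0, 1]
I = {("p1", "a1", 0), ("p1", "a1", 1), ("p2", "a1", 0), ("p3", "a2", 1)}

def decode(chrom):
    """Three gene sections -> (patients, attributes, days) subsets."""
    np_, na = len(patients), len(attrs)
    X = {patients[i] for i, g in enumerate(chrom[:np_]) if g}
    Y = {attrs[i]    for i, g in enumerate(chrom[np_:np_ + na]) if g}
    Z = {days[i]     for i, g in enumerate(chrom[np_ + na:]) if g}
    return X, Y, Z

def density(X, Y, Z):
    """Tricluster density (7); empty clusters get density 0."""
    v = len(X) * len(Y) * len(Z)
    return sum(t in I for t in product(X, Y, Z)) / v if v else 0.0

def population_fitness(pop, alpha=None):
    """f(d) = (1/N) * sum_i alpha_i * d(C_i), formula (11)."""
    alpha = alpha or [1.0] * len(pop)
    return sum(a * density(*decode(c)) for a, c in zip(alpha, pop)) / len(pop)

pop = [[1, 1, 0, 1, 0, 1, 0],   # {p1, p2} x {a1} x {0}: fully dense
       [0, 0, 1, 0, 1, 0, 1]]   # {p3} x {a2} x {1}: fully dense
print(population_fitness(pop))  # 1.0
```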
4 Experiments

Experimental studies of the proposed approach were carried out in order to test the performance of the genetic algorithm under various parameters and to check the possibility of interpreting clustering results as facts.

4.1 Data Set

We use the Myocardial Infarction Complications Data Set [19] for the experiments. It contains information about 1700 patients with myocardial infarction. This data set has 1700 objects and 124 attributes collected into a multivalued formal context. The attributes cover the patients (ID only), their anamnesis, their treatment methods, and complications after treatment. An attribute may be binary or take a natural number as its value.

4.2 Clustering Task

Based on the myocardial infarction data, the task of phenotyping this disease was solved. Phenotyping refers to the determination of the form of the disease based on the clinical profile. A clinical profile is a cluster that can include various data describing the disease itself, the methods of its treatment, and the conditions of the patients. Therefore, we were interested in various triples of attribute sets from the domains "patient", "treatment", and "treatment results".

4.3 Investigated Properties of the Algorithm

In the experiments, we investigated, first of all, the effectiveness of various options for implementing the evolutionary clustering algorithm, as well as its performance.

Diversity in clusters. To ensure diversity in clusters, we used large values of the mutation probability. The graph in Fig. 3 shows that when a certain threshold value of the mutation probability is reached, the number of clusters increases sharply and then remains unchanged as the mutation probability increases further. In this case, the algorithm found 30 local extrema of the fitness function.

Fig. 3: Effect of the mutation probability value on the number of clusters.

Scaling algorithm performance.
The algorithm processes very long three-section chromosomes of about 2000 genes fairly quickly. To investigate the algorithm's performance, we constructed seven formal contexts extracted from the whole data set; their numbers of objects and attributes are shown in Table 1, where ECG stands for electrocardiogram.

Table 1: Number of objects and attributes of formal contexts.

Context          Objects  Attributes
Anamnesis        1700     33
Therapy          1700     24
Analyzes         1700     19
Infarct          1700     6
ECG              1700     27
Therapy results  1700     14
Full data        1700     123

Fig. 4 shows the clustering execution time for each of the seven contexts. The graph in this figure shows that the cluster construction time depends quasilinearly on the amount of data.

Fig. 4: Clustering execution time for several formal contexts.

Conflicting criteria. As we expected, the criteria of maximizing the volume and the density of clusters contradict each other. The graphs in Fig. 5 show that the volume (graph a) and density (graph b) of clusters change synchronously during evolution.

Fig. 5: Change of cluster volume (a) and density (b) during evolution.

Combining cluster density and volume in a single fitness function masks certain relationships between attributes in the clustering results. Therefore, we draw the principal conclusion that it is necessary to apply two optimization criteria in this clustering task with a genetic algorithm.

Comparison with Data-Peeler. We were also interested in absolutely dense clusters, the formal concepts. As expected, there were few such clusters, which follows from the sparsity of the myocardial data. To compare our results with another well-known algorithm, we selected Data-Peeler [12] and modernized its code [23] by adding a graphical user interface. The comparison of the results is shown in Table 2.

Table 2: Clustering results compared with Data-Peeler.
Formal context   Number of  Number of       Number of Data-
                 clusters   dense clusters  Peeler concepts
Anamnesis        30         14              449639
Therapy          30         19              28599
Analyzes         30         17              162
Infarct          30         30              65
ECG              30         10              689011
Therapy results  30         12              7798
Full data        30         4               12564890

The results in the last row of Table 2 can be explained by the high sparsity of the data in this formal context. Accordingly, the Data-Peeler algorithm built a large number of small concepts.

4.4 Facts Extracted with Clustering

We were interested in special clusters. First of all, these are clusters with large groups of patients characterized by certain combinations of attributes from the domains "patient", "treatment", and "treatment results". Several such groups were obtained.
1. We found that a lethal outcome of myocardial infarction is inherent in elderly patients over 60 years of age. This fact is consistent with the known data of cardiology.
2. In more detail, cases of heart attack in the anamnesis correlate with a fatal outcome, which also looks natural.

For both of these groups of patients, we found absolutely dense clusters built on tensors with age and anamnesis attributes.

Unexpected result. We found one unexpected result, which is as follows. In the myocardial infarction data, there are stable (not changing under different parameters of the genetic algorithm) and rather dense clusters in which a subgroup of patients with a lethal outcome did not receive certain drugs. At the same time, patients with a non-lethal outcome did receive these drugs.

5 Conclusion

This paper proposes an approach to multimodal clustering on multidimensional formal contexts using evolutionary computation. This approach has been shown to be effective in experiments on clustering three-dimensional formal contexts based on data of patients with myocardial infarction.

The presented experimental results reflect the initial stage of research in this area. In the future, it is planned to do the following.
1.
Evaluate the informativeness of the obtained clusters not manually, but using a user interface focused on doctors. 2. Experiments have confirmed that the criteria of cluster density and volume contradict each other. Therefore, it is necessary to apply multi-objective evolutionary clustering with appropriate algorithms. 3. Transition to the dimension of formal contexts greater than three. Separate groups of parameters can be represented as dimensions. Then their combi- nations obtained in clusters will reflect in more detail the relationships in heterogeneous data. Acknowledgments. The reported study was funded by Russian Foundation of Basic Research, the research projects № 19-07-01178, № 20-07-00055 and RFBR and Tula Region according to research project № 19-47-710007. References 1. Data Clustering: Algorithms and Applications. Ed. Charu Aggarwal, Chandan Reddy. CRC Press, London, (2013). 2. S. Theodoridis and K. Koutroumbas: Pattern Recognition. 3rd edn. Academic Press, (2006). 3. Sergey D. Dvoenko, Jan W. Owsinski: The Permutable k-means for the Bi-partial Crite-rion. Informatica 43(4), 253–262 (2019). 4. Hartigan J A. Direct clustering of a data matrix. Journal of the American statistical association, 67(337): 123—129 (1972) 5. Madeira SC, Oliveira AL. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. Jan-Mar;1(1):24-45. (2004) DOI: 10.1109/TCBB.2004.2. 6. Ganter, Bernhard; Stumme, Gerd; Wille, Rudolf, eds., Formal Concept Analysis: Foundations and Applications, Lecture Notes in Artificial Intelligence, No. 3626, Springer-Verlag. Berlin (2005) DOI:10.1007/978-3-540-31881-1 7. Kaytoue, Mehdi, Kuznetsov, Sergei, Napoli, Amedeo. Biclustering Numerical Data in Formal Concept Analysis. 135-150. (2011). DOI: 10.1007/978-3-642-20514-9 12. 8. Ignatov D. I., Kuznetsov S. O., Zhukov L. E., Poelmans J., Can triconcepts become triclusters? // International Journal of General Systems, Vol. 42. No. 6 (2013) 9. 
Henriques R., Madeira S. Triclustering Algorithms for Three-Dimensional Data Analy-sis: A Comprehensive Survey. ACM Comput. Surv. V. 51. № 5. P. 1–43. (2019) DOI: 10.1145/3195833. 10. Dmitry V. Gnatyshak, Dmitry I. Ignatov, Sergei O. Kuznetsov: From Triadic FCA to Triclustering: Experimental Comparison of Some Triclustering Algorithms. In: Proceedings of the Tenth International Conference on Concept Lattices and Their Applications (CLA’2013), La Rochelle: Laboratory L3i, University of La Rochelle, pp. 249-260, (2013) 11. Voutsadakis, G. Polyadic concept analysis. – Order. Vol. 19 (3). Pp. 295–304 (2002) 172 12. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Closed Patterns Meet N-ary Relations. In: ACM Trans. Knowl. Discov. Data. 3, 1, Article 3, 36 p. (2009) 13. R. M. Cole: Clustering with Genetic Algorithms, MSc Thesis, University of Western Australia, Australia (1998) 14. Goldberg D.E. Genetic Algorithms in Search Optimization and Machine Learning. Addison-Wesley, Reading, MA, USA (1989) 15. Hruschka E., Campello R., Freitas A., de Carballo A. A Survey of Evolutionary Algorithms for Clustering. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on Evolutionary Computation. V. 39. P. 133–155. (2009) DOI: 10.1109/TSMCC.2008.2007252. 16. S.O. Kuznetsov and S.A. Obiedkov: Comparing Performance of Algorithms for Generating Concept Lattices. Journal of Experimental and Theoretical Artificial Intelligence, Vol. 14, no. 2-3, pp. 189-216, 2002. 17. Ignatov D. I., Gnatyshak D. V., Sergei O. Kuznetsov, Boris G. Mirkin: Triadic For- mal Concept Analysis and triclustering: searching for optimal patterns. In: Machine Learning, April, 2015, pp. 1-32. 18. Ma P., Chan K., Yao X., Chiu D.: An evolutionary clustering algorithm for gene expression microarray data analysis. IEEE Transactions on Evolutionary Compu- tation. V. 10. P. 296– 314 (2006) doi: 10.1109/TEVC.2005.859371. 19. Myocardial infarction complications Data Set. 
http://archive.ics.uci.edu/ml/machine-learning-databases/00579/ 20. M. Y. Bogatyrev, A. P. Terekhov: Framework for Evolutionary Modelling in Text Mining. - Proceedings of the SENSE’09 - Conceptual Structures for Extracting Natural Language Semantics. Workshop at 17th International Conference on Con- ceptual Structures (ICCS’09), p.p. 26-37 (2009) 21. Serhiy A. Yevtushenko: System of data analysis ”Concept Explorer”. (In Russian). In: Proceedings of the 7th National Conference on Artificial Intelligence KII-2000, p. 127-134, Moscow (2000) 22. Dan Simon: Evolutionary Optimization Algorithms: Biologically-Inspired and Population-Based Approaches to Computer Intelligence. John Wiley & Sons, New Jersey (2013) 23. https://github.com/ibrahim85/d-peeler 173
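Appendix. The conflict between the density and volume criteria discussed in the experiments can be illustrated with a minimal sketch. Assuming a tricluster is a triple of index sets (X, Y, Z) over a binary three-dimensional tensor, the two criteria can be computed as below; the function names and the synthetic sparse tensor are our own illustrative assumptions, not the implementation used in the paper's experiments.

```python
import numpy as np

def tricluster_volume(xs, ys, zs):
    """Volume criterion: |X| * |Y| * |Z|."""
    return len(xs) * len(ys) * len(zs)

def tricluster_density(tensor, xs, ys, zs):
    """Density criterion: fraction of ones in the subtensor induced by (X, Y, Z)."""
    sub = tensor[np.ix_(xs, ys, zs)]  # open-mesh indexing of the 3D subtensor
    return sub.sum() / sub.size

# Synthetic sparse binary tensor (roughly 10% ones), standing in for
# a sparse multidimensional formal context.
rng = np.random.default_rng(0)
t = (rng.random((20, 20, 20)) < 0.1).astype(int)

small = ([0, 1], [0, 1], [0, 1])
large = (list(range(10)), list(range(10)), list(range(10)))

# On sparse data a larger tricluster tends to have lower density, so folding
# both criteria into a single fitness value hides this trade-off.
print(tricluster_volume(*large), tricluster_density(t, *large))
```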