<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evolutionary Approach to Multimodal Clustering on Formal Contexts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>il Bog</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Orlov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatyana Shestaka</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tula State University</institution>
          ,
          <addr-line>92 Lenin ave., Tula</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>158</fpage>
      <lpage>173</lpage>
      <abstract>
<p>An evolutionary approach to multimodal clustering on multidimensional formal contexts is proposed. Its main advantage is that it allows one to find a well-grounded number of clusters corresponding to the global extremum, or a point close to the global extremum, of the function that characterizes the quality of the clustering solution. The approach is effective when the proximity measure for the data being clustered is not Euclidean. A formal context is the data model in Formal Concept Analysis, the area of data analysis in which mathematically rigorous methods from lattice theory are applied to discover relationships in heterogeneous data. Taking the effect of data heterogeneity into account in cluster analysis can be implemented effectively using multimodal clustering methods. The paper contains the main definitions from Formal Concept Analysis, a description of the principle of evolutionary computation, and the evolutionary approach to multimodal clustering. An experimental study of the proposed approach is performed on the task of phenotyping the disease of myocardial infarction.</p>
      </abstract>
      <kwd-group>
        <kwd>Evolutionary Computation</kwd>
        <kwd>Formal Context</kwd>
        <kwd>Multimodal Clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>One of the features of the data used in the area of data-intensive processing
is heterogeneity. Accordingly, methods of intensive data processing must take
this heterogeneity into account, which makes it possible to preserve knowledge
presented in data sets obtained from domains of different
nature.</p>
      <p>
        There are a lot of works, for example [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1-3</xref>
        ], devoted to clustering
heterogeneous data and to interpreting clusters. Taking the effect of data
heterogeneity into account in cluster analysis can be implemented effectively using multimodal
clustering methods. Classical cluster analysis is based on dividing a single set of
objects into disjoint subsets, the clusters. At the same time, despite the wide
variety of proximity measures used here, the problem of interpreting
the obtained clusters remains pressing in cluster analysis. Cluster
interpretation is always performed in the context of the applied proximity measure, and
for heterogeneous data from different domains such an interpretation, based on
a single proximity measure, will obviously not be correct.
      </p>
      <p>Multimodal clustering involves not one but several sets of objects that are
clustered simultaneously. Such sets can be formed by heterogeneous data. A
multimodal cluster is a subset in the form of combinations of objects from
different sets. The very fact of the combination of certain objects in a multimodal
cluster can carry important information and serve as a basis for cluster
interpretation.</p>
      <p>
        The simplest variant of multimodal clustering is biclustering, which was first
proposed in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Biclustering algorithms work on data in the form of matrices whose
rows contain instances of objects, with attributes (features) located
in columns. Biclustering methods have been applied in various fields of data
analysis, especially in bioinformatics, in the task of studying gene expression
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It is biclusters that make it possible to objectively assess the mutual
influence of genes on biological processes.
      </p>
      <p>
        Biclustering data in the form of an object-attribute representation is also
used in Formal Concept Analysis (FCA) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. FCA is a paradigm of
conceptual modeling which studies how objects can be hierarchically grouped together
according to their common attributes. Such a grouping of objects is in fact a
biclustering of them. The output of standard FCA algorithms is a concept
lattice which contains hierarchically linked formal concepts, which are biclusters
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. FCA methods for constructing concept lattices differ from the biclustering
methods used, for example, in the analysis of gene expression, and FCA
methods expand the possibilities of interpreting clusters. The hierarchy of concepts
in the lattice reflects a hierarchy in the data that is not obvious in the original view.
The use of concept lattices also makes it possible to build association rules and
functional dependencies on clustered data.
      </p>
      <p>
        Triclustering is a generalization of biclustering [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, the appearance
of a third set in the data representation fundamentally changes the situation,
and triclustering algorithms are not obtained by simply scaling biclustering ones.
An overview of triclustering algorithms can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the analysis
of triclustering algorithms implemented using FCA is performed.
      </p>
      <p>
        The generalization of the approach based on the construction of concept
lattices to the n-dimensional case of multimodal clustering is presented in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
The most well-known algorithm for n-dimensional multimodal clustering is the
DataPeeler algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], which is often used as a benchmark for comparison
with other algorithms. The novelty of this work lies in the development of an
approach to multimodal data clustering based on the use of evolutionary
computation. The fact that this approach is used on a data model in the form
of formal contexts allows us to take the heterogeneity of the data into account
naturally.
      </p>
      <p>
        Evolutionary computation has been applied in clustering [
        <xref ref-type="bibr" rid="ref13 ref15">13, 15</xref>
        ]. Its main
advantage is that evolutionary algorithms allow one to find a well-grounded
number of clusters corresponding to the global extremum, or a point close to the
global extremum, of the function that characterizes the quality of the clustering solution.
This approach is also effective when there is no Euclidean proximity measure for
the clustered data, which is relevant for heterogeneous data.
      </p>
      <p>The rest of the paper is organized as follows. Section 2 gives a brief
description of multimodal clustering on formal contexts. It contains the main
definitions from FCA and an explanation of why triclustering is not scaled-up biclustering.
Section 3 describes the main contribution of this paper, an
evolutionary algorithm of multimodal clustering on formal contexts. Section
4 presents the results of an experimental study of the proposed approach. They
are illustrated on the task of phenotyping the disease of myocardial infarction.</p>
      <p>
        Multimodal Clustering on Formal Contexts.
We consider multimodal clustering on formal contexts, which are the data model
in FCA. That is why we briefly review the main notions of FCA [
        <xref ref-type="bibr" rid="ref10 ref12 ref6">6, 10, 12</xref>
        ].
Classical FCA deals with two basic notions: the formal context and the concept lattice.
A formal context is a triple K = (G, M, I), where G is a set of objects, M is the set of
their attributes, and I ⊆ G × M is a binary relation which represents the facts of attributes
belonging to objects. A formal context may be represented by a {0, 1}-matrix
K = {ki,j} in which ones mark the correspondence between objects gi ∈ G and
attributes mj ∈ M. Concepts in a formal context are determined
in the following way. Suppose that for subsets of objects A ⊆ G and attributes B ⊆ M
there exist mappings, realized as prime operators A ↦ A′ and B ↦ B′,
with the following completeness properties
      </p>
      <p>A′ := {m ∈ M | ⟨g, m⟩ ∈ I for all g ∈ A}
B′ := {g ∈ G | ⟨g, m⟩ ∈ I for all m ∈ B}
(1)
Then the pair (A, B) such that A′ = B and B′ = A is called a formal concept. The
composition of these mappings yields the closure properties A′′ = A and B′′ = B;
A and B are called the extent and the intent of the formal concept of the context
K = (G, M, I), respectively.</p>
      <p>In other words, a formal concept is a pair (A, B) of subsets of objects and
attributes which are connected so that every object in A has every attribute in
B; for every object in G that is not in A, there is an attribute in B that the
object does not have; and for every attribute in M that is not in B, there is an
object in A that does not have that attribute.</p>
      <p>If for formal concepts (A1, B1) and (A2, B2) we have A1 ⊆ A2 and B2 ⊆ B1, then
(A1, B1) ≤ (A2, B2) and the formal concept (A1, B1) is less general than (A2, B2). This
order is represented by the concept lattice. A lattice consists of a partially ordered
set in which every two elements have a unique supremum (also called a least
upper bound or join) and a unique infimum (also called a greatest lower bound
or meet).</p>
      <p>In the concept lattice, formal concepts are hierarchically grouped together
according to their common attributes. Such a grouping of objects is in fact a
biclustering of them, i.e. clustering on two sets simultaneously: the set of objects
and the set of attributes.</p>
      <p>Example 1. Consider an example of a formal context and the concept lattice
built on it. Fig. 1 a) shows a fragment of the data contained in the myocardial
infarction database in the form of a two-dimensional binary context.
The context has 7 objects and 4 binary attributes. The objects are patient IDs,
and the meaning of the attributes is as follows:
• gb is the presence of essential hypertension in a patient;
• ant im is the presence of an anterior (left ventricular) myocardial infarction;
• im pg p is the presence of a right ventricular myocardial infarction;
• let is is the lethal outcome for a patient.</p>
      <p>
        The concept lattice built on the considered formal context is shown in Fig.
1 b). It has seven formal concepts, including the top and bottom ones, which are
the unit and zero elements of an abstract lattice, respectively. Information about
patients is contained in five formal concepts, shown as filled circles. So-called
reduced labeling [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] is used in order to succinctly represent information about the
objects and attributes of the formal context. If the label of attribute A is attached to
some concept, this means that this attribute occurs in the objects of all concepts
reachable by descending paths from this concept to the zero concept (bottom
element) of the lattice. If the label of object O is attached to some concept, this means
that object O lies in all concepts reachable by ascending paths in the lattice graph
from this concept to the unit concept (top element) of the lattice. If the drawing of a node
contains a blue filled upper semicircle, this means that there is an attribute
attached to this concept. If the drawing of a node contains a black filled lower semicircle,
this means that there is an object attached to this concept.
      </p>
      <p>In this example, all the patients have an anterior myocardial infarction.
In accordance with reduced labeling, we can see in the lattice that the ant im attribute
is present in all five filled concepts. Among the patients, those with IDs
1500, 1503 and 1655 have the lethal outcome. On the left in the lattice there is a
separate path with nodes given by the following formal concepts:</p>
      <p>({1500, 1503, 1655}, {ant im, let is}), ({1655}, {ant im, let is, gb}).
The first concept is more general than the second, since it is located above
it. The first concept reflects two facts: the presence of a myocardial infarction
and a lethal outcome for all three patients. The second concept reflects the
peculiarity of patient 1655: he is the only one of the three who has essential
hypertension.</p>
      <p>Thus, the formal context and concept lattice are a visual tool for representing
knowledge.</p>
    </sec>
    <sec id="sec-2">
      <title>Multidimensional Formal Contexts and Multimodal Clustering</title>
      <p>Multidimensional or polyadic formal contexts arise as a generalization of usual
formal contexts. A multidimensional, n-ary formal context is defined by a
relation R ⊆ D1 × D2 × . . . × Dn on data domains D1, D2, . . . , Dn. The context is
an (n + 1)-tuple:</p>
      <p>
        K = ⟨K1, K2, . . . , Kn, R⟩,
(2)
where Ki ⊆ Di. Every n-ary context begets k-ary contexts, whose number is
given by the Stirling number of the second kind [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] S(n, k) = (1/k!) Σ_{i=0}^{k} (−1)^i C(k, i) (k − i)^n, where C(k, i) is a binomial coefficient. As is shown
in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], multidimensional n-ary context also contains formal concepts which also
form a lattice. Clustering on multidimensional formal contexts is called
multimodal clustering [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
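      As a quick check of the counting formula, the Stirling number of the second kind can be computed directly from the inclusion-exclusion sum quoted above; the sketch below is a straightforward Python transcription, independent of the paper's software.

```python
from math import comb, factorial

def stirling2(n, k):
    """Stirling number of the second kind:
    S(n, k) = (1/k!) * sum_{i=0..k} (-1)^i * C(k, i) * (k - i)^n."""
    total = sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!

# e.g. a 4-ary context begets S(4, 2) = 7 ways to merge its domains into 2 groups
print(stirling2(4, 2))  # 7
```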
      <p>
        According to multimodal clustering, for any dimension of the formal context, the
purpose of its processing is to find n-sets H = ⟨X1, X2, . . . , Xn⟩ which have
the closure property [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]:
∀u = (x1, x2, . . . , xn) ∈ X1 × X2 × . . . × Xn : u ∈ R,
(3)
and ∀j = 1, 2, . . . , n, ∀xj ∈ Dj \ Xj, the n-set ⟨X1, . . . , Xj ∪ {xj }, . . . , Xn⟩ does not satisfy (3).
The sets H = ⟨X1, X2, . . . , Xn⟩ constitute multimodal clusters.
      </p>
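      The closure property (3), together with the maximality condition, can be verified by brute force on small contexts. The following Python sketch does exactly that; the relation R and the domains are made-up toy data.

```python
from itertools import product

def is_closed(H, R, domains):
    """Check property (3): H = (X1,...,Xn) is a multimodal cluster iff every
    combination lies in R and no component can be extended without breaking that."""
    # completeness: all tuples from X1 x ... x Xn belong to R
    if not all(u in R for u in product(*H)):
        return False
    # maximality: adding any outside element to any component breaks completeness
    for j, (Xj, Dj) in enumerate(zip(H, domains)):
        for x in Dj - Xj:
            extended = list(H)
            extended[j] = Xj | {x}
            if all(u in R for u in product(*extended)):
                return False
    return True

# toy 2-ary relation (hypothetical data)
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b")}
domains = [{1, 2, 3}, {"a", "b", "c"}]
print(is_closed(({1, 2}, {"a", "b"}), R, domains))  # True
print(is_closed(({1}, {"a", "b"}), R, domains))     # False: X1 can be extended by 2
```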
      <p>
        In Formal Concept Analysis, biclustering algorithms have been
developed in sufficient detail [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. The same cannot be said about the triclustering
algorithms and especially about the multimodal clustering algorithms.
      </p>
      <p>The simplest multidimensional formal context is a triadic context (tricontext)
of the form T = (G, M, B, I), where B is a set specifying the conditions for
the belonging of attributes to objects, and I ⊆ G × M × B is a ternary relation.
Accordingly, the ternary concepts (triconcepts) are defined as triples of the
form
(C1, C2, C3), C1 ⊆ G, C2 ⊆ M, C3 ⊆ B,
(4)
with corresponding closure conditions for prime operators. Prime operators have
several different implementations here:
g′ = {(m, b) | (g, m, b) ∈ I},
(5)
m′ = {(g, b) | (g, m, b) ∈ I},
(6)
b′ = {(g, m) | (g, m, b) ∈ I},
(7)
as well as the corresponding double prime operators:
g′′ = {g̃ ∈ G | (g̃, m, b) ∈ I for all (m, b) ∈ g′},
m′′ = {m̃ ∈ M | (g, m̃, b) ∈ I for all (g, b) ∈ m′},
b′′ = {b̃ ∈ B | (g, m, b̃) ∈ I for all (g, m) ∈ b′}.
(8)</p>
      <p>If the formal context T is represented by a three-dimensional tensor, then a
triconcept is a three-dimensional rectangle full of crosses.</p>
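      The triadic prime and double prime operators above translate directly into set comprehensions. The sketch below illustrates g′ and g′′ on a hypothetical ternary relation of (object, attribute, day) triples; the drug name echoes Example 2, but the tuples are invented.

```python
def g_prime(g, I):
    """g' = {(m, b) | (g, m, b) in I}: all (attribute, condition) pairs of object g."""
    return {(m, b) for (gg, m, b) in I if gg == g}

def g_double_prime(g, I, G):
    """g'' : objects whose prime set contains every pair of g'."""
    gp = g_prime(g, I)
    return {h for h in G if gp.issubset(g_prime(h, I))}

# hypothetical ternary relation (object, attribute, day)
I = {(1, "ticlid", 0), (1, "ticlid", 21), (2, "ticlid", 0), (2, "ticlid", 21)}
G = {1, 2, 3}
print(sorted(g_prime(1, I)))            # [('ticlid', 0), ('ticlid', 21)]
print(sorted(g_double_prime(1, I, G)))  # [1, 2]
```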
      <p>
        Although there are several recognized algorithms for constructing
three-dimensional formal concepts, for example, the Data-Peeler algorithm [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], the
problem of constructing three-dimensional clusters of a given density, which are
insufficiently dense concepts, is of practical interest.
      </p>
      <p>If C = (X, Y, Z), X ⊆ K1, Y ⊆ K2, Z ⊆ K3, is a cluster, then its density is
defined as
d(C) = |I ∩ (X × Y × Z)| / v(C),
where the cluster volume is
v(C) = |X| × |Y| × |Z|.</p>
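      The density and volume definitions can be computed literally from the relation. A small Python sketch, on an invented ternary relation:

```python
def volume(X, Y, Z):
    """v(C) = |X| * |Y| * |Z|."""
    return len(X) * len(Y) * len(Z)

def density(C, I):
    """d(C) = |I ∩ (X × Y × Z)| / v(C) for a tricluster C = (X, Y, Z)."""
    X, Y, Z = C
    hits = sum(1 for x in X for y in Y for z in Z if (x, y, z) in I)
    return hits / volume(X, Y, Z)

# toy relation: 6 of the 8 possible triples are present (made-up data)
I = {(1, "a", 0), (1, "a", 1), (1, "b", 0), (2, "a", 0), (2, "a", 1), (2, "b", 1)}
C = ({1, 2}, {"a", "b"}, {0, 1})
print(density(C, I))  # 0.75
```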
      <p>
        Example 2. A three-dimensional formal context can be built on the data
from the infarction database presented in Example 1. The use of drugs on
certain days of hospitalization may be the third dimension in the context. The standard
maximum treatment time for myocardial infarction is 21 days (at least in Russia),
which defines the scale of the third dimension. Fig. 2 shows a regular
three-dimensional cluster built on the context from Example 1 with additional
attributes. It is built with our evolutionary modeling framework [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] but may be
discovered by other tools. The second subset of the cluster contains the attributes
lat im, inf im, post im, which detail variants of myocardial infarction. The attribute
tikl s n means the use of the drug Ticlid during therapy. It has values on the
third dimension. In Fig. 2 it can be seen that Ticlid was used immediately (day
0) and on the 21st day of therapy.
      </p>
      <p>The cluster in Fig. 2 is not dense. This means that not all combinations of
elements from the three subsets of the cluster occur. The “informativeness”
of multidimensional clusters depends on their density, but not absolutely.</p>
      <p>In this example, we are not sure that Ticlid was applied to these three
patients only on these days, but the fact contained in this cluster may be of
interest to cardiologists. “Absolutely reliable” facts are contained in
absolutely dense clusters, which are formal concepts. However, special facts that
fall outside the general regularities can also be found in loose clusters. As for this
example, this subset of patients may not form absolutely dense clusters, and the
facts should be sought in low-density clusters by additional analysis.</p>
      <p>Evolutionary methods make it possible to simulate the clustering process in
such a way that one can investigate the density distributions across clusters
and thereby obtain a detailed picture of clustering.</p>
      <p>
        Evolutionary Approach to Multimodal Clustering
Evolutionary Approach to Multimodal Clustering is based on Evolutionary
Computation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Evolutionary computation is a term referring to several methods
of global optimization, united by the fact that they all use the concept of the
evolution of a set of solutions to an optimization problem, leading to solutions
corresponding to the extreme value of some function that sets the optimization
quality criterion. Evolutionary computation is effective when working with
multimodal functions. If such a function has a global extremum, an evolutionary
algorithm finds solutions corresponding to a range of values of the quality
function that are sufficiently close to that global extremum.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Principle of Evolutionary Computation</title>
      <p>Let X be a set of solutions of a problem. Every solution x ∈ X can be
characterized by a quality measure called a fitness function f(x). This measure, in
general, is a mapping f : X → Y, where Y ⊆ R is a subset of the set of real
numbers. This is the basis for the existence of many variations of fitness functions which,
for example, characterize the configuration of clusters.</p>
      <p>Let the solutions of a problem depend on a set of parameters P. Most problems
solved using Evolutionary Computation can be formulated
as the following optimization problem: it is required to find optimal values of
the parameters p∗ which deliver the maximum fitness value y∗ ∈ Y, so that the following is
true:
p∗ = argmax_{p ∈ P} f(x(p)).
(9)
The evolutionary approach to solving this problem consists in the following.</p>
      <p>
        Building an encoding scheme. An encoding scheme is a mapping φ : P → S,
where the set S contains objects which encode parameters from P. Genetic
algorithms [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] which are widely used in Evolutionary Computation often use binary
encoding, and every value of p ∈ P is represented as a binary string. The encoding
scheme is not necessarily binary (as it is not binary in Nature): every string
position contains a symbol (gene) from an encoding alphabet, and there are variants
of alphabets applied in encoding schemes [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. But necessarily there exists an
inverse mapping φ−1 : S → P, so for every s ∈ S there exists p ∈ P.
      </p>
      <p>Evolutionary algorithm. For a given encoding scheme the following
algorithm solves the problem (9).</p>
      <p>A. Randomly generate an initial set (population) S0 of objects from S.</p>
      <p>B. Start the evolution of the populations by applying a set of operators A
to population S0 and further iteratively, so that for every Sk+1 = A(Sk) there exists
at least one sk+1 ∈ Sk+1 such that
f[φ−1 (sk+1)] ≥ f[φ−1 (sk)],
(10)
where sk ∈ Sk.</p>
      <p>C. Finish the evolution of the population in accordance with a
stopping criterion. Most often, the criterion for stopping is the immutability of the
fitness function values over several steps of evolution.</p>
      <p>
        If the set of operators A consists of the genetic operators of selection, mutation
and recombination (crossover), then the evolutionary algorithm is called a genetic
algorithm [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Selection works so that condition (10) is supported by the following
“biological” principle: good parents produce good offspring (which is not always true in Nature).
Thus chromosomes with higher fitness have more opportunity to be selected than
those with lower fitness, and a good solution always survives into the next generation.</p>
      <p>Crossover is the genetic operator that mixes two chromosomes together to
form a new offspring. It does the mixing by exchanging fragments of the chromosomes'
code divided at one or several randomly selected points.</p>
      <p>Mutation involves modification of gene values by randomly
selecting a new value from the alphabet at a random point in the strings of genes.</p>
      <p>Being realized, the algorithm (A.–C.) provides a fairly accurate solution of
the problem (9).</p>
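      Steps A.–C. can be sketched as a plain genetic algorithm. The version below uses tournament selection, one-point crossover, bitwise mutation and elitism (which enforces condition (10) for the best individual); it is only an illustrative sketch on a toy fitness function, not the paper's implementation.

```python
import random

def evolve(fitness, length, pop_size=40, generations=60, p_mut=0.05, seed=0):
    """A minimal genetic algorithm on binary strings.
    fitness maps a tuple of 0/1 genes to a real number."""
    rng = random.Random(seed)
    # A. randomly generate an initial population
    pop = [tuple(rng.randint(0, 1) for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)  # elitism: the best solution stays alive
        nxt = [best]
        for _ in range(pop_size - 1):
            # B. tournament selection, one-point crossover, mutation
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, length)
            child = list(p1[:cut] + p2[cut:])
            for i in range(length):
                if p_mut > rng.random():
                    child[i] ^= 1  # flip the gene
            nxt.append(tuple(child))
        pop = nxt
    # C. stop after a fixed budget (a stagnation test could be used instead)
    return max(pop, key=fitness)

# toy problem: maximize the number of ones in a 20-bit string
best = evolve(lambda s: sum(s), length=20)
```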
      <p>Fairly accurate means that the evolutionary algorithm stops in a neighbourhood
of the global extremum of the fitness function f. The size of the neighbourhood around the
extremum depends on the fitness function and the parameters of the genetic operators.
When an evolutionary algorithm works too fast it may stop at a local extremum. This
feature is traditionally considered a shortcoming of the algorithm, but it may be
useful for clustering, since a local extremum of the quality measure may be “semantically
better” than the global extremum because it may correspond to a cluster containing
an interesting fact. In our experiments we have observed just such situations.</p>
    </sec>
    <sec id="sec-4">
      <title>Evolutionary Computation in Multimodal Clustering</title>
      <p>Here we outline our solution to applying evolutionary computation to clustering
in multidimensional contexts.</p>
      <p>
        There are two crucial parameters of a clustering problem: the measure of
similarity of the clustered objects (proximity measure) and the number of clusters (whether
it is given before clustering or not). Evolutionary algorithms have an advantage over
traditional clustering methods when:
1. the measure of similarity of the clustered objects is not traditional (the Euclidean norm)
[
        <xref ref-type="bibr" rid="ref22">22</xref>
        ];
2. the number of clusters is not given; and
3. the number of clusters is large.
      </p>
      <p>
        Consider this in more detail.
1. Many clustering algorithms use proximity measures for objects based on
calculating the distances between them. Evolutionary and genetic algorithms
work on the black-box principle. The black-box inputs are the values of the
parameters P (see the previous Section), and at the outputs we get the
corresponding solutions, for which we calculate the values of the fitness function.
The dependence between input and output may be very complex and not
expressible analytically, as traditional proximity measures are. In our
experiments, there is no analytical dependence between the cluster configurations
specified by chromosomes and, for example, the cluster densities. Therefore, the
fitness function used in this case does not use a traditional proximity
measure.
2. In evolutionary and genetic algorithms for clustering [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], the number of
clusters which will be obtained depends on the chromosome encoding. After
analyzing the existing variants of chromosome encoding, we settled on two
of them. The first variant is our chained integer-encoding scheme [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The
second encoding scheme is a binary scheme organized according to the
principle of “one chromosome – one cluster”. In this scheme, a chromosome
gene is an object number. In this encoding, the maximal number of clusters
is equal to the number of chromosomes. The number of clusters is not set
initially. As a result of the evolution of n chromosomes, k different
chromosomes should remain in the population.
3. Models in the form of formal contexts give rise to a large number of
formal concepts and an even greater number of clusters. For
n-dimensional contexts, the upper bound on the number of clusters is estimated
as 2^|K1| × . . . × 2^|Kn| [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In the study of gene expression with
evolutionary algorithms, the number of genes in experiments may be over tens of
thousands, and the length of the chromosomes which represent clusters may be
enormous. Nevertheless, the computational problem of processing very long
chromosomes (usually binary) has now been solved [
        <xref ref-type="bibr" rid="ref18 ref9">9, 18</xref>
        ]. We performed evolutionary
clustering with a genetic algorithm realizing the evolutionary algorithm (A.–C.)
with various parameters.
      </p>
      <p>
        Chromosome encoding. After analyzing the existing variants of
chromosome encoding [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], we settled on two of them. The first variant is our chained
integer-encoding scheme [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The second encoding scheme is a binary scheme
organized according to the principle of “one chromosome – one cluster”. A chromosome has one,
two or three sections, according to the variant of encoding (see
Section 3.1) and the dimension of the context. Chromosomes for three-dimensional
contexts have the sections “patients”, “attributes” and “days”. In the sections, the
number of a gene is the number of a patient, the number of an attribute from the context,
or the number of a day, according to the order of objects in the corresponding subsets
of the formal tricontext. Different chromosomes form different clusters. As a result of
the evolution of many such chromosomes, k different chromosomes out of
n members of the population should remain.
      </p>
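      The “one chromosome – one cluster” binary scheme described above amounts to concatenating membership bitmaps for the sections. Below is a sketch of the decoding step only; the section layout and sizes are assumed for illustration, not taken from the paper's framework.

```python
def decode(chromosome, sizes):
    """Decode a 'one chromosome - one cluster' binary string with consecutive
    sections (e.g. patients, attributes, days) into subsets of indices.
    sizes gives the cardinality of each section's ground set."""
    cluster, start = [], 0
    for size in sizes:
        section = chromosome[start:start + size]
        cluster.append({i for i, bit in enumerate(section) if bit})
        start += size
    return tuple(cluster)

# 3 patients, 4 attributes, 2 days -> a chromosome of length 9
chrom = [1, 0, 1,  0, 1, 1, 0,  1, 0]
cluster = decode(chrom, [3, 4, 2])
assert cluster == ({0, 2}, {1, 2}, {0})  # (patients, attributes, days)
```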
      <p>
        Fitness function. As in FCA, we control cluster density, cluster volume,
and a special kind of interestingness. There is a trade-off problem between the
density and the volume of triclusters [
        <xref ref-type="bibr" rid="ref10 ref17">10, 17</xref>
        ]. Depending on the data, density
and volume may be contradictory characteristics of clusters. The data that we
use are sparse, and if we collect enough ones in a cluster, it will simultaneously
be voluminous. Therefore, we do not use the volume of clusters in the fitness
function, but only their density. Nevertheless, we calculate cluster volumes
during evolution.
      </p>
      <p>For the binary encoding scheme, the fitness function has the form
f(d) = (1/N) Σ_{i=1}^{N} α_i d(C_i),
(11)
where α_i is a user-defined coefficient, which in general depends on cluster
density, and N is the number of chromosomes in the population, which is equal to the
maximal number of clusters.</p>
      <p>For the chained integer-encoding scheme the fitness function is the following:
f(d) = (1/N) Σ_{j=1}^{N} (1/K_j) Σ_{i=1}^{K_j} α_i d(C_i),
(12)
where K_j is the number of clusters in the j-th chromosome.</p>
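      Formulas (11) and (12) are weighted averages of cluster densities and can be sketched directly; the α coefficients and density values below are arbitrary illustrative numbers.

```python
def fitness_binary(densities, alpha):
    """(11): f = (1/N) * sum_i alpha_i * d(C_i), one cluster per chromosome."""
    N = len(densities)
    return sum(a * d for a, d in zip(alpha, densities)) / N

def fitness_chained(densities_per_chrom, alpha_per_chrom):
    """(12): average over N chromosomes of the per-chromosome mean weighted density."""
    N = len(densities_per_chrom)
    total = 0.0
    for dens, alph in zip(densities_per_chrom, alpha_per_chrom):
        Kj = len(dens)  # number of clusters in the j-th chromosome
        total += sum(a * d for a, d in zip(alph, dens)) / Kj
    return total / N

print(fitness_binary([0.5, 1.0], [1.0, 1.0]))               # 0.75
print(fitness_chained([[0.5, 1.0], [1.0]], [[1, 1], [1]]))  # 0.875
```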
      <p>Interestingness of clusters. In a genetic algorithm, the overall fitness of the
population hides the features of individual chromosomes. But if selection keeps the
chromosomes with maximum fitness, then there is a chance that they will lead
the evolution to good solutions. Patient ID values found in clusters, and the other attributes
corresponding to them from the “treatment” and “treatment outcomes” domains,
are evaluated for the presence of information in them that can be treated as facts.
The formal criteria for selecting such “interesting” clusters are the following.</p>
      <p>
        A single cluster. The presence of a single cluster at the end of evolution
means that the algorithm most likely stopped at the global extremum of the fitness
function [
        <xref ref-type="bibr" rid="ref14 ref22">14, 22</xref>
        ]. That cluster may be interesting and it is probably dense.
But as was mentioned above, the diversity of clustering results is important,
and the presence of a single cluster is a reason to change the parameters of the
algorithm so as to obtain several extrema of the fitness function.
      </p>
      <p>Dense clusters. The densest of the obtained clusters are very important,
since the information they contain may certainly be treated as facts.</p>
      <p>Clusters of the maximum volume are interesting if they are dense enough. A
large number of objects from different domains in a cluster is a sign that some
pattern occurs in the data.</p>
      <p>Clusters with given values of density and volume are interesting when the density
or volume values are specified in relation to some other characteristics of the
clustering task.</p>
      <sec id="sec-4-1">
        <title>Experiments</title>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Data Set</title>
      <p>
        Experimental studies of the proposed approach were carried out in order to
test the performance of the genetic algorithm under various parameters and to
check the possibility of interpreting the clustering results as facts.
We use the Myocardial Infarction Complications Data Set [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] for the experiments. It
contains information about 1700 patients with the disease of myocardial infarction.
This data set has 1700 objects and 124 attributes collected in a multivalued
formal context. Among the attributes, there are ones about the patients (ID only), their
anamnesis, their treatment methods, and complications after treatment. An
attribute may be binary or have a natural number as its value.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Clustering Task</title>
      <p>Based on the data of myocardial infarction, the task of phenotyping this disease
was solved. Phenotyping refers to the determination of the form of the disease
based on the clinical profile. A clinical profile is a cluster that can include various
data describing both the disease itself and the methods of its treatment, as well
as the conditions of patients. Therefore, we were interested in various triples of
attribute sets from the domains ”patient”, ”treatment”, ”treatment results”.</p>
    </sec>
    <sec id="sec-7">
      <title>Investigated Properties of the Algorithm</title>
      <p>In the experiments, we investigated, first of all, the effectiveness of various
options for implementing the evolutionary clustering algorithm, as well as its
performance.</p>
      <p>Diversity in clusters. To ensure diversity in clusters, we used large values of
mutation probability. The graph in Fig. 3 shows that when a certain threshold
value of the mutation probability is reached, the number of clusters increases
sharply and remains unchanged with an increase in the mutation probability. In
this case, the algorithm found 30 local extrema of the fitness function.
</p>
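      <p>The role of the mutation probability can be illustrated with a minimal sketch
of per-gene mutation on a three-section binary chromosome. A plain independent
bit flip is our assumption about the operator, and the section sizes are
arbitrary here.</p>

```python
import random

def mutate(chromosome, p_mut, rng):
    """Flip every gene independently with probability p_mut."""
    return [g ^ 1 if rng.random() < p_mut else g for g in chromosome]

rng = random.Random(0)
# A three-section binary chromosome: masks over the ”patient”,
# ”treatment” and ”treatment results” attributes (sizes are arbitrary).
chromosome = [0] * 30 + [1] * 30 + [0] * 30

low = mutate(chromosome, 0.01, rng)
high = mutate(chromosome, 0.30, rng)

def changed(a, b):
    return sum(x != y for x, y in zip(a, b))

# A higher mutation probability changes far more genes per generation,
# which keeps the population spread over several extrema of the fitness.
print(changed(chromosome, low), changed(chromosome, high))
```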
      <p>Scaling algorithm performance. The algorithm processes very long
three-section chromosomes of about 2000 genes fairly quickly. To investigate the
algorithm's performance, we constructed seven formal contexts derived from the whole
data set; their numbers of objects and attributes are shown in Table 1, where ECG
stands for electrocardiogram.</p>
      <p>Fig. 4 shows the clustering execution time for each of the seven contexts. The
graph shows that the cluster construction time depends quasi-linearly on the
amount of data.</p>
    </sec>
    <sec id="sec-8">
      <title>Results</title>
      <sec id="sec-8-2">
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Formal contexts constructed for the performance study (ECG is electrocardiogram). Each context contains all 1700 objects.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Context</th><th>Objects</th></tr>
            </thead>
            <tbody>
              <tr><td>Anamnesis</td><td>1700</td></tr>
              <tr><td>Therapy</td><td>1700</td></tr>
              <tr><td>Analyzes</td><td>1700</td></tr>
              <tr><td>Infarct</td><td>1700</td></tr>
              <tr><td>ECG</td><td>1700</td></tr>
              <tr><td>Therapy results</td><td>1700</td></tr>
              <tr><td>Full data</td><td>1700</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Fig. 4: Clustering execution time for several formal contexts.</p>
        <p>Conflicting criteria. As we expected, the criteria for maximizing the volume
and the density of clusters contradict each other. The graphs in Fig. 5 show that
the volume (graph a) and density (graph b) of clusters change synchronously
during evolution.</p>
        <p>Fig. 5: Effect of the mutation probability value on the number of clusters.</p>
        <p>The combination of cluster density and volume in a single fitness function masks
certain relationships between attributes in clustering results. Therefore, we
draw the principal conclusion that two separate optimization criteria are necessary
in this clustering task with a genetic algorithm.</p>
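        <p>Applying two separate criteria means comparing clusters by Pareto dominance
on the (volume, density) pair. A minimal sketch of such a comparison, as used in
multi-objective evolutionary algorithms, follows; it is our illustration, not the
implementation of this paper.</p>

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both criteria
    (volume, density) and strictly better on at least one of them."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

# (volume, density) pairs of candidate clusters: the large-but-sparse and
# the small-but-dense candidates survive, dominated ones are filtered out.
candidates = [(100, 0.2), (40, 0.9), (90, 0.1), (10, 0.5)]
print(pareto_front(candidates))  # [(100, 0.2), (40, 0.9)]
```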
        <p>
          Comparison with Data-Peeler. We were also interested in absolutely
dense clusters, i.e., the formal concepts. As expected, there were few such clusters,
which follows from the sparsity of the myocardial data. To compare our results with
another well-known algorithm, we selected Data-Peeler [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] and modernized its
code [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] by adding a graphical user interface. The comparison of the results is shown
in Table 2.
        </p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Comparison Results</title>
      <sec id="sec-9-1">
        <p>Table 2: Comparison with the Data-Peeler algorithm on the Anamnesis,
Therapy, Analyzes, and Infarct formal contexts.</p>
        <p>The results in the last row of Table 2 can be explained by the high sparsity of
the data in this formal context. Accordingly, the Data-Peeler algorithm built a
lot of small concepts.</p>
        <p>We were interested in special clusters. First of all, these are clusters with large
groups of patients characterized by certain combinations of attributes from the
domains ”patient”, ”treatment”, ”treatment results”. Several such groups were
obtained.
1. We found that the lethal outcome of myocardial infarction is inherent in
elderly patients over 60 years of age. This fact is consistent with known
cardiological data.
2. In more detail, cases of heart attack in the anamnesis correlate with a fatal
outcome, which also looks natural.</p>
        <p>For both these groups of patients, we found absolutely dense clusters built on
tensors with age and anamnesis attributes.</p>
        <p>Unexpected result. We have found one unexpected result, which is as follows.
In the myocardial infarction data, there are stable (not changing under
different parameters of the genetic algorithm) and rather dense clusters in
which a subgroup of patients with a lethal outcome did not receive certain drugs,
while patients with a non-lethal outcome did receive them.</p>
        <sec id="sec-9-1-1">
          <title>Conclusion</title>
          <p>This paper proposes an approach to multimodal clustering on multidimensional
formal contexts using evolutionary computation. This approach has been shown
to be effective in experiments on clustering three-dimensional formal contexts
based on data of patients with myocardial infarction.</p>
          <p>The presented experimental results reflect the initial stage of research in this
area. In the future, we plan to do the following.
1. Evaluate the informativeness of the obtained clusters not manually, but using
a user interface aimed at doctors.
2. Experiments have confirmed that the criteria of cluster density and volume
contradict each other. Therefore, it is necessary to apply multi-objective
evolutionary clustering with appropriate algorithms.
3. Move to formal contexts of dimension greater than three. Separate groups of
parameters can be represented as dimensions; then their combinations obtained
in clusters will reflect the relationships in heterogeneous data in more
detail.</p>
          <p>Acknowledgments. The reported study was funded by the Russian Foundation for
Basic Research, research projects № 19-07-01178 and № 20-07-00055, and by RFBR
and Tula Region, research project № 19-47-710007.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Aggarwal, C., Reddy, C. (eds.): Data Clustering: Algorithms and Applications. CRC Press, London (2013).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 3rd edn. Academic Press (2006).</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. Dvoenko, S.D., Owsinski, J.W.: The Permutable k-means for the Bi-partial Criterion. Informatica 43(4), 253-262 (2019).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Hartigan, J.A.: Direct clustering of a data matrix. Journal of the American Statistical Association 67(337), 123-129 (1972).</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans. Comput. Biol. Bioinform. 1(1), 24-45 (2004). DOI: 10.1109/TCBB.2004.2.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Ganter, B., Stumme, G., Wille, R. (eds.): Formal Concept Analysis: Foundations and Applications. Lecture Notes in Artificial Intelligence, No. 3626. Springer-Verlag, Berlin (2005). DOI: 10.1007/978-3-540-31881-1.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. Kaytoue, M., Kuznetsov, S., Napoli, A.: Biclustering Numerical Data in Formal Concept Analysis. pp. 135-150 (2011). DOI: 10.1007/978-3-642-20514-9_12.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. Ignatov, D.I., Kuznetsov, S.O., Zhukov, L.E., Poelmans, J.: Can triconcepts become triclusters? International Journal of General Systems 42(6) (2013).</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. Henriques, R., Madeira, S.: Triclustering Algorithms for Three-Dimensional Data Analysis: A Comprehensive Survey. ACM Comput. Surv. 51(5), 1-43 (2019). DOI: 10.1145/3195833.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. Gnatyshak, D.V., Ignatov, D.I., Kuznetsov, S.O.: From Triadic FCA to Triclustering: Experimental Comparison of Some Triclustering Algorithms. In: Proceedings of the Tenth International Conference on Concept Lattices and Their Applications (CLA'2013), Laboratory L3i, University of La Rochelle, La Rochelle, pp. 249-260 (2013).</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. Voutsadakis, G.: Polyadic concept analysis. Order 19(3), 295-304 (2002).</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Closed Patterns Meet N-ary Relations. ACM Trans. Knowl. Discov. Data 3(1), Article 3, 36 p. (2009).</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. Cole, R.M.: Clustering with Genetic Algorithms. MSc Thesis, University of Western Australia, Australia (1998).</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA, USA (1989).</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Hruschka, E., Campello, R., Freitas, A., de Carvalho, A.: A Survey of Evolutionary Algorithms for Clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 39, 133-155 (2009). DOI: 10.1109/TSMCC.2008.2007252.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. Kuznetsov, S.O., Obiedkov, S.A.: Comparing Performance of Algorithms for Generating Concept Lattices. Journal of Experimental and Theoretical Artificial Intelligence 14(2-3), 189-216 (2002).</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Ignatov, D.I., Gnatyshak, D.V., Kuznetsov, S.O., Mirkin, B.G.: Triadic Formal Concept Analysis and triclustering: searching for optimal patterns. Machine Learning, pp. 1-32 (2015).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. Ma, P., Chan, K., Yao, X., Chiu, D.: An evolutionary clustering algorithm for gene expression microarray data analysis. IEEE Transactions on Evolutionary Computation 10, 296-314 (2006). DOI: 10.1109/TEVC.2005.859371.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. Myocardial infarction complications Data Set. http://archive.ics.uci.edu/ml/machine-learning-databases/00579/</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. Bogatyrev, M.Y., Terekhov, A.P.: Framework for Evolutionary Modelling in Text Mining. In: Proceedings of SENSE'09 - Conceptual Structures for Extracting Natural Language Semantics, Workshop at the 17th International Conference on Conceptual Structures (ICCS'09), pp. 26-37 (2009).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. Yevtushenko, S.A.: System of data analysis ”Concept Explorer” (in Russian). In: Proceedings of the 7th National Conference on Artificial Intelligence KII-2000, pp. 127-134, Moscow (2000).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. Simon, D.: Evolutionary Optimization Algorithms: Biologically-Inspired and Population-Based Approaches to Computer Intelligence. John Wiley &amp; Sons, New Jersey (2013).</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. https://github.com/ibrahim85/d-peeler</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>