<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>July</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Framework for Pareto-Optimal Multimodal Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mikhail Bogatyrev</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Orlov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tula State University</institution>
          ,
          <addr-line>92 Lenin ave., Tula</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>23</volume>
      <issue>2022</issue>
      <fpage>51</fpage>
      <lpage>62</lpage>
      <abstract>
        <p>The data used in Artificial Intelligence systems is often multimodal. Their representation in the form of formal contexts leads to contexts of high dimension. When constructing formal concepts and clustering on such contexts, the algorithms that are robust to the increasing dimension of contexts and capable of displaying a variety of clustering options are in demand. The modeling framework that meets these requirements is proposed. The framework uses multi-objective optimization and Evolutionary computation. The clustering results performed in the framework are compared with the known ones.</p>
      </abstract>
      <kwd-group>
        <kwd>multi-objective optimization</kwd>
        <kwd>multimodal clustering</kwd>
        <kwd>Pareto optimization</kwd>
        <kwd>fact extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>The background of this work consists of two areas. The first area is multi-objective optimization
using evolutionary algorithms [11]. The second area is multimodal clustering in FCA.</p>
      <sec id="sec-2-1">
        <title>2.1. Multi-objective Optimization with Evolutionary Algorithms</title>
        <p>Multi-objective optimization is the simultaneous optimization of several objectives. Its specificity
is manifested when the objectives conflict with each other: improving the value of one objective
worsens the value of another. Such a conflict gives rise to a set of compromise
optimal solutions, commonly known as the Pareto-optimal solutions.</p>
        <p>Pareto-optimal Multi-objective Optimization. The concept of Pareto optimality belongs
to the mainstream in the domain of multi-objective optimization. From the viewpoint of a
maximization problem, Pareto optimality may be defined as follows. Let S = {sᵢ}, i = 1, 2, . . . , N, be
the set of solutions of the multi-objective optimization problem and F = {fⱼ}, j =
1, 2, . . . , m, the set of objective functions. Every solution sᵢ is characterized by the vector
Fᵢ = {fⱼ(sᵢ)}. One feasible solution sᵢ is said to dominate another feasible solution sₖ if and only if
fⱼ(sᵢ) ≥ fⱼ(sₖ) for all j = 1, 2, . . . , m and fⱼ(sᵢ) &gt; fⱼ(sₖ) for at least one objective function
fⱼ, j ∈ {1, 2, . . . , m}. A solution is said to be Pareto optimal if it is not dominated by any
other solution. A Pareto optimal solution cannot be improved with respect to any objective
function without worsening the value of at least one other objective function. The set of all feasible
non-dominated solutions is referred to as the Pareto optimal set, and for a given Pareto optimal
set, the corresponding objective function values in the objective space are called the Pareto
front.</p>
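<p>As an illustration of the definition above, the dominance relation and the extraction of the Pareto front can be sketched in a few lines of Python (a minimal sketch for the maximization case; the function names are illustrative, not part of the framework):</p>

```python
# Solutions are represented only by their objective vectors F_i = (f_1(s_i), ..., f_m(s_i)).

def dominates(fa, fb):
    """True if vector fa dominates fb (maximization): fa is >= fb in every
    objective and strictly greater in at least one."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

def pareto_front(vectors):
    """Return the non-dominated subset (the Pareto front in objective space)."""
    return [v for v in vectors
            if not any(dominates(w, v) for w in vectors if w != v)]

# Two conflicting objectives, e.g. cluster volume vs. density:
points = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1)]
print(pareto_front(points))  # [(1, 5), (2, 4), (3, 3), (4, 1)]; (2, 2) is dominated
```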
        <p>Evolutionary algorithms belong to Evolutionary computation, the set of global
optimization methods that use the evolution of solutions. The first known evolutionary algorithm is the
genetic algorithm, which realizes a probabilistic optimization technique based on the biological
principles of evolution:
− encoding every solution as a string of symbols from a certain alphabet (a chromosome);
− using a set (population) of solutions that evolves towards one solution or a subset of
solutions corresponding to the extreme value of a certain quality criterion;
− applying various methods of selecting the better solutions and (genetic) operators for
manipulating solutions in the form of mutation and crossover of chromosomes.</p>
        <p>The first two features of genetic and evolutionary algorithms determine their effectiveness in
solving multi-objective optimization problems.</p>
        <p>Encoding solutions as chromosomes allows one to simulate solutions of various problems.
For example in clustering, chromosomes can directly represent clusters.</p>
        <p>Using a population of chromosomes is suitable for creating Pareto optimal solutions. There are
several well-known multi-objective evolutionary algorithms (MOEAs) focused on obtaining
Pareto optimal solutions: the Niched Pareto Genetic Algorithm (NPGA), the Strength Pareto
Evolutionary Algorithm (SPEA), the Non-dominated Sorting Genetic Algorithm (NSGA), and others reviewed
in [11].</p>
        <p>The mentioned algorithms are used in solving various problems of multi-objective optimization
in engineering, business and science. They are also used in data clustering [12]. In this work,
the use of MOEAs in multimodal clustering of formal contexts represents their new application.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Multimodal Clustering Problem in FCA</title>
        <sec id="sec-2-2-1">
          <title>In FCA, multimodal clustering is formulated as follows.</title>
          <p>If R ⊆ G₁ × G₂ × · · · × Gₙ is a relation on data domains G₁, G₂, . . . , Gₙ, then a formal context
is an n + 1 set
K = ⟨ G₁, G₂, . . . , Gₙ, R ⟩. (1)
Multimodal clusters on the context (1) are n-sets</p>
          <p>C = ⟨ X₁, X₂, . . . , Xₙ ⟩, (2)
where Xᵢ ⊆ Gᵢ, which have the closure property [4]:</p>
          <p>∀t = (x₁, x₂, . . . , xₙ) ∈ X₁ × X₂ × · · · × Xₙ : t ∈ R, and
∀i = 1, 2, . . . , n, ∀x ∈ Gᵢ ∖ Xᵢ : ⟨ X₁, . . . , Xᵢ ∪ {x}, . . . , Xₙ ⟩ does not satisfy the previous condition. (3)</p>
          <p>A multimodal cluster is a subset in the form of combinations of elements from different sets Gᵢ.
It is also defined as a closed n-set [3], since the closure property (3) provides its “self-sufficiency”:
the cluster cannot be enlarged without violating the closure property.</p>
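<p>The closure property can be checked directly from the definition. The sketch below assumes the relation R is given as a set of tuples and treats a cluster as fully filled (the helper names are illustrative):</p>

```python
from itertools import product

def is_full(sets, R):
    """Every tuple of the Cartesian product X_1 x ... x X_n occurs in R."""
    return all(t in R for t in product(*sets))

def is_closed(sets, domains, R):
    """The n-set cannot be enlarged along any dimension i by an element of
    G_i \\ X_i without losing fullness -- property (3) in the text."""
    if not is_full(sets, R):
        return False
    for i, (X, G) in enumerate(zip(sets, domains)):
        for x in G - X:
            enlarged = list(sets)
            enlarged[i] = X | {x}
            if is_full(enlarged, R):
                return False
    return True

# Tiny dyadic example: R is a 2x2 block of ones inside G1 x G2.
G1, G2 = {1, 2, 3}, {"a", "b"}
R = {(1, "a"), (1, "b"), (2, "a"), (2, "b")}
print(is_closed([{1, 2}, {"a", "b"}], [G1, G2], R))  # True: maximal full block
print(is_closed([{1}, {"a", "b"}], [G1, G2], R))     # False: object 2 can be added
```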
          <p>Formal concepts on a multimodal formal context are those multimodal clusters in which every
t = (x₁, x₂, . . . , xₙ) ∈ X₁ × X₂ × · · · × Xₙ belongs to R and the volume is maximally possible. In other words,
they are the largest possible n-dimensional hypercubes completely filled with units. The concept
of the density of a multimodal cluster is introduced in FCA, and formal concepts are interpreted
as absolutely dense clusters [3].</p>
          <p>There are some practical arguments in favour of studying non-dense multimodal clusters in
addition to studying formal concepts. Non-dense multimodal clusters can contain
important information. For example, the very fact that certain data instances occur in a
subset of a cluster may be an indicator of the importance of this fact. If the cluster is not dense,
then to find the rest of the data that is combined with the found instance, one needs to refer to
the formal context. However, the search in this case will be limited by the size of the found
cluster.</p>
          <p>The Need of Multi-objective Optimization. In addition to the density of clusters, other
characteristics, namely volume, modality, diversity and coverage, have been introduced [6]. These
characteristics illustrate the quality of multimodal clustering and in some cases help to interpret
the contents of clusters.</p>
          <p>Given a set of clustering quality parameters, the multimodal clustering problem is
formulated as an optimization problem in which the extremum of a criterion based on the mentioned
characteristics is searched for [5, 6]. In fact, some of these characteristics, for example, the
volume of clusters and their density, form conflicting criteria.</p>
          <p>Therefore, multimodal clustering on formal context may be formulated as a multi-objective
optimization problem.</p>
          <p>There are directions in FCA in which the construction of multimodal clusters is associated
with the solution of optimization problems [6, 7].</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Framework</title>
      <sec id="sec-3-1">
        <title>Consider the main functional elements of the proposed framework.</title>
        <sec id="sec-3-1-1">
          <title>3.1. Evolutionary Multi-Objective Algorithm for Multimodal Clustering</title>
          <p>The basis of our system is an evolutionary multi-objective algorithm for multimodal clustering. The
algorithm uses Evolutionary computation, and the evolutionary approach is applied to Pareto-optimal
optimization.</p>
          <p>Our algorithm is based on the NSGA-II algorithm [13], which was adapted for clustering. We
also extended it with functions for visualizing Pareto fronts.</p>
          <p>The algorithm is shown in Fig. 1. Like any evolutionary algorithm, it contains functions
characteristic of genetic algorithms.</p>
          <p>The doSelection function realizes the selection of chromosomes according to the selection method. There
are proportional, random universal, tournament and truncation selection methods realized in the
algorithm. The specific selection method is picked through the user interface.</p>
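<p>One of the listed methods, tournament selection, can be sketched as follows (a hypothetical illustration of what doSelection could realize for one method; the names and signature are not the framework's API):</p>

```python
import random

def tournament_selection(population, fitness, k=2, rng=random):
    """Pick one chromosome: sample k candidates at random, keep the fittest."""
    contenders = rng.sample(population, k)
    return max(contenders, key=fitness)

random.seed(0)
pop = ["0011", "1100", "1111", "0000"]
ones = lambda c: c.count("1")            # toy fitness: number of units
winner = tournament_selection(pop, ones, k=3)
print(winner)
```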
          <p>The doMultipleCrossover function, in addition to performing a crossover, accesses the original
tensor in order to filter out the wrong chromosomes. We have also provided the crossover mode
which is performed only in certain sections of chromosomes.</p>
          <p>Encoding scheme. The encoding of chromosomes is a core element of evolutionary algorithms. After
analyzing several variants of chromosome encoding [12], we settled on a binary scheme
organized according to the principle "one chromosome – one cluster". If the formal context has
modality n, then a chromosome has n modal sections. Within a section, the number of a gene is the
number of an element of the corresponding set in the multimodal context. The units in the chromosome
representing the cluster denote the elements included in this cluster. This binary encoding
scheme is not compact: for large contexts with high modality the chromosomes will be
very long. Nevertheless, in the task of clustering it is much more convenient to work with such
chromosomes than with more compactly encoded ones. Explicit representation of
clusters in the form of separate chromosomes does not require the additional computations that
are necessary for other encodings. In addition, handling large binary strings is not a problem.</p>
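<p>The "one chromosome – one cluster" encoding can be sketched directly from this description (the domain values below are illustrative, not taken from the paper's data):</p>

```python
# A chromosome is the concatenation of n binary sections, one per modal set;
# a unit in position j of section i means the j-th element of G_i is in the cluster.

def encode(cluster_sets, domains):
    """cluster_sets[i] is a subset of domains[i]; returns the flat bit list."""
    bits = []
    for X, G in zip(cluster_sets, domains):
        bits.extend(1 if g in X else 0 for g in G)
    return bits

def decode(bits, domains):
    """Inverse of encode: split the chromosome back into its modal sections."""
    sets, pos = [], 0
    for G in domains:
        sets.append({g for g, b in zip(G, bits[pos:pos + len(G)]) if b})
        pos += len(G)
    return sets

domains = [["theft", "runaway"], ["m_17", "f_16"], [2007, 2008]]
chrom = encode([{"theft"}, {"m_17"}, {2007, 2008}], domains)
print(chrom)  # [1, 0, 1, 0, 1, 1]
print(decode(chrom, domains))
```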
          <p>The following characteristics of multimodal clusters are used in the clustering algorithm.</p>
          <p>Cluster density and volume. For a cluster (2) its density is defined as
ρ(C) = |R ∩ (X₁ × X₂ × · · · × Xₙ)| / (|X₁| × |X₂| × · · · × |Xₙ|), (4)
and the volume of a cluster has the following form:
v(C) = |X₁| × |X₂| × · · · × |Xₙ|. (5)</p>
          <p>Cluster density and volume are contradictory criteria of cluster quality. A large and dense
cluster is interesting because the combinations of elements of its subsets define a property that manifests
itself on a large number of elements and possibly indicates a regularity. However, the
clustered data is often sparse, and the existence of large and dense clusters in it is unlikely.
Therefore, when selecting clusters, a trade-off between density and volume is provided by the
algorithm.</p>
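<p>Definitions (4) and (5) translate into code directly; a small sketch with R given as a set of tuples:</p>

```python
from itertools import product

def volume(sets):
    """Volume (5): the product of the cardinalities of the modal subsets."""
    v = 1
    for X in sets:
        v *= len(X)
    return v

def density(sets, R):
    """Density (4): the filled fraction of the cluster's Cartesian product."""
    filled = sum(1 for t in product(*sets) if t in R)
    return filled / volume(sets)

R = {(1, "a"), (1, "b"), (2, "a")}
C = [{1, 2}, {"a", "b"}]
print(volume(C))      # 4
print(density(C, R))  # 0.75 -- 3 of the cluster's 4 cells are in R
```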
          <p>Coverage and diversity. These two cluster characteristics were discussed and defined for the
triclustering problem in [6]. They have also been generalized to multimodal clustering. Coverage
is defined as the fraction of the tuples of the context included in at least one of the multimodal
clusters. By analogy with the definition in [6]:
coverage(Ω) = ∑_{(x₁, x₂, . . . , xₙ) ∈ R} [ (x₁, x₂, . . . , xₙ) ∈ ⋃_{C ∈ Ω} (X₁ × X₂ × · · · × Xₙ) ] / |R|, (6)
where Ω is a set of multimodal clusters and [·] is the indicator function.</p>
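<p>A direct, unoptimized sketch of coverage (6):</p>

```python
from itertools import product

def coverage(omega, R):
    """Fraction of tuples of R covered by at least one cluster in omega."""
    covered = set()
    for sets in omega:
        covered |= set(product(*sets)) & R
    return len(covered) / len(R)

R = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}
omega = [[{1}, {"a", "b"}], [{2}, {"a"}]]
print(coverage(omega, R))  # 0.75 -- (3, "b") is covered by no cluster
```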
          <p>The data of the sets that make up the cluster modalities have different meanings. Sometimes
it is important to control the coverage of the context by some subset of the cluster. In this case,
in expression (6), instead of the whole tuple (x₁, x₂, . . . , xₙ), one of its elements is used.</p>
          <p>The definition of cluster diversity given in [6] is valid for multimodal clusters:
diversity(Ω) = 1 − ( ∑_{i&lt;j} f(Ωᵢ, Ωⱼ) ) / ( |Ω|(|Ω| − 1)/2 ), (7)
where f(Ωᵢ, Ωⱼ) is an intersection function which is equal to 1 if the clusters Ωᵢ, Ωⱼ intersect in at
least one of their subsets and 0 otherwise.</p>
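<p>Diversity (7) as a small sketch, with clusters represented as lists of modal subsets:</p>

```python
from itertools import combinations

def intersects(c1, c2):
    """The intersection function f: do the clusters share an element in some modality?"""
    return any(X & Y for X, Y in zip(c1, c2))

def diversity(omega):
    """1 minus the fraction of cluster pairs that intersect -- definition (7)."""
    pairs = list(combinations(range(len(omega)), 2))
    crossing = sum(1 for i, j in pairs if intersects(omega[i], omega[j]))
    return 1 - crossing / len(pairs)

omega = [
    [{1}, {"a"}],
    [{2}, {"b"}],
    [{1, 3}, {"c"}],   # shares object 1 with the first cluster
]
print(diversity(omega))  # 1 - 1/3, i.e. about 0.667
```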
          <p>Elitist Nondominated Sorting. The evolution of solutions in the evolutionary algorithm is
performed by applying the genetic operators of selection, mutation and crossover to the chromosome
population. If the probability of mutation and crossover is high enough and the crossover is
not tied to the peculiarities of the chromosome encoding, then the algorithm performs random
uncontrolled walks in the search space. In this way the algorithm may explore most of the
search space to find the global extremum of the fitness function. However, such walks reduce
the convergence of the algorithm and, in principle, do not exclude its cycling in the regions of
local extrema. Moreover, when calculating the Pareto front, random walks can lead to a "loss
of the front", when the constructed Pareto front is destroyed at the next step of evolution. To
exclude such phenomena we apply elitism [12, 13]. Elitism may be considered as an operator
that preserves the better of the parent and child solutions (or populations, or Pareto fronts) so that
a previously found better solution is never deleted. In the case of Pareto optimization, elitism
is associated with dominance, and it is necessary to preserve not individual solutions but, if
possible, the entire front. In MOEAs, elitism is realized as nondominated sorting [13].</p>
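<p>The partition into nondominated fronts that underlies this elitism can be sketched as follows. This is the simple quadratic variant, not the fast bookkeeping version of NSGA-II [13], but it produces the same fronts:</p>

```python
def dominates(fa, fb):
    """Pareto dominance for maximization of all objectives."""
    return all(a >= b for a, b in zip(fa, fb)) and any(a > b for a, b in zip(fa, fb))

def nondominated_sort(vectors):
    """Partition solution indices into fronts F1, F2, ...: peel off the
    non-dominated set, remove it, and repeat on the remainder."""
    remaining = list(range(len(vectors)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(vectors[j], vectors[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Maximizing both objectives: (2, 2) sits behind the first front.
vs = [(1, 5), (3, 3), (2, 2), (5, 1)]
print(nondominated_sort(vs))  # [[0, 1, 3], [2]]
```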
        </sec>
        <sec id="sec-3-1-2">
          <title>3.2. Framework Realization</title>
          <p>The framework is currently realized as a desktop PC application that uses some
elements of the Wolfram Mathematica™ environment. We use several Mathematica kernels for
parallel computation. Since parallelization is natural for evolutionary algorithms, it can be
realized on Mathematica kernels and helps to increase computing performance.</p>
          <p>Java technology is also applied in the framework. The JSON data format is used for representing
multidimensional formal contexts. We also plan to apply Java in a future Web realization of the
framework.</p>
          <p>The framework uses a Mathematica – Java program interface (API) and a user interface with
visualization of Pareto fronts during the evolution of the computation.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>To demonstrate the functionality of the framework, we present some results of multimodal
clustering on several data sets.</p>
      <p>Data sets. The first data set contains data about offenses committed by juveniles [14]. We
selected this data set to be able to compare our results with the results of triclustering performed
by the FCA algorithm Data-Peeler [4]. This data set contains 30 objects which are the offense
names, 7 attributes being the age groups (m/f) which had the greatest amount of a certain sort
of offense, and 23 conditions being the years when the offenses took place. The tricontext is presented as
690 incidents of the form {offense name, age group, year}. There are 79 formal concepts acquired
from the context by the Data-Peeler algorithm.</p>
      <p>The other data sets are five tensors of dimensions from 2 to 6 generated randomly on the set
{1, 2, . . . , 10}. They are used in the experiments to study the scaling of the algorithm.</p>
      <p>Comparison with Data-Peeler. Using the juvenile offenses data set from [14], we have
the possibility to compare the results of evolutionary clustering with the results of acquiring
formal concepts from this data set by the Data-Peeler algorithm [4]. The results of the
comparison are as follows.</p>
      <p>The juvenile offenses data have a feature that manifests itself in multimodal clustering. All
formal concepts found by the Data-Peeler algorithm, with the exception of concepts having empty
subsets of elements, contain unique attribute values denoting the gender and age of juveniles.
Most of these concepts, namely 46, contain the attribute m_17 denoting boys of 17 years old.
Fig. 3 a) shows examples of three such formal concepts. They intersect over
subsets of objects and conditions and may be united into clusters having a larger volume and
containing the same information as the set of small clusters.</p>
      <p>Our algorithm, following the principle of increasing the volume of the cluster, finds precisely
these coarse-grained clusters. An example of such a cluster is shown in Fig. 3 b). It demonstrates
the fact that 17 year old boys committed all types of offenses in all the years of observation.</p>
      <p>Guided evolution. Our algorithm finds coarse-grained clusters of high density which at the
same time have the maximum volume. However, formal concepts of small volume, containing
no more than one element in one or several subsets, are of particular interest. Among the formal
concepts in the juvenile offenses data set there is the following one: Runaway, f_16, (2007, 2009,
2010). This cluster cannot be found when the algorithm is configured for the maximum density
and volume of clusters. It is necessary either to look for clusters with minimum volume
and maximum density, or to use the evolution control tools inherent in the algorithm. In our
case, we supplement the principle of nondominated sorting with additional conditions for the
preservation of chromosomes containing only one unit in the first and second sections. This
requires running a separate experiment, in which previously found clusters may be lost, but
the desired clusters of small volume, reflecting the unique features of the data, are found. Fig. 4
illustrates the result of applying guided evolution to the previous example of a single formal
concept.</p>
      <p>Scaling the algorithm. The NSGA-II algorithm has computational complexity O(M · N²),
where M is the number of objectives and N is the population size [13]. In the problem of
multimodal clustering on formal contexts, it is useful to estimate the performance of algorithms
depending on the dimension of the formal context. For our algorithm, in which the dimension
of the context determines the size of the chromosomes, such estimates are especially important.</p>
      <p>Fig. 5 shows the results of testing the algorithm on the randomly generated formal contexts
of dimensions from 2 to 6. The population size was kept constant at 100 chromosomes, and the
mutation probability was 0.01.</p>
      <p>The execution time is not an objective performance characteristic of algorithms, especially
for evolutionary algorithms, which have the population size as a parameter. However, we use
the execution time to roughly estimate the scaling of the algorithm. The dependence of the
execution time on the dimension of the formal context shown in Fig. 5 is approximately estimated
as 10(n − 2), where n is the dimension of the formal context.</p>
      <p>The number of steps required to construct the Pareto front has no explicit dependence on
the dimension of the context. This corresponds to the well-known properties of evolutionary
algorithms, in which the number of evolution steps randomly depends on several parameters of
the algorithm.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Evolutionary algorithms have certain advantages in implementing Pareto-optimal solutions to
multi-objective optimization problems. Among them there is an important one: the population of
solutions maintained by the algorithm allows one to naturally organize the formation of the
Pareto front.</p>
      <p>In this paper, we propose two innovations which may be of interest to the FCA community.</p>
      <p>The first innovation is the application of multi-objective optimization to the construction of
multimodal clusters on formal contexts.</p>
      <p>The second innovation is the ability to control the process of building multimodal clusters
through the use of an evolutionary optimization algorithm.</p>
      <p>In the future research, we plan to perform a deeper comparison of the work of the evolutionary
algorithm with other well-known FCA algorithms [3].</p>
      <p>We also plan to explore several other encodings in the evolutionary algorithm to exclude the
appearance of extra chromosomes in the population.</p>
      <p>We hope that the modeling framework proposed here would be useful for the FCA community.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The reported study was funded by the Russian Foundation for Basic Research, research project
№ 20-07-00055, and by the RFBR and Tula Region research project № 19-47-710007. Our special thanks
go to Robin Gruna for the visualization of Pareto fronts.</p>
      <p>[4] Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Closed Patterns Meet N-ary Relations. ACM Trans. Knowl. Discov. Data 3(1), Article 3, 36 p. (2009)</p>
      <p>[5] Ignatov, D. I., Semenov, A., Komissarova, D. V., Gnatyshak, D. V.: Multimodal Clustering for Community Detection. In: Missaoui, R., Kuznetsov, S., Obiedkov, S. (eds.) Formal Concept Analysis of Social Networks, pp. 59-96. Springer (2017)</p>
      <p>[6] Ignatov, D. I., Gnatyshak, D. V., Kuznetsov, S. O., Mirkin, B. G.: Triadic Formal Concept Analysis and triclustering: searching for optimal patterns. Mach. Learn. 101, 271-302 (2015)</p>
      <p>[7] Mirkin, B. G., Kramarenko, A. V.: Approximate bicluster and tricluster boxes in the analysis of binary data. In: Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, LNCS, Vol. 6743, pp. 248-256 (2011)</p>
      <p>[8] Ignatov, D. I., Egurnov, D.: Triclustering Toolbox. In: Supplementary Proceedings of ICFCA 2019 Conference and Workshops, CEUR Workshop Proceedings, Vol. 2378, pp. 65-69 (2019)</p>
      <p>[9] Neznanov, A., Ilvovsky, D., Kuznetsov, S.: FCART: A New FCA-based System for Data Analysis and Knowledge Discovery. In: Contributions to the 11th International Conference on Formal Concept Analysis, pp. 31-44. Qucoza, Dresden (2013)</p>
      <p>[10] Bogatyrev, M., Orlov, D., Shestaka, T.: Multimodal Clustering with Evolutionary Algorithms. In: Proc. of the 9th Int. Workshop "What can FCA do for Artificial Intelligence?" co-located with the 30th Int. Joint Conference on Artificial Intelligence (IJCAI 2021), Montréal, Québec, Canada, August 21, 2021. CEUR Workshop Proceedings, Vol. 2972, pp. 71-86 (2021)</p>
      <p>[11] Li, K., Wang, R., Zhang, T., et al.: Evolutionary many-objective optimization: A comparative study of the state-of-the-art. IEEE Access 6, 26194-26214 (2018)</p>
      <p>[12] Hruschka, E., Campello, R., Freitas, A., de Carvalho, A.: A Survey of Evolutionary Algorithms for Clustering. IEEE Transactions on Evolutionary Computation 39, 133-155 (2009)</p>
      <p>[13] Deb, K., Pratap, A., Agrawal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation 6, 182-197 (2002)</p>
      <p>[14] Kis, L.L., Sacarea, C., Troanca, D.: FCA Tools Bundle - a Tool that Enables Dyadic and Triadic Conceptual Navigation. In: Proceedings of the 5th International Workshop "What can FCA do for Artificial Intelligence?" co-located with the European Conference on Artificial Intelligence, The Hague, The Netherlands, 30 August 2016, pp. 42-50 (2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Voutsadakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <article-title>Polyadic concept analysis</article-title>
          .
          <source>Order</source>
          . Vol.
          <volume>19</volume>
          (
          <issue>3</issue>
          ). Pp.
          <fpage>295</fpage>
          -
          <lpage>304</lpage>
          . (
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ganter</surname>
          </string-name>
          , Bernhard; Stumme, Gerd; Wille, Rudolf, eds.,
          <source>Formal Concept Analysis: Foundations and Applications, Lecture Notes in Artificial Intelligence, No. 3626</source>
          , Springer-Verlag. Berlin (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Gnatyshak</surname>
            ,
            <given-names>Dmitry V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ignatov</surname>
            ,
            <given-names>Dmitry I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>Sergei O.</given-names>
          </string-name>
          :
          <article-title>From Triadic FCA to Triclustering: Experimental Comparison of Some Triclustering Algorithms</article-title>
          .
          <source>In: Proceedings of the Tenth International Conference on Concept Lattices and Their Applications</source>
          (CLA
          <year>2013</year>
          ), La Rochelle: Laboratory L3i, University of La Rochelle, pp.
          <fpage>249</fpage>
          -
          <lpage>260</lpage>
          . (
          <year>2013</year>
          ).
        </mixed-citation>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>