<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Comparing Performance of Algorithms for Generating Concept Lattices</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergei O. Kuznetsov</string-name>
          <email>serge@viniti.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergei A. Ob''edkov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>All-Russia Institute for Scientific and Technical Information (VINITI)</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Russian State University for the Humanities</institution>
          ,
          <addr-line>Moscow</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <fpage>35</fpage>
      <lpage>47</lpage>
      <abstract>
<p>Several algorithms that generate the set of all formal concepts and the diagram graphs of concept lattices are considered. Some modifications of well-known algorithms are proposed. The algorithmic complexity of the algorithms is studied both theoretically (in the worst case) and experimentally. Conditions for the preferable use of some algorithms are given in terms of the density/sparseness of the underlying formal contexts. Principles of comparing the practical performance of algorithms are discussed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Concept (Galois) lattices proved to be a useful tool in many applied domains:
machine learning, data mining and knowledge discovery, information retrieval, etc. [
        <xref ref-type="bibr" rid="ref22 ref23 ref3 ref6 ref9">3, 6,
9, 22, 23</xref>
        ]. The problem of generating the set of all concepts and the concept lattice of
a formal context is extensively studied in the literature [
        <xref ref-type="bibr" rid="ref11 ref13 ref16 ref18 ref19 ref2 ref20 ref21 ref22 ref3 ref4 ref5 ref7">2-5, 7, 11, 13, 16, 18–22</xref>
        ]. It is
known that the number of concepts can be exponential in the size of the input context
(e.g., when the lattice is a Boolean one) and the problem of determining this number
is #P-complete [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Therefore, from the standpoint of the worst-case complexity, an algorithm
generating all concepts and/or a concept lattice can be considered optimal if it generates the
lattice with polynomial time delay and space linear in the number of all concepts
(modulo some factor polynomial in the input size). First, we give some standard
definitions of Formal Concept Analysis (FCA) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        A formal context is a triple of sets (G, M, I), where G is called a set of objects, M is
called a set of attributes, and I ⊆ G × M. For A ⊆ G and B ⊆ M: A' = {m ∈ M | ∀g∈A
(gIm)}, B' = {g ∈ G | ∀m∈B (gIm)}. A formal concept of a formal context (G, M, I) is
a pair (A, B), where A ⊆ G, B ⊆ M, A' = B, and B' = A. The set A is called the extent,
and the set B is called the intent of the concept (A, B). For a context (G, M, I), a
concept X = (A, B) is less general than or equal to a concept Y = (C, D) (or X ≤ Y) if A ⊆
C or, equivalently, D ⊆ B. For two concepts X and Y such that X ≤ Y and there is no
concept Z with Z ≠ X, Z ≠ Y, X ≤ Z ≤ Y, the concept X is called a lower neighbor of Y,
and Y is called an upper neighbor of X. This relationship is denoted by X ≺ Y. We call
the (directed) graph of this relation a diagram graph. A plane (not necessarily
planar) embedding of a diagram graph in which every concept has a larger vertical
coordinate than any of its lower neighbors is called a line (Hasse) diagram. The problem
of drawing line diagrams [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is not discussed here.
      </p>
      <p>
        The problem of comparing performance of algorithms for constructing concept
lattices and their diagram graphs is a challenging and multifaceted one. The first
comparative study of several algorithms constructing the concept set and diagram graphs
can be found in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. However, the formulation of the algorithms there is not always
correct, and the description of the experimental results lacks any information
about the data used for the tests. The fact that the choice of algorithm should
depend on the input data is not taken into account. Besides, only one of the algorithms
considered in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], namely that of Bordat [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], constructs the diagram graph; thus, it is hard
to compare its time complexity with that of the other algorithms.
      </p>
      <p>
        A later review with more algorithms and more information on experimental data
can be found in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Only algorithms generating diagram graphs are considered. The
algorithms that were not originally designed for this purpose are extended by the
authors to generate diagram graphs. Unfortunately, such extensions are not always
effective: for example, the time complexity of the extended version of the Ganter
algorithm (called Ganter-Allaoui) increases dramatically with the growth of the context size.
This drawback can be overcome by the efficient use of binary search in the list
produced by the original Ganter algorithm. Tests were conducted only for contexts with a
small number of attributes per object as compared to the total number of attributes. Our
experiments (we consider some other algorithms, e.g., that of Nourine [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]) also show
that the algorithm proposed in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] works faster on such contexts than the others do.
However, in other situations not covered in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] this algorithm is far behind some
other algorithms.
      </p>
      <p>The rest of the paper is organized as follows. In Section 2, we discuss the
principles of comparing efficiency of algorithms and make an attempt at their classification.
In Section 3, we give a short review of the algorithms and analyze their worst-case
complexity. In Section 4, we present the results of experimental comparison.</p>
    </sec>
    <sec id="sec-2">
      <title>2 On Principles of Comparison</title>
      <p>
        In our study, we considered both theoretical (worst-case) and experimental
complexity of algorithms. As for the worst-case upper bounds, the algorithms with complexity
linear in the number of concepts (modulo a factor polynomial in the input size) are
better than those with complexity quadratic in the number of concepts; the former
group can be further subdivided according to the form of the factor
polynomial in the input size. According to this criterion, the present champion is the algorithm by
Nourine [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. On the other hand, “dense” contexts, which realize the worst case by
giving rise to an exponential number of concepts, may not occur often in practice.
      </p>
      <p>Starting a comparison of algorithms “in practice”, we face a number of problems.
First, algorithms, as described by their authors, often allow for different interpretations
of crucial details, such as the test of uniqueness of a generated concept. Second,
authors seldom describe exactly the data structures and their realizations. Third, algorithms
behave differently on different databases (contexts). Sometimes authors compare their
algorithms with others on specific data sets. We would like to propose that the community
reach a consensus on the databases to be used as testbeds. Our idea is to consider
two types of testbeds. On the one hand, some “classical” databases (well recognized in the
data analysis community) should be used, with clearly defined scalings if they
are many-valued. On the other hand, we propose to use “randomly generated
contexts”. The main parameters of a context K = (G, M, I) here seem to be the
number of objects |G| (relative to |M|), the number of attributes |M| (relative to |G|),
the size of the relation I (relative to |G||M|), and the average number of attributes per
object intent (resp., the average number of objects per attribute extent). The community should
specify particular type(s) of random context generator(s) that can be tuned by the
choice of these (or some other) parameters.</p>
      <p>Another major difficulty resides in the choice of a programming language and
platform, which strongly affects the performance. A possible way of avoiding this
difficulty could be to compare not the running time but the number of specified operations
(intersections, unions, closures, etc.) from a certain library; but here one encounters the
difficulty of weighting these operations in order to obtain the overall performance. A much
simpler approach is to compare the algorithms on a single platform.</p>
      <p>In this article, we compare performance of several algorithms for clearly specified
random data sets (contexts). As for ambiguities in the original pseudocode formulations
of the algorithms, we tried to find the most efficient realizations of them. Of course, this
does not guarantee that a more efficient realization cannot be found.</p>
      <p>
        In most cases, it was possible to improve the original versions of the algorithms.
Since only a few of the known algorithms that generate the concept set also construct the
diagram graph, we attempted to modify some algorithms so as to make them construct
diagram graphs. The versions of the algorithms used for comparison are presented in
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        As mentioned above, data structures that realize concept sets and diagram graphs
of concept lattices are of great importance. Since their sizes can be exponential w.r.t.
the input size, some of their natural representations are not polynomially equivalent
in the input size, as they would be for ordinary graphs. For example, the size of the incidence matrix of a diagram
graph is quadratic w.r.t. the size of the incidence list of the diagram graph and thus
cannot be reduced to the latter in time polynomial w.r.t. the input. Moreover, some
important operations, such as finding a concept, are performed for some
representations (spanning trees [
        <xref ref-type="bibr" rid="ref10 ref2">2, 10</xref>
        ], ordered lists [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], CbO trees [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], 2-3 trees, see [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for the
definition) in polynomial time, but for some other representations (unordered lists)
they can be performed only in exponential time. A representation of a concept lattice
can be considered reasonable if its size cannot be exponentially compressed w.r.t. the
input and allows the search for a particular concept in time polynomial in the input.
      </p>
      <p>
        Table 1 presents an attempt at classification of algorithms. Note that this
classification refers to our versions of the algorithms described in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] rather than to original
versions (except for Titanic [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], which we have not implemented; it is included in the
classification because it realizes an approach completely different from that of the
other algorithms). Here, we do not address techniques for building diagram graphs;
the attributes of the context in Table 1 describe only the construction of the concept set.
      </p>
      <p>
        All the algorithms can be divided into two categories: incremental algorithms [
        <xref ref-type="bibr" rid="ref11 ref20 ref3 ref5">3, 5,
11, 20</xref>
        ], which, at the ith step, produce the concept set or the diagram graph for i first
objects of the context, and batch ones, which build the concept set and its diagram
graph for the whole context from scratch [
        <xref ref-type="bibr" rid="ref16 ref18 ref2 ref24 ref4 ref7">2, 4, 7, 16, 18, 24</xref>
        ]. Besides, a batch
algorithm typically adheres to one of two strategies: top–down (from the maximal
extent to the minimal one) or bottom–up (from the minimal extent to the maximal one).
However, it is always possible to reverse the strategy of an algorithm by considering
attributes instead of objects and vice versa; therefore, we choose not to include this
property in the classification.
      </p>
      <p>
        Generation of the concept set presents two main problems: (1) how to generate all
concepts; (2) how to avoid repetitive generation of the same concept or, at least, to
determine whether a concept is generated for the first time. There are several ways to
generate a new intent. Some algorithms (in particular, incremental ones) intersect a
generated intent with some object intent. Other algorithms compute an intent
explicitly by intersecting the intents of all objects of the corresponding extent. There are algorithms that,
starting from object intents, create new intents by intersecting already obtained
intents. Lastly, the algorithm from [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] does not use the intersection operation to
generate intents. It forms new intents by adding attributes to those already generated and
tests a condition on supports of attribute sets (the support of an attribute set is the
number of objects whose intents contain all attributes from this set) to determine whether
an attribute set is an intent.
      </p>
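The support-based test can be sketched as follows (an illustrative fragment with an invented context, not the algorithm of [22] itself, which organizes these tests far more efficiently): an attribute set B is an intent exactly when extending it by any outside attribute strictly lowers its support.

```python
# Invented toy context for illustration.
G = {0, 1, 2, 3}
M = {"a", "b", "c"}
I = {(0, "a"), (0, "b"), (1, "a"), (2, "b"), (2, "c"),
     (3, "a"), (3, "b"), (3, "c")}

def support(B):
    """Number of objects whose intents contain every attribute of B."""
    return sum(1 for g in G if all((g, m) in I for m in B))

def is_intent(B):
    # B is closed (an intent) iff adding any outside attribute strictly
    # lowers the support; no intersection operation is needed.
    return all(support(B | {m}) < support(B) for m in M - B)

print(support({"a"}), support({"a", "b"}))  # 3 2
print(is_intent({"a"}), is_intent({"c"}))   # True False ({"c"}'' = {"b", "c"})
```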
      <p>In Table 1, attributes m2–m6 correspond to techniques used to avoid repetitive
generation of the same concept. This can be done by maintaining specific data
structures. For example, the Nourine algorithm constructs a tree of concepts and searches
in this tree for every newly generated concept. Note that other algorithms (e.g.,
Bordat and Close by One) may also use trees for storing concepts, which allows efficient
search for a concept when the diagram graph is to be constructed. However, these
algorithms use other techniques for identifying the first generation of a concept and,
therefore, do not have the m5 attribute in the context of Table 1.</p>
      <p>
        Some algorithms divide the set of all concepts into disjoint sets, which allows
narrowing down the search space. For example, the Chein algorithm stores concepts in
layers, each layer corresponding to some step of the algorithm. The original version of
this algorithm looks through the current layer each time a new concept is generated.
The version we used for comparison does not involve search to detect duplicate
concepts; instead, it employs a canonicity test based on the lexicographical order (similar
to that of Ganter), which made it possible to greatly improve the efficiency of the
algorithm. We use layers only for generation of concepts: a new intent is produced as
the intersection of two intents from the same layer. (In our version of the algorithm,
layers are much smaller than those in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; see [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for details.) The Godin algorithm
uses a hash function (the cardinality of intents), which makes it possible to distribute
concepts among “buckets” and to reduce the search. Several algorithms (Ganter,
Close by One) generate concepts in the lexicographical order of their extents
assuming that there is a linear order on the set of objects. At each step of the algorithm,
there is a current object. The generation of a concept is considered canonical if its
extent contains no object preceding the current object. Our implementation of the
Bordat algorithm uses an attribute cache: the uniqueness of a concept is tested by
intersecting its intent with the content of the cache (for more details, see [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]).
      </p>
      <p>[Table 1 (flattened by extraction): a cross-table marking with “X” which of the attributes m1–m7 apply to the algorithms Close by One, Bordat, Ganter, Lindig, Chein, Nourine, Norris, Godin, Dowling, and Titanic.]</p>
      <p>
        In many cases, we attempted to improve the efficiency of the original algorithms.
Only some of the original versions of the algorithms construct the diagram graph [
        <xref ref-type="bibr" rid="ref11 ref18 ref2 ref21">2,
11, 18, 21</xref>
        ]; it turned out that the other algorithms could be extended to construct the
diagram graph within the same worst-case time complexity bounds.
      </p>
      <p>
        In the next section, we discuss the worst-case complexity bounds of the
considered algorithms. Since the output size can be exponential in the input, it is reasonable
to estimate the complexity of the algorithms not only in terms of input and output sizes,
but also in terms of (cumulative) delay. Recall that an algorithm for listing a family of
combinatorial structures is said to have polynomial delay [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] if it executes at most
polynomially many computation steps before either outputting each next structure or
halting. An algorithm is said to have a cumulative delay d [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] if, at any point of time in any execution of the algorithm on any input p, the total number
of computation steps that have been executed is at most d(p) plus the product of d(p)
and the number of structures that have been output so far. If d(p) can be bounded by a
polynomial in p, the algorithm is said to have polynomial cumulative delay.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Algorithms: a Short Survey</title>
      <p>
        Some top-down algorithms have been proposed in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The algorithm
MItree from [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] generates the concept set, but does not build the diagram graph. In
MItree, every new concept is searched for in the set of all concepts generated so far. The
algorithm of Bordat [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] uses a tree (a “trie,” cf. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) for fast storing and retrieval of
concepts. Our version of this algorithm uses a technique that requires O(|M|) time to
determine, without any search, whether a concept is generated for the first time. The time
complexity of Bordat is O(|G||M|²|L|), where |L| is the size of the concept lattice.
Moreover, this algorithm has polynomial delay O(|G||M|²).
      </p>
      <p>The algorithm proposed by Ganter computes closures for only some of the subsets of
G and uses an efficient canonicity test, which does not address the list of generated
concepts. It produces the set of all concepts in time O(|G|²|M||L|) and has polynomial
delay O(|G|²|M|).</p>
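Ganter's procedure (Next Closure) admits a compact rendering; the sketch below (in Python, over attribute sets rather than extents, under the usual lectic-order formulation) shows how all intents are enumerated without consulting a list of previously generated concepts:

```python
def next_closure(A, m, close):
    """Lectically next closed attribute set after A, or None if A is the last."""
    for i in reversed(range(m)):
        if i in A:
            A = A - {i}
        else:
            B = close(A | {i})
            if not any(j < i for j in B - A):  # canonicity: nothing new below i
                return B
    return None

def all_intents(G, M, I):
    def close(B):
        ext = [g for g in G if all((g, m) in I for m in B)]
        return {m for m in M if all((g, m) in I for g in ext)}
    intents = [close(set())]                   # lectically first intent
    while (nxt := next_closure(intents[-1], len(M), close)) is not None:
        intents.append(nxt)
    return intents

# Toy context with object intents {0,1}, {1}, {1,2}.
G = [0, 1, 2]
M = range(3)
I = {(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)}
print(all_intents(G, M, I))  # [{1}, {1, 2}, {0, 1}, {0, 1, 2}]
```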
      <p>The Close by One (CbO) algorithm uses a similar notion of canonicity, a similar
method for selecting subsets, and an intermediate structure that helps to compute
closures more efficiently using the generated concepts. Its time complexity is
O(|G|²|M||L|), and its polynomial delay is O(|G|³|M|).</p>
      <p>
        The idea of a bottom-up algorithm in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] is to generate the bottom concept and
then, for each concept that is generated for the first time, generate all its upper
neighbors. Lindig uses a tree of concepts that allows one to check whether some
concept was generated earlier. The time complexity of the algorithm is O(|G|²|M||L|). Its
polynomial delay is O(|G|²|M|).
      </p>
      <p>
        The Chein [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] algorithm represents the objects by extent–intent pairs and generates
each new concept intent as the intersection of the intents of two existing concepts. At
every iteration step of the Chein algorithm, a new layer of concepts is created by
intersecting pairs of concept intents from the current layer, and each new intent is
searched for in the new layer. We introduced several modifications that made it
possible to greatly improve the performance of the algorithm. The time complexity of the
modified algorithm is O(|G|³|M||L|). The algorithm has polynomial delay O(|G|³|M|).
      </p>
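The layer step can be sketched as follows (illustration only; duplicate detection across layers is done by the canonicity test described in Section 2, which this fragment omits):

```python
from itertools import combinations

def next_layer(layer):
    """New intents: pairwise intersections of intents from the current layer."""
    return {a & b for a, b in combinations(layer, 2)} - layer

# Object intents form the initial layer (invented example).
layer0 = {frozenset("ab"), frozenset("bc"), frozenset("abd")}
print(next_layer(layer0))  # {frozenset({'b'})}: 'ab' is re-generated and discarded
```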
      <p>Due to their incremental nature, the algorithms considered below do not have
polynomial delay. Nevertheless, they all have polynomial cumulative delay.</p>
      <p>
        Nourine proposes an O((|G| + |M|)|G||L|) algorithm for the construction of the
lattice using a lexicographic tree [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] with edges labeled by attributes and nodes labeled
by concepts. Note that this algorithm is only half-incremental: first, it
incrementally constructs the concept set, outputting a tree of concepts; next, it uses
this tree to construct the diagram graph.
      </p>
      <p>
        The algorithm proposed by Norris [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] is essentially an incremental version of the
CbO algorithm. The original version of the Norris algorithm from [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] does not
construct the diagram graph. The time complexity of the algorithm is O(|G|²|M||L|).
      </p>
      <p>
        The algorithm proposed by Godin [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] has worst-case time complexity
quadratic in the number of concepts. This algorithm is based on the use of an efficiently
computable hash function f (actually, the cardinality of the intent) defined on
the set of concepts.
      </p>
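The bucketing can be sketched as follows (a schematic fragment; the names are ours, not Godin's): concepts whose intents have different cardinalities can never coincide, so a duplicate check only needs to search one bucket.

```python
from collections import defaultdict

buckets = defaultdict(list)    # hash value f = |intent|  ->  list of intents

def add_if_new(intent):
    """Store a generated intent unless its bucket already contains it."""
    bucket = buckets[len(intent)]
    if intent in bucket:       # the search is confined to a single bucket
        return False
    bucket.append(intent)
    return True

print(add_if_new(frozenset("ab")))  # True: first generation
print(add_if_new(frozenset("cd")))  # True: same bucket, different intent
print(add_if_new(frozenset("ab")))  # False: duplicate detected
```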
      <p>
        Dowling proposed [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] an incremental algorithm for computing knowledge spaces.
A dual formulation of the algorithm allows generation of the concept set. Although the
theoretical worst-case complexity of the algorithm is O(|M||G|²|L|), the
constant in this upper bound seems to be too large, and in practice the algorithm
performs worse than the other algorithms.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4 Results of Experimental Tests</title>
      <p>
        The algorithms were implemented in C++ in the Microsoft Visual C++ environment.
The tests were run on a Pentium II–300 computer with 256 MB of RAM. Here, we present a
number of charts that show how the execution time of the algorithms depends on
various parameters. More charts can be found in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>[Chart: execution time in milliseconds (scale up to 35000) versus |G| for the algorithms CbO, Godin, Ganter, Lindig, Norris, Nourine, and Bordat; only the axis labels and the legend survived extraction.]</p>
      <p>For tests, we used randomly generated data. Contexts were generated based on
three parameters: |G|, |M|, and the number of attributes per object (denoted below as
|g'|; all objects of the same context had equal numbers of attributes). Given |g'|, every
row of the context (i.e., every object intent) was generated by successively calling the
rand function from the standard C library to obtain the numbers of the attributes
constituting the object intent, which leads to a uniform distribution.</p>
      <p>[Chart: execution time in milliseconds (scale up to 200000) versus |G| (from 10 to 100) for the algorithms Chein, Bordat, CbO, GodinEx, Ganter, Godin, Norris, and Dowling.]</p>
      <p>The Godin algorithm (and GodinEx, the version of the Godin algorithm that uses the
cardinality of extents for the hash function) is a good choice in the case of a sparse
context. However, when contexts become denser, its performance decreases
dramatically. The Bordat algorithm seems most suitable for large contexts, especially
if it is necessary to build the diagram graph. When |G| is small, the Bordat algorithm
runs several times slower than the other algorithms, but, as |G| grows, the difference
between Bordat and the other algorithms becomes smaller, and, in many cases, Bordat
finally turns out to be the leader. For large and dense contexts, the fastest algorithms
are the bottom-up canonicity-based algorithms (Norris, CbO).</p>
      <p>[Chart: execution time in milliseconds (scale up to 200000) versus |G| (from 10 to 40) for the same two groups of algorithms.]</p>
      <p>It should be noted that the Nourine algorithm, despite featuring the smallest time
complexity, has not been the fastest algorithm: even when diagonal contexts of the
form (G, G, ≠) (which correspond to the worst case) are processed, its performance
was inferior to that of the Norris algorithm. Probably, this can be attributed to the fact
that we represent attribute sets by bit strings, which allows a very efficient
implementation of set-theoretical operations (32 attributes per processor cycle), whereas,
when searching in the Nourine-style lexicographic tree, one still should individually
consider each attribute labeling the edges.</p>
      <p>Figures 8–9 show the execution time for the contexts of the form (G, G, ≠), which
yield 2^|G| concepts.</p>
      <p>[Charts (Figs. 8–9): execution time in milliseconds versus |G| = |M| (from 10 to 18) for the algorithms Chein, Bordat, CbO, GodinEx, Ganter, Godin, Norris, Dowling, Lindig, and Nourine.]</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion</title>
      <p>In this work, we attempted to compare, both theoretically and experimentally, some
well-known algorithms for constructing concept lattices. We discussed principles of
experimental comparison.</p>
      <p>
        A new algorithm was proposed in [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] quite recently, so we could not include it in
our experiments. Its worst-case time complexity is not better than that of the algorithms
described above, but the authors report good practical performance on
databases with a very large number of objects. Comparing the performance of this
algorithm with those considered above and testing the algorithms on large databases,
including “classical” ones, will be the subject of further work. We can also mention
works [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] where similar algorithms were applied to machine learning and data
analysis; e.g., in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] a Bordat-type algorithm was used.
      </p>
      <p>The choice of an algorithm for construction of the concept lattice should be based
on the properties of input data. Recommendations based on our experiments are as
follows: the Godin algorithm should be used for small and sparse contexts; for dense
contexts, the algorithms based on the canonicity test, linear in the number of input
objects, such as Close by One and Norris, should be used. Bordat performs well on
contexts of average density, especially, when the diagram graph is to be constructed.
Of course, these recommendations should not be considered the final judgement.
With this work, we would rather like to provoke further interest in the well-substantiated
comparison of algorithms that generate concept lattices.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aho</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hopcroft</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <source>Data Structures and Algorithms</source>
          , Reading, Addison-Wesley,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bordat</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          , Calcul pratique du treillis de Galois d'une correspondance,
          <source>Math. Sci. Hum</source>
          .,
          <year>1986</year>
          , no.
          <issue>96</issue>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Carpineto</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Romano</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <article-title>A Lattice Conceptual Clustering System and Its Application to Browsing Retrieval</article-title>
          ,
          <source>Machine Learning</source>
          ,
          <year>1996</year>
          , no.
          <issue>24</issue>
          , pp.
          <fpage>95</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chein</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Algorithme de recherche des sous-matrices premières d'une matrice</article-title>
          ,
          <source>Bull. Math. Soc. Sci. Math. R.S. Roumanie</source>
          ,
          <year>1969</year>
          , no.
          <issue>13</issue>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>25</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dowling</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <article-title>On the Irredundant Generation of Knowledge Spaces</article-title>
          ,
          <source>J. Math. Psych.</source>
          ,
          <year>1993</year>
          , vol.
          <volume>37</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Finn</surname>
            ,
            <given-names>V.K.</given-names>
          </string-name>
          ,
          <source>Plausible Reasoning in Systems of JSM Type, Itogi Nauki i Tekhniki</source>
          , Ser. Informatika,
          <year>1991</year>
          , vol.
          <volume>15</volume>
          , pp.
          <fpage>54</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <article-title>Two Basic Algorithms in Concept Analysis</article-title>
          ,
          FB4-Preprint No. 831, TH Darmstadt,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Wille</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <source>Formal Concept Analysis. Mathematical Foundations</source>
          , Springer,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ganter</surname>
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>Formalizing Hypotheses with Concepts</article-title>
          ,
          <source>Proc. of the 8th International Conference on Conceptual Structures, ICCS 2000, Lecture Notes in Artificial Intelligence</source>
          , vol.
          <volume>1867</volume>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ganter</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Reuter</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <article-title>Finding All Closed Sets: A General Approach</article-title>
          ,
          <source>Order</source>
          ,
          <year>1991</year>
          , vol.
          <volume>8</volume>
          , pp.
          <fpage>283</fpage>
          -
          <lpage>290</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Godin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Missaoui</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Alaoui</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>Incremental Concept Formation Algorithms Based on Galois Lattices</article-title>
          ,
          <source>Computational Intelligence</source>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Goldberg</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <source>Efficient Algorithms for Listing Combinatorial Structures</source>
          , Cambridge University Press,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Guénoche</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Construction du treillis de Galois d'une relation binaire</article-title>
          ,
          <source>Math. Inf. Sci. Hum</source>
          .,
          <year>1990</year>
          , no.
          <issue>109</issue>
          , pp.
          <fpage>41</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>D.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yannakakis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Papadimitriou</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <article-title>On Generating All Maximal Independent Sets</article-title>
          ,
          <source>Inf. Process. Lett.</source>
          ,
          <year>1988</year>
          , vol.
          <volume>27</volume>
          ,
          <fpage>119</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <article-title>Interpretation on Graphs and Complexity Characteristics of a Search for Specific Patterns</article-title>
          ,
          <source>Automatic Documentation and Mathematical Linguistics</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>1</issue>
          (
          <year>1989</year>
          )
          <fpage>37</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <article-title>A Fast Algorithm for Computing All Intersections of Objects in a Finite Semi-lattice</article-title>
          ,
          <source>Automatic Documentation and Mathematical Linguistics</source>
          , vol.
          <volume>27</volume>
          , no.
          <issue>5</issue>
          (
          <year>1993</year>
          )
          <fpage>11</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ob''edkov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <article-title>Algorithm for the Construction of the Set of All Concepts and Their Line Diagram</article-title>
          ,
          <source>Preprint MATH-Al-05</source>
          , TU-Dresden,
          <year>June 2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Lindig</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <article-title>Algorithmen zur Begriffsanalyse und ihre Anwendung bei Softwarebibliotheken</article-title>
          , Dr.-Ing. Dissertation, Techn. Univ. Braunschweig
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Mephu Nguifo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Njiwoua</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>Using Lattice-Based Framework as a Tool for Feature Extraction</article-title>
          , in
          <source>Feature Extraction, Construction and Selection: A Data Mining Perspective</source>
          , H. Liu and H. Motoda, Eds., Kluwer,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Norris</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <article-title>An Algorithm for Computing the Maximal Rectangles in a Binary Relation</article-title>
          ,
          <source>Revue Roumaine de Mathématiques Pures et Appliquées</source>
          ,
          <year>1978</year>
          , no.
          <issue>23</issue>
          (
          <issue>2</issue>
          ), pp.
          <fpage>243</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Nourine</surname>
            <given-names>L.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Raynaud</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <article-title>A Fast Algorithm for Building Lattices</article-title>
          ,
          <source>Information Processing Letters</source>
          , vol.
          <volume>71</volume>
          ,
          <year>1999</year>
          ,
          <fpage>199</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Stumme</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taouil</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastide</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasquier</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakhal</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>Fast Computation of Concept Lattices Using Data Mining Techniques</article-title>
          ,
          in
          <source>Proc. 7th Int. Workshop on Knowledge Representation Meets Databases (KRDB 2000)</source>
          ,
          <year>2000</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Stumme</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wille</surname>
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Wille</surname>
            <given-names>U.</given-names>
          </string-name>
          ,
          <article-title>Conceptual Knowledge Discovery in Databases Using Formal Concept Analysis Methods</article-title>
          ,
          <source>in Proc. 2nd European Symposium on Principles of Data Mining and Knowledge Discovery (PKDD'98).</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Zabezhailo</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ivashko</surname>
            ,
            <given-names>V.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>S.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikheenkova</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khazanovskii</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Anshakov</surname>
            ,
            <given-names>O.M.</given-names>
          </string-name>
          ,
          <article-title>Algorithms and Programs of the JSM-Method of Automatic Hypothesis Generation</article-title>
          ,
          <source>Automatic Documentation and Mathematical Linguistics</source>
          , vol.
          <volume>21</volume>
          , no.
          <issue>5</issue>
          (
          <year>1987</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>