=Paper= {{Paper |id=None |storemode=property |title=Generation Algorithm of a Concept Lattice with Limited Object Access |pdfUrl=https://ceur-ws.org/Vol-959/paper16.pdf |volume=Vol-959 |dblpUrl=https://dblp.org/rec/conf/cla/DemkoB11 }} ==Generation Algorithm of a Concept Lattice with Limited Object Access== https://ceur-ws.org/Vol-959/paper16.pdf
    Generation algorithm of a concept lattice with
                limited object access

                           Ch. DemkoF, K. Bertet

      L3I - Université de La Rochelle - av Michel Crépeau - 17042 La Rochelle
                             cdemko,kbertet@univ-lr.fr
                       F Joomla! Production Leadership Team
                            christophe.demko@joomla.org



      Abstract. Classical algorithms for generating the concept lattice (C, ≤
      ) of a binary table (O, I, R) have a complexity in O(|C| ∗ |I|2 ∗ |O|).
      Although the number of concepts is exponential in the size of the table
      in the worst case, the generation of a concept is output polynomial. In
      practice, the number of concepts is often polynomial in the size of the
      table. However, the cost of generating a concept remains high when the
      table is composed of a large number of objects.
      We propose in this paper an algorithm for generating the lattice with
      limited object access, which can improve the computation time. Experi-
      ments were conducted with Joomla!, a content management system based
      on relational algebra, and located on a MySQL database.


    keywords: concept lattice ; databases ; algorithm


1    Introduction
Galois lattices (or concept lattices) were first introduced in a formal way in the
graph and ordered structures theory [1–3]. Later, they were developed in the field
of Formal Concept Analysis (FCA) [4] for data analysis and classification. The
concept lattice structure, based on the notion of concept, enables data description
while preserving its diversity. It is used to analyse data when organised by a
binary relation between objects and attributes.
    Galois lattice is a graph providing a representation of all the possible cor-
respondences between a set of objects (or examples) O and a set of attributes
(or features) I. The technological improvements of the last decades enable use
of these structures for data mining problems though they are exponential in
space/time (worst case). It has to be noted that in practice, in most cases, the
size of the lattice remains reasonable.
    In addition, some applications offer to only generate some concepts from
the huge amount of available data. Bordat’s algorithm [5] is the more appro-
priate since it generates the cover relation between concepts, and thus allows
an on-demand generation of concepts. Moreover, huge amount of data are of-
ten described by a huge amount of objects. It is the case in databases where
sophisticated key-indexation techniques are used to improve object access.
    In this paper, we propose the Limited Object Access algorithm (LOA algo-
rithm), an extension of Bordat’s immediate successors generation with a limited
access to objects. This algorithm, compounded with an on-demand strategy, and
with sophisticated key-indexation techniques to improve objects’s access, aims
to improve time computation for a large amount of objects. However, worst case
theoretical complexity remains the same as Bordat’s algorithm. Experiments
were conducted with Joomla!, a content management system based on relational
algebra, and located on a MySQL database.
    This paper is organized as follows. In section 2, we describe the Galois lattice
structure and the Bordat’s generation algorithm. In section 3, we describe our
limited object access algorithm, illustrated by an example and some experiments.

2     Description and generation of a concept lattice
2.1    Description of a concept lattice
The concept lattice is a particular graph defined and generated from a relation R
between objects O and attributes I. This graph is composed of a set of concepts
ordered by a relation verifying the properties of a lattice, i.e. an order relation
≤ (transitive, reflexive and antisymmetric relation) such that, for each pair of
concepts in the graph, there exists both a lower bound and an upper bound.
Therefore, a lattice contains a minimum (resp. maximum) element according to
the relation ≤ called the bottom (resp. top) of the lattice. The Hasse diagram of
a graph [1] is the cover relation of ≤ denoted as ≺, i.e. the suppression on the
graph of both transitivity and reflexivity edges.
   We associate to a set of objects A ⊆ O the set f (A) of attributes in relation
R with the objects of A:
                          f (A) = {y ∈ I | xRy ∀ x ∈ A}
Dually, to a set of attributes B ⊆ I, we define the set g(B) of objects in relation
with the attributes of B:
                          g(B) = {x ∈ O | xRy ∀ y ∈ B}
These two functions f and g defined between objects and attributes form a Galois
correspondence. The relation between the set of objects and the set of attributes
is described by a formal context. A formal context C is a triplet C = (O, I, R)
(or C = (O, I, (f, g))) represented by a table.
    A formal concept represents maximal objects-attributes correspondences (fol-
lowing relation R) by a pair (A, B) with A ⊆ O and B ⊆ I, which verifies
f (A) = B and g(B) = A. The whole set of formal concepts thus corresponds to
all the possible maximal correspondences between a set of objects O and a set
of attributes I.
    Two formal concepts (A1 , B1 ) and (A2 , B2 ) are in relation in the lattice when
they verify the following inclusion property:
                                            A2 ⊆ A1
                (A1 , B1 ) ≤ (A2 , B2 ) ⇔
                                            (equivalent to B1 ⊆ B2 )
    The whole set of formal concepts fitted out by the order relation ≤ is called
concept lattice or Galois lattice because it verifies the lattice properties: the
relation ≤ is clearly an order relation, and for each pair of concepts (A1 , B1 ) and
(A2 , B2 ), there exists the greatest lower bound (resp. the least upper bound)
called meet (resp. join) denoted (A1 , B1 ) ∧ (A2 , B2 ) (resp. (A1 , B1 ) ∨ (A2 , B2 ))
defined by:
                   (A1 , B1 ) ∧ (A2 , B2 ) = (g(B1 ∩ B2 ), (B1 ∩ B2 ))              (1)

                  (A1 , B1 ) ∨ (A2 , B2 ) = ((A1 ∩ A2 ), f (A1 ∩ A2 ))              (2)
The concepts ⊥ = (O, f (O)) and > = (g(I), I) respectively correspond to the
bottom and the top of the concept lattice.
   In formal concept analysis (FCA) concept lattices are used to analyse data
when organised by a binary relation between objects and attributes. See the
book of Ganter and Wille [4] for a more complete description of formal concept
analysis.
   In the following, we abuse notation and use X + x (respectively, X \ x) for
X ∪ {x} (respectively, X\{x}).


2.2   Generation algorithms of a concept lattice

Numerous generation algorithms for concept lattices have been proposed in lit-
erature [6,7,5,8]. Although all these algorithms generate the same lattice, they
propose different strategies. Some of these algorithms are incremental [6,9]. Gan-
ter’s NextClosure [7] is the reference algorithm that determines the concepts in
lectical order (next, the concepts may be ordered by ≤ to form the concept lat-
tice) while Bordat’s algorithm [5] is the first algorithm that computes directly
the Hasse diagram of the lattice. Recent work [10] proposed a generic algorithm
unifying the existing algorithms in a unique framework, which makes easier the
comparison of these algorithms. A formal and experimental comparative study
of the different algorithms has been published [11].
    All of these proposed algorithms have a polynomial complexity with respect
to the number of concepts (at best quadratic in [8]). The complexity is therefore
determined by the size of the lattice, this size being bounded by 2|O+I| in the
worst case and by |O + I| in the best case. Studies on average complexity are
difficult to carry out because the size of the concept lattice depends both on the
dimensionality of the data to classify and on their organization and diversity.
However, in practice the size of the Galois lattice generally remains reasonable.
    Some applications offer to only generate some concepts from the huge amount
of available data. Bordat’s algorithm [5] is the more appropriate since it generates
the cover relation between concepts, and thus allows an on-demand generation
of concepts usually used in concrete applications. Bordat’s algorithm is issued
from a corollary of Bordat’s theorem:

Theorem 1 (Bordat [5]). Let (A, B) and (A0 , B 0 ) be two concepts of a context
(O, I, R). Then (A, B) ≺ (A0 , B 0 ) if and only if A0 is inclusion-maximal in the
following set system FA defined on O1 :
                             FA = {g(x + B) : x ∈ I\B}                           (3)
Corollary 2 (Bordat [5]). Let (A, B) be a concept. There is a one-to-one map-
ping between the immediate successors of (A, B) in the Hasse diagram of the
lattice and the inclusion-maximal subsets of FA .
    Bordat’s algorithm recursively computes all the concepts of C by computing
immediate successors for each concept (A, B), starting from the bottom concept
⊥ = (f (G), G), until all concepts are generated. Immediate successors are gen-
erated using Corollary 2 in O(|I|2 ∗ |O|): the set system has first to be generated
in a linear time ; then inclusion-maximal subsets of FB , can easily be computed
in O(|I|2 ∗ |O|).

3     Limited Object Access Algorithm (LOA)
3.1     Description of the LOA Algorithm
Large data are often described by a huge amount of objects, as in databases for
example where the number of recordings (i.e. objects) can be huge, indexed using
sophisticated key-indexation techniques. In this section, we describe our Limited
Object Access algorithm (LOA algorithm), an extension of Bordat’s immediate
successors generation with a limited object access. This algorithm, compounded
with an on-demand strategy aims to improve time computation for large amount
of objects.
    Our algorithm considers the restriction of a concept lattice to the attributes.
A nice result establishes that any concept lattice (C, ≤C ) is isomorphic to the
lattice (CI , ⊆) defined on the set I of attributes, with CI the restriction of C to
the attributes in each concept. The lattice (CI , ⊆) is also known as the closed
sets lattice on the attributes I of a context (O, I, R), where the set system CI is
composed of all closed set - i.e. fixed points - for the closure operator ϕ = g ◦ f .
See the survey of Caspard and Monjardet [12] for more details about closed set
lattices.
    Using the closed sets lattice (CI , ⊆) instead of the whole concept lattice
(C, ≤C ) gives raise to a storage improvement, for example in case of large amount
of objects.
    A closed sets lattice can be generated using an algorithm similar to Bordat’s
algorithm, and therefore enabling an on-demand generation in order to reduce
the whole amount of closed sets. This algorithm (see Alg. 1) recursively computes
immediate successors (see Alg. 2) of a closed set B, starting from the bottom
closed set ⊥ = ϕ(∅), until I is generated.
    The Immediates Successors LOA algorithm we propose (see Alg. 3) rein-
forces the object access limitation by considering the cardinality of the subset
g(X + B) instead of the subset itself to compute the inclusion-maximal subsets
of FA using the following property:
1
    In [5], the equivalent formulation g(x) ∩ A is used instead of g(x + B)
Proposition 3. Consider a concept (A, B), and two subsets X and Y of at-
tributes in B\I. Then
          g(X + B) ⊆ g(Y + B) ⇐⇒ |g(X + B)| = |g(X + Y + B)|                  (4)
This proposition is a direct consequence of the two following remarks:
1. The equivalence between inclusion and intersection set operations (C ⊆
   D ⇐⇒ C = C ∩ B) allows to deduce the equivalence between g(X + B) ⊆
   g(Y + B) and g(X + B) = g(X + B) ∩ g(Y + B):
2. Then, by definition of g, we have g(X + B) ∩ g(Y + B) = g(X + Y + B).
    More precisely, the Immediates Successors LOA algorithm (see Alg. 3) first
initialize the set Succ of immediate successors of a closed set B with the emp-
tyset. The set Succ is then updated by considering each attribute x of I\B and
another already inserted potential successor X ⊆ I\B by considering the fol-
lowing four cases, where cB (Y ) denotes the cardinality of g(B + Y ) for a Y of
attributes:
Merge x with X: When g(x + B) = g(X + B), then x and X belongs to the
   same closed set, and thus have to be merged in a same potential successor
   of B. By Proposition 3, this case is tested by cB (X + x) = cB (X) and
   cB (X) = cB (x).
Eliminate X: When g(X + B) ⊂ g(x + B), then the closed set containing X
   isn’t inclusion-maximal in FA , and thus hasn’t to be considered as a potential
   successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X)
   and cB (X) < cB (x).
Eliminate x: When g(x + B) ⊂ g(X + B), then the closed set containing x
   isn’t inclusion-maximal if FA , and thus hasn’t to be considered as a potential
   successor of B. By Proposition 3, this case is tested by cB (X + x) = cB (X)
   and cB (x) < cB (X).
Insert x: When x is neither eliminated or merged with X, then it is added as
   a potential successor of B ; another attribute is then considered.


3.2   Example
To illustrate our algorithm, we use the following context where numbers from 1
to 9 are described by some properties: the number is a prime number, an odd or
even number, a square, a composite number or a factorial number.
                 (p)rime o(dd) (e)ven (s)quare (c)omposite (f)actorial
          nb 1             ×             ×                     ×
          nb 2      ×            ×                             ×
          nb 3      ×      ×
          nb 4                   ×       ×          ×
          nb 5      ×      ×
          nb 6                   ×                  ×          ×
          nb 7      ×      ×
          nb 8                   ×                  ×
          nb 9             ×             ×          ×
   Name: Closed Set Lattice
   Data: A context K = (O, I, R)
   Result: The Hasse diagram (CI , ≺) of the lattice (CI , ⊆)
   begin
      CI = {f (O)};
      foreach B ∈ CI not marked do
          SuccB =Immediates successors (K, B);
          foreach X ∈ SuccB do
              B 0 = B + X;
              if B 0 6∈ CI then add B 0 to CI ;
              add a cover relation B ≺ B 0
          end
          mark B
      end
      return (CI , ≺)
   end
Algorithm 1: Generation of the Hasse diagram of the closed set lattice (CI , ⊆)




   Name: Immediates Successors
   Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K
   Result: The immediate successors of B in the lattice
   begin
      initialize the set system FA with ∅;
      foreach x ∈ I\B do
           add g(x + B) to FA
      end
      Succ=maximal inclusion subsets of FA ;
      return Succ
   end
Algorithm 2: Generation of the immediate successors of a closed set in the Hasse
diagram of the lattice (CI , ⊆)
   Name: Immediates Successors LOA
   Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K
   Result: The immediate successors of B in the lattice
   begin
      initialize the SuccB family to an empty set;
      foreach x ∈ I \ B do
           add = true;
           foreach X ∈ SuccB do
               \\ Merge x and X in the same potential successor
               if cB (x) = cB (X) then
                   if cB (X + x) = cB (x) then
                       replace X by X + x in SuccB ;
                       add=false; break;
                   end
               end
               \\ Eliminate x as potential successor
               if cB (x) < cB (X) then
                   if cB (X + x) = cB (x) then
                       add=false; break;
                   end
               end
               \\ Eliminate X as potential successor
               if cB (x) > cB (X) then
                   if cB (X + x) = cB (X) then
                       delete X from SuccB
                   end
               end
           end
           \\ Insert x as a new potential successor ;
           if add then add {x} to SuccB
      end
      return SuccB ;
   end
Algorithm 3: Generation of the immediate successors of a closed set in the Hasse
diagram of the lattice (CI , ⊆)
                                      Fig. 1. Concept lattice


    Figure 1 gives the concept lattice of this context. When the algorithm com-
putes the successors of the closed sets e (resp. p), it proceeds as described in
Table 1 (resp. Table 2). The different steps of these two examples show the
different actions taken by the algorithm.

 SuccF         x   cB (x)   X     cB (X) cB (x + X) Case                        Action
 ∅             p   1                                                            Insert [p]
 {[p]}         o   0        [p]   1       0        cB (x + X) = cB (x) < cB (X) Eliminate [o]
 {[p]}         s   1        [p]   1       0        cB (x + X) < cB (X) = cB (x)
 {[p]}         c   3        [p]   1       0        cB (x+X) < cB (X) < cB (x) Insert [c]
 {[p], [c]}    f   2        [p]   1       1        cB (x + X) = cB (X) < cB (x) Eliminate [p]
 {[c]}         f   2        [c]   3       1        cB (x+X) < cB (x) < cB (X) Insert [f ]
 {[c], [f ]}

                            Table 1. Immediate successors of [e]




3.3      Complexity
The complexity of computing the immediate successors of a closed set B using
the Immediates Successors LOA algorithm is:
                            (|I| − |B|)(|I| − |B|)
                                                   ∗ O(cB (X))
                                      2
 SuccF          x   cB (x)   X     cB (X) cB (x + X) Case                        Action
 ∅              o   3                                                            Insert [o]
 {[o]}          e   1        [o]   3     0          cB (x+X) < cB (x) < cB (X) Insert [e]
 {[o], [e]}     s   0        [o]   3     0          cB (x + X) = cB (x) < cB (X) Eliminate [s]
 {[o], [e]}     c   0        [o]   3     0          cB (x + X) = cB (x) < cB (X) Eliminate [c]
 {[o], [e]}     f   1        [o]   3     0          cB (x+X) < cB (x) < cB (X)
 {[o], [e]}     f   1        [e]   1     1          cB (x + X) = cB (x) = cB (X) Merge [e], [f ]
 {[o], [ef ]}

                             Table 2. Immediate successors of [p]




      which leads to
                                   O((|I| − |B|)2 ∗ O(cB (X)))
using the big O notation.
    This has to be compared with O(|I|2 ∗ |O|) of the Immediates Successors
algorithm. In addition the cost O(CB (x)) of computing the cardinality of objects
satisfying the required properties can be based on multiple keys and robust
algorithms used in databases that do not need to load all data for computing a
cardinality.


3.4     Experimentations

In the experiment, we use a dataset composed of:

54 attributes: the 6 attributes of the example, and attributes corresponding
   to the property to be multiple of {3 − 50}.
100000 objects: the integers between 1 and 100000

   The dataset is stored in a database MySQL 5.5.14. We have implemented
our Immediates Successors LOA algorithm using PhP 5.3.6. The counting of
objects satisfying a set of properties is realised by the SQL request comparing
indexes with a constant:

select count (*) from att1=1 and att2=1

    We compare the processing time of our Immediates Successors LOA algo-
rithm in the two following cases:

Indexed: Each attribute is defined to be an index. Objects are indexed by their
   attributes, and MySQL can quickly retrieve them in the dataset using a B-
   tree indexation with a logarithmic complexity [13]: O(cB (X)) = O(log|I|).
Not indexed: Objects are not indexed and a scan of all the lines is neces-
   sary to retrieve objects. The complexity is then similar to those of the
   Immediates Successors algorithm: O(cB (X)) = O(|I| − |B|).

   We compare the processing time of computing the immediate successors of
the botom element ∅ in this two cases (indexed and not indexed):
            (a) 100000 first integers as objects ; a number of attributes
            between 6 to 54.




                   (b) 54 attributes ; integers between 1000 and 100000


                 Fig. 2. Calculating the immediate successors of ∅



Fig.2(a): for the 100000 first integers as objects, and a number of attributes
   between 6 to 54.
Fig.2(b): for the 54 attributes, and integer between 1000 and 100000.

In the results, the time processing is really improved with an indexed dataset,
and seems to be near to linear in O(|I| + |O|). Immediate successors of the ∅ for
100000 objects and 54 attributes are computed in 3 seconds with the indexed
algorithm, and in 18 seconds with the not indexed one.
   Moreover, the explain of the count operation shows that an index-merge
operation is realized on indexes corresponding to an intersect computation:

mysql> explain select count(*) from numbers where p=1 and o=1;
+----+-------------+------+---------+------+-----------------------+
| id | select_type | key | key_len | rows | Extra                  |
+----+-------------+------+---------+------+-----------------------+
| 1 | index_merge | p,o | 1,1       |    2 | Using intersect(p,o); |
|    |             |      |         |      | Using where;          |
|    |             |      |         |      | Using index           |
+----+-------------+------+---------+------+-----------------------+
1 row in set (0.00 sec)


   Therefore, optimizing the intersection operation, with an adaptated sort on
the lines for example, would be a possible optimization of our algorithm.


4   Conclusion

In this paper, we described a new algorithm for computing the immediate succes-
sors of a concept using the counting of objects satisfying a set of properties. By
separating the counting from the rest of the algorithm, new systems for explor-
ing concept lattices can now rely on optimization algorithms used in relational
databases. If the tests we will realize on PostgreSQL and MySQL databases are
successfull in terms of manipulating a huge amounts of data, we plan to propose
a library for extending content management system such as Joomla!.


References

 1. Birkhoff, G.: Lattice theory. 3d edn. American Mathematical Society (1967)
 2. Barbut, M., Monjardet, B.: Ordres et classifications : Algèbre et combinatoire.
    Hachette, Paris (1970) 2 tomes.
 3. Davey, B., Priestley, H.: Introduction to lattices and orders. 2nd edn. Cambridge
    University Press (1991)
 4. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical foundations.
    Springer Verlag, Berlin (1999)
 5. Bordat, J.: Calcul pratique du treillis de Galois d’une correspondance. Math. Sci.
    Hum. 96 (1986) 31–47
 6. Norris, E.: An algorithm for computing the maximal rectangles in a binary relation.
    Revue Roumaine de Mathématiques Pures et Appliquées 23 (1978)
 7. Ganter, B.: Two basic algorithms in concept analysis. Technische Hochschule
    Darmstadt (Preprint 831) (1984)
 8. Nourine, L., Raynaud, O.: A fast algorithm for building lattices. Information
    Processing Letters 71 (1999) 199–204
 9. Gödin, R., Missaoui, R., Alaoui, H.: Incremental concept formation algorithms
    based on Galois (concept) lattices. Computational Intelligence 11 (1995) 246–267
10. Gely, A.: A generic algorithm for generating closed sets of binary relation. Third
    International Conference on Formal Concept Analysis (ICFCA 2005) (2005) 223–
    234
11. Kuznetsov, S., Obiedkov, S.: Comparing performance of algorithms for generating
    concept lattices. Journal of Experimental and Theorical Artificial Intelligence 14
    (2002) 189–216
12. Caspard, N., Monjardet, B.: The lattice of closure systems, closure operators and
    implicational systems on a finite set: a survey. Discrete Applied Mathematics 127
    (2003) 241–269
13. Bayer, R. et McCreight, E.M.: Organization and maintenance of large ordered
    indexes. Acta Informatica 1 (1972) 173–189