=Paper=
{{Paper
|id=None
|storemode=property
|title=Information Retrieval by On-line Navigation in the Latticial Space-search of a Database, with Limited Objects Access
|pdfUrl=https://ceur-ws.org/Vol-939/paper5.pdf
|volume=Vol-939
|dblpUrl=https://dblp.org/rec/conf/ecai/DemkoB12
}}
==Information Retrieval by On-line Navigation in the Latticial Space-search of a Database, with Limited Objects Access==
<pdf width="1500px">https://ceur-ws.org/Vol-939/paper5.pdf</pdf>
<pre>
    Information retrieval by on-line navigation in
     the latticial space-search of a database, with
                  limited objects access

                           Ch. DemkoF, K. Bertet

      L3I - Université de La Rochelle - av Michel Crépeau - 17042 La Rochelle
                             cdemko,kbertet@univ-lr.fr
                       F Joomla! Production Leadership Team
                            christophe.demko@joomla.org


       Abstract. We propose in this paper basic operations with limited ac-
       cess to the objects in the table, which can improve the computation
       time. Experiments were conducted with Joomla!, a content management
       system based on relational algebra, and located on a MySQL database.
       This work follows the results presented in [5].


    keywords: concept lattice ; databases ; algorithm ; closure operator


1    Introduction

Galois lattice is a graph providing a representation of all the possible correspon-
dences between a set of objects (or examples) O and a set of binary attributes (or
features) I. Galois lattices (or concept lattices) were first introduced in a formal
way in the graph and ordered structures theory [2,1,4].
    The concept lattice is a rich and flexible navigation structure automatically
derived from the context, and can therefore be considered as a dynamic and
complete space search enables data description while preserving its diversity.
Querying and navigation can be freely combined: to each user request corre-
sponds a concept of the lattice as answer ; the user can then improve its search
either by amending its request, or by on-line browsing arround the concept in
the lattice structure.
    In this paper, we propose an implementation of these basic operations with
Limited Object Access, aiming to improve time computation for a large amount
of objects in large databases.
    This paper is organized as follows. In section 2, we describe the concept
lattice and the closed set lattice. In section 3, we present the motivations that
have conducted this word. In section 4, we describe our basic operations with
limited object access. In section 5, we present some experiments.
2   Concept lattice: definition and generation

Definition. The concept lattice is a particular graph defined and generated from
a a binary table (also denoted a formal context) C described by a relation R
between a set of objects O and a set of attributes I. We associate to a set of
objects A ⊆ O the set f (A) of attributes in relation R with the objects of A:

                          f (A) = {y ∈ I | xRy ∀ x ∈ A}

Dually, to a set of attributes B ⊆ I, we define the set g(B) of objects in relation
with the attributes of B:

                          g(B) = {x ∈ O | xRy ∀ y ∈ B}

These two functions f and g defined between objects and attributes form a
Galois correspondence. A formal concept represents maximal objects-attributes
correspondences (following relation R) by a pair (A, B) with A ⊆ O and B ⊆ I,
which verifies f (A) = B and g(B) = A. The whole set of formal concepts thus
corresponds to all the possible maximal correspondences between a set of objects
O and a set of attributes I. Two formal concepts (A1 , B1 ) and (A2 , B2 ) are in
relation when they verify the following inclusion property:

                                            A2 ⊆ A1
                (A1 , B1 ) ≤ (A2 , B2 ) ⇔
                                            (equivalent to B1 ⊆ B2 )


3   Motivations

The existence of a concept lattice underlying a data table allows to consider an
information retrieval strategy combining querying and navigation by a browsing
in this lattice as in an area of research. Indeed, the user request is a concept of a
lattice, and the user can then improve its search either by amending its request,
or by browsing in the lattice structure. From a computational point of view, such
a mechanism of information retrieval by request and by navigation requires two
main operations:

 1. Generation of the smallest concept (g(B), f (g(B)) containing a given subset
    B of attributes: B is the request, the objects part f (g(B)) of the concept is
    the answer, g(B) are inferred attributes.
 2. Generation of the immediates successors of a given concept (A, B) for a
    browsing in the lattice by computing the inclusion-maximal in the set system
    FA defined on O by FA = {g(x + B) : x ∈ I\B}.

    Large data are often described by a huge amount of objects, as in databases
for example where the number of recordings (i.e. objects) can be huge, indexed
using sophisticated key-indexation techniques. We propose in this paper an im-
provement of these basic operations with two limited objects access strategies:
A storage improvment by considering the restriction of the concept lattice
   to the attributes, namely the closed set lattice.
A computation improvment by considering in the operations the cardinality
   of the subset of objects instead of the subset itself, using the count function.


   Name: Immediates Successors LOA
   Data: A context K ; A closed set B of the closed set lattice (CI , ⊆) of K
   Result: The immediate successors of B in the lattice
   begin
      initialize the SuccB family to an empty set;
      foreach x ∈ I \ B do
           add = true;
           foreach X ∈ SuccB do
               \\ Merge x and X in the same potential successor
               if c(B + x) = c(B + X) then
                   if c(B + X + x) = c(B + x) then
                       replace X by X + x in SuccB ; add=false; break;
                   end
               end
               \\ Eliminate x as potential successor
               if c(B + x) < c(B + X) then
                   if c(B + X + x) = c(B + x) then add=false; break;
               end
               \\ Eliminate X as potential successor
               if c(B + x) > c(B + X) then
                   if c(B + X + x) = c(B + X) then delete X from SuccB
               end
           end
           \\ Insert x as a new potential successor ;
           if add then add {x} to SuccB
      end
      return SuccB ;
   end
Algorithm 1: Generation of the immediate successors of a closed set in the Hasse
diagram of the lattice (CI , ⊆)


Closed set lattice. Instead of a concept lattice, it is possible to consider its
restriction to the attributes in order to limit the storage of huge ammount of
objects in each concept. A nice result establishes that any concept lattice (C, ≤C )
is isomorphic to the lattice (CI , ⊆) defined on the set I of attributes, with CI
the restriction of C to the attributes in each concept. The lattice (CI , ⊆) is also
known as the closed sets lattice on the attributes I of a context (O, I, R), where
the set system CI is composed of all closed set - i.e. fixed points - for the closure
operator ϕ = g ◦ f - i.e. a map that is isotone, extensive and idempotent):

                             CI = {ϕ(X) : X ⊆ I}                                (1)

The closed sets ⊥ = ϕ(∅) = f (O) and > = I respectively correspond to the
bottom and the top of the closed set lattice. See the survey of Caspard and
Monjardet [3] for more details about closed set lattices.
    Therefore, each smallest concept (g(B), f (g(B)) containing a given set B of
attributes is replaced by the closure ϕ(B) = f (g(B) on the attributes. Thus,
only the attributes part is stored.

The count function. Moreover, we propose to reinforce the object access limita-
tion by considering the cardinality of the subset g(B) instead of the subset itself
in the treatment. The count function c associates to any subset X of attributes
the cardinality of the subset g(X): c(X) = |g(X)|
    It corresponds to the notion of support introducing in rules extraction from
data-bases, and is in particular used by Titanic algorithm [6]. We use the count
function c instead of the closure operator ϕ since c and ϕ possesses together nice
properties, ∀X, X 0 ⊆ I:


                                      X ⊆ Y ⇒ c(X) ≥ c(Y )                      (2)
                               ϕ(X) = ϕ(Y ) ⇒ c(X) = c(Y )                      (3)
                   X ⊆ Y and c(X) = c(Y ) ⇒ ϕ(X) = ϕ(Y )                        (4)


4   Basic operations in a lattice with limited objects access
Using the closed sets lattice (CI , ⊆) instead of the whole concept lattice (C, ≤C )
gives raise to a storage improvement since only the attributes part is stored.
Using the count function, the two main operations can therefore be reformulated
as follows:

Generation of the closed-set ϕ(B) of a subset B of attributes. It can be performed
using the following equality:

                   ϕ(B) = B + {x ∈ I\B : c(B) = c(B + x)}                       (5)

This equality is a direct consequence of the third property of c together with the
isotone property of ϕ (i.e. X ⊆ X 0 ⇒ ϕ(X) ⊆ ϕ(X 0 )).

Generation of the immediates successors of a closed set B. A closed sets lattice
can be generated using an algorithm similar to Bordat’s algorithm, where the
immediate successors of a closed set B in the Hasse diagram of the lattice are
the inclusion-minimal subsets of

                          FB = {ϕ(B + x) : x ∈ I\B}                             (6)
   In our previous work [5], we present an incremental algorithm by testing, for
each attribute x of I\B and each already inserted potential successor X ⊆ I\B,
the inclusion between ϕ(B + X) and ϕ(B + x):

1. Merge x with X when ϕ(B + x) = ϕ(B + X).
2. Eliminate X as potential successor of B when ϕ(B + x) ⊂ ϕ(B + X)
3. Eliminate x as potential successor of B when ϕ(B + X) ⊂ ϕ(B + x)
4. Insert x as potential successor of B when x is neither eliminated or merged
   with X.

The inclusion test between ϕ(B + X) and ϕ(B + x) can easily be performed
using the count function c and the following proposition deduced from Prop. 1
in [5]:

Proposition 1. ϕ(B + X) ⊆ ϕ(B + x) ⇐⇒ c(B + X + x) = c(B + X)

⇒: Consider that ϕ(B + X) ⊆ ϕ(B + x). The equivalence between inclusion
and intersection set operations (C ⊆ D ⇐⇒ C = C ∩ D) allows to de-
duce that ϕ(B + X) = ϕ(B + X) ∩ ϕ(B + x). Since ϕ(B + X) ∩ ϕ(B + x) =
ϕ(B + X) ∧ ϕ(B + x) = ϕ(B + X + x), then ϕ(B + X) = ϕ(B + X + x). We
conclude by c(B + X + x) = c(B + X) using the second property of the count
function c (see Eq. 3).
⇐: Consider that c(B + X + x) = c(B + X). By Eq. 4, and since B + X ⊆
B + X + x, we deduce that ϕ(B + X + x) = ϕ(B + X), and we conclude by
ϕ(B + X) ⊆ ϕ(B + x) as above.

    In the case where ϕ(B + X) ⊆ ϕ(B + x), the strict inclusion has then to
be tested in order to decide if x has to be deleted as potential successor, or
merged with X. Using Eq. 2 and Eq. 3, this test can be performed by checking if
c(B+X) > c(B+x) or c(B+X) = c(B+x). The case where ϕ(B+x) ⊆ ϕ(B+X)
is dualy tested in order to decide if X has to be deleted or not as potential
successor.
    The complexity of computing the immediate successors of a closed set B
using the Immediates Successors LOA algorithm is:

                      (|I| − |B|)(|I| − |B|)
                                             ∗ O(c(B + X))
                                2
   which leads to
                         O((|I| − |B|)2 ∗ O(c(B + X)))
using the big O notation.
     This has to be compared with O(|I|2 ∗ |O|) of the Bordat’s algorithm. In
addition the cost O(c( B + x)) of computing the cardinality of objects satisfying
the required properties can be based on multiple keys and robust algorithms
used in databases that do not need to load all data for computing a cardinality
[5].
           (a) 100.000 objects and attributes varying from 10 to 50,
           each object randomly described by 20 % of the attributes


           (b) 100.000 objects and 50 attributes, each object described
           by random attributes varying from 2 to 49


                Fig. 1. Calculating the immediate successors of ∅


5   Experimentations

In the experiment, we use a dataset composed of 100.000 objects described by
a random set of attributes varying from 10 to 50, each objects described by a
random set of attributes. The dataset is stored in a database MySQL 5.5.17.
We have implemented our algorithms using PhP 5.3.8 using a laptop with 8
processors clocked at 1.73GHz and 8Gb of memory. The counting of objects
            (a) Average time of a closure generation for attributes vary-
            ing from 1 to 50 with 100.000 objects and 50 attributes, each
            object described by 10, 20, 30 then 40 random attributes


            (b) closure of attributes with 100 000 objects, 50 attributes,
            each object described by random attributes from 2 to 49


                           Fig. 2. Closures computation


satisfying a set of properties is realised by the SQL request comparing indexes
with a constant:

select count (*) from att1=1 and att2=1


   We compare the processing time of our algorithms in the following cases:
Immediate successors generation: we compute the immediate successors of
   the bottom concept in the two following cases:
    1. 100.000 objects and attributes varying from 10 to 50, each object ran-
       domly described by 20% of the attributes (see Fig. 1(a))
    2. 100.000 objects and 50 attributes, each object described by random at-
       tributes varying from 2 to 49 (see Fig. 1(b))
Closure generation: We compute closures in the following cases:
    1. We compute the average time of a closure generation for attributes vary-
       ing from 1 to 50 in the following case, with 100.000 objects and 50 at-
       tributes, each object described by 10, 20, 30 then 40 random attributes.
       (see Fig. 2(a))
    2. We compute the closure of a singleton attribute with 100 000 objects,
       50 attributes, each object described by random attributes from 2 to 49.
    Trends of our results are consistent with theoretical complexities of the algo-
rithms. Deporting a part of the calculation in the SQL engine, we believe that
the index dedicated to counting object can improve performance. This results
renforces those obtained in [5] on the efficience of the key-indexation techniques
in SQL. In addition, new technologies related to cloud computing will also divide
the workload of the SQL engine on demand.

6   Conclusion
In this paper, we described two basic algorithms for browsing in a concept lattice
with limited objects access. By separating the counting from the rest of the algo-
rithm, new systems for exploring concept lattices can now rely on optimization
algorithms used in relational databases. If the tests we will realize on PostgreSQL
and MySQL databases are successfull in terms of manipulating a huge amounts
of data, we plan to propose a library for extending content management system
such as Joomla!.

References
1. M. Barbut and B. Monjardet. Ordres et classifications : Algèbre et combinatoire.
   Hachette, Paris, 1970. 2 tomes.
2. G. Birkhoff. Lattice theory. American Mathematical Society, 3d edition, 1967.
3. N. Caspard and B. Monjardet. The lattice of closure systems, closure operators
   and implicational systems on a finite set: a survey. Discrete Applied Mathematics,
   127(2):241–269, 2003.
4. B.A. Davey and H.A. Priestley. Introduction to lattices and orders. Cambridge
   University Press, 2nd edition, 1991.
5. C. Demko and K. Bertet. Generation algorithm of a concept lattice with limited
   object access. In Proc. of Concept lattices and Applications (CLA’11), pages 113–
   116, Nancy, France, October 2011.
6. G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal. Computing iceberg
   concept lattices with TITANIC. Data and Knowledge Engineering, 42(2):189–222,
   August 2002.

</pre>