=Paper= {{Paper |id=Vol-162/paper-7 |storemode=property |title=Efficient algorithms for clone items detection |pdfUrl=https://ceur-ws.org/Vol-162/paper7.pdf |volume=Vol-162 |dblpUrl=https://dblp.org/rec/conf/cla/MedinaNR05 }} ==Efficient algorithms for clone items detection== https://ceur-ws.org/Vol-162/paper7.pdf
     Efficient
     Efficient algorithms
               algorithms for
                          for clone
                              clone items
                                    items detection
                                          detection

               Raoul Medina, Caroline Noyer, and Olivier Raynaud
               Raoul Medina, Caroline Noyer and Olivier Raynaud
                        LIMOS - Université Blaise Pascal,
                         LIMOS -des
             Campus universitaire  Université
                                      Cézeaux,Blaise Pascal,
                                                Clermont-Ferrand, France
             Campus universitaire des Cézeaux,
                           {medina,             Clermont-Ferrand, France
                                      raynaud}@isima.fr
                           {medina, raynaud}@isima.fr


        Abstract. This paper presents efficient algorithms for clone items de-
        tection in a binary relation. Best implementation of these algorithms has
        O(|J|.||M||) time complexity, with J corresponding to the set of items of
        the relation and ||M|| corresponding to the size of the relation. This re-
        sult improves the previous algorithm given in [3] which is O(|J|2 .||M||).
        Clone items have been introduced by Medina and Nourine to explain
        why, sometimes, the number of rules of a minimum cover of a relation is
        exponential with the number of items of the relation.


 1    Introduction
 The clone items notion has been introduced by Medina and Nourine in [3]. Aim
 of their paper was to understand the combinatorial explosion that might arise
 in a minimum basis of a relation. Most famous example of such exponential
 minimum basis is given by Manilla and Rähiä in [2]. Medina and Nourine no-
 ticed that in this example some items play symetrical role in the basis. Indeed,
 for each rule containing a given item a in the antecedent there is a symetri-
 cal rule where item a is replaced by item b. Such symetrical items are said to
 be clone items. The clone notion is an equivalence relation and thus classes of
 clone items can be found in a binary relation. In [3], the authors show how to
 compute such clone classes and, from those classes, how to reduce the binary
 relation in order to obtain a minimum basis with no clone items. The detection
 algorithm as well as the reduction algorithm have both polynomial time com-
 plexities. The obtained minimum basis is smaller than the original one (in the
 case of the Mannila and Rähiä example it reduces to a single rule) and this later
 can be reconstructed from the clone free minimum basis and the clone classes
 information in polynomial time.
     Recently, Gely et al. (in [1]) extended the notion of clone items (which are
 defined on closed sets) to the notion of P-clone items (defined on pseudo-closed
 sets) and to the notion of A-clone items (defined on what they call an ”atomized”
 context). Their approach could be generalized to any generation problem where,
 from a given relation, one wants to compute a (potentially exponential) collection
 of sets verifying a property (e.g. the sets must be closed, or the sets must be
 ideals, or the sets are pseudo-closed sets, etc...). The idea is then to reduce the
 combinatorial explosion of the wanted collection by representing it by a clone
 free collection and the classes of clone items of this collection. The problem is


Radim Bělohlávek, Václav Snášel (Eds.): CLA 2005, pp. 70–81, ISBN 80–248–0863–3.
                               Efficient algorithms for clone items detection   71


then to be able to compute the clone classes of the (potentially exponential)
collection without generating its sets. To solve this problem in polynomial time,
the general method could be as follows:
 – Let M be the (potentially exponential) collection of sets verifying a property
   over a given binary relation R. Compute in polynomial time a collection M0
   such that:
    1. the size of M0 is polynomial in the size of R,
    2. items a and b are clone in M if and only if they are clone items in M0 .
 – Detect the clone classes in M0 .
    Our paper focuses on the detection phase of this approach. The clone classes
computation algorithm given in [3] has an O(|J|2 .||M||) time complexity, where
J is the set
         P of items and ||M|| is the size of the input collection. In other words,
||M|| = m∈M |m|. In this paper, we present different algorithms to solve the
clone classes computation. The best complexity we obtain is in O(|J|.||M||).
    This paper is organized as follows: section 2 formally defines the problem in
terms of collection of sets and introduces the corresponding Abstract Data Type
which will be used by our algorithms. Section 3 describes three computation
strategies and the corresponding time complexities are studied. Section 4 shows
how to take advantage of those algorithms in order to compute the clone items
classes as defined in [3].


2     General context and definitions
In this section, we first formally define the studied problem. Then we present
the Abstract Data Structure called Map used in our algorithms and discuss on
its possible implementations.

2.1    Clone items in a Sets Collection
Let J be a set of items {x1 , ..., x|J| } and M a sets collection on J. We denote
by ϕa,b : 2J → 2J the mapping which associates to any subset of J its image
by swapping items a and b. More formally :
                         
                          (m \ {a}) ∪ {b} if b 6∈ m and a ∈ m
              ϕa,b (m) = (m \ {b}) ∪ {a} if a 6∈ m and b ∈ m
                           m                   otherwise
                         

Definition 1. Let M be a collection of sets defined on J. We say that items a
and b are clone items in M if and only if ∀m ∈ M, ϕa,b (m) ∈ M.
    The clone items concept is a binary one. To the question ”are a and b clone
items ?”, only the answer ”true” or ”false” is possible. It could be interesting
to have more precisions when the negative response is given. Are a and b very
far from being clone items or why are not they clone ? For this purpose, we
introduce a measure to qualify the clone property. This measure will represent
72        Raoul Medina, Caroline Noyer, Olivier Raynaud


a distance between two items a and b, based on the definition of the clone items
property. This distance is exactly the number of elements m of M which do not
have an image in M when applying the swapping function ϕa,b (m). When this
distance is zero, a and b are clone items. The greater the distance is, the farther
a and b are to be clone. More formally:

Definition 2. Let M be a sets collection on J and let (a, b) in J 2 . We call
distance between a and b, denoted by dM (a, b), the mapping :

                             J2 → N
                       dM (a, b) → {m ∈ M | ϕa,b (m) 6∈ M}


      Thanks to definition 2, clone items could be charaterized as follows :

Proposition 1. Let M be a sets collection on J and (a, b) in J 2 , a and b are
clone items if and only if dM (a, b) = 0.
    The problem we study in this paper is the computation of distances between
all pair of items of J:
Problem 1 (Distance).
    Data : a sets collection M on J;
    Result : the matrix dM .
    Here, we present the main property on which rely our algorithms. It charac-
terizes a couple (m, m0 ) of the sets collection M such that m = ϕa,b (m0 ).

Proposition 2. Let M be a sets collection defined over J, m and m0 two dis-
tinct sets of M and (a, b) two items of J such that a ∈ m and b ∈ m0 . Then the
following assertions are equivalent:
1. m = ϕa,b (m0 )
2. m0 = ϕa,b (m)
3. |m| = |m0 | and m \ m0 = {a} and m0 \ m = {b}
4. m \ {a} = m0 \ {b}
   This property states that two sets m and m0 are their respective images by
the swapping function ϕ if and only if they have same size t and share t − 1
items. This property follows directly from the definition of the ϕ mapping.

2.2     Abstract Data Type : Key Mapping
Interface. We use a Map abstract data type similar to the Map interface of
Java language. This data structure maps keys to values. In our case, the keys
are the sets of the collection. The values mapped by the keys depend on the
algorithm.
   This abstract data type supplies the following operators:

 – new() operator: creates a Map object and returns an empty map.
                                                                                                                                                                                                      Efficient algorithms for clone items detection                                                                                                                                                                                                                                                                                                                                                                                                                                                                73


 – get(e) operator: returns the value associated to the key e if this key maps
   a value, or Nil otherwise.
 – put(e,value) operator: inserts set e in the map and associates value to it.

    Time complexities of those operators deeply rely on the data structure used
for the implementation of the Map data type. We propose an implementation
which takes advantage of the type of the keys, i.e. sets. To implement the Map
type we propose a lexicographic tree: itemsets are represented by branches of
the tree.

Implementation: lexicographic tree. We first give a formal definition of a
lexicographic tree corresponding to a sets collection.

Definition 3. Let M be a sets collection defined over J, with a total order on
J denoted by